PyKX: Top Features for Data Mastery

Input:
pyatom = 2
kx.toq(pyatom)

Output:
pykx.LongAtom(pykx.q('2'))

Input:
pylist = [1, 2, 3]
kx.toq(pylist)

Output:
pykx.LongVector(pykx.q('1 2 3'))

Input:
pydict = {'x': [1, 2, 3], 'y': {'x': 3}}
kx.toq(pydict)

Output:
x | 1 2 3
y | (,`x)!,3

Input:
conn = kx.SyncQConnection(HOST, PORT)
conn('select count i by date from multifeed')

Output:
date       |
2024.05.24 | 20085306
2024.05.25 | 20121897
2024.05.26 | 20499580
2024.05.27 | 20092729
2024.05.28 | 20558866

Input:
conn['tab'] = kx.q('([]100?`a`b;100?1f;100?1f)')
conn.qsql.select('tab', where = 'x=`a')

Output:
  | x | x1         | x2
0 | a | 0.2032099  | 0.7250709
1 | a | 0.5611439  | 0.9452199
2 | a | 0.8685452  | 0.7092423
3 | a | 0.01221208 | 0.002184472
4 | a | 0.7716917  | 0.06670537

Input:
tab = conn('10#select from multifeed where date=2024.05.28')
tab.pd()

Output:
  | date       | time                          | market | symbol | qty | price
0 | 2024-05-28 | 2024-05-28 00:00:00.001422921 | c      | CFG    | 78  | 898.82
1 | 2024-05-28 | 2024-05-28 00:00:00.001422921 | b      | BDE    | 7   | 352.28
2 | 2024-05-28 | 2024-05-28 00:00:00.001422921 | b      | BFG    | 51  | 931.12
3 | 2024-05-28 | 2024-05-28 00:00:00.001422921 | c      | CDE    | 17  | 560.66
4 | 2024-05-28 | 2024-05-28 00:00:00.001422921 | c      | CDE    | 1   | 397.56
5 | 2024-05-28 | 2024-05-28 00:00:00.001422921 | b      | BBC    | 1   | 526.93
6 | 2024-05-28 | 2024-05-28 00:00:00.001422921 | b      | BDE    | 50  | 849.64
7 | 2024-05-28 | 2024-05-28 00:00:00.001422921 | a      | ADE    | 30  | 611.68
8 | 2024-05-28 | 2024-05-28 00:00:00.001422921 | b      | BBC    | 94  | 905.93
9 | 2024-05-28 | 2024-05-28 00:00:00.001422921 | c      | CBC    | 54  | 850.91
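Once the result is a pandas DataFrame, the usual pandas tooling applies. As a small illustration, the sketch below retypes the first five rows of the output above by hand (so it runs without a kdb+ connection) and computes a volume-weighted average price per market. The helper name vwap_by_market is hypothetical and not part of PyKX; in practice the frame would come straight from tab.pd().

```python
import pandas as pd

# First five rows of the output above, typed in by hand so the example
# runs without a kdb+ process; normally this frame would be tab.pd().
df = pd.DataFrame({
    "market": ["c", "b", "b", "c", "c"],
    "symbol": ["CFG", "BDE", "BFG", "CDE", "CDE"],
    "qty": [78, 7, 51, 17, 1],
    "price": [898.82, 352.28, 931.12, 560.66, 397.56],
})


def vwap_by_market(frame: pd.DataFrame) -> pd.Series:
    """Volume-weighted average price per market (hypothetical helper)."""
    # Sum of qty * price per market, divided by total qty per market.
    notional = (frame["qty"] * frame["price"]).groupby(frame["market"]).sum()
    volume = frame.groupby("market")["qty"].sum()
    return (notional / volume).rename("vwap")


vwap = vwap_by_market(df)
print(vwap.round(2))
```

The same aggregation could be pushed down to kdb+ instead; doing it in pandas here only shows that nothing special is needed once the data has been converted.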

3. Streaming Tickerplant:

Kdb+ real-time streaming is fast and straightforward to work with. Running a tickerplant from Python gives you Kdb+ performance alongside the broad analytics ecosystem and onward connectivity of Python. KX provides a useful notebook demo (https://code.kx.com/pykx/3.0/examples/streaming/index.html) showing how PyKX can quickly set up a simple tick framework using built-in functionality, as opposed to manually setting up each component using only Kdb+. This arguably lowers the barrier to setting up and experimenting with Kdb+ streaming, which is particularly useful for developers with more Python experience than Kdb+ experience.
The built-in kx.PyKXReimport() context manager wraps the launch of the Python scripts that create the different components of a Kdb+ framework, so that each child process can import PyKX cleanly. In this case a generate_hdb.py script is run to create an HDB containing dummy data.

import subprocess

# Launch the HDB generator as a child process
with kx.PyKXReimport():
    db = subprocess.Popen(
        ['python', 'generate_hdb.py',
         '--datapoints', '100000',
         '--days', '5',
         '--name', 'db'],
        stdin=subprocess.PIPE,
        stdout=None,
        stderr=None,
    )
    # Wait for the script to finish and check its exit code
    rc = db.wait()
    if rc != 0:
        db.stdin.close()
        db.kill()
        raise Exception('Generating HDB failed')
    else:
        db.stdin.close()
        db.kill()
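The wait, check and clean-up pattern above can be written more compactly with subprocess.run. The sketch below uses only the standard library and a stand-in child command so that it runs anywhere; run_script is a hypothetical helper name, not part of PyKX, and in the PyKX setting the call would still sit inside the with kx.PyKXReimport(): block.

```python
import subprocess
import sys


def run_script(args: list[str]) -> int:
    """Run a child Python process and raise if it exits with a non-zero code.

    Hypothetical helper for illustration; not part of PyKX.
    """
    # sys.executable makes the child use the same interpreter as the parent.
    result = subprocess.run([sys.executable, *args], stdin=subprocess.DEVNULL)
    if result.returncode != 0:
        raise RuntimeError(f"{args} failed with exit code {result.returncode}")
    return result.returncode


# Stand-in for: run_script(['generate_hdb.py', '--datapoints', '100000', ...])
rc = run_script(["-c", "print('HDB generated')"])
```

subprocess.run waits for the child to exit and releases its resources, so no explicit stdin.close() or kill() is needed afterwards.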

trade = kx.schema.builder({
    'time': kx.TimespanAtom,
    'sym': kx.SymbolAtom,
    'exchange': kx.SymbolAtom,
    'sz': kx.LongAtom,
    'px': kx.FloatAtom})

quote = kx.schema.builder({
    'time': kx.TimespanAtom,
    'sym': kx.SymbolAtom,
    'exchange': kx.SymbolAtom,
    'bid': kx.FloatAtom,
    'ask': kx.FloatAtom,
    'bidsz': kx.LongAtom,
    'asksz': kx.LongAtom})

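# Note: 'aggregate' must be a schema built in the same way as trade and
# quote; its definition is not shown in this post.
simple = kx.tick.BASIC(
    tables={'trade': trade, 'quote': quote, 'aggregate': aggregate},
    ports={'tickerplant': 5010, 'rdb': 5013, 'hdb': 5011},
    log_directory='log',
    database='db'
)
simple.start()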
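For readers new to the tick architecture, the toy model below shows the general flow of data between the kinds of processes started above: a tickerplant appends each incoming batch to a log and forwards it to subscribers, and a real-time database (RDB) accumulates what it receives. This is plain Python written purely for illustration. The class names, method names and sample values are all hypothetical, and the sketch says nothing about how kx.tick is implemented internally.

```python
from collections import defaultdict
from typing import Callable

Row = dict
Subscriber = Callable[[str, list], None]


class ToyTickerplant:
    """Illustrative stand-in for a tickerplant: log first, then fan out."""

    def __init__(self) -> None:
        self.log: list[tuple[str, list[Row]]] = []  # replayable message log
        self.subscribers: dict[str, list[Subscriber]] = defaultdict(list)

    def subscribe(self, table: str, callback: Subscriber) -> None:
        self.subscribers[table].append(callback)

    def publish(self, table: str, rows: list[Row]) -> None:
        self.log.append((table, rows))        # record the batch for recovery
        for callback in self.subscribers[table]:
            callback(table, rows)             # push to each subscriber


class ToyRDB:
    """Illustrative stand-in for an RDB: keeps received rows in memory."""

    def __init__(self) -> None:
        self.tables: dict[str, list[Row]] = defaultdict(list)

    def upd(self, table: str, rows: list[Row]) -> None:
        self.tables[table].extend(rows)


tp = ToyTickerplant()
rdb = ToyRDB()
tp.subscribe("trade", rdb.upd)

# Made-up sample rows using the column names from the schemas above.
tp.publish("trade", [{"sym": "AAPL", "sz": 100, "px": 190.5}])
tp.publish("quote", [{"sym": "AAPL", "bid": 190.4, "ask": 190.6}])  # no subscriber

print(len(rdb.tables["trade"]), len(tp.log))
```

The RDB here only sees the trade batch because it subscribed to that table alone, while the tickerplant log records both batches.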

Input:
%%q --port 1800
select count i by date from multifeed
select count i from multifeed where date=2024.05.28

Output:
date       | x
-----------| --------
2024.05.24 | 20085306
2024.05.25 | 20121897
2024.05.26 | 20499580
2024.05.27 | 20092729
2024.05.28 | 20558866

x
--------
20558866

# Set environment variables needed to run the Streamlit integration
import os
os.environ['PYKX_BETA_FEATURES'] = 'true'

# This is optional but suggested, as without it caching
# is not supported within Streamlit
os.environ['PYKX_THREADING'] = 'true'

import streamlit as st
import pykx as kx
import matplotlib.pyplot as plt


def main():
    st.header('PyKX Demonstration')
    connection = st.connection('pykx',
                               type=kx.streamlit.PyKXConnection,
                               port=5013)

    # Reset an unhealthy connection first, so that the query below always runs
    if not connection.is_healthy():
        try:
            connection.reset()
        except BaseException:
            raise kx.QError('Connection object was not deemed to be healthy')

    # Query the remote process and convert the result to a pandas DataFrame
    tab = connection.query('10#select from trade')
    tab = tab.pd()

    # Scatter plot of trade size against price
    fig, x = plt.subplots()
    x.scatter(tab['sz'], tab['px'])

    st.write('Queried kdb+ remote table')
    st.write(tab)
    st.write('Generated plot')
    st.pyplot(fig)


if __name__ == "__main__":
    try:
        main()
    finally:
        kx.shutdown_thread()

6. Summary

PyKX does a fantastic job of giving Python-first developers a low barrier to entry to Kdb+. Its built-in Kdb+ memory space, together with conversion functions between Python and Kdb+ datatypes, makes it an ideal sandbox for experimenting with faster and better ways of processing and analysing large amounts of data than using Python alone. Having the tickerplant framework (one of the biggest attractions of Kdb+) boiled down to a small number of user-friendly scripts and functions is also a great way of introducing Python developers to real-time analytics using both Python and Kdb+ functionality.

With Python so widely adopted and supported, interoperability between it and Kdb+ opens the floodgates for Kdb+ developers to more easily visualise and analyse data and to present their findings (as seen in the Streamlit integration section).

It is worth noting that this post is far from an exhaustive list of the capabilities of PyKX. Other topics would make great candidates for future blogs, e.g. database creation and management, direct manipulation of HDB tables, and multi-threaded execution.