A C++ Utopia – Some comments on interfacing kdb+ and C++

11 Feb 2021

Matt Hughes

Recently, quite a few projects at AquaQ have involved manipulating large numeric vectors in kdb+ and passing this data to and from external applications or tools written in C or C++. A practical example is the use of machine learning libraries written in C++ whose output must be written to a kdb+ tickerplant or database. These applications tend to start out as simple Python scripts using embedPy. As use of the application increases, however, streamlining the calculations becomes a priority, and the code base is then migrated to lower-level languages that offer paths to significant optimisations.

At a practical level, these libraries usually require the construction or interpretation of kdb+ data structures such as tables, which in turn consist primarily of vectors of numeric data or lists of atoms. This blog compares a library created by one of AquaQ’s senior developers, which provides a general method for translating data to and from kdb+, against a basic approach that uses few of the later features available in C++.

The set of benchmarks in this discussion gives an idea of the relative performance of creating vectors and atoms via the two approaches outlined above. It also highlights some performance bottlenecks that can occur if vectors are constructed for kdb+ without thought.

Template header file usage

The output below shows a simple example of how the kdb template library, available here, can be used to translate kdb+ data to and from native objects. In the example we convert a basic integer and a simple list, and register a struct so that a method for translating it is generated. Various other utility methods are provided, but these simply extend the methods outlined below.


(base) kdb@homer$ cat src/wrapper.cpp

#define KXVER 3
#include "k.h"
#include "kdb.hpp"


struct instr {
  kdb::type::atom_float price;
  kdb::type::atom_long quantity;
};

KDB_REGISTER_TYPE(instr,&instr::price,&instr::quantity)

extern "C"
K int_eg( K x)
{
 long long  j;
 kdb::convert::to_native(x,j);
 return kdb::convert::from_native(1+j);
 }

extern "C"
K list_eg( K x)
{
 kdb::type::list_long l;
 kdb::convert::to_native(x,l);
 return kdb::convert::from_native(l);
 }

extern "C"
K struct_eg( K x)
{
 long long j;
 kdb::convert::to_native(x,j);
 instr data = { .price =j+0.0 , .quantity =j };
 return kdb::convert::from_native(data);
 }


(base) kdb@homer$ #############################################################
(base) kdb@homer$ c++ -fPIC -shared src/wrapper.cpp -Iinclude/ -o out.so -march=native -std=gnu++1z
(base) kdb@homer$ ############################################################
(base) kdb@homer$ cat q/eg.q
lib:`:./out
`. upsert n!{[x]ld:lib 2:(x;1)}'[n:`int_eg`list_eg`struct_eg];
int_eg[1]
list_eg[1 2 3]
struct_eg[2]
(base) kdb@homer$ ############################################################
(base) kdb@homer$ q q/eg.q
KDB+ 4.0 2020.07.15 Copyright (C) 1993-2020 Kx Systems
2
1 2 3
(2f;2)
q)\\
kdb@homer$ ############################################################

Testbed and Benchmark details

The benchmarks are run and timings are compared when creating integer vectors of length 8, 64, 512, 4096, 32768, 262144 and 524288. The gcc 7.5 compiler was used in conjunction with cmake version 3.18.2. (Exact compile lines can be ascertained from the make files generated when running the project.) The benchmarks used in the discussion below were created on an Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz processor with 128 GB of RAM available, running on a vanilla install of Ubuntu 18.04.4. The tests given below have been tailored from the referenced repository for this blog; each test can be found by referencing the history of that benchmark folder. A brief description of each benchmark class is given below.

  • *MANUAL – Relevant list is created using a simple loop and the kI accessor after being initialised via ktn with the correct size
  • *MEMCOPY – Relevant list is created using a memcpy operation after being initialised via ktn with the correct size
  • *APPEND – Relevant list is initialised with size zero and each element is appended on individually
  • *TEMPLATE – Relevant list is created using a member of the template library

E.g. the benchmark name ‘CreatingListFromVectorMANUAL/8’ relates to creating a kdb+ vector of length 8 by initialising the list to the correct length at runtime and filling it using the kI accessor.
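The three hand-written benchmark classes described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark code from the repository. The names `ktn`, `kI`, `ja`, `r0` and `KI` mirror the kdb+ C API, but the definitions here are simplified stand-ins (an assumption made so the sketch compiles without k.h) and say nothing about how kdb+ manages memory internally. The `create_*` function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>

// --- Simplified stand-ins for k.h (assumption: NOT the real kdb+ definitions) ---
struct k0 { int t; long long n; int* data; };
using K = k0*;
constexpr int KI = 6;  // int-vector type code, same value as in k.h

// Allocate an int vector of length n (mirrors the call shape of ktn).
K ktn(int type, long long n) {
    assert(type == KI && n >= 0);
    std::size_t bytes = sizeof(int) * static_cast<std::size_t>(n > 0 ? n : 1);
    return new k0{type, n, static_cast<int*>(std::malloc(bytes))};
}
int* kI(K x) { return x->data; }  // accessor; a macro in the real header

// Append one element, growing the vector by one slot (stand-in behaviour only).
K ja(K* x, void* elem) {
    K v = *x;
    std::size_t bytes = sizeof(int) * static_cast<std::size_t>(v->n + 1);
    v->data = static_cast<int*>(std::realloc(v->data, bytes));
    v->data[v->n++] = *static_cast<int*>(elem);
    return v;
}
void r0(K x) { std::free(x->data); delete x; }  // release the vector

// MANUAL: allocate the final length once, then fill through kI in a loop.
K create_manual(const std::vector<int>& v) {
    K x = ktn(KI, static_cast<long long>(v.size()));
    for (std::size_t i = 0; i < v.size(); ++i) kI(x)[i] = v[i];
    return x;
}

// MEMCOPY: allocate the final length once, then copy the whole block.
K create_memcopy(const std::vector<int>& v) {
    K x = ktn(KI, static_cast<long long>(v.size()));
    if (!v.empty()) std::memcpy(kI(x), v.data(), v.size() * sizeof(int));
    return x;
}

// APPEND: start from an empty list and append one element per call.
K create_append(const std::vector<int>& v) {
    K x = ktn(KI, 0);
    for (int e : v) ja(&x, &e);
    return x;
}
```

All three functions produce the same vector; they differ only in how many allocation and library calls are made along the way, which is what the benchmarks measure.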

Benchmark Results

The timings for each of the different approaches outlined above are given in the table below along with the number of iterations taken to achieve a stable result. Please note that the timings given here should be interpreted relative to one another, as the actual performance will depend significantly on a number of environment factors such as hardware and exact version of software used in the compilation process.

Benchmark Time (ns) CPU (ns) Iterations
CreatingAtomMANUAL 6.96 6.96 101361947
CreatingAtomTEMPLATE 6.47 6.47 103966622
CreatingAtomVARIANTTEMPLATE 6.90 6.90 101639270
CreatingListFromVectorMANUAL/8 18.2 18.2 40834575
CreatingListFromVectorMANUAL/64 25.4 25.4 28442063
CreatingListFromVectorMANUAL/512 35.9 35.4 18768147
CreatingListFromVectorMANUAL/4096 253 252 2847602
CreatingListFromVectorMANUAL/32768 3556 3551 203369
CreatingListFromVectorMANUAL/262144 78963 78896 9032
CreatingListFromVectorMANUAL/524288 160582 159805 4241
CreatingListFromVectorMEMCOPY/8 15.7 15.6 46486776
CreatingListFromVectorMEMCOPY/64 19.8 19.8 36579450
CreatingListFromVectorMEMCOPY/512 34.2 34.2 20408479
CreatingListFromVectorMEMCOPY/4096 159 159 4401796
CreatingListFromVectorMEMCOPY/32768 3710 3706 135237
CreatingListFromVectorMEMCOPY/262144 54093 5405 12888
CreatingListFromVectorMEMCOPY/524288 157126 157119 4576
CreatingListFromVectorAPPEND/8 141 141 4948300
CreatingListFromVectorAPPEND/64 922 922 786445
CreatingListFromVectorAPPEND/512 6068 6068 115970
CreatingListFromVectorAPPEND/4096 47058 47056 14887
CreatingListFromVectorAPPEND/32768 374661 374644 1866
CreatingListFromVectorAPPEND/262144 3067844 3067702 229
CreatingListFromVectorAPPEND/524288 6236505 6236208 114
CreatingListFromVectorTEMPLATE/8 16.7 16.7 42040040
CreatingListFromVectorTEMPLATE/64 26.2 26.2 26715034
CreatingListFromVectorTEMPLATE/512 35.0 35.0 19925226
CreatingListFromVectorTEMPLATE/4096 255 255 2761251
CreatingListFromVectorTEMPLATE/32768 3503 3499 200112
CreatingListFromVectorTEMPLATE/262144 78530 78445 8930
CreatingListFromVectorTEMPLATE/524288 158415 158219 4426
CreatingStructMANUAL 50.7 50.7 13800423
CreatingStructTEMPLATE 50.4 50.3 13904076

[Graph: time taken by the four benchmarks to create a kdb+ integer vector of length 64.]

Observations

It is clear from the benchmarks produced that the manner in which the kdb+ vector objects are created has a significant effect on the overall performance of any application where translation of data to kdb+ objects is a major component. Depending on the vector length, roughly one to two orders of magnitude separate the fastest and slowest approaches. This is clearly demonstrated by comparing the “APPEND” benchmarks, which use the ja utility to add one element at a time, with the MEMCOPY and MANUAL benchmarks, which use the ktn utility to allocate memory for the whole kdb+ list up front.
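To put numbers on that gap, the ratio of APPEND to MEMCOPY timings can be worked out directly from the Time (ns) column of the results table. The short sketch below does that arithmetic; the `Row`, `slowdown` and `print_slowdowns` names are purely illustrative.

```cpp
#include <cassert>
#include <cstdio>

// One row per vector length, with the Time (ns) values copied from the table.
struct Row { int length; double append_ns; double memcopy_ns; };

constexpr Row rows[] = {
    {8, 141, 15.7},        {64, 922, 19.8},         {512, 6068, 34.2},
    {4096, 47058, 159},    {32768, 374661, 3710},   {262144, 3067844, 54093},
    {524288, 6236505, 157126},
};

// How many times slower APPEND is than MEMCOPY for a given length.
double slowdown(const Row& r) {
    assert(r.memcopy_ns > 0.0);  // guard against division by zero
    return r.append_ns / r.memcopy_ns;
}

// Print the ratio and the average APPEND cost per element for every length.
void print_slowdowns() {
    for (const Row& r : rows)
        std::printf("%7d  APPEND/MEMCOPY = %6.1fx  (%.1f ns per appended element)\n",
                    r.length, slowdown(r), r.append_ns / r.length);
}
```

For the figures in the table, the ratio is roughly 9x at length 8 and close to 300x at length 4096, while the APPEND cost stays at roughly 11 to 18 ns per element across all lengths.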

It is interesting to see just how close the timings for using memcpy and the Kx-supplied accessor method kI are. Over the vector range chosen, the timings are often within about 10% of one another, although MEMCOPY is noticeably faster at some sizes (4096 and 262144 in the table above). This would suggest that the compiler was able to translate these operations into relatively similar machine code at compile time.

Somewhat surprisingly, the template approach, while also producing more aesthetically pleasing (and less) code, is able to run as quickly as the custom MANUAL methods. This would suggest that the TEMPLATE and MANUAL approaches end up almost identical at a machine code level. The top-level code behind these benchmarks highlights how much of the work has been abstracted away from the end user via the template utility without any significant loss of performance.
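One plausible reason a template layer need not cost anything at run time is that the element type and the kdb+ type code are both resolved at compile time, leaving the same allocate-then-fill loop as the MANUAL version. The sketch below illustrates that idea only; it is not the library's implementation. `k_type`, `make_vec`, `free_vec` and this `from_native` are hypothetical names, and the `k0` struct is a simplified stand-in rather than the real k.h layout. The type codes 6, 7 and 9 match KI, KJ and KF in k.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Simplified stand-in for a kdb+ vector (assumption: NOT the real k.h layout).
struct k0 { int t; long long n; void* data; };
using K = k0*;

// Hypothetical allocator: n elements of elem_size bytes, tagged with a type code.
K make_vec(int type, long long n, std::size_t elem_size) {
    assert(n >= 0);
    void* p = std::malloc(elem_size * static_cast<std::size_t>(n > 0 ? n : 1));
    return new k0{type, n, p};
}
void free_vec(K x) { std::free(x->data); delete x; }

// Compile-time map from a native element type to a kdb+ type code.
template <typename T> struct k_type;
template <> struct k_type<int>       { static constexpr int code = 6; };  // KI
template <> struct k_type<long long> { static constexpr int code = 7; };  // KJ
template <> struct k_type<double>    { static constexpr int code = 9; };  // KF

// Hypothetical from_native: one generic function instead of a loop per type.
// After instantiation this is the same allocate-once-and-fill loop as MANUAL.
template <typename T>
K from_native(const std::vector<T>& v) {
    K x = make_vec(k_type<T>::code, static_cast<long long>(v.size()), sizeof(T));
    T* out = static_cast<T*>(x->data);
    for (std::size_t i = 0; i < v.size(); ++i) out[i] = v[i];
    return x;
}
```

Calling `from_native` with a `std::vector<int>` or a `std::vector<double>` picks the matching type code with no run-time branching, and an unsupported element type fails to compile because `k_type` has no specialisation for it.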

Conclusions

Prior knowledge of vector lengths can be used to create kdb+ arrays efficiently, and where tables are concerned, columns should be created as close to their final size as possible; that is, appending individual elements should be avoided where possible. The template approach provided here helps to create a reduced code base, which may be less prone to error for accomplished and novice C++ programmers alike, without making any sacrifices in terms of performance.
