Matt Hughes
Recently quite a few projects at AquaQ have involved manipulating large numeric vectors in kdb+ and passing this data to and from external applications or tools written in C or C++. A practical example is the use of machine learning libraries written in C++ whose output must be written to a kdb+ tickerplant or database. These applications tend to start out as simple Python scripts using embedPy. As use of the application increases, however, streamlining the calculations becomes a priority, and the code base is then migrated to lower-level languages that offer paths to significant optimisations.
At a practical level these libraries usually require the construction or interpretation of kdb+ data structures such as tables, which in turn consist primarily of vectors of numeric data or lists of atoms. This blog compares a library created by one of AquaQ’s senior developers, which provides a general method for translating data to and from kdb+, against a basic approach that uses few of the newer features available in C++.
The set of benchmarks in this discussion gives an idea of the relative performance of creating vectors and atoms via the two approaches outlined above. It also highlights some performance bottlenecks that can occur if vectors are constructed for kdb+ without thought.
The output below shows a simple example of how the kdb template library, available here, can be used for translating kdb+ data to and from native objects. In the example we convert a basic integer and a simple list, and generate a method for translating a struct. Various other utility methods are provided, but these simply extend the methods outlined below.
(base) kdb@homer$ cat src/wrapper.cpp
#define KXVER 3
#include "k.h"
#include "kdb.hpp"
struct instr {
    kdb::type::atom_float price;
    kdb::type::atom_long quantity;
};
KDB_REGISTER_TYPE(instr,&instr::price,&instr::quantity)
extern "C"
K int_eg( K x)
{
    long long j;
    kdb::convert::to_native(x,j);
    return kdb::convert::from_native(1+j);
}
extern "C"
K list_eg( K x)
{
    kdb::type::list_long l;
    kdb::convert::to_native(x,l);
    return kdb::convert::from_native(l);
}
extern "C"
K struct_eg( K x)
{
    long long j;
    kdb::convert::to_native(x,j);
    instr data = { .price =j+0.0 , .quantity =j };
    return kdb::convert::from_native(data);
}
(base) kdb@homer$ #############################################################
(base) kdb@homer$ c++ -fPIC -shared src/wrapper.cpp -Iinclude/ -o out.so -march=native -std=gnu++1z
(base) kdb@homer$ ############################################################
(base) kdb@homer$ cat q/eg.q
lib:`:./out
`. upsert n!{[x]ld:lib 2:(x;1)}'[n:`int_eg`list_eg`struct_eg];
int_eg[1]
list_eg[1 2 3]
struct_eg[2]
(base) kdb@homer$ ############################################################
(base) kdb@homer$ q q/eg.q
KDB+ 4.0 2020.07.15 Copyright (C) 1993-2020 Kx Systems
2
1 2 3
(2f;2)
q)\\
kdb@homer$ ############################################################
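The struct example relies on registering a list of members once and then converting them generically. The sketch below shows one way such a mechanism could be built in plain C++, using member pointers and a variadic template. It is an assumption about the general technique, not a description of how kdb.hpp or KDB_REGISTER_TYPE is implemented; Instr, to_tuple and instr_fields are hypothetical names, and a std::tuple stands in for the kdb+ mixed list.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical struct mirroring instr from the example, with native members.
struct Instr {
    double price;
    long long quantity;
};

// Read each listed member from s, in order, into a tuple.
// The member list is fixed at compile time, so no runtime lookup is needed.
template <typename S, typename... M>
std::tuple<M...> to_tuple(const S& s, M S::*... members) {
    return std::tuple<M...>(s.*members...);
}

// Convenience wrapper that fixes the member list once,
// much as a registration macro might (assumption).
inline std::tuple<double, long long> instr_fields(const Instr& d) {
    return to_tuple(d, &Instr::price, &Instr::quantity);
}
```

Calling instr_fields on an Instr holding a price of 2.0 and a quantity of 2 yields a two-element tuple with those values, in the same order as the members were listed.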
The benchmarks are run and timings are compared when creating integer vectors of length 8, 64, 512, 4096, 32768, 262144 and 524288. The gcc 7.5 compiler was used in conjunction with cmake version 3.18.2. (Exact compile lines can be ascertained from the make files generated when running the project.) The benchmarks used in the discussion below were created on an Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz processor with 128GB of RAM available, running on a vanilla install of Ubuntu 18.04.4. The tests given below have been tailored from the referenced repository for this blog; each test can be found by referencing the history of that benchmark folder. A brief description of each benchmark class is given below.
For example, the benchmark name ‘CreatingListFromVectorMANUAL/8’ relates to creating a kdb+ vector of length 8 using the kI accessor and initialising the list to the correct length at runtime.
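The three hand-written list strategies named in the benchmarks can be sketched as follows. To keep the example self-contained it uses a hypothetical ToyList type backed by std::vector rather than a real kdb+ K object, so it illustrates only the shape of each approach: allocate once and assign by index (MANUAL), allocate once and copy the block (MEMCOPY), or start empty and grow one element at a time (APPEND). The function names are illustrative and do not come from the benchmark repository.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a kdb+ integer vector; not part of k.h.
struct ToyList {
    std::vector<int> items;
};

// MANUAL: allocate the final length once, then assign element by element.
ToyList build_manual(const std::vector<int>& src) {
    ToyList out{std::vector<int>(src.size())};
    for (std::size_t i = 0; i < src.size(); ++i) out.items[i] = src[i];
    return out;
}

// MEMCOPY: allocate the final length once, then copy the whole block.
ToyList build_memcopy(const std::vector<int>& src) {
    ToyList out{std::vector<int>(src.size())};
    if (!src.empty())
        std::memcpy(out.items.data(), src.data(), src.size() * sizeof(int));
    return out;
}

// APPEND: start empty and grow one element at a time.
ToyList build_append(const std::vector<int>& src) {
    ToyList out{};
    for (int v : src) out.items.push_back(v);
    return out;
}

// Sanity check: all three strategies must produce the same contents.
void check_strategies_agree(const std::vector<int>& src) {
    assert(build_manual(src).items == src);
    assert(build_memcopy(src).items == src);
    assert(build_append(src).items == src);
}
```

All three functions return the same data; they differ only in how often storage is allocated and how elements are written, which is the difference the benchmarks measure.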
The timings for each of the different approaches outlined above are given in the table below along with the number of iterations taken to achieve a stable result. Please note that the timings given here should be interpreted relative to one another, as the actual performance will depend significantly on a number of environment factors such as hardware and exact version of software used in the compilation process.
Benchmark | Time (ns) | CPU (ns) | Iterations |
CreatingAtomMANUAL | 6.96 | 6.96 | 101361947 |
CreatingAtomTEMPLATE | 6.47 | 6.47 | 103966622 |
CreatingAtomVARIANTTEMPLATE | 6.90 | 6.90 | 101639270 |
CreatingListFromVectorMANUAL/8 | 18.2 | 18.2 | 40834575 |
CreatingListFromVectorMANUAL/64 | 25.4 | 25.4 | 28442063 |
CreatingListFromVectorMANUAL/512 | 35.9 | 35.4 | 18768147 |
CreatingListFromVectorMANUAL/4096 | 253 | 252 | 2847602 |
CreatingListFromVectorMANUAL/32768 | 3556 | 3551 | 203369 |
CreatingListFromVectorMANUAL/262144 | 78963 | 78896 | 9032 |
CreatingListFromVectorMANUAL/524288 | 160582 | 159805 | 4241 |
CreatingListFromVectorMEMCOPY/8 | 15.7 | 15.6 | 46486776 |
CreatingListFromVectorMEMCOPY/64 | 19.8 | 19.8 | 36579450 |
CreatingListFromVectorMEMCOPY/512 | 34.2 | 34.2 | 20408479 |
CreatingListFromVectorMEMCOPY/4096 | 159 | 159 | 4401796 |
CreatingListFromVectorMEMCOPY/32768 | 3710 | 3706 | 135237 |
CreatingListFromVectorMEMCOPY/262144 | 54093 | 5405 | 12888 |
CreatingListFromVectorMEMCOPY/524288 | 157126 | 157119 | 4576 |
CreatingListFromVectorAPPEND/8 | 141 | 141 | 4948300 |
CreatingListFromVectorAPPEND/64 | 922 | 922 | 786445 |
CreatingListFromVectorAPPEND/512 | 6068 | 6068 | 115970 |
CreatingListFromVectorAPPEND/4096 | 47058 | 47056 | 14887 |
CreatingListFromVectorAPPEND/32768 | 374661 | 374644 | 1866 |
CreatingListFromVectorAPPEND/262144 | 3067844 | 3067702 | 229 |
CreatingListFromVectorAPPEND/524288 | 6236505 | 6236208 | 114 |
CreatingListFromVectorTEMPLATE/8 | 16.7 | 16.7 | 42040040 |
CreatingListFromVectorTEMPLATE/64 | 26.2 | 26.2 | 26715034 |
CreatingListFromVectorTEMPLATE/512 | 35.0 | 35.0 | 19925226 |
CreatingListFromVectorTEMPLATE/4096 | 255 | 255 | 2761251 |
CreatingListFromVectorTEMPLATE/32768 | 3503 | 3499 | 200112 |
CreatingListFromVectorTEMPLATE/262144 | 78530 | 78445 | 8930 |
CreatingListFromVectorTEMPLATE/524288 | 158415 | 158219 | 4426 |
CreatingStructMANUAL | 50.7 | 50.7 | 13800423 |
CreatingStructTEMPLATE | 50.4 | 50.3 | 13904076 |
The above graph compares the length of time taken by the four benchmarks to create a kdb+ integer vector of length 64.
It is clear from the benchmarks that the manner in which kdb+ vector objects are created has a significant effect on the overall performance of any application where translating data to kdb+ objects is a major component. Depending on vector length, the slowest approach takes roughly 9 to 300 times as long as the fastest, i.e. up to two orders of magnitude. This is clearly demonstrated by the “APPEND” benchmarks, which use the ja utility, compared with the MEMCOPY and MANUAL benchmarks, which use the ktn utility to allocate memory for the kdb+ list up front.
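The size of the gap can be read directly from the table. The short sketch below copies the wall-clock times for the MEMCOPY and APPEND rows and computes the ratio at each length; the constant and function names are hypothetical and the figures are those reported above, not new measurements.

```cpp
#include <cassert>
#include <cstddef>

// Wall-clock times in ns, copied from the benchmark table.
constexpr std::size_t kCount = 7;
constexpr int kLength[kCount] = {8, 64, 512, 4096, 32768, 262144, 524288};
constexpr double kMemcopyNs[kCount] = {15.7, 19.8, 34.2, 159, 3710, 54093, 157126};
constexpr double kAppendNs[kCount] = {141, 922, 6068, 47058, 374661, 3067844, 6236505};

// How many times slower APPEND is than MEMCOPY at table index i.
double append_slowdown(std::size_t i) {
    assert(i < kCount);
    return kAppendNs[i] / kMemcopyNs[i];
}

// Vector length at which the APPEND slowdown is largest.
int worst_case_length() {
    std::size_t worst = 0;
    for (std::size_t i = 1; i < kCount; ++i)
        if (append_slowdown(i) > append_slowdown(worst)) worst = i;
    return kLength[worst];
}
```

With these figures the ratio runs from about 9 at length 8 to about 296 at length 4096, and stays above 39 for every length beyond 8.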
It is interesting to see just how close the timings for memcpy and the Kx-supplied accessor kI are. Over the vector range chosen the timings are usually within 10% of one another. This would suggest that the compiler was able to translate these operations into relatively similar machine code at compile time.
Somewhat surprisingly, the template approach, while producing less code that is also more aesthetically pleasing, runs as quickly as the custom MANUAL methods. This would suggest that the TEMPLATE and MANUAL approaches end up almost identical at the machine code level. The top-level code behind these benchmarks highlights how much of the work has been abstracted away from the end user by the template utility without any significant loss of performance.
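One reason a template layer need not cost anything at runtime is that the element type and its type tag are resolved when the template is instantiated, leaving the same allocate-and-fill loop a hand-written version would contain. The sketch below illustrates that idea with hypothetical names (Tag, tag_of, ToyList, from_native, from_native_manual); it is a toy model and not the implementation found in kdb.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type tags; a real binding would map to kdb+ type codes.
enum class Tag { Int, Long, Float };

// Compile-time mapping from a native element type to its tag.
template <typename T> struct tag_of;
template <> struct tag_of<int>       { static constexpr Tag value = Tag::Int; };
template <> struct tag_of<long long> { static constexpr Tag value = Tag::Long; };
template <> struct tag_of<double>    { static constexpr Tag value = Tag::Float; };

// Toy typed list standing in for a kdb+ vector.
template <typename T>
struct ToyList {
    Tag tag;
    std::vector<T> items;
};

// Generic conversion: one definition covers every element type with a tag.
template <typename T>
ToyList<T> from_native(const std::vector<T>& src) {
    ToyList<T> out{tag_of<T>::value, std::vector<T>(src.size())};
    for (std::size_t i = 0; i < src.size(); ++i) out.items[i] = src[i];
    return out;
}

// Hand-written equivalent for int only, for comparison.
ToyList<int> from_native_manual(const std::vector<int>& src) {
    ToyList<int> out{Tag::Int, std::vector<int>(src.size())};
    for (std::size_t i = 0; i < src.size(); ++i) out.items[i] = src[i];
    return out;
}
```

Once from_native is instantiated for int, its body matches from_native_manual line for line, so there is no obvious reason for a compiler to generate different code for the two.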
Prior knowledge of vector lengths can be used to create kdb+ arrays efficiently, and when building tables, columns should be created as close to their final size as possible. In other words, appending individual elements should be avoided where possible. The template approach provided here helps to create a reduced code base, which may be less prone to error for the accomplished and novice C++ programmer alike, without making any sacrifices in terms of performance.