Jemma Borland
Think of your kdb+ application as home sweet home. Much as we ensure stringent security on our homes, we want to do the same for our kdb+ systems: keeping an eye on who comes and goes, and knowing what they’re doing when they’re here.
The security of a kdb+ system can be broken down into three main areas: authentication, entitlements and encryption.
If your application lacks any of these, we recommend performing an audit exercise alongside implementing authentication and entitlements.
All activity on kdb+ processes should be logged, both read and write, synchronous and asynchronous. This gives application developers a transparent view of exactly who is using their data, and how it is being queried.
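The .z.po/.z.wo and .z.pc/.z.wc message handlers (for IPC and WebSocket connections respectively) give control over what happens after a connection has been successfully established to, or closed from, the kdb+ process. These events can be logged alongside .z variables, e.g. .z.u (username), .z.a (IP address), .z.p (timestamp) and .z.w (connection handle).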
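A minimal sketch of connection auditing, assuming an in-memory table called connlog (a hypothetical name) that would later be persisted. It records open and close events using the .z variables above. Because .z.pc fires after the client has gone, the user and IP are taken from the matching open row rather than from .z.u and .z.a.

```q
/ audit table: one row per connection open/close event (names are illustrative)
connlog:([] time:`timestamp$(); handle:`int$(); user:`symbol$(); ip:`int$(); event:`symbol$());

/ .z.po runs once a connection is established; here .z.u and .z.a
/ describe the connecting client and h is its handle
.z.po:{[h] `connlog insert (.z.p; h; .z.u; .z.a; `open)};

/ .z.pc runs after the connection has closed, when .z.u and .z.a no longer
/ refer to the client, so recover user and IP from the matching `open row
.z.pc:{[h]
  o:last select from connlog where handle=h, event=`open;
  `connlog insert (.z.p; h; o`user; o`ip; `close)};
```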
.z.pg, .z.ps and .z.ws allow requests to be inspected; again, these could be logged alongside the username, IP address and timestamp of the request. A more advanced version may also record whether the request was permitted (more on that under the Entitlements section) and how long it took to serve.
Simon Garland’s dotz.q provides an example of how to implement something similar, and serves as the basis of TorQ’s usage logging functionality. Note that any access tracking tables generated from message handlers should be saved to disk and retained for auditability purposes.
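The request side can be sketched in the same way. This hypothetical .z.pg wrapper keeps the default behaviour of evaluating the request with value, but also logs the caller, a string form of the query, how long it took and whether it succeeded. The names querylog and saveQueryLog, and the date-stamped file name, are illustrative assumptions rather than part of dotz.q or TorQ.

```q
/ audit table for synchronous requests (names are illustrative)
querylog:([] time:`timestamp$(); user:`symbol$(); ip:`int$(); query:(); dur:`timespan$(); ok:`boolean$());

/ wrap the default synchronous handler (value): time each request,
/ log who sent it and whether it succeeded, then return the result
/ or re-signal the original error to the client
.z.pg:{[x]
  st:.z.p;
  r:@[{(1b;value x)};x;{(0b;x)}];
  / -3!x stores the request as a string, so the query column type stays consistent
  `querylog insert enlist `time`user`ip`query`dur`ok!(st;.z.u;.z.a;-3!x;.z.p-st;r 0);
  $[r 0; r 1; 'r 1]};

/ persist the log for auditability, e.g. on a timer or at end of day
saveQueryLog:{[] (hsym `$"querylog_",string .z.d) set querylog};
```

Equivalent wrappers for .z.ps and .z.ws would follow the same pattern.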
Authentication is the process of verifying who a user is, while authorization verifies what a user has access to. In a kdb+ process the two are typically performed at the same time, so the terms are often used interchangeably.
The most basic form of authentication/authorization is provided by the -u/-U command line parameters, passed when starting a kdb+ process:
-u [1|passwordfile.txt] – enforces restrictions on file system commands and/or username/password combinations
-U passwordfile.txt – slightly less restrictive than the above, enforcing username and password checks, but not file system restrictions
The password-file forms require a file containing approved credentials. The passwords can be hashed to protect their secrecy, using either an MD5 or (since kdb+ v4.0) a SHA-1 hash.
This is a good start but relies on the existence and maintenance of a password file on the application host, which is an obvious disadvantage.
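As a sketch, a password file for -u/-U can be generated from q itself, storing MD5 hashes as lower-case hex rather than plain-text passwords. The usernames, passwords and file name below are illustrative only.

```q
/ illustrative users and passwords; never keep real passwords in a script
users:`alice`bob;
pwds:("alicepass";"bobpass");

/ one "user:hash" line per user, with the MD5 hash as lower-case hex
lines:{string[x],":",raze string md5 y}'[users;pwds];
`:passwordfile.txt 0: lines;

/ the process is then started with, for example:  q -p 5001 -U passwordfile.txt
```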
.z.pw is a message handler intended to validate users accessing a kdb+ process. It is evaluated after any -u/-U validations, if they exist.
In its most basic form, it takes two arguments (username and password), performs some authentication logic of our choosing, and returns a boolean atom indicating whether the user is allowed access to the process.
.z.pw:{[u;p] doComplexAuth[u;p]};
What might doComplexAuth look like?
A lot of organizations use some form of LDAP server, e.g. Active Directory. Instead of our kdb+ application managing credentials, we can pass the input usernames/passwords over to the LDAP server, which in turn will return whether these are correct.
This means that users and their passwords can be centrally managed, and groups can also be created to assist with access management.
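A minimal, self-contained stand-in for doComplexAuth, useful for testing the handler wiring before an LDAP server is involved, might compare an MD5 hash of the supplied password with a local store and record failed attempts for the audit trail. The creds and failedlogins names are hypothetical, and an unsalted MD5 store is far weaker than delegating to a directory service.

```q
/ illustrative local credential store: username -> MD5 hash of the password
/ (a stand-in for the LDAP lookup)
creds:`alice`bob!(md5 "alicepass"; md5 "bobpass");

/ failed attempts are kept for the audit trail
failedlogins:([] time:`timestamp$(); user:`symbol$(); ip:`int$());

doComplexAuth:{[u;p]
  ok:$[u in key creds; creds[u]~md5 p; 0b];
  if[not ok; `failedlogins insert (.z.p; u; .z.a)];
  ok};

.z.pw:{[u;p] doComplexAuth[u;p]};
```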
A more advanced system may involve Single Sign On (SSO) flows, such as OAuth2 or SAML. These are token-based; an example is provided from Auth0:
The flows in this diagram concern both a regular web application and an API. Since kdb+ is very flexible it can be either of these; as a web application it can manage the authentication flow itself, or it can act as the API piece. The major advantage of the latter is that our kdb+ application never sees the password, only the token; the identity management aspect is completely off-loaded.
Authenticating other kdb+ processes tends to be a bit trickier: in any secure system, the process doing the authentication needs to know a secret, e.g. a password or token.
How do we store those secrets? The general consensus is:
Third-party secret storage tools could be used here; one example is HashiCorp Vault (vaultproject.io).
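One way to keep a process’s own credentials out of code and configuration files is to read them from the environment at connection time, with a secret storage tool populating the variables at start-up. This sketch assumes hypothetical variable names KDB_SVC_USER and KDB_SVC_PASS and a hypothetical helper connstr.

```q
/ build a connection string from credentials held in environment variables,
/ which a secrets tool or deployment script would populate at start-up
/ (the variable names are illustrative)
connstr:{[host;port]
  u:getenv`KDB_SVC_USER;
  p:getenv`KDB_SVC_PASS;
  if[any 0=count each (u;p); '"missing service credentials"];
  hsym `$":" sv (host; string port; u; p)};

/ usage: h:hopen connstr["localhost";5001]
```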
An alternative to storing secrets is using IP allow-lists; that is, blindly accepting connections from known IP addresses. This has its own disadvantages, notably that a malicious user who gains access to one of the allow-listed hosts is trusted automatically.
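An IP allow-list can be sketched in .z.pw by comparing .z.a, the client’s address as an int, against a list of trusted addresses parsed with "I"$. The addresses and the allowed helper are illustrative.

```q
/ illustrative allow-list of trusted client IPs; "I"$ parses dotted-quad
/ strings into the same int form that .z.a uses
allowlist:"I"$("127.0.0.1";"10.1.2.3");

/ true if the given client address is on the allow-list
allowed:{[ip] ip in allowlist};

/ admit a connection based on the caller's IP alone; the supplied
/ username and password are ignored
.z.pw:{[u;p] allowed .z.a};
```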
Once our users have gained access to the kdb+ application, entitlements control what exactly the users have access to.
The solution in kdb+ lies within our IPC message handlers: .z.p* and .z.ws. These allow every inbound request to be interrogated, and determine whether the request should be executed.
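As a sketch of request interrogation, assuming clients call a fixed set of API functions in functional form, e.g. h(`getTrade;`AAPL), a .z.pg handler can check the function name against a per-user allow-list before executing anything. The permitted dictionary, isPermitted and the function names are hypothetical.

```q
/ illustrative per-user allow-list of functions that may be called remotely
permitted:`alice`bob!(`getTrade`getQuote; enlist `getTrade);

/ accept only functional-form requests, e.g. (`getTrade;`AAPL), from known
/ users whose target function is on their allow-list; strings are rejected
isPermitted:{[u;x]
  $[not u in key permitted; 0b;
    not type[x] in 0 11h; 0b;
    -11h<>type first x; 0b;
    (first x) in permitted u]};

/ apply the named function to the remaining items as plain data,
/ rather than evaluating the request with value
.z.pg:{[x]
  if[not isPermitted[.z.u;x]; '"access denied"];
  (get first x) . 1_x};
```

Applying the function with . to the remaining items treats the arguments as data, so a client cannot smuggle a nested expression through them in the way a free-form string could.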
A few tips from us:
Check whether the -b command line parameter, which ensures read-only access to processes, is appropriate for your use case.
Consider reval for a more flexible read-only approach; this could be used in conjunction with .z.pg/.z.ps to ensure read-only evaluation of a request.
.z.ph (the HTTP GET handler)
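For the reval approach, a minimal sketch assuming kdb+ 3.6 or later, where reval is available. Both handlers keep the default behaviour of evaluating the request with value, but do so inside reval, so reads succeed while attempts to modify global data are rejected with a 'noupdate error.

```q
/ evaluate every request in read-only mode; enlist quotes the request so
/ that reval applies value to it as data, mirroring the default handlers
.z.pg:{[x] reval (value; enlist x)};
.z.ps:{[x] reval (value; enlist x)};
```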
Remember we noted above that LDAP servers such as Active Directory may provide groups to assist with access management? This adds extra potential to our entitlements strategy, allowing us to define user “classes” to handle table-, row- and column-level control.
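A sketch of row- and column-level control, assuming the user-to-class mapping has already been derived from LDAP groups. All of the names (userclass, classsyms, classcols, entitledTrade) and the trade data are hypothetical; table-level control could be added with a similar class-to-tables dictionary.

```q
/ illustrative user -> class mapping (in practice derived from LDAP groups)
userclass:`alice`bob!`trader`analyst;

/ row-level: which symbols each class may see
classsyms:`trader`analyst!(`AAPL`MSFT; enlist `AAPL);

/ column-level: which trade columns each class may see
classcols:`trader`analyst!(`sym`price`size`account; `sym`price`size);

trade:([] sym:`AAPL`MSFT`AAPL; price:190.1 410.2 190.3; size:100 200 300; account:`a1`a2`a3);

/ return the rows and columns of trade that user u is entitled to;
/ a handler would call it as entitledTrade .z.u
entitledTrade:{[u]
  if[not u in key userclass; '"unknown user"];
  c:userclass u;
  ?[trade; enlist (in; `sym; enlist classsyms c); 0b; {x!x} classcols c]};
```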
There are two aspects to encryption: data in transit and data at rest.
kdb+ v3.4 added support for SSL/TLS encrypted connections. Here, kdb+ processes are configured with certificates so that data transferred between them is encrypted using the OpenSSL libraries.
However, encrypted connections come with significant overheads: opening a connection is 40 to 50 times slower, and data transfer itself is approximately 1.5 times slower.
To best manage this overhead, it’s worth asking:
Note that setting up a TLS certificate on a kdb+ process does not mean that all connections must be encrypted; the -E command line parameter controls this (-E 1 accepts both plain text and encrypted connections, -E 2 accepts encrypted connections only). Only encrypt what you need.
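Per the KX documentation, the server’s certificate and key are supplied through the SSL_CERT_FILE and SSL_KEY_FILE environment variables. On the client side the choice is made per connection: a tcps:// prefix in the handle requests TLS, while a bare host:port stays plain text. A small sketch with a hypothetical helper connsym; the host names and ports in the comments are placeholders.

```q
/ build a connection handle: the tcps:// prefix requests a TLS connection,
/ while a bare host:port stays plain text (the helper name is illustrative)
connsym:{[host;port;tls] hsym `$($[tls;"tcps://";""],host,":",string port)};

/ e.g. encrypt only the link that crosses the network boundary:
/ remote:hopen connsym["tp.example.com";5010;1b]
/ local:hopen connsym["localhost";5011;0b]
```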
Data is considered to be “at rest” both on disk and in memory. To be clear, in-memory data encryption is not currently possible.
Historically, data on-disk was encrypted using full disk encryption (FDE). This does exactly what it says on the tin and is managed by the file system. For an extra degree of flexibility, kdb+ v4.0 added transparent disk encryption (TDE).
With TDE, kdb+ manages both the encryption and the decryption. This has a number of advantages:
The above image illustrates the last advantage. With FDE, the data is decrypted at the storage side and then passed to the kdb+ application. With TDE, the data remains encrypted during transfer and is only decrypted on the application end.
This advantage also applies to a tiered storage system. For example, consider an application that stores the last week’s worth of data (“hot” data) on a fast SSD, with anything older transferred to a slower disk. With FDE, this process involves decrypting, transferring, then re-encrypting the data. With TDE, the data remains encrypted across both file systems and during the transfer – a more efficient operation.
Similar to in-transit encryption, data at-rest encryption comes with an overhead. KX recommends using AES-NI-capable CPUs for maximum performance; with these chipsets, the overhead on compressed data is a small percentage. For uncompressed data, overheads will be higher.
This has been an initial discussion of the elements that make up security best practices; the list is non-exhaustive and client systems may vary in their approach. An application with a truly mature sense of security and access controls will consider all of the above in some shape or form.
If you need any help with auditing your existing application’s access control, please get in touch with the team: info@dataintellect.com.
Stay tuned for the next instalment of our ARK blog series!