ARK 6: Security & Access Controls

Blog ARK 24 Jul 2024

Jemma Creighton

Introduction

Think of your kdb+ application as home sweet home. Much like we keep our homes secure, we want to do the same for our kdb+ systems: keeping an eye on who comes and goes, and knowing what they’re doing while they’re there.

The security of a kdb+ system can be broken down into three main areas:

  • Authentication/authorization
  • Entitlements
  • Data Encryption

If your application lacks any of the above, we recommend performing an audit exercise alongside implementing authentication and entitlements.

Auditing Access

All activity on kdb+ processes should be logged, covering both reads and writes, and both synchronous and asynchronous requests. This gives application developers a transparent view of who is using their data and how it is being queried.

  • The .z.po/.z.wo and .z.pc/.z.wc message handlers give control over what happens after a connection has been successfully established to, or closed from, the kdb+ process.
    • This could include updating an access table of users accessing the process, their IP addresses, timestamps (open and close) and handle – all using .z variables e.g. .z.u, .z.a, .z.p and .z.w.
  • The message event handlers .z.pg, .z.ps and .z.ws allow requests to be inspected; again these could be logged alongside the username, IP address and timestamp of request. A more advanced version may also include whether the request was permitted (more on that under the Entitlements section) and how long it took to serve the request.

Simon Garland’s dotz.q provides an example of how to implement the above, and serves as the base of TorQ’s usage logging functionality. Note that any access tracking tables generated by the message handlers should be saved to disk and retained for audit purposes.
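As a rough illustration of the handlers described above, the q sketch below keeps a connection log via .z.po/.z.pc, wraps .z.pg to record each synchronous request with its caller, outcome and duration, and adds a helper to save both logs to disk. It is a minimal sketch under our own assumptions, not dotz.q or TorQ’s implementation: the table names (.log.conn, .log.req), the .log.save helper and the base directory are hypothetical, and a fuller version would also cover .z.ps and .z.ws.

```q
/ Connection log: handle, caller, source IP (integer form), open and close times
.log.conn:([] handle:`int$(); user:`symbol$(); ip:`int$(); opened:`timestamp$(); closed:`timestamp$())

/ Record each new connection (h is the connection handle)
.z.po:{[h] `.log.conn insert (h;.z.u;.z.a;.z.p;0Np);}

/ Stamp the close time on the matching open connection
.z.pc:{[h] update closed:.z.p from `.log.conn where handle=h, null closed;}

/ Request log: each synchronous query with caller, outcome and duration
.log.req:([] time:`timestamp$(); user:`symbol$(); handle:`int$(); query:(); ok:`boolean$(); elapsed:`timespan$())

.z.pg:{[x]
  t:.z.p;
  / trap errors so that failed requests are logged too
  r:@[{(1b;value x)};x;{(0b;x)}];
  `.log.req upsert `time`user`handle`query`ok`elapsed!(t;.z.u;.z.w;-3!x;r 0;.z.p-t);
  / re-signal the original error to the caller
  if[not r 0; 'r 1];
  r 1
  }

/ Persist both logs under a dated directory (the base path is hypothetical)
.log.save:{[dir]
  d:hsym `$dir,"/",string .z.d;
  (` sv d,`conn) set .log.conn;
  (` sv d,`req) set .log.req;
  }
```

Calling .log.save from a timer (.z.ts) would give a regular on-disk snapshot of both tables for auditing.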

Authentication and Authorization

Authentication is the process of verifying who a user is, while authorization verifies what access that user has. In a kdb+ process the two are typically performed at the same time, so the terms are often used interchangeably.

Client authentication

Command line parameters

The most basic form of authentication/authorization is provided by the -u/-U command line parameters passed when starting a kdb+ process.

  • -u [1|passwordfile.txt] – this enforces restrictions on file system commands and/or username/password combinations
  • -U passwordfile.txt – slightly less restrictive than the above, enforcing username and password checks, but not file system restrictions

These arguments require a password file containing the approved credentials. To protect their secrecy, the passwords can be stored as MD5 or SHA-1 hashes rather than in plain text (SHA-1 since kdb+ v4.0).
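A password file holds one username:password line per user. The sketch below writes such a file with MD5-hashed passwords, using q’s built-in md5 and hex-encoding the digest, which is our understanding of the hashed format the file accepts. The users, passwords and file name are hypothetical.

```q
/ Hypothetical users and passwords, for illustration only
users:`alice`bob
pwds:("alicePass";"bobPass")

/ One "user:hash" line per user, where hash is the hex MD5 digest of the password
lines:(string users),'":",'raze each string md5 each pwds
`:passwords.txt 0: lines
```

The process could then be started with, for example, q -p 5010 -u passwords.txt.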

This is a good start but relies on the existence and maintenance of a password file on the application host, which is an obvious disadvantage.

Password protection message handler

.z.pw is a message handler intended to validate users accessing a kdb+ process. This is evaluated after any -u/U validations, if they exist.

In its most basic form, .z.pw takes two arguments (username and password), performs some authentication logic of our choosing, and returns a boolean atom indicating whether the user is allowed access to the process.

.z.pw:{[u;p] doComplexAuth[u;p]};

What might doComplexAuth look like?

Many organizations run a directory service accessible over LDAP, e.g. Active Directory. Instead of our kdb+ application managing credentials, we can pass the supplied username and password to the LDAP server, which in turn tells us whether they are correct.
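What doComplexAuth looks like depends on the LDAP tooling available, so the sketch below only shows its shape. A dictionary (.auth.directory) stands in for the directory server, and .auth.ldapBind is a hypothetical placeholder for a real LDAP bind. The wrapper denies access whenever the check raises an error, so a failed lookup never lets a user in. Note that .z.pw receives the username as a symbol and the password as a string.

```q
/ Stand-in for the LDAP server: a dictionary of users and passwords.
/ In practice this lookup would be an LDAP bind against the directory.
.auth.directory:`alice`bob!("alicePass";"bobPass")

/ Hypothetical check: 1b only if the user exists and the password matches
.auth.ldapBind:{[u;p] $[u in key .auth.directory; p~.auth.directory u; 0b]}

/ Deny by default: any error raised by the check rejects the login
doComplexAuth:{[u;p] @[.auth.ldapBind[u];p;{0b}]}

.z.pw:{[u;p] doComplexAuth[u;p]}
```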

[Figure: LDAP authentication]

This means that users and their passwords can be centrally managed, and groups can also be created to assist with access management.

A more advanced system may involve Single Sign-On (SSO) flows such as OAuth 2.0 or SAML. These are token-based; Auth0 provides an example:

[Figure: Auth0 authentication flows for a regular web application and an API]

The flows in this diagram concern both a regular web application and an API. Since kdb+ is very flexible it can be either of these; as a web application it can manage the authentication flow itself, or it can act as the API piece. The major advantage of the latter is that our kdb+ application never sees the password, only the token; the identity management aspect is completely off-loaded.
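When kdb+ plays the API role, one possible pattern is for the client to send its token in the password field of the connection, leaving .z.pw to validate it. In the hypothetical sketch below, a keyed table (.auth.tokens) stands in for the identity provider and .auth.introspect for a call to it (for example an OAuth 2.0 token introspection endpoint); neither is part of kdb+ or Auth0.

```q
/ Stand-in for the identity provider: token -> subject and validity.
/ In practice kdb+ would ask the provider rather than hold tokens itself.
.auth.tokens:([token:`tok123`tok456] sub:`alice`bob; active:10b)

/ Hypothetical introspection call: returns the record for a token
/ (an unknown token yields a record of nulls, so active is 0b)
.auth.introspect:{[tok] .auth.tokens `$tok}

/ Accept only an active token that belongs to the connecting username
.z.pw:{[u;p] r:.auth.introspect p; r[`active] and u~r`sub}
```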

IPC authentication

Authenticating other kdb+ processes tends to be trickier: in any secure system, the processes involved need to know a secret, e.g. a password or token, and that secret has to be stored somewhere.

How do we store those secrets? The general consensus is:

  • Not in version control
  • Injected into the codebase at build/compile time (not a common solution, given that q is not a compiled language)
  • Encrypted on-disk (in a separate location to the code-base) and decrypted at runtime
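The sketch below follows the last option: credentials are read at runtime from a file outside the code base rather than from version control. The KDB_SECRET_FILE environment variable, .secrets.connSym and the host name are hypothetical, and .secrets.decrypt is an identity placeholder where real decryption (e.g. via a secret store) would go.

```q
/ Placeholder for decryption: in practice the secret file would be
/ encrypted and decrypted here with a key held outside the code base
.secrets.decrypt:{[s] s}

/ Build a connection symbol from credentials read at runtime.
/ KDB_SECRET_FILE (hypothetical) names a file whose first line is "user:password"
.secrets.connSym:{[host;port]
  creds:.secrets.decrypt first read0 hsym `$getenv `KDB_SECRET_FILE;
  `$":",host,":",string[port],":",creds
  }

/ usage (rdbhost and port 5010 are placeholders):
/ h:hopen .secrets.connSym["rdbhost";5010]
```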

Third-party secret storage tools could be used here; one example is HashiCorp Vault.

An alternative to storing secrets is an IP allow-list: accepting any connection that originates from a known IP address. This has its own disadvantages, chiefly that a malicious user who gains access to one of the allow-listed hosts also gains access to the kdb+ process.
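An IP allow-list can be enforced in .z.pw, since .z.a holds the caller’s IPv4 address as an integer. This minimal sketch converts it to dotted form and checks it against a hypothetical list of addresses; in practice it might be combined with credential checks rather than replace them.

```q
/ Allow-list of dotted IPv4 addresses (hypothetical hosts)
.auth.allowIPs:`$("127.0.0.1";"10.0.0.15")

/ Convert an integer IPv4 address, as held in .z.a, to dotted form
.auth.ipStr:{[a] "." sv string "i"$0x0 vs a}

/ Accept any caller on the allow-list, regardless of credentials
.z.pw:{[u;p] (`$.auth.ipStr .z.a) in .auth.allowIPs}
```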

Entitlements

Once our users have gained access to the kdb+ application, entitlements control what exactly the users have access to.

The solution in kdb+ lies within the IPC message handlers, .z.p* and .z.ws. These allow every inbound request to be inspected before deciding whether it should be executed.

A few tips from us:

  • Centralize control; ensure users do not have direct access to underlying RDB/HDB/CEP processes and use a gateway-led approach where possible
  • Restrict requests to pre-defined function calls, i.e. a request of a list form, where the first element is the function to call and the remaining elements are the parameters to call it with
  • Consider whether the -b command line parameter, which blocks client write access to a process, is appropriate for your use case
  • Use reval for a more flexible read-only approach; this could be used in conjunction with .z.pg/.z.ps to ensure read-only evaluation of a request
  • Close down any interfaces that you don’t want to support e.g. lock down web browser access by modifying .z.ph
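The second and fourth tips can be combined in a gateway’s .z.pg. In the sketch below, only list-form requests whose first element names a function on a hypothetical allow-list (.api.allowed) are accepted. The request is then quoted with enlist and evaluated as value x under reval (available since kdb+ 3.6), so, as we understand reval’s parse-tree semantics, the arguments are passed as data and the call runs read-only.

```q
/ Hypothetical allow-list of API functions clients may call
.api.allowed:`getTrades`getQuotes

.z.pg:{[x]
  f:first x;
  / accept only list-form requests: (`function;arg1;arg2;...)
  if[not -11h=type f; '"list-form requests only"];
  if[not f in .api.allowed; '"function not permitted"];
  / evaluate value x read-only; enlist quotes x so its elements
  / are passed as data rather than evaluated as a parse tree
  reval (value;enlist x)
  }
```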

Entitlements in conjunction with user groups

Remember we noted above that LDAP servers such as Active Directory may provide groups to assist with access management? Groups extend our entitlements strategy, allowing us to define user “classes” that handle table-, row- and column-level control.
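As a minimal sketch of table- and column-level control, assume group membership has already been retrieved from the directory into a dictionary. The mapping (.ent.userGroup), the per-group permissions and .ent.query are all hypothetical; row-level control could be added in the same way by applying a per-group where clause.

```q
/ Hypothetical user-to-group mapping, e.g. retrieved from LDAP/AD
.ent.userGroup:`alice`bob!`trading`research

/ Per group: tables it may query, and columns hidden from it
.ent.tables:`trading`research!(`trade`quote;enlist `quote)
.ent.hidden:`trading`research!(`symbol$();enlist `account)

/ Return table t (a symbol) with only the columns user u may see,
/ or signal an error if u's group may not query t at all
.ent.query:{[u;t]
  g:.ent.userGroup u;
  if[not t in .ent.tables g; '"table not permitted"];
  (cols[t] except .ent.hidden g)#value t
  }
```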

Data Encryption

There are two aspects to encryption:

  1. Data in-transit encryption: where data is moving between two processes
  2. Data at-rest encryption: where data is stored, either on-disk or in-memory

Data In-Transit Encryption

kdb+ v3.4 added support for SSL/TLS encrypted connections. With TLS, kdb+ processes are configured with certificates so that data passing between them is encrypted using the OpenSSL libraries.

However, with encrypted connections come significant overheads; opening a connection is 40-50x slower, and data transfer itself is approximately 1.5 times slower.

To best manage this overhead, it’s worth asking:

  • Do all connections need to be encrypted?
  • Does all data need to be encrypted? What data is sensitive?
    • Data moving externally, i.e. outside a firewall, must be encrypted
    • Market data may not require encryption, whereas internal execution data likely will
  • Is there a way to modify the architecture to reduce the number of connections that need to be encrypted?
    • Localhost connections typically don’t need encryption

Note that setting up a TLS certificate on a kdb+ process does not dictate that all connections must be encrypted; kdb+ processes are configurable to accept both plain text and encrypted connections using the -E command line parameter. Only encrypt what you need.
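The q sketch below outlines the setup under our understanding of the KX TLS documentation: the server reads its certificate and key from the SSL_CERT_FILE and SSL_KEY_FILE environment variables and -E selects which connection types are accepted, while clients choose encryption per connection with a tcps:// handle. Host names, ports and paths are placeholders, and clients may additionally need SSL_CA_CERT_FILE set to verify the server’s certificate.

```q
/ Server side, started from a shell (paths are placeholders):
/   export SSL_CERT_FILE=/path/to/server-crt.pem
/   export SSL_KEY_FILE=/path/to/server-key.pem
/   q -p 5010 -E 1
/ -E 0: plain only (default); -E 1: plain and TLS; -E 2: TLS only

/ Client side: a tcps:// handle requests an encrypted connection ...
hEnc:hopen `:tcps://rdbhost:5010
/ ... while a plain handle to the same -E 1 server stays unencrypted,
/ e.g. for a connection carrying non-sensitive data
hPlain:hopen `:rdbhost:5010
```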

Data At-Rest Encryption

Data is considered to “rest” both on-disk and in-memory. To be clear, kdb+ does not currently support encryption of in-memory data.

Historically, data on-disk was encrypted using full disk encryption (FDE). This does exactly what it says on the tin and is managed by the file system. For an extra degree of flexibility, kdb+ v4.0 added transparent disk encryption (TDE).

With TDE, kdb+ manages both the encryption and the decryption. This has a number of advantages:

  • Separation of responsibilities: the database owner (kdb+ application) is the “keeper of the keys” instead of the hardware
  • We can be much more selective about what we want to encrypt, but also what we want to decrypt, i.e. we may choose to only decrypt a subset of our encrypted data
  • Decryption will be carried out after any data transfer; the encrypted data is portable across file systems, platforms, etc.
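A sketch of selective encryption with TDE, based on our reading of the KX data-at-rest encryption documentation for kdb+ 4.0 (please check the parameters against the documentation for your version): the master key is loaded with -36!, and a file is written encrypted by passing compression-style parameters to set with algorithm 16. The key path, password, data and file names are placeholders.

```q
/ Load the master key; the key file (a placeholder path here) is
/ generated outside kdb+, e.g. with OpenSSL, and protected by a password
-36!(`:/secure/keys/master.key;"keyPassword")

/ Hypothetical sensitive data
notional:1000 2500 750f

/ Non-sensitive data can be written as normal ...
`:data/public set 1 2 3
/ ... while only the sensitive file is encrypted:
/ (file; logical block size; algorithm; level), algorithm 16 = encryption
(`:data/notional;17;16;0) set notional

/ Reads are transparent once the master key is loaded
notional~get `:data/notional
```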

[Figure: FDE vs TDE]

The above image illustrates the last advantage. With FDE, the data is decrypted at the storage side and then passed to the kdb+ application. With TDE, the data remains encrypted during transfer and is only decrypted on the application end.

This advantage also applies to tiered storage. For example, consider an application that stores the last week’s worth of data (“hot” data) on a fast SSD and moves anything older to a slower disk. With FDE, this involves decrypting, transferring, then re-encrypting the data. With TDE, the data remains encrypted on both file systems and during the transfer, which is a more efficient operation.

Similar to in-transit encryption, data at-rest encryption comes with an overhead. KX recommend using AES-NI capable CPUs for maximum performance; with these chipsets the overhead for compressed data is a small percentage, while for uncompressed data it will be higher.

Conclusion

This has been an initial discussion of the elements that make up security best practice in kdb+; the list is non-exhaustive, and approaches will vary between client systems. An application with a truly mature approach to security and access controls will consider all of the above in some shape or form.

If you need any help with auditing your existing application’s access control, please get in touch with the team: info@dataintellect.com.

Stay tuned for the next instalment of our ARK blog series!
