Hashing, salting and key stretching in kdb+

Previously we looked at substitution ciphers in q. Here we examine hashing algorithms and how they can help ensure password security.

Password security is often a weak link in hardening systems against intrusion, as shown by the many reports of high-profile breaches, e.g. LinkedIn and Sony. With 32-bit kdb+ now free for commercial or educational use, it is timely to look at best practices in password security.

A good overview of password security can be found at: hashing-security

In this article, password hashing via MD5, SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512 will be looked at. The need for salting passwords via a cryptographically secure random number generator is then introduced, leading to a discussion of key stretching via PBKDF2.

Kdb+ 3.1 2014.05.21 (32 bit) was used for all examples described herein. The code discussed in this article can be found on code.kx.

Linking kdb+ to a cryptographic library
There are many freely available cryptography libraries that could be linked to kdb+ in a similar way to the examples discussed in this article. Here, OpenSSL will be used. To set up OpenSSL, see: install guide

Note also that a kdb+ wrapper around libcurl, which is commonly used for secure file transfers (SSL/TLS) via OpenSSL, has been released.

In this article a small set of wrapper functions is used to interface kdb+ with certain OpenSSL functions. The C code for this, qcrypt.c, can be compiled as follows (alter the linking part of the compile line to suit your OpenSSL version and library location):

This produces a shared object, qcrypt.so, comprising three functions (qrand, hash and pbkdf2) that can then be loaded into kdb+.

For details on interfacing kdb+ with C, see ExtendingWithC and InterfacingWithC.

Hashes
A cryptographic hash is a one-way function that scrambles an input string. Kdb+ has a built-in hash function, namely md5 (Message-Digest algorithm 5).
However, md5 is no longer recommended for serious cryptographic protection due to weaknesses in the algorithm.
As a first test of the kdb+ to OpenSSL interface, the md5 function built into kdb+ will be compared to the corresponding md5 function in OpenSSL.

The first argument to the qcrypt ‘hash’ function is the input string and the second argument is the hashing algorithm.
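Since the qcrypt listing is not reproduced here, the same two-argument shape can be sketched in Python using the standard hashlib module. The function name hash_string is hypothetical and is not the qcrypt API; the sketch only shows a generic "input string plus algorithm name" call being checked against a dedicated md5 call, in the same spirit as the comparison described above.

```python
import hashlib


def hash_string(text: str, algorithm: str) -> str:
    """Return the hex digest of `text` under the named algorithm.

    Hypothetical helper mirroring the (input, algorithm) argument order
    described for the qcrypt 'hash' function; it is not that function.
    """
    digest = hashlib.new(algorithm)  # raises ValueError for unknown names
    digest.update(text.encode("utf-8"))
    return digest.hexdigest()


# Compare the generic interface against the dedicated md5 constructor.
md5_generic = hash_string("abc", "md5")
md5_direct = hashlib.md5(b"abc").hexdigest()
print(md5_generic == md5_direct)
```

If the two digests agree, the generic interface can be trusted for the other algorithms it exposes.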

A stronger set of hash functions is the SHA group of algorithms – SHA-1, SHA-224, SHA-256, SHA-384 and SHA-512.
These hashes are implemented in OpenSSL. The kdb+ results are shown below.

To check the results, there are online calculators that can be used.
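As an alternative to an online calculator, the SHA family can be cross-checked locally. The following is a minimal Python sketch using hashlib (not the kdb+ code from this article); the helper name sha_digests is an assumption made for illustration. It also makes the differing digest lengths of the five algorithms visible.

```python
import hashlib

# The five SHA variants named in this article, by their hashlib names.
SHA_ALGORITHMS = ["sha1", "sha224", "sha256", "sha384", "sha512"]


def sha_digests(text: str) -> dict:
    """Map each SHA algorithm name to the hex digest of `text`."""
    data = text.encode("utf-8")
    return {name: hashlib.new(name, data).hexdigest() for name in SHA_ALGORITHMS}


digests = sha_digests("abc")
for name, hexdigest in digests.items():
    # Two hex characters encode one byte of digest.
    print(f"{name:7s} {len(hexdigest) // 2:2d} bytes  {hexdigest}")
```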

Salt
A hash of a password by itself is not sufficient to guarantee security. User passwords are often chosen insecurely and are amenable to dictionary attacks which compare the hashes of common passwords against the hash of the user password. It is also possible to pre-compute hashes and then check them against stored passwords via rainbow tables.
To illustrate this, consider the md5 hash of the string “password123”.

Now if the md5 hash is entered into Google, the original password can quickly be found.
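The search-engine lookup is in effect a dictionary attack against a pre-computed table. A minimal Python sketch of the idea follows; the word list is a tiny hypothetical stand-in for the large tables an attacker would really use, and md5_hex is a helper name introduced only for this illustration.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Unsalted md5 of a string, as a hex digest."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical word list; real attacks use millions of candidate passwords.
COMMON_PASSWORDS = ["123456", "letmein", "password123", "qwerty"]

# Pre-compute hash -> password once; every later lookup is a dictionary hit.
lookup = {md5_hex(candidate): candidate for candidate in COMMON_PASSWORDS}

stolen_hash = md5_hex("password123")  # what a leaked user table would contain
recovered = lookup.get(stolen_hash)   # None if the password is not in the list
print(recovered)
```

Because the table is built once and reused, the cost per stolen hash is a single lookup, which is what makes unsalted hashes of common passwords so weak.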

In order to make such attacks harder and more costly to the attacker, a random salt can be added to the password before it is hashed. As long as a unique salt is generated per password, this means that each password must be attacked individually.
It is important that the salt is produced using a cryptographically secure random number generator. Here the OpenSSL function RAND_bytes will be used. On Linux, this uses /dev/urandom or /dev/random as sources of entropy to seed a pseudo-random number generator.
The kdb+ function ‘qrand’ takes a single argument, the desired number of random output bytes.
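The effect of a per-password salt can be sketched in Python, with secrets.token_bytes standing in for RAND_bytes as the cryptographically secure source. The names random_salt and salted_hash are hypothetical, and SHA-256 over salt-then-password is an assumption chosen for illustration rather than the scheme used by qcrypt.

```python
import hashlib
import secrets


def random_salt(num_bytes: int) -> bytes:
    """Return `num_bytes` from the operating system's secure random source."""
    return secrets.token_bytes(num_bytes)


def salted_hash(password: str, salt: bytes) -> str:
    """SHA-256 of salt followed by password (illustrative choice only)."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


# Two users pick the same weak password but receive different salts.
salt_a, salt_b = random_salt(16), random_salt(16)
hash_a = salted_hash("password123", salt_a)
hash_b = salted_hash("password123", salt_b)
print(hash_a != hash_b)
```

With distinct salts, a single pre-computed table no longer matches both stored hashes, so each password has to be attacked separately.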

Key Stretching
As has been seen, simply hashing an input password is not enough to stop an attacker. Adding a salt to the password before hashing strengthens security. By then iterating the hash of the salted password in a process known as ‘key stretching’, attacks can be made much more computationally expensive. There are various key stretching algorithms such as pbkdf2, bcrypt and scrypt.
Here the pbkdf2 (Password-Based Key Derivation Function 2) algorithm implemented in openssl will be used.
The kdb+ function ‘pbkdf2’ takes 4 arguments: the password, salt, number of iterations and the length of the derived output key. Example usage is shown below:

This can be compared to an online pbkdf2 calculator
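For a local cross-check, the same four-argument call can be sketched with Python's hashlib.pbkdf2_hmac. The wrapper name pbkdf2 below is chosen to echo the argument order described above, but it is not the qcrypt function, and HMAC-SHA1 as the underlying digest is an assumption, since the digest used inside qcrypt is not stated here.

```python
import hashlib


def pbkdf2(password: str, salt: bytes, iterations: int, dklen: int) -> bytes:
    """Derive `dklen` bytes from the password using PBKDF2.

    HMAC-SHA1 is an assumed digest; swap the first argument to
    pbkdf2_hmac for another hashlib algorithm name if required.
    """
    return hashlib.pbkdf2_hmac(
        "sha1", password.encode("utf-8"), salt, iterations, dklen
    )


# Inputs taken from the published PBKDF2-HMAC-SHA1 test vectors (RFC 6070).
key = pbkdf2("password", b"salt", 1, 20)
print(key.hex())
```

Checking an implementation against published test vectors is a useful complement to an online calculator, since both sides must agree on the digest, salt encoding and output length.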

A higher number of iterations of the pbkdf2 algorithm provides more security. For example, in tests it takes about a second to run the pbkdf2 function with a 512-byte salt, 25000 iterations and a 512-byte output key. Note the trade-off, however: more security comes at the cost of more time spent validating user connection requests.
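The trade-off can be measured directly. The Python sketch below times PBKDF2 for a few iteration counts using the 512-byte salt and key lengths quoted above; time_pbkdf2 is a hypothetical helper, HMAC-SHA512 is an assumed digest, and absolute timings will differ from the kdb+ figures depending on hardware and library.

```python
import hashlib
import os
import time


def time_pbkdf2(iterations: int, salt_len: int = 512, dklen: int = 512) -> float:
    """Return the seconds taken by one PBKDF2 derivation."""
    salt = os.urandom(salt_len)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha512", b"password123", salt, iterations, dklen)
    return time.perf_counter() - start


# Cost grows roughly linearly with the iteration count.
for iterations in (100, 1000, 5000):
    print(f"{iterations:5d} iterations: {time_pbkdf2(iterations):.4f} s")
```

A reasonable way to choose the iteration count is to raise it until a single verification takes as long as the system can tolerate per login.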

Putting it all together
Now the various functions in qcrypt can be combined to form the first stage of a simple access control layer (ACL). Note that a complete ACL would also need to account for user classes and filtering of user function calls.

The q script access.q provides a number of functions to add, delete and update user passwords and to verify incoming access requests.
The default setting for the hashing algorithm (.acl.HASHFN) is pbkdf2 with a salt length (.acl.SALTLEN) of 512 bytes, 25000 iterations (.acl.ITERATIONS) and a derived key length (.acl.DKLEN) of 512 bytes. These default settings can be overridden by changing the ‘saltlen’, ‘iterations’, ‘dklen’ and ‘hashfn’ parameters in settings.csv. The ‘hashfn’ parameter can take the values:
md5, sha1, sha224, sha256, sha384, sha512 and pbkdf2.

Usernames, password hashes and salts are stored in a file, users.csv. Note that if you change settings.csv you will have to re-generate all saved password hashes. No function is provided for changing the settings, in order to reduce the risk of unplanned changes to the algorithm settings.

The message handler .z.pw is used to verify incoming user passwords against the stored hash.
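The check that such a handler has to perform can be sketched in Python: look up the stored salt and hash for the user, re-derive the hash from the supplied password, and compare. This is only an illustration of the logic, not the .z.pw code in access.q; the names derive and check_password, the in-memory users dictionary, the parameter values and the use of HMAC-SHA256 are all assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 1000  # hypothetical demo values, not the access.q defaults
DKLEN = 32


def derive(password: str, salt: bytes) -> bytes:
    """Stretch the password with PBKDF2 (HMAC-SHA256 assumed)."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, DKLEN
    )


def check_password(users: dict, user: str, password: str) -> bool:
    """Return True only if `user` exists and the password matches."""
    record = users.get(user)
    if record is None:
        return False
    salt, stored = record
    # compare_digest runs in constant time, avoiding timing side channels.
    return hmac.compare_digest(derive(password, salt), stored)


salt = os.urandom(16)
users = {"alice": (salt, derive("s3cret", salt))}
print(check_password(users, "alice", "s3cret"))
print(check_password(users, "alice", "wrong"))
```

Note that the plain-text password is never stored; only the salt and the derived key are needed to verify a login attempt.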

Some examples are shown below to illustrate usage. The salt and pbkdf2 key lengths are reduced here to 10, with the number of iterations set to 100, to make usage easier to demonstrate:

Users can be added using the function .acl.addUser, which takes two input arguments: a string for the username and a string for the password. The function generates a random salt, hashes a concatenation of the salt and password using the function .acl.enCrypt and then upserts the result to the keyed table .acl.users, which has columns for the user, password hash and salt. The users table is then saved to a CSV file, users.csv. The users.csv and settings.csv files are read when the access.q script is loaded.

Similarly, if .acl.addUser is run with a username that matches an existing username, the entry for that user is updated with a new password. The function .acl.delUser takes a single input symbol for the username to be deleted.
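The add, update and delete flow described above can be sketched in Python as follows. This is not the access.q implementation: add_user, del_user and to_csv are hypothetical names, a dictionary stands in for the keyed table, HMAC-SHA256 is an assumed digest, and the CSV is written to a string rather than to users.csv. The reduced demo settings (lengths of 10, 100 iterations) are the ones quoted above.

```python
import csv
import hashlib
import io
import os

SALTLEN, ITERATIONS, DKLEN = 10, 100, 10  # reduced demo settings

users = {}  # user -> (hash_hex, salt_hex), standing in for a keyed table


def add_user(user: str, password: str) -> None:
    """Insert the user, or replace the entry if the user already exists."""
    salt = os.urandom(SALTLEN)  # fresh salt on every add or update
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, DKLEN
    )
    users[user] = (key.hex(), salt.hex())


def del_user(user: str) -> None:
    """Remove the user if present."""
    users.pop(user, None)


def to_csv() -> str:
    """Serialise the user table with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["user", "hash", "salt"])
    for user, (hash_hex, salt_hex) in sorted(users.items()):
        writer.writerow([user, hash_hex, salt_hex])
    return buffer.getvalue()


add_user("alice", "password123")
add_user("alice", "a better passphrase")  # same key, so the entry is updated
add_user("bob", "hunter2")
del_user("bob")
print(to_csv())
```

Generating a new salt on every update means that even a user who re-enters the same password ends up with a different stored hash.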

Now a test connection can be made from another q session. Note that as well as .z.pw, other message handlers such as .z.po, .z.pc, .z.pg and .z.ps would also be overridden in a full system.

Network Security
There is a loophole in the password hashing techniques discussed in the preceding sections: there is an implicit assumption that all communications take place over a secure network. If the network connection is untrusted, then kdb+ passwords can easily be retrieved. This can be done by using packet sniffers such as tcpdump to listen to network traffic. This is illustrated below for a password sent to a kdb+ server from a web browser, where the password is sent over the network as unencrypted base64-encoded bytes.
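Base64 is an encoding, not encryption, so anyone who captures the bytes can decode them without a key. The Python sketch below assumes that the browser uses the HTTP Basic authentication header format ("Basic" followed by base64 of user:password); the header value is constructed in the example rather than taken from a real capture, and decode_basic_auth is a hypothetical helper.

```python
import base64


def decode_basic_auth(header_value: str) -> tuple:
    """Recover (user, password) from an HTTP Basic Authorization value."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    user, _, password = decoded.partition(":")
    return user, password


# Simulate what a packet sniffer would see on the wire.
captured = "Basic " + base64.b64encode(b"alice:password123").decode("ascii")
print(captured)
print(decode_basic_auth(captured))
```

However strong the stored hash is, it offers no protection once the plain-text password itself can be read off the network.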

There are a couple of solutions to this. A secure TCP connection can be achieved through the use of stunnel, which wraps an insecure TCP connection with SSL/TLS. Note, however, that only TCP connections can be protected through the use of stunnel; non-TCP connections using q IPC still send passwords in the clear.

A more robust solution is the use of Kerberos, which can provide secure authentication over an insecure network. Kerberos is often combined with LDAP, with Kerberos used to authenticate clients and LDAP used for authorization.

Ensuring network security for communicating kdb+ processes will be the subject of a future guide…