ARK 8: Testing

Blog | ARK | 4 Oct 2024

Matt Doherty

Software testing is a very large topic; really, an entire sub-field of software engineering. So I won’t try to cover the whole topic thoroughly in a short blog post. Instead, as the next part of our ARK series, following on from our previous post on supportability, I’ll give a few of my own thoughts based on experience, and hopefully a few helpful pointers specific to kdb+ systems.

Philosophy of software testing

Wikipedia says,

Software testing is the act of checking whether software satisfies expectations.

That’s one definition of what software testing is, but a more important question is what is it for?  What is the point, why do we do it?  Here’s another definition:

“The purpose of testing is to increase the confidence for stakeholders through evidence”

Now we’re getting somewhere.  My own simple definition is a little more straightforward:

“The purpose of testing is so that you find issues and bugs in your code before your users find them”.

This one is obviously written from the perspective of developers rather than users.  So why am I starting with this semi-philosophical discussion of the exact definition of testing?  Because there is no single definition of what good software testing is for, and it’s important to think about what you specifically want to get out of testing before you make any further decisions.

In discussions of testing you often see things like code coverage metrics, which can be genuinely useful, as they highlight specific lines of code that aren’t tested.  They’re also tempting because they give you a precise number – your application has 95.6% coverage and you get a sea of green ticks as all the tests pass – but this can easily become a crutch.  The relationship between code coverage and how well your application is tested is at best tangential; it depends on what your definition of the purpose of testing is.  It’s entirely possible for your application to have 100% code coverage while your users are still finding bugs.  Sending them a screenshot of all the green ticks on your test suite probably won’t make them feel any better.

My own definition is much harder to measure.  How do you know which bugs your users would have found if you had missed them?  Of course you can’t, but I think this is the nature of testing: it’s not cut and dried.  It’s better to live in the grey area than to assure yourself that you know exactly how well your application is tested, and it’s important to think about exactly what you want to get out of your tests.

It’s also important to consider that my definition of the purpose of testing involves two different sets of people: you, and your users.  Of course the tests need to ensure the system does what your users need, but they also need to work for you and the other developers.  That means the test setup needs to be developer friendly, and needs their buy-in.  If this second group isn’t happy with it, they won’t use it effectively, or at all.  Again, the key here is to really think about what the purpose of testing is in your project at this point in time.  Is user confidence key?  Developer speed?  What is the focus?

The second big thing to consider, before we discuss anything more specific to kdb+, is how many tests you should have.  Essentially all tests are code, and all code is a form of technical debt: it takes time to write, time to run, and time to maintain.  If you’re a small team on a small project, a very large test infrastructure might be worse than fewer tests, or none at all.  Generally more tests will catch more issues, but they also cost time, and that’s time that can’t be spent building actual things for actual users.  Controversial opinion warning: the correct number of tests in some scenarios might be zero.  If your project is very small, or just a proof of concept, testing too early might be a waste of time.  On the other hand, for very mature projects with large userbases, or very critical use cases, testing might be very important, and you might want to devote most of your time and effort to it rather than to feature development.  Here’s a chart:

[Chart: the value of time invested in testing, growing with project size and maturity]

I definitely wouldn’t argue with the overall message of this.  However, it’s important to realize that a small or medium sized project might never really leave the left side of the chart, and if it doesn’t, time spent on fancy automated testing may be wasted.  There’s no single right answer and every scenario is different, but just as you should think about what exactly you want to get out of your tests, it’s also important to think about how much time you’re willing to devote to them.

Types of testing

Software testing is broken into many categories or types: unit testing, integration testing, functional testing, end-to-end testing, acceptance testing, smoke testing, performance testing….  The overlap between these terms is often large, particularly from one technology to another.  To simplify our discussion here I’m going to talk about just two types of testing:

  • Testing from the bottom up, which I’ll call unit testing from here onwards
  • Testing from the top down, which I’ll call end-to-end testing from here onwards

These two types and the spectrum between them – while obviously not exhaustive – will cover much of the core testing of your system.  So let’s start talking about kdb+ with another strong and perhaps controversial opinion: in kdb+ systems, end-to-end tests are generally better than unit tests.  If you think otherwise that’s fine, you have every right to be wrong.  Before anyone starts writing angry comments, this opinion does not apply to broader classes of software.  Like the point I made in the first section, you need to think carefully about what tests are most likely to add value in your specific system.  In many cases unit testing might be crucial, and even in kdb+ systems it is certainly not a waste of time.  However, end-to-end tests tend to be more valuable in my experience.

Let’s justify that opinion while talking more about the uses of each type of testing in kdb+ systems.  In unit tests we test the smallest units of our system: usually functions.  These tests can be great: they are simple, quick to write, quick to run, and generally have good tooling to support them (more on that later).  However, they are necessarily more abstracted from the user and what they care about.  kdb+ systems are generally large and complex, and consist of many processes interacting.  A lot of important stuff happens in the spaces between those processes, and unit testing cannot help us there.  How will a change in one process impact another?  What if a certain bug shows up only at a certain time of day?  This is where end-to-end testing can help us a lot more, and why I think it’s generally a more valuable tool in these cases.

What do your users care about?  Maybe they hit an API endpoint, in which case that is what we should test: is the API endpoint active and behaving as expected?  Maybe we stream data out, in which case we should have a test that subscribes and checks: can we get data, and does it look as expected?  Maybe our users view a web dashboard of some form, in which case that is what we should test.  End-to-end testing is not without its downsides: to run these tests you need some form of test environment, and that environment will never be a perfect match for production.  Data in particular is often difficult to match between testing environments and production, but we have to do our best.  Testing data itself is a topic too large for this post, and it bleeds into data quality testing; for anyone more interested in this topic, here is a good place to start reading.  End-to-end test tooling is also more varied and less straightforward, as what you want to test depends on what your users care about.  And running the tests is generally slower.  But even with these caveats in mind, I believe end-to-end tests will generally add more value, as they come closer to answering the “what is the point of testing” questions we opened with.
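To make that concrete, here’s a minimal sketch of an end-to-end check in q.  The host, port and the getTrades endpoint are all assumptions standing in for your own system – the point is that it asserts on the shape and sanity of the result, the things a user would notice, rather than on implementation details:

  / connect to the gateway the way a user would (assumption: localhost:5010)
  h:hopen `:localhost:5010;
  / call a hypothetical API endpoint with a typical user query
  res:h(`getTrades;`AAPL;.z.d);
  / behaviour-level assertions: shape and sanity, not implementation details
  if[not 98h=type res; '"expected an unkeyed table"];
  if[not all `time`sym`price`size in cols res; '"missing expected columns"];
  if[not all `AAPL=res`sym; '"result contains unexpected syms"];
  hclose h;
  -1 "end-to-end API check passed";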

A useful hypothetical to think about is this: say you have an existing system you are not familiar with, with no testing in place, and you want to add some – where do you start?  I would argue the best place to look is live monitoring: does the system have any?  If so, that will likely tell you what your users care about (i.e., what you are actively monitoring), and you can extend that down into some form of more thorough end-to-end tests.  If it has no live monitoring, then talking to your users and writing monitoring for whatever they care about most is probably going to add more value than any form of testing; only then work on adding end-to-end tests.  This is an important general point: end-to-end testing and live monitoring are not really distinct, but two different flavours of the same thing.  Live monitoring is a form of end-to-end testing.  The line between them is and should be blurred, and for this reason Andrew Wilson included a section on testing in our previous ARK blog post on supportability.  On the other hand, if you start by trying to write a test suite for every function in the system you might be in for a long haul.  Unlike with end-to-end tests that are closer to your users, it’s difficult to tell what adds the most (or any) value, so it’s hard to know where to start.  It also might be difficult to write tests for functions in isolation if you don’t know exactly how they should behave in all cases.
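To illustrate the overlap, here’s a sketch of a monitoring-style check in q – the kind of thing that works equally well as a live alert or as a scheduled end-to-end test.  The port, the trade table and its timestamp time column are assumptions; substitute your own:

  / connect to the rdb (assumption: localhost:5011)
  h:hopen `:localhost:5011;
  / when did we last receive a trade? (assumption: trade has a timestamp column called time)
  lastUpd:h"exec max time from trade";
  hclose h;
  / alert if nothing has arrived in the last five minutes
  if[lastUpd < .z.p - 0D00:05:00;
    -1 "ALERT: no trades received since ",string lastUpd];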

Another useful heuristic that can be applied to both unit and end-to-end tests is this: how often do we have to change this test?  If you find you’re changing your unit tests every time you alter any part of your code, that should generally be an indication that the tests are not very useful: they’re probably testing the current implementation, rather than some higher-level behaviour.  Tests should be targeted at the behaviour you want, rather than the specific implementation you currently have.  I think this is somewhat easier with end-to-end tests than unit tests, but it’s certainly possible in both cases.
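As a sketch of the difference, consider a hypothetical .api.lastPrices function that returns the latest price per requested sym.  A brittle test pins the exact output; a behavioural test asserts the properties users actually rely on:

  / brittle: pins the exact result, including row order and exact values
  / brittleTest: res ~ ([] sym:`AAPL`MSFT; price:187.1 402.3)

  / behavioural: asserts the properties users actually depend on
  res:.api.lastPrices[`AAPL`MSFT];                          / hypothetical function under test
  present: all `AAPL`MSFT in exec sym from res;             / every requested sym is returned
  positive: all 0 < exec price from res;                    / prices are sane
  unique: (count res) = count distinct exec sym from res;   / one row per sym

The brittle version breaks the moment the data or the row ordering changes; the behavioural version only breaks when something a user would notice breaks.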

Tooling

OK, some more specifics.  Unit testing in kdb+ has a few nice libraries to help you out:

  • K4unit – now well over 15 years old!
  • Qcumber – a newer library written in BDD style, based on Cucumber. This could also be useful for some amount of end-to-end testing, since it focuses on specifying expected behaviours.
  • QSpec – Dan Nugent’s testing library, also in the BDD style.
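You don’t strictly need a framework to get started, though.  Here’s a minimal, framework-free sketch of the idea – a dictionary of named boolean checks against a hypothetical .calc.vwap function – which is, in spirit, what these libraries formalise and extend:

  / hypothetical function under test: volume-weighted average price
  .calc.vwap:{[price;size] size wavg price};

  / a dictionary of named boolean test cases
  tests:()!();
  tests[`simple]: 10f = .calc.vwap[10 10f;5 5f];
  tests[`weighted]: 12f = .calc.vwap[10 14f;1 1f];   / equal sizes -> simple mean
  tests[`skewed]: 13f = .calc.vwap[10 14f;1 3f];     / weight pulls towards 14

  / report any failures
  failed:where not tests;
  $[count failed; -1 "FAILED: ",", " sv string failed; -1 "all unit tests passed"];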

And it’s fairly straightforward to automate each of these so that, for example, they run on each commit to master, or on each tagged release.  So what about end-to-end testing?  This is trickier, and depends a lot more on what your system does.  For me the main question you should ask here is: what do my users use?  If they mainly connect to an API from Python, that’s what you should test.  If they subscribe from Java processes, I’d argue that is what you should test, ideally with code similar to theirs (based on shared documentation etc.).  So there’s obviously a lot more variety here, but possible tests might include:

  • API testing in Python using the requests library and pytest.
  • If your users are subscribing to streaming data from Java, then maybe testing that with JUnit or Cucumber
  • Selenium or similar browser-based testing if your end product is a dashboard of some form (this is not straightforward though)
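Ideally these tests live in whatever language your users actually use, but even a small q subscriber can catch gross breakages in a streaming setup.  A hedged sketch, assuming a standard kdb+tick-style tickerplant on localhost:5010 publishing a trade table:

  / subscribe the way a downstream user would (assumption: tickerplant on localhost:5010)
  h:hopen `:localhost:5010;
  h(".u.sub";`trade;`);                    / standard kdb+tick subscription, all syms
  .test.received:0;                        / global row counter
  upd:{[t;x] .test.received+:count x};     / the tickerplant calls upd on each publish

  / after ten seconds, check data actually flowed, then exit with a meaningful code
  .z.ts:{ $[.test.received>0;
      [-1 "OK: ",string[.test.received]," rows received"; exit 0];
      [-1 "ALERT: no streaming data received"; exit 1]] };
  \t 10000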

Another important point about tooling – in particular for end-to-end testing – is that there’s a lot of value in integrating with whatever is already available or used elsewhere in the organisation.  This means less overhead, and potentially other teams to help with maintenance, answer questions, or even help monitor your application for you.  If your organisation uses TeamCity, or Jenkins, or GitHub to automate, that’s probably what you should use too.  Testing is a very human problem, so use the other humans where available!

And finally, automate as much as possible!  This is where unit tests can be great: just run them all the time.  If possible, end-to-end tests should be the same; run them automatically after every deploy, or throughout the day on a QA environment.  The less you need to think about running tests the better.  As with everything here, this needs some careful thought, as your tests need to alert you when actual issues arise, not produce a sea of noise.  The balance is crucial, and tricky to get right.

An ideal setup for a kdb+ system with a medium sized team and userbase might be:

  • A reasonable set of unit tests covering your API, and any other key functions that are suited to it. These can run automatically on every commit, or deploy, hopefully catching any simple errors or mistakes, syntax issues (particularly as kdb+ lacks a good linter) etc.  This hopefully catches some issues early and avoids you wasting time re-deploying.
  • Once deployed to a test or QA environment, you then have a more thorough set of end-to-end tests that check your application behaves as expected in terms of actual user functionality. These can be run post-deploy, or ideally on some sort of timer, so you get alerted to any issues that don’t surface immediately.

Summary

So, to summarise a few takeaways on software testing in kdb+ systems:

  • Generally prefer end-to-end over unit testing (but ideally both)
  • Make everything as automated as possible
  • More is not necessarily better. Better is better!

Keep an eye out for the next post in our ARK series!
