ARK 3: Software Development Life Cycle

Andrew Steele

What is SDLC?

In our previous blog, we took a look at some of the hardware options you need to consider when building out your kdb+ architecture. This month, we are going to shift our attention to the software development lifecycle.

A software development lifecycle (SDLC) is the process of getting a piece of software designed, created and deployed to production in such a way that it is high-quality, cost effective and developed in the shortest time possible.

There are many different SDLC methodologies, such as waterfall and agile, and each has its pros and cons. These are covered in many other blogs, for example here or here, so I won’t go into them in detail.

An interesting take on this can be found in this blog, which compares the use of different methodologies to the collective fictions described in Yuval Harari’s book, Sapiens: A Brief History of Humankind. It argues that it doesn’t really matter which methodology we follow as long as there is some sort of system in place and the developers, BAs and end users are all on board with it.

Similarly, a quick Google search shows that there are ~~four~~, ~~five~~, ~~six~~, seven phases of the software development lifecycle (you can also find articles for 8, 9 and 10-step SDLCs, but hopefully you get the point).

Again, I don’t think it really matters exactly how many phases your SDLC process has, but there should be some well-defined set of steps in place. The one thing all of these blogs have in common is that each step should feed into the next one.

In the next few sections, we will detail some of the key steps that you should be considering as part of your SDLC process.

Requirements Analysis

First things first: we need to figure out what it is we are trying to build or fix.

Usually, the business or an end user will come up with a high-level set of things they want from the software project. These can include:

Capturing new data sources
Adding new reports or APIs to access the data
Adding new UI features to visualise the data
Bugfixes

We then need to refine these down to an achievable and concrete set of requirements.

For example, if the requirement is to store 10 years of NYSE TAQ data on a single PC, we know this isn’t possible, so we need to manage the user’s expectations and either find a much smaller dataset to store or a much larger storage solution. (You can find more information about hardware planning in our previous ARK blog post and in an upcoming blog about environment planning.)

On the other hand, if the requirement is to create an API which returns the open, high, low and close (OHLC) prices for stocks, then it is important to understand what the expected user inputs are (a single stock vs a list of stocks, or a single date vs many dates), and what format they expect the output to be in. These factors can have a massive impact on how we plan and write the code for this feature and, if we don’t define these requirements up front, we can end up with a feature that doesn’t meet the needs of the business.
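To make those input questions concrete, here is a minimal sketch in Python of what such an API might look like once the requirements are pinned down. Everything in it is an assumption for illustration: the `Trade` and `OHLC` types, the `ohlc` function name, the choice to accept either a single value or a list for both symbols and dates, and the dictionary output format. Your users may well want something different, which is exactly why it is worth asking.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class Trade(NamedTuple):
    """Hypothetical trade record, used only for this example."""
    sym: str
    day: date
    price: float


class OHLC(NamedTuple):
    open: float
    high: float
    low: float
    close: float


def _as_list(value):
    """Accept a single item or an iterable of items; always return a list."""
    if isinstance(value, (str, date)):
        return [value]
    return list(value)


def ohlc(trades, syms, days):
    """Return {(sym, day): OHLC} for the requested symbols and dates.

    Assumes the trades arrive in time order within each day, so the first
    price seen is the open and the last one is the close.
    """
    wanted_syms = set(_as_list(syms))
    wanted_days = set(_as_list(days))
    prices = defaultdict(list)
    for t in trades:
        if t.sym in wanted_syms and t.day in wanted_days:
            prices[(t.sym, t.day)].append(t.price)
    return {
        key: OHLC(p[0], max(p), min(p), p[-1])
        for key, p in prices.items()
    }
```

With this shape, `ohlc(trades, "AAPL", date(2024, 1, 2))` and `ohlc(trades, ["AAPL", "MSFT"], [date(2024, 1, 2), date(2024, 1, 3)])` both work. If the users only ever need one stock on one day, the extra flexibility is wasted effort; if they need thousands of stocks at once, a dictionary may be the wrong output format altogether.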

For small projects, developers can often work directly with the end users to gather requirements. This can have the benefit of more frequent feedback from the users and quicker turn-around of features. However, as the project gets larger and has more developers working on it, it can be useful to have a Business Analyst (BA) or Project Manager (PM) to act as a liaison between developers and end users. A BA can provide more business insight into any new requirements and help make sure new features meet the specification of the end users whereas a PM will help to manage the work backlog and ensure the project stays on track.

Planning

Ok, you know what the users want… so how are you going to do it?

This is the part where you take the specifications from the requirements phase and turn them into actual implementation details. These implementation details can be anything from what technology you use for the project to how you write the database query to extract the data.

While it isn’t possible to plan every single detail up front, it is important to be as thorough as possible so that you can provide good estimations of timelines back to the end users. The subject of estimation of tasks could fill a whole blog post on its own, but as a rule of thumb:

It’s better to over-estimate than under-estimate.
For larger pieces of work, gather a few developers together and plan it out as a group.

If you are a solo developer on a project, this last point won’t apply, but it is still a good idea to sit down and sketch out a plan before diving into the task.

For larger teams, you may look at following the scrum methodology or even just adopting some practices from it, e.g. planning poker or story time. If you are new to scrum, these practices can seem a bit overblown and cumbersome, but they do get easier over time (I promise) and they help to ensure your backlog of tasks is in good shape.

Once you have got all the implementation details worked out, it is a good idea to store and track these in an issue management system like Jira or Redmine. These are useful not only for developers working on the tasks, but also for BAs and PMs to manage the backlog and for end users to report new issues and track the progress of any bugs or new features they may be waiting on.

Software Development

Now on to the fun stuff: actually writing some code.

Every developer works differently and everyone has their own favourite IDEs and development tools. It’s important not to impose too strict a workflow on your developers and to give them the freedom to use their preferred tools, as that is how they will be most productive. Of course, there will be limitations to this: security is a major concern in larger organisations and your company security policy may not allow downloading unapproved software development tools from the internet. In this case, there should be some procedure in place for developers to install software from an approved list of commonly used development tools, and this list should be extensive enough to give your developers a good choice of tools.

It’s not just software though: your devs need access to the correct hardware in order to do their jobs. This could be a beefy laptop or desktop computer with enough RAM and processor power to compile their code and run the tests (more on that in the next section) in a timely manner. Alternatively, they may need access to a Linux server or equivalent in order to run their code.

A common trend is the move towards cloud-based desktops. These allow developers to create relatively high-powered machines to write, compile and test their code, with the advantage that they can be shut down when not in use, saving your company money.

So far we have talked about making sure your experienced developers have all the tools they need to do their thing, but it’s also important to think about the more junior devs on your team. This might be their first job out of university and they haven’t figured out their favourite tools yet. Even for experienced hires, your project may use a technology they have never worked with before so they won’t have any reference for developing with it. In this case, it’s a good idea to have some kind of development guide for new starters to follow. This might even include a sample first task, for example, adding your name to a CONTRIBUTORS.md file.

We can take this one step further by defining a set of contribution guidelines or a code style guide – a set of best practices when adding new code to the project. This is important as a consistent codebase is easier to understand and maintain.

Finally, you should be using some kind of version control system (VCS) to store your code – Git has become the most popular VCS but there are alternatives like SVN or Mercurial. These usually have online or enterprise hosting solutions, for example GitHub or Bitbucket, which offer a web-based UI for managing your codebase as well as making changes and tracking bugs. Many of them also have integrations with issue trackers like Jira or CI tools like Jenkins or TeamCity. One nice feature of these tools is that you can create pull/merge requests for changes that you want to merge to the codebase. These can then be reviewed by other developers in the team before they are merged (or rejected). Code review is an important part of the software development process: it helps find bugs before they get merged and improves code quality overall.

*NB – for the rest of the blog I will use Git and GitHub in the examples, but most of the concepts should work for any VCS and hosting solution.

Testing

Unless you are very confident in your coding ability, you should probably do some testing of your product before you release it to production. This could be a manual set of checks or queries that you run against a QA environment or a comprehensive set of unit and integration tests that run automatically on every code check-in. The important thing is that these tests are repeatable and relevant – there is no point in spending time testing a piece of code you deprecated 5 years ago!

It’s a good idea to be familiar with the concept of the testing pyramid. This breaks down the various test types into 3 main categories and gives an indication of how many of each there should be and how often they should run.

Unit tests – these should test individual units or functions of your codebase in isolation. There should be quite a lot of these and they should run quickly and often.

Integration tests – these should test that various components of your system can work together, for example, can your front end query the database successfully? There probably won’t be as many of these and they can be slower and run less frequently.

End to End tests – these should test your system as a whole. They should run under real-world conditions, e.g. realistic data and representative user queries. These can be more manual, but they should be repeatable.
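As a rough illustration of the bottom layer of the pyramid, here is a minimal sketch of a unit test in Python. The `spread` function is a made-up example, not something from a real codebase, and the tests are written as plain `test_*` functions with bare asserts, which is the style a runner such as pytest will discover automatically.

```python
def spread(bid: float, ask: float) -> float:
    """Return the bid-ask spread; reject crossed quotes."""
    if ask < bid:
        raise ValueError("ask must not be below bid")
    return ask - bid


def test_spread_of_normal_quote():
    # One small, fast check of one behaviour, with no database or network.
    assert spread(99.5, 100.0) == 0.5


def test_crossed_quote_is_rejected():
    # The error path deserves a test just as much as the happy path.
    try:
        spread(100.0, 99.5)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a crossed quote")
```

Each test exercises a single function in isolation and runs in a fraction of a second, which is what makes it practical to have a lot of them and to run them on every check-in.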

As mentioned above, there are various tools that can help you with automated testing, for example TeamCity or Jenkins. These can be used to run continuous integration (CI) so that every time changes are made to the codebase, your test suite is run. This will give you greater confidence in your changes and will hopefully help to catch any bugs much earlier. Many of these integrate nicely with your VCS hosting solution (e.g. GitHub or Bitbucket) and you can even set up branch protection so that your changes cannot be merged unless the CI tests pass.

Deployment

Finally, we’ve written all our code, run our tests and ironed out all the bugs – now let’s get it shipped to the users.

Again, version control is your friend here. All of the VCS frameworks offer some ability to tag your code at a given point in time. Once you are happy that you have got all the new user features and fixed any bugs, you can create a tag which will act as your release candidate. Note that this is still only a release candidate: there is always time to find a new bug or improvement, but this tagged version should act as a starting point.
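For Git, the tagging step could be sketched as a small Python helper like the one below. The `v<version>-rc<n>` naming scheme, the function names and the default `origin` remote are all assumptions made for this example; the only real commands used are `git tag -a` to create an annotated tag and `git push` to publish it.

```python
import subprocess


def tag_commands(version: str, candidate: int, remote: str = "origin"):
    """Build the git commands that create and publish a release-candidate tag.

    The 'v<version>-rc<n>' naming scheme is a hypothetical convention.
    """
    tag = f"v{version}-rc{candidate}"
    message = f"Release candidate {candidate} for {version}"
    return [
        ["git", "tag", "-a", tag, "-m", message],
        ["git", "push", remote, tag],
    ]


def create_release_candidate(version: str, candidate: int, dry_run: bool = True):
    """Print the commands by default; only run them when dry_run is False."""
    for cmd in tag_commands(version, candidate):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `create_release_candidate("1.4.0", 1)` just prints the two commands, so you can check them before letting the script touch the repository. If a bug turns up, you fix it and cut the next candidate rather than moving the existing tag.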

If you have a testing/QA/preprod environment, you should start by deploying your release candidate to each of these. These environments each have their own purpose and we will have more information on them coming up in a future blog. Importantly, deploying to these environments first should help iron out any potential issues or bugs with your release.

This brings us to the actual deployment process. This should be automated as much as possible – if you are still copying your code files around manually you are asking for trouble. You will probably have to deploy your application quite a lot over the course of its lifetime so you want this process to be quick and reliable.

At a minimum, a simple bash script can be used to download the tagged version of your code and copy it to the correct location on the target server. It can also call any stop and start scripts you may have to make sure your application picks up the latest changes.
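The same idea can be sketched in Python rather than bash. This is only an outline under some stated assumptions: the tagged code has already been downloaded to a local `source` directory, the target is a POSIX filesystem, and the `releases/<tag>` directories plus a `current` symlink are a layout invented for this example. The stop and start commands are whatever scripts your application already has.

```python
import os
import shutil
import subprocess
from pathlib import Path


def deploy(source: Path, app_root: Path, tag: str,
           stop_cmd=None, start_cmd=None) -> Path:
    """Copy an already-downloaded tagged build into place and activate it."""
    release = app_root / "releases" / tag
    if release.exists():
        raise FileExistsError(f"{tag} is already deployed at {release}")
    release.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(source, release)

    if stop_cmd:
        subprocess.run(stop_cmd, check=True)

    # Build the new link beside the old one, then rename over it, so that
    # 'current' never points at a half-copied release.
    tmp_link = app_root / "current.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()
    tmp_link.symlink_to(release, target_is_directory=True)
    os.replace(tmp_link, app_root / "current")

    if start_cmd:
        subprocess.run(start_cmd, check=True)
    return release
```

Keeping each release in its own directory means the previous version is still sitting on disk after a deployment, which comes in handy if the new one misbehaves.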

As your application grows, you may want to look at larger-scale automated deployment tools. You can control the entire tagging and deployment process with the CI tools we have already mentioned, or you may want to look at more complete automation engines like Ansible and Terraform. These are more useful when you also need to provision hardware as part of your deployment, for example in cloud architectures.

One thing you may need to consider in your deployment procedure is handling any pre or post release steps. This is particularly important where databases are involved as you may need to add or modify columns, or rebuild an entire database. These steps can be time-consuming and error-prone so it is good to spend a bit of extra time beforehand thinking about how to do these and factoring in any extra time required to complete these steps – you don’t want to be starting a full database rebuild at 9pm on Sunday night.

Alas, even the best planned releases can still go awry. The data in production could be different or much larger than expected, users could be querying the data in unexpected ways – any number of things can go wrong and you may find you need to roll back your release. You should be prepared for this as well – any good release script or tool should have a method of rolling back the changes to the previous version.
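What a rollback looks like depends entirely on how you deploy. As one possible sketch in Python, assume every release lives in its own `releases/<tag>` directory and a `current` symlink marks the active one (a layout made up for this example, on a POSIX filesystem). Rolling back is then just a matter of pointing the link at an earlier release.

```python
import os
from pathlib import Path


def rollback(app_root: Path, tag: str) -> Path:
    """Point the 'current' symlink back at a previously deployed release."""
    target = app_root / "releases" / tag
    if not target.is_dir():
        raise FileNotFoundError(f"no deployed release named {tag}")

    # Create the replacement link first and rename it into place, so there
    # is never a moment when 'current' is missing.
    tmp_link = app_root / "current.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()
    tmp_link.symlink_to(target, target_is_directory=True)
    os.replace(tmp_link, app_root / "current")
    return target
```

Note that this only rolls back the code. If the release also changed the database, you will need a matching plan for undoing those changes, and you would normally restart the application afterwards so it picks up the older version.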

Problems with the release can happen no matter how good your testing and CI tooling is. What is important in this case is that you review what went wrong and feed this back into the next iteration of your software development lifecycle.

Conclusion

SDLC is a massive topic and much has already been written about it. The purpose of this blog isn’t to tell you exactly how you should do SDLC. Whether you use agile or waterfall methodology, scrum or kanban, GitHub or Bitbucket, the important thing is that you do have some kind of process in place that everyone in your team can follow and get behind. Hopefully this blog has given you some ideas and suggestions on this.

Also, remember that the SDLC process is supposed to help you deliver your software quicker and with higher quality. As your project evolves, you may find some aspect of your process isn’t working for you anymore: maybe your CI jobs are taking too long to run or your team are getting fatigued with too many planning meetings. If this is the case, don’t be afraid to change it up. There is no one correct way to build software and the most important thing is finding a process that works for your team.

Stay tuned for the next instalment of our ARK blog series where we will be taking a look at how to best make use of different environments when developing your kdb+ system.