End-of-Day Parallel Sorting in TorQ

22 Sep 2016

Data Intellect

We’ve previously written blog posts on TorQ End-of-Day and on more efficient methods for structuring temporary data on disk. Until now, TorQ has handled the end-of-day sorting or merging of tables sequentially, processing one table at a time. The latest release of TorQ (v2.6.2) adds functionality that enables tables to be sorted in parallel. Users can configure multiple table sorts to run at once, reducing the overall time it takes to complete the end-of-day operations. This is achieved with additional sorting processes, known as sortslaves. The sortslave processes are clones of the original sort process, and the sort process (the sort master) delegates table sorting to them using .z.pd (peach handles).

Parallel Sorting in TorQ Diagram

When a q process is started with -s -N (where N is the number of worker processes), peach uses .z.pd to obtain the handles to the worker processes. .z.pd can be defined either as a static list of handles or as a function that returns such a list, e.g.

.z.pd:{`u#exec w from .servers.getservers[`proctype;sortslavetypes;()!();1b;0b]}
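The static-list form can be sketched as follows. This is a minimal illustration rather than TorQ code: ports 5001 and 5002 are hypothetical worker processes assumed to be already listening, and the master is assumed to have been started with q -s -2.

```q
/ Sketch only: ports 5001 and 5002 are hypothetical, already-running workers
/ and this process is assumed to have been started with: q -s -2
.z.pd:`u#hopen each 5001 5002   / static list of handles, with the unique attribute applied
/ peach now sends each item to one of the workers rather than a local thread
{x*x} peach 1 2 3 4
```

The function form is usually the better fit when workers can come and go, since the list of handles can then reflect whichever workers are currently connected.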

In the TorQ example above, .z.pd is defined as a function that returns a list of handles to the processes of type sortslave. This lets users add enough sortslave processes to accommodate the number of tables to be sorted. The WDB process also tracks the size of each table and passes the largest tables to the sortslaves first, which helps shorten the overall runtime.
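The largest-first ordering can be illustrated with a short sketch. The table names and sizes here are made-up values and the variable names are hypothetical; this is not how wdb.q necessarily stores or orders its size information.

```q
/ Sketch only: table names and sizes are made-up illustrative values
sizes:`trade`quote`depth`news!2000000 9000000 500000 1000
/ desc on a dictionary sorts by value, so key gives the names largest first
sortorder:key desc sizes
sortorder   / `quote`trade`depth`news
```

Handing the tables out in this order means the longest-running sorts start as early as possible, so a large table is less likely to be the last job left running on its own.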

If .z.pd returns an empty list (i.e. no sortslaves are available), or if the sort process is started without the -s -N option, TorQ performs the end-of-day table sort sequentially on the sort master process, as it did before. If the sort master process is not available, the WDB process does the sort itself. Note that, as with any process that is connected directly to the tickerplant and has a long end-of-day time, this can create a message backlog on the tickerplant.
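The choice between parallel and sequential sorting can be sketched roughly as below. The names sortone, haveworkers and sortall are hypothetical and the per-table sort is only a stand-in; the sketch also assumes .z.pd is defined whenever the process was started with a negative -s value.

```q
/ Sketch only: sortone is a hypothetical stand-in for the real per-table sort
sortone:{[t] -1"sorting ",string t; t}

/ true only when started with a negative -s value and .z.pd yields handles
/ (.z.pd is only evaluated when -s is negative, where it is assumed to be defined)
haveworkers:{$[0>system"s";0<count .z.pd[];0b]}

/ distribute with peach when workers exist, otherwise sort sequentially
sortall:{[tabs] $[haveworkers[]; sortone peach tabs; sortone each tabs]}

sortall `trade`quote`depth
```

Falling back to each keeps the behaviour identical to the earlier sequential sort when no workers are configured, so the same entry point can be used in both setups.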

The addition to the wdb.q script can be viewed here. The new functionality has also been added to our example application, the TorQ Finance Starter Pack. If you have any market data architecture problems, please feel free to get in touch!
