Samantha Devlin
In our years of scoping, building, and calibrating trading surveillance platforms, one challenge has stood out: detecting spoofing without drowning in false alarms. Spoofing is the act of placing orders with no intention of executing them, purely to mislead the market and create an artificial impression that serves the spoofer’s real trading intentions. It can be subtle, especially in today’s fast, complex markets, and early detection models often light up like Times Square with false positives.
The latest A-Team Insight handbook, AI in Capital Markets, highlights that regulatory bodies such as the SEC and FCA are concerned about AI’s potential to enable sophisticated forms of market manipulation, such as spoofing and layering. The other side of that coin is that we can also use AI to identify anomalies in trading patterns and flag those bad actors. As we know by now, data is the fuel of AI, and it is also the fuel of surveillance systems: quality data produces quality results. In this post, I’ll share how increasing the granularity of market data enables smarter and more accurate spoofing surveillance, peppered with a bit of my own experience.
Market Data Levels
Before we dive into how the various data depths can support strong alerts, here is a brief overview of each of the depths that I will be referencing throughout:
[Image: overview of the market data levels. Source: BMLL]
So how do we avoid so many false positive spoofing alerts? The detection models can be refined with extra checks that try to distinguish a malicious spoof from coincidental normal trading.
These supplementary rules make the alert logic more nuanced. In my experience, each added criterion knocks out a chunk of false positives. For instance, one of the biggest early wins was ignoring supposed spoof orders that were more than a few ticks away from the best price. It turned out a lot of traders naturally place far-away iceberg orders as “just in case” liquidity, with no intent to mislead – and those were triggering alerts unnecessarily. Incorporating a proximity check to only focus on near-touch activity aligned the alerts more with truly deceptive behaviour.
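As a rough illustration of that proximity check, here is a minimal Python sketch. The `Order` class, the function names, the tick size, and the three-tick threshold are all assumptions made for the example, not values from any particular surveillance product.

```python
from dataclasses import dataclass


@dataclass
class Order:
    side: str     # "buy" or "sell"
    price: float
    size: int


def ticks_from_touch(order: Order, best_bid: float, best_ask: float,
                     tick_size: float) -> int:
    """Distance in ticks between the order and the best price on its own side."""
    if order.side == "buy":
        distance = best_bid - order.price
    else:
        distance = order.price - best_ask
    # Orders at or inside the touch count as zero ticks away.
    return max(0, round(distance / tick_size))


def is_near_touch(order: Order, best_bid: float, best_ask: float,
                  tick_size: float = 0.01, max_ticks: int = 3) -> bool:
    """Keep only orders close enough to the touch to plausibly mislead others."""
    return ticks_from_touch(order, best_bid, best_ask, tick_size) <= max_ticks


if __name__ == "__main__":
    near = Order("buy", 99.98, 5_000)   # 2 ticks below the best bid
    far = Order("buy", 99.90, 5_000)    # 10 ticks below the best bid
    print(is_near_touch(near, best_bid=100.00, best_ask=100.02))  # True
    print(is_near_touch(far, best_bid=100.00, best_ask=100.02))   # False
```

In a real model the threshold would probably vary by instrument and liquidity rather than being a single fixed number.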
[Image source: Spoofing the Limit Order Book]
Even with those refinements, using only Level 1 data (just the top-of-book quotes and last trades) limits what you can do. This is where Level 2 market data comes in, giving a fuller picture with the order book. Level 2 shows multiple price levels of buy and sell orders (often the top 5, 10, or more levels on each side). Having this depth of book unlocks more advanced analysis and can further reduce false positives.
With Level 2 data, a surveillance system isn’t blind to the broader order book. It can see, for example, that you placed a buy order three levels down, 5,000 shares strong, and at that moment the order book at that price had maybe 6,000 shares total. This allows a few powerful enhancements:
In short, Level 2 data provides a richer picture that both improves the trigger conditions and gives the analyst a more complete story to assess. More data means more context, and context is the antidote to false positives.
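To make the 5,000-of-6,000 example concrete, the sketch below computes how much of a price level a single order represents. The function names and the 50% threshold are illustrative assumptions; a production rule would tune the threshold and take the level size from the depth snapshot at the moment the order was placed.

```python
def share_of_level(order_size: int, level_size: int) -> float:
    """Fraction of the displayed size at a price level contributed by one order."""
    if level_size <= 0:
        raise ValueError("level_size must be positive")
    return order_size / level_size


def dominates_level(order_size: int, level_size: int,
                    min_share: float = 0.5) -> bool:
    """True when one order makes up at least min_share of its price level."""
    return share_of_level(order_size, level_size) >= min_share


if __name__ == "__main__":
    share = share_of_level(5_000, 6_000)
    print(f"{share:.0%}")                   # prints 83%
    print(dominates_level(5_000, 6_000))    # True
```

An order that accounts for most of the visible size at its level is far more likely to move other participants’ perception of the book than one that sits in a deep queue.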
Last but not least, I am keen to explore how Level 3 data can take your surveillance to the next level (pun intended…). With Level 3 data, we can see every individual order: each order’s size, price, timestamp, and often a unique order ID. Some trading venues or data providers offer this to high-end users, and it’s the kind of data regulators themselves see when they comb through audit trails.
In practice, Level 3 isn’t commonly used in firm-level surveillance yet, partly because of the massive data volume. But let me paint a picture of what it could enable if we had it (and in some cutting-edge projects, we do):
It’s worth noting that handling Level 3 data is non-trivial. The volume is enormous (major exchanges generate tens of billions of order messages per day), so any surveillance solution using it needs serious engineering prowess. But the benefit is a dramatically increased richness of data. From what I’ve seen, the trend is that surveillance tech is slowly moving this way. As infrastructure catches up, using full depth and even reconstructing historical order books on demand is becoming feasible. That means future spoofing models might well leverage these Level 3 insights to all but eliminate certain types of false positives and, more importantly, catch manipulations that simpler models would miss.
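A toy version of that reconstruction, assuming a heavily simplified message schema, might look like the sketch below. The `OrderMessage` fields and the three message kinds are hypothetical; real venue feeds carry many more message types (modifies, replaces, hidden orders) and run at volumes that need far more careful engineering.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class OrderMessage:
    ts: float          # event time in seconds
    kind: str          # "add", "cancel" or "execute"
    order_id: str
    side: str = ""     # "buy" or "sell" (only needed on "add")
    price: int = 0     # price in ticks (only needed on "add")
    size: int = 0      # full size on "add", filled quantity on "execute"


def replay(messages):
    """Rebuild resting depth and per-order outcomes from order-level messages."""
    live = {}                  # order_id -> [side, price, remaining, add_ts]
    depth = defaultdict(int)   # (side, price) -> total resting size
    filled = defaultdict(int)  # order_id -> quantity executed so far
    cancelled = {}             # order_id -> (seconds resting, quantity filled)

    for m in sorted(messages, key=lambda msg: msg.ts):
        if m.kind == "add":
            live[m.order_id] = [m.side, m.price, m.size, m.ts]
            depth[(m.side, m.price)] += m.size
            continue
        if m.kind not in ("cancel", "execute") or m.order_id not in live:
            continue  # unsupported message kind, or order unknown/closed
        side, price, remaining, add_ts = live[m.order_id]
        removed = remaining if m.kind == "cancel" else min(m.size, remaining)
        depth[(side, price)] -= removed
        if m.kind == "execute":
            filled[m.order_id] += removed
            live[m.order_id][2] = remaining - removed
        else:
            cancelled[m.order_id] = (m.ts - add_ts, filled[m.order_id])
        if m.kind == "cancel" or removed == remaining:
            del live[m.order_id]
    return dict(depth), cancelled


if __name__ == "__main__":
    feed = [
        OrderMessage(0.0, "add", "A", "buy", 100, 5_000),
        OrderMessage(0.5, "add", "B", "buy", 100, 1_000),
        OrderMessage(1.2, "cancel", "A"),
        OrderMessage(2.0, "execute", "B", size=400),
    ]
    book, cancels = replay(feed)
    print(book)     # resting size per (side, price)
    print(cancels)  # how long each cancelled order rested, and how much it filled
```

The per-order output is the interesting part for surveillance: a large order that rested for about a second and was cancelled with nothing filled reads very differently from one that was partially executed before being pulled.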
Thus far we’ve focused on one order book, one instrument at a time. But real-world market manipulation isn’t always self-contained. A cunning strategy we’ve seen (and regulators have busted) is cross-market spoofing – using orders in one market to influence prices in a related market. This is another case where having a wide lens on data is crucial.
Consider a scenario involving a futures contract and the underlying cash market. A trader holds a position in U.S. Treasury bonds but places large spoof orders in U.S. Treasury futures, a different but closely correlated instrument. The fake orders push the futures price up or down, which in turn nudges the price of the bonds, allowing the trader to profit on their bond position. This actually happened: a bank’s trader exploited the tight link between Treasuries and Treasury futures, placing spoof orders in the futures to profit in the cash bond market. This cross-market manipulation led to a hefty $35 million fine for the firm. Similarly, equities and equity futures, index futures and component stocks, and even related commodities are fertile ground for this tactic.
Why do I bring this up? Because detecting such a scheme means your surveillance system must ingest and analyse different data streams together. You’d need the order book, or at least trade and quote data, for both markets, plus logic to correlate them. Traditional systems that look at only one product at a time will completely miss this – the spoof orders alone might not trigger any alert if no trade happened in that futures market for the spoofer, and the bond trade alone wouldn’t look odd. It’s the combination of “spoof here, profit there” that completes the picture.
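A deliberately naive sketch of that “spoof here, profit there” linkage is shown below. Everything in it is an assumption for illustration: the record types, the instrument codes in the usage example, the five-second window, and the rule that the trade in the related market must be on the opposite side of the cancelled order (a buy spoof lifts prices, which helps a seller).

```python
from dataclasses import dataclass


@dataclass
class CancelledOrder:
    trader: str
    instrument: str
    side: str        # "buy" or "sell"
    place_ts: float  # seconds
    cancel_ts: float


@dataclass
class Trade:
    trader: str
    instrument: str
    side: str
    ts: float


def cross_market_candidates(cancelled_orders, trades, related, window=5.0):
    """Pair cancelled orders in one market with the same trader's opposite-side
    trades in a related market, made while the order rested or shortly after."""
    hits = []
    for order in cancelled_orders:
        linked = related.get(order.instrument)
        if linked is None:
            continue
        for trade in trades:
            if (trade.trader == order.trader
                    and trade.instrument == linked
                    and trade.side != order.side
                    and order.place_ts <= trade.ts <= order.cancel_ts + window):
                hits.append((order, trade))
    return hits


if __name__ == "__main__":
    related = {"UST_FUT": "UST_CASH"}  # hypothetical instrument codes
    orders = [CancelledOrder("T1", "UST_FUT", "buy", 10.0, 12.0)]
    trades = [
        Trade("T1", "UST_CASH", "sell", 11.0),  # during the resting order
        Trade("T1", "UST_CASH", "sell", 30.0),  # too late to be linked
    ]
    for order, trade in cross_market_candidates(orders, trades, related):
        print(order.instrument, "->", trade.instrument, "at", trade.ts)
```

The nested loop is fine for a sketch but would not scale; on real tick volumes this kind of pairing is usually done with a time-windowed or as-of join.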
The push toward integrating more data sources is happening. Regulators themselves emphasize connecting the dots across markets. But more data means more complexity: it takes more processing and more analytics to separate genuine correlations from random coincidences. It can also mean more false positives if done naïvely (correlation is not always causation!). However, when built with greater data dimensionality and better models, multi-market surveillance can be extremely powerful.
From my perspective, this is an exciting direction, as it means breaking down silos and having surveillance platforms that see the whole chessboard, not just one piece. It’s another frontier where data granularity can make alerts more reliable by weeding out false positives.
Enhancing a spoofing alert’s reliability boils down to providing it with the right information. Level 1 data (top-of-book) gives you the bare essentials, enough to catch the outline of a spoofing scenario but not the whole shape – it often leads to crude, noisy alerts. Level 2 data (market depth) adds important context, like a higher resolution image, allowing the detection logic to zero in on realistic manipulation and filter out benign activity. Level 3 data (full order flow) is the ultimate granularity, giving x-ray vision into the order book. It’s not widely used yet by most compliance teams, but it holds the promise of detecting spoofing behaviours that were previously invisible.
On a personal note, working with these different data granularities has been eye-opening. I’ve felt the frustration of sifting through dozens of false positive spoofing alerts and the relief when a new data feed or rule tweak eliminates a chunk of them. It’s a constant cat-and-mouse game: as spoofers get more sophisticated, we counter by capturing more detail about the market’s state to expose their tricks. Yes, more data means more complexity and a heavier engineering load, and you need good tools to store, process, and analyse it. But the payoff is fewer false alarms and greater confidence when an alert does trigger. The technologies best suited to these complex analytics are those that specialise in big data storage, offer exceptional processing performance, and are optimized for time-series tick data, such as kdb+.
At the end of the day, surveillance is about protecting market integrity without unnecessarily hindering legitimate trading. Having richer market data, whether it’s deeper order book levels or broader cross-market coverage, is like having a more reliable detector. It allows us to focus human attention on truly anomalous patterns with a higher probability of misconduct. The result is a trade surveillance platform that can confidently say “we see what you’re up to”, while ignoring the noise. And that, to me, is worth every tick of market data we ingest.