Gary Davies
Previously, we highlighted the cost and impact of dead data in a time series capture. This time we want to focus on the data equivalent of scope creep.
Imagine your database as a tidy room with neatly organized shelves. Initially, you carefully place only the essential items on those shelves: your core data columns. But over time, as more requirements emerge, you start adding extra shelves and piling up more items. These new columns and data points accumulate, often without a second thought. This phenomenon is what we call data creep.
In technical terms, data creep refers to the gradual expansion of data capture beyond the original project scope or initial requirements. It’s like that cluttered closet at home; you keep adding stuff, and suddenly you’re drowning in a sea of old shoes, forgotten sweaters, and mismatched socks.
From a time series perspective, this gradual expansion also covers the historical data partitions that grow naturally over time (usually daily). I’m sure many of us have worked on or owned a system where we wondered whether anyone ever looks at the data from 10+ years ago.
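One way to put a number on that question is to list the partitions that fall outside a chosen age threshold and measure how much disk they occupy. The sketch below is illustrative only: it assumes daily partitions are directories named in a `YYYY.MM.DD` style, and the function names and the 10-year default are hypothetical choices, not part of any particular system.

```python
from datetime import date, datetime
from pathlib import Path


def partition_date(name):
    """Return the date encoded in a name like 2012.03.15, else None.

    The YYYY.MM.DD naming scheme is an assumption for this sketch.
    """
    try:
        return datetime.strptime(name, "%Y.%m.%d").date()
    except ValueError:
        return None


def old_partitions(names, today, years=10):
    """Return, in sorted order, the partition names dated more than `years` years before `today`."""
    try:
        cutoff = today.replace(year=today.year - years)
    except ValueError:
        # today is 29 February and the cutoff year is not a leap year
        cutoff = today.replace(year=today.year - years, day=28)
    dated = [(partition_date(n), n) for n in names]
    # Names that do not parse as dates (for example lookup files) are skipped.
    return sorted(n for d, n in dated if d is not None and d < cutoff)


def dir_size_bytes(path):
    """Total size in bytes of all files under `path`."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())
```

Given a database root directory, calling `old_partitions` on the names of its subdirectories and summing `dir_size_bytes` over the result gives a rough figure for how much storage is tied up in data older than the threshold. Whether that data is actually unused still needs to be confirmed against query logs or with the data's owners.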
Data creep is like a silent intruder: it sneaks in and wreaks havoc, often going unnoticed until the system’s complexity increases the resourcing and cost needed to support it.
In this blog we’ve supplied approaches that can be used to reduce and mitigate the creep, and to help you figure out which data doesn’t belong.
At Data Intellect, we are always keen to help improve the efficiency and cost effectiveness of data systems. Please reach out to us if you are interested in hearing more, or would like us to help reduce your creep.
If we all did this, how much money could we save?