Dan Williams
Timeseries datasets in Capital Markets are large. Market data volumes can reach multiple terabytes per day. Internal data, such as orders and executions, is usually small in comparison but not insignificant.
Databricks Genie provides the ability to interrogate data using natural language. In this demo we’ll show how to prime a Genie Space so that Capital Markets datasets can be analysed with natural-language questions.
Databricks is unlikely to be the primary point of capture for Capital Markets data; usually a specialised timeseries technology is employed. The goal is not to replace that timeseries technology, particularly for quantitative research use cases.
The full datasets could be shipped to Databricks, but we believe the optimal pattern requires deciding which datasets to ship and in what form: raw, aggregated or enriched.
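To make the "aggregated" option concrete, here is a minimal sketch of rolling raw trades up into one-minute OHLCV bars before shipping them. The trade records, the symbol `ABC` and the function name `to_minute_bars` are all hypothetical and exist only for illustration; in practice this step would more likely run inside the timeseries technology itself.

```python
from datetime import datetime

# Hypothetical trade records: (ISO-8601 timestamp, symbol, price, size).
# Assumed to be sorted by time, as they would be coming from a tick store.
trades = [
    ("2024-01-02T09:30:01+00:00", "ABC", 100.0, 200),
    ("2024-01-02T09:30:40+00:00", "ABC", 100.5, 100),
    ("2024-01-02T09:30:59+00:00", "ABC", 99.8, 300),
    ("2024-01-02T09:31:05+00:00", "ABC", 100.2, 150),
]


def to_minute_bars(trades):
    """Aggregate time-ordered trades into one-minute OHLCV bars per symbol."""
    bars = {}
    for ts, symbol, price, size in trades:
        # Truncate the timestamp to the start of its minute.
        minute = datetime.fromisoformat(ts).replace(second=0, microsecond=0)
        key = (symbol, minute)
        bar = bars.get(key)
        if bar is None:
            # First trade in this minute sets open, high, low and close.
            bars[key] = {
                "open": price,
                "high": price,
                "low": price,
                "close": price,
                "volume": size,
            }
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price  # last trade seen so far in this minute
            bar["volume"] += size
    return bars


if __name__ == "__main__":
    for (symbol, minute), bar in to_minute_bars(trades).items():
        print(symbol, minute.isoformat(), bar)
```

Shipping bars like these instead of every tick trades detail for volume, so the right granularity depends on the questions the downstream users are expected to ask.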
Once in Databricks, the datasets will be governed by Unity Catalog and can be opened up to a wider set of use cases, perhaps in conjunction with additional internal or external datasets.