The major challenges for today's data warehousing solutions are improving performance and reducing costs. Some systems have to scale to process terabytes of data to update data warehouses with tens of terabytes of data. Typical solutions involve some combination of micro-batching and higher performance hardware. However, both are viewed as short term stop-gaps, and have proven unable to scale as data volumes and complexity continues to increase.
About Continuous ETL
Extract, Transform and Load (ETL) is used to describe a common approach for populating data warehouses. Most data warehouses require data to be consolidated across multiple source systems, where data structures and data granularity may differ.
- Extract involves extraction and the conversion of data into a format suitable for processing by the transformation engine.
- Transform is the mapping of the source data to the data warehouse’s data model. Some sources are straightforward, many transformations require joining, aggregation, filtering and splitting of the source data, as well as joining from multiple sources.
- Load involves writing the transformed data into the data warehouse, and requires an understanding of the database constraints, as well as facilities such as triggers.
However, ETL on its own is insufficient. The high volume of data and meta-data that is generated and captured by today's web applications can grind a data warehouse to a stand still. Data warehouses require aggregation of the vast volumes of raw data in order to extract meaningful business intelligence while it is still relevant.
SQLstream enhances the data warehouse by adding true, real-time capabilities that increase scalability, help to keep it current, and eliminate latency and down time. SQLstream uses a combination of continuous ETL and real-time SQL queries to offload heavy data processing, reduce replication issues and deliver real-time operational intelligence.
SQLstream enables multiple sources of heterogeneous data to be aggregated, correlated and filtered in real-time. Change Data Capture adapters provide the real-time ‘Extract’ function, delivering a stream of relational data from the source systems. SQLstream processes the relational data streams, providing both the ‘Transform’ and ‘Aggregation’ functions in a single platform.
Most importantly, the data is aggregated in SQLstream, in real-time, before the ‘Load’ operation takes place.
Have questions about your current project? Click here to ask a SQLstream expert.