‘Real-time’ and ‘Hadoop’ have often been treated as synonymous, yet Hadoop is not as real-time as many had hoped. Hadoop has many strengths, but it was never intended for low-latency, real-time analytics over high-velocity machine data streams. With SQL emerging as the key enabler for the mainstream adoption of Hadoop, executing streaming SQL queries over Hadoop extends the platform out to the edge of the network, making it possible to query unstructured log file, sensor and network machine data sources on the fly and in real time.
SQLstream accelerates Hadoop to process live, high velocity unstructured data streams, delivering the low latency, streaming operational intelligence demanded by today’s real-time businesses.
SQLstream for Hadoop combines SQLstream’s real-time operational intelligence from high velocity machine data with the power of Hadoop for high volume data storage and on-going analysis. SQLstream for Hadoop enables:
The first phase of Hadoop and Big Data saw the emergence of NoSQL data storage platforms, which set out to overcome the rigidity of normalized schemas. However, as the technology reaches mainstream industry, the need for simpler, high-performance and reliable queries is driving a resurgence in SQL as the de facto language for Big Data processing (for example, Cloudera Impala and Google BigQuery). What is now apparent is that SQL is also the ideal language for processing data streams using real-time, window-based queries. Normalization and rigid schemas are a non-issue for a streaming data platform: there are no tables, and no data needs to be stored before it is processed.
SQL was developed to process stored data in a traditional RDBMS. It has a massive existing skills base, proven scalability and sophisticated dynamic query optimization. It also functions equally well, if not better, as a real-time stream computing query language. SQLstream’s streaming SQL queries are ANSI SQL:2008 standards compliant, and we test them for compliance against the leading RDBMS SQL platforms. There are, however, two differences. First, SQLstream’s core s-Server stream computing platform does not persist any data before processing (Hadoop HBase is the default storage platform for stream persistence, although any data storage platform can be supported). Second, streaming SQL queries execute continuously, processing new data as they are created. So why SQL as a stream computing language?
The following query is a basic example of a streaming SQL query. The query finds Orders from New York that ship within one hour. Unlike a traditional static SQL query, this query executes continuously, processing new data as they arrive across all streams in the join, and pushing out results as soon as the query condition is met. The keyword STREAM maintains standards compatibility: without it, the query would return a table rather than a stream of results that continues ad infinitum.
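The behaviour of such a continuous, time-windowed join can be sketched in Python. This is only an illustration of the semantics, not SQLstream syntax and not how s-Server is implemented. The function name `orders_shipped_within_hour`, the event layout and the field names (`order_id`, `city`, `ts`) are assumptions made for the example, and events are assumed to arrive in timestamp order.

```python
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)


def orders_shipped_within_hour(events):
    """Continuously join order and shipment events (hypothetical layout).

    `events` is any iterable of (kind, record) pairs in timestamp order,
    where kind is "order" or "shipment". A result is emitted as soon as a
    New York order is matched by a shipment no more than one hour later.
    """
    pending = {}  # order_id -> order timestamp, New York orders only
    for kind, rec in events:
        now = rec["ts"]
        # Drop orders that have fallen out of the one-hour window.
        for order_id in [o for o, ts in pending.items() if now - ts > ONE_HOUR]:
            del pending[order_id]
        if kind == "order" and rec["city"] == "New York":
            pending[rec["order_id"]] = rec["ts"]
        elif kind == "shipment":
            ordered_at = pending.pop(rec["order_id"], None)
            if ordered_at is not None:
                yield {
                    "order_id": rec["order_id"],
                    "ordered_at": ordered_at,
                    "shipped_at": rec["ts"],
                }


# Example input; in a real deployment the iterable would never end.
t0 = datetime(2013, 1, 1, 9, 0)
events = [
    ("order", {"order_id": 1, "city": "New York", "ts": t0}),
    ("order", {"order_id": 2, "city": "Boston", "ts": t0 + timedelta(minutes=5)}),
    ("shipment", {"order_id": 1, "ts": t0 + timedelta(minutes=30)}),
    ("shipment", {"order_id": 2, "ts": t0 + timedelta(minutes=40)}),
    ("order", {"order_id": 3, "city": "New York", "ts": t0 + timedelta(minutes=45)}),
    ("shipment", {"order_id": 3, "ts": t0 + timedelta(hours=2)}),
]
matched = [row["order_id"] for row in orders_shipped_within_hour(events)]
print(matched)
```

Because the function is a generator, it produces each result as the matching shipment arrives rather than after the whole input has been read, which mirrors the difference between a stream of results and a table. Only order 1 qualifies here: order 2 is not from New York and order 3 ships after the one-hour window has closed.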
Streaming SQL supports all standard SQL operations for data streams, including:
SQLstream s-Server, our core stream computing platform, operates both as a streaming Big Data engine and as a streaming SQL language extension for Hadoop HBase. In Hadoop mode, Hadoop HBase is used as the default platform for stream persistence. Data can be streamed directly into Hadoop HBase in real time. This includes the raw machine data as it is collected from log files, applications and sensors, filtered and enhanced versions of the same streams, and any pre-aggregated and analytical intelligence information. SQLstream’s streaming SQL language support for Hadoop offers:
A key advantage of SQLstream is the ability to extract and replay processed data from Big Data storage platforms and join this information with the incoming, live data streams. Operational intelligence results are enhanced by comparing real-time data against known trends, which eliminates false alarms and enables longer-term comparisons. The extraction and data processing in SQLstream use standards-based SQL queries, enabling powerful real-time queries to be deployed over both streaming and stored data.
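The idea of checking live data against known trends can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not SQLstream's mechanism: the helper names `build_baseline` and `flag_anomalies`, the `(sensor_id, value)` record layout and the 50% tolerance are all hypothetical, and the per-sensor mean stands in for whatever trend is actually replayed from storage.

```python
from collections import defaultdict
from statistics import mean


def build_baseline(history):
    """Reduce replayed historical (sensor_id, value) rows to a mean per sensor."""
    by_sensor = defaultdict(list)
    for sensor_id, value in history:
        by_sensor[sensor_id].append(value)
    return {sensor_id: mean(values) for sensor_id, values in by_sensor.items()}


def flag_anomalies(live_readings, baseline, tolerance=0.5):
    """Yield live readings that deviate from the baseline by more than
    `tolerance` (a fraction of the expected value). Sensors without any
    history are skipped rather than reported, to avoid false alarms."""
    for sensor_id, value in live_readings:
        expected = baseline.get(sensor_id)
        if expected is None:
            continue
        if abs(value - expected) > tolerance * expected:
            yield sensor_id, value, expected


# Example data standing in for stored history and a live stream.
history = [("a", 10.0), ("a", 12.0), ("b", 100.0), ("b", 104.0)]
live = [("a", 11.5), ("b", 180.0), ("c", 5.0), ("a", 30.0)]

baseline = build_baseline(history)
alerts = list(flag_anomalies(live, baseline))
print(alerts)
```

A reading of 11.5 on sensor `a` looks unremarkable against its historical mean of 11.0 and is not reported, whereas 180.0 on sensor `b` and 30.0 on sensor `a` both exceed the tolerance and are flagged.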