Streaming SQL for Stream Processing and Streaming Analytics

Gartner has predicted that the data processing market will grow to a value of $55B by 2016 and that much of that value is held in the data, waiting to be exploited. Maximizing that potential means analyzing data in motion, as soon as the data are created, as well as data at rest, once the data are stored. In fact, the ideal scenario is to combine the two in an integrated architecture. This is simpler to achieve than many would think, particularly with the right integration architecture in place and the right tools for building analytics and integrations.

Data stream processing is most commonly applied to unstructured machine data streams – event data generated by servers, applications, sensors, devices and networks. Many industries now use data stream processing to provide continuous, real-time visibility into their operations through streaming analytics, and to drive automated updates and actions as a result. However, those industries also have significant investments and business processes built on analytics over their stored data. So how can they combine the best of both worlds?

A unified data analytics architecture would contain systems for data stream processing, machine data staging and pre-processing, and the enterprise data warehouse(s). The machine data staging and pre-processing platform is typically Hadoop / HBase, or an alternative NoSQL platform. The enterprise data warehouse would be the existing DBMS, where important requirements include ACID compliance, reusability of analytics and enterprise-wide access.

Streaming Analytics for data in motion with Continuous ETL to load data at rest

The integration framework for a unified architecture is built on the connector and API capabilities of all interconnected systems. For a DBMS, this is typically the JDBC driver. For Hadoop, this would be fast data load into HDFS (file updates), with reads through a NoSQL platform such as HBase. The stream processing platform must support both, so that it can read from and write to both types of system simultaneously.

Unified Data Analytics Architecture Combining Data in Motion with Data at Rest

SQLstream Blaze includes native SQL connectivity to both third-party database platforms and Hadoop / NoSQL platforms. The connectors include Hadoop connectivity plus a standards-based SQL/MED (Management of External Data) framework that uses JDBC connectivity and allows tables in external databases to be accessed as if they were local tables in SQLstream (using SELECT, INSERT and MERGE operations). Connectivity between SQLstream Blaze and data storage supports any or all of the following use cases:

  • Continuous ETL for unstructured machine data streams. Continuous ETL enables databases and data warehouses to be maintained in real-time based on the continuous aggregation and filtering of unstructured event data streams. Continuous ETL enables accurate, timely business reporting (the stored data are always current, with no need to wait for nightly aggregation runs, for example).
  • Static – stream table joins. The majority of real-time systems utilize existing stored data, for data augmentation and enhancement (for example, joining with customer attributes), or for greater accuracy by joining real-time data with longer term trends and predictive analytics.
  • Re-streaming for time-based analysis of stored data. Stored event and time-series data can be replayed (in fast forward mode), enabling ad-hoc, time-based analysis and scenario analysis over much larger datasets.
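
As a rough illustration of the stream-table join case, here is a minimal sketch in plain Python rather than streaming SQL. It models the idea only: each arriving event is looked up against stored reference data and enriched with it. The `CUSTOMERS` table, the `enrich` function and the field names are all hypothetical, and an in-memory dictionary stands in for a table that would really live in the DBMS.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical reference data standing in for a customer table held in a DBMS.
CUSTOMERS: Dict[str, Dict[str, str]] = {
    "c1": {"name": "Acme", "tier": "gold"},
    "c2": {"name": "Globex", "tier": "silver"},
}


def enrich(events: Iterable[dict], table: Dict[str, Dict[str, str]]) -> Iterator[dict]:
    """Join each arriving event with its stored attributes (inner join on customer_id)."""
    for event in events:
        attrs = table.get(event["customer_id"])
        if attrs is not None:  # events with no matching stored row are dropped
            yield {**event, **attrs}


# A short, finite sample standing in for an unbounded event stream.
events = [
    {"customer_id": "c1", "amount": 40},
    {"customer_id": "c9", "amount": 15},  # no match in the table
    {"customer_id": "c2", "amount": 70},
]
enriched = list(enrich(events, CUSTOMERS))
```

Because `enrich` is a generator, each event is joined as it arrives rather than after the whole input has been collected, which is the property that matters for the streaming case.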

SQL as the big data analytics language of choice

A further key requirement is the use of a single analytics language, and increasingly SQL is the analytics language of choice for both structured and unstructured data. SQL is of course the primary analytics language for the DBMS, but it is also available through many platforms for Hadoop and NoSQL storage. SQL is also a powerful analytics language for data streams, as SQLstream Blaze has shown. The only difference is that streaming SQL queries execute continuously over moving data streams, rather than repeatedly over stored data at rest. Given that most readers are familiar with SQL analytics over stored data, here are a few examples of typical streaming SQL queries that would be executing in a unified data in motion / data at rest architecture.

As with any SQL platform, the types of analytics and analytical processes for data stream processing fall into four broad areas – alerts, analytics, predictive analytics and aggregation. The main difference is that SQLstream Blaze generates real-time results with millisecond latency, as measured from the time of data arrival. Other operations contribute to more powerful analytics, such as partitions (PARTITION BY), joins (JOIN) and unions (UNION). Some simple examples include:

Alerts.
The requirement is to generate an output alert through a connector to a notification system or other external platform. For example:

     -- Emit an alert row for every San Francisco reading above 100 degrees
     SELECT STREAM ROWTIME, City, Temperature
     FROM WEATHERSTREAM
     WHERE City = 'San Francisco' AND Temperature > 100;

This query processes an input stream of temperature readings and outputs a record containing the time, location and temperature reading for every input reading matching the selection criteria. The latency from the time the data record was emitted by the sensor to the alert being issued can be as little as a few milliseconds, even where data arrive at millions of records per second.
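
For readers who think more easily in procedural terms, the behaviour of the alert query can be approximated with a Python generator. This is only a sketch of the semantics, not how SQLstream executes the query. The tuple layout, the `alerts` function and the default threshold of 100 are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple

# (rowtime in epoch seconds, city, temperature); this layout is an assumption.
Reading = Tuple[int, str, float]


def alerts(
    readings: Iterable[Reading],
    city: str = "San Francisco",
    threshold: float = 100.0,
) -> Iterator[Reading]:
    """Pass through only the readings that match the alert criteria."""
    for rowtime, reading_city, temperature in readings:
        if reading_city == city and temperature > threshold:
            yield (rowtime, reading_city, temperature)


# A short, finite sample standing in for the unbounded input stream.
sample = [
    (0, "San Francisco", 98.5),
    (1, "Oakland", 104.0),
    (2, "San Francisco", 101.2),
]
fired = list(alerts(sample))
```

Each matching reading is emitted as soon as it is seen. Nothing is buffered, which is why a filter of this kind adds so little latency.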

Aggregations (Tumbling Windows)
Streaming aggregations are used extensively in stream processing, and form the basis of continuous aggregation / ETL operations into Hadoop and DBMSs. Queries utilize functions such as AVG, COUNT, MAX, MIN, SUM, STDDEV_POP, STDDEV_SAMP, VAR_POP, VAR_SAMP, and output a record at specified intervals for each GROUP of input records. For example:

     SELECT STREAM
          FLOOR(WEATHERSTREAM.ROWTIME TO MINUTE) AS FLOOR_MINUTE,
          MIN(TEMP) AS MIN_TEMP,
          MAX(TEMP) AS MAX_TEMP,
          AVG(TEMP) AS AVG_TEMP
     FROM WEATHERSTREAM
     GROUP BY FLOOR(WEATHERSTREAM.ROWTIME TO MINUTE);

The result is a stream of new records, one per minute, specifying the maximum, minimum and average readings recorded in that minute. This type of query is particularly useful for generating periodic reports from the input data streams with minimal latency, since each report is emitted as soon as its minute closes.
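
The tumbling-window behaviour can be sketched in plain Python as well. The version below assumes that readings arrive in non-decreasing time order and that row times are integer epoch seconds. Both are assumptions made for the example, and `tumbling_minutes` is a hypothetical name. A window's result is emitted when the first row of a later minute arrives.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def tumbling_minutes(
    readings: Iterable[Tuple[int, float]],
) -> Iterator[Tuple[int, float, float, float]]:
    """Yield (floor_minute, min, max, avg) for each minute that has input.

    Assumes (rowtime, temp) pairs arrive in non-decreasing rowtime order,
    with rowtime given in whole epoch seconds.
    """
    current: Optional[int] = None
    temps: List[float] = []
    for rowtime, temp in readings:
        minute = rowtime - rowtime % 60  # plays the role of FLOOR(ROWTIME TO MINUTE)
        if current is not None and minute != current:
            # A row from a later minute closes the current window.
            yield (current, min(temps), max(temps), sum(temps) / len(temps))
            temps = []
        current = minute
        temps.append(temp)
    if current is not None and temps:
        # Flush the last window; reached only because this sample is finite.
        yield (current, min(temps), max(temps), sum(temps) / len(temps))


sample = [(0, 60.0), (30, 64.0), (59, 62.0), (60, 70.0), (119, 72.0), (185, 55.0)]
per_minute = list(tumbling_minutes(sample))
```

Note that no row is produced for the minute starting at 120 seconds, because no readings fall inside it. That matches what a GROUP BY would do with an empty group.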

Analytics (Windowed Analytics)
Windowed analytics are the basic building block for streaming analytics. Queries generate an incrementally updated output record (or row) for each new input record. Each field (or column) in the output record may be calculated using a different window or partition. Windows can be time or row-based. For example:

     SELECT STREAM    
          ROWTIME,
          MIN(TEMP) OVER W1 AS WMIN_TEMP, 
          MAX(TEMP) OVER W1 AS WMAX_TEMP,
          AVG(TEMP) OVER W1 AS WAVG_TEMP
     FROM WEATHERSTREAM
     WINDOW W1 AS (RANGE INTERVAL '1' MINUTE PRECEDING);

In this query an output record (or row) is generated for each new input record, specifying the updates to the minimum and maximum temperatures over the preceding 60 seconds, plus an incrementally updated average for the temperature over that period.
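
A sliding window differs from a tumbling one in that every input row produces an output row. The Python sketch below models that behaviour under a few stated assumptions: row times are integer epoch seconds, input arrives in time order, and a row exactly 60 seconds old is still inside the window. The function name `sliding_stats` is hypothetical. Recomputing the minimum and maximum on every row keeps the example short; an engine built for throughput would maintain these values incrementally.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_stats(
    readings: Iterable[Tuple[int, float]],
    window_seconds: int = 60,
) -> Iterator[Tuple[int, float, float, float]]:
    """Yield (rowtime, min, max, avg) over the preceding window for every input row."""
    window: Deque[Tuple[int, float]] = deque()
    for rowtime, temp in readings:
        window.append((rowtime, temp))
        # Evict rows that have aged out; the boundary row is kept (assumed inclusive).
        while window[0][0] < rowtime - window_seconds:
            window.popleft()
        temps = [t for _, t in window]
        yield (rowtime, min(temps), max(temps), sum(temps) / len(temps))


sample = [(0, 60.0), (30, 66.0), (60, 63.0), (100, 70.0)]
rolling = list(sliding_stats(sample))
```

The fourth input row arrives at 100 seconds, so the rows from 0 and 30 seconds have aged out and only the last two readings contribute to its result.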

The Advantage of a Unified Data in Motion / Data at Rest architecture

The integration of data stream processing with DBMS and Hadoop can therefore be entirely SQL standards-compliant. At the system level, this allows the data at rest to be updated in real-time, and trend and other stored data to be joined with streaming data. For developers, this offers the ability to:

  • Execute analytics over stored or streaming data using the same SQL queries.
  • Maintain up-to-the-second accuracy of stored data using streaming aggregation / continuous ETL.
  • Build powerful predictive analytics by joining trend and predictive analytics from the DBMS with streaming predictive analytics in SQLstream Blaze.
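
To make the continuous ETL point concrete, here is a small Python sketch that loads per-minute aggregates into a table as they are produced. An in-memory SQLite database stands in for the warehouse, and SQLite's `INSERT OR REPLACE` stands in for a MERGE. The table name `weather_by_minute` and the `load` function are hypothetical, and nothing here reflects how SQLstream's own connectors are implemented.

```python
import sqlite3
from typing import Iterable, Tuple

Aggregate = Tuple[int, float, float, float]  # (floor_minute, min, max, avg)

# In-memory SQLite database standing in for the enterprise data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weather_by_minute ("
    "floor_minute INTEGER PRIMARY KEY, min_temp REAL, max_temp REAL, avg_temp REAL)"
)


def load(connection: sqlite3.Connection, aggregates: Iterable[Aggregate]) -> None:
    """Upsert each aggregate row as it arrives, committing row by row."""
    for row in aggregates:
        connection.execute(
            "INSERT OR REPLACE INTO weather_by_minute VALUES (?, ?, ?, ?)", row
        )
        connection.commit()  # the stored table is current after every row


load(conn, [(0, 60.0, 64.0, 62.0), (60, 70.0, 72.0, 71.0)])
load(conn, [(60, 69.0, 72.0, 70.5)])  # a revised aggregate for an existing minute
stored = conn.execute(
    "SELECT * FROM weather_by_minute ORDER BY floor_minute"
).fetchall()
```

Keying the table on the minute means a revised aggregate overwrites the earlier row instead of duplicating it, so reports that read the table always see one row per minute.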

For the end customer, however, the technical capabilities of an integrated stored/streaming analytics platform translate into significant business benefits:

  • Better strategic decision making through real-time updates of the data driving their business reporting.
  • Improved operational efficiency through real-time insights into their business operations and the ability to automate actions and updates.
  • New revenue streams from extending existing applications with real-time access and performance.