Real-Time Analytics with Stream Processing for Operational Intelligence and the Internet of Things

Systems can deal with real-time data in several ways before it is persisted in a database. Two of the most common open source platforms for this are Apache Storm and Apache Spark (with its Spark Streaming framework), and they take very different approaches to processing data streams. Storm, like SQLstream Blaze, IBM InfoSphere Streams and many others, is a true record-by-record stream processing engine. Others, such as Apache Spark, collect events together for processing in batches. I’ve summarized here the main considerations when deciding which paradigm is most appropriate.

#1 Stream Processing versus batch-based processing of data streams

Data stream processing has two fundamental attributes. First, every record in the system must have a timestamp, which in almost all cases is the time at which the data were created. Second, every record is processed as it arrives. Together, these attributes give a system that can react to the contents of every record and correlate across multiple records over time, even down to millisecond latency. In contrast, approaches such as Spark Streaming process data streams in batches, where each batch contains the events that arrived during the batch period (regardless of when the data were actually created). This is fine for some applications, such as simple counts and ETL into Hadoop, but the lack of true record-by-record processing makes stream processing and time-series analytics impossible.
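To make the difference between creation time and arrival time concrete, here is a minimal Python sketch. It is not taken from Storm, Spark or SQLstream; the record layout, the sample values and the `group_by` helper are all assumptions made up for illustration. It groups the same four records into one-second buckets, first by creation time and then by arrival time.

```python
from collections import defaultdict

# Hypothetical records: (creation_time_s, arrival_time_s, value)
RECORDS = [
    (0.2, 0.3, "a"),
    (0.9, 1.4, "b"),  # created in second 0, arrived in second 1
    (1.1, 1.2, "c"),
    (1.8, 2.5, "d"),  # created in second 1, arrived in second 2
]

def group_by(records, key_index, width_s=1.0):
    """Group record values into fixed-width buckets keyed on one time field."""
    groups = defaultdict(list)
    for rec in records:
        bucket = int(rec[key_index] // width_s)
        groups[bucket].append(rec[2])
    return dict(groups)

by_creation = group_by(RECORDS, 0)  # what a timestamp-driven engine sees
by_arrival = group_by(RECORDS, 1)   # what an arrival-time batch sees

print(by_creation)  # {0: ['a', 'b'], 1: ['c', 'd']}
print(by_arrival)   # {0: ['a'], 1: ['b', 'c'], 2: ['d']}
```

The two groupings disagree as soon as any record is delayed in transit, which is why any count or aggregate computed per arrival batch can differ from the same aggregate computed over creation time.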

#2 Data arriving out of time order is a problem for batch-based processing

Processing data in the real world is a messy business. Data is often of poor quality, records can be missing, and streams arrive with data out of (creation) time order. Data from multiple remote sources may be generated at the same time, but network or other issues can delay some streams. Because a batch-based system groups records by arrival time, these real-world factors are hard to address: detecting missing data and data gaps, or correcting out-of-order data, becomes impossible or at best expensive in computing resources and therefore performance. A record-by-record stream processing platform overcomes this problem simply, because each record has its own timestamp and is processed individually.
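As a rough illustration of how per-record timestamps help, the following Python sketch reorders a stream and flags gaps. It is only a sketch under stated assumptions: the `reorder` and `find_gaps` helpers, the `max_delay` bound and the sample data are hypothetical, and real engines use more sophisticated mechanisms.

```python
import heapq

def reorder(records, max_delay):
    """Yield (timestamp, value) pairs in timestamp order.

    Records are held in a small buffer until the newest timestamp seen
    is at least max_delay ahead of them. A record that arrives later
    than max_delay would still be emitted out of order.
    """
    heap = []
    latest = float("-inf")
    for ts, value in records:
        latest = max(latest, ts)
        heapq.heappush(heap, (ts, value))
        while heap and heap[0][0] <= latest - max_delay:
            yield heapq.heappop(heap)
    while heap:  # flush whatever is left when the input ends
        yield heapq.heappop(heap)

def find_gaps(ordered, expected_interval):
    """Return (previous_ts, next_ts) pairs where records appear to be missing."""
    gaps = []
    previous = None
    for ts, _ in ordered:
        if previous is not None and ts - previous > expected_interval:
            gaps.append((previous, ts))
        previous = ts
    return gaps

# Hypothetical arrivals: timestamps 2 and 7 arrive late, 5 and 6 never arrive.
arrivals = [(1, "a"), (3, "c"), (2, "b"), (4, "d"), (8, "h"), (7, "g")]
ordered = list(reorder(arrivals, max_delay=2))
gaps = find_gaps(ordered, expected_interval=1)

print([ts for ts, _ in ordered])  # [1, 2, 3, 4, 7, 8]
print(gaps)                       # [(4, 7)]
```

The buffer only ever holds the records inside the allowed delay, so the cost of reordering stays small and independent of any batch length.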

#3 Batch length restricts Window-based analytics

Any system that uses batch-based processing of data streams limits the granularity of its response to the batch length. Window operations can be simulated by iterating repeatedly over a series of micro batches, in much the same way as static queries operate over stored data. However, this is expensive in terms of processing resources and adds further to the computational overhead. Processing is also still limited to the arrival time of the data (rather than the time at which the data were created).
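For comparison, a record-by-record sliding window can be maintained incrementally rather than by re-scanning batches. The Python sketch below is an assumption-laden illustration, not any product's implementation: the `SlidingWindowSum` class is hypothetical, and it assumes records arrive in timestamp order.

```python
from collections import deque

class SlidingWindowSum:
    """Running sum over the last `width` time units, updated per record."""

    def __init__(self, width):
        self.width = width
        self.items = deque()  # (timestamp, value) pairs still in the window
        self.total = 0

    def add(self, ts, value):
        """Add one record and return the sum over (ts - width, ts]."""
        self.items.append((ts, value))
        self.total += value
        # Evict records that have fallen out of the window.
        while self.items and self.items[0][0] <= ts - self.width:
            _, old = self.items.popleft()
            self.total -= old
        return self.total

window = SlidingWindowSum(width=10)
stream = [(0, 1), (4, 2), (9, 3), (12, 4), (25, 5)]
results = [window.add(ts, value) for ts, value in stream]
print(results)  # [1, 3, 6, 9, 5]
```

Each record is added once and evicted once, so a result is available on every arrival and the window boundary follows the record timestamps rather than a batch period.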

#4 Spark claims to be faster than Storm but is still performance limited

Spark Streaming’s Java or Scala-based execution architecture is claimed to be 4X to 8X faster than Apache Storm on the WordCount benchmark. However, Apache Storm offers limited performance per server by today’s stream processing standards, although it does scale out over large numbers of servers to gain overall system performance. (This can make larger systems expensive, both in server, power and cooling costs and in the additional complexity of a distributed system.) The point here is that Spark Streaming’s performance can be improved by using larger batches, which may explain the performance increase. But larger batches move further away from real-time processing towards stored batch mode, and exacerbate the issues with stream processing and real-time, time-based analytics.

#5 Writing stream processing operations from scratch is not easy

Batch-based platforms such as Spark Streaming typically offer limited libraries of stream functions that are called programmatically to perform aggregation and counts on the arriving data. Developing a streaming analytics application on Spark Streaming, for example, requires writing code in Java or Scala. Processing data streams is a different paradigm and, moreover, Java is typically 50X less compact than, say, SQL, so significantly more code is required. Java and Scala also require significant garbage collection, which is particularly inefficient and troublesome for in-memory processing.

Real-Time Analytics with Stream Processing for Operational Intelligence and the Internet of Things

There was a lot of interest and discussion at the recent Strata in San Jose around real-time analytics. Seems to be the hot topic, and many have been blogging about it since, including why SQL is a good idea for real-time streaming analytics. I would certainly agree with… Read more →

Stream Processing and Streaming Analytics for Telecommunications

The potential for Big Data in Telecoms is immense, with the global big data market in the telecom sector growing at a CAGR of 55.24% over the period 2011-2015. Streaming integration and analysis of call data records (CDRs and IPDRs) combined with customer, device, location and network data is at… Read more →


Just about everyone has heard the term Big Data, even my most non-technical of friends. Significantly fewer people outside the industry are quite so familiar with the term “self service analytics”, although most can make a stab as to its meaning. Self-service analytics is not a new phenomenon but… Read more →

Real-Time Analytics with Stream Processing for Operational Intelligence and the Internet of Things

This past year has seen Big Data technology mature into Enterprise-class platforms capable of delivering value in the largest of organizations. Hadoop storage platforms and stream processing are now core components of the standard Big Data enterprise architecture, with traditional RDBMS and warehouse platforms regaining their position in the… Read more →

Real-Time Analytics with Stream Processing for Operational Intelligence and the Internet of Things

This week saw the publication of the results for a comparative real-time performance benchmark between Apache Storm and SQLstream Blaze. Using the WordCount example shipped with Hadoop and Apache Storm, we were interested to see just how quickly each could process records.
As it turns out, pretty quickly for… Read more →

Security Internet of Things

Cybersecurity and the Internet of Things are increasingly uncomfortable bedfellows. We’ve blogged before on the security gaps that already exist as a result of connecting yesterday’s technology to the Internet. A recent article by Colin Wood published on goes several steps further and brings us up to date. The… Read more →

Extreme data stream processing performance

The stream processing paradigm differs from the traditional storage-based data management paradigm with which we grew up. Stream processors are fast as they are in-memory (although this is not unusual these days), process data streams record-by-record as they arrive over time or record-based windows using continuous queries (which never… Read more →


Big Data is also about faster results, streaming analytics and real-time actions (as low as millisecond latency) in the case of stream processing, with faster batch operations in the case of Hadoop (a few hours). Fast data also means a different set of data quality issues – which must… Read more →

Real-time to Action for Stream Processing

There’s certainly a vast range of different IoT API, connection protocol technologies and data formats. At first glance, this makes device to device communication tricky, particularly as the Internet of Things encompasses all vendors and technologies. However, the Internet of Things is not necessarily a direct vendor to vendor… Read more →

Real-time to Action for Stream Processing

A large variety of commercial and open source event processing software is available to architects and developers who are building event processing applications. These are sometimes called event processing platforms, complex-event processing (CEP) systems, event stream processing (ESP) systems, or distributed stream computing platforms (DSCPs).
Distinguished analyst Roy Schulte from… Read more →

August 27, 2014 by in Stream Processing
Real-time Big Data Cost of Performance

Data processing technologies perform at different rates, making the Total Cost of Performance a hot topic for feasibility studies concerning Big Data tools including traditional databases, Hadoop and stream processors. The reason is simple. Any storage-based technology must store the data first before they can be queried and processed,… Read more →

July 3, 2014 by in Hadoop
Real-time to Action for Stream Processing

Log analytics has been around for a while but until recently, “real-time log analytics” usually meant
a) slow answers (in many minutes or even a few hours)  and
b) low volumes (data arriving at a few thousand records per second).
Things, needless to say, are changing (could be the recent uprise in… Read more →


A distributed data management architecture is an essential requirement for real-time Big Data applications such as managing IoT sensor and machine data payloads. Smart services for IoT applications will require low latency answers, multiple servers and distributed processing for scalability, plus built-in redundancy for resilient, 24×7 operations.
It’s also important… Read more →

On Thursday, April 24, SQLstream hosted a webinar exploring the potential of the Internet-of-Things. With a focus on monetization, the event expanded on harvesting real-time value from IoT services, discussing technology requirements, security concerns and likely directions for commercialization.
So what is the Internet of Things? To many, it’s about connected devices,… Read more →

SQLstream StreamApps are fast-start templates for real-time streaming machine Big Data applications.  Each StreamApp is a library of components for a specific operational business process. In a Big Data industry typified by a lack of standards and high development costs, StreamApps takes SQLstream’s standards-based SQL platform for streaming operational… Read more →

Just back from the Silicon Valley Comes to Oxford (SVCO), an invite-only event at the Said Business School, University of Oxford in the UK. The aim of the event is to provide insight to the Business School graduates on how to start, scale and run high-growth companies. The speakers… Read more →

Ventana’s Technology Innovation Awards showcase “advances in technology that contribute significantly to improved efficiency, productivity and performance of the organization.” SQLstream’s IT Analytics and Performance award recognized SQLstream’s innovative technology and ability to optimize operational processes and systems. The award considered all aspects of SQLstream’s technology and business approach,… Read more →

The definition of machine data covers, not surprisingly, all data generated by machines – servers, applications, sensors, web feeds, networks and service platforms. It covers everything from data centers, telecommunications networks and services to machine-to-machine and the Internet of Things in a device-connected world.
The value of machine data is… Read more →

Folklore has it that the term ‘Internet of Things’ (IoT) was first popularized in 1999 at MIT to describe the architecture of connected RFID devices. Cisco then looked to define when the IoT came in to being as a concrete entity – defined as the year in which the… Read more →

We participated on the “Architecting Big Data Systems for Speed” panel at E2 Conference. Great event, and a great opportunity to discuss technology in a business context. The panel offered a range of perspectives with other panelists from Translattice and Oracle’s NoSQL division. A number of interesting topics emerged,… Read more →

June 19, 2013 by in Streaming Analytics

Sensors Expo is the leading industry event for the types of intelligent sensor-integrated systems that are driving the next generation Internet of Everything, Industrial Internet, telematics and Machine-to-Machine services. As a key sponsor in the Big Data and Wireless Systems pavilion, and speaking in the Big Data track, we found… Read more →


San Francisco, CA | May 30, 2013 – SQLstream Inc., the Streaming Big Data Company, announced today that SQLstream VP Americas, Glenn Hout, has been invited to speak on real-time operational intelligence and prescriptive analytics for the Internet of Everything at Sensors Expo 2013, Jun 4-6.
Held in Rosemont, Illinois,… Read more →

The Internet of Everything is the new frontier for real-time and Big Data, where Velocity now trumps Volume as the primary driver, where the geographical distribution of streaming data adds new levels of complexity, and yet the useful lifetime of data, the window within which to make a decision,… Read more →

Real-time to Action for Stream Processing

SQLstream sponsored the recent IE Group Big Data Innovation Summit in San Francisco where I also presented on streaming SQL for Hadoop, and extending Hadoop for real-time operational intelligence and streaming analytics. As Big Data technologies and Hadoop push further into mainstream enterprises, so the need for real-time business… Read more →


Today’s edition of Information Management’s DM Radio Broadcast, The Future of Integration: ETL, CDC and IOA, had a great panel line up discussing the breadth of data integration issues in today’s world of Big Data, Cloud and traditional enterprise architectures. The session was hosted by Eric Kavanagh (Bloor) and… Read more →

Streaming SQL for Stream Processing and Streaming Analytics

I have recently researched Infosphere Streams’ “Stream Processing Language” or SPL, after a pretty good talk at a recent SVForum SIG group meeting. As I understand it from the talk, early users of this technology have found that their Version 2.0 Stream Processing Application Declarative Engine (SPADE) programming language has been… Read more →

Real-time Big Data means the streaming, continuous integration of high volume, high velocity data from all sources to all destinations,  coupled with powerful in-memory analytics. It’s a paradigm shift from conventional store and process systems that’s playing well here at the Intelligent Transportation Systems and Solutions World Congress here… Read more →

We’re at the Intelligent Transportation Society’s World Congress in Vienna, Austria next week, Mon 22 – Fri 26 October. The theme for this year is ‘Smarter on the Way’ and this is the first year since 2009 that it’s been held in Europe. The full panoply of intelligent transportation… Read more →

Perhaps the highlight of Oracle OpenWorld last week, or at least, the most commented on by attendees at our booth, seemed to be Larry Ellison’s demo of Exadata and Exalytics – querying 10 days or so of stored twitter feeds with the hope of finding the best US athlete… Read more →

This week we’re at the Intelligent Transportation Society of California’s Annual Conference & Expo in Sacramento. The conference is focussed on the adoption of advanced technologies to improve traveler mobility and heighten safety in California.  Attendees include specialists from government, industry and academia.
This week we’ll be demoing real-time traffic congestion… Read more →

Streaming SQL for Stream Processing and Streaming Analytics

I am going to discuss a SQLstream application for monitoring traffic flow in real-time. In this application, vehicles with GPS enabled devices transmit vehicle position along with other vehicle information such as speed and engine state. SQLstream receives this information as a real-time data stream and uses streaming SQL… Read more →

Streaming SQL for Stream Processing and Streaming Analytics

Last year has been an interesting experience as I participated in a number of stream processing and streaming analytics customer projects for SQLstream. Developing these real-time, stream computing projects greatly increased my appreciation for the advantages of an open, extensible and standards-compliant middleware infrastructure.
For example, I needed to implement… Read more →

Glue Conference 2012, Denver CO, at the end of May was a great conference, well attended with knowledgeable participants, and is the only conference I know that looks at gluing cloud and mobile applications together with a developer focus.
There was the usual wave of NoSQL, cloud storage, cloud platforms… Read more →

We’re at ITS America Annual Meeting, National Harbor, Washington DC  this week (see the team in action below), the yearly opportunity for the US intelligent transportation community to get together and discuss how IT and technology can be used to better serve travelers, industry and government. Real-time, the Smart… Read more →

May 22, 2012 by in Internet of Things

This week I’m attending an interesting conference at UC Berkeley called the “Berkeley conference on Streaming Data”.  The organizers are primarily astronomers and statisticians, but the talks discuss issues and solutions to streaming data problems across a wide selection of scientific areas and engineering applications.  Real-time streaming analytics and Big… Read more →

Joining real-time structured and unstructured data feeds for better accuracy and reliability from your operational intelligence, and the Text Analytics Summit, 2012, London.
Three IT trends have emerged over the past year – Big Data, real-time and the importance of unstructured data. Taking the latter first, there is an increasing… Read more →

Last week SQLstream sponsored and CEO Damian Black presented at Structure Data in New York, a conference exploring “the technical and business opportunities spurred by the growth of big data”.
It’s clear that Big Data has moved on considerably in a very short space of time. From the Silicon Valley,… Read more →

For Big Data, 2012 has started where 2011 left off, with a plethora of reports, articles and blogs. Interestingly, most still begin with the question “what is Big Data”. It appears ‘Big Data’ as a market is broadening its footprint far beyond its open source and Hadoop origins…. Read more →

January 11, 2012 by in Hadoop

The Tutorial blog series helps SQLstream developers build streaming SQL applications. This blog is the third and final part of the Geospatial Visualization tutorial. The first blog in the series set out the streaming use case for connecting SQLstream to a Google Earth visualization, and described the initial steps… Read more →

Streaming SQL for Stream Processing and Streaming Analytics

The Tutorial blog series helps SQLstream developers build streaming SQL applications. This blog is the second in the Geospatial Visualization tutorial.  The first blog in the series set out the streaming use case for connecting SQLstream to a Google Earth visualization, and described the initial steps required to capture… Read more →

A defining feature of the show is the Technology Showcase featuring  demonstrations from some of the technologies and applications that are bringing the future of transportation to life.  Each ‘village’ covers a specific theme such as Safety, Mobility, Environment/Sustainability and Pricing.  Environment/Sustainability focuses on the potential for reducing emissions. … Read more →

SAN FRANCISCO, CA, October 17, 2011 - SQLstream Inc. today announced the public availability of SQLstream ITS Insight, the first real-time solution for reducing congestion to exploit low cost wireless GPS data as a complement to existing fixed-road sensor investment. Transportation Agencies are already benefiting from SQLstream ITS Insight,… Read more →

Visit SQLstream on Booth #1366
Technology and innovation are central themes of this year’s ITS World Congress.  There’s been much written about the issues of congestion, green transportation schemes and improving personal mobility, not least in this blog.  At SQLstream we’ve been playing our part to help revolutionize the Intelligent… Read more →

Streaming SQL for Stream Processing and Streaming Analytics

A streaming SQLstream application will feel very familiar to anyone with some basic knowledge of SQL and traditional RDBMS applications.  SQLstream uses standards-based SQL, except that streaming SQL queries run forever, processing data as they arrive over specified time windows. This blog is the first in a series of… Read more →

The 18th World Congress on Intelligent Transport Systems (ITS) is being held in Orlando from October 16th – 20th, 2011. This is the leading event for intelligent transportation solutions, and attracts a large audience of government, technology and industry professionals. The event seeks to demonstrate advances in the application… Read more →

The latest product update of SQLstream, version 2.5.1, has just been released and shipped.  This will be the final 2.x release prior to the SQLstream 3 launch, and although SQLstream 2.5.1 is predominately a maintenance release, it does include a range of feature enhancements, including:
- Support for exponentially decaying… Read more →

August 10, 2011 by in Streaming Analytics

SQLstream is helping to predict earthquakes across the world in real time. The system has been developed by a consortium of universities and government agencies, with funding from NSF (National Science Foundation), to provide an infrastructure of networked tools for research in ocean science – constructing an internet-based system… Read more →

SQLstream has been powering Mozilla’s Firefox Download Monitor since 2009. A SQLstream based application has been continually aggregating hundreds of millions of download events, receiving minute by minute aggregations via a continuously running SQL SELECT statement using the SQLstream JDBC driver. A continuously running SELECT statement is syntactically and… Read more →

A streaming SQL query is a continuous, standing query that executes over streaming data. Data streams are processed using familiar SQL relational operators augmented to handle time sensitive data. Streaming queries are similar to database queries in how they analyze data; they differ by operating continuously on data as… Read more →

In the game industry, complex game logic needs to be applied to streams of events generated by gameplay.  In single player games, this logic is simply handled by applying the correct computations.  However, in an Internet based social game where millions of players interact together online, the problem takes… Read more →

Businesses need to respond faster than ever to customer information and demands, which are arriving in rapidly increasing volumes from ever more diverse and distributed systems. This need for real-time business models can not be addressed by traditional integration and business intelligence solutions because streaming analytics and related concepts… Read more →
