Blog

Real-Time Analytics with Stream Processing for Operational Intelligence and the Internet of Things

The concept of time is at the core of all Big Data processing technologies, but it is particularly important in data stream processing. Indeed, it is reasonable to say that the way a system handles time-based processing is what separates the wheat from the chaff, at least in real-time stream processing. There are many aspects of time-based processing; we’re going to look at three: first, the ability to generate results in real time; second, the ability to process a time-series data stream in real time; and third, although it may seem obvious, a built-in understanding of time, in particular the time at which data records were created. More on this later.

In essence these factors boil down to the key requirements to consider when rolling out a data stream processing application:

  • The ability to maintain low-latency results (< 10ms) independent of arriving data volume, in particular as data volumes grow to hundreds of thousands of records per second.
  • The ability to process arriving data record by record, essential for real-time analytics, alerting and true time-based processing.
  • A native understanding of time and the ability to process by data creation time, rather than simply wall clock time.

These attributes are discussed further in the following paragraphs.

Real-time Processing

There are many architectures capable of processing an unbounded, time-series data stream. The issue is one of generating answers quickly enough as data volumes increase. I don’t believe the term streaming implies a particular execution engine. An RDBMS or in-memory column store is perfectly capable of processing a data stream and generating timely responses, provided the data volume is low and the acceptable latency is measured in minutes. As data volume increases, it is architectural limitations that define the boundary between using a data management platform and using a data stream management platform. The main difference I see is the trade-off between result latency and data volume: data stream management delivers on both.

Time-series Data Processing

For me, stream processing of time-series data means record-by-record processing, that is, processing each new record as it arrives. Systems such as Spark Streaming take a different approach, using a batch-based architecture. I wouldn’t consider batch-based platforms to be data stream processing platforms; rather, they belong in the wider category of systems that can process a data stream. See “5 reasons why Spark Streaming’s batch processing of data streams is not stream processing” for more details. Batch-based stream processing is a recent phenomenon, and at first glance it is difficult to understand the reasoning behind it, as it appears to be a backwards step. For example, it is severely restrictive when it comes to real-time analytics on data streams. However, there is one use case where it fits: loading data into Hadoop. Where data load is the only requirement, batch-based processing can be a useful approach.
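
To make the difference concrete, here is a minimal Python sketch, not taken from any particular platform, that compares when an alert can fire under record-by-record processing and under a fixed batch interval. The function names, the 5-second batch interval and the sample readings are all illustrative assumptions.

```python
import math

def per_record_alert_times(records, threshold):
    """Record-by-record: each record is evaluated the moment it arrives."""
    return [arrival for arrival, value in records if value > threshold]

def micro_batch_alert_times(records, threshold, batch_interval):
    """Batch-based: a record is only evaluated when its batch closes."""
    alerts = []
    for arrival, value in records:
        if value > threshold:
            # The batch holding this record closes at the next interval boundary.
            batch_close = (math.floor(arrival / batch_interval) + 1) * batch_interval
            alerts.append(batch_close)
    return alerts

# (arrival time in seconds, sensor value); hypothetical data
records = [(0.2, 10), (1.3, 95), (4.9, 12), (6.1, 99)]

immediate = per_record_alert_times(records, threshold=90)
batched = micro_batch_alert_times(records, threshold=90, batch_interval=5)
delays = [round(b - i, 1) for i, b in zip(immediate, batched)]

print(immediate)  # [1.3, 6.1]
print(batched)    # [5, 10]
print(delays)     # [3.7, 3.9]
```

In this toy example the batch interval alone adds several seconds to each alert, before any processing cost is counted. That delay may be perfectly acceptable for a data load and is unlikely to be for real-time alerting.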

Understanding Time

A native capability to understand time is essential for the correct, repeatable and reliable processing of a time-series data stream. Most data stream processing uses time windows to process the arriving data. The simplest approach (used by Spark Streaming, for example) is to process data using wall clock time, that is, the time at which the data arrived as understood by the underlying processing platform. However, wall clock time is unsuitable for most real-world stream processing applications, where alerts and patterns are only valid when based on the time of record creation. Furthermore, data in the real world is often delayed, sometimes significantly, in which case the only way to achieve useful answers is to wait and to process all data streams against data creation time.
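
The effect of a delayed record can be illustrated with a small Python sketch. It counts records per 10-second tumbling window twice, once keyed on creation time and once keyed on arrival (wall clock) time. The field names `created` and `arrived`, the window size and the sample records are assumptions made for the example, not the behavior of any specific product.

```python
from collections import Counter

def tumbling_window_counts(records, time_field, window_seconds):
    """Count records per tumbling window, keyed by window start time."""
    counts = Counter()
    for record in records:
        window_start = (record[time_field] // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))

# Hypothetical records: 'created' is stamped at source, 'arrived' by the platform.
records = [
    {"created": 3, "arrived": 4},
    {"created": 7, "arrived": 8},
    {"created": 9, "arrived": 21},   # delayed in transit by 12 seconds
    {"created": 12, "arrived": 13},
]

by_creation = tumbling_window_counts(records, "created", window_seconds=10)
by_arrival = tumbling_window_counts(records, "arrived", window_seconds=10)

print(by_creation)  # {0: 3, 10: 1}
print(by_arrival)   # {0: 2, 10: 1, 20: 1}
```

Keyed on creation time, the first window correctly contains three records. Keyed on arrival time, the delayed record is counted in a window that starts twenty seconds later, so any alert or pattern computed over the first window would be wrong, and rerunning the same data with different network delays would give a different answer.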

It is a common misconception that wall clock time is the more popular option. It is certainly true that Hadoop-based frameworks are based on wall clock time; however, this is a limitation of the architecture. We see very few use cases for processing by wall clock time, except in two scenarios: (1) where there is no option because the data, for whatever reason, does not contain a timestamp or could not be punctuated at source by the change data capture collectors, and (2) data load for Hadoop, where the timestamp of data creation is not required. A native ability to support both is therefore required, with data creation time being the default in most cases.
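
One way to picture support for both modes is a timestamp selector that prefers creation time and falls back to the wall clock only when a record carries no timestamp. The Python sketch below is an assumption about how such a rule could look; the `created` field, the `processing_time` function and the injectable `clock` parameter are hypothetical names introduced for illustration.

```python
import time

def processing_time(record, clock=time.time):
    """Return (timestamp, source), using creation time when the record has one."""
    created = record.get("created")
    if created is not None:
        return created, "creation"
    # No timestamp at source: fall back to the platform's wall clock.
    return clock(), "wall_clock"

# A fixed clock keeps the example deterministic; real code would use time.time.
fixed_clock = lambda: 1000.0

stamped = processing_time({"created": 42.0, "value": 7}, clock=fixed_clock)
unstamped = processing_time({"value": 7}, clock=fixed_clock)

print(stamped)    # (42.0, 'creation')
print(unstamped)  # (1000.0, 'wall_clock')
```

Returning the source alongside the timestamp lets downstream logic tell which results rest on creation time and which rest on arrival time.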

Summary

In summary, not all data stream processing platforms are alike. The key consideration is the use case. The simplest case is Hadoop data load, for which there are numerous solutions of varying throughput capacity and ease of configuration, and where processing by wall clock time and batch-based architectures may be sufficient. However, if the requirement is real-time alerting and analytics, the integration of more sophisticated predictive analytics on data streams, or driving automated actions, then processing by data creation time and the ability to process each record as it arrives are essential.

Streaming SQL for Stream Processing and Streaming Analytics

Gartner has predicted that the data processing market will grow to a value of $55B by 2016 and that much of that value is held in the data, waiting to be exploited. Maximizing the potential means analyzing data in motion, as soon as the data are created, as well… Read more →

Real-time Traffic Apps built on a stream processing platform

Apps have been the single most important IT innovation of recent times. Apps bridge the gap between users and data, providing immediate access to information when it’s most convenient, presented in a way that’s accessible and useful. However, app development, and in particular the backend platform capability, has struggled… Read more →

There are undoubtedly several approaches to the way systems deal with real-time data before it is persisted in a database. For example, two of the most common open source platforms for this are Apache Storm and Apache Spark (with its Spark Streaming framework), and both take a very different… Read more →

There was a lot of interest and discussion at the recent Strata in San Jose around real-time analytics. Seems to be the hot topic, and many have been blogging about it since, including why SQL is a good idea for real-time streaming analytics. I would certainly agree with… Read more →

Stream Processing and Streaming Analytics for Telecommunications

The potential for Big Data in Telecoms is immense, with the global big data market in the telecom sector growing at a CAGR of 55.24% over the period 2011-2015. Streaming integration and analysis of call data records (CDRs and IPDRs) combined with customer, device, location and network data is at… Read more →

Just about everyone has heard the term Big Data, even my most non-technical of friends. Significantly fewer people outside the industry are quite so familiar with the term “self service analytics”, although most can make a stab as to its meaning. Self-service analytics is not a new phenomenon but… Read more →

This past year has seen Big Data technology mature into Enterprise-class platforms capable of delivering value in the largest of organizations. Hadoop storage platforms and stream processing are now core components of the standard Big Data enterprise architecture, with traditional RDBMS and warehouse platforms regaining their position in the… Read more →

This week saw the publication of the results for a comparative real-time performance benchmark between Apache Storm and SQLstream Blaze. Using the WordCount example shipped with Hadoop and Apache Storm, we were interested to see just how quickly each could process records.
As it turns out, pretty quickly for… Read more →

Security Internet of Things

Cybersecurity and the Internet of Things are increasingly uncomfortable bedfellows. We’ve blogged before on the security gaps that already exist as a result of connecting yesterday’s technology to the Internet. A recent article by Colin Wood published on govtech.com goes several steps further and brings us up to date. The… Read more →

Extreme data stream processing performance

The stream processing paradigm differs from the traditional storage-based data management paradigm with which we grew up. Stream processors are fast as they are in-memory (although this is not unusual these days), process data streams record-by-record as they arrive over time or record-based windows using continuous queries (which never… Read more →

Big Data is also about faster results, streaming analytics and real-time actions (as low as millisecond latency) in the case of stream processing, with faster batch operations in the case of Hadoop (a few hours). Fast data also means a different set of data quality issues – which must… Read more →

Real-time to Action for Stream Processing

There’s certainly a vast range of different IoT API, connection protocol technologies and data formats. At first glance, this makes device to device communication tricky, particularly as the Internet of Things encompasses all vendors and technologies. However, the Internet of Things is not necessarily a direct vendor to vendor… Read more →

A large variety of commercial and open source event processing software is available to architects and developers who are building event processing applications. These are sometimes called event processing platforms, complex-event processing (CEP) systems, event stream processing (ESP) systems, or distributed stream computing platforms (DSCPs).
Distinguished analyst Roy Schulte from… Read more →

August 27, 2014 by in Stream Processing
Real-time Big Data Cost of Performance

Data processing technologies perform at different rates, making the Total Cost of Performance a hot topic for feasibility studies concerning Big Data tools including traditional databases, Hadoop and stream processors. The reason is simple. Any storage-based technology must store the data first before they can be queried and processed,… Read more →

July 3, 2014 by in Hadoop

Log analytics has been around for a while but until recently, “real-time log analytics” usually meant
a) slow answers (in many minutes or even a few hours)  and
b) low volumes (data arriving at a few thousand records per second).
Things, needless to say, are changing (could be the recent uprise in… Read more →

A distributed data management architecture is an essential requirement for real-time Big Data applications such as managing IoT sensor and machine data payloads. Smart services for IoT applications will require low latency answers, multiple servers and distributed processing for scalability, plus built-in redundancy for resilient, 24×7 operations.
It’s also important… Read more →

On Thursday, April 24, SQLstream hosted a webinar exploring the potential of the Internet-of-Things. With a focus on monetization, the event expanded on harvesting real-time value from IoT services, discussing technology requirements, security concerns and likely directions for commercialization.
So what is the Internet of Things? To many, it’s about connected devices,… Read more →

SQLstream StreamApps are fast-start templates for real-time streaming machine Big Data applications.  Each StreamApp is a library of components for a specific operational business process. In a Big Data industry typified by a lack of standards and high development costs, StreamApps takes SQLstream’s standards-based SQL platform for streaming operational… Read more →

Just back from the Silicon Valley Comes to Oxford (SVCO), an invite-only event at the Said Business School, University of Oxford in the UK. The aim of the event is to provide insight to the Business School graduates on how to start, scale and run high-growth companies. The speakers… Read more →

Ventana’s Technology Innovation Awards showcase “advances in technology that contribute significantly to improved efficiency, productivity and performance of the organization.” SQLstream’s IT Analytics and Performance award recognized SQLstream’s innovative technology and ability to optimize operational processes and systems. The award considered all aspects of SQLstream’s technology and business approach,… Read more →

The definition of machine data covers, not surprisingly, all data generated by machines – servers, applications, sensors, web feeds, networks and service platforms. It covers everything from data centers, telecommunications networks and services to machine-to-machine and the Internet of Things in a device-connected world.
The value of machine data is… Read more →

Folklore has it that the term ‘Internet of Things’ (IoT) was first popularized in 1999 at MIT to describe the architecture of connected RFID devices. Cisco then looked to define when the IoT came into being as a concrete entity – defined as the year in which the… Read more →

We participated on the “Architecting Big Data Systems for Speed” panel at E2 Conference. Great event, and a great opportunity to discuss technology in a business context. The panel offered a range of perspectives with other panelists from Translattice and Oracle’s NoSQL division. A number of interesting topics emerged,… Read more →

June 19, 2013 by in Streaming Analytics

Sensors Expo is the leading industry event for the types of intelligent sensor-integrated systems that are driving the next generation Internet of Everything, Industrial Internet, telematics and Machine-to-Machine services. As a key sponsor in the Big Data and Wireless Systems pavilion, and speaking in the Big Data track, we found… Read more →

San Francisco, CA | May 30, 2013 – SQLstream Inc., the Streaming Big Data Company, announced today that SQLstream VP Americas, Glenn Hout, has been invited to speak on real-time operational intelligence and prescriptive analytics for the Internet of Everything at Sensors Expo 2013, Jun 4-6.
Held in Rosemont, Illinois,… Read more →

The Internet of Everything is the new frontier for real-time and Big Data, where Velocity now trumps Volume as the primary driver, where the geographical distribution of streaming data adds new levels of complexity, and yet the useful lifetime of data, the window within which to make a decision,… Read more →

SQLstream sponsored the recent IE Group Big Data Innovation Summit in San Francisco where I also presented on streaming SQL for Hadoop, and extending Hadoop for real-time operational intelligence and streaming analytics. As Big Data technologies and Hadoop push further into mainstream enterprises, so the need for real-time business… Read more →

Today’s edition of Information Management’s DM Radio Broadcast, The Future of Integration: ETL, CDC and IOA, had a great panel line up discussing the breadth of data integration issues in today’s world of Big Data, Cloud and traditional enterprise architectures. The session was hosted by Eric Kavanagh (Bloor) and… Read more →

I have recently researched Infosphere Streams’ “Stream Processing Language” or SPL, after a pretty good talk at a recent SVForum SIG group meeting. As I understand it from the talk, early users of this technology have found that their Version 2.0 Stream Processing Application Declarative Engine (SPADE) programming language has been… Read more →

Real-time Big Data means the streaming, continuous integration of high volume, high velocity data from all sources to all destinations,  coupled with powerful in-memory analytics. It’s a paradigm shift from conventional store and process systems that’s playing well here at the Intelligent Transportation Systems and Solutions World Congress here… Read more →

We’re at the Intelligent Transportation Society’s World Congress in Vienna, Austria next week, Mon 22 – Fri 26 October. The theme for this year is ‘Smarter on the Way’ and this is the first year since 2009 that it’s been held in Europe. The full panoply of intelligent transportation… Read more →

Perhaps the highlight of Oracle OpenWorld last week, or at least, the most commented on by attendees at our booth, seemed to be Larry Ellison’s demo of Exadata and Exalytics – querying 10 days or so of stored twitter feeds with the hope of finding the best US athlete… Read more →

This week we’re at the Intelligent Transportation Society of California’s Annual Conference & Expo in Sacramento. The conference is focussed on the adoption of advanced technologies to improve traveler mobility and heighten safety in California.  Attendees include specialists from government, industry and academia.
This week we’ll be demoing real-time traffic congestion… Read more →

I am going to discuss a SQLstream application for monitoring traffic flow in real-time. In this application, vehicles with GPS enabled devices transmit vehicle position along with other vehicle information such as speed and engine state. SQLstream receives this information as a real-time data stream and uses streaming SQL… Read more →

Last year has been an interesting experience as I participated in a number of stream processing and streaming analytics customer projects for SQLstream. Developing these real-time, stream computing projects greatly increased my appreciation for the advantages of an open, extensible and standards-compliant middleware infrastructure.
For example, I needed to implement… Read more →

Glue Conference 2012 in Denver, CO, at the end of May was a great conference, well attended with knowledgeable participants, and is the only conference I know of that looks at gluing cloud and mobile applications together with a developer focus.
There was the usual wave of NoSQL, cloud storage, cloud platforms… Read more →

We’re at ITS America Annual Meeting, National Harbor, Washington DC  this week (see the team in action below), the yearly opportunity for the US intelligent transportation community to get together and discuss how IT and technology can be used to better serve travelers, industry and government. Real-time, the Smart… Read more →

May 22, 2012 by in Internet of Things

This week I’m attending an interesting conference at UC Berkeley called the “Berkeley conference on Streaming Data”.  The organizers are primarily astronomers and statisticians, but the talks discuss issues and solutions to streaming data problems across a wide selection of scientific areas and engineering applications.  Real-time streaming analytics and Big… Read more →

Joining real-time structured and unstructured data feeds for better accuracy and reliability from your operational intelligence, and the Text Analytics Summit, 2012, London.
Three IT trends have emerged over the past year – Big Data, real-time and the importance of unstructured data. Taking the latter first, there is an increasing… Read more →

Last week SQLstream sponsored and CEO Damian Black presented at Structure Data in New York, a conference exploring “the technical and business opportunities spurred by the growth of big data”.
It’s clear that Big Data has moved on considerably in a very short space of time. From the Silicon Valley,… Read more →

For Big Data, 2012 has started where 2011 left off, with a plethora of reports, articles and blogs. Interestingly, most still begin with the question “what is Big Data”. It appears ‘Big Data’ as a market is broadening its footprint far beyond its open source and Hadoop origins…. Read more →

January 11, 2012 by in Hadoop

The Tutorial blog series helps SQLstream developers build streaming SQL applications. This blog is the third and final part of the Geospatial Visualization tutorial. The first blog in the series set out the streaming use case for connecting SQLstream to a Google Earth visualization, and described the initial steps… Read more →

The Tutorial blog series helps SQLstream developers build streaming SQL applications. This blog is the second in the Geospatial Visualization tutorial.  The first blog in the series set out the streaming use case for connecting SQLstream to a Google Earth visualization, and described the initial steps required to capture… Read more →

The real-time data hub for smart city, intelligent transportation and Internet of Things appliances

A defining feature of the show is the Technology Showcase featuring demonstrations from some of the technologies and applications that are bringing the future of transportation to life. Each ‘village’ covers a specific theme such as Safety, Mobility, Environment/Sustainability and Pricing.  Environment/Sustainability focuses on the potential for reducing emissions.  Interesting… Read more →

SAN FRANCISCO, CA, October 17, 2011 – SQLstream Inc. today announced the public availability of SQLstream ITS Insight, the first real-time solution for reducing congestion to exploit low cost wireless GPS data as a complement to existing fixed-road sensor investment. Transportation Agencies are already benefiting from SQLstream ITS Insight,… Read more →

Visit SQLstream on Booth #1366
Technology and innovation are central themes of this year’s ITS World Congress.  There’s been much written about the issues of congestion, green transportation schemes and improving personal mobility, not least in this blog.  At SQLstream we’ve been playing our part to help revolutionize the Intelligent… Read more →

A streaming SQLstream application will feel very familiar to anyone with some basic knowledge of SQL and traditional RDBMS applications.  SQLstream uses standards-based SQL, except that streaming SQL queries run forever, processing data as they arrive over specified time windows. This blog is the first in a series of… Read more →

The 18th World Congress on Intelligent Transport Systems (ITS) is being held in Orlando from October 16th – 20th, 2011. This is the leading event for intelligent transportation solutions, and attracts a large audience of government, technology and industry professionals. The event seeks to demonstrate advances in the application… Read more →

The latest product update of SQLstream, version 2.5.1, has just been released and shipped.  This will be the final 2.x release prior to the SQLstream 3 launch, and although SQLstream 2.5.1 is predominately a maintenance release, it does include a range of feature enhancements, including:
– Support for exponentially decaying… Read more →

August 10, 2011 by in Streaming Analytics

SQLstream is helping to predict earthquakes across the world in real time. The system has been developed by a consortium of universities and government agencies, with funding from NSF (National Science Foundation), to provide an infrastructure of networked tools for research in ocean science – constructing an internet-based system… Read more →

SQLstream has been powering Mozilla’s Firefox Download Monitor since 2009. A SQLstream based application has been continually aggregating hundreds of millions of download events, receiving minute by minute aggregations via a continuously running SQL SELECT statement using the SQLstream JDBC driver. A continuously running SELECT statement is syntactically and… Read more →

A streaming SQL query is a continuous, standing query that executes over streaming data. Data streams are processed using familiar SQL relational operators augmented to handle time sensitive data. Streaming queries are similar to database queries in how they analyze data; they differ by operating continuously on data as… Read more →

In the game industry, complex game logic needs to be applied to streams of events generated by gameplay.  In single player games, this logic is simply handled by applying the correct computations.  However, in an Internet based social game where millions of players interact together online, the problem takes… Read more →

Businesses need to respond faster than ever to customer information and demands, which are arriving in rapidly increasing volumes from ever more diverse and distributed systems. This need for real-time business models can not be addressed by traditional integration and business intelligence solutions because streaming analytics and related concepts… Read more →
