Real-Time Analytics with Stream Processing for Operational Intelligence and the Internet of Things

This week saw the publication of the results for a comparative real-time performance benchmark between Apache Storm and SQLstream Blaze. Using the WordCount example shipped with Hadoop and Apache Storm, we were interested to see just how quickly each could process records.

As it turns out, pretty quickly for Blaze: around 4.6 million records per second per 8-core server. Less so for Apache Storm. The 4.6 million number is more or less equivalent to processing the Complete Works of Shakespeare (a little under 1 million words according to the Internet) five times every second. Although why you might want to do that, I’m not sure. The difference in performance between SQLstream Blaze and Apache Storm is all the more impressive because the WordCount benchmark does not expose one of Storm’s major weaknesses: it has no native concept of time-based processing over time windows. That happens to be a particular strength of SQLstream, and all real-world use cases for streaming analytics require this capability.
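To make "time-based processing over time windows" concrete, here is a minimal sketch of a windowed WordCount in plain Python. It is illustrative only: it uses neither SQLstream's nor Storm's API, the function name and sample records are made up for this example, and it works on a finished list rather than a live stream.

```python
from collections import Counter, defaultdict


def tumbling_word_counts(records, window_seconds=1.0):
    """Count words per fixed, non-overlapping (tumbling) time window.

    `records` is an iterable of (timestamp_in_seconds, text) pairs.
    Returns a dict mapping each window's start time to a Counter of words.
    """
    windows = defaultdict(Counter)
    for timestamp, text in records:
        # Every record belongs to exactly one window, identified by its start time.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start].update(text.lower().split())
    return dict(windows)


# Hypothetical sample input: three timestamped lines of text.
records = [
    (0.2, "to be or not to be"),
    (0.9, "that is the question"),
    (1.4, "to sleep perchance to dream"),
]

counts = tumbling_word_counts(records)
for start in sorted(counts):
    print(start, dict(counts[start]))
```

A real stream processor does the same grouping continuously, emitting each window's result as the window closes rather than waiting for the input to end.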

That said, the key point here is not about the fastest performance, it’s about what that means in real terms. The cost of Big Data and real-time systems has come into focus more over the past year. Although Big Data technologies utilize commodity hardware, when you need 100+ boxes to get any reasonable performance, that can become expensive. Faster performance per server means fewer servers.

The Cost of Performance for Stream Processing

Throughput performance offers confidence of future scalability but also translates directly into the monthly and lifetime costs for the solution. The average bare metal cloud server is around $500 – $2000 per month depending on the specification and I/O bandwidth.

| ROI Metric | Apache Storm | SQLstream Blaze | Blaze ROI |
| --- | --- | --- | --- |
| Hardware Cost / Month (processing 5 million words per second using bare metal 8-core cloud servers at $500/month) | 121 servers, $60,500 / month | 2 servers, $1,000 / month | SQLstream Blaze: 60X reduction in infrastructure cost versus Apache Storm. |

Even though storage is less of a consideration with a stream processing platform, why deploy 100+ servers when the same work could be carried out on two or three? This means a significant reduction in server costs (60X reduction even considering just the simple WordCount benchmark), but also much simpler manageability and platform stability going forward.
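The arithmetic behind those hardware figures can be checked with a short Python sketch. The Blaze server count follows from the 4.6 million records per second per server quoted above. The post does not state Storm's per-server rate, so the 121-server figure is taken as given and the per-server rate it implies is derived from it. The function and variable names are illustrative.

```python
import math

PRICE_PER_SERVER = 500            # USD per month, low end of the quoted range
TARGET_RATE = 5_000_000           # words per second to be processed
BLAZE_RATE_PER_SERVER = 4_600_000 # records per second per 8-core server


def servers_needed(target_rate, per_server_rate):
    """Smallest whole number of servers that can sustain the target rate."""
    return math.ceil(target_rate / per_server_rate)


def monthly_cost(servers, price_per_server=PRICE_PER_SERVER):
    """Monthly hardware cost in USD for a given number of servers."""
    return servers * price_per_server


blaze_servers = servers_needed(TARGET_RATE, BLAZE_RATE_PER_SERVER)
storm_servers = 121  # taken from the benchmark table, not computed here

blaze_cost = monthly_cost(blaze_servers)
storm_cost = monthly_cost(storm_servers)
cost_ratio = storm_cost / blaze_cost

# Per-server Storm throughput implied by needing 121 servers for the target.
implied_storm_rate = TARGET_RATE / storm_servers

print(f"Blaze: {blaze_servers} servers, ${blaze_cost:,} / month")
print(f"Storm: {storm_servers} servers, ${storm_cost:,} / month")
print(f"Cost ratio: {cost_ratio:.1f}X")
print(f"Implied Storm rate: {implied_storm_rate:,.0f} words/sec per server")
```

At the upper end of the quoted price range ($2000 per server per month) the ratio is unchanged, but the absolute monthly saving quadruples.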

Time to Value

Time to value, system stability and agility for new requirements are also front of mind for CIOs when deploying Big Data platforms. Time to value is the time taken to get to an operational system from scratch. It’s swings and roundabouts with Hadoop and stream processing frameworks such as Storm. On one hand, anything is possible, an important consideration when processing unstructured machine data, but on the other, it takes time and money to build from scratch, often at the expense of solution reuse and the ability to change for new requirements. The WordCount example is trivial in both SQLstream Blaze and Storm, but in real-world scenarios, such as covered in a previous independent benchmark between SQLstream and Storm, time to value is an important consideration.

| ROI Metric | Apache Storm | SQLstream Blaze | Blaze ROI |
| --- | --- | --- | --- |
| Development Effort (from download to operations, based on a customer 4G network performance monitoring app) | 6 months (180 days) | 1 week (5 days) | SQLstream Blaze delivers robust operational systems 30X faster. |

This is where the power of SQL for stream processing comes to the fore: powerful analytics, delivered quickly, on a stable platform. It also brings pre-built adapters, integrated real-time dashboards for streaming analytics, and out-of-the-box integration for continuous ETL and stream persistence with Hadoop HDFS and HBase, plus a range of other RDBMS and data warehouses.

In summary, performance does matter. Stream processing offers scalability for systems at the junction of fast data and big data. But stream processing platforms are not all equal. As Shakespeare said, “Time travels in divers paces with divers persons” (As You Like It). For SQLstream’s customers, it gallops, and we can keep up.

Security Internet of Things

Cybersecurity and the Internet of Things are increasingly uncomfortable bedfellows. We’ve blogged before on the security gaps that already exist as a result of connecting yesterday’s technology to the Internet. A recent article by Colin Wood published on govtech.com goes several steps further and brings us up to date. The… Read more →

Extreme data stream processing performance

The stream processing paradigm differs from the traditional storage-based data management paradigm with which we grew up. Stream processors are fast as they are in-memory (although this is not unusual these days), process data streams record-by-record as they arrive over time or record-based windows using continuous queries (which never… Read more →

Big Data is also about faster results, streaming analytics and real-time actions (as low as millisecond latency) in the case of stream processing, with faster batch operations in the case of Hadoop (a few hours). Fast data also means a different set of data quality issues – which must… Read more →

Real-time to Action for Stream Processing

There’s certainly a vast range of different IoT API, connection protocol technologies and data formats. At first glance, this makes device to device communication tricky, particularly as the Internet of Things encompasses all vendors and technologies. However, the Internet of Things is not necessarily a direct vendor to vendor… Read more →

A large variety of commercial and open source event processing software is available to architects and developers who are building event processing applications. These are sometimes called event processing platforms, complex-event processing (CEP) systems, event stream processing (ESP) systems, or distributed stream computing platforms (DSCPs).
Distinguished analyst Roy Schulte from… Read more →

August 27, 2014 by in Stream Processing
Real-time Big Data Cost of Performance

Data processing technologies perform at different rates, making the Total Cost of Performance a hot topic for feasibility studies concerning Big Data tools including traditional databases, Hadoop and stream processors. The reason is simple. Any storage-based technology must store the data first before they can be queried and processed,… Read more →

July 3, 2014 by in Hadoop
Log analytics has been around for a while but until recently, “real-time log analytics” usually meant
a) slow answers (in many minutes or even a few hours)  and
b) low volumes (data arriving at a few thousand records per second).
Things, needless to say, are changing (could be the recent uprise in… Read more →

A distributed data management architecture is an essential requirement for real-time Big Data applications such as managing IoT sensor and machine data payloads. Smart services for IoT applications will require low latency answers, multiple servers and distributed processing for scalability, plus built-in redundancy for resilient, 24×7 operations.
It’s also important… Read more →

On Thursday, April 24, SQLstream hosted a webinar exploring the potential of the Internet-of-Things. With a focus on monetization, the event expanded on harvesting real-time value from IoT services, discussing technology requirements, security concerns and likely directions for commercialization.
So what is the Internet of Things? To many, it’s about connected devices,… Read more →

SQLstream StreamApps are fast-start templates for real-time streaming machine Big Data applications.  Each StreamApp is a library of components for a specific operational business process. In a Big Data industry typified by a lack of standards and high development costs, StreamApps takes SQLstream’s standards-based SQL platform for streaming operational… Read more →

Just back from the Silicon Valley Comes to Oxford (SVCO), an invite-only event at the Said Business School, University of Oxford in the UK. The aim of the event is to provide insight to the Business School graduates on how to start, scale and run high-growth companies. The speakers… Read more →

Ventana’s Technology Innovation Awards showcase “advances in technology that contribute significantly to improved efficiency, productivity and performance of the organization.” SQLstream’s IT Analytics and Performance award recognized SQLstream’s innovative technology and ability to optimize operational processes and systems. The award considered all aspects of SQLstream’s technology and business approach,… Read more →

The definition of machine data covers, not surprisingly, all data generated by machines – servers, applications, sensors, web feeds, networks and service platforms. It covers everything from data centers, telecommunications networks and services to machine-to-machine and the Internet of Things in a device-connected world.
The value of machine data is… Read more →

Folklore has it that the term ‘Internet of Things’ (IoT) was first popularized in 1999 at MIT to describe the architecture of connected RFID devices. Cisco then looked to define when the IoT came in to being as a concrete entity – defined as the year in which the… Read more →

We participated on the “Architecting Big Data Systems for Speed” panel at E2 Conference. Great event, and a great opportunity to discuss technology in a business context. The panel offered a range of perspectives with other panelists from Translattice and Oracle’s NoSQL division. A number of interesting topics emerged,… Read more →

June 19, 2013 by in Streaming Analytics

Sensors Expo is the leading industry event for the types of intelligent sensor-integrated systems that are driving the next generation Internet of Everything, Industrial Internet, telematics and Machine-to-Machine services. As a key sponsor in the Big Data and Wireless Systems pavilion, and speaking in the Big Data track, we found… Read more →

San Francisco, CA | May 30, 2013 – SQLstream Inc., the Streaming Big Data Company, announced today that SQLstream VP Americas, Glenn Hout, has been invited to speak on real-time operational intelligence and prescriptive analytics for the Internet of Everything at Sensors Expo 2013, Jun 4-6.
Held in Rosemont, Illinois,… Read more →

The Internet of Everything is the new frontier for real-time and Big Data, where Velocity now trumps Volume as the primary driver, where the geographical distribution of streaming data adds new levels of complexity, and yet the useful lifetime of data, the window within which to make a decision,… Read more →

SQLstream sponsored the recent IE Group Big Data Innovation Summit in San Francisco where I also presented on streaming SQL for Hadoop, and extending Hadoop for real-time operational intelligence and streaming analytics. As Big Data technologies and Hadoop push further into mainstream enterprises, so the need for real-time business… Read more →

Today’s edition of Information Management’s DM Radio Broadcast, The Future of Integration: ETL, CDC and IOA, had a great panel line up discussing the breadth of data integration issues in today’s world of Big Data, Cloud and traditional enterprise architectures. The session was hosted by Eric Kavanagh (Bloor) and… Read more →

Real-time Big Data means the streaming, continuous integration of high volume, high velocity data from all sources to all destinations,  coupled with powerful in-memory analytics. It’s a paradigm shift from conventional store and process systems that’s playing well here at the Intelligent Transportation Systems and Solutions World Congress here… Read more →

We’re at the Intelligent Transportation Society’s World Congress in Vienna, Austria next week, Mon 22 – Fri 26 October. The theme for this year is ‘Smarter on the Way’ and this is the first year since 2009 that it’s been held in Europe. The full panoply of intelligent transportation… Read more →

Perhaps the highlight of Oracle OpenWorld last week, or at least, the most commented on by attendees at our booth, seemed to be Larry Ellison’s demo of Exadata and Exalytics – querying 10 days or so of stored twitter feeds with the hope of finding the best US athlete… Read more →

This week we’re at the Intelligent Transportation Society of California’s Annual Conference & Expo in Sacramento. The conference is focussed on the adoption of advanced technologies to improve traveler mobility and heighten safety in California.  Attendees include specialists from government, industry and academia.
This week we’ll be demoing real-time traffic congestion… Read more →

Glue Conference 2012, Denver CO, at the end of May was a great conference: well attended, with knowledgeable participants, and the only conference I know that looks at gluing cloud and mobile applications together with a developer focus.
There was the usual wave of NoSQL, cloud storage, cloud platforms… Read more →

We’re at ITS America Annual Meeting, National Harbor, Washington DC  this week (see the team in action below), the yearly opportunity for the US intelligent transportation community to get together and discuss how IT and technology can be used to better serve travelers, industry and government. Real-time, the Smart… Read more →

May 22, 2012 by in Internet of Things

This week I’m attending an interesting conference at UC Berkeley called the “Berkeley conference on Streaming Data”.  The organizers are primarily astronomers and statisticians, but the talks discuss issues and solutions to streaming data problems across a wide selection of scientific areas and engineering applications.  Real-time streaming analytics and Big… Read more →

Joining real-time structured and unstructured data feeds for better accuracy and reliability from your operational intelligence, and the Text Analytics Summit, 2012, London.
Three IT trends have emerged over the past year – Big Data, real-time and the importance of unstructured data. Taking the latter first, there is an increasing… Read more →

Last week SQLstream sponsored and CEO Damian Black presented at Structure Data in New York, a conference exploring “the technical and business opportunities spurred by the growth of big data”.
It’s clear that Big Data has moved on considerably in a very short space of time. From the Silicon Valley,… Read more →

For Big Data, 2012 has started where 2011 left off, with a plethora of reports, articles and blogs. Interestingly, most still begin with the question “what is Big Data”. It appears ‘Big Data’ as a market is broadening its footprint far beyond its open source and Hadoop origins…. Read more →

January 11, 2012 by in Hadoop

The Tutorial blog series helps SQLstream developers build streaming SQL applications. This blog is the third and final part of the Geospatial Visualization tutorial. The first blog in the series set out the streaming use case for connecting SQLstream to a Google Earth visualization, and described the initial steps… Read more →

The Tutorial blog series helps SQLstream developers build streaming SQL applications. This blog is the second in the Geospatial Visualization tutorial.  The first blog in the series set out the streaming use case for connecting SQLstream to a Google Earth visualization, and described the initial steps required to capture… Read more →

A defining feature of the show is the Technology Showcase featuring  demonstrations from some of the technologies and applications that are bringing the future of transportation to life.  Each ‘village’ covers a specific theme such as Safety, Mobility, Environment/Sustainability and Pricing.  Environment/Sustainability focuses on the potential for reducing emissions. … Read more →

SAN FRANCISCO, CA, October 17, 2011 - SQLstream Inc. today announced the public availability of SQLstream ITS Insight, the first real-time solution for reducing congestion to exploit low cost wireless GPS data as a complement to existing fixed-road sensor investment. Transportation Agencies are already benefiting from SQLstream ITS Insight,… Read more →

Visit SQLstream on Booth #1366
Technology and innovation are central themes of this year’s ITS World Congress.  There’s been much written about the issues of congestion, green transportation schemes and improving personal mobility, not least in this blog.  At SQLstream we’ve been playing our part to help revolutionize the Intelligent… Read more →

A streaming SQLstream application will feel very familiar to anyone with some basic knowledge of SQL and traditional RDBMS applications.  SQLstream uses standards-based SQL, except that streaming SQL queries run forever, processing data as they arrive over specified time windows.
This blog is the first in a series of tutorials… Read more →

The 18th World Congress on Intelligent Transport Systems (ITS) is being held in Orlando from October 16th – 20th, 2011. This is the leading event for intelligent transportation solutions, and attracts a large audience of government, technology and industry professionals. The event seeks to demonstrate advances in the application… Read more →

I am going to discuss a SQLstream application for monitoring traffic flow in real-time. In this application, vehicles with GPS enabled devices transmit vehicle position along with other vehicle information such as speed and engine state. SQLstream receives this information as a real-time data stream and uses streaming SQL… Read more →

The latest product update of SQLstream, version 2.5.1, has just been released and shipped.  This will be the final 2.x release prior to the SQLstream 3 launch, and although SQLstream 2.5.1 is predominately a maintenance release, it does include a range of feature enhancements, including:
- Support for exponentially decaying… Read more →

August 10, 2011 by in Streaming Analytics

SQLstream is helping to predict earthquakes across the world in real time. The system has been developed by a consortium of universities and government agencies, with funding from NSF (National Science Foundation), to provide an infrastructure of networked tools for research in ocean science – constructing an internet-based system… Read more →

SQLstream has been powering Mozilla’s Firefox Download Monitor since 2009. A SQLstream based application has been continually aggregating hundreds of millions of download events, receiving minute by minute aggregations via a continuously running SQL SELECT statement using the SQLstream JDBC driver. A continuously running SELECT statement is syntactically and… Read more →

A streaming SQL query is a continuous, standing query that executes over streaming data. Data streams are processed using familiar SQL relational operators augmented to handle time sensitive data. Streaming queries are similar to database queries in how they analyze data; they differ by operating continuously on data as… Read more →

In the game industry, complex game logic needs to be applied to streams of events generated by gameplay.  In single player games, this logic is simply handled by applying the correct computations.  However, in an Internet based social game where millions of players interact together online, the problem takes… Read more →

Businesses need to respond faster than ever to customer information and demands, which are arriving in rapidly increasing volumes from ever more diverse and distributed systems. This need for real-time business models cannot be addressed by traditional integration and business intelligence solutions because streaming analytics and related concepts… Read more →

Last year was an interesting experience as I participated in a number of customer “Proof Of Concept” projects for SQLstream. Developing these real-time, stream computing projects greatly increased my appreciation for the advantages of an open, extensible and standards-compliant middleware infrastructure.
For example, I needed to implement an “edge… Read more →
