SQL for Apache Storm

SQLstream’s Stream Processor for Storm brings real-time analytics of machine data, expressed as continuous SQL queries, to distributed stream processing implementations built on Apache Storm. Organizations using or considering Storm benefit from improved performance, operational flexibility, and reduced management costs by using continuous SQL query plug-ins for the real-time analysis and integration of streaming data.

About the Apache Storm Project

Storm is a distributed stream processing framework available under the Apache open source license. A Storm cluster is similar in concept to a Hadoop cluster, but with Storm topologies in place of Hadoop’s MapReduce jobs. A simple Storm topology consists of Spouts (data sources) and Bolts (nodes where stream processing logic is inserted). Bolts are commonly written in Java, but in principle can be defined in any programming language. SQLstream’s Stream Processor for Storm enables Bolts to be declared as high-level continuous SQL queries. Like continuous SQL queries, Storm topologies execute Bolts forever or until stopped, unlike MapReduce jobs, which complete and must be re-executed to process newly arriving data. Both SQLstream’s and Storm’s streaming data models are based on tuples.
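The Spout-to-Bolt flow of tuples can be sketched in a few lines of plain Java. This is a toy model only: ToyTopology, Tuple, Bolt, and run are hypothetical names invented for illustration and are not part of the Storm or SQLstream APIs. A real Storm topology is distributed and runs until stopped, whereas this sketch drains a small finite source on a single thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Toy model of a Storm-style topology; none of these types come from Storm. */
class ToyTopology {

    /** A tuple is modelled as an ordered list of field values. */
    public static final class Tuple {
        public final List<Object> values;

        public Tuple(Object... values) {
            this.values = Arrays.asList(values);
        }

        @Override
        public String toString() {
            return values.toString();
        }
    }

    /** A bolt consumes one tuple and emits zero or more tuples. */
    public interface Bolt extends Function<Tuple, List<Tuple>> {}

    /** Pulls tuples from the spout and pushes each one through the bolts in order. */
    public static List<Tuple> run(Iterator<Tuple> spout, List<Bolt> bolts) {
        List<Tuple> out = new ArrayList<>();
        while (spout.hasNext()) { // a real spout would keep producing until stopped
            List<Tuple> current = List.of(spout.next());
            for (Bolt bolt : bolts) {
                List<Tuple> next = new ArrayList<>();
                for (Tuple t : current) {
                    next.addAll(bolt.apply(t));
                }
                current = next;
            }
            out.addAll(current);
        }
        return out;
    }

    public static void main(String[] args) {
        // Spout: a finite stand-in for an unbounded source of sensor readings.
        Iterator<Tuple> spout = List.of(
                new Tuple("sensor-1", 21.5),
                new Tuple("sensor-2", 80.0),
                new Tuple("sensor-1", 95.2)).iterator();

        // Filter bolt: keep only readings above a threshold.
        Bolt hot = t -> ((Double) t.values.get(1)) > 75.0
                ? List.of(t)
                : List.<Tuple>of();

        // Enrichment bolt: append an alert label as a new field.
        Bolt label = t -> List.of(new Tuple(t.values.get(0), t.values.get(1), "ALERT"));

        run(spout, List.of(hot, label)).forEach(System.out::println);
    }
}
```

Running main prints the two readings that pass the filter, each with the appended label. In the terms used above, each lambda plays the role of a Bolt; with the Stream Processor for Storm, that role would be filled by a continuous SQL query instead.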

SQLstream Stream Processing for Storm

The primary focus of the Storm project is the distributed stream processing infrastructure, rather than the real-time analytics or connectors for machine data sources and enterprise storage platforms. In addition, although Storm can scale for high volume, low latency requirements, performance per core is low and real-world scenarios can require large numbers of servers to achieve the desired performance.

That’s where SQLstream comes in. SQLstream’s Stream Processor for Storm enables Storm Bolts to be deployed as continuous SQL queries. The Stream Processor can also be deployed as a Storm Spout, utilizing SQLstream’s agents and adapters for real-time machine data collection and integration. In fact, the same SQLstream Stream Processor can be configured as both a Spout and a Bolt. SQLstream’s Stream Processors are connected into a topology using the SQLstream API for Storm. The API enables SQLstream’s Stream Processors to be deployed as both a data source in a Storm topology (Spout) and as a stream processing node (Bolt). Out of the box, the API supports two modes of operation: SQLstream embedded in a Storm topology, and SQLstream interworking in parallel with an executing Storm topology.

SQLstream API for Storm

The SQLstream API for Storm includes the facilities required for stream interoperability in embedded and parallel modes of operation. The API is implemented as a standard SQLstream Extensible Common Data Adapter (ECDA), with both input and output capability, and is delivered as part of the SQLstream s-Server 4.0 package. The API includes:

  • SQLstreamBolt API for embedding SQLstream’s Stream Processors in a Storm topology.
  • StreamBolt and StreamSpout APIs for stream connections in parallel mode.


Why Streaming SQL for Storm?

SQLstream’s Stream Processor for Storm allows continuous SQL queries to run as Bolts in a Storm cluster, SQLstream’s machine data collection agents and data storage adapters to be deployed as Storm Spouts, and Storm processing to be offloaded to SQLstream for improved hardware utilization and a lower total cost of performance. In summary, SQLstream enables organizations with Storm implementations to achieve:

  • Greater throughput on significantly less hardware. Benchmarks indicate that SQLstream’s Stream Processors on Storm deployments can achieve a 10x reduction in hardware for the same processing throughput.
  • Faster time to value for real-time applications. Storm Bolts can now be written as declarative continuous SQL queries. Real-time analytics and continuous integration applications can be deployed in a fraction of the time required to develop low-level Java Bolts, for example.
  • Dynamic updates to operational Storm systems. Unlike a Storm-only implementation, Spouts and Bolts implemented using continuous SQL queries can be updated and changed dynamically without having to stop, rebuild and redeploy the Storm topology.
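To give a sense of what a declarative continuous query replaces, the sketch below hand-codes the core logic of a one-minute sliding average per sensor in plain Java. Everything here is an assumption made for illustration: the class SlidingAverage, the stream name readings, and the column names are hypothetical, and the SQL in the comment shows the general shape of a windowed streaming query rather than syntax verified against SQLstream s-Server. The sketch also assumes timestamps arrive in order for each key.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hand-written counterpart of a windowed continuous query, roughly of the form:
 *
 *   SELECT STREAM sensor,
 *          AVG(reading) OVER (PARTITION BY sensor
 *                             RANGE INTERVAL '1' MINUTE PRECEDING)
 *   FROM readings;
 *
 * The SQL is illustrative only; names and exact syntax are assumptions.
 */
class SlidingAverage {

    /** One timestamped reading held in a window. */
    private static final class Sample {
        final long timeMs;
        final double value;

        Sample(long timeMs, double value) {
            this.timeMs = timeMs;
            this.value = value;
        }
    }

    private final long windowMs;
    private final Map<String, Deque<Sample>> windows = new HashMap<>();

    public SlidingAverage(long windowMs) {
        this.windowMs = windowMs;
    }

    /** Accepts one reading and returns the key's average over the trailing window. */
    public double accept(String key, long timeMs, double value) {
        Deque<Sample> window = windows.computeIfAbsent(key, k -> new ArrayDeque<>());
        window.addLast(new Sample(timeMs, value));

        // Evict samples that have fallen out of the trailing window.
        while (window.peekFirst().timeMs < timeMs - windowMs) {
            window.removeFirst();
        }

        // Recompute the average over what remains (never empty at this point).
        double sum = 0.0;
        for (Sample s : window) {
            sum += s.value;
        }
        return sum / window.size();
    }

    public static void main(String[] args) {
        SlidingAverage avg = new SlidingAverage(60_000);
        System.out.println(avg.accept("sensor-1", 0, 10.0));
        System.out.println(avg.accept("sensor-1", 30_000, 20.0));
        System.out.println(avg.accept("sensor-1", 90_000, 30.0));
    }
}
```

Even this small example has to manage per-key state, eviction, and ordering assumptions by hand, and a production Bolt would add serialization, fault handling, and wiring into the topology. A declarative query expresses the same intent in a few lines, which is the basis of the time-to-value point above.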