Blog


Vice President Marketing
June 18, 2013

“The Total Cost of Performance in a Massively Connected World” Session to be Presented at GigaOM Structure on June 19th at 4:55PM PST in San Francisco

San Francisco, CA | June 18, 2013

SQLstream Inc., a pioneer of the streaming Big Data engine for real-time operational intelligence, today announced that SQLstream CEO Damian Black has been invited to speak about the cost of achieving real-time performance from Big Data technologies at GigaOM’s Structure 2013, June 19-20.

Damian will be exploring the cost and technical limitations of generating real-time operational intelligence from different Big Data technologies. As data-management architectures evolve, each new technology looks to increase real-time visibility and reduce the latency-to-action and decision making. Yet technologies such as Hadoop have an associated cost and a practical performance limit beyond which real-time, high velocity distributed data management becomes unfeasible.

Held at Mission Bay Conference Center at UCSF, and now in its sixth year, GigaOM Structure gathers data scientists, business leaders and investors interested to learn more about the emerging business opportunities for Cloud and Big Data. Structure 2013 will focus on how real-time business needs are shaping IT architectures, hyper-distributed infrastructure and creating a cloud that will look completely different from everything that’s come before. Damian will present The Total Cost of Performance in a Massively Connected World at 4:55PM PST on Wednesday, June 19.

Damian Black was selected for his pioneering work in real-time and Big Data software and his commitment to reducing latency-to-action for business operations. Black has participated in many GigaOM events, including the first ever Big Data panel session at GigaOM Structure 2008. “GigaOM has its finger on the pulse yet again with the focus on addressing real-time business needs from Big Data,” said Damian Black, SQLstream CEO. “The drive towards high velocity data coupled with low latency actions marks a new frontier for Big Data, one where the cost of performance is driving streaming data technology to the fore.”

If you are a member of the press or analyst community and are interested in setting up a meeting with SQLstream at GigaOM Structure, please contact Ronnie Beggs via email at ronnie.beggs@sqlstream.com.

About SQLstream

SQLstream (www.sqlstream.com) is the pioneer and innovator of a patented Streaming Big Data Engine that unlocks the real-time value of high-velocity unstructured machine data. SQLstream’s s-Streaming products put “Big Data on Tap™” – enabling businesses to harness action-oriented analytics, with on-the-fly visualization and streaming operational intelligence from their log file, sensor, network and device data. SQLstream’s core data streaming technology is a massively scalable, distributed platform for analyzing unstructured Big Data streams using continuous SQL, with support for streaming SQL query execution on Hadoop and other enterprise database, data warehouse and data management platforms. SQLstream’s headquarters are in San Francisco, CA.

Media Contact

Ronnie Beggs, VP Marketing

Company Name |  SQLstream, Inc.
Contact E-mail | pr@sqlstream.com
Website URL | www.sqlstream.com

Resources

GigaOM Structure agenda

GigaOM Structure registration

SQLstream GigaOM Structure:Data 2013 Damian Black interview

 

Posted under Events · Press Releases
June 17, 2013

Ronnie Beggs Participating on “Big Data: Architecting Systems at Speed” Panel at E2 Conference, June 18th at 2:30PM ET, Boston

San Francisco, CA | June 17, 2013 – SQLstream Inc., the streaming Big Data platform, today announced that SQLstream has been invited to speak at E2 Conference on the architectural challenges of generating real-time operational intelligence from high-velocity Big Data streams.

The panel gathers industry experts to explore how real-time, high velocity Big Data streams can be processed rapidly in order to make meaningful decisions quickly. It has been assumed that volume is the fundamental problem presented by Big Data. It’s not. It’s data velocity and speed to action. The panel, led by esteemed analyst Johna Till Johnson of Nemertes Research, will analyze the challenges of rethinking how Big Data processing is architected.

Held at Boston Marriott Copley Place, E2 is the only conference focused entirely on the choices and challenges organizations face as a result of shifts in technology. A new event in the Big Data world, E2 offers decision makers in IT, data scientists, and business leaders a progressive forum where they can engage in sessions on the future of software, and explore how it can dictate the needed changes in corporate strategies.

“What I like about E2 Conference is its focus on the challenges that organizations face as a result of technology shifts, rather than the technology itself,” said Ronnie Beggs, SQLstream’s Vice President Marketing.  “Our panel is timely as well, given the wider focus on the cost of Big Data performance for low latency actions from high velocity machine data.”

If you are a member of the press or analyst community and are interested in setting up a meeting with SQLstream at E2 Conference, please contact Ronnie Beggs via email at ronnie.beggs@sqlstream.com.

About SQLstream

SQLstream (www.sqlstream.com) is the pioneer and innovator of a patented Streaming Big Data Engine that unlocks the real-time value of high-velocity unstructured machine data. SQLstream’s s-Streaming products put “Big Data on Tap™” – enabling businesses to harness action-oriented analytics, with on-the-fly visualization and streaming operational intelligence from their log file, sensor, network and device data. SQLstream’s core data streaming technology is a massively scalable, distributed platform for analyzing unstructured Big Data streams using continuous SQL, with support for streaming SQL query execution on Hadoop and other enterprise database, data warehouse and data management platforms. SQLstream’s headquarters are in San Francisco, CA.

 

Media Contact

Ronnie Beggs, VP Marketing, Company Name |  SQLstream, Inc.

Phone: +1 415 758 8342
Contact E-mail | pr@sqlstream.com
Website URL | www.sqlstream.com

 

Resources

E2 agenda

E2 registration

Posted under Events · Press Releases

Vice President Marketing
June 11, 2013

Sensors Expo is the leading industry event for the types of intelligent sensor-integrated systems that are driving the next generation Internet of Everything, Industrial Internet, telematics and Machine-to-Machine services. As a key sponsor in the Big Data and Wireless Systems pavilion, and speaking in the Big Data track, we found the event a great litmus test for the current state of the sensor-related systems industries.

This year, the focus was very much on the monetization of intelligent sensors networks. This included how businesses and consumers can benefit from the Internet of Things, but also a focus on infrastructure costs, in particular, architectures for managing the high cost of wireless backhaul for large scale sensor data transmission.

The clues were there at last year’s event, where the focus was not so much on real-time operational intelligence and monetization as on the more fundamental issue of real-time streaming sensor data integration. The main question twelve months ago was how such huge volumes of data from many different sources and locations could be integrated into the operational platforms, and at an affordable cost.

Drinking from the Sensor Firehose

The Internet of Everything is the new frontier for real-time Big Data management, where data velocity now trumps data volume as the primary driver, where the geographical distribution of streaming data adds new levels of complexity, and yet the useful lifetime of data, the window within which to make a decision, has decreased dramatically. As data management architectures evolve, each new technology looks to increase real-time visibility and reduce the latency to actions and decision-making.

This is exactly the problem that SQLstream’s VP of Americas, Glenn Hout, presented in his session on Real-Time and Big Data: The Perfect Union for Large-Scale Sensor Network Management. SQLstream is a streaming Big Data management platform: sensor data is processed in real time as it is created, and answers are generated with sub-second latency. SQLstream also collects unstructured sensor data of any type and from any number of sources simultaneously, and can deliver multiple feeds of aggregated and real-time intelligence to external systems such as operational management platforms, existing data warehouses and Big Data storage platforms.

The Intelligent Edge

However, the issue with the cost of transmission backhaul is also addressed by SQLstream. SQLstream’s agent framework enables intelligent data collection agents to be deployed as close to the data source as possible. Intelligence Agents offer filtering, aggregation and analytics capability at the edge of the network. Depending on the information content in the data, the volume and therefore cost of real-time transmission will be reduced significantly.
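To make the edge-aggregation idea concrete, here is a minimal Python sketch. It is not SQLstream's agent framework or its API; the `Reading` type, the `summarize_at_edge` function and the 60-second window are all assumptions chosen for illustration. It only shows how collapsing raw readings into per-window summaries before transmission can shrink what has to cross the backhaul link.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Reading(NamedTuple):
    """A single raw sensor reading (hypothetical shape)."""
    sensor_id: str
    timestamp: float  # seconds since some epoch
    value: float


def summarize_at_edge(readings: Iterable[Reading],
                      window_seconds: float = 60.0) -> List[dict]:
    """Collapse raw readings into one summary per sensor per time window."""
    buckets: Dict[Tuple[str, float], List[float]] = defaultdict(list)
    for r in readings:
        # Align each reading to the start of its fixed-size window.
        window_start = (r.timestamp // window_seconds) * window_seconds
        buckets[(r.sensor_id, window_start)].append(r.value)

    summaries = []
    for (sensor_id, window_start), values in sorted(buckets.items()):
        summaries.append({
            "sensor_id": sensor_id,
            "window_start": window_start,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / len(values),
        })
    return summaries


# One reading per second for two minutes from a single sensor.
raw = [Reading("sensor-a", float(t), float(t % 5)) for t in range(120)]
summaries = summarize_at_edge(raw)
print(len(raw), "raw readings ->", len(summaries), "summaries to transmit")
```

How much is saved depends, as the post says, on the information content of the data: a summary is only useful if the downstream consumer does not need every individual reading.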

So the future is bright: the future is real-time operational intelligence. Achieving it requires the next generation of streaming data management and true real-time operational intelligence platforms.

June 6, 2013


SQLstream was named as one of the top 100 companies that provide technologies and services for data management by the Database Trends and Applications (DBTA) magazine.

A result of lengthy research and compiled by the publication’s editorial staff, the first “DBTA 100: The Companies That Matter Most in Data” list recognizes industry leaders that are helping organizations derive tangible value and insight from Big Data.

Read more, HERE.

Posted under In the News
June 5, 2013

InfoArmor

InfoArmor and SQLstream Team Up to Deliver the Next Generation of Continuous Identity  Monitoring Services at Massive Scale 

SAN FRANCISCO – JUNE 5TH, 2013 – SQLstream, Inc., the Streaming Big Data Company, today announced that InfoArmor, a U.S. leader in identity theft monitoring and internet surveillance, selected SQLstream as the real-time engine for the next generation of its identity monitoring and surveillance platform. SQLstream’s s-Streaming products enable InfoArmor to meet the rapid growth in demand for its real-time identity monitoring services. SQLstream was chosen for its capacity to capture, process and integrate in real-time high volumes of unstructured data coming from a large variety of sources, its massive scalability for high velocity real-time operational intelligence, and its simple, fast deployment.

Identity theft in the U.S. alone claimed 12.6 million victims in 2012, a rise of over 1 million from 2011, resulting in more than $21 billion in damages. InfoArmor’s identity protection services are helping detect and correct identity theft before any damage is done or costs incurred. Real-time identity theft monitoring has complex, rules-based alerting intelligence that must be applied over a large number of different data sources simultaneously.

 “We needed a real-time operational intelligence platform that could scale to new levels of data acquisition, conditioning, analytics and alert delivery,” said Christian Lees, CTO of InfoArmor. “We evaluated building our own and explored other vendors, but chose SQLstream because they met our requirements entirely and they provided the only 100% ISO ANSI/SQL standards-based streaming platform. That enabled us massive scalability, a very fast deployment and a highly competitive TCO.”

The system captures and parses multiple data feeds in different XML-based formats. SQLstream parses and conditions the data feeds on the fly, and applies a sophisticated rules catalog across all data, delivering real-time alerts to consumers through SMS when rules and specific combinations of rules are breached.
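The flow described above (parse XML feeds, apply a rules catalog, alert when a combination of rules is breached) can be sketched in a few lines of Python. Everything specific here is hypothetical: the feed layout, the rule names, the `possible_account_takeover` combination and the `evaluate` function are invented for illustration and are not InfoArmor's rules or SQLstream's implementation. The sketch covers only the combination case and returns alerts as tuples rather than sending SMS.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical feed; real feeds would arrive continuously and in several formats.
FEED = """<events>
  <event><subject>alice</subject><type>address_change</type></event>
  <event><subject>bob</subject><type>address_change</type></event>
  <event><subject>alice</subject><type>credit_application</type></event>
</events>"""

# Hypothetical single rules: name -> predicate over a parsed event.
RULES = {
    "address_change": lambda e: e.get("type") == "address_change",
    "credit_application": lambda e: e.get("type") == "credit_application",
}

# Hypothetical combinations: alert name -> set of rules that must all have fired.
COMBINATIONS = {
    "possible_account_takeover": {"address_change", "credit_application"},
}


def parse_events(xml_text):
    """Turn each <event> element into a flat dict of tag -> text."""
    root = ET.fromstring(xml_text)
    for node in root.iter("event"):
        yield {child.tag: (child.text or "").strip() for child in node}


def evaluate(events):
    """Return (subject, alert) pairs as soon as a rule combination completes."""
    fired = defaultdict(set)  # subject -> names of rules seen so far
    alerts = []
    for event in events:
        subject = event.get("subject", "")
        for name, predicate in RULES.items():
            if predicate(event) and name not in fired[subject]:
                fired[subject].add(name)
                for alert, required in COMBINATIONS.items():
                    if name in required and required <= fired[subject]:
                        alerts.append((subject, alert))
    return alerts


alerts = evaluate(parse_events(FEED))
print(alerts)
```

Because the state is updated event by event, an alert is raised at the moment the last required rule fires, without re-scanning earlier data.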

“We’re delighted to have been chosen by InfoArmor for such a challenging operational Big Data requirement,” said Damian Black, CEO of SQLstream. “Seeing SQLstream making a real difference to people’s lives is a validation of our approach to operational intelligence and predictive analytics.”

Learn more about SQLstream’s security solutions today.

About InfoArmor

InfoArmor is a leader in offering identity theft protection and privacy management solutions to businesses. By focusing on employee benefits, InfoArmor provides a turn-key, value-adding benefit with easy employer administration. Rather than solely focusing on credit fraud, InfoArmor uses industrial strength technologies to enable consumers and businesses alike to monitor and identify fraudulent activity. For more information, visit http://www.infoarmor.com.

Posted under Customer win · In the News · Press Releases
May 30, 2013

San Francisco, CA | May 30, 2013 – SQLstream Inc., the Streaming Big Data Company, announced today that SQLstream VP Americas, Glenn Hout, has been invited to speak on real-time operational intelligence and prescriptive analytics for the Internet of Everything at Sensors Expo 2013, June 4-6.

Held in Rosemont, Illinois, and now in its 27th year, Sensors Expo is the leading industry event in its class and gathers the world’s top engineers, scientists and business leaders involved in the development and deployment of sensor networks and systems. Areas of focus include sensor Big Data analytics, wireless sensor networks for Machine-to-Machine services, telematics and smart energy.

Glenn Hout is presenting “Real-Time and Big Data: the Perfect Union for Large-Scale Sensor Network Management” at 10:10AM CT on Thursday, June 6th. SQLstream was selected to address the Sensors Expo audience in the Big Data and Analytics track based on its pioneering work in real-time operational intelligence and streaming integration of prescriptive analytics for industrial sensor, telecommunications, intelligent transportation, telematics and M2M networks and services.

“It’s always a pleasure to engage with an audience so familiar with the practical implications of Big Data,” said Glenn Hout, SQLstream VP Americas. “The Internet of Everything is the new frontier for Big Data, and SQLstream’s prescriptive analytics engine is the only platform able to meet its real-time requirements. Scaling up to handle sensor data velocities of millions of records per second with ultra-low latency response is the prerequisite for generating and acting on operational intelligence.”

If you are a member of the press or analyst community and are interested in setting up a meeting with SQLstream at Sensors Expo, please contact Ronnie Beggs via email at ronnie.beggs@sqlstream.com.


Vice President Marketing
May 16, 2013

The Internet of Everything is the new frontier for real-time and Big Data, where Velocity now trumps Volume as the primary driver, where the geographical distribution of streaming data adds new levels of complexity, and yet the useful lifetime of data, the window within which to make a decision, has decreased dramatically.

Industries such as telecommunications, telematics, M2M and the Industrial Internet are starting to generate high velocity data streams at rates of millions of records per second. Gaining actionable insight from data of this magnitude and speed may be technically feasible to a point for the elastic scale-out architectures of the leading Big Data and Cloud technologies, but at what point does it cease to be financially viable?

The Cost of Real-time for Big Data

The Technology Tipping Point for Real-time Operational Intelligence

As data management architectures evolve, each new technology looks to increase real-time visibility and reduce the latency to action and decision-making. Yet each technology has an associated cost and a practical limit beyond which real-time, distributed data management becomes commercially infeasible and then technically impossible.

Big Data technologies emerged due to the combination of lower cost commodity hardware and Cloud infrastructure, coupled with the cost and the technical challenges of processing massive volumes of unstructured machine data (server logs, sensors, IP network data for example) rapidly using conventional “store-first, query second” relational database technology. This led a charge towards new elastic scale-out processing frameworks based on HDFS and MapReduce. However, as Hadoop-based and other similar technologies mature, the true cost of real-time and Big Data for Hadoop is starting to emerge.

For low latency operational intelligence applications, Big Data storage-based solutions are proving to be costly and complex to deploy, and technically challenging to scale for high velocity / low latency performance. Furthermore, today’s Cloud multi-tenant and shared infrastructures introduce unacceptable latency to real-time business analytics, and the cost ramps up significantly as ingress and egress data velocities increase.

The ROI and technical tipping points for storage-based architectures (both Big Data and traditional RDBMS-based) are very much at the low end of the scale, typically in the order of 10,000 records per second. This is sufficient to process a Twitter firehose at full speed, but falls significantly short of what industry applications need for real-time operational intelligence. For example:

  • Operational performance & QoS monitoring in a 4G cellular network generates cell and call detail information at many 10s of millions of events per second.
  • Telematics applications are generating data rates of between 5 million and 20 million records per second for a small installation.
  • For IP service networks, IP probe data regularly exceeds data rates of 10 million records per second.

These data rates are well within the technical capabilities of SQLstream’s Big Data streaming platform, and importantly, SQLstream has a proven ROI for high velocity data applications. That’s not to say in-stream analytics is the only platform required, and in fact SQLstream deploys Hadoop HBase as its default stream persistence platform, and a typical architecture deploys SQLstream for the in-stream analytics with continuous feeds of raw and operational intelligence from SQLstream to existing operational control and data warehouse platforms. The resulting architecture offers an attractive overall ROI and eliminates the technical tipping points for low latency, high velocity data management.

More on the cost of real-time performance in a Big Data world to follow as part of this series in subsequent blogs.


CEO
April 21, 2013

SQLstream sponsored the recent IE Group Big Data Innovation Summit in San Francisco where I also presented on streaming SQL for Hadoop, and extending Hadoop for real-time operational intelligence. As Big Data technologies and Hadoop push further into mainstream enterprises, the need for real-time business operations is an important parallel trend. ‘Real-time’ and ‘Hadoop’ had been considered synonymous by some, yet people are surprised when Hadoop does not seem to be as real-time as they hoped. This should not come as a surprise: Hadoop has many strengths, but was never intended for low latency, real-time analytics over high velocity data.


Click to View Damian’s Presentation on Slideshare

Real-time Big Data or real-time Big Data?

Which raises the question, what do we mean by real-time? Many products have emerged that claim ‘real-time’ analytics over Hadoop. Yet Hadoop remains a batch processing framework, and struggles to deliver low latency analytics against high velocity streaming data because of the same limitations as existing RDBMS-based data management platforms. These ‘real-time’ products may generate rapid results over the stored data, but ignore the latency introduced by data collection and storage, and also ignore the resource load of repeated execution of queries to process newly arriving data. The latency issue may not be apparent for slower data streams, such as Twitter feeds, but with the data rates of machine data in the world of telecommunications, industrial automation, M2M and large scale security intelligence, the problem rapidly becomes extreme.

SQLstream’s core stream computing platform, s-Server, processes high velocity data as soon as it is generated, executing continuous SQL queries and analytics directly over log files, sensor feeds and any other machine-generated data source. We measure real-time from the time of data creation, eliminating completely the latency introduced by collecting, storing and repeatedly updating results.
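The difference between re-running a query over stored data and computing a result continuously can be illustrated with a small Python sketch. This is not s-Server and not streaming SQL; `SlidingWindowMean` is a hypothetical class, assumed here only to show the general technique of updating a time-windowed aggregate incrementally as each record arrives.

```python
from collections import deque


class SlidingWindowMean:
    """Incrementally maintained mean over the last `window_seconds` of data."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._items = deque()  # (timestamp, value) pairs, oldest first
        self._total = 0.0

    def push(self, timestamp: float, value: float) -> float:
        """Add one record and return the current windowed mean.

        Records are assumed to arrive in timestamp order.
        """
        self._items.append((timestamp, value))
        self._total += value
        # Evict records that have fallen out of the window (t - w, t].
        cutoff = timestamp - self.window_seconds
        while self._items and self._items[0][0] <= cutoff:
            _, old_value = self._items.popleft()
            self._total -= old_value
        return self._total / len(self._items)


window = SlidingWindowMean(window_seconds=10.0)
first = window.push(0.0, 10.0)
second = window.push(5.0, 20.0)
third = window.push(12.0, 30.0)  # the record at t=0 has expired by now
print(first, second, third)
```

Each arriving record costs a constant amount of work on average, and a fresh answer is available immediately, which is the property that repeated batch queries over stored data cannot offer.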

Drive real-time actions with streaming operational intelligence

We discussed in a previous blog how real-time operational intelligence eliminates the chasm between business operations and analytics. Operational intelligence is about more than the collection and analysis of log file and machine-generated data. One of the advantages of stream computing is the ease with which predictive analytics can be applied over multiple data streams. This makes it possible to alert on time and space-based patterns of machine, user and consumer behavior that are predictors of some future event – a security breach, network failure or service fault.


And true operational intelligence platforms need to go one step further – true real-time platforms must do more than visualize results on a dashboard – it’s essential to connect back to application and operational systems, and to drive automated updates. Security breaches can be avoided, network resilience mechanisms activated, and service faults corrected before SLA breaches occur and customers are aware of the problem.

Real-time operational intelligence on Hadoop

So what does this mean for Hadoop? Streaming is not a new technology, but earlier approaches to streaming have focused on single-source problems, and have been deployed as standalone platforms for low velocity use cases. With SQLstream, standard SQL queries, albeit continuously executing ones, join, group, partition and analyze real-time machine data streams. There is a further difference – SQLstream’s s-Server streaming SQL platform can also be deployed as a streaming SQL query extension for Hadoop.

A number of streaming Hadoop scenarios are supported:

  • Stream persistence – Hadoop HBase as an active archive for streaming data and derived intelligence using the Flume API. SQLstream also performs continuous aggregation to support high velocity streams without data loss.
  • Stream replay – restream the complete history of persisted streams from HBase for ‘fast forwarding’ of time-based and spatial analytics. Various interfaces can be utilized, including Cloudera’s Impala.
  • Streaming data queries, joining streaming real-time data with historical streams and intelligence persisted in HBase.
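The third scenario, joining live data with persisted history, can be sketched as follows. This is a plain Python illustration, not SQLstream's streaming SQL and not an HBase client: the `baselines` dictionary stands in for historical values that a real deployment would read from a store such as HBase, and `join_with_history` and the `factor` threshold are assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, Tuple


def join_with_history(stream: Iterable[Tuple[str, float]],
                      baselines: Dict[str, float],
                      factor: float = 2.0) -> Iterator[dict]:
    """Enrich each live (device_id, value) record with its historical baseline.

    Records with no stored baseline are dropped, as in an inner join.
    """
    for device_id, value in stream:
        baseline = baselines.get(device_id)
        if baseline is None:
            continue
        yield {
            "device_id": device_id,
            "value": value,
            "baseline": baseline,
            # Flag readings well above what history says is normal.
            "anomalous": value > factor * baseline,
        }


# Stand-in for intelligence previously persisted from earlier streams.
baselines = {"cell-1": 100.0, "cell-2": 40.0}
live = [("cell-1", 120.0), ("cell-2", 95.0), ("cell-3", 10.0)]

joined = list(join_with_history(live, baselines))
for row in joined:
    print(row)
```

The point of the join is that the live record alone says little; it becomes actionable only when compared against what the persisted history says is normal for that source.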

Making the Elephant fly

Accelerating Hadoop to process live, high velocity unstructured data streams delivers the low latency, streaming operational intelligence demanded by today’s real-time businesses. Hadoop has been the driving force behind Big Data Analytics but as the technology hits the mainstream, many industries are seeking to take a step further and eliminate latency from their business completely. With the SQL language emerging as the key enabler for the mainstream adoption of Hadoop, executing streaming SQL queries over Hadoop extends the platform out to the edge of the network, making it possible to query unstructured log file, sensor and network machine data sources on the fly and in real-time.

Posted under Big Data · Events
April 17, 2013

Organizations that deploy Big Data tools to analyze operational data soon learn staff’s expertise in SQL isn’t always helpful. New skills need to be learned.

Distinguished analyst Dan Kusnetzky of ZDNet discusses how SQLstream technology makes the SQL skills already found in most organizations useful for the analysis of streaming operational data.

Read full article HERE.

Posted under Analyst review · In the News
April 9, 2013

Unlike traditional relational database management systems (RDBMS), which persist data that can later be queried, SQLstream runs continuous queries on targeted data streams – enabling a real-time view of an organization’s situation.

Eric Kavanagh recaps SQLstream’s last session in The Briefing Room, held alongside analysts Mark Madsen and Robin Bloor. Read the full summary here.

Posted under Analyst review · In the News
March 26, 2013

SQL or NoSQL? In-memory or hard disks? Graph? These questions have been top of mind in recent years as developers and IT administrators check out new-age databases capable of handling scale-out data sets. SQLstream CEO and Founder Damian Black, together with 3 other executives from new data management technology companies, showed how they stand out in a hot market at GigaOM’s Structure:Data.

Read full article and watch full recording on gigaom.com HERE.

Posted under In the News

CEO
March 20, 2013
  • SQLstream s-Streaming Big Data Engine Benchmarks at 1.35 Million Streaming Events Per Second per 4-core server  – Outperforming Twitter’s Storm Stream Computation Project with Significant Overall TCO Advantage

New York, NY | March 20, 2013– SQLstream Inc., the Streaming Big Data Company, announced today at GigaOM’s Structure:Data, the results of an independent performance benchmark which measured the SQLstream s-Server 3.0 Big Data Engine processing 1.35 million 1Kbyte records per second per 4-core commodity server, outperforming a comparable configuration based on the Twitter Storm distributed real-time computation system. SQLstream’s s-Server outperformed the Storm-based solution by a factor of 15x.

SQLstream’s s-Streaming Big Data Engine delivers action-oriented analytics, extracting operational intelligence in real-time from high velocity, unstructured log file, sensor and other machine-generated data. Streaming intelligence can be persisted, queried and replayed in Hadoop, with additional connectors to all major storage platforms and data warehouses.

The streaming Big Data benchmark was conducted by a large enterprise with a roadmap to stream unstructured operational data from multiple remote log and machine data flows at up to 10 million records per second for each installation. The benchmark requirement was to perform advanced time-series analytics over mobile network infrastructure records in order to predict potential service-impact problems. The benchmark projects that the s-Server platform would require just eight servers to scale up to 10 million records per second — versus an estimated more than 110 servers for the comparable Storm approach.
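The server counts quoted above can be reproduced with simple arithmetic. The Python sketch below is an illustration, not part of the benchmark: it assumes throughput scales linearly with the number of servers and that the 15x factor applies to per-server throughput, neither of which the release states explicitly.

```python
import math


def servers_needed(target_rate: float, per_server_rate: float) -> int:
    """Servers required to sustain target_rate, assuming linear scaling."""
    return math.ceil(target_rate / per_server_rate)


TARGET_RATE = 10_000_000            # records per second per installation
SQLSTREAM_PER_SERVER = 1_350_000    # measured records per second per 4-core server
SPEEDUP_OVER_STORM = 15             # reported performance factor

# Implied per-server rate for the Storm-based configuration (an inference).
storm_per_server = SQLSTREAM_PER_SERVER / SPEEDUP_OVER_STORM

sqlstream_servers = servers_needed(TARGET_RATE, SQLSTREAM_PER_SERVER)
storm_servers = servers_needed(TARGET_RATE, storm_per_server)

print("SQLstream servers:", sqlstream_servers)
print("Storm servers:", storm_servers)
```

Under those assumptions the figures come out at 8 servers versus 112, consistent with the "eight servers" and "more than 110 servers" quoted in the release.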

SQLstream s-Server 3.0 was able to demonstrate significant cost savings with dramatically lower TCO. The TCO savings came from a combination of reduced hardware and power consumption, the power and simplicity of SQL over low-level Java development, plus reduced maintenance requirements. Other factors influencing SQLstream s-Server’s TCO advantage came from its integrated Big Data platform architecture, ability to update on the fly as new data flows are incorporated, significantly faster implementation timescales using SQL for streaming analytics and integration, and automatic platform optimization for turbo-charged performance and parallel dataflow execution.

“SQLstream excels through the combination of its mature, industry-strength streaming Big Data platform, our support for standard SQL (SQL:2008) for streaming analysis and integration, plus a flexible adapter and agent architecture,” said SQLstream CEO Damian Black. “SQLstream s-Server is today’s clear streaming performance winner – with blazingly fast throughput, an ability to handle a wide variety of message types, sources and formats, and an efficient Streaming Data Protocol with compact optimized binary data formats.”

Advantages of SQLstream’s s-Server, the core element of the company’s s-Streaming Big Data Engine, as demonstrated in the performance benchmark project include:

  • Scaling to a throughput of 1.35 million 1Kbyte records per second per four-core server each fed by twenty remote streaming agents.
  • Expressiveness of the standards-based streaming SQL language with support for enhanced streaming User Defined Functions and User Defined Extensions (UDF/UDX).
  • Deploying new processing analytics pipelines on the fly without having to stop and recompile or rebuild applications.
  • Advanced pipeline operations including data enrichment, sliding time windows, external data storage platform read and write, and other advanced time-series analytics, all based on existing SQL standards.
  • Advanced memory management, with query optimization and execution environments to utilize and recover memory efficiently.
  • Higher throughput and performance per server for lower hardware requirements, lower costs and simple to maintain installations.
  • Proven and mature enterprise-grade product with a validated roadmap and controlled release schedule.

All required modules used in the benchmark were integrated with s-Server 3.0, using 20 remote streaming agents connected per SQLstream s-Server instance, each running on a four-core Intel® Xeon® server platform with Red Hat Enterprise Linux.

 

About SQLstream

SQLstream (www.sqlstream.com) is the pioneer and innovator of a patented Streaming Big Data Engine that unlocks the real-time value of high-velocity unstructured machine data. SQLstream’s s-Streaming products put “Big Data on Tap™” – enabling businesses to harness action-oriented and predictive analytics, with on-the-fly visualization and streaming operational intelligence from their log file, sensor, network and device data. SQLstream’s core V5 streaming technology is a massively scalable, distributed platform for analyzing unstructured Big Data streams using standards-based SQL, with support for streaming SQL query execution over Hadoop/HBase, Oracle, IBM, and other enterprise database, data warehouse and data management systems. SQLstream’s headquarters are in San Francisco, CA.

Posted under Events · Press Releases
March 18, 2013
  • “Four For The Future: Upcoming Database Technologies” Session to be Held at GigaOM Structure:Data on March 21st at 4:40PM ET

San Francisco, CA | March 14, 2013 – SQLstream Inc., a pioneer of the streaming Big Data engine for real-time operational intelligence, today announced that SQLstream CEO Damian Black will speak about innovative real-time Big Data technologies at GigaOM’s Structure:Data 2013, March 20-21.

Held on Pier Sixty, The Chelsea Piers in New York, and now in its third year, Structure:Data  gathers data scientists, business leaders and investors interested to learn more about the emerging business opportunities for Cloud and Big Data. Damian will speak on the invitation-only panel session Four For The Future: Upcoming Database Technologies at 4:40PM ET on Thursday, March 21.

Damian will be discussing the emergence of SQL for Big Data, and more specifically, SQL as a streaming language for low-latency, real-time operational intelligence in a Hadoop environment. Damian was selected for the panel because of his pioneering work in real-time and Big Data software. SQLstream’s s-Streaming products transform log, sensor and other machine-generated data into streaming operational intelligence in real-time. Its core streaming Big Data architecture is built on a massively parallel platform for processing high velocity machine data. Streaming intelligence can be persisted, queried and replayed in Hadoop, with additional connectors to all major storage platforms and enterprise data warehouses if required.

SQLstream CEO Damian Black has participated in many GigaOM events, including the first ever Big Data panel session at GigaOM Structure 2008. “It’s always a pleasure to speak at a GigaOM event and to engage with an audience at the forefront of Big Data,” said Damian Black. “Real-time is the next wave of Big Data, and SQLstream is the future of true real-time business decisions. Real-time is only a streaming SQL query away.”

If you are a member of the press or analyst community and are interested in setting up a meeting with SQLstream at GigaOM Structure:Data, please contact Ronnie Beggs via email at ronnie.beggs@sqlstream.com.

About SQLstream

SQLstream (www.sqlstream.com) is the pioneer and innovator of a patented Streaming Big Data Engine that unlocks the real-time value of high-velocity unstructured machine data. SQLstream’s s-Streaming products put “Big Data on Tap™” – enabling businesses to harness action-oriented analytics, with on-the-fly visualization and streaming operational intelligence from their log file, sensor, network and device data. SQLstream’s core V5 data streaming technology is a massively scalable, distributed platform for analyzing unstructured Big Data streams using continuous SQL, with support for streaming SQL query execution on Hadoop and other enterprise database, data warehouse and data management platforms. SQLstream’s headquarters are in San Francisco, CA.

 

Resources

GigaOM Structure:Data 2013 agenda

GigaOM Structure:Data 2013 registration

SQLstream GigaOM Structure:Data 2013 Damian Black interview

Posted under Press Releases
February 26, 2013

In Part One of the Hadoop 2013 series, Merv Adrian points out how significant attention is being lavished on performance in 2013. In this second installment, the topic is projects, which are proliferating precipitously. One of the most frequent client inquiries is “which of these pieces make Hadoop?” As recently as a year ago, the question was pretty simple for most people: MapReduce, HDFS, maybe Sqoop and even Flume, Hive, Pig, HBase, Lucene/Solr, Oozie, Zookeeper. When the Gartner piece How to Choose the Right Apache Hadoop Distribution was published, that was pretty much it.

Since then, more projects have matured, and SQLstream is now part of the shortlist.

Read full article on gartner.com HERE. 

Hadoop 2013- Projects

Posted under Analyst review · In the News · Market Views

Vice President Business Development
February 20, 2013

Developed to Meet Demand from Universities and Supercomputing Centers

San Francisco, CA | February 20, 2013 – SQLstream, a pioneer of real-time Big Data, today announced an extension to its partner program that helps to fulfill the growing need for real-time data processing systems in supercomputing centers and academic cyberinfrastructure facilities.

The SQLstream University Partner Program (UPP) addresses educational institutions looking to expand research into streaming Big Data concepts by offering a non-commercial, royalty-free license to its market leading real-time analytics platform.

The Cornell University Center for Advanced Computing (CAC) headlines the growing list of UPP organizations using SQLstream’s streaming analytics platform to gain real-time insight into massive volumes of data. CAC is one of 17 partner institutions in the Extreme Science and Engineering Discovery Environment (XSEDE), a single virtual system that scientists can use to interactively share computing resources, data and expertise.

“Too many computing systems producing too many logs too quickly are unmanageable,” says Lucia Walle of Cornell University CAC. “SQLstream provided the real-time data mining system compatible with our local business logic that allows CAC to find events of importance using Sisyphus and queries, accurately simulate logs including the scale and pace of log activity, and monitor both simulated and actual system logs so that we can gain reliable insight into the cause and imminence of system failures before they actually occur.”

All UPP members have access to the full suite of SQLstream server technologies and to engineering support from SQLstream. Benefits of the SQLstream University Partner Program include:

  • Access to a non-commercial, royalty-free license to the SQLstream technology;
  • Access to a library of use cases and analytic algorithms;
  • Discounted training and kick-off assistance to research projects;
  • Discounted commercial license available for universities wishing to implement SQLstream in a production environment.

“One of our main goals as an organization is to accumulate business intelligence by adding research partners that can challenge and complement our existing solutions,” said Chris Clabaugh, VP of Business Development at SQLstream. “Streaming analytics and Big Data is seeing a lot of demand in the commercial and research space right now, so we are thrilled to be the streaming analytics technology supplier to university supercomputing centers.”

To learn more about UPP, please visit http://www.sqlstream.com/university-partner-program/.

 

SQLstream Inc. (www.sqlstream.com) makes systems responsive to real-time operational Big Data. SQLstream enables organizations to query their log, sensor and service data directly, and to share streaming operational intelligence with external systems, continuously and in real-time. SQLstream is built on a standards-based, distributed and massively parallel architecture, and uses industry standard SQL for the rapid analysis of high volume, real-time data streams. Standards mean lower costs, proven performance and seamless integration. Headquartered in San Francisco, CA, SQLstream is transforming the world of real-time, Big Data stream management.

Posted under Press Releases

Vice President Marketing
January 31, 2013

Today’s edition of Information Management’s DM Radio Broadcast, The Future of Integration: ETL, CDC and IOA, had a great panel lineup discussing the breadth of data integration issues in today’s world of Big Data, Cloud and traditional enterprise architectures. The session was hosted by Eric Kavanagh (Bloor) and Jim Ericsson (ex-Information Management, today was his last DM Radio after 5 years at the helm), and supported by Philip Russom (TDWI). SQLstream’s Damian Black was on the panel, with representatives from Denodo and Dell.

Extending the reach of Hadoop to the edge of your business.

A key observation emerged from the discussion: the world of ETL and data integration is changing with the need for lower latency business operations. Most businesses are now global, so there is no longer a window of opportunity for batched data management processes. Exploiting streaming data enables organizations to extend the reach of their Big Data storage platforms and data warehouses out to the edge of their business, connecting these platforms directly to the data sources. Streaming data can be filtered, aggregated and integrated with minimal latency, and real-time operational intelligence can be extracted as the data streams past. This means businesses can be responsive to their real-time data without having to wait for data to be stored and processed.
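To make the filter-and-aggregate idea concrete, here is a minimal sketch in Python. It is not SQLstream code; the record fields (`host`, `level`) and the function names are invented for illustration. Each stage is a generator, so a result is emitted as soon as a record arrives rather than after a batch has been stored.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Record = Dict[str, str]  # hypothetical log record with 'host' and 'level' fields


def filter_errors(records: Iterable[Record]) -> Iterator[Record]:
    """Pass through only the records whose level is ERROR."""
    for rec in records:
        if rec["level"] == "ERROR":
            yield rec


def running_counts(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    """Emit an updated per-host count each time a record arrives."""
    counts: Dict[str, int] = defaultdict(int)
    for rec in records:
        counts[rec["host"]] += 1
        yield rec["host"], counts[rec["host"]]


def demo() -> list:
    """Run a small, made-up feed through the two-stage pipeline."""
    feed = [
        {"host": "web1", "level": "INFO"},
        {"host": "web1", "level": "ERROR"},
        {"host": "db1", "level": "ERROR"},
        {"host": "web1", "level": "ERROR"},
    ]
    return list(running_counts(filter_errors(feed)))
```

In a real deployment the feed would be an unbounded source and the output would be forwarded to a warehouse or dashboard; the list here only keeps the sketch self-contained.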

Blurring the boundary between analytics and operations.

The emergence of low latency business operations is also blurring the boundary between traditional BI analytics and the world of business operations. Operational intelligence is the application of analytical techniques in real-time. The two are no longer distinct business functions, but rather a seamless continuum from streaming data to real-time business value. The benefit is that businesses can now extract the information they need in real-time from arriving data streams, but still populate existing Big Data storage platforms and data warehouses, enabling existing business processes to function as normal.

The rising power of SQL queries over Hadoop.

Integrating data between systems implies a degree of structure, a common understanding of sink and source formats. The majority of integration platforms today assume SQL as the data management language. However, SQL as a language does not necessarily mean that the underlying data store has to be relational. And this is the game changer for Hadoop that will open up the existing enterprise market – the emergence of a true standards-based SQL query interface. Cloudera is already some way down this road with Impala.

So what’s the answer: SQLstream = Real-time data integration + operational intelligence + SQL.


Vice President Marketing
January 24, 2013

I was listening to the BBC News on the Internet, and the technology section had two experts explaining Big Data to the layman. After some mutterings about it being ‘very large’ and ‘useful in genomics’, most listeners would have been left none the wiser. The BBC’s Science Editor, Susan Watts, had a bit of a better crack at it here. Our industry obviously has a way to go in explaining itself to the wider world.

Susan Watts’ BBC article on Big Data

Tackling the idea of ‘very large’ first: how Big is Big Data? Turns out, it isn’t, at least, not yet. EMA Research recently published a report that debunked the myth of Big Data being Petabytes upwards. Authored by John Myers and Shawn Rogers, the report suggests that Big Data starts at 110GB, with typical use cases between 10 and 30TB.

But even at that size, why is the Big Data market still forecast to grow dramatically, with IDC predicting the market to be $24B by 2016? Surely this can’t all be put down to market hype and engineers’ excitement. Well, the answer lies in the ‘useful in genomics’. Not everyone has data of that scale or complexity, but most want to extract value from their data much more quickly and often in real-time. Big Data technology is aimed at making that easier, by removing many of the constraints of traditional data management and processing. For example, removing fixed schemas, processing all data, and generating answers on the fly, all with low cost, commodity hardware.

So Big Data Analytics is the killer app? Absolutely. Unfortunately, with today’s first generation Big Data platforms such as Hadoop, HDFS and even HBase, many are still struggling to analyze their data at scale and in real-time. That’s where technologies such as streaming and the (re-)emergence of SQL as a Big Data processing language are key. For example, Cloudera’s Impala is the first full frontal attack on the mature Big Data market for larger enterprises. Real-time and SQL will help us bridge not only the technology gap, but also the ability to explain in layman’s terms how Big Data really benefits the person in the street.

Posted under Big Data · Big Data Analytics

Vice President Marketing
January 17, 2013

Bloor Group’s Robin Bloor hosted SQLstream’s CEO Damian Black in The Briefing Room on Tuesday January 8th 2013. The webcast, entitled “Windows of Opportunity: Big Data on Tap”, focussed on the emergence of both SQL and the streaming data platform as a key enabler for real-time Big Data solutions in an ever-maturing marketplace. You can watch the full webinar from the link below, but I’m going to focus on some of the topics arising from the online discussion between Robin, Damian and the audience.

It was an interesting discussion, covering Big Data and streaming in the wider context of enterprise deployments, and a number of important points were raised:

Hadoop is a data reservoir, not a real-time platform.

Many believe incorrectly that Hadoop is a platform for real-time low latency analytics. It’s not. Hadoop is a multi-purpose engine but not a real-time, high performance engine. The parallelism of Hadoop is great for processing the data once it’s stored, but has high throughput latency. However, with the integration of a streaming data platform for continuous data collection, analysis and streaming integration, Hadoop can be used as the active archive for a true real-time, streaming Big Data system.

Operational intelligence needs a Streaming Big Data Platform

The bulk of real-time operational intelligence today is derived from log and machine data: data generated by the Internet, Cloud infrastructure and applications, for example. There are many log monitoring tools out there, and while very capable, we’re finding that SQLstream with our real-time streaming Big Data platform is being used to solve the high volume, high velocity, complex data problems that log monitoring tools are unable to address at an affordable price point.

The emergence of SQL for Big Data

The first phase of Hadoop and Big Data platforms saw the emergence of NoSQL data storage platforms, looking to overcome the rigidity of normalized RDBMS schemas. However, as the technology hits mainstream industry, the need for simpler, high performance and reliable queries is driving a resurgence in SQL as the de facto language for Big Data processing (see Cloudera Impala for example). What’s less apparent is that SQL is the ideal language for processing data streams using real-time, window-based queries. The issue with normalization and rigid schemas is a non-issue for a streaming data platform – there are no tables, no data gets stored!
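For readers who haven’t met a window-based query before, the sketch below shows the underlying idea in Python rather than in SQL. It is an illustration under stated assumptions, not SQLstream’s implementation: the function name and the `(timestamp, value)` event shape are made up, and timestamps are assumed to arrive in order. It keeps only the rows inside the current time window in memory and emits a fresh average for every arriving row, which is roughly what a streaming average over a sliding time window computes.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_average(
    events: Iterable[Tuple[float, float]], window_seconds: float
) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, average of values seen in the last window_seconds).

    Assumes events arrive ordered by timestamp and window_seconds > 0.
    """
    window: Deque[Tuple[float, float]] = deque()
    total = 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Evict rows that have fallen out of the window (ts - window_seconds, ts].
        while window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)
```

Feeding it `[(0, 10.0), (1, 20.0), (2, 30.0), (5, 40.0)]` with a 3-second window yields averages of 10.0, 15.0, 20.0 and then 40.0, because by time 5 the three earlier rows have aged out. Nothing is written to a table at any point; the only state is the handful of rows still inside the window.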

So in summary, streaming Big Data is the emerging technology for 2013. And SQL is the (re-)emerging technology as Big Data hits mainstream industry. Processing real-time log and machine data streams is a key requirement today, but industries with sensor, M2M and telematics applications are catching up fast.

 

Posted under Big Data · Events
January 15, 2013

Dylan Janus from Information Today talks about the latest SQLstream release, 3.0. With improved speed and scalability, the flagship product introduces high performance distributed stream processing, Google BigQuery integration, and enhanced platform manageability and streaming application development.

“We have improved ways of processing machine data, so you can process much higher volumes and much more flexible and resilient processing of log file data,” Damian Black, CEO of SQLstream, tells 5 Minute Briefing.

Read the full article on DBTA.com here. 

Posted under Big Data · In the News

Vice President Marketing
January 10, 2013

The SQLstream Briefing Room webinar with Robin Bloor took place on Tuesday January 8th 2013. “Windows of Opportunity: Big Data on Tap” highlighted how the evolving Big Data landscape needs technologies that enable a much bigger enterprise-wide picture, complete with multiple data streams that can be combined to show what’s happening in real-time. The speakers included:

  • Eric Kavanagh, CEO, The Bloor Group, who hosted the event.
  • Robin Bloor, Chief Analyst, The Bloor Group, who led the online briefing.
  • Damian Black, President & CEO, SQLstream, discussing the emergence of streaming Big Data management as a key enabler for Big Data solutions, and how SQLstream is at the forefront of streaming innovation.

This was a very interesting and informative briefing on the emergence of streaming Big Data management and the use cases for real-time Big Data solutions.

Click here to watch the webcast …

Title: Windows of Opportunity: Big Data on Tap
Most business opportunities are moving targets these days, rendering static analytical solutions rather ineffective. Instead, organizations need technologies that enable a much bigger picture, complete with multiple data streams that can be combined to show what’s happening in real-time. And increasingly, companies need to analyze both traditional structured data as well as Big Data, including machine-generated data from all manner of enterprise systems.

Register for this episode of The Briefing Room to hear veteran Analyst Robin Bloor explain how a confluence of market forces has opened the door to a new analytical paradigm, one in which companies can leverage a vast array of data streams to pinpoint windows of opportunity as or even just before they appear. Bloor will be briefed by Damian Black of SQLstream, who will discuss his company’s analytical platform, which enables the management of dynamic information assets in much the way that traditional databases do for stored assets.

Posted under Events
January 2, 2013

Justin Rowlatt, BBC World Service, explores how Big Data is not just a whole lot more than the old “small” data, but a transformative technology that is set to change the way we do everything. From Formula 1 racing to targeted retail, the effective use of data is going to be a basis for competition going forward and a necessary addition to the toolkit of every company.

Why Big Data is so Big (BBC World Service Report)

Damian Black, SQLstream CEO joins Michael Chui of management consultants McKinsey in a talk about how the torrent of data digital technology is creating can be used, and explains why all this data could allow companies and governments to learn all sorts of secrets about us all (which is, it seems, not all that bad).

Broadcast on: BBC World Service

Duration: 18 minutes

Click HERE to listen

Posted under Analyst review · In the News
December 21, 2012

This evolution toward coexistence of the Hadoop framework, NoSQL and SQL approaches could mark a new step in big data’s maturation. As 2013 approaches, there is the possibility that big data may move from hot topic to practical reality.

SQL may have taken a punch or two in 2012, but it refused to go down for the count. Companies specializing in the alternative NoSQL and Hadoop side of things brushed up their SQL credentials this year. A prime example was Hadoop startup Cloudera Inc. It looked to enhance its SQL standing with Impala, a Hadoop software offering that supports interactive queries done in standard SQL.

Read the full article HERE

Posted under In the News

Vice President Business Development
December 20, 2012

I have recently researched IBM® Infosphere Streams’ “Stream Processing Language” or SPL, after a pretty good talk at a recent SVForum SIG group meeting. Early users of this technology (which are really PS projects disguised as license sales) found that their Version 2.0 Stream Processing Application Declarative Engine (SPADE) programming language has been replaced by the IBM Streams Processing Language (SPL).

I suppose the research labs either changed interns for the summer or the products team have had changes in management. More seriously this is a natural outcome of productizing a customer software development project, paid for by the US Government, and then re-badged as a COTS product (commercial off-the-shelf product). IBM has time and again demonstrated their mastery of this jiu-jitsu move. It is truly impressive.

What we have done, I believe, is actually far harder, yet far simpler for the real users concerned with looking at and analyzing massive flows of streaming big data. From the start, we had a fundamental belief that our customers, partners, and end users would need a familiar programming syntax and language. We would do all the hard stuff “under the covers”, as it were, so as to hide the complexities of aggregating and analyzing time-stamped streaming data over some time frame of relevance. We made a bet some 10 years ago that the deterministic and concise characteristics of SQL would continue to be a dominant force in the market; yet we undertook the very hard work of making difficult streaming data management appear easy to our end users and partners.

So, as IBM continues to revise / debug / enhance its various versions of SPL based upon various PS-led projects in the field, we view that as a great thing – for SQLstream and our customers. Our standard remains constant.

Posted under Market Views

Vice President Marketing
December 13, 2012

New SQLstream s-Server 3.0 enables faster response to operational Big Data by tapping into streaming log file, sensor and service data in real-time.

SAN FRANCISCO, CA, December 13, 2012 – SQLstream Inc. (www.sqlstream.com), a pioneer of real-time Big Data, today delivered the new generation of its streaming Big Data management platform. SQLstream s-Server 3.0 is the fastest and most scalable release of the company’s flagship product, introducing high performance distributed stream processing, Google BigQuery integration, and enhanced platform manageability and streaming application development.

SQLstream’s Big Data on Tap™ platform architecture is built from the ground up for real-time, streaming Big Data applications. The new release, with its real-time data collection, transformation and sharing capabilities, enables businesses to respond even faster to their operational Big Data. SQLstream s-Server 3.0 queries log file, sensor and service data in real-time, joins and transforms data streams using only the standard SQL language, and shares results continuously to outputs such as Big Data storage platforms. Streaming SQL enables high volume, high velocity applications for both structured and unstructured data to be built rapidly without having to resort to low level code development.

With s-Server 3.0, throughput performance is up to 10 times faster than that of the previous releases. The performance breakthrough is enabled by lock-free distributed processing for live data streams, a method pioneered by SQLstream and essential for real-time Big Data scalability. In addition, s-Server 3.0 brings faster integration, achieved through a new bi-directional connector for Google BigQuery, and faster development through new streaming SQL operators and added Windows support.

“Recent EMA research shows that Big Data environments consist of multiple platforms including structured (RDBMS) and multi-structured (Hadoop and other NoSQL) data stores – each one handling the processing that matches the strengths of the platform. This collection is called the Hybrid Data Ecosystem,” said John Myers, senior business intelligence and data warehousing analyst at Enterprise Management Associates. “Streaming Big Data solutions like SQLstream s-Server offer continuous integration with real-time analysis. This will be important in the area of operational intelligence analysis where low query latency is key.”

“SQLstream s-Server 3.0 addresses the operational Big Data problems that other log and network monitoring software solutions are unable to solve,” said Damian Black, SQLstream CEO. “Organizations want to be more responsive to their real-time Big Data, and that requires SQLstream’s low latency and raw performance, plus our ability to join and share data continuously between any data source and any destination storage platform.”

An early availability program for SQLstream s-Server 3.0 validated the new performance and integration capabilities across a range of industries including telecommunications, transportation, financial services and High Performance Computing (HPC). High performance log and machine data processing was a key requirement, met by SQLstream s-Server 3.0’s ability to detect real-time operational issues in high volume, high velocity data streams that were out of range for existing log monitoring systems.

“We had too many systems producing too many logs too quickly for any of our existing tools to process in real-time or otherwise,” says Lucia Walle, Cornell University Center for Advanced Computing. “SQLstream is the solution that scaled to monitor logs in real time for key patterns indicating imminent and undesirable conditions.”

To download a free trial and for more information about SQLstream s-Server 3.0, its technical capabilities, business benefits, and our Big Data on Tap™ approach, visit www.sqlstream.com/whats-new-in-3.

About SQLstream Inc.

SQLstream Inc. (www.sqlstream.com) makes systems responsive to real-time operational Big Data. SQLstream enables organizations to query their log, sensor and service data directly, and to share streaming operational intelligence with external systems, continuously and in real-time. SQLstream is built on a standards-based, distributed and massively parallel architecture, and uses industry standard SQL for the rapid analysis of high volume, real-time data streams. Standards mean lower costs, proven performance and seamless integration. SQLstream is headquartered in San Francisco, CA.

Posted under Press Releases
December 4, 2012

The move over recent years has been towards increasingly distributed processing of data, both in terms of the underlying model and in terms of the processing architectures available. More and more of this data is also streaming and SQLstream is focused on effective access to streaming big data …

Read the full article on Decision Management Solutions: http://jtonedm.com/2012/12/03/first-look-sqlstream/

 

 

Posted under In the News

Vice President Business Development
November 21, 2012

I have a difficult time restraining my impulse to interrupt customers, vendors, employees, the janitor etc., when I hear them refer to ‘real time data’ when what is meant is the ability to ingest massive amounts of data and to interpret or analyze this in a short amount of time. Invariably they will say, “Well, this is clearly a ‘real time’ data problem” or the like. In short, they have confused acceptable low latency with streaming flows of data.

As the character Luke (portrayed by Steve McQueen, who many moons ago drove Vic Hickey’s Baja Boot in one of our Baja 1000 excursions, the Baja Boot being the precursor to the Humvee – but that’s another story) says in the classic 1967 movie Cool Hand Luke, “What we’ve got here is failure to communicate” – I submit ‘real time’ can never be actually achieved or determined; it is very much akin to the intersection of Brownian motion with Heisenberg’s Uncertainty Principle – assuming you can catch / read / interrogate something in ‘real time’, both time has passed and the mere action of observing the object has caused the object to move away. That is, ‘real time’ can never be truly achieved or realized; an event is over by the time one tries to measure or quantify it. Low latency, or rather low latency in a time frame of relevance, is actually what these folks mean.

Streaming data, however, is constant, flowing, measurable, and accessible should one have the means to do so, SQLstream s-Server being a great tool for such purposes. Results from analysis of such streams are streams themselves, or can be quantized into discrete events which can then be used to trigger or visualize events of import, leading systems, administrators, executives, etc., to take appropriate actions. This needs to happen in a low latency time constraint; however, ‘low latency’ itself may have different meanings for different applications of these technologies.
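As a rough illustration of quantizing a stream into discrete events, here is a short Python sketch. It is only a sketch under assumptions: the function name, the `ALERT`/`CLEAR` labels and the `(timestamp, value)` reading shape are invented for the example and are not part of s-Server. The continuous flow of readings goes in, and an event comes out only when the value crosses a limit in either direction.

```python
from typing import Iterable, Iterator, Tuple


def threshold_events(
    readings: Iterable[Tuple[int, float]], limit: float
) -> Iterator[Tuple[int, str]]:
    """Turn a stream of (timestamp, value) readings into ALERT/CLEAR events.

    An event is emitted only on a state change, so a long run of high
    readings produces one ALERT rather than one per reading.
    """
    above = False
    for ts, value in readings:
        if value > limit and not above:
            above = True
            yield ts, "ALERT"
        elif value <= limit and above:
            above = False
            yield ts, "CLEAR"
```

With readings `[(1, 0.2), (2, 0.9), (3, 0.95), (4, 0.4), (5, 0.85)]` and a limit of 0.8, the output is an ALERT at 2, a CLEAR at 4 and another ALERT at 5. How quickly those events must reach an administrator or a downstream system is exactly the ‘low latency in a time frame of relevance’ question, and the answer depends on the application.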

Posted under Market Views

Vice President Marketing
November 16, 2012

Silicon Valley Comes to Oxford 2012 (SVCO 12) is an iconic entrepreneurship event now in its 12th year, that brings the most relevant technology entrepreneurs from Silicon Valley to Oxford, UK to engage with University of Oxford’s Saïd Business School students plus members of the wider Oxford entrepreneurial ecosystem. SQLstream CEO Damian Black is hosting and presenting sessions on both aspects – the trials and tribulations of getting a tech Big Data startup off the ground in Silicon Valley, plus insight into the current state and future potential of the Big Data market. There’s also the debate at the University of Oxford Student Union where Damian will be leading one side of the debate, but more on that in a separate blog.  Other speakers include senior representatives from Google, IBM and McKinsey. Click here for the full list of speakers.

What they don’t teach you in Business School on how to create and build a start-up company.

Damian’s workshop on tech startups focuses on a real-world perspective that is often overlooked in Business Schools. Most are aware of the importance of developing the right ideas, the right business plans, on building the right team and attracting the right capital. These are of course all useful steps but the reality is often very different.

Damian is leading a workshop that will cover the lessons and scars learnt during the early days of SQLstream and other start-ups, giving a real-world perspective as to what was easy, what was hard, what worked, what didn’t, and the unanticipated pitfalls and unexpected boosts that together made up the start-up journey of taking a company from crazy concept to everyday reality.

The full material and paper from the workshop will be published here on the blog over the coming weeks.

Also coming up in the next few weeks, The “Big Data” Janus Effect (and why everyone might be looking the wrong way). Damian’s talk on Big Data at SVCO 12 will look at the intersection of Big Data and real-time, and whether the current emerging Big Data storage technologies are the beginning of a new era, or the final death throes of the static storage approach to business and operational intelligence that has persisted since the 70s.

Silicon Valley Comes to Oxford is on the 18th and 19th of November at the Oxford Centre for Entrepreneurship and Innovation.

Posted under Uncategorized

Vice President Marketing
October 25, 2012

Real-time Big Data means the streaming, continuous integration of high volume, high velocity data from all sources to all destinations, coupled with powerful in-memory analytics. It’s a paradigm shift from conventional store and process systems that’s playing well at the Intelligent Transportation Systems and Solutions World Congress here in Vienna.

(more…)

October 20, 2012

The latest installment in the datacentre mysteries – in which Hadoop has long been the prime suspect for writing off the relational database (RDB) – is out now. With fox-like cunning, [the NoSQL movement] devised a plan so that traditional databases could be slowly migrated onto this new model, without anyone noticing.

And they would have got away with it too, if it wasn’t for a young startup, SQLstream, sticking its nose in where it doesn’t belong. Damian Black, its CEO, might have accidentally disrupted the perfect disposal of RDB, by creating a real time analytical system that doesn’t need to number crunch big data. Instead, it intercepts live data streams, in real time, and creates insights from them a lot quicker. If they used it to analyse tweets about traffic jams, for example, they could avoid hundreds of millions of snarl ups on the motorway by tipping people off quicker than those ludicrous outdated ‘information’ notices they have on all our major trunk roads.

Click HERE to read the article in full.

Posted under In the News

Vice President Marketing
October 19, 2012

GigaOM took their successful format for the Structure events to Europe this week for the first time and SQLstream’s CEO, Damian Black, was one of the key speakers. In a session with GigaOM’s leading Big Data writer, Derrick Harris, Damian discussed how real-time Big Data can only be achieved using continuous streaming integration, and how standard SQL is the perfect vehicle as a powerful language for the massively parallel distributed processing of live data streams.

As always, the GigaOM events were extremely well executed, and drew many of the main thought leaders and business leaders in the industry, for example Amr Awadallah, CTO of Cloudera, and Werner Vogels, CTO of Amazon.

Real-time Distributed Big Data Processing

Damian’s talk was entitled ‘The Rebirth of Dataflow Computing: Real-time Distributed Big Data Processing’, and discussed how dataflow, an architecture for massively parallel distributed computing, is ideal for streaming Big Data integration and analysis. And SQLstream’s dataflow architecture harnesses the power of parallel dataflow using standard SQL as the data management language.

SQL is ideally suited to driving massively scalable applications based on a parallel, distributed processing architecture. In turn, this enables our customers to respond to each and every new record or piece of data as they arrive, in real time. So SQLstream is super low latency, unlike traditional data warehousing and Hadoop storage platforms, where data must be stored first and then queried. Even if you query faster, querying faster is not real-time.

Posted under Big Data · Real-time · Streaming SQL

Vice President Marketing

We’re at the Intelligent Transportation Society’s World Congress in Vienna, Austria next week, Mon 22 – Fri 26 October. The theme for this year is ‘Smarter on the Way’ and this is the first year since 2009 that it’s been held in Europe. The full panoply of intelligent transportation issues is addressed, everything to do with software, hardware and firmware that improves the traveler experience, transportation network utilization and capacity optimization. However, the combination of real-time traveler information systems, wireless sensors, GPS and predictive analytics is a specific focus. And increasingly, telematics and V2I/V2V applications are represented.

We’re going to be exhibiting, but also presenting on a special interest panel session: Optimizing ITS from a Customer Service Perspective. The session looks at how new technology can be used to make a significant and noticeable difference to a traveler’s experience. Our focus is to bring to bear our experience of consumer and customer quality of experience monitoring from other industries, particularly telecommunications. Social media and Twitter also have a role to play, with significant interest in extracting semantic information from Tweets to detect serious traffic incidents and congestion as they happen. However, there are issues with relying on social media, as I’ve discussed in a recent ITS International article, Social media mooted for traffic management.

Industries such as transportation, M2M and telematics are clearly leading the way in deploying real-time Big Data applications. And that’s understandable: it’s sensor networks and the Internet of Things that are creating vast volumes of high velocity data where each event or record has value for only a very short period of time after its creation. If you’d like to understand more about streaming geospatial analytics from live vehicle GPS data, the video below presents a brief overview of a recent deployment.

Continuous ETL for Google BigQuery

There is another significant difference from the traditional business intelligence and real-time analytics market. The mainstream Big Data market is focused entirely on Business Intelligence, a one-way function for collecting data in a static store and displaying the results on a graph or report. Visualization is also an important consideration for industry, but streaming integration is the first requirement, and the ability of the platform to support real mission critical applications a close second.

The Intelligent Transportation industry has all three requirements in abundance – vast volumes of fixed, wireless GPS sensor data that must be integrated seamlessly across the operational systems, mission critical operation as significant safety, environmental and traveller impact decisions are made as a result, and the ability to visualize network flow, congestion and alarms.

So if you’re in Vienna, Austria next week, drop by our booth, P21.


Vice President Marketing
October 11, 2012

Perhaps the highlight of Oracle OpenWorld last week, or at least, the most commented on by attendees at our booth, seemed to be Larry Ellison’s demo of Exadata and Exalytics – querying 10 days or so of stored Twitter feeds with the hope of finding the best US athlete from the recent London 2012 Olympics to endorse a car company. This seemed to strike a chord with the audience. How many organizations employ a marketing analytics company to spend a vast amount of time poring over data to work out the top candidates for a marketing campaign? That said, would the CMO really go with a query result, or choose their favorite in any case?

Cloud and business applications were a focus, although as others have blogged elsewhere, despite 80+ acquisitions in the past few years, Oracle remains a database company. Major announcements / news included:

  • Release of Oracle 12c (the ‘c’ for ‘cloud’), and the announcement of its first multi-tenanted and ‘pluggable’ databases got a few ripples of applause from the audience.
  • Exadata X3 box, the in-memory machine with 22 raw TB of memory and a claimed 10X compression making a total of 220TB of ‘memory’ in a rack. Oracle claims this is 100 times faster than the Exadata Oracle launched in the last few years.

Streaming analytics and Twitter

Back to Larry’s Twitter example. Of course, this can be achieved easily as a streaming application in real-time. Semantic streaming is something SQLstream has been doing for some time, taking unstructured data such as Twitter, emails and texts and determining sentiment and aggregated scoring in real-time. Use cases include identifying traffic incidents on the road networks to augment geospatial analysis of vehicle GPS data, and also in telecommunications, to better determine in real-time a customer’s true perception of their quality of experience for delivered services.

The numbers seemed impressive – Larry crunched nearly five billion tweets and 27 billion social media relationships. But breaking this down, is this really a Big Data problem? Five billion tweets, even over a one day period (I believe the demo was 10 days), is only 58,000 tweets per second. This is well in excess of Twitter’s top peak loads during major events such as the Superbowl, but well within the capability of SQLstream’s real-time streaming Big Data platform, even on an entry level single server, 2-core machine. Of course, the complete solution architecture may include data storage platforms such as Hadoop or Oracle, where aggregated streaming results can be loaded and persisted in real-time, further crunched in the data warehouse, and historical analysis joined back with the real-time streams to help better identify any moving trends.
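As a quick check on that arithmetic, here is a short Python sketch. The variable and function names are ours, and the tweet count is rounded to five billion as in the text above.

```python
# Back-of-envelope check of the tweet rates discussed above.
TWEETS = 5_000_000_000          # "nearly five billion" tweets, rounded up
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def tweets_per_second(total_tweets: int, days: int) -> float:
    """Average sustained rate if the tweets arrived evenly over `days` days."""
    return total_tweets / (days * SECONDS_PER_DAY)

rate_one_day = tweets_per_second(TWEETS, 1)    # roughly 58,000 per second
rate_ten_days = tweets_per_second(TWEETS, 10)  # roughly 5,800 per second

print(f"1 day:   {rate_one_day:,.0f} tweets/s")
print(f"10 days: {rate_ten_days:,.0f} tweets/s")
```

Spread over the 10 days of the demo, the average rate drops to under 6,000 tweets per second.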

It was an interesting demo nonetheless, and one that really should be handled in real-time as a streaming problem. SQLstream’s ability to analyze and aggregate streams (in this case across keywords and hashtags), to provide geospatial and clustering analysis, and to deliver raw and aggregated data as continuous streams to the backend storage platforms makes this very achievable today.

On the show floor

Apart from the heavy footfall at the SQLstream booth, perhaps most notable were the increasingly uninventive marketing mechanisms used to persuade unsuspecting attendees to listen to product pitches based on the promise of winning a piece of Apple hardware. Surely marketing managers can think up something a bit more inventive than an iPad? The exception was the speaking robot. Not sure if this was an exhibit floor attraction, although I saw it ‘chatting’ to passersby on the Wipro booth.

Contact us if you’d like to find out more about SQLstream and our streaming Big Data management platform.


Vice President Marketing
September 28, 2012

We’re going to be exhibiting at Oracle OpenWorld with SQLstream next week. Looking forward to it following some great analyst briefings over the past few weeks and a lot of interest now from companies looking for a real-time Big Data management platform where they can build their apps using standards-based streaming SQL.

In the beginning, or at least 18 months ago, there was Hadoop, and an increasing awareness that the Internet, sensors, M2M and cloud infrastructure were generating more data than existing platforms could manage. The trend was Map-Reduce, NoSQL databases and a wide range of analytics companies claiming ‘real-time’. But querying faster is not real-time, even with the distributed power of Map-Reduce and more structured storage platforms such as HBase and Cassandra. Nor do Hadoop and related technologies address one key requirement – the real-time integration of high volume, high velocity and high variety data. Real-time, streaming integration is essential to making Big Data applications work.

Even more importantly, mainstream industry is looking to adopt real-time streaming technology, but does not want to base its mission critical applications on high latency, non-standardised development frameworks.

This is where SQLstream comes in: a real-time Big Data management platform for integrating live data, and performing real-time, in-memory analytics on the data as it streams past. Industry and major companies have a real-time need, but prefer the use of standards such as SQL. The ability to lift existing SQL from their high performance analytics databases and data warehouses and drop the queries into a real-time streaming platform is key.

So if you’re going, come and see us, we’re at #106 Moscone South.

Posted under Uncategorized

Vice President Marketing
September 25, 2012

This week we’re at the Intelligent Transportation Society of California’s Annual Conference & Expo in Sacramento. The conference is focussed on the adoption of advanced technologies to improve traveler mobility and heighten safety in California.  Attendees include specialists from government, industry and academia.

This week we’ll be demoing real-time traffic congestion analytics and visualization based on processing live vehicle GPS data. Play the video below for a short overview of the live customer case study.

The real-time traffic analytics and connected vehicle programs are of particular interest to SQLstream.  SQLstream turns live data into real-time value, enabling industries such as Transportation, Telecommunications and Automotive Telematics to integrate and analyze all their real-time data on a single, streaming Big Data platform.

So if you are visiting Sacramento this week, we’d love to meet up.


Vice President Marketing
August 23, 2012

SAN FRANCISCO, CA, August 23, 2012. SQLstream Inc, the leading standards-based real-time Big Data solutions provider, today announced the appointments of Glenn Hout as Vice President of Sales for the Americas and Chris Clabaugh as Vice President of Business Development. Hout and Clabaugh join the senior management team to build on SQLstream’s strong foundation in North America, and to expand SQLstream’s success into Asia Pacific and Europe.

Damian Black, SQLstream CEO, attributes SQLstream’s success to the rate with which industries such as transportation and telecommunications are adopting real-time, streaming Big Data technology. “As a result, we have seen a surge in customers seeking real-time Big Data solutions,” continues Damian. “Chris and Glenn bring passion, experience and leadership, as well as the existing relationships necessary to establish SQLstream as industry’s preferred partner for mission critical, real-time Big Data applications.”

Glenn Hout brings twenty-four years’ experience in the database and software applications industry with both early-stage and established software organizations. Glenn joins SQLstream from Big Data startup Algebraix Data, and prior to that, held management positions at Oracle, Information Builders and Hyperion solutions.

Chris Clabaugh joins SQLstream from Kabira Technologies (acquired by TIBCO) where he was Vice President of Business Development, overseeing ISV and OEM partnerships globally, managing branch operations in Japan, and opening new markets and geographical expansion in the Americas. Prior to Kabira, Chris held management positions at Collabnet, Progress Software, Allegrix and SCO.

Interested in Real-time Big Data solutions? Register here for a 60-day free trial of SQLstream s-Server. The download version is fully functional, and includes all adapters, drivers and development tools.

About SQLstream Inc.

SQLstream is the leading standards-based platform for integrating and analyzing streaming Big Data, forging real-time competitive advantage from live service, system and sensor data. SQLstream’s standards-based, distributed and scalable architecture uses industry standard SQL for the rapid analysis of high volume, real-time data streams. Standards mean lower costs, proven performance and seamless integration. With SQLstream, our customers are turbo-charging their Big Data environments for real-time, and responding with confidence to business exceptions based on accurate, up to the second information. SQLstream is headquartered in San Francisco and can be found on the web at www.sqlstream.com.

Media contact: Ronnie Beggs, +1 877 571 5775, pr@sqlstream.com

Posted under Press Releases
August 3, 2012

Datanami article on SQLstream for real-time Big Data solutions, reporting on a SQLstream application for real-time traffic analytics based on live GPS data.

“Providing real-time traffic updates is a practical and feasible application of the big data analysis technology that exists today. According to real-time streaming big data platform vendor, SQLstream, which aims to integrate and quickly analyze live data feeds,  “High-performance systems such as SQLstream Transport are able to transform large volumes of raw GPS data into real-time actionable information.” The transformation in question took place on the roads of Venezuela in the form of SQLstream’s ETL Connector for Google Big Query….”

Read the full article on datanami here http://bit.ly/M9ps1b

Posted under In the News

Vice President Marketing
July 24, 2012

SQLstream, the leading platform for real-time Big Data integration and analytics, today announced it has joined the Google Cloud Platform Partner Program as a Technology Partner with its release of the Continuous ETL connector for real-time Big Data integration with Google BigQuery.

Continuous ETL for Google BigQuery

SQLstream’s continuous ETL is the key building block for powerful real-time Big Data solutions that integrate vast volumes of real-time streaming Big Data with historical trend information. SQLstream provides streaming integration for structured and unstructured data in real-time and on a massive scale, overcoming the poor scalability and high latency issues that are typical with traditional batch-based solutions. With SQLstream, organizations can act instantly on new information as it arrives, improving operational efficiency and driving new revenue opportunities.

“The Google Cloud Platform Partner Program enables us to integrate our product offerings for real-time Big Data with the power of the Google cloud platform,” said Damian Black, SQLstream CEO. “SQLstream is the only real-time, streaming Big Data management platform that is built on the ANSI and ISO SQL standards, making it the perfect real-time complement for Google Big Query. Our customers benefit from the power of the Google Cloud Platform while utilizing the same SQL skills to perform streaming Big Data integration with real-time analytics.”

The continuous ETL integration has already been deployed by Grupo Intech in Venezuela as an integral component of a SQLstream and Google BigQuery solution for detecting road network traffic congestion in real-time using GPS data. “Understanding the quality and coverage offered by each of our GPS data providers is essential,” said Marcelo Ricigliano Cantos, CEO of Grupo Intech. “Google BigQuery is updated continuously and in real-time by SQLstream, and generates on-demand confidence indicators as to GPS data quality and reliability.”

“To help customers get the most out of our cloud platform products,” explains Eric Morse, Head of Sales and Business Development, for Google’s Cloud Platform, “we work closely with technology companies, like SQLstream, that provide powerful complementary solutions integrated with our platform.”

Google’s Cloud Platform products enable customers to implement:

  • Cloud app solutions, such as mobile apps, social apps, business process apps, and websites, using Google App Engine and Google Cloud SQL.
  • Cloud storage solutions, such as high-end backup and recovery, active archiving, global file sharing/collaboration, and primary SAN/NAS, using Google Cloud Storage.
  • Large-scale computing solutions, such as batch processing, data processing and high performance computing using Google Compute Engine.
  • Big data solutions, such as interactive tools, trend detection and BI dashboards, using Google BigQuery and Google Prediction API.


Contact: Ronnie Beggs, VP of Marketing, +1 877-571-5775

Posted under Press Releases

Vice President Marketing

SQLstream has joined the Google Cloud Platform Partner Program as a Technology Partner with the release of our Continuous ETL connector for real-time Big Data integration with Google BigQuery. Continuous ETL solves the real-time performance issue for Big Data storage platforms. Although very capable of processing large datasets once the data has been stored, Big Data storage platforms are not designed to process real-time streaming data.

Continuous ETL for Google BigQuery

SQLstream’s Continuous ETL connectors enable SQLstream to integrate and analyze vast volumes of live, real-time Big Data, and to update the Big Data storage platforms immediately as input data arrives. With existing ETL solutions, the data warehouse and Big Data storage platforms are updated in batch mode and are therefore always out of date. Continuous ETL updates continuously and in real-time.

Real-time, integrated Big Data solutions

The continuous ETL integration was carried out by Grupo Intech as an integral component of a SQLstream and Google BigQuery solution for detecting road network traffic congestion in real-time using GPS data.

GPS records are collected from several specialist data providers. Each GPS record contains the position, direction and speed of a vehicle. SQLstream collects GPS records from all the providers in real-time and turns the live GPS data feeds into real-time traffic flow information, and by applying a variety of congestion detection algorithms, is able to generate a map of the current traffic congestion positions, plus additional information as to the extent or severity of the problem.

SQLstream calculates real-time traffic analytics and displays the results in real-time on a map-based display. Roads are color-coded based on the average traffic speed relative to the posted speed limits for each road segment. Colored push-pins appear on the highways to provide further information on the potential problem, such as the average speed for every minute over the previous 15 minutes.
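The two display calculations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the record layout and the 0.75 / 0.40 color thresholds are hypothetical, not values taken from the deployment.

```python
from collections import defaultdict

def per_minute_average_speeds(records, now_s, window_minutes=15):
    """Average speed for each whole minute within the last `window_minutes`.

    `records` is an iterable of (timestamp_seconds, speed_kmh) pairs for one
    road segment. Returns {minutes_ago: average_speed}.
    """
    buckets = defaultdict(list)
    for ts, speed in records:
        age_s = now_s - ts
        if 0 <= age_s < window_minutes * 60:
            buckets[int(age_s // 60)].append(speed)
    return {m: sum(v) / len(v) for m, v in sorted(buckets.items())}

def segment_color(avg_speed_kmh, speed_limit_kmh):
    """Map average speed relative to the posted limit onto a display color.

    The 0.75 and 0.40 thresholds are illustrative assumptions.
    """
    ratio = avg_speed_kmh / speed_limit_kmh
    if ratio >= 0.75:
        return "green"
    if ratio >= 0.40:
        return "amber"
    return "red"
```

A segment averaging 50 km/h on an 80 km/h road would be shown as amber under these assumed thresholds.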

Continuous ETL and Google BigQuery

So how does BigQuery help? GPS data is of variable quality, and also, as the data is available from many different providers, the network coverage for each provider can differ significantly.

Google BigQuery is used to store the real-time traffic information produced by SQLstream, and to generate a confidence index on the coverage and quality, and therefore the reliability, of the incoming GPS records. Effectively, Google BigQuery is helping Grupo Intech answer the question: do we have sufficient valid data for the real-time congestion information to be reliable?

SQLstream’s Connector for Google BigQuery is used to deliver real-time analyzed data using a Continuous ETL operation. With continuous ETL, streaming data is continuously acquired, cleaned, conditioned, transformed and then periodically appended to the BigQuery table, for example, every minute or every 500,000 events. Therefore the data in BigQuery is always up to date and accurate.
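The "every minute or every 500,000 events" append policy can be illustrated with a small buffering class. This is a minimal sketch, not the connector's implementation: `BatchAppender` and its `sink` callable are hypothetical names, and the sink stands in for whatever actually appends rows to the table.

```python
import time

class BatchAppender:
    """Buffers rows and flushes on a row-count or elapsed-time threshold."""

    def __init__(self, sink, max_rows=500_000, max_seconds=60.0,
                 clock=time.monotonic):
        self.sink = sink              # callable that receives a list of rows
        self.max_rows = max_rows
        self.max_seconds = max_seconds
        self.clock = clock            # injectable so the policy can be tested
        self.buffer = []
        self.last_flush = clock()

    def add(self, row):
        """Buffer one row, flushing if either threshold has been reached.

        The time threshold is only checked when a row arrives; a real
        connector would presumably also flush on a timer.
        """
        self.buffer.append(row)
        too_many = len(self.buffer) >= self.max_rows
        too_old = self.clock() - self.last_flush >= self.max_seconds
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Hand the buffered rows to the sink and restart the timer."""
        if self.buffer:
            self.sink(self.buffer)
            self.buffer = []
        self.last_flush = self.clock()
```

Whichever threshold is reached first triggers the append, so the stored table lags the stream by at most one batch.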

Here we see the BigQuery table, which holds a minimum of 3 months of enhanced and interpolated data for ad-hoc data mining. The period can be extended if needed for improved accuracy.

Continuous ETL – more than real-time

SQLstream’s analysis of the road network is based on individual 10m road segments. Typically GPS records are not available for every vehicle for each and every road segment, so SQLstream interpolates missing road segment data in real-time, and continuously appends those results as well into the Google BigQuery table – storing a complete real-time map of traffic flow for the entire road network at a level of granularity of every 10 meters. Therefore, unlike traditional ETL tools that tend to offer only a basic mapping function, SQLstream delivers significantly more information into Google BigQuery than is available in the source data.
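The post does not say which interpolation method is used, so the following Python sketch simply assumes linear interpolation between the nearest observed segments along a single road. The function name and data layout are hypothetical.

```python
def interpolate_segment_speeds(observed, n_segments):
    """Fill gaps by linear interpolation between the nearest observed neighbors.

    `observed` maps segment index -> speed for segments that have GPS data.
    Segments before the first or after the last observation copy the nearest
    observed value (an assumption). Returns a list of length `n_segments`.
    """
    if not observed:
        raise ValueError("need at least one observed segment")
    known = sorted(observed)
    speeds = [0.0] * n_segments
    for i in range(n_segments):
        if i in observed:
            speeds[i] = float(observed[i])
        elif i < known[0]:
            speeds[i] = float(observed[known[0]])
        elif i > known[-1]:
            speeds[i] = float(observed[known[-1]])
        else:
            left = max(k for k in known if k < i)
            right = min(k for k in known if k > i)
            frac = (i - left) / (right - left)
            speeds[i] = observed[left] + frac * (observed[right] - observed[left])
    return speeds
```

With observations of 40 on segment 1 and 70 on segment 4, segments 2 and 3 are filled with 50 and 60, which is how the stored table can end up holding more rows than the source feeds supplied.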

How does it work

SQLstream allows the user to zoom in and center the geographic map on an area of interest. This action defines a bounding box, and the coordinates of the bounding box are used in this first Big Table query to extract the number of 10 meter road segments within the bounding box. The action of defining the bounding box in SQLstream launches a query similar to the following example:

select count(*) from road_elements
where
  "reLatitude" between 7.7109920000000001 
                   and 9.8525100000000005 AND
  "reLongitude" between -65.754776000000007 
                    and -62.958756000000001

This query returns the number of road segments within the bounding box (in this example, there are 191,866 road segments), which is then plugged into the query executed by BigQuery in order to calculate the percentage of road elements for each GPS data provider actually used for calculating the road network coverage within the bounding box.

select vendorID, float(integer(count(*) / 191866 * 1000)/10) as Confidence
from (select vendorID, reID from sample.july2nd
where 
 float(reLatitude) between 7.7109920000000001 
                       and 9.8525100000000005 AND
 float(reLongitude) between -65.754776000000007 
                        and -62.958756000000001
group by vendorID, reID)
group by vendorID order by Confidence desc

The query results are returned to SQLstream, which displays the answer to the user. BigQuery is ideal for these types of geospatial coverage metrics. There is a high volume of meaningful data stored in the BigQuery table and the calculation of coverage is available immediately. By choosing a variety of bounding boxes, it is possible to obtain coverage metrics on a country-wide basis, on a metropolitan area basis, or for a much smaller problem area.
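To make the confidence calculation concrete, here is a Python sketch of the same logic as the second query above: count the distinct road elements each vendor reports inside the bounding box, divide by the total number of segments, and truncate to one decimal place. The function name and the in-memory row format are assumptions for illustration.

```python
def coverage_confidence(rows, total_segments):
    """Percentage of road segments each vendor covers, truncated to 0.1%.

    `rows` is an iterable of (vendor_id, road_element_id) pairs already
    filtered to the bounding box. Duplicates are collapsed, mirroring the
    inner GROUP BY vendorID, reID.
    """
    segments_by_vendor = {}
    for vendor_id, re_id in rows:
        segments_by_vendor.setdefault(vendor_id, set()).add(re_id)
    confidence = {
        vendor: int(len(segs) / total_segments * 1000) / 10
        for vendor, segs in segments_by_vendor.items()
    }
    # Highest coverage first, as in ORDER BY Confidence DESC.
    return sorted(confidence.items(), key=lambda kv: kv[1], reverse=True)
```

For example, a vendor reporting on 2 distinct segments out of 8 gets a confidence of 25.0, however many GPS records it sent for those segments.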

Conclusions

In conclusion, we have demonstrated how continuous ETL is the basic building block for integrating real-time and historical data. Continuous ETL enables offline data storage to be kept up to date directly from real-time applications. And unlike conventional ETL solutions that simply map data between systems, SQLstream’s Continuous ETL connector for Google BigQuery delivers aggregated data and analytics as well as augmented and interpolated data values.

 


Vice President Marketing
June 25, 2012

We’ve been exhibiting at Structure 2012 in San Francisco, where our CEO Damian Black was speaking on dataflow architectures for massively scalable real-time Big Data computing. In fact, this was a milestone for us as Damian was on the very first Big Data panel at the first Structure event in 2008.

Dataflow is a technique for parallel computing that emerged from research in the 1970s. It’s based on graph-based execution models where data flows along the arcs of a graph and is processed at the nodes. It was decades ahead of its time, in an era when hardware was expensive and there was no mainstream requirement for massively parallel, low latency computing architectures. However, dataflow as an architecture has found its place and time, with the emergence of Big Data volume, real-time low latency requirements, commodity hardware and low cost storage. Dataflow is driving the architectures for today’s real-time big data solutions.
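For readers new to the idea, a toy push-based dataflow graph can be written in a few lines of Python. This is a single-threaded sketch of the execution model only (data pushed along arcs, processed at nodes); the `Node` class is hypothetical and has nothing to do with how SQLstream s-Server is implemented.

```python
class Node:
    """A dataflow node: applies `fn` to each item and forwards the results."""

    def __init__(self, fn):
        self.fn = fn            # returns an iterable of zero or more outputs
        self.downstream = []    # outgoing arcs

    def connect(self, other):
        """Add an arc from this node to `other` and return `other`."""
        self.downstream.append(other)
        return other

    def push(self, item):
        """Process one item and push each result along every outgoing arc."""
        for result in self.fn(item):
            for node in self.downstream:
                node.push(result)

# Build a small graph: source -> keep even numbers -> square -> collect.
collected = []
source = Node(lambda x: [x])
evens = source.connect(Node(lambda x: [x] if x % 2 == 0 else []))
squares = evens.connect(Node(lambda x: [x * x]))
squares.connect(Node(lambda x: collected.append(x) or []))

for value in range(6):
    source.push(value)

print(collected)  # [0, 4, 16]
```

Each result leaves the graph as soon as its input arrives, and because nodes only communicate along arcs, independent branches could in principle run on separate cores or servers.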

Structure 2012 - Dataflow comes of age

Click to view Structure 2012 presentation video

SQLstream adopted the principles of dataflow as the basis of our architecture for SQLstream s-Server. Our adapters turn any data source into a live stream of data tuples which are combined, aggregated and analyzed by the SQLstream s-Server platform. SQLstream has added one essential feature to dataflow – the use of SQL as a dataflow management language. SQL has been used for some time as the language of choice for relational database management systems, and in this context is getting a bad press in light of new structures for Big Data storage and NoSQL queries. However, SQL is powerful, declarative (therefore applications can be built easily, quickly and cheaply) and is a natural paradigm for processing streaming dataflows. The benefit is extremely low latency with the ability to process massive volumes of live data over an unlimited number of servers – exactly the requirements of real-time Big Data. In fact, this is the only architecture capable of processing real-time Big Data streams. With real-time requirements now in the 20 to 100 million events per second range, power, scalability and low latency are key.

Diagram 1: Dataflow architecture for real-time streaming Big Data computing

The SQLstream s-Server architecture concept is illustrated in Diagram 1. As a dataflow architecture, each node is a streaming SQL statement – a continuous SQL query, processing arriving data over a moving time window (time windows can be from 1 millisecond for ultra low latency requirements, to months or even years where comparison against long term moving averages is required, for example, Bollinger bands). Why is this important? Well, it’s the only approach for low latency, real-time solutions, as information flows out of the system as soon as input data arrives, that is, the high latency of batch-based approaches such as Hadoop Map-Reduce is removed completely.
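The idea of a continuous query over a moving time window can be sketched in plain Python. This is an illustration of the concept rather than streaming SQL; the class name is hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque

class SlidingWindowAverage:
    """Emits the average of the values seen in the last `window_s` seconds."""

    def __init__(self, window_s):
        self.window_s = window_s
        self.events = deque()   # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def on_event(self, ts, value):
        """Add one event and return the updated window average immediately."""
        self.events.append((ts, value))
        self.total += value
        # Expire events that have slid out of the window.
        while self.events and self.events[0][0] <= ts - self.window_s:
            _, old = self.events.popleft()
            self.total -= old
        return self.total / len(self.events)
```

Every arriving event produces an updated result straight away, rather than waiting for a batch job to re-scan stored data.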

Mozilla Glow: Real-time download monitor with SQLstream and HBase

Damian presented a simple example of SQLstream and parallel dataflow in action. Mozilla’s Glow application was a continuously updating download counter for the Firefox 4 browser when it was released. The application used SQLstream s-Server to collect live download statistics from all the download servers worldwide. Download records were processed and aggregated in real-time and displayed on the Glow visualization map, illustrating exactly how many copies of the browser had been retrieved. SQLstream s-Server also provided a continuous ETL operation into Apache HBase, storing aggregated and filtered records for further in-depth analysis. Click here to watch the application in action.

Finally, and in contrast to Structure, we also attended a Gartner session last week with Merv Adrian and Svetlana Sicular, which sought to bring some sense of perspective to Big Data. This was really a reality check as to the current maturity levels of the Hadoop Big Data platforms and the effort required to deploy. The wider adoption across industry in general will require significantly more mature products and applications, particularly around the OPEX costs for deployment, security concerns and the ability to deliver business intelligence to all consumers in a large organization. The recommendation was to use an integrator such as Cloudera or Hortonworks. Mainstream organizations are looking at the Hadoop / Big Data approach, but many do not currently see either a use case or a reason for adoption. It was interesting to hear a perspective that didn’t need to be buzzword compliant, and presented a positive yet realistic perspective on wider adoption.

Posted under Big Data · Streaming SQL · Uncategorized

Vice President Marketing
June 7, 2012

We’re at Sensors Expo this week, showcasing in the Big Data & Analytics Pavilion. This is the first year the event has included a specific area for real-time Big Data solutions for sensor networks.

Real-time control in a Big Data World

SQLstream CEO, Damian Black, presented on Real-time Control in a Big Data World. The presentation focused on the increase of sensor data and the emergence of the “Sensor Internet”, plus the applications required to collect and analyze streaming sensor data, and to drive real-time actions and updates. In particular, addressing the emerging real-time Big Data challenges in this area driven by wireless and GPS technologies, M2M applications and V2V/V2I.

Click here to view Real-time Big Data integration and analytics for sensor networks

Real-time streaming data integration for Big Data

It’s clear the primary challenge is not managing the data volume per se, or even delivering real-time operational intelligence; rather, it’s the more fundamental issue of real-time streaming data integration. How can such huge volumes of data from many different sources and locations be integrated into the operational platforms, and how can the issues of multiple operational silos be overcome to provide an integrated real-time control platform?

Interestingly, these are the exact same issues SQLstream addresses for the Big Data and Hadoop world in general – getting data in, getting data out, connecting existing data stores in real-time, and delivering real-time in-memory analytics on the data as it streams past:

  • Real-time streaming data integration of any data source and between existing storage platforms and operational systems
  • Real-time streaming monitoring and analytics on the arriving and streaming data
  • Scalability through parallel distributed processing of processing pipelines

The importance of geospatial analytics

Geospatial analytics is a key requirement in the sensor data market. Big Data analytics in general is about one dimensional problems, usually the correlation of similar events, or the correlation of events over time. The geospatial dimension is the key difference between Big Data platforms for the “Sensor Internet” and the wider IT / machine data applications. Fortunately this has been a feature of SQLstream for some time, and central to many of our customer deployments. For example, real-time traffic analytics from GPS data, and real-time seismic monitoring.

Posted under Uncategorized

Vice President Marketing
June 5, 2012

Glue Conference 2012 in Denver, CO, at the end of May was a great conference: well attended, with knowledgeable participants, and the only conference I know that looks at gluing cloud and mobile applications together with a developer focus.

There was the usual wave of NoSQL, cloud storage, cloud platforms and Hadoop presentations, as you’d expect, but also some interesting keynotes. Ray O’Brien, CTO for IT at NASA, talked about the evolution of Nebula and OpenStack at NASA, and James Governor from RedMonk talked about the evolution of historical analytics.

From our perspective, the strength of the show was in making physical rather than logical connections: both partnerships and potential customers interested in building real-time Big Data applications, and in how SQL has been repurposed as an API for streaming Big Data, moving it forward significantly from its roots as a static data management language.

Real-time, streaming Big Data

Relational Streaming for real-time Big Data scaling

Our CEO, Damian Black, presented on real-time streaming Big Data, both as a real-time alternative to Hadoop, and also as a complement to add real-time responses and streaming integration to existing Hadoop installations. One question we were asked several times was why SQL? A good question. This isn’t a religious debate about the language by any means, and if we had opted to build a Big Data batch storage and analytics platform (e.g. like Hadoop), we would have gone a different route.

However, when it comes to processing streaming tuples in real-time, a standard SQL approach has two big advantages over all others. First, with the extension of the SQL WINDOW operator to process streaming data over fixed time windows, both structured and unstructured data can be processed painlessly without having (no pun intended) to define a static schema and without the need for any coding whatsoever. In effect, SQLstream processes streams of arriving tuples over time windows and pushes out the results to other systems. This is similar in concept, at least, to Hadoop, although Hadoop is purely batch-based, processing static files and pipelining sets of tuples through low-level Map-Reduce functions.

Relational Streaming for Real-time Big Data

Click on the title slide to view the real-time Big Data presentation

However, the second benefit is equally important. Streaming SQL queries include standard operators such as GROUP BY and PARTITION. These provide the best clues possible to a query planner capable of automating the dynamic scaling of streaming pipelines over vast numbers of servers. This gives a reliable and controllable mechanism for Big Data scalability without the need for hardcoding server allocation hints.

Real-time at GlueCON

The strength of the real-time track at GlueCON was encouraging. It was interesting, though, that the term ‘real-time’ is now about as overused as ‘Big Data’, and about as poorly understood. For SQLstream, it’s the streaming integration of any and all data sources with in-memory analytics, processing streams at millions of events per second. For some other vendors, it appears real-time drops off at significantly lower rates and numbers of connections!

Next stop, GigaOM Structure in San Francisco

Next stop GigaOM Structure in San Francisco, June 20 / 21 at the Moscone Center. Visit us there if you’re attending.

Posted under Big Data · Real-time · Streaming SQL · Uncategorized

Vice President Marketing
May 22, 2012

We’re at the ITS America Annual Meeting in National Harbor, Washington DC this week (see the team in action below), the yearly opportunity for the US intelligent transportation community to get together and discuss how IT and technology can be used to better serve travelers, industry and government. Real-time, the Smart City and Vehicle-2-Vehicle communication have been the trending topics. Bill Ford’s earlier remark that “the car will be a rolling set of sensors” has been echoed here. Martin Thall, VP Telematics at Verizon, talked about the issues of getting the data up from the vehicle, and how 4G / LTE will be key due to the large volume of fast data; however, the pricing and subscription model for 4G services is currently one of the hurdles to wider telematics adoption. These are essentially real-time Big Data requirements for mission critical applications, the area where SQLstream excels.

Chris Vein, Deputy White House CTO, discussed the need for open standards, and for government to find ways for private industry to participate more. He described the idea of “government as platform”, encouraging the “lean startup” and offering “smart disclosure” of data to encourage information sharing and real innovation.

Why is my GPS Travel Time always wrong?

Ursula Burns, Xerox CEO, captured exactly the need for SQLstream s-Transport: how come, every morning when I drive to work, my GPS tells me the journey will take 37 minutes, and it has yet to take 37 minutes? Accurate and reliable real-time Travel Time is the first problem SQLstream solved with the SQLstream s-Transport product, using GPS data to monitor traffic flow accurately and in real-time, and presenting correct end to end journey time information throughout the journey, not just at the beginning. (More on SQLstream s-Transport here).

I’ll update and publish a final report on the show in the next few days. If you’re attending the ITS America Annual Meeting, National Harbor, Washington DC, please drop by Booth #631.

Posted under Uncategorized

Test
May 10, 2012

This week I’m attending an interesting conference at UC Berkeley called the “Berkeley conference on Streaming Data”. The organizers are primarily astronomers and statisticians, but the talks discuss issues and solutions to streaming data problems across a wide selection of scientific areas and engineering applications. Real-time streaming Big Data applications presented included oceanography, biology, genetics, reading handwriting, astrophysics, particle physics, recommendation engines for social media, and inevitably, real-time fraud detection from live data feeds.

I presented on a deployment of SQLstream as a Dynamically Scalable Cloud Platform for the Real-Time Detection of Seismic Events. Based on work with UCSD seismologists, SQLstream has been deployed to detect significant events in data collected from a large grid of seismic sensors. A large-scale data infrastructure (the OOI/CI) provides raw signal data over an AMQP message bus.

Plot of Seismic Events

SQLstream monitors live seismic data feeds in real-time, applying heuristic algorithms that look for patterns indicating earthquakes. The live system scales dynamically across multiple servers in a cloud environment based on the current demand. You can view the presentation here.  I also blogged previously on the application here.

In conclusion, I have two main observations from the conference so far (it continues until Friday). The first is that the majority of fields in science and technology appear to have a Big Data and often a real-time Big Data problem. The second is the extent of the innovation and computer science resources dedicated to solving these problems, in particular, for this conference, developing algorithms for data analysis and machine learning (that is, automatic pattern recognition) that work on streams of flowing data. It’s clear that traditional data management and even Big Data batch-based methods don’t work when you need continuous results from dynamic data. And the amount of data is huge.


Vice President Marketing
April 25, 2012

The Text Analytics Summit in London this week was an opportunity to catch up on the latest trends and state of the Text Analytics market.  An interesting couple of days with a few themes emerging.

Firstly, Big Data.  Not entirely unexpected, but almost every presentation referred to Big Data in some shape or form.  In part this was referring to the volume of data to be processed, but primarily in the context of databases for the storage and processing of unstructured data of any volume.

Although not discussed explicitly, there’s obviously a search for business models that work.  Most applications were B2B platforms, sold as a package of product, services and consultancy, enabling organizations to better mine text data for market and competitor intelligence.  However, some were seeking to monetize through subscriptions to information feeds.

 

For SQLstream, we presented on the use of real-time text analytics for improving incident detection and prediction.  In particular, the use of real-time Twitter and text messages for identifying Quality of Experience issues with IP content services, but also the use of Twitter for improving real-time incident detection in transportation networks.  And in line with the rest of the conference, we did our bit for Big Data, describing how real-time streaming integration and analytics can be built on unstructured data analytics as an integrated real-time Big Data and Hadoop platform.


Vice President Marketing
April 19, 2012

Joining real-time structured and unstructured data feeds for better accuracy and reliability from your operational intelligence, and the Text Analytics Summit, 2012, London.

Three IT trends have emerged over the past year – Big Data, real-time and the importance of unstructured data. Taking the latter first, there is an increasing awareness that much of the data we have available to us today is unstructured (Cloudera amongst the many claiming 80% of all data is unstructured). Unstructured data includes text messages, documents, tweets, emails and video content. There’s also a growing industry for tools and software that perform unstructured data analytics – primarily text analytics using semantic modeling, tagging and subsequent analysis.

The past year has also seen Big Data and Hadoop emerge from the rarefied atmosphere of California’s Silicon Valley into mainstream IT.  Driven by statistics such as 90% of all data available today has been generated in the past two years, Big Data as a functional area for primarily unstructured data is here to stay, and is effectively supercomputing lite for the masses.

The need for real-time streaming data management

However, the real-time trend is less well served today by either Hadoop or by the currently available tools and software for unstructured data analytics. Real-time is about the need for immediate detection and response – turning data sources into live data feeds, and processing the data on the fly, then loading batch based distributed platforms such as Hadoop as an output data stream.

‘Stream Reasoning’

I’ve also seen the term ‘stream reasoning’ used to describe the real-time processing of unstructured data, although this is still an area that is less well developed and understood than the more mainstream text analytics from stored data. ‘Stream reasoning’ is the ability to process and respond to semantic knowledge about tweets, messages and other social media interaction in real-time, on the fly. The diagram below illustrates how a semantic modeling library has been plugged into a real-time streaming pipeline in SQLstream – the example is based on SQLstream’s GATE UDX but any library with reasonable performance and a query response API can be plugged in.

Combining streaming structured and unstructured live data feeds

Unstructured data feeds, such as text messages and tweets, are streamed through the semantic tagging UDX and library, with the output of this stage being real-time streams of semantic tagged data.  The data can then be analyzed and frequency charted in real-time.

Text Analytics Summit, 2012, London

I’ll be speaking on this topic at the Text Analytics Summit, 2012, London. I’ll be discussing how to combine stream reasoning (admittedly, mostly Twitter messages) with structured data, with the objective of improving the overall accuracy and reliability of the resulting operational intelligence. I’ll be using a couple of examples – customer experience management for IP content services such as VoIP and VoD, and improving the accuracy and reliability of traffic congestion and travel time information, that is, how text analysis of tweets and messages can help to pinpoint the severity of road network traffic problems.

Look forward to seeing you there, or if you can’t make it, I’ll be blogging on the highlights next week.

 


Vice President Marketing
April 2, 2012

Last week SQLstream sponsored and CEO Damian Black presented at Structure Data in New York, a conference exploring “the technical and business opportunities spurred by the growth of big data”.

It’s clear that Big Data has moved on considerably in a very short space of time, from the Silicon Valley, 101 world of Java developers and Hadoop into the mainstream wider business world (but still with Hadoop!).

Some themes emerging from the conference:

  • The basic need to deliver high performance, massively scalable computing infrastructure as data volumes grow exponentially. It’s clear that the pain from structured and unstructured data is driving different approaches at different stages in the data management lifecycle – better visualizations, better cleansing and filtering, and a better understanding of the appropriate analytics tools that are most applicable at each stage.
  • The emergence of the SQL layer. It’s clear Hadoop has its strengths and is here to stay. It’s effectively ‘supercomputing lite’ and given today’s data volumes, is just the tool for the job. However, there are a couple of trends emerging. First, is it actually necessary to store all the data, when much of it is obviously not of interest? Second, once the initial analysis of all the structured and unstructured data is achieved, there’s an emerging layer above Hadoop that’s looking very structured. Both these functions are looking much more SQL-like.
  • Real-time, low latency analytics. Hadoop is not, nor does it claim to be, a low latency, real-time data management platform. There is a well-defined business need to analyze log file, sensor and network data in real-time (sub-second to a few minutes latency), but also to stream the arriving data through to Hadoop for further analysis. Obviously this layer needs to be as scalable as, if not more so than, the underlying Hadoop platform.

Damian’s presentation at Structure Data focused on relational streaming – massive-scale parallel data processing using SQL, generating real-time results from streaming input data. The talk described relational streaming as a standalone real-time management layer, and also SQLstream integrated with Hadoop as the streaming layer in the Big Data stack (you can also read the GigaOM report on the presentation here).

 

Posted under Big Data · Events · In the News

Vice President Marketing
March 26, 2012

GigaOM reports on Damian Black, SQLstream CEO, talking about streaming Big Data at Structure Data, New York, March 21 – 22. In the talk entitled “Streaming Big Data: Millions of events per second”, Damian discussed the similarities between Hadoop Map/Reduce and the parallel, distributed architecture used for streaming data processing in SQLstream.

Click here to read the GigaOM report.

Posted under In the News

Vice President Marketing
March 20, 2012

Visit our new website to find out more about real-time Big Data applications

Big Data is here to stay. The breadth of the term Big Data may change as it becomes as much a marketing imperative as the ‘Cloud’ word, but the requirement for ‘supercomputing lite’ processing for the non-supercomputing world of enterprise data is a must have.

The rise of Big Data has happened in parallel with the emergence of real-time operational intelligence, and the extension of real-time analytics into the world of real-time updates and process control. Much of the recent interest has focussed on how these two worlds merge into a single complementary solution.

The NoSQL Big Data platforms offer massively scalable, resilient data processing over commodity hardware. They are ideally suited to scaling large scale data problems over hundreds or thousands of servers. However, platforms such as Hadoop do not support, nor were they designed to support, real-time streaming data processing and analytics. Their forte is the batch-based, highly scalable, store-compute loop of map/reduce.

That’s where SQLstream comes in. SQLstream collects and conditions real-time updates from sources such as log files, sensor networks and GPS events. It both integrates streaming data into and from Big Data stores and generates real-time analytics from the data as they stream past. The SQLstream architecture also has parallels to that of map/reduce. SQLstream uses Relational Streaming, which is a paradigm for processing streaming Big Data tuples using standard SQL queries. SQL offers strong potential for automatic optimization and distributed parallel processing of streaming data. Whereas platforms such as Hadoop execute batch queries over stored tuples, SQLstream and Relational Streaming execute continuous queries over arriving data.

We’re also at Structure Data this week in New York, where our CEO, Damian Black, will be presenting on the wider area of streaming Big Data and massive scalability. If you are attending, visit us for a demo of the ‘millions of events per second’ program, and a demonstration of massively parallel stream processing on an Elastic Compute Cloud.


Test
March 13, 2012

SQL is a declarative language – a SQL query is a specification for the result, it’s neither a recipe nor a program to produce the results. A traditional relational database query returns a set of rows, the ResultSet. A streaming SQL query in SQLstream returns a stream of rows. That is, the ResultSet may never end. In a traditional relational database query, all the rows are fetched, and the ResultSet scans them. With a relational streaming query, the result rows do not exist as yet – as time goes by they come into existence as arriving data are processed.

However, just like any relational database, the SQLstream stream computing Server has two main components:

  1. The query engine, or planner, calculates the most efficient plan to produce the requested results – this is the query plan.
  2. The data engine, or kernel, executes the query plan to produce the results. The scheduler controls the execution process.

Streaming dataflow graphs

Executing a query means computing the results from the inputs. For a streaming query that means processing the rows in the input streams as they arrive. The execution is organized as a dataflow graph, that is, a mathematical directed graph of nodes and arrows, where the nodes represent elementary operations on data, and the edges into and out of a node represent the input and output data streams. In effect, an assembly line that produces results, where the nodes are the machines or stages on the line.

A query plan defines one of these data flow graphs. The executor runs the data through the graph: it is responsible for executing the nodes, where each node consumes rows arriving on input edges and produces rows on the output edges. Of course each output edge is often the input of a downstream node.
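
As a rough illustration of the assembly-line idea, here is a minimal Java sketch of a dataflow graph. It is not SQLstream code: the class names (`Node`, `FilterNode`, `MapNode`, `CollectNode`) are hypothetical, and it pushes one row at a time on a single thread, whereas a real engine would run nodes under a scheduler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical sketch: a node is one stage on the assembly line.
abstract class Node<I, O> {
    private final List<Node<O, ?>> downstream = new ArrayList<>();

    // Adds an edge from this node's output to the input of `next`.
    <X> Node<O, X> connect(Node<O, X> next) {
        downstream.add(next);
        return next;
    }

    // Consumes one row arriving on the input edge.
    abstract void accept(I row);

    // Pushes one result row along every output edge.
    protected void emit(O row) {
        for (Node<O, ?> n : downstream) {
            n.accept(row);
        }
    }
}

// Passes on only the rows that satisfy the predicate.
class FilterNode<T> extends Node<T, T> {
    private final Predicate<T> keep;
    FilterNode(Predicate<T> keep) { this.keep = keep; }
    @Override void accept(T row) { if (keep.test(row)) emit(row); }
}

// Transforms each input row into one output row.
class MapNode<I, O> extends Node<I, O> {
    private final Function<I, O> fn;
    MapNode(Function<I, O> fn) { this.fn = fn; }
    @Override void accept(I row) { emit(fn.apply(row)); }
}

// Terminal node that simply remembers what reached it.
class CollectNode<T> extends Node<T, T> {
    final List<T> rows = new ArrayList<>();
    @Override void accept(T row) { rows.add(row); }
}

class DataflowDemo {
    public static void main(String[] args) {
        FilterNode<Integer> source = new FilterNode<Integer>(price -> price > 10);
        CollectNode<String> sink = new CollectNode<String>();
        source.connect(new MapNode<Integer, String>(price -> "bid:" + price)).connect(sink);
        for (int price : new int[] {5, 20, 7, 42}) {
            source.accept(price);   // rows are pushed in as they "arrive"
        }
        System.out.println(sink.rows);   // prints [bid:20, bid:42]
    }
}
```

Because `connect` can be called more than once on the same node, an output edge can feed several downstream nodes, which is the property that lets query plans be interconnected.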

Multiple, connected query plans

In a traditional static database, each query plan is independent and transitory, and operates against persistent tables and indexes. In a relational streaming platform, the query plans last forever and are interconnected. Although streams are used just like tables in SQL, they are not persistent. In fact they have no contents at all, and can be thought of as rendezvous points that accept input rows and pass them on to their output consumers.

Now if the data flow graph were a physical system — say, a collection of transparent plastic straws with colored water flowing through them — then all the processing would be happening simultaneously. However, for the software abstraction of the streaming straw pipelines, it’s not practical or necessary to run all the nodes at the same time. It is the scheduler that manages this network of interconnected query plans, and when and how to execute each node.

In a traditional, static database, the result of a query is a set of rows that are computable all at once. The executor can give good performance by running one node at a time, pushing batches of rows through the graph. A streaming database is different. When the inputs are streams of recent events, arriving in real time, it’s important to produce the outputs fast enough so that the result rows are timely.

The execution works by pushing outputs, not by pulling inputs, and it means executing several nodes at the same time, whenever possible. This requires a finer management of the execution objects and the ability to schedule parallel execution on the nodes.

Parallel scheduling of stream execution

In SQLstream, multiple, interconnected query plans are being executed at the same time. Together they constitute a large dataflow graph in which each node is a mini data processing machine that performs a simple operation on its input data, and passes its output to the next node.

The scheduler is responsible for managing the interconnected dataflow graph. It keeps track of the status of each node: at any moment, some are running, some are ready to run, some are waiting for more input, some are waiting for their output to be consumed. Each node is allowed to run for a quota of time, and where possible nodes are selected to execute in parallel as separate threads.

The SQLstream scheduler may not want to be fair: some branches in the graph (some streams) may be more important, some may need high throughput, some may need lower latency. The application designer decides, and the SQLstream scheduler delivers.
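
To make the bookkeeping concrete, here is a small Java sketch of a scheduler that tracks node states and favors some nodes over others. It is an illustration under simplifying assumptions, not the SQLstream scheduler: the names (`ExecNode`, `QuotaScheduler`) are hypothetical, the quota is counted in rows rather than time, priorities are plain integers, and ready nodes run one after another instead of in parallel threads.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;

enum NodeState { READY, WAITING_FOR_INPUT, WAITING_FOR_OUTPUT }

// Hypothetical execution node: moves rows from its input buffer to a bounded output buffer.
class ExecNode {
    final String name;
    final int priority;   // higher runs first (an assumption of this sketch)
    final int outputCapacity;
    final Deque<String> input = new ArrayDeque<>();
    final Deque<String> output = new ArrayDeque<>();

    ExecNode(String name, int priority, int outputCapacity) {
        this.name = name;
        this.priority = priority;
        this.outputCapacity = outputCapacity;
    }

    NodeState state() {
        if (input.isEmpty()) return NodeState.WAITING_FOR_INPUT;
        if (output.size() >= outputCapacity) return NodeState.WAITING_FOR_OUTPUT;
        return NodeState.READY;
    }

    // Runs until the quota is used up or the node can no longer make progress.
    int run(int quota) {
        int done = 0;
        while (done < quota && state() == NodeState.READY) {
            output.addLast(input.removeFirst());
            done++;
        }
        return done;
    }
}

class QuotaScheduler {
    private final List<ExecNode> nodes = new ArrayList<>();
    private final int quota;

    QuotaScheduler(int quota) { this.quota = quota; }

    void add(ExecNode node) { nodes.add(node); }

    // One scheduling pass: every READY node gets one quota, most important first.
    // Returns the names of the nodes that ran, in execution order.
    List<String> runOnePass() {
        List<ExecNode> ready = new ArrayList<>();
        for (ExecNode n : nodes) {
            if (n.state() == NodeState.READY) ready.add(n);
        }
        ready.sort(Comparator.comparingInt((ExecNode n) -> n.priority).reversed());
        List<String> ran = new ArrayList<>();
        for (ExecNode n : ready) {
            n.run(quota);
            ran.add(n.name);
        }
        return ran;
    }
}
```

The deliberately unfair part is the sort by priority: a node on a latency-sensitive branch always gets its quota before a less important one.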

Next time …

This is the first in a series of blogs discussing both the principles and practical examples of parallel stream execution. The next blog in the series will look at some real world examples, and at how parallel execution is essential to deliver both high throughput and low latency.

February 17, 2012

One of the great advantages of SQLstream as an analytical platform is that it uses the most popular, standardized language for data analysis, SQL. SQLstream worked to make only the minimum number of extensions to SQL necessary to encompass the streaming data paradigm, so that most streaming SQL pipelines look almost indistinguishable from SQL for reading static relational data. This enables data analysts to leverage virtually all of their existing SQL skills in the streaming context.

Similarly, SQLstream felt it was important to make the streaming environment feel familiar and productive to application developers as well, so SQLstream supports the standard JDBC interface for using streams, again with just the minimum extensions necessary to encompass streams.

This post assumes a basic familiarity with JDBC and its main components: connections, statements, and result sets. First we’ll look at these in their usual tabular context, then see what it takes to extend the model to streaming data. All the data items come from the SALES example schema that comes with SQLstream.

Reading JDBC Data

In a database world, the pattern for reading data is quite standardized: Connect to the database, execute a query, and read each row that comes back until there are no more rows. For example,


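// Uses java.sql.Connection, Statement, ResultSet and SQLException.
// getConnection() is the application's own helper for opening a connection.
Connection c = getConnection();
try {
     Statement s = c.createStatement();
     ResultSet rs = s.executeQuery("SELECT * FROM SALES.EMPS");
     while (rs.next()) {
           System.out.println(rs.getString("NAME") + " " + rs.getString("EMPID"));
     }
     rs.close();
     s.close();
     c.close();
} catch (SQLException se) {
     // report the failure rather than silently ignoring it
     se.printStackTrace();
}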

This simple example loops through the entire EMPS table, printing the name and employee ID number for each row in the table, then finishes. The key for this model is that the result set is finite. Even if this were a very large table, the loop would eventually process every row. So you can handle reading the data from a table as a monolithic step in a sequence of procedures.

The Challenge of Streams

In a streaming data environment, however, you have to change a couple of your basic assumptions. In particular, I have found three Rules of Streaming that dictate how to write client code:

  1. There always might be more data.
  2. You never know when the next row might arrive.
  3. The rate of the rows matters.

The same pattern shown earlier for reading finite tables will work for streams,  as long as you don’t expect your application to do anything else. For a streaming ResultSet, the next() method only returns false when the stream closes. In many applications, that might never happen, or at least might not happen for weeks or months or longer, so clearly you cannot simply wait for each stream to end.

This is particularly critical in an application such as SQLstream Studio, where a developer needs to be able to edit the definitions of objects and at the same time be able to watch data flowing in existing streams. These streaming data views – known as Inspect windows – have to function in an event-driven, multitasking environment. There can be a virtually unlimited number of them active at any given time, along with editors, console views, and other dynamic content. So nothing should block the updates of other views. And for a little added complexity, Studio also needs to be able to handle non-streaming items as well, so preferably the same code should handle either tables or streams.

Studio also has to deal with a number of other requirements that are fairly unique to the development environment, such as how to manage updating a human-readable window of maybe 20 or 30 rows at a time from a stream that might be flowing at many thousands of rows per second. For now, we will just focus on the tasks common to handling streams in any application environment.

Reading in the Background

As with any long-running task, the solution involves partitioning the work into threads. Because of Rule #1 (“There always might be more data.”), you should just assume that reading from a stream needs to happen in a background thread. There is essentially never a scenario in which you want to wait for an entire stream to be read before proceeding to the next task.

To some extent how you implement your background stream-handling tasks will depend on the environment you are working in. Variations in the data rate and the required responsiveness of the application might cause you to make some different choices.

One good rule of thumb is to do as little processing as possible in the stream-reading loop. Slow processing of incoming rows can result in “back-pressure” in the data pipeline. As a result, it’s best to read rows from the ResultSet as expeditiously as possible, handing off the data to other threads for processing.

Here, for example, is a simple thread to read rows from the SALES.BIDS stream:


class BidReader extends Thread
{
      @Override
      public void run()   // must be public to override Thread.run()
      {
            try {
                 Connection c = getConnection();
                 Statement s = c.createStatement();
                 ResultSet rs = s.executeQuery("SELECT STREAM * FROM SALES.BIDS");
                 while (!interrupted() && rs.next()) {
                       // read columns and put into work queue for processing thread
                 }
                // close rs, s, and c
            } catch (SQLException se) {
                 // report the failure rather than silently ignoring it
                 se.printStackTrace();
            }
      }
}

Note that the loop doesn’t depend solely on the ResultSet’s next() method, but also tests whether the thread has been interrupted. You probably wouldn’t use this exact mechanism (I prefer to override Thread.interrupt() and set a boolean flag), but it shows you succinctly that you need to be aware of more things than just whether there is more data to read.
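
For what it's worth, here is roughly what the flag-based variant might look like. This is a sketch, not code from Studio: the class name `FlaggedReader` is made up, and a `BlockingQueue` stands in for the blocking `ResultSet.next()` call so the example can run without a server. Whether a blocked JDBC call actually wakes up on an interrupt depends on the driver, so treat that part as an assumption.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class FlaggedReader extends Thread {
    // volatile so the reading loop sees the change made by another thread
    private volatile boolean stopRequested = false;
    private final BlockingQueue<String> source;   // stand-in for a streaming ResultSet
    int rowsRead = 0;

    FlaggedReader(BlockingQueue<String> source) {
        this.source = source;
    }

    // Record the stop request first, then deliver the normal interrupt.
    @Override
    public void interrupt() {
        stopRequested = true;
        super.interrupt();
    }

    boolean isStopRequested() {
        return stopRequested;
    }

    @Override
    public void run() {
        while (!stopRequested) {
            try {
                source.take();   // blocks until a row arrives, like next()
                rowsRead++;
            } catch (InterruptedException e) {
                // nothing to do: the loop condition checks the flag
            }
        }
    }
}

class FlaggedReaderDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> rows = new LinkedBlockingQueue<>();
        rows.add("bid-1");
        rows.add("bid-2");
        FlaggedReader reader = new FlaggedReader(rows);
        reader.start();
        while (!rows.isEmpty()) {
            Thread.yield();      // wait until both rows have been taken
        }
        reader.interrupt();      // ask the reader to stop
        reader.join();
        System.out.println(reader.rowsRead + " rows read");
    }
}
```

The advantage of the flag over testing `interrupted()` directly is that the request survives even if some library call inside the loop catches the interrupt and clears the thread's interrupt status.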

Another thing to keep in mind is that ResultSet.next() blocks until either more data arrives or the stream closes. That’s yet another reason to have this happening in a background thread.

You’ll notice that code snippet references a “processing thread.” That’s because of Rule #2 (“You never know when the next row might arrive.”). It’s generally good to decouple the reading code from the processing code. One easy model is to have the reader read each row, then add the row to a shared queue structure, where the processing thread can pick up the rows and process them asynchronously. This minimizes the chances of a slow processing step causing the reader to slow down and potentially push back up the stream. To the extent possible, you’d like your reader to be sitting and waiting for the next row when it arrives, rather than constantly trying to handle a backlog of incoming rows. If you have to have a backlog, you want it to be in your application, not in the stream server, because that will slow everyone down, including other applications trying to read from the same streams.
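
Here is a minimal sketch of that reader/queue/processor arrangement. To keep it runnable without a server, an `Iterator<String>` stands in for the streaming ResultSet; the class name `RowPipeline`, the end-of-stream marker and the upper-casing "processing" step are all placeholders of mine, not part of SQLstream or JDBC.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class RowPipeline {
    // Unique end-of-stream marker, compared by identity so it cannot clash with a real row.
    private static final String END = new String("END");

    static List<String> run(Iterator<String> source) {
        // Unbounded queue: any backlog builds up here, in the application,
        // rather than pushing back on the stream server.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> processed = new ArrayList<>();

        // Reader: does nothing but move rows from the source into the queue.
        Thread reader = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted() && source.hasNext()) {
                    queue.add(source.next());
                }
            } finally {
                queue.add(END);   // always tell the processor that no more rows will come
            }
        }, "row-reader");

        // Processor: takes rows off the queue at its own pace.
        Thread processor = new Thread(() -> {
            try {
                for (String row = queue.take(); row != END; row = queue.take()) {
                    processed.add(row.toUpperCase());   // placeholder for real work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "row-processor");

        reader.start();
        processor.start();
        try {
            reader.join();
            processor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed;
    }
}
```

In a real client the reader loop would be the `while (rs.next())` loop from the BidReader example, and the stream would normally not end, so the joins at the bottom are only there to make the sketch terminate. An unbounded queue keeps the reader from ever blocking, at the cost of memory if the processor falls far behind.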

Such decoupling doesn’t really make sense in a pure database application, since there will be a finite number of rows to process, so your goal is to minimize total processing time. If there are 1,000 rows, it doesn’t matter whether you read them all in one second, then spend an hour processing, or read them over the period of an hour, processing as you go. The total amount of data read and processed is the same.

In the streaming world, it is important to read the data as quickly as possible in order to maximize the overall throughput of the system. That’s Rule #3 (“The rate of the rows matters.”) coming into play. The data throughput of a given stream is gated by its slowest reader. Part of the contract your client needs to fulfill is to not bog down the system for other clients.


Vice President Marketing
January 26, 2012

QoS and service level monitoring has always presented a challenge for telecommunications companies. With the increasing uptake of IP voice and video services, the vast data volumes generated and the lack of an end to end view make monitoring the service experience in real-time increasingly difficult.

Real-time QoS Monitoring

In this blog I’m looking at the core building blocks of a real-time IP service monitoring solution by using a much simplified view of a real-time application. Diagram 1 illustrates the basic problem – how to monitor an IP service when the end to end view is only possible by piecing together large volumes of events from many different sources – the core network provider’s network, the home network and the cable modem, and the service provider’s platforms.

In SQLstream we capture each event stream in real-time. Applications are built as streaming pipelines – unlike a traditional database solution, where event data must first be stored and then processed, SQLstream streams the data through processing views, capturing, combining, filtering, aggregating and applying analytics to the event streams, without having to store the data. This enables real-time operational intelligence at extremely high volumes with very low latency.

The first views in the pipeline capture the data streams. A declaration for an external data feed is shown below: the real-time MyEvent stream, where the source of events is the external system agent or integration adapter.

CREATE OR REPLACE STREAM MyEvent
( "eventName" VARCHAR(10),
  "eventSeq" BIGINT,
  "eventVal1" INTEGER,
  "eventVal2" SMALLINT,
  "eventVal3" BIGINT )
DESCRIPTION 'source of events';

The raw MyEvent data is first filtered, searching for the events of interest. As illustrated in the code example below, these initial views tend to be as simple as possible in order to maximize reuse – the simplest being a SELECT STREAM * FROM WHERE statement. Streams can be combined, grouped or joined in a single view, or a single view provided per stream, or both.

CREATE OR REPLACE VIEW RawEvents 
    AS SELECT STREAM * FROM MyEvent 
WHERE "eventName" = 'RawEvent';

Diagram 2 illustrates the concept of the streaming data pipeline, using a simplified example for exception detection. The SQL view illustrated above is the Stream Capture #1 view in the diagram. The use case is built on a real world example, raising an exception if a number of events of a particular type or value are detected within a specified time window.

Real-time Stream Processing Pipeline

The second view in the pipeline, Stream Processor #1, is shown below. In this example the view is responsible for the basic processing of the stream, counting the number of events that occur within a time window, in this case 180 seconds.

CREATE OR REPLACE VIEW CountedEvents AS 
SELECT STREAM *, 
   COUNT("eventName") OVER win AS "eventCount",
   FIRST_VALUE(RE.ROWTIME) OVER win AS "firstEventTime",
   FIRST_VALUE("eventSeq") OVER win AS "firstEventSeq" 
FROM RawEvents AS RE 
WINDOW win AS (RANGE INTERVAL '180' SECOND(3) PRECEDING);

The final stage in this particular processing pipeline is the detection of the alert.

CREATE OR REPLACE VIEW FlagTriggerEvents 
    AS SELECT STREAM *, 
    "eventCount" >= 3 AS "alert" 
FROM CountedEvents;
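
For readers more at home in procedural code, the counting and alerting logic of the last two views can be sketched in Java roughly as follows. This is only an analogy for what the views compute, not how SQLstream executes them: the class `SlidingWindowCounter` is hypothetical, timestamps are plain milliseconds assumed to arrive in non-decreasing order, and treating an event exactly 180 seconds old as still inside the window is an assumption of the sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class SlidingWindowCounter {
    private final long windowMillis;
    private final Deque<Long> eventTimes = new ArrayDeque<>();

    SlidingWindowCounter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Records an event and returns how many events fall in the window ending at it.
    int add(long rowtimeMillis) {
        eventTimes.addLast(rowtimeMillis);
        // Drop events that are older than the window relative to the new event.
        while (rowtimeMillis - eventTimes.peekFirst() > windowMillis) {
            eventTimes.removeFirst();
        }
        return eventTimes.size();
    }

    // Timestamp of the oldest event still in the window (compare "firstEventTime").
    long firstEventTime() {
        return eventTimes.peekFirst();
    }

    // Mirrors the "eventCount" >= 3 test in FlagTriggerEvents, with a configurable threshold.
    boolean addAndCheckAlert(long rowtimeMillis, int threshold) {
        return add(rowtimeMillis) >= threshold;
    }
}

class SlidingWindowDemo {
    public static void main(String[] args) {
        SlidingWindowCounter counter = new SlidingWindowCounter(180_000L);
        long[] arrivals = {0L, 60_000L, 120_000L, 400_000L};
        for (long t : arrivals) {
            boolean alert = counter.addAndCheckAlert(t, 3);
            System.out.println(t + " ms -> alert=" + alert);
        }
    }
}
```

With these arrivals only the third event raises the alert; by the time the fourth arrives, the earlier three have aged out of the 180 second window. The SQL version says the same thing declaratively and leaves the buffering to the engine.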

It would of course be possible to include all processing in a single view. However, maximizing reuse of views is a major consideration when building a stream processing application. The example is to illustrate how a pipeline can be constructed, where each view can have any number of consumers. For example, any number of Rule views can read from the Stream Processor #1 view, and any number of views can read directly from the stream capture view.

The application includes significantly more sophisticated integrations, features and analytics than illustrated here. For example:

  • Multiple rules
  • Recording and forwarding the events responsible for the generation of the alerts
  • Detecting escalation
  • Detecting clearance events
  • Joining with alert history to identify exceptional events that deviate significantly from historical norms

These use cases are important components of a complete solution, and I’ll be providing examples in subsequent blogs, explaining how these have been implemented.


Vice President Marketing
January 11, 2012

For Big Data, 2012 has started where 2011 left off, with a plethora of reports, articles and blogs. Interestingly, most still begin with the question “what is Big Data”. It appears ‘Big Data’ as a market is broadening its footprint far beyond its open source and Hadoop origins. My favourite new term in this quest for delineation is “Small Big Data”. (Isn’t that just “Data”?)

The most interesting trend for us is streaming Big Data processing and analytics. Edd Dumbill of O’Reilly Radar lists this as one of his “Five big data predictions for 2012”: “Hadoop’s batch-oriented processing is sufficient for many use cases, especially where the frequency of data reporting doesn’t need to be up-to-the-minute. However, batch processing isn’t always adequate, particularly when serving online needs such as mobile and web clients, or markets with real-time changing conditions such as finance and advertising.”

The real-time use case is an obvious one. If you need to respond or be warned in real-time or near real-time, for example to security breaches or a service-impacting event on a VoIP or video call, the high initial latency of batch-oriented data stores such as Hadoop is not acceptable.

However, there is also an emerging discussion on the storage of Big Data for big data’s sake. This is the blind collection and storage of data without due consideration as to how it’s going to be used. Dan Woods talks about this in his recent Forbes article “Curing the Big Data Storage Fetish”. The data will never create value without analysis, and little thought has been given to increasing analytics capacity.

There are many vendors emerging for the historical analysis of Big Data repositories, either on the Hadoop platform, or on platforms from the other large scale data warehouse vendors. However, there are very few vendors in the streaming Big Data analytics space, and even fewer products with the maturity, flexibility and scalability to process Big Data streams in real-time.

Streaming Big Data analytics needs to address two areas. First, the obvious use case, monitoring across all input data streams for business exceptions in real-time. This is a given. But perhaps more importantly, much of the data held in Big Data repositories is of little or no business value, and will never end up in a management report. Sensor networks, IP telecommunications networks, even data center log file processing – all are examples where a vast amount of ‘business as usual’ data is generated. It’s therefore important to understand what’s being stored, and only persist what’s important (which admittedly, in some cases, may be everything). For many applications, streaming data can be filtered and aggregated prior to storing, significantly reducing the Big Data burden, and significantly enhancing the business value of the stored data. At least until we understand why we’re trying to store everything.
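As a rough illustration of filtering and aggregating before storing, the Ruby sketch below reduces a handful of raw readings to one summary row per sensor per minute, and keeps only the exceptional readings in full. The reading layout, the sensor name and the threshold of 40.0 are hypothetical values chosen for the example.

```ruby
# Hypothetical sensor readings: [epoch_seconds, sensor_id, value]
readings = [
  [0, 'a', 10.0], [20, 'a', 12.0], [45, 'a', 50.0],
  [70, 'a', 11.0], [100, 'a', 13.0]
]

EXCEPTION_LIMIT = 40.0 # assumed threshold for a "business exception"

# Filter: only the exceptional readings are persisted individually.
exceptions = readings.select { |_, _, value| value > EXCEPTION_LIMIT }

# Aggregate: one summary row per sensor per minute replaces the raw rows.
summaries = readings
  .group_by { |time, id, _| [id, time / 60] }
  .map do |(id, minute), rows|
    values = rows.map { |row| row[2] }
    { sensor: id, minute: minute, count: values.size,
      avg: values.sum / values.size, max: values.max }
  end

puts "raw rows: #{readings.size}, stored rows: #{exceptions.size + summaries.size}"
```

What gets dropped is a business decision, which is why the threshold and the one-minute bucket are parameters here rather than fixed rules.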

Posted under Big Data

Vice President Marketing
November 18, 2011

ITS California AGM and Exhibition, 2011

Event report from the ITS California Annual General Meeting in Long Beach, Nov 13 – 15, 2011.

ITS California’s AGM is SQLstream’s local intelligent transportation event, and an opportunity for public and private sector companies to discuss those issues specific to the state of California. The event has grown in size significantly over the past few years, with new organizations attending, all keen to contribute and discuss different perspectives on the various transportation problems. ITS-CA was established in 1994 as a not-for-profit organization with the remit to foster the adoption of ITS technology across the state. Its funding is a mix of public and private industry.

SQLstream setting up the booth at ITS California

California has some acute transportation issues. For example, in Los Angeles County, the number of new residents is set to increase traffic on already congested roadways by an estimated 39 percent. Roadway expansion in this period is set to increase by only 3 percent, resulting in congestion levels rising by more than 200 percent in the next 25 years. Obviously addressing these issues has been the focus of the Californian transportation industry as a whole, but ITS-CA serves as a focus point once per year to discuss them in a wider forum.

The theme of this year’s event (not surprisingly) was “Discovering Keys to the Next Decade”, with sessions focused on:

  • Support for connected vehicle deployments.
  • Transportation systems efficiency, for example reliable travel time and pricing.
  • Analyzing the success of traffic light systems and how close we are to the ‘continuous green’ vision.
  • Improving traveller information systems (of specific interest to SQLstream).
  • The effectiveness of schemes such as car sharing.

As always, an over-arching theme emerged, and in this case it was the increasing presence of the private sector, and how private sector innovation is helping to drive the adoption of new technology and approaches. This was a common topic at ITS-CA in Berkeley last year, and has continued to grow. It was also a trend that was raised frequently at the recent ITS World Congress in Orlando. Long may it continue.

November 9, 2011

The Tutorial blog series helps SQLstream developers build streaming SQL applications. This blog is the third and final part of the Geospatial Visualization tutorial. The first blog in the series set out the streaming use case for connecting SQLstream to a Google Earth visualization, and described the initial steps required to capture the data and create a display list using Rails. The second part of this tutorial presented the core of the application – how to render the display list. And in this the concluding part of the visualization tutorial, the final key element is discussed – how to get the data flowing.

Getting the Data Flowing

The last step is to tie SQLstream and Postgres together. First we need to give SQLstream the credentials to access Postgres. Create a new file:

$QUAKES/quake.postgres.properties
1 URI=jdbc:postgresql://localhost/quakekml_development 
2 DRIVER=org.postgresql.Driver 
3 CONNPARAMPREFIX=dbConn_ 
4 dbConn_databaseName=quakekml_development 
5 dbConn_user=USER
6 dbConn_password=PASSWORD
7 dbConn_applicationName=SQLstream TableReader Adapter

On lines 5 and 6 insert the Postgres user and password you set up (same as in $QUAKES/quakekml/config/database.yml). Create a directory under $SQLSTREAM_HOME/plugin called jndi (if it’s not there already), copy your quake.postgres.properties file there and restart the SQLstream server.


linux> cd $SQLSTREAM_HOME/plugin
linux> mkdir jndi
linux> cp $QUAKES/quake.postgres.properties jndi/
linux> cd $SQLSTREAM_HOME
linux> bin/sqlstreamd

If you haven't loaded webfeed.sql and usgs.sql as described in the beginning of this post, you should do so now. Now we need to write the SQL that bridges the databases: $QUAKES/viz.sql


 1 SET SCHEMA '"WebFeed"';
 2 
 3 CREATE OR REPLACE FOREIGN DATA WRAPPER "TableUpdate"
 4     LIBRARY 'class com.sqlstream.plugin.tableupdate.TableUpdateStreamControlPlugin'
 5     LANGUAGE java
 6     DESCRIPTION 'adapter for doing insert/update/merge/delete to an external database';
 7 
 8 --
 9 -- Create SQL/MED foreign server and foreign stream
10 --
11 CREATE OR REPLACE SERVER "Postgres_TableUpdate"
12 FOREIGN DATA WRAPPER "TableUpdate"
13 OPTIONS (
14     connParams 'quake.postgres',
15     sqlDialect 'Postgres 8.x',
16     pollingMillis '5000',
17     commitCount '1000',
18     commitMillis '2000')
19 DESCRIPTION 'Postgres database with visualization';
20 
21 --
22 -- Display list for quake events
23 --
24 CREATE OR REPLACE FOREIGN STREAM "QuakeEventsDB" (
25     "SQLS_opcode" CHAR(2) NOT NULL,
26     "SQLS_chg" VARBINARY(32),
27     "id" INTEGER options("insert" 'skip', "update" 'skip'),
28     "when" TIMESTAMP,
29     "lat" DOUBLE,
30     "lon" DOUBLE,
31     "mag" DOUBLE,
32     "created_at" TIMESTAMP,
33     "updated_at" TIMESTAMP
34     )
35     SERVER "Postgres_TableUpdate"
36     OPTIONS (
37         TYPE 'tableUpdates',
38         MASTER 'true',
39         updatesTable 'quake_events')
40     DESCRIPTION 'table updated with quake events';
41 
42 --
43 -- Pump quake events into display list
44 --
45 CREATE OR REPLACE PUMP "1000-QuakeEventsPump" STOPPED
46 DESCRIPTION 'pump from "SmallQuakesDay" view to "QuakeEventsDB" foreign stream' AS
47     INSERT INTO "QuakeEventsDB" (
48         "SQLS_opcode", "when", "lat", "lon", "mag",
49         "created_at", "updated_at")
50     SELECT STREAM 'IN',
51         q.ROWTIME,
52         CAST(SUBSTRING("point", 1, POSITION(' ' IN "point") - 1) AS DOUBLE),
53         CAST(SUBSTRING("point", POSITION(' ' IN "point") + 1) AS DOUBLE),
54         "mag",
55         CURRENT_TIMESTAMP, CURRENT_ROW_TIMESTAMP
56     FROM "SmallQuakesDay" as q;

The first 20 lines set up the table updater (note the reference to our properties file on line 14). We then create a foreign stream to describe the Postgres table quake_events. When Rails created the table, it added an auto-incrementing id field and two timestamps, created_at and updated_at. The options specified on line 27 cause SQLstream to ignore the id column and let Postgres maintain it. At line 45 we describe the pump that reads from SmallQuakesDay and inserts into the stream we defined above. In the select clause we use the 'IN' opcode to indicate that we're inserting and the ROWTIME of the quake record to set the 'when' column in the display list. We parse the 'point' column the USGS provides, which is in the format "lat<space>lon", on lines 52 and 53. To set the Rails timestamps correctly we use CURRENT_TIMESTAMP to get the creation time and CURRENT_ROW_TIMESTAMP to get the time of this update. Load this file into SQLstream with:

linux> sqllineClient < viz.sql
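If you want to sanity-check the "lat<space>lon" parsing outside SQLstream, the same split can be mirrored in a few lines of Ruby. This is only a sketch for experimenting with sample values: parse_point is a hypothetical helper, not part of the tutorial application, and the sample coordinates are the SQLstream HQ values used in start.kml.

```ruby
# Split a "lat<space>lon" string at the first space, the same way the
# pump's SUBSTRING/POSITION expressions do in viz.sql.
def parse_point(point)
  space = point.index(' ')
  raise ArgumentError, "expected 'lat lon', got #{point.inspect}" if space.nil?

  lat = Float(point[0...space])        # text before the first space
  lon = Float(point[(space + 1)..-1])  # text after the first space
  [lat, lon]
end

lat, lon = parse_point('37.775410 -122.418955')
puts "lat=#{lat} lon=#{lon}"
```

Using Float() rather than to_f means a malformed value raises an error instead of silently becoming 0.0.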

Now all that's left to do is start the pump. Create this one-line file:

$QUAKE/start-pump.sql
1 ALTER PUMP "1000-QuakeEventsPump" START;

and start the pump:

linux> sqllineClient < start-pump.sql

Data should now be flowing from the USGS web service, through SQLstream, into Postgres, and rendered to Google Earth from Postgres via Rails. To verify that data is flowing, you can use the 'Edit quake events' link in your web app. When you follow the Google Earth link, you should see pins scattered across the globe indicating the past day's earthquakes, updated every 60 seconds. You'll probably have to zoom out a bit to see them, unless San Francisco is having another bad day. This is what I saw when I ran it:

And finally

That concludes the streaming data visualization tutorial. In case you've missed the earlier posts, the complete series can be found using these links:

Part 1: Capture the data and create the display list

Part 2: Rendering the display list

Part 3: Making it work with flowing data

Please contact us if you have any questions.

Posted under SQLstream Tutorials
November 2, 2011

The Tutorial blog series helps SQLstream developers build streaming SQL applications. This blog is the second in the Geospatial Visualization tutorial.  The first blog in the series set out the streaming use case for connecting SQLstream to a Google Earth visualization, and described the initial steps required to capture the data and create a display list using Rails.  In the second part of this tutorial, we’re going to discuss the meat of the application –  how to render the display list.

Rendering the Display List

To keep Google Earth continuously updated with the data flowing from SQLstream we’ll have to serve two different KML files: one will contain a KML Placemark for each quake, and the other gives the URL of the quakes feed and tells GE to continuously refresh it (in KML, this is the NetworkLink). We’re going to be serving compressed KML to cut down on the transmission time, so we’ll need the rubyzip gem we installed earlier included in our web app. Stop the server, go to $QUAKE/quakekml, and edit the file “Gemfile” to add this line to the end of the file:

gem 'rubyzip'

then issue these commands:

linux> bundle
linux> bundle package
linux> rails generate controller home index
linux> rails generate controller quakes start feed

The first two commands bundle up the application, including the new gem. The third command creates a Rails controller for our home page, while the last command creates a controller with two actions, one for each of our services. If you restart the server now, you can view these services at http://localhost:3000/home/index, http://localhost:3000/quakes/start and http://localhost:3000/quakes/feed. Edit $QUAKE/quakekml/config/routes.rb and add this line after the “get” commands to make home/index the home page for the web app:

root :to => "home#index"

You’ll also have to remove the file $QUAKE/quakekml/public/index.html. Restart the server and visit http://localhost:3000. You should now see the home#index default page rather than Rails’ startup page. Note that it shows you the name of the template file for this page, relative to the $QUAKE/quakekml directory. Edit $QUAKE/quakekml/app/views/home/index.html.erb to create the content for your landing page. At some point in the body, add this line to create a link to the start service:

<%= link_to 'Earthquake events (open in Google Earth)',
     :controller => 'quakes', :action => 'start' %>

You should also add this link to the scaffolding for quake events:

<%= link_to 'Edit quake events', quake_events_path %>

You shouldn’t have to restart the server, just refresh http://localhost:3000 to see the changes.

The Ruby code that implements our services is in $QUAKE/quakekml/app/controllers, where starter code has already been written by the “rails generate” commands we’ve been issuing. The ancestor of all of our controller classes is in application_controller.rb. We’ll add a method for setting up parameters available in any request (in this case, only one, the path to the server) and two methods for sending text so that the browser recognizes it as KML or KMZ:

$QUAKE/quakekml/app/controllers/application_controller.rb

require 'zip/zip'

class ApplicationController < ActionController::Base
  protect_from_forgery

  # Set up the common params to be computed
  # after the request is received (can't go in
  # initialize). These are available in all views.
  #
  def setup_common
    @path = request.host + ':' + request.port.to_s
  end

  # Output the kmz, given the kml
  #
  def send_kmz(kml)
    t = Tempfile.new("zipout-#{request.remote_ip}")
    Zip::ZipOutputStream.open(t.path) do |zos|
      zos.put_next_entry("sqlstream.kml")
      zos.print kml
    end

    send_file t.path,
      :type => "application/vnd.google-earth.kmz",
      :filename => "sqlstream.kmz"

    t.close
  end

  # Output the kml directly
  #
  def send_kml(kml)
    render :text => kml,
      :layout => false,
      :content_type => "application/vnd.google-earth.kml+xml"
  end
end

Next we edit quakes_controller.rb to write the methods that respond to the quakes/start and quakes/feed requests. Each method uses Rails’ template support to render a KML template, with an option set to prevent it from being laid out like an HTML page. The start method sets an instance variable, @refresh, to the number of seconds we want to wait before refreshes. The feed method uses Rails’ database support to store all of the quake event rows in @quakes. These instance variables are expanded in the templates.

$QUAKES/quakekml/app/controllers/quakes_controller.rb

class QuakesController < ApplicationController
  def start
    setup_common
    @refresh = 60
    kml = render_to_string :template => 'quakes/start.kml',
      :layout => false
    send_kmz kml
  end

  def feed
    setup_common
    @quakes = QuakeEvent.all
    kml = render_to_string :template => 'quakes/feed.kml',
      :layout => false
    send_kmz kml
  end
end

The path for the template files is relative to $QUAKES/quakekml/app/views. The quakes directory there should already exist and contain the default templates generated by Rails. We’ll create two new template files, starting with the one for quakes/start:

$QUAKES/quakekml/app/views/quakes/start.kml

 1 <?xml version="1.0" encoding="UTF-8"?>
 2 <kml
       xmlns="http://www.opengis.net/kml/2.2"
       xmlns:gx="http://www.google.com/kml/ext/2.2"
       xmlns:kml="http://www.opengis.net/kml/2.2"
       xmlns:atom="http://www.w3.org/2005/Atom">
 3 <Document>
 4   <name>Earthquake Monitor</name>
 5   <open>1</open>
 6   <visibility>1</visibility>
 7   <LookAt>
 8     <longitude>-122.418955</longitude>
 9     <latitude>37.775410</latitude>
10     <altitude>359000.0</altitude>
11     <range>37000.0</range>
12     <altitudeMode>relativeToGround</altitudeMode>
13   </LookAt>
14   <NetworkLink>
15     <name>Quakes</name>
16     <open>0</open>
17     <visibility>1</visibility>
18     <refreshVisibility>0</refreshVisibility>
19     <flyToView>0</flyToView>
20     <Link>
21       <href><%= url_for(:controller => 'quakes', :action => 'feed', :only_path => false) %></href>
22       <refreshMode>onInterval</refreshMode>
23       <refreshInterval><%= @refresh %></refreshInterval>
24       <viewRefreshMode>onStop</viewRefreshMode>
25       <viewRefreshTime>1.0</viewRefreshTime>
26     </Link>
27   </NetworkLink>
28 </Document>
29 </kml>

The start KML begins with a LookAt element specifying the starting view (directly above SQLstream HQ!). The NetworkLink element includes two substitutions handled by Rails: at line 21 we insert the URL for the quakes/feed service, and at line 23 we insert the refresh rate.
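To see the two substitutions in isolation, the NetworkLink portion can be rendered with Ruby's standard ERB library outside Rails. This is a sketch under assumptions: feed_url and refresh are hypothetical local variables standing in for Rails' url_for call and the @refresh instance variable, and the localhost URL is the development address used earlier in this tutorial.

```ruby
require 'erb'

# Trimmed-down version of the NetworkLink element from start.kml.
template = ERB.new(<<~KML)
  <NetworkLink>
    <name>Quakes</name>
    <Link>
      <href><%= feed_url %></href>
      <refreshMode>onInterval</refreshMode>
      <refreshInterval><%= refresh %></refreshInterval>
    </Link>
  </NetworkLink>
KML

feed_url = 'http://localhost:3000/quakes/feed' # stand-in for url_for(...)
refresh = 60                                   # stand-in for @refresh

kml = template.result(binding)
puts kml
```

Changing refresh here has the same effect as changing @refresh in the controller: Google Earth re-requests the feed at that interval.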

$QUAKES/quakekml/app/views/quakes/feed.kml

 1 <?xml version="1.0" encoding="UTF-8"?>
 2 <kml xmlns="http://www.opengis.net/kml/2.2"
       xmlns:gx="http://www.google.com/kml/ext/2.2"
       xmlns:kml="http://www.opengis.net/kml/2.2"
       xmlns:atom="http://www.w3.org/2005/Atom">
 3 <Document>
 4   <name>Pins</name>
 5   <open>1</open>
 6   <visibility>1</visibility>
 7   <Style id="pin">
 8     <IconStyle id="pin">
 9       <scale>1.0</scale>
10       <Icon>
11         <href>http://<%= @path %>/images/pin.png</href>
12       </Icon>
13     </IconStyle>
14     <LabelStyle>
15       <color>ff0000dd</color>
16       <scale>1.2</scale>
17     </LabelStyle>
18   </Style>
19   <Folder>
20     <name>Earthquakes</name>
21     <open>0</open>
22     <visibility>1</visibility>
23     <description></description>
24     <%= render :partial => "quake_event", :collection => @quakes %>
25   </Folder>
26 </Document>
27 </kml>

At line 11 we use the @path variable to specify the location of an image we want to appear on the globe at each quake location. You should place an image in $QUAKES/quakekml/public/images/pin.png. We use this one:

The other substitution, at line 24, causes a partial template to be rendered for each record in the @quakes collection. According to Rails’ naming conventions, the partial must be in this controller’s view directory with the name _quake_event.html.erb:

$QUAKES/quakekml/app/views/quakes/_quake_event.html.erb

 1 <Placemark>
 2   <name><%= quake_event.mag %></name>
 3   <open>1</open>
 4   <visibility>1</visibility>
 5   <description><![CDATA[<%= render :partial => 'quake_description',
         :locals => {:quake_event => quake_event} %>]]></description>
 6   <styleUrl>pin</styleUrl>
 7   <Point id="quake">
 8     <extrude>false</extrude>
 9     <coordinates><%= quake_event.lon %>,<%= quake_event.lat %>,0</coordinates>
10   </Point>
11 </Placemark>

The quake event records are inserted at lines 2 and 9. At line 5 we reference another partial that renders the HTML for the popup that appears when you click on the pin in Google Earth:

$QUAKES/quakekml/app/views/quakes/_quake_description.html.erb

<table style="text-align: center; width: 300px;" border="0" cellpadding="2" cellspacing="2">
  <tbody>
    <tr align="left">
      <td>on <%= quake_event.when.strftime("%Y/%m/%d") %>
          at <%= quake_event.when.strftime("%X %Z") %></td>
    </tr>
    <tr align="left">
      <td>at lat: <%= number_with_precision(quake_event.lat, :precision => 6) %>
          lon: <%= number_with_precision(quake_event.lon, :precision => 6) %></td>
    </tr>
  </tbody>
</table>

Note that this separation into multiple templates lets us express code in Ruby files, KML in KML files, and HTML in HTML files. Details about how the data is presented, such as how we format a timestamp, are taken out of the code and expressed in markup language.
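If you'd like to try the Placemark markup and the formatting calls without starting Rails, the sketch below renders one quake with the standard ERB library. It rests on a few assumptions: the QuakeEvent struct and its occurred_at field are hypothetical stand-ins for the Rails model and its "when" column, format('%.6f', ...) is used in place of the Rails number_with_precision helper, and the description is plain text rather than the HTML table. Note that KML coordinates are written longitude first, then latitude.

```ruby
require 'erb'

# Hypothetical stand-in for the Rails QuakeEvent model.
QuakeEvent = Struct.new(:mag, :lat, :lon, :occurred_at)

PLACEMARK = ERB.new(<<~KML)
  <Placemark>
    <name><%= quake.mag %></name>
    <description><![CDATA[on <%= quake.occurred_at.strftime('%Y/%m/%d') %> at <%= quake.occurred_at.strftime('%X %Z') %>, lat: <%= format('%.6f', quake.lat) %> lon: <%= format('%.6f', quake.lon) %>]]></description>
    <Point>
      <coordinates><%= quake.lon %>,<%= quake.lat %>,0</coordinates>
    </Point>
  </Placemark>
KML

# Render one Placemark; the template reads the local variable "quake".
def render_placemark(quake)
  PLACEMARK.result(binding)
end

quake = QuakeEvent.new(4.2, 37.77541, -122.418955, Time.utc(2011, 11, 2, 14, 30, 5))
kml = render_placemark(quake)
puts kml
```

Because the date and number formats live in the template, changing how a quake is described means editing markup only, which is the separation described above.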

You should now have a working KML renderer. Go to your app’s home page and follow the ‘Edit quake events’ link to add one or more fake quakes. Use SQLstream’s lat/lon from start.kml (above) for at least one of them. Now follow the Google Earth link from your app’s home page (you may have to instruct your browser to open KML/KMZ files in Google Earth, if you’ve never done this before). Google Earth should open and zoom to SQLstream HQ, and there should be a pin indicating an earthquake there.

You can now refine the visualization of a quake event by updating the templates and refreshing the display (you can refresh the quakes/feed stream by right-clicking on ‘Quakes’ in the Places tree on the left and selecting ‘Refresh’). We now have a tool to display whatever quake events are dropped into the database. The next step is to feed it from SQLstream.

Next time
Part 3 of the visualization tutorial concludes this series and will be published next week. It will discuss the final key element – how to get the data flowing.

Posted under SQLstream Tutorials

Vice President Marketing
October 21, 2011

It struck me that the underlying theme of the conference could be described as ‘low risk innovation’.  An oxymoron?  At first glance, yes, but in this case it describes the cautious adoption of new technology while protecting existing investment.

The perception of the ITS industry, rightly or wrongly, has been one of an industry focussed on major manufacturing and hardware deployment projects, rather than software and new technologies. Therefore perhaps the industry hasn’t seen the level of growth and innovation that has occurred in other areas, telecommunications being a good example.  Or perhaps it simply hasn’t been possible until new technologies come along with a sufficiently compelling business case – lower cost solutions and faster to deploy.

New technology introduction – complementary and overlaid

GPS and Bluetooth featured strongly at the event on both the exhibition floor and in the breakout panel sessions. It was clear however that the preference was to use these either as a proof of concept across a small area, or where monitoring infrastructure exists, as complementary solutions, extending the accuracy and scope of existing fixed-road sensor deployments.

Software solutions and architecture

It was interesting how many of our booth visitors were initially attracted by SQLstream ITS Insight, but were actually looking for more of a horizontal platform solution. Yes, out of the box capability was important to get solutions up and running quickly, but wider concerns touched on some fundamental software engineering principles – openness, scalability and interoperability.

Open and flexible

The focus on manufacturing and hardware oriented projects tends to produce software support systems that are built to do just that – support the particular hardware installed for that project.  This leads to capable and often feature-rich systems, but systems that are difficult to extend and configure for new technologies and requirements.  There was a definite theme in the questions being asked at our booth for openness – open platforms, where an agency’s IT department, or consulting partner, can add new applications easily, and in fact are encouraged to do so.

Performance and scalability

Systems are required to scale in two ways. The first is raw performance as the number of sensor events increases and real-time performance is required. Secondly, and perhaps less obviously, transportation agencies are looking to consolidate systems and provide common systems across multiple counties and even at the state level. This immediately highlights the scalability issues with existing systems. As the user base increases, the geographical scope of the system increases, and the drive for real-time information increases, the weaknesses of existing systems have been exposed.

Standards

The emergence of IT standards is a sure sign of increasing maturity in any industry.  This represents a move away from limited, siloed solutions to consideration of the wider integration issues – integration with network hardware, but also the integration between management systems.

A final word …

One final word, not quite a trend yet, but ITS appears to be embracing the Cloud as a solutions platform, as a mechanism for providing access to applications, but also for scalability and as a lower cost solution where large infrastructure deployment would be required.


Vice President Marketing
October 19, 2011

A defining feature of the show is the Technology Showcase featuring demonstrations from some of the technologies and applications that are bringing the future of transportation to life. Each ‘village’ covers a specific theme such as Safety, Mobility, Environment/Sustainability and Pricing. Environment/Sustainability focuses on the potential for reducing emissions. One interesting demonstration, offered by Imperial College, London, uses a pollution sensor mounted on the roof of a demonstration vehicle travelling around the site. Data is transferred over a GSM connection and viewed over the internet. Another is Ricardo Engineering’s demonstration of improving fuel efficiency using GPS navigation and traffic signal phase and timing data.

On the theme from yesterday of using simulators to demonstrate the benefits of technology, Toyota’s Star safety system is a star of the show – attendees get a spin in an off-road driving simulator, once with all electronic aids turned off, and once with traction control, stability control, brake assist and anti-lock braking turned on.

And for SQLstream, it was another busy day with many questions and demonstrations of real-time traffic analytics and congestion detection with SQLstream ITS Insight. Our core Stream-to-Business platform, and the ability to integrate and extend SQLstream ITS Insight easily and quickly, is generating significant interest.


Vice President Marketing
October 18, 2011

The ITS World Congress claims to be the largest transportation event in 2011. Certainly the range of attendees and exhibitors is impressive, from software products to the latest in roadside hardware infrastructure. Some interesting themes are emerging. Vehicle-to-vehicle communication is generating a lot of interest, as are electronic driver aids. There are some great simulators as well to amuse the attendees.

‘Real-time’ is also a common theme across the exhibition hall, and in particular the use of new technologies such as Bluetooth and GPS. Damian Black, SQLstream’s CEO, was speaking in a session today on how to achieve accurate arterial travel time. This is a hot topic right now, where existing in-road sensors are too expensive and the solutions too inaccurate. Bluetooth and GPS are the two emerging, although not necessarily competing, technologies – both can be used simultaneously to reinforce the other.

SQLstream in action – Day 1

It was great to have so much interest at the booth for our real-time intelligent transportation solutions. We announced the public launch of our SQLstream ITS Insight solution this morning. Perhaps we’re one of the few new companies at the show, but the ability to build real-time traffic analytics solutions based on maximizing the use of all available sensor data is generating a lot of interest.


Vice President Marketing
October 17, 2011

SAN FRANCISCO, CA, October 17, 2011 - SQLstream Inc. today announced the public availability of SQLstream ITS Insight, the first real-time solution for reducing congestion to exploit low cost wireless GPS data as a complement to existing fixed-road sensor investment. Transportation Agencies are already benefiting from SQLstream ITS Insight, using it to deliver real-time Travel Time and congestion detection solutions. The official public launch for SQLstream ITS Insight is today at the ITS World Congress, Orlando.

For commuters in their cars, traffic lines are lengthening and travel times are increasingly unreliable. Congestion is growing globally and transportation agencies are already struggling with their existing congestion management systems. Traditional approaches are expensive to install and maintain, and they provide very limited information and network coverage.

“Transportation Agencies are revolutionizing their approach to transport network management”, said Damian Black, SQLstream CEO. “SQLstream is delighted to be the core of their strategy for a single, real-time Intelligent Transportation platform.”

About SQLstream ITS Insight

SQLstream’s ‘Insight’ product range offers fast start solution packs for industry markets based on SQLstream’s core Stream-to-Business platform. SQLstream ITS Insight is a real-time traffic analytics and management platform for Intelligent Transportation agencies, offering real-time Travel Time, and sophisticated congestion detection algorithms that combine real-time and historical trend data. With SQLstream ITS Insight, transportation agencies will:

  • Significantly reduce the cost of fulfilling congestion reduction targets
  • Implement travel time improvements in just weeks rather than years
  • Achieve the impossible – effective real-time insight for arterial routes

SQLstream will be presenting live demonstrations of SQLstream ITS Insight at ITS World Congress, Orlando, booth #1366.

About SQLstream Inc.

SQLstream’s Stream-to-Business platform analyzes real-time service and sensor data streams to deliver instant alerts, analytics and immediate answers to business decision makers. Using the industry standard SQL language, SQLstream executes queries on the wire, before data reaches the warehouse, enabling businesses to make smarter decisions sooner. SQLstream adds real-time operational intelligence, monitoring and control to existing systems while reducing total cost and complexity. SQLstream is headquartered in San Francisco, California and is on the web at http://www.sqlstream.com. For further information, please call Ronnie Beggs (877) 571-5775, or email pr@sqlstream.com.


Vice President Marketing
October 12, 2011

Visit SQLstream on Booth #1366

Technology and innovation are central themes of this year’s ITS World Congress. There’s been much written about the issues of congestion, green transportation schemes and improving personal mobility, not least in this blog. At SQLstream we’ve been playing our part to help revolutionize the Intelligent Transportation industry. It’s clear that the concepts of streaming data and real-time analytics are entering the mainstream – from low level Big Data toolkits that require a streaming, low latency front end, to the real world of sensor networks and industries such as smart grid and telecommunications.

This is just as true in transportation.  Here we have an industry with vast volumes of sensor data, a need for sophisticated real-time analytics, and platforms capable of driving real-time process automation.  We’ve been working with a number of transportation agencies for some time, and are about to launch a new ‘Insight’ product for intelligent transportation.  Our ‘Insight’ range provides tools and out of the box support for specific industry verticals based on our core Stream-to-Business platform.

Google Earth Display for Road Traffic Congestion

For Intelligent Transportation this means processing sensor data from GPS and fixed-road sensors, to deliver applications such as real-time Travel Time, live congestion detection and network KPI reporting.

Should you be attending the ITS World Congress, we’d be delighted to see you on our booth (#1366) for a demonstration.

October 7, 2011

A streaming SQLstream application will feel very familiar to anyone with some basic knowledge of SQL and traditional RDBMS applications.  SQLstream uses standards-based SQL, except that streaming SQL queries run forever, processing data as they arrive over specified time windows.

This blog is the first in a series of tutorials for SQLstream developers, describing how to build streaming SQL applications. Over the coming months, these tutorials will address the different components of streaming data applications, and provide worked examples and guidance.

Streaming Visualization, Part 1: Setting up

We’ll begin the series by looking at a typical streaming use case – displaying real-time sensor data on a map.  We have a source of geo-located data flowing in SQLstream that we’d like to visualize. Using Google Earth and Ruby on Rails, I’ll demonstrate an easily-implemented solution with lots of room for expansion.

Google Earth - Real-time streaming data visualization

For this example, our approach is to connect the SQLstream pipeline to Google Earth using a staging database–a common deployment scenario. We’ll be using PostgreSQL for the staging database, but MySQL or any other database supported by Rails will work. A SQLstream pump will use TableUpdate to write a record of latitude, longitude, and description for each event to a display list in PostgreSQL. When Google Earth places a web request for data, Rails will service the request by rendering the contents of the display list as KML, Earth’s dialect of XML. We’ll start with SQLstream, Ruby, and PostgreSQL already installed and focus on what’s necessary to get them all talking to each other.
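To make the last step of that flow concrete, here is a minimal sketch of what rendering a display list as KML involves. It is written in Python purely for illustration (the tutorial itself uses Rails for this step), and the function name `display_list_to_kml` and the record layout are assumptions, not part of the tutorial. Note that KML writes coordinates as longitude,latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def display_list_to_kml(records):
    """Render (lat, lon, description) records as a KML document string.

    The record layout is hypothetical; it mirrors the latitude, longitude
    and description columns mentioned in the text.
    """
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for lat, lon, description in records:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = description
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinate order is longitude first, then latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


print(display_list_to_kml([(37.77, -122.42, "M 2.1")]))
```

In the real deployment the records would come from the staging table rather than a literal list.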

Getting the Data

With SQLstream installed, make sure all of the distributed plugins are installed (if you haven’t done this already) and start the server:

linux> cd $SQLSTREAM_HOME/plugin/autocp
linux> ln -s ../*.jar .
linux> cd $SQLSTREAM_HOME
linux> bin/sqlstreamd

We’re going to get our data from a web feed of recent earthquakes provided by the US Geological Survey. In another shell:

linux> cd $SQLSTREAM_HOME/examples/webfeed
linux> sqllineClient < webfeed.sql
linux> sqllineClient < usgs.sql

We now have several streams available to us within SQLstream. The one we want to visualize is SmallQuakesDay, which includes columns containing the location (‘point’ as lat/lon) and magnitude (‘mag’) of the quake.

Creating the Display List

We’ll use Rails to do all of the work of creating the display list. If you don’t have Rails installed yet, start by installing Ruby’s Gem package management system (in Ubuntu, this is the rubygems package). You’ll also need the development files for Postgres installed (postgres-server-dev in Ubuntu). You can now use gem to install rails and associated tools with this command:

linux> gem install mongrel rails pg rubyzip

I recommend that you add gem’s bin directory to your path (on my system it’s /var/lib/gems/1.8/bin) so that the commands ‘rails’, ‘rake’, and ‘bundle’ are found. Create an empty directory to work in (we’ll call it ‘$QUAKE’ here), cd there, and create a new rails server in the sub-directory ‘quakekml’ with these commands:

linux> cd $QUAKE
linux> rails new quakekml -d postgresql

You can test the server by starting it with these commands and visiting http://localhost:3000 in a web browser:

linux> cd $QUAKE/quakekml
linux> rails server

Use ^C to shut the server down so we can configure the database access. Edit the file $QUAKE/quakekml/config/database.yml, which should already contain sections describing the development, test, and production databases. Edit the username and password settings in each section to match a user you’ve configured in Postgres who can create databases. The only database we’ll be using is ‘quakekml_development’, but Rails will create all three when you issue this command:

linux> rake db:create:all

Create a display list consisting of a timestamp, lat/lon, and magnitude for each quake with the commands:

linux> rails generate scaffold quake_event when:timestamp lat:float lon:float mag:float
linux> rake db:migrate

You now not only have an empty table in Postgres, you also have a full web interface for viewing and editing that table. Start the server again and visit http://localhost:3000/quake_events to see it. Our next steps are to generate KML for Google Earth from this table, and to feed the table from SQLstream. The scaffolding created by Rails is a handy debugging tool we can use to inspect the table and manually add items to test the visualization.

Next time

Parts 2 and 3 of the visualization tutorial will be published over the coming weeks.  Part 2 focuses on how to render streaming analytics in Google Earth, and the final part of the tutorial will discuss how to get the data flowing.


Vice President Marketing
September 13, 2011

The 18th World Congress on Intelligent Transport Systems (ITS) is being held in Orlando from October 16th – 20th, 2011. This is the leading event for intelligent transportation solutions, and attracts a large audience of government, technology and industry professionals. The event seeks to demonstrate advances in the application of new technology and smart transportation. Major areas of focus include the reduction of traffic congestion and improvement in  personal mobility.

With 800 million vehicles on the world’s roads today, a number forecast to grow to between 2 and 4 billion by 2050, it is clear that transportation management  systems will need to analyze real-time sensor and GPS data dynamically on a massive scale to reduce congestion and optimize personal mobility. The objective is to achieve a fluid and reliable transportation network, that can respond dynamically to changing loads and conditions, and provide consistent and acceptable travel times.

The performance of a transportation network can be measured based on road usage (number of vehicles), and the travel speed and time from origin to destination.  Today’s traffic management systems rely on historical analysis of data from fixed sensors.  However, roadside and in-road sensor projects are very expensive to install and maintain. As a consequence, only a very limited view of the overall road network is available,  with sensor deployments focusing on primary routes and major intersections only. Also, fixed sensors tend to report traffic flow – at best a secondary measure of the real requirement –  congestion.

Most important however is the lack of real-time, dynamic behaviour from existing traffic management systems.  Flow control, for example at intersections and on freeways, is activated at specific times based on the historical analysis of the fixed sensor data – this helps, but is unable to react to changing patterns of traffic flow and congestion.

One approach to the problem is to introduce the latest wireless GPS sensor technology. Wireless GPS sensors have four significant advantages:

  1. Immediate, real-time information on vehicle speed and location.
  2. Low cost solutions that can be deployed quickly, with little or no maintenance.
  3. A direct measure of vehicle speed, enabling real-time and accurate measurement of congestion.
  4. Complete network insight, covering highways and arterial routes, at a granularity of a few meters.

SQLstream ITS Insight

For example, when one national road agency was re-evaluating its approach to intelligent transportation systems, it identified wireless GPS technology as both a significantly cheaper and potentially much superior solution for congestion detection and Travel Time. SQLstream was selected as the real-time traffic analytics and congestion detection platform, processing in-vehicle GPS sensor data. The SQLstream solution enabled the agency to cancel a $20 million fixed sensor program, and to build a real-time traffic management platform based on SQLstream’s ITS Insight.

We will be demonstrating our real-time traffic management capabilities on our stand at ITS World Congress in Orlando. In addition, our CEO, Damian Black, will be participating in a number of related panel sessions on arterial travel time solutions and real-time data management for intelligent transportation. For those attending ITS World Congress, please visit us for a demo at Booth #1366, or visit our website for more information on SQLstream and real-time transportation management systems. We look forward to seeing at least some of you at the show.

August 31, 2011

I am going to discuss a SQLstream application for monitoring traffic flow in real-time. In this application, vehicles with GPS enabled devices transmit vehicle position along with other vehicle information such as speed and engine state. SQLstream receives this information as a real-time data stream and uses streaming SQL analytics to detect and predict the rapid onset of congestion on the road network in real-time.

Streaming SQL for Congestion Detection
The SQLstream application for congestion detection uses a typical streaming SQL processing pipeline. In this case, data is fed into the SQLstream pipeline using our Log File Adapter. SQLstream adapters provide an interface to sources and targets such as databases, log files, network sockets and mail servers. Adapters are built using the SQL/MED specification, which is part of the ANSI SQL standard. In this application, each log file contains the vehicle positions on the road network for the latest minute.

The conditioning pipeline performs data cleansing operations such as rejecting poor quality data (records with missing or out-of-bounds columns) followed by mapping of vehicle positions (lat/long pair) to a “road element” of the road network using a UDX to perform geo-spatial lookups in an external road network database.
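As a rough illustration of the cleansing step, the Python sketch below rejects records with missing or out-of-bounds columns. The field names and the speed ceiling are hypothetical, chosen only for the example. The real pipeline does this in streaming SQL and then maps positions to road elements, which is not shown here.

```python
def is_valid_report(report, max_speed_kmh=250.0):
    """Return True if a vehicle report has every required field in bounds.

    Field names and max_speed_kmh are assumptions for illustration.
    """
    required = ("vehicle_id", "lat", "lon", "speed")
    if any(report.get(key) is None for key in required):
        return False  # missing column
    return (
        -90.0 <= report["lat"] <= 90.0
        and -180.0 <= report["lon"] <= 180.0
        and 0.0 <= report["speed"] <= max_speed_kmh
    )


reports = [
    {"vehicle_id": "v1", "lat": 51.5, "lon": -0.1, "speed": 42.0},
    {"vehicle_id": "v2", "lat": 95.0, "lon": -0.1, "speed": 42.0},  # bad latitude
    {"vehicle_id": "v3", "lat": 51.5, "lon": -0.1, "speed": None},  # missing speed
]
clean = [r for r in reports if is_valid_report(r)]
print([r["vehicle_id"] for r in clean])
```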

The diagram and the example SQL below show our implementation of a streaming SQL pipeline for congestion detection. Each vehicle reports its position and speed every minute. Two consecutive vehicle positions are then used to interpolate vehicle speeds for each road element on the vehicle path between the reporting positions. The interpolated speed is based on the actual distance traveled by the vehicle between the two consecutive reports, and is calculated in a User Defined Transform (UDX) written in Java. The UDX also associates a confidence factor with each interpolated speed value, based on the position of the road element relative to the endpoints of the vehicle path.
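The core arithmetic of the interpolation can be sketched in a few lines. This Python version is a simplification and not the Java UDX: it uses the straight-line (great-circle) distance between two reports as a stand-in for the distance traveled along the road network, and it leaves out the confidence factor, whose formula is not given here. All names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def interpolated_speed_kmh(pos_a, pos_b, seconds_between):
    """Average speed between two consecutive (lat, lon) reports."""
    distance_km = haversine_km(pos_a[0], pos_a[1], pos_b[0], pos_b[1])
    return distance_km / (seconds_between / 3600.0)


# Two reports one minute apart, 0.01 degrees of longitude apart on the equator.
print(round(interpolated_speed_kmh((0.0, 0.0), (0.0, 0.01), 60), 1))
```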

Streaming Traffic Flow Analytics
As illustrated below, the analytics pipeline calculates 15, 5, 4, 3, 2 & 1 minute moving average speeds for each road element. Each road element is color coded based on the 15-minute moving average speed. The results are streamed to a Google Earth display.

CREATE OR REPLACE VIEW "EstimatedReSpeeds" AS
SELECT STREAM "RE", "reID", "Carriageway", "rePrescribed", "reSpeedLimit",
  SUM("reVehicles") OVER "last1" AS "reVehiclesLast1",
  SUM("reVehicles") OVER "last2" AS "reVehiclesLast2",
  SUM("reVehicles") OVER "last3" AS "reVehiclesLast3",
  SUM("reVehicles") OVER "last4" AS "reVehiclesLast4",
  SUM("reVehicles") OVER "last5" AS "reVehiclesLast5",
  SUM("reVehicles") OVER "last15" AS "reVehiclesLast15",
  SUM("reSpeed" * "reConfidence") OVER "last1" /
  SUM("reConfidence") OVER "last1" AS "reSpeedLast1",
  SUM("reSpeed" * "reConfidence") OVER "last2" /
  SUM("reConfidence") OVER "last2" AS "reSpeedLast2",
  SUM("reSpeed" * "reConfidence") OVER "last3" /
  SUM("reConfidence") OVER "last3" AS "reSpeedLast3",
  SUM("reSpeed" * "reConfidence") OVER "last4" /
  SUM("reConfidence") OVER "last4" AS "reSpeedLast4",
  SUM("reSpeed" * "reConfidence") OVER "last5" /
  SUM("reConfidence") OVER "last5" AS "reSpeedLast5",
  SUM("reSpeed" * "reConfidence") OVER "last15" /
  SUM("reConfidence") OVER "last15" AS "reSpeedLast15"
FROM "Stage3"
WINDOW "last1" AS (PARTITION BY "RE"
  RANGE INTERVAL '1' MINUTE PRECEDING),
     "last2" AS (PARTITION BY "RE"
  RANGE INTERVAL '2' MINUTE PRECEDING),
     "last3" AS (PARTITION BY "RE"
  RANGE INTERVAL '3' MINUTE PRECEDING),
     "last4" AS (PARTITION BY "RE"
  RANGE INTERVAL '4' MINUTE PRECEDING),
     "last5" AS (PARTITION BY "RE"
  RANGE INTERVAL '5' MINUTE PRECEDING),
     "last15" AS (PARTITION BY "RE"
  RANGE INTERVAL '15' MINUTE PRECEDING);
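For readers less familiar with windowed aggregates, the view above computes, per road element, a confidence-weighted average speed over each trailing window. The Python sketch below reproduces that arithmetic on a small in-memory list. The tuple layout and function name are assumptions for illustration, and a batch loop like this is of course not how the streaming engine evaluates the query.

```python
from collections import defaultdict


def weighted_avg_speeds(reports, now, window_minutes=(1, 2, 3, 4, 5, 15)):
    """Confidence-weighted average speed per road element and window.

    reports: iterable of (minute, road_element, speed, confidence) tuples
             (a hypothetical layout standing in for the "Stage3" stream).
    Returns {road_element: {window_minutes: average_speed}}.
    """
    result = defaultdict(dict)
    for window in window_minutes:
        num = defaultdict(float)  # sum of speed * confidence
        den = defaultdict(float)  # sum of confidence
        for minute, road_element, speed, confidence in reports:
            if now - window <= minute <= now:
                num[road_element] += speed * confidence
                den[road_element] += confidence
        for road_element, total_confidence in den.items():
            if total_confidence > 0:
                result[road_element][window] = num[road_element] / total_confidence
    return dict(result)


reports = [(0, "A", 60.0, 1.0), (4.5, "A", 30.0, 1.0), (5, "A", 10.0, 3.0)]
print(weighted_avg_speeds(reports, now=5)["A"])
```

With these three reports, the 1-minute window sees only the last two rows, while the 5-minute window sees all three.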

Detecting the rapid onset of congestion
Congestion is detected by comparing moving averages for the larger time window with that for the smaller time window. For example, comparing a 2-minute average with a 1-minute average:

CREATE OR REPLACE VIEW "CongestionRule1" AS
SELECT STREAM
  -- name, ID, highway name, speed limit etc. for each road element
  "RE", "reID", "Carriageway", "rePrescribed", "reSpeedLimit",
  -- volume of vehicle reports in each time window
  "reVehiclesLast1", "reVehiclesLast2", "reVehiclesLast3",
  "reVehiclesLast4", "reVehiclesLast5", "reVehiclesLast15",
  -- estimated avg speed for each road element
  "reSpeedLast1", "reSpeedLast2", "reSpeedLast3",
  "reSpeedLast4", "reSpeedLast5", "reSpeedLast15"
FROM "EstimatedReSpeeds"
WHERE "reSpeedLast1" < 0.80 * "reSpeedLast2" AND -- slowdown by 20 %
  "reSpeedLast2" < 0.80 * "reSpeedLast3" AND
  "reSpeedLast3" < 0.80 * "reSpeedLast4" AND
  "reSpeedLast4" < 0.80 * "reSpeedLast5";

SQLstream Traffic Congestion Detection - Visualization

Note that these estimated speeds are calculated over overlapping windows (each longer window contains the shorter ones), and the slowdown thresholds are set accordingly.
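The rule itself is a chain of comparisons between adjacent windows. A minimal Python sketch of the same test follows; the dictionary layout is hypothetical, and the 0.80 threshold is the one used in the view.

```python
def is_rapid_slowdown(speeds, threshold=0.80):
    """True if every shorter window is below threshold * the next longer one.

    speeds: {window_minutes: average_speed} for windows of 1 to 5 minutes
            (a hypothetical layout standing in for the reSpeedLastN columns).
    """
    return all(speeds[w] < threshold * speeds[w + 1] for w in range(1, 5))


print(is_rapid_slowdown({1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0, 5: 55.0}))
print(is_rapid_slowdown({1: 50.0, 2: 52.0, 3: 55.0, 4: 56.0, 5: 58.0}))
```

Because the windows overlap, a 20% gap between two averages implies a larger drop in the newest data. For example, assuming both minutes carry equal weight and the earlier minute averaged 60, the 1-minute average v must satisfy v < 0.8 × (60 + v) / 2, which works out to v < 40: a drop of about a third.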

Fine tuning the slowdown thresholds, and taking into account other information such as the proximity of traffic lights and the volume of vehicle reports in each time window, improves the quality of the congestion detection algorithm.

The Google Earth screenshot shows the real-time traffic view, with detected slowdowns marked as pins. The severity of the slowdown is indicated by different shades of red.

August 23, 2011

At SQLstream we have a comprehensive implementation of standard SQL windowing operations such as SUM, COUNT and AVG. Recently though we needed a more sophisticated function for a decaying weighted average that would emphasize more recent samples over older samples. We implemented a new EXP_AVG() operation, as shown in this example query:

SELECT rowtime, ticker, price,
  EXP_AVG(price, INTERVAL '10' SECOND) OVER w
FROM t
WINDOW w AS (PARTITION BY ticker
             ORDER BY rowtime
             RANGE INTERVAL '30' SECOND PRECEDING);

EXP_AVG takes a value expression and an interval constant giving the half life. In this example, if two samples within the WINDOW are separated by 10 seconds, the older one will be given half as much weight as the newer one.
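The weighting described here can be written down directly. The Python sketch below shows the half-life semantics only; it says nothing about how EXP_AVG is implemented inside SQLstream, and the function and argument names are assumptions.

```python
def exp_avg(samples, now, half_life):
    """Decaying weighted average: a sample's weight halves every half_life.

    samples: list of (time_seconds, value) pairs already inside the window.
    """
    weighted = [(0.5 ** ((now - t) / half_life), value) for t, value in samples]
    total_weight = sum(weight for weight, _ in weighted)
    return sum(weight * value for weight, value in weighted) / total_weight


# Two samples 10 seconds apart with a 10 second half life:
# the older one (100.0) gets weight 0.5, the newer one (130.0) gets weight 1.
print(exp_avg([(0, 100.0), (10, 130.0)], now=10, half_life=10))
```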

How would this look in standard SQL without an EXP_AVG function?

I thought it would be interesting to look at how a decaying average would be implemented in SQL without an EXP_AVG operation, as under the covers we implement it using standard SQL windowed sums. If we knew that the times for our rows would always fall within a narrow range, we could increase the weights of the samples as time goes forwards and simply scale down the total (rather than lowering the weights of samples as time goes backwards from the most recent sample). This would be straightforward. The SQL would look something like this.

First, we need a function to turn our time based units into something on which we can do arithmetic:

CREATE FUNCTION toSeconds(t TIMESTAMP, offset TIMESTAMP)
RETURNS DECIMAL(12,3) CONTAINS SQL
RETURN CAST((t - offset) SECOND(9,3) AS DECIMAL(12,3));

Then, we need a function for calculating exponential weight:

CREATE FUNCTION expWeight (seconds DOUBLE, halfLife DOUBLE)
RETURNS DOUBLE CONTAINS SQL
RETURN EXP(seconds * (LN(2)/halfLife));

The complete SQL will then be:

SELECT rowtime, ticker,
  SUM(price * expWeight(toSeconds(rowtime, OFFSET), 10)) OVER w
  / SUM(expWeight(toSeconds(rowtime, OFFSET), 10)) OVER w
  AS avgPrice
FROM t
WINDOW w AS (PARTITION BY ticker
             ORDER BY rowtime
             RANGE INTERVAL '30' SECOND PRECEDING);

Unfortunately this would not work in practice, as the expWeight function would overflow once the row times advanced more than a few half lives beyond OFFSET. We can’t reset the offset as long as there are any non zero values still in the window. This gives us our out. What we can do is partition the incoming values into two windowed aggregates, always sending zeros to one of them, and switch and reset the offset when the window for that aggregate fills with zeros. This way we’re never summing values whose weights are calculated using different starting offsets.
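The overflow is easy to reproduce outside SQL. The Python sketch below evaluates the same weight formula in IEEE double precision, where the hard limit is roughly 1,024 half lives. It is only an illustration of why the weights cannot be allowed to grow without bound, and says nothing about the numeric limits of SQLstream itself.

```python
import math


def exp_weight(seconds, half_life):
    """Same formula as the expWeight function above."""
    return math.exp(seconds * (math.log(2) / half_life))


def overflows(seconds, half_life):
    """Report whether the weight no longer fits in a double."""
    try:
        exp_weight(seconds, half_life)
        return False
    except OverflowError:
        return True


print(overflows(10_000, 10))  # 1,000 half lives: False
print(overflows(10_300, 10))  # 1,030 half lives: True
```

With a 10 second half life that limit is reached in under three hours of stream time, which is why the offset has to be reset periodically.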

Here’s the SQL for this. We’ll need some more functions describing when and how we reset our offsets. We’ll need an arbitrary reference for our time calculations:

CREATE FUNCTION toSeconds(t TIMESTAMP)
RETURNS DECIMAL(12,3) CONTAINS SQL
RETURN CAST((t - TIMESTAMP '1970-01-01 00:00:00') SECOND(9,3)
           AS DECIMAL(12,3));

To use modulo arithmetic in SQL, convert to an integer:

CREATE FUNCTION toMillis(t TIMESTAMP)
RETURNS BIGINT CONTAINS SQL
RETURN CAST(toSeconds(t)*1000 AS BIGINT);

We’ll want to partition time into window sized epochs starting at our arbitrary time reference:

CREATE FUNCTION isEvenEpoch (t TIMESTAMP, windowSeconds INT)
RETURNS BOOLEAN CONTAINS SQL
RETURN MOD(toMillis(t),
  windowSeconds*2000) < windowSeconds*1000;

We’ll use the start of each epoch as the offset for all rows that fall in that epoch, so the function below returns a row’s time in seconds relative to the start of its epoch:

CREATE FUNCTION epochStart (t TIMESTAMP, windowSeconds INT)
RETURNS DECIMAL(12,3) CONTAINS SQL
RETURN CAST(MOD(toMillis(t), windowSeconds*1000)
           AS DECIMAL(12,3))/1000;

The following two functions supply the correction that is needed while zeros are being inserted into a window and all the non zero values still in that window are from the previous epoch:

CREATE FUNCTION evenAgingFactor (t TIMESTAMP,
             windowSeconds INT, halflife DOUBLE)
RETURNS DOUBLE CONTAINS SQL
RETURN CASE WHEN isEvenEpoch(t, windowSeconds)
             THEN 1
             ELSE expWeight(-windowSeconds, halflife) END;

CREATE FUNCTION oddAgingFactor (t TIMESTAMP,
             windowSeconds INT, halflife DOUBLE)
RETURNS DOUBLE CONTAINS SQL
RETURN CASE WHEN isEvenEpoch(t, windowSeconds)
             THEN expWeight(-windowSeconds, halflife)
             ELSE 1 END;

The complete SQL that uses the two window approach to calculate the decaying average using a 30 second window and a 10 second half life is:

SELECT rowtime, ticker,
   (oddAgingFactor(rowtime, 30, 10) *
    SUM(CASE WHEN isEvenEpoch(rowtime, 30)
        THEN 0
        ELSE price * expWeight(epochStart(rowtime, 30), 10) END) OVER w
    + evenAgingFactor(rowtime, 30, 10) *
    SUM(CASE WHEN isEvenEpoch(rowtime, 30)
        THEN price * expWeight(epochStart(rowtime, 30), 10)
        ELSE 0 END) OVER w)
    /
    (oddAgingFactor(rowtime, 30, 10) *
    SUM(CASE WHEN isEvenEpoch(rowtime, 30)
        THEN 0
        ELSE expWeight(epochStart(rowtime, 30), 10) END) OVER w
    + evenAgingFactor(rowtime, 30, 10) *
    SUM(CASE WHEN isEvenEpoch(rowtime, 30)
        THEN expWeight(epochStart(rowtime, 30), 10)
        ELSE 0 END) OVER w)
AS avgPrice
FROM t WINDOW w AS (PARTITION BY ticker
                    ORDER BY rowtime
                    RANGE INTERVAL '30' SECOND PRECEDING);

To understand how this would work in practice, let’s assume we start our query at 12:00. We’ll only look at the numerator of our query, as the denominator uses the same technique to calculate a weighted count.

Time | Window A state | Window B state
12:00:00 | Empty | Empty
12:00:00 – 12:00:30 | Values with increasing weights being added. | Zeros being added.
12:00:30 | Has weighted sum * expWeight(30,10) | Zero
12:00:30 – 12:01:00 | Zeros being added. Will have older part of weighted sum. | Values with increasing weights being added. Will have newer part of weighted sum.
12:01:00 | Zero. All non zero values will have been aged out. We can safely reset our scaling factor. | Has weighted sum * expWeight(30,10)

With care to apply appropriate scaling to the parts, this SQL can be used to calculate a weighted average using standard windowed operations. However, I think you’ll agree, it’s easier with our EXP_AVG() operation.
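As a sanity check on the two-window technique, the Python sketch below simulates it on made-up data and compares the result with a direct decaying average. The 30 second window and 10 second half life match the SQL; the helper names and sample rows are hypothetical, and the batch loops merely stand in for the streaming windowed sums.

```python
import math

HALF_LIFE = 10.0  # seconds, as in the SQL above
WINDOW = 30       # seconds, as in the SQL above


def exp_weight(seconds, half_life=HALF_LIFE):
    """Same formula as expWeight: the weight doubles every half_life seconds."""
    return math.exp(seconds * (math.log(2) / half_life))


def is_even_epoch(t):
    """True when t falls in an even-numbered WINDOW-sized epoch."""
    return t % (2 * WINDOW) < WINDOW


def seconds_into_epoch(t):
    """Seconds elapsed since the start of the epoch containing t."""
    return t % WINDOW


def direct_avg(rows, now):
    """Reference result: weight each row by its age relative to 'now'."""
    live = [(t, p) for t, p in rows if now - WINDOW <= t <= now]
    num = sum(p * exp_weight(t - now) for t, p in live)
    den = sum(exp_weight(t - now) for t, _ in live)
    return num / den


def two_window_avg(rows, now):
    """Two-aggregate version: weights never exceed one epoch's growth."""
    live = [(t, p) for t, p in rows if now - WINDOW <= t <= now]
    sums = {True: [0.0, 0.0], False: [0.0, 0.0]}  # keyed by epoch parity
    for t, p in live:
        weight = exp_weight(seconds_into_epoch(t))
        sums[is_even_epoch(t)][0] += p * weight
        sums[is_even_epoch(t)][1] += weight
    # The aggregate holding the previous epoch's rows is aged by one window.
    aged = exp_weight(-WINDOW)
    even_factor = 1.0 if is_even_epoch(now) else aged
    odd_factor = aged if is_even_epoch(now) else 1.0
    num = even_factor * sums[True][0] + odd_factor * sums[False][0]
    den = even_factor * sums[True][1] + odd_factor * sums[False][1]
    return num / den


# Hypothetical rows: (seconds since an arbitrary reference, price).
rows = [(1_300_000_000 + 7 * i, 100.0 + (i % 5)) for i in range(40)]
for now, _ in rows:
    assert math.isclose(direct_avg(rows, now), two_window_avg(rows, now), rel_tol=1e-9)
print("two-window result matches the direct decaying average")
```

The point of the exercise is that the largest weight ever computed is expWeight(30, 10), that is 8, no matter how long the query has been running.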

Posted under Streaming SQL

Vice President Marketing
August 10, 2011

The latest product update of SQLstream, version 2.5.1, has just been released and shipped.  This will be the final 2.x release prior to the SQLstream 3 launch, and although SQLstream 2.5.1 is predominately a maintenance release, it does include a range of feature enhancements, including:

- Support for exponentially decaying averages in windowed aggregation functions

- Enhancements to the standard Log File and Socket Adapters

A number of our customers have already downloaded and upgraded to the new version, across a range of industries including transportation, sensor network management, e-Commerce and web analytics.

Click here if you’d like to register for SQLstream product downloads (note – registration is required), or would  like to know more about solutions for real-time analytics and data management.

But what of SQLstream 3?  Well, watch this space.  SQLstream 3 will take a major step forward, not just in SQLstream’s stream computing product capability, but also in the way stream computing solutions are built and deployed.  More on this over the coming months.

Posted under Uncategorized

Test
July 20, 2011

SQLstream is helping to predict earthquakes across the world in real time. The system has been developed by a consortium of universities and government agencies, with funding from NSF (National Science Foundation), to provide an infrastructure of networked tools for research in ocean science – constructing an internet-based system to collect and share data.

Ocean Research Program Overview

This is a large system with 16,000 land and sea-based sensors, each sending several channels of seismic event data at a rate of 40 data points per second. The application executes in an Amazon EC2 Cloud on a cluster of SQLstream servers, connected by an AMQP message bus. SQLstream’s AMQP adapter (built with RabbitMQ) enables the streaming SQL application to view the AMQP bus as a domain of input and output data streams. The initial prototype was upgraded to a full-scale system running on a cluster of servers without any changes to the streaming SQL application.

SQLstream’s contribution to the project, an application that processes seismographic data in real-time, demonstrates:

  1. An operational deployment of streaming SQL for scientific calculations in real-time
  2. Real-time distributed processing in an Amazon EC2 cloud using SQLstream and AMQP
  3. Rapid development and rollout of real-time data applications using standards-based streaming SQL.

Seismic Monitoring
The sensor network contains about 16,000 seismic sensors, organized into grids, covering large parts of the North American continent and the adjacent oceans (see illustration below). Each sensor measures the motion of the ground under it in three dimensions, and transmits its data as several digitized channels, typically sampled 40 times a second.

Seismic Sensor Map

While the rate of each signal channel is modest (since seismic waves are low-frequency), this adds up to a large amount of data to process in real time. Moreover, the rules for detecting a seismic event are heuristics that apply to a time interval of several minutes: so the application has to calculate some quantities from the raw data and to store these calculated values over a time window.

But to detect an earthquake reliably, it’s better to monitor all the sensors at once, looking for a disturbance in the signals that first appears in one place, and then appears in nearby places: a disturbance signal that propagates and changes shape in a way consistent with the physics of a seismic wave.

Real-time sensor network management in an EC2 Cloud
Monitoring tens of thousands of signal channels arriving at 40 sample points per second is a complicated problem, but it can be made simpler by breaking it into stages. We’re interested in earthquakes, which are infrequent, so SQLstream first reduces the amount of data by scanning each channel for patterns that suggest the beginning, the peak or the end of a quake: in other words reduce the dense signal to a sequence of interesting events. Then we can look for events detected on other channels that could be due to the same quake propagating in physical space and time.

In the first phase, we built a real-time seismic event detector in streaming SQL. We translated a scientific algorithm into streaming SQL, and connected to the scientific sensor data infrastructure using our AMQP adapter. This involved less than 100 lines of streaming SQL.

In the second phase, the prototype was scaled up to a full-size system, dealing with 16,000 sensor channels, running on multiple SQLstream server nodes created automatically in an Amazon Elastic Cloud. This expansion required no changes to the streaming SQL application developed for the initial prototype – simply running the same streaming SQL application inside an elastic container/manager.

Real-time Event Detection with Streaming SQL
The sensor data processing pipeline has five functional stages:

  1. Reading Messages – over the AMQP adapter. A configuration parameter specifies which “topics” SQLstream subscribes to, that is, which set of sensor channels it receives.
  2. Unpack Data – a user defined transform (UDX) unpacks data channel messages into individual data points.
  3. Signal Processing – extract higher order information from the raw data to identify seismic events given background noise and non-seismic ‘bumps’. An example function would be to calculate the ratios of multiple rolling averages of the signal value (x) over different time windows.
  4. Event Detection – The next stage applies heuristic rules to the processed data streams – the output is much sparser: a stream of significant events, each indicating a possible start/peak/stop of a seismic wave on a particular channel. If events of the correct type occur within a correct interval of each other (as shown in the Signal Plot Diagram), they are accepted as significant.
  5. Writing Messages – output (publish) significant events over the AMQP adapter.
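The signal processing and event detection stages can be illustrated with a small Python sketch that compares a short rolling average with a long one and flags samples where their ratio exceeds a threshold. The window sizes, the threshold and the function name are all hypothetical, and the project's actual rules are heuristics that are not reproduced here.

```python
from collections import deque


def rolling_ratio_events(samples, short_n=3, long_n=10, threshold=2.0):
    """Return indices where short-window mean |x| / long-window mean |x|
    exceeds threshold. All parameters are illustrative assumptions."""
    short = deque(maxlen=short_n)
    long_ = deque(maxlen=long_n)
    events = []
    for index, x in enumerate(samples):
        short.append(abs(x))
        long_.append(abs(x))
        if len(long_) < long_n:
            continue  # wait until the long window is full
        short_avg = sum(short) / len(short)
        long_avg = sum(long_) / len(long_)
        if long_avg > 0 and short_avg / long_avg > threshold:
            events.append(index)
    return events


# Quiet background with a three-sample burst starting at index 10.
signal = [1.0] * 10 + [10.0] * 3 + [1.0] * 10
print(rolling_ratio_events(signal))  # [10, 11, 12]
```

The dense signal of 23 samples is reduced to three event indices, which is the kind of thinning the pipeline relies on before correlating events across channels.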

Automatic EC2 scaling for Big Data sensor volume
The pipeline described has been proven to scale well to handle even greater data volume:

  1. Channels are processed independently, so scale as O(N).
  2. Additional channels are processed by adding more pipelines
  3. Each pipeline starts and ends with AMQP messages – the SQLstream / AMQP interoperability has been shown to scale well.
  4. To add a pipeline, we simply add another Elastic Compute server, running the same pipeline streaming SQL, but configured to subscribe to its own set of sensor channels.
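Since channels are processed independently, scaling out amounts to giving each pipeline its own disjoint set of channels to subscribe to. The Python sketch below shows one simple way to do that, round-robin assignment. It is an assumption for illustration, not a description of how the project allocated topics, and the channel names are made up.

```python
def assign_channels(channels, pipeline_count):
    """Split channels into disjoint subscription sets, one per pipeline."""
    assignment = {pipeline: [] for pipeline in range(pipeline_count)}
    for index, channel in enumerate(channels):
        assignment[index % pipeline_count].append(channel)
    return assignment


channels = [f"sensor{n}.z" for n in range(10)]  # hypothetical topic names
for pipeline, subscribed in assign_channels(channels, 3).items():
    print(pipeline, subscribed)
```

Adding a server then means calling the same function with a larger pipeline count and restarting the subscriptions; the streaming SQL itself is unchanged.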

This is the first part of a series of blogs describing the seismic monitoring solution. The next blogs will focus on the streaming SQL used in the application, and the SQLstream / AMQP architecture for Big Data scalability.


Vice President Marketing
May 18, 2011

San Francisco, CA / Detroit, MI USA, May 18, 2011 - Fontinalis Partners, LLC, a Michigan-based strategic investment firm, today announced that it has invested in SQLstream Inc., the first standards-based stream computing platform to enable companies to exploit and monetize their real-time service and sensor data. The financial terms of the transaction were not disclosed. SQLstream is headquartered in San Francisco and launched its first product to market in 2008.

SQLstream enables businesses to drive new revenue opportunities by harnessing the full power of their real-time service and sensor data, and to analyze and respond to streaming data on the fly without first storing. This eliminates any delay from when the data arrive to when new answers stream out, allowing services to react and adapt immediately, based on continuous, complex analysis. SQLstream is at the forefront of this real-time stream computing market, using a standards-based architecture for the rapid analysis of high volume, real-time data, and delivers truly innovative, lower cost solutions.

Fontinalis Partners recognized SQLstream’s potential following the Company’s success in assisting various transportation agencies around the world. With 800 million vehicles on the world’s roads today, a number forecast to grow to 4 billion by 2050, multi-modal transportation management systems will need to analyze real-time sensor and GPS data dynamically on a massive scale to reduce congestion and optimize personal mobility. SQLstream’s technology makes this possible today. Both Fontinalis and SQLstream believe that SQLstream is uniquely positioned to drive the use of real time data, the explosion of which will spur opportunity and innovation across countless industries.

Led by Chief Executive Officer Damian Black, SQLstream’s executive leadership team represents one of the most experienced in the successful application of real-time computing technology.

William Clay Ford, Jr. (“Bill Ford”), a Founding Partner of Fontinalis Partners and Executive Chairman of Ford Motor Company, commented “We’re excited to announce our partnership with the SQLstream team. Real-time systems that react immediately to changing traffic conditions are essential to finding sustainable solutions to the world’s most pressing congestion and environmental problems. SQLstream’s pioneering real-time technology will be instrumental in improving personal mobility across the globe.”

Fontinalis Partners is not affiliated with Ford Motor Company.

“Fontinalis shares our vision for the future of real-time stream computing” said Damian Black, SQLstream CEO. “Inexpensive wireless sensors are transforming many industries, including transportation, energy and manufacturing by generating huge volumes of data that can be analyzed and turned into useful information in real-time. Delivering real-time answers from massive volumes of data with minimal delay requires the type of cloud-scale applications offered by SQLstream that can be deployed easily, without the inefficiency, complexity and high latency of batch-based approaches.”

About SQLstream Inc.

SQLstream analyzes real-time data streams to deliver instant alerts, analytics and immediate answers to business decision makers. Using the industry standard SQL language, SQLstream executes queries on the wire, before data reaches the warehouse, enabling businesses to make smarter decisions sooner, adding real-time, operational intelligence to existing systems while reducing total cost and complexity. SQLstream is headquartered in San Francisco, California and is on the web at www.sqlstream.com. For further information, please call Ronnie Beggs (877) 571-5775, or email pr@sqlstream.com.

About Fontinalis Partners

Fontinalis Partners, with offices in Detroit and Boston, is a leading transportation technology strategic investment firm founded by Bill Ford, Ralph Booth, Mark Schulz, Chris Cheever and Chris Thomas. Fontinalis’ mission is to leverage the firm’s considerable management experience, market access, strategic relationships, international expertise, and background in transportation innovation to scale companies providing the transportation technology solutions of tomorrow. Fontinalis Partners, LLC, invests as a strategic partner across all facets of the world’s transportation infrastructure on a stage, structure and size agnostic basis. Fontinalis Partners is not affiliated with Ford Motor Company. For further information about Fontinalis Partners, please visit www.fontinalispartners.com or call (313) 432-0321.

Posted under Press Releases
March 23, 2011

Since Tuesday’s announcement that the Firefox Download Monitor is powered by SQLstream, we’ve received a number of questions about how it all fits together. I hope this description helps answer some of those questions.

Mozilla Firefox Real-Time Download Monitor - Day 2

The SQLstream server executes SQL statements just as a standard SQL database does, except that SQLstream’s queries run continuously, analyzing input data in real-time as it arrives. Statements are submitted via JDBC or through user-friendly tools that use JDBC internally. Each statement is compiled and prepared, the planner/optimizer chooses an access plan, and a runtime engine executes the plan. SQLstream is compliant with SQL:2008 and SQL:2003, with just a couple of extensions. One extension is the keyword STREAM in a SELECT statement, which indicates that the results stream continuously rather than forming a point-in-time table.

Applications in SQLstream are constructed out of a set of SQL CREATE STREAM statements and SQL VIEWS against streams and other views. Those statements are assembled into a pipeline. When describing a pipeline we refer to statements on the source side as being upstream, and statements closer to the destination as being downstream.

In the middle of the pipeline, we define a stream named FirefoxDownloadStream_ which contains the results of the parsed and conditioned download events. The stream declaration is identical to a table definition with the exception of the type of the object being a STREAM rather than TABLE.

CREATE STREAM "FirefoxDownloadStream_" (
   "download_type"          VARCHAR(15),
   "utc_timestamp"          TIMESTAMP,
   "product_name"           VARCHAR(12),
   "product_version"        VARCHAR(12),
   "product_major_version"  VARCHAR(12),
   "product_os"             VARCHAR(10),
   "locale_code"            VARCHAR(5),
   "country_code"           VARCHAR(2),
   "city_name"              VARCHAR(32),
   "region_code"            VARCHAR(2),
   "longitude"              VARCHAR(8),
   "latitude"               VARCHAR(8)
);

The stream is populated with a SQL INSERT-SELECT statement. Again, standard SQL statements are used. The WHERE clause defines that the downloads include new first time downloads, complete upgrades of prior versions of Firefox, or partial upgrades of prior versions of Firefox.

INSERT INTO "FirefoxDownloadStream_"
  ("download_type",
   "utc_timestamp",
   "product_name",
   "product_version",
   "product_major_version",
   "product_os",
   "locale_code",
   "country_code",
   "city_name",
   "region_code",
   "longitude",
   "latitude"
  )
SELECT STREAM
   "dlType"   AS "download_type",
   "dlTime"   AS "utc_timestamp",
   "product"  AS "product_name",
   "version"  AS "product_version",
   "GetMajorVersion"("version") AS "product_major_version",
   "os"       AS "product_os",
   "lang"     AS "locale_code",
   "cc"       AS "country_code",
   "city"     AS "city_name",
   "rg"       AS "region_code",
   -- selected in the same order as the target column list above
   CAST("longitude" AS VARCHAR(10)) AS "longitude",
   CAST("latitude" AS VARCHAR(10)) AS "latitude"
FROM "FirefoxCountryFilter"
WHERE (("dlType" IS NULL) OR ("dlType" = 'complete') OR ("dlType" = 'partial'));

The download events contain the time of each download. Mozilla has a number of download servers handling the worldwide requests to download Firefox. Each of these servers writes the results of the download requests to a common logfile which is “tailed” by SQLstream. As the time for each download differs due to each client’s network capacity, the download requests may be slightly out of order. In practice the biggest gap we’ve seen is 4 seconds. Since we’re measuring downloads over a long period of time, it was deemed sufficient to adjust the download time of late arrivals to match the most recent download time.
The following SQL statement does that adjustment.


CREATE OR REPLACE VIEW "FirefoxDownloadStream" AS
   SELECT STREAM MAX("utc_timestamp") OVER(ROWS UNBOUNDED PRECEDING)
          AS ROWTIME,
          *
   FROM "FirefoxDownloadStream_";

SQLstream associates a ROWTIME with each row in a STREAM. The ROWTIME is a monotonically increasing SQL timestamp. By default, the ROWTIME is the current time expressed in UTC. Most applications, however, require time to be defined by the data itself. The AS ROWTIME clause on an individual column sets a row’s ROWTIME from the contents of that row. In the Mozilla pipeline, we set each row’s ROWTIME to the maximum “utc_timestamp” value seen so far.
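
The late-arrival adjustment can be illustrated outside SQLstream. The following Python sketch is only an analogy for the MAX(...) OVER (ROWS UNBOUNDED PRECEDING) expression; the function name and the sample timestamps are made up for illustration and are not part of the Mozilla pipeline.

```python
from datetime import datetime
from itertools import accumulate


def assign_rowtimes(timestamps):
    """Return the running maximum of the timestamps seen so far.

    A late arrival inherits the most recent (largest) timestamp,
    so the resulting sequence never goes backwards.
    """
    return list(accumulate(timestamps, max))


# Hypothetical download times; the third event arrives out of order.
events = [
    datetime(2011, 3, 22, 14, 0, 1),
    datetime(2011, 3, 22, 14, 0, 5),
    datetime(2011, 3, 22, 14, 0, 3),  # late by 2 seconds
    datetime(2011, 3, 22, 14, 0, 6),
]
rowtimes = assign_rowtimes(events)
```

In this sketch the late event is stamped 14:00:05 rather than 14:00:03, which mirrors the adjustment described above.
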
The analytics portion of the pipeline is implemented with a standard SQL statement. For example, every 10 seconds the number of downloads for each product, version, … country, city, region is calculated.


CREATE OR REPLACE VIEW "FirefoxStreamForLocationCounters"
DESCRIPTION 'Compute product counters for a minute' AS
   SELECT STREAM
          "download_type",
          "product_name",
          "product_major_version",
          "product_version",
          "country_code",
          "region_code",
          "city_name",
          "latitude",
          "longitude",
          count(*) AS "count"
   FROM "FirefoxDownloadStream" F
   GROUP BY FLOOR(F.ROWTIME TO MINUTE),
            FLOOR(F.ROWTIME - INTERVAL '10' SECOND TO MINUTE),
            FLOOR(F.ROWTIME - INTERVAL '20' SECOND TO MINUTE),
            FLOOR(F.ROWTIME - INTERVAL '30' SECOND TO MINUTE),
            FLOOR(F.ROWTIME - INTERVAL '40' SECOND TO MINUTE),
            FLOOR(F.ROWTIME - INTERVAL '50' SECOND TO MINUTE),
            "product_name",
            "download_type",
            "product_major_version",
            "product_version",
            "country_code",
            "region_code",
            "city_name",
            "latitude",
            "longitude";
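
It may not be obvious why six FLOOR(... TO MINUTE) expressions yield 10-second groups. The Python sketch below (an illustration only, with made-up helper names) mimics the six expressions and checks that the combined key changes exactly once every 10 seconds.

```python
from datetime import datetime, timedelta


def floor_to_minute(ts):
    """Truncate a timestamp to the start of its minute."""
    return ts.replace(second=0, microsecond=0)


def group_key(ts):
    """Mimic the six FLOOR(ROWTIME - offset TO MINUTE) grouping expressions."""
    return tuple(
        floor_to_minute(ts - timedelta(seconds=offset))
        for offset in range(0, 60, 10)
    )


def ten_second_bucket(ts):
    """Truncate a timestamp to the start of its 10-second interval."""
    return ts.replace(second=ts.second - ts.second % 10, microsecond=0)


# Walk through three minutes, one second at a time, and record which
# 10-second buckets each distinct grouping key covers.
start = datetime(2011, 3, 22, 14, 0, 0)
keys = {}
for s in range(180):
    ts = start + timedelta(seconds=s)
    keys.setdefault(group_key(ts), set()).add(ten_second_bucket(ts))
```

Each shifted floor flips to the next minute at a different 10-second boundary, so every distinct key corresponds to exactly one 10-second interval.
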

There is a similar view declaration where similar calculations are done for each product. Most of the interest since Tuesday is of course related to Firefox 4.0 downloads. This second view allows Mozilla to drill down on downloads by platform as well as downloads for previous (and future) Firefox versions.


CREATE OR REPLACE VIEW "FirefoxStreamForProductCounters"
DESCRIPTION 'Compute product counters for a minute' AS
   SELECT STREAM
          "download_type",
          "product_name",
          "product_major_version",
          "product_version",
          "product_os",
          count(*) AS "count"
   FROM "FirefoxDownloadStream" F
   GROUP BY FLOOR(F.ROWTIME TO MINUTE),
            FLOOR(F.ROWTIME - INTERVAL '10' SECOND TO MINUTE),
            FLOOR(F.ROWTIME - INTERVAL '20' SECOND TO MINUTE),
            FLOOR(F.ROWTIME - INTERVAL '30' SECOND TO MINUTE),
            FLOOR(F.ROWTIME - INTERVAL '40' SECOND TO MINUTE),
            FLOOR(F.ROWTIME - INTERVAL '50' SECOND TO MINUTE),
            "product_name",
            "download_type",
            "product_major_version",
            "product_version",
            "product_os";

Each of these views (FirefoxStreamForLocationCounters and FirefoxStreamForProductCounters) is based on the FirefoxDownloadStream. Each defined stream and view is a point where the application can access data either directly or via another VIEW or INSERT…SELECT.

One component of the solution is a piece of code we call the HBaseAgent. The agent uses the JDBC interface to SQLstream and issues a SELECT * against each of the views described above, which hold the 10-second download counts by location and by product. The HBaseAgent maps each fetched row to the HBase schema as defined by Mozilla.

I write this blog about 24 hours after Firefox 4 launched. So far there are more than 8 million downloads of Firefox 4. It certainly has been an exciting day for Mozilla and I congratulate everyone who contributed. I’m happy that SQLstream has been able to contribute to their success.

March 22, 2011

SQLstream has been powering Mozilla’s Firefox Download Monitor since 2009. A SQLstream based application has been continually aggregating hundreds of millions of download events, receiving minute by minute aggregations via a continuously running SQL SELECT statement using the SQLstream JDBC driver. A continuously running SELECT statement is syntactically and semantically identical to other SELECT statements with the addition that end of data is never returned in SQLSTATE by the FETCH associated with the cursor.

Mozilla 4.0 Real-Time Download Monitor is Powered By SQLstream

For the launch of Firefox 4, Mozilla again turned to SQLstream to enhance the download monitor. Applications in SQLstream are built by defining a series of SQL stream definitions and SQL views. Business rules are embedded in these definitions. Definitions are assembled into a pipeline and each definition provides a point where data is available to applications. A stream definition is analogous to a SQL table definition and contains the column names and data types for each defined element. Each stream has an implicit ROWTIME column, a monotonically increasing value associated with each row.

The download monitor tails the log files written by all of Mozilla’s download servers which provide new versions of Firefox. Each entry in the log is parsed. The IP address of each download is converted to a country, city, region, latitude, and longitude. For the Firefox 4 release, Mozilla wanted the results of the download aggregation to be stored in an HBase table in their HBase/Hadoop cluster. Storing the data allows future historical analysis to complement the realtime analysis provided by SQLstream.

SQLstream aggregates downloads to two separate column families in a single HBase table. The ‘product’ column family contains the overall download count. The ‘location’ column family contains the count of downloads for each country, region, city, latitude, longitude.

SQLstream uses a GROUP BY clause along with a COUNT(*) to calculate the number of downloads for each 10 seconds. The author wrote a new piece of SQLstream code which provides an interface from SQLstream to HBase. The HBaseAgent maps the results of the GROUP BY and calls the HBase API to persist data in HBase. The incrementColumnValue API is key in that it allows SQLstream to aggregate download counts on a realtime basis and efficiently update HBase by providing incremental values.
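
To show why incremental updates suit this design, here is a minimal Python sketch. It uses an in-memory class as a stand-in for a table of counters; it is not the HBase client API, and the row keys, column names and counts are hypothetical, since Mozilla’s actual schema is not shown in this post.

```python
from collections import defaultdict


class CounterStore:
    """In-memory stand-in for a table of counters (not the HBase API)."""

    def __init__(self):
        self._cells = defaultdict(int)

    def increment(self, row_key, column, amount):
        """Add amount to a cell and return the new total."""
        self._cells[(row_key, column)] += amount
        return self._cells[(row_key, column)]

    def get(self, row_key, column):
        return self._cells[(row_key, column)]


store = CounterStore()

# Each tuple is one hypothetical 10-second aggregate: (row key, column, count).
aggregates = [
    ("firefox-4.0", "product:count", 120),
    ("firefox-4.0", "product:count", 95),
    ("firefox-4.0", "location:US", 40),
]
for row_key, column, count in aggregates:
    # Only the delta is sent; the store keeps the running total.
    store.increment(row_key, column, count)
```

Because only the delta for each 10-second interval is applied, the writer never has to read the current total before updating it.
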

The application periodically reads data from HBase and sends formatted data to each connected browser. See for yourself at http://glow.mozilla.org/. The map provides a running count of Mozilla Firefox 4.0 downloads, with raindrops on the map indicating each location where one or more downloads have just occurred. As I write this blog entry, I can see that Europe and Asia are hot while North America is just waking up. Clicking on the colored rings in the lower left hand corner of the map allows drill down to the geographic locations.

Mozilla’s Daniel Einspanjer has also blogged about their new real-time download visualization application. The blog explains the overall architecture of the real-time application using SQLstream, and the SQLstream integration with HBase.

You can read more about the previous Firefox 3 download monitor on Julian Hyde’s blog. Julian is the CTO of SQLstream.

Mozilla also blogged about the history of the Firefox 3 download monitor on the Mozilla Webdev blog.


Vice President Marketing
February 21, 2011

With service and sensor data growing at 60% CAGR, having both the raw power and correct architecture for processing streaming data is essential. IDC recently released estimates for the size of the ‘Digital Universe’, a term used to describe every electronically stored piece of data. According to IDC, stored data will reach 1.8 million petabytes (1800 exabytes) by the end of 2011.

Data overload (source IDC)

As a recent article in the Economist points out, all of this data raises significant processing performance and storage issues. Conventional database technology requires data to be stored, cleaned and aggregated before being queried. With the volume of data growing so quickly, it has become cost prohibitive and technologically infeasible to process all data using conventional solutions.

But how much of the raw data actually needs to be stored? The value of individual data is often low, and the useful lifetime of the raw data short. However, the information content is potentially high – it’s just a matter of identifying the valuable information in the raw data.

Introducing SQLstream Server 2.5

For SQLstream, this is the future of data processing – real-time, continuous analysis of streaming data – generate operational business intelligence from live streaming data without first storing the data in a database.

For the latest release of SQLstream Server, SQLstream 2.5, we’ve focussed on the business requirements needed for the rapid adoption of real-time stream computing across all markets: performance, reliability and scalability. More specifically, SQLstream 2.5 offers:

  • 10X performance improvement, benchmarked against live operational deployments on a single server installation.
  • Scalability for mission critical applications with federated installations across multiple servers.
  • Business critical reliability following an exhaustive stability and operational optimization program.

Of course, we’ve also addressed a range of important requirements across our customer base, in particular, additional input and output connectors built on the SQL/MED standard for integration, including:

  • enhanced database insert/update/select Adapters.
  • enterprise messaging integration using AMQP.
  • enhanced Log File management and XML feed processing Adapters.

And last but by no means least, supporting the SQL:2008 standards-based streaming SQL language with new functions including:

  • support for GROUP BY ORDER BY.
  • new and enhanced data analysis functions for detecting unique events, such as early emit SELECT DISTINCT.
  • support for the SQL HAVING clause.
  • and a new range of streaming statistical functions for calculating variance and standard deviation.

Most existing customers have already upgraded to SQLstream 2.5. Some examples of recent SQLstream 2.5 upgrades include customers in the following markets:

Intelligent Transportation – real-time analytics for the intelligent transportation market. A case study for SQLstream’s Intelligent Transportation solutions was featured recently in ITS International magazine. An overview of the product’s features can be found on www.sqlstream.com/products/transport.

Environmental monitoring and event detection – integrating with AMQP, which provides the guaranteed delivery of real-time raw data from a large sensor network, SQLstream filters the raw sensor data (using windowed aggregation) and applies event detection patterns in real-time, generating a continuous stream of environmental exception events.

Social gaming infrastructure – working with a new entrant in the on-line social gaming market, SQLstream monitors user activity and provides continuous real-time scoring updates – including real-time incremental updates of historical, aggregated game data maintained in a back-end data warehouse.

Posted under Big Data
November 8, 2010

PostgreSQL Conference: West 2010

Attendance at the PostgreSQL West 2010 Conference was encouraging considering a million people had gathered in the city to celebrate the World Series victory of the San Francisco Giants.

I’ve posted the presentation from the event (previously blogged about here).

We presented the concepts of Streaming GIS, integrating SQLstream’s real-time streaming data analytics with the PostgreSQL-based Geographic Information Systems (GIS) engine PostGIS. With examples from SQLstream’s commercial traffic congestion monitoring application, we discussed how sophisticated high performance real-time geospatial applications can be delivered quickly and easily using standards-based SQL.

Streaming GIS using PostGIS and SQLstream at PostgreSQL West 2010

November 2, 2010

PostgreSQL Conference: West 2010

SQLstream’s founding engineer Sunil Mujumdar is set to present at PostgreSQL Conference: West 2010.

In a talk entitled ‘Streaming GIS using PostGIS and SQLstream’, Sunil will describe the SQLstream stream computing platform based on industry-standard SQL, and its integration with the PostgreSQL-based Geographic Information Systems (GIS) engine PostGIS.

Wireless sensors and internet services are generating data faster than conventional database technologies can process that data. In particular, mobile resource management requires the stream processing of high volume, location-based data. Solutions require new methods such as Streaming SQL to address these new sources of big data, in real-time. We’ll be discussing key concepts in stream computing, data warehouse feeds and the integration of PostGIS into a high performance streaming environment.

Using mobile resource management as a case study, we will illustrate with examples from SQLstream’s commercial traffic monitoring application.


Vice President Marketing
October 26, 2010

A streaming SQL query is a continuous, standing query that executes over streaming data. Data streams are processed using familiar SQL relational operators augmented to handle time sensitive data. Streaming queries are similar to database queries in how they analyze data; they differ by operating continuously on data as they arrive and by updating results in real-time.
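
The difference between a standing query and a single-shot query can be sketched in a few lines of Python. This is an analogy only, not SQLstream code: the generator below plays the role of a continuous query, and the event names are invented for the example.

```python
def running_counts(events):
    """Yield an updated (key, count) result each time an event arrives.

    Unlike a single-shot query, there is no final answer: every new
    input immediately produces a refreshed output row.
    """
    counts = {}
    for key in events:
        counts[key] = counts.get(key, 0) + 1
        yield key, counts[key]


# A finite list stands in for what would be an unbounded stream.
stream = iter(["clicks", "errors", "clicks", "clicks"])
results = list(running_counts(stream))
```

With a real stream the iterator would never be exhausted, and a consumer would read results one at a time instead of collecting them into a list.
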

Streaming SQL queries process dynamic, flowing data, in contrast to traditional RDBMSs, which process static, stored data with repeated single-shot queries. Streaming SQL is simple to configure using existing IT skills, dramatically reducing integration cost and complexity. Combining the intuitive power of SQL with this simplicity of configuration enables much faster implementation of business ideas, while retaining the scalability and investment protection important for business-critical systems.

By processing transactions continuously, streaming SQL directly addresses the real-time business needs for low latency, high volume, and rapid integration. Complex, time-sensitive transformations and analytics, operating continuously across multiple input data sources, are simple to configure and generate streaming-analytics answers as input data arrive. Sources can include any application inputs or outputs, or any of the data feeds processed or generated within an enterprise. Examples include financial trading data, internet clickstream data, sensor data, and exception events. SQL can process multiple input and output streams of data, for multiple publishers and subscribers. To learn more about Streaming SQL, please read our “Concepts in Streaming SQL” mini-white paper.

Posted under Streaming SQL
October 18, 2010

In the game industry, complex game logic needs to be applied to streams of events generated by gameplay.  In single player games, this logic is simply handled by applying the correct computations.  However, in an Internet based social game where millions of players interact together online, the problem takes on an entirely different dimension.  Storing the game events on disk inside of a database becomes increasingly difficult as the rate of gameplay events increases.  Logic and computation must be applied to the data and a disparate set of data must be queried to correctly update the game state.

The solution can be surprisingly simple and similar in form to storing all gameplay events in a traditional database.  SQLstream’s powerful streaming and windowed aggregation capabilities can reduce this use case and complex logic to a single query.

For example, consider a game where users create videos that are viewed and rated by other users:

  • A score for the video must be computed based upon the last 4 weeks of gameplay.
  • Let’s say a view is worth 12 points and the various rating levels are worth between 0 and 30 points.
  • The content’s score is the total points accumulated in the last week, plus 50% of the points in the week before that, plus 20% of the points in the week before that, plus 10% of the points in the week before that.

Let’s assume we have two streams of data which contain streams of gameplay events:

--
-- A stream containing 1 row per individual view of a video
--
CREATE STREAM "s_video_view" (
                "video_id" INTEGER,     -- the viewed video
                "user_id" INTEGER,      -- the id of the viewer
                "performer_id" INTEGER  -- the id of the performer
                );


--
-- A stream containing 1 row per individual rating of a video
--
CREATE STREAM "s_video_rating" (
                "video_id" INTEGER,     -- the rated video
                "performer_id" INTEGER, -- the id of the performer
                "rater_id" INTEGER,     -- the id of the rater
                "rating" INTEGER        -- the rating given
                );

To handle all the described business logic, we must first compute the total number of points generated from each gameplay event and add that to the stream of tuples entering the system.  The following query accomplishes this:

SELECT STREAM "performer_id", "video_id", 12 AS "points"
     FROM "s_video_view"
UNION ALL
SELECT STREAM "performer_id", "video_id",
          CASE "s_video_rating"."rating" WHEN 2 THEN 1
                                         WHEN 3 THEN 8
                                         WHEN 4 THEN 20
                                         WHEN 5 THEN 30
                                         ELSE 0
                                         END AS "points"
     FROM "s_video_rating"

This query will produce a stream of data containing the performer_id, video_id, and points for each scoring event in the system.
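
For readers who want to check the point values by hand, the same mapping can be written as a small Python sketch. It is an illustration rather than SQLstream code: the sample events are invented, and the two lists are simply concatenated, whereas a streaming UNION ALL merges rows as they arrive.

```python
VIEW_POINTS = 12
# Same values as the CASE expression; any other rating scores 0.
RATING_POINTS = {2: 1, 3: 8, 4: 20, 5: 30}


def score_events(views, ratings):
    """Turn view and rating events into (performer_id, video_id, points)."""
    scored = [
        (v["performer_id"], v["video_id"], VIEW_POINTS) for v in views
    ]
    scored += [
        (r["performer_id"], r["video_id"], RATING_POINTS.get(r["rating"], 0))
        for r in ratings
    ]
    return scored


# Hypothetical events for one video.
views = [{"video_id": 100, "user_id": 1, "performer_id": 7}]
ratings = [
    {"video_id": 100, "performer_id": 7, "rater_id": 2, "rating": 5},
    {"video_id": 100, "performer_id": 7, "rater_id": 3, "rating": 1},
]
scored = score_events(views, ratings)
```
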

Next we must compute a rolling, time-based score over the event stream for each unique video_id.  SQLstream provides the SQL:99, SQL:2003, and SQL:2008 standard WINDOW facility to make this easy:

WINDOW "last_7_days" AS ( PARTITION BY "video_id"
                          RANGE INTERVAL '7' DAY PRECEDING),
       "last_14_days" AS ( PARTITION BY "video_id"
                           RANGE INTERVAL '14' DAY PRECEDING),
       "last_21_days" AS ( PARTITION BY "video_id"
                           RANGE INTERVAL '21' DAY PRECEDING),
       "last_28_days" AS ( PARTITION BY "video_id"
                           RANGE INTERVAL '28' DAY PRECEDING)

These rolling windows contain all the scoring events over the last 7, 14, 21, and 28 days respectively, partitioned by video_id.  This means that any aggregation function applied over one of these windows is computed only from the stream events that share the current row’s video_id.  So, COUNT(*) OVER "last_7_days" would emit one row for each incoming scoring event, containing the number of scoring events for that event’s video_id in the last week.

By subtracting the SUM of the points in the 7 day window from the number of points in the 14 day window, we can compute the number of points in the week starting two weeks ago and ending one week ago.  This technique allows us to implement computations on rolling windows that are not bounded by the current time.
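
The subtraction technique is easy to verify with a small Python sketch. This is an illustration of the arithmetic only, not of how SQLstream evaluates windows; the function names and sample events are assumptions made for the example.

```python
from datetime import datetime, timedelta

# (window length in days, weight applied to the newest week in that window)
WEIGHTS = [(7, 1.0), (14, 0.5), (21, 0.2), (28, 0.1)]


def window_sum(events, now, days):
    """Sum the points of all events within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return sum(points for ts, points in events if cutoff <= ts <= now)


def video_score(events, now):
    """Weight each week by subtracting the next-smaller window's sum."""
    score = 0.0
    previous = 0
    for days, weight in WEIGHTS:
        current = window_sum(events, now, days)
        score += (current - previous) * weight
        previous = current
    return score


now = datetime(2010, 10, 18, 12, 0)
# Hypothetical (timestamp, points) scoring events for a single video.
events = [
    (now - timedelta(days=1), 12),   # most recent week, weight 1.0
    (now - timedelta(days=10), 30),  # one week back, weight 0.5
    (now - timedelta(days=16), 20),  # two weeks back, weight 0.2
    (now - timedelta(days=25), 8),   # three weeks back, weight 0.1
    (now - timedelta(days=40), 30),  # older than 28 days, ignored
]
score = video_score(events, now)
```

For these sample events the four window sums are 12, 42, 62 and 70, so the score works out to 12 + 15 + 4 + 0.8 = 31.8.
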

Putting the entire example together, we get the following view:

CREATE VIEW "v_video_score" AS
     SELECT STREAM "video_id", "performer_id",
               SUM("points") OVER "last_7_days" +
                 ((SUM("points") OVER "last_14_days" -
                   SUM("points") OVER "last_7_days") * 0.5) +
                 ((SUM("points") OVER "last_21_days" -
                   SUM("points") OVER "last_14_days") * 0.2) +
                 ((SUM("points") OVER "last_28_days" -
                   SUM("points") OVER "last_21_days") * 0.1) AS "score"
        FROM ( SELECT STREAM "performer_id", "video_id", 12 AS "points"
                    FROM "s_video_view"
               UNION ALL
               SELECT STREAM "performer_id", "video_id",
                           CASE "s_video_rating"."rating" WHEN 2 THEN 1
                                                          WHEN 3 THEN 8
                                                          WHEN 4 THEN 20
                                                          WHEN 5 THEN 30
                                                          ELSE 0
                                                          END AS "points"
                    FROM "s_video_rating" )
        WINDOW "last_7_days" AS ( PARTITION BY "video_id"
                                  RANGE INTERVAL '7' DAY PRECEDING),
               "last_14_days" AS ( PARTITION BY "video_id"
                                   RANGE INTERVAL '14' DAY PRECEDING),
               "last_21_days" AS ( PARTITION BY "video_id"
                                   RANGE INTERVAL '21' DAY PRECEDING),
               "last_28_days" AS ( PARTITION BY "video_id"
                                   RANGE INTERVAL '28' DAY PRECEDING);

which produces a stream of rows containing the video_id, performer_id, and a score computed over the rolling windows.  This view outputs a row every time a new event is added to the system.  The complex business logic has been reduced to one SQL query.

By attaching the results of this query to a foreign table, a nearly real-time cache of the videos and their current scores can be maintained in your database.  Alternatively, by using SQLstream’s JDBC driver, a distributed caching system such as memcached could be kept updated with the latest scores for each video simply by calling this query.  But those are topics for another post.

I hope you enjoyed this simplified real world example of how streaming SQL can:

  • simplify your business logic processing
  • improve your ability to deliver real-time data to your customers, clients, and colleagues.
Posted under Streaming SQL

Vice President Marketing
October 4, 2010

Businesses need to respond faster than ever to customer information and demands, which are arriving in rapidly increasing volumes from ever more diverse and distributed systems. This need for real-time business models cannot be addressed by traditional integration and business intelligence solutions, because streaming analytics and related concepts are central to the solution. The real-time model means responding immediately to new information as it arrives, and streaming analytics is at the core of these next generation IT systems.

Increasing the speed of business under these pressures of rapidly increasing data volume and more diverse data sources has been expensive and complex. Rapid responsiveness has proved elusive because real-time needs simply cannot be met by delivering more information faster from historical data. Real-time businesses require distributed technology that provides low latency and high-performance processing of data and event streams. By using continuous, streaming SQL queries, business answers can be generated as soon as input data becomes available. Whereas databases query historical data, streaming SQL queries and transforms data on the wire without any prior staging in a database.

As a result, streaming SQL is complementary to traditional EAI, business intelligence, and data warehousing solutions. By completing real-time processing and analysis before storing the data, streaming SQL reduces the cost of processing rapidly arriving data. Even better, streaming SQL makes existing, in-house SQL skills immediately applicable to real-time analysis, reducing integration time and costs.

To learn more about the Business Case for Streaming SQL, please read our “Concepts in Streaming SQL” mini-white paper.

Posted under Streaming SQL
July 29, 2010

Railroads have used track side readers to scan bar codes on the sides of freight cars since the 1970s. Such sensors provided real time tracking of goods as they made their way from the supplier to the delivery point. Retail businesses increased the use of RFID tags in the past 20 years to track goods through the manufacturing process. Since the Indian Ocean tsunami of December 2004 the public has become aware of deep water pressure sensors which sit on the ocean floor to detect tsunamis and are intended to generate warnings about potential disasters.

The cost of sensors has decreased significantly in recent years, and as a result inexpensive sensors are present nearly everywhere in businesses. As the price of sensors decreases, it becomes economically feasible to deploy thousands and even millions of them. Such sensors cumulatively generate huge volumes of data. Imagine placing a sensor capable of measuring temperature, humidity, sunlight and air pressure within each square kilometer of the state of Iowa to assist farmers in managing crop production. Now imagine each of those 145,743 sensors generating 100 bytes of data every minute, resulting in a data volume of nearly 21 GB per day.
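
The daily volume follows directly from the figures above. A short Python check, assuming decimal gigabytes (1 GB = 10^9 bytes) and one reading per sensor per minute:

```python
SENSORS = 145_743            # one per square kilometer of Iowa
BYTES_PER_READING = 100
READINGS_PER_DAY = 24 * 60   # one reading every minute

bytes_per_day = SENSORS * BYTES_PER_READING * READINGS_PER_DAY
gigabytes_per_day = bytes_per_day / 1e9  # decimal gigabytes

print(f"{bytes_per_day:,} bytes per day = {gigabytes_per_day:.2f} GB")
```

That is 20,986,992,000 bytes, or just under 21 GB, per day.
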

There is much buzz about Big Data and the challenges of applying traditional database management tools to extract business value from such data. Fortunately, there is a better way – integrating real time data, as provided by sensors, with stream analytic processing, allows timely enterprise decisions in response to changing conditions.

I urge you to read Damian Black’s recent postings on this blog describing the SQLstream approach to “Big Data”.



CEO
July 1, 2010

GigaOM Structure 2010 Big Data and Cloud Computing

Last week I was on a panel for “Big Data” at Structure2010, a GigaOm event. As usual, it was very well run and there was a large throng of Silicon Valley luminaries, ranging from entrepreneurs to venture capitalists, scattered in with some large customers and users of technology. We clearly have moved on a long way from the days when I was told to change my slides and remove the cloud graphic and replace it with a box because “clouds are cloudy” (direct quotation from a tier one venture capitalist; I wish to protect his identity to avoid personal embarrassment).

SQLstream is already the market leader in applying stream computing to Intelligent Transportation Systems, and we also have the opportunity to provide a similar impact to the Cloud Computing Service Monitoring space. It seems we have exactly the perfect solution to provide real-time insights into service usage, bottlenecks, error rates and service level compliance. And you can add regulatory compliance to that list too – from the continuous alerting side to complement the excellent historical solutions that are out there.

From the presentations at the show, it is clear that Cloud Computing has truly come of age. SQLstream uses cloud services for all demonstrations and also in our QA and Engineering processes. We also have customers deploying in the cloud. The latest emerging cloud solutions fill in many of the former technology gaps, allowing seamless integration into or transition from traditional data centers. You can even run your own private clouds leveraging the same APIs available on the public clouds.

On the Big Data front, on the panel alongside SQLstream were a Hadoop vendor and a high-performance column store data warehouse vendor. The other two panelists were users of “big data” technologies. It was interesting to discover that we already had two implementations where SQLstream operates in concert with or in parallel with the other two panelist vendors’ technologies.

There is even a customer (Mozilla) that uses all three technology approaches for download analytics: Hadoop in the form of HBase and a column store data warehouse for historical SQL queries over downloads, and SQLstream to generate high-performance continuous real-time analytics and reporting on download statistics for all versions of Firefox. This clearly demonstrates that there is a role for each of the Big Data technologies highlighted on the panel, and an interesting and growing market opportunity. It also indicates some clear partnership opportunities.

I look forward to seeing the developments in our space and in cloud computing over the coming year and hope to be invited back again soon. We were originally present on the Big Data panel at GigaOm’s inaugural Structure2008 event, so I guess we should be set for a reappearance at Structure2012?! If so, I am sure we will have some exciting new stories to share.

Here is a link to the video recording of the panel session. A big thank-you to Phil Hendrix for his excellent moderation of the panel and the professional preparation work he did beforehand so that the actual event went smoothly.

Posted under Big Data

CEO
June 22, 2010

There is a lot of buzz these days about the challenge of “Big Data”.  I’ll be speaking on the subject at GigaOM’s Structure2010, on the “DEALING WITH THE DATA TSUNAMI: THE BIG DATA” panel. There are many dimensions to the challenges posed by “Big Data”, which I’ve presented here as five separate but related themes.

Speed of data arrival

The first theme is speed.  When a lot of data arrive fast, it is often overlooked that they arrive in raw form and need to be processed or cooked before they can be of any real value. The processing normally comprises cleaning, filtering, aggregating and validating.  Sometimes the data need to be enhanced, normalized or de-normalized.  While there are a number of proprietary ETL tools out there that can help, most people prefer to perform these operations using SQL.  This approach has become known as ELT as the data are Extracted, Loaded and then Transformed (as opposed to Transformed then Loaded).  In the past, this has meant loading raw data into a data warehouse’s staging tables and then performing the ELT with SQL in batches until the data are fully cooked and ready to take part in the “main course” queries.

One of the strengths of the SQLstream approach is that, for the first time, you can use standards-based SQL to perform these ELT steps as Continuous ETL, rather than operating upon the data after first storing them.  We call this “analyze-before-store” approach Query the Future, as the scope of the continuous queries is from the moment they start until the end of future time (in contrast with historical queries, whose scope is from the moment they start until as far back in time as the data are stored).  SQLstream’s queries continuously process, clean, aggregate and enhance the data in a highly parallelized, pipelined dataflow process.  The staging is in main memory using 64-bit architecture and multiple cores and servers.  This provides a highly scalable, efficient and cost-effective solution to ETL, with the virtuous side-effect of enabling the data warehouse to be kept continuously up-to-date by feeding it a stream of fully cooked data and updating its aggregate tables continuously in near real-time.  All of this is done without stealing valuable cycles from the data warehouse server.
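To make the “analyze-before-store” idea concrete, here is a minimal sketch in Python rather than streaming SQL. It is only an illustration of the pattern, not SQLstream's implementation: the row fields (region, amount) and the function names are hypothetical, and a real deployment would express the same steps as continuous SQL queries.

```python
from typing import Dict, Iterable, Iterator


def clean(rows: Iterable[Dict]) -> Iterator[Dict]:
    """Drop rows with a missing amount and normalize the region name."""
    for row in rows:
        if row.get("amount") is None:
            continue  # filter out raw rows that cannot be used
        yield {**row, "region": row["region"].strip().upper()}


def running_totals(rows: Iterable[Dict]) -> Iterator[Dict]:
    """Emit an updated per-region total as each cleaned row arrives."""
    totals: Dict[str, float] = {}
    for row in rows:
        region = row["region"]
        totals[region] = totals.get(region, 0.0) + row["amount"]
        yield {"region": region, "total": totals[region]}


def continuous_etl(raw_rows: Iterable[Dict]) -> Iterator[Dict]:
    """Chain the steps so each row is processed as it arrives, before any storage."""
    return running_totals(clean(raw_rows))


# Hypothetical raw feed; in practice this would be an unbounded stream.
raw = [
    {"region": " west ", "amount": 10.0},
    {"region": "east", "amount": None},
    {"region": "West", "amount": 5.0},
]
cooked = list(continuous_etl(raw))
```

Because each stage is a generator, no stage waits for the full data set; the “cooked” rows could be fed to a warehouse loader one at a time.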

Data location

The second theme is data location.  As with houses, location is very important when it comes to assessing the value (or usefulness) of the data.  Location might be spatial or temporal.  If you wish to be alerted of a special price for gas at a specific gas station, clearly it is of greater value if you are currently in the immediate vicinity of the gas station.  This shows the value of both the location in space and the location in time.  In contrast, most data warehouses dumbly store all service data and records without regard to their value.

Clearly, the value of the data in many cases greatly diminishes over time.  Many of the queries that a business might pose are better targeted at current data.  That is particularly true of targeted advertisements, but also when monitoring customer service level, cloud computing infrastructure and the like.  The data are much more valuable when the business is able to take proactive initiative to capitalize on the value – fixing problems or issues before they negatively impact customers, or making that promotion or sale before the customer purchases product or service from a competitor.  SQLstream’s continuous queries are all about focusing analytics where they have the most value by specifying explicit windows of focus for the queries in terms of time, quantity or space.  While many rows can flow into and out of the window of focus for any given query, the window represents the immediate focus of attention.
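A time-based window of focus can be sketched as follows. This is an illustrative Python approximation, not SQLstream code; the class name and the choice of seconds as the time unit are assumptions made for the example.

```python
from collections import deque


class TimeWindow:
    """Keep only the rows whose timestamp lies within the last `width` seconds."""

    def __init__(self, width_seconds):
        self.width = width_seconds
        self.rows = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self.rows.append((timestamp, value))
        # Evict rows that have flowed out of the window of focus.
        while self.rows and self.rows[0][0] <= timestamp - self.width:
            self.rows.popleft()

    def values(self):
        return [value for _, value in self.rows]


window = TimeWindow(60)
for ts, v in [(0, 1), (30, 2), (59, 3), (61, 4)]:
    window.add(ts, v)
recent = window.values()  # the row at t=0 has left the 60-second window
```

Rows flow in and out as time advances, and any aggregate computed over `values()` always reflects only the current window.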

Pace of change

The third theme is the pace of change of data.  If you have a large quantity of data that is not changing very much, then historical queries and analysis will no doubt provide you with all of your answers.  However, if the data are changing constantly, or a lot of new data are arriving constantly, or if you have a focus on a specific window of time or space, then historical analysis has little value.  What you care about is the derivative of the change – the rates of change.  For example, are our sales accelerating or decelerating?  Is the rate of acceleration unusually high or low?  What about service outages and error rates?  Or customer complaints?  The SQLstream approach enables you to see what is changing rather than what is staying the same.  It is analogous to predator vision: the predators want to see what is moving and their vision system prioritizes that over what remains motionless.  SQLstream provides such dynamic vision.
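As a small worked illustration of “rates of change”, the first difference of a series approximates its velocity and the second difference its acceleration. The sales figures below are made up for the example.

```python
def differences(values):
    """Return the change between each pair of consecutive values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical hourly sales counts.
sales_per_hour = [100, 110, 125, 145, 150]

velocity = differences(sales_per_hour)   # how fast sales are changing
acceleration = differences(velocity)     # whether that change is speeding up

# Sales are still growing in the last hour, but the growth is decelerating.
decelerating = acceleration[-1] < 0
```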

Balancing historical and continuous analysis

The fourth theme is the need to complement data mining and the results of historical analysis with continuous analysis.  Data warehousing allows you to find patterns and predictors from past data and to back test all of your hypotheses over extended periods of time.  The back testing of such hypotheses often takes the form of SQL queries that search for patterns of changes of data over time and check that the predicted results occurred and with what frequency.  Once you have mined and captured such valuable predictors, it is straightforward to take the SQL you have generated and tweak it to be used in real-time, continuously executed against live data.  Using this approach, SQLstream allows you to leverage your data mining results to perform real-time predictive analytics, giving your business a real-time heads up for key indicators of buying signals, or system failures, or which ad should be served up based on a customer’s web behavior.

Brain over brawn processing

My fifth and final theme is “smart declarative” versus “dumb brute force” when applied to data queries.  The latter is how I see Hadoop-based approaches.  You parallelize a problem to take advantage of a lot of available servers and related CPU cycles, but you do not rely on any intelligence on how you partition the problem.  In fact not having to “think” is one of the primary appeals of the technique.  It is a brute force method of brawn over brain.  However, where the problem space is truly huge, or the time or financial budget is more limited, there is always the attraction of the “brain over brawn” technique.  Declarative SQL processing draws upon the mathematical tractability of analyzing patterns and dependencies within the data, the use of keys and indexing, the rewriting of complex formulae into simpler ones and avoiding recalculation of intermediate results – in order to provide a faster, more efficient and smarter way of finding the solutions.  Such declarative techniques can still take extensive advantage of parallelism and inexpensive or available servers and CPU cycles, but they rely on smart analysis in order to optimize the calculations.  SQLstream, and all SQL-based data warehouses, heavily draw upon these mathematical SQL properties and patterns and analysis of the data to do the smart thing when it comes to query processing.

Stream Computing of the kind embodied by SQLstream however has even greater potential to take advantage of parallelism over and above SQL data warehouses because SQLstream’s Stream Computing has no transactional bottleneck and is purely declarative.  Input streams are not “side-effected” by the execution of stream SQL statements, rather new streams are created from the original ones (which are left untouched and can be presented concurrently to other SQLstream servers).  The execution paradigm is one of parallel dataflow execution – a paradigm that lends itself not only to massive parallel execution but also to massively distributed execution.  I believe that as Hadoop becomes more widely understood and deployed, people will begin to see just how much of a better job could be performed by adding a little intelligence and just how powerful declarative stream computing can be.

Posted under Big Data
June 10, 2010

Last year has been an interesting experience as I participated in a number of customer “Proof Of Concept” projects for SQLstream. Developing these real-time, stream computing projects greatly increased my appreciation for the advantages of an open, extensible and standards-compliant middleware infrastructure.

For example, I needed to implement an “edge detection” mechanism for a POC project. My colleagues at SQLstream recommended using “Bollinger bands” for determining outliers. So, I browsed through the Wikipedia entry for Bollinger Bands to learn more. Bollinger bands are closely related to standard deviations or quartile deviations. A standard deviation measures variability or dispersion in a data distribution. Bollinger bands, on the other hand, provide thresholds to filter outliers in the data. In fact, Bollinger bands are based on the moving average and moving standard deviation of the data set. For typical data sets, Bollinger bands can be defined as:

lowerBB (lower Bollinger Band) = avg - (k * stddev),

upperBB (upper Bollinger Band) = avg + (k * stddev)

where avg and stddev are the average and standard deviation over a sufficiently large time window and k is a constant that needs to be determined for the activity being monitored. For roughly normally distributed data, k = 2 places about 95% of the values between the two bands, which puts the upper Bollinger band near the 97.7th percentile of the data set.
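A small numeric example may help here. The sketch below applies the two formulas to a made-up list of readings; the function name and the use of the population standard deviation (`statistics.pstdev`) are assumptions for illustration, and a sample standard deviation would give slightly wider bands.

```python
from statistics import mean, pstdev


def bollinger_bands(values, k=2.0):
    """Return (lowerBB, upperBB) = avg -/+ k * stddev for the given values."""
    avg = mean(values)
    stddev = pstdev(values)  # population standard deviation (an assumption)
    return avg - k * stddev, avg + k * stddev


# Hypothetical readings taken over one window.
readings = [10, 12, 11, 13, 9, 11, 10, 12]
lower, upper = bollinger_bands(readings)
# avg is 11 and the variance is 1.5, so the bands sit about 2.45 either side of 11.
```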

Bollinger Bands are widely used in the financial services industry. However, Bollinger Bands can be applied to solve problems in other industries. (As I am not claiming to be a statistics expert, I would certainly appreciate honest feedback on our application of Bollinger bands in streaming queries.)

Bollinger bands certainly are a good tool to identify sudden spikes in the activity being monitored in real-time. A number of examples come to mind:

  • Sudden spikes in the price for a ticker symbol in a stock exchange. For example,

SELECT STREAM ROWTIME, ticker, price
FROM (SELECT STREAM ROWTIME, ticker, price,
        AVG(price) OVER (PARTITION BY ticker RANGE INTERVAL '1' HOUR PRECEDING) AS "avgLastHour",
        STDDEV(price) OVER (PARTITION BY ticker RANGE INTERVAL '1' HOUR PRECEDING) AS "stdDevLastHour",
        AVG(price) OVER (PARTITION BY ticker ROWS 5 PRECEDING) AS "avgLast5Trades"
      FROM BIDS) AS S
WHERE S."avgLast5Trades" > S."avgLastHour" + 2 * S."stdDevLastHour";

  • Spikes in the error rate on a web server. For example,

SELECT STREAM ROWTIME, url, "numErrorsLastMinute"
FROM (SELECT STREAM ROWTIME, url, "numErrorsLastMinute",
        AVG("numErrorsLastMinute") OVER (PARTITION BY url RANGE INTERVAL '1' MINUTE PRECEDING) AS "avgErrorsPerMinute",
        STDDEV("numErrorsLastMinute") OVER (PARTITION BY url RANGE INTERVAL '1' MINUTE PRECEDING) AS "stdDevErrorsPerMinute"
      FROM "HttpRequestsPerMinute") AS S
WHERE S."numErrorsLastMinute" > S."avgErrorsPerMinute" + 2 * S."stdDevErrorsPerMinute";

  • Monitoring call volumes in a call center.
  • Analytics on social/online gaming services.

In the Stream Computing context, Bollinger bands provide the high/low-water marks for monitoring activity. Whenever the level of recent activity crosses these Bollinger Band thresholds, the activity can be flagged. The streaming analytics engine can then perform additional analytics to detect patterns in the activity and to provide actionable information to regulate the system that is being monitored. At the very least, Bollinger bands can be used to filter out “uninteresting” rows from the stream, thereby reducing the load on the streaming pipeline.

At SQLstream, we used windowed aggregation functions such as AVG() OVER (…) and STDDEV() OVER (…) to establish Bollinger bands. AVG and STDDEV need to be computed over sufficiently large windows of time. As the window slides forward in time, the Bollinger bands reflect more recent activity levels. The current activity level can then be computed on a much smaller window, potentially including only the current row in the stream. Should the current activity level cross either of the Bollinger bands, we mark that as a spike in the activity level. The multiplier in the Bollinger band formula needs to be tuned to the data distribution, that is, to determine exactly what multiple of the standard deviation is appropriate.
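The sliding-window logic described above can be approximated outside of SQL as follows. This is a hedged Python sketch, not the SQLstream implementation: the function name, the row-count window (instead of a time window) and the sample readings are all assumptions. Note one deliberate difference from the SQL queries: here the band is computed from the rows preceding the current one, so a spike does not inflate its own threshold.

```python
from collections import deque
from statistics import mean, pstdev


def detect_spikes(values, window=5, k=2.0):
    """Yield (index, value) whenever a value exceeds the upper band
    computed over the `window` values that preceded it."""
    history = deque(maxlen=window)  # the oldest value drops out automatically
    for index, value in enumerate(values):
        if len(history) == window:  # wait until the window is full
            upper = mean(history) + k * pstdev(history)
            if value > upper:
                yield index, value
        history.append(value)


# Hypothetical per-minute error counts with one obvious spike.
readings = [10, 11, 10, 12, 11, 10, 30, 11, 12]
spikes = list(detect_spikes(readings))
```

Rows that do not cross the band are simply not emitted, which mirrors the idea of filtering “uninteresting” rows out of the stream.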

Coming back to my point about openness and extensibility: as you can see in the example queries above, you could execute a very similar query in Oracle or SQL Server. Key features such as windowed aggregation functions, often called SQL OLAP functions, have been in SQLstream for a long time. Interestingly, SQLstream did not support the STDDEV() windowed aggregation function during the POC. Many SQL experts will know that STDDEV can easily be rewritten using a formula involving AVG. Our Chief Architect, Julian Hyde, was quick to “sweeten” the deal by adding the “syntactic sugar” necessary to support STDDEV natively.
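For readers who have not seen the rewrite, the usual identity is stddev = sqrt(avg(x * x) - avg(x) * avg(x)) for the population standard deviation. The Python sketch below checks it on a small made-up data set; it illustrates the identity only and is not necessarily the rewrite that was used during the POC.

```python
import math


def stddev_from_averages(values):
    """Population standard deviation expressed using only averages."""
    n = len(values)
    avg = sum(values) / n
    avg_of_squares = sum(v * v for v in values) / n
    # Guard against tiny negative results caused by floating-point rounding.
    variance = max(avg_of_squares - avg * avg, 0.0)
    return math.sqrt(variance)


# avg is 5 and the average of the squares is 29, so the variance is 4.
result = stddev_from_averages([2, 4, 4, 4, 5, 5, 7, 9])
```

In SQL terms, the same expression would combine two windowed AVG calls over the same window. For large values with small variance this form can lose precision to cancellation, which is one reason native STDDEV support is preferable.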

I am sure a lot of you readers have interesting ideas and questions. Please feel free to post them here and I will be happy to engage in conversation.

Posted under Streaming SQL

Vice President Marketing
June 2, 2010

Just back from the 2010 Intelligent Transportation Society of America’s Annual Meeting.  For those unfamiliar with intelligent transportation, I am not referring to the “shovel ready” projects that have been funded by President Obama as part of the economic stimulus package. These projects were designed to spend money and create jobs, thereby stimulating the economy. Unlike the federal “shovel ready” projects, “network ready” intelligent transportation technologies and projects are rapidly being adopted and implemented by local and state departments of transportation that must still operate under fixed or reduced budgets. These local and state DOTs are using new technologies to “Do More with Less.”

Intelligent Transportation aims to reduce costs, delays, pollution, injuries and deaths by connecting infrastructure control and monitoring systems to the network and enabling these systems, and their operators, to communicate in real-time. Some examples of intelligent transportation solutions and control systems include dynamic speed limits that change according to traffic and road conditions, stop lights that know when you can go and the FasTrak electronic toll system that reduces congestion on the Golden Gate Bridge and other Bay Area bridges. Real-time technology is essential if these dynamic control systems are to collect your toll at 45 miles per hour or detect when it is safe to proceed through an intersection.

All of these intelligent transportation systems and devices can be thought of as “sensors” on the network. The data is collected by the sensors, streamed to a server, analyzed and eventually stored in a warehouse. (Imagine the final scene from Raiders of The Lost Ark, except with crates full of hard drives). Meanwhile, the analytic results are communicated back to the original sources (stop lights, toll booths and electronic road information signs) as well as to the mobile devices in your vehicle.

In some cases, new intelligent transportation solutions need to be integrated with legacy systems. In other cases, they simply need to be able to talk to each other. Thus, it becomes imperative that all new intelligent transportation solutions be built on a set of common, open standards. In the long run, solutions built on open standards reduce the total costs to those who implement and maintain the solutions. Open standards, and in particular the global use of open data standards, are essential within the intelligent transportation industry, not just so that different sensors on the network and IT solutions can communicate with each other, but so that drivers can experience consistent and safe journeys as they cross from federal highways to state and local roads, always in contact with the intelligent transportation systems that control these roads.


Vice President Marketing
October 29, 2009

Gravity Bear and SQLstream today announced a partnership to bring cutting edge real-time analytic technologies to games for social networks. Established to create a new breed of social games, Gravity Bear is poised to create engaging, original content for social gaming platforms. The relationship marks a significant point of convergence for the interactive entertainment and the enterprise software industries.

SQLstream is the first company to provide real-time monitoring and business intelligence using the ISO standard SQL language. Bridging the gap between operational intelligence systems and data warehouses, SQLstream greatly reduces the time it takes for information to flow between products and content providers. This revolutionary system allows for real-time monitoring of virtual ecosystems and economies, placing Gravity Bear at the forefront of an all-new market opportunity that delivers game content to social networks faster and more efficiently than previously possible.

“Our strategic partnership with SQLstream will enable Gravity Bear to measure and understand how players are interacting with our games in real-time and respond faster than ever, delivering the online experience that players really want from social gaming,” said Phil Shenk, co-founder and CEO of Gravity Bear. “It’s a rapidly growing market that we only see expanding further. The casual games industry is constantly evolving and it is very exciting to be in a position to offer players something new that we believe could change the face of entertainment on community sites.”

“Our work with Gravity Bear is leading to an exciting new business opportunity in the social games market,” said Damian Black, President and CEO of SQLstream. “The technologies we will be integrating with Gravity Bear’s unique game design philosophy will help customize game content for the masses, increasing loyalty and resulting in a new kind of relationship between the player and content providers.”

The partnership between SQLstream and Gravity Bear arrives at a time when the social games market is experiencing unprecedented growth. Social platforms such as Facebook have become the birthplace of a new gaming movement with a proven audience of tens of millions. Gravity Bear will use the latest in cutting edge technology, such as SQLstream, to establish new entertainment products that evolve and develop with each player’s input and participation.

Gravity Bear is currently developing an original IP built to satisfy the company’s core mission of providing new entertainment products for an all-new era of social gaming.

The Gravity Bear and SQLstream teams will be attending the Virtual Goods Summit, held in San Francisco on October 29th and 30th, and are available for meetings and interviews. Additionally, members of both teams can be scheduled for joint press and analyst calls.

About Gravity Bear

Gravity Bear is a new breed of social games company where dedicated game creators share the goal of making unique social games through a steady diet of creativity, fun and contemporary design. Gravity Bear was founded in 2008 by co-founder and gaming industry veteran Phil Shenk to build a seasoned team of like-minded talent devoted to making casual games as distinct as the players themselves. Gravity Bear set up shop in sunny Emeryville, California, where they share office space with a tribe of equally motivated hamsters. For more information on Gravity Bear, please visit: gravitybear.com. Follow the Gravity Bear Blog at gravitybear.com/blog.

About SQLstream Inc.

SQLstream is making the real-time web possible, monitoring data streams to deliver instant alerts, analytics and answers to business decision makers. Using the industry standard SQL language, SQLstream executes queries on the wire, before data reaches the warehouse. Built on open source technologies, SQLstream enhances existing business intelligence solutions while maximizing the value received from your organization’s data when really urgent analytics are required. SQLstream’s investors and advisors include Bob Frankenberg, former CEO of Novell and current Board Member at National Semiconductor, Dick Watts, former member of Hewlett-Packard’s Executive Committee and Duane Zitzner, former HP Executive. SQLstream is headquartered in San Francisco, California and is on the web at www.sqlstream.com.

ONE PR Studio

Jeane Wong / Juan Castro, 510-893-3271
jeane@oneprstudio.com
juan@oneprstudio.com

SQLstream

877-571-5775
PR@sqlstream.com


Vice President Marketing
April 7, 2009

April 7, 2009. Perez Hilton tweets and then blogs that Lindsay Lohan was just spotted leaving Santa Monica Pier while drinking a Red Bull. Paparazzi swarm the promenade as curious onlookers follow Twitter from their iPhones. Tipped off by their new real-time BI solution, a cutting edge advertising executive receives a mobile alert that there is a spike in Twitter activity about a client. The agency shelves the usual morning Raisin Bran ads for an energy drink campaign, all in real-time. Sales volume surges as the agency establishes web advertising leadership by delivering unique value to their client. The morning ends with executives on their Blackberries discussing the rapid progress made toward KPIs and sales goals while drinking a…….

Fresh off the Pentaho Partner Summit, SQLstream, Pentaho and SQL Power announce a collaborative partnership to deliver actionable, affordable real-time business intelligence. This easily integrated solution will query, triage and analyze high volume data feeds such as Twitter, on the wire, then deliver it via sub-second updates into an executive business intelligence dashboard.

SQLstream’s real-time analytics engine bridges the gap between operational business systems and the data warehouse. Pentaho is the world’s most widely deployed open source business intelligence suite and is based on the open source Mondrian OLAP server. SQL Power is the premier business intelligence and data migration consultancy in Canada, specializing in the implementation of cost effective BI solutions.

“Committed to open-source and open standards, the SQLstream, Pentaho & SQL Power alliance will break down the barriers that have prevented business leaders from capitalizing on their data,” said Damian Black, CEO of SQLstream. “We do more than monitor ad campaigns in real-time. SQLstream enables continuous revenue optimization by enabling ad campaigns to be adjusted in real-time, while at the same time optimizing pricing, catching fraud, detecting service failure and recognizing customer disaffection. By processing the data continuously in real-time, the classic ELT/ETL bottlenecks are removed from the data warehouse.”

“Enterprises need more from their data warehouses but cannot afford to be locked into expensive and proprietary architectures,” said Lance Walter, Vice President of Marketing at Pentaho, the open source business intelligence leader. “Pentaho, SQLstream & SQL Power are dedicated to the open architectures that have become a non-negotiable requirement for enterprise software.”

“In this tough economic climate, now more than ever, companies need real time business intelligence in order to improve efficiencies and quickly react to changing market conditions. The integration of SQLstream’s real-time technology, Pentaho’s OLAP technology and SQL Power’s Wabit Dashboard functionality will deliver affordable, real time performance metrics to progressive organizations around the world,” said Sam Selim, President of SQL Power. “Together, SQL Power, Pentaho & SQLstream can help clients react to new information and fresh business opportunities in real time, thus improving performance on key metrics and positively impacting their bottom line.”

Both SQLstream and Pentaho executives will be speaking at the 2009 MySQL Conference and Expo, April 20th – 23rd in Santa Clara, California. On Tuesday at 11:55 a.m., Pentaho (booth #308) will explain “MySQL Data Warehousing.” On Wednesday at 10:50 a.m., SQLstream (booth #117) will present a technical session on “Eliminating MySQL Bottlenecks with Continuous ETL” as part of the Products & Services track. To see a demo of the Real-Time Twitter BI application, attend SQLstream’s session or visit SQLstream at booth #117 and have yourself a Red Bull.

About SQLstream, Inc.

SQLstream is making the real-time enterprise possible, enabling business decisions based on up to the minute information. Using the industry standard SQL language, SQLstream executes queries before data reaches the warehouse. Built on open source technologies, SQLstream enhances existing business intelligence solutions while maximizing the value received from your organization’s data warehouse when really urgent analytics are required. SQLstream’s investors and advisors include Bob Frankenberg, former CEO of Novell and current Board Member at National Semiconductor, Dick Watts, former member of Hewlett-Packard’s Executive Committee and Duane Zitzner, former HP Executive. SQLstream is headquartered in San Francisco, California and is on the web at www.sqlstream.com.

About Pentaho Corporation

Pentaho Corporation is the commercial open source alternative for Business Intelligence (BI). Pentaho BI Suite Enterprise Edition provides comprehensive reporting, OLAP analysis, dashboards, data integration, data mining and a BI platform that have made it the world’s leading and most widely deployed open source BI suite. Pentaho’s commercial open source business model eliminates software license fees, providing support, services, and product enhancements via an annual subscription. In the years since Pentaho’s inception as the pioneer in commercial open source BI, Pentaho’s products have been downloaded more than three million times, with production deployments at companies ranging from small organizations to The Global 2000. For more information, visit www.pentaho.com.

About SQL Power

Founded in 1988, SQL Power Group is a leading Business Intelligence software and consulting firm. Our proven methodology, highly skilled consultants and our use of state-of-the-art open source productivity tools have delivered Business Intelligence solutions of the highest quality to value-oriented clients. SQL Power’s dedication and ingenuity have established us as the premier Business Intelligence and Data Migration solution provider in Canada. For more information, visit www.sqlpower.ca.


Vice President Marketing
January 26, 2009

San Francisco–January 26th, 2009: SQLstream Inc. today announced the release of SQLstream 2.0, making really urgent analytics possible for enterprises that need to slash the time and costs to receive real-time information from business intelligence systems. SQLstream reduces the time required to turn data into information by analyzing live data on the wire using the industry standard SQL language and off the shelf adapters for capturing data from databases, applications and streaming web feeds such as Twitter, RSS and Atom.

Standards based, real-time integration reduces costs and eliminates latency, avoiding the slow and expensive staging and preprocessing of data arriving in the data warehouse. “SQLstream helps companies to do more with less,” said Damian Black, CEO of SQLstream. “SQLstream 2.0 enables offloading of the data warehouse to better manage high volumes of data. Companies can experience dramatic reductions in the time and costs for querying and processing data in real-time, getting immediate answers to business critical questions, while ensuring that the data warehouse remains accurate.”

IT managers can implement business requirements faster and without taking any systems down. It is now possible to utilize real-time analytics for account management, compliance monitoring, exception reporting and fraud detection at the point of transaction by adding SQLstream to your existing infrastructure.

“SQLstream complements existing business intelligence and data warehouse systems,” said Dave Henry, Vice President of Services at Pentaho, an acknowledged business intelligence leader. “SQLstream is dedicated to the open architectures that have become a non-negotiable requirement for enterprise software.”

Built on open source and open standards, SQLstream is a next generation SQL engine based on the open source Eigenbase Project for building data management systems, to which SQLstream is a core contributor. Frowning on the proprietary solutions of many CEP and ETL vendors, SQLstream uses the industry standard SQL language for data integration with existing systems.

Business and IT executives can now leverage the wide availability of affordable SQL talent to implement real-time business solutions. “What really matters is that projects are successful, deployed on time and can be maintained with existing skills,” said Damian Black. “With SQLstream 2.0, businesses will configure solutions to their own requirements using their own in-house skills as well as lower-cost, readily available SQL consultants.”

SQLstream 2.0 features include:

• User-defined analytics functions enabling developers to specify streaming data operations for inclusion into a standard SQL query.

• Enhanced support for aggregating streaming data: moving averages, totals and pattern matching of events over different time windows.

• Standard web feed adapters for Twitter, RSS and Atom queries.

• Full 64-bit client and server support.

• More details available at www.sqlstream.com/Products/products

About SQLstream

SQLstream is making the real-time enterprise possible, enabling business decisions based on up to the minute information. Using the industry standard SQL language, SQLstream executes queries before data reaches the warehouse. SQLstream’s investors and advisors include Bob Frankenberg, former CEO of Novell and current Board Member at National Semiconductor, Dick Watts, former member of Hewlett-Packard’s Executive Committee and Duane Zitzner, former HP Executive. SQLstream is headquartered in San Francisco, California and is on the web at http://www.sqlstream.com.

Contacts

For more information on SQLstream, please email pr@sqlstream.com.

Summary

SQLstream Inc. today announced SQLstream 2.0, which uses the SQL language to make real-time business intelligence and urgent analytics possible, slashing the time and cost of turning data into information.

 

Posted under Press Releases