<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SQLstream - Real-time Big Data &#187; Blog</title>
	<atom:link href="http://www.sqlstream.com/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.sqlstream.com</link>
	<description>Real-time Big Data integration with in-memory operational intelligence</description>
	<lastBuildDate>Thu, 17 May 2012 01:06:34 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>From Data to Knowledge with real-time streaming applications</title>
		<link>http://www.sqlstream.com/blog/2012/05/from-data-to-knowledge-with-real-time-streaming-applications/</link>
		<comments>http://www.sqlstream.com/blog/2012/05/from-data-to-knowledge-with-real-time-streaming-applications/#comments</comments>
		<pubDate>Thu, 10 May 2012 20:46:50 +0000</pubDate>
		<dc:creator>Marc Berkowitz</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Real-Time Analytics]]></category>
		<category><![CDATA[Real-time Industry Solutions]]></category>
		<category><![CDATA[Sensor Networks]]></category>
		<category><![CDATA[Streaming SQL]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Real-time]]></category>

		<guid isPermaLink="false">http://www.sqlstream.com/?p=2746</guid>
		<description><![CDATA[This week I&#8217;m attending an interesting conference at UC Berkeley called the &#8220;Berkeley conference on Streaming Data&#8221;.  The organizers are primarily astronomers and statisticians, but the talks discuss issues and solutions to streaming data problems across a wide selection of scientific areas and engineering applications.  Real-time streaming Big Data applications presented included oceanography, biology, genetics, reading&#160;<a class="readmore" href="http://www.sqlstream.com/blog/2012/05/from-data-to-knowledge-with-real-time-streaming-applications/">Read more &#8594;</a>]]></description>
			<content:encoded><![CDATA[<p>This week I&#8217;m attending <a href="http://lyra.berkeley.edu/CDIConf/" target="_blank">an interesting conference at UC Berkeley called the &#8220;Berkeley conference on Streaming Data&#8221;</a>.  The organizers are primarily astronomers and statisticians, but the talks discuss issues and solutions to streaming data problems across a wide selection of scientific areas and engineering applications.  Real-time streaming Big Data applications presented included oceanography, biology, genetics, reading handwriting, astrophysics, particle physics, recommendation engines for social media, and inevitably, real-time fraud detection from live data feeds.</p>
<p>I presented on a deployment of SQLstream as a <a href="http://www.slideshare.net/sqlstream/dynamic-scaling-realtimebigdata-12885213" target="_blank">Dynamically Scalable Cloud Platform for the Real-Time Detection of Seismic Events</a>. Based on work with UCSD seismologists, SQLstream has been deployed to detect significant events in data collected from a large grid of seismic sensors. A large-scale data infrastructure (the OOI/CI) provides raw signal data over an AMQP message bus.</p>
<div id="attachment_447" class="wp-caption alignleft" style="width: 310px"><a href="http://www.sqlstream.com/wp-content/uploads/2011/07/Seismic_Events_Plot.png"><img class="size-medium wp-image-447" title="Seismic Events" src="http://www.sqlstream.com/wp-content/uploads/2011/07/Seismic_Events_Plot-300x185.png" alt="" width="300" height="185" /></a><p class="wp-caption-text">Plot of Seismic Events</p></div>
<p>SQLstream monitors live seismic data feeds in real-time, applying heuristic algorithms that look for patterns indicating earthquakes. The live system scales dynamically across multiple servers in a cloud environment based on the current demand. <a href="http://www.slideshare.net/sqlstream/dynamic-scaling-realtimebigdata-12885213" target="_blank">You can view the presentation here</a>.  I also blogged previously on the application <a href="http://www.sqlstream.com/blog/2011/07/real-time-seismic-monitoring-in-the-cloud-with-sqlstream/">here</a>.</p>
<p>In conclusion, I have two main observations from the conference so far (it continues until Friday). The first is that the majority of fields in science and technology appear to have a Big Data problem, and often a real-time Big Data problem.  The second is the extent of the innovation and computer science resources dedicated to solving these problems, in particular, for this conference, developing algorithms for data analysis and machine learning (that is, automatic pattern recognition) that work on streams of flowing data.  It&#8217;s clear that traditional data management and even Big Data batch-based methods don&#8217;t work when you need continuous results from dynamic data. And the amount of data is huge.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sqlstream.com/blog/2012/05/from-data-to-knowledge-with-real-time-streaming-applications/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Text Analytics, and real-time Big Data</title>
		<link>http://www.sqlstream.com/blog/2012/04/text-analytics-and-real-time-big-data/</link>
		<comments>http://www.sqlstream.com/blog/2012/04/text-analytics-and-real-time-big-data/#comments</comments>
		<pubDate>Wed, 25 Apr 2012 22:12:40 +0000</pubDate>
		<dc:creator>Ronnie Beggs</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Real-time Industry Solutions]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Real-time]]></category>
		<category><![CDATA[Stream Reasoning]]></category>
		<category><![CDATA[Text Analytics]]></category>
		<category><![CDATA[unstructured data]]></category>

		<guid isPermaLink="false">http://www.sqlstream.com/?p=2716</guid>
		<description><![CDATA[The Text Analytics Summit in London this week was an opportunity to catch up on the latest trends and state of the Text Analytics market.  An interesting couple of days with a few themes emerging. Firstly, Big Data.  Not entirely unexpected, but almost every presentation referred to Big Data in some shape or form.  In&#160;<a class="readmore" href="http://www.sqlstream.com/blog/2012/04/text-analytics-and-real-time-big-data/">Read more &#8594;</a>]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.textanalyticsnews.com/text-mining-conference-europe/">Text Analytics Summit in London</a> this week was an opportunity to catch up on the latest trends and state of the Text Analytics market.  An interesting couple of days with a few themes emerging.</p>
<p>Firstly, Big Data.  Not entirely unexpected, but almost every presentation referred to Big Data in some shape or form.  In part this was referring to the volume of data to be processed, but primarily in the context of databases for the storage and processing of unstructured data of any volume.</p>
<p>Although not discussed explicitly, there&#8217;s obviously a search for business models that work.  Most applications were B2B platforms, sold as a package of product, services and consultancy, enabling organizations to better mine text data for market and competitor intelligence.  However, some were seeking to monetize through subscriptions to information feeds.</p>
<div id="__ss_12676368" style="width: 340px;"><strong style="display: block; margin: 12px 0 4px;"><a title="Sqlstream Real-time Text Analytics" href="http://www.slideshare.net/sqlstream/sqlstream-realtime-text-analytics" target="_blank">Sqlstream Real-time Text Analytics</a></strong> <object id="__sse12676368" width="340" height="284" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="wmode" value="transparent" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sqlstreamtextanalytics2012-realtimetextanalytics-120424172839-phpapp01&amp;rel=0&amp;stripped_title=sqlstream-realtime-text-analytics&amp;userName=sqlstream" /><param name="allowscriptaccess" value="always" /><param name="allowfullscreen" value="true" /><embed id="__sse12676368" width="340" height="284" type="application/x-shockwave-flash" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=sqlstreamtextanalytics2012-realtimetextanalytics-120424172839-phpapp01&amp;rel=0&amp;stripped_title=sqlstream-realtime-text-analytics&amp;userName=sqlstream" allowFullScreen="true" allowScriptAccess="always" wmode="transparent" allowscriptaccess="always" allowfullscreen="true" /> </object></div>
<p>For SQLstream, we presented on the use of real-time text analytics for improving incident detection and prediction.  In particular, the use of real-time Twitter and text messages for identifying Quality of Experience issues with IP content services, but also the use of Twitter for improving real-time incident detection in transportation networks.  And in line with the rest of the conference, we did our bit for Big Data, describing how real-time streaming integration and analytics can be built on unstructured data analytics as an integrated real-time Big Data and Hadoop platform.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sqlstream.com/blog/2012/04/text-analytics-and-real-time-big-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Joining real-time structured and unstructured data feeds</title>
		<link>http://www.sqlstream.com/blog/2012/04/joining-real-time-structured-and-unstructured-data-feeds/</link>
		<comments>http://www.sqlstream.com/blog/2012/04/joining-real-time-structured-and-unstructured-data-feeds/#comments</comments>
		<pubDate>Thu, 19 Apr 2012 21:09:33 +0000</pubDate>
		<dc:creator>Ronnie Beggs</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[Real-Time Analytics]]></category>
		<category><![CDATA[Stream Computing Blog]]></category>
		<category><![CDATA[GATE]]></category>
		<category><![CDATA[Real-time]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Stream Reasoning]]></category>
		<category><![CDATA[Text Analytics]]></category>

		<guid isPermaLink="false">http://www.sqlstream.com/?p=2683</guid>
		<description><![CDATA[Joining real-time structured and unstructured data feeds for better accuracy and reliability from your operational intelligence, and the Text Analytics Summit, 2012, London. Three IT trends have emerged over the past year – Big Data, real-time and the importance of unstructured data. Taking the latter first, there is an increasing awareness that much of the&#160;<a class="readmore" href="http://www.sqlstream.com/blog/2012/04/joining-real-time-structured-and-unstructured-data-feeds/">Read more &#8594;</a>]]></description>
			<content:encoded><![CDATA[<p><strong>Joining real-time structured and unstructured data feeds for better accuracy and reliability from your operational intelligence, and the <a href="http://www.textanalyticsnews.com/text-mining-conference-europe/">Text Analytics Summit, 2012, London</a>.<a href="http://www.textanalyticsnews.com/text-mining-conference-europe/"><img class="alignright size-full wp-image-2167" title="Text Analytics News" src="http://www.sqlstream.com/wp-content/uploads/2011/12/structuredata.jpeg" alt="" width="136" height="39" /></a></strong></p>
<p>Three IT trends have emerged over the past year – Big Data, real-time and the importance of unstructured data. Taking the latter first, there is an increasing awareness that much of the data we have available to us today is unstructured (<a href="http://www.cloudera.com/blog/2011/06/if-80-of-data-is-unstructured-is-it-the-exception-or-a-new-rule/" target="_blank">Cloudera amongst the many claiming 80% of all data is unstructured</a>).  Unstructured data includes text messages, documents, tweets, emails and video content. There’s also a growing industry for tools and software that perform unstructured data analytics – primarily text analytics using semantic modeling, tagging and subsequent analysis.</p>
<p>The past year has also seen Big Data and Hadoop emerge from the rarefied atmosphere of California’s Silicon Valley into mainstream IT.  Driven by statistics such as <a href="http://m.bbc.co.uk/news/business-17682304" target="_blank">90% of all data available today has been generated in the past two years</a>, Big Data as a functional area for primarily unstructured data is here to stay, and is effectively supercomputing lite for the masses.</p>
<p><strong>The need for real-time streaming data management</strong></p>
<p>However, the real-time trend is less well served today by either Hadoop or by the currently available tools and software for unstructured data analytics. Real-time is about the need for immediate detection and response – turning data sources into live data feeds, and processing the data on the fly, then loading batch based distributed platforms such as Hadoop as an output data stream.</p>
<p><strong>‘Stream Reasoning’</strong></p>
<p>I’ve also seen the term ‘stream reasoning’ used to describe the real-time processing of unstructured data, although this is still an area that is less well developed and understood than the more mainstream text analytics from stored data.  ‘Stream reasoning’ is the ability to process and respond to semantic knowledge about tweets, messages and other social media interaction in real-time, on the fly. The diagram below illustrates how a semantic modeling library has been plugged into a real-time streaming pipeline in SQLstream – the example is based on SQLstream’s GATE UDX but any library with reasonable performance and a query response API can be plugged in.</p>
<p><a href="http://www.sqlstream.com/wp-content/uploads/2012/04/GATE-4.png"><img class="aligncenter size-medium wp-image-2686" title="Combining streaming structured and unstructured live data feeds" src="http://www.sqlstream.com/wp-content/uploads/2012/04/GATE-4-300x183.png" alt="Combining streaming structured and unstructured live data feeds" width="300" height="183" /></a></p>
<p>Unstructured data feeds, such as text messages and tweets, are streamed through the semantic tagging UDX and library, with the output of this stage being real-time streams of semantic tagged data.  The data can then be analyzed and frequency charted in real-time.</p>
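<p>The tagging and frequency-counting stage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword table <code>TAG_KEYWORDS</code> and the functions <code>tag_message</code> and <code>tag_frequencies</code> are hypothetical stand-ins, and a real pipeline would call a semantic library such as GATE through the UDX rather than match keywords.</p>

```python
from collections import Counter

# Hypothetical keyword table standing in for a real semantic tagging library.
TAG_KEYWORDS = {
    "congestion": ("jam", "gridlock", "queue"),
    "accident": ("crash", "collision"),
}


def tag_message(text):
    """Return the sorted list of tags whose keywords appear in the message."""
    words = set(text.lower().split())
    return sorted(tag for tag, keys in TAG_KEYWORDS.items()
                  if words.intersection(keys))


def tag_frequencies(messages):
    """Yield a running tag count after each arriving message."""
    counts = Counter()
    for text in messages:
        counts.update(tag_message(text))
        # Emit a snapshot so a chart could be refreshed per message.
        yield dict(counts)


feed = [
    "Huge jam on the M25",
    "Crash near junction 10 causing gridlock",
    "Lovely day",
]
snapshots = list(tag_frequencies(feed))
```

<p>Each snapshot is the tag frequency table as it stands after one more message, which is the shape of data a real-time frequency chart would consume.</p>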
<p><strong>Text Analytics Summit, 2012, London</strong></p>
<p>I&#8217;ll be speaking on this topic at the <a href="http://www.textanalyticsnews.com/text-mining-conference-europe/" target="_blank">Text Analytics Summit, 2012, London</a>.  I&#8217;ll be discussing how to combine stream reasoning (admittedly, mostly Twitter messages) with structured data, with the objective of improving the overall accuracy and reliability of the resulting operational intelligence.  I&#8217;ll be using a couple of examples: customer experience management for IP content services such as VoIP and VoD, and improving the accuracy and reliability of traffic congestion and travel time information, showing how text analysis of tweets and messages can help to pinpoint the severity of road network traffic problems.</p>
<p>Look forward to seeing you there, or if you can’t make it, I’ll be blogging on the highlights next week.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sqlstream.com/blog/2012/04/joining-real-time-structured-and-unstructured-data-feeds/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Structure Data, New York, and streaming Big Data for Hadoop</title>
		<link>http://www.sqlstream.com/blog/2012/04/structure-data-new-york-and-streaming-big-data-for-hadoop/</link>
		<comments>http://www.sqlstream.com/blog/2012/04/structure-data-new-york-and-streaming-big-data-for-hadoop/#comments</comments>
		<pubDate>Mon, 02 Apr 2012 22:10:32 +0000</pubDate>
		<dc:creator>Ronnie Beggs</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[In the News]]></category>
		<category><![CDATA[HADOOP]]></category>
		<category><![CDATA[Real-time]]></category>
		<category><![CDATA[Streaming SQL]]></category>

		<guid isPermaLink="false">http://www.sqlstream.com/?p=2482</guid>
		<description><![CDATA[Last week SQLstream sponsored and CEO Damian Black presented at Structure Data in New York, a conference exploring “the technical and business opportunities spurred by the growth of big data”. It’s clear that Big Data has moved on considerably in a very short space of time. From the Silicon Valley, 101 world of Java developers&#160;<a class="readmore" href="http://www.sqlstream.com/blog/2012/04/structure-data-new-york-and-streaming-big-data-for-hadoop/">Read more &#8594;</a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.sqlstream.com/wp-content/uploads/2012/03/logo_structuredata_sm.png"><img class="alignright size-full wp-image-2445" title="logo_structuredata_sm" src="http://www.sqlstream.com/wp-content/uploads/2012/03/logo_structuredata_sm.png" alt="" width="150" height="43" /></a>Last week SQLstream sponsored and CEO Damian Black presented at <a title="Structure Data 2012, New York" href="http://event.gigaom.com/structuredata/">Structure Data in New York</a>, a conference exploring “the technical and business opportunities spurred by the growth of big data”.</p>
<p>It’s clear that Big Data has moved on considerably in a very short space of time, from the Silicon Valley, 101 world of Java developers and Hadoop into the mainstream wider business world (but still with Hadoop!).</p>
<p>Some themes emerging from the conference:</p>
<ul>
<li>The basic need to deliver <strong>high performance, massively scalable computing infrastructure</strong> as data volumes grow exponentially. It’s clear that the pain from structured and unstructured data is driving different approaches at different stages in the data management lifecycle – better visualizations, better cleansing and filtering, and a better understanding of the appropriate analytics tools that are most applicable at each stage.</li>
<li><strong>The emergence of the SQL layer</strong>. It’s clear Hadoop has its strengths and is here to stay. It’s effectively ‘supercomputing lite’ and given today’s data volumes, is just the tool for the job. However, there are a couple of trends emerging. First, is it actually necessary to store all the data, when much of it is obviously not of interest? Second, once the initial analysis of both structured and unstructured data is achieved, there’s an emerging layer above Hadoop that’s looking very structured.  Both these functions are looking much more SQL-like.</li>
<li><strong>Real-time, low latency analytics</strong>. Hadoop is not, nor does it claim to be, a low latency, real-time data management platform. There is a well-defined business need to analyze log file, sensor and network data in real-time (sub-second to a few minutes latency), but also to stream the arriving data through to Hadoop for further analysis. Obviously this layer needs to be as scalable as, if not more so than, the underlying Hadoop platform.</li>
</ul>
<p><a title="Damian Black, Structure Data 2012" href="http://www.livestream.com/gigaombigdata/video?clipId=pla_a59ab460-f515-4dab-b513-63a913c78ee8">Damian’s presentation</a> at Structure Data focused on relational streaming &#8211; massive-scale parallel data processing using SQL, generating real-time results from streaming input data. The talk described relational streaming as a standalone real-time management layer, and also SQLstream integrated with Hadoop as the streaming layer in the Big Data stack (<a title="Damian Black, Structure Data 2012" href="http://gigaom.com/2012/03/22/sqlstream-structure-data-2012/">you can also read the GigaOM report on the presentation here</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sqlstream.com/blog/2012/04/structure-data-new-york-and-streaming-big-data-for-hadoop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to query Big Data quickly: Stream those queries!</title>
		<link>http://www.sqlstream.com/blog/2012/03/how-to-query-big-data-quickly-stream-those-queries/</link>
		<comments>http://www.sqlstream.com/blog/2012/03/how-to-query-big-data-quickly-stream-those-queries/#comments</comments>
		<pubDate>Mon, 26 Mar 2012 23:16:12 +0000</pubDate>
		<dc:creator>Ronnie Beggs</dc:creator>
				<category><![CDATA[In the News]]></category>

		<guid isPermaLink="false">http://www.sqlstream.com/?p=2466</guid>
		<description><![CDATA[GigaOM reports on Damian Black, SQLstream CEO, talking about streaming Big Data at Structure Data, New York, March 21 &#8211; 22. In the talk entitled &#8220;Streaming Big Data: Millions of events per second&#8221;, Damian discussed the similarities between Hadoop Map/Reduce and the parallel, distributed architecture used for streaming data processing in SQLstream. Click here to read&#160;<a class="readmore" href="http://www.sqlstream.com/blog/2012/03/how-to-query-big-data-quickly-stream-those-queries/">Read more &#8594;</a>]]></description>
			<content:encoded><![CDATA[<p>GigaOM reports on Damian Black, SQLstream CEO, talking about streaming Big Data at Structure Data, New York, March 21 &#8211; 22. In the talk entitled &#8220;Streaming Big Data: Millions of events per second&#8221;, Damian discussed the similarities between Hadoop Map/Reduce and the parallel, distributed architecture used for streaming data processing in SQLstream.</p>
<p><a href="http://gigaom.com/2012/03/22/sqlstream-structure-data-2012/">Click here to read the GigaOM report.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.sqlstream.com/blog/2012/03/how-to-query-big-data-quickly-stream-those-queries/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Real-time Big Data analytics and millions of events per second</title>
		<link>http://www.sqlstream.com/blog/2012/03/real-time-big-data-analytics-and-millions-of-events-per-second/</link>
		<comments>http://www.sqlstream.com/blog/2012/03/real-time-big-data-analytics-and-millions-of-events-per-second/#comments</comments>
		<pubDate>Tue, 20 Mar 2012 17:18:03 +0000</pubDate>
		<dc:creator>Ronnie Beggs</dc:creator>
				<category><![CDATA[Big Data]]></category>
		<category><![CDATA[Real-Time Analytics]]></category>
		<category><![CDATA[Streaming SQL]]></category>

		<guid isPermaLink="false">http://www.sqlstream.com/?p=2438</guid>
		<description><![CDATA[Big Data is here to stay. The breadth of the term Big Data may change as it becomes as much a marketing imperative as the &#8216;Cloud&#8217; word, but the requirement for &#8216;supercomputing lite&#8217; processing for the non-supercomputing world of enterprise data is a must have. The rise of Big Data has happened in parallel with&#160;<a class="readmore" href="http://www.sqlstream.com/blog/2012/03/real-time-big-data-analytics-and-millions-of-events-per-second/">Read more &#8594;</a>]]></description>
			<content:encoded><![CDATA[<div id="attachment_2334" class="wp-caption alignleft" style="width: 160px"><a href="http://www.sqlstream.com/"><img class="size-thumbnail wp-image-2334 " title="SQLstream Big Data" src="http://www.sqlstream.com/wp-content/uploads/2012/03/sql_blue_elephant-300x214-150x150.jpg" alt="" width="150" height="150" /></a><p class="wp-caption-text">Visit our new website to find out more about real-time Big Data applications</p></div>
<p>Big Data is here to stay. The breadth of the term Big Data may change as it becomes as much a marketing imperative as the &#8216;Cloud&#8217; word, but the requirement for &#8216;supercomputing lite&#8217; processing for the non-supercomputing world of enterprise data is a must have.</p>
<p>The rise of Big Data has happened in parallel with the emergence of real-time operational intelligence, and the extension of real-time analytics into the world of real-time updates and process control. Much of the recent interest has focussed on how these two worlds merge into a single complementary solution.</p>
<p>The NoSQL Big Data platforms offer massively scalable, resilient data processing over commodity hardware, and are ideally suited to scaling large data problems over hundreds or thousands of servers. However, platforms such as Hadoop do not support, nor were they designed to support, real-time streaming data processing and analytics. Their forte is the batch-based, highly scalable, store-compute loop of map/reduce.</p>
<p>That&#8217;s where SQLstream comes in. SQLstream collects and conditions real-time updates from sources such as log files, sensor networks and GPS events, and both integrates streaming data into and from Big Data stores and generates real-time analytics from the data as they stream past. The SQLstream architecture also has parallels to that of map/reduce. SQLstream uses Relational Streaming, which is a paradigm for processing streaming Big Data tuples using standard SQL queries. SQL offers strong potential for automatic optimization and distributed parallel processing of streaming data. Whereas platforms such as Hadoop execute batch queries over stored tuples, SQLstream and Relational Streaming execute continuous queries over arriving data.</p>
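<p>To make the batch versus continuous distinction concrete, here is a small Python sketch. It is not SQLstream code or streaming SQL syntax; the function names and the count-based window of three rows are assumptions chosen purely for illustration.</p>

```python
from collections import deque


def batch_average(stored_rows):
    """Batch style: all rows already exist, and one answer is returned."""
    values = [row["value"] for row in stored_rows]
    return sum(values) / len(values)


def continuous_average(arriving_rows, window=3):
    """Continuous style: emit an updated average for every arriving row,
    computed over the most recent `window` rows (an assumed window size)."""
    recent = deque(maxlen=window)
    for row in arriving_rows:
        recent.append(row["value"])
        yield sum(recent) / len(recent)


rows = [{"value": v} for v in (10, 20, 30, 40)]

# One result, available only once the whole data set has been stored.
batch_result = batch_average(rows)

# One result per arriving row; the input iterator could be unbounded.
stream_results = list(continuous_average(iter(rows)))
```

<p>The batch function needs the complete data set before it can answer, whereas the continuous one produces a fresh result as each row arrives and never needs to see the end of its input.</p>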
<p><a href="http://event.gigaom.com/structuredata/"><img class="alignleft size-full wp-image-2445" title="Structure Data 2012" src="http://www.sqlstream.com/wp-content/uploads/2012/03/logo_structuredata_sm.png" alt="" width="150" height="43" /></a>We&#8217;re also at <a href="http://event.gigaom.com/structuredata/">Structure Data</a> this week in New York, where our CEO, Damian Black, will be presenting on the wider area of streaming Big Data and massive scalability. If you are attending, visit us for a demo of the &#8216;millions of events per second&#8217; program, and a demonstration of massively parallel stream processing on an Elastic Compute Cloud.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sqlstream.com/blog/2012/03/real-time-big-data-analytics-and-millions-of-events-per-second/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Parallel scheduling of stream execution in SQLstream</title>
		<link>http://www.sqlstream.com/blog/2012/03/parallel-scheduling-of-stream-execution-in-sqlstream/</link>
		<comments>http://www.sqlstream.com/blog/2012/03/parallel-scheduling-of-stream-execution-in-sqlstream/#comments</comments>
		<pubDate>Tue, 13 Mar 2012 06:37:10 +0000</pubDate>
		<dc:creator>Marc Berkowitz</dc:creator>
				<category><![CDATA[SQLstream Tutorials]]></category>
		<category><![CDATA[Streaming SQL]]></category>
		<category><![CDATA[open standards]]></category>
		<category><![CDATA[stream computing]]></category>

		<guid isPermaLink="false">http://www.sqlstream.com/blog/?p=905</guid>
		<description><![CDATA[SQL is a declarative language &#8211; a SQL query is a specification for the result, it’s neither a recipe nor a program to produce the results. A traditional relational database query returns a set of rows, the ResultSet. A streaming SQL query in SQLstream returns a stream of rows. That is, the ResultSet may never&#160;<a class="readmore" href="http://www.sqlstream.com/blog/2012/03/parallel-scheduling-of-stream-execution-in-sqlstream/">Read more &#8594;</a>]]></description>
			<content:encoded><![CDATA[<p>SQL is a declarative language &#8211; a SQL query is a specification for the result, it’s neither a recipe nor a program to produce the results. A traditional relational database query returns a set of rows, the ResultSet. A streaming SQL query in SQLstream returns a stream of rows. That is, the ResultSet may never end. In a traditional relational database query, all the rows are fetched, and the ResultSet scans them. With a relational streaming query, the result rows do not exist as yet &#8211; as time goes by they come into existence as arriving data are processed.</p>
<p>However, just like any relational database, the SQLstream stream computing Server has two main components:</p>
<ol>
<li>The query engine, or planner, calculates the most efficient plan to produce the requested results – this is the query plan.</li>
<li>The data engine, or kernel, executes the query plan to produce the results. The scheduler controls the execution process.</li>
</ol>
<h3>Streaming dataflow graphs</h3>
<p>Executing a query means computing the results from the inputs. For a streaming query that means processing the rows in the input streams as they arrive. The execution is organized as a dataflow graph, that is, a mathematical directed graph of nodes and edges, where the nodes represent elementary operations on data, and the edges into and out of a node represent the input and output data streams. In effect, it is an assembly line that produces results, where the nodes are the machines or stages on the line.</p>
<p>A query plan defines one of these data flow graphs. The executor runs the data through the graph: it is responsible for executing the nodes, where each node consumes rows arriving on input edges and produces rows on the output edges. Of course each output edge is often the input of a downstream node.</p>
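<p>A dataflow graph of this kind can be sketched in Python as follows. The <code>Node</code> class, the node names and the seismic-style sample rows are all hypothetical and are not SQLstream internals; the sketch only shows nodes consuming rows on input edges and pushing results along output edges.</p>

```python
class Node:
    """One elementary operation in a dataflow graph (illustrative only)."""

    def __init__(self, name, operation):
        self.name = name
        self.operation = operation   # function: one row in, list of rows out
        self.downstream = []         # output edges to consuming nodes

    def connect(self, other):
        """Add an edge from this node's output to another node's input."""
        self.downstream.append(other)
        return other

    def push(self, row):
        """Consume one input row and push each result row downstream."""
        for out in self.operation(row):
            for node in self.downstream:
                node.push(out)


results = []


def collect(row):
    """Terminal operation: record the row and emit nothing further."""
    results.append(row)
    return []


# A three-stage assembly line: filter, then transform, then collect.
keep_strong = Node("keep_strong",
                   lambda r: [r] if r["magnitude"] >= 3.0 else [])
to_label = Node("to_label",
                lambda r: [f'{r["station"]}:{r["magnitude"]}'])
sink = Node("sink", collect)

keep_strong.connect(to_label).connect(sink)

for row in [{"station": "A", "magnitude": 2.1},
            {"station": "B", "magnitude": 4.5},
            {"station": "A", "magnitude": 3.0}]:
    keep_strong.push(row)
```

<p>Because each operation returns a list of output rows, a node can drop a row (empty list), pass it on, or fan it out, which is enough to express filters and simple transformations.</p>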
<h3>Multiple, connected query plans</h3>
<p>In a traditional static database, each query plan is independent and transitory, and operates against persistent tables and indexes. In a relational streaming platform, the query plans last forever and are interconnected. Although streams are used just like tables in SQL, they are not persistent; in fact they have no contents at all, and can be thought of as rendezvous points that accept input rows and pass them on to their output consumers.</p>
<p>Now if the data flow graph were a physical system &#8212; say, a collection of transparent plastic straws with colored water flowing through them &#8212; then all the processing would be happening simultaneously. However, for the software abstraction of the streaming straw pipelines, it&#8217;s not practical or necessary to run all the nodes at the same time. It is the scheduler that manages this network of interconnected query plans, and when and how to execute each node.</p>
<p>In a traditional, static database, the result of a query is a set of rows that are computable all at once. The executor can give good performance by running one node at a time, pushing batches of rows through the graph. A streaming database is different. When the inputs are streams of recent events, arriving in real time, it&#8217;s important to produce the outputs fast enough so that the result rows are timely.</p>
<p>The execution works by pushing outputs, not by pulling inputs, and it means executing several nodes at the same time, whenever possible. This requires a finer management of the execution objects and the ability to schedule parallel execution on the nodes.</p>
<h3>Parallel scheduling of stream execution</h3>
<p>In SQLstream, multiple, interconnected query plans are being executed at the same time. Together they constitute a large dataflow graph in which each node is a mini data processing machine that performs a simple operation on its input data, and passes its output to the next node.</p>
<p>The scheduler is responsible for managing the interconnected dataflow graph. It keeps track of the status of each node: at any moment, some are running, some are ready to run, some are waiting for more input, some are waiting for their output to be consumed. Each node is allowed to run for a quota of time, and where possible nodes are selected to execute in parallel as separate threads.</p>
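<p>The bookkeeping described above can be illustrated with a toy scheduler in Python. This is a simplified sketch and not the SQLstream scheduler: the quota is counted in rows rather than time so that the behavior is deterministic, everything runs on a single thread instead of in parallel, only two of the node states are modeled, and the names <code>SchedNode</code> and <code>schedule</code> are invented for the example.</p>

```python
from collections import deque


class SchedNode:
    """A node with an input queue; `work` maps one row to one output row."""

    def __init__(self, name, work, downstream=None):
        self.name = name
        self.work = work
        self.downstream = downstream
        self.inbox = deque()
        self.outbox = []

    def status(self):
        """Report whether the node has input to process."""
        return "ready" if self.inbox else "waiting_for_input"

    def run(self, quota):
        """Process at most `quota` rows, then hand control back."""
        done = 0
        while self.inbox and quota > done:
            out = self.work(self.inbox.popleft())
            if self.downstream is not None:
                self.downstream.inbox.append(out)
            else:
                self.outbox.append(out)
            done += 1
        return done


def schedule(nodes, quota=2):
    """Round-robin over ready nodes until every node is waiting for input."""
    trace = []
    while any(n.status() == "ready" for n in nodes):
        for node in nodes:
            if node.status() == "ready":
                trace.append((node.name, node.run(quota)))
    return trace


double = SchedNode("double", lambda x: x * 2)
inc = SchedNode("inc", lambda x: x + 1, downstream=double)
inc.inbox.extend([1, 2, 3])
trace = schedule([inc, double])
```

<p>The trace shows the two nodes taking turns, each limited to its quota, so that rows reach the end of the graph before the upstream node has finished all of its input.</p>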
<p>The SQLstream scheduler may not want to be fair: some branches in the graph (some streams) may be more important, some may need high throughput, some may need lower latency. The application designer decides, and the SQLstream scheduler delivers.</p>
<h3><strong>Next time …</strong></h3>
<p>This is the first in a series of blog posts discussing both the principles and practical examples of parallel stream execution. The next post in the series will look at some real-world examples, and at how parallel execution is essential to meeting both high-throughput and low-latency requirements.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sqlstream.com/blog/2012/03/parallel-scheduling-of-stream-execution-in-sqlstream/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reading Streaming Data from Java &#8211; Part 1: The Three Rules of Streaming Data</title>
		<link>http://www.sqlstream.com/blog/2012/02/reading-streaming-data-from-java-part-1-the-three-rules-of-streaming-data/</link>
		<comments>http://www.sqlstream.com/blog/2012/02/reading-streaming-data-from-java-part-1-the-three-rules-of-streaming-data/#comments</comments>
		<pubDate>Fri, 17 Feb 2012 13:46:30 +0000</pubDate>
		<dc:creator>Richard Nelson</dc:creator>
				<category><![CDATA[SQLstream Tutorials]]></category>
		<category><![CDATA[Streaming SQL]]></category>
		<category><![CDATA[JDBC]]></category>

		<guid isPermaLink="false">http://www.sqlstream.com/blog/?p=895</guid>
		<description><![CDATA[One of the great advantages of SQLstream as an analytical platform is that it uses the most popular, standardized language for data analysis, SQL. SQLstream worked to make only the minimum number of extensions to SQL necessary to encompass the streaming data paradigm, so that most streaming SQL pipelines look almost indistinguishable from SQL for&#160;<a class="readmore" href="http://www.sqlstream.com/blog/2012/02/reading-streaming-data-from-java-part-1-the-three-rules-of-streaming-data/">Read more &#8594;</a>]]></description>
			<content:encoded><![CDATA[<p>One of the great advantages of SQLstream as an analytical platform is that it uses the most popular, standardized language for data analysis, SQL. SQLstream worked to make only the minimum number of extensions to SQL necessary to encompass the streaming data paradigm, so that most streaming SQL pipelines look almost indistinguishable from SQL for reading static relational data. This enables data analysts to leverage virtually all of their existing SQL skills in the streaming context.</p>
<p>Similarly, SQLstream felt it was important to make the streaming environment feel familiar and productive to application developers as well, so SQLstream supports the standard JDBC interface for using streams, again with just the minimum extensions necessary to encompass streams.</p>
<p>This post assumes a basic familiarity with JDBC and its main components: connections, statements, and result sets. First we&#8217;ll look at these in their usual tabular context, then see what it takes to extend the model to streaming data. All the data items come from the <code>SALES</code> example schema that comes with SQLstream.</p>
<h2>Reading JDBC Data</h2>
<p>In a database world, the pattern for reading data is quite standardized: Connect to the database, execute a query, and read each row that comes back until there are no more rows. For example,</p>
<pre><code>
// Requires: import java.sql.*;
// try-with-resources closes rs, s, and c even if an exception is thrown
try (Connection c = getConnection();
     Statement s = c.createStatement();
     ResultSet rs = s.executeQuery("SELECT * FROM SALES.EMPS")) {
     while (rs.next()) {
           System.out.println(rs.getString("NAME") + " " + rs.getString("EMPID"));
     }
} catch (SQLException se) {
     se.printStackTrace();
}
</code></pre>
<p>This simple example loops through the entire EMPS table, printing the name and employee ID number for each row in the table, then finishes. The key for this model is that the result set is finite. Even if this were a very large table, the loop would eventually process every row. So you can handle reading the data from a table as a monolithic step in a sequence of procedures.</p>
<h2>The Challenge of Streams</h2>
<p>In a streaming data environment, however, you have to change a couple of your basic assumptions. In particular, I have found three Rules of Streaming that dictate how to write client code:</p>
<ol>
<li>There always might be more data.</li>
<li>You never know when the next row might arrive.</li>
<li>The rate of the rows matters.</li>
</ol>
<p>The same pattern shown earlier for reading finite tables <em>will</em> work for streams,  as long as you don&#8217;t expect your application to do anything else. For a streaming ResultSet, the <code>next()</code> method only returns false when the stream closes. In many applications, that might never happen, or at least might not happen for weeks or months or longer, so clearly you cannot simply wait for each stream to end.</p>
<p>This is particularly critical in an application such as SQLstream Studio, where a developer needs to be able to edit the definitions of objects and at the same time be able to watch data flowing in existing streams. These streaming data views – known as Inspect windows – have to function in an event-driven, multitasking environment. There can be a virtually unlimited number of them active at any given time, along with editors, console views, and other dynamic content. So nothing should block the updates of other views. And for a little added complexity, Studio also needs to be able to handle non-streaming items as well, so preferably the same code should handle either tables or streams.</p>
<p>Studio also has to deal with a number of other requirements that are fairly unique to the development environment, such as how to manage updating a human-readable window of maybe 20 or 30 rows at a time from a stream that might be flowing at many thousands of rows per second. For now, we will just focus on the tasks common to handling streams in any application environment.</p>
<h2>Reading in the Background</h2>
<p>As with any long-running task, the solution involves partitioning the work into threads. Because of Rule #1 (“There always might be more data.”), you should just assume that reading from a stream needs to happen in a background thread. There is essentially never a scenario in which you want to wait for an entire stream to be read before proceeding to the next task.</p>
<p>To some extent, how you implement your background stream-handling tasks will depend on the environment you are working in. Variations in the data rate and the required responsiveness of the application might lead you to make different choices.</p>
<p>One good rule of thumb is to do as little processing as possible in the stream-reading loop. Slow processing of incoming rows can result in “back-pressure” in the data pipeline. As a result, it&#8217;s best to read rows from the ResultSet as expeditiously as possible, handing off the data to other threads for processing.</p>
<p>Here, for example, is a simple thread to read rows from the <code>SALES.BIDS</code> stream:</p>
<pre><code>
class BidReader extends Thread
{
      @Override
      public void run()
      {
            // try-with-resources closes rs, s, and c when the loop exits
            try (Connection c = getConnection();
                 Statement s = c.createStatement();
                 ResultSet rs = s.executeQuery("SELECT STREAM * FROM SALES.BIDS")) {
                 while (!interrupted() &amp;&amp; rs.next()) {
                       // read columns and put into work queue for processing thread
                 }
            } catch (SQLException se) {
                 se.printStackTrace();
            }
      }
}
</code></pre>
<p>Note that the loop doesn&#8217;t depend solely on the ResultSet&#8217;s <code>next()</code> method, but also tests whether the thread has been interrupted. You probably wouldn&#8217;t use this exact mechanism (I prefer to override <code>Thread.interrupt()</code> and set a boolean flag), but it shows you succinctly that you need to be aware of more things than just whether there is more data to read.</p>
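<p>For what it&#8217;s worth, the flag-based variant might look something like the sketch below. The class name <code>FlaggedReader</code>, the <code>stopRequested</code> field and the <code>demo()</code> helper are made up for illustration, and a short <code>Thread.sleep()</code> stands in for the blocking read, so treat it as one possible shape rather than a recipe. The point of the flag is that the stop request survives even after something has cleared the thread&#8217;s interrupt status.</p>

```java
// Hypothetical sketch of a reader thread that stops on an explicit flag.
class FlaggedReader extends Thread {
    // volatile so the change made by the interrupting thread is visible here.
    private volatile boolean stopRequested = false;

    @Override
    public void interrupt() {
        stopRequested = true; // remember the request even if the interrupt status is cleared
        super.interrupt();    // still wake the thread if it is sleeping or waiting
    }

    @Override
    public void run() {
        while (!stopRequested) {
            try {
                Thread.sleep(10); // stand-in for a blocking read of the next row
            } catch (InterruptedException e) {
                // The interrupt status is cleared at this point, but the flag is still set,
                // so the loop condition ends the thread.
            }
        }
    }

    // Starts a reader, asks it to stop, and reports whether it exited.
    static boolean demo() {
        FlaggedReader reader = new FlaggedReader();
        reader.start();
        reader.interrupt();
        try {
            reader.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !reader.isAlive();
    }
}
```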
<p>Another thing to keep in mind is that <code>ResultSet.next()</code> blocks until either more data arrives or the stream closes. That&#8217;s yet another reason to have this happening in a background thread.</p>
<p>You&#8217;ll notice that the code snippet references a “processing thread.” That&#8217;s because of Rule #2 (“You never know when the next row might arrive.”). It&#8217;s generally good to decouple the reading code from the processing code. One easy model is to have the reader read each row, then add the row to a shared queue structure, where the processing thread can pick up the rows and process them asynchronously. This minimizes the chances of a slow processing step causing the reader to slow down and potentially push back up the stream. To the extent possible, you&#8217;d like your reader to be sitting and waiting for the next row when it arrives, rather than constantly trying to handle a backlog of incoming rows. If you have to have a backlog, you want it to be in your application, not in the stream server, because a backlog in the server will slow everyone down, including other applications trying to read from the same streams.</p>
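<p>Here is a minimal sketch of that reader-and-queue model using the standard <code>java.util.concurrent.BlockingQueue</code>. To keep it runnable without a server, an <code>Iterator</code> of strings stands in for the streaming ResultSet, and the &#8220;processing&#8221; is just an upper-casing step. The class name <code>RowHandoff</code> and the use of an empty <code>Optional</code> as an end-of-stream marker are my own assumptions for the example.</p>

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: a reader thread hands rows to a processing thread
// through a shared queue. An Iterator stands in for the streaming ResultSet.
class RowHandoff {
    static List<String> run(Iterator<String> rows) {
        // Unbounded queue: any backlog builds up here, in the application.
        BlockingQueue<Optional<String>> queue = new LinkedBlockingQueue<>();
        List<String> processed = new ArrayList<>();

        Thread reader = new Thread(() -> {
            // Do as little as possible here: read a row, enqueue it, repeat.
            while (!Thread.currentThread().isInterrupted() && rows.hasNext()) {
                queue.add(Optional.of(rows.next()));
            }
            queue.add(Optional.empty()); // end-of-stream marker
        });

        Thread processor = new Thread(() -> {
            try {
                // take() blocks until the reader has enqueued something.
                for (Optional<String> row = queue.take(); row.isPresent(); row = queue.take()) {
                    processed.add(row.get().toUpperCase()); // stand-in for slower work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        processor.start();
        try {
            reader.join();
            processor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed;
    }
}
```

<p>An unbounded queue keeps the reader from ever waiting on the processor, at the cost of memory if the processor falls far behind. Whether that trade-off is acceptable depends on your data rates.</p>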
<p>Such decoupling doesn&#8217;t really make sense in a pure database application, since there will be a finite number of rows to process, so your goal is to minimize total processing time. If there are 1,000 rows, it doesn&#8217;t matter whether you read them all in one second, then spend an hour processing, or read them over the period of an hour, processing as you go. The total amount of data read and processed is the same.</p>
<p>In the streaming world, it is important to read the data as quickly as possible in order to maximize the overall throughput of the system. That&#8217;s Rule #3 (“The rate of the rows matters.”) coming into play. The data throughput of a given stream is gated by its slowest reader. Part of the contract your client needs to fulfill is to not bog down the system for other clients.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sqlstream.com/blog/2012/02/reading-streaming-data-from-java-part-1-the-three-rules-of-streaming-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Could real-time intelligence be the catalyst for industrial innovation?</title>
		<link>http://www.sqlstream.com/blog/2012/02/could-real-time-intelligence-be-the-catalyst-for-industrial-innovation/</link>
		<comments>http://www.sqlstream.com/blog/2012/02/could-real-time-intelligence-be-the-catalyst-for-industrial-innovation/#comments</comments>
		<pubDate>Thu, 02 Feb 2012 01:10:34 +0000</pubDate>
		<dc:creator>Nick Carruthers</dc:creator>
				<category><![CDATA[Real-time Industry Solutions]]></category>
		<category><![CDATA[Sensor Networks]]></category>
		<category><![CDATA[industrial automation]]></category>
		<category><![CDATA[Real-Time Analytics]]></category>

		<guid isPermaLink="false">http://www.sqlstream.com/blog/?p=881</guid>
		<description><![CDATA[Today’s new world economy has manufacturers racing toward opportunities requiring growth through expansion and increased productivity while pricing remains flat. The increase in fuel, energy, raw materials and labor prices are offsetting scientific and technological advances applied to modern factory machinery, processes and the workforce. Manufacturing automation technology solutions offer manufacturers monitoring and alerting applications&#160;<a class="readmore" href="http://www.sqlstream.com/blog/2012/02/could-real-time-intelligence-be-the-catalyst-for-industrial-innovation/">Read more &#8594;</a>]]></description>
			<content:encoded><![CDATA[<p>Today’s new world economy has manufacturers racing toward opportunities requiring growth through expansion and increased productivity while pricing remains flat. Increases in fuel, energy, raw material and labor prices are offsetting scientific and technological advances applied to modern factory machinery, processes and the workforce.</p>
<p>Manufacturing automation technology solutions offer manufacturers monitoring and alerting applications that improve plant manager oversight of and response to quality, consistency and cost issues. While top 100 computer and software companies offer solutions in this space, finding an offering with a realistic positive ROI is daunting, as many require huge investments in overhauling or replacing entire systems.</p>
<p><a href="http://www.sqlstream.com/wp-content/uploads/2012/02/pwc1.png"><img class=" wp-image-892   alignright" title="Innovation insights from the chemicals industry" src="http://www.sqlstream.com/wp-content/uploads/2012/02/pwc1.png" alt="Innovation insights from the chemicals industry" width="425" height="254" /></a></p>
<p>“<em>More than any other sector, the chemicals industry is investing heavily in innovation to garner a competitive edge. Ninety-two percent of CEOs in this industry believe that innovation will lead to operational efficiencies and competitive advantage, 13 percent more than all CEOs surveyed.</em>” By Tom Craren, PricewaterhouseCoopers, LLC. (See Figure)</p>
<p>Today&#8217;s IT systems are exorbitant purchases requiring a long-term commitment and a finite vision of volume and quality. Unfortunately, these solution sets quickly become static when margins shrink and volumes must increase to continue operating in the black. Competitive solutions in this economic environment must show nearly immediate returns on investment by increasing output and improving quality. This requires a lighter and more powerful system that has the following traits.</p>
<ul>
<li>Unlimited scalability.</li>
<li>Seamless integration with current systems.</li>
<li>Low-cost, fast deployment.</li>
</ul>
<p>Plant managers and engineers should consider a lightweight approach to their efficiency shortfalls rather than the hefty out-of-the-box system overhaul, which may give a pretty picture but not the tailored, in-depth analysis and alerting needed.</p>
<p>Envision a real time layer over the systems currently in place: a real time data engine that stands alone, aggregating unlimited amounts of disparate data, analyzing it “on the fly” without a database, and delivering it to any device in any format for real time machine-to-machine and human response.</p>
<p>The real time data engine would also have unlimited scalability, creating an ever growing solutions platform using standard database querying language. The flexibility and power of this automation platform allow for continuous upgrading of machinery and flow processes while simultaneously integrating dated systems and disparate devices from diverse manufacturers.</p>
<p>Practical benefits of a real time data engine will include the following:</p>
<ul>
<li>Real time big data processing and operational intelligence “on the fly”.</li>
<li>Real time data enhancement “on the fly”.</li>
<li>Real time historical comparatives and complex predictive analysis.</li>
</ul>
<p>Real time operational decisions made in time by machines and humans will reduce downtime, improve quality and increase output.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sqlstream.com/blog/2012/02/could-real-time-intelligence-be-the-catalyst-for-industrial-innovation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Real-time QoS Monitoring for IP Services</title>
		<link>http://www.sqlstream.com/blog/2012/01/real-time-qos-monitoring-for-ip-services/</link>
		<comments>http://www.sqlstream.com/blog/2012/01/real-time-qos-monitoring-for-ip-services/#comments</comments>
		<pubDate>Thu, 26 Jan 2012 20:51:38 +0000</pubDate>
		<dc:creator>Ronnie Beggs</dc:creator>
				<category><![CDATA[Real-time Industry Solutions]]></category>
		<category><![CDATA[Sensor Networks]]></category>
		<category><![CDATA[Streaming SQL]]></category>

		<guid isPermaLink="false">http://www.sqlstream.com/blog/?p=816</guid>
		<description><![CDATA[QoS and service level monitoring has always presented a challenge for telecommunications companies. With the increase in uptake of IP voice and video services, the vast data volumes generated, and the lack of an end to end view, make monitoring the service experience in real-time increasingly difficult. In this blog I&#8217;m looking at the core&#160;<a class="readmore" href="http://www.sqlstream.com/blog/2012/01/real-time-qos-monitoring-for-ip-services/">Read more &#8594;</a>]]></description>
			<content:encoded><![CDATA[<p>QoS and service level monitoring has always presented a challenge for telecommunications companies. The increase in uptake of IP voice and video services, the vast data volumes generated, and the lack of an end-to-end view make monitoring the service experience in real time increasingly difficult.</p>
<p><a href="http://www.sqlstream.com/wp-content/uploads/2012/01/Slide1.jpg"><img class=" wp-image-825  alignleft" title="Real-time QoS Monitoring" src="http://www.sqlstream.com/wp-content/uploads/2012/01/Slide1.jpg" alt="Real-time QoS Monitoring" width="462" height="337" /></a></p>
<p>In this blog I&#8217;m looking at the core building blocks of a real-time IP service monitoring solution by using a much simplified view of a real-time application. Diagram 1 illustrates the basic problem &#8211; how to monitor an IP service when the end-to-end view is only possible by piecing together large volumes of events from many different sources &#8211; the core network provider&#8217;s network, the home network and the cable modem, and the service provider&#8217;s platforms.</p>
<p>In SQLstream we capture each event stream in real time. Applications are built as streaming pipelines &#8211; unlike a traditional database solution, where event data must first be stored and then processed, SQLstream streams the data through processing views, capturing, combining, filtering, aggregating and applying analytics to the event streams, without having to store the data. This enables real-time operational intelligence at extremely high volumes and with very low latency.</p>
<p>The first views in the pipeline capture the data streams. The declaration below defines an external data feed, the real-time <code>MyEvent</code> stream, where <code>source of events</code> is the external system agent or integration adapter.</p>
<pre><code>CREATE OR REPLACE STREAM MyEvent
( "eventName" VARCHAR(10),
  "eventSeq" BIGINT,
  "eventVal1" INTEGER,
  "eventVal2" SMALLINT,
  "eventVal3" BIGINT )
DESCRIPTION 'source of events';</code></pre>
<p>The raw <code>MyEvent</code> data is first filtered, searching for the events of interest. As illustrated in the code example below, these initial views tend to be as simple as possible in order to maximize reuse &#8211; the simplest being a <code>SELECT STREAM * FROM WHERE</code> statement. Streams can be combined, grouped or joined in a single view, or a single view provided per stream, or both.</p>
<pre><code>CREATE OR REPLACE VIEW RawEvents
    AS SELECT STREAM * FROM MyEvent
WHERE "eventName" = 'RawEvent';</code></pre>
<p>Diagram 2 illustrates the concept of the streaming data pipeline, using a simplified example for exception detection. The SQL view illustrated above is the Stream Capture #1 view in the diagram. The use case is built on a real world example, raising an exception if a number of events of a particular type or value are detected within a specified time window.</p>
<p><a href="http://www.sqlstream.com/wp-content/uploads/2012/01/NetworkDiagram-3.jpeg.png"><img class=" wp-image-835   alignleft" title="Real-time Stream Processing Pipeline" src="http://www.sqlstream.com/wp-content/uploads/2012/01/NetworkDiagram-3.jpeg.png" alt="Real-time Stream Processing Pipeline" width="418" height="70" /></a></p>
<p>The second view in the pipeline, Stream Processor #1, is shown below. In this example the view is responsible for the basic processing of the stream, counting the number of events that occur within a time window, in this case 180 seconds.</p>
<pre><code>CREATE OR REPLACE VIEW CountedEvents AS
SELECT STREAM *,
   COUNT("eventName") OVER win AS "eventCount",
   FIRST_VALUE(RE.ROWTIME)
          OVER win AS "firstEventTime",
   FIRST_VALUE("eventSeq")
          OVER win AS "firstEventSeq"
FROM RawEvents AS RE
WINDOW win AS (RANGE INTERVAL '180' SECOND(3) PRECEDING);</code></pre>
<p>The final stage in this particular processing pipeline is the detection of the alert.</p>
<pre><code>CREATE OR REPLACE VIEW FlagTriggerEvents
    AS SELECT STREAM *,
    "eventCount" &gt;= 3 AS "alert"
FROM CountedEvents;</code></pre>
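<p>For readers who think more readily in procedural code, the logic of the last two views can be approximated in a few lines of Java. This is only a sketch of the idea, not how SQLstream evaluates the window: the class <code>WindowAlert</code> and its method are hypothetical, timestamps are plain milliseconds supplied by the caller in non-decreasing order, and treating the 180-second boundary as inclusive is an assumption.</p>

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: counts events in a trailing 180-second window and
// flags an alert once the count reaches 3, mirroring the two views above.
class WindowAlert {
    static final long WINDOW_MILLIS = 180_000L;
    static final int THRESHOLD = 3;

    // Timestamps of the events currently inside the window, oldest first.
    private final Deque<Long> times = new ArrayDeque<>();

    // Records one event (timestamps must be non-decreasing) and returns
    // true when the window now holds THRESHOLD or more events.
    boolean onEvent(long timeMillis) {
        times.addLast(timeMillis);
        // Evict events that fell out of the window; an event exactly
        // WINDOW_MILLIS old is kept (inclusive boundary, an assumption).
        while (times.peekFirst() < timeMillis - WINDOW_MILLIS) {
            times.removeFirst();
        }
        return times.size() >= THRESHOLD;
    }
}
```

<p>Feeding it events at 0, 60 and 120 seconds raises the alert on the third event, and an event arriving several minutes later starts again from a count of one.</p>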
<p>It would of course be possible to include all processing in a single view. However, maximizing reuse of views is a major consideration when building a stream processing application. The example illustrates how a pipeline can be constructed in which each view can have any number of consumers. For example, any number of Rule views can read from the Stream Processor #1 view, and any number of views can read directly from the stream capture view.</p>
<p>The application includes significantly more sophisticated integrations, features and analytics than illustrated here. For example:</p>
<ul>
<li>Multiple rules</li>
<li>Recording and forwarding the events responsible for the generation of the alerts</li>
<li>Detect escalation</li>
<li>Detect clearance events</li>
<li>Join with alert history to identify exceptional events that deviate significantly from historical norms</li>
</ul>
<p>These use cases are important components of a complete solution, and I&#8217;ll be providing examples in subsequent blogs, explaining how these have been implemented.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.sqlstream.com/blog/2012/01/real-time-qos-monitoring-for-ip-services/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

