Real-time QoS Monitoring for IP Services

QoS and service level monitoring has always presented a challenge for telecommunications companies. With the increase in uptake of IP voice and video services, the vast data volumes generated, and the lack of an end to end view, make monitoring the service experience in real-time increasingly difficult.

Real-time QoS Monitoring

Diagram 1: Real-time QoS Monitoring

In this blog I’m looking at the core building blocks of a real-time IP service monitoring solution by using a much simplified view of a real-time application. Diagram 1 illustrates the basic problem – how to monitor an IP service when the end to end view is only possible by piecing together large volumes of events from many different sources – the core network provider’s network, the home network and the cable modem, and the service providers platforms.

In SQLstream we capture each event stream in real-time. Applications are built as streaming pipelines – unlike a traditional database solution, where event data must first be stored and then processed, SQLstream streams the data through processing views, capturing, combining, filtering, aggregating and applying analytics to the events streams, without having to store the data.  This enables real-time operational intelligence with extremely high volume performance with very low latency.

The first views in the pipeline capture the data streams. A declaration for an external data feed is shown below, the real-time MyEvent stream, where source of events is the external system agent or integration adapter.


   CREATE OR REPLACE STREAM MyEvent (
       "eventName"    VARCHAR(10),
       "eventSeq"     BIGINT,
       "eventVal1"    INTEGER,
       "eventVal2"    SMALLINT,
       "eventVal3"    BIGINT
   ) DESCRIPTION 'source of events'; 

The raw MyEvent data is first filtered, searching for the events of interest. As illustrated in the code example below, these initial views tend to be as simple as possible in order to maximize reuse – the simplest being a SELECT STREAM * FROM WHERE statement. Streams can be combined, grouped or joined in a single view, or a single view provided per stream, or both.


   CREATE OR REPLACE VIEW RawEvents AS
        SELECT STREAM *
        FROM MyEvent
        WHERE "eventName" = 'RawEvent';

Diagram 2 illustrates the concept of the streaming data pipeline, using a simplified example for exception detection. The SQL view illustrated above is the Stream Capture #1 view in the diagram. The use case is built on a real world example, raising an exception if a number of events of a particular type or value are detected within a specified time window.

Real-time Stream Processing Pipeline

Real-time Stream Processing Pipeline

The second view in the pipeline, Stream Processor #1, is shown below. In this example the view is responsible for the basic processing of the stream, counting the number of events that occur within a time window, in this case 180 seconds.


  CREATE OR REPLACE VIEW CountedEvents AS
   SELECT STREAM
      *,
      COUNT("eventName") OVER win AS "eventCount",
      FIRST_VALUE(RE.ROWTIME) OVER win AS "firstEventTime",
      FIRST_VALUE("eventSeq") OVER win AS "firstEventSeq"
      FROM RawEvents AS RE
   WINDOW win AS (RANGE INTERVAL '180' SECOND(3) PRECEDING);

The final stage in this particular processing pipeline is the detection of the alert.

  CREATE OR REPLACE VIEW FlagTriggerEvents AS
      SELECT STREAM *,
          "eventCount" >= 3 AS "alert"
      FROM CountedEvents;

It would of course be possible to include all processing in a single view. However, maximizing reuse of views is a major consideration when building a stream processing application. The example is to illustrate how a pipeline can be constructed, where each view can have any number of consumers. For example, any number of Rule views can read from the Stream Processor #1 view, and any number of views can read directly from the stream capture view.

The application includes significantly more sophisticated integrations, features and analytics than illustrated here. For example:

  • Multiple rules
  • Recording and forwarding the events responsible for the generation of the alerts
  • Detect escalation
  • Detect clearance events
  • Join with alert history to identify exceptional events that deviate significantly from historical norms

These use cases are important components of a complete solution, and I’ll be providing examples in subsequent blogs, explaining how these have been implemented.

 

Streaming Big Data – a major trend for 2012

For Big Data, 2012 has started where 2011 left off, with a plethora of reports, articles and blogs. Interestingly, most still begin with the question “what is Big Data”. It appears ‘Big Data’ as a market is broadening its footprint far beyond its open source and Hadoop origins. My favourite new term in this quest for delineation is “Small Big Data”. (Isn’t that just “Data”?)

The most interesting trend for us is streaming Big Data processing and analytics. Edd Dumbill, O’Reilly Radar, talks about this as one of the “Five big data predictions for 2012”, “Hadoop’s batch-oriented processing is sufficient for many use cases, especially where the frequency of data reporting doesn’t need to be up-to-the-minute. However, batch processing isn’t always adequate, particularly when serving online needs such as mobile and web clients, or markets with real-time changing conditions such as finance and advertising.”

The real-time use case is an obvious one. If you need to respond or be warned in real-time or near real-time, for example, security breaches or a service impacting event on a VoIP or video call, the high initial latency of batch oriented data stores such as Hadoop is not sufficient.

However, there is also an emerging discussion on the storage of Big Data for big data’s sake. This is the blind collection and storage of data without due consideration as to how it’s going to be used. Dan Woods talks about this in his recent Forbes article “Curing the Big Data Storage Fetish”. The data will never create value without analysis, and little thought has been given to increasing analytics capacity.

There are many vendors emerging for the historical analysis of Big Data repositories, either on the Hadoop platform, or on platforms from the other large scale data warehouse vendors. However, there are very few vendors in streaming Big Data analytics space, and even fewer products with the maturity, flexibility and scalability to process Big Data streams in real-time.

Streaming Big Data analytics needs to address two areas.  First, the obvious use case, monitoring across all input data streams for business exceptions in real-time. This is a given.  But perhaps more importantly, much of the data held in Big Data repositiories is of little or no business value, and will never end up in a management report. Sensor networks, IP telecommunications networks, even data center log file processing – all examples where a vast amount of ‘business as usual’ data is generated. It’s therefore important to understand what’s being stored, and only persist what’s important (which admittedly, in some cases, may be everything).  For many applications, streaming data can be filtered and aggregated prior to storing, significantly reducing the Big Data burden, and significantly enhancing the business value of the stored data.  At least until we understand why we’re trying to store everything.

 

Public and private sector innovation an emerging theme at ITS California

ITS California AGM and Exhibition, 2011
Event report from ITS California,  Annual General Meeting in Long Beach, Nov 13 – 15, 2011.

ITS California’s AGM is SQLstream’s local intelligent transportation event, and an opportunity for public and private sector companies to discuss those issues specific to the state of California. The event has grown in size significantly over the past few years, with new organizations attending, all keen to contribute and discuss different perspectives on the various transportation problems. ITS-CA was established in 1994 as a not-for-profit organization with the remit to foster the adoption of ITS technology across the state. Its funding is a mix of public and private industry.

SQLstream setting up the booth at ITS California

SQLstream setting up the booth at ITS California

California has some acute transportation issues. For example, in Los Angeles County, the number of new residents is set to increase traffic on already congested roadways by an estimated 39 percent. Roadway expansion in this period is set to increase by only 3 percent, resulting in congestion levels rising by more than 200 percent in the next 25 years. Obviously addressing these issues ha been the focus of the Californian transportation industry as a whole, but ITS-CA serves as a focus point once per year to discuss in a wider forum.

The theme of this year’s event therefore (not unsurprisingly) was “Discovering Keys to the Next Decade”, with session focused on:

  • Support for connected vehicle deployments.
  • Transportation systems efficiency, for example reliable travel time and pricing.
  • Analyzing the success of traffic light systems and how close we are to the ‘continuous green’ vision.
  • Improving traveller information systems (of specific interest to SQLstream).
  • The effectiveness of schemes such as car sharing.

Although there’s always an over-arching theme or direction that emerges, and in this case it was the increasing presence of the private sector, and how private sector innovation is helping to drive the adoption of new technology and approaches. This was a common topic at the ITC-CA in Berkley last year, and has continued to grow. It was also a trend that was raised frequently at the recent ITS World Congress in Orlando. Long may it continue.

 

Tutorial: Streaming applications: Geospatial Visualization – Part 3

The Tutorial blog series helps SQLstream developers build streaming SQL applications. This blog is the third and final part of the Geospatial Visualization tutorial. The first blog in the series set out the streaming use case for connecting SQLstream to a Google Earth visualization, and described the initial steps required to capture the data and create a display list using Rails. The second part of this tutorial presented the core of the application – how to render the display list. And in this the concluding part of the visualization tutorial, the final key element is discussed – how to get the data flowing.

Getting the Data Flowing

The last step is to tie SQLstream and Postgres together. First we need to give SQLstream the credentials to access Postgres. Create a new file:

$QUAKES/quake.postgres.properties

1 URI=jdbc:postgresql://localhost/quakekml_development
2 DRIVER=org.postgresql.Driver
3 CONNPARAMPREFIX=dbConn_
4 dbConn_databaseName=quakekml_development
5 dbConn_user=USER
6 dbConn_password=PASSWORD
7 dbConn_applicationName=SQLstream TableReader Adapter

On lines 5 and 6 insert the Postgres user and password you set up (same as in $QUAKES/quakekml/config/database.yml). Create a directory under $SQLSTREAM_HOME/plugin called jndi (if it’s not there already), copy your quake.postgres.properties file there and restart the SQLstream server.

linux> cd $SQLSTREAM_HOME/plugin
linux> mkdir jndi
linux> cp $QUAKES/quake.postgres.properties jndi/
linux> cd $SQLSTREAM_HOME
linux> bin/sqlstreamd

If you haven’t loaded webfeed.sql and usgs.sql as described in the beginning of this post, you should do so now. Now we need to write the SQL that bridges the databases:

$QUAKES/viz.sql

 1 SET SCHEMA '"WebFeed"';
 2 
 3 CREATE OR REPLACE FOREIGN DATA WRAPPER "TableUpdate"
 4     LIBRARY 'class com.sqlstream.plugin.tableupdate.TableUpdateStreamControlPlugin'
 5 ++++LANGUAGE java
 6 ++++DESCRIPTION 'adapter for doing insert/update/merge/delete to an external database';
 7 
 8 --
 9 -- Create SQL/MED foreign server and foreign stream for database output
10 --
11 CREATE OR REPLACE SERVER "Postgres_TableUpdate"
12 FOREIGN DATA WRAPPER "TableUpdate"
13 OPTIONS (
14 ++++connParams 'quake.postgres',
15 ++++sqlDialect 'Postgres 8.x',
16 ++++pollingMillis '5000',
17 ++++commitCount '1000',
18 ++++commitMillis '2000')
19 DESCRIPTION 'Postgres database with visualization results';
20 
21 --
22 -- Display list for quake events
23 --
24 CREATE OR REPLACE FOREIGN STREAM "QuakeEventsDB" (
25 ++++"SQLS_opcode" CHAR(2) NOT NULL,
26 ++++"SQLS_chg" VARBINARY(32),
27 ++++"id" INTEGER options("insert" 'skip', "update" 'skip'),
28 ++++"when" TIMESTAMP,
29 ++++"lat" DOUBLE,
30 ++++"lon" DOUBLE,
31 ++++"mag" DOUBLE,
32 ++++"created_at" TIMESTAMP,
33 ++++"updated_at" TIMESTAMP
34     )
35 ++++SERVER "Postgres_TableUpdate"
36 ++++OPTIONS (
37 ++++++++TYPE 'tableUpdates',
38 ++++++++MASTER 'true',
39 ++++++++updatesTable 'quake_events')
40 ++++DESCRIPTION 'table updated with quake events';
41 
42 --
43 -- Pump quake events into display list
44 --
45 CREATE OR REPLACE PUMP "1000-QuakeEventsPump" STOPPED
46 DESCRIPTION 'pump from "SmallQuakesDay" view to "QuakeEventsDB" foreign stream' AS
47 ++++INSERT INTO "QuakeEventsDB" (
48 ++++++++"SQLS_opcode", "when", "lat", "lon", "mag",
49 ++++++++"created_at", "updated_at")
50 ++++SELECT STREAM 'IN',
51 ++++++++q.ROWTIME,
52 ++++++++CAST(SUBSTRING("point", 1, POSITION(' ' IN "point") - 1) AS DOUBLE),
53 ++++++++CAST(SUBSTRING("point", POSITION(' ' IN "point") + 1) AS DOUBLE),
54 ++++++++"mag",
55 ++++++++CURRENT_TIMESTAMP, CURRENT_ROW_TIMESTAMP
56 ++++FROM "SmallQuakesDay" as q;

The first 20 lines set up the table updater (note the reference to our properties file on line 14). We then create a foreign stream to describe the Postgres table quake_events. When Rails created the table, it added an auto-incrementing id field and two timestamps, created_at and updated_at. The options specified on line 27 cause SQLstream to ignore the id column and let Postgres maintain it. At line 45 we describe the pump that reads from SmallQuakesDay and inserts into the stream we defined above. In the select clause we use the ‘IN’ opcode to indicate that we’re inserting and the ROWTIME of the quake record to set the ‘when’ column in the display list. We parse the ‘point’ column the USGS provides, which is in the format "lat<space>lon", on lines 52 and 53. To set the Rails timestamps correctly we use CURRENT_TIMESTAMP to get the creation time and CURRENT_ROW_TIMESTAMP to get the time of this update. Load this file into SQLstream with:

linux> sqllineClient < viz.sql

Now all that’s left to do is start the pump. Create this one-line file:

$QUAKE/start-pump.sql

1 ALTER PUMP "1000-QuakeEventsPump" START;

and start the pump:

linux> sqllineClient < start-pump.sql

Data should now be flowing from the USGS web service, through SQLstream, into Postgres, and rendered to Google Earth from Postgres via Rails. To verify that data is flowing, you can use the ‘Edit quake events’ in your web app. When you follow the Google Earth link, you should see pins scattered across the globe indicating the past day’s earthquakes, and updated every 60 seconds. You’ll probably have to zoom out a bit to see them, unless San Francisco is having another bad day. This is what I saw when I ran it:

And finally

That concludes the streaming data visualization tutorial. In case you’ve missed the earlier posts, the complete series can be found using these links:

++++++++Part 1: Capture the data and create the display list

++++++++Part 2: Rendering the display list

++++++++Part 3: Making it work with flowing data

Please contact us if you have any questions.

 

Tutorial: Streaming applications: Geospatial Visualization – Part 2

The Tutorial blog series helps SQLstream developers build streaming SQL applications. This blog is the second in the Geospatial Visualization tutorial.  The first blog in the series set out the streaming use case for connecting SQLstream to a Google Earth visualization, and described the initial steps required to capture the data and create a display list using Rails.  In the second part of this tutorial, we’re going to discuss the meat of the application –  how to render the display list.

Rendering the Display List

To keep Google Earth continuously updated with the data flowing from SQLstream we’ll have to serve two different KML files: one will contain a KML Placemark for each quake, and the other gives the URL of the quakes feed and tells GE to continuously refresh it (in KML, this is the NetworkLink). We’re going to be serving compressed KML to cut down on the transmission time, so we’ll need the rubyzip gem we installed earlier included in our web app. Stop the server, go to $QUAKE/quakekml, and edit the file “Gemfile” to add this line to the end of the file:

gem 'rubyzip'

then issue these commands:

linux> bundle
linux> bundle package
linux> rails generate controller home index
linux> rails generate controller quakes start feed

The first two commands bundle up the application, including the new gem. The third command creates a Rails controller for our home page, while the last command creates a controller with two actions, one for each of our services. If you restart the server now, you can view these services at http://localhost:3000/home/index, http://localhost:3000/quakes/start and http://localhost:3000/quakes/feed. Edit $QUAKE/quakekml/config/routes.rb and add this line after the “get” commands to make home/index the home page for the web app:

root :to => "home#index"

You’ll also have to remove the file $QUAKE/quakekml/public/index.html. Restart the server and visit http://localhost:3000, you should now see the home#index default page rather than Rails’ startup page. Note that it shows you the name of the template file for this page, relative to the $QUAKE/quakekml directory. Edit $QUAKE/quakekml/app/views/home/index.html.erb to create the content for your landing page, at some point in the body add this line to create a link to the start service:

<%= link_to 'Earthquake events (open in Google Earth)',
+++++:controller => 'quakes', :action => 'start' %>

You should also add this link to the scaffolding for quake events:

<%= link_to 'Edit quake events', quake_events_path %>

You shouldn’t have to restart the server, just refresh http://localhost:3000 to see the changes.

The Ruby code that implements our services is in $QUAKE/quakekml/app/controllers, starter code has already been written by the “rails generate” commands we’ve been issuing. The ancestor of all of our controller classes is in application_controller.rb, we’ll add a method for setting up parameters available in any request (in this case, only one, the path to the server) and two methods for sending text so that the browser recognizes it as KML or KMZ:

$QUAKE/quakekml/app/controllers/application_controller.rb

 1 require 'zip/zip'
 2
 3 class ApplicationController < ActionController::Base
 4+++++protect_from_forgery
 5
 6+++++# Set up the common params that have to be computed
 7+++++# after the request is received (can't go in
 8+++++# initialize). These are available in all views.
 9+++++#
10+++++def setup_common
11+++++++++@path = request.host + ':' + request.port.to_s
12+++++end
13
14+++++# Output the kmz, given the kml
15+++++#
16+++++def send_kmz(kml)
17+++++++++t = Tempfile.new("zipout-#{request.remote_ip}")
18+++++++++Zip::ZipOutputStream.open(t.path) do |zos|
19+++++++++++++zos.put_next_entry("sqlstream.kml")
20+++++++++++++zos.print kml
21+++++++++end
22
23+++++++++send_file t.path,
24+++++++++++++:type => "application/vnd.google-earth.kmz",
25+++++++++++++:filename => "sqlstream.kmz"
26
27+++++++++t.close
28+++++end
29
30+++++# Output the kml directly
31+++++#
32+++++def send_kml(kml)
33+++++++++render :text => kml,
34+++++++++++++:layout => false,
35+++++++++++++:content_type => "application/vnd.google-earth.kml+xml"
36+++++end
37+end

Next we edit quakes_controller.rb to write the methods that respond to the quakes/start and quakes/feed requests. Each method uses Rails’ template support to render a KML template, with an option set to prevent it from being laid out like an HTML page. The start method sets an instance variable, @refresh, to the number of seconds we want to wait before refreshes. The feed method uses Rails’ database support to store all of the quake event rows in @quakes. These instance variables are expanded in the templates.

$QUAKES/quakekml/app/controllers/quakes_controller.rb

 1 class QuakesController < ApplicationController
 2+++++def start
 3+++++++++setup_common
 4+++++++++@refresh = 60
 5+++++++++kml = render_to_string :template => 'quakes/start.kml',
 6+++++++++++++:layout => false
 7+++++++++send_kmz kml
 8+++++end
 9
10+++++def feed
11+++++++++setup_common
12+++++++++@quakes = QuakeEvent.all
13+++++++++kml = render_to_string :template => 'quakes/feed.kml',
14+++++++++++++:layout => false
15+++++++++send_kmz kml
16+++++end
17+end

The path for the template files is relative to $QUAKES/quakekml/app/views, the quakes directory there should already exist and contain the default templates generated by Rails. We’ll create two new template files, starting with the one for quakes/start:

$QUAKES/quakekml/app/views/quakes/start.kml

 1 +<?xml version="1.0" encoding="UTF-8"?>
 2 +<kml
+++++xmlns="http://www.opengis.net/kml/2.2"
+++++xmlns:gx="http://www.google.com/kml/ext/2.2"
+++++xmlns:kml="http://www.opengis.net/kml/2.2"
+++++xmlns:atom="http://www.w3.org/2005/Atom">
 3 +<Document>
 4 +++<name>Earthquake Monitor</name>
 5 +++<open>1</open>
 6 +++<visibility>1</visibility>
 7 +++<LookAt>
 8 ++++++<longitude>-122.418955</longitude>
 9 ++++++<latitude>37.775410</latitude>
10+++++++<altitude>359000.0</altitude>
11+++++++<range>37000.0</range>
12+++++++<altitudeMode>relativeToGround</altitudeMode>
13 +++</LookAt>
14 +++<NetworkLink>
15 +++++<name>Quakes</name>
16 +++++<open>0</open>
17 +++++<visibility>1</visibility>
18 +++++<refreshVisibility>0</refreshVisibility>
19 +++++<flyToView>0</flyToView>
20 +++++<Link>
21 +++++++<href><%= url_for(:controller => 'quakes', :action => 'feed', : only_path => false) %></href>
22 +++++++<refreshMode>onInterval</refreshMode>
23 +++++++<refreshInterval><%= @refresh %></refreshInterval>
24 +++++++<viewRefreshMode>onStop</viewRefreshMode>
25 +++++++<viewRefreshTime>1.0</viewRefreshTime>
26 +++++</Link>
27 +++</NetworkLink>
28 +</Document>
29 +</kml>

The start KML begins with a LookAt element specifying the starting view (directly above SQLstream HQ!). The NetworkLink element includes two substitutions handled by Rails: at line 21 we insert the URL for the quakes/feed service, and at line 23 we insert the refresh rate.

$QUAKES/quakekml/app/views/quakes/feed.kml

1 <?xml version="1.0" encoding="UTF-8"?>
2 <kml xmlns="http://www.opengis.net/kml/2.2"
+++++xmlns:gx="http://www.google.com/kml/ext/2.2"
+++++xmlns:kml="http://www.opengis.net/kml/2.2"
+++++xmlns:atom="http://www.w3.org/2005/Atom">
3 <Document>
4 ++<name>Pins</name>
5 ++<open>1</open>
6 ++<visibility>1</visibility>
7 ++<Style id="pin">
8 ++++<IconStyle id="pin">
9 ++++++<scale>1.0</scale>
10+++++++<Icon>
11+++++++++<href>http://<%= @path %>/images/pin.png</href>
12+++++++</Icon>
13+++++</IconStyle>
14+++++<LabelStyle>
15+++++++++<color>ff0000dd</color>
16+++++++++<scale>1.2</scale>
17+++++</LabelStyle>
18+++</Style>
19+++<Folder>
20+++++<name>Earthquakes</name>
21+++++<open>0</open>
22+++++<visibility>1</visibility>
23+++++<description></description>
24+++++<%= render :partial => "quake_event", :collection => @quakes %>
25+++</Folder>
26+</Document>
27+</kml>

At line 11 we use the @path variable to specify the location of an image we want to appear on the globe at each quake location. You should place an image in $QUAKES/quakekml/public/images/pin.png, we use this one:

The other substitution, at line 24, causes a partial template to be rendered for each record in the @quakes collection. According to Rails’ naming conventions, the partial must be in this controller’s view directory with the name _quake_event.html.erb:

$QUAKES/quakekml/app/views/quakes/_quake_event.html.erb

 1 +++<Placemark>
 2 ++++ <name><%= quake_event.mag %></name>
 3 +++++<open>1</open>
 4 +++++<visibility>1</visibility>
 5  ++++<description><![CDATA[<%= render :partial => 'quake_description',
+++++++++++++:locals => {:quake_event => quake_event} %>]]></description>
 6  ++++<styleUrl>pin</styleUrl>
 7  ++++<Point id="quake">
 8  ++++++<extrude>false</extrude>
 9  ++++++<coordinates><%= quake_event.lon %>,<%= quake_event.lat %>,0</coordinates>
10  ++++</Point>
11  ++</Placemark>

The quake event records are inserted at lines 2 and 9. At line 5 we reference another partial that renders the HTML for the popup that appears when you click on the pin in Google Earth:

$QUAKES/quakekml/app/views/quakes/_quake_description.html.erb

1 <table style="text-align: center; width: 300px;" border="0" cellpadding="2" cellspacing="2">
2 ++<tbody>
3 ++++++<tr align="left">
4 ++++++++<td>on <%= quake_event.when.strftime("%Y/%m/%d") %>
+++++++++++++at <%= quake_event.when.strftime("%X %Z") %></td>
5 ++++++</tr>
6 ++++++<tr align="left">
7 +++++++++<td>at lat: <%= number_with_precision(quake_event.lat, :precision => 6) %>
+++++++++++++lon: <%= number_with_precision(quake_event.lon, :precision => 6) %></td>
8 ++++++</tr>
9  +++</tbody>
10 </table>

Note that this separation into multiple templates lets us express code in Ruby files, KML in KML files, and HTML in HTML files. Details about how the data is presented, such as how we format a timestamp, are taken out of the code and expressed in markup language.

You should now have a working KML renderer. Go to your app’s home page and follow the ‘Edit quake events’ link to add one or more fake quakes. Use SQLstream’s lat/lon from start.kml (above) for at least one of them. Now follow the Google Earth link from your app’s home page (you may have to instruct your browser to open KML/KMZ files in Google Earth, if you’ve never done this before). Google Earth should open and zoom to SQLstream HQ, and there should be a pin indicating an earthquake there.

You can now refine the visualization of a quake event by updating the templates and refreshing the display (you can refresh the quakes/feed stream by right-clicking on ‘Quakes’ in the Places tree on the left and selecting ‘Refresh’). We have a tool to display whatever quake events are dropped into the database, the next step is to feed it from SQLstream.

Next time
Part 3 of the visualization tutorial concludes this series and will be published next week. It will discuss the final key element – how to get the data flowing.

 

ITS World Congress – The rise of low risk innovation?

It struck me that the underlying theme of the conference could be described as ‘low risk innovation’.  An oxymoron?  At first glance, yes, but in this case it describes the cautious adoption of new technology while protecting existing investment.

The perception of the ITS industry, rightly or wrongly, has been one of an industry focussed on major manufacturing and hardware deployment projects, rather than software and new technologies. Therefore perhaps the industry hasn’t seen the level of growth and innovation that has occurred in other areas, telecommunications being a good example.  Or perhaps it simply hasn’t been possible until new technologies come along with a sufficiently compelling business case – lower cost solutions and faster to deploy.

New technology introduction – complementary and overlaid

GPS and Bluetooth featured strongly at the event on both the exhibition floor and in the breakout panel sessions.  It was clear however that the preference was to use these either as a proof concept across a small area, or where monitoring infrastructure exists, as complementary solutions, extending the accuracy and scope of existing fixed-road sensor deployments.

Software solutions and architecture

It was interesting how many of our booth visitors were initially attracted by SQLstream ITS Insight, but who were actually looking for more of a horizontal platform solution.  Yes, out of the box capability was important to get solutions up and running quickly, but wider concerns touched on some fundamental software engineering principles – openness, scalability and interoperability.

Open and flexible

The focus on manufacturing and hardware oriented projects tends to produce software support systems that are built to do just that – support the particular hardware installed for that project.  This leads to capable and often feature-rich systems, but systems that are difficult to extend and configure for new technologies and requirements.  There was a definite theme in the questions being asked at our booth for openness – open platforms, where an agency’s IT department, or consulting partner, can add new applications easily, and in fact are encouraged to do so.

Performance and scalability

Systems are required to scale in two ways.  The first is raw performance as the number of sensor events increases and real-time performance is required. Secondly, and perhaps less obvious, is that transportation agencies are looking to consolidate systems and provide common systems across multiple counties and even at the state level.  This highlights immediately the scalability issues with existing systems.   As the user base increases, the geographical scope of the system increases, and the drive for real-time information increases, the weakness of existing systems have been exposed.

Standards

The emergence of IT standards is a sure sign of increasing maturity in any industry.  This represents a move away from limited, siloed solutions to consideration of the wider integration issues – integration with network hardware, but also the integration between management systems.

A final word …

One final word, not quite a trend yet, but ITS appears to be embracing the Cloud as a solutions platform, as a mechanism for providing access to applications, but also for scalability and as a lower cost solution where large infrastructure deployment would be required.

 

From the show floor: ITS World Congress Day 2

A defining feature of the show is the Technology Showcase featuring  demonstrations from some of the technologies and applications that are bringing the future of transportation to life.  Each ‘village’ covers a specific theme such as Safety, Mobility, Environment/Sustainability and Pricing.  Environment/Sustainability focuses on the potential for reducing emissions.  Interesting demonstration offered by Imperial College, London, that will use a pollution sensor mounted on the roof of a demonstration vehicle travelling around the site.  Data is transferred over a GSM connection and viewed over the internet. Another is Ricardo Engineering’s demonstration of improving fuel efficiency using GPS navigation and traffic signal phase and timing data.

Toyota Star Safety System Simulator

On the theme from yesterday of using simulators to demonstrate the benefits of technology, Toyota’s Star safety system is a star of the show – attendees get a spin in a off-road driving simulator, once with all electronic aids turned off, and once with traction control, stability control, brake assist and anti-lock braking turned on.

And for SQLstream’s day, another busy day with many questions and demonstrations of real-time traffic analytics and congestion detection with SQLstream ITS Insight.  Our core Stream-to-Business platform, and the ability to integrate and extend SQLstream ITS Insight easy and quickly, is generating significant interest.

 

From the show floor: ITS World Congress Day 1

Honda's Safe Driving Simulator - One of many simulators at the show

The ITS World Congress claims to be the largest transportation event in 2011. Certainly the range of attendees and exhibitors is impressive, from software products to the latest in roadside hardware infrastructure. Some interesting themes emerging. Vehicle to vehicle communication is generating a lot of interest, and electronic driver aids. Some great simulators as well to amuse the attendees.

‘Real-time’ is also a common theme across the exhibition hall. And in particular the use of new technologies such as Bluetooth and GPS. Damian Black, SQLstream’s CEO, was speaking in a session today on how to achieve accurate arterial travel time. This is a hot topic right now, where existing in-road sensors are too expensive and the solutions too inaccurate. Bluetooth and GPS are the two emerging, although not necessarily competing, technologies – both can be used simulataneously to reinforce the other.

SQLstream in action – Day 1

It was great to have so much interest at the booth for our real-time intelligent transportation solutions. We announced the public launch of our SQLstream ITS Insight solution this morning. Perhaps we’re one of the few new companies at the show, but the ability to build real-time traffic analytics solutions based on maximizing the use of all available sensor data is generating a lot of interest.

 

SQLstream to announce new product for real-time Intelligent Transportation at ITS World Congress, Orlando.

Visit SQLstream on Booth #1366

Technology and innovation are central themes of this year’s ITS World Congress.  There’s been much written about the issues of congestion, green transportation schemes and improving personal mobility, not least in this blog.  At SQLstream we’ve been playing our part to help revolutionize the Intelligent Transportation industry.  It’s clear that the concepts of streaming data and real-time analytics are entering the main stream – from low level Big Data toolkits that require a streaming, low latency front end, to the real world of sensor networks and industries such as smart grid and telecommunications.

This is just as true in transportation.  Here we have an industry with vast volumes of sensor data, a need for sophisticated real-time analytics, and platforms capable of driving real-time process automation.  We’ve been working with a number of transportation agencies for some time, and are about to launch a new ‘Insight’ product for intelligent transportation.  Our ‘Insight’ range provides tools and out of the box support for specific industry verticals based on our core Stream-to-Business platform.

Google Earth Display for Road Traffic Congestion

Google Earth Display for Road Traffic Congestion

For Intelligent Transportation this means processing sensor data from GPS and fixed-road sensors, to deliver applications such as real-time Travel Time, live congestion detection and network KPI reporting.

Should you be attending the ITS World Congress, we’d be delighted to see you on our booth (#1366) for a demonstration.

 

Tutorial: Streaming applications: Geospatial Visualization – Part 1

A streaming SQLstream application will feel very familiar to anyone with some basic knowledge of SQL and traditional RDBMS applications.  SQLstream uses standards-based SQL, except that streaming SQL queries run forever, processing data as they arrive over specified time windows.

This blog is the first in a series of tutorials for SQLstream developers, describing how to build a streaming SQL applications.  Over the coming months, these tutorials will address the different components of streaming data applications, and provide worked examples and guidance.

Streaming Visualization, Part 1: Setting up

We’ll begin the series by looking at a typical streaming use case – displaying real-time sensor data on a map.  We have a source of geo-located data flowing in SQLstream that we’d like to visualize. Using Google Earth and Ruby on Rails, I’ll demonstrate an easily-implemented solution with lots of room for expansion.

Google Earth - Real-time streaming data visualization

Google Earth - Real-time streaming data visualization

For this example, our approach is to connect the SQLstream pipeline to Google Earth using a staging database–a common deployment scenario. We’ll be using PostgreSQL for the staging database, but MySQL or any other database supported by Rails will work. A SQLstream pump will use TableUpdate to write a record of latitude, longitude, and description for each event to a display list in PostgreSQL. When Google Earth places a web request for data, Rails will service the request by rendering the contents of the display list as KML, Earth’s dialect of XML. We’ll start with SQLstream, Ruby, and PostgreSQL already installed and focus on what’s necessary to get them all talking to each other.

Getting the Data

With SQLstream installed, make sure all of the distributed plugins are installed (if you haven’t done this already) and start the server:

linux> cd $SQLSTREAM_HOME/plugin/autocp
linux> ln -s ../*.jar .
linux> cd $SQLSTREAM_HOME
linux> bin/sqlstreamd

We’re going to get our data from a web feed of recent earthquakes provided by the US Geological Survey. In another shell:

linux> cd $SQLSTREAM_HOME/examples/webfeed
linux> sqllineClient < webfeed.sql
linux> sqllineClient < usgs.sql

We now have several streams available to us within SQLstream, the one we want to visualize is SmallQuakesDay, which includes columns containing the location (‘point’ as lat/lon) and magnitude (‘mag’) of the quake.

Creating the Display List

We’ll use Rails to do all of the work of creating the display list. If you don’t have Rails installed yet, start by installing Ruby’s Gem package management system (in Ubuntu, this is the rubygems package). You’ll also need the development files for Postgres installed (postgres-server-dev in Ubuntu). You can now use gem to install rails and associated tools with this command:

linux> gem install mongrel rails pg rubyzip

I recommend that you add gem’s bin directory to your path (on my system it’s /var/lib/gems/1.8/bin) so that the commands ‘rails’, ‘rake’, and ‘bundle’ are found. Create an empty directory to work in (we’ll call it ‘$QUAKE’ here), cd there, and create a new rails server in the sub-directory ‘quakekml’ with these commands:

linux> cd $QUAKE
linux> rails new quakekml -d postgresql

You can test the server by starting it with these commands and visiting http://localhost:3000 in a web browser:

linux> cd $QUAKE/quakekml
linux> rails server

Use ^C to shut the server down so we can configure the database access. Edit the file $QUAKE/quakekml/config/database.yml, it should already contain sections describing the development, test, and production databases. Edit the username and password settings in each section to match a user you’ve configured in Postgres who can create databases. The only database we’ll be using is ‘quakekml_development’, but Rails will create all three when you issue this command:

linux> rake db:create:all

Create a display list consisting of a timestamp, lat/lon, and magnitude for each quake with the commands:

linux> rails generate scaffold quake_event when:timestamp lat:float lon:float mag:float
linux> rake db:migrate

You now not only have an empty table in Postgres, you also have a full web interface for viewing and editing that table. Start the server again and visit http://localhost:3000/quake_events to see it. Our next steps are to generate KML for Google Earth from this table, and to feed the table from SQLstream. The scaffolding created by Rails is a handy debugging tool we can use to inspect the table and manually add items to test the visualization.

Next time

Parts 2 and 3 of the visualization tutorial will be published over the coming weeks.  Part 2 focuses on how to render streaming analytics in Google Earth, and the final part of the tutorial will discuss how to get the data flowing.

 

SQLstream, Intelligent Transportation and ITS World Congress

ITS World Congress, 2011. Visit SQLstream, Booth #1366

The 18th World Congress on Intelligent Transport Systems (ITS) is being held in Orlando from October 16th – 20th, 2011. This is the leading event for intelligent transportation solutions, and attracts a large audience of government, technology and industry professionals. The event seeks to demonstrate advances in the application of new technology and smart transportation. Major areas of focus include the reduction of traffic congestion and improvement in  personal mobility.

With 800 million vehicles on the world’s roads today, a number forecast to grow to between 2 and 4 billion by 2050, it is clear that transportation management  systems will need to analyze real-time sensor and GPS data dynamically on a massive scale to reduce congestion and optimize personal mobility. The objective is to achieve a fluid and reliable transportation network, that can respond dynamically to changing loads and conditions, and provide consistent and acceptable travel times.

The performance of a transportation network can be measured based on road usage (number of vehicles), and the travel speed and time from origin to destination.  Today’s traffic management systems rely on historical analysis of data from fixed sensors.  However, roadside and in-road sensor projects are very expensive to install and maintain. As a consequence, only a very limited view of the overall road network is available,  with sensor deployments focusing on primary routes and major intersections only. Also, fixed sensors tend to report traffic flow – at best a secondary measure of the real requirement –  congestion.

Most important however is the lack of real-time, dynamic behaviour from existing traffic management systems.  Flow control, for example at intersections and on freeways, is activated at specific times based on the historical analysis of the fixed sensor data – this helps, but is unable to react to changing patterns of traffic flow and congestion.

One approach to the problem is to introduce the latest wireless GPS sensor technology.  Wireless GPS sensors have two significant advantages:

  1. Immediate and real-time information on vehicle speed and location.
  2. Low cost solutions that can be deployed quickly, with little or no maintenance.
  3. Provides a direct measure of vehicle speed and the ability for real-time and accurate measure of congestion.
  4. Complete network insight – highways and arterial routes – at the granularity of a few meters.
SQLstream ITS Insight

SQLstream ITS Insight (Click to enlarge)

For example, when the  Roads and  Traffic Authority (RTA) for New South Wales in Australia was re-evaluating its approach to intelligent transportation systems, it identified wireless GPS technology as both a significantly cheaper and potentially much superior solution to congestion detection and Travel Time.  The RTA selected SQLstream as the real-time traffic analytics and congestion detection platform based on processing in-vehicle GPS sensor data. The SQLstream solution enabled the RTA to cancel a $20million fixed sensor program,  and to build a real-time traffic management platform based on SQLstream’s ITS Insight.

We will be demonstrating our real-time traffic management capabilities on our stand at ITS World Congress in Orlando.  In addition, our CEO, Damian Black will be participating in a number of related panel sessions on arterial travel time solutions and real-time data management for intelligent transportation.  For those attending ITS World Congress, please visit us for a demo at Booth #1366, or visit our website for more information on SQLstream and real-time transportation management systems.  We look forward to seeing some of you at least some at the show.

 

Real-time congestion detection with Streaming SQL

I am going to discuss a SQLstream application for monitoring traffic flow in real-time. In this application, vehicles with GPS enabled devices transmit vehicle position along with other vehicle information such as speed and engine state. SQLstream receives this information as a real-time data stream and uses streaming SQL analytics to detect and predict the rapid onset of congestion on the road network in real-time.

Streaming SQL for Congestion Detection
The SQLstream application for congestion detection uses a typical streaming SQL processing pipeline. In this case, data is fed into the SQLstream pipeline using our Log File Adapter. SQLstream adapters provide an interface to sources and targets such as databases, log files, network sockets and mail servers. Adapters are built using SQL/MED specification which is part of ANSI SQL standard. In this application, each log file contains the vehicle positions on the road network for the latest minute.

Streaming SQL Pipeline for Real-time Traffic Congestion Detection (click to enlarge)

The conditioning pipeline performs data cleansing operations such as rejecting poor quality data (records with missing or out-of-bounds columns) followed by mapping of vehicle positions (lat/long pair) to a “road element” of the road network using a UDX to perform geo-spatial lookups in an external road network database.

The diagram and the example SQL below show our implementation of a streaming SQL pipeline for congestion detection. Each vehicle reports its position and speed every minute. Two consecutive vehicle positions are then used to interpolate vehicle speeds for each road element on the vehicle path between reporting positions. The interpolated speed is based on actual distance traveled by the vehicle between two consecutive reports. The interpolated speed is calculated in a User Defined Transform(UDX). The UDX is written in Java. The UDX also associates a confidence factor with each interpolated speed value based on the position of the road element relative to endpoints of the vehicle path.

Streaming Traffic Flow Analytics
As illustrated below, the analytics pipeline calculates 15, 5, 4, 3, 2 & 1 minute moving average speeds for each road element. Each road element is color coded based on the 15-minute moving average speed. The results are streamed to a Google Earth display.

CREATE OR REPLACE VIEW “EstimatedReSpeeds” AS
SELECT STREAM “RE”, “reID”, “Carriageway”, “rePrescribed”, “reSpeedLimit”,
++SUM(“reVehicles”) OVER “last1″ AS “reVehiclesLast1″,
++SUM(“reVehicles”) OVER “last2″ AS “reVehiclesLast2″,
++SUM(“reVehicles”) OVER “last3″ AS “reVehiclesLast3″,
++SUM(“reVehicles”) OVER “last4″ AS “reVehiclesLast4″,
++SUM(“reVehicles”) OVER “last5″ AS “reVehiclesLast5″,
++SUM(“reVehicles”) OVER “last15″ AS “reVehiclesLast15″,
++SUM(“reSpeed” * “reConfidence”) OVER “last1″ /
++SUM(“reConfidence”) OVER “last1″ AS “reSpeedLast1″,
++SUM(“reSpeed” * “reConfidence”) OVER “last2″ /
++SUM(“reConfidence”) OVER “last2″ AS “reSpeedLast2″,
++SUM(“reSpeed” * “reConfidence”) OVER “last3″ /
++SUM(“reConfidence”) OVER “last3″ AS “reSpeedLast3″,
++SUM(“reSpeed” * “reConfidence”) OVER “last4″ /
++SUM(“reConfidence”) OVER “last4″ AS “reSpeedLast4″,
++SUM(“reSpeed” * “reConfidence”) OVER “last5″ /
++SUM(“reConfidence”) OVER “last5″ AS “reSpeedLast5″,
++SUM(“reSpeed” * “reConfidence”) OVER “last15″ /
++SUM(“reConfidence”) OVER “last15″ AS “reSpeedLast15″
FROM “Stage3″
WINDOW “last1″ AS (PARTITION BY “RE”
++RANGE INTERVAL ‘1′ MINUTE PRECEDING),
+++++“last2″ AS (PARTITION BY “RE”
++RANGE INTERVAL ‘2′ MINUTE PRECEDING),
+++++“last3″ AS (PARTITION BY “RE”
++RANGE INTERVAL ‘3′ MINUTE PRECEDING),
+++++“last4″ AS (PARTITION BY “RE”
++RANGE INTERVAL ‘4′ MINUTE PRECEDING),
+++++“last5″ AS (PARTITION BY “RE”
++RANGE INTERVAL ‘5′ MINUTE PRECEDING),
+++++“last15″ AS (PARTITION BY “RE”
++RANGE INTERVAL ‘15′ MINUTE PRECEDING);

Detecting the rapid onset of congestion
Congestion is detected by comparing moving averages for the larger time window with that for the smaller time window. For example, comparing a 2-minute average with a 1-minute average:

CREATE OR REPLACE VIEW “CongestionRule1″ AS
SELECT STREAM
++–- name, ID, highway name, speed limit etc. for each road element
++“RE”, “reID”, “Carriageway”, “rePrescribed”, “reSpeedLimit”,
++–- volume of vehicle reports in each time window
++“reVehiclesLast1″, “reVehiclesLast2″, “reVehiclesLast3″,
++“reVehiclesLast4″, “reVehiclesLast5″, “reVehiclesLast15″,
++–- estimated avg speed for each road element
++“reSpeedLast1″, “reSpeedLast2″, “reSpeedLast3″,
++“reSpeedLast4″, “reSpeedLast5″,”reSpeedLast15″
FROM “EstimatedReSpeeds”
WHERE “reSpeedLast1″ < 0.80 * “reSpeedLast2″ AND – slowdown by 20 %
++“reSpeedLast2″ < 0.80 * “reSpeedLast3″ AND
++“reSpeedLast3″ < 0.80 * “reSpeedLast4″ AND
++“reSpeedLast4″ < 0.80 * “reSpeedLast5″ ;

SQLstream Traffic Congestion Detection - Visualization

SQLstream Traffic Congestion Detection - Visualization (click to enlarge)

Note that these estimated speeds are over overlapping windows and as such slowdown thresholds are set accordingly.

Fine tuning slowdown thresholds and other information, such as the proximity of traffic lights and the volume of vehicle reports in each time window, improves the quality of congestion detection algorithm.

The Google Earth screenshot illustrates real-time traffic view as well as detected slowdowns as pins. The severity of the slowdown is indicated by different shades of red.

 

Calculating Decaying Averages in Streaming SQL

At SQLstream we have a comprehensive implementation of standard SQL windowing operations such as SUM, COUNT and AVG. Recently though we needed a more sophisticated function for a decaying weighted average that would emphasize more recent samples over older samples. We implemented a new EXP_AVG() operation, as shown in this example query:

SELECT rowtime, ticker, price,
++EXP_AVG(price, INTERVAL ‘10′ SECOND) OVER w
FROM t
WINDOW w AS (PARTITION BY ticker
+++++++++++++ORDER BY rowtime
+++++++++++++RANGE INTERVAL ‘30′ SECOND PRECEDING);

EXP_AVG takes a value expression and an interval constant half life. In this example, two samples within the WINDOW are separated by 10 seconds, the older one will be given half as much weight as the newer one.

How would this look in standard SQL without an EXP_AVG function?

I thought it would be interesting to look at how a decaying average would be implemented in SQL without an EXP_AVG operation, as under the covers we implement it using standard SQL windowed sums. If we knew that the times for our rows would always fall under a narrow range, we would increase the weights of the samples as time goes forwards and simply scale down the total (rather than lowering the weights of samples as time goes backwards from the most recent sample). This would be straightforward. The SQL would look something like this.

First, we need a function to turn our time based units into something on which we can do arithmetic:

CREATE FUNCTION toSeconds(t TIMESTAMP, offset TIMESTAMP)
RETURNS DECIMAL(12,3) CONTAINS SQL
RETURN CAST((t – offset) SECOND(9,3) AS DECIMAL(12,3));

Then, we need a function for calculating exponential weight:

CREATE FUNCTION expWeight (seconds DOUBLE, halfLife DOUBLE)
RETURNS DOUBLE CONTAINS SQL
RETURN EXP(seconds * (LN(2)/halfLife));

The complete SQL will then be:

SELECT rowtime, ticker,
++SUM(price * expWeight(toSeconds(rowtime, OFFSET), 10)) OVER w
++/ SUM(expWeight(toSeconds(rowtime, OFFSET), 10)) OVER w
++as avgPrice
FROM t
WINDOW w AS (PARTITION BY ticker
+++++++++++++ORDER BY rowtime
+++++++++++++RANGE INTERVAL ‘30′ SECOND PRECEDING);

Unfortunately this would not work in practice as the expWeight function would overflow once the row times got more than a few halfLifes advanced from OFFSET. We can’t reset the offset as long as there are any non zero values still in the window. This gives us our out. What we can do is partition the incoming values into two windowed aggregates, always sending zeros to one of them and switching and reseting the offset when the window for that aggregate fills with zeros, this way we’re never summing values whose weights are calculated using different starting offsets.

Here’s the SQL for this. We’ll need some more functions describing when and how we reset our offsets. We’ll need an arbitrary reference for our time calculations:

CREATE FUNCTION toSeconds(t TIMESTAMP)
RETURNS DECIMAL(12,3) CONTAINS SQL
RETURN CAST((t – TIMESTAMP ‘1970-01-01 00:00:00′) SECOND(9,3)
+++++++++++AS DECIMAL(12,3));

To use modulo arithmetic in SQL, convert to an integer:

CREATE FUNCTION toMillis(t TIMESTAMP)
RETURNS BIGINT CONTAINS SQL
RETURN CAST(toSeconds(t))*1000 aS BIGINT);

We’ll want to partition time into window sized epochs starting at our arbitrary time reference:

CREATE FUNCTION isEvenEpoch (t TIMESTAMP, windowSeconds INT)
RETURNS BOOLEAN CONTAINS SQL
RETURN MOD(toMillis(t),
++windowSeconds*2000)< windowSeconds*1000;

We’ll use the start of each epoch as the offset for all rows that fall in that epoch:

CREATE FUNCTION epochStart (t TIMESTAMP, windowSeconds INT)
RETURNS decimal(12,3) CONTAINS SQL
RETURN CAST(MOD(toMillis(t), windowSeconds*1000)
+++++++++++AS DECIMAL(12,3))/1000;

The following two functions are used to correct the inserting of zeros into a window and all the non zero values still in the window are from the previous epoch:

CREATE FUNCTION evenAgingFactor (t TIMESTAMP,
+++++++++++++windowSeconds INT, halflife DOUBLE)
RETURNS DOUBLE CONTAINS SQL
RETURN CASE WHEN isEvenEpoch(t, windowSeconds)
+++++++++++++THEN 1
+++++++++++++ELSE expWeight(-windowSeconds, halflife) END;
CREATE FUNCTION oddAgingFactor (t TIMESTAMP,
+++++++++++++windowSeconds INT, halflife DOUBLE)
RETURNS double CONTAINS SQL
RETURN CASE WHEN isEvenEpoch(t, windowSeconds)
+++++++++++++THEN expWeight(-windowSeconds, halflife)
+++++++++++++ELSE 1 END;

The complete SQL that uses the two window approach to calculate the decaying average using a 30 second window and a 10 second half life is:

SELECT rowtime, ticker,
+++(oddAgingFactor(rowtime, 30, 10) *
++++SUM(CASE WHEN isEvenEpoch(rowtime, 30)
++++++++THEN 0
++++++++ELSE price * expWeight(epochStart(rowtime, 30), 10) END) OVER w
+++++ evenAgingFactor(rowtime, 30, 10) *
++++SUM(CASE WHEN isEvenEpoch(rowtime, 30)
++++++++THEN price * expWeight(epochStart(rowtime, 30), 10)
++++++++ELSE 0 END) OVER w)
++++/
++++(oddAgingFactor(rowtime, 30, 10) *
++++SUM(CASE WHEN isEvenEpoch(rowtime, 30)
++++++++THEN 0
++++++++ELSE expWeight(epochStart(rowtime, 30), 10) END) OVER w
+++++ evenAgingFactor(rowtime, 30, 10) *
++++SUM(CASE WHEN isEvenEpoch(rowtime, 30)
++++++++THEN expWeight(epochStart(rowtime, 30), 10)
++++++++ELSE 0 END) OVER w)
as avgPrice
FROM t WINDOW w AS (PARTITION BY TICKER
++++++++++++++++++++ORDER BY rowtime
++++++++++++++++++++RANGE INTERVAL ‘30′ SECOND PRECEDING);

To understand how this would work in practice lets assume we start our query at 12:00. We’ll only look at the numerator of our query as the denominator uses the same technique to calculate a weighted count.

Time Window A state Window B state
12:00:00 Empty Empty
12:00:00 – 12:00:30 Values with increasing weights being added. Zeros being added.
12:00:30 Has weighted sum * expWeight(30,10) Zero
12:00:30 – 12:01:00 Zeros being added. Will have older part of weighted sum. Values with increasing weights being added. Will have newer part of weighted sum.
12:01:00 Zero. All non zero values will have been aged out. We can safely reset our scaling factor. Has weighted sum * expWeight(30,10)

With care to apply appropriate scaling to the parts this SQL can be used to calculate a weighted average using standard windowed operations. However, I think you’ll agree, it’s easier with our EXP_AVG() operation.

 

SQLstream 2.5.1 is available for download

The latest product update of SQLstream, version 2.5.1, has just been released and shipped.  This will be the final 2.x release prior to the SQLstream 3 launch, and although SQLstream 2.5.1 is predominately a maintenance release, it does include a range of feature enhancements, including:

- Support for exponentially decaying averages in windowed aggregation functions

- Enhancements to the standard Log File and Socket Adapters

A number of our customers have already downloaded and upgraded to the new version, and across a range of industries, including those in transportation, sensor network management, e-Commerce and web analytics.

Click here if you’d like to register for SQLstream product downloads (note – registration is required), or would  like to know more about solutions for real-time analytics and data management.

But what of SQLstream 3?  Well, watch this space.  SQLstream 3 will take a major step forward, not just in SQLstream’s stream computing product capability, but also in the way stream computing solutions are built and deployed.  More on this over the coming months.

 

Real-Time Seismic Monitoring in the Cloud with SQLstream

SQLstream is helping to predict earthquakes across the world in real time. The system has been developed by a consortium of universities and government agencies, with funding from NSF (National Science Foundation), to provide an infrastructure of networked tools for research in ocean science – constructing an internet-based system to collect and share data.

Ocean Research Program Overview

Real-time Seismic Monitoring Program

This is a large system with 16,000 land and sea-based sensors, each sending several channels of seismic event data at a rate of 40 data points per second. The application executes in an Amazon EC2 Cloud on a cluster of SQLstream servers, connected by an AMQP message bus. SQLstream’s AMQP adapter (built with RabbitMQ) enables the streaming SQL application to view the AMQP bus as a domain of input and output data streams. The initial prototype was upgraded to a full-scale system running on a cluster of servers without any changes to the streaming SQL application.

SQLstream’s contribution to the project, an application that processes seismographic data in real-time, demonstrates:

  1. An operational deployment of streaming SQL for scientific calculations in real-time
  2. Real-time distributed processing in an Amazon EC2 cloud using SQLstream and AMQP
  3. Rapid development and rollout of real-time data applications using standards-based streaming SQL.

Seismic Monitoring
The sensor network contains about 16,000 seismic sensors, organized into grids, covering large parts of the North American continent and the adjacent oceans (see illustration below). Each sensor measures the motion of the ground under it in three dimensions, and transmits its data as several digitized channels, typically sampled 40 times a second.

Seismic Sensor Map

Seismic Sensor Map

While the rate of each signal channel is modest (since seismic waves are low-frequency),
this adds up to a large amount of data to process in real time. Moreover, the rules for detecting a seismic event are heuristics that apply to a time interval of several minutes: so the application has to calculate some quantities from the raw data and to store these calculated values over a time window.

But to detect an earthquake reliably, it’s better to monitor all the sensors at once, looking for a disturbance in the signals that first appear in one place, and then appear in nearby places: a disturbance signal that propagates and changes shape in a way consistent with the physics of a seismic wave.

Real-time sensor network management in an EC2 Cloud
Monitoring tens of thousands of signal channels arriving at 40 sample points per second is a complicated problem, but it can be made simpler by breaking it into stages. We’re interested in earthquakes, which are infrequent, so SQLstream first reduces the amount of data by scanning each channel for patterns that suggest the beginning, the peak or the end of a quake: in other words reduce the dense signal to a sequence of interesting events. Then we can look for events detected on other channels that could be due to the same quake propagating in physical space and time.

In the first phase, we built a real-time seismic event detector in streaming SQL.
We translated a scientific algorithm into streaming SQL, and connected to the scientific sensor data infrastructure – using our AMQP adapter. This involved less than 100 lines of streaming SQL.

In the second phase, the prototype was scaled up to a full-size system, dealing with 16,000 sensor channels, running on multiple SQLstream server nodes created automatically in an Amazon Elastic Cloud. This expansion required no changes to the streaming SQL application developed for the initial prototype – simply running the same streaming SQL application inside an elastic container/manager.

Real-time Event Detection with Streaming SQL
The sensor data processing pipeline has five functional stages:

  1. Reading Messages – over the AMQP adapter. A configuration parameter specifies which “topics” SQLstream subscribes to, that is, which set of sensor channels it receives.
  2. Unpack Data – a user defined transform (UDX) unpacks data channel messages into individual data points.
  3. Signal Processing – extract higher order information from the raw data to identify seismic events given background noise and non-seismic ‘bumps’. An example function would be to calculate the ratios of multiple rolling averages of the signal value (x) over different time windows.

    Plot of Seismic Events

  4. Event Detection – The next stage applies heuristic rules to the processed data streams – the output is much sparser: a stream of significant events, each indicating a possible start/peak/stop of a seismic wave on a particular channel. If events of the correct type occur within a correct interval of each other (as shown in the Signal Plot Diagram), they are accepted as significant.
  5. Writing Messages – output (publish) significant events over the AMQP adapter.

Automatic EC2 scaling for Big Data sensor volume
The pipeline described has been proven to scale well to handle even greater data volume:

  1. Channels are processed independently, so scale as O(N).
  2. Additional channels are processed by adding more pipelines
  3. Each pipeline starts and ends with AMQP messages – the SQLstream / AMQP interoperability has been shown to scale well.
  4. To add a pipeline, we simply add another Elastic Compute server, running the same pipeline streaming SQL, but configured to subscribe to its own set of sensor channels.

This is the first part of a series of blogs describing the seimic monitoring solution. The next blogs will focus on the streaming SQL used in the application, and the SQLstream / AMQP architecture for Big Data scalability.

 

SQLstream and Mozilla Firefox 4, a look at the SQL behind the scene

Since Tuesday’s announcement that the Firefox Download Monitor is powered by SQLstream, we’ve received a number of questions about how it all fits together. I hope this description helps answer some of those questions.

Mozilla Firefox Real-Time Download Monitor - Day 2

Mozilla's Firefox Real-Time Download Monitor - Day 2

SQLstream server executes SQL statements, just like standard SQL, except the SQLstream’s queries run continuously, analyzing input data in real-time as it arrives. Statements are presented via JDBC or user friendly tools which use JDBC internally. Statements are compiled/prepared, the planner/optimizer chooses an access plan, and a runtime engine executes the plan. SQLstream is compliant with SQL 2008 and 2003 with just a couple of extensions. One extension includes the keyword STREAM as part of a SELECT statement. The STREAM keyword indicates that the results are continuously streaming rather than a point in time TABLE.

Applications in SQLstream are constructed out of a set of SQL CREATE STREAM statements and SQL VIEWS against streams and other views. Those statements are assembled into a pipeline. When describing a pipeline we refer to statements on the source side as being upstream, and statements closer to the destination as being downstream.

In the middle of the pipeline, we define a stream named FirefoxDownloadStream_ which contains the results of the parsed and conditioned download events. The stream declaration is identical to a table definition with the exception of the type of the object being a STREAM rather than TABLE.

CREATE STREAM "FirefoxDownloadStream_" (
+++"download_type"++++++++++VARCHAR(15),
+++"utc_timestamp"++++++++++TIMESTAMP,
+++"product_name"+++++++++++VARCHAR(12),
+++"product_version"++++++++VARCHAR(12),
+++"product_major_version"++VARCHAR(12),
+++"product_os"+++++++++++++VARCHAR(10),
+++"locale_code"++++++++++++VARCHAR(5),
+++"country_code"+++++++++++VARCHAR(2),
+++"city_name"++++++++++++++VARCHAR(32),
+++"region_code"++++++++++++VARCHAR(2),
+++"longitude"++++++++++++++VARCHAR(8),
+++"latitude"+++++++++++++++VARCHAR(8)
);

The stream is populated with a SQL INSERT-SELECT statement. Again, standard SQL statements are used. The WHERE clause defines that the downloads include new first time downloads, complete upgrades of prior versions of Firefox, or partial upgrades of prior versions of Firefox.

INSERT INTO "FirefoxDownloadStream_"
++("download_type",
+++"utc_timestamp",
+++"product_name",
+++"product_version",
+++"product_major_version",
+++"product_os",
+++"locale_code",
+++"country_code",
+++"city_name",
+++"region_code",
+++"longitude",
+++"latitude"
++)
SELECT STREAM
+++"dlType"+++AS "download_type",
+++"dlTime"+++AS "utc_timestamp",
+++"product"++AS "product_name",
+++"version"++AS "product_version",
+++"GetMajorVersion"("version") AS "product_major_version",
+++"os"+++++++AS "product_os",
+++"lang",++++AS "locale_code",
+++"cc",++++++AS "country_code",
+++"city",++++AS "city_name",
+++"rg",++++++AS "region_code",
+++CAST("latitude" AS VARCHAR(10)) AS "latitude",
+++CAST("longitude" AS VARCHAR(10)) AS "longitude"
FROM "FirefoxCountryFilter"
WHERE (("dlType" IS NULL) OR ("dlType" = 'complete') OR ("dlType" = 'partial'));

The download events contain the time of each download. Mozilla has a number of download servers feeding the worldwide requests to download Firefox. Each of these servers feeds the results of the download requests to a common logfile which is “tailed” by SQLstream. As the time for each download differs due to each client’s network capacity, the download requests may be slightly out of order. In practice the biggest gap we’ve seen is 4 seconds. Since we’re measuring downloads over the long period of time, it was deemed sufficient to adjust the download time of late arrivals to match the most recent download time.
The following SQL statement does that adjustment.


CREATE OR REPLACE VIEW "FirefoxDownloadStream" AS
+++SELECT STREAM MAX("utc_timestamp") OVER(ROWS UNBOUNDED PRECEDING)
++++++++++AS ROWTIME,
++++++++++*
+++FROM "FirefoxDownloadStream_";

SQLstream associates a ROWTIME with each row in a STREAM. The ROWTIME is a monotonically increasing SQL timestamp. In the default case, the ROWTIME is the current time expressed in UTC. Most applications require time to be defined according to time associated with the data itself. Associating the ROWTIME of a row in a stream based on the data contents of the row, is done by the AS ROWTIME clause for an individual column. In the Mozilla pipeline, we set the ROWTIME to be the maximum of the values in the “utc_timestamp” column to be that rows ROWTIME.
The analytics portion of the pipeline is implemented with a standard SQL statement. For example, each 10 seconds the number of downloads for each product, version, … country, city, region is calculated.


CREATE OR REPLACE VIEW "FirefoxStreamForLocationCounters"
DESCRIPTION 'Compute product counters for a minute' AS
+++SELECT STREAM
++++++++++"download_type",
++++++++++"product_name",
++++++++++"product_major_version",
++++++++++"product_version",
++++++++++"country_code",
++++++++++"region_code",
++++++++++"city_name",
++++++++++"latitude",
++++++++++"longitude",
++++++++++count(*) AS "count"
+++FROM "FirefoxDownloadStream" F
+++GROUP BY FLOOR(F.ROWTIME TO MINUTE),
++++++++++++FLOOR(F.ROWTIME - INTERVAL '10' SECOND TO MINUTE),
++++++++++++FLOOR(F.ROWTIME - INTERVAL '20' SECOND TO MINUTE),
++++++++++++FLOOR(F.ROWTIME - INTERVAL '30' SECOND TO MINUTE),
++++++++++++FLOOR(F.ROWTIME - INTERVAL '40' SECOND TO MINUTE),
++++++++++++FLOOR(F.ROWTIME - INTERVAL '50' SECOND TO MINUTE),
++++++++++++"product_name",
++++++++++++"download_type",
++++++++++++"product_major_version",
++++++++++++"product_version",
++++++++++++"country_code",
++++++++++++"region_code",
++++++++++++"city_name",
++++++++++++"latitude",
++++++++++++"longitude";

There is a similar view declaration where similar calculations are done for each product. Most of the interest since Tuesday is of course related to Firefox 4.0 downloads. This second view allows Mozilla to drill down on downloads by platform as well as downloads for previous (and future) Firefox versions.


CREATE OR REPLACE VIEW "FirefoxStreamForProductCounters"
DESCRIPTION 'Compute product counters for a minute' AS
+++SELECT STREAM
++++++++++"download_type",
++++++++++"product_name",
++++++++++"product_major_version",
++++++++++"product_version",
++++++++++"product_os",
++++++++++count(*) AS "count"
+++FROM "FirefoxDownloadStream" F
+++GROUP BY FLOOR(F.ROWTIME TO MINUTE),
++++++++++++FLOOR(F.ROWTIME - INTERVAL '10' SECOND TO MINUTE),
++++++++++++FLOOR(F.ROWTIME - INTERVAL '20' SECOND TO MINUTE),
++++++++++++FLOOR(F.ROWTIME - INTERVAL '30' SECOND TO MINUTE),
++++++++++++FLOOR(F.ROWTIME - INTERVAL '40' SECOND TO MINUTE),
++++++++++++FLOOR(F.ROWTIME - INTERVAL '50' SECOND TO MINUTE),
++++++++++++"product_name",
++++++++++++"download_type",
++++++++++++"product_major_version",
++++++++++++"product_version",
++++++++++++"product_os";

Each of these views (FirefoxStreamForLocationCounters and FirefoxStreamForProductCounters) is based on the FirefoxDownloadStream. Each defined stream and view is a point where the application can access data either directly or via another VIEW or INSERT…SELECT.

One component of the solution is a piece of code we call the HBaseAgent. The agent uses the JDBC interface to SQLstream and issues a SELECT * FROM each of the described views containing the location and product counter 10-second download counts. The HBaseAgent maps each fetched row to the HBase schema as defined by Mozilla.

I write this blog about 24 hours after Firefox 4 launched. So far there are more than 8 million downloads of Firefox 4. It certainly has been an exciting day for Mozilla and I congratulate everyone who contributed. I’m happy that SQLstream has been able to contribute to their success.

 

SQLstream and Mozilla Firefox 4.0

SQLstream has been powering Mozilla’s Firefox Download Monitor since 2009. A SQLstream based application has been continually aggregating hundreds of millions of download events, receiving minute by minute aggregations via a continuously running SQL SELECT statement using the SQLstream JDBC driver. A continuously running SELECT statement is syntactically and semantically identical to other SELECT statements with the addition that end of data is never returned in SQLSTATE by the FETCH associated with the cursor.

Mozilla 4.0 Real-Time Download Monitor is Powered By SQLstream

Mozilla 4.0 Real-Time Download Monitor is Powered By SQLstream

For the launch of Firefox 4, Mozilla again turned to SQLstream to enhance the download monitor. Applications in SQLstream are built by defining a series of SQL stream definitions and SQL views. Business rules are embedded in these definitions. Definitions are assembled into a pipeline and each definition provides a point where data is available to applications. A stream definition is analogous to a SQL table definition and contains the column names and data types for each defined element. Each stream has an implicit ROWTIME column, a monotonically increasing value associated with the data in each column.

The download monitor tails the log files written by all of Mozilla’s download servers which provide new versions of Firefox. Each entry in the log is parsed. The IP address of each download is converted to a country, city, region, latitude, and longitude. For the Firefox 4 release, Mozilla wanted the results of the download aggregation to be stored in an HBase table in their HBase/Hadoop cluster. Storing the data allows future historical analysis to complement the realtime analysis provided by SQLstream.

SQLstream aggregates downloads to two separate column families in a single HBase table. The ‘product’ column family contains the overall download count. The ‘location’ column family contains the count of downloads for each country, region, city, latitude, longitude.

SQLstream uses a GROUP BY clause along with a COUNT(*) to calculate the number of downloads for each 10 seconds. The author wrote a new piece of SQLstream code which provides an interface from SQLstream to HBase. The HBaseAgent maps the results of the GROUP BY and calls the HBase API to persist data in HBase. The incrementColumnValue API is key in that it allows SQLstream to aggregate download counts on a realtime basis and efficiently update HBase by providing incremental values.

The application periodically reads data from HBase and sends formatted data to each connected browser. See for yourself at http://glow.mozilla.org/. The map provides a running count of Mozilla Firefox 4.0 download with raindrops on the map indicating each location where one or more downloads have just occurred. As I write this blog entry, I can see that Europe and Asia are hot while North America is just waking up. Clicking on the colored rings in the lower left hand corner of the map, allows drill down to the geographic locations.

Mozilla’s Daniel Einspanjer has also blogged about their new real-time download vizualization application. The blog explains the overall architecture of the real-time application using SQLstream, and the SQLstream integration with HBase.

You can read more about the previous Firefox 3 download monitor on Julian Hyde’s blog. Julian is the CTO of SQLstream.

Mozilla also blogged about the history of the Firefox 3 download monitor on the Mozilla Webdev blog.

 

SQLstream 2.5 – Real-time stream computing eliminates Big Data performance and storage bottlenecks

With service and sensor data growing at 60% CAGR, having both the raw power and correct architecture for processing streaming data is essential. IDC released recently estimates for the size of the ‘Digital Universe’ – a term used to describe every electronically stored piece of data. According to IDC, stored data will reach 1.8 million petabytes (1800 exabytes) by the end of 2011.

Data overload (source IDC)

Data overload (source IDC)

As a recent article in the Economist points out, all of this data raises significant processing performance and storage issues. Conventional database technology requires data to be stored, cleaned and aggregated before being queried. With the volume of data growing so quickly, it has become cost prohibitive and technologically infeasible to process all data using conventional solutions.

But how much of the raw data actually needs to be stored? The value of individual data is often low, and the useful lifetime of the raw data short. However, the information content is potentially high – it’s just a matter of identifying the valuable information in the raw data.

Introducing SQLstream Server 2.5

For SQLstream, this is the future of data processing – real-time, continuous analysis of streaming data – generate operational business intelligence from live streaming data without first storing the data in a database.

For the latest release of SQLstream Server, SQLstream 2.5, we’ve focussed on the common business requirements that are required for the rapid adoption of real-time stream computing across all markets – performance, reliability and scalability. More specifically, SQLstream 2.5 offers:

  • - 10X performance improvement, benchmarked against live operational deployments on a single server installation.
  • - Scalability for mission critical applications with federated installations across multiple servers.
  • - Business critical reliability following an exhaustive stability and operational optimization program.

Of course, we’ve also addressed a range of important requirements across our customer base, in particular, additional input and output connectors built on the SQL/MED standard for integration, including:

  • - enhanced database insert/update/select Adapters.
  • - enterprise messaging integration using AMQP.
  • - enhanced Log File management and XML feed processing Adapters.

And last but by no means least, supporting the SQL:2008 standards-based streaming SQL language with new functions including:

  • - support for GROUP BY ORDER BY.
  • - new and enhanced data analysis functions for detecting unique events, such as early emit SELECT DISTINCT.
  • - support for the SQL HAVING function.
  • - and a new range of streaming statistical functions for calculating variance and standard deviation.

Most existing customers have already upgraded to SQLstream 2.5. Some examples of recent SQLstream 2.5 upgrades include customers in the following markets:

Intelligent Transportation – real-time analytics for the intelligent transportation market. A case study for SQLstream ITS Insight was featured recently in ITS International magazine, and an overview of the product’s feature can be found on www.sqlstream.com/Products/itsinsight.

Environmental monitoring and event detection – integrating with AMQP, which provides the guaranteed delivery of real-time raw data from a large sensor network, SQLstream filters (using windowed aggregation) the raw sensor and applies event detection patterns in real-time, generating a continuous stream of environmental exceptions events.

Social gaming infrastructure – working with a new entrant in the on-line social gaming market, SQLstream monitors user activity and provides continuous real-time scoring updates – including real-time incremental updates of historical, aggregated game data maintained in a back-end data warehouse.

SQLstream 2.5 is available now

But if you’d like to learn more about the Business Case for Streaming SQL, try our “Concepts in Streaming SQL” mini-white paper, or follow this blog over the next couple of months where we’ll be posting how some of these functions are being used to solve real-world problems across our customer base.

 

SQLstream at PostgreSQL Conference: West 2010 (2)

PostgreSQL Conference: West 2010Attendance at the PostgreSQL West 2010 Conference was encouraging considering a million people had gathered in the city to celebrate the World Series victory of the San Francisco Giants.

I’ve posted the presentation from the event (previously blogged about here) as given by myself and SQLstream’s chief architect Julian Hyde.

We presented the concepts of Streaming GIS, integrating SQLsteam’s real-time streaming data analytics with the PostgreSQL-based Geographic Information Systems (GIS) engine PostGIS. With examples from SQLstream’s commercial traffic congestion monitoring application, we discussed how sophisticated high performance real-time geospatial applications can be delivered quickly and easily using standards-based SQL.

Click here for a non-slideshare PDF version.

 

SQLstream at PostgreSQL Conference: West 2010

PostgreSQL Conference: West 2010SQLstream’s chief architect Julian Hyde and founding engineer Sunil Mujumdar are to present at PostgreSQL Conference: West 2010.

In a talk entitled Streaming GIS using PostGIS and SQLstream, Julian and Sunil will describe the SQLstream stream computing platform based on industry-standard SQL, and its integration with the PostgreSQL-based Geographic Information Systems (GIS) engine PostGIS.

Wireless sensors and internet services are generating data faster than conventional database technologies can process that data. In particular, mobile resource management requires the stream processing of high volume, location-based data. Solutions require new methods such as Streaming SQL to address these new sources of big data, in real-time. We’ll be discussing key concepts in stream computing, data warehouse feeds and the integration of PostGIS into a high performance streaming environment.

Using mobile resource management as a case study, we will illustrate with examples from SQLstream’s commercial traffic monitoring application.

 

Concepts in Streaming SQL

A streaming SQL query is a continuous, standing query that executes over streaming data. Data streams are processed using familiar SQL relational operators augmented to handle time sensitive data. Streaming queries are similar to database queries in how they analyze data; they differ by operating continuously on data as they arrive and by updating results in real-time.

Streaming SQL queries process dynamic, flowing data, in contrast to traditional RDBMSs, which process static, stored data with repeated single-shot queries. Streaming SQL is simple to configure using existing IT skills, dramatically reducing integration cost and complexity. Combining the intuitive power of SQL with this simplicity of configuration enables much faster implementation of business ideas, while retaining the scalability and investment protection important for business-critical systems.

By processing transactions continuously, streaming SQL directly addresses the real-time business needs for low latency, high volume, and rapid integration. Complex, time-sensitive transformations and analytics, operating continuously across multiple input data sources, are simple to configure and generate streaming-analytics answers as input data arrive. Sources can include any application inputs or outputs, or any of the data feeds processed or generated within an enterprise. Examples include financial trading data, internet clickstream data, sensor data, and exception events. SQL can process multiple input and output streams of data, for multiple publishers and subscribers. To learn more about Streaming SQL, please read our “Concepts in Streaming SQL” mini-white paper.

 

How SQLstream Can Reduce Complex Business Logic to a Single SQL Query

In the game industry, complex game logic needs to be applied to streams of events generated by gameplay.  In single player games, this logic is simply handled by applying the correct computations.  However, in an Internet based social game where millions of players interact together online, the problem takes on an entirely different dimension.  Storing the game events on disk inside of a database becomes increasingly difficult as the rate of gameplay events increases.  Logic and computation must be applied to the data and a disparate set of data must be queried to correctly update the game state.

The solution can be surprisingly simple and similar in form to storing all gameplay events in a traditional database.  SQLstream’s powerful streaming and windowed aggregation capabilities can reduce this use case and complex logic to a single query.

For example, consider a game where users create videos that are viewed and rated by other users:

  • A score for the video must be computed based upon the last 4 weeks of gameplay.
  • Let’s say a view is worth 12 points and various ratings levels are worth between 0 and 30 points.
  • The content’s score is the total points accumulated in the last week, plus 50% of the points in the week before that, plus 20% the points in the week before that and 10% of the points in the week before that.

Let’s assume we have two streams of data which contain streams of gameplay events:

--
-- A stream containing 1 row per an individual view of a video
--
CREATE STREAM "s_video_view" (
++++++++++++++++"video_id" INTEGER,+++++-- the viewed video
++++++++++++++++"user_id" INTEGER,++++++-- the id of the viewer
++++++++++++++++"performer_id" INTEGER++-- the id of the performer
++++++++++++++++);


--
-- A stream containing 1 row per an individual rating of a video
--
CREATE STREAM "s_video_rating" (
++++++++++++++++"video_id" INTEGER,+++++-- the viewed video
++++++++++++++++"performer_id" INTEGER,+-- the id of performer
++++++++++++++++"rater_id" INTEGER,+++++-- the id of the rater
++++++++++++++++"rating" INTEGER++++++++-– the rating given
++++++++++++++++);

To handle all the described business logic, we must first compute the total number of points generated from each gameplay event and add that to the stream of tuples entering the system.  The following query accomplishes this:

SELECT STREAM "performer_id", "video_id", 12 AS "points"
+++++FROM "s_video_view"
UNION ALL
SELECT STREAM "performer_id", "video_id",
++++++++++CASE "s_video_rating"."rating" WHEN 2 THEN 1
+++++++++++++++++++++++++++++++++++++++++WHEN 3 THEN 8
+++++++++++++++++++++++++++++++++++++++++WHEN 4 THEN 20
+++++++++++++++++++++++++++++++++++++++++WHEN 5 THEN 30
+++++++++++++++++++++++++++++++++++++++++ELSE 0
+++++++++++++++++++++++++++++++++++++++++END AS "points"
+++++FROM "s_video_rating"

This query will produce a stream of data containing the performer_id, video_id, and points for each scoring event in the system.

Next we must compute rolling time based score over the event stream for each unique video_id.  SQLstream provides the SQL:99, SQL:2003, and SQL:2008 standard WINDOW facility to make this easy:

WINDOW "last_7_days" AS ( PARTITION BY "video_id"
++++++++++++++++++++++++++RANGE INTERVAL '7' DAY PRECEEDING),
+++++++"last_14_days" AS ( PARTITION BY "video_id"
+++++++++++++++++++++++++++RANGE INTERVAL '14' DAY PRECEEDING),
+++++++"last_21_days" AS ( PARTITION BY "video_id"
+++++++++++++++++++++++++++RANGE INTERVAL '21' DAY PRECEEDING),
+++++++"last_28_days" AS ( PARTITION BY "video_id"
+++++++++++++++++++++++++++RANGE INTERVAL '28' DAY PRECEEDING)

These rolling windows contain all the scoring events over the last 7, 14, 21, and 28 days respectively grouped by the video_id.  This means that any aggregation function applied to that window will be applied to the stream events with the same video_id.  So, COUNT(*) OVER "last_7_days" would produce one row for each unique video_id with a scoring event in the last week.  Those rows would contain a count of the number of scoring events for each unique video_id.

By subtracting the SUM of the points in the 7 day window from the number of points in the 14 day window, we can compute the number of points in the week starting two weeks ago and ending one week ago.  This technique allows us to implement computations on rolling windows that are not bounded by the current time.

Putting the entire example together, we get the following view:

CREATE VIEW "v_video_score" AS
+++++SELECT STREAM "video_id", "performer_id",
+++++++++++++++SUM("points") OVER "last_7_days" +
+++++++++++++++++((SUM("points") OVER "last_14_days" -
+++++++++++++++++++SUM("points") OVER "last_7_days") * 0.5) +
+++++++++++++++++((SUM("points") OVER "last_21_days" -
+++++++++++++++++++SUM("points") OVER "last_14_days") * 0.2) +
+++++++++++++++++((SUM("points") OVER "last_28_days" -
+++++++++++++++++++SUM("points") OVER "last_21_days") * 0.1) AS "score"
++++++++FROM ( SELECT STREAM "performer_id", "video_id", 12 AS "points"
++++++++++++++++++++FROM "s_video_view"
+++++++++++++++UNION ALL
+++++++++++++++SELECT STREAM "performer_id", "video_id",
+++++++++++++++++++++++++++CASE "s_video_rating"."rating" WHEN 2 THEN 1
+++++++++++++++++++++++++++++++++++++++++++++++++WHEN 3 THEN 8
+++++++++++++++++++++++++++++++++++++++++++++++++WHEN 4 THEN 20
+++++++++++++++++++++++++++++++++++++++++++++++++WHEN 5 THEN 30
+++++++++++++++++++++++++++++++++++++++++++++++++ELSE 0
+++++++++++++++++++++++++++++++++++++++++++++++++END AS "points"
++++++++++++++++++++FROM "s_video_rating" )
++++++++WINDOW "last_7_days" AS ( PARTITION BY "video_id"
++++++++++++++++++++++++++++++++++RANGE INTERVAL '7' DAY PRECEEDING),
+++++++++++++++"last_14_days" AS ( PARTITION BY "video_id"
+++++++++++++++++++++++++++++++++++RANGE INTERVAL '14' DAY PRECEEDING),
+++++++++++++++"last_21_days" AS ( PARTITION BY "video_id"
+++++++++++++++++++++++++++++++++++RANGE INTERVAL '21' DAY PRECEEDING),
+++++++++++++++"last_28_days" AS ( PARTITION BY "video_id"
+++++++++++++++++++++++++++++++++++RANGE INTERVAL '28' DAY PRECEEDING);

which produces a stream of rows containing the video_id, performer_id, and a score computed over the rolling windows.  This view outputs a row every time a new event is inserted added the system.  The complex business logic has been reduced to one SQL query.

By attaching the results of this query to a foreign table, a nearly real-time cache of the videos and their current scores can be maintained in your database.  Alternatively, by using SQLstream’s JDBC driver, a distributed caching system such as memcached could be kept updated with the latest scores for each video simply by calling this query.  But those are topics for another post.

I hope you enjoyed this simplified real world example of how streaming SQL can:

  • simplify your business logic processing
  • improve your ability to deliver real-time data to your customers, clients, and colleagues.
 

The Business Case for Streaming SQL

Businesses need to respond faster than ever to customer information and demands, which are arriving in rapidly increasing volumes from ever more diverse and distributed systems. This need for real-time business models can not be addressed by traditional integration and business intelligence solutions because streaming analytics and related concepts are central to the solution. The real-time model means responding immediately to new information as it arrives and streaming analytics is at the core of these next generation IT systems.

Increasing the speed of business under these pressures of rapidly increasing data volume and more diverse data sources has been expensive and complex. Rapid responsiveness has proved elusive because real-time needs simply cannot be met by delivering more information faster from historical data. Real-time businesses require distributed technology that provides low latency and high-performance processing of data and event streams. By using continuous, streaming SQL queries, business answers can be generated as soon as input data becomes available. Whereas databases query historical data, streaming SQL queries and transforms data on the wire without any prior staging in a database.

As a result, streaming SQL is complementary to traditional EAI, business intelligence, and data warehousing solutions. By completing real-time processing and analysis before storing the data, streaming SQL delivers reduces the cost of processing rapidly arriving data. Even better, streaming SQL makes existing, in-house SQL skills immediately applicable to real-time analysis, reducing integration time and costs.

To learn more about the Business Case for Streaming SQL, please read our “Concepts in Streaming SQL” mini-white paper.

 

Can GPS solutions really monitor parolees in real-time?

A recent San Francisco Chronicle article described using GPS anklets to track former gang members on parole, expanding a program first used on paroled sex offenders.

The concept is great: if you know where parolees are, you can make sure they don’t violate parole restrictions (or catch them if they do). But the technology doesn’t always achieve the goal, for the simple reason that someone has to be tracking the anklets.

Technology can also help solve the problem. Standard data warehousing practices can tell authorities if someone violated parole yesterday, but having the information at the moment parole violations are occurring enables reacting quickly and decisively. In some cases, GPS time-&-location data can prevent crimes as well as aid in solving them after the fact.

Real time data analytics can do this by monitoring the information in real time and sending a page or other alert to parole officers as soon as a violation occurs.  At SQLstream, we’ve been working with customers to monitor real-time data from such sensors, providing instant real-time reports and alerts against pre-determined boundaries of time or space.

It’s also a growing market across the globe. In one state in Germany, ankle bracelets for monitoring time or location boundaries assigned to offenders on probation appear effective, helping probationers stick to a regular schedule, among other benefits.

The business case for better technology is driven by the potential for huge cost savings.  GPS anklet solutions cost only one-third what incarceration costs: about 33 euros per day versus about 100 euros for a day in jail (about $44 and $133, respectively). In California, using numbers from the Chronicle article and the California Department of Corrections and Rehabilitation, GPS ankle monitoring costs about $26 per day, while jail time averages about $133.

Such devices are also in use in other American communities, and German State justice ministers were scheduled to meet June 30th to discuss implementing the bracelets in other parts of Germany.

So it looks like GPS anklets for dangerous parolees may be here to stay, and with a step change in the supporting monitoring technology, true real-time analysis and reporting of exceptions and corrective action can be a reality.

 

Streaming Sensor Data

Railroads have used track side readers to scan bar codes on the sides of freight cars since the 1970s. Such sensors provided real time tracking of goods as they made their way from the supplier to the delivery point. Retail businesses increased the use of RFID tags in the past 20 years to track goods through the manufacturing process. Since the Indian Ocean tsunami of December 2004 the public has become aware of deep water pressure sensors which sit on the ocean floor to detect tsunamis and are intended to generate warnings about potential disasters.

The cost of sensors has decreased significantly in recent years and as a result inexpensive sensors are present nearly everywhere in businesses. As the price of sensors decreases it becomes economically feasible to deploy thousands and even millions of sensors. Such sensors cumulatively generated huge volumes of data. Imagine placing a sensor capable of measuring temperature, humidity, sun light and air pressure sensor within each square kilometer in the state of Iowa to assist farmers in managing crop production. Now imagine each of those 145,743 sensors generating 100 bytes of data every minute resulting in a data volume of nearly 21GB per day.

There is much buzz about Big Data and the challenges of applying traditional database management tools to extract business value from such data. Fortunately, there is a better way – integrating real time data, as provided by sensors, with stream analytic processing, allows timely enterprise decisions in response to changing conditions.

I urge you to read Damian Black’s recent postings on this blog describing the SQLstream approach to “Big Data”.

Read the rest of this entry »

 

Structure10 – after the “Big Data” event

GigaOM Structure 2010 Big Data and Cloud ComputingLast week I was on a panel for “Big Data” at Structure2010 – a GigaOm event. As usual, it was very well run and there was a large throng of silicon valley luminaries ranging from entrepreneurs to venture capitalists scattered in with some large customers and users of technology. We clearly have moved on a long way from the days when I was told to change my slides and remove the cloud graphic and replace it with a box because “clouds are cloudy” (direct quotation from a tier one venture capitalist – I wish to protect his identity to avoid personal embarrassment).

SQLstream is already the market leader in applying stream computing to Intelligent Transportation Systems, and we also have the opportunity to provide a similar impact to the Cloud Computing Service Monitoring space. It seems we have exactly the perfect solution to provide real-time insights into service usage, bottlenecks, error rates and service level compliance. And you can add regulatory compliance to that list too – from the continuous alerting side to complement the excellent historical solutions that are out there.

From the presentations at the show, it is clear that Cloud Computing has truly come of age. SQLstream uses cloud services for all demonstrations and also in our QA and Engineering processes. We also have customers deploying in the cloud. The latest emerging cloud solutions fill in many of the former technology gaps, allowing seamless integration into or transition from traditional data centers. You can even run your own private clouds leveraging the same APIs available on the public clouds.

On the Big Data front, on the panel alongside SQLstream were a Hadoop vendor and a high-performance column store data warehouse vendor. The other two panelists were users of “big data” technologies. It was interesting to discover that we already had two implementations where SQLstream operates in concert with or in parallel with the other two panelist vendors’ technologies.

There is even a customer (Mozilla) that uses all three technology approaches for download analytics – Hadoop in the form of HBase and a column store data warehouse for historical SQL queries over downloads, and SQLstream to generate high-performance continuous real-time analytics and reporting on download statistics for all versions of FireFox. This clearly demonstrates that there is a role for each of the Big Data technologies high-lighted on the panel, and an interesting and growing market opportunity. It also indicates some clear partnership opportunities.

I look forward to seeing the developments in our space and in cloud computing over the coming year and hope to be invited back again soon. We were originally present on the Big Data panel at GigaOm’s inaugural Structure2008 event, so I guess we should be set for a reappearance at Structure2012?! If so, I am sure we will have some exciting new stories to share.

Here is a link to the video recording of the panel session. A big thank-you to Phil Hendrix for his excellent moderation of the panel and the professional preparation work he did beforehand so that the actual event went smoothly.

 

Big Data – Dealing with the Data Tsunami

GigaOM Structure 2010 Big Data and Cloud Computing There is a lot of buzz these days about the challenge of “Big Data”.  I’ll be speaking on the subject at GigaOM’s Structure2010, on the “DEALING WITH THE DATA TSUNAMI: THE BIG DATA” panel. There are many dimensions to the challenges posed by “Big Data”, which I’ve presented here as five separate but related themes.

Speed of data arrival

The first theme is speed.  When a lot of data arrive fast, it is often overlooked that they arrive in raw form and need to be processed or cooked before they can be of any real value. The processing normally comprises cleaning, filtering, aggregating and validating.  Sometimes the data need to be enhanced, normalized or de-normalized.  While there are a number of proprietary ETL tools out there that can help, most people prefer to perform these operations using SQL.  This approach has become known as ELT as the data are Extracted, Loaded and then Transformed (as opposed to Transformed then Loaded).  In the past, this has meant loading raw data into a data warehouse’s staging tables and then performing the ELT with SQL in batches until the data are fully cooked and ready to take part in the “main course” queries. 

One of the strengths of the SQLstream approach is that for the first time you can use standards-based SQL for performing these ELT steps but as Continuous ETL rather than operating upon the data after first storing it.  We call this “analyze-before-store” approach: Query the Future – as the scope of the continuous queries is from the moment they start until the end of future time (in contrast with historical queries whose scope is from the moment they start until as far back in time as the data are stored).  SQLstream’s queries continuously process, clean, aggregate and enhance the data in a highly parallelized dataflow pipelined process.  The staging is in main memory using 64-bit architecture and multiple cores and servers.  This provides a highly scalable efficient and cost effective solution to ETL, with the virtuous side-effect of enabling the data warehouse to be kept continuously up-to-date by feeding it a stream of fully cooked data and updating its aggregate tables continuously in near real-time.  All of this is done without stealing valuable cycles of the data warehouse server.

Data location

The second theme is data location.  Like houses, location is very important when it comes to assessing the value (or usefulness) of the data.  Location might be spatial or temporal.  If you wish to be alerted of a special price for gas at a specific gas station, clearly it is of greater value if you are currently in the immediate vicinity of the gas station.  This shows the value of both the location in space and the location in time.  In contrast, most data warehouses dumbly store all service data and records without regard to their value. 

Clearly, the value of the data in many cases greatly diminishes over time.  Many of the queries that a business might pose are better targeted at current data.  That is particularly true of targeted advertisements, but also when monitoring customer service level, cloud computing infrastructure and the like.  The data are much more valuable when the business is able to take proactive initiative to capitalize on the value – fixing problems or issues before they negatively impact customers, or making that promotion or sale before the customer purchases product or service from a competitor.  SQLstream’s continuous queries are all about focusing analytics where they have the most value by specifying explicit windows of focus for the queries in terms of time, quantity or space.  While many rows can flow into and out of the window of focus for any given query, the window represents the immediate focus of attention.

Pace of change

The third theme is the pace of change of data.  If you have a large quantity of data that is not changing very much, then historical queries and analysis will no doubt provide you with all of your answers.  However, if the data are changing constantly, or a lot of new data arriving constantly, or if you have a focus on a specific window of time or space, then historical analysis has little value.  What you care about is the derivative of the change – the rates of change.  For example, are our sales accelerating or decelerating?  Is the rate of acceleration unusually high or low?  What about service outages and error rates?  Or customer complaints?  The SQLstream approach enables you to see what is changing rather than what is staying the same.  It is analogous to predator vision: the predators want to see what is moving and their vision system prioritizes that over what remains motionless.  SQLstream provides such dynamic vision.

Balancing historical and continuous analysis

The fourth theme is the need to complement data mining and the results of historical analysis with continuous analysis.  Data warehousing allows you to find patterns and predictors from past data and to back test all of your hypotheses over extended periods of time.  The back testing of such hypotheses often takes the form of SQL queries that search for patterns of changes of data over time and check that the predicted results occurred and with what frequency.  Once you have mined and captured such valuable predictors, it is straightforward to take the SQL you have generated and tweak it to be used in real-time, continuously executed against live data.  Using this approach, SQLstream allows you to leverage you data mining results to perform real-time predictive analytics, giving your business a real-time heads up for key indicators of buying signals, or systems’ failure or what ad should be served up based on a customer’s web behavior.

Brain over brawn processing

My fifth and final theme is “smart declarative” versus “dumb brute force” when applied to data queries.  The latter is how I see Hadoop-based approaches.  You parallelize a problem to take advantage of a lot of available servers and related CPU cycles, but you do not rely on any intelligence on how you partition the problem.  In fact not having to “think” is one of the primary appeals of the technique.  It is a brute force method of brawn over brain.  However, where the problem space is truly huge, or the time or financial budget is more limited, there is always the attraction of the “brain over brawn” technique.  Declarative SQL processing draws upon the mathematical tractability of analyzing patterns and dependencies within the data, the use of keys and indexing, the rewriting of complex formulae into simpler ones and avoiding recalculation of intermediate results – in order to provide a faster, more efficient and smarter way of finding the solutions.  Such declarative techniques can still take extensive advantage of parallelism and inexpensive or available servers and CPU cycles, but they rely on smart analysis in order to optimize the calculations.  SQLstream, and all SQL-based data warehouses, heavily draw upon these mathematical SQL properties and patterns and analysis of the data to do the smart thing when it comes to query processing. 

Stream Computing of the kind embodied by SQLstream however has even greater potential to take advantage of parallelism over and above SQL data warehouses because SQLstream’s Stream Computing has no transactional bottleneck and is purely declarative.  Input streams are not “side-effected” by the execution of stream SQL statements, rather new streams are created from the original ones (which are left untouched and can be presented concurrently to other SQLstream servers).  The execution paradigm is one of parallel dataflow execution – a paradigm that lends itself not only to massive parallel execution but also to massively distributed execution.  I believe that as Hadoop becomes more widely understood and deployed, people will begin to see just how much of a better job could be performed by adding a little intelligence and just how powerful declarative stream computing can be.

 

Streaming SQL and Bollinger Bands

Last year has been an interesting experience as I participated in a number of customer “Proof Of Concept” projects for SQLstream. Developing these real-time, stream computing projects greatly increased my appreciation for the advantages of an open, extensible and standards-compliant middleware infrastructure.

For example, I needed to implement an “edge detection” mechanism for a POC project. My colleagues at SQLstream recommended using “Bollinger bands” for determining outliers. So, I browsed through the  wikipedia entry for Bollinger Bands to learn more. Bollinger bands are very similar to standard deviations or quartile deviations. A Standard deviation measures variability or dispersion in data distribution. Bollinger bands, on the other hand, provide thresholds to filter outliers in the data. In fact, Bollinger bands are based on the moving average and moving standard deviation of the data set. For typical data sets, Bollinger bands can be defined as:

lowerBB(lower Bollinger Band) = avg – (k * stddev),

upperBB(upper Bollinger Band) = avg + (k* stddev)

where avg and stddev are the average and standard deviation over a sufficiently large time window and k is the constant that needs to be determined for the activity being monitored. For typical data sets, k = 2 will create the upper bollinger band at 95th percentile of the data set.

Bollinger Bands are widely used in the financial services industry. However, Bollinger Bands can be applied to solve problems in other industries. (As I am not claiming to be a statistics expert, I would certainly appreciate honest feedback on our application of Bollinger bands in streaming queries.)

Bollinger bands certainly are a good tool to identify sudden spikes in the activity being monitored in real-time. A number of examples come to my mind,

  • Sudden spikes in the price for a ticker symbol in a stock exchange. For example,

SELECT STREAM ROWTIME, ticker, price,

FROM (SELECT STREAM ROWTIME, ticker, price,

AVG(price) OVER (PARTITION BY ticker RANGE INTERVAL ‘1′ HOUR PRECEDING) AS “avgLastHour”,

STDDEV(price) OVER (PARTITION BY ticker RANGE INTERVAL ‘1′ HOUR PRECEDING) AS “stdDevLastHour”,

AVG(price) OVER (PARTITION BY ticker ROWS 5 PRECEDING) AS “avgLast5Trades”

FROM BIDS) AS S

WHERE S.”avgLast5Trades” > S.”avgLastHour” + 2 * S.”stdDevLastHour”;

  • Spikes in the error rate on a web server. For example,

SELECT STREAM ROWTIME, url, “numErrorsLastMinute”,

FROM (SELECT STREAM ROWTIME, url, “numErrorsLastMinute”,

AVG(“numErrorsLastMinute”) OVER (PARTITION BY url RANGE INTERVAL ‘1′ HOUR PRECEDING) AS “avgErrorsPerMinute”,

STDDEV(“numErrorsLastMinute”) OVER (PARTITION BY url RANGE INTERVAL ‘1′ HOUR PRECEDING) AS “stdDevErrorsPerMinute”

FROM “HttpRequestsPerMinute”) AS S

WHERE S.”numErrorsLastMinute” > S.”avgErrorsPerMinute” + 2 * S.”stdDevErrorsPerMinute”;

  • Monitoring call volumes in a call center.
  • Analytics on social/online gaming services.

In the Stream Computing context, Bollinger bands provide the high/low-water marks for monitoring activity. Whenever the level of recent activity crosses these Bollinger Band thresholds, the activity can be flagged. The streaming analytics engine can then perform additional analytics to detect patterns in the activity and to provide actionable information to regulate the system that is being monitored. At the very least, Bollinger bands can be used to filter out “uninteresting” rows from the stream, thereby reducing the load on the streaming pipeline.

At SQLstream, we used windowed aggregation functions such as AVG() OVER (…) and STDDEV() OVER (…) to establish Bollinger bands. It is necessary to compute AVG and STDDEV on sufficiently large windows of time. In a streaming context, we used sufficiently large windows of time to calculate Bollinger bands. So, as the window slides forward in time, the Bollinger bands reflect more recent activity levels. The current activity levels can then be computed on a much smaller window, potentially including only the current row in the stream. Should the current activity level cross either of the Bollinger bands, we then mark that as a spike in the activity level. The formula for Bollinger bands needs to be changed based on the data distribution, that is, to determine exactly what multiple of standard deviation is appropriate.

Coming back to my point about openness and extensibility, as you can see in the example queries above, you could execute a very similar query in Oracle or SQL server. Key features such as windowed aggregation functions, often called SQL OLAP functions, have been in SQLstream for a long time. Interestingly, SQLstream did not support STDDEV() windowed aggregation function during the POC. A lot of the SQL experts will know STDDEV can be easily rewritten using a formula involving AVG. Our Chief Architect, Julian Hyde, was quick enough to “sweeten” the deal by adding the “syntactic sugar” necessary to support STDDEV natively.

I am sure a lot of you readers have interesting ideas and questions. Please feel free to post them here and I will be happy to engage in conversation.

 

Intelligent Transportation and the ITSA Conference

Just back from the 2010 Intelligent Transportation Society of America’s Annual Meeting.  For those unfamiliar with intelligent transportation, I am not referring to the “shovel ready” projects that have been funded by President Obama as part of the economic stimulus package. These projects were designed to spend money and create jobs, thereby, stimulating the economy. Unlike the federal “shovel ready” projects, “network ready” intelligent transportation technologies and projects are rapidly being adopted and implemented by local and state departments of transportation that must still operate under fixed or reduced budgets. These local and state DOTs are using new technologies to “Do More with Less.”

ITSAIntelligent Transportation aims to reduce costs, delays, pollution, injuries and deaths by connecting infrastructure control and monitoring systems to the network and enabling these systems, and their operators, to communicate in real-time. Some examples of intelligent transportation solutions and control systems include dynamic speed limits that change according to traffic and road conditions, stop lights that know when you can go and the FasTrak electronic toll system that reduces congestion on the Golden Gate Bridge and other Bay Area bridges. Real-time technology is essential if these dynamic control systems are to collect your toll at 45 miles per hour or detect when it is safe to proceed through an intersection.

All of these intelligent transportation systems and devices can be thought of as “sensors” on the network. The data is collected by the sensors, streamed to a server, analyzed and eventually stored in a warehouse. (Imagine the final scene from Raiders of The Lost Ark, except with crates full of hard drives). Meanwhile, the analytic results are communicated back to the original sources (stop lights, toll booths and electronic road information signs) as well as to the mobile devices in your vehicle.

In some cases, new intelligent transportation solutions need to be integrated with legacy systems. In other cases, they simply need to be able to talk to each other. Thus, it becomes imperative that all new intelligent transportation solutions be built on a set of common, open standards. In the long run, solutions built on open standards reduce the total costs to those who implement and maintain the solutions. Open standards, and in particular, the global use of open data standards, within the intelligent transportation industry is essential, not just so that different sensors on the network and IT solutions can communicate with each other, but so that drivers can experience consistent and safe journeys as they cross from federal highways to state and local roads, always in contact with intelligent transportation systems that control these roads.