Blog


Vice President Marketing
October 11, 2012

Perhaps the highlight of Oracle OpenWorld last week, or at least the item most commented on by attendees at our booth, was Larry Ellison’s demo of Exadata and Exalytics – querying 10 days or so of stored Twitter feeds in the hope of finding the best US athlete from the recent London 2012 Olympics to endorse a car company. This seemed to strike a chord with the audience. How many organizations employ a marketing analytics company to spend a vast amount of time poring over data to work out the top candidates for a marketing campaign? That said, would the CMO really go with a query result, or choose their favorite in any case?

Cloud and business applications were a focus, although as others have blogged elsewhere, despite 80+ acquisitions in the past few years, Oracle remains a database company. Major announcements / news included:

  • Release of Oracle 12c (the ‘c’ for ‘cloud’), and the announcement of its first multi-tenanted and ‘pluggable’ databases got a few ripples of applause from the audience.
  • Exadata X3 box, the in-memory machine with 22 raw TB of memory and a claimed 10X compression, making a total of 220TB of ‘memory’ in a rack. Oracle claims this is 100 times faster than the Exadata Oracle launched in the last few years.

Streaming analytics and Twitter

Back to Larry’s Twitter example. Of course, this can be achieved easily as a streaming application in real-time. Semantic streaming is something SQLstream has been doing for some time, taking unstructured data such as tweets, emails and texts and determining sentiment and aggregated scoring in real-time. Use cases include identifying traffic incidents on the road networks to augment geospatial analysis of vehicle GPS data, and also in telecommunications, to better determine in real-time a customer’s true perception of their quality of experience for delivered services.

The numbers seemed impressive – Larry crunched nearly five billion tweets and 27 billion social media relationships. But breaking this down, is this really a Big Data problem? Five billion tweets, even if squeezed into a one-day period (I believe the demo covered 10 days), works out at only about 58,000 tweets per second. This is well in excess of Twitter’s peak loads during major events such as the Super Bowl, but well within the capability of SQLstream’s real-time streaming Big Data platform, even on an entry-level, single-server, 2-core machine. Of course, the complete solution architecture may include data storage platforms such as Hadoop or Oracle, where aggregated streaming results can be loaded and persisted in real-time, further crunched in the data warehouse, and historical analysis joined back with the real-time streams to help better identify any moving trends.
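The back-of-envelope rate quoted above is easy to check. The short Python sketch below assumes the tweets arrive evenly over the period, which real traffic does not do, so treat the results as averages rather than peak loads. The function name is purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def tweets_per_second(total_tweets: int, days: float) -> float:
    """Average arrival rate, assuming tweets are spread evenly over the period."""
    return total_tweets / (days * SECONDS_PER_DAY)


# Five billion tweets squeezed into one day, and spread over ten days.
one_day_rate = tweets_per_second(5_000_000_000, 1)
ten_day_rate = tweets_per_second(5_000_000_000, 10)

print(f"{one_day_rate:,.0f} tweets/s over 1 day")    # 57,870 tweets/s over 1 day
print(f"{ten_day_rate:,.0f} tweets/s over 10 days")  # 5,787 tweets/s over 10 days
```

Spread over the 10 days that the demo apparently covered, the average drops to under 6,000 tweets per second.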

It was an interesting demo nonetheless, and one that really should be tackled in real-time as a streaming problem. SQLstream’s ability to analyze and aggregate streams (in this case, across keywords and hashtags), to provide geospatial and clustering analysis, and to deliver raw and aggregated data as continuous streams to the backend storage platforms makes this very achievable today.

On the show floor

Apart from the heavy footfall at the SQLstream booth, perhaps most notable were the increasingly uninventive marketing mechanisms used to persuade unsuspecting attendees to listen to product pitches on the promise of winning a piece of Apple hardware. Surely marketing managers can think up something a bit more inventive than an iPad? The exception was the speaking robot. I’m not sure if this was an exhibit floor attraction, although I saw it ‘chatting’ to passersby on the Wipro booth.

Contact us if you’d like to find out more about SQLstream and our streaming Big Data management platform.


Test
May 10, 2012

This week I’m attending an interesting conference at UC Berkeley called the “Berkeley conference on Streaming Data”. The organizers are primarily astronomers and statisticians, but the talks discuss issues and solutions to streaming data problems across a wide selection of scientific areas and engineering applications. Real-time streaming Big Data applications presented included oceanography, biology, genetics, reading handwriting, astrophysics, particle physics, recommendation engines for social media, and inevitably, real-time fraud detection from live data feeds.

I presented on a deployment of SQLstream as a Dynamically Scalable Cloud Platform for the Real-Time Detection of Seismic Events. Based on work with UCSD seismologists, SQLstream has been deployed to detect significant events in data collected from a large grid of seismic sensors. A large-scale data infrastructure (the OOI/CI) provides raw signal data over an AMQP message bus.

Plot of Seismic Events

SQLstream monitors live seismic data feeds in real-time, applying heuristic algorithms that look for patterns indicating earthquakes. The live system scales dynamically across multiple servers in a cloud environment based on the current demand. You can view the presentation here.  I also blogged previously on the application here.
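For readers wondering what a heuristic detector over a live signal can look like, here is a minimal Python sketch of an STA/LTA trigger (short-term average over long-term average), a widely used technique in seismology. This is an illustration only and not necessarily the algorithm used in the deployment described above. The window lengths and threshold are arbitrary assumptions.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def sta_lta_triggers(
    samples: Iterable[float],
    short_len: int = 5,      # assumed short window, in samples
    long_len: int = 50,      # assumed long window, in samples
    threshold: float = 4.0,  # assumed trigger ratio
) -> Iterator[Tuple[int, float]]:
    """Yield (sample index, STA/LTA ratio) whenever the ratio reaches the threshold."""
    short = deque(maxlen=short_len)  # most recent signal energies
    long = deque(maxlen=long_len)    # longer history used as the background level
    for index, value in enumerate(samples):
        energy = value * value
        short.append(energy)
        long.append(energy)
        if len(long) < long_len:
            continue  # wait until the background window is full
        sta = sum(short) / short_len
        lta = sum(long) / long_len
        if lta > 0 and sta / lta >= threshold:
            yield index, sta / lta


# Quiet background, a short burst, then quiet again (synthetic data).
signal = [0.1] * 100 + [5.0] * 5 + [0.1] * 20
triggers = list(sta_lta_triggers(signal))
print(triggers[0][0])  # index of the first triggered sample
```

Because the function consumes an iterator and yields as it goes, it works on an unbounded feed in the same way as on the synthetic list used here.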

In conclusion, I have two main observations from the conference so far (it continues until Friday). The first is that the majority of fields in science and technology appear to have a Big Data problem, and often a real-time Big Data problem. The second is the extent of the innovation and computer science resources dedicated to solving these problems, in particular at this conference, developing algorithms for data analysis and machine learning (that is, automatic pattern recognition) that work on streams of flowing data. It’s clear that traditional data management and even Big Data batch-based methods don’t work when you need continuous results from dynamic data. And the amount of data is huge.


Vice President Marketing
April 19, 2012

Joining real-time structured and unstructured data feeds for better accuracy and reliability from your operational intelligence, and the Text Analytics Summit, 2012, London.

Three IT trends have emerged over the past year – Big Data, real-time and the importance of unstructured data. Taking the latter first, there is an increasing awareness that much of the data we have available to us today is unstructured (Cloudera is amongst the many claiming that 80% of all data is unstructured). Unstructured data includes text messages, documents, tweets, emails and video content. There’s also a growing industry for tools and software that perform unstructured data analytics – primarily text analytics using semantic modeling, tagging and subsequent analysis.

The past year has also seen Big Data and Hadoop emerge from the rarefied atmosphere of California’s Silicon Valley into mainstream IT.  Driven by statistics such as 90% of all data available today has been generated in the past two years, Big Data as a functional area for primarily unstructured data is here to stay, and is effectively supercomputing lite for the masses.

The need for real-time streaming data management

However, the real-time trend is less well served today by either Hadoop or by the currently available tools and software for unstructured data analytics. Real-time is about the need for immediate detection and response – turning data sources into live data feeds, and processing the data on the fly, then loading batch based distributed platforms such as Hadoop as an output data stream.

‘Stream Reasoning’

I’ve also seen the term ‘stream reasoning’ used to describe the real-time processing of unstructured data, although this is still an area that is less well developed and understood than the more mainstream text analytics from stored data. ‘Stream reasoning’ is the ability to process and respond to semantic knowledge about tweets, messages and other social media interaction in real-time, on the fly. The diagram below illustrates how a semantic modeling library has been plugged into a real-time streaming pipeline in SQLstream – the example is based on SQLstream’s GATE UDX, but any library with reasonable performance and a query response API can be plugged in.

Combining streaming structured and unstructured live data feeds

Unstructured data feeds, such as text messages and tweets, are streamed through the semantic tagging UDX and library, with the output of this stage being real-time streams of semantic tagged data.  The data can then be analyzed and frequency charted in real-time.
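To make the shape of that pipeline concrete, here is a minimal Python sketch of messages flowing through a tagging stage and then a running frequency count. It is not SQLstream or GATE code. The keyword lexicon and every function name are hypothetical stand-ins for a real semantic tagging library.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, Set, Tuple

# Hypothetical keyword-to-tag lexicon; a real pipeline would call a semantic library here.
LEXICON = {
    "accident": "incident",
    "crash": "incident",
    "jam": "congestion",
    "queue": "congestion",
}


def tag_message(text: str) -> Set[str]:
    """Return the set of tags whose keywords appear in the message."""
    words = (word.strip(".,!?#").lower() for word in text.split())
    return {LEXICON[word] for word in words if word in LEXICON}


def tagged_stream(
    messages: Iterable[str],
    tagger: Callable[[str], Set[str]] = tag_message,
) -> Iterator[Tuple[str, Set[str]]]:
    """Tagging stage: emit each message with its tags as soon as it arrives."""
    for text in messages:
        yield text, tagger(text)


def running_frequencies(
    tagged: Iterable[Tuple[str, Set[str]]],
) -> Iterator[Dict[str, int]]:
    """Analysis stage: emit updated tag counts after every message."""
    counts: Counter = Counter()
    for _text, tags in tagged:
        counts.update(tags)
        yield dict(counts)


messages = ["Crash on the A40, long queue", "Traffic jam near junction 5", "Lovely morning"]
snapshots = list(running_frequencies(tagged_stream(messages)))
print(snapshots[-1])
```

The tagger is passed in as a parameter, so the keyword lookup could be swapped for any library that maps a message to a set of tags without touching the rest of the pipeline.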

Text Analytics Summit, 2012, London

I’ll be speaking on this topic at the Text Analytics Summit, 2012, London. I’ll be discussing how to combine stream reasoning (admittedly, mostly over Twitter messages) with structured data, with the objective of improving the overall accuracy and reliability of the resulting operational intelligence. I’ll be using a couple of examples – customer experience management for IP content services such as VoIP and VoD, and improving the accuracy and reliability of traffic congestion and travel time information. How can text analysis of tweets and messages help to pinpoint the severity of road network traffic problems?

Look forward to seeing you there, or if you can’t make it, I’ll be blogging on the highlights next week.

 


Vice President Marketing
March 20, 2012

Visit our new website to find out more about real-time Big Data applications

Big Data is here to stay. The breadth of the term Big Data may change as it becomes as much a marketing imperative as the ‘Cloud’ word, but ‘supercomputing lite’ processing for the non-supercomputing world of enterprise data is a must-have.

The rise of Big Data has happened in parallel with the emergence of real-time operational intelligence, and the extension of real-time analytics into the world of real-time updates and process control. Much of the recent interest has focussed on how these two worlds merge into a single complementary solution.

The NoSQL Big Data platforms offer massively scalable, resilient data processing over commodity hardware, and are ideally suited to scaling large data problems over hundreds or thousands of servers. However, platforms such as Hadoop do not support, nor were they designed to support, real-time streaming data processing and analytics. Their forte is the batch-based, highly scalable, store-compute loop of map/reduce.

That’s where SQLstream comes in. SQLstream collects and conditions real-time updates from sources such as log files, sensor networks and GPS events. It not only integrates streaming data into and from Big Data stores, but also generates real-time analytics from the data as they stream past. The SQLstream architecture also has parallels to that of map/reduce. SQLstream uses Relational Streaming, which is a paradigm for processing streaming Big Data tuples using standard SQL queries. SQL offers strong potential for automatic optimization and distributed parallel processing of streaming data. Whereas platforms such as Hadoop execute batch queries over stored tuples, SQLstream and Relational Streaming execute continuous queries over arriving data.
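The difference between a batch query and a continuous query is perhaps easiest to see in code. The Python sketch below is not SQLstream syntax. It is a plain-Python analogue of a sliding-window count per key that emits a fresh result for every arriving tuple. It assumes events arrive in timestamp order, and the function name and window size are illustrative.

```python
from collections import Counter, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple


def sliding_window_counts(
    events: Iterable[Tuple[float, str]],
    window_seconds: float,
) -> Iterator[Tuple[float, Dict[str, int]]]:
    """For each arriving (timestamp, key), emit per-key counts over the trailing window."""
    window: Deque[Tuple[float, str]] = deque()
    counts: Counter = Counter()
    for timestamp, key in events:  # assumes timestamps are non-decreasing
        window.append((timestamp, key))
        counts[key] += 1
        # Evict tuples that have aged out of the window.
        while window and window[0][0] <= timestamp - window_seconds:
            _old_ts, old_key = window.popleft()
            counts[old_key] -= 1
            if counts[old_key] == 0:
                del counts[old_key]
        yield timestamp, dict(counts)


events = [(0.0, "a"), (1.0, "b"), (2.0, "a"), (12.0, "a")]
results = list(sliding_window_counts(events, window_seconds=10.0))
for timestamp, snapshot in results:
    print(timestamp, snapshot)
```

A batch job would wait for the full data set and answer once. Here the answer is revised on every arrival, and only the tuples inside the window are ever held in memory.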

We’re also at Structure Data this week in New York, where our CEO, Damian Black, will be presenting on the wider area of streaming Big Data and massive scalability. If you are attending, visit us for a demo of the ‘millions of events per second’ program, and a demonstration of massively parallel stream processing on an Elastic Compute Cloud.