Blog


Vice President Marketing
April 25, 2012

The Text Analytics Summit in London this week was an opportunity to catch up on the latest trends and the state of the Text Analytics market.  It was an interesting couple of days, with a few themes emerging.

Firstly, Big Data.  Not entirely unexpected, but almost every presentation referred to Big Data in some shape or form.  In part this was referring to the volume of data to be processed, but primarily in the context of databases for the storage and processing of unstructured data of any volume.

Although not discussed explicitly, there’s obviously a search for business models that work.  Most applications were B2B platforms, sold as a package of product, services and consultancy, enabling organizations to better mine text data for market and competitor intelligence.  However, some were seeking to monetize through subscriptions to information feeds.

 

For SQLstream, we presented on the use of real-time text analytics for improving incident detection and prediction.  In particular, we covered the use of real-time Twitter and text messages for identifying Quality of Experience issues with IP content services, and the use of Twitter for improving real-time incident detection in transportation networks.  And in line with the rest of the conference, we did our bit for Big Data, describing how real-time streaming integration and analytics can be combined with unstructured data analytics as an integrated real-time Big Data and Hadoop platform.


Vice President Marketing
April 19, 2012

Joining real-time structured and unstructured data feeds for better accuracy and reliability from your operational intelligence, and the Text Analytics Summit, 2012, London.

Three IT trends have emerged over the past year: Big Data, real-time and the importance of unstructured data. Taking the latter first, there is an increasing awareness that much of the data we have available to us today is unstructured (Cloudera is among the many claiming that 80% of all data is unstructured).  Unstructured data includes text messages, documents, tweets, emails and video content. There’s also a growing industry for tools and software that perform unstructured data analytics, primarily text analytics using semantic modeling, tagging and subsequent analysis.

The past year has also seen Big Data and Hadoop emerge from the rarefied atmosphere of California’s Silicon Valley into mainstream IT.  Driven by statistics such as the claim that 90% of all data available today has been generated in the past two years, Big Data as a functional area for primarily unstructured data is here to stay, and is effectively supercomputing lite for the masses.

The need for real-time streaming data management

However, the real-time trend is less well served today by either Hadoop or by the currently available tools and software for unstructured data analytics. Real-time is about the need for immediate detection and response – turning data sources into live data feeds, and processing the data on the fly, then loading batch based distributed platforms such as Hadoop as an output data stream.
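As a rough illustration of the pattern described above (respond to each record as it arrives, then hand the same data on in batches to a store such as Hadoop), here is a minimal Python sketch. It is not SQLstream or Hadoop code: the function name `stream_then_batch`, the `latency_ms` field and the alert threshold are all assumptions made up for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, int]


def stream_then_batch(
    records: Iterable[Record],
    is_alert: Callable[[Record], bool],
    on_alert: Callable[[Record], None],
    batch_size: int,
) -> Iterator[List[Record]]:
    """Check each record as it arrives, then emit batches for a downstream store."""
    batch: List[Record] = []
    for record in records:
        # Immediate detection and response: no waiting for a batch job.
        if is_alert(record):
            on_alert(record)
        # Every record is still kept for later batch-based analysis.
        batch.append(record)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    # Flush whatever is left when the input ends.
    if batch:
        yield batch


if __name__ == "__main__":
    # Hypothetical feed: latency readings in milliseconds.
    feed = [{"latency_ms": v} for v in (10, 900, 20, 30, 950)]
    alerts: List[Record] = []
    for out_batch in stream_then_batch(
        feed, lambda r: r["latency_ms"] > 500, alerts.append, batch_size=2
    ):
        print("batch ready for loading:", out_batch)
    print("alerts raised on the fly:", alerts)
```

The point of the sketch is the ordering: the alert fires while the record is in flight, and the batch output is just another stream that happens to feed a batch platform.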

‘Stream Reasoning’

I’ve also seen the term ‘stream reasoning’ used to describe the real-time processing of unstructured data, although this is still an area that is less well developed and understood than the more mainstream text analytics from stored data.  ‘Stream reasoning’ is the ability to process and respond to semantic knowledge about tweets, messages and other social media interaction in real-time, on the fly. The diagram below illustrates how a semantic modeling library has been plugged into a real-time streaming pipeline in SQLstream. The example is based on SQLstream’s GATE UDX, but any library with reasonable performance and a query response API can be plugged in.

Combining streaming structured and unstructured live data feeds

Unstructured data feeds, such as text messages and tweets, are streamed through the semantic tagging UDX and library, with the output of this stage being real-time streams of semantic tagged data.  The data can then be analyzed and frequency charted in real-time.
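To make the tagging and frequency-counting stage more concrete, here is a minimal Python sketch of the idea. It does not use SQLstream’s GATE UDX or any real semantic library: the keyword table `TAG_KEYWORDS` and the functions `tag_message` and `windowed_tag_counts` are hypothetical stand-ins, and a real deployment would call a proper semantic tagging library at that point.

```python
from collections import Counter, deque
from typing import Deque, Dict, Iterable, Iterator, Set, Tuple

# Hypothetical keyword-to-tag table standing in for a real semantic library.
TAG_KEYWORDS: Dict[str, str] = {
    "buffering": "video_quality",
    "frozen": "video_quality",
    "dropped": "call_quality",
    "jam": "congestion",
    "accident": "incident",
}


def tag_message(text: str) -> Set[str]:
    """Return the set of tags whose keywords appear in the message."""
    words = {w.strip(".,!?#").lower() for w in text.split()}
    return {tag for keyword, tag in TAG_KEYWORDS.items() if keyword in words}


def windowed_tag_counts(
    messages: Iterable[Tuple[float, str]], window_seconds: float
) -> Iterator[Tuple[float, Dict[str, int]]]:
    """For each (timestamp, text) message, yield tag counts over the sliding window."""
    window: Deque[Tuple[float, Set[str]]] = deque()
    counts: Counter = Counter()
    for ts, text in messages:
        tags = tag_message(text)
        window.append((ts, tags))
        counts.update(tags)
        # Evict messages that have fallen out of the time window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_tags = window.popleft()
            counts.subtract(old_tags)
        yield ts, {tag: n for tag, n in counts.items() if n > 0}


if __name__ == "__main__":
    feed = [
        (0, "Video keeps buffering!"),
        (10, "Stream frozen again"),
        (70, "Huge jam on the M25"),
    ]
    for ts, freq in windowed_tag_counts(feed, window_seconds=60):
        print(ts, freq)
```

Each yielded dictionary is the kind of per-tag frequency that could be charted as the stream runs.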

Text Analytics Summit, 2012, London

I’ll be speaking on this topic at the Text Analytics Summit, 2012, London.  I’ll be discussing how to combine stream reasoning (admittedly, mostly Twitter messages) with structured data, with the objective of improving the overall accuracy and reliability of the resulting operational intelligence.  I’ll be using a couple of examples: customer experience management for IP content services such as VoIP and VoD, and improving the accuracy and reliability of traffic congestion and travel time information, in particular how text analysis of tweets and messages can help to pinpoint the severity of road network traffic problems.

Look forward to seeing you there, or if you can’t make it, I’ll be blogging on the highlights next week.

 


Vice President Marketing
April 2, 2012

Last week SQLstream sponsored Structure Data in New York, a conference exploring “the technical and business opportunities spurred by the growth of big data”, and CEO Damian Black presented there.

It’s clear that Big Data has moved on considerably in a very short space of time, from the Silicon Valley, 101 world of Java developers and Hadoop into the mainstream wider business world (but still with Hadoop!).

Some themes emerging from the conference:

  • The basic need to deliver high performance, massively scalable computing infrastructure as data volumes grow exponentially. It’s clear that the pain from structured and unstructured data is driving different approaches at different stages in the data management lifecycle – better visualizations, better cleansing and filtering, and a better understanding of the appropriate analytics tools that are most applicable at each stage.
  • The emergence of the SQL layer. It’s clear Hadoop has its strengths and is here to stay. It’s effectively ‘supercomputing lite’ and, given today’s data volumes, is just the tool for the job. However, there are a couple of trends emerging. First, is it actually necessary to store all the data, when much of it is obviously not of interest? Second, once the initial analysis of both structured and unstructured data is achieved, there’s an emerging layer above Hadoop that’s looking very structured.  Both these functions are looking much more SQL-like.
  • Real-time, low latency analytics. Hadoop is not, nor does it claim to be, a low latency, real-time data management platform. There is a well-defined business need to analyze log file, sensor and network data in real-time (sub-second to a few minutes latency), but also to stream the arriving data through to Hadoop for further analysis. Obviously this layer needs to be as scalable as the underlying Hadoop platform, if not more so.

Damian’s presentation at Structure Data focused on relational streaming: massive-scale parallel data processing using SQL, generating real-time results from streaming input data. The talk described relational streaming as a standalone real-time management layer, and also SQLstream integrated with Hadoop as the streaming layer in the Big Data stack (you can also read the GigaOM report in the presentation here).

 

Posted under Big Data · Events · In the News