Extract, Transform and Load – ETL

Before people started using MOM (Message-Oriented Middleware) to share changing data between applications, they used a technology known as ETL (Extract, Transform and Load). ETL collected data; cleaned, aggregated and transformed it; and shipped it to a central location, optionally to populate a database or data warehouse.

Historically Batch-Oriented

The ETL process was and is inherently batch-oriented. As market pressures and business needs have forced ETL vendors to offer more of a real-time solution, their main response has been to move to micro-batches. The idea is simple – if the batches are relatively small, they can label their activities “near real-time” without having to significantly rework their product or approach. However, moving to micro-batches offers no architectural benefits and no competitive leverage against products designed from inception to provide real-time, continuous processing.
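The latency cost of micro-batching can be pictured with a small, idealized model. The sketch below assumes batches fire at fixed multiples of an interval and ignores processing time entirely; the function names and arrival times are hypothetical and are not taken from any ETL product.

```python
import math

def micro_batch_latency(arrival, interval):
    """Time a record waits until the next batch boundary at or after its arrival."""
    next_boundary = math.ceil(arrival / interval) * interval
    return next_boundary - arrival

def continuous_latency(arrival):
    """Idealized record-at-a-time processing: each record is handled on arrival."""
    return 0

# Hypothetical arrival times (in seconds) and a 5-second micro-batch interval.
arrivals = [1, 4, 7, 12]
print([micro_batch_latency(t, 5) for t in arrivals])  # [4, 1, 3, 3]
print([continuous_latency(t) for t in arrivals])      # [0, 0, 0, 0]
```

Shrinking the interval lowers the waiting time in this model, but every record still waits for a batch boundary, which is the structural difference from record-by-record processing.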

Characteristics of Traditional ETL Tools

ETL tools are useful and perform valuable tasks, but they are functionally inferior to Relational Asynchronous Messaging (RAM) and the RAM Management System (RAMMS) applied to the same tasks. In general, ETL tools are

  • not declarative,

  • often hard to manage or administer from a central location,

  • not self-healing or optimizing, requiring substantial interaction to fix, maintain, enhance, upgrade, or optimize for particular needs, and

  • not linked easily to the relational data model.

As a result, they unsurprisingly do not support standard driver interfaces such as ODBC or JDBC, and they do not follow the SQL security model or other related standards.

Nor do they offer the plug-in extensibility and elegance of a RAMMS.

ETL tools often do support rich and intuitive user interfaces, and allow for easy repeatability in terms of batch scheduling to run their processes. They also are reliable in delivering and loading data into their target databases and data warehouses.

However, many of the graphical user interfaces available in today's ETL tools map simple transformations of data by using mouse clicks and lines dragged from source fields to destination fields.  This approach is great for small numbers of fields and simple data remapping, but it does not scale to the increasingly complicated, real-world scenarios facing customers today.  

The most maintainable and scalable approaches are all language-based. SQL shines here by providing excellent facilities for transforming individual fields and handling complicated joins and aggregations of data. Another advantage of the SQL approach is that it enables solutions that can be updated programmatically.

In other words, one can create SQL-based programs and readily edit their SQL rules to handle changes in business policy. Normally, such changes are simple, such as a changed threshold or validation parameter.
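As a minimal sketch of such a rule, the example below keeps a business rule in SQL and exposes the threshold as a named parameter, so a policy change only touches that value. It uses Python's built-in sqlite3 module as a stand-in engine; the `orders` table, its columns, and the threshold values are hypothetical and are not drawn from RAM or any ETL product.

```python
import sqlite3

# Hypothetical business rule: summarize orders whose amount exceeds a threshold.
# Table name, column names, and thresholds are illustrative assumptions.
FLAG_LARGE_ORDERS = """
    SELECT region,
           COUNT(*)    AS flagged_orders,
           SUM(amount) AS flagged_total
    FROM orders
    WHERE amount > :threshold
    GROUP BY region
    ORDER BY region
"""

def make_sample_db():
    """Build an in-memory database with a few sample orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("east", 120.0), ("east", 900.0), ("west", 1500.0), ("west", 50.0)],
    )
    return conn

def run_rule(conn, threshold):
    """Apply the rule with the given threshold and return the result rows."""
    return conn.execute(FLAG_LARGE_ORDERS, {"threshold": threshold}).fetchall()

conn = make_sample_db()
print(run_rule(conn, 500.0))   # [('east', 1, 900.0), ('west', 1, 1500.0)]
print(run_rule(conn, 1000.0))  # [('west', 1, 1500.0)]
```

The second call models a policy change: the SQL text is untouched and only the parameter moves.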

Functionality Subsumed in the RAM Model

In many ways, the RAM model subsumes the functions of both ETL and MOM. Moreover, RAM achieves this by leveraging SQL, a standards-based, widely understood language, allowing nearly instant productivity for the large pool of SQL-literate programmers.

Basic ETL Requirements Met by RAM

A review of the high-level requirements for ETL tools reveals just how applicable the RAM model is:

  • Collect data from a wide range of distributed, distant locations, formats, and sources

  • Efficiently process both large and small batches of data records

  • Reliably deliver the data to the specified locations

  • Easily populate external applications and databases

  • Transform data, and check and enhance against external data tables

  • Easily manage the whole process with ready repeatability

Additional Facilities Inherent in RAM

Capabilities inherent in RAM that ETL tools cannot easily provide include the following:

  • Allows arbitrary reuse and re-purposing of the data streams

  • Allows arbitrary views over the data streams while the data continues to flow, without using intermediate databases or tables

  • Performs continuous, real-time processing down to record-by-record granularity

  • Manages timestamps to perform time-sensitive or time-transformation operations

  • Provides standard SQL views, queries, and operations

  • Provides purely declarative specification of operations with automatic optimization

  • Provides automatic load-balancing and self-healing recovery from node failures

  • Implements and offers the familiar SQL data management paradigm right down to security model, data querying, reporting, and related drivers

  • Allows easy extensibility, without stopping ongoing application execution, by using plug-in modules

  • Delivers the end-to-end throughput of an optimized RAMMS-based RAM system

While ETL may have a host of other deficiencies when compared to RAM, the ones listed above serve to illustrate the powerful advantages RAM provides over the “old school of integration” – ETL.
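Two of the capabilities listed above, multiple views over one flowing stream and timestamp-based operations at record-by-record granularity, can be pictured with a small sketch. This is plain Python rather than RAM or RAMMS syntax; the class names, the `publish` helper, and the sample records are all hypothetical.

```python
from collections import defaultdict

class TumblingCount:
    """View 1: counts records per fixed-width time window, keyed by window start."""
    def __init__(self, width):
        self.width = width
        self.counts = defaultdict(int)

    def on_record(self, record):
        timestamp, _value = record
        window_start = (timestamp // self.width) * self.width
        self.counts[window_start] += 1

class ThresholdFilter:
    """View 2: keeps only records whose value exceeds a limit."""
    def __init__(self, limit):
        self.limit = limit
        self.matches = []

    def on_record(self, record):
        if record[1] > self.limit:
            self.matches.append(record)

def publish(stream, views):
    """Hand each record to every view as it arrives, with no intermediate table."""
    for record in stream:
        for view in views:
            view.on_record(record)

# Hypothetical (timestamp, value) records.
stream = [(0, 10), (3, 25), (5, 7), (9, 40), (11, 3)]
per_window = TumblingCount(width=5)
large_values = ThresholdFilter(limit=20)
publish(iter(stream), [per_window, large_values])

print(dict(per_window.counts))  # {0: 2, 5: 2, 10: 1}
print(large_values.matches)     # [(3, 25), (9, 40)]
```

Both views consume the same records in a single pass, which is the reuse property the list describes; a declarative system would express each view as a query rather than as hand-written classes.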

RAM as the Grail of Merged EAI and EII

Many industry analysts have been looking for the “Holy Grail” of data and application integration. That Grail would be a way to seamlessly blend Enterprise Application Integration (EAI) with ETL and Enterprise Information Integration (EII).  We would argue that RAM is in fact that Holy Grail.

On the EII page, we explore why RAM in many ways subsumes the need for EII.