Software moves streams in real time
By Ugur Cetintemel, Network World, 05/02/2005
This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.
Applications that process real-time datastreams are pushing the limits of traditional data processing technologies. These
applications are characterized by the need for sub-second response times - whether they involve automating trades, monitoring
networks for intrusions, or tracking credit card transactions for fraud. Applications that depend on the traditional store-and-query
model cannot handle the volume and velocity of streaming data, whose value might exist only in the moment.
A stream-processing engine (SPE) is data management software that enables the execution of queries and computations - and
ultimately, actions - on streaming data in real time. Previously, such queries and computations could be executed only against
stored data, using standard database management systems. An SPE accepts SQL-like, stream-oriented, continuous queries and executes
them over live event streams, outputting results in real time.
An SPE achieves real-time operation by integrating several mechanisms. First, it supports inbound processing, in which incoming
event streams immediately start to flow through the continuous queries as they enter the system. The queries transform the
events as they move, continuously producing results, all in main memory. Read or write operations to storage are optional
and can be executed asynchronously in many cases.
Inbound processing overcomes a limitation of the traditional outbound processing model that conventional database management
systems employ, in which data must be inserted into the database and indexed before any processing can take place. By removing storage
from the critical path of processing, an SPE achieves significant performance gains compared with traditional processing approaches.
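
The difference between the two models can be illustrated with a minimal Python sketch. It is not taken from any particular SPE; the names `continuous_filter`, `run_inbound` and the `amount` field are hypothetical, and a plain list stands in for the optional storage layer. Each event passes through the query as soon as it arrives, while a background thread archives it off the critical path.

```python
import queue
import threading


def continuous_filter(events, threshold):
    """Hypothetical continuous query: emit each large event as it arrives."""
    for event in events:
        if event["amount"] > threshold:
            yield event


def run_inbound(events, threshold):
    """Process events in memory; archive them asynchronously (assumed design)."""
    archive = []             # stand-in for optional persistent storage
    pending = queue.Queue()  # hands events to the storage thread

    def writer():
        # Storage runs on its own thread, off the processing path.
        while True:
            item = pending.get()
            if item is None:  # sentinel: the stream has closed
                break
            archive.append(item)

    storage_thread = threading.Thread(target=writer)
    storage_thread.start()

    def tap(source):
        # Enqueue each event for storage, then pass it on immediately.
        for event in source:
            pending.put(event)
            yield event

    results = list(continuous_filter(tap(events), threshold))

    pending.put(None)
    storage_thread.join()
    return results, archive


events = [
    {"id": 1, "amount": 50},
    {"id": 2, "amount": 500},
    {"id": 3, "amount": 5000},
]
alerts, archived = run_inbound(events, threshold=100)
```

In this sketch the query never waits for a write to complete. A store-and-query design would have to insert and index every event before the filter could see it.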
Second, an SPE adopts a single-process model, in which all time-critical operations (including event processing, storage and
execution of custom application logic) are run as part of one multi-threaded process. This integrated approach eliminates
high-overhead process switches present in solutions that use multiple software systems to provide the same capabilities.
Third, an SPE provides a flexible, in-process storage model and standards-based access to external databases. In-memory hash
tables are used for very fast insert and look-up operations. Embedded databases are used to ensure persistence of data and
can be accessed and manipulated using SQL-style declarative queries. External, remote-process databases are accessible through
standard Open Database Connectivity calls and are convenient to use when supporting legacy databases or facilitating database
sharing with external applications.
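
As a rough illustration of the first two storage tiers, the following Python sketch pairs a dictionary (an in-memory hash table) with SQLite, used here only as a convenient example of an embedded database. The table layout, the `on_trade` handler and the sample symbols are assumptions made for the example. Access to an external database over Open Database Connectivity is not shown.

```python
import sqlite3

# Tier 1: in-memory hash table for very fast insert and look-up.
last_price = {}

# Tier 2: embedded, in-process database queried with SQL.
# ":memory:" keeps the example self-contained; a file path would
# be used instead when the data has to persist.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE trades (symbol TEXT, price REAL)")


def on_trade(symbol, price):
    """Hypothetical event handler that updates both tiers."""
    last_price[symbol] = price
    db.execute("INSERT INTO trades VALUES (?, ?)", (symbol, price))


for symbol, price in [("ABC", 10.0), ("XYZ", 20.0), ("ABC", 12.0)]:
    on_trade(symbol, price)
db.commit()

# Declarative query over the embedded store.
avg_abc = db.execute(
    "SELECT AVG(price) FROM trades WHERE symbol = ?", ("ABC",)
).fetchone()[0]
```

The hash table answers "what is the latest value" with a single look-up, while the embedded database supports ad hoc declarative queries over the retained history.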
An SPE has built-in filtering, aggregating, correlating and merging operators that manipulate windows of events. Standard
SQL is defined over finite-sized tables, so an execution engine knows when it has finished all its operations.
In contrast, streams potentially never end, and an SPE must be instructed when to finish processing and output an answer. Windows, typically bounded by time or by event count, provide that instruction.
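
One common way to tell an engine when to output an answer is a tumbling (non-overlapping) time window. The Python sketch below is illustrative only: `tumbling_window_avg`, the `(timestamp, value)` event shape and the five-unit width are assumptions, and events are assumed to arrive in timestamp order.

```python
def tumbling_window_avg(events, width):
    """Yield (window_start, average) each time a window of `width` closes.

    `events` is an iterable of (timestamp, value) pairs in timestamp order.
    """
    window_start = None
    total = 0.0
    count = 0
    for ts, value in events:
        if window_start is None:
            # Align the first window to a multiple of the width.
            window_start = ts - ts % width
        # An event past the window boundary closes the current window.
        while ts >= window_start + width:
            if count:
                yield (window_start, total / count)
            window_start += width
            total, count = 0.0, 0
        total += value
        count += 1
    # A live stream never reaches this point; the flush exists so that
    # a finite test input still reports its last partial window.
    if count:
        yield (window_start, total / count)


readings = [(0, 10), (1, 20), (5, 30), (6, 50), (12, 40)]
averages = list(tumbling_window_avg(readings, width=5))
# averages == [(0, 15.0), (5, 40.0), (10, 40.0)]
```

The aggregate for a window is emitted as soon as an event arrives past its boundary, so results flow continuously even though the input has no end.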