Resolving Disassociated Processing of Real-Time and Historical Data in IoT

Konstantin Boudnik, Chief Technologist Bigdata Open Source Fellow, EPAM

With the increasing pace of digital disruption, many enterprises today are focused on staying ahead with real-time data and real-time insights. Real-time analytics lets companies make proactive decisions, eliminate risks and gain competitive advantage in the marketplace by reacting quickly to changing conditions and tapping into data that’s always on. For example, if you’re launching a healthcare app, you could use real-time monitoring information to provide early warning signs and save patients’ lives. But processing real-time data from a variety of sensors and mobile and remote devices, in combination with historical datasets to enhance actionable insights, can add more complexity to already sophisticated pipelines. Adding two data flows to the same pipeline can require the same data to be processed twice, and there are a few solutions to this problem.

The most advanced solution is to combine the batch, stream processing, and serving DB components into an in-memory data fabric. A system like this will transparently work with the stored and streamed data at the same time in a transactional fashion. Therefore, the data fabric becomes the single source of truth. Alexandre Boudnik, a computer scientist who has worked on compilers, hardware emulators and testing tools for over 20 years, came up with the term Iota architecture (after the Greek letter ι).


One example of this solution uses Apache Kafka in combination with Apache Ignite to serve messages and to process the stream together with the data retained in secondary storage (Apache Cassandra, HDFS, or even a traditional RDBMS server). Feature-rich in-memory data fabrics like Ignite:

• Behave as a data sink with persistence guarantees either in a secondary database storage or a distributed file system
• Implement distributed computation models for stateful CEP and streaming
• Expose APIs for applications written in Java, Groovy, Scala and other languages
• Provide complete support for SQL querying including indexing, distributed joins, and more
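To make the properties listed above more concrete, here is a minimal, single-process sketch of the idea. It is a toy model under stated assumptions, not the Apache Ignite API: the class name ToyFabric, the readings table and the sample device names are all hypothetical, Python's built-in in-memory SQLite stands in for the fabric, and a plain list stands in for secondary storage.

```python
import sqlite3


class ToyFabric:
    """Toy stand-in for an in-memory data fabric (hypothetical, not Ignite)."""

    def __init__(self, secondary_store):
        # secondary_store: a plain list pretending to be durable storage
        # (in a real deployment this would be Cassandra, HDFS or an RDBMS).
        self.secondary_store = secondary_store
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE readings (device TEXT, ts INTEGER, value REAL)"
        )
        self.db.execute("CREATE INDEX idx_device ON readings (device)")
        # Warm the in-memory table with the historical rows already stored.
        with self.db:
            self.db.executemany(
                "INSERT INTO readings VALUES (?, ?, ?)", secondary_store
            )

    def ingest(self, device, ts, value):
        """Accept one streamed event: write to memory, then write through."""
        row = (device, ts, value)
        with self.db:  # commits on success, rolls back on error
            self.db.execute("INSERT INTO readings VALUES (?, ?, ?)", row)
        self.secondary_store.append(row)

    def query(self, sql, params=()):
        """Run SQL over stored and streamed data through one interface."""
        return self.db.execute(sql, params).fetchall()


# Historical data that was already persisted before the stream started.
history = [("pump-1", 1, 10.0), ("pump-1", 2, 12.0), ("pump-2", 1, 7.0)]
fabric = ToyFabric(history)

# A new event arrives from the stream.
fabric.ingest("pump-1", 3, 20.0)

# One query sees both the historical rows and the freshly streamed one.
avg = fabric.query(
    "SELECT AVG(value) FROM readings WHERE device = ?", ("pump-1",)
)[0][0]
print(avg)
```

The point of the sketch is that the application talks to one store and one query interface; there is no separate batch path and speed path to reconcile. A real fabric adds distribution, fault tolerance and transactional guarantees that this toy does not attempt.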

Another approach that was adopted earlier is known as Lambda architecture (after the Greek letter λ), where two intake layers deal with the incoming data at different speeds, reconciling at the query point (commonly called the serving DB). While offering better delivery SLAs, it isn’t free of interoperation impedance and high operational, hardware and management costs. One particular issue is correct recovery after the failure of an intake layer. The recovery logic is frequently moved to the client software, forcing it to be stateful and more complex as a result. Changing and deploying stateful code in a distributed system can be quite an intricate undertaking, especially when data must be reprocessed once the new code is provisioned and running.

One possible optimization, sometimes dubbed Kappa architecture (after the Greek letter κ), is to combine the batch and stream processing components into a single sub-system, which is then used by the serving DB for querying. Some telecommunications companies use this kind of processing to capture sensor feeds and telemetry through Apache Kafka and then pipe the data into a streaming dataflow engine, like Apache Flink, for analytics.
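The difference between the two patterns can be sketched in a few lines. This is an illustrative toy, assuming events are simple (device, count) pairs and the query is a running total per device; the function and class names are hypothetical and do not come from Kafka, Flink or any other product.

```python
from collections import defaultdict


def batch_view(historical_events):
    """Lambda batch layer: recompute totals from the full stored dataset."""
    totals = defaultdict(int)
    for device, count in historical_events:
        totals[device] += count
    return dict(totals)


class SpeedView:
    """Lambda speed layer: incremental totals for events the batch missed."""

    def __init__(self):
        self.totals = defaultdict(int)

    def on_event(self, device, count):
        self.totals[device] += count


def lambda_query(device, batch, speed):
    """Serving layer: reconcile the two views at query time."""
    return batch.get(device, 0) + speed.totals.get(device, 0)


def kappa_view(log):
    """Kappa style: a single code path that replays the whole event log."""
    totals = defaultdict(int)
    for device, count in log:
        totals[device] += count
    return dict(totals)


stored = [("meter-a", 5), ("meter-b", 2)]   # already processed by the batch
recent = [("meter-a", 1), ("meter-a", 3)]   # arrived after the last batch run

# Lambda: two code paths, merged when the query is answered.
batch = batch_view(stored)
speed = SpeedView()
for device, count in recent:
    speed.on_event(device, count)
lambda_total = lambda_query("meter-a", batch, speed)

# Kappa: one code path over the complete log.
kappa_total = kappa_view(stored + recent)["meter-a"]

print(lambda_total, kappa_total)
```

Both approaches give the same answer here, but the Lambda version keeps the aggregation logic in two places that have to stay consistent, which is where the recovery and redeployment problems described above tend to appear.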

The more advanced solution, Iota architecture, is the one most recommended. The Iota design pattern has a number of distinctive properties: it eliminates the need for expertise spanning multiple programming models and platforms, reduces hardware needs, lowers data-center operational complexity, shortens the application development and deployment loop, and lowers the long-term cost of ownership. This combination helps increase the data platform's ROI by reducing capital expenditure on rapidly commoditized computer systems and by requiring smaller development and cluster operations teams.
