Streaming data ingestion: from the batch model to real-time

Traditional software architecture in financial services manages data with batch processes, but the need for data freshness is pushing towards real-time streaming.
In this article we present near-real-time Data Offloading and Data Integration use cases that reduce costs and improve data access.


In our experience, the core components of financial information systems rely on legacy platforms, in an architecture conceived with a "daytime" component and a "nighttime" batch component in which the data entered during the day is consolidated. During this consolidation phase the platform is unavailable.

In this type of architecture the legacy system cannot be used as a database serving web and mobile applications that must be available continuously. Hence the need to "move" the data onto an infrastructure that enables 24/7 use without sacrificing data freshness and quality.

Streaming to the rescue

One of the classic techniques for aligning different systems is the use of batch flows, i.e. consolidated information that passes between the different systems in a specific time slot.

This type of feed introduces a time delta between the feeding system and the fed system, because the information in the flows represents a snapshot of the feeding system at the end of the day.

The production, loading and processing times of batch flows, as well as the need for consolidation, usually require significant resources (on both the feeder and the fed side), making it difficult to perform multiple alignments during the day.


To overcome this limitation and support business needs that require timely information, streaming comes to our aid: capturing information in real time from the source systems and propagating it towards the target systems.
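The difference between the two alignment models can be sketched in a few lines (names and data are purely illustrative): with batch alignment the fed system only sees the end-of-day snapshot, while with streaming every change is propagated the moment it happens.

```python
source = {}   # records in the feeding system
replica = {}  # copy held by the fed system

def apply_change(key, value):
    """Streaming: propagate a single change to the replica immediately."""
    source[key] = value
    replica[key] = value  # the replica is fresh after every change

def batch_align():
    """Batch: copy the whole snapshot once, e.g. at end of day."""
    replica.clear()
    replica.update(source)

apply_change("IBAN-001", {"balance": 100})
apply_change("IBAN-001", {"balance": 80})
assert replica == source  # no time delta between the two systems
```

With `batch_align` alone, any intraday change would be invisible to the fed system until the nightly run; `apply_change` removes that window entirely.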

Use cases

In the projects we develop for our customers, the existing architecture typically includes legacy systems based on Mainframe or Oracle databases.

The technique we use to enable data streaming is Change Data Capture (the reference tool on the market is Oracle GoldenGate): a CDC system captures the operations carried out on the records of the source system and replicates them in real time towards a target system, enabling the most modern business solutions.
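Conceptually, a CDC system emits one event per operation on the source records. The sketch below shows how such events might be applied to a target replica; the event fields and operation codes ("I" insert, "U" update, "D" delete) are illustrative, not GoldenGate's actual wire format.

```python
def apply_cdc_event(table: dict, event: dict) -> None:
    """Apply one captured change event to the target replica."""
    op, key = event["op"], event["key"]
    if op in ("I", "U"):
        table[key] = event["after"]   # insert or overwrite the row image
    elif op == "D":
        table.pop(key, None)          # remove the deleted row

accounts = {}
events = [
    {"op": "I", "key": "7", "after": {"balance": 100}},
    {"op": "I", "key": "9", "after": {"balance": 250}},
    {"op": "U", "key": "7", "after": {"balance": 80}},
    {"op": "D", "key": "9", "after": None},
]
for e in events:
    apply_cdc_event(accounts, e)
assert accounts == {"7": {"balance": 80}}
```

Replaying the events in capture order leaves the replica in the same state as the source, which is exactly the guarantee a CDC pipeline must preserve.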

The target of a CDC system is an event-streaming platform composed of topics (usually built with Apache Kafka), in which events are managed with high throughput and strong guarantees on consistency and temporal ordering of the data.
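Kafka's ordering guarantee holds per partition, so CDC pipelines typically partition events by record key: all changes to the same record land on the same partition and keep their original order. A minimal sketch of this keyed partitioning (using CRC32 as a stand-in for Kafka's actual partitioner):

```python
import zlib
from collections import defaultdict

def partition_for(key: str, num_partitions: int) -> int:
    # The same key always maps to the same partition, so ordering
    # is preserved per key (not across the whole topic).
    return zlib.crc32(key.encode()) % num_partitions

partitions = defaultdict(list)
events = [("acct-1", "v1"), ("acct-2", "v1"), ("acct-1", "v2")]
for key, value in events:
    partitions[partition_for(key, 3)].append((key, value))

# Within its partition, acct-1's changes keep their capture order.
p = partition_for("acct-1", 3)
assert [v for k, v in partitions[p] if k == "acct-1"] == ["v1", "v2"]
```

This is why the choice of message key matters in a CDC topic: keying by primary key gives per-record consistency, while an unkeyed topic would spread a record's changes across partitions and lose their ordering.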

Through a stream processing tool such as Apache Flink, the data on the topics is processed in near-real time and consumed by the target systems according to the logic defined for the specific use case. Classic projects that use the paradigm described above are near-real-time Data Offloading and Data Integration.
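As a toy stand-in for such a stream processing job (Flink itself runs on the JVM; table and field names here are invented for illustration), the transformation logic can be pictured as a function applied to each event as it arrives, e.g. filtering the tables of interest and reshaping the payload for the offloaded view:

```python
def process_stream(events):
    """Transform each CDC event as it arrives: keep only balance
    changes and convert the amount from cents to euros."""
    for event in events:
        if event["table"] == "BALANCES":
            yield {
                "account": event["key"],
                "balance_eur": event["after"]["cents"] / 100,
            }

raw = [
    {"table": "BALANCES", "key": "A1", "after": {"cents": 12500}},
    {"table": "AUDIT_LOG", "key": "L9", "after": {"msg": "login"}},
]
out = list(process_stream(raw))
assert out == [{"account": "A1", "balance_eur": 125.0}]
```

In a real deployment the same filter/map logic would run continuously inside a Flink job reading from the Kafka topics, rather than over an in-memory list.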