Data streams: In the right place at the right time

A Cloud Streaming Platform for the Telecommunications Industry

Starting Point

The amount of data that a telecommunications company has available, and that it generates on a daily basis, is enormous, but also extremely useful. This includes information on customer contracts, internal data logs and much more. To make it possible for all departments of a large telecommunications provider to efficiently use the multitude of data that reaches, travels through or leaves the organisation every day, the company has been using the data streaming approach for some time now.

It was important to the company that different departments could be given the same data sets and use them for their respective needs. While a company's business intelligence focuses on analysing data and deriving new use cases from it as precisely as possible, technical departments use it to develop new applications. Overall, the information gained from the data analysis enables a company to gain insight into many aspects of its organisation and customer activities, such as server activity or customer use of services, and to respond quickly to changing situations.

Solution

In order to be able to react to all the challenges of processing huge amounts of data, the telecommunications service provider decided to implement a cloud-based solution within the business units together with the experts from Data Reply.

To implement the data streaming platform efficiently, Kubernetes on AWS with multiple accounts was chosen. With this solution, incoming data can be read in real time.

Requirements


1. The customer's requirement was that the various Kafka cluster environments should be set up in a way that allows data to be captured from its actual sources. On top of this scalable infrastructure, new big data use cases can now be created.

2. Another requirement was that the solution should be automated, scalable and fault tolerant. The challenge for the specialists at Data Reply was to transfer their existing knowledge of building data-lake platforms on premises to the cloud. This requirement meant more development work in terms of security and appropriate scaling of the Kafka clusters.

Technical Implementation

To ensure the required security, the developed solution is fully encrypted and enforces authorisation and authentication at the data level. Data Reply redesigned the entire Kafka infrastructure on a Kubernetes cluster to address some of the issues that would otherwise have been difficult to resolve. One of the advantages of Kafka is the automated verification of data consistency. If, for example, a column of a data set is accidentally deleted, the process does not fail unnoticed: Kafka automatically detects the inconsistency and stops the system before the data set can be destroyed.
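The consistency check described above can be pictured with a small sketch. It is an illustration only, not the project's implementation: the source does not say which mechanism performs the check (in many Kafka deployments a schema registry fills that role), and the schema, field names and functions below are hypothetical. The sketch validates each incoming record against an expected schema and stops processing at the first record with a missing or wrongly typed field.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Hypothetical expected schema for a customer-contract record:
# field name -> required Python type. Not taken from the project.
EXPECTED_SCHEMA: Dict[str, type] = {
    "customer_id": str,
    "contract_type": str,
    "monthly_fee": float,
}


class SchemaMismatch(Exception):
    """Raised when a record does not match the expected schema."""


def validate_record(record: Dict[str, Any], schema: Dict[str, type]) -> None:
    """Raise SchemaMismatch if a field is missing or has the wrong type."""
    missing = sorted(set(schema) - set(record))
    if missing:
        raise SchemaMismatch(f"missing fields: {missing}")
    for field, expected_type in schema.items():
        if not isinstance(record[field], expected_type):
            raise SchemaMismatch(
                f"field {field!r} is not {expected_type.__name__}"
            )


def process_stream(
    records: Iterable[Dict[str, Any]], schema: Dict[str, type]
) -> Tuple[List[Dict[str, Any]], Optional[str]]:
    """Accept records in order and stop at the first inconsistent one."""
    accepted: List[Dict[str, Any]] = []
    for record in records:
        try:
            validate_record(record, schema)
        except SchemaMismatch as error:
            # Stop before the inconsistent record reaches downstream storage.
            return accepted, str(error)
        accepted.append(record)
    return accepted, None
```

In this sketch, a record that has lost its `contract_type` column halts the stream, and everything accepted before it stays intact. A real deployment would typically enforce such rules at the producer or broker side rather than in application code.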

Moreover, the DevOps approach was chosen for the project to ensure agile and rapid development of the solution and at the same time facilitate collaboration between Data Reply's data-lake specialists and colleagues at the telecom company.

Availability within minutes

The solution has significantly increased the speed at which platforms can be deployed. Deploying a cluster that already contains all the requested data now takes less than 30 minutes. With the automation and scalability of the solution, everything has become replicable for each department in the organisation, and a deployment no longer takes the several months it required without this infrastructure. In addition, the variety of use cases that can be deployed on the enterprise platform has expanded.

The project is currently being developed further so that new internal tasks can be implemented, for example GDPR compliance. The regulation calls for new tools that make it possible to meet requirements quickly, such as the deletion of customer data upon request.

Advantages of the Solution

  • Flexible open source product
  • Simple infrastructure
  • Very well suited for data streaming and triggering actions
  • Autoscalable
  • Automated data consistency check