Media Streaming Service: An ecosystem for real-time data insights


Knowing the trends, analyzing behaviors and meeting copyright contracts: businesses in the online on-demand industry must provide highly customized services. To meet this need, an online media streaming service had to implement a variety of real-time, data-driven use cases.

The most important one: a blocking feature that prevents customers from streaming on several devices simultaneously with a single premium account. This is required by legal contracts with content providers, but it also has a clear impact on the service's business. The blocking functionality also needed to work without data from the frontend players, because at that time the frontend components did not provide the necessary information to the backend.

The client operates its infrastructure entirely on the Amazon Web Services public cloud and focuses on modern technologies and approaches for containerization, scalability and event sourcing.


Data Reply was tasked with building the first use case (blocking concurrent streams) from scratch. Since blocking must take effect quickly, the team decided on a near-real-time, event-driven approach. The idea: the frontend components generate events, which are processed in a scalable way in the cloud environment, and the results are made available through REST APIs.


First, the consultants needed to identify which data was required to block simultaneous account access. Data Reply then developed an integration for the customer frontends that sends packages of information (so-called heartbeats) every 10 seconds to a REST API layer. These events are validated and forwarded to Apache Kafka, a scalable event streaming platform, where they are distributed to a multitude of microservices, each enabling a different use case.
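A heartbeat might look like the sketch below. The field names, the validation rules and the 10-second cadence mentioned above are illustrative assumptions, not the client's actual schema; the real REST API layer forwards valid events to Kafka rather than printing them.

```python
import time

# Illustrative heartbeat schema; these field names are assumptions,
# not the client's actual payload format.
REQUIRED_FIELDS = {"account_id", "device_id", "content_id",
                   "position_seconds", "timestamp"}

def validate_heartbeat(event: dict) -> bool:
    """Basic validation of a heartbeat, as the REST API layer might
    perform it before forwarding the event to Kafka."""
    if not REQUIRED_FIELDS.issubset(event):
        return False
    if event["position_seconds"] < 0:
        return False
    # Reject timestamps too far in the future (tolerate minor clock skew).
    return event["timestamp"] <= time.time() + 60

heartbeat = {
    "account_id": "acc-123",
    "device_id": "dev-tv-1",
    "content_id": "movie-42",
    "position_seconds": 1312.5,
    "timestamp": time.time(),
}
print(validate_heartbeat(heartbeat))  # True for a well-formed event
```

Rejected events would be dropped or routed to a dead-letter topic; accepted ones are produced to Kafka for downstream microservices.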

To block concurrent streams, a microservice built on the Kafka Streams framework aggregates the heartbeats in real time and exposes REST APIs. These enable the frontends to check whether the video stream they are displaying needs to be blocked.
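The production service uses Kafka Streams, but the core aggregation idea can be sketched in plain Python. Everything below is an assumption for illustration: the single-stream limit, the activity window derived from the 10-second heartbeat interval, and all names.

```python
from collections import defaultdict

HEARTBEAT_INTERVAL = 10                  # seconds between heartbeats (per the text)
ACTIVE_WINDOW = 3 * HEARTBEAT_INTERVAL   # device counts as active if seen recently
MAX_CONCURRENT_STREAMS = 1               # assumed contractual limit

class ConcurrencyTracker:
    """Tracks the latest heartbeat per (account, device) and decides
    whether a given device's stream must be blocked."""

    def __init__(self):
        # account_id -> {device_id: last heartbeat timestamp}
        self.last_seen = defaultdict(dict)

    def record(self, account_id, device_id, ts):
        self.last_seen[account_id][device_id] = ts

    def active_devices(self, account_id, now):
        return [d for d, ts in self.last_seen[account_id].items()
                if now - ts <= ACTIVE_WINDOW]

    def should_block(self, account_id, device_id, now):
        active = self.active_devices(account_id, now)
        # Block if the account is streaming on more devices than allowed.
        return len(active) > MAX_CONCURRENT_STREAMS and device_id in active
```

In the actual architecture this state lives in a Kafka Streams windowed aggregation keyed by account, and `should_block` corresponds to the REST endpoint the frontends poll.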


Once structured data was provided through the event streaming platform, more and more use cases could be defined and prioritized:

  • Trending Content: The data collected by the frontend is essentially real-time information on what end users are watching. Data Reply developed microservices to aggregate this data and provide real-time insights into which content is currently trending (both live TV channels and Video-on-Demand content). This enables the applications to highlight trending content to the users, improving content discovery and user experience.
  • Resume Position: Thanks to precise information on the user's current position within the video stream, a more exact and always up-to-date playback position can be stored. Using this data to continue the stream at a later point results in a better user experience. Previously, this was achieved via explicit saving, which did not happen when the user closed the video stream abruptly (for example when shutting off the TV).
  • Video Stream Quality Analytics: In addition to information on the content users are playing, the frontend component was extended to send events such as commercial breaks, playback starts and ends, playback errors, bitrate degradations and much more. With the event streaming platform and a real-time OLAP database, real-time analytics on video stream quality was bootstrapped.
  • Analysis of viewing behavior: previously, the only way to know if a user stopped watching a particular content was to receive an explicit "stop" event. This did not happen often as Smart TVs were just shut off, browser tabs and mobile apps closed or similar. In order to get a better picture of the end users' behavior, the heartbeats developed by Data Reply are also being used to drive real-time analysis.
  • Event Tracking: Many of the events sent by the frontend component, as well as the computed view-end events, have been integrated into the tracking and clickstream analytics solution via the event streaming platform.
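The resume-position use case above can be sketched as a small heartbeat-driven store. This is an illustrative Python sketch, not the client's implementation; the names and the handling of out-of-order heartbeats are assumptions:

```python
class ResumePositionStore:
    """Keeps the most recent playback position per (account, content),
    derived from the heartbeat stream instead of explicit 'save' calls."""

    def __init__(self):
        # (account_id, content_id) -> (position_seconds, timestamp)
        self.positions = {}

    def update(self, account_id, content_id, position_seconds, ts):
        key = (account_id, content_id)
        current = self.positions.get(key)
        # Only advance if this heartbeat is newer than the stored one,
        # so late or out-of-order events cannot move the position backwards.
        if current is None or ts > current[1]:
            self.positions[key] = (position_seconds, ts)

    def resume_position(self, account_id, content_id):
        entry = self.positions.get((account_id, content_id))
        return entry[0] if entry else 0.0
```

Because heartbeats arrive every few seconds, the stored position stays fresh even when a Smart TV is simply switched off mid-stream.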


Accommodating the cloud-only approach of the customer, Data Reply leveraged AWS technologies and specifically serverless technologies wherever possible.

All services have been containerized and deployed on Amazon ECS using the serverless Fargate launch type. DynamoDB is used for intermediate storage of unstructured JSON payloads for clickstream tracking, and logging is performed through CloudWatch. Moreover, all infrastructure provisioning and rolling updates of the services are performed through AWS CloudFormation, following an Infrastructure-as-Code approach. AWS Secrets Manager is leveraged to securely share credentials for other systems with the container instances, and permissions are efficiently managed through IAM policies.
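A fragment of such a setup might look like the sketch below: an ECS task definition on Fargate that injects a credential from Secrets Manager and logs to CloudWatch. All names, sizes and parameters are illustrative assumptions, not the client's actual templates.

```yaml
# Illustrative CloudFormation fragment (not a complete template):
# a Fargate task that receives a Secrets Manager credential as an
# environment variable and ships logs to CloudWatch.
Resources:
  StreamProcessorTaskDef:
    Type: AWS::ECS::TaskDefinition
    Properties:
      RequiresCompatibilities: [FARGATE]
      Cpu: "256"
      Memory: "512"
      NetworkMode: awsvpc
      ExecutionRoleArn: !Ref TaskExecutionRoleArn
      ContainerDefinitions:
        - Name: stream-processor
          Image: !Ref ImageUri
          Secrets:
            - Name: KAFKA_PASSWORD
              ValueFrom: !Ref KafkaCredentialsSecretArn
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: /ecs/stream-processor
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: ecs
```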

Best practices for AWS development and infrastructure provisioning are followed to ensure efficient use of resources. Templates have been prepared to shorten the bootstrap time of rolling out a new stream processing service. This limits the effort spent on infrastructure and deployment, which can instead be spent on implementing business logic and delivering value.


The new solution is scalable by design, taking advantage of the elasticity of AWS Cloud services and of the scalability guarantees of Apache Kafka.

Moreover, the introduction of event sourcing with a high volume of granular, information-dense data points allows a variety of new use cases and business value to be explored and brought to production. All the services and architectures have been implemented with minimal operational and maintenance effort, which decreases the total cost of ownership.

This allowed Data Reply's customer to meet contractual obligations, deliver an improved user experience and provide more accurate and timely data to the business stakeholders in a cost-efficient manner.



    Data Reply is the Reply group company offering a broad range of advanced analytics and AI-powered data services. We operate across different industries and business functions, enabling them to achieve meaningful outcomes through effective use of data. We have strong competencies in Big Data Engineering, Data Science and IPA; we build Big Data platforms and implement ML and AI models in a manner that is repeatable, efficient, scalable, simple and yet secure. We support companies in combinatorial optimization processes with Quantum Computing techniques that enable engines with high computational performance.