Media Streaming Service: An ecosystem for real-time data insights

BACKGROUND

Businesses in the online on-demand industry must know the trends, analyze viewer behavior and meet copyright contracts, all while providing highly customized services. To meet this need, an online media streaming service had to implement a variety of real-time, data-driven use cases.

The most important one: a blocking feature that prevents customers from logging into a premium account on several devices simultaneously. This is required by legal contracts with content providers, and it also has a clear impact on the service's business. The blocking functionality also needed to work without data from the frontend players because, at that time, the frontend components did not provide the necessary information to the backend.

The client operates its infrastructure entirely on the Amazon Web Services public cloud and focuses on modern technologies and approaches for containerization, scalability and event sourcing.

EVENT SOURCING AND AN EVENT DRIVEN ARCHITECTURE

Data Reply was tasked with building the first use case (blocking concurrent streams) from scratch. Because the blocking must take effect quickly, the team decided on a near-real-time, event-driven approach. The idea: the frontend components generate events, which are processed in a scalable way in the cloud environment, and the results are made available through REST APIs.

VALIDATING THE RIGHT DATA

First, the consultants had to identify which data was required to block simultaneous account access. Data Reply then developed an integration for the customer frontends that sends packages of information (so-called heartbeats) every 10 seconds to a REST API layer. These events are validated and sent to Apache Kafka, a scalable event streaming platform, from which they are distributed to a multitude of microservices, each enabling a different use case.
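The validation step at the REST API layer can be illustrated with a minimal Python sketch. The source does not describe the heartbeat schema, so the field names (`account_id`, `device_id`, `content_id`, `position_s`, `timestamp_s`) and the `validate_heartbeat` helper are assumptions made purely for illustration; forwarding the validated event to Kafka is left out.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Hypothetical field names: the source does not specify the heartbeat schema.
ID_FIELDS = ("account_id", "device_id", "content_id")
NUMERIC_FIELDS = ("position_s", "timestamp_s")


@dataclass(frozen=True)
class Heartbeat:
    account_id: str
    device_id: str
    content_id: str
    position_s: float   # playback position within the stream, in seconds
    timestamp_s: float  # client send time, in seconds since the epoch


def validate_heartbeat(payload: Mapping[str, Any]) -> Optional[Heartbeat]:
    """Return a typed Heartbeat, or None if the payload is malformed."""
    if any(field not in payload for field in ID_FIELDS + NUMERIC_FIELDS):
        return None
    ids = [payload[field] for field in ID_FIELDS]
    if not all(isinstance(value, str) and value for value in ids):
        return None
    nums = [payload[field] for field in NUMERIC_FIELDS]
    for value in nums:
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return None
        if not value >= 0:  # also rejects NaN
            return None
    return Heartbeat(ids[0], ids[1], ids[2], float(nums[0]), float(nums[1]))
```

In a setup like this, only payloads for which `validate_heartbeat` returns a `Heartbeat` would be published to the event streaming platform; rejected payloads could be logged or counted for monitoring.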

For the blocking of concurrent streams, a microservice using the Kafka Streams framework aggregates the heartbeats in real-time and provides REST APIs. These enable the frontends to check whether the video stream they are displaying needs to be blocked or not.
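The aggregation logic can be sketched in plain Python. This is not Kafka Streams code and not the production implementation; it is an in-memory illustration of the idea. The activity window (three heartbeat intervals), the limit of one stream per account, the rule that the earliest active device keeps playing, and the `ConcurrencyTracker` class are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List

HEARTBEAT_INTERVAL_S = 10                   # from the text: one heartbeat every 10 seconds
ACTIVE_WINDOW_S = 3 * HEARTBEAT_INTERVAL_S  # assumption: tolerate two missed heartbeats
MAX_CONCURRENT_STREAMS = 1                  # assumption: one stream per premium account


class ConcurrencyTracker:
    """Tracks, per account, when each device started and last reported a stream."""

    def __init__(self) -> None:
        # account_id -> device_id -> [first_seen, last_seen] of the current session
        self._sessions: Dict[str, Dict[str, List[float]]] = defaultdict(dict)

    def record(self, account_id: str, device_id: str, ts: float) -> None:
        devices = self._sessions[account_id]
        session = devices.get(device_id)
        if session is None or ts - session[1] > ACTIVE_WINDOW_S:
            devices[device_id] = [ts, ts]  # new session after a gap
        else:
            session[1] = ts                # extend the running session

    def should_block(self, account_id: str, device_id: str, now: float) -> bool:
        devices = self._sessions.get(account_id, {})
        mine = devices.get(device_id)
        if mine is None or now - mine[1] > ACTIVE_WINDOW_S:
            return False  # this device has no active stream to block
        # Count other active devices whose session started earlier
        # (device_id breaks ties so the outcome is deterministic).
        earlier = sum(
            1
            for other, (first, last) in devices.items()
            if other != device_id
            and now - last <= ACTIVE_WINDOW_S
            and (first, other) < (mine[0], device_id)
        )
        return earlier >= MAX_CONCURRENT_STREAMS
```

A frontend polling the REST API would, under these assumptions, receive the result of `should_block` for its own account and device. A device that stops sending heartbeats drops out of the window on its own, which is why the approach does not depend on explicit "stop" events.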


ENABLING AN ECOSYSTEM OF REAL-TIME USE CASES

Once structured data was provided through the event streaming platform, more and more use cases could be defined and prioritized:

  • Trending Content: The data collected by the frontend is essentially real-time information on what end users are watching. Data Reply developed microservices to aggregate this data and provide real-time insights into which content is currently trending (both live TV channels and Video-on-Demand content). This enables the applications to highlight trending content to users, improving content discovery and user experience.
  • Resume Position: thanks to precise information on the current position of the user within the video stream, a more exact and always up to date playback position can be stored. Using this data to continue the stream at a later point results in a better user experience. Previously, this was achieved via explicit saving, which did not happen when the user closed the video stream abruptly (for example when shutting off the TV).
  • Video Stream Quality analytics: in addition to information on the content users are playing, the frontend component was extended to send events like commercial breaks, starts and ends, playback errors, degradations of bitrate and much more. With the event streaming platform and a real-time OLAP database, real-time analytics on the quality of the video streams has been bootstrapped.
  • Analysis of viewing behavior: previously, the only way to know whether a user had stopped watching a particular piece of content was to receive an explicit "stop" event. This rarely happened, because Smart TVs were simply shut off and browser tabs or mobile apps were closed. To get a better picture of end users' behavior, the heartbeats developed by Data Reply are also used to drive real-time analysis.
  • Event Tracking: many of the events sent by the frontend component, as well as the computed view end events, have been integrated into the tracking and clickstream analytics solution via the event streaming platform.
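How heartbeats can replace explicit "stop" events and also yield a resume position can be sketched as follows. The timeout of 30 seconds, the tuple layout of a heartbeat row and the `derive_view_ends` function are assumptions for illustration; the source does not describe how view end events are actually computed.

```python
from typing import Dict, Iterable, List, Tuple

VIEW_END_TIMEOUT_S = 30  # assumption: three missed 10-second heartbeats end a view

# Hypothetical row layout: (user_id, content_id, timestamp_s, position_s)
HeartbeatRow = Tuple[str, str, float, float]


def derive_view_ends(rows: Iterable[HeartbeatRow], now: float) -> List[dict]:
    """Emit one computed view-end record per stream that has gone silent."""
    # Keep only the most recent heartbeat per (user, content) pair.
    latest: Dict[Tuple[str, str], Tuple[float, float]] = {}
    for user_id, content_id, ts, position in rows:
        key = (user_id, content_id)
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, position)

    # A stream counts as ended once its last heartbeat is older than the timeout.
    # The position of that last heartbeat doubles as the resume position.
    return [
        {
            "user_id": user_id,
            "content_id": content_id,
            "ended_at": ts,
            "resume_position_s": position,
        }
        for (user_id, content_id), (ts, position) in sorted(latest.items())
        if now - ts > VIEW_END_TIMEOUT_S
    ]
```

Because the last heartbeat is at most one interval old when a device is switched off, the stored resume position in this sketch is off by no more than roughly 10 seconds of playback, without any explicit save by the client.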


THE INFRASTRUCTURE: FLEXIBILITY AND SCALABILITY WITH CLOUD SERVICES

Accommodating the cloud-only approach of the customer, Data Reply leveraged AWS technologies and specifically serverless technologies wherever possible.

All services have been containerized and deployed on container orchestration services like Fargate and ECS. DynamoDB has been used for intermediate storage of unstructured JSON payloads for clickstream tracking. Logging is performed through CloudWatch. Moreover, all infrastructure provisioning and rolling updates of the services are performed through AWS CloudFormation with an Infrastructure-as-Code approach. AWS Secrets Manager is leveraged to securely share credentials for other systems with the container instances. Permissions are efficiently managed through IAM policies.

Best practices for AWS development and infrastructure provisioning are followed to ensure efficient use of resources. Templates have been prepared to shorten the bootstrap time for rolling out a new stream processing service. This limits the effort spent on infrastructure and deployment, which can instead be spent on implementing business logic and delivering value.

ADVANTAGES OF THE NEW SOLUTION

The new solution is scalable by design, taking advantage of the elasticity of AWS Cloud services and of the scalability guarantees of Apache Kafka.

Moreover, the introduction of event sourcing with a high volume of granular and information-dense data points allows for a variety of new use cases and business values to be explored and brought to production. All the services and architectures have been implemented with minimal operational and maintenance effort, which decreases the total cost of ownership.

This allowed Data Reply's customer to meet contractual obligations, deliver an improved user experience and provide more accurate and timely data to the business stakeholders in a cost-efficient manner.


    DATA REPLY

    Data Reply is the Reply group company offering a broad range of advanced analytics and AI-powered data services. We operate across different industries and business functions, enabling them to achieve meaningful outcomes through effective use of data. We have strong competences in Big Data Engineering, Data Science and IPA; we build Big Data platforms and implement ML and AI models in a manner that is repeatable, efficient, scalable, simple and yet secure. We support companies in combinatorial optimization processes with Quantum Computing techniques that enable an engine with high computational performance.