Starting the data-driven journey with AWS

A Big Data Platform as added value

Scenario

A renowned sports company started a data-driven transformation journey. The Data Analytics department was closely involved, and the main goal was to own the core technologies underpinning the brand's success. As part of its digital vision, the client launched a project to create a new Big Data Platform that would reduce maintenance costs and improve quality, flexibility and performance. The project, developed together with Data Reply, consisted of migrating from a Hadoop cluster to a more flexible, self-service oriented platform based on Amazon EMR. The migration increased performance and decoupled workloads into separate computation environments, allowing greater flexibility.

Solution

Data Reply developed a custom platform by using AWS managed services and serverless approaches to decrease maintenance costs and improve availability and scalability of the service.

Big Data Platform and its development

Two environments have been developed, both based on Amazon EMR: a Lab environment for data exploration and the development of Data Science use cases and algorithms, with access to user interfaces and development tools; and a Factory environment for production workloads and data preparation flows, accessed through custom APIs. Metadata is generated in the Factory and provided to the Labs through the Glue Data Catalog.
A Chatbot has been developed to allow Lab users to perform operational tasks on their own Lab, such as starting it, stopping it, scaling it in or out, and querying its status.
Throughout the development process, security, monitoring and alerting have always been taken into account, so that the platform delivers a very high degree of trust alongside flexibility, scalability and performance. Best practices have been implemented to protect the data and the computation power from unauthorized access.
The next steps will be the integration with more of AWS's Big Data-related services (e.g. Athena, Redshift, Fargate) in order to support more diverse types of use cases (dashboards, data-driven applications, streaming applications, image processing, etc.) on top of the Big Data Platform.
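
As an illustration only, the kind of command handling such a Chatbot performs could be sketched as follows. This is a minimal sketch, not the implementation delivered in the project: the command words, the `InMemoryLabBackend` class and the `handle_message` function are all hypothetical names, and the backend keeps state in memory instead of calling Amazon EMR. The one design point it tries to reflect is that the Lab is derived from the user's identity and never from the message text, so a user can only operate on their own Lab.

```python
from dataclasses import dataclass

HELP = "Commands: start, stop, status, scale <nodes>"


@dataclass
class Lab:
    name: str
    running: bool = False
    nodes: int = 0


class InMemoryLabBackend:
    """Hypothetical stand-in for the real cluster API (assumption: the
    production Chatbot would call AWS here instead of mutating a dict)."""

    def __init__(self):
        self.labs = {}

    def _lab(self, name):
        # Create the Lab record on first use.
        return self.labs.setdefault(name, Lab(name))

    def start(self, name):
        lab = self._lab(name)
        lab.running, lab.nodes = True, max(lab.nodes, 1)
        return f"Lab {name} started with {lab.nodes} node(s)."

    def stop(self, name):
        lab = self._lab(name)
        lab.running, lab.nodes = False, 0
        return f"Lab {name} stopped."

    def scale(self, name, nodes):
        lab = self._lab(name)
        if not lab.running:
            return f"Lab {name} is not running."
        if nodes < 1:
            return "A running Lab needs at least 1 node."
        lab.nodes = nodes  # covers both scale-out and scale-in
        return f"Lab {name} scaled to {nodes} node(s)."

    def status(self, name):
        lab = self._lab(name)
        state = "running" if lab.running else "stopped"
        return f"Lab {name} is {state} with {lab.nodes} node(s)."


def handle_message(backend, user_lab, message):
    """Map one chat message to one operation on the caller's own Lab."""
    words = message.strip().lower().split()
    if not words:
        return HELP
    command, args = words[0], words[1:]
    if command in ("start", "stop", "status") and not args:
        return getattr(backend, command)(user_lab)
    if command == "scale" and len(args) == 1 and args[0].isdigit():
        return backend.scale(user_lab, int(args[0]))
    return HELP  # unknown or malformed command
```

For example, `handle_message(backend, "alice", "scale 4")` resizes only the Lab named `alice`, and any unrecognized message returns the help text instead of raising an error.
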



    Data Reply

    Data Reply is the Reply group company specialized in Big Data, Data Science and Artificial Intelligence. Our experience covers four main business areas: Sales and Marketing Intelligence, Big Data Engineering & Security Intelligence, Enterprise Intelligence, and IoT & Industry 4.0 Intelligence. We count more than 40 projects in production involving data lakes, both on-premises and in the Cloud, as well as ML & AI algorithm development. The algorithms are boosted by innovative Quantum Computing approaches, and we also offer Data Science and Deep Learning education programs.