Solution
Starting from RCS’ requirements and drawing on the expertise gained through projects completed for numerous customers, Data Reply designed and implemented an IaaS solution on the Microsoft Cloud, based on the Hadoop platform and specifically on the distribution from Cloudera, a company of which Data Reply is a Silver partner. The user-browsing information stored in RCS’ systems is heterogeneous and comes from several sources:
- Data Management Platform: metadata about pages and their contents, navigation-related events (e.g. clicks on an article), viewing-related events (e.g. video playback) and the mapping of users to their interest segments at content level;
- Web analytics solution: real-time navigation information (app, store, C+);
- Data warehouse: comprehensive view of users, as subscribers and buyers;
- Semantic analysis platform: taxonomic and semantic information about textual content.
To integrate these data sources with one another and find synergies, a data ingestion layer interfacing with the Cloudera environment was designed and developed, which keeps the information synchronised with the source. Thanks to this approach, the Big Data environment becomes a single point of access and integration for the information, and therefore a natural playground for data exploration, data analysis and advanced analytics techniques. The architecture and horizontal scalability typical of Hadoop, which allows computational power to grow as needed through the addition of new commodity machines, also make it possible to handle data with an approach that is agnostic to its volume, velocity and variety. This allows the data science team to freely explore and process the data, creating models within initiatives targeted at key business objectives. Using the R and Python data science tools through an innovative collaborative data science platform, the data science team achieved its first promising results in record time, with the results tracked through business KPIs on a dedicated custom dashboard.
