Big Data in Connected Vehicles

Technology Reply has implemented a solution that allows our customer to collect, store and analyze connected vehicle data, with the aim of identifying common patterns and enabling the business and the data science team to carry out deep-dive analyses of phenomena related to the connected vehicle.


For one of the most important groups in the automotive field, Technology Reply has created a solution aimed at achieving two main objectives: on the one hand, supporting the customer in implementing a set of indicators that measure the driving habits and charging behaviors associated with the electric vehicle; on the other hand, with a wider vision, enriching the Corporate Data Lake with data relating to connected vehicles.


The solution developed by Technology Reply collects data on a single Data Lake platform, which can handle large amounts of data efficiently and effectively. Connected vehicle data can now also be cross-referenced with other data sources, such as the geographic position of garages and assistance work, making it possible to detect the activities and conditions of the vehicles. This enables new data analysis paradigms, both descriptive and predictive, and adds value to the world of analytics.
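
As an illustration of crossing vehicle data with garage positions, the following minimal Python sketch finds the garage closest to a vehicle using the haversine great-circle distance. The record layout (`lat`, `lon`, `id`) and the function names are assumptions made for this example; the source does not describe how the actual join is implemented.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_garage(vehicle, garages):
    """Return (garage, distance_km) for the garage closest to the vehicle.

    `vehicle` and each garage are dicts with hypothetical 'lat' and 'lon' keys.
    """
    def distance_to(garage):
        return haversine_km(vehicle["lat"], vehicle["lon"],
                            garage["lat"], garage["lon"])

    best = min(garages, key=distance_to)
    return best, distance_to(best)


# Hypothetical sample data: one vehicle position and two garages.
vehicle = {"lat": 45.07, "lon": 7.69}
garages = [
    {"id": "garage-a", "lat": 45.46, "lon": 9.19},
    {"id": "garage-b", "lat": 45.06, "lon": 7.68},
]
closest, distance_km = nearest_garage(vehicle, garages)
```

A linear scan like this is only suitable for small garage lists; at Data Lake scale a spatial index or a distributed join would likely be needed.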

This solution opens up new business opportunities. The collected information provides visibility into the real use of a vehicle in terms of times and activities. Through Machine Learning techniques, the collected data can be used for advanced analyses such as predictive maintenance, customer segmentation and driving behavior analysis. By exploiting the technical data of the vehicle, it is also possible to anticipate customers' needs by offering tailor-made services.

The solution was designed to receive multiple flows from heterogeneous sources, in both real-time and batch mode. In particular, in addition to the flows from the system that collects data from the vehicle's control unit when it is switched on and off (fuel level, tire pressure, odometer value, etc.), many other sources are involved. These contain data collected during the real use of the vehicle, such as the status of the battery or of the vehicle itself, data from the remote operations carried out by the user, and data from service subscriptions.

Technology Reply handled the study, design and implementation of the connected vehicle data acquisition and integration components within the Corporate Data Lake: a distributed, cloud-based data platform that processes approximately 1.5 billion records daily, with more than 5,000 data integration processes drawing on multiple sources.

The connected vehicle data was acquired from the heterogeneous source systems using a market-leading ETL tool. The data integration flows and data structures were designed to support front-end solutions oriented to interactive reports and self-service analysis.

Four distinct streams were organized for the design and implementation of the solution:

  • Data Ingestion & Architecture, responsible for defining the reference architecture and for the setup and coordination of data integration activities
  • Data Modeling & Use Case definition, responsible for designing business scenarios and feeding rules for indicators of interest
  • Data Governance, responsible for Data Quality areas, Metadata Management and the definition of a common Business Glossary
  • Data Privacy, responsible for the definition and implementation of Security and Privacy logic, such as Data Retention and the anonymization of sensitive data
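
To make the Data Privacy stream more concrete, here is a minimal Python sketch of two common building blocks: replacing a vehicle identifier with a keyed hash, and dropping records older than a retention window. The field names, the key and the 365-day window are assumptions for illustration only; the source does not state which techniques were actually adopted. Note that keyed hashing is strictly pseudonymization rather than full anonymization, since whoever holds the key can recompute the mapping.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone


def pseudonymize(vin: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed HMAC-SHA256 digest (hex string)."""
    return hmac.new(secret_key, vin.encode("utf-8"), hashlib.sha256).hexdigest()


def apply_retention(records, now, retention_days):
    """Keep only records whose 'event_time' falls within the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [r for r in records if r["event_time"] >= cutoff]


# Hypothetical key; in practice it would come from a secrets manager.
SECRET_KEY = b"example-key-not-for-production"

now = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [
    {"vin": "VIN0000000000001", "event_time": now - timedelta(days=10)},
    {"vin": "VIN0000000000002", "event_time": now - timedelta(days=400)},
]

# Apply retention first, then replace the raw identifier in what remains.
retained = apply_retention(records, now, retention_days=365)
protected = [
    {"vehicle_key": pseudonymize(r["vin"], SECRET_KEY), "event_time": r["event_time"]}
    for r in retained
]
```

Because the digest is deterministic for a given key, records belonging to the same vehicle can still be joined after the raw identifier has been removed.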


The solution designed and built by Technology Reply meets high standards, in line with those adopted for the creation of the Corporate Data Lake:

  • Functionality: suitability, accuracy, interoperability and compliance
  • Reliability: the architecture is highly fault tolerant
  • Efficiency: loading times have been optimized, in part by adopting ad hoc compression solutions
  • Portability: the data collected can be shared with other external systems and can be used by different front-end solutions


The challenge undertaken by Technology Reply in this context mainly concerned two key points:

  • Coordination and speed of implementation of the solution: thanks to its team of Specialists and Data Engineers, Technology Reply was able to meet the customer's needs and implement the solution in an extremely short elapsed time
  • The daily integration of a large amount of data: more than 100 GB of data from the connected vehicle are integrated daily within the Corporate Data Lake. Technology Reply met this challenge by adopting an innovative logical compression solution on the ingested data, making it possible to process data in real-time mode and to update business indicators several times a day
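
The source does not describe how the logical compression works. Purely as an illustration of one possible interpretation, the Python sketch below keeps a reading only when its value differs from the last value kept for the same vehicle and signal, so that repeated, unchanged readings are not stored. The field names and the function `drop_unchanged` are hypothetical and should not be read as the technique actually used.

```python
def drop_unchanged(readings):
    """Keep a reading only if its value changed for that (vehicle, signal) pair.

    `readings` is an iterable of dicts, assumed to be in time order, with
    hypothetical 'vehicle_id', 'signal' and 'value' keys.
    """
    last_value = {}  # (vehicle_id, signal) -> last value that was kept
    kept = []
    for reading in readings:
        key = (reading["vehicle_id"], reading["signal"])
        if key not in last_value or last_value[key] != reading["value"]:
            kept.append(reading)
            last_value[key] = reading["value"]
    return kept


# Hypothetical sample: the second reading repeats the first and is dropped.
readings = [
    {"vehicle_id": "A", "signal": "tire_pressure", "value": 2.4},
    {"vehicle_id": "A", "signal": "tire_pressure", "value": 2.4},
    {"vehicle_id": "A", "signal": "tire_pressure", "value": 2.3},
    {"vehicle_id": "B", "signal": "tire_pressure", "value": 2.4},
]
compressed = drop_unchanged(readings)
```

A scheme of this kind reduces volume most for slowly changing signals, but it assumes that downstream consumers can treat a missing reading as "unchanged since the last stored value".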