Best Practice

MLOps: machine learning operations

The application of DevOps principles in ML systems enables large-scale solutions to be implemented and managed reliably and efficiently.


Despite the growing adoption of models based on Machine Learning algorithms in recent years, companies have only partially succeeded in deploying solutions based on this type of technology into production and achieving a good return on investment.
Machine Learning Operations (MLOps) is a set of practices conceived to fill the gaps in the integration and maintenance of these systems within companies' software architectures. It builds on DevOps principles to facilitate the development, management and maintenance of these tools.

MLOps vs DevOps

DevOps is a software development methodology based on the principles of Continuous Integration and Continuous Delivery. Its purpose is to make development quicker and more efficient through frequent testing, integration and release cycles.

These practices are necessary but not sufficient for the development of software based on Machine Learning algorithms, for the following reasons:

  • Continuous Integration is not only about software components, but also about the underlying data and model

  • Continuous Delivery no longer concerns a single software package or service, but also the entire model training pipeline

  • a model needs to be re-trained over time

The notion of Continuous Training must therefore be introduced, meaning that we must automate the re-training of the model and the deployment of the new prediction service.
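As a sketch, a Continuous Training step can be reduced to a simple decision rule: when the monitored performance falls below an acceptance threshold, re-train the model and deploy the new prediction service. The threshold value and function names below are purely illustrative, not part of any specific framework:

```python
ACCURACY_THRESHOLD = 0.85  # assumed acceptance threshold, project-specific

def continuous_training_step(current_accuracy, train_fn, deploy_fn):
    """Re-train and redeploy the prediction service when performance degrades.

    train_fn and deploy_fn stand in for the automated training pipeline and
    the deployment step of the serving infrastructure.
    """
    if current_accuracy < ACCURACY_THRESHOLD:
        new_model = train_fn()   # automated re-training on fresh data
        deploy_fn(new_model)     # automated rollout of the new service
        return "retrained"
    return "unchanged"
```

In a real pipeline this rule would be evaluated by the monitoring component described later, rather than called by hand.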

Our MLOps approach

Thanks to project-based experience gained in recent years, as well as to scientific literature on the subject, we have refined our MLOps approach, summarised in the following key points.

Data versioning

Ensuring the reproducibility of the training phase

Each stage in the development and industrialisation of a model must be reproducible. Several experiments are carried out during the development phase, which may vary both in the model architecture used and in the data source and its processing. Tracking the model configuration alone is not sufficient to reproduce the results obtained during this iterative process: data versioning must also be possible.
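One minimal way to make dataset versions comparable is to derive a deterministic identifier from the data contents and record it with every experiment. The sketch below, which assumes the dataset fits in memory as JSON-serialisable records, is illustrative only; dedicated tools exist for large datasets:

```python
import hashlib
import json

def dataset_version(records):
    """Derive a deterministic version identifier from the dataset contents.

    Storing this hash alongside each experiment makes it possible to tell
    whether two training runs used exactly the same data.
    """
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]
```

Note that even a silent type change, such as a numerical feature arriving as text after a data-preparation change, produces a different version identifier.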

Real case application

As part of a series of activities aimed at reducing the churn rate of the services offered by one of our customers, we built a model to predict which customers would cancel their contract within a month. After re-training the model, we noticed a sharp drop in performance. To find the source of the problem, we compared the versions of the datasets used to train the last two versions of the model. During this analysis, we discovered that some numerical features contained text values, introduced by a change in the data preparation phase.

Versioning models

Enhancing the resilience and maintainability of the model in production

Once a model has been industrialised, it may become necessary to redeploy an earlier version, either because of conflicts arising during the release of a component or for performance reasons. To do this, it is important to also have information on the versions of the pipeline components used with the previous version of the model, as well as the metrics used to evaluate its performance. Versioning both the model image and its metadata is therefore needed to ensure the maintainability and resilience of the software.
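The registry behind such a rollback can be sketched as a mapping from version to metadata; the structures and the rollback policy below are hypothetical simplifications of what a real model registry provides:

```python
def register_model_version(registry, version, metrics, component_versions):
    """Record a model version with the metadata needed for rollback:
    evaluation metrics and the pipeline component versions it was built with."""
    registry[version] = {
        "metrics": metrics,
        "components": component_versions,
    }
    return registry[version]

def rollback_target(registry, current_version):
    """Pick the most recent earlier version to redeploy.

    Assumes version strings sort chronologically (e.g. "1.0" < "2.0").
    """
    earlier = [v for v in sorted(registry) if v < current_version]
    return earlier[-1] if earlier else None
```

With the metrics stored per version, the abnormal behaviour of a rolled-back model, as in the case below, can be spotted by comparison rather than guesswork.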

Real case application

During one of our projects, we deployed in production a new version of a model that classifies the customers of a website based on their browsing behaviour. One day, this new model started to classify all customers as “beginners”, i.e. customers with a low level of familiarity with doing business on the website. During the analysis phase, we redeployed the previous version of the model in production, as it returned less anomalous results than the new one. After analysing the results, we noticed that the performance of the previous version was also abnormal. Comparing the two versions of the model made it possible to identify that the source of the problem in fact lay in a malfunctioning data source, which contained user behaviour pertaining to an advanced feature of the website. Since this advanced feature was not used by all customers, the model failed to classify customers appropriately.

Monitoring performance

Ensuring the reliability of results over time

The relationships between the various entities and phenomena present in nature are constantly changing. The properties of the data consumed by an industrialised model change over time, degrading its performance. A model only guarantees the performance observed during development if the data it receives follows the same distribution as the data identified during its training. By monitoring the model, it becomes possible to observe a change in performance and, if necessary, to trigger a new training cycle. This is crucial to ensure the reliability of results over time.
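A very simple form of this monitoring is checking whether a feature's live distribution has drifted away from its training distribution. The z-score rule below is a deliberately crude sketch (real systems use richer drift statistics), with an assumed alert threshold:

```python
from statistics import mean, stdev

def mean_shift_alert(training_sample, live_sample, z_threshold=3.0):
    """Flag a feature whose live mean drifts more than z_threshold
    training standard deviations away from the training mean."""
    mu, sigma = mean(training_sample), stdev(training_sample)
    if sigma == 0:
        return False  # degenerate training feature, cannot score drift
    z = abs(mean(live_sample) - mu) / sigma
    return z > z_threshold
```

An alert from such a check is a natural signal for triggering the Continuous Training cycle described earlier.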

Real case application

One of our clients had developed an algorithm that could predict their customers’ purchases based on their behaviour on their e-commerce platform. When the model went into production, they observed low performance and could not understand why. When analysing the performance according to the time of day, we noticed that the model was associated with a high accuracy only at certain times of the day. After analysing the dataset used for training, we found that the model had been trained using data representing user behaviour only at the specific time when the model returned satisfactory results. By re-training it with a more representative user behaviour dataset covering the entire day, it was possible to improve the performance of the model considerably. This example clearly illustrates the importance of configuring a thorough monitoring of the model’s performance.

MLOps with Google Cloud

Google Cloud offers a range of services to meet the various needs that may arise during the development and deployment cycle of a Cloud product. As a certified Google Cloud Partner, our team recommends the following services for implementing the key principles described above.

BigQuery for data versioning

BigQuery is a data warehouse optimised for the storage and analysis (via SQL) of large datasets. Internal tables can be partitioned based on the insertion date in the database. Partitions are stored and treated as individual tables, optimising the cost and speed of queries performed on individual partitions. This service offers an ideal tool for versioning the data used for training a model, which typically consists of a large set of data on which analyses need to be carried out where appropriate.
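For ingestion-time partitioned tables, BigQuery exposes the `_PARTITIONTIME` pseudo-column, which lets a training run be pinned to an exact daily snapshot of the data. A sketch of building such a query (the table name is hypothetical):

```python
def training_snapshot_query(table, snapshot_date):
    """Build a query that reads a single ingestion-date partition, so a
    training run can be reproduced against an exact snapshot of the data."""
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE DATE(_PARTITIONTIME) = '{snapshot_date}'"
    )
```

Restricting the query to one partition also keeps scan costs proportional to the snapshot, not to the full table.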

Cloud Storage as a registry of model versions

The service offers an object storage system with various customisation possibilities. In particular, it is possible to select from different storage classes, based on the desired access frequency, and to optimise management costs.
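Using object storage as a model registry mostly comes down to a stable naming convention: one immutable object per model version. The path layout below is an assumption for illustration, not a Cloud Storage requirement:

```python
def model_object_path(model_name, version):
    """Naming convention for model artifacts in a bucket: one immutable
    object per version, so any version can be re-fetched for rollback."""
    return f"models/{model_name}/{version}/model.joblib"

def latest_version(object_paths, model_name):
    """Pick the latest version from listed object paths.

    Uses plain string sorting, which assumes simple or zero-padded versions.
    """
    prefix = f"models/{model_name}/"
    versions = sorted(p.split("/")[2] for p in object_paths if p.startswith(prefix))
    return versions[-1] if versions else None
```

Rarely redeployed older versions can additionally be moved to a colder storage class to reduce cost.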

Cloud SQL as a registry for model metadata

Cloud SQL is a fully managed service for relational databases that guarantees scalability and high availability. It also integrates well with other services such as Google Kubernetes Engine, Compute Engine and App Engine. It provides an excellent, high-performance backend to store model metadata such as the metrics used to evaluate its performance and the component versions used in production.
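The metadata registry itself can be a single relational table. The schema below is illustrative; sqlite3 is used here only as a local stand-in for a Cloud SQL database, since the SQL involved is the same in spirit:

```python
import json
import sqlite3

# Hypothetical metadata schema: one row per (model, version) pair.
SCHEMA = """
CREATE TABLE model_metadata (
    model_name TEXT NOT NULL,
    version    TEXT NOT NULL,
    metrics    TEXT NOT NULL,  -- JSON-encoded evaluation metrics
    components TEXT NOT NULL,  -- JSON-encoded pipeline component versions
    PRIMARY KEY (model_name, version)
)
"""

def save_metadata(conn, model_name, version, metrics, components):
    """Persist the metrics and component versions for a model version."""
    conn.execute(
        "INSERT INTO model_metadata VALUES (?, ?, ?, ?)",
        (model_name, version, json.dumps(metrics), json.dumps(components)),
    )

def load_metrics(conn, model_name, version):
    """Fetch the stored evaluation metrics for a given model version."""
    row = conn.execute(
        "SELECT metrics FROM model_metadata WHERE model_name=? AND version=?",
        (model_name, version),
    ).fetchone()
    return json.loads(row[0]) if row else None
```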

TensorFlow Model Analysis to monitor model performance

TFMA is one of the components of TensorFlow Extended (TFX), the platform Google recommends for running machine learning models at production scale. In particular, TFMA makes it possible to compute and visualise evaluation metrics for your models, including across slices of the data. This helps in assessing whether the model meets specific quality thresholds and performs as expected for the relevant data.
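The idea behind sliced evaluation can be illustrated without the library itself. The pure-Python stand-in below computes accuracy per slice, e.g. per hour of day, which is exactly the kind of breakdown that exposed the time-of-day problem in the monitoring case above:

```python
from collections import defaultdict

def sliced_accuracy(examples, slice_key):
    """Compute accuracy per slice of the data (a simplified stand-in for
    TFMA's sliced evaluation; field names are illustrative)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        s = ex[slice_key]
        total[s] += 1
        correct[s] += int(ex["prediction"] == ex["label"])
    return {s: correct[s] / total[s] for s in total}
```

A model whose overall accuracy looks acceptable can still show sharply degraded accuracy in individual slices.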

Kubeflow Pipelines as an orchestrator of the model development and distribution chain

Kubeflow Pipelines is a component of Kubeflow, Google’s open-source project aimed at simplifying the packaging and re-use of ML pipelines. It enables the management of components developed using the TensorFlow Extended libraries. In addition, it provides a useful user interface for keeping track of both the experiments conducted during the development phase and the various executions of the prediction service in production.


Machine Learning Reply is the Reply Group company that specialises in Machine Learning, Cognitive Computing and Artificial Intelligence solutions based on the Google technology stack. Based on the latest developments in artificial intelligence, Machine Learning Reply applies innovative Deep Learning, Natural Language Processing and Image/Video Recognition techniques to different realms of application, such as smart automation, predictive engines, recommendation systems and conversational agents.