We've touched upon the importance of data quality and what we might term a data-centric strategy for machine learning development. Now, what's your understanding of a machine learning pipeline?

It is important to understand that there is a machine learning model and then there is a machine learning pipeline. The model is just a set of coefficients that defines the mapping between the features of a new instance and the target variable.

But the machine learning pipeline is an engineering concept; it’s a system-level concept, which defines how you ingest the data, how you pre-process it, how you push it through your machine learning model, and ultimately, how you serve it in a scalable way.
Essentially, the entire chain of data processing delivers the end-to-end experience, from data ingestion down to a specific prediction.
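To make this concrete, here is a minimal sketch of such a chain as composable stages; the function and field names are illustrative, not taken from any particular framework:

```python
def ingest(raw_records):
    # Ingestion: drop records with missing fields.
    return [r for r in raw_records if r.get("value") is not None]

def preprocess(records):
    # Pre-processing: scale values to the 0-1 range (assumes non-negative inputs).
    hi = max(r["value"] for r in records)
    return [{**r, "value": r["value"] / hi} for r in records]

def predict(model, records):
    # The "model" here is just a threshold; a real one would be a trained estimator.
    return [1 if r["value"] > model["threshold"] else 0 for r in records]

def serve(raw_records, model):
    # The full chain: ingestion -> pre-processing -> prediction.
    return predict(model, preprocess(ingest(raw_records)))

preds = serve([{"value": 4}, {"value": None}, {"value": 10}], {"threshold": 0.5})
```

The point of the sketch is that the pipeline, not the threshold, is the system-level object: each stage can be swapped or scaled independently while the chain stays the same.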

From this perspective, there are two types of pipelines: the training pipeline and the inference pipeline. The training pipeline is necessary for creating a model, and it must maintain reproducibility properties such that, given the same data, you could get the same model again, one that is statistically indistinguishable from the one you had before.
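As a toy illustration of that reproducibility property, the trainer below pins its random seed so that the same data always yields the same "model" (here just an average); a real pipeline would also pin library versions and data snapshots:

```python
import random

def train(data, seed=42):
    # Fix the random seed so any shuffling/initialization is deterministic.
    data = list(data)
    random.Random(seed).shuffle(data)
    # Toy "training": the model is a single coefficient, the mean.
    return sum(data) / len(data)

m1 = train([1.0, 2.0, 3.0])
m2 = train([1.0, 2.0, 3.0])
# Same data + same seed -> identical model, as reproducibility demands.
```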

Then the inference pipeline helps to answer the question of how to take an already trained model, take a new piece of data that passes through the entire processing mechanism of your back end, and serve it in a scalable manner. In that case, the focus shifts to model maintenance, model monitoring, and essentially everything that happens after the model is put into production to interact with real users.
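One way to keep training and serving consistent is to bundle the preprocessing statistics with the model coefficients in a single artifact, so inference applies exactly the transform the model was trained with. A minimal sketch, with hypothetical artifact fields:

```python
import json

# Hypothetical artifact: model coefficients plus the preprocessing statistics
# computed at training time, stored together (e.g. in a model registry).
artifact = {"mean": 5.0, "std": 2.0, "coef": 1.5, "bias": 0.2}
blob = json.dumps(artifact)

def infer(raw_value, blob):
    a = json.loads(blob)
    x = (raw_value - a["mean"]) / a["std"]  # identical scaling to training time
    return a["coef"] * x + a["bias"]

y = infer(7.0, blob)  # (7 - 5) / 2 = 1.0 -> 1.5 * 1.0 + 0.2 = 1.7
```

Bundling the transform with the coefficients is what prevents training/serving skew, one of the classic failure modes once a model meets real users.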

One thing that people need to understand is the concept of model performance, model drift, and data drift. How does this fit into MLOps and into the management of models and drift?

Two distinct conversations need to be explored here: one about MLOps practices, and one about managing drift.

At the moment, we are at a stage where organizations should think about implementing these practices. We need to implement touchless systems, implementing a DevOps cycle where work in your development environment automatically moves to QA and then, after certain automated tests and quality gates, moves on to production.

There are challenges here in terms of data preparation, because we have multiple data sources and multiple formats, and then the cleaning and the transformation.

On the model-building side, the choice of algorithms is huge; we cannot restrict data scientists in terms of language or development tools, so we need to support a variety of systems. On the model-training side we find several challenges as well, as it is a code-first approach, and this carries several intrinsic issues.

Following this, they register the models, and the ML engineers validate and deploy them; it’s an end-to-end feedback loop. The aforementioned data drift is a trigger: you have a baseline of the data, and when the next day’s data has changed, this can trigger model retraining; the same goes for model performance drift.

The issue of data drift is so often misunderstood, particularly by business leaders rather than technical people. What is the importance of drift monitoring, and why is it important for managing your model’s performance when you have models in production?

In the instance of data drift, you have a distribution: all machine learning methodology, including deep learning as a subclass, works only if you get data from the same distribution the model was trained on. If the distribution changes, but your model was trained on the previous version of the distribution, then it will start making mistakes.
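A common way to check whether production data still comes from the training distribution is a two-sample Kolmogorov-Smirnov test; a self-contained sketch, with illustrative thresholds:

```python
import bisect
import random

def ks_statistic(a, b):
    # Max gap between the two empirical CDFs: a large gap means the
    # samples likely come from different distributions.
    a, b = sorted(a), sorted(b)
    def cdf(s, x):
        return bisect.bisect_right(s, x) / len(s)
    return max(abs(cdf(a, v) - cdf(b, v)) for v in a + b)

rng = random.Random(0)
baseline = [rng.gauss(0, 1) for _ in range(1000)]  # training-time feature
same     = [rng.gauss(0, 1) for _ in range(1000)]  # production data, no drift
shifted  = [rng.gauss(2, 1) for _ in range(1000)]  # production data after drift

drifted = ks_statistic(baseline, shifted) > 0.3    # would trigger retraining
```

In practice one would run such a check per feature on a schedule and alert or retrain when the statistic crosses a calibrated threshold rather than the hard-coded 0.3 used here.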

This leads to problems such as incorrect predictions, or it might harm users, deliver a bad user experience, or the model might start discriminating against people.

To solve this, explainable AI comes to help, as a subfield that advocates for more interpretability in AI model development.

What are the challenges in this scenario?

Your data is not a universal set of all events happening in the world; it’s a sample, and like any sample, it could be biased. That bias could have very different implications, and this leads to two very under-discussed aspects of machine learning model evaluation.

Many people are familiar with cross-validation and statistical model evaluation. You have a sample of your data and you take a subset of it as a holdout: you train on 80% of your data and then evaluate on the remaining 20%, which the model hasn’t seen during training. If there is an intrinsic bias in this 20%, the model will have issues and would not accurately reflect the sample.
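The holdout procedure described here can be sketched in a few lines; note that the split is random, so a biased holdout is always possible with a skewed or unlucky sample:

```python
import random

def holdout_split(data, test_frac=0.2, seed=0):
    # Shuffle, then hold out the last test_frac of the data for evaluation.
    data = list(data)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * (1 - test_frac))
    return data[:cut], data[cut:]

data = list(range(100))
train_set, test_set = holdout_split(data)
# 80/20 split, and the model never sees the holdout during training.
```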

Nick Spirin

How should an organization that is thinking about building machine learning applications, reflect on designing a pipeline for a specific industry use case and what are the kind of considerations in designing that pipeline?

Firstly, we need to understand that the impact of AI is not immune to the Pareto principle: 80% to 90% of the value will come from 20% of the potential use cases. The first thing that any industry should do is to find those high-value, low-hanging use cases that are good candidates for machine learning applications. A big advantage is that even if we are talking about industry-specific problems, the starting phase is quite generalized.

For instance, leaders need to move from a rigid and risk-averse environment toward a more agile, experimental, and adaptable one, because the outcome is not guaranteed. There are solid machine learning areas where we have achieved human parity at many different levels, such as computer vision and chatbots; nevertheless, a perfect solution to your problem is not guaranteed.

This is due to the experimental nature of the application. A more agile mindset needs to come from precisely those industries which are known to be the most rigid and risk-averse in approaching their projects and are lagging in the adoption of machine learning.

Business leaders also need to consider the build-versus-buy side of it, because even if they have identified a use case, they need either data science in-house or a commercial offering to build it. They have to think about cost and quality, particularly data quality; they need to look at the data, how they will annotate it, and how much they can rely on legacy data systems. So patience becomes a virtue.

What are your last thoughts when considering the issue around machine learning pipelines and data-centric machine learning and what are your recommendations?

Firstly, it’s important to think about the problem at the platform level and to start thinking about the building blocks that you could reuse within your organization to enable more AI adoption over time. In 2015, there were only a few large-scale machine learning applications at Google; now there are more than 1,000 different applications.

Inside Google, every single decision that is made has some ML element within it, and the same happens for any organization that is trying to transition to become AI-powered or to adopt this new, powerful technology.

Without a platform that can enable AI at scale, on one side there will be great demand for AI after its first few successful applications, but on the other side the platform won’t be at the right maturity level to support these initiatives.

This gap makes projects fail at an even higher rate: from 70% up to 89 or 90% of all digital transformation programs fail, and machine learning models then fail to be deployed to production.

People think about isolated ML applications and not systematically at the level of the organization, balancing supply and demand together. This is why 60% to 70% of all digital transformations fail. The crucial thing is not to look at these use cases in isolation; you want to find use cases that are complementary to each other.

For instance, a manufacturer may want to do remaining-useful-life estimation, to predict when a part in a machine will fail; complementary to that would be anomaly detection, to understand what in the machine is currently operating out of distribution.
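As a toy illustration of the anomaly-detection side, one can flag sensor readings that fall far outside a machine's baseline distribution; this is a simple z-score rule with made-up numbers, not any particular vendor's method:

```python
def fit_baseline(readings):
    # Baseline statistics from normal-operation sensor data.
    mean = sum(readings) / len(readings)
    var = sum((x - mean) ** 2 for x in readings) / len(readings)
    return mean, var ** 0.5

def is_anomaly(x, mean, std, k=3.0):
    # Flag readings more than k standard deviations from the baseline.
    return abs(x - mean) > k * std

mean, std = fit_baseline([10.0, 10.2, 9.8, 10.1, 9.9])
flag = is_anomaly(14.0, mean, std)  # a reading far outside the baseline band
```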

Subsequently, they can connect another pipeline, effectively a reinforcement learning pipeline, so that the system manages and optimizes the risk of failure.

As you scale them horizontally, these applications lead you from weak AI, with a singular application, to a system that starts to look much more like strong AI: you can predict when something is going to happen, you know what is going wrong, you can self-optimize to minimize it, and you could even have a front end with a conversational interface that allows a human engineer to converse with the system.
