Industrialising Machine Learning

First, what is machine learning (ML) and how can we define it?

Many definitions of machine learning float around. Broadly, machine learning is an application of artificial intelligence that uses algorithms to parse data, learn from it, and then apply what they have learned to make informed decisions.

There are hundreds of examples of machine learning; a music streaming service, for instance, uses ML to decide which new song or artist to recommend to a listener. Machine learning algorithms associate a listener's preferences with those of other listeners who have similar musical tastes.

This technique is used in many services that offer an automatic recommendation.
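The idea behind such recommendations can be sketched as simple user-based collaborative filtering. This is a hypothetical toy illustration (the listener names, play counts and `recommend` helper are invented for the example), not any real streaming service's algorithm:

```python
from math import sqrt

# Toy data: listener -> {song: play count}
listens = {
    "ana":  {"song_a": 5, "song_b": 3, "song_c": 0},
    "ben":  {"song_a": 4, "song_b": 4, "song_c": 1},
    "cleo": {"song_a": 0, "song_b": 1, "song_c": 5},
}

def cosine(u, v):
    """Cosine similarity between two listeners' play-count vectors."""
    songs = set(u) | set(v)
    dot = sum(u.get(s, 0) * v.get(s, 0) for s in songs)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(user):
    """Recommend the unheard song most played by the most similar listener."""
    others = [(cosine(listens[user], listens[o]), o)
              for o in listens if o != user]
    _, nearest = max(others)
    unheard = {s: c for s, c in listens[nearest].items()
               if listens[user].get(s, 0) == 0}
    return max(unheard, key=unheard.get) if unheard else None

# Ana's tastes are closest to Ben's, so she gets Ben's song she hasn't heard
print(recommend("ana"))  # -> song_c
```

Production recommenders use far richer signals and models, but the core step is the same: find listeners with similar taste and surface what they enjoy.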

More specifically, deep learning (DL) is currently considered an evolution of machine learning: it uses programmable neural networks that enable machines to make accurate decisions without human help, so it is more sophisticated.

Andrew Ng, chief scientist of Baidu, used a very popular analogy to explain DL:

Deep learning is very similar to a rocket ship: you need a huge engine and a lot of fuel. If you have a large engine and a tiny amount of fuel, you won't make it to orbit. Similarly, if you have a tiny engine and a lot of fuel, you won't even take off. The analogy to deep learning is that the rocket engine is the deep learning model, and the fuel is the huge amount of data that these models feed on.

Andrew Ng

In terms of the place of machine learning and modern analytics, where does it fit into that very broad field of AI?

Organisations and scientists need to approach these questions by thinking about ML on a spectrum, from the simplest rule-based automation to the most sophisticated forms of AI.

On that spectrum, from very basic rule-based automation to strong AI, which is essentially equivalent to the human brain, there is a place for machine learning. ML comes into use when a problem is complex enough that you can no longer maintain it all as hardcoded rules.

A solution is to take a lot of data and pass it through an algorithm, making sure that the coefficients, the parameters of the machine learning model, capture the statistics in the data and the dependencies between the features and the target variable, so the model can then be used to make new predictions.
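As a minimal sketch of that idea, the toy model below fits a single-feature linear model by gradient descent; the learned parameters `w` and `b` end up capturing the dependency between feature and target, which is what lets the model predict for unseen inputs. All names and data here are invented for illustration:

```python
def fit_linear(xs, ys, lr=0.01, epochs=2000):
    """Fit y ~ w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Training data generated by the rule y = 2x + 1; the fit should recover it
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = fit_linear(xs, ys)
prediction = w * 5 + b  # predict for a new, unseen input (close to 11)
```

Real ML models have millions of parameters rather than two, but the principle is the same: the parameters are adjusted until they summarise the statistical relationship in the training data.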

From this perspective, ML is one possible technique for achieving automation, and one that allows more complex types of automation than were previously available with simple rules and standard programming languages or paradigms.

Danone, for instance, through ML automation achieved a 20% decrease in forecasting errors, a 30% decrease in lost sales and a 50% reduction in demand planners' workload. A recent McKinsey report suggests that AI can improve forecasting accuracy in manufacturing by 10-20%, which translates into a 5% reduction in inventory costs and a 2-3% increase in revenues.

How has machine learning developed over its long history and more recently, in particular?

In terms of evolution, the first notable milestone in the history of AI was Rosenblatt's paper on the perceptron.

It implemented a single neural mechanism: you pass data to it and the mechanism learns to separate points in a two-dimensional space, the goal being to draw a line that separates them.

In the paper, it was proven that with a sufficient number of samples coming from the same distribution, the perceptron algorithm could be used to draw that line. That was the beginning of AI and machine learning.
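The perceptron learning rule described above is short enough to show in full. This is a minimal modern sketch of Rosenblatt's rule on invented toy data, not the paper's original notation:

```python
def perceptron_train(points, labels, epochs=20, lr=1.0):
    """Learn weights w and bias b so that sign(w.x + b) matches the labels."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):  # y is +1 or -1
            activation = w[0] * x1 + w[1] * x2 + b
            if y * activation <= 0:              # misclassified: nudge the line
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
    return w, b

# Linearly separable toy data: points above the line x2 = x1 are labelled +1
points = [(0, 1), (1, 2), (2, 3), (1, 0), (2, 1), (3, 2)]
labels = [1, 1, 1, -1, -1, -1]
w, b = perceptron_train(points, labels)
predictions = [1 if w[0] * x1 + w[1] * x2 + b > 0 else -1
               for x1, x2 in points]
# predictions now matches labels: the learned line separates the two classes
```

The convergence guarantee mentioned above is exactly this: if such a separating line exists, repeating these small corrections is proven to find it in a finite number of updates.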

Then the field was rebranded multiple times: pattern recognition, then decision trees, and ultimately neural networks.

In 2012, there was a breakthrough: for the first time in the history of humankind, a neural network-based algorithm was able to beat computer vision algorithms based on alternative approaches, such as morphology analysis, hand-designed filters, and so on.

Then computer vision deep learning algorithms surpassed humans, or became comparable with humans, in the task of recognising images. One of the most exciting applications, says Nik Spirin, came in May 2021, when OpenAI released products where you could speak to the computer and it would write programs for you.

In summary, it is probably unhelpful to try to classify definitions of what machine learning is, because there has been a confluence of techniques and methodologies from different fields. We shouldn't linger too long on definitions; what we want to get to is the state of the field and how to scale these systems.

Moving to data, is there a need for more data-centric approaches to ML?

There is a very distinct shift in the field of data as well as machine learning: a merge is happening between traditional data warehouses, unstructured data and machine learning. Data-centric and model-centric machine learning are not separate things, says Rajdeep Biswas; they need to walk together. You can build heavy deep learning models, but to custom-tune them you still need data.

Source: A Chat with Andrew on MLOps: From Model-centric to Data-centric AI

Another aspect of this is when someone is using commercial machine learning models. For example, at Microsoft, says Rajdeep Biswas, we expose them as cognitive services. Those are pre-built models that we build, not using customers' data but our own, and customers can then use their data to tune them.

We see, continues Rajdeep Biswas, that all cloud companies are trying to minimise the huge amount of data you need in order to build custom machine learning.

However, as of today, data is still very much needed, and data should be the spearhead of how we try to solve the business problem and make informed decisions.

In the recently published literature on data-centric versus model-centric approaches, the latter being, in the end, about optimizing the model, continues Rajdeep, what emerges is an effort to deal with noise in the data post hoc, and that is why data quality matters.

