One of the hard truths of the machine learning and AI industry is that, according to recent Accenture research, only 16% of companies have figured out how to scale AI, while 50% of IT leaders say they struggle to move projects past the proof-of-concept stage.
Moreover, the same research estimated an average ROI gap of $100 million between organizations that are successfully and strategically scaling their machine learning applications and those that are stuck in the proof-of-concept cycle.
So why is industrializing machine learning and AI so hard?
Knowing how to train and test models is no longer the hard part of data science. Making them work at scale to industrial standards is.
Industrializing and scaling machine learning are hindered by several factors. The first is the traditional, largely manual development process, which requires a very large engineering overhead to put these systems into production, manage them, and effectively industrialize them.
Google illustrated the size of this overhead in a clear and straightforward diagram, showing the engineering components, highlighted in blue, that must orbit around a small core of machine learning code for it to run at scale in production.
Engineering overhead is not the only challenge in play. Training and testing models, and making them work at scale to industrial standards, incurs incredibly high data-transfer and compute costs at the data volumes we encounter in the IoT setting in particular.
Increasingly, the distributed and fragmented data landscape created by distributed digital devices raises numerous data privacy and security concerns, compounded by an ever-evolving regulatory landscape.
At T-DAB, we think of data at industrial scale as having gravity.
In small amounts, data is easy to move, store, process, and manage. In large amounts, however, handling and managing data brings exponentially increasing costs and technical complexity.
Furthermore, moving and centralizing data creates ever-expanding data privacy and security challenges, increasingly difficult to manage due to regulatory requirements that vary and change across geographies. We think of data as having gravity because many forces oppose its easy and inexpensive movement.
Traditional analytics and machine learning demand moving data to a centralized location. However, most of the world's data is now generated by distributed devices, in particular IoT devices, and ideally we would avoid moving this data and instead carry out as much processing as possible close to where it is generated.
This is a key tenet of the decentralized paradigm of data analytics.
Under the decentralized paradigm, we break with the traditional way of doing machine learning analytics: instead of moving data to a central location for model training and development, we bundle up ML models and move them to where the data is generated.
This approach allows us to create entire pre-packaged machine learning pipelines, wrapped up into a program, that can be deployed locally to distributed devices. Models are then trained on each device, and only the abstracted model weights are returned, which a central server aggregates into a new global model.
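The local-train-then-aggregate loop can be sketched in a few lines. Below is a minimal federated-averaging (FedAvg-style) simulation on synthetic linear-regression data; the model, learning rate, round counts, and data are illustrative assumptions, not the pipeline we actually deploy.

```python
import numpy as np

def local_update(weights, X, y, lr=0.05, epochs=5):
    """A few epochs of gradient descent for a linear model on one device's private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient
        w -= lr * grad
    return w

def fed_avg(client_weights, client_sizes):
    """Aggregate local weights, weighting each client by its sample count."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Simulate three devices, each holding its own data that never leaves the device.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
datasets = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    datasets.append((X, y))

global_w = np.zeros(2)
for _ in range(20):  # communication rounds: only weights travel, never data
    local_ws = [local_update(global_w, X, y) for X, y in datasets]
    global_w = fed_avg(local_ws, [len(y) for _, y in datasets])
```

Note that in each round the server sees only the weight vectors, not the raw samples, which is the property that lets data stay where it is generated.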
These pipelines include all of the necessary MLOps components to train, deploy, and monitor models at scale.
The pipelines are deployed via a distributed learning platform that leverages federated learning. This allows our customers to use the data from thousands of machines across manufacturing lines, plants, and geographies to train and deploy machine learning models, without ever connecting those machines to a central data repository.
The result for customers is large savings on data transfer. It also avoids the problems of limited or intermittent network connectivity, and adds further layers of data privacy and security, particularly valuable in the context of industrial espionage.
Case Study: ML Pipelines for Aero Engines Life Estimation
To test and optimize these pipelines, we applied them to a dataset to develop remaining useful life (RUL) estimation using IoT data and federated learning.
Life estimation is a ubiquitous problem in the management of machinery, both in manufacturing and in other industries such as aerospace.
For this client, the challenge was to predict, ahead of time, when a failure would occur, in order to maximize uptime, minimize downtime, and optimize maintenance scheduling for an engine, and for machines more broadly.
This use case is particularly challenging in the presence of poor network stability or no connectivity at all, and of highly distributed devices generating heterogeneous data distributions (non-IID data, i.e. not independently and identically distributed).
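To make "predicting when a failure will occur" concrete: with run-to-failure sensor logs, the standard supervised target is the remaining useful life at each operating cycle. A common refinement caps the label, since an engine far from failure degrades very little. This is a generic sketch of that labeling convention, not the client's actual scheme; the cap of 130 cycles is an illustrative value.

```python
import numpy as np

def rul_labels(cycles, cap=130):
    """Remaining useful life at each cycle: cycles left until the last
    recorded (failure) cycle, capped to give a piecewise-linear target."""
    last = cycles.max()
    return np.minimum(last - cycles, cap)

# One engine's run-to-failure history: cycles 1..200
cycles = np.arange(1, 201)
labels = rul_labels(cycles)
# Early in life the label sits at the cap; at failure it reaches zero.
```

A regression model trained against these labels then answers "how many cycles remain?" at any point in the engine's life.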
Our solution was to develop a pre-packaged machine learning pipeline, optimized by our data scientists for predicting remaining useful life. We then deployed the pipeline to 100 aero engines, trained it in a federated environment, and aggregated the weights from the models trained on the individual devices.
This was all deployed in a Dockerized environment, backed by a Microsoft Azure platform.
We found that we could achieve federated global model performance competitive with the state of the art. Moreover, the federated models could massively outperform classical survival analysis, statistical models, and even our own optimized centralized models, all without moving any data.
We found that the performance of our federated models depended largely on the combination of the ML model type, the aggregation algorithm used, and the balance of the data splits used for aggregation across the fleet of engines.
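One concrete way the balance of the splits matters: when engines hold very different amounts of data, a plain average of local weights and a sample-count-weighted average (as in FedAvg) can disagree substantially. The weight vectors and counts below are made-up illustrations, not results from the case study.

```python
import numpy as np

# Hypothetical local weight vectors from three engines after one training
# round, with a heavily unbalanced data split across the devices.
local_weights = [np.array([1.0, 0.0]),
                 np.array([0.0, 1.0]),
                 np.array([0.5, 0.5])]
sample_counts = [900, 50, 50]

# A plain average treats every engine equally, regardless of data volume.
unweighted = np.mean(local_weights, axis=0)

# FedAvg-style aggregation weights each engine by its sample count,
# so the data-rich engine dominates the global model.
total = sum(sample_counts)
weighted = sum(w * (n / total) for w, n in zip(local_weights, sample_counts))
```

Which behavior is preferable depends on whether the minority engines carry genuinely different operating conditions (non-IID data) that the global model must not wash out, which is why the aggregation algorithm was a key tuning choice.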
These results were exciting and inspiring: remaining useful life estimation, a critical use case, proved possible in a federated learning environment. This opens up the possibility of applying the approach more widely across industrial IoT, bringing our customers the advantages of minimized data-movement costs, reduced model training costs on very large datasets, and maximized data privacy and security, all without having to worry about intermittent network connectivity.
Get in touch to learn more about how we can help you accelerate your AI journey by applying these cutting-edge federated systems and pre-packaged, out-of-the-box machine learning pipelines.