Teaching AI to Sail
Improving Industrial Automation performance with Deep Reinforcement Learning and RNNs
Modern ocean racing boats are high-performance machines, more comparable to aircraft than to the yachts of old. They combine cutting-edge materials science, aerodynamics and hydrodynamics, navigation systems, telecommunications, and sensors.
However, one underdeveloped technological domain is the use of real-time data from boat sensors for automated performance optimisation. While data is relayed via displays to a human, there is little or no interfacing with the autopilots that are vital for long distance racing with only one or two crew.
Crafting the winning edge
During long-distance solo races, sailors rely on the autopilot to steer the boat more than 90% of the time. However, current autopilot technology makes little use of the large volume of data available from the fully integrated IoT network of sensors installed on modern yachts, and achieves only around 80% of a human sailor's performance (as measured by a boat performance metric).
This is a vital margin: statistically, a performance increase of just 2% over the previous edition is all that is required to win the Vendée Globe.
The Data Analysis Bureau (T-DAB) worked with Jack Trigger Racing (JTR) to develop a solution using a machine learning-based system to match and potentially exceed human performance to give the solo sailor a winning edge while remaining within the ethical bounds of racing rules and regulations.
LSTM-based system predicting steering angles with a precision of +/- 1 deg. from angles chosen by a professional sailor
Improved industrial automation of the autopilot by nearly 10% for the winning advantage
Enabling the racing boat to steer optimally, adapting to new conditions to move more quickly and improve power consumption
T-DAB first conducted a Discovery and Design study to assess the available data and how it could be used to improve autopilot performance, and then applied a range of machine learning and deep learning approaches and simulation models, assessing their effectiveness through a Proof of Concept (PoC).
The project was developed in two parallel streams: the first focused on replicating the sailor’s behaviour and matching their performance, whilst the second focused on exceeding it.
The first stream uses recurrent neural networks to learn from data recorded while a professional sailor is at the tiller: the model learns the steering angles the sailor chooses under given conditions. Presented with a sufficiently diverse set of scenarios, the model was able to generalise across settings rather than being specific to one particular race, and to set steering angles matching those the sailor would have chosen during the race.
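As an illustration of how logged helm data can be framed as a supervised sequence problem, here is a minimal sketch in plain Python. The channel names (`wind_angle`, `boat_speed`, `heading`, `rudder_angle`), the window length and the helper functions are assumptions made for this example; the article does not list the actual sensor channels or describe the project's pipeline. Each training example pairs a short window of recent sensor readings with the steering angle the sailor chose next, which is the kind of input/target pair a recurrent network could be trained on. The second helper scores predictions against a tolerance such as the +/- 1 deg. figure quoted above.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Dict[str, float]

# Hypothetical channel names: the real sensor channels are not given in the article.
FEATURES = ("wind_angle", "boat_speed", "heading")
TARGET = "rudder_angle"


def make_windows(
    log: Sequence[Sample], window: int
) -> Tuple[List[List[List[float]]], List[float]]:
    """Turn a time-ordered log into (input window, next steering angle) pairs."""
    inputs: List[List[List[float]]] = []
    targets: List[float] = []
    for end in range(window, len(log)):
        history = log[end - window:end]  # the `window` rows before time `end`
        inputs.append([[row[name] for name in FEATURES] for row in history])
        targets.append(log[end][TARGET])  # angle the sailor chose at time `end`
    return inputs, targets


def within_tolerance(
    predicted: Sequence[float], actual: Sequence[float], tolerance_deg: float = 1.0
) -> float:
    """Fraction of predictions within +/- tolerance_deg of the sailor's angle."""
    if not actual:
        return 0.0
    hits = sum(abs(p - a) <= tolerance_deg for p, a in zip(predicted, actual))
    return hits / len(actual)
```

A log of five samples with `window=3` yields two training pairs; the inputs would then be fed to whichever sequence model is chosen, and `within_tolerance` applied to its predictions on held-out data.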
The second stream, which set out to outperform human behaviour, was based on model-free reinforcement learning: the algorithm was given freedom to explore any series of actions, within physical constraints, and to come up with its own strategy for maintaining the optimal course. The potential of this method lies in removing human-induced biases, allowing the model to explore approaches humans might never have thought of, and thus potentially delivering superior performance.
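The article does not say which reinforcement learning algorithm was used, so the following is only a toy sketch of the model-free idea, using tabular Q-learning as a stand-in. Everything in it is an assumption for illustration: the state is a heading error in whole degrees clipped to [-3, 3], each action nudges that error by -1, 0 or +1, and the reward penalises the remaining error. The agent is never told how the "boat" responds; it learns a steering strategy purely from trial, error and reward.

```python
import random

ERRORS = range(-3, 4)  # toy state: heading error in degrees, clipped to [-3, 3]
ACTIONS = (-1, 0, 1)   # toy action: change applied to the heading error


def step(error, action):
    """Toy stand-in for the boat's response: returns (next_error, reward)."""
    next_error = max(-3, min(3, error + action))
    return next_error, -abs(next_error)  # smaller error means higher reward


def train(episodes=2000, steps=20, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(e, a): 0.0 for e in ERRORS for a in ACTIONS}
    for _ in range(episodes):
        error = rng.choice(list(ERRORS))  # start each episode off course
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(error, a)])  # exploit
            next_error, reward = step(error, action)
            best_next = max(q[(next_error, a)] for a in ACTIONS)
            # Q-learning update: move the estimate towards reward + discounted future value.
            q[(error, action)] += alpha * (reward + gamma * best_next - q[(error, action)])
            error = next_error
    return q


def greedy_policy(q):
    """Best known action for each heading error."""
    return {e: max(ACTIONS, key=lambda a: q[(e, a)]) for e in ERRORS}
```

Calling `greedy_policy(train())` gives a table that maps each heading error to the learned correction. A real system would need a far richer state and action space, and a function approximator rather than a table.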
"This is an exciting but also challenging project, and T-DAB have been a pleasure to work with; finding a comfortable balance between innovation and breakthrough, with real tangible results"
Jack Trigger | Skipper & Director | JTR
For such an algorithm to learn, however, it needs accurate feedback on its actions, and the actions it chooses can lead to situations that never occurred in the recorded data, so there would be nothing to compare them against. Since sending a real boat to sea to learn unsupervised was out of the question, an accurate simulation of the boat’s interaction with the environment was required. This simulation was built using Long Short-Term Memory (LSTM) recurrent neural networks.
By observing the state of the environment (sea and wind), the state of the boat (position, orientation, speed) and the steering angle chosen by the algorithm, the simulation was able to predict the state of the boat in the next time step. By using a wide range of sample wind and sea state data, we were able to set up the reinforcement learning agent to freely explore different environments and generalise its decision process to be applicable in any race or race setting.
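The simulation loop described above can be sketched as a transition function that maps the current conditions, boat state and steering angle to the next boat state. The sketch below is a minimal illustration under stated assumptions: the class and function names are hypothetical, the state is reduced to heading and speed, and `toy_model` is a hand-written placeholder standing in for the learned LSTM simulator (it is not real boat physics). A recurrent model would also carry a hidden state between steps, which is omitted here for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Conditions:
    wind_speed: float      # knots
    wind_direction: float  # degrees


@dataclass(frozen=True)
class BoatState:
    heading: float  # degrees
    speed: float    # knots


# A transition model maps (conditions, current state, steering angle) to the next state.
TransitionModel = Callable[[Conditions, BoatState, float], BoatState]
Policy = Callable[[Conditions, BoatState], float]


def toy_model(conditions: Conditions, state: BoatState, rudder_angle: float) -> BoatState:
    """Hand-written placeholder for the learned simulator (illustrative only)."""
    heading = (state.heading + 0.5 * rudder_angle) % 360.0
    speed = 0.9 * state.speed + 0.1 * (0.5 * conditions.wind_speed)
    return BoatState(heading, speed)


def rollout(
    model: TransitionModel,
    policy: Policy,
    conditions_seq: Sequence[Conditions],
    start: BoatState,
) -> List[BoatState]:
    """Step the simulator forward, letting the policy choose each steering angle."""
    states = [start]
    for conditions in conditions_seq:
        rudder_angle = policy(conditions, states[-1])
        states.append(model(conditions, states[-1], rudder_angle))
    return states
```

Because the agent only ever calls `model(...)`, the placeholder can be swapped for a trained network without changing the training loop, and different sequences of wind and sea conditions can be replayed to expose the agent to varied environments.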
The solution is complex, yet fully automated by design, so the solo sailor (the primary user) does not have to interact with the system. Instead, it would run in production on the boat's processor without intervention, causing minimal disruption to the sailor while having the greatest possible impact on their effort.
The sailor would upload the data to the Microsoft Azure Cloud via a simple interface, and data scientists would then carry out model re-training, validation and benchmarking in a dedicated environment. Updated models would be deployed to the boat in a Docker container using Microsoft Azure IoT Edge technology.
This project was developed as part of T-DAB’s Innovation Sandbox in collaboration with Imperial College, NKE and Microsoft. The data for the project was provided by the Jack Trigger Racing sailing team and the marine electronics company NKE, and the solution is managed and deployed using Microsoft Azure.