When the elite of the world’s sports sailors set off in the Vendée Globe sailing race later this year, their boats will not be steered by the solo skippers alone: during large parts of the race, the autopilots on board will be at the helm. Yet these do not deliver the same performance as human skippers, who adapt their steering to wind, waves and other factors. Autopilots have very simple guidance behaviour, relying on traditional closed-loop PID control that requires tedious fine-tuning by the sailors.
Since 2019, T-DAB.AI (T-DAB) has teamed up with British professional sailor Jack Trigger, from Jack Trigger Racing (JTR), and NKE to close this gap between autopilots and human skippers by embracing the opportunities of machine learning (ML) to train intelligent sailing autopilots.
JTR have provided high-resolution data from several sailing races, including data about the boat’s state – such as the position, velocity and the position of the boat’s rudder – and the boat’s sea environment, e.g. wind speed and wind direction, all of which is recorded multiple times per second. This data is used to train ML algorithms that provide the optimal position of the boat’s rudder, as output, in a given state of the boat (velocity etc.) and of the sea (wind etc.).
Different breeds of digital twins
Various things can be understood by the word “optimal”. In previous iterations of the project, ML algorithms were trained to act as realistic digital twins of the human skipper’s steering behaviour. This was done using datasets from races in which the skipper steered the boat by hand. “Optimal” in this case meant predicting, as accurately as possible, the future rudder angles the skipper would choose, based on a history of boat data preceding that future instant. In this instance of supervised learning, the ML algorithm can, in the very best case, match the steering performance of the skipper, but by definition cannot exceed it.
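The supervised setup described here can be sketched as a sliding window over the recorded data. This is only an illustration: the function name, the dictionary-based record layout and the `rudder_angle` key are hypothetical and not taken from the project.

```python
from typing import Dict, List, Tuple

Record = Dict[str, float]  # one sensor sample; keys are hypothetical


def make_supervised_pairs(
    records: List[Record],
    history_len: int,
    horizon: int,
    target_key: str = "rudder_angle",
) -> Tuple[List[List[Record]], List[List[float]]]:
    """Pair each window of past samples with the skipper's next rudder angles.

    inputs[i]  holds `history_len` consecutive samples,
    targets[i] holds the `horizon` rudder angles that directly follow them.
    """
    inputs: List[List[Record]] = []
    targets: List[List[float]] = []
    last_start = len(records) - history_len - horizon
    for start in range(last_start + 1):
        split = start + history_len
        inputs.append(records[start:split])
        targets.append([r[target_key] for r in records[split:split + horizon]])
    return inputs, targets
```

A model trained on such pairs imitates the recorded steering, which is why it can at best match the skipper.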
In contrast, another approach previously tested within T-DAB can, in theory, exceed the performance of the human skipper. It relies on Reinforcement Learning (RL) algorithms, which learn to steer the boat by receiving feedback from their environment on their steering behaviour. This feedback comes in the form of a reward function: the faster and more directly the RL-controlled boat approaches a target point, the higher the reward. By repeatedly exploring new control behaviours and exploiting those that have already proven rewarding, the RL algorithm learns a reward-maximising control behaviour.
The catch is that letting an RL algorithm learn from the real feedback of a real sailing boat over thousands and thousands of iterations in a given sea environment would not only exceed any research team’s budget but could also be very dangerous. Hence, the learning environment had to be moved from the real to the virtual world: a digital twin had to be developed, this time of the boat rather than of the human sailor.
More precisely, this digital twin is given data on past boat and sea states and has to predict the state of the boat over the next time increments. Based on this simulation environment, which reproduces the boat’s behaviour in a given situation, the RL algorithm can gain experience of what the optimal rudder angle is, much like OpenAI’s Gym provides environments to train RL algorithms. A first exploration within T-DAB has provided a proof of concept for this approach, in which an RL algorithm learns optimal control behaviour in a virtual boat environment. However, the study has also shown that developing a reliable digital twin of the boat is rather challenging. Improving this digital twin is the focus of the present investigations in this project.
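One possible reading of such a reward, sketched here purely for illustration, is the component of the boat’s velocity that points at the target (often called velocity made good). The project’s actual reward function is not given in this article; the function name and the flat x/y coordinates in metres are assumptions.

```python
import math


def vmg_reward(
    x: float, y: float,                # boat position (assumed flat local frame, metres)
    vx: float, vy: float,              # boat velocity in the same frame (m/s)
    target_x: float, target_y: float,  # target point
) -> float:
    """Speed component directed at the target.

    Positive when the boat closes in on the target, zero when it sails
    perpendicular to the target direction, negative when it sails away.
    """
    dx, dy = target_x - x, target_y - y
    distance = math.hypot(dx, dy)
    if distance == 0.0:
        return 0.0  # already at the target, no direction left to reward
    return (vx * dx + vy * dy) / distance
```

With this choice, both a higher speed and a more direct course increase the reward, matching the qualitative description above.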
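How a forecasting model can serve as a training environment may be sketched as follows. The class below only mimics the reset/step pattern familiar from Gym; it does not use the Gym library, and `BoatSimEnv`, `predict_next` and `reward_fn` are hypothetical names. The predictor stands in for the learned digital twin of the boat.

```python
from typing import Callable, List, Sequence, Tuple

State = List[float]
History = List[State]


class BoatSimEnv:
    """Gym-style wrapper around a boat-state predictor (illustrative only)."""

    def __init__(
        self,
        predict_next: Callable[[History, float], Sequence[float]],
        reward_fn: Callable[[State], float],
        initial_history: Sequence[Sequence[float]],
        max_steps: int = 100,
    ) -> None:
        self._predict_next = predict_next  # placeholder for the digital twin
        self._reward_fn = reward_fn
        self._initial_history = [list(s) for s in initial_history]
        self._max_steps = max_steps
        self._history: History = []
        self._steps = 0

    def _observation(self) -> History:
        # Return a copy so callers cannot mutate the internal history.
        return [list(state) for state in self._history]

    def reset(self) -> History:
        """Restart the episode from the recorded initial history."""
        self._history = [list(s) for s in self._initial_history]
        self._steps = 0
        return self._observation()

    def step(self, rudder_angle: float) -> Tuple[History, float, bool]:
        """Apply a rudder angle and let the predictor advance the boat state."""
        next_state = list(self._predict_next(self._history, rudder_angle))
        reward = self._reward_fn(next_state)
        # Slide the window: drop the oldest state, append the predicted one.
        self._history = self._history[1:] + [next_state]
        self._steps += 1
        done = self._steps >= self._max_steps
        return self._observation(), reward, done
```

Because the agent only ever sees predicted states, any error in the predictor feeds straight into what the agent learns, which is one way to see why the reliability of the digital twin matters so much.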
Finding an appropriate architecture
Constructing a reliable simulation environment essentially corresponds to a multivariate time series forecasting problem, a problem that also occurs, for example, in the context of financial time series and has been researched for many years. The current project draws inspiration from the state of the art in this field. In a first phase, the optimal hyperparameters (number of layers, learning rate etc.) are identified for a stacked long short-term memory (LSTM) network that forecasts the boat state for the next 10 seconds in a resolution of Hz. This search relies on Bayesian optimisation, which offers a number of advantages, among them its efficiency for the relatively high-dimensional hyperparameter search at hand.
This stacked LSTM receives as input the sensor data of the 60 seconds that precede the prediction time window. In the first phase, which is currently underway, the model is trained to minimise the mean absolute error of the predictions, with all of them weighted equally. In an extension, this could be improved by weighting the losses of the different prediction time steps and of the different features differently, e.g. to incentivise more accurate results for the first time step, while later predictions could be of less importance. Indeed, by the time those later steps are reached, the sea state might have changed strongly anyway.
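The weighted loss suggested here could look like the following sketch. The function names and the exponential decay of the step weights are assumptions chosen for illustration; the article does not specify a weighting scheme.

```python
from typing import List, Sequence


def weighted_mae(
    y_true: Sequence[Sequence[float]],   # [time step][feature]
    y_pred: Sequence[Sequence[float]],   # same layout as y_true
    step_weights: Sequence[float],       # one weight per prediction time step
    feature_weights: Sequence[float],    # one weight per feature
) -> float:
    """Mean absolute error where each (step, feature) error has its own weight.

    The weights must not all be zero. With all weights equal, this reduces
    to the plain mean absolute error.
    """
    total = 0.0
    weight_sum = 0.0
    for t, w_t in enumerate(step_weights):
        for f, w_f in enumerate(feature_weights):
            w = w_t * w_f
            total += w * abs(y_true[t][f] - y_pred[t][f])
            weight_sum += w
    return total / weight_sum


def exponential_step_weights(n_steps: int, decay: float = 0.9) -> List[float]:
    """Weights that shrink for later steps (hypothetical choice of scheme)."""
    return [decay ** t for t in range(n_steps)]
```

Setting every weight to 1 recovers the equally weighted loss of the first phase, so the two variants can be compared directly.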
Inspecting different time windows
In a second phase, the optimised model is provided with input data of different lengths, e.g. 120 seconds of sensor data instead of 60 seconds, which allows us to study the effect of the time window’s length on the model’s accuracy for each prediction step and each feature. Indeed, the boat’s state is subject to different dynamics whose variations take place at different frequencies. As an example, one can roughly assume that wave movements repeat every 10 seconds, while wind speed might only vary within a given range over several minutes. In this light, the prediction accuracy is expected to depend critically on the length of the time window provided to the model.
Building an ensemble
With this experience at hand, the third phase begins: training different models with the same optimised hyperparameters but with input time windows of different lengths. These models are then combined in an ensemble that jointly predicts the state of the boat for the next 10 seconds, with the output of the ensemble being the weighted average of the individual model outputs. In a first approach, the same weighting and ensemble architecture is used as proposed by Xia et al. for another prediction problem.
In a subsequent phase, this same approach will be applied to more sophisticated models, namely CNN bidirectional LSTMs, which have proven their performance in many forecasting tasks. Ultimately, this should provide a reliable simulation environment in which to train ML algorithms to sail intelligently and eventually outperform today’s autopilots.
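The weighted averaging step can be sketched as below. How the weights are obtained in the scheme of Xia et al. is not described in this article, so they are simply passed in here; the function name and the nested-list layout are assumptions.

```python
from typing import List, Sequence

Forecast = Sequence[Sequence[float]]  # [time step][feature]


def ensemble_forecast(
    model_outputs: Sequence[Forecast],  # one forecast per ensemble member
    weights: Sequence[float],           # one non-negative weight per member
) -> List[List[float]]:
    """Weighted average of several forecasts that share the same shape."""
    if len(model_outputs) != len(weights):
        raise ValueError("need exactly one weight per model output")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_steps = len(model_outputs[0])
    n_features = len(model_outputs[0][0])
    combined = [[0.0] * n_features for _ in range(n_steps)]
    for output, weight in zip(model_outputs, weights):
        share = weight / total  # normalise so the shares sum to 1
        for t in range(n_steps):
            for f in range(n_features):
                combined[t][f] += share * output[t][f]
    return combined
```

Each member would be fed its own input window length, while all members forecast the same 10-second horizon, so their outputs line up step by step and feature by feature.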