Years before starting T-DAB.AI, our founder and Head of Data Science, Eric Topham, was already sailing both for pleasure and competitively. Whilst at Oxford University, Eric met Jack Trigger, now a professional sailor at Jack Trigger Racing. Although they studied different subjects (one an engineer, the other a biologist with a heavy interest in modelling), the two kept coming back to the problems that data could solve if applied correctly with modern statistical and computer science methods.
In particular, they were frustrated that in competitive sailing, modern data analysis and prediction are still rarely applied at scale using the computer science methods that are industry standard elsewhere. In considering the various use cases over a not inconsiderable variety of beers, the two identified improved autopilot control as the change with the single biggest potential impact on performance in short-handed sailing.
Several years later, through an event organised by T-DAB, Eric met Pedro Baiz, a data science consultant, entrepreneur in residence of the Royal Statistical Society, and fellow at Imperial College. Together they started looking for ways to collaborate, driven by shared academic interests. In particular, they shared an interest in pushing the cutting edge of how to develop deep learning-based simulations that are then used to train other deep learning models for the control of systems in manufacturing.
Starting the Sandbox
In early February 2019, the first group of students from Imperial College, including myself, joined the newly formed T-DAB Innovation Sandbox to undertake research for our master’s thesis projects on the topic of intelligent control through deep learning for marine autopilots. Amongst solo sailors, it is widely discussed that current autopilot technology performs worse than a professional sailor at steering the boat optimally, even after the sailor has spent a significant amount of time manually adjusting its settings. This performance difference is significant, estimated to be around 15-20%. Since the time differences between boats are of the order of single-digit percentages, it seemed natural to enhance the autopilot so that it would match the performance of the human and so gain a race-winning edge.
We decided to separate the data we had into segments where either the autopilot or Jack was at the tiller. Then, using a neural network architecture called Long Short-Term Memory (LSTM), we created a model to predict the steering angle and trained it using only the data where Jack was steering. The LSTM architecture was specifically designed for problems where keeping track of information over (relatively) long periods of time is essential, which makes it well suited to a dynamic system as intricate as a sailing boat. The results are very promising: the model predicts the steering angle that Jack will choose in the next second to within one degree of rudder angle.
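To make the data preparation step concrete, here is a minimal sketch of how a log could be split into human-steered segments and then cut into (history, next rudder angle) training pairs for a sequence model. It is an illustration only, not our actual pipeline: the row layout, the "human"/"autopilot" labels, the one-second sampling and the function names are all assumptions, and the LSTM itself is left out.

```python
from typing import List, Sequence, Tuple

# Hypothetical log row: (helm, features, rudder_angle_deg), where helm is
# "human" or "autopilot". Rows are assumed to be one second apart.
Row = Tuple[str, Sequence[float], float]
Pair = Tuple[List[List[float]], float]


def human_segments(log: Sequence[Row]) -> List[List[Row]]:
    """Split the log into contiguous runs where the human is steering."""
    segments: List[List[Row]] = []
    current: List[Row] = []
    for row in log:
        if row[0] == "human":
            current.append(row)
        elif current:
            # The autopilot took over: close the current human segment.
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def make_windows(segment: Sequence[Row], history: int = 3) -> List[Pair]:
    """Pair each `history`-step feature window with the next rudder angle."""
    pairs: List[Pair] = []
    for end in range(history, len(segment)):
        window = [list(row[1]) for row in segment[end - history:end]]
        target = segment[end][2]  # rudder angle one step after the window
        pairs.append((window, target))
    return pairs


def training_pairs(log: Sequence[Row], history: int = 3) -> List[Pair]:
    """Build pairs per segment so no window spans an autopilot gap."""
    pairs: List[Pair] = []
    for segment in human_segments(log):
        pairs.extend(make_windows(segment, history))
    return pairs
```

Windows are built inside each segment separately, so a training example never mixes samples from before and after a stretch of autopilot steering.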
An alternative branch of the research looked at the same problem from a different angle. Instead of trying to emulate Jack by predicting his decisions, we hypothesized that reinforcement learning (RL) could be used to let the model explore the problem without any prior biases, being neither helped nor constrained by human experience and potentially outperforming Jack. Such an approach, however, requires a reliable and accurate simulation environment, so that the algorithm can receive useful feedback and learn within the simulation.
The simulation was built using the same type of network described above, an LSTM. Given the (short) history of the boat’s movement, the model, i.e. the digital twin of the boat, would return the state (speed, orientation) of the boat in the next second. On top of that sat our RL algorithm, exploring different actions and trying to overtake Jack. The RL model was validated on some standard problems from OpenAI Gym and performed well on similar (albeit simpler) types of problems. The boat’s digital twin, however, developed a significant bias and did not reproduce the boat’s performance accurately enough to be useful in training the RL agent.
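The relationship between the digital twin and the RL agent can be sketched as a small Gym-style environment with reset and step methods. This is a simplified illustration rather than our implementation: the class name TwinEnv, the reward (predicted boat speed), the fixed episode length and the toy_twin dynamics are all assumptions made for the example, and toy_twin merely stands in for the learned LSTM.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

State = Tuple[float, float]  # (speed in knots, heading in degrees)
# Hypothetical interface for the digital twin:
# (recent states, rudder angle in degrees) -> state one second later.
Twin = Callable[[List[State], float], State]


class TwinEnv:
    """Gym-style wrapper that lets an agent steer a one-step predictor."""

    def __init__(self, twin: Twin, initial: State,
                 history: int = 5, horizon: int = 60) -> None:
        self.twin = twin
        self.initial = initial
        self.history_len = history
        self.horizon = horizon  # episode length in steps (assumed)
        self.history: Deque[State] = deque(maxlen=history)
        self.t = 0

    def reset(self) -> List[State]:
        """Start an episode with the history filled by the initial state."""
        self.history.clear()
        self.history.extend([self.initial] * self.history_len)
        self.t = 0
        return list(self.history)

    def step(self, rudder_deg: float) -> Tuple[List[State], float, bool]:
        """Apply a rudder angle and advance the simulation by one second."""
        next_state = self.twin(list(self.history), rudder_deg)
        self.history.append(next_state)  # the oldest state drops out
        self.t += 1
        reward = next_state[0]  # assumption: reward is predicted speed
        done = self.t >= self.horizon
        return list(self.history), reward, done


def toy_twin(history: List[State], rudder_deg: float) -> State:
    """Toy dynamics for illustration only: rudder turns the boat, costs speed."""
    speed, heading = history[-1]
    new_speed = max(0.0, speed - 0.01 * abs(rudder_deg))
    new_heading = (heading + 0.5 * rudder_deg) % 360.0
    return (new_speed, new_heading)
```

Because the agent only ever sees what the twin predicts, any bias in the twin feeds straight into the rewards, which is why an inaccurate simulation makes the learned policy unreliable.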
With this in mind, Charles, who joined the sandbox this spring, is now working on improving the digital twin. He is using several techniques, ranging from hyperparameter optimization and using more data to completely changing the model architecture. We expect the limitations of the simulation to be overcome, after which we will switch our focus to training and optimizing the RL agent.
Additionally, this year one of the other students, Hangming, is exploring the idea of unsupervised classification of sailing ‘modes’, i.e. distinct patterns of boat movement and state, which can aid autopilot improvement by preparing a basis for more specialized models for the different modes. Going deeper still, there is a particular maneuver called ‘tacking’ that receives special attention from us. Another student, Rafael, is building a model to recognize and filter out the tacks from the data, with the goal of using this information to analyse optimal tacking tactics in fluctuating wind conditions.
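For readers unfamiliar with tacking: the boat turns its bow through the wind, so the wind moves from one side of the boat to the other. The sketch below is a naive rule-based baseline for spotting that in a log, not Rafael's model. It assumes a signed wind angle in degrees (positive when the wind is on the starboard side, an arbitrary convention chosen here) and treats angles below 90 degrees in magnitude as upwind; the function name and threshold are hypothetical.

```python
from typing import List, Sequence


def find_tacks(wind_angle_deg: Sequence[float],
               upwind_limit: float = 90.0) -> List[int]:
    """Return sample indices where the wind changes side while sailing upwind.

    A side change that happens downwind (a gybe) is ignored, because the
    remembered side is forgotten whenever the boat is not upwind.
    """
    tacks: List[int] = []
    last_side = 0  # +1 starboard, -1 port, 0 unknown
    for i, angle in enumerate(wind_angle_deg):
        if abs(angle) >= upwind_limit:
            last_side = 0  # not upwind: forget the previous side
            continue
        if angle == 0:
            continue  # exactly head to wind: no side to record yet
        side = 1 if angle > 0 else -1
        if last_side != 0 and side != last_side:
            tacks.append(i)
        last_side = side
    return tacks
```

Real wind data is noisy, so in practice the angle would need smoothing before a rule like this is applied, otherwise a boat sailing very close to the wind could trigger false detections.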
We know, that’s a lot of exciting stuff, so hang on to your hats, keep the spray out of your face, and stay tuned to our blog to hear in more detail about all of these projects and how they progress.