Hopefully you are familiar with our Innovation Sandbox program, where students from one of the UK’s top universities join us for several months to do their master’s projects. The flagship of the Sandbox is an effort to use data to enhance the performance of a world-class sailing team, Jack Trigger Racing. There is a high-level introduction to the project if you feel you need the backstory. In this article we thought we would give you a more in-depth look at the technical problems and our thinking about them.
To reiterate for those who want to jump right into the technical details: in early February 2019, the first group of students from Imperial College joined T-DAB’s Innovation Sandbox. Their projects revolved around improving the autopilot that helps sailors steer the boat in solo races. In sailing circles it is widely discussed that current autopilot technology performs worse than a professional sailor. Bearing that in mind, it seemed natural to ask ourselves how we could enhance autopilot performance so that it matches that of a human.
Additionally, we thought: what if we could define the objective in terms of metrics more abstract than ‘what would a human have done?’ Then we would be neither restricted nor directed by a sailor’s knowledge and experience, and could let the algorithm explore the entire action space, including actions a human might never have thought of, potentially exceeding human performance when it comes to racing. What we have just described is essentially the difference between supervised learning and reinforcement learning. If you want to learn more about these types of ML, check out this breakdown from Hugo on supervised learning vs unsupervised learning. Now let’s look at the two approaches in turn and in more detail.
Creating digital Jack: Supervised approach
The data we have from the boat contains information about the boat’s state and the environment (wind and sea) around it. The idea is to build a model that maps these environmental variables to the rudder angles set by a human driver, so that during the next race the model would mimic the human standing at the tiller. We jokingly called it ‘the digital twin of Jack’. The data collected on the yacht contains a flag indicating whether the autopilot or the sailor is steering at any given moment. Since the autopilot is the system we want to outperform, we don’t want to learn its mistakes, so periods when the rudder angle was set by the current system were removed entirely. Among other things, the dataset contains wind and underwater current speeds and directions, the boat’s orientation and speed, and the air temperature: de facto the same variables a sailor observes (albeit not so precisely) while steering the boat.
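As a sketch, filtering the log down to human-steered samples might look like this (the record layout and field names here are hypothetical; the real telemetry schema differs):

```python
# Hypothetical log records; the real telemetry schema and field names differ.
def filter_human_steering(records):
    """Keep only samples where the sailor, not the autopilot, set the rudder."""
    return [r for r in records if not r["autopilot_on"]]

log = [
    {"t": 0, "autopilot_on": False, "rudder_deg": -2.5, "true_wind_speed": 14.1},
    {"t": 1, "autopilot_on": True,  "rudder_deg":  0.8, "true_wind_speed": 14.3},
    {"t": 2, "autopilot_on": False, "rudder_deg": -1.9, "true_wind_speed": 14.0},
]
human_only = filter_human_steering(log)  # drops the autopilot-steered row
```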
It stands to reason that the sailor doesn’t decide based on one quick observation; they understand the context and know the history of the boat’s movement, so the model for this task must be able to resolve that temporal dependency and exploit the mechanics of the boat over time. A recurrent neural network architecture known as long short-term memory (LSTM) was chosen for its well-known ability to keep track of relevant information over long periods of time. The first iteration of the model showed promise; however, it struggled to predict the correct angles in some regimes, a problem that was solved by Stan, who joined the project in April 2019.
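A minimal sketch of preparing that temporal context: each training sample stacks a window of recent observations into the (samples, timesteps, features) tensor an LSTM expects. The window length and feature count below are illustrative, not the values used in the project.

```python
import numpy as np

def make_sequences(features, targets, window=30):
    """Turn a flat time series into overlapping windows so each sample
    carries the recent history the LSTM conditions on."""
    X, y = [], []
    for i in range(window, len(features)):
        X.append(features[i - window:i])  # last `window` observations
        y.append(targets[i])              # rudder angle to predict
    return np.array(X), np.array(y)

feats = np.random.rand(100, 6)   # e.g. wind speed/direction, boat speed, heading...
rudder = np.random.rand(100)     # human-set rudder angles
X, y = make_sequences(feats, rudder, window=30)
# X has shape (70, 30, 6): 70 samples, 30 timesteps, 6 features each
```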
Stan recognized that the model struggled most in stretches where Jack was performing a maneuver called a ‘tack’. The basic idea of a tack is to change the boat’s orientation relative to the incoming wind when sailing upwind. A series of tacks allows the yacht to progress upwind to a desired point despite not being able to sail directly towards it. A tack is not an easy maneuver for the sailor, and in the available dataset it occurred relatively rarely, so when the model encountered such an unusual combination of variables, it failed to make an accurate prediction. Once tacks were removed from the dataset, the LSTM architecture showed its full potential, predicting angles within +/- 1 degree of those chosen by a professional sailor.
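One crude way to flag tacks for removal, assuming a logged true wind angle (TWA) that changes sign when the wind crosses the bow, is to look for large sign flips. This heuristic is our illustration only, not the project’s actual preprocessing:

```python
def find_tacks(twa, min_swing=60.0):
    """Flag indices where the true wind angle flips side with a large swing,
    a rough proxy for a tack when sailing upwind."""
    return [
        i for i in range(1, len(twa))
        if twa[i - 1] * twa[i] < 0 and abs(twa[i] - twa[i - 1]) > min_swing
    ]

twa = [-35, -33, -34, 38, 36, 35, -37]   # toy series with two tacks
print(find_tacks(twa))                   # → [3, 6]
```

Rows in a window around each flagged index would then be dropped before training.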
There is an almost philosophical point to this result. Sailors don’t use numbers when they steer the boat; they just do it ‘by feel’, so if we were to put someone other than Jack at the tiller, they wouldn’t be able to steer as closely to Jack’s style as our model can. And that is where the idea of ‘Jack’s digital twin’ came from. If we equip the boat with an autopilot that simply mimics the behaviour of the very person in charge of the boat, the competitiveness of the sport stays intact, because your autopilot can only be as good as you are.
Reinforcement learning and going beyond the twin
This competitiveness is much harder to argue for when we come to the reinforcement learning side of things. You have probably heard of reinforcement learning models such as AlphaGo and AlphaZero, as well as DeepMind’s Atari experiments, where these models outperform humans, sometimes by margins once considered unreachable. If we distill sailing to its basic mechanics, we can see that the steering optimization we are trying to achieve is not quite an Atari game or chess, and it’s not about the complexity of the system per se; it’s about the degree of control the autopilot has. The boat is propelled by the wind acting on the sails, which are managed by the sailor. The sails are the primary control, and the rudder is there to help the sailor follow the route chosen through sail trim with minimum resistance.
To continue the arcade analogy: if you let your model play a game, it sees the whole screen and is in full control of the agent. It does something wrong, it dies, it receives a poor score, and hopefully learns the link between the action (e.g. jumping off a cliff) and the outcome (e.g. a bad score). The rudder, by contrast, plays a supporting role, and while it would be obvious when the rudder is severely limiting success, telling ‘poor’ from ‘ok’ from ‘good’ performance is hard. What the rudder is supposed to do depends on the actions of the sailor, which the autopilot can only observe indirectly, through changes in the boat’s behaviour.
However, the first difference between sailing and playing arcade games that needed to be overcome is the cost of training. Obviously, we could not afford to have Jack sail the boat around the globe while the autopilot learns, so we needed a simulation that would provide feedback to the RL agent until its performance was stable enough to be put on the actual boat. There are some open-source frameworks for hydrodynamic and aerodynamic simulations, but they are not trivial to set up and are very computationally heavy, not to mention the commercial complications of obtaining an accurate 3D model of the boat in question. A more radical approach was chosen, and the LSTM was again employed, this time to create a digital twin of the boat.
Here the LSTM had to answer the question: ‘given the observed sea states and boat movement, if I choose this rudder angle right now, what will the boat’s state be at the next time step?’. The data collected during a race is limited to the rudder choices actually made, so we cannot assess counterfactuals from it directly. With a model that answers the question above, the RL agent can explore the environment and receive accurate feedback on its choices. We kept one restriction, though: the route. As mentioned above, the rudder alone cannot change the whole strategy of the race; it can only help (or hinder) the sailor’s chosen route. So we gave the reinforcement learning agent a corridor around Jack’s original track, within which it could explore its options and collect valuable experience.
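A sketch of what a corridor-style reward could look like (the width, penalty, and shaping below are illustrative assumptions, not the values we actually used):

```python
def corridor_reward(cross_track_error_m, half_width_m=20.0):
    """Reward staying near the sailor's original track; straying outside
    the corridor earns a heavy penalty."""
    if abs(cross_track_error_m) > half_width_m:
        return -10.0                                     # left the corridor
    return 1.0 - abs(cross_track_error_m) / half_width_m # 1.0 on the centreline

print(corridor_reward(0.0))    # on the original track: maximum reward, 1.0
print(corridor_reward(10.0))   # halfway to the edge: 0.5
print(corridor_reward(25.0))   # outside the corridor: -10.0
```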
A rather popular algorithm called deep deterministic policy gradient (DDPG) was chosen to train the reinforcement learning agent. The algorithm was tested on model problems from OpenAI Gym and then set loose on our virtual sea. Unfortunately, it didn’t sail very far at first, and the reason was a lack of data for training the digital twin. Since we were limited to data from a single race, the dynamics model had developed a bias towards turning to port (left). Even if we turned the rudder all the way to the right, our virtual boat would still turn left, albeit more slowly, which suggests it captured the overall dynamics correctly and was indeed suffering from a bias. We still let our reinforcement learning agent ‘play’ with this model to see what it would learn, and sure enough, only 50 episodes (tries) in, the agent was simply setting the rudder all the way to the right to stop the boat from turning left. That reassured us of the validity of the method and of the need to focus on the bottleneck in the system: the port-turn bias in the boat state estimator.
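To make the port-turn bias concrete, here is a toy rollout with a stand-in dynamics function that drifts to port regardless of rudder input. This is a deliberately simplified caricature of the biased digital twin, not the real model:

```python
def biased_dynamics(state, rudder):
    """Toy stand-in for the biased digital twin: the boat drifts to port
    (negative) even at full starboard rudder (rudder in [-1, 1])."""
    drift = -0.5 + 0.4 * rudder          # never positive, as the twin behaved
    return {"heading": state["heading"] + drift,
            "cross_track": state["cross_track"] + drift}

state = {"heading": 0.0, "cross_track": 0.0}
for _ in range(50):
    # the agent's learned response: rudder hard to starboard every step
    state = biased_dynamics(state, rudder=1.0)
# even at full starboard rudder the boat keeps creeping to port
print(state["cross_track"])
```

An agent trained against this model quickly discovers that holding the rudder hard right is the best it can do, which is exactly the behaviour we observed after ~50 episodes.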
Next steps for the Sailing AI
So far this project has revealed the multiple challenges to be overcome on the way to optimizing control for speed at sea, but it has also shown great potential. This potential and the innovativeness of the solutions were recognized by the organizers of the IoT World Congress, and in October 2019 we held a panel discussion about our achievements and future goals in the state of AI theatre at the congress.
This year we have more students coming to work on both streams of the project and bring exciting new developments. For instance, you will be able to read more about how we are using unsupervised machine learning to discover ‘sailing modes’, the sailing behaviours of the boat, and how we are improving our deep-neural-network-based digital twin of the boat to help deep reinforcement learning algorithms better learn the effect of their steering actions on the state of the boat. Stay salty.