Former Dimension Data group applying machine learning to cycling
Artificial intelligence, virtual reality and data analysis have been revolutionising sport around the world, and the Tour de France is just at the beginning of an effort to incorporate high technology into professional cycling. Working with technical partner NTT (formerly Dimension Data), the race has offered fans an increasing amount of live data through ASO's Race Center app. Cyclingnews spoke to Peter Gray, NTT's Vice President of Global Advanced Technologies for Sport and the head of the group, about the effort to bring high tech to the Tour.
Cycling fans might have noticed the @letourdata Twitter account sending out graphics about riders' qualities, speeds on climbs and the likely success of breakaways, as well as alerts to potential critical moments in the peloton, dubbed 'Le Buzz'. It's all part of a massive data collection and analysis effort that aims to bring a richer fan experience to the Tour de France.
NTT are in the unique position, as ASO's partner, of having access to several years' worth of positional data on riders from all of the ASO races.
"We've been working with the ASO now for five years, and so we have all of the data from all of the ASO races, where we have groups and gaps data that are collated from the TV motorbikes. We have that data for pretty much every ASO race, which is 70 race days a year, for the past five years," Gray told Cyclingnews.
Each rider in the race carries a GPS tracker under his saddle, and its data piggybacks on the robust communications network that sends television images to the broadcasters in order to reach NTT's big data truck. The data is then crunched by the team's data scientist, Robert Webster, and used by journalist Benoit Vittek to help tell the story of the Tour on ASO's Race Center, live television and social media platforms.
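The groups-and-gaps idea can be illustrated with a minimal Python sketch, assuming each rider's GPS fix has already been converted into metres covered along the route. The `RiderFix` type, the `group_riders` function, the single shared speed used to turn distance into time and the one-second split threshold are all hypothetical choices for illustration, not NTT's actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class RiderFix:
    rider: str
    distance_m: float  # metres covered along the route (assumed already derived from GPS)


def group_riders(fixes, speed_mps, gap_threshold_s=1.0):
    """Split riders into groups; return (gap to leading group in s, rider names), front to back.

    Assumes one shared speed to convert distance gaps into time gaps, which is a simplification.
    """
    if not fixes:
        return []
    ordered = sorted(fixes, key=lambda f: f.distance_m, reverse=True)  # leader first
    groups = [[ordered[0]]]
    for ahead, behind in zip(ordered, ordered[1:]):
        gap_s = (ahead.distance_m - behind.distance_m) / speed_mps
        if gap_s > gap_threshold_s:
            groups.append([behind])  # a split: start a new group
        else:
            groups[-1].append(behind)  # still on the wheel: same group
    leader_m = groups[0][0].distance_m
    return [((leader_m - g[0].distance_m) / speed_mps, [f.rider for f in g]) for g in groups]


fixes = [RiderFix("A", 100_000), RiderFix("B", 99_995), RiderFix("C", 99_700), RiderFix("D", 99_690)]
for gap_s, riders in group_riders(fixes, speed_mps=12.5):
    print(f"+{gap_s:.0f}s: {', '.join(riders)}")  # prints "+0s: A, B" then "+24s: C, D"
```

Because the gap threshold is applied rider to rider, a long line of riders each a fraction of a second apart still counts as one group, which matches how a strung-out peloton is usually treated.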
NTT also use past and present data to train models with machine learning, a form of artificial intelligence, to make predictions about the race.
"We're getting 98 per cent coverage for every single rider every second of the Tour de France, including all the really critical pieces like the major climbs where this data is actually the most useful."
What's the buzz
One of NTT's predictors, called 'Le Buzz', takes the position of each rider in the peloton to build a model of the bunch and its behaviour, and sends out alerts when a sudden change might signal a critical moment such as a crash, an attack or a split in the peloton.
"That allows us... to see the shape of the peloton. Is the peloton strung out, everybody just sitting on each other's wheels and it's 500 metres long? Or is the peloton all bunched up, 50 metres long and 10 riders wide?" Gray explained.
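Gray's description of the peloton's shape suggests two simple measurements: how long the bunch is along the road and how wide it is across it. The sketch below assumes positions have already been projected into a local road frame (metres along and across the road); that frame, the function names and the 20:1 "strung out" ratio are illustrative assumptions, not details NTT has published.

```python
def peloton_shape(positions):
    """Return (length_m, width_m) of the bunch.

    positions: (along_m, across_m) pairs in an assumed local road frame, i.e. metres along the
    road and metres across it, already projected from GPS coordinates.
    """
    if not positions:
        raise ValueError("need at least one rider position")
    along = [p[0] for p in positions]
    across = [p[1] for p in positions]
    return max(along) - min(along), max(across) - min(across)


def is_strung_out(length_m, width_m, ratio=20.0):
    # Hypothetical rule: "strung out" when the bunch is far longer than it is wide.
    # Widths under 1 m are treated as 1 m so a single-file line compares against a sensible minimum.
    return length_m > ratio * max(width_m, 1.0)


single_file = [(i * 2.0, 0.0) for i in range(251)]                    # 500 m, one wheel wide
bunched = [(r * 5.0, c * 0.8) for r in range(11) for c in range(10)]  # 50 m long, 10 abreast
print(peloton_shape(single_file), is_strung_out(*peloton_shape(single_file)))  # (500.0, 0.0) True
print(is_strung_out(*peloton_shape(bunched)))  # False
```

Using the extreme riders makes the length sensitive to a single dropped rider; a more robust variant might measure between percentiles of the along-road positions instead.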
"When [the computer] sees some sort of change in the behaviour or the formation of the peloton, we get an alert that pops up. We've been watching throughout the race and when it buzzes we look at what's causing it, what's coming up?
"We see some things that you'd expect. For example, as the peloton comes towards the feed zone we will typically get a buzz as everybody tries to position themselves to get their feed. We often see a buzz just before a key corner, maybe into the bottom of a climb, where everybody's accelerating to try and get a good position into that corner. We've seen buzzes when there have been key attacks or where teams have gone to the front and really put the pace on.
"This is something that we're just starting to experiment with, and we will continue to develop and to learn what else we can start to do with this type of analysis. It's a process of exploration to discover what we can do and how that can be best used, and what stories it can tell."
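NTT has not said how 'Le Buzz' decides that a change is significant. One simple, swapped-in technique for this kind of alert is a rolling z-score: compare each new reading of a shape metric, such as peloton length, with its recent history and flag readings that fall far outside it. The `buzz_alerts` function, its window size, threshold and minimum spread are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def buzz_alerts(series, window=30, z_threshold=3.0, min_sigma=1.0):
    """Return indices where a reading departs sharply from the previous `window` readings.

    series: one shape metric per time step (e.g. peloton length in metres, sampled each second).
    min_sigma floors the spread so a perfectly steady bunch doesn't trigger on tiny wobbles.
    """
    history = deque(maxlen=window)  # rolling window of recent readings
    alerts = []
    for i, value in enumerate(series):
        if len(history) == window:
            sigma = max(pstdev(history), min_sigma)
            if abs(value - mean(history)) / sigma > z_threshold:
                alerts.append(i)  # a "buzz": look at the race to see what caused it
        history.append(value)
    return alerts


# A bunch that is steadily about 500 m long, then suddenly compresses to 100 m at step 40.
lengths = [500 + i % 3 for i in range(40)] + [100] + [500 + i % 3 for i in range(10)]
print(buzz_alerts(lengths))  # [40]
```

As in Gray's description, the alert only says that something changed; a person still has to look at the race to find out whether it was a feed zone, a key corner or an attack.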
That information might come in handy for television broadcasters who lack a trained eye, or for team directors, who should be keeping their eyes on the road rather than on the television feed.
"We know that teams will make use of the live tracking data. They can then identify which groups their riders are in and whether they've got the right guys in the right groups. Do we have any riders who are dropping off the back, who maybe need support or are at risk of not making the time cut? We know a number of the teams are using that information as input to their race strategy and as information they can communicate to the riders. We know Team Dimension Data certainly use that information, and so do some of the other teams."
Picking favourites, defying the odds
NTT are applying other machine learning algorithms to pick favourites for each stage, using six years of past UCI results to train the model on which kinds of course suit each rider's qualities. The model is weighted toward the rider's most recent results and "builds a picture of a rider and their capabilities and maps that to the parcours of that particular stage."
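Weighting a model toward recent results can be as simple as giving each result a weight that decays with age. This sketch uses exponential decay with a hypothetical one-year half-life, then maps a rider's per-terrain form onto a stage described as a mix of terrain types. The scores, terrain labels, half-life and function names are all assumptions, not details of NTT's model.

```python
from datetime import date


def recency_weight(result_date: date, today: date, half_life_days: float = 365.0) -> float:
    # Weight halves every half_life_days: a result from today counts 1.0, one a year old 0.5.
    age_days = (today - result_date).days
    return 0.5 ** (age_days / half_life_days)


def weighted_form(results, today: date, half_life_days: float = 365.0) -> float:
    """Recency-weighted mean of (date, score) pairs; higher scores mean better results."""
    if not results:
        raise ValueError("need at least one result")
    weights = [recency_weight(d, today, half_life_days) for d, _ in results]
    return sum(w * score for w, (_, score) in zip(weights, results)) / sum(weights)


def stage_fit(form_by_terrain: dict, stage_mix: dict) -> float:
    # stage_mix gives the hypothetical share of each terrain type in the stage (shares sum to 1).
    return sum(share * form_by_terrain.get(terrain, 0.0) for terrain, share in stage_mix.items())


today = date(2019, 7, 1)
print(weighted_form([(today, 10.0), (date(2018, 7, 1), 0.0)], today))  # about 6.67
print(stage_fit({"flat": 8.0, "hills": 4.0, "mountains": 1.0}, {"flat": 0.5, "hills": 0.5}))  # 6.0
```

A plain average of those two results would give 5.0; the decay pulls the estimate toward the more recent, better result.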
The model also takes into account the team profiles - how often the riders in the Tour have raced together, what the qualities of the riders in the team are as a group, and more. The previous stage data is also included in the model to help predict the outcome of subsequent stages.
"We end up with about 35 different features that are input into the training set. That training set uses a random forest model, and then through that, it's producing its results on a daily basis."
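Gray names a random forest as the model behind the daily predictions. A random forest trains many decision trees, each on a bootstrap resample of the data and each considering only a random subset of features at every split, and then lets the trees vote. The pure-Python toy below shows that mechanism with two made-up features instead of NTT's 35; in practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally be used. The features, data and hyperparameters are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    # Gini impurity: 0 for a pure node, higher when classes are mixed.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, depth, max_depth, n_sub, rng):
    """Grow a tree; leaves are labels, internal nodes are (feature, threshold, left, right)."""
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)
    best = None
    for f in rng.sample(range(len(X[0])), n_sub):  # random feature subset at every split
        for t in set(row[f] for row in X):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no useful split among the sampled features
        return majority(y)
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx], depth + 1, max_depth, n_sub, rng)

    return (f, t, grow(left), grow(right))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(X, y, n_trees=25, max_depth=3, seed=0):
    rng = random.Random(seed)
    n_sub = max(1, int(len(X[0]) ** 0.5))  # common default: sqrt(number of features)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap resample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], 0, max_depth, n_sub, rng))
    return forest


def win_probability(forest, row):
    # Share of trees voting "win" (label 1) for this rider on this stage.
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


# Hypothetical features per rider: [sprint_score, climbing_score]; label 1 = won a flat stage.
X = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.3], [0.95, 0.15],
     [0.2, 0.8], [0.1, 0.9], [0.3, 0.7], [0.25, 0.85]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
forest = train_forest(X, y)
print(win_probability(forest, [0.99, 0.05]), win_probability(forest, [0.05, 0.95]))  # strong, weak
```

Averaging tree votes gives a score between 0 and 1 rather than a single pick, which is one way a daily list of favourites could be ranked.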
Another machine learning model looks at the potential for a breakaway holding off the peloton to contest the stage win. This particular model is run every 10 kilometres along the course and takes into account the profile of the stage, distance to the finish, the time gaps between the groups, the standings of each rider in the general classification and what has happened in the past in similar situations.
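The breakaway model outputs a probability rather than a yes/no answer, which is what a logistic model naturally produces. The sketch below is a swapped-in illustration, not NTT's model: `RaceState`, `break_success_probability` and `evaluate_every` are hypothetical names, and the coefficients are hand-picked for illustration, where a real model would learn its weights from past races. It scores a few of the inputs described above (gap, distance to the finish, stage profile, a GC rider in the break) and evaluates only at every 10 km mark.

```python
import math
from dataclasses import dataclass


@dataclass
class RaceState:
    km_to_go: float            # distance remaining to the finish
    gap_min: float             # breakaway's lead over the peloton, in minutes
    sprint_stage: bool         # stage profile, reduced here to a single flag
    gc_threat_in_break: bool   # a rider high on GC in the break makes the bunch chase


# Hypothetical, hand-picked coefficients for illustration only; not fitted, not NTT's.
COEF = {"bias": 0.5, "gap_min": 0.8, "km_to_go": -0.02, "sprint_stage": -2.0, "gc_threat": -1.5}


def break_success_probability(s: RaceState) -> float:
    """Logistic model: returns a probability in (0, 1), not a yes/no verdict."""
    z = (COEF["bias"]
         + COEF["gap_min"] * s.gap_min
         + COEF["km_to_go"] * s.km_to_go
         + COEF["sprint_stage"] * s.sprint_stage
         + COEF["gc_threat"] * s.gc_threat_in_break)
    return 1.0 / (1.0 + math.exp(-z))


def evaluate_every(states_by_km_raced, every_km=10):
    """Score only the readings that fall on each every_km mark of distance raced."""
    return {km: break_success_probability(s)
            for km, s in sorted(states_by_km_raced.items()) if km % every_km == 0}


states = {
    150: RaceState(km_to_go=60, gap_min=3.0, sprint_stage=True, gc_threat_in_break=False),
    160: RaceState(km_to_go=50, gap_min=2.0, sprint_stage=True, gc_threat_in_break=False),
    165: RaceState(km_to_go=45, gap_min=1.5, sprint_stage=True, gc_threat_in_break=False),
    170: RaceState(km_to_go=40, gap_min=1.0, sprint_stage=True, gc_threat_in_break=False),
}
for km, p in evaluate_every(states).items():
    print(f"{km} km raced: {p:.0%} chance the break survives")
```

With these made-up weights, a sprint-stage break whose lead falls from three minutes to one minute sees its probability drop from roughly 43 per cent to roughly 18 per cent; the 165 km reading is skipped because it is not on a 10 km mark.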
"It's been pretty much on the money," Gray says of the model. "We don't predict a yes or no - the breakaway is going to be successful or not successful. We predict the percentage likelihood of the breakaway being successful.
"For example, we had the sprint stage ... which Caleb Ewan won [stage 11], and we could see that earlier in the stage, as the breakaway was forming, it says maybe there is a 20 per cent chance the break is going to be successful because it's a sprint stage. But as you get towards the finish and the gap starts coming down, that percentage likelihood of the break being successful decreases.
"Whereas [stage 12] we had the breakaway that was sitting at about four minutes and then it started to push out to six minutes. On to the final climb, the percentage likelihood of the breakaway being successful went from about a 50-50 to 90 per cent as they were going up that final climb."
The computer is not infallible, however, and riders can defy the odds. The one time a rider beat the model's odds was on stage 8, when Thomas De Gendt (Lotto Soudal) held off the chasers to win in Saint-Etienne.
"It gave him a 20 per cent chance of winning and he put in an extraordinary performance to achieve that one-in-five chance," Gray says. "From a prediction perspective, I'm quite happy with that result because in sport, nothing is ever certain. If we could predict every outcome 100 per cent of the time, sport would be really boring. The thing that makes sport exciting is that people sometimes do defy the odds. You have an unexpected winner."
Another stage that gave the machine fits was stage 10, where several sprinters who were favoured to contest for victory got left behind in the crosswinds.
"That made for an incredibly exciting stage. I think there's definitely a role that machine learning and artificial intelligence play, but the beauty of sport is that it's inherently unpredictable."
The computers might also help race organisers design a parcours that makes the race less predictable.
"Certainly as you build up a data set you can start to identify what features make for an exciting course or an exciting race," Gray says. "But as the saying goes, it's the racers who make the race, so that's never going to be 100 per cent. But certainly, you start to be able to identify what are the features of races that were particularly exciting."
In the meantime, NTT will continue to rack up 5,000km of driving around France, beaming data to their cloud servers and sending information out to Race Center and its social media channels. They've signed on with ASO for another five years and will keep developing ways to make the fan experience more exciting, including augmented reality as a way to transport fans from the sofa into a three-dimensional rendering of the race.
Being able to see the terrain and appreciate the severity of the climbs "gives you a completely different appreciation for the capabilities of these riders", Gray says.
"That whole area around virtual reality, augmented reality, mixed reality and particularly location-aware capabilities is definitely a space within the technology world that is emerging and developing very quickly. We see that those have potential applicability to sport and to the Tour de France."
What fans won't get access to is riders' power data, which the teams have kept a tight lid on.
"The current agreement between the Tour de France and the teams only allows us to capture the speed and the position because ... information like power is very sensitive, and so sharing that in a race like the Tour de France is something the teams have not agreed to participate in at this point. It certainly is technically possible, and I think it will eventually happen because there's a lot of interest in that type of information."