Analysis of Annual Prudential RideLondon Cycling Sportive

Introduction

This blog post was written prior to the 2017 event. I have since added the 2017 data (November 2017).

The Prudential RideLondon is an annual three-day festival of cycling held in London over the last/first weekend in July/August. There are several events held over the three days, including a race for professional cyclists and a “FreeCycle” in which the roads of central London are closed to traffic so cyclists of all ages and abilities can enjoy the sights and atmosphere.

For this study I will concentrate on the Prudential RideLondon-Surrey 100 which takes place on the Sunday. This is a 100 mile (161 km) route on closed roads that is open to the public. Shorter distances are available! It starts at the Stratford Olympic Park in East London, winds through picturesque Surrey before returning to the heart of London and finishing in front of Buckingham Palace. In terms of a spectacle, it can be thought of as the cycling equivalent of the London Marathon; closed roads lined with cheering crowds etc. The first event was held in 2013 so it’s quite young and very popular with cyclists.

The Prudential RideLondon-Surrey 100 is a “sportive” event, not a race. Completing the 100 miles and enjoying the day is the main priority for most of the participants. Many are sponsored by their friends and riding to support charities. However, a significant number are trying to complete the course in the fastest time possible, whether for the personal achievement of beating a target time or a previous time, or just to finish faster than a friend!

This study will start with an overview of the event before concentrating on the riders who are trying to go fast and applying more detailed analysis. Finally, I’ll include some tips on how to get your best time.

All of my analysis was carried out using R and RStudio. Where there is something technically interesting I will include that on a linked page, with the R code if it is appropriate.

 

Overview

Organising the Event

Broadly speaking there are two ways you can ride in the 100 mile sportive event; ride for a charity that has been allocated places or enter the annual “ballot”. Most of the places are allocated to charities with the remainder going into the “ballot”. The quotes are because ballot suggests pulling names out of a hat, which is pretty far removed from what happens. The organisers have many things to consider including safety, benefitting charity and encouraging participation in cycling. There is no publicly available data on the ballot but perhaps that would make an interesting future study.

Organising such an event presents some problems specific to cycling. In contrast to a marathon, you cannot start 25,000 cyclists at the same time. They would tumble into each other and grind to a halt almost immediately. Even after the start, if the density of riders becomes too high, riders will bump together and have accidents. High speeds and the possibility of being hit by the cyclists behind you mean accidents are likely to be far more serious than would occur in a marathon. The priority (I assume) for the organisers is to get everyone to the end safely.

So the event is started in “waves” of a few hundred riders released every few minutes, starting at 6am with the last wave departing at about 9:30am. All riders are informed by the organisers which wave they are in prior to the event.

There are several stops along the route where food, drinks and toilet facilities are provided; most riders will have to stop at least once.

 

The Data

This analysis uses the data available on the Prudential RideLondon website, which records the time each rider passes several checkpoints along the route. Each rider carries a “chip” so their passing is automatically recorded. Cyclists in the UK generally refer to distances and speeds in kilometres (km) and kilometres per hour (kph), so I will stick to that convention.

 

A few Notes

Throughout this post I will refer to “times” and “splits”. Times are an actual time of day and on charts I will display these in 24 hour format. Splits are a period of time, for instance the time taken to go from the start to the finish. On charts I will display these as XhYm, e.g. 6h54m.
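To make the distinction concrete, here is a minimal Python sketch (my own analysis was in R; the function names here are hypothetical and only for illustration). It assumes times are held as seconds since midnight and splits as a number of seconds, and it zero-pads the minutes of a split, which is my assumption about the XhYm format.

```python
def format_time_of_day(seconds_since_midnight: int) -> str:
    """Format a time of day in 24 hour format, e.g. 07:30."""
    hours, remainder = divmod(int(seconds_since_midnight), 3600)
    minutes = remainder // 60
    return f"{hours:02d}:{minutes:02d}"


def format_split(seconds: int) -> str:
    """Format a period of time as XhYm, e.g. 6h54m (spare seconds are dropped)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes = remainder // 60
    return f"{hours}h{minutes:02d}m"


# A rider who starts at 07:30 and finishes at 14:24:30 (made-up values)
start = 7 * 3600 + 30 * 60
finish = 14 * 3600 + 24 * 60 + 30
print(format_time_of_day(start))     # a "time"
print(format_split(finish - start))  # a "split"
```

The split is simply the difference between two times, which is why a late start changes your finish time but not your finish split.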

 

High Level Analysis

Participation

This plot below shows the growing popularity of the event and that it is mostly men who take part. I’ve included everyone who started the event, not just those who completed the full 100 miles.

Ride100_GenderTrendPlot

The plot below shows the age group and gender of the participants. The age profile of participants changed very little between 2014 and 2016 so I’ve combined that data into one chart. There was no age data available for 2013. Not a lot to see here, unsurprisingly “MAMIL” being the most numerous among the participants. I am a MAMIL so no offence is intended! Interestingly 2017 shows the first decline in participation and as the event is oversubscribed I assume this is a deliberate reduction by the organisers. The number of participants completing the full 100 mile course remained pretty much the same though.

Ride100_AgeGroupByGenderPlot

From now on analysis will focus on those that completed the full course distance.

 

Age Group and Finishing Split

Analysis of finish splits for the different age categories promised to be useful, hopefully allowing first time riders of the event to set themselves realistic target times. However I found a few problems with the data. The 2014 course was shortened to 86 miles in order to remove some dangerous descents from the course (it rained). The 2016 event was interrupted and shortened for some riders who started later in the day. There is no age data for 2013, which leaves only 2015. I believe 2015 was also affected by an incident but it happened later on and had a smaller effect, so 2015 is the cleanest data I have for this “box plot” (a simple explanation of boxplots). I have added a blue dotted line showing the median time.

I was expecting greater differences between the age groups. It seems when examining the race entry as a whole, gender is a far more dominant factor than age for estimating finish splits.

Ride100_FinishSplitByAgeGroupPlot_2015

Here is a link to the other years, Age Group and Finish Split

 

Age Group and Start Time

For completeness, and as a prelude to later sections, here is the same boxplot but this time showing the start times broken down by age group and gender. It looks pretty similar, eh? Is the finish split strongly correlated with the start time because your start time is a strong contributing factor, or because you were assigned by the organisers a start time based on an accurate estimate of your finish split? Chicken/egg situation?

Ride100_StartTimeByAgeGroupPlot_2015

Here is a link to plots for the other years, Age Group and Start Time Plots

 

Detailed Analysis

Start Time vs Finish Split (aka the chicken/egg question)

There are many differences between cycling and running, but one that is of interest here is that the speed you can go on a bike is limited by air resistance, hence the skin-tight lycra. This has another effect: if you can ride behind someone you can go at the same speed with less effort. If you watch the Tour de France you will see the riders form a large group called a “Peloton” for this reason.

The interest here is that if you started the event with lots of very fast riders, you might finish much faster because, by hiding behind them, you could go just as fast for only 60% of the effort.

I would like to be able to isolate that effect but there are several problems. My hypothesis is that the organisers place faster riders in the earlier waves based on an estimation. However according to the data there is still a wide range of abilities across all the waves. I do not know if people who started earlier went faster because of the “peloton effect” (aka drafting) or because they were faster and given an earlier start time.

First let’s look at all the data in a simple form to see what we are dealing with. Below is the scatter plot for 2013, showing a fairly constant relationship which I’ve shown by adding a regression line. There is slight vertical banding, but not as pronounced as in the subsequent plots.

Ride100_StartTimeVsFinishSplitAllFinishersPlot_2013

 

Here’s the 2014 scatter plot; again a straight line seems to fit quite well. This was the year the course was shortened due to the rain. There is much stronger vertical banding, indicating bigger gaps between waves of riders starting.

Ride100_StartTimeVsFinishSplitAllFinishersPlot_2014

For 2015, a lighter band is clearly visible which shows there was an interruption to the event, affecting some riders who started after 07:30.

Ride100_StartTimeVsFinishSplitAllFinishersPlot_2015_annotated

Similarly for 2016, you can see there were some incidents: a slight interruption above the yellow line and a big interruption between the orange lines. You can also see a squashing of the finish times below the green line; probably the organisers either “turned off the clock” at the finish line (a missing finish time would eliminate riders from this chart) or shortened the course for these riders.

Ride100_StartTimeVsFinishSplitAllFinishersPlot_2016_annotated

It might be interesting to isolate when and where the incidents occurred, but for now I will just try to avoid data that is affected.

Here is the 2015 data for riders who started before 7:30am, broken down by gender and the start times placed into “buckets”.

Ride100_StartBucketFinishTimeWithGenderBoxPlot_2015

Here is a link to the charts for the other years (they all look pretty much the same), Start Time and Finish Split
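For anyone wanting to reproduce this sort of breakdown, the bucketing step can be sketched in a few lines of Python (my charts were produced in R, so this is an illustration rather than the code behind the plot). The 15 minute bucket width, the function names and the sample riders are all assumptions made up for the example; only the 07:30 cut-off comes from the text above.

```python
from collections import defaultdict
from statistics import median


def start_bucket(start_min: int, width: int = 15) -> str:
    """Label a start time (minutes since midnight) by the start of its bucket."""
    lower = (start_min // width) * width
    return f"{lower // 60:02d}:{lower % 60:02d}"


def median_split_by_bucket(riders, width: int = 15, cutoff_min: int = 7 * 60 + 30):
    """Median finish split (minutes) per (start bucket, gender).

    riders is an iterable of (gender, start_min, split_min) tuples.
    Riders starting at or after the cut-off are dropped, mirroring the
    'started before 7:30am' filter.
    """
    groups = defaultdict(list)
    for gender, start_min, split_min in riders:
        if start_min >= cutoff_min:
            continue
        groups[(start_bucket(start_min, width), gender)].append(split_min)
    return {key: median(values) for key, values in groups.items()}


# Made-up riders: (gender, start in minutes since midnight, split in minutes)
sample = [("M", 360, 280), ("M", 365, 300), ("F", 372, 330), ("M", 455, 400)]
print(median_split_by_bucket(sample))
```

A box plot shows more than the median of course (quartiles and outliers too), but the grouping key is the same.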

 

The Plot Thickens

If the organisers allocate start waves on the basis of estimated finish time, it looks like they could safely add 15 minutes to the estimated finish time of the women. On the other hand, given some people take 10 hours to finish the course wouldn’t it make more sense to start the slowest riders first? The course can only be closed to traffic for so long.

If the organisers could precisely estimate every rider’s finish split and started all the 4h45m riders at the same time, would they do that? There would be a clump of riders making their way around the entire course, cursing the event for being too crowded!

I suggest therefore that either the organisers create each wave with a profile of estimated finish time, or rely on the inaccuracy of estimating finish splits. Given how difficult it must be to estimate a finish split, it’s probably the latter! The discrepancy between male/female start times and finish splits must be deliberate though, as a contribution to a spread or some other reason.

As for my aspiration to isolate the “peloton effect”, I admit defeat. However, I will make a few observations. Below are the riders starting in the first 90 minutes of 2015 showing the gradient of the best fit line. In this case it’s saying that, on average, riders who start at 7am will be 65 minutes slower than those who started at 6am (60 x 1.09).

Ride100_StartTimeVsFinishSplit730ScatterPlotWithGradient_2015

Here is the same plot but focussing on just the first 60 minutes. The gradient has increased so in this case on average riders starting at 7am will finish 75 minutes slower than those who started at 6am (60 x 1.25)

Ride100_StartTimeVsFinishSplit700ScatterPlotWithGradient_2015
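The gradients quoted above are the slopes of an ordinary least squares line through start time (x) and finish split (y), both in minutes. As a rough sketch of the arithmetic in Python (my plots were made in R; the data points below are invented so that the slope comes out at 1.09, they are not real riders):

```python
def ols_gradient(xs, ys):
    """Return (slope, intercept) of the least squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# x: minutes after 06:00 at the start, y: finish split in minutes (synthetic)
start_offsets = [0, 20, 40, 60, 80]
finish_splits = [270 + 1.09 * x for x in start_offsets]

slope, intercept = ols_gradient(start_offsets, finish_splits)
# Extra minutes on the finish split for starting one hour later
print(round(60 * slope))
```

A slope of 1.09 therefore means each minute of later start goes with roughly 1.09 minutes of extra finish split on average, which is where the 60 x 1.09 figure comes from. It says nothing about which way the cause runs.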

The earlier in the morning the rider leaves, the stronger this effect is; so the line should be slightly curved, not straight. It is easy to create plots with curved lines that show this, but the curvature was not very obvious. This effect is visible in the other years, which I’ve included here, Start Time and Finish Split – Early Starters

 

Group Riding

For those attempting to get a sub-5 hour time, how effectively you ride in groups will be a big factor which I would like to investigate. However, the data is not particularly helpful in that regard; it tells me when each rider passed each of the 8 checkpoints, but just because people cross a checkpoint together does not mean they are riding together. As the day progresses you have faster riders/groups passing slower riders/groups.

For instance, if data indicates that 40 people passed a checkpoint in 10 seconds, probably there are several strata of riders within that 40. Perhaps a fast-moving group of 5 who ride regularly together are at the front, followed by 15 slower riders who tagged on the back and are holding on as long as they can. Alongside are a ragged line of 20 riders who got an earlier start time but are significantly slower, they just happened to be crossing the checkpoint at the same time.

Another issue is that eight checkpoints is a very sparse sample of what happens on a course that takes at least 4 hours. Some people ride the course with the same people from start to finish, but for many people who they are riding with is fluid. Even 100 checkpoints would not capture it. In 2017 the number of checkpoints increased to thirteen, which is much better, but the same issues apply.

The approach I have taken is to only consider riders as together if they passed two checkpoints together, not just one. Harder than it sounds! A more detailed technical explanation is here, Checkpoint Group Analysis
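To give a flavour of the idea without the full technical page, here is a simplified Python sketch. It is not the method from the linked page: it assumes riders are “together” at a checkpoint if consecutive crossings are no more than a couple of seconds apart (the `max_gap` value is my guess for illustration), and it then keeps only riders who share a cluster at both checkpoints. All names and times are hypothetical.

```python
from collections import defaultdict


def cluster_ids(times, max_gap=2.0):
    """Map each rider to a cluster index at one checkpoint.

    times maps rider -> crossing time in seconds. A new cluster starts
    whenever the gap to the previous rider exceeds max_gap.
    """
    ids = {}
    cluster = -1
    previous = None
    for rider, t in sorted(times.items(), key=lambda item: item[1]):
        if previous is None or t - previous > max_gap:
            cluster += 1
        ids[rider] = cluster
        previous = t
    return ids


def groups_over_two_checkpoints(times_a, times_b, max_gap=2.0):
    """Groups of 2+ riders who were in the same cluster at both checkpoints."""
    ids_a = cluster_ids(times_a, max_gap)
    ids_b = cluster_ids(times_b, max_gap)
    groups = defaultdict(list)
    for rider, id_a in ids_a.items():
        if rider in ids_b:
            groups[(id_a, ids_b[rider])].append(rider)
    return [sorted(members) for members in groups.values() if len(members) >= 2]


# A, B and C cross the first checkpoint together, but only A and B are
# still together at the second; C and D merely coincide there.
cp_a = {"A": 100, "B": 101, "C": 102, "D": 150}
cp_b = {"A": 2000, "B": 2001, "C": 2100, "D": 2101}
print(groups_over_two_checkpoints(cp_a, cp_b))
```

In this example C and D cross the second checkpoint a second apart, but they were nowhere near each other at the first, so they are not counted as riding together.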

As we will be looking at plots specific to some checkpoints it is worth showing a few details. Below is a table showing the checkpoint distances from the start (in km). I’ve also included a little weather info and where the hills were on the course. Note that all the checkpoint and hill distances are approximate.

Ride100Checkpoints(inc2017)

To start with let’s look at the data, starting with the plots below for 2013. I’ve omitted groups smaller than four riders and limited the graphical “bubble” size of very large groups. You can see the riders being spread out as they get further into the course, i.e. it only takes 3.5 hours for everyone to cross checkpoint 4, but by the time they reach checkpoint 8 it takes 6 hours+.

By checkpoint 5 and checkpoint 6 there are fewer significant groups but they form up again before the end. Having looked at a YouTube video for 2016, I know that Checkpoint 5 was at the top of Leith Hill and I think it was in the same spot in 2013 too. That would explain the slower speeds and decimation of the groups at checkpoint 5! As I mentioned earlier, it is quite easy to ride in groups on flat terrain but once on a steep hill the peloton effect is greatly reduced and groups will break up. It is interesting that although the riders are more spread out, they form up into groups by checkpoints 7 and 8.

Ride100_MultiCheckpointPlot_A_2013

 

For 2014 there were no significant hills so the groups remained together. The tendency to form groups towards the end of the course is visible again though.

Ride100_MultiCheckpointPlot_A_2014

In 2015, the effect of course incidents is visible.

Ride100_MultiCheckpointPlot_A_2015

For 2016, the effect of incidents is even more apparent.

Ride100_MultiCheckpointPlot_A_2016

For 2017 there were 13 checkpoints so I will show a more complete plot. The same effect of the hills on group size is apparent again.

Ride100_MultiCheckpointPlot_2017

The next plot shows the tendency of groups to split up in the middle of the course and then form up again towards the end.

Ride100_MultiCheckpointAverageSizePlot

Focus on Groups

I have dozens of plots for each checkpoint, for each year and with different filters, which I will spare you as this post has become longer than I expected. I’ll just look at a couple here.

This plot of checkpoint 2 shows that the effect of start waves is still apparent (diagonal striping) after 27km of riding on mostly flat roads.

Ride100_SingleCheckpointPlot_2016_Cp2

Below are plots for checkpoint 7, the first showing only groups larger than 3, while the second shows everyone (groups of 1 and above). Note how most of the big group bubbles are above the red “average speed” line. Riding in groups is faster!

Ride100_SingleCheckpointPlot_2016_Cp7

Ride100_SingleCheckpointPlotA_2016_Cp7

Should you stop?

There are various reasons you might stop; to go to the toilet, to eat & drink, to have a rest, to repair a puncture etc. The data does not show directly who stopped, but I have estimated who stopped by seeing if they took longer to travel between two checkpoints than would be expected. A detailed explanation is here Ride100 – Did you stop?. The larger proportion of stops in 2015 and 2016 can be attributed to incidents that forced many riders to stop. I excluded 2014 from this plot as it was a shortened course, but the shape was almost identical to 2013/15/16.

As you can see the 2017 data shows a different profile and I think this is down to problems with my assumptions rather than a significant change in how many people stopped. The accuracy of the chart should therefore be treated as “suspect”! One of the issues is I assume the first 1% of riders across the finish line did not stop and use that as a template to compare everyone else’s sector times. However in 2017 the spread of times for the first 1% was larger than in previous years. The general trend remains though and should I have time I would like to revisit the analysis to see if I can improve it.

Ride100_FinishTimeDidYouStop
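For the curious, the general shape of the stop estimate can be sketched in Python as below. This is a simplification written for this post, not the code from the linked page: it takes the fastest 1% by total split as the no-stop template, scales the template to each rider’s typical pace, and flags any sector that runs more than a threshold over. The five minute threshold, the use of medians and all the names are my assumptions for the example.

```python
from statistics import median


def template_sectors(all_sectors, fraction=0.01):
    """Median sector times (seconds) of the fastest `fraction` of riders.

    all_sectors is a list of per-rider lists of sector times; the fastest
    riders are taken to be those with the smallest total.
    """
    ranked = sorted(all_sectors, key=sum)
    count = max(1, int(len(ranked) * fraction))
    return [median(column) for column in zip(*ranked[:count])]


def likely_stops(sectors, template, threshold=300):
    """Indices of sectors where the rider took much longer than expected."""
    ratios = [s / t for s, t in zip(sectors, template)]
    typical = median(ratios)  # how much slower than the template, overall
    return [
        index
        for index, (s, t) in enumerate(zip(sectors, template))
        if s - t * typical > threshold
    ]


# Made-up sector times in seconds for three riders over four sectors
field = [
    [1800, 1800, 1800, 1800],  # fastest rider, used as the template
    [2400, 2400, 3300, 2400],  # steady pace, but 15 minutes lost in sector 2
    [2500, 2500, 2500, 2500],  # slower but steady
]
template = template_sectors(field)
print([likely_stops(rider, template) for rider in field])
```

You can see the weakness mentioned above: if the fastest 1% are themselves inconsistent from sector to sector, the template is a poor yardstick for everyone else.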

So should you stop? Well if you want a time on the far left of this chart then stopping is not what everyone else is doing. There is a balance to be made of course. If you are dehydrated, at best you will slow down, at worst you will fall off your bike and be in a St John’s ambulance tent for the remainder of the day. Use good sense – a stop for a drink might mean you finish faster and enjoy your day.

 

Some Diversions

The Winning Move (2016)

Just for a bit of colour, who “won” the 2016 event? First over the line were Mr Harrison, Mr I Cipolletta, Mr F Cipolletta and Mr Vartasarian, all at 10:01:19. They were part of a group of 8 that crossed the line within a few seconds. After a bit of googling it turned out Mr Vartasarian’s place was actually being ridden by a Mr Percival of the same bike club (Regents Park Rouleurs). A youngster, not age group 40-44! Mr Percival also got the fastest point-to-point time: 3 hours 57 minutes and 1s.

The Winning Move (2017)

First over the line was Mr Vartasarian (the real one this time apparently!) in 3 hours 58 minutes and 5s, with Mr Sharland of Paceline RT one second behind. Again this year the man first across the line also got the fastest point-to-point time. Just under a minute behind them were the chasing pack of 22 riders. There was then five minutes before the rest started to come through. I suspect that gap was due in part to them giving up on getting a time under 4 hours (which they missed by about 5 minutes).

There was an initial break attempt by a pairing of Mr Mcquillan (Tri UK) and Mr Yanto (Le Col) before mile 55 where they had a gap of almost a minute. By mile 63 this had extended out to 1min 45s. However by mile 74 Mr Yanto had dropped, leaving Mr Mcquillan on his own with a reduced gap of 36 seconds. By mile 80 he had been overtaken by a group of three riders: Mr Vartasarian (Regents Park Rouleurs), Mr Sharland (Paceline RT) and Mr Gunther (London Phoenix). They had a lead of 33 seconds which they extended to 42 seconds by mile 87. By mile 94 Mr Gunther had been dropped and the remaining duo took a 52 second advantage to the finish line.

 

Biggest Teams

I define this as a group of riders who managed to stay together for the entire route. Not easy!

The data wasn’t especially interesting, about 5-10% of the participants ride the entire route in a group of two or more riders (depending on year, 10% in 2016). The largest group to make it from start to finish was the second group to cross the line in 2016 (just over two minutes later than the first eight), 26 riders from start to finish.

For 2017 the biggest group from start to finish was 19 riders who again were second group across the line. They did all get sub 4 hour times though. Nice!

 

Most Overtakes (2016)

This award goes to the pairing of Mr Connolly (Team Milton Keynes) and Mr Del Brocco (Njinga Cycling Club) who managed to overtake 10,146 and 10,050 riders respectively. They started at 07:45, a late slot considering they managed to finish in 4h19m and they rode almost the whole race on their own, only expanding to seven riders for the last sector. I would not be surprised if their power stats were better than the riders who crossed the line first (that’s a long time in the wind).

The most overtakes by a “lone” rider goes to Mr Mahon, 7,320! He started at 8:59, so perhaps he was supposed to start much earlier but overslept! Course completed in 5h51m.
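A note on what counts as an overtake. A simple way to count them, sketched here in Python, is to say you overtook someone if they started before you and finished after you. That definition is an assumption for this example (it ignores anyone who passed you back or did not finish), and the function name and data are made up.

```python
def count_overtakes(rider, field):
    """Count riders who started earlier than `rider` but finished later.

    Each rider is a (start, finish) pair of times of day in minutes
    since midnight.
    """
    start, finish = rider
    return sum(
        1
        for other_start, other_finish in field
        if other_start < start and other_finish > finish
    )


# A 07:45 starter with a 4h19m split finishes at 12:04
late_fast_rider = (7 * 60 + 45, 7 * 60 + 45 + 4 * 60 + 19)
others = [
    (360, 780),  # started 06:00, finished 13:00: overtaken
    (400, 700),  # started earlier and still finished earlier: not overtaken
    (470, 800),  # started later: cannot have been overtaken
    (450, 730),  # started 07:30, finished 12:10: overtaken
]
print(count_overtakes(late_fast_rider, others))
```

This is why a late start slot combined with a fast split is the recipe for a big number: almost the whole field is on the road ahead of you.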

Most Overtakes (2017)

Mr Rui and Mr Scott teamed up and overtook 10,984 and 10,871 riders respectively. A new record! Not bad considering a start time of 8:52 and final split of 4h46m.

Top Tips

If you’re an experienced rider and/or expect to get a time under 4h15m there’s not a lot I can help with. You would be better to read about how it was “won” in 2016 by the guy who “won” it, Tom Percival Blog

For those trying to get a sub-5 hour time, the most important factor is how fast you can ride a bike. Don’t expect a big secret here! Perhaps my tips will help you chop 10 minutes off your time.

Sub-5 hour tips for those starting before 7.30am

An early start slot is a big factor if you are on your own, but you can’t change it. If you didn’t get an early slot, all is not lost. Arrive at your allocated start pen early and get to the front. If you can’t do that, then “burn some matches” quickly after the start to get to the front. There is a large spread of abilities in each wave, so unless you are aiming to go under 4h15m you should be able to team up with some riders of similar ability.

Safety is your main concern, for yourself and the riders around you. If you are not well practised riding fast in large groups of riders, focus on getting through the busy sections safely rather than quickly. There are all sorts of amusing rules and guidelines, taken to almost religious fervour by some. You can familiarise yourself here, The Rules

As you approach the Embankment, the density of riders is going to increase quickly as you push through the previous wave. Some of the riders in your group are going to lose the wheel in front and you will find it very difficult to get past that split once it’s happened. But that 5 second split will be more than 5 minutes after 100 miles so more matches need to be burned as an investment (if you have them!)

Once things have settled down a bit you will hopefully find yourself in a group you are comfortable in until the first of the hills, Leith Hill. If you think you can hold your own up the hill, make sure you are near the front of the group BEFORE the hill. Otherwise you will be a victim of the splits that will inevitably occur.

Don’t stop unless you have to. Not only do you lose the time while you are stopped, but the average speed of the riders/groups will be slower when you re-join.

You have to ride in groups. In fact it is so important that perhaps this is my best tip: If you are dropped and find yourself riding on your own, slow down! Chill out, have a banana, and wait to join the next group that passes. Any effort you spend trying to outpace the dozens of pelotons behind you is utterly wasted. This tip can be extended: while there is glory in smashing it up a hill and leaving the rest of your group behind, it’s wasted energy as they will reel you in once over the top. Better to save those matches for the sprint down the Mall!

If your group is overtaken by a faster group, consider switching. It may burn some energy but it could be worth several minutes by the end. However be realistic about how close to your “red line” you are and whether you would be able to ride with the new group. Being spat out the back and being so exhausted you can’t ride with the group you were with originally would be an obvious “fail”.

Ok, I’m done. I hope some of this was interesting or even useful. I may continue to add each year’s data, but to be honest it doesn’t change that much each year so little value is added. For now I am looking for some other data to get stuck into!

9 thoughts on “Analysis of Annual Prudential RideLondon Cycling Sportive”

    1. Thanks Tom, just saw your comment (5 days late!).
      If you took part this year I hope you went well. At some point I’ll load up the 2017 data and see if there’s anything interesting. Bit windy was what I heard, so times may be a bit slower.


      1. I’ve somehow just seen this. Lots of days late! Unfortunately, this year I punctured in Ripley (40 miles or so in), but the “real” Arlen Vartazarian was “first”.

        Would be very interested in the analysis.


      2. Well I finally got around to loading the 2017 data. Bit of a palaver really as they added a load of checkpoints and started the thing 15 minutes earlier (plays havoc with my charts!). To be honest there’s not really much extra in it that wasn’t there from previous years.


      3. PS – just had a look to see if I could spot you amongst the data – were you riding as a Mr Gurney or Mr Norris this year? (stopped between mile 37/47, then went like bat sh*t till the end!)


  1. 200 visitors! I’m very happy with that. If I can get it above 250 I’ll add the 2017 data, above 500 visitors and I’ll add some extra 2017 “Top 100” charts for a bit of fun; Most Overtakes, Slowest Time, Top Pairings/Teams/Clubs etc. Maybe based on an “adjusted finish split” based on an estimate of how much early start/draft assistance riders got? Hmmm. Back to that chicken/egg problem again!

    Anyway, please post a link to your cycling forums as that’s where all the traffic has been from so far.

    If you want to see the 2017 data probably best to add this page to your favourites as it is almost impossible to find on Google!

