This week looks to be one of the most exciting weeks (if not the most exciting) of the Regional schedule. We all know that the Central East men's region is the toughest in the world, but the European women's region and the Central East women's region are also super-competitive this year. It's also a little intriguing to see which "records" will fall now that the bar has been set on all the events. Personally, I think every single one of the men's records will go down (potentially all in the Central East?) and many of the women's records will go down (though I doubt that Akinwale's records in events 1 and 2 will fall).
With that in mind, let's get down to the business at hand. Last week, I made a bunch of excuses for why I wasn't able to get formal predictions finished in time, but this week I was able to make it happen. I've been able to estimate the odds of qualifying for each athlete in all 10 competitions, and at the bottom of this post I've shown the odds for the top contenders in each region.
But first, here's a recap of the methodology, which is largely similar to what was done last year:
- Learn from prior results
  - Separate the 2013 Regional competitors into various categories, based on their performance in the 2013 Open, 2012 Games and 2012 Regionals.
  - See how frequently athletes in each category posted a very high (top 20 worldwide) or relatively high (20-50 worldwide) regional performance in 2013 (based on the cross-regional rankings last year).
  - Repeat the first two steps one year further back. Combine the results with what I came up with in the first two steps. This helped me get a bigger sample size and hopefully improve the predictions.
- Apply learnings to what we know this year to make predictions
  - For each athlete this year, place them in one of the categories based on their performance in the 2014 Open, 2013 Games and 2013 Regionals.
  - Depending on their category, randomly generate a worldwide ranking for each athlete this year. The category affects this randomized worldwide ranking, i.e. those who had better results in the past year will generally get a better randomized worldwide ranking this year.
  - Re-rank all athletes within a region based on these randomized worldwide rankings.
  - Repeat 200 times and see how often each athlete qualifies for the 2014 Games.
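For anyone who would rather read code than bullet points, here is a rough Python sketch of the second half of that list (draw a ranking, re-rank within each region, repeat, count). The post mentions Excel's rand(), so this is not the actual implementation. Everything named here is an assumption made for illustration: `simulate_qualifying_odds`, the `draw_rank` callback and the layout of the athlete tuples are all hypothetical.

```python
import random
from collections import defaultdict


def simulate_qualifying_odds(athletes, spots, draw_rank, n_sims=200, seed=None):
    """Estimate how often each athlete lands in a qualifying spot.

    athletes:  list of (name, region, category) tuples (hypothetical layout)
    spots:     dict mapping region -> number of qualifying spots
    draw_rank: function (category, rng) -> simulated worldwide rank
    n_sims:    number of repetitions (the post uses 200)
    """
    rng = random.Random(seed)
    times_qualified = defaultdict(int)

    for _ in range(n_sims):
        # Draw a randomized worldwide rank for every athlete.
        by_region = defaultdict(list)
        for name, region, category in athletes:
            by_region[region].append((draw_rank(category, rng), name))

        # Re-rank within each region; the best (lowest) ranks qualify.
        for region, draws in by_region.items():
            draws.sort()
            for _, name in draws[: spots[region]]:
                times_qualified[name] += 1

    # Share of simulations in which each athlete qualified.
    return {name: times_qualified[name] / n_sims for name, _, _ in athletes}
```

Called with a list of athletes, a dict such as `{"Canada East": 2, "Central East": 3}` for the spots, and a function that draws a rank from an athlete's category, it returns each athlete's share of the 200 runs in which they finished in a qualifying spot.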
Now, for those so inclined, here are a few details on this process:
- To get a large enough sample size to build this model, I combined men and women.
- The process for creating the categories of competitors was not straightforward. There was quite a bit of judgment on my part to make sure that each category had sufficient athletes to be credible and that the categories produced results that made sense with each other. For instance, I wanted to separate out the top 2012 Games competitors (I chose top 15), but that meant I could not further break those athletes down based on 2012 regional rank, because there just would not be enough athletes there to get a credible sample.
- The process for randomly generating the rankings is as follows:
  - Generate a uniform random number between 0 and 1 (=rand() in Excel). If it is lower than the athlete's chance of finishing in the top 20, assign him or her to the top 20. If not, and it is lower than the athlete's chance of finishing in the top 50, assign him or her to the 20-50 range. Otherwise, assign the athlete to the 50-100 range.
  - Once the athlete has been assigned to a range of ranks, generate another uniform random number between 0 and 1. Multiply this by 20 to get the athlete's exact place within the range (multiply by 30 and add 20 if they are in the 20-50 range, or multiply by 50 and add 50 if they are in the 50-100 range). Generally, you'll need to be in the top 50 worldwide to qualify, but depending on how other athletes fare, it's possible to end up in the 50-100 range and still be in the top 3.
- Here is a listing of the categories I used to break down the athletes:
  - Top 15 at prior Games
  - Below 15 at prior Games, top 40 worldwide at prior Regionals
  - Below 15 at prior Games, below 40 worldwide at prior Regionals
  - Did not make prior Games, top 50 worldwide at prior Regionals, top 100 in current Open
  - Did not make prior Games, top 50 worldwide at prior Regionals, below 100 in current Open
  - Did not make prior Games, 50-100 worldwide at prior Regionals
  - Did not make prior Games, below 100 worldwide at prior Regionals, top 250 in current Open
  - Did not make prior Games, below 100 worldwide at prior Regionals, below 250 in current Open
  - Did not make prior Games, did not compete at prior Regionals, top 75 in current Open
  - Did not make prior Games, did not compete at prior Regionals, 75-150 in current Open
  - Did not make prior Games, did not compete at prior Regionals, below 150 in current Open
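The category list and the two-step random draw can be sketched in Python along these lines. Treat it as one reading of the description rather than the real thing: the function names are hypothetical, the handling of exact boundaries (is 15th "top 15"?) is a guess, and adding the bottom of the range to the second draw is an assumption about how the place within the range is computed. The per-category probabilities themselves are not given in this post, so they are left as inputs.

```python
import random


def assign_category(games_rank, regional_rank, open_rank):
    """Return the category number (1-11, in the order listed above).

    All ranks are worldwide. None means the athlete did not make the
    prior Games or did not compete at the prior Regionals.
    Boundary values are treated as inclusive, which is an assumption.
    """
    if games_rank is not None:
        if games_rank <= 15:
            return 1
        # Assumption: a Games athlete with no regional rank falls in category 3.
        return 2 if regional_rank is not None and regional_rank <= 40 else 3
    if regional_rank is None:
        if open_rank <= 75:
            return 9
        return 10 if open_rank <= 150 else 11
    if regional_rank <= 50:
        return 4 if open_rank <= 100 else 5
    if regional_rank <= 100:
        return 6
    return 7 if open_rank <= 250 else 8


def draw_worldwide_rank(p_top20, p_top50, rng=random):
    """Draw one randomized worldwide rank.

    p_top20: chance of finishing in the top 20
    p_top50: chance of finishing in the top 50 (so p_top50 >= p_top20)
    """
    # First draw: pick the range.
    u = rng.random()
    if u < p_top20:
        low, width = 0, 20
    elif u < p_top50:
        low, width = 20, 30
    else:
        low, width = 50, 50
    # Second draw: place within the range (the offset is an assumption).
    return low + rng.random() * width
```

The draw returns a fractional rank rather than a whole number, which is fine here because the value is only used to re-order athletes within a region.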
Last year, my predictions weren't bad, but they generally overestimated the chances for athletes at the low end and the high end and underestimated the chances for athletes in the middle. Consider:
- Athletes predicted 0-10% - 2.5% expected to qualify, 0.4% qualified
- Athletes predicted 10-50% - 20.7% expected to qualify, 39.5% qualified
- Athletes predicted 50-100% - 66.3% expected to qualify, 57.1% qualified
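A comparison like this is easy to reproduce for any set of predictions. The Python sketch below groups athletes by predicted chance and compares the average prediction in each group with the share who actually qualified. It assumes "expected to qualify" means the average predicted probability within the group, which is an interpretation, and `calibration_table` is a hypothetical name.

```python
def calibration_table(predictions, outcomes, edges=(0.0, 0.10, 0.50, 1.0)):
    """Compare predicted and actual qualifying rates by prediction bucket.

    predictions: predicted chances of qualifying, each between 0 and 1
    outcomes:    True/False for whether each athlete actually qualified
    edges:       bucket boundaries (0-10%, 10-50%, 50-100% by default)

    Returns a list of (low, high, count, expected_rate, actual_rate).
    """
    rows = []
    last_bucket = len(edges) - 2
    for i, (low, high) in enumerate(zip(edges, edges[1:])):
        # Buckets include their lower edge; the last one also includes 100%.
        members = [
            k for k, p in enumerate(predictions)
            if low <= p < high or (i == last_bucket and p == high)
        ]
        if not members:
            continue
        expected = sum(predictions[k] for k in members) / len(members)
        actual = sum(1 for k in members if outcomes[k]) / len(members)
        rows.append((low, high, len(members), expected, actual))
    return rows
```

If the expected and actual rates in a bucket are close, the predictions in that range are well calibrated; a large gap in either direction is the kind of over- or underestimate described above.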
I have more data to train the model this year, which should help to calibrate things a little better, and I was a bit more liberal in applying manual adjustments to some elite athletes. For instance, Julie Foucher did not compete last year, but I treated her as if she had finished in the top 15 in the Games. Given that she is a three-time top 5 athlete, I think this is only fair. Other athletes for whom I made at least some adjustment included Rich Froning, Samantha Briggs, Annie Thorisdottir, Frederik Aegidius and Camille Leblanc-Bazinet.
So with that in mind, below are my predictions for week 2 (athletes with less than a 5% chance are not shown). As always, keep in mind that this is all in fun, and it's all simply based on the numbers. I'm not making any sort of judgment about the effort these athletes have put in; I'm simply reflecting how athletes in similar situations have performed in the past. Enjoy week 2, everyone!
*Note that Canada East only has two qualifying spots. All other regions this week have three.