
Thursday, May 15, 2014

Regionals Predictions, Week 2

Welcome back, everyone. The first week of the 2014 Regionals is in the books, and in many ways I think it played out as I expected. The handstand walk derailed one heavy favorite (Stacie Tovar), the athletes I picked to fare well all did (Lucas Parker, Elizabeth Akinwale, Talayna Fortunato), and the weekend as a whole seemed to favor athletes with strong gymnastics and pure strength over those with the biggest engines. I was pleasantly surprised by event 7, which appeared to be more balanced than I expected and provided some exciting shake-ups in a few regions.

This week looks to be one of the most (if not the most) exciting weeks of the Regional schedule. We all know that the Central East men's region is the toughest in the world, but the European and Central East women's regions are also super-competitive this year. It will also be intriguing to see which "records" fall now that the bar has been set on every event. Personally, I think every one of the men's records will go down (potentially all in the Central East?) and many of the women's records will too (though I doubt Akinwale's records in events 1 and 2 will fall).

With that in mind, let's get down to the business at hand. Last week, I made a bunch of excuses for why I couldn't finish formal predictions in time, but this week I made it happen. I've estimated each athlete's odds of qualifying in all 10 competitions, and at the bottom of this post I've shown the odds for the top contenders in each region.

But first, here's a recap of the methodology, which is largely similar to what was done last year:
  • Learn from prior results
    • Separate the 2013 Regional competitors into various categories, based on their performance in the 2013 Open, 2012 Games and 2012 Regionals.
    • See how frequently athletes in each category posted a very high (top 20 worldwide) or relatively high (20-50 worldwide) regional performance in 2013 (based on the cross-regional rankings last year).
    • Repeat the first two steps one year further back and combine those results with the ones from the first two steps. This gave me a bigger sample size and, hopefully, better predictions.
  • Apply learnings to what we know this year to make predictions
    • For each athlete this year, place them in one of the categories based on their performance in the 2014 Open, 2013 Games and 2013 Regionals.
    • Depending on their category, randomly generate a worldwide ranking for each athlete this year. The category affects this randomized worldwide ranking, i.e. those who had better results in the past year will generally get a better randomized worldwide ranking this year.
    • Re-rank all athletes within a region based on these randomized worldwide rankings.
    • Repeat 200 times and see how often each athlete qualifies for the 2014 Games.
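For readers who prefer code to bullet points, here is a minimal Python sketch of the steps above. It is an illustration under stated assumptions, not the actual model (which was built in Excel): the function names and data layout are hypothetical, "worldwide_regional_rank" stands for the cross-regional ranking, and each category is summarized by two numbers, the share of past athletes who finished top 20 and the (cumulative) share who finished top 50.

```python
import random
from collections import Counter


def estimate_band_probabilities(history):
    """Learn from prior results (hypothetical data layout).

    history: iterable of (category, worldwide_regional_rank) pairs, pooled
    across seasons and across men and women. Returns category ->
    (p_top20, p_top50), where p_top50 is cumulative (includes top-20 finishes).
    """
    totals, top20, top50 = Counter(), Counter(), Counter()
    for category, rank in history:
        totals[category] += 1
        if rank <= 20:
            top20[category] += 1
        if rank <= 50:
            top50[category] += 1
    return {c: (top20[c] / totals[c], top50[c] / totals[c]) for c in totals}


def draw_world_rank(p_top20, p_top50, rng):
    """Two-stage draw: pick a rank band, then a uniform spot inside it."""
    u = rng.random()
    if u < p_top20:
        low, width = 0, 20    # top 20
    elif u < p_top50:
        low, width = 20, 30   # 20-50
    else:
        low, width = 50, 50   # 50-100
    return low + rng.random() * width


def simulate_region(athletes, spots, trials=200, seed=None):
    """athletes: name -> (p_top20, p_top50).

    Returns each athlete's share of trials in which they landed inside the
    region's qualifying spots.
    """
    rng = random.Random(seed)
    qualified = Counter()
    for _ in range(trials):
        ranks = {name: draw_world_rank(p20, p50, rng)
                 for name, (p20, p50) in athletes.items()}
        # Re-rank within the region: the lowest simulated worldwide rank is best.
        for name in sorted(ranks, key=ranks.get)[:spots]:
            qualified[name] += 1
    return {name: qualified[name] / trials for name in athletes}
```

Each athlete's pair of probabilities would come from looking up their category in the output of `estimate_band_probabilities`. The `spots` argument captures the one structural difference between regions this week: Canada East would use `spots=2`, every other region `spots=3`.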
Now, for those so inclined, here are a few details on this process:
  • To get a large enough sample size to build this model, I combined men and women.
  • The process for creating the categories of competitors was not straightforward. There was quite a bit of judgment on my part to make sure that each category had sufficient athletes to be credible and that the categories produced results that made sense with each other. For instance, I wanted to separate out the top 2012 Games competitors (I chose top 15), but that meant I could not further break those athletes down based on 2012 regional rank, because there just would not be enough athletes there to get a credible sample.
  • The process for randomly generating the numbers is as follows:
    • Generate a uniform random number between 0 and 1 (=rand() in Excel). If this number is lower than the athlete's chance of finishing in the top 20, assign him or her to the top 20. If not, but it is lower than the athlete's chance of finishing in the top 50, assign him or her to the 20-50 range. Otherwise, assign the athlete to the 50-100 range.
    • Once the athlete is assigned to a range of ranks, generate another uniform random number between 0 and 1. Multiply it by 20 to get the athlete's exact place within the range (multiply by 30 for the 20-50 range or by 50 for the 50-100 range). Generally, you'll need to be in the top 50 worldwide to qualify, but depending on how other athletes fare, it's possible to land in the 50-100 range and still finish in the top 3 of your region.
  • Here are the categories I used to break down the athletes:
    • Top 15 at prior Games
    • Below 15 at prior Games, top 40 worldwide at prior Regionals
    • Below 15 at prior Games, below 40 worldwide at prior Regionals
    • Did not make prior Games, top 50 worldwide at prior Regionals, top 100 in current Open
    • Did not make prior Games, top 50 worldwide at prior Regionals, below 100 in current Open
    • Did not make prior Games, 50-100 worldwide at prior Regionals
    • Did not make prior Games, below 100 worldwide at prior Regionals, top 250 in current Open
    • Did not make prior Games, below 100 worldwide at prior Regionals, below 250 in current Open
    • Did not make prior Games, did not compete at prior Regionals, top 75 in current Open
    • Did not make prior Games, did not compete at prior Regionals, 75-150 in current Open
    • Did not make prior Games, did not compete at prior Regionals, below 150 in current Open
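The category list above boils down to a small decision tree. Here is a hedged Python sketch of it, assuming "top N" means a rank of N or better and that "50-100" means 51st through 100th; the function and argument names are hypothetical. It returns the category number in the order listed (1 = top 15 at prior Games, 11 = no prior Regionals and below 150 in the Open).

```python
from typing import Optional


def assign_category(games_rank: Optional[int],
                    regional_world_rank: Optional[int],
                    open_world_rank: int) -> int:
    """Map an athlete's past results to categories 1-11 (assumed boundaries).

    games_rank: finish at the prior Games, or None if they did not compete.
    regional_world_rank: worldwide rank at prior Regionals, or None.
    open_world_rank: worldwide rank in the current Open.
    """
    if games_rank is not None:
        if games_rank <= 15:
            return 1
        # Games athletes normally qualified through Regionals; a missing
        # regional rank falls into the weaker bucket here (an assumption).
        if regional_world_rank is not None and regional_world_rank <= 40:
            return 2
        return 3
    if regional_world_rank is not None:
        if regional_world_rank <= 50:
            return 4 if open_world_rank <= 100 else 5
        if regional_world_rank <= 100:
            return 6
        return 7 if open_world_rank <= 250 else 8
    # Did not compete at prior Regionals: only the current Open is available.
    if open_world_rank <= 75:
        return 9
    return 10 if open_world_rank <= 150 else 11
```

Manual adjustments for elite athletes would sit on top of this as simple overrides of the returned category.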
Last year, my predictions weren't bad, but they generally overestimated the chances for athletes on the low and high ends and underestimated the chances for athletes in the middle. Consider:
  • Athletes predicted 0-10% - 2.5% expected to qualify, 0.4% qualified
  • Athletes predicted 10-50% - 20.7% expected to qualify, 39.5% qualified
  • Athletes predicted 50-100% - 66.3% expected to qualify, 57.1% qualified
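This comparison is a simple calibration check: bucket athletes by their predicted chance, then compare the average prediction in each bucket ("expected to qualify") with the share who actually qualified. A minimal Python sketch, assuming "expected" is the mean predicted probability within the bucket and that buckets are half-open (so exactly 10% lands in the 10-50% group); the function name and inputs are hypothetical.

```python
def calibration_table(predictions, outcomes, edges=(0.0, 0.10, 0.50, 1.0)):
    """predictions: predicted qualifying probabilities (0-1).
    outcomes: matching booleans, True if the athlete actually qualified.
    Returns (low, high, count, mean_predicted, observed_rate) per bucket.
    """
    rows = []
    for low, high in zip(edges, edges[1:]):
        # low <= p < high, except the top bucket also includes p == 1.0
        members = [(p, q) for p, q in zip(predictions, outcomes)
                   if low <= p < high or (high == edges[-1] and p == high)]
        if not members:
            continue  # skip empty buckets rather than divide by zero
        expected = sum(p for p, _ in members) / len(members)
        observed = sum(q for _, q in members) / len(members)
        rows.append((low, high, len(members), expected, observed))
    return rows
```

A well-calibrated model would show `expected` and `observed` close together in every row; last year's numbers show the middle bucket running well above its prediction and the outer buckets below.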
I have more data to train the model this year, which should help calibrate things a little better, and I was a bit more liberal in applying manual adjustments to some elite athletes. For instance, Julie Foucher did not compete last year, but I treated her as if she had finished in the top 15 at the Games. Since she is a three-time top-5 finisher, I think this is only fair. Other athletes for whom I made at least some adjustment included Rich Froning, Samantha Briggs, Annie Thorisdottir, Frederik Aegidius and Camille Leblanc-Bazinet.

So with that in mind, below are my predictions for week 2 (athletes with less than a 5% chance are not shown). As always, keep in mind that this is all in fun and based purely on the numbers. I'm not making any judgment about the effort these athletes have put in; I'm simply reflecting how athletes in similar situations have performed in the past. Enjoy week 2, everyone!

*Note that Canada East only has two qualifying spots. All other regions this week have three.


  1. Sidell should be a massive outlier. I'm rooting for Amanda Allen to get back into the individual competition. Great performance by Brandon Swan just now.

    1. Whoa, celebrity sighting! Great work on 14.5, it was cool to see that on the Games site.

      But yes, I also noticed that the chances for Sidell are probably understated. I considered adjusting her upwards, but I really wanted to be judicious about making any changes to the model. The athletes I adjusted are all multi-time Games qualifiers. As good as Sidell has looked at times, she has still completed only one HQ-sanctioned competition (the 2014 Open), and she was only 10th in her region. She may well have been coasting through the Open, but this is also an extremely tough region. I'd be hard-pressed to bet against any of the top 4 there.

    2. Thanks Anders! What an honour to be on the Games site! Having my little video be the talk of the box when we got back from Bhutan took the sting out of not being able to finish the Open this year.

      I was really expecting Sidell to do what Julie Foucher did. It was quite a sad weekend watching some of my favourites bow out: Amanda Allen came so close, and then there were Graeme Holmberg, Sam, Ruth, Katrin and Frederik Aegidius.

      Will you do predictions for this weekend? I was doing hang snatches and talking strategy with Shingo Moromosa yesterday in open gym, the event choices by Castro are perfect for him. Prior to the announcement he'd climbed a mountain on his hands in 4 hours in training! The team I was on didn't qualify this year, but I'll be cheering on our main team, Shingo and Moe from our gym.

    3. Yes, I was planning to put out predictions for week 3 on Thursday evening, EST. Keep in mind that my predictions don't account for this year's specific Regionals programming; they only look at each athlete's results in this year's Open, last year's Games and last year's Regionals.

      That being said, there's no doubt that a strong handstand walk can go a long way this year. But probably more important, a BAD handstand walk can all but eliminate you.

  2. Danielle Sidell (who I really thought would have a "breakout" performance) and Briggs have fallen victim to the handstand walk.

  3. Anders, I just came across your blog. It's fascinating. Thanks for making all of this work available to us. Small request: would you mind enabling the email-alert function for the blog, so we can sign up to receive notifications for new posts?

    1. I believe if you follow me on Google+ it should notify you. I plan on posting again this Thursday, so if you try that and don't get an alert, let me know.

      (I'm admittedly pretty bad about knowing exactly how all that stuff works, though. One day I'll get my act together and learn more about it.)