Thursday, May 29, 2014

Regional Predictions, Week 4

Although last week's regional competitions had their share of drama, I'll admit it felt like a bit of a letdown after the amazing weekend prior. Thankfully, this fourth and final weekend looks like it has some fantastic stuff in store. I mean, how could Northern California's men's competition not be insane? We have seven former Games competitors vying for three spots, including three men who finished in the top 10 at the Games last year. Like we saw in the Central East, there will be some men not heading to the Games that probably would have gone in virtually any other region in the world.

With that in mind, let's get to some assorted topics before we move on to the predictions for week 4:
  • Taken in a vacuum, I don't have a problem with Dave Castro's statement (speaking for HQ I presume) that there will not be any wild cards given out this year. However, in the context of this season, I'm not a fan.
    • Why even announce that wild card spots will be available (which they did earlier this year) if you are going to rule out that possibility before the Regionals have even finished? I cannot conceive of a scenario where wild cards would make more sense than they do for Sam Briggs this year. She is the reigning Fittest Woman on Earth, she had a single bad event in one of the most volatile events ever programmed at Regionals (1-attempt handstand walk) and she still finished fourth in a stacked region. If you're not going to use a wild card in that situation, then you're never going to use it.
    • Talent is so clearly bunched in a few regions (and has been for a few years). I can understand the argument that the regionals are set up with a limited number of spots in each region to increase drama and make things more exciting. However, I find it difficult to accept the argument that this system is ideal for finding the fittest athletes in the world. I get that cross-regional comparisons are not perfect, but I challenge anyone to argue that Graham Holmberg (4th in Central East) is not among the 40 best CrossFitters in the world. As it stands now, he is ranked ahead of the champions from 9 other regions. For Castro to argue that "the right athletes" are going to the Games seems a bit disingenuous. If you're just setting it up this way for drama, that's fine, but let's just call it what it is.
  • Although we have one week to go, the data from across all regions has allowed me to get a sneak peek at some interesting things from this year's regionals.
    • In terms of correlation with success across all Regional and Open events, it appears that events 3 and 7 are the top events at this point. I'll admit when I was wrong, and I was wrong on event 7. The top athletes are all crushing it, and it is damn exciting. Event 3 is a bit surprising, but again, look at the athletes who are doing well there, and they're usually dominating across the board.
    • On the other end of the spectrum, event 5 for the men actually has the lowest correlation with overall success. My guess here is that this is the one event this season that truly favors taller athletes, and so you are seeing some athletes with huge performances who otherwise are struggling. For the women, this event is not so bad, mainly because there are no athletes jumping 10-11 feet in the air and getting to the top of the rope in a couple pulls.
    • Not surprisingly, the two single-modality events (1 and 2) are among the least correlated with overall success for both men and women. Event 2 is slightly worse than event 1, but not by as much as you might think.
    • Events 4 and 6 are kind of middling in this respect. I expected event 6 to really bring out the top all-around athletes, but it might just be so grueling that it heavily favors the endurance specialists.
    • If we look at Open events in this context, 14.3 has the lowest correlation with overall success among Regional athletes (as it did for the entire Open field). On the other side, 14.4 had the highest correlation with overall success among Regional athletes (as it did for the entire Open field). In fact, it is basically neck-and-neck with Regional event 3 for the top spot across all events this season.
    • Some have suggested that results in the handstand walk might be correlated with success in event 4 (which has tons of handstand push-ups). It doesn't appear that way; ranks on those two events are not particularly correlated (52% for men, 44% for women - both of those figures are middle of the road in this respect). The only combination of events that really stands out is events 1 and 7, which were 77% correlated for women and 68% correlated for men.
  • Last week I posted a chart and some statistics regarding the accuracy of my predictions (I should note that these are after removing athletes who withdrew prior to event 1). After week 3, the calibration plot looks about the same, but the mean-square error has dropped from 4.38% to 3.93%. For reference, last year's model was 4.43% and a model giving each athlete an equal chance would be about 6.40%. Below is the calibration plot (read last week's post for an explanation):

Alrighty... with all that out of the way, let's get on to the predictions. This week, the only athlete for whom I made a manual adjustment to the model was Jason Khalipa. This year's events might not really favor him, but the guy has been so freaking consistent over the past 6 years that I felt he warranted special consideration.

With that said, here you go. Enjoy the final week of Regionals, everyone!

[Update 5/31: I've made a couple fixes to account for women's name changes since last year, as well as making the adjustment for Andrea Ager that I suggested in the comments a couple nights ago. I treated her as if she did not compete at Regionals last year, rather than as if she finished very low. Her low finish was due to a DQ in the OHS event, not due to a poor performance overall.]

Note that Africa only has one qualifying spot. All other regions this week have three.

Also note that the pictures look prettier this week because I'm posting from a Mac. Excel is terrible on a Mac, but at least it exports nicely to pictures.

Thursday, May 22, 2014

Regional Predictions, Week 3

I've said before that I have no doubt the CrossFit Games is a totally viable spectator sport. My opinion on the matter hasn't changed, but I'm beginning to think that it's the Regionals that should be on ESPN. And I doubt anyone who followed the coverage on the Games site this past weekend would disagree with me.

There was high drama all weekend, and with 10 different competitions to follow at various time zones across the globe, events were broadcast basically non-stop. We had the reigning fittest woman on Earth fighting just to make the Games in Europe, possibly the most competitive men's competition ever in Central East, a changing of the guard in Australia and the first real challenge to Camille and Michelle Letendre's dominance in Canada East. So before we move on to predictions for this week, let's get to some quick thoughts on week 2:
  • The handstand walk claimed another victim this week in Sam Briggs. I'm sure HQ would never admit that the programming was anything less than perfect, but anyone following that competition knows that Briggs was one of the three fittest women there, and likely the fittest. She dominated events 3-6 and placed decently on the two heavier workouts (1 and 7).
  • That being said, she will almost assuredly get a special invite, and the three women who did qualify certainly deserved it. Along with Southern California, Europe looks to be one of the two strongest women's regions in the world (Australia also looked sneaky-tough last weekend).
  • I certainly hope that HQ also extends a special invite to Graham Holmberg. We'll have to wait to see how the rest of the competitions play out, but I'd bet he would have qualified in every other region (and likely won many of them). He came through when it counted most, smashing the event record in event 7, but unfortunately, four other men in his region also beat the event record (including third-place Will Moorad). He was also the only man in that region other than Rich Froning to take an outright first place, and he did it twice.
  • Super-impressive performance by Moorad to grab a spot in the Central East. As disappointed as I was to see Graham fall off, it is nice to see someone else break through in that region.
  • If the Games were programmed like the Regionals, Camille Leblanc-Bazinet might be the most dominant athlete in the sport. If the event has barbells and gymnastics, she's basically guaranteed a spot in the top 10 in the world. Hopefully she can fare a bit better with the more unorthodox Games events this year.
  • While I'm not convinced that event 7 is as good a test of fitness as, say, event 4 or event 6, I can't dispute that it is great for the viewers. Those 8 overhead squats have derailed more than one athlete's shot at the Games and given several others the chance to make up ground in dramatic fashion.
  • Event 6 looks like an absolute beast of a workout. Unlike prior years, I haven't been able to test out the workouts (due to a back injury), but I don't recall seeing Rich Froning that gassed in a workout since the rope climb/sled push workout at the 2012 Games. 
So how did my predictions do last week? Well, despite a couple of shockers, things actually went pretty well. The charts below show how well the predictions were calibrated this year compared to last year. The athletes were bucketed based on their predicted rank, and each bucket was plotted as a point on the blue line. The location on the x-axis represents my predicted chances of qualifying and the location on the y-axis represents the actual chances of qualifying. The red line represents perfect predictions, so when the blue line is below the red line, my predictions over-estimated the chances of qualifying for those athletes. As you can see, this year, the blue line tracks much more closely to the perfect predictions.

Of course, calibration only tells half the story. We could easily create a well-calibrated model by predicting an even chance for each athlete. If there are 30 athletes in a region with 3 qualifying spots, we could estimate that each athlete has a 10% chance of qualifying, and indeed, we would be right in some sense. It is certain that 10% of them will qualify. But we are also trying to be accurate. A perfect model would predict 100% chances for the 3 athletes that do qualify and 0% for all others. To measure our accuracy, we can calculate the mean square error across all our athletes. In that respect, I did about as well as last year, and considerably better than the perfectly calibrated model with even predictions for each athlete.
  • 2014 Week 2, CFG Analysis Predictions - 4.38%
  • 2013 Weeks 3-4, CFG Analysis Predictions - 4.43%
  • 2013-2014 Equal Chance Predictions - 6.40%
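For anyone who wants to see the arithmetic, here is a minimal Python sketch of the mean square error calculation described above. The 30-athlete region, the "sharper" forecast and the function name are all made up for illustration; none of it is the actual prediction data.

```python
def mean_square_error(predicted, qualified):
    """Average squared gap between each predicted chance and the outcome
    (1 = qualified, 0 = did not qualify)."""
    return sum((p - q) ** 2 for p, q in zip(predicted, qualified)) / len(predicted)

# Hypothetical region: 30 athletes, 3 qualifying spots.
outcomes = [1] * 3 + [0] * 27

# Equal-chance model: every athlete gets 3/30 = 10%.
equal_chance = [0.10] * 30

# A sharper (made-up) forecast: 60% for the eventual qualifiers,
# 20% for three near-misses and 2.5% for everyone else.
sharper = [0.60] * 3 + [0.20] * 3 + [0.025] * 24

# A perfect forecast: 100% for the qualifiers, 0% for all others.
perfect = [1.0] * 3 + [0.0] * 27

print(f"equal chance: {mean_square_error(equal_chance, outcomes):.4f}")
print(f"sharper:      {mean_square_error(sharper, outcomes):.4f}")
print(f"perfect:      {mean_square_error(perfect, outcomes):.4f}")
```

In this toy region the equal-chance model scores 0.09 (9%), the perfect model scores 0, and any forecast that puts more weight on the eventual qualifiers lands somewhere in between. Lower is better.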
So, with that in mind, I really didn't make any changes to the model this week. The only region where I had to deviate significantly was for the women in Asia. There are no returning Games qualifiers, hardly any returning regional competitors and just a handful of athletes in the top 2000 in the Open worldwide. My default model would have basically given all athletes the same chance, so I modified it as best I could, but the predictions are still pretty weak in that region.
[UPDATE 5/23/2014: In the Asia region, I just realized Candice Ford was Candice Howe last year, which meant I originally assumed she did not compete at the regionals last year. The predictions below are now fixed to account for this.]

Anyway, without further ado, here are my week 3 predictions. Enjoy the Regionals everyone!

Note that Asia only has one qualifying spot. All other regions this week have three.

Thursday, May 15, 2014

Regionals Predictions, Week 2

Welcome back, everyone. The first week of the 2014 Regionals is in the books, and in many ways, I think it played out like I expected. The handstand walk derailed one heavy favorite (Stacie Tovar), all of the athletes I picked to fare well did (Lucas Parker, Elizabeth Akinwale, Talayna Fortunato) and the weekend as a whole seemed to favor athletes with strong gymnastic abilities and pure strength, rather than those with the biggest engines. I was pleasantly surprised by event 7, which appeared to be more balanced than I expected and provided some exciting shake-ups in a few regions.

This week looks to be one of the most (if not the most) exciting weeks of the Regional schedule. We all know that the Central East men's region is the toughest in the world, but the European women's region and the Central East women's region are also super-competitive this year. It's also a little intriguing to see which "records" will fall now that the bar has been set on all the events. Personally, I think every single one of the men's records will go down (potentially all in the Central East?) and many of the women's records will go down (doubtful that Akinwale's records in events 1 and 2 will fall).

With that in mind, let's get down to the business at hand. Last week, I made a bunch of excuses for why I wasn't able to get formal predictions finished in time, but this week I was able to make it happen. I've been able to estimate the odds of qualifying for each athlete in all 10 competitions, and at the bottom of this post I've shown the odds for the top contenders in each region.

But first, here's a recap of the methodology, which is largely similar to what was done last year:
  • Learn from prior results
    • Separate the 2013 Regional competitors into various categories, based on their performance in the 2013 Open, 2012 Games and 2012 Regionals.
    • See how frequently athletes in each category posted a very high (top 20 worldwide) or relatively high (20-50 worldwide) regional performance in 2013 (based on the cross-regional rankings last year).
    • Repeat the first two steps one year further back. Combine the results with what I came up with in the first two steps. This helped me get a bigger sample size and hopefully improve the predictions.
  • Apply learnings to what we know this year to make predictions
    • For each athlete this year, place them in one of the categories based on their performance in the 2014 Open, 2013 Games and 2013 Regionals.
    • Depending on their category, randomly generate a worldwide ranking for each athlete this year. The category affects this randomized worldwide ranking, i.e. those who had better results in the past year will generally get a better randomized worldwide ranking this year.
    • Re-rank all athletes within a region based on these randomized worldwide rankings.
    • Repeat 200 times and see how often each athlete qualifies for the 2014 Games.
Now, for those so inclined, here are a few details on this process:
  • To get a large enough sample size to build this model, I combined men and women.
  • The process for creating the categories of competitors was not straightforward. There was quite a bit of judgment on my part to make sure that each category had sufficient athletes to be credible and that the categories produced results that made sense with each other. For instance, I wanted to separate out the top 2012 Games competitors (I chose top 15), but that meant I could not further break those athletes down based on 2012 regional rank, because there just would not be enough athletes there to get a credible sample.
  • The process for randomly generating the numbers is as follows:
    • Generate a uniform random number between 0 and 1 (=rand() in Excel). If this number is lower than the athlete's chance of finishing in the top 20, assign him or her to the top 20. If not, then if it is lower than the athlete's chances of finishing in the top 50, assign him or her to be between 20-50. Otherwise, the athlete is assigned to be between 50 and 100.
    • Once we have assigned the athlete to a range of ranks, generate another uniform random number between 0 and 1. Multiply this by 20 to get the athlete's exact place within the range (multiply by 30 if they are in the 20-50 range or multiply by 50 if they are in the 50-100 range). Generally, you'll need to be in the top 50 worldwide to qualify, but depending on how other athletes fare, it's possible to end up in the 50-100 range and still be in the top 3.
  • Here is a listing of the categories I used to break down the athletes:
    • Top 15 at prior Games
    • Below 15 at prior Games, top 40 worldwide at prior Regionals
    • Below 15 at prior Games, below 40 worldwide at prior Regionals
    • Did not make prior Games, top 50 worldwide at prior Regionals, top 100 in current Open
    • Did not make prior Games, top 50 worldwide at prior Regionals, below 100 in current Open
    • Did not make prior Games, 50-100 worldwide at prior Regionals
    • Did not make prior Games, below 100 worldwide at prior Regionals, top 250 in current Open
    • Did not make prior Games, below 100 worldwide at prior Regionals, below 250 in current Open
    • Did not make prior Games, did not compete at prior Regionals, top 75 in current Open
    • Did not make prior Games, did not compete at prior Regionals, 75-150 in current Open
    • Did not make prior Games, did not compete at prior Regionals, below 150 in current Open
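For those who think better in code than in Excel, the simulation steps above can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the actual spreadsheet: the athlete names and the top-20/top-50 probabilities are invented, the function names are hypothetical, and I am assuming the "place within the range" is added to the bottom of the range (so a 20-50 athlete lands at 20 plus up to 30).

```python
import random

def simulate_rank(p_top20, p_top50, rng):
    """Draw one randomized worldwide rank.

    p_top20: chance of a top-20 worldwide finish.
    p_top50: chance of a top-50 worldwide finish (includes the top 20).
    """
    u = rng.random()
    if u < p_top20:
        return rng.random() * 20       # somewhere in the top 20
    if u < p_top50:
        return 20 + rng.random() * 30  # somewhere in 20-50 (offset is an assumption)
    return 50 + rng.random() * 50      # somewhere in 50-100 (offset is an assumption)

def qualifying_odds(athletes, spots=3, trials=200, seed=0):
    """Estimate how often each athlete finishes in the top `spots` of the region."""
    rng = random.Random(seed)
    counts = {name: 0 for name in athletes}
    for _ in range(trials):
        # One randomized worldwide rank per athlete, then re-rank within the region.
        ranks = {name: simulate_rank(p20, p50, rng)
                 for name, (p20, p50) in athletes.items()}
        for name in sorted(ranks, key=ranks.get)[:spots]:
            counts[name] += 1
    return {name: counts[name] / trials for name in athletes}

# Made-up region: name -> (chance of top 20, chance of top 50).
region = {
    "Athlete A": (0.60, 0.90),
    "Athlete B": (0.30, 0.70),
    "Athlete C": (0.10, 0.40),
    "Athlete D": (0.05, 0.25),
    "Athlete E": (0.00, 0.10),
}

odds = qualifying_odds(region)
for name, chance in odds.items():
    print(f"{name}: {chance:.0%}")
```

In the real model, the two probabilities would come from the athlete's category rather than being typed in by hand. With only 200 trials the estimates are fairly noisy, so a different seed will move the percentages around a bit.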
Last year, my predictions weren't bad, but they generally overestimated the chances for the athletes on the low end and high end and underestimated the chances for the athletes in the middle. Consider:
  • Athletes predicted 0-10% - 2.5% expected to qualify, 0.4% qualified
  • Athletes predicted 10-50% - 20.7% expected to qualify, 39.5% qualified
  • Athletes predicted 50-100% - 66.3% expected to qualify, 57.1% qualified
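A bucket comparison like the one above can be put together with a few lines of Python. This is only a sketch: the predictions and outcomes below are invented, the function name is hypothetical, and how athletes sitting exactly on a bucket boundary are handled is my assumption.

```python
def calibration_table(predicted, qualified, edges=(0.0, 0.10, 0.50, 1.0)):
    """For each bucket of predicted chances, return
    (low, high, number of athletes, mean predicted chance, actual qualifying rate)."""
    rows = []
    last_index = len(edges) - 2
    for i, (low, high) in enumerate(zip(edges, edges[1:])):
        # Buckets are [low, high), except the last one, which also includes `high`.
        members = [(p, q) for p, q in zip(predicted, qualified)
                   if low <= p < high or (i == last_index and p == high)]
        if members:
            n = len(members)
            expected = sum(p for p, _ in members) / n
            actual = sum(q for _, q in members) / n
            rows.append((low, high, n, expected, actual))
    return rows

# Made-up predictions and outcomes (1 = qualified, 0 = did not qualify).
predicted = [0.02, 0.05, 0.08, 0.15, 0.30, 0.45, 0.60, 0.85]
qualified = [0, 0, 0, 0, 1, 0, 1, 1]

rows = calibration_table(predicted, qualified)
for low, high, n, expected, actual in rows:
    print(f"{low:.0%}-{high:.0%}: {n} athletes, "
          f"{expected:.1%} expected to qualify, {actual:.1%} qualified")
```

When the expected and actual columns sit close together in every bucket, the model is well calibrated. A gap in one direction across a bucket is the kind of over- or under-estimation listed above.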
I have more data to train the model this year, which should help to calibrate things a little better, and I was a bit more liberal in applying some manual adjustments to some elite athletes. For instance, Julie Foucher did not compete last year, but I treated her as if she had finished in the top 15 in the Games. As a three-time top 5 athlete, I think this is only fair. Other athletes for whom I made at least some adjustment included Rich Froning, Samantha Briggs, Annie Thorisdottir, Frederik Aegidius and Camille Leblanc-Bazinet.

So with that in mind, below are my predictions for week 2 (athletes with less than 5% chance are not shown). As always, keep in mind that this is all in fun, and it's all simply based on the numbers. I'm not making any sort of judgment about the effort these athletes have put in, I'm simply reflecting how athletes in similar situations have performed in the past. Enjoy week 2, everyone!

*Note that Canada East only has two qualifying spots. All other regions this week have three.

Thursday, May 8, 2014

Quick Regional Programming Thoughts

As much as I was hoping to have my regional predictions set up to go this week, I wasn't able to make that happen. A confluence of events over the last month - a 5-hour actuarial exam, a trip out of town to have my son baptized, my wife's birthday and numerous trips to the chiropractor/ART to try to fix some back issues that flared up after the Open - pretty much made that impossible for me. By next week, I do expect to be able to produce my stochastic regional predictions (like I did for the final two weeks of last year's season). But for this week, your guess is as good as mine about who will claim the first 24 spots at the Games.

That being said, I did have time to do a bit of work assessing the programming for this year's Regionals. Here are my thoughts:
  • Overall, I like the programming. In particular, events 3-6 seem to me to be well-balanced workouts that should be fun to watch.
  • Events 1 and 2 introduce a ton of volatility into the situation. Only getting 3 attempts on the hang snatch means we could see some top athletes get burned by taking a gamble on those 2nd and 3rd lifts. And a max handstand walk on a single attempt gives plenty of opportunity for a catastrophic failure that could cost an otherwise fit athlete a shot at the Games. I don't think this event really helps us find the fittest athletes, and it may end up preventing some really stellar athletes from making it.
  • Event 7 is really a wait-and-see event for me. It seems like a really weird design for a workout to have just 8 reps of the overhead squat, since it doesn't seem like it gives enough time for athletes to make up ground on that movement. But hopefully I'm wrong and this event turns out to be a better test of fitness than it appears to be on paper, especially considering it's the finale.
  • This year's Regionals are in some ways the heaviest to date and in some ways the lightest. When weights are involved, the average relative load (1.55 men, 0.98 women) is the highest in the past four years. However, the programming is only 37% lifting, the smallest percentage of any Regionals. In fact, the 2011 Games is the only HQ competition with a lower percentage (34%).
  • When you combine those two factors, you get a load-based emphasis on lifting (LBEL) of 0.58 for men and 0.36 for women, both slightly lower than 2011 and 2013 and much lower than 2012. The chart below shows the progression of each of these metrics at the Regionals since 2011.

  • All-in-all, I believe this year's Regionals look more like the Games than in any past year. And what that means to me is that the emphasis is on strength, both in terms of weightlifting and very challenging bodyweight movements (like legless rope climbs or strict handstand push-ups). Don't get me wrong, you can't do well here without a high level of conditioning, but you will be punished much harder for lacking in strength.
  • Look for handstand push-ups and legless rope climbs to completely decimate the women's leaderboard. With the legless rope climbs, we saw how much variability there was at the Games last year. As far as handstand push-ups, keep in mind that before kipping became common (think 2011 and earlier), this was an extremely difficult movement for many top women, even at the Games.
  • With that in mind, here are some athletes that should do well with this programming: Lucas Parker, Josh Bridges, Lacee Kovacs, Chris Spealler, Matthew Fraser, Elizabeth Akinwale, Talayna Fortunato, Annie Thorisdottir, Camille Leblanc-Bazinet.
  • Beyond those names, all the podium athletes from last year's Games should do fine. I'm just not sure they will fare any better because of this programming.
That's it for today. I hope you all enjoy the opening weekend of Regionals, and I'll see you again next week.