CFG Analysis: July 2013

Friday, July 26, 2013

After Day 1, Is Rich Froning Still The Favorite?

Anyone who has been following the CrossFit Games for the past few years probably knows that the results after the first couple events generally don't really look a whole lot like the results at the end of the weekend. For one, there are simply a lot of events left to shake things up. This year, it appears we have at least 8 left, but I'm guessing more. But also, the early events have typically involved some atypical CrossFit movements, particularly swimming. The best swimmers have had a big advantage in the early events in the past few years, but the best swimmers aren't necessarily the best CrossFitters, so they often fall off over the course of the weekend.

Still, if you're making predictions right now (and you can make them up until the first Friday event, in fact, at the contest at switchcrossfit.com), you can't simply ignore the results from Wednesday. Those points are in the bank, and guys like Dan Bailey (currently 34th) now have a lot of ground to make up if they want to make it back into contention. My stochastic projections prior to the Games had Bailey picked very high, but how high would I pick him right now? And what about Rich Froning, who was a heavy favorite coming in but is currently in 6th?

Well, I took a couple hours to look into this. What I did was pretty simple: I re-ran my stochastic projections, but I replaced three of the random events with the actual results from Wednesday. The events I replaced were the random event based on last year's "long event" and 2 events based on this year's Regionals. My model still assumes 15 scored events, so we have 10 left that are based on this year's Regionals and 2 left that are based on this year's Open. If we assumed this year will have fewer than 15 events, the results would be a little bit different: the current leaders would have a bigger advantage. But I think there are still a lot of points left on the table.

To keep things short, I'm not going to reproduce the entire table here for men and women. Rather, I'll give a quick recap of the current favorites, as well as some of the biggest movers after day 1.

Men
Favorite: Rich Froning, 51% chance. Froning dropped from a 58% chance prior to the Games but still is close enough to be considered the favorite in the long run.
Biggest contender: Jason Khalipa, 34% chance. Khalipa was already in the discussion, but his dominant performance on day 1 moved him up from a 7% chance coming in.
Others still with a strong shot: Scott Panchik, 7%; Josh Bridges, 5%. Both lost some ground on the rowing events. For those who read my methodology, you'll recall Panchik and Bridges were expected to do well on day 1 because of strong showings on the long events in past years.
Other notes: Dan Bailey dropped from a 1.7% shot to a 0.6% shot after a rough day 1, and Ben Smith fell from a 3.6% chance to a 1.0% chance. Even the guys like Garrett Fisher, Chad Mackay and Justin Allen who did really well on day 1 are still pretty big longshots based on their Regional performances. The fact that the leader is Jason Khalipa doesn't make it any easier for them to make up ground. However, I do now have Fisher (currently 2nd overall) with a 7% chance at the podium, up from 1% coming in.

Women
Favorite: Sam Briggs, 66% chance. She was the favorite coming in at 32%, and with a lead, she's got to be an even bigger favorite. She doesn't have a lot of holes in her game, but there are still a lot of unknown events that could shake things up.
Biggest contender: Lindsey Valenzuela, 8% chance. She had a strong day 1 and is always a threat to win some of the heavier events. She moved up from about a 5% shot coming into the Games.
Others still with a strong shot: Kaleena Ladeairous, 6%; Rebecca Voigt, 7%; Elizabeth Akinwale, 4%; Talayna Fortunato, 3%. Ladeairous and Fortunato moved into the mix with good showings on day 1. Akinwale was a big contender coming in but now has a good deal of ground to make up sitting in 20th.
Other notes: Camille Leblanc-Bazinet dropped from a 9% chance to just a 2% chance now that she's back in 28th. There are three other athletes still with at least a 1% chance: Michelle Letendre (2.4%), Alessandra Pichelli (2.3%) and Kara Webb (2.0%). Rory Zambard, relatively unknown coming in, is at a 0.3% chance of claiming the title, after a very solid day 1.

It's still early, and the key for the top athletes is just to keep themselves within striking distance. Nobody is truly out of it at this stage, but a few athletes certainly made things a bit harder on themselves, while others gave themselves a real shot.

I'll be in California watching the Games in person for the next few days, and I don't plan to post anything else until I get back into town next week. Until then, enjoy the Games everyone!

Monday, July 15, 2013

So Who CAN Win the 2013 CrossFit Games - Predictions

Just a few quick notes before getting to the picks:

These picks are based almost entirely off the results from this season, and thus the order will be similar (but not identical) to the Cross Regional Comparison found at http://crossfitregionalshowdown.com/leaderboard/men.
There are some Games veterans, like Matt Chan for instance, whose odds probably look lower than some would expect. That's because last year's Games played only a very minor role in these projections. Although there are some athletes for whom we could probably make an exception, I think that in general, the results from Regionals this season are the best predictors of what will happen at the Games this season. Regionals are competitive enough now that I doubt many athletes were holding much back.
For full methodology, see the previous post. The general idea is to use the results from the events that have occurred this season and simulate Games events that would be similar to them.
These are rounded to the nearest 1%, so some athletes listed with a 0% chance may actually have a non-zero chance according to the model, but that chance is less than 0.5%. For instance, virtually everyone had a non-zero chance of finishing in the top 10. The list is sorted by chance of winning, prior to rounding, and in case of ties, it is sorted by average finish.
This is all in good fun, so don't take it too seriously if your favorite athlete doesn't appear as highly as you'd like. I'm well aware that this model isn't perfect, but my goal is to make the best predictions I can with the data we have available. There's plenty going on behind the scenes for each of these athletes and plenty of other variables that I simply can't capture.
I'm curious to hear who you guys are picking this year. I think it should be a blast to see how things play out given the level of competition we've already seen this season. Post to comments or shoot me an email to let me know your take.

OK, without further ado, here are the picks. For each athlete, I have the estimated chance of him/her winning, placing in the top 3 (podium) and placing in the top 10 (money), along with the average ranking he/she attained across all simulations.

So Who CAN Win the 2013 CrossFit Games - Methodology

In some ways, it seems like making predictions about the CrossFit Games should be relatively easy. After all, we have plenty of data to make direct comparisons between athletes. So far this season, these athletes all have completed the same 13 events. By this point, it seems like the cream should have risen to the top. Remember, the 2007 Games had only 3 events and the 2008 Games had only 4 events. With 13 events already, shouldn't the champion be fairly clear?

Of course, what we've seen is that competition has gotten much tighter in recent years. In 2008, there were only a handful of athletes of the caliber to even think about contending for the title. This year, if we compare the Games athletes, 14 different male athletes and 14 different female athletes have finished in the top 3 of at least one workout. So nearly a third of the field has shown the capability to be close to the best in the world on a given workout.

So, obviously, the complicating factor with predicting the Games is that we don't know what the workouts will be. And even if we knew what they were (in fact, we likely will know some of the events within the next week or so), we can pretty much guarantee that they won't match any of the 13 workouts we've seen thus far. So what can we do?

Last year, I estimated the odds of each athlete winning the Games by randomly selecting 10 events from among the Regional and Open events that had occurred. As I looked back on that methodology, I noticed that it really only gave a small number of athletes a chance at winning or even placing in the top 3. The reason is that I implicitly assumed that each event of the Games would exactly mirror one of the prior events of the season. After some investigation, it turned out that most of the events from the Games did not match any one event from the Regionals or Open particularly closely.

Of the 20 events from the 2012 Games prior to cuts (10 men's events + 10 women's events), I looked at the correlation between that event and each Regional and Open event.
For each of those Games events, I took the maximum of those correlations.
3 of 20 were at least 60% correlated with one Regional or Open event.
10 of 20 were at least 50% correlated with one Regional or Open event.
5 of 20 were not more than 30% correlated with any Regional or Open event.
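
For anyone who wants to try the same check, here is a minimal sketch of the "best match" calculation described above. It assumes the correlation is taken on placements, which the notes above don't specify, and every event name and placement in it is made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def best_match(games_places, season_events):
    """Return (name, correlation) of the season event most correlated with one Games event."""
    correlations = {
        name: pearson(games_places, places)
        for name, places in season_events.items()
    }
    best = max(correlations, key=correlations.get)
    return best, correlations[best]

# Hypothetical placements for the same five athletes (not real results).
games_event = [1, 2, 3, 4, 5]
season_events = {
    "Regional Event 1": [2, 1, 3, 5, 4],
    "Regional Event 4": [5, 4, 3, 2, 1],
    "Open Event 1": [1, 3, 2, 4, 5],
}
name, corr = best_match(games_event, season_events)
print(name, round(corr, 2))
```

Running `best_match` once per Games event and counting how many maxima clear 60%, 50% or 30% would give a tally in the same form as the one above.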

Certainly, this variation is due largely to the design of the workouts in the Games vs. the Regionals and the Open. But I also think part of it is due to the fact that the Games are simply a different competition than the Regionals and the Open. Athletes come in at varying levels of health, with varying levels of nerves, and so even if the events were identical to Regionals, I think we'd have different results.

Either way, I felt that in estimating the chances for each athlete this year, I needed to account for how much variation we have seen from the Regionals/Open to the Games. I needed to simulate the Games using results that weren't identical to the Regionals/Open but were correlated. I also wanted to rely primarily on the Regional results, since we know that some top athletes tend to coast through the Open while others take it a bit more seriously. Still, I did include the Open results to a lesser extent, because I don't think it's fair to ignore them entirely, as they provide insight into how athletes fare in events that are generally lighter than what we see at the Regional level.

Additionally, we know that historically, the Games has typically included at least one extremely long event (Pendleton 2, for instance). This event is generally very loosely correlated with anything at Regionals or in the Open. But we can assume that athletes who did well on the "long" event the prior year will likely do well on the long event this year.

So I set up a simulation of 15 events, assuming no cuts (all athletes compete in all 15 events). Here is a description of how each event was simulated:

For 12 events, I randomly chose one of the Regional events to be the "base" event.
I started with the results (not the placement, the actual score) from that base event, then "shook up" those results enough so we'd get new rankings that were roughly 50% correlated to the base event.

To "shake up" the original results, I adjusted each athlete's original result randomly up or down. Exactly how much I allowed the result to vary depended on how much variation was involved in that event to begin with. So if Regional Event 4 was the base event, I might let the scores vary by 3 minutes, but if Regional Event 1 was the base event, they might vary by only 1 minute.
I did testing in advance to see how much I needed to vary each individual's score to achieve about 50% correlation. It turned out to be about +/- 2.5 standard deviations. So each athlete's score could move from his/her original score by as much as 2.5 standard deviations in each direction.
The athletes scoring well in the base event still have an advantage, but we allow things to shift around a bit.

For 2 events, I used the same process, but I randomly chose one of the Open events to be the "base" event.
For 1 event, I used the Pendleton 2 results from 2012 as the "base" event. Athletes who didn't compete in the Games last year were assigned a totally random result.

Athletes who did well last year have an advantage, but I did "shake up" the results a bit in each simulation.
Keep in mind that finishing poorly in Pendleton 2 last year was considered worse than not competing at all.
I made two exceptions: Josh Bridges and Sam Briggs missed last year due to an injury but did extremely well on the long beach event in 2011. I treated them as if they had competed in Pendleton 2 and finished very highly.
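
The "shake up" step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the random adjustment is drawn uniformly (the description only says scores move randomly up or down within a limit), the limit is scaled by the event's own standard deviation, and the times are invented. The helper `average_rank_correlation` is a hypothetical name for the kind of advance testing mentioned above.

```python
import random
from math import sqrt
from statistics import pstdev

def shake_up(scores, max_sd=2.5, rng=random):
    """Move each score by a uniform random amount of up to max_sd event standard deviations."""
    spread = pstdev(scores)  # events with more spread get shaken more
    return [s + rng.uniform(-max_sd, max_sd) * spread for s in scores]

def ranks(scores):
    """Placements where the lowest score (fastest time) is 1st."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    places = [0] * len(scores)
    for place, idx in enumerate(order, start=1):
        places[idx] = place
    return places

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def average_rank_correlation(scores, max_sd, trials=500, seed=0):
    """Average correlation between the base placements and the shaken placements."""
    rng = random.Random(seed)
    base = ranks(scores)
    total = 0.0
    for _ in range(trials):
        total += pearson(base, ranks(shake_up(scores, max_sd, rng)))
    return total / trials

# Hypothetical finishing times in seconds for one base event (not real results).
base_times = [300, 312, 325, 331, 340, 356, 362, 380]
for max_sd in (0.5, 1.5, 2.5):
    print(max_sd, round(average_rank_correlation(base_times, max_sd), 2))
```

Trying several values of `max_sd` like this is one way to find the amount of shake that lands near a target correlation. A toy field this small won't necessarily reproduce the 50% figure.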

These events were simulated 5,000 times. The Games Scoring table was applied to determine the final rankings after each simulation.
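
Putting the pieces together, the overall simulation loop might look something like the sketch below. It is simplified on purpose: every event is drawn from a single pool of base events, where the setup above uses 12 Regional-based events, 2 Open-based events and 1 long event. The points table is a made-up stand-in for the actual Games Scoring table, and the athletes and times are hypothetical.

```python
import random
from collections import Counter
from statistics import pstdev

def simulate_event(base_scores, max_sd, rng):
    """Placements (1 = best, lower score wins) after randomly perturbing the base scores."""
    spread = pstdev(base_scores)
    shaken = [s + rng.uniform(-max_sd, max_sd) * spread for s in base_scores]
    order = sorted(range(len(shaken)), key=lambda i: shaken[i])
    places = [0] * len(shaken)
    for place, idx in enumerate(order, start=1):
        places[idx] = place
    return places

def win_probabilities(athletes, base_events, points_table,
                      n_events=15, n_sims=5000, max_sd=2.5, seed=0):
    """Estimate each athlete's chance of winning as the share of simulations won."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(n_sims):
        totals = [0] * len(athletes)
        for _ in range(n_events):
            base = rng.choice(base_events)  # pick a base event at random
            for i, place in enumerate(simulate_event(base, max_sd, rng)):
                totals[i] += points_table[place - 1]
        # Ties on points go to the athlete listed first (a simplification).
        winner = max(range(len(athletes)), key=lambda i: totals[i])
        wins[athletes[winner]] += 1
    return {name: wins[name] / n_sims for name in athletes}

# Hypothetical three-athlete field, times in seconds, and a made-up points table.
athletes = ["Athlete A", "Athlete B", "Athlete C"]
base_events = [[300, 320, 340], [200, 230, 260], [610, 590, 640]]
points_table = [100, 95, 90]
print(win_probabilities(athletes, base_events, points_table, n_sims=1000))
```

Podium and top-10 chances, along with average finish, would come from the same loop by recording each athlete's final placement in every simulation, not just the winner.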

Before applying this method to this year's field, I went back to see what type of estimates I would have gotten last year with this method. Some notes from those simulations:

I looked at how good a job I did at predicting which athletes would finish in the top 10. The mean square error (MSE) of my model would have been 0.104 for men and 0.121 for women. Had I simply assumed the top 10 from Regionals would be top 10 at the Games with 100% probability, the MSE would have been 0.130 for men and 0.133 for women. If I had instead assumed all athletes had an equal shot at finishing in the top 10, the MSE would have been 0.254 for men and 0.259 for women. So I did have an improvement over those naive estimates.
On the men's side, I would have given Rich Froning a 45% chance of winning, with Dan Bailey having the next-best chance at 30%. For the women, I would have given Julie Foucher a 53% chance of winning and Annie Thorisdottir a 22% chance of winning (remember, Foucher was the pick for many in the community last year, including me). No one else would have had more than a 7% chance on the women's side.
For podium spots, I would have given Froning an 86% chance, Chan a 4% chance and Kasperbauer a 2% chance. For women, I would have given Thorisdottir a 61% chance, Foucher an 84% chance and Fortunato a 3% chance. While it would be nice to have given Chan, Kasperbauer and Fortunato a better shot, I don't recall many people talking these athletes up prior to the Games. None had ever reached the podium before, although Chan had been close.
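
For reference, the MSE comparison above works by scoring each athlete's predicted probability against a 1 (made the top 10) or a 0 (didn't), then averaging the squared differences. Here is a small sketch with a hypothetical five-athlete field where two athletes make the cut. None of these numbers come from the actual model.

```python
def mean_squared_error(probabilities, outcomes):
    """Average squared gap between each predicted probability and the 0/1 outcome."""
    pairs = list(zip(probabilities, outcomes))
    return sum((p - (1.0 if hit else 0.0)) ** 2 for p, hit in pairs) / len(pairs)

# Hypothetical field: which athletes actually finished in the top group.
made_cut = [True, True, False, False, False]

model = [0.8, 0.5, 0.4, 0.2, 0.1]          # made-up model probabilities
regional_lock = [1.0, 0.0, 1.0, 0.0, 0.0]  # "Regional top 2 are certain" baseline
equal_chance = [2 / 5] * 5                 # everyone gets the same chance

for label, probs in [("model", model),
                     ("regional lock", regional_lock),
                     ("equal chance", equal_chance)]:
    print(label, round(mean_squared_error(probs, made_cut), 3))
```

Lower is better, and a model earns its keep only if it beats both naive baselines.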

My goal was to strike a balance between confidence in the favorites (like Froning) and allowing enough variation so that relative unknowns (like Fortunato) still have a shot. This largely comes down to how much I shook up those original results. The less I shook up the original results, the more confident I would have been that Froning would have won last year. But I also would have given someone like Matt Chan virtually no shot, because his Regional performance simply wasn't that strong compared to the other heavy hitters. But if I shook up the original results too much, things just got muddy and I allowed everyone to have a fairly even chance to win, which doesn't seem realistic either.

No model is going to be perfect with this many unknowns. Sure, you could argue that I am not taking into account other factors, like the advantage that Games "veterans" could have. But I would counter by pointing out that last year, Fortunato was a first-time competitor and Kasperbauer hadn't competed individually since 2009, and they both fared well. Other athletes like Neal Maddox simply didn't perform well at the Games despite experience at the Games and great performances at Regionals. A lot of it simply has to do with what comes out of the hopper, how each athlete manages the pressure and what little breaks go for or against each athlete throughout the course of the weekend. But at the end of the day, the fact is that the athletes who do well at Regionals and the Open generally fare well at the Games, and that's why I am using those results as the basis for my estimates.

With the methodology and assumptions out of the way, move ahead to my next post for the picks for the 2013 Games!

Thursday, July 11, 2013

Quick Update - Predictions Coming Soon

Hello all. I just wanted to drop a quick post to say that it's been a busy week, but my predictions for the Games will be coming soon, likely this weekend. I'll be estimating the likelihood of each individual athlete winning the whole thing, placing in the top 3 and placing in the money (top 10). The process is a bit more complex than last year, but I think it should be pretty neat.

I'm also curious to see what you guys are thinking about this year's Games. Seems pretty clear that Froning will be the favorite on the men's side, but the women's side is wide open. Feel free to post thoughts to comments here or on my next post, after I make my predictions.

The Games are coming up on us soon (potentially under 2 weeks, depending on when the competition actually starts), and I'm pumped to get out to L.A. to watch. In the meantime, good luck with your training!