Sunday, June 17, 2012

So who CAN win the CrossFit Games?

Last week, I did some work to evaluate the regional performances of all of the Games Qualifiers. I tried to make this as fair as possible by adjusting the scores in each event based on the week of competition. You can see my previous two posts for the details.

This work produced some intriguing results, but in one respect, the results were not surprising at all: the champions of this analysis were Rich Froning and Annie Thorisdottir, the reigning champions and most people's choice to win the Games again this year. For Rich, the margin was fairly wide, but the race was a bit tighter on the women's side. Still, we know that the CrossFit Games are all about the unknown and unknowable, which means new events that will surely shake up those regional standings. Intuitively, we know other athletes have a "chance" to win the Games. But who really has a decent shot, and how good of a shot do they have?

On the CrossFit Games Update the past two weeks, Pat Sherwood, Rory McKernan and Miranda Oldroyd have thrown out some other names that they expect to be in the mix. There's no question that the athletes mentioned have a shot at dethroning Rich and Annie, but let's see if we can use the data we have available to back up those predictions (and maybe add some more of our own).

The basic concept of my work here was to take the results from this year's Regional and Open to "simulate" the Games. This is similar to Monte Carlo Simulation for those who are familiar with that technique. Here's how my system works:

We assume the Games have 10 events, as they did last year. For each simulated event, we randomly choose one of the events that has already occurred this year (either from the Regionals or the Open) and use the known results. We do this for all 10 events, add up the point totals and crown a winner. We then repeat this process 1000 times, which allows the elements of random chance to take effect and give us some rough probabilities of each athlete winning.
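For anyone who wants to try this at home, the loop described above could be sketched in Python roughly like this. The function name, the data layout, the toy numbers and the "higher points are better" scoring are all assumptions made for illustration, not necessarily how the numbers in this post were produced.

```python
import random
from collections import Counter

def simulate_games(results, n_events=10, n_sims=1000, seed=None):
    """Count how often each athlete wins a simulated Games.

    results: dict mapping event name -> dict of athlete -> points.
    Assumption: more points is better, and every athlete has a
    result in every event.
    """
    rng = random.Random(seed)
    events = list(results)
    athletes = list(results[events[0]])
    wins = Counter()
    for _ in range(n_sims):
        # Randomly pick past events to stand in for the Games events.
        # rng.choices samples with replacement, so an event can repeat.
        drawn = rng.choices(events, k=n_events)
        totals = {a: sum(results[e][a] for e in drawn) for a in athletes}
        # Ties go to whichever athlete is listed first (a simplification).
        winner = max(athletes, key=totals.get)
        wins[winner] += 1
    return wins

# Made-up toy data: three hypothetical athletes, three hypothetical events.
toy = {
    "event_1": {"A": 100, "B": 90, "C": 80},
    "event_2": {"A": 90, "B": 100, "C": 80},
    "event_3": {"A": 100, "B": 80, "C": 90},
}
wins = simulate_games(toy, n_events=10, n_sims=1000, seed=0)
for athlete, count in wins.most_common():
    print(f"{athlete}: {count} wins ({count / 10:.1f}%)")
```

Dividing each win count by the number of simulations gives the rough win probability for that athlete.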

That's the basic idea. One item worth noting is that in my system, I allow events to be selected multiple times. For instance, it is possible that Regional Event 2 might be chosen three times, or it might not be chosen at all. The reason is that these previous results are meant to represent theoretical "events" at the Games. I'm not saying that at the Games, they will have three separate events that are identical to Regional Event 2. What I'm saying is that there could be three separate events at the Games that produce similar results. Also, if we didn't allow events to be picked twice, well then we'd only leave out one event each time (11 events in Regionals and Open combined, 10 events at the Games), and our results would be pretty dull.
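To put a number on the "pretty dull" point, here is a quick count of how many different sets of 10 events are possible each way. The variable names are just illustrative, and the count ignores the order of events, since order doesn't change the point totals.

```python
from math import comb

n_source = 11  # Regional + Open events available this year
n_games = 10   # events assumed at the Games

# No repeats allowed: all that matters is which one event gets left out.
without_replacement = comb(n_source, n_games)

# Repeats allowed: count the multisets of 10 events drawn from 11.
with_replacement = comb(n_source + n_games - 1, n_games)

print(without_replacement)  # 11
print(with_replacement)     # 184756
```

With no repeats there are only 11 possible Games, so 1,000 simulations would just replay the same 11 outcomes over and over. Allowing repeats opens up 184,756 possible combinations.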

I should also note that for the Regional results, I am using my adjusted results that I mentioned earlier. I feel that makes this as fair as possible.

Here's an example of one simulation of the Women's Competition:


OK, so what about the results? Well, if we run the simulation allowing all Regional and Open events to be selected, here's what happened: [Updated 6/19/2012 - Used updated results from regionals (see first post) and re-ran simulations. Results are largely the same but some values have shifted a bit.]

Men's winner: Rich Froning (886 times), Neal Maddox (68), Dan Bailey (41), Jason Khalipa (1) and Ben Smith (1)

Women's winner: Julie Foucher (439 times), Annie Thorisdottir (277), Kristan Clever (227), Camille Leblanc-Bazinet (35), Azadeh Boroumand (16) and Michelle Letendre (6)

Rich Froning, you are killing me. I'm trying to make this interesting and there you go winning almost 89% of the time.

However, I don't feel like our work is done here. Should we really be including events from the Open? We know some of these athletes put more effort into the Open events, while others focused on their Regional training and just did what was necessary to qualify. Plus there is the issue of inconsistent judging, as well as the fact that the Open will be 4-5 months old by the time the Games roll around. So what if we simulated the Games, but only using the Regional results? Well...

Men's winner: Rich Froning (702 times), Dan Bailey (152), Neal Maddox (101), Jason Khalipa (33) and Ben Smith (12)

Women's winner: Annie Thorisdottir (653 times), Azadeh Boroumand (207), Julie Foucher (104), Michelle Letendre (22), Kristan Clever (6), Camille Leblanc-Bazinet (6) and Elizabeth Akinwale (2)

Well, it's becoming pretty evident that the women's race is wide open. Even though Annie was the clear favorite under this scenario, there were 59 instances in which she didn't even finish on the podium. That only happened 31 times to Rich in this scenario (only 3 times in the previous scenario).

Let's try one more thing. Although I don't feel that the Open events are that indicative of the Games results, I'd rather not discount them entirely. After all, they hit on some movements that weren't represented as well in the Regionals, such as burpees, box jumps and thrusters. So my not-exactly-scientific solution is to simulate 8 of 10 Games events using only the Regional results, then use the Open results for the other two Games events, but we'll treat them like the "skills tests" last year. For each "skills test," we'll draw three Open events, sum up the results and then rank everyone to get one score for the event.
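A rough Python sketch of this hybrid setup might look like the following. Everything specific here is an assumption for illustration: the function names, the toy data, the "higher is better" scoring, the rank-to-points table (best gets N points, worst gets 1), and drawing the three Open events with repeats allowed.

```python
import random
from collections import Counter

def rank_points(scores):
    """Rank athletes by score; best gets N points, worst gets 1.
    This points table is a placeholder assumption."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    return {a: n - i for i, a in enumerate(ordered)}

def simulate_hybrid_once(regional, open_, rng,
                         n_regional=8, n_skills=2, open_per_skill=3):
    """Simulate one Games: 8 Regional-style events plus 2 skills tests."""
    athletes = list(next(iter(regional.values())))
    totals = dict.fromkeys(athletes, 0)
    # Eight events drawn from the Regional results only.
    for event in rng.choices(list(regional), k=n_regional):
        for a in athletes:
            totals[a] += regional[event][a]
    # Each skills test: draw three Open events, sum them, then re-rank.
    for _ in range(n_skills):
        drawn = rng.choices(list(open_), k=open_per_skill)
        summed = {a: sum(open_[e][a] for e in drawn) for a in athletes}
        for a, pts in rank_points(summed).items():
            totals[a] += pts
    return max(athletes, key=totals.get)

# Made-up toy data on a 3-2-1 points scale for three hypothetical athletes.
regional = {
    "r1": {"A": 3, "B": 2, "C": 1},
    "r2": {"A": 3, "B": 1, "C": 2},
}
open_ = {
    "o1": {"A": 1, "B": 3, "C": 2},
    "o2": {"A": 2, "B": 3, "C": 1},
}
rng = random.Random(0)
winners = Counter(simulate_hybrid_once(regional, open_, rng)
                  for _ in range(1000))
print(winners)
```

In this toy data, athlete A dominates the Regional events by enough that B's Open strength can never make up the gap, which is the whole point of limiting the Open to two of the ten events.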

And the results...

Men's winner: Rich Froning (765 times), Neal Maddox (121), Dan Bailey (106), Jason Khalipa (7) and Ben Smith (1)

Women's winner: Annie Thorisdottir (581 times), Julie Foucher (282), Kristan Clever (61), Azadeh Boroumand (48), Camille Leblanc-Bazinet (24), Michelle Letendre (2) and Lindsay Valenzuela (2)

So you take your pick on which simulation you feel is most appropriate. Of course, there are other ways to do this that might have their own merits.

A couple other random observations before we go: [Updated 6/19/2012 - Slight updates here due to re-running the simulations.]

I find it interesting that Jason Khalipa, despite being second in my Regional comparison, does not win often in these simulations. The reason, it seems, is that as good as Jason is, he is rarely better in any given event than Rich. He beat Rich in two events at Regionals, but not once in the Open. However, in these simulations, he was almost always on the podium (about 80% of the time). The takeaway here is that to be competitive in the Games, you have to be consistent.  But to WIN, you have to be absolutely great in a handful of events, and even if you may have a hole or two in your game somewhere, it is possible (but not likely) that it won't be exposed too badly.

Valerie Voboril was 15th in the Regional comparison, but she finished third overall nine times in the first scenario and two times in my last scenario. She was the lowest finisher to make the podium in the final simulation. In my first simulation, Denae Brown (32nd in regionals) pulled off a stunner and finished third once.

The men's race was more stable. In the last two scenarios, no one outside the top 5 finishers from the Regional Comparison finished on the podium. Kenneth Leverich (10th) finished third 12 times and Chase Daniels (18th) finished third once in the first scenario.

Please remember that this is all in fun. I'm not by any means saying that the athletes who don't win any of the 1,000 simulations CANNOT win the Games. There are so many variables that it's going to be very tough to predict the results of the Games based on Regional results only. But for one of those other athletes to do it, the fact is they WILL have to compete at a level higher than they have so far. Good luck to you all!

2 comments:

  1. Very cool. Much better analysis than we get in the official Update, where they were surprised that Dan Bailey did well on short events and Rich beat him on longer events, despite that being a pretty obvious result to predict. I mean if you look at past results as well as Rich and Dan's own words. I love that they do the update, but the analysis is generally pathetic.

  2. Gotta say, though, I'm a little surprised Foucher doesn't win more when you looked at just the regionals. I think the more the events lean towards longer WODs, the better she's going to do. And as much as Annie is awesome and Clever is amazing... Julie Foucher is the one I'd want to date :)
