Tuesday, July 24, 2012

Were the Games Well-Programmed? (Part 1)

This is the first part of a two-part look at how well the 2012 CrossFit Games season was programmed. Today, I only want to focus on the CrossFit Games itself, ignoring the Regionals and Open for now. Along the same lines as my post "Are certain events 'better' than others?", this won't be a discussion with a clear-cut answer. What I'm hoping to do is take an objective look at the programming of the CrossFit Games, something beyond just "Whoa, Dave Castro is CURRRRAAZZZYY for making them do a triathlon!"

OK, the bulk of my post is based on the opinion that in programming the Games, there should be five goals in mind (in descending order of importance):

1) Ensure that the fittest athletes win the overall championship
2) Make the competition as fair as possible for all athletes involved
3) Test events across broad time and modal domains (i.e., stay in keeping with CrossFit's general definition of fitness)
4) Balance the time and modal domains so that no elements are weighted too heavily
5) Make the event enjoyable for the spectators

While this may not be HQ's stated mission in programming the Games, it's certainly how I would approach programming the Games and I think it is roughly in line with what HQ purports to do. With that as the framework for this analysis, let's evaluate how well Mr. Castro and HQ fared this year.

1) Ensure that the fittest athletes win the overall championship - Although I did pick Julie Foucher to edge Annie Thorisdottir for the women's title, I have to concede that Thorisdottir and Rich Froning again appeared to be the fittest athletes in the world, and by a fairly wide margin.

There is no question about Froning - he won the Open, he finished atop the Regional rankings (before and after adjustments based on week of competition) and he won the Games by 114 points. Unless Josh Bridges comes back next year or Froning gets hurt, Froning has to be a huge favorite to win it again next year.

As far as Annie is concerned, she finished third in the Open but finished atop the Regional rankings (before and after adjustments based on week of competition) and wound up winning by 85 points at the Games. If we take all Games athletes and add up their rankings across the 21 events that everyone completed (excluding the five events after cuts started at the Games), Annie had 172 points, 25 ahead of second-place Julie Foucher. Keep in mind that she then beat Foucher in three of the final five workouts. (If you're curious, Froning finished with just 118, well ahead of second-place Dan Bailey with 226.)

Add in the fact that these two athletes both won the title last year, and I think it's safe to say that the fittest athletes did indeed win the titles this year.

Grade: A

2) Make the competition as fair as possible for all athletes involved - This goal involves a few different things. First, the scoring system needs to be fair. I think the scoring system is far from perfect, but I think it's fair enough. HQ is clearly trying to reward the particularly high finishes on each event by spacing out the top 6 places more, and I'm fine with that. I think there is an element of head-to-head competition that is rewarded by a system like this, as opposed to a pure place-per-event system like Regionals. Ideally, as many people have suggested, they'd switch to some sort of "percentage of work" system that would take into account the discrepancy between places (e.g. a 10-second victory is worth more than a 1-second victory on the same workout). But I think this system is OK for now.

Second, the events themselves need to be judged, operated and scored fairly. I think this was a mixed bag. The standards on finishing certain workouts, like the medball-HSPU and the Double Banger, were inconsistently enforced. Some athletes were made to get the medball to stay in the required area after finishing the medball-HSPU workout, while others were allowed to simply drop the ball or merely run across the line. In some cases, this made a difference of 5-6 seconds. Also, the medball toss (without a doubt the worst event, more on that in Part 2) had several instances where the equipment malfunctioned (Kristan Clever is one example) and the athletes weren't given a fair chance. And although it likely wasn't a major factor, the athletes who went second in each heat on the medball toss had considerably less rest than those who went first (around 90 seconds vs. about 3 minutes). But as far as judging, I will say that from what I could tell, things looked pretty even, especially given that we're working mostly with volunteer judges.

All in all, I'd say this year's event was better than years past in this respect, but improvements still need to be made in this area.

Grade: B-

3) Test events across broad time and modal domains (i.e., stay in keeping with CrossFit's general definition of fitness) - To look at this, I've constructed the following table to compare all the Games events in terms of time, number of movements, the level of weight lifted and bodyweight strength required. The times here represent a ballpark figure of the fastest finisher for men and women. As far as level of weight lifted, I grouped everything into broad categories, generally based on the heaviest weight involved in the workout. Keep in mind, this varies by lift: for a Games athlete, a 200-lb. jerk is relatively heavy (at least in a metcon), but a 200-lb. deadlift is not. Bodyweight strength is based only on the non-weighted movements, such as handstand push-ups and toes-to-bar.

A couple of notes here: 1) The obstacle course was a hard one to pick as far as number of movements, so I went with three just to indicate that there were multiple skills involved. 2) The chipper might be considered medium as far as weight is concerned, but it seemed to have a definite strength bias if you look at the athletes who did well. 3) I considered the burpee-muscle-up to be two movements and the three separate sledge-hammer angles to be the same movement. These are debatable, to be sure. 4) I considered the clean to be 0 time, because although it technically took 5-6 minutes, 95% of that was rest. What they were measuring was the amount of work accomplished in that one-second clean, not how much could be done in 5 minutes.

Now, looking at the chart, we see times ranging from basically zero to more than two hours, but all but two workouts were under 10 minutes for the winner. I think they could have done a bit more in the 15-25 minute range. As far as weight, I think they did a good job of mixing it up: by my count, 6 light-weight workouts, 5 medium-weight workouts, 4 heavy-weight workouts. They definitely tested bodyweight strength, as we saw three that included what I would consider "high" bodyweight strength movements (HSPU to a deficit, bar muscle-ups and regular muscle-ups) and two more that I would consider "medium" (rope climbs and ring dips). Most others included at least some bodyweight component. (I think you could argue that the rope climbs at 20' were comparable to muscle-ups, but I think the point remains.)

Although I did feel they were lacking in some of the moderately long workouts, going short enabled them to put athletes through 15 workouts and hit almost every common CrossFit movement. I think they did a good job overall in this area.

Grade: A-

4) Balance the time and modal domains so that no elements are weighted too heavily - This concept is pretty closely related to the prior one, but the focus here is on whether or not the different areas of fitness were fairly represented in the scoring. As previously noted, HQ definitely hit on just about every time and modal domain, but were certain areas over- or under-counted? We did note above that the numbers of heavy, medium and light workouts were fairly even, but the time domains do seem to be clustered a bit around the 0-5 minute range (9 workouts). HQ mitigated this a bit by assigning only 50 points to the broad jump, medball toss and sprint, but that doesn't entirely address the problem.

Another way I decided to look at this was to see if the athletes' rankings for any of the events were particularly correlated with each other. If two events are highly correlated, then we may be giving extra credit to the areas of fitness that are tested there. Below are two charts showing all the correlations between the events for men first, then women (the final five events are excluded because not all athletes completed them):

There's a lot going on there, but I've highlighted some key cells. The yellow cells are combinations of events with a correlation coefficient above 0.60. The only time this happens is between Pendleton 1 and Pendleton 2 - this isn't surprising at all, because almost half of Pendleton 2 was Pendleton 1. There was definitely some double-counting going on here. However, I personally feel this is OK. The reason is that it is harder to test longer events as often because of the toll they take on the body, so giving the endurance guys two events that basically test the same thing makes up (to some extent) for the emphasis on short, explosive events later on.

The red cells show events that had a significant negative correlation. For the men, we see that event 10 (the clean ladder) was negatively correlated with Pendleton 1 - this is not surprising, considering endurance athletes generally lack high-end strength. For the women, those events were basically not correlated at all, but we did see a fairly strong negative correlation between Pendleton 1 and both event 3 (obstacle course) and event 4 (broad jump). Again, not that shocking because you're comparing an endurance event with two explosive, short duration events. I think it's fine to have events that are negatively correlated, and it's bound to happen given the nature of this sport.
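For anyone who wants to build a table like this themselves, here is a minimal sketch of the idea: take each athlete's placing in every event and compute the ordinary correlation coefficient between each pair of events (on ranks, this is the Spearman rank correlation). The event names and placings below are made up purely for illustration and are not the Games data; the 0.60 cut-off is the one used for the yellow cells.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation; applied to ranks it is the Spearman correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical placings for five athletes in three events (NOT real results).
event_ranks = {
    "Endurance A": [1, 2, 3, 4, 5],
    "Endurance B": [2, 1, 3, 5, 4],
    "Max lift": [5, 4, 3, 2, 1],
}

# Correlation for every pair of events.
correlations = {
    (a, b): pearson(event_ranks[a], event_ranks[b])
    for a, b in combinations(event_ranks, 2)
}

# Pairs that would be highlighted as "highly correlated".
flagged = [pair for pair, r in correlations.items() if r > 0.60]

for pair, r in correlations.items():
    print(pair, round(r, 2))
```

With real data you would have one list of placings per event, all in the same athlete order, and ties would need some care (average ranks are the usual choice).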

For a quick comparison, let's look at this same chart for the men's decathlon at the U.S. Olympic Trials. There are only 16 athletes, but it gives us a taste at least. These correlations are based on the ranks of the athletes, although that is not the way the decathlon is scored:

You see there that there are four different instances of combinations of events that are highly correlated: 100m and long jump; pole vault and long jump; shot put and javelin throw; shot put and pole vault. The shot put/pole vault correlation is a little curious, but the other three make sense. For instance, top-end speed is a key factor in both the 100m and the long jump, so in effect, the decathlon ends up testing that type of speed twice. Keep in mind that this is only a sample of one decathlon meet, but it gives you an idea of how events can overlap.

So in all, I think HQ did a pretty good job in this respect. Again, I'd like to see them maybe scale back the short workouts and hit a few more in the 15-25 minute range, but overall I think they did well.

Grade: B+

5) Make the event enjoyable for the spectators - I'm curious to see how the coverage will be when condensed into 30-minute segments for cable, but I can say that overall, I was happy with the spectator experience in person. Other than the medball toss, in which the fans had absolutely no idea how well anyone did, and some of the early heats of the medball/HSPU, where hardly any athletes could complete the HSPU, I thought the events were enjoyable to watch. The set-up of the events in the arena was generally good, so that you could easily track the athletes' progress through the movements. The "hype men" did a solid job keeping everyone up to date on the athletes to watch, and they even got most of the pronunciations right! All in all, HQ did a good job setting this thing up for the viewer. I'll be interested to see what the response is from the general public when this thing airs in a few weeks.

Grade: A-

So that's it for today - next week, I'll try to tackle the entire season, including the Open and Regionals. Did we pick the right athletes to go to the Games? Which events did we like? Is this current set-up fair? We'll try to figure it all out next week. Thanks for reading!

Thursday, July 19, 2012

Initial Post-Games Thoughts

Before we get to the numbers, I have to say seeing the 2012 Games in person was truly a blast. I've watched online the past couple years, and we went to Regionals this year, but the experience at the Games was beyond my expectations. The professionalism of it all was very impressive, and although it's been said many times before, the crowd at the Games is unlike any other sporting event. Not louder or more energetic than any crowd I've seen, but by far the most friendly and congenial (not bad looking, either). If you have been on the fence about going in the past, definitely make it a priority next year. Grab your tickets early and get out to L.A.

Now, it's time to START to assess what went down last weekend. This will certainly not be the last post on the Games, and in fact, I still have several things I'd like to look into for the 2012 season as a whole. But for starters, let's see how our predictions panned out.

Notes: Just like I did in my 2011 analysis, I assigned points to each athlete who was cut for those events that they missed. The method of estimating those points is explained a couple of posts back.

Men: I honestly expected to do a little better here than I did. The model I used had an R-squared of 66% on the training data set (the 2011 Games), and while I obviously would not expect to do that well again, I expected to do a pretty decent job picking this year's Games. In the end, the model had an R-squared of 49% (that was calculated based on the actual points for each athlete, not just the rank). We did pick the winner correctly - Rich Froning won convincingly, as was expected. Matt Chan, on the other hand, had a performance that I simply did not see coming. His regional performance was solid (7th), but he was only so-so in the Open (21st), and at 33 years old, I didn't know whether he could handle the volume required these days in the Games. The model had him picked 23rd, and he ended up outperforming the model (in terms of points) more than any other athlete. As far as placement, Scott Panchik had the biggest move, finishing fourth despite being projected 27th by the model.

Despite the performances of newcomers Panchik and Marcus Hendren (7th), Games experience still proved to be a factor. Here is a comparison showing the average ranks of previous competitors vs. newcomers (same comparison we did for 2011 a couple of posts back):

Except at the top end, prior Games competitors outperformed newcomers with similar regional rankings. Something we may want to consider is modeling the top athletes slightly differently than the ones finishing more modestly at regionals.

What was really NOT much of a predictor this year was Open results. After taking into account the regional results, the Open results basically told us nothing more (in fact it had a slightly negative coefficient in the regression). The results did show that age was still a factor (younger athletes are expected to do better), but not to the extent we expected. Each year over 26 was worth somewhere around 7 points, after accounting for prior experience, Regional and Open results - the model had assumed more like 20 (after scaling up to the 1,350 available points this year from 1,000 last year).

Overall, it looks like we would have been better off simply using the Regional rank. The R-squared there would have been about 55% using my adjusted regional rankings and 53% using the raw regional rankings. With another year of data under our belt, hopefully we can do better next year.
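If you want to check a set of predictions the same way, here is a rough sketch of an R-squared calculation on points. It assumes the common definition of one minus the ratio of squared prediction errors to total squared variation around the mean; that choice is an assumption for illustration (squaring the correlation between predicted and actual points is another common approach and can give a different number). The point totals are invented.

```python
def r_squared(actual, predicted):
    """R-squared as 1 - SS_res / SS_tot (an assumed definition, see above)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical Games point totals for four athletes (NOT real results).
actual_points = [900, 700, 500, 300]
predicted_points = [800, 700, 600, 300]

print(round(r_squared(actual_points, predicted_points), 2))
```

A value of 1 would mean the predictions matched the actual points exactly, and a value near 0 means they did no better than guessing the average for everyone.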

Women: The women's model was much simpler than the men's, and it actually slightly outperformed the men's as well. The R-squared (using points) was 50%, and of the top 10 women, I had 6 predicted to be in the top 10. If we had simply used the adjusted regional ranks and converted them to Games-style points, then used that to predict the Games points, the R-squared would have been only 46%. So taking the Open results into account certainly helped out.

At the top, as expected, it was a duel between Julie Foucher and Annie Thorisdottir. While we expected some other top names to be in the mix, like Kristan Clever (predicted 3rd, finished 4th) and Camille Leblanc-Bazinet (predicted 4th, finished 6th), Talayna Fortunato was a surprise. She was 11th in the adjusted regional rankings and 8th in the Open, but a 3rd place finish was unexpected (predicted 10th). Overall, Jenny Davis outperformed the model more than anyone else, finishing 8th despite being picked 26th.

For the women, we did see a bit more correlation between prior Games experience and improved results this year (last year we saw basically none). Here is the same chart as above, except for women:

What stuck out, however, is that the Open results were a pretty darn good predictor of success for the women. After accounting for Regional results, the regression actually showed that the Open was a slightly stronger predictor than the Regionals. If you had used the Open results alone to predict the Games, the R-squared would have been 47%, slightly higher than using the Regional results alone. And again, like 2011, age did not appear to be a factor at all once you take into account regional performance (see 40+ year-olds Becky Conzelman and Cheryl Brost finishing 14th and 15th, respectively). Obviously, in general, it helps to be younger, but if a woman has already qualified for the Games, there is no reason to believe her age will negatively impact her any more in the Games than it did at Regionals or in the Open.

I'll continue to dig into this in the coming weeks. Hopefully everyone enjoyed the Games. Only 7 months until the 2013 Open begins!

Monday, July 9, 2012

2012 CrossFit Games - Who Ya Got?

OK, it's time to get down to business and predict some friggin' results, people. For background on how we got here, see the previous post ("The Method to the Madness that is Predicting the CrossFit Games").

First, for the ladies. The model I used was as follows: Games Points = 0.82 * Regional Points + 0.48 * Open Points + 123. Remember, points are calculated using the current Games scoring system (all events using the 100-point scale). For those of you saying, "That's boring - why did you only include those two variables and not, say, age or prior Games experience?" read the previous post. Now, if you're mentally prepared for the results, the table is below.

That, my friends, is pretty much a straight-up "pick 'em" at the top. Would I question anyone whatsoever for picking Annie (or for that matter, any of the top 6 women)? Absolutely not. BUT I AM NOT PETER KING, AND I STICK BY MY PREDICTIONS: Julie Foucher will win the CrossFit Games. Don't let me down, Julie.

Now, for the men. Things were a bit more complex (and hopefully more accurate for that reason). The model here is: Games Points = 0.56 * Regional Points + 0.41 * Open Points + 162 * Prior Experience - 15.68 * Age Beyond 26 + 174. Prior Games experience is a 1 for "Yes" and 0 for "No." So for the men, you should do well if you are young, have competed at the Games before and did well at Regionals and the Open. Drum roll please...

Folks, there is just no way to take the data we have from this year and NOT predict Rich Froning to win the CrossFit Games. The man won the Open and had the top Regional performance overall, he's only 24 AND HE WON THE GAMES LAST YEAR. Do other guys have a shot? Certainly - see my post "So who CAN win the CrossFit Games?" But Froning is absolutely the man to beat. 
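To make the two formulas above concrete, here is a small sketch that plugs numbers into them. The coefficients are the ones quoted in this post, and "Age Beyond 26" is taken as the athlete's age minus 26, or 0 for anyone 26 or younger. The function names and the example inputs are made up for illustration; they are not any real athlete's Regional or Open points.

```python
def predict_women(regional_points, open_points):
    """Women's model: Regional and Open points only."""
    return 0.82 * regional_points + 0.48 * open_points + 123


def predict_men(regional_points, open_points, prior_experience, age):
    """Men's model: adds prior Games experience and age beyond 26."""
    experience = 1 if prior_experience else 0   # 1 for "Yes", 0 for "No"
    age_beyond_26 = max(0, age - 26)            # no penalty at 26 or younger
    return (0.56 * regional_points
            + 0.41 * open_points
            + 162 * experience
            - 15.68 * age_beyond_26
            + 174)


# Hypothetical inputs: identical Regional/Open points, different backgrounds.
veteran_24 = predict_men(500, 500, prior_experience=True, age=24)
rookie_30 = predict_men(500, 500, prior_experience=False, age=30)
woman = predict_women(500, 500)

print(round(veteran_24, 2), round(rookie_30, 2), round(woman, 2))
```

The gap between the two hypothetical men comes entirely from the experience bonus (162 points) and four years of age penalty (4 x 15.68), which shows how much weight those two variables carry relative to the Regional and Open results.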

You'll notice that the top projection for a Games rookie is Kenneth Leverich at 22. I would guess that we'll see a rookie finish higher than that, but I think it's unlikely we'll have another Josh Bridges come in and finish on the podium on his first try. 

So there you have it. Are these predictions going to materialize perfectly? Of course not. Are some of the predictions debatable? Yes (I mean Patrick Burke at 33 seems pretty low even to me). The R-squared for the women's model, using last year's results, was 56%, so that's basically saying 44% of the variance is not explained by this model. For men, it's about 66%. That tells you how difficult it is to predict the Games.

But on the whole, given the data we have, this is what I'm going with. Think you can do better? By all means, post your top 10 or even your top 50 and we'll see how it all shakes out.


The Method to the Madness that is Predicting the CrossFit Games

Well, we're now less than 96 hours from the start of the 2012 CrossFit Games. As my old high school football coach would say, "The hay is in the barn." Sure, HQ decided to release about half of the workouts early, but for all intents and purposes, there's not much left the athletes can do to prepare. And the same goes for me: there's really nothing left to learn about these athletes between now and Friday morning, so I suppose that means it's prediction time. To quote the legendary Ronnie Coleman, "Ain't nothin' to it but to do it!"

So far, this blog has been a labor of love, but for these final predictions, I really had to put the emphasis on the labor. Despite the fact that I had done plenty of analysis on this year's open and regional results, there was really no way to get a good feel for how to predict the CrossFit Games without delving further back into the past. That meant getting my hands on the 2011 data, and by getting my hands on it, I mean doing tons of copying and pasting from the old Games2011 web site. Certainly, I could have used multiple years of data, but things would have quickly become more complicated, because prior to 2011, the Regionals were not standardized (and there was no Open), meaning it would have been awfully difficult to make any sort of predictions.

My initial thoughts were that there were a few different statistics about each athlete leading into the Games that might do a decent job at modeling the actual results. They were: Regional results, Open results, prior Games experience and age. All of these were available online, and I thought each could help out in predicting how an athlete would perform in Carson. Since I already had a template in place, I went ahead and made some adjustments to the 2011 Regional results to account for some athletes competing weeks after other athletes. I also could have tried to adjust the data for weather conditions (I seem to recall NorCal having some bad storms), but that was too difficult for the time I had available and probably not worth the effort. No adjustments were made to the Open results. For prior Games experience, I counted any experience in the Games after 2007 (the level of competition was simply too low and the atmosphere too different back then for the experience to really affect someone in 2011). Age was taken from the Games2011 web site. I'll go ahead and state now that Rob Orlando, Mikko Salo, Deborah Cordner and Helga Tofadottir were all excluded from the analysis entirely because they did not continue past the first event (all due in one way or another to the swim event).

[Quick note here: I am NOT taking into account the events that have been released already for 2012. First of all, half of the events are still unknown, and second, I think it is extremely difficult to say with any certainty how ALL of the athletes will do on each event. Will Dan Bailey do well on a 300m shuttle sprint? I assume yes because I happen to know he was a collegiate 400m runner. Will Austin Stack do well? I know virtually nothing about him other than his Regional and Open results (which include no running), so for me to even guess is pretty silly. And frankly, doing that much research on each athlete is beyond the scope of what I'm going for here.]

As I discussed in my last post, the Games uses a different scoring system than regionals or the Open. For now, I won't go into my opinions on the system itself, but this was something I needed to account for in my analysis. To do this, I ranked all the Games athletes in each event of the Open and Regionals, then converted those rankings to a score based on the new scoring system. The actual Games results were also scored this way, but there was a catch: because of the cuts, we did not have complete scores for athletes who didn't finish in the top 12. I felt it would be inappropriate to simply give these athletes 0 points for each of the events they missed. If I did, the 13th-place athlete for men (Patrick Burke) would have finished with 406 points, compared to 625 for 12th-place Zach Forrest. Did Forrest really perform more than 50% better than Burke? No.

MATH ALERT! (skip ahead if you don't care how I solved this issue and just want to see the damn predictions already) My solution was this: First, for each event after a cut, I compared the average points scored by the athletes who did complete the event to their average score on the prior events. The reason is that after the cut, the athletes were guaranteed to finish higher (on average) than before, so in order to assign points to the athletes who were cut, I needed to account for this. For each athlete who was cut, I took that ratio and multiplied it by his/her average score in the events he/she did finish. I re-calculated this after every cut. So for instance, 47th place Danie Du Preez received an average of about 16 points across the final events, despite the fact that 47th place in each event is only worth 10 points.
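Here is a minimal sketch of that adjustment for a single post-cut event. It follows one reading of the description above: the ratio is the surviving athletes' average points on the event divided by the average of their per-event averages before it, and that ratio is applied to the cut athlete's own average. The function name and all of the numbers are hypothetical.

```python
from statistics import mean


def estimate_missed_points(survivor_event_points, survivor_prior_averages,
                           cut_athlete_average):
    """Estimate the points a cut athlete might have earned on one event.

    survivor_event_points:   points the remaining athletes scored on the event
    survivor_prior_averages: those same athletes' average points per event
                             on the events before it
    cut_athlete_average:     the cut athlete's average points per event on
                             the events he/she actually completed
    """
    # How much the smaller field inflates scores relative to earlier events.
    ratio = mean(survivor_event_points) / mean(survivor_prior_averages)
    return ratio * cut_athlete_average


# Hypothetical numbers: three survivors and one athlete who was cut.
estimate = estimate_missed_points(
    survivor_event_points=[100, 95, 90],
    survivor_prior_averages=[80, 76, 72],
    cut_athlete_average=12,
)
print(estimate)
```

In the actual analysis the ratio was re-calculated after every cut, so this function would be called once per missed event with whichever athletes were still competing at that point.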

OK, so we've got the data set up, so let's take a look at what we found out. First, for the men, it became fairly clear that prior Games experience does matter. Here is a table showing the average rank for prior competitors vs. non-prior competitors, grouped by how the athletes finished overall at Regionals:

Aside from the bottom bracket, where we only had one prior Games competitor (Du Preez), the prior Games competitors fared better than the rookies who had similar Regional performances. However, for women, the relationship was non-existent. Here is the same chart for women:

Here we see that there was virtually no difference between those with or without Games experience, after accounting for Regional rank. Despite the fact that I assumed Games experience would help us in our predictions, I simply had to ignore it for women.

More math-y stuff, proceed with caution! The effect of age was much more difficult to see with a chart like the above one, but I decided to include it in my regressions to see if it was a significant variable. For the men, I used a four-variable linear regression, with regional points (Games scoring), Open points (Games scoring), prior Games experience (yes/no) and age. It turned out that all four variables were significant and in the direction I expected - better Regional results, better Open results, prior experience and youth all predicted better Games performance. But after some thought, I decided to try tweaking the age variable. Would we really expect an athlete to be worse at age 22 than at age 21? I wouldn't. So I messed around with different age cut-offs to see if it would be a better indicator. I ended up settling on 26, meaning the age variable was actually the athlete's age beyond 26 (or 0, if the athlete was 26 or younger). Age 26 seemed reasonable and gave me the most significant t-value on the regression.

But again, for women, no such luck! No matter what age cut-off I chose, the regression did not show any negative effects of older age - in fact, if anything, older athletes tended to do better (think Cheryl Brost finishing 7th at age 40 or Annie Sakamoto finishing 10th at age 35). I decided to ignore this variable altogether. My theory on this is that for women, the Games come down to who has the strength for the heavy lifts and the skills to perform the difficult gymnastic movements. A few of these movements or lifts (heavy front squats come to mind last year) can really eliminate many women from contention, regardless of age. On the men's side, there are fewer athletes who struggle that much with any one movement or lift.

So that's the background. For men, the model I used included Regional results, Open results, age and prior Games experience. For women, I simply used Regional and Open results. And now, for the absolutely 100% guaranteed predictions for this year's Games, see my next post...

Thursday, July 5, 2012

Games Scoring - We Almost Forgot!

Alas, as I started working on my final games predictions (hoping to post those late this weekend or early next week), I realized that my prior analysis had been incomplete in one regard: when I ranked all the athletes to get overall placings, I used the points-per-place system (that was used in regionals), not the actual Games scoring system used last year (and this year, although with a small tweak). The idea is the same in that placement in an event determines how many points an athlete receives, but the Games scoring system goes in the opposite direction. First place is worth 100 points, and the points drop for each subsequent place. Also noteworthy is the fact that the gap between placements is higher near the top. For instance, the gap from first (100) to second (95) is five, but the gap between sixth (75) and seventh (73) is only two and the gap between 30th (27) and 31st (26) is only one. This has the effect of rewarding athletes who finish very high in some events but struggle in others over athletes who are more consistently in the middle (hence, Rich Froning's mediocre performance on Event 1 last year didn't hurt him that much).
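As a quick illustration of how the gaps shrink, here is a tiny sketch using only the place/point pairs quoted above. The full scoring table is not reproduced here, so the dictionary is deliberately incomplete, and the names in the code are just for illustration.

```python
# Only the places quoted in this post; the rest of the table is omitted.
quoted_points = {1: 100, 2: 95, 6: 75, 7: 73, 30: 27, 31: 26}


def gap(better_place, worse_place, points=quoted_points):
    """Points lost by dropping from better_place to worse_place."""
    return points[better_place] - points[worse_place]


# One-place drops at the top, just outside the top six, and deep in the field.
gaps = {pair: gap(*pair) for pair in [(1, 2), (6, 7), (30, 31)]}

for (better, worse), lost in gaps.items():
    print(f"{better} -> {worse}: {lost} points")
```

Under a points-per-place system each of those one-place drops would cost exactly the same, which is the key difference between the two systems.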

The changes in scoring do not drastically affect the final results. Still, I felt it was worthwhile to re-produce some of the analysis I have done previously using the different scoring method. And in fact, I plan to use this scoring system for my final predictions in a few days.

Anyway, here are the men's and women's standings (adjusted to compensate for the week of regional competition) with the Games scoring system:

There were some shifts here, but generally they are in line with our previous results. Dan Bailey did jump Jason Khalipa for second place (again, more reward for Bailey's extremely high finishes in certain events as opposed to Khalipa's consistency) and Julie Foucher took second outright from Azadeh Boroumand. The biggest positive movers on the men's side were Lucas Parker, Aja Barto and Numi Snaer Katrinarson, who each moved up five spots. Phillip Kniep dropped five spots. On the women's side, the biggest positive mover was Jasmine Dever (five spots). No women dropped more than two spots.

I also re-ran the simulations with the new scoring system. The results didn't change that much, so I won't go into as much detail as last time. Here are the winners for the men and women using the simulation with Open events being scored as "skills tests":

Men's winner: Rich Froning (793 times), Dan Bailey (134) and Neal Maddox (75)

Women's winner: Annie Thorisdottir (500 times), Julie Foucher (384), Kristan Clever (78), Azadeh Boroumand (34), Camille Leblanc-Bazinet (7), Lindsay Valenzuela (4), Elizabeth Akinwale (2) and Michelle Letendre (1)

Again, no big changes here. We are still seeing that the women's race appears to be more up for grabs than the men's.

So unfortunately, nothing too exciting here today. But as mentioned above, I plan to make my final predictions in a few days. We know for sure that regional results do not always translate perfectly to the Games (Graham Holmberg was third at regionals the year he won it all), and there are tons of other factors in play here, too. I'm close to completing my work, but I'm also curious to see how you guys see things panning out as well.