OK, the bulk of this post rests on the view that, in programming the Games, HQ should keep five goals in mind (in descending order of importance):
1) Ensure that the fittest athletes win the overall championship
2) Make the competition as fair as possible for all athletes involved
3) Test events across broad time and modal domains (i.e., stay in keeping with CrossFit's general definition of fitness)
4) Balance the time and modal domains so that no elements are weighted too heavily
5) Make the event enjoyable for the spectators
While this may not be HQ's stated mission, it's certainly how I would approach programming the Games, and I think it is roughly in line with what HQ purports to do. With that as the framework for this analysis, let's evaluate how well Mr. Castro and HQ fared this year.
1) Ensure that the fittest athletes win the overall championship - Although I did pick Julie Foucher to edge Annie Thorisdottir for the women's title, I have to concede that Thorisdottir and Rich Froning again appeared to be the fittest athletes in the world, and by a fairly wide margin.
There is no question about Froning - he won the Open, he finished atop the Regional rankings (before and after adjustments based on week of competition) and he won the Games by 114 points. Unless Josh Bridges comes back next year or Froning gets hurt, he has to be a huge favorite to win again next year.
As far as Annie is concerned, she finished third in the Open but topped the Regional rankings (before and after adjustments based on week of competition) and wound up winning by 85 points at the Games. If we take all Games athletes and add up their rankings across the 21 events that everyone completed (excluding the five events after cuts started at the Games), Annie had 172 points, 25 ahead of second-place Julie Foucher. Keep in mind that she then beat Foucher in three of the final five workouts. (If you're curious, Froning finished with just 118, well ahead of second-place Dan Bailey with 226.)
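The rank-sum comparison above is simple to reproduce. Below is a minimal Python sketch, assuming each athlete's placings are stored as a list of places (1 = best). The athlete names and placings are made up for illustration and are not the actual results. As with Annie's 172 and Froning's 118, the lowest total is best.

```python
# Hypothetical toy data: each athlete's finishing place in each event
# (1 = best). These are NOT the actual results from this season.
placings = {
    "Athlete A": [1, 3, 2, 5],
    "Athlete B": [2, 1, 4, 3],
    "Athlete C": [3, 2, 1, 6],
}

def rank_sum_standings(placings):
    """Sum each athlete's places and sort so the lowest total comes first."""
    totals = {name: sum(places) for name, places in placings.items()}
    return sorted(totals.items(), key=lambda item: item[1])

standings = rank_sum_standings(placings)
for name, total in standings:
    print(f"{name}: {total}")
```

Here Athlete B leads with 10 despite winning only one event, which is the point of a rank sum: it rewards consistency across every event rather than a few big wins.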
Add in the fact that both athletes also won the title last year, and I think it's safe to say that the fittest athletes did indeed win the titles this year.
Grade: A
2) Make the competition as fair as possible for all athletes involved - This goal involves a few different things. First, the scoring system needs to be fair. The scoring system is far from perfect, but I think it's fair enough. HQ is clearly trying to reward particularly high finishes on each event by spacing out the top six places more, and I'm fine with that. A system like this rewards an element of head-to-head competition that a pure place-per-event system like the one used at Regionals does not. Ideally, as many people have suggested, they'd switch to some sort of "percentage of work" system that would account for the margins between places (e.g., a 10-second margin of victory would be worth more than a 1-second margin on the same workout). But I think this system is OK for now.
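To make the difference between the two approaches concrete, here is a rough sketch comparing a pure place-per-event system with one possible "percentage of work" scheme. The percentage formula (the winner's time divided by the athlete's time, times 100) is my own assumption for a timed event, not a system HQ or anyone else has formally proposed, and the function names and times are hypothetical.

```python
def place_points(times):
    """Pure place-per-event scoring: 1 point for 1st, 2 for 2nd, ... (lower is better).
    Ties are not handled in this sketch."""
    order = sorted(times, key=times.get)
    return {name: place for place, name in enumerate(order, start=1)}

def percent_of_winner(times):
    """Hypothetical 'percentage of work' scoring for a timed event:
    each athlete earns 100 * (winner's time / athlete's time), so higher is better."""
    best = min(times.values())
    return {name: round(100 * best / t, 1) for name, t in times.items()}

# Hypothetical finishing times in seconds for two versions of the same workout.
close_race = {"X": 300, "Y": 301}  # X wins by 1 second
blowout = {"X": 300, "Y": 310}     # X wins by 10 seconds

print(place_points(close_race), percent_of_winner(close_race))
print(place_points(blowout), percent_of_winner(blowout))
```

Under place scoring the two races look identical (X gets 1, Y gets 2 both times), while the percentage approach gives Y 99.7 after a 1-second loss but only 96.8 after a 10-second loss.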
Second, the events themselves need to be judged, operated and scored fairly. I think this was a mixed bag. The standards for finishing certain workouts, like the medball-HSPU and the Double Banger, were inconsistently enforced. Some athletes were made to get the medball to stay in the required area after finishing the medball-HSPU workout, while others were allowed to simply drop the ball or merely run across the line. In some cases, this made a difference of 5-6 seconds. Also, the medball toss (without a doubt the worst event, more on that in Part 2) had several instances where the equipment malfunctioned (Kristan Clever is one example) and the athletes weren't given a fair chance. And although it likely wasn't a major factor, the athletes who went second in each heat on the medball toss had considerably less rest than those who went first (around 90 seconds vs. about 3 minutes). But as far as judging goes, from what I could tell, things looked pretty even, especially given that we're working mostly with volunteer judges.
All in all, I'd say this year's event was better than in years past in this respect, but improvements still need to be made in this area.
Grade: B-
3) Test events across broad time and modal domains (i.e., stay in keeping with CrossFit's general definition of fitness) - To look at this, I've constructed the following table to compare all the Games events in terms of time, number of movements, level of weight lifted and bodyweight strength required. The times here represent a ballpark figure for the fastest finisher among the men and women. As far as level of weight lifted, I grouped everything into broad categories, generally based on the heaviest weight involved in the workout. Keep in mind, this varies by lift: for a Games athlete, a 200-lb. jerk is relatively heavy (at least in a metcon), but a 200-lb. deadlift is not. Bodyweight strength is based only on the non-weighted movements, such as handstand push-ups and toes-to-bar.
A few notes here: 1) The obstacle course was a hard one to pin down as far as number of movements, so I went with three just to indicate that there were multiple skills involved. 2) The chipper might be considered medium as far as weight is concerned, but it seemed to have a definite strength bias if you look at the athletes who did well. 3) I considered the burpee-muscle-up to be two movements and the three separate sledgehammer angles to be the same movement. These are debatable, to be sure. 4) I treated the clean as taking zero time, because although it technically took 5-6 minutes, 95% of that was rest. What they were measuring was the amount of work accomplished in that one-second clean, not how much could be done in 5 minutes.
Now, looking at the chart, we see times ranging from basically zero to more than two hours, but all but two workouts were under 10 minutes for the winner. I think they could have done a bit more in the 15-25 minute range. As far as weight, I think they did a good job of mixing it up: by my count, there were 6 light-weight workouts, 5 medium-weight workouts and 4 heavy-weight workouts. They definitely tested bodyweight strength, as we saw three workouts that included what I would consider "high" bodyweight strength movements (HSPU to a deficit, bar muscle-ups and regular muscle-ups) and two more that I would consider "medium" (rope climbs and ring dips). Most others included at least some bodyweight component. (You could argue that the rope climbs at 20' were comparable to muscle-ups, but I think the point remains.)
Although I did feel they were lacking in some of the moderately long workouts, going short enabled them to put athletes through 15 workouts and hit almost every common CrossFit movement. I think they did a good job overall in this area.
Grade: A-
4) Balance the time and modal domains so that no elements are weighted too heavily - This concept is closely related to the prior one, but the focus here is on whether the different areas of fitness were fairly represented in the scoring. As previously noted, HQ hit on just about every time and modal domain, but were certain areas over- or under-counted? We saw above that the numbers of heavy, medium and light workouts were fairly even, but the time domains do seem to be clustered around the 0-5 minute range (9 workouts). HQ mitigated this a bit by assigning only 50 points to the broad jump, medball toss and sprint, but that doesn't entirely address the problem.
Another way I decided to look at this was to see if the athletes' rankings for any of the events were particularly correlated with each other. If two events are highly correlated, then we may be giving extra credit to the areas of fitness that are tested there. Below are two charts showing all the correlations between the events for men first, then women (the final five events are excluded because not all athletes completed them):
There's a lot going on there, but I've highlighted some key cells. The yellow cells are combinations of events with a correlation coefficient above 0.60. The only time this happens is between Pendleton 1 and Pendleton 2 - this isn't surprising at all, because almost half of Pendleton 2 was Pendleton 1. There was definitely some double-counting going on here. However, I personally feel this is OK. Longer events are harder to test often because of the toll they take on the body, so giving the endurance athletes two events that basically test the same thing makes up (to some extent) for the emphasis on short, explosive events later on.
The red cells show events that had a significant negative correlation. For the men, we see that event 10 (the clean ladder) was negatively correlated with Pendleton 1 - this is not surprising, considering endurance athletes generally lack high-end strength. For the women, those events were basically uncorrelated, but we did see a fairly strong negative correlation between Pendleton 1 and both event 3 (obstacle course) and event 4 (broad jump). Again, not that shocking, because you're comparing an endurance event with two explosive, short-duration events. I think it's fine to have events that are negatively correlated, and it's bound to happen given the nature of this sport.
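For anyone who wants to run this kind of check on their own data, here is a minimal sketch. Because the inputs are already places, a plain Pearson correlation computed on them is a Spearman rank correlation. The 0.60 cutoff for "high" comes from the analysis above; the -0.40 cutoff for a notable negative correlation is an arbitrary assumption, and the event names and placings are invented.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def flag_event_pairs(event_ranks, high=0.60, low=-0.40):
    """Return event pairs whose placings correlate above `high` or below `low`.

    `event_ranks` maps event name -> list of placings, with athletes listed
    in the same order for every event. `low` is an assumed cutoff.
    """
    flagged = []
    for first, second in combinations(event_ranks, 2):
        r = pearson(event_ranks[first], event_ranks[second])
        if r > high:
            flagged.append((first, second, round(r, 2), "high"))
        elif r < low:
            flagged.append((first, second, round(r, 2), "negative"))
    return flagged

# Invented placings for five athletes (same athlete order in every list).
event_ranks = {
    "Endurance 1": [1, 2, 3, 4, 5],
    "Endurance 2": [2, 1, 3, 4, 5],
    "Heavy lift": [5, 4, 3, 2, 1],
}

for pair in flag_event_pairs(event_ranks):
    print(pair)
```

The two endurance events come out highly correlated (0.9), much like Pendleton 1 and 2, while both are strongly negatively correlated with the heavy lift, mirroring the clean ladder result for the men.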
For a quick comparison, let's look at this same chart for the men's decathlon at the U.S. Olympic Trials. There are only 16 athletes, but it gives us a taste at least. These correlations are based on the ranks of the athletes, although that is not the way the decathlon is scored:
You see there that there are four different instances of combinations of events that are highly correlated: 100m and long jump; pole vault and long jump; shot put and javelin throw; shot put and pole vault. The shot put/pole vault correlation is a little curious, but the other three make sense. For instance, top-end speed is a key factor in both the 100m and the long jump, so in effect, the decathlon ends up testing that type of speed twice. Keep in mind that this is only a sample of one decathlon meet, but it gives you an idea of how events can overlap.
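One wrinkle with the decathlon is that the raw marks point in different directions: a lower 100m time is better, but a longer long jump is better. Before computing rank-based correlations like these, each event's marks have to be converted to places. Here is a minimal sketch, assuming tied athletes share the average of the places they span; the function name and marks are hypothetical.

```python
def to_ranks(marks, lower_is_better):
    """Convert raw marks to places (1 = best); tied marks share the average place."""
    order = sorted(range(len(marks)), key=lambda i: marks[i],
                   reverse=not lower_is_better)
    ranks = [0.0] * len(marks)
    i = 0
    while i < len(order):
        # Extend j across any run of athletes tied with order[i].
        j = i
        while j + 1 < len(order) and marks[order[j + 1]] == marks[order[i]]:
            j += 1
        average_place = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_place
        i = j + 1
    return ranks

# Hypothetical marks for three athletes, listed in the same order for each event.
hundred_m = [10.45, 10.80, 10.62]  # seconds: lower is better
long_jump = [7.60, 7.10, 7.60]     # meters: higher is better

print(to_ranks(hundred_m, lower_is_better=True))
print(to_ranks(long_jump, lower_is_better=False))
```

Once every event is expressed as places, the same correlation check used for the Games events applies unchanged.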
So in all, I think HQ did a pretty good job in this respect. Again, I'd like to see them maybe scale back the short workouts and hit a few more in the 15-25 minute range, but overall I think they did well.
Grade: B+
5) Make the event enjoyable for the spectators - I'm curious to see how the coverage will be when condensed into 30-minute segments for cable, but I can say that overall, I was happy with the spectator experience in person. Other than the medball toss, in which the fans had absolutely no idea how well anyone did, and some of the early heats of the medball/HSPU, where hardly any athletes could complete the HSPU, I thought the events were enjoyable to watch. The set-up of the events in the arena was generally good, so that you could easily track the athletes' progress through the movements. The "hype men" did a solid job keeping everyone up to date on the athletes to watch, and they even got most of the pronunciations right! All in all, HQ did a good job setting this thing up for the viewer. I'll be interested to see what the response is from the general public when this thing airs in a few weeks.
Grade: A-
So that's it for today - next week, I'll try to tackle the entire season, including the Open and Regionals. Did we pick the right athletes to go to the Games? Which events did we like? Is this current set-up fair? We'll try to figure it all out next week. Thanks for reading!