Monday, June 25, 2012

Are certain events "better" than others?

Today's post will go in a slightly different direction than the previous three. Instead of focusing on the athletes, I'd like to focus on the events themselves and try to address the question (in the context of CrossFit competitions): "Are certain events better than others?"

Intuitively, I think most would agree the answer is yes. An event like, say, "Fran" will almost certainly test a person's fitness better than something like a competition to hit the longest drive with a golf ball. One way to look at this is using the purely CrossFit definition of fitness: of the 10 general physical skills, I would say Fran hits on just about all of them (except maybe accuracy, balance and agility), while a long-drive competition hits on maybe three (accuracy, coordination and speed).

But it gets tricky to prove which events are better than others simply using the 10 general physical skills. Even in the above example, there is some wiggle room in saying which skills are tested by those two events. So let me propose another definition of what makes a good test of fitness: a good CrossFit event will provide a strong indication of an athlete's ability to perform well in a wide variety of OTHER tests. To help explain, consider this example:

My contention here (and most CrossFitters would probably agree) is that "Elizabeth" is a better test of fitness than either a 5K run or a max bench press. For one, it tests more of the 10 physical skills, but I believe it does a better job of indicating which athletes would perform well in a wide variety of other tests. In this example, I assumed that there is generally no correlation between running a 5K and bench pressing. As such, I assigned random rankings to those events. However, a person who is strong on "Elizabeth" will probably do fairly well at both a 5K and a max bench press. So in this example, the person who has the best rank combined on both the 5K and bench press also has the top rank on Elizabeth.

This is an extreme example, and obviously I have rigged it, but it gives us an idea of how we can get a feel for which other events are good tests of fitness. The way we can do this is to look at the correlation between an athlete's finish on one event and their combined finish on a variety of other events. In the above example, the correlation between "Elizabeth" and the combined ranks on the other two is 98%. The correlation for the 5K run vs. the other two is 0%, and for the bench press vs. the other two it is -8%. What this says is that the bench and the 5K don't tell us as much as "Elizabeth." All we need to test is "Elizabeth," because that tells us just as much as testing all three events. Note that we are talking purely about TESTING fitness here, not training for it. It might be the case that the 5K is worthwhile in training, but not as much in testing.
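For anyone who wants to try this at home, here is a minimal sketch of the calculation in Python. It assumes the correlation is a plain Pearson correlation and that "combined" means the sum of ranks on the other events. The function names and the toy ranks are made up for illustration and are not the data from the example above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (otherwise this divides by zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def event_vs_rest(ranks):
    """For each event, correlate its ranks with the summed ranks on all other events."""
    result = {}
    for event, event_ranks in ranks.items():
        others = [r for name, r in ranks.items() if name != event]
        # One combined rank per athlete: the sum across the other events.
        combined = [sum(athlete_ranks) for athlete_ranks in zip(*others)]
        result[event] = pearson(event_ranks, combined)
    return result

# Made-up ranks for five hypothetical athletes (NOT the data from this post).
toy_ranks = {
    "couplet": [1, 2, 3, 4, 5],
    "run": [2, 1, 4, 3, 5],
    "lift": [1, 3, 2, 5, 4],
}

for event, r in event_vs_rest(toy_ranks).items():
    print(f"{event}: {r:.2f}")
```

With these toy numbers the couplet comes out as the strongest predictor of the other two events, which is the same pattern as the rigged example, just less extreme.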

So on this theoretical basis, I decided to look at the events we have seen thus far in 2012. Like I did in my first analysis, I limited the field to athletes who completed all 6 events at regionals, which gives us a sample of about 250 men and 250 women. I used my adjusted regional results (see first two posts), and my measure of how well an athlete did on each event was simply the rank*. For each event, I looked at the correlation between an athlete's rank and his/her combined rank on the other 10 events.
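To make the two ways of measuring an event concrete (the rank used here, and the percent-of-top-score alternative mentioned in the first footnote), here is a rough sketch. The tie handling (tied athletes share the better rank) and the treatment of timed events (best time divided by the athlete's time) are my assumptions for illustration, and the athlete names and times are invented.

```python
def competition_ranks(results, lower_is_better):
    """Rank athletes from 1; tied results share the better rank (an assumption)."""
    ranks = {}
    for athlete, value in results.items():
        if lower_is_better:
            better = sum(1 for v in results.values() if v < value)
        else:
            better = sum(1 for v in results.values() if v > value)
        ranks[athlete] = better + 1
    return ranks

def relative_scores(results, lower_is_better):
    """Score each athlete as a fraction of the top result (1.0 = event winner).

    For timed events this uses best_time / time, which is an assumed reading
    of "percentage of work done relative to the top time".
    """
    values = results.values()
    best = min(values) if lower_is_better else max(values)
    if lower_is_better:
        return {a: best / v for a, v in results.items()}
    return {a: v / best for a, v in results.items()}

# Invented finishing times in seconds for four hypothetical athletes.
times = {"A": 180.0, "B": 240.0, "C": 240.0, "D": 360.0}

print(competition_ranks(times, lower_is_better=True))
print(relative_scores(times, lower_is_better=True))
```

Ranks throw away the size of the gaps between athletes, while the relative score keeps them, so it is worth checking both, even though the two gave broadly the same results here.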

Let's start with a visual representation. Here is a scatter plot of the men's Open WOD 3 ranks (x-axis) vs. the combined ranks on all other events.

It is fairly clear from this plot that a better rank on Open Event 3 (further left) corresponds to better results on the other events. Now let's look at the same scatter plot for men's Regional WOD 1 ("Diane").

Whoa. While there does appear to be some weak correlation, it's pretty clear that the results from "Diane" don't do much to predict how an athlete will do on the other events. I don't find this particularly surprising. For years, we have seen otherwise solid athletes struggle with handstand push-ups, and at the elite level, there is just no way to make up any ground on the deadlifts at a weight as light as 225. So basically we are testing handstand push-ups, which do tell us something about an athlete's overall fitness, but not much - certainly not as much as we can learn by testing 18 minutes of box jumps, medium load push press and toes-to-bar.

So how do the events stack up in terms of correlation**? Well, here are the results, with women first and men second:

Well, would you look at that? Men's Open Event 3 had the highest correlation and Men's Regional Event 1 had the lowest. You'd almost think I chose those two graphs on purpose. It is clear, though, that for both men and women, Open Event 3, Regional Event 4 and Regional Event 2 were strong predictors of success across the board, while Regional Event 3 and Open Event 1 did not tell us as much. Regional Event 1 did have a somewhat higher correlation for women than for men, possibly because the event was not so blazing fast.

We can see another trend from this chart as well: events with more movements tend to be better predictors of overall fitness. While this is not surprising, I think it is an important point. Single-modality events simply do not tell us as much about an athlete as a couplet, triplet or chipper***. I do not believe we should eliminate them from competitions for this reason, but I do think that there should be some consideration given to weighting these events less heavily. The Games struck a good balance last year, in my opinion, by grouping the single-modality events together into "Skills Tests," which didn't put as much weight on any one of those movements. I think giving a max effort snatch or an extremely heavy dumbbell snatch the same weight as something like Regional Event 4 may not be appropriate (somewhere, Chris Spealler is nodding his head right now).

This is certainly not a topic with one absolute right or wrong answer. I would be very interested to see other opinions, not only on what I have done, but also on what defines a "good" CrossFit event.

*Note: I also looked at this another way, which was to give each athlete a score on each event that was equal to the percentage of work done relative to the overall top score/time. For now, I will ignore those results because they are generally the same as these.
**To give some perspective to what these correlations mean, you can square the values to get the "r-squared." Men's Open Event 3, for instance, has an r-squared of 56%, while Men's Regional Event 1 has an r-squared of 20%. One rough interpretation of the r-squared is that it tells you how much of the variance in the other events' scores is explained by the event we are using as a predictor. So Men's Open Event 3 explains about 56% of the variance in the other events' scores.

***Yes, Regional Event 5 actually had two movements, but the double-unders didn't have much of an impact other than as a tiebreaker. You could also argue that Regional Event 3 was basically only one movement, too, since the impact of the running was negligible for most athletes.
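As a quick check on the r-squared footnote, the relationship runs both ways: squaring a correlation gives the r-squared, and taking the square root of an r-squared gives back the size of the correlation. The snippet below only uses the two rounded r-squared values quoted in that footnote, so the results are approximate.

```python
import math

# r-squared values quoted in the footnote above (already rounded).
r_squared = {"Men's Open Event 3": 0.56, "Men's Regional Event 1": 0.20}

# The square root recovers only the magnitude of the correlation; the sign is lost.
implied_r = {event: math.sqrt(r2) for event, r2 in r_squared.items()}

for event, r in implied_r.items():
    print(f"{event}: |r| is about {r:.2f}")
```

So the gap between the best and worst men's events is roughly a correlation of 0.75 versus 0.45.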


  1. This one is my favorite, though I personally thought Open WOD 1 was the best predictor ;)

  2. I know you don't really have the data, but it would also be interesting to see how well different single-modality events (namely different lifts) predict performance across all events. For example you'd expect the snatch (complex movement requiring total body strength and explosion) to be a better indicator than say bench press, but picking between the squat and snatch might not be so easy. Realistically you might expect the snatch to be better because snatch strength will correlate with squat strength as well as testing overhead strength etc., but maybe that isn't the case...

  3. Unknown - Whatever I can do to convince Dave Castro that 7 minutes of burpees is NOT a good event, I'll do it. That was terrible and basically eliminated my regional chances right off the bat. I loved Open WOD 3, but I was kind of surprised it did so well here b/c it didn't have any squatting involved, which is such a key element of CrossFit.

    NT - You're right, I don't have the data for this at the moment, although I suppose I could look around for some local throwdowns to find one with a max bench press/deadlift/etc. After the Games, I plan to repeat this analysis, but with all the Games events included, too. Obviously that limits my sample size to under 50 men and under 50 women, but I think it would still be interesting.

  4. I should clarify that Open WOD 1 wasn't a terrible event; it was just terrible to have to do 7 minutes of burpees.

  5. What are the labels for the axes on both diagrams mate?

    1. Sorry, I probably should have made that more explicit. For each point, the location on the x-axis represents the athlete's rank on that particular event and the location on the y-axis represents the athlete's combined ranks on all other events.

      So on the second graph, we see a point out on the bottom right at about (280,600). This means the athlete finished 280th on Regional Event 1 but his combined ranks on all other events I included (there were 10 others) were only 600. So he averaged 60th place on everything else despite doing quite poorly on Regional Event 1. You'll see on the graph of Open Event 3 that there are far fewer points in that portion of the graph.