Friday, April 19, 2013

A Look Back at the 2013 Open: Part II


Note: For details on the dataset I am using, including which athletes are and are not included, please see the introduction to Part I.
 
For Part II of my look back at the 2013 Open, let's transition into looking at some results from this year's Open, while still keeping the programming in mind. As I mentioned in the intro of Part I, one way to judge the effectiveness of a workout (for competition purposes) is to see how well it correlates with success in a variety of other events. In other words, an event is "good" if the best athletes tend to finish near the top. Although we don't yet have regional results to add to the mix, we can still look at how well each event predicted success in the other four Open events. By this metric, event 13.4 was probably the best test this season. For the entire field, a male athlete's rank on 13.4 was 91% correlated with his rank on all other events combined (for women, 90%). All other events ranged between 83-86% for men and 80-88% for women.
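
For anyone who wants to reproduce this kind of number, here is a minimal sketch of the calculation in Python. It assumes the data is held as a dictionary of per-event rank lists (one entry per athlete, in the same order), that "rank on all other events combined" means ranking athletes by the sum of their ranks on the other events, and that the correlation is a plain Pearson correlation applied to the two rank lists. The function names and the data layout are hypothetical, not the exact process behind the figures above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank(values):
    """1-based ranks (1 = smallest value); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every athlete tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def event_vs_rest(event_ranks, event):
    """Correlate rank on one event with rank on all other events combined.

    event_ranks maps an event name to a list of ranks, one per athlete.
    """
    others = [name for name in event_ranks if name != event]
    athletes = range(len(event_ranks[event]))
    # Combined points on the other events: lower is better, as in the Open.
    combined = [sum(event_ranks[name][a] for name in others) for a in athletes]
    return pearson(event_ranks[event], rank(combined))
```

Calling `event_vs_rest(event_ranks, "13.4")` on a dictionary keyed by event name would return a value between -1 and 1, with values near 1 meaning the event ordered athletes much like the rest of the Open did.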

If we limit the field to the top 1,000 finishers for men and top 500 finishers for women (roughly 2% of the final field for each), re-rank everyone, then look at the correlations, they do drop off. This makes sense, because it is not inconceivable that a top athlete might fall toward the bottom of that elite group on one event, but a top athlete falling anywhere beyond the top 5-10% across the entire field is unlikely. Looking at the events across this elite group, 13.5 had the highest correlation for men at 39%, while 13.4 was still highest among women at 53%. Among this elite group, 13.2 proved to be the weakest test for both men and women (23% for each), which is not particularly surprising to me. The difference between top scores on this workout often came down to tiny fractions of a second on each rep, not to mention that there was widespread variation in judging standards on this one (which has been discussed on the internet ad nauseam already).
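
The "limit the field, then re-rank" step can be sketched like this. It assumes the same hypothetical layout of one rank list per event, picks the top finishers by lowest total points, and re-ranks each event among only those athletes, with tied athletes sharing the better rank. The tie handling and function names are assumptions for illustration.

```python
def rerank(values):
    """1-based ranks where 1 = smallest value; tied values share the lowest rank."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    return [first_position[value] for value in values]


def top_slice(event_ranks, n):
    """Keep the n athletes with the lowest total points, re-ranked within that group.

    event_ranks maps an event name to a list of ranks, one per athlete.
    """
    events = list(event_ranks)
    athlete_count = len(event_ranks[events[0]])
    totals = [sum(event_ranks[e][a] for e in events) for a in range(athlete_count)]
    # sorted() is stable, so athletes tied on points keep their original order.
    keep = sorted(range(athlete_count), key=lambda a: totals[a])[:n]
    return {e: rerank([event_ranks[e][a] for a in keep]) for e in events}
```

The re-ranked lists can then be fed into whatever correlation calculation was used for the full field. Because every athlete left in the slice is strong, a single mediocre event moves someone a long way within the group, which is one way to see why the correlations shrink.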

To put these numbers into some context, I looked at the correlations for the five events last year for the men's field. Across the entire field, events 12.3, 12.4 and 12.5 had correlations between 81-88%, but the two single-modality workouts, 12.1 and 12.2, only had correlations of 69% and 63% respectively. Limiting the field to the top 500 (roughly 2%), the correlations for 12.1 and 12.2 were just 15% and 12%. Not to beat a dead horse, but the multi-movement events just give us more information about the athletes, which is key in a competition as short as the Open.

Visually, these scatter plots help illustrate the idea. Each chart shows a random sampling of about 500 athletes from across the entire field (because showing them all would be too much for Excel, which is why I need to improve my R skills). On each chart, the location on the x-axis represents the athlete's rank on that event, and the location on the y-axis represents that athlete's combined rank on all other events. Notice how much clearer the relationship is in 13.4 as compared with 12.1. Virtually no one did amazingly well on 13.4 without also performing solidly across all five workouts.




Moving on, with the 2013 Open web site being set up in conjunction with last year's site, it was much easier to track which competitors were returning and which were new. With that information available, I was curious to see how the returning athletes fared as compared to the first-timers. Would they have more of an advantage on certain events? 13.3, perhaps? Let's see.*


Simply put, the returning athletes did fare better than the newcomers (lower percentiles in this case mean a better ranking). However, the gap was not markedly different on any one event. I anticipated that returners would have an even bigger advantage on 13.3, given that it was in last year's Open, but this does not appear to be the case. Interestingly, among the top 1,000 male finishers, new athletes actually fared slightly better than returning athletes on 13.2. This is the only event where this was true, and it was not true for women.

What would be ideal here is to have the age for each competitor so I could control for any differences in the age mix between the three categories of athletes, but alas I do not have that information at the moment. My hunch is that the age mix is likely pretty similar between the three groups, however.

Since 13.3 was an exact repeat of 12.4, it should provide some good insight into how the community is progressing over time. In order to better compare performances on most events, I like to adjust the scores to be in terms of stations. So in this case, 150 wall balls is 1.0 stations, 90 double-unders is 1.0 stations and 30 muscle-ups is 1.0 stations. For this event in particular, this isn't a perfect solution, because 90 double-unders is a considerably shorter station than the other two, but I think this still gives us an improvement. We know that 1 wall ball is much less difficult than 1 muscle-up, so we try to reflect that.
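
Here is a minimal sketch of that conversion. The station sizes (150 wall balls, 90 double-unders, 30 muscle-ups) come straight from the workout; the function name is made up, and the handling of scores past 270 reps assumes the athlete simply starts another round of the same three stations.

```python
# Reps per station, in workout order: wall balls, double-unders, muscle-ups.
STATION_SIZES = [150, 90, 30]


def reps_to_stations(reps):
    """Convert a raw rep count on 12.4/13.3 into stations completed."""
    round_size = sum(STATION_SIZES)  # 270 reps per full round
    full_rounds, remainder = divmod(reps, round_size)
    stations = float(full_rounds * len(STATION_SIZES))
    for size in STATION_SIZES:
        if remainder >= size:
            # The whole station was finished.
            stations += 1
            remainder -= size
        else:
            # Partial credit for the station the athlete was working on.
            stations += remainder / size
            break
    return stations
```

Under this scheme a score of 240 reps (all the wall balls and double-unders) is exactly 2.0 stations, and each muscle-up after that is worth 1/30 of a station, compared with 1/150 of a station for a wall ball.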

After making that adjustment, I compared the distribution of scores across all competitors on 13.3 vs. the distribution of scores across all competitors on 12.4. The charts for men and women are similar, but since I've been showing men first up to this point, let's show the graph of the ladies here. The chart below is basically made up of two histograms, which I've shown as lines because it makes it easier to compare their shapes. I've also omitted scores below 0.75 stations (113 reps) because the graph tails off quickly and then has a spike at the very bottom due to all the scores of 1.


I found this truly remarkable. Despite doubling the size of the field this year, the scores across the entire community were distributed in nearly identical fashion. Additionally, the percentage of women completing a muscle-up inched up from 9.9% to 10.2%, and the percentage of women completing a muscle-up of those that reached the muscle-ups ticked up from 27.0% to 28.6% (these numbers are not easy to discern from the graph). For the men, the percentage completing a muscle-up actually dropped from 37.1% to 35.6% and the percentage completing a muscle-up of those that reached the muscle-ups stayed flat at 73.6%. So essentially, no real change from last year.
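
The two muscle-up percentages are easy to mix up, so here is a minimal sketch of how each could be computed from raw rep counts. It assumes "reached the muscle-ups" means a score of at least 240 reps (150 wall balls plus 90 double-unders) and "completed a muscle-up" means a score of 241 or more; the function name is hypothetical.

```python
# Reps needed to finish the wall balls and double-unders.
REPS_BEFORE_MUSCLE_UPS = 150 + 90


def muscle_up_rates(scores):
    """Return (share of the whole field, share of those who reached the rings)
    that completed at least one muscle-up."""
    reached = sum(1 for s in scores if s >= REPS_BEFORE_MUSCLE_UPS)
    completed = sum(1 for s in scores if s > REPS_BEFORE_MUSCLE_UPS)
    of_field = completed / len(scores)
    of_reached = completed / reached if reached else 0.0
    return of_field, of_reached
```

The first number uses the entire field as the denominator, while the second only counts athletes who made it to the rings, which is why the second is always at least as large as the first.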

Does this mean that CrossFit somehow doesn't work? Are athletes not improving at all with a year's worth of training? No. What it means is that the newer athletes are coming in and putting up scores at the lower end of the spectrum, offsetting the athletes who competed last year and are improving. How do we know this? Well, let's look at the same chart as above, but this time only with athletes who finished all five events in 2012 and 2013.


You can clearly see that the distribution of scores on 13.3 is heavier at the higher end than the distribution for 12.4. For instance, you can see that about 15% of athletes scored between 2.0-2.25 (240-247) on 13.3, while only about 8% did so on 12.4. Further, the percentage of women completing a muscle-up jumped up from 12.6% to 23.9%, and the percentage of women completing a muscle-up of those that reached the muscle-ups went up from 30.2% to 40.9%. For the men, the percentage completing a muscle-up skyrocketed from 49.8% to 70.9% and the percentage completing a muscle-up of those that reached the muscle-ups moved from 77.8% to 85.3%. Clearly, the athletes who came back this year worked on wall balls, double-unders and muscle-ups, and it paid off on 13.3.

Finally, let's wrap things up by taking a look at how many athletes stuck with it through all five weeks of the Open. As has been noted in the past, scores from athletes who only compete in the first event or two are not removed from the standings for the weeks in which they did compete. Strangely, though, athletes who skip week 1 but compete later do not count at all. I would argue this is not exactly fair, as the first events essentially get weighted more because the size of the field is larger. My analysis, however, did not show that this would have had a significant effect on the final rank of competitors.

Still, it is interesting to look at when and how rapidly athletes drop out of the field. Here are stacked bar charts for the men and women in this year's field. If an athlete returns to the field after dropping out, that score after returning does not count in my analysis.
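
The counting rule in that last sentence can be sketched as follows. It assumes each athlete's results are a list of five scores in event order, with `None` (or 0) marking a week with no score; the layout and function names are hypothetical.

```python
def events_before_dropout(scores):
    """Count consecutive events completed from week 1 until the first missed week.

    Scores posted after a missed week are ignored.
    """
    completed = 0
    for score in scores:
        if not score:  # None or 0 is treated as a missed week
            break
        completed += 1
    return completed


def attrition_counts(athletes, event_count=5):
    """Tally how many athletes completed exactly 0, 1, ..., event_count events."""
    counts = [0] * (event_count + 1)
    for scores in athletes:
        counts[events_before_dropout(scores)] += 1
    return counts
```

An athlete with scores in weeks 1, 3 and 4 but not week 2 is counted as completing only one event under this rule.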


You can see that the only thing out of the ordinary there appears to be the number of women finishing the first 3 events. My hunch is that this is due to the large number of women who could not complete a 95-lb. clean and jerk. In fact, let's look back at how the field tailed off for men and women the past two years, from a slightly different perspective.


Again, the only drop that looks out of line is the women going from 13.3 to 13.4. Between those events, the field shrank by 19%. In the past two years, the next-largest drop was 13% (men between 13.4 and 13.5), and all others were between 9% and 11%. I would have liked to look back at 2011 to see if a similar percentage dropped off from 11.2 to 11.3 (the heavy clean and jerk), but the 2011 Games site is considerably more challenging to handle. If anyone has an answer there, I'd certainly be curious.
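
For clarity, the drop percentages here are week-over-week: the number of athletes lost between two consecutive events, divided by the size of the field on the earlier event. A minimal sketch follows, with a hypothetical function name; the field sizes in the comment are made-up numbers for illustration, not actual Open counts.

```python
def weekly_drops(field_sizes):
    """Fractional shrinkage of the field between each pair of consecutive events."""
    return [
        (previous - current) / previous
        for previous, current in zip(field_sizes, field_sizes[1:])
    ]


# Made-up example: a field of 1,000 that goes to 900, then 810, then 656
# would give weekly_drops([1000, 900, 810, 656]).
```

Measuring against the previous week, rather than against week 1, keeps a late drop from looking small just because the field had already thinned out.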

Anyhow, that's it for now. There may be more topics to re-visit from this year's Open, but I think it's time to turn our attention to Regionals (even for the vast majority of us who won't be competing). See you all in a few weeks when the events are announced!

*"Returning incomplete competitors" refers to those who started last year's Open but did not complete all 5 events.

Thursday, April 18, 2013

A Look Back at the 2013 Open: Part I

After a somewhat longer-than-expected break, I think it's high time to look back at the 2013 CrossFit Games Open, in its entirety, and see what we can learn. For sure, there will still be plenty of stones left unturned, in part because of limitations in the data I've been able to get to this point, but also because I believe the Regionals and Games can give further insight into the Open. For instance, last year I looked at athletes who competed in all 6 events at regionals and compared their results across all 11 events to that point. This allowed me to see which individual events were predictive of success across a wide variety of events. Obviously, we can't do that quite yet, at least not to that extent.

That being said, I think there is enough data out there to break things down adequately and understand more about the state of our sport and possibly where we're headed. Due to the amount of material, I'll be breaking this post up into two parts. To start, here is a list of topics I plan to cover, followed by a list of things I will not be touching on in this post:

Will cover:
  • Breakdown of the programming of this year's Open, much like last year's post "What to Expect from the 2013 Open and Beyond" (Part I)
  • Correlations between events this year, compared with last year (Part II)
  • Comparison of performance by new competitors vs. returning athletes (Part II)
  • Comparison of 12.4 and 13.3 results, in a fair amount of depth (Part II)
  • Attrition in this year's Open, compared with last year (Part II)
Will not cover:
  • Comparison between regions (don't have region information on the data at the moment)
  • Breakdown by age group (don't have age information, either)
  • Predictions for regionals
  • Probably lots of other subjects that I simply didn't think of. If you have suggestions for future analysis, by all means, post to comments or email me.
Finally, here are some notes on the data set I am using for any work dealing with the results of the Open:
  • Excluded any athletes who did not complete all 5 events. This simply makes for fairer comparisons. I did look at all scores in order to calculate the number who dropped off each week, but that is it.
  • Masters competitors are lumped in with everyone else. As mentioned above, I don't have age information in this dataset since I pulled it straight off the worldwide leaderboard. This is not ideal, but I made sure to do the same when looking at last year's data to make comparisons. Only about 20% of the field are in the Masters divisions, with only 2% in ages 55+ (where the workouts are slightly scaled).
  • I have re-ranked athletes on each event among the athletes in this dataset.
  • Athletes were identified as returning athletes if their full name was in last year's dataset. There are multiple athletes with the same exact name, but I had no way around this without region or age information. I assume any impact here is minor. The one manual fix I made was to make sure the Ben Smith at the top of the leaderboard was matched up with the correct Ben Smith from last year's data. 
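
The name-matching rule in the last bullet can be sketched as below. The matching on full name is as described above; the whitespace and case normalization is an added assumption, and the function names are hypothetical. "Jane Doe" in the usage note is a placeholder, not a real entry.

```python
def normalize(name):
    """Collapse extra whitespace and ignore case so trivial differences still match."""
    return " ".join(name.split()).casefold()


def flag_returning(this_year_names, last_year_names):
    """Return (name, is_returning) pairs for every athlete in this year's data.

    Two different athletes who share a full name cannot be told apart here,
    which is the limitation noted in the bullet above.
    """
    previous = {normalize(name) for name in last_year_names}
    return [(name, normalize(name) in previous) for name in this_year_names]
```

For example, `flag_returning(["Ben Smith", "Jane Doe"], ["Ben Smith"])` marks the first name as returning and the second as new. With region or age available, the lookup key could be extended to cut down on false matches.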

OK, with that out of the way, let's get rolling.

We'll start with the programming this year. As I mentioned in my prior posts, I felt this year's programming was an improvement over last year, if for no other reason than we eliminated the single-modality events. I also felt the events this year were balanced, with specialists unlikely to finish particularly high on any given event, but yet there was enough diversity that we weren't testing the same thing over and over again. As I started looking into the programming further, it became clear that 2013 was, in many ways, a blend between 2011 and 2012. First, here is a basic comparison of the average loading* used each year in the men's competition (the pattern is the same for women).


The average relative weight was down slightly, but very much in the same neighborhood as previous years, while the percent of points from lifting and the load-based emphasis on lifting (LBEL) were both basically equal to the average of the previous two years. Now let's take a look at which movements** have been used across the three years, and how they have been valued (1.00 equals one full event).


You may notice that we have not introduced a single new movement since the Open began in 2011. I was glad to see we brought back the clean and deadlift this year, but notice which movement is at the bottom: overhead squat. In my mind, this is the quintessential CrossFit lift, and yet it's accounted for only 2% of the points in the Open over the past three years. Sad but true.

You'll also notice that in general, the Olympic-style lifts and derivatives (thruster, overhead squat), as well as basic gymnastics movements, are the biggest keys to Open success. Running, rowing, powerlifting, kettlebells, wall balls, high skill gymnastics, strongman lifts - these are all of minor importance until you reach beyond the Open. If you want to make Regionals, work on your snatch, clean and jerk, burpees, thrusters and pull-ups. If you can't do those things extremely well, it doesn't matter if you can bang out 25 consecutive ring handstand push-ups or lift a 300-lb. atlas stone. It doesn't even matter what your time is on a 5K run. That's not to discount the usefulness of these other skills in training; it's just that you're not likely to see that tested until at least the regionals.

Finally, here's a chart I put together showing the relationship between loading, the number of movements, and the length of workout in the past three years of the Open. In the past, I probably haven't spent as much time as I should looking into the time domains in workouts across the Games season, partly because after the Open, the workouts often have a set workload, not a set time. But for the Open, we know the time domain exactly***. In the chart below, the x-axis represents the time domain, the y-axis represents the number of movements and the size of each bubble represents the LBEL of that particular workout (roughly how "heavy" each workout was). Note that I considered 11.3 a single-modality event despite it technically being a clean and jerk.


What we see is fairly typical of CrossFit programming. The fewer movements are involved, the shorter the workout is likely to be. In training, this is generally true because of issues like Rhabdo that come into play when you hammer one muscle group too much. Also, the relationship isn't quite as strong, but typically the heavier the load, the shorter the time domain. Again, I think part of this is simply being smart and safe, since going heavy for an extended period of time lends itself to potential injury.

That's it for Part I. In Part II, I'll be focusing more on the results of this season's Open. See you soon.


*For background on these metrics, please see my post "What to Expect from the 2013 Open and Beyond." Just as I did last year, the average weight load on the burpee-snatch workout was calculated based on the average score from the regional-level competitors. Therefore, that workout was considered fairly heavy, despite the fact that many beginner and intermediate athletes would not lift more than a 75/45 lb. snatch.
**For 13.4, I considered the toes-to-bar to be worth 50% of the workout, the clean to be worth 25% and the jerk to be worth 25%.
***For 13.5, which had a varying time limit, I used the average time spent for athletes in the top 1,000 worldwide (roughly the regional competitors). This turned out to be 8:00 for men (seen in the chart) and about 5:20 for women.

Saturday, April 6, 2013

Quick Hits: Open Week 5 Initial Thoughts

Before I start, I'll go ahead and say that this will be a relatively short entry today. I'm going to try and cover the Open in its entirety in another post next week. In that post, I'll be updating a lot of the numbers I put together in my post last fall titled "What to Expect From the 2013 Open and Beyond." With another year of data under our belt, it should be interesting to see what's changing, what's been consistent, and what we can expect in the future.

OK, without further ado, let's dive into my thoughts on 13.5.

  • The combination of thrusters and pull-ups wasn't a surprise to anyone, and I don't have a problem with either of those two movements, but I'll re-iterate my disappointment that overhead squats were left out of the Open again this year. And while I know Fran is the most well-known workout in CrossFit, I personally think it's being put on too high a pedestal. Between the 2012 Games ending with Fran and the last three Opens ending with some sort of Fran-type workout, I think HQ is getting a little repetitive. Remember, "unknown and unknowable" was the Games mantra not long ago, and it seems we may be getting away from that in some respects.
  • I like the idea behind this workout, but to me, the time caps were simply too tight. As of Saturday evening, only 5.9% of men and 1.2% of women had reached beyond the first time cap, and only 16 men and 2 women had reached beyond the second time cap. So what we had for virtually the entire field was a 4-minute AMRAP. There will be plenty of men and women who reach the regionals that did not break into the second time period, so we really didn't get to test their ability to do more than an all-out sprint. And while 4 minutes was enough to get in a pretty nasty workout (as long as you have decent chest-to-bar pull-ups), I think the vast majority of the field was deprived of experiencing this new concept. Again, I like the concept, so why not let more people be exposed to it? My suggestion: a 6:00 time cap, followed by a 5:00 time cap, followed by a 4:00 time cap, followed by a 3:00 time cap, etc.
  • It appeared to me that scores were coming in much slower than in past weeks. I'm wondering if that was because a lot of pretty good athletes missed the time cap on their first attempt and were shy about submitting their score before trying again.
I've gone through and done the leveraging analysis once again for this one, just as I did for 13.2 and 13.4. Because of the unusual time cap of this one, I chose to focus solely on the first 4 minutes, because that's what the workout is really about for almost all of us. Even many of those who got beyond 4:00 had to really focus on simply making it to the first marker as opposed to concerning themselves with pacing for 8:00.

To do this analysis, I looked at 5 regional-level athletes, three of whom finished the first three rounds with :30 or less remaining and two of whom just missed the cut-off. So this is from the perspective of an athlete really looking to break into the second time cap. For background on this type of analysis, please read my post "Why It (Usually) Pays To Be Well-Rounded."


What we can see right off the bat is that this workout is more balanced between the two movements than 13.2 or 13.4, probably more than any other workout this year (I actually did this leveraging analysis for 12.4/13.3 in my original leveraging post, but I did not do it for 13.1 due to the varying weights). Both movements are positively leveraged, meaning you can't easily compensate for a deficiency in one movement with a strength in the other. This is not a coincidence - both movements take about the same time for these elite athletes, and the workout calls for an even number of reps of each in every round. This is one reason why Fran is a great test of fitness (though still not the end-all, be-all of fitness).

With 13.5, there is very little margin for error on either movement if you want to get beyond 4:00. From my experience, it does seem like the chest-to-bar pull-ups are going to be the limiting factor for most people who are more middle-of-the-pack, but for the elite athletes, being solid on the thrusters is also crucial.

Finally, here are two charts showing how the athletes' pace decayed on each movement throughout those 4 minutes. Each athlete's final score on the workout is in parentheses on the right of the chart.



We can see that the middle round of pull-ups is where things started to separate for these athletes. From my own personal experience, this was absolutely true. I went unbroken until that point, but then the wheels came off and I couldn't do more than 4 pull-ups at a time from there on out (I ended with 80 reps).

Well, I hope everyone has enjoyed this year's Open. Good luck to those giving 13.5 one more shot tomorrow and to those advancing to regionals. See you all next week for my overall Open wrap-up.

Monday, April 1, 2013

Fun With SWAGs: What Will 13.5 Be?

Welcome back. We're closing in on the end of the third CrossFit Games Open, and I think for the most part, it's been an enjoyable year for me. Without question, HQ has to address the judging issues that have surfaced this year (most notably in 13.2 but also in 13.4 with a big flap over Jason Khalipa's toes-to-bar), but overall I think we've seen an improvement from last year's Open. Participation is way up, the Games site has been running much more smoothly, and in my opinion, the programming has been better.

Let's talk a bit more about the programming. As I've said in the past, I don't think you can afford to program single-modality events in a competition with five events. You're just introducing too much variance into the equation and potentially letting some specialists into the Regionals ahead of more well-rounded competitors. Couplets and triplets simply tell us more about each athlete, and we need to gain a lot of information about these athletes in order to trim the field from 5,000+ down in some cases to only 48. To this point, HQ has not used any single-modality events, and it shows when you look at the standings. The 48th place competitor in the Central East currently has 273 points through 4 events, which is less than the 48th place competitor in that region had through just 3 events last year with half as many athletes. What does this mean? It means that a) my projections for the necessary points to qualify for regionals were way too high; and b) the cream is rising to the top faster. The best athletes are doing relatively well on every event.

Has the programming been perfect? No, I don't think it has. To be sure, 13.2 was a judging disaster, and in my opinion, weighted too heavily toward the box jumps. I liked 13.4, but I think it would have been a more balanced workout with the toes-to-bar first. I also don't think they've gone heavy enough to this point, meaning you may see the standings shake up quite a bit at the regionals when things inevitably jump up in weight.

That being said, I think it's been an improvement over last year. And hindsight is 20/20 on some of this stuff; I'm not claiming that I could have programmed five workouts that would, at the end of the day, have been preferable.

Alright, enough chit-chat. Let's get down to the business at hand: making a (not really that wild) guess at 13.5. Anyone who's been paying attention knows two movements that are still on the table: thrusters and pull-ups. I'd be stunned if both these movements don't make an appearance in 13.5, so in my mind, the big questions are:

1) Will they include overhead squats?
2) Will the pull-ups be chest-to-bar?
3) How heavy will they go on the thrusters?
4) What will the time domain be?
5) What will the rep scheme be?

Let's tackle these one at a time. I think it's a toss-up whether overhead squats get used. I think one of the major disappointments last year in the Open was the lack of overhead squats, which is just about the quintessential CrossFit movement. I'd personally be disappointed if they left them out again, but they may not want to mess with the "Fran" concept (thrusters and pull-ups) for the final workout. But since I have to pick one way or the other, I say they include overhead squats.

Next, I think the pull-ups will be chest-to-bar. I was surprised when they did this in 2011, because there are a lot of otherwise fit women who can't do these. But from a judging perspective, they are simply much easier to handle than chin-to-bar. Advantage: chest-to-bar.

Third, I think the weight on the thrusters is an interesting question. Overall, the average relative weight is still lower (0.86 men, 0.56 women) than 2011 (0.93, 0.63) or 2012 (0.99, 0.61). The load-based emphasis on lifting (LBEL) is 0.43/0.28, compared with 0.52/0.35 in 2011 and 0.43/0.27 in 2012. Thrusters at 100/65 lbs. comes in at 0.91/0.59, which would bring the average relative weight up slightly this year but still keep us lower than average. At the end of the day, though, I think HQ really likes the 100/65 lb. weight too much to switch.

My guess on the time domain is that it's between 7-12 minutes. The average time this year has been 11:30, slightly over the two-year average of 11:00. I doubt they go long this time. Let's go with 8:00, because that puts us precisely at the 54 total minutes from 2012.
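
The arithmetic behind that 8:00 pick, as a quick sketch (the helper names are made up): four events averaging 11:30 add up to 46:00, so an 8:00 finale brings the season to 54:00.

```python
def to_seconds(clock):
    """Convert an 'M:SS' string into seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def to_clock(total_seconds):
    """Convert seconds back into an 'M:SS' string."""
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"


# Four events so far this year at an average of 11:30 each.
so_far = 4 * to_seconds("11:30")
# Add the guessed 8:00 time domain for 13.5.
projected_total = so_far + to_seconds("8:00")
```

Here `to_clock(so_far)` gives 46:00 and `to_clock(projected_total)` gives 54:00, matching the 2012 total mentioned above.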

Finally, the rep scheme. After 13.4, I think we can bury the ascending ladder scheme for this one. Beyond that, let's face it: this is a wild guess. So with all this in mind, here's my scientific wild-ass guess for 13.5:

In 8 minutes, complete 40 overhead squats (100/65), followed by as many rounds as possible of 10 pull-ups (chest-to-bar), 10 thrusters (100/65)

Again, feel free to throw your guess into the mix in the comments. We were awfully close last week, and I think someone might be able to nail it this time around. Good luck to everyone on 13.5!