CFG Analysis: A Fairer Regional Comparison: 2013 Edition

Today's post will mark an anniversary of sorts for my blog. Roughly a year ago, I put up my first post, in which I produced adjusted worldwide Regional rankings, accounting for the advantage gained by athletes in the later weeks. The post went up with little fanfare, and I imagine many people who started coming to this site via Rudy Nielsen's Outlaw Way blog may not have even read this original post. In hindsight, it was probably overly technical, but give me a break: it was my first post.

Well, today I'll be doing essentially the same thing for this year's regionals. But first, let me clarify what I believe these rankings really mean. They are not my predictions for the Games, and they are not a ranking of the best CrossFitters in the world. Rather, they are an attempt to understand who would win if the exact same Regional events played themselves out at the Games. If all the Games qualifiers competed together, at the same time, using the Games scoring system, who do we believe would win? Certainly this gives us a good starting point to discuss who may win the Games (hint: Rich Froning has a shot!), but it's not a prediction for the Games.

Many of you have probably seen a version of these rankings at http://crossfitregionalshowdown.com/leaderboard/men and http://crossfitregionalshowdown.com/leaderboard/women. These are certainly informative, and in fact I grabbed all the original results from this site, so I really appreciate the work they've done. But my feeling has always been that the athletes in the later weeks have an advantage that isn't captured in these rankings. It stands to reason that having additional weeks to prepare, watch other competitions and game plan the events has to give athletes some advantage.

To get an idea of the advantage across the entire field, I ranked all athletes that finished all 7 Regional events based on their worldwide Open rank. Then I ranked them by their worldwide Regional rank (from the sites listed above). For each region, I looked at the average change from Open rank to Regional rank, then plotted them along with the week in which each region competed. This is shown in the chart below, with lower numbers representing improvements from the Open to Regionals.
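Here is a minimal Python sketch of that comparison. It assumes each athlete is a dict with hypothetical keys `region`, `week`, `open_rank` and `regional_rank` (worldwide ranks), already filtered to athletes who finished all 7 events. Both ranks are re-ranked 1..N within that group, and each region's mean change is returned next to its week. Ties are broken by input order, which is a simplification.

```python
from collections import defaultdict


def rerank(values):
    """Return 1-based positions of each value when sorted ascending.
    Ties are broken by input order (a simplification)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    pos = [0] * len(values)
    for place, i in enumerate(order, start=1):
        pos[i] = place
    return pos


def average_rank_change_by_region(athletes):
    """Map region -> (week, mean of regional position minus Open position).
    Negative values mean athletes did better at Regionals than in the Open."""
    open_pos = rerank([a["open_rank"] for a in athletes])
    reg_pos = rerank([a["regional_rank"] for a in athletes])
    sums = defaultdict(float)
    counts = defaultdict(int)
    weeks = {}
    for a, o, r in zip(athletes, open_pos, reg_pos):
        sums[a["region"]] += r - o
        counts[a["region"]] += 1
        weeks[a["region"]] = a["week"]
    return {reg: (weeks[reg], sums[reg] / counts[reg]) for reg in sums}
```

Plotting the returned mean change against the week gives a chart like the one described here.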

You can see that while there is variation, the latter weeks clearly tended to have better scores from athletes, relative to their Open performance. This isn't a perfect metric. For one, some elite athletes don't take the Open as seriously as others; for another, it gets skewed at each end, because the top athletes in the Open can only get worse and the worst athletes in the Open can only get better. Still, this tells us that something is probably going on. For women, the same effect is there, but it's not quite as pronounced.

To attempt to account for this, I performed a series of two-variable linear regressions. For each event, I used the event result (the actual result*, not the ranking) as the dependent variable, and for one of the independent variables, I used the week of the athlete's region. For the other independent variable, I used either the athlete's 2012 Regional ranking (if he/she competed individually in 2012) or the athlete's 2013 Open ranking (if he/she did not compete individually at Regionals in 2012). Then I looked at the coefficient for the week of competition - this would give an indication of the impact of the week of competition, after controlling for the athlete's ability.
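The regression step can be sketched as an ordinary least squares fit with two predictors. This pure-Python version is only illustrative, not the tool used for the post: `y` would be one event's results, `x1` the week of each athlete's region, and `x2` the ability proxy (2012 Regional rank or 2013 Open rank). The returned `b1` is the week coefficient.

```python
def ols_two_predictors(y, x1, x2):
    """Least squares fit of y = a + b1*x1 + b2*x2; returns (a, b1, b2).
    Solves the normal equations on mean-centered data (Cramer's rule)."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (t - my) for u, t in zip(x1, y))
    s2y = sum((v - m2) * (t - my) for v, t in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are constant or perfectly collinear")
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2
```

On noise-free synthetic data the fit recovers the generating coefficients exactly, which is a quick sanity check before running it on real results.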

I then divided the coefficient by the average result on that event for all athletes to get the percentage impact of the week of competition. Based on some of the summary statistics and a bit of judgement, I arrived at a final adjustment factor for each event (some were 0 if the results didn't appear significant enough). If the adjustment factor for event 1 was -1.0%, then for every week beyond the midpoint (week = 2.5), I adjusted the athlete's score up by 1.0%, and for every week before the midpoint, down by 1.0%. An athlete who competed in week 4 is 1.5 weeks past the midpoint, so if his score was 6:00, his adjusted score becomes (6:00 / (1 - 1.5%)) = 6:05. An athlete who competed in week 1 is 1.5 weeks before the midpoint, so his adjusted score becomes (6:00 / (1 + 1.5%)) = 5:55.
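A small sketch of both steps, with scores in seconds. The general formula, adjusted = raw / (1 + factor * (week - 2.5)), is my reading of the two worked examples, and the helper names are hypothetical. The judgement step between the raw percentage and the final factor is not modeled.

```python
MIDPOINT_WEEK = 2.5  # midpoint week used in the post


def percentage_impact(week_coefficient, results):
    """Week coefficient divided by the average result on the event."""
    return week_coefficient / (sum(results) / len(results))


def adjust(raw, factor, week, midpoint=MIDPOINT_WEEK):
    """Scale a raw score by the distance of the athlete's week from the midpoint.
    factor = -0.01 means -1.0% per week."""
    return raw / (1 + factor * (week - midpoint))


def fmt(seconds):
    """Format seconds as m:ss, rounded to the nearest second."""
    m, s = divmod(round(seconds), 60)
    return f"{m}:{s:02d}"
```

For example, `fmt(adjust(360, -0.01, 4))` gives "6:05" and `fmt(adjust(360, -0.01, 1))` gives "5:55", matching the examples above.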

Below are the adjustment factors I used for each event for men and women:

Event 1: -0.6% (men), -0.5% (women)
Event 2: +0.7% (men), +0.8% (women)
Event 3: -0.7% (men), none (women)
Event 4: -0.2% (men), none (women)
Event 5: none (men), none (women)
Event 6: -1.5% (men), none (women)
Event 7: -1.8% (men), -2.2% (women)

As you can see, these are not particularly aggressive adjustments. I think with the first wave of athletes having several weeks to prepare, the advantage is not dramatic each week beyond that. And for women, certain movements like muscle-ups and handstand push-ups are simply so troubling for many athletes that no amount of game-planning in a few weeks can make a significant difference.

For those who care, a couple notes on differences from my modeling last year, as well as some limitations:

This year, I used all Regional athletes. Last year I only used those who made the cut to the final event, but since there were no cuts this year, I used everyone.
This year, I performed the regression across the individual athletes. Last year, I aggregated to the region level first before performing the regression. I think this year's method is preferable; last year I aggregated first because I didn't have access to everyone's Open results, so I just counted the number of athletes in each region in the top 180 worldwide as a proxy for region strength.
This year, I included Asia, Africa and Latin America in running the regressions. Last year, I excluded them.
I made no adjustments for the weather conditions at the outdoor regions. I realize that the elements may have made things more difficult, but with only a couple regions competing outdoors, it is difficult to assess just how much impact this had.
I used no tiebreakers. If two athletes tied, they just stayed tied. Sue me.
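For anyone reproducing the "no tiebreakers" rule, a sketch of ranking with ties. The post only says tied athletes stay tied; the convention that the next place is skipped afterward (1, 2, 2, 4) is an assumption here.

```python
def competition_rank(scores, lower_is_better=True):
    """Rank scores with no tiebreakers: equal scores share the best place they
    occupy, and the next distinct score skips ahead (assumed convention)."""
    ordered = sorted(scores, reverse=not lower_is_better)
    first_place = {}
    for place, s in enumerate(ordered, start=1):
        first_place.setdefault(s, place)  # keep the first place each score reaches
    return [first_place[s] for s in scores]
```

Use `lower_is_better=True` for timed events and `False` for load-based events.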

OK, let's cut the chit-chat and get to the results. The tables below show my adjusted worldwide Regional rankings, along with the rankings if I had not made any adjustments for the week of competition and the rankings if I had not made any adjustments and used Regional scoring.

My adjustments were generally smaller on the women's side, so there wasn't a ton of shifting on that side of the leaderboard. For the men, we definitely saw some big jumps. Among the biggest winners with my adjustments (men and women) were:

Josh Bridges (7th to 4th)
Lacee Kovacs (28th to 18th)
Travis Mayer (32nd to 23rd)
ZA Anderson (34th to 24th)
Mikko Salo (30th to 22nd)
Daniel Tyminski (18th to 14th)
Marcus Filly (26th to 20th)
Valerie Voboril (20th to 17th)
Kristan Clever (31st to 28th)
Katrin Tanja Davidsdottir (35th to 32nd)
Talayna Fortunato (14th to 12th)

That's all for now, folks. Next week, I plan to look into all 12 events we've seen so far this season and see which ones were "better" than others. Until then, good luck with your training.

*For events 3, 4, 6 and 7, I added extra time for each rep still left at the time cap, since the 1 second per rep used in the posted results is not realistic. For events 3 and 7, I added 15 seconds/rep. For event 6, I added 10 seconds/rep. For event 4, I added 5 seconds/rep.
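A sketch of that conversion. The per-rep penalties come from the footnote; treating the time cap itself as the starting point is an assumption, and any cap value passed in is hypothetical.

```python
# Seconds added per unfinished rep, by event number (from the footnote).
PENALTY_PER_REP = {3: 15, 4: 5, 6: 10, 7: 15}


def adjusted_time(event, time_cap_s, reps_left, finish_time_s=None):
    """Return a time in seconds: the actual finish time if the athlete finished,
    otherwise the time cap plus a per-rep penalty (assumed base: the cap)."""
    if reps_left == 0:
        if finish_time_s is None:
            raise ValueError("finished athletes need a finish time")
        return finish_time_s
    return time_cap_s + reps_left * PENALTY_PER_REP[event]
```

With a hypothetical 20:00 cap, three reps left on event 6 becomes `adjusted_time(6, 1200, 3)`, or 1230 seconds.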


Friday, June 21, 2013
