Today I’d like to take on a topic that may be irrelevant to 95% of the competitors, but which is of great consequence to the remaining 5%. Although we know exactly how well athletes overall must place in order to reach the regionals (ignoring for a moment the vagueness surrounding whether HQ will invite a few extra athletes in certain regions), we do not know exactly how well an athlete needs to do in each particular workout in order to finish well enough overall. In fact, we cannot know this for sure, no matter how much research we do and how robust a model we might use to predict it.
For the purposes of this post, let’s assume that an athlete
needs to finish in the top 48 in his/her region to qualify for the regionals.
To be sure, placing 48th or better in every workout would guarantee
this (that’s 240 points). But given the nature of our sport, we know it’s not
necessary to place that high in every event. Because of the way athletes’
performances vary between events, athletes finishing toward the top of the
standings on each event will tend to finish higher than the average of their
individual event placements. Conversely, athletes near the bottom will tend to
finish lower than the average of their individual event placements. So for
athletes near the top, the question is, just how far can you afford to fall in
each event and still keep your hopes alive? I hate to be the bearer of bad
news, but for about 80% of the athletes, their first event score alone will
be too many points to qualify for regionals. That’s the nature of the
points-per-place system: there are some holes you simply cannot dig yourself
out of.
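To make the points-per-place arithmetic concrete, here is a minimal Python sketch. The function name `total_points` and the second athlete's placements are made up for illustration; only the 48th-in-every-workout case comes from the discussion above.

```python
def total_points(placements):
    """Overall score under points-per-place: the sum of event placements (lower is better)."""
    return sum(placements)


# Placing exactly 48th in each of the 5 workouts gives the 240 points mentioned above.
steady = total_points([48, 48, 48, 48, 48])

# Hypothetical athlete: four strong workouts and one very bad one.
# The single bad placement outweighs the other four workouts combined.
one_bad_week = total_points([20, 25, 30, 35, 600])

print(steady)        # 240
print(one_bad_week)  # 710
```

The second athlete averages a 27.5 placement over the four good workouts, yet one 600th-place finish alone puts the total far beyond 240.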
So how can we try to estimate the number of points
necessary? Well, based on the data available, our best shot is to develop a
model based on the number of competitors in each region to try to estimate the
number of points you’ll need at the end of week 5. Without knowing any more
about each region (are certain ones more top-heavy than others, for instance),
this is really all we can base our estimate on. Let’s start by looking at what
happened last year.
I went through each region’s 2012 results and recorded the
following information: the point total of the 48th and 60th
place athletes and the total number of competitors that competed in week 1. I did
this separately for men and women. I’ll focus on the 48th place
model for now but I’ll throw in some results for the 60th place
model at the end, since it’s possible that HQ will end up taking athletes who
finish that low. Below are scatter plots for both men and women, with the
number of week 1 competitors on the x-axis and the 48th place points
on the y-axis. (I apologize for the small charts. I'm posting this from a different computer and it's not letting me size the charts when I post them to the blog. Hopefully I can get them re-sized eventually.)
To me, the relationship looks roughly linear with an
intercept near 300 or so. We know there is no way that the 48th
place athlete can possibly score fewer than 240 points (mentioned earlier),
even if there are only 60 or 70 athletes. From that point, however, the number
of points needed rises as the field gets larger, and thus more competitive. We
could simply slap a linear estimate on here and call it a day, but I think it might be a bit more complicated. I also decided to look at 2011 – would
we see the same type of relationship there? Well, sort of. Below are scatter plots for the men in 2011 and 2012, each with their own linear fit on the graph to help illustrate the point. (Note: To get the 2011 points, I tried to get an estimate of what the points would have been after 5 weeks. Since I was a little strapped for time, I looked at the total points after 6 weeks and then scaled it back to about 75% of that number. That 75% figure was based on looking at a few regions and actually calculating out the rankings after 5 weeks. That would have been too time-consuming to do for every region, though.)
We can see that the intercept is lower for the 2011 group, but the slope of the line is steeper. It's tough to use 2011 to infer too much from these differences, however, because the region sizes were generally so much smaller in 2011. So what do we do about predicting this year? Well, I decided to come up with three different models to provide a bit of a range. The first model is based solely off of 2012. This produces the "mid" estimate. Using all the data points from 2011 and 2012 produces a steeper line (the "high" estimate) that is higher for the larger groups but slightly lower for very small groups. The third model ("low" estimate) assumes that the slope of the line will continue to decrease in 2013, similarly to the way it did from 2011 to 2012. I repeated this process for the women as well.
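For anyone who wants to try this at home, here is a rough sketch of how the "mid" and "high" fits could be set up in plain Python. The function names are my own, and the data points are PLACEHOLDERS invented purely to show the mechanics; they are not the 2011 or 2012 results. The "low" model is not sketched here, since the exact way the slope was reduced is not spelled out above.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, competitors):
    """Estimated 48th place point total for a region with this many week 1 competitors."""
    intercept, slope = model
    return intercept + slope * competitors


def scale_2011_to_five_weeks(six_week_points, factor=0.75):
    """Approximate a 5-week total from a 6-week total using the ~75% shortcut."""
    return six_week_points * factor


# PLACEHOLDER data, invented for illustration only (not real results).
# Each pair is (week 1 competitors in a region, 48th place point total).
regions_2012 = [(1000, 350.0), (2000, 420.0), (3000, 470.0), (4000, 560.0)]
regions_2011_six_week = [(500, 400.0), (1000, 520.0), (1500, 600.0)]

# Put 2011 on the same 5-week footing as 2012 before pooling.
regions_2011 = [(n, scale_2011_to_five_weeks(p)) for n, p in regions_2011_six_week]

mid_model = fit_line(*zip(*regions_2012))                    # 2012 only
high_model = fit_line(*zip(*(regions_2011 + regions_2012)))  # 2011 and 2012 pooled

print(predict(mid_model, 4800))
print(predict(high_model, 4800))
```

Because the inputs are placeholders, the printed numbers mean nothing on their own; swap in the real region sizes and 48th place totals to reproduce the actual lines.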
Below are the three men's models in graphical form.
For the men, the mean absolute error was 27 for the 2012 model and 25 for the 2011-2012 combined model. For the women, the mean absolute error was 19 for the 2012 model and 21 for the 2011-2012 combined model. Keep in mind those were the errors on the historical data; the tricky part here is estimating how well these models will translate to 2013.
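In case the term is unfamiliar, mean absolute error is just the average size of the misses, ignoring direction. A minimal sketch follows; the function name and the three sample regions are hypothetical and are not the figures behind the errors quoted above.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| across regions."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical 48th place totals for three regions and a model's estimates for them.
actual_points = [500, 450, 610]
model_points = [520, 440, 580]

print(mean_absolute_error(actual_points, model_points))  # 20.0
```

Here the model misses by 20, 10, and 30 points, so the mean absolute error is 20.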
Here are my final estimates. To use them, find the number of athletes in your region, then find that number (as closely as you can) in the column on the left. For example, in my region, the Central East, we have about 4,800 men at the end of week 1, so my mid estimate is that 637 points (about 127 per week) will put you in 48th and 728 points (about 146 per week) will put you in 60th.
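The per-week figures are simply the total divided by five workouts. The sketch below repeats that arithmetic and also estimates what share of a field would already be over the cutoff after one workout. The helper names are my own, and the share calculation assumes one athlete per placement (no ties), which is a simplification.

```python
EVENTS = 5


def per_week(points, events=EVENTS):
    """Average points per workout implied by a final point total."""
    return points / events


def share_over_cutoff_after_one_event(competitors, cutoff_points):
    """Fraction of the field whose first-event placement alone exceeds the cutoff.

    Assumes placements run 1..competitors with no ties.
    """
    over = max(competitors - cutoff_points, 0)
    return over / competitors


print(round(per_week(637)))  # 127
print(round(per_week(728)))  # 146
print(round(share_over_cutoff_after_one_event(4800, 637), 2))  # 0.87
```

Under the mid estimate for a 4,800-athlete region, roughly 87% of the field would already have more than 637 points after the first workout. The exact share will vary by region and by which estimate you use.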
*Note: I think an interesting analysis for another day would be to
look at the number of athletes in each region who actually impact the
standings at the top. You could do this by removing each athlete from the
competition and testing whether the point totals of the top 48 (or 60) athletes
changed at all. This number of athletes would probably correlate much more
strongly to the number of points needed to make regionals. However, a difficult
task would be estimating this number for 2013 so we could make predictions. But
it’s something to consider.
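Here is one way the remove-one-athlete test in this note could be sketched. Everything below is hypothetical: the function names, the assumption that a higher raw score is better in every workout, the rule that ties share the better placement, and the five made-up athletes.

```python
def event_ranks(scores):
    """Placement per athlete in one workout (higher score is better; ties share the best rank)."""
    ordered = sorted(scores.values(), reverse=True)
    return {name: ordered.index(s) + 1 for name, s in scores.items()}


def totals(results):
    """results maps athlete -> list of raw scores, one per workout. Returns point totals."""
    n_events = len(next(iter(results.values())))
    total = {name: 0 for name in results}
    for e in range(n_events):
        ranks = event_ranks({name: s[e] for name, s in results.items()})
        for name, rank in ranks.items():
            total[name] += rank
    return total


def top_n(results, n):
    """The n lowest point totals; ties at the cutoff keep input order."""
    ranked = sorted(totals(results).items(), key=lambda kv: kv[1])
    return dict(ranked[:n])


def impactful_athletes(results, n):
    """Athletes whose removal changes the point total of any other baseline top-n athlete."""
    baseline = top_n(results, n)
    impactful = []
    for name in results:
        reduced = {k: v for k, v in results.items() if k != name}
        new_totals = totals(reduced)
        if any(new_totals[k] != pts for k, pts in baseline.items() if k != name):
            impactful.append(name)
    return impactful


# Made-up field of five athletes over two workouts.
field = {
    "A": [10, 5],
    "B": [9, 9],
    "C": [8, 8],
    "D": [7, 7],
    "E": [1, 1],
}

print(impactful_athletes(field, n=2))  # ['A', 'B', 'C', 'D']
```

Athlete E finishes last in both workouts, so removing E never changes a top-2 total, while each of the others beats a top-2 athlete at least once. This brute-force version recomputes the whole leaderboard once per athlete, so it would be slow on a full region without some optimization.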
For an individual's year-over-year comparison of point totals, could you use a field adjustment calculation to determine a relative points score each year?
For example, you might have a goal to score less than 1,500 points in a given year. However, field size would obviously adjust how obtainable this is from year-to-year.
I guess the percentile of your finish within the region would be much easier. For example, 1 - (410/1649) ≈ 0.75, which puts you ahead of about 75% of your region.
Matt,
My feeling is you're probably best just looking at your percentile. I like to use the number of competitors in week 1 as your denominator (basically ignoring attrition). Still, with the growth of the sport right now, it's hard to use your rank from year to year to judge your own improvement. Finishing in the top 10% in 2013 is probably harder than in 2011 or 2012. A 200-lb snatch used to be an elite lift for a crossfitter, but now it's basically a prerequisite to make regionals.
With 3 out of 5 WODs complete, just wondering how you feel about the points model you put out.
I took a quick look at NorCal men (my region) and, currently, that 48th spot is claimed by folks with ~220pts. With 2 WODs to do, those folks sitting in 48th could be expected to accumulate another ~150pts and finish with ~370pts.
That would be fairly well below the ~523 points your model had predicted they could accumulate and still sneak in at 48.
Not tossing stones, just wondering what your thoughts are...
Definitely a fair question. I was going to reply here but decided to address this more fully in a blog post. See my 'Fun With SWAGs' for this week.