Today I’d like to take on a topic that may be irrelevant to 95% of the competitors but is of great consequence to the remaining 5%. Although we know exactly how well athletes must place overall to reach the regionals (ignoring for a moment the uncertainty over whether HQ will invite a few extra athletes in certain regions), we do not know exactly how well an athlete needs to do in each individual workout to finish high enough overall. In fact, we cannot know this for sure, no matter how much research we do or how robust a model we use to predict it.
For the purposes of this post, let’s assume that an athlete needs to finish in the top 48 in his/her region to qualify for the regionals. Placing 48th or better in every workout (that’s 240 points over five events) would, in practice, all but guarantee this. But given the nature of our sport, we know it’s not necessary to place that high in every event. Because athletes’ performances vary from event to event, athletes who place toward the top on each event will tend to finish higher overall than the average of their individual event placements. Conversely, athletes near the bottom will tend to finish lower overall than the average of their event placements. So for athletes near the top, the question is: just how far can you afford to fall in each event and still keep your hopes alive? I hate to be the bearer of bad news, but for about 80% of athletes, their first event score alone will be too many points to qualify for regionals. That’s the nature of the points-per-place system: there are some holes you simply cannot dig yourself out of.

So how can we estimate the number of points necessary? Well, based on the data available, our best shot is a model that uses the number of competitors in each region to estimate the number of points you’ll need at the end of week 5. Without knowing more about each region (whether certain ones are more top-heavy than others, for instance), this is really all we can base our estimate on. Let’s start by looking at what happened last year.
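For readers who want to experiment with the scoring, here is a minimal Python sketch of points-per-place as described above: each athlete's place in each event is added up, and the lowest total wins. It assumes every athlete has a score in every event, that a higher score is better, and that tied athletes share the better place (1, 2, 2, 4). Those tie rules and the names `event_places` and `overall_standings` are my own illustrative choices, not HQ's official procedure.

```python
from collections import defaultdict


def event_places(scores: dict[str, float]) -> dict[str, int]:
    """Place athletes in one event (higher score is better).

    Assumption: tied athletes share the better place, so scores of
    100, 90, 90, 80 become places 1, 2, 2, 4.
    """
    ordered = sorted(scores.values(), reverse=True)
    first_index: dict[float, int] = {}
    for i, score in enumerate(ordered):
        first_index.setdefault(score, i)  # keep the best position for each score
    return {name: first_index[score] + 1 for name, score in scores.items()}


def overall_standings(events: list[dict[str, float]]) -> list[tuple[str, int]]:
    """Sum each athlete's event places; the lowest total ranks first.

    Assumption: every athlete appears in every event. Overall ties are
    broken alphabetically only to keep the output deterministic.
    """
    totals: dict[str, int] = defaultdict(int)
    for scores in events:
        for name, place in event_places(scores).items():
            totals[name] += place
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    events = [
        {"A": 100, "B": 90, "C": 90},  # places: A 1, B 2, C 2
        {"A": 50, "B": 60, "C": 40},   # places: B 1, A 2, C 3
    ]
    print(overall_standings(events))  # [('A', 3), ('B', 3), ('C', 5)]
```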
I went through each region’s 2012 results and recorded the following: the point totals of the 48th and 60th place athletes and the number of athletes who competed in week 1. I did this separately for men and women. I’ll focus on the 48th place model for now, but I’ll throw in some results for the 60th place model at the end, since it’s possible that HQ will end up taking athletes who finish that low. Below are scatter plots for both men and women, with the number of week 1 competitors on the x-axis and the 48th place points on the y-axis. (I apologize for the small charts. I'm posting this from a different computer, and it's not letting me size the charts when I post them to the blog. Hopefully I can get them resized eventually.)

To me, the relationship looks roughly linear, with an intercept near 300 or so. The line can't keep dropping as regions shrink, though. Ignoring ties, the top 48 athletes in any region must together have at least 5 × (1 + 2 + ... + 48) = 5,880 points after five events, so the 48th place athlete can never have fewer than 123 points, even if there are only 60 or 70 athletes. From there, the number of points needed rises as the field gets larger, and thus more competitive. We could simply slap a linear estimate on this and call it a day, but I think it might be a bit more complicated. I also decided to look at 2011: would we see the same type of relationship there? Well, sort of.
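Before moving on to 2011, here is a minimal sketch of the straight-line fit described above: ordinary least squares, with week 1 competitors as x and the 48th place total as y. The function names are hypothetical, and no real region data is included; you would fill in one (competitors, points) pair per region from the 2012 results.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(line: tuple[float, float], competitors: float) -> float:
    """Fitted 48th place total for a region with this many week 1 competitors."""
    intercept, slope = line
    return intercept + slope * competitors
```

Calling `fit_line(competitors, points)` returns the intercept and slope; `predict(line, n)` then gives the fitted total for a region with n week 1 competitors.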
Below are scatter plots for the men in 2011 and 2012, each with its own linear fit to help illustrate the point. (Note: To get the 2011 points, I estimated what the points would have been after 5 weeks. Since I was a little strapped for time, I took the point totals after 6 weeks and scaled them back to about 75% of that number. That 75% figure came from a few regions where I actually calculated the rankings after 5 weeks; doing that for every region would have been too time-consuming.)
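The shortcut in that note can be sketched as follows, assuming you have each athlete's event placements for a region in week order. `nth_place_total` recomputes the standings after a given number of weeks, `five_week_ratio` is the kind of check I ran on a few regions, and `estimate_five_week` applies the 75% scale factor to a 6-week total. The names are hypothetical, and the sketch assumes that dropping week 6 doesn't change anyone's placements in weeks 1 through 5.

```python
def nth_place_total(placements: list[list[int]], n: int, weeks: int) -> int:
    """Total of the n-th place athlete when only the first `weeks` events count.

    Each inner list holds one athlete's event placements in week order.
    """
    if not 1 <= n <= len(placements):
        raise ValueError("n must be between 1 and the number of athletes")
    totals = sorted(sum(places[:weeks]) for places in placements)
    return totals[n - 1]


def five_week_ratio(placements: list[list[int]], n: int = 48) -> float:
    """Ratio of the n-th place total after 5 weeks to the total after 6 weeks."""
    return nth_place_total(placements, n, 5) / nth_place_total(placements, n, 6)


def estimate_five_week(six_week_total: float, ratio: float = 0.75) -> float:
    """Scale a 6-week n-th place total back to an approximate 5-week total."""
    return ratio * six_week_total
```

For a hypothetical region whose 48th place athlete had 800 points after 6 weeks, `estimate_five_week(800)` gives 600 points.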
We can see that the intercept is lower for the 2011 group, but the slope of the line is steeper. It's tough to infer too much from these differences, however, because the regions were generally so much smaller in 2011.
So what do we do about predicting this year? Well, I decided to build three different models to provide a range. The first model is based solely on 2012 and produces the "mid" estimate. Using all the data points from 2011 and 2012 together produces a steeper line (the "high" estimate), which is higher for the larger regions but slightly lower for very small ones. The third model (the "low" estimate) assumes that the slope of the line will continue to decrease in 2013, much as it did from 2011 to 2012. I repeated this process for the women as well.
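Here is a rough sketch of how the three lines could be produced from the per-region data. The "mid" and "high" models follow the description above directly. For the "low" model I've assumed the 2012 slope drops again by the same amount it fell from 2011 to 2012 while the 2012 intercept stays put; the post doesn't spell out those details, so treat that part (and the function names) as hypothetical.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def three_models(
    x2011: list[float], y2011: list[float],
    x2012: list[float], y2012: list[float],
) -> dict[str, tuple[float, float]]:
    """Return (intercept, slope) for the low, mid, and high models."""
    mid = fit_line(x2012, y2012)                   # 2012 only
    high = fit_line(x2011 + x2012, y2011 + y2012)  # pooled 2011 + 2012
    _, slope_2011 = fit_line(x2011, y2011)
    intercept_2012, slope_2012 = mid
    # Assumption: the slope falls again by the same amount as it did from
    # 2011 to 2012, and the intercept stays at its 2012 value.
    low = (intercept_2012, slope_2012 - (slope_2011 - slope_2012))
    return {"low": low, "mid": mid, "high": high}
```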
Below are the three men's models in graphical form.
For the men, the mean absolute error was 27 points for the 2012 model and 25 for the combined 2011-2012 model. For the women, the mean absolute error was 19 for the 2012 model and 21 for the combined model. Keep in mind that those are errors on the historical data; the tricky part is estimating how well these models will translate to 2013.
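For reference, mean absolute error is just the average size of the miss, in points, between each region's actual total and the model's prediction. A minimal sketch, with hypothetical function names, assuming each model is stored as an (intercept, slope) pair:

```python
def mean_absolute_error(actual: list[float], predicted: list[float]) -> float:
    """Average absolute difference between actual and predicted totals."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def model_mae(
    line: tuple[float, float], competitors: list[float], points: list[float]
) -> float:
    """In-sample MAE of a fitted line y = intercept + slope * x."""
    intercept, slope = line
    predictions = [intercept + slope * x for x in competitors]
    return mean_absolute_error(points, predictions)
```

Because this measures error on the same regions the line was fit to, it tends to be optimistic about how the model will do on a new year's data.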
Here are my final estimates. To use them, find the number of athletes in your region (as closely as you can) in the column on the left. For example, in my region, the Central East, about 4,800 men competed in week 1, so my mid estimate is that 637 points (about 127 per week) will put you in 48th and 728 points (about 146 per week) will put you in 60th.
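The lookup and the per-week figures are simple enough to script. This small sketch picks the table row closest to your region's week 1 count and divides by the five weeks; the `table` argument stands in for the estimates table (a mapping from region size to points), and the function names are hypothetical.

```python
def nearest_estimate(table: dict[int, int], athletes: int) -> int:
    """Points from the row whose region size is closest to `athletes`."""
    if not table:
        raise ValueError("table is empty")
    closest_size = min(table, key=lambda size: abs(size - athletes))
    return table[closest_size]


def points_per_week(total_points: float, weeks: int = 5) -> float:
    """Average points per event needed to hit a given total."""
    return total_points / weeks


if __name__ == "__main__":
    # Central East men, mid estimate for 48th place (from the example above).
    table = {4800: 637}
    total = nearest_estimate(table, 4800)
    print(total, round(points_per_week(total)))  # 637 127
```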
*Note: I think an interesting analysis for another day would be to look at the number of athletes in each region who actually affect the standings at the top. You could do this by removing each athlete from the competition and testing whether the point totals of the top 48 (or 60) athletes changed at all. This number would probably correlate much more strongly with the number of points needed to make regionals. The difficult part would be estimating this number for 2013 so we could make predictions. But it’s something to consider.
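If anyone wants to try that analysis, a brute-force version might look like this sketch: drop one athlete at a time, re-place everyone in every event, and check whether the sorted top-n totals change. It assumes higher scores are better, tied athletes share the better place, and every athlete has a score in every event; the function names are hypothetical. It recomputes the full standings once per athlete, so expect it to be slow for regions with thousands of competitors.

```python
from collections import defaultdict


def event_places(scores: dict[str, float]) -> dict[str, int]:
    """Higher score is better; tied athletes share the better place (assumption)."""
    ordered = sorted(scores.values(), reverse=True)
    first_index: dict[float, int] = {}
    for i, score in enumerate(ordered):
        first_index.setdefault(score, i)
    return {name: first_index[score] + 1 for name, score in scores.items()}


def top_n_totals(events: list[dict[str, float]], n: int) -> list[int]:
    """Sorted point totals of the n best athletes."""
    totals: dict[str, int] = defaultdict(int)
    for scores in events:
        for name, place in event_places(scores).items():
            totals[name] += place
    return sorted(totals.values())[:n]


def impactful_athletes(events: list[dict[str, float]], n: int = 48) -> list[str]:
    """Athletes whose removal changes the top-n point totals."""
    baseline = top_n_totals(events, n)
    impact = []
    for athlete in sorted(events[0]):
        # Rebuild every event without this athlete; everyone behind them moves up.
        reduced = [{k: v for k, v in scores.items() if k != athlete} for scores in events]
        if top_n_totals(reduced, n) != baseline:
            impact.append(athlete)
    return impact
```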