Today I've got a follow-up on my earlier post regarding the CrossFit Games scoring system ("Opening Pandora's Box: Do We Need a New Scoring System"). More precisely, this is a follow-up to a comment on that post.
Tony Budding of CrossFit HQ was kind enough to stop by and respond to my article, in particular my suggestion that we move to a standard deviation scoring system. You can read my post and Tony's comment in full to get the details, but the long and short of it is this: HQ is sticking with the points-per-place system for the time being. I'd like to keep the discussion going in the future about possibly moving away from this system, but for now, I accept that the points-per-place is here to stay. Tony made some good points, and I understand the rationale, though I stand by my argument.
Anyway... Tony mentioned that they are still working on ways to refine the system. Certain flaws, like the logjams that occurred at particular scores (such as a score of 60 on WOD 12.2), are probably fixable with different programming, and there are some tweaks that could be made to address other concerns (for instance, only allowing scores from athletes who compete in all workouts). But I had another thought that would allow us to stick with the points-per-place system while gaining some of the advantages of a standard deviation system.
At the Games for the past two years, the points-per-place system has been modified to award points in descending order based on place, with the high score winning (in contrast to the Open and Regionals, where the actual rankings are added up and the low score wins). In addition, the Games scoring system has wider gaps between places toward the top of the leaderboard. In my opinion, this is an improvement over the traditional points-per-place system because it gives more weight to elite performances. However, I think we can do a little better.
First, here is my rationale for why we should have wider gaps between the top places. If you look at how the actual results of most workouts are distributed, you'll see the performance gaps are indeed wider at the top end. The graph below is a histogram of results from Men's Open WOD 12.3 last year:
There are fewer athletes at the top end than there are in the middle, so it makes sense to reward each successive place with a wider point gap. However, the same thing occurs on the low end, with the scores being more and more spread out. But the current Games scoring table does not reflect this - the gaps get smaller and smaller the further down the leaderboard you go (the current Open scoring system obviously has equal gaps throughout the entire leaderboard).
Now, another issue with the current Games scoring table is that it's set up to handle only one size of competition (the maximum it could handle is around 60 athletes). So let's try to set up a scoring table that addresses my concern about the distribution of scores but can be used for a competition of any size (even the Open).
Obviously, the pure points-per-place system used in the Open will work for a competition of any size, but what it essentially does is assume we have a uniform distribution of scores. Basically, the point spread between any two places is the same regardless of where you fall in the spectrum. So what happens is the point difference between 100 burpees and 105 burpees becomes much wider than the gap between 50 and 55 or 140 and 145. So my suggestion is this: let's use a scoring table that ranges from 0-100 but reflects a normal (bell-shaped) distribution rather than a uniform (flat) distribution. The graph below shows that same histogram of WOD 12.3 (green), along with a histogram of my suggested scores (red) and a histogram of the current Open points (blue). The scale is different on each histogram, but there are 10 even intervals for each, so you can focus on how the shapes line up.
You can see that the points awarded with the proposed system are much more closely aligned with the actual performances than the current system. And this was done without using the actual performances themselves - I just assumed the distribution of performances was normal and awarded points, based on rank, to fit the assumed distribution.
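To put numbers on the burpee example above, here is a quick simulation (the field size and scores are invented, not real Open data). In a roughly bell-shaped field, the same 5-burpee improvement costs far more places in the crowded middle than at the sparse top end, which is exactly the distortion the uniform points-per-place system bakes in:

```python
import random
from bisect import bisect_right

random.seed(1)
# Hypothetical field: 1,000 athletes with roughly normally distributed
# burpee counts centered near 100 (all numbers here are made up).
field = sorted(round(random.gauss(100, 20)) for _ in range(1000))

def place(score):
    """Open-style rank for a given score: 1 + number of athletes who beat it."""
    return 1 + len(field) - bisect_right(field, score)

# Under points-per-place, your points equal your place, so the point cost
# of a 5-burpee gap is just the number of places it spans.
mid_gap = place(100) - place(105)   # crowded middle: many athletes in between
top_gap = place(140) - place(145)   # sparse top end: few athletes in between

print(mid_gap, top_gap)  # the middle gap should be far larger
```

The exact counts depend on the random draw, but the shape of the result doesn't: a uniform point scale charges the most points per unit of performance exactly where the performances are most tightly bunched.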
Now, you may be asking: how well does this distribution fare when we limit the field to only the elite athletes? Well, the shape does not match up as well as we saw in the graph above. Part of this is due to the field simply being smaller, so there is naturally more opportunity for variance from the expected distribution. However, for almost every event at last year's Games, there is no question that the normal distribution is a better fit than the current Games scoring table. The chart below shows a histogram of the actual results from the men's Track Triplet, along with the distribution of scores using the proposed scoring table and the current scoring table. I have displayed the distributions of scores from the scoring tables with lines rather than bars to make the various shapes easier to discern.
As stated above, we do not perfectly match the actual distribution of results. But clearly the actual results are better modeled by the normal distribution than by the current scoring table. As further evidence, the R-squared between the actual results and the proposed scoring table is 96.0%; the R-squared between the actual results and the current scoring table is only 83.9%. Making this same comparison for each of the first 10 events for men and women (excluding the obstacle course, which was a bracket-style tournament), the R-squared was higher with the proposed scoring table than with the current table in every case except the women's Medball-HSPU workout.
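For anyone who wants to run this kind of comparison themselves, here is a minimal sketch of the R-squared calculation (squared Pearson correlation between each athlete's actual result and the points their place earns). The finish times and table values below are invented for illustration; they are not the real Track Triplet data:

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Illustrative only -- NOT the real Track Triplet results or tables.
actual_times = [480, 495, 505, 512, 518, 529, 545, 560]   # seconds, best first
proposed_pts = [100, 86, 77, 70, 62, 53, 40, 22]          # normal-shaped gaps
current_pts  = [100, 94, 88, 83, 78, 73, 68, 63]          # shrinking gaps

r2_proposed = r_squared(actual_times, proposed_pts)
r2_current = r_squared(actual_times, current_pts)
```

Whichever table tracks the actual spread of performances more closely will produce the higher R-squared.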
I believe this proposed system, while not radically different from our current system, would be an improvement, and it would not have any of the issues that concerned HQ about the standard deviation system. While the math used to set up the scoring system may be difficult for many to digest, that's all done behind the scenes, and the resulting table is no more difficult to understand than the current Games scoring table, especially if we round all scores to the nearest whole number. If used in the Open, we'd almost certainly have to go out to a couple of decimal places, but I think otherwise this system would work fine. And since we are still basing the scores on placement and not the actual performance, this system also does not allow, as Tony said, "outliers in a single event [to] benefit tremendously." It does, however, reward performances at the top end (and punish performances at the low end) more than the current system does.
I appreciate the fact that Tony took the time to review my prior work, and I hope that he and HQ will consider what I've proposed here.
*Below is the actual table (with rounding) that would be used in a field of 45 people (men's Games this year), compared with the current system.
**MATH NOTE: In case you were wondering, here is the actual formula I used in Excel to generate the table:
POINTS = normsinv(1 - (placement / total athletes) + (0.5 / total athletes)) * (50 / normsinv(1 - (0.5 / total athletes))) + 50
This first part gives us the expected number of standard deviations from the mean, given the athlete's rank. Next we multiply that by 50 and divide by the expected number of standard deviations from the mean for the winner (this will give the winner 50 points and last place -50 points). Then we add 50 to make our scale go from 0-100.
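The Excel formula translates directly into Python's standard library, where `NormalDist().inv_cdf` plays the role of normsinv. This is a sketch of the scoring function as I understand it (the function and variable names are mine):

```python
from statistics import NormalDist

inv = NormalDist().inv_cdf  # Python's equivalent of Excel's normsinv

def proposed_points(placement: int, total_athletes: int) -> float:
    """Points on a 0-100 scale for a given rank, assuming normally
    distributed performances across the field."""
    n = total_athletes
    # Expected standard deviations from the mean for this rank
    # (the 0.5/n term centers each rank within its percentile slice).
    z = inv(1 - placement / n + 0.5 / n)
    # Scale so the winner lands at +50 and last place at -50,
    # then shift by 50 to get a 0-100 scale.
    return z * (50 / inv(1 - 0.5 / n)) + 50

# In a 45-person field (the size of this year's men's Games):
print(round(proposed_points(1, 45)))   # 100 (winner)
print(round(proposed_points(23, 45)))  # 50  (middle of the pack)
print(round(proposed_points(45, 45)))  # 0   (last place)
```

Because the normal distribution is symmetric, the table comes out symmetric too: the gap between 1st and 2nd equals the gap between last and second-to-last, with the gaps shrinking toward the middle of the field.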