CFG Analysis: If We're Going to Stick With Points-per-Place, A Suggestion

Sunday, November 18, 2012

If We're Going to Stick With Points-per-Place, A Suggestion

After the positive response in the past few days to my post about what to expect from the next Games season, I'd like to continue to write more about training for the upcoming season. I don't purport to be an expert trainer, and I'm certainly not going to be prescribing any workouts, but I hope I can provide a different perspective on the Games and get some discussion started on programming for training vs. competition. But, alas, my schedule this past week just did not give me the time to get into that topic in full detail yet.

Today, I've just got a follow-up on my earlier post regarding the CrossFit Games scoring system ("Opening Pandora's Box: Do We Need a New Scoring System"). In fact, this is actually a follow-up to a comment to that post.

Tony Budding of CrossFit HQ was kind enough to stop by and respond to my article, in particular my suggestion that we move to a standard deviation scoring system. You can read my post and Tony's comment in full to get the details, but the long and short of it is this: HQ is sticking with the points-per-place system for the time being. I'd like to keep the discussion going in the future about possibly moving away from this system, but for now, I accept that the points-per-place is here to stay. Tony made some good points, and I understand the rationale, though I stand by my argument.

Anyway... Tony mentioned that they are still working on ways to refine the system. Certain flaws, like the logjams that occurred at certain scores (like a score of 60 on WOD 12.2) are probably fixable with different programming, and there are some tweaks that could be made to address other concerns (for instance, only allowing scores from athletes who compete in all workouts). But I had another thought that would allow us to stick with the points-per-place system while gaining some of the advantages of a standard deviation system.

At the Games for the past two years, the points-per-place has been modified to award points in descending order based on place, with the high score winning (in contrast to the open and regionals, where the actual ranking is added up and the low score wins). In addition, the Games scoring system has wider gaps between places toward the top of the leaderboard. In my opinion, this is an improvement over the traditional points-per-place system because it gives more weight to the elite performances. However, I think we can do a little better.

First, here is my rationale for why we should have wider gaps between the top places. If you look at how the actual results of most workouts are distributed, you'll see the performance gaps are indeed wider at the top end. The graph below is a histogram of results from Men's Open WOD 12.3 last year:

There are fewer athletes at the top end than there are in the middle, so it makes sense to reward each successive place with a wider point gap. However, the same thing occurs on the low end, with the scores being more and more spread out. But the current Games scoring table does not reflect this - the gaps get smaller and smaller the further down the leaderboard you go (the current Open scoring system obviously has equal gaps throughout the entire leaderboard).

Now, another issue with the current Games scoring table is that it's set up to handle only one size of competition (the maximum it could handle is around 60). So let's try to set up a scoring table that will address my concern about the distribution of scores but can be used for a comeptition of any size (even the Open).

Obviously, the pure points-per-place system used in the Open will work on a competition of any size, but what is essentially does is assume we have a uniform distribution of scores. Basically, the point spread between any two places is the same regardless of where you fall in the spectrum. So what happens is the point difference between 100 burpees and 105 burpees becomes much wider than the gap between 50 and 55 or 140 and 145. So my suggestion is this: let's use a scoring table that ranges from 0-100 but reflects a normal (bell-shaped) distribution rather than a uniform (flat) distribution. The graph below shows that same histogram of WOD 12.3 (green), along with a histogram of my suggested scores (red) and a histogram of the current open points (blue). The scale is different on each histogram, but there are 10 even intervals for each, so you can focus on how the shapes line up.

You can see that the points awarded with the proposed system are much more closely aligned with the actual performances than the current system. And this was done without using the actual performances themselves - I just assumed the distribution of performances was normal and awarded points, based on rank, to fit the assumed distribution.

Now, you may be asking, how well does this distribution fare when we limit the field to only the elite athletes? Well, the shape does not tend to match up as well as we saw in the graph above. Part of this is due to the field simply being smaller, so there is naturally more opportunity for variance from the expected distribution. However, for almost every event in last year's Games, there is no question that the normal distribution is a better fit than the current Games scoring table. The chart below shows a histogram the actual results from the men's Track Triplet along with the distribution of scores using the proposed scoring table and the current scoring table. I have displayed the distribution of scores from the scoring table with lines rather than bars to make the various shapes easier to discern.

As stated above, we do not perfectly match the actual distribution of results. But clearly the actual results are better modeled with the normal distribution than with the current scoring table. As further evidence, the R-squared between the actual results and the proposed scoring table is 96.0%; the R-squared between the actual results and the current scoring table is only 83.9%. If we make this same comparison for each of the first 10 events for men and women (excluding the obstacle course, which was a bracket-style tournament), the R-squared was higher with the proposed scoring table than with the current table, with the exception of the women's Medball-HSPU workout.

I believe this proposed system, while not radically different than our current system, would be an improvement but would not have any of the same issues that concerned HQ about the standard deviation system. While the math used to set up the scoring system may be difficult for many to digest, that's all done behind the scenes and the resulting table is no more difficult to understand than the current Games scoring table, especially if we round all scores to the nearest whole number. If used in the Open, we'd almost certainly have to go out to a couple decimal places, but I think otherwise this system would work fine. And since we are still basing the scores on the placement and not the actual performance, this system also does not allow, as Tony said, "outliers in a single event [to] benefit tremendously." It does, however, reward performances at the top end (and punish performances at the low end) more than the current system.

I appreciate the fact that Tony took the time to review my prior work, and I hope that he and HQ will consider what I've proposed here.

*Below is the actual table (with rounding) that would be used in a field of 45 people (men's Games this year), compared with the current system.

**MATH NOTE: In case you were wondering, here is the actual formula I used in Excel to generate the table:

POINTS = normsinv(1 - (placement / total athletes) + (0.5 / total athletes)) * (50 / normsinv(1 - (0.5 / total athletes)) + 50

This first part gives us the expected number of standard deviations from the mean, given the athlete's rank. Next we multiply that by 50 and divide by the expected number of standard deviations from the mean for the winner (this will give the winner 50 points and last place -50 points). Then we add 50 to make our scale go from 0-100.

15 comments:

Euan RobertsonDecember 6, 2012 at 6:00 AM
Hi Anders,

I have been trying to reproduce exactly your table in excel. I get 164 points for the winner instead of 100. We are designing a competition and would like to try your formula. Could you please post exactly the excel forumala so that I can copy and paste it?
ReplyDelete
Replies
UnknownDecember 6, 2012 at 8:59 AM
Euan,

Thanks for giving this method a shot. Be sure to let me know how it works out for you. I'm guessing you may have had a parenthesis in the wrong spot when you converted the formula to Excel. Anyway, here is one way to do it in Excel:

In column A, starting on row 1, make a list of places from 1st through last in ascending order. Make sure not to have anything else in column A.
In cell B1, use the following formula:
=NORMSINV(1-(A1/MAX(A:A))+(0.5/MAX(A:A)))*(50/NORMSINV(1-0.5/MAX(A:A)))+50
Copy that formula down in column B to the end of your list of places.

If you have a field of 20 athletes, the points should look be 100 for 1st, 87 for 2nd, 79 for 3rd,... 21 for 18th, 13 for 19th, 0 for 20th.

If you're still having trouble, send me an email (anders@alumni.wfu.edu) and I'll send you an actual spreadsheet.
ReplyDelete
Replies
UnknownDecember 28, 2012 at 10:57 AM
This is awesome work. Knowing how much Greg Glassman and company love to talk about the standard bell curve, I am hopeful that they'll go for this scoring system in lieu of the current one. Thanks for this.
ReplyDelete
Replies
RonDecember 31, 2012 at 3:59 PM
I've implemented this method in my competition scoring system, but a question is what is the best way to handle ties with this method? Currently I assigned points for ties as the mean of the combined points for the tied places. So 1st and 2nd each get half of the combined points for 1st and 2nd. This keeps the total of points assigned in an event the same as if there were no ties. If I use the mean place for the tied places, i.e. 1.5 in the previous example, then the total of the points assigned via this method changes slightly.
ReplyDelete
Replies
UnknownJanuary 1, 2013 at 9:31 PM
Interesting question. Not sure there's a "right" answer, but I like the way you are doing it to average the points, not the placement. It's easier to handle and it keeps the total points unchanged, as you pointed out. Either way, much better than the current system awarding the highest possible place to all tied athletes.
ReplyDelete
Replies
AnonymousJanuary 9, 2013 at 12:28 PM
Hi Anders,

I'm a little late to this party, but I'm going to comment a bit on your thoughts here as well as in the original Pandora's Box post. The root of the disconnect between you and Tony is that you don't have the same goals. Flaw #2 in your original post is that there should be an award for outstanding performances. If that were fixed, then the direct consequence would be that it would punish the generalist and reward the specialist. But that is contrary to CrossFit's definition of fitness. Many of the real-world places where fitness matters are binary decisions. Do you kill the bad guy or does he kill you? Do you get your wall of sandbags built high enough before the water comes or not? Can you run down the purse snatcher or not? If you kill the bad guy because you're stronger than him, then it wouldn't have mattered if you had an 800 lb squat rather than your 650lb squat. I think you'll agree that there's no way CrossFit is going to change their definition of fitness, so this "flaw" is null and void from the get-go.

Next, I think flaw #3 is pretty closely related to flaw #2. Like I mentioned above, these binary questions are what matters. It's not so important whether a one-burpee increase in score improves your rank by 800 places or 300 places. And I think that if you fix this "flaw", then you're automatically going to be doing the wrong thing with respect to CrossFit's position on flaw #2. In short, if you reward people for the amount of spacing between contestants, then you're moving away from the binary aspects of fitness and thus towards rewarding specialists. Also, I think flaw #3 can be corrected with better programming, so it's not such an important question.

However, I do agree with you (and I think Tony would too) regarding flaws #1 and #4 (which are also closely related). For instance, consider a hypothetical scenario where towards the end of the games Dan Bailey was in 28th place and Mikko Salo was barely ahead of Rich Froning for first place. Now imagine a workout comes up which happens to play to Dan Bailey's strengths. In the middle of the workout, Dan is out front, but he sees that Rich is a little bit ahead of Mikko. Since Dan is close friends with Rich, he could slow down a bit...enough so that Rich could pass him but still remain ahead of Mikko. This action could conceivably change the outcome and make Rich win instead of Mikko. This is definitely a problem. I don't think this has happened yet, but with $250,000 at stake, you have to admit that if a situation arose, it would be very logical for Dan to throw the event so his buddy and training partner would get a big payday.

One way to deal with this problem would be to use condorcet voting. A system like this would mean that every event can be considered a vote in a race between every pair of athletes. You make a big table of all contestant pairings. If in event #1, athlete A beats athlete B, then that's essentially a vote that athlete A is fitter than athlete B. If in event #2, athlete B wins, then that's a vote that athlete B is fitter. The final voting resolution can be a bit complicated, but on the whole I think this seems like a really promising approach. The current ranking system is probably a little easier to understand, but I think a condorcet system would be very understandable. It would also have the side benefit of educating a lot of people about condorcet voting, which I think would be a good thing because it could also benefit the U.S. political system. :P (You know...just had to throw that in there...rest day discussions and all.)

I would love to do some research and analyze the results of past Games with this condorcet method to see how it would do, but unfortunately I don't have the time. I would be very interested if someone else posted an analysis like this.
ReplyDelete
Replies
UnknownApril 23, 2014 at 6:12 AM
I really like learning and following your post as I find out them extremely useful and interesting. This post is in the same way useful as well as interesting . Thank you for information you been putting on developing your web page such an interesting. I offered something for my information. Point Sticks
ReplyDelete
Replies

Add comment

Follow me on Twitter!

Sunday, November 18, 2012

If We're Going to Stick With Points-per-Place, A Suggestion

15 comments: