Monday, March 17, 2014

A Method to Project Overall Open Rankings Mid-Week

One quirk about the Open leaderboard is that while a workout is open for submissions, the overall rankings are basically useless. The rankings for the current week's workout are obviously understated, but as I explained in my previous post, you can at least get a decent sense of where a score will end up by looking at its percentile rank at any point in time. The overall rankings, however, are thrown off because the most recent week's rank is so understated in relation to the prior weeks' ranks. For instance, if an athlete was in 300th place in each of the first two weeks but posts the best score in the world early in week 3, he will still appear behind an athlete who finished 290th in each of the first two weeks but is currently 10th of 100 entries in week 3. But we know that by week's end, there will be much more separation between the two athletes' week 3 ranks, which will place the first athlete well in front.
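To put numbers on that example, here is a quick sketch of the arithmetic. The final field size of 100,000 is purely an assumption for illustration, and the variable names are my own; the only thing taken from the example is the two athletes' ranks.

```python
# Ranks from the example above (lower point totals are better).
a_prior = 300 + 300        # athlete A: 300th in weeks 1 and 2
b_prior = 290 + 290        # athlete B: 290th in weeks 1 and 2
submitted = 100            # week 3 entries so far
a_now, b_now = 1, 10       # current week 3 ranks among those entries

# What the mid-week leaderboard shows: B looks better than A.
a_mid = a_prior + a_now    # 601
b_mid = b_prior + b_now    # 590

# ASSUMPTION: week 3 ends with 100,000 entries and both athletes
# hold their current percentile (1st and 10th of 100).
final_field = 100_000
a_final = a_prior + a_now * final_field // submitted   # 600 + 1,000
b_final = b_prior + b_now * final_field // submitted   # 580 + 10,000

print(a_mid, b_mid)        # mid-week totals
print(a_final, b_final)    # projected end-of-week totals
```

Under those assumed numbers, athlete A trails by 11 points mid-week but ends up ahead by nearly 9,000.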

So is there a way we can get at an accurate projection of an athlete's overall ranking mid-week? I think we can, but not without a little bit of work.

The idea is this: since we can reasonably project the ending percentile ranking for the current week's workout, we should be able to reasonably project the ending rank, if we make an assumption about how many athletes will complete the workout. If we can get that projection for any particular athlete, we should be able to do that for all athletes who have completed the workout. At that point, we can re-rank those athletes based on projected total points. Finally, we can "scale up" those rankings based on how many athletes we anticipate will complete the current week's workout.

More specifically, here is the process I am proposing:
  • Compute each athlete's percentile ranking for the current week (either overall or in the region) based on the athletes who have currently submitted scores
  • Take the number of athletes in contention at the end of the prior week and reduce it by some factor (say 10%, which is near the historical average) to get an estimate of the number of athletes who will remain at the end of the current week
  • Multiply the athlete's percentile ranking by the estimated number of athletes who will remain at the end of the current week to get the projected rank for the current workout
  • Use these projected ranks to get a projected overall point total at the end of the current week
  • Re-rank the athletes who have submitted scores based on the projected point totals
  • Convert the projected rankings to a percentile rank based on the number of athletes who have currently submitted
  • Multiply this percentile by our earlier estimate about how many athletes will remain at the end of the current week. This will give you each athlete's projected overall rank at the end of the week.
To accomplish this, all we would need is a snapshot of the full leaderboard at a given point in time. I do not think it is possible to accomplish this even for a single athlete without making the calculations for all athletes. However, with the right computing power, it would be a relatively painless calculation to generate the projected overall rankings. Obviously HQ would be in the best position to perform these calculations, but I think it is conceivable that someone on the outside could do this as well.
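For anyone who wants to try it, the steps in the list above could be sketched roughly as follows. This is a minimal, untested-on-real-data sketch, not a finished tool: the `Entry` structure, the function name, and the sample numbers are all hypothetical, and I am assuming "percentile" means rank divided by the number of athletes who have submitted (so the current leader is at 1/n rather than 0).

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    prior_points: int   # sum of final ranks from the completed weeks
    current_rank: int   # rank among athletes who have submitted so far


def project_overall_ranks(entries, prior_week_field, attrition=0.10):
    """Return {name: projected overall rank at week's end}.

    prior_week_field is the number of athletes in contention at the end
    of the prior week; attrition is the assumed drop-off (10% by default).
    """
    n_submitted = len(entries)
    # Estimated number of athletes remaining at the end of this week.
    est_field = prior_week_field * (1.0 - attrition)

    # Percentile -> projected workout rank -> projected point total.
    projected_points = {}
    for e in entries:
        percentile = e.current_rank / n_submitted
        projected_week_rank = percentile * est_field
        projected_points[e.name] = e.prior_points + projected_week_rank

    # Re-rank the submitters by projected points (lower is better).
    # Ties are not handled specially; they keep their input order.
    ordered = sorted(entries, key=lambda e: projected_points[e.name])

    # Convert each position to a percentile and scale up to the
    # estimated field to get the projected overall rank.
    projected_overall = {}
    for position, e in enumerate(ordered, start=1):
        overall_percentile = position / n_submitted
        projected_overall[e.name] = round(overall_percentile * est_field)
    return projected_overall


# Hypothetical four-athlete snapshot, with 1,000 athletes last week.
snapshot = [
    Entry("A", prior_points=600, current_rank=1),
    Entry("B", prior_points=580, current_rank=2),
    Entry("C", prior_points=20, current_rank=3),
    Entry("D", prior_points=5000, current_rank=4),
]
print(project_overall_ranks(snapshot, prior_week_field=1000))
```

In this made-up snapshot, B is ahead of A on the raw mid-week totals (582 points to 601), but A moves ahead of B once the current week's ranks are scaled up to the estimated field of 900.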

This is all theoretical at the moment - a decent amount of testing would be necessary to make sure this process actually produces reasonable projections. Still, I think the concept is something that could be used to improve the Open experience for all of us.

Note: If anyone out there has the resources and the know-how to get a hold of the leaderboard mid-week and get it into Excel or .csv, I'd be very interested to test this out. If so, post to comments or email me directly (anders@alumni.wfu.edu).
