CFG Analysis: Week 3 Predictions: #Spealler and Stochastic Modeling

Welcome back for week 3 of the regional season. Looking at this slate of regions (Asia, North Central, North West, South West), it's pretty clear what sticks out: #spealler. What everyone wants to know is whether or not Chris Spealler can qualify for a 7th time, and if so, will he do it in as dramatic fashion as last year?

I'll attempt to answer that question in a couple of ways. One way is just based on experience watching the sport, sizing up the events and going by "feel" to some degree. The other way is to look at Spealler's chances the same way I'll be looking at everyone else's chances: with some stochastic modeling.

Before I get back to Spealler, I'll give an overview of how I did the modeling this week, because it's quite different from my week 1 and 2 predictions. This week, I wanted to look not only at who is most likely to qualify, but how likely they are to qualify. In other words, I wanted to estimate the probability of each athlete qualifying to the Games.

In short, this is done as follows:

1. Based on their performance in the 2012 Open, 2011 Games and 2011 Regionals, separate the 2012 Regional competitors into various categories.
2. See how frequently athletes in each category posted a very high (top 50 worldwide) or relatively high (50-100 worldwide) regional performance in 2012 (based on last year's cross-regional rankings).
3. For each athlete this year, place them in one of the categories based on their performance in the 2013 Open, 2012 Games and 2012 Regionals.
4. Depending on their category, randomly generate a worldwide ranking for each athlete this year. The category affects this randomized worldwide ranking, i.e. those who had better results in the past year will generally get a better randomized worldwide ranking this year.
5. Re-rank all athletes within a region based on these randomized worldwide rankings.
6. Repeat steps 4 and 5 1,000 times and see how often each athlete qualifies for the 2013 Games.
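
For anyone who prefers code to a spreadsheet, the re-ranking and repetition steps above could be sketched in Python roughly as follows. This is only an illustration, not the workbook behind these predictions: the function names, the toy `toy_draw` sampler, the made-up region and the default of 3 qualifying spots per region are all assumptions.

```python
import random
from collections import Counter


def simulate_region(athletes, draw_rank, n_trials=1000, n_spots=3, seed=None):
    """Estimate each athlete's chance of finishing in the region's top n_spots.

    athletes  -- dict mapping athlete name to a category label
    draw_rank -- function (category, rng) -> simulated worldwide rank (lower is better)
    """
    rng = random.Random(seed)
    qualified = Counter()
    for _ in range(n_trials):
        # One simulated season: a random worldwide rank for every athlete.
        ranks = {name: draw_rank(cat, rng) for name, cat in athletes.items()}
        # Re-rank within the region; the best n_spots ranks qualify.
        qualified.update(sorted(ranks, key=ranks.get)[:n_spots])
    return {name: qualified[name] / n_trials for name in athletes}


# Hypothetical inputs, for illustration only.
def toy_draw(category, rng):
    # Made-up ranges: "strong" athletes draw from 0-100, everyone else from 0-150.
    return rng.uniform(0, 100 if category == "strong" else 150)


region = {"A": "strong", "B": "strong", "C": "other", "D": "other", "E": "other"}
odds = simulate_region(region, toy_draw, seed=1)
```

Because exactly `n_spots` athletes qualify in every trial, the estimated probabilities for a region always add up to `n_spots`, which is a handy sanity check.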

Now, for those so inclined, here are a few details on this process:

To get a large enough sample size to build this model, I combined men and women.
The process for creating the categories of competitors was not straightforward. There was quite a bit of judgment on my part to make sure that each category had enough athletes to be credible and that the categories produced results that made sense with each other. For instance, I wanted to separate out the top 2011 Games competitors (I chose top 15), but that meant I could not further break those athletes down based on 2011 regional rank, because there just would not be enough athletes there to get a credible sample.
The process for randomly generating the numbers is as follows:

Generate a uniform random number between 0 and 1 (=rand() in Excel). If this number is lower than the athlete's chance of finishing in the top 50, assign him or her to the top 50. If not, then if it is lower than the athlete's chance of finishing in the top 100, assign him or her to be between 50 and 100. Otherwise, the athlete is assigned to be between 100 and 150.
Once we have assigned the athlete to a range of ranks, generate another uniform random number between 0 and 1. Multiply this by 50 to get the athlete's exact place within the range. Generally, you'll need to be in the top 50 worldwide to qualify, but depending on how other athletes fare, it's possible to end up in the 50-100 range and still be in the top 3.
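
The two-stage draw just described translates fairly directly into Python. The sketch below is a hedged illustration rather than the actual spreadsheet formula: the function name and parameter names are assumptions, and `p_top100` is taken to be the cumulative chance of finishing anywhere in the top 100 (so it includes the top-50 chance).

```python
import random


def draw_worldwide_rank(p_top50, p_top100, rng):
    """Draw one simulated worldwide rank in the range 0-150.

    p_top50  -- chance of a top-50 worldwide finish
    p_top100 -- chance of a top-100 worldwide finish (cumulative, >= p_top50)
    rng      -- a random.Random instance
    """
    u = rng.random()  # first uniform draw picks the 50-place band
    if u < p_top50:
        band_start = 0
    elif u < p_top100:
        band_start = 50
    else:
        band_start = 100
    # Second uniform draw, scaled by 50, gives the place within the band.
    return band_start + 50 * rng.random()


# Made-up probabilities, for illustration only.
rng = random.Random(2013)
example_rank = draw_worldwide_rank(0.40, 0.70, rng)
```

With these made-up inputs, an athlete lands in the top 50 about 40% of the time, in the 50-100 band about 30% of the time, and in the 100-150 band otherwise.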

Here is a listing of the categories I used to break down the athletes:

Top 15 at prior Games
Below 15 at prior Games, top 40 worldwide at prior Regionals
Below 15 at prior Games, below 40 worldwide at prior Regionals
Did not make prior Games, top 50 worldwide at prior Regionals, top 0.5% in current Open
Did not make prior Games, top 50 worldwide at prior Regionals, below 0.5% in current Open
Did not make prior Games, 50-100 worldwide at prior Regionals
Did not make prior Games, below 100 worldwide at prior Regionals, top 1% in current Open
Did not make prior Games, below 100 worldwide at prior Regionals, below 1% in current Open
Did not make prior Games, did not compete at prior Regionals, top 0.2% in current Open
Did not make prior Games, did not compete at prior Regionals, 0.2-1.0% in current Open
Did not make prior Games, did not compete at prior Regionals, below 1.0% in current Open
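
As a rough illustration, the category rules above can be written as a small Python function. The function and parameter names are hypothetical, the category numbers simply follow the order of the list, and the handling of exact boundaries (for example, treating 15th place as "top 15" and exactly 0.5% as "top 0.5%") is my assumption rather than something stated above.

```python
def assign_category(games_rank, regional_rank, open_pct):
    """Return the category number (1-11, in the order listed above).

    games_rank    -- finish at the prior Games, or None if the athlete did not make it
    regional_rank -- worldwide rank at the prior Regionals, or None if they did not compete
    open_pct      -- percentile in the current Open (0.5 means top 0.5%)
    """
    if games_rank is not None:
        if games_rank <= 15:
            return 1
        # Assumption: a Games athlete with no regional rank falls in category 3.
        return 2 if regional_rank is not None and regional_rank <= 40 else 3
    if regional_rank is None:
        # Did not compete at prior Regionals: split on Open percentile only.
        if open_pct <= 0.2:
            return 9
        return 10 if open_pct <= 1.0 else 11
    if regional_rank <= 50:
        return 4 if open_pct <= 0.5 else 5
    if regional_rank <= 100:
        return 6
    return 7 if open_pct <= 1.0 else 8
```

For example, `assign_category(None, 30, 0.3)` describes an athlete who missed the prior Games, ranked 30th worldwide at Regionals and finished in the top 0.3% of the Open, which places them in the fourth category.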

OK, before we move on to the model results, I promised some good old-fashioned analysis regarding one Chris Spealler. As I mentioned when the regional events were announced, I thought these events favored smaller athletes much more than last year's did. It seems that has been the case so far on the men's side (Spencer Hendel failing to qualify and Josh Bridges dominating his regional are two pieces of evidence for this). I think Spealler will take a big hit on the overhead squat and a slightly smaller hit on the deadlift-box jump, but I see him faring well on everything else.

There are at least 5 really tough guys in that region, so he'll certainly need to be at the top of his game. I tend to think Hathcock will break through this year, which would mean Spealler would probably need to beat out Patrick Burke in order to qualify (doubt he can beat Matt Chan). You never know with Burke since he really struggled in the Open, but he has been in the Games 4 years running. But all in all, with a gun to my head, I say Spealler makes it. I don't exactly know how, but I say he makes it.

On to the model results. Because this process was time-consuming, and because it's most definitely in the early stages with a few kinks still to work out, I only produced predictions for the South West and North Central this week.

This process tends to give a lot of solid athletes a good chance of qualifying, but will rarely give anyone an extremely high chance of qualifying. Obviously, for someone like Rich Froning, I may be inclined to make some manual adjustments so that his odds go up substantially. But frankly, the Regionals are so tough these days that there are only a handful of athletes that we expect to cruise through to the Games.

This week, I have just produced the results straight up, with no modification. Take them for what they are worth: these models only take into account the performances from the past year, and obviously I have no idea how an athlete is feeling or how they have been training. On a large scale, I feel good about them, but there are certainly instances where certain athletes probably should be assigned a better chance than they are here (Kasperbauer seems like one, as you will see below, but then again, that is a very deep region).

OK, finally, below are the athletes in each region with the best chances of making it to the Games.

Enjoy the weekend everyone!

Friday, May 31, 2013
