Friday, May 17, 2013

Testing, testing... 2013 Regional Predictions

The first weekend of Regionals is finally here! Unfortunately, this is also the same weekend I'll be taking my Final Assessment (which of course isn't actually the final hurdle to clear to finish my actuarial testing), so I'll have little to no time to follow the action live. I may not even be able to tune in for the live broadcasts on Sunday afternoon (the humanity!). Anyway, point being, it's a busy time right now.

At the same time, I did want to give predicting the Regionals a shot this year. For this first week, I had to compromise a bit: I've got a set of predictions for the four men's competitions this weekend, but I didn't have time to get through the women. On top of that, I knew the methodology I'd really like to use would be too time-consuming for this first week, so I opted for something a bit simpler. Consider this a sort of beta test for making Regional predictions.

Making these predictions posed quite a different challenge from the Games predictions for a few reasons: 1) we only have one set of results so far this season; 2) there are about 15-20 times more athletes; 3) there are multiple regions, meaning an athlete's success is dictated (to some extent) by the strength of his/her region. With that in mind, here is the basic methodology I employed to make these predictions:

I felt that using only the 2013 Open results to predict the 2013 Regionals was insufficient, so I went back and grabbed the 2012 Games results and the 2012 Regional results. I wanted to develop three sets of models: for athletes who qualified for the Games last year, I would use all three competitions; for athletes who missed the Games last year but competed in Regionals individually, I would use the 2013 Open and 2012 Regionals; for the rest of the athletes, I would use only the 2013 Open.

To build these models, I had to go back in time a year and look at how the 2011 Regionals, 2011 Games and 2012 Open related to 2012 Regional results. Gathering all this information was time-intensive, and it also forced a couple of limitations upon me. First, I only had 2011 Regional information available for athletes who reached the final event (i.e., the top 12 in each region). This meant that, to be consistent, my 2013 predictions could only use Regional results for athletes who reached the finals in 2012. Second, because I often had funky Open results coming through for athletes with common names (Ben Smith, for example), I wound up limiting my work from last year to athletes finishing in the top 7.5% of the Open. This gave me confidence that the scores I did use were correct.
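
For anyone who wants to see the three tiers spelled out, here is a minimal Python sketch of how an athlete could be routed to the right set of inputs. It is only an illustration: the Athlete class, its field names and the model_inputs function are hypothetical, and it says nothing about how the models themselves were fit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Athlete:
    # All field names here are hypothetical, chosen for illustration only.
    name: str
    open_2013_rank: int                       # 2013 Open placing
    regional_2012_rank: Optional[int] = None  # set only if he reached the 2012 Regional final
    games_2012_rank: Optional[int] = None     # set only if he competed at the 2012 Games


def model_inputs(athlete: Athlete) -> List[str]:
    """Return the competitions that would feed this athlete's prediction."""
    inputs = ["2013 Open"]  # every athlete has an Open result
    if athlete.regional_2012_rank is not None:
        inputs.append("2012 Regionals")
        # Games athletes are a subset of Regional finalists
        if athlete.games_2012_rank is not None:
            inputs.append("2012 Games")
    return inputs
```

An athlete with no 2012 Regional final on record falls through to the Open-only tier, which matches the consistency limitation described above.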

Anyway, let's go ahead and give the top 5 for each region this year (men only - sorry, not sexist, just short on time):

South East
1) Chase Daniels
2) Brandon Phillips
3) Guido Trinidad
4) Elijah Muhammad
5) Irving Hernandez

North East
1) Daniel Tyminski
2) Austin Malleolo
3) Spencer Hendel
4) Mike McKenna
5) Dan Goldberg

Europe
1) Frederik Aegidius
2) Mikko Aronpaa
3) Mikko Salo*
4) Lacee Kovacs
5) Jakob Magnusson

Southern California
1) Kenneth Leverich
2) Jeremy Kinnick
3) Josh Bridges*
4) Ryan Fischer
5) Bill Grundler

Top 10 Overall Performers of the Weekend
1) Kenneth Leverich
2) Daniel Tyminski
3) Austin Malleolo
4) Spencer Hendel
5) Chase Daniels
6) Frederik Aegidius
7) Jeremy Kinnick
8) Brandon Phillips
9) Mikko Aronpaa
10) Mikko Salo

In developing these models, I found two key things: 1) athletes who reached the Games last year have a much better chance of reaching the Games this year than other athletes, even given a similar Open result; 2) similarly, athletes who competed at a high level at Regionals last year have a much better chance of reaching the Games this year than other athletes, even given a similar Open result. In 2012, 81% of athletes who made the 2011 Games and were in the top 0.5% in the world in the 2012 Open ended up finishing in the top 50 worldwide at Regionals. Of those who reached the finals at 2011 Regionals but did not make the Games (still top 0.5% in the 2012 Open), that percentage drops to 33%. For those who didn't make the finals at the 2011 Regionals (still top 0.5% in the 2012 Open), the figure drops to 14%.
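
Percentages like these are just hit rates within each experience group. A rough Python sketch of that tabulation follows, assuming the records have already been filtered to the top 0.5% of the Open. The function name, the group labels and the sample records are all made up for illustration and are not the data behind the figures above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def top50_rate_by_group(records: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """records: (experience_group, worldwide_regional_rank) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, regional_rank in records:
        totals[group] += 1
        if regional_rank <= 50:  # finished in the top 50 worldwide at Regionals
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Made-up records, only to show the shape of the input.
sample = [
    ("games", 12), ("games", 70),
    ("regional_final", 45), ("regional_final", 90), ("regional_final", 130),
    ("neither", 300),
]
rates = top50_rate_by_group(sample)
```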

Sure, last year we had guys like Scott Panchik and Marcus Hendren who made a splash in their first Regionals, but for every one of them, there were generally about 6 other guys with similar Open performances who didn't do anything special at Regionals. Meanwhile, you had veterans like Patrick Burke who put up sub-par Open performances and still excelled at Regionals. To be sure, there will be some new faces who do amazing things at Regionals this year, but it's just hard to predict exactly who those will be.

These are interesting facts, but they make for somewhat boring predictions (i.e., a huge advantage to prior Games athletes). This is why I'd like to work on some different techniques for next week (or maybe the 3rd or 4th week - no promises). Ideally, I'd like to build some sort of stochastic model that gives each athlete a probability of reaching the Games, not simply a best estimate of how each athlete will do.
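
To make the idea of a stochastic model a bit more concrete, here is one possible shape for it in Python: take a best-estimate score for each athlete, add random noise, re-rank the field many times, and count how often each athlete lands in the qualifying spots. This is a sketch under stated assumptions, not a model I have built. The function name, the normally distributed noise, the noise_sd value and the number of spots are all assumptions.

```python
import random
from typing import Dict


def qualification_probabilities(
    expected_score: Dict[str, float],
    noise_sd: float,
    spots: int,
    n_sims: int = 10_000,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate each athlete's chance of finishing in the top `spots`.

    Lower scores are better, like a projected placing.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    counts = {name: 0 for name in expected_score}
    for _ in range(n_sims):
        # One simulated weekend: best estimate plus assumed Gaussian noise
        simulated = {
            name: rng.gauss(score, noise_sd)
            for name, score in expected_score.items()
        }
        qualifiers = sorted(simulated, key=simulated.get)[:spots]
        for name in qualifiers:
            counts[name] += 1
    return {name: counts[name] / n_sims for name in counts}
```

With noise_sd set to zero this collapses back to a plain ranking of the best estimates. Larger values spread the probability across more of the field, which is where the new faces would show up.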

So again, consider these a beta test, and don't take them too seriously. Enjoy the weekend, and I'll see you again soon!

Before I go, I'd also like to give a shout-out to Michael Girdley (see his blog at girdley.com for some 2013 Open analysis and more). Through some programming wizardry, Michael has been able to pull down all the detail from the 2013 Open (including age, height, weight, current maxes, etc.), which will allow for some more in-depth analysis of the Open (some of which he's already done on his site). I'm looking forward to digging more into that in the coming weeks and months.

*I manually entered Josh Bridges and Mikko Salo with a Regional and Games result of 47th for last year (i.e., last place at the Games). They were special cases of athletes who have done exceptionally well in the past but missed last year due to injury. This was my compromise on them.

3 comments:

  1. ...bear with me.

    I was reading Chelsea Ryan's blog entry about her experience at the NorCal Regional (http://chelsearyantraining.blogspot.com/2013/05/11th-place-at-norcal-regional.html) and was struck by this bit:

    "My goal going into regionals was top 20, given that I was 51st in the open I thought that seemed like a reasonable goal. These girls have to be pretty good if they have been qualifying for regionals the previous years and I haven't, right? I was actually quite surprised by some of the athletes and the scores they posted on the open versus at regionals. There were quite a few things that didn't seem to add up."

    Chelsea goes on more about that as the article continues but it got me thinking that it'd be an interesting bit to see who fell where relative to their placing in the Open.

    Granted, the Regional is not the Open and some athletes come into the Regional with injuries they didn't have in the Open (and vice versa) but general conclusions might be drawn and it's interesting nonetheless.

    For the men, the four columns are: Final NorCal Regional Rank; Name; NorCal Open Rank; +/- from Open to Regional

    1 Jason Khalipa 2 1
    2 Neal Maddox 1 -1
    3 Garret Fisher 3 0
    4 Marcus Filly 15 11
    5 Pat Barber 4 -1
    6 Gabe Subry 17 11
    7 Ben Alderman 39 32
    8 Nick Zambruno 13 5
    9 Buddy Hitchcock 6 -3
    10 Anthony Malta 22 12
    11 Will Zerlang 14 3
    12 Shaun Eagen 28 16
    13 Nick Lucchesi 29 16
    14 Nick Pappas 18 4
    15 Aj Zambruno 32 17
    16 Dusty Sulon 35 19
    17 Lorin Adams 20 3
    18 Myles Lewis 37 19
    19 Ryan Hignell 8 -11
    20 Timmy Johnston 31 11
    21 Adam Jamieson 7 -14
    22 Mauricio Leal 43 21
    23 Trent Simmons 33 10
    24 Mark Pfeifer 0 -24
    25 Tyler Wilcox 27 2
    26 Spenser Scott 16 -10
    27 Daniel Felling 19 -8
    28 Desmond Bittner 5 -23
    29 Eric Botsford 21 -8
    30 Rikus Pretorius 23 -7
    31 Devon Simmons 38 7
    32 Travis McRoberts 40 8
    33 Carlos Ramirez 42 9
    34 Anthony DeJager 34 0
    35 Brian Huberty 41 6
    36 Mike Morales 36 0
    37 Dirty Alvarez 10 -27
    38 Alex Oberman 11 -27
    39 Alex Brown 30 -9
    40 Riki Gonzales 12 -28
    41 Jake Neubauer 26 -15
    42 Jonathan Jorgensen 25 -17
    43 Alex Rollin 9 -34

    I'm certain there is a more nuanced way to do an analysis of athlete performance in their Open vs. their Regional but I think it's both interesting and important.

    Thoughts...?

    Replies
    1. Brian,

      In the post with predictions for week 2, I did do some quick comparisons showing the R-squared for my predictions along with the R-squared for predictions solely based on the Open (the same as your "Expected Regional Rank"). Both predictions had a pretty low R-squared, but mine (which relied more on last year's regional and Games results) were definitely better when looking at the entire field. The same seems to be holding true with week 2, although I'm not totally finished up with that analysis. Either way, it's clear that the Open is not that great of a predictor for Regionals.

      However, I think it's really tricky to try to pin that on one factor, such as fudging of scores. Like you mentioned, there are several other factors, such as much heavier programming at Regionals, injuries and the effects of fatigue. I'd personally like to see the Open, Regionals and Games be a bit more similar so we didn't have athletes making it to the next round who had glaring weaknesses that simply weren't tested. Almost every year, we'll see some athletes at the Games who really struggle on the long run/swim/bike - when you don't test that in your qualifying stages, you're bound to have some people qualify who struggle there. I know there are logistical problems with doing that, but in an ideal world, I think it would be nice for the three competitions to be a bit more similar in terms of loading, time domains, etc.

      Are there some athletes who probably fudged their scores or shorted reps in the Open in order to make Regionals? Surely there are. But I think it's hard to make the determination of who they are simply based on regional results. Chelsea probably has a better idea of who those folks might be from seeing them up close and personal - I think that's probably the only way to really get a sense for that.
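
For reference, the R-squared comparison mentioned in this reply can be sketched in a few lines of Python. The reply does not say which definition was used, so the squared Pearson correlation below is an assumption, and the function name is hypothetical. The sample inputs are the top five rows of the NorCal men's table in the comment above.

```python
from typing import Sequence


def r_squared(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Squared Pearson correlation between predicted and actual placings."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov ** 2 / (var_p * var_a)


# Column 3 of the table and final Regional rank for the top five NorCal men
open_based = [2, 1, 3, 15, 4]
final_rank = [1, 2, 3, 4, 5]
open_only_fit = r_squared(open_based, final_rank)
```

Running the same function on two competing sets of predictions against the same final ranks gives the side-by-side comparison described above.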

  2. My apologies, Column 3 is not "Open Rank" but "Expected Regional Rank", which equals Open placement minus the number of higher-placed athletes who did not attend Regionals as an individual.

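
The "Expected Regional Rank" adjustment and the +/- column from the comments above can be written out as a short Python sketch. The function names and input shapes are hypothetical. The arithmetic follows the definition given in the comment: Open placement minus the number of higher-placed athletes who did not attend Regionals as an individual.

```python
from typing import Dict, List, Set


def expected_regional_ranks(open_order: List[str], attending: Set[str]) -> Dict[str, int]:
    """open_order: names sorted by regional Open placing, best first."""
    ranks = {}
    skipped = 0  # higher-placed athletes who are not competing as individuals
    for open_place, name in enumerate(open_order, start=1):
        if name in attending:
            ranks[name] = open_place - skipped
        else:
            skipped += 1
    return ranks


def plus_minus(expected_rank: int, final_rank: int) -> int:
    """Positive means the athlete finished better than his Open result implied."""
    return expected_rank - final_rank
```

As a check against the table, an expected rank of 15 with a 4th-place finish gives +11, and an expected rank of 9 with a 43rd-place finish gives -34.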