Saturday, January 24, 2015

CFG Analysis Podcast - Pilot Episode

I'm trying something new here at CFG Analysis.  In addition to the articles I'm posting, I'm adding a podcast to complement what I do with my written work.  These podcasts are going to be much more informal than what I typically do on the blog, and eventually I hope to chat with some athletes, coaches and everyday CrossFitters.

Today's Pilot Episode is just me, discussing the background on this site, how it came to be and who exactly I am.

Stay tuned for our next podcast here in a couple of weeks, and bear with me as I get the hang of this podcast thing.

The Pilot Podcast and all future episodes can be found here: http://cfganalysis.libsyn.com  or below.  I'm currently working on getting it set up on iTunes.

Sunday, January 4, 2015

How Many Years Do CrossFit Athletes Last in the Open?

What you need to know from this post:
  • About 50% of athletes who compete in the Open one year will continue to the following season.
  • Athletes who have competed for multiple years have a higher likelihood of returning the next season than athletes who are competing in their first season.
  • Based on data from the past four years, approximately 20% of athletes who start competing in the Open will still be competing in their fourth season. I estimate that approximately 10% will still be competing in their seventh season.
  • The higher an athlete ranks in the Open, the higher the probability that they will return the next season.
As I mentioned a few weeks ago, I will be missing the Open for the first time this season.  I had arthroscopic hip surgery to repair a torn labrum.  Without getting into too much detail, the injury was a chronic wear-and-tear type of injury that wasn't a direct result of CrossFit, per se, but rather the fact that I had been so active for the past 10-15 years.  I had a hip impingement that I was born with that put me at risk for this type of injury.  I hope to return to CrossFit eventually, but I won't be ready to compete by late February.

So although my injury isn't directly attributable to CrossFit, it's forced me to face the fact that competing and training at the level I had been is not always easy to sustain.  For me, it was my hip.  For others, it is a shoulder or a knee.  This is true in any sport, and CrossFit is not immune.  This is not a commentary on whether CrossFit as a training methodology is dangerous.  That's a third rail I don't intend to touch.  My point is simply that injuries are inevitable when competing in any serious sport.

With an assist from my wife, who came up with the idea for this post, I decided to try to answer the question: how long do CrossFit athletes tend to compete?  I'm not talking just about the Rich Fronings of the world, but the everyday athlete who signs up for the Open with no hope of even sniffing Regionals.  We see the Open growing in size each year, but that doesn't necessarily mean athletes are continuing on for multiple years; we could just be replacing the vast majority of athletes one year with an even bigger crop of newbies.

Of course there are multiple reasons an athlete won't compete: injury, lack of interest, disappointment from poor performance in a previous season, work commitments, etc.  With the data I have available, I can't identify which factors are most important, but I can get a pretty good idea of the rate at which athletes are dropping off.

Using the Open data from 2011-2014, I looked at how many athletes continued on each year and which attributes of the athlete (prior Open experience, prior Open finish, age, weight, height) may influence their likelihood of continuing.  Because I only have athlete names without any other identifier, this analysis is limited to "truly unique names," which are those that never appeared more than once in any year.  I also focused my analysis on men, because women's names are frequently changed due to marriage or divorce, making it hard to track which athletes truly dropped out and which simply changed names.*
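For anyone who wants to reproduce the "truly unique names" filter, here is a minimal sketch of one way to do it.  The function name, the data layout (a dict of year to list of names) and the toy names are all made up for illustration; this is not the actual code or data behind the post.

```python
from collections import Counter

def truly_unique_names(names_by_year):
    """Return the names that never appear more than once within any single year.

    names_by_year: dict mapping a year to the list of athlete names on that
    year's leaderboard (hypothetical layout, assumed for this sketch).
    """
    duplicated = set()
    all_names = set()
    for names in names_by_year.values():
        counts = Counter(names)
        # A name shared by two or more athletes in the same year is ambiguous.
        duplicated.update(name for name, n in counts.items() if n > 1)
        all_names.update(names)
    # Drop a name entirely if it was ambiguous in ANY year.
    return all_names - duplicated

# Toy data, invented for illustration only.
toy = {
    2013: ["Al Smith", "Bo Jones", "Bo Jones", "Dee Park"],
    2014: ["Al Smith", "Bo Jones", "Dee Park", "Dee Park"],
}
unique = truly_unique_names(toy)
print(sorted(unique))
```

In the toy data, "Bo Jones" is dropped because the name appears twice in 2013, and "Dee Park" is dropped because it appears twice in 2014, leaving only "Al Smith".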

The way I am defining it, an athlete successfully survives from one year to the next if he completes all the Open events in both years.  Failing to complete all the Open events in a given year is counted the same as not competing.  Keep that in mind, since typically about 30% of the Open field from week 1 is gone by the end of the Open.  For purposes of this analysis, those athletes never even competed.  I don't care that HQ still got their money.

Finally, in this study, once you're out, you're out.  If an athlete skips a year, I ignore whether or not they returned the following year.  This makes things much cleaner for the analysis.  In case you are wondering, only about 15% of athletes skip a year and return in a subsequent year (I hope to be one of those 15%!).

OK, so let's get to the results.  First, the simplest way to look at this is to evaluate the athletes that started in 2011 and see how many were left in each subsequent year.  Let's take a look at those results below.


You can see that nearly 60% of athletes survived to Year 2 and nearly 30% were left after Year 4.  The issue with this analysis is that it ignores the athletes who started after 2011, which is a huge chunk of the current athlete pool.  The type of athlete who is competing today may be characteristically different from those who started in 2011, and we want to capture that.

The next chart estimates a survival curve for today's athlete population using only the most recent information.  To get the survival rate from Year 1 to Year 2, I looked at athletes who first competed in 2013 and saw how many returned in 2014.  To get the survival rate from Year 2 to Year 3, I looked at athletes who competed in 2012 and 2013, then found out how many of those returned in 2014.  To get the survival rate from Year 3 to Year 4, I looked at athletes who competed in 2011-2013, then found out how many of those returned in 2014.
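The cohort logic above can be sketched roughly as follows.  This is only an illustration under a few assumptions of mine: the data is a dict of year to the set of names who completed every event that year, a cohort is defined by the first year a name shows up, and the toy letters stand in for athletes.  The function name is hypothetical.

```python
def latest_survival_rates(finishers, latest_year, max_tenure):
    """Year-to-year survival rates measured only on returns in latest_year.

    finishers: dict mapping year -> set of names who completed all events
    (hypothetical layout).  Returns {tenure: rate}, where tenure 1 is the
    Year 1-2 rate, tenure 2 is the Year 2-3 rate, and so on.
    """
    rates = {}
    for tenure in range(1, max_tenure + 1):
        first_year = latest_year - tenure
        # Anyone seen before first_year is not part of this starting cohort.
        prior = set()
        for year in finishers:
            if year < first_year:
                prior |= finishers[year]
        cohort = finishers[first_year] - prior
        # "Once you're out, you're out": keep only athletes who finished
        # every season from their first year up to the year before latest_year.
        for year in range(first_year + 1, latest_year):
            cohort = cohort & finishers[year]
        returned = cohort & finishers[latest_year]
        rates[tenure] = len(returned) / len(cohort) if cohort else None
    return rates

# Toy data, invented for illustration only.
toy = {
    2011: {"a", "b", "c", "d"},
    2012: {"a", "b", "c", "e", "f"},
    2013: {"a", "b", "e", "g", "h", "i", "j"},
    2014: {"a", "e", "g", "h"},
}
rates = latest_survival_rates(toy, latest_year=2014, max_tenure=3)
print(rates)
```

In the toy data, four athletes start in 2013 and two return (Year 1-2 rate of 0.5), one 2012 starter is still active in 2013 and returns (Year 2-3 rate of 1.0), and two 2011 starters are still active in 2013 with one returning (Year 3-4 rate of 0.5).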

To get the cumulative survival rate for year 3, I multiplied the survival rates for years 1-2 and years 2-3.  To get the cumulative survival rate for year 4, I multiplied the survival rates for years 1-2, years 2-3 and years 3-4.  I then took it one step further, estimating survival rates in years 5-7 based on the rates for the first 4 years.  These are obviously estimates, as we don't have enough data to know the true likelihoods beyond year 4.


Here we see the survival rates are much lower than we previously observed.  About 50% of athletes remain after year 2, about 20% are left after year 4 and I'm estimating only about 10% will be left after year 7.  Clearly the majority of athletes don't make it too many years in this sport, but there is still a decent chunk of the population that sticks with it for years.

However, what's not obvious in the chart is that the chances that an athlete continues in the following year increase with each subsequent year of participation.  Below are the year-to-year survival rates:
  • Year 1-2: 47%
  • Year 2-3: 61%
  • Year 3-4: 72%
This is good news in my opinion.  We see that athletes who stick with it beyond the initial year are not likely to "burn out" the longer they compete.  Once an athlete is sufficiently invested, they are pretty likely to keep at it.**
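To make the link between these year-to-year rates and the cumulative curve concrete, here is a small sketch that chains them together.  The first three rates are the observed ones listed above, and the last three are the assumed rates from the ** footnote; the variable names are just for illustration.

```python
from itertools import accumulate
from operator import mul

# Observed year-to-year survival rates (Year 1-2, 2-3, 3-4).
observed = [0.47, 0.61, 0.72]
# Assumed rates for Year 4-5, 5-6 and 6-7, taken from the ** footnote.
assumed = [0.77, 0.79, 0.81]

# The cumulative survival rate is the running product of the yearly rates.
cumulative = list(accumulate(observed + assumed, mul))

for season, share in enumerate(cumulative, start=2):
    print(f"Still competing in season {season}: {share:.0%}")
```

The running product works out to roughly 21% for the fourth season and roughly 10% for the seventh, which matches the figures quoted earlier.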

In total, when you combine first-, second- and third-year competitors, about 52% of the field returns in the following season.  One offshoot of this is that if we consider how fast the Open has been expanding, we know that it must be largely made up of first-time competitors.  In order to maintain the size of the Open field from one year to the next, we need a lot of new athletes each season.  If there were 200,000 total athletes last season, we likely need about 100,000 new athletes to enter the field in 2015 simply to maintain the same size field as before.
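The replacement arithmetic is simple, but here it is spelled out.  The 200,000 figure is the round number used above, not an exact count, and the variable names are mine.

```python
field_size = 200_000   # round number from the paragraph above, not an exact count
return_rate = 0.52     # combined return rate for first-, second- and third-year athletes

# Athletes expected back next season, and the gap new athletes must fill.
returning = round(field_size * return_rate)
new_needed = field_size - returning

print(f"Returning athletes: {returning:,}")
print(f"New athletes needed to hold the field steady: {new_needed:,}")
```

That comes to 96,000 new athletes, which is where the "about 100,000" above comes from.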

The last piece of this analysis was to try to identify other factors (aside from number of years of prior experience) that made certain athletes more or less likely to continue in subsequent years.  For this, I limited the data again to only 2013 and 2014, then further limited the data to athletes who submitted a height and weight (I used the 2013 height/weight for all athletes).

Using a logistic regression model, I looked to see if 2013 percentile rank, age, height or weight had a statistically significant impact on the likelihood of returning in 2014.  As it turned out, all but height had a statistically significant impact (p-value less than .01 for percentile rank, age and weight).  However, in my opinion, we can basically ignore weight because the predicted probability of returning did not vary a whole lot (only about 4% higher probability at 180 lbs. vs. 220 lbs.).

The big key was the 2013 percentile rank.  As you might expect, athletes who finished near the top of the rankings had a much higher likelihood of returning.  Holding all other items at their mean, the predicted probability of returning was 74% for an athlete finishing in the 1st percentile, compared with 55% at the 50th percentile and 35% at the 99th percentile***.  If you aren't convinced by those somewhat opaque predictions from the logistic regression, the chart below shows the observed percentage of athletes who returned, by 2013 percentile rank.




As you can see, the pattern is very evident.  I'm not sure this will come as a surprise to anyone, but it's always nice to see your intuition confirmed in the data.
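If the logistic regression predictions feel opaque, here is a rough sketch of how they hang together.  A logistic model is linear on the log-odds scale, so two of the quoted predictions (74% at the 1st percentile and 35% at the 99th) are enough to pin down a line in percentile rank, and that line can then be checked against the third (55% at the 50th).  To be clear, the slope and intercept below are back-solved from those rounded numbers for illustration; they are not the fitted coefficients from the actual model, which also included age, height and weight.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))

def sigmoid(z):
    """Inverse of logit: turn log-odds back into a probability."""
    return 1 / (1 + math.exp(-z))

# Predicted probabilities quoted in the post (other variables held at their mean).
p_top, p_bottom = 0.74, 0.35   # 1st and 99th percentile

# Back-solved line on the log-odds scale (illustrative, not the fitted model).
slope = (logit(p_bottom) - logit(p_top)) / (99 - 1)
intercept = logit(p_top) - slope * 1

p_median = sigmoid(intercept + slope * 50)
print(f"Implied probability at the 50th percentile: {p_median:.0%}")
```

The implied value at the 50th percentile lands within a point of the 55% quoted above, so the three predictions are consistent with a single straight line in log-odds.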

Age was an interesting factor.  There are two things going on here:
  1. Without controlling for the 2013 percentile rank, it appeared that age had basically no effect.  
  2. Older athletes tend to have worse rankings than younger athletes in general. As we just showed, athletes with lower rankings have lower persistency.
What happens here is that the logistic regression showed that all other things being equal, older athletes actually have a higher likelihood of returning than younger athletes. If we hold the other items at their mean, the predicted probability of returning was 51% for a 20-year-old and 65% for a 50-year-old.

Of course, the big question moving forward is how things will change with the changes made to the Open in 2015.  The addition of a scaled division is likely to siphon off some athletes who had previously competed in the Rx'd division, and it's possible that with only 20 Regional invitations in each region (as compared to 48 in 2013-2014 and 60 in 2011-2012), some additional athletes will drop out.  I also have to believe that the overall participation in the Open will not continue to expand the way it has since 2011, a period in which it has roughly doubled each season.

There is no way to know the answers to these questions right now, but understanding what has happened in the past will certainly help us understand the impact of these changes in the future.

[Thanks a lot to Andrew Havko, Michael Girdley and Jeff King for pulling this data for me and/or making it publicly available]

*It appears that overall, the survival rate for women is very similar to that of men.  Initially, it appears about 4% lower, but using some very rough data for the marriage and divorce rates, that discrepancy could very easily be attributable entirely to name changes.

**For my estimates beyond year 4, I assumed this would continue to flatten out, so I used 77% for year 4-5, 79% for year 5-6 and 81% for year 6-7.

***These predicted probabilities are all probably about 4% too high.  That's because the subset of athletes used for the logistic regression was limited to those that submitted a legitimate height and weight.  In general, these athletes have slightly higher finishes and are slightly more likely to return than the average athlete.