The NCAA Formula

Brett Whitehead
March 14, 2012

Ray and I have been best friends for a very long time, specifically since kindergarten. In that time, we have gone in two different directions professionally. Ray went the way of math, whereas I went the way of history and social science. Although one might suspect that the two of us would be in conflict due to our different professions, we often combine our talents for our mutual benefit and the greater good of society. Like Voltron.

Ray and I are sharing duties on this article because this article could be the start of something amazing. Years from now, this article could go down with the Sabermetrics movement, the kids from the Bringing Down the House book, or Biff Tannen in Back to the Future 2. Yes, it is possible that we may have cracked the code to the NCAA tournament.

For purposes of this article, we will do what we do best. Ray will display the numbers, while I will advocate for their success. You're welcome, America.

1. The NCAA Tournament is Stupid.

There is no more frivolous exercise than filling out an NCAA bracket. In my 30ish years of existence, the only time I won money on an NCAA pool was when I played with two other people, and even then I had to sweat out a Louisville Elite Eight run after I secretly spent the money before I was even close to winning. I usually tend to follow a few common losing patterns when filling out my bracket on my own.
Step one: I watch zero college basketball all year. I don't even follow it on SportsCenter, even though I watch that approximately 1.75 times a day. Step two: I lean on biases based on familiarity. I always pick a number one seed to win, because I'm afraid I will look stupid otherwise. If UCLA is in, I always pick them to win. Duke always goes far because I see them in highlights a lot. I had a friend who went to Michigan State so they always get into the Final Four. Gonzaga is "plucky" so they always get to the Elite Eight. Any future draft picks with a lot of hype are guaranteed spots in the Final Four. Step three: I lean on crude statistical analysis with no justification. If I have too many favorites going forward, I randomly pick upsets. If the Final Four is all one seeds, I eliminate a one seed seemingly at random. Step four: I look at my bracket in its totality and convince myself that this is the year I will win.

Needless to say, that never works. I had one other plan in college, which was to get super drunk and fill out a bracket, but that also failed, although it did pick an Indiana run in 2001 that otherwise took the nation by surprise.

2. Science is Foolproof. Fame is Forever.

The allure of the foolproof NCAA formula is obvious. On a grand scale, if you figure out an NCAA formula, you will win a ton of money and look like a genius. On a smaller scale, it makes an otherwise irrelevant sporting event incredibly climactic for Ray and me. Normally, winning an NCAA tournament pool is a couple hundred bucks and a pat on the back. But if Ray and I use this formula to win even one NCAA tournament bracket, well, you will never hear the end of it. That's what is so exciting. It's more than just winning your friend's money; it's being the Bill James of college basketball.

3. The Formula

Having a model or method to approach a problem, whether the model is accurate or not, is preferable to having no model at all. As Brett discussed above, his approach to NCAA tournament picks was essentially arbitrary. He had some rules that may constitute a model, but where he says “random” he doesn’t mean truly random (which may not be a bad way to go in the long run anyway) and, aside from the one that involved getting hammered, he doesn’t have a principled way to apply his method. Because he had no model, he could not adjust it year-to-year to develop a better method. Instead, what I wanted was some model that I could apply in an unbiased fashion, and subsequently amend it.

If you search the Internet, there are plenty of prediction models out there [see https://www.google.com/search?q=ncaa+picks+model], most of which will give you some probability of a certain team winning. That’s great and the right way to do things if you want to be scientifically accurate and cover your ass at the same time; however, I want to apply a few constraints to my model, all of which stem from laziness and apathy. It must:

  1. be based on easily obtainable data, like espn.com for example
  2. be fast and easy to apply, because I don’t want to spend more than an hour on this
  3. clearly pick a winner of each match-up and not give probabilities, because I only want to fill out one bracket

My original formula fulfilled all of those goals, but it wasn’t very good. (Well, I don’t really know if it was any good. It was good one year, bad one year, and nonexistent last year because I totally forgot about the tournament. So I guess it’s a draw.) My new model also fulfills those goals, but I don’t know if it’s any good yet. One thing the new model has going for it is that it will also provide validation for fantasy sports scoring, providing a convenient out: If it fails, hey, it’s not my fault, blame fantasy sports!

4. The Formula wasn't Perfect.

Ray first approached me with the formula the year that Kansas beat Memphis, due to Memphis's poor free throw shooting. I was captivated by the idea of (1) winning an NCAA pool, (2) being famous forever (duh), and (3) not having to fill one out subjectively with no intellectual basis. Ray also found that his formula worked to an extent that year. Had Memphis not choked away the game in the final few minutes, Ray would have won his pool and the NCAA formula would have had its first victim. Unfortunately, Mario Chalmers intervened and it was not to be.

The next year was a hiccup for the formula. Despite correctly predicting a shockingly high number of first round games, the predicted winner Gonzaga was ousted in the Sweet 16, thus eliminating the formula's bracket entirely. At that point, we thought of retaining the objectivity of the formula, but allowing for a subjective veto system that would let us intervene when the formula's pick was clearly flawed.

A. The actual (old) formula

The old formula was simple: it combined only two statistics, average points scored per game and average points against per game. The score of a game was then predicted as

s_a = 0.5 (Pf_a + Pa_b)
s_b = 0.5 (Pf_b + Pa_a)

where s_a and s_b give the predicted point totals of teams a and b respectively, Pf_a and Pf_b are the average points scored for teams a and b, and Pa_a and Pa_b are the average points against teams a and b respectively. The winner of the match-up is simply chosen as the greater of the above-predicted scores. That’s it!
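
For anyone who would rather not do this by hand, here is a minimal Python sketch of the old formula. The function names, the dictionary keys, and the team averages are all made up for illustration and are not from the original. The article also does not say what happens when the two predicted scores are exactly equal, so returning no winner in that case is an assumption.

```python
def predict_score(points_for, opp_points_against):
    """Average a team's points scored with its opponent's points allowed."""
    return 0.5 * (points_for + opp_points_against)


def pick_winner(team_a, team_b):
    """Return (winner name, predicted score of a, predicted score of b).

    Each team is a dict with hypothetical keys:
    'name', 'pf' (average points for), 'pa' (average points against).
    """
    s_a = predict_score(team_a["pf"], team_b["pa"])
    s_b = predict_score(team_b["pf"], team_a["pa"])
    if s_a == s_b:
        # Assumption: the article gives no rule for an exact tie.
        return None, s_a, s_b
    winner = team_a["name"] if s_a > s_b else team_b["name"]
    return winner, s_a, s_b


# Made-up per-game averages, for illustration only.
team_a = {"name": "Team A", "pf": 80.0, "pa": 60.0}
team_b = {"name": "Team B", "pf": 70.0, "pa": 66.0}

print(pick_winner(team_a, team_b))  # ('Team A', 73.0, 65.0)
```

In this example Team A is predicted to score 0.5 (80 + 66) = 73 and Team B 0.5 (70 + 60) = 65, so Team A advances.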

This formula did great three years ago, as Brett discussed above: it placed in the 87th percentile of all Yahoo! brackets. The year after, it basically tanked at the 77th percentile. One issue Brett mentioned to me is that it doesn’t account for strength of schedule at all. There may be a way to adjust a team’s statistics based on strength of schedule, but that starts to fall afoul of Rule #1.

5. The Formula is now Perfect.

As mentioned above, this new formula is based on the category method of fantasy sports scoring. In baseball, e.g., you can break down a player's performance into hits, runs batted in, strikeouts, walks, etc. You can then add up your team's total statistics in each category, and compare that to your opposing team's total. Teams get a point for having the better team stats in a category, and whoever has the most points wins.

I based this year’s formula on the following statistics (all in per game averages): points, rebounds, assists, steals, blocks, turnovers, field goal percentage, free throw percentage, and three-point percentage. Why? Follow this link: http://espn.go.com/mens-college-basketball/team/stats/_/id/399/albany-great-danes. See the first set of stats that are given? That’s why. I could cut and paste one line of numbers into Excel and be on my way.

Scores for each team are then given as described above: each category is compared, and whichever team has the better stat gets a point. For everything except turnovers, that means whoever has the higher stat. Turnovers are bad (I think) so whoever has fewer turnovers gets a point. For any stat that is tied, both teams get a point. If there is a tie in total points (which happened in two or three cases) then the higher seed gets the nod. If the formula predicts something ridiculous (like, I don’t know, putting NC Asheville over Syracuse), then we can apply historical perspective, like the fact that a 16 seed has never beaten a 1 seed. (Not that that was necessary. Not at all.)
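
The category rules can be sketched in a few lines of Python. This is only a rough illustration: the stat names, the team dictionaries, and the numbers are hypothetical, and the real thing was done by pasting numbers into Excel. The subjective "historical perspective" veto is left out, and since the article gives no rule for two teams with the same seed tying on points, this sketch arbitrarily returns the second team in that case.

```python
# The nine categories from the article, under made-up key names.
CATEGORIES = ["points", "rebounds", "assists", "steals", "blocks",
              "turnovers", "fg_pct", "ft_pct", "three_pct"]
LOWER_IS_BETTER = {"turnovers"}  # fewer turnovers wins the category


def category_points(stats_a, stats_b):
    """Award one point per category to the better team; both score on a tie."""
    pts_a = pts_b = 0
    for cat in CATEGORIES:
        a, b = stats_a[cat], stats_b[cat]
        if cat in LOWER_IS_BETTER:
            a, b = -a, -b  # flip the sign so "higher is better" holds everywhere
        if a >= b:
            pts_a += 1
        if b >= a:
            pts_b += 1
    return pts_a, pts_b


def pick_by_categories(team_a, team_b):
    """Pick the team with more category points; the higher seed breaks ties."""
    pts_a, pts_b = category_points(team_a["stats"], team_b["stats"])
    if pts_a != pts_b:
        return team_a["name"] if pts_a > pts_b else team_b["name"]
    # Higher seed means the smaller seed number.
    # Assumption: equal seeds fall through to team_b.
    return team_a["name"] if team_a["seed"] < team_b["seed"] else team_b["name"]


# Made-up per-game averages, for illustration only.
team_a = {"name": "Team A", "seed": 4, "stats": {
    "points": 75, "rebounds": 35, "assists": 15, "steals": 7, "blocks": 4,
    "turnovers": 12, "fg_pct": 0.46, "ft_pct": 0.70, "three_pct": 0.35}}
team_b = {"name": "Team B", "seed": 5, "stats": {
    "points": 72, "rebounds": 37, "assists": 15, "steals": 6, "blocks": 5,
    "turnovers": 11, "fg_pct": 0.44, "ft_pct": 0.72, "three_pct": 0.33}}

print(category_points(team_a["stats"], team_b["stats"]))  # (5, 5)
print(pick_by_categories(team_a, team_b))                 # Team A
```

Here the teams tie on assists (so both get that point) and end up level at five points each, so the 4 seed gets the nod over the 5 seed.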


Figure 1: The correct NCAA South bracket.

Figure 2: The correct NCAA West bracket.

Figure 3: The correct NCAA East bracket.

Figure 4: The correct NCAA Midwest bracket.

Figure 5: The correct NCAA Finals bracket.

A. Discussion

At first glance, the formula bracket looks good, doesn't it? It predicts a fair number of upsets, but that's what happens in the NCAA tournament. Nothing is too obvious. It’s almost charming in its handling of favorites.

It’s perfect. This year, things will be different. This year, the NCAA tournament has been cracked.

brett.whitehead@brutalhorse.com