How Our Bracketology Model Performed

We told you we expected our model to be better than roughly 25% of the final bracketologies on Bracket Matrix. We also told you the model was better, relative to the field, at predicting the final bracket in December than it was in March. Here’s how it did, and here’s where we can improve upon it:

The Final Bracketology Underperformed

By Paymon Score, which is how Bracket Matrix scores the final brackets, our model beat only 15 of the 201 other bracketologies, tying with two more. That works out to being better than roughly eight percent of the field, not the 25 percent we expected.
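
For readers less familiar with the metric, here's a minimal sketch of a Paymon-style scorer, assuming a simplified rubric; the point values are illustrative placeholders, not the official Bracket Matrix rules.

```python
# Illustrative Paymon-style scorer: reward having a team in the projected field, with a
# bonus that shrinks as the projected seed drifts from the actual seed. POINTS_IN_FIELD
# and SEED_BONUS are placeholder values, not the official Bracket Matrix rubric.

POINTS_IN_FIELD = 1
SEED_BONUS = {0: 2, 1: 1}  # exact seed, one line off; nothing for two or more off

def score_bracket(projected: dict[str, int], actual: dict[str, int]) -> int:
    """projected/actual map team name -> seed line (1-16) for the 68-team field."""
    total = 0
    for team, actual_seed in actual.items():
        if team in projected:
            total += POINTS_IN_FIELD
            total += SEED_BONUS.get(abs(projected[team] - actual_seed), 0)
    return total
```

Under that sketch, leaving a team out of the field entirely forfeits both the field credit and any seed bonus.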

We Don’t Have Data on Earlier Bracketologies

We may measure this next year, Bracket Matrix update by Bracket Matrix update: the Paymon Score of every bracket in the matrix at different points throughout the year, showing what we believe to be our model’s true strength, which is how accurately it predicts future results. We would need to be careful not to be disingenuous with this, since a lot of the bracketologies favor the “where things stand if the season ended today” approach over our predictive goal, and we don’t want to compare ourselves to them without noting that important caveat.
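
If we do track it, the bookkeeping is simple; something like the sketch below, where the snapshot dates, site names, and scores are all hypothetical.

```python
# Sketch of update-by-update tracking: for each Bracket Matrix snapshot date, store every
# bracketology's score against the eventual real bracket, then see where ours ranked.
# Dates, site names, and scores below are hypothetical.

from datetime import date

snapshots: dict[date, dict[str, int]] = {
    date(2021, 12, 15): {"ours": 250, "site_a": 240, "site_b": 260},
    date(2022, 2, 1):   {"ours": 275, "site_a": 270, "site_b": 268},
    date(2022, 3, 12):  {"ours": 301, "site_a": 320, "site_b": 315},
}

for snap_date, scores in sorted(snapshots.items()):
    field = [s for name, s in scores.items() if name != "ours"]
    beaten = sum(1 for s in field if s < scores["ours"])
    print(f"{snap_date}: ahead of {beaten} of {len(field)} other brackets")
```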

We Want to Make the Final Bracketology Better

The goal is to be as right as possible at every moment, and making the final bracketology stronger will necessarily make the in-season bracketologies stronger. Here are the model’s biggest misses, and what we can learn from each:

Michigan State

Our model did not have Michigan State in the field. Michigan State ended up in the field. And unlike Louisville, whose fate surprised most of the Bracket Matrix, Sparty was widely expected by our counterparts to get the call.

There are three things we’ll look into as we try to adjust our model to account for MSU:

The first is whether basketball brands get any additional love from the committee. It’s a subjective process, and bias plays a role, so if certain teams consistently outperform our model’s expectations, it would be helpful to identify them.

The second is whether there are especially good wins that get overweighted relative to how ratings systems evaluate them. Michigan State did beat Illinois, Ohio State, and Michigan in the season’s final weeks, and while all of those came at home, they were still wins over two eventual 1-seeds and a 2-seed.

The third is the role of recency. We’ve been reluctant to look into this because the committee ostensibly counts every game the same, but if it plays a role, it plays a role, and for whatever it’s worth, Michigan State did close the season 7-4, with the only losses coming either on the road, to Iowa, or to Maryland in the Big Ten Tournament.
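
If recency does matter, one straightforward way to test it is to decay each game’s credit by how long before Selection Sunday it was played and check whether that fits past committee seeds better than treating every game the same. A rough sketch, with the half-life and the per-game credit metric both being assumptions:

```python
# Sketch of a recency test: weight each game's credit by an exponential decay on how many
# days before Selection Sunday it was played, then check whether recency-weighted resumes
# fit past committee seeds better than unweighted ones. The half-life is an assumption.

HALF_LIFE_DAYS = 30  # arbitrary starting point; we'd fit this against past seed lists

def recency_weight(days_before_selection: float) -> float:
    return 0.5 ** (days_before_selection / HALF_LIFE_DAYS)

def weighted_resume(games: list[tuple[float, float]]) -> float:
    """games: (credit, days_before_selection), where credit is whatever per-game value
    the core model already assigns."""
    return sum(credit * recency_weight(days) for credit, days in games)

# Two identical wins one half-life apart: the earlier one counts half as much.
print(recency_weight(3) / recency_weight(33))  # -> 2.0
```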

Wichita State/Colorado State

Our model had Wichita State in the field in the morning, before seeing how the ratings systems reacted to Saturday’s games. It took the Shockers out in the final iteration, inserting Colorado State instead. That swap cost us four points of Paymon Score, points that would have put us ahead of six more brackets, or an additional three percent of the field.

One thing to look into here is when our model should take the ratings systems as “final.” The committee does take Sunday games into account in some literal senses, with automatic bids and all that, but research should be done into whether the end-of-season rating is the most predictive, or the end-of-Saturday rating, or different cutoffs for teams whose seasons end at different times. The committee’s work isn’t finished in a single moment the way our model’s is. If we can make the model reflect this accurately, it’ll be stronger.
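
One way to answer the “when is final?” question empirically is to rebuild past final bracketologies from ratings frozen at a few candidate cutoffs and see which cutoff scores best against the actual brackets. A sketch of that loop, with the per-season scoring left as injected plumbing we would still have to build:

```python
# Sketch of a cutoff study: for each past season, score the bracket our model would have
# produced with ratings frozen at each candidate cutoff, and keep the cutoff that tracks
# the committee best. score_season is injected plumbing we would still need to build.

from typing import Callable

CUTOFFS = ["end_of_saturday", "selection_sunday_morning", "final"]

def best_cutoff(seasons: list[int], score_season: Callable[[int, str], float]) -> str:
    """score_season(season, cutoff) -> Paymon-style score of that season's hypothetical
    final bracketology built from ratings frozen at the given cutoff."""
    totals = {cutoff: 0.0 for cutoff in CUTOFFS}
    for season in seasons:
        for cutoff in CUTOFFS:
            totals[cutoff] += score_season(season, cutoff)
    return max(totals, key=totals.get)
```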

Colgate

One of our model’s biggest misses was Colgate. We had the Raiders as a 10-seed. They wound up a 14-seed.

This was almost entirely due to NET being such an outlier on Colgate, a team that played just three opponents in the regular season due to the coronavirus. Circumstances like that shouldn’t repeat themselves, but we should still take steps to prepare our model for an outlier like this one, perhaps by weighting the median rating more heavily than the outliers.
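
One way to build in that protection is to anchor the composite on the median of the systems and shrink extreme ratings toward it rather than averaging them raw. A minimal sketch, with the shrink factor and the example ranks both being assumptions:

```python
# Sketch of outlier-resistant aggregation: shrink each system's rank toward the median of
# all systems before averaging, so one wild rating can't drag the composite too far.
# SHRINK is an arbitrary assumption to be tuned; the example ranks are made up.

from statistics import median

SHRINK = 0.5  # 0 = trust only the median, 1 = plain average

def composite_rank(system_ranks: list[float]) -> float:
    med = median(system_ranks)
    adjusted = [med + SHRINK * (rank - med) for rank in system_ranks]
    return sum(adjusted) / len(adjusted)

# Four systems near 40th, one outlier at 9th: the composite lands at 37.0 instead of the
# raw mean of 34.0, and a smaller SHRINK would pull it closer still to the median.
print(composite_rank([40, 42, 38, 41, 9]))
```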

St. Bonaventure

In another reminder that the committee doesn’t take Sunday games into account, the Bonnies didn’t seem to get credit for beating VCU in the A-10 Tournament Championship, something our model had expected for the last eight or nine days of bracketology, ever since the Bonnies won their semifinal.

UConn

Our model had UConn as a 9-seed. The Huskies wound up a 7-seed, probably in part due to the timing of James Bouknight’s absence and how it impacted their résumé.

Injuries are something it would be nice to have data on, because they’re an area where subjective approaches have a big leg up on anything purely objective, like our model.

USC/San Diego State/LSU

We overestimated each of these seedings by two seed lines. We don’t yet know why, but we’ll explore.

Florida/Oklahoma

These two, we underestimated by two seed lines. Again, we’ll explore.

A Reasonable Goal for Next Year

It’s hard to say what’s reasonable in terms of position within the Bracket Matrix, or how many teams we correctly place in the field, and so on. There are variables outside our model for each of those: committee anomalies, the collective performance of the rest of the industry. What does constitute a reasonable goal is developing our model in the following ways:

Ratings Systems

Relying on ratings systems (NET, SOR, KPI, KenPom) as the core of the model is still probably a good idea, but we need to reverse engineer them closely enough to backtest our model across more than a few years of data. This will be especially important for in-season bracketology, as it’ll enable us to iron out some of the bumpiness of rollouts by better predicting how the systems will react to specific results and quantifying uncertainty more accurately.
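
As one example of what a workable approximation buys in-season: we could project both branches of an upcoming game and carry a win-probability-weighted expectation instead of a point guess, which is where a lot of the rollout bumpiness comes from. A sketch, with the approximation itself left as hypothetical, injected plumbing:

```python
# Sketch of how a reverse-engineered rating approximation could smooth rollouts: rather
# than guessing a single post-game rating, carry the win-probability-weighted expectation
# across both branches of an upcoming game. approx_rating is injected plumbing standing
# in for whatever reverse-engineered NET/SOR/KPI/KenPom approximation we build.

from typing import Callable

def expected_rating_after_game(
    team: str,
    game: dict,  # e.g. {"opponent": "Opponent U", "location": "home", "win_prob": 0.65}
    approx_rating: Callable[[str, dict, bool], float],  # (team, game, won) -> rating
) -> float:
    """Probability-weighted rating across the win and loss branches of one game."""
    win_branch = approx_rating(team, game, True)
    loss_branch = approx_rating(team, game, False)
    return game["win_prob"] * win_branch + (1 - game["win_prob"]) * loss_branch
```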

Exceptions

The injury thing. The Sunday thing. The outlier-NET-ranking thing. The whatever-happened-with-Michigan-State thing. These are all oddities that impacted this year’s bracket and were not caught by our model, because they’re exceptions, not the rule, and our model, young and simplistic, deals with the rule rather than the exceptions. Going forward, it needs to deal with both, and it needs to deal with things like a team having one of the worst nonconference strengths of schedule (as happened with NC State in a recent year, I believe), or a team not having won a single game against Q1 or Q2 competition (as may have happened with…Maryland in 2018?). Incorporating these little exceptions is frustrating because there isn’t a large sample with each one. But treating them as precedent, like we do with our College Football Playoff model, may let us account for them accurately and ultimately build a better model, which is always the goal.
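
One way to fold those in without overfitting to tiny samples is a thin layer of precedent rules sitting on top of the core model: each rule has a trigger and a seed nudge, and the nudges get tuned as new cases accumulate. A sketch, where every condition, threshold, and nudge size is a placeholder rather than a finding:

```python
# Sketch of an exceptions layer: precedent-based rules applied on top of the core model,
# each with a trigger and a seed-line nudge. The conditions, thresholds, and nudge sizes
# below are placeholders to be tuned as precedents accumulate, not findings.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Precedent:
    name: str
    applies: Callable[[dict], bool]  # takes a team's resume profile
    seed_nudge: float                # positive pushes toward worse seed lines

PRECEDENTS = [
    Precedent("awful_nonconference_sos", lambda t: t["nonconf_sos_rank"] > 330, 1.0),
    Precedent("no_q1_or_q2_wins", lambda t: t["q1_wins"] + t["q2_wins"] == 0, 1.5),
]

def adjusted_seed(base_seed: float, profile: dict) -> float:
    """Core-model seed plus any nudges from precedents the team's profile triggers."""
    return base_seed + sum(p.seed_nudge for p in PRECEDENTS if p.applies(profile))
```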
