Our 2022 Bracketology Post-Mortem

It’s been more than a month since Selection Sunday, which means it’s probably time to get around to doing our Bracketology Model post-mortem.

It wasn’t a great year for our model, though we did make progress, as we always do. Our final NCAA Tournament bracket ended up being among the worst on Bracket Matrix yet again this year, our NIT bracketology missed more teams than expected, and yet…we made progress.

Bracketology is a lot harder than predicting the College Football Playoff field, something we’ve found fairly easy to do well on this site. Part of this is the nature of the beast: In CFP prognosticating, a model has to distinguish between teams already in the sport’s stratosphere, where gaps are naturally larger. With bracketology, the teams that matter most are in the thick of the pack, where there’s sometimes not much difference at all between one team and another, other than the personal preferences of key decision-makers. Part of it, too, is the weekly CFP ranking updates. We get to see the scoreboard as the season goes on in football, whereas in basketball, we’re largely blind. Another part, though, and one that’s perhaps indicative of a core problem with our approach, is that the CFP process, the way we look at it, has more usable history than the selection committee’s.

Since we first started building our models for this, we’ve leaned heavily on recent history, given how much has changed about the way information is presented to the selection committee over the last five years. NET is still so new. Quadrants are still so new. Rankings of the chosen non-NET systems, like KPI and SOR, are presented more explicitly than they once were. Our model performs rather well in back-testing every year because we’re working with a narrow history and building the model off of that same history. It then doesn’t perform well on Selection Sunday because every season is unique, and the variables relevant at the selection cut line differ so significantly that what was crucial in 2018, 2019, and 2021 is irrelevant in 2022. What’s the solution here? It’s hard to say. Bracketology may just not be well-suited to a formulaic process, especially right now, when the variables available to the committee (and it’s a data-heavy process; a lot of variables are presented to the committee directly, especially compared to football, where it’s all rather simple and straightforward: who beat whom) have changed so much so recently. Other formulaic approaches are better than ours (and others are worse), but even those don’t perform better than manual bracketologists. Perhaps a formula isn’t the way to go.

At least on Selection Sunday.

This is the crux of the thing for us. A formula on Selection Sunday doesn’t seem very effective, but if we’re going to offer a predictive tool regarding the NCAA Tournament (and the NIT), a formulaic approach is the only way to go. You’re not going to be able to concretely and accurately say a team has even something as general as a 3-in-4 chance of making the field if you don’t use a formula or scoring system of some sort. That’s just too much computing for one person; there are too many variables at play. So we need a formula, or at least a formulaic algorithm, and whether it’s easy or not, we need to make it better.
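For what it’s worth, here’s the kind of thing we mean by a formula or scoring system. This is a minimal sketch, not our actual model; the variables, weights, and calibration are placeholders invented purely for illustration.

```python
# Hypothetical sketch: turning a handful of résumé variables into a bid
# probability. The variables, weights, and intercept are made up for
# illustration; they are not our actual formula.

import math

def bid_probability(net_rank, q1_wins, q3q4_losses, sor_rank):
    """Map a résumé to a rough at-large probability via a logistic curve."""
    # Raw score: better (lower) rankings and more Quad 1 wins help,
    # bad losses hurt. All weights are placeholders.
    score = (
        -0.06 * net_rank
        + 0.45 * q1_wins
        - 0.80 * q3q4_losses
        - 0.03 * sor_rank
        + 3.5  # intercept chosen so a middling bubble résumé lands near 50%
    )
    # Squash the score into a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-score))

# Example: a bubble-ish résumé comes out close to a coin flip.
print(round(bid_probability(net_rank=45, q1_wins=4, q3q4_losses=2, sor_rank=40), 2))
```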

How are we going to do that?

One big thing is looking, to the extent we can, at years prior to 2018. We need to do more extensive back-testing, and while one holdup is whether to use RPI or a NET proxy, and whether to include quadrant-related variables or exclude them, the answer is probably that we need to do all of the above. See how formulas work with the old variables. See how they work with our best estimate of the new variables, applied backwards. See how they work with the newer parts of the formula excluded entirely. Our testing needs to be much, much more robust.
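To put a shape on that, here’s a rough sketch of the kind of back-testing harness we’re describing, with the rating source and the quadrant question treated as configuration switches. The data loading and scoring below are placeholders, not our real pipeline.

```python
# Hypothetical back-testing harness. The configurations mirror the questions
# above (RPI vs. a NET proxy, quadrants in or out); the scorer is a stub.

from itertools import product

RATING_SOURCES = ["rpi", "net_proxy"]   # the old variable vs. a backward-applied estimate
QUADRANT_OPTIONS = [True, False]        # include quadrant records or leave them out
SEASONS = list(range(2010, 2023))       # extend well before 2018

def count_misses(season, rating_source, use_quadrants):
    """Placeholder scorer. The real version would rebuild that season's résumés
    with the chosen variables, run the formula, and count how many at-large
    selections it missed against the actual bracket."""
    return 0

def run_backtests():
    """Average misses per season for every configuration, so the variable
    choices can be compared head to head."""
    results = {}
    for rating_source, use_quadrants in product(RATING_SOURCES, QUADRANT_OPTIONS):
        misses = [count_misses(s, rating_source, use_quadrants) for s in SEASONS]
        results[(rating_source, use_quadrants)] = sum(misses) / len(misses)
    return results

print(run_backtests())
```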

Another big thing is separating selection from seeding. One cool and generous thing a lot of the more accurate bracketologists out there do is share thoughts from their process, and something we saw a lot this year was an emphasis on how the committee’s seeding list differs from their selection list, in practice. Teams are selected based more on résumé. They’re seeded more on predictive metrics. Or so we’re told by people better at this than we are. Our formula did not reflect this at all. That is bad, but relatively easy to fix.
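One toy illustration of what that fix could look like, with the metrics and weights as hypothetical placeholders rather than anything the committee or our model actually uses: selection and seeding would simply run off different scores.

```python
# Hypothetical split of selection from seeding. Metric names, weights, and the
# two example teams are placeholders for illustration only.

teams = [
    {"name": "Team A", "sor_rank": 30, "kpi_rank": 28, "net_rank": 50,
     "kenpom_rank": 55, "bpi_rank": 52},
    {"name": "Team B", "sor_rank": 48, "kpi_rank": 45, "net_rank": 32,
     "kenpom_rank": 25, "bpi_rank": 27},
]

def selection_score(team):
    """Who gets in: lean on résumé (results-based) metrics."""
    return -(0.5 * team["sor_rank"] + 0.3 * team["kpi_rank"] + 0.2 * team["net_rank"])

def seeding_score(team):
    """Where they land: lean on predictive (team-quality) metrics."""
    return -(0.6 * team["kenpom_rank"] + 0.2 * team["bpi_rank"] + 0.2 * team["net_rank"])

selection_order = sorted(teams, key=selection_score, reverse=True)
seed_order = sorted(teams, key=seeding_score, reverse=True)

print([t["name"] for t in selection_order])  # ['Team A', 'Team B']
print([t["name"] for t in seed_order])       # ['Team B', 'Team A']
```

In this toy case, Team A’s résumé gets it picked ahead of Team B, but Team B’s predictive profile would seed it higher once both are in, which is exactly the distinction our formula ignored.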

Still, we missed on four bubble teams this year when we were hoping to only miss on two, and the bubble was where we got most of our overall precedent. Four’s a lot. One of those, Miami (oof, it hurts to say that out loud after the run they went on), ended up as a 10-seed, comfortably in the field, and our model had them out. What happened?

With Miami specifically, I’m not entirely sure yet. That’s something we need to dig into more deeply. But a possibility on the bubble in general that’s been flitting around my mind the last few years is that we need to isolate variables at different points during the week the committee meets. In other words, we can’t just do this based on where things stand on Sunday morning (or worse, Sunday afternoon, which we’ve done in the past). It probably also isn’t as simple as switching 100% to Saturday morning. We need to look at which Friday résumés look good, which Saturday résumés look good, which Tuesday résumés look good…all in conjunction. An element of our approach needs to do more of the thing PECOTA originally did with Major League Baseball (and might still do; I often pass on PECOTA in practice because FanGraphs is easier to access, among other things): look for historical comparisons. What happens to teams who make runs during their conference tournaments? What happens if those runs include a big win on Saturday? What if it’s earlier in the week? What about the opposite case: teams who lose in embarrassing fashion?
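To gesture at what that comparables piece could look like in code (everything here, from the features to the tiny history, is a made-up placeholder rather than a real data set), it might be as simple as a nearest-neighbor lookup run on each day’s snapshot.

```python
# Hypothetical sketch of the historical-comparables idea: snapshot a bubble
# team's résumé at several points in the committee's week, find the most
# similar historical snapshots, and see how often those teams made the field.

def distance(a, b):
    """Crude similarity: sum of absolute differences across shared features."""
    return sum(abs(a[k] - b[k]) for k in a if k in b)

def comparable_bid_rate(snapshot, history, k=2):
    """history is a list of (snapshot_dict, made_field_bool) pairs from past seasons."""
    nearest = sorted(history, key=lambda item: distance(snapshot, item[0]))[:k]
    return sum(1 for _, made_field in nearest if made_field) / len(nearest)

history = [
    ({"net_rank": 40, "q1_wins": 5, "losses": 11}, True),
    ({"net_rank": 60, "q1_wins": 2, "losses": 14}, False),
    ({"net_rank": 46, "q1_wins": 4, "losses": 12}, True),
]

# Run the same team through Friday, Saturday, and Sunday snapshots to see how
# the estimate moves as the conference-tournament results come in.
saturday_snapshot = {"net_rank": 44, "q1_wins": 4, "losses": 12}
print(comparable_bid_rate(saturday_snapshot, history))
```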

Beyond that, there’s some merit to how we’ve been approaching it, and that’s why I do say that we’re improving. Our score-plus-exceptions method (in which a team gets a raw score that is then yanked around based on résumé oddities, meaning cases where a historically significant threshold was crossed in a specific variable) was much more in line with consensus than our past methods, in which teams only got a score (as we tried to ignore thresholds and force-fit the subjective process to a formulaic curve). Consensus is not our barometer, but if our model performs poorly, we hope it at least lines up somewhat with consensus, especially since consensus is fairly wise in the bracketological sphere. In another positive, while a downside of our approach is that we heavily weighted 2019 and 2021 in building 2022’s formula, and that’s just two years of precedent, an upside is that we just got somewhere between 33% and 50% more precedent last month (2018 was the last RPI year, so we sometimes include it and sometimes don’t in our model designs). We got a lot more information, and that will be helpful, especially since 2022 was a lot more “normal” a year, schedule-wise, than 2021 (it was nice to not have to take a wild guess on Colgate like we did in ’21).
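For clarity’s sake, here’s the shape of that score-plus-exceptions idea in code. The thresholds and adjustment sizes below are hypothetical, invented for illustration rather than pulled from our actual model.

```python
# Hypothetical sketch of a score-plus-exceptions adjustment: start with a raw
# score, then nudge it when a résumé crosses a notable threshold in one
# specific variable. Every threshold and adjustment here is a placeholder.

def adjusted_score(raw_score, resume):
    score = raw_score
    # Hypothetical exception: penalize a résumé with zero Quad 1 wins.
    if resume.get("q1_wins", 0) == 0:
        score -= 2.0
    # Hypothetical exception: penalize multiple Quad 4 losses.
    if resume.get("q4_losses", 0) >= 2:
        score -= 1.5
    # Hypothetical exception: reward an elite non-conference schedule.
    if resume.get("ncsos_rank", 999) <= 25:
        score += 0.75
    return score

print(adjusted_score(5.0, {"q1_wins": 0, "q4_losses": 1, "ncsos_rank": 10}))  # 3.75
```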

There’s also, of course, the matter of the NIT, which rose in importance for us this year. It’s always been important, but interest was especially high this season for a variety of reasons, and while we did…fine, we didn’t excel.

One big thing with the NIT that we need to do is ask those running the thing what the format of the bracket will be. We laughed when they started rolling it out with only half the teams seeded (an approach we really like, by the way, but were wholly unprepared for), but that’s something that would’ve been extremely helpful to know about whenever the decision was made. Geography was of huge importance in predicting the eventual matchups, and we had no idea.
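To show why knowing the format mattered, here’s a toy sketch of the sort of geography-driven pairing we were left guessing at once only half the field was seeded. The teams and coordinates are made up, and the greedy pairing is only our guess at the flavor of the process, not the NIT’s actual method.

```python
# Toy sketch: with only half the bracket seeded, first-round matchups looked a
# lot like "send each unseeded team to the nearest seeded host." Teams and
# coordinates are hypothetical placeholders.

import math

seeded = {"Host A": (41.7, -86.2), "Host B": (36.1, -86.8)}          # (lat, lon)
unseeded = {"Visitor X": (40.4, -86.9), "Visitor Y": (35.0, -85.3)}  # (lat, lon)

def rough_distance(p, q):
    """Straight-line distance in degrees; fine for ranking hosts, not for travel math."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

pairings = {}
available = dict(seeded)
for visitor, loc in unseeded.items():
    host = min(available, key=lambda h: rough_distance(loc, available[h]))
    pairings[visitor] = host
    del available[host]  # each host gets one first-round game

print(pairings)  # {'Visitor X': 'Host A', 'Visitor Y': 'Host B'}
```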

We also probably need to separate our NIT formula from our NCAA Tournament formula. We’ve tried lumping them together, but it’s not a good fit. The teams on the NIT bottom bubble have different genres of résumé than those on the NCAA Tournament bubble, and our process needs to account for that. The things which separated Rutgers and Notre Dame and Dayton and Texas A&M this year were different from those which separated Vanderbilt and South Carolina and Washington State and Drake.
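Structurally, that separation could be as simple as giving each tournament its own fitted parameter set; here’s a sketch, with placeholder variables and weights rather than anything we’ve actually fit.

```python
# Hypothetical sketch: one scoring function, two independently fitted weight
# sets, so the NIT bubble and the NCAA Tournament bubble stop sharing a formula.
# All variable names and weights are placeholders.

WEIGHTS = {
    "ncaa": {"net_rank": -0.05, "q1_wins": 0.50, "sor_rank": -0.04},
    "nit":  {"net_rank": -0.03, "q1_wins": 0.20, "sor_rank": -0.02},
}

def resume_score(resume, tournament):
    """Score a résumé with the weight set fitted for that tournament's bubble."""
    weights = WEIGHTS[tournament]
    return sum(weight * resume.get(var, 0) for var, weight in weights.items())

bubble_team = {"net_rank": 78, "q1_wins": 2, "sor_rank": 70}
print(round(resume_score(bubble_team, "ncaa"), 2), round(resume_score(bubble_team, "nit"), 2))
```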

Overall, then, our offseason priorities are robustness, communication, and precision, and I don’t mean precision of output (we’d love that, but it’s unrealistic) so much as precision of approach. A system that lines each team up 1 to 358 (or however many will be in Division I next year) isn’t necessary, and it’s not representative of what these committees actually do, so why make that an output? What we need instead is a model that can look through different lenses at different places on the different S-Curves, using precedent to guide probability in each place. And when I say we need communication, that ties into the “probability” piece of this, which is the real key. It was hard this year to not be able to tell Washington State fans they were 45% likely to make the NIT. The way we presented our model made it very black-and-white, which didn’t represent what the model was even telling us in the first place. So that factors in as well.
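As a small illustration of the output we’re after here (the buckets and rates below are invented placeholders, not real precedent), the goal is to publish a probability rather than a yes or no.

```python
# Hypothetical sketch: instead of a single 1-to-358 ranking, bucket teams by
# where they sit on the S-Curve and report the historical bid rate for résumés
# that have landed in that bucket. Buckets and rates are placeholders.

HISTORICAL_BID_RATE_BY_BUCKET = {
    "lock": 1.00,
    "safely_in": 0.95,
    "last_four_byes": 0.80,
    "bubble_in": 0.60,
    "bubble_out": 0.35,
    "long_shot": 0.05,
}

def field_probability(bucket):
    """Report a probability, not a yes/no, so a 45% team reads as 45%."""
    return HISTORICAL_BID_RATE_BY_BUCKET.get(bucket, 0.0)

print(f"Last team out on our board: {field_probability('bubble_out'):.0%} to make the field")
```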

It’s exciting and daunting at the same time to write these, because they turn into both a to-do list and a wish list. I don’t know how much we can realistically bite off this summer and fall within the workload we have. This is where our thoughts are, though, and as always, we’d welcome any of yours. Thanks, and as we say: Bark.

The Barking Crow's resident numbers man. Was asked to do NIT Bracketology in 2018 and never looked back. Fields inquiries on Twitter: @joestunardi.