How Our College Basketball Model Works (2024–25 Season)

Our full college basketball model is live again, and while it’s a little later than last year, it should also be a little better, both in terms of what we’re measuring and how we’re measuring it. The outputs include our NCAA Tournament and NIT probabilities and our bracketology for both tournaments.

Here’s how it all works, along with some commentary. Much of this is copied and pasted from last year’s explanation. Where there’s something new this year, I’ve made sure to explain what’s new and why it’s here. I’ve bolded those sections. The big thing is the NIT opt-outs.


How We Simulate the Games

Our model’s starting point is kenpom. We are unaware of any better system grading the current quality of individual college basketball teams.

To simulate games, we assign win probabilities based on the gap between each team’s adjusted efficiency margin on kenpom, with a standard adjustment baked in to account for home-court advantage. We don’t account for tempo. We don’t account for some home courts being more advantageous than others. If a team is playing at an arena in their home city but it is not a home court of theirs (e.g., when Vanderbilt plays at Bridgestone Arena during the SEC Tournament), they do not receive home court advantage.
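
For illustration, here’s a minimal sketch of that step in Python. The normal-curve mapping, the margin spread, and the size of the home-court bump are placeholders, not our calibrated values (and not kenpom’s):

```python
from statistics import NormalDist

# Placeholder constants -- illustrative only, not our calibrated values.
HOME_COURT_POINTS = 3.0   # flat home-court bump, in points
MARGIN_SIGMA = 11.0       # assumed spread of single-game margins, in points

def win_probability(adj_em_a, adj_em_b, a_is_home=None):
    """Probability team A beats team B, from the gap in adjusted efficiency margins.

    a_is_home: True for a true home game, False for a true road game,
    None for a neutral court (e.g., Vanderbilt at Bridgestone Arena in
    the SEC Tournament gets None).
    """
    expected_margin = adj_em_a - adj_em_b
    if a_is_home is True:
        expected_margin += HOME_COURT_POINTS
    elif a_is_home is False:
        expected_margin -= HOME_COURT_POINTS
    return NormalDist(mu=0.0, sigma=MARGIN_SIGMA).cdf(expected_margin)
```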

Once we have those win probabilities, we generate a random number for each game. Which side of the win probability the random number lands on determines who wins the game in that simulation. How far the random number is from the win probability determines the game’s margin. These are calibrated to roughly match kenpom’s probabilities, but they do not exactly match kenpom’s probabilities.
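
In code, that draw looks something like this sketch. The margin scaling at the end is a stand-in; as noted, ours is calibrated to roughly (not exactly) match kenpom:

```python
import random

def simulate_game(win_prob_a, rng=random):
    """Return (a_wins, margin) for one simulated game."""
    draw = rng.random()
    a_wins = draw < win_prob_a                 # which side of the win probability the draw lands on
    distance = abs(draw - win_prob_a)          # how far it lands from the win probability
    margin = max(1, round(1 + distance * 30))  # placeholder scaling from distance to margin
    return a_wins, margin
```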

We run these simulations “lukewarm,” which is to say this: We don’t keep the kenpom ratings static as the simulations go along. If a team blows out an opponent in one game in our simulation, they are expected to be better in their next game, just as it goes in the real world. We also, though, don’t run our simulations fully “hot.” When a team blows out an opponent in the real world, their kenpom rating rises more than it rises when the same thing happens in our model. Our model’s kenpom proxy reacts cautiously compared to the real kenpom.
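
The “lukewarm” idea, sketched: an Elo-style nudge toward each result, with the step size damped relative to a fully hot update. Both constants here are invented for illustration:

```python
# Placeholder step sizes -- illustrative only, not our calibrated values.
HOT_STEP = 0.10         # rating points per point of "surprise" in a fully hot model
LUKEWARM_FACTOR = 0.4   # damping, tuned so long-run movement stays inside the real-world window

def lukewarm_update(rating, expected_margin, actual_margin):
    """Move a team's simulated kenpom proxy toward its result, cautiously."""
    surprise = actual_margin - expected_margin   # beat expectations -> positive
    return rating + HOT_STEP * LUKEWARM_FACTOR * surprise
```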

Why do we run the simulations lukewarm, rather than hot? We tried running hot simulations in 2022 during the conference tournaments, NCAA Tournament, and NIT. That process gave us misleading long-term probabilities. Kenpom is rare in that it is a good enough model that its ratings hold up well over time. Teams do move from their kenpom ratings, but they tend to move around within a certain window. To use a stats term, it’s not a random walk. When running our simulations hot—letting the walk be fully random—the window of outcomes was unrealistically large.

To calibrate our lukewarm simulations, we started with the realistic window of outcomes and worked backwards. We looked at how far teams move around in kenpom over a given period of time and then turned down the heat in our model until our long-term ratings movement was in line with that real-world window.

One other thing to note regarding our kenpom approximation: Unlike the real kenpom, our simulated kenpom doesn’t worry about what teams’ opponents do in other games. It functions like an Elo system, not a Bayesian one. Only a team’s own games matter to that team.

When the regular season games are completed, we slot teams into their conference tournaments based on their conference record. However, we don’t account for every conference tiebreaker. We leave those random unless they’re both clear and of high consequence to NCAA Tournament and NIT Bracketology.


Selection Process

This has changed a bit this year, but the framework remains the same. We still begin with four formulas based on the ratings systems included on the NCAA’s team sheets: One formula for NCAA Tournament selection, one for NCAA Tournament seeding, one for NIT selection, and one for NIT seeding. We then enhance the NCAA Tournament selection and seeding formulas with data beyond those seven ratings systems.

Before we can use the ratings systems’ rankings, we need to approximate where those rankings will land on Selection Sunday. We already explained how we do this for kenpom. For the others…

To simulate how teams’ NET ratings change over time, we again use an Elo-esque adjustment system. We start with the team’s current ranking, project that onto a normalized curve to assign it a score, and then tally all the team’s kenpom movements over the simulated games, adding or subtracting that sum from the score.
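
Sketched out, assuming a 350-ish-team field and a placeholder scale for converting rating movement into curve units:

```python
from statistics import NormalDist

MOVEMENT_SCALE = 0.05   # placeholder: converts summed kenpom movement into curve units

def rank_to_score(rank, field_size=350):
    """Project a ranking onto a normal curve (1st -> high score, last -> low score)."""
    return NormalDist().inv_cdf(1.0 - (rank - 0.5) / field_size)

def project_net(current_net_ranks, kenpom_movement):
    """current_net_ranks: {team: rank}; kenpom_movement: {team: summed simulated rating change}."""
    scores = {team: rank_to_score(rank) + MOVEMENT_SCALE * kenpom_movement.get(team, 0.0)
              for team, rank in current_net_ranks.items()}
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {team: i + 1 for i, team in enumerate(ordered)}   # projected final NET ranking
```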

To simulate how teams’ KPI ratings change over time, we again rely on an Elo-esque approach. Unlike our NET and kenpom approach (and unlike the real KPI), we don’t worry about margin here. For each simulated game, we create a variable labeling how surprising a result is based on the current KPIs of the teams involved. (We do not change KPIs between games. Those are run static, not hot or lukewarm.) If a team wins, their variable is positive. If a team loses, it’s negative. We then tally all those adjustments and add or subtract them from each team’s current KPI score, again leading us to a final ranking.
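
A sketch of that surprise-tallying step. The specific surprise formula and the final scaling constant are placeholders; the structure (static KPIs, positive for wins, negative for losses) is the part that matches what we do:

```python
def kpi_surprise(winner_kpi_rank, loser_kpi_rank, field_size=350):
    """How surprising a result is, given the two teams' current (static) KPI rankings.

    Returns a value in (0, 1): small when the favorite wins, large for an upset.
    """
    gap = loser_kpi_rank - winner_kpi_rank       # positive when the better-ranked team won
    return 0.5 - gap / (2 * field_size)

def project_kpi(current_kpi_scores, simulated_results):
    """simulated_results: (winner, loser, winner_kpi_rank, loser_kpi_rank) tuples."""
    adjustments = {team: 0.0 for team in current_kpi_scores}
    for winner, loser, w_rank, l_rank in simulated_results:
        surprise = kpi_surprise(w_rank, l_rank)
        adjustments[winner] += surprise          # winners get a positive variable
        adjustments[loser] -= surprise           # losers get a negative one
    return {team: score + 0.01 * adjustments[team]   # 0.01: placeholder scaling
            for team, score in current_kpi_scores.items()}
```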

We use a similar process for SOR, except instead of worrying about incoming SOR, we use the current BPI ratings of the teams involved. (The SOR the committee uses, which comes from ESPN, relies on BPI.)

We didn’t include BPI last year, and both Torvik and WAB are new to the team sheet. We’re including them all this time around.

For BPI and Torvik, we use a similar approach to what we do with NET: We let our kenpom adjustments guide us. There are more accurate ways to do this, but this is the approach which, given our capacity, best combines efficiency and accuracy.

We tried calculating WAB manually—it’s a fairly simple calculation if you have the right inputs—but we found our approach to be too inaccurate to use. Instead, we accept all current WAB scores as set in stone (in reality, they can change as teams’ opponents’ Torvik ratings change), and we add our best estimates of all the single-game WAB scores from there.

(Update, 3/10: We realized yesterday that the NCAA is not using Bart Torvik’s calculation of WAB and is instead calculating its own, presumably based off of NET. We’ve updated our model to incorporate the NCAA’s version.)
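
For reference, a single-game WAB estimate is simple once you have a win probability for a “bubble team” in the same game: it’s the result minus that probability. A minimal sketch (estimating the bubble team’s win probability is the hard part):

```python
def single_game_wab(team_won, bubble_win_prob):
    """Wins Above Bubble credit for one game.

    team_won: 1 if the team won the game, 0 if it lost.
    bubble_win_prob: estimated probability an average bubble team wins the
                     same game (same opponent, same site).
    """
    return team_won - bubble_win_prob

# Example: winning a game a bubble team wins 85% of the time is worth +0.15 WAB;
# losing that same game costs -0.85.
```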

Are these perfect approaches? No, no, not at all. This is an issue with our model, especially because timing dictates that we often run our model before NET, KPI, BPI, and SOR updates are published from a night’s games. We run the model overnight. NET, KPI, BPI, and SOR update in the morning. This can lead to some “rippling” in our model, in which our model homes in on its reaction to a game over two days instead of immediately reacting in accurate alignment with the real world.

Once we have each team’s final estimated ratings system rankings, we convert each one back into a score projected onto a normal distribution. This is intended to replicate how a committee member seeing raw rankings for each metric will care more about those rankings than the actual grades each system gives each team. (In short: It doesn’t help the 41st-ranked team in our model to be really close to 30th and really far from 50th, if the pack works that way.) Once we have these scores, we combine them into the base (or entirety) of our four selection and seeding formulas. Those bases are composed as follows…at least as a starting point:

  • NCAAT Selection Score: 25% NET, 23.5% KPI, 25% SOR, 23.5% WAB, 2% kenpom, 0.5% BPI, 0.5% Torvik
  • NCAAT Seeding Score: 10% NET, 25% KPI, 18% SOR, 18% WAB, 20% kenpom, 5% BPI, 5% Torvik
  • NIT Selection Score: 30% NET, 14% KPI, 14% SOR, 14% WAB, 20% kenpom, 4% BPI, 4% Torvik
  • NIT Seeding Score: 30% NET, 13% KPI, 13% SOR, 13% WAB, 21% kenpom, 5% BPI, 5% Torvik
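
To make the blending concrete, here’s a minimal sketch of the NCAAT Selection Score base, assuming each projected ranking has already been converted back into a normal-curve score as described above:

```python
# Starting-point weights for the NCAAT Selection Score, from the list above.
NCAAT_SELECTION_WEIGHTS = {
    "NET": 0.25, "KPI": 0.235, "SOR": 0.25, "WAB": 0.235,
    "kenpom": 0.02, "BPI": 0.005, "Torvik": 0.005,
}

def ncaat_selection_base(curve_scores, weights=NCAAT_SELECTION_WEIGHTS):
    """curve_scores: {system: normal-curve score} for one team."""
    return sum(weight * curve_scores[system] for system, weight in weights.items())
```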

These formulas are a little different from last year’s. They consider all ratings systems, instead of just four. They’re still based on historic precedent, but there’s some guesswork thrown in, with WAB and Torvik new to the team sheet. Our goal was to approximate how the committees weight NET, how much a team’s accomplished, and how good a team is in the current moment.

These formulas are not supposed to replicate the exact thinking of the committee. We don’t think any committee members look at kenpom and say “I’ll make that 2% of my evaluation” or look at NET and say “that gets 30%.” We find, though, that committee thinking on each front generally aligns with this sort of breakdown. They might not be basing their decisions off of SOR and NET in this proportion, but whatever criteria they use is roughly in line with a 25% NET/25% SOR approach.

From these bases, we make a few randomization adjustments. The first is the introduction of a random number which “wiggles” each team’s four scores, all in the same magnitude and direction. This is a broad uncertainty variable, one which stands in for how right and wrong we expect our formulas to be come Selection Sunday. The second is new this year: We let the importance of each of the seven ratings systems fluctuate by somewhere between 0% and 20%. That’s percent, not percentage points, and it works in both directions, which means: While we start our NCAA Tournament selection formula with 25 parts NET, in reality it wiggles between 20 parts NET and 30 parts NET. Kenpom, in the same formula, wiggles between 1.6 parts and 2.4 parts.

The first uncertainty adjustment reflects how different committees will view different individual teams differently. The second reflects how different committees will weight different parts of a team’s team sheet differently. It creates correlations, accounting for how, in some years, something like NET matters more than it does in the average season.
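
Per simulation, those two adjustments look something like the sketch below. The distributions and the wiggle magnitude are illustrative; only the up-to-20% swing on the weights comes from the description above:

```python
import random

def wiggle_weights(base_parts, rng=random, max_swing=0.20):
    """Let each system's importance fluctuate by up to 20% in either direction.

    e.g., 25 parts NET wiggles between 20 and 30 parts in a given simulation.
    """
    return {system: parts * rng.uniform(1 - max_swing, 1 + max_swing)
            for system, parts in base_parts.items()}

def team_wiggle(rng=random, sigma=0.25):
    """One broad-uncertainty draw per team, applied to all four of its scores."""
    return rng.gauss(0.0, sigma)   # sigma is a placeholder magnitude
```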

For the NIT formulas, we make no further adjustments. We’ve removed the pseudo-requirement that at-large selections have an overall record of .500 or better. We also replaced the varying NIT selection formula from last year with the more fluid approach described above.

For the NCAA Tournament seeding formula, we make two small adjustments: First, we give some weight to the AP Poll released six days before Selection Sunday, and we approximate where that AP Poll will land through a very haphazard system which rewards teams for wins and losses in equal proportion no matter who they play. Second, we punish mid-majors (except Gonzaga) and low-majors, whose accomplishments tend to not be taken as seriously as those of their Power Five counterparts.

For the NCAA Tournament selection formula, we make plenty of adjustments:

  • High-majors (Power Five plus Gonzaga) receive a moderate boost. Mid-majors (AAC, A-10, MWC, WCC besides Gonzaga) receive a small boost.
  • Nonconference strength of schedule ratings (measured the way they show up on the team sheets) elicit punishment if they come in at 340th or worse. This is heavy-handed, and that is how it seems to work in the real world. Also: We don’t simulate NET NCSOS changing over time. If we were launching this model earlier in the year, like we want to, we would simulate that, but by now the rating is mostly final.
  • Teams receive a boost for recording five Q1 wins, and another for recording six. Teams receive a punishment for playing no Q1 games. Teams receive a punishment if their Q1 win percentage is below .250, and another if it’s below .150.
  • Teams receive a boost if they have a combined Q1/Q2 win percentage of .450 or better. Teams receive a boost if they have ten or more wins across Q1 and Q2, combined.
  • Teams who went undefeated against Q2, Q3, and Q4 competition receive a boost. Teams who lost four or more games to Q3 and Q4 competition receive a punishment. This Q3/Q4 loss punishment is new this year. It’s small—our model ended up doing very well last year, even without it—but comparisons to bracketologists with better track records than ours implied we needed it.
  • Our other new marker: Teams with two or more Q1A wins get rewarded. Teams with no Q1A wins get punished. This came about for the same reason as the Q3/Q4 loss adjustment: We were seeing it matter to people who make better bracketology than we do.
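
As one illustration of how these résumé adjustments stack, here’s a sketch of the quadrant-record pieces. The thresholds come from the list above; the boost and punishment sizes below are placeholders, not our actual values:

```python
# Placeholder magnitudes -- the thresholds come from the list above, the sizes don't.
BOOST, PUNISH = 0.10, -0.10

def quadrant_adjustments(q1_wins, q1_games, q1a_wins, q1q2_wins, q1q2_games,
                         q2q3q4_losses, q3q4_losses):
    """Sum of the quadrant-record adjustments to a team's NCAAT Selection Score."""
    adj = 0.0
    if q1_wins >= 5: adj += BOOST                        # five Q1 wins
    if q1_wins >= 6: adj += BOOST                        # a second boost at six
    if q1_games == 0: adj += PUNISH                      # played no Q1 games
    if q1_games and q1_wins / q1_games < 0.250: adj += PUNISH
    if q1_games and q1_wins / q1_games < 0.150: adj += PUNISH
    if q1q2_games and q1q2_wins / q1q2_games >= 0.450: adj += BOOST
    if q1q2_wins >= 10: adj += BOOST
    if q2q3q4_losses == 0: adj += BOOST                  # undefeated against Q2/Q3/Q4
    if q3q4_losses >= 4: adj += PUNISH                   # new this year
    if q1a_wins >= 2: adj += BOOST                       # new this year
    if q1a_wins == 0: adj += PUNISH
    return adj
```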


Bracketology

Our NCAA Tournament bracketology is simple. We line the teams up by median selection score, take the best available at-large candidates, then line the whole field up by median seed score, slotting teams in from 1 to 68.
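
In sketch form, that’s two sorts, assuming we already have each team’s median scores and the set of automatic qualifiers:

```python
def build_ncaat_field(median_selection, median_seed, auto_bids, field_size=68):
    """median_selection / median_seed: {team: median score}; auto_bids: set of champions."""
    at_large = sorted((t for t in median_selection if t not in auto_bids),
                      key=median_selection.get, reverse=True)
    field = set(auto_bids) | set(at_large[:field_size - len(auto_bids)])
    return sorted(field, key=median_seed.get, reverse=True)   # slot 1 through 68, in order
```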

Our NIT Bracketology is more complicated.

We start with the median NCAAT/NIT cut line, making that the upper bound for our potential NIT field. If Nebraska is the last team into the NCAA Tournament but they only have the 46th-best NCAAT Selection Score, and we project the cut line to fall after the 45th team? Nebraska is eligible for our NIT bracket. This is how bid thieves work.

Then, we account for the College Basketball Crown, and for NIT exempt bids. For bracketology purposes, we assume no one will opt out of either of those tournaments. We adjust for the opt-out potential in our probabilities, but we don’t in our bracketology.

The College Basketball Crown is contractually entitled to the two “best” teams (as gauged by NET ranking) from each of the Big 12, Big Ten, and Big East. So, we remove those six teams (for bracketology, we go by median final NET ranking in our simulations) from our NIT field.

The NIT grants exempt bids to the top available team (as gauged by KNIT, an average of the seven team sheet rankings) from each of the twelve best conferences, as measured by kenpom. We’re late enough in the season that we don’t need to include any variability in which conferences those twelve will be. It would be very unexpected if the SoCon passed the Big West. The NIT also grants exempt bids to two more teams each from the ACC and SEC. These sixteen teams get top-4 seeds and first-round home games. We use median final KNIT score to assign these teams.
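
KNIT itself is nothing fancy: the average of the seven team-sheet rankings. In sketch form:

```python
TEAM_SHEET_SYSTEMS = ("NET", "KPI", "SOR", "WAB", "kenpom", "BPI", "Torvik")

def knit(rankings):
    """rankings: {system: ranking} for one team. Lower KNIT = better."""
    return sum(rankings[system] for system in TEAM_SHEET_SYSTEMS) / len(TEAM_SHEET_SYSTEMS)
```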

After this, we account for NIT automatic bids, which are awarded to teams who win their regular season conference championship and finish the regular season with a KNIT score of 125 or better. Here, we’re late enough in the year that we can mostly manually lock in whether a team will have a KNIT score of 125 or better on the day after their regular season ends. In the absence of locking teams into or out of that status, our model goes by final KNIT score in each simulation, which isn’t ideal.

For prospective automatic bids yet to finish their conference tournament, our bracketology includes those with a 50% chance or better of landing in the NIT.

In NCAAT auto-bid-only conferences, or conferences like the MVC and Big West which are close to that status, our bracketology includes the likeliest total number of teams from each league, assigning those spots to the teams likeliest to receive them. So: If the MVC is likeliest to send one team to the NIT, but Bradley is NIT-likelier than Drake, Bradley gets the spot. If Conference USA is also likeliest to send one team, but Liberty—despite being the conference tournament favorite—is the likeliest NIT team, Liberty gets Conference USA’s spot. This is a new approach this year aimed at providing the most realistic bracket possible. In the past, we would occasionally end up with every conference tournament contender in a league all in the NIT. This proved distracting for the masses.

Last, we look at the median bottom NIT cut line, the one we get after accounting for the Crown and for opt-outs through the methodology described in the next section. If there are more teams within the cut line bounds than we have spots available, we make a note of who these teams are at the bottom of the bracketology. If there are fewer teams, we fill the bracket in with automatic bid contenders, going from likeliest (closest to 50%) to least likely (closest to 0%).

To build the bracket, we place any teams in the NCAA Tournament’s First Four Out (per our process) on the 1-seed line, then continue according to the seeding formula and the rule-like bracketing principles.


Opt-Outs

We wish opt-outs weren’t part of the NIT process, but last year’s were so significant that in the interests of reflecting accurate NIT probabilities for teams beyond the Next Four Out, we’ve decided to include them in our simulations.

Our model accounts for seven different kinds of opt-outs. The first four are eternal. The last three are specific to this year, which will hopefully be the only year the Crown exists.

The first kind of opt-out is boring: A team doesn’t want to play. Some would tie this to program prestige, but Villanova’s played in the NIT the last two years, and Villanova’s won two NCAA Tournaments as recently as anybody. So, we tie it only to power conference status. Across the board, we have a random variable which creates a 0% to 20% chance a power conference team will decline any non-NCAAT postseason invite, and another which creates a 0% to 2.5% chance a mid-major or low-major will decline. Put otherwise: We create a baseline probability within each simulation for high-majors, and another for everybody else. The high-major one averages 10%. The low-major one averages 1.25%. In an average simulation, we see roughly two high-major opt-outs and zero mid-major or low-major opt-outs.
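
A sketch of that baseline draw (the uniform distribution here is illustrative; the ranges and averages match the description above):

```python
import random

def baseline_decline_probs(rng=random):
    """Draw one simulation's baseline chances of declining a non-NCAAT invite.

    High-majors: somewhere between 0% and 20% (averages 10%).
    Mid- and low-majors: somewhere between 0% and 2.5% (averages 1.25%).
    """
    return {
        "high_major": rng.uniform(0.0, 0.20),
        "other": rng.uniform(0.0, 0.025),
    }
```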

The second kind is familiar: A team fired its coach. We’ve separated teams into two groups for this. Those who’ve outright fired their coach are twenty percentage points less likely to accept a non-NCAAT postseason invite. Those whose coach retired are ten percentage points less likely to accept a non-NCAAT postseason invite. The same is true for those whose coach was listed on the hot seat in Jeff Borzello’s February coaching carousel article for ESPN. In effect, we treat those coaches as each 50% likely to lose their job.

The third is new this year: In the Big Ten and ACC, a team can miss its conference tournament, creating a whole week off between the end of their regular season and Selection Sunday. It’s hard to keep teams together these days after the season ends. So, whoever misses those two conference tournaments will be treated the same as programs who fire their head coach: a twenty-percentage-point deduction in their probability of accepting a bid.

The fourth is what we saw a lot of last year, with the Pac-12: The team is bad. Sub-.500 teams receive an additional 25-percentage-point deduction to their probability of accepting a bid.

The fifth is teams who actually play in the College Basketball Crown.

We’re not worried right now about teams from any conference besides the Big 12, Big Ten, or Big East choosing the Crown over the NIT. We don’t know why they would, and if they do, it’s baked into our general approach to opt-outs. We do, however, expect the Crown to keep asking teams after others turn the Crown down. We expect the Crown to ask the first two teams from each of those three conferences, possibly receive “No” as an answer, then ask every other team from the league, in order.

We’re highly skeptical any coach will want his team to play in a tournament which starts 15 full days after Selection Sunday and seven days after the transfer portal opens, especially when that tournament requires teams to travel all the way to Las Vegas, which sounds exciting unless you’ve attended the Pac-12 Tournament or any number of Vegas-based MTEs. Those arenas are empty. Those tournaments are sad.

Still, we’re giving the Crown a shot, in part because we don’t know how binding the Fox Sports contracts are, or how much athletic directors—who are often TV people and not basketball people—control the process at different schools. When the Crown comes calling in our model, we take teams’ general probability of accepting a bid and reduce it by somewhere between 25 and 75 percentage points. This variable, like the general opt-out likelihood, is the same across all schools but varies by simulation, averaging 50 percentage points. It stands in for how receptive the whole environment is to the Crown.

If, in our model, a team says yes to the Crown, they do not play in the NIT.

The sixth is teams who decline the Crown and are then contractually barred from playing in the NIT.

Reporting and/or rumor holds that teams who decline the Crown’s two automatic bids are not allowed to play in the NIT, per the Fox Sports contracts. We don’t want to put a 100% label on anything—I’d personally put the probability that the Crown doesn’t fill its own bracket at greater than 0%—but to acknowledge the reality of the situation: If a team is legitimately in line for one of those Crown auto-bids—if they have one of the two best NETs of the non-NCAAT teams in their conference—and they decline the Crown, their NIT participation probability drops by 95 percentage points. In most simulations—more than 95%, because of the presence of other variables—this keeps them entirely out of the NIT.

The seventh is teams who decline the Crown and may or may not be contractually barred from playing in the NIT.

What we don’t know is what happens if more than one team from a conference declines the Crown. If the Crown invites Villanova and Xavier from the Big East, and Villanova declines, presumably the Crown will proceed to the next team in line, whom we’ll call Butler. If Butler declines, are they contractually barred from playing in the NIT? We don’t know. For teams in this situation, we’ve reduced their NIT acceptance probability by 10 percentage points. (If Butler specifically ends up in this situation, is threatened by Fox Sports, and wants to risk a Fox Sports lawsuit, we will make as many fundraising calls for legal fee help as they ask us to make. You know where to find me, Butler.)

Overall, our approach here is to create a dynamic opt-out environment which can change a lot in each simulation. Through this, we hope to give realistic NIT probabilities, like how we currently have FAU 8% likely to find their way to the NIT in the end.

Still, this is pretty uncertain stuff. We don’t know what’s going to happen. God willing, we’ll be back in the pre-2023 situation soon, the situation where everyone just accepts their NIT invitation and the Crown is relegated to CBI-like status (or—more likely—ceases to exist).

(Update, 3/10: We’ve changed this process to one where there is a blanket 95% chance the initial automatic bids are not allowed to play in the NIT, and a blanket 10% chance secondary automatic bids—teams asked after any initial automatic bids decline—are not allowed to play in the NIT. We are also tracking public statements from coaches and universities and adjusting their invitation acceptance probabilities accordingly. If a coach indicates their team will accept a bid, we change their original probability to 99%. If there is a second public confirmation—i.e., it is clear the coach is not getting overruled by their athletic department—we increase that to 100%.)


Vulnerabilities, Expectations

A few caveats, some of which we’ve covered already:

  • Our game simulations use a watered-down, blurry approximation of kenpom. They do not use the real thing.
  • We (mostly) don’t use tiebreakers for conference tournament seeding. We (mostly) leave those random.
  • Our NET, BPI, and Torvik approximations are imperfect. Our WAB, KPI, and SOR approximations are very hazy, and sometimes a day late. Our AP Poll approximation is exactly as serious and reasonable as the AP Poll itself.
  • Our model’s simulations of the NCAA Tournament and NIT don’t account for bracketing principles.
  • Our model’s track record is mixed. Last year, it performed excellently, better than something like 75% of brackets included on Bracket Matrix. Before that, it did terribly. It was almost the exact same model last year as in 2023, so I don’t know why it suddenly worked so well. I am very scared of a return to earth.

Overall, I’d say to treat this as a semi-professional model, and to reference other data points if you want a robust view of a certain team’s chances.
