How Our College Football Model Works

Our College Football Playoff model is active again, and it’s the most robust it’s ever been, with more features on the way. Here’s how it works:

CFP Formula

The model’s greatest strength is its CFP Rankings formula, which predicts where the College Football Playoff committee will rank teams. We built it three years ago using past rankings, tuning it to the point that it never missed a playoff team in end-of-season backtesting over the Playoff’s first five years. It’s never missed a playoff team since, either, making it a perfect eight-for-eight on seasons and 32-for-32 on teams. That isn’t a guarantee it’ll perform perfectly—we’ll get to the margin of error—but it has a beat on what matters to the committee, and it even worked in 2020, a wacky season.

What matters to the committee, then? Well, we don’t claim that these are the metrics the committee uses (they aren’t), but we have six sets of metrics which reflect what does seem to matter to the committee. The six:

Wins/Losses/Win Percentage

This one is simple. Wins, losses, and win percentage are each factors in the equation.

Adjusted Point Differential

This one is not as simple. This is our approximation of advanced team ratings, Vegas odds, the dreaded eye test, etc.—the how-good-are-they pieces of the implicit equation. More exactly, it’s our measure of how thoroughly a team has beaten its opponents relative to how those opponents played against the rest of their respective schedules. Blowing out a team that always gets blown out doesn’t get you much here. Narrowly beating a team that normally gets blown out hurts you. Blowing out a team that normally gets beaten narrowly helps a lot. Is this a fair representation of how good teams are? No. But it tracks decently well with fair representations, and it seems to mirror the committee’s evaluation rather well.

One note on this: FCS opponents are treated as a uniform entity right now for these purposes, with the same average point differential (-31) every year.
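To make the mechanics concrete, here’s a minimal sketch of the idea in Python. Everything in it (the function names, the simple averaging, the structure of the inputs) is our hypothetical illustration, not the model’s actual formula:

```python
# Hypothetical sketch of Adjusted Point Differential (APD). The real model's
# weighting is its own; this only illustrates the relative-to-schedule idea.

FCS_AVG_DIFF = -31.0  # every FCS opponent gets this same average differential

def adjusted_point_differential(results, opponent_avg_diff):
    """results: list of (opponent, margin) for one team, where margin is
    points scored minus points allowed. opponent_avg_diff: each opponent's
    average point differential against the rest of its schedule."""
    adjustments = []
    for opponent, margin in results:
        baseline = opponent_avg_diff.get(opponent, FCS_AVG_DIFF)
        # An average team would be expected to beat this opponent by
        # -baseline points; anything beyond that counts in your favor.
        adjustments.append(margin + baseline)
    return sum(adjustments) / len(adjustments)
```

Note how this matches the examples above: blow out a team whose average differential is -31 by exactly 31 and you gain nothing; beat them by only 7 and you lose ground.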

Power Five/Group of Five Status

Adjusted Point Differential, described above, does mirror the committee’s evaluation rather well, but it has one significant flaw: It overranks Group of Five powers. This makes sense: Teams that are blowing out MAC bottom-feeders aren’t as impressive as teams that are blowing out Big Ten bottom-feeders, but because APD measures a team’s performance relative to the performance of the rest of their opponents’ schedules, it can cast the two in a similar light. To correct for this, we issue a standard Group of Five discount to every school that isn’t a member of a Power Five conference, Notre Dame, or BYU* (*this year—BYU has gotten the Group of Five discount in the past, but the committee didn’t seem to give it that discount last year, and BYU’s schedule is Power Five-heavy, which lines up).

Is this perfectly fair? No. But neither is the committee.

Power Five Conference Champion Status

The committee values conference championships. We issue a bonus for winning one, so long as it’s the championship of a Power Five conference.

Three Best Wins/Three Worst Losses

After the first four components, we have a baseline score assigned to each team. We want to know more about résumé, though, and we specifically want to know about the fringes of the résumé, which receive disproportionate attention. So, we score each win and loss based on the quality of opponent; the location of the game (home/away/neutral); and the margin, with diminishing returns as margin increases (the difference between 7 and 14 is greater than the difference between 21 and 28). Each team’s three best wins are incorporated into that team’s score, as are each team’s three worst losses. If a team has fewer than three wins, or fewer than three losses, a zero goes into the relevant place or places in the formula (all wins are positive, all losses are negative).

One note on this, which rarely matters: FCS opponents are again treated as that uniform entity described above.
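A rough sketch of how a component like this could be computed, with assumed location bonuses and a log-based diminishing-returns curve (the real formula’s coefficients aren’t published):

```python
import math

# Hypothetical scoring of individual wins and losses. The LOCATION values and
# the log1p curve are assumptions; only the structure (quality + location +
# diminishing margin, three best wins, three worst losses, zero-padding)
# follows the description above.

LOCATION = {"home": -1.0, "away": 1.0, "neutral": 0.0}  # road games count extra

def win_value(opp_quality, location, margin):
    # Clamped so wins never score negatively, per the rule above.
    return max(0.0, opp_quality + LOCATION[location] + math.log1p(margin))

def loss_value(opp_quality, location, margin):
    # Clamped so losses never score positively; a home loss stings more.
    return min(0.0, opp_quality + LOCATION[location] - math.log1p(margin))

def resume_fringes(wins, losses):
    """wins/losses: lists of (opp_quality, location, margin) with margin > 0.
    Returns the sum of the three best wins and three worst losses, padding
    with zeros when a team has fewer than three of either."""
    best = sorted((win_value(*g) for g in wins), reverse=True)
    worst = sorted(loss_value(*g) for g in losses)
    return sum((best + [0.0] * 3)[:3]) + sum((worst + [0.0] * 3)[:3])
```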

FPA (Forgiveness/Punishment Adjustment)

Finally, we have our sixth variable: FPA. FPA measures how much love or hate a team is getting from the committee. It’s adjusted every week following the release of rankings, and that adjustment is made in such a way as to 1) align our model’s rankings with those of the committee and 2) keep the aligned rankings as close to historical precedent as possible. With one exception, FPA is always input as a reaction to the committee’s rankings. With one exception, FPA is zero-sum—any point of FPA given to one team must be taken away from another team. With no exceptions, FPA is always kept to the smallest absolute value possible*, meaning we put as little of it in each week as we can while still getting the rankings to align. Once it’s in there, it stays in there until it’s shown to change (or a team falls out of the rankings, in which case we wipe it clean).

(*Note: The mechanics conflict a bit when it comes to minimizing the absolute value of FPA and keeping FPA from prior weeks. From the second rankings onwards, the way we currently handle this is to have our model go through the rankings from top to bottom, lining them up with the real rankings by using FPA to have teams meet in the middle. This sometimes leads to FPA having a slightly larger absolute value than is strictly necessary, but it’s how we do it right now.)
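As best we can render that procedure in code, it looks something like the sketch below; the meet-in-the-middle nudge and the tie-handling are guesses at mechanics the note above doesn’t fully specify:

```python
def align_with_committee(model_scores, committee_order):
    """model_scores: dict of team -> our formula's baseline score.
    committee_order: the committee's actual ranking, best team first.
    Returns a zero-sum dict of FPA nudges reproducing the committee's order."""
    scores = dict(model_scores)
    fpa = {team: 0.0 for team in scores}
    prev = None  # the team the committee ranks one spot above `team`
    for team in committee_order:
        if prev is not None and scores[team] >= scores[prev]:
            # Our scores have this pair flipped. Meet in the middle: transfer
            # half the gap (plus a hair) from the lower team to the higher one.
            nudge = (scores[team] - scores[prev]) / 2 + 0.01
            fpa[prev] += nudge
            fpa[team] -= nudge
            scores[prev] += nudge
            scores[team] -= nudge
        prev = team
    # A single top-to-bottom pass like this can disturb pairs it already
    # fixed, which is the "slightly larger than strictly necessary"
    # concession in the note above.
    return fpa
```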

That one exception? Let us tell you about The Kelly Bryant Rule™.

The Kelly Bryant Rule™

There’s one case in which we’ll preemptively add in FPA, and it’s built on a rare but clearly demonstrated precedent: roughly half of Clemson’s 2017 loss to Syracuse was forgiven after Kelly Bryant left midgame. If a team’s clear first-string quarterback is hurt and the team loses, we count the loss as roughly half-forgiven, because we expect the committee to do it again. It’s a small sample, but it was so clear, and could have been so impactful, that we account for it. If the quarterback stays out after the game? Misses a lot of time? That’s different. But if the quarterback is just out for a game, or for half the game, or for another temporary and finite amount of time? We put it in there, and it stays in even as we adjust around it.

One update to this for 2022? We’re only doing this with teams that have a playoff shot. Penn State lost a game to Iowa last year in this exact situation, but Sean Clifford’s injury did not earn the Nittany Lions the expected forgiveness from the committee.

***

What don’t we consider? Head-to-head results and results against common opponents are two big ones. Our evidence suggests those do matter, but only when teams are already next to each other in the rankings, which means this almost always gets covered by FPA anyway.

***

The next part of the model? Simulating the games themselves:

Movelor

Movelor (margin-of-victory-adjusted Elo with recruiting) is our rating system applied to all Division I teams, ranging—at least this preseason—from Georgia at +49.6 to Wagner at -40.4 (this implies Georgia would be a 90-point favorite over Wagner, but it also implies, more realistically, that Georgia would be a 30-point favorite over SMU, who would be a 30-point favorite over Jackson State, who would be a 30-point favorite over Wagner). This is a fairly standard margin-of-victory-incorporating Elo system: It’s self-contained within a single season; it doesn’t look backwards; and bigger wins get bigger rating increases, with diminishing returns as margin increases. It has its weaknesses, therefore—because it never revisits past games, a result that looks more or less impressive with the benefit of hindsight never gets re-evaluated—but it gets the job done, as we’ll express in a moment, and importantly, we’re able to apply it all the way back to 2006 in backtesting, because it’s our own formula and we know the guts.
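For the curious, a generic margin-of-victory Elo update in this spirit looks something like the following. The constants here (K, the home edge, the log curve) are standard conventions and our assumptions, not Movelor’s actual parameters:

```python
import math

K = 2.5          # assumed update speed, in points of rating
HOME_EDGE = 3.0  # home-field advantage, in points

def expected_margin(rating_a, rating_b, a_is_home=False):
    # Ratings live on a points scale, so the expected margin is the difference.
    # e.g. expected_margin(49.6, -40.4) -> 90.0 (the Georgia/Wagner example)
    return rating_a - rating_b + (HOME_EDGE if a_is_home else 0.0)

def elo_update(rating_a, rating_b, actual_margin, a_is_home=False):
    """Returns updated (rating_a, rating_b) after team A plays team B."""
    surprise = actual_margin - expected_margin(rating_a, rating_b, a_is_home)
    # Diminishing returns: a 28-point surprise doesn't move ratings four
    # times as much as a 7-point one.
    shift = math.copysign(K * math.log1p(abs(surprise)), surprise)
    return rating_a + shift, rating_b - shift
```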

The recruiting piece of this is important, but not gigantic. We approach the matter by assigning each team a “talent score” composed of a weighted average of its last five recruiting classes, as rated by the 247Sports consensus rankings. The classes grow in significance over their first three years in a program and decline over their final two. Importantly, this doesn’t account for transfers, but that should only significantly affect USC this year, and to be frank, we don’t know whether the transfer-all-stars approach is going to work out there or not. Down the line, as transfers become a bigger part of rosters, we’ll adjust for them, but for this year, the data just isn’t there to make anything other than a wild guess, and this particular time, that’s a guess we won’t make.

The talent score for each program isn’t compared against other programs’ talent scores. We don’t care so much about how Georgia’s talent compares to Cincinnati’s in this piece of the model; that’s handled already, in the Movelor piece, where the disparity in talent between Georgia and Cincinnati is evident in the results of their games. What we care about is how a program’s talent score changes year-to-year. Is the program more talented than it was last year? We should expect it to do more. Is the program less talented? We should expect it to do less.
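Sketched out, with purely hypothetical weights (the description above only commits to classes growing in significance over their first three years and declining over their final two):

```python
# Assumed weights by how long each class has been in the program (years 1-5,
# most recent class first). Chosen only to rise for three years, then fall.
CLASS_WEIGHTS = [0.15, 0.20, 0.25, 0.22, 0.18]

def talent_score(class_ratings):
    """class_ratings: the last five 247Sports consensus class ratings,
    most recent class first."""
    return sum(w * r for w, r in zip(CLASS_WEIGHTS, class_ratings))

def talent_change(this_year_classes, last_year_classes):
    # What the model cares about: the year-over-year change, not cross-program
    # comparisons. Game results (the Elo piece) already capture the latter.
    return talent_score(this_year_classes) - talent_score(last_year_classes)
```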

Overall, the system performs surprisingly well. With a three-point home-field adjustment, its average error over the last 16 years is 13.1 points, not far off that of some of the leading college football rating models, and only a point or so behind my best impression of how the Las Vegas spread performs. It also passes the gut-check test, saying preseason that Georgia would be a six-point neutral favorite over Alabama, a seven-and-a-half-point neutral favorite over Ohio State, and at least a ten-and-a-half-point favorite over every other program. We care most about how Movelor does with the best teams, and that lines up with the sort of thing we’re seeing from Vegas (and from Georgia—Movelor did have the Dawgs as a 26-point favorite over Oregon in Week 1).
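That average-error figure is straightforward to compute in a backtest; a minimal version, assuming a list of historical games, would look like this:

```python
HOME_EDGE = 3.0  # the three-point home-field adjustment mentioned above

def mean_absolute_error(games):
    """games: list of (home_rating, away_rating, actual_home_margin) tuples,
    with ratings taken as of kickoff. Returns the average miss in points."""
    misses = [abs(actual - (home - away + HOME_EDGE))
              for home, away, actual in games]
    return sum(misses) / len(misses)
```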

The fact that this incorporates the FCS is something we like a lot. Not every FCS opponent is created equal, even if they’re treated comparably by the CFP committee. North Dakota State, by Movelor’s estimate, entered the season rated equivalently to Iowa, just outside the total Division I top 25. James Madison, a new FBS transitioner, entered the season rated between Mississippi State and Oregon State, on the edge of the Division I top 50. UMass entered the season rated between Utah Tech and Lehigh, better than only roughly thirty FCS teams.

The FCS piece will also be helpful down the line, but we’ll get to that.

***

Putting It All Together

To make all of this into a cohesive whole, we use Movelor to simulate the season 10,000 times, and we use the CFP Rankings formula—with a randomized FPA variable assigned to each team in each simulation, based on the average absolute FPA across rankings-relevant teams at the end of last year—to select the playoff in every simulation. Movelor “runs hot” in these simulations, meaning it updates to account for results within a given simulation: If Wisconsin blows out Ohio State on September 24th in one simulation, Wisconsin is expected to be a lot better in October and November than it is in simulations where the Buckeyes won big.
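Condensed into Python, the loop looks roughly like the sketch below. The noise level, the top-four stand-in for the full CFP formula, and the update constants are all placeholders for the real machinery:

```python
import math
import random

SIGMA = 13.0  # assumed per-game noise, in points
K = 2.5       # assumed rating-update speed, as in the Elo sketch above

def run_hot_update(ra, rb, surprise):
    # Within one simulation, the winner of a surprising result gets better
    # and the loser gets worse for the rest of that simulated season.
    shift = math.copysign(K * math.log1p(abs(surprise)), surprise)
    return ra + shift, rb - shift

def select_playoff(ratings):
    # Stand-in only: the real model runs the full CFP Rankings formula with
    # randomized FPA here, not a simple top-four-by-rating cut.
    return sorted(ratings, key=ratings.get, reverse=True)[:4]

def playoff_odds(ratings, schedule, n_sims=10_000):
    """schedule: list of (home_team, away_team). Returns each team's
    simulated probability of making the playoff."""
    counts = {team: 0 for team in ratings}
    for _ in range(n_sims):
        r = dict(ratings)  # each simulation evolves its own copy ("runs hot")
        for home, away in schedule:
            expected = r[home] - r[away] + 3.0
            margin = expected + random.gauss(0.0, SIGMA)
            r[home], r[away] = run_hot_update(r[home], r[away], margin - expected)
        for team in select_playoff(r):
            counts[team] += 1
    return {team: n / n_sims for team, n in counts.items()}
```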

The margin of error on this is what you’ll see in the percentages. We’ll get to a point, barring unprecedented chaos, where some teams have a 100.0% chance of making the playoff. It’s unlikely we’ll see all four teams have that 100.0% chance, but it’s possible; sometimes the field is eminently clear. More likely, we’ll enter the selection show with a few teams at 100.0%, a ton at 0.0%, and two or three somewhere in between. How right or wrong do we expect to be? Whatever our percentages say.

As far as caveats go: We don’t have conference tiebreakers in there yet, so where those arise, they’re handled randomly. We also haven’t fully backtested the model and calibrated it week-by-week over these last eight years, something we would like to do but that requires approximately 360 hours of a computer running simulations, plus the accompanying man-hours of setting each batch up and processing it (part of this is that we need better computers, so, uh, please buy some merchandise, or at least keep the pageviews coming). Finally, I personally suspect we could do better at predicting FPA ahead of time, but that’s going to be part of the 360-plus-hour backtesting process, so it’s not part of this year’s model.

Aside from that, though, the projections are sound, and I believe this is a more thorough explanation of how the model works than you’re getting from almost any other model out there. That—alongside the 32-for-32 success rate we can boast over the last eight playoffs (and never having claimed in 2020 that USC had a 50% chance of making the playoff the week before the Trojans opened 18th in the CFP rankings, as somebody did)—makes us believe we have a legitimate claim to the title of best College Football Playoff prediction model on the market. And it’s only getting better from here.

***

Upcoming Features

Today, the only things we’re publishing are the highest-level probabilities: Each FBS team’s probability of winning its division, winning its conference, making the playoff, and winning the national championship. We don’t even have a homepage for the model yet, though we’ll update this post and link to it when we do. We’ll be publishing those probabilities in a blog post, using them for our published futures bets for the day, and then spending the weekend getting more features added.

As the season goes on, we’ll be rolling out new features: single-game spreads, scenarios for each game of playoff consequence (When Alabama plays Mississippi, we’ll tell you how each team’s playoff probability changes with a win or a loss), bowl projections based on median results, FCS bracketology (and playoff probabilities), hopefully even more. We’ll also, importantly, be updating FPA with every set of rankings, projecting the rankings beforehand and updating our model as each top 25 is unveiled. Finally, we’ll start publishing the live Movelor ratings, so we aren’t the only ones who get to track whether Idaho State has passed Davidson. This post, again, will be updated accordingly.

Update, 10/17/22: We’ve put a touch of conference tiebreakers in, but that’s the only addition so far from this original post. More to come.
