We have a College Football Playoff model, and it projects how the committee will rank the teams. We trot it out from time to time, and when we do, we need to link to how it works, which is why we’re writing this post. Welcome, friend.
The model is built off of precedent, optimized to accurately predict the committee’s playoff selections even pre-FPA (we’ll explain what “pre-FPA” means down below). Backtested over the seven playoff seasons to date, it’s 100% accurate in predicting which four teams will be selected, both pre-FPA and post-FPA. That isn’t guaranteed to hold up, but it gives us confidence that the model reflects how the committee has chosen teams so far, and, in turn, in how it will perform going forward.
To estimate the committee’s evaluations of teams, the model uses a combination of six sets of metrics:
Wins/Losses/Win Percentage
This one is simple. Wins, losses, and win percentage are each factors in the equation.
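If you want to picture it, here’s a minimal sketch of those record inputs in code; the function name is ours for illustration, not something pulled from the model itself:

```python
def record_features(wins: int, losses: int) -> tuple[int, int, float]:
    """The three record inputs: wins, losses, and win percentage."""
    games = wins + losses
    win_pct = wins / games if games else 0.0
    return wins, losses, win_pct
```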
Adjusted Point Differential
This one is not as simple. This is our approximation of advanced ratings of teams, Vegas odds, the dreaded eye test, etc.—the “how good are they?” pieces of the implicit equation. More precisely, it’s our measure of how thoroughly a team has beaten its opponents, relative to how those opponents played against the rest of their respective schedules. Blowing out a team that always gets blown out doesn’t get you much here. Narrowly beating a team that normally gets blown out hurts you. Blowing out a team that normally loses narrowly helps a lot. Is this a fair representation of how good teams are? No. But it tracks decently well with fair representations, and it seems to mirror the committee’s evaluation rather well.
One note on this: FCS opponents are treated as a uniform entity right now for these purposes, with the same average point differential (-31.125) every year.
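To make the idea concrete, here’s a rough sketch of how an adjusted point differential like this could be computed. The data shapes and function names are ours for illustration, and the actual model’s weighting may differ; the one number carried over from above is the fixed FCS differential.

```python
from statistics import mean

FCS_AVG_DIFF = -31.125  # FCS opponents treated as one uniform entity, every year

def opponent_avg_diff(opponent: str, margins_by_team: dict[str, list[float]]) -> float:
    """The opponent's average point differential across its own schedule."""
    if opponent not in margins_by_team:  # assume any unlisted opponent is FCS
        return FCS_AVG_DIFF
    return mean(margins_by_team[opponent])

def adjusted_point_differential(games: list[tuple[str, float]],
                                margins_by_team: dict[str, list[float]]) -> float:
    """Average of (our margin + the opponent's typical margin) over our schedule.
    Blowing out a team that normally loses narrowly scores high; narrowly
    beating a team that normally gets blown out can come out negative."""
    adjusted = [margin + opponent_avg_diff(opp, margins_by_team)
                for opp, margin in games]
    return mean(adjusted)
```

The key move is the relative framing: your margin only counts against the backdrop of what everyone else did to that same opponent.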
Power Five/Group of Five Status
Adjusted Point Differential, described above, does mirror the committee’s evaluation rather well, but it has one significant flaw: It overranks Group of Five powers. This makes sense: Teams that are blowing out MAC bottom-feeders aren’t as impressive as teams that are blowing out Big Ten bottom-feeders, but because APD measures a team’s performance relative to the performance of the rest of their opponents’ schedules, it can cast the two in a similar light. To correct for this, we issue a standard Group of Five discount on all schools that aren’t either members of a Power Five conference or Notre Dame/BYU* (*this year—BYU has gotten the Group of Five discount in the past, but the committee doesn’t seem to be giving it that discount this year and BYU’s schedule is Power Five-heavy, which lines up).
Is this perfectly fair? No. But neither is the committee.
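As a sketch of how that discount might be applied (the size of the discount and the exemption set are placeholders; we’re not publishing the real number here):

```python
P5_EQUIVALENTS = {"Notre Dame", "BYU"}  # BYU only this year, per the note above
G5_DISCOUNT = 4.0  # placeholder magnitude, not our actual value

def discounted_apd(team: str, is_power_five: bool, apd: float) -> float:
    """Dock the adjusted point differential of any team that isn't in a
    Power Five conference or treated as an equivalent."""
    if is_power_five or team in P5_EQUIVALENTS:
        return apd
    return apd - G5_DISCOUNT
```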
Power Five Conference Champion Status
The committee values conference championships. We issue a bonus for winning one, so long as it’s the championship of a Power Five conference.
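In code, this is about as simple as it sounds; the size of the bonus below is a placeholder:

```python
P5_CHAMP_BONUS = 2.0  # placeholder, not our actual coefficient

def conference_champ_bonus(won_conference_title: bool, is_power_five: bool) -> float:
    """A bonus only for winning a Power Five conference championship."""
    return P5_CHAMP_BONUS if (won_conference_title and is_power_five) else 0.0
```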
Three Best Wins/Three Worst Losses
After the first four components, we have a baseline score assigned to each team. We want to know more about résumé, though, and we specifically want to know about the fringes of the résumé, which receive disproportionate attention. So, we score each win and loss based on the quality of the opponent; the location of the game (home/away/neutral); and the margin, with diminishing returns as margin increases (the difference between winning by 7 and winning by 14 matters more than the difference between winning by 21 and winning by 28). Each team’s three best wins are incorporated into that team’s score, as are its three worst losses. If a team has fewer than three wins, or fewer than three losses, a zero goes into the relevant place or places in the formula (all wins are positive, all losses are negative).
One note on this, which rarely matters: FCS opponents are again treated as that uniform entity described above.
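Here’s a hedged sketch of that scoring. The weights, the 0-to-10 opponent-quality scale, and the log curve for diminishing returns are all placeholders; the best-three/worst-three selection and the zero-padding are the parts that mirror the description above.

```python
import math

def win_score(opp_quality: float, venue: str, margin: int) -> float:
    """Credit for a win (always positive): a better opponent, a tougher venue,
    and a bigger margin all help, with diminishing returns on margin."""
    venue_bump = {"home": 0.0, "neutral": 0.5, "away": 1.0}[venue]
    return opp_quality + venue_bump + math.log1p(margin)

def loss_score(opp_quality: float, venue: str, margin: int) -> float:
    """Penalty for a loss (always negative): a weaker opponent, a friendlier
    venue, and a bigger margin of defeat all make it worse."""
    venue_bump = {"home": 1.0, "neutral": 0.5, "away": 0.0}[venue]
    return -((10.0 - opp_quality) + venue_bump + math.log1p(margin))

def fringe_score(win_scores: list[float], loss_scores: list[float]) -> float:
    """Fold in the three best wins and the three worst losses, padding with
    zeros when a team has fewer than three of either."""
    best_three = (sorted(win_scores, reverse=True) + [0.0] * 3)[:3]
    worst_three = (sorted(loss_scores) + [0.0] * 3)[:3]
    return sum(best_three) + sum(worst_three)
```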
FPA (Forgiveness/Punishment Adjustment)
Finally, we have our sixth variable: FPA. FPA measures how much love or hate a team is getting from the committee. It’s adjusted every week following the release of the rankings, and that adjustment is made so as to 1) align our model’s rankings with those of the committee and 2) keep the aligned rankings as close to historical precedent as possible. With one exception, FPA is only ever input as a reaction to the committee’s rankings. With one exception, FPA is zero-sum: any point of FPA given to one team must be taken away from another team. With no exceptions, FPA is always kept to the smallest absolute value possible*, meaning we put in as little of it as possible to get the rankings to align. Once it’s in there, it stays in there until it’s shown to change (or a team falls out of the rankings, in which case we wipe it clean).
*Note: The mechanics conflict a bit between minimizing the absolute value of FPA and carrying FPA over from prior weeks. From the second rankings onward, the way we currently handle this is to have our model go through the rankings from top to bottom, lining them up with the real rankings by using FPA to have teams meet in the middle. This sometimes leaves FPA with a slightly larger absolute value than is strictly necessary, but it’s how we do it right now.
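For the curious, here’s a rough sketch of that top-to-bottom, meet-in-the-middle step. The data shapes are assumed and the tiny tie-breaking nudge is ours; this is meant to show the mechanism, not to reproduce our actual process.

```python
def weekly_fpa_update(model_scores: dict[str, float],
                      committee_order: list[str]) -> dict[str, float]:
    """Walk the committee's rankings from the top. Wherever our model disagrees,
    split the score gap between the two teams so they meet in the middle, with
    the committee's team nudged just above. Every point handed out is taken
    from another team, so the adjustment stays zero-sum."""
    fpa = {team: 0.0 for team in model_scores}
    scores = dict(model_scores)
    for rank, committee_team in enumerate(committee_order):
        model_order = sorted(scores, key=scores.get, reverse=True)
        model_team = model_order[rank]
        if model_team == committee_team:
            continue
        bump = (scores[model_team] - scores[committee_team]) / 2 + 1e-6
        for team, delta in ((committee_team, bump), (model_team, -bump)):
            fpa[team] += delta
            scores[team] += delta
    return fpa
```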
That one exception? Let us tell you about The Kelly Bryant Rule™.
The Kelly Bryant Rule™
There’s one case in which we’ll preemptively add FPA, and it comes from the one clear demonstration we have: roughly half of Clemson’s 2017 loss to Syracuse was forgiven after Kelly Bryant left that game injured. If a team’s clear first-string quarterback is hurt and the team loses, we count the loss as roughly half-forgiven, because we expect the committee to do it again. It’s a small sample, but it was so clear, and could have been so impactful, that we account for it. If the quarterback stays out after the game? Misses a lot of time? That’s different. But if the quarterback is just out for a game, or for half of one, or for some other temporary and finite stretch? We put the adjustment in, and it stays in even as we adjust around it.
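A tiny sketch of what that preemptive adjustment looks like; the function and its inputs are illustrative, and “roughly half” is the only part taken from above:

```python
def kelly_bryant_adjustment(loss_value: float, qb1_absence_was_temporary: bool) -> float:
    """If the clear first-string QB missed the game (or part of it) and the
    absence was temporary, treat the loss as roughly half-forgiven.
    `loss_value` is negative, so halving it hands back half the penalty."""
    return loss_value / 2 if qb1_absence_was_temporary else loss_value
```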
***
What don’t we consider? Head-to-head results and results against common opponents are two big ones. Our evidence suggests those do matter, but only when teams are already next to each other in the rankings.
***
We’d like to stress here that this is not how we would construct a ranking formula, were we charged with ranking teams. SOR (Strength of Record) is better (provided it’s based on a good reflection of how good teams really are, like SP+ or FPI or FEI). But we’re not trying to rank the teams ourselves. We’re just trying to predict how the committee will do it. And this formula does a pretty good job of it.