How Our 2024 College Football Model Works

Our college football model is back. It has two main parts.

The first part is Movelor, the rating system we use to rate every FBS and FCS team in the country. The second part is our probabilities, the output of simulations designed to predict who will make the College Football Playoff. You can find all these offerings and more through our model’s homepage. Here’s how they work.

Section 1 – Movelor

The Name

Movelor’s name is close to an acronym, standing for Margin of victory-based elo with recruiting, and that name is to some extent self-explanatory. The only live inputs into the continuously updating formula are 1) the margin of victory or defeat in each game involving a Division I football team, 2) each team’s offseason recruiting score on the 247Sports Composite, and 3) each team’s standing in the preseason AP Poll. This third piece is new this year, and we’ll discuss it below. Overall? Movelor is a very simple system. All it cares about is which teams win or lose, where the game is played, the margin by which the game is decided, how well each team is seen to recruit, and what the press thinks of the best teams heading into the season. Despite or because of this simplicity, it performs fairly well. More on that below.

The Scale

The numerical ratings within Movelor have no minimum or maximum. Their average is roughly 0.0, meaning the average Division I football team measures out at 0.0 on this scale. The ratings are equivalent to points per game, meaning a team with a Movelor rating of 28.0 should be expected to beat an average Division I football team by 28 points on a neutral field. Please note: That’s Division I. Not FBS. Movelor covers both the FBS and the FCS.

The Movelo Parts

The elo system is the core of Movelor. Its name and much of its design come from a rating system first devised for chess. (It's called Elo over in that world, named after its inventor, Arpad Elo.) The basic idea of elo systems is that each competitor has a numerical rating and that each rating updates with every competition the competitor completes. If the result is surprising (a win over an opponent with a much higher rating, or a loss to an opponent with a much lower one), the rating swings by a broader margin than if the result is unsurprising (a win as a significant favorite, a loss as a heavy underdog). Wins improve elo. Losses worsen it.
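
For those who want the mechanics, here's a minimal sketch of a generic elo update. It uses the classic chess constants (the 400-point logistic scale and a K-factor of 20), not Movelor's actual values, which live on a points-per-game scale.

```python
def expected_win_prob(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A beats B under a logistic elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))

def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 20.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one game.

    Surprising results (upsets) move ratings more than expected ones, because
    the actual outcome sits far from the pre-game expectation.
    """
    expected_a = expected_win_prob(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)  # zero-sum: winner gains what loser sheds
    return rating_a + delta, rating_b - delta
```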

We add on to this with a margin-of-victory component, one which operates in a logarithmic manner, meaning the difference between a 14-point victory and a 7-point victory is larger than the difference between a 28-point victory and a 21-point victory in Movelor’s eyes. The way Movelor handles margin of victory is to look at its own expectation for the margin of a given game and then, just as with the elo piece, assign a larger change in rating to teams who surprise by a lot and a smaller change to teams who surprise little. In short: If Alabama is a 14-point favorite in Movelor’s eyes and wins by 17, Alabama and its opponent will not see their ratings change very much. If Texas is a 3-point favorite in Movelor’s eyes and wins by 42, Texas and its opponent will see their ratings change dramatically.
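
Here's one plausible way to wire those two ideas together in code. The constants (the multiplier and the logarithmic compression) are illustrative, not Movelor's actual values; the point is that the adjustment grows with the surprise, and grows more slowly as the surprise gets huge.

```python
import math

def mov_adjustment(expected_margin: float, actual_margin: float, k_mov: float = 2.5) -> float:
    """Rating change for the favored team from the margin-of-victory piece.

    The surprise (actual minus expected margin) is compressed logarithmically,
    so the first few points of surprise matter more than points tacked onto an
    already lopsided result. Constants here are illustrative.
    """
    surprise = actual_margin - expected_margin
    return math.copysign(k_mov * math.log1p(abs(surprise)), surprise)

# Alabama favored by 14, wins by 17: small surprise, small adjustment.
print(round(mov_adjustment(14, 17), 2))  # 3.47
# Texas favored by 3, wins by 42: big surprise, big adjustment.
print(round(mov_adjustment(3, 42), 2))   # 9.22
```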

There are better ways to measure teams’ relative performance in a game than margin of victory. Ultimately, though, we find this approach does a surprisingly good job of keeping up with the industry. So, while we’d like to refine Movelor’s game-by-game adjustments one day, that’s not a particularly urgent priority for our business.

Home field matters in college football, and we account for location of games in the initial expectations against which the margin is compared. Our best estimate is that home-field advantage is worth three points.
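
Because the ratings are already denominated in points, the pre-game expectation is easy to express. A minimal sketch, using the three-point home edge cited above:

```python
HOME_FIELD_ADVANTAGE = 3.0  # points, our best estimate

def expected_margin(home_rating: float, away_rating: float, neutral_site: bool = False) -> float:
    """Expected home-team margin before kickoff, given ratings in points per game."""
    hfa = 0.0 if neutral_site else HOME_FIELD_ADVANTAGE
    return (home_rating - away_rating) + hfa
```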

The Recruiting Part

Movelor’s second element is its R, recruiting, and it is a small element. This offseason, only six teams saw their rating change by a point or more based on recruiting.

Movelor looks at recruiting through the lens of a “talent score,” a weighted average of recruiting class scores from the 247Sports Composite over the last five years. It phases these scores in and out, with the class from three years ago twice as important as the classes from two and four years ago and those classes twice as important as the classes from one and five years ago. This is an extremely, extremely basic way to grade how much talent is in a program.
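
In code, that phasing works out to a 1:2:4:2:1 weighting across the five classes. A sketch, with made-up class scores:

```python
def talent_score(class_scores_by_years_ago: dict[int, float]) -> float:
    """Weighted average of the last five 247Sports Composite class scores.

    The class from three years ago counts 4x, the classes from two and four
    years ago count 2x, and the classes from one and five years ago count 1x.
    """
    weights = {1: 1, 2: 2, 3: 4, 4: 2, 5: 1}
    total_weight = sum(weights.values())
    return sum(weights[y] * class_scores_by_years_ago[y] for y in weights) / total_weight

# Hypothetical class scores, keyed by how many years ago the class signed.
example = {1: 245.0, 2: 260.0, 3: 280.0, 4: 250.0, 5: 230.0}
print(round(talent_score(example), 1))  # 261.5
```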

Once Movelor has each team's talent score for a given season, it compares that score to the same team's talent score from the previous season and adjusts the team's rating accordingly. A team could have recruited well, but if the haul isn't as good as its previous recruiting was, its Movelor rating will go down. In this way, the system normalizes each team's talent level to itself. If a team has performed well without much talent, Movelor doesn't doubt it just because it lacks talent again this year.
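
A sketch of that self-normalizing step. The conversion factor below is a made-up placeholder, since we haven't published Movelor's actual scaling; the key idea is that the adjustment comes from the change in a team's own talent score, not its absolute level.

```python
# Hypothetical conversion from talent-score units to rating points.
TALENT_POINTS_PER_UNIT = 0.05

def recruiting_adjustment(this_year_talent: float, last_year_talent: float) -> float:
    """Offseason rating bump (or ding) from the change in a team's talent score."""
    return TALENT_POINTS_PER_UNIT * (this_year_talent - last_year_talent)

# A strong class that still falls short of the team's recent norm nudges the rating down.
print(round(recruiting_adjustment(this_year_talent=261.5, last_year_talent=270.0), 3))  # -0.425
```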

Again, this is very basic, and it’s grown more outdated since we debuted it thanks to the growth in transfers. We want to go through rosters using the 247Sports Talent Composite, breaking them out by original high school recruiting class to account for aging. This is our preferred route for refining this approach. That, though, turned out to be too big a project for this specific offseason. So…the AP poll:

The Preseason AP Poll

Because Movelor’s recruiting treatment is so inexact, Movelor doesn’t change much in the offseason. This has been a blessing and a curse. On the positive side, it’s kept us from overreacting to certain offseason hype. On the negative side, Movelor’s results tend to be poorer in September than they are the rest of the season. Clearly, we could be doing something differently.

We tried two quick fixes to improve our offseason adjustment process. The first was to install a blanket adjustment for any team that changed head coaches between Week 1 of the prior season and Week 1 of this season. Returns on this were fairly meaningless, and to the extent they existed, they were disparate: some teams get worse after a coaching change, some get better, and while you can probably predict which, establishing a set of categorization rules wasn't worth the squeeze. Not this time around, anyway. So, we tried the second quick fix, which was to finally consider the AP Poll. Returns on this weren't dramatic, but we did manage to shave two hundredths of a point off of Movelor's all-time average absolute margin of error, dating back to the 2006 season. Step by step, folks.

We loathe the in-season AP poll around here. As we've said too many times to count, it doesn't define what it's ranking, instead serving as some nebulous, ever-changing combination of how good a team is and how strong its résumé is, with plenty of noise thrown in surrounding good storylines. The preseason AP poll, though, is pretty good. The reporters who vote on it always know a lot, but in the preseason, they apply that knowledge consistently with one another. They use the same rubric. They rank how good they expect teams to be. They might not often pick the correct national champion (they haven't since 2017), but the rankings are good, and they have at least a little predictive power.

So, we've begun adding the AP poll into our ratings. We take every team's rating after the recruiting adjustment and line them up. Then, we find the lowest-rated team ranked in the AP top 25. Some years, this team is rated 30th by Movelor. Sometimes, this team is rated 73rd. Wherever that team falls on Movelor's curve, we take it and every team above it and assign each a hypothetical "AP Movelor rating": the rating we say Movelor would give each team if it were fully reliant on the AP poll. In this hypothetical rating, every unranked team takes on that lowest-rated ranked team's rating, with the 25 ranked teams climbing upward from there in equidistant steps, sized so that the No. 1-ranked team's hypothetical AP Movelor rating equals that of Movelor's original preseason favorite. (Often, these are the same team.) Once we have these hypothetical ratings, we blend them with Movelor's original post-recruiting rating, giving a little less than twice as much weight to the original rating, the ratio we found yields the best historical results.
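
Here's a sketch of that construction in code. The 0.35 weight on the AP-based rating is an illustrative stand-in for "a little less than twice as much weight" on the original, and the data structures are hypothetical.

```python
def ap_blended_ratings(ratings: dict[str, float], ap_top25: list[str],
                       ap_weight: float = 0.35) -> dict[str, float]:
    """Blend post-recruiting ratings with a hypothetical AP-based rating.

    `ratings` maps team -> post-recruiting rating; `ap_top25` lists the
    preseason AP top 25 from No. 1 down.
    """
    floor = min(ratings[t] for t in ap_top25)   # the AP-ranked team Movelor likes least
    ceiling = max(ratings.values())             # Movelor's own preseason favorite
    step = (ceiling - floor) / 24               # 25 ranked teams in equidistant steps

    blended = dict(ratings)
    for team, rating in ratings.items():
        if rating < floor:
            continue                            # teams below the floor are left alone
        if team in ap_top25:
            ap_rating = ceiling - ap_top25.index(team) * step
        else:
            ap_rating = floor                   # unranked teams above the floor drop to the floor
        blended[team] = (1 - ap_weight) * rating + ap_weight * ap_rating
    return blended
```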

Overall, this has two effects: First, it accounts for top-25 teams that experienced significant offseason turnover, like both Michigan and Washington did this year. Second, it offers some mean reversion. For example: When an FCS team blows the doors off of every other FCS team it faces in the FCS Playoffs, its absence from the ensuing season's preseason AP poll keeps Movelor from rating it a top-ten team entering the next season. This also applies to teams with big bowl game wins, wins that are meaningful but can be misleading.

The approach certainly has its shortcomings. It offers no offseason adjustment to most of the Group of Five and the FCS. But our hope is that it improves on what we've done in past seasons, giving our model a slightly less rollercoaster September.

Movelor’s Strengths and Weaknesses

We already talked a lot about the offseason adjustment piece, so we won't belabor that here.

The strengths? It’s pretty accurate. Movelor had an average error last year of 12.9 points per game. The closing spread in betting markets had an error of 12.1 points per game. ESPN’s FPI had an average error of 12.5 points per game. ESPN’s SP+ had an average error of 12.3. Movelor outperformed at least one of the old systems used in the computer portion of the BCS ratings. I suspect it outperformed others as well, but many have vanished into the sands of the internet.

One driving factor behind Movelor’s success is probably how it treats the FCS. Not all college football models treat different FCS opponents as different from one another. Many do, and more do than used to, but some still don’t. There is a big difference between South Dakota State and Stetson. A roughly 60-point difference, in fact, at the end of last year.

That FCS inclusion is another of Movelor’s calling cards. FCS football is fun, and we have numbers on it, and not everybody does. This also leads us to greatly appreciate the FCS powers. Others realize when South Dakota State cracks the national top 25 in quality despite operating with more than a third fewer scholarships than their FBS peers, but they don’t pay attention the way Movelor forces us to pay attention. This is a more subjective benefit, but it’s a benefit nonetheless.

Another driving factor behind Movelor’s success is likely that simplicity. It’s easy to overthink things. It’s especially easy to overthink things when trying to catch up to public perception. Movelor does not do this. It’s ham-handed, and in some ways, that’s probably good.

Weaknesses? Beyond the offseason adjustment and those better ways to measure relative performance:

Movelor doesn't consider tempo, and it doesn't break out into offensive and defensive components. These omissions could limit its precision (short-field touchdowns after a penalty and a bad kickoff count just as much to Movelor as well-executed drives down the field), and they keep us from forecasting totals or exact scores. Movelor might overweight high-octane offenses and underweight strong defensive performances, but it's not as simple as that: It gave Iowa a lot of credit last year for never letting teams put up big numbers against them.

Movelor also doesn't look backwards. Once a result is fed into Movelor, Movelor never looks at that game again. So: If Penn State blasts West Virginia this week and West Virginia goes on to be a lot better than Movelor expects, Movelor won't change its impression of Penn State's victory based on how its impression of West Virginia has changed. All scores are final, so to speak. Again, this isn't a terrible weakness. College football teams change in quality as the year goes on, so while some Bayesianism would be preferable, it's easy to go too far. But we'd like to blend this in, at least a little bit. We wonder if it would help us close our gap on the markets.

Section 2 – College Football Playoff Probabilities

Movelor is solid. It gets the job done. But our playoff probabilities are where this model shines. We first launched them in 2019. They did very, very well. We’ve continued over the years since, with some small breaks in publication due to Covid-induced scheduling uncertainty.

We’ve tweaked the approach over the years, but we’ve never had to tinker with it too dramatically. Before last year, it had never missed a playoff team. Should it have gotten all four correct last year? Yes. That’s on us. It did not account for a situation where an otherwise traditionally deserving team lost its starting quarterback in the penultimate game of the regular season, or for the head-to-head situation which ensured Texas would make it over Alabama. It was too high on Georgia’s chance of making the field. We should have been prepared for these things, and as we’ll explain below, we underestimated the degree to which the playoff chase is still a horse race. But overall? Last year was really, really weird. One thing we’re proud of: Even without accounting for Jordan Travis’s injury, the model knew there was a chance Florida State wouldn’t get in. It saw the vulnerabilities in their résumé which opened the door to the committee closing the door. Many models saw a 13–0 Power Five conference champion and dropped the 100% number. Our model still got its butt kicked, but 90% is infinitely different from 100%.

As with Movelor, there are long-term changes we want to make to this part of the model, specifically in how it reacts to the committee’s rankings each week. But overall, it does a better job than Movelor does, relative to the market.

Changes we did make this year:

Our model was designed for a four-team playoff, and we based its CFP formula off of rankings in the four-team playoff era. This adds some uncertainty as we look ahead into this year. To account for this, we’ve amplified the uncertainty within the model, telling it to be less confident in its ranking predictions than we usually tell it to be. The only other adjustment we made this year—besides necessary playoff format changes and the change to Movelor, discussed above—is that we’re labeling Washington State and Oregon State each as half a mid-major.

How it works, how strong it is, and what you should know when referencing it:

CFP Formula

Our approach uses a core ranking formula built off of six sets of metrics:

Wins/Losses/Win Percentage

This is fairly self-explanatory. We broke it out into win percentage in addition to wins and losses to adjust for teams without conference championship games (when that was an issue), teams with games canceled by hurricanes, and—of course—the Covid season.

Adjusted Point Differential

This is more complicated.

Adjusted Point Differential (APD) is our metric which approximates the eye test, betting market odds, other advanced ratings, media and coach rankings, and all the other explicit and implicit pieces which impact committee members’ impressions of how good each team under consideration really is.

The way APD works for a given team is to look at each of that team's opponents' average margin of victory or defeat (using a flat number for all FCS opponents, because while Movelor knows about the differences between FCS teams, committee members generally don't) and then compare that average scoring margin to the given team's scoring margin against that opponent. In practice: If Indiana has an average scoring differential of –10 points and Ohio State beat them by 20, Ohio State outperformed the average by ten points. Add that up for each of Ohio State's games, average it, and you have Ohio State's APD.
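
In code, the calculation looks something like this. The data structure is hypothetical and built for illustration; the flat FCS placeholder would simply be substituted in as the opponent's average margin for FCS opponents.

```python
def adjusted_point_differential(games: list[dict]) -> float:
    """Average of how much a team outperformed each opponent's typical result.

    Each entry holds the team's margin in that game and the opponent's
    season-long average scoring margin from the opponent's own perspective.
    An opponent that averages -10 is typically beaten by 10, so beating them
    by 20 counts as outperforming the average by 10.
    """
    outperformance = [g["margin_vs_opponent"] + g["opponent_avg_margin"] for g in games]
    return sum(outperformance) / len(outperformance)

# The article's example game, plus a second hypothetical one.
ohio_state_games = [
    {"opponent": "Indiana",    "opponent_avg_margin": -10.0, "margin_vs_opponent": 20.0},
    {"opponent": "Penn State", "opponent_avg_margin":   6.0, "margin_vs_opponent":  3.0},
]
print(adjusted_point_differential(ohio_state_games))  # (10 + 9) / 2 = 9.5
```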

Power Four/Group of Five Status

As you may have guessed, APD overestimates the committee’s evaluation of teams outside the Power Four. It knows their schedules are worse, but it doesn’t know how much worse. So, we put a blanket adjustment into our overall CFP ranking formula which deducts from all Group of Five teams, including independents not named Notre Dame. This also accounts for any additional discounting the committee is doing, whether warranted or not. As we said above, Washington State and Oregon State are only receiving half a deduction right now. We don’t know how the committee will treat them, so we split the difference.
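
A sketch of that deduction, with a made-up value since we don't publish the actual size of the penalty:

```python
G5_DEDUCTION = 4.0  # illustrative, not the real number

def group_of_five_penalty(team: str, group_of_five: set[str], half_penalty: set[str]) -> float:
    """Blanket CFP-formula deduction for non-Power Four teams (Notre Dame excluded)."""
    if team in half_penalty:  # Washington State and Oregon State this year
        return G5_DEDUCTION / 2
    return G5_DEDUCTION if team in group_of_five else 0.0
```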

Power Four Conference Championship

The committee values conference titles, even if it’s now claiming it doesn’t, but it’s unclear if it cares about those outside the Power Four. Our formula awards a bonus to teams who win a Power Four conference. This probably won’t matter as much this year, because the top four champions will be locked into byes anyway, but if it does matter—if a Group of Five champion outranks the ACC champion, for example, and another is close—it will matter in a big way.

Three Best Wins/Three Best Losses

While strength of schedule metrics get a lot of airtime, the construction of those formulas is more arbitrary than you'd think. What ultimately seems to matter most to the committee is the quality of each team's best wins and the quality of each team's losses, if it has any. We add a component accounting for margin of victory or defeat, because blowout losses look worse than narrow ones, and blowout wins look better.
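
A rough sketch of the shape of this component; the top-three selection and the margin weighting below are illustrative toys, not our actual weights, and a mirror-image version handles the losses side of the ledger.

```python
def best_wins_component(wins: list[dict], n: int = 3) -> float:
    """Toy scoring of a team's three best wins.

    Each entry holds the opponent's quality (a rating or committee-style grade)
    and the margin of victory; better opponents and bigger margins score higher.
    """
    top = sorted(wins, key=lambda w: w["opponent_quality"], reverse=True)[:n]
    return sum(w["opponent_quality"] + 0.1 * w["margin"] for w in top)
```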

FPA (Forgiveness/Punishment Adjustment)

Our sixth variable is FPA. FPA is inserted into our CFP formula every week CFP rankings are released. It normalizes our model’s impression of the field to where the committee says the field lies. In only three instances other than in reaction to rankings do we insert FPA into the formula. Those three instances:

  • The Kelly Bryant Rule: If a team loses without its first-string quarterback and that quarterback will be back for the playoff, their worst loss’s impact is halved, in accordance with how the committee treated Clemson in 2017 after the Tigers’ loss to Syracuse.
  • The Jordan Travis Rule: If a team loses its starting quarterback and that quarterback will not be back for the playoff, and if that team fails to cover Movelor’s spread in the majority of games played after the injury occurs, they receive an FPA deduction equivalent to the average one-ranking gap within the top ten.
  • The Nick Saban Rule: Any SEC champion with zero or one losses must be ranked in the top four no matter what. If another one-loss Power Four champion beat them, they must also be ranked in the top four. This is handled via FPA and is not accounted for in our simulations due to the hyperspecificity of the scenario.

These are stupid little exceptions, and we’re aware of that. Basically, what we’re doing with them is saying, “We trust our model, but we’ve seen three specific scenarios where it’s been wrong, and we reserve the right to yank it around in prescribed ways if those scenarios materialize again.” It’s amateur-ish of us, but we think it serves our readers best, and we’ll always be transparent about adjustments we make to our model—the what, the why, and the how. We will continue to work on improving it so we don’t have to make stupid little exceptions.

Like we said above, we want to do a better job with FPA. Our theory when we built this model was that the committee operates more like the basketball committee—one that mostly looks at all the data at one moment in time and builds its rankings from there—than like AP voters, who conduct a horse race each year. This theory had a lot of value and helped us in the model’s early years. Recent committee treatment of 2022 Ohio State and 2023 Georgia, though, has us acknowledging that there’s still some horse race to the process.

Simulations

Our model, using the Movelor rating system, simulates the rest of the season 10,000 times, then tallies up all of the results into probabilities. Each simulation is its own unique season, and Movelor is live within each, adjusting to results as they happen. So: If in one simulation, Temple upsets Oklahoma, Oklahoma is expected to do a lot worse the rest of the season than it is in another simulation where Oklahoma blows out Navy.
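
For the curious, here's a stripped-down sketch of the engine's shape. The 17-point margin noise, the 0.04 rating update, and the "most regular-season wins" tally at the end are placeholders; the real model uses Movelor's actual update rules and tracks rankings, conference races, and playoff bids.

```python
import random

def simulate_season(teams: dict[str, float], schedule: list[tuple[str, str]],
                    n_sims: int = 10_000, sigma: float = 17.0) -> dict[str, float]:
    """Monte Carlo sketch: simulate the remaining schedule, ratings live within each run."""
    tallies = {t: 0 for t in teams}
    for _ in range(n_sims):
        ratings = dict(teams)                              # live copy for this simulated season
        wins = {t: 0 for t in teams}
        for home, away in schedule:
            expected = ratings[home] - ratings[away] + 3.0  # 3-point home edge
            margin = random.gauss(expected, sigma)          # simulated result
            winner = home if margin > 0 else away
            wins[winner] += 1
            surprise = margin - expected
            ratings[home] += 0.04 * surprise                # feed the result back in
            ratings[away] -= 0.04 * surprise
        best = max(wins, key=wins.get)                      # toy outcome to tally
        tallies[best] += 1
    return {t: tallies[t] / n_sims for t in teams}
```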

Caveats

A few things to be wary of with our system:

  • At the moment, we have no conference tiebreakers programmed in. Those are purely random. As the season goes on, we will either fix this or adjust the randomness for known tiebreaker scenarios. This is our own fault, but it didn’t help that multiple conferences were so late to update their tiebreaker rules this year as they depart from the division era.
  • We haven’t yet standardized our process of how exactly to insert FPA in response to the rankings, and so in our model’s simulations, FPA is just another random variable. It isn’t linked to specific results or specific timing which can shake the committee from precedent. We would like to automate this one day. Ideally, soon.
  • Our model is blindly loyal to precedent. The committee is not. The model is a little too trusting of the past.
  • We have not done a full, week-by-week backtest of the model throughout the whole of the playoff era, so our probabilities are uncalibrated. This is another thing we would eventually like to do—make sure that 10% of the things we say are 10% likely end up happening—but we have yet to do it.
  • APD is very simplistic. It works fine, but it is very simplistic. This would be a nice thing to ultimately improve upon, but do not expect any adjustment to this piece of the puzzle for a while.

Overall, this is one of the best playoff predictors publicly published, with only one two-part miss in its history and only one actually poor performance, a performance which was ultimately comfortably covered up (because USC wasn't that good in 2022, which is why APD doubted their playoff chance in the first place). Also? If there's a flaw significant enough that we doubt the model, we will talk about that ad nauseam and give you our best read of the situation. This is a big distinction from other models, where certain media outlets will happily throw out ridiculous numbers and call it statistics. If we see a ridiculous number, we fix the issue behind it before we go back out there, and we tell you what happened. If we don't fix it, we at least tell you it's ridiculous and why we think it's happening and how we plan to fix it going forward. You can count on that from us.

Section 3 – CFP Bracketology

Our wheelhouse! The process here is relatively simple:

First, we gather every team’s average final CFP ranking from our simulations. For teams who average finishing the year unranked (this is most teams), we take their playoff probability.

Second, we line up the nine individual conference favorites by average final CFP ranking. Once we get to the part of the list where teams are unranked, we line them up by playoff probability.

The first four conference champions on our list—i.e., those with the best average final CFP ranking—are assigned to the four byes in the bracket, in accordance with CFP procedure. Basically, these are the four highest-ranked conference champions in our model’s “average” simulation.

The fifth conference champion on our list—i.e., that with the fifth-best average final CFP ranking or the best playoff probability among remaining unranked conference champs—gets the fifth automatic bid.

Third, we take every team not among those five (this includes the four conference champions who didn’t get automatic bids) and line them up by average final CFP ranking. The first seven on that list are our at-large bids.

Fourth, we line our seven at-large bids and the fifth conference champion up by average final CFP ranking, and we assign seeds 5 through 12 based on that. For seeds 1 through 4, we do the same for the four auto-bid teams who received byes.

Fifth, we assign teams to bowl games in the quarterfinal and semifinal round based on historic bowl ties (SEC: Sugar Bowl; Big Ten: Rose Bowl; Big 12: Sugar Bowl; ACC: Orange Bowl) and geography where historic ties are impossible to satisfy. This is done, like the rest, in accordance with CFP protocol.
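
Here's a toy version of that assembly, with hypothetical data structures standing in for our simulation output; the ordering logic follows the steps above (bowl assignments are left out).

```python
def build_bracket(conference_favorites: dict[str, str], avg_cfp_rank: dict[str, float],
                  playoff_prob: dict[str, float]) -> dict[str, list[str]]:
    """Toy assembly of the 12-team field from simulation averages.

    `conference_favorites` maps conference -> favorite; `avg_cfp_rank` holds the
    average final CFP ranking for teams that average ranked (lower is better);
    teams not in it are ordered by `playoff_prob` instead.
    """
    def sort_key(team: str):
        if team in avg_cfp_rank:                      # ranked teams first, best ranking first
            return (0, avg_cfp_rank[team])
        return (1, -playoff_prob.get(team, 0.0))      # then unranked teams, by playoff probability

    champs = sorted(conference_favorites.values(), key=sort_key)
    byes = champs[:4]                                 # four best champions get the byes (seeds 1-4)
    fifth_champ = champs[4]                           # fifth automatic bid
    pool = [t for t in avg_cfp_rank if t not in byes and t != fifth_champ]
    at_large = sorted(pool, key=sort_key)[:7]         # seven at-large bids
    return {
        "seeds_1_to_4": byes,
        "seeds_5_to_12": sorted(at_large + [fifth_champ], key=sort_key),
    }
```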

The end result is a realistic bracket which is something of an "average" scenario, in that it's comparing every team's average season against those of its competition.

Section 4 – Bowl Projections

Coming soon: ETA is currently Week 3.

Section 5 – FCS Playoff Probabilities & Bracketology

Coming soon-ish: ETA is currently Week 8.

Acknowledgments

We take a lot of our ideas from the elo-based models of Nate Silver and Jay Boice. Silver has also been a great example for us when it comes to transparency and accountability. We are, again, admittedly amateur-ish with this model, but his is the standard we’re trying to live up to.

Second, we’re grateful to Ken Pomeroy and Joe Lunardi, two college basketball projection pioneers, and to Bill Connelly, the creator of ESPN’s SP+ rating system. We’ve copied and stolen concepts and practices from lots of people, but probably mostly from those three.

Third, we're always indebted to collegefootballdata.com for compiling the schedule each year. There is an annual "oh shit" moment in August when we are running behind, and there is an annual "oh thank goodness" moment when we get that Excel file off their website.

Last, as always, we are most indebted to you, the reader, for helping this be more than a dorky hobby. Thanks for being here.
