It’s Time to Abolish the BPI

Longtime followers of Bauertology should know that the week after Super Bowl Sunday is a very special time of year for the Bauertology blog, as that’s when the weekly Bauer’s Bubble Watch series makes its grand reappearance. These pieces are always my favorite to write—a deep dive into every team vying for the NCAA Tournament’s limited assortment of at-large spots, and a real opportunity to have some fun with my informative yet humorous writing style, churned out every week up until selection day. We’re still a little under a month away from 2024’s first edition, but, don’t you worry, it’s coming soon enough.

You may have to wait until February for Bauer’s Bubble Watch, but I’m not waiting until then to get the creative writing juices flowing. I’m turning my attention elsewhere for this write-up—though it’s still the bubble watch serving as the inspiration.

A huge reason why I really got into this whole bracketology/bubble watch business in the first place is the work of Eamonn Brennan, formerly of ESPN and The Athletic, now continuing to write at his personal blog Buzzer. Eamonn’s kind of the original king of bubble watch; I can recall getting into his weekly column as far back as 2014 and perhaps even earlier, and I’m thrilled that he’s still doing it a whole decade later.

Eamonn has already dived into the bracket breakdown game for 2024, with his first bubble watch column posted back on Jan. 9. (I personally think it’s a bit early to be taking this kind of bubble analysis too seriously, but, hey, who am I to complain about an awesome new article to read every single week?)

Eamonn does yeoman’s work, pumping out a brand new 9,000-word write-up every Tuesday. As someone who often struggles to write half that amount for my bubble watches in a timely manner, I can tell you that this is a herculean task.

I enjoyed all 9,000 of those words when I read the most recent edition from Tuesday, Jan. 16. But, given the discussions I’d been having with other bracketologists at the time, there are a select few words, written in the blurb for Mountain West title hopeful Colorado State, that really struck a chord with me:

[Screenshot: excerpt from Eamonn Brennan’s Colorado State blurb, noting how Mountain West teams are routinely ranked 20 to 30 spots lower in BPI than in KenPom]

For those who don’t know, the Basketball Power Index, or BPI, is ESPN’s own predictive measurement tool for comparing all 362 Division I basketball teams.

And I think I can speak on behalf of my fellow bracket-building compatriots, especially those that I collaborate with on Twitter, when I say this: The BPI sucks.

We hate using it as a resource to evaluate the résumés of teams, and we only do it because we kind of have to. The BPI is, after all, an official metric that appears on the team sheets used to evaluate teams by the NCAA Tournament selection committee.

But it shouldn’t be.

To put it succinctly, the BPI makes no damn sense when compared to other metrics that aim to do the same thing: evaluate and sort teams based on their efficiency, which is essentially a tally of how many points they score on their offensive possessions and how many they allow on their defensive possessions, adjusted for the strength of their schedule. These efficiency numbers are used to predict how well teams are expected to perform going forward; this is why you will often hear these tools called “efficiency” or “quality” or (as we will call them in this article) “predictive” metrics.
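If you want to see the idea in code form, here’s a rough sketch of how one of these efficiency numbers gets built. To be clear, the season totals and the simple one-pass schedule adjustment below are my own stand-ins, not KenPom’s (or anyone’s) actual formula, which, as I understand it, iterates this kind of adjustment until the ratings settle.

```python
# A toy version of a predictive "efficiency" rating. All inputs are
# hypothetical; real systems iterate the opponent adjustment rather
# than applying it once like this.

def per_100(points: float, possessions: float) -> float:
    """Points per 100 possessions."""
    return 100.0 * points / possessions

# Made-up season totals for one team.
points_for, points_against, possessions = 2200, 1950, 1900

off_eff = per_100(points_for, possessions)      # ~115.8 scored per 100
def_eff = per_100(points_against, possessions)  # ~102.6 allowed per 100

# Schedule adjustment: scale each raw number by how the team's
# opponents compare to the Division I average (~105 pts/100 poss).
D1_AVG = 105.0
opp_avg_def = 102.0  # opponents allow fewer points than average: tough slate
opp_avg_off = 108.0  # opponents score more than average: tough slate

adj_off = off_eff * D1_AVG / opp_avg_def  # credit for facing good defenses
adj_def = def_eff * D1_AVG / opp_avg_off  # credit for facing good offenses

# The single number most predictive metrics sort teams by:
print(f"Adjusted efficiency margin: {adj_off - adj_def:+.1f} per 100 possessions")
```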

These predictive metrics are the main component of the NET, the official 1-to-362 ranking system that the NCAA uses to sort its teams into some kind of order. I’m a strong defender of the NET, as its rankings generally seem to get a bad reputation among college basketball fans, with most complaints voiced to the tune of “How can Team A be ranked higher than Team B when Team B beat Team A head-to-head and has more Quad 1 wins?” and yada yada yada. You get it.

Thing is, the NET is not supposed to be the almighty tier list of absolutes that many picture it as. In fact, the selection committee barely considers each team’s pure NET ranking number when selecting and seeding teams for the tournament, if at all.

The NET, like the predictive metrics that influence it, mostly sorts teams based on the quality/efficiency of their prior performances (with strength of schedule adjustments and all that), thus predicting how these teams will perform going forward. This is then used to create the quadrant system, a collection of four different buckets (Quads 1 through 4) that help the selection committee to group and visualize the significance of each team’s wins and losses. (This is a key thing to remember: The quadrants don’t create the NET, the NET creates the quadrants.)
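For the curious, here’s what that bucketing looks like mechanically. The cutoffs below are the NCAA’s published quadrant boundaries, to the best of my knowledge; the little helper function is just mine for illustration.

```python
# Each game is assigned a quadrant from the opponent's NET ranking and
# the venue. Each tuple gives the worst NET ranking that still counts
# as Quad 1, 2, and 3 for that venue; everything beyond is Quad 4.
QUAD_CUTOFFS = {
    "home":    (30, 75, 160),
    "neutral": (50, 100, 200),
    "away":    (75, 135, 240),
}

def quadrant(opponent_net: int, venue: str) -> int:
    q1, q2, q3 = QUAD_CUTOFFS[venue]
    if opponent_net <= q1:
        return 1
    if opponent_net <= q2:
        return 2
    if opponent_net <= q3:
        return 3
    return 4

# The same opponent can produce different quadrants depending on venue:
print(quadrant(60, "away"))  # 1: a road win over NET No. 60 is Quad 1
print(quadrant(60, "home"))  # 2: beating them at home is only Quad 2
```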

This is, in my estimation, a pretty good way to do it. It gives you a general idea of which teams you should be beating and which ones you shouldn’t, and, thus, how much value each of those particular results holds. It’s far better than the previously used RPI, which relied purely on winning percentage rather than on how much or how little you’re winning your games by, often allowing teams to “game” the RPI in their scheduling.

No, the NET is not perfect. But I think it accomplishes what it sets out to do pretty well. And when you do get those weird quirks, like Colgate being a NET top-20 team during the COVID year, the committee is able to use logical judgment to sort them out; its members are humans, not some computer algorithm.

As much as people like to whine about the NET, its rankings fall mostly in line with what other predictive measures say. These predictive metrics include those officially on the selection committee’s team sheets, like KenPom, and those that are not officially monitored by the selection committee, like Haslametrics and T-Rank.

Twitter user Andrew Weatherman recently did some fantastic analysis comparing each metric’s Jan. 14 rankings to the NET. In the case of the three metrics listed in the paragraph above, each one correlated with the NET to a degree of right around 90%, with only a handful of teams here and there jumping off the page as severely undervalued or overvalued (i.e., a 50-spot difference or greater) by a certain metric in comparison to the NET.

And then, there’s BPI.

While KenPom, T-Rank, and Haslametrics all correspond to the NET about 90% of the way, with only one to three teams appearing as outliers in each case, BPI is in its own stratosphere, correlating just 86% of the way with the NET and listing nineteen different teams as significantly over- or undervalued.
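To show what I mean by “correlating” and “outliers,” here’s a sketch of the kind of comparison I believe Weatherman ran. I’m assuming his agreement figure is a Spearman-style rank correlation, which may not be exactly his method, and the five-team demo rankings are made up; the real exercise runs over all 362 teams with the 50-spot outlier threshold.

```python
# Spearman rank correlation between a metric's rankings and the NET's,
# plus a pass that flags teams the two systems rank far apart.

def spearman(metric: dict[str, int], net: dict[str, int]) -> float:
    """Rank correlation between two full 1..n rankings of the same teams."""
    n = len(net)
    d_sq = sum((metric[t] - net[t]) ** 2 for t in net)
    return 1 - (6 * d_sq) / (n * (n * n - 1))

def outliers(metric: dict[str, int], net: dict[str, int], gap: int = 50):
    """Teams ranked `gap` or more spots apart by the two systems."""
    return [(t, net[t], metric[t]) for t in net if abs(metric[t] - net[t]) >= gap]

# Toy five-team rankings (each a permutation of 1..5):
net_rk = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
bpi_rk = {"A": 1, "B": 3, "C": 2, "D": 5, "E": 4}

print(f"agreement: {spearman(bpi_rk, net_rk):.2f}")  # 0.80
print(outliers(bpi_rk, net_rk, gap=2))               # [] at this toy scale
```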

Remember, BPI, the deviant here, is an official team-sheet metric. So is KenPom, which helps to outweigh some of the zaniness going on, but neither T-Rank nor Haslametrics is team-sheet official.

So, what’s going on?

Well, take a closer look at the logos of the teams on the BPI chart that are considered “significantly undervalued.” Air Force. Montana State. Wyoming. Northern Colorado. Utah Valley. Denver. Montana. Notice anything?

It’s not just some coincidence that all these teams in the Mountain Time Zone are faring much worse in the eyes of the BPI than in the NET and other metrics. Remember Eamonn Brennan’s postulation from earlier about how weird it is that all these Mountain West teams are so often ranked 20 to 30 spots lower in BPI than in KenPom? Something fishy is happening here.

Well, believe it or not, the secret at play is not so secret. This blurb comes right from the BPI rankings page on ESPN:

[Screenshot: the methodology blurb from ESPN’s BPI rankings page, listing the factors BPI accounts for, including altitude]

There’s one little word in that whole big paragraph that explains it all: altitude.

By BPI’s own admission, the number of feet above sea level at which you play your home games is a factor in determining the quality of your basketball team.

That sounds pretty ridiculous, but I do at least understand the sentiment. The air is thinner at higher altitudes, so perhaps a visiting team may struggle to adjust to that environment while playing the highly aerobic game of basketball, compared to a team that routinely plays its home games at those heights. So, maybe there’s some good intention there after all.

But the punishment that these high-altitude teams are receiving is far too harsh. Fellow bracketologist Made For March has done some research on this himself. Using this data from Jan. 9, take a look at the teams that play at the highest altitudes and the monstrous gaps we’re seeing between their KenPom and BPI rankings:

I can understand a little punishment for the teams from these lower-oxygen environments. But a BPI ranking 40% worse than the KenPom one? Or, in some cases, more than double it? That’s ludicrous. Should altitude really be this big a deal?

I feel like I can provide a little expertise on the matter myself as someone who lived the first 22 years of his life in the low-altitude environment of Pennsylvania before spending the past two and a half years in my current home of Butte, Montana, which towers over a mile in the sky (even higher up than Denver).

I’m not going to try to override what any physician would say, but I will offer my own experience: It took me no time at all to adjust to the higher altitude. I hardly even noticed it. And even when I did strenuous exercise for the first time in Butte, I didn’t notice anything abnormal about my breathing. And, hey, Division I basketball players are far better conditioned than I am. Why should they have more trouble adjusting to the oxygen than me?

Again, I can understand a small tax for teams who regularly play their games in high altitudes compared to the teams that don’t play there all the time. But docking them to this degree, when BPI is one of the official factors that the selection committee uses to determine which teams get to play in the big dance and which teams get to sit home on the couch, is just unfair.

And it could even go beyond just the altitude factor. We may be drifting into tin-foil hat territory here, but stick with me for a second.

BPI is a metric created and published by ESPN that penalizes teams that play at higher altitudes, residing in conferences such as the Mountain West and Pac-12. In comparison, teams that play at lower altitudes don’t receive such a punishment, as would be the case for those in the ACC and SEC.

Well, wouldn’t you know it? The ACC and SEC both have network contracts with ESPN. Meanwhile, the Mountain West and Pac-12 do not, and the latter is seeing a breakup instigated in large part by ESPN after network contract negotiations fell through. Coincidence? …Maybe. But it’s perhaps worth stroking your chin and saying “hmm” at.

(For a little more concrete evidence that something may not smell right in this regard, check out this piece from Marquette writer Alan Bykowski back in 2022 that detailed the differences in BPI between teams on ESPN media contracts and teams on Fox media contracts: http://www.crackedsidewalks.com/2022/02/bpi-should-be-removed-from-team-sheets.html?m=1)

Regardless, any metric that regularly deviates this much from the norm is not to be taken seriously. And it’s even worse when you consider that the Sagarin ratings, one of three team-sheet-official predictive metrics alongside BPI and KenPom, have suddenly gone radio silent in 2024, effectively increasing the BPI’s share of the team sheet’s predictive picture from a third to a half.

Fortunately, that previous ratio didn’t seem to hamper tournament-level teams too badly, with 2022 Colorado State pulling out a respectable 6-seed in spite of an 80th-ranked BPI. But who’s to say that the same will happen in 2024 and onward, when Sagarin is no longer present and you can’t simply point to the BPI number as an outlier anymore?

This is why we need change, and we need it fast. A multi-billion-dollar corporation with a vested interest in certain conferences and teams should not have this much say in determining who gets to dance in March. We need to return the power to the people, i.e., the genius minds who actually sit down and watch the games and crank out these effective formulas: the Ken Pomeroys, the Erik Haslams, the Bart Torviks, the Evan Miyakawas.

Here’s what I’m proposing: Remove the BPI from team sheets completely. In its place should be a different predictive metric of the same ilk as KenPom (one created and monitored by an individual who cares about the sport) that does not consistently produce the same wild deviations for largely the same group of schools. I’m partial to T-Rank, but any of the ones I’ve mentioned (Haslametrics, EvanMiya) would do the trick. Heck, with Sagarin gone, I would even put two of them on the team sheets. The more data, the better; with three data points again, we can more easily identify when and where the outliers exist.

And if we go back to three predictive metrics, I would recommend going to three performance metrics as well for evenness’ sake.

To touch on this point briefly: performance metrics (also called “résumé” or “results” metrics) measure how well, in terms of wins and losses, your team has performed against its schedule, with the adjustment here being for the schedule’s difficulty. You could call these metrics a refined version of the RPI, one that works much better as a piece of the puzzle rather than the whole puzzle itself. The numbers they produce tend to be a great indicator of which teams make the tournament cut, and, because they only take the results of your schedule into account, they provide the ideal counterbalance to the predictive metrics, which focus less on results and more on the quality of those performances.

The selection committee’s team sheets currently display the numbers for two performance metrics: the Kevin Pauga Index (KPI) and ESPN’s Strength of Record (SOR). And, for the most part, these numbers agree with each other much more than BPI and KenPom do, as is evident with the Mountain West teams that spurred this whole conversation; at the time of writing, Colorado State ranks 22nd and 25th, San Diego State 13th and 11th, and Utah State 16th and 14th, in KPI and SOR, respectively.

But you will still see some deviations from time to time. For example, this year’s Iowa State team has put up elite performances against inferior opponents, landing the Cyclones top-of-the-line predictive numbers in just about every measure (15th in KenPom, 10th in T-Rank, 9th in Haslametrics, etc.), but has yet to pull off very many big, meaty wins that really count for their résumé, placing them anywhere from 42nd (SOR) to 72nd (KPI) in the performance metrics. Sure, underwhelming in both instances, but that’s still a pretty big gap between those two numbers.

This is where adding a third performance metric can help determine if either one is an outlier. And we’ve got just the thing.

Just like ESPN has both a performance metric (SOR) and a predictive metric (BPI), Bart Torvik has both. While T-Rank handles the quality side of things, Wins Above Bubble (WAB, for short) handles the results. It’s a really nifty tool that sets its zero point right around where the tournament’s at-large cut line would be (about 50 teams deep), and it’s here that you see Iowa State ranks 50th in WAB, so we can safely deem their 72nd-place KPI ranking the outlier in this scenario.
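In case the mechanics aren’t clear, here’s a back-of-the-envelope version of the WAB idea. The win probabilities below are invented placeholders, not Torvik’s actual bubble-team model; the point is just the arithmetic: expected wins for a bubble-quality team against your schedule, subtracted from your actual wins.

```python
# Wins Above Bubble, sketched: sum a hypothetical bubble team's win
# probability over your schedule, then compare to your actual wins.

# (won_game, bubble_team_win_probability) for a made-up five-game slate.
schedule = [
    (True, 0.95),   # easy home game a bubble team almost always wins
    (True, 0.60),
    (False, 0.35),  # tough road game; losing it costs little
    (True, 0.20),   # a genuine upset: big WAB credit
    (False, 0.85),  # a bad loss: big WAB penalty
]

actual_wins = sum(1 for won, _ in schedule if won)
expected_bubble_wins = sum(p for _, p in schedule)

print(f"WAB: {actual_wins - expected_bubble_wins:+.2f}")
# +0.05: this résumé sits right at the at-large cut line
```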

This plan, I think, satisfies everything. Do away with the constantly-off BPI, replace it with other predictive measures like T-Rank and Haslametrics that consistently produce fewer outliers, then even things out on the performance side by adding WAB to the team sheet as well. Now, we have six total data points, three on each side of the aisle, to paint the clearest possible picture for evaluating these teams’ profiles. Everyone goes home happy.

(Well, everyone except ESPN, whose crappy metric gets kicked to the curb. But, hey, you’ve still got SOR, and that one actually does its job pretty well. So what do you have to complain about?)

3 thoughts on “It’s Time to Abolish the BPI”

  1. Do we have any idea what happened to Sagarin? If it’s on the team sheet, you’d expect the committee to provide the resources to ensure it can be updated.

    1. The Sagarin numbers are updated by Jeff Sagarin himself. He did the same Sagarin ratings, updated daily, for the college football season we just had, so I have no idea why he suddenly stopped on the college basketball side.
