MLB Commissioner Rob Manfred said on Wednesday that “we’re going to play baseball in 2020.” He claimed he was saying so with “100 percent” certainty. And, given the March agreement between the league and the MLBPA, he’s right that a baseball season can be played.
In the worst-case scenario — the one in which MLB and the MLBPA do not come to terms on an economic agreement to start baseball — Manfred has the ability to unilaterally institute a season at prorated salaries. This would certainly result in a short, sprint-like season, since ownership seems unwilling to play even a half-season at prorated dollars. Reports have indicated that this scenario would put the 2020 season somewhere in the realm of 50 games, with some saying that 48 would be the magic number.
The response to this has been underwhelming. Baseball fans in the replies on Twitter seem unenthused about the idea of a 48-game season — would it even be worth it to play at that point?
That got me thinking: what exactly defines a baseball season as legitimate? The long answer probably has something to do with psychology, the definition of the word legitimacy and the ability (or lack thereof) of humans to accept a part of a whole as equal to the whole.
Indeed, some crazy things can happen in such a small sample. That’s the whole point behind the “Mike Trout test” — the one which states (tongue-in-cheek, of course) that data from a baseball season cannot be considered legitimate until Trout is leading in WAR. An entire season of small sample data is sure to provide plenty of annoyance to those (like myself) who love analyzing this information.
But let’s think about this on a macro level. Even if some player hits .500 in the 2020 season, do we really care? Of course this statistic wouldn’t be considered “legitimate” in the eyes of baseball analysts, but what does it matter? What I’m saying is that while there may be certain individual pieces of data that skew the 2020 season, we need to look at the big picture: wins, losses and which teams get to play in the postseason.
That is all sports really come down to anyway — the playoffs and the champion. So, using this as my benchmark for “legitimacy” in a 2020 MLB season, we can begin to devise a pretty simple way to measure what number of games makes the season worth playing, and how many potential playoff teams would be necessary to remedy this increased variability.
I pulled five years’ worth of data, spanning the 2015-19 seasons, and correlated teams’ winning percentages at 20-game intervals with their overall season-long winning percentage.
For example, the 2019 Diamondbacks were 10-10 after 20 games, good for a .500 winning percentage. I plotted this data point alongside the 2019 Diamondbacks’ final record of 85-77, or a .525 winning percentage. I did this for every team in the last five years, at intervals of 20, 40, 60, 80, 100, 120, 140 and 160 games. Here is the correlation between a team’s record after 20 games and its full-season record:
Then, to understand how the number of games played relates to the reduction of variability in teams’ records, I plotted the r-squared values for each of these correlations in a second chart, which shows how much of the variation in full-season winning percentage is explained at each game total. Simply put, I looked at how games influenced legitimacy, defined as the r-squared between winning percentage after X games and overall winning percentage:
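As a rough illustration of the calculation described above, here is a minimal Python sketch. It assumes each team-season is available as a list of 1s (wins) and 0s (losses) in game order; that input format and the function names `win_pct_after`, `r_squared` and `legitimacy_by_checkpoint` are hypothetical and chosen only for this example, not taken from the original analysis.

```python
def win_pct_after(results, n_games):
    """Winning percentage over the first n_games of one team-season.

    results is a sequence of 1 (win) / 0 (loss), one entry per game.
    """
    played = results[:n_games]
    return sum(played) / len(played)


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences.

    Raises ZeroDivisionError if either sequence has no variation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


def legitimacy_by_checkpoint(seasons, checkpoints):
    """Map each checkpoint to the r-squared between partial and final win pct.

    seasons is a list of per-team game logs (assumed input format).
    """
    finals = [win_pct_after(s, len(s)) for s in seasons]
    return {
        n: r_squared([win_pct_after(s, n) for s in seasons], finals)
        for n in checkpoints
    }


# Toy, made-up four-game "seasons", only to show the mechanics.
toy_seasons = [[1, 1, 1, 1], [1, 0, 1, 0], [0, 0, 0, 1]]
toy_result = legitimacy_by_checkpoint(toy_seasons, [2, 4])
```

With real game logs, the checkpoints would be 20, 40, and so on up to 160. The final checkpoint in any data set trivially gives an r-squared of 1, because the partial record and the full record are the same thing.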
What we see here is a logarithmic relationship, and that makes sense. For each additional 20 games, we shouldn’t expect to see an equal reduction in variability. To understand this, think about how little a team’s winning percentage can change between 100 and 120 games. Even a team that is 100-0 after 100 games, good for a 1.000 winning percentage, can only be as bad as 100-20 (.833) after 120. By contrast, a team that is 20-0 (1.000) after 20 games could be as bad as 20-20 (.500) after 40.
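The arithmetic behind that point can be checked with a few lines of Python. This is only a sketch; the helper name `win_pct_range` is made up for this example.

```python
def win_pct_range(wins, losses, games_remaining):
    """Return the (lowest, highest) winning percentage a team can reach
    after playing games_remaining more games."""
    total = wins + losses + games_remaining
    lowest = wins / total                        # loses every remaining game
    highest = (wins + games_remaining) / total   # wins every remaining game
    return lowest, highest


# The two cases from the paragraph above.
late_low, late_high = win_pct_range(100, 0, 20)   # 100-0, then 20 more games
early_low, early_high = win_pct_range(20, 0, 20)  # 20-0, then 20 more games
```

The later a checkpoint falls, the narrower the range of possible outcomes gets, which is why each extra block of 20 games adds less explanatory power than the one before it.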
There are some pretty powerful insights that can be gained here. First, we see that after only 40 games, half of all team win percentage variability can already be explained. After 60, we’re nearing 72%, and after 80 games, we’re over 78%. The breakdown in tabular form is here:
Of course, there’s still the potential for significant outliers, and that’s where an expanded postseason may help. The Nationals were 19-31 after 50 games last year, and, conversely, the Giants had a stretch where they went 31-19. Respectively, these two teams went on to win the World Series and to get the 13th pick in the MLB Draft. While the trends do suggest legitimacy on an overall basis, they cannot control for individual team-by-team performance that may leave certain fanbases bitter. Variability still exists. The Nationals were clearly not a true-talent .380 team last year, as evidenced by the fact that they finished with both 93 wins and a parade.
The Nationals were a historic outlier, but they are a good case for why the 2020 season will never be legitimate, no matter how many games are played. On a broader scale, though, of the 10 teams playing .540 ball or better at the 50-game mark, seven ultimately ended up in the postseason. Good teams are good teams, and they can usually prove it early.
That’s why an expanded postseason probably makes sense, and this is what will make the 2020 season legitimate. We all know that the postseason is a “gauntlet of randomness,” as Billy Beane has been quoted as saying. So, what really makes the baseball season “legitimate” is getting the correct teams in the playoffs. After that, it’s hands off, and who knows what will happen? That’s just the beauty of playoff baseball.
It is somewhat surprising, and comforting, to me at least that even in a season that lands somewhere between 60 and 80 games, we should be able to get the correct teams in the playoffs. While Cody Bellinger might actually get a crack at hitting .400, that will just be part of the fun of 2020 baseball. What really matters should ultimately remain pretty legit.