In the world of baseball fandom, the bullpen is a stressor. Though some may have more confidence in their relief arms than others, there’s always a little voice in the back of one’s head going, “God, his only job is to not blow this lead. I hope we score more runs.” And since humans are imperfect, leads are blown. In 2019, there were 687 blown saves. It happens. But, in 2020, those blown saves are each 2.7 times more important — through just four days of the season, it’s pretty evident that the middle reliever-induced stress has been heightened substantially.
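The 2.7 figure is the ratio of season lengths. This sketch assumes a game's importance scales with its share of the schedule, which is a simplification:

```python
# Each game's share of the schedule in a normal season vs. the 60-game 2020 season
FULL_SEASON_GAMES = 162
SHORT_SEASON_GAMES = 60

weight_full = 1 / FULL_SEASON_GAMES
weight_short = 1 / SHORT_SEASON_GAMES

# How much larger a share of the season one game (and so one blown save) represents
scale = weight_short / weight_full
print(f"{scale:.1f}")  # 2.7
```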
That raises some interesting questions. Does the bullpen become more important this season? If so, does it impact every team equally? How can we attempt to understand the impact of the bullpen in a 60-game season? Will teams with better bullpens be at a more significant advantage than normal?
The problem with many of these questions is that the answer can be as simple as, “I don’t know.” There’s no way to really understand how the bullpen is going to be used this year. In the history of baseball, there has never been a 60-game sprint like this, so how managers will use their bullpens this year remains a black box. With more information as the season progresses, we may be able to better gauge the bullpen’s impact. For now, we can only guess.
But I was curious anyway. The most direct proxy I could find for what’s happening in 2020 was the last two months of a normal season. If managers are going to push their bullpens, that’s when they’ll do it. If they want to win, that’s when they’ll bring in their elite arms, and as often as possible.
There are caveats here, and they’re pretty important to understand.
First, there’s wonkiness from 2020 being 2020. Starting pitchers aren’t as stretched out at the beginning of the season, which will require extra innings from relievers early. Rosters are also larger, and their size changes: teams carry 30 players right now, 28 two weeks in, and then 26 two weeks after that. We can’t control for these variables using data from prior years.
Second, in the seasons before 2020, a ton of teams were out of the postseason race by August 1, the start of the window I studied. If, for example, the Padres, a totally capable postseason contender, had a really cold stretch that made the front office think, “Maybe 2021 is our year,” then they probably wouldn’t push their bullpen from August 1 on. That’s where we may see more auditions, young arms, and bullpen pieces with even more variance than we’re used to.
Third, roster expansion is an issue. Though this rule has since been changed, teams were able to carry up to 40 players from September 1 through the end of the season. Because this probably didn’t affect every team equally (rebuilding teams are much more likely to take advantage than contending teams), there is the potential for confounds here. The 2019 Astros beat the Athletics 15-0 on September 9. Bryan Abreu and Cionel Pérez, who pitched in a combined 12 games for the team last year, both appeared in relief. That outing potentially saved Houston’s better bullpen arms, allowing them to be more effective when they were needed later on.
In an attempt to reduce some of this known additional variability, I added one more condition: contention, defined as being within three games of a postseason spot at the end of play on July 31. I figured this would eliminate some of the extremes and better emulate what we will see this season, and indeed it did. From 2015 to 2019, the standard deviation of full-season bullpen runs allowed per nine innings (RA9) was 0.62 runs. Over the last two months, the standard deviation was 0.77 runs. But over the last two months among contending teams only, it was 0.60 runs. And, as you can see in the boxplot below, the filter removes most of the high-RA9 outliers, which disproportionately came from teams that weren’t in the race at the start of the final sprint.
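The contention filter and the spread comparison can be sketched as follows. This uses the standard games-back formula against a single "last postseason spot" record. That cutoff is a simplification, since real races involve divisions and wild cards. The team records and RA9 values are hypothetical and only for illustration. Sample standard deviation is also an assumption, since the article doesn't say which was used.

```python
from statistics import stdev


def games_back(team_w: int, team_l: int, cutoff_w: int, cutoff_l: int) -> float:
    """Games behind a reference record, using the standard standings formula.

    Negative values mean the team is ahead of the reference record.
    """
    return ((cutoff_w - team_w) + (team_l - cutoff_l)) / 2


# Hypothetical July 31 records and Aug. 1-onward bullpen RA9 (illustrative only)
teams = [
    # (team, wins, losses, bullpen RA9 over the final two months)
    ("A", 62, 45, 3.90),
    ("B", 58, 49, 4.40),
    ("C", 55, 52, 4.75),
    ("D", 48, 59, 5.80),
    ("E", 41, 66, 6.90),
]

# Hypothetical record of the team holding the last postseason spot on July 31
CUTOFF_W, CUTOFF_L = 58, 49
MAX_GAMES_BACK = 3  # "within three games of a postseason spot"

contenders = [
    t for t in teams
    if games_back(t[1], t[2], CUTOFF_W, CUTOFF_L) <= MAX_GAMES_BACK
]

# Sample standard deviation of bullpen RA9: all teams vs. contenders only
sd_all = stdev([t[3] for t in teams])
sd_contenders = stdev([t[3] for t in contenders])
print([t[0] for t in contenders])  # ['A', 'B', 'C']
print(f"all teams: {sd_all:.2f}, contenders: {sd_contenders:.2f}")
```

Teams at or ahead of the cutoff get zero or negative games back, so they pass the filter automatically.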
Next, I wanted to consider a few different explanatory and response variables. For this process, I focused on one thing: results. While FIP is a great measure of pitcher value, it doesn’t work as well here, since I care only about the runs that actually scored. Simply put, I don’t care if a reliever has a poor strikeout-to-walk ratio; if he didn’t allow any runs, that’s what matters. Thus, I used RA9. I also looked at win probability added (WPA) as a potential input, since it accounts for context. If a team is up 8-2 with three innings to go and the bullpen allows one run in each of the seventh, eighth and ninth innings, it doesn’t matter in the win-loss column, despite the bullpen posting an RA9 of 9.00. That’s why I think WPA is probably the best input variable for a study like this.
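Here is the RA9 arithmetic from that 8-2 example as a quick sketch. The function name `ra9` is illustrative and not taken from any particular library:

```python
def ra9(runs_allowed: int, innings_pitched: float) -> float:
    """Runs allowed per nine innings.

    innings_pitched must be a true decimal (6 1/3 innings = 6.333...),
    not box-score notation like 6.1.
    """
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * runs_allowed / innings_pitched


# Up 8-2 with three innings left; the bullpen allows one run in each inning
bullpen_runs, bullpen_innings = 3, 3
team_runs, opponent_runs_before_bullpen = 8, 2

print(ra9(bullpen_runs, bullpen_innings))  # 9.0
final_opponent = opponent_runs_before_bullpen + bullpen_runs
print(team_runs > final_opponent)  # True: still a win
```

RA9 punishes this outing heavily even though the result never changed. WPA, which measures the shift in win probability, would register only a small loss.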
For response variables, I considered two, both built on team winning percentage. The first was straight winning percentage. Did better bullpens, by either RA9 or WPA, tend to correlate with more wins? You would think so. The second was the difference between a team’s winning percentage and its Pythagorean winning percentage (expected winning percentage based on run differential). Do better bullpens tend to lead to teams outperforming their expectation? You would think so here, too, since success in one-run games tends to come from a combination of luck and a successful bullpen.
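The outperformance measure can be sketched as actual winning percentage minus Pythagorean expectation. The exponent of 2 is Bill James's original formula and is an assumption here, since the article doesn't say which variant was used. Exponents such as 1.83 are also common. The team in the usage line is hypothetical.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Expected winning percentage from runs scored and allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def outperformance(wins: int, losses: int, runs_scored: float,
                   runs_allowed: float, exponent: float = 2.0) -> float:
    """Actual winning percentage minus Pythagorean expectation."""
    actual = wins / (wins + losses)
    return actual - pythagorean_win_pct(runs_scored, runs_allowed, exponent)


# Hypothetical team: 90-72 with 750 runs scored and 700 allowed
print(round(outperformance(90, 72, 750, 700), 3))  # 0.021
```

A positive value means the team won more than its run differential predicts, which is where a lockdown bullpen in close games would be expected to show up.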
I considered all of these measures in all three situations: full-season bullpens, last-two-month bullpens, and last-two-month bullpens of contending teams. (For all of the last-two-month data, win percentage is calculated from August 1 through the end of the season, matching the window for the bullpen results.) Here are the results:
There are some interesting takeaways here. First, it’s important to note that none of these relationships are particularly strong, which makes sense: a lot more than the bullpen goes into winning ballgames. But if bullpen performance alone can explain 35% of the variability in win percentage over the course of a full season, it’s still pretty important. Second, the full-season and last-two-month results appear to be pretty close across the board.
Where things get interesting is when we focus on contending teams. The correlations between RA9 and win percentage and between WPA and win percentage are much weaker than in the full-season and last-two-month data. One possible explanation is that there is less variability in the winning percentages of contending teams, but that does not appear to be the case. In fact, there is a touch more variability in contending teams’ final-two-month records (standard deviation of .084) than in full-season records overall (.078). As mentioned, there is a touch less variability in bullpen RA9, but I doubt that alone would produce such a large drop in correlation. Sample size could be an issue, but the drop looks fairly substantial. I remain perplexed by this.
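One factor that may be worth weighing is that a two-month record carries more chance variation than a full-season record. This is offered as a possibility, not a finding. Extra noise in a response variable tends to pull correlations toward zero. The sketch below shows how much spread luck alone adds at each length. It assumes each game is an independent trial for a true-.500 team, which is a simplification. It also uses 60 games as a stand-in for the August-September stretch, which is a rough assumption.

```python
import math


def chance_sd_of_win_pct(games: int, true_win_pct: float = 0.5) -> float:
    """Standard deviation of observed winning percentage due to chance alone.

    Treats each game as an independent trial won with probability
    true_win_pct (a simplifying assumption), i.e. a scaled binomial.
    """
    return math.sqrt(true_win_pct * (1 - true_win_pct) / games)


full_season = chance_sd_of_win_pct(162)
short_window = chance_sd_of_win_pct(60)  # rough stand-in for Aug. 1 onward
print(f"162 games: {full_season:.3f}")  # 0.039
print(f"60 games:  {short_window:.3f}")  # 0.065
```

By this rough measure, chance alone would account for a larger share of the .084 spread in two-month records than of the .078 spread in full-season records.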
What didn’t surprise me as much was the relationship between WPA and outperformance. Based on this data, bullpen WPA appears to be 52% more strongly tied to outperformance among contending teams over the last two months than among all teams over the full season. In a short season, a bullpen that can close out games, irrespective of score or runs allowed, is key, and that shows up here.
Still, the data remain conflicting and potentially unclear. My hunch that the bullpen will be more important this season isn’t necessarily debunked, though I can’t say with certainty that it’s been confirmed, either. What I do feel confident saying is that while not every team will live and die by its bullpen, having a solid group of arms that can close out games without worry will be even more key this season. If each blown save is 2.7 times more important, you’d better hope they’re at least 2.7 times as rare.
— Devan Fink