Statcast introduces hit probability for 2017

If the 2016 title-winning Cubs taught us anything, it's that untangling pitching skill from defensive talent is more difficult than ever. Did Chicago have a best-in-baseball 3.15 ERA because starters like Jacob Arrieta, Jonathan Lester and Kyle Hendricks were so talented and avoided hard contact? Or because a defense led by Jason Heyward, Addison Russell and Javier Baez was historically good and simply converted those balls into outs better than anyone?
The answer, then and now, was likely "both." We know that defense functions as a unit that includes both pitcher and fielders, and we know that in order to put up the lowest Batting Average on Balls in Play in more than three decades, as the Cubs did with their .255 mark, both pitchers and fielders have to be doing something right. But how much of each?
Today, let's take a step toward answering that question by introducing one of our new Statcast™ metrics for 2017: Hit Probability. (See also: Catch Probability.) While there's a lot of complicated math that goes into it, it attempts to answer a very simple question: Based on the exit velocity and launch angle of the batted ball, how likely was the ball to land for a hit? That gets to the heart of what a pitcher and hitter control while attempting to take out the effects of defense and ballpark.
It can be expressed as a percentage, which adds instant context without even needing to know all the underlying data that goes into it. For example, think back to the National League Wild Card Game last year, when Brandon Belt crushed a 106-mph rocket off Noah Syndergaard a projected 408 feet in the sixth inning of a scoreless game. That ball had a Hit Probability of 95 percent, which is to say that ball lands for a hit nearly every time.

Now, this particular one didn't drop, because Curtis Granderson made a fantastic play to reel it in, and so Belt had an 0-for-1 in the box score. But that's the point, really, isn't it? Whether or not the fielder had the speed, skill, positioning or luck to make the catch had nothing to do with Belt. Setting aside foot speed for infield hits, his impact on whether the ball was a hit or an out ended as soon as the bat made contact. If the ball had dropped for a triple because Granderson took a bad route or if Jay Bruce had been there and couldn't get to it like Granderson did, well, that doesn't change the skill Belt showed, does it?
You can do this for any tracked play. For example, Addison Russell's double in World Series Game 6 had only a 15 percent Hit Probability, as it was hit at only 78 mph, but Tyler Naquin and Lonnie Chisenhall miscommunicated about who would get it, so it dropped. Because neither fielder touched it, it counted as a hit, not an error -- despite the ball being an out 85 percent of the time.

What this really gets to is what we introduced with Barrels, the combinations of exit velocity and launch angle that have a Hit Probability of at least 50 percent. The average Barrel has a Hit Probability of 82 percent, but we look at all possible speed/angle pairings, from the easy popups near zero percent to the sure-thing homers at 100 percent, and everything in between.
But why stop with individual plays? Since we now know not only the expected outcome of each ball in play but also the run value of each batted-ball type, it's a pretty easy jump to season-long leaderboards: simply take the expected outcome of each plate appearance and output an expected OPS. (This can also be expressed as wOBA, but we'll keep it simple for now.) We'll include a pitcher's demonstrated strikeout and walk rates as well, so it's not only about batted-ball contact.
For this example, we looked at 167 pitchers who faced at least 100 hitters during the 2016 season and created leaderboards based on the expected quality of contact. That is, we're trying to answer the question: "Which pitchers had the best combination of strikeouts, walks and limiting dangerous contact?" Let's give you the leaders first, then follow up with some more detail.
2016 starters with lowest expected OPS (minimum 300 plate appearances)
.517 -- Clayton Kershaw
.580 -- Rich Hill
.592 -- Syndergaard
.595 -- Hendricks
.598 -- Yu Darvish
.604 -- Max Scherzer
.612 -- Jose Fernandez
.619 -- Stephen Strasburg
.628 -- Tyler Anderson
.630 -- Lester
That Kershaw is atop this list is no surprise. What's shocking is the margin by which he tops his Dodgers teammate, Hill. While the gaps between most of the top 10 are a few points of OPS, Kershaw's lead is an amazing 63 points. The takeaway here is that not only does Kershaw miss bats at an elite level, but when hitters manage to make contact, it's not good contact. Throw in excellent control, and it's the perfect recipe for baseball's best pitcher.
Colorado's Anderson gained notice for limiting exit velocity as a rookie, but what's interesting here are the two Cubs, Hendricks and Lester. Remember, this is completely independent of defense, and we're still seeing Hendricks as the fourth-toughest starter to face based on contact rate and contact quality, and Lester as the 10th. While Hendricks' real-life OPS against was .581, his estimated defense-free mark was .595, so it's fair to say he earned just about all of that success. Meanwhile, Arrieta's real-life OPS against was .583 while his estimated mark was .656, a much larger gap that suggests a good deal of his success could be attributed to Chicago's defense.
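For anyone who wants to check the arithmetic on those gaps, here is a tiny Python sketch using only the figures quoted above. OPS is stored in points (thousandths) so the subtraction stays exact; the function name ops_gap_points is ours, purely for illustration.

```python
def ops_gap_points(actual_ops_points, expected_ops_points):
    """Actual minus expected OPS against, in points (thousandths of OPS).

    A negative value means the pitcher's real-life results were better
    than his contact quality alone would suggest.
    """
    return actual_ops_points - expected_ops_points


# Figures quoted in the article: (real-life OPS against, expected OPS)
pitchers = {
    "Hendricks": (581, 595),
    "Arrieta": (583, 656),
}

for name, (actual, expected) in pitchers.items():
    print(f"{name}: {ops_gap_points(actual, expected):+d} points")
```

Hendricks comes out 14 points better than expected, Arrieta 73 points better, which is the "much larger gap" described above.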

You probably have a few questions. Let's answer them.
What does this number include?
Thanks to the hard work of MLBAM's Tom Tango and Daren Willman, each tracked batted ball was assigned an expected Hit Probability based on exit velocity and launch angle, and those values accumulate into a player's seasonal average. The small number of batted balls that were not tracked by the Statcast™ radars were given an estimated value based on a system devised by Tango. Strikeouts, walks and hit-by-pitches were included to give a full number, not just one based on contact.
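To make that bookkeeping concrete, here is a minimal Python sketch of how per-ball Hit Probabilities could roll up with strikeouts, walks and hit-by-pitches into an expected OPS. This is not the actual system built by Tango and Willman. The names BattedBall, bases_if_hit and expected_ops are hypothetical, the per-ball "bases if hit" value is an assumption (we haven't described here how extra bases are credited), and sacrifices are ignored to keep things short.

```python
from dataclasses import dataclass


@dataclass
class BattedBall:
    hit_prob: float      # chance this ball lands for a hit, 0.0 to 1.0
    bases_if_hit: float  # ASSUMPTION: average total bases when it is a hit


def expected_ops(batted_balls, strikeouts, walks, hit_by_pitch):
    """Expected OPS (OBP + SLG) from contact quality plus K, BB and HBP.

    Simplified: every batted ball and strikeout counts as an at-bat,
    and sacrifice flies/bunts are not modeled.
    """
    at_bats = len(batted_balls) + strikeouts
    plate_appearances = at_bats + walks + hit_by_pitch
    if plate_appearances == 0:
        raise ValueError("no plate appearances")

    # Each ball contributes a fraction of a hit equal to its probability.
    expected_hits = sum(b.hit_prob for b in batted_balls)
    expected_bases = sum(b.hit_prob * b.bases_if_hit for b in batted_balls)

    obp = (expected_hits + walks + hit_by_pitch) / plate_appearances
    slg = expected_bases / at_bats if at_bats else 0.0
    return obp + slg


# Made-up sample: four batted balls, four strikeouts, one walk, one HBP.
sample = [
    BattedBall(0.95, 2.0),
    BattedBall(0.15, 1.0),
    BattedBall(0.00, 1.0),
    BattedBall(0.50, 1.0),
]
print(round(expected_ops(sample, strikeouts=4, walks=1, hit_by_pitch=1), 3))
```

The key idea is that a ball with a 95 percent Hit Probability counts as 0.95 of a hit whether or not a fielder ran it down, which is what strips the defense out of the final number.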
What is the percentage based on?
It's based on the Major League average for the combination of exit velocity and launch angle over the two seasons of Statcast™, and it applies a smoothing process to draw on larger samples. For example, for the individual pairing of "100 mph and 30 degrees," we looked at all balls within 4 mph and 4 degrees, with proportionately greater weight given to those balls closer to the 100/30 pairing.
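The smoothing idea can be sketched in a few lines of Python. Only the 4 mph / 4 degree window and the "closer balls count more" principle come from the description above; the specific weighting formula, 1 / (1 + distance), is our assumption for illustration, as is the function name smoothed_hit_probability. The real weights may well differ.

```python
def smoothed_hit_probability(balls, speed, angle, window=4.0):
    """Weighted share of hits among balls near a speed/angle pairing.

    balls: iterable of (exit_velocity_mph, launch_angle_deg, was_hit).
    Returns None when no ball falls inside the window.
    """
    weighted_hits = 0.0
    total_weight = 0.0
    for exit_velocity, launch_angle, was_hit in balls:
        dv = abs(exit_velocity - speed)
        da = abs(launch_angle - angle)
        if dv > window or da > window:
            continue  # outside the 4 mph / 4 degree neighborhood
        # ASSUMPTION: weight shrinks as the ball gets farther from the target.
        weight = 1.0 / (1.0 + dv + da)
        total_weight += weight
        if was_hit:
            weighted_hits += weight
    if total_weight == 0.0:
        return None
    return weighted_hits / total_weight


# Made-up sample: one hit right at 100/30, one out 4 mph away,
# and one out at 110 mph that falls outside the window.
sample = [(100.0, 30.0, True), (104.0, 30.0, False), (110.0, 30.0, False)]
print(smoothed_hit_probability(sample, speed=100.0, angle=30.0))
```

Borrowing from neighboring speed/angle pairings this way means a rarely seen combination isn't judged on a handful of balls alone.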