Losing Strokes Gained

The game of golf’s consistent, unwavering commitment to strokes gained raises the question of whether it could do better. Every unexpected poor performance by a top golfer is met with some version of the banal platitude, “You can’t predict golf!” I will concede that golf inherently carries a lot of variance. A bounce of even a single foot can have an outsized effect on scoring, and one additional roll can take a drive from a premium fairway lie to a lie where the ball all but disappears.

These realities do prove that sometimes you can’t predict golf, but instead of constantly ringing the same bell, why don’t the PGA TOUR and other golf data websites try to make golf more predictable? I’ve seen countless post-round interviews where a player coming off a poor strokes gained performance says, “I didn’t hit it that bad, things just didn’t go my way.” That makes me wonder: would ball flight data show players posting wildly different scores in rounds with similar ball-striking characteristics?

At the forefront of sports analytics is, of course, baseball. We’re a little over 20 years past the inception of Moneyball, the idea that common stats like batting average give teams a faulty understanding of where run value comes from. Basketball has seen a similar analytics renaissance, one that has driven three-point shooting rates up rapidly over time. Although football lags behind, several NFL teams have also recently taken advantage of analytics, placing greater emphasis on pre-snap motion and passing volume.

And then we have golf, which has “data analysts” parading around on X and broadcasts alike with logic like this:

This is the equivalent of a baseball statistics account posting a graphic of the best ERAs in the rain, a post that would be met with amateur sabermetricians raising their proverbial pitchforks at the statistical malpractice. Similar are the “models” posted by betting touts on X, who assign percentage weights to different stat categories and then bet according to their unproven logic.

The problem in golf discourse isn’t the betting touts trying to make a buck off a large following, but rather the PGA TOUR’s closed-door policy toward ShotLink data. After exchanging emails with the TOUR, I learned that only two public-facing websites have access to its full dataset: Fantasy National and DataGolf. Fantasy National is the main culprit behind the create-your-own-model paradigm, where people move a few sliders on the website and think they have found the winning golfers. DataGolf, on the other hand, is a phenomenal resource for anyone looking to research players and is run by people who genuinely care about advancing golf statistics.

Although the Courchene brothers, the duo who created DataGolf, do a great job innovating, they cannot carry the whole load of this movement. The launches of Baseball Reference and Baseball Savant were both watershed moments for amateur analysts and prospective front office members alike, who could use them to build portfolios and uncover new truths about players. Unfortunately, the PGA TOUR’s data policy allows no such “academic use,” as the TOUR put it in an email to me.

So how could we fix golf’s general statistical illiteracy and its inability to explain why players play poorly or well? To me, a suite of statistics that solves these problems would need three characteristics: quick stabilization, easy interpretability, and predictive power that outclasses the traditional strokes gained system.

The baseball equivalent of such a suite is the trio of Stuff+, Location+, and Pitching+. Not only do these stats check the aforementioned boxes, they also carry the added benefit of broad use cases for front offices and amateur players. Put succinctly, Stuff+ aims to identify individual pitches that yield low run values based on pitch characteristics such as vertical break, release point, axis differential, and many others. Location+ uses a game-theory-esque approach to quantify how well a pitcher locates pitches in low run value locations given the count. Pitching+ ties Stuff+ and Location+ together, putting a single number on the pitcher’s overall process.

In golf terms, Stuff+ would be split into separate off-the-tee and approach models, each with strokes gained as its target variable. The features would be ball-flight measurements like apex, ball speed, clubhead speed, smash factor, and curve. Although strokes gained isn’t a great catch-all statistic, it serves as a good target variable because it still correctly rewards quality locations on the course. My opinion is that, from a predictive standpoint, strokes gained overpenalizes golfers for minute bad breaks that luck plays a part in. A model that predicts strokes gained from ball-flight data averages those breaks out, because it learns what shots with a given set of characteristics tend to produce rather than what one particular shot happened to produce. Stuff+ would therefore highlight ball strikers who are consistent at the ball level rather than the hole level.
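Here is a minimal sketch of how an approach Stuff+ might be computed, assuming we had a table of shots with ball-flight measurements and the strokes gained each shot produced. The feature names, the function names, the choice of k, and the mean-100, 10-points-per-standard-deviation scaling are all illustrative assumptions. Baseball’s public Stuff+ models use more sophisticated machine learning; a simple k-nearest-neighbors average stands in here so the example stays short and dependency-free.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical feature set, taken from the measurements listed above.
FEATURES = ("apex", "ball_speed", "clubhead_speed", "smash_factor", "curve")


def standardize(shots):
    """Turn each feature into a z-score so feet, mph, and ratios are comparable."""
    stats = {}
    for f in FEATURES:
        column = [s[f] for s in shots]
        stats[f] = (mean(column), pstdev(column) or 1.0)  # guard against zero spread
    return [tuple((s[f] - stats[f][0]) / stats[f][1] for f in FEATURES) for s in shots]


def predicted_sg(shots, k=3):
    """Predict each shot's strokes gained as the mean SG of its k most similar other shots."""
    points = standardize(shots)
    predictions = []
    for i, p in enumerate(points):
        nearest = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        predictions.append(mean(shots[j]["sg"] for _, j in nearest))
    return predictions


def to_plus(values, points_per_sd=10.0):
    """Rescale so the sample average is 100 and one standard deviation is `points_per_sd` (assumed)."""
    mu, sd = mean(values), pstdev(values) or 1.0
    return [100 + points_per_sd * (v - mu) / sd for v in values]


def stuff_plus(shots, k=3):
    """Hypothetical golf Stuff+: graded on expected, not realized, strokes gained."""
    return to_plus(predicted_sg(shots, k))


def shot(apex, ball_speed, clubhead_speed, smash_factor, curve, sg):
    return dict(zip(FEATURES + ("sg",), (apex, ball_speed, clubhead_speed, smash_factor, curve, sg)))


# Made-up shots for illustration only, not real ShotLink data.
example_shots = [
    shot(90, 120, 95, 1.26, 2, 0.6), shot(91, 121, 96, 1.26, 3, 0.4), shot(89, 119, 95, 1.25, 2, 0.5),
    shot(70, 110, 92, 1.20, 12, -0.6), shot(71, 109, 91, 1.19, 13, -0.4), shot(69, 111, 92, 1.20, 11, -0.5),
]
print([round(x, 1) for x in stuff_plus(example_shots, k=2)])
```

Each shot is graded by how shots with similar flight characteristics performed, so one bad bounce is diluted rather than charged in full. A real version would grade new shots against a separately fitted model rather than leaving one shot out of a small sample.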

Golf’s Location+ would basically be what strokes gained already is: a statistic that tries to quantify how well a golfer locates the ball, based on the scoring average of the next shot. Strokes gained does make one incorrect assumption, though: that all distances are created equal. Strokes gained buckets shots by distance and lie, but it can’t tell you whether you’re attacking from a favorable angle. This blind spot skews approach and around-the-green strokes gained data, because a chip from 70 feet in the rough while short-sided is far harder than a chip from 70 feet in the rough with green to work with. Ideally, Location+ would factor in pin location to better quantify the quality of a miss and fill in this blind spot. In that ideal state, we could even see which golfer/caddie combos have poor course management. A consistently high Stuff+ paired with a consistently low Location+ would flag golfers who simply aren’t getting the most out of their ball striking. You could also compare Location+ values across similar ball strikers to see how efficiently each one converts ball-striking quality into results.
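A sketch of the side-aware baseline described above, assuming a history of shots tagged with distance, lie, and whether the player was short-sided. The short-sided flag, the 10-foot bucket width, and every function name here are hypothetical, and the history is made up purely to show the mechanics.

```python
from collections import defaultdict
from statistics import mean


def bucket(distance_ft, width_ft=10):
    """Round a distance down to its bucket, e.g. 72 ft -> 70 ft with 10-foot buckets."""
    return int(distance_ft // width_ft) * width_ft


def build_baseline(history, width_ft=10):
    """Average strokes to hole out for each (distance bucket, lie, short_sided) key.

    `history` holds (distance_ft, lie, short_sided, strokes_to_hole_out) tuples.
    The short_sided flag is the hypothetical Location+ addition; without it this
    is an ordinary distance-and-lie baseline.
    """
    groups = defaultdict(list)
    for distance_ft, lie, short_sided, strokes in history:
        groups[(bucket(distance_ft, width_ft), lie, short_sided)].append(strokes)
    return {key: mean(values) for key, values in groups.items()}


def expected_strokes(baseline, distance_ft, lie, short_sided, width_ft=10):
    """Look up the baseline; raises KeyError if no historical shots fall in that bucket."""
    return baseline[(bucket(distance_ft, width_ft), lie, short_sided)]


# Made-up history for illustration only.
history = [
    (70, "rough", True, 3), (74, "rough", True, 3), (71, "rough", True, 2), (78, "rough", True, 3),
    (70, "rough", False, 2), (73, "rough", False, 2), (76, "rough", False, 3), (79, "rough", False, 2),
]
baseline = build_baseline(history)
print(expected_strokes(baseline, 72, "rough", True), expected_strokes(baseline, 72, "rough", False))
```

With a real dataset, the same lookup would feed the strokes gained formula, so two chips from 70 feet would no longer share a single baseline.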

Golfing+ would marry Stuff+ and Location+, highlighting the golfers who pair the highest-quality ball-striking metrics with the best expected scoring opportunities. Pitching+ is noted to be a better year-to-year predictor of ERA than any other pitching metric, including ERA itself. I believe Golfing+ could show the same effect, predicting year-to-year strokes gained better than strokes gained itself, because it segments ball striking into two distinct parts: physical characteristics and final location.

To illustrate these shortcomings and the improvements these stats could offer, here are a few highlights from Rory McIlroy’s final round at the Masters.

On Sunday’s 3rd hole, McIlroy hit a 333-yard drive that left 24 yards to the pin, earning him +0.37 strokes gained off the tee. DataGolf puts the baseline scoring average from 24 yards in the fairway at 2.6 strokes. The scoring average on this hole was 3.97, so his strokes gained is 3.97 - 2.6 - 1 = +0.37.
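The calculation above, written as a function. The name and the `strokes_taken` parameter are illustrative; `strokes_taken` would be 2 for a shot that also incurred a penalty stroke.

```python
def strokes_gained(expected_before, expected_after, strokes_taken=1):
    """Baseline strokes to hole out before the shot, minus the baseline after it, minus strokes used.

    `strokes_taken` is 1 for a normal shot; a penalty stroke would make it 2 (illustrative parameter).
    """
    return expected_before - expected_after - strokes_taken


# McIlroy's drive on 3: hole scoring average 3.97, DataGolf baseline of 2.6 from 24 yards in the fairway.
print(round(strokes_gained(3.97, 2.6), 2))  # 0.37
```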

From a Stuff+ perspective, this drive would project well for future strokes gained, since hitting driver 333 yards takes above-average apex, clubhead speed, and ball speed. However, Location+ would likely be harsher on this shot than strokes gained, because the massive greenside slope makes a chip from the left side of the green far harder than one from the right. McIlroy’s next shot shows the true difficulty of this pitch:

McIlroy’s shot-making here is world class. Had he landed it on the green, the ball likely would have run off the back, leaving a difficult third shot. As the clip shows, being 24 yards from the hole here isn’t the advantage it would be on most other 24-yard approaches. Had he been on the right side of the hole, McIlroy would have had an easier chip that let him funnel the ball toward the hole. For this chipping feat, McIlroy was awarded a measly +0.12 strokes gained. I believe Stuff+ would rate this shot as elite, given the tight apex and spin windows needed to get it close.

McIlroy’s 11th hole on Sunday illustrates another strokes gained shortcoming. In proverbial jail to the right of Augusta’s 11th hole, the course’s architecture left McIlroy one option, a punch out:

Watching in real time, every McIlroy backer held their breath as the ball kept rolling toward the water. McIlroy’s initial reaction to the shot, however, looks extremely positive: he twirls his club and looks on as if he has hit the intended shot. His final expression is one of puzzlement, as if it never occurred to him that the ball could be anywhere near the water’s edge.

The strokes gained for McIlroy’s punch out: -0.19. Had that ball rolled a yard to the left, the strokes gained would have been worse than -1. Would the quality of McIlroy’s punch out have been any different at the ball-striking level if it had rolled into the water? I don’t believe so. The ball staying dry was a stroke of luck for McIlroy in the moment, but had one more roll taken it into the water, the narrative around the shot would have been one of bad fortune.

In my eyes, McIlroy’s Stuff+ here was high because he hit the ball with adequate ball speed, apex, and spin to give himself a chance at par. His Location+ here would be below average: a chip from the front left of the green to a left pin carries a higher scoring average than a chip from the front right or center. Had McIlroy ended up in the water, I would dock his Stuff+, because his ball speed was higher than that of similar punch outs that stayed dry. His Location+ would also drop, though by less than his strokes gained, because the ball would have only barely trickled into the water. Shots that have no chance of staying dry should be penalized more than those that stood a chance.
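One way to formalize that last sentence is a probability-weighted strokes gained, sketched below. It assumes some model could estimate `p_dry`, the chance a ball with these flight characteristics stays out of the hazard; that model, the function name, and the -1.2 wet-outcome value (a stand-in for “worse than -1”) are all hypothetical.

```python
def water_weighted_sg(p_dry, sg_if_dry, sg_if_wet):
    """Credit a shot with the probability-weighted average of its two possible outcomes.

    p_dry is a hypothetical, model-estimated chance that the ball stays out of the hazard.
    """
    if not 0.0 <= p_dry <= 1.0:
        raise ValueError("p_dry must be between 0 and 1")
    return p_dry * sg_if_dry + (1.0 - p_dry) * sg_if_wet


SG_DRY = -0.19  # McIlroy's actual result on 11
SG_WET = -1.2   # hypothetical stand-in for "worse than -1"

coin_flip = water_weighted_sg(0.5, SG_DRY, SG_WET)  # barely stayed up, or barely went in
no_chance = water_weighted_sg(0.0, SG_DRY, SG_WET)  # never had a chance to stay dry
print(coin_flip, no_chance)
```

Under this scheme, a punch out that barely stayed dry and one that barely trickled in receive the same credit, while a shot that never had a chance takes the full penalty.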

Another key moment of McIlroy’s rollercoaster second nine was his 3rd shot on 13: 

This wedge shot from 86 yards resulted in -2.04 strokes gained, but how bad a shot was it really? In the moment, the result seemed almost impossible: with the Masters on the line, how could anyone hit such a seemingly benign shot so poorly? McIlroy’s strike didn’t look poor; rather, the clubface was left open, pushing the ball right. The ball flight isn’t a massive fade, and its lower trajectory suggests it didn’t carry unwanted spin. Started on a line two yards farther left, it would have been a great shot by strokes gained standards.

Stuff+ would dock this shot to a lesser degree than strokes gained, and more accurately. As highlighted above, the only flaw in this shot is the line on which it flies; hit on any other hole, it wouldn’t draw nearly as harsh a strokes gained penalty. Obviously, a water-bound approach here earns a poor Location+, but had the ball been 3 feet left we wouldn’t see as notable a strokes gained penalty. In Golfing+, Stuff+’s relatively favorable view of the strike would offset the low Location+, resulting in a smaller penalty than strokes gained assigns.

McIlroy’s approach on 17 may have been the shot of the tournament. After clubbing down off the tee, McIlroy faced a 197-yard approach into the green:

Finishing two feet from the hole from 198 yards out, this shot was rightfully credited with +1.10 strokes gained. What makes it so spectacular isn’t just the sheer proximity, but the ability to hit the ball with such a high apex from that distance. Most players in the field chose to push driver up the fairway on 17 throughout the week, mainly because the slopes of the 17th green, like most at Augusta, demand spin control to land the ball in the correct spot. One player who took that route was 3rd-place finisher Patrick Reed, who faced 146 yards on his approach:

The bewildered Reed holed out from 146 yards, yielding a massive +1.89 strokes gained. Given the stage, Reed’s shot here is nothing short of spectacular, but is it as good as McIlroy’s?

Strokes gained says Reed’s shot was 0.79 strokes better than McIlroy’s (1.89 - 1.10), which holds some truth: McIlroy birdied the hole and Reed eagled it, a one-stroke difference. But if we were using strokes gained to predict future scores, would we want Reed’s shot rewarded more than McIlroy’s?

Shots like Reed’s hole out are going to skew his approach statistics in any predictive model that uses strokes gained as a baseline. If Reed’s shot flies 2 yards farther or a yard to the left, its strokes gained drops by upwards of 1.5 strokes. Reed’s shot-making ability pales in comparison to McIlroy’s; even if he tried, Reed couldn’t match the physical characteristics of McIlroy’s shot.

Top golfers are great because they can control their golf ball. During pre-major media sessions, golfers are often asked what it will take to win that week, and they routinely reply with some variant of, “You have to have control of your golf ball.” Why is there no statistic that quantifies ball control when it is golfers’ primary concern heading into the biggest events?

A quickly stabilizing stat like a golf Stuff+ would let us accurately rank newcomers and scout prospective golfers. You could take a sample of a collegiate golfer’s tournament shots and see how their Stuff+ stacks up against the pros’, identifying where their physical ball-striking characteristics fall short.

Another problem this statistic may help solve is understanding form. Often, when a player posts a standout result after middling past performances, they note that they had been “trending well.” Take Jake Knapp after he shot 59 in the first round of the Cognizant Classic:

Knapp’s strokes gained on approach was clearly above average, but his play off the tee was nothing short of erratic. In the interview, Knapp mentions a standout fourth-round ball-striking display at a prior tournament. Perhaps we could have seen Knapp’s breakout at the Cognizant coming had we noticed tighter approach and off-the-tee dispersions paired with more consistent spin numbers.
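A sketch of that form signal, assuming a chronological list of one measurement per shot (lateral miss or spin rate, say): compare the spread of the most recent shots with the spread of everything before them. The 20-shot window, the function name, and the example misses are hypothetical, and what ratio counts as “trending” is left open.

```python
from statistics import pstdev


def dispersion_trend(values, recent=20):
    """Spread of the last `recent` shots divided by the spread of all earlier shots.

    `values` is one hypothetical per-shot measurement in chronological order.
    A ratio below 1.0 means the recent shots are tighter than the player's baseline.
    """
    if len(values) < recent + 2:
        raise ValueError("need at least two shots before the recent window")
    baseline = pstdev(values[:-recent])
    if baseline == 0:
        raise ValueError("baseline shots have no spread to compare against")
    return pstdev(values[-recent:]) / baseline


# Made-up lateral misses in yards: scattered early, tighter lately.
misses = [0, 10] * 10 + [4, 6] * 10
print(dispersion_trend(misses))
```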

Lastly, the suite of Stuff+, Location+, and Golfing+ has business use cases as well. Because strokes gained is a relatively weak predictor of future strokes gained, it is hard to assign dollar values to players. Strokes gained also depends on field strength, making it difficult to translate numbers from tournament to tournament. Stuff+ would let you place a player’s ball striking in any setting and see how it would fare under given conditions. Players could then use this predictive power to select tournaments that reward their skill set, while sports agencies could calculate theoretical player expected values.

Social media could also be transformed by these numbers. Users love to see Jake Knapp stingers, Cam Young 330-yard drives, and Scottie Scheffler darts from 200 yards out. But how good are these shots really? Putting a single, interpretable number on shots in social media posts could illuminate just how talented these golfers are and motivate people to come to events. It could also make broadcasts instantly more entertaining, with a live graphic flashing: “Scheffler’s approach Stuff+: 134 (TOUR average 100)”. Fans would immediately see how far above TOUR average Scheffler’s approach on that shot graded.

The road to a more understandable, cohesive analytics community isn’t paved with hard-to-understand statistics; it’s paved with highly useful, interpretable numbers that serve multiple use cases and try to answer the big questions. At some point, the TOUR’s system for quantifying golfers will need to change for the sake of furthering the game. Golf is one of the few sports yet to undergo a full “Moneyballification,” and it could use one to draw interest from prospective analysts while creating more useful metrics for its players. As other sports have shown, sometimes you need to question what you’re being told to uncover unknown truths.
