Command of Match (COM) and Lost Opportunity Score (LOS)
I’ve been watching the Davis Cup match between Borna Coric and Frances Tiafoe.
The first set, won by Tiafoe in a tiebreak, is largely marked by Borna Coric showing an incredible lack of touch around the net (he cannot volley at all*) and finding it nearly impossible to keep his forehand in the court.
*Separate research project: Check The Match Charting Project to see if Coric is the worst volleyer in the Top 50. Starting hypothesis is that he is.
Then when the second set kicks off, Frances Tiafoe can’t win a game, and barely any points. His energy level plummets and his serve is about as bad as you will see from a male professional tennis player. He loses the second set 1-6, and proceeds to carry that over into the third set. In an 11 game stretch, he wins only 1 game, and only 14 of 53 points, resulting in a 0-4 deficit in the third set. Although his aggressiveness waned, I think the most important factor is that Coric stopped making those horrendous unforced errors, at which point Tiafoe’s weaknesses were all brought to the fore (or is it “foe”?).
But at 4-0, Coric starts making horrendous errors again. I mean horrendous. At one point, he makes 7 in a row, and voilà, Tiafoe’s energy returns. Tiafoe does not play great in the rest of the third set, but Coric is so bad that Tiafoe comes back and wins the third-set tiebreak. (In fairness, the tiebreak itself is fairly well played by both players.)
As I’m writing this, I have not started the fourth set. I know Coric wins the next two sets because I know Croatia is in the finals, but I actually haven’t looked at the set scores for the final two sets. Based on what I’ve seen, it seems almost certain that if Coric can keep his forehand in the court, he will win, and win easily. For all the Twitter talk of how Tiafoe was a warrior in the match (apparently forgetting that 11 game stretch), he shows no sign he can control the points. His backhand is merely steady, and his wack-a-doo forehand stroke just rolls the ball around the court. In other words, this match is not on his racquet.
I paused in my viewing of the match, partially because I needed a break and partially because I wondered if there’s anything in the statistics that would tell someone who didn’t watch the match that the match is entirely on Coric’s racquet. And looked at another way, if Coric had lost, could you look at his stats and know just how bad the loss is, because the match truly was on his racquet, and only he could blow it?
There are probably several ways to do this, and what I’m presenting here is perhaps the most back-of-the-envelope way to do it, primarily because I’m starting it on a whim at 11 pm while trying to stay interested in this match. So I think this is just a toy stat, although as I have posted before, I think toy stats have their own kind of value.
But “stat” is the wrong word for the two things I’m proposing here. “Status” is probably a better word...toy status(?). Both COM and LOS seek to identify particular matches, rather than producing a statistic for every match.
I’m doing LOS first, because I was initially motivated by wondering how horrendous it would have been if Coric had lost this match when he was in total control of the match.
Lost Opportunity Score (LOS)
I’m using the acronym LOS for this concept, but it is a bit of a misnomer because it isn’t really a score. Nevertheless, the acronym is so apropos that I can’t drop it. LOS should indicate when the match a player lost was almost entirely on his/her racquet and he/she blew it with too many errors.
We already have Carl Bialik’s Dominance Ratio (available for every match on Tennis Abstract), which indicates how much a player dominated the match statistically, but we don’t know when that dominance is attributable to the winning player playing great, and when it is attributable to the losing player playing horribly.
COM is trying to identify when the winning player was in control, even when the other player did not play poorly. In other words, COM isn’t designed to measure how in command one player is (though I suppose you could use it for that), but rather to identify those relatively rare matches where the match a player won was almost entirely on his/her racquet even though his/her opponent may have played reasonably well.
The fundamental basis for both LOS and COM is the same. For each player, calculate this number:
(1-(OppUEs/Points Won)) - (UEs/Points Played)
The first part of the formula determines what percentage of points won by the player were not gifts from the opponent. Some of those points may be unusual situations, but most of them will be winners or FEs caused by the player, and therefore within the player’s control. The second part of the formula indicates what percentage of overall points were gifts given away by the subject player.
Conceptually, if your first number is high, you were controlling the match to a significant degree, but if your second number also is high, you gave away a lot of points in a match.
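For anyone who wants to try the formula on their own numbers, here is a minimal Python sketch of it. The function and parameter names are made up for illustration, and the check for zero points is an assumption (the formula simply isn’t defined in that case). The sample call uses the Stosur stat line from the example further down.

```python
def control_number(points_won, points_played, ues, opp_ues):
    """Share of a player's points won that were not opponent gifts,
    minus the share of all points that the player gave away."""
    if points_won <= 0 or points_played <= 0:
        # Assumption: treat a stat line with no points as invalid input.
        raise ValueError("points_won and points_played must be positive")
    earned_share = 1 - opp_ues / points_won  # first part of the formula
    gift_rate = ues / points_played          # second part of the formula
    return earned_share - gift_rate


# Stosur vs. Wozniacki: 45 of 110 points won, 34 UEs, 12 opponent UEs.
print(round(control_number(45, 110, 34, 12), 3))  # 0.424
```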
To calculate LOS and COM, you need just one more step.
Lost Opportunity Score (LOS)
Divide the losing player’s number by the winning player’s number. If the quotient is greater than 1.10 (that is, the loser’s number is more than 10% higher than the winner’s), it’s a lost opportunity (LOS). In other words, the losing player had the match on his/her racquet, but made so many unforced errors that he/she gave the match away. The 10% buffer is to capture only the most egregious of these situations. It is approximately 1 standard deviation away from the average loser quotient.
Here’s an example from the first round US Open match between Sam Stosur and Caroline Wozniacki, won by Wozniacki. From the match score (6-3 6-2) it appears to be an easy win, and Woz’s dominance ratio was 1.56. Stosur won 45 out of 110 points. She made 34 UEs and Wozniacki made just 12.
Stosur’s number via the formula above is (1-(12/45)) - (34/110) = .424
Woz’s number via the formula above is (1-(34/65)) - (12/110) = .368
Then, .424/.368 = 1.15 (greater than 1.10), so Stosur gets a Lost Opportunity (LOS) “award.”
Looking at the first part of the formula, Stosur’s points won were largely because of good things she was doing (73.3%), while only 47.7% of Woz’s points won were her own doing; the other 52.3% were about Stosur doing bad things. The second part shows Stosur made unforced errors on nearly 31% of points played, and Woz, typically, only 10.9%. That’s in keeping with what we know about their respective styles.
Bottom line: Stosur controlled the action in the match, but due in large part to the high number of UEs, lost the match. I suspect this is not uncommon for Wozniacki opponents. (See Caveats at the end).
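The LOS check for this match can be sketched in a few lines of Python. The function names are hypothetical, and raising an error when the winner’s number is zero or negative is an assumption, since the quotient is not meaningful in that case and the rule above does not cover it.

```python
def control_number(points_won, points_played, ues, opp_ues):
    # (1 - OppUEs / Points Won) - (UEs / Points Played)
    return (1 - opp_ues / points_won) - (ues / points_played)


def lost_opportunity(loser_number, winner_number, threshold=1.10):
    """Return the loser/winner quotient and whether it clears the LOS threshold."""
    if winner_number <= 0:
        # Assumption: the quotient is not meaningful for a non-positive divisor.
        raise ValueError("winner_number must be positive")
    quotient = loser_number / winner_number
    return quotient, quotient > threshold


stosur = control_number(45, 110, 34, 12)     # losing player
wozniacki = control_number(65, 110, 12, 34)  # winning player
quotient, is_los = lost_opportunity(stosur, wozniacki)
print(round(quotient, 2), is_los)  # 1.15 True
```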
Command of Match (COM)
Subtract the losing player’s number from the winning player’s number. If the difference is greater than 0.13 for men, or 0.17 for women, the winning player had command of the match (COM). In other words, the gap between how much control the winning player had and how much control the losing player had is so large that we say the winning player was in command via his/her own efforts. Significantly, you can get a COM even if your opponent played reasonably well.
You might wonder where the 0.13 and 0.17 come from. Using US Open matches as the measuring stick, these numbers are 1.5 standard deviations from the mean differences between the players, so we are only capturing relatively rare matches with COM. I tried it with 2 SDs, but the list was far too thin.
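A cutoff of this kind could be derived roughly as follows. This is only a sketch under assumptions: the function name is invented, it uses the sample standard deviation (the population version would give a slightly different number), and it expects a list of winner-minus-loser differences for one tour at a time.

```python
from statistics import mean, stdev


def com_cutoff(differences, n_sd=1.5):
    """Mean winner-minus-loser gap plus n_sd sample standard deviations."""
    if len(differences) < 2:
        raise ValueError("need at least two matches to estimate a spread")
    return mean(differences) + n_sd * stdev(differences)
```

Calling it with `n_sd=2` instead would reproduce the stricter 2 SD variant, which yields a much shorter list.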
Here’s an example from the first round US Open match between Simona Halep and Kaia Kanepi, since most of us saw at least some part of that match and know there wasn’t much Halep could do in that match. The score alone (6-2 6-4) gives us some indication of Kanepi’s level, and the dominance ratio was 1.36. Halep won 47 out of 107 points, not that much different than in the Stosur example. Unlike Stosur, she made only 9 UEs and Kanepi made 28.
Halep’s number via the formula above is (1-(28/47)) - (9/107) = .320
Kanepi’s number via the formula above is (1-(9/60)) - (28/107) = .588
Then, .588 - .320 = .268 (greater than .17), so Kanepi gets a Command of Match (COM) award.
Going back to our concept with the first part of the formula, Halep’s points won were largely because of bad things Kanepi was doing, with Halep controlling only 40% of those points. Obviously, she didn’t hurt herself with errors. And because of that, only 15% of Kanepi’s successful points were due to her opponent’s mistakes.
Bottom line: Kanepi controlled the action in the match, to such a degree that even her significant number of errors, and Halep’s lack of errors, could not stop her.
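The COM check for this match can be sketched the same way in Python. The names `COM_THRESHOLDS` and `command_of_match` are hypothetical; the thresholds are the 0.13 (men) and 0.17 (women) values given above.

```python
COM_THRESHOLDS = {"men": 0.13, "women": 0.17}


def control_number(points_won, points_played, ues, opp_ues):
    # (1 - OppUEs / Points Won) - (UEs / Points Played)
    return (1 - opp_ues / points_won) - (ues / points_played)


def command_of_match(winner_number, loser_number, tour):
    """Return the winner-minus-loser gap and whether it clears the COM threshold."""
    gap = winner_number - loser_number
    return gap, gap > COM_THRESHOLDS[tour]


kanepi = control_number(60, 107, 28, 9)  # winning player
halep = control_number(47, 107, 9, 28)   # losing player
gap, is_com = command_of_match(kanepi, halep, "women")
print(round(gap, 3), is_com)  # 0.268 True
```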
Caveats
This is not scientific, so let’s get that out of the way. I haven’t tested it on gobs and gobs of data.
Also, only 13 hours have passed since I first thought of the idea (and 7 of them were spent sleeping), so I reserve the right to make adjustments (or even scrap LOS and COM altogether).
I initially see three issues with LOS and COM:
1. UEs are not official statistics of the ATP and the WTA. They are typically recorded for the grand slams, although I noticed the IBM Slamtracker didn’t bother with many lower profile matches at the US Open. Only 178 of the 254 US Open main draw matches had meaningful UE statistics. In the other 76 matches, IBM Slamtracker reported UEs, but they are clearly understated by vast amounts, so I’m not sure why they even list them (or winners). For example, Andrey Rublev had only 5 winners and 13 unforced errors in a four set match, while his opponent Jeremy Chardy also had only 13 UEs? High-risk player Nikoloz Basilashvili had only 7 UEs in a five set match against Aljaz Bedene? I don’t think so.
So, LOS and COM are good only for Grand Slams, matches that have been charted, or matches you are watching on TV that flash the summary numbers at the end of sets or matches. I don’t feel too bad about this.
2. UEs are extremely subjective. Anyone who has charted a match and then seen the on-screen statistics from the TV broadcast knows the number of differences in judgment that can arise as to whether a player should have made the shot or not. Hopefully some of that is taken care of by the 10% buffer in the LOS calculation and the 1.5 standard deviation buffer in the COM calculation.
3. Aggressive players are far more likely to get a LOS or COM than steady players. It’s not necessarily a bad thing in and of itself, so long as no one says “Wozniacki has 0 COMs in 2018” (if in fact she does have zero) and uses that as a stick to bash her with.
As a corollary, recognize that aggressiveness is just one way to measure who had control of the match. Steady play with few errors is arguably just as valid a way to keep the match on your own racquet, though it is a lot more subtle. Perhaps a player should automatically get a COM if his or her opponent gets a LOS, but I’m not yet convinced that’s the right approach, as it presumes the LOS player’s errors were mostly attributable to the steadiness of the opponent.
Since this one is so long, I’ll do another post with the list of LOS and COM awards from this year’s US Open.