February 7, 2013

David Brooks tries his hand at explaining regression toward the mean

Here's the coin-flipping scene from Rosencrantz and Guildenstern Are Dead,
with Gary Oldman blithely on an epic hot streak, much to the
dismay of the more scientific-minded Tim Roth.

Recently, we were kicking around Galton's paradoxical concept of regression toward the mean. Galton discovered that there's a wholly mathematical side to regression. And yet, it's also worth looking at examples of how human decision-making can increase or decrease the rate of regression. Regression toward the mean is such an important concept for understanding how the world works that it's worth unpacking the idea so that people don't get wrong ideas stuck in their heads.

Now David Brooks is giving regression toward the mean a try.

From the NYT:
For example, every person who plays basketball and nearly every person who watches it believes that players go through hot streaks, when they are in the groove, and cold streaks, when they are just not feeling it. 
But Thomas Gilovich, Amos Tversky and Robert Vallone found that a player who has made six consecutive foul shots has the same chance of making his seventh as if he had missed the previous six foul shots. 

There's got to be a better way to phrase this, right? Brooks is presumably talking not about two players you've never seen before, but about one player with years of NBA free throw shooting experience, who is widely acknowledged to have reached what appears to be a career plateau, and has no obvious problems contributing to a cold streak. Brooks' next paragraph is better:
When a player has hit six shots in a row, we imagine that he has tapped into some elevated performance groove. In fact, it’s just random statistical noise, like having a coin flip come up tails repeatedly. Each individual shot’s success rate will still devolve back to the player’s career shooting percentage.

If we are just talking about undefended free-throw shooting (not field goal shooting, where the distance and defense vary constantly, especially in reaction to the results of the last few shots taken), and we're talking about an NBA veteran who has plateaued over several years at around, say, a 75% free throw success rate, well, then don't get too excited about him making six in a row. There's an 18 percent chance that from pure randomness, a "true" 75% shooter would make six in a row.
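The 18 percent figure is just 0.75 multiplied by itself six times, which assumes each free throw is an independent trial at a fixed success rate. Here's a minimal Python sketch of that arithmetic (the function name is made up for illustration):

```python
def streak_probability(p_make: float, n: int) -> float:
    """Chance of making n straight shots, assuming independent shots at a fixed rate."""
    return p_make ** n

# A "true" 75% shooter making six in a row
six_makes = streak_probability(0.75, 6)
print(f"{six_makes:.3f}")  # 0.178
```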

On the other hand, say you are the coach. With one second left in a tied game, the refs call a technical foul on the other team and award your team one free throw. You get to pick the free throw shooter who will have one chance to win the game. Your two best available potential shooters are both veterans with career percentages of 75%. But one has made his last six free throws and the other has missed his last six. Which do you pick, or are you indifferent (as Brooks implies)?

Well, of course you go with the guy who looks like he's on a hot streak. Maybe he's not really on a hot streak, but at least he's not on a true cold streak. Maybe hot streaks are just the absence of cold streaks, but cold streaks caused by very real detrimental factors definitely exist.

There's only a 1/4096th chance of a 75% free throw shooter missing six in a row out of pure bad luck. So, missing six in a row could very well be a sign that he has a secret injury he's not telling you about, or that he's developed a hitch in his shot that he needs some extra free-throw shooting practice to work out, or that he's mentally flustered. So, as coach, you do something. The first thing you do is you don't assign the cold streak guy to shoot the technical foul shot (unless it's some mind game strategy you have of improving his self-confidence by showing your confidence in the cold shooter, but you'd better have your playoff spot clinched before you do that).
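One way to put rough numbers on that hunch is a Bayes' rule update. The sketch below is purely illustrative: the 5% prior that a veteran has some hidden problem, and the 40% success rate he'd shoot if he did, are assumptions I've made up, not measured values.

```python
def posterior_problem(prior, p_healthy, p_troubled, makes, misses):
    """Probability the shooter has a real problem, given his recent makes and misses.

    Assumes shots are independent given the shooter's (unknown) true state.
    """
    like_troubled = p_troubled ** makes * (1 - p_troubled) ** misses
    like_healthy = p_healthy ** makes * (1 - p_healthy) ** misses
    weighted_troubled = prior * like_troubled
    return weighted_troubled / (weighted_troubled + (1 - prior) * like_healthy)

# Hypothetical inputs: 5% prior of a hidden problem, 75% when healthy, 40% when troubled
after_six_misses = posterior_problem(0.05, 0.75, 0.40, makes=0, misses=6)
after_six_makes = posterior_problem(0.05, 0.75, 0.40, makes=6, misses=0)
print(f"after six misses: {after_six_misses:.2f}")
print(f"after six makes:  {after_six_makes:.4f}")
```

Under those made-up inputs, six straight misses move the chance of a real problem from 5% to roughly 90%, while six straight makes push it close to zero. Different priors will give different numbers, but the asymmetry is the point.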

Hot streaks and cold streaks in field goal shooting (in the regular run of the game) are more complicated because of defense, which often shifts in response to streaks.

If you look at enough basketball statistics, you see that NBA players generally shoot poorly from the floor their rookie year, then maintain a fairly consistent level, except for off years (presumably caused by minor injuries, divorces, cocaine addictions, spats with coach or media, or whatever), until they hit a decline phase near the end of their careers.

On the other hand, strange things do happen. For example, veteran center Wilt Chamberlain shot .563 in his first three seasons with the Lakers, but then shot .683 over the last two seasons of his career, because his new coach, Bill Sharman, talked him into emphasizing defense like his more successful old rival, Bill Russell, and thus only take easy shots. (But, not too many conclusions should be drawn from anecdotes about Wilt. We still talk about him so much because he was so sui generis.)

But it takes a fair amount of coaching to keep NBA players near their career plateaus.

Say you have two starting guards, one (O) who is good at offense but not defense, and the other (D) vice versa. If they took exactly the same shots from the floor, O would make 60% and D 40%. So, of course, you devise game plans where O takes more shots, especially more of the hard shots, and D takes fewer shots, especially fewer of the hard shots. That moderates their respective shooting percentages to, say, 55% and 45%. (Under some conditions, the smart strategy is to push this all the way until both have the same shooting percentage. You want to keep arbitraging marginal advantages down to the vanishing point.)

Or, consider the effects of defense on a single player. Say, Jeremy Lin starts off a game making two shots in a row against the Lakers last year. Is this just luck? Maybe. Or maybe he's being nominally defended by 37-year-old Derek Fisher, and so Lin can probably get open looks all night, and, indeed, winds up with a career-high 38 points. A little while later, Lin starts a game off missing shots. Cause for panic? Or should he keep firing away because he'll be bailed out by regression toward the mean? If he's being guarded by, say, LeBron James, it's likely time for an agonizing reappraisal of the shoot-Jeremy-shoot tactics that worked so well against Fisher.

In general, if a player is 6 for 6 in the first half of a game because he owns a mismatch over his defender, at halftime the coach will probably tell him he should be shooting more. Assuming, say, a 50% breakeven point, the team would be better off if he went 10-13 in the second half rather than 6 for 6 again, because going 4 for 7 on the incremental shots would be to the team's advantage.
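The point is about the marginal shots, not the overall percentage. A quick sketch of that arithmetic, using the 50% breakeven assumed above (the helper name is hypothetical):

```python
def incremental_pct(base_made, base_att, new_made, new_att):
    """Shooting percentage on only the extra attempts beyond the baseline line."""
    return (new_made - base_made) / (new_att - base_att)

breakeven = 0.50  # assumed breakeven point from the paragraph above

# Going 10-for-13 instead of 6-for-6 means 4-for-7 on the added shots
marginal = incremental_pct(6, 6, 10, 13)
print(f"{marginal:.3f}", marginal > breakeven)
```

His overall percentage drops from 100% to about 77%, but the seven extra shots go in at about 57%, which still beats the breakeven.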

In general, coaches actively encourage players to regress toward their means and the team's means. If a player is missing hard shots, the coach will run plays where he gets fewer hard shots and fewer shots overall, but achieves a higher percentage because he's more limited to taking easy shots like open-court layups and offensive rebound dunks. If a player is hitting shots at a rate above the expected percentage, especially if he's enjoying a defensive mismatch, the coach will try to get him the ball more and have him take harder shots. The coach wants his hot hand to regress toward his mean, just not quite all the way. Meanwhile, the opposing coach is tearing his hair out trying to come up with a way to stop the man with the hot hand.

These kinds of defensive adjustments that encourage regression toward the mean happen at all levels from the most minutely tactical (shading a player a few inches more in one direction) to the most front-office strategic (trading for a defender to stop an archrival's best shooter). The classic paper cited above about the 1980s Philadelphia 76ers mentions that guard Andrew Toney was universally known as a "streak shooter," but there was no evidence that he went on longer streaks of makes or misses than his teammates. Instead, he was a great outside shooter (before a severe injury in his sixth season wrecked his career) who had the talent to make memorable strings of shots against the mighty Celtics in big games. (His nickname was The Boston Strangler.) To stop Toney, the Celtics traded for Dennis Johnson, one of the greatest defensive players of all time.

So, regression toward the mean doesn't just happen; it's often actively encouraged.

Other questions involve which mean a player should aim to regress toward: his natural mean or his current team's mean.

For example, in 2006 Kevin Garnett averaged 22 ppg on .526 shooting, while Kobe Bryant averaged 35 ppg on .450 shooting. Both teams had bad supporting casts (ladies and gentlemen, Lakers' point guard Smush Parkerrrrrrrrrr!), but Kobe's team won 12 more games, in part because he took so many more hard shots than Garnett.

Put Garnett on a good team, however, and he's a wonderful team player. Put Kobe on a good team, and he wins a lot, but you know it will be a soap opera.

28 comments:

x said...

he's been reading your blog again to get ideas about what to write about?

helene edwards said...

An unbelievably stupid citation; nobody has ever referred to a hot streak in basketball outside the context of floor shooting.

mel belli said...

Probably the worst NBA free-throw shooter ever was Chris Dudley of the Nets. At the end of a close game, Don Nelson used to have one of his players just bear hug Dudley, whether he had the ball or not, to make sure he had to go to the line.

Pincher Martin said...

One of the best players to show noticeable and even impressive improvement in his free throw shooting after the midway point of his career was Magic Johnson.

For his first five seasons in the NBA, Magic was a good, but not great, free throw shooter, averaging 79 percent at the foul line. Over his last five seasons, he averaged 89 percent and led the league in free throw percentage during one of those years.

(Larry Bird was another NBA player whose free throw percentage got higher over the course of his career, but his improvement was slightly less dramatic than was Magic's.)

Many great players, like Michael Jordan, get no better over the course of their career at shooting from the charity stripe. MJ might have even got a little worse, which is surprising when you consider how hard he worked at his game to get any edge on his opponents.

mel belli said...

Tim Hardaway, usually a pretty good shooter, once went 0-for-18 from the floor in his home town of Chicago. He was being guarded by Michael Jordan.

Pincher Martin said...

Magic also greatly improved his three point shooting over the course of his professional career.

I think Larry Bird motivated Magic to improve his shooting. Larry took more three pointers in his rookie season than Magic did in his first ten seasons. But by the end of their careers, Magic was taking more three pointers than Larry did in a season, and making a respectable percentage of them.

I think Bird also influenced Magic to improve his free throw shooting.

Anonymous said...

I'm guessing that in the first quote Brooks is trying to articulate the concept of independent events. Each throw is independent in the same sense as each toss of a coin is independent. The probability of s successes in n throws is given by the binomial distribution, with the probability of success approximately equal to the proportion of successful throws at that particular time in the thrower's career. I don't think this is quite the same as regression to the mean. It's equivalent to the observation that the probability is very small of getting a long losing or winning streak where the proportion of failures is much greater than the probability of failure or where the proportion of successes is much greater than the probability of success. (Think of the opening coin-tossing scene in Rosencrantz and Guildenstern Are Dead.) Even the simple binomial distribution can lead to all sorts of results that seem odd from a naive perspective. This was one of the biggest lessons I took out of a graduate course on the biometrics of human fertility that I took ever so many years ago.
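For anyone who wants to play with the binomial numbers, here is a minimal sketch using only the Python standard library. The 75% success rate is an illustrative assumption, and the function name is made up.

```python
from math import comb

def binomial_pmf(s: int, n: int, p: float) -> float:
    """Probability of exactly s successes in n independent throws, each made with probability p."""
    return comb(n, s) * p ** s * (1 - p) ** (n - s)

p = 0.75  # assumed free throw percentage
n = 6

# Full distribution of makes out of six attempts
for s in range(n + 1):
    print(s, f"{binomial_pmf(s, n, p):.4f}")

all_six = binomial_pmf(6, 6, p)   # same as 0.75 ** 6
none_of_six = binomial_pmf(0, 6, p)  # same as 0.25 ** 6, i.e. 1/4096
```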

mel belli said...

I once attended a game in which a player known for being brainy, Chris Mullin, made one of the all-time bonehead plays against one of the worst outside shooters in league history. The Warriors had a 3-point lead with about 40 seconds left. The Hawks had the ball, which went to Stacey Augmon at the 3-point line. The chances that Augmon, a defensive specialist, would hit a 3-pointer were minuscule, so Mullin, who couldn't jump anyway, should have backed off. Instead, he went up to block the shot. He fouled Augmon, and the ball went in, for a 4-point play, and the Warriors lost.

Steve Sailer said...

"(Think of the opening coin-tossing scene in Rosencrantz and Guildenstern Are Dead.)"

I'll add the video clip.

Anonymous said...

Shaq shot 64% as an LSU sophomore ...

http://www.sports-reference.com/cbb/players/shaquille-oneal-1.html

... and never beat that mark in 19 years in the NBA, ending up shooting 5% lower in the NBA than at LSU. Very odd.

http://www.basketball-reference.com/players/o/onealsh01.html

Buck Turgidson said...

Mel Belli, I can do better: the 2005 NBA finals game 5.

If the Detroit Pistons can maintain their three point lead against the San Antonio Spurs with 6 seconds left, they will likely repeat as champs. Coach Larry Brown in the timeout tells Rasheed Wallace, who is guarding the in-bounds man Robert Horry, not to leave Horry under any circumstances. The pass goes into the corner to Manu, Rasheed leaves Horry, Manu zips it back to Horry for the tying three over a late-recovering Rasheed to force OT and the Spurs go on to win in 2005.

Bill Simmons said when Rasheed was a Celtic that Simmons once counted seven straight trips up and down the floor when Rasheed never crossed either foul line.

He's now with the Knicks.

I know which team not to bet on.

DR said...

"Your two best available potential shooters are both veterans with career percentages of 75%. But one has made his last six free throws and the other has missed his last six. Which do you pick, or are you indifferent (as Brooks implies)?"

The question isn't what to do in the case of equal career percentages, the question is if you have two players one with 80% career percentage who's missed his last six and one with 70% who's made his last six, who do you pick?

How about 74% vs 76%? 74.9% vs 75.1%? The question is what your threshold's magnitude is. What weight do you put on career history vs hot streak?

Historical statistical analysis can never reject the hypothesis of a non-zero relation for any variable, including free throw hot streak persistence. But it can reject the hypothesis that persistence's magnitude is above some minimum threshold.

So mining historical data we can never be sure that there's zero effect. But we can reject the null hypothesis that six free throws in a row increase the chance of the next free throw by more than 1%. With enough historical data we can narrow down that magnitude even further.

Assistant Village Idiot said...

Yes, independence and regression to the mean are different things. High SATV-people should be steered into statistics courses (plural) rather than calculus.

As for basketball, the 24-second clock pushes down the shooting percentage of the better players. They often get the ball late in the clock because they have a better chance of getting some kind of decent shot off. But it's still a lower-percentage shot, only good in comparison.

Anonymous said...

Interestingly the free throw percentage in the NBA has apparently stayed constant at about 75% for about the last 50 years, which is surprising to me. Is free throw shooting that close to the outer limits of human ability? Is there an advantage to be had shooting underhand, a la the Barrys, that is being overlooked for coolness reasons?

1970: 75.1
1975: 76.5
1980: 76.4
1985: 76.4
1998: 76.6
1990: 76.4
1995: 73.7
2000: 75.0
2005: 75.6
2006: 74.5
2007: 75.2
2008: 75.5
2009: 77.1
2010: 75.9
2011: 76.3

http://www.basketball-reference.com/leagues/NBA_2000.html et al

Luke Lea said...

Anonymous writes:

"I'm guessing that in the first quote Brooks is trying to articulate the concept of independent events. Each throw is independent in the same sense as each toss of a coin is independent. . . . I don't think this is quite the same as regression to the mean."

I may be mistaken but I believe he is wrong about this. Regression to the mean occurs when a trait is influenced by many unconnected genes which make independent contributions to it.

ben tillman said...

Hot streaks and cold streaks in field goal shooting (in the regular run of the game) are more complicated because of defense, which often shifts in response to streaks.

It's also more complicated because the "hot" shooter's confidence tends to lead him to take more, and thus more difficult, shots.

Anonymous said...

"Your two best available potential shooters are both veterans with career percentages of 75%. But one has made his last six free throws and the other has missed his last six. Which do you pick, or are you indifferent (as Brooks implies)?"

Indifference.

Now, if both players are career 75% free throw shooters but one has made 80% of 100 free throws this season, while the other guy has made 70% of 60 free throws - it's an easy call. But the last six free throws don't tell you anything.

Anonymous said...

Re free throw percentage: the much larger amount of money involved in 2012 vs 1970 alone would, one would think, drive players to the outer limits of performance. They're willing to dope and spend hours in the weight room, but not improve free throw technique? Making a couple extra FTs per week increases player PPG by a significant amount, and probably player market value too.

It's puzzling.

David Brooks said...

"he's been reading your blog again to get ideas about what to write about?"

I have not!

Anonymous said...

But Thomas Gilovich, Amos Tversky and Robert Vallone found that a player who has made six consecutive foul shots has the same chance of making his seventh as if he had missed the previous six foul shots.

Okay, so it works for n=6.

But if you were, say, Bill James, or a Las Vegas odds-maker, then wouldn't you be curious about some of the other values for n, between maybe n=1 and n=100?

Wouldn't it be really strange if some value like n=3 or n=17 were to stick out like a sore thumb?

Heck, this is the age of computers - why can't we see the entire probability density function graphed for ALL values of n, between, say, 1 and 100?!?

Surely they've got all the NBA games of the last ten or fifteen years loaded into some sort of a time-series box-score format which could accept and run a routine like that in a fraction of a second [on one of those 8-core processors].
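A routine like that could be sketched roughly as follows. This is only an illustration: the input format (an ordered list of True/False outcomes for one shooter) is an assumption, the sample data is made up, and it ignores game boundaries, which a real analysis would have to handle.

```python
def make_rate_after_streak(shots, n):
    """Success rate on attempts that immediately follow n consecutive makes.

    shots: list of booleans in chronological order (True = make).
    Returns (rate, number_of_qualifying_attempts); rate is None if there are none.
    """
    followups = [
        shots[i]
        for i in range(n, len(shots))
        if all(shots[i - n:i])  # the previous n attempts were all makes
    ]
    if not followups:
        return None, 0
    return sum(followups) / len(followups), len(followups)

# Made-up toy sequence, not real NBA data
toy = [True, True, False, True, True, True, False, True]
rate, count = make_rate_after_streak(toy, 2)
print(rate, count)
```

Looping n from 1 upward and printing the rate alongside the count shows how quickly the sample thins out for long streaks, which is one reason a table out to n=100 would be mostly noise.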

NOTA said...

Interestingly, for many somewhat heritable traits, you get the opposite pattern--smarter parents push their kids to learn, teach them to read early, etc., probably decreasing apparent regression to the mean in IQ / schoolwork a bit in childhood, though that stuff probably doesn't have all that much effect on their adult lives.

Bill said...

Anonymous said . . .
Heck, this is the age of computers - why can't we see the entire probability density function graphed for ALL values of n, between, say, 1 and 100?!?

Table 1 in the linked paper does it for n=1-3 misses and n=1-3 hits (for field goals). No journal editor on earth will let you put in the table you are asking for.

I had always assumed the same thing you do about the data source, but if you read the paper, it is a lot less impressive than that. The data are not for the whole NBA. They are for a couple of teams over a couple of years. The free throw data are for the Celtics only.

Also, Kevin McHale was a streaky free throw shooter (table 3). His chance of making a FT after missing was 59%. His chance of making a FT after hitting was 73%. It's interesting. They say that this difference was not statistically significant. This is a problem for them. If a 14 percentage point difference is not big enough to achieve statistical significance, then their tests are under powered. Presumably, there is some follow-on literature which is better---this was written 25+ years ago.

Finally, the file is not searchable (at least by me), and I don't see the "six free throws" fact in it. Did Brooks make it up? Did I miss it? Is it in another paper?

jody said...

i don't think this was a good topic for an exploration of regression to the mean.

in fact, an exploration of NBA players' children's adult heights probably would have been better, if we wanted to remain in the basketball world.

they tend to have lots of kids. so plenty of data to explore there for a true examination of regression to the mean.

or, do like i have, and explore their family's net wealth before the NBA, during the NBA, and after the NBA, if you want to see a dramatic example of regression to the mean. we have millions of data points on the average wealth of each group in the US.

NBA payrolls are about 2 billion dollars a year. lots of dollars to track there. see where they go generation after generation. hint - most of the time they're gone before they've had a chance to be passed on even one generation. political concerns about preventing generational dynasties by establishing an estate tax, needn't be invoked here.

you don't even have to do any math to realize you'd have a greater rate of wealth return, generation over generation, by simply distributing the 2 billion dollars randomly yearly, rather than channeling it into these guys.

Anonymous said...

Anon 12:14 AM, What you're looking for is statistical tests for independence of events.

Anonymous said...

"Indifference.

Now, if both players are career 75% free throw shooters but one has made 80% of 100 free throws this season, while the other guy has made 70% of 60 free throws - it's an easy call. But the last six free throws don't tell you anything."

Spoken like a man who has never himself made nor missed six consecutive foul shots.

Seriously, have you ever missed six straight and then tried to make a seventh in front of an audience??

Anonymous said...

How could you write that entire piece and not once mention independence, one of the most basic and fundamental ideas in probability theory? Ugh.

Coleslaw Johnson said...

78 in a row: I get a 1 in 3.0223145e+23 chance. About as believable as a Wilt Chamberlain autobiography...

ben tillman said...

How could you write that entire piece and not once mention independence, one of the most basic and fundamental ideas in probability theory? Ugh.

He did discuss it, here:

There's only a 1/4096th chance of a 75% free throw shooter missing six in a row out of pure bad luck. So, missing six in a row could very well be a sign that he has a secret injury he's not telling you about, or that he's developed a hitch in his shot that he needs some extra free-throw shooting practice to work out, or that he's mentally flustered.

In other words, free throws aren't "independent" trials in the sense you're talking about.