December 4, 2013

PISA discovering that accuracy = boredom to the press

Three years ago, Andreas Schleicher and the other well-funded folks at PISA were media darlings. This year ... not so much. You can sense that the bloom is off the rose. 

A big part of PISA's new PR problem is that the results were so similar from 2009 to 2012. Now, you might think that stability is a good sign, one suggesting that the PISA people aren't just pulling these numbers out of thin air. But accuracy is boring. The media likes change for the sake of change. Who's up? Who's down? A school test that's more or less a giant-budget IQ test doesn't produce enough random changes to maintain media interest.

Decades ago, when the news magazine US News & World Report was launching its college ranking system, there was much interest from year to year as they improved their methodology, frequently casting overlooked colleges toward the top. But, after a while, USNWR got pretty good at measuring as much as could be conveniently measured ... and then what? Colleges, it turns out, don't change much from year to year, so the future looked a lot like the present. And without trends, we don't have news.

So, USNWR came up with the idea of changing some of the fairly arbitrary weights in its formula each year to generate a new #1 frequently. One year, for example, Caltech shot up to #1, which generated a lot of press coverage. But it was almost all just churn for the sake of churn. Caltech was pretty much the same place before, during, and after its sudden rise and fall.

But spectators like churn. In fact, one side effect of bad quantitative methodologies is that they generate phantom churn, which keeps customers interested. For instance, the marketing research company I worked for made two massive breakthroughs in the 1980s, moving to dramatically more accurate methodologies in the consumer packaged goods sector. Before we put checkout scanner data to use, market research companies were reporting a lot of Kentucky windage. In contrast, we reported actual sales in vast detail. Clients were wildly excited ... for a few years. And then they got kind of bored.

You see, our competitors had previously reported all sorts of exciting stuff to clients. Back in the 1970s, for example, they'd say: Of the two new commercials you are considering, our proprietary methodology demonstrates that Commercial A will increase sales by 30% while Commercial B will decrease sales by 20%.

Wow.

We'd report in the 1980s: In a one-year test of identically matched panels of 5,000 households in Eau Claire and Pittsfield, neither new commercial A nor B was associated with a statistically significant increase in sales of Charmin versus the matched control group that saw the same old Mr. Whipple commercial you've been showing for five years. If you don't believe us, we'll send you all the data tapes and you can look for yourselves.

Ho-hum.

It was pretty amazing that we could turn the real world into a giant laboratory (and this was 30 years ago). But after a few years, all this accuracy and realism got boring. 

It turned out that clients kind of liked it back in the bad old days when market research firms held a wet finger up to the breeze and from that divined that their client was a creative genius whose new ad would revolutionize the toilet paper business forever. (New ads and bigger budgets mostly work only if your ad has some actual message of value to the consumers to convey: e.g., "Crest now comes with Unobtanium, which the American Dental Association endorses for fighting Tooth Scuzz.")

These parallels between the consumer packaged goods industry in the 1980s and the educational reform industry in the 2010s are not really coincidental. Everybody says they want better tests, but what they really want is more congenial results. So, when they get better tests, they aren't as happy as they thought they'd be.
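The phantom churn point can be illustrated with a toy simulation. This is only a sketch under made-up assumptions: 100 hypothetical institutions whose true quality never changes, ranked in two successive "years" by a measurement with a chosen amount of random error. The function names, the number of institutions, and the noise levels are all invented for illustration and have nothing to do with how PISA or USNWR actually compute anything.

```python
import random


def top_k_overlap(true_scores, noise_sd, k, rng):
    """Rank the same unchanged items in two 'years'; count how many stay in the top k."""
    def measured_top_k():
        # Each year's published score = fixed true score + fresh measurement noise.
        noisy = [(s + rng.gauss(0.0, noise_sd), i) for i, s in enumerate(true_scores)]
        noisy.sort(reverse=True)
        return {i for _, i in noisy[:k]}

    return len(measured_top_k() & measured_top_k())


def mean_overlap(true_scores, noise_sd, k, rng, runs=200):
    """Average top-k overlap across many simulated pairs of years."""
    total = sum(top_k_overlap(true_scores, noise_sd, k, rng) for _ in range(runs))
    return total / runs


rng = random.Random(0)
# Hypothetical fixed "true quality" of 100 institutions (assumption for illustration).
true_scores = [rng.gauss(0.0, 1.0) for _ in range(100)]

for noise_sd in (0.0, 0.3, 3.0):
    avg = mean_overlap(true_scores, noise_sd, 10, rng)
    print(f"noise sd {noise_sd}: average top-10 repeaters = {avg:.1f}")
```

With no measurement error the top ten is identical every year, which is accurate and dull. As the error grows, the list reshuffles even though nothing underneath has changed, and that reshuffling is what gets mistaken for news.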

14 comments:

gwern said...

> It turned out that clients kind of liked it back in the bad old days when market research firms held a wet finger up to the breeze and from that divined that their client was a creative genius whose new ad would revolutionize the toilet paper business forever.

I'm reminded of a brilliantly sarcastic bit by Shalizi (http://vserver1.cscs.lsa.umich.edu/~crshalizi/weblog/698.html) in discussing the poor research practices which lead to this sort of constant flip-flopping and irreproducible results:

> ...Let me draw the moral [about publication bias]. Even if the community of inquiry is both too clueless to make any contact with reality and too honest to nudge borderline findings into significance, so long as they can keep coming up with new phenomena to look for, the mechanism of the file-drawer problem alone will guarantee a steady stream of new results. There is, so far as I know, no _Journal of Evidence-Based Haruspicy_ filled, issue after issue, with methodologically-faultless papers reporting the ability of sheeps' livers to predict the winners of sumo championships, the outcome of speed dates, or real estate trends in selected suburbs of Chicago. But the difficulty can only be that the evidence-based haruspices aren't trying hard enough, and some friendly rivalry with the plastromancers is called for. It's true that none of these findings will last forever, but this constant overturning of old ideas by new discoveries is just part of what makes this such a dynamic time in the field of haruspicy. Many scholars will even tell you that their favorite part of being a haruspex is the frequency with which a new sacrifice over-turns everything they thought they knew about reading the future from a sheep's liver! We are very excited about the renewed interest on the part of policy-makers in the recommendations of the mantic arts...

Anonymous said...

http://www.businessweek.com/articles/2013-11-27/mexicos-surprising-engineering-strength

TheLRC said...

Here's one way to make 'boring' PISA results more interesting to the average reader: cross-reference them with the data on student happiness reported in this article, which show that those smartyboots Koreans are also very unhappy.

What's even more interesting, though, is seeing who sits just a couple of places above the Koreans -- the Finns! Yes, Finnish children, in spite of their idealistic, laid-back educational system, are only barely happier on average than the gloomy, pressure-ridden Koreans!

I found it most interesting to see that the NE Asian societies were scattered all over the chart on measures of happiness in school. Hong Kong, where I've lived for many years, ranks pretty high, surprising me a bit. The pressure on schoolkids here is likely not quite as intense as in Korea, but I suspect it's not far off. And yet the kids don't seem to hate school.

This jibes with my own experience. My daughter attends a local HK school noted for being demanding (and she does indeed have lots of homework and high expectations), and after six years there, she could not love it more.

Anonymous said...

Hey, Vertigo finally beat Citizen Kane!!

Make it Jeanne Dielman and the film media will go wild.

median reader said...

I have to concur with The Media on this one-- the new spate of PISA posts is really boring.

DWBudd said...

Here in France, the release of the PISA results has set off quite an amusing sequence of chest-beating. Though it was pushed off the front pages of the local news today by the report that Francois "Le Naufrage" Hollande underwent TURP in 2011 and did not tell anyone about it, the "chute des écoles françaises" (the fall of French schools) remains a hot topic.

Media complaints about the relative stability of the PISA results reveal just how ignorant most people are about measurement.

Put simply, year-on-year stability is a feature of a well-developed scale, not a bug.

In psychometrics, as virtually anyone reading this blog knows, one desirable property of a scale is test-retest reliability (cf. 2010 FDA guidance on scale development). In the absence of some explainable, physical change, a scale should yield more or less the same results. Does anyone think that, three years on, education systems or students really changed in the US? If not, then why on earth ought there to be a large change in the outcome of the test?

Imagine, if you will, that you weighed yourself at night, got into bed, and then, upon waking, weighed yourself again. If the scale showed that you had lost 10 kg, would you think the scale was reliable?

Unless you are a psychologically challenged young actress, I'm guessing not.

Steve Sailer said...

Yes, but a reality TV show pitch about psychologically challenged young actresses whose weight changes dramatically overnight almost sells itself.

Anonymous said...

If Mexers do work that Americans feel is beneath their dignity, isn't that a form of 'racism'?

If the work is so debasing that even poor Americans without jobs should not be expected to take them, why make Mexers take them? Are they dogs who should be given inhuman labor?

irishman said...

"There's a lot of money to be made off of constant churn."

http://isteve.blogspot.ie/2013/07/the-extended-stay-american-dream.html

I've been reading Steve for years and this is his finest insight. Glad to see him returning to the theme. He could make the above quote a special area of study and sell his findings to hedge funds and corporations and make millions. But hopefully he won't because I like my daily Isteve...

Thank you for your work Steve, I really appreciate it.

John Mansfield said...

Some health magazine listed a city where I lived as one of the top ten healthiest or fattest or something, so the local paper put out a little article about the magazine's list. I looked up the previous year's list and found that not a single top ten city from the previous year was still a top ten city. It appeared to actually be a list of ten more cities where we'd like people to know our magazine exists.

Anonymous said...

> We'd report in the 1980s: In a one year test of identically matched panels of 5,000 households in Eau Claire and Pittsfield, neither new commercial A nor B was associated with a statistically significant increase in sales of Charmin versus the matched control group that saw the same old Mr. Whipple commercial you've been showing for five years. If you don't believe us, we'll send you all the data tapes and you can look for yourselves. Ho-hum. It was pretty amazing that we could turn the real world into a giant laboratory (and this was 30 years ago). But after a few years, all this accuracy and realism got boring.

Even The Frankfurt School realizes this now.

They still give lip-service to their Gramscian program of poisoning the culture, but they know damned well that most political affinities are burned into the blood at conception, and, at this point, they've pretty much maxed out the returns which they are seeing from their salting of the cultural well of Western Civilization.

That's why we no longer see Grand Theories of Everything coming out of the DEMs, and why it's now "Chicago Way" Pritzker family crime syndicate payoffs & bribery and Emanuel family crime syndicate "dead-fish" voter intimidation & voter suppression, with, of course, GOTV ["Get Out The Vote"] uber alles.

Obviously you see this in the push for Amnesty and citizenry-replacementism.

But once you know what to look for, you'll realize that The Frankfurt School is now doing this EVERYWHERE.

In particular, The Frankfurt School turned the software engineering of healthcare.gov on its head, in order to force Obamacare applicants to first surrender all of their personal data before any plan pricing would be revealed to them:

Create an Account
Apply
Pick a plan
Enroll


They need the personal data upfront so that they can hit the applicants with either:

1) Pritzker-esque purchasing of their votes every election day from now until the end of time, or else

2) Emanuel-esque dead-fish voter-suppression tactics, especially as regards siccing Schulman and Lerner and the rest of their Frankfurt School IRS thugs on any potential GOP voters who make the mistake of creating an account but then NOT purchasing a plan.

[And remember, if those GOP voters can somehow be made into felons by the IRS, then they will lose their right to vote in most states.]

BTW, if you read Matt Labash's new piece this week, then you'll learn that they are already using "microtargeted lists" in an attempt to maximize Obamacare account-creation and its luring of potential new DEM voters out of the shadows.

The good news from the Labash piece, however, is that they seem to be failing pretty miserably at it.

Ed said...

In Steve's market research example, what is going on is that the key executive at the Big Company really wants to hire the advertising firm to run the new commercial. He doesn't want to hire the advertising firm to boost sales of the Big Company's products. He wants to hire the advertising firm because of some sort of conflict of interest that he has kept hidden -- his brother-in-law is a principal, or he hopes to get his son hired there, or whatever.

The market research firm in this example is hired to provide a justification plausible to the other executives at Big Company, who do not have a conflict of interest with the advertising firm, to pay for the new commercial. That is the job of the market research firm and I think most market research firms understand this. Steve's firm may have thought that they were hired to do actual research. I doubt they lasted long.

Luke Lea said...

So it's all tabloid all the time now? Wonder if there might be a market for a truly boring but accurate newspaper -- a paper you read for information not entertainment? WSJ used to be a little like that.

Luke Lea said...

@ Steve - "Yes, but a reality tv show pitch about psychologically challenged young actresses whose weight changes dramatically overnight almost sells itself."

here it is!