December 4, 2013
How can PISA claim to fairly test in 65 countries in dozens of languages?
My vague hunch is that modern Item Response Theory testing, of which the PISA test's Rasch Model is an example, allows testers to say, much like movie directors of sloppy productions: "We'll fix it in Post."
You tell me that during the big, expensive action scene I just shot, the leading man's fly was open and in the distant background two homeless guys got into a highly distracting shoving match? And you want to know whether we should do another take, even though we'd have to pay overtime to 125 people?
"Eh, we'll fix it in Post."
Modern filmmakers have a lot of digital tricks up their sleeves for rescuing scenes, just as modern psychometricians have a lot of computing power available to rescue tests they've already given.
For example, how can the PISA people be sure ahead of time that their Portuguese translations are just as accurate as their Spanish translations?
Well, verifying that ahead of time is expensive and raises security problems. But when the results come in, they can notice that, say, smart kids in both Brazil and Portugal who scored high overall did no better on Question 11 than kids who scored poorly on the other questions, which suggests the Portuguese translation of Question 11 might be ambiguous. Oh, yeah, now that we think about it, there are two legitimately right answers to Question 11 in the Portuguese translation. So we'll drop #11 from the scoring in those two countries. But in the Spanish-speaking countries this anomaly doesn't show up in the results, so maybe we'll count Question 11 for those countries.
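To make that kind of after-the-fact check concrete, here is a rough sketch in Python. It is not PISA's actual procedure: instead of Rasch fit statistics it uses a simpler classical stand-in, the item-rest correlation (how well getting one question right tracks the score on all the other questions), computed separately for each translation. The function names, the 0.1 threshold, and the tiny response tables are all made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either list has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def item_rest_correlation(responses, item):
    """Correlate one item's 0/1 scores with each student's score on the other items."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_scores)

def flag_items(responses_by_group, threshold=0.1):
    """For each group, list the item indexes whose item-rest correlation is below threshold.

    The threshold is an arbitrary illustrative cutoff, not a PISA rule.
    """
    flagged = {}
    for group, responses in responses_by_group.items():
        n_items = len(responses[0])
        flagged[group] = [
            i for i in range(n_items)
            if item_rest_correlation(responses, i) < threshold
        ]
    return flagged

# Hypothetical toy data: one row per student, one 0/1 column per question.
# In the "portuguese" table, the question at index 2 is answered correctly
# about as often by weak students as by strong ones.
toy_responses = {
    "portuguese": [
        [1, 1, 1, 1],
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [1, 0, 0, 0],
        [0, 0, 1, 0],
        [1, 1, 0, 1],
    ],
    "spanish": [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
        [1, 1, 1, 1],
    ],
}

flagged = flag_items(toy_responses)
print(flagged)
```

On this toy data the question at index 2 gets flagged for the Portuguese group and nothing gets flagged for the Spanish group, which is the pattern described above: the same question is dropped under one translation and kept under another.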
This kind of post-hoc flexibility allows PISA to wring a lot out of their data. On the other hand, it's also a little scary.