November 7, 2013

Explaining the Flynn Effect?

The "Flynn Effect," the name invented by Richard Herrnstein and Charles Murray in The Bell Curve for the phenomenon documented most thoroughly by James Flynn of rising raw scores on IQ tests, remains perhaps the most important (and technically daunting) conundrum in psychometrics.

Many worthy explanations have been offered, but we can use another one. 

One clue might be that the Flynn Effect tends to be largest on those types of IQ tests that seem designed by Mr. Spock-like aliens or robots, such as the Raven's Matrices, that tour de force in minimalist test design from the late 1930s.

Raven's Matrices
The more broad-based Wechsler brand of IQ tests was introduced in the same era. On this, we see a wide disparity in magnitude of the Flynn Effect by subtests. 

I adapted the table below from Flynn's 2007 book What Is Intelligence? On Wechsler Intelligence Scale for Children subtests, raw score gains from 1947 to 2002 on the general information, arithmetic, and vocabulary subtests were small. But they were quite large on the more Raven's-like subtests, along with the high-concept Similarities subtest:

Information: +2 (IQ Gain in Points, 1947-2002)
Example: On what continent is Argentina?

Arithmetic: +2 point gain
If a toy costs $6, how much do 7 cost?

Vocabulary
What does "debilitating" mean?

Comprehension
Why are streets usually numbered in order?

Picture Completion
Indicate the missing part from an incomplete picture.

Block Design
Use blocks to replicate a two-color design.

Object Assembly
Assemble puzzles depicting common objects.

Coding
Using a key, match symbols with shapes or numbers.

Picture Arrangement
Reorder a set of scrambled picture cards to tell a story.

Similarities
In what way are "dogs" and "rabbits" alike?
(Answer key: 2 points for "mammals," 1 point for "four-legged," and 0 points for "I wuv them.")

The last item deserves a separate explanation, but it's not hard to see that the first four subtests, on which the Flynn Effect has been restrained, are qualitatively different from the next five, on which it has been dramatic. All else being equal, more recent children, who grew up with an abundance of complex toys and electronic devices, would seem more likely to ace subtests five through nine. Robert Gordon said life is an IQ test, and life may well have become more like an IQ test, thus making it better training for taking IQ tests.

This pattern may help explain why kids these days don't seem all that hep when you try to talk to them about Grandma's debilitating hemorrhoids, but they are whizzes with their MyFace and Tweeter.


Anonymous said...

so what is it when some are better than others at identifying and comprehending those "rules"?

Life is life, la la, la la la said...

Isn't this just a (perhaps more detailed) restatement of Flynn's own explanation for the effect?

(I thought Google got rid of these painful captchas.)

Anononmous said...

d e b i l l i t a t i n g. Debilitating.

Anonymous said...

“The increasing capacity of societies to detect and explicitly utilize rules as a function of the Flynn effect may be related to increasing rule exposure via mass education and to ‘ways of thinking’ endemic to cognitive modernity (Flynn, 2009)."

Uh-oh. What could possibly go wrong here?

Anononmous said...

but (pre-Internet) if your high school library didn't have a subscription to National Review like mine did, you'd be at a disadvantage compared to me.

Doesn't that make you more disadvantaged? No internet? No computers? No ipads? You were on the wrong side of the "Digital Divide" that promotes the inequality gap.

The Z Blog said...

I find this to be fascinating. As a kid I got in trouble for scoring too high on a standardized test. I think it was the Iowa Test, but I no longer recall precisely. My punishment was getting tested a million different ways by school psychologists. My poor parents thought they had a serial killer on their hands.

One of the ways I entertained myself was to figure out the rules of the test. Other kids trapped in the same net did the same things as we would talk about it. I never thought that was important until reading this story. Thank you Steve.

My question is how much does it matter if you know there is an underlying methodology that can be reverse engineered? If you don't know that, you will not look for it. Retesting may not matter. If this is explained, then retesting results in sharp improvement?

Anonymous said...

I'm really confused as to how this isn't the easiest to prep for and the least "g-loaded". Aren't you just memorizing simple facts here?

Luke Lea said...

Quite a story if the kid turns out to be right. In fact quite a story anyway.

Anonymous said...

it is indeed a brilliant article - he nailed it & stuck the ending, as they say in the olympics. this is something that test publishers should've figured out long ago. they had all the data to do it for years. yet it takes someone looking at the data in a fresh way. bravo!

Mr. Rational said...

This is fascinating. Could it possibly explain the underperformance of far Eastern societies compared to the measured intelligence of their members? If following rules (necessary for highly formalized societies) uses different abilities than deriving unknown rules and constructing new frameworks of rules within other rules (e.g. science and engineering), it could account for some otherwise inexplicable currents of history.

Eric Rasmusen said...

Has IQ as measured by a culture-free test become a worse predictor of things like who you marry or what income you have, since 1960?

RS said...

Nice arguments, important work I think. I had no idea Raven's had relatively poor reproducibility on retest. I had thought highly of it because I knew it had just about the tightest correlation with /g/. Now I think I'll tend to substantially discount results relying on it. A test that can be gamed is just a sorry instrument, unless you can carefully and fully specify why, in a certain subfield of interest, it's not such a terrible thing.

I've seen some gameable tests in my time, old boy -- I'm talking achievement tests not aptitude. I smacked a certain honorary high school exam -- incidentally not the Am. HS Math Exam -- because it was just plain low-budget. It was multiple choice of course, and I could easily see all the loose psychological ends hanging out left and right: oh, they're trying to psych me out with this, that . . . between high IQ, psychological insight, and extensive general knowledge, I could get a hook into almost every item. My score vs my actual helplessness in the subject was hilarious.

Later I took the biochem GRE -- also pure multiple choice (five options per question if memory serves). That son of a gun is like a brick wall. Having a high IQ will do very little for you -- once test day arrives, that is. (Incidentally it's quite the pleasure cruise: three hours with heavy time pressure.)

My o-chem prof's multiple choice sections were probably 'worse' than the biochem GRE... a brick wall slanted back at you. The man was a wizard, it's like he could make you /want/ to get tricked. Looking at the graded test afterward you just shook your head. And naturally there's the impulse to ask yourself hey was that one kind of a trick question -- but it really wasn't true. The questions were clean, simple, clear, 'classical' as could be -- /not/ bizarre and aberrant little exceptions -- it's just that they were tiny masterpieces of deflection of IQ and general knowledge. They even deflected 'light' and middling knowledge of o-chem itself -- he wanted to know if you were seeing the Platonic Realities of o-chem.

Anonymous said...


There is a post today on the blog of Columbia U mathematician Peter Woit about the new research program... Jonathan Rosenberg (DNA sequencing technology developer) and MIT physicist Max recruit 400 elite physicists and mathematicians and identify their mathematical genius genes.

Robert Plomin is on record as being a very big supporter of this. Fields Medalist Curtis McMullen declined to be one of the 400. Berkeley mathematician Michael Hutchings doesn't have a high opinion of this research program.

I think you know what my views on the matter are. So I won't say anything else. Go have a look at Peter Woit's blog.

Bill Blizzard and his Men

Anonymous said...

"Wechsler as an admission exam for Manhattan four-year-olds with $40k to burn annually on kindergarten "

Let's say you could make a list of every possible test. Which 10 or so tests would you choose to advance the children of legacy rich kids while keeping out most of the kids of the annoying tiger parents?

What you could do is use sampling to administer the tests to various groups and then choose the tests the legacy rich kids did best at relative to the tiger parents' kids. Then give that collection of tests a fancy name backed by an Ivy League shrink. Then use that test for entry until too many tiger parents' kids pass it.

Alternatively, encouraging tiger parents' kids to try to scam the test by studying specifically for the test might be considered useful, as it will waste the kids' time unproductively and make the kids subservient to authority. See John Taylor Gatto for details.

Horace Staccato said...

Number 5!! 5 is the answer. The two little WHITE triangles!! I will be FIRST in line at the lunchroom!!

pat said...

The major points of this posting must be correct because they seem so obvious. Not that I knew all this before but that's the way it is. Some correct ideas profoundly alter the way you see things thereafter but most - like these - seem like you knew them all along. I take that as evidence that these notions are more or less correct.

I tend to overestimate how smart I am. When the Rubik's Cube came out I was eager to get one and solve it. Much to my surprise I couldn't. Then I learned that all those people who had claimed to have solved it had just memorized the algorithm. I suppose that that also takes some smarts but I wasn't interested in just learning how someone else had been smart.

The Raven's Matrices look a lot like Rubik's Cube to me. I tried one of the online versions last year. It's easy to see how if you had access to a short list of principles you could 'ace' such a test.

Gene Kelly couldn't sing much nor could Sinatra dance. But Sinatra danced up a storm in one of the sailor movies. What you didn't see was that it took 'Ol' Blue Eyes' a month of retakes and a month of editing to look good on screen.

Real life is like a real dance audition. You have to do it right now. That's what we intuitively mean by intelligence. Someone in a new situation spontaneously develops a new solution. That's also why true improv theater is so awe inspiring.


James Thompson said...

Thank you very much for posting on this work

Anonymous said...

This sounds too Oniony.

Anonymous said...

Why any discussion of 'India' is really bogus. There isn't any one country that could be characterized as 'India'... just like Brazil and of course increasingly the US, which should be called DUS or disunited states.

Indians talk more about what must be done than the Chinese do.. but they do less. Old habits die hard.

Fernandinande said...

The Z Blog said...One of the ways I entertained myself was to figure out the rules of the test.

I entertained myself on school 'fill in the oval' tests by making patterns or letters with the answer dots. (I'm pretty sure that's why I got stuck in a semi-tard English class, circa 1968, after being the first kid in the school to finish the self-paced reading program, though perhaps all the English classes were tard-ish).

Anonymous said...

How much of 'reason' is really rearson(reared to reason that way)?

The kid is obviously smart, and smart kids seek recognition and praise.
If that praise is attached to associating with the likes of Al Gore, his intelligence will gravitate toward that position.

I wonder how many smart Libs really thought their way to their convictions or were simply raised that way.

Anonymous said...

The question here is about testing the quality of a human mind. Clearly some minds are better than others, and clearly some cultures are better than others. It is the combination, the mix of the two that determines someone's capabilities. We are testing the capabilities of both the nature and the nurture of a human mind.

In a previous thread it was asked "have the Eskimos ever produced a Newton?" Clearly Newton lived in a cradle of knowledge unknown to the Eskimos. And if Newton was tested on the qualities of snow, a ten year old Eskimo would best him. It is both mind quality and knowledge that counts.

Why are we testing - what is our real goal - what are we looking for - what is the payoff for society? The answer is - creativity - we are looking for minds that can create answers to questions - and individuals that can feed our minds with new wonderful entertaining creations.

Our eyes and ears take in information that our mind stores. It is the capacity to rearrange that information in a new useful manner that manifests creativity. What is most important for a culture is the power of someone to concentrate their mind on relevant previously unconnected information and then bring forth a new idea.

Newton or Eskimo - no matter the intellectual environment - it is concentration that counts. Can there be a test for “concentration that leads to creativity?” Can this be done in a rational fair-minded way? Brains grow at different rates, and do not mature until their early twenties.

p.s. Children who play computer games developed skills of concentration - but about killing? The future is computer programs that develop both knowledge and concentration.

Anonymous said...

In view of the "footbridge dilemma" it seems as though the bomb dropped on Nagasaki was aptly named.

Anonymous said...

Steve, your explanation for the Flynn effect is a description of Marshall McLuhan's children of the digital age.

Jack Amok said...

So a big part of the Flynn effect is really just the increase in Asperger's?

Anonymous said...

it is indeed a brilliant article - he nailed it

One of the most densely written articles I have read. And completely unnecessarily so. It's not like the basic premise is some sort of quantum leap. It was all reasonable but should have been expressed better.

Anonymous said...

I had always assumed the Flynn effect was uniform over the various sub-tests. I didn't know there were drastic differences among them.

Flick said...

In comparing dogs and rabbits, do you get 3 points for saying that they're both four-legged mammals that are sometimes adopted as pets?

jody said...

this little dude's blog can't be real. a 15 year old is writing this stuff?

Silver said...

"p.s. Children who play computer games developed skills of concentration - but about killing? "

It helps develop proficiency and confidence in problem-solving, not killing. How many computer geeks can even throw a punch? Some surely can, but it's not what they're known for.

Silver said...

"I'm really confused as to how this isn't the easiest to prep for and the least "g-loaded". Aren't you just memorizing simple facts here?"

General knowledge questions can be prepped for by reading a lot and thereby increasing the likelihood that what you have studied shows up on a test, but the post was specifically discussing the detection of rules rather than addressing prepping more broadly. When it comes to general knowledge you typically either know the answer or you don't. Occasionally the multiple choice options can tip you off or trigger a memory that improves the odds of guessing correctly but this is really stretching the definition of 'rule detection.'

Jehu said...

We have IQ test numbers from lots of people who served in the military back in WWII and the Cold War. Has anyone done a follow-up study on them? My observation is that the people that the US WWII military considered +1 to +2 sigma still strike me as +1 to +2 sigma and so forth.

Alice said...

The game Set! is almost exactly a practice for Raven's. There, they explicitly tell you the acceptable rules, so you can be taught what you should be looking for.

rob said...

One doesn't see this sort of exceptional performance in the Indians who grind out spelling bee victories.

Possibly these distinctions would be useful:

1) Ability to follow a rule.

2) Ability to pick which rule to follow

3) Ability to figure out new and useful rules.

By rules I'm also including noticing patterns. Grinds are very good at 1. Give 'em a list of vocab to study, and they will study the sh*t outa that list. They're not very good at figuring out what's a good idea. Regular smart people are pretty good at 1 and good at 2. People who are extremely good at 3 are geniuses. I can learn Fourier transforms by studying hard, but Fourier didn't study the sh*t outa a book on Fourier transforms. He found very useful rules where no one expected them.

Contrasting grinding Indian spelling-bee winners and Elijah makes clear the importance of filtering out grinds for top schools.

theyoushow said...

To me, IQ tests of any kind are only a judge of how a person takes tests rather than a correct judgment of a person's real intelligence.

There are geniuses out there who cannot take tests, but who have demonstrated their super-intelligence through life and life experiences, yet there are those who score high on IQ tests and I wouldn't want them to be on a desert island with me.

A test is a test is a test, and nothing shows us a human being more than just being present with that human being and observing how the person handles crisis, everyday life, and real problems, and how they use their intelligence to peacefully, creatively solve those real problems without creating more problems.