November 7, 2013

Has a 15-year-old explained the Flynn Effect?

10-year-old Elijah Armstrong wins
2008 Marin County Spelling Bee
The "Flynn Effect," the name invented by Richard Herrnstein and Charles Murray in The Bell Curve for the phenomenon documented most thoroughly by James Flynn of rising raw scores on IQ tests, remains perhaps the most important (and technically daunting) conundrum in psychometrics.

Many worthy explanations have been offered, but we can use another one. And the brand new paper from Elijah Armstrong (see picture at right) and Michael Woodley is a standout.

One clue might be that the Flynn Effect tends to be largest on those types of IQ tests that seem designed by Mr. Spock-like aliens or robots, such as the Raven's Matrices, that tour de force in minimalist test design from the late 1930s.

Raven's Matrices
The more broad-based Wechsler brand of IQ tests was introduced in the same era. On this, we see a wide disparity in magnitude of the Flynn Effect by subtests. 

I adapted the table below from Flynn's 2007 book What Is Intelligence? On Wechsler Intelligence Scale for Children subtests, the raw score gains from 1947 to 2002 on the general information, arithmetic, and vocabulary subtests were small. But they were quite large on the more Raven's-like subtests, along with the high-concept Similarities subtest:

Information
+2 (IQ Gain in Points, 1947-2002)
Example: On what continent is Argentina?

Arithmetic
+2 point gain
If a toy costs $6, how much do 7 cost?

Vocabulary
+4
What does "debilitating" mean?

Comprehension
+11
Why are streets usually numbered in order?

Picture Completion
+12
Indicate the missing part from an incomplete picture.

Block Design
+16
Use blocks to replicate a two-color design.

Object Assembly
+17
Assemble puzzles depicting common objects.

Coding
+18
Using a key, match symbols with shapes or numbers.

Picture Arrangement
+22
Reorder a set of scrambled picture cards to tell a story.

Similarities
+24
In what way are "dogs" and "rabbits" alike? 
(Answer key: 2 points for "mammals," 1 point for "four-legged," and 0 points for "I wuv them.")

The last item deserves a separate explanation, but it's not hard to see that the first four subtests, on which the Flynn Effect has been restrained, are qualitatively different from the next five, on which it has been dramatic. All else being equal, more recent children, who grew up with an abundance of complex toys and electronic devices, would seem more likely to ace subtests five through nine. Robert Gordon said life is an IQ test, and life may well have become more like an IQ test, thus making it better training for taking IQ tests.
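To put rough numbers on that contrast, here is a minimal Python sketch that averages the gains for the two groups of subtests. The figures are copied from the table above; the grouping into "first four" and "next five" follows the paragraph above, and the variable and function names are just illustrative.

```python
# IQ-point gains, 1947-2002, copied from the WISC subtest table above.
gains = {
    "Information": 2,
    "Arithmetic": 2,
    "Vocabulary": 4,
    "Comprehension": 11,
    "Picture Completion": 12,
    "Block Design": 16,
    "Object Assembly": 17,
    "Coding": 18,
    "Picture Arrangement": 22,
    "Similarities": 24,
}

# Dicts preserve insertion order, so slicing the key list follows the table.
subtests = list(gains)
first_four = subtests[:4]   # Information through Comprehension
next_five = subtests[4:9]   # Picture Completion through Picture Arrangement


def average_gain(names):
    """Return the mean gain, in IQ points, over the named subtests."""
    return sum(gains[name] for name in names) / len(names)


print(average_gain(first_four))  # 4.75
print(average_gain(next_five))   # 17.0
```

On these figures the second group gained more than three times as much as the first, and Similarities, at +24, sits above both averages.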

This pattern may help explain why kids these days don't seem all that hep when you try to talk to them about Grandma's debilitating hemorrhoids, but they are whizzes with their MyFace and Tweeter.

James Thompson blogs at Psychological Comments:
Flynn effect as a retesting, rule-based gain 
It is very good to see a paper which takes a large scale effect, the secular rise in intelligence test results, and links it to an intriguing large scale explanation. A new contribution to understanding the Flynn Effect is to be found in the journal Learning and Individual Differences, which became available 30th October: 
Elijah Armstrong and Michael Woodley  
“The rule-dependence model explains the commonalities between the Flynn effect and IQ gains via retesting.”

And here is an uncorrected proof of Woodley and Armstrong's upcoming paper.

Woodley is a prominent young psychologist, now at the U. of Umea in Sweden.

There's young and then there's young. Elijah Armstrong is a 15-year-old who lives in Marin County, California. Above is the Marin News' picture of him winning the county's spelling bee for elementary school students. In the picture he is a fifth-grader at age 10 -- that was slightly less than five years ago. He's been working on his rule-dependence model of the Flynn Effect since early 2012.

Here's Elijah's blog

Thompson continues:
Armstrong and Woodley argue that the Flynn effect is partly driven by the retest effect, whereby familiarity with the test material means that if you can learn a rule of thumb you can solve those particular sorts of problems when you see them again, without having to use much intelligence.

Civilization is a system for conservation-of-cognition.
In very simple terms, the test wears out quickly once you get to learn how it works. Using implicit learning and working memory, test takers learn how to solve rule dependent problems, which leads to apparent IQ gains which are partly independent of general intelligence. 
As readers of this blog will know, the ultimate IQ test is the one for which no-one knows the answers at the moment. Intelligence tests in the real world are more modest affairs. Raven’s Matrices is a test based on progressions: you need to find the rule which underlies the visible changes in the problem arrays, and a good enough memory to hold in mind how those changes are progressing, so that you can correctly choose the final missing picture. ... Carpenter et al. (1990) found that 5 rules covered all the items in the test. Once you know that, it is less of a test.

I suspect the more "culture fair" a test is (such as the Raven's Matrices), the more you can test prep for it. The less you can effectively test prep for an IQ subtest, such as vocabulary or information on the Wechsler Children's IQ test, the more culturally biased it is. For instance, I read a huge amount of William F. Buckley in 9th and 10th grades, which helped my vocabulary no end, but (pre-Internet) if your high school library didn't have a subscription to National Review like mine did, you'd be at a disadvantage compared to me.
Another aspect of being test savvy is the capacity to de-contextualise, that is, to be able to generalise about types of problem, without being confused by the particular context in which the specific example is presented to you.

For example, the "trolley problem" appeals to high IQ individuals good at de-contextualizing -- i.e., not asking a lot of stupid questions about how, exactly, do you push a fat man to his death to stop a runaway trolley. Instead, you should recognize that it's a question about consequentialism v. deontology and therefore only focus upon the details that the questioner wants you to focus upon.

Personally, the older and dumber I get, the more I enjoy "re-contextualizing" -- taking abstract ideas and considering them in light of empirical realities. But, re-contextualizing tends to drive smart people crazy.
Armstrong and Woodley assert that, from the point of view of intelligence, education amounts to a vast re-testing enterprise. There are modest gains from rules of thumb, mnemonics and being “taught to the test”. Indeed, the reliance on exam results makes teachers and pupils confederates in ensuring that nothing is taught which is not taught to be examined. Incidentally, this view does not exclude what James Flynn calls “scientific spectacles” which more people now adopt when solving problems. 

On average, kids in 2002 had watched a lot more nature documentaries on TV than kids in 1947 had, so scientific concepts like "mammals" are more common.
Armstrong and Woodley rank tests according to how much “cognitive scaffolding” they have. Raven’s Matrices is Level IV: rules are very helpful, only a few of them are required; Cattell Culture Fair is Level III: rules help, but will not help you on many items; the majority of IQ tests [e.g., Wechsler, but I don't think they used it because it's an oral test and they stuck to paper-and-pencil tests -- I may be wrong here] are Level II: very many rules are required, but working out which to use is difficult (and selecting the rule is what requires general intelligence); and the Draw A Man test is Level I: no rule is of much help.
They then simply correlate the vector of the position of any particular test in the rule dependence typology with the vector of the size of the Flynn effect on that test. A positive correlation would indicate that tests that were more dependent upon rules were yielding the larger Flynn effects. They tested it on 14 data sets, and found a correlation of 0.6.

r = 0.6 isn't huge, but it's a lot better than a sharp stick in the eye.
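For readers who want to see what that kind of calculation looks like, here is a minimal Python sketch of a plain Pearson correlation between a test's rule-dependence level (I through IV, coded 1 through 4) and the size of its Flynn gain. The numbers below are invented purely for illustration; they are not the paper's 14 data sets, and the paper's exact procedure may differ from this simple version.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL data, made up for this example only.
# Rule-dependence level of each test (1 = Level I ... 4 = Level IV).
levels = [1, 2, 2, 2, 3, 3, 4, 4]
# Made-up Flynn gains for the same tests, in IQ points per decade.
gains_per_decade = [1.0, 2.5, 1.5, 3.0, 2.0, 4.0, 3.5, 5.0]

r = pearson_r(levels, gains_per_decade)
print(f"r = {r:.2f}")
```

A positive r here would mean the same thing it means in the paper: the more rule-dependent tests are the ones showing the bigger gains.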
The authors say: “It is proposed that tests like the Raven’s are only highly g loaded when encountered initially — even basic familiarity with the rules and heuristics on a test, or familiarity with inductive reasoning itself, has the potential to radically diminish the g loading of this test over time, both under controlled conditions (such as in a retesting scenario) and over larger societal time scales (i.e., across generations in the case of the Flynn effect).” 

To me, the Raven's looks as sinister as its Edgar Allan Poe-like name implies. But, with some practice I could probably get the hang of it. In contrast, if you tested me on a random sample of vocabulary words drawn from, say, Dr. Johnson's Dictionary, I'd jump right in, but would only get slightly better as I went along.

(There's a separate issue that many IQ tests have, in practice, a limited question bank. So, if you practiced enough on old tests you'd eventually hear all the words in, say, the Wechsler's vocabulary subtests. But, in theory, that wouldn't be a problem. As Bruce Charlton pointed out, the use of the Wechsler as an admission exam for Manhattan four-year-olds with $40k to burn annually on kindergarten has become increasingly gamed because the WISC is intended as a clinical test for diagnostic purposes, not as a gatekeeper exam to select among the children of the most ambitious parents this side of Seoul.)
They continue: “The increasing capacity of societies to detect and explicitly utilize rules as a function of the Flynn effect may be related to increasing rule exposure via mass education and to ‘ways of thinking’ endemic to cognitive modernity (Flynn, 2009).”
This is a good paper. It contains lots of ideas, proposes a theory and then tests it, and draws out the conclusions in a thoughtful way. Not content with linking the observed phenomenon with the Flynn Effect and life speed theory, it also includes 5 testable predictions, to encourage other researchers to test whether their proposal has merit. It is a notable debut for the first author, whose first paper this is, and whose ideas formed the basis for the eventual publication.
Postscript 
Elijah lives in Marin County, California, and is interested in philosophy and intelligence research. He originated the rule-dependence model in early 2012 and worked on it for eighteen months thereafter. He claims his conscientiousness is below the 10th percentile. He is also prone to end all his emails saying “Excuse typos, I typed this with my feet”. If you imagine that he is a sad old man gathering up a lifetime of scholarship into a well-honed rant, your imagination would be wrong. Elijah is 15. 

Here's Armstrong and Woodley's abstract:
We present a new model of the Flynn effect. To wit, we propose that Flynn effect gains are partly a function of the degree to which a test is dependent on rules or heuristics. This means that testees can become better at solving ‘rule-dependent’ problems over time in response to changing environments, which lead to the improvement of lower-order cognitive processes (such as implicit learning and aspects of working memory). These in turn lead to apparent IQ gains that are partially independent of general intelligence. We argue that the Flynn effect is directly analogous to IQ gains via retesting, noting that Raven's Progressive Matrices is particularly sensitive to both the effects of retesting and the Flynn effect. After an extensive review of the relevant supporting literature, we test our thesis by developing a rule-dependence typology and then correlate the vector of a test's position in the typology with the vector of the Flynn effect that it yields. We find a significant vector correlation of r = ~.60 (N = 14). Finally, we make a number of novel and testable predictions based on our model.

For some readable background on the Flynn Effect, here's my 2007 review of Flynn's What Is Intelligence?

38 comments:

Anonymous said...

so what is it when some are better than others at identifying and comprehending those "rules"?

Life is life, la la, la la la said...

Isn't this just a (perhaps more detailed) restatement of Flynn's own explanation for the effect?

(I thought Google got rid of these painful captchas.)

Anononmous said...

Debilitating.
d e b i l l i t a t i n g. Debilitating.

James Thompson said...

Thanks!

Anonymous said...

“The increasing capacity of societies to detect and explicitly utilize rules as a function of the Flynn effect may be related to increasing rule exposure via mass education and to ‘ways of thinking’ endemic to cognitive modernity (Flynn, 2009)."

Uh-oh. What could possibly go wrong here?

Anononmous said...

but (pre-Internet) if your high school library didn't have a subscription to National Review like mine did, you'd be at a disadvantage compared to me.

Doesn't that make you more disadvantaged? No internet? No computers? No ipads? You were on the wrong side of the "Digital Divide" that promotes the inequality gap.

The Z Blog said...

I find this to be fascinating. As a kid I got in trouble for scoring too high on a standardized test. I think it was the Iowa Test, but I no longer recall precisely. My punishment was getting tested a million different ways by school psychologists. My poor parents thought they had a serial killer on their hands.

One of the ways I entertained myself was to figure out the rules of the test. Other kids trapped in the same net did the same things as we would talk about it. I never thought that was important until reading this story. Thank you Steve.

My question is how much does it matter if you know there is an underlying methodology that can be reverse engineered? If you don't know that, you will not look for it. Retesting may not matter. If this is explained, then retesting results in sharp improvement?

Anonymous said...

"Information
+2 (IQ Gain in Points, 1947-2002)
Example: On what continent is Argentina?"

I'm really confused as to how this isn't the easiest to prep for and the least "g-loaded". Aren't you just memorizing simple facts here?

Luke Lea said...

Quite a story if the kid turns out to be right. In fact quite a story anyway.

panjoomby said...

it is indeed a brilliant article - he nailed it & stuck the ending, as they say in the olympics. this is something that test publishers should've figured out long ago. they had all the data to do it for years. yet it takes someone looking at the data in a fresh way. bravo!

Mr. Rational said...

This is fascinating. Could it possibly explain the underperformance of far Eastern societies compared to the measured intelligence of their members? If following rules (necessary for highly formalized societies) uses different abilities than deriving unknown rules and constructing new frameworks of rules within other rules (e.g. science and engineering), it could account for some otherwise inexplicable currents of history.

Eric Rasmusen said...

Has IQ as measured by a culture-free test become a worse predictor of things like who you marry or what income you have, since 1960?

RS said...

Nice arguments, important work I think. I had no idea Raven's had relatively poor reproducibility on retest. I had thought highly of it because I knew it had just about the tightest correlation with /g/. Now I think I'll tend to substantially discount results relying on it. A test that can be gamed is just a sorry instrument, unless you can carefully and fully specify why, in a certain subfield of interest, it's not such a terrible thing.

I've seen some gameable tests in my time, old boy -- I'm talking achievement tests not aptitude. I smacked a certain honorary high school exam -- incidentally not the Am. HS Math Exam -- because it was just plain low-budget. It was multiple choice of course, and I could easily see all the loose psychological ends hanging out left and right: oh, they're trying to psych me out with this, that . . . between high IQ, psychological insight, and extensive general knowledge, I could get a hook into almost every item. My score vs my actual helplessness in the subject was hilarious.

Later I took the biochem GRE -- also pure multiple choice (five options per question if memory serves). That son of a gun is like a brick wall. Having a high IQ will do very little for you -- once test day arrives, that is. (Incidentally it's quite the pleasure cruise: three hours with heavy time pressure.)


My o-chem prof's multiple choice sections were probably 'worse' than the biochem GRE... a brick wall slanted back at you. The man was a wizard, it's like he could make you /want/ to get tricked. Looking at the graded test afterward you just shook your head. And naturally there's the impulse to ask yourself hey was that one kind of a trick question -- but it really wasn't true. The questions were clean, simple, clear, 'classical' as could be -- /not/ bizarre and aberrant little exceptions -- it's just that they were tiny masterpieces of deflection of IQ and general knowledge. They even deflected 'light' and middling knowledge of o-chem itself -- he wanted to know if you were seeing the Platonic Realities of o-chem.

Anonymous said...

Steve

There is a post today on the blog of Columbia U mathematician Peter Woit about the new research program...Jonathan Rothberg (DNA sequencing technology developer) and MIT physicist Max Tegmark...to recruit 400 elite physicists and mathematicians and identify their mathematical genius genes.

Robert Plomin is on record as being a very big supporter of this. Fields Medalist Curtis McMullen declined to be one of the 400. Berkeley mathematician Michael Hutchings doesn't have a high opinion of this research program.

I think you know what my views on the matter are. So I won't say anything else. Go have a look at Peter Woit's blog.

Bill Blizzard and his Men

Anonymous said...

"Wechsler as an admission exam for Manhattan four-year-olds with $40k to burn annually on kindergarten "

Let's say you could make a list of every possible test. Which 10 or so tests would you choose to advance the children of legacy rich kids while keeping out most of the kids of the annoying tiger parents?

What you could do is use sampling to administer the tests to various groups and then choose the tests the legacy rich kids did best at relative to the tiger parents' kids. Then give that collection of tests a fancy name backed by an Ivy League shrink. Then use that test for entry until too many tiger parents' kids pass it.

Alternatively, encouraging tiger parents' kids to try to scam the test by studying specifically for it might be considered useful, as it will waste the kids' time unproductively and make them subservient to authority. See John Taylor Gatto for details.

Sword said...

A quick glance at Armstrong's blog shows that he has The Audacious Epigone on his blogroll. Heartening that young people also think along our lines.

Horace Staccato said...

Number 5!! 5 is the answer. The two little WHITE triangles!! I will be FIRST in line at the lunchroom!!

pat said...

The major points of this posting must be correct because they seem so obvious. Not that I knew all this before but that's the way it is. Some correct ideas profoundly alter the way you see things thereafter but most - like these - seem like you knew them all along. I take that as evidence that these notions are more or less correct.

I tend to overestimate how smart I am. When the Rubik's Cube came out I was eager to get one and solve it. Much to my surprise I couldn't. Then I learned that all those people who had claimed to have solved it had just memorized the algorithm. I suppose that that also takes some smarts but I wasn't interested in just learning how someone else had been smart.

The Raven's Matrices look a lot like Rubik's Cube to me. I tried one of the online versions last year. It's easy to see how if you had access to a short list of principles you could 'ace' such a test.

Gene Kelly couldn't sing much nor could Sinatra dance. But Sinatra danced up a storm in one of the sailor movies. What you didn't see was that it took 'Ol' Blue Eyes' a month of retakes and a month of editing to look good on screen.

Real life is like a real dance audition. You have to do it right now. That's what we intuitively mean by intelligence. Someone in a new situation spontaneously develops a new solution. That's also why true improv theater is so awe inspiring.

Albertosaurus

James Thompson said...

Thank you very much for posting on this work

Anonymous said...

http://www.youtube.com/watch?v=g7MCSl5ju0E

This sounds too Oniony.

Anonymous said...

http://timesofindia.indiatimes.com/home/education/news/Mumbai-students-not-as-skilled-as-eastern-zones-in-reading/articleshow/25340434.cms?

Why any discussion of 'India' is really bogus. There isn't any one country that could be characterized as 'India'... just like Brazil and of course increasingly the US, which should be called DUS or disunited states.

Indians talk more about what must be done than the Chinese do.. but they do less. Old habits die hard.

Fernandinande said...

The Z Blog said...One of the ways I entertained myself was to figure out the rules of the test.

I entertained myself on school 'fill in the oval' tests by making patterns or letters with the answer dots. (I'm pretty sure that's why I got stuck in a semi-tard English class, circa 1968, after being the first kid in the school to finish the self-paced reading program, though perhaps all the English classes were tard-ish).

Anonymous said...

http://www.youtube.com/watch?v=XFxjy7f9RpY

How much of 'reason' is really rearson(reared to reason that way)?

The kid is obviously smart, and smart kids seek recognition and praise.
If that praise is attached to associating with the likes of Al Gore, his intelligence will gravitate toward that position.

I wonder how many smart Libs really thought their way to their convictions or were simply raised that way.

Anonymous said...

The question here is about testing the quality of a human mind. Clearly some minds are better than others, and clearly some cultures are better than others. It is the combination, the mix of the two that determines someone’s capabilities. We are testing the capabilities of both the nature and the nurture of a human mind.

In a previous thread it was asked “have the Eskimos ever produced a Newton?” Clearly Newton lived in a cradle of knowledge unknown to the Eskimos. And if Newton was tested on the qualities of snow, a ten year old Eskimo would best him. It is both mind quality and knowledge that counts.

Why are we testing - what is our real goal - what are we looking for - what is the payoff for society? The answer is - creativity - we are looking for minds that can create answers to questions - and individuals that can feed our minds with new wonderful entertaining creations.

Our eyes and ears take in information that our mind stores. It is the capacity to rearrange that information in a new useful manner that manifests creativity. What is most important for a culture is the power of someone to concentrate their mind on relevant previously unconnected information and then bring forth a new idea.

Newton or Eskimo - no matter the intellectual environment - it is concentration that counts. Can there be a test for “concentration that leads to creativity?” Can this be done in a rational fair-minded way? Brains grow at different rates, and do not mature until their early twenties.

p.s. Children who play computer games develop skills of concentration - but about killing? The future is computer programs that develop both knowledge and concentration.

Jim said...

In view of the "footbridge dilemma" it seems as though the bomb dropped on Nagasaki was aptly named.

Sulla said...

Steve, your explanation for the Flynn effect is a description of Marshall McLuhan's children of the digital age.

Jack Amok said...

So a big part of the Flynn effect is really just the increase in Asperger's?

Anonymous said...

it is indeed a brilliant article - he nailed it

One of the most densely written articles I have read. And completely unnecessarily so. It's not like the basic premise is some sort of quantum leap. It was all reasonable but should have been expressed better.

Anonymous said...

I had always assumed the Flynn effect was uniform over the various sub-tests. I didn't know there were drastic differences among them.

Flick said...

In comparing dogs and rabbits, do you get 3 points for saying that they're both four-legged mammals that are sometimes adopted as pets?

jody said...

this little dude's blog can't be real. a 15 year old is writing this stuff?

Silver said...

"p.s. Children who play computer games developed skills of concentration - but about killing? "

It helps develop proficiency and confidence in problem-solving, not killing. How many computer geeks can even throw a punch? Some surely can, but it's not what they're known for.

Silver said...

"I'm really confused as to how this isn't the easiest to prep for and the least "g-loaded". Aren't you just memorizing simple facts here?"

General knowledge questions can be prepped for by reading a lot and thereby increasing the likelihood that what you have studied shows up on a test, but the post was specifically discussing the detection of rules rather than addressing prepping more broadly. When it comes to general knowledge you typically either know the answer or you don't. Occasionally the multiple choice options can tip you off or trigger a memory that improves the odds of guessing correctly but this is really stretching the definition of 'rule detection.'

Anonymous said...

Only 15 eh? Armstrong seems to be a very interesting guy!

Jehu said...

We have IQ test numbers from lots of people who served in the military back in WWII and the Cold War. Has anyone done a follow-up study on them? My observation is that the people that the US WWII military considered +1 to +2 sigma still strike me as +1 to +2 sigma and so forth.

Alice said...

The game Set! is almost exactly a practice for Raven's. There, they explicitly tell you the acceptable rules, so you can be taught what you should be looking for.


rob said...

One doesn't see this sort of exceptional performance in the Indians who grind out spelling bee victories.

Possibly these distinctions would be useful:

1) Ability to follow a rule.

2) Ability to pick which rule to follow

3) Ability to figure out new and useful rules.



By rules I'm also including noticing patterns. Grinds are very good at 1. Give 'em a list of vocab to study, and they will study the sh*t outa that list. They're not very good at figuring out what's a good idea. Regular smart people are pretty good at 1 and good at 2. People who are extremely good at 3 are geniuses. I can learn Fourier transforms by studying hard, but Fourier didn't study the sh*t outa a book on Fourier transforms. He found very useful rules where no one expected them.

Contrasting grinding Indian spelling-bee winners and Elijah makes clear the importance of filtering out grinds for top schools.

theyoushow said...

To me, IQ tests of any kind are only a judge of how a person takes tests rather than a correct judgment of a person's real intelligence.

There are geniuses out there who cannot take tests, but who have demonstrated their super-intelligence through life and life experiences, yet there are those who score high on IQ tests and I wouldn't want them to be on a desert island with me.

A test is a test is a test and nothing shows us a human being more than just being present with that human being and observing how the person handles crisis, everyday life, real problems and how they use their intelligence to peacefully, creatively solve those real problems without creating more problems.