Can sending star teachers into slum schools close the racial gap in school achievement? Can teachers be fairly evaluated by how much their students' test scores went up from last spring to this spring?
Both ideas are very fashionable these days. I want to evaluate both theoretically, using a simple model with two assumptions:
First, star teachers exist, fortunately. Over the course of one year, some teachers can raise their students' test scores by more than one grade level. (There are also dud teachers who can't raise test scores as much as the average teacher can.) In my simplified model, a star teacher is one who raises grade levels by 1.5 years per 1.0 years in the classroom.
Second, the positive impact of star teachers is partly reduced over time by regression toward the mean. After nine months under the guidance of Miss Jean Brodie, the kids are well ahead of the average. But when they come back from summer vacation, they aren't as far ahead anymore. Away from Ms. Wonderful, they've regressed toward the mean. There can be a lot of other causes for regression toward the mean. Perhaps after a second year under Miss Jean, some of the students are bored with her tricks and less intimidated by her shtick. Maybe, especially in math and science, the students start getting closer to their intellectual limits.
So, let's assess both questions about teachers with these two concepts in mind. Let's start with something I've always assumed was a good idea: value-added evaluations of teacher performance.
I've long advocated that teachers should not be evaluated upon how well their students do on standardized tests, since the impact of the teacher is typically overwhelmed in the results by the differences between students. Those kinds of evaluation systems just augment the natural tendency for the best teachers to wind up with the best students, as everybody scrambles to get hired at the schools with the smartest students. Instead, I've argued for "value-added" evaluations of teachers, measuring how much test scores have gone up under the teacher relative to the students' previous scores. The Obama Administration has come around to this view, too.
Now, though, I've developed a worrisome question about measuring teacher performance on value added, something I've always recommended. How do you factor the effects of regression toward the mean into formulas for measuring teacher performance? In the real world, you can't always assume that last year's test scores show how smart each teacher's students are on average. Last year's scores were likely driven up or down by the quality of last year's teacher. The really confusing thing is that students whose test scores were unnaturally depressed by a bad teacher last year are likely to go up more this year than students whose test scores were boosted last year by a very good teacher. That's regression toward the mean.
Let's take a sports coaching example. When I was at Notre Dame High School, our archrival Crespi always killed us in pole vaulting during our annual track meet. In fact, Crespi vaulters set a whole bunch of different national age group and high school year records. That's pretty amazing. Strangely enough, it becomes less amazing when you discover that all three star Crespi vaulters were named Curran. It turns out that the Curran brothers had a pole vault track and pit in their backyard, where their father, who had been a pole vaulter, trained them in advanced pole vaulting techniques.
Here's a one-minute video from a Super Eight home movie from around 1972 of seventh-grader Anthony Curran clearing 9 feet in his backyard. Ever since I read in the 1970s about the Curran family pole vaulting practice ground, I had imagined that they were very rich and had a huge back yard with an Olympic Stadium type set-up, but the video shows it's cramped and ramshackle, and the pit consists of old mattresses right in front of a brick wall. It looks like a good place to break your neck. I'm sure no modern upper middle class mom would put up with Dad and the boys building such a nightmare in the backyard, but Mrs. Curran can be seen waving happily in the home movie as her 13-year-old son hurtles toward his fate.
Not too surprisingly, the Curran Brothers were quite good pole vaulters in college (Anthony Curran, now the pole vault coach at UCLA, has an all-time personal best of 18'-8"), but they weren't the record setters in their subsequent careers that they had been in high school. I don't think any of the Currans ever made the U.S. Olympic team. Regression toward the mean set in as they got older and better natural athletes started to catch up to them in hours of lifetime training.
Say you were the college pole vault coach of the Curran Brothers and the athletic director said to you, "Tim Curran set a world age group record at 15, Anthony Curran set national class year records in high school for sophomore, junior, and senior years. We recruited you the two most accomplished high school vaulters in the history of the top pole vaulting state in the Union. But under your coaching, they aren't even winning college national championships. Why are you failing so badly with all this talent we gave you?"
The true answer is that because the Currans started training so much younger than their current competitors in college, they came closer to fulfilling 100% of their natural potential in high school than anybody else in California did. Now, the other kids are catching up and regression toward the mean is kicking in for the Currans. As high schoolers, the Currans had good nature and exceptional nurture to dominate an obscure sport. By college, they were running into competitors with even better nature, and the nurture gap was closing as all the top competitors got the same amount of coaching in college.
Now, let's think about this in a typical school, where children aren't always fully randomly shuffled after each year. For example, at my elementary school in the 1960s, there were 70 children at each grade level, so they were divided up into the Blue and the Red classrooms. They weren't tracked; they were just randomly assigned at the start. But if you started out as a Blue, you typically stayed in Blue with your closer friends.
Say that the two 1st grade teachers are wildly different in effectiveness. The Blue 1st Grade teacher's students finish the year a half grade level above the average, while the Red 1st Grade teacher's students finish the year a half grade level below average.
Now, if you are a second grade teacher of perfectly average effectiveness, a teacher who can be expected to raise the grade level of an average class by 1.0 years (relative to the average), which class do you want to inherit, Blue or Red, to do best on the teacher effectiveness evaluation at the end of second grade?
Let's say that the great Blue first grade teacher's benefits have a one year half life and the bad Red first grade teacher's harms have a one year half life. In other words, there is regression toward the mean over time in teaching effectiveness, as in so much in life.
If you were being measured not on value added but on simple absolute performance at the end of the grade, you'd want to inherit the Blue class that ended last year 0.5 grade levels above average. If you do an average job and the half life is one year, then they'll finish your year averaging grade level 2.25, which is 0.25 grade levels above average, and you'll be considered a good teacher.
On the other hand, if you are being relativistically measured on value added as calculated by your second graders' grade level at the end of your year minus their grade level at the end of the previous year, you don't want to inherit the star teacher's overachieving Blue class, because you will only get credit for adding a crummy 0.75 grade levels in value. Sure, after two years, they'll be at grade level 2.25, but they were at 1.5 a year ago, so you only get credit for 2.25 - 1.50 = 0.75 grade levels of value added.
Under value added measurement, you might get fired for, in essence, having inherited the better taught class.
Instead, under value added measurement, you want to inherit the underachieving Red class from that bad teacher, so that you can get the credit for her students' inevitable upward regression toward the mean. They'll wind up the year going from 0.5 to 1.75, so you'll get credit for adding the value of 1.25 grades. I'm a star! Give me my bonus money, Arne Duncan, gimme it now!
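The Blue versus Red arithmetic can be sketched in a few lines of Python. This is only an illustration of the toy model as I read it: the function names and the `retention` parameter (the share of last year's gap that survives a year, 0.5 for a one year half life) are my own labels, not anything from a real evaluation formula.

```python
def end_of_year_level(start_level, start_average=1.0, teacher_gain=1.0, retention=0.5):
    """Class grade level at the end of this year under the toy model.

    start_level   -- class grade level at the end of last year
    start_average -- the average grade level at the end of last year
    teacher_gain  -- grade levels this year's teacher adds (1.0 = average teacher)
    retention     -- assumed share of last year's gap that survives the year
                     (0.5 corresponds to a one year half life)
    """
    gap = start_level - start_average            # lead (+) or deficit (-) inherited
    return start_average + retention * gap + teacher_gain


def value_added(start_level, **kwargs):
    """Naive value added: this year's ending level minus last year's ending level."""
    return end_of_year_level(start_level, **kwargs) - start_level


# An average second grade teacher inherits each class in turn.
for name, start in (("Blue", 1.5), ("Red", 0.5)):
    end = end_of_year_level(start)
    print(f"{name}: {start} -> {end}, value added {value_added(start)}")
```

The same average teacher scores 0.75 with the Blue class and 1.25 with the Red class, purely because of what she inherited.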
This model where there is partial regression toward the mean after the impact of superstar teachers has interesting implications for the national obsession with closing the racial gaps in school achievement.
Assume you have an elementary school with average students where every teacher is a star capable of pushing students ahead 1.5 grades each year (a Grade Level Boost of 0.5), all else being equal. If there is zero regression toward the mean, a simple Excel model predicts that when the average student graduates at the end of eighth grade, he's performing at the 12th grade level.
Grade | Grade Level Boost | Reg to Mean | Grade Level |
1 | 0.5 | 0% | 1.5 |
2 | 0.5 | 0% | 3.0 |
3 | 0.5 | 0% | 4.5 |
4 | 0.5 | 0% | 6.0 |
5 | 0.5 | 0% | 7.5 |
6 | 0.5 | 0% | 9.0 |
7 | 0.5 | 0% | 10.5 |
8 | 0.5 | 0% | 12.0 |
On the other hand, if there is 100% regression toward the mean, the average student, after eight years of star teachers, tests at just the 8.5 grade level at the end of 8th grade:
Grade | Grade Level Boost | Reg to Mean | Grade Level |
1 | 0.5 | 100% | 1.5 |
2 | 0.5 | 100% | 2.5 |
3 | 0.5 | 100% | 3.5 |
4 | 0.5 | 100% | 4.5 |
5 | 0.5 | 100% | 5.5 |
6 | 0.5 | 100% | 6.5 |
7 | 0.5 | 100% | 7.5 |
8 | 0.5 | 100% | 8.5 |
The discouraging thing is that the results of regression toward the mean aren't symmetrical: you only get the big boosts in grade level by eliminating the last bits of regression toward the mean, but that's very hard to do.
For example, if the regression toward the mean factor is 50 percent per year, then the average student who has benefited from eight consecutive star teachers leaves the school at the end of the 8th grade performing at just the 9.0 grade level. Eight star teachers in a row have gotten him up only one grade level:
Grade | Grade Level Boost | Reg to Mean | Grade Level |
1 | 0.5 | 50% | 1.5 |
2 | 0.5 | 50% | 2.8 |
3 | 0.5 | 50% | 3.9 |
4 | 0.5 | 50% | 4.9 |
5 | 0.5 | 50% | 6.0 |
6 | 0.5 | 50% | 7.0 |
7 | 0.5 | 50% | 8.0 |
8 | 0.5 | 50% | 9.0 |
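The tables above are consistent with a simple year-by-year rule: whatever lead the student carried into the year shrinks by the regression percentage, and then the star teacher adds another 0.5. Here is a minimal Python sketch of that rule. It is my reconstruction from the table values, not the original Excel sheet, and the names `simulate`, `boost`, and `regression` are made up for illustration.

```python
def simulate(boost=0.5, regression=0.5, years=8):
    """Return the student's grade level at the end of each grade.

    boost      -- grade levels a star teacher adds beyond the normal 1.0 per year
    regression -- assumed fraction of the accumulated lead lost each year (0.0 to 1.0)
    years      -- number of grades spent with star teachers
    """
    levels = []
    lead = 0.0                                   # grade levels ahead of average
    for grade in range(1, years + 1):
        lead = lead * (1.0 - regression) + boost  # old lead decays, new boost is added
        levels.append(grade + lead)
    return levels


for r in (0.0, 0.5, 1.0):
    final = simulate(regression=r)[-1]
    print(f"{r:.0%} regression: grade level {final:.1f} at the end of 8th grade")
# 0% regression: grade level 12.0 at the end of 8th grade
# 50% regression: grade level 9.0 at the end of 8th grade
# 100% regression: grade level 8.5 at the end of 8th grade
```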
So, you can see where the contemporary obsession in the Obama Administration and the prestige press with reducing regression toward the mean comes from: taking away kids' summer vacations, keeping them at school a dozen hours per day (the celebrated KIPP program), and so forth.
Unfortunately, the big gains only come from eliminating the last bits of regression toward the mean. If you can cut regression toward the mean from 50% to 25%, then the average student's grade level at the end of eighth grade increases from 9.0 to 9.8:
Grade | Grade Level Boost | Reg to Mean | Grade Level |
1 | 0.5 | 25% | 1.5 |
2 | 0.5 | 25% | 2.9 |
3 | 0.5 | 25% | 4.2 |
4 | 0.5 | 25% | 5.4 |
5 | 0.5 | 25% | 6.5 |
6 | 0.5 | 25% | 7.6 |
7 | 0.5 | 25% | 8.7 |
8 | 0.5 | 25% | 9.8 |
But, as you can see, in a school of star teachers, reducing annual regression toward the mean from 100% to 25% only boosts grade level upon eighth grade graduation by 1.3 years, from 8.5 to 9.8. In contrast, reducing annual regression toward the mean from 25% to 0% would, theoretically, boost grade level at elementary school graduation by 2.2 years, from 9.8 to 12.0. But, due to diminishing marginal returns, it's probably much harder to reduce regression toward the mean from 25% to 0% than from 100% to 25%.
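One way to see why the payoff is so lopsided is to write the model as a formula. This is a sketch under my own notation, assuming the tables follow the rule that each year's lead is last year's lead shrunk by the regression rate plus the new boost. Here a_n is the student's lead over grade level after n years, b is the annual boost (0.5 above), and r is the annual regression rate; none of these symbols appear in the tables themselves.

```latex
a_n = (1 - r)\,a_{n-1} + b, \qquad a_0 = 0
\quad\Longrightarrow\quad
a_n =
\begin{cases}
  b\,\dfrac{1 - (1 - r)^{n}}{r}, & 0 < r \le 1, \\[1.5ex]
  n\,b, & r = 0.
\end{cases}
```

With b = 0.5 and n = 8, this gives leads of 0.5, about 1.0, about 1.8, and 4.0 for r = 100%, 50%, 25%, and 0%, matching the final grade levels of 8.5, 9.0, 9.8, and 12.0. Unless r is close to zero, the lead is already near its ceiling of b/r by eighth grade, and b/r only takes off as r approaches zero. That is why the last few points of regression are worth more than all the rest.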
Since the white-black gap at the end of high school is three to four years, these regression toward the mean calculations can help explain why there is such a Blind Side-like obsession with plugging holes in the environment where NAM students' regression toward the mean might occur. For example, the NYT Magazine ran a feature on a public boarding school in a poor part of Washington DC where the taxpayers pay $35k per student per year for five nights per week at this boarding school. But the article was heavily devoted to worrying about whether the two nights per week that the students spend at home were causing the presumed test score gains of the five nights in the dorm to regress back toward the black mean.
Of course, the real killer in terms of closing the racial gap by eliminating sources of regression toward the mean is that eventually, these individuals turn into adults whom you can't manipulate so much, and then they choose environments for themselves.