Psychometrics is a relatively mature field of science, and a politically unpopular one. So you might think there isn't much money to be made in making up brand-new standardized tests. Yet, there is.
From the NYT:
U.S. Asks Educators to Reinvent Student Tests, and How They Are Given 
Standardized exams — the multiple-choice, bubble tests in math and reading that have played a growing role in American public education in recent years — are being overhauled.
Over the next four years, two groups of states, 44 in all, will get $330 million to work with hundreds of university professors and testing experts to design a series of new assessments that officials say will look very different from those in use today.
The new tests, which Secretary of Education Arne Duncan described in a speech in Virginia on Thursday, are to be ready for the 2014-15 school year.
They will be computer-based, Mr. Duncan said, and will measure higher-order skills ignored by the multiple-choice exams used in nearly every state, including students’ ability to read complex texts, synthesize information and do research projects.
“The use of smarter technology in assessments,” Mr. Duncan said, “makes it possible to assess students by asking them to design products of experiments, to manipulate parameters, run tests and record data.”
I don't know what the phrase "design products of experiments" even means, so I suspect that the schoolchildren of 2014-15 won't be doing much of it.
Okay, I looked up Duncan's speech, "Beyond the Bubble Tests," and what he actually said was "design products or experiments," which almost makes sense, until you stop and think about it.
Who is going to assess the products the students design? George Foreman? Donald Trump? (The Donald would be good at grading these tests: tough, but fair. Here's a video of Ali G pitching the product he designed -- the "ice cream glove" -- to Trump.)
Because the new tests will be computerized and will be administered several times throughout the school year, they are expected to provide faster feedback to teachers than the current tests about what students are learning and what might need to be retaught.
Both groups will produce tests that rely heavily on technology in their classroom administration and in their scoring, she noted.
Both will provide not only end-of-year tests similar to those in use now but also formative tests that teachers will administer several times a year to help guide instruction, she said.
And both groups’ tests will include so-called performance-based tasks, designed to mirror complex, real-world situations.
In performance-based tasks, which are increasingly common in tests administered by the military and in other fields, students are given a problem — they could be told, for example, to pretend they are a mayor who needs to reduce a city’s pollution — and must sift through a portfolio of tools and write analytically about how they would use them to solve the problem.
Oh, boy ...
There is some good stuff here -- adaptive tests are a good idea (both the military's AFQT and the GRE have gone over to them). But there's obvious trouble, too.
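For the uninitiated: an adaptive test keeps re-estimating how smart you are and serves up the next question at about that difficulty, so every minute of testing time is informative. Here's a minimal sketch of that loop in Python, assuming a one-parameter logistic (Rasch) item model and a crude step-size update -- the real AFQT and GRE use maximum-likelihood scoring over calibrated item banks, so the numbers and function names below are illustrative only.

```python
import math
import random

def p_correct(ability, difficulty):
    # Rasch (1-parameter logistic) model: probability of a correct answer.
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def adaptive_test(true_ability, item_bank, n_items=20):
    """Crude computerized-adaptive-testing loop (illustrative only)."""
    estimate, step = 0.0, 1.0
    remaining = list(item_bank)
    for _ in range(n_items):
        # Pick the unused item whose difficulty is closest to the current
        # estimate: that is where a response is most informative.
        item = min(remaining, key=lambda d: abs(d - estimate))
        remaining.remove(item)
        correct = random.random() < p_correct(true_ability, item)
        # Move the estimate toward the evidence; shrink the step each round.
        estimate += step if correct else -step
        step *= 0.8
    return estimate

random.seed(1)
bank = [d / 10.0 for d in range(-30, 31)]   # difficulties from -3 to +3
print(adaptive_test(true_ability=1.5, item_bank=bank))
```

The design choice that matters is the item-selection line: by always asking a question near the current estimate, the test extracts close to the maximum information per question asked.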
Okay, so these new tests are going to be much more complex, much more subjective, and get graded much faster than fill-in-the-bubble tests? They'll be a dessert topping and a floor wax!
These sound a lot like the Advanced Placement tests offered to high school students, which usually include lengthy essays. But AP tests take two months to grade, and are only offered once per year (in May, with scores coming back in July), because they use high school teachers on their summer vacations to grade them.
There's no good reason why fill-in-the-bubble tests can't be scored quickly. A lot of public school bubble tests are graded slothfully, but they don't have to be. My son took the ERB's Independent School Entrance Exam on a Saturday morning, and his score arrived at our house in the U.S. Mail the following Friday, six days later.
The only legitimate reason for slow grading is if there are also essays to be read, but in my experience, essay results tend to be dubious, at least below the level of Advanced Placement tests, where there is specific subject matter in common. The Writing test that was added to the SAT in 2005 has largely been a bust, with many colleges refusing to use it in the admissions process.
One often overlooked problem with any kind of writing test, for example, is that graders have a hard time reading kids' handwriting. You can't demand that kids type, because millions of them can't. Indeed, writing test results tend to correlate with the number of words written, which is often more of a test of handwriting speed than of anything else. Multiple choice tests have obvious weaknesses, but at least they minimize the variance introduced by fine motor skills.
And the reference to "performance-based tasks" in which people are supposed to "write analytically" is naive. I suspect that Duncan and the NYT man are confused by all the talk during the Ricci case about the wonders of "assessment centers," in which candidates for promotion are supposed to sort through an in-basket and talk out loud about how they would handle problems. In other words, those are hugely expensive oral tests. The city of New Haven brought in 30 senior fire department officials from out of state to be the judges on the oral part of the test.
And the main point of spending all this money on an oral test is that an oral test can't be blind-graded. In New Haven, 19 of the 30 oral test judges were minorities, which isn't something that happens by randomly recruiting senior fire department officials from across the country.
But nobody can afford to rig the testing of 35,000,000 students annually.
Here are some excerpts from Duncan's speech:
President Obama called on the nation's governors and state education chiefs "to develop standards and assessments that don't simply measure whether students can fill in a bubble on a test, but whether they possess 21st century skills like problem-solving and critical thinking and entrepreneurship and creativity."
You know your chain is being yanked when you hear that schoolteachers are supposed to teach "21st century skills" like "entrepreneurship." So, schoolteachers are going to teach kids how to be Steve Jobs?
Look, there are a lot of good things to say about teachers, but, generally speaking, people who strive for union jobs with lifetime tenure and summers off are not the world's leading role models on entrepreneurship.
Further, whenever you hear teachers talk about how they teach "critical thinking," you can more or less translate that into "I hate drilling brats on their times tables. It's so boring." On the whole, teachers aren't very good critical thinkers. If they were, Ed School would drive them batty. (Here is an essay about Ed School by one teacher who is a good critical thinker.)
And last but not least, for the first time, the new assessments will better measure the higher-order thinking skills so vital to success in the global economy of the 21st century and the future of American prosperity. To be on track today for college and careers, students need to show that they can analyze and solve complex problems, communicate clearly, synthesize information, apply knowledge, and generalize learning to other settings. ...
Over the past 19 months, I have visited 42 states to talk to teachers, parents, students, school leaders, and lawmakers about our nation's public schools. Almost everywhere I went, I heard people express concern that the curriculum had narrowed as more educators "taught to the test," especially in schools with large numbers of disadvantaged students.
Two words: Disparate Impact.
The higher the intellectual skills that are tested, the larger the gaps between the races will turn out to be. Consider the AP Physics C exam, the harder of the two AP physics tests: In 2008, 5,705 white males earned 5s (the top score) versus six black females.
In contrast, tests of rote memorization, such as having third graders chant the multiplication tables, will have smaller disparate impact than tests of whether students "can analyze and solve complex problems, communicate clearly, synthesize information, apply knowledge, and generalize learning to other settings." That's a pretty decent description of what IQ tests measure.
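If you want to see the arithmetic behind that, here's a back-of-the-envelope sketch. Assume two groups whose scores are normally distributed with the same spread but means one standard deviation apart (the one-SD figure is an illustrative assumption, not a measurement). The ratio of pass rates is modest at a middling cutoff but explodes as the cutoff moves out into the right tail -- which is exactly where tests of higher-order skills set their cutoffs.

```python
from statistics import NormalDist

# Two hypothetical groups: same spread, means one standard deviation apart.
# (The 1-SD gap is an illustrative assumption, not a measured value.)
group_a = NormalDist(mu=1.0, sigma=1.0)
group_b = NormalDist(mu=0.0, sigma=1.0)

for cutoff in [0.0, 1.0, 2.0, 3.0]:
    pass_a = 1.0 - group_a.cdf(cutoff)   # share of group A above the cutoff
    pass_b = 1.0 - group_b.cdf(cutoff)   # share of group B above the cutoff
    print(f"cutoff {cutoff:+.1f} SD: ratio of pass rates = {pass_a / pass_b:.1f}x")
```

Run it and the ratio climbs from under 2x at a median cutoff to roughly 17x three standard deviations out. Extreme ratios like the AP Physics C one also reflect the groups' very different sizes, but the tail effect is doing most of the work.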
Duncan says that the new tests could replace existing high school exit exams that students must pass to graduate.
Many educators have lamented for years the persistent disconnect between what high schools expect from their students and the skills that colleges expect from incoming freshmen. Yet both of the state consortia that won awards in the Race to the Top assessment competition pursued and got a remarkable level of buy-in from colleges and universities.
... In those MOUs, 188 public colleges and universities and 16 private ones agreed that they would work with the consortium to define what it means to be college-ready on the new high school assessments.
The fact that you can currently graduate from high school without being smart enough for college is not a bug, it's a feature. Look, this isn't Lake Wobegon. Half the people in America are below average in intelligence. They aren't really college material. But they shouldn't all have to go through life branded as high school dropouts instead of high school graduates because they weren't lucky enough in the genetic lottery to be college material.
The Gates Foundation and the U. of California ganged up on the LA public schools to get the school board to pass a rule that nobody will be allowed to graduate who hasn't passed three years of math, including Algebra II. That's great for UC, not so great for an 85 IQ kid who just wants a high school diploma so employers won't treat him like (uh oh) a high school dropout. But, nobody gets that.
Another benefit of Duncan's new high stakes tests will be Smaller Sample Sizes of Questions:
With the benefit of technology, assessment questions can incorporate audio and video. Problems can be situated in real-world environments, where students perform tasks or include multi-stage scenarios and extended essays.
By way of example, the NAEP has experimented with asking eighth-graders to use a hot-air balloon simulation to design and conduct an experiment to determine the relationship between payload mass and balloon altitude. As the balloon rises in the flight box, the student notes the changes in altitude, balloon volume, and time to final altitude. Unlike filling in the bubble on a score sheet, this complex simulation task takes 60 minutes to complete.
So, the NAEP has experimented with this kind of question. How did the experiment work out?
You'll notice that the problem with using up 60 minutes of valuable testing time on a single multipart problem instead of, say, 60 separate problems is that it radically reduces the sample size. A lot of kids will get off track right away and get a zero for the whole one-hour segment. Other kids will have seen a hot-air balloon problem the week before and will nail the whole thing, getting a perfect score for the hour.
That kind of thing is fine for the low-stakes NAEP, where results are only reported for groups with huge sample sizes (for example, the NAEP reports scores for whites, blacks, and Hispanics, but not for Asians). But for high-stakes testing of individual students and of their teachers, it's too random. AP tests have large problems on them, but they are only given to the top quarter or so of high school students in the country, not the bottom half of grade school students.
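The statistics behind that objection are easy to simulate. Here's a toy comparison under made-up assumptions: a student who succeeds 70 percent of the time, sitting either a 60-item bubble test scored as percent correct or a single all-or-nothing hour-long task. Both give the same expected score; the difference is in the noise.

```python
import random
from statistics import mean, stdev

random.seed(42)
P = 0.70          # the student's true per-item (and per-task) success rate: an assumption
TRIALS = 10_000   # simulated test sittings

# Sixty independent one-minute items, scored as percent correct.
many_items = [mean(random.random() < P for _ in range(60)) for _ in range(TRIALS)]

# One sixty-minute all-or-nothing task: you nail it or you get a zero.
one_task = [float(random.random() < P) for _ in range(TRIALS)]

print(f"60 items: mean {mean(many_items):.2f}, spread (SD) {stdev(many_items):.2f}")
print(f"1 task:   mean {mean(one_task):.2f}, spread (SD) {stdev(one_task):.2f}")
```

The single task's standard deviation comes out around eight times larger -- that's what "too random" means in practice.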
It's absurd to think that it's all that crucial that all American schoolchildren be able to "analyze and solve complex problems, communicate clearly, synthesize information, apply knowledge, and generalize learning to other settings." You can be a success in life without being able to do any of that terribly well.
Look, for example, at the Secretary of Education. Arne Duncan has spent 19 months traveling to 42 states, talking about testing with teachers, parents, school leaders, and lawmakers. Yet, has he been able to synthesize information about testing terribly well at all? Has his failure to apply knowledge and generalize learning about testing gotten him fired from the Cabinet?