  Is There a Testing Silver Lining?

Some folks see a silver lining in the testing controversy.  Perhaps they will concede that the Terra Nova and InView testing given in September 2004 was poorly timed and more stressful on children than they would have liked, and they may even admit that the results are not as faultless as they would wish. However, they see a wonderful opportunity to use the test scores we now have in order to evaluate our curriculum.  In other words, they believe that even if the testing was less than perfect, at least we have the chance to truly and impartially judge our curriculum by using the collective test results.  Let’s examine this belief in greater detail.

Here is the dream: using our collective Terra Nova scores and the resulting statistics provided by CTB/McGraw Hill, we can clearly see, for example, that our students are having difficulty evaluating the meaning of reading passages or that they are strong in geometry and spatial concepts.  With specifics like this provided by the test company statistics, the dream is that we can evaluate our curriculum with detachment for the specific changes needed to strengthen learning in our schools.  If the statistics show that our students are weak in comprehension, we simply purchase and use new materials that stress this area and re-test next year.  Objective assessment and real-world response, right?

Not so fast, McDuff.  The lure of numbers presented as an objective measurement of our curriculum is strong for administrators besieged by State requirements and parental desires for excellence, but, as with a moth drawn to a flame, the end result is not what the lure promises.  We must talk about some testing specifics to understand why the lure of these particular numbers must be avoided if we are really interested in excellence.

First, we must be clear on the category of test our students took in September. The Terra Nova test is a norm-referenced test, as opposed to a criterion-referenced test.  It is critical to understand the difference.  Let’s start with the latter.

A criterion-referenced test measures a test taker’s specific knowledge of subjects against a standard.  This category of test renders a score based simply on the number of correct answers, so that a score of 80 percent indicates that 80 out of 100 questions were answered correctly.  A criterion-referenced test can also be standardized and it can be put into a multiple-choice fill-in-the-bubble format. Criterion-referenced standardized tests are subject to many of the same problems as other types of standardized tests, but at least the resulting score indicates that questions relating to specific knowledge were answered correctly or incorrectly.  Criterion-referenced tests can show the level of knowledge or achievement in specific areas for individual test takers, subject to the same issues that plague other types of multiple-choice bubble tests.

Norm-referenced tests are different.  This category of test can only show how test takers scored in comparison to each other.  A score in the 80th percentile indicates that the test taker scored better than 80 percent of all the other test takers (or the statistical prediction of how those other test takers would score), but it says nothing about the actual knowledge of the test taker.  A frequently used example is this: you might be in the 95th percentile for height among those living on your block.  At first glance, you might consider yourself very tall indeed, since you are taller than 95 percent of the people living on your block.  However, it could well be that the majority of people living on your block are around four foot eleven inches tall.  Your percentile tells you nothing about actual height, only relative height.
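For readers who like to see the arithmetic, here is a minimal sketch of the two kinds of score. The function names and the list of heights are made up for illustration; they are not taken from the Terra Nova or from any CTB/McGraw Hill report.

```python
from bisect import bisect_left


def percent_correct(num_correct, num_questions):
    """Criterion-referenced style score: share of questions answered correctly."""
    return 100.0 * num_correct / num_questions


def percentile_rank(value, group_values):
    """Norm-referenced style score: percent of the group falling below `value`."""
    ordered = sorted(group_values)
    below = bisect_left(ordered, value)  # how many group members are strictly below
    return 100.0 * below / len(ordered)


# Hypothetical block of 20 residents, heights in inches.
# Most are around 59 inches (four foot eleven); "you" are the 62-inch resident.
block_heights = [57, 58, 58, 59, 59, 59, 59, 59, 59, 59,
                 59, 59, 59, 60, 60, 60, 60, 61, 61, 62]

my_height = 62
my_percentile = percentile_rank(my_height, block_heights)

print(percent_correct(80, 100))  # 80 of 100 questions right
print(my_percentile)             # relative standing on the block
print(my_height)                 # actual height: only five foot two
```

The percentile comes out high even though 62 inches is not tall by any outside standard, which is exactly the point of the height example: the percentile depends entirely on who else is in the group.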

Norm-referenced tests must be carefully constructed in order to produce the desired results.  What the makers of this category of standardized test wish to produce is a perfect bell curve of relative results, indicating that most test takers are “average” in scoring something close to the 50th percentile. The number of test takers scoring above or below average should taper off as the percentile gets higher or lower.  In order to produce these results consistently, norm-referenced test questions are researched extensively in a search for questions that are typically answered correctly only half the time.  The goal, after all, is to create that big bulge of scores right around the middle 50th percentile.
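One standard statistical reason why half-right questions are attractive to anyone trying to spread test takers apart can be sketched in a few lines. This is a general property of right-or-wrong questions, offered as an assumption about the motivation; it is not a description of how CTB/McGraw Hill actually selects its questions. If a fraction p of test takers answers a question correctly, the spread (variance) that question contributes is p × (1 − p), which is largest when p is one half.

```python
def item_variance(p_correct):
    """Variance of a single right-or-wrong question answered correctly
    by a fraction `p_correct` of test takers."""
    return p_correct * (1.0 - p_correct)


# Hypothetical difficulty levels: 10 percent correct up to 90 percent correct.
difficulties = [i / 10 for i in range(1, 10)]

for p in difficulties:
    print(f"{p:.1f} correct -> variance {item_variance(p):.2f}")

# The question that separates test takers the most.
most_differentiating = max(difficulties, key=item_variance)
print(most_differentiating)
```

A question nearly everyone gets right, or nearly everyone gets wrong, does little to separate one test taker from another, whatever it may reveal about what they know.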

As a result, questions tend to be superficial and sometimes even silly.  Very often it is just plain trickery that is used to differentiate test takers.  Perhaps four questions in a row will ask the test taker to mark the picture that rhymes with a certain word.  The test taker will quickly become accustomed to the format: the given word is “bat”, now fill in the bubble next to the picture of the rhyming “hat”.  On the fifth question, the writers of the questions throw a curve. The test taker is asked to mark the picture with the beginning sound that is the same as the given word, though the visual format of the answers is the same as before.  The word given is “frog”.  The first possible answer is a picture of a “log”, the last possible answer is a picture of a “feather”.  Many test takers will be tricked into selecting the rhyming “log” instead of the correct “feather”.  Sometimes teachers acting as proctors will be unable to resist pointing out to their students that a trick like this is occurring, though the rules prohibit them from doing so. That’s just one more variable introduced and unaccounted for, of course.

As McCaslin, Burross and Good put it in their January 2005 EPAA article Change and Continuity in Student Achievement from Grades 3 to 5:

“Successful test taking is all about reading directions that can (and do) change unexpectedly, resisting the lure of the first familiar and intentionally seductive answer, moving on when confronted by difficulty, not wasting time working the problems through to completion, and keeping one eye on the clock.  It is a considerable leap from student test performance to student learning.”

The Terra Nova test taken by students in our district in September 2004 is a norm-referenced test based on these very principles.  This is the test upon which some would have us base our evaluation of curriculum. Of course, parents and stakeholders in our district have no way of confirming the character of the Terra Nova test for themselves since the test is considered “secure” or secret by CTB/McGraw Hill.  No, you cannot see the questions and, no, they will not release any more information about the test.  The truth is, CTB/McGraw Hill has an exceptional license to deceive schools because of the secrecy of their test.  The company would have us believe that they can create magic: we can have our cake and we can eat it too.  The Terra Nova is now being sold as both a norm-referenced and a criterion-referenced test!  Yes, you’ll have your percentile scores, but the kindly statisticians deep in the secret chambers of the company will also mine the results of the test to provide information on how our children performed in specific subject areas as well. Why, it’s two, two tests in one!

Nice work if you can get it.  The trouble is, the Terra Nova, unlike a true criterion-referenced test, does employ the kind of trickery and seduction noted above, and that makes it a lousy measure of true student knowledge.  It is designed at its core to differentiate, not to objectively evaluate. That brings us full circle.  On the guidance of CTB/McGraw Hill, a company that never found a school budget dollar it didn’t like, our administrators are using a norm-referenced test that tells us nothing about individual student achievement in order to evaluate a curriculum that is ideally supposed to maximize individual student achievement.  Swallowed hook, line and sinker.

Alfie Kohn, author of numerous books on testing and educational policies, has noted:

…support for testing seems to grow as you move away from the students, going from teacher to principal to central office administrator to school board member to state board member, state legislator and governor.

He also describes teachers as having a “dislike/hate” relationship with testing.  For talented teachers who spend every day in our classrooms working with both our students and our curriculum, the dream of using Terra Nova as a basis for curriculum change is more of a nightmare.  They already have plenty of suggestions for improved curriculum standards and many of them already supplement the existing curriculum with materials purchased with their own money in order to effect these improvements. All they desire is a central authority to normalize these suggestions for all district classrooms and to provide some funding.  Administrators and board members, though, want the comfort of supposedly objective numbers and full-color corporate-style reports.  Unfortunately, the numbers provided by the CTB/McGraw Hill company’s digestion of our Terra Nova test results are anything but objective, though the company does have a knack for creating pretty-looking reports.

The truth is, you can put lipstick on a pig, but it is still a pig.  CTB/McGraw Hill can tart up their Terra Nova test and sell it to Readington, but it is still what it is. Even in the unfortunate event that our administration and board members would prefer to use testing instead of teacher input as a means to evaluate curriculum, the Terra Nova is simply the wrong test.

© Copyright 2005, ReadingtonParents.org.  All Rights Reserved