Institutionalized Testing

(Released to the web September 14, 2008)

Have you watched a weather report on television, taken in the stunning graphics, listened to the technical explanations of the forecast model, and then marveled later at how wrong the forecast turned out to be? Why is it that, with all of the sophisticated computer models and high-tech gear meteorologists have these days, it can still be more effective for us to look out the window for an accurate forecast? There is a lesson there.

When similar modeling and forecasting problems are attempted in the realm of human endeavor, predictions are even more difficult. We don't know which candidate will win the election, sometimes even after the election is over! We can't predict stock market swings or business cycles with any consistency. We can't predict the popularity of new movies or music, even though we have a seemingly good handle on popular tastes. Human behavior is observable, but never subject to hard and fast rules.

In the realm of education, measuring or predicting human capacity and knowledge is likewise fraught with problems when we try to apply mathematical or statistical methods. As Linda McNeil put it in The Contradictions of School Reform: "...educating children is complex, inefficient, idiosyncratic, uncertain and open ended." Few who have taught in a classroom would disagree.

Yet, in spite of this seemingly self-evident fact, the higher up we go in the command structure of public education, the more emphasis there is on attempting to measure and predict student learning through the use of norm-referenced standardized tests and mathematical models. Even as more and more private colleges and universities are examining the last twenty years of experience and rejecting their reliance on SAT and ACT entrance tests for admissions, we find that public K-12 schools are pushing harder than ever to define and track students on the basis of a faith in numbers.
In Readington we have spent years trying to fuse student scores on state-required tests with those on various commercial tests, such as the CTB McGraw Hill TerraNova series, in a search for some holy grail to predict student performance. More recently, this search has led our current administration to take unprecedented steps toward institutionalizing certain testing schemes in ways that leave teachers in the trenches no means to quietly resist the onslaught on their professional judgment. With the combination of required state NJASK testing, the commercial NWEA "MAP" testing, and other tests, our administration has put in place a factory-like assessment system which treats students more like products than people.

Linda McNeil and others have noted that such systems lead schools to become "centered on managing and controlling," with both teachers and students disengaging from enthusiastic involvement and falling into a "ritual of teaching and learning that tends toward minimal standards and minimal effort." She notes that when "credentialing", which in our case means test scores and student placement, is emphasized over the true substance of course content, teachers begin to teach in ways that serve this credentialing function but which do not provide students with rich knowledge of the subject or teachers with the opportunity to build their own knowledge of the subject.

The overpowering emphasis on the weight of standardized test scores--the institutionalizing of these numbers--also leads to absurd outcomes which are often hidden from the view of unsuspecting parents. For example, when state NJASK test scores from the 2007-2008 school year were not available in time for Readington administrators to determine next year's student placement in the new middle school math tracks, they had a dilemma.
Their model called for a number to be plugged into the equation to determine whether a student would be placed in the "on grade level" program, an accelerated program, or a remedial track. To fill the hole where they needed a number, they decided to use NJASK scores from the 2006-2007 year--data that was over a year old--instead of the unavailable scores. As a result, a student entering eighth grade was placed on a math track based heavily on a standardized test he or she took in sixth grade. As it turns out, Readington parents (as well as the district itself) have only just received their children's NJASK scores as of this writing, after the start of the school year following the one in which the tests were taken!

While the validity of norm-referenced tests has long been called into question here and elsewhere, the darling test of our current administration, sold by the NWEA organization, is claimed to be different because it is adaptive and therefore thought to be more accurate and insightful. The computerized test is designed to offer more difficult or easier questions depending on the test-taker's previous answers. It isn't hard, though, to poke holes in this test and in a misguided faith in technological wonder.

Adaptive or not, the NWEA MAP tests are still norm-referenced. A norm-referenced test compares individual answers to the answers of everybody else who has taken the test. A criterion-referenced test, by contrast, measures individual answers against a body of knowledge. Norm-referenced tests like the NWEA series and the state NJASK tests are only designed to compare students to other students, not to compare their individual knowledge or skill in a subject to a set standard. The NWEA test analysis then attempts to make statistical inferences about an individual test-taker and his or her abilities based on comparison to other students and assumptions about typical learning.
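For readers who like to see the arithmetic, the difference between ranking and rating can be sketched in a few lines of Python. This is only an illustration under assumed conditions: the class scores, the 40-item test, and the 80 percent cutoff are invented for the example, and the two function names are hypothetical. It does not represent how NWEA or the state actually scores its tests.

```python
from bisect import bisect_left


def percentile_rank(score, peer_scores):
    """Norm-referenced view: percent of peers who scored strictly below `score`."""
    ordered = sorted(peer_scores)
    return 100.0 * bisect_left(ordered, score) / len(ordered)


def meets_standard(correct, total, cutoff=0.8):
    """Criterion-referenced view: did the student get at least `cutoff` of the items right?"""
    return correct / total >= cutoff


# Hypothetical class: every student answered at least 36 of 40 items correctly.
class_correct = [36, 37, 37, 38, 39, 40]
student = 36

rank = percentile_rank(student, class_correct)  # ranked against classmates
mastered = meets_standard(student, 40)          # rated against a fixed standard

print(f"Percentile rank: {rank:.1f}")
print(f"Meets the 80% standard: {mastered}")
```

In this made-up class, the student who answered 90 percent of the items correctly lands at the bottom of the ranking while comfortably meeting the fixed standard, which is the distinction drawn above.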
This analysis presents educators with highly detailed--and sometimes highly dubious--data which is purely a statistical prediction by the psychometricians who designed the analysis. As Alfie Kohn pointed out in his booklet The Case Against Standardized Testing, the objective of these specialists "...is not to evaluate how well the students were taught, but to separate them, to get a range of scores." Norm-referenced tests by necessity contain questions meant to stump students, which is the means by which the test designers create a range of scores for comparison. Adaptive tests merely make this contrived method of creating a median score more efficient. Using norm-referenced tests to judge what knowledge or level of skill a student possesses in a subject is like "measuring temperature with a tablespoon," as testing expert W. James Popham once put it.

Like any standardized test based on multiple-choice answers, the efficacy of the NWEA series also depends on the work of the humans who compose the test. What a committee of test question writers believes to be a more difficult question than another may not be so for your particular child or for our school, our curriculum or our predominant culture. As Peter Sacks, author of Standardized Minds, points out, test designers "...are doing far more than deciding which multiple-choice items to put in the mix. They're making choices, subjective ones at that, about the very structure, pacing, organization, and format of the test. It would be naive, indeed, to believe that the [test designers'] choices on the structure and format of their exams has no effect on the outcome."

Some children may stumble on what is typically considered a low-level skill but have the capacity and demonstrated ability for much higher-level performance in other skills in the same subject area.
A teacher in a classroom can observe this as she interacts with a student on a daily basis, but an adaptive, multiple-choice test will not necessarily pick up on it, which results in misleading data. When a child stumbles on the lower-level question, he may not be given the chance by the computer algorithm to answer the higher-level question afterwards. Since there are very few questions for each subject area, it is easy to see how the test algorithm can be wrong in its automatic judgment based on inference.

Of course, students may not have had a good night's sleep, or they may be irritated at taking yet another test battery, or they may not test well under pressure, or they may be looking at the birds flying around outside the window. Test items may be uninteresting. As Peter Sacks notes, "...test creators seem to adore dry, unimaginative, fact-laden and reading passages--in other words, simply bad and utterly boring writing." A teacher giving a criterion-referenced quiz or working one-on-one can observe and counteract these sorts of problems. A computerized test cannot account for these factors, no matter how adaptive it is or how sophisticated its statistical analysis.

For that matter, it isn't wise to assume that the math used to score high-profile norm-referenced tests is even correct, as there have been numerous examples of mistakes in recent years. Many of these mistakes have led to lawsuits and political finger-pointing. How do we handle scoring errors when they are discovered after the fact, as just recently happened in Illinois--after the scores have been used to modify the path students take in their classroom?
We could go on and on, but the primary point here is that norm-referenced multiple-choice tests, whether adaptive or not, are still limited to ranking students rather than rating them, are still limited by their inflexible, blind statistical analysis, and are ultimately not objective at all, since the composition of test items is based on subjective decision making. Computerized adaptive tests like the NWEA MAP series may actually worsen the ill effects described above because educators come to place their faith in what is a false sense of security. A teacher in a classroom is also subjective, but a teacher in a classroom has the professional judgment and flexibility to make substantive changes in assessment and observation based on the daily behavior and unique circumstances of each student.

Even putting aside these structural flaws of the NWEA testing, what do educators and parents take away from the testing process? Teachers and administrators are given highly detailed information on each student, with references to charts supposedly showing exactly in which areas a student is deficient or advanced. This information is based on statistical inferences, since the number of questions on the test doesn't approach the number needed to provide direct analysis in specific subject areas. Parents are provided with a sheet of information containing a graph of their child's performance over time. The graph simply reflects their child's score relative to other test takers and may be a flat line. Over time and many tests, the graph is supposed to become more useful because it shows progress or lack of progress.

Of course, by the time that much becomes apparent--even assuming that the tests are valid snapshots--how much water will already be over the dam? Our children are not assembly line products that we can take off the line, apply some touch-up paint or do some panel alignment, and return to where they left off.
If six months or a year or two years have passed and we notice a badly trending line on the graph, what will we do then? Classroom teachers with more authentic assessments can make adjustments in real time. If, instead, they wait months or longer for standardized test data, as is the case with state testing, how do they make up that lost time?

The very best outcome from the NWEA testing strategy is that the scores will confirm what the teacher and parents already knew. Far worse, though, reliance on such tests can lead to misdiagnosis, to a false sense of security or alarm, and to a student whose education is permanently marred. For example, if the instant feedback from the NWEA test confirms a teacher's mistaken view of a student's capacity or progress, that student may suffer the consequences of misplacement. Without the false reassurance of the test, however, the teacher would be forced to continually re-evaluate her view and would have a better chance of correcting her mistake in the daily give and take of learning. In another scenario, if a relatively inexperienced teacher is contradicted by test results--even though she is actually correct in her view--she may be inclined to trust the supposedly objective test results over her own judgment.

And that leads us back to the issue with which we began this discussion: the institutionalizing of testing schemes which preclude overriding action by teachers on behalf of individual students. Our administration has taken pains to point out that test scores are just one part of an overall assessment strategy. That sounds reasonable enough on the surface, but below the surface lurks something else. The fact of the matter is that test scores are the primary weight given to math placement in the middle school.
The fact of the matter is that specific language arts skills questionably flagged by NWEA testing are being reinforced because the test diagnosis is described as reliable to young teachers who may not know better. The fact of the matter is that an inordinate amount of time, money and energy has been put into training teachers in the NWEA testing suite and encouraging its use, which implies a particular value system. The fact of the matter is that public presentations about the state of the district and even the vaunted strategic plan put state test scores front and center. By this set of values we are only as successful as our scores are superior to those of other districts. The fact of the matter is that in communication to teachers and parents, the administration has made no secret that test scores from NJASK, NWEA, and some other tests are the primary means of evaluating progress. Actions back up that overarching theme.

When the CTB McGraw Hill TerraNova test suite was implemented some years ago, the administration at the time used it to try to predict which students needed more help to improve state test scores. That was bad enough, and it was soon discredited by stakeholders who were paying attention. Today, with the expensive and all-encompassing NWEA testing and the emphasis on state test scores, our district is going a step further and institutionalizing the reliance on testing.

To think that teachers will be inclined to improve their own assessment skills or to buck the trend under this system is folly. To think that students have not already been mislabeled and mis-tracked under this scheme is also folly. To claim that the NWEA test results can and should be used by teachers to implement differentiated instruction is insulting to anyone who understands what differentiated instruction means. It demands what expert Carol Ann Tomlinson calls "ongoing assessment", which is rooted in daily, real-time observation by an expert teacher.
A misplaced faith in seemingly dazzling technology and the power of statistics should not blind us to the obvious structural flaws in the NWEA and other norm-referenced tests, but that is exactly what is happening. An administration obsessed with data over decision-making by professionals is leading us down a dangerous path.

As parents, all we can do is assertively monitor how our own children are placed and what they are taught, intervening when it is clear that things are being missed or ignored. That isn't easy, nor should it be necessary. It isn't even desirable, because it can lead to mistrust between home and school. Yet we have little choice as the institutionalizing of data-driven ignorance pushes aside common sense, professional judgment and good teaching practice.
© Copyright 2008, ReadingtonParents.org. All Rights Reserved