Ohio Teacher Evaluation System: Dishonest, Unrealistic, and Not Fully Supported by Academic Research

by ajc

I've spent the past three days at an OTES (Ohio Teacher Evaluation System) training. This system is being phased in over the next two years, and will serve as the vehicle by which all teachers in Ohio are evaluated. The workshop culminates with a post-assessment, taken some time after the classes end, resulting in licensure and the ability to evaluate instructional staff.

OTES is described by ODE as a system that will

provide educators with a richer and more detailed view of their performance, with a focus on specific strengths and opportunities for improvement.

Before attending, I talked to a number of administrators and teachers who had already taken the training. Without exception, they were struck by the rigidity of the rubric. I agree, but there's more here. Any system that wields so much power must be realistic, honest, and rooted in the consensus of academic research. The OTES rubric fails this basic test.

Words Matter

Check out the Ohio Standards for the Teaching Profession (starting on page 16) approved in October of 2005. Now look at the OTES rubric. The first thing you will notice is that the OTES rubric has four levels, and that the Ohio Standards only have three. I think it's fair to say that the Ohio Standards did not include the lowest level. (The document says as much.) The top three levels of the OTES Rubric align with the three levels of the Ohio Standards. The snag? The terminology used in the OTES rubric. Proficient has been replaced by Developing, Accomplished by Proficient, and Distinguished by Accomplished. Each level has been relegated!

One might argue that this doesn't matter. But, it does. Teacher evaluations are public record. School performance, or at least the percentage of teachers that fall into each category, will be published. Newspapers will ask for names of teachers and their ratings. And, as we will see as I unpack the rubric in greater detail, the very best teachers are likely to fall into the Proficient category. What's the one relationship between public education and the word Proficient already burned into the minds of parents? The minimal level of performance required to pass the Ohio Graduation Test. Dishonest.

Recognizing Excellence

The OTES rubric is brutal. While the final determination of how many Accomplished ratings are required to achieve an overall Accomplished score is left to the district, it is clear that very few, if any, teachers will ever attain such a rating. The State has accumulated over 100 videos of lessons. Not one represents an Accomplished teacher. Trainers are telling trainees to forget about the Accomplished level when they take their assessment. My trainer stated that she's been told to expect between 0 and 2% of teachers to meet the requirements of this elite rating. (She also said that they've been working this model in her county for the past year and have yet to find a teacher who scores this high.) Unrealistic.

For fun, I looked at some other professions. By my calculations, between 4 and 5% of doctors and lawyers are recognized as "excellent" or the best in their field. (I used Castle Connolly for the doctor information, and Best Lawyers for the attorneys.) What other profession would policy makers have the audacity to humiliate in this way? What other group of professionals would be complicit in such humiliation? (Certainly, some from our ranks were consulted as OTES was developed.)

Academic Research

Here's where it gets complicated. As a Ph.D. candidate in educational psychology, I'm aware of the battles that take place between those from my field and the curriculum specialists. Typically, educational psychologists will cite quantitative research while the curriculum specialists will have less stringent requirements in terms of what is required for validity. However, it is fairly clear that the sort of research that requires experimental design and statistical analysis has not been part of the body of work supporting learning styles and differentiated instruction, yet both are a big part of the rubric.

Learning styles are part of the Accomplished ranking for Assessment Data, all rankings for Knowledge of Students, and both the Developing and Proficient levels of Resources. Learning styles have been debunked (full article here). The primary finding? The vast majority of learning style studies failed to randomly assign subjects, and many that did found evidence that contradicted the principles of the theory. And yet, OTES requires teachers to employ this discredited theory. Not supported.

Differentiated instruction (DI), one of the ten areas of the rubric, is a bit more troublesome. It also shows up in the Accomplished level of Assessment Data, and the Developing, Proficient, and Accomplished levels of Assessment of Student Learning. Nascent in comparison to learning styles, DI is rooted in the work of Carol Tomlinson.

In an early piece, Tomlinson, after dedicating several pages to describing the ways in which teachers have failed to meet the "diverse needs of their students", implores teachers to "consistently, defensibly, and vigorously adjust curriculum and instruction in response to student readiness, interest, and learning profile" (p. 131). One of the three foundational areas within which Tomlinson (originally) suggested differentiation take place is the "learning profile", closely related to the debunked "learning styles". It is my understanding that she has focused more on the other areas of her theory in recent years.

Here's the thing about differentiated instruction. It significantly increases a teacher's workload, and it's not clear that real differentiation, as Tomlinson describes it, is possible or more effective. In fact, Mike Schmoker argues that "it is on no list, short or long, of the most effective educational actions or interventions". If you read Schmoker's piece, you'll learn that he corresponded with Tomlinson via email for a prolonged period of time, asking her to cite "research or strong evidence to support (DI's) widespread adoption". Tomlinson was unable to produce such evidence. Not universally supported.

To be fair, differentiated instruction is complex, and likely hard to assess via traditional experimental designs. Providing materials and support that put each student into their idiosyncratic "Zone of Proximal Development" sounds wonderful. But, to reach the Accomplished level on the OTES rubric, teachers have to demonstrate differentiation at the individual level. This is unrealistic. Why is DI included as a category on the OTES rubric? Why has it been embraced by the (public) education community? It sounds magnificent, and expecting one teacher to meet every need of every student is a lot cheaper than hiring additional staff to negotiate the individual differences teachers face every day.

Differentiation at the Accomplished level of the OTES rubric demands individualization, and this is dangerous. As a professional community, we'd better be careful about condoning this sort of expectation (a single teacher differentiating at the individual level), because while it is not likely that a human can do this sort of thing (see research related to the limits of working memory), technology can. And while Tomlinson's motivations appear honorable, policy makers' intentions are not as clear. Would they hesitate to turn over the education of our children to machines that don't get sick or ask for raises? Maybe.

What Does This Mean? (More Academic Research)

Let's say one buys into the notion that Ohio teachers, as a group, are subpar. What does academic research in the area of motivation tell us we should expect to observe when these "underperforming" individuals are confronted with the new evaluation system? In short, we should expect avoidance and distress, resulting in the abandonment of any desire to apply the rubric in a meaningful way.

Goal Theory

Goal theory is an area of educational research that examines how goals affect learner motivation. Broadly defined, there are three categories of goals: mastery goals, performance goals, and avoidance goals. Mastery goals are the sort of goals that are set when an individual sees the inherent value in a skill or a domain of knowledge, and seeks to understand or attain competence due to this appreciation. Performance goals are selected when the primary driver of learning is to demonstrate competence, rather than to understand for the sake of understanding.

Finally, avoidance goals are selected when an individual lacks the confidence that they are able to complete a task. In these cases, avoiding notoriety is the principal objective. Individuals who face a rubric which, by design, eliminates any possibility of achieving excellence, are likely to avoid confronting this reality using any means necessary. They are unlikely to buy into the system as a means of professional growth. Rather, the chances are great that they will view the system with apprehension, confronting it and its prescriptions only when required to by their administrator.

Unattainable Goals

Isn't it ironic that educators' school years have traditionally begun with the creation of SMART goals, yet the state is requiring those same individuals to be evaluated using a framework whose highest ranking, by the State's own admission, is, for all intents and purposes, unattainable? (The "A" in SMART stands for attainable.)

Carsten Wrosch and colleagues have done a great deal of research on unattainable goals, finding that, "goal disengagement and goal reengagement tendencies can compensate for the distress associated with the occurrence of unattainable goals" (p. 1505). They conclude that unattainable goals are unhealthy and lead to distress. Further, those individuals who successfully cope with unattainable goals do so by giving up and selecting more realistic, though not necessarily related, attainable goals.


Self-efficacy, part of Bandura's social cognitive theory, refers to the belief that one is able to accomplish the task at hand. Similar to the findings of goal theory, Bandura's work suggests that individuals who do not believe that they will be successful, those who have low efficacy as it relates to the task at hand, will avoid such tasks rather than confront their perceived certainty of failure. More precisely, Bandura states:

Self-efficacy judgments, whether accurate or faulty, influence choice of activities and environmental settings. People avoid activities that they believe exceed their coping capabilities, but they undertake and perform assuredly those that they judge themselves capable of managing (Bandura, 1982, p. 123)

These areas of motivational research suggest that the vast majority of teachers, those who are not mastery oriented or who are not supremely efficacious in their pedagogical ability, are likely to look for other avenues to satisfy whatever professional growth aspirations remain after confronting OTES.


There are bad teachers, and those who need support so that they might reach all of their students. Raising the overall quality of instruction is an admirable goal. However, policy makers have overcompensated for their belief that the vast majority of Ohio's teachers are negligent, creating an evaluation tool that is dishonest, not fully supported by academic research, and, in some cases, unrealistic in its expectations.

Administrators have their work cut out for them if they hope to use OTES as a vehicle for professional growth. Teachers will be confronted with a rubric couched in language seemingly chosen to degrade the level of effectiveness attained. Some administrators have suggested that teachers are "just going to have to forget about the Accomplished level, and be content with Proficient". I don't think that will work.