Why We Learn

Thoughts on educational psychology, instructional design, and the integration of technology in educational settings.

A Better OTES Rubric

The OTES rubric is a fairly complicated document that runs several pages in length. As I see it, there are some major challenges to understanding its demands/expectations.

  • it's written in long sentences that stack multiple independent clauses
  • the differences between levels are often incremental and depend on a single word
  • many of the assessment areas refer to multiple standards
  • the document runs many pages, so it's difficult to "see" everything at once

I spent the weekend deconstructing the rubric. My hope was to create something more structured: a document that would allow teachers and evaluators to view expectations more easily. I'm pretty happy with the result of my efforts.

OTES Rubric screenshot

There are some odd page breaks in the actual document. My intent was to create something that could be printed, trimmed, and spliced together using a laminating machine. This is the final result on my office door.

OTES Rubric complete

If you would like to use this, the final document can be downloaded here. Let me know if you find any mistakes.

Ohio Teacher Evaluation System: Dishonest, Unrealistic, and Not Fully Supported by Academic Research

I've spent the past three days at an OTES (Ohio Teacher Evaluation System) training. This system is being phased in over the next two years, and will serve as the vehicle by which all teachers in Ohio are evaluated. The workshop culminates with a post-assessment, taken some time after the classes end, resulting in licensure and the ability to evaluate instructional staff.

OTES is described by ODE as a system that will

provide educators with a richer and more detailed view of their performance, with a focus on specific strengths and opportunities for improvement.

I talked to a number of administrators and teachers who had already taken the training before attending. Without exception, they were all struck by the rigidity of the rubric. I agree, but there's more here. Any system that wields so much power must be realistic, honest, and rooted in the consensus of academic research. The OTES rubric fails this basic test.

Words Matter

Check out the Ohio Standards for the Teaching Profession (starting on page 16) approved in October of 2005. Now look at the OTES rubric. The first thing you will notice is that the OTES rubric has four levels, and that the Ohio Standards only have three. I think it's fair to say that the Ohio Standards did not include the lowest level. (The document says as much.) The top three levels of the OTES Rubric align with the three levels of the Ohio Standards. The snag? The terminology used in the OTES rubric. Proficient has been replaced by Developing, Accomplished by Proficient, and Distinguished by Accomplished. Each level has been relegated!

One might argue that this doesn't matter. But, it does. Teacher evaluations are public record. School performance, or at least the percentage of teachers that fall into each category, will be published. Newspapers will ask for names of teachers and their ratings. And, as we will see as I unpack the rubric in greater detail, the very best teachers are likely to fall into the Proficient category. What's the one relationship between public education and the word Proficient already burned into the minds of parents? The minimal level of performance required to pass the Ohio Graduation Test. Dishonest.

Recognizing Excellence

The OTES rubric is brutal. While the final determination of how many Accomplished ratings are required to achieve an overall Accomplished score is left to the district, it is clear that very few, if any, teachers will ever attain such a rating. The State has accumulated over 100 videos of lessons. Not one represents an Accomplished teacher. Trainers are telling trainees to forget about the Accomplished level when they take their assessment. My trainer stated that she's been told to expect between 0 and 2% of teachers to meet the requirements of this elite rating. (She also said that they've been working this model in her county for the past year and have yet to find a teacher who scores this high.) Unrealistic.

For fun, I looked at some other professions. By my calculations, between 4 and 5% of doctors and lawyers are recognized as "excellent" or the best in their field. (I used Castle Connolly for the doctor information, and Best Lawyers for the attorneys.) What other profession would policy makers have the audacity to humiliate in this way? What other group of professionals would be complicit in such humiliation? (Certainly, some from our ranks were consulted as OTES was developed.)

Academic Research

Here's where it gets complicated. As a Ph.D. candidate in educational psychology, I'm aware of the battles that take place between those from my field and the curriculum specialists. Typically, educational psychologists will cite quantitative research, while curriculum specialists hold less stringent requirements for what counts as validity. However, it is fairly clear that the sort of research that requires experimental design and statistical analysis has not been part of the body of work supporting learning styles and differentiated instruction, yet both are a big part of the rubric.

Learning styles are part of the Accomplished ranking for Assessment Data, all rankings for Knowledge of Students, and both the Developing and Proficient levels of Resources. Learning styles have been debunked (full article here). The primary finding? The vast majority of learning style studies failed to randomly assign subjects, and many that did found evidence that contradicted the principles of the theory. And yet, OTES requires teachers to employ this discredited theory. Not supported.

Differentiated instruction (DI), one of the ten areas of the rubric, is a bit more troublesome. It also shows up in the Accomplished level of Assessment of Data, and the Developing, Proficient, and Accomplished levels of Assessment of Student Learning. Nascent in comparison to learning styles, DI is rooted in the work of Carol Tomlinson.

In an early piece, Tomlinson, after dedicating several pages to describing the ways in which teachers have failed to meet the "diverse needs of their students", implores teachers to "consistently, defensibly, and vigorously adjust curriculum and instruction in response to student readiness, interest, and learning profile" (p. 131). One of the three foundational areas within which Tomlinson (originally) suggested differentiation take place is the "learning profile", closely related to the debunked "learning styles". It is my understanding that she has focused more on the other areas of her theory in recent years.

Here's the thing about differentiated instruction. It significantly increases a teacher's workload, and it's not clear that real differentiation, as Tomlinson describes it, is possible or more effective. In fact, Mike Schmoker argues that "it is on no list, short or long, of the most effective educational actions or interventions". If you read Schmoker's piece, you'll learn that he corresponded with Tomlinson via email for a prolonged period of time, asking her to cite "research or strong evidence to support (DI's) widespread adoption". Tomlinson was unable to produce such evidence. Not universally supported.

To be fair, differentiated instruction is complex, and likely hard to assess via traditional experimental designs. Providing materials and support that put each student into their idiosyncratic "Zone of Proximal Development" sounds wonderful. But, to reach the Accomplished level on the OTES rubric, teachers have to demonstrate differentiation at the individual level. This is unrealistic. Why is DI included as a category on the OTES rubric? Why has it been embraced by the (public) education community? It sounds magnificent, and expecting one teacher to meet every need of every student is a lot cheaper than hiring additional staff to negotiate the individual differences teachers face every day.

Differentiation, at the Accomplished level of the OTES rubric, demands individualization, and this is dangerous. As a professional community, we'd better be careful condoning this sort of expectation (a single teacher differentiating at the individual level), because while it is not likely that a human can do this sort of thing (see research related to the limits of working memory), technology can. And while Tomlinson's motivations appear honorable, policy makers' intentions are not as clear. Would they hesitate to turn over the education of our children to machines that don't get sick or ask for raises? Maybe.

What Does This Mean? (More Academic Research)

Let's say one buys into the notion that Ohio teachers, as a group, are subpar. What does academic research in the area of motivation tell us we should expect to observe when these "underperforming" individuals are confronted with the new evaluation system? In short, we should expect avoidance and distress, resulting in the abandonment of any desire to apply the rubric in a meaningful way.

Goal Theory

Goal theory is an area of educational research that examines how goals affect learner motivation. Broadly defined, there are three categories of goals: mastery goals, performance goals, and avoidance goals. Mastery goals are the sort of goals that are set when an individual sees the inherent value in a skill or a domain of knowledge, and seeks to understand or attain competence due to this appreciation. Performance goals are selected when the primary driver of learning is to demonstrate competence, rather than to understand for the sake of understanding.

Finally, avoidance goals are selected when an individual lacks the confidence that they are able to complete a task. In these cases, avoiding notoriety is the principal objective. Individuals who face a rubric which, by design, eliminates any possibility of achieving excellence, are likely to avoid confronting this reality using any means necessary. They are unlikely to buy into the system as a means of professional growth. Rather, the chances are great that they will view the system with apprehension, confronting it and its prescriptions only when required to by their administrator.

Unattainable Goals

Isn't it ironic that educators' school years have traditionally begun with the creation of SMART goals, yet the state is requiring those same individuals to be evaluated using a framework whose highest ranking, by the State's own admission, is, for all intents and purposes, unattainable? (The "A" in SMART stands for attainable.)

Carsten Wrosch and colleagues have done a great deal of research on unattainable goals, finding that, "goal disengagement and goal reengagement tendencies can compensate for the distress associated with the occurrence of unattainable goals" (p. 1505). They conclude that unattainable goals are unhealthy and lead to distress. Further, those individuals who successfully cope with unattainable goals do so by giving up and selecting more realistic, though not necessarily related, attainable goals.


Self-Efficacy

Self-efficacy, part of Bandura's social cognitive theory, refers to the belief that one is able to accomplish the task at hand. Similar to the findings of goal theory, Bandura's work suggests that individuals who do not believe that they will be successful, those who have low efficacy as it relates to the task at hand, will avoid such tasks rather than confront their perceived certainty of failure. More precisely, Bandura states

Self-efficacy judgments, whether accurate or faulty, influence choice of activities and environmental settings. People avoid activities that they believe exceed their coping capabilities, but they undertake and perform assuredly those that they judge themselves capable of managing (Bandura, 1982, p. 123).

These areas of motivational research suggest that the vast majority of teachers, those who are not mastery oriented or who are not supremely efficacious in their pedagogical ability, are likely to look for other avenues to satisfy whatever professional growth aspirations remain after confronting OTES.


There are bad teachers, and there are those who need support so that they might reach all of their students. Raising the overall quality of instruction is an admirable goal. However, policy makers have overcompensated for their belief that the vast majority of Ohio's teachers are negligent, creating an evaluation tool that is dishonest, not fully supported by academic research, and, in some cases, unrealistic in its expectations.

Administrators have their work cut out for them if they hope to use OTES as a vehicle for professional growth. Teachers will be confronted with a rubric couched in language seemingly chosen to degrade the level of effectiveness attained. Some administrators have suggested that teachers are "just going to have to forget about the Accomplished level, and be content with Proficient". I don't think that will work.

Instructional Efficiency and Learner Involvement

Paas, Tuovinen, van Merriënboer, and Darabi (2005) draw on research related to mental efficiency and the tenets of Keller’s ARCS model of motivation to arrive at a rather startling deduction: the idea that the results of the mental efficiency calculation, when plotted on a Cartesian axis, provide insight into a learner’s involvement as well as the instructional efficiency of a learning experience. The primary assumption underlying their work is that motivation (or involvement), mental effort, and performance are positively related, i.e., if one of these increases, the others do as well.

Paas et al. use the term “instructional involvement” (I) to refer to their motivational construct. The most interesting result of their assumption is that the “neutral” condition for instructional involvement runs perpendicular to the line representing “zero efficiency”. These two lines and the corresponding regions are combined in the image below.

Instructional Efficiency and Learner Involvement

The red area represents the region that, presumably, should be targeted by the instructional designer. But where should one focus, assuming one has the ability to dynamically monitor mental effort and performance? Further, are instructional experiences within one area of this continuum more beneficial based on the goals of instruction? What sort of trade-offs should one expect if choosing to target point A rather than point C? Does point E represent the “best of both worlds”?

Measures of motivation are numerous: self-efficacy, goal orientation, and attribution theory are three of the most prevalent. Paas et al. don’t propose that their construct replaces these ideas, but that it provides an overall measure of involvement. They leave it to researchers of motivation to determine the underlying reasons why a learner’s involvement is at a measured level.
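Both constructs can be expressed in a few lines. The sketch below assumes the usual formulation from Paas et al.: efficiency E = (P − R)/√2 and involvement I = (P + R)/√2, where R and P are z-scores of mental effort and performance. The region labels are my own shorthand for the four areas formed by the two perpendicular lines, not terminology from the paper.

```python
from math import sqrt

def efficiency(z_effort, z_perf):
    # E = (P - R) / sqrt(2): signed distance from the zero-efficiency line
    return (z_perf - z_effort) / sqrt(2)

def involvement(z_effort, z_perf):
    # I = (P + R) / sqrt(2): signed distance from the neutral-involvement
    # line, which runs perpendicular to the zero-efficiency line
    return (z_perf + z_effort) / sqrt(2)

def region(z_effort, z_perf):
    """Classify a learner's (effort, performance) point into one of the
    four regions formed by the two perpendicular lines."""
    e_label = "high" if efficiency(z_effort, z_perf) >= 0 else "low"
    i_label = "high" if involvement(z_effort, z_perf) >= 0 else "low"
    return f"{e_label}-efficiency / {i_label}-involvement"

# A learner reporting below-average effort with above-average performance:
print(region(-0.5, 1.0))  # high-efficiency / high-involvement
```

Note that a learner can land in the low-efficiency / high-involvement region (high effort, modest performance), which is exactly the sort of point the two-construct view distinguishes from simple underperformance.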

Learning From Mistakes

It seems to me that education’s ultimate goal is transfer. Ironically, efficient instruction, a primary aim of instructional design, could prevent one from reaching this goal. An instructional design represents the way in which its creator/designer views, perceives, and/or understands the content of study. Efficient instruction prescribes a narrow (single) path from what the learner knows to what the learner should know. Varying instructional techniques or utilizing complex (dynamic) methods of task selection, while increasing efficiency, is not likely to improve transfer. The path from which the (dynamically-selected) task originates is still narrow and pre-defined by the design.

Tangentially related information, and recollection of experiences (even mistakes), are useful when recalling information. Misperceptions, once corrected, can serve as points of activation (as episodic or autobiographical memories). These experiences and “ways of organizing” are likely beneficial in transfer situations, but they’re idiosyncratic. The teacher or designer’s idiosyncrasies, integrated into an instructional design, are not as likely to be assimilated into a learner’s schemata because the learner is not their “owner”. How do we construct experiences that facilitate the generation of these idiosyncrasies?

Research on feedback suggests that learners pay great attention when their misconceptions are challenged, and even greater attention when they find their internal “calibration”, or their ability to assess their expertise within a domain, to be inaccurate. These situations are more likely to occur when learners are not led stepwise from point A to point B. That is to say that instructional experiences that result in, but then alleviate cognitive dissonance (see Piaget’s disequilibrium) might be more likely to produce diverse and wide-ranging schema.

Much of the work on instructional efficiency has been completed within the field of research related to cognitive load. Cognitive load theory (CLT) prescribes the presentation of learning tasks matching complexity to learner expertise, so as to ensure that working memory capacity is not overloaded during instruction. More advanced studies vary task complexity based on (a) performance, (b) mental effort, or (c) mental efficiency (calculated using the first two values). Randomly presenting problem states, assuming immediate feedback (corrective or explanatory in nature) is provided, may facilitate the construction of more complex schema, consequently increasing performance in transfer situations.
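As a minimal sketch of option (c), efficiency-based task selection might look like the function below. The 1-9 rating scale, the five complexity levels, and the ±1 thresholds are illustrative assumptions, not values taken from any particular CLT study.

```python
from math import sqrt

def next_task_level(level, performance, effort, min_level=1, max_level=5):
    """Pick the next task's complexity from the last task's outcome.
    performance and effort are assumed to be on comparable scales (e.g. 1-9)."""
    e = (performance - effort) / sqrt(2)  # mental efficiency of the last task
    if e > 1:                             # learning with capacity to spare
        return min(level + 1, max_level)  # step up in complexity
    if e < -1:                            # likely overload
        return max(level - 1, min_level)  # step back down
    return level                          # stay at the current complexity
```

Randomly presenting problem states, as suggested above, would simply replace this selection rule with a random draw while keeping the immediate feedback in place.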

Cognitive Load and Instructional Efficiency

Cognitive Load

Cognitive load theory (CLT) is a theory rooted in the idea that working memory is a limited capacity store within which processing occurs. More broadly, cognitive load theory provides evidence for why specific learning supports / designs are efficient. Essentially, this line of research looks at ways in which instructional design elements might facilitate or serve as an impediment to learning. A primary tenet of CLT is that there is a dynamic relationship between learner expertise and the amount and/or type of support that should be provided. There are three types of cognitive load, illustrated graphically below.

Cognitive Load Types

Individuals possess various levels of background knowledge and unique sets of learning strategies. These qualities interact with the content-to-be-learned, resulting in intrinsic load, or load that results from the relative complexity of the content. The use of the word relative is key, as simple content to which a learner has not been exposed is likely to result in high levels of load. Conversely, complex derivations of formulas may produce low levels of intrinsic load for a mathematician.

Extraneous load is used to refer to the load resulting from the instructional design. That is to say that the way in which instruction unfolds might require additional, unproductive processing on the part of the learner. There are a variety of conditions that have been demonstrated to produce such effects, many within the field of multimedia. Cognitive load theory urges the instructional designer to take every precaution to minimize extraneous load.

The third category of load is termed germane load, originating from the idea that if intrinsic load can be decreased (through chunking, the use of advance organizers, etc.), and extraneous load is minimized through wise instructional design (easier said than done, specifically at the individual level), the designer may impose additional load relevant to the topic of study. More precisely, germane load is load that reinforces the construction and automation (automatic processing) of schemas (organized networks of thought).

Instructional Efficiency

Cognitive load theorists have developed a formula for determining “instructional efficiency”. This is a relative measure utilizing (most often) two variables: standardized measures of learner effort (often self-reported on a Likert scale, during either the training phase or the testing phase) and performance. The difference between these two values results in either a positive or negative number. Often, this quantity is divided by the square root of 2 so that it might be plotted as the perpendicular distance from the zero-efficiency line.

Instructional Efficiency
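The calculation can be sketched in a few lines: standardize effort and performance over the pooled sample, then average E = (zP − zR)/√2 within each condition. The sample numbers below are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def standardize(values):
    """Convert raw scores to z-scores across the pooled sample."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def condition_efficiency(effort, performance, conditions):
    """Mean instructional efficiency, E = (zP - zR) / sqrt(2), per condition."""
    z_r = standardize(effort)
    z_p = standardize(performance)
    scores = {}
    for r, p, c in zip(z_r, z_p, conditions):
        scores.setdefault(c, []).append((p - r) / sqrt(2))
    return {c: mean(es) for c, es in scores.items()}

# Invented data: self-reported effort (1-9) and test scores, two conditions
effort      = [2, 3, 8, 9]
performance = [8, 9, 2, 3]
conditions  = ["worked examples", "worked examples",
               "conventional", "conventional"]
print(condition_efficiency(effort, performance, conditions))
```

Because the z-scores are computed over the pooled sample, the condition means sum to zero: a positive value marks an above-average trade-off between performance and effort, a negative value a below-average one.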

It is interesting to note that researchers use the terms “mental efficiency” and “instructional efficiency” interchangeably. For example, Paas et al. (2003) introduce the quantity as mental efficiency, but then use the terms “high-instructional efficiency” and “low-instructional efficiency” when referring to positions on the Cartesian axis, reproduced below for the reader’s convenience.

Instructional Efficiency Graph

This interchangeability indicates the perspective of the researchers, specifically the implicit assumption that the work required to construct various instructional experiences is constant, or of no concern. CLT research provides powerful prescriptions for instructional design, supported by approximately twenty years of academic research. However, as with much of the instructional design research, the needs of their most important audience member, the classroom teacher, are not addressed. More precisely, the “efficiency” of the ID process is not a component of the efficiency equation.

By disregarding the mental effort required to create instructional designs / experiences of varying complexity, researchers miss a chance to provide practitioners with meaningful information. More precisely, the classroom teacher might want to be able to quantify, at least generally, what sort of “payoff” they might expect by devoting additional effort and time to the construction of instructional experiences that comply with the tenets of cognitive load theory.

A Teacher’s Perspective

As stated previously, the entire body of work related to instructional efficiency is focused on the student’s perspective; the goal is to find instructional strategies that produce the greatest gains while decreasing cognitive demands. However, the preparatory (design) work required to construct such experiences, the background knowledge and the corresponding mental effort (and time) required, has been neglected. How might we relate the input (design of instruction) to the output (student performance) of instruction?

A chemistry teacher who must teach students how to write chemical formulas for ionic compounds can pursue this goal in a variety of ways. Assuming they are experts in the domain, they may decide to prepare very little in terms of materials and assessment of students’ background knowledge, relying on lecture, guided practice, and materials from the textbook. This strategy requires very little preparatory time, but mastery is likely to take longer.

Alternatively, the teacher might develop some materials on their own and administer a pre-assessment. Maybe they decide to construct worked examples and partially completed problems, spending a day or so on each as they work towards the guided practice. In this situation, guided practice may begin several days into the unit, rather than on day one as we might expect in the first example.

Finally, the instructor may commit even more time and effort, pre-assessing learners and developing materials, possibly creating color-coded manipulatives and a corresponding activity to use after an introductory lecture, and periodically for reinforcement / remediation as students move on to examine worked examples and partially completed problems. They may continue to assess students’ knowledge periodically throughout the instructional experience in order to tailor instruction to individual needs. We might think of this as the instructor assuming more of the “mental load” or doing more of the “work”, reducing the burden on students.

Cost-Effectiveness Analysis?

The classroom instructor is interested in more than the efficiency of an instructional design. They’re also interested in the design-time to teaching-time ratio (often reported to be very high for complex designs), and the relative efficiency of instruction from the teacher’s perspective. Teachers, it seems, perform an informal cost-effectiveness analysis utilizing these sorts of variables as inputs. Cost-effectiveness analysis, different from cost-benefit analysis which is tied to actual financial costs, was developed by the military and is often used in the health care field. The general formula for determining the cost-effectiveness ratio is:

Cost-effectiveness ratio

Costs for the instructional process might be described by values of “effort-time”: the product of self-reported mental effort (as used by researchers when describing instructional efficiency) and time. Individual measures for effort-time could be determined for the design phase, the instructional phase, and the learning phase. The instructional phase and the learning phase refer to the same period of time but differ in perspective: the instructional phase uses the effort value for the instructor, the learning phase uses the effort value from the learners.

The effects of the instructional process could be represented by student performance. Alternatively, the previously described “instructional efficiency” might serve as a measure of effect, but using this value would result in students’ self-reported mental effort values appearing in both the numerator and the denominator of the equation. An example of the cost-effectiveness ratio for the instructional process, using performance as the measure of effect, is provided below.

Cost-effectiveness for the instructional process

In this equation, ET represents the product of mental effort and time. The subscript “D” represents the design phase, I represents instruction, and L represents learning. It would be interesting to use this conceptualization, or something similar, to evaluate a variety of approaches to classroom instruction – similar to those described in the examples above.
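To make the ratio concrete, here is a sketch of such an evaluation. Every number below is invented, and the comparison (a low-prep lecture unit versus a manipulative-heavy design, echoing the chemistry examples above) is purely illustrative.

```python
def effort_time(mental_effort, hours):
    """ET: the product of self-reported mental effort (e.g. 1-9) and time."""
    return mental_effort * hours

def cost_effectiveness(performance, et_design, et_instruction, et_learning):
    """CE = effect / cost = performance / (ET_D + ET_I + ET_L)."""
    return performance / (et_design + et_instruction + et_learning)

# Low-prep lecture: little design effort, but learners work harder
lecture = cost_effectiveness(70,
                             effort_time(3, 1),   # ET_D: design phase
                             effort_time(5, 5),   # ET_I: instruction phase
                             effort_time(7, 5))   # ET_L: learning phase

# Manipulative-heavy design: the teacher absorbs more of the "mental load"
designed = cost_effectiveness(85,
                              effort_time(7, 6),
                              effort_time(4, 4),
                              effort_time(5, 4))
```

Whether the extra design-phase effort-time “pays off” in the ratio is exactly the empirical question the proposed analysis would answer.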


My question is a practical one: how much time and effort is saved, in the planning and instructional stages, by implementing the tenets of CLT (or any instructional design paradigm for that matter)? As the prescriptions coming from the academic community become increasingly complex, the classroom teacher struggles to keep up. Practically speaking, there is only so much one individual can know and do. It is surprising, and maybe a bit revealing, that instructional designs are not evaluated in this way.

One might argue that good teachers should be striving to reduce cognitive load regardless of any formal or informal cost-effectiveness analysis, or that over time instructors will become better at generating good designs and will accumulate ideal instructional designs for different lessons. These are valid arguments. However, each new class presents a different composition of learners, meaning that although generated instructional materials can be reused, analyses would have to be completed to accurately implement the design.

In the end, the best argument for conducting studies such as the one proposed here is that such studies do not exist. Providing teachers with guidelines related to design paradigms – what they should expect to commit, in terms of time and effort, and the corresponding benefits from implementation – is a legitimate goal and might lead to revisions to designs that make their adoption more plausible in the school environment.