In discussions around education reforms that don’t deliver, three reasons are typically cited for the failure:
The model was not executed faithfully.
The reforms did not have enough time to jell.
Something changed during the rollout – state policies, political climate, funding – that doomed the effort.
A bit of all three rationales has been offered since the Bill & Melinda Gates Foundation acknowledged that its ambitious multi-million-dollar, multi-year attempt to raise student outcomes by improving teacher effectiveness through smarter recruitment, evaluation and compensation flopped.
It turned out that measuring teacher effectiveness, building a rewards system around it and then replicating it across classrooms pose challenges, even with more than a half billion dollars devoted to the task.
Piloted in three school districts and four California-based charter school networks from 2009 to 2016, the Gates Foundation’s Intensive Partnerships for Effective Teaching cost $575 million, a third of which the foundation provided. The rest largely came from local and federal dollars.
In a six-year evaluation of the initiative commissioned by Gates, the RAND Corporation and the American Institutes for Research concluded:
Overall, however, the initiative did not achieve its goals for student achievement or graduation, particularly for low-income minority students. By the end of 2014–2015, student outcomes were not dramatically better than outcomes in similar sites that did not participate in the Intensive Partnerships initiative. Furthermore, in the sites where these analyses could be conducted, we did not find improvement in the effectiveness of newly hired teachers relative to experienced teachers; we found very few instances of improvement in the effectiveness of the teaching force overall; we found no evidence that low-income minority students had greater access than non-low-income students to effective teaching; and we found no increase in the retention of effective teachers, although we did find declines in the retention of ineffective teachers in most sites.
My favorite translation of that summation came from University of Arkansas researcher Jay Greene, who wrote, “You have to slog through the 587 pages of the report and 196 pages of the appendices to find that the results didn’t just fail to achieve goals, but generally were null to negative across a variety of outcomes.”
I did slog through the report. This was a well-funded and comprehensive initiative with school systems and charter networks that wanted it to work. They sought to participate. They invested in its success. They had sufficient resources and expertise on which to draw. Everything was in place for this to succeed. It didn’t.
In explaining what went awry, RAND and AIR suggest:
There are several possible reasons that the initiative failed to produce the desired dramatic improvement in outcomes across all years: incomplete implementation of the key policies and practices; the influence of external factors, such as state-level policy changes during the Intensive Partnerships initiative; insufficient time for effects to appear; a flawed theory of action; or a combination of these factors.
A failure of this massive scale tells me we can’t keep focusing reform hopes on superstar teachers. I would never argue all teachers are created equal. With four kids, I can attest to variations in teacher skill.
But we’ve tried sorting teachers by the numbers for more than two decades, and it just doesn’t work. Student success cannot rest solely on the person in front of the room.
We have to look at what is being taught. We have to consider classroom conditions, and the quality of school, district and state leadership. We have to see where kids are when they begin school. We have to ensure floundering students get the interventions they need.
We’ve pushed for more meaningful teacher evaluations, somehow expecting principals to refine what even major corporations can’t get right. (I wrote two years ago about all the companies backing off annual checklist performance reviews in favor of less regimented, more informal and more frequent feedback.)
In the Gates initiative, principals were supposed to observe teachers in action, but that was not only time-consuming, it was unreliable. We presume classroom observations capture what teachers are doing independent of the students. But the academic caliber of the students sways how teachers are seen. Studies show teachers whose students arrive in the classroom with high academic achievement earn better classroom observation scores than colleagues whose incoming students have lower achievement levels.
The RAND and AIR evaluation also cites the challenges school sites faced in developing relevant professional development, or PD, which many Georgia teachers will testify is often a waste of their time and the district’s money.
The report states:
The sites confronted some challenges in moving toward effectiveness-linked PD systems. They struggled to figure out how to individualize professional development to address performance problems identified during a teacher’s evaluation. Some also found it difficult to develop a coherent system of PD offerings. Although every site implemented an online repository of PD schedules and materials, the repositories were not entirely successful, at least in part because teachers had technical difficulties accessing the online materials.
The report recommends:
Reformers should not underestimate the resistance that could arise if changes to teacher-evaluation systems have major negative consequences for staff employment.
A near-exclusive focus on teacher effectiveness might be insufficient to dramatically improve student outcomes. Many other factors might need to be addressed, ranging from early childhood education, to students' social and emotional competencies, to the school learning environment, to family support. Dramatic improvement in outcomes, particularly for low-income minority students, will likely require attention to many of these factors as well.
The report ends with this thought:
A favorite saying in the educational measurement community is that one does not fatten a hog by weighing it. The IP initiative might have failed to achieve its goals because it succeeded more at measuring teaching effectiveness than at using the information to improve student outcomes. Contrary to the developers’ expectations, and for a variety of reasons described in the report, the sites were not able to use the information to improve the effectiveness of their existing teachers through individualized PD, career ladders, or coaching and mentoring. In the end, the sites were able to measure effectiveness but not increase it.