Tuesday, May 21, 2013 | Last Updated: January 28, 2013 08:32 AM

Education Experts Blog
Rigorous Teacher Evaluations (With Videotape)

By Fawn Johnson
January 14, 2013 | 8:30 a.m.

Two years ago, I sat on the 8th floor of the Watergate building at a National Journal dinner on education. The main attractions of the event were researchers from the Bill and Melinda Gates Foundation, who were about a year into a three-year intensive study on teacher evaluations. As they described their research, the diners were incredulous.

"Teachers let you videotape them?" Yes. They analyzed 13,000 digital video lessons.

"Weren't they upset at you reviewing them?" Actually, the teachers in the study were thrilled to have the feedback. They were happy to go over their videotapes with an observer.

The final report on the project was released last week. It found that teacher effectiveness can, in fact, be measured in a scientific manner. The process is highly labor intensive, and some evaluative factors can't be measured with raw numbers. If you do it right, it requires time, narrative, and observation. The report says the results are the most stable when they rely on a combination of classroom observations (ideally by multiple evaluators), student surveys, and student achievement measures. The report is sprinkled with cautionary notes: Make sure to include prior test scores of students when looking at achievement or their gains will be overstated. Don't weight a single measure too heavily or teachers will lean towards it and neglect others. Make sure to use observers from outside the school.

The biggest impression I gleaned at the dinner from the researchers was respect. This was serious science by serious researchers who genuinely wanted to find an answer. Their work is good news for teachers. American Federation of Teachers President Randi Weingarten virtually crowed in response. "The days of haphazard or check-list observation of teachers must end," she said in a statement. "Teacher evaluation is both an art and a science that requires time, tools, training and trust--ingredients that teachers and principals should have but too often don't."

That said, it's hard to imagine how cash-strapped school districts can implement such a rigorous evaluation system. That's another chapter to be written.

What is most surprising about the Gates findings? What are the easiest ways teacher evaluations can be tweaked to more accurately reflect effectiveness? How important are student perception surveys? What lies ahead for videotaping teachers' lessons? Do we need to learn anything more about measuring student achievement? Is the task laid out by Gates too daunting for schools to handle?

10 Responses

January 25, 2013 8:09 PM

Teacher Evaluation Lesson Plans

By Thomas Toch

Wonky statistical studies like the Gates Foundation's Measures of Effective Teaching project are often soporific. But with some three dozen states and thousands of public school systems crafting new teacher evaluation systems under federal financial incentives, the final report of the 3-year, $45-million Gates investigation is timely.

The study of some 3,000 teachers in Denver, Dallas and several other urban school systems puts to rest the notion that we can’t objectively measure teachers’ contribution to student learning—the thing that matters most in schools. By measuring student performance and then randomly assigning them to teachers in the study the following year, MET researchers, led by Harvard’s Tom Kane, were able to identify who was getting results and who wasn’t.

The study established beyond doubt that the only defensible way to get a fair reading on teacher performance using student test scores is by gauging how students did in the prior year or (preferably) years and comparing that performance to how they do in the year you’re evaluating teachers, what’s called a value-added metric. Simply comparing teachers’ student test scores in a single year (a tactic that some school systems continue to use) yields many very flawed evaluations, the MET researchers report.
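
To make the value-added idea concrete, here is a minimal sketch in Python. It is an illustration only, not the MET researchers' model: it assumes a single prior-year score per student, fits one pooled straight line predicting this year's score from last year's, and treats each teacher's average prediction error as a rough value-added estimate. The function name, the teacher labels, and the scores are all made up for the example.

```python
from collections import defaultdict
from statistics import mean


def value_added(records):
    """Rough value-added estimate per teacher.

    records: list of (teacher, prior_score, current_score) tuples.
    Assumption (not from the MET report): one pooled least-squares line
    predicts current scores from prior scores; a teacher's value-added is
    the mean amount by which their students beat or miss that prediction.
    """
    priors = [p for _, p, _ in records]
    currents = [c for _, _, c in records]
    p_bar, c_bar = mean(priors), mean(currents)

    # Ordinary least-squares slope and intercept for current ~ prior.
    sxx = sum((p - p_bar) ** 2 for p in priors)
    sxy = sum((p - p_bar) * (c - c_bar) for p, c in zip(priors, currents))
    slope = sxy / sxx
    intercept = c_bar - slope * p_bar

    # Collect each student's residual (actual minus predicted) by teacher.
    residuals = defaultdict(list)
    for teacher, prior, current in records:
        residuals[teacher].append(current - (intercept + slope * prior))
    return {teacher: mean(r) for teacher, r in residuals.items()}


# Hypothetical data: teacher A starts with high scorers who gain little,
# teacher B starts with low scorers who gain a lot.
records = [
    ("A", 80, 82), ("A", 90, 91), ("A", 85, 86),
    ("B", 50, 60), ("B", 60, 71), ("B", 55, 64),
]
raw_means = {
    t: mean(c for teacher, _, c in records if teacher == t) for t in ("A", "B")
}
va = value_added(records)
print("single-year averages:", raw_means)
print("value-added estimates:", va)
```

In this toy data, teacher A looks stronger on single-year averages, while teacher B comes out ahead once prior scores are taken into account. Real value-added models use more prior years and more controls than this sketch does.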

But the final MET report also upends the notion that student test scores can defensibly serve as the sole or even the dominant measure of teacher performance. The study found that the more you rely on test scores, the more the ratings fluctuate erroneously from year to year, and the less accurately you can measure teachers’ ability to raise students’ scores on more intellectually demanding tests than those used by most states today. That flaw is going to loom a lot larger as national testing consortia introduce new, tougher tests in the next few years.

Contradicting the claims of some school reformers, the MET researchers found that well-constructed classroom observations can accurately distinguish teachers’ ability to raise student achievement, as can student surveys (which are a new and promising gauge of teacher effectiveness, though one not extensively tested in high-stakes situations where jobs are on the line).

The best strategy—the one that yields the most accurate results—they found, combines classroom observations, student surveys, and measures of student achievement into an evaluation package. Another finding: multiple observations by multiple observers are stronger than a single observation by a single person, or even multiple observations by a single person.

Notably, the MET team combined the three different ways of measuring teacher performance in different configurations and found that weighting observations, surveys and student achievement equally came closest to identifying teachers’ true performance, and did so from one year to the next. (In their report, they expressed confidence in having student test scores count for up to 50 percent of teachers’ evaluations.)
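
As a rough picture of what combining the measures involves, the sketch below puts each measure on a common scale and averages them with equal weights. It is a simplified illustration under stated assumptions, not the MET team's procedure: the z-score standardization, the function names, and every number are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Put one measure on a common scale (mean 0, standard deviation 1)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite(measures, weights=None):
    """Weighted composite score per teacher.

    measures: dict mapping a measure name to a list of per-teacher scores,
              with teachers in the same order in every list.
    weights:  dict mapping a measure name to its weight; defaults to equal
              weights (an assumption made for this example).
    """
    names = list(measures)
    if weights is None:
        weights = {name: 1 / len(names) for name in names}
    standardized = {name: zscores(measures[name]) for name in names}
    n_teachers = len(measures[names[0]])
    return [
        sum(weights[name] * standardized[name][i] for name in names)
        for i in range(n_teachers)
    ]


# Hypothetical scores for three teachers on the three kinds of measures.
measures = {
    "observation": [2.1, 3.0, 3.6],
    "student_survey": [3.2, 3.9, 4.4],
    "value_added": [-0.2, 0.0, 0.3],
}
equal = composite(measures)
# An alternative weighting with test scores counting for half.
half_tests = composite(
    measures,
    {"observation": 0.25, "student_survey": 0.25, "value_added": 0.5},
)
print("equal weights:", equal)
print("50 percent test scores:", half_tests)
```

Changing the `weights` argument is the code-level equivalent of the policy choice described above: how much each measure counts toward a teacher's overall rating.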

The study focused on the statistical advantages of combining observations, surveys and student achievement. But there are other reasons why combining the methods is valuable. One is that only a fraction of teachers teach tested grades and subjects, leaving school districts unable to use student test scores to evaluate upwards of 85 percent of their teachers. Another is that individual teachers (not to mention their unions) are a lot more sympathetic to so-called multiple measures than to test-score-driven evaluations. That’s important because teacher buy-in, districts are learning (often the hard way), is critical to the sustainability of new evaluation systems.

And if you want to use evaluations to improve teachers’ performance rather than merely to remove bad apples—and you should, because removing bad teachers is a necessary but ultimately insufficient way to build a stronger teaching force, especially as the Common Core State Standards ratchet up expectations in every classroom—then you’ve got to use tools that identify things that teachers can improve, something that value-added scores alone don’t do very well. An important step in this direction would be to create new standards for classroom observations (called rubrics) that focus on the content that teachers teach, rather than merely on the way they teach.

Importantly, the MET study and others have found that teachers want to improve; they want help improving their craft. The MET team videotaped the study’s 3,000 teachers at work in their classrooms and the teachers were thrilled with the feedback they got.

Comprehensive evaluation systems of the sort that Kane and his colleagues recommend are pricier than the cursory, drive-by principal visits that pass for evaluations in most school systems, and are more expensive than value-added gauges of teacher performance. The District of Columbia school district spends about $1,000 a year (not counting principals’ time) to evaluate a teacher under one of the most comprehensive systems in the country.

There are ways to reduce these costs without diminishing the accuracy of the ratings (fewer observations for top-performing teachers is one strategy). But you get what you pay for. The MET study suggests that only comprehensive systems dependably measure a teacher’s performance from one year to the next.

And comprehensive models would allow school systems to target other monies a lot more effectively. We spend something like $20 billion a year on teacher professional development and much of it is dreck, of little value to most teachers. Spending money on comprehensive evaluations would help school districts to get a lot more for their professional-development monies by allowing them to target their support to individual teachers’ needs identified through the evaluation process. It makes sense, in other words, to view comprehensive new evaluation systems as an investment, not merely an expense.

January 18, 2013 3:06 PM

A Principal’s Role in Evaluation

By Jean Desravines

The Gates Foundation’s recent Measures of Effective Teaching (MET) results demonstrate that carefully constructed teacher evaluations based on multiple measures can provide useful information to improve a teacher’s craft and allow a principal to more effectively manage talent and serve as an instructional leader. For me, the results highlighted the important role a principal must play for teacher evaluation and support systems to be implemented well.

I found it notable that the use of multiple observers significantly improved the reliability of evaluation results. Pairing one full-length administrator observation with three short peer observations allows schools to invest a similar amount of time in observing a teacher and include teacher leaders in the mix. While these shorter, 15-minute observations were not as reliable as observing a full hour of teaching practice, supplementing the full-length observation with a series of shorter observations by trained peers allowed more people to observe the teacher and, in turn, increased reliability.

In addition to increasing the reliability and therefore credibility of evaluation results, I believe these findings highlight an important reality: to implement evaluations well, we must prepare and empower principals to take responsibility for teacher evaluations and foster the role of the teacher leader.

At the end of the day, principals are responsible for overseeing the successful implementation of evaluation and support systems in their schools. One of the primary ways in which a principal impacts student achievement is by managing talent and improving teacher effectiveness – both key activities when evaluating teachers and providing actionable feedback. Principals are fully responsible for integrating evaluation results into individual growth plans, identifying professional learning opportunities, and making human capital decisions.

By creating opportunities for teachers to take on leadership roles within their school – including observing teacher practice – we not only provide individual teachers with experience practicing adult leadership skills, we also allow principals to build robust leadership teams that make the role of the principal more focused and sustainable.

Great principals build leadership teams and delegate responsibilities in order to effectively manage a school. Principals are being asked to do more with less – using a distributive model to build strong leadership teams allows them to execute on their myriad expectations and engage teachers in ownership of school decisions. By training a cadre of effective teacher observers, principals can tap into the power of a distributive leadership model while developing potential future school leaders.

Through this model, principals free up some of their valuable time to better focus on the full range of their roles as instructional leaders and talent managers. This includes ensuring strong implementation of a school-wide professional development plan that matches the needs of teachers and doing additional observations and feedback as needed to support teacher development. Involving peer evaluators strengthens the instructional leadership responsibilities of the principal; building teacher leadership in a school is a key strategy that principals can deploy to develop pathways for effective teachers and create the necessary school-wide capacity to improve the classroom instruction of all teachers.

The good news for principals taking on the hard work of implementing teacher evaluations is there are many excellent educators ready to help out. Videotaping teacher practice makes distributing this responsibility even easier. Principals can use trained peer observers to observe instruction across schools – not only distributing the workload of observations and providing additional leadership opportunities, but also helping teacher observers improve their own practice. By observing their peers, teachers develop a common language for reflecting on their own instruction and cultivating an improvement cycle. The study also noted that adding observations (for all or a sample of teachers) from external observers can “provide an ongoing check against in-school bias.” It will be important for external observers to work closely with the principal to provide feedback in a timely manner in order to inform professional development and other human capital decisions made by the principal.

The way I see it, empowering principals to develop teacher leaders is critical to strong evaluation systems and strong schools.

New Leaders is committed to leveraging our more than ten years of experience in school leadership to help all principals excel at building an effective teacher corps. To shine a light on this important role of the principal, New Leaders recently released an in-depth report that reinforces the link between strong principals and teachers. Playmakers: How Great Principals Build and Lead Great Teams of Teachers explores both what specific actions principals of high-performing schools take to improve teacher effectiveness and what distinguishes principals who lead high-performing schools from other principals.

January 18, 2013 9:36 AM

MET Project leaves out PreK-3rd teachers

By Laura Bornfreund

I asked two of my colleagues at the New America Foundation to weigh in on this week's question about the MET project.

Lisa Guernsey, director of the Early Education Initiative at the New America Foundation, had this to say:

A word of caution: The MET report was not about the effectiveness of K-12 teachers. It was about teachers in the 4th through 8th grades. What about teachers in kindergarten, 1st and 2nd grade where state test score data doesn’t exist and where people may wonder if young children are reliable reporters of their teachers’ ability to help them learn? What about preschool teachers on public school payroll who are part of new evaluation systems in states? There are many valuable lessons within this report, but before they are applied across the full PreK-12 system, policymakers should look closely at how they will impact these earlier grades.

The report’s information about observing teachers, however, provides good fodder for deeper conversations about improving teaching in the early grades. And in many ways it dovetails with other research, described in the New America Foundation’s report Watching Teachers Work: Using Observation Tools to Promote Effective Teaching in the Early Years and Early Grades, which urges policymakers to include objective observation-based assessment as a measure of teacher effectiveness.

For a broader perspective on the MET findings, my colleague Anne Hyslop, education policy analyst, offers her thoughts:

It took three years and $45 million to confirm, empirically, what educators have always known implicitly: great teaching matters, it can be measured and it improves student learning. That was one of the many findings released in the final report from the MET Project. Two of its big takeaways:

1. What teacher evaluations measure is just as important as how it is measured.

Much has been made of the finding that classroom observations are the worst predictor of student learning, compared to test scores and student surveys. Some have questioned whether observations are worth the significant time and personnel costs involved in order to do them well. Tim Daly of TNTP even claimed that MET shows “the way that most teachers have been evaluated forever is completely unreliable.”

It’s easy to jump to that conclusion: MET used proven observation tools, and observers were trained and certified on their knowledge of them. This isn’t the case with many teacher observations across the country. Still, observations are a critical component of teacher evaluations, particularly for those in the early grades and in untested subjects. And using observations typically receives greater support from educators compared to test scores. Classroom observations can provide teachers with valuable, timely and clear feedback on their practice. Given their complexity, value-added measures are far less teacher-friendly – not to mention, limited in scope. Surely, great teaching involves much more than improving student scores on multiple-choice tests.

To this end, it’s laudable that MET’s researchers also used higher-order tests to measure student learning. In some states, these assessments are more similar to the Common Core assessments they will offer beginning in 2014-15. States should want teacher evaluations that not only function well with today’s tests, but also those of the future.

Still, the higher-order tests MET used only consider English Language Arts and math skills. If the ultimate goal of evaluations is to measure whether teachers create learning environments where students achieve a broader set of outcomes, then there is still a long way to go in developing these evaluation tools. MET’s findings suggest that states should carefully consider whether their evaluation systems are measuring the teacher attributes needed to meet the Common Core’s larger goal of college readiness.

2. How teacher evaluations are used is just as important as what they measure.

Part of the demand for research like MET comes from the push to use teacher evaluation results for human resources decisions, like hiring, retention, compensation and tenure. Some of the push can be attributed directly to the Obama administration: developing and using teacher evaluation systems like the ones in the MET study was a major component of both Race to the Top and the No Child Left Behind waivers.

But there is still uncertainty surrounding teacher evaluations; the MET Project doesn’t provide a definitive roadmap for states and districts looking to measure effective teaching. The report’s findings are inconclusive when it comes to:

  • whether student demographics should be included as a control in value-added models;
  • how to weight each component within a composite effectiveness measure: value-added data, student-perception surveys and classroom observations; and
  • who should observe teachers, how long these observations should last and how many observations should occur each year.

The teacher quality measures MET suggests are “better on virtually every dimension than the measures in use now.” But does that mean similar teacher evaluation systems should be used as the deciding factor for whether a teacher receives a bonus? Or is fired?

Thorny questions, indeed. Yes, the new measures of effective teaching are promising. But given MET’s lingering questions, wouldn’t it make more sense to continue refining teacher evaluations without rushing to use them for high-stakes decisions? Especially since most schools lack the capacity and resources to implement evaluations of the same rigor as those used in the MET study? States and districts should consider using the results from teacher evaluations in a more diagnostic manner: why not make these measures of effective teaching the first step in the process of providing professional development, determining who receives pay increases and making decisions about hiring or firing – rather than the final step?

For more, read Anne’s full blog post on Ed Money Watch.

In full disclosure, the work of New America’s Education Policy Program is supported, in part, with funding from the Bill & Melinda Gates Foundation, which also funded the MET Project.

January 16, 2013 8:11 PM

The Meta-MET Study

By Steve Peha

Kudos to The Gates Foundation for supporting such a thorough approach to the challenge of understanding the vicissitudes of teaching.

But did we learn anything here that we didn’t already know?

Most of the people I have worked with in education readily admit that if you have a large enough sample of students, and if those students are sufficiently mature such that they can respond appropriately, one can get at least a rough idea of how effective a teacher is by surveying them.

Next, does anyone really think there is NO value in student achievement data? I mean, none whatsoever? Perhaps a few people do, but I haven’t met very many. We may not be in agreement about which tests we like, or how data should be used, but when I’ve shown patterns of difference in aggregated student achievement between teachers across years and across classrooms (often as an exercise with anonymized data), everyone agrees that something meaningful about teaching quality can be confidently inferred—even if the numbers alone can’t tell us exactly what that is.

Finally, every single teacher I’ve ever talked to about narrative evaluations has made the following point one way or another: “It’s not the evaluation that matters but the evaluator.” A narrative report of an observed classroom session, written thoughtfully by a skilled observer who is also knowledgeable about teaching, is something most teachers crave. Even more to their liking is the coaching they receive from these talented observers. So here again the takeaway seems obvious: Good evaluators know good teaching when they see it.

So here we are back again at “multiple measures”. But in this case, the measures have been known for a long time:

1. Student observations. Appropriately made and gathered in sufficient numbers.

2. Adult observations. Thoughtfully communicated by knowledgeable observers.

3. Student achievement data. Several measures over time with standard statistical rigor.

Put them together in logical ways and you get a pretty good picture of how and how well someone teaches.

I’m very glad that the MET study was done. And I’m thrilled with the results. They seem to me the natural and normal outcomes that have been consistent with my experience of working with thousands of teachers over the past 15 years.

But the study’s commonsensical conclusions make me curious about something: What is it about our culture and about the nature of education in general that would make such a study necessary?

This was a long, hard, expensive, intentional, and patiently-nurtured effort. To me it seems to confirm what most people I know already knew. Yet this is one of the most anticipated studies of education that I can remember. People really wondered what would come out of this. People really wondered what the results would be, what conclusions would be drawn, what recommendations would be made.

At the same time, I think many people, both in schools and out, would probably have reached similar conclusions on their own if given sufficient opportunity to explain themselves.

I think we knew all this. And yet, someone just spent a lot of money and time and effort to figure it out. Again, the work is laudable in its thoroughness and scale and I applaud The Gates Foundation for taking it on. But the study I would be more interested in would be “The Meta-MET Study”, or the study that answers the questions, “Why did we need the MET study? And what does the need for it tell us about where we are as a nation in our understanding of education and education reform?”

I believe it is the results of this research that would tell us what we really need to know about teaching such that we could improve it as opposed to merely measuring it.

January 15, 2013 3:40 PM

MET Study: More than Just Measuring

By Kris Amundson

Chad Aldeman recently shared these thoughts on the MET study.

On Tuesday the Measures of Effective Teaching (MET) Project released its third and final series of reports. The media has reported the main findings: that we can measure and predict effective teaching. And, because the MET Project randomly assigned students to teachers, we can say that there is causality in this relationship, that teachers with high value-added scores in one year caused student achievement to rise in the following year.

But the reports also include a host of interesting detail and helpful suggestions to districts and states. Among them:

  • Current state tests can be used to identify effective teachers. Assessments that require higher-order thinking skills are likely to be better at differentiating teachers, but even the current low-level tests that states are using are valuable in identifying effective teachers. Importantly, the same teachers who raised student achievement on low-level state tests also raised student achievement on more cognitively challenging, open-response type assessments.
  • Both teacher observations and student surveys, the two other measures considered in the Project, are predictive of future student achievement. However, once they are combined with a teacher’s value-added score, they no longer add any predictive value. Instead, they add year-to-year stability (what researchers call “reliability”) to teacher ratings. They also provide more detailed and timely feedback to help teachers improve their practice. (If you want to parse this further, Jay P. Greene is sort of right on this but Marty West has a more nuanced take.)
  • The MET Project tested high-quality observation rubrics that are widely considered some of the best in the education field, each of which had been tested and validated in other settings. States and districts attempting to create their own are not likely to have results as strong.
  • Similarly, the MET Project used only the most high-quality and rigorously evaluated student surveys that were created by the Tripod Project. It will be difficult for states and districts developing their own student surveys to see similarly strong results.
  • There are also a number of steps that states and districts can take to maximize the reliability of teacher observations:
    • Observers should be trained on the observation protocol and tested on their accuracy before conducting meaningful observations.
    • Two observations are better than one. The results from two 45-minute observations were substantially more reliable than only one 45-minute observation. MET tested reliability for up to four evaluations. Each additional observation increased reliability, but the largest gain was from moving from one to two observations.
    • Two pairs of eyes are better than one. Using two different observers increased reliability significantly more than having the same person observe two lessons.
    • Different combinations of the number of lessons and observers can work equally well. For example, having two observers each watch a 45-minute lesson had as much reliability as a principal watching a 45-minute lesson and three peer teachers each watching a 15-minute lesson.
    • Districts should focus on rank, not rating. Although principals tended to give their own teachers slightly inflated ratings, their rankings were very similar to those of outside observers.
    • Districts don’t need to surprise teachers (through unannounced visits). Although teachers tended to earn higher ratings when they were told in advance that they would be observed, rankings of teachers were very similar regardless of whether the observation was announced or unannounced. Giving teachers notice of an observation may help reduce stress and increase their belief in the fairness of the observation system.
    • Video can save time and not harm reliability. The MET Project videotaped teacher lessons and let observers watch the videos on their own time. This saves precious time in the workday. Given the evidence that teachers do not need to be surprised, they could even be asked to videotape themselves and provide the tape to principal or peer observers.
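
The diminishing return from each extra observation can be illustrated with the textbook Spearman-Brown projection. This is a swapped-in classical formula, not the method the MET researchers used, and the single-observation reliability of 0.4 is a made-up number chosen only for illustration. The formula also treats all observations as interchangeable, so it cannot show the separate finding that a second observer helps more than a second lesson scored by the same person.

```python
def spearman_brown(single_reliability, k):
    """Projected reliability of the average of k comparable observations."""
    return k * single_reliability / (1 + (k - 1) * single_reliability)


SINGLE = 0.4  # hypothetical reliability of one observation, not a MET figure
projected = [spearman_brown(SINGLE, k) for k in range(1, 5)]
# Gain in projected reliability from each additional observation.
gains = [later - earlier for earlier, later in zip(projected, projected[1:])]

for k, value in enumerate(projected, start=1):
    print(f"{k} observation(s): projected reliability {value:.2f}")
```

Under these assumptions the step from one to two observations produces the largest gain, and each later step adds less.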

Tuesday’s release also included a long paper devoted specifically to how states should think about weighting different measures into one overall teacher evaluation rating. The researchers looked only at test-based value-added scores, student surveys, and teacher observations. Among their findings:

  • In terms of predicting which teachers would be more effective at raising student achievement in future years, prior-year value-added scores were the best of the three measures. The study did not look at other measures of student growth such as Student Learning Objectives or whole-school growth, but it’s likely those measures have lower predictive power than an individual teacher’s value-added results.
  • To balance predictive power and year-to-year stability, states should weight state tests between 33 and 50 percent of a teacher’s evaluation. More than that reduces reliability, but less than that reduces predictive power.
  • Importantly, the 33-50 percent recommendation applies specifically to student growth as measured by state tests. Many states have required teacher evaluations to be based on a broad “student growth” factor that includes growth on the state test and other measures of student growth, such as Student Learning Objectives or other locally developed measures. Instead of waiting for objective evidence on how their new teacher evaluation systems are playing out, states like Maryland and DC are pre-emptively lowering the weighting for test-based student growth. The MET results suggest those changes, while politically appealing, will cause the evaluation ratings to lose predictive power.
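
The weighting guidance above can be sketched in a few lines. This is only an illustration: the function name `composite_rating`, the assumption that each measure is already standardized to a common scale, and the even split of the remaining weight between observations and surveys are all hypothetical choices, not anything specified by the MET Project.

```python
def composite_rating(value_added, observation, survey, test_weight=0.5):
    """Blend three standardized measures into one rating (illustrative only)."""
    # The 33-50 percent band for state-test growth comes from the findings above.
    if not 0.33 <= test_weight <= 0.50:
        raise ValueError("test_weight should fall between 0.33 and 0.50")
    # Assumption: the remaining weight is split evenly between the other two measures.
    other_weight = (1.0 - test_weight) / 2.0
    return (test_weight * value_added
            + other_weight * observation
            + other_weight * survey)


# Hypothetical teacher, all scores in standard-deviation units.
rating = composite_rating(value_added=0.4, observation=0.1, survey=-0.2, test_weight=0.5)
print(round(rating, 3))
```

Rejecting weights outside the band is a design choice for the sketch; a real system would more likely warn than refuse.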

Finally, there’s a debate about just how many different variables teacher value-added models should include. For example, many states and districts have added statistical controls for things like student race/ethnicity, poverty, or other demographic factors. Doing so assumes, for example, that low-achieving black students should have different expectations than low-achieving white students. The paper concluded that there was no particular statistical rationale for controlling for student demographics.

The MET Project’s formal work is now over, but researchers will have access to all of the data, videotapes of teachers’ lessons, and observation ratings to dig in as they choose.

January 15, 2013 10:19 AM

Evaluation is Just 1 Piece of the Puzzle

By Michael Haberman

The MET study provides some important insights into what an effective teacher evaluation program will look like. And we know that any organization interested in maximizing its potential has to have a thoughtful and thorough process for evaluating its staff.

However, we miss the point if we focus solely on teacher evaluation. As I recently wrote in a longer post, we know that in the private sector, the businesses that are interested in attracting, retaining and developing the best talent take a multi-pronged approach – competitive pay, robust professional development and comprehensive evaluation. Yet, the facts are clear that when it comes to teaching we do the opposite:

  • Nationwide, first-year teachers earn a median salary of $31,333 according to the United Federation of Teachers/TeacherPortal, 33 percent less than the median salary of someone with a bachelor's degree. In New York City, the 2008 starting salary for teachers was $45,530, whereas entry-level PR professionals earned a median salary of $53,139 and financial analysts earned $57,442 in their first year. About one in five teachers work part-time jobs just to make ends meet.
  • As states are about to launch the Common Core Standards, 51 percent of teachers said they were only somewhat prepared and 27 percent said they were highly unprepared to meet these new standards, according to yet another study funded by the Bill & Melinda Gates Foundation. In New York City, it was recently reported that $100 million has been spent on professional development yet there is no indication of whether it has had any impact on our teachers. And I know in my own community, where I sit on the Board of Education, professional development is “weather dependent” – too many snow days and professional development disappears from the calendar.

Without a doubt, evaluating teachers – and subsequently providing them the tools they need to develop or moving them out of the profession – is a key to improving our schools. But we shouldn’t focus on evaluation in a vacuum. Additional steps need to be taken to ensure that we recruit the best and brightest to become teachers.

January 14, 2013 1:29 PM

MET Project Trips on Ecological Fallacy

By Gene Glass

Any attempt to evaluate teachers that is spoken of repeatedly as being "scientific" is naturally going to provoke rebuttals that verge on technical geek-speak. The MET Project's "Ensuring Fair and Reliable Measures of Effective Teaching" brief does just that.

At the center of the brief's claims are a couple of figures (“scatter diagrams” in statistical lingo) that show remarkable agreement in VAM scores for teachers in Language Arts and Math for two consecutive years. The dots form virtual straight lines. A teacher with a high VAM score one year can be relied on to have an equally high VAM score the next, so Figure 2 seems to say.

Not so. The scatter diagrams are not dots of teachers' VAM scores but of averages of groups of VAM scores. For some unexplained reason, the statisticians who analyzed the data for the MET Project report divided the 3,000 teachers into 20 groups of about 150 teachers each and plotted the average VAM scores for each group. Why?

And whatever the reason might be, why would one do such a thing when it has been known for more than 60 years now that correlating averages of groups grossly overstates the strength of the relationship between two variables? W.S. Robinson in 1950 named this the "ecological correlation fallacy." Please look it up in Wikipedia. The fallacy was used decades ago to argue that African-Americans were illiterate because the correlation of %-African-American and %-illiterate was extremely high when measured at the level of the 50 states. In truth, at the level of persons, the correlation is very much lower; we’re talking about differences as great as .90 for aggregates vs .20 for persons.

Just because the average of VAM scores for 150 teachers will agree with next year's VAM score average for the same 150 teachers gives us no confidence that an individual teacher's VAM score is reliable across years. In fact, such scores are not -- a fact shown repeatedly in several studies.

So we aren't going to fire groups of 150 teachers arbitrarily lumped together who might have low VAM scores, nor pay big bonuses to the high VAM group. Nor are we going to fire those teachers whose Language Arts VAM score is low, because the odds are substantial that the same teachers' Math VAM score might be average or even above. We would see that such teachers are hardly the exception if the authors of the MET Project brief had simply shown us scatter plots of individual teachers' VAM scores instead of having tripped up on Robinson's ecological correlation fallacy.
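
The gap between person-level and group-level correlations is easy to reproduce with synthetic numbers. The sketch below uses made-up scores, not MET data: the assumed person-level correlation of 0.3, the random seed, and the choice to sort teachers by their year-one score before cutting them into 20 groups are all assumptions for illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to keep the sketch dependency-free."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


random.seed(0)
TRUE_R = 0.3        # assumed person-level correlation (made up)
N_TEACHERS = 3000
N_GROUPS = 20

# Synthetic year-one and year-two scores with a weak individual-level link.
year1 = [random.gauss(0, 1) for _ in range(N_TEACHERS)]
year2 = [TRUE_R * s + (1 - TRUE_R ** 2) ** 0.5 * random.gauss(0, 1) for s in year1]

# Assumption: teachers are sorted by year-one score, then cut into equal groups.
pairs = sorted(zip(year1, year2))
size = N_TEACHERS // N_GROUPS
group_means1, group_means2 = [], []
for g in range(N_GROUPS):
    chunk = pairs[g * size:(g + 1) * size]
    group_means1.append(sum(p[0] for p in chunk) / size)
    group_means2.append(sum(p[1] for p in chunk) / size)

individual_r = pearson(year1, year2)
grouped_r = pearson(group_means1, group_means2)
print(f"individual teachers: r = {individual_r:.2f}")
print(f"averages of {N_GROUPS} groups: r = {grouped_r:.2f}")
```

Averaging 150 teachers per group washes out most of the individual noise, so the 20 group means line up almost perfectly even though any single teacher's score remains hard to predict from one year to the next.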

January 14, 2013 11:49 AM

MET Findings and Implementation Gaps

By Cynthia G. (Cindy) Brown

As I recently wrote for the Hill’s Congress Blog, there is no easy way of measuring teacher effectiveness. Yet in order to ensure that all students have access to an excellent teacher—the one factor that we know most affects student achievement—it is incumbent upon us that we figure it out. The MET Project is to date the most comprehensive example we have of how teacher evaluation can be conducted to achieve reliable results and should be used as a foundation of next-generation teacher evaluation systems.

However, getting the method correct is just one step in a several-step process to ensuring successful implementation of evaluation systems. Another key aspect is getting buy-in from the professionals who will carry out the evaluation—whether that means teachers whose work will now be more deeply reviewed, or administrators, coaches, or peer teachers who must understand teaching in ways that allow them to observe, report, and reflect on instruction.

In order to support principals and teachers in this work and get their buy in, there are several matters to consider that include, but are not limited to, professional learning opportunities to better understand the teacher evaluation with which they’ll be working, realignment of school personnel responsibilities, and teacher/principal preparation programs’ alignment with teacher evaluation systems.

It’s no secret that principals and teachers have to be prepared to undertake new teacher evaluation systems. This will require robust professional learning opportunities. Currently, our nation’s school systems spend billions of dollars annually on what we know to be wasteful and ineffective professional development. This simply can no longer continue.

On top of better learning opportunities for principals and teachers, in many schools new innovative staffing may be required to accomplish the goal of effectively evaluating teachers. Most principals are already spread thin; therefore building capacity of other school leaders to assist in teacher evaluation may be essential. While this will also require further professional learning, many teachers are searching for such leadership opportunities. Conducting teacher evaluations is a clear career ladder step that many effective teachers could take advantage of to move their career forward while staying in the classroom.

Lastly, as teacher evaluation systems become more commonplace in schools nationwide, we must call on our teacher and principal preparation programs to give novice educators the knowledge and skills necessary to work within these new systems. If our newest teachers and principals come to schools well-versed in evaluation systems’ expectations, the implementation of these systems will be that much more smooth and effective.

While there is much work to be done to ensure successful implementation of teacher evaluation systems, the MET Project’s findings move us one step further in the right direction in teacher evaluation reform.

January 14, 2013 11:38 AM

Gates’ Foundation MET Study: Surprised?

By Renee Moore

I deliberately avoided looking at any of the social media spin on the final report of the Gates Foundation-funded Measures of Effective Teaching (MET) study until after I had done my own reading. I took the same approach to the release of the first report back in December 2010.

Then, as now, there are several things about this study that I admire. Like Fawn Johnson (National Journal.com Education Experts editor), I am impressed with the seriousness and sincerity of the researchers in tackling the complex issue of teacher evaluation, especially since there are too many people who want to oversimplify it. I’m also glad to know the data from this study (unlike some of the earlier studies involving value-added measures) is being made available to the wider research community for independent investigation of results.

Most delightful of all is the MET researchers’ recognition of the importance of student voice in determining the quality of teachers’ work. If we are at all serious about preparing our youth to be critical thinkers and contributing citizens, we must start by listening to what only they can tell us about what is and is not working in our classrooms and schools.

Also, unlike some critics of the study, I reject the complaints about the MET’s inclusion of classroom observations by multiple evaluators as an important way to measure teacher effectiveness. The research team recommended that those observations should not be over- or underrepresented in the blend of measures used in a teacher evaluation system. Here I’m using my parent lens (my husband and I have raised 11 children and shepherded them all through public school). There is essential information about a teacher’s effectiveness that no test data can reveal: How does that teacher treat my child? I have known teachers who could boast impressive student test numbers, but disrespected and demeaned their students in the process.

The purpose of teacher evaluation is to answer two questions (not one): How good a job is this teacher doing, AND how can this teacher do better? Candid, objective feedback from outside evaluators and thoughtful reflection by teachers on our work is essential for continuing professional growth.

Teachers submitting video of ourselves teaching for evaluation purposes is not new. Part of National Board Certification, a voluntary process for an advanced teaching credential, requires teachers to include not only video examples but also extensive written analysis by the teacher candidate of his/her work, using the video as evidence.

As a National Board Certified Teacher myself, and now as a member of the Board of Directors of the National Board for Professional Teaching Standards, I am gratified that the study confirms what the National Board has known and proven for 25 years: There are significant differences in the quality of instruction provided by teachers, and those differences have critical impact on student achievement and on student learning.

It was not the purpose of the MET study to distinguish between student achievement and student learning, but their interchangeable use of those terms in the report further confuses the concepts in the public conversation. In 2011, a task force commissioned by NBPTS (which included Robert Linn, Rick Hess, Lloyd Bond, and Lee Shulman) released a report that supplied much-needed clarification:

Student achievement is the status of subject-matter knowledge, understandings, and skills at one point in time.

Student learning is growth in subject-matter knowledge, understandings, and skills over time…It is student learning—not student achievement—that is most relevant to defining and assessing accomplished teaching.

Standardized tests are the instruments we use (for now) to measure student achievement, but there is much, much more that we need to know about measuring student achievement and student learning. As my higher education colleagues and many employers will testify, students meeting an arbitrary state cut score may (or may not) indicate factual recall of certain immediate learning objectives, but the method falls grievously short as a measure of what students actually know and can do after the test. How this scenario will change if, when, and after the “next-generation” assessments promised under the Common Core Standards are implemented remains to be seen. But if all we want from teacher evaluation is a way to identify which teachers are the best bets for raising student test scores, we would be setting a disgustingly low bar indeed.

Implementing teacher evaluation systems with a balance of multiple measures as recommended by the MET study will present significant hurdles to states and school districts, cost being only one of them. However, there are already some promising starts. Consider what these teachers from Garfield High School in Seattle, Washington have to say about the challenges of implementing such a teacher evaluation system. Notably, these teachers have also decided not to give the state-required tests to their students this spring.

Surprise! Effective teacher evaluation not only distinguishes teachers; it empowers them.

Cross-posted at TeachMoore.

January 14, 2013 8:40 AM

MET Study Opens Less Obvious Doors

By Dan Cruce

The recent MET study release certainly does have the headline-grabbing data that show teacher effectiveness can indeed be measured scientifically. I would propose, though, that the documentation of such correlations, while essential to moving the overall work forward, is only as important as the professional conversations that occur with such results. This less sexy facet of reform is one that I feel needs highlighting in response to Ms. Johnson’s blog and as the education community reacts to the MET study findings.

The real change for kids will occur when a teacher and evaluator engage in meaningful dialogue about evaluation data. The enhancement of that professional dialogue as related to teacher evaluation certainly can occur as a result of student achievement data results, but it’s important to identify the other critical teacher voice opportunities within this evaluation reform continuum. To assume that teacher voice is only valuable at the end product stage sells short the full value of new opportunities. It’s paramount to actually realizing a culture change.

At the end of the day, it’s the human interaction and voice between those persons that really drives professional dialogue for improvement. The data alone do not accomplish that. To get specific, the process of re-building educator evaluations can actually bring such voice to the forefront and actually drive powerful professional conversations that complement the MET data. That teacher voice opportunity is ripe in many states, districts and charters now. We must realize that, identify where that’s happening in our states, and maximize the value now.

In most states across the country, work is underway to incorporate student growth measures into a teacher’s evaluation. In concept, it’s hard to argue that this shouldn’t happen. In application, it’s understandably complicated and, many times, controversial. The MET study shows how it can scientifically add value. I’d like to specifically respond to Ms. Johnson’s blog by bringing a different light to this work. It shows that teacher voice can take many forms and have many functions in addition to the positive results in the MET data. The best way to illuminate this is by example, so I’ll pick on a state near and dear to my heart, the First State: Delaware.

A little background to set the table will help the example shine more brightly. In Delaware, teachers across all subjects (not just non-tested grades) have met multiple times by their respective subject areas as convened by the Delaware Department of Education (DDOE). At those meetings, teachers either create or vet methods by which they can be evaluated regarding student growth. It’s easier to wrap your head around how a teacher may provide a pre-test at or near the beginning of the school year to capture data on where each student is related to the subject. Then it’s similarly easy to imagine the delivery of a post-test at or towards the end of the school year to measure how far those students have progressed as measured against their respective pre-test scores. As always, the devil is in the detail, and I acknowledge that. My spotlight, though, is not on the detail devil, but on the empowerment tool that is a byproduct of some of this work.

I’ll move us outside of the core four subject examples (social studies, science, math, English/language arts) to a specific example: guidance counselors.

In Delaware, the work on student growth measures for guidance counselors is shepherded by DDOE’s Dennis Rozumalski. He consistently champions the issue of tapping our guidance counselors for their respective highest use for kids in schools. He and the guidance counselor professionals with whom he undertook this work used this as the opportunity to craft student learning objectives/growth goals (SLOs) that are framed around what guidance counselors should do for kids in schools. These SLOs were developed collaboratively, and before they are employed as part of the evaluation, a professional conversation must occur between the guidance counselor and the administrator to approve the SLOs.

This is where the light of this new tool is most bright; the administrator has a new/renewed opportunity to converse (and learn) from the professional dialogue with the guidance counselor. It’s not only about what the administrator needs, but what the guidance counselor’s profession needs to provide to the students. It’s not about opinion from the guidance counselor, but about professional facts and professional responsibilities which align to the role and against which the guidance counselor will be evaluated.

This may have occurred in some schools prior to this new effort, but until it occurs in all schools we should look at the student growth measure work and the MET findings as having a byproduct opportunity such as shared above, rather than a bureaucratic burden. I also believe that this type of an example can help to broaden the vision for teacher voice value throughout the process. Making that voice heard is what we do at Hope Street Group, and one of the main tenets of our Teacher Evaluation Playbook.

Copyright © 2013 by National Journal Group Inc.