In universities today, we ask students to evaluate the quality of their teachers. These evaluations have only a weak correlation with actual teacher quality, though. Students, above all else, hate receiving a bad grade; whether they actually learned the material matters much less to them.
In fact, there is reasonably strong evidence that teachers who get worse ratings from students are better teachers. Scott Carrell and James West exploited the random assignment of students to instructors at the U.S. Air Force Academy to infer the causal effect of good teachers. Since everyone takes the same set of courses, students get no choice over teachers, and filling out the course assessment is basically an order, it is the ideal setting for studying this. Carrell and West found that getting an easy instructor in your early math courses led to considerably worse performance later on. Since performance in current classes predicted how highly students rated their current professor, student evaluations were largely negative predictors of future performance, though the estimates were not statistically significant. Note that this is only true in math, which builds on itself much more than the humanities do.
There is other work replicating this, though. Braga, Paccagnella, and Pellizzari (2014) looked at students at Bocconi University in Italy, and found that students' evaluations were negatively and significantly correlated with future outcomes. The students must not have been taking the evaluations too seriously either – they gave more negative ratings if it was cold and rainy on the day of the evaluation! De Vlieger, Jacob, and Stange (2019) used a massive administrative dataset from the University of Phoenix, and also found that student evaluations were mostly uncorrelated with future performance.
For a while, those were the only studies I knew of that addressed the endogeneity of teacher evaluations through a plausibly causal method. A new study has come along, though, by Merrill Warnick, Jacob Light, and Anthony Yim; this is Warnick’s job market paper.
Most of it is concerned with how we evaluate quality when students are free to select their own courses. In primary schools, evaluating impact is simple, because we can randomly assign students; in post-secondary education, someone might have incredible value-added but be taken overwhelmingly by below-average students, or they might add nothing but drive away the worst students. Their method is to group students with identical prior course histories who then diverge in which instructor they get for a given class. This is obviously extremely data-intensive, but they have transcripts from the entire university system of Texas, linked to later tax records, to infer the impact of teacher quality on earnings. (The common mark of a job market paper is to jam at least two papers’ worth of ideas in there, and also to do really hard things more or less to show you can.)
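The core of the value-added idea can be illustrated with a toy simulation. This is a minimal sketch, not the paper's actual estimator: it compares outcomes of otherwise-similar students (here, proxied by prior GPA under random assignment) who get different instructors, and recovers instructor effects with an OLS regression on instructor dummies. All names and numbers below are illustrative.

```python
# Toy value-added estimation: simulate students, randomly assign instructors,
# and recover instructor effects from outcomes. Purely illustrative numbers.
import numpy as np

rng = np.random.default_rng(0)
n_students, n_instructors = 3000, 5

prior_gpa = rng.normal(3.0, 0.4, n_students)       # proxy for student ability
instructor = rng.integers(0, n_instructors, n_students)  # random assignment
true_va = np.linspace(-0.2, 0.2, n_instructors)    # true instructor value-added

# Outcome (e.g., next-semester GPA) = ability + instructor quality + noise
outcome = 0.8 * prior_gpa + true_va[instructor] + rng.normal(0, 0.3, n_students)

# OLS with instructor dummies, controlling for prior GPA
dummies = np.eye(n_instructors)[instructor]
X = np.column_stack([dummies, prior_gpa])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)

est_va = coef[:n_instructors]
est_va -= est_va.mean()                            # normalize to mean zero
print(np.round(est_va, 2))                         # close to true_va
```

With thousands of students per instructor, the estimates line up closely with the true effects; the hard part in real data, which the paper's grouping strategy addresses, is that assignment is not random when students choose their own courses.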
Student evaluations do much worse than their measures of value-added. In fact, they estimate that if value-added were used for teacher retention instead of evaluations, wages for graduates would be 2.7% higher. Note that their results actually contradict Carrell and West – here, evaluations are uncorrelated with later earnings, but they do predict next-semester GPA.
The best argument for student evaluations seems to be customer service. The students are not paying to be taught – they are paying to be passed. This view of the world sees universities as research laboratories attached to a gift shop, where weaker undergraduate students can be passed through as quickly as possible.
Needless to say, I am not a fan of this. To inflate grades and make it easy to pass is defecting in a prisoner’s dilemma: while it might be nice to take advantage of others’ good faith in thinking your degree is meaningful, nobody wants to live in a world in which everyone is incompetent. Colleges should evaluate teachers of core subjects on their value-added to student performance on standardized tests, and for more advanced classes should retain student evaluations only as a way to catch extreme cases of incompetence.
I have been told, in any event, that in many places they do not matter. A professor I shall not name told us in class once that he had served for quite a while on a university committee reviewing tenure cases for professors in other departments. Never, not once, was a teacher denied tenure for manifest incompetence in teaching – and there were certainly some who deserved to be!
Yay! Much rejoicing!! People can learn things, the difference between the best and worst teachers is non-trivial 🥳🥳🥳🥳 We live in the non-signaling world, at least a little bit.