Student evaluations of faculty: concerns raised in the literature, and possible solutions.

Wright, Robert E.

College Student Journal, 40(2), June 2006. Publisher: Project Innovation (Alabama). ISSN: 0146-3934. Copyright 2006 Project Innovation (Alabama).
Student evaluations of instruction have long been used to assess the teaching performance of instructors. However, despite the widespread use of data from student evaluations to determine faculty teaching effectiveness, a review of the literature indicates that issues concerning the validity and usefulness of such evaluations remain unresolved. Given the pervasiveness of such evaluations, substantial changes to the system are unlikely to be implemented across the country. After reviewing key problem areas identified in student evaluations of faculty, this paper suggests some possible methods to increase the validity of teaching evaluations without major changes to current systems of faculty evaluation. Such changes could benefit both college students and faculty by increasing the usefulness of student evaluations of faculty.


Administrators of colleges and universities have for some time stressed the importance of a marketing orientation (Bush, Ferrell & Thomas, Jr., 1998). Because students are one of the consumer groups interested in the product of a college education, students' opinions are considered a vital source of information concerning the quality of instruction at colleges and universities. Virtually all colleges use student evaluations of instructors as a measure of instructor performance (Magner, 1997, p. A18); therefore, such evaluations have a significant impact on tenure, promotion, and merit pay decisions concerning faculty (Centra, 1979; Ehie & Karathanos, 1994).

Feedback from students may help instructors to improve their teaching performance (Marsh, 1991). Unfortunately, the use of such ratings for evaluations relating to reward systems of a college or university may be problematic.

To the extent that student consumers are responding to factors that should be unrelated to teaching quality, such evaluations may be misleading (Cashin, Downey & Sixbury, 1994; Marsh, 1994, 1995), and may have negative consequences on the overall quality of the educational experience for students.

Extensive research has been conducted on student evaluations to determine their validity. Stability and internal consistency (Cashin, Downey & Sixbury, 1994; Costin, Greenough & Menges, 1971; Marsh, 1994, 1995) as well as variability between instructors (Marsh & Bailey, 1993) have all been demonstrated.

However, numerous studies have cast doubt on the validity of such instruments. Rodin and Rodin (1972) found a negative relationship between student performance and student ratings. O'Connell and Dickinson (1993) found that the amount learned by students was unrelated to overall ratings of the instructor. Yunker and Yunker (2003) found that students from a class where the instructor was rated higher did worse in a subsequent follow-up class than students from a class where the instructor was rated lower. In another study, Koon and Murray (1983) found that final examination scores had only a .30 correlation with student ratings of instructors. Marsh (1987) noted that workload and students' grades might also affect student ratings of instructors. Students may not have the level of knowledge necessary to properly evaluate their instruction (Olshavsky & Spreng, 1995), which may lead them to use some other proxy in judging instructor performance.

"Entertainment" level of the classroom experience has been shown to affect overall instructor ratings (Costin, Greenough, & Menges, 1971). The famous "Dr. Fox" study (Naftulin, Ware, & Donnelly, 1973) found that an enthusiastic actor was highly rated on teaching quality despite a lecture intentionally devoid of content.

Narrative comments on student evaluations of faculty in one study showed that many students wanted classes to be more fun and entertaining (Trout, 1997). Williams and Ceci (1997) found that an instructor's emphasizing communication skills produced significant improvements in student ratings in all areas of teaching for a class, even though the material, the lecture format, and student performance remained identical.

Wright (2000) found that perceived fairness in grading and instructor appearance were strongly related to student evaluations of professors, despite the fact that these factors may be unrelated to learning of students.

These results suggest that student consumers may rate faculty using a different set of criteria than other constituencies.

Student consumers may, in fact, prefer a teaching style that is detrimental to their learning experience. For example, Strom, Hocevar, and Zimmer (1990) showed that students who preferred easy courses achieved more with instructors who had a relatively low orientation toward students' interests, achievement, and satisfaction, yet preferred teachers who had a high level of student orientation.

In addition, external ratings may differ from student-derived ratings. This was dramatically demonstrated when a Business Week ranking of graduate business schools showed substantial differences between an output measure of teaching performance (the perceived skill level of MBA graduates, as rated by corporate recruiters) and a process measure (ratings of the instructional quality of faculty in the same programs, as given by the MBAs themselves) (Byrne & Leonhardt, 1996, pp. 112-113, 122).

As student evaluations of faculty have become increasingly important, the process of instructor evaluation needs to be reexamined. Given that the practice of using students to evaluate instructors is ingrained in academia, efforts to totally overhaul the system are unlikely to be fruitful. However, it may be possible to significantly improve the quality of information gained from student evaluations of faculty with relatively small changes. These changes involve the anonymity of the evaluations and the massive quantity of data generated.

Anonymous Evaluations

Typically, evaluations are anonymous. The basic idea behind anonymous evaluations is apparently to protect students from potential reprisal by faculty, which is certainly a positive intent. However, anonymity may also facilitate reprisal by students against faculty who gave them lower grades than they felt they deserved. Faculty and administrators have no way of determining which students gave low ratings to an instructor. Apparently, it is believed that faculty members are less trustworthy than students.

Under a system of anonymous evaluations, students need take no responsibility for their opinions. With no possibility of follow-up, students need not think through their decisions or carefully consider the facts in order to reach a valid, justifiable conclusion. An evaluation could be based solely on lingering anger over a recent exam grade, or on a single negative in-class experience with an instructor over the course of an entire semester.

In addition, the practice of anonymity provides no way to ensure that the evaluations were properly filled out. In many instances, the faculty member leaves the classroom while the evaluations are being completed. This leaves the potential for significant abuse. For example, students who were not even in the class could enter the room and fill out evaluations.

Another potential problem is the inability to follow up on the results. No details can be gathered as to why an evaluation was very positive or very negative. Do all students with low grades give low evaluations? While a question may be asked concerning students' expected grades, there is no way to ensure that students' reports are accurate. Do students with poor attendance records give low evaluations to an instructor with whom they may have only rarely had contact in class? There is no way to determine the relationship between class attendance and student evaluations of instructors.

Closely related to the previous point is the inability to track "customers." Businesses use customer tracking systems to determine whether certain individuals are blatantly abusing the complaint system; such businesses can then determine whether the problem lies with the business or with the customer. Under a system of anonymous evaluations, there is no way to determine whether a student is a chronic complainer.
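The tracking idea can be sketched in a few lines of code. The sketch below is purely illustrative, not part of any existing system: the record format, the 1-5 rating scale, and the thresholds are all assumed values. It flags students whose ratings are consistently low across several courses, a pattern suggesting the problem may lie with the rater rather than with the instructors.

```python
from collections import defaultdict

def chronic_low_raters(records, threshold=2.0, min_courses=3):
    """Flag students whose average rating across at least min_courses
    courses falls below threshold (ratings assumed on a 1-5 scale)."""
    ratings = defaultdict(list)
    for student_id, _course_id, rating in records:
        ratings[student_id].append(rating)
    return sorted(
        sid for sid, rs in ratings.items()
        if len(rs) >= min_courses and sum(rs) / len(rs) < threshold
    )

# Hypothetical records: (student id, course id, rating given).
records = [
    ("s1", "c1", 1), ("s1", "c2", 2), ("s1", "c3", 1),  # low across the board
    ("s2", "c1", 4), ("s2", "c2", 5), ("s2", "c3", 4),
    ("s3", "c1", 1), ("s3", "c2", 5),  # too few courses to judge
]
print(chronic_low_raters(records))  # → ['s1']
```

Only a student who rates many different instructors poorly is flagged; a single low rating, or too few ratings to establish a pattern, is ignored.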

The Problem of Survey Data

Student evaluations are a form of survey data. While survey results may give an overview of student feelings concerning a faculty member, they do not provide an in-depth picture of what happened in the classroom, nor do they allow for probing to determine the factors behind an evaluation. If it were possible to trace evaluations to individual students, in-depth interviews could determine the reasons for dissatisfaction. This would allow administrators to determine whether problems were due to weaknesses in the instructor, in the student, or in both. It would also be possible to determine, for example, whether there was a correlation between students' academic preparation and their evaluations of instructors. Do instructors get lower evaluations from students who are less prepared?
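As an illustration of the kind of analysis a confidential (trackable) system would permit, the sketch below computes the Pearson correlation between the grades students earned in a section and the ratings those same students gave the instructor. The data and variable names are hypothetical, invented for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical paired records for one course section: the grade each
# student earned (GPA points) and the rating (1-5) that student gave.
grades = [4.0, 3.7, 3.0, 2.3, 2.0, 1.7]
ratings = [5, 5, 4, 3, 2, 2]
print(round(pearson_r(grades, ratings), 2))  # → 0.98
```

A correlation this high in real data would warrant follow-up: it could reflect either that better-prepared students both learn more and rate more favorably, or that ratings simply track grades received.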

The Problem of Massive Data

Even in small colleges, student evaluations generate an immense quantity of data. If there is evidence of possible problems with instructor performance based on student evaluations, it is imperative that we look closely to determine the extent and nature of such problems. However, such a massive amount of data precludes that possibility.

One factor contributing to the problem of massive data is that faculty receive both too much and too little oversight.

It is frequently the case that all faculty are evaluated in the same fashion, whether they have been teaching for one year or 15 years. With such a large number of people to evaluate, it is very difficult to perform an in-depth evaluation of everyone. Therefore, decisions on such important processes as faculty tenure may be based solely or largely on anonymous student evaluations of faculty.

It would seem evident that much closer scrutiny should be given to new faculty. Given the long-term nature of any tenure decision, and the importance of teaching as a measure of a faculty member's performance, it is important to examine closely all aspects of the teaching performance of untenured faculty members. Without such oversight, untenured faculty members may be tempted to "dumb down" their courses, give high grades, and take other extraordinary measures in an attempt to raise student evaluations and thus maximize their chances of being awarded tenure.

Conversely, tenured faculty members have, by definition, already proven themselves in the classroom. They may resent close scrutiny of their teaching performance, and such scrutiny is unlikely to yield a positive outcome.


Suggested Solutions

Evaluating the teaching performance of faculty members is a critical part of ensuring a high-quality education for students. However, given the research showing potential problems with the current method of evaluation, significant changes need to be made.

The first suggestion is to change the nature of student evaluations from anonymous to confidential. The instructor would not be given access to the resulting database, but such a database would allow someone at the university to implement a customer tracking system. It would also allow follow-up to investigate particularly high or low evaluations, and would permit investigation into how actual grades are related to evaluations.

The second suggestion addresses the problems of massive amounts of data and too little oversight of untenured faculty. In addition to traditional student evaluations of faculty, all classes taught by untenured faculty should be subject to sampling and in-depth interviews of selected students.

Students could be randomly sampled from among those who gave high, medium, and low ratings of the untenured faculty member. These selected students could then be interviewed in depth concerning what really went on in the class and the reasons for their ratings of the professor. In the case of small classes, everyone in the class could be interviewed. This process would serve to determine to what extent rigor and high expectations, or lack of rigor and low expectations, affected evaluations. Such a system would also serve to protect faculty members who were very demanding in the classroom but skilled teachers. Because the number of untenured faculty would be expected to be small in relation to the total number of faculty, this plan would be workable.
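The sampling step described above can be implemented very simply. The sketch below is illustrative only; the tier cutoffs and the per-tier sample size are assumed values, and in a small class (where a tier has fewer students than the sample size) the whole tier is selected, matching the suggestion that everyone be interviewed.

```python
import random

def sample_for_interviews(ratings_by_student, per_tier=3, seed=None):
    """Group students by the rating they gave (1-5 scale) and draw up to
    per_tier students at random from each tier for follow-up interviews."""
    tiers = {"low": [], "medium": [], "high": []}
    for sid, rating in ratings_by_student.items():
        if rating <= 2:
            tiers["low"].append(sid)
        elif rating <= 3:
            tiers["medium"].append(sid)
        else:
            tiers["high"].append(sid)
    rng = random.Random(seed)  # seed makes the draw reproducible/auditable
    return {tier: rng.sample(sorted(sids), min(per_tier, len(sids)))
            for tier, sids in tiers.items()}

# Hypothetical ratings of one untenured instructor by seven students.
ratings = {"s1": 1, "s2": 2, "s3": 3, "s4": 3, "s5": 4, "s6": 5, "s7": 5}
print(sample_for_interviews(ratings, per_tier=2, seed=0))
```

Seeding the random draw lets an administrator document that interviewees were selected by procedure rather than by choice, which matters if the results feed into tenure decisions.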

For tenured faculty, a similar system would be available if requested by the faculty member. For those tenured faculty with consistently high ratings, a normal grade distribution, and no student complaints, the standard student evaluations system could be used. For those who seemed to be having problems, closer scrutiny could be paid using the above-described system.


Conclusion

Student evaluations of faculty are a widely used method of evaluating faculty classroom performance. However, significant improvements can be made to the system with only minor changes, which may significantly improve the educational experience for students.


References

Byrne, J. A., & Leonhardt, D. (with Bongiorno, L., & Jespersen, F.) (1996, October 21). The best B-schools. Business Week, 110-116.

Bush, V., Ferrell, O.C., & Thomas, Jr., J. L. (1998). Marketing the business school: An exploratory investigation. Journal of Marketing Education 20, 16-23.

Cashin, W. E., & Downey, R.G. (1992). Using global student rating items for summative evaluation. Journal of Educational Psychology, 84, (4), 563-572.

Cashin, W. E., Downey, R. G. & Sixbury, G. R. (1994). Global and specific ratings of teaching effectiveness and their relation to course objectives: Reply to Marsh (1994). Journal of Educational Psychology, 86,(4), 649-657.

Centra, J. (1979). Determining faculty effectiveness: Assessing teaching, research, and service for personnel decisions and improvement. San Francisco: Jossey-Bass.

Costin, F., Greenough, W.T., & Menges, R.J., (1971). Student ratings of college teaching: Reliability, validity, and usefulness. Review of Educational Research. 41, 511-535.

Ehie, I. C., & Karathanos, D. (1994). Business faculty performance evaluation based on the new AACSB accreditation standards. Journal of Education for Business, 69(5), 257-262.

Hativa, N. & Raviv, A. (1993). Using a single score for summative teacher evaluation by students. Research in Higher Education 34 (5), 625-646.

Koon, J., & Murray, H. G. (1983). Low-inference classroom teaching behaviors and student ratings of college teaching effectiveness. Journal of Educational Psychology, 75, 138-149.

Magner, D. K. (1997). Report says standards used to evaluate research should also be used for teaching and service. The Chronicle of Higher Education, 44(2), A18-A19.

Marsh, H. W. (1995). Still weighting for the right criteria to validate student evaluations of teaching in the IDEA system. Journal of Educational Psychology, 87,(4), 666-679.

Marsh, H. W. (1994). Weighting the right criteria in the Instructional Development and Effectiveness Assessment (IDEA) system: Global and specific ratings of teaching effectiveness and their relation to course objectives. Journal of Educational Psychology, 86(4), 631-648.

Marsh, H. (1991). Multidimensional students' evaluations of teaching effectiveness: A test of alternative higher-order structures. Journal of Educational Psychology, 83 (2), 285-296.

Marsh, H. (1987). Students' evaluations of university teaching: Research findings, methodological issues, and directions for future research. International Journal of Educational Research, 11(3), 253-388.

Marsh, H. W. & Bailey, M., (1993). Multidimensional students' evaluations of teaching effectiveness. Journal of Higher Education 64, (1), 1-18.

Naftulin, D. H., Ware, J. E., Jr., & Donnelly, F. A. (1973). The Doctor Fox lecture: A paradigm of educational seduction. Journal of Medical Education, 48, 630-635.

O'Connell, D. Q. & Dickinson, D. J., (1993). Student ratings of instruction as a function of testing conditions and perceptions of amount learned. Journal of Research and Development in Education 27, (1), 18-23.

Olshavsky, R. W., & Spreng, R. A. (1995). Consumer satisfaction and students: Some pitfalls of being customer driven. Journal of Consumer Satisfaction, Dissatisfaction and Complaining Behavior, 8, 69-77.

Parasuraman, A., Zeithaml, V., & Berry, L. (1985). A conceptual model of service quality and its implications for future research. Journal of Marketing, 49(Fall), 41-50.

Rodin, M. & Rodin, B., (1972). Student evaluations of teachers. Science. 177, 1164-1166.

Strom, B., Hocevar, D. & Zimmer, J., (1990). Satisfaction and achievement: Antagonists in ATI research on student-oriented instruction. Educational Research Quarterly 14, (4), 15-21.

Trout, P. A. (1997). What the numbers mean: Providing a context for numerical student evaluations of courses. Change, 29, 25-30.

Williams, W.M. & Ceci, S.J. (1997). "How'm I doing?" Problems with student ratings of instructors and courses. Change, 29, 13-23.

Wright, R. E. (2000). Student evaluations and consumer orientation of universities. Journal of Nonprofit and Public Sector Marketing, 8, 33-40.

Yunker, P., & Yunker, J. (2003). Are student evaluations of teaching valid? Evidence from an analytical business core course. Journal of Education for Business, 78, 313-317.

Robert E. Wright

Business Administration Program

College of Business and Management

University of Illinois at Springfield