Value-added program assessment using nationally standardized tests: insights into internal validity issues.
The objectives of the present research are (1) to investigate specific internal validity issues involved in implementing value-added assessment at the program level, using nationally standardized tests as pretest and posttest measurements for students seeking a baccalaureate business degree, and (2) to present some value-added results we have estimated in our business program. We discuss challenges associated with using American College Testing (ACT) program composite scores as a measure of overall knowledge and skills upon entry, and the Major Field Test (MFT) total score as a measure of discipline-specific knowledge and skills upon exit from a baccalaureate business program. Options for estimating the effects of, as well as reducing the impact of, some of these issues are discussed. Also discussed are significant predictors of MFT for business students at a regional comprehensive university between 1999 and 2004.

Keywords: value-added approach, internal validity, higher education, assessment

Zeis, Charles; Waronska, Agnieszka K.; Fuller, Rex

Journal of Academy of Business and Economics, Vol. 9, Issue 1 (January 2009). Publisher: International Academy of Business and Economics. ISSN: 1542-8710.

Over the past twenty years, higher education has increasingly utilized various assessment approaches to demonstrate the impact or value of a post-secondary education. Such efforts have been driven by several interrelated forces, among them increased accountability. For public institutions, increased accountability is typically measured in one or more of the following ways: (1) through legislative funding, (2) through oversight by governing boards, (3) through oversight by coordinating boards.

Outcomes assessment at both the student and program level is a common practice in schools of business and is supported by both academic business program accreditation agencies: The Association to Advance Collegiate Schools of Business (AACSB) and The Association of Collegiate Business Schools and Programs (ACBSP) (Henninger, 1994). At the university level, regional accrediting bodies such as the North Central Association Commission on Accreditation and School Improvement (NCA CASI) have included assessment programs as major components of accreditation since the early 1990s.

Program assessment evaluates a program, as opposed to an individual student. Numerous researchers support a value-added program assessment approach (Bennett, 2001; Clerehan et al., 2003; Kerby and Weber, 2000; Pickering and Bowers, 1990; Pipho, 1998; Tam, 2001).

Olson (2004) discusses fundamental problems related to value-added program assessment. The author points out that, under current practice, statistical biases are not well understood, and models often do not control for student or school characteristics.

Pickering and Bowers (1990) discuss strengths and weaknesses of the value-added approach. In addition, they delineate four major validity issues to be addressed when using a value-added approach; specifically: (1) problems traditionally associated with one-group pretest-posttest designs; (2) problems that students encounter during the actual test-taking procedures; (3) problems with defining and measuring the effects of the treatment, that is, the curriculum; and (4) problems with analyzing and interpreting the data to make recommendations.

Isaac and Michael (1981) discuss the research background of these problems. They list traditional threats to internal validity. These threats overlap with the problems listed by Pickering and Bowers (1990).

Some other researchers strongly oppose the value-added approach. Miller (1999) argues that it may actually reduce the overall level of observed student achievement when the goal is to increase outcomes from a base performance level that may be lower than the actual levels of some students. Tam (2001) concludes that the value-added approach to quality measurement is no doubt an improvement over input-output analysis and its associated performance indicators, but highlights the fundamental problem of the approach: that it assumes a stable relationship between students' performance at the points of entry and exit. Similarly, McCaffrey et al. (2003) note the promise of the value-added approach and argue that estimates need to be provided that are not distorted by the powerful effects of such non-educational factors as family background. This study also recommends the empirical evaluation of other potential error sources. Slater (2003) has doubts about the quality of value-added scores.

To date, the empirical literature gives little detail about overcoming these validity issues; we have not found comprehensive solutions to these threats in the literature.


Our definition of value-added comes from McMillan (1988):

Students are assessed for entering competencies and then reassessed following the completion of appropriate courses or experiences. Differences between the initial and subsequent measures are then used as evidence of institutional impact.

The specific perception of the value-added concept that we are trying to realize with our program is stated as follows:

If measurements of program elements are found to contribute positively to exit scores (Major Field Achievement Tests from ETS Inc., which we call MFT), over and above entry scores (ACT Composite Score of ACT Inc., which we call ACT), and over and above other sources of bias (including non-educational factors) that can be identified, then the program adds value to students' education.

We calculate value-added based on MFT scores recorded for all graduating seniors for whom ACT scores are available. Our results support a hypothesis of a positive difference between exit and entry scores. We use stepwise multiple regression to study a quasi-experimental model that estimates MFT using ACT, grade point average (GPA), time, and various demographics and interactions as predictors. The significance of the GPA coefficient will give an indication of a correlation between the curriculum and MFT over and above that of ACT.
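The regression step described above can be sketched in code. The following is an illustrative pure-Python ordinary-least-squares fit on simulated data (the scores and coefficients below are hypothetical, not the authors' records); in practice a statistical package would perform the fit and the stepwise variable selection.

```python
import random

# Minimal ordinary-least-squares fit via the normal equations, illustrating
# the kind of model described in the text: predict MFT from ACT and GPA.
# All data below are synthetic.

def ols_fit(X, y):
    """Solve (X'X) b = X'y by Gaussian elimination. X includes a constant column."""
    k = len(X[0])
    # Build the k x (k+1) augmented system [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] +
         [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):                      # forward elimination with pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    b = [0.0] * k                             # back substitution
    for i in reversed(range(k)):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

random.seed(1)
rows = []
for _ in range(200):
    act = random.gauss(20.9, 4.1)                              # entry score
    gpa = min(4.0, max(2.0, random.gauss(3.0, 0.4)))
    mft = 120 + 1.5 * act + 8.0 * gpa + random.gauss(0, 5)     # exit score
    rows.append((act, gpa, mft))

X = [[1.0, act, gpa] for act, gpa, _ in rows]
y = [mft for *_, mft in rows]
const, b_act, b_gpa = ols_fit(X, y)
print(f"intercept={const:.1f}  ACT slope={b_act:.2f}  GPA slope={b_gpa:.2f}")
```

A significant GPA coefficient in such a fit, with ACT already in the model, is the pattern the text interprets as evidence of value added by the curriculum.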

The purpose of this research is: (1) to address the internal validity issues delineated by Pickering and Bowers (1990), and (2) to present statistical estimates of value-added in our program that take into account the effects of certain confounding variables. Some options to surmount selected issues will be suggested. This paper does not address external validity issues that have to do with generalizing these results to other institutions or degree programs.

The intended audience for this paper is all researchers who want insight into measurement problems associated with assessment data. We hope that institutions that desire to use nationally standardized entrance and exit scores such as ACT and MFT will be able to relate to issues of internal validity discussed.


The subject of this research is a university accredited regionally by the NCA CASI. It is classified as a moderately selective Masters I regional, comprehensive university that serves students from a large geographic region. Recently recorded statistics indicate that university enrollment is fewer than 4,500 students, accounting for fewer than 4,000 FTE. According to the university mission statement:

"... The University shall offer a broad array of baccalaureate programs with a strong professional focus and a firm grounding in the liberal arts and sciences. The university shall also offer selected masters-level graduate programs".

Based upon this mission, the university offers a number of traditional as well as professional undergraduate programs. The professional programs include teacher education, nursing, engineering technology, and business. The graduate programs reflect this professional emphasis and include nursing, applied natural science, industrial systems engineering, and business administration.

In April of 1998, the university created a centralized professional staff position housed in the provost's office. A major focus of this position was consultative in nature, in that the individual worked with colleges and departments as they developed and implemented their college and program assessment plans. During the initial phases, many units, including the business program, made progress on developing assessment plans related to student learning. In the wake of budgetary pressures, the centralized support for assessment was eliminated, and individual academic units assumed full responsibility for developing, maintaining, and finding resources for assessment plans.

These circumstances led the program to develop an assessment plan that could be sustained with limited administrative support. The decision to abandon a centralized assessment approach, coupled with accreditation requirements meant that the business program had to develop an alternative approach to assessment that would allow the school to measure student learning and engage in continuous improvement practices. In this environment, the business school developed an assessment plan that relies on multiple measures of student learning.


The assessment program includes a strategic mix of indirect and direct assessment measures. Indirect measures include the EBI satisfaction survey as well as a self-assessment of student proficiencies in selected learning goals. The resulting self-reported student data enables the school to carefully evaluate its learning environment and internal processes. For example, the EBI report provides feedback on the perceived quality of teaching, academic advising, career advising, placement services, and the quality of interactions with fellow students. The results from these types of questions enable the program to improve its academic processes.

Other indirect measures include locally developed instruments that track self-reported skill levels for specific program outcomes. This assessment tool furnishes student's self-ratings on various learning outcomes. For example, students are asked to rate their ability to develop a business plan. These results are tracked over time and used to inform faculty discussions about programmatic change. Direct program measures include the ACT and the MFT. Direct measures allow a greater understanding of actual student achievement.


During the early stages of the university's assessment efforts, the business school decided to use ETS's major field tests (MFT's) to monitor student learning in subject areas of business. The MFT is administered each year and the results are tracked over time. The data is also used to evaluate program effectiveness vis-a-vis its learning goals. For example, changes in the sub-scores on specific business subjects are evaluated in light of the curriculum requirements to discern whether program changes have had a positive or negative effect on MFT scores. Given the nature of the School's curriculum, the MFT is seen as an appropriate measure of general business knowledge. Moreover, we argue that by comparing MFT scores of graduating seniors to their ACT scores, one can estimate "value-added" that is due to the educational program.

Lopez (2002) cautions against using standardized tests as assessment instruments unless the particular test selected has been found to be appropriate for the specific learning objectives. Lopez recommends the use of multiple measures, both quantitative and qualitative, for measuring student achievement. As noted in the preceding section, this multi-faceted approach has been adopted by the school and endorsed by the faculty.


The positive relationship between ACT and success in college is well known. Noble and Sawyer (2002) reported that ACT Composite scores were effective in predicting all first-year GPA levels for college students. They stated that, for institutions participating in ACT Research Services, the median multiple correlation across 129 colleges between first-year GPA and the four ACT scores (in English, mathematics, reading, and science reasoning) was 0.43. The ACT Information Brief (2004) reports that the correlation between ACT Composite scores and first-year cumulative GPAs was 0.45 for returning students, compared to a correlation of 0.27 for non-returning students. Stumpf and Stanley (2002) report that ACT scores are good predictors of the percentage of admitted freshmen who graduate (correlations between 0.62 and 0.73), compared with high school GPA (correlation = 0.49). Although these correlations are significant, they also indicate that ACT is not a strong predictor of college success.


Positive correlations have been reported between general measures of student learning and student ability and college success. Powers and Kaufman (2002) find positive correlations between GRE[R] scores (the Graduate Record Examinations[R], of ETS Corp., a college-level measure of student learning) and selected personality traits such as quickness and creativity. Powers (1998) found the median multiple correlation between GRE[R] scores and success in veterinary school to be 0.53. Therefore, we expect a positive relationship between ACT scores and MFT scores.

This paper focuses only on the segment of our assessment model that uses the ACT Composite as the entry score and MFT Composite for the exit score. Lopez (1998) cites the use of these measures for certain universities in the North Central Association, but does not give details about results or specific problems.



The entry measurement is defined as the percentile of the average ACT of graduating business students at entry, compared to a national reference population of all entering college students. The exit measurement is the percentile of the average MFT for graduating seniors. MFT is administered in a capstone class during a typical student's last semester (or, in some cases, the next-to-last semester). The exit percentile is based on an MFT distribution determined from the "average" institutional MFT corresponding to the semester the test was taken.


Internal validity refers to whether the treatments or other measurable effects in an experiment actually impact the outcome variable. Internal validity threats associated with the one-group pretest-posttest design, mentioned earlier, are listed by Isaac and Michael (1981): history, maturation, testing effects, instrumentation effects, statistical regression, selection bias, and mortality. We address each of these in this section.

Definitions for these threats go back to Campbell and Stanley (1963). Some of the following--such as history, maturation, and instrumentation--would be mitigated by the use of a control group. To utilize a classic control group approach, an institution would need to maintain a randomly selected group of students who would not be permitted to complete the educational program. Such an approach is probably unrealistic. Later we will suggest how the use of a consortium of peer institutions might serve as a pseudo-control group.


The problem: There are historical trends in ACT or MFT scores that can be estimated using data provided by ACT Inc. and ETS Corp., presuming the ACT scores used are recorded with their corresponding dates. University historical events can be harder to account for. These include changes in the status of the university, such as mission changes, faculty or staff changes, events that have financial impact on the university, changes in admission standards or accreditation status, and curriculum changes.

Administrative changes could impact test results. In Colorado, for example, the public university system was put under financial restrictions brought on by a tax limitation amendment passed by the voters in 1994.

The solution: One way to address historical trends is to estimate them using time series analysis. This requires, at minimum, data collected over time. Ideally, results from a national group would be used as a control group. Historical events are represented in statistical models by indicator variables. This is not a perfect solution, because general improvements or declines in the group as a whole can be confounded with this historical event estimate.

Another solution for history effects (as well as maturation and instrumentation) is to estimate them using a group of statistically equivalent universities serving as a control group. Deviations from the group taken as a whole provide a measure of history that is due to an individual institution that is not subjected to the same history as other group members. We see this as a possibility for the future at our School.


The problem: Simple maturity of students occurs during the course of any assessment experiment. An argument can be made that maturation will average out across the nation. However, there could be reasons why maturity could be different in one university as compared with national averages. A more important question is why would the maturation effect at CSU-Pueblo be any different than that experienced by other Masters I institutions--i.e. the group to which the MFT scores are compared? A priori, could the maturation effect on "open" admissions campuses be larger?

The solution: This can at least be studied by recording demographic variables and studying their relationship with entry and exit measurements. If variables such as age or employment status contribute to the prediction of value-added, this could be an indication of a maturation effect. If so, an attempt should be made to estimate the size of the effect and make adjustments accordingly. Results of predicting the criterion MFT variable, adjusted for ACT, certain demographic variables, and time, are discussed later.


The problem: It is known that some universities that highly value their MFT exit scores may play up this emphasis to their students. Some even go so far as to offer incentives in an attempt to get better scores. One could argue that unbiasedness should be consciously practiced if comparisons are to be made with national averages for exit scores. Alternatively, one might argue that schools that offer incentives are measuring the maximum gain--assuming that the incentives offered by schools are comparable to the incentive of scoring well on the ACT to gain admission to one's school(s) of choice. By contrast, schools that do not offer incentives may be seeing a low estimate of value-added--i.e., less incentive for the MFT than the ACT may result in less measured value-added.

The Hawthorne effect occurs when subjects in experiments are aware that they are subjects, and they respond differently than if they were not. It can be partially minimized by the simple practice of not focusing students on ulterior assessment goals during test administration. The Hawthorne effect can be difficult to detect and correct without control groups.

A reverse testing effect may be present if, for various reasons, students are not motivated to score up to their potential on exit tests. It is presumed that students are motivated during entry testing--the ACT test in our case--because they need a good score in order to be admitted into a desirable college program. But exit tests may not carry the same motivation for students who have no plans to continue their education past their baccalaureate degree.

It is fair to say that no matter how conscientious experimenters are during the actual test, some testing effect may be inevitable.

The solution: At our school, extraordinary means to motivate students to score well on exit tests are avoided. We presume students infer that the test carries some degree of importance because it is administered during class time.


The problem: Because we use percentiles on nationally standardized tests, there are a number of instrumentation-related questions that arise. These usually concern the statistical distribution of scores.

Question 1: Is it fair to compare ACT with MFT, given that ACT assesses high school students' general educational development and their ability to complete college-level work, and MFT is designed to measure knowledge of business disciplines?

Question 2: What statistical distribution should be used for the entry score percentile? Also, the ACT scores are reported for two groups--Core and Less (than Core). Core students are those who have completed or plan to complete the college core curriculum. The distribution of scores for Core students may not ideally match that of actual college students in the U.S. Should adjustments be made to national ACT scores to account for differences between the original population tested and the population that entered a particular university?

Solution for question 2: To address this question, one can use the data from US News and World Reports (1999-2003) in their annual America's Best Colleges reports to construct a distribution of ACT scores across the country. One can construct a distribution of scores as reported to US News. A test is then run to see if the mean score is different than that for the distribution of Core students reported by ACT. When we did this analysis, we found the means for the two groups are almost identical. Had they been different, the data could provide a way to adjust the MFT percentiles for college-bound students. This analysis is discussed under Details of the ACT Distribution of Entry Scores.
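The mean-comparison step described above can be sketched as follows. This is a hypothetical illustration on simulated school medians (not the US News data), using a one-sample z test with a normal approximation; the reference mean of 22 is the Core median figure used for illustration.

```python
import math
import random
import statistics

# Sketch of the comparison described in the text: test whether the mean of a
# constructed distribution of institutional ACT scores differs from a
# reference mean.  Data are simulated; a real analysis would use the reported
# US News figures.

def one_sample_z(sample, mu0):
    """z statistic and two-sided p-value (normal approximation)."""
    n = len(sample)
    z = (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # 2*(1 - Phi(|z|))
    return z, p

random.seed(7)
medians = [random.gauss(22.0, 3.0) for _ in range(300)]  # simulated school medians
z, p = one_sample_z(medians, 22.0)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A large p-value, as the text reports for the real data, indicates no evidence that the two means differ, and hence no adjustment is needed.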

Question 3: A question regarding the ACT scores is whether or not the distribution of scores for Core students matches that of the set of students that actually enroll and attend college? We can surmise that many students that say they are going to college in actuality do not, and many who do not decide on college when taking the test will, in fact, later decide to go. Also, there may be differences between demographic groups of a particular institution, as compared with the national group.

Solution for question 3: We studied national trends for Hispanic versus non-Hispanic students, gender, and business versus non-business intentions of high school students. Since we can identify minority and gender status, we analyze directly for those differences. We did not make an adjustment for Core versus non-Core because the results we obtained from the US News and World Reports data did not suggest that one be made.

Question 4: Because entry exam takers span all college major fields of study, while exit exam takers are housed primarily in a specialized sector such as business, how large is the bias associated with fundamental differences between these two populations?

Solution for question 4: An adjustment for the area of specialty for exit scores such as MFT can be made based on the distribution of scores for area-bound students. ACT scores broken down by planned major, such as business, can be obtained from ACT Inc. From this, an adjustment to the entry score percentile can be made for areas of study. We found from data obtained from ETS Corp. that the effect was not large enough to warrant a correction (0.6 of a point).

Question 5: What statistical distribution should be used for the exit score percentile?

This is much less threatening for MFT, because the distribution for each set of MFT scores is reported along with the scores. Experimenters can use the date-specific MFT distribution computed for the nation as a whole. This will give a comparison with students from all universities, public and private, that use MFT. A better approach, again, is to form a group of comparable universities for the purpose of studying exit scores. This will provide a better fit regarding the type of university and the goals of the program.

It must be admitted that other demographic variables, for example geographic location, may exist that will bear on value-added. We rely on future research to improve our general knowledge of variable relationships that are not presently well understood.


The problem: This form of bias is a problem for big prestigious schools that have many students with abnormally high entry scores, because they have a smaller chance of improving on those scores at exit.

Some schools actually have the opposite problem--students with abnormally low scores. Our experience is that abnormally low-scoring students tend to be very erratic. Extreme cases also tend to create unusual tail conditions in statistical distributions. These tail conditions can result in the rejection of the hypothesis of normality for populations of scores that otherwise appear very bell-shaped.

The solution: A way to estimate the effect associated with very high and very low exit scores is to analyze outliers of regressions. We report the results of predicting MFT using ACT and other demographic variables, mentioned earlier under maturation effect, in the results section, to follow.


The problem: We presume that any school that samples students for exit score testing will use some type of randomization method. If self-selection is used, a selection bias is certain to occur, because the students who tend to take the test will be those motivated to score well--those who want to impress future school administrators or future employers. A bias could also be generated if exam administrators, perhaps unconsciously, fail to retain students they know scored well on the ACT but are doing poorly; in practice, however, most faculty pay little attention to incoming student scores.

The solution: One must be careful about the sampling aspect of exit test-taking. The best solution for experimental purposes is a mandatory test for all graduating students. The situation of missing test results for any reason is discussed under mortality, below. This self-selection bias can be avoided by administering MFT to all graduating seniors, and then only analyzing those that have corresponding ACT scores. Note that we also study differences between ACT students and non-ACT students.


The problem: Mortality in higher education primarily takes the form of lack of retention. Because of the positive correlation between entrance scores and success in college previously mentioned, we expect the entrance scores to be lower on average for freshmen than for graduating seniors.

A program that uses mandatory exit testing will have another (hopefully minor) mortality issue that has to do with students that do not take the exit test because of illness or other acceptable reasons. An assumption is usually made when comparing results with a national standard that all such universities have some percentage of students designated as test-takers that fail to complete the test. The magnitude of this bias will probably never be fully known.

The solution: A fundamental practice is for exit scores to be compared only to their corresponding entrance scores. This will eliminate the natural difference due to entrance skill differences.

The value-added aspect of education for students that are not retained is a separate issue that should be addressed in the assessment model. If the value added is less for students that are not retained, it may be a sign of a retention issue that needs to be addressed.

The addition of a "middle" score to the assessment effort--one taken when students have completed about 50 percent of the program--is a way to measure mortality bias, if any exists. If the value-added between entry and "middle" is not consistent with that between "middle" and exit, then an explanation is in order. At minimum, retention rates should be reported alongside value-added, as an additional assessment measure of program effectiveness. We do not presently have a "middle" score at our school.


As mentioned, only students who have an ACT score on file and who have progressed to the capstone course are used in the calculation of value-added. For this result period, we use MFT-takers between fall 1999 and spring 2004. Forty percent of the MFT takers did not have ACT scores on file. Most of these are transfer students, to be discussed.

The calculation of value-added is based on percentiles calculated from the distributions of ACT and MFT, which are described in the next section. The section that follows contains results of the analysis predicting MFT using GPA, curriculum, time, demographic, and interaction independent variables. This is done to estimate and correct for certain history, maturation, testing, and statistical regression threats to validity.

To address certain instrumentation threats having to do with ACT-takers being business-bound or not, Hispanic or not, and male or female, we obtained data from ACT for the years 1994 to 2003. The details are discussed in the next section.

Our approach to the sample selection threat is to measure MFT using all graduating students, and analyze those that have reported ACT scores to the university. Some transfer students do not report ACT because their community college or other university experiences qualify them for admission without ACT scores. We find no significant difference for MFT, between students that have ACT scores on file and those that do not (t-test, n1=193, n2=128, t=0.35, p-value = 0.723).
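The two-group comparison reported above (a t-test of MFT between students with and without ACT scores on file) can be sketched as follows, on simulated MFT scores drawn from a common distribution. With group sizes near those reported, a normal approximation to the p-value is reasonable; the mean and spread below are hypothetical.

```python
import math
import random
import statistics

# Sketch of a two-group comparison like the one reported in the text.  Both
# groups are simulated from the same distribution, so we expect no
# significant difference.

def welch_t(a, b):
    """Welch t statistic and two-sided p-value via a normal approximation."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    t = (ma - mb) / se
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))
    return t, p

random.seed(3)
act_group = [random.gauss(152, 12) for _ in range(193)]  # hypothetical MFT scores
no_act = [random.gauss(152, 12) for _ in range(128)]
t, p = welch_t(act_group, no_act)
print(f"t = {t:.2f}, p = {p:.3f}")
```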


The standard assumption one would make about entry scores is that the distribution for ACT Core students, who have designated college as their plan, would be consistent with that for actual college freshmen. We decided to find out the actual distribution of ACT scores for schools in the US that use ACT in their admission requirements, as reported in US News and World Reports. This analysis was completed for the years 1999--2003. Surprisingly, it turned out to be almost identical for each of the years.

US News and World Reports (2001) give first and third quartiles. We estimate a median entrance score using the midpoint between these two quartiles. We present the resulting histogram of ACT median scores in Table 1. It corresponds to US News and World Reports reported colleges and universities that use ACT in 2001.

The Anderson-Darling test indicates there is a significant difference between this distribution and a true normal distribution, with a p-value of .000. The mean of 22.03 and the median of 22 are almost identical. Importantly, the median matches that of the distribution of ACT scores for Core students, as reported by ACT Inc. (1994-2003), which appears in Table 2. We have not found evidence from this data to say an adjustment is necessary for the difference between ACT Core students and students who actually end up as freshmen in college.
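The Anderson-Darling statistic used above can be sketched in a few lines. This is an illustrative pure-Python implementation of the A^2 statistic for normality with mean and standard deviation estimated from the sample, evaluated on a deliberately skewed synthetic sample; the 0.752 figure is the commonly tabulated 5% critical value for this case.

```python
import math
import statistics

# Anderson-Darling A^2 statistic for normality, with parameters estimated
# from the sample, plus Stephens' small-sample adjustment.  Comparing the
# adjusted statistic to ~0.752 approximates a 5%-level test.

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def anderson_darling(sample):
    xs = sorted(sample)
    n = len(xs)
    m, s = statistics.mean(xs), statistics.stdev(xs)
    z = [phi((x - m) / s) for x in xs]
    a2 = -n - sum((2 * i + 1) * (math.log(z[i]) + math.log(1 - z[n - 1 - i]))
                  for i in range(n)) / n
    return a2 * (1 + 0.75 / n + 2.25 / n ** 2)   # small-sample adjustment

# A strongly right-skewed synthetic sample should be flagged as non-normal.
skewed = [1] * 40 + [2] * 25 + [4] * 15 + [8] * 10 + [16] * 6 + [32] * 4
a2 = anderson_darling(skewed)
print(f"adjusted A^2 = {a2:.2f}")
```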

The first approach for determining an ACT distribution for US students is to use some kind of weighted composite of distributions for the years in question. Because our records do not have the exact dates when our students took the ACT, we have to use a compromise. We decided to use a theoretical distribution corresponding to the years 1994 through 1999. This is the period when most of our students were entering freshmen, and we judge that the distribution of ACT scores seems to be relatively stable during that time, based in part on Table 2.

We decided to adjust the mean ACT to correspond to certain key demographics. Because we are comparing MFT for business, we adjust for the Planned Educational Major. Because we are a Hispanic Serving Institution, we adjust for Hispanic ethnicity. We make a very minor adjustment for gender. Appropriate data was obtained from ACT Inc. We used two multiple regression analyses--one to predict the mean, and the other to predict the standard deviation, adjusting for these demographics in both cases. The estimated mean turned out to be 20.9, and the standard deviation 4.14. We determine estimates of the percentiles in Table 1 by calculating the cumulative frequency corresponding to a normal distribution with this mean and standard deviation.
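The percentile calculation described above reduces to evaluating a normal cumulative distribution. A minimal sketch, using the adjusted mean of 20.9 and standard deviation of 4.14 reported in the text:

```python
import math

# Percentile of an entry ACT score under the adjusted reference distribution
# described in the text: normal with mean 20.9 and standard deviation 4.14.

MEAN, SD = 20.9, 4.14

def act_percentile(score):
    """Cumulative normal probability, expressed as a percentile."""
    z = (score - MEAN) / SD
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

for score in (17, 20.9, 25):
    print(f"ACT {score}: {act_percentile(score):.1f}th percentile")
```

A score at the adjusted mean falls, by construction, at the 50th percentile.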


Variables for data collected between fall 1999 and spring 2004 are listed in Table 3.

For all statistical hypothesis testing, we used a significance level of 0.05. Minitab statistical software was used for all analyses. Statistical methods include stepwise multiple regression and general linear model analysis.
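Our stepwise regressions were run in Minitab. Purely to illustrate the idea of stepwise selection, here is a stdlib-only sketch of greedy forward selection by residual sum of squares; real stepwise procedures typically use F-to-enter or p-value criteria, and all variable names and data below are hypothetical.

```python
def ols_sse(xcols, y):
    """Fit OLS with an intercept on the given predictor columns and return
    the residual sum of squares (normal equations + Gaussian elimination;
    adequate for the handful of predictors considered here)."""
    n = len(y)
    cols = [[1.0] * n] + [list(c) for c in xcols]
    k = len(cols)
    # Augmented system [X'X | X'y]
    a = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
         + [sum(cols[i][t] * y[t] for t in range(n))] for i in range(k)]
    for i in range(k):  # elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        for r in range(i + 1, k):
            f = a[r][i] / a[i][i]
            for c in range(i, k + 1):
                a[r][c] -= f * a[i][c]
    beta = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        beta[i] = (a[i][k] - sum(a[i][j] * beta[j]
                                 for j in range(i + 1, k))) / a[i][i]
    resid = [y[t] - sum(beta[j] * cols[j][t] for j in range(k))
             for t in range(n)]
    return sum(e * e for e in resid)

def forward_stepwise(predictors, y, min_improvement=1e-6):
    """Greedy forward selection: repeatedly add the predictor that most
    reduces SSE, stopping when no candidate improves the fit."""
    chosen, remaining = [], dict(predictors)
    ybar = sum(y) / len(y)
    sse = sum((v - ybar) ** 2 for v in y)  # intercept-only SSE
    while remaining:
        name, new_sse = min(
            ((nm, ols_sse([predictors[p] for p in chosen] + [col], y))
             for nm, col in remaining.items()),
            key=lambda t: t[1])
        if sse - new_sse <= min_improvement:
            break
        chosen.append(name)
        del remaining[name]
        sse = new_sse
    return chosen

# Hypothetical data: y depends on "ACT" only; "Noise" is unrelated.
act = [18, 21, 24, 27, 30, 20, 22, 25, 19, 28]
noise = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
y = [3 + 2 * a for a in act]
print(forward_stepwise({"ACT": act, "Noise": noise}, y))  # ['ACT']
```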

The data are highly unbalanced because not all variable values are available for all students. Some ACT scores are missing, as are some of the indicators and grades for college algebra and business quantitative analysis. Data are available for all students on MFT, gender, Hispanic status, credit hours transferred, age, entry year, cumulative GPA, and time period. Table 4 summarizes the data available for ACT by gender and transfer-hour group.

A general linear model analysis predicting MFT showed no significant difference between ACT and non-ACT students (p = 0.565). There were 190 ACT students and 124 non-ACT students.

We consider only the 190 students who represent the 61 percent of graduating seniors with an ACT score on file. Using only graduating students eliminates some of the selection bias associated with students with lower ACT scores; this bias is discussed in the following section. Results of the stepwise regression of MFT exit scores on the significant explanatory variables and interactions are presented in Table 5.

We see a strong relationship between MFT and these four variables. The effect of age is small but significant. The gender effect indicates that men and women do not follow the same pattern from ACT outcome to MFT outcome. The strongly significant GPA component suggests that success in our curriculum plays a substantial part in scoring well on the MFT, over and above the ability shown by the ACT. This supports a content hypothesis--that our students have added value by completing the business curriculum.

Of the 187 observations used for the regression in Table 5, eleven departed noticeably from their predicted values, as identified using Cook's distance; about 9.4 such unusual observations would be expected by chance. Six of these were unusually high, and five were unusually low. We conclude that outliers, and hence statistical regression biases, are not significantly influencing our results.
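Cook's distance weighs each observation's squared residual by its leverage. For a single predictor it has a simple closed form; the sketch below (our own illustration, not the multiple-regression screen used for Table 5) flags the point that most influences the fit.

```python
def cooks_distances(x, y):
    """Cook's distance for each observation of a simple linear regression."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    p = 2                                       # parameters: intercept + slope
    s2 = sum(e * e for e in resid) / (n - p)    # residual mean square
    dists = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx    # leverage of this observation
        dists.append(e * e / (p * s2) * h / (1.0 - h) ** 2)
    return dists

# One y-value is shifted well off the line; it gets the largest distance.
x = list(range(1, 11))
y = [2 * v for v in x]
y[5] += 10                       # outlier at x = 6
d = cooks_distances(x, y)
print(d.index(max(d)) == 5)      # True
```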

Corrected value-added results for the semesters between fall 1999 and spring 2004 are listed in Table 6. The overall average value added is estimated at 4.84 percentile points. We interpret this, along with the positive coefficient from the stepwise regression, as an indication of positive value added. Note that the Hispanic variable did not contribute to the prediction of MFT over and above the variables in Table 5.
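The 4.84 figure appears to be the unweighted mean of the per-period exit-minus-entry differences; a quick check against the Table 6 rows:

```python
# Per-period (entry ACT percentile, exit MFT percentile) pairs from Table 6.
periods = [(36.85, 47.50), (69.40, 69.0), (50.96, 53.0), (36.85, 46.0),
           (41.40, 44.0), (24.18, 22.50), (32.31, 49.0), (41.40, 44.0),
           (50.96, 61.0), (51.0, 55.0), (41.40, 39.0)]
diffs = [exit_ - entry for entry, exit_ in periods]
value_added = sum(diffs) / len(diffs)   # simple (unweighted) mean
print(round(value_added, 2))  # 4.84
```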


We summarize our position on the seven types of validity threats as follows.

Threats 1 and 2. Known or unknown history and maturation threats could be mitigated by using control groups formed in conjunction with comparable universities. Although we did not find any such threats in this study, we acknowledge that they may exist.

Threat 3. Because the entry and exit instruments are nationally standardized tests, and based on our subjective evaluation of the testing process, we believe we have not introduced any noteworthy testing effects independent of instrumentation. Control groups and "middle" measurements would improve our ability to mitigate or estimate testing effects. Even with control groups, however, subtle testing effects could be introduced administratively and would be difficult to detect.

Threat 4. By discussing in detail the methods of determining statistical distributions of pretest and posttest scores, we have addressed the instrumentation threat to some extent. We acknowledge that work remains, and further research will no doubt shed more light on this aspect.

Threat 5. Based on our analysis of outliers, we do not see that statistical regression is a large threat at our school. We do not address the external threat involved in generalizing our results to other schools.

Threat 6. We do not feel sample selection is a large threat for us, because the sample includes all students who progressed into the final year of our undergraduate program.

Threat 7. The effects of mortality remain unknown. Mortality threats could be better understood by implementing testing at more than two points in our students' curriculum.

We do not claim our value-added results are free of bias. Overall, we feel we have made progress in mitigating and estimating certain internal biases, and we acknowledge that some validity problems remain.

Of course we are pleased that our analysis showed value added. We realize we need to continue studying our data for effects that we presently may not be detecting.


Dealing with internal validity problems is the norm for all researchers seeking good measurements. There are two basic remedies--correction and elimination. Correction estimates the size of an effect using available data; elimination of a threat requires some type of experimental design, specifically control groups (see Campbell and Stanley, 1963).

We add to the recommendations of Pickering and Bowers (1990) for the use of value-added assessment as follows:

1) Brainstorm possible validity threats. A list of variables that are measurable, as well as those that are not, should be drawn up and considered for study.

2) Estimate trends of assessment measurements over time. This could help to uncover and correct for validity issues.

3) Study as many demographic relationships as possible. These include, but are not limited to, age, gender, previous education, and race. Relationships obtained will help administrators understand sources of differences, as well as point out aspects of the student population that need to be addressed.

4) Estimate the impact of a threat when variable(s) are available, such as demographic and time variables.

5) Eliminate or correct all sources of bias that can be identified and estimated under history, maturation, instrumentation, testing, statistical regression, sample selection, and mortality.

6) Use measures of assessed progress at other points between entry and exit. This will accomplish the following: (a) help estimate history, maturation, instrumentation, testing, or mortality threats, and (b) refine the estimate of the program's effect over and above entry scores and other measured sources of bias. A "middle" score will also enhance the precision of the value-added estimate.

7) Form a consortium of similar universities that also use the ACT and MFT to study value added. Members would serve as control groups for one another, helping to mitigate history, maturation, testing, and mortality threats.

8) Study other variables that may pertain to student learning. Work habits, lifestyle characteristics, and personality traits may bear on outcomes.


ACT Information Brief (2004-1), Retention, ACT Composite Score, and College GPA: What's the Connection?

ACT Incorporated (1994-2003), The ACT High School Profile Report--National Data.

Bennett, Douglas C., "Assessing Quality in Higher Education", Liberal Education, Vol. 87(2), 2001, 40-45.

Campbell, Donald T. and Stanley, Julian C., Experimental and Quasi-Experimental Designs for Research, Rand McNally & Company, Chicago, Ill, 1963.

Clerehan, Rosemary, Chanock, Kate, Moore, Tim, and Prince, Anne, "A Testing Issue: Key Skills Assessment in Australia", Teaching in Higher Education, Vol. 8(2), 2003, 279-284.

Colton, Dean A., Gao, Xiaohong, Harris, Deborah J., Kelen, Michael J., Martinovich-Barhite, Dara, Wang, Tianyou and Welch, Catherine J., Reliability Issues With Performance Assessments: A Collection of Papers, ACT Research Report Series 97-3.

Henninger, Edward A., "Outcomes Assessment: The Role of Business School and Program Accrediting Agencies", Journal of Education for Business, Vol. 69(5), 1994.

Isaac, S. and Michael, W. B., Handbook in Research and Evaluation (2nd Ed.), Edits Publishers, San Diego, CA, 1981.

Kerby, Debra and Weber, Sandra, "Linking Mission Objectives to an Assessment Plan", Journal of Education for Business, Vol. 75(4), 2000, 202-209.

Lei, Pui-Wa, Bassiri, Dina and Schultz, E.M., Alternatives to the Grade Point Average as a Measure of Academic Achievement in College, ACT Research Report Series 2001-4, 2001.

Lopez, Cecelia L., "Assessment of Student Learning: Challenges and Strategies", Journal of Academic Librarianship, Vol. 28(6), 2002, 356-367.

Lopez, Cecelia L., "Assessment of Student Learning", Liberal Education, Vol. 84(3), 1998, 36-44.

McCaffrey, Daniel F., Lockwood, J.R., Koretz, Daniel M., and Hamilton, Laura S., Evaluating Value-Added Models for Teacher Accountability, The Rand Corporation, Santa Monica, CA, 2003.

McMillan, J. H., "Beyond Value-Added Education: Improvement Is Not Enough", Journal of Higher Education, Vol. 59(5), 1988, 564-579.

Miller, Michael S., "Classroom Assessment and University Accountability", Journal of Education for Business, Vol. 75(2), 1999.

Noble, Julie and Sawyer, Richard, Predicting Different Levels of Academic Success in College Using High School GPA and ACT Composite Score, ACT Research Report Series 2002-4, 2002.

Olson, Lynn, "Researchers Debate Merits of 'Value-Added' Measures", Education Week, Vol. 24(12), 2004, 14-15.

Pickering, James W. and Bowers, Jeanne C., "Assessing Value-Added outcomes Assessment", Measurement & Evaluation in Counseling & Development, Vol. 22(4), 1990, 215-221.

Powers, D. E., Validity of GRE General Test Scores for Admission to Colleges of Veterinary Medicine, ETS Research Report #RR-01-10, 1998, ETS Corporation.

Powers, D. E. and Kaufman, J. C., Do Standardized Multiple-Choice Tests Penalize Deep Thinking or Creative Students?, ETS Research Report #RR-02-15, 2002, ETS Corporation.

Pipho, Chris, "The Value-Added Side of Standards", Phi Delta Kappan, Vol. 79(5), 1998.

Slater, Jon, "Doubts About Quality of Added Value Scores", Times Educational Supplement, Issue 4527, 4/11/2003, 2-6.

Stumpf, Heinrich and Stanley, Julian C., "Group Data on High School Grade Point Averages and Scores on Academic Aptitude Tests as Predictors of Institutional Graduation Rates", Educational & Psychological Measurement, Vol. 62(6), 2002, 1042-1053.

Tam, Maureen, "Measuring Quality and Performance in Higher Education", Quality in Higher Education, Vol. 7(1), 2001, 47-54.

US News & World Reports (1999-2003), America's Best Colleges, US News & World Reports, Washington, DC.

Charles Zeis, Colorado State University-Pueblo, Pueblo, Colorado, USA

Agnieszka K. Waronska, Colorado State University-Pueblo, Pueblo, Colorado, USA

Rex Fuller, Eastern Washington University, Spokane, Washington, USA


Dr. Charles Zeis is retired Professor of Business Administration at the Colorado State University--Pueblo, Colorado. His Ph.D. is in Statistics from Texas A&M University. His research interests include statistical aspects of rating scale data, analysis of survey data, and analysis of student assessment. Professor Agnieszka K. Waronska is completing her doctorate in Manufacturing Management and Engineering at the University of Toledo, Toledo, Ohio. Currently she is an Assistant Professor of Management at the Colorado State University--Pueblo, Colorado.
Table 1: Histogram of estimated median ACT scores
(N = 497; each * represents 2 observations)

Midpoint   Count

14         1 *
15         2 *
16         4 **
17         8 ****
18         8 ****
19         29 ***************
20         45 ***********************
21         81 *****************************************
22         96 ************************************************
23         82 *****************************************
24         63 ********************************
25         35 ******************
26         22 ***********
27         8 ****
28         12 ******
29         1 *


Table 2: Average ACT composite score for Core students, by year

Year    Average Core

1994    22.0
1995    22.0
1996    22.0
1997    22.1
1998    22.1
1999    22.0
2000    22.0

Data are from ACT Inc. (1994-2000), The ACT High School Profile
Report--National Data, at www.act.org/news/data.


Table 3: Variables for data collected between fall 1999 and spring 2004

Name               Description

1. MFT Comp        Composite score reported on the Major
                   Field Test exam from ETS
2. ACT Comp        Composite score reported on the ACT
                   exam from ACT Inc.
3. Gender          Indicator variable for gender
4. Hisp            Indicator variable for Hispanic status
5. HrsTrns         Number of hours transferred from community
                   colleges or other universities
6. TransCateg      Transfer credit hour group: 1 = none, 2 = 1-29
                   hours, 3 = 30-59 hours, 4 = 60 hours or more
7. Age             Age of respondent in years
8. Entry year      Year of admission
9. AlgInd          College algebra indicator:
                   1 = taken at the university, 0 = not
10. AlgGrade       Grade for college algebra, ranges from 1.0 to 4.0
11. QuantInd       Quantitative analysis indicator:
                   1 = taken at the university, 0 = not
12. QuantGrad      Grade for business quantitative analysis course,
                   ranges from 1.0 to 4.0
13. CumHrs         Cumulative credit hours taken at the university
14. CumGPA         Cumulative GPA at the university,
                   ranges from 2.0 to 4.0
15. PeriodMFT      Time period for MFT exam
16. Interactions   Interaction combinations among ACT Comp, Gender,
                   HrsTrns, TransCateg, Age, and CumGPA


Table 4: ACT students by gender and transfer-hour group

Transfer Group     Male   Female   Total

0 hours              48       51      99
1-29 hours           23       26      49
30-59 hours          10       14      24
60 hours or more      6       12      18
Total                87      103     190

Number of MFT observations available for non-ACT students = 124


Table 5: Significant predictors of MFT exit score (stepwise regression)

Variable          Coefficient   P-Value   Variable Description

1. Age            0.5332        0.001     Age in years
2. Cum GPA        7.223         0.000     Cumulative GPA
3. Gender * ACT   -0.2316       0.000     Gender x ACT composite
4. ACT            2.243         0.000     ACT composite score

Note: R-square = 48.9%

Table 6: Value-added results by MFT test period (percentile points)

               ACT          MFT          Exit--Entry
Period    N    Percentile   Percentile

1999      10   36.85        47.50        10.65
1999      17   69.40        69           -0.4
2000      22   50.96        53           2.04
2000      16   36.85        46           9.15
2001      7    41.40        44           2.60
2001      22   24.18        22.50        -1.68
2002      30   32.31        49           16.69
2002      21   41.40        44           2.60
2003      12   50.96        61           10.04
2003      6    51           55           4
2004      29   41.40        39           -2.4
Average        43.34        48.18        4.84