The effects of word variability on the lexical decision task.
The effect of word list variability on lexical decision performance was investigated. The stimulus list presented to participants either contained words that varied little from each other (homogeneous) in terms of word frequency and neighborhood count or words that varied a lot from each other (heterogeneous). All other factors were kept constant, and only the variability of the word lists was manipulated. The results of the study showed that increasing the variability of the word list decreased the accuracy of response to words. No effect was observed on nonword accuracies. This suggests that studies using highly variable word lists may be getting significant results simply because of the variability itself.

Subjects: Lexical phonology (Research); Mental lexicon (Research); Decision-making (Psychological aspects)
Authors: Azarbehi, Rostam; Piercey, C. Darren; Joordens, Steve
Publication: North American Journal of Psychology. Publisher: North American Journal of Psychology. Audience: Academic. Format: Magazine/Journal. Subject: Education; Psychology and mental health. Copyright: 2011 North American Journal of Psychology. ISSN: 1527-7143
Date: June 2011. Source Volume: 13. Source Issue: 2
Our ability to process words during a lexical decision task depends on a number of lexical features. These features include, but are not limited to, word frequency (i.e., the number of times a word appears in written text), orthographic neighborhood count (i.e., the number of other words that can be created by changing a single letter of the word), letter count, and number of syllables (Andrews, 1992; Sears et al., 2008). For example, words that are high in frequency are responded to more quickly and accurately than low frequency words (Rubenstein, Garfield & Millikan, 1970; Scarborough, Cortese & Scarborough, 1977; Stone & Van Orden, 1993). When designing a lexical decision experiment, researchers often vary specific lexical features while matching others. For example, to investigate the frequency effect, one would create two groups of words that differ in frequency but are matched on other lexical features, such as average orthographic neighborhood, letter count, and number of syllables.
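The neighborhood count defined above can be illustrated with a short sketch. It assumes the common single-letter-substitution definition and uses a tiny, hypothetical lexicon; the function name and the word set are illustrative and are not taken from the MRC database or from the present study.

```python
import string

def neighborhood_count(word, lexicon):
    """Count lexicon entries that differ from `word` by one substituted letter."""
    word = word.lower()
    neighbors = set()
    for position in range(len(word)):
        for letter in string.ascii_lowercase:
            if letter == word[position]:
                continue  # substituting the same letter gives the word itself
            candidate = word[:position] + letter + word[position + 1:]
            if candidate in lexicon:
                neighbors.add(candidate)
    return len(neighbors)

# Hypothetical toy lexicon, for illustration only.
toy_lexicon = {"house", "mouse", "louse", "rouse", "horse", "hoist"}
print(neighborhood_count("house", toy_lexicon))
```

With a real lexicon the count depends entirely on which words the lexicon contains, so counts from different databases are not directly comparable.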

When creating a list of items for a lexical decision task, researchers report the mean values for each of the lexical features they are controlling and manipulating. However, researchers typically do not report the amount of variability in each of the lexical features being controlled. The amount of variability in a list of items can have an effect on lexical decision performance. For example, in two separate studies, Glanzer and Ehrenreich (1979) and Gordon (1983) manipulated the variability of the frequency ratings. In each study, participants received six blocks of trials that contained either a pure low frequency list of items, a pure medium frequency list of items, a pure high frequency list of items, or one of three mixed frequency, high variability lists. The mixed frequency lists were created by choosing an equal number of items from each of the three pure frequency lists. Their results indicated that increasing the variability of a list of word items both increased reaction time and decreased the error rate.

According to Balota and Chumbley (1984), a lexical decision may be performed based on an early familiarity measure. When an item is presented during a lexical decision task, a representation begins to form that becomes more and more like something that is stored in memory. For word items, the similarity between the representation that is forming and memory will increase at a rate that is greater than the rate for nonwords. Words will also reach a higher level of familiarity than nonwords. Therefore, a familiarity decision criterion that falls somewhere between the average level of familiarity for words and nonwords could be used as a basis for categorization. If the familiarity decision criterion is determined based on the average familiarity for words and nonwords, increasing the variability of a list of words should have an effect on accuracy but no effect on reaction time. Increasing the variability of a word list will not affect the average level of familiarity; however, a highly variable word list will contain items that are not very familiar and these items would have a greater chance of incorrectly being called nonwords.
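The prediction in the preceding paragraph can be sketched numerically. The sketch below is not Balota and Chumbley's model; it assumes, purely for illustration, that familiarity values are normally distributed and that the criterion sits midway between the word and nonword means. All numeric values are hypothetical.

```python
from statistics import NormalDist

def miss_rate(word_mean, word_sd, criterion):
    """Proportion of words whose familiarity falls below the criterion."""
    return NormalDist(word_mean, word_sd).cdf(criterion)

# Hypothetical familiarity values (arbitrary units).
WORD_MEAN, NONWORD_MEAN = 1.0, 0.0
criterion = (WORD_MEAN + NONWORD_MEAN) / 2  # depends on the means only

low_variability_misses = miss_rate(WORD_MEAN, 0.3, criterion)
high_variability_misses = miss_rate(WORD_MEAN, 0.6, criterion)
print(low_variability_misses, high_variability_misses)
```

Because the criterion is set from the means alone, widening the word distribution leaves the criterion where it is while pushing more words below it, which is the accuracy cost described above.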

There are a number of lexical decision models that utilize decision criteria based on some measure of familiarity or similarity between the representation that is forming and memory (Grainger & Jacobs, 1996; Joordens, Piercey, & Azarbehi, 2009; Ratcliff, Gomez, & McKoon, 2004). However, the purpose of this study is not to perform a critical evaluation of the existing models of lexical decision. Rather, the purpose is to determine how changes in the variability of a word list will affect lexical decision performance.

The mixed lists utilized by Glanzer and Ehrenreich (1979) and Gordon (1983) were created by combining high, medium, and low frequency words. Therefore, the resulting lists of mixed words had an average frequency rating that fell somewhere between the low and high frequency pure lists and had a greater variability than the pure lists. The authors of each of these studies did not report the average frequencies for each of the mixed lists so it is difficult to do a comparison between the mixed lists and each of the pure lists. The fact that the mixed lists were responded to more slowly and more accurately may be due to the variability, the average frequency, or a combination of variability and frequency.

The purpose of the present experiment was to observe the effect of word list variability on participant performance in a lexical decision task. The word lists for the two groups had similar mean values; however, one group of participants was given a list of words with little variability and the other group received word stimuli with a higher level of variability. Any difference in performance between the two groups should thus be due only to the variability of the word list.



Sixty-seven undergraduate students (N=67; 36 randomly assigned to the heterogeneous condition and 31 to the homogeneous condition) from the University of Toronto at Scarborough participated in the experiment in return for bonus credit toward their introductory psychology course grade. All participants had normal or corrected-to-normal vision and were fluent in English.


Two separate word lists (i.e., homogeneous and heterogeneous), each consisting of 157 words, were selected from the MRC psycholinguistic word database (Coltheart, 1981). Each list contained one- or two-syllable words that were five letters in length. For the homogeneous word list (see Appendix 1a), the variance of word frequency and orthographic neighborhood count was reduced as much as possible by choosing items with the following restrictions. Word frequency, obtained from the Kucera and Francis (1967) word frequency index, had a mean of 33, a range of 10-50, and a standard deviation of 11.22. Orthographic neighborhood count (Andrews, 1992) had a mean of 4.00, a range of 3-5, and a standard deviation of 0.81. For the heterogeneous word list (see Appendix 1b), the mean word frequency and neighborhood count were the same as in the homogeneous word list. However, the variance of word frequency and neighborhood count was made as large as possible. The word frequency range was 1-2000 (SD = 149.09) and the orthographic neighborhood range was 0-19 (SD = 6.73).
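The matching logic described above (equal means, unequal spread) can be checked with a few lines of code. The frequency values below are hypothetical and far shorter than the 157-item lists; the function name is illustrative.

```python
from statistics import mean, stdev

def summarize(values):
    """Return the descriptive statistics used to characterize a stimulus list."""
    return {
        "mean": mean(values),
        "sd": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

# Hypothetical frequency counts; both lists have a mean of 33.
homogeneous_freq = [30, 32, 33, 34, 36]
heterogeneous_freq = [1, 5, 10, 29, 120]

for label, values in [("homogeneous", homogeneous_freq),
                      ("heterogeneous", heterogeneous_freq)]:
    print(label, summarize(values))
```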

Nonwords were generated by altering only one consonant (excluding the end consonants) of each item in an alternate list of 157 homogeneous words, which were selected by the same criteria as those used to select the stimuli for the homogeneous word list (see Piercey & Joordens, 2000, for the procedure).
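A minimal sketch of this kind of nonword generation is given below. It is not the procedure of Piercey and Joordens (2000); it assumes that "end consonants" means the first and last letters are left untouched, treats y as a consonant, and uses a hypothetical `lexicon` argument to reject candidates that happen to be real words.

```python
import random

VOWELS = set("aeiou")
CONSONANTS = "bcdfghjklmnpqrstvwxyz"

def make_nonword(word, rng, lexicon=frozenset()):
    """Replace one interior consonant of `word` with a different consonant."""
    # Interior positions only: the first and last letters are never altered.
    positions = [i for i in range(1, len(word) - 1) if word[i] not in VOWELS]
    candidates = [
        word[:i] + consonant + word[i + 1:]
        for i in positions
        for consonant in CONSONANTS
        if consonant != word[i]
    ]
    # Drop candidates that are real words according to the supplied lexicon.
    candidates = [c for c in candidates if c not in lexicon]
    return rng.choice(candidates) if candidates else None

rng = random.Random(0)  # seeded so the example is reproducible
print(make_nonword("table", rng, lexicon={"table", "table"}))
```

A candidate produced this way may still be unpronounceable or word-like, so in practice the output would need to be screened by hand or against a full lexicon.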


The experiment was programmed in MEL version 2 and run on an IBM-compatible personal computer running DOS version 6.1. All stimuli were presented in the center of a 15-inch IBM monitor, in white letters on a black background. Participants were asked to read the instructions provided on the computer monitor. The instructions were then repeated verbally. Participants began by pressing the space bar on the computer keyboard. Each trial consisted of (1) a 250-ms blank field, (2) a 250-ms presentation of a fixation cross (+), (3) a second 250-ms blank field, and (4) a word or nonword that remained on the screen until the participant responded. Participants categorized the lexical status of each item by pressing one of two keys on the computer keyboard, the z key or the / key. The keys assigned to word and nonword responses were counterbalanced across participants. The stimulus disappeared as soon as one of the keys was pressed, and the next trial began. Participants completed a block of 28 practice trials followed by three experimental blocks of 100 trials each (50 words and 50 nonwords), for a total of 300 experimental trials. Participants were allowed to rest between blocks; a message instructed them to press the space bar when they were ready to continue. Reaction times and accuracies were recorded, and the experiment took approximately 30 minutes to complete.
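The block structure described above can be sketched as follows. This is not the original MEL program. How 150 of the 157 items per list were chosen and how trials were ordered within a block are not stated, so random sampling and shuffling are assumptions here, and the item labels are placeholders.

```python
import random

# Fixed pre-stimulus events of one trial: (event, duration in ms).
PRE_STIMULUS_EVENTS = [("blank", 250), ("fixation", 250), ("blank", 250)]

def build_blocks(words, nonwords, rng, n_blocks=3, per_type=50):
    """Return blocks of shuffled (item, type) trials with no repeated items."""
    chosen_words = rng.sample(words, n_blocks * per_type)
    chosen_nonwords = rng.sample(nonwords, n_blocks * per_type)
    blocks = []
    for b in range(n_blocks):
        start, stop = b * per_type, (b + 1) * per_type
        block = [(item, "word") for item in chosen_words[start:stop]]
        block += [(item, "nonword") for item in chosen_nonwords[start:stop]]
        rng.shuffle(block)  # assumed: random order within each block
        blocks.append(block)
    return blocks

# Placeholder stimuli standing in for the 157-item lists.
placeholder_words = [f"word{i:03d}" for i in range(157)]
placeholder_nonwords = [f"nonw{i:03d}" for i in range(157)]
blocks = build_blocks(placeholder_words, placeholder_nonwords, random.Random(1))
print(len(blocks), [len(block) for block in blocks])
```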


Mean reaction times and accuracies, with standard deviations, for the two conditions are presented in Table 1.

A 2 (Stimulus Variability: low, high) X 2 (Stimulus Type: word, nonword) analysis of variance was performed. All reaction times are reported in milliseconds and accuracy is presented as the percentage of correct trials. The analysis of reaction time did not yield a significant main effect of Stimulus Variability [F(1, 65)=1.20, n.s.]. The main effect of Stimulus Type was significant [F(1, 65)=49.27, p<.0001]. Word responses (M=729 ms, SD=201) were significantly faster than nonword responses (M=835 ms, SD=286). The interaction between Stimulus Variability and Stimulus Type was not significant [F(1, 65)=0.78, n.s.].

Accuracy rates showed a significant main effect of Stimulus Variability [F(1, 65)=22.26, p<.0001]. Accuracy in the low variability condition (M=90%, SD=6.56) was higher than accuracy in the high variability condition (M=83%, SD=10.88). There was also a significant main effect of Stimulus Type such that word accuracy (M=83%, SD=9.58) was lower than nonword accuracy [M=89%, SD=9.10; F(1, 65)=15.19, p<.0005]. The interaction between Stimulus Variability and Stimulus Type was also significant [F(1, 65)=22.26, p<.0001].
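As an informal consistency check, the marginal accuracy means reported above can be related to the cell means in Table 1 and the group sizes given for the two conditions. How the authors actually computed the marginal means is not stated; the weighting choices below are assumptions that happen to reproduce the reported values to within rounding.

```python
# Cell means (%) from Table 1 and group sizes from the participant description.
GROUP_SIZE = {"homogeneous": 31, "heterogeneous": 36}
ACCURACY = {
    "word": {"homogeneous": 91, "heterogeneous": 77},
    "nonword": {"homogeneous": 88, "heterogeneous": 89},
}

def weighted_mean(cell_means, group_size):
    """Average two group means, weighting each by its number of participants."""
    total = sum(group_size.values())
    return sum(cell_means[g] * group_size[g] for g in group_size) / total

def unweighted_condition_mean(table, condition):
    """Simple average of the word and nonword cells within one condition."""
    return sum(row[condition] for row in table.values()) / len(table)

word_accuracy = weighted_mean(ACCURACY["word"], GROUP_SIZE)        # about 83.5
nonword_accuracy = weighted_mean(ACCURACY["nonword"], GROUP_SIZE)  # about 88.5
low_variability = unweighted_condition_mean(ACCURACY, "homogeneous")     # 89.5
high_variability = unweighted_condition_mean(ACCURACY, "heterogeneous")  # 83.0
print(word_accuracy, nonword_accuracy, low_variability, high_variability)
```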

A Levene's test of homogeneity of variance was performed on reaction time and accuracy. No significant results were found. Therefore, the homogeneity of variance assumptions were not violated for the above analyses.
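For readers unfamiliar with the test, the statistic can be sketched in a few lines. This version uses absolute deviations from each group's mean; the source does not say which variant or software was used, and the reaction times below are hypothetical.

```python
from statistics import mean

def levene_w(*groups):
    """Levene's W statistic based on absolute deviations from group means.

    W is compared against an F distribution with (k - 1, N - k) degrees of
    freedom. Undefined (division by zero) if deviations do not vary within groups.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    deviations = []
    for g in groups:
        center = mean(g)
        deviations.append([abs(x - center) for x in g])
    group_means = [mean(z) for z in deviations]
    grand_mean = mean([z for zs in deviations for z in zs])
    between = sum(len(zs) * (m - grand_mean) ** 2
                  for zs, m in zip(deviations, group_means))
    within = sum((z - m) ** 2
                 for zs, m in zip(deviations, group_means) for z in zs)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical reaction times (ms) for two groups.
rt_low_variability = [650, 700, 690, 720, 675]
rt_high_variability = [500, 900, 760, 1100, 565]
print(levene_w(rt_low_variability, rt_high_variability))
```

A larger W indicates a larger difference in spread between groups; obtaining a p value requires the F distribution, which is left out of this sketch.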

The simple effects were tested using Tukey's HSD test (p<.05). The test revealed that the accuracy of responses to high variability words (M=77%, SD=7.32) was lower than the accuracy of responses to low variability words (M=91%, SD=5.21). Also, the accuracy of nonword responses in the high variability condition (M=89%, SD=10.33) and in the low variability condition (M=88%, SD=7.58) was significantly higher than the accuracy of word responses in the high variability condition. No other simple effects were significant.


The purpose of this study was to determine the effect of word list variability on reaction time and accuracy during a lexical decision task. Researchers routinely report averages of various lexical features for their lists of words (e.g. frequency, number of syllables, etc.); however, they do not report the variability of their word items. The results of this experiment indicate that controlling the variability of the list of word items is important when developing a lexical decision experiment.

The results of this study demonstrate how manipulating the variability of a word list's lexical features affects response accuracy. While the error rates for words and nonwords did not differ when low variability words were used, they diverged sharply when greater variability was introduced into the word list. Specifically, the error rate for words in the high variability condition was 14 percentage points higher than in the low variability condition (23% vs. 9%). Also, the accuracy of words was lower than the accuracy of nonwords in the high variability condition. However, reaction times were faster for words than for nonwords. It appears that, because of the increase in variability, participants are more likely to incorrectly classify words as nonwords. If an emphasis is placed on accuracy, this effect may change. However, further research is needed to investigate this finding.

Although not as obvious as the error rates, another aspect of the data that varied between the two conditions was the standard deviations of the performance measures. In all instances, for both reaction times and accuracies, the standard deviations increased when greater variability was introduced into the word stimuli. In fact, the standard deviation of word reaction times in the high variability condition was more than twice as large as in the low variability condition (255 ms vs. 96 ms). A larger standard deviation adds noise to the data analysis, which could mask real effects that researchers are looking for and produce null results. So, if the purpose of a particular study is to test a specific theory, reducing the variability of the stimuli selected for the study will increase the probability of finding results that reflect the proposed theory rather than artifacts such as stimulus variability.
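The masking argument can be made concrete with a small calculation. The standard deviations below are the word reaction time values from Table 1; the 50 ms raw difference is hypothetical and chosen only to show how the same raw effect shrinks in standardized terms as the standard deviation grows.

```python
# Word reaction time SDs (ms) from Table 1.
WORD_RT_SD = {"homogeneous": 96, "heterogeneous": 255}

# Hypothetical raw effect a researcher might be trying to detect (ms).
HYPOTHETICAL_DIFFERENCE_MS = 50

def standardized_effect(raw_difference_ms, sd_ms):
    """Raw difference expressed in standard deviation units."""
    return raw_difference_ms / sd_ms

sd_ratio = WORD_RT_SD["heterogeneous"] / WORD_RT_SD["homogeneous"]
effects = {
    condition: standardized_effect(HYPOTHETICAL_DIFFERENCE_MS, sd)
    for condition, sd in WORD_RT_SD.items()
}
print(sd_ratio, effects)
```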

Finally, in trying to replicate past research, one limitation that has become clear is that the information provided about how stimulus lists are generated is not always sufficient to replicate a study properly. Although many studies provide ranges as a representation of the content of their stimulus list, a range only informs researchers of the minimum and maximum values that were used to generate the list, and provides none of the further information about the contents of the list that would be available if a mean were also included. A better method of describing the characteristics of a stimulus list may be to report the mean and standard deviation, because the standard deviation conveys more information about the contents of a stimulus list than the range does. If such a method is thought too cumbersome to be used, then an alternative solution would be to provide a list of the stimuli that were used as part of the manuscript. This method makes replication studies much easier to conduct and opens the door for public scrutiny of stimuli that were employed in a study.
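The point about ranges can be illustrated with two hypothetical lists that share the same minimum, maximum, and mean yet differ in spread. The values are invented for the example and are not drawn from the study's stimuli.

```python
from statistics import mean, stdev

# Hypothetical frequency values: same range (10-50) and same mean (30).
clustered_list = [10, 30, 30, 30, 50]
spread_list = [10, 10, 30, 50, 50]

for label, values in [("clustered", clustered_list), ("spread", spread_list)]:
    # The range and mean are identical; only the SD separates the two lists.
    print(label, min(values), max(values), mean(values), round(stdev(values), 2))
```

Reporting only the range (or the range and mean) would make these two lists look interchangeable, whereas the standard deviation distinguishes them.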


Homogeneous Word List

Examples of homogeneous words used in the experiment include:



The entire list is available upon request from the second author.


Heterogeneous Word List

Examples of heterogeneous words used in the experiment include:



The entire list is available upon request from the second author.


Andrews, S. (1992). Frequency and neighbourhood effects on lexical access: Lexical similarity or orthographic redundancy? Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 234-254.

Balota, D. A., & Chumbley, J. I. (1984). Are lexical decisions a good measure of lexical access? The role of word frequency in the neglected decision stage. Journal of Experimental Psychology: Human Perception & Performance, 10, 340-357.

Coltheart, M. (1981). The MRC psycholinguistic database. Quarterly Journal of Experimental Psychology, 33A, 497-505.

Glanzer, M., & Ehrenreich, S. L. (1979). Structure and search of the internal lexicon. Journal of Verbal Learning and Verbal Behavior, 18, 381-398.

Gordon, B. (1983). Lexical access and lexical decision: Mechanisms of frequency sensitivity. Journal of Verbal Learning and Verbal Behavior, 22, 24-44.

Grainger, J., & Jacobs, A. M. (1996). Orthographic processing in visual word recognition: A multiple read-out model. Psychological Review, 103, 518-565.

Joordens, S., Piercey, C. D., & Azarbehi, R. (2009). Modeling performance at the trial level within a diffusion framework: A simple yet powerful method for increasing efficiency via error detection and correction. Canadian Journal of Experimental Psychology, 63(2), 81-93.

Kucera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Providence, RI: Brown University Press.

Piercey, C. D., & Joordens, S. (2000). Turning an advantage into a disadvantage: Ambiguity effects in lexical decision versus reading tasks. Memory & Cognition, 28, 657-666.

Ratcliff, R., Gomez, P., & McKoon, G. (2004). A diffusion model account of the lexical decision task. Psychological Review, 111(1), 159-182.

Rubenstein, H., Garfield, L., & Millikan, J. A. (1970). Homographic entries in the internal lexicon. Journal of Verbal Learning and Verbal Behavior, 9, 487-492.

Scarborough, D. L., Cortese, C., & Scarborough, H. L. (1977). Frequency and repetition effects in lexical memory. Journal of Experimental Psychology: Human Perception and Performance, 3, 1-17.

Sears, C. R., Siakaluk, P. D., Chow, V., & Buchanan, L. (2008). Is there an effect of print exposure on the word frequency effect and the neighborhood size effect? Journal of Psycholinguistic Research, 37, 269-291.

Stone, G. O., & Van Orden, G. C. (1993). Strategic control of processing in word recognition. Journal of Experimental Psychology: Human Perception and Performance, 19, 744-774.

Rostam Azarbehi (1), C. Darren Piercey (1), & Steve Joordens (2)

(1) University of New Brunswick

(2) University of Toronto at Scarborough

Author info: Correspondence should be sent to: Dr. C. Darren Piercey, Department of Psychology, University of New Brunswick, 38 Dineen Drive, Fredericton, New Brunswick, E3B 6E4 E-mail:
TABLE 1 Unweighted Mean Reaction Times (ms) and Accuracies (%)
by Condition Type (SDs are presented in parentheses)

                       Condition Type

                 Homogeneous   Heterogeneous

Reaction Time (ms)
  Word           687 (96)      765 (255)
  Nonword        807 (159)     858 (363)
Accuracy (%)
  Word            91 (5)        77 (7)
  Nonword         88 (8)        89 (10)