Our ability to process words during a lexical decision task depends
on a number of lexical features. These lexical features include, but are
not limited to, word frequency (i.e. the number of times a word appears
in written text), orthographic neighborhood count (i.e. the total number
of new words that can be created by changing one letter at a time),
letter count, and number of syllables (Andrews, 1992; Sears et al.,
2008). For example, words that are high in frequency are responded to
more quickly and accurately than low frequency words (Rubenstein,
Garfield & Millikan, 1970; Scarborough, Cortese & Scarborough,
1977; Stone & Van Orden, 1993). When designing a lexical decision
experiment, researchers often vary specific lexical features while
matching other features. For example, if one were to investigate the
frequency effect, two groups of words would be created that differed on
frequency but matched on other lexical features, such as average
orthographic neighborhood, letter count, and number of syllables.
When creating a list of items for a lexical decision task,
researchers report the mean values for each of the lexical features they
are controlling and manipulating. However, researchers typically do not
refer to the amount of variability for each of the lexical features that
are being controlled. The amount of variability in a list of items can
have an effect on lexical decision performance. For example, in two
separate studies, Glanzer and Ehrenreich (1979) and Gordon (1983)
manipulated the variability of the frequency ratings. In each study,
participants received six blocks of trials that contained either a pure
low frequency list of items, a pure medium frequency list of items, a
pure high frequency list of items, or one of three mixed frequency, high
variability lists. The mixed frequency lists were created by choosing an
equal number of items from each of the three pure frequency lists. Their
results indicated that increasing the variability of a list of word
items both increased reaction time and decreased the error rate.
According to Balota and Chumbley (1984), a lexical decision may be
performed based on an early familiarity measure. When an item is
presented during a lexical decision task, a representation begins to
form that becomes more and more like something that is stored in memory.
For word items, the similarity between the representation that is
forming and memory will increase at a rate that is greater than the rate
for nonwords. Words will also reach a higher level of familiarity than
nonwords. Therefore, a familiarity decision criterion that falls
somewhere between the average level of familiarity for words and
nonwords could be used as a basis for categorization. If the familiarity
decision criterion is determined based on the average familiarity for
words and nonwords, increasing the variability of a list of words should
have an effect on accuracy but no effect on reaction time. Increasing
the variability of a word list will not affect the average level of
familiarity; however, a highly variable word list will contain items
that are not very familiar and these items would have a greater chance
of incorrectly being called nonwords.
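The prediction above can be illustrated with a small calculation. This is only a sketch under stated assumptions: familiarity is treated as normally distributed, and the means, standard deviations, and criterion below are hypothetical values chosen for illustration, not estimates from Balota and Chumbley (1984) or from the present data.

```python
from statistics import NormalDist

# Hypothetical average familiarity levels (arbitrary units); assumed values.
WORD_MEAN = 2.0
NONWORD_MEAN = 0.0
# Criterion placed midway between the two average familiarity levels.
CRITERION = (WORD_MEAN + NONWORD_MEAN) / 2


def word_miss_rate(word_sd: float) -> float:
    """Probability that a word's familiarity falls below the criterion,
    so that the word is incorrectly called a nonword."""
    return NormalDist(mu=WORD_MEAN, sigma=word_sd).cdf(CRITERION)


# Same mean familiarity for both lists; only the spread differs.
low_variability = word_miss_rate(word_sd=0.5)
high_variability = word_miss_rate(word_sd=1.5)

print(f"criterion = {CRITERION}")
print(f"miss rate, low variability list:  {low_variability:.3f}")
print(f"miss rate, high variability list: {high_variability:.3f}")
```

Because the criterion in this sketch depends only on the two means, it stays in place when the spread of the word list changes; only the proportion of words falling below it grows.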
There are a number of lexical decision models that utilize decision
criteria that are based on some measure of familiarity or similarity
between the representation that is forming and memory (Grainger &
Jacobs, 1996; Joordens, Piercey, & Azarbehi, 2009; Ratcliff, Gomez,
& McKoon, 2004). However, the purpose of this study is not to
perform a critical evaluation of the existing models of lexical
decision. Rather, the purpose is to determine how changes in the
variability of a word list will affect lexical decision performance.
The mixed lists utilized by Glanzer and Ehrenreich (1979) and
Gordon (1983) were created by combining high, medium, and low frequency
words. Therefore, the resulting lists of mixed words had an average
frequency rating that fell somewhere between the low and high frequency
pure lists and had a greater variability than the pure lists. The
authors of these studies did not report the average frequency of each
mixed list, so it is difficult to compare the mixed lists with each of
the pure lists. The fact that the
mixed lists were responded to more slowly and more accurately may be due
to the variability, the average frequency, or a combination of
variability and frequency.
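The confound can be made concrete with a few made-up numbers. In the sketch below, the frequency counts are hypothetical and are not taken from Glanzer and Ehrenreich (1979) or Gordon (1983); the point is only that a list mixed from three pure lists differs from each pure list in both its mean and its standard deviation.

```python
from statistics import mean, pstdev

# Hypothetical frequency counts for three pure lists; assumed for illustration.
low = [2, 3, 4, 5, 6]
medium = [40, 45, 50, 55, 60]
high = [300, 350, 400, 450, 500]

# A mixed list draws an equal number of items from each pure list.
mixed = low[:2] + medium[:2] + high[:2]

for name, items in [("low", low), ("medium", medium),
                    ("high", high), ("mixed", mixed)]:
    print(f"{name:>6}: mean = {mean(items):7.1f}, SD = {pstdev(items):6.1f}")
```

Any performance difference between the mixed list and a pure list in such a design could therefore reflect the change in mean, the change in spread, or both.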
The purpose of the present experiment was to observe the effect of
word list variability on participant performance in a lexical decision
task. The word lists for the two groups had similar mean values;
however, one group of participants was given a list of words with
little variability and the other group received word stimuli with
a higher level of variability. Any difference in performance
between the two groups of participants should thus be due to the
variability of the word list only.
Sixty-seven undergraduate students (36 randomly assigned to the
heterogeneous condition and 31 to the homogeneous condition) from the
University of Toronto at Scarborough participated in the experiment in
return for bonus credits toward their introductory psychology
course grade. All participants had normal or corrected-to-normal
vision and were fluent in English.
Two separate word lists (i.e. homogeneous and heterogeneous), each
consisting of 157 words, were selected from the MRC psycholinguistic word
database (Coltheart, 1981). Each list contained one- or two-syllable
words that were five letters in length. For the homogeneous word list
(see Appendix 1a), the variance of word frequency and orthographic
neighborhood count was reduced as much as possible by choosing items
with the following restrictions. Word frequency, as obtained from the
Kucera and Francis (1967) word frequency index, had a mean of 33, a
range of 10-50, and a standard deviation of 11.22. Orthographic
neighborhood count (Andrews, 1992) had a mean of 4.00, a range of 3-5,
and a standard deviation of 0.81. For the heterogeneous word list (see
Appendix 1b), the mean word frequency and neighborhood count were the
same as in the homogeneous word list. However, the variance of word
frequency and of neighborhood count was made as large as possible. The
word frequency range was 1-2000 (SD = 149.09) and the orthographic
neighborhood range was 0-19 (SD = 6.73).
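A check of this kind, that two lists are matched on the mean of a lexical feature while differing in spread, might be sketched as follows. The helper names `describe` and `means_matched`, the tolerance, and the frequency values are all hypothetical; the values are not the actual stimuli.

```python
from statistics import mean, stdev


def describe(values):
    """Return mean, sample SD, minimum, and maximum for one lexical feature."""
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def means_matched(list_a, list_b, tolerance):
    """True when the two list means differ by no more than `tolerance`."""
    return abs(mean(list_a) - mean(list_b)) <= tolerance


# Hypothetical word frequency values; NOT the stimuli used in the experiment.
homogeneous = [28, 30, 32, 33, 34, 36, 38]
heterogeneous = [1, 2, 4, 8, 15, 40, 161]

for name, items in [("homogeneous", homogeneous),
                    ("heterogeneous", heterogeneous)]:
    d = describe(items)
    print(f"{name:>13}: M = {d['mean']:.2f}, SD = {d['sd']:.2f}, "
          f"range = {d['min']}-{d['max']}")

print("means matched:", means_matched(homogeneous, heterogeneous, tolerance=0.5))
```

Reporting all four values for each list makes it visible when two lists share a mean but not a distribution.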
Nonwords were generated by altering only one consonant--excluding
the end consonants--of an alternate list of 157 homogeneous words which
were selected by the same criteria as those used to select the stimuli
for the homogeneous word list (See Piercey & Joordens, 2000, for
The experiment was programmed in MEL version 2, and run on an IBM
compatible personal computer running DOS version 6.1. All stimuli were
presented in the center of a 15 inch IBM monitor, with white letters on
a black background. Participants were asked to read the instructions
provided to them on the computer monitor. The instructions were then
verbally repeated to the participants. Participants began by pressing
the space bar on the computer keyboard. Each trial consisted of (1) a
250-msec. blank field, (2) a 250-msec. presentation of a fixation cross
(+), (3) a second 250-msec. blank field, and (4) a word or nonword that
remained on the screen until the participant responded. Participants categorized
the lexical status of each item by pressing one of two keys on the
computer keyboard, the z key and the / key. The keys assigned for word
and nonword responses were counterbalanced across participants. The
stimulus disappeared as soon as one of the keys was pressed, and the
next trial then began. Participants completed a block of 28 practice
trials followed by three experimental blocks consisting of 100 trials
per block (50 words and 50 nonwords) for a total of 300 experimental
trials. Participants were allowed to rest between blocks. A message
appeared that instructed participants to press the space bar on the
computer keyboard when they were ready to continue. Reaction times
and accuracy were recorded, and the experiment took approximately
30 minutes to complete.
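The trial and block structure described above can be sketched in code. This is not the original MEL program. Several details are assumptions made only for illustration: key assignment alternating by odd and even participant number, random selection of 150 of the 157 items, random order within each block, and the placeholder stimulus strings.

```python
import random

# Trial timeline from the procedure: (event, duration in ms).
# None means the event lasts until the participant responds.
TRIAL_EVENTS = [
    ("blank", 250),
    ("fixation +", 250),
    ("blank", 250),
    ("stimulus", None),
]


def key_mapping(participant_id):
    """Counterbalance response keys across participants.
    Alternating by odd/even ID is an assumption, not the reported method."""
    if participant_id % 2 == 0:
        return {"word": "z", "nonword": "/"}
    return {"word": "/", "nonword": "z"}


def build_blocks(words, nonwords, n_blocks=3, per_type=50, seed=None):
    """Build blocks of `per_type` words and `per_type` nonwords each,
    with no item repeated and a random order within every block."""
    rng = random.Random(seed)
    words, nonwords = list(words), list(nonwords)
    rng.shuffle(words)
    rng.shuffle(nonwords)
    blocks = []
    for b in range(n_blocks):
        start = b * per_type
        block = [(w, "word") for w in words[start:start + per_type]]
        block += [(n, "nonword") for n in nonwords[start:start + per_type]]
        rng.shuffle(block)
        blocks.append(block)
    return blocks


# Placeholder strings standing in for the real 157-item lists.
words = [f"WORD{i:03d}" for i in range(157)]
nonwords = [f"NONW{i:03d}" for i in range(157)]
blocks = build_blocks(words, nonwords, seed=1)

fixed_time = sum(ms for _, ms in TRIAL_EVENTS if ms is not None)
print(f"{len(blocks)} blocks of {len(blocks[0])} trials; "
      f"{fixed_time} ms elapse before each stimulus")
```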
Mean reaction times and accuracy, with standard deviations, for the
two conditions are presented in Table 1.
A 2 (Stimulus Variability: low, high) X 2 (Stimulus Type: word,
nonword) analysis of variance was performed. All reaction times are
reported in milliseconds and accuracy is presented as percentage of
correct trials. The analysis of reaction time did not yield a
significant main effect of Stimulus Variability [F(1, 65)=1.20, n.s.].
The main effect of Stimulus Type was significant [F(1, 65)=49.27,
p<.0001]. Word responses (M=729 ms, SD=201) were significantly faster
than nonword responses (M=835 ms, SD=286). There were no significant
interactions between Stimulus Variability and Stimulus Type [F(1,
Accuracy rates showed a significant main effect of Stimulus
Variability [F(1, 65)=22.26, p<.0001]. Accuracy in the low variability
condition (M=90%, SD=6.56) was higher than accuracy in the high
variability condition (M=83%, SD=10.88). There was also a significant
main effect of Stimulus Type such that word accuracy (M=83%, SD=9.58)
was lower than nonword accuracy [M=89%, SD=9.10; F(1, 65)=15.19,
p<.0005]. The interaction between Stimulus Variability and Stimulus
Type was also significant [F(1, 65)=22.26, p<.0001].
Levene's test of homogeneity of variance was performed on
reaction time and accuracy. No significant results were found.
Therefore, the homogeneity of variance assumption was not violated for
the above analyses.
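For readers unfamiliar with the test, the Levene statistic can be computed as in the sketch below. Centering each group on its mean is an assumption here, since the text does not say whether the mean or the median was used, and the reaction times are hypothetical values, not the study's data. The resulting W is referred to an F distribution with k-1 and N-k degrees of freedom; the p value is not computed because the Python standard library has no F distribution.

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W, centering each group on its own mean (an assumed choice)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every score from its own group mean.
    z = []
    for g in groups:
        centre = mean(g)
        z.append([abs(y - centre) for y in g])
    z_group_means = [mean(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zi) * (zm - z_grand_mean) ** 2
                  for zi, zm in zip(z, z_group_means))
    within = sum((v - zm) ** 2
                 for zi, zm in zip(z, z_group_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical per-participant mean reaction times (ms); NOT the study's data.
low_variability_rt = [650, 700, 690, 720, 680]
high_variability_rt = [600, 820, 700, 900, 640]

w = levene_w(low_variability_rt, high_variability_rt)
df_error = len(low_variability_rt) + len(high_variability_rt) - 2
print(f"Levene's W = {w:.2f} on (1, {df_error}) df")
```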
The simple effects were tested using Tukey's HSD test
(p<.05). The test revealed that the accuracy of responses to high
variability words (M=77%, SD=7.32) was lower than the accuracy of
responses to low variability words (M=91%, SD=5.21). Also, the accuracy
of nonword responses in the high variability condition (M=89%,
SD=10.33) and in the low variability condition (M=88%, SD=7.58) was
significantly higher than the accuracy of words in the high variability
condition. No other simple effects were significant.
The purpose of this study was to determine the effect of word list
variability on reaction time and accuracy during a lexical decision
task. Researchers routinely report averages of various lexical features
for their lists of words (e.g. frequency, number of syllables);
however, they typically do not report the variability of their word items. The
results of this experiment indicate that controlling the variability of
the list of word items is important when developing a lexical decision
The results of this study demonstrate how manipulating the
variability of a word list's lexical features affects response accuracy.
While the error rates of words and nonwords did not differ
when low variability words were used, they diverged sharply when
greater variability was introduced into the word list. Specifically, the
error rate for words in the high variability condition was 14
percentage points higher than the error rate for words in the low
variability condition. Also, the accuracy of words was lower than the
accuracy of nonwords in the high variability condition. However,
words were still responded to faster than nonwords. It appears
that, because of the increase in variability, participants are more
likely to incorrectly classify words as nonwords. If an emphasis is
placed on accuracy, this effect may change. However, further research is
needed to investigate this finding.
Although less obvious than the error rates, another aspect of the data
that varied between the two conditions was the standard deviation of
the performance measures. For both reaction time and accuracy, the
standard deviations increased when greater variability was introduced
into the word stimuli. In fact, the standard deviation of word reaction
times in the high variability condition was more than twice as large
as in the low variability condition. A larger standard deviation adds
noise to the data analysis, which could mask real effects that
researchers are looking for and produce null results. So, if the
purpose of a particular study is to test a specific theory, reducing the
variability of the stimuli selected for the study will increase the
probability that the results reflect the theory under test rather than
artifacts such as stimulus variability.
Finally, in trying to replicate past research, one limitation that
has become clear is that the information provided about how
stimulus lists are generated is not always sufficient to
replicate a study properly. Although many studies provide ranges as a
representation of the content of their stimulus list, this information
only informs researchers of the maximum and minimum values that were
used to generate the list, and provides no further information about the
contents of the list that might be obtained if a mean were also
included. A better method of describing the characteristics of a
stimulus list may be through mean and standard deviation. Greater
information about the contents of a stimulus list can be obtained from
standard deviation than from range. If such a method is thought too
cumbersome to be used, then an alternative solution would be to provide
a list of the stimuli that were used as part of the manuscript. This
method makes replication studies much easier to conduct and opens the
door for public scrutiny of stimuli that were employed in a study.
Homogeneous Word List
Examples of homogeneous words used in the experiment include:
SHEEP, ABUSE, GUEST, FLOOD, MOUNT, NURSE
PAUSE, RANCH, SHELF, STEAM, TOAST, VOCAL
The entire list is available upon request from the second author.
Heterogeneous Word List
Examples of heterogeneous words used in the experiment include:
AGREE, DOLLY, LATER, RENEW, RIVER, THIGH
ACRID, DOWER, LOBAR, RHEUM, RUPEE, TABAC
The entire list is available upon request from the second author.
Andrews, S. (1992). Frequency and neighbourhood effects on lexical
access: Lexical similarity or orthographic redundancy? Journal of
Experimental Psychology: Learning, Memory, and Cognition, 18, 234-254.
Balota, D. A., & Chumbley, J. I. (1984). Are lexical decisions
a good measure of lexical access? The role of word frequency in the
neglected decision stage. Journal of Experimental Psychology: Human
Perception & Performance, 10, 340-357.
Coltheart, M. (1981). The MRC psycholinguistic database. Quarterly
Journal of Experimental Psychology, 33A, 497-505.
Glanzer, M., & Ehrenreich, S. L. (1979). Structure and search
of the internal lexicon. Journal of Verbal Learning and Verbal Behavior,
Gordon, B. (1983). Lexical access and lexical decision: Mechanisms
of frequency sensitivity. Journal of Verbal Learning and Verbal
Behavior, 22, 24-44.
Grainger, J., & Jacobs, A. M. (1996). Orthographic processing in
visual word recognition: A multiple read-out model. Psychological
Review, 103, 518-565.
Joordens, S., Piercey, C. D., & Azarbehi, R. (2009). Modeling
performance at the trial level within a diffusion framework: A simple
yet powerful method for increasing efficiency via error detection and
correction. Canadian Journal of Experimental Psychology, 63(2), 81-93.
Kucera, H., & Francis, W. N. (1967). Computational analysis
of present-day American English. Providence, RI: Brown University Press.
Piercey, C. D., & Joordens, S. (2000). Turning an advantage
into a disadvantage: Ambiguity effects in lexical decision versus
reading tasks. Memory & Cognition, 28, 657-666.
Ratcliff, R., Gomez, P., & McKoon, G. (2004). A diffusion model
account of the lexical decision task. Psychological Review, 111(1),
Rubenstein, H., Garfield, L., & Millikan, J. A. (1970).
Homographic entries in the internal lexicon. Journal of Verbal Learning
and Verbal Behavior, 9, 487-492.
Scarborough, D. L., Cortese, C., & Scarborough, H. L. (1977).
Frequency and repetition effects in lexical memory. Journal of
Experimental Psychology: Human Perception and Performance, 3, 1-17.
Sears, C. R., Siakaluk, P. D., Chow, V., & Buchanan, L. (2008).
Is there an effect of print exposure on the word frequency effect and
the neighborhood size effect? Journal of Psycholinguistic Research, 37,
Stone, G. O., & Van Orden, G. C. (1993). Strategic control of
processing in word recognition. Journal of Experimental Psychology:
Human Perception and Performance, 19, 744-774.
Rostam Azarbehi (1), C. Darren Piercey (1), & Steve Joordens (2)
(1) University of New Brunswick
(2) University of Toronto at Scarborough
Author info: Correspondence should be sent to: Dr. C. Darren
Piercey, Department of Psychology, University of New Brunswick, 38
Dineen Drive, Fredericton, New Brunswick, E3B 6E4. E-mail: firstname.lastname@example.org
TABLE 1 Unweighted Mean Reaction Times (ms) and Accuracies (%)
by Condition Type (SDs / SEs Are Presented in Parentheses)

                      Low variability   High variability
Reaction time (ms)
  Word                687 (96)          765 (255)
  Nonword             807 (159)         858 (363)
Accuracy (%)
  Word                91 (5)            77 (7)
  Nonword             88 (8)            89 (10)