PII: S0950-3293(01)00055-6
Food Quality and Preference 13 (2002) 39–45
www.elsevier.com/locate/foodqual
Investigating more powerful discrimination tests with consumers:
effects of memory and response bias
Benoît Rousseau*, Stefanie Stroh, Michael O’Mahony
Department of Food Science and Technology, University of California, Davis CA 95616, USA
Received 4 April 2000; received in revised form 19 June 2001; accepted 20 July 2001
Abstract
Two experiments were conducted to investigate the sensitivity of four discrimination methods when they were performed by
consumers. In Experiment 1, the influence of memory in the duo-trio method was studied. Three versions of the duo-trio method
with different memory requirements were considered. Calculated d′ values indicated a higher sensitivity for the duo-trio with the
reference tasted between the two test samples (DTM), illustrating the importance of memory in sensory discrimination testing. In
Experiment 2, four discrimination tests were compared: the triangle, the DTM, the same–different and the dual-pair tests. The
dual-pair test was predicted to increase the d′ value of the same–different test by eliminating the large response bias variations
between consumers. Results indicated no significant differences in d′ among the protocols. Thus the dual-pair method was not able
to improve the sample discrimination ability of the same–different test. © 2001 Elsevier Science Ltd. All rights reserved.
Keywords: Discrimination tests; Power; Consumers; Memory; Response bias
1. Introduction
The area of discrimination testing has recently been an active area of research in sensory evaluation.
Discrimination methods are broadly used both in the industry and academia, their applications ranging from
daily quality control measurements, to the study of the impact of ingredient or process changes, to the
investigation of the ability of consumers to discriminate among products. Because of the costly implications
that can be induced by imprecise results or conclusions, numerous studies have been conducted to attempt to
define the most appropriate discrimination methods. By ‘most appropriate’ was meant the most powerful
method of investigation. The more powerful the test, the more likely the detection of a sensory difference
when it exists. The analysis of results in early studies was often based on the guessing model (illustrated
by the use of binomial tables) and yielded conflicting results (Buchanan, Givon, & Goldman, 1987; Byer &
Abrams, 1953; Dawson & Dochterman, 1951; Filipello, 1956; François & Sauvageot, 1988; Gridgeman, 1955;
Helm & Trolle, 1946; Hopkins, 1954; Hopkins & Gridgeman, 1955; MacRae & Geelhoed, 1992; Pfaffman, 1954;
Pokorny, Marcín, & Davídek, 1981; Raffensperger & Pilgrim, 1956; Vessereau, 1965). The work of Thurstone
(1927), later used by Frijters (1979) to solve the ‘paradox of discriminatory non-discriminators’ (Byer &
Abrams, 1953; Gridgeman, 1970), allowed an explanation of these inconsistencies and provided a more
consistent analysis of results obtained from discrimination tests and other sensory evaluation procedures.
This approach involves the calculation of a d′ value which is an index describing the perceived degree of
difference between two products. The larger the value, the more different the products.

This approach has been frequently applied in recent studies in order to investigate which sensory protocol
would be the most appropriate to study small sensory differences between products (Delwiche & O’Mahony,
1996; Dessirier & O’Mahony, 1998; Frijters, 1980; Geelhoed, Macrae, & Ennis, 1994; Hautus & Irwin, 1995;
Huang & Lawless, 1998; Irwin, Hautus, & Stillman, 1992; Irwin, Stillman, Hautus, & Huddleston, 1993;
Masuoka, Hatjopoulos, & O’Mahony, 1995; Rousseau & O’Mahony, 1997, 2000, 2001; Rousseau, Meyer, &
O’Mahony, 1998; Rousseau, Rogeaux, & O’Mahony, 1999; Stillman, 1993; Stillman & Irwin, 1995; Tedja,
Nonaka, Ennis, & O’Mahony, 1994).

* Corresponding author. Tel.: +1-530-752-6389; fax: +1-530-752-4759.
E-mail address: bdrousseau@ucdavis.edu (B. Rousseau).
0950-3293/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved.
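The guessing model mentioned above treats every response as a coin toss at the chance level of the method, so that significance is read from a binomial tail. The following minimal sketch illustrates that calculation; the function name is hypothetical, the chance levels (1/2 for the duo-trio, 1/3 for the triangle) are the standard ones for these methods, and the counts are those reported in the results tables of this paper. It is meant only as an illustration of the binomial approach, not as the analysis used by the authors.

```python
from math import comb


def binomial_tail(n_correct: int, n_trials: int, p_chance: float) -> float:
    """Exact one-sided P(X >= n_correct) for X ~ Binomial(n_trials, p_chance)."""
    return sum(
        comb(n_trials, k) * p_chance**k * (1.0 - p_chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


# Chance level is 1/2 for the duo-trio and 1/3 for the triangle test.
p_duo_trio = binomial_tail(62, 96, 1 / 2)    # traditional duo-trio, Experiment 1
p_triangle = binomial_tail(62, 144, 1 / 3)   # triangle test, Experiment 2
print(f"duo-trio 62/96:  P = {p_duo_trio:.4f}")
print(f"triangle 62/144: P = {p_triangle:.4f}")
```

Such a test only says whether performance exceeds chance; it does not place different methods on a common scale, which is the role of d′.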
While the majority of these studies have been conducted under very controlled conditions in the laboratory,
few have been conducted under less controlled conditions of consumer testing. Stillman (1993) confirmed the
paradox of discriminatory non-discriminators using consumers; the subjects could discriminate significantly
better between two versions of an onion party dip using a 3-Alternative Forced Choice (3-AFC) than using a
triangle test. This illustrated the higher power of the 3-AFC over the triangle test (Ennis, 1993) because
of its more effective cognitive strategy. One of the drawbacks of the 3-AFC test, or its two-sample
counterpart, the paired comparison test (2-AFC), is that it requires the specification of the nature of the
difference in the instructions. Yet, this information is not always readily available to the experimenter.
Another study by Rousseau et al. (1998) compared the performance of the duo-trio, triangle and
same–different tests with consumers. These three protocols have the advantage of not requiring any
description of the dimension of the difference; however, they lack the statistical power of the 2-AFC and
3-AFC tests (Ennis, 1993). In that study, Rousseau et al. found that the same–different test was the most
powerful and most sensitive of the three protocols thanks to its more effective cognitive strategy and
lower memory requirements, respectively.
While some studies have shown the importance of memory when using scaling procedures (Kim & O’Mahony,
1998), more studies have investigated its effects in discrimination testing. For example, memory decay
over time has been shown to affect performance in the same–different test (Cubero, de Almeida, & O’Mahony,
1995; de Almeida, Cubero, & O’Mahony, 1999). Furthermore, memory is thought to be the main factor giving an
advantage to the same–different over the triangle test (Rousseau et al., 1998, 1999; Rousseau & O’Mahony,
2000; 2001). Memory is also suspected of being partly responsible for the higher sensitivity of the 2-AFC
over the 3-AFC (Dessirier & O’Mahony, 1998; Rousseau & O’Mahony, 1997).
A potentially useful way of further confirming the significant effects of memory would be to change only
slightly the design of a particular protocol, while altering its memory properties, and observe its effect
on the protocol’s performance. A candidate for such an investigation is the duo-trio method. The
traditional presentation involves the tasting of the reference first, then the two unknown or ‘test’
samples. It is possible that once the third sample is evaluated, the memory trace or ‘engram’ of the
reference stored in the short-term memory would have degenerated; it would be ‘blurred’, and might cause
the judge to provide an incorrect answer. One could slightly modify the mode of presentation by introducing
the reference between the two alternatives, so that both alternatives are evaluated immediately before or
after the reference, reducing memory requirements for the comparisons. This issue was investigated in
Experiment 1.
While the same–different protocol is more sensitive and powerful than the traditional triangle and duo-trio
tests, it might be possible to further improve its performance. The factor that needs to be taken into
consideration is response bias, in the form of the subject’s τ criterion, a psychological ‘yardstick’ of
degree of difference. If the perceived difference between the two samples is smaller than the τ criterion,
the subject will consider the stimuli as ‘same’, while a larger difference will induce the subject to
consider them as ‘different’. Signal Detection Theory and Thurstonian models successfully deal with this
response bias to yield an unbiased measure of discrimination, the d′ value. However, one of the limitations
of combining data from different subjects in the same–different test is that subjects exhibit τ criteria
of different sizes. Such combination will cause an underestimation of the d′ value (Hautus, 1997;
Macmillan & Kaplan, 1985).
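The underestimation caused by pooling subjects with different τ criteria can be illustrated numerically. The sketch below assumes an equal-variance differencing model for the same–different test, in which the subject responds ‘different’ when the absolute perceived difference exceeds τ; this model, the function names, the true d′ of 1.5 and the two τ values are illustrative assumptions, not values or procedures taken from the paper.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution
_S = 2 ** 0.5      # standard deviation of a difference of two unit-variance percepts


def p_different(delta: float, tau: float) -> float:
    """P(|x1 - x2| > tau) when x1 - x2 ~ Normal(mean=delta, variance=2)."""
    return _N.cdf((delta - tau) / _S) + _N.cdf((-delta - tau) / _S)


def estimate_d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Recover d' from 'different' rates on different pairs (hits) and same pairs (false alarms)."""
    # For a same pair delta = 0, so F = 2 * Phi(-tau / sqrt(2)) is inverted directly.
    tau = -_S * _N.inv_cdf(false_alarm_rate / 2.0)
    lo, hi = 0.0, 10.0
    for _ in range(100):  # bisection: p_different increases with delta
        mid = (lo + hi) / 2.0
        if p_different(mid, tau) < hit_rate:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


TRUE_D = 1.5       # hypothetical true d' shared by both subjects
TAUS = (0.5, 2.5)  # hypothetical lax and strict criteria

hits = [p_different(TRUE_D, t) for t in TAUS]
false_alarms = [p_different(0.0, t) for t in TAUS]
individual = [estimate_d_prime(h, f) for h, f in zip(hits, false_alarms)]
pooled = estimate_d_prime(sum(hits) / 2, sum(false_alarms) / 2)
print(f"individual estimates: {individual[0]:.2f}, {individual[1]:.2f}")
print(f"pooled estimate:      {pooled:.2f} (true value {TRUE_D})")
```

Each subject analysed alone returns the true d′, whereas the averaged response rates fall below the common ROC curve and yield a smaller pooled estimate, which is the effect a forced-choice format such as the dual-pair test is intended to avoid.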
The dual-pair test is thus investigated in an attempt to control this effect. In the dual-pair test, also
called 4-Interval AX or 4IAX in psychology (e.g. Kaplan, Macmillan, & Creelman, 1978), the subject is
presented with two pairs of samples simultaneously. One is a pair of identical samples, the other of
different samples. In the pair consisting of two different samples, one of the samples is the same as the
sample used in the pair of identical stimuli. The task of the subject is to determine which pair has the
same and which pair has different samples. This protocol has been studied previously under controlled
conditions (Rousseau & O’Mahony, 2000, 2001), while a Thurstonian model has been developed allowing the
investigation of its statistical power (Rousseau & Ennis, 2001). Information obtained from these
experiments and new models did not indicate an advantage for the dual-pair test. Its statistical power was
found to be more similar to that of the duo-trio test and thus inferior to that of the same–different test.
Furthermore, it did not yield larger d′ values. Not yielding larger d′ values than the same–different test
might have been due to the fact that in controlled conditions subjects were familiar with the products
under study, limiting the large variations in the subjects’ τ criteria which would have hindered
performance on the same–different test. It is possible that such variations could have a larger influence
on the same–different test when testing highly ‘uncalibrated’ subjects such as consumers. Here, an
advantage of the dual-pair test over the same–different test might possibly be observed. This particular
issue was studied in Experiment 2.
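To make the link between a dual-pair result and d′ concrete, the sketch below assumes an equal-variance model with a differencing strategy: the subject calls ‘different’ the pair showing the larger absolute perceived difference. Under that assumption the proportion correct has the closed form Pc = Φ(d′/2)² + Φ(−d′/2)², which can be inverted directly. The strategy and the function names are assumptions made for illustration; the model of Rousseau and Ennis (2001) may differ in its details.

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution


def dual_pair_pc(d_prime: float) -> float:
    """Proportion correct when the pair with the larger |difference| is called 'different'."""
    p = _N.cdf(d_prime / 2.0)
    return p * p + (1.0 - p) * (1.0 - p)


def dual_pair_d_prime(pc: float) -> float:
    """Closed-form inverse of dual_pair_pc; valid for 0.5 <= pc < 1."""
    if not 0.5 <= pc < 1.0:
        raise ValueError("pc must lie in [0.5, 1)")
    # pc = p^2 + (1 - p)^2  =>  p = (1 + sqrt(2 * pc - 1)) / 2
    p = (1.0 + (2.0 * pc - 1.0) ** 0.5) / 2.0
    return 2.0 * _N.inv_cdf(p)


d_hat = dual_pair_d_prime(90 / 144)  # count reported for the dual-pair test in Experiment 2
print(f"d' = {d_hat:.2f}")
```

With 90 correct answers out of 144 this gives a d′ close to 1.35, in line with the value of 1.4 reported for the dual-pair test in Experiment 2.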
Thus, in summary, the present experiments further investigated the effect of memory in discrimination
testing and the possible experimental advantages of the dual-pair paradigm. In Experiment 1 concerning
memory, the sensitivities of three versions of the duo-trio test were compared using 96 consumers. The
objective was to investigate the effect of the position of the reference versus that of the two alternative
samples. In Experiment 2, the relative performances of the dual-pair and same–different tests were compared
using 144 consumers. The duo-trio and triangle tests were also included in the study so as to provide more
complete information about the advantages and disadvantages of the traditional discrimination protocols.

2. Experiment 1
2.1. Materials and methods
2.1.1. Judges
Ninety-six (96) subjects (51 M, 45 F; age range 13–56 years; average age 22.7 years) participated in
Experiment 1. They were students, faculty, staff and their friends intercepted on the campus of the
University of California, Davis.

2.1.2. Products
The stimuli for the experiment consisted of two versions of a non-carbonated orange flavored beverage
(The Gatorade company, Chicago, IL, USA), available as an instant mix. One stimulus (A) was prepared using
80 g of product per liter, mixed with deionized water. The second stimulus (B) was prepared the same way
but 10 g of sucrose (Fischer Grade, Fisher Scientific, Fair Lawn, NJ, USA) were also added to create a
small sensory difference. Samples were served in 10 ml aliquots in approximately thirty-milliliter clear
plastic cups (Serco 1-oz Plastic Cups, S. E. Rykoff & CO., Los Angeles, CA, USA).

2.1.3. Procedure
Having intercepted the judge, an experimenter first established rapport and then took personal details. A
second experimenter then explained the methods and performed the test.

Each judge performed three versions of the duo-trio test: the traditional duo-trio with the reference
tasted first, the duo-trio with the reference tasted between the two test samples (DTM) and a third version
designated as the 2-Distance-AFC or 2-D-AFC. For the first two protocols, the subject was asked to indicate
which of the two alternatives was the most similar to the reference. The 2-D-AFC had the same presentation
order as the DTM, but the subject was asked to indicate which change of sensation was the larger. Thus,
they reported whether the sensation change was greater between the first and the second sample, or between
the second and the third sample, rather than indicate which sample was the more similar to the middle
stimulus. Here, they were not told that among the three samples, two were identical.

One deionized water rinse was taken before each test. Before tasting the first sample of the triad, a
primer sample was taken in order to prevent the distorting effect of the rinsing water on the taste of the
first sample. The primer was always sample A. Subjects were required to take the whole 10 ml of each sample
at once and swallow it. Water rinses were also swallowed. No retasting was allowed so as to allow memory to
have a larger influence. The order of presentation of the tests and the order in which samples appeared in
a test were counterbalanced over judges. In order to prevent differential sequence effects among the
various protocols, only two presentation orders were used, namely ‘AAB’ and ‘BBA’. Each judge encountered
only one of the two possible triads over the three protocols performed. Judges responded verbally; no
feedback was given. Session lengths from interception to termination lasted approximately 10–15 min.

2.2. Results
The d′ values for the three versions of the duo-trio test are presented in Table 1. The d′ values for the
duo-trio and DTM were obtained from the proportions of correct responses using tables (Ennis, 1993). The d′
value for the 2-D-AFC was also obtained using the tables, since it can be argued that the cognitive
strategy used by the subjects to perform this particular protocol is the same as that assumed for the
duo-trio.

Variances for testing significant differences were obtained using the approach proposed by Bi, Ennis, and
O’Mahony (1997). It was found that the d′ value obtained with the DTM was significantly higher than those
obtained with the traditional duo-trio and the 2-D-AFC (specifically at P=0.05 and 0.03, respectively,
after an overall screening at P=0.06), while the latter two procedures did not show any significant
difference (P=0.8).

For each of the three duo-trio protocols, no differences were found between the two different sequences of
tasting (AAB, BBA). d′ values are given in Table 2. Differences were not significant (P ≥ 0.33).

It was decided to use the DTM in Experiment 2 since it appeared to be the most sensitive version of the
duo-trio.

Table 1
Results for the three protocols compared in Experiment 1: proportion of correct answers, d′, variance of d′
and 95% confidence intervals a

Protocol    Number of correct answers (out of 96)    d′       Variance    95% Confidence intervals
Duo-Trio    62                                       1.4 a    0.08        (0.8; 2.0)
DTM         74                                       2.2 b    0.08        (1.6; 2.7)
2-D-AFC     61                                       1.3 a    0.09        (0.8; 1.9)

a DTM d′ value significantly different from duo-trio (P=0.05) and 2-D-AFC (P=0.03). DTM, duo-trio with
reference in the middle; 2-D-AFC, 2-distance-alternative forced choice.
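The table look-ups used for Experiment 1 can be approximated numerically. The sketch below inverts the standard equal-variance Thurstonian psychometric function of the duo-trio, Pc = 1 − Φ(d′/√2) − Φ(d′/√6) + 2Φ(d′/√2)Φ(d′/√6), and then applies a normal approximation for the confidence intervals and pairwise comparisons. The function names are hypothetical, and the z-test is a simplified stand-in that treats the estimates as independent; it is not claimed to be the exact procedure of Bi et al. (1997).

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution


def duo_trio_pc(d_prime: float) -> float:
    """Thurstonian proportion correct for the duo-trio (equal-variance model)."""
    a = _N.cdf(d_prime / 2 ** 0.5)
    b = _N.cdf(d_prime / 6 ** 0.5)
    return 1.0 - a - b + 2.0 * a * b


def duo_trio_d_prime(n_correct: int, n_trials: int) -> float:
    """Invert duo_trio_pc by bisection; returns 0 at or below chance (1/2)."""
    pc = n_correct / n_trials
    if pc <= 0.5:
        return 0.0
    lo, hi = 0.0, 15.0
    for _ in range(100):  # duo_trio_pc increases with d'
        mid = (lo + hi) / 2.0
        if duo_trio_pc(mid) < pc:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def confidence_interval(d_prime: float, variance: float, level: float = 0.95):
    """Normal-approximation interval d' +/- z * sqrt(variance)."""
    half = _N.inv_cdf(0.5 + level / 2.0) * variance ** 0.5
    return d_prime - half, d_prime + half


def two_sided_p(d1: float, var1: float, d2: float, var2: float) -> float:
    """Two-sided z-test of d1 == d2, treating the two estimates as independent."""
    z = abs(d1 - d2) / (var1 + var2) ** 0.5
    return 2.0 * (1.0 - _N.cdf(z))


for name, correct in (("Duo-Trio", 62), ("DTM", 74), ("2-D-AFC", 61)):
    print(f"{name}: d' = {duo_trio_d_prime(correct, 96):.2f}")
print("DTM vs duo-trio: P =", round(two_sided_p(2.2, 0.08, 1.4, 0.08), 3))
print("DTM vs 2-D-AFC:  P =", round(two_sided_p(2.2, 0.08, 1.3, 0.09), 3))
```

Run on the counts and variances of Table 1, this sketch gives d′ values and P values close to the rounded figures reported there, which suggests that the published analysis follows a comparable logic.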
3. Experiment 2
3.1. Materials and methods
3.1.1. Judges
One hundred and forty-four (144) subjects (78 M, 66 F; age range 18–71; mean age 24.3 years) participated
in Experiment 2. They were selected from the same pool of subjects as in Experiment 1.

3.1.2. Products
The stimuli for the experiment were the same as those used in Experiment 1. The only difference was a
reduction of the sugar concentration in sample B to 8 g/l from 10 g/l, to assure better confusability of
the stimuli.

3.1.3. Procedure
The testing procedures were similar to those used in Experiment 1. This included the rinsing protocol and
the use of a primer. Consumers swallowed the stimuli and the rinses. Each consumer performed one test of
each protocol: triangle, DTM, dual-pair and same–different. It should be noted that the last protocol
consisted of two successive pairs of samples, one being of identical samples, the other of different
samples. However, subjects were not aware of this arrangement. The version of the same–different test used
here, sometimes labeled as the ‘longer’ version (Rousseau et al., 1999), is statistically more powerful
than the traditional triangle and duo-trio tests (Ennis, 1996). For each pair of stimuli, subjects were
asked to state whether they thought the samples were the same or different and whether they were sure or
not so sure of their answer.

Each consumer was first introduced to the two samples A and B to be tested and was allowed to sample both.
This ‘familiarization procedure’ was performed to allow the judge to have some preconceived idea of the
extent of the difference to be detected. This allowed them to adjust their criterion of difference
(Macmillan & Creelman, 1991) such that if they detected a difference, they would report it.

The order of presentation of the tests and the order in which samples appeared in a test were
counterbalanced over judges. A block of 48 subjects permitted a complete counterbalancing. Thus three of
these blocks were used here. Judges responded verbally; no feedback was given. Session lengths from
interception to termination lasted approximately 10–15 min.

4. Results
Results are presented in Table 3. For the DTM, dual-pair and triangle tests, d′ values and their variances
were obtained from the proportion of correct responses using tables (Ennis, 1993; Rousseau & Ennis, 2001).
For the same–different test, an analysis used in previous studies (Rousseau et al., 1998, 1999; Rousseau &
O’Mahony, 2000; 2001) was applied. This analysis was based on Thurstone’s law of categorical judgment
(Ennis, 1998; Thurstone, 1927). The analysis was applied to difference ratings and used the method of
maximum likelihood to estimate d′, τ and the variance–covariance matrix for these estimates.

Using the approach proposed by Bi et al. (1997), no significant differences were observed among the four
d′ values (P=0.7). However, a trend was observed for the triangle test to be slightly less sensitive than
the other three procedures.
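An overall comparison of several d′ values can be sketched as a weighted chi-square homogeneity test: each estimate is weighted by the inverse of its variance, and the weighted squared deviations from the pooled estimate are referred to a chi-square distribution with k − 1 degrees of freedom. This is assumed here to be in the spirit of the approach of Bi et al. (1997); the function names are hypothetical, and the sketch treats the estimates as independent even though the same judges performed every protocol.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        return exp(-half) * sum(half ** k / gamma(k + 1) for k in range(df // 2))
    # Odd df: complementary error function plus a finite correction sum.
    terms = sum(half ** (k - 0.5) / gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1))
    return erfc(sqrt(half)) + exp(-half) * terms


def homogeneity_test(d_primes, variances):
    """Return (chi-square statistic, P value) for H0: all d' values are equal."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, d_primes)) / sum(weights)
    stat = sum(w * (d - pooled) ** 2 for w, d in zip(weights, d_primes))
    return stat, chi2_sf(stat, len(d_primes) - 1)


# Rounded d' values and variances for the four protocols of Experiment 2.
stat2, p2 = homogeneity_test([1.1, 1.5, 1.4, 1.4], [0.06, 0.05, 0.04, 0.06])
# Rounded d' values and variances for the three protocols of Experiment 1.
stat1, p1 = homogeneity_test([1.4, 2.2, 1.3], [0.08, 0.08, 0.09])
print(f"Experiment 2: chi-square = {stat2:.2f}, P = {p2:.2f}")
print(f"Experiment 1: chi-square = {stat1:.2f}, P = {p1:.2f}")
```

Applied to the rounded published values, the sketch returns P values of roughly 0.7 for Experiment 2 and 0.05 for Experiment 1, close to the reported P=0.7 and the overall screening at P=0.06; small discrepancies are expected from rounding of the inputs.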
Table 2
d′ values for each duo-trio sequence (n=48)

Protocol    AAB      BBA
Duo-Trio    1.6 a    1.1 a
DTM         1.9 b    2.5 b
2-D-AFC     1.5 c    1.1 c

a,b,c For each protocol, d′ values for sequences AAB and BBA are not significantly different (P ≥ 0.33).
DTM, duo-trio with reference in the middle; 2-D-AFC, 2-distance-alternative forced choice.
Table 3
Results for the four protocols compared in Experiment 2: proportion of correct answers, d′, variance of d′
and 95% confidence intervals

Protocol          Number of correct answers (out of 144)    d′       Variance    95% Confidence intervals
Triangle          62                                        1.1 a    0.06        (0.6; 1.6)
DTM               96                                        1.5 a    0.05        (1.1; 2.0)
Same–different    N/A                                       1.4 a    0.04        (1.0; 1.8)
Dual-pair         90                                        1.4 a    0.06        (0.9; 1.9)

a d′ values are not significantly different (P=0.7).
As for Experiment 1, no clear trend was apparent when looking at possible sequence effects. This can be
observed separately for different test protocols. First, for the same–different test, the pattern of
sequences obtained with the four possible pairs of stimuli could be examined. This is an approach similar
to that used in the sequential sensitivity analysis (O’Mahony & Odbert, 1985). The appropriate way of
performing the analysis would be to carry out a χ² test including the four sequences. However, this
analysis is complicated by the existence of response bias. For instance, if the subjects exhibited a very
large τ criterion, the results of the pair AA and BB would tend to appear more ‘correct’ than those of the
pairs AB and BA, giving a significant χ² test. This would obscure the effect of adaptation and sequencing
since the results would be contaminated by a psychological bias. However, it is still relevant to compare
the responses of the pairs AA with BB on one hand, and AB with BA on the other hand (Table 4). No
significant differences are observed (χ², P=0.7 for either comparison).

For the remaining tests, the effects can be examined by looking at the two halves of each of the triangle,
DTM and dual-pair tests. Results are presented in Table 5. Using the same analysis as in Experiment 1
(Table 2), no significant difference between the two halves (‘A-odd’ and ‘B-odd’) were observed
(P ≥ 0.17).

Table 4
Distribution of responses for the four pairs presented in the same–different test (in bold, responses which
can be scored as ‘correct’) a

                   Number of answers (/72)
Pair               ‘‘Correct’’ b    ‘‘Incorrect’’ b
Same         AA    39               33
             BB    41               31
Different    AB    42               30
             BA    44               28

a The pattern of responses is not significant (χ², P=0.7 for comparison of ‘same’ pairs, P=0.7 for
comparison of ‘different’ pairs). The categories ‘‘Different sure’’ and ‘‘Different not sure’’ on one hand,
and ‘‘Same sure’’ and ‘‘Same not sure’’ on the other have been combined to permit easier comparison.
b For the pairs AB and BA, a response ‘Different’ will be considered correct, while for the pairs AA and
BB, a response ‘Same’ will be considered correct.

Table 5
d′ values for each type of sequence for three testing protocols (n=72)

Protocol     ‘B’ odd    ‘A’ odd
Triangle     0.9 a      1.2 a
DTM          1.9 b      1.2 b
Dual-pair    1.7 c      1.0 c

a,b,c For each protocol, d′ values of each half of the tasting sequences (‘B’ odd vs. ‘A’ odd) are not
significantly different (P ≥ 0.17). DTM, duo-trio with reference in the middle.

5. Discussion
The results from Experiment 1 confirmed the significant effect played by memory in discrimination tests.
Merely changing the position of the reference in the duo-trio test significantly improved the sensitivity
of the protocol. This result substantiates the higher sensitivity of tests with only two samples, such as
the same–different and 2-AFC tests over those with three samples, observed in previous studies. This first
result seems to indicate that tasting the reference first in the duo-trio test is not the most appropriate
design for this particular protocol. In order to obtain the best discrimination between products, the
reference should be tasted between the two alternative samples.

However, it needs to be noted that subjects were not allowed to retaste the samples, therefore permitting
memory effects to have a non-negligible influence on the outcome of the test. It is possible that
retasting might limit the advantage of the DTM; further experimentation is needed to investigate this
issue. Nevertheless, when the nature of the product does not allow retasting because of sensory fatigue
(e.g. red pepper, hard liquors, ...), the DTM appears like the most appropriate version of the duo-trio
test.

The 2-D-AFC did not show any particular advantage. It may have been expected to give a higher proportion of
tests correct like the DTM. Yet, subjects reported that the task was difficult and they might have reverted
to an alternative strategy. Perhaps more practiced and trained subjects in laboratory conditions may have
more success.

Experiment 2 did not show any significant differences among the sensitivities of the four protocols. These
results would confirm Thurstonian predictions. However, trends seen in previous research (Rousseau et al.,
1998, 1999; Rousseau & O’Mahony, 2000; 2001; Stillman & Irwin, 1995) were observed. Of the four protocols,
the same–different test was the most powerful as indicated by its lower variance.

The dual-pair did not yield a larger d′ value than that of the same–different test as predicted. The
reason could be that larger memory requirements counterbalanced any stabilizing effect on the judges’ τ
criteria that was provided by using a forced-choice method. A second possibility is that it might be due to
the fact that the familiarization procedure performed by the subjects prior to starting the experiment was
sufficient to prevent large variations in the size of the subjects’ τ criteria in the same–different test
and so not reduce its performance.

Nevertheless, this result indicates that the dual-pair method does not appear to be a good alternative to
the same–different method. It exhibits at best the same sensitivity as the same–different, while its
statistical power is more limited. Thus, the same–different test with a familiarization procedure appears
more appropriate as a sensory discrimination method for consumer testing.