Successful communication in complex listening environments is difficult for most of
us, but especially for those with whom audiologists interact on a daily basis. Foremost
among these individuals are those who are hard of hearing, but others might include
individuals who are older and individuals with other conditions that impact the auditory
system or other higher-order systems. While there are many factors that contribute
to a complex listening environment, this article focuses on the contribution of background
noise.
Listening in Background Noise
When it comes to communication, difficulty understanding speech in the presence of
background noise is the primary complaint among those treated for hearing loss (Kochkin
2010). Communicating in background noise is a universal challenge because there is
so much noise in the world around us. For decades, researchers have tried to understand
the extent of background noise in common communication settings. Plomp (1977) and
Pearsons et al. (1977) were some of the first to attempt to measure the levels of
signals and noise in everyday listening environments to determine signal-to-noise
ratios (SNRs) typical in communication. [Fig. 1] illustrates data from several studies published over the past 45 years that measured
SNRs present in everyday listening environments (Hodgson 1999; Markides 1986; Pearsons
et al. 1977; Plomp 1977; Smeds et al. 2015; Teder 1990). Methods of measurement varied
across studies, and there can be a large degree of variability even within one setting
depending on the conditions that are present; however, results are generally consistent
across studies with SNRs mostly between 0 and 15 dB across a range of settings.
Figure 1 Signal-to-noise ratios (SNRs) measured in everyday life. Measurements across several
studies dating from 1977 to 2015 reveal general trends regarding typical SNRs in each
environment, although some environments (e.g., classroom, public transportation) demonstrate
more variability than others. (Figure used with permission of Hearing Research [see
Billings & Madsen 2018, for details of measured settings]).
Historical Perspective on Speech-in-Noise Testing
Difficulties understanding speech in background noise have been studied for decades,
dating back to the beginnings of the field of audiology (Carhart 1946; Miller 1947;
Cherry 1953). With respect to measuring success with hearing aids, Carhart (1946)
identified understanding speech in noise as one of the main dimensions needing further
exploration. Miller (1947), and later Cherry (1953), explored the effect of different
competing maskers (i.e., different types of noise and different sources of noise)
on speech understanding. Similarly, it has been suggested since the early days of
audiology that speech-in-noise testing is important to include in audiological testing
(Carhart 1946; Davis et al. 1946; Hardy 1950). Despite its consistent recommendation
by professional associations and researchers alike (e.g., APSO 2021; Davidson et al.
2021), the prevalence of regular speech-in-noise testing by hearing professionals remains
below 50% (ASHA 2019; Mueller 2010; Strom 2006; Clark et al. 2017).
The fundamental challenge of helping patients understand speech in the presence of
competing maskers continues to be a critical task for audiology. More than 70 years
ago, Hardy (1950) introduced two different categories of hearing difficulty. The first
is the “louder please” category. Individuals with this type of difficulty do well
if the volume of the signal can be increased. The second is the “I can't understand
you” category. These individuals may need increased volume, but they also have a perceptual
impairment involving auditory distortion that results in listening difficulties. Other
researchers have since proposed similar categories (Carhart 1951; Stephens 1976).
In 1978, in a seminal paper, Plomp formalized these categories in terms of “attenuation”
versus “distortion” problems, with the latter being especially pronounced in noisy
environments. He proposed that hearing losses in the attenuation category simply decrease a person's sensitivity to sound (i.e., raise one's audibility threshold), while losses in the distortion category cause a degradation in the fidelity of a person's perception of sounds even when they are well above threshold. In most
listeners, both attenuation and distortion-related factors contribute to speech-in-noise
performance, resulting in considerable variability.
Why Should I Test Speech in Noise?
An emblematic feature of speech-in-noise perception is the wide range of variability
in performance across individuals, even when all are of similar age and hearing status.
For example, [Fig. 2A] shows audiograms for 18 individuals who are over 65 years of age; despite the similarities
across individuals (i.e., bilateral sensorineural hearing loss in the normal/mild
sloping to moderately severe/severe range), understanding words in background babble
noise at a given SNR (12 dB in this case) varies from ∼0 to 95% ([Fig. 2B]). The large range of variability is also apparent in SNR50s, or the SNR at which
an individual achieves 50% intelligibility ([Fig. 2C]). Both the percent correct score at a given SNR and the SNR at a given percent correct
can be derived from the more complete psychometric function ([Fig. 2D]), which in this case represents the performance of the individuals tested across
a range of SNRs.
Figure 2 Variability in speech-in-noise listening performance. Performance on the WIN for
18 individuals over the age of 65 years with symmetrical hearing loss. Panel A shows the average (thick blue line) and individual (thin gray lines) pure-tone thresholds.
Panels B and C reveal the group-mean (diamond/square) and individual (triangles/circles) percent
correct and SNR50 scores, illustrating the wide range of variability across individuals.
Panel D shows the modeled group (thick blue line) and individual (thin gray lines) psychometric
functions.
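The SNR50 values in panel C follow from fitting a psychometric function to a listener's percent-correct scores and reading off its 50% point. As a minimal sketch of that derivation (the logistic form, grid ranges, and sample data below are illustrative assumptions, not the study's data or fitting method), a least-squares search can recover the midpoint and slope:

```python
import numpy as np

def logistic(snr, snr50, slope):
    """Psychometric function: proportion correct as a function of SNR (dB)."""
    return 1.0 / (1.0 + np.exp(-slope * (snr - snr50)))

# Hypothetical percent-correct data for one listener at several SNRs
snrs = np.array([0.0, 4.0, 8.0, 12.0, 16.0, 20.0])        # dB
prop = np.array([0.02, 0.10, 0.35, 0.70, 0.90, 0.98])

# Coarse least-squares grid search over (snr50, slope)
snr50_grid = np.arange(0.0, 20.01, 0.1)
slope_grid = np.arange(0.1, 2.01, 0.05)
best = min(
    ((m, s) for m in snr50_grid for s in slope_grid),
    key=lambda p: np.sum((logistic(snrs, *p) - prop) ** 2),
)
print(f"Estimated SNR50 = {best[0]:.1f} dB, slope = {best[1]:.2f}/dB")
```

With the fitted curve in hand, either summary in the text is available: the percent correct predicted at any given SNR, or the SNR needed for any given percent correct.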
The wide range of variability in understanding speech in background noise presents
a challenge for audiologists. For example, two patients may present with similar pure-tone
thresholds and speech-in-quiet understanding but very different speech-in-noise understanding.
Given that pure-tone testing and speech-in-quiet testing are completed in an optimal
listening situation, it is not surprising that these measures do not adequately characterize
the listening difficulties experienced by patients in background noise. In a group
of 3,430 Veterans, Wilson (2011) demonstrated the relationship between pure-tone average
(PTA), word recognition in quiet, and word recognition in noise using the Words in
Noise (WIN) test. While these measures were correlated, there was a wide range of
performance on WIN scores even among individuals with similar PTAs or performance
in quiet. As many as 70% of those tested had word recognition scores ≥ 80% in quiet,
whereas only 7% of the group demonstrated normal WIN performance (≤ 6 dB; Wilson 2011).
Wilson suggested that speech-in-noise testing “puts substantial pressure on the auditory
system and should be considered as the ‘stress test’ of auditory function” (p. 418).
Another challenge for audiologists is determining which speech-in-noise test to use.
Several different tests have been developed over the years; however, the rationale
for using one test instead of another is not always clear. Interestingly, the most
commonly used of these tests is reportedly the QuickSIN (Clark et al. 2017; Strom
2006; Mealings et al. 2020; Mueller 2003), perhaps because of its ease of use and
the length of time that it has been available to audiologists. The purpose of this
article is to introduce audiologists to the variety of tests that are available to
them as well as the underlying rationale for each, and to discuss important factors
that should be considered when selecting a test to use in their audiometric test battery.
Data will also be presented that compare performance across a variety of speech-in-noise
tests within the same group of individuals who vary in age and hearing status.
How Is Speech Understanding in Noise Measured?
Speech understanding in noise is most often characterized either in terms of percent
correct at a given SNR or, conversely, as the SNR needed to achieve a specific percent
correct (most commonly the SNR50, i.e., the SNR needed to understand 50% of the signal)
as shown in [Fig. 2] ([B] and [C], respectively). Some tests go a step further and compare the SNR score of an individual
to a group of normal-hearing individuals, resulting in an “SNR loss” value that is
similar to the conversion of dB SPL to dB HL on the audiogram (i.e., a group of normal-hearing
individuals is used to provide a reference performance level). The main motivation
that has been advanced for using SNR loss rather than SNR50 is that SNR loss is likely to be more comparable within an individual across tests
than SNR50 is, with the latter being extremely test-specific. Comparing SNR50 across
tests with the purpose of tracking individual changes in performance should be done
with caution, as differences may be more reflective of signal or noise differences
between tests than of individual performance differences. If the purpose is to track changes
in performance over time, then it may be advisable to normalize the results from different
tests in some way. Theoretically, using SNR loss should work to subtract out the differences
in mean performance between tests, but the conversion procedure does not address any
differences that might exist in the width of the performance spread across tests;
so, one would still expect definite limits to the degree of across-test generalizability.
It is noteworthy that the SNR50, or 50%-point (also known as the speech reception
threshold in noise [SRTN] or speech reception threshold [SRT] in the speech-in-noise
literature, but not to be confused with the audiometric SRT, which is typically the
detection threshold of spondees in quiet), has historically been the convention for
characterizing speech-in-noise performance.
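The SNR-loss conversion described above amounts to a subtraction against a normative reference, directly parallel to the dB SPL-to-dB HL conversion on the audiogram. A minimal sketch (the 2 dB reference mean used here is an arbitrary placeholder, not a published norm for any test):

```python
def snr_loss(patient_snr50_db, norm_mean_snr50_db):
    """SNR loss: how much higher (worse) the patient's SNR50 is than the
    normal-hearing reference mean for the same test, in dB."""
    return patient_snr50_db - norm_mean_snr50_db

# Hypothetical example: a patient needs +9 dB SNR for 50% correct on a test
# whose normal-hearing reference mean is +2 dB -> 7 dB SNR loss.
print(snr_loss(9.0, 2.0))
```

Because each test's reference mean is subtracted out, SNR loss removes the between-test offset in average difficulty, though, as noted above, not any between-test differences in the spread of scores.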
It is important to consider the methodology of administration of speech-in-noise tests
that are available to audiologists. Each test's methodology is unique but can be divided
into three categories: fixed, adaptive, or progressive. A fixed protocol uses a constant SNR (e.g., +5 dB) for the entirety of the test and the outcome
measure is a percent correct score. An adaptive protocol changes the SNR of a given condition according to how the participant performs
and is designed to “bracket” a particular predetermined level of performance (most
often targeting 50% correct), with the response variable being the SNR that, on average,
results in performance closest to the target level. Finally, a progressive protocol
gradually changes the SNR (usually in either an increasing only or decreasing only
direction) using step sizes and numbers of trials that are independent of how the
participant performs, and typically the SNR50 is derived or the conversion to SNR
loss is made. One partial exception to this performance-independence is that some
progressive tests employ a stopping rule such that if the individual performs below
a certain criterion at a given SNR (e.g., 0 of 5 correct) then the testing can be
terminated and no poorer SNRs are presented, effectively reducing test time (in which
case all SNRs lower than the one that triggered the stopping criterion would also
be assumed to have a score of 0).
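The adaptive logic above can be sketched in a few lines. In this simulation (the step size, trial count, and simulated listener are all illustrative assumptions, not any published test's protocol), a 1-down/1-up staircase lowers the SNR after each correct response and raises it after each error, so the track oscillates around the 50%-correct point and its average estimates the SNR50:

```python
import math
import random

def simulated_listener(snr_db, true_snr50=8.0, slope=0.5):
    """Stand-in for a patient: responds correctly with a probability that
    follows a logistic psychometric function of SNR."""
    p_correct = 1.0 / (1.0 + math.exp(-slope * (snr_db - true_snr50)))
    return random.random() < p_correct

def adaptive_track(step_db=2.0, n_trials=40, start_snr_db=20.0):
    """1-down/1-up staircase: decrease SNR after a correct response,
    increase it after an error, "bracketing" the 50%-correct SNR."""
    snr = start_snr_db
    visited = []
    for _ in range(n_trials):
        visited.append(snr)
        snr += -step_db if simulated_listener(snr) else step_db
    # Discard the initial descent; average the rest as the SNR50 estimate.
    tail = visited[10:]
    return sum(tail) / len(tail)

random.seed(0)
print(f"Estimated SNR50: {adaptive_track():.1f} dB")
```

A progressive protocol would instead step through a predetermined list of SNRs regardless of responses, optionally terminating early once the stopping criterion described above is met.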
Another important consideration is how the SNR for a given test is changed throughout
the test. As an example, the QuickSIN holds the signal level constant and adjusts
the noise level to create different SNRs, whereas in contrast, the WIN holds the noise
level constant and adjusts the signal level to manipulate SNR. For most speech-in-noise
tests, the presentation level should be high enough to ensure audibility; however,
it is critical to carefully consider how audibility of the signal and audibility of
the noise may impact speech-in-noise testing for an individual given their specific
hearing loss.
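The two level-manipulation schemes can be made concrete. In this sketch (the 70 dB anchor level and the SNR list are arbitrary examples, not either test's specified presentation levels), the same set of SNRs is generated QuickSIN-style by fixing the signal and WIN-style by fixing the noise:

```python
def levels_fixed_signal(signal_db, snrs_db):
    """QuickSIN-style: the signal level stays constant; the noise level is
    set to signal - SNR for each test SNR."""
    return [(signal_db, signal_db - snr) for snr in snrs_db]

def levels_fixed_noise(noise_db, snrs_db):
    """WIN-style: the noise level stays constant; the signal level is set
    to noise + SNR for each test SNR."""
    return [(noise_db + snr, noise_db) for snr in snrs_db]

snrs = [24, 20, 16, 12, 8, 4, 0]              # dB, easy to hard
print(levels_fixed_signal(70, snrs))          # noise climbs from 46 up to 70
print(levels_fixed_noise(70, snrs))           # signal falls from 94 down to 70
```

The contrast shows why audibility must be weighed separately for each scheme: at hard SNRs, one approach drives the noise up toward the signal while the other drives the signal down toward the noise, placing very different absolute levels at the listener's ear.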
What Is the Best Test for You?
There are many speech-in-noise tests available to audiologists. [Table 1] presents basic information for the most common of these. A more
</gr-replace>
thorough reference is provided as Appendix A, which compiles additional information for each test about the purpose, materials,
administration, scoring, norms if available, and key references. Of course, the selection
of a given test will be dependent on many factors. For example, the type of noise,
the type of signal, and other factors may be important to consider.
Table 1
Current speech-in-noise tests that are frequently used or discussed in the literature
| Test | Acronym | Target Speech | Noise | Protocol | Use | Time (min) | Purchase Source | Approximate Cost |
|---|---|---|---|---|---|---|---|---|
| Acceptable Noise Level | ANL | Running speech (female) | Multi-talker babble | Adaptive | Pre-hearing aid fitting; estimate noise tolerance | 5-10 | Interacoustics (www.interacoustics.com) | Included with equipment purchase |
| Arizona Biomedical Sentences Recognition | AzBio | AzBio sentences (2 female, 2 male) | Auditec four-talker babble | Fixed | CI evaluation; CI post-op monitoring | 5-7 | Auditory Potential (www.auditorypotential.com) | $155 |
| Bamford Kowal Bench Sentence Test | BKB-SIN | BKB sentences (male) | Auditec four-talker babble | Progressive | CI evaluation; CI post-op monitoring; APD evaluation; when QuickSIN is too difficult | 5-7 | Etymotic Research, Inc.; Auditec, Inc. | $215 |
| Connected Speech Test | CST | Recorded passages | Six-talker babble | Fixed | Pre-/post-hearing aid fitting | <10 | https://harlmemphis.org/connected-speech-test-cst/ | Free |
| Coordinate Response Measure | CRM | "Ready [CALL SIGN] go to [COLOR] [NUMBER] now" | Multi-talker babble | Adaptive or progressive | Analyze spatial hearing | — | http://auditory.org/mhonarc/2012/msg00034.html | Free |
| Hearing in Noise Test | HINT | BKB sentences | Speech-spectrum noise | Adaptive | Measure SNR threshold while avoiding ceiling and floor effects; determine benefit of directional mics | 5-10 | Owned by Interacoustics; not currently available to clinicians | — |
| Listening in Spatialized Noise – Sentences | LiSN-S | LiSN-S sentences (female) | Children's stories | Adaptive | APD evaluation (specifically, spatial processing disorder) | <15 | Sound Scouts (https://www.soundscouts.com/apd/) | Monthly subscription |
| Quick Speech in Noise | QuickSIN | IEEE sentences (female) | Auditec four-talker babble | Progressive | Pre-/post-hearing aid fitting; may indicate cognitive processes | 1/list | Etymotic Research, Inc.; Auditec, Inc. | $176 |
| Revised Speech in Noise | R-SPIN | R-SPIN sentences (1 female, 1 male) | Multi-talker babble | Fixed | Determine use of context clues | — | https://ahs.illinois.edu | Free |
| Speech Reception in Noise Test | SPRINT | NU-6 words | Six-talker babble | Fixed | Aid in retention, reclassification, or separation determinations for H3 active-duty members | 10 | Richard Wilson, Ph.D. (Richard.H.Wilson@asu.edu or wilsonr1943@gmail.com) | $50 donation |
| Words in Noise | WIN | NU-6 words (female) | Causey six-talker babble | Progressive | Pre-hearing aid fitting; minimize cognitive influences | 2.5 | Richard Wilson, Ph.D. (Richard.H.Wilson@asu.edu or wilsonr1943@gmail.com) | $50 donation |
Effects of Signal Type
There is a range of speech signal types, including nonsense syllables, words, sentences,
running monologue, and conversation. Syllables typically increase the number of scoring
opportunities, allowing for less variability and more precise measurements. Syllable-
or word-level tests may also give the clinician information about specific phoneme
errors while reducing top-down processing effects (Billings et al. 2016). Within each
broad type, there are also many other potential considerations to account for, such
as the properties of the target talker (e.g., pitch, speed, or dialect characteristics),
the length and familiarity of the target speech, the amount of contextual information,
and so on. Different signal types will require contributions from different levels
and processes of the auditory system. In some cases, audiologists may wish to limit
higher-order contributions such as cognitive processes (e.g., working memory, cognitive
processing speed, sustained attention) by administering syllable- or word-level tests
such as the WIN. At other times, it may be important to use signal types that allow
for more cognitive contribution as with sentence-level tests such as the QuickSIN.
The assumption is that speech-in-noise perception with simpler signals, such as syllables
or words, will be driven by processes at the peripheral end of the auditory system
(Wilson & McArdle 2005).
Another important consideration is set size and whether the answer set is open or
closed. A closed set refers to a set of responses that is finite and often known by
the patient. Alternatively, an open set test has an unlimited number of possible responses.
For example, the CRM uses 32 specific color-number pairs, resulting in a closed set
of answers that will be inherently easier than an open set task with words or sentences,
all else being equal. That said, all else may not be equal; a closed-set test presented at a small/poor SNR, for example, might result
in worse performance than an open-set task presented at a large/good SNR. It should
be kept in mind that even a closed-set test requires cognitive processing by the patient
to take advantage of the limited number of possible responses.
While speech-in-noise tests that use sentences are more reflective of real-world listening
conditions and the use of context, their results are likely to be more affected by
cognitive decline, for similar reasons as in the case of closed sets discussed earlier
(after all, knowledge about the set of possible answers is, like the surrounding words
in a sentence, just another specific kind of contextual information). If an audiologist
is concerned that a patient may be suffering from cognitive decline, it may be best
to choose a test of words unless you hope to account for the cognitive decline as
well (Wilson 2004; Wingfield 1996). In summary, with regard to signal type, clinicians
should consider using sentence-level tests as a means to assess everyday communication
in a functional way and use word or syllable tests to determine more bottom-up factors
involving lower-level auditory function in the presence of noise.
Effects of Noise Type
Many different types of background noise are used in speech-in-noise testing. Noise
types vary in content and complexity across frequency, level, and timing domains (i.e.,
in essentially every way that a sound can vary). To assist in considering the effects
of background noise, the speech-in-noise literature has categorized masking in two
general ways: energetic masking and informational masking. Energetic masking has been
characterized as the overlap of the target and masker in time and frequency in the
cochlea such that portions of the target are inaudible (Brungart 2001a b; Kidd et
al. 2008). Informational masking, in contrast, cannot be explained by interactions
in the auditory periphery and has its origins at higher levels in the auditory system.
For informational masking, uncertainty (a mismatch between what the listener actually
hears and what the listener expects to hear) and similarity (resemblance between the
target and the masker that makes it difficult for the listener to separate the two)
cause increased understanding difficulties above and beyond what would be expected
from the acoustics alone (Durlach et al. 2003). Unsurprisingly, higher-intensity noise
provides more masking, all else being equal. For equal average intensity, the strongest
speech maskers will be those whose other characteristics are most similar to speech
(and, within that, most similar to the voice of the particular speaker they are masking).
Thus, in the real world, speech-on-speech listening is typically the most difficult
listening task.
Background noise becomes a more effective speech masker as it becomes more similar
to speech. Continuous broadband noise, such as white noise, pink noise, or speech-spectrum
noise, becomes more effective in masking as it becomes more similar spectrally and
temporally to the target speech. It is important to keep in mind that continuous noise
types typically do not contain significant amplitude or frequency changes over time
(or, to be more precise, any instantaneous changes average out so as to be inconsequential
on timescales relevant for human hearing); therefore, continuous maskers are less
effective than maskers that vary over time similarly to target speech. Speech maskers
with fewer than four talkers have fluctuations similar to speech targets and result
in increased uncertainty for the patient and therefore increased difficulty. As the
number of talkers increases beyond four, amplitude and frequency fluctuations begin
to overlap across talkers, resulting in a more continuous noise and a gradual
shift from more informational masking to more energetic masking (Kidd et al. 2008).
Most clinical speech-in-noise tests use speech as the background noise as a means
to reflect real-world listening situations. Such a speech-on-speech listening condition
reflects the more difficult speech-in-noise situations that listeners will experience
in everyday life. When the background noise is speech, both energetic and informational
masking are expected to occur, with the balance of masking tending toward more energetic
masking as the number of talkers increases beyond four. Audiologists should consider
carefully what signal and noise type they are using and determine if that matches
their purpose for performing speech-in-noise testing.
Other Considerations
Several other factors may influence the choice of which speech-in-noise test to use.
Test time is a critical factor in most clinics as appointment lengths are limited.
It may be that a speech-in-noise test could take the place of other more traditional
tests that may not be needed. Wilson & McArdle (2005) have advocated for the possibility
of replacing words-in-quiet testing with words-in-noise testing. It is noteworthy
that most speech-in-noise tests take < 15 minutes to administer. High on the list
of reasons to complete speech-in-noise testing would be the ability to counsel and
instruct the patient using stimuli and conditions that have good face validity and
match the difficulties that most patients face in their everyday lives. Test results
may also be effective at helping patients set realistic expectations for intervention
outcomes.
Speech-in-noise tests may help with the diagnostic process by differentiating, as
per the Plomp (1978) model, between attenuation-based hearing losses and distortion-based
hearing losses. Such a distinction could inform intervention strategies. For example,
those who have a distortion-based hearing loss, or greater SNR loss, would benefit
from use of tight-beam directionality, remote microphones, or accommodations like
preferential seating (Etymotic Research 2006). Furthermore, counseling about the importance
of lip reading and ensuring that visual cues are available would be helpful. In contrast,
those who have an attenuation-based hearing loss, or no SNR loss, may benefit substantially
from a straightforward amplification approach to enhance frequencies that were previously
inaudible.
Special Populations
Despite recommendations and support for speech-in-noise testing in the scientific
literature, the prevalence of regular speech-in-noise testing by hearing professionals
remains below 50% overall (ASHA 2019; Mueller 2010; Strom 2006; Clark et al. 2017).
However, it is important to note that speech-in-noise testing is more frequently used
in specific populations. We will highlight three of these populations: cochlear implant
users, hearing aid users, and those who have listening difficulties beyond what would
be expected based on audibility thresholds alone.
Cochlear-Implant Users
Speech-in-noise testing has more recently become an integral part of both cochlear-implant
candidacy evaluations and post-implantation monitoring. Cochlear-implant candidacy
guidelines are based on an individual's degree of hearing loss as well as aided speech-perception
scores. These results are necessary to document that an individual is not receiving
sufficient benefit from acoustic amplification alone and, therefore, may obtain more
benefit from a cochlear implant. The Food and Drug Administration (FDA)-approved candidacy
indications specify, for each manufacturer, the aided speech-perception score on a test
of open-set sentence recognition that is required to be considered for cochlear implantation.
Early candidacy indications specified the use of the HINT; however, clinicians generally
completed the evaluations in quiet rather than performing the test in the adaptive
SNR format as originally designed. Therefore, testing was not representative of listening
difficulties in real-world environments. Research later identified that the HINT was
prone to ceiling effects when testing was completed only in quiet to assess pre-implant
versus post-implant benefit. Updated recommendations based on the revised Minimum
Speech Test Battery (MSTB 2011) suggest use of the AzBio sentence test in place of
the HINT (Auditory Potential LLC, 2022; Gifford et al. 2008). Additionally, it was
recommended that testing be completed both in quiet and in noise to better document
an individual's performance in a more realistic range of listening scenarios (Auditory
Potential 2022).
The revised MSTB (2011) now serves as a guideline for audiologists completing adult
candidacy evaluations. Aided speech-perception testing generally includes CNC words,
AzBio sentences in quiet, AzBio sentences in noise, and the BKB-SIN (the AzBio sentences
provide a percent correct score at a fixed level, whereas BKB-SIN provides a SNR50
score). A presentation level of 60 dB SPL is recommended for all speech material;
however, when administering the AzBio in noise, the clinic may choose their own SNR
based on that clinic's candidacy evaluation protocol. Recent surveys by Prentiss et
al. (2020) and Carlson et al. (2018) found that 68% and 89% of respondents, respectively,
routinely perform speech-in-noise testing during cochlear-implant candidacy evaluations
at either a +5 or +10 dB SNR. The majority of remaining respondents indicated that
speech-in-noise testing was performed on a selective basis only if patient scores
were considered “borderline” after testing was completed in quiet. The same studies
also inquired as to the routine use of the BKB-SIN during adult candidacy evaluations,
and they found that only 32% and 9% of respondents, respectively, include the BKB-SIN as an additional
test in their protocol. Limited use of the BKB-SIN during adult candidacy evaluations
may be related to time constraints and current FDA criteria, which do not base any
requirements on SNR50 or any other SNR score.
Verification of benefit following implantation generally includes monitoring and comparing
pre-implantation speech perception scores to post-implantation scores. Although research
suggests that there is still variability in cochlear-implant users' performance, significant
improvements in speech perception scores have been documented (Sladen et al. 2017).
For new cochlear-implant users, the majority of aided speech-perception testing may
be completed in quiet as this alone can be a difficult task for many patients. However,
as the user adapts to their cochlear implant and their speech understanding improves,
then speech-in-noise testing can be beneficial to verify benefit in a more realistic
listening scenario.
Hearing-Aid Users
As with cochlear-implant users, hearing-aid users can benefit from speech-in-noise
testing to demonstrate benefit. Three quarters of a century ago, Carhart (1946) and
Davis et al. (1946) suggested that speech-in-noise testing could be very helpful as
a pre-fitting measure to help determine candidacy for hearing aids. Mueller (2003)
suggests several reasons for including a speech-in-noise measure as part of the pre-fitting
testing: (1) help with selecting technology (e.g., more aggressive noise reduction
for those with speech-in-noise difficulties), (2) determine frequencies to amplify
(e.g., less low-frequency gain in noisy environments), and (3) use for counseling
and setting realistic expectations. In fact, some of the currently available tests
were designed specifically for pre-fitting hearing-aid testing. One of these is the
Acceptable Noise Level (ANL) test, which seeks to determine the highest level of noise
that is acceptable to the patient. Patients who can tolerate higher noise levels (i.e.,
ANLs of < 7 dB) are typically more successful with hearing aids as compared with those
who cannot tolerate higher noise levels (i.e., ANLs ≥ 8 dB); those with ANLs > 13 dB,
on the other hand, may not be good candidates for hearing aids and perhaps should
be encouraged to pursue hearing assistive technology systems (HATS) or may need additional
counseling before and during hearing aid use (Nabelek et al. 2006). Recently, Davidson
et al. (2021) completed a systematic review of the relationship between various pre-fitting
measures (e.g., speech recognition in quiet, speech recognition in noise, subjective
ratings, and dichotic speech tests) and hearing aid satisfaction, and they concluded
that speech-in-noise tests had the highest association with hearing aid satisfaction.
Speech-in-noise testing can also be used in the post-fitting process to help patients
understand more about their own performance in noise. The Performance-Perceptual Test
(PPT) was designed as a post-fit measure to determine if and how patients misjudge
their speech-in-noise performance. By comparing their actual performance with how
they think they perform, a clinician is equipped to help the patient recalibrate and
set expectations appropriately through counseling with the goal of improving hearing
aid use among those reporting little benefit.
Ceiling effects in speech-in-noise testing are important to consider with hearing-aid
users and other groups that might experience a wide range of performance variability.
When fixed SNRs are used that are relatively easy for the patient, performance will
bump up against the ceiling of 100%. As a result, it will be difficult to differentiate
performance between conditions or tests. Therefore, a clinician may need to use lower
SNRs or use an adaptive test, such as LiSN-S or HINT, or a progressive measure that
tests a range of SNRs, such as WIN or QuickSIN.
Individuals with Listening Difficulties
Speech-in-noise testing is especially important for individuals who have listening
difficulties that are beyond what is suggested by their audiogram. Unexpected difficulties
in complex listening environments could include challenges with background noise or
situations with rapid, reverberant, or otherwise degraded speech (Middelweerd et al.
1990; AAA 2010). Patients with normal-hearing thresholds and listening difficulties
have been referenced in many ways in the literature including auditory inferiority
complex (Byrne & Kerr 1987), auditory disability with normal hearing (Rendell & Stephens
1988), selective dysacusis (Narula & Mason 1988), obscure auditory dysfunction (Saunders
& Haggard 1989), King–Kopetzky syndrome (Kopetzky 1948; King 1954; Hinchcliffe 1992),
auditory dysacusis (Jayaram et al. 1992), and idiopathic discriminatory dysfunction
(Rappaport et al. 1993). One common characteristic of these patients is greater-than-normal
listening difficulties even with a normal audiogram. Reports from these patients show
that a diagnosis of normal hearing combined with a lack of treatment recommendations
may result in feelings of dismissal and confusion (Pryce & Wainwright 2008).
Currently, many audiologists may seek to diagnose patients who have listening difficulties
in the presence of normal pure-tone hearing with what has been called APD: auditory
processing disorder or deficit (ASHA 2005; AAA 2010). While APD's status as a distinct
condition remains controversial and there are well-documented challenges in assessing
the accuracy of such a diagnosis (see Vermiglio 2018a, 2018b, and Chermak et al. 2018,
for a review), there is broad agreement that the listening difficulties reported by
individuals who test within normal limits on traditional audiologic assessments constitute
a very real phenomenon. According to ASHA (2005) and AAA (2010), at least two domains
of APD testing may include speech in the presence of background noise: monaural low
redundancy and binaural interaction. Monaural low redundancy refers to signals presented
to one ear that are degraded in some way to reduce the natural redundancy in speech,
such as filtering a signal or adding background noise to a signal, and binaural interaction
refers to using inputs to both ears (i.e., dichotic) to localize or lateralize sounds
(AAA 2010; ASHA 2005). The tests presented in Appendix A may be especially useful with this population, that is, those with normal or near-normal
pure-tone thresholds and good speech understanding in quiet. For example, the LiSN-S
(often categorized as a test of binaural interaction) was created to identify individuals
with a spatial processing disorder. For these individuals, the LiSN-S has been used
successfully to diagnose and monitor treatment (Brown et al. 2010; Cameron et al.
2012).
Methods
To illustrate differences in speech-in-noise performance across tests and groups of
individuals, we tested individuals in a repeated-measures cross-sectional experiment.
We used the QuickSIN (Killion et al. 2004), Words-in-Noise (WIN; Wilson & Burks 2005),
Coordinate Response Measure (CRM; Bolia et al. 2000), and Listening in Spatialized
Noise—Sentences (LiSN-S; Cameron & Dillon 2007) tests.
Participants
Participants were 30 individuals separated into three groups based on age and hearing
ability. The groups were 10 younger normal-hearing individuals (YNH) with a mean age
of 27.1 years, SD = 7.2; 10 older normal-hearing individuals (ONH) with a mean age
of 67.2 years, SD = 5.1; and 10 older hearing-impaired individuals (OHI) with a mean
age of 68.8 years, SD = 5.9. Each of the three groups contained six female and four
male participants. A two-sample t-test found that the difference in mean age between the ONH and OHI
groups was not statistically significant (t(18) = − 0.645; p = 0.527). Pure-tone hearing thresholds of YNH and ONH participants were below 25 dB
HL up to 4,000 Hz bilaterally. It should be noted that individuals in these “normal-hearing”
groups may indeed have hearing deficits; we wish to generally categorize only the
participant groups according to their pure-tone thresholds. OHI participants had approximately
symmetrical mild-to-moderate sloping sensorineural hearing loss bilaterally. A PTA
(pure-tone average of thresholds at 500, 1,000, and 2,000 Hz) was calculated for each
participant, with mean PTA and standard deviations calculated for each group (YNH:
6.0 ± 4.4 dB; ONH: 7.9 ± 4.9 dB; OHI: 32.3 ± 7.7 dB); according to a two-sample t-test, the mean PTA difference between the two normal-hearing groups was not statistically significant
(t(18) = 0.95; p = 0.355). This research was completed with the approval of the local institutional
review board and with the informed consent of all participants.
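The group comparisons above rely on standard two-sample t-tests. A minimal sketch of such a comparison in Python, using hypothetical ages rather than the study's actual participant data:

```python
from scipy.stats import ttest_ind

# Hypothetical ages (years) for two groups of 10 -- illustrative only,
# not the study's actual participant data.
onh_ages = [61, 69, 65, 72, 67, 70, 63, 68, 66, 71]   # mean 67.2
ohi_ages = [64, 70, 66, 75, 69, 72, 62, 71, 65, 74]   # mean 68.8

# Two-sample t-test on the group means (equal variances assumed)
t_stat, p_value = ttest_ind(onh_ages, ohi_ages)
# A p-value above 0.05 indicates no significant age difference between groups.
```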
Materials and Outcome Measures
The QuickSIN speech-in-noise test uses target sentences that originate from the IEEE
corpus (IEEE 1969). Only five “key words” from each sentence are scored; the rest
of the words do not affect the score. The target sentences were presented at 70 dB
HL with four-talker babble in the background. The level of the target speech remains
constant, while the background babble increases by 5 dB after each sentence. This
produces SNRs that range from +25 to 0 dB, with one sentence per SNR.
The WIN uses target words from the NU-6 corpus (35 per trial), with each word being
scored completely correct or incorrect. The target words were presented at a starting
level of 84 dB HL with six-talker babble in the background at 60 dB HL. The signal
level decreases by 4 dB after each five-word block, while the level of the background
babble remains constant. This produces SNRs that range from +24 to 0 dB.
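Both the QuickSIN and the WIN present a fixed number of scored items at each step of a descending SNR series, so a 50% point can be estimated from the total number of items correct using the Spearman–Kärber equation. The sketch below applies that general formula to the two parameterizations described above; it illustrates the arithmetic, not necessarily the exact scoring rule in each test's manual:

```python
def spearman_karber_snr50(highest_snr_db, step_db, items_per_snr, total_correct):
    """Estimate the 50%-correct SNR for a descending-SNR test.

    Spearman-Karber shortcut: start half a step above the highest SNR,
    then credit one full step for every items_per_snr items correct.
    """
    return highest_snr_db + step_db / 2 - step_db * (total_correct / items_per_snr)

# WIN as described above: SNRs from +24 to 0 dB in 4-dB steps, 5 words per step.
win_snr50 = spearman_karber_snr50(24, 4, 5, total_correct=25)       # 26 - 0.8 * 25 = 6.0 dB

# QuickSIN as described above: SNRs from +25 to 0 dB in 5-dB steps, 5 key words each.
quicksin_snr50 = spearman_karber_snr50(25, 5, 5, total_correct=22)  # 27.5 - 22 = 5.5 dB
```

With these parameters the formula reduces to 26 − 0.8 × (words correct) for the WIN and 27.5 − (key words correct) for the QuickSIN.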
The CRM employs target sentences that follow a stereotyped, fill-in-the-blank format:
“Ready [call sign], go to [color] [number] now” (e.g., “Ready Charlie, go to blue
eight now”). The participant is asked to select the color/number combination (i.e.,
the “coordinates”) associated with their assigned call sign (in our implementation,
this was always “Charlie”) by using a computer touchscreen with a 4 × 8 grid of colors
(red, blue, green, and white) and numbers (1–8). There is no single definitive version
of this test, as even publicly released versions tend to be highly customizable by
the user, and our version was run directly through custom MATLAB software. In our
implementation of the CRM, the target signal was presented at 40 dB SL (re: spondee
threshold); the signal level was held constant while the background noise was gradually
increased over the course of each run, producing SNRs ranging from +9 to −21 dB in
2-dB steps. We used three different types of background noise in separate runs to
obtain a specific performance estimate in each noise type for each participant; these
noise types were one-talker modulated (1TM), four-talker babble (4TB), and speech-shaped
continuous (SSC). The SSC noise was created using the long-term average speech spectrum
(LTASS) of the IEEE sentence corpus created previously (Billings et al. 2011) to spectrally
shape continuous noise. The 1TM noise used this same LTASS but with the envelope modulated
to mimic 10 concatenated IEEE sentences.
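The speech-shaped noise described above is created by imposing a speech-like long-term spectrum on random noise. A minimal numpy sketch of that spectral-shaping step, using a hypothetical low-pass rolloff in place of the measured IEEE LTASS:

```python
import numpy as np

def shape_noise_to_spectrum(n_samples, target_mag, seed=None):
    """Spectrally shape noise: keep the random phase of Gaussian noise
    but impose a target magnitude spectrum (e.g., a measured LTASS).

    target_mag must give one magnitude per rfft bin (n_samples // 2 + 1).
    """
    rng = np.random.default_rng(seed)
    white = rng.standard_normal(n_samples)
    spectrum = np.fft.rfft(white)
    phase = spectrum / np.maximum(np.abs(spectrum), 1e-12)  # unit-magnitude phases
    shaped = np.fft.irfft(phase * target_mag, n=n_samples)
    return shaped / np.max(np.abs(shaped))  # peak-normalize to +/-1

# Toy target: a low-pass rolloff standing in for a measured speech spectrum.
n = 16384
freqs = np.fft.rfftfreq(n, d=1 / 16000)          # 16-kHz sampling assumed
toy_ltass = 1.0 / (1.0 + (freqs / 500.0) ** 2)   # rolls off above ~500 Hz
noise = shape_noise_to_spectrum(n, toy_ltass, seed=0)
```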
The LiSN-S target sentences were presented at 62 dB SPL, with each word scored as
correct or incorrect. The distractor stimuli were children's stories initially presented
at 55 dB SPL. The level of the target sentences remained constant while the level
of the story changed according to an adaptive bracketing algorithm, performed automatically
by the software, designed to pinpoint the SNR50 (see the following paragraph).
The test was administered in four different conditions according to
two parameters: (1) whether the target and distractor stimuli are presented by the
same voice (SV) or two different voices (DV) and (2) whether the stimuli are presented
with target and distractors both at 0-degree azimuth or with the distractors offset
from the target by 90 degrees. The four conditions (DV90, SV90, DV0, and SV0)
correspond to the four combinations of these two parameters.
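The LiSN-S software performs its adaptive bracketing automatically, and its exact rules are internal to the test. As a loose illustration of how an adaptive track can converge on the SNR50, here is a generic one-down, one-up staircase (each correct response makes the next trial harder, each error makes it easier, so the track oscillates around the 50% point):

```python
def simple_staircase_snr50(respond, start_snr_db=4.0, step_db=2.0, n_trials=30):
    """Generic one-down, one-up adaptive track converging on the SNR50.

    respond(snr) -> True if the listener scored above 50% at that SNR.
    The estimate is the mean SNR at the reversal points (first reversal
    discarded). An illustration only, not the LiSN-S's actual algorithm.
    """
    snr = start_snr_db
    last_correct = None
    reversals = []
    for _ in range(n_trials):
        correct = respond(snr)
        if last_correct is not None and correct != last_correct:
            reversals.append(snr)      # direction changed: record a reversal
        last_correct = correct
        snr = snr - step_db if correct else snr + step_db
    if not reversals:
        return snr
    points = reversals[1:] or reversals
    return sum(points) / len(points)

# A deterministic "listener" whose true 50% point lies at -5 dB SNR:
estimate = simple_staircase_snr50(lambda s: s > -5)   # converges to -5.0
```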
A single scalar metric, the SNR50 (the SNR threshold associated with a 50% correct
rate), was used to quantify behavioral performance on each of the measures described
earlier. To calculate each participant's SNR50 for the QuickSIN, CRM, and WIN, we
first employed the Palamedes Toolbox (Prins & Kingdom 2018) in MATLAB to estimate
each participant's psychometric function for each measure by fitting a four-parameter
logistic curve (@PAL_Logistic) to each participant's performance data using an iterative
maximum-likelihood optimization algorithm (@PAL_PFML_Fit). The SNR50 was then calculated
as the point at which the fitted psychometric function intersected the 50% correct
line. For the LiSN-S, on the other hand, we simply used the bracketed SNR50 reported
by the software, since the LiSN-S is an adaptive SNR50-specific test that does not
attempt to sample the rest of the psychometric function.
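A comparable fit can be sketched outside MATLAB. The example below uses scipy rather than Palamedes, and least-squares on proportions rather than the maximum-likelihood fit used in the study, but the logic is the same: fit a four-parameter logistic to performance-by-SNR data, then solve for the 50% crossing:

```python
import numpy as np
from scipy.optimize import curve_fit, brentq

def logistic4(snr, threshold, slope, guess, lapse):
    """Four-parameter logistic psychometric function."""
    return guess + (1 - guess - lapse) / (1 + np.exp(-slope * (snr - threshold)))

def fit_snr50(snrs, prop_correct):
    """Fit the logistic, then solve for the 50%-correct crossing."""
    p0 = [float(np.median(snrs)), 1.0, 0.0, 0.0]
    bounds = ([min(snrs) - 10, 0.01, 0.0, 0.0], [max(snrs) + 10, 10.0, 0.5, 0.5])
    params, _ = curve_fit(logistic4, snrs, prop_correct, p0=p0, bounds=bounds)
    return brentq(lambda s: logistic4(s, *params) - 0.5,
                  min(snrs) - 20, max(snrs) + 20)

# Example: proportions generated from a known function (threshold = 5 dB).
snrs = np.array([0.0, 5.0, 10.0, 15.0, 20.0, 25.0])
observed = logistic4(snrs, 5.0, 1.0, 0.0, 0.0)
snr50 = fit_snr50(snrs, observed)   # recovers a value near 5 dB
```

For a closed-set test like the CRM, the `guess` parameter would be anchored near the chance rate rather than zero.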
Analyses
Three sets of repeated measures analyses of variance (RM-ANOVAs) were completed to
test the following effects for statistical significance: (1) the effects of noise
type using the CRM dataset, (2) the effects of group across all four tests, and (3)
the effects of talker and azimuth as a function of group using the LiSN-S dataset.
Results
[Table 2] presents means and standard deviations for all tests and conditions as a function
of group. In general, and as expected, the YNH group demonstrated the best performance
across tests and conditions followed by the ONH and then OHI groups. The four tests
that were used provide the opportunity to explore the effects of noise type and, to
some extent, signal type as a function of group.
Table 2
Mean SNR50 speech-in-noise test scores (dB) for the YNH, ONH, and OHI groups; standard deviations in parentheses

| Group | WIN | QuickSIN | LiSN DV90 | LiSN SV90 | LiSN DV0 | LiSN SV0 | CRM 1TM | CRM 4TB | CRM SSC |
|-------|-----|----------|-----------|-----------|----------|----------|---------|---------|---------|
| YNH | 4.83 (1.31) | 2.17 (1.05) | −15.53 (1.47) | −13.34 (2.95) | −11.00 (1.97) | −1.81 (0.46) | −20.66 (1.67) | −4.49 (2.25) | −8.51 (1.30) |
| ONH | 8.34 (1.77) | 3.61 (1.24) | −11.84 (1.74) | −9.72 (2.20) | −8.27 (2.24) | −0.53 (1.01) | −19.63 (2.61) | −2.50 (1.59) | −6.73 (2.06) |
| OHI | 11.09 (2.44) | 4.83 (1.69) | −3.65 (4.16) | −2.46 (3.29) | −1.81 (2.93) | 2.08 (1.79) | −12.27 (3.13) | −1.43 (1.61) | −4.00 (2.13) |

SD, standard deviation; dB, decibel; YNH, young normal hearing; ONH, old normal hearing;
OHI, old hearing impaired
Effects of Noise Type and Group
As discussed in the introduction, noise type can have an important effect on performance.
The CRM test used in this study provided a way to directly explore the effects of
noise type. [Fig. 3] shows results from the CRM test for the three groups and three noise types. The
top portion of the figure shows the psychometric functions, demonstrating the effects
of noise type across a range of SNRs. Participants had the most difficulty (highest
SNR50s) with 4TB followed by SSC. Participants performed the best (had the lowest
SNR50s) in 1TM noise. Such a pattern would be expected given the high degree of spectrotemporal
similarity between the babble and the signal leading to poorer performance, and the
gaps present in the modulated noise leading to better performance. The 3 × 3 RM-ANOVA
found the effect of Noise Type to be significant (F = 548.1; df = 2, 27; p < 0.0001) as well as the between-subjects main effect of Group (F = 25.99; df = 2, 27; p < 0.0001). As seen in [Fig. 3] (bottom), the YNH group performed the best on average, followed by the ONH group,
with the OHI group performing the worst. In addition to the aforementioned main effects,
the Noise Type × Group interaction was also found to be significant (F = 10.16; df = 4, 54; p < 0.0001). [Fig. 3] (top) reveals that this interaction was likely driven by OHI performance in 1TM
noise; the OHI group showed a smaller improvement in 1TM, relative to the other noise
types, than the YNH and ONH groups. In other words, the OHI individuals were not able
to take advantage of the gaps in the 1TM noise to the same degree as the normal-hearing
groups.
Figure 3 Psychometric functions (top) and SNR50 values (bottom) for CRM testing as a function
of Noise Type and Group. Error bars are present for SNR50s but are small compared
with the symbol size (all standard errors were less than 1 dB). SSC, speech-shaped
continuous noise; 4TB, four-talker babble noise; 1TM, one-talker-modulated noise;
YNH, young normal hearing; ONH, older normal hearing; OHI, older hearing impaired.
[Fig. 4] shows the test results for the four tests that were completed as a function of group.
For the tests with multiple conditions (CRM and LiSN-S), conditions were selected
that were most comparable to the WIN and QuickSIN paradigms—namely, the CRM's 4TB
condition (since WIN and QuickSIN both use multitalker babble as their maskers), and
the LiSN-S's DV0 condition (since the other tests involved neither spatial separation
nor a masker matching the target talker). One-way ANOVAs for each of the outcome measures
displayed were completed to characterize the Group effect on each measure. In all
cases, the effect of Group was found to be significant (CRM-4TB: F = 6.39, df = 2, 27, p = 0.0059; WIN: F = 23.1, df = 2, 27, p < 0.0001; QuickSIN: F = 8.29, df = 2, 27, p = 0.0018; LiSN-DV0: F = 34.01, df = 2, 27, p < 0.0001), demonstrating that Group was an important factor contributing to performance
across the four tests. Generally, YNH individuals performed the best followed by ONH
individuals and then the OHI individuals. It is important to note that there was overlap
between groups for most of the tests.
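A one-way Group ANOVA of the kind reported above can be reproduced with scipy's f_oneway; the SNR50 values below are hypothetical, chosen only to illustrate the group structure:

```python
from scipy.stats import f_oneway

# Hypothetical SNR50 scores (dB) for three groups on one test -- illustrative
# values only, not the study's data.
ynh = [2.1, 1.5, 3.0, 2.4, 1.9]
onh = [3.8, 4.2, 3.1, 4.6, 3.5]
ohi = [5.2, 4.4, 6.1, 4.9, 5.7]

f_stat, p_value = f_oneway(ynh, onh, ohi)   # one-way Group effect
```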
Figure 4 SNR50 values for the four different speech-in-noise tests as a function of participant
group. The overall difficulty posed by each test is reflected by the “center of gravity”
of each cluster, with WIN and QuickSIN resulting in the worst SNR50 values and the
CRM and LiSN resulting in the best SNR50 values. Variability within a test is also demonstrated
by the spread of SNR50 values within and across participant groups. The separation
between groups was most pronounced for the LiSN.
Effect of Signal Type
The effect of signal type is somewhat apparent in [Fig. 4], although only qualitatively, given that noise types and conditions are not equivalent
across tests. Even so, participants performed most poorly (i.e.,
had the highest SNR50s) on the WIN, an open-set word test with no syntactic or semantic
context clues. Next in difficulty was the QuickSIN, also an open-set test. It is important
to note that because the QuickSIN signals are sentences, the participants are able
to use context cues (especially syntactic context clues) to help them understand and
repeat words that may not otherwise have been recognized. Generally, the participants
found the CRM to be an easier test, which is not surprising given that the CRM is
a closed-set test with a limited number of colors and numbers to choose from. Finally,
the easiest test was the LiSN-S in the different voice, 0-degree condition. In this
case, the signal was again sentences, like the QuickSIN; however, the LiSN-S sentences
may have more context clues than IEEE sentences. Perhaps more importantly, the noise
type was only a single talker, which may make it easier to focus on the target
signal than in the multitalker babble of the QuickSIN.
Effects of Spatial Processing and Voice
[Fig. 5] shows the SNR50 results for the LiSN-S test, which varies talker voice (masker same
vs. different from target) and azimuth (0 vs. 90 degrees of signal-masker separation).
A 2 × 2 × 3 repeated-measures ANOVA was completed to determine the effects of Voice,
Azimuth, and Group. Main effects of Voice (F = 314.2; df = 1, 18; p < 0.0001), Azimuth (F = 317.5; df = 1, 18; p < 0.0001), and Group (F = 43.96; df = 2, 27; p < 0.0001) were found to be statistically significant. In addition, two-way interactions
between Voice and Group (F = 15.22; df = 2, 45; p < 0.0001), Azimuth and Group (F = 41.27; df = 2, 45; p < 0.0001), and Voice and Azimuth (F = 32.60; df = 2, 36; p < 0.0001) were also found to be statistically significant. The three-way interaction
was not found to be significant (F = 1.061; df = 2, 81; p = 0.361). From [Fig. 5], it is likely that the same voice, 0-degree condition (SV0) played an important
role in the two-way interactions, demonstrating poorer SNR50s than the other conditions
and more overlap across groups.
Figure 5 SNR50 values for LiSN-S test results plotted as a function of Group and Condition.
Notice the larger spread of performance for the older hearing-impaired group. Effects
of spatial separation and talker are also apparent.
Discussion and Conclusion
Speech-in-noise testing has been proposed as part of the audiological test battery
since the inception of the field after World War II (Carhart 1946; Davis et al. 1946). With
the development and release of several commercially available tests in the last two
decades, speech-in-noise testing has increased (ASHA 2019). However, its use still
lags well behind the prevalence of difficulty hearing in noise.
Survey data obtained by the American Speech-Language-Hearing
Association show that only about a third of audiologists report using speech-in-noise
testing (ASHA 2019); however, more than 90% of patients report at least some difficulty
hearing speech in noise (Kochkin 2010). The purpose of this article was to provide
a resource for hearing health professionals seeking to learn more about speech-in-noise
testing and to provide some data that illustrate some of the main considerations that
are important when selecting tests to use.
Benefits of Testing
There are many advantages to including speech-in-noise testing in hearing healthcare.
Perhaps foremost among these advantages is the face validity of using a test that
corresponds to one of the primary complaints of those seeking audiological care—that
of understanding speech in background noise. The field of audiology has focused almost
exclusively on performance near threshold or in quiet, likely because
treatment options have historically been mostly limited to increasing levels through
basic amplification strategies. Certainly, the most important treatment for most individuals
with hearing loss is restoring audibility. However, more advanced noise reduction
technologies and specialized HATS are providing opportunities for specialized auditory
care. Therefore, to tailor specialized solutions to patients it will be important
to include specialized testing such as speech-in-noise testing. For example, the QuickSIN
manual currently recommends different treatments for different speech-in-noise test
outcomes (e.g., directional microphones for moderate SNR loss versus FM systems for
severe SNR loss).
A major challenge in hearing care is the wide range of variability that is seen between
individuals, even individuals with very similar performance in quiet or pure-tone
audiometric thresholds. Speech-in-noise testing gives the audiologist a direct measure
of speech understanding in more complex environments that are more like the everyday
environments in which patients experience difficulty. Assessing and treating speech-in-noise
difficulties is currently among the most important challenges being addressed in the
world of auditory research, with the potential to directly impact hearing health care
by improving patient communication in daily life. At the very least, inclusion of
speech-in-noise testing provides audiologists with a strong foundation for patient
counseling and education. This is critical due to the variability of speech-in-noise
test results among even those who have very similar pure-tone thresholds (Wilson 2004).
Setting realistic expectations and helping patients to understand their difficulties
is an important benefit of including speech-in-noise testing as part of routine clinical
protocols. The possible treatments of those difficulties may need to include high-level
noise reduction technology (be it noise reduction in hearing aids or assistive devices
such as remote microphones), aural rehabilitation, focused communication strategies,
family counseling, or any combination of solutions. Speech-in-noise testing along
with case history can help deduce which strategies will be necessary for a particular
patient. For example, sentence-level tests like the QuickSIN and HINT can provide insight
into how patients communicate in a crowded restaurant, while a word test like the WIN can help reduce
the effect of higher-order cognitive deficits on testing.
Barriers to Testing
A discussion of this topic must also address the barriers that have so far prevented
speech-in-noise testing from becoming the norm in most hearing clinics. Barriers like
time and money are of great concern not only for those who rely on hearing-aid sales
for income, but for all audiologists as populations with hearing loss continue to
grow. The cost of training and learning new test procedures may also play a role in
whether clinicians choose to perform speech-in-noise testing. Clinicians may feel
uncomfortable using speech-in-noise testing because they do not have enough training
or feel confident in selecting which test(s) to use. The inclusion of speech-in-noise
testing as a topic of instruction in audiology programs is an important step toward
gaining broader acceptance and use in audiology. For those who are beyond their degree
programs, [Table 1] and Appendix A were created as quick guides for education and selection of appropriate test materials
with the goal of increasing use of speech-in-noise tests clinically.
Appointment length for hearing tests and hearing-aid fittings varies by setting; so,
it is difficult to quantify the impact of adding an additional test to an audiologist's
battery. That said, adding one more test that takes anywhere from 2 to 15 minutes
to conduct can be difficult to fit into a schedule that is often already tight.
However, speech-in-noise testing can be completed
quickly with some currently available tests (e.g., QuickSIN can take < 5 minutes),
and the case has been made that speech-in-noise testing could replace a speech-in-quiet
test (Taylor, 2003; Wilson 2011), which would result in no added test time to the
appointment.
Cost can be a barrier in multiple ways. First, there is no specific billing code for
speech-in-noise testing; it is typically either included in 92557 (comprehensive audiometry)
or 92556 (speech audiometry threshold; with speech recognition) when assessing for
hearing aid candidacy (ASHAc n.d.). Clinicians may find justification for using 92700,
which is used for otorhinolaryngological procedures without a listed code (ASHAc n.d.).
In the case of cochlear-implant candidacy and monitoring of outcome performance, the
clinician may utilize 92626 (evaluation of auditory function for surgically implanted
devices); however, this is a time-based code and may not be widely covered by private
insurers. This code covers the first 60 minutes of evaluation and can only be billed
if at least 31 minutes is spent completing the evaluation of auditory function. Second,
it can be costly (or otherwise difficult) to acquire test materials; for example,
the ANL must be purchased along with Interacoustics equipment, and is not sold separately.
Of course, clinicians could create their own assessments of speech understanding in
noise; however, there is a benefit to using commercially available tests with norms
to which patient performance can be compared.
Addressing these barriers has the potential to significantly increase the use of speech-in-noise
tests. And, thankfully, the use of speech-in-noise tests is increasing; ASHA surveys
of audiologists from 2014 to 2018 showed that the percentage of respondents who used
speech-in-noise testing to validate hearing aids rose from 30 to 34% (ASHA 2019).
Future Directions
More normative data are needed to improve the usefulness of speech-in-noise tests.
Some of the tests that are presented in [Table 1] and Appendix A have limited normative data. It will be important for additional testing and research
to explore effects of aging, hearing impairment, and other conditions so that a patient's
score can be compared with the general population or to subpopulations.
Speech-in-noise testing will likely be more useful in some situations and for some
individuals than for others. Cochlear-implant candidates, hearing-aid users, and individuals
with special listening difficulties are three such examples. Therefore, clinicians
and researchers can benefit from carefully considering for which patients and under
what conditions speech-in-noise testing would be beneficial. Another priority should
be addressing the barrier of cost and lack of reimbursement for testing.
Speech-in-noise testing can be very helpful in audiology, with the potential to improve
and augment auditory assessment and treatment in some situations. It is important
for the clinician to carefully consider the advantages and disadvantages of speech-in-noise
testing in their own particular clinic and setting. In some populations, speech-in-noise
testing is a vital component of the candidacy evaluation or the monitoring of auditory
treatment (e.g., cochlear implantation, auditory processing disorders). Unfortunately,
there is no agreed-upon standard for speech-in-noise testing; instead there are several
tests to choose from with varying amounts of literature and data to support them.
Nonetheless, given that understanding speech in noise is often one of the most difficult
listening situations for patients, it is clear that audiologists who want to tailor
treatment to the needs of individual patients will find speech-in-noise testing to
be an important tool in providing top-quality clinical assessment and treatment.