J Am Acad Audiol 2021; 32(07): 445-463
DOI: 10.1055/s-0041-1730959
Research Article

Waveform Amplitude and Temporal Symmetric/Asymmetric Characteristics of Phoneme and Syllable Segments in the W-1 Spondaic Words Recorded by Four Speakers

Richard H. Wilson and Nancy J. Scherer
Speech and Hearing Sciences, Arizona State University, Tempe, Arizona
Funding This work was supported by the Arizona State University Foundation. The input from two reviewers is greatly appreciated.

Abstract

Background The amplitude and temporal asymmetry of the speech waveform are mostly associated with voiced speech utterances and are evident in recent graphic depictions in the literature. The asymmetries are attributed to the presence and interactions of the major formants characteristic of voicing, with possible contributions from the unidirectional airflow that accompanies speaking.

Purpose This study investigated the amplitude symmetry/asymmetry characteristics (polarity) of speech waveforms, which, to our knowledge, have not been quantified.

Study Sample Thirty-six spondaic words spoken by two male and two female speakers were selected because they are multisyllabic words that provide a reasonable sampling of speech sounds, and because four recordings of them were available that were not related to the topic under study.

Research Design Collectively, the words were segmented into phonemes (vowels [130], diphthongs [77], voiced consonants [258], voiceless consonants [219]), syllables (82), and blends (6). For each segment, the following were analyzed separately for the positive and negative datum points: peak amplitude, percentage of the total segment datum points, root-mean-square (rms) amplitude, and crest factor.
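The per-segment measures listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' in-house spreadsheet routines: samples are taken to be already scaled to ±1, zero-valued samples count toward the segment total but belong to neither polarity, the rms of each polarity is computed over that polarity's points only, and levels are expressed in dB re: full scale (1.0). The function name `polarity_measures` is hypothetical.

```python
import math
from typing import Dict, Sequence


def polarity_measures(segment: Sequence[float]) -> Dict[str, Dict[str, float]]:
    """Peak, percent of points, rms, and crest factor for each polarity.

    Hypothetical sketch; assumptions:
    - samples are scaled to the range -1..+1;
    - zero-valued samples count toward the total but to neither polarity;
    - rms for a polarity uses only that polarity's points;
    - peak and rms levels are in dB re: full scale (1.0).
    """
    total = len(segment)
    groups = {
        "positive": [x for x in segment if x > 0],
        "negative": [-x for x in segment if x < 0],  # magnitudes of negative points
    }
    result: Dict[str, Dict[str, float]] = {}
    for name, mags in groups.items():
        if not mags:
            result[name] = {"peak_db": float("-inf"), "percent": 0.0,
                            "rms_db": float("-inf"), "crest_db": float("nan")}
            continue
        peak = max(mags)
        rms = math.sqrt(sum(m * m for m in mags) / len(mags))
        result[name] = {
            "peak_db": 20 * math.log10(peak),
            "percent": 100.0 * len(mags) / total,
            "rms_db": 20 * math.log10(rms),
            "crest_db": 20 * math.log10(peak / rms),  # crest factor = peak / rms
        }
    return result


if __name__ == "__main__":
    demo = [0.5, 0.25, -0.5, -1.0, 0.0]  # toy segment, not study data
    for polarity, m in polarity_measures(demo).items():
        print(polarity, {k: round(v, 2) for k, v in m.items()})
```

Expressing the crest factor in dB is one common convention; a plain peak/rms ratio would serve equally well.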

Data Collection and Analyses The digitized words (44,100 samples/s; 16-bit) were parsed into 144 files (36 words × 4 speakers), edited, transcribed to numeric values (±1), and stored in a spreadsheet in which all analyses were performed with in-house routines. Overall, approximately 85% of each waveform was analyzed; the excluded portions comprised parts of silent intervals, transitions, and diminished waveform endings.
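The transcription step described above, from 16-bit integers to values within ±1, and the share of each waveform retained for analysis could look like the following sketch. It uses only the Python standard library; the file layout (mono, 16-bit PCM WAV), the divide-by-32,768 scaling convention, the segment boundaries, and the function names are assumptions for illustration, not details taken from the study.

```python
import array
import io
import sys
import wave
from typing import List, Sequence, Tuple


def read_wav_normalized(source) -> Tuple[int, List[float]]:
    """Read a mono 16-bit PCM WAV file and scale its samples to -1..+1.

    `source` may be a filename or a binary file-like object. Dividing by
    32768 (an assumed convention) maps the int16 range [-32768, 32767]
    onto [-1.0, 1.0).
    """
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected mono 16-bit PCM")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = array.array("h", raw)  # signed 16-bit integers
    if sys.byteorder == "big":  # WAV sample data are little-endian
        samples.byteswap()
    return rate, [s / 32768.0 for s in samples]


def analyzed_fraction(n_samples: int, segments: Sequence[Tuple[int, int]]) -> float:
    """Fraction of a waveform covered by non-overlapping [start, end) sample ranges."""
    return sum(end - start for start, end in segments) / n_samples


if __name__ == "__main__":
    # Build a tiny in-memory WAV (4 samples at 44,100 samples/s) to exercise the reader.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(44100)
        # Little-endian int16 values 0, 16384, -32768, 32767.
        w.writeframes(b"\x00\x00\x00\x40\x00\x80\xff\x7f")
    buf.seek(0)
    rate, x = read_wav_normalized(buf)
    print(rate, x)
    # Hypothetical segment boundaries covering 850 of 1,000 samples.
    print(analyzed_fraction(1000, [(50, 400), (450, 950)]))
```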

Results The vowel, diphthong, and syllable segments had durations (180–220 ms) about twice as long as the consonant durations (∼90 ms), and peak and rms amplitudes 6 to 12 dB higher than those of the consonants. Vowel, diphthong, and syllable segments had 10 percentage points more positive datum points (55%) than negative points (45%), which suggested temporal asymmetries within the segments. With voiced consonants, the split between positive and negative datum points narrowed to 52 and 48%, and with voiceless consonants it was essentially equal (50.3 and 49.6%). The mean rms amplitudes of the negative datum points were higher than those of the positive points by 2 dB (vowels, diphthongs, and syllables), 1 dB (voiced consonants), and 0.1 dB (voiceless consonants). The 144 waveforms and segmentations are illustrated in the Supplementary Material along with the tabulated positive and negative segment characteristics.
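The decibel differences in the results are level ratios. Assuming the usual amplitude convention (20 log10 of an rms or peak ratio), the sketch below converts them to linear amplitude ratios: a 2-dB advantage for the negative rms corresponds to an amplitude ratio of about 1.26, and 6 to 12 dB to roughly 2 to 4 times. It also separates the 10-percentage-point gap between 55% and 45% from the corresponding ratio of about 1.22. The function names are hypothetical.

```python
import math


def level_difference_db(a: float, b: float) -> float:
    """Level of amplitude a re: amplitude b, in dB (20 log10 convention)."""
    return 20.0 * math.log10(a / b)


def db_to_amplitude_ratio(db: float) -> float:
    """Linear amplitude ratio implied by a level difference in dB."""
    return 10.0 ** (db / 20.0)


if __name__ == "__main__":
    reported = [
        ("negative re: positive rms, vowels/diphthongs/syllables", 2.0),
        ("negative re: positive rms, voiced consonants", 1.0),
        ("vowel/diphthong/syllable re: consonant, lower bound", 6.0),
        ("vowel/diphthong/syllable re: consonant, upper bound", 12.0),
    ]
    for label, db in reported:
        print(f"{label}: {db:4.1f} dB -> amplitude ratio {db_to_amplitude_ratio(db):.2f}")
    # 55% vs 45% positive/negative points: a percentage-point gap, not a 10% ratio.
    print(f"gap = {55 - 45} percentage points; ratio = {55 / 45:.2f}")
```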

Conclusions The temporal and amplitude waveform asymmetries were by far the most notable in segments that had a voicing component, including the voiced consonants. These asymmetries were characterized by larger envelopes and more energy on the negative side of the waveform segment than on the positive side. Interestingly, these same segments had more positive datum points than negative points, which indicated temporal asymmetry. For the voiceless consonants, all measures were essentially equally divided between the positive and negative domains. There were female/male differences, but with these limited samples such differences should not be generalized beyond the speakers in this study. The influence of the temporal and amplitude asymmetries on monaural word-recognition performance is thought to be negligible.
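The pairing of larger negative excursions with a majority of positive datum points is what one would expect of any waveform whose mean is approximately zero: if the negative lobes are deeper, they must also be narrower, so more of the points fall on the positive side. The sketch below illustrates this with a hypothetical synthetic wave (a fundamental plus a sign-inverted second harmonic at half amplitude); it assumes a zero-mean signal and is an illustration only, not the authors' data or their explanation of the effect.

```python
import math
from typing import List, Tuple


def synthetic_asymmetric_wave(n_periods: int = 5, samples_per_period: int = 400) -> List[float]:
    """Zero-mean periodic wave with a deep, narrow negative lobe.

    Hypothetical stand-in for a voiced segment: the negative peak (-1.5)
    is twice the positive peak (about +0.75).
    """
    n = n_periods * samples_per_period
    return [-(math.cos(2 * math.pi * k / samples_per_period)
              + 0.5 * math.cos(4 * math.pi * k / samples_per_period))
            for k in range(n)]


def _rms(vals: List[float]) -> float:
    """Root-mean-square of a non-empty list of values."""
    return math.sqrt(sum(v * v for v in vals) / len(vals))


def polarity_summary(x: List[float]) -> Tuple[float, float, float, float]:
    """Return (% positive points, % negative points, rms of positive, rms of negative)."""
    pos = [v for v in x if v > 0]
    neg = [v for v in x if v < 0]
    return 100 * len(pos) / len(x), 100 * len(neg) / len(x), _rms(pos), _rms(neg)


if __name__ == "__main__":
    x = synthetic_asymmetric_wave()
    pct_pos, pct_neg, rms_pos, rms_neg = polarity_summary(x)
    print(f"mean = {sum(x) / len(x):+.1e}")
    print(f"positive points {pct_pos:.1f}%, negative points {pct_neg:.1f}%")
    print(f"rms negative re: positive = {20 * math.log10(rms_neg / rms_pos):.1f} dB")
```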

Disclaimer

Any mention of a product, service, or procedure in the Journal of the American Academy of Audiology does not constitute an endorsement of the product, service, or procedure by the American Academy of Audiology.


Supplementary Material



Publication History

Received: 30 September 2020

Accepted: 08 April 2021

Article published online: 30 November 2021

© 2021. American Academy of Audiology. This article is published by Thieme.

Thieme Medical Publishers, Inc.
333 Seventh Avenue, 18th Floor, New York, NY 10001, USA
