General Information

Purpose of presentation

Our main research interest is the acoustics of the vowel, that is, the relationship between perceived vowel identities and the physical characteristics of the sounds produced.

In our experimental investigations, we first followed the traditional approach of the formant theory, which posits characteristic formant patterns associated with vowel identities. Investigating this relationship on the basis of a large vowel sample, we were confronted with severe methodological problems and unexpected behaviour on the part of the vowel spectrum. Some of our findings have already been reported in earlier studies, some were indicated by studies on synthesised sounds and on sung vowels, while others are new. The interpretation of these findings may be controversial; we suggest, however, that they are of great importance for the understanding of the voice in general, and of speech production and perception in particular.

Within this Internet presentation, our main findings are described and illustrated by the use of specimen vowel series, including original sounds. Thus, both the experimental setting and the results can be examined by the reader. Moreover, the experiments can easily be repeated.

This presentation has four major aims:

  • we would like to report on our acoustic findings in relation to the behaviour of the vowel spectrum; we are interested in receiving comments on and interpretations of these findings;
  • we are interested in knowing whether other research groups have made similar observations, and what conclusions they have reached;
  • we are seeking collaboration with others with a view to developing a new acoustic approach (see the section headed "Exchange"); we are also seeking collaboration with others on the investigation of phonation and articulation, and in relation to vowel synthesis;
  • we offer consultancy services with regard to the acoustics of voiced speech sounds, including topics relating to speech recognition (see the section headed "Exchange").

The following paragraphs give a summary of our acoustic research, followed by some brief details of other studies we have carried out. For a detailed description of our acoustic research, including sound demonstration, see the section on "Acoustics".


The use of the term "Formant"

With the exception of the studies on phonation and articulation, the experiments and the results presented as well as their discussion are restricted to the acoustic (or psychophysical) aspect of vowels, that is, the parallelism between a perceived phoneme identity and the physical correlate of the sound wave. Because of this, the term "formant" is used only in the sense of apparent spectral envelope peaks of the sound wave. It is beyond the scope of this Internet presentation to discuss in detail whether or not the observed spectral envelope peaks directly relate to resonances of the vocal tract.

The acoustic question investigated was what variations in the formants could be observed when analysing the physical properties of the signals. The question was thus not how the vowels were produced, nor what the perceptual process of identification was. Since these different perspectives are often discussed together, it is important to note the purely descriptive approach of our studies.

 

Summary of acoustic research

Experiments and vowel sounds investigated

Our main experimental approach was to investigate the spectrum and the spectral envelope of isolated sound fragments of vowels. Special attention was given to sounds exhibiting high spectral constancy (exclusion of transitions) and a high perceptual identification rate. Our main experimental procedure was to record vowel sounds from different speakers, produced in isolation as well as in a CVC context, in different vocalisation modes and at different fundamental frequencies. The sounds underwent perceptual identification tests. Finally, Fourier and LPC analyses were performed, and the spectra and their envelopes were visually inspected.
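As an illustration of the kind of LPC analysis mentioned above, the following is a minimal sketch of an all-pole spectral envelope computed with the autocorrelation method and the Levinson-Durbin recursion. It is not the analysis software used in the studies. The function names, the sampling rate of 8 kHz, the model order of 2 and the synthetic two-pole test signal are all assumptions chosen so that the example is self-contained.

```python
import math

def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a real-valued frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns (a, err) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction error
    return a, err

def lpc_envelope_db(a, err, freqs_hz, fs):
    """All-pole envelope 10*log10(err / |A|^2) at the given frequencies."""
    levels = []
    for f in freqs_hz:
        w = 2.0 * math.pi * f / fs
        re = sum(c * math.cos(w * k) for k, c in enumerate(a))
        im = -sum(c * math.sin(w * k) for k, c in enumerate(a))
        levels.append(10.0 * math.log10(err / (re * re + im * im)))
    return levels

def resonator_response(freq_hz, radius, fs, n):
    """Impulse response of a two-pole resonator (synthetic test signal)."""
    theta = 2.0 * math.pi * freq_hz / fs
    b1, b2 = 2.0 * radius * math.cos(theta), -radius * radius
    y = []
    for i in range(n):
        y1 = y[i - 1] if i >= 1 else 0.0
        y2 = y[i - 2] if i >= 2 else 0.0
        y.append((1.0 if i == 0 else 0.0) + b1 * y1 + b2 * y2)
    return y

# Assumed settings: 8 kHz sampling rate, one resonance at 700 Hz, LPC order 2.
fs = 8000
frame = resonator_response(700.0, 0.95, fs, 400)
a, err = levinson_durbin(autocorrelation(frame, 2), 2)
freqs = list(range(0, fs // 2 + 1, 10))
envelope = lpc_envelope_db(a, err, freqs, fs)
peak_hz = freqs[envelope.index(max(envelope))]
print(peak_hz)  # envelope maximum, expected close to the 700 Hz resonance
```

For natural vowel sounds the frame would normally be windowed and a higher model order chosen. Both choices influence the resulting envelope, which is one reason why the estimated peaks need to be checked visually.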

Since, in many cases, estimating the spectral envelope and the formant patterns (in the sense of envelope peak frequencies) was problematic, we had to abandon a statistical approach. As an alternative, we followed a specimen-based approach: first, we inspected the spectral envelope visually and formulated pragmatic rules concerning the relationship between the envelope, its peaks and the perceived vowel quality; second, we arranged model vowel series in order to demonstrate these rules. With this procedure, the relationship between spectral envelope peaks and perceived vowel identities can be investigated despite the methodological problem referred to above: sounds whose spectra exhibit a well-defined peak structure can be chosen, and for such sounds the estimation of peak frequencies is at least in part independent of both the method of analysis and the parameter settings chosen (for details, see the section on "Acoustics").
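The idea that peak frequencies should not depend on the analysis settings can be sketched as a simple check: pick the local maxima of two envelopes of the same sound, obtained with different settings, and keep only the peaks that both analyses reproduce. This is an illustration only, not the procedure used in the studies, which relied on visual inspection. The function names, the tolerance of 50 Hz and the envelope values are hypothetical.

```python
def envelope_peaks(freqs_hz, levels_db):
    """Frequencies at which the envelope exceeds both of its neighbours."""
    return [freqs_hz[i] for i in range(1, len(levels_db) - 1)
            if levels_db[i - 1] < levels_db[i] > levels_db[i + 1]]

def stable_peaks(peak_sets, tol_hz):
    """Peaks of the first analysis that every other analysis reproduces within tol_hz."""
    first, others = peak_sets[0], peak_sets[1:]
    return [p for p in first
            if all(any(abs(p - q) <= tol_hz for q in other) for other in others)]

# Hypothetical envelopes (dB) of one sound, obtained with two analysis settings.
freqs = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
setting_a = [10, 14, 20, 26, 21, 15, 12, 16, 22, 17, 11]
setting_b = [11, 15, 19, 25, 22, 18, 16, 15, 14, 13, 12]

peaks_a = envelope_peaks(freqs, setting_a)           # [300, 800]
peaks_b = envelope_peaks(freqs, setting_b)           # [300]
print(stable_peaks([peaks_a, peaks_b], tol_hz=50))   # [300]
```

In this made-up case the peak at 300 Hz survives both settings, whereas the peak at 800 Hz appears with one setting only and would not count as well defined.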

In this way, a large sample of the Swiss German vowels /u,o,a,ä,ö,e,ü,i/ (region of Zurich; for phonetic symbols, see the section on "Symbols") was investigated. Approximately 18,700 recordings were made of 35 men, 44 women, and 20 children. The speakers produced the vowels in isolation as well as in a CVC context. Whispered sounds were also included. Isolated vowels were produced at different levels of F0, and at different intensities. In this Internet presentation, however, we will only present findings relating to vowels produced in isolation and on a monotone.

Major findings

The following list gives a summary of the major findings. The findings are described in detail in the section on "Acoustics".

  • Methodological problem of formant estimation: Up to now, there is no objective method of experimentally investigating all clearly identifiable vowel sounds.
  • Non-linear and non-monotonic correlation between F0 and the lower formants: The lower formants (below 1.5 kHz) shift with ascending F0; however, this correlation proves to be non-linear and non-monotonic, since it depends on the level of F0 and on the vowel identity.
  • High-pitched vowels: Vowel sounds produced by untrained speakers (non-singers) at fundamental frequencies above the F1 found in speech (that is, above F1 at "habitual pitch") can be clearly identified, and they confirm the correlation between the lower formants and F0.
  • Disappearance of gender- and age-differences in the lower formant frequencies: If men, women and children produce vowels at similar F0, the formant frequency differences found at F0 of normal speech ("habitual pitch") of the three speaker groups tend to disappear.
  • Formant number alteration: Different numbers of formants relevant for the phoneme identity appear in the spectra of natural vowels.
  • Formant pattern ambiguity: Similar formant patterns can be found for adjacent as well as non-adjacent vowels. Moreover, one formant pattern is often associated with more than two different vowels, i.e. the ambiguity proves to be "multiple".
  • "Anomalous" vowel spectra: Vowel sounds exhibiting either flat spectra with no clear resonance structure or an unexpectedly large number of envelope peaks can be clearly identified.
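The point about high-pitched vowels can be made concrete with a short calculation. A periodic sound has partials only at integer multiples of F0, so once F0 exceeds a given F1 value, no partial can lie at or below that value. The sketch below lists the partials for three F0 levels and the distance from a reference F1 to the nearest partial. The reference value of 300 Hz and the function names are hypothetical and serve only as an illustration.

```python
def partial_frequencies(f0_hz, f_max_hz):
    """Frequencies of the partials k * F0 up to f_max_hz."""
    return [k * f0_hz for k in range(1, int(f_max_hz // f0_hz) + 1)]

def nearest_partial_offset(f0_hz, target_hz):
    """Distance in Hz from target_hz to the closest partial of a sound at f0_hz."""
    k = max(1, round(target_hz / f0_hz))
    return abs(k * f0_hz - target_hz)

F1_REFERENCE_HZ = 300  # hypothetical F1 value at habitual pitch

for f0 in (100, 200, 600):
    print(f0,
          partial_frequencies(f0, 1500),
          nearest_partial_offset(f0, F1_REFERENCE_HZ))
```

At F0 = 600 Hz the lowest partial already lies 300 Hz above the reference value, so the spectrum cannot show an envelope peak at the position expected from habitual-pitch speech.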

Conclusions

It has to be emphasised that the investigations mentioned proceed from a purely acoustic (psychophysical) perspective, that is to say, only the relationship between the vowel spectrum and the perceived vowel quality is considered. The term "formant" is thus used solely in the sense of an apparent spectral envelope peak.

Concerning the acoustics of vowel sounds, three main statements can be made. First, to date there is no objective method of experimentally investigating all clearly identifiable vowel sounds. Second, when vowel sounds for which the estimation of formant patterns is justifiable are compared, the patterns do not behave systematically: the correlation between the lower formants and F0 is non-linear and non-monotonic, the number of formants relevant for vowel identity varies, and no general spectral integration of two formants into one could be experimentally confirmed. Third, vowels can be perceived even if the spectrum does not exhibit a clear peak structure.

As far as future research is concerned, two different approaches are possible: either a highly detailed hypothesis involving the normalisation of formant patterns can be developed, or a new acoustic approach not associated with formants can be sought. Since we doubt whether a reliable normalisation approach can be developed, we are inclined to abandon the formant theory and to turn our attention to features other than spectral envelopes and their peak characteristics.

As mentioned above, details are given in the section on "Acoustics".


Related studies: vowels and phonation

Usually, phonation and articulation are regarded as independent of each other. However, some of the findings reported above could indicate a substantial link between these two processes. We therefore investigated vocal fold movements for different vowels, using high-speed light-intensified digital imaging. The results showed a tendency towards phoneme-specific movements. We were unable to determine whether the observed variations in vocal fold movements were related directly to phoneme identity, or whether they appeared because of other features of vocalisation. Nevertheless, we conclude that any assumption of phoneme-independent source characteristics has to be made cautiously.

Reference

Maurer, D., Hess, M., and Gross, M. (1996). "High-speed imaged vocal fold vibrations and larynx movements within vocalizations of different vowels," Annals of Otology, Rhinology & Laryngology, 105, 975-981.


Related studies: vowels and articulation I

The finding of a correlation between the formant pattern and F0 could lead to the assumption that different configurations of the vocal tract (producing different resonance characteristics) are associated with the same vowel, even for sounds uttered by a single speaker. Moreover, the finding of formant pattern ambiguity could lead to the assumption that a single vocal tract configuration can (or must) represent different vowels, provided that there is a very close physiological relationship between the resonance pattern exhibited by a signal and the positions of the articulators. We tested this relationship by means of Electromagnetic Articulography (EMA). In general, the correlation between articulator positions and the vowel spectrum proved to be weak, and the anterior part of the vocal tract proved to be non-specific for a given vowel. Moreover, overlapping articulator positions were found for different vowels at similar F0. This was also true in the case of vocalisations of a particular vowel at different F0, where different formant patterns occurred. We therefore conclude that there is no necessary physiological link between the entire vocal tract and a particular vowel sound.

Reference

Maurer, D., Gröne, B., Landis, T., Hoch, G., and Schönle, P.W. (1993). "Reexamination of the relation between the vocal tract and the vowel sound with EMA in vocalizations," Clinical Linguistics & Phonetics, 7, 129-143.


Related studies: vowels and articulation II

We are currently investigating the acoustic characteristics of vowels produced by a child with severe speech disorders (paralysis of the vocal cords) who has recently succeeded in developing vocalisations comparable to the norm. This investigation allows for an acoustic comparison of vowel spectra associated with two different modes of speech which are both "habitual" for the child concerned. The main experimental question is whether or not the vowel spectra for the two different speech modes exhibit similar resonance patterns.


Related studies: self-perception of speech

Aside from our main research interest, we are also investigating the role of bone-conduction in the self-perception of speech.

In a first study, we tested the effects of bone conduction on the recognition of one’s own voice. We recorded the voices of subjects simultaneously via air and bone conduction. We then presented the recordings simultaneously to the speakers, and asked them to mix the recordings until they judged the perceived voice to be closest to their own. Although most subjects preferred a mixture, the role of bone conduction in the self-perception of speech varied quite widely among the subjects.
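The mixing step can be pictured as a weighted sum of the two simultaneous recordings. The sketch below assumes a simple linear mix controlled by a single weight; the actual mixing procedure and equipment of the study are not described here, so the function name and the weighting scheme are hypothetical.

```python
def mix_air_bone(air, bone, bone_weight):
    """Linear mix of two equally long sample sequences; bone_weight in [0, 1]."""
    if len(air) != len(bone):
        raise ValueError("recordings must have the same length")
    if not 0.0 <= bone_weight <= 1.0:
        raise ValueError("bone_weight must lie between 0 and 1")
    return [(1.0 - bone_weight) * a + bone_weight * b
            for a, b in zip(air, bone)]

# Made-up sample values: 25 % bone-conducted and 75 % air-conducted signal.
print(mix_air_bone([1.0, 0.0], [0.0, 1.0], 0.25))
```

A weight of 0 corresponds to the air-conducted recording alone and a weight of 1 to the bone-conducted recording alone; a subject's preferred setting would lie somewhere on this scale.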

At the moment, we are investigating the acoustic properties of phonemes recorded from the bone.

Reference

Maurer, D., and Landis, T. (1990). "Role of bone conduction in the self-perception of speech," Folia phoniatrica, 42, 226-229.


Author
Dieter Maurer, Ph.D.
Researcher - Lecturer
 
Zurich University of the Arts
Hafnerstrasse 31 - CH-8005 Zurich - Switzerland
Phone (Switzerland) + 41 + 43 446 63 10
Phone (France) + 33 + 1 48 06 36 51
E-mail: dieter.maurer@zhdk.ch
Web: www.vowel.ch - www.scribblings.ch - www.zhdk.ch

 

Publications

ARTICLES

Maurer, D., and Landis, T. (1990): "Role of bone conduction in the self-perception of speech," Folia phoniatrica, 42, 226-229. (Abstract)

Maurer, D., Landis, T., and d'Heureuse, C. (1991): "Formant movement and formant number alteration with rising F0 in real vocalizations of the German vowels [u:], [o:] and [a:]," International Journal of Neuroscience, 57, 25-38. (Abstract) 

Maurer, D., Cook, N., Landis, T., and d'Heureuse, C. (1992): "Are measured differences between the formants of men, women and children due to F0 differences?" Journal of the International Phonetic Association, 21, 66-79. (Abstract) 

Maurer, D., Gröne, B., Landis, T., Hoch, G., and Schönle, P.W. (1993): "Reexamination of the relation between the vocal tract and the vowel sound with EMA in vocalizations," Clinical Linguistics & Phonetics, 7, 129-143. (Abstract) 

Maurer, D., and Landis, T. (1995): "F0-dependence, number alteration, and non-systematic behaviour of the formants in German vowels," International Journal of Neuroscience, 83, 25-44. (Abstract)

Maurer, D., and Landis, T. (1996): "Intelligibility and spectral differences in high pitched vowels," Folia phoniatrica et logopaedica, 48, 1-10. (Abstract) 

Maurer, D., Hess, M., and Gross, M. (1996): "High-speed imaged vocal fold vibrations and larynx movements within vocalizations of different vowels," Annals of Otology, Rhinology & Laryngology, 105, 975-981. (Abstract) 

Maurer, D. (1997): "Arguments against formants - the descriptional problem of acoustic phonetics," Proceedings of the Journées d'Etudes Linguistiques 1997, University of Nantes (FR), 106-111.

Maurer, D. (1999): " 'Anomalous' vowel spectra of Swiss German /a/," manuscript (submitted).

Maurer, D., and Landis, T. (2000): "Formant pattern ambiguity of vowel sounds," International Journal of Neuroscience, 100, 39-76. (Abstract)

BOOKS / ARTICLES IN BOOKS

Maurer, D. (1991): "Was ist das Physikalische des Vokals?" in Rohmert, W. (Ed.): Kolloquium Praktische Musikphysiologie, Dokumentation Arbeitswissenschaft, Bd. 27 (pp 165 - 187). Darmstadt: Schmidt Verlag.

Maurer, D. (1994): "Über den Vokal". Band I: Kritik der akustischen Theorie der stimmhaften Sprachlaute. Band II: Materialien. Hartung-Gorre Verlag, Konstanz/Germany.

SHORT CONTRIBUTIONS

Maurer, D., and Klinkert, A. (1998): "Vokale und ihre physikalischen Merkmale. I. Lautidentität und Formantmuster," Fortschritte der Akustik – DAGA 98, 386-387.

Maurer, D., and Klinkert, A. (1998): "Vokale und ihre physikalischen Merkmale. II. Lautidentität und Partialton-spektrum," Fortschritte der Akustik – DAGA 98, 388-389.

ABSTRACTS

Maurer, D., and Landis, T. (1990): "Does vowel perception rely on resonance patterns?" Journal of Clinical and Experimental Neuropsychology, 12, 421.

Maurer, D. (1996): "Über den Vokal - Psychophysik und Physiologie der stimmhaften Sprachlaute," Roche Research Foundation, 25th Anniversary Annual Report, 51.

Klinkert, A., and Maurer, D. (1997): "Fourier spectra and formant patterns of German vowels produced at F0 of 70 - 850 Hz," Journal of the Acoustical Society of America, 101, 3112. (Abstract)

Maurer, D., and Klinkert, A. (1997): "The spectral difference of different vowels - towards a new acoustical concept," Journal of the Acoustical Society of America, 101, 3112. (Abstract)