by Kay H. Mount and Shirley J. Salmon
Audiology and Speech Pathology Service, Veterans Administration Medical Center, Kansas City, Missouri.
The vocal characteristics of a 63-year-old individual who underwent male-to-female sex reassignment surgery were evaluated. Treatment was designed to alter inappropriate male voice characteristics. Speech goals were to
- encourage use of successively higher pitch levels, and
- modify tongue carriage to change resonance.
After 11 months of therapy, average fundamental frequency for /i, a, u/ vowels changed from 110 to 205 Hz. Also, second formant frequency values changed remarkably for each of these vowels, with the greatest frequency change being 291 Hz for /i/. These acoustic differences could account for the perception of femininity in her posttreatment voice. Maintenance of these acoustic features was found five years posttreatment.
Address correspondence to Kay H. Mount, Ph.D., Audiology and Speech Pathology Service (126), Kansas City VA Medical Center, 4801 Linwood Blvd., Kansas City, MO 64128.
Because of differences in laryngeal size and mass, average fundamental frequency (F0) for females is higher (220 Hz) than for males (110 Hz). The perceived pitch of the laryngeal fundamental has long been accepted as an acoustic cue to speaker sex. Thus, for male-to-female transsexuals, fundamental frequency must change for the perception of a female voice. Baker and Green (1970) report that a common misconception about male-to-female sex reassignment surgery is that castration or use of estrogen will raise vocal pitch. That surgery or use of estrogen has little effect on vocal pitch was demonstrated by Wolfe, Ratusnik, and Northrop (1980). They investigated vocal characteristics of 20 male-to-female transsexuals all of whom had been on hormone treatment for various amounts of time, but only one of whom had undergone surgery. Reportedly, "mean fundamental frequency of transsexual speakers (93-202 Hz) covered a broad range, overlapping those of male and female speakers" (p. 473). These data demonstrated that this group of patients exhibited fundamental frequencies that might be expected from the normal male population. Their fundamental frequencies did not skew to the right as might be predicted if fundamental frequency was affected by either surgery or estrogen. Those who have treated male-to-female transsexuals have recognized the importance of changing fundamental frequency. Bralley, Bull, Gore, and Edgerton (1978) as well as Kalra (1977) presented data that illustrated how one male-to-female transsexual elevated fundamental frequency following vocal rehabilitation. The individual described by Bralley et al. (1978) elevated fundamental frequency from 145 to 165 Hz. Although the voice was higher in pitch and judged more feminine, the investigators reported that it could still be distinguished from female voices. These findings suggest that alteration of fundamental frequency alone is not enough to achieve perception of feminine gender. 
Kalra's patient raised fundamental frequency from 168 to 200 Hz and used Froeschels' chewing method (1952) in an attempt to increase anterior oral resonance. Kalra recognized the importance of accentuating anterior oral resonance to accommodate the newly acquired higher pitch; however, he did not provide data to substantiate change in vocal tract resonance. Other investigators also have recognized the need to alter vocal parameters besides fundamental frequency. Wolfe, Ratusnik, and Northrop (1980) reported high negative correlations between ratings on a femininity-masculinity scale and means of five vocal characteristics (F0, formant frequency, extent of upward inflections, extent of downward inflections, and extent of both upward and downward inflections). These findings as well as similar observations reported by Pronovost (1942), Snidecor (1951), and Coleman (1971, 1976) provide further support for the notion that other acoustic parameters in addition to fundamental frequency influence gender identification.
Vocal tract resonance characteristics may be the second most important acoustic cue to speaker identification. Peterson and Barney (1952) and Ladefoged and Broadbent (1957) found that females have higher average vowel formant frequencies than males. The importance of vocal tract resonances as a cue to speaker sex identification was shown by Coleman (1971). He reported that listeners correctly identified speaker sex 88% of the time when listening to both sexes produce artificial larynx speech having a fundamental of 85 Hz. In 1976 Coleman investigated further the importance of vocal tract resonance and fundamental frequency on gender identification. In one experiment, male and female speakers produced speech samples using normal voice and in another they used an artificial larynx. When speakers used normal voice, vocal tract resonance and fundamental frequency were both important to male-female identification. When they used artificial voice and when vocal tract resonance characteristics of one sex were combined with F0 characteristics of the opposite sex, listeners generally identified the speaker as male. This was true whether a male F0 was combined with female vocal tract resonance or whether a female F0 was combined with male vocal tract resonance.
These cumulative findings lead to the hypotheses that for male-to-female transsexuals
- raising the F0 alone will likely result in perception of male voice, and
- raising the F0 and the vocal tract resonance simultaneously will likely result in the perception of female voice.
To test these hypotheses, a treatment plan was devised that focused on changing both the laryngeal tone and its resonance. Consequent frequencies were measured.
Successful procedures for increasing fundamental frequency of male-to-female transsexuals' voices were well established, but procedures for changing vocal tract resonance were not. Yet, resonance of the vocal tract can be altered at will and is demonstrated during vowel production. According to Fant (1956), vowels are primarily the product of the voice source and the filtering action of the vocal tract. The resonant frequencies of the vocal tract, F1, F2, etc., are determined by characteristics of the tract including size and shape (Delattre, 1951; Fant, 1962). Narrowing of the oral-pharyngeal cavities during production of a high front vowel such as /i/ is produced by elevation of the mandible, which results in low F1 values, and anterior tongue carriage, which results in high-frequency resonance for F2. Resonance produced with excessive anterior tongue carriage has been referred to as "thin" by Boone (1971) and Fisher (1975). Here, "thin vocal tract resonance" and "upward movement of the second vowel formant frequencies" are used synonymously and are thought to be the result of anterior tongue carriage. Thus, upward and downward movements of F2 were chosen to represent change in vocal tract resonance for this study. In therapy, the patient's natural abilities were used to produce voluntarily a higher laryngeal fundamental and to enhance it by forward carriage of the tongue, thereby raising second formant vowel frequency. Pre- and posttreatment frequency measurements of the fundamental and the vowel second formant were made to quantify treatment results.
The purpose of this report is to describe the treatment provided to a postoperative male-to-female transsexual, present acoustic data from pretreatment and posttreatment voice samples, and speculate about the relationship between acoustic change and perception of feminine voice.
Equipment used during diagnosis and treatment included
- Kay Elemetrics Visi-pitch, model 6087A, attached to a Tektronix oscilloscope, model 5113;
- Voice Identification 700 Series sound spectrograph with attached full track reel-to-reel Crown tape recorder, model IM7;
- portable Sony cassette tape recorder;
- Philco minicassette recorder;
- Bell and Howell Language Master; and
- Sony U-matic video recorder and camera.
The Visi-pitch was used to assess fundamental frequency and relative intensity of utterances and to provide visual feedback regarding these parameters. A Language Master and tape recorders were used to provide auditory feedback, while the video recorder was used to provide both auditory and visual feedback. The sound spectrograph was used for acoustic analysis of selected speech samples over time.
The patient was a 63-year-old individual who began hormone treatment a year before undergoing male-to-female reassignment surgery. Six months after surgery she came to the speech clinic for evaluation. Her chief complaint was a low-pitched voice, which was not perceived as female, especially over the telephone.
The patient was audio- and video-recorded throughout the diagnostic session. During conversational speech and production of sustained vowels, she spoke in a full, resounding, low-pitched voice appropriate for a bass male speaker as confirmed by measurement of fundamental frequency from spectrograms. Average fundamental frequency of sustained /i, a, u/ was 110 Hz with fundamental frequency at the lowest and highest pitch levels ranging from 110 to 340 Hz. In conversational speech prosodic patterns and voice quality were judged normal for a male; however, hard glottal attacks were noted occasionally. When asked to demonstrate the female-type voice she had been attempting to develop on her own, the patient's vocal characteristics changed remarkably. Although pitch was higher, upward and downward inflection patterns were bizarre and not consistent with sentence structure. The speech resembled that of a male amateur comedian trying to imitate a female.
The patient also was seen by ENT for indirect laryngoscopy. The structure and function of the vocal folds were judged normal.
The overall goal of treatment was to train the patient to effect at will a voice that was perceived as feminine. Primary goals were to
- train successively higher pitch levels while avoiding vocal abuse, and
- modify tongue carriage to achieve higher resonance characteristics of the vocal tract.
Secondary goals were to
- promote a breathy vocal attack, and
- establish appropriate inflection patterns at higher pitch levels.
The rationales were
- higher fundamental frequency is associated with feminine voice;
- higher resonance characteristics are associated with female voices;
- breathy vocal attacks contribute to vocal health and are considered by some (Money and Primrose, 1969) to exemplify feminine gender; and
- inappropriate inflection patterns call attention to the voice as different and unnatural.
Stimulus materials were selected or constructed to achieve the specified goals. To increase fundamental frequency and change resonance characteristics, words that contained high front vowels and anterior consonants were identified. To encourage easy onset of phonation and the adoption of a breathy voice quality, other words beginning with /h/ were selected. Using these two groups of words, phrases and sentences were constructed to represent various types of intonation patterns.
Initially, the patient was required to listen to the female clinician's production, study the Visi-pitch display, and attempt to match the pitch contours. From the beginning of treatment, the middle third of the patient's frequency range was established as the target fundamental. In general, frequency was raised in increments of 10 Hz until consistently good vocal quality could be maintained, using an average fundamental frequency of 210 Hz. Because rising inflection patterns were easier for the patient to imitate, they were used in the early period of treatment. To help attain high-frequency resonance, the patient was encouraged to listen to the quality of voice and note the elevation of the mandible and anterior tongue carriage when producing words with high front vowels and anterior consonants. She was directed to maintain this articulatory positioning throughout an utterance to effect higher resonance that was labeled "thin" for qualitative purposes. Breathiness on words beginning with /h/ and, later, on words beginning with vowels was established by monitoring rise time of intensity patterns displayed on the Visi-pitch. Easy onset of voicing was encouraged at all times.
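The pitch targets described above can be illustrated with a short calculation. This is only a sketch: it assumes the "middle third" is taken on a linear Hz scale over the 110 to 340 Hz range reported at the diagnostic session, and that the 10 Hz steps start from the pretreatment average of 110 Hz. The function names are hypothetical and are not part of the original procedure.

```python
def middle_third(low_hz, high_hz):
    """Return the bounds of the middle third of a frequency range (linear Hz, an assumption)."""
    span = high_hz - low_hz
    return low_hz + span / 3, high_hz - span / 3


def step_targets(start_hz, goal_hz, step_hz=10):
    """List successive pitch targets from just above start_hz up to goal_hz."""
    return list(range(start_hz + step_hz, goal_hz + 1, step_hz))


# Range reported at diagnosis: 110 to 340 Hz.
target_low, target_high = middle_third(110, 340)

# Assumed schedule: 10 Hz increments from the 110 Hz baseline to the 210 Hz average.
targets = step_targets(110, 210)

print(f"Middle third of range: {target_low:.0f} to {target_high:.0f} Hz")
print("Successive targets (Hz):", targets)
```

Under these assumptions the final 210 Hz average falls inside the middle third of the patient's range, which is consistent with the target described in the text.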
A variety of intonation patterns was practiced while maintaining increased fundamental frequency, high frequency resonance, and breathy quality. Such practice was necessary to overcome inappropriate patterns adopted by the patient in her pretherapy attempts to develop a feminine voice. These behaviors were established first in the clinic and, later, during "self-modeling" home practice using audio recordings of her best efforts in therapy.
In the final months of therapy, role-play situations stressing functional conversations were practiced both in and outside the clinic. Assignments outside the clinic required conversations in person and over the telephone with people unknown to the patient. These conversations were recorded to assess appropriateness of vocal behaviors. The patient did not consider her speech and voice acceptable until she was referred to as "Ma'am" over the telephone. Thus, work with the telephone continued until feminine references predominated. Prior to the end of treatment an otolaryngological exam indicated normal supraglottic and glottic structures at rest and during phonation. Treatment was terminated following 88 1-hour sessions over an 11-month period. Maintenance was evaluated five years posttreatment.
Broad band (300 Hz) and narrow band (45 Hz) amplitude cross-section spectrograms were produced for /i, a, u/ vowels at the beginning and end of treatment and five years thereafter. Narrow band sections were made in the middle of the vowel at the most stationary portion of the second formant. The center frequencies of formants 1, 2, and 3 were judged to be equal to the frequency of the maximum harmonic of the first, second, and third spectral envelopes, respectively, or at a point half-way between two adjacent high-amplitude harmonics when two relatively equal central harmonics were present. Fundamental frequency was estimated by counting the vertical striations (representing the laryngeal pitch periods) in a 100-msec segment which corresponded to the most stationary portion of F2. Starrett model 120Z machinists dial calipers were used to measure F0 and F1, F2, and F3. The formulas were as follows:
- F0 = N(10) where N is the number of pitch periods in a 100-msec segment.
- F1, F2, or F3 = (X/3 + Y) / (X/3000) where X is the measurement in thousandths of an inch from 1000 to 4000 Hz calibration marks on a particular spectrogram, and Y is the caliper reading for the formant being measured.
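The two conversions can be sketched in code. The F0 rule follows directly from the text. For the formants, one dimensionally consistent reading of the formula is F = (X/3 + Y) / (X/3000), which equals 1000 Hz plus Y scaled by 3000/X. That reading assumes a linear frequency axis and that Y is measured upward from the 1000 Hz calibration mark; both points are assumptions, and the caliper values in the example are hypothetical, not data from the study.

```python
def f0_from_striations(n_periods, window_ms=100.0):
    """Estimate F0 in Hz from the number of pitch periods counted in a time window."""
    return n_periods * (1000.0 / window_ms)


def formant_hz(y_mils, x_mils):
    """Convert a caliper reading to Hz.

    x_mils: distance between the 1000 and 4000 Hz calibration marks (thousandths of an inch).
    y_mils: caliper reading for the formant, ASSUMED to be measured from the 1000 Hz mark.
    """
    mils_per_hz = x_mils / 3000.0          # 3000 Hz spans x_mils
    offset_mils = x_mils / 3.0             # distance equivalent to the first 1000 Hz
    return (offset_mils + y_mils) / mils_per_hz


# 11 striations in a 100-msec segment correspond to 110 Hz.
print(f0_from_striations(11))

# Hypothetical spectrogram: 1.500 inch between calibration marks, reading of 0.546 inch.
print(formant_hz(546, 1500))
```

With the hypothetical 1.500-inch calibration span, each thousandth of an inch corresponds to 2 Hz, which is of the same order as the reliability figures reported for the caliper readings.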
Two judges trained in acoustic analysis independently measured F0 and F1-F3 for each vowel. When examining the broad band spectrograms the two judges agreed 100% of the time on the number of vertical striations present within each 100-msec segment. When measuring the formants, the two judges agreed 100% of the time on selection of the point representing the peak of each spectral envelope. Because good reliability was obtained, formant measurements of only one of the judges were used. On repeated measurement of the entire sample by this judge, the greatest difference in dial caliper readings was +0.004 inch or 8 Hz at 1053 Hz and 2092 Hz for F2 of /a/ and /i/, respectively. When dial caliper readings were compared with a second judge, the greatest difference in readings was 0.006 inch or 11 Hz at 3467 Hz for F3 of /i/. These differences in the caliper readings were regarded as insignificant because such small changes in these formants are probably linguistically irrelevant when submitted to listeners for perceptual judgments (Flanagan, 1955; Mermelstein and Fitch, 1976).
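To put the reported measurement differences in proportion, they can be expressed as a percentage of the formant frequency at which they occurred. The sketch below uses only the three figures given above; the variable names are illustrative.

```python
# (label, greatest difference in Hz, formant frequency in Hz), as reported in the text.
readings = [
    ("F2 /a/", 8, 1053),   # repeated measurement by the same judge
    ("F2 /i/", 8, 2092),   # repeated measurement by the same judge
    ("F3 /i/", 11, 3467),  # comparison with the second judge
]

# Relative difference as a percentage of the formant frequency.
relative_error_pct = {
    label: 100.0 * diff_hz / formant_hz
    for label, diff_hz, formant_hz in readings
}

for label, pct in relative_error_pct.items():
    print(f"{label}: {pct:.2f}% of formant frequency")
```

Each difference is under 1% of the corresponding formant frequency.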
Table 1 provides frequency values for F0-F3 for /i, a, u/ vowels at the initiation (T1), termination (T2), and five years following (T3) treatment. In general, as fundamental frequency increased, formant frequencies increased. For example, five years posttreatment the mean for F0 increased to 222 Hz and F2 for all vowels was at its highest level. Figure 1 shows that at the beginning of treatment average fundamental frequency for /a/ was 110 Hz and after four months of treatment it increased to 210 Hz. F0 stabilized at 210 Hz throughout the remainder of treatment. The shaded area represents average F0 values of /a/ for males and females studied by Peterson and Barney (1952).
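The size of the F0 change can also be expressed in semitones, a scale closer to perceived pitch. This is a supplementary calculation using the standard relation of 12 semitones per doubling of frequency; it is not an analysis reported in the study.

```python
import math


def semitones(from_hz, to_hz):
    """Interval between two frequencies in equal-tempered semitones."""
    return 12 * math.log2(to_hz / from_hz)


baseline = 110  # Hz, pretreatment average F0

for label, f0 in [("after four months", 210), ("five years posttreatment", 222)]:
    print(f"{label}: {semitones(baseline, f0):+.1f} semitones re {baseline} Hz")
```

A rise from 110 Hz to 220 Hz would be exactly one octave (12 semitones), so the values reported here amount to a shift of about an octave.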
Figure 2 shows second formant frequency values for /i, a, u/ at each of the three time periods (T1-T3). The lower and upper limits of the shaded areas in the figure depict male and female average F2 values for these vowels as reported by Peterson and Barney. Note that the patient's F2 values for all vowels illustrate a constant rise toward the female frequencies. When F2 values for the beginning and end of treatment were compared, F2 for /i/ increased the most (291 Hz) followed by /u/ (255 Hz) and /a/ (94 Hz). Posttreatment F2 values continued to increase but at a lesser rate, with the greatest change occurring for /u/ (103 Hz), followed by /a/ (44 Hz), and the smallest change occurring for /i/ (6 Hz).
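The two sets of F2 changes can be combined to give the cumulative change from the start of treatment (T1) to the five-year follow-up (T3). The sketch below simply adds the reported differences; absolute F2 values are not used, and the dictionary names are illustrative.

```python
# Reported F2 increases in Hz.
during_treatment = {"i": 291, "u": 255, "a": 94}   # T1 to T2
after_treatment = {"i": 6, "u": 103, "a": 44}      # T2 to T3

# Cumulative change from T1 to T3 for each vowel.
total_change = {
    vowel: during_treatment[vowel] + after_treatment[vowel]
    for vowel in during_treatment
}

for vowel, change in total_change.items():
    print(f"/{vowel}/: +{change} Hz from T1 to T3")
```

On this tally, /u/ shows the largest cumulative F2 rise (358 Hz), followed by /i/ (297 Hz) and /a/ (138 Hz).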
Table 1. F0-F3 in Hertz for /i, a, u/ Vowels at Initiation (T1), Termination (T2), and Five Years after (T3) Treatment
Although a fundamental frequency comparable to that of females was obtained after four months of therapy, the patient was not perceived as female on the telephone. This did not occur until six months later, near the termination of treatment. It is not surprising that she was still perceived as male because alteration of fundamental frequency alone is not enough to achieve perception of feminine gender (Coleman, 1976; Bralley, Bull, Gore, and Edgerton, 1978). Although treatment for the present study was directed toward increasing both F0 and F2, F0 values had reached those appropriate for females within the first four months, while F2 had not (see Figures 1 and 2). At the initiation of treatment, the patient's F2 values for the three vowels were below the means for male speakers studied by Peterson and Barney (1952). Midway through treatment, the patient's F2 values crossed the male frequency averages and began to rise toward the female means. At the end of treatment, the patient's F2 values had exceeded the female means for /u/, were halfway between the male and female averages for /a/, and were about one-third of the way between the male and female means for /i/. Although the patient did not achieve female values for F2 in every instance, her resonant characteristics when coupled with the feminine F0 apparently were sufficiently close to those for females to elicit feminine perception. She maintained female vocal characteristics five years posttreatment and reported continued success in being perceived as a female over the telephone.
There seems to be a range of acceptable fundamental and formant frequencies necessary for identification of voice as female. The patient first achieved F0 frequencies appropriate for females, then gradually achieved F2 values for this gender. Thus, by the end of treatment F2 values were perceptually consonant with those for F0, allowing perception of feminine voice when visual cues were not available. It is assumed that perception of femininity was accomplished by effecting a change in the resonance cavity through forward movement of the tongue. This treatment study lends support to the hypothesis that when a disparity exists between a female F0 and a male F2 in the transsexual voice, perception follows F2, and that only when consonance occurs between F0 and F2 will perception follow the female F0.
- Baker, H., and Green, R. (1970). Treatment of transsexualism. Curr. Psychiatric Ther. 10:88-89.
- Boone, D. (1971). The Voice and Voice Therapy. Englewood Cliffs, NJ: Prentice-Hall.
- Bralley, R., Bull, G., Gore, C., and Edgerton, M. (1978). Evaluation of vocal pitch in male transsexuals. J. Commun. Disord. 11:443-449.
- Coleman, R. O. (1971). Male and female voice quality and its relationship to vowel formant frequencies. J. Speech Hear. Res. 14:565-577.
- Coleman, R. O. (1976). A comparison of the contributions of two voice quality characteristics to the perception of maleness and femaleness in the voice. J. Speech Hear. Res. 19:168-180.
- Fant, C. (1956). On the predictability of formant levels and spectrum envelopes from formant frequencies. In M. Halle, H. Lunt, and H. MacLean (eds.), For Roman Jakobson. The Hague: Mouton.
- Fant, C. (1962). Descriptive analysis of the acoustic aspects of speech. Logos, 5:3-17.
- Fisher, H. (1975). Improving Voice and Articulation. Utica: H. M. Cardamone.
- Flanagan, J. (1955). A difference limen for vowel formant frequency. J. Acoust. Soc. Am. 27:765-768.
- Froeschels, E. (1952). Chewing method as therapy. Arch. Otolaryng. 56:427-434.
- Kalra, M. (1977). Voice therapy with a transsexual. In R. Gemme and C. Wheeler (eds.), Progress in Sexology. New York: Plenum Press.
- Ladefoged, P., and Broadbent, D. (1957). Information conveyed by vowels. J. Acoust. Soc. Am. 29:98-104.
- Mermelstein, P., and Fitch, H. (1976). Difference limens for formant frequencies for steady state and consonant-bounded vowels. Paper presented at the 92nd meeting of the Acoustical Society of America.
- Money, J., and Primrose, C. (1969). Sexual dimorphism and dissociation in the psychology of male transsexuals. In R. Green and J. Money (eds.), Transsexualism and Sex Reassignment. Baltimore: Johns Hopkins.
- Peterson, G., and Barney, H. (1952). Control methods used in a study of the vowels. J. Acoust. Soc. Am. 24:175-184.
- Pronovost, W. (1942). An experimental study of methods for determining natural and habitual pitch. Speech Monogr. 9:111-123.
- Snidecor, J. (1951). The pitch and duration characteristics of superior female speakers during oral reading. J. Speech Hear. Disord. 16:44-52.
- Wolfe, V. I., Ratusnik, D. L., and Northrop, G. (1980). Vocal characteristics of male transsexuals on a masculinity-femininity dimension. In The Proceedings of the Congress of the International Association of Logopedics and Phoniatrics. Vol 1, pp. 469-474.