Protocol - Auditory-Perceptual Evaluation of Voice
The Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) rates six salient perceptual vocal attributes: (a) Overall Severity, (b) Roughness, (c) Breathiness, (d) Strain, (e) Pitch, and (f) Loudness. The CAPE-V displays each attribute accompanied by a 100-mm line forming a visual analog scale (VAS). Using a tick mark, the clinician indicates the degree of perceived deviance from normal for each attribute on this scale. For each attribute, the scalar extremes are unlabeled.
Judgments may be assisted by referring to general regions indicated below each scale on the CAPE-V: “MI” refers to “mildly deviant,” “MO” refers to “moderately deviant,” and “SE” refers to “severely deviant.” A key issue is that the regions indicate gradations in severity rather than discrete points. The clinician may place tick marks at any location along the line. Ratings are based on the clinician’s direct observations of the patient’s performance during the evaluation rather than on patient report or other sources.
The individual should be seated comfortably in a quiet environment. The clinician should audio-record the individual’s performance on three tasks: vowels, sentences, and conversational speech. Standard recording procedures should be used that incorporate a condenser microphone placed at an azimuth of 45° from the front of the mouth and at a 4-cm microphone-to-mouth distance. Recordings should be made on a computer with at least 16 bits of resolution and a signal-sampling rate of no less than 20 kHz. This protocol applies to pediatric through adult ages; the requirement for participation is the ability to follow instructions and to read or repeat stimuli to produce voicing.
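The recording requirements above (at least 16 bits of resolution, sampling rate no less than 20 kHz) can be checked automatically before a session is archived. The following is a minimal sketch, assuming the recording is saved as an uncompressed WAV file; the function name `check_recording` is hypothetical and not part of the CAPE-V protocol.

```python
import wave

MIN_BITS = 16               # minimum resolution stated in the protocol
MIN_SAMPLE_RATE_HZ = 20_000  # minimum sampling rate stated in the protocol


def check_recording(path):
    """Return a list of problems found in a WAV file (empty if none).

    Hypothetical helper: only the two numeric thresholds come from the
    protocol; the WAV file format is an assumption of this sketch.
    """
    with wave.open(path, "rb") as wav:
        bits = wav.getsampwidth() * 8   # sample width is reported in bytes
        rate = wav.getframerate()       # frames (samples per channel) per second

    problems = []
    if bits < MIN_BITS:
        problems.append(f"resolution is {bits} bits; at least {MIN_BITS} expected")
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(
            f"sampling rate is {rate} Hz; at least {MIN_SAMPLE_RATE_HZ} expected"
        )
    return problems
```

An empty list means the file meets both thresholds. Microphone type, angle, and distance cannot be verified from the file and still need to be checked at the time of recording.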
Refer to CAPE-V form as you administer the following steps.
Task 1: Sustained vowels. Two vowels were selected for this task. One is considered a lax vowel (/a/) and the other tense (/i/). In addition, the vowel /i/ is the sustained vowel used during videostroboscopy. Thus, the use of this vowel during this task offers an auditory comparison to that produced during a stroboscopic exam.
The clinician should say to the individual, “The first task is to say the sound, /a/. Hold it as steady as you can, in your typical voice, until I ask you to stop.” (The clinician may provide a model of this task, if necessary.) The individual performs this task three times for 3 to 5 seconds each time. “Next, say the sound, /i/. Hold it as steady as you can, in your typical voice, until I ask you to stop.” The individual performs this task three times for 3 to 5 seconds each time.
Task 2: Sentences. Six sentences were designed to elicit various laryngeal behaviors and clinical signs. The first sentence provides production of every vowel sound in the English language, the second sentence emphasizes easy onset with the /h/, the third sentence is all voiced, the fourth sentence elicits hard glottal attack, the fifth sentence incorporates nasal sounds, and the final sentence is weighted with voiceless plosive sounds.
The clinician should give the person being evaluated flash cards, which progressively show the target sentences (see below) one at a time.
The clinician says, “Please read the following sentences one at a time, as if you were speaking to somebody in a real conversation.” (Individual performs task, producing one exemplar of each sentence.) If the individual has difficulty reading, the clinician may ask him or her to repeat sentences after verbal examples. This should be noted on the CAPE-V form. The sentences are: (a) The blue spot is on the key again; (b) How hard did he hit him? (c) We were away a year ago; (d) We eat eggs every Easter; (e) My mama makes lemon jam; and (f) Peter will keep at the peak.
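For electronic administration, the six sentences and the behavior each was designed to elicit can be kept together in one small table. This is only an illustrative sketch: the names `CAPE_V_SENTENCES` and `flash_cards` are hypothetical, and the short target descriptions paraphrase the Task 2 description above.

```python
# (label, sentence, what the sentence was designed to elicit)
CAPE_V_SENTENCES = [
    ("a", "The blue spot is on the key again.", "every English vowel sound"),
    ("b", "How hard did he hit him?", "easy onset with /h/"),
    ("c", "We were away a year ago.", "all voiced"),
    ("d", "We eat eggs every Easter.", "hard glottal attack"),
    ("e", "My mama makes lemon jam.", "nasal sounds"),
    ("f", "Peter will keep at the peak.", "voiceless plosive sounds"),
]


def flash_cards():
    """Yield the text of each flash card, one sentence at a time, in order."""
    for label, sentence, _target in CAPE_V_SENTENCES:
        yield f"({label}) {sentence}"


if __name__ == "__main__":
    for card in flash_cards():
        print(card)
```

Keeping the label with each sentence makes it straightforward to record sentence-level discrepancies later using labels such as 2(a) or 2(d).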
Task 3: Running speech. The clinician should elicit at least 20 seconds of natural conversational speech using standard interview questions such as “Tell me about your voice problem” or “Tell me how your voice is functioning.”
Although the PDF scale is accurate, printer configurations vary. Please verify that your paper copy has accurate 100-mm lines before reproducing the CAPE-V form.
The clinician should have the individual perform all voice tasks (vowel prolongation, sentence production, and running speech) before completing the CAPE-V form. If performance is uniform across all tasks, the clinician should mark a single rating on each scale to indicate overall performance. If the clinician notes a discrepancy in performance across tasks, he or she should rate performance on each task separately on the same line. Only one CAPE-V form is used per individual being evaluated.
In the case of discrepancies across tasks, tick marks should be labeled with the task number. Tick marks reflecting vowel prolongation should be labeled #1 (see form). Tick marks reflecting sentence production should be labeled #2. Tick marks reflecting running speech (i.e., spontaneous speaking) should be labeled #3. In the rare event that the clinician perceives discrepancies within task type (e.g., /a/ vs. /i/), he or she may further label the ratings accordingly, such as 1/a/ versus 1/i/ to reflect the different vowels, or 2(a), 2(b), 2(c), 2(d), 2(e), or 2(f) for the different sentences. Unlabeled tick marks indicate uniform performance. See examples below. (Note: Using labels to indicate discrepancies/variation across tasks in the severity of an attribute is different from indicating that an attribute is displayed intermittently [I]. If an attribute is judged to have equal severity whenever it appears, but it is not present all the time, “I” should be circled to indicate that the attribute is intermittent, and no additional labeling needs to be done.)
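When ratings are stored electronically, the labeling rule can be sketched as follows for a single attribute. This is an assumption-laden illustration, not part of the protocol: the function `tick_marks` and the task keys are hypothetical, and the sketch treats numerically identical ratings as "uniform", whereas on the paper form uniformity is the clinician's perceptual judgment.

```python
# Task numbers follow the order of the three tasks in the protocol.
TASK_LABELS = {"vowels": "#1", "sentences": "#2", "running_speech": "#3"}


def tick_marks(ratings_mm):
    """Return (label, millimeters) tick marks for one attribute.

    ratings_mm maps a task key from TASK_LABELS to that task's rating in mm.
    A single unlabeled mark (empty label) is returned when all ratings agree;
    otherwise one labeled mark per task is returned.
    """
    distinct = set(ratings_mm.values())
    if len(distinct) == 1:
        return [("", distinct.pop())]
    return [(TASK_LABELS[task], mm) for task, mm in ratings_mm.items()]


if __name__ == "__main__":
    print(tick_marks({"vowels": 40, "sentences": 40, "running_speech": 40}))
    print(tick_marks({"vowels": 30, "sentences": 45, "running_speech": 45}))
```

Intermittency is a separate judgment and would be stored as its own flag per attribute rather than encoded in these labels.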
After the clinician has completed all ratings, he or she should measure ratings from each scale. To do so, he or she should physically measure the distance in millimeters from the left end of the scale. The millimeters score should be written in the blank space to the far right of the scale, thereby relating the results in a proportion to the total 100-mm length of the line. The results can be reported in two possible ways. First, results can indicate distance in millimeters to describe the degree of deviancy; for example, “73/100” on “strain.” Second, results can be reported using descriptive labels that are typically employed clinically to indicate the general amount of deviancy; for example, “moderate-to-severe” on “strain.”
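The measurement step amounts to expressing the tick-mark position as a proportion of the line length. The sketch below assumes hypothetical helper names (`cape_v_score`, `report`). The `printed_line_mm` parameter is also an assumption: the protocol asks for an accurate 100-mm line, and rescaling a misprinted line is shown here only as a fallback illustration. The descriptive label is supplied by the clinician; no numeric cutoffs are implied.

```python
def cape_v_score(mark_mm, printed_line_mm=100.0):
    """Convert a measured tick-mark distance to a score out of 100.

    mark_mm is the distance from the left end of the scale, in millimeters.
    printed_line_mm is the measured length of the printed line (100 mm on
    an accurately printed form).
    """
    if printed_line_mm <= 0:
        raise ValueError("printed line length must be positive")
    if not 0 <= mark_mm <= printed_line_mm:
        raise ValueError("tick mark must lie on the line")
    return round(mark_mm / printed_line_mm * 100)


def report(attribute, score, label=None):
    """Format a result numerically, optionally with a descriptive label."""
    text = f"{score}/100 on {attribute}"
    if label:
        text += f" ({label})"
    return text


if __name__ == "__main__":
    score = cape_v_score(73)
    print(report("strain", score, "moderate-to-severe"))
```

Reporting both the number and the clinician's descriptive label in one string mirrors the two reporting forms described above.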
We strongly suggest using both forms of reporting. It is strongly recommended that for all rating sessions following the initial one, the clinician have a paper or electronic copy of the previous CAPE-V ratings available for comparison purposes. He or she should also rate subsequent examinations based on direct comparisons between earlier and current audio recordings. Such an approach should optimize the internal consistency/reliability of repeated sequential ratings within a patient, particularly for purposes of assessing treatment outcomes. Although difficult, clinicians are encouraged to make every effort to minimize bias in all ratings.
Personnel and Training Required
The protocol should be administered by an individual trained to evaluate speech disorders. A clinician or specialist is required to assess abnormal voice function.
To record the session for comparison at later times, audio recordings require a condenser microphone, pre-amplifier, and laptop computer.
|Specialized requirements for biospecimen collection||No|
|Average time of greater than 15 minutes in an unaffected individual||No|
Mode of Administration
Child, Adolescent, Adult
All age groups (with the minimum requirement of being able to follow instructions and cooperate)
The Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) protocol was developed as a result of a consensus expert meeting in 2002, which was sponsored by the American Speech-Language-Hearing Association (ASHA), and has been used by clinicians and researchers since that time to evaluate voice characteristics.
Process and Review
The Expert Review Panel #7 (ERP 7) reviewed the measures in the Speech and Hearing domain.
Guidance from ERP 7 includes the following:
- Added a new measure
- Created a new Data Dictionary
Protocol Name from Source
Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V)
Kempster, G. B., Gerratt, B. R., Verdolini Abbott, K., Barkmeier-Kraemer, J., & Hillman, R. E. (2009). Consensus auditory-perceptual evaluation of voice: Development of a standardized clinical protocol. American Journal of Speech-Language Pathology, 18(2), 124–132.
Posted with permission from the American Speech-Language-Hearing Association (ASHA).
Awan, S. N., Solomon, N. P., Helou, L. B., & Stojadinovic, A. (2013). Spectral-cepstral estimation of dysphonia severity: External validation. Annals of Otology, Rhinology, and Laryngology, 122(1), 40–48.
Kelchner, L. N., Brehm, S. B., Weinrich, B., Middendorf, J., deAlarcon, A., Levin, L., & Elluru, R. (2010). Perceptual evaluation of severe pediatric voice disorders: Rater reliability using the consensus auditory perceptual evaluation of voice. Journal of Voice, 24(4), 441–449.
Nemr, K., Simoes-Zenari, M., Cordeiro, G. F., Tsuji, D., Ogawa, A. I., Ubrig, M. T., & Menezes, M. H. (2012). GRBAS and CAPE-V scales: High reliability and consensus when applied at different times. Journal of Voice, 26(6), 812.e17–22.
Sandage, M. J., Plexico, L. W., & Schiwitz, A. (2015). Clinical utility of CAPE-V sentences for determination of speaking fundamental frequency. Journal of Voice, 29(4), 441–445.
Solomon, N. P., Helou, L. B., & Stojadinovic, A. (2011). Clinical versus laboratory ratings of voice using the CAPE-V. Journal of Voice, 25(1), e7–14.
Watts, C. R. (2015). The effect of CAPE-V sentences on cepstral/spectral acoustic measures in dysphonic speakers. Folia Phoniatrica et Logopaedica, 67(1), 15–20.
Zraick, R. I., Kempster, G. B., Connor, N. P., Thibeault, S., Klaben, B. K., Bursac, Z., Thrush, C. R., & Glaze, L. E. (2011). Establishing validity of the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V). American Journal of Speech-Language Pathology, 20(1), 14–22.
Clinical Research Examples
Hseu, A., Ayele, N., Kawai, K., Woodnorth, G., & Nuss, R. (2018). Voice abnormalities and laryngeal pathology in preterm children. Annals of Otology, Rhinology, and Laryngology, 127(8), 508–513.
Lee, S. J., Hoi, H. S., & Kim, H. (2018). A comparison of voice activity and participation profiles among etiological groups. Journal of Voice, pii: S0892-1997(18)30112-7. doi: 10.1016/j.jvoice.2018.04.016. [Epub ahead of print]
McLaughlin, C. W., Swendseid, B., Courey, M. S., Schneider, S., Gartner-Schmidt, J. L., & Yung, K. C. (2018). Long-term outcomes in unilateral vocal fold paralysis patients. Laryngoscope, 128(2), 430–436.
Schwarz, K., Fontanari, A. M. V., Costa, A. B., Soll, B. M. B., da Silva, D. C., de Sa Villas-Boas, A. P., Cielo, C. A., Bastilha, G. R., Ribeiro, V. V., Dorfman, M. E. K. Y., & Lobato, M. I. R. (2018). Perceptual-auditory and acoustical analysis of the voices of transgender women. Journal of Voice, 32(5), 602–608.
|Variable Name||Variable ID||Variable Description||dbGaP Mapping|
|PX201701060000||Did the clinician elicit at least 20 seconds ...||N/A|
|PX201701050000||Was the subject provided flash cards one at ...||N/A|
|PX201701020000||Was the vowel 'i' used during videostroboscopy?||N/A|
|PX201701030100||Did the subject vocalize the lax vowel ...||N/A|
|PX201701030200||Was this repeated three times?||N/A|
|PX201701040100||Did the subject vocalize the tense vowel ...||N/A|
|PX201701040200||Was this repeated three times?||N/A|
|PX201701010000||Were two vowels (lax and tense) selected for ...||N/A|
Auditory-Perceptual Evaluation of Voice
June 4, 2019
An assessment of voice quality based on observations of the auditory and perceptual features of an individual.
Clinicians and researchers can utilize this standardized tool and visual analog scale to assess the voice quality of an individual.
auditory-perceptual evaluation of voice, Consensus Auditory-Perceptual Evaluation of Voice, CAPE-V, Voice, speech and hearing
|Protocol ID||Protocol Name|
|201701||Auditory-Perceptual Evaluation of Voice|
There are no publications listed for this protocol.