Learning Outcomes: As a result of this activity, the reader will be able to (1) discuss practical factors to consider when choosing a discourse outcome measure to use in treatment; (2) discuss psychometric properties to consider when choosing a discourse outcome measure to use in treatment; and (3) explain why it is important to know about an outcome measure's stability before using it to measure treatment-related change.
The desired outcome of therapy for people with stroke-induced aphasia is an improvement in communication ability. Ideally, this improvement should have a noticeable impact on a person's everyday communication activities. Because discourse activities are the most common communication activities for adults with aphasia,[1] it is not surprising that discourse is increasingly a target of clinical treatment and research.[2][3][4][5][6]
There have been calls by some researchers to establish a core set of discourse outcome measures to be used across studies,[5] but to date no consensus has been reached. Because there is no agreed-upon core set of discourse outcome measures, researchers often develop new measures tailored to the aims of their particular studies.[5] One result has been the proliferation of discourse outcome measures: Bryant et al identified 536 different discourse outcome measures in the research literature, and Pritchard and her colleagues identified 58 measures that focused solely on the information content of discourse.[2][6] Bryant et al speculated that the existence of such a large number of measures might cause confusion regarding the selection of appropriate measures for individual clients.[3] The results of their survey confirmed that clinicians do indeed view the selection of appropriate discourse outcome measures for specific clients as a barrier to using discourse analysis in their practice.
It is beyond the scope of this article to offer guidance about each of the hundreds of discourse outcome measures that have been reported.[2][6] Speech-language pathologists can learn about discourse outcome measures by reviewing the professional literature, participating in professional continuing education offerings, and reviewing materials available from publishers of assessment and treatment materials. This article aims to provide a general process that clinicians can use to determine whether a particular discourse outcome measure might be a reasonable choice for a particular client. The process involves principles of evidence-based practice, the model of health and disability provided by the International Classification of Functioning, Disability, and Health (ICF),[7] and psychometric properties of the measure under consideration.
Let us start with the assumption that the clinician has followed the principles of evidence-based practice to integrate the needs and perspectives of the client and/or other recipient of the treatment (such as an important communication partner) with assessment data concerning the client's aphasia and its effect on communication. Using this information, the clinician has developed a treatment plan to achieve client-centered goals, choosing a treatment approach that is within his or her clinical expertise to carry out and that is acceptable to the client and other treatment participants. Let us also assume that the clinician is aware of the external scientific evidence associated with each treatment option and has chosen the one with the highest level of evidence that is also compatible with the client's preferences and the clinician's own expertise.[8]
A Process for Choosing a Discourse Outcome Measure
Once a treatment plan to achieve the client's goals has been formulated, the clinician can consider a series of questions (see [Table 1]) to determine which discourse outcome measure or measures might be best for a particular client.
Table 1. A Process for Choosing a Discourse Outcome Measure

Questions related to the client, treatment, and work setting
1. What aspect or level of discourse are you expecting the treatment to improve?
   a. Microstructure
   b. Macrostructure
   c. Does the discourse genre of the outcome measure match the genre that you plan to use in treatment?
2. Do you expect that improving discourse might result in changes in activity, participation, or quality of life?
   a. Consider aphasia-specific patient-reported outcome measures.
   b. Does your client have the family/social support crucial for change at this level?
3. Can you implement the outcome measure in your workplace?
   a. Do you have access to the materials necessary to administer the outcome measure?
   b. Do you have time in your workday to analyze the discourse according to the outcome measure's protocol?
4. Is there evidence that the discourse outcome measure is relevant for people similar to your client?
   a. Has the measure been used with people who have aphasia?
   b. Has the measure been used with people whose aphasia is similar in severity and type to your client's aphasia?

Questions related to the psychometric properties of the discourse outcome measure
1. Is there evidence concerning the scoring reliability of the outcome measure?
   a. Are there reports that intra-rater and inter-rater reliability with the outcome measure is 0.70 or better?
2. Is there evidence concerning the stability (test–retest reliability) of the outcome measure?
   a. Is the test–retest reliability coefficient at least 0.90?
   b. Is the standard error of measurement (SEM) reported?
      i. If so, add and subtract the SEM from your client's posttreatment score. If the resulting range does not include your client's pretreatment score, it is likely that treatment has truly changed your client's performance.
   c. Is the minimal detectable change (MDC) value reported?
      i. If so, compare how much your client's score changed on the measure from pre- to posttreatment. If this change is equal to or larger than the MDC value, it is likely that treatment has truly changed your client's performance.
3. Is there evidence that the outcome measure is responsive to change?
   a. If an effect size or an MDC is reported, you can use this to gauge whether your client's change on the outcome measure is likely to reflect treatment-related change.
Questions Related to the Client, Treatment, and Work Setting
1. What aspect or level of discourse are you expecting the treatment to improve? Because the primary purpose of an outcome measure is to demonstrate improvement following treatment, the outcome measure should be aligned with the focus of treatment. Some treatments concentrate on improving the microstructural level of discourse, such as words, phrases, clauses, or sentences, thus emphasizing the lexical, semantic, or syntactic aspects of language.[6][9] So, for example, if the treatment is focused on improving word-retrieval ability, an outcome measure that would show improvement in that area, such as an increase in the number of words produced per minute,[10] a reduction in the occurrence of word-finding behaviors,[11] or an improvement in the percentage of words that convey accurate, relevant information,[10] might be a logical choice. If the treatment is focused on improving utterance production, then an outcome measure that could demonstrate an increase in the percentage of complete utterances[12] or in the number of embedded clauses per utterance[13] might be considered, depending on the exact goal(s) of the treatment.
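To make the arithmetic behind these word-retrieval measures concrete, here is a minimal sketch with hypothetical numbers. Note that deciding which words convey accurate, relevant information remains a clinical judgment that must be made before the percentage can be computed:

```python
# Hypothetical values from a 3-minute discourse sample.
total_words = 270        # all words produced in the sample
informative_words = 162  # words judged by the clinician to be accurate and relevant
duration_min = 3.0       # duration of the sample in minutes

words_per_minute = total_words / duration_min
percent_informative = 100 * informative_words / total_words

print(f"Words per minute: {words_per_minute:.0f}")        # 90
print(f"Informative words: {percent_informative:.0f}%")   # 60%
```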
The macrostructural level of discourse is concerned with its overall meaning, with the way meaning is organized within the discourse, and with its social or interpersonal purpose.[6][9] If treatment is expected to improve discourse production at the macrostructural level, then outcome measures focused on the adequacy of cohesive ties between utterances,[14] elements of story grammar,[15][16] or turn-taking interchanges[17] might be expected to reveal improvement, depending on the specific focus of the treatment.
Another aspect that will influence the choice of an outcome measure is the genre of discourse that is targeted in treatment. Genre refers to different ways of using language for a particular purpose that are shared within a culture. Different discourse genres are marked by different words and structures.[9] Some discourse outcome measures are specific to a particular genre of discourse (e.g., story grammar is used to analyze narrative discourse), whereas other outcome measures might be used across discourse genres (e.g., adequacy of cohesive ties; the number of words produced per minute). Although some outcome measures can be used across genres, you cannot necessarily make valid comparisons about performance on one of those measures across genres because of different requirements from one genre to another. For example, although increasing the number of words produced per minute might be a desired outcome in a story retell (because such an increase could provide more information more efficiently), increasing the number of words produced per minute in a procedural discourse might result in unnecessarily long, complicated, or rapid instructions, which would not be positive changes.
Finally, some discourse outcome measures were specifically designed to be used with particular elicitation stimuli and cannot be applied when discourses are elicited with other stimuli. For example, main concepts analysis, which analyzes both microstructural (specific words) and macrostructural (concepts essential to convey the gist of a story) discourse elements, was developed by Nicholas and Brookshire to be used with a specific set of elicitation stimuli.[18] Richardson and Dalton developed a different list of main concepts for use with a different set of elicitation stimuli.[19][20] Because the lists of main concepts in these two analysis schemes contain vocabulary that is specific to each stimulus set, they cannot be used with stimuli other than those for which they were developed. Obviously, in these cases, the clinician must have access to the elicitation stimuli that were used to develop the main concepts list.
2. Do you expect that improving discourse might result in changes at the level of activity, participation, or quality of life? Thus far, the emphasis has been on outcome measures that might demonstrate change at the body function, or impairment, level.[7] Barak and Duncan stated that “measurement of recovery at just one level gives only a partial picture of the recovery process.”[21] Kagan and colleagues pointed out that we should also try to capture the ways that changes in the impairment lead to changes in participation, confidence, or quality of life.[22] They developed a framework, based on the ICF model, to illustrate this idea in aphasia. The overlapping circles of the Living with Aphasia: Framework for Outcome Measurement (A-FROM; [Fig. 1]) indicate that working in one domain of the model is likely to impact other domains of the model. For example, improving a client's ability to convey information in discourse could increase participation in conversations with family members and friends. It might also increase the client's confidence during conversations.
Figure 1 Living with aphasia: framework for outcome measurement (A-FROM). Reprinted with permission from the Aphasia Institute.
Outcome measures that can capture changes in activities, participation, attitudes, and quality of life are usually patient-reported outcome measures (PROMs), meaning that the client (or patient) completes the outcome tool. There are a variety of PROMs that have been developed specifically for people with aphasia, including the Aphasia Communication Outcome Measure (ACOM),[23] the Assessment for Living with Aphasia—second edition (ALA-2),[24] the Stroke and Aphasia Quality of Life Scale-39 (SAQOL-39),[25] and the Communication Confidence Rating Scale for Aphasia (CCRSA).[26] Including an outcome measure that assesses change at the activity and participation level can highlight the way that treatment has an impact on real-life activities and situations. However, it is important to consider a client's individual situation when deciding whether a particular PROM might capture treatment-related changes. For example, the presence of family or social support can influence whether impairment-level changes translate into changes at the activity or participation level. That is, a client's discourse production might improve in treatment, but without someone at home or in the community to talk to daily, that improvement might not result in improved activities or participation for that client.[21]
3. Can you implement the outcome measure in your work setting? Some practical factors should be considered as you choose a discourse outcome measure. These include ability to access the materials necessary to complete the measure and availability of time to elicit and analyze the discourse sample. In terms of accessing the materials, you should determine whether you or your employer has the funds to purchase the necessary materials if the measure is not freely available. Likewise, think about whether you can complete the procedures to elicit and analyze the discourse in the time allowed for a diagnostic session in your work setting. Some discourse outcome measures require that the discourse be recorded and transcribed before it can be analyzed, and this may take more time than most clinicians have in their daily schedules. For other measures, the analysis can be done during the elicitation of the discourse or while listening to a recording, so that transcribing the discourse is not required. For example, Hula and colleagues reported that there was good reliability between transcription-based scoring of the Story Retell Procedure and scoring from an audiorecording only, without transcription.[27] In this issue, Dalton et al review other non–transcription-based discourse analysis methods.[28]
4. Is there evidence that the discourse outcome measure is relevant for people similar to your client? There are several aspects of this question to consider. First, are there reports demonstrating that the outcome measure has been used successfully with people who have aphasia? Some discourse outcome measures might have been developed to assess discourse in people who sustained traumatic brain injuries or right-hemisphere cerebrovascular accidents. If that is the case, are there also studies that used the measure with people who have aphasia? Although all three groups have discourse impairments, the way that discourse is impaired differs markedly among them.[29] Choosing a measure that has been used to assess the discourse of people with aphasia improves the likelihood that the measure will be relevant for your client.
A second aspect of this question pertains to outcome measures that assess quality of life. Often, people with aphasia have been excluded from participating in research to assess quality of life following stroke.[25] This means that measures developed to assess how stroke affects quality of life may not include questions pertinent to aphasia and may be too linguistically difficult for people with aphasia to complete. Choosing a quality-of-life measure that was developed specifically for people with aphasia, like those mentioned earlier in this article, improves the chances that the measure will allow you to show change related to aphasia treatment.
Finally, if the discourse outcome measure has been used with people who have aphasia, how similar are those people to your client? If your client has severe aphasia but the participants in studies using the measure all had mild aphasia, it might not be the best choice for your client. If your client has Wernicke's aphasia but all of the participants in studies using the measure had agrammatic Broca's aphasia, you should think critically about whether it would provide you with useful information about a change in your client's fluent discourse production. It is the responsibility of the clinician rather than of the developers of the outcome measure to make judgments about the suitability of the measure for his or her own purpose or clients.[30]
Questions Related to the Psychometric Properties of the Discourse Outcome Measure
Clinicians need to be as concerned as researchers about the psychometric robustness of an outcome measure. Clinicians use outcome measures to establish pretreatment performance and to assess treatment-related change. We want to be confident that an increased score on an outcome measure reflects real change and is not a result of error in the measure. Unless an outcome measure's psychometric properties have been established, we cannot be confident about this.[2][4][6][21][30]
The development of a psychometrically sound measure is a long process and may involve more than one study.[30] For example, the CCRSA was introduced in a paper published in 2010 and its psychometric properties were established in two follow-up papers.[26][31][32] The Story Retell Procedure was introduced in a paper published in 1998 and various aspects of its psychometric properties were established in five subsequent publications.[27][33][34][35][36][37] It is important to be aware, however, that discourse outcome measures sometimes appear in refereed journals despite the fact that there is little or no information about their psychometric properties, either in the original paper describing the outcome measure or in subsequent papers.[6][38] For these reasons, clinicians should be aware of the existence or absence of information about the psychometric properties of an outcome measure.[30] The psychometric properties of an outcome measure most often identified as being important for deciding on its clinical use are reliability and responsiveness to change.[6][21][30]
1. Is there evidence concerning the scoring reliability of the outcome measure? Scoring reliability refers to whether an outcome measure can be scored consistently across scoring attempts and across scorers. It includes intra-rater and inter-rater reliability. Intra-rater reliability is the ability of one rater to score an outcome measure from an individual in the same way on two different occasions without referring back to the earlier scoring outcome. Inter-rater reliability is the ability of two raters to independently score a person's results in the same way. Intra-rater and inter-rater reliability are reported as reliability coefficients that range from 0 to 1. Generally, reliability coefficients less than 0.40 are considered weak, those between 0.40 and 0.70 moderate, and those above 0.70 strong.[39] Good scoring reliability (i.e., a reliability coefficient of 0.70 or better) is generally a reflection of a well-described, clear, and detailed protocol for administering and scoring the outcome measure. This contributes to confidence that the measure will be administered and scored consistently, thus minimizing measurement error.
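As a rough illustration of how agreement between two scorers might be checked, here is a minimal sketch with hypothetical scores. It uses a simple Pearson correlation as a stand-in for a reliability coefficient; published studies more often report an intraclass correlation coefficient or a kappa statistic, as discussed below:

```python
from scipy.stats import pearsonr

# Hypothetical scores assigned independently by two raters
# to the same ten discourse samples.
rater_a = [12, 15, 9, 20, 14, 11, 18, 7, 16, 13]
rater_b = [13, 14, 9, 19, 15, 10, 18, 8, 15, 12]

r, _ = pearsonr(rater_a, rater_b)
print(f"Inter-rater correlation: {r:.2f}")

# Interpret against the benchmarks cited in the text.
if r > 0.70:
    label = "strong"
elif r >= 0.40:
    label = "moderate"
else:
    label = "weak"
print(f"Scoring reliability is {label}.")
```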
2. Is there evidence concerning the stability of the outcome measure? The stability, also called “test–retest reliability” or “session-to-session stability,” of an outcome measure refers to whether it produces the same result on repeated applications when the person being assessed has not changed on the domain or behavior being measured.[40] It is important to establish the stability of an outcome measure to provide confidence that changes on the measure are related to treatment rather than to spurious, day-to-day variability inherent either in the measure itself or in the behavior that it is measuring, which is frequently more variable in clinical populations than in neurologically healthy individuals.[41][42] For example, the ability to retrieve words to produce discourse may vary from one day to the next because of variations in the speaker's physiologic and cognitive states—he or she may be more tired or more distracted on one day than another—and this variability, leading to a change in score, might be misinterpreted as change due to treatment. If we know the amount by which an outcome measure varies when there has been no change in the behavior being measured, then we know that a larger change is necessary to be confident that it is a true change that resulted from treatment.
Stability is reported as a reliability coefficient that ranges from 0 to 1. The statistic used to calculate the test–retest reliability coefficient depends on the kind of data being analyzed; for example, the weighted kappa statistic might be used for categorical data and the intraclass correlation coefficient for continuous data. Generally, values less than 0.50 indicate poor stability, values between 0.50 and 0.75 indicate moderate stability, values between 0.75 and 0.90 indicate good stability, and values above 0.90 indicate excellent stability.[43] Fitzpatrick and colleagues recommended that an outcome measure have a minimum test–retest reliability coefficient of 0.90 if it will be used to make clinical decisions about changes in an individual's performance, because confidence intervals around an individual's score are wide at reliability levels below 0.90.[40] Confidence intervals indicate the range of values within which an individual's true score lies at a particular level of confidence (e.g., with 90% confidence). Smaller confidence intervals indicate higher precision and less error.[30]
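The connection between the reliability coefficient and the width of those confidence intervals can be made explicit with a standard classical test theory relationship (a textbook formula, not one stated in the cited studies):

\[
\mathrm{SEM} = \mathrm{SD}\sqrt{1 - r}, \qquad
\text{95\% CI} = x_{\text{observed}} \pm 1.96 \times \mathrm{SEM}
\]

For a measure with a standard deviation of 10 points, a reliability of r = 0.90 yields an SEM of about 3.2 points and a 95% confidence interval of roughly ±6.2 points around an observed score, whereas r = 0.70 yields an SEM of about 5.5 points and an interval of roughly ±10.7 points. This illustrates why Fitzpatrick and colleagues set the bar at 0.90 for individual-level decisions.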
Two other values are very useful when deciding whether a client has truly changed on an outcome measure. Donoghue and Stokes[44] suggested that, for clinical decisions, it is better to use the standard error of measurement (SEM), which indicates how much a score varies randomly on repeated measurements, than the value of the reliability coefficient. The SEM can be used to calculate the minimal detectable change (MDC) value. The MDC estimates the change in score on an outcome measure necessary to be confident that the change is a real one and not simply a reflection of measurement error.[45] Studies that report the MDC provide clinicians with a means to determine objectively whether a client's change in score is large enough to be considered a true improvement rather than random day-to-day variability of the behavior. If a client's score changes by an amount equal to or greater than the MDC reported for the outcome measure, a clinician can be fairly confident that treatment, rather than random variability, is responsible for the change.
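The following minimal sketch shows how these values can be applied. The SEM value and client scores are hypothetical, and the MDC formula is one common formulation (at the 95% confidence level) rather than the only one in use:

```python
import math

def mdc95(sem: float) -> float:
    """Minimal detectable change at 95% confidence:
    MDC95 = 1.96 * sqrt(2) * SEM, where sqrt(2) reflects error in
    both the pre- and posttreatment measurements."""
    return 1.96 * math.sqrt(2) * sem

# Hypothetical: a published SEM of 3.1 points, and a client whose
# score improved from 42 to 51.
sem = 3.1
pre, post = 42, 51

print(f"MDC95 = {mdc95(sem):.1f} points")
if post - pre >= mdc95(sem):
    print("Change exceeds the MDC95: likely a true, treatment-related change.")
else:
    print("Change is within measurement error; day-to-day variability cannot be ruled out.")
```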
Occasionally, a study will report a value called the “minimally important difference” (MID) or the “minimal clinically important difference” (MCID). This represents the smallest change on an outcome measure that would be considered important by a client or a clinician. Although this sounds like it would be a useful value, there is no standard method for deriving an MCID, and this has led to problems in interpreting such values.[21][46] Establishing agreed-upon methods for deriving and interpreting the MCID is an active area of stroke outcomes research.[21]
In summary, when reviewing an outcome measure that has been reported as part of a group study, a clinician can use the MDC value to assess whether a client's change in score represents real change. If the MDC value is not reported but the SEM is, a clinician can add and subtract the SEM from a client's posttreatment score to get a range that includes the client's true performance. If the range of scores obtained in this way does not include the client's pretreatment score on the outcome measure, then it is likely that the client has truly changed on the behavior in question. If neither the MDC nor the SEM is reported, but the study reports that the test–retest reliability coefficient for the outcome measure is greater than 0.90, the clinician can assume that the measurement error is probably small, but still has no way of knowing objectively whether the client's change in score represents true change.
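That fallback logic can be summarized in a short sketch (the function name and all values are illustrative, not from any published tool):

```python
from typing import Optional

def interpret_change(pre: float, post: float,
                     mdc: Optional[float] = None,
                     sem: Optional[float] = None,
                     retest_r: Optional[float] = None) -> str:
    """Prefer the MDC, then the SEM range check, then the
    test-retest reliability coefficient, as described above."""
    if mdc is not None:
        return ("likely a true, treatment-related change"
                if abs(post - pre) >= mdc else "within measurement error")
    if sem is not None:
        # Range around the posttreatment score; if the pretreatment
        # score falls outside it, the change is likely real.
        if not (post - sem <= pre <= post + sem):
            return "likely a true, treatment-related change"
        return "within measurement error"
    if retest_r is not None and retest_r > 0.90:
        return ("measurement error is probably small, but there is no "
                "objective criterion for judging true change")
    return "insufficient psychometric information to judge"

# Hypothetical client who improved from 42 to 51:
print(interpret_change(42, 51, mdc=8.6))
print(interpret_change(42, 51, sem=3.1))
print(interpret_change(42, 51, retest_r=0.93))
```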
3. Is there evidence that the outcome measure is responsive to change? Responsiveness is the sensitivity of the measure to change over time, and so may indicate the effects of treatment.[21] Responsiveness is a component of an outcome measure's validity, and it is important if the measure is going to be used to evaluate whether treatment caused a change in score on the measure.[30] The responsiveness of an outcome measure quantifies the magnitude of change, which is often reported as an effect size.[21][30] Effect size is a statistical calculation that provides information about the magnitude (e.g., small, medium, or large) of an effect. Calculated effect sizes can transform different scales of measurement to a common scale, so that they can be compared with each other. An outcome measure that has a large effect size would be considered more responsive to change than an outcome measure that has a medium or small effect size. The MDC, discussed earlier, could also be considered an indicator of an outcome measure's responsiveness.
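To illustrate, here is a minimal sketch computing one common effect size for hypothetical group data, interpreted against Cohen's conventional benchmarks of 0.2 (small), 0.5 (medium), and 0.8 (large). Note that several variants of the pre/post effect size exist, differing in the choice of denominator:

```python
import statistics

# Hypothetical pre- and posttreatment scores for ten participants.
pre  = [40, 35, 50, 42, 38, 45, 33, 47, 41, 39]
post = [48, 41, 55, 49, 44, 52, 40, 53, 47, 46]

mean_change = statistics.mean(b - a for a, b in zip(pre, post))
sd_pre = statistics.stdev(pre)  # one common denominator choice

d = mean_change / sd_pre
print(f"Effect size d = {d:.2f}")

if d >= 0.8:
    print("large effect")
elif d >= 0.5:
    print("medium effect")
else:
    print("small or negligible effect")
```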