Responsiveness analysis: a psychometric consideration in evaluating patient-reported outcome measures

There is a growing need for meaningful interpretation of patient-reported outcomes following interventions, especially health-related quality of life (HRQoL) outcomes. Application of relevant and psychometrically sound instruments to measure such outcomes in research and practice assists patients, their family members, clinicians and policy makers in understanding the impact of an intervention on the respective outcomes (1).


Introduction
When the psychometric properties of an instrument, namely reliability and validity, are established, it is considered appropriate for research use. However, when an instrument is used to evaluate change in a variable, it should also be sensitive, or responsive, enough to detect that change when patients improve or deteriorate. Responsiveness, or sensitivity to change, refers to an instrument's ability to accurately detect meaningful change when it occurs (2)(3). Some researchers regard responsiveness as an aspect of validity rather than a separate psychometric dimension (4).

Measurement of responsiveness
Measurement of the statistical significance of a change score, whether between an intervention group and a control group or within the same group across pre- and post-intervention measurements, depends on the sample size of the study. With large samples, trivial differences may become statistically significant, while with small samples, large differences may fail to reach statistical significance. As such, there is growing concern that assessment of change following an intervention should not depend solely on statistical significance. To estimate the magnitude of the difference between change scores in the two groups, the difference between mean change scores is expressed in standard deviation units as the effect size index (ES) (3). Effect sizes should therefore be reported together with the appropriate statistical test and p values when the number of observations required to detect change has not been estimated with a power analysis. Effect size analysis should be taken as a supplement to statistical testing, not as a substitute for it (5-6).

There are two broad methods for measuring responsiveness, namely distribution-based methods and anchor-based methods (7-8).

Distribution-based methods
In this method, the observed change is compared with the statistical properties of the sample, which quantify variation attributable to chance (8)(9)(10)(11). These methods are further classified into group-level and individual-level analyses.

Group level analysis
• Cohen's d Effect Size (ES)
Cohen's d ES is calculated by dividing the mean difference of the paired measurements of the group by the pooled standard deviation (SD) (3).
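As an illustration, the group-level calculation above can be sketched in Python. The data and function name here are hypothetical, not drawn from the cited sources:

```python
from statistics import mean, variance

def cohens_d_paired(pre, post):
    """Group-level Cohen's d: mean paired change divided by the pooled SD."""
    changes = [b - a for a, b in zip(pre, post)]
    # Pooled SD of the two measurement occasions (equal n, so the
    # average of the two sample variances is used).
    pooled_sd = ((variance(pre) + variance(post)) / 2) ** 0.5
    return mean(changes) / pooled_sd

# Hypothetical pre- and post-intervention HRQoL scores for six patients
pre = [42, 55, 48, 60, 51, 47]
post = [50, 60, 55, 66, 58, 52]
d = cohens_d_paired(pre, post)
```

A d of about 0.2 is conventionally read as small, 0.5 as moderate, and 0.8 as large.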

• Standardized Response Mean (SRM)
The SRM is calculated by dividing the difference between the mean pre- and post-intervention scores by the SD of the change scores (13). Confidence intervals are calculated by assuming a normal distribution (14) using the substitution method (15). The probability of change statistic (P) is used to interpret the SRM. The P statistic denotes the probability that the instrument detects a change, representing the proportion of subjects whose scores have changed; it ranges from 0.5 (no ability to detect change) to 1 (perfect ability to detect change) (14,16).
Approximation of the CI of the SRM = SRM ± z(1-α/2) × √V (14). The probability of change statistic of the SRM is obtained from the standard normal (Z) distribution table.
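A minimal Python sketch of the SRM, its approximate CI, and the P statistic follows. The sampling variance V of the SRM is not defined in the text, so a common large-sample approximation, V ≈ 1/n + SRM²/(2n), is assumed here purely for illustration; the data are invented:

```python
from statistics import NormalDist, mean, stdev

def srm_with_ci(pre, post, alpha=0.05):
    """SRM = mean change / SD of change, with an approximate CI and P statistic."""
    changes = [b - a for a, b in zip(pre, post)]
    n = len(changes)
    srm = mean(changes) / stdev(changes)
    # ASSUMPTION: V is not given in the text; a common large-sample
    # approximation for a standardized mean is used here.
    v = 1 / n + srm ** 2 / (2 * n)
    z = NormalDist().inv_cdf(1 - alpha / 2)        # z_(1-alpha/2)
    ci = (srm - z * v ** 0.5, srm + z * v ** 0.5)
    # P statistic: area under the standard normal curve below |SRM|,
    # ranging from 0.5 (no ability) to 1 (perfect ability to detect change).
    p_change = NormalDist().cdf(abs(srm))
    return srm, ci, p_change

pre = [42, 55, 48, 60, 51, 47]
post = [50, 60, 55, 66, 58, 52]
srm, ci, p = srm_with_ci(pre, post)
```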

Individual level analysis
• Cohen's d ES (Individual)
Cohen's d ES is calculated for each participant individually by dividing the difference between that participant's pre- and post-assessment scores on the instrument by the pooled SD. The result is then categorized as ES < -0.5 (large decline), -0.5 to 0.5 (small or medium effect), or ES > 0.5 (large improvement). A cut-off value of ES > 0.5 is recommended as the threshold for meaningful change, and patients with a meaningful improvement (ES greater than 0.5) are presented as a proportion (11).
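The individual-level classification above can be sketched as follows; the scores and function name are hypothetical:

```python
from statistics import variance

def classify_individual_es(pre, post):
    """Per-patient Cohen's d (change / pooled SD), mapped to the three categories."""
    pooled_sd = ((variance(pre) + variance(post)) / 2) ** 0.5
    categories = []
    for a, b in zip(pre, post):
        es = (b - a) / pooled_sd
        if es > 0.5:
            categories.append("large improvement")
        elif es < -0.5:
            categories.append("large decline")
        else:
            categories.append("small or medium effect")
    return categories

pre = [42, 55, 48, 60, 51, 47]
post = [50, 60, 55, 66, 58, 52]
cats = classify_individual_es(pre, post)
# Proportion of patients with meaningful improvement (ES > 0.5)
prop_improved = cats.count("large improvement") / len(cats)
```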

• Standard Error of Measurement (SEM)
The SEM is constructed by multiplying the SD of the baseline score by the square root of one minus the reliability of the instrument. Either test-retest reliability or internal consistency reliability (Cronbach's alpha) is used to construct the SEM (17)(18). Different thresholds ranging from 1 to 2.77 times the SEM have been proposed for considering individual-level change statistically meaningful (17)(18), and the proportion of individuals with a change score greater than the chosen cut-off point is the proportion with statistically meaningful improvement. For a measure to be considered responsive, more than 2.5% of the sample should show an increase, and more than 2.5% a decrease, in change score greater than 2.77×SEM or 1×SEM, depending on the threshold adopted (8).

Anchor-based methods
In this method, the observed change is related to an external indicator or criterion of change (7). It compares changes in patient-reported outcome scores with other clinically meaningful markers, or anchors. Anchor-based methods are of two types.

Longitudinal approach
Under the longitudinal approach, the correlation of either the total scores or the change scores of the instrument under study (the instrument for which responsiveness is assessed) with external standards or criteria is assessed longitudinally at each time point of the intervention (7). Most studies recommend 0.3 to 0.35 as the correlation threshold defining an acceptable association between an anchor and a patient-reported outcome change score (1).

Cross-sectional approach
The cross-sectional approach involves comparing groups that differ on a disease-related criterion to determine clinically meaningful change against anchor criteria. Accordingly, analysis is carried out to assess whether a significant mean difference on the instrument under study (the instrument for which responsiveness is assessed) exists between patients with and without the disease, and between adjacent disease categories (no disease versus mild disease, mild versus moderate disease, etc., e.g. for depression) (7). In other words, the change in patient-reported outcome score among people who moved from mild disease to no disease, from moderate to mild disease, or from severe to moderate disease after the intervention could be considered the minimum important difference (MID) (19). The MID, or minimal clinically important difference (MCID), is the smallest change in a treatment outcome that an individual patient would identify as important and that would indicate a change in the patient's management (20)(21).
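One way to sketch this MID estimation is to average the change scores of patients who shifted exactly one adjacent severity category; the records below are invented:

```python
from statistics import mean

# Hypothetical (change score, moved one severity category better) pairs,
# e.g. moderate -> mild depression after the intervention
records = [
    (6, True), (5, True), (7, True), (4, True),
    (1, False), (2, False), (0, False), (3, False),
]

# MID estimate: mean change score among patients who shifted exactly one
# adjacent severity category on the anchor criterion
mid = mean(change for change, moved in records if moved)
```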
Anchor-based approaches are often easier for a clinical audience to interpret than distribution-based approaches. However, the external anchor chosen must itself be a valid measure of clinical change (22).