Introduction
The use of health outcomes instruments known as patient-reported outcomes (PROs) has
grown in hand surgery clinical research. One of the most widely used PRO instruments is
the DASH (Disabilities of the Arm, Shoulder and Hand). In 2005, Beaton et al.[1] compared
three item-reduction approaches (the concept-retention method, equidiscriminative
item-total correlation, and item response theory or Rasch modeling) to develop a shorter
version that reduces respondent burden while preserving the psychometric properties of
the instrument. The resulting QuickDASH is the shortened version of the DASH Outcome
Measure. It consists of 11 items, instead of 30, and measures physical function and symptoms
in individuals with one or more musculoskeletal disorders of the upper limb. Considering
that a Spanish version of the DASH was already available with a good level of reliability,
validity, and responsiveness,[2][3][4] a new Spanish version of the shorter instrument,
QuickDASH, was released and approved by the Institute for Work and Health (IWH), Ontario,
Canada (https://dash.iwh.on.ca/sites/dash/public/translations/QuickDASH_Spanish_Spain_2018.pd).
However, no new evidence has been reported on the psychometric properties of
the Spanish version of the QuickDASH instrument.
This paper's purpose was to assess the reliability, measurement error, construct validity,
and responsiveness of the Spanish version of the QuickDASH for outcomes assessment
in Carpal Tunnel Syndrome (CTS).
Methods
Study Population
The study population consisted of 40 consecutive patients (27 women), mean age 54.5
(SD 11.2) years ([Table 1 ]) with the diagnosis of CTS based on clinical and neurophysiological criteria, recruited
from the waiting list for CTS surgery in the National Health System in Tenerife, Spain.
All procedures performed in this study involving human participants were in accordance with the
ethical standards of the institutional national research committee of the University
Hospital of La Candelaria, School of Medicine, University of La Laguna, and with the
1964 Helsinki Declaration and its later amendments or comparable ethical standards.
The ethics committee reviewed and approved this study (Protocol PI-11/16). Written
informed consent was obtained from all individual participants included in the study.
Table 1
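| Variable | Statistic or category | Total (N = 40) |
| Age | Mean (SD) | 54.48 (11.18) |
| Age | Median (Q1, Q3) | 50.5 (47.5, 59.5) |
| CTS-AL_pre | Mean (SD) | 3.76 (0.68) |
| CTS-AL_pre | Median (Q1, Q3) | 3.9 (3.4, 4.2) |
| QuickDASH1 | Mean (SD) | 63.88 (19.46) |
| QuickDASH1 | Median (Q1, Q3) | 65.9 (53.4, 77.3) |
| EQ-5D_pre | Mean (SD) | 0.56 (0.31) |
| EQ-5D_pre | Median (Q1, Q3) | 0.7 (0.2, 0.8) |
| Gender | Male | 13 (32.5%) |
| Gender | Female | 27 (67.5%) |
| Affected hand | Right | 26 (65.0%) |
| Affected hand | Left | 14 (35.0%) |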
Eligibility Criteria
Inclusion criteria were: (1) numbness or tingling, with or without pain, in at least
2 of the digits of the median nerve distribution;[5][6] (2) increased symptoms with
carpal tunnel provocation tests (Phalen's test and/or reverse Phalen);[6] (3) symptoms
for over two months;[5] (4) failure of conservative treatment;[5] and (5) a nerve
conduction test showing median neuropathy at the wrist (distal motor latency > 4.5
milliseconds, wrist-digit sensory latency > 3.5 milliseconds, or sensory conduction
velocity at the carpal tunnel segment < 40 m/second).[7][8] Exclusion criteria were:
clinical or electrophysiological signs of proximal nerve compression, diabetes or any
metabolic disease, and rheumatoid arthritis or other general inflammatory diseases.[5][6][9]
Clinical Design
We conducted an observational study that adhered to the STROBE guidelines,[10] with a
cross-sectional design for the Standard Error of the Measurement (SEM) and construct
validity analyses and a classic cohort design for the test-retest reliability and
responsiveness assessments.
Outcomes Instruments and Measures
The standard Spanish versions of the QuickDASH (www.iwh.on.ca), the 6-item Carpal Tunnel Syndrome scale of Atroshi-Lyrén (CTS-AL),[11] and the EuroQol 5D Index (EQ-5D)[12] questionnaires were completed by the patients 1 week before surgery.
The QuickDASH is the shorter version of the DASH PRO instrument developed for measuring
“upper extremities disability.” It consists of 11 items and is scored from
0 (best = lowest disability) to 100 (worst = highest disability). At least 10 of the
11 items must be completed for a score to be calculated. The assigned values for all
completed responses are summed and averaged, producing a score out of five.
This value is then transformed to a score out of 100 by subtracting one and multiplying
by 25. This transformation makes the score easier to compare with other
measures scaled from 0 to 100. In this paper, the QuickDASH scores were obtained with the
calculation service tools developed by the Institute for Work and Health
(http://www.orthopaedicscore.com/scorepages/disabilities_of_arm_shoulder_hand_score_quickdash.html).[13]
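The scoring rule described above can be sketched in a few lines of Python. This is only an illustration, not the calculation tool used in the study: the function name `quickdash_score` and the use of `None` for an unanswered item are assumptions made for the example, and item responses are assumed to be coded from 1 to 5.

```python
from typing import Optional, Sequence


def quickdash_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Return the 0-100 QuickDASH score, or None if fewer than 10 items were answered.

    `responses` holds the 11 item responses (assumed coded 1-5); None marks a missing item.
    """
    if len(responses) != 11:
        raise ValueError("QuickDASH has exactly 11 items")
    answered = [r for r in responses if r is not None]
    if any(r < 1 or r > 5 for r in answered):
        raise ValueError("item responses must be between 1 and 5")
    if len(answered) < 10:
        return None  # at least 10 of the 11 items are required
    mean_item = sum(answered) / len(answered)  # score out of five
    return (mean_item - 1) * 25  # rescale to 0 (best) .. 100 (worst)
```

For example, a patient who answers 3 on ten items and skips one still gets a score, because the average is taken over the completed items only.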
The CTS-AL measures symptom severity related to CTS. It consists of 6 items.
Five of the 6 items in the CTS-AL have item text similar to that of the corresponding items
in the 11-item symptom severity scale (Standard CTS-SS questionnaire),[14] and the
remaining item (a result of the merger of 2 symptom severity scale items)
has text from the 2 items. The CTS-AL has, however, a completely different and improved
layout. The scoring is like that for the 11-item symptom severity score: for each
patient, the item responses are scored from 1 (best) to 5 (worst) and then averaged
over the 6 items to yield a CTS-AL score (only 1 missing item response is allowed).
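As a small illustration of the CTS-AL scoring rule, the sketch below averages the item responses and refuses to score a questionnaire with more than one missing item. The function name `cts_al_score`, the `None` convention for a missing response, and the choice to average over the answered items when one is missing are assumptions of this example, not details stated in the text.

```python
from typing import Optional, Sequence


def cts_al_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Return the CTS-AL score (1 = best, 5 = worst), or None if more than 1 item is missing."""
    if len(responses) != 6:
        raise ValueError("the CTS-AL has exactly 6 items")
    answered = [r for r in responses if r is not None]
    if any(r < 1 or r > 5 for r in answered):
        raise ValueError("item responses must be between 1 and 5")
    if len(answered) < 5:
        return None  # only 1 missing item response is allowed
    # Assumption: with one missing item, the mean is taken over the 5 answered items.
    return sum(answered) / len(answered)
```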
The EuroQol (EQ-5D) generic health index comprises a five-part questionnaire and a
visual analog self-rating scale. In this paper, the EQ-5D was used as a health-related
quality of life index with five dimensions: mobility (MO), self-care (SC), usual activities
(UA), pain/discomfort (PD), and anxiety/depression (AD). The possible range for each
of the dimension variables is 1 to 3, where 1 = no problems, 2 = moderate problems,
and 3 = extreme problems. Once the analytical dataset had appropriately named dimension
variables, the EQ-5D Index was generated using the EQ-5D TM Scoring algorithm for
Excel (http://www.ahrq.gov/professionals/clinicians-providers/resources/rice/EQ5Dscore.html).
The EQ-5D Index scores range from 0 (death) to 1 (best health). Some patients with
severe long-standing diseases may have health states that attract utility values
below zero, i.e., from a societal perspective they are regarded as being in states
“worse than death.”[15] The index score is not calculated when responses are missing
for one or more of the dimensions. No missing items from the three PRO instruments
were observed in this study.
To assess test-retest reliability, a second self-administration of the QuickDASH
was done 1 week after the first administration, at which time the patients also answered
a simple questionnaire consisting of questions regarding changes in their upper
extremities' health status during the preceding week. Finally, the QuickDASH was
self-administered to the sample population 3 months after open carpal tunnel release
for the responsiveness analysis.
The measurement property “construct validity” includes three aspects: structural validity,
hypotheses testing, and cross-cultural validity. Hypothesis testing was used for analyzing
the construct validity of the QuickDASH. Hypotheses-testing construct validity is
the degree to which the scores of a PRO instrument are consistent with hypotheses
based on the assumption that the PRO instrument validly measures the construct to
be measured.[16][17][18] To assess construct validity, it was hypothesized a priori
that the QuickDASH would have a moderate to strong positive correlation with the CTS-AL
and a low to moderate negative correlation with the EQ-5D Index.
Responsiveness is the ability of a PRO instrument to detect change over time in the
construct to be measured, and it is related to longitudinal validity or change scores
over time. For that purpose, an a priori hypothesis was established: the QuickDASH,
as an upper extremity-specific instrument, would have lower responsiveness than the
CTS-AL (a disease-specific PRO instrument) and higher responsiveness than a generic
instrument such as the EQ-5D Index.
Data Analysis
Reliability
Internal consistency, or the degree to which the QuickDASH measured a single concept, was
assessed with the Cronbach α coefficient (α > 0.7 indicated good internal consistency).
Reproducibility, or test-retest reliability, was analyzed with the intraclass correlation
coefficient (two-way random effects model, absolute agreement definition; ICC2,1),[18]
Lin's Concordance Correlation Coefficient (CCC), the Bland and Altman Limits of Agreement
(LoA), and the non-parametric Passing-Bablok (P-B) orthogonal regression model between
the two administrations before surgery, taking 1 week as the washout time.
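To make the two agreement coefficients concrete, the following sketch computes ICC(2,1) from the two-way ANOVA mean squares (the Shrout and Fleiss formulation for absolute agreement, single measurement) and Lin's CCC from the means, variances, and covariance of the two administrations. It is a minimal illustration for two administrations per patient; the function names are hypothetical, confidence intervals and p-values are not computed, and the study itself used Stata rather than this code.

```python
from statistics import mean


def icc_2_1(x, y):
    """ICC(2,1): two-way random effects, absolute agreement, single measure, k = 2."""
    n, k = len(x), 2
    grand = mean(list(x) + list(y))
    row_means = [(a + b) / 2 for a, b in zip(x, y)]  # one row per patient
    col_means = [mean(x), mean(y)]  # one column per administration
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)  # between-patients mean square
    msc = ss_cols / (k - 1)  # between-administrations mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient, using 1/n moments."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)
```

Both coefficients equal 1 only when the second administration reproduces the first exactly; a constant shift between administrations lowers them even if the Pearson correlation stays at 1, which is why they are preferred for absolute agreement.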
The Bland-Altman limits of agreement (LoA)[19] method calculates the difference between
both measures for each subject (di = Yi - Xi) and plots it against the mean
((Xi + Yi)/2) of both measures for each subject. If we assume a normal distribution of
the differences, 95% of the differences are expected to lie between the limits of the
interval (mean difference ± 1.96 SD of the differences), as an indicator of good agreement.
The normality of the distribution of the differences was analyzed with the Shapiro-Wilk
test; kurtosis and skewness were also analyzed, using a significance level
of 0.05.
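The limits of agreement can be sketched as follows. The helper names are hypothetical, and the example assumes the usual definition of the limits as the mean difference ± 1.96 times the SD of the differences; the normality checks mentioned above are not reproduced here.

```python
from statistics import mean, stdev


def limits_of_agreement(bias, sd_diff, z=1.96):
    """Return (lower, upper) limits from the mean difference and the SD of the differences."""
    return bias - z * sd_diff, bias + z * sd_diff


def bland_altman(x, y, z=1.96):
    """Compute bias, limits of agreement, and the (mean, difference) points to plot."""
    diffs = [yi - xi for xi, yi in zip(x, y)]  # di = Yi - Xi
    means = [(xi + yi) / 2 for xi, yi in zip(x, y)]
    bias = mean(diffs)
    lower, upper = limits_of_agreement(bias, stdev(diffs), z)
    return {"bias": bias, "lower": lower, "upper": upper, "points": list(zip(means, diffs))}


# Summary values for Y-X taken from Table 2 (mean 5.47, SD 13.38622).
lower, upper = limits_of_agreement(5.47, 13.38622)
```

With the summary values from Table 2, this gives limits of about -20.77 and 31.71, in line with the interval reported in the Results.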
The Passing-Bablok (P-B) regression line of agreement was used so that we did not
need to assume a normal distribution of the differences. The P-B analysis is a non-parametric
estimation of the orthogonal regression line between the two methods or measures.
The linear equation is Y = A + BX + ε, in which A is the constant difference
between the two measures, B is the proportional difference, and ε is a random variable
with a mean equal to zero that represents the random non-systematic error between
the two measures. When A = 0 and B = 1, both measures present the same
error, show excellent agreement, and are comparable and interchangeable.
Based on the P-B non-parametric regression we could determine whether there were any
significant systematic constant and/or proportional differences between the two measures.[18]
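A simplified sketch of the P-B point estimates is given below. It follows the usual description of the method (shifted median of all pairwise slopes, then the median of the residual intercepts) but is only an illustration: the function name is hypothetical, the confidence intervals for A and B that the analysis relies on are not computed, and the data are assumed to be positively correlated, as the method requires.

```python
from statistics import median


def passing_bablok(x, y):
    """Return point estimates (A, B) of the Passing-Bablok line Y = A + B*X."""
    slopes = []
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # identical points carry no slope information
            if dx == 0:
                slope = float("inf") if dy > 0 else float("-inf")
            else:
                slope = dy / dx
            if slope == -1:
                continue  # slopes of exactly -1 are discarded
            slopes.append(slope)
    slopes.sort()
    m = len(slopes)
    k = sum(1 for s in slopes if s < -1)  # offset that shifts the median
    if m % 2:  # odd number of slopes
        b = slopes[(m - 1) // 2 + k]
    else:
        b = 0.5 * (slopes[m // 2 - 1 + k] + slopes[m // 2 + k])
    a = median(yi - b * xi for xi, yi in zip(x, y))
    return a, b
```

Agreement is then judged by checking whether the confidence interval for A contains 0 and the one for B contains 1; those intervals would need a dedicated statistical package.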
Measurement Error
Cross-sectional precision was analyzed with the Standard Error of the Measurement
(SEM = SD × √(1 - Cronbach α)). Longitudinal precision for the test-retest reliability
coefficient was analyzed with the Standard Error of the Measurement difference
(SEMdiff = SD × √(1 - ICC) × √2) and the Minimal Detectable Change at the 90%
confidence level (MDC90 = SEMdiff × 1.65) and at the 95% confidence level
(MDC95 = SEMdiff × 1.96).[20]
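The formulas above translate directly into code. The sketch below uses hypothetical function names and plugs in the SD, Cronbach α, and ICC values reported in Table 3; small discrepancies with the published SEM are to be expected because the inputs are rounded.

```python
import math


def sem(sd, reliability):
    """Standard Error of the Measurement: SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)


def sem_diff(sd, icc):
    """Standard error of the difference between two measurements: SD * sqrt(1 - ICC) * sqrt(2)."""
    return sd * math.sqrt(1 - icc) * math.sqrt(2)


def mdc(sem_difference, z):
    """Minimal Detectable Change at the confidence level implied by z (1.65 or 1.96)."""
    return sem_difference * z


cross_sectional = sem(19.458, 0.912)   # about 5.77
longitudinal = sem_diff(19.4584, 0.868)  # about 10.0
mdc90 = mdc(longitudinal, 1.65)  # about 16.5
mdc95 = mdc(longitudinal, 1.96)  # about 19.6
```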
Hypothesis Construct Validity
The construct validity hypotheses were analyzed with the Pearson correlation coefficient
(r), using a level of 0.05 for statistical significance. Values between 0.8 and 1.0 were
taken to indicate a very strong relationship, between 0.6 and 0.8 a strong relationship,
between 0.4 and 0.6 a moderate relationship, between 0.2 and 0.4 a weak relationship,
and less than 0.2 a very weak or no relationship.[21]
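The interpretation bands can be written as a small lookup applied to the absolute value of r. In this sketch the function names are hypothetical and the handling of boundary values (for example, exactly 0.6 counted as strong) is an assumption, since the text does not say which band a boundary value belongs to.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_strength(r):
    """Map |r| to the verbal bands used in the text (lower bounds assumed inclusive)."""
    size = abs(r)
    if size >= 0.8:
        return "very strong"
    if size >= 0.6:
        return "strong"
    if size >= 0.4:
        return "moderate"
    if size >= 0.2:
        return "weak"
    return "very weak or none"
```

The sign of r is interpreted separately: it gives the direction of the relationship that the a priori hypotheses specify.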
Responsiveness
The responsiveness of every PRO instrument was evaluated by calculating the effect
size (ES = mean change scores /standard deviation (SD) of the baseline scores) and
the standardized response mean (SRM = mean change scores / SD of the change). A large
SRM or ES indicated high sensitivity to change, and an ES or SRM >0.8 meant an important
clinical improvement.[18 ]
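Both indices are simple ratios, as the sketch below shows. The function names are hypothetical, and change is assumed to be computed as the preoperative minus the postoperative score, so that improvement on a "higher is worse" scale such as the QuickDASH is positive. The example values are the QuickDASH means and SDs from Table 4.

```python
from statistics import mean, stdev


def effect_size(mean_change, sd_baseline):
    """ES = mean change score / SD of the baseline scores."""
    return mean_change / sd_baseline


def standardized_response_mean(mean_change, sd_change):
    """SRM = mean change score / SD of the change scores."""
    return mean_change / sd_change


def responsiveness(pre, post):
    """Return (ES, SRM) from paired pre- and postoperative scores."""
    changes = [a - b for a, b in zip(pre, post)]  # assumed direction: pre - post
    mean_change = mean(changes)
    return (effect_size(mean_change, stdev(pre)),
            standardized_response_mean(mean_change, stdev(changes)))


quickdash_es = effect_size(40.87, 19.46)  # about 2.10
quickdash_srm = standardized_response_mean(40.87, 20.75)  # about 1.97
```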
Sample Size
For an expected ICC of 0.85 in the test-retest reliability analysis, a 95% confidence
interval (CI) width of 0.20, and 2 measures, the expected sample size was 31. For construct
validity, an a priori sample size calculation for the correlation analysis showed that,
based on the proposed null hypothesis (Ho = the correlation is equal to zero), with a
2-sided test, a 0.05 significance level, 80% power, and an expected minimum r of 0.4, a
sample size of 37 patients would be needed (Stata 16.1, StataCorp, 4905 Lakeway Drive,
College Station, Texas 77845, USA).
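The reliability sample size can be illustrated with Bonett's approximation for estimating an ICC with a desired confidence interval width. The text does not state which formula Stata applied, so the choice of Bonett's formula, like the function name, is an assumption of this sketch; it is shown only because it is a common approach for this kind of calculation.

```python
import math
from statistics import NormalDist


def icc_sample_size(icc, ci_width, k=2, confidence=0.95):
    """Approximate number of subjects needed to estimate an ICC with a CI of the given width.

    Assumed formula (Bonett): n = 8 z^2 (1 - icc)^2 (1 + (k - 1) icc)^2 / (k (k - 1) w^2) + 1
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (8 * z ** 2 * (1 - icc) ** 2 * (1 + (k - 1) * icc) ** 2
         / (k * (k - 1) * ci_width ** 2)) + 1
    return math.ceil(n)


n_reliability = icc_sample_size(0.85, 0.20, k=2)
```

Under these assumptions the function returns 31 subjects for an expected ICC of 0.85, a CI width of 0.20, and 2 measures.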
Results
Reliability
Internal consistency analysis showed a Cronbach α of 0.912 with an average interitem
covariance of 0.522. Absolute test-retest agreement analysis demonstrated an ICC2,1 of
0.868 (95% CI: 0.750 to 0.930) (p < 0.001) and a CCC of 0.738 (95% CI: 0.597 to 0.877)
(p < 0.001). Bland-Altman analysis showed that the estimated bias (mean difference)
between both administrations of the QuickDASH was 5.47 units (SD 13.39), with LoA
of -20.767 to 31.707; 95% of the cases fell within the interval (2 cases, or 5%, above
the upper limit and none below the lower limit) ([Fig. 1]). However, the assumption of
normality of the distribution of the differences was not confirmed (Shapiro-Wilk test
p = 0.005, skewness test p = 0.0019, kurtosis test p = 0.0029).
Fig. 1 Bland-Altman Limits of Agreement. The Bland-Altman limits of agreement (LoA) are
based on the difference between both measures for each subject (di = Yi - Xi), plotted
against the average ((Xi + Yi)/2) of both measures for each subject. 95% of the cases
fell within the interval (only 2 cases, or 5%, above the upper limit and none below the
lower limit).
P-B regression showed a median difference of 2.3 units between both administrations of
the QuickDASH in the test-retest reliability analysis, with no significant constant or
proportional differences ([Table 2]) ([Fig. 2]).
Fig. 2 Passing-Bablok Regression Line. A is the constant difference between the two measures,
and B is the proportional difference. A = 0 and B = 1 mean that both measures
present the same error and show excellent agreement. Observe that
the null hypotheses (A = 0 and B = 1) were included in the 95% confidence intervals,
so the P-B analysis showed no significant constant or proportional differences.
Table 2
| Variable | Valid | Miss | Obs | Median | Mean | Minimum | Maximum | SD |
| Y: QDash1 | 40 | 0 | 40 | 65.9 | 63.88 | 13.6 | 95.5 | 19.45837 |
| X: QDash2 | 40 | 0 | 40 | 63.6 | 58.41 | 11.4 | 97.7 | 19.73363 |
| Y-X | 40 | 0 | 40 | 2.3 | 5.47 | -15.9 | 50 | 13.38622 |
| 100*(Y-X)/X | 40 | 0 | 40 | 4.58% | 16.0% | -30.5% | 199.5% | 43.0% |
Measurement Error
Cross-sectional precision analysis demonstrated an SEM of 5.785. Longitudinal precision
showed a SEdiff of 10.0001 with an MDC95 = 19.6 and an MDC90 = 16.5 ([Table 3 ]).
Table 3
A) Cross-sectional precision
| Cronbach α | SD | SEM | Cross-sectional precision, 95% CI | Cross-sectional precision, 90% CI |
| 0.912 | 19.458 | 5.785 | -/+ 11.339 | -/+ 9.546 |
B) Longitudinal precision
| ICC2,1 | SD | SEdiff | MDC95 | MDC90 |
| 0.868 | 19.4584 | 10.000 | 19.600 | 16.500 |
Construct Validity
The QuickDASH presented a strong positive correlation with the CTS-AL (r = 0.635)
and a moderate negative correlation with the EQ-5D Index (r = -0.492), both significant
(p < 0.001) ([Supplementary material]).
Responsiveness
The QuickDASH showed a responsiveness (ES = 2.1; SRM = 1.97) lower than the CTS-AL
(ES = 3.53; SRM = 3.50), and higher than the EQ-5D Index (ES =0.78; SRM =0.83) ([Table 4 ]).
Table 4
| PRO Instrument | Pre-op, Mean (SD) | Post-op, Mean (SD) | Change pre-post, Mean (SD) | Responsiveness: ES | Responsiveness: SRM |
| CTS_AL | 3.76 (0.68) | 1.38 (0.44) | 2.38 (0.68) | 3.53 | 3.50 |
| QuickDASH | 63.88 (19.46) | 23.01 (13.19) | 40.87 (20.75) | 2.10 | 1.97 |
| EQ-5D Index | 0.56 (0.31) | 0.80 (0.19) | 0.24 (0.29) | 0.78 | 0.83 |
Discussion
The results demonstrate that the QuickDASH PRO instrument has a good level
of internal consistency and a high level of test-retest reliability, with an excellent
level of the absolute agreement coefficients (ICC2,1, CCC) and without significant constant
or proportional systematic differences between the scores of two administrations 1 week
apart. The results of the validity and responsiveness analyses were consistent with the
construct hypotheses formulated a priori, as proof of good cross-sectional and
longitudinal construct validity.
Different tools have been developed for assessing the quality of a PRO instrument.
The work of the original team, from the Medical Outcomes Trust, that developed the
classic generic instruments, the SF-36 and SF-12, was the basis for two guidelines:
“Evaluating the Measurement of Patient-Reported Outcomes” (EMPRO),[22 ] and “Consensus-based standards for the selection of health measurement instruments”
(COSMIN).[17 ]
[23 ] Based on the COSMIN, three main quality domains could be distinguished to assess
a PRO Instrument: reliability, validity, and responsiveness. Each domain may include
different measurement properties.[18] Reliability includes 3 measurement properties: internal consistency, test-retest
reliability, and measurement error. Cronbach's α coefficient is an estimate of internal
consistency that depends on the number of items in a scale and the magnitude of their
intercorrelation; values > 0.7 indicate good internal consistency. In this research we observed a
Cronbach's α of 0.912, very close to the values reported by Gabel,[24] Alnahdi,[25] and Claro da Silva,[26] and slightly lower than the value of 0.94 reported by Hammond et al.[27] (British version) and by Beaton et al.[1] in the development of the QuickDASH, and than the value reported by Schønnemann et al.[28] (α = 0.96, Danish version, 2016). Absolute agreement in the test-retest reliability
analysis demonstrated an excellent level of agreement, with an ICC2,1 of 0.87 and a CCC of 0.74, very similar to the Brazilian QuickDASH (ICC2,1 = 0.81)[26] and the Korean version (ICC2,1 = 0.83)[29] ([Table 5]). Measurement error is another psychometric property included in the domain of reliability
by COSMIN. SEM, or cross-sectional precision, gives information about the variation
around an observed score; the true score lies within this range. In this research,
the SEM of 5.78 was slightly higher than that reported by Beaton et al.
in the description of the longer version, DASH (SEM = 4.4).[1] For longitudinal precision of the measurement we calculated the SEdiff, MDC90, and MDC95. The MDC is the minimum change score required before an individual can
confidently be considered to have changed by more than day-to-day variability.[30] In this research, our MDC90 of 16.5 fell within the range of 11.0–17.2 reported in a systematic review
of the quality of measurement properties of the QuickDASH by Kennedy et al.[31] However, our MDC95 of 19.6 was lower than that published by Mintken et al.[32] When comparing MDC values, we should take into account that this absolute reliability index
depends on the study population, the washout interval, the time point during follow-up
when the test-retest analysis was done, and the SD of the data.[3] An important issue in the discussion of test-retest reliability is that the mean difference
score between the two administrations of the QuickDASH 1 week apart (5.47) ([Table 2]) was lower than the MDC95, which indicates that this change should not be considered a change beyond day-to-day
variability.
Table 5
(The last four columns are the testing reported in each study.)
| Translated DASH | Study author, year | Study population | Reliability: internal consistency (Cronbach's α) | Reliability: test-retest (ICC unless stated otherwise) | Validity | Responsiveness |
| Arabic | Alnahdi 2021[25] | Participants with upper extremity musculoskeletal disorders | QuickDASH 0.90 | ICC (2,1) = 0.91 | Construct: GAF r = -0.53; NRPS r = 0.52; RAND 36: Physical functioning r = -0.77, Emotional well-being r = -0.47, Pain r = -0.75 |  |
| Brazilian | Claro da Silva 2020[26] | Participants with upper extremity disorders | DASH 0.93; QuickDASH 0.88 | ICC = 0.81 | Construct: DASH r = 0.91; SF-12: Physical component r = -0.55, Mental component r = -0.49 | Observed change at third interview: SRM = 0.94 |
| Chinese | Cao 2019[33] | Patients suffering from upper limb chronic disorders | QuickDASH 0.81 | ICC = 0.90 | Construct: DASH r = 0.820; VAS r = 0.463; SF-36 subscales: Physical function r = -0.630, Role physical r = -0.471, Bodily pain r = -0.563, General health r = -0.414, Vitality r = -0.053, Social function r = -0.178, Role emotional r = -0.010, Mental health r = -0.165 | − |
| Danish | Schønnemann 2016[28] | Patients with distal radius fracture | QuickDASH 0.96 | ICC = 0.94 | − | − |
| English (British) | Hammond 2018[27] | Patients with rheumatoid arthritis | DASH 0.98; DASH Work 0.94 | ICC (2,1) = 0.95 | Concurrent: QuickDASH r = 0.61–0.91; DASH r = 0.61–0.99; DASH Work r = 0.53–0.80; DASH Spam r = 0.52–0.78 |  |
| Italian | Franchignoni 2011[34] | Patients with upper limb disorders | QuickDASH 0.87 | − | − | − |
| Korean | Hong 2018[29] | Patients with Carpal Tunnel Syndrome | QuickDASH 0.89; ADL 0.89; Social Activity 0.70; Symptom 0.72 | ICC (2,1) for each item ranged from 0.64 to 0.98, and was 0.83 for the whole QuickDASH | Construct: DASH r = 0.975; Pain VAS r = 0.365; Blend Scale of NCS r = -0.074; Grip Power r = -0.309; Pinch power r = -0.327 | Observed change six months after operation: SRM = 1.00 |
The COSMIN established that the appropriate statistics for assessing measurement error
are SEM, LoA, and the smallest detectable change (SDC) or MDC. Changes within the
LoA or smaller than the MDC95 are likely to be due to measurement error, and changes outside the LoA or larger than
the SDC should be considered real change. In this research, we included the LoA
as an appropriate statistic for test-retest agreement because Bland and Altman[33] developed the LoA as an alternative analysis to the ICC; however, the assumption of normality
of the differences between the two measures was violated (Shapiro-Wilk p = 0.005) in our sample. Consequently, the LoA results should not be considered.
Rosales & Atroshi[18] introduced to the hand surgery field the use of the non-parametric P-B analysis
to determine constant and/or proportional systematic errors between two measures, with
the advantage that the assumption of normality is not necessary. Our results demonstrated
that there were no constant (A = 2.3; 95% CI: -9.71 to 16.06) or proportional (B = 1;
95% CI: 0.80 to 1.20) systematic differences between the two administrations of the QuickDASH,
1 week apart.
Construct validity analysis confirmed the a priori hypothesis as proof of good validity
of the QuickDASH for measuring the proposed construct by a cross-sectional design
(correlations between QuickDASH, CTS-AL, and EQ-5D) and by a longitudinal design or
responsiveness analysis (comparison of ES and SRM of the QuickDASH, CTS-AL, and EQ-5D
from baseline to 3 months after surgery).
A strong point of this study is that, based on the COSMIN checklist,[17][23] the
reliability, validity, and responsiveness analyses in this paper met most of the design
requirements set as standards for good methodological quality in the evaluation of a PRO
instrument: a description of how missing items were handled; the percentage of missing
values; a sample size calculation; at least two independent measurements with an
appropriate and stable washout interval of 1 week, during which conditions were similar
for both measurements in the reliability analysis; hypotheses regarding correlations and
responsiveness formulated a priori, before data collection; the expected directions of
the correlations and effect sizes included in the hypotheses; the expected absolute or
relative magnitude of the correlations and effect sizes included in the hypotheses; an
adequate description of the comparator instruments and their properties; and, finally,
an adequate clinical design and statistical method for the hypotheses to be tested.
A limitation of this study is the lack of an interpretability analysis, which assesses
the degree to which one can assign qualitative clinical meaning to an instrument's
quantitative scores or change in scores and can give information about the MCID
(minimal clinically important difference).[17][18][23] Further studies regarding the
MCID are recommended to complete the analysis of the measurement
properties of the Spanish QuickDASH PRO instrument.
In conclusion, this study has demonstrated that the Spanish (Spain) version of the
QuickDASH instrument has good reliability, construct validity, and responsiveness
for outcomes assessment in CTS.