Key words
CT - thorax - cystic fibrosis - Brody score - reproducibility - image interpretation
Introduction
Cystic fibrosis (CF) is the most frequent life-shortening autosomal recessive disease
among the Caucasian population in Europe, with an incidence of about 1 in 2,500 live
births and a carrier frequency of about 4 – 5 % [1] [2] [3]. Mutations in a single large gene on chromosome 7 cause dysfunction of cAMP-dependent
chloride channels in exocrine tissues, particularly in the lung and pancreas [3] [4]. Due to improved health care, life expectancy among CF patients has continuously
increased, from about 25 years in 1985 to 37 years in 2008 [5].
There is no generally accepted gold standard for imaging CF-related lung disease in
adult patients. However, some clinical institutions have established follow-up examinations
at regular intervals that include computed tomography (CT) imaging [6]. In our institution, adult CF patients over the age of 18 years are regularly assessed
by means of lung function tests once every 6 months and by low-dose-CT of the chest
(LDCT) for lung morphology once every four years. Computed tomography (CT) has become
the gold standard for imaging CF-related lung disease, with lung morphology appearing
to be complementary to lung function tests (LFTs) in at least 50 % of CF patients.
Chest radiographs and LFTs alone fail to recognize structural lung alterations caused
by CF [7]. MRI has not been widely applied to evaluate normal or mildly damaged lung structure
in CF patients. Despite radiation exposure, the risk of CT-induced mortality is likely
to be minimal in CF patients when compared to the benefit of a more accurate diagnosis
[7]. The automatic exposure control (AEC) systems of modern CT scanners significantly reduce dose exposure [8] [9]. Also, low-dose multidetector row CT of the lung (LDCT) shows high morphologic accuracy
in non-malignant lung disease [10].
Different high-resolution (HR) CT-based scoring systems, with a range of morphologic
parameters and different weighting of results, have been designed to support the evaluation
of structural lung damage in CF patients. For some scoring systems, HRCT findings
have been correlated with LFTs [11] [12]. A modification of the Bhalla score for structural lung alteration in CF has previously
been applied in an adult population [2]. Brody and co-workers have developed a scoring system for HRCT scans obtained in
both inspiration and expiration in children suffering from CF [13], which demonstrates both high reproducibility and accuracy [12] [14]. Since many CF patients live into adulthood, it appears necessary for optimal
radiological patient management to extend CT scoring systems for children with CF
to include adults with CF-associated lung disease [15] who underwent LDCT in inspiration.
Since the reproducibility of scoring results in the repetitive evaluation of the same
chest CT images is a key feature of clinical reliability of any particular CT scoring
system, we conducted an intra- and inter-observer reproducibility study of the Brody
scoring system in a selected set of LDCTs of the chest performed in adult CF patients.
We determined the short-term reproducibility (28 – 60 days) of the Brody score among
second-year radiology residents, to understand how firmly its different features would
be integrated in new clinical knowledge. We also assessed the long-term (at least
two years) reproducibility of the Brody score in an attending radiologist with 10
years of post-fellowship chest radiology experience, since time intervals between
follow-up CT scans of the chest in CF patients are often in the range of several years
[2] [6]. It has previously been established that the short-term reproducibility of the Brody
score is high among experienced chest radiologists [12] [14].
We hypothesized that the intra-observer reproducibility of the Brody score and its
different subscores in adult patients with CF who undergo LDCT of the chest in inspiration
would be within +/–10 % of the average score, both with a short time interval of 28 – 60
days for second-year radiology residents and with a long time interval of at least
2 years for an attending radiologist. Similarly, we hypothesized that the inter-observer
reproducibility of the Brody score and its different subscores in adult patients with
CF who undergo LDCT of the chest in inspiration would be within +/–10 % of the average
score between second-year radiology residents and an attending radiologist.
Materials and Methods
Patients
This retrospective analysis was based on LDCT examinations of the chest performed
between February 2006 and October 2010, in 15 consecutive adult CF patients (male,
n = 7; female, n = 8; age range, 18 – 50 years; average, 33 years), each of whom had
had only one LDCT examination at our institution. The clinical indication for LDCT
was to assess the current lung morphology status. Institutional policy for adult CF
patients over the age of 18 years includes the assessment of lung morphology status
by means of LDCT once every four years. Additional MDCT scans are acquired when there
is clinical suspicion or evidence of aggravation of CF-related lung disease.
Ethical Issues
Institutional ethics committee approval was obtained for the retrospective analysis
of data previously obtained for individual clinical treatment of CF patients. Data
included in this study were evaluated and presented in accordance with the World Medical
Association (WMA) Declaration of Helsinki Ethical Principles for Medical Research
Involving Human Subjects, as last amended by the 59th WMA General Assembly, Seoul, Korea, October 2008.
LDCT Protocol
LDCT of the chest was invariably performed in inspiration, at a tube voltage of 120
kVp, with the lowest technically achievable tube current that would still generate
images of diagnostic value and leave no option of further dose reduction by means of
dose modulation programs.
Through January 2007, examinations were conducted on a 4-row MDCT scanner (Mx 8000,
Philips Medical Systems, Hamburg, Germany) at tube current, 35 mA, collimation, 4x1 mm,
rotation time, 0.5 s, pitch, 1.75, effective slice thickness, 1.3 mm, effective tube-current-time
product per slice (TCTP), 10 mAs, weighted CT dose index (CTDIw), 1.0 mGy, reconstruction
algorithms “C” and “D”. For an adult female patient of average weight and height,
the protocol results in a mean effective dose of 0.5 mSv [7]. From February 2007, a 64-row MDCT scanner (Brilliance-CT64, Philips Medical Systems,
Hamburg, Germany) was applied, at tube current, 35 mA, pitch 1.173, collimation, 64x0.5 mm,
rotation time, 0.5 s, TCTP, 15 mAs, reconstruction algorithms “C” and “YA”, effective
slice thickness, 0.625 mm, CTDIw, 1.0 mGy. After reformatting images in the sagittal,
coronal, and axial planes (slice thickness 3 mm), and images with 1.0 – 1.5 mm slice
thickness in at least one plane, from primary LDCT reconstructions with a slice thickness
of 1.0 to 1.3 mm and 50 % slice overlap, the images were filed in an electronic picture
archiving and communication system (Syngo PACS, Siemens Medical Solutions, Erlangen,
Germany).
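As a rough plausibility check on the protocol parameters above, the effective tube-current-time product per slice can be related to tube current, rotation time, and pitch. The following minimal sketch assumes the common helical-CT convention, effective mAs = tube current × rotation time / pitch; this relation and the function name `effective_mas` are illustrative assumptions and are not stated in the protocol description itself.

```python
def effective_mas(tube_current_ma: float, rotation_time_s: float, pitch: float) -> float:
    """Effective tube-current-time product (mAs) for a helical scan.

    Assumed convention: effective mAs = tube current (mA) * rotation time (s) / pitch.
    """
    if pitch <= 0:
        raise ValueError("pitch must be positive")
    return tube_current_ma * rotation_time_s / pitch


# Parameters as listed for the two scanners in the protocol above.
four_row = effective_mas(35, 0.5, 1.75)          # 4-row MDCT scanner
sixty_four_row = effective_mas(35, 0.5, 1.173)   # 64-row MDCT scanner

print(round(four_row, 1), round(sixty_four_row, 1))
```

Under this assumption, the listed parameters are consistent with the reported TCTP values of 10 mAs and (after rounding) 15 mAs.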
LDCT Evaluation
Axial and coronal images (slice thickness 3 mm and 1.5 mm) were obtained from the
PACS and analyzed side-by-side on two 1k-PACS monitors, with one image per display,
at window width and window level settings of 1,600 HU and -600 HU (lung window), respectively.
Sagittal images were only evaluated to match findings when this was not possible on
coronal or axial images alone.
Independent of each other, three radiologists (R1, R2, R3) with different clinical
work experience each evaluated all LDCT examinations twice, according to the scoring
system previously introduced by Brody and co-workers for children with CF [13]. R1 was an attending radiologist with 10 years of post-fellowship clinical chest
radiology experience. R2 and R3 were second-year radiology residents, with 17 and
12 months of post-graduate clinical work experience, respectively, who were on a 6-month
CT training rotation with approximately 50 % of the case load involving chest CT assessments.
The “Brody score” is a weighted composite (total) score for CF-related changes in
lung morphology that describes and quantifies the respective presence, location, and
extent of bronchiectasis, mucus plugging, peribronchial thickening, parenchymal opacity,
and hyperinflation in the periphery and in the center of the upper, lower and middle
right and left lung lobes, respectively. The lingula constitutes the left
middle lobe equivalent. The Brody score ranges between 0 and 207.00 points, at increments
of 0.25 points, and increases with the severity of the CF-related changes in lung
morphology [13]. The Brody score was originally developed to assess lung morphology in incremental
HRCT scans obtained in both inspiration and expiration in 6- to 10-year-old children
[13]. We analyzed LDCT scans of adult CF patients obtained in inspiration, only, for
parenchymal opacity and hyperinflation.
Statistical Analysis
Average values of the six individual scoring results (two each for R1-R3) of the Brody
score and its five different subscores [13] were plotted along the x-axis of modified Bland-Altman plots [16], and respective deviations of individual scoring results from the average were plotted
along the y-axis. Both the absolute deviations (in Brody scoring points) and the relative
deviations (in percent of the mean) were recorded.
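The calculation behind such a modified Bland-Altman plot can be sketched as follows. This is a minimal illustration, not the software used in the study: the function name `deviations_from_average` and the six example readings are hypothetical, and the check against 10 % simply mirrors the study hypothesis.

```python
from statistics import mean


def deviations_from_average(readings):
    """Return the average of one patient's readings plus each reading's
    absolute deviation (points) and relative deviation (percent of the average)."""
    avg = mean(readings)
    absolute = [r - avg for r in readings]
    # The relative deviation is undefined when the average score is zero.
    relative = [100.0 * d / avg if avg else None for d in absolute]
    return avg, absolute, relative


# Hypothetical total scores for one patient: two readings each by R1, R2, and R3.
example_readings = [60.0, 64.0, 58.0, 59.0, 57.0, 62.0]
avg, abs_dev, rel_dev = deviations_from_average(example_readings)

# The average is the x-coordinate; the deviations are the y-coordinates.
within_10_percent = all(abs(r) <= 10.0 for r in rel_dev)
print(avg, abs_dev, within_10_percent)
```

Each patient contributes one x-value (the average of the six readings) and six y-values (one deviation per reading), so that systematic trends of the deviation with score magnitude would become visible in the plot.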
Results
The respective median time intervals and ranges between the first and second LDCT
readings were 4.7 years (2.0 – 7.0) for R1, 42 days (32 – 60) for R2, and 37 days
(28 – 47) for R3.
Brody Score (Total Score)
Among the 15 patients, the average Brody scores between the three independent readers
ranged from 20.0 to 132.7 points (median, 58.4 points, mean, 68.0 points, possible
range, 0 – 207.0 points). The individual Brody scores ranged from 10.0 – 146.5 points
(median, 63.0 points, mean, 70.7 points) for R1, 25.0 – 133.5 points (median, 57.9
points, mean, 66.4 points) for R2, and 22.0 – 134.5 points (median, 59.9 points, mean,
66.9 points) for R3.
The average deviation of all individual Brody scores from the mean value of all readers
was 4.9 points (7 %), with a range of 1.5 – 10.3 points (2 – 30 %). The mean deviation
of Brody scores from individual respective average values was 8.7 points (12 %, range,
0.5 – 26.0 points, 1 – 36 %) for R1, 3.1 points (5 %, range, 0 – 6.3 points, 0 – 12 %)
for R2, and 2.2 points (3 %, range, 0 – 7.0 points, 0 – 9 %) for R3. Findings implied
an average intra-individual reproducibility of Brody scoring results within about
7 % of the individual mean score, with greater deviations in the long-term re-assessment
by the most experienced reader. There was no evidence of skewing of the average deviation
toward lower or higher average values of the Brody score ([Fig. 1a – 3]).
Fig. 1 Intra- and inter-observer reproducibility of the Brody score and its five different
subscores between three independent readers in 15 patients with CF-related lung disease
of different severity. a Brody score (possible range, 0 – 207.00 points, average observed range, 20.00 – 132.67
points), b bronchiectasis subscore (possible range, 0 – 72.00 points, average observed range,
10.17 – 56.63 points), c mucus plugging subscore (possible range, 0 – 36.00 points, average observed range,
0.50 – 15.83 points), d peribronchial thickening subscore (possible range, 0 – 54.00 points, average observed
range, 4.67 – 36.38 points), e parenchymal opacity subscore (possible range, 0 – 54.00 points, average observed
range, 0 – 13.67 points), f hyperinflation subscore (possible range, 0 – 27.00 points, average observed range,
4.17 – 20.08 points).
Fig. 2 LDCT in axial (upper left, at the level of the tracheal carina, upper right, 5 cm
above the tracheal carina, lower left, 5 cm below the tracheal carina) and coronal
(lower right) reconstructions of a patient with mild manifestation of CF-related lung
disease.
Fig. 3 LDCT in axial (upper left, at the level of the tracheal carina, upper right, 5 cm
above the tracheal carina, lower left, 5 cm below the tracheal carina) and coronal
(lower right) reconstructions of a patient with severe manifestation of CF-related
lung disease.
Bronchiectasis Subscore
The average deviation of individual bronchiectasis subscores (possible range, 0 – 72.0
points) from the mean value of all readers was 4.2 points (16 %), with a range of
1.2 – 11.7 (8 – 43 %). The mean deviation of bronchiectasis subscores from individual
respective average values was 7.9 points (25 %, range, 0.5 – 17.0 points, 2 – 67 %)
for R1, 3.7 points (16 %, range, 0.5 – 21.0 points, 1 – 57 %) for R2, and 1.2 points
(5 %, range, 0 – 4.0 points, 0 – 20 %) for R3, implying an average intra-individual
reproducibility of bronchiectasis subscore results within about 15 % of the individual
mean score, with greater deviations for long-term re-assessment by the attending radiologist
([Fig. 1b], [3]).
Mucus Plugging Subscore
The average deviation of all individual mucus plugging subscores (possible range,
0 – 36.0 points) from the mean value of all readers was 1.9 points (19 %), with a
range of 0.3 – 4.2 points (2 – 133 %). The mean deviation of mucus plugging subscores
from individual respective average values was 2.1 points (18 %, range, 0 – 6.0 points,
0 – 67 %) for R1, 1.4 points (16 %, range, 0 – 5.0 points, 0 – 67 %) for R2, and 1.1
points (11 %, range, 0 – 6.0 points, 0 – 100 %) for R3, implying an average intra-individual
reproducibility of mucus plugging subscore results within about 12 % of the individual
mean score, with similar deviations for short-term and long-term re-assessment ([Fig. 1c], [3]).
Peribronchial Thickening Subscore
The average deviation of all individual peribronchial thickening subscores (possible
range, 0 – 54.0 points) from the mean value of all readers was 2.7 points (15 %),
with a range of 0.3 – 7.0 points (3 – 38 %). The mean deviation of peribronchial thickening
subscores from individual respective average values was 4.7 points (22 %, range, 0 – 12.0
points, 0 – 67 %) for R1, 2.4 points (14 %, range, 0 – 10.3 points, 0 – 36 %) for
R2, and 0.67 points (4 %, range, 0 – 2.8 points, 0 – 13 %) for R3, implying an average
intra-individual reproducibility of peribronchial thickening subscore results within about 13 %
of the individual mean score, with greater deviations for long-term re-assessment
by the attending radiologist ([Fig. 1d], [3]).
Parenchymal Opacity Subscore
The average deviation of all individual parenchymal opacity subscores (possible range,
0 – 54.0 points) from the mean value of all readers was 1.8 points (41 %), with a
range of 0 – 5.2 points (0 – 111 %). The mean deviation of parenchymal opacity subscores
from individual respective average values was 2.6 points (86 %, range, 0 – 13.0 points,
0 – 200 %) for R1, 1.5 points (35 %, range, 0 – 5.0 points, 0 – 200 %) for R2, and
0.9 points (17 %, range, 0 – 7.0 points, 0 – 93 %) for R3, implying an average intra-individual
reproducibility of parenchymal opacity subscore results within about 46 % of the individual
mean score, with similar deviations for short-term and long-term re-assessment ([Fig. 1e]).
Hyperinflation Subscore
The average deviation of all individual hyperinflation subscores (possible range,
0 – 27.0 points) from the mean value of all readers was 3.1 points (37 %), with a
range of 0.9 – 7.4 points (4 – 71 %). The mean deviation of hyperinflation subscores
from individual respective average values was 2.7 points (62 %, range, 0 – 8.0 points,
0 – 200 %) for R1, 1.2 points (11 %, range, 0 – 5.5 points, 0 – 71 %) for R2, and
1.4 points (14 %, range, 0 – 10.5 points, 0 – 67 %) for R3, implying an average intra-individual
reproducibility of hyperinflation subscore results within about 29 % of the individual
mean score, with greater deviations for long-term re-assessment by the attending radiologist
([Fig. 1f]).
Discussion
In a retrospective analysis, we applied the Brody score to assess lung morphology
in adult CF patients, based on LDCT scans of the lung performed in inspiration, and
tested for the reproducibility of scoring results. The Brody score is a weighted composite
score that was developed to describe and quantify lung morphology in children suffering
from CF based on incremental HRCT scans of the lung obtained in both inspiration and
expiration. The Brody score sums up its five different subscores for bronchiectasis,
mucus plugging, peribronchial thickening, parenchymal opacity, and hyperinflation
[13]. Reproducibility was tested among two second-year radiology residents, with short-term
intervals of 28 – 60 days between subsequent assessments of the same LDCT scan, and
an attending radiologist with 10 years of post-fellowship clinical chest radiology
experience who re-assessed all LDCT scans in this study after long-term intervals
of 2 – 7 years.
The respective mean, median, and range values for the Brody score were similar between
readers. The range covered both very minor and severe lung involvement with CF-related
disease, leaving out only the most severe part of the possible range. The findings
imply that, overall, the different radiological features underlying the Brody scoring
system are easy to understand, recognize, and weigh in their respective severity even
with relatively little chest radiology experience. These findings expand previous
experience, which was largely based on expert readings of chest images of CF patients
[12] [13] [14] [17].
For the Brody score as the weighted composite score sum of its different subscores,
the study hypothesis of inter- and intra-observer deviations of 10 % or less from
the respective averages was met by the average of our findings, and, particularly,
by the second-year radiology residents who re-assessed LDCTs within 28 – 60 days. The latter
finding corroborates previous reports of high short-term reproducibility of different
HRCT scoring systems for CF patients of different ages among expert readers [12] [13] [14] [17]. The findings appear to expand previous knowledge to include a more general perception
of the Brody score as a tool that reproducibly assesses lung morphology changes not
only among pediatric patients in the hands of expert readers [12] [13] [14], but also among adult CF patients in the hands of radiology residents, as in this
study. However, high reproducibility only seems to hold for short-term re-assessments.
In fact, our finding of decreased reproducibility of Brody scoring results in the
hands of an experienced chest radiologist after time intervals of 2 – 7 years, when
compared with short-term reproducibility among radiology residents, implies that either
memory effects play a role in the short-term reproduction of scoring results or visual
perception may change over time. From a clinical point of view, it appears to be better
to review the previous chest CT along with the current one and perhaps newly apply
the Brody scoring system to both than to score only the current scan and rely on the
old Brody scoring results of the previous one.
For the individual subscores, the study hypothesis of an inter- and intra-observer
deviation of 10 % or less from the respective averages was not met. Rather, individual
sub-scoring results deviated by as much as 200 percent from the average. This finding
was more pronounced among subscores with smaller score point ranges, i. e., mucus
plugging and hyperinflation, and among subscores that appear to be more difficult
to assess by visual means, i. e. parenchymal opacity and hyperinflation. Among the
former, the lower point range increases both the susceptibility to small changes and
the relative impact of deviations between subsequent assessments. Among the latter,
visual impressions of lung parenchymal density and its distribution could be a source
of uncertainty. When two neighboring areas of lung parenchyma show different
densities, the first could be hyperinflated while the second is normal, or the first
could be normal while the second demonstrates ground glass opacity of some origin.
Possible solutions would include the addition of an expiratory CT scan of the chest
to more easily recognize areas of hyperinflation by their relative lack of volume
change when compared with normal lung, or the addition of CT density measurements
in areas of uncertainty. While the former would expose the patient to additional radiation,
potentially at uncertain gain in the individual case, the latter would require lists
of CT density value ranges that would be accepted as normal, too high, or too low
for CF patients.
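The effect of a low average score on relative deviations can be illustrated with simple arithmetic. In the sketch below, the helper `relative_deviation_percent` and the fixed deviation of 1.0 point are hypothetical; the two averages are taken from the values reported above (mean total score, 68.0 points; lowest average mucus plugging subscore, 0.50 points).

```python
def relative_deviation_percent(deviation_points: float, average_points: float) -> float:
    """Express an absolute deviation (points) as a percentage of the average score."""
    if average_points == 0:
        raise ValueError("relative deviation is undefined for an average of zero")
    return 100.0 * deviation_points / average_points


# The same hypothetical 1.0-point deviation, related to two different averages.
total_score_case = relative_deviation_percent(1.0, 68.0)  # mean total Brody score
subscore_case = relative_deviation_percent(1.0, 0.5)      # lowest average subscore

print(round(total_score_case, 1), round(subscore_case, 1))
```

An identical absolute disagreement thus amounts to less than 2 % of a typical total score but to 200 % of a very low subscore, which is in line with the large percentage ranges observed for the subscores.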
Limitations of our retrospective reproducibility study include the lack of experienced
readers reassessing CT examinations after a short time interval, the restriction of
experience at long time intervals to one experienced reader only, and the relatively
low number of patients included. Since previous studies were based on the short-term
reproducibility of CT scoring results in CF patients among experienced readers, it
did not appear to add crucial knowledge to the field to obtain those results in addition.
Among readers with brief postgraduate medical experience, it would not have been possible
by design to perform a reproducibility study with intervals of several years in between
individual readings. Therefore, long-term reproducibility results could only have
been obtained for readers with more extensive post-graduate chest radiology experience.
In our department, such long-term experience with radiological examinations in CF
patients was restricted to only one attending radiologist.
In conclusion, the Brody score, as a weighted composite score that describes and quantifies
the respective presence, location, and extent of CF-related changes in lung morphology
as detected by means of CT, appears to be reproducible within about 10 % of an average
value between different radiologists, both with a short time interval of 28 to 60
days, and with a long time interval of 2 to 7 years. However, its individual subscores
for bronchiectasis, mucus plugging, peribronchial thickening, parenchymal opacity,
and hyperinflation are less reproducible, with values exceeding 10 %. Overall, the
long-term reproducibility was not as good as the short-term reproducibility. For clinical
practice, it appears advisable to review both the previous chest CT and the associated
Brody score along with the new chest CT and new Brody score and perhaps newly apply
the Brody scoring system to both.