Key words thorax - CT - CT-quantitative - adenocarcinoma - technical aspects
Introduction
Part-solid pulmonary nodules (PSN) are a frequent incidental finding on chest CT, for example in the setting of a lung cancer screening program. Persistent PSN have been reported to be malignant more frequently than solid nodules or pure ground-glass nodules (PGGN), with rates as high as 93.3 % [1]. In case of malignancy they usually represent adenocarcinoma and its precursors [2]. On the other hand, they can also represent various benign entities such as infection, inflammation, focal interstitial fibrosis, eosinophilic pneumonia, thoracic endometriosis, focal hemorrhage or organizing pneumonia, and have been shown to be transient in up to 69.8 % of cases [3]. This variety can make diagnosis challenging and dependent on the experience of the radiologist, who has to take into account multiple lesion features such as attenuation, location within the lung, size, shape and whether the lesion is solitary or one of several. Even when all of these features are considered, the diagnosis often remains uncertain, requiring follow-up examinations to document resolution, persistence or growth [5]. Moreover, because many of these nodules are small, the way size is measured becomes especially important. Variations in CT scanners and window settings as well as inter- and intra-rater variability are common and may have a critical impact on the assessment of size, especially during follow-up [6]. At present, manual uni- or bidirectional diameter measurements are the standard in lung cancer screening programs and day-to-day clinical care, as reflected in current guidelines on the management of pulmonary nodules [5, 7]. In PSN both the diameter of the whole lesion and that of the solid part should be measured, with the focus on the solid component: although the solid component does not always correspond to the pathologically determined invasive component, there is a general correlation between them [4, 5, 6, 8].
There is accumulating evidence that semi-automated computer-aided volumetry (CAV) has several advantages over manual diameter measurements. The Dutch-Belgian lung cancer screening trial (NELSON), the first screening program to use semi-automated CAV instead of manual diameter measurements, achieved high negative predictive values and presumably fewer false-positive results than other lung cancer screening trials [9, 10]. Furthermore, the volume-based management protocol yielded high sensitivity and specificity for the 2-year lung cancer probability [11]. In their investigation of diameter and volume measurements for the estimation of lung nodule size, Heuvelmans et al. [12] concluded that using the mean or maximum axial diameter to assess the size of intermediate-sized lung nodules leads to a substantial overestimation of nodule volume compared with semi-automated volumetry, and that the median intra-nodular diameter variation exceeds the 1.5 mm growth cut-off advocated in screening guidelines such as Lung-RADS, implying a significant potential for errors in nodule management. Measuring the accuracy of semi-automated volumetry is not trivial, since the “true” size of a pulmonary nodule is unknown in most cases. The reference standard of volume measurement after nodule excision is imperfect due to factors such as an inevitable bias toward larger nodules, differences in pathology handling techniques and variations in the degree of lung inflation [13]. Nevertheless, several phantom studies have delivered promising results [14, 15, 16, 17]. Apart from accurate estimation of nodule volume, it is arguably more important in clinical practice for the software to achieve high intra- and interrater reliability, since many nodules require follow-up examinations. In the past, the variability of nodule volume measurements has generally been reported to be approximately 25 % [18].
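The relationship between diameter and volume can be made concrete with a small calculation. The sketch below assumes an idealized spherical nodule, which is a simplification for illustration only (real nodules are rarely spherical, which is part of the reason diameter-based estimates overestimate volume). The function names are hypothetical and not taken from any cited software.

```python
import math


def sphere_volume_mm3(diameter_mm: float) -> float:
    """Volume of an idealized sphere with the given diameter (V = pi/6 * d^3)."""
    return math.pi / 6.0 * diameter_mm ** 3


def relative_volume_change(d_before_mm: float, d_after_mm: float) -> float:
    """Relative volume change (as a fraction) implied by a diameter change of a sphere."""
    v_before = sphere_volume_mm3(d_before_mm)
    v_after = sphere_volume_mm3(d_after_mm)
    return (v_after - v_before) / v_before


def diameter_change_for_volume_change(rel_volume_change: float) -> float:
    """Relative diameter change of a sphere that corresponds to a relative volume change."""
    return (1.0 + rel_volume_change) ** (1.0 / 3.0) - 1.0


if __name__ == "__main__":
    # A 1.5 mm diameter increase of a 6 mm sphere (6.0 mm -> 7.5 mm).
    print(f"{relative_volume_change(6.0, 7.5):.1%}")
    # Diameter change equivalent to a 25 % volume change.
    print(f"{diameter_change_for_volume_change(0.25):.1%}")
```

Under this spherical assumption, a 1.5 mm increase from 6 mm to 7.5 mm corresponds to a volume increase of roughly 95 %, whereas a 25 % change in volume corresponds to a diameter change of only about 8 %, i.e. well below 1 mm for a nodule of this size.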
The objective of this study was to test the performance of a software prototype for semi-automated computer-aided volumetry of part-solid pulmonary nodules with separate segmentation of the whole lesion and the solid component and to compare results with those acquired by manual volumetry.
Material and Methods
Study population
This retrospective evaluation of CT image data was approved by the local institutional review board (registration number 187/2018BO2). A retrospective database search of the local radiology department identified 34 chest CT scans of 19 consecutive patients (median age 75 years; range, 55–91 years; 8 female) diagnosed with part-solid pulmonary nodules (n = 66) in the routine CT-work-up between February 2015 and February 2018.
CT examinational protocol
All chest CTs were obtained unenhanced in the end-inspiratory phase. In total, 34 CT image datasets with a mean of 2 (range, 1–10) follow-up examinations were evaluated. CT examinations were performed using a multi-detector scanner (SOMATOM Definition Flash, Siemens Healthineers, Forchheim, Germany) with a 300–350 mm field of view, a 512 × 512 reconstruction matrix, 120 kV, 100 effective mAs and a tube rotation time of 0.5 s. In all patients a spiral acquisition was obtained from the apex to the base of the lungs. Patients were positioned supine with the arms elevated and abducted. Thin-slice CT scans (0.6 mm) were reconstructed using a smooth reconstruction kernel (filter, B31f). For 54 lesions both smooth and sharp kernel reconstructions (filter, B70f) were available. All chest CTs were analysed for additional pathologies that could have affected the volumetry results, e.g. pleural effusion, pulmonary oedema, haemorrhage or pneumonia; affected scans were excluded from the final analysis.
Functioning of the software prototype
Implementation of manual segmentation
The complete chest CT image dataset is displayed in three planes. The reader identifies the PSN and uses a designated tool to manually draw the edges of the whole lesion and of the solid part on every image the nodule is visible on. The edges can be drawn and freely adjusted in all three planes. After finalizing the manual segmentation, the software automatically calculates the volume and longest axial diameter of both the entire nodule and the solid part without displaying results to the reader.
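How a volume and a longest axial diameter can be derived from a finished segmentation may be sketched as follows. This is a minimal illustration, not the prototype's implementation: it assumes the segmentation is available as a binary voxel mask indexed (z, y, x) with known voxel spacing, it ignores partial volume effects, and the function names are hypothetical.

```python
import numpy as np


def mask_volume_mm3(mask: np.ndarray, spacing_mm: tuple) -> float:
    """Volume of a binary mask: number of segmented voxels times the volume of one voxel."""
    voxel_volume = float(np.prod(spacing_mm))
    return int(np.count_nonzero(mask)) * voxel_volume


def longest_axial_diameter_mm(mask: np.ndarray, spacing_mm: tuple) -> float:
    """Largest in-plane distance between two segmented voxel centers on any axial slice.

    mask is indexed (z, y, x) and spacing_mm is (dz, dy, dx). Distances are
    measured between voxel centers, so the result is about one voxel shorter
    than an edge-to-edge measurement. Brute force, adequate for small nodules.
    """
    _, dy, dx = spacing_mm
    longest = 0.0
    for axial_slice in mask:
        ys, xs = np.nonzero(axial_slice)
        if ys.size < 2:
            continue
        points = np.column_stack((ys * dy, xs * dx))
        diffs = points[:, None, :] - points[None, :, :]
        distances = np.sqrt((diffs ** 2).sum(axis=-1))
        longest = max(longest, float(distances.max()))
    return longest
```

Applied separately to the mask of the entire nodule and to the mask of the solid part, these two functions yield the four values per nodule described above.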
Implementation of semi-automated computer-aided segmentation
The complete chest CT image dataset is displayed in three planes. The reader identifies the PSN and subjectively selects the axial slice in which the lesion shows the longest diameter. A designated tool is used to draw a straight line (seed line) through the longest diameter. The software then immediately performs automatic segmentation separately for the entire nodule and the solid part and calculates the volumes and longest axial diameters. The reader is blinded to the segmentation results.
Technical description of semi-automated computer-aided segmentation
After CAV is initialized by drawing the seed line, the algorithm computes a histogram of the attenuation of the voxels marked by the seed line to differentiate between solid lesions (i.e. parenchymal consolidation obscuring pulmonary structures such as vessels and bronchi) and part-solid lesions. If the 25 % quantile of the histogram exceeds a predefined attenuation threshold, a pure solid lesion is assumed. In this case the lesion is segmented through region growing followed by morphological operations. The algorithm determines whether the lesion shows direct pleural contact, in which case the nodule is separated from the pleura. A detailed description of the algorithm can be found in a study by Moltz et al. [19]. If the histogram analysis does not indicate a pure solid lesion (i.e. it detects ground-glass opacification, which shows higher attenuation than normal lung parenchyma and lower attenuation than the solid portion and pulmonary soft tissues such as vessels or bronchi), the entire part-solid lesion is segmented through region growing with boundaries determined via intensity analysis of the nodule region and the surrounding parenchyma [20]. This is followed by morphological operations analogous to those performed for solid lesions. In part-solid lesions the denser structures belonging to the solid compartment are identified via thresholding: the center of the largest solid structure is used as a seed point to segment a solid compartment with the same algorithm as described above for pure solid lesions. The solid compartment is restricted to the boundaries of the subsolid compartment. The algorithm accounts for partial volume effects when determining the volumes of the solid and the subsolid compartment [21]. The reported subsolid volume includes the volume of the solid compartment. Examples of segmentation results are given in [Fig. 1] and [Fig. 2].
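The two central steps described above, the quantile-based decision between solid and part-solid lesions and the subsequent region growing, can be illustrated with a strongly simplified sketch. It is not the algorithm of Moltz et al. [19]: the threshold value and the fixed attenuation window are hypothetical placeholders (the source does not state them), and morphological post-processing, pleural separation and partial volume correction are omitted.

```python
from collections import deque

import numpy as np

# Hypothetical value for illustration; the threshold actually used is not stated.
SOLID_THRESHOLD_HU = -300.0


def is_pure_solid(seed_line_hu: np.ndarray, threshold_hu: float = SOLID_THRESHOLD_HU) -> bool:
    """Assume a pure solid lesion if the 25 % quantile of the seed-line attenuation exceeds the threshold."""
    return float(np.quantile(seed_line_hu, 0.25)) > threshold_hu


def region_grow(volume_hu: np.ndarray, seed: tuple, lower_hu: float, upper_hu: float) -> np.ndarray:
    """6-connected region growing from a seed voxel.

    Collects all voxels connected to the seed whose attenuation lies within
    [lower_hu, upper_hu]. Returns a boolean mask with the shape of volume_hu.
    """
    mask = np.zeros(volume_hu.shape, dtype=bool)
    if not (lower_hu <= volume_hu[seed] <= upper_hu):
        return mask
    mask[seed] = True
    queue = deque([seed])
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in offsets:
            n = (z + dz, y + dy, x + dx)
            inside = all(0 <= n[i] < volume_hu.shape[i] for i in range(3))
            if inside and not mask[n] and lower_hu <= volume_hu[n] <= upper_hu:
                mask[n] = True
                queue.append(n)
    return mask
```

In this simplified setting, a part-solid lesion would be handled by growing the whole lesion with a wide attenuation window, growing the solid compartment with a narrower one, and intersecting the two masks so that the solid compartment stays within the boundaries of the subsolid compartment.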
Fig. 1 Sample images of a 78-year-old male patient with an adenocarcinoma in the right upper lobe. Images demonstrate results of manual segmentation using the smooth reconstruction kernel (b, c) and semi-automated computer-aided segmentation using the smooth (d) and sharp (e) kernel.
Fig. 2 Sample images of a 76-year-old female patient with an adenocarcinoma in the right upper lobe. Images show the results of semi-automated computer-assisted segmentation using the smooth (left) and sharp reconstruction kernel (right).
Manual and computer-assisted segmentation and volumetry
For each of the 66 part-solid nodules, four sets of volume measurements (MV1, MV2, CAV1, CAV2) were produced by two radiology residents (Radiologist 1 and Radiologist 2) and two medical students, each set containing separate measurements of the entire PSN and the solid part: manual volumetry performed by Radiologist 1 (MV1), manual volumetry performed by Radiologist 2 (MV2), CAV performed by medical student 1 (CAV1) and CAV performed by medical student 2 (CAV2). Radiologist 1 did not have any significant experience in reading chest CTs; Radiologist 2 had three years of experience.
In a subset of 54 part-solid nodules, CT datasets had been reconstructed with both the smooth and the sharp kernel. In this subset two additional sets of CAV measurements (CAVsmooth, CAVsharp) were produced by medical student 1, each set again containing separate measurements of the entire PSN and the solid part. CAV was performed three separate times for each PSN using different variants of seed lines according to the following instructions: Seed line 1: “Draw a seed line through the longest diameter.”; Seed line 2: “Draw a seed line through the longest diameter but be a little imprecise.”; Seed line 3: “Draw a seed line through the approximate longest diameter and extend the seed line into the surrounding lung parenchyma.” The reader was blinded to the segmentation results. The average of the three volume measurements was calculated (CAVsmooth). The manually drawn seed lines were then transferred to the CT datasets reconstructed with the sharp kernel, so that CAV results for both kernels were obtained with exactly the same seed lines, and the average of the three volume measurements was again calculated (CAVsharp).
Analysis
Subjective visual assessment
Four weeks after they had been produced, the blinded segmentation results MV1, MV2 and CAV1 were shown to a senior radiologist with 25 years of experience in reading chest CTs (Radiologist 3) and to Radiologist 2. They visually assessed the quality of the results using a dedicated software program. The readers selected each of the 66 segmented PSNs from a list. The selected PSN was shown on the axial CT images with the segmentation results displayed as colored lines surrounding the edges of the entire nodule and the solid part, each color representing one of the three datasets. The readers could choose which of the segmentation results were displayed at any time, with the option to display any combination or no result at all. This ensured that the lesion itself could be examined well and that segmentation results could be compared directly. The readers visually evaluated how exactly the lines depicted the borders of the solid part and the entire nodule. Each segmentation result was rated as either satisfactory or unsatisfactory by consensus reading.
Quantitative statistical analysis
The following parameters were evaluated:
1. CAV Accuracy
CAV accuracy was assessed by comparing semi-automated CAV (CAV1 and CAV2) to the calculated average of the two radiology residents’ manual volume measurements (MV1 and MV2), which was defined as the reference standard, using the Bland-Altman method [22, 23].
2. CAV and manual volumetry interrater variability
The interrater variability of CAV and manual volumetry was assessed by comparing the results of semi-automated CAV performed by the two medical students (CAV1 and CAV2) and the results of manual volumetry performed by Radiologist 1 and 2 (MV1 and MV2) using the Bland-Altman method and calculating the intraclass correlation coefficient (ICC).
3. CAV intra-rater variability
The intra-rater variability of CAV was assessed by determining each minimum and maximum measurement out of the three separate measurements per PSN performed by medical student 1 in the CT datasets that had been reconstructed with the smooth kernel (CAVsmooth). These were then compared via the Bland-Altman method. Additionally, we calculated the ICC for the three separate measurements.
4. Variability between the smooth and sharp reconstruction kernel
Variability of CAV measurements between the smooth and the sharp reconstruction kernel was assessed via comparing the calculated average values of CAVsmooth with those of CAVsharp using the Bland-Altman method.
Bland-Altman analysis is based on the relative differences in volume measurements, i.e. the difference between two measurements divided by their mean. Volume measurement variability is defined as the 95 % limits of agreement of these relative differences. ICC estimates and their 95 % confidence intervals were calculated based on a single-rater, absolute-agreement, two-way random-effects model. A p-value below 0.05 was considered statistically significant. We used IBM SPSS Statistics 26 and GraphPad Prism 9.
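The two statistics used throughout the analysis can be sketched in a few lines. The study itself used SPSS and GraphPad Prism; the code below is only an illustrative re-implementation under common textbook conventions, namely limits of agreement as mean ± 1.96 sample standard deviations and the Shrout-Fleiss ICC(2,1) formula for a single-rater, absolute-agreement, two-way random-effects model. The function names are hypothetical, and confidence intervals for the ICC are not computed.

```python
import numpy as np


def bland_altman_relative(a, b):
    """Relative Bland-Altman statistics in percent.

    Each relative difference is (a - b) divided by the mean of a and b.
    Returns (mean relative difference, lower limit, upper limit), with the
    limits of agreement taken as mean +/- 1.96 sample standard deviations.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rel = 100.0 * (a - b) / ((a + b) / 2.0)
    mean = float(rel.mean())
    sd = float(rel.std(ddof=1))
    return mean, mean - 1.96 * sd, mean + 1.96 * sd


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings has shape (n_subjects, k_raters), e.g. one row per nodule and
    one column per reader or per repeated measurement.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)
    # Mean squares of the two-way ANOVA without replication.
    ms_rows = k * ((row_means - grand) ** 2).sum() / (n - 1)
    ms_cols = n * ((col_means - grand) ** 2).sum() / (k - 1)
    residual = x - row_means[:, None] - col_means[None, :] + grand
    ms_err = (residual ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
```

For the interrater comparison, `a` and `b` would hold the volumes of the two readers for each nodule; for the intra-rater comparison, the per-nodule minimum and maximum of the three repeated measurements; and `icc_2_1` would receive one column per reader or per repetition.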
Results
Mean values and standard deviations for volumes and diameters of the entire lesion and the solid part are presented in [Table 1].
Table 1
Mean volumes [mm³] and longest axial diameters [mm] (± standard deviation) of the entire PSN and the solid lesion part acquired by manual volumetry and CAV.
measurement | manual volumetry (reader 1) | manual volumetry (reader 2) | CAV (student 1)
volume entire PSN [mm³] | 1401 (± 2929) | 1607 (± 3420) | 1213 (± 2706)
volume solid part [mm³] | 272 (± 500) | 245 (± 376) | 266 (± 440)
diameter PSN [mm] | 15.0 (± 7.2) | 15.8 (± 8.4) | 12.3 (± 8.3)
diameter solid part [mm] | 9.1 (± 3.7) | 9.2 (± 3.6) | 8.7 (± 4.2)
Subjective visual assessment
Manual segmentation of the solid part was rated as satisfactory in 79 % and 80 % of nodules for Radiologist 1 and Radiologist 2, respectively. Manual segmentation of the entire nodule was rated as satisfactory in 73 % and 76 %. Semi-automated computer-assisted segmentation delivered satisfactory results in 77 % for the solid part and 67 % for the entire nodule ([Table 2]).
Table 2
Results of subjective visual assessment. Percentage of segmentation results rated as satisfactory.
method (reader) | solid part | entire PSN
Manual volumetry (Radiologist 1) | 79 % (52/66) | 73 % (48/66)
Manual volumetry (Radiologist 2) | 80 % (53/66) | 76 % (50/66)
CAV (Medical Student 1) | 77 % (51/66) | 67 % (44/66)
Statistical analysis of volumetry
Numbers in brackets following ICC values indicate the lower and upper bounds of their 95 % confidence intervals.
1. CAV Accuracy
For the solid part, relative variability between CAV and the reference standard was –150 % to 116 % for CAV1 and –151 % to 117 % for CAV2, with a mean relative difference of –17 % in both cases. For the entire nodule, relative variability was –106 % to 54 % for CAV1 and –63 % to 49 % for CAV2, with mean relative differences of –26 % and –7 %, respectively. The respective Bland-Altman plots are shown in [Fig. 3].
Fig. 3 CAV accuracy. Bland-Altman plots depicting variability between CAV1 and CAV2 and the reference standard. The mean differences (middle dotted line) and the upper and lower 95 % limits of agreement (upper and lower dotted lines) in percent were as follows (limits of agreement in parentheses): a: –17 (–150 to 116), b: –17 (–151 to 117), c: –26 (–106 to 54), d: –7 (–63 to 49).
2. CAV interrater variability
For the solid part, relative variability between CAV1 and CAV2 was –16 % to 16 % with a mean relative difference of –0.075 %. For the entire nodule, relative variability was –102 % to 65 % with a mean relative difference of –18 %. The respective Bland-Altman plots are shown in [Fig. 4]. For the solid part the ICC was 0.998 (0.997, 0.999). For the entire lesion the ICC was 0.880 (0.806, 0.926).
Fig. 4 Interrater variability. Bland-Altman plots depicting interrater variability between CAV1 and CAV2 and between MV1 and MV2. The mean difference (middle dotted line) and the upper and lower 95 % limits of agreement (upper and lower dotted lines) in percent were as follows (limits of agreement in parentheses): a: –0.075 (–16 to 16), b: –18 (–102 to 65), c: –3.6 (–89 to 82), d: –5.9 (–46 to 34).
3. CAV intra-rater variability
For the solid part, relative intra-rater variability was –70 % to 49 % with a mean relative difference of –10 %. For the entire nodule, variability was –111 % to 31 % with a mean relative difference of –40 %. The respective Bland-Altman plots are shown in [Fig. 5a, b]. The ICC of the three separate measurements per PSN performed by medical student 1 was 0.992 (0.988, 0.995) for the solid part and 0.929 (0.883, 0.958) for the entire nodule.
Fig. 5 CAV intra-rater variability; intrascan variability between reconstruction kernels. Bland-Altman plots depicting intra-rater variability for CAVsmooth and the intrascan variability between the smooth and sharp reconstruction kernels. The mean difference (middle dotted line) and the upper and lower 95 % limits of agreement (upper and lower dotted lines) in percent were as follows (limits of agreement in parentheses): a: –10 (–70 to 49), b: –40 (–111 to 31), c: –3.2 (–45 to 39), d: 13 (–21 to 46).
4. Variability between the smooth and sharp reconstruction kernel
For the solid part, relative variability of CAV measurements between the smooth and the sharp reconstruction kernel was –45 % to 39 % with a mean relative difference of –3.2 %. For the entire nodule, variability was –21 % to 46 % with a mean relative difference of 13 %. The respective Bland-Altman plots are shown in [Fig. 5c, d].
Discussion
Overall, the software prototype showed mixed results. Subjective assessment of CAV yielded satisfactory results, with a somewhat higher rate of satisfactory segmentation results for the solid part. On the other hand, Bland-Altman analysis showed comparatively lower accuracy and, interestingly, better results for the entire nodule than for the solid part. Since both the subjective assessment of results and the establishment of the reference standard were based on subjective visual delineation of the edges of the solid and subsolid parts, this could be a result of relatively high intra- and interrater variability in this task. The reduced difference in attenuation between the ground-glass component of a subsolid nodule and the surrounding lung parenchyma is a known segmentation problem [13]. The subjective impression of the authors is that, during manual segmentation, the edges of the solid lesion parts can often be identified more easily and confidently than those of the ground-glass part because they are more sharply delineated. The volumes measured by CAV were lower than the manually derived reference standard. In clinical practice, rather than measuring the true size of a PSN, which is not known, it is more important to detect size changes during follow-up, which requires high intra- and interrater reliability. For CAV, Bland-Altman analysis showed low interrater variability for the solid part but relatively high variability for the entire nodule. Expressed as ICCs, the agreement was high for both. Interestingly, the interrater variability of manual segmentation was lower for the entire nodule than for the solid part. Intra-rater variability of CAV was relatively high overall, with lower values for the solid part than for the entire nodule. Expressed as ICCs, the agreement was high.
Regarding differences between the two reconstruction kernels we found that with the smooth kernel the volume of the solid part was measured slightly lower and the volume of the entire nodule somewhat higher. Overall, variability between the kernels was higher for the solid part compared to the entire nodule.
These findings are important because accurate and especially precise size measurement of PSNs, a task that can be difficult to accomplish adequately when performed manually, is vital for estimating their malignant potential both at the initial assessment and during follow-up. Additionally, valid quantification is particularly important for the solid part of malignant nodules due to its known general correlation with the invasive component [6].
There are not many publications examining semi-automated volumetry of part-solid nodules. Most publications examine subsolid nodules in general, of which part-solid nodules are a subset. Moreover, the studies including part-solid nodules did not for the most part perform separate segmentation for the solid part.
With regard to the subjective evaluation of segmentation quality, Benzakoun et al. [24] examined 47 PGGNs and 50 PSNs and found satisfactory results in 81 %. Charbonnier et al. [25] found satisfactory results in 80.6 % for the solid parts of 170 subsolid nodules. These values are slightly better than, but similar to, ours. Intra-rater variability for the entire nodule was lower in other studies than in our own. Kim et al. [26] analyzed 72 PGGNs and 22 PSNs and found a variability of –7.6 % to 8.5 %. Park et al. [27] examined 30 PGGNs and found a maximum variability of –9.1 % to 10.1 % with a sharp reconstruction kernel and of –11.6 % to 11.8 % with a medium sharp reconstruction kernel. The higher variability in our study might be a result of the deliberate manipulation of the seed lines in repeated measurements and of the use of a smooth kernel. Expressed as an ICC, Scholten et al. [29] found an agreement of 0.92, which almost equals our own result. Regarding interrater variability for the entire lesion, Kim et al. [26] found a variability of –11.7 % to 18.1 % and Park et al. [27] of –15.8 % to 13.4 % with a sharp reconstruction kernel and –11.1 % to 6.2 % with the medium sharp kernel. These values are also lower than our own. Expressed as ICCs, we also found comparatively lower agreement than, for example, Scholten et al. [28, 29] or Kamiya et al. [30]. In their two studies, which included 24 PGGNs and 20 PSNs and 19 PGGNs and 14 PSNs, respectively, Scholten et al. found ICCs between 0.920 and 0.957. Kamiya et al. found an ICC of 0.940 in an analysis of 4 PGGNs and 92 PSNs. Expressed as relative volume deviation, other authors found values between –1.2 % and 18.1 % [26, 27]. With respect to volume measurements of the solid nodule part, Kamiya et al., in the study cited above, found ICCs of 0.994 to 0.996, which are similar to our own results.
Regarding differences in volume between manual and semi-automated measurements, the study by Scholten et al. demonstrated that the average volume was 24.3 % to 26.5 % smaller when measured manually [29]. This stands in contrast to our results, which showed the reverse.
Our study is limited by its retrospective design with typical drawbacks such as the fact that sharp kernel reconstructions were not available for all nodules. The number of nodules is rather low in absolute terms but similar to other studies on this topic. We did not have a histological gold standard to determine the accuracy of volume and diameter measurements, but this is a common problem concerning publications on this issue.
In conclusion, although the software prototype delivers satisfactory results when segmentation is evaluated subjectively, quantitative statistical analysis revealed room for improvement especially regarding the segmentation accuracy of the solid part and the reproducibility of measurements of the nodule’s subsolid margins.
Accurate and reliable size measurement plays an important role in the management of PSNs, which possess relatively high malignant potential.
The workload regarding PSN management is going to increase with the implementation of lung cancer screening programs.
CAV has the potential to make nodule size quantification easier and faster if the software’s accuracy and especially its reproducibility can reach or even surpass the level of manual size measurement.