Key words
computed tomography - calcium scoring - coronary artery calcium - diagnostic accuracy
- machine learning
Introduction
Coronary artery disease (CAD) is the leading cause of death worldwide [1] [2] [3]. Given the burden of CAD on patients and the health care system, early detection
of the disease and prediction of the individual risk of developing cardiovascular
events are crucial. Systematic research in this area has advanced treatment and patient care
and has made individual risk assessment possible, which in turn
helps to optimize treatment [4] [5]. The current clinical guidelines in the US and Europe recommend calcium scoring
computed tomography (CSCT) in selected asymptomatic individuals, typically at low
to intermediate risk of CAD [6] [7].
Non-contrast-enhanced, ECG-triggered CSCT is performed at a low radiation dose and
can determine the cardiovascular risk for each patient, using the well-established
metrics Agatston score (AS), volume score (VS), and mass score (MS) [8]. The AS quantifies calcium burden by multiplying the area of each lesion above a
130 HU threshold by a density weighting factor derived from the peak attenuation of the lesion. The VS is defined as the total number of voxels exceeding the threshold
of 130 HU for the respective calcium region [8] [9]. Whereas VS and AS are intended as indirect indicators of coronary artery calcification
(CAC), MS provides an actual quantitative measure and assesses the true mass of CAC
[8].
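As an illustration of the scores described above, the following minimal Python sketch computes an Agatston score and a volume score. It assumes the conventional Agatston density weights (1 for 130–199 HU, 2 for 200–299 HU, 3 for 300–399 HU, 4 for ≥ 400 HU) and assumes that the voxel count is converted to mm³ via the voxel volume; the function names and the example lesions are hypothetical and do not reproduce the software evaluated in this study.

```python
def density_weight(peak_hu):
    """Conventional Agatston density factor from a lesion's peak attenuation (HU)."""
    if peak_hu < 130:
        return 0  # below the calcium threshold, not scored
    if peak_hu < 200:
        return 1
    if peak_hu < 300:
        return 2
    if peak_hu < 400:
        return 3
    return 4


def agatston_score(lesions):
    """Sum of area (mm^2) x density weight over all per-slice lesions."""
    return sum(area_mm2 * density_weight(peak_hu) for area_mm2, peak_hu in lesions)


def volume_score(n_voxels, voxel_volume_mm3):
    """Number of supra-threshold voxels times the voxel volume (assumed convention)."""
    return n_voxels * voxel_volume_mm3


# Hypothetical (area in mm^2, peak HU) pairs for three per-slice lesions.
lesions = [(4.0, 180), (2.5, 350), (6.0, 520)]
print(agatston_score(lesions))  # 4.0*1 + 2.5*3 + 6.0*4 = 35.5
```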
Typically, radiologists use semi-automated software for evaluation, including manual
detection and marking of coronary artery calcifications [10], supported by threshold-based, automated region-growing algorithms. Up to now, measurement
of CAC requires manual input by a human operator to identify and assign calcified
coronary lesions to the left main artery (LM), left anterior descending artery (LAD),
circumflex artery (CX), or right coronary artery (RCA) [11] [12].
Due to the worldwide use of CSCT, there is a need to further improve and automate
the examination and post-processing workflow [13]. In recent years, developments in machine learning have led to improvements in automated
systems for CSCT [10] [13] [14] [15]. With regard to determining the total calcium load of all coronary arteries, some
studies have already shown promising results [10] [14]. Data regarding the performance of machine learning-based algorithms for the detection
of CAC with identification of the particular coronary artery are limited. However,
knowledge of the calcium load of each individual coronary vessel could have an impact
on cardiovascular risk management. In fact, CAC of the LM and LAD was associated with
increased mortality risk and CAC of the right coronary artery with decreased mortality
risk [16] [17].
The aim of this retrospective single-center study was to evaluate novel machine learning-based
software for fully automated calcium scoring with identification and evaluation of
each coronary artery in non-contrast cardiac CT, as compared to a semi-automated post-processing
tool serving as the standard of reference.
Materials and Methods
The local institutional review board approved this retrospective analysis of patient
data. For this single-center study, patients and their baseline characteristics
were collected from the institutional database. A total of 505 patients
with CSCT performed on a state-of-the-art CT scanner (SOMATOM Definition Flash or
SOMATOM Force; Siemens Healthineers, Erlangen, Germany) between January 2013 and July
2020 were included. The exclusion criteria were cardiac CT without non-contrast-enhanced
ECG-triggered calcium scoring datasets, pediatric cardiac CT datasets, and patients
with intracoronary stents ([Fig. 1]).
Fig. 1 STARD flowchart of patient inclusion.
Imaging protocol
All CSCT scans were performed on a state-of-the-art multidetector CT scanner (SOMATOM
Definition Flash or SOMATOM Force; Siemens Healthineers, Erlangen, Germany). All images
were acquired with automatic tube current modulation (CARE Dose4D), automatic kV
modulation (CARE kV), a reference mAs of 60, and a reference kV of 120. For SOMATOM
Force, the gantry rotation time was 0.25 s, the pitch was 3.2, and the collimation
was 0.6 × 192 mm. Reconstructions were computed with an Sa36 kernel, a slice thickness
of 3.0 mm, and an increment of 1.5 mm. For SOMATOM Definition Flash, the gantry rotation
time was 0.28 s, the pitch was 3.4, and the collimation was 0.6 × 128 mm. Reconstructions
were computed with a B35f kernel, a slice thickness of 3.0 mm, and an increment of 1.5 mm.
If the patientʼs heart rate was above 65 bpm, a beta-blocker (5 mg metoprolol, Recordati
Pharma GmbH, Germany) was administered intravenously. Following CSCT, contrast-enhanced
angio/cardiac CT was performed.
Machine learning-based Calcium scoring software
The automated software was trained on 1261 anonymized datasets from routine coronary
artery calcification examinations from multiple vendors, scanners, and from different
hospitals. No training data sets were analyzed in the current study.
First, the standard 130 HU threshold is applied to the image to identify voxels as
calcium candidates. For each candidate voxel, a small image patch as
well as the voxel position in a cardiac coordinate system and some local features
(e.g., HU value of the voxel) are extracted. The deep learning model works with two
components: a convolutional neural network (CNN) with ResNet architecture that processes
the image patch around the voxel and a dense network that processes the position in
the cardiac coordinate system and the local features. The results of both networks are
merged and passed to a classifier that outputs the probability of coronary calcium
for each voxel. If the average probability of a connected cluster is higher than a
predefined threshold, the cluster is marked in the application. The CNN is accompanied by an
atlas trained with segmented coronaries from CT angiography (CTA) datasets. This component indicates
whether a voxel is likely to be coronary or not, thus excluding heart valves, etc.
The next step is a deep learning algorithm that provides the position of the LM-LAD-LCX
bifurcation. The final classification of the branch is then performed using a
simple, fully connected neural network whose features include the spatial coordinates
of each voxel identified as belonging to the coronary arteries and the coordinates
of the voxel relative to the coronary bifurcation. This model yields four outcomes,
namely the probability that the voxel belongs to the LM, LAD, CX, or RCA. The final
assignment is made using a softmax function to determine the most likely branch
for each voxel.
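The two decision steps described above (keeping a connected cluster when its average calcium probability exceeds a threshold, and assigning a branch via softmax) can be sketched as follows. This is a simplified illustration only: the threshold of 0.5, the function names, and the example network outputs are hypothetical assumptions, and the neural networks themselves are not reproduced.

```python
import math

BRANCHES = ("LM", "LAD", "CX", "RCA")


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def assign_branch(logits):
    """Return the most probable branch label and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(BRANCHES)), key=lambda i: probs[i])
    return BRANCHES[best], probs[best]


def keep_cluster(voxel_probs, threshold=0.5):
    """Keep a connected cluster if its mean calcium probability exceeds the threshold.

    The threshold value is a placeholder; the source only states that it is predefined.
    """
    return sum(voxel_probs) / len(voxel_probs) > threshold


# Hypothetical raw outputs of the branch classifier for one voxel.
label, prob = assign_branch([0.2, 2.1, -0.5, 0.3])
print(label, round(prob, 3))

# Hypothetical per-voxel calcium probabilities of one connected cluster.
print(keep_cluster([0.9, 0.7, 0.4]))
```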
Calcium scoring and evaluation of the machine learning-based software
Semi-automated, clinically established post-processing software (syngo.via, version
VB50 Siemens Healthineers) was utilized to generate the reference standard. All 505
CSCT scans were double-read by two radiologists in multiple sessions (Reader 1 with
nine years and Reader 2 with four years of experience in cardiac CT diagnostics),
and all differences in image interpretation were resolved by consensus. To avoid bias,
both readers were blinded to the results of the automated software. As previously
described in the literature [4] [10], CAC was detected using a threshold of > 130 HU on an area of ≥ 1 mm², which corresponded to the default setting of the software. Calcified lesions of
interest were manually identified and assigned to their respective coronary artery
type (LM, LAD, CX, RCA). Regions were labeled to obtain the number of calcified lesions,
the artery-based AS, VS, MS, and total AS, VS, MS. After loading the images into the
software, the time measurement of the automatic system was started after the onset
of the automatic assessment and stopped after the software displayed a score. The
evaluation time of the reference standard included the location of all CACs and the
correlation of the automatically derived number of CACs. The time that was required
for the first reading was registered.
The individual scans were assigned to standardized risk groups [18] based on the AS: CAC 0, very low risk; CAC 1–10, low risk; CAC 11–100, moderate
risk; CAC 101–400, moderately high risk; CAC > 400, high risk. The automated software
was used on a regular daily routine diagnostic workstation (syngo.via, version VB50
Siemens Healthineers). All CSCT scans (n = 505) were analyzed with the machine learning-based
automated software. The number of calcified lesions was registered and additionally
assigned to the respective coronary artery. The AS, VS, and MS for the respective
coronary artery and the total AS, VS, and MS were determined. The duration of the
system run time was recorded. Subsequently, a double-check of the results was performed
in which the number and location of the calcified lesions were reviewed. The only
human interaction that was needed was for loading the images into the software ([Fig. 2]).
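The AS-based risk grouping used in this section can be expressed as a simple lookup. The sketch below is illustrative; the function name is hypothetical, and the handling of non-integer scores between two bins (assigned to the bin whose upper bound they do not exceed) is an assumption, since the categories are stated for integer ranges.

```python
def cac_risk_category(agatston):
    """Map a total Agatston score to the risk group used in this study."""
    if agatston < 0:
        raise ValueError("Agatston score cannot be negative")
    if agatston == 0:
        return "very low"
    if agatston <= 10:
        return "low"
    if agatston <= 100:
        return "moderate"
    if agatston <= 400:
        return "moderately high"
    return "high"


print(cac_risk_category(250))  # moderately high
```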
Fig. 2 a–b Reconstructions in axial, axial thin-section MIP, coronal, and sagittal planes with
calcifications in the LM, LAD, CX, and RCA. a Visible coronary calcifications before application of the automatic calcium
scoring software. b Calcium regions detected by the automated software and color-coded for the corresponding
artery.
Statistics
The available data were analyzed using SPSS (SPSS Statistics 26, IBM Corp., Armonk,
New York, USA) and R version 4.0.3 (The R Foundation for Statistical Computing, Vienna,
Austria), in particular using the package blandr [19]. Continuous variables are presented as mean ± standard deviation or as the median and interquartile
range (IQR) if non-normally distributed. The correlation and agreement between the
reference standard and the machine learning-based software for coronary artery-based
and total AS, VS, MS, and the number of lesions were calculated with Spearmanʼs rank
correlation coefficient (ρ) and intraclass correlation coefficient (ICC). The reference
standard and the machine learning-based automatic software were compared by way of
a Bland-Altman procedure. The agreement was examined after recoding values of 0 to
0.06 and subsequent log transformation because of the right skewness of the data.
Differences in risk classifications were assessed by weighted kappa analysis (κ).
The time difference was determined using the Wilcoxon signed-rank test.
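The log-transformed Bland-Altman procedure described above can be sketched in plain Python as follows. This is not the blandr implementation used in the study; it only illustrates the recoding of zeros to 0.06, the log transformation, and the back-transformation to ratios (reference/automated), with limits of agreement taken as ± 1.96 standard deviations. The function name and the example scores are hypothetical.

```python
import math
import statistics

ZERO_RECODE = 0.06  # zeros are recoded to this value before taking logs


def log_bland_altman(reference, automated):
    """Bland-Altman on log-transformed paired scores, back-transformed to ratios."""
    def safe_log(value):
        return math.log(ZERO_RECODE if value == 0 else value)

    # Difference of logs equals the log of the reference/automated ratio.
    diffs = [safe_log(r) - safe_log(a) for r, a in zip(reference, automated)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return {
        "bias": math.exp(mean_diff),
        "lower_loa": math.exp(mean_diff - 1.96 * sd_diff),
        "upper_loa": math.exp(mean_diff + 1.96 * sd_diff),
    }


# Hypothetical paired total Agatston scores (reference vs. automated software).
reference = [0, 12.0, 150.0, 420.0, 35.0]
automated = [0, 11.0, 160.0, 400.0, 36.0]
print(log_bland_altman(reference, automated))
```

A bias of 1 corresponds to the theoretical line of no bias, because the results are ratios rather than differences.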
Results
A total of 505 patients were successfully included in the study based on the inclusion
criteria: 132 (26.1 %) women and 373 men (73.9 %). The mean age was 57.6 ± 12.6 years
([Table 1]).
Table 1 Patient characteristics.
Variables | N (%)/mean ± SD
Patients | 505
Women | 132 (26.1 %)
Men | 373 (73.9 %)
Age (years) | 57.6 ± 12.6
BMI | 25.4 ± 5.4
The median time for the semi-automatic collection of data for the reference standard
was 59 seconds (IQR, 39–140 sec) compared to the time of 5.9 seconds (IQR, 3.9–16 sec)
required by the automatic machine learning-based algorithm (p < 0.001).
The correlation and agreement of the automatic algorithm and the reference standard
concerning the number of calcified lesions were calculated by Spearmanʼs rank correlation
coefficient and ICC for the respective arteries (Spearmanʼs rho > 0.965; ICC > 0.870)
(p < 0.001) ([Table 2]).
Table 2 Measures of association between automatic algorithm and reference standard.
Measure | Spearman ρ* | ICC* | 95 % CI ICC
LM number of lesions | 0.965 | 0.870 | [0.847–0.890]
LM volume (mm³) | 0.982 | 0.978 | [0.973–0.981]
LM equiv. mass (mg) | 0.982 | 0.981 | [0.977–0.984]
LM Agatston score | 0.982 | 0.983 | [0.979–0.985]
LAD number of lesions | 0.987 | 0.948 | [0.938–0.956]
LAD volume (mm³) | 0.996 | 0.953 | [0.944–0.960]
LAD equiv. mass (mg) | 0.996 | 0.957 | [0.945–0.964]
LAD Agatston score | 0.996 | 0.954 | [0.950–0.961]
CX number of lesions | 0.966 | 0.952 | [0.943–0.960]
CX volume (mm³) | 0.969 | 0.922 | [0.910–0.936]
CX equiv. mass (mg) | 0.969 | 0.924 | [0.910–0.936]
CX Agatston score | 0.970 | 0.920 | [0.905–0.932]
RCA number of lesions | 0.980 | 0.972 | [0.967–0.977]
RCA volume (mm³) | 0.986 | 0.990 | [0.988–0.991]
RCA equiv. mass (mg) | 0.986 | 0.990 | [0.987–0.991]
RCA Agatston score | 0.986 | 0.990 | [0.998–0.992]
Total number of lesions | 0.995 | 0.977 | [0.973–0.981]
Total volume (mm³) | 0.999 | 0.995 | [0.994–0.996]
Total equiv. mass (mg) | 0.998 | 0.992 | [0.991–0.993]
Total Agatston score | 0.999 | 0.996 | [0.995–0.996]
LM = left main artery, LAD = left anterior descending artery, CX = circumflex artery,
RCA = right coronary artery.
* all p < 0.001.
The coronary artery calcium scoring results of the machine learning-based software
correlated highly with the reference standard for the AS, VS, and MS for all four
coronary arteries (Spearmanʼs rho > 0.969) (p < 0.001). The Spearmanʼs rho of the
individual arteries can be found in [Table 2].
The agreement of the machine learning-based software with the reference standard was
evaluated using ICC. In terms of the AS, VS, and MS, the ICC was 0.983, 0.978, and
0.981, respectively, for the LM, 0.954, 0.953, and 0.957 for the LAD, 0.919, 0.922,
and 0.924 for the CX, and 0.989, 0.989 and 0.989 for the RCA. The ICC for the total
values of the AS, VS, and MS was 0.996, 0.995, and 0.992, respectively (p < 0.001)
([Table 2]).
In the Bland-Altman analysis (log-transformed and back-transformed to ratios; theoretical line of no bias
at y = 1), the mean bias with upper and lower limits of agreement (± 1.96 SD) for all arteries combined was:
AS 0.996 (1.33 to 0.74), VS 0.995 (1.40 to 0.71), and MS 0.995 (1.35 to 0.74). The
mean bias was minimal for the respective coronary arteries (0.964–1.0429). The values
for the individual arteries are shown in [Fig. 3] and [Table 3].
Fig. 3 Bland-Altman plots (log-transformed with back transformation) for LM, LAD, CX, and
RCA. Mean of log (rating) and log (artificial intelligence) on the x-axis, Rating
by humans/AI result ratio on the y-axis. The theoretical line of no bias is at y = 1.
Dashed lines indicate bias and LOAs, and dotted lines indicate 95 % confidence bands.
The solid line represents proportional bias. Observations with rating/AI ratios higher
than the maximum value on the y-axis are omitted for presentation, while analysis
used all available cases.
Table 3 Bland-Altman procedure with log-transformed measurement values; results are back-transformed (exponentiated).
Measure | Mean bias | Upper limit of agreement | Lower limit of agreement | p for proportional bias
LM volume (mm³) | 0.944 | 2.622 | 0.340 | 0.252
LM equiv. mass (mg) | 0.961 | 2.006 | 0.460 | 0.117
LM Agatston score | 0.945 | 2.580 | 0.346 | 0.254
LAD volume (mm³) | 1.041 | 2.182 | 0.496 | 0.422
LAD equiv. mass (mg) | 1.047 | 2.630 | 0.417 | 0.771
LAD Agatston score | 1.044 | 2.640 | 0.481 | 0.344
CX volume (mm³) | 1.047 | 3.848 | 0.285 | 0.745
CX equiv. mass (mg) | 1.030 | 2.802 | 0.379 | 0.999
CX Agatston score | 1.050 | 3.841 | 0.287 | 0.972
RCA volume (mm³) | 1.014 | 2.693 | 0.381 | 0.929
RCA equiv. mass (mg) | 1.011 | 1.888 | 0.541 | 0.852
RCA Agatston score | 1.014 | 2.434 | 0.422 | 0.851
Total volume (mm³) | 0.995 | 1.403 | 0.706 | 0.699
Total equiv. mass (mg) | 0.995 | 1.347 | 0.735 | 0.485
Total Agatston score | 0.996 | 1.332 | 0.744 | 0.812
LM = left main artery, LAD = left anterior descending artery, CX = circumflex artery,
RCA = right coronary artery.
Weighted kappa analysis for risk class assignment showed high accuracy for the AS
in total (weighted κ = 0.99) and for each artery (κ = 0.96–0.99). There were a total
of 88 misclassifications with consecutive change of the total Agatston score. Most
scans were incorrect within the low-risk category (CAC 1–10: n = 58) and moderate-risk
category (CAC 11–100: n = 22). These minor errors had no effect on the assignment
of the risk group and occurred mainly due to misregistration of image noise in the
heart and adjacent structures. The fully automated software classified 497 of 505
patients (98.4 %) into the correct risk category.
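For readers unfamiliar with the weighted kappa statistic reported above, the following minimal sketch computes it for ordinal risk categories coded 0 to 4. Linear disagreement weights are an assumption here, since the weighting scheme is not specified in the text; the function name and the example category assignments are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, n_categories):
    """Weighted kappa for two ordinal ratings, using linear weights (assumed)."""
    n = len(rater_a)
    k = n_categories
    # Observed joint proportions of category pairs.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    disagree_obs = 0.0
    disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at maximal distance
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * row[i] * col[j]
    if disagree_exp == 0:
        return 1.0  # both ratings fall entirely into one shared category
    return 1.0 - disagree_obs / disagree_exp


# Hypothetical risk categories (0 = very low ... 4 = high) for eight scans.
reference = [0, 0, 1, 2, 3, 4, 4, 2]
automated = [0, 0, 1, 2, 3, 4, 3, 2]
print(round(weighted_kappa(reference, automated, 5), 3))
```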
In five patients (1 %) with significant errors in the moderately high-risk category
(CAC 101–400), the software did not include calcification at the right coronary ostium
(n = 1) or failed to differentiate between coronary and pericardial calcifications
(n = 4), thus underestimating the calcium load. Significant overestimation of calcium
load was observed in three patients in the high-risk group (CAC > 400) due to erroneous
inclusion of calcifications at the aortic root (n = 1), pericardium (n = 1), and mitral
valve (n = 1) ([Fig. 4a–c]).
Fig. 4 a–c Reconstruction in axial planes after application of the automatic calcium scoring
software in three different patients. Depicted is an overestimation of calcium load
(arrows) by the automatic algorithm due to incorrect inclusion of calcifications at
the aortic root a, mitral valve b, and in the pericardium c.
Discussion
In this study, the performance of novel machine learning-based fully automated post-processing
software was evaluated for artery-based calcium scoring in cardiac CT, compared with
clinically established semi-automated post-processing software serving as the standard
of reference. Correlation, agreement, and risk classification were excellent for each
artery and in total. Compared with the semi-automated approach, the fully automated
analysis allows a tailored survey of each patientʼs calcium load to be collected in
significantly less time.
For the coronary arteries separately and as a total, the correlation and agreement
of the number of lesions, the AS, the VS, and the MS of the machine learning-based
software were excellent compared with the reference standard. The Bland-Altman plot
for the AS, VS, and MS showed a high level of agreement for all arteries. The Bland-Altman
evaluation was based on the log-transformed values of the two measurements
(automatic software and reference standard). This transformation is appropriate for
values that are strongly right-skewed and bounded below.
In our study, skewness of the data set was present, as 213 of 505 patients (42 %)
had a total AS of 0. Weighted kappa analysis provided accurate risk group categorization.
Several studies have already evaluated automated software for CSCT with comparable
results regarding correlation and agreement for calcium scoring and risk category
classification [10] [11] [20]. Due to differences in study design, data distribution, and quantitative assessment,
comparisons are difficult. The larger number of patients in our study confirms the
robustness of the automated software for the evaluation of CSCT. In contrast to previous
studies [10], exclusion of patients with metallic foreign bodies such as heart valve replacements
and cardiac pacemakers was not necessary. The softwareʼs CNN is trained to differentiate
whether a voxel belongs to a coronary artery or metal implant.
The number of studies evaluating automated CSCT software with calcium load assignment
for each coronary artery is limited [20]. Since the risk from calcium burden can vary for each coronary vessel, the excellent
performance of artery-specific automated calcium score evaluation can contribute to
time-efficient, cost-effective, tailored CAD screening [21]. The results of our study suggest that artery-specific automated calcium assessment
software could be integrated into routine clinical practice for the quantification
of coronary calcium with additional branch labeling. Since the software will be commercially
available, widespread clinical implementation and workflow integration are anticipated
and will hopefully yield the same results as our study.
We are aware that our study has limitations, mainly due to its retrospective nature,
and we made every effort to create a strong reference standard with two independent,
experienced readers. All CSCT scans were performed in a single center on two different
CT scanners from the same vendor. It has previously been suggested that calcium scores obtained with scanners from
other vendors might vary [22]. The automatic software was compared with semi-automatic software from the same
vendor. However, the results of the semi-automatic software can be reproduced on other
platforms [23]. Although this is one of the most extensive known studies evaluating automatic CAC
scoring from CSCT scans, an even larger data set would undoubtedly lead to even more
robust results.
Despite the overall excellent performance of the algorithm, there were some outliers.
Misclassification by the automated software occurred in five patients in the moderately
high-risk group, with calcifications at the ostium of the right coronary artery
not detected in one patient and partial failure to distinguish between coronary calcification
and calcification in the pericardium in the remaining four patients. In a total of three
patients, there was a misclassification into the high-risk group due to an overestimation
of the calcium burden because of an incorrect detection of calcifications at the aortic
root, the pericardium, and the mitral valve. These distinct errors are not
difficult to detect when reviewing the results and may therefore be of limited clinical
relevance. Nevertheless, the results of the automated algorithm should always be
verified by a human observer when used in routine clinical practice.
Furthermore, it would be beneficial to further develop the software to apply to non-ECG-triggered,
standard CT thorax examinations. A number of studies have already addressed epidemiologic
stratifications of coronary calcification on conventional chest CT [24] [25] [26]. However, the present study was designed to automatically assess coronary calcification
on cardiac CT in a large population in a detail-oriented manner.
In conclusion, this study presented the validation of fully automated software for
artery-specific detection of coronary calcification. The results showed excellent
correlation and agreement between the automatic and the reference standard for three
CAC scores and the number of coronary lesions in each coronary artery.
- Coronary calcium load is known to predict cardiovascular risk, and its automatic and time-efficient determination is of clinical importance.
- The utilization of machine learning-based applications in clinical practice can improve workflow efficiency for frequent CT examinations, such as non-contrast-enhanced calcium scoring computed tomography.