Nuklearmedizin 2019; 58(02): 155
DOI: 10.1055/s-0039-1683623
Poster
Thyroid
Georg Thieme Verlag KG Stuttgart · New York

Interobserver Agreement and Efficacy of Consensus Reading in Kwak-, EU-, ACR-TIRADS and ATA Guidelines for the Ultrasound Risk Stratification of Thyroid Nodules

P Seifert¹, R Görges², M Zimny³, S Schenke⁴

1   Universitätsklinikum Jena, Klinik für Nuklearmedizin, Jena
2   Universitätsklinikum Essen, Klinik für Nuklearmedizin, Essen
3   ÜBAG Nuklearmedizin Hanau, Hanau
4   Universitätsklinikum Magdeburg, Klinik für Nuklearmedizin, Magdeburg

Publication history

Published online: 27 March 2019

 
 

    Ziel/Aim:

    To evaluate the impact of consensus reading on the interobserver agreement (IA) of ultrasound classification systems for thyroid nodules (TN).

    Methodik/Methods:

    Four observers with at least five years of clinical experience independently rated ultrasound images of 80 TN according to the Kwak-, EU-, ACR-TIRADS and ATA Guidelines (GL). The cases were randomly extracted from a prospectively acquired database of > 1500 TN; the observers were blinded to all clinical data. The study was divided into two sessions (S1, S2) of 40 image sets each. After S1, a consensus reading was carried out (C1). The effect of C1 was then tested in S2 with 40 new cases, followed by a second consensus reading (C2). Fleiss' kappa (κ) was calculated for S1 and S2 to estimate IA and to evaluate the learning curves. The results of C1 and C2 were used to assess the diagnostic accuracy of each classification system by means of ROC analysis.
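Fleiss' kappa extends Cohen's kappa to more than two raters by comparing the mean observed agreement per nodule, P̄, with the agreement expected by chance from the pooled category frequencies, P_e, as κ = (P̄ − P_e) / (1 − P_e). The following is a minimal Python sketch of that standard formula, not the authors' software; the function name `fleiss_kappa` and the encoding of each observer's rating as an integer category index (for example one index per TIRADS category) are assumptions made for illustration.

```python
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[int]], n_categories: int) -> float:
    """Fleiss' kappa for a fixed number of raters per subject.

    ratings[i] holds the category index (0 .. n_categories - 1) that each
    rater assigned to subject i. The integer encoding is a hypothetical
    choice for illustration, e.g. one index per TIRADS category.
    """
    if not ratings:
        raise ValueError("need at least one subject")
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("need at least two raters")

    # counts[i][j]: number of raters who placed subject i in category j
    counts = []
    for subject in ratings:
        if len(subject) != n_raters:
            raise ValueError("every subject must have the same number of raters")
        row = [0] * n_categories
        for label in subject:
            if not 0 <= label < n_categories:
                raise ValueError(f"category index {label} out of range")
            row[label] += 1
        counts.append(row)

    # Observed agreement per subject: fraction of rater pairs that agree
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(per_subject) / n_subjects

    # Chance agreement from the pooled category proportions
    total = n_subjects * n_raters
    p_e = sum(
        (sum(row[j] for row in counts) / total) ** 2 for j in range(n_categories)
    )
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1.0 - p_e)
```

For instance, four observers rating three nodules as `[[2, 2, 2, 3], [4, 4, 4, 4], [1, 2, 1, 1]]` on a hypothetical five-category scale give P̄ = 2/3 and P_e = 42/144, hence κ ≈ 0.529. Computing κ separately for S1 and S2 with such a function would yield a session-wise comparison of the kind reported in the results.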

    Ergebnisse/Results:

    There were no significant differences in sex (70% women), age (51 ± 13 years) or the rate of malignant nodules (18%) between S1 and S2. IA increased significantly after C1 (p < 0.014), with κ values of 0.375 (0.615), 0.411 (0.596), 0.321 (0.569) and 0.410 (0.583) for Kwak-, EU-, ACR-TIRADS and ATA GL in S1 (S2), respectively. Furthermore, the percentage of exact matches among all observers across all classification systems increased significantly from S1 (35.6%) to S2 (52.5%, p < 0.01). ROC analysis including all cases revealed similar areas under the curve (AUC) for Kwak-, EU-, ACR-TIRADS and ATA GL (0.635, 0.675, 0.694 and 0.654, respectively; n.s.). The AUC did not increase from C1 (AUC for each scoring system: 0.677) to C2 (0.632, n.s.). However, the ATA GL were not applicable in 6.3% of all cases, including two malignant TN.
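When an ordinal risk category serves as the test variable and the final diagnosis as the reference, the empirical (nonparametric) area under the ROC curve equals the Mann–Whitney probability that a randomly chosen malignant nodule receives a higher category than a randomly chosen benign one, with ties counted as one half. The abstract does not state which ROC method was used, so the following Python sketch only illustrates this standard nonparametric estimate; the function name `ordinal_auc`, the integer category scores and the boolean malignancy labels are assumptions for illustration.

```python
from typing import Sequence


def ordinal_auc(scores: Sequence[int], malignant: Sequence[bool]) -> float:
    """Empirical ROC AUC for ordinal risk categories (higher = more suspicious).

    Equivalent to the Mann-Whitney estimate P(X_malignant > X_benign)
    + 0.5 * P(X_malignant == X_benign). Encoding is hypothetical.
    """
    if len(scores) != len(malignant):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, m in zip(scores, malignant) if m]
    neg = [s for s, m in zip(scores, malignant) if not m]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign nodule")

    # Compare every malignant/benign pair: higher = 1, tie = 0.5, lower = 0
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop costs O(n_malignant × n_benign) comparisons, which is negligible for 80 nodules. The tie correction matters for these scoring systems because a handful of coarse categories produces many tied scores.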

    Schlussfolgerungen/Conclusions:

    In this study, the four investigated systems yielded very similar results for IA and diagnostic accuracy. Learning through consensus reading significantly improves IA in TN classification systems, even for experienced observers.

