Ultraschall Med 2022; 43(05): e49-e55
DOI: 10.1055/a-1177-0480
Original Article

Accuracy of Trained Physicians is Inferior to Deep Learning-Based Algorithm for Determining Angles in Ultrasound of the Newborn Hip

Genauigkeit von geschulten Ärzten unterliegt einem Deep-Learning-basierten Algorithmus beim Bestimmen von Winkeln im Ultraschall der Säuglingshüfte
David Oelen (1), Pascal Kaiser (1), Thomas Baumann (2), Raoul Schmid (3), Christof Bühler (1), Bayalag Munkhuu (4), Stefan Essig (2)

1 Biotechnologie & Physik, Supercomputing Systems, Zürich, Switzerland
2 Research Department, Institute of Primary and Community Care Lucerne, Luzern, Switzerland
3 Praxis, Baarer Kinderarztpraxis, Baar, Switzerland
4 National Center for Maternal and Child Health, Ulaanbaatar, Mongolia

Abstract

Purpose Sonographic diagnosis of developmental dysplasia of the hip allows treatment with a flexion-abduction orthosis, which prevents hip luxation. Accurate determination of the alpha and beta angles according to Graf is crucial for correct diagnosis. It is unclear whether algorithms can predict these angles. We aimed to compare the accuracy of physicians with that of an automated method, reporting root mean squared errors (RMSE).
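
The RMSE used as the accuracy measure can be illustrated with a minimal sketch. The function name `rmse` and the angle values below are illustrative assumptions, not taken from the study.

```python
import math

def rmse(predicted, reference):
    """Root mean squared error between two equal-length sequences of angles (degrees)."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each per-image difference, average, then take the square root.
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical alpha angles in degrees (made-up values, not study data).
ground_truth = [60.0, 55.0, 62.0, 58.0]
routine_reading = [63.0, 51.0, 62.0, 58.0]
print(rmse(routine_reading, ground_truth))
```

Because the differences are squared before averaging, a few large misreadings raise the RMSE more than many small ones.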

Materials and Methods We used 303 306 ultrasound images of newborn hips collected between 2009 and 2016 in screening consultations. Trained physicians labeled every second image with alpha and beta angles during the consultations. A random subset of images was labeled under lab conditions, with ample time and attention to precision, to serve as ground truth. The automated method predicted the two angles using a convolutional neural network (CNN). The analysis focused on the alpha angle.

Results Three methods were implemented, each with a different abstraction of the problem: (1) CNNs that learn the angles directly, without any post-processing steps; (2) CNNs that return the relevant landmarks in the image, from which the angles are identified; (3) CNNs that return the base line, the bony roof line, and the cartilage roof line, which are needed to calculate the angles. The RMSE between physicians and ground truth was 7.1° for alpha. The best CNN architecture was (2), landmark detection. The RMSE between landmark detection and ground truth was 3.9° for alpha.
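
The post-processing step that methods (2) and (3) rely on, turning detected points or lines into angles, might look like the following minimal sketch. It assumes each line is given by two points in pixel coordinates, that alpha is measured between the base line and the bony roof line, and that beta is measured between the base line and the cartilage roof line. The function name, landmark names, and coordinates are hypothetical, and the study's actual landmark definitions are not given in this abstract.

```python
import math

def line_angle_deg(p1, p2, q1, q2):
    """Undirected angle in degrees (0 to 90) between line p1-p2 and line q1-q2."""
    ux, uy = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = q2[0] - q1[0], q2[1] - q1[1]
    if (ux == 0 and uy == 0) or (vx == 0 and vy == 0):
        raise ValueError("each line needs two distinct points")
    cross = ux * vy - uy * vx   # proportional to sin of the angle
    dot = ux * vx + uy * vy     # proportional to cos of the angle
    # Absolute values make the result independent of point order on each line.
    return math.degrees(math.atan2(abs(cross), abs(dot)))

# Hypothetical landmark coordinates (x, y) in pixels, not study data.
landmarks = {
    "base_top": (100.0, 40.0),
    "base_bottom": (100.0, 140.0),
    "bony_rim": (100.0, 140.0),
    "lower_limb_ilium": (160.0, 175.0),
    "labrum_center": (70.0, 165.0),
}

alpha = line_angle_deg(landmarks["base_top"], landmarks["base_bottom"],
                       landmarks["bony_rim"], landmarks["lower_limb_ilium"])
beta = line_angle_deg(landmarks["base_top"], landmarks["base_bottom"],
                      landmarks["bony_rim"], landmarks["labrum_center"])
print(f"alpha = {alpha:.1f} deg, beta = {beta:.1f} deg")
```

Restricting the result to 0 to 90 degrees is a simplification of this sketch. An implementation that must handle larger beta values would need directed lines instead.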

Conclusion The accuracy of physicians in their daily routine is inferior to deep learning-based algorithms for determining angles in ultrasound of the newborn hip. Similar methods could be used to support physicians.

Zusammenfassung

Ziel Die Diagnose von Hüftdysplasie mittels Sonografie erlaubt das Behandeln mit Flexionsorthese, um einer Hüftluxation vorzubeugen. Genaue Bestimmungen der Winkel Alpha und Beta nach Graf sind essenziell für eine korrekte Diagnose. Es ist unklar, ob ein Algorithmus die Winkel vorhersagen könnte. Diese Arbeit vergleicht die Genauigkeit für Anwender und Automation mittels mittlerer quadratischer Fehler (MQF).

Material und Methode Wir verwendeten 303 306 Ultraschallbilder von Neugeborenenhüften, die zwischen 2009 und 2016 in Screening-Untersuchungen akquiriert wurden. Ausgebildete Ärzte bestimmten während der Konsultation in jedem zweiten Bild die Winkel Alpha und Beta. Eine zufällige Teilmenge an Bildern wurde unter Laborbedingungen mit Zeit und Präzision als Ground Truth beschriftet. Die Automation sagte die beiden Winkel mittels convolutional neural network (CNN) voraus. Die Analyse war auf den Winkel Alpha fokussiert.

Ergebnisse Drei Methoden wurden implementiert, jede davon mit einer anderen Abstraktion des Problems: (1) CNNs, die Winkel ohne post-processing direkt lernen; (2) CNNs, die Punkte im Bild bestimmen, die relevant sind, um die Winkel zu bestimmen; (3) CNNs, die Grundlinie, Pfannendachlinie und die Knorpeldachlinie in das Bild legen, um daraus die Winkel zu bestimmen. Der MQF zwischen Ärzten und der Ground Truth war 7,1° für Alpha. Die beste CNN-Architektur war (2) die Detektion der Punkte. Der MQF zwischen Punktedetektion und Ground Truth betrug 3,9° für Alpha.

Schlussfolgerung Die Genauigkeit von Ärzten in ihrer täglichen Arbeit ist kleiner als diejenige eines Deep-Learning-basierten Algorithmus beim Bestimmen von Winkeln im Ultraschall der Säuglingshüfte. Ähnliche Methoden könnten verwendet werden, um Ärzte zu unterstützen.

Publication History

Received: 20 February 2020

Accepted: 10 May 2020

Article published online: 06 August 2020

© 2020. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany

 