Am J Perinatol
DOI: 10.1055/s-0044-1786033
Original Article

Exploring the Limits of Artificial Intelligence for Referencing Scientific Articles

Emily M. Graf
1   Department of Obstetrics and Gynecology, University of Florida College of Medicine, Jacksonville, Florida
,
1   Department of Obstetrics and Gynecology, University of Florida College of Medicine, Jacksonville, Florida
,
Alexander B. Dye
1   Department of Obstetrics and Gynecology, University of Florida College of Medicine, Jacksonville, Florida
,
Lifeng Lin
2   Department of Epidemiology and Biostatistics, University of Arizona, Tucson, Arizona
,
Luis Sanchez-Ramos
1   Department of Obstetrics and Gynecology, University of Florida College of Medicine, Jacksonville, Florida

Abstract

Objective To evaluate the reliability of three artificial intelligence (AI) chatbots (ChatGPT, Google Bard, and Chatsonic) in generating accurate references from existing obstetric literature.

Study Design Between mid-March and late April 2023, ChatGPT, Google Bard, and Chatsonic were prompted to provide references for specific obstetrical randomized controlled trials (RCTs) published in 2020. RCTs were eligible for inclusion if they were identified in a previous article that primarily evaluated RCTs published in the general medical and obstetrics and gynecology journals with the highest impact factors in 2020, as well as RCTs published in a new journal focused on obstetric RCTs. The three AI models were selected for their popularity, performance in natural language processing, and public availability. Data were collected by prompting the AI chatbots to provide references according to a standardized protocol. The primary evaluation metric was the accuracy of each AI model in citing references, including authors, publication title, journal name, and digital object identifier (DOI). A permutation test was used to compare the performance of the AI models.
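As an illustration of the kind of comparison described in the study design, the following is a minimal sketch of a two-sided permutation test for a difference in the proportion of correctly cited references between two chatbots. The abstract does not specify the test statistic, the number of permutations, or whether the comparison was paired, so the function name `permutation_test`, its parameters, and the label-shuffling scheme are assumptions. The input values are hypothetical and are not the study's data.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in proportions.

    group_a, group_b: lists of 0/1 flags (1 = reference cited correctly).
    Returns the observed difference in proportions and an estimated p-value.
    This is an assumed, generic design, not the authors' exact procedure.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b

    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign the correct/incorrect flags to the two models at random.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical example: 20 prompts per model (illustrative values only).
model_a = [1] * 5 + [0] * 15   # 5 of 20 references cited correctly
model_b = [1] * 1 + [0] * 19   # 1 of 20 references cited correctly
observed, p_value = permutation_test(model_a, model_b)
```

Because every chatbot was prompted with the same set of RCTs, a paired variant that swaps labels within each RCT would also be a reasonable choice; the unpaired version is shown here only because it is the simplest to state.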

Results Among the 44 RCTs analyzed, Google Bard demonstrated the highest accuracy, correctly citing 13.6% of the requested RCTs, whereas ChatGPT and Chatsonic exhibited lower accuracy rates of 2.4% and 0%, respectively. Google Bard often substantially outperformed Chatsonic and ChatGPT in correctly citing the individual reference components. Most references generated by all three AI models contained DOIs that pointed to unrelated studies or that did not exist.

Conclusion To ensure the reliability of disseminated scientific information, authors must exercise caution when using AI for scientific writing and literature searches. Despite the limitations of AI systems, however, collaborative partnerships between these systems and researchers have the potential to drive synergistic advancements, leading to improved patient care and outcomes.

Key Points

  • AI chatbots often cite scientific articles incorrectly.

  • AI chatbots can create false references.

  • Responsible AI use in research is vital.



Publication History

Received: 13 November 2023

Accepted: 24 March 2024

Article published online: 23 April 2024

© 2024. Thieme. All rights reserved.

Thieme Medical Publishers, Inc.
333 Seventh Avenue, 18th Floor, New York, NY 10001, USA

 