Am J Perinatol
DOI: 10.1055/a-2302-8604
Original Article

Identifying ChatGPT-written Patient Education Materials Using Text Analysis and Readability

Sophie Ulene
1   The Warren Alpert Medical School, Brown University, Providence, Rhode Island
2   Columbia University Vagelos College of Physicians and Surgeons, New York, New York
3   Division of Maternal-Fetal Medicine, Department of Obstetrics and Gynecology, Women & Infants Hospital of Rhode Island, Alpert Medical School of Brown University, Providence, Rhode Island
Funding None.

Abstract

Objective Artificial intelligence (AI)-based text generators such as Chat Generative Pre-Trained Transformer (ChatGPT) have come to the forefront of modern medicine. Given the similarity between AI-generated and human-composed text, tools are needed to quickly differentiate the two. Previous work has shown that simple grammatical analysis can reliably differentiate AI-generated text from human-written text.

Study Design In this study, ChatGPT was used to generate 25 articles on obstetric topics similar to patient education materials published by the American College of Obstetricians and Gynecologists (ACOG). These AI-generated articles were then analyzed for readability and grammar using validated scoring systems and compared with real articles from ACOG.

Results Compared with the original articles, the 25 AI-generated articles had fewer total characters (mean 3,066 vs. 7,426; p < 0.0001), a greater average word length (mean 5.3 vs. 4.8; p < 0.0001), and a lower Flesch–Kincaid score (mean 46 vs. 59; p < 0.0001). With this knowledge, a new scoring system was developed to score articles based on their Flesch–Kincaid readability score, total character count, and average word length. This novel scoring system was tested on 17 new AI-generated articles related to obstetrics and 7 articles from ACOG, and it differentiated AI-generated articles from human-written articles with a sensitivity of 94.1% and specificity of 100% (area under the curve [AUC] 0.99).
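The three text features described above are straightforward to compute. The sketch below illustrates the general approach in Python, assuming a simple vowel-group syllable heuristic for the Flesch reading-ease formula and hypothetical classification cutoffs chosen for illustration; the study's actual fitted thresholds are not reported in this abstract.

```python
import re

def count_syllables(word: str) -> int:
    # Crude vowel-group heuristic; production tools use pronunciation dictionaries.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1  # roughly discount silent final "e"
    return max(n, 1)

def text_metrics(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Flesch reading-ease: higher scores indicate easier text.
    fk = (206.835
          - 1.015 * (len(words) / len(sentences))
          - 84.6 * (syllables / len(words)))
    return {
        "characters": len(text),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "flesch_kincaid": fk,
    }

def looks_ai_generated(m: dict) -> bool:
    # Hypothetical cutoffs, not the study's fitted thresholds: shorter
    # articles with longer words and lower readability resemble the
    # AI-generated group reported in the Results.
    votes = 0
    votes += m["characters"] < 5000
    votes += m["avg_word_length"] > 5.0
    votes += m["flesch_kincaid"] < 52
    return votes >= 2
```

In this toy form, an article is flagged when at least two of the three features fall on the "AI-like" side of its cutoff; the published scoring system would instead use thresholds derived from the 25-article training set.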

Conclusion As ChatGPT is more widely integrated into medicine, it will be important for health care stakeholders to have tools to separate originally written documents from those generated by AI. While more robust analyses may be required to determine the authenticity of articles written by complex AI technology in the future, simple grammatical analysis can accurately characterize current AI-generated texts with a high degree of sensitivity and specificity.

Key Points

  • More tools are needed to identify AI-generated text in obstetrics, for both doctors and patients.

  • Grammatical analysis is quick and easy to perform.

  • Grammatical analysis is a feasible and accurate way to identify AI-generated text.

Publication History

Received: 19 February 2024

Accepted: 24 March 2024

Accepted Manuscript online:
09 April 2024

Article published online:
02 May 2024

© 2024. Thieme. All rights reserved.

Thieme Medical Publishers, Inc.
333 Seventh Avenue, 18th Floor, New York, NY 10001, USA
