CC BY 4.0 · Appl Clin Inform
DOI: 10.1055/a-2565-9155
Research Article

Primary Care Providers' Acceptance of Generative AI Responses to Patient Portal Messages

Amarpreet Kaur (1), Alex Budko (2), Katrina Liu (3), Bryan D. Steitz (4), Kevin B. Johnson (1)

1   Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, United States
2   Computer and Information Sciences, University of Pennsylvania, Philadelphia, United States
3   Computer and Information Science, University of Pennsylvania, Philadelphia, United States
4   Biomedical Informatics, Vanderbilt University Medical Center, Nashville, United States
Supported by: National Institutes of Health grant 5DP1LM014558

Background: Patient portals bridge patient and provider communication but exacerbate physician and nursing burnout. Large language models (LLMs) can generate message responses that healthcare professionals view favorably; however, prior studies have not included diverse message types or newer prompt-engineering strategies. Our goal was to investigate and compare the quality and precision of GPT-generated message responses versus authentic provider responses across the spectrum of message types within a patient portal.

Methods: We used prompt-engineering techniques to craft synthetic provider responses tailored to adult primary care patients. We enrolled a sample of primary care providers in a cross-sectional study comparing authentic patient portal message responses with synthetic responses generated by GPT-3.5-turbo, July 2023 version (GPT). The survey assessed each response's empathy, relevance, medical accuracy, and readability on a scale from 0 to 5, and respondents were asked to identify whether each response was GPT-generated or provider-generated. Mean scores for all metrics were computed for subsequent analysis.

Results: A total of 49 healthcare providers participated in the survey (59% completion rate), comprising 16 physicians and 32 advanced practice providers (APPs). GPT-generated responses scored statistically significantly higher than authentic provider responses on two of the four metrics: empathy (p<0.05) and readability (p<0.05). No statistically significant difference was observed for relevance or accuracy (p>0.05). Although the readability difference was statistically significant, the absolute difference was small, and its clinical significance remains uncertain.

Conclusion: Our findings affirm the potential of GPT-generated message responses to match or exceed the empathy, relevance, and readability of typical responses crafted by healthcare providers. Additional studies should be conducted within provider workflows, with careful evaluation of patient attitudes and concerns related to both the ethics and the quality of generated responses in all settings.
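The paper does not publish its prompts or generation settings; as a rough illustration of the kind of pipeline the Methods describe, the sketch below shows one way to draft a synthetic provider reply with GPT-3.5-turbo via the OpenAI Chat Completions API. The system prompt, temperature, and helper name are illustrative assumptions, not the authors' protocol.

```python
# Minimal sketch, assuming the OpenAI Python SDK (v1+): draft a synthetic
# provider reply to one patient portal message with GPT-3.5-turbo.
# SYSTEM_PROMPT, temperature, and draft_portal_reply are hypothetical,
# not the study's published prompt or settings.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

SYSTEM_PROMPT = (
    "You are a primary care provider answering a patient portal message "
    "from an adult patient. Reply with empathy, plain language, and "
    "medically accurate guidance, and recommend an office visit when the "
    "message warrants in-person evaluation."
)

def draft_portal_reply(patient_message: str) -> str:
    """Return a synthetic provider response to one portal message."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0.7,  # assumed; the paper does not report sampling settings
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": patient_message},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(draft_portal_reply(
        "My blood pressure readings at home have been around 150/95 "
        "for the past week. Should I change my medication?"
    ))
```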



Publication History

Received: 12 August 2024

Accepted after revision: 11 February 2025

Accepted Manuscript online: 25 March 2025

© 2025. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution License, permitting unrestricted use, distribution, and reproduction so long as the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/).

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany