Abstract
Objective Lupus is a rare and complex disease that affects almost all aspects of life. Inevitably, patients are constantly confronted with questions about their disease. However, the shortage of expert rheumatology care often leaves the patients’ demand for sufficient information unmet. To provide reliable disease-related information in patient-friendly language, ‘Lupus100.org’ was launched, where experts have answered 100 common questions.
With the advent of widely accessible artificial intelligence (AI), the question arises to what extent large language models (LLMs) could fill the information gap and support physicians in the care of patients. Therefore, this study aimed to assess the capability of the LLM ChatGPT-4 to answer 100 frequently asked patient questions related to lupus.
Methods ChatGPT-4 responses were generated by entering the English questions from https://lupus100.org/ in a fresh session on October 16, 2023. Three senior rheumatologists who were blinded to authorship independently evaluated the responses from ChatGPT-4 and Lupus100. The evaluation criteria were quality and empathy (each rated on a Likert scale of 1–5) and the selection of a preferred answer. Differences between the scores were analysed using a two-tailed Student’s t-test. A one-sample chi-square test was performed to assess whether there was a preferred source for the answers. All statistical analyses were conducted in SPSS; the significance threshold was p<0.05.
Results The quality of the answers provided by ChatGPT-4 was rated significantly higher than that of the Lupus100 responses (table 1). A similar trend was observed for empathy, but the difference was not statistically significant. Very few responses from either ChatGPT-4 or Lupus100 were rated as ‘poor’ or ‘very poor’ in quality or as ‘not empathetic’. Overall, more responses from ChatGPT-4 (n = 171, 57%) were preferred over those from Lupus100 (n = 129, 43%), and this difference was statistically significant (p = 0.02).
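The preference result can be checked by hand. The following is a minimal sketch, not the authors' SPSS procedure: it assumes the one-sample chi-square test used a null hypothesis of equal preference (150 expected selections per source) and treated the 300 selections (presumably 100 questions rated by three evaluators) as independent. The function name is hypothetical.

```python
import math


def chi_square_equal_preference(observed_a: int, observed_b: int):
    """Chi-square goodness-of-fit test for two categories with equal expected counts."""
    total = observed_a + observed_b
    expected = total / 2  # assumed null hypothesis: no preferred source
    statistic = (
        (observed_a - expected) ** 2 + (observed_b - expected) ** 2
    ) / expected
    # With one degree of freedom, the chi-square upper-tail probability
    # equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Counts reported in the Results: 171 selections for ChatGPT-4, 129 for Lupus100.
statistic, p_value = chi_square_equal_preference(171, 129)
print(f"chi-square = {statistic:.2f}, p = {p_value:.3f}")
```

Under these assumptions the statistic is 5.88 with one degree of freedom, and the p value rounds to the reported 0.02.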
Conclusions In this study, the LLM ChatGPT-4 generated high-quality and empathetic responses to patient questions concerning lupus. The study suggests that such models might be a valuable source of patient information and that they may support physicians in generating beneficial patient information.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ .