Is ChatGPT ready for prime time with patient education?
There’s no question that healthcare works best when patients understand their conditions, their treatments, and how their everyday actions can affect their outcomes. Knowledgeable patients and caregivers are less anxious, more likely to seek out preventive care services, and more likely to achieve their health goals. Yet comprehensive, reliable patient education on even the most common topics isn’t always available.
Professional societies, patient advocacy groups, health systems, and companies like WebMD and Healthline have produced a huge array of resources aimed at educating patients, and savvy individuals can even try combing through millions of online clinical journal articles themselves to piece together information on their unique health conditions.
But it can take a lot of time, effort, and a relatively high degree of foundational health literacy to really get to grips with a complicated disease state.
So why not turn to the internet’s newest wunderkind, ChatGPT, to do the heavy lifting instead?
As a large language model (LLM), ChatGPT is adept at taking a natural language prompt, such as a question from a patient, and drawing on patterns learned from billions of disparate documents during training to generate a succinct, summary-style narrative that often, but not always, fits the brief.
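To make that concrete, here is a minimal sketch of what such an exchange looks like in code, assuming the OpenAI Python SDK; the model name, system prompt, and patient question are illustrative assumptions rather than recommendations.

```python
# A minimal sketch of sending a patient-style question to an LLM via the OpenAI
# Python SDK. The model name and prompts below are illustrative assumptions only.
from openai import OpenAI

client = OpenAI()  # expects an API key in the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",  # hypothetical model choice
    messages=[
        {"role": "system", "content": "Answer in plain language at roughly an 8th-grade reading level."},
        {"role": "user", "content": "What lifestyle changes can help me manage type 2 diabetes?"},
    ],
)

print(response.choices[0].message.content)
```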
For many everyday applications, the answers are good enough to run with. In healthcare, however, “good enough” simply isn’t enough. Misleading or missing information can have an enormous impact on patient choices, which ultimately affect their own outcomes as well as spending and utilization across the nation’s health system.
The question then becomes: how close is ChatGPT to becoming a viable source of accurate patient education? Can consumers rely on the algorithm to provide accessible, meaningful insights into their health concerns?
ChatGPT’s potential for becoming a partner in patient education
ChatGPT is a form of generative AI, which is designed to produce new textual, video, or audio content based on an extensive corpus of training information. After ingesting most of the internet as background fodder, the latest iteration of the popular tool can create human-sounding narratives that address a variety of healthcare topics with convincing authority.
In fact, ChatGPT nearly earned its medical degree in February of 2023 after scoring at or near the passing threshold on all three steps of the United States Medical Licensing Examination (USMLE).
The model itself believes it has plenty of opportunities to help with healthcare, stating in a conversation with the New England Journal of Medicine that it has the potential to assist with administrative tasks, clinical documentation, medical research, medical education, and patient engagement.
But it also pointed out that there are still many barriers to overcome first, including privacy and security concerns, ethical considerations, significant challenges with bias and transparency, and the ongoing need for human oversight.
ChatGPT’s seeming self-awareness is one reason why it has made such a deep impression on its users, and it makes some good points about its own limitations. For example, the model repeatedly pointed out during the interview that it is unable to understand and interpret context in the same way as humans, which may be a crucial sticking point for its use as a patient educator.
It’s likely correct. Large language models are only as good as their source material, and current online resources have long been criticized for creating undue alarm among symptom-searching patients who often seem to end up with the worst possible differential diagnosis. Without that sense of clinical context, ChatGPT could end up perpetuating the flaws in the existing patient education ecosystem.
Gauging real-world performance on patient education tasks
But how does ChatGPT measure up in the real world? So-so, depending on the specific use cases, according to an emerging body of research on the topic.
Investigators from around the world have been looking into how the tool performs in different specialty areas, including gastrointestinal illnesses, liver disease, radiology issues, and eye problems. Overall, the results indicate promising potential, but significant room for improvement. For example:
- Researchers from Turkey fed ChatGPT 20 questions about Crohn’s disease (CD) and ulcerative colitis (UC), two common gastrointestinal diseases. Half of the questions were designed to be patient-facing, and the other half were physician-facing. Gastroenterology experts graded the answers for accuracy and usefulness on a scale of 1 to 7. While ChatGPT scored highly when listing symptoms and complications of both diseases, it only achieved scores of 3 and 4 for both usefulness and accuracy on questions about causes, diagnosis, and treatments.
- When an optometry research team in Birmingham, UK, generated 275 responses from ChatGPT about myopia, the model produced “very good” responses only 24% of the time. A further 49% were rated “good,” but more than 5% of responses were “poor” or “very poor” in quality. However, the five human evaluators themselves proved problematic, showing a “significant difference in scoring” between them, which could affect how the results are interpreted.
- Missing information was the main concern in a study from Cedars-Sinai Medical Center, where researchers asked 164 questions about liver cirrhosis and hepatocellular carcinoma (HCC). While ChatGPT’s knowledge of cirrhosis and HCC was largely correct (79.1% and 74% respectively), less than half of the information received was comprehensive (47.3% for cirrhosis and 41.1% for HCC). The model lacked knowledge of regional variation in guidelines and often failed to specify decision-making cutoffs and treatment durations, the authors said. However, it performed better at providing advice about next steps to patients and caregivers who had received a diagnosis.
- Researchers from Beth Israel Deaconess Medical Center compared ChatGPT’s answers to existing resources on the website of the Society of Interventional Radiology. After developing questions based on published website content, they found that ChatGPT’s answers were longer and more difficult to read based on standardized readability scales (although both sources were written at higher-than-recommended reading grade levels for patient consumption). ChatGPT’s output included incorrect elements for 12 (11.5%) of 104 questions, some of which were due to context issues.
Taking the next steps toward AI-assisted patient education
Overall, ChatGPT seems to have middling performance on the types of questions that patients may wish to ask. In some cases, however, it can give answers that are worryingly incomplete, misleading, or just plain wrong.
This is a major concern for healthcare providers, who have already spent years battling the bane of “Dr. Google” in their practices and fielding visits from patients who arrive at the clinic armed with inaccurate or irrelevant information about their health conditions.
It turns out, however, that both patients and providers might be aligned on the best way to get accurate and useful information about specific health concerns. In a recent survey from Wolters Kluwer Health, two-thirds of patients said they would prefer to get education directly from their providers rather than from the web. More than 90% of respondents said they would engage with the materials if they got them, and 68% said they would be more likely to return to a provider who offered them educational materials in the future.
This is where ChatGPT could prove its utility for clinicians who want to provide this service to their patients but feel pinched for time and resources. Using generative AI to quickly and easily create tailored patient education materials – and review them for applicability and accuracy before sharing with consumers – could be the best of both worlds. With appropriate clinical oversight, providers and patients could benefit from ChatGPT’s acknowledged clinical prowess without as many risks of bias or misinformation.
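As a rough illustration of what that workflow could look like, here is a minimal sketch assuming the OpenAI Python SDK and the open-source textstat package for a reading-level check; the function name, model choice, prompts, and grade-level threshold are hypothetical, and the generated draft is held for clinician review rather than sent to patients directly.

```python
# Minimal sketch of a "draft, check, then clinician review" workflow for patient
# education materials. Assumes the OpenAI Python SDK and the textstat package;
# the model name, prompts, and grade-level threshold are illustrative assumptions.
from openai import OpenAI
import textstat

client = OpenAI()

def draft_patient_handout(condition: str) -> str:
    """Ask the model for a short, plain-language draft handout on a condition."""
    response = client.chat.completions.create(
        model="gpt-4o",  # hypothetical model choice
        messages=[
            {
                "role": "system",
                "content": (
                    "Draft a short patient education handout at roughly an 8th-grade "
                    "reading level. Flag any dosages, cutoffs, or guideline-dependent "
                    "details for clinician verification."
                ),
            },
            {"role": "user", "content": f"Condition: {condition}"},
        ],
    )
    return response.choices[0].message.content

draft = draft_patient_handout("newly diagnosed liver cirrhosis")

# Quick readability check before the draft reaches the clinician's review queue.
grade = textstat.flesch_kincaid_grade(draft)
if grade > 8:
    print(f"Draft reads at grade {grade:.1f}; consider asking the model to simplify it.")

# The draft never goes to a patient directly; a clinician reviews and edits it first.
print("--- DRAFT FOR CLINICIAN REVIEW ---")
print(draft)
```

The key design choice in a sketch like this is that the model’s output is treated as raw material for the clinician, not as a finished product for the patient.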
After all, as ChatGPT said itself in its NEJM interview, “as a language model, ChatGPT is not capable of replacing human healthcare professionals. Instead, ChatGPT can be used as a tool to assist healthcare professionals in providing better care to their patients.”
ChatGPT might not be ready to fly solo with patient education just yet. But it could become a valuable addition to the clinical toolkit sooner rather than later, as long as humans are available to provide the context and direction needed to turn its output into meaningful guidance for patients seeking answers about their own health.
Jennifer Bresnick is a journalist and freelance content creator with a decade of experience in the health IT industry. Her work has focused on leveraging innovative technology tools to create value, improve health equity, and achieve the promises of the learning health system. She can be reached at jennifer@inklesscreative.com.