Senator warns Google on premature AI clinical use
Generative AI is in acceleration mode, with new application ideas emerging daily, including in healthcare. Not everyone shares the unbridled excitement over all the ways this technology could make life easier. Sen. Mark Warner (D-Virginia) is among the cautious, as evidenced by his letter to Google expressing concerns about the clinical use of its Med-PaLM 2 AI tool in select hospitals as part of a pilot program.
“While artificial intelligence (AI) undoubtedly holds tremendous potential to improve patient care and health outcomes, I worry that premature deployment of unproven technology could lead to the erosion of trust in our medical professionals and institutions, the exacerbation of existing racial disparities in health outcomes, and an increased risk of diagnostic and care-delivery errors,” Warner said in the letter.
A former tech entrepreneur, Warner expressed concern that major tech companies, including Google and Microsoft, are rushing generative AI tools to hospitals to capture market share following the buzz created by OpenAI’s release of ChatGPT.
Life or death mistakes
Warner warned that in the race to gain market share, mistakes in a clinical setting can have life-or-death consequences.
While AI has previously been used in medical settings, Warner said Google’s new generative tool, Med-PaLM 2, promises to answer medical questions, summarize documents and organize health data. As part of a pilot program started earlier this year, the Google tool is being used in select hospitals, including the Mayo Clinic — VHC Health, located in Warner’s state, is a Mayo member health system.
The prior version, Med-PaLM, was the first large language model (LLM) to earn a passing score — 67.2% — on US Medical Licensing Examination (USMLE)-style questions, but researchers noted significant ways the tool should be improved, especially when comparing the AI’s answers to those from actual clinicians.
“… further work was needed to ensure the AI output, including long-form answers to open-ended questions, are safe and aligned with human values and expectations in this safety-critical domain,” they reported.
Developed using lessons learned from the initial version, Med-PaLM 2 scored 86.5% on the USMLE-style test, and a comparison of its answers with clinicians’ responses showed physicians preferred the AI tool’s responses on eight of nine axes of clinical utility, according to the researchers.
While these improvements show rapid progress, Warner noted the one axis where Med-PaLM 2 faltered, as its earlier version did: its answers contained more inaccurate or irrelevant information than those provided by physicians.
LLM training and patient privacy questions
To close out his letter, Warner asked Google to respond to a dozen questions largely focused on the testing and training of the AI tool, and on how Med-PaLM 2, Google, and its hospital pilot program handle patient privacy and choice.
“In 2019, I raised concerns that Google was skirting health privacy laws through secretive partnerships with leading hospital systems, under which it trained diagnostic models on sensitive health data without patients’ knowledge or consent,” Warner said.
He noted LLMs tend to memorize training data, including sensitive health information. “How has Google evaluated Med-PaLM 2 for this risk and what steps has Google taken to mitigate inadvertent privacy leaks of sensitive health information?” he asked.
Another concern was whether Google ensures patients are informed when Med-PaLM 2 is being used in their care, and whether the AI tool’s training data includes protected health information.
Warner further asked for information on any re-training of Med-PaLM 2 and on what information Google provided the Mayo Clinic and other hospitals in the pilot program about datasets, testing, evaluation, and other performance and quality parameters.
Google was also asked to respond to the possibility that the tendency of LLMs to repeat back a user’s preferred answer, a behavior known as sycophancy, may increase the risk of misdiagnosis in clinical settings.
Too soon for AI clinical care use
In his letter, Warner cited a quote in the Wall Street Journal attributed to a senior research director at Google who worked on Med-PaLM 2: “I don’t feel that this kind of technology is yet at a place where I would want it in my family’s healthcare journey.”
Citing Google’s own research, Warner warned that using generative AI tools such as Med-PaLM 2 in a clinical setting requires guardrails to limit overreliance on the LLM, including guidance on when it should and shouldn’t be used.
Google has said the pilot program includes a limited number of care providers and is meant to explore the tool’s usefulness in healthcare. It emphasized that Med-PaLM 2 is not a chatbot but a fine-tuned LLM.
Given the accuracy, privacy, transparency, and training concerns expressed in his letter, Warner appears convinced the technology is not yet ready for use in clinical care.
“It is clear more work is needed to improve this technology as well as to ensure the health care community develops appropriate standards governing the deployment and use of AI,” he concluded.