Why do AI models hallucinate and what can go wrong when they do?
Biologically induced hallucinations make it difficult for humans to trust their own experiences – and now, computationally induced hallucinations are making humans question how much they can trust their AI companions.
Hallucinations in artificial intelligence models are defined as the delivery of incorrect, misleading, fabricated, or nonsensical responses to a prompt intended to elicit a real answer.
Hallucinations often appear extremely plausible, which makes it difficult for human users to identify improper responses and act accordingly – especially when users get overly comfortable with “cognitive offloading,” or delegating critical thinking skills to their large language model (LLM) helpmates.
All LLMs can make mistakes (ChatGPT even includes a warning to that effect right on its main interface), but not all hallucinations are created equal.
Mistakes can range from the relatively benign, like adding extra fingers to a model in a catalog image, to the life-threatening, such as cases in which AI tools add their own invented content when transcribing conversations or even create whole new body parts that don’t actually exist in humans.
Obviously, hallucinations in the medical field are fraught with potential problems, especially when AI tools start to reach into the world of direct patient care. While humans are certainly prone to making devastating mistakes, the lack of transparency and accountability around AI models raises difficult questions of how to prevent hallucinations from causing harm and who is liable when an adverse event does occur.
The first steps for solving the hallucination problem are understanding what causes an AI tool to produce undesirable information and recognizing why a human-AI interaction isn’t going quite as planned.
Why do AI models get it wrong?
Hallucinations can stem from a variety of factors, most of which come down to challenges with their training data. Common problems include:
- Insufficient training data – When models don’t have enough data to learn from, they’re going to struggle to extrapolate answers to queries that exceed their experience. They might make incorrect predictions or flat-out wrong guesses when encountering unfamiliar situations.
- Biased training data – There’s a reason why data diversity and representation are so important in medical research. When training datasets are too narrow or too far skewed toward a specific population, models aren’t going to be able to extrapolate accurate answers about cases that fall outside of those parameters.
- Overfitting – Similarly, models that haven’t had enough opportunity to train on broader datasets might cling too closely to the foundational dataset they’ve been working with all along. They can become very good at returning answers based on that initial dataset, but they fail to reproduce the same levels of accuracy and reliability when set loose on unfamiliar data (a simple way to spot that gap is sketched just after this list).
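To make the overfitting problem concrete, here is a minimal Python sketch (the toy model, datasets, and threshold are all hypothetical, not drawn from any real product) showing how a wide gap between training accuracy and held-out accuracy can flag a model that has memorized its foundational data rather than learned to generalize:

```python
# Minimal, purely illustrative sketch of spotting overfitting: compare a
# model's accuracy on the data it was trained on against held-out data it
# has never seen. The toy "model", datasets, and threshold are hypothetical.

def accuracy(model, examples):
    """Fraction of (features, label) pairs the model gets right."""
    correct = sum(1 for features, label in examples if model(features) == label)
    return correct / len(examples)

def overfitting_gap(model, training_set, holdout_set):
    """A large drop from training to held-out accuracy suggests the model has
    memorized its foundational dataset rather than learned to generalize."""
    return accuracy(model, training_set) - accuracy(model, holdout_set)

# Toy "model" that simply memorizes its training set and guesses elsewhere.
training_set = [((0, 0), "low"), ((0, 1), "low"), ((1, 1), "high")]
holdout_set = [((1, 0), "high"), ((2, 2), "high"), ((0, 2), "low")]
memorized = dict(training_set)

def model(features):
    return memorized.get(features, "low")  # blind guess for unfamiliar inputs

gap = overfitting_gap(model, training_set, holdout_set)
if gap > 0.2:  # the 0.2 threshold is arbitrary and application-specific
    print(f"Possible overfitting: accuracy drops by {gap:.0%} on unseen data")
```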
All these issues can directly impact healthcare providers, especially when adopting products from smaller companies that claim near-perfect results from their models. Prospective purchasers should carefully scrutinize any vendor claims about accuracy, specificity, and trustworthiness that haven’t been fully validated by objective, third-party sources.
What types of hallucinations can occur when AI gets it wrong?
Researchers have been working to create classifications of hallucinations to help developers and users spot mistakes and make meaningful improvements to their models.
Broadly, hallucinations can be classified into “intrinsic” and “extrinsic” errors, according to a pre-print paper from the University of Barcelona.
Intrinsic hallucinations occur when a model generates responses that contradict the provided input or context of the query because it fails to maintain internal consistency in its logic. For example, the author says, if an AI is asked to summarize a paper stating that the FDA approved the first Ebola vaccine in 2019, an intrinsic hallucination would be a summary claiming that the FDA had in fact rejected the vaccine.
In contrast, extrinsic hallucinations are responses about things that don’t exist in reality at all, such as the aforementioned cases of the brand-new body parts or invented patient-provider interactions inserted into visit summaries. These mistakes often occur when a model is asked to create something “new” or is attempting to bridge a gap in its knowledge base that stretches too far for its abilities.
These categories can be broken down further into more specific types of errors, the paper continues, including:
- Factual errors and fabrications – Direct contradictions of provable facts, such as incorrectly stating birthdates for notable people, misrepresenting important historical events, or presenting folklore or urban myths as truth.
- Instruction inconsistencies – Instances where the model fails to follow specific instructions provided in the prompt and returns incorrect or unasked-for information.
- Logical inconsistencies – Self-contradictory statements or a lack of logical connection between semantically related statements, such as failing to adhere to the appropriate sequence of operations in a math problem or presenting two conflicting sentences about the same event in the same response.
- Nonsensical output – Statements that have no clear relationship to the prompt or distort and conflate output in such a way as to lose meaning or relevance to the original query.
These and other types of hallucinations can cause direct harm to users and the community if not caught and addressed. For example, ethical and legal violations are a serious risk. Hallucinating LLMs could be implicated in cases of defamation or libel if they falsely accuse an individual of unsavory or illegal behavior, or persuade individuals to behave in a certain manner that violates ethical or moral ideals.
Companies and individuals can also find themselves in trouble if LLMs produce false or non-existent sources for information or provide inaccurate legal or financial advice that is subsequently used for decision-making.
In the healthcare environment, it’s easy to see how hallucinations could have devastating consequences. From clinical decision support systems to payer coverage determinations to administrative models sending patient accounts to collections, models that don’t get it exactly right each and every time could have profound impacts on clinical and socioeconomic outcomes for real patients.
How to address hallucinations in action
Because hallucinations come in so many shapes and sizes, they’re difficult to consistently identify and address at scale.
When working with ChatGPT or Google Gemini during everyday use, it’s still practical enough for users to simply double-check outputs before taking action (although with the same AI systems increasingly integrated into top-line search engine results, users will need to dig deeper into primary materials to find a trustworthy, AI-free source of truth).
But for large-scale applications such as clinical documentation assistance tools or AI chatbots, uncovering hallucinations can be much more challenging, and may require a more scientific approach.
For example, in 2024, researchers at the University of Oxford published a study about their “uncertainty estimator,” which can gauge the likelihood that a given prompt will trigger a hallucinatory answer. The strategy uses probabilistic tools to define and measure the “semantic entropy” of the model’s outputs – the degree to which repeated answers to the same prompt scatter across different meanings, a sign that wires are getting crossed as the LLM parses the intent of the query and formulates its response.
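To illustrate the intuition (this is a rough sketch of the general idea, not the Oxford team’s actual implementation), the snippet below samples several answers to the same prompt, groups them into meaning-based clusters using a crude placeholder equivalence check, and computes the entropy of the cluster distribution – the more the answers scatter across different meanings, the higher the entropy and the more suspicious the output:

```python
import math
from collections import Counter

# Rough sketch of the "semantic entropy" idea: sample several answers to the
# same prompt, cluster them by meaning, and measure how spread out the
# clusters are. The clustering rule here is a crude placeholder; a real
# system would use a language model to judge whether two answers are
# semantically equivalent.

def semantic_cluster(answer):
    """Placeholder: treat answers as equivalent if they match after
    lower-casing and stripping periods. Real systems use entailment
    checks instead of string matching."""
    return " ".join(answer.lower().replace(".", "").split())

def semantic_entropy(sampled_answers):
    """Entropy (in bits) over meaning clusters: 0 when every sample agrees,
    higher when the model keeps changing its answer."""
    clusters = Counter(semantic_cluster(a) for a in sampled_answers)
    total = sum(clusters.values())
    return sum(-(n / total) * math.log2(n / total) for n in clusters.values())

# Consistent answers -> low entropy; scattered answers -> high entropy.
consistent = ["The FDA approved it in 2019."] * 5
scattered = ["Approved in 2019.", "Rejected in 2019.", "Approved in 2021.",
             "It was never reviewed.", "Approved in 2019."]
print(semantic_entropy(consistent))  # 0.0
print(semantic_entropy(scattered))   # roughly 1.9 bits
```

In practice, deciding whether two answers “mean the same thing” requires a language model of its own, which is where much of the engineering effort in this approach lies.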
Developers and model trainers can also use strategies such as retrieval-augmented generation (RAG) to support more accurate outputs. RAG creates a link between an LLM’s foundational training data and an external source of authoritative data so the model can cross-reference and double-check its answer before returning an output to the user. This allows models originally trained on generalized data to essentially specialize in subject-specific areas, such as sepsis detection or anesthesia delivery, with a higher degree of reliability and confidence that the outputs will not include an avoidable hallucination.
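A bare-bones sketch of the RAG pattern might look like the following – note that the keyword-overlap retriever, the generate_answer stub, and the corpus snippets are illustrative stand-ins for a production vector index, a real LLM call, and a validated knowledge source, not any particular vendor’s API:

```python
# Bare-bones RAG sketch: retrieve the most relevant passages from an
# authoritative corpus, then hand them to the model alongside the question so
# the answer is grounded in source text. The retriever scores passages by
# simple keyword overlap; a production system would use vector embeddings,
# and generate_answer is a stub standing in for an actual LLM call.

AUTHORITATIVE_CORPUS = [  # invented example passages, not clinical guidance
    "Sepsis screening flags a temperature above 38.3 C and heart rate above 90 bpm.",
    "A lactate level above 2 mmol/L may indicate hypoperfusion in suspected sepsis.",
    "Anesthesia preoperative checks include airway assessment and fasting status.",
]

def retrieve(question, corpus, top_k=2):
    """Return the top_k passages sharing the most words with the question."""
    q_words = set(question.lower().split())
    scored = sorted(corpus, reverse=True,
                    key=lambda p: len(q_words & set(p.lower().split())))
    return scored[:top_k]

def generate_answer(question, passages):
    """Stub for the LLM call: the augmented prompt instructs the model to
    answer only from the retrieved passages, leaving less room to fabricate."""
    prompt = ("Answer using ONLY the sources below. If they are insufficient, say so.\n"
              + "\n".join(f"- {p}" for p in passages)
              + f"\nQuestion: {question}")
    return prompt  # a real system would send this prompt to the model

question = "What lactate level suggests hypoperfusion in suspected sepsis?"
print(generate_answer(question, retrieve(question, AUTHORITATIVE_CORPUS)))
```

The key design choice is what goes into the authoritative corpus; in a clinical setting that might be validated guidelines or institutional protocols rather than the open web.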
As AI tools become increasingly popular across a range of healthcare settings, taking action to understand, identify, and address the worrying world of hallucinations will be essential. While it’s a complicated area of engineering that is still unfolding alongside the AI maturity curve, every AI user should be aware of the potential for misinformation to occur and be ready to apply critical thinking skills to every AI output in order to prevent an avoidable mistake.
Jennifer Bresnick is a journalist and freelance content creator with a decade of experience in the health IT industry. Her work has focused on leveraging innovative technology tools to create value, improve health equity, and achieve the promises of the learning health system. She can be reached at [email protected].