Data standardization: The invisible prerequisite for healthcare AI

Despite advanced algorithms, healthcare's "data dust cloud" remains AI's biggest hurdle, requiring both syntactic and semantic standardization for success.
By admin
Mar 28, 2025, 9:26 AM

In the race to implement artificial intelligence in healthcare, even the most sophisticated AI algorithms fail without properly standardized data. While AI technology itself advances rapidly, healthcare’s fragmented data ecosystem remains the primary obstacle to widespread adoption and meaningful outcomes.

Healthcare’s Data Dust Cloud

Healthcare data exists in what Bo Dagnall, Chief Product Officer at Smile Digital Health, aptly calls a “dust cloud” where lab results, patient vulnerabilities, and demographics are present but difficult to work with. “You can’t write queries against this dust cloud, even though the data is there,” he explained at ViVE 2025.

This challenge stems from healthcare's complex data ecosystem. "Healthcare data exists in a lot of different places and in a lot of different forms…CCDAs or FHIR data, unstructured notes, PDFs, all sorts of things. Some of this data is structured, some of it is semi-structured, some of it is not structured at all," said Brian Carlson of West Coast Informatics, who presented with Dagnall at ViVE during their session "Semantic Standardization Using AI."

The Two-Step Standardization Process

Organizations can address this challenge through a two-step standardization process:

  1. Syntactic standardization – Organizing data into consistent structures by applying “mapping logic” to transform information into FHIR (Fast Healthcare Interoperability Resources) standard resources. This creates a foundation where data is “organized and cataloged neatly in a FHIR standard.”
  2. Semantic standardization – Mapping healthcare terms to standardized medical terminologies like SNOMED, LOINC, and RxNorm. This transforms human-readable text into computer-processable codes that maintain consistent meaning.

“Words are good for people and doctors to interact with and make decisions about care. But when you want digital health applications, it’s really much better to have coded representations that carry that meaning forward,” said Carlson. 
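The two steps above can be sketched in code. The following is a minimal illustration, not a production pipeline: it uses plain dictionaries rather than a FHIR library, and the raw lab record and one-entry LOINC lookup table are invented for the example (4548-4 is the LOINC code for Hemoglobin A1c in blood).

```python
# Illustrative sketch of the two-step standardization process.
# The raw record shape and the tiny LOINC table are assumptions for demo purposes.

# A raw, source-specific lab result (not yet standardized).
raw_result = {"test": "Hemoglobin A1c", "value": "6.1", "units": "%"}

# Toy terminology lookup used for semantic standardization.
LOINC_MAP = {"hemoglobin a1c": "4548-4"}

def to_fhir_observation(raw: dict) -> dict:
    """Apply mapping logic: code the term (semantic) and restructure
    the record into a FHIR Observation shape (syntactic)."""
    term = raw["test"].strip().lower()
    code = LOINC_MAP.get(term)  # step 2: semantic standardization
    return {                     # step 1: syntactic standardization
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [{"system": "http://loinc.org",
                        "code": code,
                        "display": raw["test"]}],
            "text": raw["test"],
        },
        "valueQuantity": {"value": float(raw["value"]), "unit": raw["units"]},
    }

obs = to_fhir_observation(raw_result)
```

The human-readable text is preserved in `display` and `text` for clinicians, while the LOINC code carries the machine-processable meaning forward, which is the distinction Carlson draws.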

AI Automating the Standardization Process

The traditional approach to semantic standardization has been labor-intensive, requiring human terminologists to manually map terms to standard codes. Now, AI itself is being deployed to solve this standardization challenge. “We’re doing it with AI. We’re automating the solutions. We can do the mapping in real time at scale using AI,” Dagnall explained.
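To make the automation idea concrete, here is a hedged sketch of automated term-to-code mapping. A real system like the one Dagnall describes would use a trained model; standard-library string similarity stands in for it here, and the three-entry terminology table is illustrative (22298006 is the SNOMED CT code for myocardial infarction, 83367 the RxNorm code for atorvastatin).

```python
# Sketch only: difflib similarity stands in for an AI mapping model.
# The terminology table below is a toy assumption, not a real code system.
import difflib

TERMINOLOGY = {  # standard term -> (code system, code)
    "myocardial infarction": ("SNOMED CT", "22298006"),
    "hemoglobin a1c": ("LOINC", "4548-4"),
    "atorvastatin": ("RxNorm", "83367"),
}

def map_term(raw_term: str):
    """Map a free-text clinical term to its closest standard code, if any.

    Returns (matched term, code system, code), or None when nothing
    clears the similarity threshold."""
    key = raw_term.strip().lower()
    match = difflib.get_close_matches(key, TERMINOLOGY, n=1, cutoff=0.8)
    return (match[0], *TERMINOLOGY[match[0]]) if match else None

# Variant spellings from different source systems resolve to one code:
map_term("Hemoglobin A1C")        # exact match after normalization
map_term("myocardial infarcton")  # near-miss spelling still resolves
```

Replacing the lookup with a model is what lets this run "in real time at scale"; the hard part the terminologists previously did by hand is exactly this resolution of messy variant terms to one canonical code.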

This creates a virtuous cycle: better standardized data enables more effective AI applications, which in turn improve data standardization.

Benefits of Proper Data Standardization

From an academic perspective, standardization work should focus on several key areas:

  1. Methods to measure and reduce bias
  2. Methods to measure reliability
  3. Notions of reproducibility in non-deterministic systems
  4. Methods for explainability in various AI techniques

Organizations that successfully address data standardization are seeing dramatically better outcomes with their AI implementations:

Dr. Nishith Patel, VP and Chief Medical Informatics Officer at Tampa General Hospital, shared an example of AI-powered sepsis detection that reduced mortality rates from 15-18% to under 7%—performance in the top decile nationally. “We have over 400 of our patients that were able to go back to their families as a result of these technologies being deployed over the last 24 months,” he noted.

Similarly, Dr. Rebecca Mixson, Chief Medical Officer at Color Health, described how their AI system identified “four times as many gaps in care than a hand review would be” when properly integrated with standardized data.

Framework for Standardization

According to recommendations from the Ad-Hoc Group on Application of AI Technologies in Health Informatics (ISO TC215), future standardization efforts should focus on:

  1. Establishing clear definitions and terminology
  2. Creating methods to measure reliability and reduce bias
  3. Developing standards for dataset preparation and data labeling
  4. Building quality management systems for AI health applications
  5. Creating methods for explainability in various types of AI solutions

Looking Forward

“By 2030, we’re going to have globally relevant and actionable data sets that allow us to build things for populations across the globe,” said Dr. Clark Ottley, Chief Medical Officer of Mayo Clinic Platform.

For healthcare organizations of all sizes—from major academic medical centers to rural hospitals—the path to AI success begins not with selecting the latest algorithms but with systematically addressing data standardization. Those who solve this invisible prerequisite will be positioned to deliver on AI’s promise of more efficient operations, earlier disease detection, and ultimately, better patient outcomes.

