Feds target healthcare AI bias, but how will it work?
Bias in artificial intelligence (AI) algorithms is one of the biggest problems facing the healthcare industry as these technologies move into real-world adoption. The clinical sphere is particularly vulnerable to issues stemming from skewed data and flawed interpretation of results.
There are already too many examples of AI doing more harm than good in other industries, from Amazon’s hiring algorithm that discriminated against female candidates to the COMPAS recidivism tool that rated Black defendants as far more likely to commit new crimes than their white counterparts.
And healthcare itself certainly hasn’t been spared from criticism. In 2019, one widely used algorithm appeared to significantly underestimate the illness burdens and clinical risks of Black patients due to its questionable risk scoring methodology. More recently, UnitedHealth and Cigna have been sued for allegedly using AI tools to improperly deny claims on a broad scale.
As AI-driven clinical decision support and risk scoring algorithms rapidly proliferate, more and more accusations of bias are likely to pop up in headlines, even after a new final rule from the Office for Civil Rights (OCR) and CMS that prohibits discrimination in AI based on common factors such as gender, age, race, and ethnicity.
But the problems we really have to worry about are the ones that never make the news. Even the most well-intentioned regulations rely, to some degree, on whistleblowers. And while there are a lot of very ethical, eagle-eyed observers out there with the willingness to jump in when they notice an issue, the biggest challenge with bias is its insidious nature.
We don’t know what we don’t know when it comes to AI, especially during these early stages of maturity. And sometimes, what we think we know is actually influenced by the algorithms themselves, as one new study indicates. Algorithms trained on biased data return biased results, researchers found, and those results can continue to shape human decision-making even when people are later asked to complete similar tasks without the assistance of AI.
To combat the sneaky poison of biased data and biased thinking in the AI ecosystem, we will need to gain a better understanding of what bias really means, where it begins, how it propagates through systems, and how we can avoid or reverse its effects on patient care.
Defining the scope of “clinical AI” for regulatory purposes
The first task is getting a clear perspective on what we truly mean when we talk about AI tools in the clinical space.
The aforementioned final rule, which implements Section 1557 of the Affordable Care Act, notes that industry leaders are hungry for more clarity around how the regulation will apply to certain types of tools.
During the public comment period, respondents repeatedly asked OCR to clarify what constitutes a “clinical algorithm,” and whether the term includes hybrid clinical-administrative tools, such as those used by insurers to make claims decisions.
In response, OCR and CMS have adopted the term “patient care decision support tool” to make clear that AI-driven algorithms that combine clinical data with administrative or operational applications are subject to the non-discrimination rules.
“We define ‘patient care decision support tool’ to mean ‘any automated or non-automated tool, mechanism, method, technology, or combination thereof used by a [HIPAA] covered entity to support clinical decision-making in its health programs or activities,’” the officials stated. “The definition applies to tools that are used by a covered entity in its clinical decision-making that affect the patient care that individuals receive.”
OCR and CMS also point out that the new rules are geared toward end-users of AI tools, and are designed to work in tandem with the HTI-1 regulations from the Office of the National Coordinator for Health IT (ONC), which are focused on the developers of AI applications. Theoretically, this will cover all angles and help ensure that all AI stakeholders are held to similar degrees of accountability.
Addressing the roots of bias in algorithmic development
Naturally, since poor data will produce poor results, it’s imperative to use diverse and representative data when training and validating algorithms, and much has already been said on this topic.
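As a purely illustrative sketch of that first step, the example below assumes a hypothetical pandas DataFrame of the training cohort, a hypothetical race_ethnicity column, and made-up population shares; it simply compares the demographic mix of the training data against the population the organization serves.

```python
import pandas as pd

# Hypothetical demographic mix of the population the organization serves
# (illustrative numbers only; in practice this might come from census or enrollment data).
SERVED_SHARES = {"Black": 0.22, "White": 0.55, "Hispanic": 0.15, "Other": 0.08}

def representation_gaps(training_df: pd.DataFrame, group_col: str = "race_ethnicity") -> pd.DataFrame:
    """Compare each group's share of the training cohort to its share of the
    served population and flag groups that appear under-represented."""
    train_share = training_df[group_col].value_counts(normalize=True)
    report = pd.DataFrame({
        "train_share": train_share,
        "served_share": pd.Series(SERVED_SHARES),
    }).fillna(0.0)
    report["gap"] = report["train_share"] - report["served_share"]
    # Arbitrary threshold: flag any group under-represented by more than 5 percentage points.
    report["under_represented"] = report["gap"] < -0.05
    return report.sort_values("gap")

# Toy cohort that over-samples one group relative to the served population.
cohort = pd.DataFrame({"race_ethnicity": ["White"] * 70 + ["Black"] * 10 + ["Hispanic"] * 15 + ["Other"] * 5})
print(representation_gaps(cohort))
```

A gap report like this is only a starting point: it says nothing about whether outcomes, documentation quality, or missing data also differ across groups.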
But developers — and the users who are responsible for applying the results to patient care decisions — also have to be careful when setting the parameters for what they hope to achieve and how they are going to architect the algorithm’s decision-making process.
For example, in the 2019 study of the algorithm that underestimated the clinical risks of Black patients, the developers assumed that higher clinical complexity would naturally result in more spending, which could therefore serve as a proxy for current and future risk.
However, they failed to account for the fact that many Black patients experience socioeconomic and geographic barriers to care due to deeply rooted systemic inequities. If people from these communities can’t get access to the right care in the first place, their spending will actually be lower than that of other populations, yet their clinical burdens are likely to be just as high (or higher), and the lack of access to appropriate care means their outcomes will likely be even worse than those of higher-spending groups.
Even though the development team may have used data that was diverse and appropriately representative of the populations under scrutiny, incorrect assumptions about how to process that data still had the potential to harm the very populations involved.
They almost certainly didn’t intend for this misalignment to happen. But it’s a strong example of how subtle bias can be and the difficulty of sniffing out issues in models that may appear, on the surface, to be doing things right.
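To make the mechanism concrete, here is a toy sketch with entirely made-up numbers (not data from the study): two groups have identical clinical need, but one group’s spending is suppressed by access barriers, so a spending-ranked “risk score” routes the entire care-management program toward the other group.

```python
import pandas as pd

# Two groups with identical clinical burden (condition counts), but Group B's
# spending is suppressed by access barriers. All numbers are invented for illustration.
patients = pd.DataFrame({
    "group":      ["A"] * 5 + ["B"] * 5,
    "conditions": [1, 2, 3, 4, 5] * 2,                      # true clinical need, identical across groups
    "spending":   [2000, 4000, 6000, 8000, 10000,           # Group A: spending tracks need
                   1000, 2000, 3000, 4000, 5000],           # Group B: same need, roughly half the spending
})

# A spending-based "risk score" simply ranks patients by historical cost.
patients["risk_rank"] = patients["spending"].rank(ascending=False)

# Enroll the top 30% of patients (by the proxy) into a care-management program.
cutoff = patients["risk_rank"].quantile(0.3)
selected = patients[patients["risk_rank"] <= cutoff]

# Despite identical clinical need, every selected patient comes from Group A.
print(selected.groupby("group").size())
```

A validation step that compared selected patients’ actual clinical burden across groups, rather than their spending, would surface this kind of skew before it reached patient care.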
Best practices for identifying and avoiding bias in the care environment
Regulators are taking an increasingly close look at these and other issues as the AI era unfolds, and it’s encouraging to see that they are keen to hold end-users to the same standards as developers when it comes to preventing harm.
For bias to be effectively “banned” in healthcare, we can’t rely on overstretched regulators to take a deep dive into every single algorithm that hits the market, or into every one that is developed in-house and applied on a smaller scale. If we want anti-bias efforts to work, end-users are going to have to step up and proactively monitor the tools they put in place for reliability and trustworthiness.
In addition, they must feel confident that exposing bias and admitting to the need for corrective action will actually enhance their reputations as responsible members of the healthcare community, especially since consumers are widely demanding transparency and accountability in the AI world. This will require a major cultural shift in many cases, as corporate entities are not always known for their willingness to be transparent when something goes awry.
To meet these growing expectations and become true partners in what is likely to be a Herculean task, healthcare providers and health plans will also need to develop the skills to spot dodgy results before they trickle down through their organizations.
They can start by asking the right questions, beginning with the basics:
- What is the purpose of this model or tool? How will it help with making decisions, and how will those decisions be used to affect key outcomes for patients or health plan members?
- What data is being used to train and validate this algorithm? Is it representative of the population we serve? What about the sub-population we are targeting for a specific use case?
- If the available data is not appropriately representative, can we apply techniques to bulk up the data in a safe and accurate way, such as using historical or synthetic data?
- How transparent is the algorithm’s decision-making process? What elements is it using to make decisions, and how do these elements track with existing decision-making processes? If there are differences, can those new methods be justified or explained?
- How will we measure bias? What distinct types of bias are we looking for? What will we use as a comparator to gauge the direction and degree of drift? (A simple sketch of one such check appears after this list.)
- How are humans going to be involved — and remain involved — in the training, vetting, deployment, and monitoring processes? What will be the mechanisms for providing feedback? What is the plan for integrating feedback to continually improve accuracy and trustworthiness?
- What happens if harm does result from an algorithm? How will we address it? What is our liability and how can we protect our organization while taking appropriate responsibility for our actions?
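On the measurement question above, one common starting point is to compare error rates across demographic groups against a chosen comparator group. The sketch below is a minimal illustration with made-up data and hypothetical column names (needs_care for the ground-truth label, model_flag for the model’s prediction); real programs would track several metrics, not just one.

```python
import pandas as pd

def false_negative_rates(df: pd.DataFrame, group_col: str, label_col: str, pred_col: str) -> pd.Series:
    """Per-group false negative rate: of patients who truly needed the intervention
    (label == 1), what share did the model fail to flag (prediction == 0)?"""
    needed = df[df[label_col] == 1]
    return needed.groupby(group_col)[pred_col].apply(lambda preds: (preds == 0).mean())

# Toy evaluation set with hypothetical columns.
eval_df = pd.DataFrame({
    "group":      ["A", "A", "A", "A", "B", "B", "B", "B"],
    "needs_care": [1,   1,   1,   0,   1,   1,   1,   0],
    "model_flag": [1,   1,   0,   0,   1,   0,   0,   0],
})

fnr = false_negative_rates(eval_df, "group", "needs_care", "model_flag")
print(fnr)

# Pick a comparator group and report the gap, i.e., the "direction and degree" of any drift.
comparator = "A"
print("FNR gap vs. comparator:", (fnr - fnr[comparator]).round(2))
```

Other checks, such as comparing calibration or selection rates across groups, follow the same pattern: pick a metric, pick a comparator, and track the gap over time as part of routine monitoring.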
The bottom line is that federal regulators will not be able to address and eliminate bias all on their own. Developers and end-users are going to have to make a concerted effort to be up front about their challenges and willing to admit when mistakes occur.
Ideally, as the regulatory environment evolves, there will be safe harbors carved out for reporting biases and non-punitive support from federal agencies for taking corrective action. Advocating for these measures will be important as the market matures.
In the meantime, AI developers and end-users can position themselves for success by implementing a comprehensive anti-bias framework from the very start of their AI journeys. By keeping tabs on their tools throughout the process, they can make AI more equitable for real-world patients while keeping their organizations compliant with emerging regulations around non-discrimination.
Jennifer Bresnick is a journalist and freelance content creator with a decade of experience in the health IT industry. Her work has focused on leveraging innovative technology tools to create value, improve health equity, and achieve the promises of the learning health system. She can be reached at jennifer@inklesscreative.com.