AI model predicts pancreatic cancer up to 3 years early
Pancreatic cancer typically causes few symptoms in its early stages, leading to late diagnoses and an extremely poor prognosis for patients: a 12 percent 5-year relative survival rate. There are currently no reliable tools to screen the general population for pancreatic cancer, compounding the challenge of identifying and treating the condition as early as possible.
However, researchers from Harvard Medical School, the University of Copenhagen, the VA Boston Healthcare System, Dana-Farber Cancer Institute, and the Harvard T.H. Chan School of Public Health have developed an AI model that could change the equation.
In a new study published in Nature Medicine, the team details an artificial intelligence algorithm that could identify people at elevated risk of developing pancreatic cancer up to three years before detectable cancer can be diagnosed, allowing these people to undergo enhanced screenings and monitoring in the interval.
“Many types of cancer, especially those hard to identify and treat early, exert a disproportionate toll on patients, families and the healthcare system as a whole,” said study co-senior investigator Søren Brunak, professor of disease systems biology and director of research at the Novo Nordisk Foundation Center for Protein Research at the University of Copenhagen, in a press release.
“AI-based screening is an opportunity to alter the trajectory of pancreatic cancer, an aggressive disease that is notoriously hard to diagnose early and treat promptly when the chances for success are highest.”
Using deep learning to predict cancer development
The team created a deep learning algorithm to comb through more than 9 million patient records, including 6 million from the Danish National Patient Registry (DNPR) and 3 million from the US Veterans Affairs health system.
They primarily trained the model on the sequence of ICD-10 disease codes in clinical histories and tested for cancer occurrence within six months, one year, two years, and three years, but also experimented with a non-linear model that simply looked at all disease codes, regardless of sequence.
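To illustrate the difference between the two kinds of input, here is a minimal Python sketch. It is not the study's code: the patient histories, the ages, and the helper names `as_sequence` and `as_bag` are hypothetical, chosen only to show that an ordered view of ICD-10 codes can tell apart two histories that an unordered view treats as identical.

```python
from collections import Counter

# Hypothetical histories: (age in years at diagnosis, ICD-10 code).
# K80 = gallstones, E11 = type 2 diabetes, K85 = acute pancreatitis.
history_a = [(52.0, "K80"), (55.5, "E11"), (57.0, "K85")]
history_b = [(52.0, "K85"), (55.5, "E11"), (57.0, "K80")]


def as_sequence(history):
    """Ordered view: codes sorted by the age at which they were recorded."""
    return [code for _, code in sorted(history)]


def as_bag(history):
    """Unordered view: how often each code appears, ignoring timing."""
    return Counter(code for _, code in history)


# The same three diagnoses occur in both histories, in a different order.
print(as_sequence(history_a) == as_sequence(history_b))  # False
print(as_bag(history_a) == as_bag(history_b))            # True
```

A model fed the ordered view can, in principle, learn that the timing of a diagnosis matters. A model fed only the counts cannot.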
When running the algorithm against the Danish patient data set, the researchers found certain diagnoses, such as gallstones, anemia, type 2 diabetes, and GI-related problems, were correlated with a greater risk for pancreatic cancer within three years. Inflammation of the pancreas was correlated with the appearance of pancreatic cancer within just two years.
The team then ran their best-performing algorithm (AUROC = 0.879 [0.877–0.880] at three years) against the VA database, but found it was slightly less accurate in its predictive ability. They attributed this to differences in the datasets, such as a shorter duration of longitudinal patient history and a narrower, more medically complex population of military veterans.
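For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen patient who develops the disease receives a higher risk score than a randomly chosen patient who does not. The sketch below computes it by direct pairwise comparison. The scores and labels are made-up toy values, not data from the study, and the `auroc` function is an illustrative helper rather than the authors' evaluation code.

```python
def auroc(scores, labels):
    """Fraction of (case, non-case) pairs in which the case scores higher.

    Ties count as half a win. labels use 1 for a case and 0 for a non-case.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: six patients, two of whom go on to develop cancer.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 0, 1, 0, 0, 0]

print(auroc(scores, labels))  # 0.875
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect ranking.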
Overall, the algorithm was able to identify the highest-risk patients in the cohorts with a fair degree of accuracy. Among the 1,000 highest-risk individuals identified by the AI tool, approximately 320 went on to develop pancreatic cancer, the study notes.
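Put another way, roughly one in three of the patients flagged in that top group were true cases. The arithmetic can be sketched in a few lines of Python. The list of outcomes below is synthetic, built only to match the approximate figures quoted above, and `precision_at_k` is a hypothetical helper name.

```python
def precision_at_k(outcomes_ranked, k):
    """Share of true cases among the k highest-ranked patients.

    outcomes_ranked holds 1 (developed cancer) or 0 (did not),
    ordered from highest to lowest predicted risk.
    """
    top = outcomes_ranked[:k]
    return sum(top) / len(top)


# Synthetic stand-in for the reported result: about 320 cases among
# the 1,000 highest-risk individuals. Order within the top group does
# not affect the result.
top_1000 = [1] * 320 + [0] * 680

print(f"{precision_at_k(top_1000, 1000):.0%}")  # 32%
```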
Integrating AI models into holistic preventive care programming
To maximize the health system’s ability to get ahead of pancreatic cancer, AI models will need to be deeply integrated into a holistic, population-based approach for monitoring and care delivery, the authors pointed out.
“Successful implementation of early diagnosis and treatment of pancreatic cancer in clinical practice will likely require three essential steps: (1) identification of high-risk patients; (2) detection of early cancer or pre-cancerous states by detailed surveillance of high-risk patients; and (3) effective treatment after early detection. The overall impact in clinical practice depends on the success rates in each of these stages.”
As AI models continue to develop and become more commonplace in routine practice, it will be essential to ensure providers are collecting complete, accurate, high-quality data to support advanced predictive tools.
“Although the precise clinical impact will depend on the quality of EHR data and current clinical practice in a particular healthcare system, we conclude that this level of additional early detection may be considered of value, provided that implementation issues, including the cost of a surveillance program, can be successfully addressed in a real-world implementation,” the team wrote.
“To achieve a globally useful set of prediction rules, access to large datasets of disease histories aggregated nationally or internationally will be extremely valuable, with careful assessment of the accuracy of clinical records.”
Jennifer Bresnick is a journalist and freelance content creator with a decade of experience in the health IT industry. Her work has focused on leveraging innovative technology tools to create value, improve health equity, and achieve the promises of the learning health system. She can be reached at jennifer@inklesscreative.com.