Accurate endoscopic scoring sits at the heart of ulcerative colitis (UC) clinical trials. Yet for decades, the field has wrestled with persistent challenges: high inter-reader variability, operational complexity, and delays caused by traditional central reading workflows. Even with multi-reader paradigms designed to reduce bias, variability remains substantial—and costly.
In a newly published study in BMJ Open Gastroenterology, researchers evaluated a novel approach that blends the strengths of artificial intelligence (AI) with essential human oversight. The findings suggest a path forward that could fundamentally improve how endoscopic endpoints are assessed in UC trials.
Read the full peer-reviewed article here.
The Problem with Traditional Central Reading
The Mayo Endoscopic Score (MES) remains the most widely used measure of endoscopic disease activity in UC. Despite its practicality, MES scoring is inherently subjective. Even well-trained, experienced readers disagree roughly 40% of the time when scoring the same endoscopic video.
To address this, late-phase UC trials commonly rely on a “2+1” central reading paradigm—two independent human readers, plus a third adjudicator in cases of disagreement. While this approach reduces bias, it introduces new issues:
- High operational burden and cost
- Delays in trial timelines and treatment initiation
- Persistent variability driven by reader assignment
- Limited gains in accuracy relative to effort
As trials grow more complex and timelines more compressed, these inefficiencies become increasingly problematic.
Can AI Improve Endoscopic Scoring?
Over the past several years, AI models have demonstrated strong agreement with human readers in endoscopic scoring tasks. However, using AI alone raises important concerns—particularly around regulatory acceptance, reliability, and oversight.
Regulators and sponsors alike are seeking solutions that preserve human judgment while harnessing the scalability and consistency of machine learning.
This is where a hybrid paradigm becomes compelling.
Introducing the 2M+1H Paradigm
The study evaluated a new central reading framework known as 2M+1H:
- 2M: Two independently developed machine learning models score each endoscopic video
- 1H: A board-certified gastroenterologist adjudicates only when the models disagree
One AI model was developed by Iterative Health, and the other by Docbot (now curated by Eli Lilly and Co). Importantly, the models were trained independently using different data and methodologies—mirroring the “independent reader” concept used in human-based paradigms.
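The decision flow described above—two independent model scores, with a human read triggered only on disagreement—can be sketched in a few lines. This is a minimal illustration under assumptions, not the study's actual implementation: the function name, the model interfaces, and the 0–3 MES return type are all hypothetical stand-ins.

```python
# Minimal sketch of the 2M+1H central reading flow.
# The two model scorers and the human adjudicator are hypothetical
# callables standing in for the study's actual systems.

from typing import Callable, Tuple

def score_video_2m1h(
    video_id: str,
    model_a: Callable[[str], int],       # first ML model: returns MES 0-3
    model_b: Callable[[str], int],       # second, independently developed model
    human_reader: Callable[[str], int],  # board-certified gastroenterologist
) -> Tuple[int, bool]:
    """Return (final MES score, whether a human read was required)."""
    score_a = model_a(video_id)
    score_b = model_b(video_id)
    if score_a == score_b:
        # Models agree: their shared score is final; no human read needed.
        return score_a, False
    # Models disagree: the human adjudicator's score is final.
    return human_reader(video_id), True
```

The efficiency gain falls out of the structure: the human reader is invoked only on the disagreement branch, whereas the traditional 2+1 paradigm consumes at least two human reads for every video.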
This design intentionally aligns with emerging regulatory guidance, including the FDA’s January 2025 recommendations on the use of AI to support regulatory decision-making.
What the Study Found
Using 150 full-length endoscopic videos, researchers compared the 2M+1H approach to the traditional human-only 2+1 method. Key findings included:
- Statistical non-inferiority to traditional central reading, with a quadratic weighted kappa of 0.78
- High agreement with reference standards for endoscopic improvement (82.7%) and remission (89.3%)
- An 81% reduction in required human reads per video
- Evidence that 16% of human-only cases produced different final scores depending on reader assignment
In short, the hybrid approach maintained accuracy while dramatically improving efficiency and reproducibility.
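The non-inferiority result above is reported as a quadratic weighted kappa, an agreement statistic for ordinal scales that penalizes disagreements by the square of their distance (so scoring a video 0 versus 3 costs far more than 1 versus 2). A minimal, standard-library-only sketch of the statistic follows; the score lists in the test below are illustrative examples, not the study's data.

```python
# Quadratic weighted kappa for ordinal scores such as MES (0-3).
# Standard definition: kappa = 1 - (weighted observed disagreement /
# weighted disagreement expected by chance).

def quadratic_weighted_kappa(rater1: list, rater2: list, n_classes: int = 4) -> float:
    n = len(rater1)
    # Observed confusion matrix between the two raters.
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater1, rater2):
        observed[a][b] += 1
    # Marginal score histograms, used for the chance-expected matrix.
    hist1 = [rater1.count(c) for c in range(n_classes)]
    hist2 = [rater2.count(c) for c in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Quadratic weight: squared distance, normalized to [0, 1].
            w = ((i - j) ** 2) / ((n_classes - 1) ** 2)
            expected = hist1[i] * hist2[j] / n
            num += w * observed[i][j]
            den += w * expected
    return 1.0 - num / den
```

Perfect agreement yields 1.0, chance-level agreement yields 0.0; the study's 0.78 between the hybrid paradigm and traditional central reading sits well into the range conventionally read as substantial agreement.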
Why This Matters for UC Trials
The implications extend beyond operational convenience:
- Faster trial timelines by reducing read bottlenecks
- Lower costs through fewer required human reads
- Greater consistency in endpoint assessment
- Regulator-aligned AI adoption, with human oversight embedded by design
By combining independent AI models with targeted human adjudication, the 2M+1H framework preserves clinical judgment while addressing long-standing variability challenges.
Advancing the Future of Endpoint Assessment
This work represents more than a technical validation—it contributes to a broader conversation about how AI can responsibly support regulatory-grade clinical research. By adhering closely to FDA guidance and prioritizing transparency and oversight, hybrid models like 2M+1H offer a pragmatic path forward.
At Iterative Health, we believe that AI should augment—not replace—clinical expertise. This study demonstrates how thoughtfully designed machine learning systems can improve trial quality, scalability, and trust.
To explore the full methodology, results, and regulatory context, read the complete article in BMJ Open Gastroenterology here.
About Iterative Health
Iterative Health is a healthcare technology and services company powering the acceleration of clinical research to transform patient outcomes. By combining deep expertise in clinical trials with cutting-edge AI, we empower research teams and study sponsors to expand and expedite access to novel therapeutics for patients in need. Today, Iterative Health is based in Cambridge, Massachusetts, and New York City, with more than 250 employees worldwide.
To learn more or explore a partnership, please visit our Contact page.
