DOP59 Development of a novel Ulcerative Colitis (UC) endoscopic activity prediction model using machine learning (ML)

Colombel, J.F.(1); Rubin, D.(2); Schott, J.P.(3); Gottlieb, K.(4); Erisson, L.(5); Prucka, B.(6); Phillips, S.(5); Kwon, J.(7); Ng, J.(8); McGill, J.(4)

(1) Icahn School of Medicine at Mount Sinai, Gastroenterology, New York City, NY, United States; (2) University of Chicago, Gastroenterology, Chicago, IL, United States; (3) Iterative Scopes, Engineering, Cambridge, MA, United States; (4) Eli Lilly and Company, Immunology BU, Indianapolis, IN, United States; (5) Iterative Scopes, Research and Development Strategy, Cambridge, MA, United States; (6) Eli Lilly and Company, Advanced Analytics and Data Sciences, Indianapolis, IN, United States; (7) Gilead Sciences, Immunology, Foster City, CA, United States; (8) Iterative Scopes, Corporate, Cambridge, MA, United States

Background

Previous studies have described machine learning (ML) models that predict how human readers would score disease activity in UC using the endoscopic Mayo Score (eMS). So far, none has employed deep human annotation that considers all the endoscopic features making up the eMS. Here we report the results of an ML model trained on eMS features using centrally read endoscopies.

Methods

The primary dataset comprised 793 full-length videos obtained from 249 patients with UC who participated in NCT02589665, a phase 2 trial of mirikizumab in patients with UC; the videos were associated with a centrally read (single reader) eMS (CReMS). After cleaning for usable frames, the data were split into training, validation, and testing subsets. The ML workflow consisted of annotation, segmentation, and classification of endoscopic features (e.g., erosions, ulcers, erythema, vascular pattern, and bleeding). Human image classification and segmentation with bounding boxes were subjected to quality control adjudicated by one of three IBD specialists, generating more than 60,000 eMS-relevant annotation labels. The model was evaluated on a test set of 147 videos using the CReMS, and on a consensus set of 94 test videos in which the CReMS and the annotator-reported eMS (AReMS) were in agreement without adjudication. The primary objective of the model was a categorical prediction of endoscopically inactive disease (eMS 0 or 1) compared with active disease (eMS 2 or 3). The secondary objectives of the model were to predict endoscopic healing (eMS 0) and to predict severe disease (eMS 3).
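
The primary and secondary objectives amount to dichotomizing the four-level eMS at different cut points, and the consensus set is the subset of test videos where the two human scores agree. The following is a minimal Python sketch of that labeling logic, assuming integer eMS values from 0 to 3. The function names, the objective keys, and the dictionary fields `crems` and `arems` are hypothetical and are not taken from the study's pipeline.

```python
def binarize_ems(score: int, objective: str) -> int:
    """Map an eMS (0-3) to a binary label for one objective.

    Objective keys are illustrative assumptions:
      "active"  -> 1 if eMS is 2 or 3 (active), 0 if eMS is 0 or 1 (inactive)
      "healing" -> 1 if eMS is 0 (endoscopic healing), else 0
      "severe"  -> 1 if eMS is 3 (severe disease), else 0
    """
    if score not in (0, 1, 2, 3):
        raise ValueError(f"eMS must be 0, 1, 2, or 3; got {score!r}")
    if objective == "active":
        return int(score >= 2)
    if objective == "healing":
        return int(score == 0)
    if objective == "severe":
        return int(score == 3)
    raise ValueError(f"unknown objective: {objective!r}")


def consensus_subset(videos):
    """Keep only videos whose central-reader and annotator scores agree.

    Each video is assumed to be a dict with hypothetical keys
    "crems" (centrally read eMS) and "arems" (annotator-reported eMS).
    """
    return [v for v in videos if v["crems"] == v["arems"]]


# Toy records for illustration only; these are not study data.
toy_videos = [
    {"id": "a", "crems": 0, "arems": 0},
    {"id": "b", "crems": 2, "arems": 1},
    {"id": "c", "crems": 3, "arems": 3},
]
agreed = consensus_subset(toy_videos)
labels = [binarize_ems(v["crems"], "active") for v in agreed]
```

In this toy example, `agreed` keeps videos "a" and "c", and `labels` is `[0, 1]`.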

Results

Model performance is summarized in Table 1. On the full test set of 147 videos, the model predicted inactive disease compared with active disease with an accuracy of 84%, positive predictive value (PPV) of 80%, and negative predictive value (NPV) of 85%. In the subset of 94 videos with CReMS and AReMS consensus, the model predicted inactive disease compared with active disease with an accuracy of 89%, PPV of 87%, and NPV of 90%. In this same subset, the model predicted endoscopic healing and severe disease with accuracies of 95% and 85%, PPVs of 86% and 82%, and NPVs of 95% and 87%, respectively. For the secondary objectives in the full set of 147 videos, the model predicted endoscopic healing and severe disease with accuracies of 90% and 80%, PPVs of 44% and 86%, and NPVs of 95% and 86%, respectively.
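
For readers less familiar with the reported metrics, accuracy, PPV, and NPV can all be derived from the four cells of a binary confusion matrix. The sketch below shows that arithmetic in Python under stated assumptions: labels are encoded as 0/1, the class encoded as 1 is treated as "positive" (the abstract does not state which class was treated as positive), and the function name `binary_metrics` and the toy labels are hypothetical rather than study data.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, PPV, and NPV from paired 0/1 labels.

    PPV = TP / (TP + FP): fraction of positive predictions that are correct.
    NPV = TN / (TN + FN): fraction of negative predictions that are correct.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    nan = float("nan")  # returned when a denominator is zero
    return {
        "accuracy": (tp + tn) / len(pairs),
        "ppv": tp / (tp + fp) if (tp + fp) else nan,
        "npv": tn / (tn + fn) if (tn + fn) else nan,
    }


# Toy labels for illustration only; these are not study data.
reference = [1, 1, 1, 0, 0, 0, 0, 0]   # e.g., human-read binary label
predicted = [1, 1, 0, 1, 0, 0, 0, 0]   # e.g., model binary prediction
metrics = binary_metrics(reference, predicted)
```

With these toy labels there are 2 true positives, 4 true negatives, 1 false positive, and 1 false negative, so accuracy is 6/8 = 0.75, PPV is 2/3, and NPV is 4/5 = 0.8. Because PPV and NPV depend on how common the positive class is in the evaluated set, a rare class can yield a low PPV even when accuracy is high.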

Table 1. Accuracy Results

Conclusion

We have developed an ML predictive model of the eMS in UC using centrally read videos and have demonstrated excellent distinction between active and inactive disease, as well as clear discrimination between other levels of endoscopic activity. We propose that this unique ML approach to endoscopic assessment be considered as a substitute for human central reading in future clinical trials.