P212 Deep learning video analysis for a fully automated per-frame grading of ulcerative colitis
B. GUTIERREZ BECKER1, C. Gamez Serna1, A. Thalhammer1, Y. Oh2, F. Drawnel3, F. Arcadu1, M. Prunotto3
1Roche, Informatics, Basel, Switzerland, 2Department of Clinical Science Gastroimmunology, Genentech, Inc., South San Francisco, USA, 3Immunology, Infectious Disease and Ophthalmology, Roche, Basel, Switzerland
Several approaches have been proposed that use deep learning to automatically assess the severity of ulcerative colitis (UC), in terms of the Mayo Clinic Endoscopic Subscore (MCES), from colonoscopy videos. These previous methods were trained and evaluated on high-quality individual frames, carefully hand-picked by an expert gastroenterologist, and without rigorous central endoscopy reading as an anchor. We investigate a fully automated, end-to-end approach which takes raw endoscopic videos as input and returns the MCES values for individual colon sections.
Our method consisted of two stages: (1) a Quality Control Network (QCN), which took a raw endoscopic video as input and removed frames with poor image quality from the analysis, and (2) a Severity Scoring Network (SSN), which took the high-quality frames retained by the QCN and returned a severity score for each individual frame. To obtain an aggregate severity score for a full video or a colon section, the per-frame predictions were averaged. Both the QCN and the SSN were convolutional neural networks (ResNet50) trained in a weakly supervised manner. The QCN was trained on examples of good-quality vs. bad-quality video segments, and the SSN was trained on pairs consisting of a video segment extracted for a given colon subsection (rectum, sigmoid, descending colon) and its corresponding MCES. Manual annotations of quality were performed by non-experts, while annotations of colon subsection and MCES were performed by expert gastroenterologists as part of the Hickory (NCT02100696) and Laurel (NCT02165215) clinical studies.
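The two-stage flow described above (filter frames by quality, score the remaining frames, average the scores) can be illustrated with a minimal sketch. It is not the authors' implementation: `quality_fn` and `severity_fn` are hypothetical stand-ins for the trained QCN and SSN, the 0.5 quality cut-off is an assumed value, and frames are represented as plain dictionaries purely for demonstration.

```python
from statistics import mean
from typing import Any, Callable, Optional, Sequence


def score_section(
    frames: Sequence[Any],
    quality_fn: Callable[[Any], float],   # hypothetical stand-in for the QCN
    severity_fn: Callable[[Any], float],  # hypothetical stand-in for the SSN
    quality_threshold: float = 0.5,       # assumed cut-off, not from the abstract
) -> Optional[float]:
    """Average per-frame severity over the frames that pass quality control."""
    # Stage 1: keep only frames whose quality score passes the cut-off.
    kept = [f for f in frames if quality_fn(f) >= quality_threshold]
    if not kept:
        # No usable frames in this section, so no score can be reported.
        return None
    # Stage 2: score each retained frame, then aggregate by simple averaging.
    return mean(severity_fn(f) for f in kept)


# Toy usage: each "frame" carries precomputed values instead of pixel data.
toy_frames = [
    {"quality": 0.9, "severity": 1.0},
    {"quality": 0.2, "severity": 3.0},  # discarded by quality control
    {"quality": 0.8, "severity": 3.0},
]
section_score = score_section(
    toy_frames,
    quality_fn=lambda f: f["quality"],
    severity_fn=lambda f: f["severity"],
)
```

Returning `None` for a section with no usable frames is a design choice of this sketch; the abstract does not say how such sections were handled.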
We evaluated our method retrospectively on 104 sigmoidoscopy videos obtained from Hickory and Laurel. We measured the agreement between the per-section MCES generated as part of the studies and the MCES obtained by automatic grading. Evaluation was done in a binary setting, where grading was constrained to determining whether a colon section has an MCES above a certain threshold. After 5-fold cross-validation, we achieved an area under the receiver operating characteristic curve (AUC) of 0.76 ± 0.08 for the detection of MCES >1 and an AUC of 0.81 ± 0.06 for MCES >3.
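The binary evaluation described above can be sketched as follows: the reference MCES is turned into a yes/no label at a chosen threshold, and the AUC is computed from the automatic scores using its rank-based (Mann-Whitney) definition. This is only an illustration under stated assumptions: the function names and the toy values are hypothetical, and the 5-fold cross-validation used in the study is not reproduced here.

```python
from typing import Sequence


def binary_auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def auc_for_threshold(
    reference_mces: Sequence[int],
    predicted_scores: Sequence[float],
    threshold: int,
) -> float:
    """Binarize the reference MCES as 'above threshold' and compute the AUC."""
    labels = [m > threshold for m in reference_mces]
    return binary_auc(labels, predicted_scores)


# Toy per-section values (made up for illustration, not study data).
reference = [0, 1, 2, 3, 2, 0]
predicted = [0.2, 1.5, 1.8, 2.7, 1.1, 0.4]
toy_auc = auc_for_threshold(reference, predicted, threshold=1)
```

In a cross-validated setting, a function like `auc_for_threshold` would be applied to the held-out sections of each fold and the per-fold values summarized as a mean and spread.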
We have demonstrated that fully automated deep learning assessment of the severity of ulcerative colitis in colonoscopy videos is feasible. Our approach achieves AUCs of 0.76 to 0.81 even when training is performed using weak labels and a limited number of examples, and it paves the way towards automated severity assessment in full colonoscopy videos.