OP024. Agreement among central readers in the evaluation of endoscopic disease activity in Crohn's disease

R. Khanna1, G. Zou1,2, G. D'Haens1,3, P. Rutgeerts4, J.W. McDonald1, M. Daperno5, B.G. Feagan1,6, W.J. Sandborn1,7, E. Dubcenco1, M.K. Vandervoort1, A. Luo8, B.G. Levesque1,7, 1Robarts Clinical Trials Inc., Robarts Research Institute, Western University, London, Ontario, Canada, 2Western University, Department of Epidemiology and Biostatistics, London Ontario, Canada, 3University of Amsterdam, Academic Medical Center, Amsterdam, Netherlands, 4University Hospital, Department of Gastroenterology, Leuven, Belgium, 5A.O. Ordine Mauriziano, Gastroenterology Division, Torino, Italy, 6Western University, Department of Epidemiology and Biostatistics, Department of Medicine, London Ontario, Canada, 7University of California, San Diego, Division of Gastroenterology, La Jolla, United States, 8Bristol-Myers Squibb, Department, Princeton New Jersey, United States


The Crohn's Disease Endoscopic Index of Severity (CDEIS) and the Simple Endoscopic Score for Crohn's Disease (SES-CD) are commonly used to assess endoscopic disease activity in Crohn's disease (CD); however, neither instrument has been fully validated. We evaluated the intra- and inter-rater agreement of both indices and identified the index items responsible for the greatest disagreement among readers.


Video recordings of colonoscopies from 50 patients with CD who participated in an induction trial of a biologic therapy conducted by Bristol-Myers Squibb were triplicated and reviewed in random order by 4 central readers. Intraclass correlation coefficients (ICCs) for intra- and inter-rater agreement were calculated for the CDEIS, the SES-CD, and a global assessment of endoscopic lesion severity (GELS). The videos most influential on the ICC estimates were identified using Cook's distance (D). A Delphi process identified the most common sources of disagreement. The CDEIS and SES-CD were then recalculated after excluding these items and regressed against the GELS to estimate the correlation coefficients of the modified indices.
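As an illustration of how an ICC is obtained from a videos × readers score matrix, the sketch below computes ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single rater). This is an assumption: the abstract does not state which ICC form or model was used, and the triplicated readings in this study would likely call for a more elaborate variance-components model. The function name `icc_2_1` is hypothetical, and the usage example uses the textbook Shrout and Fleiss data (6 subjects, 4 judges), not data from this study.

```python
from statistics import fmean


def icc_2_1(scores: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    Hypothetical helper. scores[i][j] is the score given to video i by
    reader j; every video must be scored by every reader.
    """
    n = len(scores)       # number of videos (subjects)
    k = len(scores[0])    # number of readers
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete matrix with >= 2 videos and >= 2 readers")

    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]                      # per video
    col_means = [fmean(row[j] for row in scores) for j in range(k)]  # per reader

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Shrout & Fleiss (1979) example: 6 subjects rated by 4 judges
    shrout_fleiss = [
        [9, 2, 5, 8],
        [6, 1, 3, 2],
        [8, 4, 6, 8],
        [7, 1, 2, 6],
        [10, 5, 6, 9],
        [6, 2, 4, 7],
    ]
    print(round(icc_2_1(shrout_fleiss), 2))  # prints 0.29
```

The absolute-agreement form penalizes systematic differences between readers (for example, one reader who scores every video higher), which is the property of interest when central readers are expected to be interchangeable.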


Ten of the 50 videos accounted for the greatest disagreement. The Delphi process identified several areas in which scoring requires greater clarity: the definition of stenosis and how to score after balloon dilation, the location of ulcers that span two contiguous segments, and the differentiation of anal from rectal lesions. Statistically, interpretation of stenosis contributed most to disagreement for both indices. Intra- and inter-rater ICCs, and correlation coefficients with the GELS, were nearly identical for the original and modified indices, the latter excluding stenosis from the calculation of each index for all videos (Tables 1 and 2).

Table 1. Reliability estimates for the CDEIS, SES-CD and GELS (ICCs (95% CI) a)

| Index | Intra-rater, original | Intra-rater, modified b | Inter-rater, original | Inter-rater, modified b |
|---|---|---|---|---|
| CDEIS | 0.89 (0.86–0.93) | 0.90 (0.87–0.93) | 0.71 (0.61–0.79) | 0.72 (0.63–0.81) |
| SES-CD | 0.91 (0.87–0.94) | 0.92 (0.88–0.95) | 0.83 (0.75–0.89) | 0.84 (0.78–0.90) |
| GELS | 0.81 (0.75–0.86) | N/A | 0.62 (0.52–0.73) | N/A |

a ICCs of <0.00, 0.00–0.20, 0.21–0.40, 0.41–0.60, 0.61–0.80, and 0.81–1.00 indicate poor, slight, fair, moderate, substantial, and almost perfect agreement, respectively.
b "Modified" refers to exclusion of stenosis from scoring.
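The interpretation scale in footnote a of Table 1 can be written as a simple lookup. The sketch below is a minimal illustration; the function name `agreement_category` is hypothetical, and because the footnote's ranges leave gaps between, for example, 0.20 and 0.21, it assumes each range is closed on the right (0.20 < ICC ≤ 0.40 is "fair", and so on).

```python
def agreement_category(icc: float) -> str:
    """Map an ICC to the agreement label in footnote a of Table 1.

    Hypothetical helper. Assumes right-closed ranges, so a value such as
    0.205 is labelled 'fair' (0.20 < icc <= 0.40).
    """
    if icc < 0.0:
        return "poor"
    # (upper bound, label) pairs, checked in increasing order
    for upper, label in [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]:
        if icc <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Original-index ICCs reported in Table 1
    for name, icc in [("CDEIS", 0.89), ("CDEIS", 0.71), ("SES-CD", 0.83), ("GELS", 0.62)]:
        print(name, icc, agreement_category(icc))
```

Applied to the CDEIS and SES-CD estimates in Table 1, every value falls in the "substantial" or "almost perfect" band, consistent with the conclusion below.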
Table 2. Correlation between the indices

| Index pair | Original versions | Modified a versions |
|---|---|---|
| CDEIS–GELS | 0.75 (0.63–0.85) | 0.74 (0.61–0.83) |
| SES-CD–GELS | 0.74 (0.62–0.84) | 0.75 (0.63–0.85) |
| CDEIS–SES-CD | 0.92 (0.87–0.96) | 0.75 (0.63–0.85) b; 0.90 (0.84–0.94) c |

a "Modified" refers to exclusion of stenosis from scoring.
b Between the modified CDEIS and the original SES-CD.
c Between the original CDEIS and the modified SES-CD.


Expert central reading of the SES-CD and CDEIS showed "substantial" to "almost perfect" intra- and inter-rater agreement. However, our preliminary results suggest two strategies that may further improve agreement: (1) further standardizing the definitions of problematic items and (2) potentially removing stenosis assessment from both the CDEIS and the SES-CD. We plan future studies to evaluate these possibilities.