P124 Disagreement among gastroenterologists in the endoscopic evaluation using the Mayo endoscopic score and the Rutgeerts postoperative score

Raimundo Fernandes S.*1, Pinto J.2, Marques da Costa P.1, Correia L.1
Grupo de Estudos de Doença Inflamatória Intestinal

1Hospital de Santa Maria, Serviço de Gastrenterologia e Hepatologia, Lisboa, Portugal
2Hospital Amato Lusitano, Serviço de Gastrenterologia, Castelo Branco, Portugal


Endoscopic evaluation is an integral part of the assessment of patients with inflammatory bowel disease (IBD). Endoscopy is routinely used to evaluate disease severity and to guide important clinical decisions. Variability among gastroenterologists in the interpretation of endoscopic findings is therefore a problem of the utmost importance, as it can severely affect the management of these patients. We aimed to study inter-rater variability among gastroenterologists using the Mayo endoscopic subscore (MS) and the Rutgeerts score (RS).


Gastroenterologists were invited to participate in an online survey comprising images and video recordings from colonoscopies of patients with Crohn's disease (CD) and ulcerative colitis (UC) (20 questions in total). Participants were asked to grade the colorectal mucosa in UC using the MS (0–3), and the neo-terminal ileum, anastomosis and proximal colon in operated patients with CD using the RS (0–4). To assess the impact of clinical information on scoring, two otherwise identical questionnaires were distributed, differing only in the clinical information provided for six cases. In the second part of the survey, participants were able to consult the endoscopic scores before answering the questions. Inter-rater concordance (IRC) was assessed using Krippendorff's alpha coefficient.
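Krippendorff's alpha compares the disagreement observed among raters with the disagreement expected by chance, alpha = 1 - D_o/D_e. The abstract does not state which distance metric was used, so the Python sketch below computes alpha from a coincidence matrix with a selectable nominal, ordinal or interval metric. It assumes ratings are stored per case (one inner list per image or video, one entry per participant, `None` where a participant did not rate the case) and that RS grades i0–i4 are coded as the integers 0–4. The function names and the example ratings are hypothetical; this is not the authors' analysis code.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, List, Optional, Sequence, Tuple

# One inner list per case (image or video); one entry per rater; None = not rated.
Units = Sequence[Sequence[Optional[int]]]


def coincidence_matrix(units: Units) -> Dict[Tuple[int, int], float]:
    """Krippendorff's coincidence matrix o[(c, k)] over pairable values."""
    o: Dict[Tuple[int, int], float] = defaultdict(float)
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:  # a case rated by fewer than two raters cannot be paired
            continue
        # every ordered pair of ratings from different raters, weighted by 1/(m - 1)
        for c, k in permutations(values, 2):
            o[(c, k)] += 1.0 / (m - 1)
    return o


def krippendorff_alpha(units: Units, metric: str = "ordinal") -> float:
    o = coincidence_matrix(units)
    cats: List[int] = sorted({c for c, _ in o})
    n_c = {c: sum(o.get((c, k), 0.0) for k in cats) for c in cats}
    n = sum(n_c.values())

    def delta2(c: int, k: int) -> float:
        """Squared distance between categories c and k under the chosen metric."""
        if metric == "nominal":
            return 0.0 if c == k else 1.0
        if metric == "interval":
            return float(c - k) ** 2
        if metric == "ordinal":
            lo, hi = min(c, k), max(c, k)
            # marginal counts of all categories from lo to hi, inclusive
            between = sum(n_c[g] for g in cats if lo <= g <= hi)
            return (between - (n_c[lo] + n_c[hi]) / 2.0) ** 2
        raise ValueError(f"unknown metric: {metric}")

    observed = sum(w * delta2(c, k) for (c, k), w in o.items())
    expected = sum(n_c[c] * n_c[k] * delta2(c, k) for c in cats for k in cats)
    if expected == 0:
        raise ValueError("alpha is undefined when all ratings fall in one category")
    # alpha = 1 - D_o / D_e, with D_o = observed / n and D_e = expected / (n (n - 1))
    return 1.0 - (n - 1) * observed / expected


# Hypothetical MS ratings (0-3) for four cases by three raters; not study data.
ms_ratings = [
    [0, 0, 1],
    [2, 2, 2],
    [3, 2, 3],
    [1, None, 1],
]
print(round(krippendorff_alpha(ms_ratings, "ordinal"), 3))
```

Because the MS and RS are ordered categories, the ordinal metric penalises a 0-versus-3 disagreement more heavily than a 0-versus-1 disagreement, whereas the nominal metric counts every disagreement equally; the choice of metric can therefore change the reported coefficient.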


A total of 58 gastroenterologists agreed to participate in this study. Nineteen participants (32.8%) reported being more experienced in IBD (>10 patients/week), 50 (86.2%) reported commonly performing endoscopy in patients with IBD, and 52 (89.7%) reported routinely using the scores. The IRC for the MS and RS was 0.47 (95% CI 0.41–0.54) and 0.33 (95% CI 0.28–0.38), respectively. Clinical information did not influence IRC for the MS (p=0.762) or the RS (p=0.147). Consultation of the scores slightly improved IRC for the MS [0.50 (95% CI 0.42–0.58) vs 0.45 (95% CI 0.38–0.51)], but not for the RS [0.16 (95% CI 0.10–0.21) vs 0.41 (95% CI 0.35–0.46)]. The IRC for mucosal healing (MS≤1) and deep mucosal healing (MS=0) was 0.57 (95% CI 0.40–0.72) and 0.89 (95% CI 0.73–1), respectively. The IRC for postoperative recurrence (RS≥2) was only 0.44 (95% CI 0.24–0.62), and for severe recurrence (RS≥3) it was 0.54 (95% CI 0.36–0.71). Considering only the 18 more experienced participants, the IRC increased for the MS [0.54 (95% CI 0.46–0.60)] and the RS [0.42 (95% CI 0.37–0.47)]. In this subgroup, the IRC was 0.62 (95% CI 0.45–0.78) for MS≤1, 0.89 (95% CI 0.75–1) for MS=0, 0.58 (95% CI 0.39–0.76) for RS≥2 and 0.59 (95% CI 0.42–0.75) for RS≥3.
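The binary outcomes above (MS≤1, MS=0, RS≥2, RS≥3) amount to dichotomising each rating at a threshold and computing alpha on the resulting 0/1 data, for which the nominal metric applies. The abstract does not say how its 95% confidence intervals were obtained; the Python sketch below assumes a simple percentile bootstrap that resamples cases with replacement, which is one common choice but not necessarily the authors' method. All function names and the example ratings are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Units = Sequence[Sequence[Optional[int]]]


def dichotomize(units: Units, predicate: Callable[[int], bool]) -> List[List[Optional[int]]]:
    """Map each score to 1 if it meets the threshold, 0 otherwise; keep None as missing."""
    return [[None if v is None else int(predicate(v)) for v in unit] for unit in units]


def nominal_alpha(units: Units) -> float:
    """Krippendorff's alpha with the nominal metric (suitable for binary outcomes)."""
    pair_weights: Dict[Tuple[int, int], float] = {}  # (c, k) -> coincidence count
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue
        for i, c in enumerate(values):
            for j, k in enumerate(values):
                if i != j:
                    pair_weights[(c, k)] = pair_weights.get((c, k), 0.0) + 1.0 / (m - 1)
    n_c: Dict[int, float] = {}
    for (c, _), w in pair_weights.items():
        n_c[c] = n_c.get(c, 0.0) + w
    n = sum(n_c.values())
    disagreement = sum(w for (c, k), w in pair_weights.items() if c != k)
    expected = n * n - sum(x * x for x in n_c.values())  # sum of n_c * n_k over c != k
    if expected == 0:
        raise ValueError("alpha is undefined when all ratings fall in one category")
    return 1.0 - (n - 1) * disagreement / expected


def bootstrap_ci(units: Units, stat: Callable[[Units], float], reps: int = 2000,
                 level: float = 0.95, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI, resampling cases (units) with replacement."""
    rng = random.Random(seed)
    estimates: List[float] = []
    for _ in range(reps):
        sample = [rng.choice(units) for _ in range(len(units))]
        try:
            estimates.append(stat(sample))
        except ValueError:
            continue  # resample without variation: alpha undefined, so skip it
    if not estimates:
        raise ValueError("no resample produced a defined alpha")
    estimates.sort()
    last = len(estimates) - 1
    lower = estimates[int(round((1 - level) / 2 * last))]
    upper = estimates[int(round((1 + level) / 2 * last))]
    return lower, upper


# Hypothetical MS ratings (0-3) for five cases by three raters; not study data.
ms_ratings = [[0, 0, 1], [2, 2, 2], [2, 1, 2], [1, None, 1], [0, 1, 0]]
healing = dichotomize(ms_ratings, lambda v: v <= 1)  # mucosal healing, MS <= 1
print(round(nominal_alpha(healing), 3), bootstrap_ci(healing, nominal_alpha, reps=500))
```

The same pattern covers the other thresholds, for example `lambda v: v == 0` for MS=0 or `lambda v: v >= 2` for RS≥2 once Rutgeerts grades are coded as integers.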


Our study confirms a high rate of disagreement in the use of the MS and RS, even among experienced physicians. Worryingly, concordance was only moderate for the most important outcomes: mucosal healing and postoperative recurrence.