DOP31 Serum protein markers for early and differential IBD diagnosis validated by machine learning approaches

S. Verstockt1,2, N. Verplaetse2,3, D. Raimondi3, B. Verstockt1,4, E. Glorieus5, M. De Decker6, L. Hannes2, V. Ballet4, E. Vandeput1, Y. Moreau3, M. Ferrante1,4, D. Laukens5, F. Mana6, M. De Vos5, S. Vermeire1,4, I. Cleynen2

1Department of Chronic Diseases, Metabolism and Ageing, KU Leuven, Leuven, Belgium, 2Laboratory for Complex Genetics, Department of Human Genetics, KU Leuven, Leuven, Belgium, 3Department of Electrical Engineering ESAT, KU Leuven, Leuven, Belgium, 4Department of Gastroenterology and Hepatology, University Hospitals Leuven, Leuven, Belgium, 5Department of Gastroenterology, University Hospital of Ghent, Ghent, Belgium, 6Department of Gastroenterology, University Hospitals Brussels, Brussels, Belgium


The inflammatory bowel diseases (IBD), Crohn’s disease (CD) and ulcerative colitis (UC), are chronic inflammatory conditions with a polygenic and multifactorial pathogenesis. Intensified treatment early in the disease course of IBD results in better outcomes. Early treatment is, however, hampered by the diagnostic delay in IBD, especially in CD. Markers supporting early and differential diagnosis are therefore needed. In this study, we aimed to discriminate IBD patients from non-IBD controls, and CD from UC patients, using serum protein profiles combined with an IBD polygenic risk score.


Patients naïve to immunosuppressives and biologicals, and without previous IBD-related surgery, were prospectively included within 3 months after diagnosis across three Belgian IBD referral centres (PANTHER study). We collected serum from 127 patients (88 CD, 39 UC) and 66 age- and gender-matched non-IBD controls. Relative serum levels of 576 unique proteins were quantified (OLINK). Proteins were ranked according to (1) adjusted (adj.) p values obtained from differential expression analysis and (2) importance scores from machine-learning feature-selection algorithms (univariate feature selection, logistic regression with L2 penalty and Random Forest). For all individuals, a weighted IBD polygenic risk score (PRS) was calculated (PRSice 2.0) for the 242 known IBD risk loci. Receiver operating characteristic (ROC) and area under the curve (AUC) analyses were performed to measure the performance of top-ranked proteins and the IBD PRS (R package ROCR).
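The ranking step can be illustrated with a minimal sketch, assuming each method produces an ordered list of proteins and that a consensus panel is the set of proteins present in the top k of every list. The function name `consensus_top_k`, the method labels and the protein names (`P1` to `P7`) are hypothetical placeholders, not the study's code or data.

```python
def consensus_top_k(rankings, k=10):
    """Return the features found in the top k of every ranking.

    rankings: dict mapping a method name to a list of feature names,
    ordered from most to least important (hypothetical input format).
    """
    if not rankings:
        return []
    top_sets = [set(order[:k]) for order in rankings.values()]
    shared = set.intersection(*top_sets)

    # Order the shared features by mean position across methods,
    # breaking ties alphabetically so the result is deterministic.
    def mean_rank(feature):
        positions = [order.index(feature) for order in rankings.values()]
        return sum(positions) / len(positions)

    return sorted(shared, key=lambda f: (mean_rank(f), f))


# Synthetic rankings with placeholder protein names (not study data).
rankings = {
    "differential_expression": ["P1", "P2", "P3", "P4", "P5"],
    "univariate": ["P2", "P1", "P5", "P3", "P4"],
    "logistic_l2": ["P1", "P3", "P2", "P6", "P4"],
    "random_forest": ["P2", "P1", "P4", "P3", "P7"],
}
panel = consensus_top_k(rankings, k=3)
print(panel)
```

Requiring agreement across all methods is a conservative choice: a protein favoured by only one algorithm is dropped, which tends to yield a small panel.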


Following statistical analysis, 243 serum proteins were found to be differentially expressed (adj. p < 0.05) between IBD patients and controls. Three of the top-ranked markers also ranked within the top 10 across all feature-selection algorithms, and together resulted in a significant AUC of 93% (95% CI: 89–97%) to distinguish IBD from controls. Adding the IBD PRS did not further contribute (AUC 93% [95% CI: 89–97%]), whereas the top-ranked protein on its own already had strong discriminative power, with an AUC of 87% (95% CI: 82–92%). When comparing UC and CD, we found 15 differentially expressed proteins. Two proteins ranked within the top 10 across all feature-selection algorithms. This two-marker panel could discriminate UC from CD with an accuracy of 88% (95% CI: 82–96%). Adding the IBD PRS did not further improve the prediction model (AUC 88% [95% CI: 81–96%]).
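As a reading aid for the AUC values above, the following minimal sketch computes an AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. This is the standard rank-based definition, not the ROCR workflow used in the study; the function name `auc_from_scores` and the score values are hypothetical.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as P(case score > control score), counting ties as 0.5."""
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Synthetic marker levels (arbitrary units, not study data).
cases = [2.1, 3.4, 1.9, 2.8]
controls = [1.2, 2.0, 1.5, 2.8]
auc = auc_from_scores(cases, controls)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, so a value of 93% means a case outranks a control in 93% of case-control pairs.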


Machine learning approaches validated top differentially expressed serum proteins with diagnostic potential in IBD. We identified a three-marker panel classifying IBD patients and non-IBD controls, and a two-marker panel discriminating UC from CD.