P316 New approaches for IBD management based on text mining of digitalised medical reports and latent class modelling
Bergey F.*1, Saccenti E.2, Jonkers D.3, van den Heuvel T.3, Jeuring S.3, Pierik M.3, Martins dos Santos V.1,2
1LifeGlimmer GmbH, Berlin, Germany; 2Wageningen University, Chair Systems and Synthetic Biology, Wageningen, Netherlands; 3Maastricht University Medical Centre+, Division of Gastroenterology and Hepatology, Maastricht, Netherlands
Inflammatory bowel diseases (IBD) are characterised by chronic inflammation of the gastrointestinal tract. The disease course is variable, ranging from a quiescent course to multiple (severe) flares, often necessitating surgery. Early identification of patients with an expected poor prognosis may be important for adapting treatment strategies accordingly. The clinical markers currently used to predict a severe disease course have limited value for decision making in daily clinical practice. Although diagnostic medical reports, e.g. endoscopy, pathology and imaging reports, may contain relevant information, they are not used to stratify patients.
We aimed to apply natural language processing to the diagnostic reports of all IBD patients in a population-based cohort, to test whether we could identify new disease characteristics within 6 months of diagnosis that predict disease outcome. We converted and cleaned diagnostic medical reports for text mining approaches. Predictive models were used to characterise clusters obtained by unsupervised classification of disease events with Latent Class Modelling.
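To illustrate the unsupervised classification step, the following is a minimal sketch of a latent class model fitted by expectation-maximisation. It assumes that each patient is summarised by binary indicators of disease events (for example flare, surgery, hospitalisation, medication change); this encoding, the function name `fit_latent_classes` and all parameter values are illustrative assumptions and not the implementation used in the study.

```python
import math
import random


def fit_latent_classes(data, n_classes, n_iter=200, seed=0):
    """Fit a latent class model (mixture of independent Bernoullis) by EM.

    data: list of equal-length lists of 0/1 event indicators, one row per
    patient (a hypothetical encoding chosen for this sketch).
    Returns (class_weights, item_probs, responsibilities).
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Random starting probabilities, kept away from 0 and 1.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior class membership for each patient.
        resp = []
        for row in data:
            log_post = []
            for k in range(n_classes):
                lp = math.log(weights[k])
                for x, p in zip(row, probs[k]):
                    lp += math.log(p if x else 1.0 - p)
                log_post.append(lp)
            top = max(log_post)  # subtract the maximum for numerical stability
            unnorm = [math.exp(lp - top) for lp in log_post]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update class weights and per-class event probabilities.
        for k in range(n_classes):
            mass = sum(r[k] for r in resp)
            weights[k] = max(mass / len(data), 1e-9)
            for j in range(n_items):
                hits = sum(r[k] * row[j] for r, row in zip(resp, data))
                # Light smoothing avoids log(0) in the next E-step.
                probs[k][j] = (hits + 1e-3) / (mass + 2e-3)
    return weights, probs, resp
```

Each patient can then be assigned to the class with the highest responsibility. In practice the number of classes would be chosen by comparing fits (for example with an information criterion) across several random starts, which this sketch omits.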
Detailed health records with complete clinical follow-up since diagnosis were available for 1142 newly diagnosed Crohn's disease patients within the population-based IBD South Limburg cohort. The mean follow-up was 101 months (SD 71 months). A total of 9760 imaging, endoscopy and pathology reports were digitalised and de-identified, using pattern matching rules to remove any protected health information. We successfully corrected miswritten or misrecognised words and compounds with algorithms specifically adapted to the particularities of the Dutch language. Three clusters with varying disease course were obtained, based on flares, surgery, hospitalisation and medication changes. Subsequent steps include the identification of consecutive groups of words, and their modality and/or negation, for relevant patient groups, with the aim of defining disease characteristics that predict disease course.
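As an illustration of de-identification by pattern matching, the sketch below applies a small set of regular-expression rules to a report. The rules, placeholder tags and the function name `deidentify` are hypothetical examples; the rule set actually used in the study is not described here, and a production rule set would be considerably larger and validated against manually annotated reports.

```python
import re

# Hypothetical pattern-matching rules, applied in order.
RULES = [
    # Dates such as 12-03-2009 or 1/2/09.
    (re.compile(r"\b\d{1,2}[-/.]\d{1,2}[-/.]\d{2,4}\b"), "[DATE]"),
    # Phone-like numbers starting with 0, e.g. 043-3871234.
    (re.compile(r"\b0\d{1,3}[- ]?\d{6,8}\b"), "[PHONE]"),
    # Any remaining 9-digit number, treated as an identifier.
    (re.compile(r"\b\d{9}\b"), "[ID]"),
    # A title followed by optional Dutch name particles and a capitalised name.
    (re.compile(r"\b(?i:dr|drs|prof|dhr|mevr|mw)\.?\s+"
                r"(?:(?i:van|de|den|der)\s+)*[A-Z][\w-]+"), "[NAME]"),
]


def deidentify(text):
    """Replace every match of each rule with its placeholder tag."""
    for pattern, tag in RULES:
        text = pattern.sub(tag, text)
    return text


report = "Coloscopie op 12-03-2009 door dr. Van den Berg, tel 043-3871234."
print(deidentify(report))
```

Rule-based de-identification of this kind is transparent and easy to audit, but it only removes what the rules anticipate, so manual review of a sample of reports remains advisable.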
This new approach enables the use of “hidden” information in generally available diagnostic medical reports for improved stratification of IBD patients.