Using Electronic Health Records and Machine Learning to Screen for Noonan Syndrome
Principal Investigator: Jing Chen, PhD
Division of Biomedical Informatics
The goal of this project is to develop a machine-learning method that uses electronic health records (EHRs) to screen patients for rare genetic diseases. We previously developed a computational method, Genetic Disease Diagnosis based on Phenotypes (GDDP), that uses phenotypes to prioritize patients who may have genetic diseases. However, GDDP depends heavily on the accuracy of the phenotypic annotations in the disease reference database and does not take into account the real clinical records of specific genetic disorders in an EHR system. We hypothesize that a machine-learning method that targets a specific genetic disorder and is trained on EHR data can identify patients with that disorder more accurately than GDDP.
To test this hypothesis, we are using Noonan syndrome (NS) as the case example. In the preliminary study, we evaluated a set of deep learning models to identify patients at high risk for NS. A convolutional neural network (CNN)-based model achieved the best performance and substantially outperformed GDDP. However, this evaluation was based on blinded cases. To test our approach for identifying unknown patients, and to further improve the machine-learning method, we will: (1) evaluate the model on unknown patients and apply it to the 96,000 existing patients in the Cincinnati Children's Hospital Medical Center Discover Together Biobank, after which clinicians will review about 100 high-risk patients, exclude those with a genetically confirmed diagnosis, and follow up with the remaining patients through molecular testing to confirm an NS diagnosis; and (2) improve the granularity of NS classification by developing multiclass classification models that classify not only NS, but also a group of genetic disorders that are similar to NS. We anticipate that the machine-learning method will outperform GDDP and that we will identify patients with undiagnosed NS, confirmed by molecular testing. Successful completion of our study will demonstrate the validity of our approach and exemplify the utility of combining this machine-learning method with EHR data for the diagnosis of rare genetic diseases. In future studies, we will also develop more generalizable methods and implement the validated methods in the EHR system.