Data Mining and Machine Learning Series

Combining machine learning and genomic approaches to predict RNA virus emergence

12th February 2020, 11:00
Liam Brieley
Department of Biostatistics, University of Liverpool

Abstract

Emerging infectious diseases remain a prominent threat to global health, as exemplified by recent outbreaks of Ebola virus and Zika virus. Traditional approaches to understanding risk factors and predicting emergence use simple ecological traits as predictors. However, modern RNA sequencing has generated a wealth of new, high-dimensional information in genetic sequences that has proven difficult to capture in predictive models.

I will discuss the application of various machine learning methods to identify and select genomic features for predicting RNA virus emergence. I use EID2, a large text-mined data resource at the University of Liverpool, to determine which viruses infect which hosts. I will show the predictive power of genome composition biases (over- or under-representation of certain sequence elements compared to expectation) in determining whether viruses infect key host groups, e.g. humans and domestic animals. I will then explore future fellowship aims concerning further methods of extracting signal from genetic sequence data.
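
As an illustration only, one common way to quantify a composition bias is the dinucleotide odds ratio: the observed frequency of a dinucleotide divided by the frequency expected from its two bases occurring independently. The sketch below assumes this measure; the abstract does not specify which sequence elements or statistics the talk uses, and the function name `dinucleotide_bias` is hypothetical.

```python
from collections import Counter


def dinucleotide_bias(seq):
    """Return observed/expected ratios for each dinucleotide present in seq.

    Illustrative assumption: expectation is the product of the two
    single-base frequencies. Values above 1 indicate over-representation,
    values below 1 under-representation.
    """
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two bases")

    # Single-base frequencies across the whole sequence.
    base_freq = {b: c / len(seq) for b, c in Counter(seq).items()}

    # Counts of overlapping dinucleotides (positions i, i+1).
    pair_counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total_pairs = len(seq) - 1

    return {
        pair: (count / total_pairs) / (base_freq[pair[0]] * base_freq[pair[1]])
        for pair, count in pair_counts.items()
    }


if __name__ == "__main__":
    # Toy RNA fragment, not real viral data.
    for pair, ratio in sorted(dinucleotide_bias("ACGUACGUCGCG").items()):
        print(pair, round(ratio, 2))
```

Ratios like these, computed per genome, could serve as numeric input features for a classifier of host group, which is the kind of use the abstract describes.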