Sparse Bayesian variable selection for the identification of antigenic variability in the foot-and-mouth disease virus

Davies V, Reeve R, Harvey WT, Maree FF and Husmeier D (2016) Journal of Machine Learning Research Workshop and Conference Proceedings 33: 149-158


Vaccines created from closely related viruses are vital for offering protection against newly emerging strains. For foot-and-mouth disease virus (FMDV), where multiple serotypes co-circulate, testing large numbers of vaccines can be infeasible. Therefore, the development of an in silico predictor of cross-protection between strains is important to help optimise vaccine choice. Here we describe a novel sparse Bayesian variable selection model using spike and slab priors, which is able to predict antigenic variability and identify sites that are important for the neutralisation of the virus. We are able to identify multiple residues that are known to be key indicators of antigenic variability. Many of these were not identified previously using frequentist mixed-effects models and still cannot be found when an ℓ1 penalty is used. We further explore how the Markov chain Monte Carlo (MCMC) proposal method for the inclusion of variables can offer significant reductions in computational requirements, both for spike and slab priors in general and for our hierarchical Bayesian model in particular.
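For readers unfamiliar with spike and slab priors, the following is a minimal sketch of the general idea on synthetic data. It is not the hierarchical model or the MCMC proposal scheme of the paper: it assumes a plain linear regression with a point-mass spike at zero, a Gaussian slab, known noise variance, and a simple component-wise Gibbs sampler. The function name `spike_slab_gibbs`, the hyperparameters `pi`, `tau2` and `sigma2`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def spike_slab_gibbs(X, y, n_iter=1000, burn_in=200,
                     pi=0.2, tau2=1.0, sigma2=1.0, seed=0):
    """Posterior inclusion probabilities for a toy spike-and-slab regression.

    Assumed model (illustrative only):
      y = X @ beta + noise,  noise ~ N(0, sigma2)
      beta_j = 0                 with prior probability 1 - pi  (spike)
      beta_j ~ N(0, tau2)        with prior probability pi      (slab)
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = np.sum(X ** 2, axis=0)          # x_j' x_j for each column
    resid = np.asarray(y, dtype=float).copy()  # y - X @ beta (beta starts at 0)
    incl_counts = np.zeros(p)

    for it in range(n_iter):
        for j in range(p):
            # Remove variable j's current contribution from the residual.
            resid += X[:, j] * beta[j]
            # Conditional posterior of beta_j given inclusion: N(m, v).
            v = 1.0 / (col_ss[j] / sigma2 + 1.0 / tau2)
            m = v * (X[:, j] @ resid) / sigma2
            # Log posterior odds of inclusion with beta_j integrated out.
            log_odds = (np.log(pi / (1.0 - pi))
                        + 0.5 * np.log(v / tau2)
                        + 0.5 * m ** 2 / v)
            prob = 0.5 * (1.0 + np.tanh(0.5 * log_odds))  # stable sigmoid
            # Sample the inclusion indicator, then the coefficient.
            if rng.random() < prob:
                beta[j] = rng.normal(m, np.sqrt(v))
            else:
                beta[j] = 0.0
            resid -= X[:, j] * beta[j]
        if it >= burn_in:
            incl_counts += (beta != 0.0)

    return incl_counts / (n_iter - burn_in)


# Synthetic example: 10 candidate variables, only indices 0 and 3 are relevant.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 10))
true_beta = np.zeros(10)
true_beta[0], true_beta[3] = 2.0, -1.5
y = X @ true_beta + rng.standard_normal(100)

pip = spike_slab_gibbs(X, y)
print(np.round(pip, 2))
```

The returned vector estimates, for each variable, the posterior probability that it is included in the model. Sparsity arises because the spike places prior mass exactly at zero, in contrast to an ℓ1 penalty, which shrinks all coefficients continuously.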