During the course of an outbreak or epidemic, many viral pathogens are known to evolve rapidly, leaving an imprint of the pattern of spread in their genomes. Uncovering the molecular footprint of this transmission process is a key goal of phylodynamic inference. Comparatively less attention has been paid to the evolution of quantitative phenotypic traits of viruses. Traits such as geographical location or virulence can be studied using phylogenetic comparative methods (PCMs), which account for the shared evolutionary history among a set of non-independent samples. Conditional on such a history, the observed traits can be seen as the result of a stochastic process running along the branches of a phylogenetic tree. The Ornstein-Uhlenbeck (OU) process is often used to model stabilizing selection toward an optimal trait value. For a multivariate trait, the dynamics of the trajectory are controlled by a selection strength matrix that is only constrained to have positive eigenvalues. Depending on the form of this matrix, the OU process can exhibit a variety of behaviors, and is hence suited to modeling various biological processes.
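As a minimal illustration of how the selection strength matrix shapes the trajectory, the sketch below computes the expected trait value of a multivariate OU process along a single branch, using the standard result that the mean relaxes toward the optimum as exp(-A t) applied to the initial deviation. The function name `ou_expected_trait`, the example matrix, and the assumption that the matrix is diagonalizable are illustrative choices, not part of the framework described here.

```python
import numpy as np

def ou_expected_trait(x0, optimum, strength, t):
    """Expected trait after time t under a multivariate OU process.

    Hypothetical helper: assumes `strength` (the selection strength
    matrix A) is diagonalizable, so exp(-A t) can be formed from its
    eigendecomposition.
    """
    eigvals, eigvecs = np.linalg.eig(strength)
    # exp(-A t) = V diag(exp(-lambda_i t)) V^{-1}
    decay = eigvecs @ np.diag(np.exp(-eigvals * t)) @ np.linalg.inv(eigvecs)
    # The deviation from the optimum shrinks when all eigenvalues are positive.
    return optimum + np.real(decay @ (x0 - optimum))

# Illustrative, non-symmetric matrix with positive eigenvalues (1.0 and 0.5).
A = np.array([[1.0, 2.0],
              [0.0, 0.5]])
x0 = np.array([2.0, -1.0])       # trait value at the start of the branch
optimum = np.array([0.0, 1.0])   # optimal trait value

for t in (0.0, 1.0, 5.0):
    print(t, ou_expected_trait(x0, optimum, A, t))
```

Because the example matrix is not symmetric, the two trait dimensions do not relax independently: the off-diagonal term couples the first trait to the deviation of the second.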
We propose a Bayesian inference framework for the study of this flexible model. When using a Markov chain Monte Carlo (MCMC) method, one critical requirement is the ability to sample uniformly in the space of constrained matrices, for both the selection strength and the variance matrix, in a context where traditional Gibbs sampling cannot be used. This can be done using a smooth transformation that maps the parameters to an unconstrained space. We investigated the use of two such maps, along with suitable prior distributions. MCMC methods also require the likelihood to be evaluated at each step of the chain. Exploiting the tree structure, we studied a fast and flexible algorithm to compute both the likelihood and its gradient for a wide class of processes that includes, but is not limited to, the OU process. This makes it possible to use efficient sampling methods, such as Hamiltonian Monte Carlo (HMC).
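To make the idea of a smooth map to an unconstrained space concrete, here is a minimal sketch for a variance matrix, using a Cholesky factor with a log-transformed diagonal. This is one common parameterization chosen for illustration; it is an assumption and not necessarily either of the two maps investigated here. The function name `unconstrained_to_spd` is hypothetical.

```python
import numpy as np

def unconstrained_to_spd(theta, dim):
    """Map an unconstrained vector to a symmetric positive definite matrix.

    Hypothetical helper: theta has dim * (dim + 1) / 2 real entries that
    fill the lower triangle of a Cholesky factor L in row-major order.
    """
    L = np.zeros((dim, dim))
    L[np.tril_indices(dim)] = theta
    # Exponentiate the diagonal so that it is strictly positive,
    # which makes L invertible and L @ L.T positive definite.
    diag = np.diag_indices(dim)
    L[diag] = np.exp(L[diag])
    return L @ L.T

# Any real vector of the right length gives a valid variance matrix.
theta = np.array([0.3, -1.2, 0.0, 2.0, 0.5, -0.7])
sigma = unconstrained_to_spd(theta, 3)
print(sigma)
print(np.linalg.eigvalsh(sigma))  # all eigenvalues are positive
```

A sampler such as HMC can then move freely over `theta`, provided the density is adjusted by the Jacobian of the transformation.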
We implemented the new framework in BEAST, a widely used and flexible phylodynamics software package. This allows us to leverage the many other tools of the BEAST ecosystem, such as the phylogenetic factor model, which can be used to model extra-environmental variation, or marginal likelihood estimation for model selection. It also makes it possible to integrate the results over the space of all probable trees, in a joint analysis that starts directly from the genomic sequences instead of relying on a fixed tree. We illustrate the use of this framework by studying the heritability of virulence of the human immunodeficiency virus (HIV), a question that has attracted considerable attention recently, and for which model choice is recognized as a critical aspect.