For a while, I’ve been interested in the threshold model in phylogenetics and the opportunity it allows for performing comparative methods analyses that combine both discrete and continuous characters. Here I’ve had a go at performing a simple analysis involving phenotypic trait reconstruction of Darwin’s finches.

The figure below summarises genetic and phenotypic evolution during the adaptive radiation of 13 species of Darwin’s finches. The heatmap depicts variation in 5 scaled, continuously-measured morphometric traits. Both the sequence data and morphometric data are distributed with the BEAST package which I also used to reconstruct the tree.

Often we hope to understand the evolutionary process by studying trait correlations. However, closely related lineages often share traits and trait combinations as a result of their shared evolutionary history. This non-independence violates the assumptions of established methods. Phylogenetic comparative methods allow us to assess whether traits tend to evolve together by assessing their correlation throughout evolution, or across the tree.

In the finch tree there are 13 tips which gives 12 independent *contrasts* (Felsenstein, 1985): the pair of branches connecting a parent node to its two child nodes. Continuous traits are reconstructed at internal nodes, often using Brownian motion, and the correlation in the evolution of continuous traits across contrasts is assessed through the precision matrix of the multivariate diffusion process.

The phylogenetic threshold model (Felsenstein 2005, 2012) assumes there is a quantitative character, called liability, underlying a discrete trait, that is unobserved and that determines the discrete character according to whether the liability exceeds a threshold value. Two commonly cited advantages of the threshold model are:

- Unlike the Markov-process model that is often used for reconstruction of discrete traits, the probability of changing state and the time since last change are not independent under the threshold model. A trait that has recently changed from state
*A*to state*B*is more likely to change back quickly, while the liability remains close to the threshold. For many traits this is an biologically appealing characteristic. - It allows for estimation of the evolutionary correlation between continuous and discrete traits or between discrete traits (this is the correlation of their liabilities)

I’ve looked at the finch data using the *threshBayes* function (from the *phytools* R package, Revell, 2012) and the maximum clade credibility tree from a BEAST analysis (shown above) to estimate the evolutionary correlations between the 5 continuously-measured morphometric traits shown in the heatmap above and the discrete diet trait. The estimated correlations are shown in to the left in the plot below. Currently *threshBayes* only allows binary discrete traits so I’ve classified diet into two categories: plant-based and insect-based instead of the four categories marked on the tree.

The strongest correlations between the continuous traits are between bill depth and gonys (lower edge of bill) length (0.98) and between wing and tarsus length (0.87). The morphometric traits are generally negatively correlated with an insectivorous diet (bill depth and gonys length being the strongest). This fits with our understanding that the different beak shapes and sizes of the finches are highly adapted to different food sources. Interestingly wing length and tarsus length differ quite strongly in their correlations with diet despite being quite highly correlated to each other.

The similar plot to the right above is taken from Figure 2 of Drummond *et al*. (2012) and shows the evolutionary correlations between the 5 continuous traits as estimated by a joint phylogenetic reconstruction and multivariate Brownian diffusion analysis performed using BEAST. Generally the correlations derived by the two methods are similar. Unlike the BEAST analysis, my *threshBayes* analysis was performed on a single tree ignoring phylogenetic uncertainty, a potential cause of differences between the analyses. Phylogenetic uncertainty could easily be accounted for by running *threshBayes* on multiple trees from a posterior sample.

I was also interested to compare the correlations estimated using the threshold model with the corresponding phylogenetically-naive coefficients estimated from only the tip information. These correlations between tip measurements are shown below in the correlation plot to the right. For the correlations between diet and continuous traits these are the square root of McFadden’s *R²* calculated from a logistic regression model. To the right, these phylogenetically naive correlations are plotted against the corresponding evolutionary correlations obtained using *threshBayes*.

While the correlations obtained with and without accounting for shared evolutionary history tend to be quite similar in this case, the potential to form incorrect inferences when not accounting for phylogeny can be seen in the differing orders of strengths of correlations obtained using the two approaches.

In a fairly recent paper, Cybis *et al*. (2015) describe a multivariate latent liability model implemented in BEAST that allows for simultaneous phylogenetic reconstruction and estimation of evolutionary correlations between continuous, binary, and both ordered or unordered multi-state discrete traits. I’ve not had a chance to figure out how to get it running in BEAST yet but I think it’s a really exciting technique that I’m hoping to have a play around with soon.

Some useful links:

Figures created using ggplot2, ggtree and corrplot