To begin, load the R package 'caper' (see
Section 1.1.2 for installation instructions=
; Orme et al., in press):

library("caper")

Next download the data set and the phylogeny:

And then load the files into *R*:

primatedata <-read.table("Primatedata.txt", sep =3D "\t", header = =3D TRUE)primatetree <-read.nexus("consensusTree_10kTrees_Version2.nex")

Note that phylogenies in R cannot have spaces in the tip labels, instead= R uses underscores to separate Genus and species (Genus_species). The name= s of the species in the tree must match those in the data, therefore in thi= s dataset the spaces in species names have been replaced with underscores. = This is easy to do in your own data:

primatedata$Binomial<-gsub(" ", "_", primatedata$Binomial)

caper requires your phylogeny and your data to be in a special kind of R= object called a comparative data object. This will match the species names= in the tree to those in your data automatically so there is no need to wor= ry about data being ordered correctly etc. phy is your tree, data your data= set, and names.col is the name of the column containing your species names= .

primate <- comparative.data(phy =3D primatetree, data =3D primateda= ta, names.col =3D Binomial, vcv =3D TRUE, na.omit =3D FALSE, warn.dropped = =3D TRUE)

NOTE regarding the arguments to this function: vcv =3D TRUE st= ores a variance covariance matrix of your tree, which you will need for pgl= s. na.omit =3D FALSE stops the function from removing species wit= hout data for certain variables. warn.dropped =3D TRUE will tell = you if there are any species which are not in the tree and the dataset= and are therefore dropped from the data object.

If you drop species, you can see the list of dropped species by typing p= rimate$dropped. Make sure you check this list is what you expected, it may = reveal typos in your species names. If you want to turn this off use w= arn.dropped =3D FALSE. Note that the comparative.data function does all the= matching of data and tree for you. Note that we expect to drop some s= pecies here because not all the species in primatetree are in primatedata a= nd vice versa.

The function for PGLS analyses in caper is pgls. To fit a model of = log gestation length against log body mass which uses the maximum likelihoo= d estimate of lambda we use the following code:

=3D'ML')model.pgls<-pgls(log(GestationLen_d)~ log(AdultBodyMass_g), data = =3D primate, lambda

summary(model.pgls)

If you get an error related to optimization, you may need to adjust the = bounds on the search of the maximum likelihood space. Each of the parameter= s that can be fit are given bounds, the default values for the bounds are l= ambda: 1e-6 to 1, kappa: 1e-6 to 3 and delta: 1e-6 to 3. To change the boun= ds, use the "bounds" argument within your pgls model, for example:

model.pgls<-pgls(log(GestationLen_d)~ log(AdultBodyMass_g), data = =3D primate, lambda =3D"ML", bounds =3D list(lambda=3Dc(0.001,1), kappa=3Dc= (1e-6,3), delta=3Dc(1e-6,3)))

Note that here we have changed the lower bound of lambda from 1e-6 to 0.= 001. At present if you want to change the bounds for any one parameter you = have to list the bounds for each of the other parameters as well. Hence abo= ve we state the bounds for delta and kappa although we are not fitting thes= e parameters with ML.

The output should look like this:

Call:pgls(formula =3D log(GestationLen_d) ~ log(AdultBodyMass_g), data =3D = primate,lambda =3D "ML")Residuals:Min 1Q Median 3Q Max-0.098899 -0.011661 0.003082 0.017758 0.075133Branch length transformations:kappa [Fix] : 1.000lambda [ ML] : 0.892lower bound : 0.000, p =3D 1.1435e-14upper bound : 1.000, p =3D 0.0004639395.0% CI : (0.753, 0.967)delta [Fix] : 1.000Coefficients:Estimate Std. Error t value Pr(>|t|)(Intercept) 4.290229 0.160355 26.7546 < 2.2e-16 ***log(AdultBodyMass_g) 0.104864 0.019628 5.3426 9.479e-07 ***---Signif. codes: 0 =E2=80=98***=E2=80=99 0.001 =E2=80=98**=E2=80=99 0.0= 1 =E2=80=98*=E2=80=99 0.05 =E2=80=98.=E2=80=99 0.1 =E2=80=98 =E2=80=99 1Residual standard error: 0.0261 on 75 degrees of freedomMultiple R-squared: 0.2757, Adjusted R-squared: 0.266F-statistic: 28.54 on 2 and 75 DF, p-value: 6.062e-10

As well as the standard regression outputs, the output includes the estimat=
ed ML value of =CE=BB (0.892) and p values from likelihood ratio tests show=
ing whether the ML lambda is significantly different from 0 or 1 (see
5.3: Estimating Phylogenetic Signal - Pagel=E2=80=99s =CE=
=BB). In this case, the maximum likelihood estimate of =CE=BB is signif=
icantly different from both zero and 1; in other words, there is evidence f=
or phylogenetic signal, but the trait has not evolved as expected under a B=
rownian motion model of evolution (given the branch lengths used).

We can also plot the results, including a plot of the likelihood surface= of =CE=BB (which is well worth investigating - sometimes unexpected patter= ns are found, with peaks at 0 or 1!):

plot(log(GestationLen_d)~ log(AdultBodyMass_g), data =3D primatedata)<= /pre>abline(model.pgls) profile_lambda=3Dpgls.profile(model.pgls, which=3D"lambda") # vary lambda plot(profile_lambda)

The parameters =CE=BA and =CE=B4 are additional branch-length transforma= tions implemented in caper that can improve the fit of the data to the tree= (See 7.3: Common branch length tr= ansformations for more details). To optimise =CE=BA or =CE=B4= , use the arguments delta =3D "ML" or kappa =3D "ML" in= stead of lambda =3D "ML" in the code above. Note that optimi= sing more than one of these parameters at the same time is not advisable be= cause it would be nearly impossible to interpret the results! Also note tha= t for =CE=BA, you will need a different variance-covariance structure from = that used above, specifically a 3D vcv.array. To do so, use the vcv.dim arg= ument when constructing your comparative data object:

primate_3d <- comparative.data(phy =3D primatetree, data =3D primat= edata, names.col =3D Binomial, vcv.dim=3D3, vcv =3D TRUE, na.omit =3D FALSE= , warn.dropped =3D TRUE)

model.pgls.kappa<-pgls(log(GestationLen_d)~ log(AdultBodyMass_g), data = =3D primate_3d, kappa =3D"ML")

You should find that the maximum likelihood estimate of =CE=BA is 0.458,= and that it also is significantly different from zero (as predicted under = a speciational model of evolution) and one (as predicted under Brownian mot= ion using the given branch lengths). Let's check out the likelihood surface= :

profile_kappa=3Dpgls.profile(model.pgls.kappa, which=3D"kappa") # vary= kappa

plot(profile_kappa)

Outliers can seriously affect the parameter estimates for any regression= model and PGLS models are no exception. Some researchers recommend the&nbs= p;removal of outliers with studentized residuals >=C2=B13 (Jones and Pur= vis 1997). Note that caper does not at present have an automated outlier re= moval option (compare to crunch, section 7.2.2), = but it may in the near future. Check ?pgls for updates.

To identify the outliers in a PGLS model, first you need to extract the = phylogenetic residuals from the model:

res<- residuals(model.pgls, phylo =3D TRUE) #extracts phylogenetic= residuals from the pgls model

Next these residuals need to be standardized by dividing by their square= root of their variance:

res<- res/sqrt(var(res))[1] #standardises residuals by sqrt of thei= r variance

Finally the residuals can be matched to the species names and the outlie= r idenitified (in this case Colobus_polykomos):

rownames(res)<-rownames(model.pgls$residuals) #matches the resid= uals up with the species names

rownames(res)[(abs(res)>3)]#gives the names of the outliers

You can then remove these species from the data and tree and redo t= he analysis. Note that you may need to continue removing species until ther= e are no more outliers.

primate_nooutliers<-primate[-which(abs(res)>3),]model.pgls_nooutliers<-pgls(log(GestationLen_d)~ log(AdultBodyMass_= g), data =3D primate_nooutliers, lambda=3D'ML')summary(model.pgls_nooutliers)plot(model.pgls_nooutliers)abline(model.pgls_nooutliers)

NOTE: These are phylogenetic residuals so they detect "phylogenetic outl= iers". Thus outliers may not obviously appear to be outliers on a plot of t= he raw data.

It is generally worth checking model diagnostic plots whenever you fit a= model in R to check that your data meet the assumptions of linear modellin= g. The method for this is the same for OLS, independent contrasts and PGLS = models (though the graphs are slightly different):

par(mfrow=3Dc(2,2))#so you can see all 4 plots at once plot(model.pgls)

par(mfrow=3Dc(1,1))#to reset the graphic window to just one graph

Here are the model diagnostics for our model above:

Essentially what you are looking for in these plots are:

1) No studentised model residuals >=C2=B13. Any species with such lar= ge residuals should be removed. These outliers may overly influence the res= ults of the regression.

2) The points of the Q-Q plot should approximately form a straight line = (rather than a banana shape).

3 and 4) These should show a fairly random scattering of points. You wan= t to avoid any clear patterns.

It takes practice to know what is =E2=80=9Cgood=E2=80=9D, =E2=80=9Cbad= =E2=80=9D and =E2=80=9Cacceptable=E2=80=9D with these plots. These plots lo= ok OK, but there appear to be several outliers; it may be worth investigati= ng how these outliers affect the results.

Jones, K. E. & Purvis, A. 1997. An optimum body size for mammals? Compa= rative evidence from bats.Funct. Ecol.11: 751-756.

Orme, C. D. L., Freckleton, R. P., Thomas, G. H., Petzoldt, T., Fritz, = S. A. & Isaac, N. J. B. in press. caper: Comparative Analyses of Phylog= enetics and Evolution in R.Methods Ecol. Evol.