
caper::comparative.data() complaining about rownames: solution

I was using caper::comparative.data() just now to prep some data for PGLS analysis. It was giving me an odd error message:

Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length

In addition: Warning message:

Setting row names on a tibble is deprecated.

I couldn't understand what was going on: the function doesn't even ask for row names. And what's this warning from tidyverse about row names on tibbles?

After digging into the source code for this function and having a chat with my good friend and colleague Dr Hannah Haynie, we figured it out: the data frame I was feeding it was also a tibble. The tibble message, though only a warning, threw a spanner in the works for how the caper function operates internally, so the warning ended up turning into an error.

The solution is to convert the tibble into a plain data frame with as.data.frame() before passing it to comparative.data().
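As a minimal sketch of that fix, assuming the ape, caper and tibble packages are installed: the tree, the trait values and the object names (traits_tbl, traits_df) are made up for illustration and are not from my actual analysis.

```r
library(ape)
library(caper)
library(tibble)

tree <- rtree(10)

# Made-up trait data stored as a tibble, as tidyverse functions often return it.
traits_tbl <- tibble(
  tip.label = tree$tip.label,
  var1 = rep(c("a", "b"), times = 5)
)

# Drop the tibble classes so that only "data.frame" remains.
traits_df <- as.data.frame(traits_tbl)
class(traits_df)

# Hand the plain data frame to caper.
ds <- comparative.data(tree, traits_df, names.col = tip.label)
```

If you build your data with dplyr pipelines, it is probably easiest to put as.data.frame() as the last step of the pipe.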

#rstats #caper #phylo.d

looping over caper::phylo.d()

If you're using caper::phylo.d() to estimate the phylogenetic signal in a series of traits, you may want to use a for-loop, map() or apply() to run it over the whole set of traits instead of one at a time. However, if you've ever tried this, you may have encountered a problem with the argument binvar: it cannot interpret what it is fed from a for-loop, apply() etc. as the correct kind of input. You'll get one of these two types of error:

Error in caper::phylo.d(compdat, ...) : 'var' is not a variable in data.

Error in caper::phylo.d(data = compdat, binvar = "var") :

'"var"' is not a variable in data.

However! Don't worry! Help is at hand!

As was noticed by the GitHub user MaxKerney, this has to do with the following line in the phylo.d() function:

binvar <- deparse(substitute(binvar))

I struggled with how to go about this, so I contacted the package maintainer, David Orme, who was kind enough to join the GitHub thread where some of us with this problem were hanging out and offer a solution. The solution wraps the phylo.d() call in eval() and substitute(), so that what reaches the binvar argument is the bare variable name rather than the name of the loop variable.

Here is an example of the solution in use, with a small random tree generated by rtree() and some made-up data. What you'll get is a data frame with the D estimate, Pval1 and Pval0 of each trait in a separate row. The example was made by Orme.
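To see why the loop fails and why the wrapper helps, here is a rough sketch in base R only. toy_d() is a hypothetical stand-in I made up; it is not caper code, it just copies the deparse(substitute()) pattern from the line quoted above.

```r
# Hypothetical stand-in for phylo.d(): it captures the *expression* typed for
# binvar rather than the value that expression holds.
toy_d <- function(data, binvar) {
  binvar <- deparse(substitute(binvar))
  if (!binvar %in% names(data)) {
    stop("'", binvar, "' is not a variable in data.")
  }
  binvar
}

dat <- data.frame(var1 = c(0, 1, 1), var2 = c(1, 0, 1))

# Works: the bare column name is captured as "var1".
toy_d(dat, var1)

# Fails: the function sees the symbol `v`, not the string stored in it.
v <- "var1"
loop_error <- tryCatch(toy_d(dat, v), error = function(e) conditionMessage(e))

# Fix: build the call with the column name spliced in as a symbol, then run it.
fixed <- character(0)
for (v in names(dat)) {
  res <- eval(substitute(toy_d(dat, this_var), list(this_var = as.name(v))))
  fixed <- c(fixed, res)
}
```

The substitute() call swaps the placeholder this_var for the actual column name before the function ever runs, so from the function's point of view you typed the name by hand.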

library(ape)
library(caper)
library(tidyverse)

tree <- rtree(10)

df <- tree$tip.label %>%
  as.data.frame() %>%
  rename(tip.label = ".") %>%
  mutate(
    var1 = c("a", "b", "b", "b", "b", "b", "b", "b", "b", "b"),
    var2 = c("a", "b", "b", "b", "b", "b", "b", "b", "b", "a")
  )

# Build the comparative dataset once
ds <- comparative.data(tree, df, names.col = tip.label)
vars <- colnames(ds$data)

# Create the rows all at once, to avoid rbind (real performance hit in
# larger examples), although it does mean having to loop over indices, not names, below.
nvar <- length(vars)
result_df <- tibble(Feature = character(nvar), Destimate = numeric(nvar), Pval1 = numeric(nvar), Pval0 = numeric(nvar))

for (idx in seq_along(vars)) {
  var <- vars[idx]
  output <- eval(substitute(phylo.d(data = ds, binvar = this_var), list(this_var = as.name(var))))
  result_df[idx, 1] <- var
  result_df[idx, 2] <- output$DEstimate
  result_df[idx, 3] <- output$Pval1