Use your own spatial method in Voyager

Introduction

There are multiple different ways to do certain things. Each of the different ways have pros and cons, and sometimes can tell somewhat different stories. Often the different ways come with different syntaxes, increasing the learning curve for users. Voyager took inspiration from caret and tidymodels (Kuhn2020-fm?) for machine learning, foreach, future, and BiocParallel for parallel processing with different backends, bluster for different clustering algorithms, and BiocNeighbors for different algorithms to find nearest neighbors. These packages provide uniform user interfaces to different methods to achieve a given goal.

In caret and tidymodels, users can make the uniform user interface fit custom models not included in the package to eliminate a lot of duplicate code. In Voyager, this is done with the SFEMethod S4 class. This vignette shows how to use the SFEMethod class to use Voyager’s uniform user interface on you own custom methods.

Here we load the packages used:

library(Voyager)
#> Loading required package: SpatialFeatureExperiment
#> 
#> Attaching package: 'SpatialFeatureExperiment'
#> The following object is masked from 'package:base':
#> 
#>     scale
library(spdep)
#> Loading required package: spData
#> To access larger datasets in this package, install the spDataLarge
#> package with: `install.packages('spDataLarge',
#> repos='https://nowosad.github.io/drat/', type='source')`
#> Loading required package: sf
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE

Voyager categorizes exploratory spatial data analysis (ESDA) methods by the number of variables and whether the method gives one result for the entire dataset (global) or gives results at each location (local). While the process to create an SFEMethod object is mostly the same across categories, each category has specific arguments.

Also, before you make your own SFEMethod object, see if the method of interest is already in Voyager. The methods can be listed with the listSFEMethods() function.

When calling calculate*variate() or run*variate(), the type (2nd) argument takes either an SFEMethod object or a string that matches an entry in the name column in the data frame returned by listSFEMethods() so Voyager will search for an S4 object with name matching the string.

Univariate

Global

These are the univariate global methods in Voyager:

listSFEMethods("uni", "global")
#>              name                                           description
#> 1           moran                                             Moran's I
#> 2           geary                                             Geary's C
#> 3        moran.mc                    Moran's I with permutation testing
#> 4        geary.mc                    Geary's C with permutation testing
#> 5    sp.mantel.mc Mantel-Hubert spatial general cross product statistic
#> 6      moran.test                                        Moran's I test
#> 7      geary.test                                        Geary's C test
#> 8    globalG.test                                         Global G test
#> 9  sp.correlogram                                           Correlogram
#> 10      variogram                                  Variogram with model
#> 11  variogram_map                                         Variogram map

This is the code used to create the SFEMethod object to run Moran’s I, with the SFEMethod() constructor:

moran <- SFEMethod(
    name = "moran", title = "Moran's I", package = "spdep", 
    variate = "uni", scope = "global",
    fun = function(x, listw, zero.policy = NULL)
        spdep::moran(x, listw, n = length(listw$neighbours), S0 = spdep::Szero(listw),
                     zero.policy = zero.policy),
    use_graph = TRUE,
    reorganize_fun = .moran2df
)

The package argument is used to check if the package is installed when the method is run. The function to run the method is in the fun argument. All univariate methods that use a spatial neighborhood graph (use_graph = TRUE) must have arguments:

x for a vector as input
listw for the spatial neighborhood graph as a listw object, and
zero.policy for what to do when there are cells or spots that don’t have spatial neighbors. See spdep documentation (e.g. spdep::moran()) for how the zero.policy argument behaves.

In this case I wrote a think wrapper to fill in the confusing arguments that may confuse the users.

If the function running the method from another package has different arguments, then write a thin wrapper to make these required arguments. Extra arguments can be passed to fun through ....

The reorganize_fun argument takes a function to reorganize the output from fun to form a DataFrame so the results for genes can be added to rowData(sfe). Here for Moran’s I, the function is

.moran2df <- function(out, name, ...) {
    rns <- names(out)
    out <- lapply(out, unlist, use.names = TRUE)
    out <- Reduce(rbind, out)
    if (!is.matrix(out)) out <- t(as.matrix(out))
    rownames(out) <- rns
    out <- DataFrame(out)
    names(out)[1] <- name
    out
}

For univariate or bivariate global methods, the function must have:

out argument to take in the output from fun for multiple genes or features
name to take the name under which the results are stored in case the same method is run for the same genes but with different parameters and you don’t want to overwrite previous results. The name is the name specified in the SFEMethod() constructor by default, but can be set by the user when calling calculate*variate() or run*variate(), and
... because the reorganize_fun of some univariate global methods in Voyager, such as sp.correlogram, needs other arguments.

Some spatial methods use spatial distances rather than graphs, such as the variogram. This is the code used to create the SFEMethod object for variogram:

variogram <- SFEMethod(package = "automap", variate = "uni", scope = "global",
                       default_attr = NA, name = "variogram", title = "Variogram",
                       fun = .variogram,
                       reorganize_fun = .other2df,
                       use_graph = FALSE)

The function in fun for univariate methods that don’t use a spatial neighborhood graph must have arguments x and coords_df (sf data frame for the spatial coordinates) and other arguments are allowed. This is the .variogram function:

.variogram <- function(x, coords_df, formula = x ~ 1, scale = TRUE, ...) {
    coords_df$x <- x
    if (scale) coords_df$x <- scale(coords_df$x)
    dots <- list(...)
    # Deal with alpha myself and fit a global variogram to avoid further gstat warnings
    have_alpha <- "alpha" %in% names(dots)
    if (have_alpha) {
        empirical <- gstat::variogram(formula, data = coords_df, alpha = dots$alpha)
        dots$alpha <- NULL
    }
    out <- do.call(automap::autofitVariogram,
                   c(list(formula = formula, input_data = coords_df,
                          map = FALSE, cloud = FALSE), dots))
    if (have_alpha) {
        out$exp_var <- empirical
    }
    out
}

The rule for reorganize_fun remains the same, and this is the .other2df function:

.other2df <- function(out, name, ...) {
    if (!is.atomic(out)) out <- I(out)
    out_df <- DataFrame(res = out)
    names(out_df) <- name
    rownames(out_df) <- names(out)
    out_df
}

Local

These are the univariate local methods in Voyager:

listSFEMethods("uni", "local")
#>               name                                          description
#> 1       localmoran                                      Local Moran's I
#> 2  localmoran_perm                  Local Moran's I permutation testing
#> 3           localC                                      Local Geary's C
#> 4      localC_perm                  Local Geary's C permutation testing
#> 5           localG                                      Getis-Ord Gi(*)
#> 6      localG_perm             Getis-Ord Gi(*) with permutation testing
#> 7             LOSH                     Local spatial heteroscedasticity
#> 8          LOSH.mc Local spatial heteroscedasticity permutation testing
#> 9          LOSH.cs     Local spatial heteroscedasticity Chi-square test
#> 10      moran.plot                                   Moran scatter plot

This is the code used to create the SFEMethod object for localmoran:

localmoran <- SFEMethod(
    name = "localmoran", title = "Local Moran's I",
    package = "spdep", scope = "local", default_attr = "Ii",
    fun = spdep::localmoran,
    use_graph = TRUE,
    reorganize_fun = .localmoran2df
)

spdep::localmoran already has the right arguments, including x, listw, and zero.policy.

For local methods, the title and default_attr arguments are important, as they are used in plotLocalResults() for the plot title. Many local methods return a matrix or data frame as the results for each gene, and default_attr specifies the column to use by default when plotting, such as the local Moran’s I values (Ii) in this case. Other fields the results can be p-values and adjusted p-values.

The reorganize_fun is different from those of univariate global methods because the local results are organized differently. Here is the .localmoran2df function:

.localmoran2df <- function(out, nb, p.adjust.method) {
    lapply(out, function(o) {
        o1 <- as.data.frame(o)
        quadr <- attr(o, "quadr")
        I(.add_log_p(cbind(o1, quadr), nb, p.adjust.method))
    })
}

The function must have arguments:

out for results from fun for all genes, a list each element of which is the results for one gene.
nb for a neighbor object of class nb, which is part of the listw object for spatial neighborhood graphs. This is used to correct for multiple hypothesis testing with p.adjustSP()
p.adjust.method to specify a method to correct for multiple testing. See p.adjust() for available methods.

The output is a list of organized results, each element for one gene, which will then be converted to a DataFrame and added to localResults(sfe).

Bivariate

These are the bivariate global methods in Voyager:

listSFEMethods("bi", "global")
#>                  name                                     description
#> 1                 lee                       Lee's bivariate statistic
#> 2              lee.mc Lee's bivariate static with permutation testing
#> 3            lee.test                                    Lee's L test
#> 4     cross_variogram                                 Cross variogram
#> 5 cross_variogram_map                             Cross variogram map

These are the bivariate local methods in Voyager:

listSFEMethods("bi", "local")
#>            name                     description
#> 1      locallee Local Lee's bivariate statistic
#> 2 localmoran_bv       Local bivariate Moran's I

The SFEMethod construction for bivariate methods is similar to that of univariate methods, except that the function in fun must have argument y after x. This is the code used to create the SFEMethod object for lee, Lee’s L:

lee <- SFEMethod(name = "lee", fun = .lee_mat, title = "Lee's bivariate statistic",
                 reorganize_fun = function(out, name, ...) out,
                 package = "Voyager", variate = "bi", scope = "global",
                 use_matrix = TRUE)

Note the use_matrix argument, which specific to bivariate methods. It means whether the method can take a matrix as argument and compute the statistic for all pairwise combinations of the matrix’s rows. This way the computation can be expressed as matrix operations which is much more efficient than R loops because the loops are pushed to the underlying C and Fortran code in BLAS and the Matrix package for sparse matrices.

Here’s the .lee_mat function:

.lee_mat <- function(x, y = NULL, listw, zero.policy = TRUE, ...) {
    # X has genes in rows
    if (is(listw, "listw"))
        W <- listw2sparse(listw)
    else W <- listw
    x <- .scale_n(x)
    if (!is.null(y)) {
        y <- .scale_n(y)
    } else y <- x
    n <- ncol(x) # dimension of y is checked in calculateBivariate
    out <- x %*% (t(W) %*% W) %*% t(y)/sum(rowSums(W)^2) * n
    if (all(dim(out) == 1L)) out <- out[1,1]
    out
}

Due to the matrix operation, listw can be a sparse or dense adjacency matrix of the spatial neighborhood graph. To conform to scRNA-seq conventions, x and y have genes in rows if they are matrices.

The reorganize_fun for bivariate global methods don’t have to return a DataFrame, because bivariate global results can’t be stored in the SFE object. However, reorganize_fun for bivariate local methods follow the same rules as that of univariate local methods because the results also go into localResults(sfe).

Multivariate

These are the multivariate methods in Voyager:

listSFEMethods("multi")
#>                name                                      description
#> 1        multispati                                   MULTISPATI PCA
#> 2      localC_multi                     Multivariate local Geary's C
#> 3 localC_perm_multi Multivariate local Geary's C permutation testing

The SFEMethod construction for bivariate methods is similar to that of univariate methods, except for two arguments: joint to indicate whether it makes sense to run the method for multiple samples jointly just like in non-spatial PCA, and dest to indicate whether the results should go into reducedDims(sfe) or colData(sfe). This is the code for a multivariate generalization of local Geary’s C (Anselin1995-ra?) with permutation testing:

.localC_multi_fun <- function(perm = FALSE) {
    function(x, listw, ..., zero.policy) {
        x <- as.matrix(x)
        fun <- if (perm) spdep::localC_perm else spdep::localC
        fun(x, listw = listw, zero.policy = zero.policy, ...)
    }
}

.localCpermmulti2df <- function(out, nb, p.adjust.method) {
    .attrmat2df(list(out), "pseudo-p", "localC_perm_multi", nb, p.adjust.method)[[1]]
}

localC_perm_multi <- SFEMethod(
    name = "localC_perm_multi", title = "Multivariate local Geary's C permutation testing",
    package = "spdep", variate = "multi", default_attr = "localC",
    fun = .localC_multi_fun(TRUE),
    reorganize_fun = .localCpermmulti2df,
    dest = "colData"
)

Here the results, as a single vector, goes into colData(sfe), and it does not make sense to run it across multiple samples jointly as each sample has a separate spatial neighborhood graph, so it will be run on each sample separately.

The function in reorganize_fun should return a vector, matrix, or data frame ready to be added to reducedDims(sfe) or colData(sfe). If the results can go into colData, the rules for arguments are the same as those for univariate local methods, because with permutation testing for multivariate local Geary’s C, multiple testing correction is performed in the reorganize_fun. If the results go into reducedDims, then it only needs one argument for the output.

Lambda Moses