Use your own spatial method in Voyager
Lambda Moses
2024-11-23
Source:vignettes/sfemethod.Rmd
      sfemethod.RmdIntroduction
There are multiple different ways to do certain things. Each of the
different ways have pros and cons, and sometimes can tell somewhat
different stories. Often the different ways come with different
syntaxes, increasing the learning curve for users. Voyager
took inspiration from caret and tidymodels
(Kuhn2020-fm?) for
machine learning, foreach, future, and
BiocParallel for parallel processing with different
backends, bluster for different clustering algorithms, and
BiocNeighbors for different algorithms to find nearest
neighbors. These packages provide uniform user interfaces to different
methods to achieve a given goal.
In caret and tidymodels, users can make the
uniform user interface fit custom models not included in the package to
eliminate a lot of duplicate code. In Voyager, this is done
with the SFEMethod S4 class. This vignette shows how to use
the SFEMethod class to use Voyager’s uniform user interface
on you own custom methods.
Here we load the packages used:
library(Voyager)
#> Loading required package: SpatialFeatureExperiment
#> 
#> Attaching package: 'SpatialFeatureExperiment'
#> The following object is masked from 'package:base':
#> 
#>     scale
library(spdep)
#> Loading required package: spData
#> To access larger datasets in this package, install the spDataLarge
#> package with: `install.packages('spDataLarge',
#> repos='https://nowosad.github.io/drat/', type='source')`
#> Loading required package: sf
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUEVoyager categorizes exploratory spatial data analysis
(ESDA) methods by the number of variables and whether the method gives
one result for the entire dataset (global) or gives results at each
location (local). While the process to create an SFEMethod
object is mostly the same across categories, each category has specific
arguments.
Also, before you make your own SFEMethod object, see if
the method of interest is already in Voyager. The methods
can be listed with the listSFEMethods() function.
When calling calculate*variate() or
run*variate(), the type (2nd) argument takes
either an SFEMethod object or a string that matches an
entry in the name column in the data frame returned by
listSFEMethods() so Voyager will search for an
S4 object with name matching the string.
Univariate
Global
These are the univariate global methods in Voyager:
listSFEMethods("uni", "global")
#>              name                                           description
#> 1           moran                                             Moran's I
#> 2           geary                                             Geary's C
#> 3        moran.mc                    Moran's I with permutation testing
#> 4        geary.mc                    Geary's C with permutation testing
#> 5    sp.mantel.mc Mantel-Hubert spatial general cross product statistic
#> 6      moran.test                                        Moran's I test
#> 7      geary.test                                        Geary's C test
#> 8    globalG.test                                         Global G test
#> 9  sp.correlogram                                           Correlogram
#> 10      variogram                                  Variogram with model
#> 11  variogram_map                                         Variogram mapThis is the code used to create the SFEMethod object to
run Moran’s I, with the SFEMethod() constructor:
moran <- SFEMethod(
    name = "moran", title = "Moran's I", package = "spdep", 
    variate = "uni", scope = "global",
    fun = function(x, listw, zero.policy = NULL)
        spdep::moran(x, listw, n = length(listw$neighbours), S0 = spdep::Szero(listw),
                     zero.policy = zero.policy),
    use_graph = TRUE,
    reorganize_fun = .moran2df
)The package argument is used to check if the package is
installed when the method is run. The function to run the method is in
the fun argument. All univariate methods that use a spatial
neighborhood graph (use_graph = TRUE) must have
arguments:
- 
xfor a vector as input
- 
listwfor the spatial neighborhood graph as alistwobject, and
- 
zero.policyfor what to do when there are cells or spots that don’t have spatial neighbors. Seespdepdocumentation (e.g.spdep::moran()) for how thezero.policyargument behaves.
In this case I wrote a think wrapper to fill in the confusing arguments that may confuse the users.
If the function running the method from another package has different
arguments, then write a thin wrapper to make these required arguments.
Extra arguments can be passed to fun through
....
The reorganize_fun argument takes a function to
reorganize the output from fun to form a
DataFrame so the results for genes can be added to
rowData(sfe). Here for Moran’s I, the function is
.moran2df <- function(out, name, ...) {
    rns <- names(out)
    out <- lapply(out, unlist, use.names = TRUE)
    out <- Reduce(rbind, out)
    if (!is.matrix(out)) out <- t(as.matrix(out))
    rownames(out) <- rns
    out <- DataFrame(out)
    names(out)[1] <- name
    out
}For univariate or bivariate global methods, the function must have:
- 
outargument to take in the output fromfunfor multiple genes or features
- 
nameto take the name under which the results are stored in case the same method is run for the same genes but with different parameters and you don’t want to overwrite previous results. Thenameis the name specified in theSFEMethod()constructor by default, but can be set by the user when callingcalculate*variate()orrun*variate(), and
- 
...because thereorganize_funof some univariate global methods in Voyager, such assp.correlogram, needs other arguments.
Some spatial methods use spatial distances rather than graphs, such
as the variogram. This is the code used to create the
SFEMethod object for variogram:
variogram <- SFEMethod(package = "automap", variate = "uni", scope = "global",
                       default_attr = NA, name = "variogram", title = "Variogram",
                       fun = .variogram,
                       reorganize_fun = .other2df,
                       use_graph = FALSE)The function in fun for univariate methods that don’t
use a spatial neighborhood graph must have arguments x and
coords_df (sf data frame for the spatial
coordinates) and other arguments are allowed. This is the
.variogram function:
.variogram <- function(x, coords_df, formula = x ~ 1, scale = TRUE, ...) {
    coords_df$x <- x
    if (scale) coords_df$x <- scale(coords_df$x)
    dots <- list(...)
    # Deal with alpha myself and fit a global variogram to avoid further gstat warnings
    have_alpha <- "alpha" %in% names(dots)
    if (have_alpha) {
        empirical <- gstat::variogram(formula, data = coords_df, alpha = dots$alpha)
        dots$alpha <- NULL
    }
    out <- do.call(automap::autofitVariogram,
                   c(list(formula = formula, input_data = coords_df,
                          map = FALSE, cloud = FALSE), dots))
    if (have_alpha) {
        out$exp_var <- empirical
    }
    out
}The rule for reorganize_fun remains the same, and this
is the .other2df function:
Local
These are the univariate local methods in Voyager:
listSFEMethods("uni", "local")
#>               name                                          description
#> 1       localmoran                                      Local Moran's I
#> 2  localmoran_perm                  Local Moran's I permutation testing
#> 3           localC                                      Local Geary's C
#> 4      localC_perm                  Local Geary's C permutation testing
#> 5           localG                                      Getis-Ord Gi(*)
#> 6      localG_perm             Getis-Ord Gi(*) with permutation testing
#> 7             LOSH                     Local spatial heteroscedasticity
#> 8          LOSH.mc Local spatial heteroscedasticity permutation testing
#> 9          LOSH.cs     Local spatial heteroscedasticity Chi-square test
#> 10      moran.plot                                   Moran scatter plotThis is the code used to create the SFEMethod object for
localmoran:
localmoran <- SFEMethod(
    name = "localmoran", title = "Local Moran's I",
    package = "spdep", scope = "local", default_attr = "Ii",
    fun = spdep::localmoran,
    use_graph = TRUE,
    reorganize_fun = .localmoran2df
)spdep::localmoran already has the right arguments,
including x, listw, and
zero.policy.
For local methods, the title and
default_attr arguments are important, as they are used in
plotLocalResults() for the plot title. Many local methods
return a matrix or data frame as the results for each gene, and
default_attr specifies the column to use by default when
plotting, such as the local Moran’s I values (Ii) in this
case. Other fields the results can be p-values and adjusted
p-values.
The reorganize_fun is different from those of univariate
global methods because the local results are organized differently. Here
is the .localmoran2df function:
.localmoran2df <- function(out, nb, p.adjust.method) {
    lapply(out, function(o) {
        o1 <- as.data.frame(o)
        quadr <- attr(o, "quadr")
        I(.add_log_p(cbind(o1, quadr), nb, p.adjust.method))
    })
}The function must have arguments:
- 
outfor results fromfunfor all genes, a list each element of which is the results for one gene.
- 
nbfor a neighbor object of classnb, which is part of thelistwobject for spatial neighborhood graphs. This is used to correct for multiple hypothesis testing withp.adjustSP()
- 
p.adjust.methodto specify a method to correct for multiple testing. Seep.adjust()for available methods.
The output is a list of organized results, each element for one gene,
which will then be converted to a DataFrame and added to
localResults(sfe).
Bivariate
These are the bivariate global methods in Voyager:
listSFEMethods("bi", "global")
#>                  name                                     description
#> 1                 lee                       Lee's bivariate statistic
#> 2              lee.mc Lee's bivariate static with permutation testing
#> 3            lee.test                                    Lee's L test
#> 4     cross_variogram                                 Cross variogram
#> 5 cross_variogram_map                             Cross variogram mapThese are the bivariate local methods in Voyager:
listSFEMethods("bi", "local")
#>            name                     description
#> 1      locallee Local Lee's bivariate statistic
#> 2 localmoran_bv       Local bivariate Moran's IThe SFEMethod construction for bivariate methods is
similar to that of univariate methods, except that the function in
fun must have argument y after x.
This is the code used to create the SFEMethod object for
lee, Lee’s L:
lee <- SFEMethod(name = "lee", fun = .lee_mat, title = "Lee's bivariate statistic",
                 reorganize_fun = function(out, name, ...) out,
                 package = "Voyager", variate = "bi", scope = "global",
                 use_matrix = TRUE)Note the use_matrix argument, which specific to
bivariate methods. It means whether the method can take a matrix as
argument and compute the statistic for all pairwise combinations of the
matrix’s rows. This way the computation can be expressed as matrix
operations which is much more efficient than R loops because the loops
are pushed to the underlying C and Fortran code in BLAS and the
Matrix package for sparse matrices.
Here’s the .lee_mat function:
.lee_mat <- function(x, y = NULL, listw, zero.policy = TRUE, ...) {
    # X has genes in rows
    if (is(listw, "listw"))
        W <- listw2sparse(listw)
    else W <- listw
    x <- .scale_n(x)
    if (!is.null(y)) {
        y <- .scale_n(y)
    } else y <- x
    n <- ncol(x) # dimension of y is checked in calculateBivariate
    out <- x %*% (t(W) %*% W) %*% t(y)/sum(rowSums(W)^2) * n
    if (all(dim(out) == 1L)) out <- out[1,1]
    out
}Due to the matrix operation, listw can be a sparse or
dense adjacency matrix of the spatial neighborhood graph. To conform to
scRNA-seq conventions, x and y have genes in
rows if they are matrices.
The reorganize_fun for bivariate global methods don’t
have to return a DataFrame, because bivariate global
results can’t be stored in the SFE object. However,
reorganize_fun for bivariate local methods follow the same
rules as that of univariate local methods because the results also go
into localResults(sfe).
Multivariate
These are the multivariate methods in Voyager:
listSFEMethods("multi")
#>                name                                      description
#> 1        multispati                                   MULTISPATI PCA
#> 2      localC_multi                     Multivariate local Geary's C
#> 3 localC_perm_multi Multivariate local Geary's C permutation testingThe SFEMethod construction for bivariate methods is
similar to that of univariate methods, except for two arguments:
joint to indicate whether it makes sense to run the method
for multiple samples jointly just like in non-spatial PCA, and
dest to indicate whether the results should go into
reducedDims(sfe) or colData(sfe). This is the
code for a multivariate generalization of local Geary’s C (Anselin1995-ra?) with
permutation testing:
.localC_multi_fun <- function(perm = FALSE) {
    function(x, listw, ..., zero.policy) {
        x <- as.matrix(x)
        fun <- if (perm) spdep::localC_perm else spdep::localC
        fun(x, listw = listw, zero.policy = zero.policy, ...)
    }
}
.localCpermmulti2df <- function(out, nb, p.adjust.method) {
    .attrmat2df(list(out), "pseudo-p", "localC_perm_multi", nb, p.adjust.method)[[1]]
}
localC_perm_multi <- SFEMethod(
    name = "localC_perm_multi", title = "Multivariate local Geary's C permutation testing",
    package = "spdep", variate = "multi", default_attr = "localC",
    fun = .localC_multi_fun(TRUE),
    reorganize_fun = .localCpermmulti2df,
    dest = "colData"
)Here the results, as a single vector, goes into
colData(sfe), and it does not make sense to run it across
multiple samples jointly as each sample has a separate spatial
neighborhood graph, so it will be run on each sample separately.
The function in reorganize_fun should return a vector,
matrix, or data frame ready to be added to reducedDims(sfe)
or colData(sfe). If the results can go into
colData, the rules for arguments are the same as those for
univariate local methods, because with permutation testing for
multivariate local Geary’s C, multiple testing correction is performed
in the reorganize_fun. If the results go into
reducedDims, then it only needs one argument for the
output.