Use your own spatial method in Voyager
Lambda Moses
2024-05-17
Source:vignettes/sfemethod.Rmd
sfemethod.Rmd
Introduction
There are multiple different ways to do certain things. Each of the
different ways have pros and cons, and sometimes can tell somewhat
different stories. Often the different ways come with different
syntaxes, increasing the learning curve for users. Voyager
took inspiration from caret
and tidymodels
(Kuhn and Wickham 2020) for machine
learning, foreach
, future
, and
BiocParallel
for parallel processing with different
backends, bluster
for different clustering algorithms, and
BiocNeighbors
for different algorithms to find nearest
neighbors. These packages provide uniform user interfaces to different
methods to achieve a given goal.
In caret
and tidymodels
, users can make the
uniform user interface fit custom models not included in the package to
eliminate a lot of duplicate code. In Voyager
, this is done
with the SFEMethod
S4 class. This vignette shows how to use
the SFEMethod
class to use Voyager’s uniform user interface
on you own custom methods.
Here we load the packages used:
library(Voyager)
#> Loading required package: SpatialFeatureExperiment
#>
#> Attaching package: 'SpatialFeatureExperiment'
#> The following object is masked from 'package:base':
#>
#> scale
library(spdep)
#> Loading required package: spData
#> To access larger datasets in this package, install the spDataLarge
#> package with: `install.packages('spDataLarge',
#> repos='https://nowosad.github.io/drat/', type='source')`
#> Loading required package: sf
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
Voyager
categorizes exploratory spatial data analysis
(ESDA) methods by the number of variables and whether the method gives
one result for the entire dataset (global) or gives results at each
location (local). While the process to create an SFEMethod
object is mostly the same across categories, each category has specific
arguments.
Also, before you make your own SFEMethod
object, see if
the method of interest is already in Voyager
. The methods
can be listed with the listSFEMethods()
function.
When calling calculate*variate()
or
run*variate()
, the type
(2nd) argument takes
either an SFEMethod
object or a string that matches an
entry in the name
column in the data frame returned by
listSFEMethods()
so Voyager
will search for an
S4 object with name matching the string.
Univariate
Global
These are the univariate global methods in Voyager
:
listSFEMethods("uni", "global")
#> name description
#> 1 moran Moran's I
#> 2 geary Geary's C
#> 3 moran.mc Moran's I with permutation testing
#> 4 geary.mc Geary's C with permutation testing
#> 5 sp.mantel.mc Mantel-Hubert spatial general cross product statistic
#> 6 moran.test Moran's I test
#> 7 geary.test Geary's C test
#> 8 globalG.test Global G test
#> 9 sp.correlogram Correlogram
#> 10 variogram Variogram with model
#> 11 variogram_map Variogram map
This is the code used to create the SFEMethod
object to
run Moran’s I, with the SFEMethod()
constructor:
moran <- SFEMethod(
name = "moran", title = "Moran's I", package = "spdep",
variate = "uni", scope = "global",
fun = function(x, listw, zero.policy = NULL)
spdep::moran(x, listw, n = length(listw$neighbours), S0 = spdep::Szero(listw),
zero.policy = zero.policy),
use_graph = TRUE,
reorganize_fun = .moran2df
)
The package
argument is used to check if the package is
installed when the method is run. The function to run the method is in
the fun
argument. All univariate methods that use a spatial
neighborhood graph (use_graph = TRUE
) must have
arguments:
-
x
for a vector as input -
listw
for the spatial neighborhood graph as alistw
object, and -
zero.policy
for what to do when there are cells or spots that don’t have spatial neighbors. Seespdep
documentation (e.g.spdep::moran()
) for how thezero.policy
argument behaves.
In this case I wrote a think wrapper to fill in the confusing arguments that may confuse the users.
If the function running the method from another package has different
arguments, then write a thin wrapper to make these required arguments.
Extra arguments can be passed to fun
through
...
.
The reorganize_fun
argument takes a function to
reorganize the output from fun
to form a
DataFrame
so the results for genes can be added to
rowData(sfe)
. Here for Moran’s I, the function is
.moran2df <- function(out, name, ...) {
rns <- names(out)
out <- lapply(out, unlist, use.names = TRUE)
out <- Reduce(rbind, out)
if (!is.matrix(out)) out <- t(as.matrix(out))
rownames(out) <- rns
out <- DataFrame(out)
names(out)[1] <- name
out
}
For univariate or bivariate global methods, the function must have:
-
out
argument to take in the output fromfun
for multiple genes or features -
name
to take the name under which the results are stored in case the same method is run for the same genes but with different parameters and you don’t want to overwrite previous results. Thename
is the name specified in theSFEMethod()
constructor by default, but can be set by the user when callingcalculate*variate()
orrun*variate()
, and -
...
because thereorganize_fun
of some univariate global methods in Voyager, such assp.correlogram
, needs other arguments.
Some spatial methods use spatial distances rather than graphs, such
as the variogram. This is the code used to create the
SFEMethod
object for variogram
:
variogram <- SFEMethod(package = "automap", variate = "uni", scope = "global",
default_attr = NA, name = "variogram", title = "Variogram",
fun = .variogram,
reorganize_fun = .other2df,
use_graph = FALSE)
The function in fun
for univariate methods that don’t
use a spatial neighborhood graph must have arguments x
and
coords_df
(sf
data frame for the spatial
coordinates) and other arguments are allowed. This is the
.variogram
function:
.variogram <- function(x, coords_df, formula = x ~ 1, scale = TRUE, ...) {
coords_df$x <- x
if (scale) coords_df$x <- scale(coords_df$x)
dots <- list(...)
# Deal with alpha myself and fit a global variogram to avoid further gstat warnings
have_alpha <- "alpha" %in% names(dots)
if (have_alpha) {
empirical <- gstat::variogram(formula, data = coords_df, alpha = dots$alpha)
dots$alpha <- NULL
}
out <- do.call(automap::autofitVariogram,
c(list(formula = formula, input_data = coords_df,
map = FALSE, cloud = FALSE), dots))
if (have_alpha) {
out$exp_var <- empirical
}
out
}
The rule for reorganize_fun
remains the same, and this
is the .other2df
function:
Local
These are the univariate local methods in Voyager
:
listSFEMethods("uni", "local")
#> name description
#> 1 localmoran Local Moran's I
#> 2 localmoran_perm Local Moran's I permutation testing
#> 3 localC Local Geary's C
#> 4 localC_perm Local Geary's C permutation testing
#> 5 localG Getis-Ord Gi(*)
#> 6 localG_perm Getis-Ord Gi(*) with permutation testing
#> 7 LOSH Local spatial heteroscedasticity
#> 8 LOSH.mc Local spatial heteroscedasticity permutation testing
#> 9 LOSH.cs Local spatial heteroscedasticity Chi-square test
#> 10 moran.plot Moran scatter plot
This is the code used to create the SFEMethod
object for
localmoran
:
localmoran <- SFEMethod(
name = "localmoran", title = "Local Moran's I",
package = "spdep", scope = "local", default_attr = "Ii",
fun = spdep::localmoran,
use_graph = TRUE,
reorganize_fun = .localmoran2df
)
spdep::localmoran
already has the right arguments,
including x
, listw
, and
zero.policy
.
For local methods, the title
and
default_attr
arguments are important, as they are used in
plotLocalResults()
for the plot title. Many local methods
return a matrix or data frame as the results for each gene, and
default_attr
specifies the column to use by default when
plotting, such as the local Moran’s I values (Ii
) in this
case. Other fields the results can be p-values and adjusted
p-values.
The reorganize_fun
is different from those of univariate
global methods because the local results are organized differently. Here
is the .localmoran2df
function:
.localmoran2df <- function(out, nb, p.adjust.method) {
lapply(out, function(o) {
o1 <- as.data.frame(o)
quadr <- attr(o, "quadr")
I(.add_log_p(cbind(o1, quadr), nb, p.adjust.method))
})
}
The function must have arguments:
-
out
for results fromfun
for all genes, a list each element of which is the results for one gene. -
nb
for a neighbor object of classnb
, which is part of thelistw
object for spatial neighborhood graphs. This is used to correct for multiple hypothesis testing withp.adjustSP()
-
p.adjust.method
to specify a method to correct for multiple testing. Seep.adjust()
for available methods.
The output is a list of organized results, each element for one gene,
which will then be converted to a DataFrame
and added to
localResults(sfe)
.
Bivariate
These are the bivariate global methods in Voyager
:
listSFEMethods("bi", "global")
#> name description
#> 1 lee Lee's bivariate statistic
#> 2 lee.mc Lee's bivariate static with permutation testing
#> 3 lee.test Lee's L test
#> 4 cross_variogram Cross variogram
#> 5 cross_variogram_map Cross variogram map
These are the bivariate local methods in Voyager
:
listSFEMethods("bi", "local")
#> name description
#> 1 locallee Local Lee's bivariate statistic
#> 2 localmoran_bv Local bivariate Moran's I
The SFEMethod
construction for bivariate methods is
similar to that of univariate methods, except that the function in
fun
must have argument y
after x
.
This is the code used to create the SFEMethod
object for
lee
, Lee’s L:
lee <- SFEMethod(name = "lee", fun = .lee_mat, title = "Lee's bivariate statistic",
reorganize_fun = function(out, name, ...) out,
package = "Voyager", variate = "bi", scope = "global",
use_matrix = TRUE)
Note the use_matrix
argument, which specific to
bivariate methods. It means whether the method can take a matrix as
argument and compute the statistic for all pairwise combinations of the
matrix’s rows. This way the computation can be expressed as matrix
operations which is much more efficient than R loops because the loops
are pushed to the underlying C and Fortran code in BLAS and the
Matrix
package for sparse matrices.
Here’s the .lee_mat
function:
.lee_mat <- function(x, y = NULL, listw, zero.policy = TRUE, ...) {
# X has genes in rows
if (is(listw, "listw"))
W <- listw2sparse(listw)
else W <- listw
x <- .scale_n(x)
if (!is.null(y)) {
y <- .scale_n(y)
} else y <- x
n <- ncol(x) # dimension of y is checked in calculateBivariate
out <- x %*% (t(W) %*% W) %*% t(y)/sum(rowSums(W)^2) * n
if (all(dim(out) == 1L)) out <- out[1,1]
out
}
Due to the matrix operation, listw
can be a sparse or
dense adjacency matrix of the spatial neighborhood graph. To conform to
scRNA-seq conventions, x
and y
have genes in
rows if they are matrices.
The reorganize_fun
for bivariate global methods don’t
have to return a DataFrame
, because bivariate global
results can’t be stored in the SFE object. However,
reorganize_fun
for bivariate local methods follow the same
rules as that of univariate local methods because the results also go
into localResults(sfe)
.
Multivariate
These are the multivariate methods in Voyager
:
listSFEMethods("multi")
#> name description
#> 1 multispati MULTISPATI PCA
#> 2 localC_multi Multivariate local Geary's C
#> 3 localC_perm_multi Multivariate local Geary's C permutation testing
The SFEMethod
construction for bivariate methods is
similar to that of univariate methods, except for two arguments:
joint
to indicate whether it makes sense to run the method
for multiple samples jointly just like in non-spatial PCA, and
dest
to indicate whether the results should go into
reducedDims(sfe)
or colData(sfe)
. This is the
code for a multivariate generalization of local Geary’s C (Anselin 2019) with permutation testing:
.localC_multi_fun <- function(perm = FALSE) {
function(x, listw, ..., zero.policy) {
x <- as.matrix(x)
fun <- if (perm) spdep::localC_perm else spdep::localC
fun(x, listw = listw, zero.policy = zero.policy, ...)
}
}
.localCpermmulti2df <- function(out, nb, p.adjust.method) {
.attrmat2df(list(out), "pseudo-p", "localC_perm_multi", nb, p.adjust.method)[[1]]
}
localC_perm_multi <- SFEMethod(
name = "localC_perm_multi", title = "Multivariate local Geary's C permutation testing",
package = "spdep", variate = "multi", default_attr = "localC",
fun = .localC_multi_fun(TRUE),
reorganize_fun = .localCpermmulti2df,
dest = "colData"
)
Here the results, as a single vector, goes into
colData(sfe)
, and it does not make sense to run it across
multiple samples jointly as each sample has a separate spatial
neighborhood graph, so it will be run on each sample separately.
The function in reorganize_fun
should return a vector,
matrix, or data frame ready to be added to reducedDims(sfe)
or colData(sfe)
. If the results can go into
colData
, the rules for arguments are the same as those for
univariate local methods, because with permutation testing for
multivariate local Geary’s C, multiple testing correction is performed
in the reorganize_fun
. If the results go into
reducedDims
, then it only needs one argument for the
output.