Introduction
Consider two variables that are correlated, say with Pearson
correlation of 0.8. The observations are spatially referenced. The
locations of the observations can be permuted without affecting Pearson
correlation. The purpose of bivariate spatial statistics is to indicate
both correlation in value (as in Pearson correlation), and spatial
autocorrelation and co-patterning. One of the bivariate methods
implemented in Voyager
is the cross variogram, which is
shown in the variogram
vignette. This vignette demonstrates other bivariate spatial
statistics, which use a spatial neighborhood graph, on the mouse
skeletal muscle Visium dataset.
Here we load the packages used:
library(Voyager)
library(SFEData)
library(SpatialFeatureExperiment)
library(scater)
library(scran)
library(ggplot2)
library(pheatmap)
library(scico)
theme_set(theme_bw())
A list of all bivariate global methods can be seen here:
listSFEMethods(variate = "bi", scope = "global")
#> name description
#> 1 lee Lee's bivariate statistic
#> 2 lee.mc Lee's bivariate static with permutation testing
#> 3 lee.test Lee's L test
#> 4 cross_variogram Cross variogram
#> 5 cross_variogram_map Cross variogram map
When calling calculate*variate()
or
run*variate()
, the type
(2nd) argument takes
either an SFEMethod
object or a string that matches an
entry in the name
column in the data frame returned by
listSFEMethods()
.
QC was performed in another vignette, so this vignette will not plot QC metrics.
(sfe <- McKellarMuscleData("full"))
#> see ?SFEData and browseVignettes('SFEData') for documentation
#> loading from cache
#> class: SpatialFeatureExperiment
#> dim: 15123 4992
#> metadata(0):
#> assays(1): counts
#> rownames(15123): ENSMUSG00000025902 ENSMUSG00000096126 ...
#> ENSMUSG00000064368 ENSMUSG00000064370
#> rowData names(6): Ensembl symbol ... vars cv2
#> colnames(4992): AAACAACGAATAGTTC AAACAAGTATCTCCCA ... TTGTTTGTATTACACG
#> TTGTTTGTGTAAATTC
#> colData names(12): barcode col ... prop_mito in_tissue
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : imageX imageY
#> imgData names(1): sample_id
#>
#> unit: full_res_image_pixels
#> Geometries:
#> colGeometries: spotPoly (POLYGON)
#> annotGeometries: tissueBoundary (POLYGON), myofiber_full (POLYGON), myofiber_simplified (POLYGON), nuclei (POLYGON), nuclei_centroid (POINT)
#>
#> Graphs:
#> Vis5A:
The image can be added to the SFE object and plotted behind the geometries, and needs to be flipped to align to the spots because the origin is at the top left for the image but bottom left for geometries.
if (!file.exists("tissue_lowres_5a.jpeg")) {
download.file("https://raw.githubusercontent.com/pachterlab/voyager/main/vignettes/tissue_lowres_5a.jpeg",
destfile = "tissue_lowres_5a.jpeg")
}
sfe <- addImg(sfe, imageSource = "tissue_lowres_5a.jpeg", sample_id = "Vis5A",
image_id = "lowres", scale_fct = 1024/22208)
sfe_tissue <- sfe[,colData(sfe)$in_tissue]
sfe_tissue <- sfe_tissue[rowSums(counts(sfe_tissue)) > 0,]
sfe_tissue <- logNormCounts(sfe_tissue)
colGraph(sfe_tissue, "visium") <- findVisiumGraph(sfe_tissue)
Lee’s L
Lee’s L (Lee 2001) was developed from relating Moran’s I to Pearson correlation, and is defined as
where is the number of spots or locations, and are different locations, or spots in the Visium context, and are variables with values at each location, and is a spatial weight, which can be inversely proportional to distance between spots or an indicator of whether two spots are neighbors, subject to various definitions of neighborhood.
Here we compute Lee’s L for top highly variagle genes (HVGs) in this dataset:
hvgs <- getTopHVGs(sfe_tissue, fdr.threshold = 0.01)
Because bivariate global results can have very different formats (matrix for Lee’s L and lists for many other methods), the results are not stored in the SFE object.
res <- calculateBivariate(sfe_tissue, type = "lee", feature1 = hvgs)
#> Warning: `listw2sparse()` was deprecated in SpatialFeatureExperiment 1.9.0.
#> ℹ Please use `spatialreg::as_dgRMatrix_listw()` instead.
#> ℹ The deprecated feature was likely used in the Voyager package.
#> Please report the issue at <https://github.com/pachterlab/voyager/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
This gives a spatially informed correlation matrix among the genes, which can be plotted as a heatmap:
pal_rng <- getDivergeRange(res)
pal <- scico(256, begin = pal_rng[1], end = pal_rng[2], palette = "vik")
pheatmap(res, color = pal, show_rownames = FALSE,
show_colnames = FALSE, cellwidth = 1, cellheight = 1)
Some coexpression blocks can be seen. Note that unlike in Pearson correlation, the diagonal is not 1, because
which is approximated the ratio between the variance of spatially lagged and variance of . Because the spatial lag introduces smoothing, the spatial lag reduced variance, making the diagonal less than 1. This is the spatial smoothing scalar (SSS), and Moran’s I is approximately Pearson correlation between and spatially lagged () multiplied by SSS:
Similarly for Lee’s L, as shown in (Lee 2001),
With more spatial clustering, the variance is less reduced by the spatial lag, leading to a larger SSS. Hence when both and are spatially distributed like salt and pepper while strongly correlated, Lee’s L will be low because the lack of spatial autocorrelation leads to a small SSS.
Weighted correlation network analysis (WGCNA) (Langfelder and Horvath 2008) is a time honored method to find gene co-expression modules, and it can take any correlation matrix. Then it would be interesting to apply WGCNA to the Lee’s L matrix to identify spatially informed gene co-expression modules.
Local Lee
Local Lee’s L (Lee 2001) is defined as
Compare this to the global L in the previous section. Local L does not sum over the locations . This is the contribution of each location to global L and can show spatial heterogeneity in the relationship between two variables.
All bivariate local methods in Voyager
is listed
here:
listSFEMethods("bi", "local")
#> name description
#> 1 locallee Local Lee's bivariate statistic
#> 2 localmoran_bv Local bivariate Moran's I
Here we compute local L for two myofiber marker genes and one gene highly expressed in the injury site:
sfe_tissue <- runBivariate(sfe_tissue, "locallee", swap_rownames = "symbol",
feature1 = c("Myh2", "Myh1", "Ftl1"))
Bivariate local results are stored in the localResults
field and the feature names are the pairwise combinations of features
supplied. When only feature1
is specified, then the
bivariate method is applied to all pairwise combinations of
feature1
.
localResultFeatures(sfe_tissue, "locallee")
#> [1] "Myh2__Myh2" "Myh1__Myh2" "Ftl1__Myh2" "Myh2__Myh1" "Myh1__Myh1"
#> [6] "Ftl1__Myh1" "Myh2__Ftl1" "Myh1__Ftl1" "Ftl1__Ftl1"
For Lee’s L, both and are computed although they are the same. However, not all bivariate methods are symmetric (see next section). In the next release (Bioconductor 3.18), we may introduce another argument to indicate whether the method is symmetric and if so only compute and not .
First plot the three genes individually:
plotSpatialFeature(sfe_tissue, c("Myh2", "Myh1", "Ftl1"),
swap_rownames = "symbol", image_id = "lowres", maxcell = 5e4)
Then plot the local L’s:
plotLocalResult(sfe_tissue, "locallee", c("Myh1__Myh2", "Myh2__Ftl1", "Myh1__Ftl1"),
colGeometryName = "spotPoly",
image_id = "lowres", maxcell = 5e4,
divergent = TRUE, diverge_center = 0)
Here we see regions where Myh1 and Myh2 are more co-expressed, and where the myosins and Ftl1 are negatively correlated.
is also computed, so we can plot the local SSS for the three genes:
plotLocalResult(sfe_tissue, "locallee", c("Myh1__Myh1", "Myh2__Myh2", "Ftl1__Ftl1"),
colGeometryName = "spotPoly",
image_id = "lowres", maxcell = 5e4)
See how the local SSS compares to local Moran’s I:
sfe_tissue <- runUnivariate(sfe_tissue, "localmoran", c("Myh2", "Myh1", "Ftl1"),
swap_rownames = "symbol")
plotLocalResult(sfe_tissue, "localmoran", c("Myh1", "Myh2", "Ftl1"),
colGeometryName = "spotPoly", swap_rownames = "symbol",
image_id = "lowres", maxcell = 5e4,
divergent = TRUE, diverge_center = 0)
The patterns are qualitatively the same, but while local Moran’s I is negative in heterogeneous regions, the SSS can’t be negative.
Bivariate local Moran
The spdep
package implements a bivariate version of
local Moran, which basically is
Note that this is not symmetric, i.e. .
sfe_tissue <- runBivariate(sfe_tissue, "localmoran_bv", c("Myh1", "Myh2", "Ftl1"),
swap_rownames = "symbol", nsim = 1000)
localResultFeatures(sfe_tissue, "localmoran_bv")
#> [1] "Myh1__Myh1" "Myh2__Myh1" "Ftl1__Myh1" "Myh1__Myh2" "Myh2__Myh2"
#> [6] "Ftl1__Myh2" "Myh1__Ftl1" "Myh2__Ftl1" "Ftl1__Ftl1"
Permutation testing is performed so we get a pseudo p-value
localResultAttrs(sfe_tissue, "localmoran_bv", "Myh1__Myh2")
#> [1] "Ibvi" "E.Ibvi" "Var.Ibvi"
#> [4] "Z.Ibvi" "Pr(z != E(Ibvi))" "Pr(z != E(Ibvi)) Sim"
#> [7] "Pr(folded) Sim" "-log10p Sim" "-log10p_adj Sim"
First plot the bivariate local Moran’s I values
plotLocalResult(sfe_tissue, "localmoran_bv", c("Myh1__Myh2", "Myh2__Ftl1", "Myh1__Ftl1",
"Myh2__Myh1", "Ftl1__Myh2", "Ftl1__Myh1"),
colGeometryName = "spotPoly", attribute = "Ibvi",
image_id = "lowres", maxcell = 5e4,
divergent = TRUE, diverge_center = 0)
The first row plots XY while the second row plots YX; note that while they are similar, they are not the same. What does bivariate local Moran mean? It’s kind of like contribution of each location to the correlation between and spatially lagged , so is not smoothed. In contrast, Lee’s L is a scaled Pearson correlation between spatially lagged and spatially lagged . Because permutation testing is performed, we can plot the pseudo-p-value, after correcting for multiple testing based on the spatial neighborhood graph:
plotLocalResult(sfe_tissue, "localmoran_bv", c("Myh1__Myh2", "Myh2__Ftl1", "Myh1__Ftl1",
"Myh2__Myh1", "Ftl1__Myh2", "Ftl1__Myh1"),
colGeometryName = "spotPoly", attribute = "-log10p_adj Sim",
image_id = "lowres", maxcell = 5e4,
divergent = TRUE, diverge_center = -log10(0.05))
Note that the p-values are asymetric, because according to the source
code of localmoran_bv()
,
is permuted, but not
.
It’s also related to Wartenberg’s spatial PCA (Wartenberg 1985), where Moran’s I is expressed
in matrix form:
where is the data matrix with scaled and centered variables in columns, is the spatial weights matrix, and is a vector of all 1’s, so the denominator is in effect . The diagonal entries are Moran’s I’s for the variables, and the off diagonal entries are the global versions of what we computed here that sum the bivariate local Moran’s I’s and divide by the sum of all spatial weights. Because doesn’t have to be symmetric, this matrix may not be symmetric. Wartenberg diagonalized this matrix in place of the covariance matrix for spatial PCA. When using scaled and centered data and row normalized spatial weights matrix, MULTISPATI PCA is equivalent to Wartenberg’s approach (Dray, Saïd, and Débias 2008). Lee considered this asymmetry an inadequacy of Wartenberg’s approach as a bivariate association measure (Lee 2001). While I’m not sure how bivariate local Moran’s I helps with data analysis, it is an interesting piece of history.
Session info
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] scico_1.5.0 pheatmap_1.0.12
#> [3] scran_1.34.0 scater_1.34.0
#> [5] ggplot2_3.5.1 scuttle_1.16.0
#> [7] SingleCellExperiment_1.28.1 SummarizedExperiment_1.36.0
#> [9] Biobase_2.66.0 GenomicRanges_1.58.0
#> [11] GenomeInfoDb_1.42.0 IRanges_2.40.0
#> [13] S4Vectors_0.44.0 BiocGenerics_0.52.0
#> [15] MatrixGenerics_1.18.0 matrixStats_1.4.1
#> [17] SFEData_1.8.0 Voyager_1.8.1
#> [19] SpatialFeatureExperiment_1.9.4
#>
#> loaded via a namespace (and not attached):
#> [1] splines_4.4.2 bitops_1.0-9
#> [3] filelock_1.0.3 tibble_3.2.1
#> [5] R.oo_1.27.0 lifecycle_1.0.4
#> [7] sf_1.0-19 edgeR_4.4.0
#> [9] lattice_0.22-6 MASS_7.3-61
#> [11] magrittr_2.0.3 limma_3.62.1
#> [13] sass_0.4.9 rmarkdown_2.29
#> [15] jquerylib_0.1.4 yaml_2.3.10
#> [17] metapod_1.14.0 sp_2.1-4
#> [19] RColorBrewer_1.1-3 DBI_1.2.3
#> [21] multcomp_1.4-26 abind_1.4-8
#> [23] spatialreg_1.3-5 zlibbioc_1.52.0
#> [25] purrr_1.0.2 R.utils_2.12.3
#> [27] RCurl_1.98-1.16 TH.data_1.1-2
#> [29] rappdirs_0.3.3 sandwich_3.1-1
#> [31] GenomeInfoDbData_1.2.13 ggrepel_0.9.6
#> [33] irlba_2.3.5.1 terra_1.7-83
#> [35] units_0.8-5 RSpectra_0.16-2
#> [37] dqrng_0.4.1 pkgdown_2.1.1
#> [39] DelayedMatrixStats_1.28.0 codetools_0.2-20
#> [41] DropletUtils_1.26.0 DelayedArray_0.32.0
#> [43] tidyselect_1.2.1 UCSC.utils_1.2.0
#> [45] memuse_4.2-3 farver_2.1.2
#> [47] viridis_0.6.5 ScaledMatrix_1.14.0
#> [49] BiocFileCache_2.14.0 jsonlite_1.8.9
#> [51] BiocNeighbors_2.0.0 e1071_1.7-16
#> [53] survival_3.7-0 systemfonts_1.1.0
#> [55] dbscan_1.2-0 tools_4.4.2
#> [57] ggnewscale_0.5.0 ragg_1.3.3
#> [59] Rcpp_1.0.13-1 glue_1.8.0
#> [61] gridExtra_2.3 SparseArray_1.6.0
#> [63] xfun_0.49 EBImage_4.48.0
#> [65] dplyr_1.1.4 HDF5Array_1.34.0
#> [67] withr_3.0.2 BiocManager_1.30.25
#> [69] fastmap_1.2.0 boot_1.3-31
#> [71] rhdf5filters_1.18.0 bluster_1.16.0
#> [73] fansi_1.0.6 spData_2.3.3
#> [75] digest_0.6.37 rsvd_1.0.5
#> [77] mime_0.12 R6_2.5.1
#> [79] textshaping_0.4.0 colorspace_2.1-1
#> [81] wk_0.9.4 LearnBayes_2.15.1
#> [83] jpeg_0.1-10 RSQLite_2.3.8
#> [85] R.methodsS3_1.8.2 utf8_1.2.4
#> [87] generics_0.1.3 data.table_1.16.2
#> [89] class_7.3-22 httr_1.4.7
#> [91] htmlwidgets_1.6.4 S4Arrays_1.6.0
#> [93] spdep_1.3-6 pkgconfig_2.0.3
#> [95] gtable_0.3.6 blob_1.2.4
#> [97] XVector_0.46.0 htmltools_0.5.8.1
#> [99] fftwtools_0.9-11 scales_1.3.0
#> [101] png_0.1-8 SpatialExperiment_1.16.0
#> [103] knitr_1.49 rjson_0.2.23
#> [105] coda_0.19-4.1 nlme_3.1-166
#> [107] curl_6.0.1 proxy_0.4-27
#> [109] cachem_1.1.0 zoo_1.8-12
#> [111] rhdf5_2.50.0 BiocVersion_3.20.0
#> [113] KernSmooth_2.23-24 vipor_0.4.7
#> [115] parallel_4.4.2 AnnotationDbi_1.68.0
#> [117] desc_1.4.3 s2_1.1.7
#> [119] pillar_1.9.0 grid_4.4.2
#> [121] vctrs_0.6.5 BiocSingular_1.22.0
#> [123] dbplyr_2.5.0 beachmat_2.22.0
#> [125] sfheaders_0.4.4 cluster_2.1.6
#> [127] beeswarm_0.4.0 evaluate_1.0.1
#> [129] zeallot_0.1.0 magick_2.8.5
#> [131] mvtnorm_1.3-2 cli_3.6.3
#> [133] locfit_1.5-9.10 compiler_4.4.2
#> [135] rlang_1.1.4 crayon_1.5.3
#> [137] labeling_0.4.3 classInt_0.4-10
#> [139] ggbeeswarm_0.7.2 fs_1.6.5
#> [141] viridisLite_0.4.2 deldir_2.0-4
#> [143] BiocParallel_1.40.0 munsell_0.5.1
#> [145] Biostrings_2.74.0 tiff_0.1-12
#> [147] Matrix_1.7-1 ExperimentHub_2.14.0
#> [149] patchwork_1.3.0 sparseMatrixStats_1.18.0
#> [151] bit64_4.5.2 Rhdf5lib_1.28.0
#> [153] KEGGREST_1.46.0 statmod_1.5.0
#> [155] AnnotationHub_3.14.0 igraph_2.1.1
#> [157] memoise_2.0.1 bslib_0.8.0
#> [159] bit_4.5.0