Bivariate spatial statistics

Introduction

Consider two variables that are correlated, say with Pearson correlation of 0.8. The observations are spatially referenced. The locations of the observations can be permuted without affecting Pearson correlation. The purpose of bivariate spatial statistics is to indicate both correlation in value (as in Pearson correlation), and spatial autocorrelation and co-patterning. One of the bivariate methods implemented in Voyager is the cross variogram, which is shown in the variogram vignette. This vignette demonstrates other bivariate spatial statistics, which use a spatial neighborhood graph, on the mouse skeletal muscle Visium dataset.

Here we load the packages used:

library(Voyager)
library(SFEData)
library(SpatialFeatureExperiment)
library(scater)
library(scran)
library(ggplot2)
library(pheatmap)
library(scico)
theme_set(theme_bw())

A list of all bivariate global methods can be seen here:

listSFEMethods(variate = "bi", scope = "global")
#>                  name                                     description
#> 1                 lee                       Lee's bivariate statistic
#> 2              lee.mc Lee's bivariate static with permutation testing
#> 3            lee.test                                    Lee's L test
#> 4     cross_variogram                                 Cross variogram
#> 5 cross_variogram_map                             Cross variogram map

When calling calculate*variate() or run*variate(), the type (2nd) argument takes either an SFEMethod object or a string that matches an entry in the name column in the data frame returned by listSFEMethods().

QC was performed in another vignette, so this vignette will not plot QC metrics.

(sfe <- McKellarMuscleData("full"))
#> see ?SFEData and browseVignettes('SFEData') for documentation
#> loading from cache
#> class: SpatialFeatureExperiment 
#> dim: 15123 4992 
#> metadata(0):
#> assays(1): counts
#> rownames(15123): ENSMUSG00000025902 ENSMUSG00000096126 ...
#>   ENSMUSG00000064368 ENSMUSG00000064370
#> rowData names(6): Ensembl symbol ... vars cv2
#> colnames(4992): AAACAACGAATAGTTC AAACAAGTATCTCCCA ... TTGTTTGTATTACACG
#>   TTGTTTGTGTAAATTC
#> colData names(12): barcode col ... prop_mito in_tissue
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : imageX imageY
#> imgData names(1): sample_id
#> 
#> unit: full_res_image_pixels
#> Geometries:
#> colGeometries: spotPoly (POLYGON) 
#> annotGeometries: tissueBoundary (POLYGON), myofiber_full (POLYGON), myofiber_simplified (POLYGON), nuclei (POLYGON), nuclei_centroid (POINT) 
#> 
#> Graphs:
#> Vis5A:

The image can be added to the SFE object and plotted behind the geometries, and needs to be flipped to align to the spots because the origin is at the top left for the image but bottom left for geometries.

if (!file.exists("tissue_lowres_5a.jpeg")) {
    download.file("https://raw.githubusercontent.com/pachterlab/voyager/main/vignettes/tissue_lowres_5a.jpeg",
                  destfile = "tissue_lowres_5a.jpeg")
}

sfe <- addImg(sfe, imageSource = "tissue_lowres_5a.jpeg", sample_id = "Vis5A", 
              image_id = "lowres", scale_fct = 1024/22208)

sfe_tissue <- sfe[,colData(sfe)$in_tissue]
sfe_tissue <- sfe_tissue[rowSums(counts(sfe_tissue)) > 0,]

sfe_tissue <- logNormCounts(sfe_tissue)

colGraph(sfe_tissue, "visium") <- findVisiumGraph(sfe_tissue)

Lee’s L

Lee’s L (Lee 2001) was developed from relating Moran’s I to Pearson correlation, and is defined as

$L_{X,Y} = \frac{n}{\sum_{i=1}^n \sum_{j=1}^n w_{ij}} \frac{\sum_{i=1}^n \left[ \sum_{j=1}^n w_{ij} (x_j - \bar{x}) \right] \left[ \sum_{j=1}^n w_{ij} (y_j - \bar{y}) \right]}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}\sqrt{\sum_{i=1}^n (y_i - \bar{y})^2} },$

where $n$ is the number of spots or locations, $i$ and $j$ are different locations, or spots in the Visium context, $x$ and $y$ are variables with values at each location, and $w_{ij}$ is a spatial weight, which can be inversely proportional to distance between spots or an indicator of whether two spots are neighbors, subject to various definitions of neighborhood.

Here we compute Lee’s L for top highly variagle genes (HVGs) in this dataset:

hvgs <- getTopHVGs(sfe_tissue, fdr.threshold = 0.01)

Because bivariate global results can have very different formats (matrix for Lee’s L and lists for many other methods), the results are not stored in the SFE object.

res <- calculateBivariate(sfe_tissue, type = "lee", feature1 = hvgs)
#> Warning: `listw2sparse()` was deprecated in SpatialFeatureExperiment 1.9.0.
#> ℹ Please use `spatialreg::as_dgRMatrix_listw()` instead.
#> ℹ The deprecated feature was likely used in the Voyager package.
#>   Please report the issue at <https://github.com/pachterlab/voyager/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.

This gives a spatially informed correlation matrix among the genes, which can be plotted as a heatmap:

pal_rng <- getDivergeRange(res)
pal <- scico(256, begin = pal_rng[1], end = pal_rng[2], palette = "vik")

pheatmap(res, color = pal, show_rownames = FALSE, 
         show_colnames = FALSE, cellwidth = 1, cellheight = 1)

Some coexpression blocks can be seen. Note that unlike in Pearson correlation, the diagonal is not 1, because

$L_{X,X} = \frac{\sum_i (\tilde x_i - \bar x)^2}{\sum_i (x_i - \bar x)^2} = \mathrm{SSS}_X,$

which is approximated the ratio between the variance of spatially lagged $x$ and variance of $x$ . Because the spatial lag introduces smoothing, the spatial lag reduced variance, making the diagonal less than 1. This is the spatial smoothing scalar (SSS), and Moran’s I is approximately Pearson correlation between $X$ and spatially lagged $X$ ( $\tilde X$ ) multiplied by SSS:

$I \approx \mathrm{SSS}_X \cdot \rho_{X, \tilde X}$

Similarly for Lee’s L, as shown in (Lee 2001),

$L_{X, Y} = \sqrt{\mathrm{SSS}_X}\sqrt{\mathrm{SSS}_Y} \cdot \rho_{\tilde X, \tilde Y}$

With more spatial clustering, the variance is less reduced by the spatial lag, leading to a larger SSS. Hence when both $X$ and $Y$ are spatially distributed like salt and pepper while strongly correlated, Lee’s L will be low because the lack of spatial autocorrelation leads to a small SSS.

Weighted correlation network analysis (WGCNA) (Langfelder and Horvath 2008) is a time honored method to find gene co-expression modules, and it can take any correlation matrix. Then it would be interesting to apply WGCNA to the Lee’s L matrix to identify spatially informed gene co-expression modules.

Local Lee

Local Lee’s L (Lee 2001) is defined as

$L_i = \frac{n\left[ \sum_{j=1}^n w_{ij} (x_j - \bar{x}) \right] \left[ \sum_{j=1}^n w_{ij} (y_j - \bar{y}) \right]}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2}\sqrt{\sum_{i=1}^n (y_i - \bar{y})^2} }$

Compare this to the global L in the previous section. Local L does not sum over the locations $i$ . This is the contribution of each location to global L and can show spatial heterogeneity in the relationship between two variables.

All bivariate local methods in Voyager is listed here:

listSFEMethods("bi", "local")
#>            name                     description
#> 1      locallee Local Lee's bivariate statistic
#> 2 localmoran_bv       Local bivariate Moran's I

Here we compute local L for two myofiber marker genes and one gene highly expressed in the injury site:

sfe_tissue <- runBivariate(sfe_tissue, "locallee", swap_rownames = "symbol",
                           feature1 = c("Myh2", "Myh1", "Ftl1"))

Bivariate local results are stored in the localResults field and the feature names are the pairwise combinations of features supplied. When only feature1 is specified, then the bivariate method is applied to all pairwise combinations of feature1.

localResultFeatures(sfe_tissue, "locallee")
#> [1] "Myh2__Myh2" "Myh1__Myh2" "Ftl1__Myh2" "Myh2__Myh1" "Myh1__Myh1"
#> [6] "Ftl1__Myh1" "Myh2__Ftl1" "Myh1__Ftl1" "Ftl1__Ftl1"

For Lee’s L, both $L_{X,Y}$ and $L_{Y,X}$ are computed although they are the same. However, not all bivariate methods are symmetric (see next section). In the next release (Bioconductor 3.18), we may introduce another argument to indicate whether the method is symmetric and if so only compute $L_{X,Y}$ and not $L_{Y,X}$ .

First plot the three genes individually:

plotSpatialFeature(sfe_tissue, c("Myh2", "Myh1", "Ftl1"), 
                   swap_rownames = "symbol", image_id = "lowres", maxcell = 5e4)

Then plot the local L’s:

plotLocalResult(sfe_tissue, "locallee", c("Myh1__Myh2", "Myh2__Ftl1", "Myh1__Ftl1"),
                colGeometryName = "spotPoly",
                image_id = "lowres", maxcell = 5e4,
                divergent = TRUE, diverge_center = 0)

Here we see regions where Myh1 and Myh2 are more co-expressed, and where the myosins and Ftl1 are negatively correlated.

$L_{X,X}$ is also computed, so we can plot the local SSS for the three genes:

plotLocalResult(sfe_tissue, "locallee", c("Myh1__Myh1", "Myh2__Myh2", "Ftl1__Ftl1"),
                colGeometryName = "spotPoly",
                image_id = "lowres", maxcell = 5e4)

See how the local SSS compares to local Moran’s I:

sfe_tissue <- runUnivariate(sfe_tissue, "localmoran", c("Myh2", "Myh1", "Ftl1"),
                            swap_rownames = "symbol")

plotLocalResult(sfe_tissue, "localmoran", c("Myh1", "Myh2", "Ftl1"),
                colGeometryName = "spotPoly", swap_rownames = "symbol",
                image_id = "lowres", maxcell = 5e4,
                divergent = TRUE, diverge_center = 0)

The patterns are qualitatively the same, but while local Moran’s I is negative in heterogeneous regions, the SSS can’t be negative.

Bivariate local Moran

The spdep package implements a bivariate version of local Moran, which basically is

$I_{X_i,Y_i} = (n-1)\frac{(x_i - \bar{x})\sum_{j=1}^n w_{ij} (y_j - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}}.$

Note that this is not symmetric, i.e. $I_{X_i,Y_i} \neq I_{Y_i,X_i}$ .

sfe_tissue <- runBivariate(sfe_tissue, "localmoran_bv", c("Myh1", "Myh2", "Ftl1"),
                           swap_rownames = "symbol", nsim = 1000)

localResultFeatures(sfe_tissue, "localmoran_bv")
#> [1] "Myh1__Myh1" "Myh2__Myh1" "Ftl1__Myh1" "Myh1__Myh2" "Myh2__Myh2"
#> [6] "Ftl1__Myh2" "Myh1__Ftl1" "Myh2__Ftl1" "Ftl1__Ftl1"

Permutation testing is performed so we get a pseudo p-value

localResultAttrs(sfe_tissue, "localmoran_bv", "Myh1__Myh2")
#> [1] "Ibvi"                 "E.Ibvi"               "Var.Ibvi"            
#> [4] "Z.Ibvi"               "Pr(z != E(Ibvi))"     "Pr(z != E(Ibvi)) Sim"
#> [7] "Pr(folded) Sim"       "-log10p Sim"          "-log10p_adj Sim"

First plot the bivariate local Moran’s I values

plotLocalResult(sfe_tissue, "localmoran_bv", c("Myh1__Myh2", "Myh2__Ftl1", "Myh1__Ftl1",
                                               "Myh2__Myh1", "Ftl1__Myh2", "Ftl1__Myh1"),
                colGeometryName = "spotPoly", attribute = "Ibvi",
                image_id = "lowres", maxcell = 5e4,
                divergent = TRUE, diverge_center = 0)

The first row plots XY while the second row plots YX; note that while they are similar, they are not the same. What does bivariate local Moran mean? It’s kind of like contribution of each location to the correlation between $x$ and spatially lagged $y$ , so $x$ is not smoothed. In contrast, Lee’s L is a scaled Pearson correlation between spatially lagged $x$ and spatially lagged $y$ . Because permutation testing is performed, we can plot the pseudo-p-value, after correcting for multiple testing based on the spatial neighborhood graph:

plotLocalResult(sfe_tissue, "localmoran_bv", c("Myh1__Myh2", "Myh2__Ftl1", "Myh1__Ftl1",
                                               "Myh2__Myh1", "Ftl1__Myh2", "Ftl1__Myh1"),
                colGeometryName = "spotPoly", attribute = "-log10p_adj Sim",
                image_id = "lowres", maxcell = 5e4,
                divergent = TRUE, diverge_center = -log10(0.05))

Note that the p-values are asymetric, because according to the source code of localmoran_bv(), $y$ is permuted, but not $x$ . It’s also related to Wartenberg’s spatial PCA (Wartenberg 1985), where Moran’s I is expressed in matrix form:

$\mathbf{I} = \frac{\mathbf{Z}^T\mathbf{WZ}}{\mathbf 1^T \mathbf{W1}},$

where $\mathbf Z$ is the data matrix with scaled and centered variables in columns, $\mathbf W$ is the spatial weights matrix, and $\mathbf 1$ is a vector of all 1’s, so the denominator is in effect $\sum_{i=1}^n \sum_{j=1}^n w_{ij}$ . The diagonal entries are Moran’s I’s for the variables, and the off diagonal entries are the global versions of what we computed here that sum the bivariate local Moran’s I’s and divide by the sum of all spatial weights. Because $\mathbf W$ doesn’t have to be symmetric, this matrix may not be symmetric. Wartenberg diagonalized this matrix in place of the covariance matrix for spatial PCA. When using scaled and centered data and row normalized spatial weights matrix, MULTISPATI PCA is equivalent to Wartenberg’s approach (Dray, Saïd, and Débias 2008). Lee considered this asymmetry an inadequacy of Wartenberg’s approach as a bivariate association measure (Lee 2001). While I’m not sure how bivariate local Moran’s I helps with data analysis, it is an interesting piece of history.

Session info

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] scico_1.5.0                    pheatmap_1.0.12               
#>  [3] scran_1.34.0                   scater_1.34.0                 
#>  [5] ggplot2_3.5.1                  scuttle_1.16.0                
#>  [7] SingleCellExperiment_1.28.1    SummarizedExperiment_1.36.0   
#>  [9] Biobase_2.66.0                 GenomicRanges_1.58.0          
#> [11] GenomeInfoDb_1.42.0            IRanges_2.40.0                
#> [13] S4Vectors_0.44.0               BiocGenerics_0.52.0           
#> [15] MatrixGenerics_1.18.0          matrixStats_1.4.1             
#> [17] SFEData_1.8.0                  Voyager_1.8.1                 
#> [19] SpatialFeatureExperiment_1.9.4
#> 
#> loaded via a namespace (and not attached):
#>   [1] splines_4.4.2             bitops_1.0-9             
#>   [3] filelock_1.0.3            tibble_3.2.1             
#>   [5] R.oo_1.27.0               lifecycle_1.0.4          
#>   [7] sf_1.0-19                 edgeR_4.4.0              
#>   [9] lattice_0.22-6            MASS_7.3-61              
#>  [11] magrittr_2.0.3            limma_3.62.1             
#>  [13] sass_0.4.9                rmarkdown_2.29           
#>  [15] jquerylib_0.1.4           yaml_2.3.10              
#>  [17] metapod_1.14.0            sp_2.1-4                 
#>  [19] RColorBrewer_1.1-3        DBI_1.2.3                
#>  [21] multcomp_1.4-26           abind_1.4-8              
#>  [23] spatialreg_1.3-5          zlibbioc_1.52.0          
#>  [25] purrr_1.0.2               R.utils_2.12.3           
#>  [27] RCurl_1.98-1.16           TH.data_1.1-2            
#>  [29] rappdirs_0.3.3            sandwich_3.1-1           
#>  [31] GenomeInfoDbData_1.2.13   ggrepel_0.9.6            
#>  [33] irlba_2.3.5.1             terra_1.7-83             
#>  [35] units_0.8-5               RSpectra_0.16-2          
#>  [37] dqrng_0.4.1               pkgdown_2.1.1            
#>  [39] DelayedMatrixStats_1.28.0 codetools_0.2-20         
#>  [41] DropletUtils_1.26.0       DelayedArray_0.32.0      
#>  [43] tidyselect_1.2.1          UCSC.utils_1.2.0         
#>  [45] memuse_4.2-3              farver_2.1.2             
#>  [47] viridis_0.6.5             ScaledMatrix_1.14.0      
#>  [49] BiocFileCache_2.14.0      jsonlite_1.8.9           
#>  [51] BiocNeighbors_2.0.0       e1071_1.7-16             
#>  [53] survival_3.7-0            systemfonts_1.1.0        
#>  [55] dbscan_1.2-0              tools_4.4.2              
#>  [57] ggnewscale_0.5.0          ragg_1.3.3               
#>  [59] Rcpp_1.0.13-1             glue_1.8.0               
#>  [61] gridExtra_2.3             SparseArray_1.6.0        
#>  [63] xfun_0.49                 EBImage_4.48.0           
#>  [65] dplyr_1.1.4               HDF5Array_1.34.0         
#>  [67] withr_3.0.2               BiocManager_1.30.25      
#>  [69] fastmap_1.2.0             boot_1.3-31              
#>  [71] rhdf5filters_1.18.0       bluster_1.16.0           
#>  [73] fansi_1.0.6               spData_2.3.3             
#>  [75] digest_0.6.37             rsvd_1.0.5               
#>  [77] mime_0.12                 R6_2.5.1                 
#>  [79] textshaping_0.4.0         colorspace_2.1-1         
#>  [81] wk_0.9.4                  LearnBayes_2.15.1        
#>  [83] jpeg_0.1-10               RSQLite_2.3.8            
#>  [85] R.methodsS3_1.8.2         utf8_1.2.4               
#>  [87] generics_0.1.3            data.table_1.16.2        
#>  [89] class_7.3-22              httr_1.4.7               
#>  [91] htmlwidgets_1.6.4         S4Arrays_1.6.0           
#>  [93] spdep_1.3-6               pkgconfig_2.0.3          
#>  [95] gtable_0.3.6              blob_1.2.4               
#>  [97] XVector_0.46.0            htmltools_0.5.8.1        
#>  [99] fftwtools_0.9-11          scales_1.3.0             
#> [101] png_0.1-8                 SpatialExperiment_1.16.0 
#> [103] knitr_1.49                rjson_0.2.23             
#> [105] coda_0.19-4.1             nlme_3.1-166             
#> [107] curl_6.0.1                proxy_0.4-27             
#> [109] cachem_1.1.0              zoo_1.8-12               
#> [111] rhdf5_2.50.0              BiocVersion_3.20.0       
#> [113] KernSmooth_2.23-24        vipor_0.4.7              
#> [115] parallel_4.4.2            AnnotationDbi_1.68.0     
#> [117] desc_1.4.3                s2_1.1.7                 
#> [119] pillar_1.9.0              grid_4.4.2               
#> [121] vctrs_0.6.5               BiocSingular_1.22.0      
#> [123] dbplyr_2.5.0              beachmat_2.22.0          
#> [125] sfheaders_0.4.4           cluster_2.1.6            
#> [127] beeswarm_0.4.0            evaluate_1.0.1           
#> [129] zeallot_0.1.0             magick_2.8.5             
#> [131] mvtnorm_1.3-2             cli_3.6.3                
#> [133] locfit_1.5-9.10           compiler_4.4.2           
#> [135] rlang_1.1.4               crayon_1.5.3             
#> [137] labeling_0.4.3            classInt_0.4-10          
#> [139] ggbeeswarm_0.7.2          fs_1.6.5                 
#> [141] viridisLite_0.4.2         deldir_2.0-4             
#> [143] BiocParallel_1.40.0       munsell_0.5.1            
#> [145] Biostrings_2.74.0         tiff_0.1-12              
#> [147] Matrix_1.7-1              ExperimentHub_2.14.0     
#> [149] patchwork_1.3.0           sparseMatrixStats_1.18.0 
#> [151] bit64_4.5.2               Rhdf5lib_1.28.0          
#> [153] KEGGREST_1.46.0           statmod_1.5.0            
#> [155] AnnotationHub_3.14.0      igraph_2.1.1             
#> [157] memoise_2.0.1             bslib_0.8.0              
#> [159] bit_4.5.0

References

Dray, Stéphane, Sonia Saïd, and Françis Débias. 2008. “Spatial Ordination of Vegetation Data Using a Generalization of Wartenberg’s Multivariate Spatial Correlation.” J. Veg. Sci. 19 (1): 45–56.

Langfelder, Peter, and Steve Horvath. 2008. “WGCNA: An R Package for Weighted Correlation Network Analysis.” BMC Bioinformatics 9 (December): 559.

Lee, Sang-Il. 2001. “Developing a Bivariate Spatial Association Measure: An Integration of Pearson’s r and Moran’s I.” J. Geogr. Syst. 3 (4): 369–85.

Wartenberg, Daniel. 1985. “Multivariate Spatial Correlation: A Method for Exploratory Geographical Analysis.” Geogr. Anal. 17 (4): 263–83.