Variogram

Introduction

In geostatistical data, an underlying spatial process is sampled at known locations. Kriging uses a Gaussian process model to interpolate the values between the sample locations, and the semivariogram is used to model the spatial dependency between the locations as the covariance of the Gaussian process. When not kriging, the semivariogram can be used as an exploratory data analysis tool to find the length scale and anisotropy of spatial autocorrelation. The semivariogram is defined as

$\gamma(t) = \frac 1 2 \mathrm{Var}(X_t - X_0),$

where $X$ is the value of interest, such as gene expression, and $t$ is a spatial lag vector. $X_0$ is the value at a location of interest, and $X_t$ is the value at the location lagged by $t$. With positive spatial autocorrelation, the variance of the differences is smaller among nearby values, so the variogram increases with distance, eventually leveling off when the distance exceeds the length scale of spatial autocorrelation. The “semi” comes from the 1/2, which cancels the factor of 2 that arises when the Gaussian process is assumed to be weakly stationary, i.e. the covariance between two locations only depends on the spatial lag between them:

$\begin{aligned} \mathrm{Var}(X_{t_2} - X_{t_1}) &= \mathrm{Var}(X_{t_2}) + \mathrm{Var}(X_{t_1}) - 2\mathrm{Cov}(X_{t_2}, X_{t_1}) \\ &= 2\rho(0) - 2\rho(t_2 - t_1), \end{aligned}$

where $\rho$ is a covariance function and $t_1$ and $t_2$ are spatial locations. A model can be fitted to the empirical semivariogram to model this $\rho$. A process is intrinsically stationary when the variance of the difference in values between two locations depends only on the spatial lag between them; this assumption is weaker and more general than weak stationarity. The weaker assumption is used in kriging.
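
Setting $t_1 = 0$ and $t_2 = t$ in the identity above and halving both sides gives the semivariogram in terms of the covariance function. As a short worked step, assuming weak stationarity and that $\rho(t)$ decays to 0 at large lags:

$\gamma(t) = \frac 1 2 \mathrm{Var}(X_t - X_0) = \rho(0) - \rho(t), \qquad \gamma(t) \to \rho(0) = \mathrm{Var}(X_0) \text{ as } \|t\| \to \infty.$

Under these assumptions the semivariogram plateaus at the variance of the process, so for data scaled to unit variance it would be expected to level off near 1.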

This vignette demonstrates the variogram as an ESDA tool, including interpretation of the univariate variogram, anisotropic variograms (variograms in different directions), variogram maps, and bivariate cross variograms.

Here we load the packages:

library(Voyager)
library(SFEData)
library(SpatialFeatureExperiment)
library(scater)
library(scran)
library(ggplot2)
library(BiocParallel)
library(bluster)
library(dplyr)
theme_set(theme_bw())

The Slide-seq melanoma metastasis data (Biermann et al. 2022) is used for demonstration. QC is performed in another vignette.

(sfe <- BiermannMelaMetasData(dataset = "MBM05_rep1"))
#> see ?SFEData and browseVignettes('SFEData') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
#> class: SpatialFeatureExperiment 
#> dim: 27566 29536 
#> metadata(0):
#> assays(1): counts
#> rownames(27566): A1BG A1BG-AS1 ... ZZZ3 snoZ196
#> rowData names(3): means vars cv2
#> colnames(29536): ACCACTCATTTCTC-1 GTTCANTCCACGTA-1 ... ACGCGCAATCGTAG-1
#>   TTGTTCCGTTCATA-1
#> colData names(4): sample_id nCounts nGenes prop_mito
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : xcoord ycoord
#> imgData names(1): sample_id
#> 
#> unit: full_res_image_pixels
#> Geometries:
#> colGeometries: centroids (POINT) 
#> 
#> Graphs:
#> sample01:

sfe <- sfe[, colData(sfe)$prop_mito < 0.1]
sfe <- sfe[rowSums(counts(sfe)) > 0,]

sfe <- logNormCounts(sfe)

Variograms will be demonstrated on some of the top highly variable genes (HVGs):

dec <- modelGeneVar(sfe)
hvgs <- getTopHVGs(dec, n = 50)

The same user interface used to run Moran’s I can be used to compute variograms. However, since the variogram uses spatial distances instead of a spatial neighborhood graph, the colGraph does not need to be specified. Instead, a colGeometry can be specified, and if the geometry is not POINT, then spatialCoords(sfe) will be used to compute the distances. Behind the scenes, the automap package is used, which fits a number of different variogram models to the empirical variogram and chooses the one that fits best. The automap package is a user-friendly wrapper of gstat, a time-honored package for geostatistics.

sfe <- runUnivariate(sfe, "variogram", hvgs, BPPARAM = SnowParam(2),
                     model = "Ste")
#> Warning: <anonymous>: ... may be used in an incorrect context: 'fun(x[i, ], ...)'

plotVariogram(sfe, hvgs[1:4], name = "variogram")

The data is binned by distance between spots and the semivariance is computed for each bin. gstat’s plotting functions label the y axis “semivariance”. Because the data is scaled so the variance is 1, the plotted values level off near 1, which is consistent with the semivariance: halving the identity in the introduction gives $\gamma(t) = \rho(0) - \rho(t)$, which approaches the variance $\rho(0)$ once the covariance $\rho(t)$ vanishes. The numbers by the points in the plot indicate the number of pairs of spots in each bin. “Ste” means the Matérn model with M. Stein’s parameterization was fitted to the points.
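
The binning described here can be sketched in base R. This is a minimal illustration of the classical method of moments estimator on simulated data, not the gstat implementation; the function `empirical_semivariogram()` and its `width` and `cutoff` arguments are names made up for this sketch.

```r
# Minimal sketch of an empirical semivariogram in base R.
# Simulated data, not the Slide-seq dataset.
set.seed(1)
n <- 300
coords <- cbind(x = runif(n, 0, 100), y = runif(n, 0, 100))
# Spatially smooth signal plus noise, scaled to unit variance
z <- as.numeric(scale(sin(coords[, "x"] / 15) + cos(coords[, "y"] / 15) +
                          rnorm(n, sd = 0.3)))

# Hypothetical helper: bin pairs of points by distance, then average half
# the squared difference of the values within each bin
empirical_semivariogram <- function(coords, z, width = 5, cutoff = 50) {
    d <- as.matrix(dist(coords))
    sq_diff <- outer(z, z, "-")^2
    keep <- upper.tri(d) & d <= cutoff # each pair once, within the cutoff
    bins <- cut(d[keep], breaks = seq(0, cutoff, by = width))
    data.frame(
        dist = as.numeric(tapply(d[keep], bins, mean)),
        gamma = as.numeric(tapply(sq_diff[keep], bins, mean)) / 2,
        np = as.numeric(table(bins)) # number of pairs per bin
    )
}

sv <- empirical_semivariogram(coords, z)
sv
```

The `np` column corresponds to the pair counts printed next to the points in the plot, and `gamma` is small in the first bins and grows with distance when the simulated signal is spatially autocorrelated.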

Nugget is the variance at distance 0, or variance within the first distance bin. The data is scaled by default prior to variogram computation to make the variograms for multiple genes comparable.

Spatial autocorrelation makes the variance smaller at shorter distances. When the variogram levels off, it means that spatial autocorrelation no longer has an effect at this distance. Sill is the variance where the variogram levels off. Range is the distance where the variogram levels off.
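
To make nugget, sill, and range concrete, here is a small sketch using an exponential variogram model. The exponential form is chosen only because it is simple; the vignette itself fits the “Ste” model. The helper `exp_variogram()` and the parameter values are assumptions made for illustration.

```r
# Hypothetical helper: exponential variogram model with a nugget.
# psill is the partial sill, so the sill is nugget + psill.
exp_variogram <- function(h, nugget, psill, range) {
    ifelse(h > 0, nugget + psill * (1 - exp(-h / range)), 0)
}

h <- seq(0, 600, by = 50)
gamma_h <- exp_variogram(h, nugget = 0.2, psill = 0.8, range = 60)
round(gamma_h, 3)
# Just beyond distance 0 the curve jumps to the nugget (0.2), then it
# approaches the sill (0.2 + 0.8 = 1) as distance grows.
```

For this exponential form the sill is only reached asymptotically, so the distance at which the curve is considered level is often taken to be about 3 times the range parameter, where the model reaches roughly 95% of the partial sill.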

Among the first 4 genes, IGHG3 and IGKC seem to have stronger spatial autocorrelation that dissipates in 100 to 200 units (whether it’s microns or pixels is unclear from the publication), whereas spatial autocorrelation of B2M and MT-RNR1 is much weaker and has a longer length scale.

Here the genes are plotted in space:

plotSpatialFeature(sfe, hvgs[1:4], size = 0.3) & 
    theme_bw() # To show the length units

The length scales of spatial autocorrelation for these genes are quite obvious from just plotting the genes. Then what’s the point of plotting variograms for ESDA? We can also compute variograms for a larger number of genes and cluster the variograms for patterns in spatial autocorrelation length scales, or compare variograms of the same genes across different samples. Here we cluster the variograms for top highly variable genes (HVGs):

The BLUSPARAM argument is used to specify methods of clustering, as implemented in the bluster package. Here we use hierarchical clustering.

clusts <- clusterVariograms(sfe, hvgs, BLUSPARAM = HclustParam())
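
As a rough illustration of what clustering variograms means, the sketch below applies base R hierarchical clustering to simulated variogram curves evaluated at a common set of distances. It may differ from what clusterVariograms() does internally; `exp_curve()` and the curve names are hypothetical.

```r
# Sketch: cluster variogram curves with base R hierarchical clustering.
# exp_curve() is a hypothetical helper returning a curve with sill 1.
exp_curve <- function(h, nugget, range) {
    nugget + (1 - nugget) * (1 - exp(-h / range))
}

h <- seq(25, 800, by = 25)
# Each row is one simulated "gene": its variogram at the same distances
curves <- rbind(
    short1 = exp_curve(h, nugget = 0.30, range = 50),
    short2 = exp_curve(h, nugget = 0.35, range = 60),
    long1  = exp_curve(h, nugget = 0.30, range = 400),
    long2  = exp_curve(h, nugget = 0.35, range = 450),
    flat1  = exp_curve(h, nugget = 0.95, range = 300),
    flat2  = exp_curve(h, nugget = 0.90, range = 300)
)

# Euclidean distance between curves, then Ward linkage
hc <- hclust(dist(curves), method = "ward.D2")
cluster <- cutree(hc, k = 3)
cluster
```

Curves with similar nugget and range end up in the same cluster regardless of where the corresponding features are expressed, since only the shape of the variogram enters the distance.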

Then plot the clusters:

plotVariogram(sfe, hvgs, color_by = clusts, group = "feature", use_lty = FALSE,
              show_np = FALSE)

It seems that there are many genes, like MT-RNR1, with weak spatial autocorrelation over longer length scales, genes with stronger and shorter range spatial autocorrelation (around 150 to 200 units) like IGKC, and genes with somewhat longer length scale of spatial autocorrelation (around 400 units).

Plot one gene from each cluster in space:

genes_clusts <- clusts |> 
    group_by(cluster) |> 
    slice_head(n = 1) |> 
    pull(feature)

plotSpatialFeature(sfe, genes_clusts, size = 0.3)

MT-RNR1 is more widely expressed. IGKC and IGHG3 are restricted to smaller areas, and IGHM is restricted to even smaller areas. Note that genes with variograms in the same cluster don’t have to be co-expressed; they only need to have similar length scales and strengths of spatial autocorrelation.

Anisotropy

Anisotropy means that spatial autocorrelation differs depending on direction. An example is the cerebral cortex, which has a layered structure. The variogram can be computed in different directions.

Anisotropic variogram

The directions in which to compute variograms can be explicitly specified in the alpha argument. However, since gstat does not fit anisotropic variograms, the model is fitted to all directions and the empirical variograms at each angle are plotted separately. Here we compute anisotropic variograms for the 4 genes above:

sfe <- runUnivariate(sfe, "variogram", genes_clusts, alpha = c(0, 45, 90, 135),
                     # To avoid overwriting the omnidirectional variogram results
                     name = "variogram_anis", model = "Ste", 
                     BPPARAM = SnowParam(2))
#> gstat does not fit anisotropic variograms. Variogram model is fitted to the whole dataset.
#> Warning: <anonymous>: ... may be used in an incorrect context: 'fun(x[i, ], ...)'

plotVariogram(sfe, genes_clusts, group = "angle", name = "variogram_anis",
              show_np = FALSE)

Here the line is the variogram model fitted to all directions and the text describes this model. The points show the empirical variograms at the different angles in different colors. Zero degrees points north (up), and the angles increase clockwise.
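
The angle convention can be sketched in base R. The helpers `pair_direction()` and `assign_direction()` are hypothetical and only illustrate the geometry (clockwise from north, with opposite directions treated as the same lag); they are not the gstat code, which also applies an angular tolerance.

```r
# Sketch: direction of the lag between two points, in degrees clockwise
# from north (positive y). Folded into [0, 180) because the lag from A to B
# and the lag from B to A lie along the same axis.
pair_direction <- function(dx, dy) {
    (atan2(dx, dy) * 180 / pi) %% 180
}

pair_direction(dx = 0, dy = 1)   # due north
pair_direction(dx = 1, dy = 1)   # north-east
pair_direction(dx = 1, dy = 0)   # due east
pair_direction(dx = 1, dy = -1)  # south-east, same axis as north-west

# Assign each pair direction to the nearest requested angle, measuring the
# difference on the half circle so that 170 degrees is close to 0 degrees
assign_direction <- function(angle, alpha) {
    diff <- abs(outer(angle, alpha, "-"))
    diff <- pmin(diff, 180 - diff)
    alpha[apply(diff, 1, which.min)]
}

alpha <- c(0, 45, 90, 135)
assign_direction(c(10, 170, 100), alpha)
```

With 4 evenly spaced angles, every pair of spots falls within 22.5 degrees of one of them, so each pair contributes to exactly one directional variogram in this simplified scheme.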

Variogram map

The variogram map is another way to visualize spatial autocorrelation in different directions. It bins the lags by distance in x and distance in y, giving a grid of lag bins in which the variance is computed. Just like in the variograms above, the origin usually has a low value, because spatial autocorrelation reduces the variance at short distances. The values increase with distance from the origin, but they can increase more quickly in some directions than in others. Here we compute variogram maps for the 4 genes above:

sfe <- runUnivariate(sfe, "variogram_map", genes_clusts, width = 100, 
                     cutoff = 800, BPPARAM = SnowParam(2), name = "variogram_map2")
#> Warning: <anonymous>: ... may be used in an incorrect context: 'fun(x[i, ], ...)'

The width argument is the width of the bins, and cutoff is the maximum distance.

plotVariogramMap(sfe, genes_clusts, name = "variogram_map2")
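
The idea behind the variogram map can be sketched in base R on simulated data. This is an illustration under simplifying assumptions, not the gstat implementation: the helper `variogram_map()` is hypothetical, and its bin edges are placed at multiples of `width`, which may not match how gstat centers its bins.

```r
# Sketch of a variogram map: semivariance binned by lag in x and lag in y.
set.seed(2)
n <- 400
coords <- cbind(x = runif(n, 0, 1000), y = runif(n, 0, 1000))
# Anisotropic signal: varies along x only, plus noise
z <- as.numeric(scale(sin(coords[, "x"] / 100) + rnorm(n, sd = 0.3)))

variogram_map <- function(coords, z, width = 100, cutoff = 800) {
    dx <- outer(coords[, 1], coords[, 1], "-")
    dy <- outer(coords[, 2], coords[, 2], "-")
    half_sq <- outer(z, z, "-")^2 / 2
    # All ordered pairs i != j within the cutoff in both x and y, so the
    # map is symmetric about the origin
    keep <- row(dx) != col(dx) & abs(dx) < cutoff & abs(dy) < cutoff
    breaks <- seq(-cutoff, cutoff, by = width)
    bin_x <- cut(dx[keep], breaks, include.lowest = TRUE)
    bin_y <- cut(dy[keep], breaks, include.lowest = TRUE)
    tapply(half_sq[keep], list(dx = bin_x, dy = bin_y), mean)
}

vmap <- variogram_map(coords, z)
dim(vmap)
# Rows index the lag in x, columns the lag in y
image(vmap, xlab = "dx bin", ylab = "dy bin")
```

Because the simulated signal only changes along x, the values grow quickly with the lag in x and stay low along the y axis of the map, which is the kind of directional pattern the variogram map is meant to reveal.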

Cross variogram

The cross variogram is used in cokriging, which uses multiple variables in the spatial interpolation model. The cross variogram is defined as

$\gamma(t) = \frac 1 2 \mathrm{Cov}(X_t - X_0, Y_t - Y_0),$

where $Y$ is another variable. The cross variogram also has a nugget, sill, and range. It shows how the covariance between two variables changes with distance. Voyager supports multiple bivariate spatial methods, and the cross variogram is one of them. Just like for univariate spatial methods, Voyager provides a uniform user interface for bivariate methods. However, bivariate global results can’t be stored in the SFE object at present because they tend to have very different output formats (e.g. a correlation matrix for Lee’s L and a list for most other methods), some of which may not be straightforward to store in the SFE object.

cross_v <- calculateBivariate(sfe, "cross_variogram", 
                              feature1 = "IGKC", feature2 = "IGHG3")

plotCrossVariogram(cross_v, show_np = FALSE)

The facets are shown in a matrix, whose diagonal is the variogram for each gene, and off diagonal entries are cross variograms. Here for IGKC and IGHG3, the length scale of the covariance is similar to that of spatial autocorrelation.
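
The definition of the cross variogram translates into a simple empirical estimator: within each distance bin, average half the product of the differences in the two variables. The base R sketch below uses simulated data; `cross_semivariogram()` is a hypothetical helper, not the function Voyager or gstat calls.

```r
# Sketch of an empirical cross variogram in base R on simulated data.
set.seed(3)
n <- 300
coords <- cbind(x = runif(n, 0, 100), y = runif(n, 0, 100))
# Two variables sharing one spatial signal, each with independent noise
shared <- sin(coords[, "x"] / 15) + cos(coords[, "y"] / 15)
x <- as.numeric(scale(shared + rnorm(n, sd = 0.3)))
y <- as.numeric(scale(shared + rnorm(n, sd = 0.3)))

cross_semivariogram <- function(coords, x, y, width = 5, cutoff = 50) {
    d <- as.matrix(dist(coords))
    cross_prod <- outer(x, x, "-") * outer(y, y, "-")
    keep <- upper.tri(d) & d <= cutoff # each pair once, within the cutoff
    bins <- cut(d[keep], breaks = seq(0, cutoff, by = width))
    data.frame(
        dist = as.numeric(tapply(d[keep], bins, mean)),
        gamma = as.numeric(tapply(cross_prod[keep], bins, mean)) / 2
    )
}

cv <- cross_semivariogram(coords, x, y)
# Passing the same variable twice recovers the ordinary semivariogram,
# which is what the diagonal facets show
sv <- cross_semivariogram(coords, x, x)
cv
```

Unlike the ordinary semivariogram, the cross variogram can be negative, for instance when the two variables change in opposite directions over a given distance.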

There is also a cross variogram map to show the cross variogram in different directions:

cross_v_map <- calculateBivariate(sfe, "cross_variogram_map",
                                  feature1 = "IGKC", feature2 = "IGHG3",
                                  width = 100, cutoff = 800)

plotCrossVariogramMap(cross_v_map)

Session Info

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] dplyr_1.1.4                    bluster_1.16.0                
#>  [3] BiocParallel_1.40.0            scran_1.34.0                  
#>  [5] scater_1.34.0                  ggplot2_3.5.1                 
#>  [7] scuttle_1.16.0                 SingleCellExperiment_1.28.1   
#>  [9] SummarizedExperiment_1.36.0    Biobase_2.66.0                
#> [11] GenomicRanges_1.58.0           GenomeInfoDb_1.42.0           
#> [13] IRanges_2.40.0                 S4Vectors_0.44.0              
#> [15] BiocGenerics_0.52.0            MatrixGenerics_1.18.0         
#> [17] matrixStats_1.4.1              SFEData_1.8.0                 
#> [19] Voyager_1.8.1                  SpatialFeatureExperiment_1.9.4
#> 
#> loaded via a namespace (and not attached):
#>   [1] splines_4.4.2             bitops_1.0-9             
#>   [3] filelock_1.0.3            tibble_3.2.1             
#>   [5] R.oo_1.27.0               xts_0.14.1               
#>   [7] lifecycle_1.0.4           sf_1.0-19                
#>   [9] edgeR_4.4.0               lattice_0.22-6           
#>  [11] MASS_7.3-61               magrittr_2.0.3           
#>  [13] limma_3.62.1              sass_0.4.9               
#>  [15] rmarkdown_2.29            jquerylib_0.1.4          
#>  [17] yaml_2.3.10               metapod_1.14.0           
#>  [19] sp_2.1-4                  RColorBrewer_1.1-3       
#>  [21] DBI_1.2.3                 multcomp_1.4-26          
#>  [23] abind_1.4-8               spatialreg_1.3-5         
#>  [25] zlibbioc_1.52.0           purrr_1.0.2              
#>  [27] R.utils_2.12.3            RCurl_1.98-1.16          
#>  [29] TH.data_1.1-2             rappdirs_0.3.3           
#>  [31] sandwich_3.1-1            GenomeInfoDbData_1.2.13  
#>  [33] ggrepel_0.9.6             irlba_2.3.5.1            
#>  [35] terra_1.7-83              units_0.8-5              
#>  [37] RSpectra_0.16-2           dqrng_0.4.1              
#>  [39] pkgdown_2.1.1             DelayedMatrixStats_1.28.0
#>  [41] codetools_0.2-20          DropletUtils_1.26.0      
#>  [43] DelayedArray_0.32.0       gstat_2.1-2              
#>  [45] tidyselect_1.2.1          UCSC.utils_1.2.0         
#>  [47] memuse_4.2-3              farver_2.1.2             
#>  [49] viridis_0.6.5             ScaledMatrix_1.14.0      
#>  [51] BiocFileCache_2.14.0      jsonlite_1.8.9           
#>  [53] BiocNeighbors_2.0.0       e1071_1.7-16             
#>  [55] survival_3.7-0            systemfonts_1.1.0        
#>  [57] tools_4.4.2               ggnewscale_0.5.0         
#>  [59] ragg_1.3.3                snow_0.4-4               
#>  [61] Rcpp_1.0.13-1             glue_1.8.0               
#>  [63] gridExtra_2.3             SparseArray_1.6.0        
#>  [65] xfun_0.49                 EBImage_4.48.0           
#>  [67] HDF5Array_1.34.0          withr_3.0.2              
#>  [69] BiocManager_1.30.25       fastmap_1.2.0            
#>  [71] ggh4x_0.2.8               boot_1.3-31              
#>  [73] rhdf5filters_1.18.0       fansi_1.0.6              
#>  [75] spData_2.3.3              digest_0.6.37            
#>  [77] rsvd_1.0.5                mime_0.12                
#>  [79] R6_2.5.1                  textshaping_0.4.0        
#>  [81] colorspace_2.1-1          wk_0.9.4                 
#>  [83] LearnBayes_2.15.1         jpeg_0.1-10              
#>  [85] RSQLite_2.3.8             R.methodsS3_1.8.2        
#>  [87] intervals_0.15.5          utf8_1.2.4               
#>  [89] generics_0.1.3            data.table_1.16.2        
#>  [91] FNN_1.1.4.1               class_7.3-22             
#>  [93] httr_1.4.7                htmlwidgets_1.6.4        
#>  [95] S4Arrays_1.6.0            spdep_1.3-6              
#>  [97] pkgconfig_2.0.3           scico_1.5.0              
#>  [99] gtable_0.3.6              blob_1.2.4               
#> [101] XVector_0.46.0            htmltools_0.5.8.1        
#> [103] fftwtools_0.9-11          automap_1.1-12           
#> [105] scales_1.3.0              png_0.1-8                
#> [107] SpatialExperiment_1.16.0  knitr_1.49               
#> [109] rjson_0.2.23              spacetime_1.3-2          
#> [111] coda_0.19-4.1             nlme_3.1-166             
#> [113] curl_6.0.1                proxy_0.4-27             
#> [115] cachem_1.1.0              zoo_1.8-12               
#> [117] rhdf5_2.50.0              BiocVersion_3.20.0       
#> [119] KernSmooth_2.23-24        vipor_0.4.7              
#> [121] parallel_4.4.2            AnnotationDbi_1.68.0     
#> [123] desc_1.4.3                s2_1.1.7                 
#> [125] reshape_0.8.9             pillar_1.9.0             
#> [127] grid_4.4.2                vctrs_0.6.5              
#> [129] BiocSingular_1.22.0       dbplyr_2.5.0             
#> [131] beachmat_2.22.0           sfheaders_0.4.4          
#> [133] cluster_2.1.6             beeswarm_0.4.0           
#> [135] evaluate_1.0.1            zeallot_0.1.0            
#> [137] magick_2.8.5              mvtnorm_1.3-2            
#> [139] cli_3.6.3                 locfit_1.5-9.10          
#> [141] compiler_4.4.2            rlang_1.1.4              
#> [143] crayon_1.5.3              labeling_0.4.3           
#> [145] classInt_0.4-10           plyr_1.8.9               
#> [147] ggbeeswarm_0.7.2          fs_1.6.5                 
#> [149] stars_0.6-7               viridisLite_0.4.2        
#> [151] deldir_2.0-4              munsell_0.5.1            
#> [153] Biostrings_2.74.0         tiff_0.1-12              
#> [155] Matrix_1.7-1              ExperimentHub_2.14.0     
#> [157] patchwork_1.3.0           sparseMatrixStats_1.18.0 
#> [159] bit64_4.5.2               Rhdf5lib_1.28.0          
#> [161] KEGGREST_1.46.0           statmod_1.5.0            
#> [163] AnnotationHub_3.14.0      igraph_2.1.1             
#> [165] memoise_2.0.1             bslib_0.8.0              
#> [167] bit_4.5.0

References

Biermann, Jana, Johannes C Melms, Amit Dipak Amin, Yiping Wang, Lindsay A Caprio, Alcida Karz, Somnath Tagore, et al. 2022. “Dissecting the Treatment-Naive Ecosystem of Human Melanoma Brain Metastasis.” Cell 185 (14): 2591–2608.e30.

Lambda Moses

2024-11-23
