MULTISPATI PCA and negative spatial autocorrelation
Lambda Moses
2024-11-23
Source:vignettes/multispati.Rmd
multispati.Rmd
Introduction
Due to the large number of genes quantified in single cell and spatial transcriptomics, dimension reduction is part of the standard workflow to analyze such data, to visualize, to help interpreting the data, to distill relevant information and reduce noise, to facilitate downstream analyses such as clustering and pseudotime, to project different samples into a shared latent space for data integration, and so on.
The first dimension reduction methods we learn about, such as good
old principal component analysis (PCA), tSNE, and UMAP, don’t use
spatial information. With the rise of spatial transcriptomics, some
dimension reduction methods that take spatial dependence into account
have been written. Some, such as SpatialPCA
(Shang and Zhou 2022), NSF
(Townes2023-bi?), and
MEFISTO
(Velten2022-gv?) use factor
analysis or probabilistic PCA which is related to factor analysis, and
model the factors as Gaussian processes, with a spatial kernel for the
covariance matrix, so the factors have positive spatial autocorrelation
and can be used for downstream clustering where the clusters can be more
spatially coherent. Some use graph convolution networks on a spatial
neighborhood graph to find spatially informed embeddings of the cells,
such as conST
(Zong2022-tb?) and
SpaceFlow
(Ren et al. 2022).
SpaSRL
(Zhang et al. 2023)
finds a low dimension projection of spatial neighborhood augmented
data.
Spatially informed dimension reduction is actually not new, and dates
back to at least 1985, with Wartenberg’s crossover of Moran’s I and PCA
(Wartenberg 1985), which was generalized
and further developed as MULTISPATI PCA (Dray,
Saïd, and Débias 2008), implemented in the adespatial
package on CRAN. In short, while PCA tries to maximize the variance
explained by each PC, MULTISPATI maximizes the product of Moran’s I and
variance explained. Also, while all the eigenvalues from PCA are
non-negative, because the covariance matrix is positive semidefinite,
MULTISPATI can give negative eigenvalues, which represent negative
spatial autocorrelation, which can be present and interesting but is not
as common as positive spatial autocorrelation and is often masked by the
latter (Griffith 2019).
In single cell -omics conventions, let denote a gene count matrix whose columns are cells or Visium spots and whose rows are genes, with columns. Let denote the row normalized adjacency matrix of the spatial neighborhood graph of the cells or Visium spots, which does not have to be symmetric. MULTISPATI diagonalizes a symmetric matrix
However, the implementation in adespatial
is more
general and can be used for other multivariate analyses in the duality
diagram paradigm, such as correspondence analysis; the equation
above is simplified just for PCA, without having to introduce the
duality diagram here.
Voyager 1.2.0 (Bioconductor release) has a much faster implementation
of MULTISPATI PCA based on RSpectra
.
See benchmark here.
In this vignette, we perform MULTISPATI PCA on the MERFISH mouse liver dataset. See the first vignette using this dataset here.
Here we load the packages used:
library(Voyager)
library(SFEData)
library(SpatialFeatureExperiment)
library(scater)
library(scran)
library(scuttle)
library(ggplot2)
library(stringr)
library(tidyr)
library(tibble)
library(bluster)
library(BiocSingular)
library(BiocParallel)
library(sf)
library(patchwork)
library(spdep)
set.ZeroPolicyOption(TRUE)
theme_set(theme_bw())
(sfe <- VizgenLiverData())
#> see ?SFEData and browseVignettes('SFEData') for documentation
#> downloading 1 resources
#> retrieving 1 resource
#> loading from cache
#> class: SpatialFeatureExperiment
#> dim: 385 395215
#> metadata(0):
#> assays(1): counts
#> rownames(385): Comt Ldha ... Blank-36 Blank-37
#> rowData names(3): means vars cv2
#> colnames(395215): 10482024599960584593741782560798328923
#> 111551578131181081835796893618918348842 ...
#> 92389687374928708938472537234969690424
#> 96399783859933548456002372694492036651
#> colData names(9): fov volume ... nCounts nGenes
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : center_x center_y
#> imgData names(1): sample_id
#>
#> unit: full_res_image_pixels
#> Geometries:
#> colGeometries: centroids (POINT), cellSeg (POLYGON)
#>
#> Graphs:
#> sample01:
MULTISPATI PCA is one of the multivariate methods introduced in
Voyager
1.2.0. All multivariate methods in
Voyager
are listed here:
listSFEMethods(variate = "multi")
#> name description
#> 1 multispati MULTISPATI PCA
#> 2 localC_multi Multivariate local Geary's C
#> 3 localC_perm_multi Multivariate local Geary's C permutation testing
When calling calculate*variate()
or
run*variate()
, the type
(2nd) argument takes
either an SFEMethod
object or a string that matches an
entry in the name
column in the data frame returned by
listSFEMethods()
.
Quality control
QC was already performed in the first vignette. We do the same QC here, but see the first vignette for more details.
is_blank <- str_detect(rownames(sfe), "^Blank-")
sfe <- addPerCellQCMetrics(sfe, subset = list(blank = is_blank))
get_neg_ctrl_outliers <- function(col, sfe, nmads = 3, log = FALSE) {
inds <- colData(sfe)$nCounts > 0 & colData(sfe)[[col]] > 0
df <- colData(sfe)[inds,]
outlier_inds <- isOutlier(df[[col]], type = "higher", nmads = nmads, log = log)
outliers <- rownames(df)[outlier_inds]
col2 <- str_remove(col, "^subsets_")
col2 <- str_remove(col2, "_percent$")
new_colname <- paste("is", col2, "outlier", sep = "_")
colData(sfe)[[new_colname]] <- colnames(sfe) %in% outliers
sfe
}
sfe <- get_neg_ctrl_outliers("subsets_blank_percent", sfe, log = TRUE)
Remove the outliers and empty cells:
inds <- !sfe$is_blank_outlier & sfe$nCounts > 0
(sfe <- sfe[, inds])
#> class: SpatialFeatureExperiment
#> dim: 385 390348
#> metadata(0):
#> assays(1): counts
#> rownames(385): Comt Ldha ... Blank-36 Blank-37
#> rowData names(4): means vars cv2 subsets_blank
#> colnames(390348): 10482024599960584593741782560798328923
#> 111551578131181081835796893618918348842 ...
#> 92389687374928708938472537234969690424
#> 96399783859933548456002372694492036651
#> colData names(16): fov volume ... total is_blank_outlier
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : center_x center_y
#> imgData names(1): sample_id
#>
#> unit: full_res_image_pixels
#> Geometries:
#> colGeometries: centroids (POINT), cellSeg (POLYGON)
#>
#> Graphs:
#> sample01:
There still are over 390,000 cells left after removing the outliers.
Next we compute Moran’s I for QC metrics, which requires a spatial neighborhood graph:
system.time(
colGraph(sfe, "knn5") <- findSpatialNeighbors(sfe, method = "knearneigh",
dist_type = "idw", k = 5,
style = "W", zero.policy = TRUE)
)
#> user system elapsed
#> 28.869 0.100 28.988
features_use <- c("nCounts", "nGenes", "volume")
sfe <- colDataUnivariate(sfe, "moran.mc", features_use,
colGraphName = "knn5", nsim = 49,
BPPARAM = MulticoreParam(2))
plotMoranMC(sfe, features_use)
Here Moran’s I is a little negative, but from the permutation
testing, it is significant, though that can also be the large number of
cells. The lower bound of Moran’s I given the spatial neighborhood graph
is usually closer to -0.5 than -1, while the upper bound is usually
around 1. The bounds given a specific spatial neighborhood graph can be
found with moranBounds()
, but because it double centers the
adjacency matrix, hence making it dense, there isn’t enough memory to
use it for the entire dataset. But we can look at the Moran bounds of a
small subset of data, which might be generalizable to the whole dataset
given that this tissue appears quite homogeneous in space.
bbox_use <- c(xmin = 6000, xmax = 7000, ymin = 4000, ymax = 5000)
inds2 <- st_intersects(cellSeg(sfe), st_as_sfc(st_bbox(bbox_use)),
sparse = FALSE)[,1]
sfe_sub <- sfe[, inds2]
#> Warning in sn2listw(df, style = style, zero.policy = zero.policy, from_mat2listw = TRUE): no-neighbour observations found, set zero.policy to TRUE;
#> this warning will soon become an error
Note that since SpatialFeatureExperiment v1.8, the sptial graphs are
subsetted rather than reconstructed when the SFE object is subsetted
because reconstruction tends to be more time consuming and the BPPARAM
and BNPARAM arguments can’t be stored and saved by
alabaster.sfe
. Subsetting removes some of the neighbors of
cells near the boundary of this bounding box but the vast majority of
cells still have all 5 nearest neighbors.
(mb <- moranBounds(colGraph(sfe_sub, "knn5")))
#> Imin Imax
#> -0.9623209 1.0779488
The lower bound is quite different from if we reconstruct the knn graph on the subset
colGraph(sfe_sub, "knn5_reconst") <- findSpatialNeighbors(sfe_sub, method = "knearneigh",
k = 5L, dist_type = "idw",
style = "W")
(mb2 <- moranBounds(colGraph(sfe_sub, "knn5_reconst")))
#> Imin Imax
#> -0.6079436 1.0608389
It would be cool to systematically investigate the effects of perturbations to the spatial neighborhood graph on Moran’s I and other spatial statistics.
So considering the bounds, the Moran’s I values of the QC metrics are more like
setNames(colFeatureData(sfe)[c("nCounts", "nGenes", "volume"),
"moran.mc_statistic_sample01"] / mb2["Imin"],
features_use)
#> nCounts nGenes volume
#> 0.17839356 0.15168017 0.03211427
whose magnitudes seem more substantial for nCounts
and
nGenes
if it’s positive spatial autocorrelation. So there
may be mild to moderate negative spatial autocorrelation.
# Normalize data
sfe <- logNormCounts(sfe)
Hepatic zonation
This dataset comes from a relatively large piece of tissue and we need to zoom into a smaller region to better see the local structures. Here we specify a bounding box.
bbox_use <- c(xmin = 6100, xmax = 7100, ymin = 7500, ymax = 8500)
A portal triad is shown near the top right of this bounding box. The two large vessels on the left and bottom right are central veins. The portal triad consists of the hepatic artery, portal vein which brings blood from the intestine, and bile duct, so it’s more oxygenated. The regions around the central vein is more deoxygenated. The different oxygen and nutrient contents mean that hepatocytes play different metabolic roles in the zones between the portal triad and the central vein. Here we plot some zonation marker genes from (Halpern et al. 2017).
markers <- c("Axin2", "Cyp1a2", "Gstm3", "Psmd4", # Pericentral
"Cyp2e1", "Asl", "Alb", "Ass1", # Monotonic but has intermediate
"Hamp", "Igfbp2", "Cyp8b1", "Mup3", # Non-monotonic
"Arg1", "Pck1", "C2", "Sdhd") # Periportal
Only 3 of these marker genes are present in this dataset. The first two are pericentral (near the central vein), and the last one is periportal (near the portal triad).
plotSpatialFeature(sfe, markers[inds], colGeometryName = "cellSeg",
ncol = 3, bbox = bbox_use)
Besides hepatocytes, the liver also has many endothelial cells and Kupffer cells (macrophages). Marker genes of these cells from (Bonnardel et al. 2019) are plotted to visualize these cell types in space:
# Kuppfer cells
kc_genes <- c("Timd4", "Vsig4", "Clec4f", "Clec1b", "Il18bp", "C6", "Irf7",
"Slc40a1", "Cdh5", "Nr1h3", "Dmpk", "Paqr9", "Pcolce2", "Kcna2",
"Gbp8", "Iigp1", "Helz2", "Cd207", "Icos", "Adcy4", "Slc1a2",
"Rsad2", "Slc16a9", "Cd209f", "Oasl1", "Fam167a")
which(kc_genes %in% rownames(sfe))
#> [1] 9
Only one of the Kupffer cell markers is available in this dataset.
plotSpatialFeature(sfe, kc_genes[9], colGeometryName = "cellSeg",
bbox = bbox_use)
Expression of this gene does not seem very spatially coherent.
# Endothelial cells
lec_genes <- c("Rspo3", "Wnt2", "Wnt9b", "Pcdhgc5", "Ecm1", "Ltbp4", "Efnb2")
(inds_lec <- which(lec_genes %in% rownames(sfe)))
#> [1] 2 6 7
Only 3 of these endothelial cell marker genes are available in this dataset.
plotSpatialFeature(sfe, lec_genes[inds_lec], colGeometryName = "cellSeg",
bbox = bbox_use, ncol = 3)
Wnt2 seems to be more pericentral, while Ltbp4 and Efnb2 seem more periportal.
Some of these marker genes will show up in the top PC loadings non-spatial and spatial PCA.
Non-spatial PCA
First we run non-spatial PCA, to compare to MULTISPATI.
set.seed(29)
system.time(
sfe <- runPCA(sfe, ncomponents = 20, subset_row = !is_blank,
exprs_values = "logcounts",
scale = TRUE, BSPARAM = IrlbaParam())
)
#> user system elapsed
#> 13.947 5.977 12.990
gc()
#> used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells 16563261 884.6 26419770 1411.0 26419770 1411.0
#> Vcells 242231194 1848.1 540607325 4124.6 540592386 4124.4
That’s pretty quick for almost 400,000 cells, but there aren’t that many genes here. Use the elbow plot to see variance explained by each PC:
ElbowPlot(sfe)
Plot top gene loadings in each PC
plotDimLoadings(sfe)
Many of these genes seem to be related to the endothelium. PC1 and PC4 concern the Kupffer cells as well, as the Kupffer cell marker gene Cdh5 has high loading.
Plot the first 4 PCs in space
spatialReducedDim(sfe, "PCA", 4, colGeometryName = "centroids", scattermore = TRUE,
divergent = TRUE, diverge_center = 0)
PC1 and PC4 highlight the major blood vessels, while PC2 and PC3 have less spatial structure. While in the CosMX and Xenium datasets on this website, the top PCs have clear spatial structures despite the absence of spatial information in non-spatial PCA because of clear spatial compartments for some cell types, which does not seem to be the case in this dataset except for the blood vessels. We have seen above that some genes have strong spatial structures.
While PC2 and PC3 don’t seem to have large scale spatial structure, they may have more local spatial structure not obvious from plotting the entire section, so we zoom into a bounding box which shows hepatic zonation.
spatialReducedDim(sfe, "PCA", ncomponents = 4, colGeometryName = "cellSeg",
bbox = bbox_use, divergent = TRUE, diverge_center = 0)
There’s some spatial structure at a smaller scale, perhaps some negative spatial autocorrelation.
MULTISPATI PCA
system.time({
sfe <- runMultivariate(sfe, "multispati", colGraphName = "knn5", nfposi = 20,
nfnega = 20)
})
#> Warning in asMethod(object): sparse->dense coercion: allocating vector of size
#> 1.1 GiB
#> Warning: `listw2sparse()` was deprecated in SpatialFeatureExperiment 1.9.0.
#> ℹ Please use `spatialreg::as_dgRMatrix_listw()` instead.
#> ℹ The deprecated feature was likely used in the Voyager package.
#> Please report the issue at <https://github.com/pachterlab/voyager/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> user system elapsed
#> 14.723 4.987 18.651
Then plot the most positive and most negative eigenvalues. Note that the eigenvalues here are not variance explained. Instead, they are the product of variance explained and Moran’s I. So the most positive eigenvalues correspond to eigenvectors that simultaneously explain more variance and have large positive Moran’s I. The most negative eigenvalues correspond to eigenvectors that simultaneously explain more variance and have negative Moran’s I.
ElbowPlot(sfe, nfnega = 20, reduction = "multispati")
Here the positive eigenvalues drop sharply from PC1 to PC4, and there
is only one very negative eigenvalue which might be interesting, which
is unsurprising given the moderately negative Moran’s I for
nCounts
and nGenes
. However, from the first
MERFISH vignette, none of the genes have very negative Moran’s I.
Perhaps the negative eigenvalue comes from negative spatial
autocorrelation in a gene program or “eigengene” and is not obvious from
individual genes. This is the beauty of multivariate analysis.
What do these components mean? Each component is a linear combination of genes to maximize the product of variance explained and Moran’s I. The second component maximizes this product provided that it’s orthogonal to the first component, and so on. As the loss in variance explained is usually not huge, these components can be considered axes along which spatially coherent groups of spots are separated from each other as much as possible according to expression of the highly variable genes, so in theory, clustering with positive MULTISPATI components should give more spatially coherent clusters. Because of the spatial coherence, MULTISPATI might be more robust to outliers.
plotDimLoadings(sfe, dims = c(1:3, 40), reduction = "multispati")
From gene loadings, PC40 seems to separate endothelial cells and Kupffer cells from hepatocytes.
Plot the these PCs:
spatialReducedDim(sfe, "multispati", components = c(1:3, 40),
colGeometryName = "cellSeg", bbox = bbox_use,
divergent = TRUE, diverge_center = 0)
The first two PCs pick up zoning. PC3 seems to have smaller scale spatial structure. PC”40” (should really be 300 something) is an example of negative spatial autocorrelation in biology. That Kupffer cells and endothelial cells are scattered among hepatocytes may play a functional role. This does not mean that non-spatial PCA is bad. While MULTISPATI tends not to lose too much variance explained in per PC with positive eigenvalues, it identifies co-expressed genes with spatially structured expression patterns. MULTISPATI tells a different story from non-spatial PCA. PCA cell embeddings are often used for downstream analysis. Whether to use MULTISPATI embeddings instead and which or how many PCs to use depend on the questions asked in the further downstream analyses.
Spatial autocorrelation of principal components
Moran’s I
Here we compare Moran’s I for cell embeddings in each non-spatial and MULTISPATI PC:
# non-spatial
sfe <- reducedDimMoransI(sfe, dimred = "PCA", components = 1:20,
BPPARAM = MulticoreParam(2))
# spatial
sfe <- reducedDimMoransI(sfe, dimred = "multispati", components = 1:40,
BPPARAM = MulticoreParam(2))
df_moran <- tibble(PCA = reducedDimFeatureData(sfe, "PCA")$moran_sample01[1:20],
MULTISPATI_pos =
reducedDimFeatureData(sfe, "multispati")$moran_sample01[1:20],
MULTISPATI_neg =
reducedDimFeatureData(sfe,"multispati")$moran_sample01[21:40] |>
rev(),
index = 1:20)
data("ditto_colors")
df_moran |>
pivot_longer(cols = -index, values_to = "value", names_to = "name") |>
ggplot(aes(index, value, color = name)) +
geom_line() +
scale_color_manual(values = ditto_colors) +
geom_hline(yintercept = 0, color = "gray") +
geom_hline(yintercept = mb2, linetype = 2, color = "gray") +
scale_y_continuous(breaks = scales::breaks_pretty()) +
scale_x_continuous(breaks = scales::breaks_width(5)) +
labs(y = "Moran's I", color = "Type", x = "Component")
In MULTISPATI, Moran’s I is high in PC1 and PC2, but then sharply drops. Moran’s I for the PC with the most negative eigenvalues is not very negative, which means the large magnitude of that eigenvalue comes from explaining more variance. However, considering the lower bound of Moran’s I that is around -0.6 instead of -1, the magnitude of Moran’s I for the PC with the most negative eigenvalue is not trivial.
min(df_moran$MULTISPATI_neg) / mb2[1]
#> Imin
#> 0.1374483
Non-spatial PCs are not sorted by Moran’s I; PC5 has surprising large Moran’s I.
spatialReducedDim(sfe, "PCA", component = 5, colGeometryName = "cellSeg",
divergent = TRUE, diverge_center = 0, bbox = bbox_use)
PC5 must be about zonation. Also show a larger scale:
spatialReducedDim(sfe, "PCA", components = 5, colGeometryName = "centroids",
divergent = TRUE, diverge_center = 0, scattermore = TRUE)
Moran scatter plot
Local positive and negative spatial autocorrelation can average out in global Moran’s I. From the zoomed in plots and gene loadings above, some PCs are about endothelial cells. The Moran scatter plot can help discovering more local heterogeneity.
sfe <- reducedDimUnivariate(sfe, "moran.plot", dimred = "PCA", components = 1:6)
plts <- lapply(seq_len(6), function(i) {
moranPlot(sfe, paste0("PC", i), binned = TRUE, hex = TRUE, plot_influential = FALSE)
})
wrap_plots(plts, widths = 1, heights = 1) +
plot_layout(ncol = 3) +
plot_annotation(tag_levels = "1",
title = "Moran scatter plot for non-spatial PCs") &
theme(legend.position = "none")
PCs 1-3 have some fainter clusters outside the main cluster, indicating heterogeneous spatial autocorrelation. Also make Moran scatter plots for MULTISPATI
sfe <- reducedDimUnivariate(sfe, "moran.plot", dimred = "multispati",
components = c(1:5, 40),
# Not to overwrite non-spatial PCA moran plots
name = "moran.plot2")
plts2 <- lapply(c(1:5, 40), function(i) {
moranPlot(sfe, paste0("PC", i), binned = TRUE, hex = TRUE,
plot_influential = FALSE, name = "moran.plot2")
})
wrap_plots(plts2, widths = 1, heights = 1) +
plot_layout(ncol = 3) +
plot_annotation(tag_levels = "1",
title = "Moran scatter plot for MULTISPATI PCs") &
theme(legend.position = "none")
There are some interesting clusters.
Clustering with MULTISPATI PCA
In the standard scRNA-seq data analysis workflow, a k nearest neighbor graph is found in the PCA space, which is then used for graph based clustering such as Louvain and Leiden, which is used to perform differential expression. Spatial dimension reductions can similarly be used to perform clustering, to identify spatial regions in the tissue, as done in (Shang and Zhou 2022; Ren et al. 2022; Zhang et al. 2023). This type of studies often use a manual segmentation as ground truth to compare different methods that identify spatial regions.
The problem with this is that spatial region methods are meant to help us to identify novel spatial regions based on new -omics data, which might reveal what’s previously unknown from manual annotations. If the output from a method doesn’t match manual annotations, it might simply be pointing out a previously unknown aspect of the tissue rather than wrong. Depending on the questions being asked, there can simultaneously be multiple spatial partitions. This happens in geographical space. For instance, there’s land use and neighborhood boundaries, but equally valid are watershed boundaries and types of rock formation. Which one is relevant depends on the questions asked.
Here we perform Leiden clustering with non-spatial and MULTISPATI PCA and compare the results. For the k nearest neighbor graph, I used the default k = 10.
system.time({
set.seed(29)
sfe$clusts_nonspatial <- clusterCells(sfe, use.dimred = "PCA",
BLUSPARAM = NNGraphParam(
cluster.fun = "leiden",
cluster.args = list(
objective_function = "modularity",
resolution_parameter = 1
)
))
})
#> user system elapsed
#> 346.860 0.355 347.243
See if clustering with the positive MULTISPATI PCs give more spatially coherent clusters
system.time({
set.seed(29)
sfe$clusts_multispati <- clusterRows(reducedDim(sfe, "multispati")[,1:20],
BLUSPARAM = NNGraphParam(
cluster.fun = "leiden",
cluster.args = list(
objective_function = "modularity",
resolution_parameter = 1
)
))
})
#> user system elapsed
#> 449.761 0.295 450.072
Plot the clusters in space:
plotSpatialFeature(sfe, c("clusts_nonspatial", "clusts_multispati"),
colGeometryName = "centroids",
scattermore = TRUE) &
guides(colour = guide_legend(override.aes = list(size=2), ncol = 2))
The MULTISPATI clusters do look somewhat more spatially structured than clusters from non-spatial PCA. Also zoom into a small area:
plotSpatialFeature(sfe, c("clusts_nonspatial", "clusts_multispati"),
colGeometryName = "cellSeg",
bbox = bbox_use) &
guides(fill = guide_legend(ncol = 2))
What do these clusters mean? Clusters are supposed to be groups of different spots that are more similar within a group, sharing some characteristics. Non-spatial and MULTISPATI PCA use different characteristics for the clustering. Non-spatial PCA finds genes that are good for telling cell types apart, although those genes may happen to be very spatially structured. Non-spatial clustering aims to find these groups only from gene expression, and cells with similar gene expression can be surrounded by cells of other types in histological space. This is just like mapping Art Deco buildings, which are often near Spanish revival and Beaux Art buildings whose styles are quite different and perform different functions, thus not necessarily forming a coherent spatial region.
In contrast, MULTISPATI’s positive components find genes that must characterize spatial regions in addition to distinguishing between different cell types. Which genes are involved in each MULTISPATI component may be as interesting as the clusters. It would be interesting to perform gene set enrichment analysis, or to interpret this as some sort of spatial patterns of spatially variable genes. This is like mapping when the buildings were built, so Art Deco, Spanish revival, Beaux Art popular in the 1920s and 1930s will end up in the same cluster and form a more spatially coherent region, which can be found in DTLA Historical Core and Jewelry District, and Old Pasadena. Hence non-spatial clustering of spatial data isn’t necessarily bad. Rather, it tells a different story and reveals different aspects of the data from spatial clustering.
Session Info
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] spdep_1.3-6 spData_2.3.3
#> [3] patchwork_1.3.0 sf_1.0-19
#> [5] BiocParallel_1.40.0 BiocSingular_1.22.0
#> [7] bluster_1.16.0 tibble_3.2.1
#> [9] tidyr_1.3.1 stringr_1.5.1
#> [11] scran_1.34.0 scater_1.34.0
#> [13] ggplot2_3.5.1 scuttle_1.16.0
#> [15] SingleCellExperiment_1.28.1 SummarizedExperiment_1.36.0
#> [17] Biobase_2.66.0 GenomicRanges_1.58.0
#> [19] GenomeInfoDb_1.42.0 IRanges_2.40.0
#> [21] S4Vectors_0.44.0 BiocGenerics_0.52.0
#> [23] MatrixGenerics_1.18.0 matrixStats_1.4.1
#> [25] SFEData_1.8.0 Voyager_1.8.1
#> [27] SpatialFeatureExperiment_1.9.4
#>
#> loaded via a namespace (and not attached):
#> [1] splines_4.4.2 bitops_1.0-9
#> [3] filelock_1.0.3 R.oo_1.27.0
#> [5] lifecycle_1.0.4 edgeR_4.4.0
#> [7] lattice_0.22-6 MASS_7.3-61
#> [9] magrittr_2.0.3 limma_3.62.1
#> [11] sass_0.4.9 rmarkdown_2.29
#> [13] jquerylib_0.1.4 yaml_2.3.10
#> [15] metapod_1.14.0 sp_2.1-4
#> [17] RColorBrewer_1.1-3 DBI_1.2.3
#> [19] multcomp_1.4-26 abind_1.4-8
#> [21] spatialreg_1.3-5 zlibbioc_1.52.0
#> [23] purrr_1.0.2 R.utils_2.12.3
#> [25] RCurl_1.98-1.16 TH.data_1.1-2
#> [27] rappdirs_0.3.3 sandwich_3.1-1
#> [29] GenomeInfoDbData_1.2.13 ggrepel_0.9.6
#> [31] irlba_2.3.5.1 terra_1.7-83
#> [33] units_0.8-5 RSpectra_0.16-2
#> [35] dqrng_0.4.1 pkgdown_2.1.1
#> [37] DelayedMatrixStats_1.28.0 codetools_0.2-20
#> [39] DropletUtils_1.26.0 DelayedArray_0.32.0
#> [41] tidyselect_1.2.1 UCSC.utils_1.2.0
#> [43] memuse_4.2-3 farver_2.1.2
#> [45] viridis_0.6.5 ScaledMatrix_1.14.0
#> [47] BiocFileCache_2.14.0 jsonlite_1.8.9
#> [49] BiocNeighbors_2.0.0 e1071_1.7-16
#> [51] survival_3.7-0 systemfonts_1.1.0
#> [53] tools_4.4.2 ggnewscale_0.5.0
#> [55] ragg_1.3.3 Rcpp_1.0.13-1
#> [57] glue_1.8.0 gridExtra_2.3
#> [59] SparseArray_1.6.0 mgcv_1.9-1
#> [61] xfun_0.49 EBImage_4.48.0
#> [63] dplyr_1.1.4 HDF5Array_1.34.0
#> [65] withr_3.0.2 BiocManager_1.30.25
#> [67] fastmap_1.2.0 boot_1.3-31
#> [69] rhdf5filters_1.18.0 fansi_1.0.6
#> [71] digest_0.6.37 rsvd_1.0.5
#> [73] mime_0.12 R6_2.5.1
#> [75] textshaping_0.4.0 colorspace_2.1-1
#> [77] wk_0.9.4 scattermore_1.2
#> [79] LearnBayes_2.15.1 jpeg_0.1-10
#> [81] RSQLite_2.3.8 R.methodsS3_1.8.2
#> [83] hexbin_1.28.5 utf8_1.2.4
#> [85] generics_0.1.3 data.table_1.16.2
#> [87] class_7.3-22 httr_1.4.7
#> [89] htmlwidgets_1.6.4 S4Arrays_1.6.0
#> [91] pkgconfig_2.0.3 scico_1.5.0
#> [93] gtable_0.3.6 blob_1.2.4
#> [95] XVector_0.46.0 htmltools_0.5.8.1
#> [97] fftwtools_0.9-11 scales_1.3.0
#> [99] png_0.1-8 SpatialExperiment_1.16.0
#> [101] knitr_1.49 rjson_0.2.23
#> [103] coda_0.19-4.1 nlme_3.1-166
#> [105] curl_6.0.1 proxy_0.4-27
#> [107] cachem_1.1.0 zoo_1.8-12
#> [109] rhdf5_2.50.0 BiocVersion_3.20.0
#> [111] KernSmooth_2.23-24 vipor_0.4.7
#> [113] parallel_4.4.2 AnnotationDbi_1.68.0
#> [115] desc_1.4.3 s2_1.1.7
#> [117] pillar_1.9.0 grid_4.4.2
#> [119] vctrs_0.6.5 dbplyr_2.5.0
#> [121] beachmat_2.22.0 sfheaders_0.4.4
#> [123] cluster_2.1.6 beeswarm_0.4.0
#> [125] evaluate_1.0.1 zeallot_0.1.0
#> [127] magick_2.8.5 mvtnorm_1.3-2
#> [129] cli_3.6.3 locfit_1.5-9.10
#> [131] compiler_4.4.2 rlang_1.1.4
#> [133] crayon_1.5.3 labeling_0.4.3
#> [135] classInt_0.4-10 ggbeeswarm_0.7.2
#> [137] fs_1.6.5 stringi_1.8.4
#> [139] viridisLite_0.4.2 deldir_2.0-4
#> [141] munsell_0.5.1 Biostrings_2.74.0
#> [143] tiff_0.1-12 Matrix_1.7-1
#> [145] ExperimentHub_2.14.0 sparseMatrixStats_1.18.0
#> [147] bit64_4.5.2 Rhdf5lib_1.28.0
#> [149] KEGGREST_1.46.0 statmod_1.5.0
#> [151] AnnotationHub_3.14.0 igraph_2.1.1
#> [153] memoise_2.0.1 bslib_0.8.0
#> [155] bit_4.5.0