Basic analysis of 10X example Visium dataset
Lambda Moses
2024-11-22
Source:vignettes/visium_10x.Rmd
visium_10x.Rmd
Introduction
In this introductory vignette for SpatialFeatureExperiment
data representation and Voyager
anlaysis package, we demonstrate a basic exploratory data analysis (EDA)
of spatial transcriptomics data. Basic knowledge of R and SingleCellExperiment
is assumed.
This vignette showcases the packages with a Visium spatial gene expression system dataset, downloaded from the 10X website, in the Space Ranger output format. The technology was chosen due to its popularity, and therefore the availability of numerous publicly available datasets for analysis (Moses2022-xz?).
While Voyager was developed with the goal of facilitating the use of
geospatial methods in spatial genomics, this introductory vignette is
restricted to non-spatial scRNA-seq EDA with the Visium dataset. There
is another
Visium introductory vignette using a dataset in the SFEData
package but not from the 10X website.
Here we load the packages used in this vignette.
library(Voyager)
library(SpatialExperiment)
library(SpatialFeatureExperiment)
library(SingleCellExperiment)
library(ggplot2)
library(scater)
library(scuttle)
library(scran)
library(stringr)
library(patchwork)
library(bluster)
library(rjson)
theme_set(theme_bw())
Here we download the data from the 10X website. This is the unfiltered gene count matrix:
if (!file.exists("visium_ob.tar.gz"))
download.file("https://cf.10xgenomics.com/samples/spatial-exp/2.0.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_raw_feature_bc_matrix.tar.gz",
destfile = "visium_ob.tar.gz")
This is the spatial information:
if (!file.exists("visium_ob_spatial.tar.gz"))
download.file("https://cf.10xgenomics.com/samples/spatial-exp/2.0.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_spatial.tar.gz",
destfile = "visium_ob_spatial.tar.gz")
Decompress the downloaded content:
if (!dir.exists("outs")) {
dir.create("outs")
system("tar -xvf visium_ob.tar.gz -C outs")
system("tar -xvf visium_ob_spatial.tar.gz -C outs")
}
This is what the outs
directory in Space Ranger output
looks like:
list.dirs("outs")
#> [1] "outs" "outs/raw_feature_bc_matrix"
#> [3] "outs/spatial"
In the gene count matrix directory:
list.files("outs/raw_feature_bc_matrix")
#> [1] "barcodes.tsv.gz" "features.tsv.gz" "matrix.mtx.gz"
In the spatial directory:
list.files("outs/spatial")
#> [1] "aligned_fiducials.jpg" "detected_tissue_image.jpg"
#> [3] "scalefactors_json.json" "spatial_enrichment.csv"
#> [5] "tissue_hires_image.png" "tissue_lowres_image.png"
#> [7] "tissue_positions.csv"
The outputs in the spatial directory is explained here on the 10X website.
The tissue_hires_image.png
is a relatively high
resolution image of the tissue, but not full resolution. The
tissue_lowres_image.png
file is a low resolution image of
the tissue, suitable for quick plotting, and is shown here:
The array of dots framing the tissue seen in this image is the
fiducials, used to align the tissue image to the positions of the Visium
spots, so gene expression can be matched to spatial locations. The
alignment of the fiducials is shown in
aligned_fiducials.jpg
. Space Ranger can automatically
detect which spots are in tissue, and these spots are highlighted in
detected_tissue_image.jpg
.
Inside the scalefactors_json.json
file:
fromJSON(file = "outs/spatial/scalefactors_json.json")
#> $tissue_hires_scalef
#> [1] 0.2
#>
#> $tissue_lowres_scalef
#> [1] 0.06
#>
#> $fiducial_diameter_fullres
#> [1] 118.9155
#>
#> $spot_diameter_fullres
#> [1] 73.61433
spot_diameter_fullres
is the diameter of each Visium
spot in the full resolution H&E image in pixels.
tissue_hires_scalef
and tissue_lowres_scalef
are the ratio of the size of the high resolution (but not full
resolution) and low resolution H&E image to the full resolution
image. fiducial_diameter_fullres
is the diameter of each
fiducial spot used to align the spots to the H&E image in pixels in
the full resolution image.
The tissue_positions_list.csv
file contains information
for the coordinates of the spots in the full resolution image and
whether each spot is in tissue (in_tissue
, 1 means yes and
0 means no) as automatically detected by Space Ranger or manually
annotated in the Loupe browser.
head(read.csv("outs/spatial/tissue_positions.csv"))
#> barcode in_tissue array_row array_col pxl_row_in_fullres
#> 1 ACGCCTGACACGCGCT-1 0 0 0 8668
#> 2 TACCGATCCAACACTT-1 0 1 1 8611
#> 3 ATTAAAGCGGACGAGC-1 0 0 2 8554
#> 4 GATAAGGGACGATTAG-1 0 1 3 8498
#> 5 GTGCAAATCACCAATA-1 0 0 4 8441
#> 6 TGTTGGCTGGCGGAAG-1 0 1 5 8384
#> pxl_col_in_fullres
#> 1 1102
#> 2 1200
#> 3 1102
#> 4 1200
#> 5 1102
#> 6 1200
The spatial_enrichment.csv
file has Moran’s I
(presumably for spots in tissue) and its p-value for each gene that is
detected in at least 10 spots and has at least 20 UMIs.
head(read.csv("outs/spatial/spatial_enrichment.csv"))
#> Feature.ID Feature.Name Feature.Type I P.value
#> 1 ENSMUSG00000001023 S100a5 Gene Expression 0.7709048 0
#> 2 ENSMUSG00000019874 Fabp7 Gene Expression 0.6987346 0
#> 3 ENSMUSG00000002985 Apoe Gene Expression 0.6945210 0
#> 4 ENSMUSG00000025739 Gng13 Gene Expression 0.6585750 0
#> 5 ENSMUSG00000090223 Pcp4 Gene Expression 0.6317032 0
#> 6 ENSMUSG00000053310 Nrgn Gene Expression 0.6033600 0
#> Adjusted.p.value Feature.Counts.in.Spots.Under.Tissue
#> 1 0 9019
#> 2 0 13462
#> 3 0 67509
#> 4 0 5260
#> 5 0 45118
#> 6 0 10723
#> Median.Normalized.Average.Counts Barcodes.Detected.per.Feature
#> 1 15.848669 1021
#> 2 20.679932 1170
#> 3 76.635169 1184
#> 4 8.803694 1050
#> 5 25.811125 1133
#> 6 6.075966 898
Here we read the Space Ranger output into R as an SFE object:
(sfe <- read10xVisiumSFE(samples = ".", type = "sparse", data = "raw"))
#> >>> 10X Visium data will be loaded: outs
#> class: SpatialFeatureExperiment
#> dim: 32285 4992
#> metadata(0):
#> assays(1): counts
#> rownames(32285): ENSMUSG00000051951 ENSMUSG00000089699 ...
#> ENSMUSG00000095019 ENSMUSG00000095041
#> rowData names(8): symbol Feature.Type ...
#> Median.Normalized.Average.Counts_sample01
#> Barcodes.Detected.per.Feature_sample01
#> colnames(4992): AAACAACGAATAGTTC-1 AAACAAGTATCTCCCA-1 ...
#> TTGTTTGTATTACACG-1 TTGTTTGTGTAAATTC-1
#> colData names(4): in_tissue array_row array_col sample_id
#> reducedDimNames(0):
#> mainExpName: NULL
#> altExpNames(0):
#> spatialCoords names(2) : pxl_col_in_fullres pxl_row_in_fullres
#> imgData names(4): sample_id image_id data scaleFactor
#>
#> unit: full_res_image_pixel
#> Geometries:
#> colGeometries: spotPoly (POLYGON)
#>
#> Graphs:
#> sample01:
Quality control (QC)
is_mt <- str_detect(rowData(sfe)$symbol, "^mt-")
sfe <- addPerCellQCMetrics(sfe, subsets = list(mito = is_mt))
names(colData(sfe))
#> [1] "in_tissue" "array_row" "array_col"
#> [4] "sample_id" "sum" "detected"
#> [7] "subsets_mito_sum" "subsets_mito_detected" "subsets_mito_percent"
#> [10] "total"
The mouse olfactory bulb is conventionally plotted horizontally. The entire SFE object can be transposed in histologial space to make the olfactory bulb horizontal.
sfe <- SpatialFeatureExperiment::transpose(sfe)
plotSpatialFeature(sfe, c("sum", "detected", "subsets_mito_percent"),
image_id = "lowres", maxcell = 5e4, ncol = 2)
Percentage of mitochondrial counts in spots outside tissue is higher near the tissue, especially on the left.
plotColData(sfe, "sum", x = "in_tissue", color_by = "in_tissue") +
plotColData(sfe, "detected", x = "in_tissue", color_by = "in_tissue") +
plotColData(sfe, "subsets_mito_percent", x = "in_tissue", color_by = "in_tissue") +
plot_layout(guides = "collect")
3 peaks, apparently histologically relevant. Also no obvious outliers.
plotColData(sfe, x = "sum", y = "subsets_mito_percent", color_by = "in_tissue") +
geom_density_2d()
This is unlike scRNA-seq data. Spots not in tissue have a wide range of mitocondrial percentage. Spots in tissue fall into 3 clusters in this plot, seemingly related to histological regions.
sfe_tissue <- sfe[,sfe$in_tissue]
plotColData(sfe_tissue, x = "sum", y = "detected", bins = 75)
#clusters <- quickCluster(sfe_tissue)
#sfe_tissue <- computeSumFactors(sfe_tissue, clusters=clusters)
#sfe_tissue <- sfe_tissue[, sizeFactors(sfe_tissue) > 0]
sfe_tissue <- logNormCounts(sfe_tissue)
dec <- modelGeneVar(sfe_tissue, lowess = FALSE)
hvgs <- getTopHVGs(dec, n = 2000)
Dimension reduction and clustering
sfe_tissue <- runPCA(sfe_tissue, ncomponents = 30, subset_row = hvgs,
scale = TRUE) # scale as in Seurat
ElbowPlot(sfe_tissue, ndims = 30)
names(rowData(sfe_tissue))
#> [1] "symbol"
#> [2] "Feature.Type"
#> [3] "I_sample01"
#> [4] "P.value_sample01"
#> [5] "Adjusted.p.value_sample01"
#> [6] "Feature.Counts.in.Spots.Under.Tissue_sample01"
#> [7] "Median.Normalized.Average.Counts_sample01"
#> [8] "Barcodes.Detected.per.Feature_sample01"
#> [9] "subsets_mito"
plotDimLoadings(sfe_tissue, dims = 1:5, swap_rownames = "symbol", ncol = 3)
Do the clustering to show on the dimension reduction plots
set.seed(29)
colData(sfe_tissue)$cluster <- clusterRows(reducedDim(sfe_tissue, "PCA")[,1:3],
BLUSPARAM = SNNGraphParam(
cluster.fun = "leiden",
cluster.args = list(
resolution_parameter = 0.5,
objective_function = "modularity")))
plotPCA(sfe_tissue, ncomponents = 5, colour_by = "cluster")
plotSpatialFeature(sfe_tissue, features = "cluster",
colGeometryName = "spotPoly", image_id = "lowres")
spatialReducedDim(sfe_tissue, "PCA", ncomponents = 5,
colGeometryName = "spotPoly", divergent = TRUE,
diverge_center = 0, ncol = 2,
image_id = "lowres", maxcell = 5e4)
markers <- findMarkers(sfe_tissue, groups = colData(sfe_tissue)$cluster,
test.type = "wilcox", pval.type = "all", direction = "up")
Significant markers for each cluster can be obtained as follows:
genes_use <- vapply(markers, function(x) rownames(x)[1], FUN.VALUE = character(1))
plotExpression(sfe_tissue, rowData(sfe_tissue)[genes_use, "symbol"], x = "cluster",
colour_by = "cluster", swap_rownames = "symbol")
These genes are interesting to view in spatial context:
plotSpatialFeature(sfe_tissue, genes_use, colGeometryName = "spotPoly", ncol = 2,
swap_rownames = "symbol", image_id = "lowres", maxcell = 5e4)
More spatial analyses on this dataset is performed in an “advanced” version of this vignette.
Session info
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
#> [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
#> [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
#> [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] rjson_0.2.23 bluster_1.16.0
#> [3] patchwork_1.3.0 stringr_1.5.1
#> [5] scran_1.34.0 scater_1.34.0
#> [7] scuttle_1.16.0 ggplot2_3.5.1
#> [9] SpatialExperiment_1.16.0 SingleCellExperiment_1.28.1
#> [11] SummarizedExperiment_1.36.0 Biobase_2.66.0
#> [13] GenomicRanges_1.58.0 GenomeInfoDb_1.42.0
#> [15] IRanges_2.40.0 S4Vectors_0.44.0
#> [17] BiocGenerics_0.52.0 MatrixGenerics_1.18.0
#> [19] matrixStats_1.4.1 Voyager_1.8.1
#> [21] SpatialFeatureExperiment_1.9.4
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 jsonlite_1.8.9
#> [3] wk_0.9.4 magrittr_2.0.3
#> [5] ggbeeswarm_0.7.2 TH.data_1.1-2
#> [7] magick_2.8.5 farver_2.1.2
#> [9] rmarkdown_2.29 fs_1.6.5
#> [11] zlibbioc_1.52.0 ragg_1.3.3
#> [13] vctrs_0.6.5 spdep_1.3-6
#> [15] DelayedMatrixStats_1.28.0 RCurl_1.98-1.16
#> [17] terra_1.7-83 htmltools_0.5.8.1
#> [19] S4Arrays_1.6.0 BiocNeighbors_2.0.0
#> [21] Rhdf5lib_1.28.0 s2_1.1.7
#> [23] SparseArray_1.6.0 rhdf5_2.50.0
#> [25] LearnBayes_2.15.1 sass_0.4.9
#> [27] spData_2.3.3 KernSmooth_2.23-24
#> [29] bslib_0.8.0 htmlwidgets_1.6.4
#> [31] desc_1.4.3 sandwich_3.1-1
#> [33] zoo_1.8-12 cachem_1.1.0
#> [35] igraph_2.1.1 lifecycle_1.0.4
#> [37] pkgconfig_2.0.3 rsvd_1.0.5
#> [39] Matrix_1.7-1 R6_2.5.1
#> [41] fastmap_1.2.0 GenomeInfoDbData_1.2.13
#> [43] digest_0.6.37 colorspace_2.1-1
#> [45] ggnewscale_0.5.0 irlba_2.3.5.1
#> [47] dqrng_0.4.1 RSpectra_0.16-2
#> [49] textshaping_0.4.0 beachmat_2.22.0
#> [51] labeling_0.4.3 fansi_1.0.6
#> [53] spatialreg_1.3-5 httr_1.4.7
#> [55] abind_1.4-8 compiler_4.4.2
#> [57] proxy_0.4-27 withr_3.0.2
#> [59] tiff_0.1-12 BiocParallel_1.40.0
#> [61] viridis_0.6.5 DBI_1.2.3
#> [63] HDF5Array_1.34.0 R.utils_2.12.3
#> [65] MASS_7.3-61 DelayedArray_0.32.0
#> [67] classInt_0.4-10 tools_4.4.2
#> [69] units_0.8-5 vipor_0.4.7
#> [71] beeswarm_0.4.0 R.oo_1.27.0
#> [73] glue_1.8.0 nlme_3.1-166
#> [75] EBImage_4.48.0 rhdf5filters_1.18.0
#> [77] grid_4.4.2 sf_1.0-19
#> [79] cluster_2.1.6 memuse_4.2-3
#> [81] generics_0.1.3 isoband_0.2.7
#> [83] gtable_0.3.6 R.methodsS3_1.8.2
#> [85] class_7.3-22 data.table_1.16.2
#> [87] metapod_1.14.0 ScaledMatrix_1.14.0
#> [89] BiocSingular_1.22.0 sp_2.1-4
#> [91] utf8_1.2.4 XVector_0.46.0
#> [93] ggrepel_0.9.6 pillar_1.9.0
#> [95] limma_3.62.1 splines_4.4.2
#> [97] dplyr_1.1.4 lattice_0.22-6
#> [99] survival_3.7-0 deldir_2.0-4
#> [101] tidyselect_1.2.1 locfit_1.5-9.10
#> [103] sfheaders_0.4.4 knitr_1.49
#> [105] gridExtra_2.3 edgeR_4.4.0
#> [107] xfun_0.49 statmod_1.5.0
#> [109] DropletUtils_1.26.0 stringi_1.8.4
#> [111] UCSC.utils_1.2.0 fftwtools_0.9-11
#> [113] yaml_2.3.10 boot_1.3-31
#> [115] evaluate_1.0.1 codetools_0.2-20
#> [117] tibble_3.2.1 cli_3.6.3
#> [119] systemfonts_1.1.0 munsell_0.5.1
#> [121] jquerylib_0.1.4 Rcpp_1.0.13-1
#> [123] zeallot_0.1.0 coda_0.19-4.1
#> [125] png_0.1-8 parallel_4.4.2
#> [127] pkgdown_2.1.1 jpeg_0.1-10
#> [129] sparseMatrixStats_1.18.0 bitops_1.0-9
#> [131] viridisLite_0.4.2 mvtnorm_1.3-2
#> [133] scales_1.3.0 e1071_1.7-16
#> [135] crayon_1.5.3 scico_1.5.0
#> [137] rlang_1.1.4 cowplot_1.1.3
#> [139] multcomp_1.4-26