4 From the past to the present

4.1 Legacy of the prequel era

The current era continues many of the quests of the prequel era, such as to profile the transcriptome in space, to identify genes with restricted expression, to classify gene expression patterns, to build reference gene expression atlases for model systems, and to infer anatomical regions based on gene expression. While the prequel era also sought to identify cell type markers, this has been taken over by non-spatial transcriptomics, which has been used to identify marker genes to stain for with spatial transcriptomics methods not easily scalable to the whole transcriptome. As already mentioned, (WM)ISH atlases can be understood as an improved alternative to microarray and in situ reporter screens, and the latter can be in turn understood as an improved alternative to enhancer and gene traps. To some extent, current era spatial transcriptomics started as an improved alternative to (WM)ISH atlases, to profile the whole transcriptome in the same cells (Junker et al. 2014; J. H. Lee et al. 2014). On the other hand, part of current era of spatial transcriptomics can be seen as an improvement to bulk microarray or RNA-seq (V. M. Brown et al. 2002; Junker et al. 2014; Ståhl et al. 2016; Luo et al. 1999), and lower throughput single cell biology (Lubeck and Cai 2012; K. H. Chen et al. 2015).

How does the current era relate to the prequel era in general? The current era has undergone massive growth unseen in the prequel era (Figure 4.2). Unlike in the prequel era, current era technologies are typically highly multiplexed to quantify hundreds to thousands of genes if not the transcriptome within the same piece of tissue. While cell segmentation of bright field (WM)ISH images is challenging, in some current era technologies, transcripts can be traced back to the individual cells of origin. Moreover, cost of NGS has greatly decreased, and the most popular current era techniques – LCM followed by RNA-seq, and 10X Visium – rely on NGS to quantify transcripts, thus making it much more efficient to profile transcriptomes in space than with (WM)ISH, let alone enhancer and gene traps.

Again, we may take inspiration from histories of other technologies that have no doubt undergone revolutions to illustrate where we are in what appears to be a revolution in progress in part propelled by NGS and greater computing power. The growing popularity and greatly improved efficiency of current era techniques compared to those of prequel techniques in applications to thousands of genes may be akin to how the safety bicycle relates to the penny-farthing. The advances from the former has rendered the latter virtually obsolete, and nearly all bikes we see on the streets today are much more like the safety bicycle in both appearance and mechanism than the penny-farthing. Since the 1890s, when the safety bicycle became popular, bicycle technologies have drastically improved. However, most histories of cycling do mention the penny-farthing and its ancestors such as the bone shaker, velocipedes (where the “velo” in “velodrome” comes from), and the hobby horse. Some histories mention bicycles propelled by treadles rather than the familiar cranks and a 17th century four wheeled human powered vehicle propelled by pulling ropes from within. Despite these vehicles’ drastically different mechanisms from the modern bicycle, as these are earlier and less successful attempts to achieve the goal of devising a human powered land vehicle that travels faster than walking, which is still one of the primary goals of modern bicycle-related technologies. These histories are really histories of the quest to achieve that goal.

Moreover, when you see a lightening fast high-tech aerodynamic carbon fiber time trial bike tested in an aerospace wind tunnel, the penny-farthing is not to be forgotten, because the former still benefits from legacies from the latter. Roads used to be unpaved and very rough, and in the US, the paved roads, road signs guiding travelers, and interstates originated from advocacy by the League of American Wheelmen (LAW) since 1880, which was the penny-farthing era (Guroff 2016). The same may be said for the UK (Reid 2011). As the automobile replaced the safety bicycle as the favored mode of transportation in the 20th century (another revolution in transportation), drivers are not only benefiting from the better roads advocated by early cyclists but also the pneumatic tire originally popularized by the safety bicycle. Finally, without the legacy of LAW’s advocacy, the modern form of fast road racing for which the high-tech carbon fiber time trial bike is built would not be possible. Today, LAW, which has been renamed League of American Bicyclists (LAB), is still operational as a cycling advocacy group.

On the one hand, just like the penny-farthing, which are now obsolete except in some hobbyist niches, prequel techniques can be seen as earlier and less successful attempts to achieve the goal to profile expression of as many genes as possible while preserving spatial context in tissue, less successful attempts whose disadvantages are addressed in newer and more successful attempts. While enhancer and gene traps and ISH atlases never completely died off (Figure 2.3), there is no doubt that to profile expression of larger number of genes in new studies, prequel techniques have by and large been replaced by current era techniques. On the other hand, just like LAW/B, the legacies of prequel spatial transcriptomics directly benefit the current era.

How has the prequel era influenced the current era? The direct influence does not seem profound overall when considering all biological systems studied, but is nevertheless sizable. For mouse brain studies, the Allen Brain Institute and its ABA do seem to have a bigger influence. The most obvious institutional continuation between the prequel and current eras is the Allen Brain Institute, which used ISH for the mouse ISH atlases, LCM and microarray for the human and macaque atlases, and generated bulk and scRNA-seq datasets as part of the atlases. Allen scRNA-seq data is often used to benchmark computational methods to map dissociated scRNA-seq cells to spatial locations in tissue and/or to impute gene expression in space (the two related tasks are collectively called spatial reconstruction of scRNA-seq here), with STARmap (X. Wang et al. 2018), osmFISH (Codeluppi et al. 2018), MERFISH (M. Zhang et al. 2020), and/or Visium (Dou et al. 2020) mouse cortex data as the spatial reference (Stuart et al. 2019; Cang and Nie 2020; Abdelaal et al. 2020; Okochi et al. 2021; Dou et al. 2020); this is an institutional legacy from the prequel era. Another prequel era institutional legacy is the Jackson Lab (JAX), home of the prequel GXD, and where many lab mice come from. JAX has also contributed to the current era with the recent Visium mouse urinary bladder atlas (Baker et al. 2021). The data might soon be available for online exploration with cellxgene on the JAX Single Cell Portal but is not yet available as of writing. However, for the most part, as shown later in this chapter, prequel and current era data collection techniques were developed and used in distinct institutions, suggesting that the two eras are largely sociologically distinct (Figure 4.12). This is not surprising given that different techniques in the current era are also often developed and used in largely distinct institutions (e.g. 5.27).

The influence is mainly usage of prequel resources in current era data analysis, mostly in spatial reconstruction of scRNA-seq data and cross referencing to validate or interpret computational results. As already mentioned in Chapter 2, early scRNA-seq spatial reconstruction methods used binarized prequel style WMISH atlases for zebrafish embryos (Seurat v1) and Platynereis and whole mount FISH atlas BDTNP for Drosophila as spatial references. Thereafter the Seurat v1 zebrafish WMISH atlas and BDTNP have been used to benchmark several new spatial reconstruction methods (Cang and Nie 2020; G. Sharma et al. 2020; Okochi et al. 2021), including methods developed for the DREAM challenge to map cells to locations with smaller number of informative genes (Tanevski et al. 2020; Alonso, Carrea, and Diambra 2020; Pham et al. 2020). However, such benchmarks do not seem to indicate interest in studying the biology of zebrafish and Drosophila development in 3D, as the purpose of such benchmark is more to validate computational methods than to perform biological inferences. Furthermore, zebrafish and Drosophila only take up very small proportions of all current era studies compared to mouse and human (Figure 4.4). For the mouse brain, the ABA mouse ISH atlas has been used as the spatial reference to quantitatively map scRNA-seq cell types to spatial locations (Zeisel et al. 2018). The Allen developing mouse brain ISH atlas was also used as spatial reference to map human brain organoid scRNA-seq cells to space and mouse developmental stages for interpretation (Fleck et al. 2021). Here, unlike the WMISH atlases and BDTNP, the ABA is not binarized before spatial mapping.

With staining for around 20,000 genes, the ABA is more frequently used to qualitatively confirm that the computationally imputed gene expression patterns recapitulate the ISH staining of the same genes (Stuart et al. 2019; Welch et al. 2019; Abdelaal et al. 2020; Korsunsky et al. 2019; Biancalani et al. 2020). EMAGE eMouseAtlas (Armit et al. 2017) has been used to qualitative validate Geo-seq (G. Peng et al. 2016) and DBiT-seq (Y. Liu, Yang, et al. 2020) results, but usage of EMAGE is rare in the current era. In current era ST and Visium, an H&E image of the tissue accompanies the spatial transcriptome. The H&E image of mouse brains has been used to manually or computationally align the dataset to the Allen Mouse Brain Common Coordinate Framework (CCF) (G. Wang et al. 2020) to integrate ABA’s brain anatomical ontologies to new datasets to facilitate interpretation of the data (Ortiz et al. 2020; W.-T. Chen et al. 2020; Biancalani et al. 2020). Even without H&E, Allen ontologies have been used to manually annotate HybISS data from the developing mouse brain based on marker gene expression (Manno et al. 2021). In the mouse primary motor cortex (MOp) MERFISH atlas (M. Zhang et al. 2020), Allen CCF was used to select the MOp region.

There are over 15 extant mouse databases from the prequel era (Figure 2.9), yet ABA is exceptional in its impact on the current era. We have never seen any mention of other prequel mouse databases, such as Eurexpress (Diez-Roux et al. 2011), and GenePaint (Yaylaoglu et al. 2005) in current era literature. This may be due to the following reasons: First, the ABA is the most comprehensive prequel atlas for the adult mouse brain, with around 20,000 genes for adult mice (P56). As of August 2021, EMAGE has ISH images for 17,554 genes, Eurexpress has 19,440 (that’s the number of assays, but it seems that each gene typically has one assay, so it should be close to the number of genes), and GenePaint contains Eurexpress data and can query ISH data from several other databases including ABA. EMAGE covers a wide range of developmental stages, from E0.5 to E18, but not later stages and adults. Eurexpress only covers E14.5. The Allen developing mouse brain atlas covers from E11.5 to P28, though only with about 2000 genes. When the ABA is used in the current era, most of the time the adult mouse atlas is used.

Second, the ABA has much better infrastructure to facilitate quantitative analyses of the atlas than the other prequel mouse atlases. Both EMAGE and Eurexpress have detailed annotation of ISH results for many genes and allow searching for genes with similar expression patterns. In addition, Eurexpress shows ISH for many consecutive sections in 3D, and EMAGE has 3D histology models (for morphology rather than gene expression) at different developmental stages. In addition to these functionalities, ABA quantified ISH staining and registered the quantified ISH to the CCF, so just like in scRNA-seq, each voxel would have a vector of gene expression values. Usage of ABA in the current era mentioned above would not be possible without the CCF. ABA also has an application programming interface (API) to automate retrieval of such quantitative data for analyses suitable for the quantitative nature of the current era. In contrast, we are unaware of such quantification, registration, and API in other prequel mouse atlases, thus restricting their uses to be more qualitative. A similar pattern can be seen in Drosophila prequel databases. BDTNP registered staining from thousands of embryos stained for different genes onto a common coordinate system. BDTNP data could also be easily downloaded as csv-like files that can be easily parsed, though as of August 2021, the BDTNP website is not responsive. As seen in 3, a reason why FlyExpress was commonly used was that images for different genes were registered in FlyExpress.

4.2 Metadata of the current era

The current era started with LCM followed by microarray in 1999 (Luo et al. 1999). Due to the immense popularity of LCM followed by microarray or RNA-seq, the body of LCM literature is too vast for unbiased and comprehensive manual curation, so the curated database does not include most LCM literature, which was instead collected from a PubMed search and text mined (Figure 6.3, Chapter 6). Because the search results—without manual inspection and curation—may contain irrelevant entries and miss relevant ones, they are separated from the curated database in our analyses. Current era literature in the curated database is classified into Microdissection, smFISH, ISS, Array, and No Imaging, to be defined in detail in their corresponding sections below.

Table 4.1: Summary of spatial transcriptomics techniques in the current era
Method	First published	Category	Max # genes	Min spot diameter (\(\mu\)m)
voxelation	2002-01-31	ROI selection	Tx wide	NA
PA-GFP	2010-11-11	ROI selection	Tx wide	NA
SRM seqFISH	2012-06-02	smFISH	32	single cell
Tomo-array	2012-09-18	ROI selection	Tx wide	NA
iceFISH	2013-02-16	smFISH	20	single cell
ISS	2013-07-14	ISS	256	single cell
Tomo-seq	2013-08-11	ROI selection	Tx wide	NA
bDNA-smFISH	2013-10-05	smFISH	928	single cell
TIVA	2014-01-11	ROI selection	Tx wide	NA
FISSEQ	2014-03-21	ISS	8102	single cell
seqFISH	2014-03-27	smFISH	10421	single cell
MERFISH	2015-04-23	smFISH	4209	single cell
Puzzle Imaging	2015-07-20	De novo	NA	NA
Geo-seq	2016-03-20	ROI selection	Tx wide	NA
corrFISH	2016-06-05	smFISH	10	single cell
ST	2016-06-30	NGS barcoding	Tx wide	100
HCR-seqFISH	2016-10-18	smFISH	249	single cell
punch	2017-06-27	ROI selection	Tx wide	NA
SGA	2017-11-27	smFISH	35	single cell
APEX-RIP	2017-12-14	De novo	NA	NA
Niche-seq	2017-12-21	ROI selection	Tx wide	NA
ExM-MERFISH	2018-03-18	smFISH	10050	single cell
STARmap	2018-07-27	ISS	1020	single cell
Paired-cell sequencing	2018-09-17	De novo	NA	NA
osmFISH	2018-10-29	smFISH	33	single cell
seqFISH+	2019-03-24	smFISH	17856	single cell
slide-seq	2019-03-28	NGS barcoding	Tx wide	10
bDNA-MERFISH	2019-05-24	smFISH	130	single cell
GeoMX DSP	2019-06-20	ROI selection	1825	NA
DNA microscopy	2019-06-27	De novo	NA	NA
APEX-seq	2019-07-11	De novo	NA	NA
INSTA-seq	2019-08-06	ISS	NA	single cell
PARSIFT	2019-09-04	De novo	NA	NA
HDST	2019-09-08	NGS barcoding	Tx wide	2
GaST-seq	2019-10-09	ROI selection	Tx wide	NA
BARseq	2019-10-17	ISS	107	single cell
PIC-seq	2020-03-09	De novo	NA	NA
miRNA nanowell	2020-05-08	NGS barcoding	1	300
split-FISH	2020-06-14	smFISH	317	single cell
Visium	2020-06-21	NGS barcoding	Tx wide	55
ZipSeq	2020-07-05	ROI selection	Tx wide	NA
SMD-seq	2020-08-10	ROI selection	Tx wide	NA
HybISS	2020-09-28	smFISH	199	single cell
DBiT-seq	2020-10-18	NGS barcoding	Tx wide	10
C-FISH	2020-10-22	smFISH	2	single cell
SCRINSHOT	2020-11-19	smFISH	177	single cell
slide-seq2	2020-12-06	NGS barcoding	Tx wide	10
Stereo-seq	2021-01-18	NGS barcoding	Tx wide	0.22
GeoMX WTA	2021-01-24	ROI selection	19963	NA
ExSeq	2021-01-29	ISS	297	single cell
BOLORAMIS	2021-03-08	ISS	96	single cell
Pick-Seq	2021-03-08	ROI selection	Tx wide	NA
nanoneedles	2021-03-09	ROI selection	Tx wide	NA
CISI	2021-04-14	smFISH	37	single cell
STRP-seq	2021-04-18	ROI selection	Tx wide	NA
XYZeq	2021-04-20	NGS barcoding	Tx wide	500
BARseq2	2021-05-10	ISS	65	single cell
ClumpSeq	2021-05-24	De novo	NA	NA
Seq-Scope	2021-06-24	NGS barcoding	Tx wide	0.5
sci-Space	2021-07-01	NGS barcoding	Tx wide	73.2
CIM-seq	2021-07-12	De novo	NA	NA
PIC	2021-07-19	ROI selection	Tx wide	NA
par-seqFISH	2021-08-12	smFISH	150	single cell
SPACECAT	2021-08-16	ROI selection	Tx wide	NA
RNAscope	2021-09-28	smFISH	95	single cell
Molecular Cartography	2021-10-11	smFISH	100	single cell
coppaFISH	2021-10-23	smFISH	72	single cell
EASI-FISH	2021-12-05	smFISH	29	single cell
Halo-seq	2021-12-07	De novo	NA	NA
OpTAG-seq	2021-12-29	ROI selection	Tx wide	NA
MOSAICA	2022-01-09	smFISH	10	single cell
Visium protein	2022-01-19	NGS barcoding	Tx wide	55
LoRNA	2022-01-25	De novo	NA	NA
manual dissection with velocimetry and cell tracking	2022-01-30	ROI selection	Tx wide	NA
SM-Omics	2022-02-09	NGS barcoding	Tx wide	100
FUNseq	2022-02-21	ROI selection	Tx wide	NA
centrifugation on 384 well plate	2022-02-22	ROI selection	Tx wide	NA
Space-TREX	2022-02-23	NGS barcoding	Tx wide	55
MERR APEX-seq	2022-03-03	De novo	NA	NA
vCatFISH	2022-03-15	smFISH	21	single cell
scStereo-seq	2022-05-03	NGS barcoding	Tx wide	0.22
Select-seq	2022-05-08	ROI selection	Tx wide	NA
HybRISS	2022-05-12	smFISH	176	single cell
punch2	2022-06-16	ROI selection	Tx wide	NA
SHM-seq	2022-07-18	NGS barcoding	Tx wide	100
scNaST	2022-07-21	NGS barcoding	Tx wide	NA
PHYTOMap	2022-07-29	smFISH	28	single cell
Matrix-seq	2022-08-04	NGS barcoding	Tx wide	50
xDbit	2022-08-31	NGS barcoding	Tx wide	50
EEL FISH	2022-09-21	smFISH	888	single cell
TEMPOmap	2022-09-27	ISS	998	single cell
CBSST-Seq	2022-10-04	NGS barcoding	Tx wide	50
CosMX	2022-10-06	ROI selection	18878	NA
CosMX	2022-10-06	smFISH	18878	single cell
Light-Seq	2022-10-09	ROI selection	Tx wide	NA
Slide-TCR-seq	2022-10-10	NGS barcoding	Tx wide	10
clampFISH 2.0	2022-10-23	smFISH	10	single cell
Spatial-seq	2022-10-29	ROI selection	Tx wide	NA
sphere-seq	2022-11-01	De novo	NA	NA
STRS	2022-11-02	NGS barcoding	Tx wide	55
PIXEL-seq	2022-11-09	NGS barcoding	Tx wide	1.22
mFISH3D	2022-11-23	smFISH	6	single cell
STOmics-GenX	2022-12-07	NGS barcoding	Tx wide	0.22
Spectrum-FISH	2022-12-12	smFISH	33	single cell
TATTOO-seq	2022-12-13	ROI selection	Tx wide	NA
SPOTS	2023-01-01	NGS barcoding	Tx wide	55
MiP-Seq	2023-01-07	ISS	217	single cell
Decoder-seq	2023-01-15	NGS barcoding	Tx wide	15
USeqFISH	2023-01-25	smFISH	30	single cell
RRST	2023-01-30	NGS barcoding	Tx wide	55
STARmap PLUS	2023-02-02	ISS	2766	single cell
IISS	2023-02-15	ISS	40	single cell
Spatial-CITE-seq	2023-02-22	NGS barcoding	Tx wide	25
Spatial ATAC–RNA-seq	2023-03-14	NGS barcoding	Tx wide	20
Spatial CUT&Tag–RNA-seq	2023-03-14	NGS barcoding	Tx wide	NA
fs-LM	2023-03-15	ROI selection	Tx wide	NA
SiT	2023-03-16	NGS barcoding	Tx wide	55
FISHnCHIPs	2023-04-11	smFISH	674	single cell
RAINBOW-seq	2023-04-13	De novo	NA	NA
Stereo-CITE-seq	2023-04-27	NGS barcoding	Tx wide	NA
electro-seq	2023-04-27	ISS	201	single cell
GeoMX SPG	2023-05-02	ROI selection	Tx wide	NA
MISAR-seq	2023-05-24	NGS barcoding	Tx wide	50
SLCR-seq	2023-06-06	ROI selection	Tx wide	NA
ST-FFPE	2023-06-11	ROI selection	Tx wide	NA
HiFi-Slides	2023-06-12	NGS barcoding	Tx wide	NA
Ex-ST	2023-06-21	NGS barcoding	Tx wide	55
SEEP	2023-06-21	De novo	NA	NA
Well-ST-seq	2023-06-29	NGS barcoding	Tx wide	10
Xenium	2023-06-29	smFISH	5101	single cell
STcEM	2023-07-10	smFISH	287	single cell
Voltage-Seq	2023-07-19	ROI selection	Tx wide	NA
NVIGEN nanobiopsy	2023-07-31	ROI selection	Tx wide	NA
DART-FISH	2023-08-17	smFISH	300	single cell
Curio Seeker	2023-09-01	NGS barcoding	Tx wide	10
SMA	2023-09-03	NGS barcoding	Tx wide	NA
BMKMANU S1000	2023-09-12	NGS barcoding	Tx wide	NA
DRaqL	2023-09-17	ROI selection	Tx wide	NA
DNA-GPS	2023-09-24	NGS barcoding	Tx wide	NA
double-barcode	2023-09-24	NGS barcoding	Tx wide	NA
CDB-seq	2023-10-10	ROI selection	Tx wide	NA
RNAsky	2023-11-07	ISS	23	single cell
SPRINTseq	2023-11-14	ISS	108	single cell
SPTCR-seq	2023-11-15	NGS barcoding	Tx wide	55
SmT	2023-11-19	NGS barcoding	Tx wide	55
scTECH-seq	2023-11-21	De novo	NA	NA
Perturb-FISH	2023-11-30	smFISH	485	single cell
SWITCH-seq	2023-12-04	ISS	NA	single cell
LR-Spatial VDJ	2023-12-07	NGS barcoding	Tx wide	55
SR-Spatial VDJ	2023-12-07	NGS barcoding	Tx wide	55
Slide-tags	2023-12-12	NGS barcoding	Tx wide	10
Slide-tags	2023-12-12	De novo	NA	10
Slide-tags multiome	2023-12-12	NGS barcoding	Tx wide	10
PerturbView	2023-12-26	ISS	377	single cell
Array-seq	2024-01-07	NGS barcoding	Tx wide	30
Raman2RNA	2024-01-09	smFISH	9	single cell
HyperSeq	2024-01-30	NGS barcoding	Tx wide	500
Patho-DBiT	2024-02-08	NGS barcoding	Tx wide	10
ARTseq-FISH	2024-02-20	smFISH	67	single cell
C3PO	2024-03-12	ROI selection	Tx wide	NA
ROInet-seq	2024-04-05	ROI selection	Tx wide	NA
Nova-ST	2024-04-10	NGS barcoding	Tx wide	0.3
BALI	2024-05-22	ROI selection	Tx wide	NA
combFISH	2024-05-22	ISS	64	single cell
HiPlex FISH	2024-05-26	smFISH	20	single cell
MIST	2024-06-03	NGS barcoding	Tx wide	55
MOLseq	2024-06-05	ROI selection	Tx wide	NA
BirthSeq	2024-06-10	ISS	84	single cell
Open-ST	2024-06-25	NGS barcoding	Tx wide	0.6
bacterial-MERFISH	2024-06-27	smFISH	1057	single cell
PRISM	2024-07-02	smFISH	31	single cell
BMK S1000	2024-07-04	NGS barcoding	Tx wide	NA
Pixel-seq	2024-07-04	NGS barcoding	Tx wide	NA
DBiT ARP-seq	2024-07-28	NGS barcoding	Tx wide	NA
DBiT CTRP-seq	2024-07-28	NGS barcoding	Tx wide	NA
IRISeq	2024-08-08	De novo	NA	NA
SCOPE	2024-08-08	De novo	NA	NA
cycleHCR	2024-08-15	smFISH	254	single cell
weMERFISH	2024-08-27	smFISH	495	single cell
2P-NucTag	2024-09-04	ROI selection	Tx wide	NA
MAGIC-seq	2024-09-10	NGS barcoding	Tx wide	15
spatial-Mux-seq	2024-09-19	NGS barcoding	Tx wide	50
3DEEP-FISH	2024-09-24	smFISH	85	single cell
Centrillion	2024-09-27	NGS barcoding	Tx wide	2
ATLAS	2024-10-09	smFISH	6654	single cell
Perturb-DBiT	2024-11-28	NGS barcoding	Tx wide	NA
Visium HD	2024-11-29	NGS barcoding	Tx wide	2
RAM-FISH	2024-12-12	smFISH	33	single cell
Spatial Perturb-Seq	2024-12-20	NGS barcoding	Tx wide	NA
DBiTplus	2024-12-23	NGS barcoding	Tx wide	NA
IN-DEPTH	2025-01-07	ROI selection	Tx wide	NA
Salus-STS	2025-01-18	NGS barcoding	Tx wide	1
Stereo-XCR-seq	2025-01-19	NGS barcoding	Tx wide	NA
CrossSeq	2025-01-23	ROI selection	Tx wide	NA
LISH-seq	2025-01-24	smFISH	130	single cell
SeekSpace	2025-01-26	NGS barcoding	Tx wide	NA
Seq-Scope-X	2025-02-08	NGS barcoding	Tx wide	0.2
GenePS	2025-02-14	smFISH	216	single cell
CITEgeist	2025-02-17	NGS barcoding	Tx wide	NA
spRandom-seq	2025-02-23	NGS barcoding	Tx wide	55
microdissection	2025-02-24	ROI selection	Tx wide	NA
mudRapp Seq	2025-03-26	ISS	NA	single cell
graphene-seq	2025-03-28	ISS	NA	single cell
RAEFISH	2025-03-29	smFISH	23312	single cell
ST-FFPE-mIF	2025-04-10	NGS barcoding	Tx wide	NA
CMS-seq	2025-04-11	ROI selection	Tx wide	NA
MEA-seqX	2025-04-30	NGS barcoding	Tx wide	55
Cartana	2025-05-01	ISS	159	single cell
Nanotweezers	2025-05-06	ROI selection	Tx wide	NA
BCR-MERFISH	2025-05-14	smFISH	589	single cell
PHOTON	2025-05-14	ROI selection	Tx wide	NA
CAP-seq	2025-06-10	De novo	NA	NA
RCA-MERFISH	2025-06-13	smFISH	209	single cell
STAMP-C	2025-06-18	smFISH	6000	single cell
STAMP-M	2025-06-18	smFISH	500	single cell
STAMP-X	2025-06-18	smFISH	5000	single cell
Spatial-DMT	2025-07-04	NGS barcoding	Tx wide	NA
hamFISH	2025-07-29	smFISH	32	single cell
Deep-STARmap	2025-11-25	ISS	1017	single cell

Number of publications over time in the current era. The gray histogram in the background is the overall trend of all current era literature. Each facet highlights a category, ordered chronologically in terms of first report. Bin width is 30 days. Plots in this figure include curated LCM literature, but not the non-curated literature.

Figure 4.1: Number of publications over time in the current era. The gray histogram in the background is the overall trend of all current era literature. Each facet highlights a category, ordered chronologically in terms of first report. Bin width is 30 days. Plots in this figure include curated LCM literature, but not the non-curated literature.

Chronologically, in the curated database, microdissection came first, with voxelation in 2002 (V. M. Brown et al. 2002), followed by smFISH, ISS, no imaging, and NGS barcoding (Figure 4.1). Despite an early start in the midst of the (WM)ISH golden age, if not including non-curated LCM literature, the current era did not really take off until around 2014 (Figure 4.2). Ever since, its has seen drastic growth, far exceeding that of the prequel era in the 1990s and 2000s (Figure 4.2). Growth in microdissection and NGS barcoding seemed to have contributed the most to this overall drastic growth (Figure 4.1). All techniques in the curated database, along with their classification, maximum number of genes, spatial resolution, and references are listed in Table 4.1.

Comparing number of publications over time in the prequel and the current eras. Bin width is 180 days. The x-shaped points show the number of publications from the last bin, which is not yet full.

Figure 4.2: Comparing number of publications over time in the prequel and the current eras. Bin width is 180 days. The x-shaped points show the number of publications from the last bin, which is not yet full.

A timeline of foundational or influential techniques in the current era is shown in Figure 4.3. This is not meant to be a timeline of all current era techniques, but only of techniques that either laid the foundation of popular current era techniques (e.g. Solexa, later Illumina, sequencing) or very influential within a category of techniques (e.g. MERFISH for smFISH based techniques, and ST for NGS barcoding). Just like the “revolution” of current era spatial transcriptomics, each item in the timeline must not be understood as works of the “solitary genius”. Rather, each of the landmark innovations in the timeline occurred in its own historical context, with influences from predecessors, which are not plotted in the timeline for the sake of brevity.

Figure 4.3: Timeline of major techniques related to the current era.

The prequel era started with untargeted screens and grew into atlases and databases striving to be comprehensive. Screens are still a theme in the current era and spatial transcriptomics is still used in untargeted searches for genes involved in development of model organisms, but with highly multiplexed technology, this can also be done for pathological and human tissues (Figure 4.4, Figure 4.5). Thanks to multiplexing, while mouse was the most popular species in the prequel era, in the current era, there are more studies on human tissues than those on mice and the vast majority of studies are on either humans or mice (Figure 4.4). Furthermore, there are datasets for a wider range of organs in mice in the current era (e.g. colon, liver, uterus, and etc.) than in the prequel era though there still is more interest in the brain (Figure 4.6, Figure 2.7).

Drosophila is no longer as commonly used in the current era (Figure 4.4). Whole mount smFISH has been applied Drosophila brains but without multiplexing (Titlow et al. 2022), zebrafish embryos (Oka and Sato 2015), and embryonic mouse organs (C. Wang et al. 2019). For Drosophila tissue sections, while microdissection, smFISH, and ISS may be applied, the resolution of ST and Visium may be too low to discern sufficiently fine patterns in such a small model organism. Besides low resolution Tomo-seq along one body axis (Combs and Eisen 2013; Combs and Fraser 2018), current era Drosophila datasets come from subcellular resolution technologies, such as smFISH on YFP trap lines (whole mount nervous system, not multiplexed) (Titlow et al. 2022), in situ sequencing (retina) (Fürth, Hatini, and Lee 2019), ExSeq (embryo, might be whole mount with tissue clearing and expansion) (Alon et al. 2021), and Stereo-seq (whole embryo sectioned along the anterior-posterior axis) (M. Wang et al. 2022). The reason why Drosophila is less popular may be that the most popular commercial technologies are unsuitable, as Visium (Figure 4.7) has too low a resolution and MERFISH is not whole mount.

Figure 4.4: Number of publication per species.

A) Number of publications for each healthy organ in human (male shown here, as there is no study on healthy female specific organs in humans at present). B) Number of publications for pathological organs in human (female shown here, but there are at least two studies on prostate cancer (Burgess 2019; L. Brady et al. 2021)).

Figure 4.5: A) Number of publications for each healthy organ in human (male shown here, as there is no study on healthy female specific organs in humans at present). B) Number of publications for pathological organs in human (female shown here, but there are at least two studies on prostate cancer (Burgess 2019; L. Brady et al. 2021)).

Atlases have been made with current era technology, such as MERFISH (M. Zhang et al. 2020), HybISS (Manno et al. 2021), ST (Ortiz et al. 2020), Visium (S. Z. Wu et al. 2021), GeoMX DSP (Delorey et al. 2021), and Slide-seq2 (Lake et al. 2021) described and analyzed with similar language to that of (WM)ISH atlases. Also as in the prequel era, the brain is still the most favored healthy organ (Figure 4.5, Figure 4.6). Among pathological tissues, breast tumors are the most used (Figure 4.5). Note that these anatograms only includes organs available in the R package gganatogram. Datasets from organs unavailable in the package are not shown. For metastases, the organ used for plotting here is the destination of metastases, so a liver metastasis of breast cancer would be plotted in the liver. More recently, in the wake of the SARS-CoV-2 pandemic, a number of studies using GeoMX Digital Spatial Profiler (DSP) to profile spatial transcriptomes of lungs of COVID victims have been published (Jiwoon Park et al. 2021; Butler et al. 2021; Delorey et al. 2021; Margaroli et al. 2021).

Figure 4.6: A) Number of publications per healthy organ in the mouse. B) Number of publications for pathological organs in mouse.

However, unlike in the prequel era, in which older technologies were adapted to larger scale to produce the screens and atlases, the current era has another major theme – new techniques, due to the challenges to be discussed in the following sections; the number of new techniques published each year has grown steadily in the past few years (Figure 4.9). However, this difference might be due to bias in curation and change in culture. In the prequel era, very different enhancer and gene trap vectors were lumped together into enhancer or gene trap in our database, and there might have been many different early non-radioactive ISH protocols not included in our database because they were not used to profile a sufficiently large number of genes. Furthermore, in the current era, authors like to give techniques new names, making related techniques seem distinct rather than lumped together in a wider category like enhancer or gene trap.

While a few techniques other than LCM have become popular, such as ISS (2013), Tomo-seq (2013), MERFISH (2015), ST (late 2016), GeoMX DSP (2019, only showing transcriptomics studies here), and Visium (first preprint in 2020), most techniques never or rarely spread beyond their institutions of origin (Figures 4.7, 4.9). Furthermore, except for Visium and LCM, prequel (WM)ISH, enhancer trap, and gene trap have been used by more institutions than current era techniques (4.8). This might be because there has not been enough time for recently published new techniques to be implemented elsewhere, or if they have been implemented, there has not been enough time for the relevant studies to be published. Or that there has been much less time for relatively new commercial techniques like GeoMX DSP to spread to more institutions compared to (WM)ISH. Furthermore, usage of Visium and GeoMX DSP might have been spread by commercialization and core facilities and usage of Tomo-seq might have been spread by relative ease of implementation with standard lab equipment; implementing complex current era techniques that require custom built equipment such as custom fluidics systems independently may be more challenging, thus hampering their widespread adoption. This is analogous to a well-tested and fool proof commercial cake mix widely available at grocery stores that only calls for standard kitchen equipment such as the oven and the hand mixer as opposed to a cake recipe that is not only very complicated but also requires the home cook to build custom kitchen equipment. Even if instructions to assemble the custom equipment is available, most people would probably prefer to buy the pre-assembled product when feasible. The average home cook would most likely prefer the former to the latter. Having a core facility perform a procedure is like ordering a cake from a bakery, which is much easier than DIY trials and errors and building custom equipment.

Figure 4.7: Techniques used by at least 3 institutions and the number of institutions that have used them.

Figure 4.8: Prequel techniques used by at least 3 institutions and the number of institutions that have used them.

Protocols of WMISH (as used in GEISHA) (Bell, Yatskievych, and Antin 2004), ISH (as used in GenePaint and ABA) (Yaylaoglu et al. 2005), Visium, and MERFISH (J. R. Moffitt and Zhuang 2016) all have numerous steps. What (WM)ISH and Visium seem to have in common besides that they are widely adopted is that a significant part of the protocol is taken care of, by commercial automated systems (for (WM)ISH) or core facilities (for Visium), so there is less DIY hassle. Commercial automated ISH systems are commonly used by large scale (WM)ISH atlases. For example, GEISHA used the Abimed In Situ Pro (Bell, Yatskievych, and Antin 2004), and GenePaint (Geffers and Eichele 2015), ABA (E. S. Lein et al. 2007), and LungMap (Ljungberg et al. 2019) used the Tecan EVO liquid handling platform (or its pre-commercial version), to automate ISH staining of numerous sections or embryos and genes. Several major institutions have core facilities that perform Visium (“10X VISIUM SPATIAL TRANSCRIPTOMICS,” n.d.; “ADVANCED GENOMICS CORE PRICING,” n.d.; “SpaRTAN,” n.d.), and even if the core facility does not perform Visium as a whole, NGS core facilities are common. Furthermore, the Visium protocol does not require custom made equipment that cannot be purchased from 10X itself and major lab equipment companies such as Bio-Rad and VWR. Visium involves scanning the H&E image of the tissue section, which can be done by a histology core. As library preparation of the forerunner of Visium, ST, can be automated (Jemt et al. 2016), it would be reasonable to say that Visium library preparation can be automated. In contrast, the MERFISH protocol involves a custom built fluidics system to automate the imaging and liquid handling and long imaging time that might not be appropriate for a microscopy core facility. However, as MERFISH is now commercialized and automated by Vizgen, it has been more widely adopted.

Figure 4.9: Number of new methods per year, colored by the number of institutions that have used the method.

Whether fastq files from published NGS based papers (no preprints) are available on a public data repository such as GEO over time. Bin width is 180 days.

Figure 4.10: Whether fastq files from published NGS based papers (no preprints) are available on a public data repository such as GEO over time. Bin width is 180 days.

While in prequel (WM)ISH atlases, the images are themselves the data, current era data goes beyond visualization of gene expression in space. NGS based current era data has the sequencing reads in fastq files, which can be re-processed for RNA velocity and isoform analyses. The fastq files are often deposited in data repositories such as GEO and ENA, where they can be downloaded for re-processing. However, for some human data, to protect patients’ privacy, the fastq files are not available or have controlled access. While the fastq files from around half of published papers for NGS based current era datasets are available in a data repository (Figure 4.10), the fastq files from most NGS based current era preprints are not available, especially the older preprints (Figure 4.11). Sometimes preprints state that the data will be deposited on GEO upon acceptance of the manuscript (e.g. (Zuo et al. 2021)).

Whether fastq files from published NGS based preprints are available on a public data repository such as GEO over time. Bin width is 90 days.

Figure 4.11: Whether fastq files from published NGS based preprints are available on a public data repository such as GEO over time. Bin width is 90 days.

Especially in the US, research in the current era tends to be more concentrated in a few elite institutions despite the mainstreaming of spatial transcriptomics to many less well-known institutions, while some top contributors in the prequel era were some less well-known institutions (Figure 4.12, 4.13). Among the top contributing institutions in the prequel era are those hosting databases, such as Allen Institute for ABA, University of Oregon (UO) for ZFIN, UC Berkeley and Lawrence Berkeley National Laboratory (LBL) for BDGP, University of Arizona (UofA) for GEISHA, Jackson Laboratory (JAX) for GXD, Western General Hospital (WGH) for EMAGE, and Kyoto University (Kyodai) for GHOST (Figure 4.13). By and large, in western Europe and northeast Asia, prequel and current era research was conducted in different institutions as well (Figure 4.14, Figure 4.15).

World map of institutions. Area of the point is proportional to the number of publications from that city. Gray points are sum of both prequel and current eras for each city. Top 10 institutions in each era are labeled.

Figure 4.12: World map of institutions. Area of the point is proportional to the number of publications from that city. Gray points are sum of both prequel and current eras for each city. Top 10 institutions in each era are labeled.

Map of institutions around continental US. Area of the point is proportional to the number of publications from that city. Gray points are sum of both prequel and current eras for each city. Top 10 institutions in each era are labeled.

Figure 4.13: Map of institutions around continental US. Area of the point is proportional to the number of publications from that city. Gray points are sum of both prequel and current eras for each city. Top 10 institutions in each era are labeled.

Map of institutions around western Europe. Area of the point is proportional to the number of publications from that city. Gray points are sum of both prequel and current eras for each city. Top 10 institutions in each era are labeled.

Figure 4.14: Map of institutions around western Europe. Area of the point is proportional to the number of publications from that city. Gray points are sum of both prequel and current eras for each city. Top 10 institutions in each era are labeled.

Map of institutions in northeast Asia. Area of the point is proportional to the number of publications from that city. Gray points are sum of both prequel and current eras for each city. Top 10 institutions in each era are labeled.

Figure 4.15: Map of institutions in northeast Asia. Area of the point is proportional to the number of publications from that city. Gray points are sum of both prequel and current eras for each city. Top 10 institutions in each era are labeled.

4.3 Learning from the past

What can we learn from the history of the prequel era? We might be able to learn something from the past, as people in the past have come up with good ideas that have been mostly forgotten in the present era. An example of such idea in the history of cycling is the 1930s network of at least 280 miles of cycleways separated from motor traffic in the UK, forgotten even by the Ministry of Transport itself; with the new wave of bike advocacy since the 1970s, there have been recent efforts to resurrect these old cycleways (Laskow 2017). Furthermore, the past can illustrate what might happen next and what to do to get better outcome during similar developments at present, such as how the 1918 Spanish flu pandemic has been compared to the current COVID pandemic to point to strategies (e.g. (Aadya Sharma et al. 2021; Robinson 2021)).

First, prequel (WM)ISH atlases by nature require thousands of animals to stain for thousands of genes, and often show photos from multiple animals stained for the same gene, sometimes showing variability in staining and morphology (especially in BDGP and GEISHA as the embryos are small and can be stained en masse), giving some qualitative sense of reproducibility of the staining and pattern and how generalizable a pattern seen in the atlas is to the wider population of the model organism. In contrast, current era datasets and atlases from model organisms tend to use much smaller numbers of animals thanks to multiplexing and cost and do not tend to discuss biological differences and reproducibility of results between the animals. For instance, in the Molecular Atlas of the Adult Mouse Brain (Ortiz et al. 2020), 3 male C57BL/6 mice were used. The online viewer of the Molecular Atlas shows gene expression in coronal sections from different mice all registered to the Allen CCF; adjacent spots from different mice sometimes have quite different expression of the same gene. However, such variability is not discussed in the paper. The MERFISH MOp atlas has 32 sections from each of the 2 mice used and reproducibility of results in the 2 mice is not discussed. The HybISS developing mouse atlas (Manno et al. 2021) only used one E10.5 mouse embryo.

Second, while there are databases for current era data, as discussed in Section 5.9, they do not provide the querying functionalities and systematic annotations of the ABA, EMAGE, and Eurexpress. While SpatialDB (Fan, Chen, and Chen 2020) provides easy access to and visualization of processed current era data from several different technologies, as of August 2021, SpatialDB does not seem to have updated since 2020 and does not contain new datasets. As of writing, the Human Cell Atlas (HCA) (Regev et al. 2017) has data from 5 Visium studies and at least one HybISS study (Langseth et al. 2021). The Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative - Cell Census Network (BICCN) (Callaway et al. 2021) has data from MERFISH, osmFISH, seqFISH, and etc. While the studies can be queried, as of writing, in HCA and BICCN, unlike in the prequel atlases, genes can’t be queried to open a webpage to show expression patterns in different data sources, nor can one search for other genes with similar expression patterns. Gene expression patterns are also not annotated. Given the massive volume of scRNA-seq and current era spatial data in HCA and BICCN, it would be more challenging to enable gene annotation, search, and comparison as this would involve analyzing and comparing hundreds of scRNA-seq datasets. However, for current era spatial data, for each organ of each species, the number of datasets in the order of dozens per organ seems to be more manageable for such analyses and comparisons that would enable prequel database style gene queries (Figures 4.5, 4.6). Specifically, mouse brain data can be registered to the CCF to facilitate comparison between datasets, studies, and subjects within one study. Furthermore, the massive volume of quantitative current era may be used to refine prequel gene annotations such as ontologies of developmental stages and anatomical regions.

Third, while most extant prequel (WM)ISH atlases, such as ABA, LungMAP, and GUDMAP, are hosted in online databases for query and view, most current era datasets – including those that claim to be atlases – cannot be viewed online, which can be useful in cases such as to easily look up more information about genome wide association study (GWAS) candidate genes associated with phenotype or expression quantitative trait loci (eQTL) and about differentially expressed (DE) genes from non-spatial transcriptomic or proteomic studies. Even if comparison and analysis of current era data for gene annotation and query is challenging, a web portal that searches multiple datasets for gene expression patterns, merely linking to the gene expression plots in the original data visualization websites of the datasets, would still be helpful. Besides datasets in SpatialDB, some current era datasets can be queried and visualized online, plotting gene expression values in space (dataset description is linked to the online data visualization portal), such as zebrafish Tomo-seq (Junker et al. 2014), mid-gastrula mouse embryo Geo-seq (G. Peng et al. 2016), mouse cortex osmFISH (Codeluppi et al. 2018), ST molecular atlas of the adult mouse brain (Ortiz et al. 2020), and ST and Cartana ISS for Alzheimer’s disease (W.-T. Chen et al. 2020). However, many other current era atlases do not provide online visualization, such as the MERFISH MOp atlas (M. Zhang et al. 2020) and the Visium breast cancer atlas (S. Z. Wu et al. 2021).

Finally, what if another revolution in spatial genomics comes? What in the current era will be remembered like the ABA, and what will be forgotten? We may take clues from the impact of the ABA in the current era and how other prequel atlases seem to be forgotten. To recap, the ABA is still relevant in the current era because of its comprehensiveness, quantification of ISH staining, registration to the CCF, and API to programatically query the database. Some of the hallmarks of the current era are quantitative data and multiplexing. With quantification and the CCF, ABA data resembles such hallmarks, although ABA’s quantification and CCF began in 2006, long before the current era really took off around the mid 2010s. The API makes the data easier to download for analyses than the images from (WM)ISH databases that don’t have APIs. The comprehensiveness makes the ABA relevant to qualitative comparisons to current era results. Furthermore, the Allen Institute itself is participating in the current era by not only producing bulk and single cell RNA-seq data for the atlas but also hosting the data catalog for the BICCN. In contrast, we are unaware of other prequel atlas consortia, such as EMAGE and BDGP, participating in the current era. We cannot foresee what the next revolution would be like. However, from ABA, perhaps we may say that for data, resources, and institutions from the current era to not to be forgotten when the next era comes, they should resemble or adapt to the hallmarks of the next era.

3 Data analysis in the prequel era

5 Current era technologies