4 From the past to the present

4.1 Legacy of the prequel era

The current era continues many of the quests of the prequel era, such as to profile the transcriptome in space, to identify genes with restricted expression, to classify gene expression patterns, to build reference gene expression atlases for model systems, and to infer anatomical regions based on gene expression. While the prequel era also sought to identify cell type markers, this has been taken over by non-spatial transcriptomics, which has been used to identify marker genes to stain for with spatial transcriptomics methods not easily scalable to the whole transcriptome. As already mentioned, (WM)ISH atlases can be understood as an improved alternative to microarray and in situ reporter screens, and the latter can be in turn understood as an improved alternative to enhancer and gene traps. To some extent, current era spatial transcriptomics started as an improved alternative to (WM)ISH atlases, to profile the whole transcriptome in the same cells (Junker et al. 2014; Lee et al. 2014). On the other hand, part of current era of spatial transcriptomics can be seen as an improvement to bulk microarray or RNA-seq (V. M. Brown et al. 2002; Junker et al. 2014; Ståhl et al. 2016; Luo et al. 1999), and lower throughput single cell biology (Lubeck and Cai 2012; K. H. Chen et al. 2015).

How does the current era relate to the prequel era in general? The current era has undergone massive growth unseen in the prequel era (Figure 4.2). Unlike in the prequel era, current era technologies are typically highly multiplexed to quantify hundreds to thousands of genes if not the transcriptome within the same piece of tissue. While cell segmentation of bright field (WM)ISH images is challenging, in some current era technologies, transcripts can be traced back to the individual cells of origin. Moreover, cost of NGS has greatly decreased, and the most popular current era techniques – LCM followed by RNA-seq, and 10X Visium – rely on NGS to quantify transcripts, thus making it much more efficient to profile transcriptomes in space than with (WM)ISH, let alone enhancer and gene traps.

Again, we may take inspiration from histories of other technologies that have no doubt undergone revolutions to illustrate where we are in what appears to be a revolution in progress in part propelled by NGS and greater computing power. The growing popularity and greatly improved efficiency of current era techniques compared to those of prequel techniques in applications to thousands of genes may be akin to how the safety bicycle relates to the penny-farthing. The advances from the former has rendered the latter virtually obsolete, and nearly all bikes we see on the streets today are much more like the safety bicycle in both appearance and mechanism than the penny-farthing. Since the 1890s, when the safety bicycle became popular, bicycle technologies have drastically improved. However, most histories of cycling do mention the penny-farthing and its ancestors such as the bone shaker, velocipedes (where the “velo” in “velodrome” comes from), and the hobby horse. Some histories mention bicycles propelled by treadles rather than the familiar cranks and a 17th century four wheeled human powered vehicle propelled by pulling ropes from within. Despite these vehicles’ drastically different mechanisms from the modern bicycle, as these are earlier and less successful attempts to achieve the goal of devising a human powered land vehicle that travels faster than walking, which is still one of the primary goals of modern bicycle-related technologies. These histories are really histories of the quest to achieve that goal.

Moreover, when you see a lightening fast high-tech aerodynamic carbon fiber time trial bike tested in an aerospace wind tunnel, the penny-farthing is not to be forgotten, because the former still benefits from legacies from the latter. Roads used to be unpaved and very rough, and in the US, the paved roads, road signs guiding travelers, and interstates originated from advocacy by the League of American Wheelmen (LAW) since 1880, which was the penny-farthing era (Guroff 2016). The same may be said for the UK (Reid 2011). As the automobile replaced the safety bicycle as the favored mode of transportation in the 20th century (another revolution in transportation), drivers are not only benefiting from the better roads advocated by early cyclists but also the pneumatic tire originally popularized by the safety bicycle. Finally, without the legacy of LAW’s advocacy, the modern form of fast road racing for which the high-tech carbon fiber time trial bike is built would not be possible. Today, LAW, which has been renamed League of American Bicyclists (LAB), is still operational as a cycling advocacy group.

On the one hand, just like the penny-farthing, which are now obsolete except in some hobbyist niches, prequel techniques can be seen as earlier and less successful attempts to achieve the goal to profile expression of as many genes as possible while preserving spatial context in tissue, less successful attempts whose disadvantages are addressed in newer and more successful attempts. While enhancer and gene traps and ISH atlases never completely died off (Figure 2.3), there is no doubt that to profile expression of larger number of genes in new studies, prequel techniques have by and large been replaced by current era techniques. On the other hand, just like LAW/B, the legacies of prequel spatial transcriptomics directly benefit the current era.

How has the prequel era influenced the current era? The direct influence does not seem profound overall when considering all biological systems studied, but is nevertheless sizable. For mouse brain studies, the Allen Brain Institute and its ABA do seem to have a bigger influence. The most obvious institutional continuation between the prequel and current eras is the Allen Brain Institute, which used ISH for the mouse ISH atlases, LCM and microarray for the human and macaque atlases, and generated bulk and scRNA-seq datasets as part of the atlases. Allen scRNA-seq data is often used to benchmark computational methods to map dissociated scRNA-seq cells to spatial locations in tissue and/or to impute gene expression in space (the two related tasks are collectively called spatial reconstruction of scRNA-seq here), with STARmap (X. Wang et al. 2018), osmFISH (Codeluppi et al. 2018), MERFISH (Zhang et al. 2020), and/or Visium (Dou et al. 2020) mouse cortex data as the spatial reference (Stuart et al. 2019; Cang and Nie 2020; Abdelaal et al. 2020; Okochi et al. 2021; Dou et al. 2020); this is an institutional legacy from the prequel era. Another prequel era institutional legacy is the Jackson Lab (JAX), home of the prequel GXD, and where many lab mice come from. JAX has also contributed to the current era with the recent Visium mouse urinary bladder atlas (Baker et al. 2021). The data might soon be available for online exploration with cellxgene on the JAX Single Cell Portal but is not yet available as of writing. However, for the most part, as shown later in this chapter, prequel and current era data collection techniques were developed and used in distinct institutions, suggesting that the two eras are largely sociologically distinct (Figure 4.12). This is not surprising given that different techniques in the current era are also often developed and used in largely distinct institutions (e.g. 5.27).

The influence is mainly usage of prequel resources in current era data analysis, mostly in spatial reconstruction of scRNA-seq data and cross referencing to validate or interpret computational results. As already mentioned in Chapter 2, early scRNA-seq spatial reconstruction methods used binarized prequel style WMISH atlases for zebrafish embryos (Seurat v1) and Platynereis and whole mount FISH atlas BDTNP for Drosophila as spatial references. Thereafter the Seurat v1 zebrafish WMISH atlas and BDTNP have been used to benchmark several new spatial reconstruction methods (Cang and Nie 2020; G. Sharma et al. 2020; Okochi et al. 2021), including methods developed for the DREAM challenge to map cells to locations with smaller number of informative genes (Tanevski et al. 2020; Alonso, Carrea, and Diambra 2020; Pham et al. 2020). However, such benchmarks do not seem to indicate interest in studying the biology of zebrafish and Drosophila development in 3D, as the purpose of such benchmark is more to validate computational methods than to perform biological inferences. Furthermore, zebrafish and Drosophila only take up very small proportions of all current era studies compared to mouse and human (Figure 4.4). For the mouse brain, the ABA mouse ISH atlas has been used as the spatial reference to quantitatively map scRNA-seq cell types to spatial locations (Zeisel et al. 2018). The Allen developing mouse brain ISH atlas was also used as spatial reference to map human brain organoid scRNA-seq cells to space and mouse developmental stages for interpretation (Fleck et al. 2021). Here, unlike the WMISH atlases and BDTNP, the ABA is not binarized before spatial mapping.

With staining for around 20,000 genes, the ABA is more frequently used to qualitatively confirm that the computationally imputed gene expression patterns recapitulate the ISH staining of the same genes (Stuart et al. 2019; Welch et al. 2019; Abdelaal et al. 2020; Korsunsky et al. 2019; Biancalani et al. 2020). EMAGE eMouseAtlas (Armit et al. 2017) has been used to qualitative validate Geo-seq (Peng et al. 2016) and DBiT-seq (Y. Liu, Yang, et al. 2020) results, but usage of EMAGE is rare in the current era. In current era ST and Visium, an H&E image of the tissue accompanies the spatial transcriptome. The H&E image of mouse brains has been used to manually or computationally align the dataset to the Allen Mouse Brain Common Coordinate Framework (CCF) (Wang et al. 2020) to integrate ABA’s brain anatomical ontologies to new datasets to facilitate interpretation of the data (Ortiz et al. 2020; Chen et al. 2020; Biancalani et al. 2020). Even without H&E, Allen ontologies have been used to manually annotate HybISS data from the developing mouse brain based on marker gene expression (La Manno et al. 2021). In the mouse primary motor cortex (MOp) MERFISH atlas (Zhang et al. 2020), Allen CCF was used to select the MOp region.

There are over 15 extant mouse databases from the prequel era (Figure 2.9), yet ABA is exceptional in its impact on the current era. We have never seen any mention of other prequel mouse databases, such as Eurexpress (Diez-Roux et al. 2011), and GenePaint (Yaylaoglu et al. 2005) in current era literature. This may be due to the following reasons: First, the ABA is the most comprehensive prequel atlas for the adult mouse brain, with around 20,000 genes for adult mice (P56). As of August 2021, EMAGE has ISH images for 17,554 genes, Eurexpress has 19,440 (that’s the number of assays, but it seems that each gene typically has one assay, so it should be close to the number of genes), and GenePaint contains Eurexpress data and can query ISH data from several other databases including ABA. EMAGE covers a wide range of developmental stages, from E0.5 to E18, but not later stages and adults. Eurexpress only covers E14.5. The Allen developing mouse brain atlas covers from E11.5 to P28, though only with about 2000 genes. When the ABA is used in the current era, most of the time the adult mouse atlas is used.

Second, the ABA has much better infrastructure to facilitate quantitative analyses of the atlas than the other prequel mouse atlases. Both EMAGE and Eurexpress have detailed annotation of ISH results for many genes and allow searching for genes with similar expression patterns. In addition, Eurexpress shows ISH for many consecutive sections in 3D, and EMAGE has 3D histology models (for morphology rather than gene expression) at different developmental stages. In addition to these functionalities, ABA quantified ISH staining and registered the quantified ISH to the CCF, so just like in scRNA-seq, each voxel would have a vector of gene expression values. Usage of ABA in the current era mentioned above would not be possible without the CCF. ABA also has an application programming interface (API) to automate retrieval of such quantitative data for analyses suitable for the quantitative nature of the current era. In contrast, we are unaware of such quantification, registration, and API in other prequel mouse atlases, thus restricting their uses to be more qualitative. A similar pattern can be seen in Drosophila prequel databases. BDTNP registered staining from thousands of embryos stained for different genes onto a common coordinate system. BDTNP data could also be easily downloaded as csv-like files that can be easily parsed, though as of August 2021, the BDTNP website is not responsive. As seen in 3, a reason why FlyExpress was commonly used was that images for different genes were registered in FlyExpress.

4.2 Metadata of the current era

The current era started with LCM followed by microarray in 1999 (Luo et al. 1999). Due to the immense popularity of LCM followed by microarray or RNA-seq, the body of LCM literature is too vast for unbiased and comprehensive manual curation, so the curated database does not include most LCM literature, which was instead collected from a PubMed search and text mined (Figure 6.3, Chapter 6). Because the search results—without manual inspection and curation—may contain irrelevant entries and miss relevant ones, they are separated from the curated database in our analyses. Current era literature in the curated database is classified into Microdissection, smFISH, ISS, Array, and No Imaging, to be defined in detail in their corresponding sections below.

Table 4.1: Summary of spatial transcriptomics techniques in the current era
Method First published Category Max # genes Min spot diameter (\(\mu\)m)
voxelation 2002-02-01 ROI selection Tx wide NA
PA-GFP 2010-11-12 ROI selection Tx wide NA
SRM seqFISH 2012-06-03 smFISH 32 single cell
Tomo-array 2012-09-19 ROI selection Tx wide NA
iceFISH 2013-02-17 smFISH 20 single cell
ISS 2013-07-14 ISS 222 single cell
Tomo-seq 2013-08-12 ROI selection Tx wide NA
bDNA-smFISH 2013-10-06 smFISH 928 single cell
TIVA 2014-01-12 ROI selection Tx wide NA
FISSEQ 2014-03-21 ISS 8102 single cell
seqFISH 2014-03-28 smFISH 10421 single cell
MERFISH 2015-04-24 smFISH 4209 single cell
Puzzle Imaging 2015-07-20 De novo NA NA
Geo-seq 2016-03-21 ROI selection Tx wide NA
corrFISH 2016-06-06 smFISH 10 single cell
ST 2016-07-01 NGS barcoding Tx wide 100
HCR-seqFISH 2016-10-19 smFISH 249 single cell
punch 2017-06-28 ROI selection Tx wide NA
SGA 2017-11-28 smFISH 35 single cell
APEX-RIP 2017-12-14 De novo NA NA
Niche-seq 2017-12-22 ROI selection Tx wide NA
ExM-MERFISH 2018-03-19 smFISH 10050 single cell
STARmap 2018-07-27 ISS 1020 single cell
Paired-cell sequencing 2018-09-17 De novo NA NA
osmFISH 2018-10-30 smFISH 33 single cell
seqFISH+ 2019-03-25 smFISH 10000 single cell
slide-seq 2019-03-29 NGS barcoding Tx wide 10
bDNA-MERFISH 2019-05-25 smFISH 130 single cell
GeoMX DSP 2019-06-21 ROI selection 2093 NA
DNA microscopy 2019-06-27 De novo NA NA
APEX-seq 2019-07-11 De novo NA NA
INSTA-seq 2019-08-06 ISS NA single cell
PARSIFT 2019-09-04 De novo NA NA
HDST 2019-09-09 NGS barcoding Tx wide 2
GaST-seq 2019-10-10 ROI selection Tx wide NA
BARseq 2019-10-17 ISS 79 single cell
miRNA nanowell 2020-05-09 NGS barcoding 9 300
split-FISH 2020-06-15 smFISH 317 single cell
Visium 2020-06-22 NGS barcoding Tx wide 55
ZipSeq 2020-07-06 ROI selection Tx wide NA
SMD-seq 2020-08-11 ROI selection Tx wide NA
HybISS 2020-09-29 smFISH 146 single cell
DBiT-seq 2020-10-19 NGS barcoding Tx wide 10
C-FISH 2020-10-23 smFISH 2 single cell
SCRINSHOT 2020-11-20 smFISH 31 single cell
slide-seq2 2020-12-07 NGS barcoding Tx wide 10
Stereo-seq 2021-01-19 NGS barcoding Tx wide 0.22
GeoMX WTA 2021-01-25 ROI selection 20175 NA
Seq-Scope 2021-01-27 NGS barcoding Tx wide 0.5
ExSeq 2021-01-29 ISS 297 single cell
BOLORAMIS 2021-03-08 ISS 96 single cell
Pick-Seq 2021-03-09 ROI selection Tx wide NA
nanoneedles 2021-03-10 ROI selection 9 NA
PIXEL-seq 2021-03-17 NGS barcoding Tx wide 1.22
CISI 2021-04-15 smFISH 37 single cell
STRP-seq 2021-04-19 ROI selection Tx wide NA
XYZeq 2021-04-21 NGS barcoding Tx wide 500
electro-seq 2021-04-23 ISS 201 single cell
BARseq2 2021-05-10 ISS 65 single cell
ClumpSeq 2021-05-24 De novo NA NA
sci-Space 2021-07-02 NGS barcoding Tx wide 73.2
PIC 2021-07-20 ROI selection Tx wide NA
par-seqFISH 2021-08-13 smFISH 105 single cell
SPACECAT 2021-08-17 ROI selection Tx wide NA
RNAscope 2021-09-29 smFISH 95 single cell
Molecular Cartography 2021-10-12 smFISH 100 single cell
Visium protein 2021-10-16 NGS barcoding Tx wide 55
coppaFISH 2021-10-24 smFISH 72 single cell
Raman2RNA 2021-12-01 smFISH 9 single cell
EASI-FISH 2021-12-06 smFISH 26 single cell
Halo-seq 2021-12-07 De novo NA NA
OpTAG-seq 2021-12-30 ROI selection Tx wide NA
SiT 2022-01-05 NGS barcoding Tx wide 55
MOSAICA 2022-01-10 smFISH 10 single cell
EEL FISH 2022-01-13 smFISH 445 single cell
STARmap PLUS 2022-01-16 ISS 2766 single cell
LoRNA 2022-01-25 De novo NA NA
manual dissection with velocimetry and cell tracking 2022-01-31 ROI selection Tx wide NA
SM-Omics 2022-02-10 NGS barcoding Tx wide 100
FUNseq 2022-02-22 ROI selection Tx wide NA
centrifugation on 384 well plate 2022-02-23 ROI selection Tx wide NA
Space-TREX 2022-02-24 NGS barcoding Tx wide 55
MERR APEX-seq 2022-03-03 De novo NA NA
vCatFISH 2022-03-16 smFISH 21 single cell
clampFISH 2.0 2022-03-17 smFISH 10 single cell
SPOTS 2022-03-18 NGS barcoding Tx wide 55
TATTOO-seq 2022-03-20 ROI selection Tx wide NA
SPARC-seq 2022-03-23 NGS barcoding Tx wide 50
Spatial-CITE-seq 2022-04-02 NGS barcoding Tx wide 25
GPS-seq 2022-04-05 NGS barcoding Tx wide NA
STRS 2022-04-21 NGS barcoding Tx wide 55
scStereo-seq 2022-05-04 NGS barcoding Tx wide 0.22
Select-seq 2022-05-09 ROI selection Tx wide NA
HybRISS 2022-05-13 smFISH 50 single cell
punch2 2022-06-17 ROI selection Tx wide NA
STcEM 2022-06-27 smFISH 287 single cell
SmT 2022-07-18 NGS barcoding Tx wide 55
SHM-seq 2022-07-19 NGS barcoding Tx wide 100
SMI 2022-07-19 smFISH 960 single cell
scNaST 2022-07-22 NGS barcoding Tx wide NA
PHYTOMap 2022-07-30 smFISH 28 single cell
Matrix-seq 2022-08-05 NGS barcoding Tx wide 50
Number of publications over time in the current era. The gray histogram in the background is the overall trend of all current era literature. Each facet highlights a category, ordered chronologically in terms of first report. Bin width is 365 days. Plots in this figure include curated LCM literature, but not the non-curated literature.

Figure 4.1: Number of publications over time in the current era. The gray histogram in the background is the overall trend of all current era literature. Each facet highlights a category, ordered chronologically in terms of first report. Bin width is 365 days. Plots in this figure include curated LCM literature, but not the non-curated literature.

Chronologically, in the curated database, microdissection came first, with voxelation in 2002 (V. M. Brown et al. 2002), followed by smFISH, ISS, no imaging, and NGS barcoding (Figure 4.1). Despite an early start in the midst of the (WM)ISH golden age, if not including non-curated LCM literature, the current era did not really take off until around 2014 (Figure 4.2). Ever since, its has seen drastic growth, far exceeding that of the prequel era in the 1990s and 2000s (Figure 4.2). Growth in microdissection and NGS barcoding seemed to have contributed the most to this overall drastic growth (Figure 4.1). All techniques in the curated database, along with their classification, maximum number of genes, spatial resolution, and references are listed in Table 4.1.

Comparing number of publications over time in the prequel and the current eras. Bin width is 180 days. The x-shaped points show the number of publications from the last bin, which is not yet full.

Figure 4.2: Comparing number of publications over time in the prequel and the current eras. Bin width is 180 days. The x-shaped points show the number of publications from the last bin, which is not yet full.

A timeline of foundational or influential techniques in the current era is shown in Figure 4.3. This is not meant to be a timeline of all current era techniques, but only of techniques that either laid the foundation of popular current era techniques (e.g. Solexa, later Illumina, sequencing) or very influential within a category of techniques (e.g. MERFISH for smFISH based techniques, and ST for NGS barcoding). Just like the “revolution” of current era spatial transcriptomics, each item in the timeline must not be understood as works of the “solitary genius”. Rather, each of the landmark innovations in the timeline occurred in its own historical context, with influences from predecessors, which are not plotted in the timeline for the sake of brevity.

Timeline of major techniques related to the current era.

Figure 4.3: Timeline of major techniques related to the current era.

The prequel era started with untargeted screens and grew into atlases and databases striving to be comprehensive. Screens are still a theme in the current era and spatial transcriptomics is still used in untargeted searches for genes involved in development of model organisms, but with highly multiplexed technology, this can also be done for pathological and human tissues (Figure 4.4, Figure 4.5). Thanks to multiplexing, while mouse was the most popular species in the prequel era, in the current era, there are more studies on human tissues than those on mice and the vast majority of studies are on either humans or mice (Figure 4.4). Furthermore, there are datasets for a wider range of organs in mice in the current era (e.g. colon, liver, uterus, and etc.) than in the prequel era though there still is more interest in the brain (Figure 4.6, Figure 2.7).

Drosophila is no longer as commonly used in the current era (Figure 4.4), perhaps because any current era technique requiring tissue sectioning is less amenable to Drosophila embryos, making Tomo-seq along one body axis the only technique that has been demonstrated to be amenable (Combs and Eisen 2013; Combs and Fraser 2018). Whole mount smFISH protocols exist for Drosophila brains (Long et al. 2017), zebrafish embryos (Oka and Sato 2015), and embryonic mouse organs (Wang et al. 2019), but to our best knowledge, highly multiplexed smFISH and ISS have not been adapted to whole mount samples, although they have been adapted to thick slices of cleared tissue (X. Wang et al. 2018; Wang et al. 2021). For Drosophila tissue sections, while microdissection, smFISH, and ISS may be applied, the resolution of ST and Visium may be too low to discern sufficiently fine patterns in such a small model organism.

Number of publication per species.

Figure 4.4: Number of publication per species.

A) Number of publications for each healthy organ in human (male shown here, as there is no study on healthy female specific organs in humans at present). B) Number of publications for pathological organs in human (female shown here, but there are two studies on prostate cancer (Burgess 2019; Brady et al. 2021)).

Figure 4.5: A) Number of publications for each healthy organ in human (male shown here, as there is no study on healthy female specific organs in humans at present). B) Number of publications for pathological organs in human (female shown here, but there are two studies on prostate cancer (Burgess 2019; Brady et al. 2021)).

While atlases have so far not been in the center of the stage, some atlases have been made with current era technology, such as MERFISH (Zhang et al. 2020), HybISS (La Manno et al. 2021), ST (Ortiz et al. 2020), Visium (Wu et al. 2021), GeoMX DSP (Delorey et al. 2021), and Slide-seq2 (Lake et al. 2021) described and analyzed with similar language to that of (WM)ISH atlases. Also as in the prequel era, the brain is still the most favored healthy organ (Figure 4.5, Figure 4.6). Among pathological tissues, breast tumors are the most used (Figure 4.5). Note that these anatograms only includes organs available in the R package gganatogram. Datasets from organs unavailable in the package are not shown. For metastases, the organ used for plotting here is the destination of metastases, so a liver metastasis of breast cancer would be plotted in the liver. More recently, in the wake of the SARS-CoV-2 pandemic, a number of studies using GeoMX Digital Spatial Profiler (DSP) to profile spatial transcriptomes of lungs of COVID victims have been published (Park et al. 2021; Butler et al. 2021; Delorey et al. 2021; Margaroli et al. 2021).

A) Number of publications per healthy organ in the mouse. B) Number of publications for pathological organs in mouse.

Figure 4.6: A) Number of publications per healthy organ in the mouse. B) Number of publications for pathological organs in mouse.

However, unlike in the prequel era, in which older technologies were adapted to larger scale to produce the screens and atlases, the current era has another major theme – new techniques, due to the challenges to be discussed in the following sections; the number of new techniques published each year has grown steadily in the past few years (Figure 4.9). However, this difference might be due to bias in curation and change in culture. In the prequel era, very different enhancer and gene trap vectors were lumped together into enhancer or gene trap in our database, and there might have been many different early non-radioactive ISH protocols not included in our database because they were not used to profile a sufficiently large number of genes. Furthermore, in the current era, authors like to give techniques new names, making related techniques seem distinct rather than lumped together in a wider category like enhancer or gene trap.

While a few techniques other than LCM have become popular, such as ISS (2013), Tomo-seq (2013), MERFISH (2015), ST (late 2016), GeoMX DSP (2019, only showing transcriptomics studies here), and Visium (first preprint in 2020), most techniques never or rarely spread beyond their institutions of origin (Figures 4.7, 4.9). Furthermore, except for Visium and LCM, prequel (WM)ISH, enhancer trap, and gene trap have been used by more institutions than current era techniques (4.8). This might be because there has not been enough time for recently published new techniques to be implemented elsewhere, or if they have been implemented, there has not been enough time for the relevant studies to be published. Or that there has been much less time for relatively new commercial techniques like GeoMX DSP to spread to more institutions compared to (WM)ISH. Furthermore, usage of Visium and GeoMX DSP might have been spread by commercialization and core facilities and usage of Tomo-seq might have been spread by relative ease of implementation with standard lab equipment; implementing complex current era techniques that require custom built equipment such as custom fluidics systems independently may be more challenging, thus hampering their widespread adoption. This is analogous to a well-tested and fool proof commercial cake mix widely available at grocery stores that only calls for standard kitchen equipment such as the oven and the hand mixer as opposed to a cake recipe that is not only very complicated but also requires the home cook to build custom kitchen equipment. Even if instructions to assemble the custom equipment is available, most people would probably prefer to buy the pre-assembled product when feasible. The average home cook would most likely prefer the former to the latter. Having a core facility perform a procedure is like ordering a cake from a bakery, which is much easier than DIY trials and errors and building custom equipment.

Techniques used by at least 3 institutions and the number of institutions that have used them.

Figure 4.7: Techniques used by at least 3 institutions and the number of institutions that have used them.

Prequel techniques used by at least 3 institutions and the number of institutions that have used them.

Figure 4.8: Prequel techniques used by at least 3 institutions and the number of institutions that have used them.

Protocols of WMISH (as used in GEISHA) (Bell, Yatskievych, and Antin 2004), ISH (as used in GenePaint and ABA) (Yaylaoglu et al. 2005), Visium, and MERFISH (J. R. Moffitt and Zhuang 2016) all have numerous steps. What (WM)ISH and Visium seem to have in common besides that they are widely adopted is that a significant part of the protocol is taken care of, by commercial automated systems (for (WM)ISH) or core facilities (for Visium), so there is less DIY hassle. Commercial automated ISH systems are commonly used by large scale (WM)ISH atlases. For example, GEISHA used the Abimed In Situ Pro (Bell, Yatskievych, and Antin 2004), and GenePaint (Geffers and Eichele 2015), ABA (Lein et al. 2007), and LungMap (Ljungberg et al. 2019) used the Tecan EVO liquid handling platform (or its pre-commercial version), to automate ISH staining of numerous sections or embryos and genes. Several major institutions have core facilities that perform Visium (“10X VISIUM SPATIAL TRANSCRIPTOMICS” n.d.; “ADVANCED GENOMICS CORE PRICING” n.d.; “SpaRTAN” n.d.), and even if the core facility does not perform Visium as a whole, NGS core facilities are common. Furthermore, the Visium protocol does not require custom made equipment that cannot be purchased from 10X itself and major lab equipment companies such as Bio-Rad and VWR. Visium involves scanning the H&E image of the tissue section, which can be done by a histology core. As library preparation of the forerunner of Visium, ST, can be automated (Jemt et al. 2016), it would be reasonable to say that Visium library preparation can be automated. In contrast, the MERFISH protocol involves a custom built fluidics system to automate the imaging and liquid handling and long imaging time that might not be appropriate for a microscopy core facility. However, as MERFISH is getting commercialized by Vizgen and automated with the MERSCOPE product, it might become more widely adopted in the near future as the commercial package removes a lot of DIY hassle to independently implement MERFISH.

Number of new methods per year, colored by the number of institutions that have used the method.

Figure 4.9: Number of new methods per year, colored by the number of institutions that have used the method.

Whether fastq files from published NGS based papers (no preprints) are available on a public data repository such as GEO over time. Bin width is 180 days.

Figure 4.10: Whether fastq files from published NGS based papers (no preprints) are available on a public data repository such as GEO over time. Bin width is 180 days.

While in prequel (WM)ISH atlases, the images are themselves the data, current era data goes beyond visualization of gene expression in space. NGS based current era data has the sequencing reads in fastq files, which can be re-processed for RNA velocity and isoform analyses. The fastq files are often deposited in data repositories such as GEO and ENA, where they can be downloaded for re-processing. However, for some human data, to protect patients’ privacy, the fastq files are not available or have controlled access. While the fastq files from most published papers for NGS based current era datasets are available in a data repository (Figure 4.10), the fastq files from most NGS based current era preprints are not available, especially the older preprints (Figure 4.11). Sometimes preprints state that the data will be deposited on GEO upon acceptance of the manuscript (e.g. (Zuo et al. 2021)).

Whether fastq files from published NGS based preprints are available on a public data repository such as GEO over time. Bin width is 90 days.

Figure 4.11: Whether fastq files from published NGS based preprints are available on a public data repository such as GEO over time. Bin width is 90 days.

Especially in the US, research in the current era tends to be more concentrated in a few elite institutions, while research in the prequel era tends to be more spread out to some less well-known institutions (Figure 4.12, 4.13). Among the top contributing institutions in the prequel era are those hosting databases, such as Allen Institute for ABA, University of Oregon (UO) for ZFIN, UC Berkeley and Lawrence Berkeley National Laboratory (LBL) for BDGP, University of Arizona (UofA) for GEISHA, Jackson Laboratory (JAX) for GXD, Western General Hospital (WGH) for EMAGE, and Kyoto University (Kyodai) for GHOST (Figure 4.13). By and large, in western Europe and northeast Asia, prequel and current era research was conducted in different institutions as well (Figure 4.14, Figure 4.15).

World map of institutions. Area of the point is proportional to the number of publications from that city. Gray points are sum of both prequel and current eras for each city. Top 10 institutions in each era are labeled.

Figure 4.12: World map of institutions. Area of the point is proportional to the number of publications from that city. Gray points are sum of both prequel and current eras for each city. Top 10 institutions in each era are labeled.

Map of institutions around continental US. Area of the point is proportional to the number of publications from that city. Gray points are sum of both prequel and current eras for each city. Top 10 institutions in each era are labeled.

Figure 4.13: Map of institutions around continental US. Area of the point is proportional to the number of publications from that city. Gray points are sum of both prequel and current eras for each city. Top 10 institutions in each era are labeled.

Map of institutions around western Europe. Area of the point is proportional to the number of publications from that city. Gray points are sum of both prequel and current eras for each city. Top 10 institutions in each era are labeled.

Figure 4.14: Map of institutions around western Europe. Area of the point is proportional to the number of publications from that city. Gray points are sum of both prequel and current eras for each city. Top 10 institutions in each era are labeled.

Map of institutions in northeast Asia. Area of the point is proportional to the number of publications from that city. Gray points are sum of both prequel and current eras for each city. Top 10 institutions in each era are labeled.

Figure 4.15: Map of institutions in northeast Asia. Area of the point is proportional to the number of publications from that city. Gray points are sum of both prequel and current eras for each city. Top 10 institutions in each era are labeled.

4.3 Learning from the past

What can we learn from the history of the prequel era? We might be able to learn something from the past, as people in the past have come up with good ideas that have been mostly forgotten in the present era. An example of such idea in the history of cycling is the 1930s network of at least 280 miles of cycleways separated from motor traffic in the UK, forgotten even by the Ministry of Transport itself; with the new wave of bike advocacy since the 1970s, there have been recent efforts to resurrect these old cycleways (Laskow 2017). Furthermore, the past can illustrate what might happen next and what to do to get better outcome during similar developments at present, such as how the 1918 Spanish flu pandemic has been compared to the current COVID pandemic to point to strategies (e.g. (Sharma et al. 2021; Robinson 2021)).

First, prequel (WM)ISH atlases by nature require thousands of animals to stain for thousands of genes, and often show photos from multiple animals stained for the same gene, sometimes showing variability in staining and morphology (especially in BDGP and GEISHA as the embryos are small and can be stained en masse), giving some qualitative sense of reproducibility of the staining and pattern and how generalizable a pattern seen in the atlas is to the wider population of the model organism. In contrast, current era datasets and atlases from model organisms tend to use much smaller numbers of animals thanks to multiplexing and cost and do not tend to discuss biological differences and reproducibility of results between the animals. For instance, in the Molecular Atlas of the Adult Mouse Brain (Ortiz et al. 2020), 3 male C57BL/6 mice were used. The online viewer of the Molecular Atlas shows gene expression in coronal sections from different mice all registered to the Allen CCF; adjacent spots from different mice sometimes have quite different expression of the same gene. However, such variability is not discussed in the paper. The MERFISH MOp atlas has 32 sections from each of the 2 mice used and reproducibility of results in the 2 mice is not discussed. The HybISS developing mouse atlas (La Manno et al. 2021) only used one E10.5 mouse embryo.

Second, while there are databases for current era data, as discussed in Section 5.9, they do not provide the querying functionalities and systematic annotations of the ABA, EMAGE, and Eurexpress. While SpatialDB (Fan, Chen, and Chen 2020) provides easy access to and visualization of processed current era data from several different technologies, as of August 2021, SpatialDB does not seem to have updated since 2020 and does not contain new datasets. As of writing, the Human Cell Atlas (HCA) (Regev et al. 2017) has data from 5 Visium studies and at least one HybISS study (Langseth et al. 2021). The Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative - Cell Census Network (BICCN) (Adkins et al. 2020) has data from MERFISH, osmFISH, seqFISH, and etc. While the studies can be queried, as of writing, in HCA and BICCN, unlike in the prequel atlases, genes can’t be queried to open a webpage to show expression patterns in different data sources, nor can one search for other genes with similar expression patterns. Gene expression patterns are also not annotated. Given the massive volume of scRNA-seq and current era spatial data in HCA and BICCN, it would be more challenging to enable gene annotation, search, and comparison as this would involve analyzing and comparing hundreds of scRNA-seq datasets. However, for current era spatial data, for each organ of each species, the number of datasets in the order of dozens per organ seems to be more manageable for such analyses and comparisons that would enable prequel database style gene queries (Figures 4.5, 4.6). Specifically, mouse brain data can be registered to the CCF to facilitate comparison between datasets, studies, and subjects within one study. Furthermore, the massive volume of quantitative current era may be used to refine prequel gene annotations such as ontologies of developmental stages and anatomical regions.

Third, while most extant prequel (WM)ISH atlases, such as ABA, LungMAP, and GUDMAP, are hosted in online databases for query and view, most current era datasets – including those that claim to be atlases – cannot be viewed online, which can be useful in cases such as to easily look up more information about genome wide association study (GWAS) candidate genes associated with phenotype or expression quantitative trait loci (eQTL) and about differentially expressed (DE) genes from non-spatial transcriptomic or proteomic studies. Even if comparison and analysis of current era data for gene annotation and query is challenging, a web portal that searches multiple datasets for gene expression patterns, merely linking to the gene expression plots in the original data visualization websites of the datasets, would still be helpful. Besides datasets in SpatialDB, some current era datasets can be queried and visualized online, plotting gene expression values in space (dataset description is linked to the online data visualization portal), such as zebrafish Tomo-seq (Junker et al. 2014), mid-gastrula mouse embryo Geo-seq (Peng et al. 2016), mouse cortex osmFISH (Codeluppi et al. 2018), ST molecular atlas of the adult mouse brain (Ortiz et al. 2020), and ST and Cartana ISS for Alzheimer’s disease (Chen et al. 2020). However, many other current era atlases do not provide online visualization, such as the MERFISH MOp atlas (Zhang et al. 2020) and the Visium breast cancer atlas (Wu et al. 2021).

Finally, what if another revolution in spatial genomics comes? What in the current era will be remembered like the ABA, and what will be forgotten? We may take clues from the impact of the ABA in the current era and how other prequel atlases seem to be forgotten. To recap, the ABA is still relevant in the current era because of its comprehensiveness, quantification of ISH staining, registration to the CCF, and API to programatically query the database. Some of the hallmarks of the current era are quantitative data and multiplexing. With quantification and the CCF, ABA data resembles such hallmarks, although ABA’s quantification and CCF began in 2006, long before the current era really took off around the mid 2010s. The API makes the data easier to download for analyses than the images from (WM)ISH databases that don’t have APIs. The comprehensiveness makes the ABA relevant to qualitative comparisons to current era results. Furthermore, the Allen Institute itself is participating in the current era by not only producing bulk and single cell RNA-seq data for the atlas but also hosting the data catalog for the BICCN. In contrast, we are unaware of other prequel atlas consortia, such as EMAGE and BDGP, participating in the current era. We cannot foresee what the next revolution would be like. However, from ABA, perhaps we may say that for data, resources, and institutions from the current era to not to be forgotten when the next era comes, they should resemble or adapt to the hallmarks of the next era.