Aggregate transcript spots from file — aggregateTx • SpatialFeatureExperiment

This function reads the transcript spot file from the standard output of the commercial technologies (not GeoParquet) and aggregates the spots spatially, assigning them to polygons such as cells or spatial bins. Presets for Xenium, Vizgen MERFISH, and CosMX are available through aggregateTxTech. For Vizgen and Xenium, images can also be added to the output; see the image argument.

Usage

aggregateTx(
  file,
  df = NULL,
  by = NULL,
  sample_id = "sample01",
  spatialCoordsNames = c("X", "Y", "Z"),
  gene_col = "gene",
  phred_col = "qv",
  min_phred = 20,
  flip_geometry = FALSE,
  cellsize = NULL,
  square = TRUE,
  flat_topped = FALSE,
  new_geometry_name = "bins",
  unit = "micron"
)

aggregateTxTech(
  data_dir,
  df = NULL,
  by = NULL,
  tech = c("Vizgen", "Xenium", "CosMX"),
  sample_id = "sample01",
  image = NULL,
  min_phred = 20,
  flip = c("geometry", "image", "none"),
  max_flip = "50 MB",
  cellsize = NULL,
  square = TRUE,
  flat_topped = FALSE,
  new_geometry_name = "bins"
)

Arguments

file: File with the transcript spot coordinates. Should be one row per spot when read into R and should have columns for coordinates on each axis, gene the transcript is assigned to, and optionally cell the transcript is assigned to. Must be csv, tsv, or parquet.
df: If the file is already loaded into memory, a data frame (sf) with columns for the x, y, and optionally z coordinates and gene assignment of each transcript spot. If specified, then argument file will be ignored.
by: An sfc or sf object for spatial aggregation.
sample_id: Which sample in the SFE object the transcript spots should be added to.
spatialCoordsNames: Column names for the x, y, and optionally z coordinates of the spots. The defaults are for Vizgen.
gene_col: Column name for genes.
phred_col: Column name for Phred scores of the spots.
min_phred: Minimum Phred score to keep a spot. A Phred score Q corresponds to an error probability of 10^(-Q/10). The default of 20 is the conventional threshold for "acceptable" and means that there is a 1% chance that the spot was decoded in error.
flip_geometry: Logical, whether to flip the transcript spot geometries to match the images if added later.
cellsize: numeric of length 1 or 2 with target cellsize: for square or rectangular cells the width and height, for hexagonal cells the distance between opposite edges (edge length is cellsize/sqrt(3)). A length units object can be passed, or an area unit object with area size of the square or hexagonal cell.
square: logical; if FALSE, create hexagonal grid
flat_topped: logical; if TRUE generate flat topped hexagons, else generate pointy topped
new_geometry_name: Name to give to the new colGeometry in the output. Defaults to "bins".
unit: Unit the coordinates are in, either microns or pixels in the full-resolution image.
data_dir: Top level output directory.
tech: Technology whose output is to be read. Must be one of "Vizgen", "Xenium", or "CosMX", though more technologies may be added later.
image: String, which image(s) to add to the output SFE object. Not applicable to CosMX. Multiple images can be specified; see readVizgen and readXenium for the options. If NULL, then the default from the read function for the technology will be used.
flip: Whether to flip the geometry or the image so that the two match; one of "geometry", "image", or "none". When the geometry is flipped, the y coordinates are simply set to -y, so the original bounding box is not preserved. This is consistent with readVizgen and readXenium.
max_flip: Maximum size of the image that is allowed to be flipped, because the image has to be loaded into memory to be flipped. If the image is larger than this size, then the coordinates will be flipped instead.
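
Three of the conventions described in the arguments above (the Phred threshold, the y flip, and the hexagonal cellsize) can be sketched in base R on a toy data frame. This only illustrates the arithmetic and is not the package's implementation: the data values are made up, the helper phred_to_prob is hypothetical, and treating the min_phred threshold as inclusive is an assumption.

```r
# Toy transcript spots; column names follow the aggregateTx() defaults
# (X, Y, gene, qv). The values are made up for illustration.
spots <- data.frame(
  X    = c(1.0, 2.5, 4.0, 5.5),
  Y    = c(10, 20, 30, 40),
  gene = c("A", "B", "A", "C"),
  qv   = c(35, 12, 20, 8)
)

# Hypothetical helper: a Phred score Q maps to an error probability 10^(-Q/10)
phred_to_prob <- function(q) 10^(-q / 10)
phred_to_prob(20)

# Keep spots at or above the threshold (inclusive comparison is an assumption)
min_phred <- 20
kept <- spots[spots$qv >= min_phred, ]

# Flip by negating y: the extent along y keeps its length, but the
# bounding box itself moves to the other side of the x axis
flipped <- transform(kept, Y = -Y)
range(kept$Y)
range(flipped$Y)

# Hexagonal bins: cellsize is the distance between opposite edges,
# so the edge length and the area follow from it
cellsize <- 20
hex_edge <- cellsize / sqrt(3)
hex_area <- sqrt(3) / 2 * cellsize^2
```

Comparing range(kept$Y) with range(flipped$Y) shows why the original bounding box is not preserved after a geometry flip.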

Value

An SFE object with a count matrix holding the number of spots of each gene in each geometry. Geometries with no spots are removed.
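
As a rough illustration of what such a count matrix contains, the following base R sketch bins toy spots into a square grid and tabulates genes by bin. It is a simplified stand-in under stated assumptions, not how the package aggregates: the bins are found by integer division rather than by a spatial join against the geometries in by, the bin labels are invented, and the result is a plain matrix rather than an SFE object.

```r
# Toy spots with made-up coordinates and gene labels
spots <- data.frame(
  X    = c(0.5, 1.5, 1.7, 5.2, 5.9, 5.1),
  Y    = c(0.5, 0.2, 1.9, 4.1, 4.8, 4.4),
  gene = c("A", "A", "B", "B", "B", "A")
)
cellsize <- 2

# Assign each spot to a square bin by integer division of its coordinates
bin_x <- floor(spots$X / cellsize)
bin_y <- floor(spots$Y / cellsize)
spots$bin <- paste0("bin_", bin_x, "_", bin_y)

# Genes in rows, bins in columns. Only bins that contain at least one
# spot are tabulated, so empty bins never appear in the matrix.
counts <- unclass(table(spots$gene, spots$bin))
counts
```

The grid cells between the two occupied bins hold no spots and are therefore absent from counts, mirroring the removal of geometries with no spots.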

Note

The resulting SFE object often includes geometries (e.g. grid cells) outside the tissue, because transcript spots can be detected outside the tissue. Also, bins at the edge of the tissue that don't fully overlap with the tissue will have lower transcript counts; this may have implications for downstream spatial analyses.