This function reads the transcript spot file from the standard output of a commercial technology (not GeoParquet) and aggregates the spots spatially by assigning them to polygons such as cells or spatial bins. Presets for Vizgen (MERFISH), Xenium, and CosMX are available. For Vizgen and Xenium, images can also be added to the output; see the image argument.

Usage

aggregateTx(
  file,
  df = NULL,
  by = NULL,
  sample_id = "sample01",
  spatialCoordsNames = c("X", "Y", "Z"),
  gene_col = "gene",
  phred_col = "qv",
  min_phred = 20,
  flip_geometry = FALSE,
  cellsize = NULL,
  square = TRUE,
  flat_topped = FALSE,
  new_geometry_name = "bins",
  unit = "micron"
)

aggregateTxTech(
  data_dir,
  df = NULL,
  by = NULL,
  tech = c("Vizgen", "Xenium", "CosMX"),
  sample_id = "sample01",
  image = NULL,
  min_phred = 20,
  flip = c("geometry", "image", "none"),
  max_flip = "50 MB",
  cellsize = NULL,
  square = TRUE,
  flat_topped = FALSE,
  new_geometry_name = "bins"
)

Arguments

file

File with the transcript spot coordinates. Should be one row per spot when read into R and should have columns for coordinates on each axis, gene the transcript is assigned to, and optionally cell the transcript is assigned to. Must be csv, tsv, or parquet.

df

If the file is already loaded into memory, a data frame (sf) with columns for the x, y, and optionally z coordinates and gene assignment of each transcript spot. If specified, then argument file will be ignored.

by

An sfc or sf object for spatial aggregation.

sample_id

Which sample in the SFE object the transcript spots should be added to.

spatialCoordsNames

Column names for the x, y, and optionally z coordinates of the spots. The defaults are for Vizgen.

gene_col

Column name for genes.

phred_col

Column name for Phred scores of the spots.

min_phred

Minimum Phred score for a spot to be kept. The default of 20 is the conventional threshold for "acceptable" quality, meaning that there is a 1% chance that the spot was decoded in error.
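The Phred scale relates a quality score Q to the error probability p by Q = -10 * log10(p), which is where the 1% figure for Q = 20 comes from. The base R sketch below shows the conversion and the kind of filtering that min_phred implies. The toy data frame `spots` and the helper `phred_to_error` are hypothetical, and the inclusive comparison (`>=`) is an assumption; this is not the package's internal code.

```r
# Convert a Phred quality score to the probability that the call is wrong
phred_to_error <- function(q) 10^(-q / 10)

phred_to_error(20)  # 0.01, i.e. a 1% chance of a decoding error

# Hypothetical toy data: one row per transcript spot, qv holds Phred scores
spots <- data.frame(
  gene = c("A", "A", "B", "B"),
  qv   = c(35, 12, 20, 19.9)
)

# Keep only spots at or above the threshold (inclusive comparison assumed)
min_phred <- 20
kept <- spots[spots$qv >= min_phred, ]
```

Raising min_phred to 30 would correspond to a 0.1% error probability, at the cost of discarding more spots.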

flip_geometry

Logical, whether to flip the transcript spot geometries to match the images if added later.

cellsize

numeric of length 1 or 2 with target cellsize: for square or rectangular cells the width and height, for hexagonal cells the distance between opposite edges (edge length is cellsize/sqrt(3)). A length units object can be passed, or an area unit object with area size of the square or hexagonal cell.
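To make the hexagon relationship concrete, the following base R sketch computes the edge length and area of a hexagonal bin from cellsize, taken as the distance between opposite edges. The helper names `hex_edge` and `hex_area` are hypothetical and exist only for this illustration.

```r
# Regular hexagon whose opposite edges are `cellsize` apart
hex_edge <- function(cellsize) cellsize / sqrt(3)        # edge length
hex_area <- function(cellsize) sqrt(3) / 2 * cellsize^2  # area

cellsize <- 20       # e.g. microns
hex_edge(cellsize)   # about 11.55
hex_area(cellsize)   # about 346.4, versus 400 for a 20 x 20 square bin
```

For the same cellsize, a hexagonal bin therefore covers roughly 87% of the area of a square bin, which is worth keeping in mind when comparing counts per bin between the two grid types.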

square

logical; if FALSE, create hexagonal grid

flat_topped

logical; if TRUE generate flat topped hexagons, else generate pointy topped

new_geometry_name

Name to give to the new colGeometry in the output. Defaults to "bins".

unit

Unit of the coordinates, either microns or pixels in the full-resolution image.

data_dir

Top level output directory.

tech

Technology whose output is to be read. Must be one of "Vizgen", "Xenium", or "CosMX", though more technologies may be added later.

image

String, which image(s) to add to the output SFE object. Not applicable to CosMX. See readVizgen and readXenium for the options; multiple images can be specified. If NULL, then the default from the read function for the technology will be used.

flip

What to flip so that the geometry and the image match: one of "geometry", "image", or "none". When the geometry is flipped, the y coordinates are simply set to -y, so the original bounding box is not preserved. This is consistent with readVizgen and readXenium.
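The effect of setting y to -y on the bounding box can be seen in a small base R sketch. The coordinates in `xy` are made up for illustration, and the code only mimics the described behavior rather than calling the package.

```r
# Hypothetical spot coordinates with y growing downward, as in an image
xy <- data.frame(x = c(0, 10, 10), y = c(0, 0, 5))

# Flip by negating y; nothing shifts the result back into the old extent
flipped <- transform(xy, y = -y)

range(xy$y)       # 0 5
range(flipped$y)  # -5 0
```

The x extent is unchanged, while the y extent moves from [0, 5] to [-5, 0], so any downstream code that assumes the original bounding box needs to account for the new one.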

max_flip

Maximum size of the image for which flipping the image is allowed, because the image has to be loaded into memory to be flipped. If the image is larger than this size, then the coordinates will be flipped instead.

Value

An SFE object with a count matrix giving the number of spots of each gene in each geometry. Geometries with no spots are removed.
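As a conceptual illustration of what the count matrix contains, the base R sketch below bins made-up spots into a square grid and tabulates genes per bin. It is not the package's implementation, which works with sf geometries; the data frame `spots` and the bin labels are hypothetical.

```r
# Hypothetical transcript spots (coordinates in microns)
spots <- data.frame(
  x    = c(1, 3, 12, 18, 19, 41),
  y    = c(2, 8,  4,  1, 15, 43),
  gene = c("A", "B", "A", "A", "B", "B")
)
cellsize <- 10

# Label each spot with the square bin it falls in (column_row index)
bin_id <- paste(floor(spots$x / cellsize), floor(spots$y / cellsize), sep = "_")

# Gene x bin count matrix; bins that received no spot never appear
counts <- unclass(table(gene = spots$gene, bin = bin_id))
counts
```

Only the four occupied bins show up as columns, mirroring the removal of geometries without spots in the returned object.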

Note

The resulting SFE object often includes geometries (e.g. grid cells) outside the tissue, because transcript spots can be detected outside the tissue. Also, bins at the edge of the tissue that don't fully overlap with the tissue will have lower transcript counts; this may have implications for downstream spatial analyses.