Usage¶
The seqspec tool operates on seqspec files. It facilitates the standardization of preprocessing steps across different assays, enables data management and tracking, and simplifies the interpretation and reuse of sequencing data.
seqspec consists of the following subcommands:
usage: seqspec [-h] <CMD> ...
seqspec 0.4.0: A machine-readable file format for genomic library sequence and structure.
GitHub: https://github.com/pachterlab/seqspec
Documentation: https://pachterlab.github.io/seqspec/
positional arguments:
<CMD>
auth Manage remote authentication profiles
build Deprecated. This command will be removed.
check Validate seqspec file against specification
find Find objects in seqspec file
file List files present in seqspec file
format Autoformat seqspec file
index Identify position of elements in seqspec file
info Get information from seqspec file
init Generate a new empty seqspec file
insert Insert regions or reads into an existing spec
methods Convert seqspec file into methods section
modify Modify attributes of various elements in seqspec file
onlist Get onlist file for elements in seqspec file
print Display the sequence and/or library structure from seqspec file
split Split seqspec file by modality
upgrade Upgrade seqspec file to current version (hidden)
version Get seqspec tool version and seqspec file version
optional arguments:
-h, --help show this help message and exit
seqspec operates on seqspec compatible YAML files that follow the specification. All of the following examples will use the seqspec specification for the DOGMAseq-DIG assay which can be found here: seqspec/examples/specs/dogmaseq-dig/spec.yaml.
Any command that takes yaml also accepts gzipped specs such as spec.yaml.gz.
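Scripts that read specs directly can follow the same convention. The sketch below is not seqspec's loader: `open_spec_text` is a hypothetical helper, and choosing gzip from the `.gz` suffix is an assumption about how such files are named.

```python
import gzip
from pathlib import Path


def open_spec_text(path):
    """Return the text of a spec file, handling .gz files transparently.

    Hypothetical helper for illustration; seqspec's own loader may
    detect compression differently.
    """
    path = Path(path)
    if path.suffix == ".gz":
        # gzip.open in text mode decompresses while reading
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            return fh.read()
    return path.read_text(encoding="utf-8")
```

The returned text can then be handed to whichever YAML parser the script already uses.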
The build command is deprecated. Use seqspec init, seqspec insert, and seqspec modify instead.
seqspec auth: Manage remote authentication profiles¶
Use auth profiles when a spec points to protected remote files such as IGVF-hosted onlists or FASTQs.
seqspec auth <AUTH_CMD> ...
seqspec auth has four subcommands:
init: create or update a profile that maps one or more hosts to credential environment variables
path: show where the auth config file lives
list: list configured profiles
resolve: show which profile would be used for a given URL
The auth config is host-based. The profile stores environment variable names, not secrets.
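To make the host-based idea concrete, here is a minimal sketch of how a URL could be matched to a profile. It does not reproduce seqspec's config format or its matching rules: the `PROFILES` dictionary, `resolve_profile`, `credentials_for`, and the exact-hostname match are all assumptions made for illustration.

```python
import os
from urllib.parse import urlparse

# Hypothetical in-memory view of an auth config: hosts plus the NAMES of
# environment variables. No secret values are stored here.
PROFILES = {
    "igvf": {
        "hosts": ["api.data.igvf.org", "data.igvf.org"],
        "username_env": "IGVF_ACCESS_KEY_ID",
        "password_env": "IGVF_ACCESS_KEY_SECRET",
    }
}


def resolve_profile(url, profiles=PROFILES):
    """Return the name of the first profile whose hosts include the URL's host."""
    host = urlparse(url).hostname
    for name, profile in profiles.items():
        if host in profile["hosts"]:
            return name
    return None


def credentials_for(name, profiles=PROFILES, env=os.environ):
    """Read the secret values from the environment only when they are needed."""
    profile = profiles[name]
    return env.get(profile["username_env"]), env.get(profile["password_env"])
```

Because only variable names are stored, the config file can be shared or committed without exposing credentials.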
Examples¶
# create an IGVF profile
seqspec auth init \
--profile igvf \
--host api.data.igvf.org \
--host data.igvf.org \
--kind basic \
--username-env IGVF_ACCESS_KEY_ID \
--password-env IGVF_ACCESS_KEY_SECRET
# inspect the config path
seqspec auth path
# list configured profiles
seqspec auth list
# resolve a URL to a profile
seqspec auth resolve https://api.data.igvf.org/reference-files/IGVFFI5429KKCK/
seqspec check: Validate seqspec file against specification¶
Check that the seqspec file is correctly formatted and consistent with the specification.
seqspec check [-h] [-o OUT] [--skip {igvf,igvf_onlist_skip,structural}] [--auth-profile PROFILE] yaml
from seqspec.seqspec_check import seqspec_check
from seqspec.utils import load_spec
spec = load_spec("spec.yaml", strict=False)
seqspec_check(spec, filter_type=None, auth_profile=None)
optionally, -o OUT can be used to write the output to a file.
optionally, --skip {igvf,igvf_onlist_skip,structural} can filter out known diagnostic classes (see source for list).
optionally, --auth-profile PROFILE uses a named auth profile when checking remote files.
yaml corresponds to the seqspec file and may be plain YAML or .yaml.gz.
seqspec check emits diagnostics with two severities:
error: the spec is invalid and should be fixed.
warning: the spec is valid, but the declared geometry may still need explicit downstream handling.
Warnings do not mean the spec is malformed. They flag cases that are easy to miss, such as two reads in the same modality covering the same declared regions. In those cases, downstream tools may need explicit overlap handling such as seqspec index --no-overlap.
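The overlap case can be pictured with a small sketch. This is not the logic seqspec check uses: the input shape (a mapping from read_id to the region_ids that read covers within one modality) and the rule "any shared region counts as an overlap" are assumptions chosen for illustration.

```python
from itertools import combinations


def find_overlapping_reads(read_regions):
    """Yield (read_a, read_b, shared_regions) for reads that share regions.

    read_regions maps a read_id to the list of region_ids it covers within
    ONE modality. This input shape is an assumption made for illustration.
    """
    for (a, regs_a), (b, regs_b) in combinations(read_regions.items(), 2):
        shared = [r for r in regs_a if r in regs_b]
        if shared:
            yield a, b, shared


reads = {
    "rna_R1": ["barcode", "umi"],
    "rna_R2": ["barcode", "umi", "cdna"],
}
overlaps = list(find_overlapping_reads(reads))
```

Here both reads cover the barcode and the UMI, so a downstream tool that extracts those elements needs to be told which read to take them from.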
A list of checks performed:
Check that the spec validates against the JSON Schema.
Check that modalities are unique.
Check that region_ids of the first level of the library_spec correspond to modalities (one per modality).
Check that onlist files exist (either as local paths or reachable URLs).
Check that the read_ids in the sequence_spec are unique.
Check that read files exist (either as local paths or reachable URLs).
Check that read (primer_id, strand) pairs are unique across all reads.
Check that the region_ids are unique across all regions.
Check that each read modality exists in the assay list of modalities.
Check that each read primer_id exists among the region IDs in the library_spec.
Check sequence_type and region annotation consistencies:
  if a region has a sequence type “fixed” then it should not contain subregions
  if a region has a sequence type “joined” then it should contain subregions
  if a region has a sequence type “random” then it should not contain subregions and sequence should be all X’s
  if a region has a sequence type “onlist” then it should have an onlist object
Check that the min_len is less than or equal to the max_len.
Check that the length of the sequence in every region is between the min_len and max_len.
Check that the number of files in each Read is the same across all reads.
Check that for every region with subregions, the region min_len/max_len equals the sum of the subregions’ min_len/max_len.
Check that for every region with subregions, the region sequence equals the left-to-right concatenation of the subregions’ sequences.
Check that each read’s max_len does not exceed the sequence-able range of library elements after (pos strand) or before (neg strand) the primer.
Warn when two reads in the same modality cover the same declared regions. This often needs explicit overlap handling with seqspec index --no-overlap.
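A few of the per-region rules above can be written down as a short sketch. This is not seqspec's implementation: `check_region` is a hypothetical function, regions are assumed to be plain dictionaries with the field names used in the spec, and the error messages are illustrative rather than the tool's actual wording.

```python
def check_region(region):
    """Return error strings for one region dict (illustrative messages only)."""
    errors = []
    rid = region["region_id"]
    seq = region.get("sequence") or ""
    subregions = region.get("regions") or []
    stype = region.get("sequence_type")
    lo, hi = region["min_len"], region["max_len"]

    # Length rules
    if lo > hi:
        errors.append(f"'{rid}' min_len {lo} is greater than max_len {hi}")
    if not lo <= len(seq) <= hi:
        errors.append(f"'{rid}' sequence length {len(seq)} is outside [{lo}, {hi}]")

    # sequence_type consistency rules
    if stype in ("fixed", "random") and subregions:
        errors.append(f"'{rid}' is {stype} but contains subregions")
    if stype == "random" and set(seq) - {"X"}:
        errors.append(f"'{rid}' is random but its sequence is not all X")
    if stype == "joined" and not subregions:
        errors.append(f"'{rid}' is joined but has no subregions")
    if stype == "onlist" and not region.get("onlist"):
        errors.append(f"'{rid}' is onlist but has no onlist object")

    # Parent/child agreement rules
    if subregions:
        if lo != sum(r["min_len"] for r in subregions) or hi != sum(
            r["max_len"] for r in subregions
        ):
            errors.append(f"'{rid}' min_len/max_len does not equal the sum over subregions")
        if seq != "".join(r.get("sequence") or "" for r in subregions):
            errors.append(f"'{rid}' sequence is not the concatenation of its subregions")
    return errors
```

The function inspects a single region; a full check would also walk the nested regions and apply the spec-wide rules (unique IDs, file existence, read geometry).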
Below is a list of example errors one may encounter when checking a spec:
# The "assay" value was not specified in the spec
[error 1] None is not of type 'string' in spec['assay']
# The "modalities" are not using the controlled vocabulary
[error 2] 'Ribonucleic acid' is not one of ['rna', 'tag', 'protein', 'atac', 'crispr'] in spec['modalities'][0]
# The "region_type" is not using the controlled vocabulary
[error 3] 'link_1' is not one of ['atac', 'barcode', 'cdna', 'crispr', 'fastq', 'gdna', 'hic', 'illumina_p5', 'illumina_p7', 'index5', 'index7', 'linker', 'ME1', 'ME2', 'methyl', 'nextera_read1', 'nextera_read2', 'poly_A', 'poly_G', 'poly_T', 'poly_C', 'protein', 'rna', 's5', 's7', 'tag', 'truseq_read1', 'truseq_read2', 'umi'] in spec['library_spec'][0]['regions'][3]['region_type']
# The "sequence_type" is not using the controlled vocabulary
[error 4] 'linker' is not one of ['fixed', 'random', 'onlist', 'joined'] in spec['library_spec'][0]['regions'][3]['sequence_type']
# The "region_id" is not unique across the spec
[error 5] region_id 'cell_bc' is not unique across all regions
# The length of the given "sequence" is less than the "min_len" specified for the sequence
[error 6] 'sample_bc' sequence 'NNNNNNNN' length '8' is less than min_len '10'
# The "filename" for the specified "onlist" does not exist in the same location as the spec.
[error 7] i5_index_onlist.txt does not exist
# The provided "sequence" contains invalid characters (only A, C, G, T, N, and X are permitted)
[error 8] 'NNNNNNNNZN' does not match '^[ACGTNX]+$' in spec['library_spec'][0]['regions'][4]['sequence']
# The "md5" for the given "onlist" file is not a valid md5sum
[error 9] '7asddd7asd7' does not match '^[a-f0-9]{32}$' in spec['library_spec'][0]['regions'][8]['onlist']['md5']
Examples¶
# check the spec against the formal specification
$ seqspec check spec.yaml
[error 1] None is not of type 'string' in spec['assay']
[error 2] 'Ribonucleic acid' is not one of ['rna', 'tag', 'protein', 'atac', 'crispr'] in spec['modalities'][0]
# check a spec with protected remote resources
$ seqspec check --auth-profile igvf spec.yaml
# a valid spec can still emit overlap warnings
$ seqspec check overlap_spec.yaml
[warning 1] reads 'rna_R1' and 'rna_R2' in modality 'rna' both cover region(s) 'barcode', 'umi'. Downstream tools may require explicit overlap handling such as `seqspec index --no-overlap`
seqspec find: Find objects in seqspec file¶
seqspec find [-h] [-o OUT] [-s SELECTOR] -m MODALITY [-i ID] yaml
from seqspec.seqspec_find import run_find
run_find(spec_fn: str, modality: str, id: str, idtype: str, o: str)
optionally, -o OUT can be used to write the output to a file.
optionally, -s SELECTOR is the type of the ID you are searching for (default: region). Can be one of
read
region
file
region-type
-m MODALITY is the modality in which you are searching.
-i ID is the ID you are searching for.
yaml corresponds to the seqspec file.
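Because regions nest (each region may carry its own regions list), a region-type search is naturally recursive. The sketch below illustrates that idea on plain dictionaries; `find_regions_by_type` is a hypothetical helper and not the function seqspec find calls.

```python
def find_regions_by_type(region, region_type):
    """Depth-first search of a nested region dict for a given region_type."""
    found = []
    if region.get("region_type") == region_type:
        found.append(region)
    # "regions" may be missing or null for leaf regions
    for child in region.get("regions") or []:
        found.extend(find_regions_by_type(child, region_type))
    return found


# A trimmed, hand-written stand-in for the rna modality of a spec
rna = {
    "region_id": "rna",
    "region_type": "rna",
    "regions": [
        {"region_id": "rna_truseq_read1", "region_type": "truseq_read1", "regions": None},
        {"region_id": "rna_cell_bc", "region_type": "barcode", "regions": None},
        {"region_id": "rna_umi", "region_type": "umi", "regions": None},
    ],
}
barcode_ids = [r["region_id"] for r in find_regions_by_type(rna, "barcode")]
```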
Examples¶
# Find reads by id
$ seqspec find -m rna -s read -i rna_R1 spec.yaml
- !Read
read_id: rna_R1
name: rna Read 1
modality: rna
primer_id: rna_truseq_read1
min_len: 28
max_len: 28
strand: pos
files:
- !File
file_id: rna_R1_SRR18677638.fastq.gz
filename: rna_R1_SRR18677638.fastq.gz
filetype: fastq
filesize: 18499436
url: fastqs/rna_R1_SRR18677638.fastq.gz
urltype: local
md5: 7eb15a70da9b729b5a87e30b6596b641
# Find regions with `barcode` region type
$ seqspec find -m rna -s region-type -i barcode spec.yaml
- !Region
region_id: rna_cell_bc
region_type: barcode
name: Cell Barcode
sequence_type: onlist
sequence: NNNNNNNNNNNNNNNN
min_len: 16
max_len: 16
onlist: !Onlist
location: local
filename: RNA-737K-arc-v1.txt
filetype: txt
filesize: 0
url: RNA-737K-arc-v1.txt
urltype: local
md5: a88cd21e801ae6f9a7d9a48b67ccf693
file_id: RNA-737K-arc-v1.txt
regions: null
parent_id: rna
seqspec file: List files present in seqspec file¶
seqspec file [-h] [-o OUT] [-i IDs] -m MODALITY [-s SELECTOR] [-f FORMAT] [-k KEY] [--fullpath] yaml
from seqspec.seqspec_file import run_file
run_file(spec_fn: str, m: str, ids: List[str], idtype: str, fmt: str, k: str, o: str, fp=False)
optionally, -o OUT can be used to write the output to a file.
optionally, -s SELECTOR is the type of the ID you are searching for (default: read). Can be one of
read
region
file
region-type
optionally, -f FORMAT is the format to return the list of files. Can be one of
paired
interleaved
index
list
json
optionally, -k KEY is the key to display for the file (default: file_id). Can be one of
file_id
filename
filetype
filesize
url
urltype
md5
all
-m MODALITY is the modality in which you are searching.
-i IDs are the IDs you are searching for.
yaml corresponds to the seqspec file.
--fullpath expands local url values to absolute paths relative to the spec file.
Examples¶
# List paired read files
$ seqspec file -m rna spec.yaml
rna_R1_SRR18677638.fastq.gz rna_R2_SRR18677638.fastq.gz
# List interleaved read files
$ seqspec file -m rna -f interleaved spec.yaml
rna_R1_SRR18677638.fastq.gz
rna_R2_SRR18677638.fastq.gz
# List urls of all read files
$ seqspec file -m rna -f list -k url spec.yaml
rna_R1 rna_R1_SRR18677638.fastq.gz fastqs/rna_R1_SRR18677638.fastq.gz
rna_R2 rna_R2_SRR18677638.fastq.gz fastqs/rna_R2_SRR18677638.fastq.gz
# List all files in regions
$ seqspec file -m rna -f list -s region -k all spec.yaml
rna_cell_bc RNA-737K-arc-v1.txt RNA-737K-arc-v1.txt txt 2142553 https://github.com/pachterlab/qcbc/raw/main/tests/10xMOME/RNA-737K-arc-v1.txt.gz https a88cd21e801ae6f9a7d9a48b67ccf693
# List files for barcode regions in json
$ seqspec file -m rna -f json -s region-type -k all -i barcode spec.yaml
[
{
"file_id": "RNA-737K-arc-v1.txt",
"filename": "RNA-737K-arc-v1.txt",
"filetype": "txt",
"filesize": 2142553,
"url": "https://github.com/pachterlab/qcbc/raw/main/tests/10xMOME/RNA-737K-arc-v1.txt.gz",
"urltype": "https",
"md5": "a88cd21e801ae6f9a7d9a48b67ccf693"
}
]
Note: seqspec file -s read gets the files for the read, not the files contained in the regions mapped to the read.
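The json format is convenient for scripts. As a small sketch, the snippet below parses the JSON shown in the barcode example of this section (pasted in as a string rather than produced by running the tool) and pulls out the remote URLs. Treating "http", "https", and "ftp" as the remote urltype values is an assumption.

```python
import json

# Output copied from the `-f json -s region-type -k all -i barcode` example
raw = """
[
  {
    "file_id": "RNA-737K-arc-v1.txt",
    "filename": "RNA-737K-arc-v1.txt",
    "filetype": "txt",
    "filesize": 2142553,
    "url": "https://github.com/pachterlab/qcbc/raw/main/tests/10xMOME/RNA-737K-arc-v1.txt.gz",
    "urltype": "https",
    "md5": "a88cd21e801ae6f9a7d9a48b67ccf693"
  }
]
"""

files = json.loads(raw)
# Keep only files that would need to be downloaded
remote_urls = [f["url"] for f in files if f["urltype"] in ("http", "https", "ftp")]
total_bytes = sum(f["filesize"] for f in files)
```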
seqspec format: Autoformat seqspec file¶
Automatically fill in missing fields in the spec.
seqspec format [-h] [-o OUT] yaml
from seqspec.seqspec_format import run_format
run_format(spec_fn: str, o: str)
-o OUT the path to create the formatted seqspec file.
yaml corresponds to the seqspec file.
Examples¶
# format the spec and print the spec to stdout
$ seqspec format spec.yaml
# note you can also overwrite the spec you are formatting
$ seqspec format -o spec.yaml spec.yaml
seqspec index: Identify position of elements in seqspec file¶
Identify the position of elements in a spec for use in downstream tools. Returns the 0-indexed position of elements contained in a given region in the 5’->3’ direction.
seqspec index [-h] [-o OUT] [-t TOOL] [-s SELECTOR] [--rev] [--subregion-type SUBREGIONTYPE] [--no-overlap] -m MODALITY [-i IDs] yaml
from seqspec.seqspec_index import run_index
run_index(spec_fn: str, modality: str, ids: List[str], idtype: str, fmt: str, rev: str, subregion_type: str, o)
optionally, -o OUT can be used to write the output to a file.
optionally, --rev can be set to return the 3’->5’ index.
optionally, -t TOOL returns the indices in the format specified by the tool. One of:
chromap: emit barcode and genomic ranges in chromap --read-format syntax
kb: kallisto/kb count -x TECHNOLOGY (format) requires a barcode, UMI, and sequence. The following region_type are used during indexing:
  barcode for the barcode
  umi for the umi
  cdna, gdna, protein, or tag for the sequence
kb-single: same as kb but forces a single feature segment
seqkit: seqkit subseq -r, --region string (format)
simpleaf: simpleaf quant -c, --chemistry (format) requires a barcode, UMI, and sequence. The following region_type are used during indexing:
  barcode for the barcode
  umi for the umi
  cdna for the sequence
starsolo: --soloCBstart, --soloCBlen, --soloUMIstart, --soloUMIlen (format) requires a barcode, UMI, and sequence. The following region_type are used during indexing:
  barcode for the barcode
  umi for the umi
  cdna for the sequence
splitcode: splitcode @extract lines and tag groups
tab: tab delimited file (region<\t>element<\t>start<\t>end)
zumis: yaml (format) requires a barcode, UMI, and sequence. The following region_type are used during indexing:
  barcode for the barcode
  umi for the umi
  cdna for the sequence
optionally, -s SELECTOR is the type of the ID you are searching for (default: read). Can be one of
read
region
file
-m MODALITY is the modality in which the indexed object resides.
-i IDs are the IDs of the objects you are indexing.
yaml corresponds to the seqspec file.
--subregion-type filters to a specific region_type in some formats (e.g., seqkit)
--no-overlap removes overlapping regions across coordinates (stable, by first occurrence)
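The 0-indexed coordinates can be understood as running totals of element lengths. The sketch below shows that arithmetic for elements listed in 5’->3’ order; `element_positions` is a hypothetical helper, and the half-open (start inclusive, end exclusive) convention is inferred from the atac_R2 example in this section, where an 8 bp linker spans 0 to 8 and a 16 bp barcode spans 8 to 24.

```python
def element_positions(elements):
    """Return (name, start, stop) tuples with 0-indexed, half-open coordinates.

    elements is an ordered list of (name, length) pairs in 5'->3' order.
    """
    positions = []
    start = 0
    for name, length in elements:
        stop = start + length
        positions.append((name, start, stop))
        start = stop  # the next element begins where this one ends
    return positions


atac_r2 = element_positions([("linker", 8), ("barcode", 16)])
```

With this convention, `stop - start` is always the element length, which is what most downstream tools expect when slicing reads.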
Examples¶
# get the indices of the elements contained within the FASTQs specified in the spec in tab format
$ seqspec index -m atac -s file -i atac_R1_SRR18677642.fastq.gz,atac_R2_SRR18677642.fastq.gz,atac_R3_SRR18677642.fastq.gz spec.yaml
atac_R1 gdna gdna 0 53
atac_R2 atac linker linker 0 8
atac_R2 Cell Barcode barcode 8 24
atac_R3 gdna gdna 0 53
# do the same but in the kb format
$ seqspec index -m atac -t kb -s file -i atac_R1_SRR18677642.fastq.gz,atac_R2_SRR18677642.fastq.gz,atac_R3_SRR18677642.fastq.gz spec.yaml
1,8,24:-1,-1,-1:0,0,53,2,0,53
# If the files are specified in the spec then -i can be omitted
$ seqspec index -m atac -t kb -s file spec.yaml
1,8,24:-1,-1,-1:0,0,53,2,0,53
seqspec info: get info about seqspec file¶
seqspec info [-h] [-k KEY] [-f FORMAT] [-o OUT] yaml
from seqspec.seqspec_info import run_info
run_info(spec_fn: str, f: str, k=None, o=None)
optionally, -o OUT path to write the info.
optionally, -k KEY the object to display (default: meta). Can be one of
modalities
meta
sequence_spec
library_spec
optionally, -f FORMAT the output format (default: tab). Can be one of
tab
json
yaml corresponds to the seqspec file.
Examples¶
# Get meta information in json format
$ seqspec info -f json spec.yaml
{
"seqspec_version": "0.3.0",
"assay_id": "DOGMAseq-DIG",
"name": "DOGMAseq-DIG/Illumina",
"doi": "https://doi.org/10.1186/s13059-022-02698-8",
"date": "23 June 2022",
"description": "DOGMAseq with digitonin (DIG) is a single-cell multi-omics assay that simultaneously measures protein, RNA, and chromatin accessibility in single cells. The assay is based on the DOGMAseq technology, which uses a DNA-barcoded antibody library to capture proteins of interest, followed by a single-cell RNA-seq protocol and a single-cell ATAC-seq protocol. The DOGMAseq-LLL assay is designed to be compatible with the 10x Genomics Chromium platform.",
"lib_struct": "",
...
# long output omitted
# Get the list of modalities
$ seqspec info -k modalities spec.yaml
protein tag rna atac
# Get library spec in json format
$ seqspec info -f json -k library_spec spec.yaml
{
"protein": [
{
"region_id": "ghost_protein_truseq_read1",
"region_type": "named",
"name": "Truseq Read 1",
"sequence_type": "fixed",
"onlist": null,
"sequence": "",
"min_len": 0,
"max_len": 0,
"regions": []
},
...
# long output omitted
# Get sequence spec in json format
$ seqspec info -f json -k sequence_spec spec.yaml
[
{
"read_id": "protein_R1",
"name": "protein Read 1",
"modality": "protein",
"primer_id": "protein_truseq_read1",
"min_len": 28,
"max_len": 28,
"strand": "pos",
"files": [
...
# long output omitted
seqspec init: Generate a new empty seqspec draft¶
Create a minimal, valid draft containing only meta Regions (one per modality). This is intended as a starting point to then insert regions and reads.
seqspec init [-h] -n NAME -m MODALITIES [--doi DOI] [--description DESC] [--date YYYY-MM-DD] [-o OUT]
-m MODALITIES comma-separated list of modalities (e.g., rna,atac).
-n NAME assay name.
--doi, --description, --date optional metadata.
-o OUT optional output path (default: stdout).
Example:
seqspec init -n myassay -m rna,atac -o spec.yaml
seqspec methods: Convert seqspec file into methods section¶
Generate a methods section from a seqspec file.
seqspec methods [-h] -m MODALITY [-o OUT] yaml
from seqspec.seqspec_methods import run_methods
run_methods(spec_fn: str, m: str, o: str)
optionally, -o OUT path to write the methods section.
-m MODALITY is the modality to write the methods for.
yaml corresponds to the seqspec file.
Examples¶
# print methods for rna modality
$ seqspec methods -m rna spec.yaml
Methods
The rna portion of the DOGMAseq-DIG/Illumina assay was generated on 23 June 2022.
Libary structure
The library was generated using the CG000338 Chromium Next GEM Multiome ATAC + Gene Expression Rev. D protocol (10x Genomics) library protocol and Illumina Truseq Single Index library kit. The library contains the following elements:
1. Truseq Read 1: 33-33bp fixed sequence (ACACTCTTTCCCTACACGACGCTCTTCCGATCT).
2. Cell Barcode: 16-16bp onlist sequence (NNNNNNNNNNNNNNNN), onlist file: RNA-737K-arc-v1.txt.
3. umi: 12-12bp random sequence (XXXXXXXXXXXX).
4. cdna: 102-102bp random sequence (XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX).
5. Truseq Read 2: 34-34bp fixed sequence (AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC).
Sequence structure
The library was sequenced on a Illumina NovaSeq 6000 (EFO:0008637) using the NovaSeq 6000 S2 Reagent Kit v1.5 (100 cycles) sequencing kit. The library was sequenced using the following configuration:
- rna Read 1: 28 cycles on the positive strand using the rna_truseq_read1 primer. The following files contain the sequences in Read 1:
- File 1: rna_R1_SRR18677638.fastq.gz
- rna Read 2: 102 cycles on the negative strand using the rna_truseq_read2 primer. The following files contain the sequences in Read 2:
- File 1: rna_R2_SRR18677638.fastq.gz
seqspec modify: Modify attributes of various elements (JSON-based)¶
Modify objects by passing a JSON array of partial objects via --keys. Only provided fields are applied.
seqspec modify [-h] -m MODALITY -s SELECTOR -k JSON [-o OUT] yaml
Selectors: read, region, file, seqkit, seqprotocol, libkit, libprotocol, assay.
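The "only provided fields are applied" behaviour amounts to a partial update keyed on an ID field. The sketch below illustrates that semantics on plain dictionaries and shows one way to build the JSON argument without hand-quoting it. `apply_partial_updates` is a hypothetical helper, not seqspec's code, and matching on a single ID field is an assumption.

```python
import json
import shlex


def apply_partial_updates(objects, updates, id_field):
    """Apply partial dicts to the objects whose id_field matches.

    Only keys present in an update are overwritten; everything else is kept.
    """
    by_id = {obj[id_field]: obj for obj in objects}
    for update in updates:
        target = by_id.get(update[id_field])
        if target is not None:
            target.update(update)
    return objects


updates = [{"read_id": "rna_R1", "name": "renamed_rna_R1"}]
reads = [{"read_id": "rna_R1", "name": "rna Read 1", "strand": "pos"}]
apply_partial_updates(reads, updates, "read_id")

# Build a shell-safe command line instead of hand-quoting the JSON
command = (
    "seqspec modify -m rna -s read -k "
    + shlex.quote(json.dumps(updates))
    + " spec.yaml"
)
```

Generating the payload with `json.dumps` avoids the quoting mistakes that are easy to make when typing nested JSON on the command line.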
Examples:
# Update a read name
seqspec modify -m rna -s read -k '[{"read_id":"rna_R1","name":"renamed_rna_R1"}]' spec.yaml
# Update a region name
seqspec modify -m rna -s region -k '[{"region_id":"rna_cell_bc","name":"Cell Barcode"}]' spec.yaml
# Update a file url
seqspec modify -m rna -s file -k '[{"file_id":"R1.fastq.gz","url":"./fastq/R1.fastq.gz"}]' spec.yaml
Examples¶
# modify the read id
$ seqspec modify -m atac -o mod_spec.yaml -i atac_R1 --read-id renamed_atac_R1 spec.yaml
# modify the region id
$ seqspec modify -m atac -o mod_spec.yaml -s region -i atac_cell_bc --region-id renamed_atac_cell_bc spec.yaml
# modify the files for R1 fastq
$ seqspec modify -m atac -o mod_spec.yaml -i atac_R1 --files "R1_1.fastq.gz,fastq,0,./fastq/R1_1.fastq.gz,local,null:R1_2.fastq.gz,fastq,0,./fastq/R1_2.fastq.gz,local,null" spec.yaml
seqspec onlist: Get onlist file(s) for elements in seqspec file¶
seqspec onlist [-h] [-o OUT] [-s SELECTOR] [-f {product,multi}] [--auth-profile PROFILE] -m MODALITY [-i ID] yaml
from seqspec.seqspec_onlist import get_onlists
from seqspec.utils import load_spec
spec = load_spec("spec.yaml")
get_onlists(spec, modality="rna", selector="region-type", id="barcode")
optionally, -o OUT when set with -f, writes the joined onlist to this file; when set without -f, downloads remote onlists locally and prints paths.
-m MODALITY is the modality in which you are searching for the region.
-i ID is the id of the object to search for the onlist.
-s SELECTOR is the type of the id of the object (default: read). Can be one of:
read
region
region-type
-f selects how to combine multiple onlists:
product (cartesian product)
multi (row-aligned, zip with padding)
optionally, --auth-profile PROFILE uses a named auth profile for protected remote onlists.
yaml corresponds to the seqspec file and may be plain YAML or .yaml.gz.
Note: -s region-type is only valid when the matching regions come from one read geometry. If the same region_type appears across multiple reads in the modality, seqspec onlist errors and asks you to use -s read or -s region instead.
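The difference between the two -f modes is easiest to see on tiny lists. The sketch below is only an illustration of "cartesian product" versus "row-aligned zip with padding": the function names, the separators, and the "-" padding value are assumptions and may not match the files seqspec writes.

```python
from itertools import product, zip_longest


def combine_product(onlists, sep=""):
    """Cartesian product: every combination of one entry per onlist."""
    return [sep.join(combo) for combo in product(*onlists)]


def combine_multi(onlists, sep=" ", pad="-"):
    """Row-aligned: row i holds entry i of each onlist, padded when one runs out."""
    return [sep.join(row) for row in zip_longest(*onlists, fillvalue=pad)]


first = ["AA", "CC"]
second = ["G", "T"]
as_product = combine_product([first, second])
as_multi = combine_multi([first, ["G"]])
```

The product grows multiplicatively with the number of onlists, while the multi form has only as many rows as the longest onlist.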
Examples¶
# Get onlist for the element in the rna_R1 read
$ seqspec onlist -m rna -s read -i rna_R1 spec.yaml
/path/to/spec/folder/RNA-737K-arc-v1.txt
# Get onlist for barcode region type
$ seqspec onlist -m rna -s region-type -i barcode spec.yaml
/path/to/spec/folder/RNA-737K-arc-v1.txt
# Ambiguous region-type matches across reads are rejected
$ seqspec onlist -m rna -s region-type -i barcode ambiguous_spec.yaml
region-type 'barcode' matches regions in multiple reads for modality 'rna': rna_R1, rna_R2. Use -s read or -s region to disambiguate.
# Get an onlist from a protected remote source
$ seqspec onlist --auth-profile igvf -m crispr -s region-type -i barcode spec.yaml
/path/to/spec/folder/IGVFFI5429KKCK.txt.gz
seqspec print: Display the sequence and/or library structure from seqspec file¶
Print sequence and/or library structure as ascii, png, or html.
seqspec print [-h] [-o OUT] [-f FORMAT] yaml
from seqspec.seqspec_print import seqspec_print
from seqspec.utils import load_spec
seqspec_print(load_spec("spec.yaml"), "seqspec-html")
optionally, -o OUT to set the path of printed file.
optionally, -f FORMAT is the format of the printed file. Can be one of:
library-ascii: prints an ascii tree of the library_spec
seqspec-html: prints a self-contained interactive HTML view of the library structure, reads, and metadata
seqspec-png: prints a png summary of modality structures
seqspec-ascii: prints an ascii representation of both the library_spec and sequence_spec
yaml corresponds to the seqspec file and may be plain YAML or .yaml.gz.
The Python CLI supports all four formats. The standalone Rust CLI supports library-ascii, seqspec-ascii, and seqspec-html.
Examples¶
# Print the library structure as ascii
$ seqspec print spec.yaml
┌─'ghost_protein_truseq_read1:0'
├─'protein_truseq_read1:33'
├─'protein_cell_bc:16'
┌─protein─┤
│ ├─'protein_umi:12'
│ ├─'protein_seq:15'
│ └─'protein_truseq_read2:34'
│ ┌─'tag_truseq_read1:33'
│ ├─'tag_cell_bc:16'
├─tag─────┼─'tag_umi:12'
│ ├─'tag_seq:15'
─┤ └─'tag_truseq_read2:34'
│ ┌─'rna_truseq_read1:33'
│ ├─'rna_cell_bc:16'
├─rna─────┼─'rna_umi:12'
│ ├─'cdna:102'
│ └─'rna_truseq_read2:34'
│ ┌─'atac_truseq_read1:33'
│ ├─'gDNA:100'
└─atac────┼─'atac_truseq_read2:34'
├─'spacer:8'
└─'atac_cell_bc:16'
# Print the sequence and library structure as ascii
$ seqspec print -f seqspec-ascii spec.yaml
protein
---
|--------------------------->(1) protein_R1
ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNNNNNNNXXXXXXXXXXXXXXXXXXXXXXXXXXXAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNNNNNNNNNNXXXXXXXXXXXXXXXXXXXXXXXXXXXTCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG
<--------------|(2) protein_R2
tag
---
|--------------------------->(1) tag_R1
ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNNNNNNNXXXXXXXXXXXXXXXXXXXXXXXXXXXAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNNNNNNNNNNXXXXXXXXXXXXXXXXXXXXXXXXXXXTCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG
<--------------|(2) tag_R2
rna
---
|--------------------------->(1) rna_R1
ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNNNNNNNXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNNNNNNNNNNXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXTCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG
<-----------------------------------------------------------------------------------------------------|(2) rna_R2
atac
---
|---------------------------------------------------->(1) atac_R1
|----------------------->(2) atac_R2
ACACTCTTTCCCTACACGACGCTCTTCCGATCTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGACGCGNNNNNNNNNNNNNNNN
TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXTCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGGTCTGCGCNNNNNNNNNNNNNNNN
<----------------------------------------------------|(3) atac_R3
# Print the sequence and library structure as html
$ seqspec print -f seqspec-html -o spec.html spec.yaml
# Print the library structure as a png
$ seqspec print -o spec.png -f seqspec-png spec.yaml
seqspec split: Split seqspec file by modality¶
seqspec split [-h] -o OUT yaml
from seqspec.seqspec_split import run_split
run_split(spec_fn, o)
optionally, -o OUT name prepended to split specs.
yaml corresponds to the seqspec file.
Examples¶
# split spec into modalities
$ seqspec split -o split spec.yaml
$ ls -1
spec.yaml
split.atac.yaml
split.protein.yaml
split.rna.yaml
split.tag.yaml
seqspec version: Get seqspec tool version and seqspec file version¶
seqspec version [-h] [-o OUT] yaml
from seqspec.seqspec_version import seqspec_version
from seqspec.utils import load_spec
seqspec_version(load_spec("spec.yaml"))
optionally, -o OUT path to file to write output.
yaml corresponds to the seqspec file and may be plain YAML or .yaml.gz.
Examples¶
# Get versions of tool and file
$ seqspec version spec.yaml
seqspec version: 0.4.0
seqspec file version: 0.4.0
(HIDDEN) seqspec upgrade: Upgrade seqspec file from older versions to the current version¶
This is a hidden subcommand that upgrades an old version of the spec to the current one. It upgrades 0.0.x, 0.1.x, 0.2.0, and 0.3.0 specs to 0.4.0.
seqspec upgrade [-h] [-o OUT] yaml
from seqspec.seqspec_upgrade import seqspec_upgrade
from seqspec.utils import load_spec
spec = load_spec("spec.v0_3_0.yaml", strict=False)
seqspec_upgrade(spec, spec.seqspec_version or "0.0.0")
Examples¶
# upgrade a 0.3.0 spec to 0.4.0
$ seqspec upgrade -o spec.v0_4_0.yaml spec.v0_3_0.yaml
- Xu, Z., Heidrich-O’Hare, E., Chen, W., & Duerr, R. H. (2022). Comprehensive benchmarking of CITE-seq versus DOGMA-seq single cell multimodal omics. Genome Biology, 23(1). 10.1186/s13059-022-02698-8