Usage¶
The seqspec tool operates on seqspec files and
- Facilitates the standardization of preprocessing steps across different assays,
- Enables data management and tracking,
- Simplifies the interpretation and reuse of sequencing data.
seqspec consists of the following subcommands:
usage: seqspec [-h] <CMD> ...
seqspec 0.3.0: A machine-readable file format for genomic library sequence and structure.
GitHub: https://github.com/pachterlab/seqspec
Documentation: https://pachterlab.github.io/seqspec/
positional arguments:
<CMD>
build Generate a complete seqspec with natural language (LLM-assisted)
check Validate seqspec file against specification
find Find objects in seqspec file
file List files present in seqspec file
format Autoformat seqspec file
index Identify position of elements in seqspec file
info Get information from seqspec file
init Generate a new empty seqspec file
insert Insert regions or reads into an existing spec
methods Convert seqspec file into methods section
modify Modify attributes of various elements in seqspec file
onlist Get onlist file for elements in seqspec file
print Display the sequence and/or library structure from seqspec file
split Split seqspec file by modality
upgrade Upgrade seqspec file to current version (hidden)
version Get seqspec tool version and seqspec file version
optional arguments:
-h, --help show this help message and exitseqspec operates on seqspec compatible YAML files that follow the specification. All of the following examples will use the seqspec specification for the DOGMAseq-DIG assay which can be found here: seqspec/examples/specs/dogmaseq-dig/spec.yaml.
seqspec check: Validate seqspec file against specification¶
Check that the seqspec file is correctly formatted and consistent with the specification.
seqspec check [-h] [-o OUT] [--skip {igvf,igvf_onlist_skip}] yamlfrom seqspec.seqspec_check import run_check
run_check(schema_fn: str, spec_fn: str, o: str)- optionally,
-o OUTcan be used to write the output to a file. - optionally,
--skip {igvf,igvf_onlist_skip}can filter out known IGVF-specific warnings (see source for list). yamlcorresponds to theseqspecfile.
A list of checks performed:
- Check that the spec validates against the JSON Schema.
- Check that modalities are unique.
- Check that
region_ids of the first level of thelibrary_speccorrespond to modalities (one per modality). - Check that onlist files exist (either as local paths or reachable URLs).
- Check that the
read_ids in thesequence_specare unique. - Check that read files exist (either as local paths or reachable URLs).
- Check that read
(primer_id, strand)pairs are unique across all reads. - Check that the
region_ids are unique across all regions. - Check that each read
modalityexists in the assay list of modalities. - Check that each read
primer_idexists among the region IDs in thelibrary_spec. - Check
sequence_typeand region annotation consistencies:
- if a region has a sequence type “fixed” then it should not contain subregions
- if a region has a sequence type “joined” then it should contain subregions
- if a region has a sequence type “random” then it should not contain subregions and
sequenceshould be all X’s - if a region has a sequence type “onlist” then it should have an onlist object
- Check that the
min_lenis less than or equal to themax_len. - Check that the length of the
sequencein every region is between themin_lenandmax_len. - Check that the number of files in each
Readis the same across all reads. - Check that for every region with subregions, the region
min_len/max_lenequals the sum of the subregions’min_len/max_len. - Check that for every region with subregions, the region
sequenceequals the left-to-right concatenation of the subregions’sequences. - Check that each read’s
max_lendoes not exceed the sequence-able range of library elements after (pos strand) or before (neg strand) the primer.
Below are a list of example errors one may encounter when checking a spec:
# The "assay" value was not specified in the spec
[error 1] None is not of type 'string' in spec['assay']
# The "modalities" are not using the controlled vocabulary
[error 2] 'Ribonucleic acid' is not one of ['rna', 'tag', 'protein', 'atac', 'crispr'] in spec['modalities'][0]
# The "region_type" is not using the controlled vocabulary
[error 3] 'link_1' is not one of ['atac', 'barcode', 'cdna', 'crispr', 'fastq', 'gdna', 'hic', 'illumina_p5', 'illumina_p7', 'index5', 'index7', 'linker', 'ME1', 'ME2', 'methyl', 'nextera_read1', 'nextera_read2', 'poly_A', 'poly_G', 'poly_T', 'poly_C', 'protein', 'rna', 's5', 's7', 'tag', 'truseq_read1', 'truseq_read2', 'umi'] in spec['library_spec'][0]['regions'][3]['region_type']
# The "sequence_type" is not using the controlled vocabulary
[error 4] 'linker' is not one of ['fixed', 'random', 'onlist', 'joined'] in spec['library_spec'][0]['regions'][3]['sequence_type']
# The "region_id" is not unique across the spec
[error 5] region_id 'cell_bc' is not unique across all regions
# The length of the given "sequence" is less than the "min_len" specified for the sequence
[error 6] 'sample_bc' sequence 'NNNNNNNN' length '8' is less than min_len '10'
# The "filename" for the specified "onlist" does not exist in the same location as the spec.
[error 7] i5_index_onlist.txt does not exist
# The provided "sequence" contains invalid characters (only A, C, G, T, N, and X are permitted)
[error 8] 'NNNNNNNNZN' does not match '^[ACGTNX]+$' in spec['library_spec'][0]['regions'][4]['sequence']
# The "md5" for the given "onlist" file is not a valid md5sum
[error 9] '7asddd7asd7' does not match '^[a-f0-9]{32}$' in spec['library_spec'][0]['regions'][8]['onlist']['md5']Examples¶
# check the spec against the formal specification
$ seqspec check spec.yaml
[error 1] None is not of type 'string' in spec['assay']
[error 2] 'Ribonucleic acid' is not one of ['rna', 'tag', 'protein', 'atac', 'crispr'] in spec['modalities'][0]seqspec find: Find objects in seqspec file¶
seqspec find [-h] [-o OUT] [-s SELECTOR] -m MODALITY [-i ID] yamlfrom seqspec.seqspec_find import run_find
run_find(spec_fn: str, modality: str, id: str, idtype: str, o: str)- optionally,
-o OUTcan be used to write the output to a file. - optionally,
-s Selectoris the type of the ID you are searching for (default: region). Can be one of- read
- region
- file
- region-type
-m MODALITYis the modality in which you are searching within.-i IDthe ID you are searching for.yamlcorresponds to theseqspecfile.
Examples¶
# Find reads by id
$ seqspec find -m rna -s read -i rna_R1 spec.yaml
- !Read
read_id: rna_R1
name: rna Read 1
modality: rna
primer_id: rna_truseq_read1
min_len: 28
max_len: 28
strand: pos
files:
- !File
file_id: rna_R1_SRR18677638.fastq.gz
filename: rna_R1_SRR18677638.fastq.gz
filetype: fastq
filesize: 18499436
url: fastqs/rna_R1_SRR18677638.fastq.gz
urltype: local
md5: 7eb15a70da9b729b5a87e30b6596b641
# Find regions with `barcode` region type
$ seqspec find -m rna -s region-type -i barcode spec.yaml
- !Region
region_id: rna_cell_bc
region_type: barcode
name: Cell Barcode
sequence_type: onlist
sequence: NNNNNNNNNNNNNNNN
min_len: 16
max_len: 16
onlist: !Onlist
location: local
filename: RNA-737K-arc-v1.txt
filetype: txt
filesize: 0
url: RNA-737K-arc-v1.txt
urltype: local
md5: a88cd21e801ae6f9a7d9a48b67ccf693
file_id: RNA-737K-arc-v1.txt
regions: null
parent_id: rnaseqspec file: List files present in seqspec file¶
seqspec file [-h] [-o OUT] [-i IDs] -m MODALITY [-s SELECTOR] [-f FORMAT] [-k KEY] [--fullpath] yamlfrom seqspec.seqspec_file import run_file
run_file(spec_fn: str, m: str, ids: List[str], idtype: str, fmt: str, k: str, o: str, fp=False)- optionally,
-o OUTcan be used to write the output to a file. - optionally,
-s Selectoris the type of the ID you are searching for (default: read). Can be one of- read
- region
- file
- region-type
- optionally,
-f FORMATis the format to return the list of files. Can be one of- paired
- interleaved
- index
- list
- json
- optionally,
-k KEYis the key to display for the file (default: file_id). Can be one of- file_id
- filename
- filetype
- filesize
- url
- urltype
- md5
- all
-m MODALITYis the modality in which you are searching within.-i IDsthe ID you are searching for.yamlcorresponds to theseqspecfile.--fullpathexpands localurlvalues to absolute paths relative to the spec file.
Examples¶
# List paired read files
$ seqspec file -m rna spec.yaml
rna_R1_SRR18677638.fastq.gz rna_R2_SRR18677638.fastq.gz
# List interleaved read files
$ seqspec file -m rna -f interleaved spec.yaml
rna_R1_SRR18677638.fastq.gz
rna_R2_SRR18677638.fastq.gz
# List urls of all read files
$ seqspec file -m rna -f list -k url spec.yaml
rna_R1 rna_R1_SRR18677638.fastq.gz fastqs/rna_R1_SRR18677638.fastq.gz
rna_R2 rna_R2_SRR18677638.fastq.gz fastqs/rna_R2_SRR18677638.fastq.gz
# List all files in regions
$ seqspec file -m rna -f list -s region -k all spec.yaml
rna_cell_bc RNA-737K-arc-v1.txt RNA-737K-arc-v1.txt txt 2142553 https://github.com/pachterlab/qcbc/raw/main/tests/10xMOME/RNA-737K-arc-v1.txt.gz https a88cd21e801ae6f9a7d9a48b67ccf693
# List files for barcode regions in json
$ seqspec file -m rna -f json -s region-type -k all -i barcode spec.yaml
[
{
"file_id": "RNA-737K-arc-v1.txt",
"filename": "RNA-737K-arc-v1.txt",
"filetype": "txt",
"filesize": 2142553,
"url": "https://github.com/pachterlab/qcbc/raw/main/tests/10xMOME/RNA-737K-arc-v1.txt.gz",
"urltype": "https",
"md5": "a88cd21e801ae6f9a7d9a48b67ccf693"
}
]Note: seqspec file -s read gets the files for the read, not the files contained in the regions mapped to the read.
seqspec format: Autoformat seqspec file¶
Automatically fill in missing fields in the spec.
seqspec format [-h] [-o OUT] yamlfrom seqspec.seqspec_format import run_format
run_format(spec_fn: str, o: str)-o OUTthe path to create the formattedseqspecfile.yamlcorresponds to theseqspecfile.
Examples¶
# format the spec and print the spec to stdout
$ seqspec format spec.yaml
# note you can also overwrite the spec you are formatting
$ seqspec format -o spec.yaml spec.yamlseqspec index: Identify position of elements in seqspec file¶
Identify the position of elements in a spec for use in downstream tools. Returns the 0-indexed position of elements contained in a given region in the 5’->3’ direction.
seqspec index [-h] [-o OUT] [-t TOOL] [-s SELECTOR] [--rev] [--subregion-type SUBREGIONTYPE] [--no-overlap] -m MODALITY [-i IDs] yamlfrom seqspec.seqspec_index import run_index
run_index(spec_fn: str, modality: str, ids: List[str], idtype: str, fmt: str, rev: str, subregion_type: str, o)- optionally,
-o OUTcan be used to write the output to a file. - optionally,
--revcan be set to return the 3’->5’ index. - optionally,
-t TOOLreturns the indices in the format specified by the tool. One of:chromap: emit barcode and genomic ranges in chromap--read-formatsyntaxkb:kallisto/kb count-x TECHNOLOGY(format) requires a barcode, UMI, and sequence. The followingregion_typeare used during indexing:barcodefor the barcodeumifor the umicdna,gdna,protein, ortagfor the sequence
kb-single: same askbbut forces a single feature segmentseqkit:seqkit subseq-r, --region string(format)simpleaf:simpleaf quant-c, --chemistry(format) requires a barcode, UMI, and sequence. The followingregion_typeare used during indexing:barcodefor the barcodeumifor the umicdnafor the sequence
starsolo:--soloCBstart,--soloCBlen,--soloUMIstart,--soloUMIlen(format) requires a barcode, UMI, and sequence. The followingregion_typeare used during indexing:barcodefor the barcodeumifor the umicdnafor the sequence
splitcode: splitcode@extractlines and tag groupstab: tab delimited file (region<\t>element<\t>start<t>end)zumis: yaml (format) requires a barcode, UMI, and sequence. The followingregion_typeare used during indexing:barcodefor the barcodeumifor the umicdnafor the sequence
- optionally,
-s Selectoris the type of the ID you are searching for (default: read). Can be one of- read
- region
- file
-m MODALITYis the modality that the-r REGIONregion resides in.-i IDsis the ID of the object you are indexing.yamlcorresponds to theseqspecfile.--subregion-typefilters to a specific region_type in some formats (e.g., seqkit)--no-overlapremoves overlapping regions across coordinates (stable, by first occurrence)
Examples¶
# get the indices of the elements contained within the FASTQs specified in the spec in tab format
$ seqspec index -m atac -s file -i atac_R1_SRR18677642.fastq.gz,atac_R2_SRR18677642.fastq.gz,atac_R3_SRR18677642.fastq.gz spec.yaml
atac_R1 gdna gdna 0 53
atac_R2 atac linker linker 0 8
atac_R2 Cell Barcode barcode 8 24
atac_R3 gdna gdna 0 53
# do the same but in the kb format
$ seqspec index -m atac -t kb -s file -i atac_R1_SRR18677642.fastq.gz,atac_R2_SRR18677642.fastq.gz,atac_R3_SRR18677642.fastq.gz spec.yaml
1,8,24:-1,-1,-1:0,0,53,2,0,53
# If the files are specified in the spec then -i can be omitted
$ seqspec index -m atac -t kb -s file spec.yaml
1,8,24:-1,-1,-1:0,0,53,2,0,53seqspec info: get info about seqspec file¶
seqspec info [-h] [-k KEY] [-f FORMAT] [-o OUT] yamlfrom seqspec.seqspec_info import run_info
run_info(spec_fn: str, f: str, k=None, o=None)- optionally,
-o OUTpath to write the info. - optionally,
-k KEYthe object to display (default: meta). Can be one of- modalities
- meta
- sequence_spec
- library_spec
- optionally,
-f FORMATthe output format (default: tab). Can be one of- tab
- json
yamlcorresponds to theseqspecfile.
Examples¶
# Get meta information in json format
$ seqspec info -f json spec.yaml
{
"seqspec_version": "0.3.0",
"assay_id": "DOGMAseq-DIG",
"name": "DOGMAseq-DIG/Illumina",
"doi": "https://doi.org/10.1186/s13059-022-02698-8",
"date": "23 June 2022",
"description": "DOGMAseq with digitonin (DIG) is a single-cell multi-omics assay that simultaneously measures protein, RNA, and chromatin accessibility in single cells. The assay is based on the DOGMAseq technology, which uses a DNA-barcoded antibody library to capture proteins of interest, followed by a single-cell RNA-seq protocol and a single-cell ATAC-seq protocol. The DOGMAseq-LLL assay is designed to be compatible with the 10x Genomics Chromium platform.",
"lib_struct": "",
...
# long output omitted
# Get the list of modalities
$ seqspec info -k modalities spec.yaml
protein tag rna atac
# Get library spec in json format
$ seqspec info -f json -k library_spec spec.yaml
{
"protein": [
{
"region_id": "ghost_protein_truseq_read1",
"region_type": "named",
"name": "Truseq Read 1",
"sequence_type": "fixed",
"onlist": null,
"sequence": "",
"min_len": 0,
"max_len": 0,
"regions": []
},
...
# long output omitted
# Get sequence spec in json format
$ seqspec info -f json -k sequence_spec spec.yaml
[
{
"read_id": "protein_R1",
"name": "protein Read 1",
"modality": "protein",
"primer_id": "protein_truseq_read1",
"min_len": 28,
"max_len": 28,
"strand": "pos",
"files": [
...
# long output omittedseqspec init: Generate a new empty seqspec draft¶
Create a minimal, valid draft containing only meta Regions (one per modality). This is intended as a starting point to then insert regions and reads.
seqspec init [-h] -n NAME -m MODALITIES [--doi DOI] [--description DESC] [--date YYYY-MM-DD] [-o OUT]-m MODALITIEScomma-separated list of modalities (e.g.,rna,atac).-n NAMEassay name.--doi,--description,--dateoptional metadata.-o OUToptional output path (default: stdout).
Example:
seqspec init -n myassay -m rna,atac -o spec.yamlseqspec methods: Convert seqspec file into methods section¶
Generate a methods section from a seqspec file.
seqspec methods [-h] -m MODALITY [-o OUT] yamlfrom seqspec.seqspec_methods import run_methods
run_methods(spec_fn: str, m: str, o: str)- optionally,
-o OUTpath to write the methods section. -m MODALITYis the modality to write the methods for.yamlcorresponds to theseqspecfile.
Examples¶
# print methods for rna modality
$ seqspec methods -m rna spec.yaml
Methods
The rna portion of the DOGMAseq-DIG/Illumina assay was generated on 23 June 2022.
Libary structure
The library was generated using the CG000338 Chromium Next GEM Multiome ATAC + Gene Expression Rev. D protocol (10x Genomics) library protocol and Illumina Truseq Single Index library kit. The library contains the following elements:
1. Truseq Read 1: 33-33bp fixed sequence (ACACTCTTTCCCTACACGACGCTCTTCCGATCT).
2. Cell Barcode: 16-16bp onlist sequence (NNNNNNNNNNNNNNNN), onlist file: RNA-737K-arc-v1.txt.
3. umi: 12-12bp random sequence (XXXXXXXXXXXX).
4. cdna: 102-102bp random sequence (XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX).
5. Truseq Read 2: 34-34bp fixed sequence (AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC).
Sequence structure
The library was sequenced on a Illumina NovaSeq 6000 (EFO:0008637) using the NovaSeq 6000 S2 Reagent Kit v1.5 (100 cycles) sequencing kit. The library was sequenced using the following configuration:
- rna Read 1: 28 cycles on the positive strand using the rna_truseq_read1 primer. The following files contain the sequences in Read 1:
- File 1: rna_R1_SRR18677638.fastq.gz
- rna Read 2: 102 cycles on the negative strand using the rna_truseq_read2 primer. The following files contain the sequences in Read 2:
- File 1: rna_R2_SRR18677638.fastq.gzseqspec modify: Modify attributes of various elements (JSON-based)¶
Modify objects by passing a JSON array of partial objects via --keys. Only provided fields are applied.
seqspec modify [-h] -m MODALITY -s SELECTOR -k JSON [-o OUT] yamlSelectors: read, region, file, seqkit, seqprotocol, libkit, libprotocol, assay.
Examples:
# Update a read name
seqspec modify -m rna -s read -k '[{"read_id":"rna_R1","name":"renamed_rna_R1"}]' spec.yaml
# Update a region name
seqspec modify -m rna -s region -k '[{"region_id":"rna_cell_bc","name":"Cell Barcode"}]' spec.yaml
# Update a file url
seqspec modify -m rna -s file -k '[{"file_id":"R1.fastq.gz","url":"./fastq/R1.fastq.gz"}]' spec.yamlExamples¶
# modify the read id
$ seqspec modify -m atac -o mod_spec.yaml -i atac_R1 --read-id renamed_atac_R1 spec.yaml
# modify the region id
$ seqspec modify -m atac -o mod_spec.yaml -s region -i atac_cell_bc --region-id renamed_atac_cell_bc spec.yaml
# modify the files for R1 fastq
$ seqspec modify -m atac -o mod_spec.yaml -i atac_R1 --files "R1_1.fastq.gz,fastq,0,./fastq/R1_1.fastq.gz,local,null:R1_2.fastq.gz,fastq,0,./fastq/R1_2.fastq.gz,local,null" spec.yamlseqspec onlist: Get onlist file(s) for elements in seqspec file¶
seqspec onlist [-h] [-o OUT] [-s SELECTOR] [-f {product,multi}] -m MODALITY [-i ID] yamlfrom seqspec.seqspec_onlist import run_onlist
run_onlist(spec_fn, modality, ids, idtype, fmt, o)- optionally,
-o OUTwhen set with-f, writes the joined onlist to this file; when set without-f, downloads remote onlists locally and prints paths. -m MODALITYis the modality in which you are searching for the region.-i IDis theidof the object to search for the onlist.-s SELECTORis the type of theidof the object (default: read). Can be one of:- read
- region
- region-type
-fselects how to combine multiple onlists:product(cartesian product)multi(row-aligned, zip with padding)
yamlcorresponds to theseqspecfile.
Note: If, for example, there are multiple regions with the specified region_type in the modality (e.g. multiple barcodes), then seqspec onlist will return a path to an onlist that it generates where the entries in that onlist are the cartesian product of the onlists for all of the regions found.
Examples¶
# Get onlist for the element in the rna_R1 read
$ seqspec onlist -m rna -s read -i rna_R1 spec.yaml
/path/to/spec/folder/RNA-737K-arc-v1.txt
# Get onlist for barcode region type
$ seqspec onlist -m rna -s region-type -i barcode spec.yaml
/path/to/spec/folder/RNA-737K-arc-v1.txtseqspec print: Display the sequence and/or library structure from seqspec file¶
Print sequence and/or library structure as ascii, png, or html.
seqspec print [-h] [-o OUT] [-f FORMAT] yamlfrom seqspec.seqspec_print import run_seqspec_print
run_seqspec_print(spec_fn, fmt, o)- optionally,
-o OUTto set the path of printed file. - optionally,
-f FORMATis the format of the printed file. Can be one of:library-ascii: prints an ascii tree of the library_specseqspec-html: prints an html of both the library_spec and sequence_spec (TODO this is incomplete)seqspec-png: prints a png summary of modality structuresseqspec-ascii: prints an ascii representation of both the library_spec and sequence_spec
yamlcorresponds to theseqspecfile.
Examples¶
# Print the library structure as ascii
$ seqspec print spec.yaml
┌─'ghost_protein_truseq_read1:0'
├─'protein_truseq_read1:33'
├─'protein_cell_bc:16'
┌─protein─┤
│ ├─'protein_umi:12'
│ ├─'protein_seq:15'
│ └─'protein_truseq_read2:34'
│ ┌─'tag_truseq_read1:33'
│ ├─'tag_cell_bc:16'
├─tag─────┼─'tag_umi:12'
│ ├─'tag_seq:15'
─┤ └─'tag_truseq_read2:34'
│ ┌─'rna_truseq_read1:33'
│ ├─'rna_cell_bc:16'
├─rna─────┼─'rna_umi:12'
│ ├─'cdna:102'
│ └─'rna_truseq_read2:34'
│ ┌─'atac_truseq_read1:33'
│ ├─'gDNA:100'
└─atac────┼─'atac_truseq_read2:34'
├─'spacer:8'
└─'atac_cell_bc:16'
# Print the sequence and library structure as ascii
$ seqspec print -f seqspec-ascii spec.yaml
protein
---
|--------------------------->(1) protein_R1
ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNNNNNNNXXXXXXXXXXXXXXXXXXXXXXXXXXXAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNNNNNNNNNNXXXXXXXXXXXXXXXXXXXXXXXXXXXTCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG
<--------------|(2) protein_R2
tag
---
|--------------------------->(1) tag_R1
ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNNNNNNNXXXXXXXXXXXXXXXXXXXXXXXXXXXAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNNNNNNNNNNXXXXXXXXXXXXXXXXXXXXXXXXXXXTCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG
<--------------|(2) tag_R2
rna
---
|--------------------------->(1) rna_R1
ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNNNNNNNNNXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNNNNNNNNNNXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXTCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG
<-----------------------------------------------------------------------------------------------------|(2) rna_R2
atac
---
|---------------------------------------------------->(1) atac_R1
|----------------------->(2) atac_R2
ACACTCTTTCCCTACACGACGCTCTTCCGATCTXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGACGCGNNNNNNNNNNNNNNNN
TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXTCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGGTCTGCGCNNNNNNNNNNNNNNNN
<----------------------------------------------------|(3) atac_R3
# Print the sequence and library structure as html
$ seqspec print -f seqspec-html spec.yaml
<!DOCTYPE html>
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1" />
<style>
highlight {
color: green;
}
...
# long output omitted
# Print the library structure as a png
$ seqspec print -o spec.png -f seqspec-png spec.yamlseqspec split: Split seqspec file by modality¶
seqspec split [-h] -o OUT yamlfrom seqspec.seqspec_split import run_split
run_split(spec_fn, o)- optionally,
-o OUTname prepended to split specs. yamlcorresponds to theseqspecfile.
Examples¶
# split spec into modalities
$ seqspec split -o split spec.yaml
$ ls -1
spec.yaml
split.atac.yaml
split.protein.yaml
split.rna.yaml
split.tag.yamlseqspec version: Get seqspec tool version and seqspec file version¶
seqspec version [-h] [-o OUT] yamlfrom seqspec.seqspec_version import run_version
run_version(spec_fn, o)- optionally,
-o OUTpath to file to write output. yamlcorresponds to theseqspecfile.
Examples¶
# Get versions of tool and file
$ seqspec version spec.yaml
seqspec version: 0.3.0
seqspec file version: 0.3.0(HIDDEN) seqspec upgrade: Upgrade seqspec file from older versions to the current version¶
This is a hidden subcommand that upgrades an old version of the spec to the current one. It is not intended to be used in a production environment.
seqspec upgrade [-h] [-o OUT] yamlfrom seqspec.seqspec_upgrade import run_upgrade
run_upgrade(spec_fn, o)Examples¶
# upgrade spec
$ seqspec upgrade -o spec.yaml spec.yaml- Xu, Z., Heidrich-O’Hare, E., Chen, W., & Duerr, R. H. (2022). Comprehensive benchmarking of CITE-seq versus DOGMA-seq single cell multimodal omics. Genome Biology, 23(1). 10.1186/s13059-022-02698-8