Contributing - seqspec

Thank you for wanting to add a spec or improve seqspec. If you find a bug in seqspec, please create an issue. This document outlines the process for suggesting improvements to the seqspec specification and the procedure for updating it.

Issues¶

The issue should contain:

the seqspec command that was run,
the error message, and
the seqspec and Python versions.
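
The version details can be gathered with a short helper like the one below. This is a minimal sketch: the function name `report_versions` is hypothetical, and it assumes the interpreter is available as `python3` and that seqspec was installed with pip. It reads the seqspec version from the installed package metadata rather than relying on any seqspec flag.

```shell
# Print the Python and seqspec versions to paste into an issue.
# Assumes `python3` is on PATH and seqspec was installed with pip.
report_versions() {
  python3 --version
  # `pip show` reports the version recorded in the installed package metadata
  python3 -m pip show seqspec 2>/dev/null | grep '^Version' \
    || echo "seqspec is not installed in this environment"
}

report_versions
```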

Improvements¶

To suggest improvements to the seqspec project please do the following:

Open an Issue: For suggesting improvements, please open a new issue in the GitHub repository.
Describe Your Suggestion: Clearly describe the problem and your proposed solution. Include examples and use cases where possible.
Engagement: Encourage community feedback on the suggestion through comments.
Iterate: Be open to iterating on your suggestion based on community feedback.

Specs and code changes¶

If you’d like to add assay sequence specifications or make modifications to the seqspec tool, please do the following:

Fork the project.

# Press "Fork" at the top right of the GitHub page

Clone the fork and create a branch for your feature

git clone https://github.com/<USERNAME>/seqspec.git
cd seqspec
git checkout -b cool-new-feature
pip install -r dev-requirements.txt
pre-commit install

Make changes, add files, and commit

For a new assay, this means creating a seqspec for the assay and including one million reads from each FASTQ file referenced in the spec. Assay specs should be located in assays/MYASSAY/. The file structure should look like:

MYASSAY
├── onlist.txt.gz
├── ...
├── spec.yaml
└── fastqs
    ├── R1.fastq.gz
    ├── R2.fastq.gz
    └── ...
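
The directory skeleton above can be created from the root of the cloned repository as sketched here. `MYASSAY` is the same placeholder used in the tree, so substitute the name of your assay.

```shell
# Create the assay directory and its fastqs subdirectory in one step.
# MYASSAY is a placeholder; replace it with your assay name.
mkdir -p assays/MYASSAY/fastqs

# spec.yaml and onlist.txt.gz go at the top level of assays/MYASSAY/,
# and the subset FASTQ files go in assays/MYASSAY/fastqs/.
ls -R assays/MYASSAY
```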

To generate one million reads from the FASTQ files associated with your spec, the following can be run:

zcat allreads_R1.fastq.gz | head -n 4000000 | gzip > R1.fastq.gz # a FASTQ record spans 4 lines, so 1 million records = 4 million lines
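
To confirm that a subset file holds the expected number of reads, the line count can be divided by four. The helper below is a sketch; the function name `count_records` is hypothetical, and it assumes each record occupies exactly four lines, as stated above.

```shell
# Count the records in a gzipped FASTQ file: total lines divided by 4.
# `gzip -dc` decompresses to stdout without modifying the file.
count_records() {
  echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# Usage (R1.fastq.gz is the subset file produced above):
# count_records R1.fastq.gz
```

If the source file contains at least one million records, the subset should report 1000000. A smaller number means the source file itself had fewer reads.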

Before committing the spec, make sure to run:

seqspec print spec.yaml # make sure the structure matches expected
seqspec check spec.yaml # checks the seqspec against the defined specification
seqspec format -o fmt.yaml spec.yaml  # formats many of the empty fields
mv fmt.yaml spec.yaml # move the formatted spec to the spec.yaml

# make changes, add files, and commit them
git add onlist.txt.gz spec.yaml fastqs/R1.fastq.gz fastqs/R2.fastq.gz
git commit -m "I made these changes"

Push changes to GitHub

git push origin cool-new-feature

Submit a pull request

If you are unfamiliar with pull requests, you can find more information on the GitHub help page.

Steps for Review¶

Initial Review: A maintainer will review the suggestion for completeness and relevance.
Community Feedback: A period for community feedback will follow.
Final Review: The maintainers will make a final review, considering all feedback.

Decision Making¶

Decisions will be made based on the specification’s goals, community feedback, and overall impact on the seqspec ecosystem.

Updating the Specification¶

Approval and Merging¶

Once approved, a maintainer will merge the changes into the specification.
Major changes may require a more detailed review process or a community vote.

Versioning and Change Log¶

Versioning: Follow semantic versioning. Major changes result in a version bump.
Change Log: Update the change log with a summary of the changes and contributors.

Testing and Validation¶

Ensure any changes are tested for compatibility and do not break existing functionality.

Adding or modifying controlled vocabulary¶

Various Region attributes use controlled vocabulary to describe the sequence. These vocabulary terms are listed in the specification. If you wish to add new controlled vocabulary or modify existing controlled vocabulary, please first review the specification and then submit a pull request with an example Region. Please justify the inclusion of the controlled vocabulary in your pull request. Below is a list of questions and prompts to address:

If you are suggesting a new region_type:

In what assay is this region_type used? Please link to primary sources.
In what ways will the identification and extraction of the region_type be useful for sequence processing?
What seqspec tools need to be modified to take advantage of this new region_type?

If you are suggesting a new sequence_type:

Give examples of this sequence type.
Where is this sequence type used?
What seqspec tools need to be modified to take advantage of this new sequence_type?

Conclusion¶

We value your contributions and aim to make the process of improving the specification collaborative and transparent. For any questions, please contact the repository maintainers.
