✨ What's new
Version ≥ 0.29.0 (Sep 25, 2024):
- New modules:
- gget enrichr now also supports species other than human and mouse (fly, yeast, worm, and fish) via modEnrichR
- gget mutate: Identical sequences are now merged in the final file by default. Mutation creation was vectorized to decrease runtime. The flanking sequence check for non-substitution mutations was improved to make sure no wildtype k-mer is retained in the mutation-containing sequence. Several new arguments were added to customize sequence generation and output.
- gget cosmic: Added support for targeted as well as gene screens. The CSV file created for gget mutate now also contains protein mutation info.
- gget ref: Added out file option.
- gget info and gget seq: Switched to the Ensembl POST API to increase speed (nothing changes in the front end).
- Other "behind the scenes" changes:
  - Unit tests reorganized to increase speed and decrease code
  - Requirements updated to allow newer mysql-connector versions
  - Support NumPy >= 2.0
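
The default merging of identical sequences in gget mutate, mentioned in the notes above, can be pictured with a small standalone sketch. It does not use gget itself: the function name `merge_identical`, the `;` separator, and the (header, sequence) record layout are assumptions made for illustration, and the actual output format of gget mutate may differ.

```python
def merge_identical(records, sep=";"):
    """Collapse records that share a sequence into one record.

    records: iterable of (header, sequence) tuples (hypothetical layout).
    Headers of identical sequences are joined with `sep`, in input order.
    """
    merged = {}  # sequence -> list of headers; dicts keep insertion order
    for header, seq in records:
        merged.setdefault(seq, []).append(header)
    return [(sep.join(headers), seq) for seq, headers in merged.items()]


records = [
    ("mut1", "ACGT"),
    ("mut2", "TTGA"),
    ("mut3", "ACGT"),  # same sequence as mut1
]
merged_records = merge_identical(records)
for header, seq in merged_records:
    print(f">{header}\n{seq}")
```

Keying the dictionary on the sequence keeps the merge to a single pass over the records and preserves the order in which each distinct sequence first appears.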
Version ≥ 0.28.6 (Jun 2, 2024):
- New module: gget mutate
- gget cosmic: You can now download entire COSMIC databases using the argument download_cosmic
- gget ref: Can now fetch the GRCh37 genome assembly using species='human_grch37'
- gget search: Adjusted access of human data to the structure of Ensembl release 112 (fixes issue 129)
Version ≥ 0.28.5 (May 29, 2024):
- Yanked due to a logging bug in gget.setup("alphafold") and because inversion mutations in gget mutate only reversed the string instead of also computing the complementary strand
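
The inversion bug noted above comes down to the difference between reversing a string and taking its reverse complement. The following is a minimal sketch of that difference, not gget's implementation: the helper names `reverse_complement` and `invert` and the 0-based, end-exclusive coordinates are assumptions for illustration.

```python
# Translation table mapping each nucleotide to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def invert(seq, start, end):
    """Invert seq[start:end] (0-based, end-exclusive; hypothetical convention)."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


seq = "AAACCGTTT"
reversed_only = seq[:3] + seq[3:6][::-1] + seq[6:]  # the buggy behavior
inverted = invert(seq, 3, 6)                        # reverse complement

print(reversed_only)  # AAAGCCTTT
print(inverted)       # AAACGGTTT
```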
Version ≥ 0.28.4 (January 31, 2024):
- gget setup: Fixed bug with filepath when running gget.setup("elm") on Windows OS.
Version ≥ 0.28.3 (January 22, 2024):
- gget search and gget ref now also support fungi 🍄, protists 🌝, and invertebrate metazoa 🐝 🐜 🐌 🐙 (in addition to vertebrates and plants)
- New module: gget cosmic
- gget enrichr: Fix duplicate scatter dots in plot when pathway names are duplicated
- gget elm:
  - Changed ortho results column name 'Ortholog_UniProt_ID' to 'Ortholog_UniProt_Acc' to correctly reflect the column contents, which are UniProt Accessions. 'UniProt ID' was changed to 'UniProt Acc' in the documentation for all gget modules.
  - Changed ortho results column name 'motif_in_query' to 'motif_inside_subject_query_overlap'.
  - Added interaction domain information to results (new columns: "InteractionDomainId", "InteractionDomainDescription", "InteractionDomainName").
  - The regex string for regular expression matches was encapsulated as follows: "(?=(regex))" (instead of directly passing the regex string "regex") to enable capturing all occurrences of a motif when the motif length is variable and there are repeats in the sequence (https://regex101.com/r/HUWLlZ/1).
- gget setup: Use the out argument to specify a directory the ELM database will be downloaded into. Completes this feature request.
- gget diamond: The DIAMOND command is now run with the --ignore-warnings flag, allowing niche sequences such as amino acid sequences that only contain nucleotide characters and repeated sequences. This is also true for DIAMOND alignments performed within gget elm.
- gget ref and gget search back-end change: the current Ensembl release is fetched from the new release file on the Ensembl FTP site to avoid errors during uploads of new releases.
- gget search:
  - FTP link results (--ftp) are saved in txt file format instead of json.
  - Fix URL links to Ensembl gene summary for species with a subspecies name and invertebrates.
- gget ref:
  - Back-end changes to increase speed
  - New argument: list_iv_species to list all available invertebrate species (can be combined with the release argument to fetch all species available from a specific Ensembl release)
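
The "(?=(regex))" wrapping described for gget elm above can be tried out with Python's standard re module. This is a minimal sketch rather than the gget elm code: the motif "PP+" and the sequence "PPPPP" are made-up examples, and `find_all_overlapping` is a hypothetical helper name.

```python
import re


def find_all_overlapping(pattern, sequence):
    """Return (start, matched_text) for every start position where
    `pattern` matches, including matches that overlap each other."""
    # A lookahead consumes no characters, so the scan advances one
    # position at a time; the inner group captures the actual match.
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start(1), m.group(1)) for m in lookahead.finditer(sequence)]


sequence = "PPPPP"  # made-up sequence with a repeat
motif = "PP+"       # made-up variable-length motif

plain = [m.group(0) for m in re.finditer(motif, sequence)]
overlapping = find_all_overlapping(motif, sequence)

print(plain)        # ['PPPPP']
print(overlapping)  # [(0, 'PPPPP'), (1, 'PPPP'), (2, 'PPP'), (3, 'PP')]
```

Passing the motif directly yields a single match that swallows the whole repeat, while the lookahead form reports one (greedy) match per start position.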
Version ≥ 0.28.2 (November 15, 2023):
- gget info: Return a logging error message when the NCBI server fails for a reason other than a fetch fail (this is an error on the server side rather than an error with gget)
- Replace deprecated 'text' argument to find()-type methods whenever used with dependency BeautifulSoup
- gget elm: Remove false positive and true negative instances from returned results
- gget elm: Add expand argument
Version ≥ 0.28.0 (November 5, 2023):
- Updated documentation of gget muscle to add a tutorial on how to visualize sequences with varying sequence name lengths, and slightly changed the returned visualization so it is more robust to varying sequence names
- gget muscle now also allows a list of sequences as input (as an alternative to providing the path to a FASTA file)
- Allow missing gene filter for gget cellxgene (fixes bug)
- gget seq: Allow missing gene names (fixes https://github.com/pachterlab/gget/issues/107)
- gget enrichr: Use new arguments kegg_out and kegg_rank to create an image of the KEGG pathway with the genes from the enrichment analysis highlighted (thanks to this PR by Noriaki Sato)
- New modules: gget elm and gget diamond
Version ≥ 0.27.9 (August 7, 2023):
- gget enrichr: Use new argument background_list to provide a list of background genes
- gget search now also searches Ensembl synonyms (in addition to gene descriptions and names) to return more comprehensive search results (thanks to Samuel Klein for the suggestion)
Version ≥ 0.27.8 (July 12, 2023):
- gget search: Specify the Ensembl release from which information is fetched with new argument -r --release
- Fixed bug in gget pdb (this bug was introduced in version 0.27.5)
Version ≥ 0.27.7 (May 15, 2023):
- Moved dependencies for modules gget gpt and gget cellxgene from automatically installed requirements to gget setup.
- Updated gget alphafold dependencies for compatibility with Python >= 3.10.
- Added census_version argument to gget cellxgene.
Version ≥ 0.27.6 (May 1, 2023) (YANKED due to problems with dependencies -> replaced with version 0.27.7):
- Thanks to PR by Tomás Di Domenico: gget search can now also query plant 🌱 Ensembl IDs.
- New module: gget cellxgene
Version ≥ 0.27.5 (April 6, 2023):
- Updated gget search to function correctly with new Pandas version 2.0.0 (released on April 3rd, 2023) as well as older versions of Pandas
- Updated gget info with new flags uniprot and ncbi which allow turning off results from these databases independently to save runtime (note: flag ensembl_only was deprecated)
- All gget modules now feature a -q / --quiet (Python: verbose=False) flag to turn off progress information
Version ≥ 0.27.4 (March 19, 2023):
- New module: gget gpt
Version ≥ 0.27.3 (March 11, 2023):
- gget info excludes PDB IDs by default to increase speed (PDB results can be included using flag --pdb / pdb=True).
Version ≥ 0.27.2 (January 1, 2023):
- Updated gget alphafold to DeepMind's AlphaFold v2.3.0 (including new arguments multimer_for_monomer and multimer_recycles)
Version ≥ 0.27.0 (December 10, 2022):
- Updated gget alphafold to match recent changes by DeepMind
- Updated version number to match gget's creator's age following a long-standing Pachter lab tradition
Version ≥ 0.3.13 (November 11, 2022):
- Reduced runtime for gget enrichr and gget archs4 when used with Ensembl IDs
Version ≥ 0.3.12 (November 10, 2022):
- gget info now also returns subcellular localisation data from UniProt
- New gget info flag ensembl_only returns only Ensembl results
- Reduced runtime for gget info and gget seq
Version ≥ 0.3.11 (September 7, 2022):
- New module: gget pdb
Version ≥ 0.3.10 (September 2, 2022):
- gget alphafold now also returns pLDDT values for generating plots from output without rerunning the program (also see the gget alphafold FAQ)
Version ≥ 0.3.9 (August 25, 2022):
- Updated openmm installation instructions for gget alphafold
Version ≥ 0.3.8 (August 12, 2022):
- Fixed mysql-connector-python version requirements
Version ≥ 0.3.7 (August 9, 2022):
- NOTE: The Ensembl FTP site changed its structure on August 8, 2022. Please upgrade to gget version ≥ 0.3.7 if you use gget ref
Version ≥ 0.3.5 (August 6, 2022):
- New module: gget alphafold
Version ≥ 0.2.6 (July 7, 2022):
- gget ref now supports plant genomes! 🌱
Version ≥ 0.2.5 (June 30, 2022):
- NOTE: UniProt changed the structure of their API on June 28, 2022. Please upgrade to gget version ≥ 0.2.5 if you use any of the modules querying data from UniProt (gget info and gget seq).
Version ≥ 0.2.3 (June 26, 2022):
- JSON is now the default output format for the command-line interface for modules that previously returned data frame (CSV) format by default (the output can be converted to data frame/CSV using flag [-csv][--csv]). Data frame/CSV remains the default output for Jupyter Lab / Google Colab (and can be converted to JSON with json=True).
- For all modules, the first required argument was converted to a positional argument and should not be named anymore in the command-line, e.g. gget ref -s human → gget ref human.
- gget info: [--expand] is deprecated. The module will now always return all of the available information.
- Slight changes to the output returned by gget info, including the return of versioned Ensembl IDs.
- gget info and gget seq now support 🪱 WormBase and 🪰 FlyBase IDs.
- gget archs4 and gget enrichr now also take Ensembl IDs as input with added flag [-e][--ensembl] (ensembl=True in Jupyter Lab / Google Colab).
- gget seq argument seqtype was replaced by flag [-t][--translate] (translate=True/False in Jupyter Lab / Google Colab) which will return either nucleotide (False) or amino acid (True) sequences.
- gget search argument seqtype was renamed to id_type for clarity (still taking the same arguments 'gene' or 'transcript').
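
The switch to a positional first argument and an opt-in CSV flag, described in the notes above, can be mimicked with Python's argparse. This is only a sketch of the command-line pattern and not gget's actual parser: the program name `gget-ref-sketch` and the function `build_parser` are hypothetical.

```python
import argparse


def build_parser():
    """Toy parser mirroring `gget ref human` (hypothetical, not gget's own)."""
    parser = argparse.ArgumentParser(prog="gget-ref-sketch")
    # The first required argument is positional: `human`, not `-s human`.
    parser.add_argument("species", help="species name, e.g. human")
    # JSON is the default output; -csv / --csv opts in to CSV instead.
    parser.add_argument("-csv", "--csv", action="store_true", default=False)
    return parser


args = build_parser().parse_args(["human"])
print(args.species, args.csv)  # human False

csv_args = build_parser().parse_args(["human", "--csv"])
print(csv_args.species, csv_args.csv)  # human True
```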