Python arguments are equivalent to long-option arguments (--arg), unless otherwise specified. Flags are True/False arguments in Python. The manual for any gget tool can be displayed from the command line using the -h --help flag.

gget ref 📖

Fetch FTPs and their respective metadata (or use flag ftp to only return the links) for reference genomes and annotations from Ensembl by species.
Return format: dictionary/JSON.

Positional argument
Species for which the FTPs will be fetched in the format genus_species, e.g. homo_sapiens.
Supports all available vertebrate and invertebrate (plants, fungi, protists, and invertebrate metazoa) genomes from Ensembl, except bacteria.
Note: Not required when using flags --list_species or --list_iv_species.
Supported shortcuts: 'human', 'mouse'
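
The genus_species format can be produced from a free-form species name with a small helper. This is a minimal sketch, not part of gget: `normalize_species` is a hypothetical function, and it only lowercases the name and joins its words with underscores. The shortcuts 'human' and 'mouse' pass through unchanged.

```python
# Hypothetical helper (not part of gget): convert a free-form species name
# into the genus_species format expected by gget ref.
def normalize_species(name: str) -> str:
    """Lowercase `name` and replace runs of whitespace with underscores."""
    cleaned = "_".join(name.strip().lower().split())
    if not cleaned:
        raise ValueError("species name must not be empty")
    return cleaned


print(normalize_species("Homo sapiens"))  # homo_sapiens
print(normalize_species("human"))         # human (shortcut, left as-is)
```

The helper does not check that the species exists in Ensembl; the --list_species and --list_iv_species flags are the way to find valid names.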

Optional arguments
-w --which
Defines which results to return. Default: 'all' -> Returns all available results.
Possible entries are one or a combination (as comma-separated list) of the following:
'gtf' - Returns the annotation (GTF).
'cdna' - Returns the transcriptome (cDNA).
'dna' - Returns the genome (DNA).
'cds' - Returns the coding sequences corresponding to Ensembl genes. (Does not contain UTR or intronic sequence.)
'ncrna' - Returns transcript sequences corresponding to non-coding RNA genes (ncRNA).
'pep' - Returns the protein translations of Ensembl genes.
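
When the value for -w is built programmatically, it can help to validate the entries before calling gget. The sketch below is illustrative only: `which_to_cli` and `VALID_WHICH` are hypothetical names, not part of gget, and the set of entries is copied from the list above ('ncrna' being the non-coding RNA entry).

```python
# Entries accepted by -w/--which in addition to the default 'all'.
VALID_WHICH = ("gtf", "cdna", "dna", "cds", "ncrna", "pep")


def which_to_cli(which):
    """Validate `which` (a string or list of strings) and return the
    comma-separated value to pass to -w on the command line."""
    entries = [which] if isinstance(which, str) else list(which)
    if entries == ["all"]:
        return "all"
    unknown = [entry for entry in entries if entry not in VALID_WHICH]
    if unknown:
        raise ValueError(f"unsupported -w entries: {unknown}")
    return ",".join(entries)


print(which_to_cli(["gtf", "dna"]))  # gtf,dna
```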

-r --release
Defines the Ensembl release number from which the files are fetched, e.g. 104. Default: latest Ensembl release.

-o --out
Path to the JSON file the results will be saved in, e.g. path/to/directory/results.json. Default: Standard out.
Python: save=True will save the output in the current working directory.

-l --list_species
Lists all available vertebrate species. (Python: combine with species=None.)

-liv --list_iv_species
Lists all available invertebrate species. (Python: combine with species=None.)

-ftp --ftp
Returns only the requested FTP links.

-d --download
Command-line only. Downloads the requested FTPs to the current directory (requires curl to be installed).

-q --quiet
Command-line only. Prevents progress information from being displayed.
Python: Use verbose=False to prevent progress information from being displayed.


Use gget ref in combination with kallisto | bustools to build a reference index:

kb ref -i INDEX -g T2G -f1 FASTA $(gget ref --ftp -w dna,gtf homo_sapiens)

→ kb ref builds a reference index using the latest DNA and GTF files of species Homo sapiens passed to it by gget ref.
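
The same command can be assembled from Python when the links are already in hand, for example from a call with ftp=True. This is a sketch under assumptions: `build_kb_ref_command` is a hypothetical helper, and the links below are placeholders rather than real Ensembl FTP links.

```python
import shlex


def build_kb_ref_command(ftp_links, index="INDEX", t2g="T2G", fasta="FASTA"):
    """Return the kb ref argument list for the given FTP links,
    mirroring the shell command shown above."""
    if not ftp_links:
        raise ValueError("at least one FTP link is required")
    return ["kb", "ref", "-i", index, "-g", t2g, "-f1", fasta, *ftp_links]


# Placeholder values; real links would come from gget ref with the ftp flag.
links = ["DNA_FTP_LINK", "GTF_FTP_LINK"]
command = build_kb_ref_command(links)
print(shlex.join(command))
# kb ref -i INDEX -g T2G -f1 FASTA DNA_FTP_LINK GTF_FTP_LINK
```

The resulting list can be passed to subprocess.run(command, check=True) once kb is installed.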

List all available genomes from Ensembl release 103:

gget ref --list_species -r 103
# Python
gget.ref(species=None, list_species=True, release=103)

→ Returns a list with all available genomes (checks if GTF and FASTAs are available) from Ensembl release 103.
(If no release is specified, gget ref will always return information from the latest Ensembl release.)

Get the genome reference for a specific species:

gget ref -w gtf,dna homo_sapiens
# Python
gget.ref("homo_sapiens", which=["gtf", "dna"])

→ Returns a JSON with the latest human GTF and FASTA FTPs, and their respective metadata, in the format:

{
    "homo_sapiens": {
        "annotation_gtf": {
            "ftp": "",
            "ensembl_release": 106,
            "release_date": "28-Feb-2022",
            "release_time": "23:27",
            "bytes": "51379459"
        },
        "genome_dna": {
            "ftp": "",
            "ensembl_release": 106,
            "release_date": "21-Feb-2022",
            "release_time": "09:35",
            "bytes": "881211416"
        }
    }
}

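A result in this format can be walked like any nested dictionary, for example to collect the links and estimate the total download size. The sketch below uses a hand-written sample that mirrors the structure above; the URLs are placeholders, and `summarize` is a hypothetical helper, not part of gget.

```python
# Sample result mirroring the documented structure.
# The "ftp" values are placeholders, not real Ensembl links.
result = {
    "homo_sapiens": {
        "annotation_gtf": {
            "ftp": "PLACEHOLDER_GTF_URL",
            "ensembl_release": 106,
            "release_date": "28-Feb-2022",
            "release_time": "23:27",
            "bytes": "51379459",
        },
        "genome_dna": {
            "ftp": "PLACEHOLDER_DNA_URL",
            "ensembl_release": 106,
            "release_date": "21-Feb-2022",
            "release_time": "09:35",
            "bytes": "881211416",
        },
    }
}


def summarize(result):
    """Return (links, total_bytes) across all species and file types."""
    links, total_bytes = [], 0
    for files in result.values():
        for info in files.values():
            links.append(info["ftp"])
            total_bytes += int(info["bytes"])  # "bytes" is stored as a string
    return links, total_bytes


links, total_bytes = summarize(result)
print(links)
print(f"{total_bytes / 1e6:.1f} MB")
```

Note that the "bytes" field is a string in the output shown above, so it has to be converted before doing arithmetic.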