Python arguments are equivalent to long-option arguments (--arg), unless otherwise specified. Flags are True/False arguments in Python. The manual for any gget tool can be called from the command-line using the -h --help flag.

gget cosmic 🪐

Search for genes, mutations, and other factors associated with cancer using the COSMIC (Catalogue Of Somatic Mutations In Cancer) database.
Return format: JSON (command-line) or data frame/CSV (Python) when download_cosmic=False. When download_cosmic=True, downloads the requested database into the specified folder.

This module was originally written in part by @AubakirovArman (information querying) and @josephrich98 (database download).

NOTE: License fees apply for the commercial use of COSMIC. You can read more about licensing COSMIC data here.

NOTE: Before using this module for the first time, download a COSMIC database to obtain cosmic_tsv_path (see examples below).

Positional argument (for querying information)
searchterm
Search term, which can be a gene name (or Ensembl ID), a mutation, a sample ID, etc.
Examples: 'EGFR', 'ENST00000275493', 'c.650A>T', 'p.Q217L', 'COSV51765119', 'BT2012100223LNCTB' (sample ID)
NOTE: (Python only) Set to None when downloading COSMIC databases with download_cosmic=True.

Required argument (for querying information)
-ctp --cosmic_tsv_path
Path to the COSMIC database tsv file, e.g. 'path/to/CancerMutationCensus_AllData_v101_GRCh37.tsv'.
This file is downloaded when downloading COSMIC databases using the arguments described below.
NOTE: This is a required argument when download_cosmic=False.
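
The following is a minimal sketch of one way to locate the tsv file after a download, so that its path can be passed as cosmic_tsv_path. It assumes the layout suggested by the example paths on this page (a database folder containing a .tsv file); `find_cosmic_tsvs` is a hypothetical helper and not part of gget.

```python
from pathlib import Path


def find_cosmic_tsvs(download_dir="."):
    """Return all .tsv files found below download_dir, sorted by path.

    Hypothetical helper: assumes the downloaded database is a folder
    that contains the tsv file, as in the example paths on this page.
    """
    return sorted(Path(download_dir).glob("**/*.tsv"))


if __name__ == "__main__":
    # List candidate values for cosmic_tsv_path in the current working directory
    for tsv in find_cosmic_tsvs("."):
        print(tsv)
```

If several databases were downloaded into the same folder, the list contains one entry per tsv file, and the matching one has to be picked by name.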

Optional arguments (for querying information)
-l --limit
Limits number of hits to return. Default: 100.

-csv --csv
Command-line only. Returns results in CSV format.
Python: Use json=True to return output in JSON format.
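
As a rough illustration of how the query arguments above map onto the Python call, the sketch below collects them into keyword arguments. The keyword names (cosmic_tsv_path, limit, json) are the ones documented above; `build_query_kwargs` and `run_query` are hypothetical helpers introduced only for this example.

```python
def build_query_kwargs(cosmic_tsv_path, limit=100, as_json=False):
    """Collect the documented query arguments into a dict (hypothetical helper)."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return {
        "cosmic_tsv_path": cosmic_tsv_path,  # path to the downloaded database tsv
        "limit": limit,                      # maximum number of hits (default: 100)
        "json": as_json,                     # Python only: JSON instead of a data frame
    }


def run_query(searchterm, **kwargs):
    """Forward the arguments to gget.cosmic (needs gget and a downloaded database)."""
    import gget  # imported lazily so build_query_kwargs works without gget installed

    return gget.cosmic(searchterm, **kwargs)


kwargs = build_query_kwargs(
    "CancerMutationCensus_AllData_Tsv_v101_GRCh37/CancerMutationCensus_AllData_v101_GRCh37.tsv",
    limit=10,
    as_json=True,
)
# results = run_query("EGFR", **kwargs)  # uncomment once the database has been downloaded
```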

Flags (for downloading COSMIC databases)
-d --download_cosmic
Switches into database download mode.

-gm --gget_mutate
Creates a modified version of the COSMIC database for use with gget mutate.

Optional arguments (for downloading COSMIC databases)
-cp --cosmic_project
Type of COSMIC database to download: 'cancer' (default), 'cancer_example', 'census', 'resistance', 'cell_line', 'genome_screen', or 'targeted_screen'.

| cosmic_project | Description | Notes | Size |
|---|---|---|---|
| cancer | Cancer Mutation Census (CMC) (most commonly used COSMIC mutation set) | Only available for GRCh37. Most feature-rich schema (takes the longest to search). | 2 GB |
| cancer_example | Example CMC subset provided for testing and demonstration | Downloadable without a COSMIC account. Minimal dataset. | 2.5 MB |
| census | COSMIC census of curated somatic mutations in known cancer genes | Smaller curated set of known cancer drivers. | 630 MB |
| resistance | Mutations associated with drug resistance | Helpful for pharmacogenomics research. | 1.6 MB |
| cell_line | Cell Lines Project mutation data | Sample metadata often available. | 2.7 GB |
| genome_screen | Mutations from genome screening efforts | Includes less curated data, good for large-scale screens. | |
| targeted_screen | Mutations from targeted screening panels | Focused panel datasets, good for clinical settings. | |
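
To catch typos in the project name before starting a large download, the choice can be checked up front. This is a small sketch, not gget's own validation: `check_cosmic_project` is a hypothetical helper, and the only constraint it enforces beyond the list of names is the GRCh37 restriction noted for 'cancer' in the table.

```python
# Project names as listed in the table above
COSMIC_PROJECTS = (
    "cancer",
    "cancer_example",
    "census",
    "resistance",
    "cell_line",
    "genome_screen",
    "targeted_screen",
)


def check_cosmic_project(name, grch_version=37):
    """Return name if it is a documented cosmic_project, else raise ValueError."""
    if name not in COSMIC_PROJECTS:
        raise ValueError(
            f"Unknown cosmic_project {name!r}; expected one of {', '.join(COSMIC_PROJECTS)}"
        )
    if grch_version not in (37, 38):
        raise ValueError("grch_version must be 37 or 38")
    # The table notes that the 'cancer' project is only available for GRCh37
    if name == "cancer" and grch_version != 37:
        raise ValueError("cosmic_project 'cancer' is only available for GRCh37")
    return name
```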

-cv --cosmic_version
Version of the COSMIC database. Default: None -> Defaults to latest version.

-gv --grch_version
Version of the human GRCh reference genome the COSMIC database was based on (37 or 38). Default: 37

--email
Email for COSMIC login. Providing it avoids the interactive prompt when running gget cosmic. Default: None

--password
Password for COSMIC login. Providing it avoids the interactive prompt when running gget cosmic, but the password will be stored in plain text in the script. Default: None
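
One way to avoid writing the password into a script is to read the login details from environment variables and pass them on. The sketch below assumes two hypothetical variable names, COSMIC_EMAIL and COSMIC_PASSWORD, which are not defined by gget; the keyword names are the ones documented above.

```python
import os


def cosmic_credentials(env=os.environ):
    """Read COSMIC login details from the environment.

    COSMIC_EMAIL and COSMIC_PASSWORD are hypothetical names chosen for this sketch.
    Unset variables yield None, which matches the documented defaults.
    """
    return {
        "email": env.get("COSMIC_EMAIL"),
        "password": env.get("COSMIC_PASSWORD"),
    }


def download_kwargs(cosmic_project="cancer", grch_version=37, env=os.environ):
    """Build keyword arguments for a database download (hypothetical helper)."""
    kwargs = {
        "download_cosmic": True,
        "cosmic_project": cosmic_project,
        "grch_version": grch_version,
    }
    kwargs.update(cosmic_credentials(env))
    return kwargs


# import gget
# gget.cosmic(searchterm=None, **download_kwargs("census"))
```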

Additional arguments for the --gget_mutate flag
--keep_genome_info
Whether to keep genome information in the modified database for use with gget mutate. Default: False

--remove_duplicates
Whether to remove duplicate rows from the modified database for use with gget mutate. Default: False

--seq_id_column
(str) Name of the seq_id column in the csv file created by gget_mutate. Default: "seq_ID"

--mutation_column
(str) Name of the mutation column in the csv file created by gget_mutate. Default: "mutation"

--mut_id_column
(str) Name of the mutation_id column in the csv file created by gget_mutate. Default: "mutation_id"

Optional arguments (general)
-o --out
Path to the file (or folder when downloading databases with the download_cosmic flag) the results will be saved in, e.g. 'path/to/results.json'.
Defaults:
-> When download_cosmic=False: Results will be returned to standard out
-> When download_cosmic=True: Database will be downloaded into current working directory

Flags (general)
-q --quiet
Command-line only. Prevents progress information from being displayed.
Python: Use verbose=False to prevent progress information from being displayed.

Examples

Download the COSMIC "cancer" database and query information

# The download_cosmic command will ask for your COSMIC email and password and only needs to be run once
gget cosmic --download_cosmic --cosmic_project cancer

gget cosmic EGFR --cosmic_tsv_path 'CancerMutationCensus_AllData_Tsv_v101_GRCh37/CancerMutationCensus_AllData_v101_GRCh37.tsv'
# Python
# The download_cosmic command will ask for your COSMIC email and password and only needs to be run once
gget.cosmic(searchterm=None, download_cosmic=True, cosmic_project="cancer")

gget.cosmic("EGFR", cosmic_tsv_path="CancerMutationCensus_AllData_Tsv_v101_GRCh37/CancerMutationCensus_AllData_v101_GRCh37.tsv")

→ The first command downloads the requested COSMIC database of the latest COSMIC release into the current working directory. The second command searches the database for mutations associated with the 'EGFR' gene and returns results in the following format:

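| GENE_NAME | ACCESSION_NUMBER | ONC_TSG | Mutation_CDS | Mutation_AA | ... |
|---|---|---|---|---|---|
| EGFR | ENST00000275493.2 | oncogene | c.650A>T | p.Q217L | ... |
| EGFR | ENST00000275493.2 | oncogene | c.966C>T | p.G322= | ... |
| ... | ... | ... | ... | ... | ... |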
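
Results like these can be post-processed with a few lines of Python. The sketch below assumes that the JSON output is a list of records keyed by the column names shown above (an assumption, not confirmed here), and uses the two example rows as sample data. It keeps only mutations that change the protein, relying on the HGVS convention that a trailing "=" in the protein notation means the amino acid sequence is unchanged.

```python
import json

# Sample data built from the two example rows above (assumed JSON layout)
sample_output = """
[
  {"GENE_NAME": "EGFR", "ACCESSION_NUMBER": "ENST00000275493.2", "ONC_TSG": "oncogene",
   "Mutation_CDS": "c.650A>T", "Mutation_AA": "p.Q217L"},
  {"GENE_NAME": "EGFR", "ACCESSION_NUMBER": "ENST00000275493.2", "ONC_TSG": "oncogene",
   "Mutation_CDS": "c.966C>T", "Mutation_AA": "p.G322="}
]
"""

records = json.loads(sample_output)


def protein_changing(records):
    """Drop records whose protein notation ends in '=' (no amino acid change)."""
    return [r for r in records if not r["Mutation_AA"].endswith("=")]


changed = protein_changing(records)
print([r["Mutation_CDS"] for r in changed])  # ['c.650A>T']
```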

Download the COSMIC "census" database and query information

# The download_cosmic command will ask for your COSMIC email and password and only needs to be run once
gget cosmic --download_cosmic --cosmic_project census

gget cosmic EGFR --cosmic_tsv_path 'Cosmic_MutantCensus_Tsv_v101_GRCh37/Cosmic_MutantCensus_v101_GRCh37.tsv'
# Python
# The download_cosmic command will ask for your COSMIC email and password and only needs to be run once
gget.cosmic(searchterm=None, download_cosmic=True, cosmic_project="census")

gget.cosmic("EGFR", cosmic_tsv_path="Cosmic_MutantCensus_Tsv_v101_GRCh37/Cosmic_MutantCensus_v101_GRCh37.tsv")

→ The first command downloads the requested COSMIC database of the latest COSMIC release into the current working directory. The second command searches the database for mutations associated with the 'EGFR' gene and returns results in the following format:

| GENE_SYMBOL | COSMIC_GENE_ID | MUTATION_DESCRIPTION | MUTATION_CDS | Mutation_AA | MUTATION_SOMATIC_STATUS | ... |
|---|---|---|---|---|---|---|
| EGFR | COSG35617 | inframe_deletion | c.2235_2249del | p.E746_A750del | Reported in another cancer sample as somatic | ... |
| EGFR | COSG35617 | missense_variant | c.2573T>G | p.L858R | Reported in another cancer sample as somatic | ... |
| ... | ... | ... | ... | ... | ... | ... |

References

If you use gget cosmic in a publication, please cite the following articles:

  • Luebbert, L., & Pachter, L. (2023). Efficient querying of genomic reference databases with gget. Bioinformatics. https://doi.org/10.1093/bioinformatics/btac836

  • Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, Boutselakis H, Cole CG, Creatore C, Dawson E, Fish P, Harsha B, Hathaway C, Jupe SC, Kok CY, Noble K, Ponting L, Ramshaw CC, Rye CE, Speedy HE, Stefancsik R, Thompson SL, Wang S, Ward S, Campbell PJ, Forbes SA. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res. 2019 Jan 8;47(D1):D941-D947. doi: 10.1093/nar/gky1015. PMID: 30371878; PMCID: PMC6323903.