Python arguments are equivalent to long-option arguments (--arg), unless otherwise specified. Flags are True/False arguments in Python. The manual for any gget tool can be displayed from the command line using the -h / --help flag.

gget elm 🎭

Locally predict Eukaryotic Linear Motifs from an amino acid sequence or UniProt Acc using data from the ELM database.
Return format: JSON (command-line) or data frame/CSV (Python). This module returns two data frames (or two JSON-formatted files); see the examples.

ELM data can be downloaded & distributed for non-commercial use according to the ELM Software License Agreement.

Before using gget elm for the first time, run gget setup elm (bash) / gget.setup("elm") (Python) once (also see gget setup).

Positional argument
sequence
Amino acid sequence or UniProt Acc (str).
When providing a UniProt Acc, use flag --uniprot (Python: uniprot=True).
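
Since the input type has to be stated explicitly through --uniprot / uniprot=True, a small check before the call can help avoid passing an accession as if it were a sequence. The sketch below is illustrative only: looks_like_uniprot_acc is a hypothetical helper that is not part of gget, and the pattern follows the accession format published by UniProt.

```python
import re

# Accession format as documented by UniProt (6 or 10 characters).
UNIPROT_ACC = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def looks_like_uniprot_acc(query: str) -> bool:
    """Return True if `query` has the shape of a UniProt accession.

    Hypothetical helper for illustration; gget itself relies on the
    --uniprot flag rather than on this check.
    """
    return UNIPROT_ACC.fullmatch(query.strip()) is not None


for query in ("Q02410", "LIAQSIGQASFV"):
    # The boolean could be passed on as gget.elm(query, uniprot=...).
    print(query, looks_like_uniprot_acc(query))
```

Amino acid sequences contain no digits, so they never match the accession pattern, which always requires at least two.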

Optional arguments
-s --sensitivity
Sensitivity of DIAMOND alignment (str). Default: "very-sensitive".
One of the following: fast, mid-sensitive, sensitive, more-sensitive, very-sensitive, or ultra-sensitive.

-t --threads
Number of threads used in DIAMOND alignment (int). Default: 1.

-bin --diamond_binary
Path to DIAMOND binary (str). Default: None -> Uses DIAMOND binary installed with gget.

-o --out
Path to the folder to save results in (str), e.g. "path/to/directory". Default: Standard out; temporary files are deleted.

Flags
-u --uniprot
Set to True if sequence is a UniProt Acc instead of an amino acid sequence.

-e --expand
Expand the information returned in the regex data frame to include the protein names, organisms, and references in which the motif was originally validated.

-csv --csv
Command-line only. Returns results in CSV format.
Python: Use json=True to return output in JSON format.

-q --quiet
Command-line only. Prevents progress information from being displayed.
Python: Use verbose=False to prevent progress information from being displayed.
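
To show how the options and flags above combine on the command line, here is a minimal sketch that assembles a gget elm call as an argument list. build_elm_command is a hypothetical helper (not part of gget); it only uses the long options documented in this section and does not run anything itself.

```python
import shlex

# Sensitivity levels listed for -s / --sensitivity above.
SENSITIVITIES = {
    "fast", "mid-sensitive", "sensitive",
    "more-sensitive", "very-sensitive", "ultra-sensitive",
}


def build_elm_command(sequence, sensitivity="very-sensitive", threads=1,
                      out=None, uniprot=False, expand=False):
    """Assemble a `gget elm` argument list (hypothetical helper)."""
    if sensitivity not in SENSITIVITIES:
        raise ValueError(f"Unknown sensitivity: {sensitivity!r}")
    cmd = ["gget", "elm", "--sensitivity", sensitivity, "--threads", str(threads)]
    if out is not None:
        cmd += ["--out", out]
    if uniprot:
        cmd.append("--uniprot")
    if expand:
        cmd.append("--expand")
    cmd.append(sequence)  # positional argument goes last
    return cmd


cmd = build_elm_command("Q02410", sensitivity="sensitive", threads=4,
                        out="gget_elm_results", uniprot=True, expand=True)
print(shlex.join(cmd))
```

Once gget is installed and gget setup elm has been run, the resulting list could be handed to subprocess.run(cmd, check=True).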

Examples

Find ELMs in an amino acid sequence:

gget setup elm          # Downloads/updates local ELM database
gget elm -o gget_elm_results LIAQSIGQASFV
# Python
import gget
gget.setup("elm")      # Downloads/updates local ELM database
ortholog_df, regex_df = gget.elm("LIAQSIGQASFV")

Find ELMs giving a UniProt Acc as input:

gget setup elm          # Downloads/updates local ELM database
gget elm -o gget_elm_results --uniprot Q02410 -e
# Python
import gget
gget.setup("elm")      # Downloads/updates local ELM database
ortholog_df, regex_df = gget.elm("Q02410", uniprot=True, expand=True)

→ Returns two data frames (or JSON-formatted dictionaries on the command line). ortholog_df contains extensive information about linear motifs associated with orthologous proteins; regex_df contains motifs found directly in the input sequence by matching their regular expressions:

ortholog_df:

| Ortholog_UniProt_Acc | ProteinName | class_accession | ELMIdentifier | FunctionalSiteName | Description | Organism |
|---|---|---|---|---|---|---|
| Q02410 | APBA1_HUMAN | ELME000357 | LIG_CaMK_CASK_1 | CASK CaMK domain binding ligand motif | Motif that mediates binding to the calmodulin-dependent protein kinase (CaMK) domain of the peripheral plasma membrane protein CASK/Lin2. | Homo sapiens |
| Q02410 | APBA1_HUMAN | ELME000091 | LIG_PDZ_Class_2 | PDZ domain ligands | The C-terminal class 2 PDZ-binding motif is classically represented by a pattern such as | Homo sapiens |

regex_df:

| Instance_accession | ELMIdentifier | FunctionalSiteName | ELMType | Description | Instances (Matched Sequence) | Organism |
|---|---|---|---|---|---|---|
| ELME000321 | CLV_C14_Caspase3-7 | Caspase cleavage motif | CLV | Caspase-3 and Caspase-7 cleavage site. | ERSDG | Mus musculus |
| ELME000102 | CLV_NRD_NRD_1 | NRD cleavage site | CLV | N-Arg dibasic convertase (NRD/Nardilysin) cleavage site. | RRA | Rattus norvegicus |
| ELME000100 | CLV_PCSK_PC1ET2_1 | PCSK cleavage site | CLV | NEC1/NEC2 cleavage site. | KRD | Mus musculus |
| ELME000146 | CLV_PCSK_SKI1_1 | PCSK cleavage site | CLV | Subtilisin/kexin isozyme-1 (SKI1) cleavage site. | RLLTA | Homo sapiens |
| ELME000231 | DEG_APCC_DBOX_1 | APCC-binding Destruction motifs | DEG | An RxxL-based motif that binds to the Cdh1 and Cdc20 components of APC/C thereby targeting the protein for destruction in a cell cycle dependent manner | SRVKLNIVR | Saccharomyces cerevisiae S288c |
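
A common next step is to summarize the matched motifs by ELM type. The sketch below does this on plain records copied from three of the regex_df rows shown above; motifs_by_type is a hypothetical helper, and it assumes the column names ELMIdentifier and ELMType appear exactly as in that output. With an actual data frame, the same records could be obtained via regex_df.to_dict("records").

```python
from collections import defaultdict

# Three regex_df rows from the example output, written as plain records.
rows = [
    {"ELMIdentifier": "CLV_C14_Caspase3-7", "ELMType": "CLV",
     "Instances (Matched Sequence)": "ERSDG"},
    {"ELMIdentifier": "CLV_NRD_NRD_1", "ELMType": "CLV",
     "Instances (Matched Sequence)": "RRA"},
    {"ELMIdentifier": "DEG_APCC_DBOX_1", "ELMType": "DEG",
     "Instances (Matched Sequence)": "SRVKLNIVR"},
]


def motifs_by_type(records):
    """Group ELM identifiers by their ELMType (hypothetical helper)."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["ELMType"]].append(record["ELMIdentifier"])
    return dict(grouped)


by_type = motifs_by_type(rows)
for elm_type, identifiers in by_type.items():
    print(elm_type, identifiers)
```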

Tutorials

🔗 General gget elm demo

🔗 A point mutation in BRCA2 is carcinogenic due to the loss of a protein interaction motif

🔗 Filter gget elm results based on disordered protein regions

References

If you use gget elm in a publication, please cite the following articles:

  • Laura Luebbert, Chi Hoang, Manjeet Kumar, Lior Pachter, Fast and scalable querying of eukaryotic linear motifs with gget elm, Bioinformatics, 2024, btae095, https://doi.org/10.1093/bioinformatics/btae095

  • Manjeet Kumar, Sushama Michael, Jesús Alvarado-Valverde, Bálint Mészáros, Hugo Sámano‐Sánchez, András Zeke, Laszlo Dobson, Tamas Lazar, Mihkel Örd, Anurag Nagpal, Nazanin Farahi, Melanie Käser, Ramya Kraleti, Norman E Davey, Rita Pancsa, Lucía B Chemes, Toby J Gibson, The Eukaryotic Linear Motif resource: 2022 release, Nucleic Acids Research, Volume 50, Issue D1, 7 January 2022, Pages D497–D508, https://doi.org/10.1093/nar/gkab975