Python arguments are equivalent to long-option arguments (--arg), unless otherwise specified. Flags are True/False arguments in Python. The manual for any gget tool can be called from the command-line using the -h or --help flag.
gget pdb 🔮
Query RCSB Protein Data Bank (PDB) for the protein structure/metadata of a given PDB ID.
Return format: Resource 'pdb' is returned in PDB format. All other resources are returned in JSON format.
Positional argument
pdb_id
PDB ID to be queried, e.g. '7S7U'.
Optional arguments
-r
--resource
Defines type of information to be returned. One of the following:
'pdb': Returns the protein structure in PDB format (default).
'entry': Information about PDB structures at the top level of PDB structure hierarchical data organization.
'pubmed': Get PubMed annotations (data integrated from PubMed) for a given entry's primary citation.
'assembly': Information about PDB structures at the quaternary structure level.
'branched_entity': Get branched entity description (define entity ID as 'identifier').
'nonpolymer_entity': Get non-polymer entity data (define entity ID as 'identifier').
'polymer_entity': Get polymer entity data (define entity ID as 'identifier').
'uniprot': Get UniProt annotations for a given macromolecular entity (define entity ID as 'identifier').
'branched_entity_instance': Get branched entity instance description (define chain ID as 'identifier').
'polymer_entity_instance': Get polymer entity instance (a.k.a. chain) data (define chain ID as 'identifier').
'nonpolymer_entity_instance': Get non-polymer entity instance description (define chain ID as 'identifier').
-i
--identifier
Can be used to define assembly, entity or chain ID (default: None). Assembly/entity IDs are numbers (e.g. 1), and chain IDs are letters (e.g. 'A').
-o
--out
Path to the file the results will be saved in, e.g. path/to/directory/7S7U.pdb or path/to/directory/7S7U_entry.json. Default: Standard out.
Python: save=True will save the output in the current working directory.
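The identifier rules above can be sketched as a small pre-flight check. This is a hypothetical helper, not part of gget: the name `identifier_is_plausible` and the three sets are made up for illustration. It also assumes that 'assembly' expects a numeric identifier in the same way the entity resources do, which the option list implies but does not state outright.

```python
# Hypothetical helper, not part of gget: encodes the identifier rules
# listed under --resource and --identifier above.
NO_IDENTIFIER = {"pdb", "entry", "pubmed"}
NUMERIC_IDENTIFIER = {
    "assembly",  # assumption: treated like the entity resources here
    "branched_entity",
    "nonpolymer_entity",
    "polymer_entity",
    "uniprot",
}
LETTER_IDENTIFIER = {
    "branched_entity_instance",
    "polymer_entity_instance",
    "nonpolymer_entity_instance",
}


def identifier_is_plausible(resource="pdb", identifier=None):
    """Return True if the identifier has the form the resource expects."""
    known = NO_IDENTIFIER | NUMERIC_IDENTIFIER | LETTER_IDENTIFIER
    if resource not in known:
        raise ValueError(f"Unknown resource: {resource!r}")
    if resource in NO_IDENTIFIER:
        return True  # these resources are queried by PDB ID alone
    if identifier is None:
        return False  # an assembly, entity or chain ID is expected
    text = str(identifier)
    if resource in NUMERIC_IDENTIFIER:
        return text.isdigit()  # assembly/entity IDs are numbers, e.g. 1
    return text.isalpha()  # chain IDs are letters, e.g. 'A'
```

Running such a check before a query catches mismatches such as passing a chain letter to 'polymer_entity' without contacting the PDB.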
Examples
gget pdb 7S7U -o 7S7U.pdb
# Python
gget.pdb("7S7U", save=True)
→ Saves the structure of 7S7U in PDB format as '7S7U.pdb' in the current working directory.
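The same call pattern extends to the JSON resources. Below is a minimal sketch that assembles a gget pdb command line from Python using only the options documented above. The function name `build_gget_pdb_command` and its `out_dir` parameter are hypothetical, and the output file names simply follow the pattern of the examples given for --out ('7S7U.pdb', '7S7U_entry.json').

```python
import shlex


def build_gget_pdb_command(pdb_id, resource="pdb", identifier=None, out_dir=None):
    """Assemble a 'gget pdb' command line as a list of arguments."""
    command = ["gget", "pdb", pdb_id]
    if resource != "pdb":
        # 'pdb' is the default resource, so it does not need to be passed
        command += ["--resource", resource]
    if identifier is not None:
        command += ["--identifier", str(identifier)]
    if out_dir is not None:
        # Assumed naming: <ID>.pdb for structures, <ID>_<resource>.json otherwise
        name = f"{pdb_id}.pdb" if resource == "pdb" else f"{pdb_id}_{resource}.json"
        command += ["--out", f"{out_dir}/{name}"]
    return command


# Show the command as it would be typed in a shell
print(shlex.join(build_gget_pdb_command("7S7U", "entry", out_dir=".")))
# prints: gget pdb 7S7U --resource entry --out ./7S7U_entry.json
```

The returned list can be passed to `subprocess.run(command, check=True)` when gget is installed and network access is available.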
Find PDB crystal structures for a comparative analysis of protein structure:
# Find PDB IDs associated with an Ensembl ID
gget info ENSG00000130234
# Alternatively: Since many entries in the PDB do not have linked Ensembl IDs,
# you will likely find more PDB entries by BLASTing the sequence against the PDB.
# Get the amino acid sequence of a transcript from an Ensembl ID
gget seq --translate ENSG00000130234 -o gget_seq_results.fa
# BLAST an amino acid sequence to find similar structures in the PDB
gget blast --database pdbaa gget_seq_results.fa
# Get PDB files from the PDB IDs returned by gget blast for comparative analysis
gget pdb 7DQA -o 7DQA.pdb
gget pdb 7CT5 -o 7CT5.pdb
# Python
# Find PDB IDs associated with an Ensembl ID
gget.info("ENSG00000130234")
# Alternatively: Since many entries in the PDB do not have linked Ensembl IDs,
# you will likely find more PDB entries by BLASTing the sequence against the PDB.
# Get the amino acid sequence of a transcript from an Ensembl ID
gget.seq("ENSG00000130234", translate=True, save=True)
# BLAST an amino acid sequence to find similar structures in the PDB
gget.blast("gget_seq_results.fa", database="pdbaa")
# Get PDB files from the PDB IDs returned by gget blast for comparative analysis
gget.pdb("7DQA", save=True)
gget.pdb("7CT5", save=True)
→ The use case above exemplifies how to find PDB files for comparative analysis of protein structure starting with Ensembl IDs or amino acid sequences. The fetched PDB files can also be compared to predicted structures generated by gget alphafold. PDB files can be viewed interactively in 3D online, or using programs like PyMOL or Blender. To compare two PDB files, you can use this website.
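Before loading structures into a viewer, a quick way to check that two downloaded files are comparable is to count residues per chain. The sketch below uses only the Python standard library and relies on the fixed-column layout of ATOM records in the PDB format. The function `residues_per_chain` is a hypothetical helper, not part of gget. It reads only the first model and counts residues that have a C-alpha atom, so ligands, waters and other HETATM records are ignored.

```python
from collections import defaultdict


def residues_per_chain(pdb_text):
    """Count residues with a C-alpha atom in each chain of the first model."""
    seen = defaultdict(set)
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # only read the first model of multi-model files
        if not line.startswith("ATOM") or len(line) < 27:
            continue  # skip non-ATOM records and truncated lines
        if line[12:16].strip() != "CA":
            continue  # columns 13-16 hold the atom name
        chain_id = line[21]  # column 22
        residue_key = line[22:27]  # columns 23-27: residue number + insertion code
        # A set avoids double counting residues with alternate locations
        seen[chain_id].add(residue_key)
    return {chain: len(keys) for chain, keys in seen.items()}


# Usage, assuming the files from the example above exist locally:
# with open("7DQA.pdb") as handle:
#     print(residues_per_chain(handle.read()))
```

Comparing the resulting dictionaries for 7DQA.pdb and 7CT5.pdb gives a first impression of whether the two entries contain the same chains at similar lengths. It does not replace a proper structural alignment.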
References
If you use gget pdb in a publication, please cite the following articles:
- Luebbert, L., & Pachter, L. (2023). Efficient querying of genomic reference databases with gget. Bioinformatics. https://doi.org/10.1093/bioinformatics/btac836
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Res. 2000 Jan 1;28(1):235-42. doi: 10.1093/nar/28.1.235. PMID: 10592235; PMCID: PMC102472.