The NMPDR keyword search works like a typical search engine: type in one or more words, and a list of matching genes comes back. Our keyword database contains millions of words, including vitamins, aldolase, and pyrophosphokinase. The NMPDR looks at many specific data items when computing the keywords for a gene. The table below shows each of them along with the keywords derived for fig|171101.1.peg.269, a gene from Streptococcus pneumoniae R6 that encodes a dual-role protein and has over 40 keywords.

Data item                         Keywords
FIG gene identifier               fig|171101.1.peg.269
Aliases                           GeneID:934668, gi|15902313, kegg|spd:SPD_0272, kegg|spr:spr0269, NP_357863.1, sp|P59657, spr0269, sulD, tr|Q04MF8, uni|P59657, uni|Q04MF8
All words in the functional role  dihydroneopterin, aldolase, 2-amino-4-hydroxy-6-hydroxymethyldihydropteridine, pyrophosphokinase, amino, hydroxy, hydroxymethyldihydropteridine
The genome ID                     171101.1
All words in the taxonomy         bacteria, firmicutes, lactobacillales, streptococcaceae, streptococcus, pneumoniae, r6
The subsystem name                folate, biosynthesis
The EC number                     2.7.6.3, 4.1.2.25
The subsystem role                2-amino-4-hydroxy-6-hydroxymethyldihydropteridine, pyrophosphokinase, amino, hydroxy, hydroxymethyldihydropteridine
Special Keywords                  essential
Cell Location                     Cytoplasmic
IDs from Other Databases

Notes

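  • Some keywords appear twice because the same word can be derived from more than one data item. For example, pyrophosphokinase comes from both the functional role and the subsystem role.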
  • In the functional role, hyphenated words are stored in their full form (2-amino-4-hydroxy-6-hydroxymethyldihydropteridine) as well as broken up on the hyphen boundaries (amino hydroxy hydroxymethyldihydropteridine).
  • Keywords are case-insensitive.
  • Special keywords indicate attributes of the gene. Most of these are incomplete: for example, we know certain genes are virulence-associated, but for most genes we have no virulence data. The special keywords are:
    • virulence, which indicates that the gene helps the organism damage its host. This attribute is incomplete.
    • essential, which indicates that the gene is essential to the survival of the organism. This attribute is incomplete.
    • iedb, which indicates that the gene is listed in the Immune Epitope Database.
  • The IDs from Other Databases are provided by PIR International, which keeps a curated list of correspondences between gene names in major bioinformatics databases.
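
The hyphen and case rules in the notes above can be sketched in a few lines of Python. This is only an illustration, not the NMPDR's actual code: the function name role_keywords is hypothetical, the role string is assembled from the keywords in the table, and dropping purely numeric fragments (the 2, 4, and 6) is an assumption inferred from the table, which lists only amino, hydroxy, and hydroxymethyldihydropteridine as the broken-up forms.

```python
import re


def role_keywords(role: str) -> list[str]:
    """Derive lowercase keywords from a functional-role string (illustrative only)."""
    keywords: list[str] = []
    # A word starts with a letter or digit and may contain internal hyphens.
    for word in re.findall(r"[A-Za-z0-9][A-Za-z0-9-]*", role):
        word = word.lower().strip("-")  # keywords are case-insensitive
        candidates = [word]  # always keep the full form
        if "-" in word:
            # Assumption: numeric fragments such as "2" are not stored as keywords.
            candidates += [p for p in word.split("-") if p and not p.isdigit()]
        for candidate in candidates:
            if candidate not in keywords:
                keywords.append(candidate)
    return keywords


# Illustrative role string built from the keywords shown in the table.
role = ("Dihydroneopterin aldolase / "
        "2-amino-4-hydroxy-6-hydroxymethyldihydropteridine pyrophosphokinase")
for keyword in role_keywords(role):
    print(keyword)
```

Under these assumptions the hyphenated word yields four keywords: its full form plus amino, hydroxy, and hydroxymethyldihydropteridine.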

See also SearchingByFunctionalRole

Advanced Keyword Searching

By default, the search returns only the genes that match all the words in the keyword box. You can modify this behavior using the following control characters.

Char  Meaning   Example                Explanation of example
-     negation  2.7.6.3 -firmicutes    search for all genes with EC number 2.7.6.3 that are not in firmicutes
()    optional  (2.7.6.3 4.1.2.25)     search for any gene with EC number 2.7.6.3 or 4.1.2.25
""    phrase    "folate biosynthesis"  search for all genes that participate in folate biosynthesis
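
To make the meaning of the three control characters concrete, here is a small Python sketch of how such a query could be parsed and checked against one gene. It models only the behavior described in the table and is not the NMPDR's implementation. The names Query, parse_query, matches, and GENE_FIELDS are hypothetical, and treating a phrase as "these words appear consecutively in one data item" is an assumption.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Query:
    required: list[str] = field(default_factory=list)      # plain words: all must match
    excluded: list[str] = field(default_factory=list)      # -word: none may match
    any_of: list[list[str]] = field(default_factory=list)  # (a b): at least one must match
    phrases: list[str] = field(default_factory=list)       # "a b": words must be adjacent


# A token is a quoted phrase, a parenthesized group, or a bare word.
TOKEN = re.compile(r'"([^"]*)"|\(([^)]*)\)|(\S+)')


def parse_query(text: str) -> Query:
    query = Query()
    for phrase, group, word in TOKEN.findall(text):
        if phrase:
            query.phrases.append(" ".join(phrase.lower().split()))
        elif group:
            query.any_of.append(group.lower().split())
        elif word.startswith("-"):
            if len(word) > 1:
                query.excluded.append(word[1:].lower())
        elif word:
            query.required.append(word.lower())
    return query


def matches(query: Query, fields: list[str]) -> bool:
    """Check one gene, given as a list of data-item strings, against a query."""
    texts = [f.lower() for f in fields]
    words = {w for t in texts for w in re.split(r"[\s,]+", t) if w}
    if any(w in words for w in query.excluded):
        return False
    if not all(w in words for w in query.required):
        return False
    if not all(any(w in words for w in group) for group in query.any_of):
        return False
    # Pad with spaces so a phrase only matches on whole-word boundaries.
    padded = [" " + " ".join(t.split()) + " " for t in texts]
    return all(any(f" {p} " in t for t in padded) for p in query.phrases)


# Data items taken from the table for fig|171101.1.peg.269.
GENE_FIELDS = [
    "bacteria firmicutes lactobacillales streptococcaceae streptococcus pneumoniae r6",
    "folate biosynthesis",
    "2.7.6.3, 4.1.2.25",
]

for text in ['2.7.6.3 -firmicutes', '(2.7.6.3 4.1.2.25)', '"folate biosynthesis"']:
    print(text, matches(parse_query(text), GENE_FIELDS))
```

Because this gene belongs to the firmicutes, the first example query rejects it, while the optional group and the phrase both accept it.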

Using Negation

You cannot negate every keyword in a search: at least one keyword must be positive. For example, you can't use

    -hypothetical

to get all non-hypothetical proteins. You can work around this restriction by including a positive keyword for a broad category:

    bacteria -hypothetical

which will return all non-hypothetical proteins for bacteria.
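
The restriction can be expressed as a simple pre-check on the query text. The Python sketch below is illustrative only; the function name has_positive_term is hypothetical, and splitting the query on whitespace is a simplifying assumption.

```python
def has_positive_term(query: str) -> bool:
    """Return True if at least one term in the query is not negated."""
    return any(not term.startswith("-") for term in query.split())


print(has_positive_term("-hypothetical"))           # all terms negated
print(has_positive_term("bacteria -hypothetical"))  # one positive term
```

A check like this would reject the first query and accept the second.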

Topic revision: r8 - 16 Oct 2008 - 15:57:15 - BruceParrello
 
NMPDR is a collaboration among researchers from the Computation Institute of the University of Chicago, the Fellowship for Interpretation of Genomes (FIG), Argonne National Laboratory, and the National Center for Supercomputing Applications (NCSA) at the University of Illinois. NMPDR is funded by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract HHSN266200400042C. Banner images are copyright © Dennis Kunkel.