Searching for Genes with Specific Functional Roles

Using Keyword Search

There is no specific NMPDR search for genes by functional role; however, the functional role is part of a gene's keyword list, so you can get a close approximation using a keyword search. To perform a keyword search, either enter the keywords in the NmpdrBanner search box or go to the WordSearch page.

If you are fairly sure of the exact wording of a functional assignment, enclose the probable wording in double quotes. For example,

   "Chaperone protein dnaK"

would return all genes with that specific functional role. The keyword search attempts to be clever, so if you enter a minor variation such as

   "Chaperone proteins for dnaK"

you will get the same result.

The double quotes are important in this type of search. The unquoted

   Chaperone protein dnaK

will get a much larger result set, including any chaperone proteins in the Heat shock dnaK gene cluster extended subsystem. This is because the keywords for a specific gene include the names of any subsystem containing the gene as well as the functional role itself.
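
The difference between the quoted and unquoted forms can be illustrated with a toy model. The sketch below is not the NMPDR search code; it assumes that a quoted query must match the functional role itself, while an unquoted query matches any gene whose keyword set (role words plus subsystem-name words) contains every query word. The `Gene` class, the `search` function, and the `fig|0.0.peg.N` IDs are placeholders invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    fig_id: str             # placeholder ID, not a real database entry
    role: str               # functional role of the gene
    subsystems: tuple = ()  # names of subsystems that contain the gene

    def keywords(self):
        """Words from the role plus words from every subsystem name."""
        words = self.role.lower().split()
        for name in self.subsystems:
            words.extend(name.lower().split())
        return set(words)


def search(genes, query):
    """Toy keyword search (an assumption, not the NMPDR implementation)."""
    query = query.strip()
    if len(query) >= 2 and query[0] == '"' and query[-1] == '"':
        # Quoted: the phrase must equal the functional role, ignoring case.
        phrase = query[1:-1].lower().split()
        return [g for g in genes if g.role.lower().split() == phrase]
    # Unquoted: every query word must appear somewhere in the keyword set.
    wanted = set(query.lower().split())
    return [g for g in genes if wanted <= g.keywords()]


HEAT_SHOCK = "Heat shock dnaK gene cluster extended"
GENES = [
    Gene("fig|0.0.peg.1", "Chaperone protein DnaK", (HEAT_SHOCK,)),
    Gene("fig|0.0.peg.2", "Chaperone protein DnaJ", (HEAT_SHOCK,)),
    Gene("fig|0.0.peg.3", "Chaperone protein HtpG"),
]
```

In this model, `search(GENES, '"Chaperone protein dnaK"')` returns only the first gene, while `search(GENES, 'Chaperone protein dnaK')` also returns the second, because the word dnaK reaches its keyword set through the subsystem name. The model does not attempt the tolerance for minor wording variations described above.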

If you know the EC number for a particular enzyme, you can simply enter that. Thus,

  1.13.11.27

will return all genes that encode 4-hydroxyphenylpyruvate dioxygenase.
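
If you are building queries in a script, it can help to check that a string has the shape of a complete EC number before submitting it. The following is a minimal sketch; `parse_ec_number` is a helper written for this page, not part of any NMPDR tool. It checks only the four-field numeric layout and does not verify that the number is actually assigned to an enzyme.

```python
import re

# A complete EC number has four dot-separated numeric fields, e.g. 1.13.11.27.
EC_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)\.(\d+)$")


def parse_ec_number(text):
    """Return the four fields as a tuple of ints, or None if the shape is wrong."""
    match = EC_PATTERN.match(text.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())
```

For example, `parse_ec_number("1.13.11.27")` yields the tuple `(1, 13, 11, 27)`, while a truncated string such as `"1.13.11"` yields `None`.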

Using FigFams

[Image: Genome Viewer page fragment]

A FigFam is a set of functionally identical genes. If you know a specific gene with a particular functional role, you can enter the FigId in the NmpdrBanner search box and click GO to see the GenomeViewer page for the gene. If a FigFam exists for the gene, it will be displayed in the overview section of the page. Click on the ID to see all the genes in the same family. The screen fragment on the left comes from the GenomeViewer page for fig|314288.3.peg.916, the 4-hydroxyphenylpyruvate dioxygenase gene for Vibrio alginolyticus 12G01. It shows that this gene belongs to FigFam FIG001109. The page that opens when you click the FigFam ID contains not only general information about the FigFam, but also a GenomeViewerTable of all the genes that belong to it, as shown below.

[Image: FigFam page fragment from Genome Viewer]

If you don't know the FigId of a gene, you can enter any other ID, including the NCBI, CMR, RefSeq or UniProt ID. Over 10 million gene identifiers from these four organizations are stored in the SproutDatabase.
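
The two FigIds quoted on this page, fig|314288.3.peg.916 and fig|36329.1.peg.1198, share a regular layout: the prefix fig|, a two-part genome number, a short feature type, and a feature number. The sketch below splits a FigId along those lines. The layout is inferred from these two examples only, and the field names (`genome`, `feature_type`, `number`) are labels chosen for this sketch rather than official terminology.

```python
import re
from typing import NamedTuple, Optional

# Layout inferred from the IDs shown on this page: fig|<genome>.<type>.<number>
FIG_ID_PATTERN = re.compile(r"^fig\|(\d+\.\d+)\.([A-Za-z]+)\.(\d+)$")


class FigId(NamedTuple):
    genome: str        # e.g. "314288.3"
    feature_type: str  # e.g. "peg"
    number: int        # e.g. 916


def parse_fig_id(text: str) -> Optional[FigId]:
    """Split a FigId into its parts, or return None if it does not fit the layout."""
    match = FIG_ID_PATTERN.match(text.strip())
    if match is None:
        return None
    return FigId(match.group(1), match.group(2), int(match.group(3)))
```

A check like this can separate FigIds from other identifier types in a mixed list before you paste them into the search box.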

If you don't know an ID, but you know a protein sequence that performs the functional role, you can enter it in FastaFormat on the FigFam page of the GenomeViewer.
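
If your sequence is a bare string of residues, a short script can wrap it in FASTA format before you paste it in. This is a minimal sketch: `to_fasta` is a helper written for this page, the 60-character line width is a common convention rather than a documented NMPDR requirement, and the label is whatever text you want on the header line.

```python
def to_fasta(label, sequence, width=60):
    """Format a sequence as a FASTA record: a '>' header line, then wrapped residues."""
    # Remove any whitespace or line breaks and normalise to upper case.
    cleaned = "".join(sequence.split()).upper()
    body = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return "\n".join([">" + label] + body) + "\n"
```

Calling `to_fasta("query", my_sequence)` returns a string that begins with the line `>query` followed by the residues in lines of at most 60 characters.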


Using Web Page Search

[Images: banner search box with web page option selected; results of a web page search for chaperone protein dnak]

In addition to its function as a keyword search tool, the search box in the NmpdrBanner can be used for text mining of the NMPDR web site. Simply select the web pages radio button, enter search words, and click GO. The results of a web page search for "chaperone protein dnak" are shown in the screen fragment to the right. Text mining is much less precise than searching the SproutDatabase. You will generally get fewer results, and there will be some noise among the hits.

In the example, four results were returned. The second is the GenomeViewer page for fig|36329.1.peg.1198. That page shows a protein whose functional assignment is Chaperone protein dnaK and which belongs to FigFam FIG134874, a family of 666 genes with the same functional role.

Using Subsystems

Over half the genes in the SproutDatabase are members of subsystems. Genes in subsystems are manually curated, and represent the highest-quality annotations in the database. If you can find a subsystem in which a particular functional role plays a part, you have immediate access to a list of related genes that not only perform the same functional role, but perform it in the service of the same metabolic process.

The subsystems are listed in the form of a giant tree on the subsystem search page. Locate a likely-looking subsystem and click on its name to see the subsystem's main page in the GenomeViewer. Alternatively, select the radio button for the desired subsystem, type some likely keywords into the Search Words box at the bottom of the page, and click GO. This will return all the features in the selected subsystem that are associated with the specified keywords. If a single subsystem seems too narrow in scope, you can select a class of subsystems using the radio button for a higher-level classification.

[Image: screen fragment from protein folding search]

For example, you may be interested in a functional role that plays a part as a transcriptional repressor in protein folding. Select the Protein folding subsystem class, then type "transcriptional repressor" into the search words box (the quotes are important). The result will be a list of over 70 genes. As you can see in the screen image to the left, all of them have the functional assignment HspR, transcriptional repressor of DnaK operon. From this, you can derive other ideas for search phrases, click on a subsystem name to see the entire metabolic pathway, download the list of genes, or click one of the Viewer buttons to see details about a particular gene.
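
Once you have downloaded a list of genes, counting the distinct functional assignments is a quick way to confirm that a result set is homogeneous or to spot other phrases worth searching for. The sketch below assumes the download is tab-separated text with a header row and a column named Function; both the file layout and the column name are assumptions, so adjust them to match the file you actually receive. The sample rows use placeholder IDs and one placeholder role.

```python
import csv
import io
from collections import Counter


def tally_roles(tsv_text, role_column="Function"):
    """Count how often each functional assignment appears in tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[role_column] for row in reader)


# Hypothetical download: the column names and IDs are placeholders.
SAMPLE = (
    "ID\tFunction\n"
    "fig|0.0.peg.1\tHspR, transcriptional repressor of DnaK operon\n"
    "fig|0.0.peg.2\tHspR, transcriptional repressor of DnaK operon\n"
    "fig|0.0.peg.3\tExample role B (placeholder)\n"
)
```

`tally_roles(SAMPLE).most_common()` lists each assignment with its count, most frequent first.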


Conclusion

Finding genes with a specific functional role is a difficult process because there is no canonical form for gene annotations. The NMPDR, however, provides several different methods for finding genes by function, including FigFams, Subsystems, keyword searching, and text mining. If these tools are not enough, please let us know. The SproutDatabase is specifically designed for use as a search resource, and if the data is in there, we can almost always find a way to pull it out.

Original Author: BruceParrello
Display Title: Searching for Genes with Specific Functional Roles
Original date: 2008-10-16

Topic revision: r3 - 17 Oct 2008 - 04:49:13 - BruceParrello
 
NMPDR is a collaboration among researchers from the Computation Institute of the University of Chicago, the Fellowship for Interpretation of Genomes (FIG), Argonne National Laboratory, and the National Center for Supercomputing Applications (NCSA) at the University of Illinois. NMPDR is funded by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract HHSN266200400042C. Banner images are copyright © Dennis Kunkel.