There is no dedicated NMPDR search for genes by functional role. However, the functional role is part of a gene's keyword list, so a keyword search gives a close approximation. To perform a keyword search, either enter the keywords in the
NmpdrBanner search box or go to the
WordSearch page.
If you are fairly sure of the exact wording of a functional assignment, enclose the phrase in double quotes. For example,
"Chaperone protein dnaK"
returns all genes with that specific functional role. The keyword search tolerates minor variations in wording, so a query such as
"Chaperone proteins for dnaK"
returns the same result.
The double quotes are important in this type of search. The unquoted query
Chaperone protein dnaK
returns a much larger result set, including any chaperone proteins in the
Heat shock dnaK gene cluster extended subsystem. This is because the keywords for a specific gene include the names of any subsystem containing the gene as well as the functional role itself.
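The difference between the quoted and unquoted forms can be illustrated with a minimal Python sketch. This is not the NMPDR implementation: the gene names, the toy keyword strings, and the assumption that an unquoted query requires every word to appear somewhere in the keyword list are all hypothetical, chosen only to mirror the behavior described above.

```python
def tokenize(text):
    """Lowercase a string and split it into words."""
    return text.lower().split()

# Hypothetical toy data: each gene's keyword text combines its functional
# role with the names of the subsystems that contain the gene.
GENES = {
    "gene_a": "Chaperone protein dnaK Heat shock dnaK gene cluster extended",
    "gene_b": "Chaperone protein dnaJ Heat shock dnaK gene cluster extended",
    "gene_c": "4-Hydroxyphenylpyruvate dioxygenase",
}

def phrase_search(phrase):
    """Quoted search: the words must appear together and in order."""
    words = tokenize(phrase)
    if not words:
        return []
    hits = []
    for gene, text in GENES.items():
        tokens = tokenize(text)
        last_start = len(tokens) - len(words)
        if any(tokens[i:i + len(words)] == words for i in range(last_start + 1)):
            hits.append(gene)
    return hits

def keyword_search(query):
    """Unquoted search (assumed semantics): every word must appear
    somewhere in the keyword text, in any order."""
    words = set(tokenize(query))
    return [gene for gene, text in GENES.items() if words <= set(tokenize(text))]

print(phrase_search("Chaperone protein dnaK"))
print(keyword_search("Chaperone protein dnaK"))
```

In this toy data the phrase search matches only gene_a, while the unquoted search also matches gene_b, because the word dnaK reaches gene_b's keyword list through the subsystem name rather than through its functional role.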
If you know the
EC number for a particular enzyme, you can simply enter that. Thus,
1.13.11.27
returns
all genes that encode 4-Hydroxyphenylpyruvate dioxygenase.

A
FigFam is a set of functionally identical genes. If you know a specific gene with a particular functional role, you can enter the
FigId in the
NmpdrBanner search box and click GO to see the
GenomeViewer page for the gene. If a
FigFam exists for the gene, it will be displayed in the overview section of the page. Click on the ID to see all the genes in the same family. The screen fragment on the left comes from the
GenomeViewer page for
fig|314288.3.peg.916, the 4-hydroxyphenylpyruvate dioxygenase gene for
Vibrio alginolyticus 12G01. In the screen fragment, this gene belongs to
FigFam FIG001109. The page for a FigFam contains not only general information about the
FigFam, but also a
GenomeViewerTable of all the genes that belong to it, as shown below.
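If you handle FigIds in your own scripts before pasting them into the search box, it can help to check their shape first. The sketch below is an illustration only: the layout fig|genome.feature-type.number is inferred from the two FigIds shown on this page, and the names FigId and parse_fig_id are hypothetical, not part of any NMPDR tool.

```python
import re
from typing import NamedTuple

class FigId(NamedTuple):
    genome_id: str       # e.g. "314288.3"
    feature_type: str    # e.g. "peg"
    feature_number: int  # e.g. 916

# Assumed layout, inferred from the examples on this page:
#   fig|<taxon>.<version>.<feature type>.<feature number>
FIG_ID_PATTERN = re.compile(r"^fig\|(\d+\.\d+)\.([A-Za-z_]+)\.(\d+)$")

def parse_fig_id(text: str) -> FigId:
    """Split a FigId string into its parts, or raise ValueError."""
    match = FIG_ID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a FigId: {text!r}")
    genome_id, feature_type, number = match.groups()
    return FigId(genome_id, feature_type, int(number))

parsed = parse_fig_id("fig|314288.3.peg.916")
print(parsed.genome_id, parsed.feature_type, parsed.feature_number)
```

A FigFam ID such as FIG001109 is deliberately rejected by this pattern, since it identifies a family rather than a single gene.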

If you don't know the
FigId of a gene, you can enter any other ID, including the
NCBI,
CMR,
RefSeq or
UniProt ID. Over 10 million gene identifiers from these four organizations are stored in the
SproutDatabase.
If you don't know an ID, but you know a protein sequence that performs the functional role, you can enter it in
FastaFormat on the
FigFam page of the
GenomeViewer. For your convenience, you can also enter it in the box below.
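FASTA format consists of a header line beginning with ">" followed by one or more lines of sequence. The minimal sketch below shows one way to prepare a raw protein sequence for pasting. The function name to_fasta, the label "query", the 60-character line width, and the example peptide are all assumptions made for illustration; the page does not state which line widths or header contents it expects.

```python
def to_fasta(label: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record: a '>' header line, then the sequence
    broken into lines of at most `width` characters."""
    # Remove spaces and line breaks that often come along when copying.
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    chunks = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return "\n".join([f">{label}"] + chunks) + "\n"

# Hypothetical short peptide, used only to show the output layout.
print(to_fasta("query", "mktay iakqr\nqisfv", width=10))
```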
Using Web Page Search


In addition to its function as a keyword search tool, the search box in the
NmpdrBanner can be used for text mining of the NMPDR web site. Simply select the
web pages radio button, enter search words, and click
GO. The results of a web page search for "chaperone protein dnak" are shown in the screen fragment to the right. Text mining is much less precise than searching the
SproutDatabase. You will generally get fewer results, and there will be some noise among the hits.
In the example, four results were returned. The second is the
GenomeViewer page for
fig|36329.1.peg.1198. If you open that page, you will see that it is a protein whose functional assignment is
Chaperone protein dnaK and that it belongs to
FigFam FIG134874, which contains 666 genes with the same functional role.
Using Subsystems
Over half the genes in the
SproutDatabase are members of
subsystems. Genes in subsystems are manually curated, and represent the highest-quality annotations in the database. If you can find a subsystem in which a particular functional role plays a part, you have immediate access to a list of related genes that not only perform the same functional role, but perform it in the service of the same metabolic process.
The subsystems are listed in the form of a giant tree on the
subsystem search page. Locate a likely-looking subsystem and click on its name to see the subsystem's main page in the
GenomeViewer. Alternatively, select the radio button for the desired subsystem, type some likely keywords into the
Search Words box at the bottom of the page and click
GO. This will return all the features in the selected subsystem that are associated with the specified keywords. If a single subsystem seems too narrow in scope, you can select a class of subsystems using the radio button for a higher-level classification.

For example, you may be interested in a functional role that plays a part as a transcriptional repressor in protein folding. Select the
Protein folding subsystem class, then type
"transcriptional repressor" into the search words box (the quotes are important). The result will be
a list of over 70 genes. As you can see in the screen image to the left, all of them have the functional assignment
HspR, transcriptional repressor of DnaK operon. From this, you can derive other ideas for search phrases, click on a subsystem name to see the entire metabolic pathway, download the list of genes, or click one of the
Viewer buttons to see details about a particular gene.
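The class-restricted search in this example amounts to two filters applied together: keep only the features whose subsystem falls in the chosen class, then keep those matching the quoted phrase. The sketch below illustrates that idea on toy records. It is not how the SproutDatabase is queried: the Feature record, the placeholder FigIds, the "Example class" entries, and the choice to match the phrase against the functional role alone are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Feature:
    fig_id: str
    role: str
    subsystem: str
    subsystem_class: str

# Hypothetical records; the FigIds are placeholders, not real genes.
FEATURES = [
    Feature("fig|0.0.peg.1", "HspR, transcriptional repressor of DnaK operon",
            "Heat shock dnaK gene cluster extended", "Protein folding"),
    Feature("fig|0.0.peg.2", "Chaperone protein dnaK",
            "Heat shock dnaK gene cluster extended", "Protein folding"),
    Feature("fig|0.0.peg.3", "Example transcriptional repressor",
            "Example subsystem", "Example class"),
]

def search_in_class(features, subsystem_class, phrase):
    """Keep features in the given subsystem class whose functional role
    contains the phrase (case-insensitive substring match)."""
    needle = phrase.lower()
    return [
        f for f in features
        if f.subsystem_class == subsystem_class and needle in f.role.lower()
    ]

for feature in search_in_class(FEATURES, "Protein folding", "transcriptional repressor"):
    print(feature.fig_id, feature.role)
```

Restricting by class first is what keeps the repressor in the unrelated example class out of the results, even though its role contains the same phrase.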
Conclusion
Finding genes with a specific functional role is a difficult process because there is no canonical form for gene annotations. The NMPDR, however, provides several different methods for finding genes by function, including
FigFams,
Subsystems,
keyword searching, and text mining. If these tools are not enough, please
let us know. The
SproutDatabase is specifically designed for use as a search resource, and if the data is in there, we can almost always find a way to pull it out.