[Diagram: SigGenesHits.gif]
Use the signature genes tool to find genes that are common within one set of genomes, or that differentiate one set of genomes from another.

Like most searches in the NMPDR, this is a search for features. In particular, it is a search for features in a single genome, known as the reference genome. The reference genome is compared to two sets of genomes, the inclusion set and the exclusion set. For each feature in the reference genome, the search strives to determine if it is common in the inclusion set and uncommon in the exclusion set, as shown in the diagram to the left. If the reference feature passes the test, it is displayed in the search results.

If the exclusion set is empty, the search returns genes that are common to the inclusion set (on the grounds that everything is uncommon in an empty set). Note that in both cases, with or without an exclusion set, the reference genome is considered part of the inclusion set.
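The test described above can be sketched in a few lines of Python. This is only an illustration, not the NMPDR implementation: the function names, the data layout (a set of genome IDs in which the reference gene has a hit), and the simple percentage rule are all assumptions. In particular, the sketch assumes that "uncommon" means the fraction of exclusion genomes without a hit is at least the commonality threshold, which this page does not state.

```python
from typing import Set


def ratio(count: int, total: int) -> float:
    """Return count / total, or 0.0 for an empty set."""
    return count / total if total else 0.0


def is_signature(hit_genomes: Set[str], reference: str,
                 inclusion: Set[str], exclusion: Set[str],
                 commonality: float = 0.80) -> bool:
    """Decide whether one reference gene passes the signature test.

    hit_genomes: IDs of genomes in which the reference gene has a hit
                 (hypothetical input; how hits are found is not shown here).
    """
    # The reference genome always counts as part of the inclusion set,
    # and the gene trivially occurs in its own genome.
    inclusion_set = set(inclusion) | {reference}
    exclusion_set = set(exclusion) - {reference}
    hits = set(hit_genomes) | {reference}

    # Common: enough inclusion genomes contain a hit.
    common = ratio(len(hits & inclusion_set), len(inclusion_set)) >= commonality
    if not exclusion_set:
        # Everything is uncommon in an empty set.
        return common

    # ASSUMPTION: uncommon means enough exclusion genomes lack a hit.
    misses = len(exclusion_set - hits)
    uncommon = ratio(misses, len(exclusion_set)) >= commonality
    return common and uncommon
```

For example, with four inclusion genomes plus the reference genome, a gene with hits in three of the four has a ratio of 4/5 and counts as common at the default threshold of 0.80.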

To begin the search, select the reference genome from the drop-down box at the top of the form. The inclusion and exclusion sets are specified using standard genome controls. When you select a genome for one set, it will automatically be deselected from the other set.
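The rule that a genome can belong to only one of the two sets can be pictured with a small sketch. The class and method names here are hypothetical and are not part of the NMPDR form code; the sketch only mirrors the behavior described above.

```python
class GenomeSelection:
    """Hypothetical model of the inclusion and exclusion genome controls."""

    def __init__(self) -> None:
        self.inclusion: set = set()
        self.exclusion: set = set()

    def select(self, genome_id: str, target: str) -> None:
        """Put a genome in one set and deselect it from the other."""
        if target == "inclusion":
            self.exclusion.discard(genome_id)
            self.inclusion.add(genome_id)
        elif target == "exclusion":
            self.inclusion.discard(genome_id)
            self.exclusion.add(genome_id)
        else:
            raise ValueError("target must be 'inclusion' or 'exclusion'")
```

Selecting a genome for the exclusion set after it was placed in the inclusion set leaves it in the exclusion set only.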

You have the option of filtering the genes in the reference genome before they are examined, using the options below.

| Option | Type | Description |
| Search Words | Keyword Box | Enter one or more keywords to limit the set of genes examined in the reference genome. |
| Subsystem | Subsystem Filter | Select a subsystem to limit the set of genes from the reference genome to those in that subsystem. |
| Options | Gene Display Options | Specify how the results are to be displayed or sorted. |

You can also modify the algorithm used to determine whether a gene is common or uncommon.

Commonality
When searching for genes in common, this is the minimum ratio of genomes with hits to total genomes in the set. The default value of 0.80 represents 80%. In that case, if you have a set of 10 genomes, a gene will be considered common if it has hits in at least 8 genomes from the set. This value is ignored if there is an exclusion set and Use Statistical Algorithm is checked.
Use Statistical Algorithm
If you have an exclusion set (that is, if you are looking for genes which differentiate between two sets) and this box is checked, a second-order statistical computation will be used to determine whether or not a gene differentiates. If this box is not checked, a simple percentage calculation will be used.
Use Similarities
Normally, bidirectional best hits are used to determine whether a gene in the reference genome has a hit in a specified inclusion or exclusion genome. If this box is checked, similarities will be used instead. Because the similarity set is larger, this will result in a slower search.
Show Matching Genes
If this box is checked, a list of the genes matching the reference gene will be shown. This is useful if you want to know why a particular reference gene is considered common.
Cutoff
The similarity score to be used as a cutoff when computing bidirectional best hits or similarities. A lower number means that fewer hits will be found when processing a reference gene.
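As a rough illustration of how a bidirectional best hit and the Cutoff option interact, consider the sketch below. It is not the NMPDR code: the function names and the score table are hypothetical, and it assumes the similarity score behaves like an e-value, where smaller numbers indicate stronger matches. That assumption is consistent with the statement that a lower cutoff finds fewer hits, but it is not confirmed on this page.

```python
from typing import Dict, Iterable, Optional, Tuple

# scores[(query_gene, subject_gene)] = similarity score.
# ASSUMPTION: smaller is better, as with an e-value.
Scores = Dict[Tuple[str, str], float]


def best_hit(gene: str, candidates: Iterable[str],
             scores: Scores, cutoff: float) -> Optional[str]:
    """Return the best-scoring candidate at or below the cutoff, if any."""
    best, best_score = None, None
    for other in candidates:
        score = scores.get((gene, other))
        if score is None or score > cutoff:
            continue  # no similarity recorded, or too weak to count
        if best_score is None or score < best_score:
            best, best_score = other, score
    return best


def bidirectional_best_hit(ref_gene: str, ref_genes: Iterable[str],
                           other_genes: Iterable[str],
                           scores: Scores, cutoff: float) -> Optional[str]:
    """Return the gene in the other genome that is a mutual best hit."""
    forward = best_hit(ref_gene, other_genes, scores, cutoff)
    if forward is None:
        return None
    backward = best_hit(forward, list(ref_genes), scores, cutoff)
    return forward if backward == ref_gene else None
```

Under these assumptions, tightening the cutoff can only remove hits, never add them, because every candidate must score at or below the cutoff in both directions.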
Topic revision: r5 - 26 Aug 2008 - 15:55:48 - BruceParrello
 
NMPDR is a collaboration among researchers from the Computation Institute of the University of Chicago, the Fellowship for Interpretation of Genomes (FIG), Argonne National Laboratory, and the National Center for Supercomputing Applications (NCSA) at the University of Illinois. NMPDR is funded by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract HHSN266200400042C. Banner images are copyright © Dennis Kunkel.