SOP032: Finding Genes that May Be Characteristics of a Phenotype

This procedure explains how to use the NMPDR SigGenes page (the Signature Genes Tool) to compare and contrast whole genomes in order to find genes that characterize a phenotype.

  • For example, what genes are characteristic of the El Tor-like biotype responsible for the seventh cholera pandemic?

The Signature Genes Tool requires you to specify a reference genome, an inclusion set, and an exclusion set. The tool looks for genes from the reference genome that are common in the inclusion set and uncommon in the exclusion set. Genes found by the tool are likely to be characteristic of the genomes in the inclusion set.

As the reference genome, select an organism that exemplifies the phenotype.

In the inclusion set, select any number of genomes (zero or more) that share the phenotype to compare with the reference genome.

In the exclusion set, select any number of related genomes that do NOT share the phenotype. These will be contrasted with the reference genome.
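The three selections can be pictured as a small record. The sketch below is illustrative only: `SignatureQuery` and its `problems` check are hypothetical names, not part of NMPDR, and the rules it checks (no genome in both sets, reference not in the exclusion set, exclusion set not empty) are assumptions about what makes a sensible query rather than documented requirements of the tool. The strain names are those of the cholera example used in this procedure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SignatureQuery:
    """Hypothetical record of the three genome selections (not an NMPDR API)."""
    reference: str
    inclusion: frozenset = field(default_factory=frozenset)  # zero or more genomes
    exclusion: frozenset = field(default_factory=frozenset)

    def problems(self):
        """Return a list of human-readable problems with the selection."""
        found = []
        # Assumed rule: the reference exemplifies the phenotype, so it
        # should not also be listed among the genomes that lack it.
        if self.reference in self.exclusion:
            found.append("reference genome is also in the exclusion set")
        # Assumed rule: a genome cannot both share and lack the phenotype.
        overlap = self.inclusion & self.exclusion
        if overlap:
            found.append("genomes in both sets: " + ", ".join(sorted(overlap)))
        # Assumed rule: with nothing to contrast against, no gene is filtered out.
        if not self.exclusion:
            found.append("exclusion set is empty, so nothing is contrasted")
        return found


query = SignatureQuery(
    reference="Vibrio cholerae N16961",
    inclusion=frozenset({"Vibrio cholerae MO10"}),
    exclusion=frozenset({"Vibrio cholerae O395"}),
)
print(query.problems())
```

An empty list means the selection passes these assumed checks; the tool itself may accept selections that this sketch would flag.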

Leave the remaining settings at their default values and click the Go button. The results open in a new window or tab. The fields of the simplified Signature Genes Tool input form are listed below; use them to run the example or a search of your own.

  • Reference Genome
  • Inclusion Set
  • Exclusion Set
  • Options: Use Similarities, Show Matching Genes

The tool searches the database for every protein in the reference genome that has a bidirectional best hit (BBH) in most of the genomes in the inclusion set. It then removes from the results every reference protein that also has BBHs in most of the genomes in the exclusion set. For every protein in the reference genome, the tool computes a score from 0 to 1. A score of 1.000 means that the protein has a BBH in every genome of the inclusion set and in no genome of the exclusion set. When the sets are large, proteins with less-than-perfect scores are also returned in the results.

  • For example, 76 proteins in Vibrio cholerae strain N16961 have bidirectional best hits with strain MO10, but not with strain O395. These are a starting point for finding genes that may be responsible for pandemic virulence.
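The filtering and scoring described above can be sketched in a few lines. This is a minimal illustration, not the tool's implementation: the text only says that a score of 1.000 means a hit in every inclusion genome and in no exclusion genome, so the product formula below is an assumption chosen to satisfy that endpoint. The function names, the `cutoff` parameter, and the toy gene identifiers are all hypothetical.

```python
def signature_score(hit_genomes, inclusion, exclusion):
    """Score one reference gene from the set of genomes where it has a BBH.

    Assumed formula: (fraction of inclusion genomes hit) times
    (1 - fraction of exclusion genomes hit). An empty inclusion set
    counts as fully matched; an empty exclusion set as fully avoided.
    """
    in_frac = len(hit_genomes & inclusion) / len(inclusion) if inclusion else 1.0
    ex_frac = len(hit_genomes & exclusion) / len(exclusion) if exclusion else 0.0
    return in_frac * (1.0 - ex_frac)


def signature_genes(bbh_table, inclusion, exclusion, cutoff=1.0):
    """Return (gene, score) pairs scoring at least `cutoff`, best first."""
    scored = [
        (gene, signature_score(hits, inclusion, exclusion))
        for gene, hits in bbh_table.items()
    ]
    keep = [(gene, score) for gene, score in scored if score >= cutoff]
    return sorted(keep, key=lambda pair: (-pair[1], pair[0]))


# Made-up toy data: for each reference gene, the genomes holding a BBH.
bbh_table = {
    "geneA": {"MO10"},          # hit in the inclusion genome only
    "geneB": {"MO10", "O395"},  # hit in both genomes
    "geneC": set(),             # no hit anywhere
}
inclusion = {"MO10"}
exclusion = {"O395"}

for gene, score in signature_genes(bbh_table, inclusion, exclusion):
    print(f"{gene}\t{score:.3f}")
```

With this toy table only `geneA` reaches a score of 1.000. Lowering `cutoff` mimics the situation with large sets, where genes with less-than-perfect scores are also reported.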

To run the search using similarities instead of bidirectional best hits, check the Use Similarities box. To see which genes matched each gene in the reference genome, check the Show Matching Genes box.
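To illustrate why the two modes can give different answers, the sketch below shows the general definition of a bidirectional best hit: gene a in genome A and gene b in genome B are BBHs when b is the best-scoring match for a in B and a is the best-scoring match for b in A. A plain similarity only requires that the two genes match. The function names and the scores are made up for illustration, and how the tool thresholds similarities when Use Similarities is checked is not specified here.

```python
def best_hits(similarities):
    """Map each query gene to its highest-scoring subject gene.

    `similarities` is a list of (query_gene, subject_gene, score) tuples.
    """
    best = {}
    for query, subject, score in similarities:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return the set of (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Made-up similarity scores between genes of genome A and genome B.
a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0), ("a2", "b1", 120.0)]
b_vs_a = [("b1", "a1", 250.0), ("b1", "a2", 120.0), ("b2", "a1", 90.0)]

print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

In this toy data `a2` is similar to `b1`, but `b1` matches `a1` better, so only the pair `a1` and `b1` is a bidirectional best hit. A search based on similarities would therefore count `a2` as matched where a BBH search would not.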

SopForm
Number 032
Audience User Group
Title Finding Genes that May Be Characteristics of a Phenotype
Style active
Topic revision: r3 - 23 May 2008 - 02:55:20 - Bruce Parrello
 