Similarities and Homologs

Two features are said to be similar if they share a nearly identical DNA sequence. In bioinformatics, similarities are the fundamental tool for reasoning about genes in new genomes. In particular, similarity is considered to be evidence that features may be homologs; that is, genes derived from a common ancestor.

Given that there are millions of features in our database, it is not possible to compute similarities on demand. Instead, similarities are computed as new genomes make their way into the system via the pipeline and are stored on an external Similarities Server.

Figure (RouteToEvidence.png): Link to the evidence page near the top of a feature's annotation page
In the NMPDR, similarities are displayed on the Genome Viewer Evidence Page. A link to the evidence page is provided near the top of a feature's annotation page, as shown in the screen fragment on the right. The similarities are shown in the lower portion of the page. Each similarity is displayed in the form of an alignment, with the focus gene (in this case fig|360108.3.peg.1041) shown on top, and the similar gene on the bottom. The color of the similarity indicates the closeness of the match. Red indicates a close match, yellow a moderate match, and green a distant match.

Figure (EvidencePage.png): Fragment of the evidence page for the ATP synthase delta gene in Campylobacter jejuni

In a similarity, the gene for which similar genes are sought is called the query, and the gene that matches it is called the hit. In general, a similarity exists if a large portion of the query gene matches a large portion of the hit. On the evidence page, the matching portions are indicated by the white part of the alignment bars. The colored part indicates how much of each gene extends to either side of the match. As expected, the colored parts are larger for the lower-quality matches.
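The relationship between the matching portion and the parts that extend past it can be sketched in a few lines of Python. This is only an illustration, not the code the NMPDR uses: the function names, the 1-based inclusive coordinates, and the gene lengths in the usage lines are all assumptions made for the example.

```python
def coverage(start, end, length):
    """Fraction of a gene covered by the match (the white part of the bar).

    Coordinates are assumed to be 1-based and inclusive.
    """
    return (end - start + 1) / length


def overhang(start, end, length):
    """Residues extending to the left and right of the match (the colored parts)."""
    return (start - 1, length - end)


# Hypothetical numbers: a query of length 180 matched from position 11 to 170,
# and a hit of length 200 matched from position 1 to 160.
query_cov = coverage(11, 170, 180)
hit_cov = coverage(1, 160, 200)
query_sides = overhang(11, 170, 180)  # (10, 10)
hit_sides = overhang(1, 160, 200)     # (0, 40)
```

A match that covers most of both genes leaves small overhangs on each side, which corresponds to short colored segments on the alignment bars.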

In addition to the visual depiction of similarities described above, the evidence page can show the data in an Interactive Table. Click the Tabular Protein Evidence tab at the top of the page to see the table. You can add columns to the table, select genes using the check boxes, and export their sequences in FASTA Format or as a standard tab-delimited file.

Table view of similarities
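For readers unfamiliar with the two export formats, the following Python sketch shows roughly what each looks like. It does not reproduce the evidence page's actual export code; the column names, the 60-character line width, and the record layout (gene ID, functional role, sequence) are assumptions chosen for illustration.

```python
import csv
import io


def to_fasta(records, width=60):
    """Render (gene_id, function, sequence) tuples as FASTA text."""
    lines = []
    for gene_id, function, sequence in records:
        lines.append(f">{gene_id} {function}")
        # Wrap the sequence into fixed-width lines.
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


def to_tsv(records):
    """Render the same tuples as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "function", "sequence"])  # hypothetical column names
    writer.writerows(records)
    return buffer.getvalue()


# Placeholder record: the sequence is made up and far shorter than a real protein.
selected = [("fig|360108.3.peg.1041", "ATP synthase delta chain", "MKLV")]
fasta_text = to_fasta(selected)
tsv_text = to_tsv(selected)
```

FASTA is the usual choice when the sequences will be passed to an alignment tool, while the tab-delimited form is easier to open in a spreadsheet.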

Similar genes encode similar proteins, and as a result we expect close homologs to have the same functional role. The evidence page displays the functional role of each similar gene. Similar genes with different functional roles can be considered candidates for review. It is important to realize, however, that a gene's functional role is also influenced by its neighborhood: the fifth step in a process that produces a toxin is not going to take place if the other four steps are missing. The neighborhood influence is brought into play using two other important NMPDR tools: subsystems and functional coupling.
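The idea of treating similar genes with different functional roles as review candidates can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not an NMPDR tool: the function name, the hit IDs, and the case-insensitive comparison of role strings are all hypothetical, and the sketch deliberately ignores the neighborhood evidence discussed above.

```python
def review_candidates(query_role, hits):
    """Return IDs of hits whose functional role differs from the query's.

    `hits` is an iterable of (hit_id, functional_role) pairs. Roles are
    compared after trimming whitespace and ignoring case (an assumption).
    """
    expected = query_role.strip().lower()
    return [hit_id for hit_id, role in hits if role.strip().lower() != expected]


# Hypothetical hits for a query annotated as "ATP synthase delta chain".
hits = [
    ("hit-1", "ATP synthase delta chain"),
    ("hit-2", "atp synthase delta chain "),
    ("hit-3", "hypothetical protein"),
]
candidates = review_candidates("ATP synthase delta chain", hits)
```

A hit flagged this way is only a starting point for review; subsystem membership and functional coupling would still need to be checked before changing an annotation.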

Further reading: Wikipedia:Homologs

Topic revision: r2 - 01 Jan 2009 - 05:09:06 - Bruce Parrello
 
NMPDR is a collaboration among researchers from the Computation Institute of the University of Chicago, the Fellowship for Interpretation of Genomes (FIG), Argonne National Laboratory, and the National Center for Supercomputing Applications (NCSA) at the University of Illinois. NMPDR is funded by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract HHSN266200400042C. Banner images are copyright © Dennis Kunkel.