Annotation Overview Page
The Annotation page shows a variety of information about a single
feature. The page is roughly divided into three parts. The
Annotation Overview presents the basic information about the feature.
Reasons for Current Assignment explains why the feature was assigned its current
functional role. The
Compare Regions Display shows the context of the feature in its own
genome and the context of
similar features in related organisms.
Overview Summary
The feature ID and the genome it belongs to are shown in the header line of this part of the page. They link to the
Genome Viewer Genome Browser and the
Genome Viewer Organism Page, respectively.
The
current annotation depicts the functional role currently assigned to the feature. As annotations can be changed by our annotators, you have the option to view an annotation history by pressing the
show button. It will open a small table listing the date, the curator and the annotation that was made for each entry.
As the genome name for the feature is already presented in the header of this section, we additionally show the
taxonomy id for that genome in the overview. The link will lead to the Taxonomy Browser at the
NCBI showing the taxonomy information for that genome. To the right of the taxonomy id of the genome you will find the name of the
contig the feature can be found on.
The
internal links you can see in the next row lead to different pages containing other views and information about the feature.
genome browser links to the
Genome Viewer Genome Browser display. Use this to walk the chromosome, viewing the features in each region of each contig.
evidence links to the
Genome Viewer Evidence Page, which displays
similarities,
protein domains, and
cell location estimates.
sequence links to the
Genome Viewer Sequence Page, which shows the feature's DNA or protein sequence in
FASTA Format.
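If you copy a sequence from the Sequence Page into your own scripts, a FASTA record is simple to read: a header line starting with ">" followed by one or more lines of sequence. The following is a minimal sketch only. The function name `read_fasta`, the feature ID `fig|000000.0.peg.1`, and the sequence itself are invented for illustration and do not come from the NMPDR.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping each record ID to its sequence."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # The ID is the first word of the header; the rest is a description.
            current_id = line[1:].split()[0]
            records[current_id] = []
        elif current_id is not None:
            records[current_id].append(line)
    # Sequence lines are wrapped in the file, so join them back together.
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical record: both the ID and the residues are placeholders.
sample = ">fig|000000.0.peg.1 hypothetical protein\nMKTAYIAKQR\nQISFVKSHFSRQ\n"
records = read_fasta(sample)
```

Here `records` maps the placeholder ID to the full sequence with the line wrapping removed.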
If the feature is a
Protein Encoding Gene with an
EC Number, there will be a link to the appropriate enzyme page on the
KEGG web site. Two rows below that there will be a link to the NMPDR
Annotation Clearinghouse that shows
annotations for genes that generate essentially identical proteins. On the lower left there will be links to
aliases, that is, corresponding features in other databases, such as
CMR,
NCBI,
UniProt, and
RefSeq. There may also be
PubMed Links that link to papers about the feature in the
NCBI Entrez Database.
If the feature is in a
FIGfam, there will be a link to the specified FIGfam's
Genome Viewer FIG Fam Page. If it is in a
subsystem, there will be a link to the appropriate
Genome Viewer Subsystem Page and the feature's role in the subsystem will be listed.
Finally, you can use the dropdown box to run common
bioinformatics tools on the selected feature.
| Tool | Description |
| CELLO | Predict the feature's probable cell location. Different versions are provided for Gram Negative and Gram Positive organisms. |
| InterProScan | Scan the InterPro database for data about similar proteins. |
| LipoP | Predict the feature's lipoproteins. |
| PDB | Find the structures of similar proteins in the Protein Data Bank. |
| Psi-Blast | Generate a feature profile from its closest BLAST hits. |
| PSORT | Predict the feature's probable cell location. Different versions are provided for Gram Negative and Gram Positive organisms. |
| PPSearch | Search for protein motifs. |
| Radar | Look for repeated groups in the feature's protein. |
| SignalP | Predict the feature's signal peptide cleavage sites. |
| TMHMM | Compute the feature's transmembrane domains using hidden Markov models. |
| TMpred | Compute the feature's transmembrane domains using statistical analysis. |
Reasons for Current Assignment
The text in
Reasons for Current Assignment summarizes the evidence on which the feature's functional role assignment is based. In addition to the information in the overview table, it lists indirect supporting papers (those with results relating to
similar features). In the screen fragment below, these are highlighted in red.
Compare Regions Display
The compare regions display is the signature visual tool of the NMPDR
Genome Viewer. The purpose of the display is to show the
chromosome context of a
protein encoding gene along with similar regions in other genomes. The display below is for
fig|360108.3.peg.119, a flagellar protein in
Campylobacter jejuni.
It begins with a short explanatory text. Immediately below that is the
options form, which allows you to change the parameters of the regions displayed. The display itself has two forms—a visual depiction of genes in the various regions (the
visual region information), and an
interactive table of the same data (the
tabular region information).
What It Shows
The starting point of a compare regions display is usually a single gene called the
focus gene. The software normally searches for genes in other genomes that are very
similar to the focus gene, and shows the neighborhood around each such gene found. The genes in the various neighborhoods are then divided into similar groups which are numbered and colored accordingly. The focus gene is colored red and its group is given the number 1. Genes that are not similar to any others are colored gray and have no numbers.
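The numbering and coloring rule can be sketched in a few lines. This is an illustration, not the Genome Viewer's actual code: it assumes the similarity groups have already been computed, and it ranks the non-focus groups by how often they occur in the displayed neighborhoods. The function `number_sets`, the group keys, the gene IDs, and the `PALETTE` colors are all hypothetical.

```python
from collections import Counter

# Hypothetical colors for sets 2, 3, ...; the real display uses its own palette.
PALETTE = ["blue", "green", "orange", "purple"]


def number_sets(regions, focus_group):
    """Map each gene ID to a (set number, color) pair.

    regions: one list per genome, holding (gene_id, group_key) tuples.
    A group_key of None marks a gene that is not similar to any other.
    """
    counts = Counter(
        group
        for region in regions
        for _, group in region
        if group is not None and group != focus_group
    )
    # Most frequent group first; ties are broken by key so the result is stable.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    numbers = {group: rank + 2 for rank, group in enumerate(ranked)}

    labels = {}
    for region in regions:
        for gene_id, group in region:
            if group is None:
                labels[gene_id] = (None, "gray")  # unnumbered and gray
            elif group == focus_group:
                labels[gene_id] = (1, "red")  # the focus gene's group
            else:
                n = numbers[group]
                labels[gene_id] = (n, PALETTE[(n - 2) % len(PALETTE)])
    return labels


# Three made-up neighborhoods; "flg" is the focus gene's group.
regions = [
    [("a1", "flg"), ("a2", "che"), ("a3", None)],
    [("b1", "flg"), ("b2", "che"), ("b3", "mot")],
    [("c1", "flg"), ("c2", "mot"), ("c3", "che")],
]
labels = number_sets(regions, focus_group="flg")
```

In this made-up input, the group "che" occurs three times and "mot" twice, so they become sets 2 and 3, while gene "a3" stays gray with no number.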
In the visual display,
protein encoding genes are shown as arrows, and genes of other types (such as
transfer RNAs or
binding sites) are shown as rectangles. Genes that are
functionally coupled to the focus gene are indicated by gray boxes behind the arrows.
If everything is working properly and the genome of interest is fairly close to others in the database, the displayed regions will tend to look the same. Mutations and sequencing errors show up as sudden changes in the midst of a column of similarity. The screen fragment below shows two of the common cases. In similarity group 11, the third genome shows a group of small genes in the place of the one large gene present in the first two. This could indicate that extra
stop codons have appeared, possibly causing the genes to become inoperative. In the fourth genome, nothing is present in the space formerly occupied by gene 11. This could mean that its
functional role is optional, or it could mean that there is a sequencing error in that section that requires manual correction.
Gene Details
Data about the individual genes shown in the display is available in the tabular view, but you can see the details for an individual gene on the visual display by holding the mouse cursor over its arrow. If you want to see the gene's annotation page, simply click on it.

Visual region display tooltip

Tabular region display showing export button
The tabular region information for a gene is almost the same as what is seen in the tooltip. In the tabular display, however, you can filter what is shown, sort it on any column, and export the results in various formats. In addition, you can click on the link in the
FC column to see
Functional Coupling information, or click on the
cluster button to see chromosomal clusters around the selected gene.
| Column Name | Description |
| Genome | Name of the genome containing the gene. |
| ID | FIG ID of the gene. |
| Start | Gene's start location on the contig. For a Protein Encoding Gene, this is the location of the start codon. |
| Stop | Gene's stop location on the contig. For a Protein Encoding Gene, this is the location immediately before the stop codon. |
| Size (nt) | The number of nucleotides (base pairs) that make up the gene. |
| Strand | The strand that contains the gene. If a gene is on the minus strand, its start location will be numerically greater than its stop location. |
| Function | Functional Role assigned to the gene. |
| FC | Functional Coupling score, equal to the number of PCH pairs for the given gene and the focus gene. In the example above, the focus gene is fig|360108.3.peg.1041. A value of 5 indicates that there are PCH pairs corresponding to fig|360108.3.peg.1041 and fig|360108.3.peg.1047 in five diverse genomes. Clicking on the number will take you to the functional coupling page for those two genes. |
| SS | Subsystems in which the gene participates. To save space in the table, the subsystems are assigned numbers and only the numbers are shown. Hold the mouse cursor over a number to see the subsystem name. |
| Set | Label of color-matched sets of homologs. Set 1, red, contains homologs of the focus gene. Other sets of homologs are numbered in order of their frequency of occurrence in the neighborhood of the focus gene. |
| CL | Click to show chromosomal clusters including the respective gene in other organisms. |
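If you export the table and process it yourself, the Strand and Size (nt) columns can be derived from Start and Stop. The sketch below rests on two assumptions that this page does not state: that coordinates are 1-based and inclusive, and that a FIG ID has the layout `fig|<genome>.<type>.<number>`. The class `RegionRow` and the coordinates in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionRow:
    """One row of an exported table, reduced to three columns."""

    fig_id: str
    start: int
    stop: int

    @property
    def strand(self) -> str:
        # On the minus strand the start is numerically greater than the stop.
        return "-" if self.start > self.stop else "+"

    @property
    def size_nt(self) -> int:
        # Assumption: both endpoints are counted (1-based, inclusive).
        return abs(self.stop - self.start) + 1

    @property
    def genome_id(self) -> str:
        # Assumption: "fig|360108.3.peg.1041" carries the genome ID "360108.3".
        body = self.fig_id.split("|", 1)[1]
        return ".".join(body.split(".")[:2])


# The ID appears in the FC example; the coordinates are made up.
row = RegionRow("fig|360108.3.peg.1041", start=5000, stop=4101)
```

With these made-up coordinates the row is on the minus strand and spans 900 nucleotides.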
Options Form
The default Compare Regions display shows four genomes containing genes similar to the focus gene and displays a neighborhood of 16000 base pairs around the gene. You can make the display larger or smaller by changing a value in the options form and clicking the
update graphic button. Of course, the more you increase one or both of the values, the longer it will take to draw the graphic and build the table.
The advanced options form (revealed when you click
Advanced on the regular form) allows more esoteric options for tuning the display. These options are listed in the table below.
| Option | Description |
| Pinned CDS selection | Normally, we select the genomes for the Compare Regions display by looking for similarities to the focus gene. If you select PCH pin, the genomes selected will be the ones which contain a pair of close homologs involving the focus gene. |
| Genome selection | If Collapse close genomes is selected, only representative genomes will be shown. In other words, if two genomes are very close to each other (for example, because they are different strains of the same species), only one will be selected for display. |
| Sort genomes by | Normally, genomes are sorted by phylogenetic distance, which is a measure of how closely related they are to the focus gene's genome in the phylogenetic tree. You can alternately sort them by their position in the tree without regard to how they relate to the focus genome. |
| Evalue cutoff for selection of pinned CDSs | If you have selected PCH pin as the genome selection criterion, this will be the maximum E-value score. PCHs with a higher score (indicating a lower-quality match) will be excluded. |
| Evalue cutoff for coloring CDS sets | This is the maximum E-value score for putting genes in the same color set. Genes whose E-value against the gene on the focus genome is higher than this (indicating a lower-quality match) will be assigned a different color. |
| Coloring algorithm | Some shortcuts are taken for performance when computing the color sets. Choosing the Slower algorithm will provide more accurate coloring, but will take longer to compute. |
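Both E-value cutoffs work the same way: a match is kept only if its E-value is at or below the cutoff, so a smaller cutoff is stricter. The sketch below shows that rule on invented data. The function `filter_by_evalue`, the gene names, and the scores are hypothetical, and the treatment of a score exactly equal to the cutoff is an assumption.

```python
def filter_by_evalue(hits, cutoff):
    """Keep the (gene, evalue) pairs whose E-value does not exceed the cutoff."""
    return [(gene, evalue) for gene, evalue in hits if evalue <= cutoff]


# Invented similarity hits against a focus gene.
hits = [("geneA", 1e-50), ("geneB", 1e-12), ("geneC", 0.5)]

strict = filter_by_evalue(hits, cutoff=1e-20)  # only the strongest match
loose = filter_by_evalue(hits, cutoff=1e-5)    # also admits the weaker geneB
```

Raising the cutoff from 1e-20 to 1e-5 admits geneB, while geneC is excluded at either setting.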