
Annotation Overview Page

The Annotation page shows a variety of information about a single feature. The page is roughly divided into three parts. The Annotation Overview presents the basic information about the feature. Reasons for Current Assignment explains why the feature was assigned its current functional role. The Compare Regions Display shows the context of the feature in its own genome and the context of similar features in related organisms.

Overview Summary

top links on annotation page
The feature ID and the genome it belongs to are shown in the header line of this part of the page. They link to the Genome Viewer Genome Browser and the Genome Viewer Organism Page, respectively.

annotation page showing annotation history button
The current annotation depicts the functional role currently assigned to the feature. As annotations can be changed by our annotators, you have the option to view an annotation history by pressing the show button. It will open a small table listing the date, the curator and the annotation that was made for each entry.

annotation page showing NCBI link and contig info
As the genome name for the feature is already presented in the header of this section, we additionally show the taxonomy id for that genome in the overview. The link will lead to the Taxonomy Browser at the NCBI showing the taxonomy information for that genome. To the right of the taxonomy id of the genome you will find the name of the contig the feature can be found on.

annotation page showing internal links
The internal links in the next row lead to pages containing other views of and information about the feature. The genome browser link leads to the Genome Viewer Genome Browser display. Use this to walk the chromosome, viewing the features in each region of each contig. The evidence link leads to the Genome Viewer Evidence Page, which displays similarities, protein domains, and cell location estimates. The sequence link leads to the Genome Viewer Sequence Page, which shows the feature's DNA or protein sequence in FASTA format.

annotation page external links
If the feature is a Protein Encoding Gene with an EC Number, there will be a link to the appropriate enzyme page on the KEGG web site. Two rows below that there will be a link to the NMPDR Annotation Clearinghouse that shows annotations for genes that generate essentially identical proteins. On the lower left there will be links to aliases, that is, corresponding features in other databases, such as CMR, NCBI, UniProt, and RefSeq. There may also be PubMed Links that link to papers about the feature in the NCBI Entrez Database.

annotation page FIGfam and subsystem links
If the feature is in a FIGfam, there will be a link to that FIGfam's Genome Viewer FIGfam Page. If it is in a subsystem, there will be a link to the appropriate Genome Viewer Subsystem Page, and the feature's role in the subsystem will be listed.

AnnotationPageToolDropdown.png
Finally, you can use the dropdown box to run common bioinformatics tools on the selected feature.
Tool: Description
CELLO: Predict the feature's probable cell location. Different versions are provided for Gram Negative and Gram Positive organisms.
InterProScan: Scan the InterPro database for data about similar proteins.
LipoP: Predict lipoprotein signal peptides in the feature.
PDB: Find the structures of similar proteins in the Protein Data Bank.
Psi-Blast: Generate a feature profile from its closest BLAST hits.
PSORT: Predict the feature's probable cell location. Different versions are provided for Gram Negative and Gram Positive organisms.
PPSearch: Search for protein motifs.
Radar: Look for repeated groups in the feature's protein.
SignalP: Predict the feature's signal peptide cleavage sites.
TMHMM: Compute the feature's transmembrane domains using hidden Markov models.
TMpred: Compute the feature's transmembrane domains using statistical analysis.

Reasons for Current Assignment

The text in Reasons for Current Assignment summarizes the evidence on which the assignment of a functional role to your feature is based. In addition to the information in the overview table, it includes a list of indirect supporting papers (those with results relating to similar features). In the screen fragment below, these are highlighted in red.

Sample structured English assignment reason text

Compare Regions Display

The compare regions display is the signature visual tool of the NMPDR Genome Viewer. The purpose of the display is to show the chromosome context of a protein encoding gene along with similar regions in other genomes. The display below is for fig|360108.3.peg.119, a flagellar protein in Campylobacter jejuni.
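
The feature ID in this example appears to follow the pattern fig|<genome>.<type>.<number>, where the genome part (360108.3) starts with the taxonomy id and peg marks a protein encoding gene. The following is a minimal sketch for splitting such an ID into its parts, assuming that pattern holds for the IDs you work with. The names parse_fig_id and FigId are illustrative and not part of any NMPDR interface.

```python
import re
from typing import NamedTuple


class FigId(NamedTuple):
    genome: str        # e.g. "360108.3" (assumed: taxonomy id, then a version number)
    feature_type: str  # e.g. "peg" for a protein encoding gene
    number: int        # feature number within the genome


# Assumed pattern: fig|<digits>.<digits>.<type>.<digits>
_FIG_ID = re.compile(r"^fig\|(\d+\.\d+)\.([A-Za-z_]+)\.(\d+)$")


def parse_fig_id(fig_id: str) -> FigId:
    """Split a FIG feature ID into genome, feature type, and number."""
    match = _FIG_ID.match(fig_id)
    if match is None:
        raise ValueError(f"not a recognizable FIG ID: {fig_id!r}")
    genome, feature_type, number = match.groups()
    return FigId(genome, feature_type, int(number))


print(parse_fig_id("fig|360108.3.peg.119"))
```

Raising an error on anything that does not match keeps a mistyped ID from being silently treated as a valid feature.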

CompareRegionsAnnotated.png
The display begins with a short explanatory text. Immediately below that is the options form, which allows you to change the parameters of the regions displayed. The display itself has two forms: a visual depiction of genes in the various regions (the visual region information), and an interactive table of the same data (the tabular region information).

What It Shows

The starting point of a compare regions display is usually a single gene called the focus gene. The software normally searches for genes in other genomes that are very similar to the focus gene, and shows the neighborhood around each such gene found. The genes in the various neighborhoods are then divided into similar groups which are numbered and colored accordingly. The focus gene is colored red and its group is given the number 1. Genes that are not similar to any others are colored gray and have no numbers.
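
The numbering rule can be illustrated with a short sketch. It assumes the genes have already been clustered into similarity groups by some other step, and that groups other than the focus group are numbered from most to least frequent, as the Set column of the tabular display describes. The function name number_sets, the gene labels, and the group labels are all hypothetical; this is not the NMPDR implementation.

```python
from collections import Counter
from typing import Dict, Hashable, Optional


def number_sets(group_of: Dict[str, Hashable], focus_gene: str) -> Dict[str, Optional[int]]:
    """Map each gene to its set number; None means gray and unnumbered."""
    counts = Counter(group_of.values())
    focus_group = group_of[focus_gene]

    # The focus gene's group is always set 1 (red).
    set_number = {focus_group: 1}

    # Remaining groups with at least two members, most frequent first.
    others = [g for g, n in counts.most_common() if n > 1 and g != focus_group]
    for number, group in enumerate(others, start=2):
        set_number[group] = number

    # Genes in a group of one are similar to nothing else, so they get no number.
    return {gene: set_number.get(group) for gene, group in group_of.items()}


# Hypothetical genes g1..g8 with hypothetical group labels A..D.
groups = {"g1": "A", "g2": "A", "g3": "B", "g4": "B",
          "g5": "B", "g6": "C", "g7": "D", "g8": "D"}
print(number_sets(groups, focus_gene="g1"))
```

Here the focus group A becomes set 1 even though group B is larger, group B becomes set 2, group D becomes set 3, and the lone gene g6 stays unnumbered.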

CompareRegionsPEGvsRNA.png
In the visual display, protein encoding genes are shown as arrows, and genes of other types (such as transfer RNAs or binding sites) are shown as rectangles. Genes that are functionally coupled to the focus gene are indicated by gray boxes behind the arrows.

If everything is working properly and the genome of interest is fairly close to others in the database, the displayed regions will tend to look the same. Mutations and sequencing errors show up as sudden changes in the midst of a column of similarity. The screen fragment below shows two of the common cases. In similarity group 11, the third genome shows a group of small genes in the place of the one large gene present in the first two. This could indicate that extra stop codons have appeared, possibly causing the genes to become inoperative. In the fourth genome, nothing is present in the space formerly occupied by gene 11. This could mean that its functional role is optional, or it could mean that there is a sequencing error in that section that requires manual correction.


UnusualCompareRegionsDisplay.png

Gene Details

Data about the individual genes shown in the display is available in the tabular view, but you can see the details for an individual gene on the visual display by holding the mouse cursor over its arrow. If you want to see the gene's annotation page, simply click on it.

Visual region display tooltip
Tabular region display showing export button

The tabular region information for a gene is almost the same as what is seen in the tooltip. In the tabular display, however, you can filter what is shown, sort it on any column, and export the results in various formats. In addition, you can click on the link in the FC column to see Functional Coupling information, or click on the cluster button to see chromosomal clusters around the selected gene.

Column Name: Description
Genome: Name of the genome containing the gene.
ID: FIG ID of the gene.
Start: Gene's start location on the contig. For a Protein Encoding Gene, this is the location of the start codon.
Stop: Gene's stop location on the contig. For a Protein Encoding Gene, this is the location immediately before the stop codon.
Size (nt): The number of nucleotides (base pairs) that make up the gene.
Strand: The strand that contains the gene. If a gene is on the minus strand, its start location will be numerically greater than its stop location.
Function: Functional Role assigned to the gene.
FC: Functional Coupling score, equal to the number of PCH pairs for the given gene and the focus gene. In the example above, the focus gene is fig|360108.3.peg.1041. A value of 5 indicates that there are PCH pairs corresponding to fig|360108.3.peg.1041 and fig|360108.3.peg.1047 in five diverse genomes. Clicking on the number will take you to the functional coupling page for those two genes.
SS: Subsystems in which the gene participates. To save space in the table, the subsystems are assigned numbers and only the numbers are shown. Hold the mouse cursor over a number to see the subsystem name.
Set: Label of color-matched sets of homologs. Set 1, red, contains homologs of the focus gene. Other sets of homologs are numbered in order of their frequency of occurrence in the neighborhood of the focus gene.
CL: Click to show chromosomal clusters including the respective gene in other organisms.
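
The relationship between the Start, Stop, Strand, and Size (nt) columns can be sketched as follows. A gene on the minus strand has a start location numerically greater than its stop location. The sketch assumes 1-based coordinates in which both endpoints are counted; the value in the page's own Size column remains authoritative. The function name strand_and_size and the coordinates used are hypothetical.

```python
from typing import Tuple


def strand_and_size(start: int, stop: int) -> Tuple[str, int]:
    """Infer the strand and length in nucleotides from contig coordinates."""
    # On the minus strand the start lies to the right of the stop.
    strand = "+" if start <= stop else "-"
    # Assumed: both endpoints are part of the gene, hence the +1.
    size = abs(stop - start) + 1
    return strand, size


print(strand_and_size(100, 399))  # plus-strand gene
print(strand_and_size(399, 100))  # same span on the minus strand
```

Both calls describe a gene of the same length; only the order of the coordinates, and therefore the strand, differs.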

Options Form

Compare Regions Options Form
The default Compare Regions display shows four genomes containing genes similar to the focus gene and displays a neighborhood of 16000 base pairs around the gene. You can make the display larger or smaller by changing a value in the options form and clicking the update graphic button. Of course, the more you increase one or both of the values, the longer it will take to draw the graphic and build the table.
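
To make the neighborhood size concrete, here is a rough sketch of how a display window could be derived from a gene's coordinates. It assumes the 16000 base pairs are the total width of the window and that the window is centered on the midpoint of the gene; the page does not state either detail, so treat both as assumptions. The function name region_window and the contig_length parameter are hypothetical.

```python
from typing import Tuple


def region_window(start: int, stop: int, contig_length: int,
                  width: int = 16000) -> Tuple[int, int]:
    """Return (left, right) contig coordinates of a window around a gene."""
    midpoint = (start + stop) // 2
    half = width // 2
    # Clip the window so it never runs off either end of the contig.
    left = max(1, midpoint - half)
    right = min(contig_length, midpoint + half)
    return left, right


print(region_window(50000, 51200, contig_length=1_000_000))
print(region_window(3000, 4000, contig_length=1_000_000))  # clipped at the contig start
```

Doubling the width doubles the number of genes that have to be fetched and drawn for every genome shown, which is consistent with larger values making the graphic and table slower to build.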

Advanced Compare Regions Options.png
The advanced options form (revealed when you click Advanced on the regular form) allows more esoteric options for tuning the display. These options are listed in the table below.

Option: Description
Pinned CDS selection: Normally, we select the genomes for the Compare Regions display by looking for similarities to the focus gene. If you select PCH pin, the genomes selected will be the ones which contain a pair of close homologs involving the focus gene.
Genome selection: If Collapse close genomes is selected, only representative genomes will be shown. In other words, if two genomes are very close to each other (for example, because they are different strains of the same species), only one will be selected for display.
Sort genomes by: Normally, genomes are sorted by phylogenetic distance, which is a measure of how closely related they are to the focus gene's genome in the phylogenetic tree. You can alternatively sort them by their position in the tree without regard to how they relate to the focus genome.
Evalue cutoff for selection of pinned CDSs: If you have selected PCH pin, this is the maximum E-value allowed. PCHs with a higher E-value (indicating a lower-quality match) will be excluded.
Evalue cutoff for coloring CDS sets: This is the maximum E-value for putting genes in the same color set. Genes whose E-value against the gene on the focus genome is higher than this (indicating a lower-quality match) will be assigned a different color.
Coloring algorithm: Some shortcuts are taken for performance when computing the color sets. Choosing the Slower algorithm will provide more accurate coloring, but will take longer to compute.
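
Both cutoff options work the same way: a smaller E-value means a better match, so a cutoff keeps only the matches at or below it. The following sketch shows that filtering step on made-up data. The function name within_cutoff, the gene labels, and the E-values are hypothetical and do not come from NMPDR.

```python
from typing import List, Tuple


def within_cutoff(hits: List[Tuple[str, float]],
                  max_evalue: float) -> List[Tuple[str, float]]:
    """Keep only hits whose E-value does not exceed the cutoff."""
    return [(gene, evalue) for gene, evalue in hits if evalue <= max_evalue]


# Made-up hits: (gene label, E-value against the focus gene).
hits = [("gene_a", 1e-40), ("gene_b", 1e-12), ("gene_c", 0.5)]
print(within_cutoff(hits, max_evalue=1e-10))
```

Lowering the cutoff makes the selection stricter: with a cutoff of 1e-20, only gene_a would remain.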
Summary Primary display page for individual features
Topic revision: r2 - 02 Mar 2009 - 14:13:26 - Leslie McNeil
 
National Microbial Pathogen Data Resource
 

NMPDR is a collaboration among researchers from the Computation Institute of the University of Chicago, the Fellowship for Interpretation of Genomes (FIG), Argonne National Laboratory, and the National Center for Supercomputing Applications (NCSA) at the University of Illinois. NMPDR is funded by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract HHSN266200400042C. Banner images are copyright © Dennis Kunkel.