Attribution Page

This page lists the tools and data sources that were used to generate the NMPDR web site.


ClustalW2 is a general-purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences, and lines them up so that the identities, similarities and differences can be seen. ClustalW2 was developed at the European Molecular Biology Laboratory in Heidelberg, Germany.
fastDNAml is a tool developed by Gary J. Olsen, Hideo Matsuda, Ray Hagstrom, and Ross Overbeek for the purpose of constructing phylogenetic trees from DNA sequences. It uses a maximum likelihood approach and is based on version 3.3 of Felsenstein's dnaml program. Several enhancements, including algorithmic changes, significantly improve performance and reduce memory usage, making it feasible to construct even very large trees.
GLIMMER is a suite of programs for finding regions with a high likelihood of being protein-encoding genes in non-eukaryotic DNA, based on a Markov-chain probabilistic model for nucleotide usage. It was developed by Steven L. Salzberg, Arthur K. Delcher, and others, originally for The Institute for Genomic Research (TIGR), and currently for the University of Maryland's Center for Bioinformatics and Computational Biology. RAST originally made use of GLIMMER-2 to suggest protein-encoding ORF candidates, and currently uses GLIMMER-3. We thank the developers of GLIMMER for generously making it freely available and redistributable under an open-source software license. For more details, including links to past publications and past and current downloadable software, please visit the UMD GLIMMER website.
Glimpse is a very powerful indexing and query system that is used in the implementation of the search mechanism in the SEED servers. Glimpse is also the basis of WebGlimpse, which provides search for web sites, and it is the default search engine in Harvest.
tRNAscan-SE is used to find transfer RNA genes in NMPDR genomes. It is maintained by the Department of Genetics at the Washington University School of Medicine.
TWiki is a Perl-based wiki application, and is used as the NMPDR's presentation medium. TWiki enables us to create a consistent look and feel for both content and application pages using a simplified rendering language.

Data Sources

The European Bioinformatics Institute created the EMBL DNA data collection. TrEMBL is the set of protein sequences derived from the EMBL data. EBI is one of the three partners in UniProt. We continually find ourselves using UniProt data. They offer a host of tools, some of which we link to and some of which we use directly (and we plan on using many more).
The metabolic pathway collection created by Evgeni Selkov and his team in Pushchino, Russia was unique when they first made it publicly available. We believe that every major collection of metabolic and enzymatic data in existence today has built upon the work of this team. Their work has set the stage for the advances that are now occurring, and we feel that we owe them a major intellectual debt.
The SwissProt database was a pioneering effort in every sense. It set new standards for speed and accuracy of curation and we owe a great deal to that early effort. From those heroic beginnings, ExPASy has emerged as a source of tools and data that are known for their quality. We frequently use their annotations and their data on enzymes and EC nomenclature.
Greengenes is a 16S ribosomal RNA database addressing limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies.
The J. Craig Venter Institute (formerly TIGR) has been one of the most productive and successful organizations in the field of genomics. They have sequenced and annotated a long list of genomes, and their CMR database is an integration of genomic data that has provided wide services to the community. We download their annotations and we use them in our Annotation Clearinghouse.
The Joint Genome Institute of the Department of Energy has sequenced a remarkably diverse, useful set of microbial genomes called IMG. In the IMG mission statement, JGI states:
The Joint Genome Institute (JGI) is currently producing about 22% of the reported number of bacterial genome projects worldwide, based on information in the Genomes On Line Database. The key mission of the Integrated Microbial Genomes (IMG) system is to provide a data management platform that supports comprehensive analysis and annotation of all publicly available genomes in a comparative genomics context.
The goal of constructing a comprehensive analysis and annotation of all available genomes is one we hold in common. JGI has contributed a large number of curated product names to our Annotation Clearinghouse, and we look forward to reconciling our annotations with theirs. We certainly benefit from the annotations produced by this team.
The Kyoto Encyclopedia of Genes and Genomes was one of the first efforts to construct a comprehensive representation of metabolism. Here is how they describe their motivation:
The increasing amount of genome sequence data is the basis for understanding life as a molecular system and for developing medical, pharmaceutical, and other practical applications. Since 1995 we have been developing knowledge-based methods for uncovering higher-order systemic behaviors of the cell and the organism from genomic and molecular information. The reference knowledge is stored in KEGG, Kyoto Encyclopedia of Genes and Genomes, and associated bioinformatics technologies are being developed both for basic research and practical applications.
This database has been invaluable to us. We use their metabolic maps and their data on enzymes throughout our work.
The National Center for Biotechnology Information maintains GenBank, the United States repository of DNA data, PubMed, and numerous wonderful tools (BLAST, their taxonomy browser, etc.). We use their sequence data, the taxonomy they provide, and the wonderfully useful PubMed collection on a daily basis.
  • Sayers EW, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, Feolo M, Geer LY, Helmberg W, Kapustin Y, Landsman D, Lipman DJ, Madden TL, Maglott DR, Miller V, Mizrachi I, Ostell J, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Shumway M, Sirotkin K, Souvorov A, Starchenko G, Tatusova TA, Wagner L, Yaschenko E, Ye J. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2009 Jan;37(Database issue):D5-15. Epub 2008 Oct 21.
The Protein Information Resource has a rich history and did pioneering work in the creation and maintenance of protein families. They are now one of the three participants in UniProt. We have benefited by comparing our annotations against theirs, and we deeply appreciate their help in resolving the meaning of specific protein IDs. We use the data they compiled for their BioThesaurus on a regular basis. One of our key goals for the coming few years is to reconcile differences in the protein families PIR curates (see PIRSF) with our own FIGfams.
The Ribosomal Database Project provides researchers with quality-controlled bacterial and archaeal small subunit ribosomal RNA (16S) alignments and analysis tools.
SILVA provides comprehensive, quality checked and regularly updated databases of aligned small- (16S/18S, SSU) and large-subunit (23S/28S, LSU) ribosomal RNA sequences for all three domains of life. All sequences are checked for anomalies, carry a rich set of sequence associated contextual information, have multiple taxonomic classifications, and the latest validly described nomenclature.
  • Pruesse, E., C. Quast, K. Knittel, B. Fuchs, W. Ludwig, J. Peplies, and F. O. Glöckner. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 2007; Vol. 35, No. 21, p. 7188-7196.
The Transporter Classification Database is operated by the Saier Lab Bioinformatics Group. The database details a comprehensive, IUBMB-approved classification system for membrane transport proteins known as the Transporter Classification (TC) system. The TC system is analogous to the Enzyme Commission (EC) system for classification of enzymes, but additionally incorporates phylogenetic information. TCDB is utilized in the metabolic reconstruction process as a source of information on transporter stoichiometry, transporter specificity, and known transporter gene associations.
UniProt was formed as a joint effort of three institutions: the Swiss Institute of Bioinformatics, European Bioinformatics Institute, and the Protein Information Resource. The consortium is making substantial progress towards a controlled vocabulary for protein product names (which we call protein functions), for rapid improvements in overall accuracy and consistency of annotations, and for making substantial amounts of data freely available.
Topic revision: r5 - 02 Mar 2009 - 20:15:10 - TWiki Guest
Notice to NMPDR Users - The NMPDR BRC contract has ended and bacterial data from NMPDR has been transferred to PATRIC, a new consolidated BRC for all NIAID category A-C priority pathogenic bacteria. NMPDR was a collaboration among researchers from the Computation Institute of the University of Chicago, the Fellowship for Interpretation of Genomes (FIG), Argonne National Laboratory, and the National Center for Supercomputing Applications (NCSA) at the University of Illinois. NMPDR is funded by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract HHSN266200400042C. Banner images are copyright © Dennis Kunkel.