Glossary of Useful Terms

  • Accession Number: An accession number is a unique identifier assigned to a particular genome or protein sequence to uniquely identify it in a database. GenBank ...
  • Alias: An alias is an alternative gene identifier, such as an accession number from another database (sometimes called a dbxref), a locus tag, or a gene name. Inside ...
  • Amino Acid Codes: The standard, single letter amino acid codes are used when specifying a protein sequence in FastaFormat or when entering the data for a protein ...
  • Annotation: Annotation and Assigning a Gene Function. Human curators and automated programs such as RAST assign functional roles to genes, and the process is called annotation ...
  • Assertion: An assertion is a statement about the purpose of a particular feature in a genome. The most common type of assertion is the assignment of a FunctionalRole ...
  • BLAST: Basic Local Alignment Search Tool (BLAST). The Basic Local Alignment Search Tool was designed to find sequences in a DNA or protein sequence database by searching the ...
  • Bidirectional Best Hit (BBH): The best hit of a particular gene to a target genome is the gene in that genome that represents a best match. The match is ...
  • CDS: The CDS, or coding sequence, refers to the portion of a genomic DNA sequence that is translated into a protein. If complete, it runs from the StartCodon to ...
  • Clustering Based Subsystem: A clustering based subsystem is one in which there is functional coupling evidence that genes belong together, but we don't yet know ...
  • Codon: A codon is a triplet of DNA or RNA nucleotides. In a ProteinEncodingGene, a codon specifies a single amino acid (see AminoAcidCodes) or a stop codon that ...
  • Conserved Domain Database: The Conserved Domain Database (CDD) is a collection of sequence alignments and profiles representing protein domains conserved during ...
  • Contig: Contigs and Sequences. A contig is a contiguous sequence of DNA. A contig is conceptually similar to a chromosome; however, in the SproutDatabase, a single chromosome ...
  • Core Organisms: NMPDR is a Bioinformatics Resource Center. Each center specializes in certain pathogens. In NMPDR, we call these the core genomes or NMPDR genomes ...
  • Domain: Archaea, Bacteria, and Eukaryota. This page describes taxonomic domains. For protein domains, see ProteinDomain. Cellular organisms are divided into three domains: archaea ...
  • EValue: Similarity Scores: the E Value. The E value (or Expect Value) is a parameter that describes the number of hits one can expect to see by chance when performing an alignment ...
  • EC Number: Enzyme Commission numbers represent classes of FunctionalRoles assigned by the International Union of Biochemistry and Molecular Biology. In particular ...
  • FASTA Format: FASTA format is a standard format for encoding DNA or protein sequences. A FASTA file may contain one or more sequences in FASTA format. ...
  • Feature: Features and Genes. A feature is anything that can be mapped onto a strand of DNA, and is defined by its start and stop location. Feature is a catch-all, general ...
  • Fellowship for Interpretation of Genomes (FIG): FIG is a nonprofit organization devoted to providing support for those analyzing genomes. Sequencing of genomes ...
  • FIGfam: FIGfams are sets of ProteinSequences that are similar along their full length. Further, all of the proteins within a single FIGfam are believed to implement ...
  • FIG ID: Every gene in the NMPDR database has a unique FigId. It has four parts, as shown in the diagram below. [Figure: FigID.png] The prefix identifies the fact ...
  • Frame Shift: A frame shift occurs when a gene is disrupted by the addition or removal of a number of nucleotides that is not evenly divisible by three, the length of a ...
  • Functional Coupling: Functional Coupling and Chromosomal Clusters. Two genes are considered functionally coupled if they tend to be placed near each other in genomes belonging to ...
  • Functional Role: The functional role of a feature is the task it performs in the host organism. The role is stored in the NMPDR database as a descriptive phrase ...
  • GBK: GenBank format (file.gbk). GenBank format is a flat file format for sequence data related to complete bacterial genomes. By convention, GenBank format files have the ...
  • Genome: A genome is the complete complement of DNA contained in a single organism. One genome may consist of more than one replicating molecule (replicon), such as chromosomes ...
  • Gram Stain: Gram Positive and Gram Negative. Gram staining is the application of a crystal violet dye to a culture of bacteria. Bacteria that retain the color of the dye are ...
  • Messenger RNA: Messenger RNA is RNA created from a ProteinEncodingGene. Triplets of nucleotides (codons) on the messenger RNA molecule fit into the anticodons ...
  • Metabolic Reconstruction: We use the term metabolic reconstruction to mean the set of populated subsystems that contain the genome, the ProteinEncodingGenes ...
  • Motif: Protein Motifs. A protein motif is a small section of a protein that has a known function, is folded independently of the rest of the protein, or functions as a docking ...
  • Ortholog: Orthologs are genes in different species that derive from a common ancestor. In other words, they are direct evolutionary counterparts. If genes are BidirectionalBestHits ...
  • PFAM: PFAM is a database of multiple sequence alignments and hidden Markov models covering many common protein domains. The portal URL is http://pfam.sanger ...
  • PSORT: PSORT is a program that analyzes protein sorting signals and predicts subcellular localization. PSORTb takes as input an amino acid sequence and its source ...
  • Pair of Close Homologs (PCH): The paper "The use of gene clusters to infer functional coupling" defines a pair of close homologs as follows: We can also define the concept of pairs of close ...
  • Paralog: Paralogs are similar genes that have diverged from each other as a consequence of gene duplication: a particular gene was copied to a different location ...
  • Protein Domain: A protein domain (as opposed to a taxonomic domain) is a segment of the protein sequence that serves as a functional unit. Protein domains on ...
  • Protein Encoding Gene: Protein encoding gene (PEG), protein coding sequence (CDS), and open reading frame (ORF) are nearly synonymous terms. In the FIG ID, the ...
  • Ribosomal RNA: Ribosomal RNA is used to create ribosomes, the molecular machines that manage the process of converting ProteinEncodingGenes into actual proteins ...
  • Similarity: Similarities and Homologs. Two features are said to be similar if they share a nearly identical DNA sequence. In bioinformatics, similarities are the fundamental ...
  • Stop Codon: The stop codon is a codon that indicates the end of a gene. The three known stop codons are TAA, TAG, and TGA. In protein translations ...
  • Strand: A DNA molecule consists of two strands of nucleotides. Each nucleotide is one of the four molecules adenine, guanine, thymine, or cytosine. Adenine ...
  • Subsystem: Subsystems are a generalization of the concept of pathways, and they have two components. Subsystem diagram for an example of a subsystem based on a metabolic ...
  • Subsystems Approach: The subsystems approach to genome annotation is the primary route by which the functional roles of genes ...
  • Tar File: A TAR file is a flat file composed of a group of smaller sequential files. It also contains information about where the smaller files should be put when ...
  • Taxonomy Identifier: The Taxonomy ID (TaxID or taxon number) is a stable unique identifier for each taxonomic group in the NCBI Taxonomy Browser. The TaxID is seen ...
  • Transfer RNA: Transfer RNA is used to convert a 3-nucleotide DNA codon to an amino acid (see AminoAcidCodes). Each transfer RNA molecule has an anticodon that ...
  • Variant: Subsystem Variants. A variant is a particular combination of functional roles for a subsystem. A subsystem's variants are described by a variant code, which ...
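
The Codon, Stop Codon, and Frame Shift entries in the glossary can be illustrated with a short Python sketch. This is only a minimal illustration and not NMPDR code: the function names split_codons, first_stop_index, and preserves_frame are hypothetical and chosen for this example. It assumes an uppercase or lowercase DNA string read in a single frame starting at position 0, and it uses the three stop codons listed in the Stop Codon entry.

```python
# Stop codons as listed in the Stop Codon glossary entry (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(dna):
    """Split a DNA string into consecutive triplets (codons).

    Any trailing one or two nucleotides that do not form a full
    codon are dropped.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def first_stop_index(dna):
    """Return the codon index of the first in-frame stop codon, or None."""
    for index, codon in enumerate(split_codons(dna)):
        if codon in STOP_CODONS:
            return index
    return None


def preserves_frame(indel_length):
    """Return True if an insertion or deletion of this many nucleotides
    keeps the reading frame, i.e. its length is evenly divisible by three."""
    return indel_length % 3 == 0


if __name__ == "__main__":
    sequence = "ATGAAATAGCC"
    print(split_codons(sequence))
    print(first_stop_index(sequence))
    print(preserves_frame(2))
```

Under these assumptions, an insertion or deletion for which preserves_frame returns False corresponds to the frame shift described in the Frame Shift entry.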
Topic revision: r10 - 15 Feb 2009 - 16:37:02 - TWiki Guest
 
Notice to NMPDR Users - The NMPDR BRC contract has ended and bacterial data from NMPDR has been transferred to PATRIC (http://www.patricbrc.org), a new consolidated BRC for all NIAID category A-C priority pathogenic bacteria. NMPDR was a collaboration among researchers from the Computation Institute of the University of Chicago, the Fellowship for Interpretation of Genomes (FIG), Argonne National Laboratory, and the National Center for Supercomputing Applications (NCSA) at the University of Illinois. NMPDR was funded by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract HHSN266200400042C. Banner images are copyright © Dennis Kunkel.