SimBlocksDBDXml
 

Introduction

Entities

Contig

A contig is a contiguous run of nucleotides. The contig's ID consists of the genome ID followed by a name that identifies which contig this is for the parent genome. The individual components are separated by a colon.
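As a minimal sketch of the ID convention described above, the following Python splits a contig ID into its genome ID and contig name. The function names and the sample IDs are hypothetical, and the code assumes the genome ID itself never contains a colon.

```python
from typing import NamedTuple


class ContigId(NamedTuple):
    genome_id: str
    contig_name: str


def parse_contig_id(contig_id: str) -> ContigId:
    """Split 'genomeID:contigName' at the first colon.

    Assumption: the genome ID contains no colon, so the first colon
    is the separator between the two components.
    """
    genome_id, sep, contig_name = contig_id.partition(":")
    if not sep or not genome_id or not contig_name:
        raise ValueError(f"not a genome:contig identifier: {contig_id!r}")
    return ContigId(genome_id, contig_name)


def make_contig_id(genome_id: str, contig_name: str) -> str:
    """Build a contig ID from its two components."""
    return f"{genome_id}:{contig_name}"
```

For example, `parse_contig_id("12345.1:contigA")` yields the genome ID `12345.1` and the contig name `contigA`.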

Contig Table

| Field | Type | Default | Description |
| id | key-string | n/a | Unique identifier for this Contig. |

| Index | Unique | Fields | Notes |
| idx0 | yes | id | Primary index for Contig. |

Genome

A genome contains the sequence data for a particular individual organism.

Genome Table

| Field | Type | Default | Description |
| id | name-string | n/a | Unique identifier for this Genome. |
| description | string | n/a | Brief description of this genome. |
| group-name | name-string | n/a | Name of this genome's close-strain group. |

| Index | Unique | Fields | Notes |
| idx0 | | group-name, id | This index sorts the genomes by group so that close strains are placed next to each other. |
| idx1 | yes | id | Primary index for Genome. |

GroupBlock

A group block is a set of similar genome regions. A group block can represent a gene or an inter-genic region. The result is that every position in a contig belongs to at least one block, though some will belong to several.

GroupBlock Table

| Field | Type | Default | Description |
| id | name-string | n/a | Unique identifier for this GroupBlock. |
| description | string | n/a | Descriptive name of this block. This will be the gene name for gene blocks, and a generated string for inter-genic blocks. |
| len | int | n/a | Number of nucleotides in the regions belonging to this block. This may include insertion markers (-). |
| snip-count | int | n/a | The number of positions at which the nucleotides vary between regions in this group. The variance value is this number divided by the block length. |
| variance | float | n/a | The proportion of nucleotides that vary between regions in this group. For example, a value of 0 means all regions are identical at every position. A value of 0.5 means all regions are identical at exactly half of the positions. For a block length of 100, a value of 0.03 means all regions are identical at every position but 3. The variance does not indicate the degree of dissimilarity, just how much of each region needs to be examined for SNPs. |
| pattern | text | n/a | A representation of the nucleotides in the group, with question marks substituted for positions that are not identical for all group members. |

| Index | Unique | Fields | Notes |
| idx0 | yes | id | Primary index for GroupBlock. |
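The relationship between len, snip-count, variance, and pattern can be illustrated with a short Python sketch. It is not the code that populates the table. The function name is hypothetical, and the sketch assumes the regions of a block are already aligned to the same length (padded with the insertion marker - where needed) and are in a single letter case.

```python
def summarize_block(regions: list[str]) -> dict:
    """Derive len, snip-count, variance, and pattern from aligned regions.

    Assumption: every string has the same length, with '-' marking
    insertions, so that column i refers to the same block position
    in every region.
    """
    if not regions or not regions[0]:
        raise ValueError("at least one non-empty region is required")
    length = len(regions[0])
    if any(len(region) != length for region in regions):
        raise ValueError("regions must be aligned to the same length")

    pattern_chars = []
    snip_count = 0
    for column in zip(*regions):
        if len(set(column)) == 1:
            # All regions agree at this position: keep the nucleotide.
            pattern_chars.append(column[0])
        else:
            # At least two regions differ: mark the position as variable.
            pattern_chars.append("?")
            snip_count += 1

    return {
        "len": length,
        "snip-count": snip_count,
        "variance": snip_count / length,
        "pattern": "".join(pattern_chars),
    }
```

With the three made-up regions `ACGT-A`, `ACGTTA`, and `ACCT-A`, the third and fifth positions vary, so the sketch reports a snip-count of 2, a len of 6, and the pattern `AC?T?A`.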

Region

A region describes a location in a contig, and essentially bridges the gap between blocks and contigs. Each instance of this object corresponds to a single segment on a contig. The key is the region's sprout-style location string.
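This page does not spell out the format of a Sprout-style location string. The sketch below assumes the form `contigID_begin+len` for a forward region and `contigID_begin-len` for a reverse region, where `begin` is the first nucleotide in the direction of travel. Under that assumption, the leftmost position and rightmost endpoint can be derived from the key alone. The names `RegionSpan` and `parse_location` are hypothetical.

```python
import re
from typing import NamedTuple


class RegionSpan(NamedTuple):
    contig_id: str
    direction: str
    position: int  # 1-based index of the leftmost nucleotide
    endpoint: int  # 1-based index of the rightmost nucleotide
    length: int


# Assumed format: <contigID>_<begin><+ or -><len>. The greedy contig group
# lets contig IDs contain underscores themselves.
_LOCATION = re.compile(r"^(?P<contig>.+)_(?P<begin>\d+)(?P<dir>[+-])(?P<len>\d+)$")


def parse_location(location: str) -> RegionSpan:
    """Convert an (assumed) Sprout-style location string to a span."""
    match = _LOCATION.match(location)
    if match is None:
        raise ValueError(f"unrecognized location string: {location!r}")
    begin = int(match["begin"])
    length = int(match["len"])
    if length < 1:
        raise ValueError("region length must be positive")
    if match["dir"] == "+":
        # Forward: begin is the leftmost nucleotide.
        position, endpoint = begin, begin + length - 1
    else:
        # Reverse: begin is the rightmost nucleotide.
        position, endpoint = begin - length + 1, begin
    return RegionSpan(match["contig"], match["dir"], position, endpoint, length)
```

Under these assumptions, `12345.1:contigA_1000+200` and `12345.1:contigA_1199-200` describe the same 200 nucleotides (positions 1000 through 1199) read in opposite directions.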

Region Table

| Field | Type | Default | Description |
| id | name-string | n/a | Unique identifier for this Region. |
| contigID | key-string | n/a | Name of the contig containing this region. |
| direction | char | n/a | + for a forward region, - for a reverse region. |
| endpoint | int | n/a | Index (1-based) of the region's rightmost nucleotide in the contig. |
| len | int | n/a | Length of this region. This may be slightly smaller than the block length. |
| peg | name-string | n/a | PEG identifier for this block if it is a gene block, or a string generated from the nearby PEGs if it is an inter-genic block. |
| position | int | n/a | Index (1-based) of the region's leftmost nucleotide in the contig. |
| content | text | n/a | Nucleotides found at the positions of variance in this region (upper case). For a forward region, this is the exact content of each position of variance in the region. For a reverse region, it is the reverse complement. |

| Index | Unique | Fields | Notes |
| idx0 | | endpoint | This index enables the application to find regions that overlap a specific section of the contig. The index can be used to find the first region whose end point is at or follows the start of the section in question. Because every nucleotide is in at most one region, this guarantees that if any region overlaps the section, the region found by the index will. |
| idx1 | yes | id | Primary index for Region. |
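The lookup strategy described for the endpoint index can be sketched in memory with a binary search. This is an illustration rather than the application's actual query: the function name is hypothetical, the regions are plain (position, endpoint) pairs for a single contig, and the list is assumed to be sorted by endpoint with no two regions overlapping.

```python
from bisect import bisect_left


def find_overlaps(regions: list[tuple[int, int]], start: int, end: int) -> list[tuple[int, int]]:
    """Return the regions overlapping the 1-based section [start, end].

    regions holds (position, endpoint) pairs sorted by endpoint.
    Assumption: no nucleotide lies in more than one region, so the list
    is sorted by position as well.
    """
    # In the database the endpoint index plays this role; here the sorted
    # endpoints are rebuilt on each call to keep the sketch short.
    endpoints = [endpoint for _, endpoint in regions]
    # First region whose endpoint is at or after the start of the section.
    i = bisect_left(endpoints, start)
    hits = []
    # Keep scanning while regions still begin inside the section.
    while i < len(regions) and regions[i][0] <= end:
        hits.append(regions[i])
        i += 1
    return hits
```

If the first candidate found this way begins after the end of the section, no region overlaps the section at all, which is the guarantee described in the index notes.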

Relationships

ConsistsOf

  • Each Genome relates to multiple Contigs.

This relationship connects a genome to its contigs.

ConsistsOf Table

| Field | Type | Default | Description |
| from-link | name-string | n/a | id of the source Genome. |
| to-link | key-string | n/a | id of the target Contig. |

| Index | Unique | Fields | Notes |
| idxFrom | | from-link | |
| idxTo | yes | to-link | |

ContainsRegion

  • Each Contig relates to multiple Regions.

This relationship connects contigs to the regions on them.

ContainsRegion Table

| Field | Type | Default | Description |
| from-link | key-string | n/a | id of the source Contig. |
| to-link | name-string | n/a | id of the target Region. |
| len | int | n/a | Length of this region. This may be slightly smaller than the block length. |
| position | int | n/a | Index (1-based) of the region's leftmost nucleotide in the contig. |

| Index | Unique | Fields | Notes |
| idxFrom | | from-link | |
| idxTo | yes | to-link, position, len DESC | This index enables the application to find all of the regions in a contig in the order they are present in the contig. |
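The ordering implied by the position and len DESC index fields can be mimicked in memory as follows. This is only a sketch: the function name is hypothetical and the rows are assumed to be dictionaries holding the position and len fields of a single contig's ContainsRegion records.

```python
def contig_order(rows: list[dict]) -> list[dict]:
    """Sort rows by ascending position, then by descending len.

    Regions therefore appear left to right, and when two regions start
    at the same position the longer one comes first.
    """
    return sorted(rows, key=lambda row: (row["position"], -row["len"]))
```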

HasInstanceOf

  • Each Genome relates to multiple GroupBlocks.
  • Each GroupBlock relates to multiple Genomes.

This relationship connects a genome to the groups represented in its contigs. It provides a fast way to get an ordered list of groups for a genome. The group lists for genomes can then be merged to determine the common groups of a set of genomes.

HasInstanceOf Table

| Field | Type | Default | Description |
| from-link | name-string | n/a | id of the source Genome. |
| to-link | name-string | n/a | id of the target GroupBlock. |

| Index | Unique | Fields | Notes |
| idxFrom | | from-link | |
| idxTo | | to-link | |
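The merge of ordered group lists mentioned above can be sketched as a pairwise intersection of sorted lists. The function names are hypothetical, and each input list is assumed to hold the GroupBlock IDs of one genome in ascending order.

```python
def intersect_sorted(a: list[str], b: list[str]) -> list[str]:
    """Return the IDs present in both sorted lists, in sorted order."""
    i = j = 0
    out: list[str] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Shared ID: record it once and advance both lists.
            if not out or out[-1] != a[i]:
                out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def common_groups(group_lists: list[list[str]]) -> list[str]:
    """Merge the sorted group lists of several genomes into the shared set."""
    if not group_lists:
        return []
    common = list(group_lists[0])
    for other in group_lists[1:]:
        common = intersect_sorted(common, other)
    return common
```

Because each list is walked only once per merge, the cost grows with the total length of the lists rather than with their product.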

IncludesRegion

  • Each GroupBlock relates to multiple Regions.

This relationship connects a block to the regions it covers. Note that since the ID of the region is its Sprout-style location string, often it is not necessary to cross to the Region table when accessing this relationship.

IncludesRegion Table

| Field | Type | Default | Description |
| from-link | name-string | n/a | id of the source GroupBlock. |
| to-link | name-string | n/a | id of the target Region. |

| Index | Unique | Fields | Notes |
| idxFrom | | from-link | |
| idxTo | yes | to-link | |

Topic revision: r2 - 02 Oct 2008 - 17:22:02 - FigWikiBot
 
NMPDR is a collaboration among researchers from the Computation Institute of the University of Chicago, the Fellowship for Interpretation of Genomes (FIG), Argonne National Laboratory, and the National Center for Supercomputing Applications (NCSA) at the University of Illinois. NMPDR is funded by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract HHSN266200400042C. Banner images are copyright © Dennis Kunkel.