Introduction
Entities
Contig
A contig is a contiguous run of nucleotides. The contig's ID consists of the genome ID followed by a name that identifies the contig within its parent genome. The two components are separated by a colon.
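As a rough illustration of the ID layout described above, the following Python sketch splits a contig ID into its genome ID and contig name and builds one from its parts. The function names and the sample ID are hypothetical, and the sketch assumes the genome ID itself never contains a colon.

```python
def split_contig_id(contig_id: str) -> tuple[str, str]:
    """Split a contig ID of the form '<genome ID>:<contig name>'.

    Assumption: the genome ID contains no colon, so the first colon
    is the separator between the two components.
    """
    genome_id, sep, contig_name = contig_id.partition(":")
    if not sep or not genome_id or not contig_name:
        raise ValueError(f"not a valid contig ID: {contig_id!r}")
    return genome_id, contig_name


def make_contig_id(genome_id: str, contig_name: str) -> str:
    """Build a contig ID from its two components."""
    return f"{genome_id}:{contig_name}"


# Hypothetical sample ID, used only for illustration.
genome_id, contig_name = split_contig_id("100226.1:NC_003888")
```

Because the genome ID is a prefix of the contig ID, the parent genome of a contig can be recovered this way without consulting the ConsistsOf relationship.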
Contig Table
| Field | Type | Default | Description |
| id | key-string | n/a | Unique identifier for this Contig. |

| Index | Unique | Fields | Notes |
| idx0 | yes | id | Primary index for Contig. |
Genome
A genome contains the sequence data for a particular individual organism.
Genome Table
| Field | Type | Default | Description |
| id | name-string | n/a | Unique identifier for this Genome. |
| description | string | n/a | Brief description of this genome. |
| group-name | name-string | n/a | Name of this genome's close-strain group. |

| Index | Unique | Fields | Notes |
| idx0 | | group-name, id | This index sorts the genomes by group so that close strains are placed next to each other. |
| idx1 | yes | id | Primary index for Genome. |
GroupBlock
A group block is a set of similar genome regions. A group block can represent a gene or an inter-genic region. The result is that every position in a contig belongs to exactly one block, though some will belong to several.
GroupBlock Table
| Field | Type | Default | Description |
| id | name-string | n/a | Unique identifier for this GroupBlock. |
| description | string | n/a | Descriptive name of this block. This will be the gene name for gene blocks, and a generated string for inter-genic blocks. |
| len | int | n/a | Number of nucleotides in the regions belonging to this block. This may include insertion markers (-). |
| snip-count | int | n/a | The number of positions at which the nucleotides vary between regions in this group. The variance value is this number divided by the block length. |
| variance | float | n/a | The proportion of nucleotides that vary between regions in this group. For example, a value of 0 means all regions are identical at every position. A value of 0.5 means all regions are identical at exactly half of the positions. For a block length of 100, a value of 0.03 means all regions are identical at every position but 3. The variance does not indicate the degree of dissimilarity, just how much of each region needs to be examined for SNPs. |
| pattern | text | n/a | A representation of the nucleotides in the group, with question marks substituted for positions that are not identical for all group members. |

| Index | Unique | Fields | Notes |
| idx0 | yes | id | Primary index for GroupBlock. |
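The relationship between the len, snip-count, variance, and pattern fields can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code that populates the table: the function name and the sample sequences are hypothetical, the regions are assumed to be already aligned to the same length, and an insertion marker (-) is assumed to count as an ordinary character when positions are compared.

```python
def summarize_block(aligned_regions: list[str]) -> dict:
    """Derive len, snip-count, variance and pattern for one block.

    Assumptions: all sequences are aligned to equal length, and '-'
    (insertion marker) is compared like any other character.
    """
    if not aligned_regions:
        raise ValueError("a block needs at least one region")
    length = len(aligned_regions[0])
    if any(len(seq) != length for seq in aligned_regions):
        raise ValueError("regions must be aligned to the same length")

    pattern_chars = []
    snip_count = 0
    # Walk the alignment column by column.
    for column in zip(*aligned_regions):
        if len(set(column)) == 1:
            pattern_chars.append(column[0])  # identical in every region
        else:
            pattern_chars.append("?")        # position of variance
            snip_count += 1

    variance = snip_count / length if length else 0.0
    return {
        "len": length,
        "snip-count": snip_count,
        "variance": variance,
        "pattern": "".join(pattern_chars),
    }


# Hypothetical aligned regions, used only for illustration.
summary = summarize_block(["ACGT-A", "ACCT-A", "ACGTTA"])
```

In this sample, two of the six positions differ between regions, so the pattern contains two question marks and the variance is 2/6.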
Region
A region describes a location in a contig, and essentially bridges the gap between blocks and contigs. Each instance of this object corresponds to a single segment on a contig. The key is the region's Sprout-style location string.
Region Table
| Field | Type | Default | Description |
| id | name-string | n/a | Unique identifier for this Region. |
| contigID | key-string | n/a | Name of the contig containing this region. |
| direction | char | n/a | + for a forward region, - for a reverse region. |
| endpoint | int | n/a | Index (1-based) of the region's rightmost nucleotide in the contig. |
| len | int | n/a | Length of this region. This may be slightly smaller than the block length. |
| peg | name-string | n/a | PEG identifier for this block if it is a gene block, or a string generated from the nearby PEGs if it is an inter-genic block. |
| position | int | n/a | Index (1-based) of the region's leftmost nucleotide in the contig. |
| content | text | n/a | Nucleotide sequence of variance in this region (upper case). For a forward region, this is the exact content of each position of variance in the region. For a reverse region, it is the complement in reverse order. |

| Index | Unique | Fields | Notes |
| idx0 | | endpoint | This index enables the application to find regions that overlap a specific section of the contig. The index can be used to find the first region whose end point is at or follows the start of the section in question. Because every nucleotide is in at most one region, this guarantees that if any region overlaps the section, the region found by the index will. |
| idx1 | yes | id | Primary index for Region. |
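The overlap lookup that the endpoint index supports can be sketched in Python with a binary search. This is a minimal in-memory illustration, not the database query itself: the `Region` tuple, the `find_overlap` function, and the sample regions are hypothetical, and the list is assumed to hold the non-overlapping regions of a single contig, sorted by endpoint.

```python
from bisect import bisect_left
from typing import NamedTuple, Optional


class Region(NamedTuple):
    id: str
    position: int  # 1-based index of the leftmost nucleotide
    endpoint: int  # 1-based index of the rightmost nucleotide


def find_overlap(regions: list[Region], start: int, end: int) -> Optional[Region]:
    """Return the first region overlapping the section [start, end], if any.

    Assumption: `regions` is sorted by endpoint and no nucleotide is
    in more than one region.
    """
    endpoints = [r.endpoint for r in regions]
    # First region whose endpoint is at or after the section start.
    i = bisect_left(endpoints, start)
    if i < len(regions) and regions[i].position <= end:
        return regions[i]
    return None


# Hypothetical regions on one contig, used only for illustration.
regions = [Region("r1", 1, 100), Region("r2", 150, 300), Region("r3", 301, 500)]
hit = find_overlap(regions, 120, 160)
```

Only one candidate needs to be checked: any region ending before the section start cannot overlap it, and because regions do not overlap each other, a later region starts after the candidate and so cannot overlap the section unless the candidate does too.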
Relationships
ConsistsOf
- Each Genome relates to multiple Contigs.
This relationship connects a genome to its contigs.
ConsistsOf Table
| Field | Type | Default | Description |
| from-link | name-string | n/a | id of the source Genome. |
| to-link | key-string | n/a | id of the target Contig. |

| Index | Unique | Fields | Notes |
| idxFrom | | from-link | |
| idxTo | yes | to-link | |
ContainsRegion
- Each Contig relates to multiple Regions.
This relationship connects contigs to the regions on them.
ContainsRegion Table
| Field | Type | Default | Description |
| from-link | key-string | n/a | id of the source Contig. |
| to-link | name-string | n/a | id of the target Region. |
| len | int | n/a | Length of this region. This may be slightly smaller than the block length. |
| position | int | n/a | Index (1-based) of the region's leftmost nucleotide in the contig. |

| Index | Unique | Fields | Notes |
| idxFrom | | from-link | |
| idxTo | yes | to-link, position, len DESC | This index enables the application to find all of the regions in a contig in the order they are present in the contig. |
HasInstanceOf
- Each Genome relates to multiple GroupBlocks.
- Each GroupBlock relates to multiple Genomes.
This relationship connects a genome to the groups represented in its contigs. It provides a fast way to get an ordered list of groups for a genome. The group lists for genomes can then be merged to determine the common groups of a set of genomes.
HasInstanceOf Table
| Field | Type | Default | Description |
| from-link | name-string | n/a | id of the source Genome. |
| to-link | name-string | n/a | id of the target GroupBlock. |

| Index | Unique | Fields | Notes |
| idxFrom | | from-link | |
| idxTo | | to-link | |
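One way to merge the ordered group lists of several genomes into their common groups is a two-pointer intersection, sketched below in Python. This is an illustration under assumptions rather than the application's own code: the function names and the sample IDs are hypothetical, and each list is assumed to be sorted ascending by GroupBlock ID with no duplicates.

```python
from functools import reduce


def intersect_sorted(a: list[str], b: list[str]) -> list[str]:
    """Intersect two ascending, duplicate-free lists in one pass."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return common


def common_groups(group_lists: list[list[str]]) -> list[str]:
    """Return the GroupBlock IDs present in every genome's list."""
    if not group_lists:
        return []
    return reduce(intersect_sorted, group_lists)


# Hypothetical per-genome group lists, used only for illustration.
lists = [
    ["b1", "b2", "b4"],
    ["b2", "b3", "b4"],
    ["b0", "b2", "b4", "b9"],
]
shared = common_groups(lists)
```

Because each list is already ordered, every pairwise merge runs in time linear in the lengths of its two inputs, and the result stays ordered for the next merge.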
IncludesRegion
- Each GroupBlock relates to multiple Regions.
This relationship connects a block to the regions it covers. Note that since the ID of the region is its Sprout-style location string, often it is not necessary to cross to the Region table when accessing this relationship.
IncludesRegion Table
| Field | Type | Default | Description |
| from-link | name-string | n/a | id of the source GroupBlock. |
| to-link | name-string | n/a | id of the target Region. |

| Index | Unique | Fields | Notes |
| idxFrom | | from-link | |
| idxTo | yes | to-link | |
Topic revision: r2 - 02 Oct 2008 - 17:22:02 - FigWikiBot