Downloads
NMPDR has released new genome annotations for each of the focus organisms in a modified GFF3 format. GFF3, described at the Sequence Ontology website, is a flat-file format for describing genomic features. The GFF3 files accessible below contain rows of records, each with nine tab-delimited fields: seqid, source, type, start, end, score, strand, phase, and attributes. The "score" and "phase" fields are not in use, so in every row those fields contain the "." character. Each row describes a feature, which is a region of the DNA located between the start and end nucleotide coordinates. A protein-encoding gene is described by two rows that record two features at the same location: gene and CDS. FASTA-formatted nucleotide and amino acid sequences follow the tab-delimited table of feature annotations.
- Campylobacter: FTP | HTTP | FTP at BRC-Central
- Listeria: FTP | HTTP | FTP at BRC-Central
- Staphylococcus: FTP | HTTP | FTP at BRC-Central
- Streptococcus: FTP | HTTP | FTP at BRC-Central
- Vibrio: FTP | HTTP | FTP at BRC-Central
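For a quick look at one of these files without a full toolkit, the nine-column layout described above can be read with a few lines of Python. This is only a minimal sketch under stated assumptions, not a validator: the names `Feature`, `parse_attributes`, and `read_features` are hypothetical, the sample rows are invented for illustration, and the code assumes that the feature table ends at a `##FASTA` directive or at the first FASTA header line beginning with `>`.

```python
from typing import Dict, Iterable, Iterator, NamedTuple
from urllib.parse import unquote


class Feature(NamedTuple):
    """One GFF3 row; field names follow the nine columns listed above."""
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: str      # "." in these files (field not in use)
    strand: str
    phase: str      # "." in these files (field not in use)
    attributes: Dict[str, str]


def parse_attributes(text: str) -> Dict[str, str]:
    """Split the ninth column into a dict of tag=value pairs."""
    attrs: Dict[str, str] = {}
    if text == ".":
        return attrs
    for pair in text.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # GFF3 percent-encodes reserved characters inside attributes.
        attrs[unquote(key)] = unquote(value)
    return attrs


def read_features(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield features until the FASTA section starts (assumed markers)."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Assumption: sequences begin at "##FASTA" or at the first ">" header.
        if line.startswith("##FASTA") or line.startswith(">"):
            break
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 tab-delimited fields, got {len(fields)}")
        seqid, source, ftype, start, end, score, strand, phase, attrs = fields
        yield Feature(seqid, source, ftype, int(start), int(end),
                      score, strand, phase, parse_attributes(attrs))


# Invented sample: a gene and its CDS at the same location, then FASTA.
SAMPLE = (
    "##gff-version 3\n"
    "contig1\tNMPDR\tgene\t100\t400\t.\t+\t.\tID=gene1\n"
    "contig1\tNMPDR\tCDS\t100\t400\t.\t+\t.\tID=cds1;Parent=gene1\n"
    ">contig1\n"
    "ATGC\n"
)

for feature in read_features(SAMPLE.splitlines()):
    print(feature.type, feature.start, feature.end, feature.attributes)
```

To read a downloaded file instead of the sample, pass an open file handle to `read_features`. The generator stops before the sequence section, so the FASTA records would need a separate reader.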
NMPDR recommends that you explore BioPerl for data exchanges. The BioPerl project has built many converters that handle a wide variety of sequence data formats. BioPerl objects are in widespread use and are kept up to date with even the fastest-moving sequence format definitions. The BioPerl module for reading and writing GFF3 format is becoming the de facto validator for these files. There is also a GFF3 parser in the BioJava package.