FASTA format

FASTA format is a standard text format for encoding DNA or protein sequences. A FASTA file may contain one sequence or many, each in FASTA format.

A single sequence is described by a title line followed by one or more data lines. The title line begins with a right angle bracket (>) followed by a label. The label ends at the first white space character. Everything after that on the title line is considered a comment. The data lines begin right after the title line and contain the sequence characters in order. Each data line except the last should be exactly 60 letters long, although many programs accept other line lengths.
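The rules above can be sketched as a small reader and writer. This is a minimal illustration, not the code NMPDR uses: the names FastaRecord, read_fasta, and format_fasta are made up for this example, and the reader is deliberately lenient (it skips blank lines and accepts data lines of any length).

```python
import io
from typing import Iterator, List, NamedTuple, Optional, TextIO


class FastaRecord(NamedTuple):
    label: str     # text between ">" and the first white space character
    comment: str   # remainder of the title line (may be empty)
    sequence: str  # all data lines joined together, in order


def read_fasta(handle: TextIO) -> Iterator[FastaRecord]:
    """Yield one FastaRecord for each title line in the input."""
    label: Optional[str] = None
    comment = ""
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if line.startswith(">"):
            # A new title line closes the previous record, if any.
            if label is not None:
                yield FastaRecord(label, comment, "".join(chunks))
            # Split once on white space: label first, comment second.
            # (Assumption: white space directly after ">" is skipped.)
            parts = line[1:].split(None, 1)
            label = parts[0] if parts else ""
            comment = parts[1] if len(parts) > 1 else ""
            chunks = []
        elif line and label is not None:
            chunks.append(line)
    if label is not None:
        yield FastaRecord(label, comment, "".join(chunks))


def format_fasta(record: FastaRecord, width: int = 60) -> str:
    """Write a record with data lines of `width` letters (last may be shorter)."""
    title = ">" + record.label
    if record.comment:
        title += " " + record.comment
    seq = record.sequence
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([title] + lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-record input, used only to exercise the functions.
    demo = io.StringIO(">seq1 demo comment\nACGT\nAC\n>seq2\nGG\n")
    for rec in read_fasta(demo):
        print(rec.label, len(rec.sequence))
```

Yielding records one at a time keeps memory use low for files with many sequences; each sequence is only joined into a single string when its record is complete.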

PEG number 1 of Staphylococcus aureus MRSA 252
The first protein-encoding gene (PEG) of Staphylococcus aureus MRSA 252 is shown in FASTA format. The letters in this example are amino acid codes.

The box below shows a FASTA file containing multiple RNA genes from Listeria monocytogenes 10403S. In this case, the letters are DNA nucleotide codes, and the file extension would be either ".fasta" or ".fna" (for FASTA Nucleic Acid). When the sequences are amino acids, the file extension is either ".fasta" or ".faa" (for FASTA Amino Acid).

>fig|393133.3.rna.1
ggagaaatacccaagtccggctgaaggggacagactcgaaatctgttaggtggtgtatgc
cgcgccggggttcgaatccccgtttctccg
>fig|393133.3.rna.2
gggttgttagctcagttggtagagcagctgactcttaatcagcgggtcgggggttcgaaa
ccctcacaaccca
>fig|393133.3.rna.3
gcccatatagttaaacggatataacaagcccctcctaagggctagttcgtggttcgattc
cgcgtatgggcg
>fig|393133.3.rna.4
gccgctttagctcagttggtagagcacttccatggtaaggaaggggtcgtcggttcaaat
ccgacaagtggct
>fig|393133.3.rna.5
gtcctgatagctcagctggatagagcaacggccttctaagccgtcggtcgggggttcgaa
tccctctcaggacg
>fig|393133.3.rna.6
gagccgttagctcagttggtagagcatctgacttttaatcagagggtcgctggttcgaac
ccagcacggctca
>fig|393133.3.rna.7
gccggcttagctcagttggtagagcaactgatttgtaatcagtaggtcgcgagttcgact
cttgcagccggca
>fig|393133.3.rna.8
ggggaagtactcaagtggctgaagaggtgcccctgctaagggtataggtcgctcgcgcgg
cgcgagggttcaaatccctccttctccg
To see (and optionally download) an individual gene in FASTA format, use the sequence link on the annotation overview page.

[Image: FastaSequenceButton.png]

In addition, most NMPDR Search result pages allow you to download genes or locations in FASTA format, either as raw DNA or as translated protein sequences. If you download them in DNA form, you can also specify a number of flanking positions to include on either side. For example, to include the 50 base pairs before and after each gene, type 50 into the small box next to the "nt" label in the search results activity box.
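What a flanking width means can be shown with a short sketch. It is only an illustration under stated assumptions: the function name with_flanks is hypothetical, the gene is taken to lie on the forward strand of a contig held in a Python string, coordinates are 0-based and half-open, and flanks are simply cut short at the ends of the contig. How the NMPDR download handles reverse-strand genes or contig ends is not described here.

```python
from typing import Tuple


def with_flanks(contig: str, start: int, end: int,
                flank: int = 50) -> Tuple[str, str, str]:
    """Return (upstream, gene, downstream) around contig[start:end].

    Assumptions: forward strand only, 0-based half-open coordinates,
    and flanks truncated where the contig runs out.
    """
    if not (0 <= start <= end <= len(contig)):
        raise ValueError("gene coordinates fall outside the contig")
    if flank < 0:
        raise ValueError("flank width must not be negative")
    left = max(0, start - flank)             # clamp at the contig start
    right = min(len(contig), end + flank)    # clamp at the contig end
    return contig[left:start], contig[start:end], contig[end:right]


if __name__ == "__main__":
    # Hypothetical 10-letter contig with a 4-letter "gene" in the middle.
    print(with_flanks("aaaCCCCggg", 3, 7, flank=2))
```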


The listing below was obtained by requesting a FASTA download with a 50-nucleotide flanking width for the results of a search for luxR genes. The flanking nucleotides are shown in lower case; the nucleotides of the actual gene are shown in upper case. For the sake of brevity, only the first three genes are shown.
>fig|273036.3.peg.1759 [Staphylococcus aureus RF122] Two component transcriptional regulator VraR, LuxR family
ttcaggtacacgtatcgaggtgaaagcacctttaaataaggaggattcgtATGACGATTA
AAGTATTGTTTGTGGATGATCATGAAATGGTACGTATAGGAATTTCAAGTTATCTATCAA
CGCAAAGTGATATTGAAGTAGTTGGTGAAGGCGCTTCTGGTAAAGAAGCAATTGCCAAAG
CCCATGAGTTGAAGCCAGATTTAATTTTAATGGATTTACTTATGGATGACATGGATGGTG
TAGAAGCGACGACTCAGATTAAAAAAGATTTACCGCAAATTAAAGTATTAATGTTAACTA
GTTTTATTGAAGATAAAGAGGTATATCGTGCATTAGATGCAGGTGTCGATAGTTACATTT
TAAAAACAACAAGTGCAAAAGATATCGCCGATGCAGTTCGTAAAACTTCTAGAGGAGAAT
CTGTTTTTGAACCGGAAGTTTTAGTGAAAATGCGTAACCGTATGAAAAAGCGCGCAGAGT
TATATGAAATGCTTACAGAACGAGAAATGGAAATATTATTATTGATTGCGAAAGGTTACT
CAAATCAAGAAATTGCTAGTGCATCGCATATTACTATTAAAACGGTTAAGACACATGTGA
GTAACATTTTAAGTAAGTTAGAAGTGCAAGATAGAACACAAGCTGTTATCTATGCATTCC
AACATAATTTAATTCAATAGttcatatcgaattaagaaaagttacttacgccaatcacaa
tataacatca
>fig|93062.4.peg.393 [Staphylococcus aureus subsp. aureus COL] Two component transcriptional regulator VraR, LuxR family
ttcaggtacacgtatcgaggtgaaagcacctttaaataaggaggattcgtATGACGATTA
AAGTATTGTTTGTGGATGATCATGAAATGGTACGTATAGGAATTTCAAGTTATCTATCAA
CGCAAAGTGATATTGAAGTAGTTGGTGAAGGCGCTTCTGGTAAAGAAGCAATTGCCAAAG
CCCATGAGTTGAAGCCAGATTTAATTTTAATGGATTTACTTATGGATGACATGGATGGTG
TAGAAGCGACGACTCAGATTAAAAAAGATTTACCGCAAATTAAAGTATTAATGTTAACTA
GTTTTATTGAAGATAAAGAGGTATATCGTGCATTAGATGCAGGTGTCGATAGTTACATTT
TAAAAACAACAAGTGCAAAAGATATCGCCGATGCAGTTCGTAAAACTTCTAGAGGAGAAT
CTGTTTTTGAACCGGAAGTTTTAGTGAAAATGCGTAACCGTATGAAAAAGCGCGCAGAGT
TATATGAAATGCTTACAGAACGAGAAATGGAAATATTATTATTGATTGCGAAAGGTTACT
CAAATCAAGAAATTGCTAGTGCATCGCATATTACTATTAAAACGGTTAAGACACATGTGA
GTAACATTTTAAGTAAGTTAGAAGTGCAAGATAGAACACAAGCTGTTATCTATGCATTCC
AACATAATTTAATTCAATAGttcatatcgaattaagaaaagttacttacgccaatcacaa
tataacatca
>fig|359787.3.peg.2603 [Staphylococcus aureus subsp. aureus JH1] Two component transcriptional regulator VraR, LuxR family
ttcaggtacacgtatcgaggtgaaagcacctttaaataaggaggattcgtATGACGATTA
AAGTATTGTTTGTGGATGATCATGAAATGGTACGTATAGGAATTTCAAGTTATCTATCAA
CGCAAAGTGATATTGAAGTAGTTGGTGAAGGCGCTTCTGGTAAAGAAGCAATTGCCAAAG
CCCATGAGTTGAAGCCAGATTTAATTTTAATGGATTTACTTATGGAAGACATGGATGGTG
TAGAAGCGACGACTCAGATTAAAAAAGATTTACCGCAAATTAAAGTATTAATGTTAACTA
GTTTTATTGAAGATAAAGAGGTATATCGTGCATTAGATGCAGGTGTCGATAGTTACATTT
TAAAAACAACAAGTGCAAAAGATATCGCCGATGCAGTTCGTAAAACTTCTAGAGGAGAAT
CTGTTTTTGAACCGGAAGTTTTAGTGAAAATGCGTAACCGTATGAAAAAGCGCGCAGAGT
TATATGAAATGCTTACAGAACGAGAAATGGAAATATTATTATTGATTGCGAAAGGTTACT
CAAATCAAGAAATTGCTAGTGCATCGCATATTACTATTAAAACGGTTAAGACACATGTGA
GTAACATTTTAAGTAAGTTAGAAGTGCAAGATAGAACACAAGCTGTCATCTATGCATTCC
AACATAATTTAATTCAATAGttcgtatcgaattaagaaaagttacttacgccaatcacaa
tataacatca
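Because the flanks are lower case and the gene is upper case, a downloaded sequence can be split back into its three parts by letter case alone. The sketch below assumes the data lines have already been joined into one string and that the sequence contains only the letters a-z and A-Z; split_flanked is a hypothetical helper name, not part of any NMPDR tool.

```python
import re
from typing import Tuple

# Optional lower-case flank, one upper-case run (the gene), optional flank.
_FLANKED = re.compile(r"([a-z]*)([A-Z]+)([a-z]*)")


def split_flanked(sequence: str) -> Tuple[str, str, str]:
    """Split a flanked sequence into (upstream, gene, downstream) by case."""
    match = _FLANKED.fullmatch(sequence)
    if match is None:
        raise ValueError("expected lower-case flanks around one upper-case gene")
    return match.group(1), match.group(2), match.group(3)


if __name__ == "__main__":
    # Shortened, made-up sequence in the same style as the listing above.
    upstream, gene, downstream = split_flanked("ttcgtATGACGTAGttcat")
    print(len(upstream), len(gene), len(downstream))
```

Applied to the first record in the listing, the upstream part would be the 50 lower-case letters that precede ATGACGATTA on the first data line.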
Topic revision: r7 - 16 Jan 2009 - 15:05:16 - Bruce Parrello
 