Complete genomes and draft whole-genome shotgun (WGS) sequences are available from
NCBI Microbial Genomes.
Complete genome sequences
One replicon
Each complete genome listed links directly to its GenBank file and, through its RefSeq link, to its genome overview. If the genome has one chromosome and no plasmids, click the GenBank accession number to open the annotated genome. Read the comments and note the strain name, taxon ID, sequencing method, coverage, and number of contigs. Then use the small menu at the top of the page to send the record to a file. This downloads a GenBank-format file of the DNA sequence along with all the annotations and translations.
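If you would rather not copy the strain name and taxon ID by hand, they can usually be read back out of the downloaded file. The following is a minimal Python sketch, assuming the file is a standard GenBank flat file whose source feature carries /strain and /db_xref="taxon:..." qualifiers. The function name genbank_notes and the sample record (accession XX000001, strain Example1, taxon 12345) are invented for illustration and do not correspond to a real genome.

```python
import re


def genbank_notes(text):
    """Pull the accession, strain name, and taxon ID out of GenBank flat-file text.

    Returns None for any value that is not found.
    """
    def first(pattern, flags=0):
        match = re.search(pattern, text, flags)
        # Collapse whitespace in case a qualifier value wraps across lines.
        return " ".join(match.group(1).split()) if match else None

    return {
        "accession": first(r"^ACCESSION\s+(\S+)", re.MULTILINE),
        "strain": first(r'/strain="([^"]+)"'),
        "taxon_id": first(r'/db_xref="taxon:(\d+)"'),
    }


# Invented sample record, shortened to the lines that matter here.
SAMPLE = """LOCUS       XX000001    20 bp    DNA     circular BCT 01-JAN-2009
ACCESSION   XX000001
FEATURES             Location/Qualifiers
     source          1..20
                     /organism="Example organism"
                     /strain="Example1"
                     /db_xref="taxon:12345"
ORIGIN
        1 gatcctccat atacaacggt
//
"""

print(genbank_notes(SAMPLE))
```

To run it on your own download, read the file into a string first (for example with open(path).read(), where path is wherever you saved the file) and pass that string to genbank_notes. Sequencing method and coverage appear in the free-text comments and are still easiest to read by eye.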
Multiple replicons (chromosomes and/or plasmids)
For complete sequences that have more than one replicon, click the organism name in the table of genome projects. The genome project will open, listing a separate accession number for each replicon. To avoid downloading each replicon as a separate file, open the organism's taxonomy page from the lineage listed above the picture. On that page, find the table of Entrez records for this organism and click the number corresponding to genome sequences. If any of the genome records listed are NOT replicons of this organism, select only the replicons. If all of the records are replicons of this genome (almost always true), select none of them. Use the Display dropdown menu at the top of the page to display the records in GenBank format. The number of items loaded is stated at the top of the page, but the records load with their sequences hidden. Uncheck the box that hides the sequence, then click the refresh button. Finally, use the control at the top of the page to send all items shown to a file (with none selected, all are sent). This downloads all replicon sequences and annotations to one GenBank file.
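Because the records load with their sequences hidden by default, it is worth confirming that the combined file really holds one record per replicon and that each record includes its sequence. Here is a rough Python sketch of that check, assuming the usual GenBank flat-file layout in which each record starts with a LOCUS line, lists its sequence after an ORIGIN line, and ends with a line containing only //. The function name and the two-record sample are made up for the example.

```python
def summarize_genbank_records(text):
    """Return a list of (locus_name, has_sequence) pairs, one per record."""
    summary = []
    locus = None
    in_origin = False
    has_sequence = False
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            parts = line.split()
            locus = parts[1] if len(parts) > 1 else "(unnamed)"
            in_origin = False
            has_sequence = False
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # End of record: store what was seen and reset.
            if locus is not None:
                summary.append((locus, has_sequence))
            locus = None
            in_origin = False
        elif in_origin and line.strip():
            # Any non-blank line between ORIGIN and // is sequence data.
            has_sequence = True
    return summary


# Invented two-replicon sample: the second record has no sequence lines.
SAMPLE = """LOCUS       XX000001    20 bp    DNA
ORIGIN
        1 gatcctccat atacaacggt
//
LOCUS       XX000002    10 bp    DNA
ORIGIN
//
"""

for name, ok in summarize_genbank_records(SAMPLE):
    print(name, "sequence present" if ok else "SEQUENCE MISSING")
```

If any record is reported without sequence, go back to the browser, uncheck the box that hides the sequence, refresh, and send the records to a file again.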
Draft WGS sequences
All blue rows in the table of in-progress genome projects have sequence data available. Click the accession number for the genome of interest. When the genome overview opens, click the RefSeq number and scroll to the bottom of the RefSeq record. Read the comments and note the strain name, taxon ID, sequencing method, coverage, and number of contigs. Click the link labeled WGS. This opens a list of links to a separate nucleotide file for each contig. If there are more than 20 contigs, increase the number of records shown per page from 20 to a value large enough to show every contig on one page. With summaries of all contigs displayed on one page, use the Display menu to select FASTA. Then, with all FASTA-format sequences displayed in the same window and none checked, use the Send To menu to send all records to a file (with none selected, all are sent). This downloads one multi-FASTA file to your computer.
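A quick way to confirm that every contig made it into the download is to count the records in the multi-FASTA file and compare the count with the number of contigs noted from the RefSeq record. The sketch below is one way to do that in Python, assuming only that each record begins with a header line starting with ">". The function name and the sample contigs are invented for illustration.

```python
def fasta_summary(text):
    """Return (number_of_records, total_residues) for FASTA-format text."""
    records = 0
    residues = 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records += 1          # each header line starts a new contig
        else:
            residues += len(line)  # everything else is sequence
    return records, residues


# Invented sample with two short contigs.
SAMPLE = """>contig1 example
ACGTAC
GGTT
>contig2 example
TTGACA
"""

count, length = fasta_summary(SAMPLE)
print(count, "contigs,", length, "bases")
```

If the count is lower than the number of contigs in the RefSeq record, the most likely cause is that not all contigs were displayed on one page before the records were sent to a file.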
Uploading sequences to RAST
Open your downloaded file in a plain text editor (such as Notepad) to make sure it contains sequence data. Make no changes; just confirm that you got the sequence, then close the file without saving.
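For large files that are slow to open in an editor, the same sanity check can be approximated in a few lines of Python. This is only a sketch: it looks at the first non-blank line and assumes that a FASTA file starts with ">" and a GenBank file starts with "LOCUS". The function name guess_sequence_format is hypothetical.

```python
def guess_sequence_format(text):
    """Guess the format of downloaded text from its first non-blank line."""
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(">"):
            return "fasta"
        if line.startswith("LOCUS"):
            return "genbank"
        # Anything else (for example an HTML error page) is not sequence data.
        return "unknown"
    return "empty"


print(guess_sequence_format(">contig1\nACGT\n"))
print(guess_sequence_format("LOCUS       XX000001    20 bp    DNA\n"))
print(guess_sequence_format("<html><body>Error</body></html>"))
```

A result of "unknown" or "empty" suggests the download did not complete as intended and should be repeated before uploading anything.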
Log in to the RAST server and choose to upload a new job. Browse for the sequence file you just downloaded, then click the button to upload it. On the next screen, paste the taxonomy ID you copied from the RefSeq or taxonomy record into the box. Select either Bacteria or Archaea. The remainder of the form will fill itself in; you may change any of the autofilled fields. Click the green tab to look at the upload summary. The number of contigs found should agree with the number in the RefSeq record. Click the button to upload and move on.
Use the information from the RefSeq record to complete the next screen. You can find the average read length in the Upload Summary tab. Select the error correction options that suit your needs. Click the button to finish the upload.
Now, wait a day or so for the job to process. You will receive an automatic email when it is complete. The use of other people's sequence data is governed by the principles described at http://www.genome.gov/page.cfm?pageID=10506376. See the RAST tutorial to learn how to analyze your completed annotation.
--
Leslie Mc Neil - 13 Mar 2009