Frequently Asked Questions About MG-RAST

Using MG-RAST

What is MG-RAST for?

  • MG-RAST will compare a metagenome data set (i.e., a very large set of short nucleotide sequences) to protein and RNA databases in order to determine the functional and phylogenetic content of the data.

What input format does MG-RAST accept?

  • MG-RAST accepts sequence data generated by 454 pyrosequencing as well as longer reads generated by the Sanger method. DNA sequence reads should be in FASTA format, and all reads for one metagenomic sample should be in one file. Multiple metagenome files from one study or project may be uploaded together if they are compressed into one archive using zip, or tar and gzip. You may include the quality file in the compressed archive, but those data are not used in the MG-RAST analysis. All data files must be plain text, not Microsoft Word documents.
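
As an illustration of the packaging rules above, the following is a minimal Python sketch that checks each file looks like plain-text FASTA and then bundles one file per sample into a tar archive compressed with gzip. The function names (looks_like_fasta, bundle_samples) and the specific checks are assumptions made for this example; they are not part of MG-RAST, and MG-RAST may apply different validation on upload.

```python
import tarfile
from pathlib import Path


def looks_like_fasta(path):
    """Rough check (an assumption, not MG-RAST's own validation):
    plain ASCII text, a '>' header first, and sequence after every header."""
    try:
        text = Path(path).read_text(encoding="ascii")
    except UnicodeDecodeError:
        return False  # not plain text, e.g. a word-processor document
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        return False
    for i, line in enumerate(lines):
        if line.startswith(">"):
            # A header at the end of the file, or followed directly by
            # another header, has no sequence attached to it.
            if i + 1 == len(lines) or lines[i + 1].startswith(">"):
                return False
    return True


def bundle_samples(fasta_paths, archive_path):
    """Write one gzip-compressed tar archive with one FASTA file per sample."""
    for path in fasta_paths:
        if not looks_like_fasta(path):
            raise ValueError(f"{path} does not look like a plain-text FASTA file")
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in fasta_paths:
            # Store only the file name so the archive has a flat layout.
            archive.add(path, arcname=Path(path).name)
    return archive_path
```

For example, bundle_samples(["sample1.fna", "sample2.fna"], "study.tar.gz") would produce a single archive for a two-sample study; the file names here are hypothetical.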

What are projects?

  • Projects are related sets of metagenomes. For example, if you plan to study a set of samples from a time series, it might be useful to group them into a project.

What level of privacy does MG-RAST v2.0 provide?

  • We provide password control, and the submitting entity can control access to the submitted data sets on a username/password basis. Note that we currently do not provide industry-standard encryption, as this would put additional load on our server infrastructure and is not strictly required for scientific purposes.

Do you support BLASTing against my private database XYZ?

  • We currently do not explicitly support this; however, the underlying software design and system architecture support this.

How frequently do you update the underlying NR for MG-RAST?

  • With version 2.0 we have added support for storing multiple concurrent sets of sequence similarity results per metagenome, so we can add results for newer NRs. However, once you start comparing results across metagenomes (say you are interested in the phylogenetic reconstruction), different versions of the NR used for the underlying data will lead to incorrect comparisons, because older versions of the NR will miss certain organisms and/or annotations.

How long does it take to analyze my metagenome?

  • The answer depends on two factors: (a) the size of your data set and (b) the current server load. Under optimal conditions, it takes about 18 hours to run a 100 million base pair 454 metagenome through the pipeline.

I just submitted a job. Why don’t I see it in my jobs list?

  • New jobs are not displayed immediately. Wait a few minutes and they will show up.

How many metagenomes can I submit?

  • We do not restrict user submission of samples. However, the computation required is massive, and samples are processed on a first-come, first-served basis.

What parameters should I use to analyze my data?

  • The answer depends on your sample. In any case, we recommend that you adjust the e-value, minimum alignment length, and percent identity requirements for the BLAST hits underlying the results. The effects differ for each sample: depending on sample complexity, sample size, and the number and diversity of species present, your results will vary dramatically when you modify these parameters. For RNA-based phylogenetic reconstruction, we recommend requiring a minimum alignment length of 50 bp for exact matches.
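
To make the effect of these three cutoffs concrete, here is a minimal Python sketch that filters BLAST hits by e-value, alignment length, and percent identity. It assumes the hits are available in the standard 12-column tabular BLAST layout (percent identity in column 3, alignment length in column 4, e-value in column 11); MG-RAST applies these cutoffs through its web interface, so the function names, the default thresholds, and the example hits below are all illustrative assumptions.

```python
import csv
import io


def read_tabular(text):
    """Parse tab-separated BLAST hits into lists of string fields."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]


def filter_hits(rows, max_evalue=1e-5, min_length=50, min_identity=90.0):
    """Keep hits that pass all three cutoffs.

    The default thresholds are placeholders for illustration only,
    not values recommended by MG-RAST.
    """
    kept = []
    for row in rows:
        identity = float(row[2])   # column 3: percent identity
        length = int(row[3])       # column 4: alignment length
        evalue = float(row[10])    # column 11: e-value
        if evalue <= max_evalue and length >= min_length and identity >= min_identity:
            kept.append(row)
    return kept


# Three made-up hits: query, subject, identity, length, ..., e-value, bit score.
example = "\n".join([
    "read1\tsubjA\t100.0\t60\t0\t0\t1\t60\t1\t60\t1e-20\t110",
    "read2\tsubjB\t100.0\t40\t0\t0\t1\t40\t1\t40\t1e-10\t75",
    "read3\tsubjC\t92.5\t80\t6\t0\t1\t80\t1\t80\t1e-15\t90",
])
hits = read_tabular(example)

# Exact matches of at least 50 bp, as suggested above for RNA-based work.
strict = filter_hits(hits, min_length=50, min_identity=100.0)
# Looser cutoffs keep more hits from the same data.
loose = filter_hits(hits, min_length=30, min_identity=90.0)
```

With the made-up hits above, the strict setting keeps only read1, while the looser setting keeps all three, which shows how strongly the choice of cutoffs can change what is counted.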

Where can people access my "published" metagenomes?

  • The MG-RAST v2.0 Homepage has a list of publicly accessible metagenomes. Future versions will continue to support this feature; also, we will provide a metadata-based selection tool that will allow the user to focus on metagenome data sets from the environment or condition they are interested in.

What about HIPAA relevant data?

  • MG-RAST is provided under the assumption that all data are anonymized. No HIPAA-relevant data should be stored on MG-RAST.

How can I download a subset of fragments in FASTA format?

  • Many pages support downloading the data in a spreadsheet format (e.g., MS Excel). On the Metabolic Reconstruction page or the Phylogenetic Profile page, you can download the subset of fragments in the sample that match a specific group of organisms or a specific part of metabolism: click the tab for Tabular view, then click on a given subset.

How are Overview statistics calculated?

  • Total number of sequences--This is the total number of sequences submitted by the user for this metagenome. Not all of these will produce results later on. It is possible, and very probable, that some sequences cannot be matched to anything in our database.
  • Total sequence size--This is the sum of the lengths (bp) of all submitted sequences.
  • Average sequence length--This is the Total sequence size divided by the Total number of sequences.
  • Longest sequence length--This is the length (bp) of the longest sequence submitted.
  • Shortest sequence length--This is the length (bp) of the shortest sequence submitted.
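
The definitions above can be reproduced with a short Python sketch, shown here as one possible reading of them rather than the code MG-RAST itself runs. It assumes the input is FASTA text supplied as an iterable of lines; the function and key names are made up for this example.

```python
def read_lengths(lines):
    """Return the length in bp of each FASTA record, in file order."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def overview_statistics(lines):
    """Compute the five Overview statistics as defined in the list above."""
    lengths = read_lengths(lines)
    if not lengths:
        raise ValueError("no sequences found")
    total = sum(lengths)
    return {
        "total_number_of_sequences": len(lengths),
        "total_sequence_size": total,
        "average_sequence_length": total / len(lengths),
        "longest_sequence_length": max(lengths),
        "shortest_sequence_length": min(lengths),
    }


stats = overview_statistics([">a", "ACGT", "AC", ">b", "GG", ">c", "ACGTACGT"])
```

For these three made-up reads of 6, 2, and 8 bp, the total sequence size is 16 bp, the longest read is 8 bp, and the shortest is 2 bp.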

Will my MG-RAST jobs ever be deleted?

  • We presently guarantee that we will not delete your job until at least 120 days after completion.

Contact Information

Who should I contact regarding questions about or problems using MG-RAST?

  • All questions, comments, or problems regarding MG-RAST should be directed to mg-rast @ mcs.anl.gov. Likewise, all questions, comments, or problems regarding RAST should be directed to rast @ mcs.anl.gov.

-- Leslie Mc Neil - 08 Jan 2009

Topic revision: r3 - 10 Jan 2009 - 23:51:09 - Leslie Mc Neil
 