Similarity Scores: the E-Value

The E-Value (or Expect Value) is the number of hits that one can expect to see by chance when performing an alignment search in a database of a particular size.

In an alignment search (e.g. BLAST), the input sequence is the query and the sequence matched is the hit. The raw alignment score measures how well the query sequence aligns with the hit sequence; a higher score means a better alignment. If the raw score is 331 and the E-Value is 1 for a search against a particular genome, then in a search against a random genome of the same size we would expect to get, on average, 1 hit with a raw score of 331 or higher.
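To make the relationship between raw score, search size, and E-Value concrete, here is a minimal sketch based on the standard Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where S is the raw score, m the query length, and n the database length. The function name, the lengths used, and the values of lambda and K are assumptions chosen for illustration; real searches derive lambda and K from the scoring matrix and gap penalties, so this will not reproduce the numbers reported by any particular tool.

```python
import math

# Illustrative Karlin-Altschul parameters (assumed values; a real search
# derives them from its scoring matrix and gap penalties).
LAMBDA = 0.267
K = 0.041


def expect_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected number of chance hits scoring at least raw_score.

    Implements E = K * m * n * exp(-lambda * S), with m = query_len and
    n = db_len (both in residues or bases).
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    # Hypothetical search: a 300-residue query against 5 million residues.
    for score in (40, 80, 120):
        e = expect_value(score, query_len=300, db_len=5_000_000)
        print(f"raw score {score}: E = {e:.3g}")
```

Under this formula the E-Value falls exponentially as the raw score rises and grows in direct proportion to the database size, which is why the same raw score is less significant in a larger database.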

The lower the E-Value, the higher the quality of the match and the more likely it is that the matching sequences are truly related. For a fairly lengthy query sequence, the E-Values of genuine matches tend to be extremely close to zero, and a match is considered a good one if the E-Value is less than 1e-10. A shorter query can only reach a modest alignment score, which is easier to reach by chance, so its E-Values are higher.
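As a small illustration of applying the 1e-10 rule of thumb, the sketch below keeps only the hits whose E-Value falls below that cutoff and lists the best ones first. The hit names and E-Values are made up for the example, and the function name and the (hit id, E-Value) tuple layout are assumptions, not the format of any particular search tool.

```python
E_VALUE_CUTOFF = 1e-10  # "good match" threshold quoted in the text


def good_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Return the hits with an E-value below cutoff, lowest E-value first.

    hits is a list of (hit_id, e_value) tuples.
    """
    kept = [hit for hit in hits if hit[1] < cutoff]
    return sorted(kept, key=lambda hit: hit[1])


# Hypothetical (hit id, E-value) pairs, for illustration only.
example_hits = [
    ("hit_A", 3e-52),
    ("hit_B", 0.004),
    ("hit_C", 1e-10),
    ("hit_D", 7e-18),
    ("hit_E", 2.1),
]

if __name__ == "__main__":
    for hit_id, e_value in good_hits(example_hits):
        print(f"{hit_id}\t{e_value:.0e}")
```

With these example values only hit_A and hit_D pass; hit_C sits exactly on the cutoff and is excluded because the comparison is strict. For short queries a looser cutoff may be more appropriate, since their E-Values are higher in general.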

When you perform a BLAST search using the NMPDR Sequence Search, the E-Value appears in the first column of the Search Results, as shown below.

[Figure: BLAST results, showing the E-Value column]
Topic revision: r3 - 31 Mar 2009 - 21:11:50 - Bruce Parrello
 
Notice to NMPDR Users - The NMPDR BRC contract has ended and bacterial data from NMPDR has been transferred to PATRIC (http://www.patricbrc.org), a new consolidated BRC for all NIAID category A-C priority pathogenic bacteria. NMPDR was a collaboration among researchers from the Computation Institute of the University of Chicago, the Fellowship for Interpretation of Genomes (FIG), Argonne National Laboratory, and the National Center for Supercomputing Applications (NCSA) at the University of Illinois. NMPDR is funded by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract HHSN266200400042C. Banner images are copyright © Dennis Kunkel.