An Update on the Annotation Clearinghouse: May, 2008

The Annotation Clearinghouse was introduced to support two services. The basic service lets a user request all of the function annotations for a specified protein sequence (designated using any of the common IDs). Some of the returned assertions of function may be designated as expert assertions: assertions made by individuals who felt they could state an accurate function for the protein with reasonable reliability. Hence, we view these expert assertions as particularly valuable.

The second service provided by the Clearinghouse is the ability for a user to register and then to upload a collection of expert assertions (i.e., to contribute expert assertions to the collection).
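As a rough illustration of these two services, the following toy in-memory model may help. It is only a sketch under stated assumptions: the class, method, and field names are hypothetical, the IDs are placeholders, and nothing here reflects the actual Clearinghouse implementation or its interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    protein_id: str       # the ID the contributor used
    function: str         # asserted function, as free text
    source: str           # who made the assertion
    expert: bool = False  # True for contributed expert assertions


class Clearinghouse:
    """Toy in-memory model of the two services (hypothetical design)."""

    def __init__(self):
        self._key_for_id = {}     # any known ID -> one key per sequence
        self._assertions = {}     # sequence key -> list of Assertion
        self._registered = set()  # users allowed to upload

    def _key(self, protein_id):
        # IDs with no recorded alias simply act as their own key.
        return self._key_for_id.get(protein_id, protein_id)

    def link_ids(self, ids):
        """Record that all of these IDs designate the same sequence."""
        ids = list(ids)
        key = self._key(ids[0])
        for protein_id in ids:
            self._key_for_id[protein_id] = key

    def add(self, protein_id, function, source, expert=False):
        record = Assertion(protein_id, function, source, expert)
        self._assertions.setdefault(self._key(protein_id), []).append(record)

    def register(self, user):
        self._registered.add(user)

    def upload(self, user, pairs):
        """Second service: a registered user contributes expert assertions."""
        if user not in self._registered:
            raise PermissionError(f"{user!r} must register before uploading")
        for protein_id, function in pairs:
            self.add(protein_id, function, source=user, expert=True)

    def annotations_for(self, protein_id):
        """Basic service: every assertion for the sequence behind this ID."""
        return list(self._assertions.get(self._key(protein_id), []))


# Made-up example data: two IDs for one sequence, one ordinary annotation,
# and one expert assertion uploaded under the other ID.
ch = Clearinghouse()
ch.link_ids(["id-A", "id-B"])
ch.add("id-A", "Alanine racemase (EC 5.1.1.1)", source="some-database")
ch.register("expert-1")
ch.upload("expert-1", [("id-B", "Alanine racemase (EC 5.1.1.1)")])
```

Querying with either ID returns both assertions, and the `expert` flag lets a caller pick out the expert assertions among them.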

These two simple services have, in our opinion, the potential to dramatically improve existing collections of annotations. To understand why we believe this, let us first describe how the NMPDR staff processes contributed expert assertions:

  1. First, we take the set of uploaded assertions and find all cases in which an assertion reflects an opinion about a protein in one of our subsystems. In each such case, we take the function asserted by the expert (let us call it expert-function) and compare it against ours (let us call this NMPDR-function). If the functions are not precisely (character-for-character) identical, then we ask, "Are the two functions just different ways to say the same thing?" If so, we add the pair {NMPDR-function, expert-function} to a growing list of synonyms that we maintain. If not, we consider the pair as reflecting a significant difference of opinion.
  2. We maintain a web site in which our annotators and any contributing expert can easily see the significant differences of opinion. Further, people can attach comments to such differences and see all of the comments others have attached.
  3. In most cases, either our annotator or the expert can determine who has made an error and correct it. If it is our error, we re-annotate the protein within the NMPDR, and if it is the expert's error, a new corrected assertion is uploaded. When the conflict has been resolved, all record of the disagreement is purged.

This simple process produces three distinct products: the growing list of synonyms, the list of expert assertions, and the disagreements that have not yet been reconciled.
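The comparison step can be sketched in a few lines of Python. This is a minimal sketch, not the NMPDR staff's actual tooling: the function name `reconcile`, its arguments, and the data layout are all assumptions. In particular, the question of whether two functions are "different ways to say the same thing" is answered by a person, so the sketch takes that judgment as a caller-supplied `same_meaning` callback.

```python
def reconcile(expert_assertions, nmpdr_functions, same_meaning, synonyms=None):
    """Sort uploaded assertions into synonyms and differences of opinion.

    expert_assertions: iterable of (protein_id, expert_function) pairs
    nmpdr_functions:   dict of protein_id -> NMPDR-function, covering
                       only proteins that occur in a subsystem
    same_meaning:      callback standing in for the annotator's judgment
    synonyms:          existing set of (NMPDR-function, expert-function)
    """
    synonyms = set() if synonyms is None else set(synonyms)
    conflicts = []
    for protein_id, expert_fn in expert_assertions:
        nmpdr_fn = nmpdr_functions.get(protein_id)
        if nmpdr_fn is None:
            continue  # not in a subsystem, so not compared in this step
        if expert_fn == nmpdr_fn:
            continue  # character-for-character identical: nothing to do
        pair = (nmpdr_fn, expert_fn)
        if pair in synonyms or same_meaning(nmpdr_fn, expert_fn):
            synonyms.add(pair)  # same function, different wording
        else:
            conflicts.append((protein_id, nmpdr_fn, expert_fn))
    return synonyms, conflicts


# Made-up example data. The case-insensitive comparison is only a
# placeholder for the human decision described in step 1.
nmpdr = {
    "p1": "Alanine racemase (EC 5.1.1.1)",
    "p2": "Alanine racemase (EC 5.1.1.1)",
    "p3": "Alanine racemase (EC 5.1.1.1)",
}
uploads = [
    ("p1", "Alanine racemase (EC 5.1.1.1)"),
    ("p2", "alanine racemase (ec 5.1.1.1)"),
    ("p3", "Hypothetical protein"),
    ("p4", "Some function for a protein outside the subsystems"),
]
synonyms, conflicts = reconcile(
    uploads, nmpdr, same_meaning=lambda a, b: a.lower() == b.lower()
)
```

Here `synonyms` and `conflicts` correspond to the first and third products; the second product, the list of expert assertions, is simply the uploaded input. Once a conflict is resolved it would be dropped from `conflicts`, matching the purge described in step 3.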

What has made this effort a success is the number of expert assertions that have now been collected (currently more than 50,000, covering over 1,000 functional roles). A number of individuals and institutions have contributed expert assertions, which is leading to a rapidly growing effort to reconcile differences, and a rapidly expanding set of proteins with functions that have been certified (by at least one individual) as reliable.

Within the NMPDR, we have now made it possible to rapidly find out whether an expert assertion exists for any given protein. Soon, we will have all of the subsystems connected to relevant expert assertions, as well as all of the impacted FigFam protein families. When lists of similar proteins are produced, it will become possible to locate the closest protein with a function supported by an expert assertion. The entire collection of

  • expert assertions,
  • synonyms produced by our annotators during the process of reconciliation,
  • assertions relating to any FigFam, and
  • assertions associated with columns in subsystems

is available in the Clearinghouse directory of our FTP site.
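The idea mentioned above, of locating within a list of similar proteins the closest one whose function is supported by an expert assertion, might look roughly like the sketch below. It assumes that similarity is given as a numeric score where higher means more similar, and that expert assertions are available as a simple lookup table; the function name and data layout are hypothetical and do not describe how the NMPDR implements this.

```python
def closest_expert_hit(similar, expert_functions):
    """Return (protein_id, score, function) for the most similar protein
    that has an expert assertion, or None if no such protein is listed.

    similar:          iterable of (protein_id, score), higher = closer
    expert_functions: dict of protein_id -> expert-asserted function
    """
    best = None
    for protein_id, score in similar:
        if protein_id not in expert_functions:
            continue  # no expert assertion for this protein
        if best is None or score > best[1]:
            best = (protein_id, score)
    if best is None:
        return None
    return best[0], best[1], expert_functions[best[0]]
```

Scanning the whole list rather than stopping at the first match means the input does not need to be sorted by similarity.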

It is our plan that any expert in a specific set of proteins will be able to contribute assertions, and that these assertions will then be used to rapidly propagate corrections to the main annotation groups participating in the overall reconciliation effort.

Original author: Ross Overbeek
Original date: 2008-05-01

 
NMPDR is a collaboration among researchers from the Computation Institute of the University of Chicago, the Fellowship for Interpretation of Genomes (FIG), Argonne National Laboratory, and the National Center for Supercomputing Applications (NCSA) at the University of Illinois. NMPDR is funded by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract HHSN266200400042C. Banner images are copyright © Dennis Kunkel.