Evidence Codes
General comments: This document is intended to help standardize the way the evidence codes are used for GO annotation of genes/gene products. Every GO annotation must indicate the type of evidence that supports it; these evidence codes correspond to broad categories of experimental or other support. The codes are listed along with examples (not exhaustive lists) of the kinds of experiments that would fall into each category.
Note that these evidence codes are intended for use in conjunction with GO terms, and should not be considered in isolation from the terms. In other words, an evidence code indicates how annotation to a particular term is supported, and is not necessarily a classification of an experiment.
For every evidence category, there is room for curators to exercise judgement about the quality of the evidence, and how well it supports annotation to a node within each ontology. The distinction between "TAS" and "NAS" is particularly sensitive to interpretation (see below).
Note: The "database identifier" column in the gene_association file should be filled in whenever possible, to help avoid circular annotations between GO and other databases.
IMP also covers phenotypic similarity: a phenotype that is informative because it is similar to that of another independent phenotype (which may have been described earlier or documented more fully) is IMP (not IGI).
We have also decided to use this category for situations where a mutation in one gene (gene A) provides information about the function, process, or component of another gene (gene B; i.e. annotate gene B using IGI).
We recommend making an entry in the "with" column when using this evidence code (i.e. include an identifier for the "other" gene involved in the interaction). If more than one independent genetic interaction supports the association, use separate lines for each. In cases where the gene of interest interacts simultaneously with more than one other gene, put both/all of the interacting genes on the same line (separate identifiers by pipes in the "with" column). To help clarify:
GOterm IGI FB:gene1|FB:gene2
means that the GO term is supported by evidence from the annotated gene's interaction with *both* of these genes; i.e. neither of these statements is true:
GOterm IGI FB:gene1
GOterm IGI FB:gene2
See the GO Annotation Guide for more information.
We recommend making an entry in the "with" column when using this evidence code (i.e. include an identifier for the "other" protein involved in the interaction). If more than one independent physical interaction supports the association, use separate lines for each. In cases where the gene product of interest interacts simultaneously with more than one other protein, put both/all of the interacting things on the same line (separate identifiers by pipes in the "with" column). To help clarify:
GOterm IPI DB:id1|DB:id2
means that the GO term is supported by evidence from the annotated gene product's interaction with *both* of these proteins; i.e. neither of these statements is true:
GOterm IPI DB:id1
GOterm IPI DB:id2
See the GO Annotation Guide for more information.
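The difference between one piped line and two separate lines can be sketched in Python. This is only an illustration: it assumes a simplified three-field line (GO term, evidence code, "with" entry) rather than the full gene_association file layout, and the names Annotation and parse_line are hypothetical.

```python
from typing import FrozenSet, NamedTuple


class Annotation(NamedTuple):
    go_term: str
    evidence: str
    # Identifiers that *jointly* support this one annotation line.
    with_ids: FrozenSet[str]


def parse_line(line: str) -> Annotation:
    """Parse a simplified 'GOterm CODE with' line (assumed layout, whitespace-separated)."""
    go_term, evidence, with_field = line.split()
    return Annotation(go_term, evidence, frozenset(with_field.split("|")))


# One line with pipes: a single interaction involving both proteins at once.
joint = parse_line("GOterm IPI DB:id1|DB:id2")

# Two lines: two independent interactions, each supporting the term on its own.
separate = [parse_line("GOterm IPI DB:id1"), parse_line("GOterm IPI DB:id2")]

# The joint line is not equivalent to either of the separate lines.
print(joint in separate)
```

Keeping the "with" identifiers together as one set per line preserves the distinction made above: a piped entry records a simultaneous interaction, while separate lines record independent ones.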
We recommend making an entry in the "with" column when using this evidence code (i.e. include an identifier for the similar sequence). The 'with' column can have more than one identifier, separated by pipes.
The evidence fields can be thought of in a loose hierarchy:
TAS/IDA
IMP/IGI/IPI
ISS/IEP
NAS
IEA
This hierarchy should not be interpreted as a rigid ranking of evidence types; users can and should form their own conclusions as to the reliability of each type of evidence and each individual annotation. The hierarchy is loose partly because the strength of the evidence depends on the resolution at which you are annotating, and partly because there is a range of reliability within each evidence category (e.g. 90% versus 20% identity for "sequence similarity", or a two-hybrid result versus co-purification over several columns for "physical interaction").
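As a rough sketch, the tiers listed above can be held in an ordered structure and used to group the evidence codes attached to a gene product. The names TIERS, tier_of, and group_by_tier are hypothetical, and the tier index is only a coarse grouping, not a reliability score.

```python
# Tiers of the loose hierarchy, in the order they are listed in the text.
# Codes within one tier are not ranked against each other.
TIERS = [
    ("TAS", "IDA"),
    ("IMP", "IGI", "IPI"),
    ("ISS", "IEP"),
    ("NAS",),
    ("IEA",),
]


def tier_of(code: str) -> int:
    """Return the 0-based tier of an evidence code (0 = first tier listed)."""
    for index, codes in enumerate(TIERS):
        if code in codes:
            return index
    raise ValueError(f"unknown evidence code: {code}")


def group_by_tier(codes):
    """Group evidence codes by tier, keeping input order within each tier."""
    grouped = {}
    for code in codes:
        grouped.setdefault(tier_of(code), []).append(code)
    return dict(sorted(grouped.items()))


print(group_by_tier(["IEA", "IMP", "TAS", "IPI"]))
```

Because reliability varies widely within each category, a grouping like this is best used for display or filtering, with the final judgement left to the user.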
There may be different kinds of evidence available to support annotating a gene product to different levels within each ontology. For example, there might be a direct assay showing that a protein localizes to the mitochondrion, and a physical interaction suggesting localization to the mitochondrial matrix (more specific node, but less reliable evidence). Curators can annotate genes to both a parent and a child, and cite the same or different kinds of evidence for the annotations as appropriate.
Added 2000-11-08: Heather has seen cases where a paper presents several lines of evidence supporting a conclusion, of which each line of evidence alone is sufficient to annotate to a higher-level (more generic) node, but combining the lines of evidence gives the author (or curator) enough data to support annotating to a lower-level (more specific) node. We've decided to annotate each line of evidence singly, with the appropriate evidence code, for the higher node (e.g. have a line for IMP, another line for IPI, for one GO ID). The annotation to the lower node can then be included with 'TAS' as the evidence; cite the paper if the author draws the conclusion. If the curator draws the conclusion, keep some record of what went into the decision.
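The pattern described above can be sketched as follows. The function name, the row layout (term, evidence code, reference), and the identifiers "GO:higher", "GO:lower", and "REF:paper" are placeholders invented for illustration, not real GO IDs or a real file format.

```python
def combined_evidence_annotations(higher_term, lower_term, evidence_codes, reference):
    """Build (term, evidence code, reference) rows for combined lines of evidence."""
    # One row per line of evidence, each annotated singly to the higher-level node.
    rows = [(higher_term, code, reference) for code in evidence_codes]
    # The conclusion drawn from the combined evidence goes on the lower-level
    # node with TAS; the paper is cited when the author draws that conclusion.
    rows.append((lower_term, "TAS", reference))
    return rows


# Placeholder identifiers, for illustration only.
rows = combined_evidence_annotations("GO:higher", "GO:lower", ["IMP", "IPI"], "REF:paper")
for row in rows:
    print(*row)
```

When the curator rather than the author draws the conclusion, the reference on the final row would instead point to whatever record the curator keeps of the decision.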
Notes on ASS versus NA (from Heather, 2000-02-26)
Note added 2000-08-02: ASS is now TAS, and NA is now NAS (see above).
I previously used ASS for evidence from abstracts, and to me it indicated less reliable evidence. For review articles I tended to look up the references they cited and take the evidence from the original papers. I have not used NA. Midori and other SGD curators have used ASS for both traceable and non-traceable evidence, i.e. whether they had just the author's word for it or a respectable review article in which all the necessary references are cited; Midori doesn't go to the original papers but adds the terms qualified by ASS. It seems that Midori's way of treating review articles makes annotating much faster and easier. If we are to use it in this way, then I think it is necessary to have a different evidence field for cases where there is no way of finding the real evidence; call this NA instead. This will mean that ASS evidence no longer spans both ends of the reliability spectrum.
Notes added 2000-03-01 (MAH): SGD curators annotated several genes before the evidence fields were used. For a while, these were given "NA" as the evidence codes. These have since been changed to ASS where reviews were used, or "NR" for other papers. Also, future SGD annotations will use ASS and NA as described in the list above, so that ASS is used for reviews, dictionaries, texts, etc.
Note added 2000-08-02: SGD gene associations have been updated to use TAS and NAS.
Notes on IEP (added 2000-03-08; updated 2000-03-09 MAH): Addition of the IEP category generated a lot of discussion via email. One theme that emerged is that curators and users will have to be careful when interpreting expression results, especially if there's no other kind of evidence linking a gene product with a process. For instance, we certainly don't want to look at a cluster of genes, and, based on previous knowledge of one of them being involved in protein folding, annotate the rest of the genes in that cluster to the same process. This is certainly a dangerous thing to do. But having the IEP code allows curators to include expression data when they deem it appropriate, and allows researchers to make their own decisions/judgements about the reliability of the annotation.
Another important theme, indeed one of the reasons we opted to add the category, is that systematic analysis will prove to be very informative. It was especially well stated by Richard Baldarelli of MGI, so I've included his message here:
It seems that expression data will be very useful for process and cellular component mapping, but caution should be used for function mapping (as Allan and Kara point out [in email messages]). While conventional expression assays will provide useful evidence in several cases, the real benefit will come from expression profiling. The rationale behind expression profiling from chip data is that genes that are coordinately regulated over a range of environments are likely to be involved in the same biological processes, and thus may have interrelated functions. As expression technology evolves to consider other aspects of gene expression (e.g. transcription and post-transcription chips, Mass-spec on 2D protein data), profiling will become an even more valuable tool for process implication. With the genome sequences here or on the way, the most significant information we may have for many genes will be expression profiling data (at least for a while anyway). Accuracy levels for process implication aside, this type of evidence is necessarily indirect. Having an evidence type "expression" takes this into account and remains fairly non-specific.
For more details, check the GO mail archive for messages with the subject "evidence code comment."
Copyright © 1999-2000 Gene Ontology Consortium. Permission to use the information contained in this database was given by the researchers/institutes who contributed or published the information. Users of the database are solely responsible for compliance with any copyright restrictions, including those applying to the author abstracts. Documents from this server are provided "AS-IS" without any warranty, expressed or implied.