Glossary of terms and icons found within COMBREX

Main Page

Search Results Page

Cluster Detail Page

Gene Detail Page


Main Page

Current Status

Status Bar

This figure from the COMBREX home page displays the current experimental validation and prediction status of the genes in two model organisms of historical and biomedical importance, E. coli K12 MG1655 and H. pylori 26695. One goal of COMBREX is to encourage the biochemical characterization of every gene in the genomes of these two "focus" organisms. Colored sections of the bars correspond to the validation and prediction statuses that COMBREX assigns to genes (see below), and the number in each section is the number of genes in that genome with the given status. The bottom bar displays the current experimental validation and prediction statuses of all genes in COMBREX.

Gold Gene (gold circle)
A gold gene corresponds to a biochemically characterized protein for which the DNA sequence coding for that exact protein has been determined. The publication(s) reporting both the sequence and the biochemistry are documented. More information on the Gold Standard Project can be found in the Gold Standard Genes documentation.


Green Gene (green circle)
A green gene is either (1) a gene that has been experimentally validated but whose manual curation is incomplete or that lacks information required for gold status, or (2) a gene that has greater than 98% full-length sequence homology to a gold gene.


Blue Gene (blue square)
A blue gene has a specific prediction of molecular function but lacks biochemical characterization. Some genes labeled blue may in fact have been biochemically characterized, but that experimental validation has not yet been recorded in COMBREX. If you come across a blue gene that you believe has been biochemically characterized, please submit an annotation so that COMBREX can correctly reflect that gene’s experimental validation status. See the How to Submit an Annotation guide for details.


Black Gene (black diamond)
A black gene either does not have a prediction of molecular function or it has a non-specific prediction of molecular function.
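The four gene statuses above amount to a simple decision rule. The following Python sketch is illustrative only: the `Gene` record and its field names are hypothetical stand-ins rather than COMBREX's data model, and it assumes the statuses are checked in the order gold, green, blue, black.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    # Hypothetical record; field names are illustrative, not COMBREX's schema.
    name: str
    validated: bool = False              # experimental validation recorded
    gold_requirements_met: bool = False  # exact sequence and publications documented
    best_homology_to_gold: float = 0.0   # % full-length sequence homology to nearest gold gene
    specific_prediction: bool = False    # has a specific prediction of molecular function


def gene_status(gene: Gene) -> str:
    """Return 'gold', 'green', 'blue', or 'black' per the glossary definitions."""
    if gene.validated and gene.gold_requirements_met:
        return "gold"
    # Green: validated but not yet gold, or >98% full-length homology to a gold gene.
    if gene.validated or gene.best_homology_to_gold > 98.0:
        return "green"
    if gene.specific_prediction:
        return "blue"
    # No prediction, or only a non-specific one.
    return "black"


print(gene_status(Gene("geneA", best_homology_to_gold=99.2)))  # green
```

Note that the 98% threshold is treated as strict ("greater than"), so a gene at exactly 98.0% falls through to blue or black.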



Search Results Page

Green Circle (Green cluster, found to the left of the names of some clusters)
This green circle indicates that the cluster contains one or more experimentally validated (green or gold status) genes.


Blue Square (Blue cluster, found to the left of the names of some clusters)
The blue square indicates that the cluster contains no experimentally validated (green or gold status) genes known to COMBREX, but does contain genes with specific predictions of molecular function.


Black Diamond (Black cluster, found to the left of the names of some clusters)
The black diamond indicates that the cluster contains black genes, that is, genes that either lack a prediction of molecular function or have only a non-specific prediction of function.
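Read together, the three cluster icons can be derived from the statuses of a cluster's member genes. A minimal sketch, assuming the icons are mutually exclusive and that statuses are the lowercase strings `"gold"`, `"green"`, `"blue"`, and `"black"` (the helper name `cluster_icon` is hypothetical):

```python
def cluster_icon(statuses):
    """Hypothetical helper: choose the search-results icon for a cluster."""
    present = set(statuses)
    if present & {"gold", "green"}:
        return "green circle"   # at least one experimentally validated gene
    if "blue" in present:
        return "blue square"    # no validated genes, but specific predictions
    return "black diamond"      # only black genes


print(cluster_icon(["blue", "blue", "green"]))  # green circle
```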


Cluster Colors (found to the right of cluster names)
This bar of colored boxes and numbers indicates the number and level of experimental validation of the proteins that make up the cluster. For example, a bar showing 735 in the blue box and 2 in the green box indicates that the associated cluster contains 735 blue genes, 2 green genes, and no gold or black genes.
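The color bar is a per-status tally of a cluster's genes. The sketch below reproduces the 735 blue / 2 green example; the box order gold, green, blue, black and the helper name `color_bar` are assumptions for illustration.

```python
from collections import Counter

# Assumed left-to-right order of the colored boxes.
STATUS_ORDER = ("gold", "green", "blue", "black")


def color_bar(statuses):
    """Count member genes per status, including zero counts."""
    counts = Counter(statuses)
    return {status: counts[status] for status in STATUS_ORDER}


bar = color_bar(["blue"] * 735 + ["green"] * 2)
print(bar)  # {'gold': 0, 'green': 2, 'blue': 735, 'black': 0}
```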


Proteins with Crystal Structures (found to the right of some cluster names and in the gene list within the Protein Cluster detail page)
When found next to a cluster name, the “S” encased in a box indicates that the cluster contains at least one protein with a crystal structure available in the Protein Data Bank (PDB). When found to the right of a gene name, it indicates that the protein encoded by that gene has a crystal structure.


Prediction of Molecular Function (found to the right of some cluster names and in the gene list within the Protein Cluster detail page)
When found to the right of a cluster name, the “P” encased in a box indicates that the cluster contains at least one gene with predictions of molecular function submitted to COMBREX by other prediction teams. When found to the right of a gene name, it indicates that gene has a prediction of molecular function submitted to COMBREX by other prediction teams.


Human Homolog (Found to the right of the names of some clusters, in the gene list within the Protein Cluster detail page, and to the right of domain names in the Gene detail page)
When found to the right of a cluster name, this symbol indicates that the cluster contains at least one protein with a Pfam domain that is also found in at least one human protein. When found next to a gene name, it indicates that the gene has at least one Pfam domain that is also found in at least one human protein. When found next to a domain name, it indicates that this Pfam domain is also found in at least one human protein.
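The Human Homolog rule is a set intersection over Pfam domains at three levels (domain, gene, cluster). A sketch with hypothetical inputs: a mapping from gene name to its set of Pfam domain identifiers, and the set of Pfam identifiers observed in human proteins. The identifiers in the usage example are placeholders, not real Pfam accessions.

```python
def human_homolog_flags(gene_domains, human_domains):
    """Return per-domain, per-gene, and cluster-level Human Homolog flags.

    gene_domains: dict mapping gene name -> set of Pfam domain IDs
    human_domains: set of Pfam domain IDs found in at least one human protein
    """
    # Domain level: the domain occurs in some human protein.
    domain_flags = {
        dom: dom in human_domains
        for doms in gene_domains.values()
        for dom in doms
    }
    # Gene level: at least one of the gene's domains is shared with humans.
    gene_flags = {gene: bool(doms & human_domains) for gene, doms in gene_domains.items()}
    # Cluster level: at least one member gene is flagged.
    cluster_flag = any(gene_flags.values())
    return domain_flags, gene_flags, cluster_flag


domains, genes, cluster = human_homolog_flags(
    {"geneA": {"PF_X", "PF_Y"}, "geneB": {"PF_Z"}},  # placeholder IDs
    human_domains={"PF_Y"},
)
print(genes, cluster)  # {'geneA': True, 'geneB': False} True
```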


Recommended by COMBREX (Found to the right of the names of some genes)
When found to the right of a gene name, the “R” encased in a box indicates that the gene is recommended for experimental validation by COMBREX. These genes are also highlighted with a blue background for additional emphasis.


Purified Protein (found to the right of some cluster names and in the gene list within the Protein Cluster detail page)
On the search results page, the “U” encased in a box to the right of a cluster name indicates that the cluster contains at least one protein that has been purified by a participant in the Protein Structure Initiative (PSI). Purification is one of a series of experimental steps taken to determine a protein's structure by NMR or X-ray crystallography. More information on these proteins can be found at TargetDB, a database of experimental progress information and statuses for proteins selected for structural determination. For protocol information on the cloning, expression, purification, and crystallization of the proteins in TargetDB, see PepcDB, a database of experimental protocols and other information about proteins selected for structural determination. In the gene list on the Protein Cluster detail page, this symbol indicates that the gene encodes a protein that has been purified for experimentation.


Cloned Protein (found to the right of some cluster names and in the gene list within the Protein Cluster detail page)
On the search results page, the “C” encased in a box to the right of a cluster name indicates that the cluster contains at least one protein that has been successfully cloned by a participant in the Protein Structure Initiative (PSI). Cloning is one of a series of experimental steps taken to determine a protein's structure by NMR or X-ray crystallography. More information on these proteins can be found at TargetDB, a database of experimental progress information and statuses for proteins selected for structural determination. For protocol information on the cloning, expression, purification, and crystallization of the proteins in TargetDB, see PepcDB, a database of experimental protocols and other information about proteins selected for structural determination. In the gene list on the Protein Cluster detail page, this symbol indicates that the gene encodes a protein that has been cloned for experimentation.
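The cluster-level “S”, “P”, “U”, and “C” badges all follow the same pattern: a cluster shows a badge if at least one of its genes has the corresponding property. A hypothetical sketch of that aggregation follows; the `GeneFlags` field names are illustrative, not COMBREX's, and the “R” badge is omitted because it applies only to individual genes.

```python
from dataclasses import dataclass


@dataclass
class GeneFlags:
    # Hypothetical per-gene properties behind each badge letter.
    has_crystal_structure: bool = False     # "S": structure in PDB
    has_submitted_prediction: bool = False  # "P": prediction from a prediction team
    purified_by_psi: bool = False           # "U": purified by a PSI participant
    cloned_by_psi: bool = False             # "C": cloned by a PSI participant


BADGES = {
    "S": "has_crystal_structure",
    "P": "has_submitted_prediction",
    "U": "purified_by_psi",
    "C": "cloned_by_psi",
}


def cluster_badges(genes):
    """Return the badge letters a cluster shows: any member with the property."""
    return [letter for letter, attr in BADGES.items()
            if any(getattr(gene, attr) for gene in genes)]


members = [GeneFlags(has_crystal_structure=True), GeneFlags(cloned_by_psi=True)]
print(cluster_badges(members))  # ['S', 'C']
```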


Phylogenetic spread score: The phylogenetic spread score is an integer equal to the depth, in the phylogenetic tree provided by NCBI, of the most recent common ancestor of the species in the current cluster. For example, a score of 0 means that the most recent common ancestor of the cluster's species is at the root of the tree, and a score of 1 means that it is at the kingdom level. In general, lower scores correspond to wider phylogenetic spread.
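The score can be computed by walking each species' lineage up to the root and finding the deepest node shared by all of them. The sketch below assumes the tree is given as a child-to-parent mapping with the root at depth 0; the toy tree and node names are hypothetical, and the real NCBI tree has many more intermediate ranks.

```python
def lineage(node, parent):
    """Return the path from the root down to `node`."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path[::-1]


def spread_score(species, parent):
    """Depth of the most recent common ancestor of `species` (root = 0)."""
    lineages = [lineage(s, parent) for s in species]
    shared = 0
    # Walk down the lineages in lockstep while they agree.
    for nodes in zip(*lineages):
        if len(set(nodes)) != 1:
            break
        shared += 1
    return shared - 1


# Hypothetical toy tree: child -> parent.
tree = {
    "kingdom_A": "root", "kingdom_B": "root",
    "phylum_A1": "kingdom_A",
    "sp1": "phylum_A1", "sp2": "phylum_A1",
    "sp3": "kingdom_B", "sp4": "kingdom_A",
}
print(spread_score(["sp1", "sp2"], tree))  # 2
print(spread_score(["sp1", "sp4"], tree))  # 1 (kingdom level)
print(spread_score(["sp1", "sp3"], tree))  # 0 (root level)
```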


Cluster Detail Page


Histogram (histogram of the average distance from each protein to every other protein in the cluster; found to the left of gene names in the search results page and the cluster detail page)
The histogram of the average distance from each protein to every other protein in a cluster, shown only on the detail pages of curated clusters, helps the COMBREX user identify a good target for experimental validation within a cluster. Good targets are proteins whose experimental validation would provide the most predictive value for the entire protein cluster. If a green or gold gene is present in a cluster, a good target is a protein that is far from those validated genes; you can obtain this information by contacting COMBREX. In the absence of a green or gold gene, a good target is a protein close to the centroid of the cluster, that is, one with a low average distance to every other protein in the cluster. Before selecting a protein to validate experimentally, you may want to examine the multiple sequence alignment and full phylogenetic tree of the cluster, if the cluster is small enough to allow such analysis. These are available from NCBI via the protein cluster link at the top of the Protein Cluster page. You can also contact us at help-desk@combrex.bu.edu if you would like help selecting genes to validate experimentally or if you have any general questions.
For more information on how this histogram is produced, please see: How is the Histogram for the average distances from each protein to every other protein in a cluster produced?
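The two target-selection heuristics above can be sketched over a pairwise distance matrix. This is an illustrative sketch only: the distance measure is left abstract, the helper names are hypothetical, and "far from" the validated genes is interpreted here as maximizing the minimum distance to any green or gold gene, which is one reasonable reading rather than COMBREX's documented procedure.

```python
def average_distances(dist):
    """Mean distance from each protein to every other protein (zero diagonal assumed)."""
    n = len(dist)
    return [sum(row[j] for j in range(n) if j != i) / (n - 1)
            for i, row in enumerate(dist)]


def pick_target(dist, validated=()):
    """Suggest a protein index to validate experimentally.

    Without validated (green/gold) members: the protein nearest the centroid,
    i.e. the one with the lowest average distance.
    With validated members: the unvalidated protein whose distance to its
    nearest validated protein is largest (an assumed reading of "far from").
    """
    n = len(dist)
    if not validated:
        averages = average_distances(dist)
        return min(range(n), key=lambda i: averages[i])
    candidates = [i for i in range(n) if i not in validated]
    return max(candidates, key=lambda i: min(dist[i][v] for v in validated))


# Toy symmetric distance matrix for four proteins.
D = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]
print(pick_target(D))       # 1
print(pick_target(D, {0}))  # 3
```

When several proteins tie for the lowest average distance, `min` returns the first one; a real selection would also weigh the alignment and tree inspection suggested above.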


Gene Detail Page


Functional Linkage Table

This table provides links to genes that are functionally linked to the gene described in this web page. Two genes are functionally linked if there is some evidence that suggests they might be performing the same biological or biochemical function. In these cases one can consider “transferring” the functional annotation from one gene to its functionally linked neighbors.

More information on functional linkage networks:

Functional Linkage Networks

Functional linkage networks (FLNs), originally defined by Eisenberg and coworkers [1], are representations of putative functional relationships between proteins. In FLNs, nodes typically represent proteins, and edges indicate experimental or computational evidence that the two proteins they connect share a common function. Evidence for an edge can take many forms, including a shared domain, transcript coexpression, or protein-protein interaction. FLNs enable the prediction of function for uncharacterized proteins by propagating information from characterized proteins to their network neighbors (the "guilt by association" principle). Probabilistic methods that propagate evidence across the network, beyond immediate neighbors, have been demonstrated [2], and databases of functional links are available [3-5].
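The guilt-by-association principle can be illustrated with simple majority voting over a protein's annotated neighbors. This sketch is deliberately simpler than the probabilistic method of [2]: it uses only direct neighbors and unweighted edges, and the protein names and function labels in the usage example are made up.

```python
from collections import Counter


def predict_by_neighbors(edges, annotations):
    """Majority-vote function prediction for unannotated proteins.

    edges: iterable of (protein_a, protein_b) undirected functional links
    annotations: dict mapping characterized protein -> function label
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    predictions = {}
    for protein, linked in neighbors.items():
        if protein in annotations:
            continue  # already characterized
        votes = Counter(annotations[n] for n in linked if n in annotations)
        if votes:
            # Ties are broken arbitrarily by Counter.most_common.
            predictions[protein] = votes.most_common(1)[0][0]
    return predictions


links = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p5", "p2")]
known = {"p2": "kinase", "p3": "kinase", "p4": "transporter"}
print(predict_by_neighbors(links, known))  # {'p1': 'kinase', 'p5': 'kinase'}
```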

[1] Protein function in the post-genomic era.

Eisenberg D, Marcotte EM, Xenarios I, Yeates TO. Nature. 2000 Jun 15;405(6788):823-6. Review. PMID: 10866208.

[2] Predicting protein function from protein/protein interaction data: a probabilistic approach.

Letovsky S, Kasif S. Bioinformatics. 2003;19 Suppl 1:i197-204. PMID: 12855458.

[3] Predictome: a database of putative functional links between proteins.

Mellor JC, Yanai I, Clodfelter KH, Mintseris J, DeLisi C. Nucleic Acids Res. 2002 Jan 1;30(1):306-9. PMID: 11752322.

[4] STRING 8--a global view on proteins and their functional interactions in 630 organisms.

Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, Bork P, von Mering C. Nucleic Acids Res. 2009 Jan;37(Database issue):D412-6. PMID: 18940858.

[5] VisANT 3.5: multi-scale network visualization, analysis and inference based on the gene ontology.

Hu Z, Hung JH, Wang Y, Chang YC, Huang CL, Huyck M, DeLisi C. Nucleic Acids Res. 2009 Jul;37(Web Server issue):W115-21. PMID: 19465394.