COMBREX Predictions

The molecular functions of some uncharacterized proteins can be inferred from experimentally validated proteins with highly similar sequences, a process we refer to as "propagation" of the experimentally verified function. The greater the similarity, the higher the confidence in associating the propagated function with the uncharacterized protein.

Currently, COMBREX predictions are produced by the following, relatively conservative propagation method. The sequence of an uncharacterized protein is compared directly to all experimentally validated proteins in the database, and the validated function is propagated if the two proteins (1) share all domains (as defined by CDD), regardless of the order of the domains in the sequence, and (2) share at least one BLASTP high-scoring segment pair (HSP) with an E-value < 1e-05. We refer to the characterized genes from which functions are propagated as source genes, and to the uncharacterized genes to which predicted functions are propagated as target genes.

Predictions for target genes made by COMBREX can be found in the “Predicted Function” section on the Gene Page. On the Gene Page of a source gene, a list of related proteins to which its function has been propagated can be found in the “Status” section. For example, if a function were propagated from the source gene “oppA” to the target gene “mppA”, the Gene Page of “mppA” would look as follows, with the propagated prediction indicated by the red arrow:

Propagation Prediction
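The two propagation criteria described above can be illustrated with a minimal Python sketch. It is not COMBREX's actual implementation: the `Protein` class, the `can_propagate` function, and the domain identifiers are hypothetical, and "share all domains" is read here as "have identical sets of CDD domains", which is an assumption. The BLASTP HSP E-values are assumed to have been computed elsewhere and are passed in as plain numbers.

```python
from dataclasses import dataclass

# E-value cutoff stated in the text; an HSP must score strictly below it.
E_VALUE_CUTOFF = 1e-05


@dataclass(frozen=True)
class Protein:
    """Hypothetical record: a gene name plus its set of CDD domain identifiers."""
    gene: str
    domains: frozenset  # a set, so the order of domains in the sequence is ignored


def can_propagate(source, target, hsp_evalues):
    """Return True if the source function would be propagated to the target.

    hsp_evalues: E-values of the BLASTP HSPs between the two proteins
    (assumed to be precomputed; this sketch does not run BLAST).
    """
    # Criterion 1 (assumed reading): both proteins have the same set of domains.
    same_domains = source.domains == target.domains
    # Criterion 2: at least one HSP has an E-value below the cutoff.
    significant_hsp = any(e < E_VALUE_CUTOFF for e in hsp_evalues)
    return same_domains and significant_hsp


# Placeholder domain names, not real CDD identifiers.
source = Protein("oppA", frozenset({"domA", "domB"}))
target = Protein("mppA", frozenset({"domB", "domA"}))
propagated = can_propagate(source, target, [1e-40, 0.5])
```

Representing the domains as a set makes the comparison independent of domain order, which mirrors the "regardless of the order" condition in the text.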

Clicking "More information on this prediction" opens a lightbox with details of the BLASTP match, the common domains, and the NCBI Protein Cluster relationship between the source and target genes, as shown below.

Propagation Information Link

A target gene can in principle receive predictions propagated from several source genes. Shown by default is the source gene ("Prediction based on gene") with the strongest match to the target (the smallest BLASTP E-value), along with the BLASTP match, the common domains (as defined by CDD), and the NCBI Protein Cluster relationship between that source and the target. The function propagated from that source is shown at the top ("COMBREX Predicted Function"), and functions propagated from other source genes are shown at the bottom ("Other COMBREX predictions for this gene"). A link to information about the other source genes is indicated by the red arrow.
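The choice of the default source gene can be sketched in a few lines of Python. This is only an illustration of "pick the source with the smallest E-value and list the rest separately"; the `Propagation` record, the `split_predictions` function, and the gene and function names are hypothetical and do not come from COMBREX.

```python
from typing import NamedTuple


class Propagation(NamedTuple):
    """Hypothetical record of one function propagated to a target gene."""
    source_gene: str
    function: str
    evalue: float  # best BLASTP E-value between this source and the target


def split_predictions(propagations):
    """Return (default, others): the strongest match first, the rest after it.

    The default is the propagation with the smallest E-value; the others are
    returned in order of increasing E-value. An empty input gives (None, []).
    """
    if not propagations:
        return None, []
    ranked = sorted(propagations, key=lambda p: p.evalue)
    return ranked[0], ranked[1:]


# Placeholder data for a single target gene.
candidates = [
    Propagation("srcB", "function B", 1e-20),
    Propagation("srcA", "function A", 1e-80),
    Propagation("srcC", "function C", 1e-10),
]
default, others = split_predictions(candidates)
```

In this sketch, `default` corresponds to what the page shows as "COMBREX Predicted Function", and `others` to the entries under "Other COMBREX predictions for this gene".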