The molecular functions of some uncharacterized proteins can be inferred
from experimentally validated proteins with highly similar sequences,
a process we can refer to as "propagation" of the experimentally
verified function. The greater the similarity, the higher the confidence
in associating the propagated function with the uncharacterized protein.
Currently, COMBREX predictions are produced by the following, relatively
conservative propagation method. The sequence of an uncharacterized protein
is directly compared to all known experimentally validated proteins in the
database, and the validated function is propagated if the two proteins (1) share
all domains (as defined by CDD, the NCBI Conserved Domain Database), regardless
of the order of the domains in the sequence, and (2) share at least one BLASTP
high-scoring segment pair (HSP)
with an E-value < 1e-05. We can refer to the characterized genes from which
functions are propagated as source genes, and the uncharacterized genes to
which predicted functions are propagated as target genes. Predictions for
target genes made by COMBREX can be found in the “Predicted Function” section
on the Gene Page. On the Gene Page of a source gene, a list of related
proteins to which its function has been propagated can be found in the “Status”
section. For example, if a function were propagated from the source gene “oppA”
to the target gene “mppA”, the Gene Page of “mppA” would look as follows, with
the propagated prediction indicated by the red arrow:
Clicking on "More information on this prediction" will open a lightbox containing information about the BLASTP match, common domains, and cluster relationship of the source and target genes, as shown below.
A target gene can in principle receive predictions propagated from several source genes. Shown by default is the source gene ("Prediction based on gene") with the strongest match to the target (the smallest BLAST E-value), along with the BLASTP match, the common domains (as defined by CDD), and the NCBI Protein Cluster relationship between that particular source and the target. The function propagated from that source is shown at the top ("COMBREX Predicted Function"), and functions propagated from other source genes are shown at the bottom ("Other COMBREX predictions for this gene"). A link to information about the other source genes is indicated by the red arrow.
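
The propagation rule and the default choice of source gene described above can be sketched in a few lines of Python. This is only an illustration, not COMBREX's actual implementation: the `Protein` class, the function names, and the input format are hypothetical, and "share all domains" is interpreted here as equality of the two sets of CDD domain identifiers, which ignores both domain order and repeated domains. The sketch also assumes the BLASTP E-values have already been computed elsewhere.

```python
from dataclasses import dataclass

# E-value cutoff stated in the text (an HSP must score below it).
E_VALUE_CUTOFF = 1e-5


@dataclass(frozen=True)
class Protein:
    """Hypothetical record for one protein; not a COMBREX data structure."""
    gene: str
    domains: frozenset   # CDD domain identifiers; order is irrelevant
    function: str = ""   # validated function, empty if uncharacterized


def can_propagate(source, target, best_hsp_evalue):
    """Apply the two criteria from the text to one source/target pair.

    best_hsp_evalue is the smallest E-value among the BLASTP HSPs between
    the two proteins, or None if BLASTP reported no HSP at all.
    """
    # Criterion 1 (assumed interpretation): identical domain sets.
    # Note that two proteins without any annotated domain would pass this
    # test trivially; how COMBREX treats that case is not stated.
    same_domains = source.domains == target.domains
    # Criterion 2: at least one HSP with an E-value below the cutoff.
    significant = best_hsp_evalue is not None and best_hsp_evalue < E_VALUE_CUTOFF
    return same_domains and significant


def predictions_for(target, sources, evalues):
    """Return the qualifying source proteins, strongest match first.

    evalues maps a source gene name to its best HSP E-value against the
    target. The first element of the result corresponds to the source
    shown by default; the rest correspond to the other predictions.
    """
    hits = [
        (evalues[s.gene], s)
        for s in sources
        if can_propagate(s, target, evalues.get(s.gene))
    ]
    hits.sort(key=lambda pair: pair[0])  # smallest E-value first
    return [s for _, s in hits]
```

For instance, with made-up domain identifiers and a made-up E-value for the "oppA" to "mppA" example, `predictions_for(Protein("mppA", frozenset({"domain_A", "domain_B"})), [Protein("oppA", frozenset({"domain_B", "domain_A"}), "placeholder function")], {"oppA": 1e-50})` returns a one-element list containing the "oppA" record, because both criteria are satisfied.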