COMBREX is a collaborative project to bring the computational and experimental
communities of biologists together in an effort to better understand gene function.
COMBREX uses colored symbols as visual cues to reflect the experimental validation
status of genes and clusters. The colored symbol seen next to a gene name indicates
the experimental validation status of the gene. The colored symbol seen next to
a cluster name indicates the experimental validation status of its constituent genes.
For genes:
Gold Circle – indicates the protein has been experimentally validated, and the
DNA sequence coding for the exact protein has been determined. The
publication(s) reporting the sequence and the biochemistry are documented.
Green Circle – indicates that the gene is believed to have been experimentally
validated, but manual curation is incomplete, or information required for gold
status is lacking.
Blue Square – indicates the gene has a specific prediction of molecular function
but has not been experimentally validated, or that the gene's experimental validation
status has yet to be established in COMBREX.
Black Diamond – indicates the gene does not have a specific prediction of molecular
function but it may have predictions of "general" or "non-specific" functionality.
For clusters:
Green Circle – indicates the cluster contains one or more experimentally validated
(green and/or gold) genes.
Blue Square – indicates the cluster does not contain any experimentally validated
genes and that it contains genes with specific predictions of molecular function.
Black Diamond – indicates the cluster does not contain any experimentally validated genes.
Additionally, no constituent gene has a specific prediction of molecular function.
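The cluster rules above can be read as a function of the statuses of the member genes. The following is a minimal sketch of that reading, not COMBREX's own code; the status strings and the function name `cluster_status` are assumptions made for illustration.

```python
def cluster_status(gene_statuses):
    """Derive a cluster's symbol color from the colors of its member genes.

    gene_statuses: iterable of "gold", "green", "blue", or "black"
    (hypothetical labels mirroring the symbol colors described above).
    """
    statuses = set(gene_statuses)
    if statuses & {"gold", "green"}:
        return "green"  # at least one experimentally validated gene
    if "blue" in statuses:
        return "blue"   # no validated genes, but a specific prediction exists
    return "black"      # no validated genes and no specific predictions
```

For example, a cluster holding one gold gene among many black genes would be shown as green under this reading.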
Green genes are believed to have experimentally validated functions as indicated by
other highly curated databases (NCBI, UniProt, EcoCyc and others). In many cases,
they are candidate "gold standard" genes that are awaiting manual curation to
confirm their gold status. Alternatively, a gene that has greater than 98% full-length
sequence similarity to a gold gene is also considered to be a green gene.
The Gold Standard data set consists of genes that have been experimentally validated,
and the DNA sequences coding for those exact proteins have been determined. All gold
entries have been manually curated, and references for the experiments and the gene
sequencing are available. The Gold Standard data set is an ongoing project that is
still in its early stages. If you would like more information or want to help with the
curation of this data set, please contact Dr. Richard J. Roberts (firstname.lastname@example.org).
NCBI and UniProt will soon have a downloadable version of the Gold Standard data set
on their sites, but until then, it can be obtained on the COMBREX site. For more information
on Gold Standard genes please refer to the Gold Standard Genes document.
To find a gene of interest, enter information about the gene into the COMBREX search
engine and click “Search.” The more specific your search term, the more easily you
will be able to find your gene of interest. For instance, using an NCBI Gene ID
to specify your gene will result in finding your gene of interest faster than
using a gene name or gene symbol to specify it.
The search will return a list of NCBI Protein Clusters (groups of highly
sequence-similar proteins thought to perform the same function), each of
which contains genes matching your search criteria. The list of clusters can
be sorted by various criteria including phylogenetic distribution and cluster
size (in terms of numbers of proteins or numbers of organisms). For each
cluster, we highlight genes from either of our two "focus organisms", E. coli
K-12 MG1655 and Helicobacter pylori 26695, when present.
If you used a unique identifier such as a RefSeq protein identifier or UniProt
accession number in your search, the search should yield a single cluster, and
the matching gene will be highlighted beneath it, in addition to any genes from
the two focus genomes above.
Any of the following terms can be used:
Entrez GeneID -- e.g.: "1021855"
UniProt accession number -- e.g.: "Q8G6A5"
RefSeq protein accession number -- e.g.: "NP_695922.1"
Please Note: YP and NP must be capitalized and you must include an underscore “_”, but need not include the version number (".1")
Gene Name -- e.g.: "helY"
NCBI Protein Cluster ID -- e.g.: "CLSK967808" (Please Note: CLSK denotes non-curated clusters)
-- e.g.: "PRK10917" (Please Note: PRK denotes curated clusters)
Keyword -- e.g.: "helicase"
-- e.g.: "RNA helicase"
-- e.g.: "Superfamily II RNA helicase"
Please Note: Using key words or generic gene names may not uniquely identify your gene.
To easily search for a unique, specific gene, please use specific identifiers such as
the Entrez GeneID, UniProt accession number, or RefSeq accession number if any of these
are known. You can also use our Advanced Search feature to limit your search results by
specifying gene status, protein cluster status, or species name.
For more information, please see our Help Center.
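If you prepare search terms in a script before pasting them into the search box, a small check can tell you which kind of term you have. The sketch below is only an illustration based on the examples listed above. The patterns are assumptions, the UniProt pattern follows the accession format published by UniProt, and none of this describes how the COMBREX search engine itself parses terms.

```python
import re

# Illustrative patterns inferred from the examples above (assumptions).
PATTERNS = [
    ("Entrez GeneID", re.compile(r"\d+")),
    # NP/YP must be upper case with an underscore; the version is optional.
    ("RefSeq protein accession", re.compile(r"[NY]P_\d+(\.\d+)?")),
    # CLSK = non-curated cluster, PRK = curated cluster.
    ("protein cluster ID", re.compile(r"(CLSK|PRK)\d+")),
    ("UniProt accession", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}")),
]

def classify_search_term(term):
    """Return a label for the kind of identifier `term` looks like."""
    term = term.strip()
    for label, pattern in PATTERNS:
        if pattern.fullmatch(term):
            return label
    return "gene name or keyword"
```

A term that falls through to "gene name or keyword" (for example a lower-case "np_695922") is the kind of term that may not uniquely identify a gene.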
Advanced Search allows you to search by gene status, protein cluster status,
and/or species name. You can also combine these options with a search term to limit your
results to those with a specific gene status, protein cluster status, and/or species name.
To search the COMBREX database for all genes found within a specific organism - Click on
“Advanced Search.” Then type the organism’s name in the “Species” box and click “Search.”
The search will result in a list of clusters, each of which contains at least one gene
from your organism of interest.
To search the COMBREX database for a specific gene found within a specific organism -
Enter the gene information into the COMBREX home page search box. Then click on
“Advanced Search.” Finally, type the organism’s name in the “Species” box and click “Search.”
The search will result in a list of clusters, each of which contains the gene of
interest in the organism of interest.
At the present time, specifying an organism name in the COMBREX home page search box will yield incomplete results. This will be adjusted in a future release.
E. coli str. K-12 substr. MG1655 was chosen as a COMBREX focus organism because of its
frequent use as a model organism in molecular biology and biochemistry. H. pylori 26695
was chosen because of its importance to human health and disease. Although these two
bacteria are considered COMBREX “focus organisms”, thus affording them some preferential
treatment, COMBREX accepts predictions for and funds experimental validation of genes
from any bacterial or archaeal organism.
Ideally, choosing a protein as a target for experimental validation is based upon prior
functional knowledge. We encourage researchers submitting bids to select protein functions
with which they have some previous experience. (The essence of the COMBREX project is to
match specific predictions with expert biochemists who are knowledgeable about the
appropriate assays to use and who have suitable reagents already on hand.) When
selecting a particular protein for experimental validation, we encourage choosing
one that will provide the most information for the entire protein cluster.
Another method of choosing a protein for experimental validation involves choosing the protein that will provide the most information for the entire cluster if experimentally validated. If a green or gold gene is present in the cluster, select the protein that is furthest away from the green or gold member. If there are no experimentally validated members in the cluster, select a protein that is close to the centroid of the cluster. The histogram for the average distances from each protein to every other protein in a cluster can be used to help identify good targets for experimental validation. Proteins with larger average distances to every other protein are good targets for clusters containing green or gold genes, and proteins with smaller average distances to every other protein are good targets for clusters containing no experimentally validated genes.
Follow the steps below to find a gene for experimental validation:
1. Find a cluster.
The importance of a particular cluster is ranked according to the following scale
(high priority clusters listed first, and low priority clusters listed last):
- Large cluster with wide phylogenetic spread (low phylogenetic spread score) that contains E. coli and/or H. pylori genes with unknown functions (blue genes)
- Large cluster with wide phylogenetic spread (low phylogenetic spread score) that only contains blue and/or black genes (no green or gold genes are present)
- Any cluster with functionally validated E. coli and/or H. pylori genes (gold or green genes)
- Black clusters (clusters consisting of genes with poor or no predicted functions)
2. After choosing a cluster, select a particular gene within the cluster.
The importance of a gene within a cluster is ranked according to the following scales (high priority genes listed first, and low priority genes listed last):
For genes in clusters containing no green or gold genes:
- Genes from either E. coli or H. pylori
- Genes with small average distances to other proteins in the cluster (such a gene is the most representative of the entire cluster)
- Any other genes
For genes in clusters containing green and/or gold genes:
- Genes with large average distances from green and/or gold genes
Attention: You can contact COMBREX associates to get this information.
Keep in mind that bids proposing to validate genes with the same functions
as green and/or gold genes will likely be of low priority unless a case
can be made that the gene to be tested is mis-annotated.
- Any other gene
For more information on how to select a gene for experimental validation and
how to submit a bid for funding to experimentally validate this gene please
refer to the How To Submit A Bid document.
COMBREX currently uses two criteria to recommend genes in a given Protein Cluster for
experimental validation. First, we recommend validating genes in COMBREX ‘focus organisms’,
which currently include E. coli K-12 MG1655 and H. pylori 26695; this ensures that we
continue to develop a more complete picture of the coding potential, and thus a greater
understanding of the biology, of these two important model organisms.
Second, for ‘blue’ clusters, in other words those with no experimentally validated members
at present, we recommend validating the gene with the shortest average distance to all
other proteins in the cluster as measured using sequence similarity; this gene can be
thought of as lying nearest the centroid of the cluster. The functions of uncharacterized
genes are often predicted based on sequence similarity to experimentally validated homologs,
and an implicit assumption in this process is that the confidence in such predictions
increases as sequence similarity increases. Thus, validating the function of a gene
near the centroid of a cluster results in the greatest overall confidence when that
function is predicted to apply to all other members of the cluster.
In some cases, there may be a compelling reason why a recommended gene might not be a
good validation candidate (for example, the organism from which it comes is highly pathogenic,
difficult to obtain, or otherwise difficult to work with). In such cases, we suggest
using the ‘average distance to other proteins’ metric we provide to choose another gene
from the cluster with a relatively small average distance. We encourage experimental
biologists to use their best judgment.
For more information on the Histogram for the average distances from each protein to
every other protein within a cluster, please see questions 12 and 13 below.
The Histogram for the average distances from each protein to every other protein in a cluster serves to help you identify a good target for experimental validation within a certain cluster. Good targets are considered to be proteins whose experimental validation would provide the most predictive value for the entire protein cluster.
If a green or gold gene is present in a cluster, then a good target for experimental validation is a protein that is far away from it in sequence space. You can get this information by contacting COMBREX.
If a green or gold gene is NOT present in a cluster, then a good target for experimental validation is a protein that is close to the centroid of a cluster. Such a protein will have low average distances to every other protein in the cluster.
Before selecting a protein to experimentally validate, you may want to examine the multiple sequence alignment and full phylogenetic tree of the cluster, if it is small enough to allow such analysis. This information is available on NCBI and can be reached by clicking on the protein cluster link at the top of the protein cluster page.
For additional information and examples, see (avg_dist_to_other_genes.doc)
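The selection rule described above can be sketched in a few lines. This is an illustrative sketch only: the function name `pick_target` and its inputs are assumptions, and it uses the average distance to all other proteins as a stand-in for distance from the green or gold member, following the guidance that proteins with larger average distances are good targets in clusters that already contain validated genes.

```python
def pick_target(avg_dist, has_validated_member):
    """Suggest a validation target within one cluster.

    avg_dist: dict mapping protein ID -> average distance to all other
        proteins in the cluster (the values summarized in the histogram).
    has_validated_member: True if the cluster contains a green or gold gene.
    """
    if not avg_dist:
        raise ValueError("cluster has no proteins")
    if has_validated_member:
        # Validated member present: prefer a protein far from the rest.
        return max(avg_dist, key=avg_dist.get)
    # No validated member: prefer the protein nearest the centroid.
    return min(avg_dist, key=avg_dist.get)
```

The result is only a starting point; the practical considerations discussed elsewhere on this page (organism availability, pathogenicity, your own expertise) may point to a different gene.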
First, the average distance of a protein to all other cluster members is determined by performing a multiple sequence alignment of the members of a protein cluster. The multiple sequence alignments for the curated protein clusters were provided by NCBI using the tool MUSCLE [1]. These alignments are then converted into a distance matrix using the protdist program within the tool PHYLIP [2]. We use the Jones-Taylor-Thornton model [3] for amino acid substitution. Finally, we use this distance matrix to calculate, for each protein, the average distance to all other members of the cluster.
1: Edgar RC.
MUSCLE: multiple sequence alignment with high accuracy and high throughput.
Nucleic Acids Res, 2004 Mar 19. 32(5):1792-7. Print 2004. PubMed PMID: 15034147; PubMed Central PMCID: PMC390337.
2: Felsenstein, J.
PHYLIP - Phylogeny Inference Package (Version 3.2).
Cladistics, 1989. 5: 164-6.
3: Jones DT, Taylor WR, Thornton JM.
The rapid generation of mutation data matrices from protein sequences.
Comput Appl Biosci, 1992 Jun. 8(3):275-82. PubMed PMID: 1633570.
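The final averaging step of the procedure above can be sketched as follows. The sketch assumes the distance matrix has already been computed (for example by protdist) and loaded into a list of lists; the function name `average_distances` and the input layout are assumptions for illustration, not COMBREX code.

```python
def average_distances(ids, matrix):
    """Return {protein ID: mean distance to every other cluster member}.

    ids: list of protein IDs.
    matrix: square list of lists where matrix[i][j] is the distance between
        ids[i] and ids[j] (symmetric, with zeros on the diagonal).
    """
    n = len(ids)
    if n < 2:
        raise ValueError("need at least two proteins to average distances")
    return {
        ids[i]: sum(matrix[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    }
```

The protein with the smallest value in the returned dictionary is the one lying nearest the centroid of the cluster.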
We realize that there may be cases where the annotation we have listed for a protein is incomplete or inaccurate. For example, we may list a protein's function as something generic, such as “protein-binding” when experimental evidence suggests a specific enzymatic function. In other cases, we may list a protein's current annotation as unverified, meaning that we do not have a reference, when an appropriate reference exists. If you notice any annotation on our site which could be improved, please submit an annotation update via the gene page.
We encourage you to use the annotation submission tool to:
- Update an existing annotation by making it more specific or correcting it
- Nominate a gene as a candidate for the set of gold standard annotated proteins
- Make a comment on the existing annotation
For more information on how to submit an annotation and how to nominate a gene
as a candidate for the set of gold standard annotated proteins please refer
to the How To Submit An Annotation guide.
COMBREX organizes proteins within the context of NCBI protein clusters. Therefore, once
you have submitted your search terms, COMBREX will return one or more appropriate NCBI
Protein Clusters. As of February 2010, the NCBI Protein Clusters database contains 409016
prokaryotic clusters, of which only 7297 have been curated. Curated NCBI Protein Clusters
contain added information, including functional annotation for proteins, Enzyme Commission
numbers which detail enzymatic function, and publications describing protein function and composition.
Selecting the “Curated Clusters Only” box during a COMBREX search will limit your search
results to curated NCBI Protein Clusters. For more information on the NCBI Protein Clusters
database and the cluster curation process, please refer to the NCBI Protein Clusters documentation.
When a list of search results (gene clusters) is returned, the clusters are ordered
by default based on the following criteria. (The search results can also be
manually reordered based on number of organisms, number of proteins, phylogenetic
spread score, and cluster name by using the available sort functionality.)
- Validation status:
Clusters with a blue experimental validation status are listed before clusters
with a green experimental validation status, which are listed before clusters
with a black experimental validation status. The color assigned to a cluster
represents the experimental validation status of its constituent genes. For more
information on the symbol and color assignments to genes and clusters please refer to the
symbol and color explanation for clusters and genes.
- Number of focus organisms in a cluster:
Clusters containing genes from our two focus organisms, E. coli K-12 MG1655 and H. pylori 26695, are listed before clusters that do not.
- Phylogenetic spread score:
The phylogenetic spread score currently employed is an integer that corresponds to the depth of the most recent common ancestor for the species within the cluster in the phylogenetic tree provided by NCBI. For example, a phylogenetic spread score of 0 means that the cluster species are conserved at the root level, and a score of 1 means they are conserved at the kingdom level. In general, clusters with lower phylogenetic spread scores, corresponding to wider spread, are listed before clusters with high phylogenetic spread scores.
- Number of organisms in a cluster:
Clusters with genes from larger numbers of organisms are listed before clusters with genes from smaller numbers of organisms.
- Number of genes in a cluster:
Clusters with larger numbers of genes are listed before clusters with smaller numbers of genes.
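One way to picture the default ordering is as a sort on several keys. The sketch below assumes the criteria are applied in the order they are listed, each one breaking ties left by the previous one. That order of precedence, the field names, and the function names are assumptions for illustration, not a description of the COMBREX implementation.

```python
# Hypothetical field names; not COMBREX's actual data model.
STATUS_RANK = {"blue": 0, "green": 1, "black": 2}

def default_sort_key(cluster):
    """Sort key applying the listed criteria in order of appearance."""
    return (
        STATUS_RANK[cluster["status"]],  # blue, then green, then black
        -cluster["focus_organisms"],     # more focus organisms first
        cluster["spread_score"],         # lower score (wider spread) first
        -cluster["organisms"],           # more organisms first
        -cluster["genes"],               # more genes first
    )

def sort_clusters(clusters):
    """Return a new list of cluster records in the assumed default order."""
    return sorted(clusters, key=default_sort_key)
```

Under this reading, a blue cluster always precedes a green one, regardless of size or phylogenetic spread.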
We use the NCBI Protein Clusters as our clustering model. NCBI clusters proteins
mainly based on sequence similarity. Protein sequences are compared by performing
an all-against-all BLAST search with an E-value cutoff of 1E-05. (BLAST, the Basic
Local Alignment Search Tool, finds regions of similarity between two protein
sequences; for more information on BLAST please refer to PMID: 18440982. The E-value,
or Expect value, is a parameter describing the number of BLAST alignment matches,
known as “hits”, that one can attribute to chance.) Each BLAST alignment
match between two sequences is assigned a BLAST score which is then modified to take
into account protein length and the alignment length between the two protein sequences.
Proteins within a cluster are one another’s best BLAST alignment matches as determined
by the modified score. For a protein within a cluster, all other proteins within that
cluster will have a higher modified score to that protein than would any protein not
within the cluster. For more information on the NCBI Protein Clusters Database and their
methodology of producing protein clusters, please refer to the NCBI Protein Clusters documentation.
Proteins within a Protein Cluster share significant sequence similarity with one another.
However, proteins in two different Protein Clusters can also share significant sequence
similarity, so the concept of "related" clusters captures this type of relationship.
Relationships between Protein Clusters are determined by NCBI using the alignment tools
BLASTP (please refer to PMID: 18440982) and RPS-BLAST. For two clusters, A and B, to be related,
every protein within cluster A must be related to every protein within cluster B.
Two proteins are defined as related if they share a BLASTP alignment with an E-value
less than 1e-03 that covers greater than 80% of the length of the shorter sequence
and, if they have RPS-BLAST matches to the Conserved Domain Database
(PMID: 18984618), they share a similar domain structure. For more information on the NCBI Protein
Clusters Database and their methodology of determining related clusters, please refer to
the NCBI Protein Clusters documentation.
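The relatedness test described above can be sketched as two small functions. This is an illustration of the stated thresholds only. The function names and inputs are assumptions, the domain comparison is reduced to a flag supplied by the caller, and NCBI's actual pipeline is not reproduced here.

```python
def proteins_related(evalue, aligned_length, len_a, len_b, similar_domains=True):
    """Apply the pairwise criteria to one BLASTP alignment.

    evalue: E-value of the BLASTP alignment between the two proteins.
    aligned_length: number of residues covered by the alignment.
    len_a, len_b: lengths of the two protein sequences.
    similar_domains: result of the domain-structure comparison; leave True
        when the proteins have no Conserved Domain Database matches.
    """
    shorter = min(len_a, len_b)
    return evalue < 1e-3 and aligned_length > 0.8 * shorter and similar_domains

def clusters_related(cluster_a, cluster_b, is_related):
    """True if every protein in A is related to every protein in B.

    is_related: callable taking two protein IDs and returning a bool.
    """
    return all(is_related(p, q) for p in cluster_a for q in cluster_b)
```

Because every cross-cluster pair must pass, a single unrelated pair is enough to keep two clusters from being listed as related.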
NCBI Protein Clusters related to a cluster of interest are displayed on the cluster
detail page in order to help the user identify other possible clusters of interest.
Because proteins within related clusters share significant sequence similarity
with one another, any information gained about one protein will most likely shed
light on proteins within its own cluster as well as on proteins within related clusters.
COMBREX users interested in biochemically characterizing multiple proteins may find it
worthwhile to choose target proteins within related clusters along with multiple proteins
from the same cluster.
At this initial stage of the COMBREX project we are focusing on predictions of biochemical
function, which is typically described by EC numbers or GO Molecular Function (MF) terms.
We will accept predictions in the form of traditional TEXT description but this format
is not encouraged and should be used only when an appropriate structured vocabulary term
is either not available or not specific enough. For step by step instructions on how to submit
predictions of function for a single gene or multiple genes please refer to the How To Submit Predictions guide.
For examples of the formats required for prediction submissions, please see the template section of the How To Submit Predictions guide.
Once your predictions have been submitted, they will be available on the gene
detail pages of the genes for which you made predictions. To submit predictions
that you also want to validate experimentally, submit a bid for your gene
of interest and explain your prediction in your bid submission. Based on your
preference, these predictions will or will not be made public. For step by step
instructions on how to submit a bid for funding to experimentally validate a gene's
function, please refer to the How To Submit A Bid guide.
Predictions of function for a given gene are available on the gene’s specific gene detail page and on the cluster page to which it belongs.
Predictions of function come from a variety of sources. Genes grouped within a cluster are predicted to have similar functions based on sequence similarity, and so all genes have their cluster definition as a functional prediction. Conserved domains within a gene that are associated with a specific function serve as another source of functional predictions. The NCBI gene definition associated with uncharacterized genes may also serve as a prediction of function. Over time, additional predictions will be added by COMBREX members and the broader community. We expect these predictions of gene function provided by expert teams of computational biologists will be of high quality.
Experimentalists will have the option to experimentally validate any of these predictions.
The Functional Linkage Table is a graphical representation of how a gene of interest is functionally linked to other genes as predicted by various methods. Two genes are functionally linked if evidence suggests that they perform the same biological or biochemical function. In these cases one can consider “transferring” the functional annotation from one gene to its functionally linked neighbours. The level of confidence in such a functional annotation “transfer” is indicated by the shading of the square boxes within the table - the darker the shade, the higher the confidence.
Examples of Functional Linkage Networks
- two genes that share a domain - may perform related biochemical functions (e.g., methyltransferase)
- two genes that are often co-expressed or are co-regulated - may be involved in the same biological process (e.g., SOS response)
- two genes that are shown to interact in a protein-protein interaction assay - may be involved in similar biological processes
Based on these functional linkages we can define a network of genes linked to other genes based on experimental evidence or computationally predicted functional linkages.
Functional linkage networks were originally defined in [1]. A probabilistic version of functional linkage graphs (PFLGs) was originally defined in [2]. The first database of functional linkages was described in [3].
[1] Eisenberg D, Marcotte EM, Xenarios I, Yeates TO.
Protein function in the post-genomic era.
Nature, 2000 Jun 15. 405(6788):823-6. Review. PMID: 10866208
[2] Letovsky S and Kasif S.
Predicting protein function from protein/protein interaction data: a probabilistic approach.
Bioinformatics, 2003. 19 Suppl 1:i197-204. PMID: 12855458
[3] Mellor JC, Yanai I, Clodfelter KH, Mintseris J, DeLisi C.
Predictome: a database of putative functional links between proteins.
Nucleic Acids Res, 2002 Jan 1. 30(1):306-9. PMID: 11752322
OperonDB is a method of predicting gene function that involves detecting and analyzing
pairs of genes located adjacent to one another on the same DNA strand and conserved
in two or more bacterial genomes. For each conserved gene pair, OperonDB estimates
the probability that the genes belong to the same operon by taking into account
alternative possibilities that explain why the genes are adjacent in several genomes.
To determine the structure of an operon, the gene order and orientation must be
conserved in two or more species. Since genes within an operon often have related
functions, knowing the operon's structure provides information about the function of genes within it.
Mihaela Pertea, Kunmi Ayanbule, Megan Smedinghoff and Steven L. Salzberg.
OperonDB: a comprehensive database of predicted operons in microbial genomes.
Nucleic Acids Res, 2009 Jan; 37(Database issue):D479-82. Epub 2008 Oct 23.
Please refer to: Operon Database
Domain fusion allows for a prediction of functional relationship between two distinct genes in an organism if those two genes are fused as a continuous sequence in another organism. The fused gene in one organism suggests a relationship between the component genes in another organism - a relationship which is not necessarily due to sequence similarity. Fusion links frequently relate genes of the same functional category. Therefore, the function of an uncharacterized gene within a fusion link can be inferred from the known function of the gene to which it is fused.
Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D.
Detecting protein function and protein-protein interactions from genome sequences.
Science, 1999. 285(5428):751-3.
Yanai I, Derti A, and DeLisi C.
Genes linked by fusion events are generally of the same functional category: a systematic analysis of 30 microbial genomes.
Proc Natl Acad Sci U S A, 2001. 98(14):7940-5.
Phylogenetic profiling infers the function of a gene from another gene with a known function that has a pattern of presence and absence across a set of phylogenetically distributed genomes that is identical to that of the gene with the unknown function. The profile of a gene consists of the pattern of occurrence of its orthologs across a set of genomes. Orthologs here are used as defined in the COG database. Two genes are assumed to be functionally related if the correlation between their profiles is greater than would be expected by chance.
Wu J, Kasif S, and DeLisi C.
Identification of functional links between genes using phylogenetic profiles.
Bioinformatics, 2003. 19(12):1524-30.
Wu J, Hu Z, and DeLisi C.
Gene annotation and network inference by phylogenetic profiling.
BMC Bioinformatics, 2006. 7:80.
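To make the idea of comparing profiles concrete, the sketch below computes a correlation between two presence/absence profiles. Pearson correlation is used here only as one possible measure; which statistic and which significance test the cited methods apply is not stated above, and the function name `profile_correlation` is an assumption for illustration.

```python
from math import sqrt

def profile_correlation(profile_a, profile_b):
    """Pearson correlation between two presence/absence profiles.

    Each profile is a list of 0/1 values, one per genome, in the same
    genome order (1 = an ortholog is present in that genome).
    """
    n = len(profile_a)
    if n == 0 or n != len(profile_b):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    var_a = sum((a - mean_a) ** 2 for a in profile_a)
    var_b = sum((b - mean_b) ** 2 for b in profile_b)
    if var_a == 0 or var_b == 0:
        # A gene present in every genome (or in none) carries no signal.
        return 0.0
    return cov / sqrt(var_a * var_b)
```

A value near 1 indicates that the two genes tend to be present and absent in the same genomes; deciding whether a given value exceeds what chance would produce requires a separate significance test that is not shown here.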
A gene neighborhood consists of genes located near one another on a DNA strand for which their proximity is conserved in several genomes. When a gene neighborhood is known, related function between these genes can be inferred from their conservation of proximity across many genomes. The probability that neighboring genes encode proteins within the same biological pathway depends on the number of genomes in which the proximity of the genes is conserved. The conserved order of genes implies selective bias, which suggests related function. This method produces links between ortholog families that are validated by observed proximity in genomes from multiple phylogenetic groups.
Yanai I, Mellor JC, and DeLisi C.
Identifying functional links between genes using conserved chromosomal proximity.
Trends Genet, 2002. 18(4):176-9.
Yes. COMBREX predictions are produced by conservatively propagating molecular function
from experimentally validated proteins to proteins without annotation and proteins
lacking specific functional annotation. The molecular functions of some experimentally
validated proteins can be propagated to others based on sequence similarity; the greater
the similarity, the higher the confidence in the propagated function. Functional propagation
involves transferring a gold or green gene’s function to a gene with unknown function that
shares significant identity with the experimentally validated gene. The functional
propagation is mainly used to improve the predictions associated with black genes,
consequently turning them into blue genes. Work on propagating function from an
experimentally validated protein to another experimentally validated one is in
progress and may be included in future versions of COMBREX. Propagation of function
requires that both proteins share all the same domains, regardless
of the order of the domains, and that both proteins share sufficient sequence
similarity to result in a BLAST E-value below 1e-05. These predictions are displayed under
the “Predicted Function” section on the Gene Page of the gene to which the function is
propagated. Proteins that receive annotation of function from an experimentally validated
protein are listed as protein IDs under the “Status” section on the Gene Page of the
experimentally validated gene, the gene from which the function is propagated.
For more information on the conservative functional propagation predictions produced
by COMBREX, please refer to the Functional Propagation guide.
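The two propagation criteria described above can be expressed as a single check. This is a minimal sketch under stated assumptions: the function name `can_propagate` and its inputs are hypothetical, and domains are compared as sets, so a domain that occurs more than once is counted once. The Functional Propagation guide remains the reference for the actual procedure.

```python
def can_propagate(source_domains, target_domains, blast_evalue, cutoff=1e-5):
    """Check whether function may be propagated from source to target.

    source_domains: domain identifiers of the validated (gold or green) protein.
    target_domains: domain identifiers of the protein receiving the function.
    blast_evalue: E-value of the BLAST alignment between the two proteins.
    """
    # Same domains regardless of order (set comparison is an assumption).
    same_domains = set(source_domains) == set(target_domains)
    return same_domains and blast_evalue < cutoff
```

For example, `can_propagate(["A", "B"], ["B", "A"], 1e-20)` passes, while a target that lacks one of the source's domains does not, however strong the alignment.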
Additionally, many teams associated with the COMBREX project, such as the Salzberg team,
Vitkup team, DeLisi team, and Segre team, provide expert predictions of gene function.
The collaborative nature of the COMBREX project entails the involvement of many
teams beyond the immediate COMBREX community, including the Horn team, the
Greiner team, the Palsson team, the Sjolander team, the Haft team, and other teams.
These teams contributing to the COMBREX project use a variety of different methods
to computationally produce predictions of gene function. These predictions along with
the conservative functional propagation predictions produced by COMBREX can be found
on the gene detail page.
To submit a bid, go to the “Submit a Bid” page by either clicking on the auctioneer’s gavel graphic found on the cluster page to the left of each gene symbol under the “Status” column or by clicking on the “Submit a Bid” button found on the gene page under the “Status” heading in the “Bid Status” row.
On the “Submit a Bid” page you can download the bid submission form here: Bid Submission Form.
The bid submission form requires you to provide the following information:
- A brief description of the rationale/motivation for pursuing this prediction and the importance/benefit of experimentally validating this gene’s function.
- A summary of what is experimentally known about this gene. Results from a thorough PubMed search should be described; they are an essential element of a successful application.
- A brief description of proposed experimental procedures. Details should include method of protein purification, the proposed assay, any specific reagents that may exist in the lab that will facilitate experimentation, and an estimate of the time required.
- A justification of the budget. The most successful bids will be for costs less than $10,000, and the justification may be brief. If unusual circumstances require funds in excess of this, please specify in detail the reasons. In unusual and well specified cases, we will be able to make larger awards if the potential payoff is very high (e.g. multiple genes are tested, a new technology is developed, targets of clinical or technological significance with short term translational opportunities, etc). In such cases, we request that you submit a pre-proposal.
- Up to three PubMed IDs which demonstrate your lab’s experience with the proposed assays.
Once you have completed the bid submission form, attach your completed bid form to the “Submit Bid” page, and remember to specify the amount of funds you are requesting.
After you submit your bid, please attach a biosketch or CV for the laboratory’s PI. The biosketch or CV can be in any standard NIH, NSF, or other granting agency's format. Other support information is not needed.
If your bid is judged to be competitive by the COMBREX executive committee and its external reviewers, you will be asked to submit a full proposal, which will include a detailed budget and the normal institutional assurances and paperwork required by NIH to establish a subcontract with Boston University. These details are not needed at the time of the initial bid.
You will also find a location to upload a 1-2 page Word or PDF document with details of the proposed experiments.
For detailed step-by-step instructions on how to submit a bid, please refer to the How To Submit A Bid instructions guide.
Genes currently being investigated by an experimental group are not available for
an additional bid until their six-month period of investigation is complete. The “bid
in progress” logo indicates which genes are currently being bid on and consequently
are not available for further bid submission. It is a goal of COMBREX to foster
cooperation and collaboration, rather than competition, and groups awarded a bid
will have a six-month exclusive period funded by COMBREX to perform their investigations.
If you are interested in the experimental validation of a certain gene that is already
in the process of validation by another group, please contact the administrators.
Generally, submitting bids for green or gold genes is discouraged, because the functions of these genes have already been validated experimentally. Further validation of these genes will most likely not provide significant additional insight into gene function. However, if there is a new prediction or other information indicating that the current annotation of certain gold or green genes may be incorrect or incomplete, bid submission for those green or gold genes will be considered.
COMBREX administrators should be notified of such possible mis-annotations of green or gold genes, and the evidence of incomplete or mis-annotation will be reviewed. If the evidence is compelling, the bid submission and review process will proceed as usual.
One criterion for labeling a gene "Green" is if it shares 98% or more full-length sequence identity to a confirmed, experimentally-validated Gold gene. A relevant scenario that may occur is the prediction of altered substrate specificity due to amino acid changes near an active site. Bid submissions for these genes are likely to be considered, because the changes can be plausibly linked to altered function, and experimental validation of this could be an important step towards understanding a larger family of enzymes.
The High Priority list consists of uncharacterized genes with specific, biochemically testable
predictions of molecular function that have been nominated by registered COMBREX users
to have high priority for experimental validation. This High Priority status implies that
their biochemical characterization would be of great benefit to the scientific community.
The list is not ranked, but it can be sorted by species name, gene name, and
functional assignment. Registered COMBREX users who wish to apply for funding to experimentally
validate a High Priority gene may do so either through the appropriate gene page, or directly
from the High Priority list (instructions can be found in the How To Submit A Bid document). For
step-by-step instructions on how to nominate a gene for the High Priority List, please see the
"How to nominate a gene for High Priority List" document in the Help Center. If you have
any questions regarding the High Priority List, please contact us at:
The NCBI Protein Clusters database groups proteins into mutually sequence-similar groups called clusters.
Because the proteins in a given cluster are likely to perform the same or similar function, membership in
a cluster with a defined function can serve as a functional prediction for a given protein.
For more information on the NCBI Protein Clusters Database and their methodology of producing protein clusters,
please refer to: PMID: 18940865.
For a protein that has no function prediction produced by COMBREX or other computational teams,
and for which NCBI Protein Clusters provides no functional clues, we use its NCBI (RefSeq protein)
annotation as the default prediction of function.
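The fallback described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not COMBREX's actual implementation: the function name, argument names, and source labels are hypothetical, and the relative order of the first two checks (team predictions before cluster function) is assumed, since the text only specifies that the RefSeq annotation is used when both are absent.

```python
from typing import Optional, Sequence, Tuple


def default_prediction(
    team_predictions: Sequence[str],
    cluster_function: Optional[str],
    refseq_annotation: str,
) -> Tuple[str, str]:
    """Return (source, predicted function) for one protein.

    All names here are hypothetical and chosen for illustration only.
    """
    # Predictions from COMBREX or other computational teams, if any exist.
    # Assumption: these are checked before the cluster-derived function.
    if team_predictions:
        return ("computational prediction", team_predictions[0])
    # A functional clue from the protein's NCBI Protein Clusters membership.
    if cluster_function:
        return ("NCBI Protein Clusters", cluster_function)
    # Default: fall back to the NCBI (RefSeq protein) annotation.
    return ("RefSeq annotation", refseq_annotation)
```

For example, a protein with no team predictions and no cluster function would receive its RefSeq annotation as the default prediction.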
The “Phenotype Data” section, within the COMBREX gene detail page, contains information about documented
phenotypes associated with the gene of interest. Specifically, phenotype name, a brief description of
the phenotype, expression class (e.g. wild type, knockout, etc.), and a link to the reference which documents
the association of the gene with the listed phenotype are listed in this section. Currently, this phenotype
data consists of antibiotic resistance, antibiotic hypersensitivity, and gene essentiality. Brief descriptions
of these phenotypes and links to the sources from which these data were obtained are provided below.
Antibiotic Resistance genes
These genes can confer resistance to one or multiple antibiotics through several mechanisms.
The antibiotic resistance data was obtained from the Antibiotic Resistance Genes Database.
Antibiotic Hypersensitivity genes
Loss of these genes confers increased sensitivity to one or more antibiotics.
This data was kindly provided by Dr. Jeffrey H. Miller (UCLA). The Miller lab has screened the
Keio collection of approximately 4000 single-gene knockout mutants of E. coli K-12 for
increased sensitivity to 22 different antibiotics. More information on the antibiotic
hypersensitivity genes can be found in the original publication: Liu A et al (2010)
Antimicrob Agents Chemother 54(4), 1393-1403. PMID 20065048.
Essential genes
These genes are identified as being essential for growth or viability in one or more of the following
organisms: Escherichia coli str. K-12, Helicobacter pylori 26695, Acinetobacter baylyi ADP1, Bacillus
subtilis 168, Haemophilus influenzae Rd, and Pseudomonas aeruginosa. Information about these candidate
essential genes has been gathered from multiple sources, references for which can be found at the bottom of the “list of Phenotypes” page.
To view a complete list of all the phenotypes available in COMBREX, click on “Advanced Search” next to
the search bar and then click on “View list of phenotypes” within the advanced search options.