Guide to the COMBREX website

The information in COMBREX is freely available for anyone to search and view. Searching for genes or groups of genes is the usual first step toward actions such as requesting funding for experimental work (info), submitting predictions of gene function (info), or submitting other kinds of information. All of these actions are free of charge, but they require that you be logged in to a COMBREX account. Register for a COMBREX account here.

Throughout the COMBREX website, hyperlinks marked "(info)" will bring you to relevant information on a specific topic within the Help Center. This guide describes the layout of the various pages within the COMBREX website, and contains the following subsections:

INITIATING A SEARCH

THE SEARCH RESULTS PAGE

THE PROTEIN CLUSTERS DETAIL PAGE

THE GENE DETAIL PAGE

REGISTERING AND LOGGING IN



INITIATING A SEARCH

You can search the COMBREX database for genes or groups of genes using the search box, which can be found on the COMBREX Home Page, and at the top of the Search Results, Protein Clusters Detail, and Gene Detail pages. A Basic Search is shown by default, but clicking on Advanced Search will bring up additional filters for more targeted searches.

The search function searches on gene names, descriptions, predictions, and identifiers. Thus, you may enter in the search box a gene symbol, a keyword or word fragment, an Enzyme Commission (EC) number, or an identifier such as an Entrez GeneID, an NCBI Protein Clusters ID, a RefSeq accession number, or a UniProtKB accession number (info).

Several example search terms are provided above the search box.
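
To illustrate the kinds of search terms listed above, the following Python sketch guesses which kind of term a string looks like. It is not how COMBREX itself interprets a query; the function name and the patterns are assumptions made for illustration. NCBI Protein Clusters IDs are left out because this guide does not describe their format.

```python
import re

# Illustrative patterns only (assumptions); COMBREX's own matching rules
# are not described in this guide.
PATTERNS = [
    # Four dot-separated fields, e.g. 3.1.21.4; "-" marks an unspecified field.
    ("EC number", re.compile(r"^\d+(\.(\d+|-)){3}$")),
    # RefSeq protein accessions such as NP_414542.1 or YP_000001.
    ("RefSeq accession", re.compile(r"^[ANXWY]P_\d+(\.\d+)?$")),
    # UniProtKB accession format, e.g. P0A7G6.
    ("UniProtKB accession", re.compile(
        r"^([OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$")),
    # Entrez GeneIDs are plain integers.
    ("Entrez GeneID", re.compile(r"^\d+$")),
]


def guess_term_type(term):
    """Return a label for the first pattern that matches the search term."""
    term = term.strip()
    for label, pattern in PATTERNS:
        if pattern.match(term):
            return label
    # Anything else is treated as a gene symbol, keyword, or word fragment.
    return "gene symbol or keyword"


if __name__ == "__main__":
    for example in ["3.1.21.4", "NP_414542.1", "P0A7G6", "945006", "recA"]:
        print(example, "->", guess_term_type(example))
```

Anything that does not look like one of these identifiers falls through to the keyword case, which matches the guide's description of gene symbols, keywords, and word fragments as free-form terms.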


THE SEARCH RESULTS PAGE

Regardless of whether your search matches a single gene or many genes, all search results are organized into groups of similar genes called Protein Clusters. These groups are determined by NCBI, and the methodology for their determination is described in the following publication: Klimke W et al (2009) Nucleic Acids Research 37, D216-D223 (PMID 18940865). Further information may be found at the NCBI Protein Clusters Home Page.

Search results are organized by Protein Cluster for two reasons. First, when a search returns many genes, grouping sequence-similar genes reduces redundancy. Second, presenting genes in the context of groups lets you see easily whether there is an experimentally validated gene that is similar or identical to the one returned by the search. The latter is useful even when a search is intended to return only a single gene, as when searching with a specific identifier.

A typical entry from the search results is shown below. In this case, many Protein Clusters were returned, and the list can be rearranged by choosing a criterion from the "sort results by" drop-down menu and clicking the adjacent "sort" button.

Each entry in the search results is organized as follows. The top line, which is the primary search result, represents a Protein Cluster, and shown next to it are the number and gene status (info) of the genes in the cluster. Beneath this line, selected individual genes within this cluster may be shown, and they are grouped into two sections. The first section includes up to three genes matching the specific criteria from your search; this is useful, for example, when you are searching on an identifier specifying a unique gene. The second section includes all genes in this cluster recommended by COMBREX for experimental validation (info); such "recommended genes" are in all cases highlighted by blue boxes. Keep in mind there may be redundancy between the two sections, as in the example below.

[Screenshot: example entry from the search results]
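
The two sections of a search results entry can be sketched in Python as follows. This is only an illustration of the layout described above: the field names, the function name, and the rule for choosing which three matching genes to show (here, simply the first three) are assumptions, not COMBREX's actual implementation.

```python
def build_entry(genes, max_matches=3):
    """Split a cluster's genes into the two sections of a search results entry.

    `genes` is a list of dicts with hypothetical keys:
    "id", "matches_query" (bool) and "recommended" (bool).
    """
    # Section 1: up to three genes matching the search criteria.
    matching = [g["id"] for g in genes if g["matches_query"]][:max_matches]
    # Section 2: every gene recommended for experimental validation.
    recommended = [g["id"] for g in genes if g["recommended"]]
    # The same gene may legitimately appear in both sections.
    in_both = [gene_id for gene_id in matching if gene_id in recommended]
    return {"matching": matching, "recommended": recommended, "in_both": in_both}


# Made-up cluster used only to show the behavior.
genes = [
    {"id": "gene1", "matches_query": True, "recommended": True},
    {"id": "gene2", "matches_query": True, "recommended": False},
    {"id": "gene3", "matches_query": True, "recommended": False},
    {"id": "gene4", "matches_query": True, "recommended": False},
    {"id": "gene5", "matches_query": False, "recommended": True},
]
entry = build_entry(genes)
print(entry)
```

In this made-up cluster, gene1 appears in both sections, which is the redundancy mentioned above, and gene4 is omitted from the first section because only three matching genes are shown.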

A note about keyword searches: searching on multiple keywords will retrieve Protein Clusters in which every keyword appears somewhere in the predictions or annotations of the component genes. The different keywords do not necessarily co-occur in any single component gene.
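
The difference between matching at the cluster level and matching within a single gene can be sketched in Python. The function names, the case-insensitive substring matching, and the example annotations are assumptions made for illustration; they are not taken from COMBREX.

```python
def cluster_matches(cluster_genes, keywords):
    """True if every keyword occurs in the annotation of at least one gene."""
    texts = [text.lower() for text in cluster_genes.values()]
    return all(any(kw.lower() in text for text in texts) for kw in keywords)


def genes_matching_all(cluster_genes, keywords):
    """Genes whose own annotation contains every keyword."""
    return [
        gene
        for gene, text in cluster_genes.items()
        if all(kw.lower() in text.lower() for kw in keywords)
    ]


# Hypothetical cluster: gene name -> annotation text.
cluster = {
    "geneA": "putative DNA methyltransferase",
    "geneB": "restriction endonuclease subunit",
}
keywords = ["methyltransferase", "endonuclease"]

print(cluster_matches(cluster, keywords))     # cluster-level match
print(genes_matching_all(cluster, keywords))  # per-gene match
```

Here the cluster as a whole matches both keywords, yet no single gene carries both, so a cluster can be returned even though none of its genes would match the full query on its own.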


THE PROTEIN CLUSTERS DETAIL PAGE

The information on this page pertains to a specific NCBI Protein Cluster, and is divided into five sections. Much of the information is self-explanatory, and further information about specific fields can be accessed using the "(info)" links on the page itself. Below are some additional comments on specific fields.

The Summary section includes information about the gene status of all of the component genes (i.e., how many are experimentally validated, how many have good predictions, etc.) using the COMBREX color scheme (info). The cluster itself is assigned a color, which is an aggregate description of the status of its component genes, and this can be seen on the top line of this section. Also listed here are other Protein Clusters that NCBI has deemed to be sequence-related to this one (info).

The Functional Information section contains functional or structural information that can be construed to apply to all genes within the cluster. Possible exceptions to this information make good candidates for experimental validation. Note that the top line of this section, which is identical to the title of the page, is the definition assigned to this cluster by NCBI, and may not always agree with predictions or other annotations associated with its component genes.

The last section, Genes In This Cluster, provides a complete list of the genes in the cluster; in some cases this list may be quite long. Genes in the table can be rearranged using the "(sort)" links in the table header. Those genes in the cluster that COMBREX recommends for experimental validation are highlighted.



THE GENE DETAIL PAGE

This page contains information pertaining to a specific gene, and also serves as the primary gateway for contributing information to COMBREX or applying for funds from it. As with the Protein Clusters Detail Page, much of the information on the Gene Detail Page is either self-explanatory or is linked to further information via "(info)" links. Additional comments on the various sections of this page follow.

The Actions section contains action buttons that allow you to contribute information as predictions (info) or comments (info), or initiate the grant application process (info). These actions pertain to the gene that is the subject of the Gene Detail Page specifically, and the associated forms are pre-populated with information about that particular gene. These actions require registration as a COMBREX user and login. See below for further information on becoming a registered COMBREX member.

The Function Predictions section includes predictions generated by COMBREX (info), predictions obtained from external databases such as NCBI Protein Clusters, and predictions submitted directly to COMBREX by computational laboratories. Predictions may be in the form of free text, or may consist of structured vocabulary such as Gene Ontology (GO) terms or EC numbers.

If the gene has been experimentally validated (status gold or green), there may be an additional section called Experimental Validation. This section contains references (in the form of PMID links) to the experimental determination of function, as found in source databases such as UniProt, EcoCyc, or the Gold Standard Gene Database.

The Supporting Information section contains functionally pertinent information, including domain structure and GO terms, from external sources. It also contains fields that differ based on the gene's functional status, as follows. For experimentally validated genes (status gold or green), it lists the other (unvalidated) genes in COMBREX to which COMBREX has transferred the function of this gene based on sequence similarity (info). For predicted genes (status blue), it lists the source of the prediction (info); in other words, where in the COMBREX color determination pipeline this gene came to be labeled as "blue". For unvalidated genes (status blue or black), it gives the results of a BLASTP search against a database of experimentally validated genes, showing by default the top hit, and optionally all hits, below an E-value threshold of 1e-05.
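
The display rule for BLASTP hits (top hit by default, all hits optionally, in both cases below the E-value threshold) can be sketched in Python. The class, function, and parameter names are hypothetical, and the sketch assumes that "below" means strictly less than the threshold and that the top hit is the one with the lowest E-value.

```python
from dataclasses import dataclass

E_VALUE_THRESHOLD = 1e-05  # threshold stated in this guide


@dataclass
class Hit:
    subject_id: str   # identifier of the validated gene that was hit
    e_value: float    # BLASTP E-value for the alignment


def hits_to_show(hits, show_all=False, threshold=E_VALUE_THRESHOLD):
    """Return the hits to display, best (lowest E-value) first."""
    # Keep only hits strictly below the threshold (assumed to be exclusive).
    passing = sorted(
        (h for h in hits if h.e_value < threshold),
        key=lambda h: h.e_value,
    )
    # Default view: top hit only; optional view: every passing hit.
    return passing if show_all else passing[:1]


# Made-up hits used only to show the behavior.
hits = [Hit("valA", 3e-40), Hit("valB", 2e-07), Hit("valC", 0.5)]
print([h.subject_id for h in hits_to_show(hits)])
print([h.subject_id for h in hits_to_show(hits, show_all=True)])
```

With these made-up hits, valC is never shown because its E-value is above the threshold, regardless of which view is selected.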

If the gene can be "functionally linked" to other genes using sequence-independent methods such as gene neighborhood conservation or operon structure, phylogenetic profiling, or gene fusion, this information is presented in an optional section called Functionally Linked Genes. Such linkages typically represent participation in a common biological process rather than a common biochemical function. This section also contains links enabling visualization of these genes as a network using the VisANT tool. VisANT is described in the following publication: Hu Z et al (2009) Nucleic Acids Research 37, W115-W121 (PMID 19465394).

If the gene participates in a known biological pathway, this information is presented in an optional section called Pathway Information.




REGISTERING AND LOGGING IN



Actions such as submitting information to COMBREX or applying for funding for experimental validation require registration and the creation of an account with COMBREX. To register, simply click the "Log In" link at the top right corner of the Home Page and follow the instructions, or click here (COMBREX registration) to access the registration form.

Once your account is set up, simply enter your username and password to log in.