We encourage scientists from both the computational and experimental communities to submit gene function predictions to COMBREX. Predictions are accepted for all genes represented in the COMBREX database, but predictions for genes about which little is already known are of particular value.

At this initial stage of the COMBREX project we are focusing on predictions of biochemical function, which is typically described by EC numbers or GO Molecular Function (MF) terms. We also accept predictions as free-text descriptions, but this format is not encouraged and should be used only when an appropriate structured vocabulary term is either not available or not specific enough (see more information on this below).

How to submit Predictions (step-by-step instructions)

1: Submitting Prediction of function for a single gene

Selection of target gene

1.1 Identifying your gene of interest

1.2 Logging into COMBREX

Preparing data for submission

1.3 Completing required fields in submission form

1.4 Preparing prediction file

Submit file

1.5 Submitting file of predictions

1.6 Receiving acknowledgment of your submission

2: Submitting Predictions of function for multiple genes

Initial selection

2.1 Selecting set of genes for prediction

2.2 Logging into COMBREX

2.3 Accessing Prediction Submission form

Preparing data for submission

2.4 Completing required fields in submission form

2.5 Preparing prediction file

Submit file

2.6 Submitting file of predictions

2.7 Receiving acknowledgment of your submission

3: File Formats for Batch Submission of Functional Predictions



1: Prediction of function for a single gene

Submitting a prediction for a single gene is as simple as finding the gene in the COMBREX database and filling in a form that can be accessed from the gene detail page.

Selection of target gene

1.1 Identifying your gene of interest

You may find the following documents helpful when searching for your gene of interest in COMBREX:

How to find a gene of interest in COMBREX

Guide to the COMBREX website

Once you arrive at the gene detail page for your gene of interest, click on the “Submit Prediction” button under the “Actions You Can Take” section to initiate the prediction submission process.




1.2 Logging into COMBREX

You must be a registered COMBREX user to submit predictions, nominate genes for the High Priority list, nominate genes for the Gold Standard dataset, and apply for funding. To register, click the "Log In" link at the top right corner of the Home Page and follow the instructions, or go directly to the COMBREX registration form. If you are not currently logged in, you will be asked to log in before you can continue.



Preparing data for submission

1.3 Completing required fields in submission form

Once you have logged in, the “Prediction/Annotation Submission Form” page will appear with the gene ID, gene symbol, and gene name of your gene of interest already filled in.



Use the box marked “Comment” to enter free-text comments about the prediction.

1.4 Preparing prediction file

Attach the tab-delimited file containing the predictions for the gene by clicking on “Choose file”. To review the list of acceptable file formats for prediction submission, please see: File Formats For Batch Submission Of Functional Predictions.

Submit file

1.5 Submitting file of predictions

When done, click on the "Submit" button to submit your prediction.


1.6 Receiving acknowledgment of your submission

After submitting your prediction, you will receive an email confirming that it has been received by COMBREX. You will receive another email once your prediction has been successfully integrated into COMBREX.



2: Prediction of function for multiple genes

The following instructions describe how to submit predictions for multiple genes at once.

Initial Selection

2.1 Selecting set of genes for prediction

Select the batch of genes for which you want to submit predictions. All of the genes must already exist in COMBREX.

2.2 Logging into COMBREX

You must be a registered COMBREX user to submit predictions, nominate genes for the High Priority list, nominate genes for the Gold Standard dataset, and apply for funding. To register, click the "Log In" link at the top right corner of the Home Page and follow the instructions, or go directly to the COMBREX registration form. If you are not currently logged in, you will be asked to log in before you can continue.



2.3 Accessing Prediction Submission form

Once logged in, you will be taken to your user portal, with your user profile initially displayed.

To see a list of your previously submitted predictions, or to submit more predictions, click on “My Predictions”.

From the options displayed on the page, click on the “Submit predictions for MULTIPLE genes” link.

Preparing data for submission

2.4 Completing required fields in submission form

Use the box marked "Comment" to enter free-text comments about the predictions.

2.5 Preparing prediction file

Attach the tab-delimited file containing the predictions for your set of genes by clicking on “Choose file”. To review the list of acceptable file formats for prediction submission, please see: File Formats For Batch Submission Of Functional Predictions.

Submit file

2.6 Submitting file of predictions

When done, click on the "Submit" button to submit your predictions.


2.7 Receiving acknowledgment of your submission

After submitting your predictions, you will receive an email confirming that they have been received by COMBREX. You will receive another email once your predictions have been successfully integrated into COMBREX.

3: File Formats For Batch Submission Of Functional Predictions

Predictions of molecular function for proteins can be submitted to COMBREX in one of the following three formats. Following the description of each format, there is a link to download a sample file with examples of submitted predictions for reference.

Case 1: Predicting functions for a set of individual genes (associating genes with functions)

Header lines:

* # Lab Name: This field states how the submitter wants to be identified, for example by lab name, email address, or name of the Principal Investigator.

* # Contact Email: An email address that can be used to correspond with the group, for example to confirm receipt of predictions or to report incorrect file formats.

The following fields should be separated by tabs. Every row represents a single prediction for a single protein. If a protein has multiple predictions, put them in separate rows.

* Gene Identifier: This field can contain either the Entrez GeneID or the NCBI RefSeq Protein GI. The field can only contain numerical values.

* Type of Identifier: This can be “0” for GeneID or “1” for ProteinGI; the default is 0. This field can only contain numerical values.

* Prediction: This field contains the predicted function as a free-text description. Limit: 250 characters.

GO/EC Number (optional): This field can contain either the GO term or the EC number corresponding to the predicted function (if available). GO terms must be in the format “GO:0005515” and EC numbers must be in the format “EC:3.4.11.4”.

Confidence Score (optional): This field contains the confidence value or probability score that the protein has the predicted function, normalized to a percentage between 0 and 100.

Evidence: This field contains evidence for the submitted prediction. For example: Prediction based on homology to gene “X”.

* Reference: This field contains the PMID (PubMed ID) of the paper that describes the method used to obtain the prediction. For example: 20675471. This field can only contain numbers.

Links (optional): Use this field if extra information is available as an external link.

Sample File
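As a rough illustration of the Case 1 layout, the following Python sketch builds the content of a small tab-delimited file. It assumes that the two header lines are written literally as `# Lab Name: ...` and `# Contact Email: ...`, that the columns appear in the order listed above, and that blank optional fields are kept as empty columns. None of these details is confirmed here, so compare the result with the sample file before submitting. The function name `build_case1_file`, the lab name, the email address, the gene identifier, and the prediction text are hypothetical placeholders; the GO term, EC number, and PMID are the format examples quoted above.

```python
# Column order assumed to follow the Case 1 field list.
CASE1_COLUMNS = [
    "gene_id", "id_type", "prediction", "go_ec",
    "confidence", "evidence", "reference", "links",
]


def build_case1_file(lab_name, contact_email, rows):
    """Return Case 1 file content: two header lines, then one row per prediction."""
    # Assumed literal form of the two mandatory header lines.
    lines = [f"# Lab Name: {lab_name}", f"# Contact Email: {contact_email}"]
    for row in rows:
        # Optional fields that are missing from a row become empty columns.
        fields = [str(row.get(col, "")) for col in CASE1_COLUMNS]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Two predictions for the same (made-up) gene go in two separate rows.
example_rows = [
    {"gene_id": 123456, "id_type": 0,
     "prediction": "Placeholder description of predicted function A",
     "go_ec": "EC:3.4.11.4", "confidence": 85,
     "evidence": "Prediction based on homology to gene X",
     "reference": 20675471},
    {"gene_id": 123456, "id_type": 0,
     "prediction": "Placeholder description of predicted function B",
     "go_ec": "GO:0005515", "reference": 20675471},
]

content = build_case1_file("Example Lab", "someone@example.org", example_rows)
```

Writing `content` to a file with `open(path, "w")` would then give a file that can be attached through “Choose file”.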

Case 2: In addition to directly predicting functions, COMBREX supports predicting functional linkages between genes. To do this, assign an identifier to each group of linked genes (and optionally name the group), and then submit predictions where each gene is predicted to be associated with that group identifier. You can also optionally describe predictions for the linked genes if available.

Header Lines:

* # Lab Name: This field states how the submitter wants to be identified, for example by lab name, email address, or name of the Principal Investigator.

* # Contact Email: An email address that can be used to correspond with the group, for example to confirm receipt of predictions or to report incorrect file formats.

The following fields should be present as columns separated by tabs. Every row represents a single prediction for a protein. If a protein has multiple predictions, put them in separate rows.

* Identifier for Gene1: This field can contain either the Entrez GeneID or the NCBI RefSeq Protein GI. The field can only contain numbers.

* Identifier for Gene2: This field can contain either the Entrez GeneID or the NCBI RefSeq Protein GI. The field can only contain numbers.

* Type of Identifier: This can be “0” for GeneID or “1” for ProteinGI; the default is 0. This field can only contain numerical values.

Direction of Linkage (optional): This field contains a numerical value that depends on the direction of the linkage. The value is “0” if the direction of linkage is unknown, “-1” if the direction of linkage is Gene2 –> Gene1, and “1” if the direction of linkage is Gene1 –> Gene2. Note: When the linkage is bi-directional, please use 2 separate rows, one row indicating linkage Gene1 –> Gene2 and the other row Gene2 –> Gene1.

Prediction (optional): This field contains the predicted function (if the protein set is involved in a common biochemical function) as a free-text description. If the proteins are just linked together by a common process or reaction, this field can be left blank. Limit: 250 characters.

GO/EC Number (optional): This field can contain either the GO term or the EC number corresponding to the predicted function (if a prediction is available). GO terms must be in the format “GO:0005515” and EC numbers must be in the format “EC:3.4.11.4”. If the proteins are just linked to one another by some common process or reaction, this field can be left blank.

Confidence Score (optional): This field contains the confidence value or probability score for the predicted linkage, normalized to a percentage between 0 and 100.

Evidence (optional): This field contains evidence for the submitted linkage or prediction.

* Reference: This field contains the PMID (PubMed ID) of the paper that describes the method used to obtain the prediction. For example: 20675471. This field can only contain numbers.

Links (optional): Use this field if extra information is available as an external link.

Sample File
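The direction codes and the two-row rule for bi-directional linkages can be sketched in Python as follows. This is only an illustration under stated assumptions: the columns are taken to appear in the order listed above, and a bi-directional linkage is written as two rows that keep the same Gene1/Gene2 order and differ only in the direction code (“1” and “-1”). The function name `linkage_rows`, the direction labels, and the gene identifiers are hypothetical.

```python
# Direction of Linkage codes as described in the field list:
# 0 = unknown, 1 = Gene1 -> Gene2, -1 = Gene2 -> Gene1.
# "both" expands to two rows (assumed: same gene order, codes 1 and -1).
DIRECTION_CODES = {
    "unknown": [0],
    "gene1_to_gene2": [1],
    "gene2_to_gene1": [-1],
    "both": [1, -1],
}


def linkage_rows(gene1, gene2, direction, reference, id_type=0,
                 prediction="", go_ec="", confidence="", evidence="", links=""):
    """Return one tab-separated row per direction code for a gene pair."""
    rows = []
    for code in DIRECTION_CODES[direction]:
        fields = [gene1, gene2, id_type, code, prediction, go_ec,
                  confidence, evidence, reference, links]
        rows.append("\t".join(str(field) for field in fields))
    return rows


# A bi-directional linkage between two made-up GeneIDs yields two rows.
rows = linkage_rows(111111, 222222, "both", reference=20675471, confidence=70)
```

Keeping the direction labels in one dictionary means the numerical codes are defined in a single place, which makes it harder to mix up “1” and “-1” when generating many rows.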

Case 3: Gene function can also be predicted through the membership of the gene in a functionally relevant group. For example, the presence of Gene X in a Protein Cluster can predict a certain function for the gene based on the function of the cluster. The format below can be used to submit predictions of this type.

Header Lines:

* # Lab Name: This field states how the submitter wants to be identified, for example by lab name, email address, or name of the Principal Investigator.

* # Contact Email: An email address that can be used to correspond with the group, for example to confirm receipt of predictions or to report incorrect file formats.

The following fields should be present as columns separated by tabs. Every row represents a single prediction for a protein. If a protein has multiple predictions, put them in separate rows.

* Gene Identifier: This field can contain either the Entrez GeneID or the NCBI RefSeq Protein GI. The field can only contain numbers.

* Type of Identifier: This can be “0” for GeneID or “1” for ProteinGI; the default is 0. This field can only contain numerical values.

* Group Identifier: A numerical value identifying a particular group or cluster. For example, Protein Cluster “PRK09604” could be assigned the number 1.

Group Name (optional): A name for the particular group or cluster used for predicting function (if available). For example, Protein Cluster “PRK09604”.

Group Description (optional): This field contains the predicted function of the group as a free-text description. Limit: 250 characters.

GO/EC Number (optional): This field can contain either the GO term or the EC number corresponding to the predicted function of the group (if available). GO terms must be in the format “GO:0005515” and EC numbers must be in the format “EC:3.4.11.4”.

Confidence Score (optional): This field contains the confidence score or probability score for the predicted group membership, normalized to a percentage between 0 and 100.

Evidence (optional): This field contains evidence for the submitted prediction.

* Reference: This field contains the PMID (PubMed ID) of the paper that describes the method used to obtain the prediction. For example: 20675471. This field can only contain numbers.

Links (optional): Use this field if extra information is available as an external link.

Sample File
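A possible way to produce Case 3 rows from a set of groups is sketched below in Python. It assumes the columns appear in the order listed above and that group identifiers can simply be assigned as consecutive numbers starting from 1; the function name `group_membership_rows`, the second group name, the descriptions, and all gene identifiers are made up for illustration.

```python
def group_membership_rows(groups, reference, id_type=0):
    """Return one tab-separated row per (gene, group) membership.

    `groups` maps a group name to a dict with a "members" list and an
    optional "description" and "go_ec" value.
    """
    rows = []
    # Assumption: any distinct number per group is acceptable, so number them 1, 2, ...
    for group_id, (name, info) in enumerate(groups.items(), start=1):
        for gene_id in info["members"]:
            fields = [
                gene_id, id_type, group_id, name,
                info.get("description", ""), info.get("go_ec", ""),
                "",         # Confidence Score left blank
                "",         # Evidence left blank
                reference,  # PMID describing the prediction method
                "",         # Links left blank
            ]
            rows.append("\t".join(str(field) for field in fields))
    return rows


groups = {
    "PRK09604": {
        "description": "Placeholder description of the cluster function",
        "members": [111111, 222222],
    },
    "HYPOTHETICAL_CLUSTER_2": {
        "description": "Another placeholder description",
        "members": [333333],
    },
}

rows = group_membership_rows(groups, reference=20675471)
```

Each gene gets its own row, so a group with two members contributes two rows that share the same Group Identifier.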

Questions and direct emails are encouraged. Please contact us at: help-desk@combrex.bu.edu

All fields marked with “*” are mandatory. Optional fields can be left blank if the information is not available.
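Before uploading, it may help to check each row against these rules locally. The Python sketch below does this for Case 1 rows only and is not the validation that COMBREX itself performs. It assumes the Case 1 column order given earlier, treats a blank Type of Identifier as an error because the field is marked mandatory, and only accepts fully specified four-part EC numbers in the style of the “EC:3.4.11.4” example. The function name `check_case1_row` is hypothetical.

```python
import re

GO_PATTERN = re.compile(r"^GO:\d{7}$")               # e.g. GO:0005515
EC_PATTERN = re.compile(r"^EC:\d+\.\d+\.\d+\.\d+$")  # e.g. EC:3.4.11.4


def check_case1_row(line):
    """Return a list of problems found in one tab-separated Case 1 row."""
    problems = []
    fields = line.rstrip("\n").split("\t")
    fields += [""] * (8 - len(fields))  # pad missing trailing columns
    (gene_id, id_type, prediction, go_ec,
     confidence, _evidence, reference, _links) = fields[:8]

    if not gene_id.isdigit():
        problems.append("Gene Identifier must be numerical")
    if id_type not in ("0", "1"):
        problems.append("Type of Identifier must be 0 or 1")
    if not prediction:
        problems.append("Prediction is mandatory")
    elif len(prediction) > 250:
        problems.append("Prediction exceeds 250 characters")
    if go_ec and not (GO_PATTERN.match(go_ec) or EC_PATTERN.match(go_ec)):
        problems.append("GO/EC Number is not in the expected format")
    if confidence:
        try:
            score = float(confidence)
        except ValueError:
            score = None
        if score is None or not 0 <= score <= 100:
            problems.append("Confidence Score must be between 0 and 100")
    if not reference.isdigit():
        problems.append("Reference must be a numerical PMID")
    return problems


good = check_case1_row("123456\t0\tPlaceholder function\tEC:3.4.11.4\t85\t\t20675471\t")
bad = check_case1_row("abc\t0\tPlaceholder function\tGO:55\t150\t\t\t")
```

Here `good` is an empty list, while `bad` reports the non-numerical gene identifier, the malformed GO term, the out-of-range confidence score, and the missing reference.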

Submitted predictions will be made available on the COMBREX website and selectively transferred (“pushed”) to several prominent databases such as NCBI. They will also have a good chance of being selected by experimental scientists for experimental testing.