About the project

The Gold Standard Gene Database (GSGD), initiated by COMBREX and carried out in partnership with UniProt and NCBI, aims to identify and collect information about genes and proteins for which the biochemical function has been experimentally determined. Such a database, when complete, will make it easier to distinguish genes whose function has been experimentally determined from those whose function has merely been assumed or predicted based on sequence or structural similarity.

Standards for inclusion in the GSGD are rigorous and require all of the following: (1) a known DNA sequence coding for the validated protein, (2) a record of the species and strain of the organism from which the validated protein was isolated or the gene was cloned, and (3) a published reference describing the experimental determination of function.

How you can participate

Construction of the GSGD can only be accomplished through a large manual curation effort, because exhaustively identifying potential Gold Standard genes in the vast scientific literature is beyond the resources of any one group. We are therefore asking scientists to join us in a massively parallel effort.

The GSGD is currently visible to the public only through COMBREX, where genes accepted for inclusion are marked as "Gold" in the Gene Status field. To query the set of "Gold" genes, use the Advanced Search function: select "Gold" from the Gene Status drop-down menu and search as you otherwise would (by keyword, gene name and/or species). If you come across a gene in COMBREX that you believe belongs in the GSGD but is not marked with "Gold" status, we strongly encourage you to nominate it.

Instructions for nominating a gene for the GSGD

To nominate a gene for inclusion in the GSGD, you will need its UniProt accession number, which you can find either through UniProt (www.uniprot.org) or through COMBREX, as described below. (If you already know the UniProt accession number, or have found it through the UniProt website, skip to step 5.)

  1. Using the search function on the COMBREX home page, enter information pertaining to your candidate gene, preferably a unique identifier such as a locus tag or RefSeq accession number, or a keyword or gene name in combination with a species name.
  2. The results of your search are organized as NCBI Protein Clusters. Identify and click on the Protein Cluster containing your gene. (If your search returns many clusters, you may wish to return to step 1 and filter using the species name.)
  3. Your click should take you to the cluster page, which lists the genes belonging to this cluster. From here, click on the gene you wish to nominate.
  4. This takes you to the gene page. In the "Summary" section, the UniProt accession number is labeled "UniProtKB."
  5. Compose an email to Dr. Rich Roberts (roberts@neb.com) with the words "Gold Standard Nomination" in the subject line and the UniProt accession number(s) in the body. If you know the PubMed identifier (PMID) of any reference describing the experimental determination of function, include it on the same line as the corresponding UniProt accession number, prefaced by "PMID"; this information is helpful but not required. If you have more than one gene to nominate, you can list all of them in a single email.
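If you are nominating many genes at once, a small script can check the accessions and lay out the email body as described in step 5. The sketch below is a hypothetical helper, not part of COMBREX or UniProt. It assumes the accession pattern that UniProt documents for UniProtKB accession numbers, one line per nominated gene, and the word "PMID" repeated before each PubMed identifier when a gene has more than one reference. The functions nomination_line and nomination_body are illustrative names only.

```python
from __future__ import annotations

import re

# Accession pattern documented by UniProt for UniProtKB entries:
# 6-character (e.g. P12345) and 10-character (e.g. A0A022YWF9) accessions.
ACCESSION_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

# Subject line required by the nomination instructions.
SUBJECT = "Gold Standard Nomination"


def nomination_line(accession: str, pmids: list[str] | None = None) -> str:
    """Format one gene as 'ACCESSION' or 'ACCESSION PMID n PMID m' (hypothetical helper)."""
    accession = accession.strip().upper()
    if not ACCESSION_RE.match(accession):
        raise ValueError(f"not a UniProtKB accession: {accession!r}")
    parts = [accession]
    for pmid in pmids or []:
        # PubMed identifiers are plain integers.
        if not pmid.isdigit():
            raise ValueError(f"PMID must be numeric: {pmid!r}")
        # Assumption: each PMID gets its own "PMID" prefix on the same line.
        parts.append(f"PMID {pmid}")
    return " ".join(parts)


def nomination_body(nominations: dict[str, list[str]]) -> str:
    """Join one line per gene so several nominations fit in a single email."""
    return "\n".join(nomination_line(acc, pmids) for acc, pmids in nominations.items())


if __name__ == "__main__":
    # The PMID below is a placeholder, not a real reference.
    print("Subject:", SUBJECT)
    print(nomination_body({"P12345": ["12345678"], "A0A022YWF9": []}))
```

Validating the accession before sending catches simple copy-and-paste errors; the actual check of the experimental evidence is still done by the curators who receive the email.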


Richard J. Roberts (New England Biolabs and COMBREX) – roberts@neb.com
William Klimke (NCBI) – klimke@ncbi.nlm.nih.gov
Maria Martin (UniProt) – martin@ebi.ac.uk