Minimising Changes: why do we ever change gene symbols?
HGNC
We avoid changing HGNC-approved gene symbols unless there are very good reasons to do so. We appreciate that change can be disruptive, but sometimes it’s also necessary. This blog post explains why we make changes to gene symbols and gives some examples of changes that we’ve made recently.
Gene symbols are a short form of the gene name and are the most commonly used way to refer to a gene. They’re used in research papers and scientific talks, in the clinic, and increasingly in the media. To make them user-friendly, we aim for symbols to be pronounceable, memorable… and ideally stable!
All named genes have a single official unique symbol, shown at the top of our gene reports, and we also list alternative symbols that have been used in the literature and in other databases. Where symbol changes have been made, we always list the previous symbols as well so that these are easily found.
Placeholder symbols: Corfs, KIAAs and FAMs
The majority of recent symbol changes have been made because we have been updating functionally uninformative “placeholder symbols”. HGNC policy has always been to view placeholder symbols as temporary assignments that should be updated when the function of the protein encoded by a gene is identified.
HGNC have three main types of placeholder symbols, created to enable people to talk about newly discovered genes.
These symbols are marked with a luggage tag icon directly next to the approved symbol in their gene reports on our site.
- Corfs start with ‘C’ followed by the chromosome on which the gene is found, the letters ‘orf’ for open reading frame, and a number (e.g. C1orf43).
- KIAAs were approved for a set of genes identified by the Kazusa cDNA sequencing project when no other information was known about them (e.g. KIAA0100).
- Finally, the FAM root is used to group together a set of genes that are related by sequence similarity but cannot be described by function or by named conserved domains (e.g. FAM49A and FAM49B).
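If you keep a local gene list and want to flag which entries still carry placeholder symbols, the three patterns above can be approximated with regular expressions. This is a minimal sketch that assumes the patterns are exactly as inferred from the examples in this post (C1orf43, C6orf203, KIAA0100, KIAA0391, FAM49A, FAM208A); the pattern definitions and the function name `placeholder_type` are illustrative, not an official HGNC specification.

```python
import re
from typing import Optional

# Assumed patterns, inferred from the example symbols in this post.
# They are an illustration, not HGNC's formal definition of placeholders.
PLACEHOLDER_PATTERNS = {
    # 'C' + chromosome (1-22, X or Y) + 'orf' + a number, e.g. C1orf43
    "Corf": re.compile(r"C(?:[1-9]|1[0-9]|2[0-2]|X|Y)orf\d+"),
    # 'KIAA' + a number, e.g. KIAA0100
    "KIAA": re.compile(r"KIAA\d+"),
    # 'FAM' + a number + optional letter suffix, e.g. FAM49A
    "FAM": re.compile(r"FAM\d+[A-Z]*"),
}


def placeholder_type(symbol: str) -> Optional[str]:
    """Return which placeholder pattern a symbol matches, or None if none match."""
    for kind, pattern in PLACEHOLDER_PATTERNS.items():
        # fullmatch requires the whole symbol to fit the pattern
        if pattern.fullmatch(symbol):
            return kind
    return None


for sym in ["C1orf43", "KIAA0100", "FAM49A", "MTRES1"]:
    print(sym, placeholder_type(sym))
```

Using `fullmatch` rather than `search` avoids flagging a symbol that merely contains one of these strings somewhere inside it.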
Renaming these temporary placeholder symbols allows us to assign informative, user-friendly symbols such as MTRES1 (mitochondrial transcription rescue factor 1) in place of C6orf203; PRORP (protein only RNase P catalytic subunit) in place of KIAA0391; and TASOR (transcription activation suppressor) in place of FAM208A.
If you have functional data about a gene that currently has a placeholder symbol, then please get in touch with us at hgnc@genenames.org so that we can work with you to assign appropriate new nomenclature.
Other reasons for symbol updates
There are several other reasons why we may consider changing an HGNC-approved gene symbol.
A symbol may be updated if:
- the current symbol is pejorative and may cause offence, particularly in a clinical setting, e.g. DOPEY1 was renamed DOP1A (DOP1 leucine zipper like protein A).
- the current symbol is misleading, for example because it suggests that the gene belongs to a gene group of which it has since been found not to be a member, e.g. OTX3 was erroneously named as part of the OTX family and, following advice from homeobox gene family experts, has been renamed DMBX1 (diencephalon/mesencephalon homeobox 1).
- the community working on a gene or group of genes has requested a change and is overwhelmingly in support of it, and the suggested new symbol meets our nomenclature guidelines, e.g. BAI1 was renamed ADGRB1 (adhesion G protein-coupled receptor B1) following a new unified nomenclature for Adhesion GPCRs instigated by the International Union of Basic and Clinical Pharmacology.
- far more publications on the gene use an alias symbol, there is no compelling reason not to approve the alias, and the community working on the gene supports an update, e.g. we updated RNASEN to DROSHA (drosha ribonuclease III) due to overwhelming community usage.
- the current symbol is not functionally informative and has not become well established in the literature, so an update can be considered once the function of the encoded protein has been determined, e.g. TMEM206 (transmembrane protein 206) was renamed PACC1 (proton activated chloride channel 1).
- the gene is discovered to encode an enzyme; where possible, the gene name will follow the Enzyme Commission accepted name, e.g. MB21D1 was updated to CGAS (cyclic GMP-AMP synthase).
In all these cases, we write to all authors who have published on the gene or genes in question to gauge their opinions on an update and attempt to reach a consensus. Where there are too many publications for us to realistically write to every group, we run an automated script to identify the authors who have published most heavily on the gene and consult them.
In most cases, when a symbol has become well established in the literature but the associated name is functionally uninformative or misleading, we choose to retain the symbol and update the gene name only e.g. the SMCR8 gene name was updated from “Smith-Magenis syndrome chromosome region, candidate 8” to “SMCR8-C9orf72 complex subunit”.
As approved gene symbols may occasionally be updated, HGNC IDs are useful as consistent, stable identifiers that can be quoted in publications or used when working with our data. Every named gene has a unique HGNC ID in the format HGNC:number, which does not change even if the symbol and/or name is updated.
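One practical way to use this when working with gene lists is to key your own data on HGNC IDs and translate any symbols, including previous ones, to IDs on the way in. The sketch below assumes records shaped like rows of a tabular gene export with fields named `hgnc_id`, `symbol` and `prev_symbol`; those field names, the helper functions `build_index` and `resolve`, and the numeric IDs are all hypothetical placeholders rather than real HGNC data. Only the symbol pairs (RNASEN to DROSHA, DOPEY1 to DOP1A) come from this post.

```python
from typing import Dict, Iterable, List

# Hypothetical records: field names are assumptions and the IDs are
# placeholders, NOT real HGNC IDs. Only the symbol pairs come from the post.
RECORDS = [
    {"hgnc_id": "HGNC:900001", "symbol": "DROSHA", "prev_symbol": ["RNASEN"]},
    {"hgnc_id": "HGNC:900002", "symbol": "DOP1A", "prev_symbol": ["DOPEY1"]},
]


def build_index(records: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each approved and previous symbol (upper-cased) to the HGNC ID(s) it refers to."""
    index: Dict[str, List[str]] = {}
    for rec in records:
        for sym in [rec["symbol"], *rec.get("prev_symbol", [])]:
            ids = index.setdefault(sym.upper(), [])
            if rec["hgnc_id"] not in ids:
                ids.append(rec["hgnc_id"])
    return index


def resolve(symbol: str, index: Dict[str, List[str]]) -> List[str]:
    """Return candidate HGNC IDs for a symbol; an empty list means no match."""
    return index.get(symbol.upper(), [])


index = build_index(RECORDS)
print(resolve("RNASEN", index))
print(resolve("DROSHA", index))
```

Returning a list rather than a single ID is a deliberate choice in this sketch: it leaves room for a symbol that points to more than one gene, so ambiguous input is surfaced instead of silently resolved. Matching is case-insensitive here, which is also an assumption you may want to change.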
Coming up…
In next month’s blog post we will describe our current project in collaboration with the Transforming Genetic Medicine Initiative (TGMI) to provide a systematic review and planned long-term stabilisation of clinically relevant genes. Please join us to find out more about how we are assessing the nomenclature and how we plan to label symbols that we believe are unlikely to change as “stable”.