Stable symbols - improving scientific communication long-term
HGNC, TGMI
Why should gene symbols be stable?
Wherever possible we avoid changing gene symbols, but occasionally this is necessary, as we described in our last post. In this post we explain how we have moved from simply avoiding symbol changes to assessing whether individual gene symbols are ever likely to need changing.
The main motivation for this shift in emphasis is the increasing importance of gene symbols in genomic medicine. Genetic testing, whether through a home testing kit, referral to a specialist clinic or even a general practitioner, is becoming more common. This surge of interest in genetic data means that changing a gene symbol carries a greater risk of confusion or, more worryingly, even misdiagnosis. These potential consequences were emphasised to us when we joined the Transforming Genetic Medicine Initiative (TGMI) and discussed these issues with clinical geneticists, who highlighted the need for stable symbols in effective clinical reporting.
The TGMI and Developmental Disorders Gene to Phenotype projects have compiled a list of ~3400 genes associated with genetic disease, so it made sense to start with these. Gene symbols that we consider fit for purpose and unlikely to change will be classed as stable and given a “luggage tag” style label on our website.
But how do we decide whether nomenclature is stable?
This is a complex process - after all, what does a “stable” symbol look like? Eventually we settled on the following four points as an initial guide:
- Is the HGNC approved gene symbol more widely used in the scientific literature than its aliases? For example, the BRCA1 gene, involved in DNA repair and associated with breast and other cancers, has at least five other aliases. Fortunately, these are used much less frequently than our approved symbol.
- Is the HGNC approved gene name similar to the protein name used by UniProt? For example, HGNC calls BRAF ‘B-Raf proto-oncogene, serine/threonine kinase’, while UniProt names the encoded protein ‘Serine/threonine-protein kinase B-raf’. The word order differs, but the names are similar enough to suggest that we agree on the function of the gene product.
- Is the gene named as a member of an HGNC gene group? Gene groups are created manually by curators, and their members often share a “root” symbol that makes group membership clear. This kind of naming is usually popular because the symbol itself shows that the genes are related.
- Has the gene symbol already been transferred to the equivalent genes (orthologs) in other vertebrates, for example chimpanzee, via the Vertebrate Gene Nomenclature Committee (VGNC)? If so, the symbol has undergone an additional layer of manual checking.
We believe that a key measure of potential stability is whether approved symbols are actually being used and, crucially, used more commonly in publications than unapproved alias symbols. We also pay closer attention to symbols that make poor search terms, e.g. those that match common English words or acronyms.
While these factors are useful as a guide, we have found that the only way to confirm whether a gene symbol is really a candidate for the “stable” label is manual checking. For example, a gene may have published aliases that we are not aware of until we review its literature.
Since we began this project in September 2017, about 1800 symbols have been assessed as “stable”, and only 19 genes have been given new symbols to stabilise their nomenclature.
One type of nomenclature that we actively checked for was symbols or names that could be offensive, something that was less of an issue before gene symbols became so widely used in the clinic. Two examples are FUK, ‘fucokinase’, which has become FCSK, ‘fucose kinase’, and ARSE, ‘arylsulfatase E’, which is now ARSL, ‘arylsulfatase L’. During our assessment we have also updated 261 gene names without changing the corresponding gene symbols. Many of these changes added information about what the gene product does; e.g. the name of BANF1 was the rather obscure ‘barrier to autointegration factor 1’ and is now ‘BAF nuclear assembly factor 1’, which reflects the role of the gene product in assembling the cell nucleus.
We now display “stable symbol” tags alongside symbols on our gene report pages and are starting to look beyond disease-related genes at the symbols (and names) of all our protein-coding genes, so watch out for more tags!
Coming up…
The weather is getting colder and the leaves are falling fast. You can read all our news in our Autumn newsletter, which will feature as our next blog post.