DNA databases

March 4, 2023

It’s been 70 years since Watson and Crick revealed the double-helix structure of the DNA molecule. Now, instead of wondering at its structure, we chatter about DNA sequencing.

We talk about how soon we will each have our own genomic data stored on our home computer. Or a time when a parent will hold their newborn in one hand and their baby’s genomic code in the other. Or when our phones will beep to alert us that there is a new study showing that a portion of our sequence (or our child’s) has been linked to this or that disease.

What we should be talking about is what will hold us back from realising these next steps.

Our increasing technological ability, and the falling cost of sequencing genomes, will play a part in making this happen in the next decade or two. But ability and cost are not the key to this future. Instead, comprehensive, centralised health records of populations will be the difference between realising this future and not. Without health records, we will not be able to know what the health implications of these sequences are.

Science can read the DNA code, or ‘sequence’ DNA, faster and cheaper and create more raw data than ever before. But how do we interpret it? It’s like having a book in a foreign language without a dictionary. The dictionary in this analogy is the complete health record.

In the US, each person’s dictionary has pages scattered across the country locked in different computer systems – some of which we know about, and others long forgotten (that trip to a community hospital one rainy Sunday night when your child wouldn’t stop coughing during that year you lived in Seattle, perhaps). Information will not make it into a computer in the first place if people can’t afford a visit to the doctor.

It is ironic that the US – a country that is among the world leaders in medical technology – will be among the least likely to move from sequence to meaning in this future.

The fragmented and bloated state of an individual’s health information in the US stands in stark contrast to its ability to create technical and life-saving advances – some of the COVID-19 vaccines being among the most recent.

In addition to producing the sequence quickly and affordably, we need to match the raw genomic data to the clinical data: all the mundane details about a person and their life, such as their height, their weight over time, their illnesses big and small, and their habits including smoking, drinking, and exercise.

Researchers must disentangle the effects that these attributes and habits have on each other, and how they interact with the genetic code. For example, we know that lack of exercise is not good, but it is worse in the context of being overweight. High blood pressure increases your risk of stroke, but having high blood pressure and diabetes increases that risk even more.

Without a dictionary to interpret it, you can know what DNA code is there, but you won’t know what it means. Part of this problem is that we don’t know what we don’t know. And most of what we do know focuses on a specific portion of the genome – those genes that provide the instructions for making proteins, one of the building blocks of the body.

Everyone has roughly 20,000 of these protein-coding genes. But they make up only about 2 percent of the whole genome. The other 98 percent is an assortment of genes and other long stretches of DNA, and we don’t know what much of it does.

Plenty of examples show that we don’t have to understand the entire genome to understand the function of some genes and learn how to use that knowledge to improve the health of individuals and the population.

For example, cystic fibrosis is a relatively rare condition where defects in one gene are largely responsible for the disease. Children inherit the defective gene from parents who show no sign of the disease. Each parent carries one defective copy of the gene alongside a working one, and in the luck of the genetic draw, a child may inherit both defective copies – and develop the condition.
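
To put a number on that genetic draw, here is a minimal sketch in Python. It assumes the simple textbook model in which each carrier parent passes on either of their two copies with equal chance; the names `mother`, `father` and `share` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Assumption: each carrier parent has one working copy ("N")
# and one defective copy ("d") of the gene.
mother = ("N", "d")
father = ("N", "d")

# A child receives one copy from each parent; under the simple
# textbook model all four combinations are equally likely.
children = list(product(mother, father))

def share(predicate):
    """Fraction of the equally likely outcomes that satisfy predicate."""
    matches = [child for child in children if predicate(child)]
    return Fraction(len(matches), len(children))

affected = share(lambda child: child.count("d") == 2)    # two defective copies
carrier = share(lambda child: child.count("d") == 1)     # one defective copy
non_carrier = share(lambda child: child.count("d") == 0) # no defective copy

print(affected, carrier, non_carrier)  # prints: 1/4 1/2 1/4
```

Under these assumptions, each child of two carriers has a one-in-four chance of inheriting both defective copies.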

Understanding what the cystic fibrosis sequence means is easier when you have obvious genetic changes paired with specific clinical symptoms. It also helps that most individuals with cystic fibrosis are cared for at centres that are part of a nationwide network funded by the Cystic Fibrosis (CF) Foundation, which makes it easier to join the dots between DNA sequence and symptoms.

In some common conditions – like type 2 diabetes – the complexity of deciphering what is going on in the genome increases astronomically. There are hundreds of genes (that we know of) that have a role in type 2 diabetes.

Unlike individuals with cystic fibrosis, the tens of millions of Americans with type 2 diabetes don’t receive their care at diabetes centres that coordinate this care and share data. These individuals receive care at hospitals and doctors’ offices scattered across the country. Trying to centralise that information is like trying to assemble the dictionary page-by-page for each person.

The US could centralise health information on its population. It chooses not to for a variety of reasons, but largely because it is a society that prizes privacy above all else – including sharing data in ways that might help Americans live better, healthier lives. This value system hinders the ability to realise much of the value of the genomic technology that the nation has created.

Deciphering genomic interactions and effects takes patience, time, and a lot of centralised health data. Instead of leading the way, the US will be a follower. It will cede leadership to those countries that haven’t let privacy concerns stand in the way of assembling their individual DNA ‘dictionaries’.

Originally published under Creative Commons by 360info™.
