Houston, We Have a Data Problem

Genomics is generating the most consequential dataset in history. The gap between generating it and having a vision for it will define the next era of human health, wealth, and power.

Lauren Berkowitz · May 19, 2026

In 2013, when a bankrupt company had its assets sold off, the DNA of millions of people was not on the list of things anyone worried about. In 2025, when 23andMe filed for bankruptcy, the genetic data of roughly 15 million people became the single most contested asset in the proceedings, and the states lined up to sue. In twelve years, genomic data went from an afterthought to the thing everyone was fighting over. That shift is the whole story, and most people missed it while it happened.

Genomics is now generating the most consequential dataset in human history, and I do not think that is an overstatement. Your genome is the most personal data that exists. It is you, at the level of the code. It does not change, it cannot be reset like a password, and it implicates your parents, your siblings, and your children whether they consented or not. And we are producing it faster than we are producing any vision for what to do with it.

The generation problem is effectively solved. In 2001, sequencing one human genome cost close to $100 million. Today it is a few hundred dollars and falling. The UK has announced plans to sequence every newborn. Nations are building population-scale biobanks. The volume of genomic data doubles roughly every seven months, faster than astronomy, faster than particle physics, faster than the platforms that defined the last era of technology. We are very good at making this data. We are not good at anything that comes after.

Generating a genome and understanding it are not the same thing, and the gap between them is enormous. We can read almost all of your DNA. We cannot yet tell you what most of it does. A test can tell you a variant is there, and then leave you in a category the field politely calls uncertain, which means nobody knows if it matters. Millions of people are sitting in that category right now, holding information that is technically about their own body and practically illegible.

This is the data problem, and it has three parts that rarely get discussed together. The first is interpretation, turning raw sequence into meaning. The second is infrastructure, the systems to store, connect, and reason over data at a scale that breaks everything built before it. The third is governance, deciding who owns it, who can see it, who profits, and who is protected. Each of these is unsolved. Together they are the defining challenge of the next era, and we are treating them as afterthoughts to the sequencing itself.

The interpretation gap is closing, but unevenly, and AI is the reason it is closing at all. Tools like AlphaMissense can now predict whether a variant is likely harmful across millions of possibilities that no human team could review by hand. Variants that sat in the uncertain pile for years are being resolved. Connections between genes and diseases nobody thought to link are surfacing. This is the part I am genuinely optimistic about, because it is the part where the technology is clearly working and the trajectory is clearly up.

But better interpretation creates its own problem, because meaning is exactly what makes data valuable, and valuable data attracts everyone. The moment a genome becomes legible, it becomes an asset, and assets get bought, sold, breached, and fought over. This is where the 23andMe story stops being about one company and starts being about all of us. Fifteen million people gave their DNA to a service for ancestry and mild curiosity, and then watched it become a line item in a bankruptcy, subject to sale to whoever showed up with a bid.

The infrastructure problem is less visible and just as serious. Genomic data is not like other data. It is enormous, it is permanent, and its value compounds as interpretation improves, which means data collected today becomes more revealing every year without anyone touching it. Building systems that can store, connect, and reason over this at population scale, securely, is a problem we have not solved, and most of the entities collecting the data are not thinking about it as a problem at all. They are thinking about collection.

The governance problem is the one we are failing most completely. In the US, the Genetic Information Nondiscrimination Act (GINA) offers some protection against genetic discrimination in health insurance and employment, and nothing close to comprehensive protection anywhere else. Life insurance, long-term care, and disability are largely fair game. The law was written for a world that no longer exists, and it has not kept pace with a reality where 15 million genomes can change hands in a courtroom.

And this is where it stops being a health story and becomes a wealth and power story. Whoever controls genomic data at scale controls something closer to infrastructure than to a product. The countries building national biobanks understand this. The companies racing to sequence understand this. The gap is between the people generating and holding this data, who understand its value precisely, and the people it actually describes, who mostly do not.

That asymmetry is the thing I keep coming back to. The most personal data that has ever existed is being generated at a pace nobody can absorb, interpreted by tools most people cannot see, stored in systems nobody is auditing, and governed by laws written for a different century. The people who understand what is happening are positioned to benefit from it. The people who do not are the data.

I do not think the answer is to stop, and I do not think you could stop it if you tried. The science is genuinely miraculous and the upside for human health is real. Early detection, prevention, treatments matched to your biology: this is the good version, and it is worth wanting. But the good version does not arrive automatically. It arrives only if the interpretation, infrastructure, and governance problems get treated as seriously as the sequencing did, and right now they are not.

What worries me is the pattern, because we have seen it before. A powerful technology arrives, the capability races ahead, the implications get sorted out later, and later usually means after the people with the least power have already absorbed the cost. Social media was legible in hindsight. Genomics is legible right now, in advance, if we are willing to look.

So the question is not whether we can generate the data. That is settled. The question is whether we can build a vision for it that treats the genome as what it is, the most personal thing a person has, rather than as the next asset class to be optimized. Houston, we have a data problem, and the window to solve it on purpose, rather than in a courtroom after the fact, is open now and will not stay open long.

The genome is out of the tube. What we do next is the only part still up to us.