LiveScience

Stephanie Pappas

1st draft of a human 'pangenome' published, adding millions of 'building blocks' to the human reference genome

An illustration of the globe with ribbons of bright color wrapped around it, representing the newly drafted human pangenome

Scientists have published the first human "pangenome" — a full genetic sequence that incorporates genomes from not just one individual, but 47.

These 47 individuals hail from around the globe and thus vastly increase the diversity of the genomes represented in the sequence, compared to the previous full human genome sequence that scientists use as their reference for study. The first human genome sequence was released with some gaps in 2003 and only made "gapless" in 2022. If that first human genome is a simple linear string of genetic code, the new pangenome is a series of branching paths.

The ultimate goal of the Human Pangenome Reference Consortium, which published the first draft of the pangenome on Wednesday (May 10) in the journal Nature, is to sequence at least 350 individuals from different populations around the world. Although 99.9% of the genome is the same from person to person, there is a lot of diversity found in that final 0.1%.

"Rather than using a single genome sequence as our coordinate system, we should instead have a representation that is based on the genomes of many different people so we can better capture genetic diversity in humans," Melissa Gymrek, a genetics researcher at the University of California, San Diego, who was not involved in the project, told Live Science.

The newly drafted human pangenome is a collection of different genomes against which an individual genome sequence can be compared. Like a map of a subway system, the pangenome graph offers many possible routes for a sequence to take, represented by the different colors. The detouring paths at the top of the image represent single nucleotide variants (SNVs), which are single-letter differences. The yellow path that loops around itself and repeats the same nucleotides represents a duplication variant. The pink path that loops counterclockwise and follows the nucleotide sequence backward represents an inversion variant. At the bottom, the green and dark blue paths skip the C nucleotide in their routes and represent a deletion variant. The light blue path, which has extra nucleotides in its route, represents an insertion variant. (Image credit: Darryl Leja, NHGRI)
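For readers who think in code, the subway-map idea can be sketched as a tiny graph in Python. This is a toy illustration only, not the consortium's data format or software: the node numbers, the DNA letters and the sample names are all made up, and the inversion case is left out to keep the sketch short.

```python
# Toy sequence graph: each node holds a short stretch of DNA.
# Node ids and sequences are invented for illustration.
nodes = {
    1: "GAT",
    2: "T",    # one version of a single-letter site
    3: "C",    # the other version of the same site
    4: "ACA",
    5: "GG",   # extra letters carried by only some genomes
    6: "TTC",
}

# Each genome is a route through the graph (a list of node ids).
paths = {
    "reference":          [1, 2, 4, 6],
    "sample_snv":         [1, 3, 4, 6],     # detour: single-letter difference
    "sample_insertion":   [1, 2, 4, 5, 6],  # route picks up extra nucleotides
    "sample_deletion":    [1, 2, 6],        # route skips node 4
    "sample_duplication": [1, 2, 4, 4, 6],  # route visits node 4 twice
}

def spell(path):
    """Join the node sequences along a route to recover that genome."""
    return "".join(nodes[node_id] for node_id in path)

sequences = {name: spell(path) for name, path in paths.items()}
for name, seq in sequences.items():
    print(f"{name:20s} {seq}")
```

Every genome shares most nodes and differs only in which detours it takes, which is the sense in which a graph can hold many genomes at once.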

A reference for health

The first full human genome sequence was completed in 2003 by the Human Genome Project and was based on one person's DNA. Later, bits and pieces from about 20 other individuals were added, but 70% of the sequence scientists use to benchmark genetic variation still comes from a single person.

Geneticists use the reference genome as a guide when sequencing pieces of people's genetic codes, Arya Massarat, a doctoral student in Gymrek's lab who co-authored an editorial about the new research with her in the journal Nature, told Live Science. They match the newly decoded DNA snippets to the reference to figure out how they fit within the genome as a whole. They also use the reference genome as a standard to pinpoint genetic variations — different versions of genes that diverge from the reference — that might be linked with health conditions.
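The matching step can be illustrated with a few lines of Python. This is a deliberately simplified sketch: the sequences are invented, the function name `best_alignment` is hypothetical, and real alignment software uses indexing and scoring schemes far more sophisticated than sliding a snippet along a string.

```python
def best_alignment(reference, read):
    """Slide the read along the reference and return (offset, mismatches)
    for the placement with the fewest mismatching letters."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = [
            (offset + i, ref_base, read_base)
            for i, (ref_base, read_base) in enumerate(zip(window, read))
            if ref_base != read_base
        ]
        # Keep the first placement that has strictly fewer mismatches.
        if best is None or len(mismatches) < len(best[1]):
            best = (offset, mismatches)
    return best

reference = "GATTACATTCGGA"   # made-up reference sequence
read = "ACGTTC"               # made-up snippet from a new sample

offset, variants = best_alignment(reference, read)
print("read fits at position", offset)
for position, ref_base, read_base in variants:
    print(f"position {position}: reference has {ref_base}, sample has {read_base}")
```

Here the snippet fits best four letters into the reference, and the one letter that disagrees is the kind of difference researchers would flag as a candidate variant.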

But with a single reference mostly from one person, scientists have only a limited window of genetic diversity to study.

The first pangenome draft doubles the number of large genome variants, known as structural variants, that scientists can detect, bringing the total to 18,000. These are places in the genome where large chunks have been deleted, inserted or rearranged. The new draft also adds 119 million new base pairs, meaning the paired "letters" that make up the DNA sequence, and 1,115 new gene duplication mutations to the previous version of the human genome.

"It really is understanding and cataloging these differences between genomes that allow us to understand how cells operate and their biology and how they function, as well as understanding genetic differences and how they contribute to understanding human disease," study co-author Karen Miga, a geneticist at the University of California, Santa Cruz, said at a press conference held May 9.

The pangenome could help scientists get a better grasp of complex conditions in which genes play an influential role, such as autism, schizophrenia, immune disorders and coronary heart disease, researchers involved with the study said at the press conference.

For example, the Lipoprotein A gene is known to be one of the biggest risk factors for coronary heart disease in African Americans, but the specific genetic changes involved are complex and poorly understood, study co-author Evan Eichler, a genomics researcher at the University of Washington in Seattle, told reporters. With the pangenome, researchers can now more thoroughly compare the variation in people with and without heart disease, and this could help clarify individuals' risk of heart disease based on which variants of the gene they carry.

A diverse understanding

The current pangenome draft used data from participants in the 1000 Genomes Project, which was the first attempt to sequence genomes from a large number of people from around the world. The included participants had agreed to have their genetic sequences anonymized and included in publicly available databases.

The new study also used an advanced sequencing technology called "long-read sequencing," as opposed to the short-read sequencing that came before. Short-read sequencing is what happens when you send your DNA to a company like 23andMe, Eichler said. Researchers read out small segments of DNA and then stitch them together into a whole. This kind of sequencing can capture a decent amount of genetic variation, but there can be poor overlap between DNA fragments. Long-read sequencing, on the other hand, captures big segments of DNA all at once.
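The stitching step, and why poor overlap is a problem, can be shown with a small Python sketch. It is a toy under stated assumptions: the reads are invented, the greedy merging strategy and the three-letter minimum overlap are illustrative choices, and production assemblers work quite differently.

```python
def overlap(a, b, min_len=3):
    """Length of the longest end of `a` that matches the start of `b`
    (at least `min_len` letters), or 0 if there is none."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def stitch(reads, min_len=3):
    """Greedily merge the two reads with the largest overlap,
    repeating until no remaining reads overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps: the pieces stay separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads

# Three short reads that overlap end to end (made-up data).
contigs = stitch(["GATTAC", "TACATT", "ATTCGGA"])
print(contigs)

# Two reads with no usable overlap cannot be joined.
gapped = stitch(["GATTAC", "CGGA"])
print(gapped)
```

When the fragments overlap, they merge into one continuous sequence; when they do not, the result stays in pieces. A single long read that spans the whole region avoids the problem.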