Reference genome

The first printout of the human reference genome presented as a series of books, displayed at the Wellcome Collection, London

A reference genome (also known as a reference assembly) is a digital nucleic acid sequence database, assembled by scientists as a representative example of a species' set of genes. Because reference genomes are often assembled from the DNA of several donors, they do not accurately represent the set of genes of any single individual. Instead, a reference provides a haploid mosaic of different DNA sequences from each donor. For example, GRCh37, the Genome Reference Consortium human genome (build 37), is derived from thirteen anonymous volunteers from Buffalo, New York.[1][2][3] Similarly, the ABO blood group system differs among humans, but the human reference genome contains only an O allele (although the other alleles are annotated).[4]

As the cost of DNA sequencing falls and new full genome sequencing technologies emerge, more genome sequences continue to be generated. Reference genomes are typically used as a guide on which new genomes are built, enabling them to be assembled much more quickly and cheaply than in the initial Human Genome Project. Most individuals who have had their entire genome sequenced, such as James D. Watson, had their genome assembled in this manner.[5][6] For much of a genome, the reference provides a good approximation of the DNA of any single individual. However, in regions with high allelic diversity, such as the major histocompatibility complex in humans and the major urinary proteins of mice, the reference genome may differ significantly from other individuals.[7][8][9] Comparison between the reference (build 36) and Watson's genome revealed 3.3 million single nucleotide polymorphism (SNP) differences, while about 1.4 percent of his DNA could not be matched to the reference genome at all.[2][5] For regions known to show large-scale variation, sets of alternate loci are assembled alongside the reference locus.
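As a rough illustration of what a comparison against the reference involves, the sketch below lists single-base differences between a reference sequence and a sample sequence. It assumes the two sequences have already been aligned to the same length, which in practice is the hard part and is done by dedicated alignment software. The function name `count_snps` and the toy sequences are hypothetical and are not taken from any of the projects described here.

```python
def count_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each single-base mismatch.

    Assumes both sequences are already aligned and of equal length.
    Positions are 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")

    differences = []
    for pos, (ref_base, alt_base) in enumerate(zip(reference.upper(), sample.upper())):
        # Skip positions where either base is unknown (N) or an alignment gap (-).
        if ref_base in "N-" or alt_base in "N-":
            continue
        if ref_base != alt_base:
            differences.append((pos, ref_base, alt_base))
    return differences


# Toy data for illustration only.
reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"
snps = count_snps(reference, sample)
print(snps)  # [(4, 'A', 'T'), (9, 'C', 'A')]
```

Sample sequence that cannot be aligned to the reference at all, like the 1.4 percent mentioned above, would never reach a position-by-position comparison of this kind and has to be handled separately.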

The human and mouse reference genomes are maintained and improved by the Genome Reference Consortium (GRC), a group of fewer than 20 scientists from a number of genome research institutes, including the European Bioinformatics Institute, the National Center for Biotechnology Information, the Sanger Institute and the McDonnell Genome Institute at Washington University in St. Louis. The GRC continues to improve reference genomes by building new alignments that contain fewer gaps and by fixing misrepresentations in the sequence.

The human reference genome GRCh38 was released on 24 December 2013.[10]

The previous human reference genome (GRCh37) was the nineteenth version. This build contained around 250 gaps, whereas the first version had roughly 150,000 gaps.[1]
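In sequence files, unresolved stretches of an assembly are conventionally written as runs of the letter N. The sketch below locates such runs in a sequence string. It is a simple heuristic under the assumption that every run of at least `min_len` N characters counts as one gap; the gap figures quoted above come from the assembly's own records and need not match a count produced this way. The function name `find_gaps` and the toy sequence are hypothetical.

```python
import re


def find_gaps(sequence: str, min_len: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) pairs for each run of N at least min_len long.

    Coordinates are 0-based and end-exclusive, like Python slices.
    """
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    return [(match.start(), match.end()) for match in pattern.finditer(sequence)]


# Toy sequence for illustration only: two gaps of length 5 and 2.
toy = "ACGTNNNNNACGTNNACGT"
gaps = find_gaps(toy)
print(gaps)                      # [(4, 9), (13, 15)]
print(find_gaps(toy, min_len=3)) # [(4, 9)]
```

Raising `min_len` filters out short runs, which can be useful when isolated N characters mark ambiguous bases rather than real gaps.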

Reference genomes can be accessed online at several locations, using dedicated browsers such as Ensembl or UCSC Genome Browser.[11]
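Besides the graphical browsers, Ensembl also offers a REST service for programmatic access. The sketch below builds a request URL for its `sequence/region` endpoint and, optionally, fetches the sequence as plain text. The helper names `region_url` and `fetch_region` are hypothetical, and the coordinates are arbitrary placeholders rather than a region of particular interest. The endpoint and the `coord_system_version` parameter are used here as remembered from the Ensembl REST documentation, so check them against the current documentation before relying on them.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

ENSEMBL_REST = "https://rest.ensembl.org"


def region_url(species, chrom, start, end, assembly=None):
    """Build a URL for the Ensembl REST sequence/region endpoint."""
    region = f"{chrom}:{start}..{end}"
    url = f"{ENSEMBL_REST}/sequence/region/{quote(species)}/{quote(region, safe=':.')}"
    if assembly:
        # Optional filter on the assembly (coordinate system) version.
        url += "?" + urlencode({"coord_system_version": assembly})
    return url


def fetch_region(url, timeout=30):
    """Fetch the sequence as plain text. Requires network access."""
    request = Request(url, headers={"Content-Type": "text/plain"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("ascii").strip()


# Arbitrary placeholder coordinates on human chromosome 1.
url = region_url("human", "1", 1000000, 1000100, assembly="GRCh38")
print(url)
# sequence = fetch_region(url)  # uncomment to perform the actual request
```

The network call is kept in a separate function and left commented out so that the URL construction can be inspected without contacting the server.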

References

  1. (citation text missing)
  2. (citation text missing)
  3. Donors were recruited by advertisement in The Buffalo News on Sunday, March 23, 1997. The first ten male and ten female volunteers were invited to make an appointment with the project's genetic counselors and donate blood from which DNA was extracted. As a result of how the DNA samples were processed, about 80 percent of the reference genome came from eight people, and one male, designated RP11, accounts for 66 percent of the total.
  4. (citation text missing)
  5. (citation text missing)
  6. The exception to this is J. Craig Venter, whose DNA was sequenced and assembled using shotgun sequencing methods.
  7. (citation text missing)
  8. (citation text missing)
  9. (citation text missing)
  10. New human genome assembly (GRCh38) released, NCBI news
  11. (citation text missing)