[<< wikibooks] Structural Biochemistry/DNA recombinant techniques/Human Genome Project
== What is the Human Genome Project ==
The Human Genome Project was a 13-year project coordinated by the US Department of Energy and the National Institutes of Health.  The task of sequencing the estimated 3 billion DNA base pairs was intimidating, and it took an international effort to complete: the UK's Wellcome Trust was a partner in the effort, and many other countries, such as Germany, France, China and Japan, also made great contributions.  The goals of the project included sequencing the 3 billion human DNA base pairs, identifying the estimated 20,000-25,000 genes found in human DNA, storing all this information in accessible databases, and improving the tools for data analysis.
The DNA of a set of model organisms was also sequenced and studied to provide comparative information so that scientists could better understand how the human genome functions. The primary reason for this project was to use the sequenced genome to understand and eventually treat the ~4000 genetic diseases that afflict humans, as well as the many multi-factorial diseases in which genetic predisposition plays an important role. Currently, the research and technologies developed by the Human Genome Project are being used in molecular medicine, energy sources and environmental applications, risk assessment, bioarchaeology, anthropology, evolution, human migration, DNA forensics (for identification purposes), agriculture, livestock breeding, bioprocessing and many other fields.
The Human Genome Project (HGP) was a worldwide collaboration initiated to determine the sequence of the entire human genome and make it available in a database.  A genome is the entire DNA of an organism (including its genes). DNA is made up of sequences of four different bases: adenine, guanine, thymine and cytosine.  Knowing the specific sequence of these bases allows us to compare genomes and identify the changes that underlie certain diseases.  For example, sickle-cell anemia is caused by the alteration of a single base from A to T.  The identification of these differences makes it possible to research and seek medical advances on diseases and DNA disorders. To further understand the significance of the human genome sequence, studies were also carried out on non-human species such as the fruit fly, the mouse, and E. coli.
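The single-base change behind sickle-cell anemia can be illustrated with a short sketch. It assumes a nine-base fragment of the beta-globin coding sequence (codons for Pro-Glu-Glu) and a deliberately tiny codon table; the function names are hypothetical and chosen only for this illustration.

```python
# Minimal codon table: only the codons needed for this illustration.
CODON_TABLE = {"CCT": "Pro", "GAG": "Glu", "GTG": "Val"}

def translate(dna):
    """Translate a coding-strand DNA string codon by codon."""
    usable = len(dna) - len(dna) % 3
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3)]

def point_differences(a, b):
    """Return (position, base_in_a, base_in_b) for each mismatch (0-based positions)."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

normal = "CCTGAGGAG"  # assumed fragment of the normal sequence: Pro-Glu-Glu
sickle = "CCTGTGGAG"  # same fragment with the A-to-T substitution

print(point_differences(normal, sickle))  # one mismatch: A replaced by T
print(translate(normal))                  # amino acids before the change
print(translate(sickle))                  # glutamate is replaced by valine
```

Comparing two aligned sequences base by base is the simplest form of the comparison described above; real variant detection first has to align sequences of differing lengths.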
Creating a means to store the information obtained by the Human Genome Project was a challenge in itself.   The human genome contains approximately 3 billion base pairs, which alone would take up about 3 gigabytes of storage at one byte per base; additional storage would be required for ongoing annotation of, and additions to, these sequences.  Bioinformatics researchers Morey Parang, Richard Mural and Mark Adams were the main contributors to designing a means to store this massive amount of information.
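The storage estimate can be checked with simple arithmetic. The sketch below assumes two encodings that are not specified in the text: one byte per base (plain text) and a packed form using 2 bits per base, which is enough to distinguish four bases.

```python
GENOME_BASE_PAIRS = 3_000_000_000  # approximate figure quoted in the text

def storage_bytes(n_bases, bits_per_base):
    """Bytes needed to store n_bases at the given bits per base, rounded up."""
    return (n_bases * bits_per_base + 7) // 8

one_byte_per_base = storage_bytes(GENOME_BASE_PAIRS, 8)  # one character per base
two_bits_per_base = storage_bytes(GENOME_BASE_PAIRS, 2)  # A, C, G, T packed into 2 bits

print(one_byte_per_base / 1e9, "GB")
print(two_bits_per_base / 1e6, "MB")
```

Under these assumptions the raw sequence needs about 3 GB as text and about 750 MB when packed; annotations and quality data add to that.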
Other goals of the HGP included identifying all 20,000 to 25,000 genes in human DNA, improving tools for data analysis, transferring technologies to the private sector, and addressing the ethical, legal, and social issues (ELSI) that might arise from the HGP.  
The three-billion-dollar project was expected to take 15 years to complete as a combined worldwide effort, including scientists from China, Japan, the UK, Germany, and France. It was instead finished in 13 years. Some attribute the early finish to a privately funded project by Celera Genomics, budgeted at only around three hundred million dollars. Celera declared that it would finish sequencing the human genome before the three-billion-dollar cooperative effort did. It used a somewhat riskier approach called "shotgun sequencing": instead of sequencing the genome in a linear manner, the DNA is broken ("shot") into small segments, the segments are sequenced, and the full sequence is reconstructed by finding where the segments overlap. The competition between the two groups helped fuel the project, as each put forth greater effort to finish before the other. Both groups thus contributed to the finished sequence of the human genome.
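The idea of rebuilding a sequence from overlapping fragments can be sketched with a toy greedy merger. This is only an illustration under simplifying assumptions (error-free reads, a single strand, no repeats); it is not the algorithm used by either project, and the function names and sample fragments are invented for the example.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

fragments = ["ATGGCGTAC", "GTACCTTGA", "TTGACCGAT"]  # made-up overlapping reads
contigs = greedy_assemble(fragments)
print(contigs)
```

With these three fragments, each pair of neighbours shares four bases, so the merger returns a single contig. Real genomes contain long repeats, which is one reason the approach was considered risky.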
The project started in 1990 and was completed in 2003.  Although the goals established by this project were met, the information obtained is still being analyzed and researched in order to make advances in the life sciences.  The HGP has benefited many areas of science such as molecular medicine, energy sources and environmental applications, risk assessments, bioarchaeology, anthropology, evolution, human migration, DNA forensics, and bioprocessing.  Although the human genome is now considered "complete", there are still sections of the genome that have not yet been sequenced. Gaps remain, and as different groups around the world sequence the missing pieces, the results are added to worldwide public databases that can be searched with tools such as BLAST.

== Why the Human Genome Project? ==
The Human Genome Project was undertaken to give humankind a deeper understanding of its own biology. 
For instance, through sequencing the human genome, scientists can now study genetic diseases in more detail and depth. The Human Genome Project has helped identify genes associated with different genetic conditions such as myotonic dystrophy, fragile X syndrome, Alzheimer's disease, familial breast cancer and more. This will help researchers develop better ways to treat the root of the conditions rather than just the symptoms of the diseases. In clinical medicine, it will also allow earlier identification of diseases and treatment options customized to individuals. Research is also underway to improve gene therapy; in the future, scientists may be able to fix or replace faulty genes.  Another area of research involves the variation in individual response to the environment.   As researchers determine which genes code for sensitivity to environmental pressures, like carcinogens and irritants, they will be able to better predict the risks involved for an individual exposed to a risky environment. One of the most important results of this area of study is an increased understanding of how low-level exposure to radiation affects cancer risk.
This project also helped advance forensics: scientists can now create DNA fingerprints from small regions of DNA that vary between individuals, allowing accurate identification. Fluids, tissue, and hair at crime scenes have greatly increased in utility as a direct result of genetic research. The importance of the Human Genome Project to forensics extends beyond the criminal sphere. "DNA fingerprints" can also be used to match organ donors, establish familial relationships, and identify microorganisms that might be polluting an environment. 
Because of the expansion of general knowledge related to sequencing genomes, smaller projects such as the microbial genome program were developed to sequence the genomes of bacteria. The goal is to find ways to produce energy, reduce toxic wastes, and improve industrial processing through the use of microbes and microbial enzymes. The sequences can also be used to analyze in greater depth the influence of even the tiniest forms of life on an ecosystem.  In addition, sequencing certain microbial genomes gives scientists valuable insight into the ways pathogenic microbes infect the human body.   Because of the dependence humans have on the microbial world, the relationship between the human world and the microbial world is also worth researching. This will benefit both human health and the environment.
The Human Genome Project has also helped people understand the road humankind took in the process of evolution. It gives scientists a glimpse of history as it helps connect the three domains of life: Archaea, Bacteria, and Eukaryotes.   Comparative genomics aids scientists in determining what specific segments of human DNA code for by comparing them to the equivalent segments in other organisms.   
As for more recent history, DNA studies have been used to determine a complete lack of ethnic divisions at the genetic level, indicating that ethnicities are the product of society, not of different DNA.  That said, specific markers on the Y chromosome can be used to trace the migrations of a man's paternal line throughout human history.
The field of behavioral genetics has also been aided by the completion of the Human Genome Project. For years, scientists have recognized evidence that many behaviors have biological foundations. For example, within a species certain behaviors consistently crop up, and such specific behaviors can be passed along to later generations (like the herding instincts of an Australian Shepherd).  Further support for this theory includes cross-species parallels in behavior, particularly among closely related species.  Traditionally, behavioral genetics has centered on the study of twins and adoptees in an attempt to clarify the nature vs. nurture debate: that is, how much of our behavior is actually coded into our DNA, and how much results from environmental influences.  Behavioral genetics is complicated both by the difficulty of quantifying certain abstract concepts (e.g. intelligence) and by the fact that many behaviors are influenced by multiple genes and affected by other factors.  In addition, any results from research in behavioral genetics will likely be hot-button issues and thus will require great care before conclusions are drawn. 
Finally, the human genome project has developed many techniques for genetic engineering, resulting in genetically modified plants and animals for better food and energy production. Examples include crops engineered to require less pesticide use or less water.  The human genome project has also allowed the development of plants that break down certain types of waste. This has led to many disputes over the right of man to alter "natural" organisms and concern over the long-term effect of genetically modified organisms—particularly those intended for human consumption.

== Timeline of Major Goals Completed ==
Sep-94:  1-cM resolution Genetic map was produced with ~3,000 markers
Dec-94: High-throughput oligonucleotide synthesis technologies developed 
Aug-96: Methanococcus jannaschii genome sequenced; confirms existence of third major branch of life on earth.
Sep-96: First eukaryotic genome, that of yeast (S. cerevisiae), was completed
Dec-96: DNA microarrays technologies developed
Oct-98: A physical map with ~52,000 STSs (sequence-tagged sites, short DNA segments that each occur only once in a genome) was completed
Dec-99: First human chromosome completely sequenced 
Nov-02:  Fiscal report showed the project was financially on track; sequencing capacity exceeded 1,400 Mb (megabases) per year at only $0.09 per finished base, well ahead of the goal of 500 Mb per year at $0.25 per base
Dec-02:  Genomic-scale technologies: scale-up of two-hybrid system for protein-protein interaction was developed
Feb-03:  3.7 million human SNPs mapped (single nucleotide polymorphisms, DNA sequence variations in which a single nucleotide in the sequence is altered)
Mar-03: 15,000 full-length human cDNAs (DNA molecules that are complementary to specific messenger RNAs) were sequenced 
Apr-03: 99% of gene-containing part of human sequence was finished to 99.99% accuracy
Apr-03: Finished genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster, plus whole-genome drafts of several other model organisms, including:  C. briggsae, D. pseudoobscura, mouse and rat were completed
May-08: Genetic Information Nondiscrimination Act (GINA) becomes law

== Tools Used To Sequence The Genome ==
The human DNA sequence is first broken into smaller projects, each carried in a cosmid, BAC, PAC, or P1 clone. These projects can be assigned to labs all across the world. The following phases are the order in which private or government labs sequence these portions, or contigs, of the main genome.
The Random Phase
For many labs this means using the shotgun approach to sequencing DNA: a DNA restriction enzyme cuts the project DNA into fragments of varying sizes. 
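Cutting with a restriction enzyme can be sketched as a string operation. The example assumes the enzyme EcoRI, which recognizes GAATTC and cuts after the first G; the text does not say which enzymes the labs used, and the input sequence here is made up.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut a linear sequence at every occurrence of site.

    cut_offset is the position of the cut within the recognition site
    (1 means between the first and second base, as for EcoRI).
    """
    fragments, start = [], 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # piece after the last cut
    return fragments

dna = "AAGAATTCCGTTAGAATTCTT"  # made-up sequence with two EcoRI sites
pieces = digest(dna)
print(pieces)
print([len(p) for p in pieces])
```

Because the recognition sites fall at irregular intervals, the resulting fragments differ in length, which is the "varying sizes" described above.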
Gap Closure Phase
Connecting the DNA fragments produced by the restriction enzyme is a bottleneck in the process of DNA sequencing, and it has been greatly sped up by gold-standard computer programs such as Phred/Phrap. As these programs improve, less overlap is needed to find matching strands, but the greater the overlap between strands, the higher the accuracy. 
Ambiguity Resolution Phase
Through the use of programs such as Consed, the low-quality regions of the sequenced DNA can be analyzed for anomalies such as deletions or contaminant reads. This step is mostly a finishing or spell-check function that increases the accuracy of the raw sequenced DNA. 
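Finding low-quality regions can be sketched as follows, assuming each base carries a Phred-style quality score Q, where the estimated error probability is 10^(-Q/10). The threshold of 20 and the sample scores are assumptions made for illustration, not values taken from the project.

```python
def error_probability(q):
    """Convert a Phred quality score to the estimated chance the base call is wrong."""
    return 10 ** (-q / 10)

def low_quality_regions(qualities, threshold=20):
    """Return (start, end) index pairs, end exclusive, for runs scoring below threshold."""
    regions, start = [], None
    for i, q in enumerate(qualities):
        if q < threshold and start is None:
            start = i                      # a low-quality run begins
        elif q >= threshold and start is not None:
            regions.append((start, i))     # the run ended at the previous base
            start = None
    if start is not None:
        regions.append((start, len(qualities)))
    return regions

scores = [35, 38, 12, 9, 15, 40, 41, 18, 37]  # made-up per-base scores
print(low_quality_regions(scores))
print(error_probability(40))
```

On this scale a score of 40 corresponds to an error probability of about 1 in 10,000, that is, 99.99% accuracy per base.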
Analysis Phase
This phase finds known patterns that are common in DNA, using programs such as BLAST, XGRAIL, and REPBASE. The features XGRAIL looks for include exons, introns, poly-A sites, promoter regions (TATA boxes, etc.), and repetitive bases. REPBASE finds repetitive sequences that are known to exist in families and subfamilies. BLAST draws on DNA sequences contributed by a large community of scientists for a wide range of species, and lets the DNA in question be matched to its nearest evolutionary relative.
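A much simplified version of this pattern search can be written with regular expressions. The sketch assumes two textbook consensus motifs, a TATA box (TATA followed by A or T, then A) and the poly-A signal AATAAA; it does not reproduce how XGRAIL or the other named programs work, and the sample sequence is invented.

```python
import re

# Simplified consensus motifs; real gene-finding tools use richer statistical models.
MOTIFS = {
    "TATA box": "TATA[AT]A",
    "poly-A signal": "AATAAA",
}

def find_motifs(sequence, motifs=MOTIFS):
    """Return {name: [start positions]} for each motif, allowing overlapping matches."""
    hits = {}
    for name, pattern in motifs.items():
        # A lookahead finds matches that overlap one another.
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", sequence)]
    return hits

seq = "GGCTATAAAGGCATGCCCTGAAATAAAGC"  # made-up sequence
print(find_motifs(seq))
```

Positions are 0-based offsets into the sequence. Exact matching like this misses degenerate or mutated motifs, which is one reason dedicated analysis programs are used.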

== Viewing the Genome through DNA Maps ==
There are many different ways of mapping the human genome. One of the most common uses units of centimorgans (cM). Each centimorgan represents a one percent chance that two genes will be separated by recombination during meiosis. One example is a marker that is inherited together with the Huntington's disease gene 96 percent of the time. The remaining 4 percent of the time it does not travel with the Huntington's disease gene, and thus it lies 4 cM from that gene. 
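The centimorgan arithmetic in the example above can be written out directly. The sketch assumes the simple approximation that map distance equals the percentage of recombinant offspring, which is reasonable only for short distances; the function name and counts are hypothetical.

```python
def map_distance_cm(recombinant_offspring, total_offspring):
    """Approximate genetic distance in centimorgans as the percentage of recombinants."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return 100 * recombinant_offspring / total_offspring

# 96 of 100 offspring inherit the marker together with the disease gene,
# so 4 of 100 are recombinant.
distance = map_distance_cm(4, 100)
print(distance, "cM")
```

For widely separated genes, multiple crossovers make the observed recombination percentage underestimate the true distance, so the approximation is usually corrected with a mapping function.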
The maps used to view DNA are of two kinds: genetic-linkage maps and physical maps. Genetic-linkage maps describe the position of a DNA marker relative to another marker according to how often the two are inherited together, whereas physical maps measure positions in base pairs. The maps described here are cytogenetic maps, restriction maps, cosmid maps, and sequence maps. 
The Cytogenetic map was created by Victor McKusick, and it utilizes chromosomal staining in order to view groups. This method is limited in resolution, as the target gene a scientist is seeking could lie in a stained band containing ten million base pairs. That is why this method is useful for broad analysis and for narrowing a sequence down to certain regions of the chromosome. 
The Restriction map was created by Dr. Raymond White, and it utilizes restriction enzymes. This process takes the genomes of a family or of several generations of people and finds the percentage of genes that are close together among these related people. By using restriction enzymes, the same specific sequence of DNA is cut from the genomes of the family members and can be analyzed. The resolution of this method is ten times greater than that of cytogenetic mapping and can locate a genetic marker to within one million base pairs. 
Cosmid maps are built from the actual overlapping base-pair sequences derived from shotgun methods of sequencing. This method takes fragments roughly 40,000 base pairs long and overlaps them. The resolution of this method is high: it can locate a gene to within 10,000 to 100,000 base pairs. 
The Sequence Map is the culmination of all sequencing and lists the entire known order of bases across the human chromosomes (humans carry 46 chromosomes in 23 pairs). It consists of over 3 billion base pairs and has 20,000 - 25,000 protein-encoding genes.

== Ethical Dilemmas ==
"If scientists don't play God, who will?" - James Watson, former head of the Human Genome Project. 
The Human Genome Project raised many ethical concerns. With knowledge of the entire human DNA sequence and the genes it encodes, people could alter their genes (for a price), leading to possible genetic discrimination and other moral ramifications of "playing God".  While the intentions may be noble (trying to better understand and help treat the many genetic diseases and defects that afflict mankind), many believe genetic manipulation is a slippery slope. Most research is now targeted at identifying or treating birth defects caused by a single gene, such as cystic fibrosis and Tay-Sachs disease, as well as more daunting ventures such as preventing diabetes, heart disease and other big killers, but many worry about what will come next. Will the mind be targeted for improvement: preventing alcohol addiction and mental illness, and enhancing visual acuity or intelligence in an attempt to improve the human design?  Even genetic testing of fetuses raises questions about the ethical ramifications of ever more accurate genetic screening.  Where should the line be drawn in eliminating perceived defects? On a more mundane note, the completion of the Human Genome Project raised concerns about genetic discrimination by employers and health insurance companies based on individual predisposition to certain diseases. As of May 2008, GINA (Genetic Information Nondiscrimination Act) protects individuals from such discrimination, and precludes employers from demanding such tests.    
Dr. Marvin Frazier, who fields human genome questions as director of the Life Sciences Division of the U.S. Department of Energy's Office of Biological and Environmental Research, says it will take decades for scientists to figure out how to manipulate human intelligence or athletic ability, because of the complexity of the traits (they rely on many genes) and the unknown role the environment plays in these abilities. Achieving the desired goals would carry tremendous costs, not only in fiscal capital but in human capital as well, because of the large amount of risky experimentation involved. "It is my opinion that this would be wrong," he added, "but that will not stop some people from wanting to try. The key question is not whether human (genetic) manipulation will occur, but how and when it will."

== Related Projects ==
GTL: The US Department of Energy's Genomics:GTL program explores the diversity of microbial and plant genomes through their DNA sequences in order to understand how living systems operate
Human Microbiome Project: generates data on the human microbiome to study its role in human diseases. Instead of studying species one at a time, this project studies microbial communities harvested from their natural environments. 
Genographic Project: a joint project of National Geographic and IBM whose goal is to analyze the roots of human genetics over the course of five years.

== Genetic Information Nondiscrimination Act (GINA): Major Impacts ==
On May 21, 2008, President George W. Bush signed into law the Genetic Information Nondiscrimination Act (GINA). The law prevents U.S. insurance companies as well as employers from screening prospective customers and employees based on the results of their genetic tests. The bill passed both chambers of Congress only after certain disagreements were worked out, which took a few months.
GINA was put into effect to prevent prejudiced situations such as an employer using a prospective employee's genetic information to predict performance level, proneness to tardiness, health risk, etc. Under the law it is also illegal to request or demand a genetic test.
Without the fear of affecting their jobs or insurance rates, Americans may be more willing to go through genetic testing for diseases. This is encouraging because it could open doors to new medical discoveries and cures. It also allows early detection of health problems, leading to cost-effective preventive solutions.
For more information: http://www.ornl.gov/sci/techresources/Human_Genome/publicat/GINAMay2008.pdf

== References ==
Definitions for LLNL's Shotgun Sequencing. Human Genome Project Information. U.S. Department of Energy Office of Science.
Davis, Joel. Mapping the Code.