I wrote down a list of the twenty-three chromosomes and next to each I began to list themes of human nature. Gradually and painstakingly I began to find genes that were emblematic of my story. There were frequent frustrations when I could not find a suitable gene, or when I found the ideal gene and it was on the wrong chromosome. There was the puzzle of what to do with the X and Y chromosomes, which I have placed after chromosome 7, as befits the X chromosome’s size. You now know why the last chapter of a book that boasts in its subtitle that it has twenty-three chapters is called Chapter 22.
It is, at first glance, a most misleading thing that I have done. I may seem to be implying that chromosome 1 came first, which it did not. I may seem to imply that chromosome 11 is exclusively concerned with human personality, which it is not. There are probably 60,000–80,000 genes in the human genome and I could not tell you about all of them, partly because fewer than 8,000 have been found (though the number is growing by several hundred a month) and partly because the great majority of them are tedious biochemical middle managers.
But what I can give you is a coherent glimpse of the whole: a whistle-stop tour of some of the more interesting sites in the genome and what they tell us about ourselves. For we, this lucky generation, will be the first to read the book that is the genome. Being able to read the genome will tell us more about our origins, our evolution, our nature and our minds than all the efforts of science to date. It will revolutionise anthropology, psychology, medicine, palaeontology and virtually every other science. This is not to claim that everything is in the genes, or that genes matter more than other factors. Clearly, they do not. But they matter, that is for sure.
This is not a book about the Human Genome Project – about mapping and sequencing techniques – but a book about what that project has found. Some time in the year 2000, we shall probably have a rough first draft of the complete human genome. In just a few short years we will have moved from knowing almost nothing about our genes to knowing everything. I genuinely believe that we are living through the greatest intellectual moment in history. Bar none. Some may protest that the human being is more than his genes. I do not deny it. There is much, much more to each of us than a genetic code. But until now human genes were an almost complete mystery. We will be the first generation to penetrate that mystery. We stand on the brink of great new answers but, even more, of great new questions. This is what I have tried to convey in this book.
PRIMER
The second part of this preface is intended as a brief primer, a sort of narrative glossary, on the subject of genes and how they work. I hope that readers will glance through it at the outset and return to it at intervals if they come across technical terms that are not explained. Modern genetics is a formidable thicket of jargon. I have tried hard to use the bare minimum of technical terms in this book, but some are unavoidable.
The human body contains approximately 100 trillion (million million) CELLS, most of which are less than a tenth of a millimetre across. Inside each cell there is a black blob called a NUCLEUS. Inside the nucleus are two complete sets of the human GENOME (except in egg cells and sperm cells, which have one copy each, and red blood cells, which have none). One set of the genome came from the mother and one from the father. In principle, each set includes the same 60,000–80,000 GENES on the same twenty-three CHROMOSOMES. In practice, there are often small and subtle differences between the paternal and maternal versions of each gene, differences that account for blue eyes or brown, for example. When we breed, we pass on one complete set, but only after swapping bits of the paternal and maternal chromosomes in a procedure known as RECOMBINATION.
Imagine that the genome is a book.
There are twenty-three chapters, called CHROMOSOMES.
Each chapter contains several thousand stories, called GENES.
Each story is made up of paragraphs, called EXONS, which are interrupted by advertisements called INTRONS.
Each paragraph is made up of words, called CODONS.
Each word is written in letters called BASES.
There are one billion words in the book, which makes it longer than 5,000 volumes the size of this one, or as long as 800 Bibles. If I read the genome out to you at the rate of one word per second for eight hours a day, it would take me a century. If I wrote out the human genome, one letter per millimetre, my text would be as long as the River Danube. This is a gigantic document, an immense book, a recipe of extravagant length, and it all fits inside the microscopic nucleus of a tiny cell that fits easily upon the head of a pin.
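The arithmetic behind these comparisons can be checked with a rough sketch. It assumes only the figures given above (one billion three-letter words, one word per second, eight hours a day, one letter per millimetre); the function names are illustrative.

```python
# Back-of-envelope check of the reading and writing figures quoted above.
WORDS = 1_000_000_000                   # codons in the genome, as stated in the text
LETTERS_PER_WORD = 3                    # every word (codon) has three letters (bases)
READING_SECONDS_PER_DAY = 8 * 60 * 60   # eight hours of reading a day


def years_to_read(words=WORDS, seconds_per_day=READING_SECONDS_PER_DAY):
    """Years needed to read the genome aloud at one word per second."""
    days = words / seconds_per_day
    return days / 365.25


def length_in_km(words=WORDS, letters_per_word=LETTERS_PER_WORD, mm_per_letter=1.0):
    """Length of the written-out genome in kilometres."""
    millimetres = words * letters_per_word * mm_per_letter
    return millimetres / 1_000_000      # 1 km = 1,000,000 mm


print(round(years_to_read()))   # 95, close to a century
print(round(length_in_km()))    # 3000
```

The second figure, about 3,000 kilometres, is of the same order as the length usually quoted for the Danube, which is a little under 3,000 kilometres.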
The idea of the genome as a book is not, strictly speaking, even a metaphor. It is literally true. A book is a piece of digital information, written in linear, one-dimensional and one-directional form and defined by a code that transliterates a small alphabet of signs into a large lexicon of meanings through the order of their groupings. So is a genome. The only complication is that all English books read from left to right, whereas some parts of the genome read from left to right, and some from right to left, though never both at the same time.
(Incidentally, you will not find the tired word ‘blueprint’ in this book, after this paragraph, for three reasons. First, only architects and engineers use blueprints and even they are giving them up in the computer age, whereas we all use books. Second, blueprints are very bad analogies for genes. Blueprints are two-dimensional maps, not one-dimensional digital codes. Third, blueprints are too literal for genetics, because each part of a blueprint makes an equivalent part of the machine or building; each sentence of a recipe book does not make a different mouthful of cake.)
Whereas English books are written in words of variable length using twenty-six letters, genomes are written entirely in three-letter words, using only four letters: A, C, G and T (which stand for adenine, cytosine, guanine and thymine). And instead of being written on flat pages, they are written on long chains of sugar and phosphate called DNA molecules to which the bases are attached as side rungs. Each chromosome is one pair of (very) long DNA molecules.
The genome is a very clever book, because in the right conditions it can both photocopy itself and read itself. The photocopying is known as REPLICATION, and the reading as TRANSLATION. Replication works because of an ingenious property of the four bases: A likes to pair with T, and G with C. So a single strand of DNA can copy itself by assembling a complementary strand with Ts opposite all the As, As opposite all the Ts, Cs opposite all the Gs and Gs opposite all the Cs. In fact, the usual state of DNA is the famous DOUBLE HELIX of the original strand and its complementary pair intertwined.
To make a copy of the complementary strand therefore brings back the original text. So the sequence ACGT becomes TGCA in the copy, which transcribes back to ACGT in the copy of the copy. This enables DNA to replicate indefinitely, yet still contain the same information.
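The pairing rule can be sketched in a few lines. This is a minimal illustration of the idea, not a model of the chemistry: like the example above, it reads both strands in the same order and ignores the direction in which each strand runs. The names `PAIRS` and `complement` are illustrative.

```python
# Base-pairing rule from the text: A pairs with T, and G with C.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Build the complementary strand, one base at a time."""
    return "".join(PAIRS[base] for base in strand)


original = "ACGT"
copy = complement(original)
copy_of_copy = complement(copy)

print(copy)           # TGCA
print(copy_of_copy)   # ACGT, the original text again
```

Because every base has exactly one partner, applying the rule twice always returns the starting sequence, which is why copying can go on indefinitely without losing the message.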
Translation is a little more complicated. First the text of a gene is TRANSCRIBED into a copy by the same base-pairing process, but this time the copy is made not of DNA but of RNA, a very slightly different chemical. RNA, too, can carry a linear code and it uses the same letters as DNA except that it uses U, for uracil, in place of T. This RNA copy, called the MESSENGER RNA, is then edited by the excision of all introns and the splicing together of all exons (see above).
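Transcription and editing can be sketched in the same spirit. The template sequence and the exon positions below are invented purely for illustration, and the function names are assumptions; the only rules taken from the text are base pairing, the use of U in place of T, and the removal of introns.

```python
# Pairing rule for making an RNA copy of a DNA strand: as in replication,
# except that RNA uses U where DNA would use T.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(dna_template):
    """Make the RNA copy of a DNA strand by base pairing."""
    return "".join(DNA_TO_RNA[base] for base in dna_template)


def splice(rna, exons):
    """Keep only the exons, given as (start, end) positions, and join them."""
    return "".join(rna[start:end] for start, end in exons)


template = "TACGGGTTTAAAATT"          # hypothetical stretch of DNA
raw_message = transcribe(template)
exons = [(0, 6), (9, 15)]             # hypothetical: positions 6 to 8 are an intron
messenger = splice(raw_message, exons)

print(raw_message)   # AUGCCCAAAUUUUAA
print(messenger)     # AUGCCCUUUUAA
```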
The messenger is then befriended by a microscopic machine called a RIBOSOME, itself made partly of RNA. The ribosome moves along the messenger, translating each three-letter codon in turn into one letter of a different alphabet, an alphabet of twenty different AMINO ACIDS, each brought by a different version of a molecule called TRANSFER RNA. Each amino acid is attached to the last to form a chain in the same order as the codons. When the whole message has been translated, the chain of amino acids folds itself up into a distinctive shape that depends on its sequence. It is now known as a PROTEIN.
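The ribosome's work amounts to a table lookup, three letters at a time. The sketch below uses only a handful of entries from the standard genetic code (the full table has sixty-four codons, since four letters taken three at a time give 4 x 4 x 4 combinations), and the messenger sequence is invented for illustration.

```python
# A small excerpt of the standard genetic code; the full table has 64 entries.
CODON_TABLE = {
    "AUG": "Met",
    "CCC": "Pro",
    "UUU": "Phe",
    "AAA": "Lys",
    "GGC": "Gly",
    "GCU": "Ala",
}

POSSIBLE_CODONS = 4 ** 3   # four letters, three positions


def translate(messenger):
    """Read the messenger codon by codon and return the amino-acid chain."""
    usable = len(messenger) - len(messenger) % 3   # ignore any trailing partial codon
    codons = [messenger[i:i + 3] for i in range(0, usable, 3)]
    # A codon missing from this partial table raises KeyError.
    return [CODON_TABLE[codon] for codon in codons]


chain = translate("AUGCCCUUUAAA")   # hypothetical messenger
print(POSSIBLE_CODONS)              # 64
print("-".join(chain))              # Met-Pro-Phe-Lys
```

The chain comes out in the same order as the codons, which is the property the text describes; how that chain then folds into a shape is beyond a sketch of this kind.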