Information Flow in Cells

(Norman Cohen) If you're going to build anything complicated, you need some sort of plan - a blueprint, a set of instructions. Now that's just as true of living cells as anything else. But they contain the instructions within themselves: they're inherited, they're copied and passed on from generation of cell to generation of cell.

Now what we're going to do is look at what the instructions are and how they work. But before we do that, just stop for a moment and think what sorts of characteristics any set of inherited instructions is likely to have. I think there are probably three. First of all the instructions have to be stable. They have to last long enough to be copied and passed on to the next generation of cells.

And there's the second point. They need to be capable of being copied and copied accurately. And finally and most obviously, as instructions they need to contain information. Well that's all very well but it's really just sort of speculation - theory. What about the reality?

The reality is a molecule called DNA. Now you might think that the structure of a molecule that contains the instructions for making a complete cell must itself be very complicated. But in fact, the basic structure of DNA is remarkably simple.

(Rissa de la Paz) Here's a schematic model of DNA. Unwind the double helix and it consists of two long strands that form a ladder-like structure.

The strands of DNA are strings of chemically repeating units, which act as basic building blocks. Each unit contains a sugar (deoxyribose), a phosphate group and a base.

Together these form a nucleotide. There are actually 4 types of base - adenine, thymine, cytosine and guanine. Within a strand of DNA, the bases can come in any order and just how important this sequence is will become clear later. What's more, in double-stranded DNA, the bases match up in a particular fashion: adenine always pairs up with thymine, and guanine with cytosine.

This precise base pairing means that the base sequence in one strand is complementary to the sequence in the other. The base pairs are held together by relatively weak hydrogen bonds. But when summed up over the whole DNA double helix, these hydrogen bonds impart great stability.
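The pairing rule can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the names `PAIRS` and `complement_strand` are made up for the example, and the sketch ignores the direction in which each strand runs, which is not discussed here.

```python
# Pairing rules from the text: adenine with thymine, guanine with cytosine.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand: str) -> str:
    """Return the base-by-base partner of a DNA strand (hypothetical helper)."""
    return "".join(PAIRS[base] for base in strand)


top = "ATGCCGTA"
bottom = complement_strand(top)
print(top)     # ATGCCGTA
print(bottom)  # TACGGCAT
```

Applying the function twice gives back the original strand, which is what "complementary" means in practice: either strand fully determines the other.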

(Norman Cohen) Let's have a look at base pairing in a bit more detail. Here's another model of DNA. It's a different type of model and it's of a very short section of DNA. These are the two strands of the double helix and connecting them like steps are base pairs.

We've arranged it with both types of base pair represented. A, T and on the other side, G, C. And joining the base pairs, these are the hydrogen bonds.

Now hydrogen bonds are relatively weak bonds, but summed up over a whole DNA molecule, which in reality would be a very long molecule, the double helix is stable. In some ways, DNA is a bit like a zip fastener. Individual links are weak but overall the thing's quite stable. But if it's stability you're after, why have something that can quite simply fall apart?

Well the answer's also quite simple. There are occasions when the two strands of a DNA double helix do have to separate. For instance, remember that before cells divide, they have to copy their DNA in a process called replication.

(Rissa de la Paz) A double strand of DNA unwinds and the strands separate. For each unwound strand, the bases can now match with those that are floating free in the cell. The familiar base pairing rules apply: A pairs with T; G with C, and so on. So each strand of the double helix acts as a template for the formation of a new complementary strand.

This eventually gives two DNA double helices - each identical to the original and actually containing one old strand and one new.
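As a rough sketch of that outcome, the Python below treats a double helix as a pair of strings and builds a new partner for each old strand. The names `build_partner` and `replicate` are invented for the illustration, and the model leaves out everything about how the cell actually unwinds and copies the strands.

```python
# Pairing rules: A with T, G with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def build_partner(template: str) -> str:
    """Assemble a new strand by pairing a free base with each template base."""
    return "".join(PAIRS[base] for base in template)


def replicate(helix: tuple[str, str]) -> tuple[tuple[str, str], tuple[str, str]]:
    """Split a helix into its two strands and give each one a new partner.

    Each daughter helix is returned as (old strand, new strand).
    """
    old_a, old_b = helix
    return (old_a, build_partner(old_a)), (old_b, build_partner(old_b))


parent = ("ATGC", "TACG")
daughter_1, daughter_2 = replicate(parent)
print(daughter_1)  # ('ATGC', 'TACG')
print(daughter_2)  # ('TACG', 'ATGC')
```

Both daughters hold the same two sequences as the parent, and in each one the first strand is inherited while the second is newly made.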

(Norman Cohen) Replication is a pretty accurate process but very occasionally a mistake does occur, a wrong base is put in. But even then cells contain proof-reading mechanisms that can detect the mistake and correct it.

And the net result in some cells is that as few as one mistake in a billion or so bases creeps through. Now I reckon that's probably equivalent to about one printer's error in a thousand average-sized novels or so. That's pretty remarkable. Well that's how replication occurs - how the information in the DNA is copied. But where is the information?

(Rissa de la Paz) Think again about the structure of DNA and how it might carry instructions. A single chain that simply repeats one symbol would carry no useful information. But a chain made up of different symbols can encode information. Information needs difference. In fact, life's genetic instructions are spelled out in combinations of the letters A, G, C and T - the four bases of the DNA molecule. In effect, one particular sequence of bases containing one particular piece of information, is one gene.
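One way to put a number on "information needs difference" is to count how many different messages an alphabet allows. The sketch below is an assumption-laden illustration: it treats every symbol as equally likely, and the helper names `bits_per_symbol` and `possible_sequences` are made up for the example.

```python
import math


def bits_per_symbol(alphabet_size: int) -> float:
    """Information carried by one symbol, assuming all symbols are equally likely."""
    return math.log2(alphabet_size)


def possible_sequences(alphabet_size: int, length: int) -> int:
    """Number of distinct chains of a given length over the alphabet."""
    return alphabet_size ** length


# A chain that repeats one symbol: only one possible message, so no information.
print(bits_per_symbol(1))         # 0.0
print(possible_sequences(1, 10))  # 1

# A chain built from the four bases A, G, C and T.
print(bits_per_symbol(4))         # 2.0
print(possible_sequences(4, 10))  # 1048576
```

Even a stretch of only ten bases can be written in over a million different ways, which is what lets a sequence of bases stand for one particular instruction rather than another.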

(Norman Cohen) Genes code for proteins. Each specific gene codes for a specific polypeptide within a protein. Now proteins are extremely important in living organisms. Some proteins are structural, others for example are enzymes. A typical gene is about a thousand base pairs or so. Now that may seem rather a lot, but there's plenty to spare in DNA. You see this model actually represents a very, very small section of a real DNA molecule.

Real DNA molecules would be many, many times longer than this. They're the largest molecules known by far. In fact a single human DNA molecule on this sort of scale would be thousands and thousands of miles long.

And if you consider the 23 different molecules of DNA in a human haploid cell and add all the base pairs together, you come out with a figure of round about 3 billion - that's three thousand million base pairs.

Now frankly numbers like that don't mean very much to me. So how can we put numbers of that sort into some sort of perspective? Take a telephone directory and imagine the whole thing is composed of the very tiny print. And that each letter and each digit corresponds to a base pair. Well to get three billion, you'd need a couple of hundred or so different directories.
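The directory comparison can be checked with a line of arithmetic. Taking "a couple of hundred" to mean exactly 200 (an assumption made only for this sketch), the figures imply how many characters each directory would need to hold.

```python
GENOME_BASE_PAIRS = 3_000_000_000  # "round about 3 billion" base pairs
DIRECTORIES = 200                  # assumed value for "a couple of hundred or so"

# One printed character per base pair, spread evenly across the directories.
chars_per_directory = GENOME_BASE_PAIRS // DIRECTORIES
print(f"{chars_per_directory:,}")  # 15,000,000
```

So the comparison works out to roughly fifteen million characters of tiny print per directory.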

Now in any particular type of cell not all the genes in the DNA are being used - essentially, some genes are switched on and some are switched off. Well, what does that imply? Imagine you've got an instruction manual and you want to use just some instructions at a particular time, in a particular place without lugging the whole manual around. Now how could you do that? Well one way is to choose the instructions you need - tear them out, use them, discard them.

But if this is DNA, the cell can't do things that way because the DNA would be damaged - and sooner or later all of the DNA has to be copied and the copies passed on to future generations of cells - so that can't be the way things happen. Okay, back to a manual. Another way of doing things is not to tear things out, but to make a photocopy of just the instructions you need at a particular time. And in a sense that's what happens in living cells. The genes that are switched on, those that are going to be used, are copied, that is the information in them is copied to make a copy called messenger RNA.

(Rissa de la Paz) To make a particular protein in the cell, the relevant gene is first switched on in the DNA. A working copy of the gene, called messenger RNA, is made. This copying process is called transcription. Next the information in the messenger RNA is acted upon to produce a protein.

This step is called translation since it involves translating the four-letter code in DNA or RNA into the sequence of amino acids in a protein. Let's look at these steps in more detail. But first, a look at RNA.

To understand how a working copy of the gene is made, we need to be familiar with the structure of RNA. Unlike DNA, RNA is just a single strand of nucleotide units. In DNA, the sugar is deoxyribose, in RNA, it's ribose. As for the bases, although 3 are identical - adenine, guanine and cytosine, the thymine in DNA is replaced by uracil in RNA.

Uracil is very similar to thymine: it always pairs with adenine; that is, it obeys the same base pairing rules.

(Norman Cohen) OK - so specific base pairing has cropped up again. You've already seen how important it is in the structure of DNA and in the replication of DNA. And now you'll see how vital it is in the production of messenger RNA, in a process known as transcription.

(Rissa de la Paz) When transcription starts, a small section of DNA is unwound. One of the 2 unwound strands acts as a template for making the message.

The messenger RNA is built up, one nucleotide at a time, according to the familiar base pairing rules:
A on the DNA pairs with U on the RNA
G pairs with C
T pairs with A, and so on.

The result: a message with a base sequence complementary to the template strand of the DNA. This messenger RNA will eventually be used to direct the formation of a protein.
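The transcription rules above can be sketched in Python as a simple lookup. The names `DNA_TO_RNA` and `transcribe` are invented for the illustration, the template sequence is arbitrary, and strand direction is ignored because it is not covered here.

```python
# Each DNA base on the template strand pairs with one RNA base.
# Note that A on the DNA pairs with U, not T, on the RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Build messenger RNA one nucleotide at a time from a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template)


template_strand = "TACGGCATT"
messenger_rna = transcribe(template_strand)
print(messenger_rna)  # AUGCCGUAA
```

The message contains no T at all: wherever the template has an A, the RNA carries a U.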

(Norman Cohen) You might have been wondering how the cell uses linear information - the bases in messenger RNA - to produce something obviously three-dimensional, a protein. Well in fact the problem isn't quite as complicated as it might seem. You see in every protein, there are one or more polypeptide chains - linear structures - that run throughout the 3-dimensional structure of the protein.

Now what this means can be seen more easily on a simpler model. This is a large-scale model of a polypeptide. It's all twisted up here but in fact, in essence, it's a linear structure.

Now this represents a fairly short polypeptide. Real polypeptides would generally have many more units than this. And each unit is an amino acid, here represented by a ball. Each ball is an amino acid.

Now in polypeptides you've 20 different types of amino acid. So the problem reduces to this - how does the cell use linear information in messenger RNA, which has four types of unit - the four different bases - to produce this: a linear polypeptide with 20 different types of unit - the 20 different amino acids?
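A quick count shows how long a "word" of bases would have to be to tell 20 amino acids apart. The sketch below is plain arithmetic; the function name `shortest_word_length` is made up for the example.

```python
BASES = 4         # four different bases in messenger RNA
AMINO_ACIDS = 20  # twenty different amino acids in polypeptides


def shortest_word_length(symbols: int, meanings: int) -> int:
    """Smallest word length whose number of combinations covers every meaning."""
    length = 1
    while symbols ** length < meanings:
        length += 1
    return length


for length in (1, 2, 3):
    print(length, BASES ** length)
# 1 4
# 2 16
# 3 64

print(shortest_word_length(BASES, AMINO_ACIDS))  # 3
```

One base at a time gives only 4 possibilities and two at a time gives 16, both too few. Three at a time gives 64, which is more than enough for 20.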

(Rissa de la Paz) It turns out that each triplet of bases on the messenger RNA, called a codon, corresponds to a particular amino acid. Now there has to be a chemical connection between each triplet codon and each amino acid. A special adaptor molecule called transfer RNA makes that connection. One end of the adaptor carries a particular triplet of bases called an anticodon.

This matches up with a specific triplet codon on the messenger RNA. The other end of the transfer RNA is capable of binding to the unique amino acid corresponding to that anticodon.

Since there are twenty amino acids, there must be at least 20 different transfer RNA adaptors.
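The adaptor idea can be sketched as a lookup from anticodon to amino acid. This is a simplified illustration: the names `RNA_PAIRS`, `ADAPTORS`, `anticodon_for` and `amino_acid_for` are invented, only three adaptors are listed, and anticodons are written position by position against the codon rather than in the conventional reading direction. The three codon assignments used (AUG for methionine, GGC for glycine, UUU for phenylalanine) come from the standard genetic code and are not stated in the text.

```python
# RNA-to-RNA pairing rules: A with U, G with C.
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Each transfer RNA adaptor: anticodon at one end, amino acid at the other.
ADAPTORS = {
    "UAC": "Met",  # matches codon AUG
    "CCG": "Gly",  # matches codon GGC
    "AAA": "Phe",  # matches codon UUU
}


def anticodon_for(codon: str) -> str:
    """Work out which anticodon pairs with a codon on the messenger RNA."""
    return "".join(RNA_PAIRS[base] for base in codon)


def amino_acid_for(codon: str) -> str:
    """Find the adaptor whose anticodon matches, and return its amino acid."""
    return ADAPTORS[anticodon_for(codon)]


print(anticodon_for("AUG"))   # UAC
print(amino_acid_for("AUG"))  # Met
print(amino_acid_for("GGC"))  # Gly
```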

So for a protein chain to be assembled, each triplet codon on the message is read. The transfer RNA adaptor with the relevant anticodon binds the messenger. The amino acids at the other end of the transfer RNA adaptors become joined by a peptide bond. Once the transfer RNA molecule is no longer needed, it's released. The process repeats, elongating the peptide chain, until a 'STOP' codon is reached.
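The reading loop just described can be sketched as follows. It is a minimal illustration, not a full model: `CODON_TABLE` holds only a handful of entries from the standard genetic code (the text itself does not list any codons), the function name `translate` is made up, and the transfer RNA step is collapsed into a direct lookup.

```python
# A small excerpt of the standard genetic code; the real table has 64 entries.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCU": "Ala", "AAA": "Lys",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def translate(mrna: str) -> list[str]:
    """Read the message three bases at a time until a STOP codon is reached."""
    chain = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "STOP":
            break  # the chain is complete; nothing after this is read
        chain.append(amino_acid)  # join the next amino acid to the chain
    return chain


message = "AUGUUUGGCGCUAAAUAAGGC"
print(translate(message))  # ['Met', 'Phe', 'Gly', 'Ala', 'Lys']
```

The final GGC in the example message is never read, because the UAA codon before it ends the chain.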

(Norman Cohen) Finally, how does the cell go from the linear polypeptide to the three-dimensional shape seen in a protein? Well, that's where individual amino acids come into play because individual amino acids of different types can interact with each other in a variety of ways.

So, depending on the particular amino acids in the polypeptide in question and their position in the sequence, the interactions occur, giving the weird and wonderful but very characteristic 3-dimensional shape of the protein in question.

Well that completes the journey from DNA to protein. But it's worth having another look at translation, this time from a slightly different perspective. We'll look at where it happens in the cell, that's on the ribosome.

(Rissa de la Paz) In this sequence we concentrate on how the messenger RNA interacts with the ribosome. A specific set of bases on the message signifies a 'start' signal for protein synthesis. The ribosome and the messenger RNA move relative to one another. The instructions in the message are decoded to produce a polypeptide chain.

This requires the transfer RNA adaptor molecules, omitted here for simplicity. The process continues until the ribosome reaches a 'stop' signal in the message, again denoted by a specific set of bases. The completed polypeptide chain is released.
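The start and stop signals can be sketched as a scan along the message. The specific signals used below, AUG for 'start' and UAA, UAG or UGA for 'stop', come from the standard genetic code rather than from this sequence; the function name `coding_region` and the example message are made up for the illustration.

```python
START = "AUG"                  # standard 'start' signal
STOPS = {"UAA", "UAG", "UGA"}  # standard 'stop' signals


def coding_region(mrna: str) -> list[str]:
    """Return the codons read between the first start signal and the next stop."""
    start = mrna.find(START)
    if start == -1:
        return []  # no start signal, so nothing is decoded
    codons = []
    # Move along the message three bases at a time from the start signal.
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOPS:
            break  # stop signal reached: the completed chain is released
        codons.append(codon)
    return codons


message = "GGAUGGCUAAAUGACC"
print(coding_region(message))  # ['AUG', 'GCU', 'AAA']
```

The bases before the start signal and after the stop signal are part of the message but are never decoded into the chain.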

(Norman Cohen) Well, that's just about it. We've looked at the structure of DNA, how DNA is replicated and how the information in DNA is utilised to make proteins. But underneath all of that I think there's one principle that really helps tie things together. And that's the principle of base pairing.

We've seen how base pairing helps explain replication of DNA. It also helps explain transcription to make messenger RNA. And translation of messenger RNA to make proteins. And of course base pairing is also important in explaining why the DNA molecule is a double helix.

It's a quite remarkable molecule and one that provides a wonderful example of the connection between structure and function.

It's also a very clever solution to the three problems that we set earlier for inherited material: stability, accuracy of copying and containing information. Clever? Well if molecules did have brains, this one would be Einstein.