GENETIC ENGINEERING: FUNDAMENTALS AND APPLICATIONS

II. cDNA Cloning and Protein/Variant Production

David C. Tiemeier, Senior Fellow

Biological Sciences Department

Monsanto Company

St. Louis, Missouri

(NOTE: Edited for CompuServe by C. E. Styron. Comments on

this article can be forwarded through CompuServe E-Mail to

76054,1666 or through SourceMail BBH329.)

INTRODUCTION

In an earlier article in this series, I discussed some of

the basic concepts and techniques associated with

recombinant DNA technology. These techniques can be used to

isolate and amplify, that is to clone, a piece of DNA from

the total set of DNA of an organism. Depending on the

amount of DNA required to encode the particular gene(s) one

wishes to study, the cloned DNA fragment might encode a

portion of one gene or up to several complete genes. This

procedure also yields DNA sequences which flank the gene and

are critical for the normal physiological control of protein

production.

Recombinant DNA technology permits one to make not only

large amounts of the DNA corresponding to a particular gene

but also large amounts of the protein encoded by that gene.

This can, in turn, permit detailed studies of the protein's

structure and its interaction with substrate and inhibitors.

Moreover, this can be the basis for a production process if

the protein itself is judged to be a product candidate for

animal or human health care applications.

PROTEIN PRODUCTION

The vector used to produce the protein encoded by the cloned

gene differs from that described for the simple

amplification of the hybrid DNA. DNA elements are

incorporated on either side of the foreign gene which direct

the host cell to produce an mRNA transcript and subsequently

translate the mRNA into protein. By using regulatory

elements known to be very efficient in the host cell, high

level production of the protein can be achieved. 

In some cases, proteins have been produced at high levels in

E. coli. Human and bovine somatotropin are examples of

this. The yeast, Saccharomyces, better known to the

fermentation industry, has also been used for the production

of some proteins. Proteins are more readily secreted

from yeast, and that feature may facilitate protein

isolation in some instances. In other cases, animal cells

have been used as the expression host. Tissue plasminogen

activator (tPA) is an example of this. Animal cells may

prove particularly useful for producing proteins whose

function is affected by post-translational modifications.

Many of these modifications are not performed by E. coli.

Some are performed by yeast but in a way different from

animal cells.

cDNA CLONING

Gene-containing DNA fragments, isolated directly from the

cell's DNA as described in the first article, may be used

for the protein production I have just described; but two

reasons have prompted scientists to use an alternative

cloning approach. First, most genes only occur as single

copies in the DNA. Since DNA in higher eukaryotes such as

soybeans, cows, and humans has enough information for one

to ten million genes, one must mount a non-trivial cloning

project to pull out the full gene copy. Second, as

mentioned in the first article, many genes have been found

to be interrupted by DNA segments called intervening

sequences, or introns. These are good for the gene,

apparently facilitating the stable accumulation of the

gene's mRNA, but rough on the molecular biologist who would

like to over-produce the encoded protein. Introns can make

the stretch of DNA encoding the gene so big that it doesn't

fit into most vectors. Moreover, some of the key

host:vector systems for protein production, principally

those involving E. coli, do not remove the introns from the

mRNA. As a result, even though mRNA might be made within

the host it cannot be properly translated into protein.

To understand the alternative cloning approach, it is useful

to recall the "central dogma" of molecular biology. An

organism's traits are encoded in DNA. The information for a

specific protein, associated with a particular trait or

characteristic, is converted into a second polynucleotide

termed messenger RNA (mRNA) by a process called

transcription. This specific bit of information is then

translated into the particular protein. (Figure 2)

FIGURE 2: "CENTRAL DOGMA" OF MOLECULAR BIOLOGY

DNA =============================

||

|| TRANSCRIPTION

\|/

RNA -----------------------------

||

|| TRANSLATION

\|/

PROTEIN
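
The flow of information in Figure 2 can be sketched in a few lines of Python. This is only an illustration: the codon table is the standard genetic code, but the short "gene" and the function names are made up for the example, and real transcription involves promoters, terminators, and processing that are ignored here.

```python
# Minimal sketch of the central dogma: DNA -> mRNA -> protein.
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_strand):
    """Return the mRNA that reads like the coding DNA strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna):
    """Read the mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

gene = "ATGGCTTGGAAATAA"   # hypothetical gene: Met-Ala-Trp-Lys-stop
mrna = transcribe(gene)
protein = translate(mrna)
print(mrna, protein)
```

The protein is written in one-letter amino acid codes.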

Different cells in an organism are specialized to produce

only a portion of the proteins encoded in the total DNA

complement. Some cells produce as much as 1 - 2 percent of

their total protein as a single species. Generally, mRNA

levels reflect this same bias. Hence, if one can start with

a population of cells that are preferentially producing the

protein of interest and one can produce a double-stranded

DNA copy from the mRNA, the gene cloning will have the

advantage of starting with a DNA population highly enriched

in the desired gene. Moreover, the mRNA that accumulates in

the cell and that is subsequently translated into protein is

a mature form lacking the introns. Hence, the double-

stranded DNA that results also lacks the introns and so

contains the amino acid information in an uninterrupted

form.
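
As a rough illustration of why the mature mRNA is the more convenient starting point, the sketch below joins the coding segments (exons) of a made-up genomic sequence and discards the intron. The sequence, the coordinates, and the function name are assumptions chosen for the example; they do not describe any real gene.

```python
# Sketch: a genomic gene is coding segments interrupted by an intron;
# the mature message keeps only the coding segments, joined in order.

def splice(genomic, exons):
    """Join the exon segments, given as (start, end) half-open coordinates."""
    return "".join(genomic[start:end] for start, end in exons)

exon1 = "ATGGCT"
intron = "GTAAGTTTTCAG"       # made-up intervening sequence
exon2 = "TGGAAATAA"
genomic = exon1 + intron + exon2

exons = [(0, 6), (18, 27)]    # positions of exon1 and exon2 in `genomic`
mature = splice(genomic, exons)

print(len(genomic), len(mature))
print(mature)
```

The spliced sequence is shorter than the genomic one and carries the coding information without interruption, which is the form an E. coli host needs.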

Fortunately for the molecular biologist, an enzyme

associated with certain animal viruses is capable of using

mRNA as a template to generate a complementary piece of DNA.

Eukaryotic mRNA typically has a run of adenylic acid at its

3' terminus (FIGURE 3a) so that a short piece of

deoxythymidylate can be used as a primer or starter for the

enzymatic synthesis (FIGURE 3b). Because this enzymatic

process is the opposite of transcription, the enzyme has

been named reverse transcriptase. The resulting

complementary DNA is referred to as cDNA and the cloning

approach based on this initial conversion of mRNA to cDNA is

termed cDNA cloning. 
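
The first-strand reaction can be pictured as taking the reverse complement of the message. The following is a sketch under simplifying assumptions: the mRNA is made up, the primer length of 12 is a hypothetical value, and the enzyme is assumed to copy the template to its very end, which real reverse transcriptase does not always do.

```python
# Sketch of oligo(dT)-primed first-strand cDNA synthesis.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna, primer_len=12):
    """Return the cDNA strand (written 5'->3') copied from an mRNA template.

    primer_len is a hypothetical oligo(dT) length; the primer needs a
    poly(A) run at least that long at the 3' end of the mRNA.
    """
    if not mrna.endswith("A" * primer_len):
        raise ValueError("mRNA lacks a poly(A) tail long enough to prime")
    # The primer is extended along the template, so the product is the
    # reverse complement of the mRNA and begins with the run of T's.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))

mrna = "AUGGCUUGGAAAUAA" + "A" * 15   # made-up message plus poly(A) tail
cdna = first_strand_cdna(mrna)
print(cdna)
```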

The single-stranded cDNA can be converted into a double-

stranded DNA using ribonuclease H and DNA polymerase. The

former chews away the mRNA in the hybrid leaving short RNA

pieces which can act as primers to start synthesis of the

second DNA strand (FIGURE 3d). This process results in a

double-stranded cDNA (FIGURE 3e). Short, chemically-

synthesized oligonucleotides containing desired restriction

enzyme sites can then be attached with DNA ligase (FIGURE

3f). Subsequent cutting with the appropriate restriction

enzyme then generates the single-stranded termini or ends

(FIGURE 3g) described last time which can mediate

recombination with the vector.
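
The linker step can be sketched as string manipulation on one strand of the double-stranded cDNA. The example assumes the enzyme is EcoRI, which recognizes GAATTC and cuts after the G on each strand; the article does not say which enzyme was used. The linker sequence and the cDNA are made up for illustration.

```python
# Sketch: attach linkers to a double-stranded cDNA (top strand shown),
# then cut with a restriction enzyme to expose single-stranded ends.
ECORI_SITE = "GAATTC"    # EcoRI cuts between G and AATTC
LINKER = "GGAATTCC"      # hypothetical palindromic linker carrying the site

def add_linkers(top_strand):
    """Ligate one linker to each end of the cDNA."""
    return LINKER + top_strand + LINKER

def ecori_digest(top_strand):
    """Cut the top strand after the G of every GAATTC; return the pieces."""
    fragments = []
    start = 0
    pos = top_strand.find(ECORI_SITE)
    while pos != -1:
        fragments.append(top_strand[start:pos + 1])
        start = pos + 1
        pos = top_strand.find(ECORI_SITE, pos + 1)
    fragments.append(top_strand[start:])
    return fragments

cdna_top = "ATGGCTTGGAAATAA"            # made-up cDNA without internal sites
linked = add_linkers(cdna_top)
fragments = ecori_digest(linked)
insert = fragments[1]                   # middle piece carries the cDNA
print(fragments)
```

The middle fragment begins with AATT, the single-stranded end that can pair with a vector cut by the same enzyme. Note that any GAATTC inside the cDNA itself would be cut as well, so this sketch only works for inserts that lack the site.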

There are many variations on the cDNA cloning scheme

outlined here. All start, however, with mRNA enriched for

the particular sequence of interest and take advantage of

enzymatic tools for converting the mRNA into a double-

stranded cDNA.

As with the cloning of a specific piece of genomic DNA, the

identification of the desired cDNA clone typically depends

on hybridization with labeled oligonucleotides specific for

the desired gene or screening of the cells transformed by

the hybrids with antibodies specific for the desired

protein. Basically, if one can purify small amounts of the

desired protein, its gene can be cloned. Protein

microsequencing on as little as 10 - 100 micrograms of

protein can provide a partial amino acid sequence on which,

since the DNA genetic code is known, gene-specific,

synthetic oligonucleotide probes can be based. Alternatively,

specific antibodies raised against similar quantities of

protein can be used to screen cDNA libraries if they are

constructed so as to produce the encoded protein.
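
Because most amino acids are specified by more than one codon, a peptide sequence corresponds to many possible DNA sequences, and probe design usually favors the stretch with the fewest alternatives. The sketch below counts those alternatives using the standard genetic code. The peptide, the window length of six residues, and the function names are assumptions made for the example.

```python
# Sketch: choose the peptide stretch with the fewest possible coding
# sequences as the basis for a synthetic oligonucleotide probe.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the list of codons that encode it.
CODONS_FOR = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_FOR.setdefault(aa, []).append("".join(codon))

def degeneracy(peptide):
    """Number of distinct DNA sequences that could encode the peptide."""
    count = 1
    for aa in peptide:
        count *= len(CODONS_FOR[aa])
    return count

def best_window(peptide, length=6):
    """Return the window of `length` residues with the lowest degeneracy."""
    windows = [peptide[i:i + length] for i in range(len(peptide) - length + 1)]
    return min(windows, key=degeneracy)

peptide = "LSRMWKDHEQ"    # hypothetical partial amino acid sequence
window = best_window(peptide)
print(window, degeneracy(window))
```

Stretches rich in methionine and tryptophan, which have a single codon each, give the least ambiguous probes; stretches rich in leucine, serine, or arginine give the most ambiguous.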

VARIANT PROTEIN PRODUCTION

In addition to permitting the production of naturally-

occurring proteins, the expression technology can be adapted

to the production of variants of naturally-occurring

proteins. This can be useful for defining the relationships

between protein structure and function and for providing

novel proprietary compositions for product applications.

There are two basic approaches to the construction of

variants: site-specific mutagenesis and random mutagenesis.

Site-specific or site-directed mutagenesis is a very precise

means of generating proteins with specific alterations in

their structure. One needs to have already cloned the

desired gene and to know its DNA sequence. The cloned gene

is then introduced into a bacterial virus system known as

M13. This has the unique characteristic of yielding either

double-stranded or single-stranded forms of the hybrid DNA

molecule. One then synthesizes an oligodeoxynucleotide

whose internal sequence encodes the new amino acids one

wishes to introduce into the protein. The ends of the

oligonucleotide are made such that they match the already

known sequence of the gene. (Figure 4)

ENCODING BASE CHANGES

When the oligonucleotide is added to the single-stranded

hybrid, it hybridizes by virtue of its complementary ends.

DNA polymerase, using nucleoside triphosphates as building

blocks, the oligonucleotide as primer, and the gene-

containing hybrid as template, proceeds to complete the

second strand. A heteroduplex circle results (FIGURE 4).

The two strands are mismatched in the region where the first

strand contains the old sequence and the second strand

contains the new sequence representing the amino acid

alterations one desires to make. When the heteroduplex is

introduced into E. coli, one of the two strands is selected

for viral replication and production. Approximately half

the time the viral DNA obtained has the old sequence fixed

in both strands; the other half of the time it has the new sequence fixed

in both strands. One can then re-introduce this altered DNA

into the expression vector and obtain the desired protein

variant. In addition to amino acid replacements, one can

similarly produce amino acid additions and deletions. This

has been an important program in our studies of bovine

somatotropin.
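
The design of the mutagenic oligonucleotide can be sketched as follows. The example assumes that the single-stranded template carries the coding strand of the gene, so the oligonucleotide must be its reverse complement; in practice this depends on how the gene was inserted into M13. The gene, the chosen codon change, and the flank length of nine bases are all made up for illustration.

```python
# Sketch: design an oligonucleotide that anneals to a single-stranded
# template through matching flanks but carries one changed codon.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def mutagenic_oligo(template, codon_index, new_codon, flank=9):
    """Return (oligo, mutant_gene) for replacing one codon (0-based index)."""
    start = codon_index * 3
    mutant = template[:start] + new_codon + template[start + 3:]
    lo = max(0, start - flank)
    hi = min(len(template), start + 3 + flank)
    # The oligo pairs with the template, so it is the reverse complement
    # of the desired new sequence over the chosen window.
    return reverse_complement(mutant[lo:hi]), mutant

# Hypothetical gene: Met-Ala-Trp-Lys-Asp-His-Glu-Gln-stop
gene = "ATGGCTTGGAAAGATCATGAACAGTAA"
oligo, mutant = mutagenic_oligo(gene, 3, "GAA")   # Lys (AAA) -> Glu (GAA)

# Count the mismatched positions when the oligo anneals to the old sequence.
window = gene[0:len(oligo)]
mismatches = sum(a != b for a, b in zip(reverse_complement(oligo), window))
print(oligo, mismatches)
```

A single mismatched base surrounded by matching flanks is enough for the oligonucleotide to hybridize and prime synthesis while still carrying the new sequence into the second strand.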

It is obvious that one must have some idea of what amino

acids ought to be varied and what specific variations should

be made.

If the gene is in hand and already sequenced, if

synthetic oligonucleotides are available, and if the

expression system and protein purification scheme are in

place, then one person can hope to generate a half dozen

variant proteins in a three to six month period. However,

when you consider that any one of twenty amino acids could

be put into any one of a typical protein's one hundred amino

acid positions and that you could delete, add, or rearrange

protein segments as well, it is clear that one cannot

realistically expect to construct all possible structural

variants. Information from the first set of variants

constructed in this way and knowledge of peptide chemistry

are critical elements in advancing a site-specific

mutagenesis program in a productive fashion.
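
The arithmetic behind this point is easy to check. The sketch below uses only the figures given above (twenty amino acids, one hundred positions, half a dozen variants per person in three to six months); the variable names are illustrative.

```python
# Sketch: how many variants exist versus how many one person can build.
positions = 100      # amino acid positions in the "typical" protein
amino_acids = 20

# Variants that differ from the natural protein at exactly one position.
single_substitutions = positions * (amino_acids - 1)

# Every possible sequence of the same length.
all_sequences = amino_acids ** positions

# Half a dozen variants per three to six months, per person.
variants_per_year_low = 6 * (12 / 6)
variants_per_year_high = 6 * (12 / 3)

years_min = single_substitutions / variants_per_year_high
years_max = single_substitutions / variants_per_year_low

print(single_substitutions)
print(len(str(all_sequences)), "digits in the count of all sequences")
print(round(years_min), "to", round(years_max), "person-years")
```

Even the single substitutions alone amount to roughly 80 to 160 person-years at the stated pace, before deletions, additions, or rearrangements are considered.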

The second approach to variant construction, random

mutagenesis, can be a powerful adjunct. This depends on

generating mutations randomly throughout the gene or a

region of the gene enzymatically or chemically. Its

advantage is that it can be applied to proteins for which we

have little structure:function information. What is

critical in this approach is that a rapid, functional assay

be available.

When the randomly mutated gene is re-inserted

into the expression vector and returned to the host cell,

each cell now produces a different variant protein. With E.

coli or yeast cell systems, one can reasonably plate and

screen thousands of variant proteins on a nutrient agar

plate. With mammalian cells, hundreds can potentially be

screened. The plates are subjected to the particular assay

and the rare variants identified on the plates.
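
The logic of a random mutagenesis screen can be sketched as a small simulation. Everything specific here is an assumption made for illustration: the gene is made up, each variant carries exactly one random base change, and the "assay" is a stand-in that simply checks one codon, whereas a real assay would score some function of the protein produced by each colony.

```python
# Sketch: build a library of random point mutants and screen it.
import random

def random_point_mutant(gene, rng):
    """Return a copy of the gene with one randomly chosen base changed."""
    pos = rng.randrange(len(gene))
    new_base = rng.choice([b for b in "ACGT" if b != gene[pos]])
    return gene[:pos] + new_base + gene[pos + 1:]

def screen(library, assay):
    """Keep only the variants that score positive in the assay."""
    return [variant for variant in library if assay(variant)]

def assay(variant):
    """Hypothetical assay: positive when codon 3 reads GAA instead of AAA."""
    return variant[9:12] == "GAA"

rng = random.Random(0)                    # fixed seed for repeatability
gene = "ATGGCTTGGAAAGATCATGAACAGTAA"      # made-up gene
library = [random_point_mutant(gene, rng) for _ in range(1000)]
hits = screen(library, assay)

print(len(library), "variants screened,", len(hits), "positive")
```

Only a small fraction of the library scores positive, which is why the approach depends on an assay that is fast enough to apply to every colony on the plate.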

CONCLUSION

The technologies to produce large amounts of specific

proteins and variants of those proteins represent powerful

tools for the synthesis of protein products and the

identification of non-protein products. A very fertile area

of research will be the combination of these technologies

with techniques for protein crystallization, X-ray

crystallography, and computer-assisted protein structure

analysis.

ACKNOWLEDGEMENT

I thank Gwen Krivi and Roger Wiegand for their advice,

Vicki Grant for her tireless assistance, and Clarence Styron

for his encouragement and editorial assistance in the

completion of this article.

