GENETIC ENGINEERING: FUNDAMENTALS AND APPLICATIONS
II. cDNA Cloning and Protein/Variant Production
David C. Tiemeier, Senior Fellow
Biological Sciences Department
Monsanto Company
St. Louis, Missouri
(NOTE: Edited for CompuServe by C. E. Styron. Comments on
this article can be forwarded through CompuServe E-Mail to
76054,1666 or through SourceMail BBH329.)
INTRODUCTION
In an earlier article in this series, I discussed some of
the basic concepts and techniques associated with
recombinant DNA technology. These techniques can be used to
isolate and amplify, that is to clone, a piece of DNA from
the total set of DNA of an organism. Depending on the
amount of DNA required to encode the particular gene(s) one
wishes to study, the cloned DNA fragment might encode a
portion of one gene or up to several complete genes. This
procedure also yields DNA sequences which flank the gene and
are critical for the normal physiological control of protein
production.
Recombinant DNA technology permits one to make not only
large amounts of the DNA corresponding to a particular gene
but also large amounts of the protein encoded by that gene.
This can, in turn, permit detailed studies of the protein's
structure and its interaction with substrate and inhibitors.
Moreover, this can be the basis for a production process if
the protein, itself, is judged to be a product candidate for
animal or human health care applications.
PROTEIN PRODUCTION
The vector used to produce the protein encoded by the cloned
gene differs from that described for the simple
amplification of the hybrid DNA. DNA elements are
incorporated on either side of the foreign gene which direct
the host cell to produce an mRNA transcript and subsequently
translate the mRNA into protein. By using regulatory
elements known to be very efficient in the host cell, high
level production of the protein can be achieved.
In some cases, proteins have been produced at high levels in
E. coli. Human and bovine somatotropin are examples of
this. The yeast, Saccharomyces, better known to the
fermentation industry, has also been used for the production
of some proteins. Proteins are more readily secreted from
yeast, and that feature may facilitate protein isolation in
some instances. In other cases, animal cells have been used
as the expression host. Tissue plasminogen activator (tPA)
is an example of this. Animal cells may
prove particularly useful for producing proteins whose
function is affected by post-translational modifications.
Many of these modifications are not performed by E. coli.
Some are performed by yeast but in a way different from
animal cells.
cDNA CLONING
Gene-containing DNA fragments, isolated directly from the
cell's DNA as described in the first article, may be used
for the protein production I have just described; but two
reasons have prompted scientists to use an alternative
cloning approach. First, most genes only occur as single
copies in the DNA. Since the DNA of higher eukaryotes such as
soybeans, cows, and humans has enough information for one
to ten million genes, one must mount a non-trivial cloning
project to pull out the full gene copy. Second, as
mentioned in the first article, many genes have been found
to be interrupted by DNA segments called intervening
sequences, or introns. These are good for the gene,
apparently facilitating the stable accumulation of the
gene's mRNA, but rough on the molecular biologist who would
like to over-produce the encoded protein. Introns can make
the stretch of DNA encoding the gene so big that it doesn't
fit into most vectors. Moreover, some of the key
host:vector systems for protein production, principally
those involving E. coli, do not remove the introns from the
mRNA. As a result, even though mRNA might be made within
the host it cannot be properly translated into protein.
To understand the alternative cloning approach, it is useful
to recall the "central dogma" of molecular biology. An
organism's traits are encoded in DNA. The information for a
specific protein, associated with a particular trait or
characteristic, is converted into a second polynucleotide
termed messenger RNA (mRNA) by a process called
transcription. This specific bit of information is then
translated into the particular protein. (Figure 2)
FIGURE 2: "CENTRAL DOGMA" OF MOLECULAR BIOLOGY
DNA =============================
||
|| TRANSCRIPTION
\|/
\|/
RNA -----------------------------
\|/
\|/
TRANSLATION
\|/
\|/
PROTEIN
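The flow of information in Figure 2 can be sketched in a few
lines of Python. This is a minimal illustration, not a
laboratory tool: the function names and the short example
gene are assumptions made for the sketch, and only the
standard genetic code table is taken as given.

```python
# Standard genetic code, written compactly with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand):
    """Return the mRNA matching a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG to the first stop codon (one-letter codes)."""
    dna_like = mrna.upper().replace("U", "T")
    start = dna_like.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(dna_like) - 2, 3):
        amino_acid = CODON_TABLE[dna_like[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical five-codon gene: Met-Ala-Phe-Pro-stop.
gene = "ATGGCTTTCCCATAA"
mrna = transcribe(gene)      # "AUGGCUUUCCCAUAA"
protein = translate(mrna)    # "MAFP"
```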
Different cells in an organism are specialized to produce
only a portion of the proteins encoded in the total DNA
complement. Some cells produce as much as 1 - 2 percent of
their total protein as a single species. Generally, mRNA
levels reflect this same bias. Hence, if one can start with
a population of cells that are preferentially producing the
protein of interest and one can produce a double-stranded
DNA copy from the mRNA, the gene cloning will have the
advantage of starting with a DNA population highly enriched
in the desired gene. Moreover, the mRNA that accumulates in
the cell and that is subsequently translated into protein is
a mature form lacking the introns. Hence, the double-
stranded DNA that results also lacks the introns and so
contains the amino acid information in an uninterrupted
form.
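The difference between the primary transcript and the mature
mRNA can be illustrated with a small sketch. The sequences
and intron coordinates below are invented for illustration,
and the function name is an assumption; the point is only
that removing the intron restores an uninterrupted reading
frame.

```python
def splice(pre_mrna, introns):
    """Remove intron intervals (0-based, end-exclusive) and join the exons."""
    mature = []
    position = 0
    for start, end in sorted(introns):
        mature.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                           # skip over the intron itself
    mature.append(pre_mrna[position:])           # keep the final exon
    return "".join(mature)


# Hypothetical transcript: two exons interrupted by one 12-base intron.
exon1 = "AUGGCU"
intron = "GUAAGUUUUCAG"
exon2 = "UUCCCAUAA"
pre_mrna = exon1 + intron + exon2

intron_span = (len(exon1), len(exon1) + len(intron))
mature_mrna = splice(pre_mrna, [intron_span])   # "AUGGCUUUCCCAUAA"
```

A cDNA copied from the mature form carries only the exon
sequence, which is why it can be expressed in a host such as
E. coli that does not remove introns.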
Fortunately for the molecular biologist, an enzyme
associated with certain animal viruses is capable of using
mRNA as a template to generate a complementary piece of DNA.
Eukaryotic mRNA typically has a run of adenylic acid at its
3' terminus (FIGURE 3a) so that a short piece of
deoxythymidylate can be used as a primer or starter for the
enzymatic synthesis (FIGURE 3b). Because this enzymatic
process is the opposite of transcription, the enzyme has
been named reverse transcriptase. The resulting
complementary DNA is referred to as cDNA and the cloning
approach based on this initial conversion of mRNA to cDNA is
termed cDNA cloning.
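As a rough sketch of what reverse transcriptase accomplishes,
the first-strand cDNA is simply the DNA complement of the
mRNA, read in the opposite direction and primed from the
poly(A) tail. The function name, the 12-base primer length,
and the example message are assumptions for illustration.

```python
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(mrna, primer_length=12):
    """Return first-strand cDNA (5'->3') primed by oligo(dT) on the poly(A) tail."""
    mrna = mrna.upper()
    if not mrna.endswith("A" * primer_length):
        raise ValueError("no poly(A) tail long enough for the oligo(dT) primer")
    # The cDNA is complementary and antiparallel to the mRNA template.
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))


# Hypothetical message: a short coding region followed by a 15-base poly(A) tail.
mrna = "AUGGCUUUCCCAUAA" + "A" * 15
cdna = reverse_transcribe(mrna)
# The cDNA begins with the oligo(dT) primer: 15 T's, then "TTATGGGAAAGCCAT".
```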
The single-stranded cDNA can be converted into a double-
stranded DNA using ribonuclease H and DNA polymerase. The
former chews away the mRNA in the hybrid leaving short RNA
pieces which can act as primers to start synthesis of the
second DNA strand (FIGURE 3d). This process results in a
double-stranded cDNA (FIGURE 3e). Short, chemically-
synthesized oligonucleotides containing desired restriction
enzyme sites can then be attached with DNA ligase (FIGURE
3f). Subsequent cutting with the appropriate restriction
enzyme then generates the single-stranded termini or ends
(FIGURE 3g) described last time which can mediate
recombination with the vector.
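The linker and digestion steps can be mimicked on the top
strand of a double-stranded cDNA. The sketch below assumes
EcoRI (which cuts G^AATTC) and a hypothetical 8-base linker;
neither choice is specified in the text. It also ignores a
practical complication: any EcoRI site inside the cDNA
itself would be cut as well.

```python
ECORI_SITE = "GAATTC"   # EcoRI cuts between G and A: G^AATTC
LINKER = "GGAATTCC"     # hypothetical palindromic linker carrying one EcoRI site


def add_linkers(cdna_top):
    """Attach the linker to both ends, as DNA ligase would (top strand only)."""
    return LINKER + cdna_top + LINKER


def digest_top_strand(top, site=ECORI_SITE, cut_offset=1):
    """Cut the top strand at every site and return the fragments, 5'->3'."""
    fragments = []
    start = 0
    pos = top.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(top[start:cut])
        start = cut
        pos = top.find(site, pos + 1)
    fragments.append(top[start:])
    return fragments


cdna = "ATGGCTTTCCCATAA"                 # hypothetical double-stranded cDNA, top strand
linked = add_linkers(cdna)
fragments = digest_top_strand(linked)
insert = fragments[1]                    # "AATTCCATGGCTTTCCCATAAGG"
# The insert's top strand now starts with AATT, the single-stranded
# terminus that can pair with an EcoRI-cut vector.
```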
There are many variations on the cDNA cloning scheme
outlined here. All start, however, with mRNA enriched for
the particular sequence of interest and take advantage of
enzymatic tools for converting the mRNA into a double-
stranded cDNA.
As with the cloning of a specific piece of genomic DNA, the
identification of the desired cDNA clone typically depends
on hybridization with labeled oligonucleotides specific for
the desired gene or screening of the cells transformed by
the hybrids with antibodies specific for the desired
protein. Basically, if one can purify small amounts of the
desired protein, its gene can be cloned. Protein
microsequencing on as little as 10 - 100 micrograms of
protein can provide a partial amino acid sequence on which,
since the DNA genetic code is known, gene-specific,
synthetic oligonucleotide probes can be based. Alternatively,
specific antibodies raised against similar quantities of
protein can be used to screen cDNA libraries if they are
constructed so as to produce the encoded protein.
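Because most amino acids are specified by more than one
codon, a probe based on a partial amino acid sequence is
really a mixture of possible sequences. The sketch below
counts how many DNA sequences could encode a stretch of
peptide and picks the least ambiguous window. The peptide and
the function names are hypothetical; only the standard
genetic code is taken as given.

```python
from math import prod

# Standard genetic code, written compactly with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Invert the table: amino acid -> list of codons that encode it.
CODONS_FOR = {}
for codon, amino_acid in CODON_TABLE.items():
    if amino_acid != "*":
        CODONS_FOR.setdefault(amino_acid, []).append(codon)


def degeneracy(peptide):
    """Number of distinct DNA sequences that could encode the peptide."""
    return prod(len(CODONS_FOR[aa]) for aa in peptide)


def least_degenerate_window(peptide, length):
    """Return the window of `length` residues needing the fewest probe sequences."""
    windows = [peptide[i:i + length] for i in range(len(peptide) - length + 1)]
    best = min(windows, key=degeneracy)
    return best, degeneracy(best)


# Hypothetical partial sequence from protein microsequencing.
partial_sequence = "LSRMWKDCFA"
window, mixture_size = least_degenerate_window(partial_sequence, 6)
# window == "MWKDCF", mixture_size == 16: an 18-base probe made as a
# mixture of 16 sequences covers every possible coding of this stretch.
```

Stretches rich in methionine and tryptophan, which have a
single codon each, give the smallest probe mixtures.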
VARIANT PROTEIN PRODUCTION
In addition to permitting the production of naturally-
occurring proteins, the expression technology can be adapted
to the production of variants of naturally-occurring
proteins. This can be useful for defining the relationships
between protein structure and function and for providing
novel proprietary compositions for product applications.
There are two basic approaches to the construction of
variants: site-specific mutagenesis and random mutagenesis.
Site-specific or site-directed mutagenesis is a very precise
means of generating proteins with specific alterations in
their structure. One needs to have already cloned the
desired gene and to know its DNA sequence. The cloned gene
is then introduced into a bacterial virus system known as
M13. This has the unique characteristic of yielding either
double-stranded or single-stranded forms of the hybrid DNA
molecule. One then synthesizes an oligodeoxynucleotide
whose internal sequence encodes the new amino acids one
wishes to introduce into the protein. The ends of the
oligonucleotide are made such that they match the already
known sequence of the gene. (Figure 4)
FIGURE 4: ENCODING BASE CHANGES
When the oligonucleotide is added to the single-stranded
hybrid, it hybridizes by virtue of its complementary ends.
DNA polymerase, using nucleoside triphosphates as building
blocks, the oligonucleotide as primer, and the gene-
containing hybrid as template, proceeds to complete the
second strand. A heteroduplex circle results (FIGURE 4).
The two strands are mismatched in the region where the first
strand contains the old sequence and the second strand
contains the new sequence representing the amino acid
alterations one desires to make. When the heteroduplex is
introduced into E. coli one of the two strands is selected
for viral replication and production. Approximately half
the time the viral DNA obtained has the old sequence fixed
in both strands; the other half has the new sequences fixed
in both strands. One can then re-introduce this altered DNA
into the expression vector and obtain the desired protein
variant. In addition to amino acid replacements, one can
similarly produce amino acid additions and deletions. This
has been an important program in our studies of bovine
somatotropin.
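The design of the mutagenic oligonucleotide can be sketched
as follows: the new codon sits in the middle, flanked by
arms that match the known gene sequence. The gene, the codon
chosen, and the 9-base arm length are all hypothetical. The
sketch writes the oligonucleotide in the sense of the coding
strand; in practice it must be complementary to whichever
strand the single-stranded M13 hybrid carries, so a reverse
complement helper is included.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def mutagenic_oligo(gene, codon_index, new_codon, arm=9):
    """Build an oligo: matching arm + new codon + matching arm (coding sense)."""
    start = 3 * codon_index
    if start - arm < 0 or start + 3 + arm > len(gene):
        raise ValueError("codon is too close to the end of the gene for this arm length")
    return gene[start - arm:start] + new_codon + gene[start + 3:start + 3 + arm]


# Hypothetical gene: ATG GCT TTC CCA GGT AAA GAT TGC TAA
gene = "ATGGCTTTCCCAGGTAAAGATTGCTAA"

# Replace codon 4 (GGT, glycine) with GCT (alanine).
oligo = mutagenic_oligo(gene, codon_index=4, new_codon="GCT")

# Compare the oligo with the unaltered gene over the same 21 bases.
original_window = gene[3:24]
mismatches = sum(1 for a, b in zip(oligo, original_window) if a != b)
# oligo == "GCTTTCCCAGCTAAAGATTGC"; it differs from the gene at one base.

# Version to use if the single-stranded template carries the coding strand.
annealing_oligo = reverse_complement(oligo)
```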
It is obvious that one must have some idea of what amino
acids ought to be varied and what specific variations should
be made.
If the gene is in hand and already sequenced, if
synthetic oligonucleotides are available, and if the
expression system and protein purification scheme are in
place, then one person can hope to generate a half dozen
variant proteins in a three to six month period. However,
when you consider that any one of twenty amino acids could
be put into any one of a typical protein's one hundred amino
acid positions and that you could delete, add, or rearrange
protein segments as well, it is clear that one cannot
realistically expect to construct all possible structural
variants. Information from the first set of variants
constructed in this way and knowledge of peptide chemistry
are critical elements in advancing a site-specific
mutagenesis program in a productive fashion.
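The arithmetic behind this point is easy to check. The sketch
below uses the figures quoted above (twenty amino acids, one
hundred positions, a half dozen variants in three months at
best) and counts replacements only, ignoring deletions,
additions, and rearrangements. The variable names are mine.

```python
from math import comb

AMINO_ACIDS = 20
POSITIONS = 100

# Every possible 100-residue sequence.
all_sequences = AMINO_ACIDS ** POSITIONS            # a 131-digit number

# Variants differing from the natural protein at exactly one position.
single_replacements = POSITIONS * (AMINO_ACIDS - 1)             # 1,900

# Variants differing at exactly two positions.
double_replacements = comb(POSITIONS, 2) * (AMINO_ACIDS - 1) ** 2   # 1,786,950

# Time for one person to make every single replacement at the fastest
# rate quoted in the text: six variants per three months.
months_for_singles = single_replacements / 6 * 3                # 950 months
years_for_singles = months_for_singles / 12                     # about 79 years
```

Even the single replacements alone are beyond the reach of
one person, which is why the choice of which variants to
make matters so much.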
The second approach to variant construction, random
mutagenesis, can be a powerful adjunct. This depends on
generating mutations randomly throughout the gene or a
region of the gene enzymatically or chemically. Its
advantage is that it can be applied to proteins for which we
have little structure:function information. What is
critical in this approach is that a rapid, functional assay
be available.
When the randomly mutated gene is re-inserted
into the expression vector and returned to the host cell,
each cell now produces a different variant protein. With E.
coli or yeast cell systems, one can reasonably plate and
screen thousands of variant proteins on a nutrient agar
plate. With mammalian cells hundreds can potentially be
screened. The plates are subjected to the particular assay
and the rare variants identified on the plates.
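The logic of a random mutagenesis screen can be modeled in a
few lines. This is a toy simulation under stated
assumptions: each base is changed independently at a fixed
rate, each library member stands for one colony, and the
assay is a placeholder function supplied by the user. None
of these details come from a real protocol.

```python
import random

BASES = "ACGT"


def mutate(gene, rate, rng):
    """Change each base to a different base with probability `rate`."""
    mutated = []
    for base in gene:
        if rng.random() < rate:
            mutated.append(rng.choice([b for b in BASES if b != base]))
        else:
            mutated.append(base)
    return "".join(mutated)


def build_library(gene, size, rate, seed=0):
    """Return `size` independently mutated copies of the gene (one per colony)."""
    rng = random.Random(seed)
    return [mutate(gene, rate, rng) for _ in range(size)]


def screen(library, assay):
    """Keep only the variants that pass the (hypothetical) functional assay."""
    return [variant for variant in library if assay(variant)]


# Hypothetical gene and a placeholder assay that flags any altered sequence.
gene = "ATGGCTTTCCCAGGTAAAGATTGCTAA"
library = build_library(gene, size=2000, rate=0.02)
altered = screen(library, lambda variant: variant != gene)
```

In a real screen the assay would report protein function
rather than sequence, which is why the availability of a
rapid functional assay is the critical requirement.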
CONCLUSION
The technologies to produce large amounts of specific
proteins and variants of those proteins represent powerful
tools for the synthesis of protein products and the
identification of non-protein products. A very fertile area
of research will be the combination of these technologies
with techniques for protein crystallization, X-ray
crystallography, and computer-assisted protein structure
analysis.
ACKNOWLEDGEMENT
I thank Gwen Krivi and Roger Wiegand for their advice,
Vicki Grant for her tireless assistance, and Clarence Styron
for his encouragement and editorial assistance in the
completion of this article.