Computer Assisted Parental Sequences Analysis as a Previous Step to DNA Shuffling Process

June 3, 2017 | Author: M. Nicoletti | Category: Directed evolution, Sequence Analysis, Genetic Diversity


2006 IEEE Congress on Evolutionary Computation Sheraton Vancouver Wall Centre Hotel, Vancouver, BC, Canada July 16-21, 2006

Luciana Montera¹,², Maria do Carmo Nicoletti¹, Flavio Henrique-Silva²

Abstract: Directed evolution experiments have been successfully used to improve specific biological functions. DNA shuffling is a directed evolution process which generates genetic diversity through the recombination of parental sequences. The sequences resulting from a shuffling process are expected to be a mixture of the parental sequences. In order to evaluate which pair of sequences could potentially produce the best result, this paper analyses, using estimates produced by a software tool, the adequacy of three pairs of parental sequences as candidates for undergoing a DNA shuffling process.

I. INTRODUCTION

Directed molecular evolution is an in vitro technique that tries to mimic the natural process of selection and evolution described by Darwin, aiming to produce proteins with improved properties. The starting point of directed molecular evolution is a pool of molecules with varying sequences of the four nucleotide subunits of DNA, and the basic strategy is recombination, a process that tries to assemble new functional sequences from the previous ones in a cyclic manner. Several directed molecular evolution methods have been proposed and are in current use, such as Stemmer's DNA shuffling reaction [1][2], heteroduplex recombination [3], the staggered extension process (StEP) [4] and a few others (listed in [5]). These methods aim at generating novel sequences that encode functionally interesting proteins. They have already been used to improve certain properties of a variety of commercially available products such as pharmaceutical proteins, vaccines and industrial enzymes (see [5], [6] and [7]).

The basic principle of in vitro evolution, as described in [8], can be summarized as follows. Initially a library of molecules (DNA, RNA or proteins) is built. This library can be created using either random peptide or oligonucleotide molecules, or variants of one or more parent molecules obtained through mutagenesis. The first option is not very appealing because, although the potential diversity of a random library is huge, any library can contain only a very small fraction of the potential molecules. A library built by mutagenesis from one or more molecules that are already known to have some desired property might be more useful, mainly because diversity can be kept under control.

Once the molecular library has been built, some of its molecules may have a desired function. A selection process is then used to isolate these molecules; generally only a small number of molecules are isolated. Next, mutagenesis is used to increase the number of the selected molecules as well as their diversity. The molecules resulting from mutagenesis then undergo an amplification process in order to increase their numbers. Selection, mutagenesis and amplification constitute one cycle of an in vitro evolution process. The cycle is repeated until molecules with the desired properties have been selected. Fig. 1 shows a simplified diagram of the general scheme of an in vitro evolution process.

Fig. 1. General in vitro evolution scheme.

¹ Department of Computer Science and ² Department of Genetics and Evolution, Federal University of São Carlos, São Carlos, SP, Brazil. The first and the second authors would like to thank the funding agencies CAPES and CNPq, respectively, for their support.

0-7803-9487-9/06/$20.00/©2006 IEEE


Mutagenesis is a fundamental step in in vitro evolution experiments. As can be inferred from the diagram in Fig. 1, mutagenesis can be used either for initially creating the molecular library or for increasing the molecular diversity after the selection process. Of prime importance in all directed evolution protocols is the production of molecular diversity by mutagenesis. In vitro evolution experiments that do not implement mutagenesis are confined to the selection of only those molecules that are part of the initial library. Because the initial molecular library usually contains only a small fraction of the potential molecules, the probability that this library contains molecules with the desired properties is very low. The parental molecules are also of great importance in a mutagenesis process. Depending on the mutagenesis protocol, as well as on certain characteristics of these molecules, the process has greater or lesser chances of being successful.

Mutagenesis techniques have been continually proposed and developed. A very important mutagenesis technique is known as DNA shuffling and was devised by Stemmer [1][2]. Although it is not yet completely understood, shuffling has been successfully used in many applications (such as those described in [5], [6] and [7]).

This paper analyses the adequacy of three pairs of parental sequences as candidates for undergoing a DNA shuffling process, based on results given by a software tool that estimates the diversity represented in libraries generated by recombining two highly homologous parental sequences. Following this introductory section, Section 2 describes DNA shuffling in detail as a four-step process and discusses the relevance of the initial DNA sequences to the process. Section 3 describes the main functionalities of the software DRIVeR (Diversity Resulting from In vitro Recombination), which is suitable for estimating the number of distinct variants represented in shuffled libraries, the way it operates and its main assumptions. Section 4 describes and discusses the results obtained using DRIVeR for assessing the adequacy of three different pairs of sequences as parental sequences, under different software settings. The experiments aimed at identifying the pair of parental sequences with the highest probability of producing the highest number of distinct variants in a DNA shuffling process. Finally, Section 5 presents conclusions and highlights the scope for future work.

II. ABOUT DNA SHUFFLING AND THE IMPORTANCE OF PARENTAL DNA SEQUENCES

The technique known as DNA shuffling (or sexual PCR), proposed and developed by Stemmer [1][2], is a method for in vitro recombination of homologous³ genes, strongly based

on a DNA synthesis process known as Polymerase Chain Reaction (PCR). In order to describe the DNA shuffling process in detail, the main characteristics of a PCR process are presented first.

A. Polymerase Chain Reaction (PCR)

As defined in [9], “PCR is an elegant but simple technique for the in vitro amplification of target DNA utilizing DNA polymerase and two specific oligonucleotide or primer sequences flanking the region of interest…. Each cycle (of PCR) doubles the region marked by the primer sequences. By sequential iteration of the process, PCR exponentially generates up to a billion copies of the target within just a few hours.” The DNA polymerase enzyme synthesizes new strands of DNA in the 5’→3’ direction from a single-stranded template. In a PCR cycle, the three temperature-controlled steps, as described in [10][11] and shown in Fig. 2, are:

1) Denaturing: double-stranded DNA molecules are heated to near boiling temperature so that they separate completely into two single-stranded sequences;
2) Annealing: the temperature is lowered so that primers anneal to the single-stranded sequences;
3) Extension: the temperature is raised again to the optimum temperature for the polymerase. DNA polymerases use the single-stranded sequences as templates to extend the primers that have been annealed to them.

Many variables and parameters affect a PCR process and its results, such as the number of cycles, the temperature and duration of each individual step (denaturation, primer annealing and extension), and the quality and concentration of the primers used in the annealing step. Reference [9] presents an exhaustive list of them and an in-depth discussion of the PCR process, its results and limitations.

It is important to mention, though, as pointed out in [12], that “while in theory one would expect an exponential growth for the target as function of PCR cycles (i.e., $2^n$ times the original DNA copy number, after n cycles), in practice, replication processes measured by different real-time PCR systems show varying yields, suggesting a biochemical random process. In addition to variable gains and inconsistent amplification levels within a PCR process, there is also the likelihood of creating non-specific byproducts (i.e., DNA strands different than the target) as well as inserting mutations into the product, which further degrades the quality of the PCR product”.
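The gap between the ideal $2^n$ growth and the lower yields observed in practice can be illustrated with a minimal sketch. It assumes a constant per-cycle efficiency, which is a simplification introduced here for illustration and is not the stochastic model of [12]; the function name `pcr_copies` and the efficiency value 0.8 are hypothetical.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after `cycles` PCR cycles.

    `efficiency` is the fraction of templates copied in each cycle
    (1.0 means ideal doubling). A constant efficiency is an
    illustrative assumption, not a measured quantity.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    return initial_copies * (1.0 + efficiency) ** cycles


# Ideal case: one template, 30 cycles, perfect doubling (2**30 copies).
ideal = pcr_copies(1, 30)
# Same run with a hypothetical 80% per-cycle efficiency.
reduced = pcr_copies(1, 30, efficiency=0.8)
print(f"ideal: {ideal:.3e}, reduced efficiency: {reduced:.3e}")
```

Even a modest loss of efficiency per cycle compounds over the cycles, which is one reason measured yields can fall well below the theoretical value.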

³ Homology among proteins and DNA is often concluded on the basis of sequence similarity. For example, in general, if two genes have an almost identical DNA sequence, it is likely that they are homologous. However, it may be that the sequence similarity did not arise from their sharing a common ancestor; short sequences may be similar by chance, or sequences may be similar because both were selected to bind to a particular protein, such as a transcription factor. Such sequences are similar but not homologous (en.wikipedia.org/wiki/Homology_(biology)).


Fig. 2. The three-step temperature-controlled PCR process.

PCR-based DNA technology can be used to perform simultaneous mutation and recombination. As stated in [13], “The application of PCR techniques has blurred the distinctions among mutagenesis, recombination, and synthesis of genes. The product of PCR-based manipulations is really a mosaic in which sequences derived from natural sources are connected by sequences derived from the synthetic oligonucleotide primers used to direct the amplification; essentially any desired gene sequence can be constructed by combining natural, mutant, and synthetic regions.”

B. DNA Shuffling

DNA shuffling processes can be categorized into two types: those that start with single genes, known as normal mutagenesis (e.g. error-prone PCR), and those that start with genes from a gene family, known as DNA shuffling (or sexual PCR). The focus of this paper is on the latter. Described in a simplified way, DNA shuffling starts with sequences of homologous genes that share some functionalities of interest and generates, from these genes, a library of recombined genes. The library contains genes whose DNA sequences are composed of a ‘mixture’ of fragments of the original sequences. Among the resulting genes, also known as recombinants, some are expected to show improved functionality in relation to the original genes. Fig. 3 shows how recombination occurs during shuffling. The original protocol of the DNA shuffling technique can be summarized as the following sequence of steps:

1) Selection of the parental genes;
2) Enzymatic digestion of the parental genes, also called fragmentation;
3) Cycles of PCR (without primers) in order to promote the reassembly of the fragments. Recombination occurs when fragments from different parents anneal at a region of high sequence similarity;
4) PCR amplification, with primers, of the reassembled sequences obtained in the previous step, in order to create full-length sequences.

Fig. 3. (a) Fragmentation of parental genes S1 and S2, which differ from each other in two positions. (b) Denaturation. (c) Recombination between two fragments from different parental sequences, due to their complementary regions (annealing). (d) Extension by a polymerase. The reassembled sequence in the figure is the result of a single crossover between parental sequences S1 and S2.
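The outcome of the crossover depicted in Fig. 3 can be mimicked in silico. The sketch below is only an abstraction of the result of reassembly, not of the fragmentation and annealing chemistry: it copies one aligned parent up to each crossover point and then switches to the other. The function name `make_chimera` and the toy sequences are hypothetical and are not the sequences shown in the figure.

```python
def make_chimera(parent1: str, parent2: str, crossover_points: list[int]) -> str:
    """Build a recombinant that starts on parent1 and switches parent
    at every crossover point (0-based index of the first base taken
    from the other parent). Parents must be aligned, equal-length strings.
    """
    if len(parent1) != len(parent2):
        raise ValueError("parents must be aligned to the same length")
    parents = (parent1, parent2)
    current, start, pieces = 0, 0, []
    for point in sorted(crossover_points):
        pieces.append(parents[current][start:point])
        current, start = 1 - current, point  # switch template
    pieces.append(parents[current][start:])
    return "".join(pieces)


# Two toy parents that differ at two positions (indices 1 and 9).
s1 = "AAGCTTGCAT"
s2 = "ATGCTTGCAA"

# One crossover between the two differing positions yields a new variant.
print(make_chimera(s1, s2, [5]))
# Two crossovers between the same pair of positions cancel out and
# reproduce parent S1, so they cannot be detected experimentally.
print(make_chimera(s1, s2, [3, 6]))
```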

Fig. 3 (a) shows the enzymatic digestion of parental sequences S1 and S2, resulting in four double-stranded fragments, namely F11/F13, F12/F14, F21/F23 and F22/F24. These fragments are denatured, resulting in eight single-stranded fragments, as shown in Fig. 3 (b). Fig. 3 (c) shows the recombination of two fragments, each originating from a different parental sequence. This is possible because the two fragments share complementary base regions. After extension by a polymerase enzyme, the reassembled sequence can be identified as the result of one crossover between parental sequences S1 and S2, as can be seen in Fig. 3 (d). The optimization of the reaction conditions in a shuffling process is generally conducted empirically; in many cases this is a time- and resource-consuming task, due to the many variables involved in the problem [14]. In spite of the lack of exact answers to all questions related to the optimization of this process, several qualitative proposals that try to model DNA shuffling or


some of its steps, aiming at providing information that would help to maximize recombination between the parental sequences, can be found in the literature (see [8], [14], [15] and [16]). As commented in [17], “in several instances, chimeric enzymes with improved activity and stability have been isolated from libraries constructed using DNA shuffling ([18][19][20][21]). In other cases, the method resulted in libraries with either too many mutations ([22]) or too few crossovers ([23]) to be useful.” One way to try to avoid these undesired results is to start the process with adequate parental sequences, since the results of DNA shuffling depend strongly on the adequacy of the parental genes, as discussed next.

C. The Importance of Parental Sequences in DNA Shuffling

It is very important that the parental sequences submitted to a shuffling process be carefully selected. The selection of these sequences is generally based on homology. One way to check how homologous two sequences are is to align them and check for similarities between corresponding base pairs. Although DNA shuffling assumes that parental sequences should be homologous, a determining issue for the success of the process is how homologous the parental sequences should be. Neither two identical nor two very different sequences are suitable. On one hand, identical sequences do not introduce any diversity. On the other hand, sequences that differ too much will hardly present the complementary regions required for annealing. Consider two single-stranded DNA sequences S1 and S2, both 136 bases long, differing from each other at 8 positions. Fig. 4 shows the alignment of S1 and S2 and identifies the eight positions at which they differ.

Fig. 4. Alignment between sequences S1 and S2. Both share all the bases, except for those in the shaded area.
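The similarity check described above can be sketched as follows for two sequences that are already aligned and of equal length. The function names and the toy sequences are hypothetical; they are not the sequences of Fig. 4. For the example in the text, two sequences of 136 bases differing at 8 positions are 128/136 ≈ 94.1% identical.

```python
def mismatch_positions(seq1: str, seq2: str) -> list[int]:
    """1-based positions at which two aligned, equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq1, seq2), start=1) if a != b]


def percent_identity(seq1: str, seq2: str) -> float:
    """Percentage of aligned positions carrying the same base."""
    differing = len(mismatch_positions(seq1, seq2))
    return 100.0 * (len(seq1) - differing) / len(seq1)


# Toy aligned sequences (illustrative only).
s1 = "ACGTACGTAC"
s2 = "ACGAACGTTC"
print(mismatch_positions(s1, s2))  # positions where the parents differ
print(percent_identity(s1, s2))    # share of identical positions, in percent
```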

The problem of comparing two sequences is formally described using the notation given in [16]. Consider two sequences of size N that differ from each other at m positions, named $m_1, m_2, \ldots, m_m$. Two consecutive differing positions $m_i$ and $m_{i+1}$ ($1 \le i \le m-1$) are separated from each other by $n_i$ identical base pairs. The variables $n_0$ and $n_m$ represent the number of identical base pairs before the first and after the last differing position, respectively. A diagram describing this is shown in Fig. 5, where only the differing positions are represented and the two sequences, named S1 and S2, are represented as single-stranded sequences.

Fig. 5. Generic representation of the alignment between two sequences S1 and S2 that differ from each other in 8 base pairs, each represented by its single-stranded DNA string 5’→3’. The dotted line between base pairs (o and x) indicates a mismatch.
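Under this notation, the values $n_0, n_1, \ldots, n_m$ follow directly from the sequence length N and the positions of the differing bases. The sketch below is a minimal illustration; the function name `spacings` and the example positions are hypothetical, and positions are assumed to be 1-based.

```python
def spacings(length: int, mutation_positions: list[int]) -> list[int]:
    """Return [n_0, n_1, ..., n_m]: the number of identical bases before
    the first differing position, between consecutive differing positions,
    and after the last one. Positions are 1-based.
    """
    positions = sorted(mutation_positions)
    if positions and (positions[0] < 1 or positions[-1] > length):
        raise ValueError("positions must lie between 1 and length")
    # Sentinels at 0 and length + 1 make the first and last gaps uniform.
    bounds = [0] + positions + [length + 1]
    return [b - a - 1 for a, b in zip(bounds, bounds[1:])]


# Hypothetical example: N = 20 with differing positions 3, 8 and 15.
n = spacings(20, [3, 8, 15])
print(n)  # n_0 .. n_3
```

As a consistency check, the identical bases and the m differing positions together must add up to N.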

In what follows, the positions where two parental sequences differ are called mutations. In order to estimate how diverse a library resulting from shuffling will be, it is first necessary to estimate the chances that mutations present in both parental sequences will be present in a recombined sequence. This estimate is directly related to the size of the selected fragments that are input to the PCR cycles, where the reassembly of the fragments occurs. As described before, fragments of parental DNA are produced by enzymatic digestion as a step prior to the PCR cycles. The resulting fragments are then purified in order to select only those whose size lies within a certain interval (measured in number of bases; generally fragments of 50 to 300 base pairs [1][2] are used), and then the reassembly phase starts. A more careful analysis shows that the size of the fragments is directly related to the diversity of the library resulting from a shuffling process.

In a shuffling process the reassembly occurs specifically in regions where single-stranded fragments share complementary bases, since these bases join through the formation of hydrogen bonds. The union of fragments originating from distinct parental sequences is described as a crossover event. Crossovers occurring in regions where the parental genes are identical are experimentally undetectable and are called silent crossovers. The higher the number of crossovers during the fragment reassembly phase, the more diverse the resulting library will be. One way to promote crossovers during the reassembly phase is to increase the number of fragments produced by enzymatic digestion. To do that, smaller fragments should be selected for the PCR phase, since the number of recombination events is inversely proportional to the size of the fragments. However, this has an undesirable side effect: small fragments are generally inefficiently reassembled [24].

Note that the recombination of fragments shown in Fig. 3 (c) was only possible because the enzymatic cuts in both sequences produced fragments with complementary base sub-regions at their ends. In order to maximize the chances of a higher number of crossovers occurring, the size of the selected fragments, represented by $t_{opt}$, should ideally be no greater than the smallest of the intervals that separate consecutive differing bases, as shown in (1).

$t_{opt} = \min\{\, n_i,\ 1 < i < m \,\}$    (1)
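The fragment size bound in (1) can be computed from the list of spacings $n_0, \ldots, n_m$. The sketch below takes the minimum over all gaps that lie between two consecutive mutations, which is an assumption about the intended index range of (1); the function name and the input values are hypothetical.

```python
def optimal_fragment_size(n: list[int]) -> int:
    """Smallest gap between consecutive mutations.

    `n` is [n_0, n_1, ..., n_m]; the flanking values n_0 and n_m are
    ignored because they do not separate two mutations. Using every
    internal gap is an assumption made for this sketch.
    """
    internal = n[1:-1]
    if not internal:
        raise ValueError("at least two mutations are needed")
    return min(internal)


# Hypothetical spacings for a sequence with three mutations.
print(optimal_fragment_size([2, 4, 6, 5]))
```

When two mutations are adjacent the bound drops to zero, which is one motivation for treating nearby mutations as a single region rather than as separate points.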


In practice, parental sequences are rarely the same size, and identifying the mutation points between them is not a trivial task. Generally only the local alignment between them is considered in experiments. In order to make the task feasible, we decided to focus on identifying mutation regions instead of single mutation points during the local alignment of parental sequences, which was conducted using BLAST [25]. The results discussed in Section 4 are based on alignments given by BLAST. In the experiments, all mutation points (i.e., the result of a mismatch or gap) that are not separated by at least five identical bases are considered a single mutation region (i.e., counted as one). As an example, consider the local alignment between sequences S1 and S2 shown in Fig. 6, where consecutive mutations separated by fewer than five identical bases are grouped into a single mutation region.

Fig. 6. Local alignment between sequences S1 and S2. The five mutation regions are shaded.
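The grouping rule used in the experiments (mutation points not separated by at least five identical bases form one region) can be sketched as follows. The function name `group_mutation_regions` and the example positions are hypothetical and do not correspond to the alignment of Fig. 6.

```python
def group_mutation_regions(positions: list[int], min_gap: int = 5) -> list[tuple[int, int]]:
    """Merge mutation points into regions.

    Two consecutive mutation points belong to the same region when fewer
    than `min_gap` identical bases lie between them. Each region is
    returned as a (first_position, last_position) tuple.
    """
    regions: list[tuple[int, int]] = []
    for pos in sorted(positions):
        # Identical bases between the end of the last region and pos.
        if regions and pos - regions[-1][1] - 1 < min_gap:
            regions[-1] = (regions[-1][0], pos)
        else:
            regions.append((pos, pos))
    return regions


# Six hypothetical mutation points collapse into three regions.
regions = group_mutation_regions([4, 6, 7, 20, 24, 40])
print(regions)
```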

Similarly to what occurs during PCR in relation to the size of the fragments, the distance $n_i$ that separates two consecutive mutations has a direct influence on the occurrence of crossovers during the reassembly phase: the bigger the distance $n_i$ between consecutive mutations $m_i$ and $m_{i+1}$, the bigger the chance that they will be separated into different fragments during the fragmentation step.

III. ABOUT DRIVER

DRIVeR (Diversity Resulting from In vitro Recombination) [16], available for download at www.bio.cam.ac.uk/~blackburn/stats.html, is a software tool that implements a statistical model which estimates the expected number of distinct sequences in a library created by random crossovers between two highly homologous parental sequences (i.e. differing from each other in only a few, e.g. 20, base pair positions). DRIVeR expects values for the following parameters:

1) N: parental sequence length;
2) λtrue: mean number of real crossovers per sequence;
3) L: library size;
4) M: number of mutation pairs between the parental sequences;
5) $m_i$, $1 \le i \le M$: the positions of the mutation pairs.

Besides the total number of sequences with at least one crossover, which is represented by the letter C, DRIVeR returns a probable location of each crossover within the reassembled sequences. As previously mentioned, the number of experimentally observed crossovers does not correspond to the real number of crossovers that occur in a reassembled sequence, due to the presence of silent crossovers. For a given value of λtrue, DRIVeR estimates the number of observed crossovers, represented by λobs. If the value of λtrue is known, then it should be used. If only the value of λobs is known, then different values of λtrue should be tried, by carrying out successive executions of DRIVeR, until the value of λobs is reproduced. The software assumes that the number of crossovers between two consecutive varying positions in a particular sequence resulting from shuffling follows a Poisson distribution (see [8] for details), given in (2).

$P(x) = \dfrac{e^{-\lambda}\,\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$    (2)

where λ is the true mean number of crossovers per sequence and P(x) is the probability that the sequence has exactly x crossovers. It should be noted, however, that not all crossovers are observable: 1, 3, 5, … crossovers between two varying base pairs look like only one crossover, while 2, 4, 6, … crossovers look like no crossover at all. Thus, the probability that any number of crossovers occurs between two consecutive points $m_i$ and $m_{i+1}$ is given by (3).

$P(b_i) = P(b_i = 0) + P(b_i = 1)$    (3)

where $P(b_i = 0)$ represents the probability that an even number of crossovers occurs between $m_i$ and $m_{i+1}$, and $P(b_i = 1)$ the probability that this number is odd. In this way, the probability of each one of the k possible variants resulting from shuffling (represented by $v_k$) occurring is given by the product of the values of $P(b_i)$ for each of the M−1 intervals between consecutive mutations, as shown in (4).

$P(v_k) = \prod_{i=1}^{M-1} P(b_i)$    (4)
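The even/odd argument can be made concrete with a short sketch. It is one possible reading of (2) to (4), not DRIVeR's actual implementation: for a Poisson count with mean μ, the probability of an odd count is $(1 - e^{-2\mu})/2$, and each variant takes the factor $P(b_i = 0)$ or $P(b_i = 1)$ that matches its crossover pattern. The assumption that crossovers are spread uniformly along the sequence, all function names, and the mutation positions are hypothetical. The last function imitates the trial-and-error search for λtrue from a known λobs by bisection.

```python
import math
from itertools import product


def parity_probs(mu: float) -> tuple[float, float]:
    """(P(even), P(odd)) for a Poisson-distributed count with mean mu."""
    p_odd = 0.5 * (1.0 - math.exp(-2.0 * mu))
    return 1.0 - p_odd, p_odd


def interval_means(lam: float, length: int, positions: list[int]) -> list[float]:
    """Mean number of crossovers between consecutive mutations, assuming
    (for this sketch) that crossovers fall uniformly on the length - 1
    junctions between neighbouring bases."""
    return [lam * (b - a) / (length - 1) for a, b in zip(positions, positions[1:])]


def pattern_probability(pattern, lam: float, length: int, positions: list[int]) -> float:
    """Product over the M-1 intervals of P(b_i = pattern[i])."""
    prob = 1.0
    for b, mu in zip(pattern, interval_means(lam, length, positions)):
        prob *= parity_probs(mu)[b]
    return prob


def observed_crossovers(lam: float, length: int, positions: list[int]) -> float:
    """Expected number of observable crossovers for a given true mean."""
    return sum(parity_probs(mu)[1] for mu in interval_means(lam, length, positions))


def fit_lambda_true(lam_obs: float, length: int, positions: list[int],
                    hi: float = 1000.0, tol: float = 1e-9) -> float:
    """Bisection search for the true mean that reproduces lam_obs."""
    if observed_crossovers(hi, length, positions) < lam_obs:
        raise ValueError("lam_obs cannot be reached for these mutation positions")
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if observed_crossovers(mid, length, positions) < lam_obs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical input: N = 189 and four mutation positions.
positions, length = [10, 40, 90, 150], 189
total = sum(pattern_probability(p, 4.0, length, positions)
            for p in product((0, 1), repeat=len(positions) - 1))
lam_obs = observed_crossovers(4.0, length, positions)
print(total, lam_obs, fit_lambda_true(lam_obs, length, positions))
```

The probabilities of all crossover patterns add up to one, and the expected number of observed crossovers is always smaller than the true mean, which reflects the loss caused by silent and mutually cancelling crossovers.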

IV. EVALUATING THREE PAIRS OF PARENTAL SEQUENCES AS CANDIDATES FOR A SHUFFLING PROCESS

The DRIVeR experiments described in this section aimed at analyzing the influence of the parameter $n_i$ on the number of variants obtained from a shuffling process using two homologous parental sequences. In an attempt to verify which parental pair would produce a shuffling library with the greatest number of variants, different experimental simulations were carried out using DRIVeR with three different pairs of parental sequences. The parental sequences chosen in this study encode cystatins, proteins which specifically inhibit cysteine proteases. They occur naturally in several plant species and are believed to be part of a plant's defence mechanism against some pathogens [26]. In the experiments, the three sequences used are mRNAs that code for cysteine proteinase inhibitors: one obtained from rice seeds (Oryza sativa, japonica cultivar-group, GenBank accession NM_190953); the second obtained from sugarcane (Saccharum officinarum, GenBank accession


AY119689), and the third from Sorghum seedlings (Sorghum bicolor, GenBank accession X87168). The three sequences will be referred to as Oryza, Cane and Sorghum, respectively. The simulations were carried out with the parental pairs Cane/Oryza, Cane/Sorghum and Oryza/Sorghum under different settings (library sizes of 1,000, 5,000 and 10,000 sequences, and the real number of crossovers varying from 1 to 10). Results are presented in Table I. The values in Table I have been grouped according to library sizes of 1,000, 5,000 and 10,000 sequences and are shown graphically in Fig. 7, 8 and 9, respectively. Note that using Oryza and Sorghum as parental sequences in any of the above situations would probably produce a library with greater diversity. Based on the results obtained, it is possible to analyze the effects of the average distance between mutations and of the number of mutations on the expected number of variants, for each situation considered. Table II shows the number of mutation regions and the average distance between consecutive mutations for each of the three parental pairs used. Fig. 7, as well as Fig. 8 and 9, shows that the library resulting from a shuffling process using Oryza and Sorghum as parental sequences has greater diversity than the library resulting from Cane and Sorghum or from Cane and Oryza. The results favouring Oryza/Sorghum can be justified by the fact that the average distance between existing parental mutations (12 base pairs) is greater than that of their counterparts (8.5 for Cane and Sorghum and 8.9 for Cane and Oryza). This fact reinforces the hypothesis that the bigger the distance between two consecutive mutations, the greater the chances that they will be recombined in a reassembled fragment.

TABLE I. DRIVER RESULTS USING (A) PARENTALS: ORYZA AND SORGHUM, ALIGNMENT SIZE: 189 BP, MUTATION POINTS: 13, AVERAGE DISTANCE BETWEEN MUTATIONS: 12 BP. (B) PARENTALS: CANE AND SORGHUM, ALIGNMENT SIZE: 122 BP, MUTATION POINTS: 12, AVERAGE DISTANCE BETWEEN MUTATIONS: 8.5 BP. (C) PARENTALS: CANE AND ORYZA, ALIGNMENT SIZE: 118 BP, MUTATION POINTS: 11, AVERAGE DISTANCE BETWEEN MUTATIONS: 8.9 BP

TABLE II. NUMBER OF MUTATION REGIONS AND MEAN DISTANCE BETWEEN CONSECUTIVE MUTATIONS CONSIDERING AS PARENTAL SEQUENCES: ORYZA AND SORGHUM, CANE AND SORGHUM, AND CANE AND ORYZA


Fig. 7. Number of observed crossovers versus number of expected variants. Library size 1,000.

Comparing parental pairs Cane/Sorghum and Cane/Oryza, we can notice that although the average distance between mutations in Cane/Oryza is greater than in Cane/Sorghum, the number of expected variants produced by shuffling using Cane/Sorghum is greater. This shows that, although the average distance between mutations is an important factor which could influence the number of variants produced, the absolute number of mutations between the two parental sequences should also be taken into consideration when choosing the sequences to submit to a shuffling process, since this value determines the ‘sample space’ in which the reassembled sequences occur. It is important to mention that for two sequences which differ from each other in m base pairs, the total number of distinct variant sequences resulting from any number of crossovers (maximum m) is $2^m$. Even for small values of m the total number of variants is rarely reached due to, among other things, the limitations of screening methods [15]. Given that, when possible, sequences that exhibit a greater average distance between mutations should be chosen instead of those with a greater number of mutations. In the experiments carried out, the total numbers of possible variants were 8192, 4096 and 2048 for parental sequences Oryza/Sorghum, Cane/Sorghum and Cane/Oryza, respectively. Table III shows the size of the sample space defined by each parental sequence pair, as well as the percentage of the sample space covered by the experiments, for λtrue = 4 and L = 1,000.

TABLE III. PERCENTAGES OF THE SAMPLE SPACE COVERED BY EACH PARENTAL PAIR: ORYZA/SORGHUM, CANE/SORGHUM AND CANE/ORYZA, CONSIDERING THE DRIVER RESULTS WHEN λTRUE = 4 AND L = 1,000
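The sample-space sizes quoted above follow from the mutation counts given in the caption of Table I ($2^{13}$, $2^{12}$ and $2^{11}$). The sketch below reproduces them and adds a simple upper bound on coverage, namely the library size divided by the sample-space size. This bound is an illustration introduced here and does not reproduce the values of Table III, which depend on the number of distinct variants C estimated by DRIVeR; the function names are hypothetical.

```python
def sample_space(mutations: int) -> int:
    """Number of distinct variants for two parents differing at `mutations` positions."""
    return 2 ** mutations


def max_coverage(library_size: int, mutations: int) -> float:
    """Upper bound on the fraction of the sample space a library can cover,
    reached only if every library member were a distinct variant."""
    return min(1.0, library_size / sample_space(mutations))


# Mutation counts per parental pair, as given in the caption of Table I.
pairs = {"Oryza/Sorghum": 13, "Cane/Sorghum": 12, "Cane/Oryza": 11}
for name, m in pairs.items():
    print(f"{name}: {sample_space(m)} variants, "
          f"at most {100 * max_coverage(1000, m):.1f}% covered by L = 1,000")
```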

Fig. 8. Number of observed crossovers versus number of expected variants. Library size 5,000.

Note that although the values of C for the three parental pairs are relatively close to each other, the percentage of the sample space covered varies significantly. In spite of the low coverage given by experiments using the Oryza/Sorghum pair, this pair is still potentially the most suitable for producing a large number of variants, since its average distance between mutations is the greatest of the three pairs.

V. CONCLUSION

Fig. 9. Number of observed crossovers versus number of expected variants. Library size 10,000.

The efficiency of a directed evolution method can be measured by the average number of recombination events that occur in the reassembled sequences [24]; directed evolution experiments have largely been guided by empirical information and experience without a quantitative understanding of the recombination step and subsequent


optimization of the experimental setup [27]. Optimizing each of the shuffling phases is the key to ensuring a greater number of recombination events. Although many optimizations are determined experimentally, computational models have been proposed and used as tools to support and, in many cases, direct in vitro experiments. In this paper a software tool named DRIVeR was used to simulate a shuffling process using three different parental pairs, aiming at identifying the most suitable pair. Based on the results, it was possible to evaluate the effect of the distance between mutations in the parental sequences, as well as of the number of mutations between the parental sequences, on the sequences resulting from a shuffling process. The experiments described in this paper reinforce the fact that a more detailed and deeper analysis of the sequences available for DNA shuffling can identify the most promising sequences to undergo shuffling successfully. Apart from using the identified sequence pair in a laboratory shuffling experiment, this work will proceed by analysing the three parental sequences using the models proposed in [8] and [15].

REFERENCES

[1] Stemmer, W.P.C. Rapid evolution of a protein in vitro by DNA shuffling. Nature 370, pp. 389-391 (1994).
[2] Stemmer, W.P.C. DNA shuffling by random fragmentation and reassembly: In vitro recombination for molecular evolution. Proc. Natl. Acad. Sci. USA 91, pp. 10747-10751 (1994).
[3] Volkov, A. A., Shao, Z., Arnold, F. H. Random chimeragenesis by heteroduplex recombination. Methods Enzymol. 328, pp. 456-463 (2000).
[4] Zhao, H., Giver, L., Shao, Z., Affholter, A., Arnold, F. H. Molecular evolution by staggered extension process (StEP) in vitro recombination. Nature Biotechnol. 16, pp. 258-261 (1998).
[5] Patten, P. A., Howard, R. J., Stemmer, W. P. C. Applications of DNA shuffling to pharmaceuticals and vaccines. Current Opinion in Biotechnology 8, pp. 724-733 (1997).
[6] Shao, Z., Arnold, F. H. Engineering new functions and altering existing functions. Current Opinion in Structural Biology 6, pp. 513-518 (1996).
[7] Arnold, F. H., Moore, J. C. Optimizing industrial enzymes by directed evolution. Advances in Biochemical Engineering Biotechnology 58, pp. 1-14 (1997).
[8] Sun, F. Modeling DNA Shuffling. J. Comp. Biol. 6, pp. 77-90 (1999).
[9] Metzker, M. L., Caskey, T. C. Polymerase Chain Reaction, Encyclopedia of Life Sciences. Nature Publishing Group (2001).
[10] Sun, F. The polymerase chain reaction and branching processes. Journal of Computational Biology, Spring, 2(1), pp. 63-86 (1995).
[11] Sun, F. Stochastic modeling of polymerase chain reaction and related biotechnologies. Bulletin of the International Statistical Institute, 53rd session proceedings, book 1, pp. 393-396 (2001).
[12] Hassibi, A., Kakavand, H., Lee, T. H. A Stochastic model and simulation algorithm for polymerase chain reaction (PCR) systems. Proc. of Workshop on Genomics Signal Processing and Statistics (2004).
[13] Tait, R. C., Horton, R. M. Genetic engineering with PCR. Horizon Scientific Press, 219 pages (1998).
[14] Maheshri, N. and Schaffer, D. Computational and Experimental analysis of DNA shuffling. Proc. Natl. Acad. Sci. 100, pp. 3071-3076 (2003).
[15] Moore, G.L., Maranas, C.D., Lutz, S. and Benkovic, S.L. Predicting crossover generation in DNA shuffling. PNAS 98, pp. 3226-3231 (2001).
[16] Patrick, W.M., Firth, A.E. and Blackburn, J.M. User-friendly algorithms for estimating completeness and diversity in randomized protein-encoding libraries. Protein Engineering 16(6), pp. 451-457 (2003).
[17] Joern, J. M. Engineering Dioxygenases by Laboratory Evolution: A Comparison of Evolutionary Search Strategies. Ph.D. Thesis, 233 pages (2003).
[18] Chang, C.C.J., Chen, T.T., Cox, B.W., Dawes, G.N., Stemmer, W.P.C., Punnonen, J. and Patten, P.A. Evolution of cytokine using DNA family shuffling. Nature Biotechnology 17, pp. 793-797 (1999).
[19] Ness, J.E., Welch, M., Giver, L., Bueno, M., Cherry, J.R., Borchert, T.V., Stemmer, W.P.C. and Minshull, J. DNA shuffling of DNA subgenomic sequences of subtilisin. Nature Biotechnology 17, pp. 893-896 (1999).
[20] Christians, F. C., Scapozza, L., Crameri, A., Folkers, G., Stemmer, W. P. C. Directed evolution of thymidine kinase for AZT phosphorylation using DNA family shuffling. Nat. Biotechnol. 17, pp. 259-264 (1999).
[21] Bruhlmann, F. and Chen, W. Tuning biphenyl dioxygenase for extended substrate specificity. Biotech. Bioeng. 63, pp. 544-551 (1998).
[22] Zhao, H. and Arnold, F. H. Optimization of DNA shuffling for high fidelity recombination. Nucleic Acids Res. 25, pp. 1307-1308 (1997).
[23] Kikuchi, M., Ohnishi, K. and Harayama, S. Novel family shuffling methods for the in vitro evolution of enzymes. Gene 236, pp. 159-167 (1999).
[24] Volkov, A.A. and Arnold, F.H. Methods for in Vitro DNA Recombination and Random Chimeragenesis. Methods in Enzymology 328, pp. 447-456 (2000).
[25] Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. A Basic Local Alignment Search Tool. Journal of Molecular Biology 215, pp. 403-410 (1990).
[26] Costa, A.S. Expressão heteróloga, purificação e estudos de atividade de uma proteína inibidora de Cisteíno Protease da cana-de-açúcar e posterior evolução in vitro pela técnica de DNA Shuffling. Doctoral thesis, Universidade Federal de São Carlos, SP, Brazil, 111 pages (2004).
[27] Moore, G.L., Maranas, C.D., Gutshall, K.R., Brenchley, J.E. Modeling and optimization of DNA recombination. Computers & Chemical Engineering 24, pp. 693-699 (2000).
