A new genetic approach to construct near-optimal binary search trees


Applied Mathematics and Computation 190 (2007) 1514–1525 www.elsevier.com/locate/amc

Afsaneh Fatemi *, Kamran Zamanifar, Naser NematBakhsh
Department of Computer Engineering, University of Isfahan, P.O. Box 81746-73441, Isfahan, Iran

* Corresponding author. E-mail addresses: [email protected] (A. Fatemi), [email protected] (K. Zamanifar), [email protected] (N. NematBakhsh). doi:10.1016/j.amc.2007.02.052

Abstract

Many exact and approximate methods have been proposed for constructing an optimal binary search tree. One approach is the use of evolutionary algorithms, which have achieved satisfactory improvements in cost efficiency. This paper proposes a new genetic algorithm for building a near-optimal binary search tree. In this algorithm, a new greedy method is used for the crossover of chromosomes, and a new way of mutating them is also developed. Practical results show rapid and desirable convergence toward the near-optimal solution. The use of a heuristic to create low-cost chromosomes for the initial population, the greediness of the crossover, and the application of elitism in selecting the chromosomes of future generations are the most important factors that allow the algorithm to reach near-optimal solutions at desirably high speed. According to the practical results, increasing the problem size does not cause any considerable difference between the solution obtained from the algorithm and the exact solution.
© 2007 Elsevier Inc. All rights reserved.

Keywords: Near-optimal binary search tree; Genetic algorithm; Optimization

1. Introduction

The optimal binary search tree is a fundamental data structure with wide applications. Work in this field has evolved over the past 50 years; a full historical account can be found in Refs. [9,10,15,18]. As a brief history, the classical algorithm commonly used for this purpose was first developed in 1959. This algorithm originally ran in O(n³) time, which was later improved to O(n²) [10]. Its disadvantage, however, is that it does not have the desirable efficiency in most applications, so methods that construct an optimal or near-optimal binary search tree at far lower time cost are required. There are algorithms for constructing near-optimal binary search trees [12,15], and algorithms for constructing optimal height-restricted binary search trees [5–7,9,15,20]. The latter are useful because the maximum number of comparisons made during a search cannot be greater than the height plus 1.

Thus, height-restriction can be used to produce binary search trees that perform well in both the worst case and the expected case. All known algorithms for constructing optimal binary search trees, whether height-restricted or not, use dynamic programming. Knuth's algorithm [10] runs in O(n²) time, where n is the number of keys. Itai [7] and Wessner [20] independently improved Garey's algorithm [6] to run in O(Ln²) time, where L is the height restriction. Their algorithm produces an optimal binary search tree of height h, together with its cost, for each h such that log₂ n ≤ h ≤ L. Mehlhorn's and Larmore's algorithms [12,14,15] can be implemented to run in O(n²) time, but the trees that they produce are not always optimal.

As mentioned earlier, a number of estimation techniques have been used for this purpose [2,3,5,8,9,12,13,19]. These techniques show good results on small problems but seem to need to be combined with other methods when applied to large problems. One such technique is the evolutionary algorithm. It shows good results in the practical construction of a near-optimal binary search tree [3,16], but does not guarantee an exact optimal solution [4,17]. Most studies have shown that a good way to achieve results comparable to exact algorithms is to use specific knowledge of the problem in the genetic algorithm [3,16]. The conventional, exact solutions have been used as the basic reference for genetic algorithms that construct near-optimal search trees; the genetic algorithm is then used to identify the most suitable parameters for solving the problem. The advantages of these methods include extreme simplicity in defining the chromosomes as well as the crossover and mutation processes, which in turn leads to a simple encoding of the algorithm. However, since specific knowledge of the problem is typically not used in these methods, the genetic algorithm operates completely at random. This randomness increases the time required for convergence toward a near-optimal solution, and it widens the gap between the time required to form a BST with such an algorithm and with the conventional solutions.

In this paper, a new genetic algorithm is proposed for the construction of a near-optimal binary search tree. The chief idea of the method is a heuristic for creating the initial population of binary search trees. In addition, new greedy crossover and mutation operations are designed to change the population efficiently. The use of specific knowledge of the problem in all stages increases the speed and accuracy of the algorithm, but only at the cost of a more difficult encoding process. Section 2 provides a description of the problem and the details of the proposed algorithm. Section 3 investigates the practical results obtained from this algorithm, and Section 4 sums up with concluding remarks and possible future trends in this area.

2. The new approach

In this section, a new genetic algorithm is introduced for the construction of a near-optimal binary search tree. A description of the problem comes first.

2.1. Description of the problem

A binary search tree (BST) is a binary tree in which each node has at least four associated pieces of information:

• A key value. This is an element from an ordered set. It can be assumed that each key is unique in the BST.
• A parent node. This is either the special value null, or another node in the BST. One and only one node in a BST has a null parent at any time: the root node of the BST.
• Left child and right child. Either or both of these may also be null.

There are many BSTs for a fixed set of nodes: the number of distinct BSTs on n nodes is the Catalan number C(2n, n)/(n + 1). When using BSTs, we often wish to change between the available trees in order to reduce the depth of the keys we access at a later time.
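As a quick sanity check of this count, a minimal Python sketch (an illustration added here, not part of the paper's Multi-Pascal implementation) evaluates the formula for small n:

from math import comb

def num_bsts(n: int) -> int:
    # Number of distinct binary search trees on n nodes: C(2n, n) / (n + 1).
    return comb(2 * n, n) // (n + 1)

for n in range(1, 8):
    print(n, num_bsts(n))   # 1, 2, 5, 14, 42, 132, 429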

For a search in a binary search tree on n keys, there are 2n + 1 possible outcomes: the search target could be less than the smallest key; for each key, the search target could equal that key; for each pair of consecutive keys, the search target could lie strictly between them; or the search target could be greater than the largest key. For a fixed probability distribution P over these outcomes, the cost of a binary search tree is the expected number of comparisons made during a search; if the probabilities differ from key to key, the cost is weighted by these differing probabilities. A binary search tree with minimum cost is called optimal. Formally, given a set of keys ki, a search probability Pi for each key, and assuming that ki ≤ ki+1 for all i from 1 to n − 1, an optimal binary search tree containing all n keys is sought such that the expected time to access the nodes is minimal [10,11].

To cast the problem as an evolutionary problem, its search space is taken to be the set of all binary search trees, each represented by a set of node numbers (codes). The proposed strategy is to select a random subset of these trees as an initial population and to let it evolve under various conditions.

2.2. Representation of chromosomes and the initial population

In a complete binary tree, a unique number from 1 to 2^n − 1 (where n is the tree height) can be assigned to each node [10,11]. Fig. 1 depicts an example of this numbering of the nodes of a tree.

In the proposed method, each tree with n nodes is represented as a chromosome. If the genes of this chromosome are numbered from 1 to n (exactly as the tree keys are), then the gene numbered p contains the number of the node of the tree that holds key p. For example, if the second gene contains 10, it means that k2 is placed on the node with sequence number 10 in the corresponding tree. As a sample, the permutation (2, 10, 5, 1, 3) uniquely represents a BST containing the five keys K1 ≤ K2 ≤ K3 ≤ K4 ≤ K5 in which K1 is located on node 2, K2 on node 10, and so on. This chromosome represents the tree shown in Fig. 2. Thus, each chromosome can be written as a permutation of a subset of the numbers from 1 to 2^n − 1 such that these numbers encode a BST (a small sketch of this encoding follows Fig. 2).

Fig. 1. Encoding the nodes on a tree.

Fig. 2. Representation of a BST as a permutation of numbers.
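To make the encoding concrete, a small Python sketch (illustrative only; the paper's implementation is in Multi-Pascal) decodes a chromosome under the numbering of Fig. 1: gene i holds the node number Ci of key Ki, the root is node 1, the children of node c are 2c and 2c + 1, and the depth of key Ki is therefore ⌊log₂ Ci⌋.

def decode_chromosome(chrom):
    # chrom[k] is the node number holding key K_{k+1}.
    # Returns {key index: (node number, depth)} under the heap-style numbering.
    return {k + 1: (node, node.bit_length() - 1) for k, node in enumerate(chrom)}

# The sample chromosome of Fig. 2: K1 on node 2, K2 on node 10, K3 on node 5, ...
print(decode_chromosome([2, 10, 5, 1, 3]))
# {1: (2, 1), 2: (10, 3), 3: (5, 2), 4: (1, 0), 5: (3, 1)}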

Let us assume that a chromosome is designated C and that Ci is the number of the node containing key Ki. If Size is the number of genes of C, the search function Search(C, Size, X) is defined so that, when a key is located on node X of the tree represented by chromosome C, the index of that key (the number of the gene containing the value X) is returned; otherwise it returns zero. By definition [1,10,11], chromosome C represents a BST if and only if, for each key Ki on it, all keys located in the left subtree of Ki (if present) are smaller than (or equal to) it and all keys located in the right subtree of Ki (if present) are larger than (or equal to) it. In other words:

    ∀ i ∈ [1, n]:  All_Nodes_On_Left(Ki) ≤ Ki  and  All_Nodes_On_Right(Ki) ≥ Ki.        (1)

In the light of our first assumption (K1 ≤ K2 ≤ … ≤ Kn), the above relation means that, for each key Ki, the index of the key at the left child of Ki (if present) is smaller than i and the index of the key at the right child of Ki (if present) is larger than i. Given that if a node of the binary tree is numbered Ci, its left child is numbered Ci·2 and its right child Ci·2 + 1, C represents a BST if and only if:

    ∀ i ∈ [1, n]:  Search(C, n, Ci·2) < i  and  (if Search(C, n, Ci·2 + 1) ≠ 0 then Search(C, n, Ci·2 + 1) > i).        (2)

This condition is checked at all stages of generating the populations of the different generations of the genetic algorithm. Fig. 3 shows the Search function.

2.3. Selection operator

The roulette wheel method is used for selection in this algorithm [4,14]. According to this method, parents are selected on the basis of their fitness; i.e., the fitter the chromosome, the higher its probability of being selected. For a population of size PopSize, PopSize − 1 chromosomes are selected in this way. As chromosomes are selected at random, the same chromosome may appear more than once in the new population. In order to avoid the fittest chromosomes of a population being discarded in the next generation, elitism is employed: the fittest chromosomes are simply transferred to the next generation without selection [3,4,17].
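A minimal Python sketch of the fitness-proportional (roulette wheel) selection with elitism described above; the function names and structure are illustrative and are not the authors' Multi-Pascal code.

import random

def roulette_select(population, fitnesses, k, rng=random):
    # Pick k chromosomes with probability proportional to fitness (repeats allowed).
    return rng.choices(population, weights=fitnesses, k=k)

def next_generation_base(population, fitnesses, pop_size, rng=random):
    # Elitism: carry the fittest chromosome over unchanged, then fill the
    # remaining PopSize - 1 slots by roulette-wheel selection.
    elite = population[max(range(len(population)), key=lambda i: fitnesses[i])]
    return [elite] + roulette_select(population, fitnesses, pop_size - 1, rng)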

Function Search(C : Chromosome; Size, X : integer) : integer;
Var i : integer;
    Found : Boolean;
Begin
  i := 1;
  Found := False;
  While (i <= Size) and (not Found) do
    If C(i) = X Then
      Found := True
    Else
      i := i + 1;
  If Found Then
    Search := i
  Else
    Search := 0;
End;
Fig. 3. Algorithm for the search function.
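For illustration, the same search, together with the BST-validity test of Relation (2), can be written in Python as follows (a sketch under the encoding of Section 2.2, not the authors' implementation):

def search(chrom, x):
    # Return the 1-based index of the key stored on tree node x, or 0 if node x is unused.
    for i, node in enumerate(chrom, start=1):
        if node == x:
            return i
    return 0

def is_bst(chrom):
    # Relation (2): for every key i placed on node Ci, the key on node 2*Ci (if any)
    # must have a smaller index and the key on node 2*Ci + 1 (if any) a larger index.
    for i, node in enumerate(chrom, start=1):
        left, right = search(chrom, 2 * node), search(chrom, 2 * node + 1)
        if left >= i or (right != 0 and right <= i):
            return False
    return True

print(is_bst([2, 10, 5, 1, 3]))   # True: the sample chromosome of Fig. 2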

2.4. Genetic operators

Crossover and mutation are the main operators that change the chromosomes [4,17]. Below, the crossover and mutation operations employed in the proposed algorithm are discussed.

2.4.1. Crossover operator

A greedy crossover is introduced which produces one child from two parents; the child and the fitter of the two parents are transferred to the next generation. The common crossover method, selecting random parts of the parents and exchanging them, cannot be employed in the proposed genetic algorithm, because each chromosome must represent a BST.

Function CrossOver(Parent1, Parent2 : Chromosome) : Chromosome;
Var Child : Chromosome;
Begin
  CopySameGenes(Parent1, Parent2, Child);
  FillEmptyGenes(Child);
  CrossOver := Child;
End;
Fig. 4a. Algorithm of the crossover operator.

Procedure CopySameGenes(B1, B2 : Chromosome; var C : Chromosome);
Var i, j, Point : integer;
    sw : Boolean;
Begin
  For i := 1 To ChromosomeSize do
    C(i) := 0;
  For i := 1 To ChromosomeSize do
  Begin
    sw := False;
    If sw = False Then
      For j := 1 To ChromosomeSize do
        If B1(i) = B2(j) Then          { node B1(i) is occupied in both parents }
        Begin
          sw := True;
          If (Keys(i) > Keys(j)) And (C(i) = 0) Then
            C(i) := B1(i)
          Else If C(j) = 0 Then
            C(j) := B2(j)
          Else
          Begin
            Point := 1;
            While (C(Point) <> 0) do
              Point := Point + 1;
            C(Point) := B2(j);
          End;
        End;
  End;
End;
Fig. 4b. First step of the crossover operator.

Procedure FillEmptyGenes(var C : Chromosome);
Var Flag, Finish : Boolean;
    Zero, Point : Integer;
Begin
  Zero := 1;
  Point := 1;
  Finish := False;
  While (C(Point) = 0) do
    Point := Point + 1;
  NextZero(Zero, C);                       { Zero is the index of the next vacant gene }
  While (not Finish) do
  Begin
    If Zero > ChromosomeSize Then
      Finish := True
    Else
    Begin
      Search(C(Point) * 2, C, Flag);       { Flag: is the left-child node already used? }
      If Flag = False Then
      Begin
        C(Zero) := C(Point) * 2;
        NextZero(Zero, C);
      End;
    End;
    If Zero > ChromosomeSize Then
      Finish := True
    Else
    Begin
      Search(C(Point) * 2 + 1, C, Flag);   { Flag: is the right-child node already used? }
      If Flag = False Then
      Begin
        C(Zero) := C(Point) * 2 + 1;
        NextZero(Zero, C);
      End;
    End;
    Point := Point + 1;
    If (Point >= ChromosomeSize) Or (Zero > ChromosomeSize) Then
      Finish := True;
  End;
End;
Fig. 4c. Second step of the crossover operator.

Exchanging parts of two chromosomes does not guarantee that the resulting chromosome will still represent a BST. Fig. 4a shows the crossover operation of the proposed algorithm at a glance. It operates as follows. First, nodes occupied in both parent trees are copied into the child chromosome. This is done greedily, in the sense that if Ki in the first parent and Kj in the second parent are located on node S of the tree, the key with the higher search probability is placed on node S. The process starts from node 1 (the tree root, which is common to all trees in the search space) in order to ensure that the keys with higher search probabilities are located at levels closer to the root; thus, with regard to the fitness function introduced in Section 2.5, the expected time to access the nodes tends to decrease. Fig. 4b shows this first step of the crossover, where Keys(i) denotes the probability that Ki is searched. The first step is a simple search over the parents' genes to find the tree nodes that are occupied in both chromosomes (see Fig. 4c for the second step): the child preferentially reuses the nodes used by both of its parents, and the key with the higher search probability is placed on each such shared node. This step runs in O(n²) time.
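A compact, simplified Python sketch of the two crossover steps of Figs. 4b and 4c follows. It is illustrative rather than a transcription of the authors' code: in particular, step 2 here places each vacant key by a standard BST descent over the keys already fixed, and, as in the paper, the caller is expected to re-check Relation (2) on the result.

def greedy_crossover(p1, p2, probs):
    # p1, p2: parent chromosomes; probs: search probabilities; position k
    # corresponds to key K_{k+1}; node numbers follow the heap scheme (root = 1).
    n = len(p1)

    # Step 1 (cf. Fig. 4b, simplified): every tree node occupied in both parents
    # is given to the key with the higher search probability.
    child = [0] * n
    for i in range(n):
        for j in range(n):
            if p1[i] == p2[j]:
                if probs[i] > probs[j] and child[i] == 0:
                    child[i] = p1[i]
                elif child[j] == 0:
                    child[j] = p2[j]
                break

    # Step 2 (simplified variant of Fig. 4c): place each still-vacant key by a
    # standard BST descent over the keys already fixed (node 1, the root, is
    # normally occupied after step 1).
    key_at = {node: k for k, node in enumerate(child) if node != 0}
    for k in range(n):
        if child[k] != 0:
            continue
        node = 1
        while node in key_at:
            node = 2 * node if k < key_at[node] else 2 * node + 1
        child[k] = node
        key_at[node] = k
    return child   # the caller re-checks Relation (2), as the paper requires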

Fig. 5. An example to show how crossover works.

The first step can be illustrated with the example of Fig. 5. In this example, the tree nodes numbered 1, 6 and 3 contain K3, K4 and K5 in the first parent and K1, K2 and K3 in the second parent. According to the proposed crossover, nodes 1, 6 and 3 are filled first when forming the child, and this is done greedily: node 1 may receive K3 or K1, and K1 is selected because its search probability (0.3) is greater than that of K3 (0.1); node 6 may receive K4 or K2, and K2 is chosen because its search probability (0.3) is greater than that of K4 (0.1); K5 is selected to fill node 3 by the same reasoning.

The next step of the crossover fills in the remaining vacant genes of the child chromosome. This must be done in such a way that the resulting tree still represents a BST, that is, Relation (2) must still hold for it. This second step runs in O(n) time and can also be explained with the example of Fig. 5. The first vacant gene (not filled in the first step) of the child chromosome is the third gene (giving the place of K3). K3 is greater than K2, so it can be placed as its right child; K2 sits on tree node 6, so its right child is node 13. All the remaining vacant genes are filled in the same manner. It follows that the whole crossover operation runs in O(n²) time. After the crossover, parents P1 and P2 are replaced by their child and the fitter of the two parents.

2.4.2. Mutation operator

The objective of mutation in a genetic algorithm is to cause a small change in one chromosome. In our problem of constructing an optimal binary search tree, the change must be made in such a way that the resulting chromosome still represents a BST.

Fig. 6. An example to show how mutation works.

Procedure Mutate(var C : Chromosome);
Var i, j : integer;
Begin
  i := KeyInRoot(C);                 { index of the key currently placed at the root }
  if i = 1 then
    for j := n downto 2 do
      C(j) := C(j-1)
  else
    for j := i-1 downto 2 do
      C(j) := C(j-1);
  C(1) := C(1) * 2;                  { K1 moves to its imaginary left child }
End;
Fig. 7. Algorithm of the mutation operator.
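Ahead of the detailed walk-through below, a Python rendering of the shift of Fig. 7 (illustrative only; chrom follows the encoding of Section 2.2, and a simple lookup replaces the KeyInRoot helper):

def mutate(chrom):
    # Shift keys so that K1 moves to the position of its imaginary left child (cf. Fig. 7).
    # chrom[k] is the node number of key K_{k+1}.
    i = chrom.index(1) + 1                 # index of the key currently at the root (node 1)
    last = len(chrom) if i == 1 else i - 1
    for j in range(last - 1, 0, -1):       # keys K_last .. K_2 each take the node
        chrom[j] = chrom[j - 1]            # previously held by the next-smaller key
    chrom[0] = chrom[0] * 2                # K1 moves to its (imaginary) left child
    return chrom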

In the proposed method, mutation of a chromosome is accomplished by moving K1 (the smallest key) within the tree. As the smallest key of a BST is the first node in the in-order traversal, K1 has no left child; it can therefore be moved to the position of its imaginary left child, and the other keys are then shifted accordingly. These moves are easy to perform in the proposed algorithm thanks to the special encoding designed for the chromosomes. The first step of mutating the selected chromosome is to identify the key located at the tree root, say Ki. When i equals 1, i.e., K1 is at the root, all genes of the chromosome are shifted by one position: K2 takes the node previously occupied by K1, K3 takes the node previously occupied by K2, and so on; finally, K1 is moved to the position of its imaginary left child. When i is other than 1, i.e., a key other than K1 is at the root, the genes from the ith gene onwards remain unchanged, and the shift described above is applied to the first through the (i − 1)th genes. It can be seen that the mutation operation runs in O(n) time.

A more detailed illustration of the proposed mutation (Fig. 7) is given by the examples of Fig. 6. In the first example, node 1 (the root) contains K5, not K1, so the shift is applied to keys K1 to K4 and the remaining keys (K5 and K6) are not moved: K4, which is on node 5, is moved to the place of K3 (node 2); K3 is moved from node 2 to the place of K2 (node 9); and so on. Finally, K1 is moved to its imaginary left child (from node 4 to node 8). In the second example, K1 is at the root, so all keys except K1 are shifted and K1 is moved from node 1 to node 2. It should be mentioned that mutation could also be made greedy, i.e., the chromosome resulting from the mutation would replace the original one only when the resulting tree has a lower cost. In the proposed algorithm, however, this option was not adopted, so as not to lose the chance of obtaining trees completely different from the original one.

2.5. Evaluation of the chromosomes

A cost function computing the expected cost of accessing the tree nodes is used to evaluate the chromosome representing the tree. As the main objective in constructing an optimal binary search tree is to minimize this cost function [10,11], the inverse of this function can be used as the fitness function of the genetic algorithm. The relations below give the cost and fitness functions of our genetic algorithm. In these relations, Cj is the number of the node where the jth key is stored in the tree under evaluation, Pj is the search probability of that key, and n is the number of keys.

Fig. 8. Evaluating a chromosome.

    Cost = Σ_{j=1..n} Pj · (⌊log₂ Cj⌋ + 1),        (3)

    Fitness = 1 / Cost.        (4)
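For illustration, Relations (3) and (4) can be evaluated directly from a chromosome, since under the heap-style numbering the depth of node Cj is ⌊log₂ Cj⌋. The Python sketch below uses hypothetical probabilities and is not the authors' code.

def cost(chrom, probs):
    # Relation (3): chrom[j] is the node number of key K_{j+1}, probs[j] its search
    # probability; node.bit_length() equals floor(log2 node) + 1, i.e. the number of
    # comparisons needed to reach that node.
    return sum(p * node.bit_length() for node, p in zip(chrom, probs))

def fitness(chrom, probs):
    # Relation (4): the inverse of the cost.
    return 1.0 / cost(chrom, probs)

# The chromosome of Fig. 2 with hypothetical probabilities (not taken from the paper):
print(cost([2, 10, 5, 1, 3], [0.1, 0.2, 0.2, 0.4, 0.1]))   # 0.2 + 0.8 + 0.6 + 0.4 + 0.2 = 2.2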

Fig. 8 shows the evaluation of a chromosome representing a BST with five keys.

2.6. The proposed algorithm

As expressed in Relation (2), in our problem of constructing an optimal binary search tree, certain conditions must hold throughout the execution of the genetic algorithm, because each chromosome must represent a BST at all times. An algorithm is proposed here to solve the problem under these conditions, using the special genetic operators described above (Fig. 9). The population size, the number of keys, the crossover rate, the mutation rate, and the number of iterations of the algorithm (the termination condition) are the parameters of this algorithm. Once the values of these parameters are given, the first population of chromosomes is formed stochastically in such a manner that trees with minimum depth are represented. This way of initializing the chromosomes accelerates the convergence of the algorithm toward an optimal solution. The chromosomes of the current population are then evaluated and the fittest chromosomes of the current generation are transferred to the next one, so that the best existing chromosomes are not discarded while a fitter population is being formed. In addition, according to the crossover rate, a percentage of the selected population is transferred directly to the next generation without any change, while the remaining portion of the next generation results from the crossover of other selected chromosomes. Finally, small changes are made in the new population in proportion to the mutation rate. These stages are repeated until the termination condition of the algorithm is reached.
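The paper does not spell out the construction heuristic beyond requiring minimum-depth trees. One plausible reading, sketched below in Python as an assumption rather than the authors' method, builds each initial chromosome by recursively placing a randomly chosen middle key of the current key range at the current node; this keeps the tree height minimal while still yielding a varied population.

import random

def random_min_depth_chromosome(n, rng=random):
    # Gene k holds the node number of key K_{k+1}; the resulting tree has minimum
    # height because every split leaves subtrees whose sizes differ by at most one.
    chrom = [0] * n

    def place(lo, hi, node):
        if lo > hi:
            return
        m = hi - lo + 1
        key = lo + rng.choice([(m - 1) // 2, m // 2])   # one of the middle keys
        chrom[key - 1] = node
        place(lo, key - 1, 2 * node)        # smaller keys under the left child
        place(key + 1, hi, 2 * node + 1)    # larger keys under the right child

    place(1, n, 1)
    return chrom

initial_population = [random_min_depth_chromosome(10) for _ in range(30)]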

Procedure OBST(p {population size}, CRate {crossover rate}, MRate {mutation rate},
               NumGenerations {termination criterion});
Begin
  #crossovers := p * CRate;
  #mutations := p * MRate;
  Initialize population P1 with a construction heuristic;
  For j := 1 to NumGenerations do
  Begin
    Evaluate the fitness of the individuals in Pj;              { Pj is the present population }
    Add the individual with the highest fitness in Pj to Pj+1;  { elitism; Pj+1 is the next population }
    Select p - #crossovers - 1 individuals from Pj and copy them to Pj+1;
    For k := 1 to #crossovers do
    Begin
      Select two parents a1, a2 from Pj randomly;
      c := Crossover(a1, a2);
      Add c and the fitter of a1, a2 to Pj+1;
    End;
    For k := 1 to #mutations do
    Begin
      Select an individual a1 from Pj+1 randomly;
      Mutate(a1);
      Replace a1 in Pj+1;
    End;
  End;
End;
Fig. 9. The proposed algorithm to construct a near-optimal binary search tree.
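Putting the pieces together, a condensed Python sketch of the loop of Fig. 9 is given below. It reuses the illustrative helpers sketched in the previous sections (random_min_depth_chromosome, fitness, roulette_select, greedy_crossover, mutate); the population bookkeeping is simplified relative to Fig. 9, and the sketch is not the authors' Multi-Pascal program.

import random

def obst_ga(n, probs, pop_size, crossover_rate, mutation_rate, num_generations, rng=random):
    n_cross = int(pop_size * crossover_rate)
    n_mut = int(pop_size * mutation_rate)
    population = [random_min_depth_chromosome(n, rng) for _ in range(pop_size)]
    for _ in range(num_generations):
        fits = [fitness(c, probs) for c in population]
        nxt = [population[max(range(len(population)), key=lambda i: fits[i])]]   # elitism
        for _ in range(n_cross):                                                 # greedy crossover
            a, b = rng.sample(population, 2)
            child = greedy_crossover(a, b, probs)
            fitter = max((a, b), key=lambda c: fitness(c, probs))
            nxt += [child, fitter]                  # child plus the fitter parent, as in Fig. 9
        # fill the remainder of the next generation with roulette-wheel copies
        nxt += roulette_select(population, fits, max(0, pop_size - len(nxt)), rng)
        for _ in range(n_mut):                                                   # mutation
            k = rng.randrange(len(nxt))
            nxt[k] = mutate(list(nxt[k]))
        population = nxt[:pop_size]
    return max(population, key=lambda c: fitness(c, probs))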

3. Implementation and practical results

The proposed algorithm was implemented in the Multi-Pascal environment. Multi-Pascal was chosen because our final goal was to develop a parallel version of the proposed algorithm. The keys were generated randomly for each trial, according to the number of keys given as a parameter. The number of chromosomes in each generation, the crossover rate, and the mutation rate were the variable parameters of the genetic algorithm. The results used in evaluating the algorithm were obtained with a crossover rate of 0.75, a mutation rate of 0.1, and varying numbers of chromosomes, while the problem size (number of keys) also varied. The optimal tree corresponding to the problem parameters was obtained by executing code written according to the algorithm introduced in [1]. The expected times to access the trees obtained from the proposed solution and from the conventional solution were then compared. Due to the stochastic nature of the proposed genetic algorithm, it is natural to obtain different solutions for the same parameter values; for this reason, the mean values over five consecutive runs of the algorithm were used in evaluating it.

Two criteria were used in evaluating the performance of the algorithm: (1) the difference between the expected search times of the trees obtained from the proposed solution and from the conventional solution, and (2) the real execution time of the algorithm as compared with its computed complexity. Practical results showed that increasing the problem size (number of keys) increases the difference between the algorithm's solution and the exact solution; however, this difference is not considerable even for problems with large numbers of keys. It may be concluded that the proposed algorithm converges satisfactorily toward the near-optimal point. It must also be mentioned that the exact optimal solution is obtained only at a high time cost, whereas the proposed algorithm requires far less time to converge to a proper solution. Fig. 10a compares the proposed algorithm and the exact solution. From the practical results, it may also be observed that, although the worst-case time complexity computed for the algorithm is of order n², the execution time is in practice far shorter. Fig. 10b shows the steady increase of the algorithm's execution time as compared with an n²-order algorithm. It may thus be concluded that the proposed algorithm is rather efficient and compares satisfactorily with the non-evolutionary algorithms so far proposed for the problem [5].

The use of a heuristic to create low-cost chromosomes for the first generation, the greediness of the crossover operation, and the application of elitism in selecting the chromosomes of future generations are the most important factors that let the algorithm obtain near-optimal solutions at desirably fast speeds. It was also observed that the number of elements in each generation affects the convergence speed of the algorithm: generally speaking, the greater the number of elements in each generation, the faster the convergence.

Fig. 10a. Comparison of the expected cost to search the trees obtained from the proposed (dashed line) and conventional (solid line) algorithms; mean search cost versus problem size (number of keys, 4–20).

Fig. 10b. Comparison of the average execution time of the proposed (dashed line) and conventional (solid line) algorithms; execution time versus problem size (number of keys, 4–20).

A larger generation naturally requires fewer iterations of the algorithm. However, if the number of elements is excessively high in relation to the problem size, a premature stop occurs at a local optimum, which greatly increases the difference between the algorithm's solution and the exact one. Choosing a number of chromosomes per generation that is in sound proportion to the other parameters therefore plays a great role in the efficiency of the algorithm.

As a further step, we implemented the proposed algorithm in parallel using task parallelism: the matrix initializations, the fitness evaluation of the chromosomes of the same generation, and the genetic operations of each generation were performed in parallel. Fig. 11 shows the results when the crossover probability is 0.6 and the mutation probability is 0.01.

4. Conclusion and future works

Binary search trees are suitable representations of data that are searched frequently. The construction of a satisfactory, low-cost search tree, however, is a difficult and expensive task. Genetic algorithms combined with a heuristic tailored to the specific conditions of the problem provide a suitably efficient method for constructing a near-optimal binary search tree. The acceptable difference between the algorithm's solution and the exact one can be treated as a parameter of the problem or even as the condition for ending the genetic algorithm.

Fig. 11. Speedup (dashed line) achieved using various numbers of processors (solid line); execution time in seconds versus problem size (number of keys).

This paper introduced a genetic algorithm for solving the problem in question. The algorithm uses a greedy heuristic for the crossover of chromosomes along with a new method of mutation. The use of a heuristic to create low-cost chromosomes for the first generation, the greediness of the crossover operation, and the use of elitism in selecting the chromosomes of future generations are the factors that enable the algorithm to find near-optimal solutions at a highly suitable speed. Experiments show that task parallelism yields a good speedup.

The use of other heuristic methods for the crossover and mutation operations, and varying system parameters such as the crossover rate, the mutation rate, and the termination condition of the algorithm, are recommended for future investigation. The number of elements in each generation of the population was taken as a constant parameter in the present work; varying the number of elements per generation, in accordance with the other parameters and execution conditions, is another area for inquiry. Parallelizing the proposed genetic algorithm may also be examined as the best option to accelerate execution and to reduce the time cost for large numbers of keys. Here, a very simple parallelization technique was used; experiments in a parallel genetic programming environment with parallel hardware, other parallel population models, other levels of parallelism, and varying all problem parameters would further improve the proposed algorithm.

References

[1] A. Aho, J. Hopcroft, J. Ullman, Introduction to the Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, MA, 1974.
[2] B. Allen, Optimal and near optimal binary search trees, Acta Informatica 18 (1987) 195–208.
[3] T. Cassen et al., Near optimal construction of partitioning trees using evolutionary techniques, in: Proceedings of Graphics Interface '95, vol. I, 1995, pp. 263–271.
[4] L. Davis (Ed.), Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York, 1991.
[5] T. Gagie, New Ways to Construct Binary Search Trees, Department of Computer Science, University of Toronto, 2004.
[6] M.R. Garey, Optimal binary search trees with restricted maximal depth, SIAM J. Comput. 3 (1976) 101–110.
[7] A. Itai, Optimal alphabetic trees, SIAM J. Comput. 5 (1976) 9–18.
[8] M. Karpinski, Sequential and parallel subquadratic work algorithms for constructing approximately optimal binary search trees, in: Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, vol. I, 1996, pp. 36–41.
[9] K.M. Kim, J.J. Park, M.H. Song, I.C. Kim, C.Y. Suen, Binary decision tree using genetic algorithm for recognizing defect patterns of cold mill strip, in: Proceedings of the 17th Conference of the Canadian Society for Computational Studies of Intelligence (Canadian AI), London, Ont., Canada, May 2004, pp. 461–466.
[10] D.E. Knuth, Optimum binary search trees, Acta Informatica 1 (1971) 79–110.
[11] D.E. Knuth, The Art of Computer Programming, vol. 3, Addison-Wesley, Reading, MA, 1973.
[12] L. Larmore, A subquadratic algorithm for constructing approximately optimal binary search trees, Algorithms 8 (1987) 579–591.
[13] S. Lotfi, A new genetic algorithm to construct optimal binary search tree, MS Thesis, University of Isfahan, Isfahan, Iran, 2002.
[14] K. Mehlhorn, A best possible bound for the weighted path length of binary search trees, SIAM J. Comput. 6 (1975) 235–239.
[15] K. Mehlhorn, Nearly optimal binary search trees, Acta Informatica 5 (1975) 287–295.
[16] A. Moilanen, Simulated evolutionary optimization and local search: introduction and application to tree search, Cladistics 17 (2001) 512–525.
[17] M. Mitchell, An Introduction to Genetic Algorithms (Complex Adaptive Systems), MIT Press, 1995.
[18] S.V. Nagaraj, Optimal binary search trees, Theoret. Comput. Sci. 188 (1997) 1–44.
[19] J. Szwarcfiter, G. Navarro, R. Baeza-Yates, J.S. Oliveira, W. Cunto, N. Ziviani, Optimal binary search trees with costs depending on the access paths, Theoret. Comput. Sci. 290 (3) (2003) 1799–1814.
[20] R.L. Wessner, Optimal alphabetic search trees with restricted maximal height, Inf. Process. Lett. (1976) 90–94.
