A Schema Theorem for context-free grammars

June 25, 2017 | Autor: Peter Whigham | Categoria: Genetic Algorithm, Learning System

Descrição do Produto

A Schema Theorem for Context-Free Grammars P.A.Whigham Department of Computer Science, University College, University of New South Wales Australian Defence Force Academy Canberra ACT 2600 AUSTRALIA Email: [email protected] ABSTRACT The basic Schema Theorem for genetic algorithms is modi ed for a grammatically-based learning system. A context-free grammar is used to de ne a language in which each sentence is mapped to a tness value. The derivation trees associated with these sentences are used to de ne the structure of schemata. The eect of crossover and mutation on schemata is described. A schema theorem is developed which describes how sentences of a language are propagated during evolution.

1. Introduction

2. Grammatical Genetic Learning

The Schema Theorem for genetic algorithms (GA) [3] de nes how useful structures in a population of strings are propagated during the evolution of a solution. It gives a lower bound on the propagation of schemata from one generation to the next. A GA uses a tness function to evaluate the population, and the genetic operations of crossover and mutation to search for new strings. The xed-length nature of GA's has been extended with genetic programming (GP) [4], which uses a tree-based representation for the population members. Crossover and mutation are performed on subtrees of varying size. The variable-length nature of GP has introduced additional complications when trying to formalise a schema theorem for GP [5]. The lack of structure with GP programs made the schema theorem dicult to express without introducing functional de nitions to represent subtree patterns. This paper describes an evolutionary approach to learning using a language de ned by a context-free grammar (CFG) 1. The derivation trees associated with the sentences created using the CFG are manipulated using the operations of crossover and mutation. These operations are guaranteed to create only legal sentences, biased by the grammar. This satis es the issues of closure and typing that have occurred with genetic programming (GP). The use of a formal grammatical structure allows a schema, de ned by the derivation trees, to represent the sentences (programs) of the language.

An evolutionary framework for learning strings de ned by the language L, and represented by the context-free grammar G, may be stated as: 1. Create an initial random population Li (G) L(G) with derivation trees Di (G) D(G) 2. While termination conditions unsatis ed (a) evaluate each derivation tree 2 Di (G) (b) create new population Di+1 (G) where Di+1 (G) D(G) using: i. Fitness Proportionate Reproduction ii. Single-Point Crossover iii. Single-Point Mutation

1

[1].

2.1. Evaluating a Derivation Tree A function F is de ned, that maps a derivation tree

to a real-valued tness, as F () 7?! < where 2 Di (G) The evaluation of F () is used as the proportionate tness measure when selecting members of Di (G) during the evolution. The schema de nition is independent of the form of F . Hence, our observations will be independent of how the sentences in the language L(G) are interpreted.

2.2. Fitness Proportionate Reproduction

An introduction to formal languages may be found in This represents the copying of some derivation tree 2 Di (G) to Di+1 (G).

S

S X

Y

S X

H

G

Y

Terminal

Terminal Nonterminal

A

+

+ Derivation A=>α

Derivation Tree ρ1

A Derivation A=>β

Nonterminal

A

Derivation Tree ρ1

Derivation Tree ρ2

+

+ Derivation A=>α

A Derivation A=>β

New Derivation using L(G)

Delete Subtree

S

S

S X

Y + A Derivation A=> β

Modified ρ1

G

H + α A Derivation A=>

Modified ρ2

Fig. 1: Crossover with Derivation Trees

2.3. Single-Point Crossover

Figure 1 shows an example of crossover when viewed as a derivation tree. A derivation tree 1 2 Di (G) is selected and some nonterminal A 2 1 selected at random. This represents the crossover site for 1 . This sitePis the complete derivation tree + A) where 2 . A second derivation tree 2 2 Di (G) is selected and a matching symbol A 2 2 is selected at random. If no matching symbol exists the crossover is recommenced. This sitePis the complete derivation + tree A ) where 2 . Crossover creates two new sentences of L(G), by + + swapping the derivation trees A ) and A ) of 1 and 2 . The modi ed derivation trees 1 and 2 are copied to Di+1 (G). Rather than A being selected from any of the nonterminals N, A is restricted such that it belongs to some speci ed subset of N : A 2 N. This will be referred to as selective crossover.

2.4. Single-Point Mutation

Figure 2 shows an example of mutation based on derivation trees. A derivation tree 2 Di (G) is selected and some nonterminal A 2 selected at random. This represents the mutation site for . + This site is P the complete derivation tree A ) where 2 . This derivation tree is deleted and a random complete derivation tree, rooted in A, is created usingPthe grammar G. This deriva+ + tion, A ) ; 2 , replaces A ) in . This modi ed is copied to Di+1 (G). Rather than A being selected from any of the nonterminals N, A is restricted such that it belongs to some speci ed subset of N : A 2 N. This will be referred to as selective mutation.

X

Y + A Derivation A=> β

Modified ρ1

Fig. 2: Mutation Applied to a Derivation Tree

3. Schema De nitions The Schema Theorem for genetic algorithms describes how useful schemata are propagated when genetic operators are applied to a population of strings. As de ned by Goldberg[2]: A schema H is a similarity template describing a subset of strings with similarities at certain string positions. There are two properties used to describe how a schema is disrupted by crossover and mutation: 1. The De ning Length (H) is the number of bits between the rst and last bits within H. 2. The Schema Order o(H) is the number of xed positions in H. For a language de ned by a CFG, the schema refer to derivation trees (i.e. subtrees) which are rooted in some nonterminal.

De nition 1 A schema H for a context-free gram , where x 2 N and mar is theP derivation tree x ) 2 fN [ g. 3.1. Schema and Reproduction Using the notation from [2], the eect of reproduction on t schema is de ned as: m(H; t + 1) = m(H; t) f (fH ) where m(H; t) is the tness of schema H at time t, f(H) is the average tness of the sentences 2 Li (G) containing schema H and f is the average tness of the entire population.

3.2. Schema Disruption due to Crossover Proposition 1 The probability of the schema x ) , within derivation tree 2 Di (G), being disrupted during selective crossover, where N is as follows:

8 j x ) j ? j j ?1 ><

x2

j j

= > j x ) j ?jj

: x 26

j j

wherej x ) j is the number of symbols 2 that occur in the schema, j j is the number of symbols 2 at the frontier of the schema and j j is the

number of symbols 2 in the total derivation tree rooted in S.

3.3. Schema Disruption due to Mutation Proposition 2 The probability of the schema x ) , within derivation tree 2 Di (G), being disrupted during selective mutation, where N is as follows:

8 j S; ::; x j + j x ) j ? j j ?1 >< j j = > j S; ::; x j + j x ) j ?jj : jj

X ?

f(): j jx)

G) ) = 2Di (X f (x )

where

X 2Di (G)

2Di (G)

j jx)

indicates a sum over the n deriva-

tion trees that exist in the population, f() is the tness of and j jx) represents the number of schema in the derivation tree . The average tness of x ) over the entire population is now the sum of all schema contributions. X ? f (x ) ) ? x) 2D (G) f= j Di (G) jx) In a similar manner we can argue that the disruption to crossover and mutation must be summed over the population to give an average measure of disruption. The average disruption due to crossover is de ned as follows.X : j jx) ? (G) = 2DX j jx) i

i

2Di (G)

Similarly, the average disruption due to mutation is de ned as: X x2 : j jx) ? 2D (G) = X x 62 j jx) i

2Di (G)

where j S; ::; x j represents the number of sym- Theorem 1 The schema theorem for a GA of bols 2 in the derivation tree from the start symbol length n, using independent mutation and singleto the nonterminal x, j j is the number of sym- point crossover may be stated as [7] 2 : bols 2 at the frontier of the schema and j j is the number of symbols 2 in the total derivation tree rooted in S. P(H,t+1) P(H; t) f (?H )

n

4. A Grammatical Schema Theorem

The Schema Theorem has been used to describe the probability of a particular schema existing in the population from one generation to the next. The GA Schema Theorem assumes that all members of a population are the same size. This cannot be assumed when using the grammatical description of programs, and results in 2 observations [5]. 1. A schema x ) may exist more than once in a population member . 2. The probability of disrupting a schema will depend on the size of the population member which contains the schema. Before presenting the schema theorem for grammatical learning these variations must be incorporated into the basic de nition of and . The average tness of a schema within a population Di (G) of derivation trees is given by:

1 - pc n(?H1)

o

f

(1-pm )o(H )

where P(H; t) represents the probability of schema H being present at time t, pc is the probability of crossover, pm is the probability of muta? tion, f(H) is the average tness of H and f is the average tness of the population.

Theorem 2 The schema theorem for grammatical learning using single-point (selective) mutation and single-point (selective) crossover may be stated as: ?

; t + 1) P(x ) ; t) f (x)) P(x ) ?

n

o

f

? ? (1 - pc )(1 ? pm ) 2 The complication of a schema being reintroduced by crossover or mutation is ignored.

5. Discussion

Normally with genetic approaches mutation is used as a local syntactic search and crossover a more global search. However, with the grammatical genetic de nition, selective mutation is more disruptive than selective crossover. This is easily shown as j S; : : :; x j 0. However, as j x ) j becomes large j x ) j j S; : : :; x j , and the disruptions are equal to a rst approximation. Extending this further, if = then for large schemata = . The independent bit-level mutation of GA's may be modelled by assuming that any nonterminal A 2 has equal probability of being mutated in the manner described by selective mutation. If this is the case, the de nition for is independent of the size of the derivation tree which contains the schema. The measure used for disruption is then the number of nonterminals 2 within x ) .

overall behaviour. These changes to G do not aect the overall behaviour of the learning system, however they change the possible intermediate forms of schemata. Work is required to describe how changes to a grammar and the sets and relate to building and maintaining schemata.

7. Conclusion This paper demonstates the de nition of a schema theorem for languages de ned by a context-free grammar. The formal methods allow a clear statement of how sub-parts of a solution interact during evolution. These results may be used to assist the understanding of GP and variable-length techniques using evolution.

8. Acknowledgments

Thanks must go to Bob McKay, Xin Yao, Justinian

Proposition 3 The probability of a schema x ) Rosca, Una-May O'Reilly and Desley Anthony for , within derivation tree 2 Di (G), being disrupted suggestions. during selective independent mutation, where the probability of mutation is pm , will be:

(1 ? pm )

References

[1] D.Barrett, R. Bates, D. Gustafson, and J.Couch, Compiler Construction: Theory and 8 j S; ::; x j + j x ) j ? j j ?1 x 2 Practice. Science Research Assoc, Inc., 1986. < [2] D. E. Goldberg, Genetic Algorithms in Search, =: Optimization and Machine Learning. Addisonj S; ::; x j + j x ) j ? j j x 62 Wesley Publishing Company, inc., 1989. [3] J. H. Holland, Adaptation in Natural and Arti5.1. Maintaining Schema cial Systems. MIT Press, second ed., 1992. [4] J. R. Koza, Genetic Programming:on the proTo ensure that t schemata are propogated to fugramming of computers by means of natural seture generations, we want to minimise lection. A Bradford Book, The MIT Press, 1992. jx)jf [ g ?jjf [ g jjf [ g [5] U.-M. O'Reilly and F. Oppacher, \The trouOne possibility is that the grammar productions bling aspects of a building block hypothesis for should be modi ed to reduce the number of nontergenetic programming," in Foundations of Geminals 2 f [ g in the derivation of a schema. For netic Algorithms 3, Morgan Kaufmann Pub., + example, given the schema A ) bc with derivation 1994. steps: [6] P. Whigham, \Operations that modify a gramA ! Bc B ! b mar to direct the search space of a genetic pro A ) Bc ) bc gram," Tech. Rep. CS2/95, University College, an alternative production A ! bc would improve University of New South Wales, 1995. the chances of the schema surviving. The derivation [7] D. Whitley, \A genetic algorithm tutorial," steps could then be modi ed to A ! bc Tech. Rep. CS-93-103, Computer Science Dept., A ) bc Colorado State University, 1993. A discussion of how these new productions may be learnt during the evolution of a solution is beyond the scope of this paper [6]. Alternatively, the value of j jf [ g must increase to protect the schemata. This may account for the condition of bloating that is observed with GP. where

6. Future Work There are an in nite number of grammars G with

dierent schemata representations, but the same

Lihat lebih banyak...

A Schema Theorem for context-free grammars

Descrição do Produto

Comentários