Random Graphs: Structural-Contextual Dichotomy




IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-2, NO. 4, JULY 1980


ANDREW K. C. WONG, MEMBER, IEEE, AND DAVID E. GHAHRAMAN, MEMBER, IEEE

Abstract-A formal definition of random graphs is introduced which is applicable to graphical pattern recognition problems. The definition is used to formulate rigorously the structural-contextual dichotomy of random graphs. The probability of outcome graphs is expressed as the product of two terms, one due to the statistical variability of structure among the outcome graphs and the other due to their contextual variability. Expressions are obtained to estimate the various probability, typicality, and entropy measures. The members in an ensemble of signed digraphs are interpreted as outcome graphs of a random graph. The synthesized random graph is used to quantify the structural, contextual, and overall typicality of the outcome graphs with respect to the random graph.


Manuscript received August 17, 1977; revised October 9, 1979. This work was supported in part by the National Science Foundation under Grant GK 31080 and in part by the National Research Council of Canada under Operating Grant A4716.
A. K. C. Wong is with the Department of Systems Design, University of Waterloo, Waterloo, Ont., Canada.
D. E. Ghahraman was with the Carnegie Institute of Technology, Carnegie-Mellon University, Pittsburgh, PA 15213. He is now with Megatest Corporation, Santa Clara, CA.
¹Specialized formalizations have previously been given in connection with certain problems of operations research [12] and in proving theorems of combinatorics [13]; these are not sufficiently general for the applications considered in this paper.


Index Terms-Entropy, graph monomorphism, pattern recognition, random graph, signed digraph, typicality.

I. INTRODUCTION

THE NOTION of random graph is useful when solving problems in which the members in an ensemble of labeled graphs are to be compared and contrasted in order to exhibit their typical characteristics. Such problems arise in some pattern recognition tasks as well as in some phases of systems analysis and planning. In grammatical inference, for instance, the exemplars in the training set may be labeled graphs. The induction of certain rules of the grammar can be accomplished by discovering subgraphs which are common among the graphs [1]. A related task is pattern analysis and classification of graph-representable visual scenes [2]-[4]. In this case, each member of the ensemble is a graph of the relational structure describing the scene [5]-[8]. Certain environmental systems can be modeled by means of labeled graphs [9], [10]. In general, there may be a multitude of graphs representing the system at different instances or as perceived by different individuals [11]. A methodology for synthesizing the various graphs and producing a probabilistic description can be utilized to estimate the future graph of the system or to quantify individual differences, in the two cases just mentioned, respectively.

A random graph is a graph whose vertices and arcs are finite probability distributions.¹ The joint outcomes of the random vertices and random arcs are labeled graphs. Each outcome of a random graph consists of a labeled graph and a morphism (incidence-preserving mapping) of the labeled graph into the random graph. The morphism signifies, for each vertex or arc label, the random vertex or arc of which the label is an outcome. An example is given in Fig. 1. The random graph describes the class of scenes composed of at most two pyramids, two boxes, and one ball, and in which two objects of different type can be adjacent, distant, or occluded from each other. For instance, the scene shown in Fig. 1(b) is represented by the outcome graph in (a).

Fig. 1. Example of random graph. (a) Random graph and its outcome. (b) Spatial arrangement of objects in outcome graph. (The random vertex outcome sets shown include {large, small, null}, {high, low, null}, and {tall, short, null}; the random arc outcome set is {adjacent, distant, occluded, null}.)

The utility of random graphs lies in the fact that they account for both the graphical (structural) and the probabilistic elements of the knowledge domains under investigation. This fact can be exploited to differentiate those elements in the knowledge domain which are due to the variability of structural characteristics from those which are due to contextual characteristics. Given an ensemble of graphs, a random graph can be synthesized by associating morphisms with the graphs.
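To fix ideas, the following minimal Python sketch (not part of the paper; every name and probability value is an illustrative assumption) shows one way a Fig. 1-style random graph and a single outcome graph might be represented: each random vertex and random arc is a finite distribution over labels, with None as the null outcome, and the outcome graph carries the morphism identifying the random vertex or arc of which each of its labels is an outcome.

# A minimal sketch (not from the paper): one possible in-memory representation of a
# Fig. 1-style random graph and of one of its outcome graphs.  None is the null outcome.

random_vertices = {
    "ball":     {"large": 0.4, "small": 0.3, None: 0.3},
    "box1":     {"high": 0.5, "low": 0.3, None: 0.2},
    "pyramid1": {"tall": 0.4, "short": 0.3, None: 0.3},
}

# Random arcs are keyed by their (initial, terminal) random endpoints; the distribution
# shown is the one conditional on both endpoints being nonnull.
random_arcs = {
    ("ball", "box1"): {"adjacent": 0.3, "distant": 0.3, "occluded": 0.2, None: 0.2},
}

# One outcome graph: a labeled graph together with the morphism into the random graph.
outcome_graph = {
    "vertex_labels":   {"v1": "large", "v2": "high"},       # nonnull outcomes only
    "arc_labels":      {("v1", "v2"): "adjacent"},
    "vertex_morphism": {"v1": "ball", "v2": "box1"},        # mu: vertices -> random vertices
    "arc_morphism":    {("v1", "v2"): ("ball", "box1")},    # alpha: arcs -> random arcs
}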

0162-8828/80/0700-0341$00.75 © 1980 IEEE


Since the entropy of probability distributions is a measure of the statistical variability of their outcomes, minimizing the random graph entropy is a valid objective when comparing labeled graphs for common characteristics. Furthermore, the entropy can be expressed as the sum of a structural and a contextual entropy, measuring, respectively, the statistical variability of structure (existence of vertices and arcs in the outcome graphs) and that of context (identity of vertex and arc labels). Such measures are important in determining the nature of the typical characteristics of the outcome graphs.

The potential applications of random graph synthesis can be seen in the major approaches to pattern recognition. In the decision-theoretic (statistical) approach [14], the statistical characteristics of a feature vector are used to classify patterns. In general, the features are statistically dependent, the simplest case being pair dependence. Statistical dependence, however, cannot account for semantic dependence of features; for instance, when the patterns are character sequences, the relation of two letters appearing in the same word is different from the relation of the same two letters appearing in different words. To distinguish types of feature-pair dependence, the (random) feature vector can be replaced by a random graph with vertices identical to the features and arcs having the various types of dependence as outcomes. The synthesis can then proceed on the basis of available patterns. Thus, random graphs are generalizations of feature vectors.

In the syntactic (linguistic) approach to pattern recognition [15], [16], patterns are described as strings generated by a formal grammar. Recognition is accomplished by parsing strings according to the rules of the grammar. Two important classes of grammars for scene analysis are graph grammars (for high-dimensional scenes) and stochastic grammars (for noisy scenes) [15]. Since realistic scenes are high dimensional as well as contaminated by noise, the definitions of these classes of grammars may be extended to define the class of stochastic graph grammars. A sample set from such a grammar would be a collection of graphs together with their probabilities, and can be considered a random graph. Entropy minimization can be used as a procedure for reducing the complexity of the sample set, a criterion for grammatical inference already used in the case of stochastic grammars [17], [18].

II. DEFINITION OF RANDOM GRAPH

The notion of random graph can be given a precise definition if the following requirements are met.
1) The outcomes of the probability space of a random graph are graphs.
2) The probability space of a random graph is complete.
Unless both requirements are satisfied, it would not be meaningful to speak of randomness in connection with graphs and of graphical structure in relation to probability.

A random graph is defined to be a graph 𝐆 = (𝐌, 𝐀) such that:
1) each vertex i ∈ 𝐌, called a random vertex, and each arc a ∈ 𝐀, called a random arc, is a finite probability distribution;
2) for every random arc a ∈ 𝐀,

Prob{a = φ | σ_𝐆(a) = φ or τ_𝐆(a) = φ} = 1    (1)

where φ is the null outcome, and σ_𝐆(a) and τ_𝐆(a) are the initial and terminal (random) endpoints of a, respectively; and
3) the probability space of the joint distribution consisting of all random vertices and random arcs is complete.

The reason for introducing the null outcome is to permit a general definition for outcomes of a random graph. The condition (1) states that if either endpoint of a random arc has a null outcome, then, with probability one, the outcome of the arc is also null.

Consider the probability space of the joint distribution. Let this space be called the joint probability space and refer to an element of it as a joint outcome. Let M and A be the set of nonnull outcomes of the random vertices and the set of nonnull outcomes of the random arcs, respectively. Define

μ: M → 𝐌,   α: A → 𝐀

by letting μ(i) or α(a) stand for the random vertex or the random arc of which i or a is an outcome, respectively. A joint outcome J is completely specified as a quadruple J = (M, A, μ, α).

The probability space of a random graph 𝐆 = (𝐌, 𝐀) consists of all pairs (G, γ), called outcome graphs, such that:
1) G is a graph G = (M, A);
2) γ is a monomorphism γ = (μ, α) of G into 𝐆, γ: G → 𝐆; and
3) for each vertex i ∈ M and each arc a ∈ A, i is a nonnull outcome of the random vertex μ(i) and a is a nonnull outcome of the random arc α(a).
The probability p_𝐆(G, γ) of an outcome graph is defined to be equal to the probability of the corresponding joint outcome. Let μ' and α' denote the inverses of μ and α. In general, the inverse mappings are not total, i.e., μ'(i) = φ if i is not the image of some vertex of G and α'(a) = φ if a is not the image of an arc. Then

p_𝐆(G, γ) = Prob{(i = μ'(i), ∀ i ∈ 𝐌), (a = α'(a), ∀ a ∈ 𝐀)}.    (2)

The purpose of the remainder of the present section is to show that the probability space of a random graph is complete. The joint probability space can be partitioned into two parts: 1) part I consists of all joint outcomes J = (M, A, μ, α) such that there is a ∈ A with μ'(σ_𝐆(α(a))) = φ or μ'(τ_𝐆(α(a))) = φ; and 2) part II is the complement of part I. For a joint outcome J in part I, J contains the event {α(a) ≠ φ, σ_𝐆(α(a)) = φ or τ_𝐆(α(a)) = φ} which, according to (1), is of probability 0. It is therefore concluded that the probability of every joint outcome in part I is 0. Since the joint probability space is complete, it follows that part II is also complete. It can be shown that for every joint outcome J = (M, A, μ, α) in part II and every a ∈ A,

σ_𝐆(α(a)) = μ(σ_G(a)),   τ_𝐆(α(a)) = μ(τ_G(a)).

Thus, the pair (μ, α) is a monomorphism γ = (μ, α), γ: G → 𝐆, and J corresponds to an outcome graph (G, γ), where G = (M, A) and γ = (μ, α). It follows that the probability space of 𝐆 is complete.
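As a numerical sanity check of completeness (not from the paper; toy numbers and a hypothetical function name), one can enumerate the joint outcomes of a very small random graph, built here from independent vertex distributions and endpoint-conditional arc distributions, and confirm that their probabilities sum to one; arcs are forced to the null outcome whenever an endpoint is null, as condition (1) requires.

from itertools import product

def enumerate_joint_outcomes(random_vertices, random_arcs):
    """Yield (vertex outcomes, arc outcomes, probability) for every joint outcome."""
    vertex_names = list(random_vertices)
    for vertex_choice in product(*(random_vertices[v].items() for v in vertex_names)):
        realized = dict(zip(vertex_names, vertex_choice))        # name -> (label, prob)
        p = 1.0
        for label, prob in vertex_choice:
            p *= prob
        # Only arcs whose endpoints are both nonnull are free to vary; all others are null.
        free = [a for a in random_arcs
                if realized[a[0]][0] is not None and realized[a[1]][0] is not None]
        for arc_choice in product(*(random_arcs[a].items() for a in free)):
            q = p
            for label, prob in arc_choice:
                q *= prob
            yield realized, dict(zip(free, arc_choice)), q

random_vertices = {"ball": {"large": 0.6, None: 0.4}, "box": {"high": 0.7, None: 0.3}}
random_arcs = {("ball", "box"): {"adjacent": 0.5, "distant": 0.3, None: 0.2}}
print(sum(p for _, _, p in enumerate_joint_outcomes(random_vertices, random_arcs)))  # ~1.0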


III. STRUCTURAL-CONTEXTUAL DICHOTOMY

It has been shown that there is a one-to-one correspondence between the probability space of a random graph and a complete subspace of the probability space of the joint distribution. The fact that a graph outcome can be considered both a graph and an outcome of a probability space is significant. From the purely graphical viewpoint an outcome graph is a "structure." From the information-theoretic point of view, however, it is a "context" provided by the probability space. The dichotomy can indeed be formalized.

Consider two outcome graphs (G_1, γ_1) and (G_2, γ_2) of a random graph 𝐆 = (𝐌, 𝐀). Let G_k = (M_k, A_k) and γ_k = (μ_k, α_k) for k = 1, 2. Denote by μ'_k and α'_k the inverse mappings of μ_k and α_k, respectively. Define the compositions

μ_12: M_1 → M_2 : i → μ_1(i) → μ'_2(μ_1(i)),
α_12: A_1 → A_2 : a → α_1(a) → α'_2(α_1(a)).

Then the outcome graphs (G_1, γ_1) and (G_2, γ_2) are called structurally equivalent if γ_12 = (μ_12, α_12) is an isomorphism γ_12: G_1 → G_2. Thus, for each random vertex i the outcomes μ'_1(i) and μ'_2(i) are either both null or both nonnull. A similar statement is true for the random arcs.

The structural equivalence relation defined on the probability space of a random graph 𝐆 induces a partition of the probability space into mutually exclusive events. The partition will be called the structural equivalence partition, and its members, structural equivalence events. Let E be such an event. Then the probability of E is

p_𝐆(E) = Σ_{(G,γ)∈E} p_𝐆(G, γ)

where the summation extends over all outcome graphs in E. The structural equivalence partition is itself a complete probability space with the structural equivalence events as its outcomes.

Every outcome graph (G, γ) in a structural equivalence event E is isomorphic to the same subgraph of 𝐆. That subgraph is defined by all the random vertices and random arcs whose outcomes are nonnull in every outcome graph in E. Any two outcome graphs in E become indistinguishable graphs if the set membership of their vertices and arcs is disregarded. On the other hand, the event E contains all the probable combinations of the outcomes of the random vertices and random arcs, given a fixed structure (namely, that of the subgraph of 𝐆) for outcome graphs. Therefore, an outcome graph (G, γ), viewed purely as a graph, coincides with E and occurs with probability p_𝐆(E). Hence, the structural probability of an outcome graph (G, γ) can be defined as

p''_𝐆(G, γ) = p_𝐆(E)

where E is the (unique) structural equivalence event to which (G, γ) belongs. The contextual probability, the "remaining part" of the probability of (G, γ), is

p'_𝐆(G, γ) = p_𝐆(G, γ)/p''_𝐆(G, γ)

which is immediately recognized to be the probability of the conditional event (G, γ) | E. Hence,

p_𝐆(G, γ) = p''_𝐆(G, γ) p'_𝐆(G, γ)

and the probability of an outcome graph has been decomposed into a structural component and a contextual component.

IV. ESTIMATION OF PROBABILITY

The estimation of probabilities of outcome graphs is required in the applications of random graphs. Reasonable assumptions are sought to reduce the (high) dimensionality of the information sources (probability distributions) constituting the joint probability space. The simple assumption that all random vertices and random arcs are mutually independent is, in general, untenable since, according to (1), a certain degree of dependence is required. The expression (2) for the probability of an outcome graph can be written as

p_𝐆(G, γ) = Prob{i = μ'(i), ∀ i ∈ 𝐌} × Prob{a = α'(a), ∀ a ∈ 𝐀 | (i = μ'(i), ∀ i ∈ 𝐌)}.    (3)

The form of (3) suggests that the following assumptions be made.
1) The random vertices i, i ∈ 𝐌, are mutually independent.
2) The conditional distributions a | (i = μ'(i), ∀ i ∈ 𝐌), a ∈ 𝐀, are mutually independent.
3) A random arc a is independent of any random vertex which is neither σ_𝐆(a) nor τ_𝐆(a).
4) For each random arc a, the conditional distribution

a | (σ_𝐆(a) = μ'(σ_𝐆(a)), τ_𝐆(a) = μ'(τ_𝐆(a)))

is equal, with probability one, to the conditional distribution

a | (σ_𝐆(a) ≠ φ, τ_𝐆(a) ≠ φ)

if μ'(σ_𝐆(a)) ≠ φ and μ'(τ_𝐆(a)) ≠ φ.

Assumptions 1)-3) state that random vertices and arcs are "almost" independent, the exception being the dependence of random arcs on their endpoints. The meaning of assumption 4) is that, if the outcomes of the endpoints of a random arc are both nonnull, then the probability of any outcome of it is the same regardless of the actual outcomes of the endpoints.

Let 𝐀_μ(M) denote the set of random arcs each of which has both endpoints in the set μ(M) of random vertices that are images of the vertices M of G. Thus, a ∈ 𝐀_μ(M) if and only if the outcomes of the endpoints of a are nonnull in the outcome graph G. Now (3) becomes

p_𝐆(G, γ) = ∏_{i∈𝐌} Prob{i = μ'(i)} × ∏_{a∈𝐀_μ(M)} Prob{a = α'(a) | σ_𝐆(a) ≠ φ, τ_𝐆(a) ≠ φ}.    (4)
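A small Python sketch of the product form (4) follows (toy data and hypothetical helper names, not from the paper): the probability of an outcome graph is the product of each random vertex's probability of its realized, possibly null, outcome and, for every random arc whose endpoints are both realized, the conditional probability of its realized outcome.

def outcome_probability(random_vertices, random_arcs, outcome):
    """Sketch of eq. (4): product of vertex-outcome probabilities and, for random arcs
    whose endpoints are both nonnull, of the conditional arc-outcome probabilities."""
    # mu'(i): realized label of each random vertex (None = null outcome).
    realized = dict.fromkeys(random_vertices, None)
    for v, rv in outcome["vertex_morphism"].items():
        realized[rv] = outcome["vertex_labels"][v]

    prob = 1.0
    for rv, dist in random_vertices.items():
        prob *= dist[realized[rv]]

    # alpha'(a): realized label of each random arc (None = null outcome).
    arc_label = dict.fromkeys(random_arcs, None)
    for arc, ra in outcome["arc_morphism"].items():
        arc_label[ra] = outcome["arc_labels"][arc]

    for (tail, head), dist in random_arcs.items():
        if realized[tail] is not None and realized[head] is not None:
            prob *= dist[arc_label[(tail, head)]]   # distribution conditional on nonnull endpoints
    return prob


# Toy data in the style of Fig. 1 (illustrative numbers, not from the paper).
random_vertices = {"ball": {"large": 0.4, "small": 0.3, None: 0.3},
                   "box1": {"high": 0.5, "low": 0.3, None: 0.2}}
random_arcs = {("ball", "box1"): {"adjacent": 0.5, "distant": 0.3, None: 0.2}}
outcome = {"vertex_labels": {"v1": "large", "v2": "high"},
           "arc_labels": {("v1", "v2"): "adjacent"},
           "vertex_morphism": {"v1": "ball", "v2": "box1"},
           "arc_morphism": {("v1", "v2"): ("ball", "box1")}}

print(outcome_probability(random_vertices, random_arcs, outcome))  # 0.4 * 0.5 * 0.5 = 0.1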


To write (4) in a manageable form, the following notation is used. For a random vertex i,

p_i(v) = Prob{i = v},   p_i = p_i(φ),   q_i = 1 − p_i    (5)

where v is an arbitrary outcome of i. Let l be an arbitrary outcome of a random arc a. Then

p_a(l) = Prob{a = l | σ_𝐆(a) ≠ φ, τ_𝐆(a) ≠ φ},   p_a = p_a(φ),   q_a = 1 − p_a,
c_a = Prob{σ_𝐆(a) ≠ φ, τ_𝐆(a) ≠ φ}    (6)

where the last quantity, the probability that both endpoints have nonnull outcomes, has been defined for future reference. Using (4)-(6),

p_𝐆(G, γ) = ∏_{i∈𝐌} p_i(μ'(i)) ∏_{a∈𝐀_μ(M)} p_a(α'(a)).    (7)

The structural and contextual probabilities of an outcome graph can, in this case, be expressed directly although a formal derivation can be given [19]. The structural probability is

p''_𝐆(G, γ) = [∏_{i∈𝐌: μ'(i)=φ} p_i] [∏_{i∈𝐌: μ'(i)≠φ} q_i] × [∏_{a∈𝐀_μ(M): α'(a)=φ} p_a] [∏_{a∈𝐀_μ(M): α'(a)≠φ} q_a].    (8)

The contextual probability is the ratio of p_𝐆(G, γ) and p''_𝐆(G, γ):

p'_𝐆(G, γ) = [∏_{i∈𝐌: μ'(i)≠φ} p_i(μ'(i))/q_i] × [∏_{a∈𝐀_μ(M): α'(a)≠φ} p_a(α'(a))/q_a].    (9)

It is observed that there is no contribution to contextual probability from random vertices and arcs with null outcomes; p'_𝐆(G, γ) is seen intuitively to be the probability of (G, γ) assuming the structure of G.

V. ENTROPY OF RANDOM GRAPH

An important information measure for complete finite probability distributions is (the Shannon) entropy [20], [21]. Its importance lies in the fact that entropy can be axiomatically characterized as possessing the natural properties which a measure of information or uncertainty may be required to have. The entropy of a random graph 𝐆 = (𝐌, 𝐀) is defined by

h(𝐆) = − Σ_{(G,γ)} p_𝐆(G, γ) log p_𝐆(G, γ)    (10)

where the summation contains a term for every outcome graph (G, γ) of 𝐆 and the logarithm is evaluated in base 2. Of interest to the subject of random graphs is the interpretation of entropy as a measure of the variability of outcome graphs. If there is a significant similarity among a large number of outcome graphs, then the value of h(𝐆) is low; otherwise it is high. The entropy (10) can be evaluated using the probability expression (7):

h(𝐆) = Σ_{i∈𝐌} h(i) + Σ_{a∈𝐀} h(a)    (11)

where the summands are the entropies of the random vertices and random arcs. The entropy of a random vertex i is given by

h(i) = − Σ_v p_i(v) log p_i(v)

where the sum is over all outcomes of i (including the null outcome) and the probabilities are defined in (5). The conditional entropy of a random arc a is

h(a) = − c_a Σ_l p_a(l) log p_a(l)

where the sum is again over all outcomes of a and the probabilities as well as c_a are defined in (6). The reason for the coefficient c_a is the fact that the probabilities p_a(l) are conditional probabilities, the condition being precisely the event whose probability is c_a.

The argument concerning the definition of structural probability can be applied to structural entropy as well. The structural probability p''_𝐆(G, γ) was defined as the probability p_𝐆(E) of the structural equivalence event E of (G, γ). Consequently, the structural entropy of 𝐆 can be defined as the entropy of the probability space of the structural equivalence partition

h''(𝐆) = − Σ_E p_𝐆(E) log p_𝐆(E).

This expression can be evaluated using (8) to obtain

h''(𝐆) = Σ_{i∈𝐌} h''(i) + Σ_{a∈𝐀} h''(a).    (12)

The structural entropy of a random vertex i is

h''(i) = − p_i log p_i − q_i log q_i

and measures the variability of the nullity and nonnullity of the outcomes of i. Similarly, the conditional structural entropy of a random arc a can be written as

h''(a) = − c_a (p_a log p_a + q_a log q_a)

which is a measure of the variability in the nullity and nonnullity of the outcomes of a assuming the nonnullity of the outcomes of the endpoints of a.

The contextual entropy of 𝐆 is defined to be the difference between the entropy of 𝐆 and its structural entropy. Before stating the expression, like definitions are given for random vertices and arcs.


The contextual entropy of a random vertex i is

h'(i) = − q_i Σ_{v≠φ} (p_i(v)/q_i) log (p_i(v)/q_i)

in which only the nonnull outcomes of i are included. The conditional contextual entropy of a random arc a is given by

h'(a) = − c_a q_a Σ_{l≠φ} (p_a(l)/q_a) log (p_a(l)/q_a).

The following relations are obtained easily:

h(i) = h''(i) + h'(i),   ∀ i ∈ 𝐌,
h(a) = h''(a) + h'(a),   ∀ a ∈ 𝐀.    (13)

The contextual entropy h'(i) is a measure of the variability of the nonnull outcomes of i assuming the nonnullity of its outcomes (hence the presence of the coefficient q_i). For a random arc a, the probabilities in h'(a) are "doubly" conditional. The first condition is the nonnullity of the endpoints of a; the second is the nonnullity of the outcome of a itself. Finally, the contextual entropy of a random graph 𝐆 = (𝐌, 𝐀) is obtained from (11)-(13) as

h'(𝐆) = Σ_{i∈𝐌} h'(i) + Σ_{a∈𝐀} h'(a)    (14)

and can be considered a measure for the contextual variability of outcome graphs assuming their structure.

Entropy can be expressed as the average value (expectation) of self-information. The self-information of an outcome of a finite probability distribution is defined to be the negative of the logarithm of the probability. The higher the probability of the outcome, the lower its self-information would be. Hence, self-information may be considered an inverse measure of "typicality" of outcomes [21]. The self-information, structural self-information, and contextual self-information of an outcome graph are given by

t_𝐆(G, γ) = − log p_𝐆(G, γ),
t''_𝐆(G, γ) = − log p''_𝐆(G, γ),
t'_𝐆(G, γ) = − log p'_𝐆(G, γ),    (15)

respectively, where the probabilities are those given in (7)-(9).
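As a hedged illustration of (11)-(13), the following Python sketch (assumed helper names, not from the paper) computes the entropy of a single random vertex or arc and splits it into its structural and contextual parts; the arc version carries the coefficient c_a of (6).

from math import log2

def vertex_entropies(dist):
    """h(i), h''(i), h'(i) of one random vertex per (11)-(13); None is the null outcome."""
    p_null = dist.get(None, 0.0)
    q = 1.0 - p_null
    h = -sum(p * log2(p) for p in dist.values() if p > 0)        # total entropy h(i)
    h_struct = -sum(p * log2(p) for p in (p_null, q) if p > 0)   # h''(i): null vs. nonnull
    return h, h_struct, h - h_struct                             # h'(i) = h(i) - h''(i)

def arc_entropies(cond_dist, c_a):
    """h(a), h''(a), h'(a) of one random arc; cond_dist is conditional on nonnull endpoints
    and c_a = Prob{both endpoints nonnull}, as in (6)."""
    h, h_struct, h_context = vertex_entropies(cond_dist)
    return c_a * h, c_a * h_struct, c_a * h_context

# Example: an arc that, once its endpoints exist, is present with sign '+' or '-'.
print(arc_entropies({"+": 5/7, "-": 1/7, None: 1/7}, c_a=1.0))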

VI. EXAMPLE OF APPLICATION

Some environmental problems related to air pollution, solid-waste disposal, and energy demand can be formulated and analyzed in terms of signed digraphs [9], [10]. A signed digraph is a labeled graph with signs (positive and negative) as labels for the arcs. The variables (e.g., population size, number of passenger miles) of the problem are represented by the vertices of a signed digraph while the causal relations for pairs of variables are depicted by the arcs and their signs. For instance, the population size of an area may have a positive effect on the number of passenger miles commuted annually. The purpose of analyzing a signed digraph is to answer questions such as the effects of a strategy of introducing changes at the vertices [22].

Fig. 2. Energy demand signed digraph. 1-Passenger miles. 2-Fuel economy. 3-Population size. 4-Cost of car. 5-Price of commuter ticket. 6-Emissions. 7-Accidents. 8-Probability of delay. 9-Fuel consumption.

A major concern in the signed digraph formulation is the actual construction of the signed digraph representing a physical system, i.e., the choice of variables and the nature of their relationships. It has been proposed [10], [11] that the construction can be carried out by combining the subjective judgments of the members of a panel of experts regarding the questions posed. It is conceivable that only on rare occasions would the experts agree completely. In general, it is necessary to compare and contrast the various opinions to extract the common or typical elements for inclusion in a deterministic signed digraph representation.

The concept of random graph is applicable to the quantification of expert judgment in constructing graphs of physical systems. The justification may be found in the following general argument. If there exists a deterministic graph representing a system or a problem, then experts must agree as to what that graph should be. Hence, if there is considerable disagreement (a likely possibility), then there is a multitude of probable graphical representations. It follows that the system or problem under consideration is best described by a random graph whose outcome graphs are the probable (deterministic) representations. An expert's opinion may be envisioned as a particular outcome graph. Furthermore, if a probable representation is selected for analysis, the reliability of the conclusions is governed by the typicality of the outcome graph with respect to the random graph. Thus, only a highly typical outcome graph would be an acceptable candidate for a deterministic representation.

An energy demand signed digraph for intra-urban commuter transportation has been constructed on the basis of expert judgment and analyzed in [11]. The nine variables of the problem (obtained from an earlier investigation [10]) as well as the signed digraph are reproduced in Fig. 2. Questionnaires were distributed among the seven members of a panel of experts to obtain the opinion of each expert regarding the relation (none, positive, or negative) of every pair of variables. The opinions of the respondents can be represented as the signed digraphs shown in Fig. 3. A random graph 𝐆 = (𝐌, 𝐀) describing energy demand for intra-urban commuter transportation can now be constructed


Fig. 3. Signed digraphs. (a) Graph G1. (b) Graph G2. (c) Graph G3. (d) Graph G4. (e) Graph G5. (f) Graph G6. (g) Graph G7.


TABLE I
STRUCTURAL ENTROPY. h''(𝐆) = 45.214

        1    2    3    4    5    6    7    8    9
  1   .00  .99  .00  .79  .00  .86  .00  .59  .59
  2   .99  .00  .00  .86  .59  .00  .86  .59  .00
  3   .00  .00  .00  .79  .99  .59  .86  .00  .00
  4   .56  .56  .00  .59  .56  .86  .79  .00  .79
  5   .00  .86  .00  .79  .00  .99  .99  .99  .99
  6   .99  .86  .99  .79  .86  .00  .86  .59  .86
  7   .99  .86  .86  .79  .86  .99  .00  .00  .59
  8   .59  .99  .59  .56  .86  .86  .99  .00  .86
  9   .86  .86  .00  .79  .99  .00  .59  .86  .00

TABLE II
CONTEXTUAL ENTROPY. h'(𝐆) = 17.633

(The individual entries of Table II are garbled in the source scan and are not reproduced here.)

TABLE III
SELF-INFORMATION OF OUTCOME GRAPHS

Outcome   Self-information   Structural Self-information   Contextual Self-information   % Structural
   1            66.3                   46.2                          20.1                      70
   2            71.7                   42.7                          29.0                      60
   3            42.5                   38.9                           3.6                      92
   4            65.6                   50.1                          15.5                      76
   5            66.6                   49.0                          17.6                      74
   6            76.8                   49.5                          27.3                      64
   7            50.5                   40.3                          10.2                      80

on the basis of expert judgment. The random vertices are denoted by underlining the integers 1, ..., 9. For each random vertex i ≠ 4, there is only one admissible outcome, namely variable i, with probability 1. For 4, the admissible outcomes are 4 and φ (null) since variable 4 is missing in one of the "observations" (G3). The admissible outcomes of a random arc a are φ, +, −, (+), and (−). The last two alternatives mean "weakly" positive and negative, respectively, and their occurrences (not shown in Fig. 3) were rare.

To determine the probability space of 𝐆, the graphs G_k are interpreted as outcome graphs (G_k, γ_k), where γ_k maps every vertex i of G_k onto the random vertex i bearing the same number. Hence, each vertex or arc of each graph is interpreted as an "observation" of a random vertex or random arc. The structural and contextual entropies of the random vertices and random arcs are given in Tables I and II, respectively. The results indicate that most of the variability in the outcome graphs is structural (about 72 percent, since h''(𝐆)/(h''(𝐆) + h'(𝐆)) = 45.214/62.847 ≈ 0.72). The same conclusion is reached from Table III, where the self-informations (15) of the outcome graphs are tabulated. This is also observed in Fig. 3. From the structural viewpoint (disregarding signs) the graphs are clearly considerably different. From the contextual viewpoint (fixing structure and regarding context), however, significant similarities are detected.

For instance, consider the outcome graphs in which the circuit (1, 7), (7, 8), (8, 1) exists (G2, G3, G4, G5, G6, and G7). Among this group there is complete agreement with respect to the signs of the arcs in the circuit. In the terminology of [10] this directed circuit is a deviation-counteracting cycle because of the signs of its arcs. That is, an increase in passenger miles (vertex 1) leads to an increase in accidents (vertex 7), which in turn leads to an increase in the probability of delay (vertex 8), which finally leads to a decrease in passenger miles. Thus, among the experts who agreed on the existence of the circuit there is agreement about the nature of the circuit. The agreement is manifest in the form of low contextual entropy.

From Table III the signed digraph G3 is determined to be the most typical (with respect to probability of occurrence), both structurally and contextually. In fact, it is similar to the one shown in Fig. 2, the signed digraph finally chosen in [11] for further analysis. It may be concluded that the "opinion" of the third expert is typical of the members of the panel selected for the problem.
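A rough sketch of the per-arc computation behind Tables I and II is given below (hypothetical data and function name, assuming for simplicity that both endpoint variables appear in every expert graph, i.e., c_a = 1): the arc's empirical distribution over {absent, +, −, ...} across the expert graphs yields a structural part (present versus absent) and a contextual part (sign agreement given presence).

from math import log2
from collections import Counter

def arc_entropy_split(observations):
    """Structural and contextual entropy of one random arc estimated from an ensemble.
    Each entry is the arc's sign in one expert graph, or None if the arc is absent there."""
    n = len(observations)
    probs = {label: count / n for label, count in Counter(observations).items()}
    p_absent = probs.get(None, 0.0)
    q = 1.0 - p_absent
    h_total = -sum(p * log2(p) for p in probs.values() if p > 0)
    h_struct = -sum(p * log2(p) for p in (p_absent, q) if p > 0)   # present vs. absent
    return h_struct, h_total - h_struct                            # contextual = remainder

# Hypothetical opinions of seven experts about one arc: six include it, all with sign '+'.
print(arc_entropy_split(["+", "+", "+", "+", "+", "+", None]))     # about (0.59, 0.0)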


It has been suggested that low values of entropy of the random graph 𝐆 are indicative of the similarity of the outcome graphs. However, the entropy of 𝐆 depends on the interpretation of the signed digraphs G_k as outcome graphs, i.e., on the choice of the monomorphisms γ_k. Hence, it may be possible to obtain a lower value of entropy if a choice of γ_k (for some k), other than the one given before, is made. The possibility can be investigated by solving the optimization problem

min_{γ_k} h(𝐆)    (16)

where γ_k: G_k → 𝐆. In view of (12) this problem is an optimal graph monomorphism problem and can be solved using the techniques developed in [24]. Since the graph monomorphism (subgraph isomorphism) problem is NP-complete [23], at present only approximate algorithms for problems of nontrivial size are practicable.
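For very small graphs, the minimization (16) could in principle be illustrated by exhaustive search over vertex relabellings, as in the following Python sketch (toy data and hypothetical function names; the paper's actual procedure is the approximate optimal graph monomorphism algorithm of [24], not this brute force).

from itertools import permutations
from math import log2
from collections import Counter

def ensemble_entropy(assignments):
    """Entropy of a random graph synthesized from an ensemble, in the spirit of eq. (11):
    sum of label entropies over every vertex/arc slot.  `assignments` is a list of dicts
    mapping slot -> label; a missing slot counts as the null outcome."""
    slots = {s for a in assignments for s in a}
    n = len(assignments)
    total = 0.0
    for s in slots:
        counts = Counter(a.get(s) for a in assignments)
        total += -sum((c / n) * log2(c / n) for c in counts.values())
    return total

def best_relabelling(fixed, candidate):
    """Try every permutation of the candidate graph's vertex slots (a toy stand-in for
    choosing the monomorphism) and keep the one minimizing the synthesized entropy."""
    slots = sorted({s for s in candidate if not isinstance(s, tuple)})
    best = None
    for perm in permutations(slots):
        rename = dict(zip(slots, perm))
        relabelled = {((rename.get(k[0], k[0]), rename.get(k[1], k[1]))
                       if isinstance(k, tuple) else rename[k]): v
                      for k, v in candidate.items()}
        h = ensemble_entropy(fixed + [relabelled])
        if best is None or h < best[0]:
            best = (h, relabelled)
    return best

# Slots are vertex names or (tail, head) pairs; labels are object types or signs.
g1 = {"x": "box", "y": "box", ("x", "y"): "+"}
g2 = {"x": "box", "y": "ball", ("y", "x"): "+"}   # hypothetical second observation
print(best_relabelling([g1], g2))                 # the swapped relabelling gives lower entropy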

The problem defined in (16) was approximately solved for some of the signed digraphs. The results indicate that the construction of the random graph carried out previously is consistent with the criterion of minimum entropy for the detection of similarities among signed digraphs.

VII. CONCLUSION

In the example of application the synthesis of the random graph was straightforward since every vertex of a signed digraph represented a unique variable. In applications such as scene analysis, however, the vertices of a labeled graph will not generally represent unique objects. For instance, consider a scene composed of objects some of which are identical. Then a labeled graph describing the objects and their spatial arrangement would contain vertices and arcs which could not be interpreted unambiguously as outcomes of random vertices and random arcs. In such cases, the objective of minimizing random graph entropy can remove part of the ambiguity. This objective is partially achieved by solving the optimal graph monomorphism problem (16) for each of the outcome graphs until no further reduction of entropy is possible. The final entropy and typicality measures provide a facility for structural or contextual classification of the labeled graphs in the ensemble.

REFERENCES

[1] F. Hayes-Roth, "Uniform representation of structured patterns and an algorithm for the induction of contingency-response rules," Inform. Contr., vol. 33, pp. 87-116, 1977.
[2] M. A. Fischler and R. A. Elschlager, "The representation and matching of pictorial structures," IEEE Trans. Comput., vol. C-22, pp. 67-92, Jan. 1973.
[3] U. Montanari, "Optimization methods in image processing," in Proc. IFIP Cong. 74. Amsterdam, The Netherlands: North-Holland, July 1974, pp. 727-732.
[4] C. T. Zahn, Jr., "An algorithm for noisy template matching," in Proc. IFIP Cong. 74. Amsterdam, The Netherlands: North-Holland, July 1974, pp. 698-701.
[5] H. G. Barrow et al., "Some techniques for recognizing structures in pictures," in Frontiers of Pattern Recognition, S. Watanabe, Ed. New York: Academic, 1972, pp. 1-29.
[6] E. L. Morofsky and A. K. C. Wong, "Computer perception of complex patterns," in Proc. 2nd Int. Joint Conf. Artificial Intelligence, Imperial College, London, Sept. 1971, pp. 248-257.
[7] T. Pavlidis, "Representation of figures by labeled graphs," Pattern Recognition, vol. 4, pp. 5-17, Jan. 1972.
[8] A. C. Shaw, "Parsing of graph-representable pictures," J. Ass. Comput. Mach., vol. 17, pp. 453-481, July 1970.
[9] M. Maruyama, "The second cybernetics: Deviation-amplifying mutual causal processes," Amer. Scientist, vol. 51, pp. 164-179, June 1963.
[10] F. S. Roberts, "Signed digraphs and the growing demand for energy," Environment and Planning, vol. 3, pp. 395-410, 1971.
[11] F. S. Roberts, "Building and analyzing an energy demand signed digraph," Environment and Planning, vol. 5, pp. 199-221, 1973.
[12] C. W. Marshall, Applied Graph Theory. New York: Wiley, 1971.
[13] P. Erdős and J. Spencer, Probabilistic Methods in Combinatorics. New York: Academic, 1974.
[14] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[15] K. S. Fu, Syntactic Methods in Pattern Recognition. New York: Academic, 1974.
[16] K. S. Fu, Ed., Syntactic Pattern Recognition, Applications. Berlin: Springer-Verlag, 1977.
[17] K. S. Fu and T. L. Booth, "Grammatical inference: Introduction and survey-Part II," IEEE Trans. Syst., Man, Cybern., vol. SMC-5, pp. 409-423, July 1975.
[18] S. Soule, "Entropies of probabilistic grammars," Inform. Contr., vol. 25, pp. 57-74, 1974.
[19] D. E. Ghahraman, "Graph morphism and its engineering applications," Ph.D. dissertation, Carnegie-Mellon Univ., Pittsburgh, PA, Aug. 1976.
[20] J. Aczél and Z. Daróczy, On Measures of Information and Their Characterization. New York: Academic, 1975.
[21] A. K. C. Wong and T. S. Liu, "Typicality, diversity and feature patterns of an ensemble," IEEE Trans. Comput., vol. C-24, pp. 158-181, Feb. 1975.
[22] T. A. Brown et al., "Pulse processes on signed digraphs: A tool for analyzing energy demand," The Rand Corp., Santa Monica, CA, Rep. R-926-NSF, 1972.
[23] A. V. Aho et al., The Design and Analysis of Computer Algorithms. Reading, MA: Addison-Wesley, 1976.
[24] D. E. Ghahraman, A. K. C. Wong, and T. Au, "Graph optimal monomorphism algorithm," IEEE Trans. Syst., Man, Cybern., vol. SMC-10, pp. 181-188, Apr. 1980.

Andrew K. C. Wong (M'79) received the Ph.D. degree from Carnegie-Mellon University, Pittsburgh, PA, in 1968. He is currently a Professor of Systems Design at the University of Waterloo, Waterloo, Ont., Canada. Prior to this he was an Associate Professor in the Biomedical Engineering Program of Carnegie-Mellon University. He is the author and coauthor of many papers concerning pattern, image, and scene analysis; synthesis and analysis of systems and relational structures; biomedical signal analysis; computer-aided clinical diagnosis and prognosis; biomedical and health care systems; and applications of information theory and complex information processing to molecular biology and genetics. He is currently an Associate Editor of the Journal of Computers in Biology and Medicine.

David E. Ghahraman (M'77) received the B.S.E.E., M.S.E.E., and Ph.D. degrees in engineering from Carnegie-Mellon University, Pittsburgh, PA. From September 1976 to October 1979, he was a Research Associate in the Transportation Research Institute, Carnegie-Mellon University. Since November 1979, he has been with Megatest Corporation, Santa Clara, CA, as a Software Engineer. His fields of interest include pattern recognition, transportation planning, minicomputer hardware, and software design and application. Dr. Ghahraman is a member of the Association for Computing Machinery and Sigma Xi.
