A comparative study of neural networks for artificial noses


A comparative study of neural networks for artificial noses

A. A. Ferreira1, T. B. Ludermir1, R. R. B. de Aquino2
1 Center of Informatics – Federal University of Pernambuco, P. O. Box 7851, Cidade Universitária, Recife-PE, Brazil, 50.732-970, {aaf, tbl}@cin.ufpe.br
2 Department of Electrical Engineering and Power Systems – DEESP, Federal University of Pernambuco, [email protected]

Abstract - Artificial neural networks have been used to classify odor patterns and have shown promising results. In this paper we present four different neural network models for implementing the pattern recognition system of artificial noses. The models investigated are the multi-layer perceptron, two different implementations of radial basis function networks and the probabilistic neural network. All the models were tested with and without temporal processing. A complex database with nine different classes was used in this work.

1. INTRODUCTION

An artificial nose is a modular system consisting of two main parts: a sensor system, made up of elements that detect odors, and a pattern recognition system that classifies the detected odors. Artificial neural networks have been used as the pattern recognition system and have shown good results. Since the 1980s, research on artificial noses, which detect and classify odors, vapors and gases automatically, has advanced significantly. These pieces of equipment can be used for monitoring the environment in order to control air quality, in the health field to help diagnose diseases, and in the food, drink and cosmetics industries to control quality and monitor production processes. Different artificial neural network architectures have been used in the pattern recognition systems of artificial noses. The Multi-Layer Perceptron (MLP), Radial Basis Function (RBF) and Time Delay Neural Network (TDNN) architectures have been used to recognize different vintages of the same red wine in previous works [1-3,9] and have achieved promising results. Artificial neural networks have also been used successfully to classify petroleum-derived odors [3,10,11]. The main objective of this work is to develop a systematic study of different artificial neural network architectures for building pattern recognition systems for artificial noses, using a new and more complex database.

Four different artificial neural network architectures were selected for this work. The MLP, RBF and TDNN architectures were selected because they had been used successfully in previous works. The Probabilistic Neural Network (PNN) architecture was chosen because it is well suited to classification problems and because it had never before been used to build pattern recognition systems for artificial noses.

2. DESCRIPTION OF THE PROBLEM AND THE DATA

The problem dealt with in this work consists of classifying nine different samples of English Candle turpentine (reference sample, croqueo naphtha with 100, 500 and 1000 ppm of contamination, craqueo naphtha with 100, 500 and 1000 ppm of contamination, diesel oil and TBQ46) made available by the PETROBRAS Gabriel Passos Refinery. The data were obtained by the AROMA project team using an artificial nose prototype [3]. This prototype is composed of eight conducting polymer sensors prepared with different dopants. Up to eight acquisitions were made for each type of turpentine. In each acquisition the sensor resistance values were recorded every 20 seconds; as each acquisition lasted about 10 minutes, each sensor produced an average of 30 values per turpentine type per acquisition. With the objective of using balanced information for each turpentine type (class), acquisitions 5 and 6 of the croqueo samples with contamination levels of 500 ppm and 1000 ppm were replicated until they reached the same number of acquisitions as the other classes. For the same reason, the values of acquisition 2 of the croqueo sample with contamination level of 1000 ppm and of acquisition 2 of the TBQ46 turpentine were replicated until there was a total of 30 values. The set formed by the values of the eight sensors in the same time interval was considered one database pattern. Thus, the database has 2160 patterns, corresponding to 30 patterns in eight acquisitions for each of the nine turpentine types.
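To make the data layout concrete, the sketch below (Python/NumPy, with hypothetical array names and random placeholder values, since the original acquisition files are not distributed with this paper) shows how the readings of the eight sensors would be stacked into a 2160 x 8 pattern matrix with one class label per pattern.

```python
import numpy as np

N_SENSORS, N_CLASSES, N_ACQ, N_SAMPLES = 8, 9, 8, 30

# acquisitions[c][a] is assumed to be a (30, 8) array of sensor resistances
# recorded every 20 s during acquisition a of class c (hypothetical structure).
rng = np.random.default_rng(0)
acquisitions = [[rng.random((N_SAMPLES, N_SENSORS)) for _ in range(N_ACQ)]
                for _ in range(N_CLASSES)]

patterns, labels = [], []
for c in range(N_CLASSES):
    for a in range(N_ACQ):
        patterns.append(acquisitions[c][a])       # 30 patterns of 8 sensor values
        labels.append(np.full(N_SAMPLES, c))
X = np.vstack(patterns)     # shape (2160, 8): 9 classes x 8 acquisitions x 30 samples
y = np.concatenate(labels)  # shape (2160,)
print(X.shape, y.shape)
```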

To create the networks with temporal processing (TDNN), a new database was created from the same original data, in which each pattern represents the resistance of the eight sensors at a time instant (t) together with the resistance of the same eight sensors to the same substance at the next instant (t+1). In order to obtain a classification error estimate closer to the true error, 10-fold cross-validation with stratification was chosen to create the training and test sets; this method has become a standard in practical terms [5]. The patterns were therefore divided into 10 independent, stratified portions (the same number of patterns of each class in each portion), each containing 10% of the data. In each experiment one portion was used to test the network and the nine remaining portions were used to train it. The portions created for the data without temporal processing have 24 patterns of each class, for a total of 216 patterns per portion. The portions created with temporal processing have 23 patterns of each class, for a total of 207 patterns per portion.
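A minimal sketch of the stratified 10-fold split described above, using scikit-learn's StratifiedKFold (an assumption; the paper does not state how the folds were generated beyond the stratification requirement):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# X: (2160, 8) pattern matrix, y: (2160,) class labels, as built above
# (placeholder data here).
X = np.random.rand(2160, 8)
y = np.repeat(np.arange(9), 240)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # Each test fold holds 10% of the data with the classes equally represented
    # (24 patterns per class, 216 patterns in total, as in the paper).
    assert len(test_idx) == 216
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
```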

3. EXPERIMENTS

All experiments were carried out in Matlab, version 6.0.0.88, release 12. All the networks created without temporal processing have eight units in the input layer, one for each sensor. The network output uses a 1-of-m coding, with one unit for each turpentine type, so the networks have nine units in the output layer. All the networks created with temporal processing have sixteen units in the input layer: eight to represent the eight sensors at time t and eight to represent the same sensors at time t+1. The output is also represented by the 1-of-m coding, so these networks also have nine units in the output layer.
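A small sketch of the 1-of-m output coding (one output unit per turpentine type), assuming integer class labels 0 to 8:

```python
import numpy as np

def one_of_m(labels, m=9):
    """Encode integer class labels as 1-of-m target vectors."""
    targets = np.zeros((len(labels), m))
    targets[np.arange(len(labels)), labels] = 1.0
    return targets

y = np.array([0, 3, 8])
print(one_of_m(y))   # each row has a single 1 in the column of its class
```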

4. EXPERIMENTS WITH RBF NETWORKS

In these experiments, two different training algorithms for RBF networks were investigated. The first algorithm is executed when the Matlab newrb command is used; it creates nodes in the hidden layer until the expected Goal parameter is met or the maximum number of hidden nodes is reached. The second algorithm is executed when the Matlab newrbe command is used; it creates as many nodes in the hidden layer as there are patterns in the training set. All the RBF networks created in this work have only one hidden layer. The hidden layer nodes use the Gaussian activation function and the output layer nodes use the linear activation function.
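For reference, a Gaussian hidden node responds to an input vector according to its distance from the node's centre, and the linear output layer combines these responses; in a common formulation (with a generic width sigma, whose role is played here by the Spread parameter):

```latex
% phi_j : response of Gaussian hidden node j with centre c_j and width sigma
% y_k   : k-th linear output unit with weights w_{kj} and bias b_k
\[
  \phi_j(\mathbf{x}) = \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{c}_j \rVert^2}{2\sigma^2}\right),
  \qquad
  y_k(\mathbf{x}) = \sum_j w_{kj}\,\phi_j(\mathbf{x}) + b_k
\]
```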

4.1. RBF Networks Created with NEWRB

The training algorithm of these networks works as follows: the network is simulated, the training pattern with the highest error is identified, a new node is added to the hidden layer for this pattern, and the output layer weights are adjusted to minimize the error. These steps are repeated until the network error falls below the value passed as the Goal parameter or until the maximum number of hidden nodes is reached. Different Goal values were investigated; the value 200 allowed the creation of reasonably small RBF networks with good classification results. For this Goal value, five different values of the radial basis function width (Spread parameter) of the hidden layer nodes were investigated. Table 1 shows the results obtained.

Table 1. Results of the RBF networks created with the newrb command.
Spread   Nodes (Mean / Std)   Train %Classif. Error (Mean / Std)   Test %Classif. Error (Mean / Std)
0.05     104.30 / 5.83        3.11 / 0.51                          3.70 / 1.20
0.08     122.40 / 5.23        2.68 / 0.36                          4.07 / 1.81
0.10     130.20 / 3.39        2.53 / 0.41                          3.56 / 1.43
0.50     168.60 / 4.12        1.65 / 0.11                          2.69 / 0.78
1.00     443.40 / 528.51      2.70 / 3.41                          3.84 / 3.63
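The following is a rough Python/NumPy sketch of the incremental procedure described above (simulate the network, add a hidden node centred on the worst-fitted pattern, re-fit the linear output weights by least squares). It only illustrates the idea; it is not the actual newrb implementation, and the stopping values used here are placeholders.

```python
import numpy as np

def gaussian(X, centers, spread):
    # Pairwise squared distances between patterns and centres.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * spread ** 2))

def incremental_rbf(X, T, spread=0.5, goal=200.0, max_nodes=500):
    """Greedy RBF construction loosely following the newrb idea.
    X: (n, d) training patterns, T: (n, m) 1-of-m targets."""
    centers = X[[0]]                                   # start with one node
    while True:
        Phi = np.hstack([gaussian(X, centers, spread), np.ones((len(X), 1))])
        W, *_ = np.linalg.lstsq(Phi, T, rcond=None)    # linear output weights
        E = T - Phi @ W
        sse = float((E ** 2).sum())
        if sse <= goal or len(centers) >= max_nodes:
            return centers, W, sse
        worst = np.argmax((E ** 2).sum(axis=1))        # worst-fitted pattern
        centers = np.vstack([centers, X[worst]])       # add a node centred on it
```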

4.2. RBF Networks Created with NEWRBE

The training algorithm of these networks creates as many nodes in the hidden layer as there are patterns in the training set and sets the weights of this layer's nodes equal to the transposed input vectors. The hidden layer thus consists of nodes with radial basis activation functions, each node acting as a detector for a different input vector: if there are n input vectors, there will be n nodes in the hidden layer. Different values of the radial basis function width (Spread parameter) were investigated in order to reduce the classification error. Table 2 shows the results.

Table 2. Results of the RBF networks created with the newrbe command.
Spread   Nodes   Train %Classif. Error (Mean / Std)   Test %Classif. Error (Mean / Std)
0.0500   1944    0.12 / 0.04                          1.39 / 0.72
0.0800   1944    0.16 / 0.04                          1.16 / 0.63
0.1000   1944    0.21 / 0.07                          1.16 / 0.76
0.5000   1944    2.35 / 0.46                          3.38 / 1.16
1.0000   1944    13.40 / 4.17                         15.56 / 5.31
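A corresponding sketch of the second (newrbe-style) approach: one Gaussian hidden node is centred on every training pattern and only the linear output weights are computed. Again this is an illustrative reimplementation, not the toolbox code.

```python
import numpy as np

def fit_exact_rbf(X_train, T_train, spread):
    """One hidden node per training pattern; solve the output layer directly.
    X_train: (n, d) patterns, T_train: (n, m) 1-of-m targets."""
    d2 = ((X_train[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / (2.0 * spread ** 2))            # (n, n) design matrix
    W, *_ = np.linalg.lstsq(Phi, T_train, rcond=None)  # linear output weights
    return X_train, W

def predict_rbf(centers, W, X, spread):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * spread ** 2)) @ W       # 1-of-m scores; take argmax
```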

5. EXPERIMENTS WITH PNN NETWORKS

The PNN networks implement the Bayesian decision strategy to classify input vectors [4]. A PNN is divided into four layers: the input unit layer, the pattern unit layer, the summation unit layer and the output unit layer. As in MLP networks, the input layer units do not perform any operation on the input vectors. The pattern layer units represent each training set pattern and use a radial basis activation function. The pattern layer performs a non-linear transformation from the input space to the hidden space; in most applications the hidden space has high dimensionality, which turns a set of patterns that is not linearly separable into a linearly separable one. The inputs of the summation layer come from the pattern layer units of the corresponding class. The output layer units produce a binary output, which is equal to 1 in only one of the units and 0 in the others [4]. The network is trained by creating one pattern layer unit for each training set pattern and making the weight vector of this unit equal to the corresponding training pattern. After that, the pattern layer outputs are connected to the summation layer unit of the corresponding class. The network operates by summing the outputs of the pattern layer nodes, which, due to the use of the Gaussian activation function, group the input patterns into classes. PNN networks thus estimate the a posteriori probability of each class at the point defined by the input pattern. This approach is useful in classification applications where one wishes to estimate the probability that a new pattern belongs to a predefined class. With the objective of reducing the classification error of the PNN networks, different radial basis function width values (Spread parameter) were investigated. Table 3 shows the results obtained for each Spread value.

Table 3. Results of the PNN networks.
Spread   Nodes   Train %Classif. Error (Mean / Std)   Test %Classif. Error (Mean / Std)
0.0001   1944    0.09 / 0.02                          24.86 / 2.94
0.0005   1944    0.09 / 0.02                          1.16 / 0.59
0.0050   1944    1.54 / 0.17                          2.69 / 1.21
0.0500   1944    31.83 / 1.37                         35.83 / 3.86
0.1000   1944    44.69 / 2.42                         48.84 / 2.84
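Because the PNN plays a central role in this comparison, the following minimal sketch (Python/NumPy, an illustrative reimplementation rather than the Matlab command used in the experiments) shows the operation described above: one Gaussian pattern unit per training example, class-wise summation, and a winner-takes-all output.

```python
import numpy as np

class SimplePNN:
    """Minimal probabilistic neural network with Gaussian pattern units."""

    def __init__(self, spread=0.1):
        self.spread = spread

    def fit(self, X, y):
        self.X, self.y = X, y               # one pattern unit per training example
        self.classes = np.unique(y)
        return self

    def predict(self, X_new):
        d2 = ((X_new[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=2)
        k = np.exp(-d2 / (2.0 * self.spread ** 2))      # pattern layer activations
        # Summation layer: add the activations of the units of each class.
        scores = np.stack([k[:, self.y == c].sum(axis=1) for c in self.classes],
                          axis=1)
        return self.classes[np.argmax(scores, axis=1)]  # winner-takes-all output
```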

6. EXPERIMENTS WITH MLP NETWORKS

The MLP networks were created with only one hidden layer and were trained with the Levenberg-Marquardt algorithm described in [7]. The hidden layer nodes and the output layer nodes use the logistic sigmoid activation function. Initially, ten experiments were carried out with random weight initialization using 8, 10, 12, 14, 16, 18, 20, 22, 24 and 26 nodes in the hidden layer in order to select the best architecture for the problem. The architecture with 16 nodes in the hidden layer was chosen because it presented the smallest classification error on the validation set. The maximum number of iterations defined for all trainings was 2500. Training stops if the early stopping criterion [6] implemented by Matlab occurs 5 consecutive times, if the maximum number of iterations is reached, or if the error on the training set is equal to zero. For each training, validation and test set formed from the portions created by the cross-validation method, 10 networks were created. Table 4 describes how the training, validation and test sets were formed and identifies the networks that used the respective sets. The mean over the 10 networks created with each set is taken as the MLP result, and these results are reported in Table 5.

Table 4. Formation of the sets.
Exp.  Networks    Train set             Validation set   Test set
1     1 to 10     5, 6, 7, 8, 9, 10     2, 3, 4          1
2     11 to 20    1, 6, 7, 8, 9, 10     3, 4, 5          2
3     21 to 30    1, 2, 7, 8, 9, 10     4, 5, 6          3
4     31 to 40    1, 2, 3, 8, 9, 10     5, 6, 7          4
5     41 to 50    1, 2, 3, 4, 9, 10     6, 7, 8          5
6     51 to 60    1, 2, 3, 4, 5, 10     7, 8, 9          6
7     61 to 70    1, 2, 3, 4, 5, 6      8, 9, 10         7
8     71 to 80    2, 3, 4, 5, 6, 7      9, 10, 1         8
9     81 to 90    3, 4, 5, 6, 7, 8      1, 2, 10         9
10    91 to 100   4, 5, 6, 7, 8, 9      1, 2, 3          10

Table 5. Results of the MLP networks.
Exp.  Train %Classif. Error (Mean / Std)   Validation %Classif. Error (Mean / Std)   Test %Classif. Error (Mean / Std)
1     46.84 / 33.85                        47.50 / 32.64                             47.82 / 33.37
2     13.29 / 19.83                        14.81 / 19.25                             13.94 / 19.80
3     32.69 / 31.50                        34.01 / 30.66                             33.38 / 31.10
4     25.85 / 27.80                        27.05 / 27.37                             27.59 / 26.91
5     28.99 / 28.35                        31.22 / 27.56                             30.93 / 27.51
6     42.62 / 34.77                        43.60 / 33.98                             43.47 / 34.20
7     29.52 / 25.95                        31.37 / 24.93                             31.02 / 25.87
8     45.44 / 34.58                        47.10 / 33.58                             47.18 / 33.85
9     18.20 / 22.52                        19.58 / 21.51                             20.00 / 21.56
10    28.39 / 33.51                        29.35 / 32.82                             30.19 / 32.50
Mean  31.18                                32.56                                     32.55
Std   11.13                                10.98                                     11.09
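The architecture selection procedure described above can be illustrated with the sketch below. It is only an approximation of the procedure: scikit-learn's MLPClassifier is used in place of Matlab's Levenberg-Marquardt training (which scikit-learn does not offer), the early stopping is the library's own mechanism, and the data arrays are placeholders for the cross-validation folds.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X_train, y_train = rng.random((216, 8)), rng.integers(0, 9, 216)   # placeholder folds
X_val, y_val = rng.random((216, 8)), rng.integers(0, 9, 216)

hidden_sizes = [8, 10, 12, 14, 16, 18, 20, 22, 24, 26]
best = None
for h in hidden_sizes:
    errors = []
    for seed in range(10):                      # 10 random weight initializations
        net = MLPClassifier(hidden_layer_sizes=(h,), activation='logistic',
                            max_iter=2500, early_stopping=True,
                            n_iter_no_change=5, random_state=seed)
        net.fit(X_train, y_train)
        errors.append(1.0 - accuracy_score(y_val, net.predict(X_val)))
    mean_err = sum(errors) / len(errors)
    if best is None or mean_err < best[1]:
        best = (h, mean_err)                    # keep the smallest mean validation error
print("chosen hidden layer size:", best[0])
```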


7. EXPERIMENTS WITH TEMPORAL PROCESSING

The networks with temporal processing (TDNN) were created by including the temporal information in the input data. In this way it was possible to create TDNN networks with the same architectures used for the networks without temporal processing; TDNN networks were created with the RBF, PNN and MLP architectures.
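A short sketch of the time-window construction described above: each windowed pattern concatenates the eight sensor readings at time t with the readings at t+1, taken within the same acquisition (the array name and placeholder values are hypothetical).

```python
import numpy as np

def make_time_windows(acquisition):
    """acquisition: (n_samples, 8) readings of one substance over time.
    Returns (n_samples - 1, 16) patterns pairing each instant t with t+1."""
    return np.hstack([acquisition[:-1], acquisition[1:]])

acq = np.random.rand(30, 8)          # one acquisition: 30 readings of 8 sensors
windows = make_time_windows(acq)
print(windows.shape)                 # (29, 16): 16 input units for the TDNN networks
```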

7.1. TDNN Networks with RBF

The TDNN networks with RBF were created with the Matlab newrbe command. The same training algorithm and the same parameters described in Section 4.2 were used to create these networks. All the networks created in this phase have 16 nodes in the input layer, 1863 nodes in the hidden layer and 9 nodes in the output layer. The hidden layer nodes use the Gaussian activation function and the output layer nodes use the linear activation function. Table 6 describes the results obtained for these networks.

Table 6. Results of the TDNN networks with RBF.
Spread   Nodes   Train %Classif. Error (Mean / Std)   Test %Classif. Error (Mean / Std)
0.0500   1863    0.05 / 0.02                          7.44 / 3.97
0.0800   1863    0.05 / 0.02                          2.51 / 1.13
0.1000   1863    0.05 / 0.02                          2.61 / 1.07
0.5000   1863    1.01 / 0.20                          2.37 / 1.15
1.0000   1863    7.08 / 2.20                          9.76 / 3.94

7.2. TDNN Networks with PNN

The TDNN networks with PNN were created using the same training and the same parameters described in Section 5. The networks created in this phase have 16 nodes in the input layer, 1863 nodes in the hidden layer and 9 nodes in the output layer. Table 7 describes the results obtained.

Table 7. Results of the TDNN networks with PNN.
Spread   Nodes   Train %Classif. Error (Mean / Std)   Test %Classif. Error (Mean / Std)
0.0500   1863    0.05 / 0.02                          50.43 / 2.58
0.0800   1863    0.05 / 0.02                          4.30 / 1.26
0.1000   1863    0.48 / 0.09                          1.74 / 0.92
0.5000   1863    28.52 / 1.29                         30.05 / 3.57
1.0000   1863    37.27 / 1.17                         40.77 / 3.93

7.3. TDNN Networks with MLP

The TDNN networks with MLP were created with only one hidden layer. The hidden layer nodes and the output layer nodes use the logistic sigmoid activation function. The same training algorithm and the same parameters described in Section 6 were used to create these networks. Initially, ten experiments with random weight initialization were carried out using 8, 10, 12, 14, 16, 18, 20, 22, 24 and 26 nodes in the hidden layer in order to choose the best architecture for the problem. The architecture with 26 nodes in the hidden layer was chosen because it presented the smallest classification error on the validation set. Table 8 presents the results obtained with 26 nodes in the hidden layer. The training, validation and test sets were formed as described in Table 4.

Table 8. Results of the TDNN networks with MLP.
Exp.  Train %Classif. Error (Mean / Std)   Validation %Classif. Error (Mean / Std)   Test %Classif. Error (Mean / Std)
1     31.49 / 40.13                        32.91 / 39.18                             32.56 / 39.63
2     41.63 / 40.14                        42.58 / 39.01                             43.19 / 39.59
3     33.27 / 33.39                        34.64 / 32.88                             34.54 / 32.79
4     32.82 / 40.11                        34.12 / 39.27                             33.91 / 39.44
5     27.10 / 32.13                        28.10 / 31.46                             29.52 / 31.41
6     31.05 / 36.52                        32.40 / 35.69                             32.17 / 35.81
7     15.25 / 17.18                        17.71 / 17.50                             17.83 / 17.77
8     30.95 / 36.43                        32.13 / 35.90                             32.90 / 35.72
9     13.32 / 14.74                        14.72 / 14.66                             13.91 / 13.83
10    31.23 / 41.25                        32.30 / 40.38                             32.90 / 39.81
Mean  28.81                                30.16                                     30.34
Std   8.50                                 8.22                                      8.46

8. COMPARING RESULTS

The objective of this section is to present a comparative study of the different neural network models created in this work and to discuss the advantages and disadvantages of each model. The study is based on the classification performance of each model. Table 9 identifies each topology and the mean classification error obtained on the training, validation (where applicable) and test sets.

Table 9. Results of the best topologies.
Topology           Train %Classif. Error   Validation %Classif. Error   Test %Classif. Error
PNN                0.09                    -                            1.16
RBF with newrbe    0.12                    -                            1.39
TDNN with PNN      0.48                    -                            1.74
TDNN with RBF      0.05                    -                            2.61
RBF with newrb     1.65                    -                            2.69
TDNN with MLP      28.81                   30.16                        30.34
MLP                31.18                   32.56                        32.55



The PNN networks presented a mean classification error of 1.16% on the test set. This was the smallest error obtained among all the systems created; however, even with the smallest mean classification error, the hypothesis tests (t-test) [8] showed that the classifiers created with PNN are not better than the classifiers created with the RBF architecture and the Matlab newrbe command at a significance level of 5%. Besides presenting the smallest mean error on the test set, the PNN networks had the advantages of the shortest training time and a reduced number of parameters to be investigated. On the other hand, the PNN networks have the disadvantage of requiring one hidden layer node for each input pattern. This disadvantage can become a problem if the created system has to run on equipment with limited computational resources.

The RBF networks were trained in two different ways. The first was incremental training. Its advantage over the second RBF training option was the creation of networks with a relatively small number of nodes in the hidden layer. However, these networks had the disadvantage of a long training time, because the hidden layer nodes are included one by one until the stopping criterion is reached. The mean classification error on the test set was 2.69% and the mean number of hidden layer nodes was 168.6, which is much smaller than the number of hidden layer nodes of the PNN networks. The performance of these networks was not superior to that of the PNN networks or of the RBF networks created with the Matlab newrbe command. The second way of training RBF networks was through the Matlab newrbe command. This training creates a hidden layer node for each training set pattern. The RBF newrbe networks were equivalent to the PNN networks; in fact, the same advantages and disadvantages observed for the PNN networks were observed here. The mean classification error on the test set was 1.39%, and the number of hidden layer nodes was equal to that of the PNN networks.

The performance of the MLP networks was not superior to that of any other network. The main advantage of the MLP networks is that the created systems had the smallest number of nodes in the hidden layer, but the number of parameters that can be investigated and the training time are disadvantages when compared with the RBF or PNN networks.

The TDNN networks were created using the time window artifice. As the temporal information is in the input data and not in the network architecture, it was possible to investigate three different architectures to create TDNN networks. The first architecture used to create TDNN networks was the PNN architecture. The inclusion of temporal information did not improve the performance of the TDNN with PNN networks in relation to the PNN networks. A probable explanation is that, as the input data dimension doubled, more training patterns would be necessary, or some technique to reduce the input pattern dimensionality would have to be applied. The second architecture investigated was the RBF with the Matlab newrbe command; the results obtained did not exceed those obtained by the networks without temporal processing. The third architecture investigated was the MLP architecture. Like the MLP networks without temporal processing, the TDNN with MLP networks had the advantage of using a reduced number of hidden layer nodes, but the mean classification error on the test set was 30.34%. This error was smaller only than the mean error of the MLP networks; in other words, the temporal processing helped to improve the odor pattern classification performance of the MLP architecture, but this did not occur with the PNN and RBF networks. Although the TDNN with MLP networks presented a mean test set classification error smaller than that of the MLP networks, the hypothesis tests showed that this improvement is not enough to affirm that the TDNN with MLP classifiers are better than the MLP classifiers at a 5% significance level.
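The comparisons above rely on hypothesis tests at a 5% significance level. A simple way to reproduce this kind of check on the per-fold test errors of two classifiers is a paired t-test, sketched below with placeholder error values (the actual per-fold errors are not reproduced here):

```python
from scipy import stats

# Per-fold test classification errors (%) of two classifiers over the 10 folds
# (placeholder values for illustration only).
errors_pnn    = [1.0, 1.5, 0.9, 1.2, 1.4, 1.1, 0.8, 1.3, 1.0, 1.4]
errors_newrbe = [1.2, 1.6, 1.1, 1.5, 1.3, 1.4, 1.0, 1.5, 1.3, 1.6]

t_stat, p_value = stats.ttest_rel(errors_pnn, errors_newrbe)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("difference significant at the 5% level")
else:
    print("no significant difference at the 5% level")
```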

9. CONCLUSIONS AND FUTURE PERSPECTIVES

To the best of our knowledge, PNN networks had not previously been applied to the odor pattern recognition problem; considering also the simplicity and speed of their training, we decided to analyze the behavior of these networks for odor classification in artificial noses. Comparing the results obtained with the PNN networks against other neural network architectures is important because the MLP, RBF and TDNN architectures have been thoroughly investigated in odor pattern recognition problems and have shown good results [1-3,9]. The use of a new and more complex database was important to confirm the applicability of the MLP, RBF and TDNN architectures to odor pattern recognition problems. From the results obtained we can conclude that building odor pattern recognition systems for an artificial nose with PNN or RBF networks would allow the equipment to be updated quickly for new databases, since training these networks is almost immediate and few adjustable parameters need to be investigated. However, if the equipment has few computational resources, the MLP architecture is more suitable: it allows the creation of networks with fewer nodes in the hidden layer, although much more time is needed to train them. Future work could apply the PNN networks to new databases. The database prepared for the TDNN networks could also have its dimensionality reduced using, for example, the principal component analysis (PCA) statistical technique [8], in order to verify whether the TDNN results can be improved.

REFERENCES

[1] M.S. Santos et al., "Artificial Nose and Data Analysis Using Multi Layer Perceptron", in Data Mining, WIT Press, Computational Mechanics Publications, 1998.

[2] A. Yamazaki, T.B. Ludermir and M.C.P. de Souto, "Classification of vintages of wine by an artificial nose using time delay neural networks", IEE Electronics Letters, 22 November 2001, Vol. 37, No. 24, pp. 1466-1467.
[3] M.S. Santos, Construction of an Artificial Nose Using Neural Networks (in Portuguese), Ph.D. Thesis, Centre of Informatics, Federal University of Pernambuco, Brazil, 2000.
[4] D.F. Specht, "Probabilistic Neural Networks and the Polynomial Adaline as Complementary Techniques for Classification", IEEE Transactions on Neural Networks, 1990, Vol. 1, pp. 111-121.
[5] I.H. Witten and E. Frank (D.D. Cerra, Ed.), Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann Publishers, 2000, p. 126.
[6] L. Prechelt, Proben1 - A Set of Neural Network Benchmark Problems and Benchmarking Rules, Technical Report 21/94, Fakultät für Informatik, Universität Karlsruhe, Germany, 1994.
[7] R. Fletcher, Practical Methods of Optimization, Wiley, 1987.
[8] D. Hildebrand, Statistical Thinking for Behavioral Scientists, Duxbury Press, Boston, MA, 1986.
[9] A. Yamazaki and T.B. Ludermir, "Neural Network Training with Global Optimization Technique", International Journal of Intelligent Systems, Vol. 13, No. 2, pp. 77-86, 2003.
[10] C. Zanchettin, Hybrid Neural System for Pattern Recognition in Artificial Noses (in Portuguese), Master's Dissertation, Centre of Informatics, Federal University of Pernambuco, Brazil, 2004.
[11] A.A. Ferreira, Comparing Different Neural Network Topologies for Pattern Recognition in Artificial Noses (in Portuguese), Master's Dissertation, Centre of Informatics, Federal University of Pernambuco, Brazil, 2004.
