Assessing Artificial Neural Network Pruning Algorithms


In Proceedings of the 24th Annual Conference and Exhibition of the Remote Sensing Society, Greenwich, UK, pp. 603-609, 9-11 September 1998.

Taskin Kavzoglu and Paul M. Mather ([email protected])
Department of Geography, The University of Nottingham, NG7 2RD, Nottingham.

Abstract. In this study, three major pruning techniques are used to examine the effects of network pruning on classification accuracy. A feed-forward neural network, which learns the characteristics of the training data via the backpropagation learning algorithm, is employed to classify a multisource remotely sensed data set. An important conclusion is that, of the three techniques tested, optimal brain surgeon (OBS), the most sophisticated of the algorithms used, gave the best results. The results also show that pruning is effective: the pruned network is still capable of classifying the test data with high accuracy even after a considerable number of links have been removed. However, the positive trend in overall accuracy under pruning may not extend to individual class accuracies. To appreciate the effects of pruning we have used visualisation techniques, including 3-D visualisation and animation, and found them effective in providing insight into the process. In summary, this study shows that pruning algorithms are effective in reducing network size and producing a network with a greater capacity for generalisation; it also demonstrates the value of visualisation techniques in understanding the behaviour of artificial neural networks.

1. INTRODUCTION

Artificial Neural Networks (ANNs) are computer programs designed to mimic human learning processes by establishing linkages between input and output data. They have been developed and used in a wide range of fields, including medicine, business and computer processing. The reasons for their popularity rest on unique advantages such as their nonparametric nature, arbitrary decision boundary capabilities and easy adaptation to different types of data. In parallel with these developments, ANN techniques have been used intensively, and found to perform well, in the classification of remotely sensed data. They have been applied to many problems, such as land-cover classification, geological mapping, urban area classification and multidate image classification. Despite their advantages, ANNs have some drawbacks, related in particular to their long training times and to the difficulty of determining the most efficient network structure. Small networks generally lack the ability to learn the characteristics of the training data, whereas large networks can learn those characteristics but thereby lose their generalisation capabilities. To overcome this problem, pruning algorithms are generally employed. Recent work on pruning techniques shows that they can increase the generalisation capabilities and improve the efficiency of artificial neural networks.

2. TEST SITE AND DATA

In this study, multisensor data comprising SIR-C and SPOT imagery were used for land-cover classification. The area under analysis is a 76.6 km² region of agricultural land located near Thetford, Norfolk, in the east of England. The quad-polarised L-band (~24 cm wavelength) SIR-C data were acquired by the NASA/JPL SIR-C (Spaceborne Imaging Radar-C) system on April 15, 1994. The SPOT image was acquired on May 13, 1994. Portions of the images 280 pixels by 475 pixels in size, covering the area of interest, were extracted and used in the subsequent stages. After some experimentation, it was decided to use seven land-cover classes covering the bulk of the study area: sugar beet, wheat, peas, forest, winter barley, potatoes and linseed. In addition, a mixed class including soil and triticale (a man-made hybrid of rye and durum wheat) was produced. Two fields for each class were then chosen for the training set, and a set of test areas drawn from 35 different fields was selected for use in the accuracy assessment stage.

3. ARTIFICIAL NEURAL NETWORK APPROACH

The powerful capabilities of the human brain for knowledge acquisition, recall, synthesis and problem solving have inspired scientists from different disciplines to attempt to model its operations. Based on the biological theory of the human brain, artificial neural networks are models that attempt to simulate its functionality and decision-making processes (Civco and Wang, 1994). In other words, ANNs are heuristic algorithms: they learn from experience through training samples and are then applied to recognise new data. The advantages of ANNs are their non-parametric nature, arbitrary decision boundary capabilities, easy adaptation to different types of data and input structures, fuzzy output values that can enhance classification decisions, and good generalisation capabilities. They can give considerably better results than conventional statistical classifiers for small training data sets (Foody, 1995; Hepner et al., 1990). Of these advantages the most important is their non-parametric nature: no underlying assumption is made about the distribution of the data. ANNs typically learn the characteristics of the training data in an iterative way, so they may be called data-dependent techniques, and the characteristics of the training data therefore strongly affect the accuracy of an ANN classification.

The power of a network depends on how well it generalises to new data after training; this is the main criterion for judging its performance. Generalisation is the ability of the neural network to interpolate and extrapolate data that it has not seen before (Atkinson and Tatnall, 1997). The size of the training data set, the training time and the architecture of the network are the major factors affecting the generalisation capabilities of ANNs. There are two approaches to determining the appropriate size of an ANN. The first is to start with a small network and iteratively increase the number of nodes in the hidden layer(s) until classification accuracy is satisfactory. This is problematic, as small networks are sensitive to initial conditions and learning parameters, and are also more likely to become trapped in a local minimum; consequently, a number of networks must be trained to find the optimum structure, which takes a long time. The second approach is to begin with a large network and make it smaller by iteratively eliminating nodes in the hidden layer(s) or interconnections between nodes. Algorithms of this type are called pruning algorithms, and are the subject of this paper.
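The second approach can be sketched as a generic prune-and-retrain loop. This is a minimal illustration, not the procedure used later in the paper: the "network" is reduced to a dictionary of named weights, `retrain` is a no-op stand-in for further backpropagation, and the error function is invented for the example.

```python
# Generic prune-and-retrain loop for the second approach above. The
# "network" is just a dict of named weights; saliency is taken to be the
# weight's magnitude, and retrain() is a no-op stand-in for further
# backpropagation.

def prune(weights, error, retrain, max_error):
    """Delete the least salient weight, retrain, and repeat until the
    training error would exceed max_error."""
    weights = dict(weights)
    while len(weights) > 1:
        victim = min(weights, key=lambda k: abs(weights[k]))
        trial = retrain({k: v for k, v in weights.items() if k != victim})
        if error(trial) > max_error:
            break                      # removing more links would hurt
        weights = trial
    return weights

# toy model: error grows as weight magnitude is pruned away
full = {"w1": 0.9, "w2": -0.05, "w3": 0.4, "w4": 0.01}
error = lambda w: 1.0 - sum(abs(v) for v in w.values()) / 1.36  # 1.36 = full net's magnitude
retrain = lambda w: w
pruned = prune(full, error, retrain, max_error=0.05)
# the two negligible links w2 and w4 are removed; w1 and w3 survive
```

The stopping test mirrors the idea in the text: pruning continues only while the (re-trained) network's error stays within an acceptable limit.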

4. PRUNING ALGORITHMS

Pruning is the name given to the process of examining a solution network, determining which units are not necessary to the solution, and removing those units (Sietsma and Dow, 1988). After a network has been trained to a desired solution, its units (hidden layer nodes or interconnections) are analysed to determine which are not contributing to the solution. Several approaches to identifying non-contributing units are described in the literature; a widely used one is to determine non-essential units through a form of sensitivity analysis. So far, few studies have analysed pruning algorithms for the classification of remotely sensed data. An important study is reported by Dreyer (1993), who simplified neural networks using the skeletonisation method, which removes the least important hidden neurons on the basis of first-order partial derivatives of the error function. He noted that, even where the improvement in accuracy is minor, optimisation with skeletonisation still results in greatly improved efficiency. In this study, pruning techniques based on removing the least effective interconnections in the network are considered. Three major techniques are used: magnitude based, optimal brain damage and optimal brain surgeon pruning.

4.1. Magnitude Based Pruning

The magnitude based pruning technique (MB) is the simplest pruning algorithm considered here. It is based on deleting interconnections with small 'saliency', i.e. those whose deletion will have the least effect on the training error. A straightforward approach is to assume that saliency corresponds to the magnitude of the interconnection weight: interconnections with small weights are assumed to have a minor effect on the performance of the network. After a reasonable initial training, the interconnection with the smallest weight magnitude is removed. The network is then retrained and the process is repeated iteratively until the training error reaches a certain limit.

4.2. Optimal Brain Damage

The optimal brain damage pruning algorithm (OBD), introduced by Le Cun, Denker and Solla (1990), is based on second-order derivatives of the error function. The aim is to iteratively delete the weights whose deletion will result in the least increase of the error in the network. An important problem in evaluating the resulting saliency formula is the size of the Hessian matrix. The calculation of the full Hessian is time-consuming, hence Le Cun et al. (1990) assume that the Hessian is diagonal. On the other hand, Hassibi and Stork (1993) argue that the Hessians of every problem they have considered are strongly non-diagonal, and that this leads OBD to eliminate the wrong weights.

4.3. Optimal Brain Surgeon

The optimal brain surgeon pruning algorithm (OBS), introduced by Hassibi and Stork (1993), is a more sophisticated form of OBD. Although OBS and OBD rest on the same theoretical approach, OBS makes no assumption about the form of the Hessian matrix, and is therefore both more complex and more robust than OBD. It is claimed that OBS is significantly better than the MB and OBD techniques, and that it permits the pruning of more weights than the other methods for the same error on the training set, thus yielding better generalisation on test data. The most important problem with OBS is that the inverse of the Hessian matrix has to be computed to judge the saliency and weight change for every link; the method is therefore quite slow and requires much more memory than the other methods.
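The three saliency measures can be put side by side. The expressions below follow the cited papers — MB ranks links by |w_q|, OBD by ½·h_qq·w_q² with a diagonal Hessian, and OBS by w_q²/(2·[H⁻¹]_qq) with the full Hessian — but the weight vector and Hessian values are invented for illustration.

```python
# Sketch of the three saliency measures for a network whose weights are
# flattened into a vector w, with an (illustrative) error Hessian H.

def mb_saliency(w):
    # magnitude based: rank links by |w_q|
    return [abs(wq) for wq in w]

def obd_saliency(w, H):
    # optimal brain damage: 0.5 * h_qq * w_q^2, Hessian assumed diagonal
    return [0.5 * H[q][q] * w[q] ** 2 for q in range(len(w))]

def obs_saliency(w, H_inv):
    # optimal brain surgeon: w_q^2 / (2 * [H^-1]_qq), full Hessian used
    return [w[q] ** 2 / (2 * H_inv[q][q]) for q in range(len(w))]

w = [0.5, -1.2]                      # two links, toy values
H = [[2.0, 0.8],
     [0.8, 1.0]]                     # strongly non-diagonal, as in Sec. 4.2
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
H_inv = [[ H[1][1] / det, -H[0][1] / det],
         [-H[1][0] / det,  H[0][0] / det]]

mb, obd, obs = mb_saliency(w), obd_saliency(w, H), obs_saliency(w, H_inv)
# each method deletes the link with the smallest saliency first
```

On this toy example all three measures agree that the first link is the least salient; OBD and OBS diverge precisely when the off-diagonal Hessian terms matter, which is the situation Hassibi and Stork describe.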

5. ANN CLASSIFICATION RESULTS

A 6-15-8 network structure was found to be appropriate for the classification and pruning tasks. The network was trained on the training data for 15,000 iterations using the backpropagation learning algorithm; this was found to be sufficient for the network to reach a local minimum. Each of the six input data bands was represented by an input node. The learning rate was set to 0.2 after some experimentation. As can be seen from Table 1, the ANN classification gave an overall accuracy of 79.71% and could not recognise 326 of the 5,835 test pixels. To address the central question of this study, the three pruning algorithms, magnitude based pruning (MB), optimal brain damage (OBD) and optimal brain surgeon (OBS), were applied to the 6-15-8 network, and the pruned networks were used in a feed-forward form to classify the test data.
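As a sanity check on the numbers that follow, a fully connected 6-15-8 network has 6 × 15 + 15 × 8 = 210 inter-node links (bias connections aside), which is the link count of the unpruned network in Table 2:

```python
# Link count of a fully connected feed-forward network: each pair of
# adjacent layers contributes (nodes in layer i) x (nodes in layer i+1)
# connections. Bias connections are not counted.

def link_count(layers):
    return sum(a * b for a, b in zip(layers, layers[1:]))

links = link_count([6, 15, 8])        # the 6-15-8 network used here
fraction_removed = 1 - 80 / links     # pruning down to 80 links
```

With `links == 210`, pruning the network down to 80 links removes about 62% of its connections, the figure quoted in the results below.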

Table 1. Accuracy Assessment of Neural Network Classification.

class         correct   total   unrecognised   accuracy (%)
sugar beet        825    1025            111           72.6
wheat             455     792             27           55.6
peas              654     693             51           87.9
forest           1170    1170              0          100.0
w. barley         421     430              4           97.0
potato            296     309             21           89.7
linseed           521     570             34           86.3
mixed             309     520             78           51.7
all classes      4651    5509            326    79.71 (overall)

('correct' is the number of correctly classified pixels; 'total' is the number of classified pixels per reference class; per-class accuracy is correct / (total + unrecognised).)
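The overall figure can be cross-checked from Table 1: summing the correctly classified pixels per class and dividing by all test pixels, including the unrecognised ones, reproduces the 79.71% quoted above.

```python
# Cross-check of the overall accuracy in Table 1: correctly classified
# pixels per class, divided by every test pixel (classified or not).

correct = {"sugar beet": 825, "wheat": 455, "peas": 654, "forest": 1170,
           "w. barley": 421, "potato": 296, "linseed": 521, "mixed": 309}
classified = 5509       # pixels assigned to one of the eight classes
unrecognised = 326      # pixels the network could not label

overall = sum(correct.values()) / (classified + unrecognised)
# 4651 / 5835 ≈ 0.7971, i.e. 79.71%
```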

6. RESULTS

The size of the network was reduced, and its generalisation capability improved, using three well-known pruning algorithms: magnitude based pruning (MB), optimal brain damage (OBD) and optimal brain surgeon (OBS). Some important conclusions can be drawn from the analyses performed in this study:

Table 2. Results of MB, OBD and OBS Pruning Techniques.

                 MB                  OBD                 OBS
Links   Overall   unr. pix.  Overall   unr. pix.  Overall   unr. pix.
 210     79.71       326      79.71       326      79.71       326
 200     81.13       282      80.93       248      80.98       281
 190     80.00       314      80.70       217      80.65       274
 180     80.10       261      80.86       217      80.51       264
 170     80.46       292      79.71       297      80.46       297
 160     80.33       293      79.35       263      80.70       274
 150     80.45       285      80.14       278      80.24       308
 140     79.54       347      79.23       378      80.29       275
 130     80.33       238      78.80       371      80.14       229
 120     80.05       255      79.55       358      80.33       246
 110     80.69       224      78.94       512      81.83       238
 100     76.85       606      74.52       628      78.58       380
  90     76.74       512      59.40      1512      78.66       398
  85     76.38       721      59.86      1297        -           -
  80     74.33       716        -           -      79.79       395
  70       -           -        -           -      60.00      1544

• In general, the pruning algorithms slightly improved the performance of the networks, although almost half of the links in the network were removed. It can be concluded that the links in the 6-15-8 network could, for the data sets used, be reduced by 60% with confidence. In fact, pruning not only improved the overall accuracy of the classification but also allowed the network to recognise more pixels. Moreover, the resulting networks classified faster than the initial network because of their smaller size.

• Of the three pruning techniques, optimal brain surgeon gave the best results and appears to be the most reliable. Even when the number of links was reduced from 210 to 80 (almost 62% of the links removed), the overall accuracy was almost the same and the number of unrecognised pixels only slightly higher than for the full network (Table 2).

• Contrary to the claims of Le Cun et al. (1990), the OBD algorithm did not give significantly better results than magnitude based pruning. On the contrary, MB surprisingly appeared to be more robust and reliable than OBD. However, OBD may give better results for more complicated data sets. Another possible reason for the weakness of OBD in this study is that the Hessian matrix was strongly non-diagonal, compromising one of the basic assumptions of OBD.

Figure 1. 3-D Visualisation of the Training Data.

• After removing a certain number of links, the pruning algorithms begin to remove the nodes and links representing one of the land-cover classes, so that the pixels belonging to that class are categorised as unrecognised. For MB, potato was removed at the 60-link solution; for OBD, peas was removed at the 85-link solution; and for OBS, forest was removed at the 70-link solution. Nevertheless, the networks generally gave almost the same individual accuracies for the other land-cover types after one of the classes had been removed.

• 3-D visualisation of the data helped in examining the characteristics of the training data (Figure 1), and 3-D animation of the training process helped in understanding the behaviour of ANNs. One important conclusion is that, in the training stage, the majority of the time is spent on the classification of mixed and noisy pixels: the network recognises pixels that are distinct from one another very quickly, but takes much longer to classify atypical pixels.

This work shows that the neural network classifier is a very useful tool and gives promising results for remotely sensed image classification. Pruning techniques are still at a developing stage, and new, more sophisticated techniques such as optimal brain surgeon seem likely to play an important role in the development of ANNs. Another important point is that visualisation of the training process helps the user to analyse the ANN classification step by step; this could also be useful in improving the accuracy of the classification. As artificial neural networks are relatively new techniques and their behaviour is still not fully understood, research in this area will play a crucial role in our understanding of ANNs.

Acknowledgments

Mr. Kavzoglu is supported by a grant from the Turkish Government. The Department of Geography, The University of Nottingham, provided computer facilities. The SIR-C SAR data were kindly made available by the NASA Jet Propulsion Laboratory, Pasadena, California, USA.

References

ATKINSON, P.M. and TATNALL, A.R.L., 1997, Neural networks in remote sensing. International Journal of Remote Sensing, 18, 699-709.

CIVCO, D.L. and WANG, Y., 1994, Classification of multispectral, multitemporal, multisource spatial data using artificial neural networks. ASPRS/ACSM. http://wwwsgi.ursus.maine.edu/gisweb/spatdb/acsm/ac94014.html

DREYER, P., 1993, Classification of land cover using optimized neural nets on SPOT data. Photogrammetric Engineering & Remote Sensing, 59, 617-621.

FOODY, G.M., 1995, Using prior knowledge in artificial neural network classification with a minimal training set. International Journal of Remote Sensing, 16, 301-312.

HASSIBI, B. and STORK, D.G., 1993, Second-order derivatives for network pruning: Optimal Brain Surgeon. In Advances in Neural Information Processing Systems 5, edited by S.J. Hanson, J.D. Cowan and C.L. Giles (San Mateo: Morgan Kaufmann), pp. 164-171.

HEPNER, G.F., LOGAN, T., RITTER, N. and BRYANT, N., 1990, Artificial neural network classification using a minimal training set: comparison to conventional supervised classification. Photogrammetric Engineering & Remote Sensing, 56, 469-473.

LE CUN, Y., DENKER, J.S. and SOLLA, S.A., 1990, Optimal brain damage. In Advances in Neural Information Processing Systems 2, edited by D.S. Touretzky (San Mateo: Morgan Kaufmann), pp. 598-605.

SIETSMA, J. and DOW, R.J.F., 1988, Neural net pruning: why and how. In Proceedings of the IEEE International Conference on Neural Networks, San Diego, 1, 325-333.
