Ensemble Strategies for Classifying Hyperspectral Remote Sensing Data


Xavier Ceamanos¹, Björn Waske², Jón Atli Benediktsson², Jocelyn Chanussot¹, and Johannes R. Sveinsson²

¹ GIPSA-LAB, Signal & Image Department, Grenoble Institute of Technology, INPG BP 46 - 38402 Saint Martin d'Hères, France
² University of Iceland, Faculty of Electrical and Computer Engineering, Hjardarhagi 2-6, 107 Reykjavik, Iceland

Abstract. The classification of hyperspectral imagery using multiple classifier systems is discussed and an SVM-based ensemble is introduced. The data set is split into separate feature subsets, using the correlation between the different spectral bands as a criterion. Afterwards, each source is classified separately by an SVM classifier. Finally, the different outputs are used as inputs for a final decision fusion that is based on an additional SVM classifier. The results of the proposed strategy are compared to classification results achieved by a single SVM and by other well-known classifier ensembles, such as random forests, boosting, and bagging.

Keywords: hyperspectral, land cover classification, support vector machines, multiple classifier systems, classifier ensemble.

1 Introduction

Hyperspectral data provide detailed spectral information on land cover, ranging from the visible to the short-wave infrared region of the electromagnetic spectrum. Nevertheless, the classification of hyperspectral imagery is challenging, due to the high dimensionality of the data sets. Particularly with a limited number of training samples, the classification accuracy of conventional statistical classifiers can be limited. Hughes [1] showed that with a limited number of training samples the classification accuracy decreases after a maximum is achieved. Thus, sophisticated classification algorithms are required to exploit the detailed hyperspectral information comprehensively. Several remote sensing studies have demonstrated that Support Vector Machines (SVM) perform better than, or at least comparably to, other classifiers in terms of accuracy, even when applied to hyperspectral data sets [2],[3]. One reason for this success might be the underlying concept of SVM classifiers: they aim to discriminate two classes by fitting an optimal separating hyperplane to the training samples within a multi-dimensional feature space, using only the closest training samples of each class [4]. Consequently, the approach considers only training data close to the class boundary and performs well with small training sets.


Multiple classifier systems (MCS) or classifier ensembles are another machine learning concept, which has recently been applied to remote sensing data sets [5]. By combining different independent classifiers, MCS can improve the classification accuracy in comparison to a single classifier. In [6]-[8] ensemble strategies were successfully applied to hyperspectral data sets. In [9] an SVM-based classifier system was introduced to classify multisensor imagery. Each data set, a multitemporal image and a set of multitemporal SAR data, was classified individually by SVM. Afterwards the outputs were fused by an additional SVM classifier. The results demonstrate that the proposed classification concept outperforms other parametric and non-parametric classification techniques (i.e., maximum likelihood classifier, decision tree, and boosted decision tree), including a single SVM. Moreover, the fusion step by an additional SVM classifier seems more efficient than other approaches, such as a simple majority vote.

In [10] the simultaneous use of a neural network and a statistical method was discussed for classifying a hyperspectral data set. The image was separated into several feature subsets, using the correlation coefficient between the different bands. Afterwards each subset was classified individually by a statistical classifier and a neural network. To generate the final map, the outputs were combined by decision fusion. In [11] a similar concept was used for economic forecasting. The feature space was separated into different subsets, using mutual information as the criterion, so that features within the same group are more similar to each other than to features belonging to different subsets. Whereas in [10] the individual feature subsets were used as input for the classifier ensemble, in [11] the input feature sets were generated by selecting individual features from each subset.

In view of these results, it seems interesting to apply the approach introduced in [9] to a hyperspectral data set. To generate multiple sources, the original hyperspectral data are split into several smaller data sources, following the correlation between bands as proposed in [10]. Each subset is classified by an individual SVM classifier. Finally, the different outputs are combined by an additional SVM classifier [9]. The results are compared to those achieved by a single SVM classifier using the whole hyperspectral data set, as well as to other ensemble methods, such as boosted decision trees and random forests.

The paper is organized as follows. The classification techniques are reviewed in Section 2. The data set is introduced in Section 3, followed by the SVM ensemble strategy in Section 4. Experimental results are given in Section 5, and, finally, conclusions are discussed in Section 6.

2 Classifier Algorithms

2.1 Support Vector Machines

Support Vector Machines fit an optimal separating hyperplane to the training data of two classes in a multi-dimensional feature space. In linearly non-separable cases, the input space is mapped into a high-dimensional feature space, using a so-called kernel function [4]. A detailed overview of the general concept of SVM is given in [12]. A brief introduction is given below. Let us assume that a training data set of $\ell$ samples in a $d$-dimensional feature space $\mathbb{R}^d$ is given by samples $x_i$ with their corresponding class labels $y_i = \pm 1$, i.e., $\Omega = \{(x_i, y_i) \mid i \in [1, \ell]\}$. The linear hyperplane $f_l(x) = \mathbf{w} \cdot x + b$ is given by the normal vector $\mathbf{w}$ and the bias $b$, with $|b| / \|\mathbf{w}\|$ as the distance between the hyperplane and the origin, where $\|\mathbf{w}\|$ is the Euclidean norm of $\mathbf{w}$. The margin maximization results in the following optimization problem:

$$\min \left( \frac{\|\mathbf{w}\|^2}{2} + C \sum_{i=1}^{\ell} \zeta_i \right) \tag{1}$$

with $\zeta_i$ as slack variables and $C$ as the regularization parameter. The constant $C$ penalizes training errors, i.e., training samples that are located on the wrong side of the hyperplane. The final SVM function for the non-linearly separable case is described as follows:

$$f_n(x) = \sum_{i=1}^{\ell} \alpha_i y_i \, k(x_i, x) + b \tag{2}$$

where $\alpha_i$ are Lagrange multipliers.

Thanks to the kernel trick it is possible to work within the newly transformed feature space without knowing the explicit mapping, but only the kernel function $k(x_i, x_j)$. In this study a common radial basis function (RBF) kernel was used:

$$k(x_i, x_j) = \exp\left( -\gamma \, \|x_i - x_j\|^2 \right). \tag{3}$$

The training of an SVM classifier requires an adequate definition of the kernel parameter $\gamma$ and the regularization parameter $C$, which is usually done by a grid search: various combinations of $C$ and $\gamma$ are tested, and the combination that yields the highest accuracy (based on a cross-validation) is taken. A one-against-one rule was used, which generates a binary SVM for each possible pair-wise classification problem. In contrast to other classifier algorithms, which provide probability measurements (e.g., Bayesian classifiers) or class labels (e.g., decision trees), the output image of an SVM classifier (Eq. 2) contains the distance of each pixel to the hyperplane of the binary classification problem. This information is used to determine the final result.
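As an illustration, the grid search and one-against-one setup could be written as follows with scikit-learn, whose SVC class wraps LIBSVM; the parameter ranges shown here are assumptions for the sketch, not the values used in the paper.

    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # User-defined search ranges for the two RBF-SVM parameters.
    param_grid = {
        "C": [1, 10, 100, 1000],             # regularization parameter C
        "gamma": [0.0001, 0.001, 0.01, 0.1], # RBF kernel parameter gamma
    }

    # decision_function_shape="ovo" corresponds to the one-against-one
    # rule: one binary SVM is trained per pair of classes.
    grid = GridSearchCV(
        SVC(kernel="rbf", decision_function_shape="ovo"),
        param_grid,
        cv=5,  # cross-validation scores each (C, gamma) combination
    )
    # grid.fit(X_train, y_train)  # X_train: (n_pixels, n_bands) array
    # print(grid.best_params_)    # combination with highest CV accuracy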

2.2 Multiple Classifier Systems

Multiple classifier systems combine variants of the same base classifier or different algorithms [13]. In doing so, the total accuracy is usually increased compared to the classification accuracy achieved by a single classifier [13]. Several different concepts have been introduced, which have also been applied successfully in diverse remote sensing studies [9],[10],[14],[15]. However, two main techniques exist: boosting and bagging.


Boosting techniques, such as AdaBoost.M1 [16], are concepts to improve the performance of any (weak) classifier. During initialization, all training samples are equally weighted. The weights of the training samples are iteratively modified after each step, and the next base classifier $C_B$ within the system is trained on the reweighted samples. The (relative) weights of correctly classified training samples decrease, while misclassified samples are assigned a stronger weight. In doing so, the classifier focuses on "difficult" training samples. The approach is described as follows:

Input: a training set $\Omega = \{(x_j, y_j)\}_{j=1}^{\ell}$, a base classifier $C_B$, and the number of classifiers $I$.

1. $\Omega_1 = \Omega$ and weight$(x_j) = 1$ for $j = 1 \ldots \ell$
2. FOR $i = 1$ to $I$ {
3. $C_i = C_B(\Omega_i)$
4. calculate the error rate $\varepsilon_i$
5. if $\varepsilon_i > 0.5$, terminate the procedure
6. calculate the weight $\beta_i = \varepsilon_i / (1 - \varepsilon_i)$
7. for each $x_j \in \Omega_i$ { if $C_i(x_j) = y_j$ then weight$(x_j)$ = weight$(x_j) \cdot \beta_i$ }
8. normalize the weights so that their total sum is 1 }
9. END
10. $C^*(x) = \arg\max_y \sum_{i:\, C_i(x) = y} \log(1/\beta_i)$

Following [16], step 7 shrinks the weights of correctly classified samples ($\beta_i < 1$ for $\varepsilon_i < 0.5$); after the normalization in step 8, the misclassified samples carry a larger relative weight.
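The reweighting loop can be sketched in a few lines of Python (a minimal illustration assuming a generic base learner with fit(X, y, sample_weight=...) and predict(X) methods; this is not the WEKA implementation used in the experiments below).

    import numpy as np

    def adaboost_m1(X, y, make_base_classifier, n_classifiers):
        # Minimal AdaBoost.M1 sketch after [16].
        n = len(y)
        w = np.full(n, 1.0 / n)               # step 1: uniform weights
        classifiers, betas = [], []
        for _ in range(n_classifiers):        # step 2
            clf = make_base_classifier()
            clf.fit(X, y, sample_weight=w)    # step 3: weighted training
            correct = clf.predict(X) == y
            eps = w[~correct].sum() / w.sum() # step 4: weighted error rate
            if eps > 0.5:                     # step 5: learner too weak
                break
            eps = max(eps, 1e-10)             # guard: perfect base learner
            beta = eps / (1.0 - eps)          # step 6
            w[correct] *= beta                # step 7: shrink correct ones
            w /= w.sum()                      # step 8: renormalize to 1
            classifiers.append(clf)
            betas.append(beta)
        return classifiers, betas

    def adaboost_m1_predict(classifiers, betas, X, classes):
        # Step 10: each classifier votes with weight log(1 / beta_i).
        votes = np.zeros((len(X), len(classes)))
        for clf, beta in zip(classifiers, betas):
            pred = clf.predict(X)
            for k, c in enumerate(classes):
                votes[pred == c, k] += np.log(1.0 / beta)
        return np.asarray(classes)[votes.argmax(axis=1)]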

In contrast to this, bagging (bootstrap aggregating) is based on resampling instead of re-weighting; thus it does not change the distribution of the training data, and all samples are equally weighted [17]. Usually a random and uniform selection is performed, generating a training set of $\ell$ samples from a training set of the same size $\ell$. This random selection is performed with replacement, i.e., a sample can be selected several times for one set, whereas another sample may not be considered in this particular training set. Each individual training set is used to train the base classifier; thus, different outputs are generated. A simple majority vote is used to determine the final classification result. Bagging is described as follows:

Input: a training set $\Omega = \{(x_j, y_j)\}_{j=1}^{\ell}$, a base classifier $C_B$, and the number of randomly generated training sets $I$.

1. FOR $i = 1$ to $I$ {
2. $\Omega_i$ = bootstrap training set drawn from $\Omega$
3. $C_i = C_B(\Omega_i)$ }
4. END
5. the class with the maximum number of votes is chosen

The random forest (RF) method is a combination of bagging of the training samples and bagging of the attributes. RF is an ensemble of decision tree classifiers $DT(x, \theta_i)$, $i = 1, \ldots, I$, where the $\theta_i$ are independent identically distributed random vectors and $x$ is an input pattern [18]. Thus, each tree within the classifier ensemble is trained on a subset of the original training samples. In addition, the feature subset is generated randomly at each split of a tree. RF are well suited for classifying high-dimensional data sets, because the computational complexity is reduced by decreasing the number of input features (and training samples) considered at each node; a usage sketch of both ensembles is given below.
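As a minimal sketch, both tree-based ensembles can be instantiated with scikit-learn (an illustrative stand-in: the experiments in Section 5 use WEKA's j4.8/C4.5 trees, whereas scikit-learn grows CART trees).

    from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Bagging: I = 100 bootstrap sets of the same size as the training
    # set, each used to train a tree; the final label is a majority vote.
    bagging = BaggingClassifier(
        estimator=DecisionTreeClassifier(),
        n_estimators=100,
    )

    # Random forest: bagging of the samples plus a random feature subset
    # at each split, keeping the per-node cost low in high dimensions.
    forest = RandomForestClassifier(n_estimators=100, max_features="sqrt")

    # bagging.fit(X_train, y_train)
    # forest.fit(X_train, y_train)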

3 Data Sets

An AVIRIS (Airborne Visible InfraRed Imaging Spectrometer) data set was collected on a cloud-free day over the region surrounding the volcano Hekla in South Iceland. The sensor operates from the visible to the mid-infrared region of the electromagnetic spectrum (i.e., 0.4 µm to 2.4 µm). The system has four spectrometers and 224 data channels. Because spectrometer 4 was not working properly during the image acquisition, 64 bands (1.84 µm to 2.4 µm) were deleted from the imagery, as were the first channels of the other spectrometers, which were blank. Thus, the image used in this study contains 157 bands. The data set covers 2048 x 614 pixels, with a spatial resolution of 20 m [10]. Twenty-four land cover classes were considered during the classification, mainly lava flows from different eruptions, partly covered by vegetation. The available ground truth information was equally split into independent training and test data. Moreover, different training data sets were generated, containing 25 and 100 samples per class, as well as a set with all available training data (17491 samples in total). The validation set contains 16667 samples.
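The per-class subsampling used to build the smaller training sets can be sketched as follows (a hypothetical illustration; variable names and the random seed are assumptions, since the paper does not specify how the selection was implemented).

    import numpy as np

    def sample_per_class(labels, n_per_class, seed=0):
        # Pick up to n_per_class training pixels per land cover class,
        # mirroring the TR 25 / TR 100 subsets used in the experiments.
        rng = np.random.default_rng(seed)
        picked = []
        for c in np.unique(labels):
            idx = np.flatnonzero(labels == c)
            take = min(n_per_class, len(idx))
            picked.append(rng.choice(idx, size=take, replace=False))
        return np.concatenate(picked)

    # train_idx_25 = sample_per_class(y_train_all, 25)    # TR 25
    # train_idx_100 = sample_per_class(y_train_all, 100)  # TR 100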

4 SVM Ensemble Strategy

The proposed SVM classifier ensemble is based on the application of SVM to different data sources and a fusion of the outputs by an additional SVM. To generate the different sources, the hyperspectral image is separated into feature subsets according to the correlation matrix (see Fig. 2). The elements of the correlation matrix $\Sigma$ are given by the absolute correlation $r$ between the spectral responses of individual bands $s_i$ and $s_j$ in a $d$-dimensional feature space:

$$ r_{s_i s_j} = \left| \frac{d \sum_1^d s_i s_j - \sum_1^d s_i \sum_1^d s_j}{\sqrt{d \sum_1^d s_i^2 - \left( \sum_1^d s_i \right)^2} \, \sqrt{d \sum_1^d s_j^2 - \left( \sum_1^d s_j \right)^2}} \right| \tag{4} $$

Figure 2 shows the correlation matrix for the data set. Blue regions show a low correlation, whereas a high correlation is indicated by red. A visual interpretation of the matrix points out three main regions of high correlation, ranging from bands 3 to 29, 30 to 105, and 111 to 150. The low correlation values in the remaining bands indicate noise (e.g., water absorption), and thus these bands are removed. The remaining 143-band data set was used for the classification by SVM, boosting, RF, etc. After generating the three feature subsets, an individual SVM classifier was applied to each subset. The outputs were fused by a final SVM (see Fig. 2).

Fig. 1. AVIRIS data set surrounding the region of the volcano Hekla, South Iceland, and corresponding test data with 24 classes.

Fig. 2. Schematic overview of the SVM-based ensemble (after Waske and Benediktsson, 2007): one SVM per feature subset (bands 3-29, 30-105, and 111-150) produces the outputs f(x)#1 to f(x)#3, which a final one-against-one SVM fuses into f(x)all, the final classification. The visual interpretation of the correlation matrix points out three different major regions, which are used for the application of the proposed SVM ensemble.
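A compact sketch of the strategy described in this section, assuming the hyperspectral cube has been reshaped to an (n_pixels, n_bands) array; the band-group slices, variable names, and scikit-learn (LIBSVM-backed) SVMs with fixed example parameters are illustrative assumptions.

    import numpy as np
    from sklearn.svm import SVC

    def band_correlation(X):
        # Absolute correlation between spectral bands (Eq. 4);
        # X has shape (n_pixels, n_bands).
        return np.abs(np.corrcoef(X, rowvar=False))

    # Groups read off the correlation matrix of this data set (Fig. 2);
    # 0-based, end-exclusive slices for the 1-based band ranges above.
    GROUPS = [slice(2, 29), slice(29, 105), slice(110, 150)]

    def fit_svm_ensemble(X_train, y_train, C=100.0, gamma=0.01):
        # One RBF SVM per feature subset, then a fusion SVM trained on
        # the stacked per-subset decision values (pairwise distances).
        subset_svms, fusion_inputs = [], []
        for g in GROUPS:
            svm = SVC(kernel="rbf", C=C, gamma=gamma,
                      decision_function_shape="ovo")
            svm.fit(X_train[:, g], y_train)
            subset_svms.append(svm)
            # Distances to the pairwise hyperplanes are fusion features.
            fusion_inputs.append(svm.decision_function(X_train[:, g]))
        fusion_svm = SVC(kernel="rbf", C=C, gamma=gamma)
        fusion_svm.fit(np.hstack(fusion_inputs), y_train)
        return subset_svms, fusion_svm

    def predict_svm_ensemble(subset_svms, fusion_svm, X):
        feats = [svm.decision_function(X[:, g])
                 for svm, g in zip(subset_svms, GROUPS)]
        return fusion_svm.predict(np.hstack(feats))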

5 Experimental Results

The SVMs were trained on the different feature subsets and on the whole image. The training and parameter selection were performed using LIBSVM in a MATLAB environment [20]. The best values for $\gamma$ and $C$ were selected in a user-defined range of possible parameters, based on a leave-one-out cross-validation. The outputs generated for the three feature subsets (see Fig. 2) were then used for the decision fusion process, which was based on the application of another SVM (see Fig. 2). In addition to the SVM, three different ensemble strategies were applied to the data set: boosting, bagging, and random forests (RF), with varying ensemble sizes (i.e., 10, 25, 50, 100). Boosting (i.e., AdaBoost.M1) and bagging were performed with a j4.8 decision tree, which is an implementation of the well-known C4.5 decision tree. All three ensembles were applied using the WEKA data mining software [19].

In Table 1 the classification accuracies for the different results are given. The results of the proposed SVM classification show that the total accuracy is increased by between 2.3% and 4.6% when compared to a single SVM classifier. Comparing the different DT-based classifier systems, boosting and random forests achieve significantly higher accuracies than bagging, which even performs less accurately than a single SVM. Boosting and RF achieve very similar results: whereas boosting performs slightly better with large training data sets, RF is more adequate for a smaller training set size. All three DT-based ensemble methods show a typical increase in the classification accuracy with an increasing number of classifiers within the ensemble (not presented in detail). Thus, in the following discussion only the results achieved with 100 iterations are considered.

The accuracy assessment demonstrates that the proposed SVM ensemble strategy can outperform boosting and RF in terms of accuracy, depending on the number of available training samples. With a small number of samples (i.e., 25 per class), DT-based boosting and RF yield higher classification accuracies (i.e., 73.2% and 74.3%) than the proposed method (71.4%). Although a larger training set size results in an increase of the classification accuracies for all methods, the increase is most significant for the SVM ensemble.

Table 1. Overall test accuracy in percent, using the proposed SVM-based classifier system and other ensemble methods (with 100 iterations), with different training sample sets: 25 samples per class (TR 25), 100 samples per class (TR 100), and all training samples.

Method         TR 25   TR 100   all training samples
single SVM      66.8    79.4    90.0
SVM ensemble    71.4    82.2    92.3
Boosting        73.2    82.9    89.2
Bagging         65.3    76.1    85.4
RF              74.3    82.9    88.1

Fig. 3. Differences in class accuracies (per land cover class, #1 to #24) achieved by the SVM ensemble as compared to a standard SVM; x-axis: difference in class accuracy, ranging from -5 to 25.

The classifications based on the medium training set size show very similar accuracies. However, using the largest training set with all available samples, the proposed ensemble strategy results in an accuracy of 92.3%, which is approximately 3% to 4% higher than the results achieved by boosting and RF. The good performance of the proposed classifier ensemble is also underlined by the class accuracies. Comparing the class accuracies achieved by the SVM ensemble with those of a standard SVM classifier, the proposed strategy achieves higher class accuracies in most cases. Figure 3 shows the differences between the accuracies for the different land cover classes. The differences tend significantly towards the positive, i.e., the proposed strategy outperforms a single SVM in terms of class accuracy in most cases.

6 Discussion and Conclusion

In this paper, the problem of classifying hyperspectral imagery was addressed. A multiple classifier system was proposed, which is based on the fusion of SVMs. The classification strategy combines different SVM classifiers that are applied to several subsets within the feature space. To generate the different subsets, the whole data set was separated using the correlation between spectral bands. Besides the proposed strategy, a single SVM and different well-known classifier ensembles were applied to the data set. Their performance was compared to the proposed method, varying the number of training samples. Experimental results show that the proposed SVM fusion outperforms a single SVM classifier in terms of total accuracy, irrespective of the number of available training samples. In contrast, other ensembles such as boosting and random forests can outperform the proposed strategy in terms of accuracy, particularly with a small number of training samples. Nevertheless, the SVM ensemble is interesting when a larger number of training samples is available; in this case it performs better than boosting, bagging, and random forests in terms of accuracy. Overall, the proposed method seems to be a promising alternative classification strategy for hyperspectral remote sensing data and can yield higher accuracies than other well-known ensemble methods. In future research the computational complexity needs to be reduced. Moreover, the impact of the subset generation (e.g., the number of subsets) on the overall accuracy will be investigated.

References

1. Hughes, G.F.: On the mean accuracy of statistical pattern recognizers. IEEE Trans. on Information Theory 14, 55-63 (1968)
2. Melgani, F., Bruzzone, L.: Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans. Geosci. and Remote Sens. 42, 1778-1790 (2004)
3. Pal, M., Mather, P.M.: Some issues in the classification of DAIS hyperspectral data. Int. J. Remote Sens. 27, 2896-2916 (2006)
4. Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
5. Benediktsson, J.A., Chanussot, J., Fauvel, M.: Multiple Classifier Systems in Remote Sensing: From Basics to Recent Developments. In: Haindl, M., Kittler, J., Roli, F. (eds.) MCS 2007. LNCS, vol. 4472, pp. 501-512. Springer, Heidelberg (2007)
6. Ham, J., Chen, Y.C., Crawford, M.M., Ghosh, J.: Investigation of the random forest framework for classification of hyperspectral data. IEEE Trans. on Geosci. and Remote Sens. 43, 492-501 (2005)
7. Joelsson, S.R., Benediktsson, J.A., Sveinsson, J.R.: Random forest classification of remote sensing data. In: Chen, C.H. (ed.) Signal and Image Processing for Remote Sensing, pp. 327-344. CRC Press, Boca Raton (2007)
8. Cheung-Wai Chan, J., Paelinckx, D.: Evaluation of Random Forest and Adaboost tree-based ensemble classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery. Remote Sens. of Env. 112, 2999-3011 (2008)
9. Waske, B., Benediktsson, J.A.: Fusion of Support Vector Machines for Classification of Multisensor Data. IEEE Trans. Geosci. and Remote Sens. 45, 3858-3866 (2007)
10. Benediktsson, J.A., Kanellopoulos, I.: Classification of Multisource and Hyperspectral Data Based on Decision Fusion. IEEE Trans. Geosci. and Remote Sens. 37, 1367-1377 (1999)
11. Liao, Y., Moody, J.: Constructing heterogeneous committees using input feature grouping. In: Solla, S.A., Leen, T.K., Müller, K.-R. (eds.) Advances in Neural Information Processing Systems (NIPS), Denver, USA, vol. 12. MIT Press, Cambridge (2000)
12. Burges, C.J.C.: A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery 2, 121-167 (1998)
13. Polikar, R.: Ensemble Based Systems in Decision Making. IEEE Circuits and Systems Magazine 6, 21-45 (2006)
14. Briem, G.J., Benediktsson, J.A., Sveinsson, J.R.: Multiple Classifiers Applied to Multisource Remote Sensing Data. IEEE Trans. Geosci. Remote Sens. 40, 2291-2299 (2002)
15. Waske, B., van der Linden, S.: Classifying multilevel imagery from SAR and optical sensors by decision fusion. IEEE Trans. on Geosci. and Remote Sens. 46, 1457-1466 (2008)
16. Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: 13th International Conference on Machine Learning, Bari, Italy (1996)
17. Breiman, L.: Bagging predictors. Mach. Learning 24, 123-140 (1996)
18. Breiman, L.: Random forests. Mach. Learning 45, 5-32 (2001)
19. Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
20. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001), http://www.csie.ntu.edu.tw/~cjlin/libsvm
