EEG signal classification for epilepsy diagnosis via optimum path forest – A systematic assessment



Thiago M. Nunes (a), André L.V. Coelho (b), Clodoaldo A.M. Lima (c), João P. Papa (d), Victor Hugo C. de Albuquerque (b,*)

(a) Centro de Ciências Tecnológicas, Universidade de Fortaleza, Fortaleza, CE, Brazil
(b) Programa de Pós-Graduação em Informática Aplicada, Universidade de Fortaleza, Fortaleza, CE, Brazil
(c) Escola de Artes, Ciências e Humanidades, Universidade de São Paulo, São Paulo, SP, Brazil
(d) Departamento de Ciência da Computação, Universidade Estadual Paulista, Bauru, São Paulo, Brazil

Article history: Received 17 January 2013; received in revised form 31 December 2013; accepted 1 January 2014. Communicated by T. Heskes.

Abstract

Epilepsy refers to a set of chronic neurological syndromes characterized by transient and unexpected electrical disturbances of the brain. The detailed analysis of the electroencephalogram (EEG) is one of the most influential steps for the proper diagnosis of this disorder. This work presents a systematic performance evaluation of the recently introduced optimum path forest (OPF) classifier when coping with the task of epilepsy diagnosis directly through EEG signal analysis. For this purpose, we have made extensive use of a benchmark dataset composed of five classes, whose full discrimination is very hard to achieve. Four types of wavelet functions and three well-known filter methods were considered for the tasks of feature extraction and selection, respectively. Moreover, support vector machines configured with radial basis function (SVM-RBF) kernel, multilayer perceptron neural networks (ANN-MLP), and Bayesian classifiers were used for comparison in terms of effectiveness and efficiency. Overall, the results evidence the outperformance of the OPF classifier in both types of criteria. Indeed, the OPF classifier was usually extremely fast, with average training/testing times much lower than those required by SVM-RBF and ANN-MLP. Moreover, when configured with Coiflets as feature extractors, the performance scores achieved by the OPF classifier include 89.2% as average accuracy and sensitivity/specificity values higher than 80% for all five classes.

Keywords: EEG signal classification; optimum path forest; Bayesian; support vector machines; multilayer perceptrons; wavelets

1. Introduction

In the last few decades, significant progress has been made in the broad area of biomedical signal processing (BSP), aiming at extracting relevant information directly from raw physiological data [1]. In particular, the automated classification of these data has emerged as a promising strategy for assisting physicians in identifying hard-to-diagnose pathologies, such as epilepsy. As mentioned by Chang and Lowenstein [2], the term epilepsy encompasses a number of different neurological syndromes characterized by a predisposition to recurrent unprovoked seizures. People have seizures when the electrical signals in the brain misfire. The brain's normal electrical activity is disrupted by these overactive electrical discharges, causing a temporary communication problem between nerve cells [3].

* Corresponding author. E-mail addresses: [email protected] (T.M. Nunes), [email protected] (A.L.V. Coelho), [email protected] (C.A.M. Lima), [email protected] (J.P. Papa), [email protected] (V.H.C. de Albuquerque).

Arguably, the detailed analysis of the electroencephalogram (EEG) is one of the most influential steps for the proper diagnosis of seizures and epilepsy [3]. Unfortunately, since the occurrence of an epileptic seizure cannot be predicted in advance in the majority of cases, continuous recording of the EEG is quite common. However, analysis by visual inspection of long EEG recordings is usually a time-consuming and error-prone process. Hence, the automatic detection of epilepsy directly from EEG data has been pursued by many researchers for a long time. Several works have investigated different artificial intelligence (AI) approaches for tackling epilepsy diagnosis via EEG signal classification. For instance, Nigam and Graupe [4] employed a multistage nonlinear filter in combination with a LAMSTAR neural network. The overall success percentage achieved by their system, considering both the false positive and false negative rates, was 97.2%. In turn, Patnaik and Manyam [5] adopted the wavelet transform for feature extraction, a genetic algorithm (GA) for choosing the training set, and an artificial neural network (ANN) trained with backpropagation for the classification of the signals. An average specificity of 99.19%, a sensitivity of 91.29%, and a selectivity of 91.14% were obtained.


By other means, Subasi [6–8] adopted different versions of ANNs and also mixture-of-experts (ME) models to discriminate between seizure and seizure-free profiles. Mixtures of ANN experts induced with wavelet coefficients have also been considered by Übeyli [9]. In this case, the total classification accuracy obtained by the ME network structures was 93.17%, and ROC curves for single multilayer perceptrons (MLP) and ME classifiers were provided. On the other hand, Güler and Übeyli [10] and Kannathal et al. [11] have both considered the application of neurofuzzy models as EEG classifiers. The main difference between these works lies in the type of feature extracted, via either wavelets or entropy measures, respectively. While the classification accuracy reported in [11] was typically above 90% for different entropy measures, that achieved in [10] was 98.68% (with Daubechies of order 2 adopted as the wavelet basis). Tzallas et al. [12] presented a methodology based on time–frequency analysis. Initially, selected segments of the EEG signals (possibly of different sizes) are analyzed using time–frequency methods and several features are extracted for each segment, representing the energy distribution in the time–frequency plane. Then, those features are used as input to a feedforward neural network, which provides the final classification. To evaluate the methodology, the authors generated four different classification problems, and the results achieved in terms of overall accuracy varied from 97.72% to 100%. By other means, Kocyigit et al. [13] designed an MLP classifier based on a faster variant of the independent component analysis (ICA) feature extraction technique. The resulting system achieved a sensitivity rate of 98% and a specificity rate of 90.5%. Recently, in a series of papers [14–16], our group has systematically evaluated the potential of several kernel-based learning machines, such as support vector machines (SVM) and relevance vector machines, in tackling the task of automatic EEG signal classification. Overall, the results achieved evidence that all kernel machines considered were competitive in terms of accuracy and generalization, and that the choice of the kernel function and its hyperparameter values, as well as the choice of the feature extractor, are really critical decisions to be taken into account. All the aforementioned works report experiments conducted on the dataset made publicly available by Andrzejak et al. [17], which facilitates the comparison of the results achieved. This dataset has served well for benchmarking novel approaches for EEG signal classification due to its intrinsic difficulties. In total, it has five classes (two of which comprise normal patients with eyes open or closed, with the remaining comprising ill patients with different levels of epilepsy), whose full discrimination is very hard to achieve. In this work, we also conduct a systematic empirical study on the problem of epilepsy diagnosis via EEG signal classification. However, we focus this time on another powerful classifier, referred to as optimum path forest (OPF, for short) [18,19]. This classifier has gained increasing attention in the last few years because it has some advantages over more traditional classifiers: (i) it is free of hard-to-calibrate control parameters; (ii) it does not assume any shape/separability of the feature space; (iii) it usually runs the training phase much faster; and (iv) it can make decisions based on global criteria.
Moreover, the OPF classifier does not interpret the classification task as a hyperplane optimization problem, but as the computation of optimum paths from some key patterns (known as prototypes) to the remaining nodes. By this means, each prototype becomes the root of its own optimum path tree, and each node is classified according to the prototype to which it is most strongly connected. This process defines a discrete optimal partition (aka influence regions) of the feature space. We therefore argue that, due to its high efficiency and accuracy, together with its parameter independence and robustness to highly non-linear datasets, the OPF classifier can be considered a very suitable alternative for automatically classifying EEG signals for epilepsy diagnosis.

However, although it has shown promising results in different application domains [20–25], the potential of the OPF classifier has only recently been investigated in the BSP context, more specifically for the problem of electrocardiogram-based arrhythmia classification [26]. As far as we are aware, no work has yet been conducted with respect to the task of epilepsy diagnosis. To assess the levels of performance delivered by the OPF classifier in this context, in terms of computational cost (efficiency) and accuracy/generalization rate (effectiveness), we have investigated the sensitivity of this classifier to the choice of the distance function used to calculate the similarity between patterns [27], as well as to the type of features extracted from the EEG signal via the wavelet transform [9,16,28]. Moreover, a performance comparison with SVM classifiers configured with a radial basis function kernel (SVM-RBF), MLP, and Bayesian classifiers was also carried out. The rest of the paper is organized as follows. In Section 2, we briefly outline the main aspects behind the wavelet families used as EEG feature extractors and the classifiers used in the experiments. In this section, more emphasis is given to the characterization of the OPF classifier. Section 3 is devoted to the assessment of the performance of the OPF classifier on the task of EEG signal classification, taking into account different distance functions, feature modalities, and also the behavior displayed by the other well-known classifiers. Finally, Section 4 concludes the paper.

2. Materials and methods

This section describes the two main steps involved in our classification methodology. First, we characterize the different wavelet families adopted as EEG feature extractors as well as the filter algorithms used for feature selection. Then, we detail the formalization and properties of the OPF classifier, and also briefly discuss the main features of the other learning algorithms adopted in the comparative assessment.

2.1. Feature extraction and selection

In order to extract discriminatory features from raw EEG data, the discrete wavelet transform (DWT) was employed in this study [29,30]. According to Subasi [8], the selection of the number of decomposition levels and of the mother wavelet is a very important decision to be taken when using wavelets. In the experiments reported in this paper, the number of levels into which the EEG signals were decomposed was chosen to be five, since, as Subasi argues, these signals do not have any useful frequency components above 30 Hz. Thus, the EEG signals were decomposed into details D1–D5 and one final approximation, A5. Regarding the choice of the wavelet basis, Gandhi et al. [31] have recently performed an empirical study aiming at finding the most useful wavelet function for preprocessing EEG data. Different types of features, such as energy, entropy, and standard deviation values at different sub-bands, were computed using four families of wavelet functions, namely, Haar, Daubechies, Coiflets, and Biorthogonal. Based on the results achieved, the authors recommended Coiflets of order 1, followed by Daubechies of orders 2 and 3, to be applied jointly with the classifier. While conducting the experiments for this paper, we have also considered different wavelet families with different orders and parametrizations, but due to space limitations we focus our analysis here on the following wavelet configurations, most of which were also used in [16]: Haar, Coiflets (Coif) of orders 2–4, Symlets (Sym) of orders 2–4, and Daubechies (Db) of orders 2–4.
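To make this extraction step concrete, the snippet below sketches how one such feature vector could be computed with the PyWavelets package. It is only an illustration of the procedure described above, not the authors' code: the file name, the choice of the 'coif4' basis, and the exact grouping of the seven statistics (which determines the final count of 40 features per segment) are assumptions.

```python
import numpy as np
import pywt

def wavelet_features(signal, wavelet="coif4", level=5):
    """Decompose one EEG segment into A5, D5, ..., D1 and summarize each sub-band."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)   # [A5, D5, D4, D3, D2, D1]
    feats = []
    for band in coeffs:
        band = np.asarray(band, dtype=float)
        feats += [band.min(), band.max(), band.mean(), band.std(),
                  np.mean(band ** 2),          # power of the sub-band
                  np.mean(np.abs(band))]       # absolute mean of the sub-band
    # ratios of absolute means between adjacent sub-bands
    abs_means = [np.mean(np.abs(b)) for b in coeffs]
    feats += [abs_means[i] / abs_means[i + 1] for i in range(len(abs_means) - 1)]
    return np.array(feats)

# Usage: one 23.6 s single-channel segment from the repository of [17] (hypothetical file name)
segment = np.loadtxt("Z001.txt")
x = wavelet_features(segment, "coif4")
```

Repeating this computation over all 500 segments and the 10 wavelet bases yields the feature matrices from which the derived datasets characterized in Table 2 are built.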


For each of the 10 wavelet types, five datasets were generated (one not normalized, one normalized without feature selection, and three normalized with feature selection), yielding 50 derived datasets over the 500 data patterns available in the repository. The chosen features relate to well-known statistics calculated over the wavelet coefficients of individual or adjacent sub-bands, namely, minimum, maximum, mean, standard deviation, power, absolute mean, and ratio of absolute means [8,9,16,28], generating 40 features for the two datasets without feature selection (normalized and not normalized). Besides, in order to eliminate redundant features and thus increase the performance of the induced classifiers, we have made use of well-known filter methods for feature selection, namely, Relief, InfoGain, and correlation-based feature subset selection (Cfss) [32]. The latter evaluates the quality of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them; subsets of features that are highly correlated with the class while having low intercorrelation are thus preferred. As the underlying search method used by Cfss to select the best subsets, we have used a simple GA [33], so the size of the optimal subset of features selected by Cfss may vary for each type of wavelet basis. Conversely, Relief evaluates the worth of an attribute by repeatedly sampling an instance and considering the value of the given attribute for the nearest instances of the same and of a different class. Finally, as its name implies, InfoGain scores each attribute separately by measuring its information gain with respect to the class. For Relief and InfoGain, the 10 best features according to the respective criterion were selected for each wavelet function.
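As a hedged illustration of the simplest of these filters, the fragment below ranks features by mutual information, an InfoGain-like criterion, and keeps the 10 best, as done for Relief and InfoGain above. The use of scikit-learn, the variable names X and y (the 40-feature matrix and the class labels), and min–max normalization are assumptions; Relief and the GA-driven Cfss search are not shown.

```python
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.preprocessing import MinMaxScaler

# X: (500, 40) wavelet-feature matrix, y: class labels A-E encoded as integers
X_norm = MinMaxScaler().fit_transform(X)                    # normalized variant of the dataset
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_top10 = selector.fit_transform(X_norm, y)                 # keeps the 10 highest-scoring features
kept_indices = selector.get_support(indices=True)           # which of the 40 features survived
```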

2.2. EEG signal classification

2.2.1. Optimum path forest classifier

The OPF classifier models the problem of pattern recognition as the partition of a graph defined in the feature space. The nodes represent the patterns (EEG feature vectors in this paper) and every pair of nodes is connected by an edge, defining a complete graph (Fig. 1a). This kind of representation is straightforward, given that the graph does not need to be explicitly stored, and has low memory requirements. The partition of the graph is carried out by a competition process between some key nodes (prototypes), which offer optimum paths to the remaining nodes of the graph.


Each prototype defines its optimum path tree (OPT), and the collection of all OPTs defines the optimum path forest, which gives the classifier its name [18,19]. In the experiments reported in the next section, we have made use of LibOPF 2.0, made available in [34]. Four distance metrics already available in the library were used in this work, namely, the Euclidean, Manhattan, Canberra, and Squared Chi-square distances; no other parameter had to be defined, owing to the parameter-free nature of the OPF classifier. The OPF classifier can be seen as a generalization of the well-known Dijkstra's algorithm for computing optimum paths from a source node to the remaining ones [35]. The main difference lies in the fact that the OPF classifier uses not one but a set of source nodes (the prototypes) and can also be configured with any path cost function. In the case of Dijkstra's algorithm, a function that sums up the edge weights along a path is used; in the OPF classifier, a function that gives the maximum edge weight along a path is usually adopted [18].

Let $Z = Z_{tr} \cup Z_{ts}$ be a dataset labeled by a function $\lambda$, in which $Z_{tr}$ and $Z_{ts}$ are, respectively, the training and test sets, and let $S \subseteq Z_{tr}$ be a set of prototype patterns (EEG feature vectors). Essentially, the OPF classifier builds a discrete optimal partition of the feature space such that any EEG feature vector $t \in Z_{ts}$ can be classified according to this partition. This partition comes in the form of an optimum path forest computed in $\mathbb{R}^n$ by the image foresting transform (IFT) algorithm [36]. The OPF algorithm may be used with any smooth path cost function that can group EEG features with similar properties [36]. This work used the path cost function $f_{\max}$, computed as

$$f_{\max}(\langle s \rangle) = \begin{cases} 0 & \text{if } s \in S, \\ +\infty & \text{otherwise,} \end{cases} \qquad f_{\max}(\pi \cdot \langle s, t \rangle) = \max\{f_{\max}(\pi), d(s, t)\}, \qquad (1)$$

in which $d(s, t)$ denotes the distance between the EEG feature vectors $s$ and $t$, and a path $\pi$ is defined as a sequence of adjacent EEG features. As such, $f_{\max}(\pi)$ computes the maximum distance between adjacent EEG features in $\pi$, whenever $\pi$ is not a trivial path (i.e., a path of null length).

Fig. 1. OPF feature space representation: (a) Complete graph, (b) minimum spanning tree of (a), (c) highlighted prototypes, (d) optimum-path forest generated during the training phase.


The OPF algorithm assigns one optimum path $P^*(s)$ from $S$ to every EEG feature $s \in Z_{tr}$, bringing forth an optimum path forest $P$ (a function with no cycles that assigns to each $s \in Z_{tr} \setminus S$ its predecessor $P(s)$ in $P^*(s)$, or a marker nil when $s \in S$). Let $R(s) \in S$ be the root of $P^*(s)$ that can be reached from $P(s)$. The OPF algorithm computes, for each $s \in Z_{tr}$, the cost $C(s)$ of $P^*(s)$, the label $L(s) = \lambda(R(s))$, and the predecessor $P(s)$.

The OPF classifier is composed of two distinct phases: (i) training and (ii) classification. The former consists, essentially, in finding the prototypes and computing the optimum path forest. In the latter, a pattern (EEG feature vector) is taken from the test set and labeled with the class of its most strongly connected prototype through the optimum path forest generated in the training phase. Notice that this test pattern is not permanently added to the generated optimum path forest, i.e., it is used only once, and thus does not interfere with the classification of other test patterns. In what follows, we describe the training/test procedures of the OPF classifier in more detail; they are also illustrated in Fig. 3.

Training: We say that $S^*$ is an optimum set of prototypes when the OPF algorithm minimizes the classification errors for every $s \in Z_{tr}$. $S^*$ can be found by exploiting the theoretical relation between the minimum spanning tree (MST) and the optimum path tree for $f_{\max}$ [37]. The training phase essentially consists in finding $S^*$ and an OPF classifier rooted at $S^*$. By computing an MST on the complete graph $(Z_{tr}, A)$ (Fig. 1b), we obtain a connected acyclic graph whose nodes are all EEG features of $Z_{tr}$ and whose undirected edges are weighted by the distances $d$ between adjacent EEG features. Remember that the MST is the spanning tree with the least sum of edge weights among all spanning trees of the complete graph. In an MST, every pair of EEG features is connected through a single path, which is optimal according to $f_{\max}$ (Eq. (1)); that is, an MST contains one optimum path tree for any selected root node. The optimal prototypes are the closest elements of the MST with different labels in $Z_{tr}$ (highlighted nodes in Fig. 1c), i.e., elements that lie on or close to the class boundaries. By removing the edges between different classes, their adjacent EEG features become prototypes in $S^*$, and the OPF classifier can compute an optimum path forest with the lowest error rate in $Z_{tr}$, as we can see in Fig. 1d. It should be noted that a given class may be represented by multiple prototypes, i.e., multiple optimum path trees, and there must exist at least one prototype per class.

Classification: For any EEG feature $t \in Z_{ts}$, all edges connecting $t$ with EEG features $s \in Z_{tr}$ are considered (Fig. 2a), as if $t$ were part of the training graph. Considering all possible paths from $S^*$ to $t$, the optimum path $P^*(t)$ is found and $t$ is labeled with the class $\lambda(R(t))$ of its most strongly connected prototype $R(t) \in S^*$ (Fig. 2b). This path can be identified incrementally by evaluating the optimum cost $C(t)$ as follows:

$$C(t) = \min_{\forall s \in Z_{tr}} \{\max\{C(s), d(s, t)\}\}. \qquad (2)$$

Let $s^* \in Z_{tr}$ be the node that satisfies Eq. (2), i.e., the predecessor $P(t)$ in the optimum path $P^*(t)$. Given that $L(s^*) = \lambda(R(t))$, the classification simply assigns $L(s^*)$ as the class of $t$. An error occurs when $L(s^*) \neq \lambda(t)$.
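The sketch below re-implements the two OPF phases just described under the stated assumptions: a complete graph weighted by a chosen distance, prototypes taken as the endpoints of MST edges that join different classes, a Dijkstra-like conquest driven by f_max, and test labeling via Eq. (2). It is a didactic NumPy/SciPy approximation, not the LibOPF code actually used in the experiments; the Manhattan ('cityblock') metric is chosen here only because it figures among the best-performing distances reported later.

```python
import heapq
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist, pdist, squareform

def opf_train(X_tr, y_tr, metric="cityblock"):
    """Training: find prototypes on the MST and propagate optimum-path costs under f_max."""
    D = squareform(pdist(X_tr, metric=metric))
    mst = minimum_spanning_tree(D).toarray()
    cost = np.full(len(X_tr), np.inf)
    rows, cols = np.nonzero(mst)
    for i, j in zip(rows, cols):
        if y_tr[i] != y_tr[j]:
            cost[i] = cost[j] = 0.0              # MST edges between classes define prototypes (Eq. (1))
    label = y_tr.copy()
    heap = [(0.0, s) for s in range(len(X_tr)) if cost[s] == 0.0]
    heapq.heapify(heap)
    done = np.zeros(len(X_tr), dtype=bool)
    while heap:                                  # Dijkstra-like conquest with the f_max path cost
        c, s = heapq.heappop(heap)
        if done[s]:
            continue
        done[s] = True
        for t in range(len(X_tr)):
            if done[t]:
                continue
            new_cost = max(c, D[s, t])           # f_max(pi . <s, t>)
            if new_cost < cost[t]:
                cost[t], label[t] = new_cost, label[s]
                heapq.heappush(heap, (new_cost, t))
    return cost, label

def opf_classify(X_ts, X_tr, cost, label, metric="cityblock"):
    """Classification: Eq. (2) -- each test sample takes the label of its cheapest conqueror."""
    D = cdist(X_ts, X_tr, metric=metric)
    path_cost = np.maximum(D, cost)              # max{C(s), d(s, t)} for every training node s
    return label[np.argmin(path_cost, axis=1)]
```

Here opf_train returns, for every training sample, its optimum-path cost C(s) and propagated label L(s); opf_classify then applies Eq. (2) row by row over the test set.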

2.2.2. Machine learning techniques

Bayesian classifier: Let $p(\omega_i \mid x)$ be the probability that a given EEG feature $x \in \mathbb{R}^n$ belongs to class $\omega_i$, $i = 1, 2, \ldots, c$, which can be defined by resorting to the Bayes theorem [38]:

$$p(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, P(\omega_i)}{p(x)}, \qquad (3)$$

where $p(x \mid \omega_i)$ is the likelihood function of the EEG features that belong to class $\omega_i$, and $P(\omega_i)$ denotes the prior of class $\omega_i$. A Bayesian classifier decides that an EEG feature $x$ belongs to class $\omega_i$ when

$$p(\omega_i \mid x) > p(\omega_j \mid x), \quad \forall i, j = 1, 2, \ldots, c,\; i \neq j, \qquad (4)$$

which can be rewritten as follows by considering Eq. (3):

$$p(x \mid \omega_i)\, P(\omega_i) > p(x \mid \omega_j)\, P(\omega_j), \quad \forall i, j = 1, 2, \ldots, c,\; i \neq j. \qquad (5)$$

As easily noticeable, the Bayesian classifier's decision function $d_i(x) = p(x \mid \omega_i) P(\omega_i)$ for a given class $\omega_i$ strongly depends on the previous knowledge of $p(x \mid \omega_i)$ and $P(\omega_i)$, $\forall i = 1, 2, \ldots, c$. The probability values of $P(\omega_i)$ are straightforward and can be obtained by calculating the histogram of the classes. The main problem, however, is to estimate $p(x \mid \omega_i)$, given that the only information available is a set of EEG features and their corresponding labels. A common practice is to assume that the likelihood function is Gaussian, so that its parameters can be estimated from the dataset samples [39]. In the n-dimensional case, the Gaussian density of the EEG features from class $\omega_i$ can be calculated as

$$p(x \mid \omega_i) = \frac{1}{(2\pi)^{n/2}\, \lvert C_i \rvert^{1/2}} \exp\!\left(-\frac{1}{2} (x - \mu_i)^T C_i^{-1} (x - \mu_i)\right), \qquad (6)$$

in which $\mu_i$ and $C_i$ correspond to the mean vector (centroid) and the covariance matrix of class $\omega_i$, respectively.
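A minimal sketch of this maximum-a-posteriori rule, under the Gaussian assumption of Eq. (6), is given below. The paper does not state which implementation was used for its Bayesian classifier, so this SciPy-based version is only illustrative; allow_singular guards against covariance matrices that become singular after feature selection.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_bayes(X_tr, y_tr):
    """Estimate the prior P(w_i) from class frequencies and a Gaussian p(x|w_i) per class (Eq. (6))."""
    model = {}
    for c in np.unique(y_tr):
        Xc = X_tr[y_tr == c]
        model[c] = (len(Xc) / len(X_tr),            # prior P(w_i)
                    Xc.mean(axis=0),                # centroid mu_i
                    np.cov(Xc, rowvar=False))       # covariance C_i
    return model

def predict_bayes(X_ts, model):
    """Decision rule of Eq. (5): pick the class maximizing p(x|w_i) P(w_i)."""
    classes = sorted(model)
    scores = np.column_stack([
        prior * multivariate_normal(mean=mu, cov=C, allow_singular=True).pdf(X_ts)
        for prior, mu, C in (model[c] for c in classes)])
    return np.array(classes)[np.argmax(scores, axis=1)]
```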

Fig. 2. OPF classification phase: (a) testing sample (green) is connected to all training nodes of the optimum-path forest generated in the training phase, as displayed in Fig. 1d, (b) testing sample classified according to fmax (Eq. (1)). (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)


Multilayer artificial neural network: In a nutshell, an MLP classifier is a feedforward neural network composed of several layers of perceptrons, aiming to solve multiclass problems [40]. In this setting, the output of a neuron at the $i$th layer feeds the inputs of the neurons at the $(i+1)$th layer. The first layer, denoted by $A$, has $N_A$ neurons, where $N_A$ is the number of features of the feature vector, while the last layer, denoted by $Q$, has $N_Q$ neurons, which stands for the number of classes $c$. This neural network assigns an EEG feature $x$ to class $\omega_q$ if the $q$th output neuron achieves the highest value. Each input of a layer corresponds to a weighted sum of the outputs of the previous layer. Let $J-1$ denote the layer preceding $J$, such that each input $I_j^J$ in $J$ is given by

$$I_j^J = \sum_{k=1}^{N_{J-1}} w_{jk}\, O_k^{J-1}, \qquad (7)$$

where $O_k^{J-1} = \phi(I_k^{J-1})$, $j = 1, 2, \ldots, N_J$, with $N_J$ and $N_{J-1}$ being the number of neurons at layers $J$ and $J-1$, respectively, whereas $w_{jk}$ stands for the weight applied to the $k$th output of layer $J-1$, i.e., $O_k^{J-1}$. The backpropagation algorithm is usually employed to train MLP classifiers [40]. This algorithm minimizes the mean squared error $E_Q$ between the desired outputs $r_q$ and the obtained outputs $\Phi_q$ of each node of the output layer $Q$. Therefore, the idea is to minimize

$$E_Q = \frac{1}{N_Q} \sum_{q=1}^{N_Q} (r_q - \Phi_q)^2. \qquad (8)$$
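The forward pass of Eq. (7) and the error of Eq. (8) can be written in a few lines of NumPy, as sketched below. This is not the FANN code used in the paper: the tanh activation, the random weight initialization, and the omission of the backpropagation weight update are simplifications; only the layer sizes (number of features, 15 hidden neurons, 5 outputs) follow the architecture described next.

```python
import numpy as np

def forward(x, weights, phi=np.tanh):
    """Propagate one feature vector through the layers: I_j^J = sum_k w_jk O_k^{J-1} (Eq. (7))."""
    o = x
    for W in weights:          # one weight matrix per layer transition
        o = phi(W @ o)         # O^J = phi(I^J)
    return o

def mse(target, output):
    """Mean squared error of the output layer (Eq. (8))."""
    return np.mean((target - output) ** 2)

# Toy usage: n_features -> 15 hidden neurons -> 5 output neurons (one per class)
rng = np.random.default_rng(0)
n_features = 40
weights = [rng.normal(size=(15, n_features)), rng.normal(size=(5, 15))]
x = rng.normal(size=n_features)
target = np.eye(5)[2]          # one-hot desired output r_q for the third class
print(mse(target, forward(x, weights)))
```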

For the experiments reported in the next section, we have used the Fast Artificial Neural Network (FANN) library made available in [41]. This implementation supports both fully and sparsely connected networks arranged in different layers. In this work, we used the following empirically chosen architecture: a number of input neurons equal to the number of features, one hidden layer with fifteen neurons, and five output neurons corresponding to the five classes.

Support vector machines: While the learning of an MLP is based on the principle of empirical risk minimization, the SVM induction process is rooted in the principle of structural risk minimization [42–44], aiming at accomplishing a tradeoff between generalization and overfitting. For this purpose, Vapnik [42] considered the class of separating hyperplanes lying in some dot product space $\mathcal{H}$,

$$\langle w, x \rangle + b = 0, \qquad (9)$$

where $w, x \in \mathcal{H}$ and $b \in \mathbb{R}$, with the corresponding decision function

$$f(x) = \operatorname{sgn}(\langle w, x \rangle + b). \qquad (10)$$

In order to construct the optimal hyperplane via SVM, one should minimize the functional $\tau(w) = \frac{1}{2} \lVert w \rVert^2$ subject to $y_i(\langle w, x_i \rangle + b) \geq 1$, $\forall i \in \{1, \ldots, m\}$, where $m$ denotes the number of training examples and the inequality constraints ensure that $f(x_i)$ will be $+1$ for $y_i = +1$ and $-1$ for $y_i = -1$. To extend linear to nonlinear support vector machines, the kernel trick is used [43]. In a nutshell, kernels stand for functions that satisfy the constraints imposed by Mercer's theorem and that nonlinearly map the input data into high-dimensional feature spaces in a computationally efficient manner. In the experiments reported in the next section, the hyperparameters $C$ and $\sigma$ (we have employed a radial basis function, RBF, kernel) were determined via a 5-fold cross-validation grid search performed on each of the 10 training sets used in the 10-fold cross-validation process adopted to calculate the SVM classifier's accuracy, as explained later. When coping with classification problems with multiple classes, which is the case here, two approaches are usually employed to work with binary SVMs, one-against-one and one-against-all [32]. Both strategies generally lead to similar results in terms of classification accuracy, but the former, which was the one adopted in the experiments reported in this paper through the use of the LibSVM library [45], usually requires shorter training time, although incurring a higher number of binary decompositions.
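For reference, the grid search over C and σ described above can be reproduced along the following lines; scikit-learn's SVC wraps LibSVM, but the candidate grids, the variable names, and the use of scikit-learn itself are assumptions rather than the authors' exact setup (in this API, σ enters through the gamma parameter of the RBF kernel).

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative candidate grids (not the values used by the authors)
param_grid = {"C": [2 ** k for k in range(-2, 9)],
              "gamma": [2 ** k for k in range(-8, 3)]}
search = GridSearchCV(SVC(kernel="rbf", decision_function_shape="ovo"),
                      param_grid, cv=5)          # 5-fold inner grid search on one training fold
search.fit(X_train, y_train)                     # X_train/y_train: one outer training fold
y_pred = search.best_estimator_.predict(X_test)  # evaluate on the corresponding test fold
```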

3. Results and discussion

Like the related works surveyed in the introduction of this paper, for assessing the performance of the OPF classifier on the task of epilepsy diagnosis, we have employed the EEG data repository made publicly available by Andrzejak et al. [17]. The complete dataset consists of five sets (denoted A–E), each containing 100 single-channel EEG segments of 23.6 s. Table 1 gives a descriptive summary of the five classes. In some works [5,6,8,14], only sets A and E were used for assessing the classifiers' performance. Other works have considered three out of the five classes, namely, A, D, and E [9,46]. In this work, like in [47], we have considered the whole dataset of 500 EEG segments. This decision makes the classification problem much harder to solve by single classifiers in general (since the patterns from classes C and D are the most intertwined ones), but allowed us to better analyze the sensitivity of the OPF classifier's performance to the choice of the underlying similarity (distance) function and feature extractor. All classifiers considered in this study underwent 10-fold stratified cross-validation, with the same folds being used by each classifier. To evaluate their performance, an analysis was conducted based on the mean and standard deviation values of well-known metrics, which are described in the sequel. For all experiments reported here, a PC with an Intel i7 processor at 2.8 GHz and 4 GB of RAM, running Linux Ubuntu, was used.
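A sketch of this evaluation protocol, with the same 10 stratified folds reused for every classifier, could look as follows; the classifiers dictionary, the random seed, and the scikit-learn API are illustrative assumptions.

```python
from sklearn.model_selection import StratifiedKFold

# X: (500, n_features) feature matrix; y: labels of the five balanced classes
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
folds = list(skf.split(X, y))                  # freeze the partition once
for name, clf in classifiers.items():          # e.g. {"OPF": ..., "SVM-RBF": ..., "Bayesian": ..., "ANN-MLP": ...}
    scores = []
    for train_idx, test_idx in folds:          # identical folds for every classifier
        clf.fit(X[train_idx], y[train_idx])
        scores.append(clf.score(X[test_idx], y[test_idx]))
```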

Fig. 3. OPF training and classification steps.

Table 1. Description of the five classes available in the EEG dataset [17].

Class A: Signals obtained extra-cranially from surface EEG recordings of healthy individuals with eyes open.
Class B: Signals obtained extra-cranially from surface EEG recordings of healthy individuals with eyes closed.
Class C: Signals sampled intra-cranially (i.e., from the hippocampal formation of the opposite hemisphere of the brain) from unhealthy individuals during seizure-free intervals.
Class D: Signals sampled intra-cranially (i.e., directly from the epileptogenic zone) from unhealthy individuals during seizure-free intervals.
Class E: Signals obtained intra-cranially and related to seizure (ictal) activity.


3.1. Modeling the EEG signal classification task as an optimum-path forest problem

As aforementioned in Section 2.2.1, the OPF classifier models the task of pattern recognition as a graph partition problem, in which some key samples (EEG feature vectors), the prototypes, compete among themselves in order to conquer the remaining samples by offering them optimum-path costs. In this paper, each graph node is represented by an EEG feature vector, and there are at least five prototype samples, one for each class described in Table 1. In practice, however, we may have more than one prototype per class, which means that each EEG data class may be represented by several OPTs. Fig. 4 illustrates this procedure. The main idea is to perform a feature extraction procedure over the EEG signal repository, such that each graph node is encoded by an EEG feature vector. Further, the dataset is partitioned into a training and a testing set, the former being used to train the OPF classifier and the latter to assess the effectiveness of OPF. Looking at the red module ("OPF training step") in Fig. 4 in more detail, one can see the OPF training procedure, in which the nearest EEG features with different labels, i.e., classes "A", "B", "C", "D", or "E", are marked as prototypes. Fig. 5 depicts an example with one OPT for each EEG class. In regard to the classification step, an EEG feature vector is associated with the OPT of the sample (EEG feature) that has conquered it. In situations in which there is more than one optimum path, the EEG feature that reaches the test sample first conquers it.

3.2. Statistical measures

Fig. 4. EEG signal recognition as an OPF classification problem. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)

In order to analyze the classifiers' performance, four well-known measures were employed, viz. accuracy, sensitivity, positive predictive value, and F-measure, whose definitions are briefly recapitulated below.

Fig. 5. Optimum-path forest generated at the OPF training step: in this example we have one OPT for each EEG class.


Accuracy (Acc) is defined as the ratio between the number of EEG patterns correctly classified and the total number of EEG patterns:

$$\text{Accuracy} = \frac{\#\,\text{EEG patterns correctly classified}}{\#\,\text{EEG patterns}}. \qquad (11)$$

Sensitivity (Se), also known as recall, is the ratio between the number of correctly classified EEG patterns of a given class and the total number of EEG patterns available for that class, including those that were misclassified, that is,

$$\text{Sensitivity} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}, \qquad (12)$$

in which true positives and false negatives stand for the number of EEG patterns of a given class correctly and incorrectly classified, respectively.

Positive predictive value (Ppv), also known as precision, is the ratio between the correctly classified patterns of a specific class and all EEG patterns classified as pertaining to that class, that is,

$$\text{Positive predictive value} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}, \qquad (13)$$

in which false positives denote the number of EEG patterns incorrectly classified as belonging to the considered class.

The F-measure (Fm) for a given class is calculated as the harmonic mean of the Se and Ppv values for that class, resulting in a more global parameter for evaluating the performance of the classifier on each class. More formally:

$$\text{Fm} = 2\, \frac{\text{Se} \cdot \text{Ppv}}{\text{Se} + \text{Ppv}}. \qquad (14)$$

These four metrics allow us to evaluate the performance of the classification algorithms considered in this work with great reliability. Since we are dealing with a multiclass classification problem, the measures Se, Ppv, and Fm were calculated for each class separately.
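These per-class quantities follow directly from the confusion matrix, as the hedged sketch below shows (scikit-learn is assumed only for building the matrix itself):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def per_class_metrics(y_true, y_pred, labels):
    """Accuracy (Eq. (11)) plus per-class Se, Ppv and Fm (Eqs. (12)-(14)) from the confusion matrix."""
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    tp = np.diag(cm).astype(float)
    se = tp / cm.sum(axis=1)          # recall: TP / (TP + FN)
    ppv = tp / cm.sum(axis=0)         # precision: TP / (TP + FP)
    fm = 2 * se * ppv / (se + ppv)    # harmonic mean of Se and Ppv
    acc = tp.sum() / cm.sum()
    return acc, dict(zip(labels, zip(se, ppv, fm)))
```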

Table 2. Characterization of the derived EEG datasets. All datasets contain 100 samples per class.

Dataset | Wavelet | Normalization | Feature selection (FS) | # Features
Coif2-non_norm | Coiflets, order 2 | No | None | 40
Coif2-noFS | Coiflets, order 2 | Yes | None | 40
Coif2-cfss | Coiflets, order 2 | Yes | Cfss | 15
Coif2-infogain | Coiflets, order 2 | Yes | InfoGain | 10
Coif2-relief | Coiflets, order 2 | Yes | Relief | 10
Coif3-non_norm | Coiflets, order 3 | No | None | 40
Coif3-noFS | Coiflets, order 3 | Yes | None | 40
Coif3-cfss | Coiflets, order 3 | Yes | Cfss | 15
Coif3-infogain | Coiflets, order 3 | Yes | InfoGain | 10
Coif3-relief | Coiflets, order 3 | Yes | Relief | 10
Coif4-non_norm | Coiflets, order 4 | No | None | 40
Coif4-noFS | Coiflets, order 4 | Yes | None | 40
Coif4-cfss | Coiflets, order 4 | Yes | Cfss | 15
Coif4-infogain | Coiflets, order 4 | Yes | InfoGain | 10
Coif4-relief | Coiflets, order 4 | Yes | Relief | 10
Db2-non_norm | Daubechies, order 2 | No | None | 40
Db2-noFS | Daubechies, order 2 | Yes | None | 40
Db2-cfss | Daubechies, order 2 | Yes | Cfss | 15
Db2-infogain | Daubechies, order 2 | Yes | InfoGain | 10
Db2-relief | Daubechies, order 2 | Yes | Relief | 10
Db3-non_norm | Daubechies, order 3 | No | None | 40
Db3-noFS | Daubechies, order 3 | Yes | None | 40
Db3-cfss | Daubechies, order 3 | Yes | Cfss | 17
Db3-infogain | Daubechies, order 3 | Yes | InfoGain | 10
Db3-relief | Daubechies, order 3 | Yes | Relief | 10
Db4-non_norm | Daubechies, order 4 | No | None | 40
Db4-noFS | Daubechies, order 4 | Yes | None | 40
Db4-cfss | Daubechies, order 4 | Yes | Cfss | 22
Db4-infogain | Daubechies, order 4 | Yes | InfoGain | 10
Db4-relief | Daubechies, order 4 | Yes | Relief | 10
Haar-non_norm | Haar | No | None | 40
Haar-noFS | Haar | Yes | None | 40
Haar-cfss | Haar | Yes | Cfss | 19
Haar-infogain | Haar | Yes | InfoGain | 10
Haar-relief | Haar | Yes | Relief | 10
Sym2-non_norm | Symlet, order 2 | No | None | 40
Sym2-noFS | Symlet, order 2 | Yes | None | 40
Sym2-cfss | Symlet, order 2 | Yes | Cfss | 15
Sym2-infogain | Symlet, order 2 | Yes | InfoGain | 10
Sym2-relief | Symlet, order 2 | Yes | Relief | 10
Sym3-non_norm | Symlet, order 3 | No | None | 40
Sym3-noFS | Symlet, order 3 | Yes | None | 40
Sym3-cfss | Symlet, order 3 | Yes | Cfss | 19
Sym3-infogain | Symlet, order 3 | Yes | InfoGain | 10
Sym3-relief | Symlet, order 3 | Yes | Relief | 10
Sym4-non_norm | Symlet, order 4 | No | None | 40
Sym4-noFS | Symlet, order 4 | Yes | None | 40
Sym4-cfss | Symlet, order 4 | Yes | Cfss | 22
Sym4-infogain | Symlet, order 4 | Yes | InfoGain | 10
Sym4-relief | Symlet, order 4 | Yes | Relief | 10


Regarding efficiency, the training and test run times associated with each classifier were also recorded and evaluated in terms of mean and standard deviation, taken over the 10 cross-validation folds. All measures were calculated for each combination of feature extractor (Coiflet, Daubechies, Haar, and Symlet wavelets) and selector (Relief, InfoGain, and Cfss) considered. All results were evaluated, for OPF and the other classifiers, over the 50 datasets, divided into: (i) not normalized and without feature selection (non_norm); (ii) normalized and without feature selection (noFS); and (iii) normalized with feature selection (FS), as shown in Table 2.

3.3. Impact of the distance measure and feature extractor/selector on the performance of the OPF classifier

One of the most important characteristics of the OPF classifier is the possibility of working with different distance measures to calculate the edge weights of the pattern graphs used in the training and classification phases [27]. Up to now, we have experimented with seven well-known metrics, namely, Euclidean, Chi-Squared, Manhattan, Squared Chord, Squared Chi-square, Canberra, and Bray-Curtis, but, due to space restrictions, we focus our analysis on those that yielded the best overall performance for the OPF classifier, namely, Euclidean, Manhattan, Squared Chi-square, and Canberra. In many situations, the OPF classifier's accuracy can be highly influenced by the relationship between the adopted distance measure and some characteristics of the classification problem at hand, such as the level of nonlinearity of the class boundaries, the feature dimensionality, and class-size imbalance. In particular, the performance of the OPF classifier may decline significantly when datasets are highly imbalanced [26,48]. In this work, as seen in Table 2, all datasets are well balanced.
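For reference, the four best-performing distance functions can be written as follows; the Euclidean and Manhattan forms are standard, while the Canberra and Squared Chi-square variants shown here follow common textbook definitions and may differ in small details (e.g., the zero-division guard) from the exact formulas implemented in LibOPF.

```python
import numpy as np

# u and v are EEG feature vectors of equal length
def euclidean(u, v):
    return np.sqrt(np.sum((u - v) ** 2))

def manhattan(u, v):
    return np.sum(np.abs(u - v))

def canberra(u, v):
    return np.sum(np.abs(u - v) / (np.abs(u) + np.abs(v) + 1e-12))

def squared_chi_square(u, v):
    return np.sum((u - v) ** 2 / (np.abs(u) + np.abs(v) + 1e-12))
```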

Table 3. Accuracy and training and test run times delivered by different settings of the OPF classifier. Only the best mean accuracies are shown; standard deviations are given in parentheses.

Dataset | Acc [%] | Train time [ms] | Test time [ms] | Distance
Coif2-non_norm | 76.6(78) | 8.99(3.12) | 0.25(0.15) | Squared Chi-square
Coif2-noFS | 81.4(58) | 10.13(2.51) | 0.28(0.09) | Manhattan
Coif2-cfss | 80.4(51) | 9.99(2.12) | 0.25(0.10) | Manhattan
Coif2-infogain | 80.2(33) | 9.00(3.66) | 0.24(0.14) | Manhattan
Coif2-relief | 78.2(70) | 11.47(0.27) | 0.36(0.11) | Manhattan
Coif3-non_norm | 78.8(67) | 9.13(3.36) | 0.27(0.10) | Squared Chi-square
Coif3-noFS | 82.2(35) | 9.04(2.50) | 0.27(0.08) | Manhattan
Coif3-cfss | 80.8(69) | 7.91(2.24) | 0.26(0.08) | Euclidean
Coif3-infogain | 80.2(61) | 9.18(3.12) | 0.26(0.09) | Canberra
Coif3-relief | 82.0(52) | 9.32(2.90) | 0.26(0.10) | Manhattan
Coif4-non_norm | 80.2(52) | 9.64(3.02) | 0.31(0.09) | Squared Chi-square
Coif4-noFS | 82.2(71) | 9.24(2.70) | 0.26(0.10) | Manhattan
Coif4-cfss | 87.4(45) | 9.79(2.65) | 0.28(0.08) | Canberra
Coif4-infogain | 80.0(37) | 9.08(2.39) | 0.28(0.09) | Euclidean
Coif4-relief | 81.4(27) | 7.21(2.77) | 0.26(0.13) | Manhattan
Db2-non_norm | 75.8(56) | 8.01(3.16) | 0.23(0.10) | Squared Chi-square
Db2-noFS | 74.4(39) | 8.20(2.88) | 0.15(0.10) | Manhattan
Db2-cfss | 78.0(67) | 9.12(3.02) | 0.29(0.11) | Manhattan
Db2-infogain | 71.2(45) | 8.14(2.87) | 0.16(0.10) | Euclidean
Db2-relief | 76.2(48) | 9.82(2.83) | 0.29(0.11) | Manhattan
Db3-non_norm | 75.8(35) | 10.03(2.50) | 0.30(0.08) | Manhattan
Db3-noFS | 76.4(62) | 8.19(3.59) | 0.17(0.10) | Manhattan
Db3-cfss | 75.6(51) | 9.25(2.78) | 0.18(0.11) | Manhattan
Db3-infogain | 75.4(53) | 10.28(2.75) | 0.32(0.10) | Manhattan
Db3-relief | 75.8(56) | 6.96(3.16) | 0.26(0.11) | Manhattan
Db4-non_norm | 75.2(51) | 9.09(2.99) | 0.25(0.13) | Squared Chi-square
Db4-noFS | 77.8(55) | 10.26(2.12) | 0.33(0.07) | Manhattan
Db4-cfss | 75.6(61) | 8.69(3.16) | 0.25(0.11) | Manhattan
Db4-infogain | 75.2(58) | 9.69(1.80) | 0.35(0.05) | Euclidean
Db4-relief | 76.0(52) | 11.15(0.08) | 0.35(0.07) | Manhattan
Haar-non_norm | 75.4(66) | 9.54(2.57) | 0.27(0.10) | Manhattan
Haar-noFS | 73.4(60) | 10.89(0.40) | 0.31(0.10) | Manhattan
Haar-cfss | 71.4(57) | 10.95(0.22) | 0.41(0.15) | Manhattan
Haar-infogain | 73.2(44) | 11.13(0.41) | 0.34(0.10) | Manhattan
Haar-relief | 74.6(47) | 11.25(0.43) | 0.39(0.12) | Manhattan
Sym2-non_norm | 75.8(56) | 10.76(1.94) | 0.32(0.16) | Squared Chi-square
Sym2-noFS | 74.4(39) | 10.93(0.37) | 0.33(0.07) | Manhattan
Sym2-cfss | 78.0(67) | 10.96(0.31) | 0.35(0.07) | Manhattan
Sym2-infogain | 71.6(45) | 8.80(2.47) | 0.29(0.09) | Euclidean
Sym2-relief | 74.4(69) | 11.20(0.51) | 0.35(0.07) | Euclidean
Sym3-non_norm | 75.8(35) | 11.34(0.61) | 0.33(0.09) | Manhattan
Sym3-noFS | 76.4(62) | 9.18(3.47) | 0.30(0.09) | Manhattan
Sym3-cfss | 72.6(71) | 10.83(0.78) | 0.33(0.09) | Manhattan
Sym3-infogain | 75.4(53) | 10.72(1.71) | 0.37(0.03) | Manhattan
Sym3-relief | 75.8(56) | 10.51(2.13) | 0.31(0.10) | Manhattan
Sym4-non_norm | 77.2(57) | 11.14(0.93) | 0.36(0.07) | Manhattan
Sym4-noFS | 77.0(44) | 7.76(3.46) | 0.24(0.09) | Manhattan
Sym4-cfss | 79.0(43) | 11.01(0.44) | 0.37(0.13) | Manhattan
Sym4-infogain | 71.4(71) | 10.99(0.90) | 0.32(0.09) | Manhattan
Sym4-relief | 74.6(92) | 11.28(0.57) | 0.31(0.10) | Manhattan


Table 3 shows the best mean results of accuracy (in 0.1%) and training and test run times (in ms) for each dataset (the complete results are given in Table 12 in the Appendix), together with the distance measure that achieved the best result; the respective standard deviations are shown in parentheses. These values were delivered by the OPF classifier using the best four distance measures. The reader can reproduce such an evaluation by following the steps below:

- compute the distances among all pairs of dataset nodes using the program opf_distance;
- generate training and test samples with opf_split;
- train OPF using opf_train;
- classify the test samples using opf_classify;
- evaluate the results by computing the recognition rate with opf_accuracy.

Notice that all the aforementioned programs are available in LibOPF 2.0 [34]. The best mean Acc values were obtained using Coiflets of order 4 (Coif4) for all three types of datasets, with the Coif4-cfss setting prevailing over the others when the Manhattan distance was used (89.2%). For non-normalized datasets, the best results were obtained with the Squared Chi-square distance (80.2%), whereas for the noFS and FS datasets the best accuracy rates were achieved with the Manhattan distance, namely, 82.2% (Coif3-noFS) and 89.2% (Coif4-cfss), respectively. These results are in agreement with those reported by Gandhi et al. [31] when working with PNNs, indicating that Coiflets are indeed good filters for characterizing EEG signals for the purpose of epilepsy diagnosis; in our case, the same conclusion holds, with Coiflets leading to the best results delivered by the OPF classifier.


On the other hand, the fact that the best distance measures for non-normalized and normalized datasets were not the same testifies to how important the choice of the distance function is for maximizing the effectiveness of the OPF classifier. In fact, when working with the Chi-Squared, Squared Chord, and Bray-Curtis distance measures, the results achieved were very poor, not allowing a full discrimination among the classes. The run times of each machine learning technique employed in this paper were also analyzed. The average OPF training time varied from 6.31 to 11.47 ms, while the average SVM training time varied from 27,066.14 to 22,928,221.68 ms, the average Bayesian training time varied from 5.54 to 16.91 ms, and the average ANN-MLP training time varied from 16,077.47 to 19,690.59 ms. Therefore, OPF was much faster to train than the ANN-MLP and SVM classifiers, and similar to the Bayesian classifier in this phase. In regard to the classification step, OPF was much faster than all the other techniques (up to 20 times), except for ANN-MLP, which was slightly faster than OPF. In general, the best sensitivity, positive predictive value, and F-measure results (shown in Tables 13, 14, and 15, respectively) were also obtained with the Coif4-cfss plus Manhattan distance setting, evidencing that this is a powerful configuration of the OPF classifier for the task of epilepsy diagnosis via EEG signal classification. Indeed, with this configuration, the OPF classifier yielded precision values of 92%, 88%, and 81% for classes A, C, and D, respectively, as well as sensitivity values of 93%, 92%, 83%, and 83% for classes A, B, C, and D, respectively.

Fig. 6. Two-dimensional distributions produced by applying PCA on the Coif4-cfss dataset, considering: (a) all classes; (b) normal patients; (c) epileptic patients; and (d) patients with similar seizure levels (classes C and D).


For class B, the Canberra distance was slightly better than the Manhattan distance in terms of precision, whereas for class E the sensitivity value was slightly higher (by 2 percentage points) when using the Chi-squared distance in place of the Manhattan distance. With respect to the F-measure, one can observe that class E was associated with the highest Fm values, evidencing that the signals of this class are easier to discriminate from those of the other classes; this result corroborates those reported in [15,16]. On the other hand, the Fm values achieved for classes C and D were the lowest. As mentioned before, these classes are the most intertwined ones in the original dataset and thus, in principle, the most difficult to discriminate from each other. Even so, the best calibrated OPF classifier setting could deliver satisfactory levels of precision and recall for them. Fig. 6(a–d) illustrates the two-dimensional distributions of the samples of the Coif4-cfss dataset, which is the case where the OPF classifier (configured with the Manhattan distance) achieved the best overall performance, according to the aforementioned discussion. These plots were produced by considering the two best features elicited via principal component analysis (PCA).

Fig. 6(a) considers the samples of all five classes, while Fig. 6(b) focuses on normal patients (classes A and B), Fig. 6(c) considers epileptic patients (classes C, D, and E), and Fig. 6(d) takes into account the discrimination between classes C and D. As one can see, class E is made more separable from the others by the data preprocessing stage, helping the OPF classifier to correctly classify the majority of its samples. In contrast, the discrimination between classes C and D is still far from straightforward, due to their strong overlap, which evidences that the satisfactory levels of precision/recall achieved by the OPF classifier for these classes were mainly due to its distinctive logic of operation.
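Plots in the spirit of Fig. 6 can be generated by projecting the selected dataset onto its two leading principal components, as sketched below; the variable names and the use of scikit-learn/matplotlib are assumptions, not the authors' plotting code.

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# X_coif4_cfss: (500, 15) feature matrix of the Coif4-cfss dataset; y: class labels "A"-"E"
pcs = PCA(n_components=2).fit_transform(X_coif4_cfss)
for cls in "ABCDE":
    idx = (y == cls)
    plt.scatter(pcs[idx, 0], pcs[idx, 1], label=f"class {cls}", s=10)
plt.legend(); plt.xlabel("PC 1"); plt.ylabel("PC 2"); plt.show()
```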

3.4. Evaluation of SVM-RBF, Bayesian, and ANN-MLP classifiers

Table 4 shows the mean accuracy rates and training and test run times obtained by the best classifier for each dataset (notice that only the best results are shown; the complete experimental evaluation is displayed in Table 16).

Table 4. Average scores of accuracy and training and test run times (in ms) obtained by the best classifier for each dataset; standard deviations are given in parentheses.

Dataset | Acc [%] | Train time [ms] | Test time [ms] | Classifier
Coif2-non_norm | 70.8(54) | 11.95(3.6) | 6.13(0.5) | Bayesian
Coif2-noFS | 81.8(77) | 18,719.68(253.6) | 9.39(2.1) | SVM-RBF
Coif2-cfss | 80.4(51) | 7.99(2.7) | 5.49(2.2) | Bayesian
Coif2-infogain | 84.4(65) | 12,948.55(212.2) | 4.35(0.2) | SVM-RBF
Coif2-relief | 79.4(50) | 13,567.97(286.9) | 4.44(0.2) | SVM-RBF
Coif3-non_norm | 70.0(58) | 13.66(4.9) | 5.96(0.3) | Bayesian
Coif3-noFS | 80.6(65) | 12,626.86(147.4) | 9.28(1.2) | SVM-RBF
Coif3-cfss | 81.2(41) | 13,913.29(123.4) | 5.16(0.2) | SVM-RBF
Coif3-infogain | 81.0(51) | 4.98(2.3) | 3.85(1.6) | Bayesian
Coif3-relief | 81.6(72) | 6.01(2.7) | 4.77(2.0) | Bayesian
Coif4-non_norm | 72.6(73) | 10.09(2.8) | 6.25(0.7) | Bayesian
Coif4-noFS | 79.2(83) | 18,547.42(167.2) | 9.61(1.1) | SVM-RBF
Coif4-cfss | 87.4(50) | 7.62(2.8) | 5.29(1.4) | Bayesian
Coif4-infogain | 80.8(56) | 12,854.24(187.9) | 4.49(0.2) | SVM-RBF
Coif4-relief | 82.6(38) | 6.30(2.0) | 4.79(1.2) | Bayesian
Db2-non_norm | 72.0(71) | 13.51(4.7) | 6.78(2.3) | Bayesian
Db2-noFS | 75.2(61) | 21,427.69(279.0) | 13.67(1.6) | SVM-RBF
Db2-cfss | 78.3(54) | 17,008.38(214.1) | 6.60(0.9) | SVM-RBF
Db2-infogain | 70.8(58) | 16,636.12(311.9) | 4.80(0.2) | SVM-RBF
Db2-relief | 75.2(47) | 6.23(2.4) | 4.99(1.7) | Bayesian
Db3-non_norm | 72.0(47) | 14.57(4.2) | 5.88(0.0) | Bayesian
Db3-noFS | 67.4(82) | 21,967.88(110.6) | 12.84(1.1) | SVM-RBF
Db3-cfss | 72.0(66) | 8.99(2.6) | 6.46(1.7) | Bayesian
Db3-infogain | 75.2(49) | 5.76(2.2) | 4.54(1.7) | Bayesian
Db3-relief | 74.8(52) | 5.54(2.4) | 4.36(1.9) | Bayesian
Db4-non_norm | 71.2(83) | 12.76(4.7) | 6.12(0.6) | Bayesian
Db4-noFS | 71.6(49) | 21,973.33(162.7) | 12.17(1.3) | SVM-RBF
Db4-cfss | 75.0(58) | 17,870.59(197.1) | 9.01(1.4) | SVM-RBF
Db4-infogain | 75.4(57) | 6.43(2.0) | 5.08(1.7) | Bayesian
Db4-relief | 73.8(52) | 7.82(0.03) | 5.52(1.2) | Bayesian
Haar-non_norm | 72.2(61) | 16.81(2.5) | 5.90(0.1) | Bayesian
Haar-noFS | 72.0(60) | 21,657.75(167.4) | 12.13(2.4) | SVM-RBF
Haar-cfss | 70.2(45) | 17,405.06(197.1) | 6.24(0.3) | SVM-RBF
Haar-infogain | 74.2(47) | 15,398.97(196.8) | 4.74(0.3) | SVM-RBF
Haar-relief | 73.4(48) | 7.45(0.9) | 5.87(0.8) | Bayesian
Sym2-non_norm | 72.0(71) | 16.78(3.0) | 5.95(0.3) | Bayesian
Sym2-noFS | 75.2(61) | 21,436.12(208.3) | 12.47(1.4) | SVM-RBF
Sym2-cfss | 73.0(54) | 16,820.45(250.3) | 6.41(1.2) | SVM-RBF
Sym2-infogain | 71.0(39) | 7.44(0.9) | 5.71(1.0) | Bayesian
Sym2-relief | 73.6(54) | 16,076.74(274.9) | 5.08(0.4) | SVM-RBF
Sym3-non_norm | 72.0(53) | 14.65(2.8) | 6.07(0.4) | Bayesian
Sym3-noFS | 69.4(75) | 22,005.92(162.1) | 13.46(2.2) | SVM-RBF
Sym3-cfss | 68.8(66) | 18,496.28(173.0) | 6.72(0.7) | SVM-RBF
Sym3-infogain | 75.2(49) | 6.85(1.9) | 5.18(1.4) | Bayesian
Sym3-relief | 74.8(52) | 7.39(1.4) | 6.06(1.1) | Bayesian
Sym4-non_norm | 73.8(68) | 15.97(2.1) | 5.84(0.02) | Bayesian
Sym4-noFS | 72.6(56) | 22,086.40(193.6) | 13.36(1.6) | SVM-RBF
Sym4-cfss | 72.8(41) | 17,649.64(143.8) | 7.21(0.6) | SVM-RBF
Sym4-infogain | 76.4(41) | 15,855.21(328.4) | 5.03(0.2) | SVM-RBF
Sym4-relief | 73.0(57) | 7.86(0.05) | 5.37(0.9) | Bayesian


It can be noticed that data normalization is an essential pre-processing operation for the SVM-RBF and ANN-MLP algorithms, since their levels of performance on the non_norm datasets were usually very poor. This behavior cannot be attributed to the dimensionality of the data, since SVM-RBF in particular is not sensitive to high-dimensional data [43]. In contrast, when working with normalized data, a significant increase in the accuracy rates of these classifiers can be observed. Another drawback is that these classifiers are very sensitive to the calibration of their control parameters, which is not the case for the Bayesian and OPF classifiers. For ANN-MLP, the values of the number of hidden layers, neurons per layer, and epochs were the same for all 50 datasets, being set up through preliminary experiments, as in previous work [19]. For the tuning of the SVM-RBF hyperparameters (C and σ), a grid search was performed on each training fold of each dataset. This resulted in higher training times for the SVM models, mainly for non-normalized datasets.


Overall, the Bayesian classifier with the Coif4-cfss preprocessing presented the best accuracy rate (87.4%), as well as the best training and test run times (7.65 and 5.29 ms, respectively). SVM-RBF presented the second best average accuracy, reaching 84.4% when configured with Coif2-infogain; its average training and test run times were 12,948.55 ms and 4.35 ms, respectively. On the other hand, the best average accuracy obtained by an ANN-MLP classifier was on the Coif2-cfss dataset, viz. 70.6%, with training and test run times of 16,978.41 ms and 0.13 ms, respectively. For the noFS datasets, SVM achieved the best accuracy among the contestants, namely 81.8%, making use of the Coif2 wavelet. Once again, in general, Coiflets yielded the best results for normalized datasets. For non-normalized datasets, the Bayesian classifier was far better than both SVM-RBF and ANN-MLP, reaching 73.8% on the Sym4-non_norm dataset.

Table 5. Sensitivity scores (in %, by class) delivered by different settings of the SVM-RBF, Bayesian, and ANN-MLP classifiers; standard deviations are shown in parentheses. In each row below, the 50 values follow the dataset order of Tables 2–4, i.e., Coif2-non_norm, Coif2-noFS, Coif2-cfss, Coif2-infogain, Coif2-relief, then the corresponding five variants of Coif3, Coif4, Db2, Db3, Db4, Haar, Sym2, Sym3, and Sym4.

SVM-RBF, class A: 72(40) 91(13) 90(13) 94(5) 85(7) 8(19) 89(12) 84(8) 89(7) 86(8) 56(22) 86(13) 92(10) 92(8) 84(10) 87(11) 83(11) 76(16) 68(15) 83(12) 86(13) 80(11) 80(13) 85(11) 79(17) 81(21) 83(11) 84(5) 85(10) 86(13) 82(14) 72(13) 67(15) 74(11) 77(14) 87(11) 83(11) 76(16) 69(14) 82(10) 86(13) 81(10) 74(16) 85(11) 79(17) 81(10) 81(16) 79(14) 80(12) 68(19)
SVM-RBF, class B: 23(28) 90(12) 89(10) 90(11) 87(12) 12(16) 90(8) 91(10) 86(13) 87(12) 4(10) 87(7) 91(6) 86(8) 86(12) 42(11) 83(11) 79(14) 86(14) 75(13) 39(16) 66(16) 86(10) 87(8) 84(12) 33(13) 74(11) 84(8) 83(8) 77(19) 47(12) 81(11) 78(15) 84(11) 83(7) 42(11) 83(11) 79(14) 84(14) 78(8) 39(16) 69(15) 78(12) 87(8) 84(12) 42(12) 72(11) 74(12) 87(12) 60(11)
SVM-RBF, class C: 20(42) 68(11) 74(20) 76(18) 73(16) 90(25) 67(13) 65(19) 74(10) 76(16) 86(31) 69(16) 72(15) 77(13) 64(13) 91(11) 57(17) 70(15) 66(16) 61(13) 88(11) 55(14) 59(14) 57(16) 64(13) 28(45) 63(8) 63(13) 57(18) 54(20) 69(17) 64(17) 71(12) 75(13) 58(11) 91(11) 57(17) 70(15) 63(14) 56(11) 88(11) 57(8) 65(17) 57(16) 64(13) 86(13) 63(16) 61(13) 70(16) 47(14)
SVM-RBF, class D: 3(7) 63(16) 52(18) 66(16) 55(8) 2(6) 61(17) 71(17) 55(16) 47(21) 1(3) 59(18) 58(12) 52(19) 63(19) 2(6) 58(12) 45(14) 43(11) 54(14) 1(3) 44(20) 40(13) 50(18) 54(20) 4(13) 44(14) 48(19) 47(19) 52(19) 27(7) 47(14) 41(17) 41(17) 48(14) 2(6) 58(12) 45(14) 40(16) 55(14) 1(3) 47(18) 36(12) 50(18) 54(20) 0(0) 57(13) 60(16) 48(4) 41(16)
SVM-RBF, class E: 10(32) 97(5) 96(7) 96(7) 97(7) 10(32) 96(7) 95(10) 97(5) 95(11) 10(32) 95(10) 95(7) 97(5) 97(7) 0(0) 95(7) 95(7) 91(9) 96(7) 0(0) 92(8) 94(10) 94(8) 91(11) 0(0) 94(7) 96(5) 96(7) 94(7) 0(0) 96(5) 94(5) 97(7) 95(5) 0(0) 95(7) 95(7) 91(9) 97(7) 0(0) 93(8) 91(7) 94(8) 91(11) 0(0) 90(8) 90(8) 97(5) 99(3)
Bayesian, class A: 81(9) 82(14) 89(10) 85(10) 86(12) 78(12) 80(14) 87(9) 91(10) 86(11) 82(12) 71(17) 91(9) 84(11) 86(16) 81(12) 76(10) 75(19) 70(15) 85(8) 82(15) 80(12) 76(16) 81(12) 83(13) 77(12) 82(10) 81(11) 78(15) 82(15) 83(14) 73(13) 63(12) 63(22) 82(10) 81(12) 76(10) 75(19) 70(15) 80(12) 82(15) 79(13) 69(24) 81(12) 83(13) 83(11) 73(21) 80(12) 72(21) 80(15)
Bayesian, class B: 80(11) 84(10) 86(12) 87(8) 86(13) 82(9) 85(11) 86(13) 87(11) 86(8) 85(10) 80(8) 92(6) 87(11) 87(8) 80(9) 66(12) 76(13) 79(14) 70(9) 81(10) 60(13) 83(12) 85(11) 81(11) 79(12) 65(14) 66(8) 88(6) 83(16) 80(13) 59(14) 73(12) 74(15) 75(16) 80(9) 66(12) 76(13) 79(14) 72(12) 81(10) 60(13) 78(14) 85(11) 81(11) 83(11) 54(18) 76(11) 78(9) 81(15)
Bayesian, class C: 56(17) 67(16) 71(18) 68(12) 70(15) 51(13) 73(12) 72(17) 66(7) 78(12) 56(13) 62(15) 83(14) 71(12) 76(8) 60(18) 55(16) 68(20) 66(14) 67(11) 58(15) 56(15) 63(15) 65(15) 66(18) 62(14) 66(12) 62(14) 68(17) 62(21) 61(17) 65(18) 60(15) 62(13) 63(11) 60(18) 55(16) 68(20) 67(13) 61(12) 58(15) 56(15) 58(15) 65(15) 66(18) 58(22) 57(16) 66(18) 67(15) 58(13)
Bayesian, class D: 46(22) 53(20) 61(12) 68(18) 54(16) 47(12) 64(14) 66(13) 65(13) 62(15) 48(14) 56(12) 75(17) 64(8) 68(15) 45(27) 51(17) 40(11) 47(19) 60(12) 47(19) 48(9) 43(16) 51(17) 54(14) 47(22) 45(14) 51(13) 47(16) 48(15) 45(13) 48(20) 49(12) 58(15) 51(11) 45(27) 51(17) 40(11) 47(19) 56(13) 47(19) 48(9) 42(17) 51(17) 54(14) 54(20) 52(14) 56(18) 49(12) 54(17)
Bayesian, class E: 91(9) 93(9) 95(8) 95(8) 95(7) 92(8) 91(7) 95(8) 96(7) 96(8) 92(11) 89(10) 96(7) 95(8) 96(7) 94(8) 84(11) 89(12) 91(12) 94(7) 92(8) 85(15) 95(10) 94(7) 90(11) 91(12) 88(6) 89(9) 96(7) 94(7) 92(9) 91(7) 93(8) 93(8) 96(5) 94(8) 84(11) 89(12) 92(9) 93(7) 92(8) 85(15) 94(10) 94(7) 90(11) 91(9) 86(12) 84(15) 92(8) 92(10)
ANN-MLP, class A: 4(8) 68(15) 80(20) 48(18) 72(29) 29(42) 59(26) 63(26) 41(25) 79(17) 26(37) 53(28) 58(25) 56(28) 56(35) 16(22) 66(21) 69(19) 22(34) 76(14) 24(30) 60(13) 71(14) 80(9) 73(19) 50(39) 70(19) 67(9) 71(12) 78(15) 12(25) 60(19) 64(14) 69(14) 73(17) 49(47) 64(22) 71(14) 68(14) 66(16) 53(44) 55(16) 74(14) 76(8) 81(9) 46(45) 56(20) 69(13) 71(9) 42(26)
ANN-MLP, class B: 46(50) 76(13) 80(18) 75(12) 79(20) 24(30) 61(26) 77(18) 62(27) 76(15) 36(47) 61(26) 78(18) 80(18) 75(30) 29(39) 62(21) 84(11) 34(37) 66(13) 28(38) 57(21) 70(12) 77(8) 80(12) 23(35) 49(23) 58(19) 72(17) 74(24) 57(41) 67(18) 64(15) 75(16) 70(16) 30(48) 55(17) 72(21) 63(17) 68(18) 24(31) 46(21) 66(12) 72(17) 78(8) 32(37) 56(18) 58(15) 67(16) 43(17)
ANN-MLP, class C: 49(45) 63(13) 77(17) 42(25) 76(15) 56(48) 72(15) 65(30) 44(28) 68(21) 37(43) 49(36) 74(24) 39(26) 59(34) 49(47) 48(14) 61(24) 33(40) 58(17) 62(44) 37(13) 55(16) 51(17) 49(18) 19(34) 47(11) 58(19) 53(16) 52(20) 48(48) 44(13) 56(18) 57(25) 44(14) 40(48) 46(19) 62(18) 51(20) 57(13) 22(32) 48(12) 50(19) 57(16) 45(21) 43(43) 45(20) 42(16) 64(19) 39(26)
ANN-MLP, class D: 10(32) 15(11) 27(20) 37(29) 16(13) 0(0) 21(17) 27(20) 46(29) 26(16) 18(38) 38(34) 30(17) 39(25) 27(22) 34(46) 38(18) 44(16) 26(35) 53(16) 20(42) 37(11) 37(13) 31(22) 40(16) 11(31) 34(20) 36(19) 28(13) 34(17) 2(6) 43(9) 37(14) 28(15) 45(17) 0(0) 39(20) 47(14) 33(16) 49(14) 18(32) 25(17) 36(18) 32(18) 56(22) 10(18) 36(14) 53(19) 25(12) 13(17)
ANN-MLP, class E: 50(38) 86(15) 89(12) 92(11) 91(9) 62(42) 83(16) 84(27) 87(21) 93(8) 37(39) 77(25) 95(8) 89(11) 95(7) 20(27) 81(10) 88(11) 61(28) 84(13) 24(30) 72(15) 89(12) 88(12) 84(15) 51(38) 80(18) 83(13) 87(13) 82(14) 37(41) 75(12) 85(13) 91(9) 88(13) 31(33) 83(13) 82(15) 89(10) 87(11) 43(41) 78(14) 84(11) 83(16) 89(11) 29(29) 81(14) 68(20) 88(15) 85(22)


The lowest training times were always produced by the Bayesian classifier, whereas, for test run times, the ANN-MLP prevailed over the others.
By analyzing the sensitivity results shown in Table 5, higher values were achieved for classes A and E, as expected, since these classes can be well separated by kernel machines in general (refer to [16]). Indeed, SVM-RBF delivered the highest values of Se for these classes, namely 94% (when configured with Coif2-infogain) and 99% (when configured with Sym4-relief), respectively. In contrast, SVM-RBF presented poor sensitivity results for classes D and E, mainly when handling non-normalized data. The Bayesian classifier obtained the best recall for class B (92%) when classifying the Coif4-cfss dataset. Finally, SVM-RBF with Db2-non_norm and the Bayesian classifier with Coif4-cfss produced the highest values of Se for classes C and D, respectively, namely 91% and 75%.
Tables 6 and 7 present the positive predictive value and Fm scores achieved by the three classifiers.

In general, the Bayesian classifiers configured with Coiflets outperformed the other models in terms of precision. Notice that the behavior shown by this classifier when configured with Coif4-cfss is similar to that exhibited by the OPF classifier configured with Coif4-cfss and the Manhattan distance. While SVM-RBF (with Coif2-infogain or Coif4-cfss) produced the highest values of Fm for classes A and B, the Bayesian classifiers (with Coif4-cfss) performed best for classes C and D. For class E, the performance of the SVM-RBF and Bayesian classifiers is on par, reaching Fm = 97% for several datasets. The performance levels achieved by the ANN-MLP classifiers, on the other hand, lagged behind those delivered by the Bayesian and SVM-RBF classifiers.
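As a reference for how the per-class scores reported in Tables 5–7 relate to one another, the following sketch computes sensitivity (recall), positive predictive value (precision) and F-measure from a confusion matrix whose rows are true classes and whose columns are predicted classes. It is a generic illustration, not the evaluation code used in this study.

```python
import numpy as np

def per_class_scores(cm):
    """cm[i, j] = number (or percentage) of class-i samples predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                                       # correctly classified samples per class
    sensitivity = tp / cm.sum(axis=1)                      # recall: TP / (TP + FN)
    ppv = tp / np.clip(cm.sum(axis=0), 1e-12, None)        # precision: TP / (TP + FP)
    f_measure = 2 * sensitivity * ppv / np.clip(sensitivity + ppv, 1e-12, None)
    return sensitivity, ppv, f_measure

# Tiny two-class illustration (values in %):
sens, ppv, fm = per_class_scores([[90, 10],
                                  [20, 80]])
print(sens, ppv, fm)  # approximately [0.90 0.80], [0.82 0.89], [0.86 0.84]
```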

Table 6. Positive predictive value scores (in % by class) delivered by different settings of the SVM-RBF, Bayesian and ANN-MLP classifiers. The wavelet/feature-selection combinations listed on the next line give, in order, the entries within each value row below; the fifteen value rows correspond, in turn, to SVM-RBF (classes A to E), Bayesian (classes A to E) and ANN-MLP (classes A to E).

Coif2-non_norm Coif2-noFS Coif2-cfss Coif2-infogain Coif2-relief Coif3-non_norm Coif3-noFS Coif3-cfss Coif3-infogain Coif3-relief Coif4-non_norm Coif4-noFS Coif4-cfss Coif4-infogain Coif4-relief Db2-non_norm Db2-noFS Db2-cfss Db2-infogain Db2-relief Db3-non_norm Db3-noFS Db3-cfss Db3-infogain Db3-relief Db4-non_norm Db4-noFS Db4-cfss Db4-infogain Db4-relief Haar-non_norm Haar-noFS Haar-cfss Haar-infogain Haar-relief Sym2-non_norm Sym2-noFS Sym2-cfss Sym2-infogain Sym2-relief Sym3-non_norm Sym3-noFS Sym3-cfss Sym3-infogain Sym3-relief Sym4-non_norm Sym4-noFS Sym4-cfss Sym4-infogain Sym4-relief


31(34) 89(11) 89(8) 93(6) 87(11) 30(48) 91(7) 88(8) 85(10) 87(11) 76(30) 86(7) 90(5) 85(6) 86(12) 64(15) 79(9) 73(10) 73(14) 77(8) 68(19) 72(11) 86(10) 87(8) 85(6) 36(27) 77(13) 86(12) 77(9) 79(13) 70(19) 80(7) 63(9) 77(12) 79(11) 64(15) 79(9) 73(10) 73(13) 76(7) 68(19) 73(12) 81(13) 87(8) 85(6) 69(19) 76(11) 77(11) 81(11) 89(10)

67(39) 93(7) 95(7) 92(10) 86(8) 23(31) 92(7) 90(9) 86(12) 89(9) 18(37) 88(7) 97(5) 89(7) 88(9) 89(13) 85(11) 80(12) 80(12) 80(16) 86(17) 74(13) 82(9) 85(10) 82(12) 82(17) 78(12) 85(7) 88(7) 88(12) 94(9) 81(12) 81(10) 85(11) 80(7) 89(13) 85(11) 80(12) 80(13) 84(13) 86(17) 78(16) 73(12) 85(10) 82(12) 84(14) 78(16) 76(14) 88(12) 84(16)

5(11) 71(13) 62(14) 72(11) 64(10) 29(25) 66(13) 77(16) 68(14) 62(11) 21(7) 68(14) 65(11) 65(12) 68(10) 33(13) 61(15) 63(13) 64(13) 64(13) 31(12) 51(11) 55(8) 59(10) 64(14) 7(12) 61(8) 63(12) 58(13) 56(18) 29(12) 57(14) 59(13) 62(8) 62(10) 33(13) 61(15) 63(13) 62(14) 60(10) 31(12) 52(6) 56(7) 59(10) 64(14) 30(13) 61(7) 64(9) 64(13) 62(13)

20(42) 67(18) 68(19) 73(14) 71(15) 5(16) 63(13) 67(15) 71(9) 66(15) 3(11) 65(20) 65(17) 72(15) 61(13) 7(21) 60(16) 61(14) 53(14) 59(12) 3(11) 49(16) 48(14) 53(10) 54(11) 7(21) 53(12) 53(21) 52(16) 50(13) 55(22) 53(15) 58(20) 56(17) 50(10) 7(21) 60(16) 61(14) 46(16) 58(11) 3(11) 52(14) 45(14) 53(10) 54(11) 0(0) 58(8) 61(10) 61(14) 60(13)

3(8) 90(7) 97(6) 96(8) 94(6) 3(9) 94(8) 91(8) 94(6) 93(6) 2(6) 91(9) 95(7) 97(5) 94(7) 0(0) 93(7) 90(9) 88(9) 93(7) 0(0) 92(9) 91(8) 90(7) 92(8) 0(0) 92(9) 92(6) 92(7) 91(6) 0(0) 92(8) 96(6) 91(8) 94(6) 0(0) 93(7) 90(9) 89(8) 92(8) 0(0) 93(8) 92(9) 90(7) 92(8) 0(0) 93(8) 93(9) 92(6) 49(7)

73(10) 80(11) 87(11) 86(8) 85(11) 74(11) 78(11) 82(12) 87(11) 86(8) 76(13) 71(14) 91(7) 85(10) 89(9) 72(9) 72(10) 68(9) 69(14) 78(9) 74(7) 72(10) 80(11) 84(9) 85(7) 72(13) 72(12) 74(9) 80(7) 87(10) 74(10) 73(15) 73(8) 64(12) 80(12) 72(9) 72(10) 68(9) 69(14) 73(7) 74(7) 72(10) 77(14) 84(9) 85(7) 80(10) 65(11) 77(11) 74(10) 82(9)

86(10) 82(12) 88(12) 82(12) 85(5) 89(9) 84(7) 87(11) 86(11) 91(12) 88(8) 79(14) 93(6) 82(10) 91(13) 87(7) 73(16) 80(10) 84(11) 79(10) 85(7) 71(8) 80(11) 82(11) 81(12) 86(8) 77(17) 79(11) 83(11) 83(10) 93(10) 74(11) 75(13) 78(10) 79(8) 87(7) 73(16) 80(10) 83(12) 76(13) 85(7) 70(9) 78(13) 82(11) 81(12) 86(10) 70(14) 81(14) 78(11) 81(13)

59(14) 64(12) 69(14) 71(9) 64(11) 55(13) 73(12) 76(11) 69(9) 69(12) 58(16) 64(15) 82(12) 69(12) 73(10) 59(15) 55(12) 59(13) 63(16) 71(14) 61(16) 51(12) 56(11) 62(8) 62(11) 61(17) 59(13) 55(12) 61(9) 56(6) 58(14) 55(9) 52(11) 65(9) 60(9) 59(15) 55(12) 59(13) 64(15) 64(12) 61(16) 51(12) 53(13) 62(8) 62(11) 60(18) 50(11) 60(9) 66(14) 58(11)

46(17) 63(20) 68(16) 71(13) 66(14) 45(12) 67(16) 72(18) 68(9) 72(14) 54(15) 56(12) 81(16) 72(13) 71(10) 47(17) 52(16) 52(16) 52(12) 62(9) 50(15) 52(9) 53(12) 60(11) 58(12) 49(16) 53(13) 57(10) 65(13) 56(14) 52(13) 50(12) 53(13) 58(13) 59(12) 47(17) 52(16) 52(16) 53(13) 61(16) 50(15) 52(9) 45(13) 60(11) 58(12) 54(19) 55(16) 59(16) 53(13) 57(12)

94(7) 95(6) 94(7) 98(4) 93(7) 94(7) 97(5) 94(4) 97(5) 95(6) 93(8) 95(6) 96(5) 98(4) 96(6) 96(6) 91(8) 93(8) 89(8) 90(8) 94(6) 90(9) 95(6) 92(7) 91(7) 94(6) 90(11) 91(7) 90(7) 90(7) 93(6) 91(11) 92(8) 90(8) 93(7) 96(6) 91(8) 93(8) 89(8) 91(7) 94(6) 90(9) 93(7) 92(7) 91(7) 96(9) 88(11) 92(7) 91(6) 93(7)

6(13) 58(10) 69(6) 52(13) 58(25) 17(26) 64(22) 56(22) 46(26) 62(10) 21(35) 46(19) 71(16) 55(31) 46(26) 24(34) 66(16) 72(14) 22(34) 73(9) 31(33) 61(15) 72(13) 81(6) 75(10) 39(32) 69(14) 66(13) 73(7) 74(13) 12(25) 57(14) 61(10) 55(12) 65(10) 36(38) 59(12) 74(11) 63(13) 69(9) 25(22) 62(16) 70(13) 75(14) 80(12) 26(31) 60(8) 64(12) 66(10) 52(35)

14(16) 72(13) 82(15) 65(12) 78(17) 37(42) 67(15) 71(22) 55(24) 84(12) 13(19) 65(27) 76(21) 61(13) 75(31) 19(24) 63(16) 79(12) 20(18) 70(13) 25(26) 58(19) 76(14) 74(11) 73(11) 27(41) 48(21) 60(14) 70(10) 77(14) 36(29) 68(13) 73(14) 77(20) 73(10) 7(12) 56(20) 72(14) 65(21) 71(17) 20(26) 50(23) 75(18) 73(13) 77(12) 23(27) 54(14) 62(16) 64(11) 56(18)

22(23) 50(9) 57(5) 37(15) 55(9) 23(22) 47(11) 50(20) 39(20) 51(7) 21(24) 39(25) 56(13) 49(29) 45(25) 15(14) 46(13) 56(20) 28(34) 60(12) 21(17) 40(10) 49(9) 46(21) 50(17) 10(13) 41(9) 49(11) 43(9) 48(18) 16(14) 46(12) 54(14) 54(11) 50(11) 18(23) 52(19) 54(13) 50(14) 54(14) 25(25) 39(11) 49(13) 44(8) 58(14) 22(20) 43(14) 50(15) 51(14) 40(20)

2(6) 36(27) 53(28) 55(31) 49(40) 0(0) 44(27) 48(34) 42(26) 56(25) 5(11) 37(26) 55(29) 41(23) 39(25) 11(14) 42(13) 52(17) 16(18) 54(13) 6(13) 37(13) 42(11) 40(28) 45(18) 7(16) 38(21) 47(16) 40(15) 40(18) 2(7) 42(12) 40(14) 47(20) 45(11) 0(0) 39(12) 51(21) 41(17) 54(12) 17(24) 30(23) 36(13) 44(23) 54(14) 9(14) 40(14) 48(22) 54(27) 25(34)

56(39) 91(9) 95(8) 93(7) 90(9) 38(31) 90(8) 94(8) 93(8) 96(6) 38(40) 95(7) 87(10) 93(7) 94(9) 24(31) 83(10) 88(10) 82(20) 85(9) 39(47) 75(15) 87(7) 90(11) 87(11) 24(18) 80(15) 85(10) 85(12) 85(13) 33(39) 78(16) 86(10) 88(11) 90(13) 48(45) 82(12) 92(10) 88(10) 87(6) 31(35) 77(10) 85(12) 91(9) 83(14) 42(42) 78(13) 79(14) 88(10) 49(16)


Table 7. F-measure scores (in % by class) delivered by different settings of the SVM-RBF, Bayesian and ANN-MLP classifiers. The wavelet/feature-selection combinations listed on the next line give, in order, the entries within each value row below; the fifteen value rows correspond, in turn, to SVM-RBF (classes A to E), Bayesian (classes A to E) and ANN-MLP (classes A to E).


Coif2-non_norm Coif2-noFS Coif2-cfss Coif2-infogain Coif2-relief Coif3-non_norm Coif3-noFS Coif3-cfss Coif3-infogain Coif3-relief Coif4-non_norm Coif4-noFS Coif4-cfss Coif4-infogain Coif4-relief Db2-non_norm Db2-noFS Db2-cfss Db2-infogain Db2-relief Db3-non_norm Db3-noFS Db3-cfss Db3-infogain Db3-relief Db4-non_norm Db4-noFS Db4-cfss Db4-infogain Db4-relief Haar-non_norm Haar-noFS Haar-cfss Haar-infogain Haar-relief Sym2-non_norm Sym2-noFS Sym2-cfss Sym2-infogain Sym2-relief Sym3-non_norm Sym3-noFS Sym3-cfss Sym3-infogain Sym3-relief Sym4-non_norm Sym4-noFS Sym4-cfss Sym4-infogain Sym4-relief


43 90 90 93 86 13 90 86 87 86 64 86 91 88 85 74 81 75 70 80 76 76 83 86 82 50 80 85 81 82 76 76 65 75 78 74 81 75 71 79 76 77 77 86 82 74 78 78 81 77

34 92 92 91 87 16 91 91 86 88 7 87 94 88 87 57 84 79 83 77 54 70 84 86 83 47 76 84 85 82 63 81 79 85 82 57 84 79 82 81 54 73 75 86 83 56 75 75 88 70

8 70 67 74 68 44 67 71 71 68 33 68 69 70 66 48 59 66 65 63 46 53 57 58 64 12 62 63 58 55 41 60 64 68 60 48 59 66 62 58 46 54 60 58 64 45 62 62 67 53

5 65 59 69 62 3 62 69 62 55 2 62 61 60 62 3 59 52 48 56 2 46 44 51 54 5 48 50 49 51 36 50 48 47 49 3 59 52 43 56 2 50 40 51 54 0 58 61 54 49

4 94 97 96 95 4 95 93 95 94 3 93 95 97 95 0 94 92 89 94 0 92 92 92 92 0 93 94 94 92 0 94 95 94 95 0 94 92 90 95 0 93 91 92 92 0 91 91 94 66

77 81 88 85 86 76 79 84 89 86 79 71 91 85 88 76 74 71 70 82 78 76 78 83 84 75 77 78 79 84 78 73 67 64 81 76 74 71 70 76 78 75 73 83 84 81 69 78 73 81

83 83 87 85 86 85 84 86 86 88 86 79 93 84 89 83 69 78 81 74 83 65 81 84 81 82 71 72 85 83 86 66 74 76 77 83 69 78 81 74 83 65 78 84 81 84 61 79 78 81

57 65 70 70 67 53 73 74 67 73 57 63 83 70 74 59 55 63 64 69 59 54 59 63 64 61 62 58 64 59 59 60 56 63 61 59 55 63 65 63 59 54 55 63 64 59 53 63 66 58

46 58 64 69 59 46 65 69 67 66 51 56 78 68 69 46 52 45 49 61 49 50 47 55 56 48 49 54 55 52 48 49 51 58 55 46 52 45 50 59 49 50 43 55 56 54 54 57 51 55

92 94 94 97 94 93 94 94 97 95 92 92 96 97 96 95 88 91 90 92 93 87 95 93 90 93 89 90 93 92 92 91 93 91 94 95 88 91 90 92 93 87 93 93 90 93 87 88 91 92

5 63 74 50 64 22 61 59 43 70 23 49 64 55 50 19 66 70 22 75 27 60 72 81 74 44 70 67 72 76 12 58 63 61 68 42 61 73 65 67 34 58 72 76 80 33 58 67 68 47

22 74 81 70 78 29 64 74 58 80 19 63 77 69 75 23 62 82 25 68 26 57 73 76 76 25 48 59 71 75 44 68 68 76 71 12 56 72 64 69 22 48 70 73 77 27 55 60 65 49

30 55 65 39 64 32 57 57 42 58 27 44 64 43 51 23 47 58 30 59 31 39 52 48 49 13 44 53 47 50 24 45 55 56 47 25 49 58 50 55 23 43 49 50 51 29 44 46 57 39

3 21 36 44 24 0 28 35 44 36 8 38 39 40 32 16 40 48 20 53 9 37 39 35 42 9 36 41 33 37 2 43 38 35 45 0 39 49 37 51 17 27 36 37 55 9 38 51 34 17

53 88 92 92 91 47 86 89 90 95 38 85 91 91 94 22 82 88 70 84 30 73 88 89 85 32 80 84 86 84 35 77 86 90 89 38 83 86 88 87 36 77 85 87 86 34 79 73 88 62

Table 8. Performance comparison among all classifiers: best accuracy rates for each type of dataset and their associated values of training/test run times and F-measure (Fm per class and mean Fm in %).

Classifier   Dataset type   Wavelet          Acc [%]   Train [ms]      Test [ms]   Total [ms]      Fm A   Fm B   Fm C   Fm D   Fm E   Mean Fm [%]
OPF          non_norm       Coif4            80.2      9.64            0.31        9.95            87     87     72     62     94     80.4
Bayesian     non_norm       Sym4             73.8      15.97           5.84        21.81           81     84     59     54     93     74.2
SVM-RBF      non_norm       Haar             45.0      1,372,857.42    17.66       1,372,875.08    76     63     41     36     0      43.2
ANN-MLP      non_norm       Coif3            34.2      19,105.02       0.15        19,105.17       22     29     32     0      47     26.0
OPF          noFS           Coif4            82.2      9.24            0.26        9.50            84     87     76     70     95     82.4
Bayesian     noFS           Coif3            78.6      9.94            6.19        16.13           79     84     73     65     94     79.0
SVM-RBF      noFS           Coif2            81.8      18,719.68       9.39        18,729.07       90     92     70     65     94     82.2
ANN-MLP      noFS           Coif2            61.6      19,377.45       0.15        19,377.60       63     74     55     21     88     60.2
OPF          FS             Coif4-cfss       89.2      9.42            0.28        9.70            92     93     85     82     96     89.6
Bayesian     FS             Coif4-cfss       87.4      7.62            5.29        12.91           91     93     83     78     96     88.2
SVM-RBF      FS             Coif2-infogain   84.4      12,948.55       4.35        12,952.9        93     91     74     69     96     84.6
ANN-MLP      FS             Coif2-cfss       70.6      16,978.41       0.13        16,978.54       74     81     65     36     92     69.6


Table 9. Confusion matrices associated with all classifiers for their best accuracy cases as shown in Table 8 – non-normalized datasets without feature selection. Each row gives, for a true class, the percentage of samples classified as A, B, C, D and E.

True class A:  OPF 88 08 01 03 00 | Bayesian 83 07 06 04 00 | SVM-RBF 82 02 09 07 00 | ANN-MLP 29 07 30 00 34
True class B:  OPF 13 85 00 02 00 | Bayesian 13 83 01 03 00 | SVM-RBF 16 47 34 03 00 | ANN-MLP 18 24 18 00 40
True class C:  OPF 01 00 75 24 00 | Bayesian 04 02 58 35 01 | SVM-RBF 11 00 69 20 00 | ANN-MLP 21 10 56 00 22
True class D:  OPF 02 01 32 60 05 | Bayesian 06 06 30 54 04 | SVM-RBF 13 01 59 27 00 | ANN-MLP 23 03 45 00 29
True class E:  OPF 00 01 02 04 93 | Bayesian 00 01 02 06 91 | SVM-RBF 10 00 90 00 00 | ANN-MLP 14 04 20 00 62

Table 11. Confusion matrices associated with all classifiers for their best accuracy cases as shown in Table 8 – normalized datasets with feature selection. Each row gives, for a true class, the percentage of samples classified as A, B, C, D and E.

True class A:  OPF 93 05 00 02 00 | Bayesian 91 06 01 02 00 | SVM-RBF 94 05 01 00 00 | ANN-MLP 80 16 02 02 00
True class B:  OPF 08 92 00 00 00 | Bayesian 08 92 00 00 00 | SVM-RBF 06 90 01 02 01 | ANN-MLP 19 80 00 01 00
True class C:  OPF 00 00 83 16 01 | Bayesian 00 00 83 17 00 | SVM-RBF 00 02 76 21 01 | ANN-MLP 05 02 77 15 01
True class D:  OPF 01 00 13 83 03 | Bayesian 01 00 20 75 04 | SVM-RBF 02 02 27 66 03 | ANN-MLP 12 01 55 27 05
True class E:  OPF 00 01 00 04 95 | Bayesian 00 01 00 03 96 | SVM-RBF 00 00 01 03 96 | ANN-MLP 01 04 02 04 89

Table 10. Confusion matrices associated with all classifiers for their best accuracy cases as shown in Table 8 – normalized datasets without feature selection. Each row gives, for a true class, the percentage of samples classified as A, B, C, D and E.

True class A:  OPF 85 11 01 03 00 | Bayesian 80 10 03 07 00 | SVM-RBF 91 05 01 03 00 | ANN-MLP 68 22 08 02 00
True class B:  OPF 12 87 00 01 00 | Bayesian 13 85 01 01 00 | SVM-RBF 08 90 00 01 01 | ANN-MLP 16 76 02 05 01
True class C:  OPF 02 01 75 22 00 | Bayesian 03 03 73 21 00 | SVM-RBF 02 01 68 26 03 | ANN-MLP 14 05 63 15 03
True class D:  OPF 04 01 21 70 04 | Bayesian 07 02 24 64 03 | SVM-RBF 03 00 27 63 07 | ANN-MLP 21 04 54 15 06
True class E:  OPF 00 01 00 05 94 | Bayesian 00 02 01 06 91 | SVM-RBF 00 01 00 02 97 | ANN-MLP 02 03 04 05 86

3.5. Comparative analysis

Table 8 provides a comparative analysis among all classifiers, taking primarily into account their best accuracy rates for each type of dataset. In all cases, the OPF classifier obtained the best Acc rates, also prevailing in terms of the associated total run times and average Fm values. This demonstrates the usefulness of this technique, in terms of both efficiency and effectiveness, for classifying EEG signals. For non_norm datasets, in particular, the OPF classifier obtained the highest values of Fm for all classes, contrasting with the low values delivered by the SVM-RBF and ANN-MLP classifiers. For noFS datasets, OPF obtained the highest Fm values for classes C, D, and E, while for classes A and B SVM-RBF proved superior. Finally, for FS datasets, the OPF classifier was better than the other approaches for classes C and D, whereas for classes B and E there was a tie between OPF and the Bayesian and SVM-RBF classifiers. For class A, the performance of the SVM-RBF and OPF classifiers was roughly the same, with a slight advantage in average values for the former.

Another important aspect is that the training and test run times varied considerably from the non_norm to the noFS to the FS datasets. The OPF classifier was the most robust in terms of computational efficiency, showing small values of variance, which demonstrates the resilience of this classifier to data normalization and dimensionality. The Bayesian and ANN-MLP classifiers presented a similarly satisfactory behavior only in the testing phase, while the SVM-RBF classifier was highly affected by the lack of attribute normalization and, sometimes, of feature selection.

Tables 9–11 present the confusion matrices associated with all classifiers for their best cases discussed in Table 8. It can be observed that the best performance for non_norm datasets was usually achieved by the OPF classifier. As expected, classes C and D were the hardest to discriminate, while class E was the easiest one. When the dataset is normalized, as seen in Table 10, the OPF classifier's performance increases, but the highest accuracy rates for classes A, B and C were furnished by the SVM-RBF classifier, followed very closely by the OPF classifier. Finally, Table 11 shows that, when working with normalized datasets with feature selection, the performance of all classifiers tends to increase, with different classifiers outperforming the others for different classes.
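To make the link between Table 11 and the accuracy figures in Table 8 concrete, the snippet below recomputes the per-class recall and the overall hit rate from the OPF confusion matrix of Table 11 (rows are true classes, entries in %). With the five classes equally represented, the plain hit rate coincides with the 89.2% reported for OPF on the FS datasets; the paper's accuracy measure may additionally account for class balance, which changes nothing in this balanced case.

```python
import numpy as np

# OPF confusion matrix from Table 11 (normalized datasets with feature selection), in %.
opf_cm = np.array([
    [93,  5,  0,  2,  0],   # true class A
    [ 8, 92,  0,  0,  0],   # true class B
    [ 0,  0, 83, 16,  1],   # true class C
    [ 1,  0, 13, 83,  3],   # true class D
    [ 0,  1,  0,  4, 95],   # true class E
], dtype=float)

per_class_recall = np.diag(opf_cm) / opf_cm.sum(axis=1)   # sensitivity per class
overall = np.trace(opf_cm) / opf_cm.sum()                  # hit rate with balanced classes
print(per_class_recall)   # [0.93 0.92 0.83 0.83 0.95]
print(round(overall, 3))  # 0.892
```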

4. Concluding remarks

In this work, we have provided a thorough assessment of the performance of OPF classifiers when coping with the task of epilepsy diagnosis through EEG signal classification. For this purpose, four wavelet bases configured with different orders and three feature selectors were adopted for data preprocessing. When contrasted with traditional supervised learning algorithms, namely SVM-RBF, Bayesian, and ANN-MLP, OPF classifiers prevailed in terms of both efficiency (computational run times) and effectiveness (accuracy, precision, recall, and F-measure). In particular, OPF classifiers configured with the Manhattan distance and induced on EEG signals preprocessed via Coiflets and the Cfss algorithm showed very satisfactory levels of performance, for either normalized or non-normalized data.

As future work, we plan to conduct novel experiments with other types of feature extractors, such as those based on nonlinear dynamics (Lyapunov exponents) [16], independent component analysis [13], and EMD [49]. The hybridization of OPF and SVM classifiers into a single algorithm is also a line of research under investigation, aiming at combining their strengths for coping with hard classification problems, such as those related to BSP.

Acknowledgments

The first and last authors thank the National Council for Research and Development (CNPq) and the Ceará Foundation for the Support of Scientific and Technological Development (FUNCAP) for providing financial support through DCR grant #35.0053/2011.1 to UNIFOR. The second, third, and fourth authors also acknowledge the sponsorship from CNPq via grants #475406/2010-9, #304603/2012-0, #308816/2012-9, and #303182/2011-3. The fourth author is also grateful to FAPESP grant #2009/16206-1.


Table 12. Accuracy, training, and test run times delivered by different settings of the OPF classifier. The wavelet/feature-selection combinations listed on the next line give, in order, the entries within each value row below; the twelve value rows correspond, in turn, to accuracy [%], training time [ms] and test time [ms] for OPF with the Euclidean, Manhattan, Canberra and Squared Chi-square distances.

Coif2-non_norm Coif2-noFS Coif2-cfss Coif2-infogain Coif2-relief Coif3-non_norm Coif3-noFS Coif3-cfss Coif3-infogain Coif3-relief Coif4-non_norm Coif4-noFS Coif4-cfss Coif4-infogain Coif4-relief Db2-non_norm Db2-noFS Db2-cfss Db2-infogain Db2-relief Db3-non_norm Db3-noFS Db3-cfss Db3-infogain Db3-relief Db4-non_norm Db4-noFS Db4-cfss Db4-infogain Db4-relief Haar-non_norm Haar-noFS Haar-cfss Haar-infogain Haar-relief Sym2-non_norm Sym2-noFS Sym2-cfss Sym2-infogain Sym2-relief Sym3-non_norm Sym3-noFS Sym3-cfss Sym3-infogain Sym3-relief Sym4-non_norm Sym4-noFS Sym4-cfss Sym4-infogain Sym4-relief


70.4(52) 74.8(49) 78.6(47) 79.4(49) 77.6(65) 69.6(52) 78.0(45) 80.8(69) 79.8(48) 81.6(72) 71.4(77) 70.8(63) 85.8(49) 80.0(37) 81.0(44) 71.2(70) 65.6(57) 69.6(52) 71.2(45) 74.0(41) 71.2(48) 65.6(72) 71.8(63) 75.2(50) 74.6(63) 71.4(78) 68.4(54) 69.0(55) 75.2(58) 72.2(61) 71.6(64) 66.6(63) 67.2(41) 70.2(44) 73.0(43) 71.2(70) 65.6(57) 69.6(52) 71.6(45) 71.8(60) 71.2(48) 65.4(72) 67.4(61) 75.2(50) 74.6(63) 74.4(75) 64.0(77) 71.8(67) 71.2(69) 72.4(56)

8.30(3.22) 8.02(3.03) 8.05(2.65) 7.96(2.49) 9.59(2.37) 8.99(3.07) 9.63(2.06) 7.91(2.24) 8.97(2.37) 7.18(2.14) 8.32(2.87) 8.96(3.21) 8.32(2.31) 9.08(2.39) 8.24(1.34) 7.86(3.43) 6.31(2.42) 7.10(2.79) 8.14(2.87) 7.16(2.16) 8.08(3.07) 8.08(3.02) 8.89(2.36) 8.86(2.23) 6.87(2.43) 8.89(2.64) 8.18(2.00) 8.79(2.46) 9.69(1.80) 8.55(2.24) 10.61(2.26) 8.75(2.67) 9.27(2.15) 8.72(2.19) 8.23(2.22) 10.03(2.85) 8.85(2.10) 8.89(1.82) 8.80(2.47) 9.22(2.58) 8.55(2.91) 9.10(2.59) 8.11(2.53) 6.85(2.38) 8.19(2.31) 10.80(2.16) 8.18(2.76) 10.43(1.37) 8.85(2.54) 9.65(2.35)

0.25(0.11) 0.28(0.10) 0.25(0.11) 0.21(0.09) 0.32(0.05) 0.27(0.11) 0.31(0.09) 0.26(0.08) 0.30(0.07) 0.26(0.10) 0.29(0.10) 0.29(0.11) 0.26(0.08) 0.28(0.09) 0.22(0.06) 0.23(0.10) 0.26(0.11) 0.25(0.09) 0.26(0.10) 0.31(0.09) 0.24(0.09) 0.27(0.11) 0.31(0.13) 0.32(0.09) 0.25(0.08) 0.29(0.10) 0.34(0.09) 0.25(0.10) 0.35(0.05) 0.31(0.09) 0.33(0.16) 0.36(0.14) 0.34(0.08) 0.32(0.07) 0.29(0.09) 0.38(0.13) 0.34(0.08) 0.34(0.08) 0.29(0.09) 0.33(0.08) 0.29(0.10) 0.25(0.11) 0.29(0.08) 0.27(0.06) 0.29(0.08) 0.29(0.11) 0.26(0.09) 0.35(0.08) 0.32(0.06) 0.38(0.13)

72.6(55) 81.4(58) 80.4(51) 80.2(33) 78.2(70) 73.8(55) 822(35) 80.4(76) 80.0(61) 82.0(52) 74.0(56) 82.2(71) 89.2(33) 78.6(48) 81.4(27) 75.2(63) 74.4(39) 78.0(67) 70.2(58) 76.2(48) 75.8(35) 76.4(62) 75.6(51) 75.4(53) 75.8(56) 74.6(50) 77.8(55) 75.6(61) 74.6(72) 760(52) 75.4(66) 73.4(60) 71.4(57) 732(44) 74.6(47) 75.2(63) 74.4(39) 78.0(67) 70.0(57) 74.4(69) 75.8(35) 76.4(62) 72.6(71) 75.4(53) 75.8(56) 77.2(57) 77.0(44) 79.0(43) 71.4(71) 74.6(92)

8.77(3.39) 10.13(2.51) 9.99(2.12) 9.00(3.66) 11.47(0.27) 9.81(3.38) 9.04(2.50) 8.72(3.42) 9.93(2.90) 9.32(2.90) 8.26(3.26) 9.24(2.70) 9.42(3.31) 10.72(2.26) 7.21(2.77) 8.53(3.80) 8.20(2.88) 9.12(3.02) 8.12(3.55) 9.82(2.83) 10.03(2.50) 8.19(3.59) 9.25(2.78) 10.28(2.75) 6.96(3.16) 9.10(3.03) 10.26(2.12) 8.69(3.16) 11.35(0.53) 11.15(0.08) 9.54(2.57) 10.89(0.40) 10.95(0.22) 11.13(0.41) 11.25(0.43) 11.37(0.40) 10.93(0.37) 10.96(0.31) 11.30(0.59) 11.20(0.51) 11.34(0.61) 9.18(3.47) 10.83(0.78) 10.72(1.71) 10.51(2.13) 11.14(0.93) 7.76(3.46) 11.01(0.44) 10.99(0.90) 11.28(0.57)

0.25(0.11) 0.28(0.09) 0.25(0.10) 0.24(0.14) 0.36(0.11) 0.25(0.11) 0.27(0.08) 0.27(0.09) 0.32(0.09) 0.26(0.10) 0.25(0.17) 0.26(0.10) 0.28(0.08) 0.31(0.14) 0.26(0.13) 0.26(0.10) 0.25(0.10) 0.29(0.11) 0.26(0.11) 0.29(0.11) 0.30(0.08) 0.27(0.10) 0.28(0.11) 0.32(0.10) 0.26(0.11) 0.27(0.10) 0.33(0.07) 0.25(0.11) 0.31(0.10) 0.35(0.07) 0.27(0.10) 0.31(0.10) 0.41(0.15) 0.34(0.10) 0.39(0.12) 0.32(0.09) 0.33(0.07) 0.35(0.07) 0.34(0.09) 0.35(0.07) 0.33(0.09) 0.30(0.09) 0.33(0.09) 0.37(0.03) 0.31(0.10) 0.36(0.07) 0.24(0.09) 0.37(0.13) 0.32(0.09) 0.31(0.10)

69.0(65) 60.8(76) 74.8(78) 80.2(33) 78.0(73) 75.0(56) 62.8(56) 80.8(60) 80.2(61) 81.8(54) 73.0(51) 64.2(89) 87.4(45) 78.6(48) 80.4(28) 72.4(58) 58.2(37) 69.0(58) 69.4(65) 69.0(57) 71.6(44) 61.6(50) 73.8(72) 69.4(39) 71.8(58) 73.6(56) 63.2(71) 69.0(53) 72.8(51) 74.0(35) 69.6(75) 61.4(44) 67.4(55) 712(50) 67.0(59) 72.4(58) 58.2(37) 69.0(58) 69.2(64) 67.4(63) 71.6(44) 61.6(47) 71.6(62) 69.4(39) 71.8(58) 7.20(74) 62.6(75) 71.2(71) 69.6(59) 73.6(88)

7.85(3.02) 9.74(1.82) 8.07(3.48) 9.33(2.96) 10.58(2.08) 8.80(2.90) 7.92(3.14) 9.71(2.62) 9.18(3.12) 9.29(3.01) 6.75(2.55) 9.30(2.41) 9.79(2.65) 9.04(3.69) 8.79(2.85) 8.05(3.10) 8.44(2.15) 8.70(3.01) 9.09(2.74) 9.21(2.94) 10.07(2.15) 7.88(2.82) 9.42(2.65) 9.50(2.77) 7.18(3.05) 8.23(2.98) 10.53(0.93) 7.26(3.22) 10.03(2.45) 11.18(0.50) 9.84(2.12) 10.18(1.01) 10.52(1.04) 11.05(0.93) 11.05(1.02) 9.57(2.54) 10.20(0.51) 10.38(1.08) 10.49(2.05) 10.59(1.18) 10.67(1.13) 8.05(3.22) 10.32(2.03) 11.18(0.23) 9.10(2.86) 10.35(1.24) 7.72(2.75) 10.15(1.33) 10.05(2.55) 9.31(2.96)

0.25(0.11) 0.33(0.11) 0.22(0.09) 0.25(0.09) 0.28(0.11) 0.28(0.10) 0.30(0.12) 0.30(0.10) 0.26(0.09) 0.27(0.13) 0.26(0.10) 0.31(0.11) 0.28(0.08) 0.26(0.10) 0.23(0.09) 0.26(0.11) 0.32(0.14) 0.30(0.11) 0.27(0.11) 0.32(0.10) 0.33(0.09) 0.26(0.11) 0.32(0.10) 0.30(0.12) 0.26(0.10) 0.29(0.10) 0.41(0.01) 0.23(0.09) 0.26(0.10) 0.37(0.14) 0.35(0.07) 0.41(0.12) 0.33(0.11) 0.31(0.10) 0.38(0.07) 0.31(0.11) 0.36(0.12) 0.35(0.09) 0.32(0.10) 0.39(0.03) 0.36(0.08) 0.33(0.12) 0.35(0.08) 0.35(0.16) 0.33(0.09) 0.28(0.10) 0.26(0.12) 0.38(0.15) 0.38(0.11) 0.30(0.10)

76.6(78) 64.2(61) 76.4(65) 79.4(50) 77.2(67) 78.8(67) 62.8(71) 80.2(62) 79.8(57) 80.6(73) 80.2(52) 61.0(57) 85.8(56) 79.8(38) 80.8(45) 75.8(56) 58.0(53) 69.8(59) 69.8(44) 72.2(52) 74.8(55) 58.2(64) 74.8(46) 72.4(40) 73.2(44) 75.2(51) 61.4(68) 67.4(65) 74.0(60) 730(25) 72.6(48) 64.2(68) 63.8(39) 702(55) 70.4(56) 75.8(56) 58.0(53) 69.8(59) 70.0(45) 69.0(60) 74.8(55) 58.2(61) 70.4(64) 72.4(40) 73.2(44) 764(44) 58.6(83) 68.2(59) 69.0(73) 70.6(64)

8.99(3.12) 9.70(2.57) 8.61(3.20) 8.64(3.35) 11.20(0.69) 9.13(3.36) 7.94(3.11) 9.00(3.29) 9.16(3.67) 8.41(3.09) 9.64(3.02) 7.57(2.87) 9.58(2.76) 9.19(3.60) 7.50(2.96) 8.01(3.16) 8.33(2.89) 8.06(2.90) 8.10(3.34) 10.56(2.15) 9.69(2.93) 8.01(2.92) 9.77(2.74) 10.02(3.09) 9.29(3.20) 9.09(2.99) 10.51(1.10) 7.51(3.14) 10.93(1.24) 11.19(0.48) 11.24(0.48) 10.45(1.18) 10.99(0.73) 10.42(2.08) 11.08(0.12) 10.76(1.94) 9.73(1.67) 10.87(0.58) 11.11(0.84) 11.24(0.26) 11.11(0.90) 7.84(2.77) 10.85(0.43) 11.37(0.46) 10.62(2.16) 10.81(1.07) 6.94(2.95) 10.88(0.36) 11.09(0.52) 9.55(2.49)

0.25(0.15) 0.33(0.11) 0.22(0.10) 0.19(0.09) 0.32(0.08) 0.27(0.10) 0.28(0.11) 0.29(0.12) 0.23(0.10) 0.25(0.11) 0.31(0.09) 0.27(0.11) 0.28(0.09) 0.24(0.10) 0.23(0.09) 0.23(0.10) 0.32(0.12) 0.27(0.10) 0.27(0.11) 0.35(0.09) 0.25(0.11) 0.35(0.21) 0.31(0.10) 0.28(0.12) 0.33(0.15) 0.25(0.13) 0.43(0.02) 0.26(0.11) 0.35(0.07) 0.36(0.07) 0.32(0.11) 0.38(0.08) 0.42(0.14) 0.28(0.11) 0.37(0.07) 0.32(0.16) 0.34(0.12) 0.40(0.01) 0.35(0.09) 0.39(0.02) 0.34(0.06) 0.26(0.11) 0.40(0.17) 0.34(0.09) 0.34(0.09) 0.28(0.11) 0.33(0.16) 0.35(0.14) 0.30(0.11) 0.32(0.11)



Table 13. Sensitivity scores (in % by class) delivered by different settings of the OPF classifier. The wavelet/feature-selection combinations listed on the next line give, in order, the entries within each value row below; the twenty value rows correspond, in turn, to classes A to E for OPF with the Euclidean, Manhattan, Canberra and Squared Chi-square distances.

Coif2-non_norm Coif2-noFS Coif2-cfss Coif2-infogain Coif2-relief Coif3-non_norm Coif3-noFS Coif3-cfss Coif3-infogain Coif3-relief Coif4-non_norm Coif4-noFS Coif4-cfss Coif4-infogain Coif4-relief Db2-non_norm Db2-noFS Db2-cfss Db2-infogain Db2-relief Db3-non_norm Db3-noFS Db3-cfss Db3-infogain Db3-relief Db4-non_norm Db4-noFS Db4-cfss Db4-infogain Db4-relief Haar-non_norm Haar-noFS Haar-cfss Haar-infogain Haar-relief Sym2-non_norm Sym2-noFS Sym2-cfss Sym2-infogain Sym2-relief Sym3-non_norm Sym3-noFS Sym3-cfss Sym3-infogain Sym3-relief Sym4-non_norm Sym4-noFS Sym4-cfss Sym4-infogain Sym4-relief


80(11) 81(13) 88(10) 82(8) 85(12) 78(13) 80(14) 87(9) 88(11) 86(11) 81(11) 72(17) 90(9) 85(12) 84(13) 81(12) 76(10) 74(19) 70(15) 82(10) 82(15) 78(12) 76(15) 80(14) 82(19) 77(12) 81(9) 79(14) 77(15) 79(17) 82(14) 71(15) 62(13) 64(22) 82(12) 81(12) 76(10) 74(19) 70(15) 79(13) 82(15) 77(13) 70(22) 80(14) 82(19) 85(10) 74(18) 79(11) 71(20) 79(15)

80(11) 83(11) 85(11) 86(8) 85(14) 81(10) 85(11) 86(13) 85(10) 86(8) 82(10) 79(9) 91(7) 87(11) 87(8) 79(9) 65(12) 76(16) 79(14) 68(10) 79(11) 60(15) 82(11) 85(11) 80(11) 79(12) 62(16) 65(7) 88(6) 82(15) 80(13) 59(14) 71(12) 76(14) 72(13) 79(9) 65(12) 76(16) 79(14) 70(12) 79(11) 60(15) 75(13) 85(11) 80(11) 81(11) 56(16) 76(14) 78(9) 80(15)

55(19) 66(16) 67(16) 69(14) 67(17) 52(12) 73(12) 71(17) 66(7) 78(12) 55(13) 59(13) 80(14) 69(12) 74(11) 58(18) 54(17) 68(20) 68(13) 66(11) 57(14) 56(15) 63(16) 65(14) 65(18) 64(11) 68(13) 62(14) 68(17) 61(23) 60(18) 64(18) 59(16) 61(13) 65(10) 58(18) 54(17) 68(20) 69(12) 60(12) 57(14) 56(15) 57(16) 65(14) 65(18) 58(22) 58(19) 65(16) 66(18) 58(13)

47(21) 52(20) 58(12) 68(17) 55(16) 46(14) 62(14) 66(7) 65(13) 62(15) 50(13) 56(12) 72(16) 65(10) 65(15) 45(26) 51(14) 41(11) 47(19) 60(12) 47(18) 48(9) 43(16) 52(16) 56(13) 47(23) 45(14) 51(12) 47(18) 45(17) 44(10) 49(21) 52(10) 57(16) 50(9) 45(26) 51(14) 41(11) 47(19) 58(13) 47(18) 48(9) 42(17) 52(16) 56(13) 57(19) 50(14) 55(18) 49(12) 54(17)

90(8) 92(9) 95(8) 92(10) 96(7) 91(7) 90(9) 94(8) 95(7) 96(8) 89(11) 88(9) 96(7) 94(8) 95(8) 93(8) 82(11) 89(11) 92(9) 94(7) 91(9) 86(16) 95(10) 94(7) 90(11) 90(12) 86(8) 88(9) 96(7) 94(7) 92(11) 90(8) 92(10) 93(8) 96(5) 93(8) 82(11) 89(11) 93(7) 92(8) 91(9) 86(16) 93(9) 94(7) 90(11) 91(9) 82(12) 84(15) 92(8) 91(10)

81(10) 88(9) 88(9) 87(12) 84(14) 87(13) 88(15) 87(8) 88(9) 86(12) 81(11) 85(15) 93(7) 82(11) 84(13) 88(12) 79(11) 78(17) 68(13) 85(8) 92(9) 87(9) 78(16) 86(10) 80(13) 83(8) 83(13) 87(7) 79(16) 81(17) 90(15) 82(13) 68(17) 70(17) 81(11) 88(12) 79(11) 78(17) 67(13) 83(5) 92(9) 87(9) 76(19) 86(10) 80(13) 91(11) 90(7) 85(10) 76(14) 81(17)

80(11) 89(9) 85(7) 85(7) 87(14) 85(12) 88(8) 86(14) 86(8) 87(11) 80(11) 87(8) 92(6) 88(10) 88(10) 80(12) 83(8) 88(14) 75(16) 73(9) 83(9) 79(13) 84(12) 84(12) 85(10) 81(11) 79(11) 76(13) 87(8) 83(16) 84(11) 69(16) 73(9) 78(13) 75(13) 80(12) 83(8) 88(14) 75(16) 75(12) 83(9) 79(13) 81(12) 84(12) 85(10) 82(12) 76(11) 83(8) 74(13) 82(15)

57(16) 77(13) 70(16) 72(10) 69(19) 53(13) 75(11) 73(18) 70(11) 77(13) 66(14) 75(17) 83(11) 67(13) 75(12) 67(18) 65(10) 76(16) 64(13) 68(15) 66(13) 69(18) 68(16) 61(10) 68(16) 71(11) 79(13) 71(14) 64(27) 69(14) 62(18) 72(18) 70(19) 67(13) 71(13) 67(18) 65(10) 76(16) 64(13) 63(16) 66(13) 69(18) 63(16) 61(10) 68(16) 64(13) 67(18) 70(16) 63(15) 56(20)

53(22) 60(14) 64(16) 62(9) 55(15) 52(10) 66(8) 61(17) 61(19) 65(14) 51(13) 70(9) 83(11) 61(15) 65(10) 46(19) 53(11) 53(17) 51(20) 61(14) 46(15) 54(7) 52(19) 52(14) 54(12) 48(17) 54(17) 52(12) 47(17) 55(14) 49(12) 49(14) 51(14) 56(13) 51(14) 46(19) 53(11) 53(17) 51(20) 58(13) 46(15) 54(7) 49(21) 52(14) 54(12) 56(17) 61(14) 69(7) 51(14) 62(15)

92(9) 93(9) 95(8) 95(8) 96(7) 92(8) 94(7) 95(8) 95(8) 95(8) 92(10) 94(7) 95(8) 95(7) 95(8) 95(7) 92(9) 95(10) 93(8) 94(7) 92(10) 93(13) 96(7) 94(7) 92(8) 90(13) 94(8) 92(9) 96(7) 92(9) 92(9) 95(5) 95(7) 95(7) 95(5) 95(7) 92(9) 95(10) 93(8) 93(8) 92(10) 93(13) 94(8) 94(7) 92(8) 93(8) 91(9) 88(10) 93(7) 92(10)

69(11) 56(21) 76(11) 87(12) 82(15) 79(12) 58(20) 87(8) 88(9) 86(12) 78(12) 64(16) 92(8) 82(11) 84(13) 87(12) 77(16) 63(13) 67(16) 73(9) 69(10) 77(16) 78(15) 76(10) 83(15) 80(8) 78(13) 81(14) 76(14) 81(14) 69(17) 76(13) 65(15) 68(18) 71(15) 87(12) 77(16) 63(13) 67(16) 72(15) 69(10) 77(13) 75(17) 76(10) 83(15) 72(15) 71(11) 75(15) 72(13) 78(12)

70(12) 59(13) 78(12) 85(7) 87(14) 79(16) 63(9) 82(13) 86(8) 88(9) 72(10) 65(11) 90(7) 87(11) 88(12) 72(11) 61(14) 82(9) 74(19) 79(12) 77(13) 61(9) 85(10) 83(13) 84(7) 73(16) 67(13) 76(10) 87(11) 88(11) 70(14) 59(14) 68(12) 79(14) 77(13) 72(11) 61(14) 82(9) 74(19) 72(12) 77(13) 61(9) 81(10) 83(13) 84(7) 80(16) 66(13) 80(12) 71(13) 83(13)

65(18) 56(22) 70(16) 72(10) 71(19) 64(14) 62(9) 74(15) 70(11) 76(13) 69(9) 58(15) 81(13) 67(13) 72(11) 60(19) 51(16) 68(15) 64(13) 59(14) 69(14) 60(20) 69(20) 65(12) 61(18) 73(12) 55(13) 64(13) 63(19) 63(18) 70(18) 61(16) 61(19) 64(11) 63(11) 60(19) 51(16) 68(15) 64(13) 56(10) 69(14) 60(20) 70(12) 65(12) 61(18) 58(23) 61(20) 68(15) 63(16) 58(15)

49(16) 54(14) 59(15) 62(11) 55(16) 60(12) 49(20) 69(13) 62(18) 65(14) 52(14) 48(12) 80(11) 63(15) 65(11) 52(13) 42(18) 49(13) 49(20) 50(16) 53(9) 45(20) 46(13) 32(11) 49(14) 50(17) 48(15) 44(10) 44(17) 52(17) 45(16) 38(12) 52(14) 54(14) 39(15) 52(13) 42(18) 49(13) 49(20) 51(10) 53(9) 45(20) 44(13) 32(11) 49(14) 54(13) 59(15) 56(15) 48(13) 55(16)

92(9) 79(10) 91(12) 95(8) 95(7) 93(7) 82(15) 92(9) 95(8) 94(8) 94(11) 86(8) 94(8) 94(7) 93(11) 91(10) 60(9) 83(12) 93(8) 84(13) 90(7) 65(10) 91(7) 91(7) 82(9) 92(9) 68(16) 80(9) 94(7) 86(10) 94(7) 73(8) 91(9) 91(10) 85(14) 91(10) 60(9) 83(12) 92(8) 86(10) 90(7) 65(10) 88(9) 91(7) 82(9) 96(5) 56(16) 77(12) 94(7) 94(11)

85(14) 63(14) 82(13) 82(8) 85(12) 91(9) 58(18) 86(11) 88(11) 85(11) 88(9) 54(13) 90(9) 85(12) 84(13) 89(14) 79(10) 73(13) 67(14) 79(10) 82(14) 78(12) 79(14) 79(7) 80(16) 82(17) 80(12) 80(11) 77(9) 78(12) 82(17) 77(17) 62(20) 65(20) 80(12) 89(14) 79(10) 73(13) 67(14) 75(11) 82(14) 77(12) 73(24) 79(7) 80(16) 83(17) 77(14) 72(15) 68(19) 72(17)

89(12) 61(17) 77(12) 86(8) 85(14) 87(9) 62(20) 85(14) 85(10) 86(8) 85(11) 59(11) 92(8) 87(11) 87(8) 83(9) 53(12) 77(13) 75(13) 72(11) 85(8) 48(6) 80(9) 83(12) 84(7) 84(11) 52(15) 66(10) 87(9) 86(14) 81(14) 56(12) 67(13) 76(14) 75(14) 83(9) 53(12) 77(13) 75(13) 67(16) 85(8) 49(7) 76(13) 83(12) 84(7) 83(14) 46(18) 70(14) 72(15) 79(17)

67(21) 65(20) 69(10) 68(13) 66(18) 69(17) 64(12) 72(17) 66(7) 76(14) 75(16) 55(16) 80(14) 69(12) 73(11) 59(14) 47(18) 65(17) 69(13) 64(13) 65(16) 52(15) 70(14) 64(11) 61(23) 59(10) 51(12) 65(13) 68(18) 65(16) 64(13) 64(13) 56(16) 62(11) 59(10) 59(14) 47(18) 65(17) 69(13) 61(14) 65(16) 52(15) 70(14) 64(11) 61(23) 64(14) 56(20) 67(12) 66(18) 57(14)

50(22) 51(15) 58(16) 69(18) 54(18) 54(20) 49(17) 62(6) 65(13) 61(14) 60(25) 51(16) 70(17) 64(8) 65(15) 53(13) 43(13) 49(14) 46(20) 58(16) 50(15) 47(15) 49(14) 43(15) 50(16) 60(18) 54(15) 42(13) 42(17) 44(11) 43(13) 41(13) 42(14) 56(16) 44(24) 53(13) 43(13) 49(14) 46(20) 53(13) 50(15) 47(15) 42(16) 43(15) 50(16) 60(14) 54(17) 53(15) 49(16) 53(13)

92(9) 81(14) 96(8) 92(10) 96(7) 93(8) 81(14) 96(7) 95(8) 95(8) 93(8) 86(8) 97(7) 94(8) 95(8) 95(8) 68(16) 85(14) 92(9) 88(11) 92(8) 66(15) 96(7) 93(7) 91(11) 91(12) 70(12) 84(15) 96(7) 92(8) 93(8) 83(14) 92(9) 92(11) 94(10) 95(8) 68(16) 85(14) 93(7) 89(10) 92(8) 66(15) 91(12) 93(7) 91(11) 92(9) 60(17) 79(14) 90(8) 92(10)



Table 14. Positive predictive scores (in % by class) delivered by different settings of the OPF classifier. The wavelet/feature-selection combinations listed on the next line give, in order, the entries within each value row below; the twenty value rows correspond, in turn, to classes A to E for OPF with the Euclidean, Manhattan, Canberra and Squared Chi-square distances.

Coif2-non_norm Coif2-noFS Coif2-cfss Coif2-infogain Coif2-relief Coif3-non_norm Coif3-noFS Coif3-cfss Coif3-infogain Coif3-relief Coif4-non_norm Coif4-noFS Coif4-cfss Coif4-infogain Coif4-relief Db2-non_norm Db2-noFS Db2-cfss Db2-infogain Db2-relief Db3-non_norm Db3-noFS Db3-cfss Db3-infogain Db3-relief Db4-non_norm Db4-noFS Db4-cfss Db4-infogain Db4-relief Haar-non_norm Haar-noFS Haar-cfss Haar-infogain Haar-relief Sym2-non_norm Sym2-noFS Sym2-cfss Sym2-infogain Sym2-relief Sym3-non_norm Sym3-noFS Sym3-cfss Sym3-infogain Sym3-relief Sym4-non_norm Sym4-noFS Sym4-cfss Sym4-infogain Sym4-relief


73(9) 79(11) 86(11) 86(8) 84(11) 73(11) 78(11) 81(9) 85(10) 86(8) 75(13) 69(14) 90(7) 85(10) 88(10) 71(9) 73(11) 68(10) 69(14) 76(10) 73(8) 71(11) 80(11) 85(9) 84(7) 72(13) 70(12) 74(10) 82(6) 87(11) 75(12) 72(15) 71(7) 64(11) 80(11) 71(9) 73(11) 68(10) 69(14) 71(6) 73(8) 71(11) 78(14) 85(9) 84(7) 79(11) 66(10) 79(13) 73(11) 81(9)

86(10) 81(11) 86(13) 78(9) 84(6) 89(10) 83(8) 87(10) 83(11) 91(12) 87(8) 80(14) 93(7) 81(10) 92(12) 86(7) 73(16) 80(12) 84(11) 78(11) 84(8) 72(7) 80(10) 82(12) 82(12) 86(9) 74(16) 77(11) 80(12) 80(10) 93(10) 77(15) 75(12) 79(10) 78(11) 86(7) 73(16) 80(12) 83(12) 76(14) 84(8) 71(7) 78(12) 82(12) 82(12) 85(8) 72(13) 82(14) 76(9) 80(14)

58(14) 63(12) 67(13) 71(10) 63(11) 54(11) 71(12) 75(13) 69(8) 69(12) 58(16) 63(16) 77(10) 70(9) 69(10) 58(15) 51(10) 60(13) 64(16) 69(13) 59(13) 51(12) 55(10) 62(9) 63(11) 62(17) 59(13) 54(12) 62(9) 54(7) 56(14) 54(10) 52(11) 65(8) 60(9) 58(15) 51(10) 60(13) 65(16) 65(12) 59(13) 51(12) 53(13) 62(9) 63(11) 61(17) 48(9) 60(9) 65(17) 59(11)

45(16) 62(20) 65(18) 70(14) 66(17) 45(12) 68(16) 71(17) 67(8) 72(14) 52(16) 55(12) 78(15) 72(12) 67(12) 46(15) 50(12) 51(16) 54(13) 61(9) 49(12) 51(9) 53(12) 60(11) 57(9) 48(15) 54(12) 56(10) 64(15) 54(16) 48(10) 48(12) 54(14) 58(12) 57(8) 46(15) 50(12) 51(16) 55(14) 61(16) 49(12) 51(9) 42(15) 60(11) 57(9) 56(20) 55(15) 55(15) 54(14) 57(12)

94(7) 95(6) 94(7) 97(5) 93(7) 95(7) 97(5) 93(5) 98(4) 95(6) 93(9) 95(6) 96(5) 98(4) 96(6) 96(6) 92(7) 93(8) 89(8) 90(8) 94(7) 90(8) 95(6) 92(7) 92(7) 96(6) 91(11) 91(7) 90(7) 89(8) 93(6) 92(11) 92(8) 90(8) 93(7) 96(6) 92(7) 93(8) 89(8) 90(8) 94(7) 90(8) 93(7) 92(7) 92(7) 96(9) 88(11) 92(9) 91(6) 93(7)

78(10) 85(10) 85(9) 84(7) 85(11) 80(10) 87(12) 84(11) 86(10) 86(9) 74(12) 83(10) 92(7) 86(10) 86(12) 78(9) 81(8) 78(14) 66(14) 79(10) 82(6) 81(8) 82(11) 83(9) 84(10) 79(11) 84(8) 76(8) 80(11) 84(12) 86(11) 79(10) 76(10) 71(10) 77(11) 78(9) 81(8) 78(14) 66(14) 74(6) 82(6) 81(8) 80(10) 83(9) 84(10) 83(10) 77(10) 80(7) 73(12) 82(10)

89(11) 85(9) 87(11) 81(12) 85(11) 93(7) 89(9) 89(10) 83(11) 90(13) 87(8) 87(10) 94(7) 81(9) 90(11) 89(9) 79(10) 85(8) 80(15) 80(11) 91(7) 85(8) 83(13) 83(9) 82(9) 88(8) 87(13) 87(10) 85(10) 82(10) 93(8) 82(11) 80(14) 77(11) 81(9) 89(9) 79(10) 85(8) 80(16) 79(9) 91(7) 85(8) 81(12) 83(9) 82(9) 90(14) 85(8) 85(11) 80(9) 85(13)

61(14) 73(10) 70(10) 69(8) 64(13) 59(13) 71(9) 69(12) 68(12) 69(10) 60(9) 77(12) 88(9) 67(12) 70(7) 60(13) 61(11) 67(6) 64(16) 69(11) 60(10) 61(11) 61(11) 62(9) 63(12) 63(11) 65(10) 62(11) 59(13) 62(5) 58(15) 62(11) 57(13) 68(6) 62(13) 60(13) 61(11) 67(6) 64(16) 66(12) 60(10) 61(11) 60(15) 62(9) 63(12) 66(11) 67(12) 71(10) 65(15) 61(18)

48(14) 74(16) 68(15) 73(13) 68(19) 50(10) 71(7) 68(16) 69(9) 75(11) 60(15) 71(13) 81(11) 68(17) 73(12) 56(14) 60(13) 67(15) 55(16) 65(11) 53(12) 63(15) 61(14) 58(11) 61(13) 55(12) 65(15) 65(13) 60(17) 61(12) 51(11) 60(15) 56(12) 59(7) 65(16) 56(14) 60(13) 67(15) 55(16) 64(18) 53(12) 63(15) 52(17) 58(11) 61(13) 57(13) 67(13) 67(13) 52(15) 58(12)

96(6) 94(6) 97(4) 97(5) 94(7) 94(7) 97(5) 95(6) 97(5) 96(6) 95(6) 96(5) 96(5) 97(5) 96(6) 96(6) 96(5) 95(6) 89(7) 91(7) 96(6) 96(5) 94(6) 93(7) 93(7) 95(8) 93(5) 93(7) 91(7) 94(6) 94(6) 92(6) 94(8) 91(9) 95(7) 96(6) 96(5) 95(6) 89(7) 92(7) 96(6) 96(5) 92(8) 93(7) 93(7) 95(9) 94(6) 96(5) 92(7) 93(7)

66(13) 53(14) 75(8) 84(8) 84(13) 75(9) 52(16) 83(9) 86(10) 86(9) 72(10) 59(13) 90(7) 85(10) 86(11) 72(11) 68(13) 72(8) 65(19) 83(8) 73(11) 71(11) 85(12) 77(11) 83(8) 75(12) 70(8) 73(14) 78(13) 89(10) 64(8) 71(12) 73(7) 70(11) 76(12) 72(11) 68(13) 72(8) 65(19) 80(10) 73(11) 72(12) 83(11) 77(11) 83(8) 74(14) 70(11) 80(14) 69(8) 84(10)

76(13) 64(13) 78(7) 82(11) 84(11) 84(9) 74(10) 89(7) 83(11) 90(13) 84(14) 75(17) 95(7) 81(10) 90(11) 84(12) 67(8) 74(10) 80(17) 74(8) 73(7) 71(12) 79(13) 84(10) 83(11) 83(10) 70(8) 78(12) 82(7) 82(11) 83(12) 72(14) 79(14) 75(11) 73(11) 84(12) 67(8) 74(10) 80(17) 74(10) 73(7) 71(12) 78(13) 84(10) 83(11) 83(11) 65(14) 72(13) 79(13) 80(12)

60(12) 53(14) 62(13) 70(10) 65(13) 62(11) 56(10) 74(8) 68(12) 68(10) 61(8) 53(12) 83(9) 68(12) 68(7) 60(17) 40(7) 58(11) 65(18) 56(12) 63(11) 48(4) 58(13) 52(7) 57(11) 62(11) 49(15) 56(11) 62(16) 56(7) 62(15) 46(10) 50(12) 65(8) 53(9) 60(17) 40(7) 58(11) 65(19) 52(9) 63(11) 48(5) 61(12) 52(7) 57(11) 56(16) 52(12) 59(7) 64(18) 60(17)

54(15) 50(11) 67(19) 71(10) 68(16) 64(13) 50(11) 69(13) 69(8) 73(12) 59(11) 51(12) 77(12) 69(16) 70(12) 60(12) 49(12) 54(14) 54(14) 51(17) 62(7) 45(16) 59(14) 45(16) 51(12) 61(18) 49(10) 51(11) 55(12) 58(13) 54(17) 45(15) 51(10) 56(10) 45(14) 60(12) 49(12) 54(14) 54(14) 48(10) 62(7) 45(16) 51(13) 45(16) 51(12) 58(16) 57(11) 62(11) 51(15) 56(12)

95(6) 98(4) 97(5) 97(5) 93(8) 98(4) 91(10) 93(7) 98(4) 96(6) 96(6) 95(7) 95(5) 97(5) 97(6) 93(7) 82(14) 93(8) 91(7) 92(10) 90(11) 86(11) 92(8) 93(7) 90(8) 95(5) 87(17) 95(9) 89(7) 93(5) 94(7) 83(4) 94(6) 89(9) 95(7) 93(7) 82(14) 93(8) 91(7) 90(7) 90(11) 86(11) 91(8) 93(7) 90(8) 93(8) 78(14) 94(8) 93(7) 92(6)

86(12) 57(11) 74(7) 85(8) 84(11) 88(8) 56(12) 80(10) 85(10) 86(8) 86(8) 54(14) 91(7) 85(10) 88(10) 80(8) 62(8) 72(9) 66(17) 77(8) 80(8) 68(18) 84(10) 81(9) 85(8) 85(12) 64(9) 72(9) 82(13) 89(11) 75(15) 72(8) 67(7) 65(12) 74(12) 80(8) 62(8) 72(9) 66(17) 72(9) 80(8) 67(18) 82(15) 81(9) 85(8) 82(11) 65(10) 76(20) 71(10) 78(15)

90(8) 66(16) 82(11) 78(9) 84(6) 94(6) 69(14) 86(12) 83(11) 89(11) 90(6) 65(13) 94(5) 81(10) 92(12) 89(11) 65(12) 75(9) 81(11) 74(9) 88(11) 65(18) 82(12) 83(11) 80(11) 85(9) 72(14) 77(6) 83(11) 80(6) 92(8) 74(13) 77(12) 79(10) 75(8) 89(11) 65(12) 75(9) 80(12) 74(10) 88(11) 66(18) 79(18) 83(11) 80(11) 90(11) 59(15) 73(12) 74(10) 79(12)

59(14) 56(12) 66(12) 71(10) 62(11) 62(14) 54(8) 75(11) 69(9) 68(11) 69(15) 58(12) 76(12) 70(9) 67(10) 62(12) 44(16) 58(15) 62(14) 69(13) 63(10) 47(11) 60(8) 58(9) 57(7) 64(13) 47(11) 55(17) 58(9) 55(3) 57(9) 51(13) 47(9) 65(10) 57(10) 62(12) 44(16) 58(15) 63(14) 58(13) 63(10) 48(11) 60(6) 58(9) 57(7) 66(7) 47(10) 58(7) 64(16) 58(13)

56(21) 53(6) 68(17) 70(14) 64(17) 59(17) 50(14) 72(17) 67(10) 70(16) 65(16) 49(13) 77(17) 71(13) 67(12) 55(9) 46(12) 57(19) 56(14) 58(16) 54(14) 41(11) 59(11) 51(12) 57(13) 54(11) 49(10) 51(13) 57(11) 54(13) 49(9) 47(14) 44(14) 58(14) 52(13) 55(9) 46(12) 57(19) 56(14) 54(15) 54(14) 40(10) 48(15) 51(12) 57(13) 59(14) 54(12) 56(11) 50(14) 53(8)

96(6) 98(5) 94(6) 98(4) 92(8) 95(6) 96(9) 92(5) 98(4) 94(7) 95(6) 94(8) 96(6) 97(5) 97(6) 95(6) 81(12) 92(8) 89(8) 89(7) 91(7) 89(12) 92(5) 92(8) 89(7) 95(6) 91(10) 93(7) 90(6) 92(6) 92(6) 87(6) 91(8) 90(8) 95(6) 95(6) 81(12) 92(8) 89(8) 90(7) 91(7) 89(12) 88(5) 92(8) 89(7) 96(7) 76(17) 92(8) 91(6) 92(8)



Table 15. F-measure scores (in % by class) delivered by different settings of the OPF classifier. The wavelet/feature-selection combinations listed on the next line give, in order, the entries within each value row below; the twenty value rows correspond, in turn, to classes A to E for OPF with the Euclidean, Manhattan, Canberra and Squared Chi-square distances.

Coif2-non_norm Coif2-noFS Coif2-cfss Coif2-infogain Coif2-relief Coif3-non_norm Coif3-noFS Coif3-cfss Coif3-infogain Coif3-relief Coif4-non_norm Coif4-noFS Coif4-cfss Coif4-infogain Coif4-relief Db2-non_norm Db2-noFS Db2-cfss Db2-infogain Db2-relief Db3-non_norm Db3-noFS Db3-cfss Db3-infogain Db3-relief Db4-non_norm Db4-noFS Db4-cfss Db4-infogain Db4-relief Haar-non_norm Haar-noFS Haar-cfss Haar-infogain Haar-relief Sym2-non_norm Sym2-noFS Sym2-cfss Sym2-infogain Sym2-relief Sym3-non_norm Sym3-noFS Sym3-cfss Sym3-infogain Sym3-relief Sym4-non_norm Sym4-noFS Sym4-cfss Sym4-infogain Sym4-relief


76 80 87 84 85 76 79 84 87 86 78 71 90 85 86 76 75 71 70 79 77 74 78 82 83 75 75 76 79 83 78 71 66 64 81 76 75 71 70 75 77 74 74 82 83 82 70 79 72 80

83 82 86 82 85 85 84 86 84 88 84 80 92 84 89 83 69 78 81 73 82 65 81 83 81 82 67 71 84 81 86 67 73 77 75 83 69 78 81 73 82 65 76 83 81 83 63 79 77 80

57 64 67 70 65 53 72 73 67 73 57 61 79 69 71 58 53 64 66 68 58 54 59 63 64 63 63 58 65 57 58 58 55 63 62 58 53 64 67 62 58 54 55 63 64 60 53 62 66 58

46 56 61 69 60 45 65 69 66 66 51 55 75 68 66 45 50 46 50 61 48 49 47 56 56 48 49 53 54 49 46 49 53 57 53 45 50 46 51 59 48 49 42 56 56 57 52 55 51 55

92 94 94 94 94 93 93 94 97 95 91 91 96 96 95 95 87 91 90 92 93 88 95 93 91 93 88 89 93 91 93 91 92 91 94 95 87 91 91 91 93 88 93 93 91 93 85 88 91 92

80 86 87 86 85 83 88 86 87 86 77 84 92 84 85 83 80 78 67 82 87 84 80 84 82 81 83 81 79 83 88 80 72 71 79 83 80 78 66 78 87 84 78 84 82 87 83 83 75 81

84 87 86 83 86 89 88 87 84 89 83 87 93 84 89 84 81 86 78 76 87 82 84 84 84 84 83 81 86 83 88 75 76 77 78 84 81 86 77 77 87 82 81 84 84 86 80 84 77 84

59 75 70 71 67 56 73 71 69 73 63 76 85 67 72 63 63 71 64 68 63 65 64 61 65 67 71 66 62 66 60 66 63 68 66 63 63 71 64 65 63 65 62 61 65 65 67 71 64 59

51 66 66 67 61 51 68 64 65 70 55 70 82 64 69 50 56 59 53 63 49 58 56 55 57 51 59 58 53 58 50 54 53 58 57 50 56 59 53 61 49 58 50 55 57 57 64 68 51 60

94 94 96 96 95 93 96 95 96 95 93 95 96 96 95 96 94 95 91 92 94 95 95 93 92 92 94 92 94 93 93 93 95 93 95 96 94 95 91 92 94 95 93 93 92 94 93 92 92 92

68 54 76 85 83 77 55 85 87 86 75 62 91 84 85 79 72 67 66 78 71 74 82 76 83 77 74 77 77 85 66 73 69 69 74 79 72 67 66 76 71 74 79 76 83 73 70 77 70 81

73 61 78 84 85 81 68 85 84 89 78 70 93 84 89 77 64 78 77 76 75 66 82 83 84 78 68 77 84 85 76 65 73 77 75 77 64 78 77 73 75 66 80 83 84 81 65 76 75 82

62 54 66 71 68 63 59 74 69 72 65 56 82 67 70 60 45 63 65 58 66 53 63 58 59 67 52 60 63 59 66 52 55 65 58 60 45 63 65 54 66 54 65 58 59 57 56 63 63 59

51 52 63 66 61 62 50 69 65 69 55 49 79 66 67 56 45 51 51 50 57 45 52 37 50 55 49 47 49 55 49 41 51 55 42 56 45 51 51 50 57 45 47 37 50 56 58 59 49 55

94 87 94 96 94 95 86 93 97 95 95 90 95 96 95 92 69 88 92 88 90 74 92 92 86 93 77 87 92 89 94 78 93 90 90 92 69 88 91 88 90 74 89 92 86 95 65 85 93 93

85 60 78 83 85 90 57 83 87 86 87 54 90 85 86 84 69 72 66 78 81 72 82 80 82 83 71 76 79 83 79 74 65 65 77 84 69 72 66 74 81 72 77 80 82 82 70 74 69 75

90 63 79 82 85 91 65 85 84 88 87 62 93 84 89 86 59 76 78 73 86 55 81 83 82 84 60 71 85 83 86 64 72 77 75 86 59 76 77 70 86 56 78 83 82 87 52 72 73 79

63 60 68 69 64 65 58 73 67 72 72 56 78 69 70 60 45 61 65 66 64 50 65 61 59 61 49 60 63 59 60 57 51 63 58 60 45 61 66 59 64 50 64 61 59 65 51 62 65 57

53 52 63 70 59 56 49 66 66 65 62 50 73 67 66 54 44 53 50 58 52 44 54 47 53 57 52 46 48 49 46 44 43 57 48 54 44 53 50 54 52 43 45 47 53 59 54 54 50 53

94 89 95 95 94 94 88 94 97 94 94 90 96 96 96 95 74 89 90 89 91 76 94 92 90 93 79 88 93 92 93 85 91 91 95 95 74 89 91 89 91 76 89 92 90 94 67 85 90 92




Table 16. Average scores of accuracy, training and test run times (in ms) delivered by the SVM-RBF, Bayesian and ANN-MLP classifiers. The wavelet/feature-selection combinations listed on the next line give, in order, the entries within each value row below; the nine value rows correspond, in turn, to accuracy [%], training time [ms] and test time [ms] for SVM-RBF, Bayesian and ANN-MLP.

Coif2-non_norm Coif2-noFS Coif2-cfss Coif2-infogain Coif2-relief Coif3-non_norm Coif3-noFS Coif3-cfss Coif3-infogain Coif3-relief Coif4-non_norm Coif4-noFS Coif4-cfss Coif4-infogain Coif4-relief Db2-non_norm Db2-noFS Db2-cfss Db2-infogain Db2-relief Db3-non_norm Db3-noFS Db3-cfss Db3-infogain Db3-relief Db4-non_norm Db4-noFS Db4-cfss Db4-infogain Db4-relief Haar-non_norm Haar-noFS Haar-cfss Haar-infogain Haar-relief Sym2-non_norm Sym2-noFS Sym2-cfss Sym2-infogain Sym2-relief Sym3-non_norm Sym3-noFS Sym3-cfss Sym3-infogain Sym3-relief Sym4-non_norm Sym4-noFS Sym4-cfss Sym4-infogain Sym4-relief


25.6(67) 81.8(77) 80.2(30) 84.4(65) 79.4(50) 24.4(51) 80.6(65) 81.2(41) 80.2(55) 78.2(58) 31.4(50) 792(83) 81.6(48) 80.8(56) 78.8(70) 44.4(42) 75.2(61) 73.0(54) 70.8(58) 73.8(55) 42.8(53) 674(82) 71.8(44) 74.6(40) 74.4(60) 29.2(67) 71.6(49) 75.0(58) 73.6(51) 72.6(71) 45.0(38) 72.0(60) 70.2(45) 74.2(47) 72.2(36) 44.4(42) 75.2(61) 73.0(54) 69.4(55) 73.6(54) 42.8(53) 69.4(75) 68.8(66) 74.6(40) 74.4(60) 41.8(37) 72.6(56) 72.8(41) 76.4(41) 63.0(70)

27,568.61(429.3) 18,719.68(253.6) 14,436.41(279.5) 12,948.55(212.2) 13,567.97(286.9) 27,206.42(247.9) 18,626.86(147.4) 13,913.29(123.4) 12,891.43(197.1) 13,695.19(247.3) 27,066.14(21.3) 18,547.42(167.2) 13,586.72(172.7) 12,854.24(187.9) 14,040.37(317.7) 9,763,582.28(5,304,385.7) 21,427.69(279.0) 17,008.38(214.1) 16,636.12(311.9) 15,789.13(141.2) 9,573,029.60(4,506,036.6) 21,967.88(110.6) 17,914.36(180.3) 15,869.68(230.7) 16,340.22(289.7) 10,209,792.18(5,354,475.8) 21,973.33(162.7) 17,870.59(197.1) 16,386.31(333.5) 15,521.78(247.1) 1,372,857.42(4,254,705.2) 21,657.75(167.4) 17,405.06(197.1) 15,398.97(196.8) 16,779.54(240.9) 9,760,188.90(5,301,008.2) 21,436.12(208.3) 16,820.45(250.3) 16,285.33(174.6) 16,076.74(274.9) 9,574,059.54(4,501,078.2) 22,005.92(162.1) 18,496.28(173.0) 15,685.20(237.0) 16,163.48(224.1) 22,928,221.68(6,758,366.6) 22,086.40(193.6) 17,649.64(143.8) 15,855.21(328.4) 21,072.77(12.3)

18.38(1.63) 9.39(2.13) 5.85(0.77) 4.35(0.25) 4.44(0.27) 17.83(1.19) 9.28(1.25) 5.16(0.23) 4.46(0.22) 4.27(0.33) 19.25(1.60) 9.61(1.13) 5.76(0.80) 4.49(0.28) 4.28(0.13) 17.59(1.34) 13.67(1.67) 6.60(0.90) 4.80(0.21) 5.13(0.67) 22.67(9.80) 12.84(1.18) 6.84(0.79) 5.02(0.32) 5.61(0.91) 20.35(7.37) 12.17(1.39) 9.01(1.44) 5.10(0.46) 5.15(0.38) 17.66(1.65) 12.13(2.40) 6.24(0.38) 4.74(0.39) 5.03(0.35) 18.68(1.55) 12.47(1.45) 6.41(1.21) 4.98(0.39) 5.08(0.43) 18.15(1.96) 13.46(2.26) 6.72(0.78) 5.15(0.32) 5.21(0.34) 23.28(11.47) 13.36(1.69) 7.21(0.68) 5.03(0.24) 7.03(0.49)

70.8(54) 75.8(50) 80.4(51) 80.6(45) 78.2(64) 70.0(58) 78.6(49) 812(77) 81.0(51) 81.6(72) 72.6(73) 71.6(63) 87.4(50) 80.2(48) 82.6(38) 72.0(71) 66.4(62) 69.6(45) 70.6(40) 75.2(47) 72.0(47) 658(70) 72.0(66) 75.2(49) 74.8(52) 71.2(83) 69.2(51) 69.8(55) 75.4(57) 73.8(52) 72.2(61) 67.2(58) 67.6(40) 70.0(56) 73.4(48) 72.0(71) 66.4(62) 69.6(45) 71.0(39) 72.4(70) 72.0(47) 65.6(70) 68.2(61) 75.2(49) 74.8(52) 73.8(68) 64.4(85) 72.4(66) 71.6(59) 73.0(57)

11.95(3.67) 13.64(4.68) 7.99(2.74) 4.98(1.62) 6.51(2.18) 13.66(4.91) 9.94(2.97) 7.51(3.01) 4.98(2.34) 6.01(2.72) 10.09(2.85) 13.54(3.31) 7.62(2.84) 5.79(2.25) 6.30(2.04) 13.51(4.78) 12.28(3.96) 7.12(2.54) 6.49(2.21) 6.23(2.45) 14.57(4.29) 13.64(4.25) 8.99(2.61) 5.76(2.29) 5.54(2.46) 12.76(4.75) 15.82(2.29) 8.74(3.58) 6.43(2.07) 7.82(0.03) 16.81(2.59) 16.30(1.47) 12.31(0.66) 7.08(1.15) 7.45(0.98) 16.78(3.03) 16.91(1.80) 9.18(1.22) 7.44(0.99) 8.08(0.62) 14.65(2.87) 14.07(4.16) 12.08(0.53) 6.85(1.97) 7.39(1.48) 15.97(2.11) 13.46(5.15) 11.96(2.17) 7.61(0.55) 7.86(0.05)

6.13(0.50) 5.92(0.17) 5.49(2.21) 4.13(1.53) 4.80(1.45) 5.96(0.30) 6.19(1.01) 5.06(1.94) 3.85(1.64) 4.77(2.00) 6.25(0.74) 5.84(0.03) 5.29(1.47) 5.08(2.02) 4.79(1.21) 6.78(2.33) 6.18(0.66) 5.55(1.70) 4.71(1.65) 4.99(1.70) 5.88(0.09) 6.13(0.84) 6.46(1.79) 4.54(1.74) 4.36(1.96) 6.12(0.61) 5.83(0.02) 5.01(1.84) 5.08(1.76) 5.52(1.22) 5.90(0.12) 6.23(0.88) 6.14(2.23) 5.83(0.87) 5.87(0.82) 5.95(0.34) 5.99(0.45) 6.15(1.72) 5.71(1.08) 6.10(1.04) 6.07(0.44) 6.26(0.97) 6.10(2.17) 5.18(1.47) 6.06(1.17) 5.84(0.02) 5.87(0.11) 6.00(1.63) 6.07(0.78) 5.37(0.94)

31.8(107) 61.6(57) 70.6(52) 58.8(53) 66.8(79) 34.2(154) 59.2(120) 632(162) 56.0(114) 68.4(51) 30.8(125) 556(144) 67.0(106) 60.6(106) 62.4(133) 29.6(88) 59.0(63) 69.2(103) 35.2(100) 67.4(54) 31.6(139) 526(69) 64.4(42) 65.4(74) 65.2(74) 30.8(93) 56.0(79) 60.4(83) 62.2(67) 64.0(110) 31.2(82) 57.8(66) 61.2(66) 64.0(83) 64.0(49) 300.(105) 57.4(68) 66.8(58) 60.8(66) 65.4(63) 32.0(125) 50.4(88) 62.0(71) 64.0(44) 69.8(73) 32.0(114) 54.8(91) 58.0(87) 63.0(93) 44.4(97)

19,227.69(147.7) 19,377.45(164.2) 16,978.41(140.2) 15,985.90(130.6) 16,230.57(166.4) 19,105.02(126.1) 19,428.00(215.4) 16,837.72(141.4) 16,035.60(214.5) 16,089.91(121.4) 19,115.42(177.5) 19,379.60(209.3) 17,080.95(92.7) 15,992.41(160.9) 16,165.28(241.6) 19,427.04(406.9) 19,355.38(1010.5) 16,696.58(185.7) 16,248.33(99.1) 16,192.33(218.9) 19,187.43(123.5) 19,612.89(230.8) 17,092.99(166.2) 16,128.92(169.2) 16,124.40(118.5) 19,194.29(148.6) 19,619.92(214.4) 17,683.77(166.7) 16,168.08(202.1) 16,104.08(114.0) 19,087.22(151.2) 19,573.10(190.9) 17,353.03(118.3) 16,202.26(244.9) 16,132.51(89.7) 19,180.42(106.8) 19,690.59(186.7) 16,597.29(89.0) 16,126.94(165.7) 16,105.99(133.3) 19,128.59(183.8) 19,556.20(196.1) 17,391.26(138.8) 16,033.80(143.4) 16,077.47(85.7) 19,149.19(74.9) 19,547.43(223.1) 17,636.37(131.7) 16,149.00(197.6) 16,072.12(110.0)

0.15(0.00) 0.15(0.00) 0.13(0.00) 0.13(0.00) 0.13(0.00) 0.15(0.00) 0.15(0.00) 0.13(0.00) 0.13(0.00) 0.13(0.00) 0.15(0.00) 0.15(0.00) 0.13(0.00) 0.13(0.00) 0.13(0.01) 0.15(0.00) 0.15(0.00) 0.13(0.00) 0.13(0.00) 0.13(0.00) 0.15(0.00) 0.15(0.00) 0.14(0.01) 0.13(0.00) 0.13(0.00) 0.15(0.00) 0.15(0.00) 0.14(0.00) 0.13(0.00) 0.13(0.00) 0.15(0.00) 0.15(0.00) 0.14(0.00) 0.13(0.00) 0.13(0.00) 0.15(0.00) 0.15(0.00) 0.13(0.00) 0.13(0.00) 0.13(0.00) 0.15(0.00) 0.15(0.00) 0.14(0.00) 0.13(0.00) 0.13(0.00) 0.15(0.00) 0.15(0.00) 0.14(0.00) 0.13(0.00) 0.13(0.00)


Appendix

In this section, we present an extra set of tables with the complete results of the experimental section. Table 12 displays the mean OPF accuracy and the training and test times for all datasets, considering the Euclidean, Manhattan, Canberra and Squared Chi-square distance measures. Tables 13, 14 and 15 present, respectively, the sensitivity, positive predictive value and F-measure scores for each class. Finally, Table 16 displays the mean accuracy and the training and testing times for the SVM-RBF, Bayesian and ANN-MLP classifiers.
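Since the appendix tables compare OPF under four distance functions, the following sketch spells out one common definition of each. The Canberra and squared chi-square forms follow the usual conventions (coordinates with a zero denominator are skipped); the exact scaling adopted by LibOPF may differ slightly, so this is an illustrative assumption rather than the library's definition.

```python
import numpy as np

def euclidean(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    return np.sum(np.abs(x - y))

def canberra(x, y):
    denom = np.abs(x) + np.abs(y)
    mask = denom > 0                      # skip coordinates where both values are zero
    return np.sum(np.abs(x - y)[mask] / denom[mask])

def squared_chi_square(x, y):
    denom = np.abs(x) + np.abs(y)
    mask = denom > 0
    return np.sum(((x - y) ** 2)[mask] / denom[mask])

a, b = np.array([1.0, 2.0, 0.0]), np.array([2.0, 0.5, 0.0])
print(euclidean(a, b), manhattan(a, b), canberra(a, b), squared_chi_square(a, b))
```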

References

[1] K. Najarian, R. Splinter, Biomedical Signal and Image Processing, second ed., CRC Press/Taylor & Francis Group, Broken Sound Parkway, NW, USA, 2012.
[2] B.S. Chang, D.H. Lowenstein, Epilepsy, New Engl. J. Med. 349 (2003) 1257–1266.
[3] T.R. Browne, G.L. Holmes, Handbook of Epilepsy, Lippincott Williams & Wilkins, Philadelphia, PA, USA, 2003.
[4] V.P. Nigam, D. Graupe, A neural-network-based detection of epilepsy, Neurol. Res. 26 (2004) 55–60.
[5] L.M. Patnaik, O.K. Manyam, Epileptic EEG detection using neural networks and post-classification, Comput. Methods Progr. Biomed. 91 (2) (2008) 100–109.
[6] A. Subasi, Epileptic seizure detection using dynamic wavelet network, Expert Syst. Appl. 28 (2005) 701–711.
[7] A. Subasi, E. Ercelebi, Classification of EEG signals using neural network and logistic regression, Comput. Methods Progr. Biomed. 78 (2005) 87–99.
[8] A. Subasi, EEG signal classification using wavelet feature extraction and a mixture of expert model, Expert Syst. Appl. 32 (2007) 1084–1093.
[9] E.D. Übeyli, Wavelet/mixture of experts network structure for EEG signals classification, Expert Syst. Appl. 34 (2008) 1954–1962.
[10] İ. Güler, E.D. Übeyli, Adaptive neuro-fuzzy inference system for classification of EEG signals using wavelet coefficients, J. Neurosci. Methods 148 (2005) 113–121.
[11] N. Kannathal, L.C. Min, U.R. Acharya, P.K. Sadasivan, Entropies for detection of epilepsy in EEG, Comput. Methods Progr. Biomed. 80 (2005) 187–194.
[12] A.T. Tzallas, M.G. Tsipouras, D.I. Fotiadis, Epileptic seizure detection in EEGs using time-frequency analysis, IEEE Trans. Inf. Technol. Biomed. 13 (2009) 703–710.
[13] Y. Kocyigit, A. Alkan, H. Erol, Classification of EEG recordings by using fast independent component analysis and artificial neural network, J. Med. Syst. 32 (2008) 17–20.
[14] C.A.M. Lima, A.L.V. Coelho, S. Chagas, Automatic EEG signal classification for epilepsy diagnosis with relevance vector machines, Expert Syst. Appl. 36 (2009) 10054–10059.
[15] C.A.M. Lima, A.L.V. Coelho, M. Eisencraft, Tackling EEG signal classification with least squares support vector machines: a sensitivity analysis study, Comput. Biol. Med. 40 (2010) 705–714.
[16] C.A.M. Lima, A.L.V. Coelho, Kernel machines for epilepsy diagnosis via EEG signal classification: a comparative study, Artif. Intell. Med. 53 (2011) 83–95.
[17] R.G. Andrzejak, G. Widman, K. Lehnertz, C. Rieke, P. David, C.E. Elger, The epileptic process as nonlinear deterministic dynamics in a stochastic environment: an evaluation on mesial temporal lobe epilepsy, Epilepsy Res. 44 (2001) 129–140.
[18] J.P. Papa, A.X. Falcão, C.T.N. Suzuki, Supervised pattern classification based on Optimum-Path Forest, Int. J. Imag. Syst. Technol. 19 (2) (2009) 120–131.
[19] J.P. Papa, V.H.C. Albuquerque, A.X. Falcão, J.M.R.S. Tavares, Efficient supervised Optimum-Path Forest classification for large datasets, Pattern Recognit. 45 (2012) 512–520.
[20] A.I. Iliev, M.S. Scordilis, J.P. Papa, A.X. Falcão, Spoken emotion recognition through Optimum-Path Forest classification using glottal features, Comput. Speech Lang. (2010) 445–460.
[21] J.P. Papa, A.X. Falcão, G.M. Freitas, A.M.H. Ávila, Robust pruning of training patterns for Optimum-Path Forest classification applied to satellite-based rainfall occurrence estimation, IEEE Geosci. Remote Sens. Lett. 7 (2) (2010) 396–400.
[22] J.P. Papa, R.Y.M. Nakamura, V.H.C. Albuquerque, A.X. Falcão, J.M.R.S. Tavares, Computer techniques towards the automatic characterization of graphite particles in metallographic images of industrial materials, Expert Syst. Appl. 40 (2013) 590–597.
[23] C.C.O. Ramos, A.N. Souza, J.P. Papa, A.X. Falcão, A new approach for nontechnical losses detection based on Optimum-Path Forest, IEEE Trans. Power Syst. 26 (2011) 181–189.
[24] C.R. Pereira, R.Y.M. Nakamura, K.A.P. Costa, J.P. Papa, An Optimum-Path Forest framework for intrusion detection in computer networks, Eng. Appl. Artif. Intell. 25 (2012) 1226–1234.
[25] A.T. Silva, A.X. Falcão, L.P. Magalhães, Active learning paradigms for CBIR systems based on Optimum-Path Forest classification, Pattern Recognit. 44 (2011) 2971–2978.
[26] E.J.S. Luz, T.M. Nunes, V.H.C. Albuquerque, J.P. Papa, D. Menotti, ECG arrhythmia classification based on Optimum-Path Forest, Expert Syst. Appl. 40 (9) (2013) 3561–3573.
[27] A.T. Silva, J.A. Santos, A. Falcão, R.S. Torres, L.P. Magalhães, Incorporating multiple distance spaces in Optimum-Path Forest classification to improve feedback-based learning, Comput. Vis. Image Underst. 116 (2012) 510–523.
[28] E.D. Übeyli, Statistics over features: EEG signals analysis, Comput. Biol. Med. 39 (2009) 733–741.
[29] Y.Y. Tang, Wavelet Theory Approach to Pattern Recognition, World Scientific Publishing, Hackensack, NJ, USA, 2009.
[30] D.F. Walnut, An Introduction to Wavelet Analysis, Birkhäuser, Fairfax, VA, USA, 2004.
[31] T. Gandhi, B.K. Panigrahi, S. Anand, A comparative study of wavelet families for EEG signal classification, Neurocomputing 74 (17) (2011) 3051–3057.
[32] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, second ed., Morgan Kaufmann, San Francisco, CA, USA, 2005.
[33] H. Ocak, Optimal classification of epileptic seizures in EEG using wavelet analysis and genetic algorithm, Signal Process. 88 (2008) 1858–1867.
[34] J.P. Papa, C.T.N. Suzuki, A.X. Falcão, LibOPF: a library for the design of Optimum-Path Forest classifiers, Software Version 2.0, 2009. Available at 〈http://www.ic.unicamp.br/~afalcao/LibOPF〉.
[35] E.W. Dijkstra, A note on two problems in connexion with graphs, Numer. Math. 1 (1959) 269–271.
[36] A.X. Falcão, J. Stolfi, R.A. Lotufo, The image foresting transform: theory, algorithms, and applications, IEEE Trans. Pattern Anal. Mach. Intell. 26 (2004) 19–29.
[37] C. Allène, J.Y. Audibert, M. Couprie, J. Cousty, R. Keriven, Some links between min-cuts, optimal spanning forests and watersheds, in: 8th International Symposium on Mathematical Morphology (ISMM'07), October 10–13, Rio de Janeiro, Brazil, 2007, pp. 253–264.
[38] E.T. Jaynes, Probability Theory: The Logic of Science, Cambridge University Press, Cambridge, UK, 2003.
[39] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, second ed., Wiley-Interscience, New York, NY, USA, 2000.
[40] S. Haykin, Neural Networks: A Comprehensive Foundation, second ed., Prentice Hall, Upper Saddle River, NJ, USA, 1999.
[41] S. Nissen, Implementation of a Fast Artificial Neural Network Library (FANN), Department of Computer Science, University of Copenhagen (DIKU), 2003. Software available at 〈http://leenissen.dk/fann/〉.
[42] V.N. Vapnik, An overview of statistical learning theory, IEEE Trans. Neural Netw. 10 (5) (1999) 988–999.
[43] B. Schölkopf, A.J. Smola, Learning with Kernels, MIT Press, Cambridge, MA, USA, 2002.
[44] C. Cortes, V. Vapnik, Support vector networks, Mach. Learn. 20 (1995) 273–297.
[45] C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (2011) 1–27. Software available at 〈http://www.csie.ntu.edu.tw/~cjlin/libsvm〉.
[46] N.F. Güler, E.D. Übeyli, İ. Güler, Recurrent neural networks employing Lyapunov exponents for EEG signals classification, Expert Syst. Appl. 29 (2005) 506–514.
[47] E.D. Übeyli, Combined neural network model employing wavelet coefficients for EEG signals classification, Digit. Signal Process. 19 (2009) 297–308.
[48] T.M. Nunes, V.H.C. Albuquerque, J.P. Papa, C.S. Silva, P.G. Normando, E.P. Moura, J.M.R.S. Tavares, Automatic microstructural characterization and classification using artificial intelligence techniques on ultrasound signals, Expert Syst. Appl. 40 (8) (2013) 3096–3105.
[49] R.B. Pachori, V. Bajaj, Analysis of normal and epileptic seizure EEG signals using empirical mode decomposition, Comput. Methods Progr. Biomed. 104 (3) (2011) 373–381.

Thiago M. Nunes graduated in Mechatronics Technology from the Federal Institute of Education, Science and Technology of Ceará (IFCE, 2009). Currently, he is an M.Sc. student in the Department of Teleinformatics Engineering at the Federal University of Ceará (UFC). He is a collaborator of the Edmond and Lily Safra International Institute of Neuroscience of Natal (ELS-IINN) and of the Brazilian National Institute of Science and Technology/Brain-Machine Interface (INCT/INCEMAQ). His major fields of interest are Pattern Recognition, Artificial Intelligence, and Biomedical Signal/Image Analysis and Processing.


André L.V. Coelho earned the B.Sc. degree in Computer Engineering (1996) and the M.Sc. (1998) and Ph.D. (2004) degrees in Electrical Engineering, all from the State University of Campinas (Unicamp), SP, Brazil. He has a record of publications on machine learning, data mining, computational intelligence, metaheuristics, and multiagent systems. He is a member of the ACM and IEEE and has served as a reviewer for a number of scientific conferences and journals. Currently, he is an adjunct professor affiliated with the Graduate Program in Applied Informatics at the University of Fortaleza (Unifor), Ceará, Brazil.

Clodoaldo A.M. Lima received the B.Sc. degree in Electrical Engineering from the Federal University of Juiz de Fora (UFJF), Juiz de Fora, Brazil, in 1997, and the M.Sc. and Ph.D. degrees in Electrical Engineering from the University of Campinas (UNICAMP), Campinas, Brazil, in 2000 and 2005, respectively. From February 2005 to February 2006, he was a postdoctoral researcher at the same university. He has published dozens of articles and chapters on machine learning and related topics in computer science journals and proceedings. Since 2010, he has been a professor at the University of São Paulo, SP, Brazil. The main topics of his research are computational intelligence, machine learning, statistical learning theory, and signal processing.


Victor Hugo C. de Albuquerque has a Ph.D. in Mechanical Engineering with emphasis on Materials from the Federal University of Paraíba (UFPB, 2010), an M.Sc. in Teleinformatics Engineering from the Federal University of Ceará (UFC, 2007), and a degree in Mechatronics Technology from the Federal Center of Technological Education of Ceará (CEFETCE, 2006). He is currently an Assistant Professor I in the Graduate Program in Applied Informatics at the University of Fortaleza (UNIFOR), and a collaborating professor in the Graduate Program in Neuroengineering of the Alberto Santos Dumont Association for Research Development (AASDAP) and the Edmond and Lily Safra International Institute of Neuroscience of Natal (ELS-IINN). He is also responsible for the partnership between the Control Systems Laboratory (LSC-UNIFOR) and the Brazilian National Institute of Science and Technology – Brain-Machine Interface (INCT-INCEMAQ). He has experience in Computer Systems, mainly in the research fields of Applied Computing, Intelligent Systems, and Visualization and Interaction, with specific interest in Pattern Recognition, Artificial Intelligence, Image Processing and Analysis, and Automation applied to biological signal/image processing, image segmentation, biomedical circuits, and human/brain–machine interaction, including Augmented and Virtual Reality Simulation Modeling for animals and humans. He is a co-author of more than 107 papers published in national and international journals and/or presented at conferences, and he has been involved in several research projects, both as a researcher and as a scientific coordinator. In addition, he has been a member of the scientific and organizing committees of several international conferences and a reviewer for more than 18 international journals.

João P. Papa received his B.Sc. in Information Systems from São Paulo State University, SP, Brazil. In 2005, he received his M.Sc. in Computer Science from the Federal University of São Carlos, SP, Brazil. In 2008, he received his Ph.D. in Computer Science from the University of Campinas, SP, Brazil. From 2008 to 2009, he worked as a postdoctoral researcher at the same institution. He has been a Professor in the Computer Science Department at São Paulo State University since 2009, and his research interests include machine learning, pattern recognition, and image processing.
