SVM Experimental Report

July 12, 2017 | Author: 远翔 高 | Category: Machine Learning



Second Project Report for PRML Course

Name: Gaoyuanxiang

Student No: 201411010311

Experiment Outlines

Notice: the three parts of this outline correspond to the three sections in "Steps and Details", respectively.

1. We first implement the optimal margin classifier (primal problem) and evaluate its precision as the training set size varies. (Corresponding Matlab code: MaxMargin.m)
2. We then implement the SVM classifier (dual problem) and evaluate its precision as the training set size varies. (Corresponding Matlab code: SupportVectorMachine_Kernel.m)
3. For the SVM classifier (dual problem), we can use different kernels to map the original attribute space into specific higher-dimensional feature spaces; the ability to use kernels is one of the reasons why we solve the dual problem instead of the primal problem. We evaluate the precision of different kernels for the SVM classifier against the no-kernel case (just the inner product in (2-1)). (Corresponding Matlab code: SupportVectorMachine_Kernel.m)

Steps and Details

I The Optimal Margin Classifier (Primal Problem)

Next, we describe our Matlab code MaxMargin.m for this experiment step by step.

1. First we set the number of training instances k picked from each class, ranging from 2 to 9. Consequently, the number of testing instances per class ranges from 8 down to 1.
2. Then, with the picked training examples, we solve the primal QP problem of the optimal margin classifier, formulated as follows:

$$\min_{w,b} \ \frac{1}{2}\|w\|^{2} \quad \text{s.t.} \quad y^{(i)}\left(w^{T} x^{(i)} + b\right) \ge 1, \quad i = 1, \dots, m \tag{1-1}$$

3. After solving the primal QP problem (Equation (1-1) can be solved by the quadprog function in Matlab), we obtain the optimal parameters w and b. Using them, we classify the testing instances by the following criteria:

$$\begin{cases} w^{T} x + b > 0 \;\rightarrow\; \text{labeled as } 1 \\ w^{T} x + b < 0 \;\rightarrow\; \text{labeled as } -1 \end{cases} \tag{1-2}$$
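A minimal Matlab sketch of steps 2-3 follows. The variable names X (m-by-n training attributes), y (m-by-1 labels in {-1, +1}), and Xtest are assumptions for illustration, not the actual contents of MaxMargin.m:

% Minimal sketch: solve the primal QP (1-1) with quadprog over z = [w; b].
[m, n] = size(X);
H = blkdiag(eye(n), 0);          % quadratic term: (1/2)||w||^2, b is unpenalized
f = zeros(n + 1, 1);             % no linear term in the objective
A = -[y .* X, y];                % y_i(w'x_i + b) >= 1  rewritten as  A*z <= -1
bIneq = -ones(m, 1);
z = quadprog(H, f, A, bIneq);
w = z(1:n);
b = z(n + 1);
labels = sign(Xtest * w + b);    % classification criteria (1-2)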

4. The precision of the optimal margin classifier as the training set size varies is shown in Fig. 1.

Fig. 1 The precision performance of the optimal margin classifier

II The SVM Classifier (Dual Problem)

Next, we describe our Matlab code SupportVectorMachine_Kernel.m for this experiment step by step.

1. For the training examples, we solve the dual QP problem instead of the primal problem, which has the following formulation:

$$\begin{aligned} \max_{\alpha} \ & W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} y^{(i)} y^{(j)} \alpha_i \alpha_j \left\langle x^{(i)}, x^{(j)} \right\rangle \\ \text{s.t.} \ & \alpha_i \ge 0, \quad i = 1, \dots, m \\ & \sum_{i=1}^{m} \alpha_i y^{(i)} = 0. \end{aligned} \tag{2-1}$$
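A minimal sketch of solving (2-1) with quadprog; since quadprog minimizes, we negate W(alpha). The names X and y are the same illustrative assumptions as in the earlier sketch:

% Dual QP (2-1) via quadprog: min 0.5*a'*H*a - 1'*a over a = alpha.
m = size(X, 1);
K = X * X';                        % Gram matrix of inner products <x_i, x_j>
H = (y * y') .* K;                 % H_ij = y_i * y_j * <x_i, x_j>
f = -ones(m, 1);                   % negated linear term of W(alpha)
Aeq = y';  beq = 0;                % equality constraint: sum_i alpha_i y_i = 0
lb = zeros(m, 1);                  % alpha_i >= 0
alpha = quadprog(H, f, [], [], Aeq, beq, lb, []);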

2. After obtaining the optimal Lagrange multipliers, we recover the optimal parameters w and b by

$$w = \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)}, \tag{2-2}$$

with b recovered from the support-vector conditions (see [1]).
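In code, the recovery might look like the following sketch; the support-vector tolerance 1e-6 is an assumption:

w = X' * (alpha .* y);             % equation (2-2)
sv = alpha > 1e-6;                 % support vectors have non-zero multipliers
b = mean(y(sv) - X(sv, :) * w);    % from y_i(w'x_i + b) = 1 on the support vectors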

3. By the criteria (1-2), we classify the testing instances and evaluate the precision of the SVM classifier in Fig. 2.

Fig. 2 The performance of the SVM classifier and the primal optimal margin classifier

As expected, since the KKT conditions are all satisfied in this case, the primal QP problem (1-1) is equivalent to the dual QP problem (2-1). Accordingly, Fig. 2 shows that the performance of the optimal margin classifier is exactly the same as that of the SVM classifier.

4. We can also see that the margins of the testing instances derived by the optimal margin classifier and by the SVM are the same:

Fig. 3 The normalized margin of the testing instances for the k=7 case

III The Performance Improvement by the Square Kernel

1. For this evaluation, we first change the original feature space using the square kernel, defined as follows:

$$K\left(x^{(i)}, x^{(j)}\right) = \left\langle x^{(i)}, x^{(j)} \right\rangle^{2} \tag{3-1}$$
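Because the dual (2-1) touches the data only through inner products, applying the kernel amounts to swapping the Gram matrix in the dual-QP sketch above; classification then needs no explicit w. The names Xtest and the bias handling here are illustrative assumptions:

K = (X * X').^2;                   % square kernel (3-1) as the Gram matrix
% ... re-solve the dual QP (2-1) with this K ...
% Classify without an explicit w: f(x) = sum_i alpha_i y_i K(x_i, x) + b,
% where b would analogously be recovered in kernel form from the support vectors.
scores = ((Xtest * X').^2) * (alpha .* y) + b;
labels = sign(scores);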

2. Then we compare the performance of the square kernel with the no-kernel case in Fig. 4.

Fig. 4 Performance comparison of the inner product and the square kernel

As Fig. 4 shows, a striking result appears at k=8: the SVM with the plain inner product (2-1) gives an 80% precision rate, while the SVM with the square kernel gives a 100% precision rate. This phenomenon results from the fact that kernels map the original attribute space into a higher-dimensional feature space, and a data set that is not linearly separable can become linearly separable in a particular higher-dimensional feature space.

3. Next, we also try the Gaussian kernel, formulated as follows, for $\sigma = 10^4$:

$$K\left(x^{(i)}, x^{(j)}\right) = \exp\left(-\frac{\left\| x^{(i)} - x^{(j)} \right\|^{2}}{2\sigma^{2}}\right). \tag{3-2}$$

In our experiment, the SVM with the Gaussian kernel gives the same performance as the SVM with the plain inner product. For the other kernels we tried besides the square kernel (e.g., the linear kernel), no performance gain over the inner-product case is observed either.

Further Discussion

In the program, we also verify the statement that an SVM has only a small number of support vectors, as shown in Fig. 5.

Fig. 5 The sparsity property of support vectors for k=9

In Fig. 5, we can see that only the support vectors (those whose functional margin equals 1) have non-zero Lagrange multipliers. Here the support vectors are the 5th, 8th, and 18th data instances, so only a few of the data instances are support vectors. This is the so-called sparsity property of support vectors.
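The check itself can be a one-liner on the optimized multipliers; the tolerance is again an assumption:

svIdx = find(alpha > 1e-6);        % indices of support vectors, e.g. [5; 8; 18] per Fig. 5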

Challenges

Because we implement the SVM by solving the dual QP problem (2-1) directly with a generic solver, its time consumption is the same as that of solving the primal problem. In order to speed up the dual QP of the SVM classifier, the SMO (Sequential Minimal Optimization) algorithm could be integrated into the implementation.

Conclusion

These experiments show that solving the primal and the dual forms of the SVM QP problem is equivalent, and they report the precision performance of both solutions.

One advantage of solving the dual problem instead of the primal is the kernel method. The experiments in this report have shown that using the square kernel instead of the plain inner product in (2-1) leads to a 20% improvement in precision rate for the k=8 case. We also verify that the support vectors of an SVM classifier are indeed only a small part of the whole data set. Another advantage of solving the dual problem is that it enables the SMO algorithm, which can train an SVM efficiently even in a very high-dimensional feature space.

References

[1] Andrew Ng, CS229 Lecture Notes: Support Vector Machines.
[2] Andrew Ng, Machine Learning open course, http://v.163.com/special/opencourse/machinelearning.html.
