International Journal of Emerging Technology and Advanced Engineering

Website: www.ijetae.com (ISSN 2250-2459, Volume 2, Issue 8, August 2012)


Ashis Pradhan1
Computer Science and Engineering Department, Sikkim Manipal Institute of Technology, Majhitar, East-Sikkim.
Abstract: Support vector machine (SVM) is one of the most important
machine learning algorithms and has been applied mostly to pattern
recognition problems, e.g. classifying network traffic and recognition
in image processing. Much research is under way on this technique to
improve QoS (quality of service) and security. The latest works in this
field have shown that SVM performs better than other network traffic
classifiers in terms of generalization. This paper presents the
theoretical aspects of SVM, its concepts and an overview of its
applications.

Keywords: Support Vector Machine (SVM), Machine learning algorithm,
Quality of Service, Security perspective, Network Traffic Classification.

Support Vector Machine (SVM) is one of the best machine learning
algorithms; it was proposed in the 1990s and is used mostly for pattern
recognition. It has also been applied to many pattern classification
problems such as image recognition, speech recognition, text
categorization, face detection and faulty card detection. Pattern
recognition aims to classify data based on either a priori knowledge or
statistical information extracted from raw data, and is a powerful tool
for data separation in many disciplines. SVM is a supervised machine
learning algorithm in which, given a set of training examples, each
marked as belonging to one of several categories, an SVM training
algorithm builds a model that predicts the category of a new example.
SVM has a greater ability to generalize, which is the goal in
statistical learning.

Statistical learning theory provides a framework for studying the
problem of gaining knowledge, making predictions and making decisions
from a set of data. In statistical learning theory the problem of
supervised learning is formulated as follows. We are given a set of
training data $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ in
$\mathbb{R}^n \times \mathbb{R}$ sampled according to an unknown
probability distribution $P(x, y)$, and a loss function $V(y, f(x))$
that measures the error when, for a given $x$, $f(x)$ is "predicted"
instead of the actual value $y$. The problem consists in finding a
function $f$ that minimizes the expectation of the error on new data,
i.e., finding a function $f$ that minimizes the expected error [5]:

$$\int V(y, f(x)) \, P(x, y) \, dx \, dy$$

Early machine learning algorithms aimed to learn representations of
simple functions. Hence, the goal of learning was to output a hypothesis
that classified the training data correctly, and early learning
algorithms were designed to find such an accurate fit to the data [6].
The ability of a hypothesis to correctly classify data not in the
training set is known as its generalization. SVM performs better in
terms of not overfitting, whereas neural networks might end up
overfitting easily [7].

II. SVM MODEL

Figure 1: SVM model [8]

Figure 1 is a simple model representing the support vector machine
technique. The model consists of two different patterns, and the goal of
SVM is to separate these two patterns. The model consists of three
different lines. The line $w \cdot x - b = 0$ is known as the margin of
separation or marginal line.

The lines $w \cdot x - b = 1$ and $w \cdot x - b = -1$ are the lines on
either side of the marginal line. These three lines together construct
the hyper plane that separates the given patterns, and the patterns that
lie on the edges of the hyper plane are called support vectors. The
perpendicular distance between the marginal line and the edges of the
hyper plane is known as the margin. One of the objectives of SVM is to
maximize this margin: the larger the margin, the better the
classification and the lower the occurrence of error.
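
The three lines and the margin described above can be illustrated with a minimal sketch. The weight vector, offset and sample points below are hypothetical values chosen only for illustration, and the helper names `decision_value` and `margin` are assumptions, not part of the original paper.

```python
import math

def decision_value(w, b, x):
    """Return w.x - b, the score of point x relative to the marginal line."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b

def margin(w):
    """Perpendicular distance from the line w.x - b = 0 to either edge line w.x - b = +/-1."""
    return 1.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical 2-D example: w, b and the points are illustrative values only.
w, b = (1.0, 1.0), 3.0
points = [(1.0, 1.0), (0.0, 1.0), (3.0, 1.0), (3.0, 3.0)]

for x in points:
    score = decision_value(w, b, x)
    side = 1 if score >= 0 else -1           # which side of the marginal line
    on_edge = math.isclose(abs(score), 1.0)  # lies on w.x - b = +/-1: a support vector
    print(x, score, side, on_edge)

print("margin:", margin(w))
```

In this sketch the points (1, 1) and (3, 1) fall exactly on the two edge lines, so they play the role of support vectors, and the margin is 1/||w||.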


The support vector machine usually deals with pattern classification,
meaning that the algorithm is used mostly for classifying different
types of patterns. There are two types of patterns: linear and
non-linear. Linear patterns are easily distinguishable or can be easily
separated in low dimension, whereas non-linear patterns are not easily
distinguishable or cannot be easily separated, and hence need to be
further manipulated so that they can be easily separated.

The main idea behind SVM is the construction of an optimal hyper plane,
which can be used for classification of linearly separable patterns. The
optimal hyper plane is the hyper plane, selected from the set of hyper
planes that classify the patterns, that maximizes the margin, i.e. the
distance from the hyper plane to the nearest point of each pattern. The
main objective of SVM is to maximize the margin so that it can correctly
classify the given patterns, i.e. the larger the margin, the more
correctly it classifies the patterns.

The equation shown below is the hyper plane representation:

Hyper plane, $aX + bY = C$ (i)

Figure 2 shows the basic idea of the hyper plane, describing how it
looks when two different patterns are separated using a hyper plane in
three dimensions. This plane comprises three lines that separate two
different patterns in 3-D space, namely the marginal line and two other
lines on either side of the marginal line where the support vectors are
located.

Figure 2: A Hyper Plane [8]

For non-linearly separable patterns, the given pattern is mapped into a
new space, usually of higher dimension, so that in the higher
dimensional space the pattern becomes linearly separable. The given
pattern can be mapped into the higher dimensional space using a kernel
function, $\Phi(x)$, i.e. $x \rightarrow \Phi(x)$.

Selecting the kernel function is an important aspect of SVM-based
classification; commonly used kernel functions include LINEAR, POLY, RBF
and SIGMOID. For example, the equation for the Poly kernel function is
given as:

$K(x, y) = \langle x, y \rangle^p$ (ii)

Different kernel functions create different mappings for creating
non-linear separation surfaces. Another important parameter in SVM is
the parameter C. It is also called the complexity parameter, and it
weights the sum of the distances of all points that are on the wrong
side of the hyper plane. Basically, the complexity parameter controls
the amount of error that can be ignored during the classification
process. Its value can be neither too large nor too small: if the value
of the complexity parameter is too large or too small, the performance
of classification is low.

The main principle of the support vector machine is as follows: given a
set of independent and identically distributed training samples
$\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^d$ and
$y_i \in \{-1, 1\}$ denote the input and output of the classification,
the goal is to find a hyper plane $w^T x + b = 0$ which separates the
two different samples accurately.
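
The relation between a kernel and a mapping into a higher dimensional space can be made concrete for the polynomial kernel of equation (ii). The sketch below assumes p = 2 and 2-D inputs; the explicit feature map `phi_degree2` and the sample vectors are illustrative assumptions, not taken from the paper.

```python
import math

def dot(x, y):
    """Inner product <x, y> of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def poly_kernel(x, y, p=2):
    """Polynomial kernel K(x, y) = <x, y>^p, as in equation (ii)."""
    return dot(x, y) ** p

def phi_degree2(x):
    """Explicit 3-D feature map for p = 2 and 2-D input (illustrative assumption)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

# Hypothetical sample vectors.
x, y = (1.0, 2.0), (3.0, 4.0)

k_direct = poly_kernel(x, y, p=2)                # computed in the original 2-D space
k_mapped = dot(phi_degree2(x), phi_degree2(y))   # inner product in the mapped 3-D space
print(k_direct, k_mapped)
```

Both values agree up to floating-point rounding, which is the point of a kernel: the inner product in the higher dimensional space is obtained without constructing the mapped vectors.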

Therefore, the problem of optimal classification translates into solving
a quadratic programming problem: seek the partition hyper plane that
makes the bilateral blank area ($2/\|w\|$) maximum, which means we have
to maximize the width of the margin. It is expressed as:

$\min_{w} \; \|w\|^2 = (w, w)$,
such that: $y_i (w \cdot x_i + b) \ge 1$ (iii)

Figure 3: Classification concept using SVM [13]

IV. CONCEPTS OF SVM
The basic concept of SVM can be explained using the four points shown
below, for classification of a given set of patterns by constructing an
optimal hyper plane.

The separating hyper plane.
The maximum margin hyper plane.
Soft margin.
The kernel function.

For any kind of pattern, human beings are considered the ultimate judge:
they can easily distinguish the different patterns given to them, but
for a computer system it is very difficult to distinguish and represent
them. In fig 3(a) there are two different kinds of patterns, and our job
is to classify them. In this case it is very easy to classify them with
the naked eye, as they can be visually segmented. But in order to
represent these patterns as belonging to two different classes, a line
can be drawn that separates them. Fig 3(b) shows the classification of
two different patterns using a single line, provided that the patterns
are presented in two dimensional space. Fig 3(c) shows a similar pair of
patterns in one dimensional space; to separate patterns given in one
dimension, a point can be used. When patterns like those in fig 3(b) are
represented in three dimensional space, a plane takes the place of the
line for classifying the patterns into two categories, as shown in fig
3(d). The plane that separates these two types of pattern in 3-D space
is known as a separating hyper plane.

Similarly, there may exist many such planes that separate the patterns,
as shown in fig 3(e). The next task is to select, from the set of
planes, the plane whose margin is maximum. The plane with the maximum
margin, i.e. perpendicular distance from the marginal line, is known as
the optimal hyper plane or maximum margin hyper plane, as shown in fig
3(f). The patterns that lie on the edges of the plane are called support
vectors. During the classification and representation of patterns there
may exist some errors, as shown in fig 3(g); tolerating such errors is
called soft margin. During classification of such patterns the error can
be ignored up to some threshold value. Fig 3(h) shows the classification
of patterns into different categories with soft margin. This threshold
is also called the cost factor or the complexity parameter.

The patterns discussed above are all linearly separable patterns that
can be easily separated using a line or plane. There may also exist
non-linearly separable patterns that are difficult to classify.
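
The soft margin idea and the role of the cost factor can be sketched numerically. The objective used below, 0.5 * ||w||^2 + C * (sum of slacks), is the commonly used soft-margin formulation and is an assumption here, since the paper does not write it out; the data, `w`, `b` and the helper names are hypothetical.

```python
def hinge_violations(w, b, samples):
    """Per-sample slack max(0, 1 - y * (w.x + b)); zero when the constraint in (iii) holds."""
    slacks = []
    for x, y in samples:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        slacks.append(max(0.0, 1.0 - y * score))
    return slacks

def soft_margin_objective(w, b, samples, C):
    """0.5 * ||w||^2 + C * (sum of slacks): assumed standard soft-margin objective."""
    norm_sq = sum(wi * wi for wi in w)
    return 0.5 * norm_sq + C * sum(hinge_violations(w, b, samples))

# Hypothetical data: the last sample sits on the wrong side of the hyper plane.
samples = [((2.0, 0.0), 1), ((-2.0, 0.0), -1), ((-0.5, 0.0), 1)]
w, b = (1.0, 0.0), 0.0

print(hinge_violations(w, b, samples))
for C in (0.1, 1.0, 10.0):
    print(C, soft_margin_objective(w, b, samples, C))
```

Only the misclassified sample contributes a non-zero slack, and a larger C makes that single error weigh more heavily against the margin term, which is one way to read the trade-off between margin size and tolerated error.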

For such patterns, the original data are mapped to a higher dimensional
space using a function called the kernel function. Fig 3(i) shows a
pattern that is not linearly separable using a single line or plane. So,
in order to classify such patterns, the original data are mapped to a
higher dimensional space using a kernel function, for example $x^2$ in
this case. Fig 3(j) shows the classification of the non-linear pattern
after mapping the data to two dimensional space. The fig 3(k) shows
classification of

It scales relatively well to high dimensional data, and the trade-off
between classifier complexity and error can be controlled explicitly.
The weakness includes the need for a good kernel function. It has proven
itself not only in the network field but also in image processing for
gesture recognition [11], and there is much future scope for this
algorithm in image processing, for classifying patterns in multiple
dimensions.

REFERENCES
