Questions tagged [svm]
Support Vector Machine refers to "a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis."
2,292 questions
3
votes
0
answers
18
views
Machine learning for importance check of three-way interaction in a longitudinal dataset
I want to know whether the interaction between the continuous variable A, the continuous variable B and the group (factor with 3 levels) is associated with a continuous outcome Y in a longitudinal setting....
-1
votes
1
answer
81
views
Categorical Dependent Variable
Repost:
Hello all, thank you so much for the response. Here I have provided some information.
a. This is clinical data with a sample size of around 859.
b. It has 11 columns as input features and ...
0
votes
0
answers
16
views
Youtube Spam Classifier - Different Methods yielding the same accuracy (94%)
(CONTEXT)
I'm currently doing a report project at my university to build a classifier model that classifies a comment as spam or ham (non-spam) using this data set, and then submit a prediction CSV ...
4
votes
1
answer
144
views
Why am I getting worse results when using a CNN for feature extraction and an SVM for classification?
I am using a 3D CNN for feature extraction and an SVM for classification, but I got worse results than when using the 3D CNN for both feature extraction and classification. Is that a normal thing?
2
votes
1
answer
20
views
Why do results from various experiments with different % of features selected through RFE for SVM-based classification yield inconsistent outcomes?
I now have a basic understanding of classifiers such as Random Forests, Gradient Boosted Trees, and Support Vector Machines. My tasks involve classifying layer stacks that consist of optical and radar ...
0
votes
0
answers
11
views
When running a Support Vector Machine, how do I formulate the linear transformation that flips the decision hyperplane in the non-augmented dimension?
We know that when running a support vector machine, we actually use the "kernel trick" to compute the decision hyperplane (boundary) as if we do so in the kernel-augmented dimension, but not ...
0
votes
0
answers
153
views
Prove matrix constructed based on gaussian RBF is PSD
I have a radial basis function $k(x, y) = \exp(-{(x-y)}^T M {(x-y)})$ where $M$ is a symmetric PSD matrix.
I know that $k(\cdot)$ is a kernel itself: Prove that multiplication with positive ...
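The excerpt is truncated; a standard sketch of this argument: since $M$ is symmetric PSD it factors as $M = L^\top L$, which reduces $k$ to an ordinary Gaussian RBF kernel in linearly transformed coordinates,
$$
k(x, y) = \exp\big(-(x-y)^\top L^\top L (x-y)\big) = \exp\big(-\|Lx - Ly\|^2\big),
$$
so the Gram matrix on any finite set of points equals the Gaussian RBF Gram matrix on the transformed points, which is known to be PSD.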
0
votes
0
answers
14
views
Accuracy for Permutation Test is very high
I am a bit confused as to what is happening with my classifier.
I have a dataset of ~220 features and about 4000 trials. Classes are perfectly balanced, and I'm doing a simple binary classification task ...
1
vote
0
answers
26
views
Some further explanation of Alex Smola's 1998 implementation of support vector regression
I am currently going through, and trying to implement the pseudo-code in Alex Smola's 1998 paper on support vector regression, particularly the one on sequential minimal optimization. (Section 4.6.3, ...
0
votes
0
answers
11
views
Averaging over labels instances for SVM classification
This is a hypothetical question about different ways to input training/testing data into an SVM model. I have 128 instances for each of two classes, which can be hierarchically grouped into 4 sets (i.e. ...
0
votes
0
answers
19
views
How to get the second stationary point condition corresponding to intercept when using the augmented weight vector and augmented design matrix in SVM?
Below is the formulation I got for SVM when using the equation of the classifier as $w \cdot x + b = 0$.
I want to know why I am not getting the second stationarity condition, i.e. the summation over i from 1 to n of (...
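The excerpt is cut off, but the condition it presumably refers to is the one obtained by differentiating the Lagrangian with respect to the intercept $b$:
$$
\frac{\partial \mathcal{L}}{\partial b} = -\sum_{i=1}^{n} \alpha_i y_i = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} \alpha_i y_i = 0.
$$
With the augmented formulation ($b$ folded into the weight vector by appending a constant feature to each $x_i$), $b$ is no longer a separate primal variable, so this condition does not appear as a distinct stationarity equation; it is absorbed into the gradient with respect to the augmented weight vector.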
0
votes
0
answers
35
views
Mean and Standard Deviation of accuracy for SVM model prediction
I am training an SVM model for binary classification. For this, I have split the train and test datasets in an 80:20 ratio. Then I standardized the training and test data separately and tuned the ...
1
vote
0
answers
24
views
Modeling for a data set that has a different number of factors for each row (not binomial) [closed]
The modeling issue I'm having is that the categorical variable for each row has a different number of factors. If I can reshape the data by products (a,b,c,.....~cost, hoursum, numPod, numDate), so that ...
2
votes
1
answer
142
views
How do I perform a permutation test on a machine learning model to obtain a p-value for its performance?
This is basically the same question as this previous post, but since there's no reply and I'm having a hard time finding answers, I'd like to ask it again.
I'm training a regression model (SVM ...
0
votes
0
answers
36
views
How to handle Data Normalization in case that a Logarithmic scale is required?
Let's say we wished to build a Regressor (e.g. a Support Vector Regressor) to predict the price of an asset, within a given time span from now on.
However, what if the historical data we have ...
1
vote
0
answers
40
views
Derivation of dual formulation of support vector regression
I'm trying to derive the dual formulation of epsilon-insensitive support vector regression. I think my derivation is correct, but I can't match it up to a result for the dual that I've seen given in ...
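The excerpt is truncated; for reference, the dual that is usually quoted (e.g. in Smola and Schölkopf's tutorial) for linear $\varepsilon$-insensitive SVR is
$$
\max_{\alpha, \alpha^*} \; -\frac{1}{2}\sum_{i,j}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, x_i^\top x_j \;-\; \varepsilon\sum_i(\alpha_i + \alpha_i^*) \;+\; \sum_i y_i(\alpha_i - \alpha_i^*)
$$
subject to $\sum_i(\alpha_i - \alpha_i^*) = 0$ and $0 \le \alpha_i, \alpha_i^* \le C$.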
0
votes
0
answers
28
views
I want to plot the decision boundaries of an SVM model with more than 2 variables
I understand that this is impossible to visualize, so I went in and PCA-transformed the variables. The problem is that I still need more than 2 principal components to get "good" ...
0
votes
0
answers
47
views
Applying PCA Before Training Multiple SVM Binary Classifiers To Reduce Data
I am working on a project which has a goal to determine if a new sample is part of Class A or Class A'. I need multiple of those classifiers. I will have an SVM to classify between:
ClassA - ClassA' ...
1
vote
1
answer
165
views
Non-linear kernel for classifying data points corresponding to two concentric circles [closed]
While doing self-study, I have seen an article on non-linearly separable problems, here. The images as given there are here, and here.
It deals with a common textbook problem, where the data points are in two ...
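The excerpt is truncated; a minimal, self-contained sketch of the textbook setup it describes (the synthetic data and parameter choices are mine, not the article's):

    import numpy as np
    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    # Two concentric circles: not linearly separable in (x1, x2),
    # but separable via a non-linear (e.g. RBF) kernel's implicit feature map.
    X, y = make_circles(n_samples=400, factor=0.4, noise=0.05, random_state=0)

    clf = SVC(kernel="rbf", gamma=2.0, C=1.0).fit(X, y)
    print("training accuracy:", clf.score(X, y))

    # The same idea with one explicit feature: the squared radius x1^2 + x2^2
    # already makes the two circles linearly separable in a single dimension.
    r2 = (X ** 2).sum(axis=1, keepdims=True)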
0
votes
1
answer
32
views
SVM Kernel to compare histograms as input vectors
In lecture 7 of CS229, Andrew Ng mentions at the very end a specific kernel that allows an SVM to "classify" how similar two histograms are, such as the demographics of two countries. He ...
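The lecture itself is not quoted here, but a kernel commonly used to compare histograms is the histogram intersection kernel $K(p, q) = \sum_i \min(p_i, q_i)$, which is a valid PSD kernel. A minimal sketch of plugging it into scikit-learn's SVC via a callable kernel (the data below are made up):

    import numpy as np
    from sklearn.svm import SVC

    def intersection_kernel(P, Q):
        # Histogram intersection kernel: K(p, q) = sum_i min(p_i, q_i).
        # P has shape (n, d), Q has shape (m, d); returns the (n, m) Gram matrix.
        return np.minimum(P[:, None, :], Q[None, :, :]).sum(axis=-1)

    rng = np.random.default_rng(0)
    X = rng.dirichlet(np.ones(10), size=100)   # 100 hypothetical histograms over 10 bins
    y = rng.integers(0, 2, size=100)           # hypothetical binary labels

    clf = SVC(kernel=intersection_kernel).fit(X, y)
    print(clf.predict(X[:5]))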
4
votes
5
answers
457
views
Is it valid to exhaustively test all possible combinations of features to find the best combination?
I have about 1000 labelled observations from about 50 subjects responding physiologically under different situations and am trying to classify the situation (usually into three classes of roughly ...
2
votes
1
answer
89
views
Is my understanding/approach to nested cross-validation, final model tuning correct?
I am training an SVM on limited training data with unbalanced classes.
Here are the things that I want to do:
1.) I want to make a statement of the generalizability ...
0
votes
0
answers
30
views
How is ROC AUC calculated for a Support Vector Machine?
My understanding is that a support vector machine (SVM) finds a hyperplane that separates two classes from each other. During training, there can be some amount of error allowed so that some classes ...
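The excerpt stops before the answer, but the usual recipe is: ROC AUC needs a continuous score rather than a hard label, and for an SVM the signed distance to the hyperplane (decision_function in scikit-learn) plays that role; the ROC curve is traced by sweeping a threshold over it. A minimal sketch with synthetic data:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import roc_auc_score

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    clf = SVC(kernel="rbf").fit(X_tr, y_tr)
    scores = clf.decision_function(X_te)          # continuous scores, not labels
    print("ROC AUC:", roc_auc_score(y_te, scores))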
0
votes
0
answers
43
views
Should I interpret the data as noise or not
I am tackling a classification problem with 3 classes. Here is what those classes look like on the first two principal axes.
I fine-tuned an SVM model and the best performance achievable was 50%. By ...
1
vote
1
answer
72
views
How does fitting data work in SVM using the Kernel Trick?
In SVM, I understand how to fit some data after transforming it into a higher dimension. (ex: $(X_1, X_2) \to (X_1, X_2, X_1^2, X_2^2, X_1X_2)$, which is a 2-dimensional to 5-dimensional transformation).
...
0
votes
0
answers
65
views
About the hinge loss and slack variables
I'll be denoting the $i$-th training example, target label and slack variable as $\mathbf{\vec x}^{(i)}$, $y^{(i)}$ and $\xi_i$ respectively.
Hinge Loss :
The hinge loss function in the context of ...
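The excerpt is cut off, but the usual link between the two notions is that, at the optimum of the soft-margin program, each slack variable equals the hinge loss of its training example:
$$
\xi_i = \max\big(0,\; 1 - y^{(i)}(\mathbf{\vec w}\cdot \mathbf{\vec x}^{(i)} + b)\big).
$$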
0
votes
0
answers
46
views
How is the SVM optimization objective derived from the hinge loss function?
The hinge loss function, in the context of SVMs, is given as:
$$
\mathcal{L}(\mathbf{\vec w}, b\,; \mathbf{\vec x}^{(i)}, y ^{(i)}) = \max(0, 1-y ^{(i)}(\mathbf{\vec w}\cdot \mathbf{\vec x}^{(i)} + b))...
3
votes
1
answer
41
views
How to determine one-class SVM's $r$ parameter after obtaining $\alpha$ from a QP solver?
I'm reading about one-class SVM in the wiki here: One-class SVM. One-class SVM attempts to learn $r$ and $c$ to fit a hypersphere to the dataset. The formula for assigning labels is:
$$sign(r^2 - ||\phi(x)...
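The excerpt is truncated; in the hypersphere formulation it refers to (SVDD-style), $r$ is usually recovered from the KKT conditions: any support vector $x_s$ with $0 < \alpha_s < C$ lies exactly on the sphere, so with centre $c = \sum_i \alpha_i \phi(x_i)$,
$$
r^2 = \|\phi(x_s) - c\|^2 = k(x_s, x_s) - 2\sum_i \alpha_i k(x_i, x_s) + \sum_{i,j}\alpha_i \alpha_j k(x_i, x_j).
$$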
0
votes
0
answers
32
views
Schölkopf one-class linear SVM equation: why is subtracting ρ from 1/2 ||w||² the same as maximizing the distance?
In the one-class linear SVM, the equation is:
$\min_{w, \rho} \frac{1}{2} \|w\|^2- \rho + C\sum_{i=1}^{n} \xi_i$
subject to:
$\begin{align*}
& w \cdot x_i \geq \rho - \xi_i, \\
& \xi_i \geq 0,...
0
votes
1
answer
87
views
Learning Curve to Know Underfitting or Overfitting
I want to know whether the model I am using tends to overfit or underfit. I am using SVM and Random Forest algorithms. How do I figure that out?
0
votes
0
answers
10
views
Introducing bias via combining probability outputs from multiple models
I am working on a classification task, where I am trying to estimate the probability that a patient may not die.
I did use a Survival Analysis approach at first, but the results seemed unintuitive and ...
0
votes
0
answers
23
views
Can I find the explicit feature map that generates exponent of a kernel?
Let's say I have a kernel $K$, and another kernel of the form :
$$
K' = e^K
$$
Now, I know how to prove that $K'$ is a kernel; I can do it using the Taylor expansion of $e^x$ around $0$.
But let's say I want ...
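The excerpt stops here, but the same Taylor-expansion argument also yields an explicit (infinite-dimensional) feature map: if $K(x, y) = \langle \phi(x), \phi(y)\rangle$, then
$$
e^{K(x,y)} = \sum_{m=0}^{\infty} \frac{K(x,y)^m}{m!} = \Big\langle \bigoplus_{m=0}^{\infty} \frac{\phi(x)^{\otimes m}}{\sqrt{m!}},\; \bigoplus_{m=0}^{\infty} \frac{\phi(y)^{\otimes m}}{\sqrt{m!}} \Big\rangle,
$$
so the direct sum of scaled tensor powers of $\phi$ is a feature map for $K' = e^K$.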
1
vote
0
answers
43
views
Support Vector Classifiers for Overlapping Classes
I am currently studying support vector classifiers (SVC), more specifically, the solution to the Lagrangian (Wolfe) dual function with the help of the book "The Elements of Statistical Learning" ...
2
votes
2
answers
109
views
How is the Representer theorem used in the derivation of the SVM dual form?
This is the primal form of the SVM hypothesis:
$$
h _{\mathbf{\vec w}, b}(\mathbf{\vec x}^{(i)}) = \mathbf{\vec w}\cdot \mathbf{\vec x}^{(i)} + b
$$
The Representer theorem as formulated here ...
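The excerpt is truncated, but the usual argument is: the Representer theorem guarantees that the optimal weight vector lies in the span of the training inputs, $\mathbf{\vec w} = \sum_i \beta_i \mathbf{\vec x}^{(i)}$, and substituting this into the primal hypothesis gives
$$
h(\mathbf{\vec x}) = \sum_i \beta_i\, \mathbf{\vec x}^{(i)} \cdot \mathbf{\vec x} + b = \sum_i \beta_i\, k(\mathbf{\vec x}^{(i)}, \mathbf{\vec x}) + b,
$$
which is the form that lets the optimization problem be written purely in terms of inner products (kernels), i.e. the dual.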
3
votes
1
answer
126
views
Why is the regularization term multiplied by the error term in the cost function of SVM?
The cost function of the Optimal Margin Classifier (non-kernelized SVM) is given as:
$$
J(\mathbf{\vec w}, b) = \frac{1}{2}\|\mathbf{\vec w}\|_{2}^{2} + C \sum_{i=1}^{n}\max(0, 1-y ^{(i)}(\mathbf{\vec ...
3
votes
1
answer
251
views
Scenario where minimizing 0-1 loss is different than minimizing hinge loss
Suppose we're using linear predictors. I'm trying to conceptually understand how minimizing hinge loss and 0-1 loss aren't necessarily the same. For instance I was told that one can choose a set of ...
0
votes
0
answers
58
views
How to use random kitchen sinks for $\sigma \neq 1$?
The RBF kernel is given by
$$
k(x,y) = \exp\left(-\frac{\| x - y \|_2^2}{2 \sigma^2}\right)
$$
where $\sigma$ is the length-scale parameter. I want to use the random kitchen sinks method to create a ...
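The excerpt is truncated; a minimal sketch of random Fourier features ("random kitchen sinks") for a general length-scale $\sigma$ (the function name and sizes are my choices): for the RBF kernel above, the spectral density is $\mathcal{N}(0, \sigma^{-2} I)$, so the only change versus $\sigma = 1$ is drawing the frequencies with standard deviation $1/\sigma$ (equivalently, rescaling the inputs by $1/\sigma$).

    import numpy as np

    def random_fourier_features(X, n_features=500, sigma=1.0, seed=0):
        # Approximates k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) via z(x) . z(y).
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        W = rng.normal(scale=1.0 / sigma, size=(d, n_features))  # omega ~ N(0, I / sigma^2)
        b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)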
0
votes
0
answers
17
views
Linear SVM vs Decision Stumps for AdaBoost
I have heard that AdaBoost can use a linear SVM as a weak classifier. I wonder why decision stumps are often used with AdaBoost instead? Both are binary classifiers.
In my opinion, linear SVM seems to be a ...
3
votes
2
answers
166
views
Support Vector Machine - Hinge loss
What does it mean that 'The SVM hinge loss estimates the mode of the posterior class probabilities' (Elements of Statistical Learning, p. 427)?
The decision function f(x) assigns to the positive class(+...
1
vote
1
answer
42
views
Availability of Linear Grouping Algorithms to Linearly Cluster Datasets
I have been trying to cluster a scatter plot that has a triangular shape; ideally, the proper clustering should have a linear form, as shown below:
I tried using Spectral Clustering:
and ...
1
vote
0
answers
30
views
Feature selection before ML (RF and SVM)
I am new to machine learning and have to work with big data (lots of OTUs along with clinical) which I will input into 2 different machine learning models (RF and SVM) that will be used for prediction ...
2
votes
1
answer
41
views
Interpreting the formula for Riemannian metric tensor
In Improving support vector machine classifiers by modifying kernel functions, the authors defined the Riemannian metric tensor for a kernel as follows:
$$
\begin{align}
g(\vec{x}) &= \text{det}|g_{ij}...
5
votes
2
answers
3k
views
Support Vector Regression vs. Linear Regression
I am new to ML and I am learning the different algorithms one can use to perform regression. Keep in mind that I have a strong mathematical background, but I am new to the ML field.
So I understand ...
1
vote
1
answer
35
views
An extremely simple classification problem leads to an intractable SVM program
In the popular textbook Mathematics for Machine Learning, creating an SVM requires solving:
$\text{min}_{w,b} \dfrac{1}{2}\|w\|^2$
subject to $y_n (w^T x_n + b) \geq 1$, for all $n = 1, \ldots, N$
Ok, ...
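The excerpt is truncated, but one common way this program becomes "intractable" on a very simple problem is plain infeasibility: if the two classes are not linearly separable, no $(w, b)$ satisfies all the constraints. A minimal sketch for checking this directly (cvxpy and the toy data are my choices, not the textbook's):

    import cvxpy as cp
    import numpy as np

    # Hypothetical 1-D data whose labels alternate along the line,
    # so the hard-margin constraints cannot all hold at once.
    X = np.array([[0.0], [1.0], [0.5], [1.5]])
    y = np.array([1.0, 1.0, -1.0, -1.0])

    w = cp.Variable(1)
    b = cp.Variable()
    constraints = [cp.multiply(y, X @ w + b) >= 1]
    prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints)
    prob.solve()
    print(prob.status)   # reports infeasibility for non-separable data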
2
votes
0
answers
117
views
Textbook Recommendation other than ESL [duplicate]
My current background is as follows: (core subjects only)
Math: Linear Algebra, Analysis, (half of) Measure Theory
Stats: Mathematical Statistics, Regression Analysis, Multivariate Analysis
"...
1
vote
0
answers
36
views
Convexity of multi-class hinge loss
The empirical risk of a multi-class hinge loss is given by
$$L(\Theta,(x,y)) = \max_{j \neq y} \Big[1+ \sum_{i=1}^{d} x_i(\Theta_{ij} - \Theta_{iy}) \Big]_{+} $$
where $x \in \mathbb{R}^{d}$ is a ...
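The excerpt is cut off; the standard convexity argument for a loss of this form: for fixed $(x, y)$, each term $1 + \sum_{i=1}^{d} x_i(\Theta_{ij} - \Theta_{iy})$ is affine in $\Theta$, the positive part $[\cdot]_+$ of an affine function is convex, and a pointwise maximum of convex functions is convex, so $L(\Theta, (x, y))$ is convex in $\Theta$.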
2
votes
0
answers
32
views
Implement Nesterov's acceleration for SVM
I am trying to implement Nesterov's accelerated gradient descent for SVM. The objective function I need to minimize is $$\frac{1}{2}\lVert Au-Bv\rVert_2^2$$ with constraints $\sum_{i}u_i=\sum_{j}v_j=1$...
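The excerpt is truncated; a minimal sketch of accelerated projected gradient on this objective, with Euclidean projection onto each probability simplex (the step size, momentum schedule, and projection routine are my assumptions, not the asker's code):

    import numpy as np

    def project_simplex(v):
        # Euclidean projection onto {u : u >= 0, sum(u) = 1} (sort-based method).
        u = np.sort(v)[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
        theta = (css[rho] - 1.0) / (rho + 1.0)
        return np.maximum(v - theta, 0.0)

    def accelerated_projected_gradient(A, B, steps=1000):
        # min over u, v in probability simplices of 0.5 * ||A u - B v||^2
        m, n = A.shape[1], B.shape[1]
        L = np.linalg.norm(np.hstack([A, -B]), 2) ** 2   # Lipschitz constant of the gradient
        step = 1.0 / L
        u, v = np.ones(m) / m, np.ones(n) / n
        u_prev, v_prev = u.copy(), v.copy()
        for t in range(1, steps + 1):
            beta = (t - 1.0) / (t + 2.0)                 # Nesterov-style momentum
            yu, yv = u + beta * (u - u_prev), v + beta * (v - v_prev)
            r = A @ yu - B @ yv                          # residual at the lookahead point
            u_prev, v_prev = u, v
            u = project_simplex(yu - step * (A.T @ r))
            v = project_simplex(yv - step * (-B.T @ r))
        return u, v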
1
vote
0
answers
26
views
How to forecast changepoints from Gas Concentration Data?
So I'm trying to predict when gas concentrations change from sensor conductivity readings over a day. The gases randomly change concentrations around every 80-120 seconds and are kept constant between ...
0
votes
0
answers
69
views
Is $\ell_1$ regularization not compatible with SVM?
In the notes of Andrew Ng's CS229 Machine Learning course, it is mentioned:
The $\ell_2$ norm regularization is
much more commonly used with kernel methods because $\ell_1$ regularization is
...
2
votes
2
answers
343
views
What method should be used if the clusters contain different classes?
Assume that you have $N$ clusters. Each cluster has multiple classes. So we know the class ID for every major cluster, but not the class ID for the data points inside the major clusters.
Each ...