
Questions tagged [svm]

Support Vector Machine refers to "a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis."

3 votes
0 answers
18 views

Machine learning for importance check of three-way interaction in a longitudinal dataset

I want to know whether the interaction between the continuous variable A, the continuous variable B, and the group (a factor with 3 levels) is associated with a continuous outcome Y in a longitudinal setting....
Friedebert
-1 votes
1 answer
81 views

Categorical Dependent Variable

Repost: Hello all, thank you so much for the response. Here I have provided some information. (a) This is clinical data with a sample size of around 859. (b) It has 11 columns as input features and ...
Ayesha Haya
0 votes
0 answers
16 views

YouTube Spam Classifier - Different Methods Yielding the Same Accuracy (94%)

(CONTEXT) I'm currently doing a report project at my university to build a classifier model that classifies a comment as spam or ham (non-spam) using this data set, and then submit a prediction csv ...
KitanaKatana
4 votes
1 answer
144 views

Why am I getting worse results when using a CNN for feature extraction and an SVM for classification?

I am using a 3D CNN for feature extraction and an SVM for classification, but I got worse results than when using the 3D CNN for both feature extraction and classification. Is that a normal thing?
anya • 41
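A minimal sketch of the setup described in the question above, using made-up toy data: train a small 3D CNN end to end, then reuse its penultimate layer as a fixed feature extractor for an SVM. The layer sizes, epoch count, and SVC settings are illustrative assumptions, not the asker's code.

import numpy as np
import tensorflow as tf
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Toy volumetric data standing in for the real dataset: 200 samples of 16x16x16x1, two classes.
X = np.random.rand(200, 16, 16, 16, 1).astype("float32")
y = np.random.randint(0, 2, size=200)

inputs = tf.keras.Input(shape=(16, 16, 16, 1))
h = tf.keras.layers.Conv3D(8, 3, activation="relu")(inputs)
h = tf.keras.layers.GlobalAveragePooling3D()(h)
feats = tf.keras.layers.Dense(32, activation="relu", name="features")(h)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(feats)
cnn = tf.keras.Model(inputs, outputs)
cnn.compile(optimizer="adam", loss="binary_crossentropy")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
cnn.fit(X_tr, y_tr, epochs=3, verbose=0)

# Reuse the trained network up to the "features" layer as a feature extractor for the SVM.
extractor = tf.keras.Model(inputs, feats)
svm = SVC(kernel="rbf", C=1.0).fit(extractor.predict(X_tr, verbose=0), y_tr)
print("SVM accuracy on CNN features:", svm.score(extractor.predict(X_te, verbose=0), y_te))

Unscaled extracted features or an untuned C and gamma are common reasons the CNN-plus-SVM stage underperforms the end-to-end CNN.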
2 votes
1 answer
20 views

Why do results from various experiments with different % of features selected through RFE for SVM-based classification yield inconsistent outcomes?

I now have a basic understanding of classifiers such as Random Forests, Gradient Boosted Trees, and Support Vector Machines. My tasks involve classifying layer stacks that consist of optical and radar ...
Giancarlo
0 votes
0 answers
11 views

When running a Support Vector Machine, how do I formulate the linear transformation that flips the decision hyperplane in the non-augmented dimension?

We know that when running a support vector machine, we actually use the "kernel trick" to compute the decision hyperplane (boundary) as if we do so in the kernel-augmented dimension, but not ...
Wonjae Oh
0 votes
0 answers
153 views

Prove that a matrix constructed from a Gaussian RBF is PSD

I have a radial basis function $k(x, y) = \exp(-{(x-y)}^T M {(x-y)})$ where $M$ is a symmetric PSD matrix. I know that $k(\cdot)$ is a kernel itself: Prove that multiplication with positive ...
BiriBora • 101
0 votes
0 answers
14 views

Accuracy for Permutation Test is very high

I am a bit confused as to what is happening with my classifier. I have a dataset of ~220 features and about 4000 trials. Classes are perfectly balanced, and I'm doing a simple binary classification task ...
Chris • 1
1 vote
0 answers
26 views

Some further explanation of Alex Smola's 1998 implementation of support vector regression

I am currently going through, and trying to implement the pseudo-code in Alex Smola's 1998 paper on support vector regression, particularly the one on sequential minimal optimization. (Section 4.6.3, ...
Nnanna • 11
0 votes
0 answers
11 views

Averaging over labels instances for SVM classification

This is a hypothetical question about different ways to input training/testing data into an SVM model. I have 128 instances for each of two classes, which can be hierarchically grouped into 4 sets (i.e. ...
Dusacks1570
0 votes
0 answers
19 views

How to get the second stationary-point condition corresponding to the intercept when using the augmented weight vector and augmented design matrix in SVM?

Below is the formulation I got for the SVM when using the classifier equation $w \cdot x + b = 0$. I want to know why I am not getting the second stationarity condition, i.e. the summation over $i$ from 1 to $n$ of (...
Shri • 23
0 votes
0 answers
35 views

Mean and Standard Deviation of accuracy for SVM model prediction

I am training an SVM model for binary classification. For this, I have split the train and test datasets in an 80:20 ratio. Then I standardized the training and test data separately and tuned the ...
Sultan Ahmed
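A minimal sketch (with made-up data) of one common way to report a mean and standard deviation of accuracy: repeated stratified cross-validation, with the scaler inside a pipeline so standardization is re-fit on each training fold.

from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))

# 5 folds repeated 10 times gives 50 accuracy estimates to summarize.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")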
1 vote
0 answers
24 views

Modeling for a data set that has a different number of factors for each row (not binomial) [closed]

The modeling issue I'm having is that the categorical variable for each row has a different number of factors. If I can reshape the data by products (a,b,c,.....~cost, hoursum, numPod, numDate), so that ...
rocknRrr • 121
2 votes
1 answer
142 views

How do I perform a permutation test on a machine learning model to obtain a p-value for its performance?

This is kind of the same question as this previous post. But since there's no reply, and I'm having a hard time finding answers, I'd like to ask it again. I'm training a regression model (SVM ...
artvmac • 73
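A minimal sketch (on synthetic data) of the standard recipe for this: scikit-learn's permutation_test_score refits the model many times on label-shuffled data to build a null distribution of the cross-validated score, and the p-value is the fraction of permuted scores at least as good as the observed one.

from sklearn.datasets import make_regression
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import permutation_test_score

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))

# Score the real labels, then 200 shuffled copies, to get a null distribution and p-value.
score, perm_scores, p_value = permutation_test_score(
    model, X, y, scoring="r2", cv=5, n_permutations=200, random_state=0
)
print(f"observed R^2 = {score:.3f}, permutation p-value = {p_value:.4f}")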
0 votes
0 answers
36 views

How to handle data normalization when a logarithmic scale is required?

Let's say we wished to build a Regressor (e.g. a Support Vector Regressor) to predict the price of an asset, within a given time span from now on. However, what if the historical data we have ...
Juan Flautista De Torrepacheco
1 vote
0 answers
40 views

Derivation of dual formulation of support vector regression

I'm trying to derive the dual formulation of epsilon-insensitive support vector regression. I think my derivation is correct, but I can't match it up to a result for the dual that I've seen given in ...
oweydd • 225
0 votes
0 answers
28 views

I want to plot the decision boundaries of an SVM model with more than 2 variables

I understand that that is impossible to visualize, so I went in and PCA-transformed the variables. The problem is that I still need more than 2 principal components to get "good" ...
maglorismyspiritanimal
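A minimal sketch of the 2-D compromise mentioned in the question above (with made-up data): project to the first two principal components, fit an SVM in that reduced space purely for visualization, and draw its decision regions on a grid. This shows a 2-D view, not the boundary of the full higher-dimensional model.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=0)
X2 = PCA(n_components=2).fit_transform(X)
clf = SVC(kernel="rbf").fit(X2, y)

# Evaluate the decision function on a grid covering the projected data.
xx, yy = np.meshgrid(
    np.linspace(X2[:, 0].min() - 1, X2[:, 0].max() + 1, 300),
    np.linspace(X2[:, 1].min() - 1, X2[:, 1].max() + 1, 300),
)
zz = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, zz > 0, alpha=0.3)
plt.scatter(X2[:, 0], X2[:, 1], c=y, s=15)
plt.xlabel("PC 1"); plt.ylabel("PC 2")
plt.show()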
0 votes
0 answers
47 views

Applying PCA Before Training Multiple SVM Binary Classifiers To Reduce Data

I am working on a project whose goal is to determine whether a new sample is part of Class A or Class A'. I need multiple such classifiers. I will have an SVM to classify between: ClassA - ClassA' ...
guitardenver
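A minimal sketch (with synthetic data) of applying PCA before an SVM: keep PCA inside the pipeline so it is fit only on the training folds, and let it retain enough components to explain a chosen fraction of the variance. The 95% threshold is an illustrative assumption.

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=100, n_informative=20, random_state=0)

# PCA(n_components=0.95) keeps as many components as needed for ~95% of the variance.
clf = make_pipeline(StandardScaler(), PCA(n_components=0.95), SVC(kernel="rbf"))
print("CV accuracy with PCA:", cross_val_score(clf, X, y, cv=5).mean())

The same pipeline structure can be cloned for each of the binary Class A vs Class A' style classifiers.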
1 vote
1 answer
165 views

Non-linear kernel for classifying data points corresponding to two concentric circles [closed]

I have seen an article, while doing self-study, on non-linearly separable problems, here. The images as given there are here, and here. It deals with a common textbook problem, where the data points are in two ...
jiten • 113
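A small sketch of that textbook case on synthetic data: two concentric circles are not linearly separable, but an RBF kernel (or an explicit map such as $(x_1, x_2) \to (x_1^2, x_2^2)$) separates them easily.

from sklearn.datasets import make_circles
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Two concentric rings of points with a little noise.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
print("linear kernel:", cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean())
print("RBF kernel:   ", cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean())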
0 votes
1 answer
32 views

SVM Kernel to compare histograms as input vectors

In lecture 7 of CS229 by Andrew Ng he mentions at the very end a specific Kernel that allows an SVM to "classify" how similar two histograms are, such as the demographics of 2 countries. He ...
yyyLLL • 33
4 votes
5 answers
457 views

Is it valid to exhaustively test all possible combinations of features to find the best combination?

I have about 1000 labelled observations from about 50 subjects responding physiologically under different situations and am trying to classify the situation (usually into three classes of roughly ...
user1596274
2 votes
1 answer
89 views

Is my understanding/approach to nested cross-validation, final model tuning correct?

I am training an SVM on limited training data with unbalanced classes. Here are the things that I want to do: 1.) I want to make a statement about the generalizability ...
curious • 115
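A minimal sketch of nested cross-validation on synthetic imbalanced data: the inner loop tunes C and gamma, the outer loop estimates how well that whole tuning procedure generalizes, and the final model is then tuned and refit once on all the data. The grid values and scoring choice are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=200, n_features=20, weights=[0.8, 0.2], random_state=0)

pipe = make_pipeline(StandardScaler(), SVC(class_weight="balanced"))
grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]}

# Inner loop: hyperparameter search. Outer loop: unbiased performance estimate.
inner = GridSearchCV(pipe, grid, cv=StratifiedKFold(5), scoring="balanced_accuracy")
outer_scores = cross_val_score(inner, X, y, cv=StratifiedKFold(5), scoring="balanced_accuracy")
print("nested-CV generalization estimate:", outer_scores.mean())

# The deployable model: tune once more on the full dataset.
final_model = inner.fit(X, y).best_estimator_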
0 votes
0 answers
30 views

How is ROC AUC calculated for a Support Vector Machine?

My understanding is that a support vector machine (SVM) finds a hyperplane that separates two classes from each other. During training, there can be some amount of error allowed so that some classes ...
inquisitive_hamster
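A minimal sketch (synthetic data): an SVC does not output probabilities by default, but its decision_function gives a continuous signed distance to the hyperplane, and sweeping a threshold over those scores is exactly what roc_auc_score needs to build the ROC curve.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf").fit(X_tr, y_tr)
margin_scores = clf.decision_function(X_te)   # continuous scores, not 0/1 labels
print("ROC AUC:", roc_auc_score(y_te, margin_scores))

Alternatively, SVC(probability=True) fits a Platt-scaling model on top of the margins and lets you use predict_proba instead.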
0 votes
0 answers
43 views

Should I interpret the data as noise or not

I am tackling a classification problem with 3 classes. Here is what those classes look like on the first two principal axes. I fine-tuned an SVM model and the best performance achievable was 50%. By ...
Yann • 53
1 vote
1 answer
72 views

How does fitting data work in SVM using the Kernel Trick?

In SVM, I understand how to fit some data after transforming it into a higher dimension. (ex: $(X_1, X_2) \to (X_1, X_2, X_1^2, X_2^2, X_1X_2)$, which is a 2 dimension to 5 dimension transformation). ...
Random user33
0 votes
0 answers
65 views

About the hinge loss and slack variables

I'll be denoting the $i$-th training example, target label and slack variable as $\mathbf{\vec x}^{(i)}$, $y^{(i)}$ and $\xi_i$ respectively. Hinge loss: the hinge loss function in the context of ...
Sagnik Taraphdar
0 votes
0 answers
46 views

How is the SVM optimization objective derived from the hinge loss function?

The hinge loss function, in the context of SVMs, is given as: $$ \mathcal{L}(\mathbf{\vec w}, b\,; \mathbf{\vec x}^{(i)}, y ^{(i)}) = \max(0, 1-y ^{(i)}(\mathbf{\vec w}\cdot \mathbf{\vec x}^{(i)} + b))...
Sagnik Taraphdar
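A sketch of the link, in the same notation as the question above: summing the hinge loss over the training set and adding an $\ell_2$ penalty gives
$$ \min_{\mathbf{\vec w},\, b}\ \frac{1}{2}\|\mathbf{\vec w}\|_{2}^{2} + C\sum_{i=1}^{n}\max\big(0,\ 1-y^{(i)}(\mathbf{\vec w}\cdot \mathbf{\vec x}^{(i)} + b)\big), $$
and introducing slack variables $\xi_i \ge \max\big(0,\ 1-y^{(i)}(\mathbf{\vec w}\cdot \mathbf{\vec x}^{(i)} + b)\big)$ recovers the familiar constrained form $\frac{1}{2}\|\mathbf{\vec w}\|_{2}^{2} + C\sum_i \xi_i$ subject to $y^{(i)}(\mathbf{\vec w}\cdot \mathbf{\vec x}^{(i)} + b) \ge 1-\xi_i$ and $\xi_i \ge 0$, since at the optimum each $\xi_i$ is pushed down onto the hinge value.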
3 votes
1 answer
41 views

How to determine one-class SVM's $r$ parameter after obtaining $\alpha$ from QP programming solver?

I'm reading about one-class SVM in wiki here: One-class SVM. One-class SVM attempts to learn $r$ and $c$ to fit a hypersphere to the dataset. The formula for assigning labels is: $$sign(r^2 - ||\phi(x)...
MathematicsBeginner
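Assuming the hypersphere (SVDD-style) formulation referenced above, with centre $c=\sum_i \alpha_i \phi(x_i)$, one standard way to recover $r$ from the solver's $\alpha$ is to pick any support vector $x_k$ with $0<\alpha_k<C$ (it lies on the sphere's surface) and set
$$ r^2 = \|\phi(x_k)-c\|^2 = k(x_k,x_k) - 2\sum_i \alpha_i\, k(x_i,x_k) + \sum_{i,j}\alpha_i \alpha_j\, k(x_i,x_j); $$
in practice this value is often averaged over all such margin support vectors for numerical stability.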
0 votes
0 answers
32 views

Schölkopf one-class linear SVM equation: why subtracting ρ from 1/2 ||w||² is the same as maximizing the distance

In the one class linear SVM, the equation is : $\min_{w, \rho} \frac{1}{2} \|w\|^2- \rho + C\sum_{i=1}^{n} \xi_i$ subject to: $\begin{align*} & w \cdot x_i \geq \rho - \xi_i, \\ & \xi_i \geq 0,...
Arnaud Feldmann
0 votes
1 answer
87 views

Learning curve to diagnose underfitting or overfitting

I want to know if the model I am using tends to be overfitting or underfitting. I am using SVM and Random Forest algorithms. How to figure it out?
Anna • 3
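A minimal sketch (synthetic data) of the usual learning-curve diagnostic: plot the training and cross-validation scores as the training set grows. A large persistent gap between the curves suggests overfitting; two low curves that plateau close together suggest underfitting.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
sizes, train_scores, val_scores = learning_curve(
    SVC(kernel="rbf"), X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 8)
)
plt.plot(sizes, train_scores.mean(axis=1), marker="o", label="training score")
plt.plot(sizes, val_scores.mean(axis=1), marker="o", label="cross-validation score")
plt.xlabel("training set size"); plt.ylabel("accuracy"); plt.legend()
plt.show()

The same call works for the Random Forest by swapping in that estimator.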
0 votes
0 answers
10 views

Introducing bias via combining probability outputs from multiple models

I am working on a classification task, where I am trying to estimate the probability that a patient may not die. I did use a Survival Analysis approach at first, but the results seemed unintuitive and ...
vjgu • 23
0 votes
0 answers
23 views

Can I find the explicit feature map that generates exponent of a kernel?

Let's say I have a kernel $K$, and another kernel of the form: $$ K' = e^K $$ Now, I know how to prove $K'$ is a kernel; I can do it using the Taylor expansion of $e^x$ around $0$. But let's say I want ...
aroma • 123
1 vote
0 answers
43 views

Support Vector Classifiers for Overlapping Classes

I am currently studying support vector classifiers (SVC), more specifically, the solution to the Lagrangian (Wolfe) dual function, with the help of the book "The Elements of Statistical Learning" ...
Kobi • 11
2 votes
2 answers
109 views

How is the Representer theorem used in the derivation of the SVM dual form?

This is the primal form of the SVM hypothesis : $$ h _{\mathbf{\vec w}, b}(\mathbf{\vec x}^{(i)}) = \mathbf{\vec w}\cdot \mathbf{\vec x}^{(i)} + b $$ The Representer theorem as formulated here ...
Sagnik Taraphdar
3 votes
1 answer
126 views

Why is the regularization term multiplied by the error term in the cost function of SVM?

The cost function of the Optimal Margin Classifier(non-kernelized SVM) is given as : $$ J(\mathbf{\vec w}, b) = \frac{1}{2}\|\mathbf{\vec w}\|_{2}^{2} + C \sum_{i=1}^{n}\max(0, 1-y ^{(i)}(\mathbf{\vec ...
Sagnik Taraphdar
3 votes
1 answer
251 views

Scenario where minimizing 0-1 loss is different than minimizing hinge loss

Suppose we're using linear predictors. I'm trying to conceptually understand how minimizing hinge loss and 0-1 loss aren't necessarily the same. For instance I was told that one can choose a set of ...
redbull_nowings
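A small worked example of where the two losses disagree for a linear predictor $f(x)$. With
$$ \ell_{0\text{-}1}(y, f(x)) = \mathbb{1}\{\,y f(x) \le 0\,\}, \qquad \ell_{\text{hinge}}(y, f(x)) = \max(0,\ 1 - y f(x)), $$
a point with margin $y f(x) = 0.5$ contributes $0$ to the 0-1 loss but $0.5$ to the hinge loss, while an outlier with $y f(x) = -10$ contributes $1$ versus $11$. A hinge-minimizing hyperplane can therefore tilt toward such an outlier even when doing so misclassifies a point that the 0-1-minimizing hyperplane would get right.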
0 votes
0 answers
58 views

How to use random kitchen sinks for $\sigma \neq 1$?

The RBF kernel is given by $$ k(x,y) = \exp\left(-\frac{\| x - y \|_2^2}{2 \sigma^2}\right) $$ where $\sigma$ is the length-scale parameter. I want to use the random kitchen sinks method to create a ...
user336650
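A minimal sketch of one way to handle this: for $k(x,y) = \exp\left(-\frac{\|x-y\|_2^2}{2\sigma^2}\right)$ the random frequencies are drawn from $\mathcal{N}(0, \sigma^{-2} I)$, which with scikit-learn's RBFSampler amounts to setting gamma = 1 / (2 * sigma**2). The synthetic data below just checks the approximation against the exact Gram matrix.

import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.metrics.pairwise import rbf_kernel

sigma = 2.5
gamma = 1.0 / (2.0 * sigma**2)   # exp(-gamma * ||x - y||^2) == exp(-||x - y||^2 / (2 sigma^2))

X = np.random.RandomState(0).randn(100, 5)
Z = RBFSampler(gamma=gamma, n_components=2000, random_state=0).fit_transform(X)

approx = Z @ Z.T                       # random-features (kitchen sinks) approximation
exact = rbf_kernel(X, gamma=gamma)     # exact RBF Gram matrix
print("max abs error:", np.abs(approx - exact).max())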
0 votes
0 answers
17 views

Linear SVM vs Decision Stumps for AdaBoost

I have heard that AdaBoost can use a linear SVM as a weak classifier. I wonder why decision stumps are so often used with AdaBoost instead? Both are binary classifiers. In my opinion, a linear SVM seems to be a ...
euraad • 425
3 votes
2 answers
166 views

Support vector machine - hinge loss

What does it mean that 'The SVM hinge loss estimates the mode of the posterior class probabilities' (Elements of Statistical Learning, p. 427)? The decision function f(x) assigns to the positive class (+...
J.doe • 369
1 vote
1 answer
42 views

Availability of Linear Grouping Algorithms to Linearly Cluster Datasets

I have been trying to cluster a scatter plot that has a triangular shape; ideally the proper clustering should have a linear form, as shown below. I tried using spectral clustering, and ...
NOT-A-CS-GUY
1 vote
0 answers
30 views

Feature selection before ML (RF and SVM)

I am new to machine learning and have to work with big data (lots of OTUs along with clinical data) which I will input into 2 different machine learning models (RF and SVM) that will be used for prediction ...
Tori • 11
2 votes
1 answer
41 views

Interpreting the formula for Riemannian metric tensor

In Improving support vector machine classifiers by modifying kernel functions, the authors defined Riemannian metric tensor for a kernel as follows: $$ \begin{align} g(\vec{x}) &= \text{det}|g_{ij}...
Omar Shehab
5 votes
2 answers
3k views

Support Vector Regression vs. Linear Regression

I am new to ML and I am learning the different algorithms one can use to perform regression. Keep in mind that I have a strong mathematical background, but I am new in the ML field. So I understand ...
kubo • 195
1 vote
1 answer
35 views

An extremely simple classification problem leads to intractable SVM program

In the popular textbook Mathematics for Machine Learning, creating an SVM requires solving: $\text{min}_{w,b} \dfrac{1}{2}\|w\|^2$ subject to $y_n (w^T x_n + b) \geq 1$, for all $n = 1, \ldots, N$. Ok, ...
Fraïssé • 1,630
2 votes
0 answers
117 views

Textbook Recommendation other than ESL [duplicate]

My current background is as follows (core subjects only). Math: Linear Algebra, Analysis, (half of) Measure Theory. Stats: Mathematical Statistics, Regression Analysis, Multivariate Analysis. "...
jason 1 • 311
1 vote
0 answers
36 views

Convexity of multi-class hinge loss

The empirical risk of a multi-class hinge loss is given by $$L(\Theta,(x,y)) = \max_{j \neq y} \Big[1+ \sum_{i=1}^{d} x_i(\Theta_{ij} - \Theta_{iy}) \Big]_{+} $$ where $x \in \mathbb{R}^{d}$ is a ...
Oskar • 265
2 votes
0 answers
32 views

Implement Nesterov's acceleration for SVM

I am trying to implement Nesterov's accelerated gradient descent for SVM. The objective function I need to minimize is $$\frac{1}{2}\lVert Au-Bv\rVert_2^2$$ with constraints $\sum_{i}u_i=\sum_{j}v_j=1$...
struggleinmath
1 vote
0 answers
26 views

How to forecast changepoints from Gas Concentration Data?

So I'm trying to predict when gas concentrations change from sensor conductivity readings over a day. The gases randomly change concentrations around every 80-120 seconds and are kept constant between ...
Jawi Doen
0 votes
0 answers
69 views

Is $\ell_1$ regularization not compatible with SVM?

In the notes of Andrew Ng's CS229 Machine Learning course, it is mentioned: The $\ell_2$ norm regularization is much more commonly used with kernel methods because $\ell_1$ regularization is ...
Katatonic
2 votes
2 answers
343 views

What method should be used if the clusters contains different classes?

Assume that you have $N$ clusters. Each cluster has multiple classes. So we know the class ID for every major cluster, but not the class ID for the data points inside the major clusters. Each ...
euraad • 425
