Report
Chapter 1
1.1 Introduction
Why year prediction?
Year prediction has not been studied very much, and there is huge untapped potential if we could use it in recommendation systems. How do we label songs as a 90's song or an 80's song? Searching for a song within a genre becomes easier when we have an idea of what era the song belongs to. We could also use year prediction to study the evolution of music. In short, it is an interesting problem to work on. The dataset I used is the Million Song Dataset from the UCI Machine Learning Repository.
Problem statement in the UCI Machine Learning Repository: prediction of the release year of a song from audio features. The songs are mostly western, commercial tracks ranging from 1922 to 2011, with a peak in the 2000s.
Attribute Information: We have 90 attributes: 12 timbre averages and 78 timbre covariances. The features were extracted from the 'timbre' features of The Echo Nest API, taking the average and covariance over all 'segments', each segment being described by a 12-dimensional timbre vector [3].
What is timbre? Timbre is derived from the spectrogram, and it is said to be the attribute of a sound which enables the listener to judge two non-identical sounds (which have the same pitch and loudness) as dissimilar [1].
CHAPTER 1. PREDICTING YEAR OF A SONG RELEASE USING TIMBRE
FEATURES OF A SONG
Songs were divided into segments, and each segment is observed through a spectrogram. The spectrogram patch is expressed as a combination of the basis [2]. [2] describes: "the first [dimension of the 12-dimension timbre vector] represents the average loudness of the segment; second emphasises brightness; third is more closely correlated to the flatness of a sound;" etc.
The breakup of the project report:
Chapter 2 introduces the dataset.
Chapter 3 states the objectives of the project.
Chapter 4 highlights the flow of work.
Chapter 5 discusses the algorithms.
Chapter 6 tests the algorithms.
Chapter 7 highlights the conclusions of the project.
We can observe the correlation of the features using the covariance matrix. The features are correlated, so the rank of the matrix is definitely less than 90; the correlation coefficients between columns 15-25 are really high. (Figure: covariance matrix when we consider all 90 features.)
The covariance matrix of the first 12 PCA components is also correlated. The pair of column 2 and column 3 has a higher correlation coefficient than any other combination. (Figure: covariance matrix when we consider the first 12 PCA components.)
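The correlation check above can be sketched as follows. This is a minimal sketch: the real matrix would be the 90 timbre columns loaded from the YearPredictionMSD data, so synthetic correlated columns stand in for it here.

```python
import numpy as np

# Sketch: checking correlation among the 90 feature columns.
# In the real project X would be the (n_samples, 90) timbre matrix;
# here we build synthetic data whose last 60 columns are linear
# combinations of the first 30, so the columns are correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 30))
X = np.hstack([base, base @ rng.normal(size=(30, 60))])  # 500 x 90

corr = np.corrcoef(X, rowvar=False)    # 90 x 90 correlation matrix
rank = np.linalg.matrix_rank(corr)     # rank < 90 when columns are dependent
print(corr.shape, rank)
```

When columns are linearly dependent, the rank of the correlation matrix drops below 90, which is exactly the observation made above.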
1.4.1 Requirements
Based on the above analysis, I would choose the model which satisfies most of the requirements below, compared to the other models in my hypothesis set.
1) I need a model that can do feature selection on dense matrices, because not all the features might be equally useful. Reducing the features reduces the parameters and therefore the complexity of the model; the higher the complexity of the model, the more likely it is to overfit.
2) I need a model which is robust to sample bias.
2)I need a model which would be robust to sample bias.
3) I need a model which performs well on the cross-validation set.
4) I need a model with low variance. The higher the variance, the higher the chance of the model overfitting. Keeping the variance in check also keeps the complexity of the model in check.
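Requirement 3 can be checked directly with cross-validation. The sketch below compares two of the candidate models on a cross-validation set; the synthetic data and the chosen alphas are placeholders, not the values used in the project.

```python
import numpy as np
from sklearn.linear_model import Ridge, ElasticNet
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the 90-column feature matrix and the year target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 90))
y = X[:, :12] @ rng.normal(size=12) + 2000 + rng.normal(scale=2, size=300)

for model in (Ridge(alpha=1.0), ElasticNet(alpha=0.1)):
    # negative MAE: scikit-learn scorers are "higher is better"
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_absolute_error")
    print(type(model).__name__, round(-scores.mean(), 2))
```

The model with the lowest cross-validated error (in years) is the one that best satisfies this requirement.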
Stochastic gradient descent converges much faster than batch gradient descent on large datasets, since it updates the weights after each sample (or mini-batch) rather than after a full pass over the data.
Courtesy: https://www.coursera.org/course/ml [6]
Ridge regression addresses some of the problems of ordinary least squares by imposing a penalty on the size of the coefficients: the ridge coefficients minimize a residual sum of squares penalized by the squared norm of the weights. Least squares gives us the best way to represent the target (year) as a linear function of the attributes (features); we impose a penalty on the weights so that noise or small changes in the input do not give rise to huge changes in the predicted target values. Ridge usually does not give a sparser solution, but it does give a robust system.
What does the ridge result tell us?
1) The true release year of a song would be within 8 years of the predicted value with very high probability, as given by the Hoeffding inequality.
2) The weights assigned to the features are small.
The degrees of freedom of ridge regression decrease as lambda increases, and are always less than or equal to 90 in this case.
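The shrinkage effect described above can be seen directly: as the penalty grows, the ridge weights shrink (without generally becoming exactly zero). A minimal sketch on synthetic stand-in data:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic stand-in: one informative column out of 90.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 90))
y = X[:, 0] * 3 + 2000 + rng.normal(scale=1, size=200)

small = Ridge(alpha=0.01).fit(X, y)     # weak penalty
large = Ridge(alpha=1000.0).fit(X, y)   # strong penalty

# A stronger penalty shrinks the overall size of the weight vector.
print(np.linalg.norm(small.coef_), np.linalg.norm(large.coef_))
```

The alpha values here are illustrative only; in practice lambda would be chosen on the cross-validation set.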
The lasso is a linear model that estimates sparse coefficients. It is useful in some contexts due to its tendency to prefer solutions with fewer non-zero parameters, effectively reducing the number of variables upon which the given solution depends [5]. We impose an L1 penalty on the weights, so the solution tends to lie at a corner point of the constraint region (i.e. one or more of the coefficients would be exactly zero). We compute the lasso using only the first 12 features (the means of the PCA components) to judge which of them are important.
What does the lasso result tell us?
1) The true release year of a song would be within 8 years of the predicted value with very high probability, as given by the Hoeffding inequality.
2) The weights assigned to the features are all non-zero, indicating that all the features are important and I cannot ignore any of them.
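A minimal sketch of this lasso check on the first 12 columns. The data here is synthetic and built so that every column carries signal, which is the situation the result above describes; the alpha is a placeholder.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic stand-in for the 12 timbre-mean columns; every column
# genuinely contributes to the target here.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 12))
y = X @ np.arange(1, 13, dtype=float) + 2000

lasso = Lasso(alpha=0.01).fit(X, y)
n_nonzero = np.count_nonzero(lasso.coef_)
print(n_nonzero)
```

When all 12 weights come back non-zero, as in the project, none of the timbre means can be dropped.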
ElasticNet is a linear regression model trained with both L1 and L2 priors as regularizers. This combination allows learning a sparse model where few of the weights are non-zero, like the lasso, while still maintaining the regularisation properties of ridge [5]. We control the convex combination of L1 and L2 using the l1_ratio parameter [5]. Elastic net is useful when there are multiple features which are correlated with one another: the lasso is likely to pick one of them at random, while elastic net is likely to pick both.
What does the elastic net result tell us? The weights are small and some of them are exactly zero. The sparse coefficients are very useful, since they let us reduce the size of the matrix. (Figure: comparing the weights of lasso, ridge and elastic net.)
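The combined behaviour described above (small weights, some exactly zero) can be sketched as follows; the synthetic data and the alpha/l1_ratio values are illustrative assumptions, not the project's settings.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic stand-in: only the first 5 of 90 columns matter.
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 90))
y = X[:, :5] @ (np.ones(5) * 4) + 2000

# l1_ratio mixes the penalties: 1.0 is pure lasso, 0.0 is ridge-like.
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)
n_zero = int(np.sum(enet.coef_ == 0))
print(n_zero)
```

The L1 part of the penalty drives uninformative weights exactly to zero, while the L2 part keeps the surviving weights small and stable under correlated inputs.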
Key takeaways
1. The weights of the columns as decided by the lasso are bigger than the ones given by ridge or elastic net. This shows that all 12 of my PCA components are important.
2. The weights given by ridge and elastic net are quite close to each other, and for many columns they are close to zero.
The article [3] is one of the few which suggested using random forests or CARTs to deal with sampling bias. Since my sample might suffer from sample bias, it is important to consider a model which addresses the problem. The main idea behind the ensemble methodology is to aggregate multiple weighted models to obtain a combined model that outperforms every single model in it. Increasing the number of trees lowers the variance, and increasing the depth lowers the bias. I dealt with 20 trees. The RandomForestRegressor function can also give you an idea of the important features.
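A minimal sketch of the forest described above, with 20 trees and the built-in feature ranking; the data is a synthetic stand-in in which only the first two columns carry signal.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in: columns 0 and 1 drive the target, the rest are noise.
rng = np.random.default_rng(5)
X = rng.normal(size=(400, 90))
y = X[:, 0] * 5 + X[:, 1] * 3 + 2000 + rng.normal(scale=0.5, size=400)

# 20 trees, as used in the project; more trees would lower the variance.
forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# feature_importances_ ranks the columns by how much they reduce impurity.
top = int(forest.feature_importances_.argmax())
print(top)
```

Averaging over trees grown on bootstrap samples is also what makes the forest comparatively robust to sample bias.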
Random forests look like a good fit for the problem, since they give me a really low error. The problem is that they take a long time to train, and as the number of samples increases it would be better to have a model which runs faster. The dataset has the potential to grow, and we need a model which we can update without worrying much about the execution time. The elastic net seems like the next best thing; its error is around 7 years, which is higher than that of random forests. The errors obtained for the various algorithms are shown in the table below.
1.7 Conclusion
An ideal model for this problem would be one which addresses the problem random forests face and the problem elastic net faces, and neatly captures the good parts of both of the above-mentioned algorithms. Since the sample set has the potential of expanding further, the elastic net model looks promising. Test data performance: 6.8156 years (mean absolute error) and 90.9448 (mean squared error), so the predictions are close to the real values with high accuracy.
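The two metrics reported above can be computed as follows; the year values here are made up purely to illustrate the calculation.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Illustrative true and predicted release years (not project data).
y_true = np.array([1995, 2001, 1988, 2010])
y_pred = np.array([1999, 2000, 1990, 2004])

mae = mean_absolute_error(y_true, y_pred)  # average |error| in years
mse = mean_squared_error(y_true, y_pred)   # penalises large errors more
print(mae, mse)  # 3.25 14.25
```

MAE reads directly in years, which is why it is the more interpretable of the two figures quoted above.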
The article [2] carried out work along the same lines. They compared two algorithms, k-NN and Vowpal Wabbit, both chosen for their applicability to large-scale problems. They also reported the error obtained by the best constant predictor.
1) Can we find a model which takes a subset or the entire set of the 90 attributes and gives us an estimate of the year of a song's release? Yes, we can. Elastic net and random forests perform well and give us a reasonable estimate of the year from the audio features of the song.
2) Can we predict the year knowing just the means of the first 12 principal components?
3) How helpful are the other 78 columns, which bring out the covariance between the principal components in each song? Both of the best-performing algorithms needed all the columns. Intuitively, we know that one feature is not enough to point out an era; you need a combination of features and their covariances.
1.8 Bibliography
1. https://en.wikipedia.org/wiki/Timbre
3. M. Lichman, "UCI Machine Learning Repository", University of California, Irvine, School of Information and Computer Sciences, 2013. http://archive.ics.uci.edu/ml
4. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4743660/
6. https://www.coursera.org/course/ml
7. F. Bimbot, E. Deruty, G. Sargent, and E. Vincent. Methodology and resources for the structural segmentation of music pieces into autonomous and comparable blocks. 2011.