My Spotify
Version: 1
Contents
I Preamble
II Introduction
III Goals
IV Instructions
V Mandatory part
VI Bonus part
Chapter I
Preamble
Imagine what YouTube would look like without the recommendation section to the
right of a video. What would Facebook look like without its smart feed and the
“People you may know” block? What would Amazon look like without book
recommendations? What would the media look like without sections such as
“Most popular”, “Most commented”, or “Related news”?
Recommender systems are ubiquitous. Some of them have become the core of a
product; others are simply great features. Either way, they matter to the
products they are part of. Why is that?
Recommender systems are useful, first of all, for users and customers, but
they are a great tool for companies too: the algorithms help them increase
revenues by upselling or cross-selling goods that are appealing to users.
It is a win-win situation – that is why recommendations are so valuable and
important.
Chapter II
Introduction
What is under the hood of recommender systems? You are already familiar with
machine learning, and recommendations can be produced with some of those
algorithms (by predicting a rating, or the probability that a user clicks,
likes, or reads). There are, however, several approaches specific to this
domain.
Non-personalized recommender systems. They are useful when we know nothing
about a new user or customer. We can recommend something that is popular by
some measure: bestsellers, blockbusters, most commented, most trending,
most popular, top 10 in a genre, etc. To create such recommendations, however,
we need data about the choices other users or customers have made. This
approach therefore does not help much when you have just launched a new online
shop or media outlet: this is the cold start problem.
Chapter III
Goals
The goal of this project is to give you a first introduction to recommender
systems. You will try all the mentioned approaches. At the same time, you will
think about how to create a good product based on them. This will affect the
metrics that you use to assess your solutions.
Chapter IV
Instructions
• This project will only be evaluated by humans. You are free to organize and name
your files as you desire.
• From here on, Python 3 is the only accepted version of Python.
• The norm does not apply to this project. Nevertheless, you are asked to keep
your source code clear and well structured.
Chapter V
Mandatory part
a. Task
In this project, you will work on a music recommender system. There are tons
of different songs and tracks. Your goal is to help users to discover something
they will like and play it on repeat! As we said, in order to do that you will
try different approaches.
• Top-100 tracks by genre: Rock, Rap, Jazz, Electronic, Pop, Blues, Country, Reggae,
New Age. Non-personalized approach.
• Collections: 50 songs about love, 50 songs about war, 50 songs about happiness, 50
songs about loneliness, 50 songs about money. Content-based approach.
• People who listen to this track usually listen to: 10 recommendations for each
track. Collaborative filtering.
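To make the third approach more concrete, one simple form of collaborative filtering is item-to-item co-occurrence: two tracks count as related when many of the same users have played both. The following is only a minimal sketch under stated assumptions, not the required method. The `(user_id, track_id)` input format and the function name `similar_tracks` are hypothetical and chosen for this illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def similar_tracks(plays, top_n=10):
    """Map each track to the tracks most often played by the same users.

    `plays` is an iterable of (user_id, track_id) pairs (hypothetical format).
    """
    # Group the distinct tracks each user has listened to.
    tracks_by_user = defaultdict(set)
    for user_id, track_id in plays:
        tracks_by_user[user_id].add(track_id)

    # Count, for every ordered pair of tracks, how many users played both.
    co_counts = defaultdict(Counter)
    for tracks in tracks_by_user.values():
        for a, b in permutations(sorted(tracks), 2):
            co_counts[a][b] += 1

    # Keep the top_n most co-listened tracks for each track.
    return {
        track: [other for other, _ in counter.most_common(top_n)]
        for track, counter in co_counts.items()
    }
```

Raw co-occurrence counts tend to favor globally popular tracks, so a real solution would probably normalize them (for example with a cosine or Jaccard similarity) before ranking.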
b. Dataset
You are in luck: you do not have to collect the dataset yourself. With the help
of the data science community, you have access to parts of the Million Song Dataset (MSD):
Project 02 – Myspotify Recommender systems. Music recommendations
You can collect additional data if you find it useful.
c. Implementation
In your research process, you can work in Jupyter Notebooks. After that, you
need to organize your code in classes and methods. In the end, you need to
create a Python script that makes the recommendations mentioned above.
Top-250 tracks
It should return a dataframe with the following fields: index number, artist
name, track title, play count. The table should be sorted by play count in
descending order.
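The Top-250 output described above could be produced along the following lines. This is a sketch, assuming the listening data has already been loaded into a pandas dataframe with one row per listening record. The column names `artist_name`, `track_title`, and `play_count` and the function name `top_tracks` are assumptions made for this example, not names taken from the dataset.

```python
import pandas as pd


def top_tracks(plays: pd.DataFrame, n: int = 250) -> pd.DataFrame:
    """Return the n most played tracks, most played first.

    `plays` is assumed to have the columns `artist_name`, `track_title`
    and `play_count` (hypothetical names), one row per listening record.
    """
    totals = (
        plays.groupby(["artist_name", "track_title"], as_index=False)["play_count"]
        .sum()
        .sort_values("play_count", ascending=False)
        .head(n)
        .reset_index(drop=True)
    )
    # Use a 1-based index so it can serve as the "index number" field.
    totals.index = totals.index + 1
    totals.index.name = "index"
    return totals
```

Filtering `plays` by genre before calling the function would give a per-genre ranking with the same shape.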
Collections
Given a keyword (love, war, happiness), it should return a dataframe of 50
tracks with the following fields: index number, artist name, track title,
play count. The table should be sorted by play count in descending order. Try
different approaches to these recommendations:
• baseline: look for the keyword and count its occurrences in a song, filter by
some threshold, and then sort by play count,
• word2vec: look not only for the keyword but also for several similar tokens
found with word2vec,
• classification task: label part of your data and try classification algorithms
that predict, for the rest of the dataset, whether a track belongs to a specific class.
You may find other interesting ideas for improving these recommendations.
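The baseline approach from the list above can be sketched as follows. This assumes that the lyrics are available as plain text in a dataframe column. The column names (`artist_name`, `track_title`, `lyrics`, `play_count`), the function name `keyword_collection`, and the default threshold of 3 are all hypothetical choices for this illustration.

```python
import re

import pandas as pd


def keyword_collection(tracks, keyword, min_count=3, size=50):
    """Baseline collection: tracks whose lyrics mention `keyword` often enough.

    `tracks` is assumed to have the columns `artist_name`, `track_title`,
    `lyrics` and `play_count` (hypothetical names).
    """
    # Count whole-word, case-insensitive occurrences of the keyword.
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", flags=re.IGNORECASE)
    counts = tracks["lyrics"].fillna("").map(lambda text: len(pattern.findall(text)))

    # Keep tracks above the threshold, then rank them by play count.
    columns = ["artist_name", "track_title", "play_count"]
    selected = tracks.loc[counts >= min_count, columns]
    return (
        selected.sort_values("play_count", ascending=False)
        .head(size)
        .reset_index(drop=True)
    )
```

The word2vec variant could reuse the same structure by counting matches for a small set of similar tokens instead of a single keyword.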
To assess your recommendations, use the metric p@k (precision at k). It shows
the percentage of correct recommendations in your list. For example, if you
gave a user 10 tracks to listen to and they liked 3 of them (that is, they
actually listened to them in the test dataset), then p@k equals 30%. Calculate
the average p@k for your recommendations. It should be greater than 10%.
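The p@k computation described above can be sketched as follows. The function names and the dictionary-based input format (user id mapped to a list of recommended tracks, and user id mapped to the tracks played in the test set) are assumptions made for this example.

```python
def precision_at_k(recommended, relevant, k=10):
    """Share of the first k recommended tracks that the user actually played."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(recommended)[:k]
    relevant = set(relevant)
    hits = sum(1 for track in top_k if track in relevant)
    return hits / k


def mean_precision_at_k(recommendations, test_plays, k=10):
    """Average p@k over all users that received recommendations.

    Both arguments are dicts keyed by user id (hypothetical format).
    """
    scores = [
        precision_at_k(tracks, test_plays.get(user, ()), k)
        for user, tracks in recommendations.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

This sketch divides by k even when fewer than k tracks were recommended, so short lists are penalized. Dividing by the number of tracks actually returned is another common convention.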
d. Submission
You need to prepare two files for your repository: the Jupyter Notebook in
which you conducted your research and the Python script. You may also keep
any additional files that your program needs.
Chapter VI
Bonus part
Visit any music streaming service (e.g., Spotify, Apple Music, Yandex Music),
find 3 elements of its recommender system (any recommendation block that you
have not already implemented), and reproduce them in your project.
Chapter VII
Submit your work on your Git repository as usual. Only the work on your repository
will be graded.
Here are the points that your peer-corrector will have to check: