Data Science
pikachuman
March 2024
1 Machine Learning
1.1 TODO
Implement the following models:
• Decision Tree
• basic Python code for the algorithms
• seasonality
• MOSA algorithms
1.2 Metrics
1.2.1 Classification Problems
We need to define:
• Confusion matrix:
• ROC-AUC: the area under the ROC curve. It compares the true-positive rate
(sensitivity) to the false-positive rate (1 − specificity) across different thresholds.
• MSE (Mean Squared Error): MSE(θ̂) = E[(θ̂ − θ)²]. From this formula we can
get the decomposition MSE(θ̂) = Bias(θ̂)² + Var(θ̂).
• RMSE (Root Mean Squared Error): RMSE = √((1/n) ∑ᵢ₌₁ⁿ (ŷᵢ − yᵢ)²)
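A minimal sketch of these metrics in plain Python (the helper names are illustrative, and binary 0/1 labels are assumed for the confusion matrix):

```python
import math

def confusion_counts(y_true, y_pred):
    """Binary confusion matrix as counts (TP, FP, FN, TN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def mse(y_true, y_pred):
    """Mean squared error: average of (ŷᵢ − yᵢ)²."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))
```

For example, `confusion_counts([1, 0, 1, 1], [1, 1, 1, 0])` returns `(2, 1, 1, 0)`: two true positives, one false positive, one false negative.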
1.3 P-value
Let's explain the concept of the p-value in simple terms.
1.3.3 Significance
So, with the chance factor present and distorting the results, we look at how much
the isolated variable still improved athletic performance in our example. We want
to see how significant our variable is.
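To make the chance factor concrete, here is a sketch of a permutation test in plain Python (a toy approach, not the only way to obtain a p-value): it measures how often shuffling the group labels, which destroys any real effect, produces a difference in means at least as large as the observed one.

```python
import random

def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Estimate how often chance alone yields a difference in means
    at least as large as the observed one (two-sided)."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break any real group effect
        diff = abs(sum(pooled[:n_a]) / n_a
                   - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed:
            hits += 1
    return hits / n_perm
```

A small estimated p-value means the observed improvement is unlikely to be chance alone, i.e. the variable is significant.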
1.4 Scientific articles
• * : 0.01 < P < 0.05
• ** : 0.001 < P < 0.01
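The star convention above can be written as a small helper (a sketch covering only the two thresholds listed here):

```python
def significance_stars(p):
    """Map a p-value to the star notation used in scientific articles."""
    if 0.001 < p < 0.01:
        return "**"
    if 0.01 < p < 0.05:
        return "*"
    return ""  # outside the ranges listed above
```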
1.5 Bagging
Bootstrap aggregating, also called bagging, combines two steps:
• We train each model on a different, smaller dataset (row sampling with
replacement).
• Bootstrap means row sampling with replacement.
• Aggregation means taking the results of all the models and combining them
together.
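The two steps above can be sketched in plain Python (a toy 1-D base learner with 0/1 labels; the function names are illustrative, and a real implementation would use full decision trees):

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Bootstrap: row sampling with replacement, same size as the data."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(rows):
    """Toy 1-D base learner: threshold halfway between the class means
    (assumes class 1 tends to have larger feature values)."""
    xs0 = [x for x, y in rows if y == 0]
    xs1 = [x for x, y in rows if y == 1]
    if not xs0 or not xs1:  # degenerate sample containing a single class
        label = rows[0][1]
        return lambda x: label
    thr = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: 1 if x >= thr else 0

def bagging_fit(rows, n_models=25, seed=0):
    """Train each model on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_models)]

def bagging_predict(models, x):
    """Aggregation: majority vote over all the models."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]
```

Each model sees a slightly different dataset, so their errors differ; the majority vote averages those errors out.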
Figure 2: Bagging (bootstrap aggregating) diagram
1.7 ARIMA
1.7.1 AR and MA models
• AR (Auto-Regressive): forecasts a series based solely on the past values of
the series, called lags. AR models are known as long-memory models.
• MA (Moving Average): forecasts a series based solely on the past errors in
the series, called error lags.
• I: stands for Integrated. Essentially, you need to make your data stationary
(its distribution depends on differences in time, not on location in time).
By combining both models (AR and MA) we get the ARIMA(p, d, q) model, where:
• p: number of AR terms
• d: number of first differences
• q: number of MA terms
For example:
Yₜ − Yₜ₋₁ = Wₜ
Wₜ = ω + ϕ₁Wₜ₋₁ + θ₁eₜ₋₁ + eₜ
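A minimal sketch of the I and AR steps in plain Python (an ARIMA(1, 1, 0) without the intercept ω and without the MA term; in practice a library such as statsmodels would fit the full model):

```python
def difference(y):
    """The I step: first differences Wₜ = Yₜ − Yₜ₋₁."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

def fit_ar1(w):
    """Least-squares estimate of ϕ₁ in Wₜ = ϕ₁·Wₜ₋₁ + eₜ."""
    num = sum(w[t] * w[t - 1] for t in range(1, len(w)))
    den = sum(w[t - 1] ** 2 for t in range(1, len(w)))
    return num / den
```

For example, `difference([1, 3, 6, 10])` returns `[2, 3, 4]`; fitting ϕ₁ on the differenced series then captures how each change depends on the previous one.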
Then we compute the probability of belonging to each group:
• Vanilla
• Strawberry
• Chocolate
Independent variables:
• Age
• Gender
• etc.
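One common way to turn a score per group (Vanilla, Strawberry, Chocolate) into probabilities that sum to 1 is the softmax function, sketched here (an assumption on my part, since the text does not fix a specific method):

```python
import math

def softmax(scores):
    """Turn one score per group into probabilities that sum to 1."""
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With three equal scores, `softmax([1.0, 1.0, 1.0])` assigns each group the same probability, 1/3.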
Figure 5: Ice Cream Types
Figure 7: classification example with a decision tree (classification tree)
But it can be difficult to build such a decision tree by hand, and this is where the
machine learning part comes into play.
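What the learning algorithm automates is the choice of each split: at every node it searches for the threshold that best separates the classes. A sketch of one such node split using Gini impurity (a single numeric feature; names are illustrative):

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 − Σ pₖ²."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows):
    """One node of a classification tree. rows are (feature, label) pairs;
    try every observed value as a threshold and keep the split with the
    lowest weighted Gini impurity."""
    best_thr, best_score = None, float("inf")
    for thr, _ in rows:
        left = [y for x, y in rows if x < thr]
        right = [y for x, y in rows if x >= thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score
```

For example, `best_split([(1, "A"), (2, "A"), (8, "B"), (9, "B")])` returns `(8, 0.0)`: splitting at 8 puts both "A" rows on one side and both "B" rows on the other, for zero impurity. A full tree repeats this recursively on each side.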
2 Deep Learning
3 Business