Laboratorio 9


This is a binary classification project whose goal is to predict whether an NBA player's career will last at least 5 years. The data contain the career performance metrics of NBA players, from which the predictions are to be made.

Attribute information
Name: Player name
GP: Games played
MIN: Minutes played
PTS: Points per game
FGM: Field goals made
FGA: Field goal attempts
FG%: Field goal percentage
3P Made: 3-pointers made
3PA: 3-point attempts
3P%: 3-point percentage
FTM: Free throws made
FTA: Free throw attempts
FT%: Free throw percentage
OREB: Offensive rebounds
DREB: Defensive rebounds
REB: Rebounds
AST: Assists
STL: Steals
BLK: Blocks
TOV: Turnovers
TARGET_5Yrs: Target variable (1 if career length >= 5 years, 0 if < 5)


In [3]: import numpy as np
        import pandas as pd

        # Load the dataset (the filename is truncated in the original export)
        df_arrest = pd.read_csv('C:/Users/Juan Carlos/Desktop/Python/9NA CLASE/nba_l')

        df_arrest["TARGET_5Yrs"] = df_arrest["TARGET_5Yrs"].astype('int64')
        df_arrest.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1329 entries, 0 to 1328
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name 1329 non-null object
1 GP 1329 non-null int64
2 MIN 1329 non-null float64
3 PTS 1329 non-null float64
4 FGM 1329 non-null float64
5 FGA 1329 non-null float64
6 FG% 1329 non-null float64
7 3P Made 1329 non-null float64
8 3PA 1329 non-null float64
9 3P% 1329 non-null float64
10 FTM 1329 non-null float64
11 FTA 1329 non-null float64
12 FT% 1329 non-null float64
13 OREB 1329 non-null float64
14 DREB 1329 non-null float64
15 REB 1329 non-null float64
16 AST 1329 non-null float64
17 STL 1329 non-null float64
18 BLK 1329 non-null float64
19 TOV 1329 non-null float64
20 TARGET_5Yrs 1329 non-null int64
dtypes: float64(18), int64(2), object(1)
memory usage: 218.2+ KB

Activity

1. Standardize only the continuous variables
2. PCA plot
3. Kaiser criterion, using the explained variance of the first 3 components (the Kaiser criterion keeps the components whose eigenvalues exceed 1, i.e., those that explain more variance than a single standardized variable)

In [4]: from sklearn.preprocessing import StandardScaler

        # Select the continuous variables (the list was truncated in the export;
        # reconstructed from the df_arrest.info() output above)
        continuas = ['GP', 'MIN', 'PTS', 'FGM', 'FGA', 'FG%', '3P Made', '3PA', '3P%',
                     'FTM', 'FTA', 'FT%', 'OREB', 'DREB', 'REB', 'AST', 'STL', 'BLK', 'TOV']
        x = df_arrest.loc[:, continuas].values
        # Separate the target variable
        y = df_arrest['TARGET_5Yrs'].tolist()
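
As a quick sanity check (a sketch; this cell is not part of the original notebook), the feature matrix should have 1329 rows and 19 columns, matching the df_arrest.info() output above:

In [ ]: x.shape   # expected: (1329, 19)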

Standardize only the continuous variables


In [11]: from sklearn.model_selection import train_test_split

In [12]: # 70/30 split, stratified so that both splits keep the class balance
         # of TARGET_5Yrs
         X_train, X_test, y_train, y_test = \
             train_test_split(x,
                              y,
                              test_size=0.3,
                              stratify=y,
                              random_state=0)

In [9]: from sklearn.preprocessing import StandardScaler

In [10]: sc = StandardScaler()

In [13]: # Fit the scaler on the training data only, and apply the same transform
         # to the test data so that no test information leaks into the scaling
         X_train_std = sc.fit_transform(X_train)
         X_test_std = sc.transform(X_test)

In [14]: # Rebuild a DataFrame from the standardized training data (the column
         # list was truncated in the export; reconstructed as above)
         df_cont = pd.DataFrame(X_train_std,
                                columns=['GP', 'MIN', 'PTS', 'FGM', 'FGA', 'FG%',
                                         '3P Made', '3PA', '3P%', 'FTM', 'FTA', 'FT%',
                                         'OREB', 'DREB', 'REB', 'AST', 'STL', 'BLK', 'TOV'])
         df_cont.head()

Out[14]: (first 9 of 19 columns; the remainder were cut off in the export)

          GP       MIN       PTS       FGM       FGA       FG%   3P Made       3PA       3P%
0  -0.719662 -0.995681 -1.118599 -1.139285 -0.943118 -2.419002 -0.652122 -0.739133 -1.194836
1  -2.848945 -0.842179 -1.005949 -0.906612 -1.134714  2.561225 -0.652122 -0.739133 -1.194836
2   0.143561 -0.440712 -0.487761 -0.441266 -0.477813  0.394503 -0.652122 -0.739133 -1.194836
3  -0.489469 -0.606022 -0.915829 -0.906612 -1.079972  1.494033 -0.652122 -0.739133 -1.194836
4  -0.662114 -1.007489 -1.118599 -1.139285 -1.079972 -1.853067 -0.390259 -0.176618  0.133145

PCA plot


In [16]: import numpy as np
         import math as math
         import scipy.stats as stats

         df_corr = df_cont.corr(method="pearson")
         df_corr

Out[16]: [19 × 19 Pearson correlation matrix of the standardized continuous
variables. The row labels and most of the columns were lost in the export;
the visible slice shows several off-diagonal correlations above 0.98 (made
vs. attempted shots), i.e., strong collinearity among the features.]

In [17]: import numpy as np
         import matplotlib.pyplot as plt

         # Covariance matrix of the standardized training features
         cov_mat = np.cov(X_train_std.T)

In [18]: eigen_vals, eigen_vecs = np.linalg.eig(cov_mat)


In [19]: print('\nEigenvalues \n%s' % eigen_vals)

Eigenvalues
[9.62886703e+00 3.88752049e+00 1.14785067e+00 8.82468378e-01
7.41793351e-01 5.71211935e-01 5.04693504e-01 4.58072858e-01
4.23206166e-01 2.50445431e-01 2.36815437e-01 1.01482688e-01
1.05137644e-01 5.33749414e-02 1.43320136e-02 7.94217812e-03
4.71724574e-03 3.49739946e-04 1.70394948e-04]
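
Under the Kaiser criterion, only components with an eigenvalue above 1 are retained; from the output above that is the first three (9.63, 3.89 and 1.15). A minimal sketch of that selection, reusing the eigen_vals array computed above (this cell is not part of the original notebook):

In [ ]: # Kaiser criterion: keep components whose eigenvalue exceeds 1
        n_kaiser = int((eigen_vals > 1).sum())
        print('Components retained by the Kaiser criterion:', n_kaiser)  # 3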

In [20]: from sklearn.decomposition import PCA

In [21]: pca = PCA()
         X_train_pca = pca.fit_transform(X_train_std)
         pca.explained_variance_ratio_

Out[21]: array([5.06237548e-01, 2.04386335e-01, 6.03482329e-02, 4.63957625e-02,
                3.89997749e-02, 3.00314594e-02, 2.65342538e-02, 2.40831740e-02,
                2.22500582e-02, 1.31671650e-02, 1.24505682e-02, 5.52761017e-03,
                5.33545087e-03, 2.80618679e-03, 7.53505411e-04, 4.17559902e-04,
                2.48009128e-04, 1.83875727e-05, 8.95851200e-06])
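
These ratios are simply the eigenvalues from the manual decomposition, sorted in descending order and normalized by their sum. A hedged cross-check (this cell is not part of the original notebook):

In [ ]: # Should reproduce explained_variance_ratio_ (np.linalg.eig does not
        # sort its output, so sort the eigenvalues first)
        np.sort(eigen_vals)[::-1] / eigen_vals.sum()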

In [23]: np.cumsum(pca.explained_variance_ratio_)[5]

Out[23]: 0.886399112348117
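
Note that indexing is zero-based, so the [5] above gives the cumulative variance of the first six components (≈ 0.886). For the first three components, which is what the activity asks about, the index would be 2; from the ratios printed above this comes to roughly 0.771:

In [ ]: np.cumsum(pca.explained_variance_ratio_)[2]   # ≈ 0.771 (first 3 components)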

In [25]: import matplotlib.pyplot as plt

         # Scree plot: individual explained variance as bars, cumulative as a step
         # curve (the tail ends of the two plotting calls were truncated in the export)
         plt.bar(range(1, 20), pca.explained_variance_ratio_, alpha=0.5, align='center')
         plt.step(range(1, 20), np.cumsum(pca.explained_variance_ratio_), where='mid')
         plt.ylabel('Explained variance ratio')
         plt.xlabel('Principal components')

         plt.show()
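
The figure itself was not preserved in the export. From the ratios above, the bars would show the first component alone explaining about 51% of the variance, with the cumulative step curve passing roughly 77% at the third component.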

In [26]: pca = PCA(n_components=3)
         X_std = pca.fit_transform(X_train_std)
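
Keeping three components is consistent with the Kaiser criterion applied above: only the first three eigenvalues exceed 1.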


In [28]: df_x = pd.DataFrame(X_std)
         df_x.columns = ['PC1', 'PC2', 'PC3']
         df_x.head()

Out[28]:
         PC1       PC2       PC3
0  -3.391530 -0.864992 -0.320351
1  -2.910358 -3.291920 -1.126430
2  -1.007929 -1.970897  0.059549
3  -1.918537 -2.614837 -0.269469
4  -3.354599  0.922713  0.362723

In [30]: df_y = pd.DataFrame(y_train)
         df_y.columns = ['TARGET_5Yrs']
         df_y.head()

Out[30]:
   TARGET_5Yrs
0            0
1            0
2            1
3            0
4            1

In [31]: df_rd = pd.concat([df_x, df_y], axis=1)
         df_rd.head(10)

Out[31]:
         PC1       PC2       PC3  TARGET_5Yrs
0  -3.391530 -0.864992 -0.320351            0
1  -2.910358 -3.291920 -1.126430            0
2  -1.007929 -1.970897  0.059549            1
3  -1.918537 -2.614837 -0.269469            0
4  -3.354599  0.922713  0.362723            1
5  -1.726465  2.121716 -1.083186            0
6   0.030791 -2.792813 -0.359631            1
7  -4.168896 -0.926062 -0.169454            1
8  -0.659899 -2.469046  0.034734            1
9   0.788681 -2.029191 -0.268184            1
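
The notebook ends here without the PCA plot the activity asks for. A minimal sketch of one, coloring the plane of the first two components by the target class using the df_rd frame built above (this cell is not part of the original notebook):

In [ ]: import matplotlib.pyplot as plt

        # Scatter of the training data on the first two principal components,
        # one color per class of TARGET_5Yrs
        for label, color in [(0, 'tab:red'), (1, 'tab:blue')]:
            subset = df_rd[df_rd['TARGET_5Yrs'] == label]
            plt.scatter(subset['PC1'], subset['PC2'], color=color, alpha=0.5,
                        label=f'TARGET_5Yrs = {label}')
        plt.xlabel('PC1')
        plt.ylabel('PC2')
        plt.legend()
        plt.show()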
