Laboratorio 9


This is a binary classification project whose goal is to predict whether an NBA player's career will last at least 5 years. The data contain the career performance metrics of NBA players, from which the predictions are to be made.

Attribute information
Name: Player name
GP: Games played
MIN: Minutes played
PTS: Points per game
FGM: Field goals made
FGA: Field goal attempts
FG%: Field goal percentage
3P Made: 3-pointers made
3PA: 3-point attempts
3P%: 3-point percentage
FTM: Free throws made
FTA: Free throw attempts
FT%: Free throw percentage
OREB: Offensive rebounds
DREB: Defensive rebounds
REB: Rebounds
AST: Assists
STL: Steals
BLK: Blocks
TOV: Turnovers
TARGET_5Yrs: Target variable (1 if career length >= 5 years, 0 if < 5)


In [3]: import numpy as np
        import pandas as pd

        # Load the dataset (the filename is truncated in the original export)
        df_arrest = pd.read_csv('C:/Users/Juan Carlos/Desktop/Python/9NA CLASE/nba_l')

        df_arrest["TARGET_5Yrs"] = df_arrest["TARGET_5Yrs"].astype('int64')
        df_arrest.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1329 entries, 0 to 1328
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name 1329 non-null object
1 GP 1329 non-null int64
2 MIN 1329 non-null float64
3 PTS 1329 non-null float64
4 FGM 1329 non-null float64
5 FGA 1329 non-null float64
6 FG% 1329 non-null float64
7 3P Made 1329 non-null float64
8 3PA 1329 non-null float64
9 3P% 1329 non-null float64
10 FTM 1329 non-null float64
11 FTA 1329 non-null float64
12 FT% 1329 non-null float64
13 OREB 1329 non-null float64
14 DREB 1329 non-null float64
15 REB 1329 non-null float64
16 AST 1329 non-null float64
17 STL 1329 non-null float64
18 BLK 1329 non-null float64
19 TOV 1329 non-null float64
20 TARGET_5Yrs 1329 non-null int64
dtypes: float64(18), int64(2), object(1)
memory usage: 218.2+ KB

Activity

1. Standardize only the continuous variables
2. PCA plot
3. Kaiser criterion, using the explained variance of the first 3 components (the Kaiser criterion keeps the components whose eigenvalues exceed 1, i.e., those that explain more variance than a single standardized variable)

In [4]: from sklearn.preprocessing import StandardScaler

        # Select the continuous variables (the list was truncated in the export;
        # reconstructed from the df_arrest.info() output above)
        continuas = ['GP', 'MIN', 'PTS', 'FGM', 'FGA', 'FG%', '3P Made', '3PA', '3P%',
                     'FTM', 'FTA', 'FT%', 'OREB', 'DREB', 'REB', 'AST', 'STL', 'BLK', 'TOV']
        x = df_arrest.loc[:, continuas].values
        # Separate the target variable
        y = df_arrest['TARGET_5Yrs'].tolist()
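
As a quick sanity check (a sketch; this cell is not part of the original notebook), the feature matrix should have 1329 rows and 19 columns, matching the df_arrest.info() output above:

In [ ]: x.shape   # expected: (1329, 19)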

Standardize only the continuous variables


In [11]: from sklearn.model_selection import train_test_split

In [12]: # 70/30 split, stratified so that both splits keep the class balance
         # of TARGET_5Yrs
         X_train, X_test, y_train, y_test = \
             train_test_split(x,
                              y,
                              test_size=0.3,
                              stratify=y,
                              random_state=0)

In [9]: from sklearn.preprocessing import StandardScaler

In [10]: sc = StandardScaler()

In [13]: # Fit the scaler on the training data only, and apply the same transform
         # to the test data so that no test information leaks into the scaling
         X_train_std = sc.fit_transform(X_train)
         X_test_std = sc.transform(X_test)

In [14]: # Rebuild a DataFrame from the standardized training data (the column
         # list was truncated in the export; reconstructed as above)
         df_cont = pd.DataFrame(X_train_std,
                                columns=['GP', 'MIN', 'PTS', 'FGM', 'FGA', 'FG%',
                                         '3P Made', '3PA', '3P%', 'FTM', 'FTA', 'FT%',
                                         'OREB', 'DREB', 'REB', 'AST', 'STL', 'BLK', 'TOV'])
         df_cont.head()

Out[14]: (first 9 of 19 columns; the remainder were cut off in the export)

          GP       MIN       PTS       FGM       FGA       FG%   3P Made       3PA       3P%
0  -0.719662 -0.995681 -1.118599 -1.139285 -0.943118 -2.419002 -0.652122 -0.739133 -1.194836
1  -2.848945 -0.842179 -1.005949 -0.906612 -1.134714  2.561225 -0.652122 -0.739133 -1.194836
2   0.143561 -0.440712 -0.487761 -0.441266 -0.477813  0.394503 -0.652122 -0.739133 -1.194836
3  -0.489469 -0.606022 -0.915829 -0.906612 -1.079972  1.494033 -0.652122 -0.739133 -1.194836
4  -0.662114 -1.007489 -1.118599 -1.139285 -1.079972 -1.853067 -0.390259 -0.176618  0.133145

PCA plot


In [16]: import numpy as np
         import math as math
         import scipy.stats as stats

         df_corr = df_cont.corr(method="pearson")
         df_corr

Out[16]: [19 × 19 Pearson correlation matrix of the standardized continuous
variables. The row labels and most of the columns were lost in the export;
the visible slice shows several off-diagonal correlations above 0.98 (made
vs. attempted shots), i.e., strong collinearity among the features.]

In [17]: import numpy as np
         import matplotlib.pyplot as plt

         # Covariance matrix of the standardized training features
         cov_mat = np.cov(X_train_std.T)

In [18]: eigen_vals, eigen_vecs = np.linalg.eig(cov_mat)


In [19]: print('\nEigenvalues \n%s' % eigen_vals)

Eigenvalues
[9.62886703e+00 3.88752049e+00 1.14785067e+00 8.82468378e-01
7.41793351e-01 5.71211935e-01 5.04693504e-01 4.58072858e-01
4.23206166e-01 2.50445431e-01 2.36815437e-01 1.01482688e-01
1.05137644e-01 5.33749414e-02 1.43320136e-02 7.94217812e-03
4.71724574e-03 3.49739946e-04 1.70394948e-04]
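
Under the Kaiser criterion, only components with an eigenvalue above 1 are retained; from the output above that is the first three (9.63, 3.89 and 1.15). A minimal sketch of that selection, reusing the eigen_vals array computed above (this cell is not part of the original notebook):

In [ ]: # Kaiser criterion: keep components whose eigenvalue exceeds 1
        n_kaiser = int((eigen_vals > 1).sum())
        print('Components retained by the Kaiser criterion:', n_kaiser)  # 3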

In [20]: from sklearn.decomposition import PCA

In [21]: pca = PCA()
         X_train_pca = pca.fit_transform(X_train_std)
         pca.explained_variance_ratio_

Out[21]: array([5.06237548e-01, 2.04386335e-01, 6.03482329e-02, 4.63957625e-02,
                3.89997749e-02, 3.00314594e-02, 2.65342538e-02, 2.40831740e-02,
                2.22500582e-02, 1.31671650e-02, 1.24505682e-02, 5.52761017e-03,
                5.33545087e-03, 2.80618679e-03, 7.53505411e-04, 4.17559902e-04,
                2.48009128e-04, 1.83875727e-05, 8.95851200e-06])
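
These ratios are simply the eigenvalues from the manual decomposition, sorted in descending order and normalized by their sum. A hedged cross-check (this cell is not part of the original notebook):

In [ ]: # Should reproduce explained_variance_ratio_ (np.linalg.eig does not
        # sort its output, so sort the eigenvalues first)
        np.sort(eigen_vals)[::-1] / eigen_vals.sum()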

In [23]: np.cumsum(pca.explained_variance_ratio_)[5]

Out[23]: 0.886399112348117
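
Note that indexing is zero-based, so the [5] above gives the cumulative variance of the first six components (≈ 0.886). For the first three components, which is what the activity asks about, the index would be 2; from the ratios printed above this comes to roughly 0.771:

In [ ]: np.cumsum(pca.explained_variance_ratio_)[2]   # ≈ 0.771 (first 3 components)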

In [25]: import matplotlib.pyplot as plt

         # Scree plot: individual explained variance as bars, cumulative as a step
         # curve (the tail ends of the two plotting calls were truncated in the export)
         plt.bar(range(1, 20), pca.explained_variance_ratio_, alpha=0.5, align='center')
         plt.step(range(1, 20), np.cumsum(pca.explained_variance_ratio_), where='mid')
         plt.ylabel('Explained variance ratio')
         plt.xlabel('Principal components')

         plt.show()
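
The figure itself was not preserved in the export. From the ratios above, the bars would show the first component alone explaining about 51% of the variance, with the cumulative step curve passing roughly 77% at the third component.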

In [26]: pca = PCA(n_components=3)
         X_std = pca.fit_transform(X_train_std)
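
Keeping three components is consistent with the Kaiser criterion applied above: only the first three eigenvalues exceed 1.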


In [28]: df_x = pd.DataFrame(X_std)
         df_x.columns = ['PC1', 'PC2', 'PC3']
         df_x.head()

Out[28]:
         PC1       PC2       PC3
0  -3.391530 -0.864992 -0.320351
1  -2.910358 -3.291920 -1.126430
2  -1.007929 -1.970897  0.059549
3  -1.918537 -2.614837 -0.269469
4  -3.354599  0.922713  0.362723

In [30]: df_y = pd.DataFrame(y_train)
         df_y.columns = ['TARGET_5Yrs']
         df_y.head()

Out[30]:
   TARGET_5Yrs
0            0
1            0
2            1
3            0
4            1

In [31]: df_rd = pd.concat([df_x, df_y], axis=1)
         df_rd.head(10)

Out[31]:
         PC1       PC2       PC3  TARGET_5Yrs
0  -3.391530 -0.864992 -0.320351            0
1  -2.910358 -3.291920 -1.126430            0
2  -1.007929 -1.970897  0.059549            1
3  -1.918537 -2.614837 -0.269469            0
4  -3.354599  0.922713  0.362723            1
5  -1.726465  2.121716 -1.083186            0
6   0.030791 -2.792813 -0.359631            1
7  -4.168896 -0.926062 -0.169454            1
8  -0.659899 -2.469046  0.034734            1
9   0.788681 -2.029191 -0.268184            1
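
The notebook ends here without the PCA plot the activity asks for. A minimal sketch of one, coloring the plane of the first two components by the target class using the df_rd frame built above (this cell is not part of the original notebook):

In [ ]: import matplotlib.pyplot as plt

        # Scatter of the training data on the first two principal components,
        # one color per class of TARGET_5Yrs
        for label, color in [(0, 'tab:red'), (1, 'tab:blue')]:
            subset = df_rd[df_rd['TARGET_5Yrs'] == label]
            plt.scatter(subset['PC1'], subset['PC2'], color=color, alpha=0.5,
                        label=f'TARGET_5Yrs = {label}')
        plt.xlabel('PC1')
        plt.ylabel('PC2')
        plt.legend()
        plt.show()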
