Sklearn Tutorial: DNN On Boston Data
This tutorial closely follows two other good tutorials and merges elements from both:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/skflow/boston.py
http://bigdataexaminer.com/uncategorized/how-to-run-linear-regression-in-python-scikit-learn/
D. Thiebaut
August 2016.
Imports, first!
In [170]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from sklearn import cross_validation
from sklearn import metrics
from sklearn import preprocessing
import tensorflow as tf
#from tensorflow.contrib import learn
from tensorflow.contrib import learn
import pandas as pd
In [173]: boston.data.shape
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO'
'B' 'LSTAT']
Notes
------
Data Set Characteristics:
Out[177]:
         0   1     2  3      4      5     6       7  8    9    10      11    12
0  0.00632  18  2.31  0  0.538  6.575  65.2  4.0900  1  296  15.3  396.90  4.98
1  0.02731   0  7.07  0  0.469  6.421  78.9  4.9671  2  242  17.8  396.90  9.14
2  0.02729   0  7.07  0  0.469  7.185  61.1  4.9671  2  242  17.8  392.83  4.03
3  0.03237   0  2.18  0  0.458  6.998  45.8  6.0622  3  222  18.7  394.63  2.94
4  0.06905   0  2.18  0  0.458  7.147  54.2  6.0622  3  222  18.7  396.90  5.33
Out[178]:
      CRIM  ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD  TAX  PTRATIO       B  LSTAT
0  0.00632  18   2.31     0  0.538  6.575  65.2  4.0900    1  296     15.3  396.90   4.98
1  0.02731   0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8  396.90   9.14
2  0.02729   0   7.07     0  0.469  7.185  61.1  4.9671    2  242     17.8  392.83   4.03
3  0.03237   0   2.18     0  0.458  6.998  45.8  6.0622    3  222     18.7  394.63   2.94
4  0.06905   0   2.18     0  0.458  7.147  54.2  6.0622    3  222     18.7  396.90   5.33
Number of features = 13
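The two tables above differ only in their column labels: integers first, feature names second. The cells that produced them are missing from this extract, so the following is only a minimal sketch of how such a relabeling can be done with pandas. It assumes the data arrives as a plain NumPy array; the `sample` array is a stand-in built from the first two rows shown above, not the full data set.

```python
import numpy as np
import pandas as pd

# Feature names as printed earlier in this notebook.
feature_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']

# Stand-in for the full data array: the first two rows of the table above.
sample = np.array([
    [0.00632, 18, 2.31, 0, 0.538, 6.575, 65.2, 4.0900, 1, 296, 15.3, 396.90, 4.98],
    [0.02731,  0, 7.07, 0, 0.469, 6.421, 78.9, 4.9671, 2, 242, 17.8, 396.90, 9.14],
])

bostonDF = pd.DataFrame(sample)    # columns are labeled 0..12 by default
bostonDF.columns = feature_names   # relabel the columns with the feature names

print(bostonDF.head())
print("Number of features =", len(bostonDF.columns))
```

Named columns make later selections such as `bostonDF['RM']` readable without having to remember column positions.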
In [180]: X = bostonDF
y = boston.target
print( "shape of X = ", X.shape, " shape of y = ", y.shape )
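The cell that splits `X` and `y` into training and test sets is missing from this extract. A minimal sketch of that step follows, under stated assumptions: the `test_size=0.2` and `random_state=42` values are illustrative, and random arrays with the Boston data's shape (506 rows, 13 features) stand in for the real `X` and `y`. The `sklearn.cross_validation` module imported at the top of this notebook was deprecated in scikit-learn 0.18 and removed in 0.20, so the sketch tries the newer `sklearn.model_selection` first.

```python
import numpy as np

# Prefer the current module; fall back to the 2016-era one on old installs.
try:
    from sklearn.model_selection import train_test_split
except ImportError:  # scikit-learn < 0.18
    from sklearn.cross_validation import train_test_split

# Placeholder data with the same shape as the Boston set (assumption:
# substitute the real X and y from the cell above).
rng = np.random.RandomState(0)
X = rng.rand(506, 13)
y = rng.rand(506)

# Hold out 20% of the rows for testing (illustrative values).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

print("train:", X_train.shape, y_train.shape)
print("test: ", X_test.shape, y_test.shape)
```

Fixing `random_state` makes the split reproducible, which keeps scores comparable when the regressor's settings are changed between runs.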
In [183]: y_train
22. , 7.2, 20.4, 13.8, 13. , 18.4, 23.1, 21.2, 23.1,
23.5, 50. , 26.6, 22.2, 50. , 8.3, 23.3, 21.7, 18.9,
18.4, 17.4, 13.4, 12.1, 26.6, 21.7, 28.4, 20.5, 22. ,
13.9, 11.3, 29.9, 26.6, 10.5, 23.2, 24.4, 46. , 21.9,
7.5, 36.2, 44. , 17.8, 27.5, 37.6, 14.1, 28.1, 10.2,
19.1, 43.8, 27.9, 25. , 16. , 16.6, 13.2, 50. , 22.2,
32.9, 15.2, 14.8, 13.8, 24.3, 33.8, 22.3, 50. , 9.5,
13.3, 22.2, 18.1, 18. , 25. , 16.5, 23. , 20.1, 33. ,
24.8, 18.2, 13.1, 34.9, 10.2, 19.9, 27.9, 23.3, 35.1,
12.8, 22. , 18.5, 25.1, 22.5, 22.4, 28.6, 19.5, 24.8,
24.5, 21.4, 33.1, 22.9, 20.7, 24.1, 50. , 24.7, 28.7,
7.2, 37. , 20.3, 30.1, 19.5, 23.4, 11.5, 21.6, 14.9,
15.2, 19.4, 8.4, 28. , 22.6, 13.5, 14.5, 31. , 10.9,
21.9, 22. , 19. , 21.4, 25. , 17.5, 36.5, 20.1, 20.4,
16.2, 23.6, 7.4, 35.2, 50. , 19.3, 21.2, 15.6, 33.4,
19.1, 21. , 23.7, 18.9, 16.8, 19.7, 17.7, 22.6, 11.8,
34.9, 20.6, 20.2, 32. , 22.3, 23.3, 14.4, 31.2, 24. ,
29.6, 19.6, 21.6, 20. , 27. , 33.2, 15.4, 30.5, 7.2,
23.9, 16.3, 23.9, 50. , 22.8, 15.4, 19.2, 19.6, 22.6,
33.2, 50. , 22.2, 14.9, 19.8, 23.7, 19. , 20.3, 11.9,
13.6, 29.8, 21.7, 19.5, 21.1, 24.5, 13.4, 18.6])
Out[184]: DNNRegressor()
Predict and score
MSE: 14.098925
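The score printed above is a mean squared error: the average of the squared differences between predicted and true target values. The cell that produced it is not part of this extract, so the following is only a minimal sketch of the computation with NumPy. The helper name `mse` and the three demo values are made up for illustration and are not the notebook's predictions. `sklearn.metrics.mean_squared_error` computes the same quantity.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical values: errors of 1, -2 and 0 give (1 + 4 + 0) / 3.
y_true_demo = [22.0, 7.2, 20.4]
y_pred_demo = [21.0, 9.2, 20.4]

print("MSE: %f" % mse(y_true_demo, y_pred_demo))  # MSE: 1.666667
```

Since the Boston targets are median home values in thousands of dollars, the reported MSE of 14.098925 corresponds to a root-mean-square error of roughly 3.75, that is, about $3,750.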
In [187]: '''
regressor = learn.DNNRegressor( feature_columns=None,
                                hidden_units=[10, 10],
                                model_dir = '/tmp/tf' )
'''
Out[188]: DNNRegressor()
Out[191]: DNNRegressor()
Plotting Residuals
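In [194]: # Residuals (predicted - actual) plotted against the predicted values:
# training set in translucent blue, test set in green.
plt.scatter( y_train_predicted, y_train_predicted - y_train,
             c='b', s=30, alpha=0.4 )
plt.scatter( y_test_predicted, y_test_predicted - y_test,
             c='g', s=30 )
plt.hlines( y=0, xmin=-5, xmax=55 )   # zero-error reference line
plt.title( "Residuals" )
plt.ylabel( "Residuals" )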