OLSLinear Regquestion


{

"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Welcome to the first Hands On linear regression.\n",
"\n",
"In this exercise , you will try out simple linaer regression using stats model
that you have learnt in the course. We have created this Python Notebook with all
the necessary things needed for completing this exercise. \n",
"\n",
"To run the code in each cell click on the cell and press **shift + enter** "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"**Run the below cell to import the data and view first five rows of dataset**\
n",
"\n",
"- In this hands on we are using boston housing price dataset.\n",
"- The data importing part has been done for you."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX \\\
n",
"0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 \n",
"1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 \n",
"2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 \n",
"3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 \n",
"4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 \n",
"\n",
" PTRATIO B LSTAT target \n",
"0 15.3 396.90 4.98 24.0 \n",
"1 17.8 396.90 9.14 21.6 \n",
"2 17.8 392.83 4.03 34.7 \n",
"3 18.7 394.63 2.94 33.4 \n",
"4 18.7 396.90 5.33 36.2 \n"
]
}
],
"source": [
"from sklearn.datasets import load_boston\n",
"import pandas as pd\n",
"boston = load_boston()\n",
"dataset = pd.DataFrame(data=boston.data, columns=boston.feature_names)\n",
"dataset['target'] = boston.target\n",
"print(dataset.head())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Follow the steps in sequence to extract features and target**\n",
"- From the above output you can see the various attributes of the dataset.\n",
"- The 'target' column has the dependent values(housing prices) and rest of the
colums are the independent values that influence the target values\n",
"- Lets find the relation between 'housing price' and 'average number of rooms
per dwelling' using stats model\n",
"- Assign the values of column \"RM\"(average number of rooms per dwelling) to
variable X\n",
"- Similarly assign the values of 'target'(housing price) column to variable Y\
n",
"- sample code: values = data_frame['attribute_name']"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"###Start code here\n",
"X = dataset['RM']\n",
"Y = dataset['target']\n",
"###End code(approx 2 lines)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Import package**\n",
"- import statsmodels.api as sm"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"###Start code here\n",
"import statsmodels.api as sm\n",
"###End code(approx 1 line)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Follow the steps in sequence to initialise and fit the model**\n",
"- Initialise the OLS model by passing target(Y) and attribute(X).Assign the
model to variable 'statsModel'\n",
"- Fit the model and assign it to variable 'fittedModel'\n",
"- Sample code for initialization: sm.OLS(target, attribute)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"###Start code here\n",
"\n",
"statsModel = sm.OLS(X, Y)\n",
"fittedModel = statsModel.fit()\n",
"###End code(approx 2 lines)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Print Summary**\n",
"- Print the summary of fittedModel using the summary() function"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" OLS Regression Results
\n",

"==================================================================================
=====\n",
"Dep. Variable: RM R-squared (uncentered):
0.901\n",
"Model: OLS Adj. R-squared (uncentered):
0.901\n",
"Method: Least Squares F-statistic:
4615.\n",
"Date: Fri, 26 Aug 2022 Prob (F-statistic):
3.74e-256\n",
"Time: 04:54:16 Log-Likelihood:
-1065.2\n",
"No. Observations: 506 AIC:
2132.\n",
"Df Residuals: 505 BIC:
2137.\n",
"Df Model: 1
\n",
"Covariance Type: nonrobust
\n",

"==============================================================================\n",
" coef std err t P>|t| [0.025
0.975]\n",

"------------------------------------------------------------------------------\n",
"target 0.2467 0.004 67.930 0.000 0.240
0.254\n",

"==============================================================================\n",
"Omnibus: 82.770 Durbin-Watson:
0.430\n",
"Prob(Omnibus): 0.000 Jarque-Bera (JB):
157.829\n",
"Skew: -0.931 Prob(JB): 5.34e-
35\n",
"Kurtosis: 5.004 Cond. No.
1.00\n",

"==============================================================================\n",
"\n",
"Warnings:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is
correctly specified.\n"
]
}
],
"source": [
"###Start code here\n",
"print(fittedModel.summary())\n",
"###End code(approx 1 line)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Extract r_squared value**\n",
"- From the summary report note down the R-squared value and assign it to
variable 'r_squared' in the below cell after rounding it off to 2-decimal places"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"###Start code here\n",
"r_squared = 0.90\n",
"###End code(approx 1 line)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### Run the below cell without modifying to save your answers\n"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a894124cc6d5c5c71afe060d5dde0762\n"
]
}
],
"source": [
"import hashlib\n",
"import pickle\n",
"def gethex(ovalue):\n",
" hexresult=hashlib.md5(str(ovalue).encode())\n",
" return hexresult.hexdigest()\n",
"def pickle_ans1(value):\n",
" hexresult=gethex(value)\n",
" with open('ans/output1.pkl', 'wb') as file:\n",
" hexresult=gethex(value)\n",
" print(hexresult)\n",
" pickle.dump(hexresult,file)\n",
"pickle_ans1(r_squared)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
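
The summary in this notebook reports "R-squared (uncentered)" because `sm.OLS` fits only the columns it is given, and no constant column was added, so the fitted line is forced through the origin. As a minimal sketch of what that statistic measures, the pure-Python snippet below fits y = b * x on made-up toy values (not the Boston data) and computes the uncentered R-squared. The helper name `fit_through_origin` and the toy lists are assumptions for illustration and are not part of statsmodels.

```python
def fit_through_origin(y, x):
    """Least-squares fit of y = b * x with no intercept.

    Returns the slope b and the uncentered R-squared,
    1 - sum(residual^2) / sum(y^2).
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    slope = sxy / sxx
    ssr = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, 1.0 - ssr / syy


# Made-up toy values standing in for RM (rooms) and target (price).
rooms = [5.0, 6.0, 6.5, 7.0, 8.0]
price = [15.0, 21.0, 24.0, 30.0, 42.0]

# Target first, attribute second, mirroring sm.OLS(target, attribute).
slope_price, r2_price = fit_through_origin(price, rooms)

# Swapping the roles changes the slope but not the uncentered R-squared.
slope_rooms, r2_rooms = fit_through_origin(rooms, price)

print("price ~ rooms:", slope_price, r2_price)
print("rooms ~ price:", slope_rooms, r2_rooms)
```

With a single regressor and no intercept, the uncentered R-squared equals (sum of x*y)^2 / (sum of x^2 * sum of y^2), so it is the same whichever variable is treated as the target, while the slope is not. The order of the two arguments therefore still matters for the coefficient even when R-squared is unchanged.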
