Salary Prediction LinearRegression
Salary Prediction LinearRegression
Salary Prediction LinearRegression
June 7, 2023
[25]: data = pd.read_csv('data.csv') # Loading the dataset and displaying the first 5␣
↪rows
data.head(5)
[31]: (30, 2)
[32]: data.info() # Checking information about the datset like columns, Non_null␣
↪values, datatypes.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 2 columns):
# Column Non-Null Count Dtype
1
--- ------ -------------- -----
0 YearsExperience 30 non-null float64
1 Salary 30 non-null float64
dtypes: float64(2)
memory usage: 608.0 bytes
[29]: data.describe()
The describe() function is a convenient method in pandas that provides a statistical summary of a
DataFrame or Series. It calculates various descriptive statistics for each numerical column in the
dataset, including count, mean, standard deviation, minimum value, 25th percentile (Q1), median
(50th percentile or Q2), 75th percentile (Q3), and maximum value.
[33]: YearsExperience 0
Salary 0
dtype: int64
if num_duplicates > 0:
print(f"The dataset contains {num_duplicates} duplicate values.")
data = data.drop_duplicates()
print("Dropped duplicates.")
print("Number of Duplicate Values after dropping :",num_duplicates)
else:
print("The dataset doesn't contain any duplicate values.")
2
[50]: YearsExperience
0 1.1
1 1.3
2 1.5
3 2.0
4 2.2
[53]: 0 39343.0
1 46205.0
2 37731.0
3 43525.0
4 39891.0
Name: Salary, dtype: float64
[54]: plt.scatter(X,Y)
plt.title("Salary according to Experience")
plt.xlabel("Salary")
plt.ylabel("Years of experience")
3
1.0.6 Splitting the dataset into train and test
[57]: print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
print(Y_test.shape)
(20, 1)
(10, 1)
(20,)
(10,)
4
[59]: LinearRegression()
[60]: linear.coef_
[60]: array([9523.14578831])
[61]: linear.intercept_
[61]: 24006.035761469633
[64]: Y_pred
[65]: Y_test
[65]: 24 109431.0
8 64445.0
2 37731.0
23 113812.0
7 54445.0
27 112635.0
15 67938.0
18 81363.0
1 46205.0
19 93940.0
Name: Salary, dtype: float64
Score: 92.78148083974355
5
plt.plot(X_test, Y_pred, color='red', linewidth=2, label='Predicted')
plt.title("Salary Prediction")
plt.xlabel("Salary")
plt.ylabel("Years of experience")
plt.legend()
plt.show()