How to Calculate R^2 with Scikit-Learn

Last Updated : 05 Aug, 2024

The coefficient of determination, denoted as R², is an essential metric in regression analysis. It indicates the extent to which the independent variables account for the variation in the dependent variable.

In this article, we will walk you through calculating R² using Scikit-Learn, a powerful Python library for machine learning.

What is R²?

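R² quantifies the proportion of variance in the dependent variable that can be predicted from the independent variables. A value of 1 indicates that the model explains all the variability, and a value of 0 indicates that it explains none of it, which is what a model that always predicts the mean achieves. For a least-squares fit evaluated on its own training data, R² lies between 0 and 1. In general, however, R² can be negative: Scikit-Learn's r2_score returns a value below 0 whenever the predictions fit worse than simply predicting the mean.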
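The three reference cases can be checked on a tiny hand-made array. This is only a sketch: the values and the variable names (`y_true`, `mean_only`, `reversed_pred`) are invented for illustration and do not come from the examples later in the article.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical target values, chosen so the arithmetic is easy to follow
y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

perfect = y_true.copy()                           # reproduces every value exactly
mean_only = np.full_like(y_true, y_true.mean())   # always predicts the mean (3.0)
reversed_pred = y_true[::-1]                      # systematically wrong predictions

r2_perfect = r2_score(y_true, perfect)
r2_mean = r2_score(y_true, mean_only)
r2_reversed = r2_score(y_true, reversed_pred)

print(r2_perfect, r2_mean, r2_reversed)  # 1.0 0.0 -3.0
```

For the reversed predictions the residual sum of squares is 40 while the total sum of squares is 10, so R² = 1 - 40/10 = -3.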

Mathematically, R² is expressed as:

$R^2 = 1 - \frac{\text{SS}_{res}}{\text{SS}_{tot}}$

Here:

$SS_{res} = \sum_i (y_i - \hat{y}_i)^2$ is the residual sum of squares (the sum of squared differences between the actual and predicted values).
$SS_{tot} = \sum_i (y_i - \bar{y})^2$ is the total sum of squares (the sum of squared differences between the actual values and their mean).
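To make the formula concrete, here is a minimal sketch that computes both sums of squares with NumPy and compares the result with r2_score. The four data points and the names `y_true`, `y_hat`, `ss_res`, and `ss_tot` are illustrative assumptions, not part of the article's examples.

```python
import numpy as np
from sklearn.metrics import r2_score

# Small hypothetical sample: actual values and model predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

# Residual sum of squares: squared gaps between actual and predicted values
ss_res = np.sum((y_true - y_hat) ** 2)

# Total sum of squares: squared gaps between actual values and their mean
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot
r2_library = r2_score(y_true, y_hat)

print(f"SS_res = {ss_res}, SS_tot = {ss_tot}")
print(f"manual R² = {r2_manual:.6f}, r2_score = {r2_library:.6f}")
```

Here SS_res is 1.5 and SS_tot is 29.1875, so both calculations give roughly 0.9486.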

Calculating R² with Scikit-Learn for Sample Data

Let's go through an example that calculates R² on sample data generated from a simple linear relationship.

Step 1: Import Necessary Libraries

import numpy as np
from sklearn.metrics import r2_score

Step 2: Generate Sample Data

# Generate random data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Use the true noise-free line as the prediction (just for the sake of demonstration)
y_pred = 4 + 3 * X

Step 3: Compute the R² using sklearn

# Flatten the arrays to use in r2_score
y = y.flatten()
y_pred = y_pred.flatten()

# Compute R² using Scikit-Learn
R2_sklearn = r2_score(y, y_pred)
print(f"R² (Scikit-Learn Calculation): {R2_sklearn}")

Complete Code

import numpy as np
from sklearn.metrics import r2_score

# Generate random data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Use the true noise-free line as the prediction (just for the sake of demonstration)
y_pred = 4 + 3 * X

# Flatten the arrays to use in r2_score
y = y.flatten()
y_pred = y_pred.flatten()

# Compute R² using Scikit-Learn
R2_sklearn = r2_score(y, y_pred)
print(f"R² (Scikit-Learn Calculation): {R2_sklearn}")

Output:

R² (Scikit-Learn Calculation): 0.7639751938835576
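The example above scores the known noise-free line instead of a fitted model. As a hedged variation, the sketch below fits LinearRegression to the same generated data and scores it in two ways: with r2_score and with the estimator's own score method, which returns R² for regressors. The names `fitted`, `y_fit`, `r2_fit`, `r2_from_score`, and `r2_true_line` are introduced here for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Same data-generating process as the example above
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = (4 + 3 * X + np.random.randn(100, 1)).ravel()

# Fit an ordinary least-squares line and predict on the training data
fitted = LinearRegression().fit(X, y)
y_fit = fitted.predict(X)

r2_fit = r2_score(y, y_fit)                       # R² via the metrics function
r2_from_score = fitted.score(X, y)                # R² via the estimator itself
r2_true_line = r2_score(y, (4 + 3 * X).ravel())   # R² of the noise-free line

print(f"fitted line R²:     {r2_fit}")
print(f"model.score R²:     {r2_from_score}")
print(f"noise-free line R²: {r2_true_line}")
```

On the training data the fitted line can never score lower than the noise-free line, because least squares minimizes the residual sum of squares over all straight lines.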

Calculating R² for a Simple Polynomial Regression Problem using Sklearn

Polynomial regression is a type of regression analysis in which the relationship between the independent variable X and the dependent variable y is modeled as an n-th degree polynomial. We will compute the R² value for a polynomial regression model in Python.

Step 1: Import Libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

Step 2: Generate Sample Data

We'll create a simple nonlinear dataset:

# Generate random data
np.random.seed(42)
X = 6 * np.random.rand(100, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(100, 1)

Step 3: Prepare Polynomial Features

Transform the input data to include polynomial features up to the desired degree (e.g., degree 2):

poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)

Step 4: Fit the Polynomial Regression Model

Fit a linear regression model to the polynomial features:

model = LinearRegression()
model.fit(X_poly, y)
y_pred = model.predict(X_poly)
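One way to sanity-check the fit is to compare the learned parameters with the values used to generate the data (intercept 2, linear coefficient 1, quadratic coefficient 0.5). The sketch below repeats the steps so far in one self-contained block. The names `intercept`, `coef_x`, and `coef_x2` are introduced here for illustration, and the estimates are only expected to land near the generating values because of the added noise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Same data-generating process as Step 2
np.random.seed(42)
X = 6 * np.random.rand(100, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(100, 1)

# Expand to [x, x^2] and fit an ordinary least-squares model
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
model = LinearRegression().fit(X_poly, y)

# y is a column vector, so intercept_ has shape (1,) and coef_ has shape (1, 2)
intercept = model.intercept_[0]
coef_x, coef_x2 = model.coef_[0]

print(f"intercept ≈ {intercept:.2f} (generated with 2)")
print(f"x coefficient ≈ {coef_x:.2f} (generated with 1)")
print(f"x² coefficient ≈ {coef_x2:.2f} (generated with 0.5)")
```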

Step 5: Calculate R² Using Scikit-Learn

Compute R² for the fitted model using Scikit-Learn's r2_score function:

# Flatten the arrays to use in r2_score
y = y.flatten()
y_pred = y_pred.flatten()

# Compute R² using Scikit-Learn
R2_sklearn = r2_score(y, y_pred)
print(f"R² (Scikit-Learn Calculation): {R2_sklearn}")
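A single R² value is easier to interpret next to a baseline. As a sketch, the block below fits a degree-1 and a degree-2 model to the same data and compares their training R². The helper `training_r2` and the use of make_pipeline are choices made for this illustration, not steps from the walkthrough above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same data-generating process as Step 2
np.random.seed(42)
X = 6 * np.random.rand(100, 1) - 3
y = (0.5 * X**2 + X + 2 + np.random.randn(100, 1)).ravel()

def training_r2(degree):
    """Fit a polynomial model of the given degree and return its training R²."""
    pipeline = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    pipeline.fit(X, y)
    return r2_score(y, pipeline.predict(X))

r2_by_degree = {degree: training_r2(degree) for degree in (1, 2)}

for degree, score in r2_by_degree.items():
    print(f"degree {degree}: R² = {score}")
```

Because the degree-2 model contains the straight line as a special case, its training R² cannot be lower. Training R² alone therefore favors more complex models, so a held-out set is a better guide when choosing the degree.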

Visualizing the Results

It's often helpful to visualize the polynomial regression curve along with the data points:

plt.scatter(X, y, color='blue', label='Actual')
# Sort the values for better plotting
sorted_indices = X.flatten().argsort()
plt.plot(X[sorted_indices], y_pred[sorted_indices], color='red', linewidth=2, label='Predicted')
plt.title('Actual vs Predicted (Polynomial Regression)')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()

Complete Code

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Generate random data
np.random.seed(42)
X = 6 * np.random.rand(100, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(100, 1)

poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)

model = LinearRegression()
model.fit(X_poly, y)
y_pred = model.predict(X_poly)

# Flatten the arrays to use in r2_score
y = y.flatten()
y_pred = y_pred.flatten()

# Compute R² using Scikit-Learn
R2_sklearn = r2_score(y, y_pred)
print(f"R² (Scikit-Learn Calculation): {R2_sklearn}")

plt.scatter(X, y, color='blue', label='Actual')
# Sort the values for better plotting
sorted_indices = X.flatten().argsort()
plt.plot(X[sorted_indices], y_pred[sorted_indices], color='red', linewidth=2, label='Predicted')
plt.title('Actual vs Predicted (Polynomial Regression)')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()

Output:

R² (Scikit-Learn Calculation): 0.8525067519009746

Conclusion

Calculating R² with Scikit-Learn is straightforward and provides valuable insight into your model's performance. Because r2_score needs only the actual and predicted values, it works the same way whether the predictions come from a fitted model, as in the polynomial regression example, or from any other source, as in the first example. This makes it a convenient way to check the goodness of fit of your predictions against actual data.

alka1974
