Chemometrics - Curve Fitting and Linear Regression 01


Curve Fitting and
Linear Regression
KIM 357 – Chemometrics
Department of Chemistry, FMIPA IPB
[email protected]
Curve Fitting

Describes techniques to fit curves (curve fitting) to discrete data in order to obtain intermediate estimates.
There are two general approaches to curve fitting:
 Least squares regression:
Data exhibit a significant degree of scatter. The strategy is to derive a single curve that represents the general trend of the data.
 Interpolation:
Data are very precise. The strategy is to pass a curve or a series of curves through each of the points.
Curve Fitting

In chemistry, two types of applications are encountered:

 Trend analysis: predicting values of the dependent variable, which may include extrapolation beyond the data points or interpolation between data points.

 Hypothesis testing: comparing an existing mathematical model with measured data.
Curve Fitting: Techniques

 Where does a given function Measured Variable = f(Physical Variable) come from in the first place?
 Analytical models of phenomena (e.g., equations from physics)
 Equations created from observed data
 Curve fitting captures the trend in the data by assigning a single function across the entire range.
Curve Fitting: Techniques

A straight line is described generically by

$$f(x) = ax + b$$

The goal is to identify the coefficients a and b such that f(x) 'fits' the data well.
 Given the general form of a straight line, how can we pick the coefficients that best fit the line to the data?
 What makes a particular straight line a 'good' fit?
Linear Regression

Fitting a straight line to a set of paired observations (x1, y1), (x2, y2), …, (xn, yn):

$$y = a_0 + a_1 x + e$$

 a1 - slope
 a0 - intercept
 e - error, or residual, between the model and the observations
Linear Regression: Residual
Linear Regression: Question?

How do we find a0 and a1 so that the error is minimized?
Linear Regression
Assumptions:
 Positive and negative errors of the same magnitude count equally (whether a data point lies above or below the line).
 Larger errors are weighted more heavily.
• Denote the data values as (x, y).
• Denote the corresponding points on the fitted line as (x, f(x)).

The error at each data point is the vertical distance between (x, y) and (x, f(x)); in the accompanying figure it is shown for four data points.
Linear Regression: Least Squares Fit

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_{i,\mathrm{measured}} - y_{i,\mathrm{model}}\right)^2 = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right)^2$$

Minimizing

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right)^2$$

yields a unique line for a given set of data.


Linear Regression: Least Squares Fit

$$\min S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right)^2$$

The coefficients a0 and a1 that minimize Sr must satisfy the following conditions:

$$\frac{\partial S_r}{\partial a_0} = 0, \qquad \frac{\partial S_r}{\partial a_1} = 0$$
Linear Regression: Determination of a0 and a1

$$\frac{\partial S_r}{\partial a_0} = -2\sum \left(y_i - a_0 - a_1 x_i\right) = 0$$

$$\frac{\partial S_r}{\partial a_1} = -2\sum \left[\left(y_i - a_0 - a_1 x_i\right) x_i\right] = 0$$

Expanding, and noting that $\sum a_0 = n a_0$:

$$0 = \sum y_i - \sum a_0 - \sum a_1 x_i$$

$$0 = \sum y_i x_i - \sum a_0 x_i - \sum a_1 x_i^2$$

Rearranging gives the normal equations, 2 equations with 2 unknowns that can be solved simultaneously:

$$n a_0 + \left(\sum x_i\right) a_1 = \sum y_i$$

$$\left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i$$
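As an illustrative sketch (not from the slides), the normal equations can be solved directly as a 2x2 linear system; the function name fit_line_normal_eqs is a hypothetical example:

```python
import numpy as np

def fit_line_normal_eqs(x, y):
    """Solve the least-squares normal equations for (a0, a1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Coefficient matrix and right-hand side of the normal equations
    A = np.array([[n, x.sum()],
                  [x.sum(), (x**2).sum()]])
    b = np.array([y.sum(), (x * y).sum()])
    a0, a1 = np.linalg.solve(A, b)
    return a0, a1
```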
Linear Regression: Determination of a0 and a1

$$a_1 = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$

$$a_0 = \bar{y} - a_1 \bar{x}$$
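A minimal sketch of the same fit using the closed-form expressions above (the helper name fit_line is an assumption for illustration); it should agree with the normal-equation solution:

```python
import numpy as np

def fit_line(x, y):
    """Return (a0, a1) for the least-squares line y = a0 + a1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Closed-form slope, then intercept from the means
    a1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
    a0 = y.mean() - a1 * x.mean()
    return a0, a1
```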
Error Quantification of Linear Regression

 The total sum of the squares around the mean of the dependent variable y is St:

$$S_t = \sum \left(y_i - \bar{y}\right)^2$$

 The sum of the squares of the residuals around the regression line is Sr:

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(y_i - a_0 - a_1 x_i\right)^2$$
Error Quantification of Linear Regression

 St − Sr quantifies the improvement, or error reduction, gained by describing the data with a straight line rather than with an average value:

$$r^2 = \frac{S_t - S_r}{S_t}$$

r²: coefficient of determination
r: correlation coefficient
Error Quantification of Linear Regression

 For a perfect fit, Sr = 0 and r = r² = 1, signifying that the line explains 100 percent of the variability of the data.
 For r = r² = 0, Sr = St and the fit represents no improvement over simply using the mean.
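A brief sketch of this error measure (the function name r_squared is illustrative, not from the slides):

```python
import numpy as np

def r_squared(x, y, a0, a1):
    """Coefficient of determination r^2 = (St - Sr) / St for a fitted line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    St = np.sum((y - y.mean())**2)         # spread around the mean
    Sr = np.sum((y - a0 - a1 * x)**2)      # spread around the regression line
    return (St - Sr) / St
```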
Least Squares Fit of a Straight Line: Example

Fit a straight line to the x and y values in the following table:

xi      yi      xi*yi   xi^2
1       0.5     0.5     1
2       2.5     5       4
3       2       6       9
4       4       16      16
5       3.5     17.5    25
6       6       36      36
7       5.5     38.5    49
Σ=28    Σ=24.0  Σ=119.5 Σ=140

$$\sum x_i = 28, \quad \sum y_i = 24.0, \quad \sum x_i y_i = 119.5, \quad \sum x_i^2 = 140$$

$$\bar{x} = \frac{28}{7} = 4, \qquad \bar{y} = \frac{24}{7} = 3.4286$$
Least Squares Fit of a Straight Line: Example (cont’d)

$$a_1 = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{7 \times 119.5 - 28 \times 24}{7 \times 140 - 28^2} = 0.8392857$$

$$a_0 = \bar{y} - a_1\bar{x} = 3.428571 - 0.8392857 \times 4 = 0.07142857$$

$$y = 0.07142857 + 0.8392857\,x$$
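As a quick check (a sketch, not part of the original example), the fitted coefficients can be reproduced with numpy.polyfit:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])

# polyfit returns the highest-order coefficient first: [a1, a0]
a1, a0 = np.polyfit(x, y, 1)
print(a0, a1)  # approximately 0.07142857 and 0.8392857
```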
Least Squares Fit of a Straight Line: Example (cont’d)

xi      yi      (yi − ȳ)²   ei² = (yi − ŷi)²
1       0.5     8.5765      0.1687
2       2.5     0.8622      0.5625
3       2.0     2.0408      0.3473
4       4.0     0.3265      0.3265
5       3.5     0.0051      0.5896
6       6.0     6.6122      0.7972
7       5.5     4.2908      0.1993
Σ=28    Σ=24.0  Σ=22.7143   Σ=2.9911

$$S_t = \sum \left(y_i - \bar{y}\right)^2 = 22.7143$$

$$S_r = \sum e_i^2 = 2.9911$$

$$r^2 = \frac{S_t - S_r}{S_t} = \frac{22.7143 - 2.9911}{22.7143} = 0.868$$

$$r = \sqrt{0.868} = 0.932$$
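These columns and sums can be reproduced with a short sketch (variable names are illustrative):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([0.5, 2.5, 2.0, 4.0, 3.5, 6.0, 5.5])
a1, a0 = np.polyfit(x, y, 1)

dev2 = (y - y.mean())**2          # column (yi - ybar)^2
res2 = (y - (a0 + a1 * x))**2     # column ei^2 = (yi - yhat)^2
St, Sr = dev2.sum(), res2.sum()   # ~22.7143 and ~2.9911
r2 = (St - Sr) / St               # ~0.868; r = sqrt(r2) ~ 0.932
```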
Least Squares Fit of a Straight Line: Example (Error Analysis)

• The standard deviation quantifies the spread around the mean:

$$s_y = \sqrt{\frac{S_t}{n-1}} = \sqrt{\frac{22.7143}{7-1}} = 1.9457$$

• The standard error of the estimate quantifies the spread around the regression line:

$$s_{y/x} = \sqrt{\frac{S_r}{n-2}} = \sqrt{\frac{2.9911}{7-2}} = 0.7735$$

Because $s_{y/x} < s_y$, the linear regression model fits the data well.
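A short sketch of these two measures, using St, Sr, and n from the example above:

```python
import numpy as np

St, Sr, n = 22.7143, 2.9911, 7
s_y = np.sqrt(St / (n - 1))    # standard deviation around the mean, ~1.9457
s_yx = np.sqrt(Sr / (n - 2))   # standard error of the estimate, ~0.7735
print(s_y, s_yx)               # s_yx < s_y, so the line improves on the mean
```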
Linearization of Nonlinear Relationships

• Linear regression requires that the relationship between the dependent and independent variables be linear.
• However, a few types of nonlinear functions can be transformed into linear regression problems (see the sketch after this list):
 The exponential equation
 The power equation
 The saturation-growth-rate equation
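As an illustrative sketch of the first case (the data values here are made up for demonstration): taking logarithms of the exponential model y = α·exp(βx) gives ln y = ln α + βx, which is linear in x.

```python
import numpy as np

# Synthetic data following y = 2.0 * exp(0.5 * x) (illustrative values only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * np.exp(0.5 * x)

# ln(y) = ln(alpha) + beta * x is a straight line, so fit (x, ln y)
beta, ln_alpha = np.polyfit(x, np.log(y), 1)
alpha = np.exp(ln_alpha)
print(alpha, beta)  # recovers ~2.0 and ~0.5
```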
Software
Thank you
