Curve Fitting

What is Regression?
Given n data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, best fit $y = f(x)$ to the data. The best fit is generally based on minimizing the sum of the squares of the residuals, $S_r$.

The residual at a point is
$$e_i = y_i - f(x_i)$$
The sum of the squares of the residuals is
$$S_r = \sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2$$
Figure. Basic model for regression
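The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names, the data points, and the candidate model are made up here and do not come from the notes.

```python
def residuals(f, xs, ys):
    """Return e_i = y_i - f(x_i) for every data point."""
    return [y - f(x) for x, y in zip(xs, ys)]


def sum_squared_residuals(f, xs, ys):
    """Return S_r, the sum of the squared residuals."""
    return sum(e * e for e in residuals(f, xs, ys))


# Illustrative data and candidate model (both assumed for this sketch).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 5.0]


def model(x):
    return 0.5 + 1.5 * x


print(residuals(model, xs, ys))              # [0.0, 0.5, 0.0]
print(sum_squared_residuals(model, xs, ys))  # 0.25
```

Any callable can be passed as f, so the same helpers apply to a straight line or to a higher-order model.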
Least-Squares Regression
When substantial error is associated with the data,
polynomial interpolation is inappropriate, since
forcing a curve through every noisy point requires
a high-order polynomial.
A better approach is to fit a single curve that follows
the general trend of the points.

The curve should minimize the discrepancy
between the data points and the curve.

Linear Regression
Given n data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$,
best fit the data to the function $y = f(x) = a_0 + a_1 x$.

The "best" line through the points is arbitrary and needs a certain criterion to establish a basis for the fit. Minimizing
$$\sum_{i=1}^{n} e_i^2$$
works as a criterion, where
$$e_i = y_i - (a_0 + a_1 x_i)$$

Figure. Linear regression of y vs. x data showing residuals at a typical point, $x_i$.
For a "best" fit:
Minimize the total sum of the squares of the
residuals (errors) between the measured y and
the y calculated with the linear model:

$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_{i,\text{measured}} - y_{i,\text{model}} \right)^2 = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2$$

where n is the total number of points.

13.2.1 Criteria for best fit
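• The sum of the squares of the residuals, which should be
minimized, is:
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2$$
• Find the constant parameters $a_0$ and $a_1$ that satisfy the above
criterion. The error is minimized by setting:
$$\frac{\partial S_r}{\partial a_0} = 0 \quad \text{and} \quad \frac{\partial S_r}{\partial a_1} = 0$$
Figure. Fitted straight line with intercept $a_0$ and slope $a_1$, showing a residual $e_i$.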
Least Squares Fit of a Straight Line:
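To determine the values of $a_0$ and $a_1$, start from
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2$$

Differentiating with respect to each coefficient and equating to zero:

$$\frac{\partial S_r}{\partial a_0} = -2 \sum \left( y_i - a_0 - a_1 x_i \right) = 0$$
$$\frac{\partial S_r}{\partial a_1} = -2 \sum \left( y_i - a_0 - a_1 x_i \right) x_i = 0$$
Dividing by -2 and distributing the sums:
$$0 = \sum y_i - \sum a_0 - \sum a_1 x_i$$
$$0 = \sum x_i y_i - \sum a_0 x_i - \sum a_1 x_i^2$$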
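As a sanity check on the differentiation step, the analytical partial derivatives can be compared with central finite differences. The sketch below is only illustrative: the function names, the sample data, the trial coefficients, and the step size h are all assumptions made for the example.

```python
def s_r(a0, a1, xs, ys):
    """Sum of squared residuals for the line y = a0 + a1*x."""
    return sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys))


def grad_analytic(a0, a1, xs, ys):
    """Partial derivatives (dSr/da0, dSr/da1) from the formulas above."""
    d_a0 = -2.0 * sum(y - a0 - a1 * x for x, y in zip(xs, ys))
    d_a1 = -2.0 * sum((y - a0 - a1 * x) * x for x, y in zip(xs, ys))
    return d_a0, d_a1


def grad_numeric(a0, a1, xs, ys, h=1e-4):
    """Central finite-difference approximation of the same derivatives."""
    d_a0 = (s_r(a0 + h, a1, xs, ys) - s_r(a0 - h, a1, xs, ys)) / (2 * h)
    d_a1 = (s_r(a0, a1 + h, xs, ys) - s_r(a0, a1 - h, xs, ys)) / (2 * h)
    return d_a0, d_a1


# Illustrative data and trial coefficients (assumed for this sketch).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0]

analytic = grad_analytic(0.5, 1.0, xs, ys)
numeric = grad_numeric(0.5, 1.0, xs, ys)
print(analytic)  # (2.0, 4.0)
print(numeric)   # agrees with the analytical values up to rounding
```

Because $S_r$ is quadratic in $a_0$ and $a_1$, the central difference matches the analytical derivative up to floating-point rounding.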

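Realizing that $\sum a_0 = n a_0$, these equations can be written as a set of two simultaneous linear equations in the two unknowns $a_0$ and $a_1$:

$$n a_0 + \left( \sum x_i \right) a_1 = \sum y_i$$
$$\left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 = \sum x_i y_i$$

or, in matrix form,

$$\begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} \begin{Bmatrix} a_0 \\ a_1 \end{Bmatrix} = \begin{Bmatrix} \sum y_i \\ \sum x_i y_i \end{Bmatrix}$$

These are the normal equations. They can be solved simultaneously for $a_1$ and $a_0$:

$$a_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}$$
$$a_0 = \frac{\sum y_i}{n} - a_1 \frac{\sum x_i}{n} = \bar{y} - a_1 \bar{x}$$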
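The closed-form solution of the normal equations translates directly into code. The following is a minimal sketch, not a library routine: the function name linear_fit, the guard against identical x values, and the sample data are assumptions added for illustration.

```python
def linear_fit(xs, ys):
    """Return (a0, a1) for the least-squares line y = a0 + a1*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)

    denominator = n * sum_xx - sum_x ** 2
    if denominator == 0:
        # All x values are equal, so the slope is undefined.
        raise ValueError("x values must not all be identical")

    a1 = (n * sum_xy - sum_x * sum_y) / denominator
    a0 = sum_y / n - a1 * sum_x / n  # a0 = y_mean - a1 * x_mean
    return a0, a1


# Illustrative data (assumed for this sketch).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

a0, a1 = linear_fit(xs, ys)
print(a0, a1)  # approximately 0.5 and 1.4
```

Only four running sums are needed, so the fit can be computed in a single pass over the data.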

13.2.3 Error Quantification of Linear Regression
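• The sum of the squares of the vertical distances between the data and the best-fit
line:
$$S_r = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - a_0 - a_1 x_i \right)^2$$
• The sum of the squares of the discrepancies between the data and the mean:
$$S_t = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2$$

Standard error of the estimate:
$$s_{y/x} = \sqrt{\frac{S_r}{n - 2}}$$
Standard deviation:
$$s_y = \sqrt{\frac{S_t}{n - 1}}$$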
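These four quantities can be computed together once the line is known. The sketch below assumes the coefficients have already been obtained; the function name, the returned tuple order, and the sample data (with its least-squares coefficients 0.5 and 1.4) are illustrative choices, not part of the notes.

```python
import math


def error_statistics(xs, ys, a0, a1):
    """Return (S_r, S_t, s_yx, s_y) for the line y = a0 + a1*x."""
    n = len(xs)
    if n < 3:
        # s_y/x divides by n - 2, so at least three points are required.
        raise ValueError("need at least three data points")

    y_mean = sum(ys) / n
    s_r = sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys))
    s_t = sum((y - y_mean) ** 2 for y in ys)
    s_yx = math.sqrt(s_r / (n - 2))  # standard error of the estimate
    s_y = math.sqrt(s_t / (n - 1))   # standard deviation about the mean
    return s_r, s_t, s_yx, s_y


# Illustrative data and its least-squares line (assumed for this sketch).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

s_r, s_t, s_yx, s_y = error_statistics(xs, ys, a0=0.5, a1=1.4)
print(s_r, s_t)   # approximately 0.2 and 10.0
print(s_yx, s_y)  # approximately 0.316 and 1.826
```

A value of s_yx well below s_y indicates that the line describes the data better than the mean alone.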

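• To measure the improvement achieved by describing the data with a
straight line instead of the average:

Coefficient of determination:
$$r^2 = \frac{S_t - S_r}{S_t}$$
Correlation coefficient:
$$r = \sqrt{\frac{S_t - S_r}{S_t}}$$
which can also be computed directly from the data as
$$r = \frac{n \sum x_i y_i - \left( \sum x_i \right) \left( \sum y_i \right)}{\sqrt{n \sum x_i^2 - \left( \sum x_i \right)^2} \, \sqrt{n \sum y_i^2 - \left( \sum y_i \right)^2}}$$

• Perfect fit: $S_r = 0$ and $r = 1$
• No improvement: $S_r = S_t$ and $r = 0$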
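The two routes to the correlation coefficient can be checked against each other numerically. This is a hedged sketch: the function names and the sample data are assumptions, and the direct formula is undefined when all x values or all y values are identical.

```python
import math


def coefficient_of_determination(s_t, s_r):
    """Return r^2 = (S_t - S_r) / S_t."""
    return (s_t - s_r) / s_t


def correlation_coefficient(xs, ys):
    """Return r computed directly from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_xx - sum_x ** 2)
                   * math.sqrt(n * sum_yy - sum_y ** 2))
    return numerator / denominator


# Illustrative data (assumed for this sketch).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
n = len(xs)

# Least-squares line from the normal equations.
a1 = ((n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys))
      / (n * sum(x * x for x in xs) - sum(xs) ** 2))
a0 = sum(ys) / n - a1 * sum(xs) / n

y_mean = sum(ys) / n
s_t = sum((y - y_mean) ** 2 for y in ys)
s_r = sum((y - a0 - a1 * x) ** 2 for x, y in zip(xs, ys))

r2 = coefficient_of_determination(s_t, s_r)
r = correlation_coefficient(xs, ys)
print(r2, r * r)  # both approximately 0.98
```

The direct formula carries the sign of the slope, whereas the square root of $r^2$ gives only the magnitude.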

 i    x_i    y_i   x_i^2   x_i y_i   a_0 + a_1 x_i   (y_i - ȳ)^2   (y_i - a_0 - a_1 x_i)^2
 1     10     25     100       250        -39.5833      380534.8                  4171.003
 2     20     70     400      1400        155.1191      327041                    7245.261
 3     30    380     900     11400        349.8215       68578.52                  910.7419
 4     40    550    1600     22000        544.5239        8441.016                  29.98767
 5     50    610    2500     30500        739.2263        1016.016               16699.44
 6     60   1220    3600     73200        933.9287      334228.5                 81836.79
 7     70    830    4900     58100       1128.6311       35391.02                89180.53
 8     80   1450    6400    116000       1323.3335      653066                   16044.4
 Σ    360   5135   20400    312850       5135.0008     1808297                  216118.2

$$\bar{x} = \frac{360}{8} = 45, \qquad \bar{y} = \frac{5135}{8} = 641.875$$
$$a_1 = \frac{8(312850) - 360(5135)}{8(20400) - (360)^2} = 19.47024$$
$$a_0 = 641.875 - 19.47024(45) = -234.2857$$
$$s_y = \sqrt{\frac{1808297}{7}} = 508.26$$
$$s_{y/x} = \sqrt{\frac{216118}{6}} = 189.79$$
$$r^2 = \frac{1808297 - 216118}{1808297} = 0.8805$$
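The worked example can be cross-checked by recomputing everything from the $x_i$ and $y_i$ columns of the table. The sketch below uses only those two columns as input; the variable names are illustrative, and the printed values are rounded to the precision quoted in the notes.

```python
import math

# x_i and y_i columns from the table above.
x = [10, 20, 30, 40, 50, 60, 70, 80]
y = [25, 70, 380, 550, 610, 1220, 830, 1450]
n = len(x)

sum_x = sum(x)
sum_y = sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Slope and intercept from the normal equations.
a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a0 = sum_y / n - a1 * sum_x / n

# Error measures.
y_mean = sum_y / n
s_t = sum((yi - y_mean) ** 2 for yi in y)
s_r = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))
s_y = math.sqrt(s_t / (n - 1))
s_yx = math.sqrt(s_r / (n - 2))
r2 = (s_t - s_r) / s_t

print(round(a1, 5), round(a0, 4))    # 19.47024 -234.2857
print(round(s_y, 2), round(s_yx, 2)) # 508.26 189.79
print(round(r2, 4))                  # 0.8805
```

Since s_yx is smaller than s_y and r2 is about 0.88, the straight line accounts for roughly 88% of the original variation about the mean.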

