Analysis of Data - Curve Fitting and Spectral Analysis
>> step=14/365;
>> t=1981+(0:229)*step;
>> plot(t,global1,'+k')
>> axis([1980 1990 335 360])
>> xlabel('Year')
>> ylabel('CO_{2} (ppm)')
A question we might ask when analyzing this data set is:
What is the estimated rate of increase of CO2 concentration
per year?
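General Theory
[Figure: data points (x_i, y_i) with error bars σ_i, the fitted
curve Y(x; {a_j}), and the deviation Δ_i at x_i]
The deviation of the fitted curve from each data point is
$$\Delta_i = Y(x_i; \{a_j\}) - y_i$$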
Our curve fitting criterion will be that the sum of the squares
of the errors be a minimum; that is, we need to find a set {a_j}
that minimizes the function
$$D(\{a_j\}) = \sum_{i=1}^{N} \Delta_i^2 = \sum_{i=1}^{N} \left[ Y(x_i; \{a_j\}) - y_i \right]^2$$
This process will give us the least squares fit. It is not the
only way to obtain a curve fit, but it is the most common.
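When the data points carry error bars σ_i, each deviation is
weighted by its error bar, and we minimize the chi-square
function
$$\chi^2(\{a_j\}) = \sum_{i=1}^{N} \left( \frac{\Delta_i}{\sigma_i} \right)^2 = \sum_{i=1}^{N} \frac{\left[ Y(x_i; \{a_j\}) - y_i \right]^2}{\sigma_i^2}$$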
Linear Regression
$$Y(x; \{a_1, a_2\}) = a_1 + a_2 x$$
This type of curve fit is known as linear regression. We want to
determine a1 and a2 such that
$$\chi^2(\{a_1, a_2\}) = \sum_{i=1}^{N} \frac{\left[ a_1 + a_2 x_i - y_i \right]^2}{\sigma_i^2}$$
is minimized. The minimum is found by differentiating the chi-
square function and setting the derivatives to zero:
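$$\frac{\partial \chi^2}{\partial a_1} = 2 \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \left[ a_1 + a_2 x_i - y_i \right] = 0$$
$$\frac{\partial \chi^2}{\partial a_2} = 2 \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \left[ a_1 + a_2 x_i - y_i \right] x_i = 0$$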
or
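$$a_1 S + a_2 \Sigma x - \Sigma y = 0$$
$$a_1 \Sigma x + a_2 \Sigma x^2 - \Sigma xy = 0$$
where
$$S = \sum_{i=1}^{N} \frac{1}{\sigma_i^2}, \quad \Sigma x = \sum_{i=1}^{N} \frac{x_i}{\sigma_i^2}, \quad \Sigma y = \sum_{i=1}^{N} \frac{y_i}{\sigma_i^2}, \quad \Sigma x^2 = \sum_{i=1}^{N} \frac{x_i^2}{\sigma_i^2}, \quad \Sigma xy = \sum_{i=1}^{N} \frac{x_i y_i}{\sigma_i^2}$$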
Since the sums may be computed directly from the data, they are
known constants. We thus have a linear set of two simultaneous
equations in the unknowns a1 and a2. The solution of these
equations is
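$$a_1 = \frac{\Sigma y \, \Sigma x^2 - \Sigma x \, \Sigma xy}{S \, \Sigma x^2 - (\Sigma x)^2}, \qquad a_2 = \frac{S \, \Sigma xy - \Sigma y \, \Sigma x}{S \, \Sigma x^2 - (\Sigma x)^2}$$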
Notice that if σ_i is a constant, that is, if the error is the
same for all points, then the σ's cancel out of these results
and the parameters a1 and a2 are independent of the error bar. In
this case we just put σ_i = 1 in the formulas.
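The estimated errors in the parameters a1 and a2 are
$$\sigma_{a_1} = \sqrt{\frac{\Sigma x^2}{S \, \Sigma x^2 - (\Sigma x)^2}}, \qquad \sigma_{a_2} = \sqrt{\frac{S}{S \, \Sigma x^2 - (\Sigma x)^2}}$$
Notice that σ_{a_j} is independent of y_i.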
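The formulas above translate almost line by line into MATLAB.
The following is a minimal sketch, not the course's own routine:
the data values are made up for illustration, and the names sy,
a1, a2, sig_a1 and sig_a2 are assumptions chosen to match the
notation of the text.

```matlab
% Weighted least squares fit of a straight line Y = a1 + a2*x.
% The data below are made up; y lies exactly on y = 2 + 0.5*x.
x     = [1 2 3 4 5];
y     = [2.5 3.0 3.5 4.0 4.5];
sigma = [0.1 0.2 0.1 0.3 0.2];      % error bar of each point

sigmaTerm = sigma.^(-2);            % weights 1/sigma_i^2
s   = sum(sigmaTerm);
sx  = sum(x.*sigmaTerm);
sy  = sum(y.*sigmaTerm);
sxx = sum((x.^2).*sigmaTerm);
sxy = sum(x.*y.*sigmaTerm);
denom = s*sxx - sx^2;

a1 = (sy*sxx - sx*sxy)/denom;       % intercept
a2 = (s*sxy - sy*sx)/denom;         % slope
sig_a1 = sqrt(sxx/denom);           % error in the intercept
sig_a2 = sqrt(s/denom);             % error in the slope

fprintf('a1 = %g +/- %g\n', a1, sig_a1);
fprintf('a2 = %g +/- %g\n', a2, sig_a2);
```

Because the made-up data lie exactly on a line, the fit should
return the intercept 2 and slope 0.5 to round-off, whatever the
error bars are; only sig_a1 and sig_a2 change when sigma changes.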
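If the error bars on the data are constant (σ_i = σ_0), the error in
the parameters is
$$\sigma_{a_1} = \frac{\sigma_0}{\sqrt{N}} \sqrt{\frac{\langle x^2 \rangle}{\langle x^2 \rangle - \langle x \rangle^2}}, \qquad \sigma_{a_2} = \frac{\sigma_0}{\sqrt{N}} \sqrt{\frac{1}{\langle x^2 \rangle - \langle x \rangle^2}}$$
where
$$\langle x \rangle = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad \langle x^2 \rangle = \frac{1}{N} \sum_{i=1}^{N} x_i^2$$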
Finally, if the data set does not have an associated set of
error bars, we can estimate σ 0 from the data
$$\sigma_0^2 \approx (\text{standard deviation})^2 = \frac{1}{N-2} \sum_{i=1}^{N} \left[ y_i - (a_1 + a_2 x_i) \right]^2$$
Note that this sample variance is normalized by N-2 since we
have already extracted two parameters, a1 and a2, from the data.
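As a hedged illustration of this estimate, the sketch below fits
made-up data without error bars and computes σ_0 from the
residuals. It uses the built-in polyfit for the unweighted fit
(equivalent to setting σ_i = 1); the variable names resid and
sigma0 are assumptions.

```matlab
% Made-up data scattered about a straight line; no error bars given.
x = [0 1 2 3 4 5];
y = [1.1 2.9 5.2 6.8 9.1 10.9];
N = length(x);

% Unweighted straight-line fit; polyfit returns [slope intercept].
p  = polyfit(x, y, 1);
a2 = p(1);
a1 = p(2);

% Sample variance of the residuals, normalized by N-2.
resid  = y - (a1 + a2*x);
sigma0 = sqrt(sum(resid.^2)/(N - 2));
fprintf('a1 = %g, a2 = %g, sigma0 = %g\n', a1, a2, sigma0);
```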
A fit to an exponential of the form
$$Z(x; \{\alpha, \beta\}) = \alpha e^{\beta x}$$
may be written as a linear relation using the change of variable
$$\ln Z = Y, \quad \ln \alpha = a_1, \quad \beta = a_2$$
so that
$$\ln Z = Y = \ln \alpha + \beta x \ln e = a_1 + a_2 x$$
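Similarly, to fit a power law of the form
$$Z(t; \{\alpha, \beta\}) = \alpha t^{\beta}$$
we use the change of variable
$$\ln Z = Y, \quad \ln t = x, \quad \ln \alpha = a_1, \quad \beta = a_2$$
so that
$$\ln Z = Y = \ln \alpha + \beta \ln t = a_1 + a_2 x$$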
These transformations should be familiar because you use them
whenever you plot data using semilog or log-log scales.
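The exponential case can be sketched as follows. The data are
made up and noise-free so that the result can be checked by
hand; alpha_true, beta_true, alpha_fit and beta_fit are
illustrative names, and polyfit stands in for the linear
regression formulas with σ_i = 1.

```matlab
% Made-up data lying exactly on Z = 3*exp(-0.5*x).
alpha_true = 3.0;
beta_true  = -0.5;
x = 0:0.5:4;
Z = alpha_true*exp(beta_true*x);

% Change of variable: Y = ln Z is linear in x.
Y  = log(Z);
p  = polyfit(x, Y, 1);       % p(1) = a2 (slope), p(2) = a1 (intercept)
a1 = p(2);
a2 = p(1);

% Undo the change of variable.
alpha_fit = exp(a1);
beta_fit  = a2;
fprintf('alpha = %g, beta = %g\n', alpha_fit, beta_fit);
```

With noisy data the logarithm also rescales the error bars
(roughly σ_Y ≈ σ_Z / Z), so a weighted fit would need the
transformed error bars rather than the original ones.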
Goodness of Fit
With the given error bars, how
likely is it that the curve
actually describes the data?
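This suggests that we take our rule of thumb for a good fit to
be
$$\chi^2 \approx N - M$$
where N is the number of data points and M is the number of fit
parameters. If χ² >> N − M, then either we have not used an
appropriate function, Y(x), for the curve fit or the error bars,
σ_i, are too small. On the other hand, if χ² << N − M, then the
fit is so good that the error bars are probably too large.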
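A minimal sketch of this check, assuming made-up data with a
constant error bar (the value of sigma0 is an assumption, not a
measured quantity). Since the error bars are constant, the
weighted and unweighted fits coincide, so polyfit is used here
in place of the weighted formulas.

```matlab
% Made-up data with a constant, assumed error bar sigma0.
x      = [0 1 2 3 4 5];
y      = [1.1 2.9 5.2 6.8 9.1 10.9];
sigma0 = 0.2;
sigma  = sigma0*ones(size(x));

p  = polyfit(x, y, 1);               % straight-line fit
yy = polyval(p, x);                  % fitted curve Y(x_i)

chisqr = sum(((y - yy)./sigma).^2);  % chi-square of the fit
N = length(x);                       % number of data points
M = 2;                               % number of fit parameters
fprintf('chi^2 = %g, N-M = %d\n', chisqr, N - M);
```

For these numbers χ² is about 2.77 against N − M = 4, which is
the same order of magnitude and so passes the rule of thumb.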
sxy = sum(x.*y.*sigmaTerm);
sxx = sum((x.^2).*sigmaTerm);
denom = s*sxx-sx^2;
figure(1); clf; % bring figure 1 window forward
errorbar(x,y,sigma,'or'); % graph data with error bars
hold on;                  % freeze plot to add line fit
plot(x,yy,'-k');          % plot fit on same graph
xlabel('x_i'); ylabel('y_i and Y(x)');
title(['\chi^2 = ',num2str(chisqr),' N-M = ',num2str(N-M)]);
>> lsftest
Curve fit data is created using the function
y(x) = c(1) + c(2)*x + c(3)*x^2
Enter the coefficients as [c(1) c(2) c(3)]: [2.0 0.5 0.0]
Enter estimated error bar: 2.0
Fit parameters:
a(1) = 2.46232 +/- 0.574279
a(2) = 0.484955 +/- 0.0195998
The second plot is a curve fit result with input values
c = [2.0 0.5 -0.02] (underlying quadratic data) and error bar σ = 2.0.
>> lsftest
Curve fit data is created using the function
y(x) = c(1) + c(2)*x + c(3)*x^2
Enter the coefficients as [c(1) c(2) c(3)]: [2.0 0.5 -0.02]
Enter estimated error bar: 2.0
Fit parameters:
a(1) = 11.3023 +/- 0.574279
a(2) = -0.535045 +/- 0.0195998
Spectral Analysis
The carbon dioxide data shown earlier from Mauna Loa, Hawaii has
a general upward trend but also a significant periodicity due to
the annual seasonal cycle. If a data set exhibits periodic
oscillations (or if we suspect it contains periodic
oscillations), we want to fit it using trigonometric functions.
This class of problems moves us from the regime of curve fitting
to that of spectral analysis. In these notes we will only
introduce some basic aspects of this vast subject, including the
discrete Fourier transform and the power spectrum.
Take a vector of N data points y = [y_1 y_2 y_3 ... y_N]; we call
the data set a time series because transform methods are most
often used in signal analysis. The data are evenly spaced in
time, so t_{j+1} = jτ, where τ is the sampling interval, that is,
the time increment between data points, and j = 0,1,2,...,N-1.
We define the vector Y, the discrete Fourier transform of y, as
$$Y_{k+1} = \sum_{j=0}^{N-1} y_{j+1} \, e^{-2\pi i jk/N}$$
The inverse transform is
$$y_{j+1} = \frac{1}{N} \sum_{k=0}^{N-1} Y_{k+1} \, e^{2\pi i jk/N}$$
Note that texts (and numerical libraries) will vary slightly in
how they define this transform, especially in how it is
normalized. You must always carefully check the definition in
any other programming language you might use.
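One way to check a library's convention is to evaluate the sums
directly and compare. The sketch below does this for MATLAB's
built-in fft, using an arbitrary test vector; errForward and
errInverse are illustrative names.

```matlab
% Direct evaluation of Y(k+1) = sum_j y(j+1)*exp(-2*pi*i*j*k/N),
% compared with the built-in fft. The test vector is arbitrary.
y = [1 2 0 -1 3 0.5 -2 1];
N = length(y);
j = 0:N-1;

Y = zeros(1, N);
for k = 0:N-1
    Y(k+1) = sum(y.*exp(-2*pi*1i*j*k/N));
end

% Inverse transform, including the 1/N normalization.
yback = zeros(1, N);
for jj = 0:N-1
    yback(jj+1) = sum(Y.*exp(2*pi*1i*jj*(0:N-1)/N))/N;
end

errForward = max(abs(Y - fft(y)));   % should be near round-off
errInverse = max(abs(yback - y));    % should be near round-off
fprintf('forward error = %g, inverse error = %g\n', errForward, errInverse);
```

Both errors should be at the level of round-off, which confirms
that fft uses the sign and normalization defined above.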
Each point Y_{k+1} of the transform has an associated frequency,
$$f_{k+1} = \frac{k}{\tau N}$$
The lowest (nonzero) frequency is
$$f_2 = \frac{1}{\tau N} = \frac{1}{T}$$
where T is the length of the time series. To measure very low
frequencies, we need to analyze long time series. The highest
frequency is
$$f_N = \frac{N-1}{\tau N} \approx \frac{1}{\tau}$$
so to measure very high frequencies we need to use a short
sampling interval. For real time series the highest frequency is
actually 1/(2τ), which is the Nyquist frequency. We will discuss
this shortly.
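The frequency axis can be built in a couple of lines. This is a
small sketch with example values for N and tau; the names
fLowest, fHighest and fNyquist are assumptions.

```matlab
% Frequency axis for an N-point transform with sampling interval tau.
% N and tau are example values.
N   = 50;
tau = 1;
k   = 0:N-1;
f   = k/(tau*N);            % f(k+1) = k/(tau*N)

T        = tau*N;           % length of the time series
fLowest  = f(2);            % lowest nonzero frequency, 1/T
fHighest = f(N);            % (N-1)/(tau*N), close to 1/tau
fNyquist = 1/(2*tau);       % Nyquist frequency
fprintf('f(2) = %g, f(N) = %g, Nyquist = %g\n', fLowest, fHighest, fNyquist);
```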
Consider a test signal of the form
$$y_{j+1} = \sin(2\pi f_s j\tau + \phi_s)$$
The signal is a sine wave of frequency f_s and phase φ_s. The
program evaluates Y_{k+1} and plots both the signal and its
transform. Note that although y is real, Y is complex, so we
consider its real and imaginary parts separately.
Let us discuss a few examples (sampling interval τ = 1 for all).
Case #1: With N = 50 data points, signal frequency f_s = 0.2 and
phase φ_s = 0, we obtain the sine wave, transform and power
spectrum shown below.
>> fttest
Enter number of points: 50
Enter frequency of sine wave: 0.2
Enter phase of the sine wave: 0.0
The discrete sampling of the sine wave is evident from the
jaggedness in the time series plot. Notice that the real part of
the transform is zero and the imaginary part has spikes at
frequencies f = 0.2 and 0.8 (k = 10 and 40). Notice also the
similar plots for the transform and power spectrum (because the
real part = 0). The existence of the power spectrum peak at 0.2
makes physical sense. The other peak will be discussed shortly.
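These observations can be checked numerically. The following is
a sketch using the built-in fft, not the fttest program itself;
the names maxReal and peaks are assumptions.

```matlab
% Case #1 checked with the built-in fft (tau = 1).
N    = 50;
tau  = 1;
fs   = 0.2;                  % signal frequency
phis = 0;                    % signal phase
j    = 0:N-1;
y    = sin(2*pi*fs*j*tau + phis);

Y = fft(y);
f = (0:N-1)/(tau*N);         % frequency of each Y(k+1)

maxReal = max(abs(real(Y)));                 % real part vanishes
[~, idx] = sort(abs(imag(Y)), 'descend');
peaks = sort(f(idx(1:2)));                   % two largest spikes
fprintf('max |Re Y| = %g\n', maxReal);
fprintf('spikes at f = %g and %g\n', peaks(1), peaks(2));
```

The two spikes have imaginary parts -N/2 at k = 10 and +N/2 at
k = 40, which follows from writing the sine as a difference of
two complex exponentials.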
>> fttest
Enter number of points: 50
Enter frequency of sine wave: 0.2
Enter phase of the sine wave: pi/2
Notice that the imaginary part of the transform is zero and the
real part has spikes at frequencies f = 0.2 and 0.8 (k = 10 and
40). Notice also the similar plots for the transform and power
spectrum (because the imaginary part = 0).
>> fttest
Enter number of points: 50
Enter frequency of sine wave: 0.2123
Enter phase of the sine wave: 0.0
Notice that we still have a peak around the frequency of the
sine wave, but the structure is more complicated. In this
example, because the frequency of the signal is not equal to a
multiple of 1/(τN) = 1/50, our Fourier transform is not a simple
spike. For this reason the (unnormalized) power spectrum given
by
$$P_{k+1} = |Y_{k+1}|^2 = Y_{k+1} Y_{k+1}^{*}$$
looks very different.
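A short sketch of this case, again using the built-in fft rather
than the fttest program; Pmax and kmax are illustrative names.

```matlab
% Power spectrum of a sine wave whose frequency is not a
% multiple of 1/(tau*N).
N   = 50;
tau = 1;
fs  = 0.2123;
j   = 0:N-1;
y   = sin(2*pi*fs*j*tau);

Y = fft(y);
P = abs(Y).^2;                      % same as real(Y.*conj(Y))
f = (0:N-1)/(tau*N);

[Pmax, kmax] = max(P(1:N/2));       % search below the Nyquist frequency
fprintf('largest peak at f = %g\n', f(kmax));
fprintf('power in neighbouring bins: %g and %g\n', P(kmax-1), P(kmax+1));
```

The largest peak falls in the bin nearest the signal frequency
(f = 0.22), and the neighbouring bins carry a substantial share
of the power instead of being zero.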
>> fttest
Enter number of points: 50
Enter frequency of sine wave: 0.8
Enter phase of the sine wave: 0
Comparing this with the results for f = 0.2 (instead of 0.8) we
find that they are almost identical - the time series only
differ by a phase shift of π. But how is this possible since
these sine waves have completely different frequencies?
>> y1=sin(2*pi*(0.2)*(0:0.1:10)+pi);
>> y2=sin(2*pi*(0.8)*(0:0.1:10));
>> plot((0:0.1:10),y1,'-k',(0:0.1:10),y2,'-k');
>> xlabel('Time'); ylabel('Amplitude')
>> hold on;
>> y11=sin(2*pi*(0.2)*(1:9)+pi);
>> plot((1:9),y11,'or');
>> hold off;
The two sine waves have frequencies 0.2 and 0.8 - the former is
shifted by π. When the sampling interval is τ = 1, the two data
sets for these sine waves (the circles) are identical. This
phenomenon is known as aliasing.
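The reason is that at integer times j the two sine waves agree
exactly: sin(2π(0.8)j) = sin(2πj − 2π(0.2)j) = −sin(2π(0.2)j)
= sin(2π(0.2)j + π). A minimal numerical check, with maxDiff as
an illustrative name:

```matlab
% Sample both sine waves at integer times (tau = 1) and compare.
tau = 1;
j   = 0:9;
y1  = sin(2*pi*0.2*j*tau + pi);   % f = 0.2, shifted by pi
y2  = sin(2*pi*0.8*j*tau);        % f = 0.8
maxDiff = max(abs(y1 - y2));
fprintf('largest difference between sampled values: %g\n', maxDiff);
```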
% m-file extractfps
% extract frequency and power spectrum
clear;
% A common use of the FFT is to find the
% frequency components of a signal buried
% in a noisy time-domain signal. Consider
% data sampled at 1000 Hz. Form a
% signal containing 50 Hz and 120 Hz and
% corrupt it with some zero-mean random noise.
t=0:0.001:0.6;
x=sin(2*pi*50*t) + sin(2*pi*120*t);
y=x+2*randn(size(t));
figure('Position',[200,200,300,300]);
plot(y(1:150),'-k');
% it is difficult to identify frequency components
% from looking at the original signal.
% Converting to the frequency domain
% the DFT of the noisy signal y(t)
% is found by taking the 512 point FFT
Y=fft(y,512);
% the power spectral density, a measurement of the energy at
% various frequencies, is
Pyy=Y.*conj(Y)/512;
% the first 256 points
% (Nyquist frequency discussion says that
% same information is contained in discarded
% part of vector)
% can be graphed on a meaningful frequency axis with
f=1000*(0:255)/512;
figure('Position',[600,200,300,300])
plot(f,abs(Pyy(1:256)),'-k')
[Figure: Power spectrum filtering out 50 Hz and 120 Hz signals]
Clearly this is a wonderful tool for extracting periodic signals
from noise.
Fourier Synthesis
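A periodic function f(t) with period T can be written as the
Fourier series
$$f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left[ a_n \cos(n\omega t) + b_n \sin(n\omega t) \right]$$
where
$$\begin{pmatrix} a_n \\ b_n \end{pmatrix} = \frac{2}{T} \int_0^T \begin{pmatrix} \cos(n\omega t) \\ \sin(n\omega t) \end{pmatrix} f(t) \, dt, \qquad \omega = \frac{2\pi}{T}$$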
There are even and odd functions in the world, and these need
only cosine or sine series respectively. Every function can be
written as a sum of an even and an odd function.
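The decomposition can be written out explicitly. The labels f_e
and f_o are introduced here only for illustration; f_e is even,
f_o is odd, and their sum recovers f(t):

```latex
f(t) = f_e(t) + f_o(t), \qquad
f_e(t) = \frac{f(t) + f(-t)}{2}, \qquad
f_o(t) = \frac{f(t) - f(-t)}{2}
```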
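A Fourier series may also be written in complex form,
$$f(t) = \sum_{n=-\infty}^{\infty} c_n e^{-in\omega t}$$
where
$$c_n = \frac{1}{T} \int_0^T f(t) \, e^{in\omega t} \, dt$$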
You can use an identity of complex exponentials, namely,
$$e^{i\theta} = \cos\theta + i \sin\theta$$
As an example, consider the square wave
$$f(t) = \begin{cases} +1 & \text{for } 0 \le t \le \pi \\ -1 & \text{for } -\pi \le t \le 0 \end{cases}$$
As discussed in class, we have all a_n = 0 and
$$b_n = \begin{cases} 4/(n\pi) & n = 1, 3, 5, 7, \ldots \\ 0 & n = 2, 4, 6, 8, \ldots \end{cases}$$
which gives
$$f(t) = \frac{4}{\pi} \sum_{\substack{n=1 \\ n \text{ odd}}}^{\infty} \frac{\sin nt}{n}$$
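The series can be summed numerically to see how the partial sums
approach the square wave. This is a minimal sketch; nmax (the
largest harmonic kept) and the grid of 401 points are arbitrary
choices, and fsum, fPlus and fMinus are illustrative names.

```matlab
% Partial sum of f(t) = (4/pi)*sum over odd n of sin(n*t)/n.
t    = linspace(-pi, pi, 401);
nmax = 199;                          % largest harmonic kept
fsum = zeros(size(t));
for n = 1:2:nmax                     % odd harmonics only
    fsum = fsum + (4/pi)*sin(n*t)/n;
end

fPlus  = fsum(301);                  % value at t = +pi/2
fMinus = fsum(101);                  % value at t = -pi/2
fprintf('f(+pi/2) = %g, f(-pi/2) = %g\n', fPlus, fMinus);
```

The two printed values should be close to +1 and −1. Plotting
the result with plot(t,fsum,'-k') for several values of nmax
shows the partial sums approaching the square wave.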