The Neural-Network Analysis & Its Applications Data Filters: Saint-Petersburg State University JASS 2006


THE NEURAL-NETWORK ANALYSIS
& ITS APPLICATIONS
DATA FILTERS

Saint-Petersburg State University

JASS 2006

About me
Name: Alexey Minin
Place of study: Saint-Petersburg State University
Current semester: 7th semester
Fields of interest: Neural Nets, Data Filters for Optics
(Holography), Computational Physics, Econophysics.

Content:
What is Neural Net & its applications
Neural Net analysis
Self-organizing Kohonen maps
Data filters
Obtained results

What is a Neural Net & its applications

Image recognition
Processing of noisy signals
Image completion
Associative search
Classification
Scheduling
Optimization
Forecasting
Diagnostics
Risk prediction

What is a Neural Net & its applications

Image recognition

[Figure: image-recognition example]

Neural Net analysis

PARADIGMS of neurocomputing
Connectionism
Localness and parallelism of calculations
Training based on data (instead of programming)
Universality of training algorithms

Neural Net analysis


What is a Neuron?
A typical formal neuron performs an elementary operation: it weighs the values
of its inputs with locally stored weights and applies a nonlinear
transformation to their sum:

$y = f(u), \qquad u = w_0 + \sum_i w_i x_i$

[Figure: formal neuron - the inputs x_1 ... x_n are weighted and summed into
u = w_0 + sum_i w_i x_i, and the output is y = f(u)]

A neuron performs a nonlinear operation on a linear combination of its inputs.
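
As a minimal illustration (not part of the original slides), a formal neuron
can be written in a few lines of Python/NumPy; the logistic activation, the
weights w, and the bias w0 below are assumptions chosen for the sketch:

    import numpy as np

    def logistic(u):
        """Logistic (sigmoid) activation: one common choice for f."""
        return 1.0 / (1.0 + np.exp(-u))

    def formal_neuron(x, w, w0, f=logistic):
        """Weigh the inputs, sum with the bias, apply a nonlinear transformation."""
        u = w0 + np.dot(w, x)   # u = w_0 + sum_i w_i x_i
        return f(u)             # y = f(u)

    # Example: 3 inputs with arbitrary weights
    y = formal_neuron(x=np.array([0.5, -1.0, 2.0]),
                      w=np.array([0.1, 0.4, -0.3]),
                      w0=0.2)
    print(y)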

Neural Net analysis


Connectionism

Global connections
Formal neurons
Layers

Neural Net analysis


Localness and parallelism of calculations

Localness of information processing

Each neuron reacts only to information coming from the neurons connected to
it, without reference to any global plan of computation.

Parallelism of calculations

Neurons are able to operate in parallel.

Comparison of ANN & BNN

BRAIN:                                 PC (IBM):
  clock rate ~100 Hz                     clock rate ~10^9 Hz
  V_prop = 100 m/s                       V_prop = 3*10^8 m/s
  N = 10^10 - 10^11 neurons              N = 10^9 elements

The degree of parallelism is ~10^14: the brain works like 10^14 processors
with 100 Hz frequency, with ~10^4 of them connected at the same time.

Neural Net analysis


Training based on data (instead of programming)

Absence of a global plan: information is distributed over the network, with a
corresponding adaptation of the neurons.
The algorithm is not set in advance; it is generated by the data.
Each neuron locally changes its adjustable parameters, the synaptic weights.
Training of a network: the network is trained on a small share of all possible
situations (the training patterns), after which the trained network is able to
operate over a much wider range of patterns.
This is the ability to generalize.

Neural Net analysis


Universality of training algorithms

The single principle of learning is to find the minimum of the empirical error:
W - the set of synaptic weights
E(W) - the error function

The task is to find the global minimum of E(W).
Stochastic optimization is a way to avoid getting stuck in a local minimum.
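
To make the stochastic-optimization idea concrete, here is a small sketch (not
from the original slides) of gradient descent with decaying random
perturbations on a toy error function; the function E(w), the noise schedule,
and the learning rate are all assumptions:

    import numpy as np

    def E(w):
        """Toy error function with several local minima (an assumption for the sketch)."""
        return np.sin(3 * w) + 0.1 * w**2

    def dE(w, eps=1e-5):
        """Numerical gradient of E."""
        return (E(w + eps) - E(w - eps)) / (2 * eps)

    rng = np.random.default_rng(0)
    w = 4.0                      # start near a local, not the global, minimum
    lr, noise = 0.05, 1.0
    for step in range(2000):
        # gradient step plus a decaying random perturbation: the noise lets the
        # search jump out of shallow local minima (simulated-annealing style)
        w -= lr * dE(w) + noise * rng.normal() * lr
        noise *= 0.995           # cool down the noise over time
    print("found w =", w, "E(w) =", E(w))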

Neural Net analysis


BASIC NEURAL NETS
Perceptron
Hopfield network
Kohonen maps
Probabilistic neural nets
General regression neural nets
Polynomial nets

Neural Net analysis


The architecture of NN
PROTOTYPES OF ANY NEURAL ARCHITECTURE

RECURRENT - with feedback (Elman-Jordan)
LAYER-BY-LAYER - without feedback

Neural Net analysis


Classification of NN

By type of training:

Supervised (with a teacher):
$E(w) = E\{x, y, y(x, w)\}$

Unsupervised (without a teacher):
$E(w) = E\{x, y(x, w)\}$

In the unsupervised case the network is asked to find the hidden regularities
in the data set on its own. Redundancy of the data allows compression of the
information, and the network can learn to find the most compact representation
of such data, i.e. to perform optimal coding of the given kind of input
information.

Methodology of self-organizing maps


Self-organizing Kohonen maps are a type of neural network trained without a
teacher. The network forms its outputs independently, adapting to the signals
arriving at its input. The only "teacher" of the network is the data itself -
the information contained in it, the regularities that distinguish the input
data from random noise.
Maps combine two types of information compression:

Reduction of the dimensionality of the data with minimal loss of information
Reduction of the variety of the data by selecting a finite set of prototypes
and assigning each data point to one of these types

Methodology of self-organizing maps


Schematic representation of a self-organizing network

Neurons in the output layer are ordered and correspond to the cells of a
two-dimensional map, which can be colored according to the affinity of
attributes.

Hebb training rule


The change of a weight at the presentation of the i-th example is proportional
to its input and output (Hebb, 1949):

$\Delta w_j = \eta\, y\, x_j$

Vector representation: $\Delta \mathbf{w} = \eta\, y\, \mathbf{x}$, where
$\eta$ is the learning rate.

If training is formulated as an optimization problem, a neuron trained by the
Hebb rule strives to increase the amplitude of its output:

$\Delta \mathbf{w} = -\eta\,\frac{\partial E}{\partial \mathbf{w}}, \qquad
E(\mathbf{w}) = -\tfrac{1}{2}\left\langle (\mathbf{w}\cdot\mathbf{x})^2 \right\rangle
= -\tfrac{1}{2}\left\langle y^2 \right\rangle,$

where the averaging is taken over the training sample x.


NB: in this case the error has no minimum. Hebbian training in the form
described above is not useful in practice, since it leads to an unlimited
growth of the weight amplitudes.

Oja training rule


$\Delta w_j = \eta\, y\, (x_j - y\, w_j)$

The added term prevents the unlimited growth of the weights.

Vector representation: $\Delta \mathbf{w} = \eta\, y\, (\mathbf{x} - y\, \mathbf{w})$

The Oja rule maximizes the sensitivity of the neuron's output for a limited
amplitude of the weights. It is easy to verify this by setting the average
change of the weights to zero and then multiplying the right-hand side of the
equality by w: in equilibrium

$\left\langle y^2 \right\rangle \left( 1 - |\mathbf{w}|^2 \right) = 0.$

Thus, the weights of the trained neuron lie on the hypersphere $|\mathbf{w}| = 1$.

During training by the Oja rule the weight vector settles on the hypersphere,
in the direction that maximizes the projection of the input vectors.
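
A small NumPy sketch of the Oja update (not from the slides); the learning
rate and the toy data distribution are assumptions. The weight norm converges
toward 1 and the direction toward the dominant direction of the data:

    import numpy as np

    rng = np.random.default_rng(1)
    # Toy 2-D data stretched along one direction (assumed for the demo)
    X = rng.normal(size=(5000, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])

    w = rng.normal(size=2)           # initial weights
    eta = 0.01                       # learning rate (assumption)
    for x in X:
        y = w @ x                    # linear neuron output y = w . x
        w += eta * y * (x - y * w)   # Oja rule: Hebb term minus the decay y^2 w

    print("|w| =", np.linalg.norm(w))   # close to 1: weights lie on the unit sphere
    print("w =", w)                     # points along the dominant data direction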

Competition of neurons: the winner takes all


Base algorithm:

$y_i = \sum_{j=1}^{d} w_{ij}\, x_j$

Training of the competitive layer:

$\Delta \mathbf{w}_i = \eta\, y_i \left( \mathbf{x} - \sum_k y_k \mathbf{w}_k \right)$

Winner (index of the winning neuron):

$i^* : \; \mathbf{w}_{i^*}\cdot\mathbf{x} \ge \mathbf{w}_i\cdot\mathbf{x} \quad \forall i$

Training of the winner:

$\Delta \mathbf{w}_{i^*} = \eta\, (\mathbf{x} - \mathbf{w}_{i^*})$

If $|\mathbf{w}_i| = 1$, then $\mathbf{w}_{i^*}\cdot\mathbf{x} \ge \mathbf{w}_i\cdot\mathbf{x}$
is equivalent to $|\mathbf{w}_{i^*} - \mathbf{x}| \le |\mathbf{w}_i - \mathbf{x}|$ for all $i$,
i.e. the winner is the neuron giving the greatest response to the given input
stimulus:

$y_{i^*} = 1, \qquad y_i = 0 \;\; \text{for } i \ne i^*$

The winner takes all


One variant of modifying the base training rule of a competitive layer
consists in training not only the winning neuron but also its "neighbors",
though at a smaller rate. Such "pulling up" of the neurons nearest to the
winner is applied in topographic Kohonen maps.

Winner: $i^* : \; |\mathbf{w}_{i^*} - \mathbf{x}| = \min_i |\mathbf{w}_i - \mathbf{x}|$

Modified Kohonen training rule:

$\mathbf{w}_i(t+1) = \mathbf{w}_i(t) + \eta\, \Lambda\!\left(|i - i^*|, t\right)\left( \mathbf{x}(t) - \mathbf{w}_i(t) \right)$

$\Lambda(|i - i^*|, t)$ is the neighborhood function: it equals one for the
winning neuron with index $i^*$ and gradually falls off with distance from the
winner.
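
A minimal sketch of this modified Kohonen rule on a one-dimensional chain of
prototypes (not from the slides); the Gaussian neighborhood, the decay
schedules, and the toy data are assumptions:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.uniform(size=(2000, 3))          # toy 3-D data (assumption)

    n_units = 25                             # 1-D chain of prototypes
    W = rng.uniform(size=(n_units, 3))       # prototype vectors w_i
    idx = np.arange(n_units)

    eta, sigma = 0.5, 5.0                    # learning rate and neighborhood width
    for t, x in enumerate(X):
        winner = np.argmin(np.linalg.norm(W - x, axis=1))        # i* = argmin |w_i - x|
        Lambda = np.exp(-((idx - winner) ** 2) / (2 * sigma**2))  # neighborhood function
        W += eta * Lambda[:, None] * (x - W)                      # Kohonen update
        eta *= 0.999                         # gradually shrink the step
        sigma = max(0.5, sigma * 0.999)      # and the neighborhood radius

    print(W[:5])   # prototypes stretched like an "elastic chain" over the data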

Kohonen training resembles stretching an elastic grid of prototypes over the
data of the training sample.

Two-dimensional topographic map of a set of three-dimensional data:
each point in three-dimensional space falls into the cell of the grid whose
coordinates belong to the neuron of the two-dimensional map nearest to it.

Visualization of the topographic map induced by the i-th component of the
input data, x_i.

A convenient tool for visualizing data is coloring the topographic map,
similar to what is done on ordinary geographical maps. Each data attribute
generates its own coloring of the map cells - by the average value of this
attribute over the data points that fall into the given cell.

Collecting together the maps of all attributes of interest, we obtain a
topographic atlas that gives an integrated representation of the structure of
the multivariate data.

Methodology of self-organizing maps


Classified SOM for the NASDAQ100 index for the period
from 10-Nov-1997 till 27-Aug-2001

[Figure: Ln Y(t) vs. time - change in time of the log-price of shares of
JP Morgan Chase (upper curve) and American Express (lower curve) for the
period from 10-Jan-1994 to 27-Oct-1997]

[Figure: Ln Y(t) vs. time - change in time of the log-price of shares of
JP Morgan Chase (upper curve) and Citigroup (lower curve) for the period
from 10-Nov-1997 to 27-Aug-2001]

How to choose a variant?


Annual prediction

[Figure: annual Caspian Sea level, test segment vs. prediction, 1988-2038]

This is the forecast of the Caspian Sea level.

DATA FILTERS
Custom filters (e.g. Fourier filter)
Adaptive filters (e.g. Kalman filter)
Empirical mode decomposition
Holder exponent

$y(n) = b(1)x(n) + b(2)x(n-1) + \dots + b(n_b+1)x(n-n_b) - a(2)y(n-1) - \dots - a(n_a+1)y(n-n_a)$

Adaptive filters
In what follows, keep in mind that we are going to make forecasts; that is why
we need filters which won't change the phase of the signal.

$y(n) = b(1)x(n) + b(2)x(n-1) + \dots + b(n_b+1)x(n-n_b) - a(2)y(n-1) - \dots - a(n_a+1)y(n-n_a)$

[Figure: direct-form IIR filter - the delayed inputs x(n-1), ..., x(n-n_b) are
weighted by b(2), ..., b(n_b+1), and the delayed outputs y(n-1), ..., y(n-n_a)
are fed back with weights -a(2), ..., -a(n_a+1) through unit delays Z^{-1}]
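
A direct transcription of this difference equation (a sketch, not code from
the talk): with Python's zero-based indexing, b[0] corresponds to b(1) and
a_fb[0] to a(2); the example coefficients are assumptions.

    import numpy as np

    def iir_filter(x, b, a_fb):
        """Direct-form IIR filter:
        y(n) = sum_k b[k] x(n-k) - sum_m a_fb[m] y(n-1-m)."""
        y = np.zeros_like(x, dtype=float)
        for n in range(len(x)):
            acc = 0.0
            for k in range(len(b)):            # feed-forward part b(1)...b(nb+1)
                if n - k >= 0:
                    acc += b[k] * x[n - k]
            for m in range(len(a_fb)):         # feedback part a(2)...a(na+1)
                if n - 1 - m >= 0:
                    acc -= a_fb[m] * y[n - 1 - m]
            y[n] = acc
        return y

    # Example: a simple smoothing filter (coefficients chosen arbitrarily)
    x = np.random.default_rng(3).normal(size=200)
    y = iir_filter(x, b=[0.25, 0.25, 0.25, 0.25], a_fb=[-0.1])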

Adaptive filters

[Figure: Siemens share value, adjusted close (scaled), raw vs. filtered]

All the maxima are preserved; there is no phase distortion.
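
Zero-phase filtering is commonly obtained by running a filter forward and then
backward over the signal; a sketch using SciPy's filtfilt (the Butterworth
design parameters and the toy series are assumptions, not the settings used in
the talk):

    import numpy as np
    from scipy.signal import butter, filtfilt

    rng = np.random.default_rng(4)
    price = np.cumsum(rng.normal(size=500))   # toy "price" series (assumption)

    # Low-pass Butterworth filter; filtfilt applies it forward and backward,
    # which cancels the phase shift, so the maxima stay where they are.
    b, a = butter(N=4, Wn=0.05)
    smooth = filtfilt(b, a, price)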

Adaptive filters
Let's try to predict the next value using the zero-phase filtered series,
given the historical price information.
I used a perceptron with 3 hidden layers, logistic activation function, the
"rotation" training algorithm, 20 minutes of training.

Adaptive filters
Kalman filter

$\hat{x}(n) = a\,\hat{x}(n-1) + k(n)\left[\, y(n) - a\,c\,\hat{x}(n-1) \,\right]$,

where $\hat{x}(n)$ is the filter's estimate of the state,

$x(n) = a\,x(n-1) + w(n-1)$ is the model of the generating signal, $w(n)$ is white noise, and

$y(n) = c\,x(n) + \nu(n)$ is the signal after the neural net, $\nu(n)$ is white noise.

[Figure: Kalman filter block diagram - the gain K(n) scales the innovation
y(n) - a c \hat{x}(n-1), the result is added to the prediction a \hat{x}(n-1),
and the new estimate is fed back through a unit delay Z^{-1}]

Adaptive filters
Let's use the Kalman filter as an error estimator for the forecast of the
zero-phase filtered data.
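
A minimal scalar Kalman filter matching the update above (a sketch only; the
model parameters a, c, the noise variances q, r, and the toy observations are
assumptions):

    import numpy as np

    def kalman_1d(y, a=1.0, c=1.0, q=1e-3, r=1e-1):
        """Scalar Kalman filter for x(n) = a x(n-1) + w(n-1), y(n) = c x(n) + v(n).
        q and r are the variances of the process noise w and observation noise v."""
        x_hat, p = 0.0, 1.0                 # initial state estimate and its variance
        out = []
        for yn in y:
            # predict
            x_pred = a * x_hat
            p_pred = a * a * p + q
            # update: the gain k(n) weighs the innovation y(n) - c * a * x_hat(n-1)
            k = p_pred * c / (c * c * p_pred + r)
            x_hat = x_pred + k * (yn - c * x_pred)
            p = (1 - k * c) * p_pred
            out.append(x_hat)
        return np.array(out)

    y = np.cumsum(np.random.default_rng(5).normal(size=300))   # noisy toy observations
    x_est = kalman_1d(y)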

Empirical Mode Decomposition


What is it?
We can heuristically define a (local) high-frequency part
$\{d(t),\; t_- \le t \le t_+\}$, or local detail, which corresponds to the
oscillation terminating at the two minima and passing through the maximum
which necessarily exists in between them. For the picture to be complete, one
still has to identify the corresponding (local) low-frequency part m(t), or
local trend, so that we have x(t) = m(t) + d(t) for $t_- \le t \le t_+$.

Empirical Mode Decomposition


What is it?

Eventually, the original signal x(t) is first decomposed through the main loop as

$x(t) = d_1(t) + m_1(t),$

and the first residual $m_1(t)$ is itself decomposed as

$m_1(t) = d_2(t) + m_2(t),$

so that

$x(t) = d_1(t) + m_1(t) = d_1(t) + d_2(t) + m_2(t) = \dots = \sum_{k=1}^{K} d_k(t) + m_K(t).$

Empirical Mode Decomposition


Algorithm

Given a signal x(t), the effective algorithm of EMD can be summarized as follows:
1. identify all extrema of x(t)
2. interpolate between minima (resp. maxima), ending up with an envelope
   e_min(t) (resp. e_max(t))
3. compute the mean m(t) = (e_min(t) + e_max(t)) / 2
4. extract the detail d(t) = x(t) - m(t)
5. iterate on the residual m(t)
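
A compact sketch of one sifting pass and the outer EMD loop (illustrative
only; the fixed iteration counts used as stopping criteria are simplified
assumptions, and SciPy cubic splines are used for the envelopes):

    import numpy as np
    from scipy.signal import argrelextrema
    from scipy.interpolate import CubicSpline

    def sift_once(x, t):
        """One sifting pass: detail d(t) = x(t) - mean of the two envelopes."""
        maxima = argrelextrema(x, np.greater)[0]
        minima = argrelextrema(x, np.less)[0]
        if len(maxima) < 3 or len(minima) < 3:
            return None                                # too few extrema: x is the trend
        e_max = CubicSpline(t[maxima], x[maxima])(t)   # upper envelope
        e_min = CubicSpline(t[minima], x[minima])(t)   # lower envelope
        m = (e_max + e_min) / 2.0                      # local trend m(t)
        return x - m                                   # local detail d(t)

    def emd(x, t, n_imfs=4, n_sift=8):
        """Very simplified EMD: fixed number of sifting iterations per IMF."""
        imfs, residual = [], x.copy()
        for _ in range(n_imfs):
            d = residual.copy()
            for _ in range(n_sift):
                d_new = sift_once(d, t)
                if d_new is None:
                    return imfs, residual
                d = d_new
            imfs.append(d)
            residual = residual - d        # iterate on the residual m(t)
        return imfs, residual

    t = np.linspace(0, 1, 512)
    x = np.sin(2 * np.pi * 5 * t) + np.sin(2 * np.pi * 40 * t**2)   # tone + chirp
    imfs, res = emd(x, t)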

[Figures: EMD sifting of a "tone + chirp" test signal - IMF 1 over sifting
iterations 0-8, IMF 2 over sifting iterations 0-5, the running residue after
each step, and the final decomposition into imf1-imf6 plus the residual]

Empirical Mode Decomposition


Let's do it for the Siemens index

Empirical Mode Decomposition


Let's do it for the Siemens index

All strong maxima are preserved and there is no phase distortion.

Empirical Mode Decomposition


Let's make a forecast for the Siemens index

There was no delay in the forecast at all!

Holder exponent
The main idea is as follows. Hölder showed that, for a function f(t) on its
domain D_f,

$| f(t + \Delta t) - f(t) | \le \mathrm{const}\cdot(\Delta t)^{\alpha(t)}, \qquad \alpha(t) \in [0, 1].$

$\alpha \to 0$ means that we have a discontinuity of the second kind;
$\alpha \to 1$ means that the increment behaves as $O(\Delta t)$.

So this formula is a kind of bridge between "bad" (irregular) functions and
"good" (smooth) functions. Looking at this formula more closely, we notice
that we can catch the moments in time when our function "knows" that it is
going to change its behavior from one regime to another. It means that today
we can make a forecast of tomorrow's behavior. One should mention, though,
that we don't know the sign of the coming change in behavior.
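
One rough way to estimate a local Hölder exponent is to regress
log |f(t + dt) - f(t)| against log dt over a range of scales; the window sizes
and this regression-based estimator are assumptions for the sketch, not the
method used in the talk (dedicated tools such as FracLab, listed at the end,
are normally used for this):

    import numpy as np

    def local_holder(f, i, scales=(1, 2, 4, 8, 16)):
        """Estimate alpha(t_i) from |f(t_i + dt) - f(t_i)| ~ const * dt^alpha
        by a log-log least-squares fit over several increments dt."""
        dts, incs = [], []
        for s in scales:
            if i + s < len(f):
                inc = abs(f[i + s] - f[i])
                if inc > 0:
                    dts.append(np.log(s))
                    incs.append(np.log(inc))
        if len(dts) < 2:
            return np.nan
        slope, _ = np.polyfit(dts, incs, 1)   # slope of log-increment vs. log-scale
        return slope                          # estimate of alpha(t_i)

    # Toy signal whose roughness changes halfway through (an assumption)
    rng = np.random.default_rng(6)
    x = np.concatenate([np.cumsum(rng.normal(size=500)),   # rough part
                        np.linspace(0, 50, 500)])          # smooth part
    alphas = np.array([local_holder(x, i) for i in range(len(x) - 20)])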

Results

Thank you!
Any QUESTIONS?
SUGGESTIONS?
IDEAS?
Software I'm using:
1) MatLab
2) NeuroShell
3) FracLab
4) Statistica
5) Builder C++
