The document provides instructions for a neural networks homework assignment. Students are asked to:
1) Generate training data for a 2D nonlinear function and split it into training, validation, and test sets.
2) Train a multi-layer perceptron on the function using backpropagation, experimenting with different network sizes, initialization parameters, and learning rates.
3) Analyze learning curves and report on how to choose the network size based on factors like training error reduction and avoiding overfitting.
K. N. Toosi University of Tech.
Control Department, fall 2009 - Neural Networks
Instructor: Dr. Mohammad Teshnehlab
TA: M. Ahmadieh Khanesar, email: [email protected]
Neural Networks Problem Set #1 (Due before class on Wednesday, 1388/8/6)
Back-propagation Algorithm

In this problem, we will investigate two identification problems. Consider a 2-dimensional nonlinear function that is fully describable by the following neural network (two inputs, four tansig hidden neurons, and a linear output):

f(x) = w_2 \,\mathrm{tansig}(w_1 x + b_1) + b_2

w_1 = \begin{bmatrix} 0.5 & 0.5 \\ 0.1 & 0.1 \\ 0.2 & 0.1 \\ 0.6 & 0.5 \end{bmatrix},\quad b_1 = \begin{bmatrix} 0.3 \\ 0.7 \\ 1 \\ 1 \end{bmatrix},\quad w_2 = \begin{bmatrix} 0.2 & 0.2 & 0.5 & 0.5 \end{bmatrix},\quad b_2 = 0.5
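The following is a minimal Python sketch (not part of the original assignment) of this target network, assuming the parameter matrices as reconstructed above; MATLAB's tansig is the hyperbolic tangent, so numpy.tanh is used in its place. The name target_net is illustrative.

```python
import numpy as np

# Target network parameters (as reconstructed from the problem statement above).
w1 = np.array([[0.5, 0.5],
               [0.1, 0.1],
               [0.2, 0.1],
               [0.6, 0.5]])          # 4 hidden neurons, 2 inputs
b1 = np.array([0.3, 0.7, 1.0, 1.0])  # hidden-layer biases
w2 = np.array([0.2, 0.2, 0.5, 0.5])  # output weights (1 output, 4 hidden neurons)
b2 = 0.5                             # output bias

def target_net(x):
    """Evaluate f(x) = w2 * tansig(w1 x + b1) + b2 for a 2-dimensional input x."""
    h = np.tanh(w1 @ x + b1)   # tansig hidden-layer activations
    return w2 @ h + b2         # scalar (linear) output

# Example: evaluate the target function at one input point.
print(target_net(np.array([0.1, -0.2])))
```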
Generate 1000 pairs of input data and compute the corresponding outputs. Determine the input range for which the tansig functions do not saturate. Use 50 samples for validation; of the remaining samples, use 2/3 for training and 1/3 for testing. Write your code so that in each epoch you cycle through every individual sample in the training set (making an online update after each one) and through every individual sample in the test set (no updates, just record the error). We will look at the epoch squared error: the squared error for an epoch is simply the sum of the squared errors over all samples in the data set. This is precisely the function that is minimized in the gradient descent calculation.

For the multi-layer perceptron with one hidden layer, experiment with different numbers n of hidden neurons, different initial conditions for the weights and biases, and different learning rates. Settle on particular choices of these parameters and run the backpropagation algorithm until convergence. As your final result, write a report on how to choose the size of the network. Learning curves of the training squared error and the test squared error as functions of the epoch number must be included in the report. Try to extract as much information from these curves as possible; for example, try to answer the following questions (a minimal sketch of the data generation and training loop is given after this list):
- For your best parameters, how does the squared error evolve over time for the training set? What about bad parameters? What about the test set? How do the two curves compare?
- Is there any case in which the parameters of the neural network do not converge to their optimal values but the identification error is still good?
- Is it possible to compensate for a wrong number of neurons with a good choice of the number of epochs?
- Regarding the bias parameters: is it possible to omit them, especially the biases of the output layer, by using more neurons? If yes, provide simulations; if not, provide reasons.
- Regarding the input data used for identification: does convergence to the optimal values depend on it?
- What can you add to the above questions?
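The sketch below continues the previous one (it reuses target_net) and shows one possible reading of the data split and of an online backpropagation loop that records the epoch squared errors. The sampling range, the number of hidden neurons, the learning rate, and the number of epochs are placeholders for you to experiment with; this is not the required implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Data generation: 1000 input pairs; the range is a placeholder, check saturation yourself ---
X = rng.uniform(-1.0, 1.0, size=(1000, 2))
Y = np.array([target_net(x) for x in X])      # target_net from the previous sketch

# --- Split: 50 validation samples, then 2/3 train / 1/3 test of the remainder ---
X_val, Y_val = X[:50], Y[:50]                 # reserved for model selection (e.g. choosing n)
X_rest, Y_rest = X[50:], Y[50:]
n_train = int(len(X_rest) * 2 / 3)
X_train, Y_train = X_rest[:n_train], Y_rest[:n_train]
X_test,  Y_test  = X_rest[n_train:], Y_rest[n_train:]

# --- MLP with one hidden layer: y = v @ tanh(W x + b) + c ---
n_hidden, lr, n_epochs = 4, 0.01, 200         # placeholders; experiment with these
W = rng.normal(scale=0.5, size=(n_hidden, 2))
b = np.zeros(n_hidden)
v = rng.normal(scale=0.5, size=n_hidden)
c = 0.0

train_curve, test_curve = [], []
for epoch in range(n_epochs):
    train_se = 0.0
    for x, y in zip(X_train, Y_train):        # online update after each training sample
        h = np.tanh(W @ x + b)
        y_hat = v @ h + c
        e = y_hat - y
        train_se += e ** 2
        # Gradients of 0.5 * e**2 (the conventional 1/2 only rescales the learning rate).
        dv, dc = e * h, e
        dh = e * v * (1.0 - h ** 2)           # backpropagate through tanh
        dW, db = np.outer(dh, x), dh
        v -= lr * dv
        c -= lr * dc
        W -= lr * dW
        b -= lr * db
    # Test pass: record the error only, no updates.
    test_se = sum((v @ np.tanh(W @ x + b) + c - y) ** 2 for x, y in zip(X_test, Y_test))
    train_curve.append(train_se)
    test_curve.append(test_se)

# train_curve and test_curve are the learning curves to plot and discuss in the report.
```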
Here is a new function for identification. Carry out the procedure described above for the following function:

f(x, y) = \frac{\sin(x)\,\sin(y)}{x\,y}, \qquad x, y \in [-2, 2]

Use the rules of thumb quoted below for this function and investigate them. Which one is violated here?
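For this second problem only the data generation differs; a brief sketch is given below, with the random sampling over [-2, 2] x [-2, 2] being an assumption. numpy.sinc(t) computes sin(pi t)/(pi t), so rescaling its argument gives sin(x)/x and handles the removable singularity at zero.

```python
import numpy as np

rng = np.random.default_rng(1)

def f2(x, y):
    """f(x, y) = sin(x)*sin(y)/(x*y) on [-2, 2] x [-2, 2]; equals 1 in the limit x, y -> 0."""
    return np.sinc(x / np.pi) * np.sinc(y / np.pi)

XY = rng.uniform(-2.0, 2.0, size=(1000, 2))   # 1000 input pairs, assumed uniform sampling
Z = f2(XY[:, 0], XY[:, 1])                    # corresponding targets
```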
Subject: How many hidden units should I use?

The best number of hidden units depends in a complex way on:
- the numbers of input and output units
- the number of training cases
- the amount of noise in the targets
- the complexity of the function or classification to be learned
- the architecture
- the type of hidden unit activation function
- the training algorithm
- regularization

In most situations, there is no way to determine the best number of hidden units without training several networks and estimating the generalization error of each. If you have too few hidden units, you will get high training error and high generalization error due to underfitting and high statistical bias. If you have too many hidden units, you may get low training error but still have high generalization error due to overfitting and high variance. Geman, Bienenstock, and Doursat (1992) discuss how the number of hidden units affects the bias/variance trade-off.

Some books and articles offer "rules of thumb" for choosing an architecture; for example:
- "A rule of thumb is for the size of this [hidden] layer to be somewhere between the input layer size ... and the output layer size ..." (Blum, 1992, p. 60).
- "To calculate the number of hidden nodes we use a general rule of: (Number of inputs + outputs) * (2/3)".
- "You will never require more than twice the number of hidden units as you have inputs" in an MLP with one hidden layer (Swingler, 1996, p. 53).
- "How large should the hidden layer be? One rule of thumb is that it should never be more than twice as large as the input layer." (Berry and Linoff, 1997, p. 323).
- "Typically, we specify as many hidden nodes as dimensions [principal components] needed to capture 70-90% of the variance of the input data set." (Boger and Guterman, 1997).

Reference: www.faqs.org/faqs/ai-faq/neural-nets/part3/section-10.html
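As a quick illustration (not part of the FAQ), the quotable rules of thumb above can be evaluated numerically for this assignment's 2-input, 1-output networks; the numbers are only starting points for your experiments, and deciding which rule the second function violates is left to your report.

```python
n_inputs, n_outputs = 2, 1

# Blum (1992): hidden-layer size somewhere between input and output layer sizes.
blum_range = (min(n_inputs, n_outputs), max(n_inputs, n_outputs))   # (1, 2)

# "(Number of inputs + outputs) * (2/3)"
two_thirds_rule = round((n_inputs + n_outputs) * 2 / 3)             # 2

# Swingler (1996) / Berry and Linoff (1997): never more than twice the number of inputs.
upper_bound = 2 * n_inputs                                          # 4

print(blum_range, two_thirds_rule, upper_bound)
```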