2021 Homework3 Introduction
Make sure to read from start to finish before beginning the assignment.
1 Homework
This homework contains three parts:
1. Analytical: These analytical questions will consider topics from the course. These
will include mathematical derivations and analyses. Your answers will be entirely
based on written work, i.e. no programming.
2. Programming: The goal of this programming assignment is for you to get familiar
with PyTorch and train classifiers to recognize simple images of clothing articles
using a subset of the FashionMNIST dataset.
3. Practicum: In the practicum portion of the assignment, you will try out three types of
simple visualizations related to the training of your neural networks.
Click here for the Practicum Google Colab Notebook
2.1 Requirements
You will need Python 3.6. The Python libraries that you will need (and are allowed to
use) are given to you in the requirements.txt file. Install them using pip3:
pip3 install -r requirements.txt.
Fall 2021 CS 475/675 Machine Learning: Homework 1 2
2.2 Suggestions
Since some of these experiments will take some time to run, and will eat up CPU usage,
you might prefer to run things on the Computer Science undergraduate or graduate grid,
including the Jupyter notebook. However, this is not required as you should be able to
run everything on your own computer. Running on the grid will make it easier to run
multiple experiments in parallel.
Additionally, since we will be exploring many parameter settings for each model, you
may want to store the models while you explore parameters, and then turn in the model
with the best performance. This way you don’t need to retrain a model once you’ve
decided on the best hyperparameters for that model.
2.3 Data
The data we will use for this assignment is a subset of the FashionMNIST dataset. Our
version consists of 70,000 images (7,000 test, 7,000 dev, 56,000 train). Each image is a
28x28 grayscale image of an article of clothing, labeled with one of 10 classes. The classes
are Shirt, Sneaker, Bag, Ankle Boot, T-shirt, Trouser, Pullover, Dress, Coat, and Sandal
(with label numbers in that order). More details are available on the FashionMNIST
website. Note, though, that the order of our labels doesn’t match the order on the website.
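For quick reference when inspecting data or predictions, the label order above can be kept in a lookup list. This sketch assumes labels are numbered 0 through 9 in the order listed; `CLASS_NAMES` and `label_name` are illustrative names, not part of the provided code.

```python
# Assumed: label i corresponds to CLASS_NAMES[i], in the order given in this
# handout (which differs from the order on the FashionMNIST website).
CLASS_NAMES = [
    "Shirt", "Sneaker", "Bag", "Ankle Boot", "T-shirt",
    "Trouser", "Pullover", "Dress", "Coat", "Sandal",
]


def label_name(label: int) -> str:
    """Map an integer label (0-9) to its class name."""
    return CLASS_NAMES[label]


print(label_name(3))  # Ankle Boot
```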
The dataset is provided as 5 separate .npy files. In particular, you should have
train.feats.npy, train.labels.npy, dev.feats.npy, dev.labels.npy, and test.feats.npy.
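The sketch below loads one split with NumPy, following the file naming above. It assumes each row of a .feats.npy file holds one 28x28 image (possibly flattened to 784 values) and that the test split has no labels file; `load_split` is a hypothetical helper, not the provided data loader.

```python
import os

import numpy as np


def load_split(data_dir, split):
    """Load features (and labels, if present) for one split.

    Assumes files named <split>.feats.npy and <split>.labels.npy in data_dir.
    Returns labels=None when no labels file exists (e.g. the test split).
    """
    feats = np.load(os.path.join(data_dir, split + ".feats.npy"))
    labels_path = os.path.join(data_dir, split + ".labels.npy")
    labels = np.load(labels_path) if os.path.exists(labels_path) else None
    # Assumption: one image per row, stored either as 28x28 or flattened to 784.
    feats = feats.reshape(-1, 28, 28)
    return feats, labels
```

For example, `train_x, train_y = load_split("./data/", "train")` and `test_x, _ = load_split("./data/", "test")`.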
You will train your models on the train data and evaluate them on the dev data. As in
past assignments, the unlabeled test data is there only so you can make sure your model
runs on it without crashing. Ultimately, we will run your models on test data (with actual
labels) and evaluate their output.
The data loader we have provided expects all these files to be in the same directory,
and loads them by name, so if you rename these files or move them into separate directories,
be sure to change the data loader, or update the way you load data.
Specifically, when we run your models on test data, the command will be (for instance
on a saved ff model):
python3 main.py predict --data-dir ./data/ --model-save ff.torch --predictions-file ff-preds
Before you submit a saved model, make sure you can run this command (with the
correct data directory) and that it outputs valid test predictions.
main.py also has the following generic hyperparameters (hyperparameters used in the
training of all models):
• --model: This is the type of model to train. There are 3 models we will build:
simple-ff, simple-cnn, and best.
• --train-steps: The number of steps to train on. Each step trains on one batch.
• --batch-size: The number of examples to include in your batch.
• --learning-rate: The learning rate to use with your optimizer during training.
You should not change these parameters, but you may add new ones to experiment
with different models; if you do, you will also need to modify the train function in
main.py. You can also change the training loop to improve it. As long as we can load your
stored model using your test method, you can train it however you’d like.
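As an illustration of how --train-steps, --batch-size and --learning-rate fit together, here is a minimal step-based training loop. It is a sketch, not the provided train function: the name `run_training`, the use of Adam, and uniform random batch sampling are all assumptions.

```python
import numpy as np
import torch
import torch.nn.functional as F


def run_training(model, train_x, train_y, train_steps=100, batch_size=32,
                 learning_rate=1e-3, seed=0):
    """Hypothetical loop: each step trains on one randomly sampled mini-batch."""
    rng = np.random.RandomState(seed)
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    losses = []
    for _ in range(train_steps):
        # Sample batch_size distinct example indices for this step.
        idx = rng.choice(len(train_x), size=batch_size, replace=False)
        x = torch.as_tensor(train_x[idx], dtype=torch.float32)
        y = torch.as_tensor(train_y[idx], dtype=torch.long)
        logits = model(x)                  # (batch_size, 10) raw scores
        loss = F.cross_entropy(logits, y)  # softmax is applied inside the loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses
```

Sampling with replacement or iterating over shuffled epochs are equally reasonable choices; what matters for the flags is that one step means one optimizer update on one batch.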
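The file also has a few model-specific arguments, which you will likely need to extend
when building your best model. The feed-forward model has one model-specific
hyperparameter:
• --ff-hunits: The number of hidden units in feed-forward layer 1.
And the CNN model has three model-specific hyperparameters:
• --cnn-n1-channels: The number of output channels of the first convolutional layer.
• --cnn-n1-kernel: The kernel size of the first convolutional layer.
• --cnn-n2-kernel: The kernel size of the second convolutional layer.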
The values of these parameters, and any additional parameters you add, are up to
you!
If you add additional parameters, they should be optional parameters only. If we run
your train method, we will only provide the parameters listed above. Any other parameters
you add must be optional to avoid an error.
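For example, if main.py parses its flags with Python's argparse (an assumption), a new hyperparameter can be added as an optional flag with a default value, so the grader's command still parses when the flag is omitted. `--best-dropout` is a hypothetical flag, and the `--model` default shown is a placeholder rather than main.py's actual value.

```python
import argparse

parser = argparse.ArgumentParser()
# One of the existing generic flags (default is a placeholder, not main.py's).
parser.add_argument("--model", type=str, default="simple-ff")
# Hypothetical new hyperparameter: optional with a default, so commands that
# do not mention it still run without error.
parser.add_argument("--best-dropout", type=float, default=0.0)

args = parser.parse_args(["--model", "best"])
print(args.model, args.best_dropout)  # best 0.0
```

argparse converts the dashes in `--best-dropout` to an underscore, so the value is read as `args.best_dropout`.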
You should start this homework in the Jupyter Notebook we provide. You will be
asked to walk through some initial data visualization cells, so that you can become familiar
with the data. Eventually you will need to return to your code, and implement a model in
order to continue using the notebook.
The notebook serves two purposes. First, the notebook will help you debug your
model while you are building it. That is, each model has a portion of the notebook devoted
to plotting loss and accuracy, as well as doing hyperparameter sweeps. Once you have
implemented those cells, you can use them to help you further debug your models and
make decisions about hyperparameter values.
Make sure that you have worked through the entire notebook and run all cells before
you finish. We will grade your notebook.
Each model outputs a vector of 10 scores, one per class. These scores will be passed
directly to a loss function. In this homework, we will use cross-entropy as our loss
function, exactly the same loss we used for our multiclass logistic regression model in
homework 1. PyTorch’s cross-entropy loss function applies the softmax internally, so you
never need to call the softmax function directly.
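To see why you never call softmax yourself, the sketch below checks that `torch.nn.functional.cross_entropy` applied to raw scores equals the mean negative log of the softmax probability assigned to the correct class. The scores and labels are random placeholders. A practical consequence is that your model's forward function should return unnormalized scores; applying softmax before the loss would apply it twice.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(4, 10)          # raw model outputs (logits) for 4 examples
labels = torch.tensor([0, 3, 9, 2])  # true class indices

# What you call during training: softmax is handled inside cross_entropy.
loss = F.cross_entropy(scores, labels)

# The same quantity written out: mean negative log-probability of the
# correct class under a softmax over the 10 scores.
manual = -F.log_softmax(scores, dim=1)[torch.arange(4), labels].mean()

print(torch.allclose(loss, manual))  # True
```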
This model (see Figure 1) uses one linear layer to map the inputs to n hidden units
(n is the value of --ff-hunits) with ReLU activations, and another linear layer to map the
hidden units to vectors of length 10. You can apply ReLU activations to the output of the
first layer with torch.nn.functional.relu.
2. When you have finished implementing the model, train it via main.py. An example
command might be: python code/main.py train --data-dir data --log-file logs/ff-logs.csv
--model-save models/ff.torch --model simple-ff
3. Complete sections 2.1 and 3.1 of the notebook. Use these cells to help you debug
the model if you are not getting good development accuracy (around 80%).
4. Save your version of the model that uses default hyperparameters for all values
except the learning rate. Use the best learning rate you found in section 3.1 as the
learning rate for the model you submit. Name the model you submit ff.torch.
Run this model on the test data using predict mode, and name the predictions
ff-predictions.txt. Submit both the model file and the predictions file.
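A minimal sketch of the feed-forward architecture described above, assuming each example arrives either as a 28x28 array or already flattened to 784 values. `SimpleFFSketch` and its default of 100 hidden units are illustrative; the class you complete in models.py may have a different name and constructor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleFFSketch(nn.Module):
    """Hypothetical stand-in for the feed-forward model in models.py.

    n_hidden plays the role of --ff-hunits.
    """

    def __init__(self, n_hidden=100, n_inputs=28 * 28, n_classes=10):
        super().__init__()
        self.hidden = nn.Linear(n_inputs, n_hidden)  # inputs -> n hidden units
        self.out = nn.Linear(n_hidden, n_classes)    # hidden units -> 10 scores

    def forward(self, x):
        # Flatten each image into a 784-dimensional vector.
        x = x.reshape(x.shape[0], -1)
        h = F.relu(self.hidden(x))
        return self.out(h)  # raw scores; cross-entropy applies softmax


model = SimpleFFSketch(n_hidden=50)
print(model(torch.zeros(3, 28, 28)).shape)  # torch.Size([3, 10])
```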
Let’s look at how the word ‘channel’ is used in images and convolutional neural
networks.
Color images usually consist of 3 channels, corresponding to the primary colors red,
green, and blue. Thus images are not really two-dimensional objects but rather third-order
tensors, characterized by a height, a width, and the pixel intensity in each of the three
color channels. (Our grayscale FashionMNIST images have a single channel.) Alternatively,
the channel dimension can be regarded as assigning a multidimensional representation to
each pixel location.
Convolutional neural networks have to adapt accordingly to deal with multiple channels of
information. We can represent the derived features in a CNN’s hidden layers as channels
too, i.e., as third-order tensors. That is to say, instead of having just a single kind of
hidden representation at each location, we would like to represent each location in
multiple different ways. Intuitively, you can imagine that some channels in a CNN focus on
different shapes of edges while others focus on different types of textures, with
higher-level features built from different combinations of the input channels.
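In PyTorch, a batch of images is stored as a tensor of shape (batch, channels, height, width), and `torch.nn.Conv2d` maps a given number of input channels to a chosen number of output channels. The short sketch below shows these shapes; the batch size of 8 and the 16 output channels are arbitrary.

```python
import torch
import torch.nn as nn

gray = torch.zeros(8, 1, 28, 28)  # 8 grayscale images: one channel each
rgb = torch.zeros(8, 3, 28, 28)   # 8 color images: red, green, blue channels

# Each conv layer maps in_channels -> out_channels; every output channel is a
# separate learned feature map computed from all of the input channels.
conv_gray = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
conv_rgb = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

print(conv_gray(gray).shape)  # torch.Size([8, 16, 28, 28])
print(conv_rgb(rgb).shape)    # torch.Size([8, 16, 28, 28])
# Weights have shape (out_channels, in_channels, kernel_h, kernel_w).
print(conv_rgb.weight.shape)  # torch.Size([16, 3, 3, 3])
```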
You may refer to online resources for a better understanding of CNNs (e.g., A guide for
beginners, CS230 by Stanford).
We’ve provided the variables you should use for this model. The first CNN layer is a
k × k convolution that takes in an image with one channel and outputs an image with c
channels (where k is the value of --cnn-n1-kernel and c is the value of --cnn-n1-channels).
The second conv layer is a k2 × k2 convolution that takes in an image with c channels and
outputs an image with 10 channels (where k2 is the value of --cnn-n2-kernel). The output
image has approximately half the height and half the width of its input because of the
stride of 2.
You can see that this model has 3 hyperparameters (k, k2, c) in addition to the learning
rate, optimizer, batch size, and number of training iterations. Use the Jupyter notebook
to help select a good number of channels for our first CNN layer.
2. Complete sections 2.2 and 3.2 of the notebook using this model. You should use
these to help you select good parameters for this model. In particular, this model will
take many more steps to train than the previous model. Look at the graph of the dev
loss over training steps. Does it look like it is still trending downward? That
means you need to train longer! Try to find a setting that at least matches your ff
model on dev data.
3. Save your best CNN model as cnn.torch and the test predictions it makes as
cnn-predictions.txt. Submit both the model file and the predictions file.
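The sketch below is one hypothetical way to realize the two-layer CNN described above, together with the standard output-size formula for a convolution (as documented for `torch.nn.Conv2d`, with dilation 1), which shows why a stride of 2 roughly halves the height and width. The class name, the padding of k // 2, placing the stride on the second layer, and the final max over spatial positions (which turns the 10 output channels into 10 class scores) are all assumptions; the provided models.py may do this differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_out_size(size, kernel, stride=1, padding=0):
    """Output height/width of a convolution with dilation 1:
    floor((size + 2 * padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


class SimpleCNNSketch(nn.Module):
    """Hypothetical stand-in for the CNN in models.py."""

    def __init__(self, n1_channels=16, n1_kernel=3, n2_kernel=3):
        super().__init__()
        # k x k conv: 1 input channel -> c channels (--cnn-n1-kernel, --cnn-n1-channels).
        # padding = k // 2 keeps the 28x28 size when k is odd (assumption).
        self.conv1 = nn.Conv2d(1, n1_channels, n1_kernel, padding=n1_kernel // 2)
        # k2 x k2 conv: c channels -> 10 channels with stride 2 (--cnn-n2-kernel).
        self.conv2 = nn.Conv2d(n1_channels, 10, n2_kernel, stride=2,
                               padding=n2_kernel // 2)

    def forward(self, x):
        # Accept (N, 784) or (N, 28, 28) input; Conv2d expects (N, C, H, W).
        x = x.reshape(-1, 1, 28, 28)
        x = F.relu(self.conv1(x))
        x = self.conv2(x)                 # (N, 10, ~14, ~14)
        # Assumption: take the max over all spatial positions of each channel.
        x = F.adaptive_max_pool2d(x, 1)   # (N, 10, 1, 1)
        return x.reshape(x.shape[0], -1)  # (N, 10) raw class scores


print(conv_out_size(28, kernel=3, stride=2, padding=1))  # 14
print(conv_out_size(28, kernel=3, stride=2, padding=0))  # 13
model = SimpleCNNSketch(n1_channels=8, n1_kernel=5, n2_kernel=3)
print(model(torch.zeros(4, 784)).shape)  # torch.Size([4, 10])
```

Averaging instead of taking the max over positions is an equally plausible way to reduce each channel to a single score.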
mini-batch sizes, the number of layers and/or the number of hidden units / number of
filters per layer; include dropout if you’d like; etc. You can even go the extra mile with
techniques such as data augmentation, where input images may be randomly cropped,
translated, blurred, and/or rotated. We’ve added the scikit-image package to the
requirements.txt file, in case you want to take this approach.
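As an example of the data augmentation mentioned above, random translation can be done with NumPy alone by zero-padding each image and taking a random 28x28 crop. This is a sketch under assumptions: images form an (N, 28, 28) array, the name `random_shift` is hypothetical, and the maximum shift of 2 pixels is arbitrary. Apply it only to training batches, never to dev or test data.

```python
import numpy as np


def random_shift(images, max_shift=2, rng=None):
    """Randomly translate each image by up to max_shift pixels per axis.

    Implemented as zero-padding followed by a random crop of the original size.
    """
    rng = np.random if rng is None else rng
    n, h, w = images.shape
    pad = ((0, 0), (max_shift, max_shift), (max_shift, max_shift))
    padded = np.pad(images, pad, mode="constant")  # zeros around each image
    out = np.empty_like(images)
    for i in range(n):
        # Crop offsets in [0, 2 * max_shift]; an offset of max_shift means no shift.
        dy, dx = rng.randint(0, 2 * max_shift + 1, size=2)
        out[i] = padded[i, dy:dy + h, dx:dx + w]
    return out
```

For example, `batch_x = random_shift(batch_x, max_shift=2)` inside the training loop gives each step a slightly different view of the same images.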
However, there are a couple of limitations: You may not add additional training
data, and you must be able to store your model in a .torch file no bigger than
1MB.
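To check the 1MB limit before submitting, you can serialize a model in memory and measure it. As a rule of thumb, float32 parameters take 4 bytes each, so 1MB allows roughly 250,000 parameters (taking 1MB as 1,000,000 bytes, a conservative assumption). The sketch measures a state_dict; if main.py saves the whole model object, the real .torch file will be slightly larger, so also check the actual file (for example with os.path.getsize). The small model here is only an example.

```python
import io

import torch
import torch.nn as nn


def saved_size_bytes(model):
    """Serialize the model's state_dict into memory and return its size in bytes."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return len(buffer.getvalue())


model = nn.Sequential(nn.Linear(784, 100), nn.ReLU(), nn.Linear(100, 10))
n_params = sum(p.numel() for p in model.parameters())
print(n_params)                             # 79510
print(saved_size_bytes(model) < 1_000_000)  # True
```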
Hints. You may want to focus on convolutional networks, since they are especially
well suited to processing images. You may often find that training is just too slow (for
example, validation accuracy fails to climb after 5 minutes). It is up to you to cut
experiments off early: if training in a reasonable amount of time isn’t viable, try changing
your network or hyperparameters to speed things up. In addition, earlier we repeatedly
asked that you vary other parameters as necessary to maximize performance. There is an
enormous number of possible configurations, involving optimizers, learning rates,
mini-batch sizes, etc. Our advice is to find settings that consistently work well across
simple architectures, and to adjust these settings only if training and validation curves
suggest that you should do so. In particular, remember that Adam reduces (though it does
not eliminate) the need for manual learning-rate tuning, and that both very small and very
large minibatches lead to slow training, so striking a balance is important.
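One simple way to cut experiments off early is patience-based early stopping: evaluate dev accuracy periodically and stop once it has not improved for a few checks in a row. `EarlyStopper` is a hypothetical helper, not part of the provided code, and the patience value is arbitrary.

```python
class EarlyStopper:
    """Signal a stop once dev accuracy fails to improve `patience` checks in a row."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("-inf")
        self.bad_checks = 0

    def should_stop(self, dev_accuracy):
        if dev_accuracy > self.best:
            # New best: remember it and reset the counter.
            self.best = dev_accuracy
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


stopper = EarlyStopper(patience=2)
history = [0.70, 0.75, 0.74, 0.75, 0.73]
print([stopper.should_stop(acc) for acc in history])
# [False, False, False, True, True]
```

In a training loop you would call `should_stop` after each dev evaluation and break out when it returns True, keeping the model saved at the best dev accuracy.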
1. Complete the BestNN model in models.py. You need to define the features for this
model, as well as the forward function.
2. Train this network. Remember, it’s up to you to ensure that you can train your
model in a reasonable amount of time!
3. Complete sections 2.3 and 3.3 of the notebook using this model. As always, you
should use these cells to give you information about which values you should use for
certain hyperparameters of your model, and to help you debug your model.
4. Save your best model as best.torch and its test predictions as best-predictions.txt.
Submit both the model file and the predictions file.
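As one possible starting point for the best model, here is a hypothetical small CNN that stacks two convolution blocks with batch normalization and max pooling, followed by dropout and a linear layer. The architecture, the name `BestNNSketch`, and the dropout rate are illustrative assumptions, not a required design; it has roughly 50,000 parameters, well under the 1MB limit.

```python
import torch
import torch.nn as nn


class BestNNSketch(nn.Module):
    """Hypothetical two-block CNN for 28x28 grayscale images."""

    def __init__(self, dropout=0.25):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 28x28 -> 14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(64 * 7 * 7, 10),  # raw scores for the 10 classes
        )

    def forward(self, x):
        x = x.reshape(-1, 1, 28, 28)  # accept (N, 784) or (N, 28, 28)
        x = self.features(x)
        return self.classifier(x.reshape(x.shape[0], -1))
```

Because this model uses BatchNorm and Dropout, call `model.train()` before training steps and `model.eval()` before evaluating on dev or test data; otherwise predictions will be noisy and depend on the batch.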
To receive full credit, you need to achieve a test accuracy which indicates that you
have put a sufficient amount of effort into building a good classifier. This should improve
over the best model in previous sections by a reasonable amount.
Your .torch models must be named ff.torch, cnn.torch, and best.torch. When the code
loads a saved torch model, it expects the respective class to be unchanged, so please
make sure that the version of the code you submit contains the correct version of the
class for each model, so that each model file can be loaded without error. You can test
this by running main.py in test mode on each of the models.
The prediction files should similarly be named ff-predictions.txt, cnn-predictions.txt,
best-predictions.txt.
Your code must be uploaded as code.zip with your code, models, and predictions in
the root directory. By ‘in the root directory,’ we mean that the zip should contain
*.py at the root (./*.py) and not in any sort of substructure (for example hw1/*.py).
One simple way to achieve this is to zip using the command line, where you include
files directly (e.g., *.py) rather than specifying a folder (e.g., hw1):
zip code.zip *.py *.torch *-predictions.txt
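Before uploading, you can verify the archive layout with Python's standard zipfile module. `check_submission` is a hypothetical helper: it reports entries that sit inside a folder and any of the required model and prediction files that are missing; it does not check that your .py files are present.

```python
import zipfile

REQUIRED_FILES = [
    "ff.torch", "cnn.torch", "best.torch",
    "ff-predictions.txt", "cnn-predictions.txt", "best-predictions.txt",
]


def check_submission(zip_path):
    """Return (nested_entries, missing_files) for a submission zip.

    Both lists should be empty: every entry must be at the root, and all
    required model and prediction files must be present.
    """
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
    # Zip archives always use "/" as the path separator.
    nested = [name for name in names if "/" in name]
    missing = [name for name in REQUIRED_FILES if name not in names]
    return nested, missing
```

Usage: `check_submission("code.zip")` should return `([], [])` for a correctly built archive.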
2.10 Questions?
Remember to submit questions about the assignment to the appropriate group on Piazza:
piazza.com/jhu/fall2021/cs601475675.