Course Material For cs391
Recognition
CSC 391: Introduction to Computer Vision
Recognition review
• Recognition tasks
• scene categorization, annotation, detection, activity recognition, parsing
• Object categorization
• Machine learning framework
• training, testing, generalization
• Example classifiers
• Nearest neighbor
• Linear classifiers
The machine learning framework
y = f(x)
where y is the output, f is the prediction function, and x is the image feature
• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set
• Testing: apply f to a never before seen test example x and output the predicted value y = f(x)
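The training and testing steps above can be sketched with a small linear classifier. This is only an illustration under stated assumptions: the prediction error is taken to be squared error, the labels are assumed to be -1 or +1, and the toy data and the names train_linear and predict are made up for this example.

```python
import numpy as np

def train_linear(X, y):
    """Training: estimate f(x) = sign(w . x + b) by minimizing squared error on the training set."""
    # Append a constant 1 to every feature vector so the bias b is learned as part of w.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    # Least-squares solution of Xb @ w ~= y.
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict(w, x):
    """Testing: apply the learned prediction function to one feature vector."""
    score = np.dot(np.append(x, 1.0), w)
    return 1 if score >= 0 else -1

# Toy training set (assumed values): 2-D features with labels in {-1, +1}.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y_train = np.array([-1, -1, -1, 1, 1, 1])

w = train_linear(X_train, y_train)

# Test examples that do not appear in the training set.
pred_low = predict(w, np.array([0.5, 0.5]))
pred_high = predict(w, np.array([3.5, 3.5]))
print(pred_low, pred_high)
```

Squared error is used here only because it has a closed-form minimizer; other classifiers minimize other measures of training error.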
Steps
• Training: training images → image features → training (with training labels) → learned model
• Testing: test image → image features → learned model → prediction
Slide credit: D. Hoiem
Image features
• Spatial support:
• GIST descriptors
• Bags of features
GIST descriptors
• Oliva & Torralba (2001)
http://people.csail.mit.edu/torralba/code/spatialenvelope/
Bags of features
Bag-of-features steps
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”
1. Local feature extraction
Detect patches → Normalize patch → Compute descriptor
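A minimal sketch of the detect, normalize, describe sequence is given below. It rests on simplifying assumptions that are not in the slides: patches are "detected" on a regular grid rather than by an interest point detector, normalization means zero mean and unit norm, and the descriptor is just the flattened normalized patch. The function name and parameters are hypothetical.

```python
import numpy as np

def extract_patch_descriptors(image, patch_size=8, stride=8):
    """Return one descriptor per patch sampled on a regular grid (an assumed stand-in for a detector)."""
    h, w = image.shape
    descriptors = []
    for r in range(0, h - patch_size + 1, stride):
        for c in range(0, w - patch_size + 1, stride):
            # Detect: take the patch at this grid location.
            patch = image[r:r + patch_size, c:c + patch_size].astype(np.float64)
            # Normalize: subtract the mean and scale to unit norm.
            patch = patch - patch.mean()
            norm = np.linalg.norm(patch)
            if norm > 0:
                patch = patch / norm
            # Describe: flatten the normalized patch into a vector.
            descriptors.append(patch.ravel())
    return np.array(descriptors)

# Synthetic 32x32 grayscale image (random values, for illustration only).
rng = np.random.default_rng(0)
image = rng.random((32, 32))
desc = extract_patch_descriptors(image)
print(desc.shape)
```

A practical system would typically swap in a real detector and a richer descriptor; the grid and raw-pixel choices here only keep the example self-contained.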
Clustering
Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
• Assign each feature to the nearest center
• Recompute each cluster center as the mean of all features assigned to it
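The algorithm above can be sketched in a few lines of NumPy. Some details are assumptions added for the example: centers are initialized by picking K distinct features at random, convergence is declared when the centers stop moving, a cluster that loses all its features keeps its old center, and the toy data consists of two synthetic blobs.

```python
import numpy as np

def kmeans(features, k, max_iters=100, seed=0):
    """Cluster feature vectors into k groups; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Randomly initialize K cluster centers by choosing K distinct features.
    idx = rng.choice(len(features), size=k, replace=False)
    centers = features[idx].astype(np.float64)
    for _ in range(max_iters):
        # Assign each feature to the nearest center.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of the features assigned to it.
        new_centers = centers.copy()
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous center
                new_centers[j] = members.mean(axis=0)
        # Stop when the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment with respect to the returned centers.
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)

# Toy data (assumed): two well-separated 2-D blobs.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=10.0, scale=0.5, size=(50, 2))
features = np.vstack([blob_a, blob_b])

centers, labels = kmeans(features, k=2)
print(np.round(centers, 1))
```

In the bag-of-features setting, the inputs would be local descriptors pooled from the training images, and the resulting centers would serve as the visual words.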
Bag-of-features steps
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”
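Steps 3 and 4 can be sketched as follows, assuming a visual vocabulary is already available as an array of cluster centers. The function names, the normalization of counts to frequencies, and the tiny vocabulary and feature values are illustrative assumptions; the sketch also assumes the image has at least one local feature.

```python
import numpy as np

def quantize(features, vocabulary):
    """Step 3: map each local feature to the index of its nearest visual word."""
    dists = np.linalg.norm(features[:, None, :] - vocabulary[None, :, :], axis=2)
    return dists.argmin(axis=1)

def bag_of_features(features, vocabulary):
    """Step 4: represent an image by the frequencies of its visual words."""
    words = quantize(features, vocabulary)
    counts = np.bincount(words, minlength=len(vocabulary)).astype(np.float64)
    return counts / counts.sum()

# Hypothetical vocabulary of three 2-D visual words and four local features from one image.
vocabulary = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
features = np.array([[0.1, -0.2], [9.5, 10.2], [0.3, 0.1], [0.2, 9.9]])

words = quantize(features, vocabulary)          # word indices 0, 1, 0, 2
histogram = bag_of_features(features, vocabulary)  # frequencies 0.5, 0.25, 0.25
print(words, histogram)
```

The resulting histogram has one entry per visual word, so every image is described by a vector of the same length regardless of how many local features it contains.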
Visual vocabularies: Details
• How to choose vocabulary size?
• Too small: visual words not representative of all patches
• Too large: quantization artifacts, overfitting
• Right size is application-dependent
Bags of features for action recognition
Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human
Action Categories Using Spatial-Temporal Words, IJCV 2008.
Credit:
Slide set developed by S. Lazebnik, University of Illinois at Urbana-Champaign