
Image Features for Recognition
CSC 391: Introduction to Computer Vision
Recognition review
• Recognition tasks
  • Scene categorization, annotation, detection, activity recognition, parsing
• Object categorization
• Machine learning framework
  • Training, testing, generalization
• Example classifiers
  • Nearest neighbor
  • Linear classifiers
The machine learning framework

y = f(x)

where x is the image feature, f is the prediction function, and y is the output.

• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set
• Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)
Steps
• Training: training images → image features → training (with training labels) → learned model
• Testing: test image → image features → learned model → prediction

Slide credit: D. Hoiem
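The training/testing pipeline above can be sketched with a minimal nearest-neighbor classifier; the toy feature vectors and labels below are invented for illustration.

```python
import numpy as np

# Minimal sketch of the framework y = f(x) using a
# 1-nearest-neighbor classifier on toy 2-D "image features".

def train(features, labels):
    # "Training" for nearest neighbor just stores the labeled examples.
    return {"features": np.asarray(features, dtype=float),
            "labels": np.asarray(labels)}

def predict(model, x):
    # y = f(x): return the label of the closest stored training feature.
    dists = np.linalg.norm(model["features"] - np.asarray(x, dtype=float), axis=1)
    return model["labels"][np.argmin(dists)]

# Toy training set for two classes.
X_train = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
y_train = ["cat", "cat", "dog", "dog"]

model = train(X_train, y_train)
print(predict(model, [0.05, 0.1]))  # → cat
print(predict(model, [0.95, 1.0]))  # → dog
```

Generalization here depends entirely on the features: a test image is classified correctly only if its feature vector lands near training examples of the right class.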
Image features
• Spatial support options:
  • Pixel or local patch
  • Segmentation region
  • Bounding box
  • Whole image

Image features
• Global image features for whole-image classification tasks
  • GIST descriptors
  • Bags of features
GIST descriptors
• Oliva & Torralba (2001)

http://people.csail.mit.edu/torralba/code/spatialenvelope/
Bags of features
Bag-of-features steps
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”
1. Local feature extraction

• Regular grid or interest regions


1. Local feature extraction

Detect patches → normalize each patch → compute descriptors

Slide credit: Josef Sivic
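The detect → normalize → describe pipeline can be sketched on a regular grid. As a simplification, the "descriptor" here is just the patch's normalized pixel values, a stand-in for a real descriptor such as SIFT.

```python
import numpy as np

# Sketch of local feature extraction on a regular grid:
# detect patches -> normalize each patch -> compute a descriptor.
# The descriptor here is mean/std-normalized raw pixels; real
# pipelines typically use gradient-based descriptors like SIFT.

def extract_patch_descriptors(image, patch_size=8, stride=8):
    H, W = image.shape
    descriptors = []
    for r in range(0, H - patch_size + 1, stride):
        for c in range(0, W - patch_size + 1, stride):
            patch = image[r:r + patch_size, c:c + patch_size].astype(float)
            # Normalize to zero mean, unit variance (guard flat patches).
            std = patch.std()
            patch = (patch - patch.mean()) / (std if std > 1e-8 else 1.0)
            descriptors.append(patch.ravel())
    return np.array(descriptors)

img = np.random.default_rng(0).integers(0, 256, size=(32, 32))
D = extract_patch_descriptors(img)
print(D.shape)  # → (16, 64): a 4x4 grid of 8x8 patches
```

Interest-region detection would replace the regular grid with detected keypoint locations and scales, but the normalize-then-describe steps stay the same.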

2. Learning the visual vocabulary

Clustering the extracted local descriptors produces the visual vocabulary: the cluster centers become the visual words.

Slide credit: Josef Sivic


Review: K-means clustering
• Want to minimize the sum of squared Euclidean distances between features x_i and their nearest cluster centers m_k:

D(X, M) = \sum_{\text{cluster } k} \; \sum_{\text{point } i \in \text{cluster } k} \| x_i - m_k \|^2

Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
• Assign each feature to the nearest center
• Recompute each cluster center as the mean of all features assigned to it
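The two-step loop above can be sketched in a few lines of NumPy; the toy data below is made up for illustration.

```python
import numpy as np

# Minimal K-means sketch matching the algorithm above: random
# initialization, then alternate assignment and mean-update
# steps until the assignments stop changing.

def kmeans(X, K, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Randomly pick K distinct data points as initial centers.
    centers = X[rng.choice(len(X), size=K, replace=False)]
    assign = None
    for _ in range(max_iters):
        # Assignment step: nearest center for each feature.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break  # converged: assignments unchanged
        assign = new_assign
        # Update step: each center becomes the mean of its cluster.
        for k in range(K):
            if np.any(assign == k):
                centers[k] = X[assign == k].mean(axis=0)
    return centers, assign

# Two well-separated toy clusters.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
centers, assign = kmeans(X, K=2)
print(assign)  # points 0-2 share one label, points 3-5 the other
```

Note that K-means only finds a local minimum of D(X, M); results depend on the random initialization, so practical implementations run several restarts.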
Bag-of-features steps
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”
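Steps 3 and 4 can be sketched as follows, assuming a vocabulary has already been learned (e.g., with K-means); the tiny vocabulary and descriptors below are made up for illustration.

```python
import numpy as np

# Steps 3-4: quantize each local descriptor to its nearest
# visual word, then represent the image as a normalized
# histogram of visual-word frequencies.

def bag_of_words(descriptors, vocabulary):
    descriptors = np.asarray(descriptors, dtype=float)
    vocabulary = np.asarray(vocabulary, dtype=float)
    # Nearest visual word for each descriptor.
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    # Word-frequency histogram, normalized to sum to 1.
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

vocab = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])  # 3 visual words
desc = np.array([[0.1, 0.0], [0.9, 1.1], [1.1, 0.9], [2.1, -0.1]])
print(bag_of_words(desc, vocab))  # → [0.25 0.5 0.25]
```

The resulting fixed-length histogram can be fed to any of the classifiers from the review (nearest neighbor, linear classifiers), regardless of how many local features the image produced.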
Visual vocabularies: Details
• How to choose vocabulary size?
• Too small: visual words not representative of all patches
• Too large: quantization artifacts, overfitting
• Right size is application-dependent

• Improving efficiency of quantization
  • Vocabulary trees (Nister and Stewenius, 2005)
• Improving vocabulary quality
  • Discriminative/supervised training of codebooks
  • Sparse coding, non-exclusive assignment to codewords
• More discriminative bag-of-words representations
  • Fisher vectors (Perronnin et al., 2007), VLAD (Jegou et al., 2010)
• Incorporating spatial information
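One common way to incorporate spatial information is a spatial pyramid (Lazebnik et al., 2006): concatenate visual-word histograms computed over the whole image and over grid cells. A minimal two-level sketch, with made-up inputs of the form (row, col, word_index):

```python
import numpy as np

# Two-level spatial pyramid sketch: one histogram for the whole
# image (level 0) plus one per cell of a 2x2 grid (level 1),
# concatenated. Each input feature is (row, col, word_index);
# weighting of pyramid levels is omitted for simplicity.

def spatial_pyramid(features, img_h, img_w, vocab_size):
    feats = np.asarray(features)

    def hist(mask):
        h = np.bincount(feats[mask, 2], minlength=vocab_size).astype(float)
        return h / max(h.sum(), 1.0)  # avoid division by zero in empty cells

    hists = [hist(np.ones(len(feats), dtype=bool))]  # level 0: whole image
    for gr in range(2):                              # level 1: 2x2 grid
        for gc in range(2):
            in_cell = ((feats[:, 0] // (img_h / 2)).astype(int) == gr) & \
                      ((feats[:, 1] // (img_w / 2)).astype(int) == gc)
            hists.append(hist(in_cell))
    return np.concatenate(hists)  # length = 5 * vocab_size

feats = [(2, 2, 0), (2, 12, 1), (12, 2, 1), (12, 12, 2)]
v = spatial_pyramid(feats, img_h=16, img_w=16, vocab_size=3)
print(v.shape)  # → (15,)
```

Unlike the plain bag of words, this representation distinguishes images whose visual words occur in different spatial layouts.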

Bags of features for action recognition
Space-time interest points

Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human
Action Categories Using Spatial-Temporal Words, IJCV 2008.
Credit: Slide set developed by S. Lazebnik, University of Illinois at Urbana-Champaign
