AI Image Captioning
Datasets
•Common Objects in Context (COCO). A collection of more than 120 thousand images with descriptions.
•Flickr 8K. A collection of 8 thousand described images taken from flickr.com.
•Flickr 30K. A collection of 30 thousand described images taken from flickr.com.
•Exploring Image Captioning Datasets, 2016
Data Collection
There are many open-source datasets available for this problem, such as Flickr 8k (containing 8k images), Flickr 30k (containing 30k images), and MS COCO (containing more than 120k images).
But for the purpose of this case study, I have used the Flickr 8k dataset, which you can download by filling out the request form provided by the University of Illinois at Urbana-Champaign. Moreover, training a model on a large number of images may not be feasible on a system that is not a very high-end PC/laptop.
This dataset contains 8000 images, each with 5 captions (as we have already seen in the Introduction section, an image can have multiple captions, all being relevant simultaneously).

(Image captioning example: "A white dog in a grassy area")
These images are bifurcated as follows:
•Training Set — 6000 images
•Dev Set — 1000 images
•Test Set — 1000 images
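Loading the five captions per image can be sketched as below. This is a minimal, hedged example: the Flickr 8k distribution stores captions in a tab-separated token file where each line looks like `<image>.jpg#<n>\t<caption>` (the exact file name on disk, e.g. `Flickr8k.token.txt`, is an assumption here).

```python
from collections import defaultdict

def load_captions(token_text):
    """Parse Flickr8k-style caption lines of the form
    '<image>.jpg#<n>\t<caption>' into a dict: image name -> list of captions."""
    captions = defaultdict(list)
    for line in token_text.strip().split("\n"):
        image_id, caption = line.split("\t")
        image_name = image_id.split("#")[0]  # drop the '#0'..'#4' caption index
        captions[image_name].append(caption.lower())
    return dict(captions)

# Two sample lines in the Flickr8k token format (illustrative content).
sample = (
    "1000268201_693b08cb0e.jpg#0\tA child in a pink dress is climbing up stairs .\n"
    "1000268201_693b08cb0e.jpg#1\tA girl going into a wooden building ."
)
captions = load_captions(sample)
print(len(captions["1000268201_693b08cb0e.jpg"]))  # 2
```

The train/dev/test split is then just a matter of filtering this dictionary by the image-name lists that ship with the dataset.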
Data Preprocessing — Images
Images are nothing but the input (X) to our model. As you may already know, any input to a model must be given in the form of a vector.
We need to convert every image into a fixed-size vector which can then be fed as input to the neural network. For this purpose, we opt for transfer learning using the InceptionV3 model (a convolutional neural network) created by Google Research.
This model was trained on the ImageNet dataset to perform image classification over 1000 different classes of images. However, our purpose here is not to classify the image but just to get a fixed-length, informative vector for each image. This process is called automatic feature engineering.
Hence, we simply remove the last softmax layer from the model and extract a 2048-length vector (the bottleneck features) for every image.
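A minimal sketch of this extraction step with Keras, assuming the standard InceptionV3 from `tensorflow.keras.applications`. In practice you would load `weights="imagenet"`; `weights=None` is used here only to keep the sketch runnable offline, and the random array stands in for a real 299×299 photo.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input

# In real use: InceptionV3(weights="imagenet"); weights=None avoids the download.
base = InceptionV3(weights=None)

# Drop the final 1000-way softmax layer: the second-to-last layer is the
# global-average-pooling output, a 2048-length bottleneck-feature vector.
encoder = tf.keras.Model(base.input, base.layers[-2].output)

img = np.random.rand(1, 299, 299, 3).astype("float32")  # stand-in for a photo
features = encoder.predict(preprocess_input(img * 255.0), verbose=0)
print(features.shape)  # (1, 2048)
```

Each image therefore becomes a single 2048-dimensional vector, computed once up front and cached, so the captioning model never touches raw pixels during training.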
Data Preparation
This is one of the most important steps in this case study. Here we will understand how to prepare the data in a manner convenient to be given as input to the deep learning model.

(Train image 1, caption -> "The black cat sat on grass")
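The usual preparation for a caption decoder is to expand each caption into (partial sequence -> next word) training pairs, as sketched below. The `startseq`/`endseq` boundary tokens are a common convention in image-captioning tutorials, not something fixed by the dataset itself.

```python
def make_pairs(caption):
    """Expand one caption into (partial caption, next word) training pairs."""
    tokens = ["startseq"] + caption.lower().split() + ["endseq"]
    pairs = []
    for i in range(1, len(tokens)):
        pairs.append((tokens[:i], tokens[i]))  # input prefix -> target word
    return pairs

for inp, target in make_pairs("The black cat sat on grass"):
    print(inp, "->", target)
# e.g. ['startseq'] -> 'the', ['startseq', 'the'] -> 'black', ...
```

At training time the image's 2048-length feature vector is paired with each of these prefixes, and the model learns to predict the next word given both.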