AI Image Captioning
Datasets
•Common Objects in Context (COCO). A collection of more than 120 thousand images with descriptions.
•Flickr 8K. A collection of 8 thousand described images taken from flickr.com.
•Flickr 30K. A collection of 30 thousand described images taken from flickr.com.
•Exploring Image Captioning Datasets, 2016
Data Collection
There are many open-source datasets available for this problem, such as Flickr 8k (containing 8k images), Flickr 30k (containing 30k images), and MS COCO (containing more than 120k images).
But for the purpose of this case study, I have used the Flickr 8k dataset, which you can download by filling out the request form provided by the University of Illinois at Urbana-Champaign. Moreover, training a model on a large number of images may not be feasible on a system that is not a very high-end PC/laptop.
This dataset contains 8000 images, each with 5 captions (as we have already seen in the Introduction section, an image can have multiple captions, all being relevant simultaneously).

(Image captioning example: "A white dog in a grassy area")
These images are bifurcated as follows:
•Training Set — 6000 images
•Dev Set — 1000 images
•Test Set — 1000 images
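Loading the five captions per image can be sketched as below. This is a minimal, hedged example: the Flickr 8k distribution stores captions in a tab-separated token file where each line looks like `<image>.jpg#<n>\t<caption>` (the exact file name on disk, e.g. `Flickr8k.token.txt`, is an assumption here).

```python
from collections import defaultdict

def load_captions(token_text):
    """Parse Flickr8k-style caption lines of the form
    '<image>.jpg#<n>\t<caption>' into a dict: image name -> list of captions."""
    captions = defaultdict(list)
    for line in token_text.strip().split("\n"):
        image_id, caption = line.split("\t")
        image_name = image_id.split("#")[0]  # drop the '#0'..'#4' caption index
        captions[image_name].append(caption.lower())
    return dict(captions)

# Two sample lines in the Flickr8k token format (illustrative content).
sample = (
    "1000268201_693b08cb0e.jpg#0\tA child in a pink dress is climbing up stairs .\n"
    "1000268201_693b08cb0e.jpg#1\tA girl going into a wooden building ."
)
captions = load_captions(sample)
print(len(captions["1000268201_693b08cb0e.jpg"]))  # 2
```

The train/dev/test split is then just a matter of filtering this dictionary by the image-name lists that ship with the dataset.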
Data Preprocessing — Images
Images are nothing but the input (X) to our model. As you may already know, any input to a model must be given in the form of a vector.
We need to convert every image into a fixed-size vector which can then be fed as input to the neural network. For this purpose, we opt for transfer learning using the InceptionV3 model (a convolutional neural network) created by Google Research.
This model was trained on the ImageNet dataset to perform image classification over 1000 different classes of images. However, our purpose here is not to classify the image but just to get a fixed-length, informative vector for each image. This process is called automatic feature engineering.
Hence, we simply remove the last softmax layer from the model and extract a 2048-length vector (the bottleneck features) for every image.
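A minimal sketch of this extraction step with Keras, assuming the standard InceptionV3 from `tensorflow.keras.applications`. In practice you would load `weights="imagenet"`; `weights=None` is used here only to keep the sketch runnable offline, and the random array stands in for a real 299×299 photo.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input

# In real use: InceptionV3(weights="imagenet"); weights=None avoids the download.
base = InceptionV3(weights=None)

# Drop the final 1000-way softmax layer: the second-to-last layer is the
# global-average-pooling output, a 2048-length bottleneck-feature vector.
encoder = tf.keras.Model(base.input, base.layers[-2].output)

img = np.random.rand(1, 299, 299, 3).astype("float32")  # stand-in for a photo
features = encoder.predict(preprocess_input(img * 255.0), verbose=0)
print(features.shape)  # (1, 2048)
```

Each image therefore becomes a single 2048-dimensional vector, computed once up front and cached, so the captioning model never touches raw pixels during training.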
Data Preparation
This is one of the most important steps in this case study. Here we will understand how to prepare the data in a manner convenient to be given as input to the deep learning model.

(Train image 1, caption -> "The black cat sat on grass")
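The usual preparation for a caption decoder is to expand each caption into (partial sequence -> next word) training pairs, as sketched below. The `startseq`/`endseq` boundary tokens are a common convention in image-captioning tutorials, not something fixed by the dataset itself.

```python
def make_pairs(caption):
    """Expand one caption into (partial caption, next word) training pairs."""
    tokens = ["startseq"] + caption.lower().split() + ["endseq"]
    pairs = []
    for i in range(1, len(tokens)):
        pairs.append((tokens[:i], tokens[i]))  # input prefix -> target word
    return pairs

for inp, target in make_pairs("The black cat sat on grass"):
    print(inp, "->", target)
# e.g. ['startseq'] -> 'the', ['startseq', 'the'] -> 'black', ...
```

At training time the image's 2048-length feature vector is paired with each of these prefixes, and the model learns to predict the next word given both.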