Cognitive API Using Neural Network
CHAPTER 1
1. INTRODUCTION
Transfer learning is a Machine Learning method where a model developed for one task is reused
as the starting point for a model on a second task. This project uses the approach of transfer learning
to re-train a pre-trained model called MobileNet. MobileNets are based on a streamlined architecture
that uses depthwise separable convolutions to build lightweight deep neural networks, and the model is
trained on ImageNet, a dataset of millions of labelled images covering 1000 different classes of objects,
such as dogs, cats, and fruits.
1.1 Aim of The Project
The main objectives of the project are given below:
1. To train a Machine Learning model in a browser without installing various libraries and without
the need for high-end hardware.
2. To use a pre-trained model so that it is easier for users to obtain predictions from the
model.
3. To deploy a trained model directly into a web browser and serve predictions locally from the
browser.
The output of the MobileNet model, before it is turned into class labels,
is usually called logits. The logits are represented as a vector with 1000 elements. In TensorFlow.js,
this is represented as a tensor with shape [1000], where each value is a number representing the
prediction score for that class.
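As a brief illustrative sketch (not from the project itself, and using random stand-in values rather than a real MobileNet output), the shape of such a logits tensor and the selection of the highest-scoring class can be seen in TensorFlow.js as follows, assuming the tf library is loaded via a script tag:

const logits = tf.randomNormal([1000]); // stand-in for MobileNet's 1000-class output
console.log(logits.shape); // [1000]
const best = logits.argMax().dataSync()[0]; // index of the highest-scoring class
console.log('predicted class index:', best);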
K-Nearest Neighbours
K-Nearest Neighbours is used to distinguish between images of different kinds of objects that
are shown to the webcam. The process collects a number of images for each class, compares
new images to this dataset, and finds the most similar class. The particular algorithm used to
find similar images in the collected dataset is called k-nearest neighbours.
The algorithm uses the semantic information represented in the logits from MobileNet to do the
comparison. K-nearest neighbours looks for the k examples most similar to the input being
predicted on, and chooses the class with the highest representation in that set.
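A minimal sketch of this flow, assuming the ml5.js API used in the appendix code (where features is a MobileNet feature extractor and video a webcam capture), might look like this:

const knn = ml5.KNNClassifier();
// collect labelled examples; their logits form the dataset for each class
knn.addExample(features.infer(video), 'left');
knn.addExample(features.infer(video), 'right');
// compare a new frame against the stored examples and pick the most similar class
knn.classify(features.infer(video), function(error, result) {
  if (!error) console.log(result.label);
});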
CHAPTER 2
2. LITERATURE REVIEW
There are a number of opportunities to extend and enhance TensorFlow.js. Given the rapid
progress of browser development, it seems likely that additional GPU programming models may
become available. In particular, there are ongoing conversations with browser vendors about
implementing general-purpose GPU programming APIs that would make these kinds of toolkits more
performant and easier to maintain.[7]
A survey[1] shows that Deep Learning has recently received increasing attention from
researchers and has been successfully applied to many domains. In some domains, such as bioinformatics
and robotics, it is very difficult to construct a large-scale, well-annotated dataset due to the expense of
data acquisition and costly annotation, which limits their development. Transfer learning relaxes the
hypothesis that the training data must be independent and identically distributed with the test data,
which motivates the use of transfer learning to solve the problem of insufficient training data. The
survey focuses on reviewing current research on transfer learning using deep neural networks
and its applications.
The survey classifies deep transfer learning into four categories for the first time: instance-based deep
transfer learning, mapping-based deep transfer learning, network-based deep transfer learning, and
adversarial-based deep transfer learning. In most practical applications, these
techniques are often used in combination to achieve better results. Most current research focuses on
supervised learning; how to transfer knowledge in unsupervised or semi-supervised learning with deep
neural networks may attract more and more attention in the future. It can be predicted that deep transfer
learning will be widely applied to solve many challenging problems as deep neural networks
develop.[1]
MobileNets are based on a streamlined architecture that uses depthwise separable convolutions
to build lightweight deep neural networks. Convolutional neural networks have become ubiquitous in
computer vision ever since AlexNet popularized deep convolutional neural networks by winning the
ImageNet Challenge (ILSVRC 2012). The general trend has been to make deeper and more complicated
networks in order to achieve higher accuracy. However, these advances in accuracy do not
necessarily make networks more efficient with respect to size and speed. The MobileNet work describes
an efficient network architecture and a set of two hyper-parameters for building very small, low-latency
models that can be easily matched to the design requirements of mobile and embedded vision
applications. It investigated some of the important design decisions leading to an efficient
model, demonstrated how to build smaller and faster MobileNets using a width multiplier and a
resolution multiplier by trading off a reasonable amount of accuracy to reduce size and latency, and
then compared different MobileNets to popular models, demonstrating superior size, speed and
accuracy characteristics. It concluded by demonstrating MobileNet's effectiveness when
applied to a wide variety of tasks.[2]
The work in [3] found that recent research on deep convolutional neural networks (CNNs) has
focused primarily on improving accuracy. For a given accuracy level, it is typically possible to identify
multiple CNN architectures that achieve it. With equivalent accuracy, smaller CNN
architectures offer at least three advantages: (1) smaller CNNs require less communication across
servers during distributed training; (2) smaller CNNs require less bandwidth to export a new model
from the cloud to an autonomous car; (3) smaller CNNs are more feasible to deploy on FPGAs and
other hardware with limited memory. To provide all of these advantages, a small CNN architecture
called SqueezeNet was proposed. The authors presented steps toward a more disciplined approach to
design-space exploration of convolutional neural networks and, toward this goal, SqueezeNet,
a CNN architecture that has 50x fewer parameters than AlexNet while maintaining AlexNet-level accuracy
on ImageNet. They also compressed SqueezeNet to less than 0.5 MB, or 510x smaller than AlexNet
without compression.
CHAPTER 3
PROJECT DESCRIPTION
3.1 Introduction
In this chapter, the difference between the traditional system and the proposed system is
discussed, along with the feasibility of the project and its system specification. Traditional Machine
Learning models are difficult to build and require high-end GPU cores in order to train.
They also require significant knowledge of Machine Learning algorithms and a
wide collection of datasets. In the proposed system, the model is deployed directly in the browser
and trained there. For the cognitive model, a pre-trained model is used and re-trained
to perform another task assigned by the user; this approach is called transfer learning.
With this method, users no longer need high-end hardware to train the model and
serve predictions from it.
The approach used here is transfer learning, which allows re-training a pre-trained
model. This saves time and offers the flexibility of the datasets already available in it. This API can
run an existing model or retrain it, and the trained model can be saved for reuse with other models.
The input images are sent to the MobileNet model, which uses a convolutional neural network to
extract the features. Instead of running through all the layers, the image is processed only up to the
second-to-last layer, excluding the final label and class layer. The extracted features, called logits, are sent
to the KNN classifier to differentiate the object from the background, and they are labelled with custom
classes by the user. The classifier then predicts the classes from the webcam input, and the
respective tasks are performed.
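As a hedged sketch of this extraction step (using ml5.js as in the appendix, where video is a webcam capture and knn an ml5 KNNClassifier; the class name here is only a placeholder):

const features = ml5.featureExtractor('MobileNet', function() {
  const logits = features.infer(video); // activations from the second-to-last layer
  knn.addExample(logits, 'myCustomClass'); // labelled with a user-defined class
});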
Linux OS
● GNOME or KDE desktop: Tested on Ubuntu 12.04.
JavaScript
Alongside HTML and CSS, JavaScript is one of the core technologies of the World Wide
Web. JavaScript enables interactive web pages and is an essential part of web applications. The vast
majority of websites use it, and major web browsers have a dedicated JavaScript engine to execute it.
TensorFlow
TensorFlow is an end-to-end open-source platform for Machine Learning. It has a
comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers
push the state of the art in ML and lets developers easily build and deploy ML-powered applications.
ML models can be built and trained easily using intuitive high-level APIs like Keras with eager
execution, which makes for immediate model iteration and easy debugging.
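For illustration, a minimal TensorFlow.js sketch of this Keras-style high-level API (the layer sizes here are arbitrary):

const model = tf.sequential();
model.add(tf.layers.dense({units: 32, activation: 'relu', inputShape: [10]}));
model.add(tf.layers.dense({units: 3, activation: 'softmax'}));
model.compile({optimizer: 'adam', loss: 'categoricalCrossentropy'});
model.summary(); // prints the layer-by-layer structure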
ML5.JS
Ml5.js aims to make Machine Learning approachable for a broad audience of artists, creative
coders, and students. The library provides access to Machine Learning algorithms and models in the
browser, building on top of TensorFlow.js with no other external dependencies. The library is
supported by code examples, tutorials, and sample datasets with an emphasis on ethical computing.
Bias in data, stereotypical harms, and responsible crowdsourcing are part of the documentation around
data collection and usage.
MobileNet
MobileNet is an architecture well suited to mobile and embedded vision
applications where there is a lack of compute power. The architecture was proposed by Google. It
uses depthwise separable convolutions, which significantly reduce the number of
parameters compared to a network of the same depth with normal convolutions.
This results in lightweight deep neural networks.
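The saving can be made concrete with a rough TensorFlow.js sketch (the input size and channel counts are arbitrary): a standard 3×3 convolution producing 64 channels from 32 needs about 3·3·32·64 ≈ 18.4k weights, while the depthwise separable version needs about 3·3·32 + 32·64 ≈ 2.3k, plus biases.

const standard = tf.sequential();
standard.add(tf.layers.conv2d({filters: 64, kernelSize: 3, inputShape: [224, 224, 32]}));
const separable = tf.sequential();
separable.add(tf.layers.separableConv2d({filters: 64, kernelSize: 3, inputShape: [224, 224, 32]}));
console.log(standard.countParams(), separable.countParams()); // ≈ 18.5k vs ≈ 2.4k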
3.6 Summary
In this chapter, the current existing systems, the proposed system, its feasibility, and the hardware
and software requirements for the project were explained briefly. The project involves training a
cognitive model inside a web browser without installing any libraries, and the TensorFlow-style
trained model can be deployed with the help of WebGL.
CHAPTER 4
MODULE DESCRIPTION
4.1 Introduction
This chapter gives a functional description of the modules of the proposed system and
explains the architecture and flow of the project with a data flow diagram, flowchart and UML diagram.
The inputs are given as images from the webcam and are sent to the MobileNet model using ml5.js.
The feature extractor of the MobileNet model extracts the features from the images and sends them to the
KNN classifier, which gives the output as a label class assigned by the user or as a regression.
Training on inputs and prediction occur simultaneously; this method is called inference mode.
[Figure: system architecture — webcam input → ml5.js → MobileNet → KNN classifier → output]
4.3 Design Phase
The design of the proposed system is described, which explains how the system
works: the process logic behind it, the flowchart giving a pictorial representation of that
process logic, and finally the Data Flow Diagram.
4.3.1 Data Flow Diagram
[Figure: data flow — Start → image from webcam → feature extraction → perform tasks based on predictions]
Fig 4.2 Data Flow Diagram
In this data flow diagram, the input is given from the webcam as images to the pre-trained MobileNet
model, which is re-trained using custom classes, and the features are extracted by the CNN algorithm in
MobileNet. The features are then sent to the KNN classifier to distinguish the object from the
background. The model can then be saved locally and loaded later to predict the
classes and perform tasks based on the output.
4.3.2 UML Diagram
[Figure: use-case diagram — researchers train the model, load the model, and get predictions]
In this UML diagram the researchers are the users; they train the model with input
images from the webcam. The trained model can be saved locally so that it can be reused in any
TensorFlow-based model, and it can be loaded to serve predictions from the given inputs as
classes or regression.
4.3.3 Flowchart
[Figure: flowchart — START → image from webcam → MobileNet → KNN classifier → feature recognized? (no: capture a new image; yes: prediction) → END]
Fig 4.4 Flow Chart
In this flowchart, the inputs are the images from the webcam, which are sent to the MobileNet
model to be processed and have their features extracted. The extracted features are sent to the KNN
classifier to recognize the object or gesture. If the features are recognized while predicting the class,
the class is displayed or the task for that class is performed; otherwise the process starts again with new
input images. This is an inference-type process that runs as a loop until the program is stopped.
4.4 Module Description
Modules of the project:
1. Webcam Module
2. MobileNet Module
3. KNN Classification
Fig 4.5 MobileNet Architecture
Table 4.1 MobileNet’s Convolution layers
4.4.3 KNN Classifier
k-NN is a type of instance-based learning, or lazy learning, where the function is only
approximated locally and all computation is deferred until classification. For both classification and
regression, a useful technique is to assign weights to the contributions of the neighbours, so
that nearer neighbours contribute more to the average than more distant ones. The neighbours are
taken from a set of objects for which the class or the object property value is known. This can
be thought of as the training set for the algorithm, though no explicit training step is required.
For the distance measure, the Euclidean distance is used: d(p, q) = √( Σᵢ (pᵢ − qᵢ)² ), where p and q
are the two feature vectors being compared.
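A small self-contained JavaScript sketch of this idea (illustrative only, not ml5's internal implementation), using Euclidean distance and inverse-distance weighting:

function euclidean(p, q) {
  let sum = 0;
  for (let i = 0; i < p.length; i++) sum += (p[i] - q[i]) ** 2;
  return Math.sqrt(sum);
}

// examples: array of {features: number[], label: string}
function knnClassify(examples, input, k) {
  const nearest = examples
    .map(e => ({label: e.label, d: euclidean(e.features, input)}))
    .sort((a, b) => a.d - b.d)
    .slice(0, k);
  const votes = {};
  for (const n of nearest) {
    // nearer neighbours contribute more than distant ones
    votes[n.label] = (votes[n.label] || 0) + 1 / (n.d + 1e-6);
  }
  return Object.keys(votes).reduce((a, b) => (votes[a] > votes[b] ? a : b));
}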
4.5 Summary
This chapter explains the detailed functions of the modules with respect to the system
architecture. The MobileNet model has a feature extractor which uses a convolutional neural network;
the images from the webcam are given to the extractor, and the image pixels are reduced down to 1000
logits (or features). These are then sent to the KNN classifier to recognize the object or gesture in
the image. The output is inferred from the given input and the classes are predicted.
CHAPTER 5
IMPLEMENTATION & TESTING
5.1 Introduction
The proposed system runs the model in the browser directly, without any installation or
high-end GPU cores. Users can interactively train, in real time, a Machine Learning model
to classify input from a video camera. The approach used here is transfer learning, which allows
re-training a pre-trained model and saves time. This API can run an existing model or retrain
it, and the trained model can be saved for reuse with other models.
The input images are sent to the MobileNet model, which uses a convolutional neural network to
extract the features. Instead of running through all the layers, the image is processed only up to the
second-to-last layer, excluding the final label and class layer. The extracted features, called logits, are sent
to the KNN classifier to differentiate the object from the background, and they are labelled with custom
classes by the user. The classifier then predicts the classes from the webcam input, and the
respective tasks are performed.
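Saving and reloading the trained classifier, assuming ml5's KNNClassifier API as used in the appendix, is a short sketch:

knn.save('model.json'); // downloads the collected examples as JSON
knn.load('model.json', function() {
  console.log('classifier restored; ready to serve predictions');
});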
Sample test cases: Here, the inputs are given as video frames through the webcam to the
MobileNet model. For the input, different positions of a hand were shown to the webcam, and for
each position a few examples were trained at different angles. The
same procedure was followed to train the various positions of the slider with respect to the hand
positions. Finally, the train button is clicked to train the model. The output is shown in the canvas
of the browser as a block that moves with respect to the position of the hand. The classes for
each position are trained individually, and each predicted class has a user-assigned task, here
the position of the slider with respect to the block on the canvas. The inputs and outputs are shown
in figures 5.1 and 5.2.
There are some limitations of this API which can affect the probability and accuracy of the
predictions, such as image size, camera pixel quality, and the angle and position of the object.
5.2 Input & Output
5.2.1 Input
5.2.2 Output
Fig 5.2.c Output 3
The output is shown in the canvas of the browser as a block that moves with
respect to the position of the hand. The classes for each position are trained individually, and each
predicted class has a user-assigned task, here the position of the slider with respect to the block
on the canvas. The probability of the prediction is shown as a regression value in the canvas.
5.3 Limitations
5.3.1. Image Quality
Image quality affects how well neural network algorithms work. The image quality of
webcam video is quite low compared with that of a digital camera. Even high-definition video is, at
best, 1080p (progressive scan); usually it is 720p. These values are equivalent to about 2 MP and
0.9 MP, respectively, while an inexpensive digital camera attains 15 MP. The difference is quite
noticeable.
5.3.3 Object Angle and Position
When training a model, the angle and position of the object or gesture shown to the
camera should be the same as at recognition time; otherwise the probability of the prediction will be low.
5.4 Testing
After implementation, the model is trained with more than 20 image examples for each
position, with the slider indicating the coordinates. When the object is recognized by the classifier, the
probability of the position is displayed in the canvas below, and the object moves according to the labels,
which contain the coordinates of the object. The probability of the position depends on the number of
examples given.
5.5 Summary
This chapter presented various inputs and outputs of the proposed system, and the procedure
of the application and its execution were explained, along with the limitations which decrease the
accuracy of the predictions. Training on the input and predicting the output are done concurrently, a
method called inference mode, and this is one of the major advantages of this project.
CHAPTER 6
RESULTS AND DISCUSSIONS
Fig 6.1 Comparison of existing and proposed system
1. Apps are easy to share: Models run directly in the browser without additional files or
installations and can be shared using a URL hosted on a server. There is no longer a need to link
JavaScript to a Python file running in the cloud. And instead of fighting with virtual environments or
package managers, all dependencies can be included as HTML script tags.
2. The client provides the compute power: Training and predictions are offloaded to the user's
hardware. This eliminates significant cost and effort for the developer.
3. Data never leaves the client's device: This is crucially important as users are increasingly
concerned about protecting their sensitive information, especially in the wake of massive data scandals
and security breaches. With TensorFlow.js, users can take advantage of AI without sending their
personal data over a network and sharing it with a third party. This makes it easier to build secure
applications that satisfy data security regulations, e.g. healthcare apps that tap into wearable medical
sensors.
CHAPTER 7
CONCLUSION AND FUTURE ENHANCEMENTS
7.1 Conclusion
Instead of training a model to do something very narrow (for example, recognize cats), this
model can be trained to recognize any input data. Transfer learning is used to re-train a pre-trained
model, here MobileNet. This enables users to train a model within a browser without needing
to install any libraries or own a high-end workstation. The application helps researchers and
data scientists run a model and serve predictions directly from the browser. The API was tested with
many test cases and can approach 100 percent accuracy, subject to the limitations described above.
APPENDICES
Sample Source Code
index.html
<!DOCTYPE html>
<html>
<head>
<title>Major Project</title>
<script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/0.8.0/p5.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/0.8.0/addons/p5.dom.min.js"></script>
<script src="https://unpkg.com/[email protected]/dist/ml5.min.js"></script>
<script src="sketch.js"></script>
</head>
<body></body>
</html>
sketch.js
let video;
let features;
let knn;
let labelP;
let ready = false;
let x;
let y;
let label = 'nothing';
// p5.js setup: create the canvas, start the webcam capture, and load the models
function setup() {
createCanvas(320, 240);
video = createCapture(VIDEO);
video.size(320, 240);
features = ml5.featureExtractor('MobileNet', modelReady);
knn = ml5.KNNClassifier();
labelP = createP('need training data');
labelP.style('font-size', '32pt');
x = width / 2;
y = height / 2;
}
// Classify the current video frame, then schedule the next classification
function goClassify() {
const logits = features.infer(video);
knn.classify(logits, function(error, result) {
if (error) {
console.error(error);
} else {
label = result.label;
labelP.html(result.label);
goClassify();
}
});
}
// Keys l/r/u/d add a labelled training example from the current frame; s saves the model
function keyPressed() {
const logits = features.infer(video);
if (key == 'l') {
knn.addExample(logits, 'left');
console.log('left');
} else if (key == 'r') {
knn.addExample(logits, 'right');
console.log('right');
} else if (key == 'u') {
knn.addExample(logits, 'up');
console.log('up');
} else if (key == 'd') {
knn.addExample(logits, 'down');
console.log('down');
} else if (key == 's') {
knn.save('model.json'); // download the trained classifier as JSON
}
}
// Called once MobileNet has loaded; also tries to restore a previously saved classifier
function modelReady() {
console.log('model ready!');
knn.load('model.json', function() {
console.log('knn loaded');
});
}
// Move the circle according to the predicted label; start classifying once examples exist
function draw() {
background(0);
fill(255);
ellipse(x, y, 24);
if (label == 'left') {
x--;
} else if (label == 'right') {
x++;
} else if (label == 'up') {
y--;
} else if (label == 'down') {
y++;
}
//image(video, 0, 0);
if (!ready && knn.getNumLabels() > 0) {
goClassify();
ready = true;
}
}
REFERENCES
[1] Chuanqi Tan, Fuchun Sun, Tao Kong, Wenchang Zhang, Chao Yang and Chunfang Liu: A
Survey on Deep Transfer Learning (2018)
[2] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias
Weyand, Marco Andreetto and Hartwig Adam: MobileNets: Efficient Convolutional Neural
Networks for Mobile Vision Applications (2017)
[3] Bauer, M., Rojas-Carulla, M., Świątkowski, J.B., Schölkopf, B. and Turner, R.E., 2017.
Discriminative k-shot learning using probabilistic models. arXiv preprint arXiv:1706.00326.
[4] Bragg, D., Huynh, N. and Ladner, R.E., 2016, October. A personalizable mobile sound detector
app design for deaf and hard-of-hearing users. In Proceedings of the 18th International ACM
SIGACCESS Conference on Computers and Accessibility (pp. 3-13). ACM.
[5] Flores, G.H. and Manduchi, R., 2016, October. WeAllWalk: An Annotated Data Set of Inertial
Sensor Time Series from Blind Walkers. In Proceedings of the 18th International ACM
SIGACCESS Conference on Computers and Accessibility (pp. 141-150). ACM.
[6] Fowler, A., Roark, B., Orhan, U., Erdogmus, D. and Fried-Oken, M., 2013. Improved inference
and autotyping in EEG-based BCI typing systems. In Proceedings of the 15th International
ACM SIGACCESS Conference on Computers and Accessibility (p. 15). ACM.
[7] Daniel Smilkov, Nikhil Thorat, Yannick Assogba, Ann Yuan, Nick Kreeger, Ping Yu, Kangyi
Zhang, Shanqing Cai, Eric Nielsen, David Soergel, Stan Bileschi, Michael Terry, Charles
Nicholson, Sandeep N. Gupta, Sarah Sirajuddin, D. Sculley, Rajat Monga, Greg Corrado,
Fernanda B. Viégas and Martin Wattenberg: TensorFlow.js: Machine Learning for the Web
and Beyond (2019)
[8] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis,
J. Dean, M. Devin, et al. TensorFlow: Large-scale Machine Learning on heterogeneous
systems, 2015. Software available from tensorflow.org.