Cognitive API Using Neural Network
CHAPTER 1
1. INTRODUCTION
Transfer learning is a Machine Learning method where a model developed for one task is reused
as the starting point for a model on a second task. This project uses the approach of transfer learning
to re-train a pre-trained model called MobileNet. MobileNets are based on a streamlined architecture
that uses depthwise separable convolutions to build lightweight deep neural networks, and the model is
trained on ImageNet, a dataset of millions of labelled images covering 1000 different classes of objects,
such as dogs, cats, and fruits.
1.1 Aim of The Project
The main objectives of the project are given below:
1. To train a Machine Learning model in a browser without installing various libraries and without
the need for high-end hardware.
2. To use a pre-trained model so that it is easier for users to obtain predictions from the
model.
3. To deploy a trained model directly into a web browser and serve predictions locally from the
browser.
The output of the MobileNet model, before it is turned into class labels,
is usually called logits. The logits are represented as a vector with 1000 elements. In TensorFlow.js,
this is represented as a tensor with shape [1000], where each value is a number representing the
prediction score for that class.
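As a brief illustrative sketch (not from the project itself, and using random stand-in values rather than a real MobileNet output), the shape of such a logits tensor and the selection of the highest-scoring class can be seen in TensorFlow.js as follows, assuming the tf library is loaded via a script tag:

const logits = tf.randomNormal([1000]); // stand-in for MobileNet's 1000-class output
console.log(logits.shape); // [1000]
const best = logits.argMax().dataSync()[0]; // index of the highest-scoring class
console.log('predicted class index:', best);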
K-Nearest Neighbours
K-Nearest Neighbours is used to distinguish between images of different kinds of objects that
are shown to the webcam. The process collects a number of images for each class, compares
new images to this dataset, and finds the most similar class. The particular algorithm used to
find similar images in the collected dataset is called k-nearest neighbours.
The algorithm uses the semantic information represented in the logits from MobileNet to do the
comparison. K-nearest neighbours looks for the k examples most similar to the input being
predicted on, and chooses the class with the highest representation in that set.
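A minimal sketch of this flow, assuming the ml5.js API used in the appendix code (where features is a MobileNet feature extractor and video a webcam capture), might look like this:

const knn = ml5.KNNClassifier();
// collect labelled examples; their logits form the dataset for each class
knn.addExample(features.infer(video), 'left');
knn.addExample(features.infer(video), 'right');
// compare a new frame against the stored examples and pick the most similar class
knn.classify(features.infer(video), function(error, result) {
  if (!error) console.log(result.label);
});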
CHAPTER 2
2. LITERATURE REVIEW
There are a number of opportunities to extend and enhance TensorFlow.js. Given the rapid
progress of browser development, it seems likely that additional GPU programming models may
become available. In particular, there are ongoing conversations with browser vendors about
implementing general-purpose GPU programming APIs that would make these kinds of toolkits more
performant and easier to maintain.[7]
A survey[1] shows that Deep Learning has recently received increasing attention from
researchers and has been successfully applied to many domains. In some domains, such as bioinformatics
and robotics, it is very difficult to construct a large-scale, well-annotated dataset due to the expense of
data acquisition and costly annotation, which limits their development. Transfer learning relaxes the
hypothesis that the training data must be independent and identically distributed with the test data,
which motivates the use of transfer learning to solve the problem of insufficient training data. The
survey focuses on reviewing current research on transfer learning using deep neural networks
and its applications.
The survey classifies deep transfer learning into four categories for the first time: instance-based deep
transfer learning, mapping-based deep transfer learning, network-based deep transfer learning, and
adversarial-based deep transfer learning. In most practical applications, these
techniques are often used in combination to achieve better results. Most current research focuses on
supervised learning; how to transfer knowledge in unsupervised or semi-supervised learning with deep
neural networks may attract more and more attention in the future. It can be predicted that deep transfer
learning will be widely applied to solve many challenging problems as deep neural networks
develop.[1]
MobileNets are based on a streamlined architecture that uses depthwise separable convolutions
to build lightweight deep neural networks. Convolutional neural networks have become ubiquitous in
computer vision ever since AlexNet popularized deep convolutional neural networks by winning the
ImageNet Challenge (ILSVRC 2012). The general trend has been to make deeper and more complicated
networks in order to achieve higher accuracy. However, these advances in accuracy do not
necessarily make networks more efficient with respect to size and speed. The MobileNet work describes
an efficient network architecture and a set of two hyper-parameters for building very small, low-latency
models that can be easily matched to the design requirements of mobile and embedded vision
applications. It investigated some of the important design decisions leading to an efficient
model, demonstrated how to build smaller and faster MobileNets using a width multiplier and a
resolution multiplier by trading off a reasonable amount of accuracy to reduce size and latency, and
then compared different MobileNets to popular models, demonstrating superior size, speed and
accuracy characteristics. It concluded by demonstrating MobileNet's effectiveness when
applied to a wide variety of tasks.[2]
The work in [3] found that recent research on deep convolutional neural networks (CNNs) has
focused primarily on improving accuracy. For a given accuracy level, it is typically possible to identify
multiple CNN architectures that achieve it. With equivalent accuracy, smaller CNN
architectures offer at least three advantages: (1) smaller CNNs require less communication across
servers during distributed training; (2) smaller CNNs require less bandwidth to export a new model
from the cloud to an autonomous car; (3) smaller CNNs are more feasible to deploy on FPGAs and
other hardware with limited memory. To provide all of these advantages, a small CNN architecture
called SqueezeNet was proposed. The authors presented steps toward a more disciplined approach to
design-space exploration of convolutional neural networks and, toward this goal, SqueezeNet,
a CNN architecture that has 50x fewer parameters than AlexNet while maintaining AlexNet-level accuracy
on ImageNet. They also compressed SqueezeNet to less than 0.5 MB, or 510x smaller than AlexNet
without compression.
CHAPTER 3
PROJECT DESCRIPTION
3.1 Introduction
In this chapter, the difference between the traditional system and the proposed system is
discussed, along with the feasibility of the project and its system specification. Traditional Machine
Learning models are difficult to build and require high-end GPU cores in order to train.
They also require significant knowledge of Machine Learning algorithms and a
wide collection of datasets. In the proposed system, the model is deployed directly in the browser
and trained there. For the cognitive model, a pre-trained model is used and re-trained
to perform another task assigned by the user; this approach is called transfer learning.
With this method, users no longer need high-end hardware to train the model and
serve predictions from it.
The approach used here is transfer learning, which allows re-training a pre-trained
model. This saves time and offers the flexibility of the datasets already available in it. This API can
run an existing model or retrain it, and the trained model can be saved for reuse with other models.
The input images are sent to the MobileNet model, which uses a convolutional neural network to
extract the features. Instead of running through all the layers, the image is processed only up to the
second-to-last layer, excluding the final label and class layer. The extracted features, called logits, are sent
to the KNN classifier to differentiate the object from the background, and they are labelled with custom
classes by the user. The classifier then predicts the classes from the webcam input, and the
respective tasks are performed.
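As a hedged sketch of this extraction step (using ml5.js as in the appendix, where video is a webcam capture and knn an ml5 KNNClassifier; the class name here is only a placeholder):

const features = ml5.featureExtractor('MobileNet', function() {
  const logits = features.infer(video); // activations from the second-to-last layer
  knn.addExample(logits, 'myCustomClass'); // labelled with a user-defined class
});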
Linux OS
● GNOME or KDE desktop: Tested on Ubuntu 12.04.
JavaScript
Alongside HTML and CSS, JavaScript is one of the core technologies of the World Wide
Web. JavaScript enables interactive web pages and is an essential part of web applications. The vast
majority of websites use it, and major web browsers have a dedicated JavaScript engine to execute it.
TensorFlow
TensorFlow is an end-to-end open-source platform for Machine Learning. It has a
comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers
push the state of the art in ML and lets developers easily build and deploy ML-powered applications.
ML models can be built and trained easily using intuitive high-level APIs like Keras with eager
execution, which makes for immediate model iteration and easy debugging.
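For illustration, a minimal TensorFlow.js sketch of this Keras-style high-level API (the layer sizes here are arbitrary):

const model = tf.sequential();
model.add(tf.layers.dense({units: 32, activation: 'relu', inputShape: [10]}));
model.add(tf.layers.dense({units: 3, activation: 'softmax'}));
model.compile({optimizer: 'adam', loss: 'categoricalCrossentropy'});
model.summary(); // prints the layer-by-layer structure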
ML5.JS
Ml5.js aims to make Machine Learning approachable for a broad audience of artists, creative
coders, and students. The library provides access to Machine Learning algorithms and models in the
browser, building on top of TensorFlow.js with no other external dependencies. The library is
supported by code examples, tutorials, and sample datasets with an emphasis on ethical computing.
Bias in data, stereotypical harms, and responsible crowdsourcing are part of the documentation around
data collection and usage.
MobileNet
MobileNet is an architecture well suited to mobile and embedded vision
applications where there is a lack of compute power. The architecture was proposed by Google. It
uses depthwise separable convolutions, which significantly reduce the number of
parameters compared to a network of the same depth with normal convolutions.
This results in lightweight deep neural networks.
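The saving can be made concrete with a rough TensorFlow.js sketch (the input size and channel counts are arbitrary): a standard 3×3 convolution producing 64 channels from 32 needs about 3·3·32·64 ≈ 18.4k weights, while the depthwise separable version needs about 3·3·32 + 32·64 ≈ 2.3k, plus biases.

const standard = tf.sequential();
standard.add(tf.layers.conv2d({filters: 64, kernelSize: 3, inputShape: [224, 224, 32]}));
const separable = tf.sequential();
separable.add(tf.layers.separableConv2d({filters: 64, kernelSize: 3, inputShape: [224, 224, 32]}));
console.log(standard.countParams(), separable.countParams()); // ≈ 18.5k vs ≈ 2.4k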
3.6 Summary
In this chapter, the current existing systems, the proposed system, its feasibility, and the hardware
and software requirements for the project were explained briefly. The project involves training a
cognitive model inside a web browser without installing any libraries, and the TensorFlow-style
trained model can be deployed with the help of WebGL.
CHAPTER 4
MODULE DESCRIPTION
4.1 Introduction
This chapter gives a functional description of the modules of the proposed system and
explains the architecture and flow of the project with a data flow diagram, flowchart and UML diagram.
The inputs are given as images from the webcam and are sent to the MobileNet model using ml5.js.
The feature extractor of the MobileNet model extracts the features from the images and sends them to the
KNN classifier, which gives the output as a label class assigned by the user or as a regression.
Training on inputs and prediction occur simultaneously; this method is called inference mode.
[Figure: system architecture — webcam input → ml5.js → MobileNet → KNN classifier → output]
4.3 Design Phase
The design of the proposed system is described, which explains how the system
works: the process logic behind it, the flowchart giving a pictorial representation of that
process logic, and finally the Data Flow Diagram.
4.3.1 Data Flow Diagram
[Figure: data flow — Start → image from webcam → feature extraction → perform tasks based on predictions]
Fig 4.2 Data Flow Diagram
In this data flow diagram, the input is given from the webcam as images to the pre-trained MobileNet
model, which is re-trained using custom classes, and the features are extracted by the CNN algorithm in
MobileNet. The features are then sent to the KNN classifier to distinguish the object from the
background. The model can then be saved locally and loaded later to predict the
classes and perform tasks based on the output.
4.3.2 UML Diagram
[Figure: use-case diagram — researchers train the model, load the model, and get predictions]
In this UML diagram the researchers are the users; they train the model with input
images from the webcam. The trained model can be saved locally so that it can be reused in any
TensorFlow-based model, and it can be loaded to serve predictions from the given inputs as
classes or regression.
4.3.3 Flowchart
[Figure: flowchart — START → image from webcam → MobileNet → KNN classifier → feature recognized? (no: capture a new image; yes: prediction) → END]
Fig 4.4 Flow Chart
In this flowchart, the inputs are the images from the webcam, which are sent to the MobileNet
model to be processed and have their features extracted. The extracted features are sent to the KNN
classifier to recognize the object or gesture. If the features are recognized while predicting the class,
the class is displayed or the task for that class is performed; otherwise the process starts again with new
input images. This is an inference-type process that runs as a loop until the program is stopped.
4.4 Module Description
Modules of the project:
1. Webcam Module
2. MobileNet Module
3. KNN Classification
Fig 4.5 MobileNet Architecture
Table 4.1 MobileNet’s Convolution layers
4.4.3 KNN Classifier
k-NN is a type of instance-based learning, or lazy learning, where the function is only
approximated locally and all computation is deferred until classification. For both classification and
regression, a useful technique is to assign weights to the contributions of the neighbours, so
that nearer neighbours contribute more to the average than more distant ones. The neighbours are
taken from a set of objects for which the class or the object property value is known. This can
be thought of as the training set for the algorithm, though no explicit training step is required.
For the distance measure, the Euclidean distance is used: d(p, q) = √( Σᵢ (pᵢ − qᵢ)² ), where p and q
are the two feature vectors being compared.
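A small self-contained JavaScript sketch of this idea (illustrative only, not ml5's internal implementation), using Euclidean distance and inverse-distance weighting:

function euclidean(p, q) {
  let sum = 0;
  for (let i = 0; i < p.length; i++) sum += (p[i] - q[i]) ** 2;
  return Math.sqrt(sum);
}

// examples: array of {features: number[], label: string}
function knnClassify(examples, input, k) {
  const nearest = examples
    .map(e => ({label: e.label, d: euclidean(e.features, input)}))
    .sort((a, b) => a.d - b.d)
    .slice(0, k);
  const votes = {};
  for (const n of nearest) {
    // nearer neighbours contribute more than distant ones
    votes[n.label] = (votes[n.label] || 0) + 1 / (n.d + 1e-6);
  }
  return Object.keys(votes).reduce((a, b) => (votes[a] > votes[b] ? a : b));
}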
4.5 Summary
This chapter explains the detailed functions of the modules with respect to the system
architecture. The MobileNet model has a feature extractor which uses a convolutional neural network;
the images from the webcam are given to the extractor, and the image pixels are reduced down to 1000
logits (or features). These are then sent to the KNN classifier to recognize the object or gesture in
the image. The output is inferred from the given input and the classes are predicted.
CHAPTER 5
IMPLEMENTATION & TESTING
5.1 Introduction
The proposed system runs the model in the browser directly, without any installation or
high-end GPU cores. Users can interactively train, in real time, a Machine Learning model
to classify input from a video camera. The approach used here is transfer learning, which allows
re-training a pre-trained model and saves time. This API can run an existing model or retrain
it, and the trained model can be saved for reuse with other models.
The input images are sent to the MobileNet model, which uses a convolutional neural network to
extract the features. Instead of running through all the layers, the image is processed only up to the
second-to-last layer, excluding the final label and class layer. The extracted features, called logits, are sent
to the KNN classifier to differentiate the object from the background, and they are labelled with custom
classes by the user. The classifier then predicts the classes from the webcam input, and the
respective tasks are performed.
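Saving and reloading the trained classifier, assuming ml5's KNNClassifier API as used in the appendix, is a short sketch:

knn.save('model.json'); // downloads the collected examples as JSON
knn.load('model.json', function() {
  console.log('classifier restored; ready to serve predictions');
});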
Sample test cases: Here, the inputs are given as video frames through the webcam to the
MobileNet model. For the input, different positions of a hand were shown to the webcam, and for
each position a few examples were trained at different angles. The
same procedure was followed to train the various positions of the slider with respect to the hand
positions. Finally, the train button is clicked to train the model. The output is shown in the canvas
of the browser as a block that moves with respect to the position of the hand. The classes for
each position are trained individually, and each predicted class has a user-assigned task, here
the position of the slider with respect to the block on the canvas. The inputs and outputs are shown
in figures 5.1 and 5.2.
There are some limitations of this API which can affect the probability and accuracy of the
predictions, such as image size, camera pixel quality, and the angle and position of the object.
5.2 Input & Output
5.2.1 Input
5.2.2 Output
Fig 5.2.c Output 3
The output is shown in the canvas of the browser as a block that moves with
respect to the position of the hand. The classes for each position are trained individually, and each
predicted class has a user-assigned task, here the position of the slider with respect to the block
on the canvas. The probability of the prediction is shown as a regression value in the canvas.
5.3 Limitations
5.3.1. Image Quality
Image quality affects how well neural network algorithms work. The image quality of
webcam video is quite low compared with that of a digital camera. Even high-definition video is, at
best, 1080p (progressive scan); usually it is 720p. These values are equivalent to about 2 MP and
0.9 MP, respectively, while an inexpensive digital camera attains 15 MP. The difference is quite
noticeable.
5.3.3 Object Angle and Position
When training a model, the angle and position of the object or gesture shown to the
camera should be the same as at recognition time; otherwise the probability of the prediction will be low.
5.4 Testing
After implementation, the model is trained with more than 20 image examples for each
position, with the slider indicating the coordinates. When the object is recognized by the classifier, the
probability of the position is displayed in the canvas below, and the object moves according to the labels,
which contain the coordinates of the object. The probability of the position depends on the number of
examples given.
5.5 Summary
This chapter presented various inputs and outputs of the proposed system, and the procedure
of the application and its execution were explained, along with the limitations which decrease the
accuracy of the predictions. Training on the input and predicting the output are done concurrently, a
method called inference mode, and this is one of the major advantages of this project.
CHAPTER 6
RESULTS AND DISCUSSIONS
Fig 6.1 Comparison of existing and proposed system
1. Apps are easy to share: Models run directly in the browser without additional files or
installations and can be shared using a URL hosted on a server. There is no longer a need to link
JavaScript to a Python file running in the cloud. And instead of fighting with virtual environments or
package managers, all dependencies can be included as HTML script tags.
2. The client provides the compute power: Training and predictions are offloaded to the user's
hardware. This eliminates significant cost and effort for the developer.
3. Data never leaves the client's device: This is crucially important as users are increasingly
concerned about protecting their sensitive information, especially in the wake of massive data scandals
and security breaches. With TensorFlow.js, users can take advantage of AI without sending their
personal data over a network and sharing it with a third party. This makes it easier to build secure
applications that satisfy data security regulations, e.g. healthcare apps that tap into wearable medical
sensors.
CHAPTER 7
CONCLUSION AND FUTURE ENHANCEMENTS
7.1 Conclusion
Instead of training a model to do something very narrow (for example, recognize cats), this
model can be trained to recognize any input data. Transfer learning is used to re-train a pre-trained
model, here MobileNet. This enables users to train a model within a browser without needing
to install any libraries or own a high-end workstation. The application helps researchers and
data scientists run a model and serve predictions directly from the browser. The API was tested with
many test cases and can approach 100 percent accuracy, subject to the limitations described above.
APPENDICES
Sample Source Code
index.html
<!DOCTYPE html>
<html>
<head>
<title>Major Project</title>
<script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/0.8.0/p5.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/0.8.0/addons/p5.dom.min.js"></script>
<script src="https://unpkg.com/[email protected]/dist/ml5.min.js"></script>
<script src="sketch.js"></script>
</head>
<body></body>
</html>
sketch.js
let video;
let features;
let knn;
let labelP;
let ready = false;
let x;
let y;
let label = 'nothing';
// p5.js setup: create the canvas, start the webcam capture, and load the models
function setup() {
createCanvas(320, 240);
video = createCapture(VIDEO);
video.size(320, 240);
features = ml5.featureExtractor('MobileNet', modelReady);
knn = ml5.KNNClassifier();
labelP = createP('need training data');
labelP.style('font-size', '32pt');
x = width / 2;
y = height / 2;
}
// Classify the current video frame, then schedule the next classification
function goClassify() {
const logits = features.infer(video);
knn.classify(logits, function(error, result) {
if (error) {
console.error(error);
} else {
label = result.label;
labelP.html(result.label);
goClassify();
}
});
}
// Keys l/r/u/d add a labelled training example from the current frame; s saves the model
function keyPressed() {
const logits = features.infer(video);
if (key == 'l') {
knn.addExample(logits, 'left');
console.log('left');
} else if (key == 'r') {
knn.addExample(logits, 'right');
console.log('right');
} else if (key == 'u') {
knn.addExample(logits, 'up');
console.log('up');
} else if (key == 'd') {
knn.addExample(logits, 'down');
console.log('down');
} else if (key == 's') {
knn.save('model.json'); // download the trained classifier as JSON
}
}
// Called once MobileNet has loaded; also tries to restore a previously saved classifier
function modelReady() {
console.log('model ready!');
knn.load('model.json', function() {
console.log('knn loaded');
});
}
// Move the circle according to the predicted label; start classifying once examples exist
function draw() {
background(0);
fill(255);
ellipse(x, y, 24);
if (label == 'left') {
x--;
} else if (label == 'right') {
x++;
} else if (label == 'up') {
y--;
} else if (label == 'down') {
y++;
}
//image(video, 0, 0);
if (!ready && knn.getNumLabels() > 0) {
goClassify();
ready = true;
}
}
REFERENCES
[1] Chuanqi Tan, Fuchun Sun, Tao Kong, Wenchang Zhang, Chao Yang and Chunfang Liu: A
Survey on Deep Transfer Learning (2018)
[2] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias
Weyand, Marco Andreetto and Hartwig Adam: MobileNets: Efficient Convolutional Neural
Networks for Mobile Vision Applications (2017)
[3] Bauer, M., Rojas-Carulla, M., Świątkowski, J.B., Schölkopf, B. and Turner, R.E., 2017.
Discriminative k-shot learning using probabilistic models. arXiv preprint arXiv:1706.00326.
[4] Bragg, D., Huynh, N. and Ladner, R.E., 2016, October. A personalizable mobile sound detector
app design for deaf and hard-of-hearing users. In Proceedings of the 18th International ACM
SIGACCESS Conference on Computers and Accessibility (pp. 3-13). ACM.
[5] Flores, G.H. and Manduchi, R., 2016, October. WeAllWalk: An Annotated Data Set of Inertial
Sensor Time Series from Blind Walkers. In Proceedings of the 18th International ACM
SIGACCESS Conference on Computers and Accessibility (pp. 141-150). ACM.
[6] Fowler, A., Roark, B., Orhan, U., Erdogmus, D. and Fried-Oken, M., 2013. Improved inference
and autotyping in EEG-based BCI typing systems. In Proceedings of the 15th International
ACM SIGACCESS Conference on Computers and Accessibility (p. 15). ACM.
[7] Daniel Smilkov, Nikhil Thorat, Yannick Assogba, Ann Yuan, Nick Kreeger, Ping Yu, Kangyi
Zhang, Shanqing Cai, Eric Nielsen, David Soergel, Stan Bileschi, Michael Terry, Charles
Nicholson, Sandeep N. Gupta, Sarah Sirajuddin, D. Sculley, Rajat Monga, Greg Corrado,
Fernanda B. Viégas and Martin Wattenberg: TensorFlow.js: Machine Learning for the Web
and Beyond (2019)
[8] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis,
J. Dean, M. Devin, et al. TensorFlow: Large-scale Machine Learning on heterogeneous
systems, 2015. Software available from tensorflow.org.