
VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY


FACULTY OF ELECTRICAL AND ELECTRONICS ENGINEERING

GRADUATION INTERNSHIP REPORT

ARTIFICIAL INTELLIGENCE IN AN
INDUSTRIAL ENVIRONMENT
Major: Control and Automation Engineering

COMPANY: Emage Development


SUPERVISOR: Dr. Nguyen Hoang Giap
—o0o—
STUDENT: Doan Tien Thong (1915352)

HO CHI MINH CITY, APRIL 2023


Acknowledgements

Throughout my internship, I have received support and assistance from many people, and I
would like to take this opportunity to express my heartfelt gratitude.

First and foremost, I would like to thank my internship supervisor, Dr. Nguyen Hoang Giap,
for his constant encouragement, invaluable guidance, and unwavering support throughout my
internship. His insights and expertise have been instrumental in my professional development,
and I am truly grateful for the opportunity to learn from him.

I am also grateful to the entire team at Emage Development company, who welcomed me
warmly and provided me with a conducive environment to hone my skills and explore new
horizons.

I would like to extend my gratitude to the Faculty of Electrical and Electronics Engineering for
their invaluable guidance, support, and feedback on my internship report. Their expertise and
suggestions have significantly contributed to the quality of this report.

I am also thankful to the Ho Chi Minh City University of Technology for providing me with
the platform and resources to pursue this internship. The experience has been vital to my
academic and professional growth.

I am deeply grateful to my family for their unwavering love, encouragement, and patience
throughout my internship journey. Their support has been the backbone of my success, and
I cannot thank them enough for their belief in me.

Lastly, I would like to express my appreciation to my friends and fellow interns, who have
shared this journey with me. Your support, collaboration, and friendship have made this
experience both enjoyable and memorable.

Abstract

This internship report presents an overview of the author's experiences, key accomplishments,
and learning outcomes during an internship at Emage Development company as an AI Engineer.

Throughout the internship, the intern was engaged in various projects and tasks related to
Artificial Intelligence, such as classification, object detection, and segmentation. This hands-on
experience allowed the intern to develop and strengthen essential technical and interpersonal
skills, such as programming, the use of deep learning frameworks, problem-solving, teamwork,
and effective communication.

The report details the intern’s experiences and learning outcomes, along with the challenges
encountered and the strategies adopted to overcome them. Furthermore, the report highlights
the intern’s contributions to the organization and reflects on the impact of the internship on
their academic and professional growth.

In conclusion, the internship experience at Emage Development company has been invaluable
in bridging the gap between academic learning and industry practices. The skills and experience
gained while working there will help the intern excel in future career endeavors.

Contents

Acknowledgements

Abstract

Contents

List of Figures

1 Introduction
1.1 Introduction to Emage Development company
1.2 Internship Project
1.3 Internship Timeline

2 Internship Content
2.1 Research about Company's Product
2.2 Project 1: Research about EfficientNet
2.3 Project 2: Template Matching using DL

3 Internship Summary
3.1 Internship Result
3.2 Experience Gained

References

List of Figures

1.1 Emage Group logo

2.1 Two core products of the company

2.2 Model Scaling. (a) is a baseline network example; (b)-(d) are conventional scaling that
only increases one dimension of network width, depth, or resolution. (e) Proposed compound
scaling method that uniformly scales all three dimensions with a fixed ratio.

2.3 Scaling Up a Baseline Model with Different Network Width (w), Depth (d), and
Resolution (r) Coefficients.

2.4 Scaling Network Width for Different Baseline Networks

2.5 The architecture for the baseline network of EfficientNet-B0 is simple and clean, making
it easier to scale and generalize.

2.6 All requirements of Project 2

2.7 The baseline network architecture of QATM. The dashed arrows indicate the replacement
relationship.

2.8 Some demos of the proposed method on the OTB public dataset

Chapter 1

Introduction

This chapter gives brief information about the Emage Development company, covering its
establishment, development history, and vision. It then presents an overview of the internship
program and its timeline.

1.1 Introduction to Emage Development company


Emage Vision was founded in 2011. The company's story started when its machine vision
solutions gave "eyes" to industrial machines. With the advent of smart manufacturing, Emage
Vision saw an opportunity to provide solutions to transform manufacturing.

In 2016, Emage Vision started to develop AI solutions to help customers improve their
manufacturing processes. In 2019, the Machine Learning software was adopted and integrated
into major customers' manufacturing lines. By then, these solutions provided both the "eyes"
and "brains" of industrial machines.

Figure 1.1: Emage Group logo


In 2020, Emage prototyped a humanoid, enabling the company to incorporate "touch" into its
suite of solutions. By combining machine vision, AI, and the humanoid, the company gives
"eyes", "brains", and "touch" to help customers enlarge their footprint in their smart
manufacturing efforts.

Today, the company is headquartered in Singapore, with R&D centres in Russia, India, and
Vietnam, and field operations in the USA and the Philippines. The organisation is dedicated
to relentless innovation, delivering on its commitments, and supporting its customers.

1.2 Internship Project


These are the tasks assigned during the internship:

• Research on the EfficientNet network for the classification task.

• Research and apply other advanced deep learning techniques to solve real company projects
(such as template matching using deep learning, domain-adaptive models, etc.).

1.3 Internship Timeline


Working time at Emage Development company: from 26 April 2022 to 28 December 2022,
including:

• 26 April - 26 June: Internship

• 27 June - 28 December: Full-time work

Chapter 2

Internship Content

This chapter will go into detail about the implementation of projects during the internship
as an AI Engineer at Emage Development company.

2.1 Research about Company’s Product


Emage Group provides specialized and customized vision solutions to the medical, consumer
electronics, and semiconductor industries. These technologies have helped customers build
billions of quality products. The company uses differentiated technology to design, develop,
and deploy these customized vision modules, integrating them into manufacturing processes
or providing standalone systems.

Machine Learning Product


Emage Group is one of the pioneers in Singapore to successfully deploy its machine learning
software as a yield-improvement tool, helping customers save millions of dollars. A proven
platform with a simple approach to problem-solving, the Machine Learning software can be
deployed in most manufacturing processes.

Today, most manufacturing processes rely on machine vision for product quality inspection.
Overkills and false rejects cost money, entail more resources, and affect overall efficiency.
These machine-learning solutions can enhance vision inspections and reduce false rejects,
resulting in better yields. This proven platform can help label, train, and predict on all your
images, with an accuracy better than 95%.

AEON (Autonomous Equipment + Operation Networking) is the company's Neural Network and
Reinforcement Learning software that provides "autonomous" operations and defect classification.


Figure 2.1: Two core products of the company

This platform will bring customers a step closer to a fully automated manufacturing
environment.

OSPREY (Operation Specific Process Recovery Ecosystem) is the company's real-time trend
analysis utility tool that combines Machine Learning defect classification with an alert
management system to help process and manufacturing engineers troubleshoot and diagnose
issues faster and more easily.

2.2 Project 1: Research about EfficientNet


EfficientNet [2] is the core model of the company's AEON product, so every employee must
have deep knowledge of this model.

EfficientNet uses a compound coefficient to scale up models in a simple but effective manner.
Instead of arbitrarily scaling up width, depth, or resolution, compound scaling uniformly
scales each dimension with a fixed set of scaling coefficients. Using this scaling method
together with AutoML, the authors of EfficientNet developed a family of models (B0-B7) that
surpassed the state-of-the-art accuracy of most convolutional neural networks, with much
better efficiency.


Figure 2.2: Model Scaling. (a) is a baseline network example; (b)-(d) are conventional scaling
that only increases one dimension of network width, depth, or resolution. (e) Proposed
compound scaling method that uniformly scales all three dimensions with a fixed ratio.

2.2.1 Scaling Dimensions of CNNs


There are three scaling dimensions of a CNN: depth, width, and resolution.

Depth Scaling (d)

Depth scaling increases the receptive field of the model: a deeper network can capture richer
and more complex features and generalizes better to new tasks. The challenge is vanishing
gradients, one of the most common problems as the network gets deeper, so adding more layers
does not always help. For example, ResNet-1000 has accuracy similar to ResNet-101.

Width Scaling (w)

Wider networks tend to capture more fine-grained features, and smaller models are easier to
train. The challenge is that in extremely wide but shallow networks (less deep but wider),
accuracy saturates quickly as the width grows.

Resolution Scaling (r)

With high-resolution images, the features are more fine-grained, so higher-resolution inputs
should work better. For example, object detection tasks commonly use image resolutions such
as 300x300, 512x512, or 600x600. However, accuracy does not scale linearly with resolution.


Figure 2.3: Scaling Up a Baseline Model with Different Network Width (w), Depth (d), and
Resolution (r) Coefficients.

Figure 2.4: Scaling Network Width for Different Baseline Networks

Observation 1: Scaling up any dimension of the network (width, depth, or resolution)
improves accuracy, but the accuracy gain diminishes for bigger models.

2.2.2 Combined Scaling


Intuitively, as the resolution of the images increases, the depth and width of the network
should be increased as well. As the depth increases, larger receptive fields can capture
similar features that span more pixels in the image. Also, as the width increases, more
fine-grained features can be captured.
For example, as shown in Figure 2.4 from the paper, with deeper networks and higher
resolution, width scaling achieves much better accuracy under the same FLOPS cost.

Observation 2: It is critical to balance all dimensions of a network (width, depth, and
resolution) when scaling CNNs in order to achieve better accuracy and efficiency.

2.2.3 Proposed Compound Scaling


The paper proposes the compound scaling method, which uses a compound coefficient φ to
uniformly scale network width, depth, and resolution in a principled way.

φ is a user-specified coefficient that controls how many more resources are available for
model scaling, whereas α, β, and γ specify how to assign these extra resources to network
depth, width, and resolution, respectively.
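
Concretely, the compound scaling rule from the paper [2] is:

depth: d = α^φ
width: w = β^φ
resolution: r = γ^φ
subject to α · β² · γ² ≈ 2 and α ≥ 1, β ≥ 1, γ ≥ 1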
Note:

• In a CNN, Conv layers are the most compute-expensive part of the network, and the FLOPS
of a regular convolution op are almost proportional to d, w², and r².

• For example, doubling the depth will double the FLOPS, while doubling the width or
resolution increases the FLOPS almost four-fold (see the worked check below).
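
As a quick check of these proportions, assume FLOPS ∝ d · w² · r². Doubling the depth gives
2d · w² · r², i.e. 2× the FLOPS, while doubling the width gives d · (2w)² · r² = 4 · d · w² · r²,
i.e. 4× the FLOPS; the same holds for doubling the resolution.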

2.2.4 EfficientNet Architecture


Scaling does not change the layer operations; hence it is better to first have a good baseline
network and then scale it along different dimensions using the proposed compound scaling.
In the paper, the authors obtained their baseline network with a Neural Architecture Search
(NAS) that optimizes for both accuracy and FLOPS.

EfficientNet-B0 baseline network:

The specific architecture of EfficientNet-B0 is shown in Figure 2.5. Its main building block
is MBConv: the inverted residual block from MobileNetV2, with a Squeeze-and-Excitation block
sometimes injected. A sketch of such a block is given below.
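
As a rough illustration (this is not the company's implementation, and the hyperparameter
choices here are illustrative rather than the exact B0 configuration), a minimal tf.keras
sketch of one MBConv block might look like:

import tensorflow as tf
from tensorflow.keras import layers

def mbconv_block(x, out_channels, expand_ratio=6, kernel_size=3,
                 stride=1, se_ratio=0.25):
    # One MBConv block: 1x1 expansion -> depthwise conv ->
    # squeeze-and-excite -> 1x1 projection, with a residual
    # connection when shapes allow.
    in_channels = x.shape[-1]
    expanded = in_channels * expand_ratio

    # 1x1 expansion convolution
    h = layers.Conv2D(expanded, 1, padding="same", use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.Activation("swish")(h)

    # depthwise convolution
    h = layers.DepthwiseConv2D(kernel_size, strides=stride,
                               padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.Activation("swish")(h)

    # squeeze-and-excite: global pooling -> bottleneck -> channel gate
    se_channels = max(1, int(in_channels * se_ratio))
    se = layers.GlobalAveragePooling2D(keepdims=True)(h)
    se = layers.Conv2D(se_channels, 1, activation="swish")(se)
    se = layers.Conv2D(expanded, 1, activation="sigmoid")(se)
    h = layers.Multiply()([h, se])

    # 1x1 projection back down to out_channels
    h = layers.Conv2D(out_channels, 1, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)

    # identity skip connection when spatial size and channels match
    if stride == 1 and in_channels == out_channels:
        h = layers.Add()([h, x])
    return h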


Figure 2.5: The architecture for the baseline network of EfficientNet-B0 is simple and clean,
making it easier to scale and generalize.

Figure 2.6: All requirements of Project 2

Scaling parameters: The model has four parameters to search for: α, β, γ, and φ. The
compound scaling method is applied to scale the baseline up in two steps:

1. Fix φ = 1, assuming that twice as many resources are available, and do a small grid search
for α, β, and γ. For the baseline network B0, the optimal values turned out to be α = 1.2,
β = 1.1, and γ = 1.15, such that α · β² · γ² ≈ 2.

2. Now fix α, β, and γ as constants (with the values found in step 1) and experiment with
different values of φ. The different values of φ produce EfficientNets B1-B7, as sketched
below.
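
As a quick illustration of step 2 (a simplification: the published B1-B7 configurations also
involve rounding and per-model adjustments), the following Python snippet shows how φ expands
the three multipliers:

# Grid-searched constants from the paper for EfficientNet-B0.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_multipliers(phi):
    # Multipliers applied to the baseline's layer count, channel count,
    # and input resolution, respectively.
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    return depth, width, resolution

for phi in range(1, 8):  # roughly corresponds to B1-B7
    d, w, r = compound_multipliers(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")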


Figure 2.7: The baseline network architecture of QATM. The dashed arrows indicate the
replacement relationship.

2.3 Project 2: Template Matching using DL


2.3.1 Project Requirements

From a template image of an IC component, find its position in a large picture of a circuit
system. Because of the confidentiality of corporate data, the images in this report are
replaced with random images collected from the internet. All of the requirements are
described in Figure 2.6.

2.3.2 Baseline - QATM Network


"QATM [1] : Quality-Aware Template Matching For Deep Learning" is a novel quality-aware
template matching method. Which is not only used as a standalone template matching algorithm,
but also a trainable layer that can be easily embedded into any deep neural network. QATM not
only outperforms state-of-the-art template matching methods when used alone, but also largely
improves existing deep network solutions. Figure 2.7 represents an overview architecture of the
QATM network
The authors observe the need to define Quality(s, t), i.e. how to assess the matching quality
between a search patch s and a template patch t. The rest of this section derives the
quality-aware template matching (QATM) measure, a proxy function for the ideal quality
assessment Quality(s, t).

Let f_s and f_t be the feature representations of patches s and t, and let ρ(·, ·) be a
predefined similarity measure between two patches, e.g. cosine similarity. Given a search
patch s, the likelihood that a template patch t is matched is defined in the equation below:

L(s|t) = exp{α · ρ(f_t, f_s)} / Σ_{t′ ∈ T} exp{α · ρ(f_{t′}, f_s)}

9
2.3. Project 2: Template Matching using DL

This likelihood function can be interpreted as a soft ranking of the current patch t against
all other patches in the template image in terms of matching quality. It can alternatively be
viewed as a heated-up softmax embedding, i.e. a softmax activation layer with a learnable
temperature parameter, α in our context.
In this way, the QATM measure can be defined simply as the product of the likelihoods that
s is matched in T and that t is matched in S, as shown in the equation below:

QATM(s, t) = L(t|s) · L(s|t)


Once the pairwise QATM scores between S and T are available, the matching quality of an ROI
s can be found as shown in the equation below:

q(s) = max{QATM(s, t) | t ∈ T}


where q(·) denotes the matching quality function. Eventually, the best-matched region R∗ is
the one that maximizes the overall matching quality, as shown in the equation below:

R∗ = arg max_R { Σ_{r ∈ R} q(r) }
The detailed algorithm of the QATM network is described in Algorithm 1.

Algorithm 1 Compute QATM and matching quality between two images


Require: a template image I_T and a search image I_S, a feature extraction model F, and a
temperature parameter α. Func(·|I) indicates applying the operation along the axis of I.
Returns the position of template I_T in search image I_S.
T ← F(I_T)
S ← F(I_S)
ρ_st ← PatchwiseSimilarity(T, S) ▷ easily obtained with off-the-shelf functions such as
tensorflow.einsum or tensorflow.tensordot
ρ_st ← ρ_st × α
L(s|t) ← Softmax(ρ_st | T)
L(t|s) ← Softmax(ρ_st | S)
QATM ← L(s|t) × L(t|s)
S_map ← Max(QATM | T) ▷ matching quality score per search patch
T_map ← Max(QATM | S)
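
A minimal NumPy sketch of Algorithm 1's core computation is shown below. It assumes the
patch features have already been extracted by F and flattened to one row per patch; the
function names and the default value of α are illustrative, not taken from the paper:

import numpy as np

def softmax(x, axis):
    # numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def qatm_maps(feat_s, feat_t, alpha=25.0):
    # feat_s: (Ns, D) search-image patch features
    # feat_t: (Nt, D) template patch features
    s = feat_s / np.linalg.norm(feat_s, axis=1, keepdims=True)
    t = feat_t / np.linalg.norm(feat_t, axis=1, keepdims=True)
    rho = alpha * (s @ t.T)              # (Ns, Nt) scaled cosine similarities
    l_s_given_t = softmax(rho, axis=1)   # normalize over template patches T
    l_t_given_s = softmax(rho, axis=0)   # normalize over search patches S
    qatm = l_s_given_t * l_t_given_s     # QATM(s, t) = L(s|t) * L(t|s)
    s_map = qatm.max(axis=1)             # matching quality q(s) per search patch
    t_map = qatm.max(axis=0)             # matching quality per template patch
    return s_map, t_map

# toy usage with random features
rng = np.random.default_rng(0)
s_map, t_map = qatm_maps(rng.normal(size=(100, 64)), rng.normal(size=(9, 64)))
best_patch = int(np.argmax(s_map))       # index of the best-matching search patch

Reshaping s_map back to the spatial grid of the search feature map yields the score map from
which the best-matched region R∗ is located.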

2.3.3 Proposed Method and Result


Due to a confidentiality agreement with the company, the proposed method and its results will
not be covered in detail in this report. Instead, this report demonstrates the method on some
images from the published OTB dataset [3].

Figure 2.8: Some demos of the proposed method on the OTB public dataset: (a) pair0001,
(b) pair0007, (c) pair0028, (d) pair0041

The results of the proposed method are shown in Figure 2.8, where the red rectangle is the
prediction from my method and the green one is the ground truth of the image.

Chapter 3

Internship Summary

3.1 Internship Result


During the internship, the assigned work was completed well:

• Gained deep knowledge of EfficientNet, a core network used in the company; became able to
build EfficientNet from scratch and use it to solve problems in the company.

• Gained deep knowledge of template matching using deep learning.

• Applied and improved the QATM baseline to solve real-world company tasks well, and was
able to write an article about that solution.

• Learned how to write reports to the manager and to present solutions to the team leader
and manager.

3.2 Experience Gained


During the internship at Emage Development, I not only learned a great deal of specialized
knowledge, but also developed soft skills such as writing reports, giving presentations, and
communicating with foreign customers.

Through my time at Emage Development, I was in contact with many colleagues in the company
and learned a lot from their experience. I took part in a range of AI Engineer tasks, which
gave me a clear direction and helped me along my career path.

Once again, thank you to everybody who helped me get this chance to work here.

References

[1] Jiaxin Cheng, Yue Wu, Wael AbdAlmageed, and Premkumar Natarajan. QATM: Quality-aware
template matching for deep learning. In Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, pages 11553–11562, 2019.

[2] Mingxing Tan and Quoc Le. EfficientNet: Rethinking model scaling for convolutional
neural networks. In International Conference on Machine Learning, pages 6105–6114. PMLR,
2019.

[3] Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang. Online object tracking: A benchmark. In IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2013.

