
AI VIETNAM

All-in-One Course

NLP Project

Neural Machine Translation

AI VIET NAM
Nguyen Quoc Thai

1
Year 2023
Outline
Ø Introduction
Ø NMT using Transformer
Ø NMT using Pre-trained LMs

2
Introduction
! Translate a sentence w(s) in a source language (input) to a sentence w(t) in the
target language (output)

3
Introduction
! Translate a sentence w(s) in a source language (input) to a sentence w(t) in the
target language (output)

Automatic Speech Recognition (ASR): translation of spoken language into text
Natural Language Understanding (NLU): a computer's ability to understand language
Natural Language Generation (NLG): generation of natural language by a computer

q Syntax
q Semantics
q Phonology
q Pragmatics
q Morphology

4
Introduction
! Translate a sentence w(s) in a source language (input) to a sentence w(t) in the
target language (output)

Ø Can be formulated as an optimization problem:


! (") = argmax 𝜃( 𝑤 (%) , 𝑤 (&) )
𝑤
$(")
Where 𝜃 is a scoring function over source and target sentences
Ø Requires two components:
q Learning algorithm to compute parameters of 𝜃
! (")
q Decoding algorithm for computing the best translation 𝑤
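A minimal sketch of this decoding view, assuming a hypothetical scoring function theta(source, target) and a hypothetical candidate set (for illustration only; real systems do not enumerate all possible translations):

def decode(theta, source, candidates):
    # Return the candidate translation with the highest score theta(w_s, w_t)
    return max(candidates, key=lambda target: theta(source, target))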

5
Introduction

(Timeline figure: milestones in machine translation around 1950, 1980, 1990, 2007, 2015)


6
Introduction
! Evaluating translation quality

Ø Human judgement
q Given: machine translation output
q Given: source / reference translation
q Task: assess the quality of machine translation output
Ø Different translations of “A Vinay le gusta Python”

7
Introduction
! Evaluating translation quality

Ø Two main criteria:


q Adequacy: Translation w(t) should adequately reflect the linguistic content of w(s)
q Fluency: Translation w(t) should be fluent text in the target language

Ø Different translations of “A Vinay le gusta Python”

8
Introduction
! Evaluating translation quality

Ø Two main criteria:


q Adequacy: Translation w(t) should adequately reflect the linguistic content of w(s)
q Fluency: Translation w(t) should be fluent text in the target language

Ø Adequacy and fluency rating scales:

Score | Adequacy       | Fluency
5     | All meaning    | Flawless English
4     | Most meaning   | Good English
3     | Much meaning   | Non-native English
2     | Little meaning | Disfluent English
1     | None           | Incomprehensible

9
Introduction
! Evaluating Metrics

Ø Manual evaluation is most accurate, but expensive


Ø Automated evaluation metrics:
q Compare system hypothesis with reference translations
q BLEU Score (BiLingual Evaluation Understudy): Modified n-gram Precision
q SacreBLEU Score (A Call for Clarity in Reporting BLEU Scores)

10
Introduction
! Evaluating Metrics

Precision and Recall of words


System A:  A officials responsibility of airport safety
Reference: A officials are responsible for airport security

Ø Precision = correct / output-length = 3/6 = 50%
Ø Recall = correct / reference-length = 3/7 = 43%
Ø F-measure = (P × R) / ((P + R)/2) = (0.5 × 0.43) / ((0.5 + 0.43)/2) ≈ 46%
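A minimal Python sketch that reproduces these numbers (simple word-set matching, for illustration only):

system    = "A officials responsibility of airport safety".split()
reference = "A officials are responsible for airport security".split()

correct   = len(set(system) & set(reference))                   # 3: "A", "officials", "airport"
precision = correct / len(system)                               # 3/6 = 0.50
recall    = correct / len(reference)                            # 3/7 ≈ 0.43
f_measure = (precision * recall) / ((precision + recall) / 2)   # ≈ 0.46
print(precision, recall, f_measure)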

11
Introduction
! Evaluating Metrics

Precision and Recall of words


v Flaw: no penalty for reordering

System A:  A officials responsibility of airport safety
Reference: A officials are responsible for airport security
System B:  airport security A officials are responsible

Metric    | System A | System B
Precision | 50%      | 100%
Recall    | 43%      | 86%
F-measure | 46%      | 92.5%

12
Introduction
! Evaluating Metrics

BLEU
v N-gram overlap between machine translation output and reference translation
v Compute precision for n-grams of size 1 to 4
v Add brevity penalty (for too short translations)
$\text{BLEU} = \min\left(1,\ \frac{\text{output-length}}{\text{reference-length}}\right) \left(\prod_{i=1}^{4} \text{precision}_i\right)^{1/4}$
v Typically computed over the entire corpus, not single sentences
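A minimal sentence-level sketch of this formula (single reference, clipped n-gram counts; for illustration only, since real BLEU is reported at corpus level):

from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(output, reference, max_n=4):
    out, ref = output.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        out_ngrams, ref_ngrams = ngram_counts(out, n), ngram_counts(ref, n)
        matched = sum((out_ngrams & ref_ngrams).values())   # clipped n-gram matches
        total = max(sum(out_ngrams.values()), 1)
        precisions.append(matched / total)
    if 0 in precisions:                                     # any zero n-gram overlap => BLEU 0
        return 0.0
    brevity = min(1.0, len(out) / len(ref))
    geo_mean = 1.0
    for p in precisions:
        geo_mean *= p ** (1.0 / max_n)
    return brevity * geo_mean

print(round(bleu("airport security A officials are responsible",
                 "A officials are responsible for airport security"), 2))   # ≈ 0.52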

13
Introduction
! Evaluating Metrics

BLEU: 1-gram precision
System A:  A officials responsibility of airport safety
Reference: A officials are responsible for airport security
System B:  airport security A officials are responsible

Metric             | System A | System B
Precision (1-gram) | 3/6      | 6/6
Precision (2-gram) |          |
Precision (3-gram) |          |
Precision (4-gram) |          |
Brevity penalty    |          |
BLEU               |          |
14
Introduction
! Evaluating Metrics

BLEU
System A:  A officials responsibility of airport safety
Reference: A officials are responsible for airport security
System B:  airport security A officials are responsible

Metric             | System A | System B
Precision (1-gram) | 3/6      | 6/6
Precision (2-gram) | 1/5      | 4/5
Precision (3-gram) | 0/4      | 2/4
Precision (4-gram) | 0/3      | 1/3
Brevity penalty    | 6/7      | 6/7
BLEU               | 0        | 0.52
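Plugging System B's counts into the formula from the previous slide:

$\text{BLEU}_B = \frac{6}{7} \times \left(\frac{6}{6} \times \frac{4}{5} \times \frac{2}{4} \times \frac{1}{3}\right)^{1/4} \approx 0.857 \times 0.604 \approx 0.52$

For System A the 3-gram and 4-gram precisions are 0, so the geometric mean, and therefore BLEU, is 0.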
15
Introduction
! Evaluating Metrics

BLEU
$\log \text{BLEU} = \min\left(1 - \frac{r}{c},\ 0\right) + \sum_{n=1}^{N} w_n \log p_n$

r: reference length, c: output (candidate) length
n: n-gram order (1, 2, 3, 4)
$w_n$: weight of the n-gram precision, uniform weights $w_n = 1/N$ (here N = 4)
$p_n$: n-gram precision
SacreBLEU (A Call for Clarity in Reporting BLEU)
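A minimal usage sketch, assuming the sacrebleu package is installed (pip install sacrebleu):

import sacrebleu

hypotheses = ["airport security A officials are responsible"]
references = ["A officials are responsible for airport security"]

# corpus_bleu takes the system outputs and a list of reference streams,
# and reports a corpus-level score with a standardized, reproducible configuration.
result = sacrebleu.corpus_bleu(hypotheses, [references])
print(result.score)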

16
Introduction
! Evaluating Metrics

17
Outline
Ø Introduction
Ø NMT using Transformer
Ø NMT using Pre-trained LMs

18
NMT using Transformer
! Sequence to Sequence

v A single neural network is used to translate from source to target


v Architecture: Encoder-Decoder
v Encoder: Convert source sentence (input) into a vector/matrix (State)
v Decoder: Convert encoding into a sentence in target language (output)

(Diagram: Input → Encoder → State → Decoder → Output; the state is a "thought vector" that captures all information of the input sentence)
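A minimal sketch of this encoder-decoder setup, assuming PyTorch's nn.Transformer (hypothetical vocabulary/model sizes; positional encodings and padding masks omitted for brevity):

import torch.nn as nn

VOCAB_SIZE, D_MODEL = 8000, 512          # hypothetical sizes

class Seq2SeqNMT(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.tgt_emb = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.transformer = nn.Transformer(d_model=D_MODEL, batch_first=True)
        self.generator = nn.Linear(D_MODEL, VOCAB_SIZE)   # decoder states -> vocabulary logits

    def forward(self, src_ids, tgt_ids):
        # The encoder turns the source sentence into a sequence of states ("thought vectors");
        # the decoder attends to those states while generating the target sentence.
        causal_mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(self.src_emb(src_ids), self.tgt_emb(tgt_ids), tgt_mask=causal_mask)
        return self.generator(hidden)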

19
NMT using Transformer
! Transformer Model

20
NMT using Transformer
! Training
(Diagram: training with teacher forcing. The source "Tôi đi làm" is fed to the ENCODER; the DECODER receives "<start> I go to work" and predicts "I go _earn work <end>"; the loss compares this prediction against the target "I go to work <end>".)
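A minimal teacher-forcing training step, continuing the hypothetical Seq2SeqNMT sketch from the earlier slide:

import torch.nn.functional as F

def training_step(model, src_ids, tgt_ids, pad_id=0):
    # Decoder input is the target shifted right ("<start> I go to work"),
    # the loss compares predictions with the target shifted left ("I go to work <end>").
    decoder_input  = tgt_ids[:, :-1]
    decoder_target = tgt_ids[:, 1:]
    logits = model(src_ids, decoder_input)                # (batch, target_len, vocab)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           decoder_target.reshape(-1),
                           ignore_index=pad_id)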
21
NMT using Transformer
! Training

How to choose the "best candidate"?

(Diagram: Input Sequence (Source) → ENCODER → DECODER → Output Sequence (Target))

22


NMT using Transformer
! Greedy Decoding
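A minimal greedy-decoding sketch for the hypothetical Seq2SeqNMT model above (start_id and end_id stand for the <start>/<end> token ids):

import torch

@torch.no_grad()
def greedy_decode(model, src_ids, start_id, end_id, max_len=50):
    # At each step keep only the single most probable next token.
    tgt_ids = torch.tensor([[start_id]])
    for _ in range(max_len):
        logits = model(src_ids, tgt_ids)          # (1, current_len, vocab)
        next_id = logits[0, -1].argmax().item()
        tgt_ids = torch.cat([tgt_ids, torch.tensor([[next_id]])], dim=1)
        if next_id == end_id:
            break
    return tgt_ids[0].tolist()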

23
Outline
Ø Introduction
Ø NMT using Transformer
Ø NMT using Pre-trained LMs

24
NMT using Pre-trained LMs
! Pre-trained LMs

25
NMT using Pre-trained LMs
! Pre-trained LMs

Source
26
NMT using Pre-trained LMs
! Pre-trained LMs

27
NMT using Pre-trained LMs
! Pre-trained LMs: BERT

v BERT: An encoder-only model


v Maps an input sequence to a contextualized sequence: $f_{\theta_{\text{BERT}}}: \mathbf{X}_{1:n} \rightarrow \overline{\mathbf{X}}_{1:n}$
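A minimal sketch of this mapping with the Hugging Face transformers library (assuming the bert-base-uncased checkpoint):

from transformers import AutoTokenizer, BertModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("I go to work", return_tensors="pt")
outputs = bert(**inputs)
# One contextualized vector per input token: X_{1:n} -> X̄_{1:n}
print(outputs.last_hidden_state.shape)   # (1, n, 768) for bert-base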

28
NMT using Pre-trained LMs
! Pre-trained LMs: BERT

29
NMT using Pre-trained LMs
! Pre-trained LMs: GPT2

v GPT2: A decoder-only model that uses uni-directional (causal) self-attention


v Maps an input sequence to a “next word” logit vector sequence:
$f_{\theta_{\text{GPT2}}}: \mathbf{X}_{0:m-1} \rightarrow \mathbf{L}_{1:m}$
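A minimal sketch with the transformers library (assuming the gpt2 checkpoint):

from transformers import AutoTokenizer, GPT2LMHeadModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("I go to", return_tensors="pt")
outputs = gpt2(**inputs)
# One "next word" logit vector per position: X_{0:m-1} -> L_{1:m}
print(outputs.logits.shape)   # (1, m, vocab_size)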

30
NMT using Pre-trained LMs
! Pre-trained LMs: GPT2

31
NMT using Pre-trained LMs
! Encoder-Decoder with BERT and GPT2
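A minimal warm-starting sketch with the transformers EncoderDecoderModel class (the checkpoint names here are illustrative; the experiments below pair BERT with BERT or GPT2):

from transformers import AutoTokenizer, EncoderDecoderModel

# BERT as the encoder, GPT2 as the decoder; cross-attention layers in the decoder
# are added and randomly initialized, then trained on the translation data.
model = EncoderDecoderModel.from_encoder_decoder_pretrained("bert-base-multilingual-cased", "gpt2")

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id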

32
NMT using Pre-trained LMs
! BERT for Encoder

33
NMT using Pre-trained LMs
! BERT for Decoder

34
NMT using Pre-trained LMs
! GPT2 for Decoder

35
NMT using Pre-trained LMs
! Experiment

v Dataset: IWSLT’15 English-Vietnamese


Training: 133,317 sentence pairs | Validation: 1,553 | Test: 1,269

Experiment | Model                                 | SacreBLEU (1-4 gram precisions)
#1         | Standard Transformer (Greedy Search)  | 24.66 (55.9/30.3/18.5/11.8)
#2         | BERT-to-BERT (Greedy Search)          | 25.41 (53.8/31.8/19.8/12.3)
#3         | BERT-to-GPT2 (Greedy Search)          | 23.56 (49.1/28.5/18.4/12.0)

36
Thanks!
Any questions?

37
