CH2-1 To CH2-3


A simple One Pass Compiler

1
Introduction
 In computer programming, a one-pass compiler is a compiler
that passes through the parts of each compilation unit only once,
immediately translating each part into its final machine code.

 One pass compiler reads the code only once and then translates it.

 A one-pass compiler is fast since all the compiler code is loaded in the
memory at once.

 It can process the source text without the overhead of the operating
system having to shut down one process and start another.

2
Introduction
 Building a one-pass compiler involves:
 Defining the syntax of the programming language (CFG/BNF)

 Developing a source-code parser (a top-down parser)

 Implementing syntax-directed translation to generate intermediate code

 Generating code

 Optimizing code

3
Structure of Compiler
Character stream → [ Lexical analyzer ] → token stream → [ Syntax-directed translator ] → intermediate representation

Develop the parser and code generator for the translator, starting from the syntax definition (CFG).

4
Syntax Definition
 To specify the syntax of a language : CFG and BNF (Backus–Naur form or
Backus normal form)

Example : the if-else statement in C has the form
statement → if ( expression ) statement else statement;

 An alphabet of a language is a set of symbols.

Examples : {0,1} for a binary number language = {0, 1, 100, 101, ...}

{a,b,c} for a language = {a, b, c, ac, abcc, ...}

{if, (, ), else, ...} for if-statements = {if(a==1)goto10, ...}

5
Syntax Definition
 Backus-Naur Form (BNF): BNF is a formal notation that uses a set of
symbols and production rules to describe the legal constructs and their
arrangement in a programming language. Components:
 Symbols: These can be terminal symbols (representing basic elements like
keywords, identifiers, operators, punctuation) or non-terminal symbols
(representing higher-level constructs like expressions, statements, or
program blocks).
 Production Rules: These rules define how symbols can be combined to
form valid program elements. They use the arrow (→) to separate the
left-hand side (non-terminal) from the right-hand side, which can be a
sequence of terminals, non-terminals, or a combination with options
(using | for OR).
6
CFG
 A Context-free Grammar (CFG) is used to describe the syntactic
structure of a language.

 A CFG is a set of recursive rules used to generate patterns of strings.

 In a CFG, the start symbol is used to derive a string. You derive the
string by repeatedly replacing a non-terminal by the right-hand side of a
production, until all non-terminals have been replaced by terminal symbols.

 It can describe most programming languages.

 If the grammar is properly designed, then an efficient parser can be
constructed automatically.

7
CFG
 A CFG recursively defines several sets of strings.

 Each set is denoted by a name, which is called a nonterminal.

 One of the nonterminals is chosen to denote the language described by the
grammar. This is called the start symbol of the grammar.

 Each production describes some of the possible strings that are contained in the set
denoted by a nonterminal.

 A production has the form N → X1…Xn,

where N is a nonterminal and X1…Xn are zero or more symbols, each of which is
either a terminal or a nonterminal.

8
CFG
 Some examples:

A → a

 says that the set denoted by the nonterminal A contains the
one-character string a.

A → aA

 says that the set denoted by A contains all strings formed by
putting an a in front of a string taken from the set denoted by A.

9
CFG

 From regular expressions to context free grammars

10
CFG
 Common syntactic categories in programming languages
are:
 Expressions:- are used to express calculation of values.
 Statements:- express actions that occur in a particular
sequence.
 Declarations:- express properties of names used in other parts
of the program.

11
CFG
 A CFG Is Characterized By a 4-tuple:

1. A Set of Tokens(Terminal Symbols)

2. A Set of Non-terminals

3. A Set of Production Rules

Each Rule Has the Form NT →{T, NT}*

4. A designated Start symbol.

12
Example CFG
 Context-free grammar for simple expressions

G = <{list, digit}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, list> with productions P =

list → list + digit

list → list - digit

list → digit

digit → 0|1|2|3|4|5|6|7|8|9

(the “|” means OR)

(so we could have written list → list + digit | list - digit | digit )

13
Derivation (used to determine the set of all strings)
 Given a CFG, we can determine the set of all strings (of tokens) generated by the
grammar using derivation.

 The basic idea of derivation is to consider productions as rewrite rules:
whenever we have a nonterminal, we can replace it by the right-hand side
of any production in which the nonterminal appears on the left-hand side.

 During parsing we have to make two decisions. These are as follows:

 We have to decide which non-terminal is to be replaced.
 We have to decide the production rule by which the non-terminal will be
replaced.

14
Derivation
 We begin with the start symbol

 In each step, we replace one nonterminal in the current
sentential form with one of the right-hand sides of a production
for that nonterminal.

 Formally, we define the derivation relation ⇒ by three rules:

1: αNβ ⇒ αγβ if there is a production N → γ

2: α ⇒* α

3: α ⇒* γ if there is a β such that α ⇒* β and β ⇒* γ

15
Derivation

The example grammar (shown in the accompanying figure) generates the string aabbbcc by the derivation shown.

16
Left-most Derivation
 The input is scanned and replaced with production rules from left to
right, so in a leftmost derivation we rewrite the leftmost nonterminal
of the string first. Example

 Production rules:
S → S + S
S → S - S
 S → a | b | c

 Input : a - b + c

17
Right-most Derivation
 The input is scanned and replaced with production rules from right
to left, so in a rightmost derivation we rewrite the rightmost nonterminal
of the string first. Example

 Production rules:
S → S + S
S → S - S
 S → a | b | c

 Input : a - b + c

18
Grammars are Used to Derive Strings:
 We can derive the string: 9 - 5 + 2 as follows:

 list → list + digit P1: list → list + digit

→list - digit + digit P2: list → list - digit

→digit - digit + digit P3:list→digit

→9 - digit + digit P4: digit →9

→9 - 5 + digit P4: digit → 5

→9 - 5 + 2 P4: digit → 2

This is an example of a leftmost derivation, because we replaced the
leftmost nonterminal (underlined) in each step.
19
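The derivation above can be mimicked mechanically. The following Python sketch (our own illustration, not from the slides) replaces the leftmost nonterminal at each step with a chosen right-hand side:

```python
# A minimal sketch of leftmost derivation for the list/digit grammar:
# at each step, find the leftmost nonterminal and rewrite it.

def leftmost_derive(start, steps, nonterminals):
    """Apply the chosen right-hand sides to the leftmost nonterminal."""
    sentential = [start]                     # sentential form as a symbol list
    forms = [" ".join(sentential)]
    for rhs in steps:
        # locate the leftmost nonterminal and splice in the chosen RHS
        i = next(k for k, s in enumerate(sentential) if s in nonterminals)
        sentential[i:i + 1] = rhs
        forms.append(" ".join(sentential))
    return forms

# the right-hand sides used in the slide's derivation of 9 - 5 + 2
steps = [["list", "+", "digit"], ["list", "-", "digit"], ["digit"],
         ["9"], ["5"], ["2"]]

forms = leftmost_derive("list", steps, {"list", "digit"})
print("\n".join(forms))   # final sentential form: 9 - 5 + 2
```

Each entry of `steps` names which production body to use, exactly mirroring P1, P2, P3, P4 in the slide.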
Defining Parse tree

➢ More Formally, a Parse Tree for a CFG Has the


Following Properties:
➢ The root of the tree is labeled by the start symbol
➢ Each leaf of the tree is labeled by a terminal (token) or ε
➢ Each interior node (non-leaf) is labeled by a non-terminal
➢ If A → x1x2…xn is a production, then A is an interior node and
x1x2…xn are the children of A; they may be non-terminals or
tokens.

20
Parse Tree for the Example
Grammar

➢ Parse tree of the string 9-5+2 using grammar G

21
Ambiguity
 A grammar is said to be ambiguous if there exists more than one left
most derivation or more than one right most derivation or more than
one parse tree for a given input string.

 Consider the following context-free grammar:

G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string>

with production P =

string → string + string | string - string | 0 | 1 | … | 9

 This grammar is ambiguous, because more than one parse tree
generates the string 9-5+2.

22
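To see that the ambiguity matters, the two parse trees for 9-5+2 can be represented as nested tuples and evaluated bottom-up. This sketch (our own illustration) shows they yield different values:

```python
# The two parse trees for 9-5+2 under the ambiguous grammar
# string -> string + string | string - string | 0..9,
# represented as (op, left, right) tuples with integer leaves.

left_tree  = ("+", ("-", 9, 5), 2)   # (9 - 5) + 2
right_tree = ("-", 9, ("+", 5, 2))   # 9 - (5 + 2)

def evaluate(t):
    """Evaluate a tree bottom-up; leaves are plain integers."""
    if isinstance(t, int):
        return t
    op, l, r = t
    lv, rv = evaluate(l), evaluate(r)
    return lv + rv if op == "+" else lv - rv

print(evaluate(left_tree))   # 6
print(evaluate(right_tree))  # 2
```

Since the two trees evaluate to 6 and 2 respectively, the grammar alone does not determine the meaning of the string.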
Ambiguity
 Two derivations (Parse Trees) for the same token string.

23
Associativity of Operators
➢ An operator θ is left-associative if the expression a θ b θ c must be evaluated
from left to right, i.e., as (a θ b) θ c.

➢ An operator θ is right-associative if the expression a θ b θ c must be evaluated
from right to left, i.e., as a θ (b θ c).

➢ An operator θ is non-associative if expressions of the form a θ b θ c are illegal.

➢ Left-associative operators have left-recursive productions.

eg) 9+5+2 ≡ (9+5)+2, a=b=c ≡ a=(b=c)

• Left Associative Grammar:
list → list + digit | list – digit
digit → 0|1|…|9

• Right Associative Grammar:
right → letter = right | letter
letter → a|b|…|z

24
Associativity of Operators
• Left Associative Grammar:
list → list + digit | list – digit
digit → 0|1|…|9

• Right Associative Grammar:
right → letter = right | letter
letter → a|b|…|z

25
Precedence of Operator
➢ A possible way of resolving the ambiguity is to use
precedence rules during syntax analysis to select among the
possible syntax trees.

➢ We say that an operator (*) has higher precedence than another
operator (+) if the operator (*) takes its operands before the other
operator (+) does.
• ex. 9+5*2 ≡ 9+(5*2), 9*5+2 ≡ (9*5)+2
• left associative operators : + , - , * , /
• right associative operators : = , **

26
Precedence of Operator

expr → expr + term | term


term → term * factor | factor
factor → number | ( expr )

String 2+3*5 has the same meaning as 2+(3*5)

[Parse tree of 2 + 3 * 5: expr expands to expr + term; the subtree
term → term * factor groups 3 * 5, so * binds tighter than +.]

27
Syntax-Directed Translation
 Parser uses a CFG(Context-free-Grammar) to validate the input string and produce
output for the next phase of the compiler. Output could be either a parse tree or an
abstract syntax tree.

 To interleave (mix) semantic analysis with the syntax analysis phase of the
compiler, we use Syntax Directed Translation.

 SDT adds augmented rules to the grammar that facilitate semantic analysis. SDT
involves passing information bottom-up and/or top-down through the parse tree in
the form of attributes attached to the nodes.

 Syntax-directed translation rules use :-


 lexical values of nodes,
 constants &
 attributes associated with the non-terminals in their definitions.

28
Syntax-Directed Translation
 SDT Uses a CF grammar to specify the syntactic
structure of the language.

 AND associates a set of attributes with (non)terminals.

 AND associates with each production a set of semantic rules for
computing values for the attributes.

 The attributes contain the translated form of the input after the
computations are completed.
29
Synthesized and Inherited
Attributes
 An attribute is said to be …
 synthesized if its value at a parse-tree node is
determined from the attribute values at the
children of the node
 inherited if its value at a parse-tree node is
determined by the parent (by enforcing the
parent’s semantic rules)
30
Attribute Grammar
 Attribute grammar is a special form of CFG where some additional information (attributes) are
appended to one or more of its non-terminals in order to provide context-sensitive information.

 Each attribute has well-defined domain of values, such as integer, float, character, string, and
expressions.

 AG is a medium to provide semantics to the CFG and it can help specify the syntax and semantics
of a programming language.

 Attribute grammar (when viewed as a parse-tree) can pass values or information among the nodes
of a tree.

 A syntax-directed definition may allow side effects (printing a value, updating a global variable, ...) in semantic rules.

 An attribute grammar is a syntax-directed definition in which the semantic rules cannot have side effects.

31
Example Attribute Grammar
 Example: E → E + T { E.value = E.value + T.value }

 The right part of the CFG contains the semantic rules that specify how the
grammar should be interpreted.

 Here, the values of non-terminals E and T are added together and the result is
copied to the non-terminal E.

 Each production rule of the CFG has a semantic rule.

Note: the semantic rules for expr use synthesized attributes, which obtain
their values from other rules; translated strings are joined with a
concatenation operator.

32
Notation
 The way an arithmetic expression is written is known as a
notation.

 An arithmetic expression can be written in three different but
equivalent notations, i.e., without changing the essence or
output of the expression.

 These notations are


 Infix Notation
 Prefix (Polish) Notation
 Postfix (Reverse-Polish) Notation
33
Notation
 We write expression in infix notation,

 e.g. a - b + c, where operators are used in-between operands


 Prefix Notation:- In this notation, operator is prefixed to operands,
 i.e. operator is written ahead of operands.
 For example, +ab. This is equivalent to its infix notation a + b. Prefix
notation is also known as Polish Notation.

 Postfix (Reverse Polish) Notation:- In this notation
style, the operator is postfixed to the operands,
 i.e., the operator is written after the operands.
 For example, ab+. This is equivalent to its infix notation a + b.
34
Postfix notation for an expression E
 If E is a variable or constant, then the postfix notation for
E is E itself ( E.t≡E )

 If E is an expression of the form E1 op E2 where op is a


binary operator
 E1' is the postfix of E1, E2' is the postfix of E2
 Then E1' E2' op is the postfix for E1 op E2
 if E is (E1), and E1' is a postfix then E1' is the postfix for E
eg) 9 - 5 + 2 ⇒ 9 5 - 2 +
9 - (5 + 2) ⇒ 9 5 2 + -
35
Postfix Evaluation Algorithm
 Postfix Evaluation Algorithm

We shall now look at the algorithm on how to evaluate postfix


notation −
 Step 1 − Scan the expression from left to right.
 Step 2 − If the token is an operand, push it onto the stack.
 Step 3 − If it is an operator, pop its operands from the stack and perform
the operation.
 Step 4 − Push the result of step 3 back onto the stack.
 Step 5 − Repeat until the whole expression has been scanned.
 Step 6 − Pop the stack to obtain the final result.

36
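The steps above can be sketched directly as a stack-based evaluator (a minimal version, assuming single-token integer operands separated by spaces):

```python
# Postfix evaluation with an explicit stack, following the algorithm above.

def eval_postfix(tokens):
    stack = []
    for tok in tokens:
        if tok in "+-*/":
            # pop the two topmost operands (the right operand is on top)
            right, left = stack.pop(), stack.pop()
            if tok == "+": stack.append(left + right)
            elif tok == "-": stack.append(left - right)
            elif tok == "*": stack.append(left * right)
            else: stack.append(left / right)
        else:
            stack.append(int(tok))        # operand: push it
    return stack.pop()                    # final result

print(eval_postfix("9 5 - 2 +".split()))  # 6: postfix of 9 - 5 + 2
print(eval_postfix("9 5 2 + -".split()))  # 2: postfix of 9 - (5 + 2)
```

Note how the two postfix strings from the earlier slide evaluate to different results, matching the two groupings of 9-5+2.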
Annotated Parse Tree
 The parse tree containing the values of the attributes at each
node for a given input string is called an annotated or decorated
parse tree.

 It is a parse tree showing the values of the attributes at each


node.

 Features :
 High level specification
 Hides implementation details
 Explicit order of evaluation is not specified

37
Example Annotated Parse Tree
 Examples: (9-5)+2 → 9 5 - 2 +
9 - (5 + 2) → 9 5 2 + -

Semantic rules annotating the tree:
expr.t = expr.t + term.t
expr.t = expr.t - term.t
expr.t = term.t
term.t = 0|1|…|9

38
Depth-First Traversals
 It is a method for exploring a tree. In a DFT, you go as deep as possible down one path
before backing up and trying a different one.

 You explore one path, hit a dead end, then go back and try a different one.

 In this traversal the deepest node is visited first, and we backtrack to its parent node if no
sibling of that node exists.

 procedure visit(n : node);
begin
  for each child m of n, from left to right do
    visit(m);
  evaluate semantic rules at node n
end

39
Depth-First Traversals (Example)

40
Depth-First Traversals
 The Depth First Traversals of this Tree will be
 In-order (Left, Root, Right) : 4 2 5 1 3
 Pre-order (Root, Left, Right) : 1 2 4 5 3
 Post-order (Left, Right, Root) : 4 5 2 3 1

41
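The three traversals can be sketched for the example tree (our own Python illustration; the tree shape is the one whose orders are listed above):

```python
# Depth-first traversals of the example tree
#         1
#        / \
#       2   3
#      / \
#     4   5
# In-order follows the slide's left-root-right convention for binary nodes.

class Node:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right

tree = Node(1, Node(2, Node(4), Node(5)), Node(3))

def inorder(n):    # Left, Root, Right
    return [] if n is None else inorder(n.left) + [n.val] + inorder(n.right)

def preorder(n):   # Root, Left, Right
    return [] if n is None else [n.val] + preorder(n.left) + preorder(n.right)

def postorder(n):  # Left, Right, Root
    return [] if n is None else postorder(n.left) + postorder(n.right) + [n.val]

print(inorder(tree))    # [4, 2, 5, 1, 3]
print(preorder(tree))   # [1, 2, 4, 5, 3]
print(postorder(tree))  # [4, 5, 2, 3, 1]
```

The post-order function is the one that matches the `visit` procedure on the earlier slide: children first, then the node itself.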
Parsing
 Parsing = process of determining if a string of tokens can be
generated by a grammar.

 The parser is that phase of the compiler which takes a token string
as input and with the help of existing grammar, converts it into the
corresponding Intermediate Representation(IR).

 The parser is also known as Syntax Analyzer.

 The parser is mainly classified into two categories


 Top-down Parser
 Bottom-up Parser.

42
Parsing
 A top-down parser generates the parse tree for the given input
string with the help of grammar productions by expanding the non-terminals,

 i.e. it starts from the start symbol and ends at the terminals.

 It uses leftmost derivation.

 Derivation of the token string occurs in a top-down fashion.

 It requires a grammar that is free from ambiguity and left
recursion.

 Top-down parsers are classified into 2 types:

 recursive descent parsers and non-recursive descent parsers.
43
A recursive descent parser
 This technique follows the process for every terminal and
non-terminal entity.

 It reads the input from left to right and constructs the parse
tree top-down, and it may involve backtracking.

 As the technique works recursively, it is called recursive
descent parsing (with backtracking).

 If one derivation of a production fails, the syntax analyzer
restarts the process using different rules of the same production.
44
A recursive descent parser
 It uses leftmost derivation to construct a parse tree.

 For example, consider input abbcde and the grammar:

S → aABe

A → Abc | b

B → d

S ⇒ aABe ⇒ aAbcBe ⇒ abbcBe ⇒ abbcde

45
Non-recursive descent parser
 Also called a predictive parser, a without-backtracking parser, or a
dynamic parser. It uses a parsing table to generate the parse tree
instead of backtracking.

 Predictive parsing is a special form of recursive descent parsing
where we use one lookahead token to unambiguously determine the
parse operations,

 so no backtracking is required.

 (A backtracking parser, by contrast, must guess a production, see if it
matches, and if not, backtrack and try another.)

46
Predictive Parsing
 Parser Operates by Attempting to Match Tokens in the Input Stream.

 Utilize both Grammar and Input Below to Motivate Code for Algorithm

 It is a kind of top-down parsing that predicts a production whose first derived terminal symbol is
equal to the next input symbol while expanding top-down, without backtracking.

For example, input stream is a + b.

lookahead == a

match()

lookahead == +

match ()

lookahead == b

47
Predictive parsing
procedure match(t : token);
begin
  if lookahead = t then
    lookahead := nexttoken()
  else error()
end;

procedure simple();
begin
  if lookahead = ‘integer’ then
    match(‘integer’)
  else if lookahead = ‘char’ then
    match(‘char’)
  else if lookahead = ‘num’ then begin
    match(‘num’); match(‘dotdot’); match(‘num’)
  end
  else error()
end;

procedure type();
begin
  if lookahead in { ‘integer’, ‘char’, ‘num’ } then
    simple()
  else if lookahead = ‘^’ then begin
    match(‘^’); match(id)
  end
  else if lookahead = ‘array’ then begin
    match(‘array’); match(‘[’); simple();
    match(‘]’); match(‘of’); type()
  end
  else error()
end;
48
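The pseudocode above can be transcribed into Python roughly as follows (a sketch; the token spellings, the `Parser` class, and the exception-based error handling are our own assumptions):

```python
# Python transcription of the predictive parser for the grammar
#   type   -> simple | ^ id | array [ simple ] of type
#   simple -> integer | char | num dotdot num
# Tokens are plain strings in a list; 'id' stands for an identifier token.

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, t):
        if self.lookahead == t:
            self.pos += 1           # consume the token, advancing lookahead
        else:
            raise SyntaxError(f"expected {t}, got {self.lookahead}")

    def simple(self):
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num"); self.match("dotdot"); self.match("num")
        else:
            raise SyntaxError(f"unexpected {self.lookahead}")

    def type(self):
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "^":
            self.match("^"); self.match("id")
        elif self.lookahead == "array":
            self.match("array"); self.match("["); self.simple()
            self.match("]"); self.match("of"); self.type()
        else:
            raise SyntaxError(f"unexpected {self.lookahead}")

# e.g. "array [ num dotdot num ] of integer" parses without error
Parser("array [ num dotdot num ] of integer".split()).type()
```

The single `lookahead` token is enough to pick the right branch in every procedure, which is exactly why no backtracking is needed.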
Advantages of Top-Down Parsing
 Advantages
 Top-down parsing is very simple.
 It is very easy to identify the action decision of the top-down
parser.

 Disadvantages
 Top-down parsing is unable to handle left recursion present
in the grammar.
 Some recursive descent parsing may require backtracking.

49
Problem with Top Down Parsing

 Left recursion in a CFG may cause the parser to loop
forever.

 Backtracking

 Left factoring

 Ambiguity

50
Left Recursion
 A production of grammar is said to have left recursion if the leftmost variable of its RHS is
same as variable of its LHS.

 A grammar is left recursive if it contains a nonterminal A such that A ⇒+ Aα, where α is any
string.
 Grammar {S → Sα | c} is left recursive because of S ⇒ Sα
 Grammar {S → Aα, A → Sb | c} is also left recursive because of S ⇒ Aα ⇒ Sbα

 If a grammar is left recursive, you cannot build a predictive top down parser for it.
 If a parser is trying to match S & S→Sα, it has no idea how many times S must be applied
 Given a left recursive grammar, it is always possible to find another grammar that
generates the same language and is not left recursive.
 The resulting grammar might or might not be suitable for RDP.

51
Left Recursion
 When a production for nonterminal A starts with a self reference then a
predictive parser loops forever

A → Aα | β

 We can eliminate left-recursive productions by systematically rewriting the
grammar using right-recursive productions:

A → βA’

A’ → αA’ | ε

Exercise: Remove the left recursion in the following grammar:
expr → expr + term | expr – term | term
Solution: expr → term rest , rest → + term rest | - term rest | ε
52
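The rewritten grammar can be parsed by recursive descent without looping. Here is a sketch (our own; single-digit operands assumed) that also emits the input in postfix order as a side effect:

```python
# Recursive descent on the right-recursive grammar
#   expr -> term rest        rest -> + term rest | - term rest | ε
#   term -> 0 | 1 | ... | 9
# Because rest is right-recursive, the parser no longer loops on expr.

def parse(s):
    out = []
    pos = 0

    def term():
        nonlocal pos
        if pos < len(s) and s[pos].isdigit():
            out.append(s[pos]); pos += 1
        else:
            raise SyntaxError(f"digit expected at position {pos}")

    def rest():
        nonlocal pos
        if pos < len(s) and s[pos] in "+-":
            op = s[pos]; pos += 1
            term(); out.append(op)   # emit operator after its right operand
            rest()                   # the tail of the right recursion
        # otherwise: the ε-production, consume and emit nothing

    term(); rest()
    if pos != len(s):
        raise SyntaxError("trailing input")
    return "".join(out)

print(parse("9-5+2"))  # 95-2+
```

Emitting each digit on recognition and each operator after its second operand reproduces the postfix translation from the earlier slides, even though the grammar was rewritten.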
Right Recursion
 A production of grammar is said to have right recursion if the
rightmost variable of its RHS is same as variable of its LHS.

 A grammar containing a production having right recursion is called as


Right Recursive Grammar.

 Example : S → aS | ε

 Right recursion does not create any problem for top-down parsers.

 Therefore, there is no need to eliminate right recursion from the
grammar.

53
Left Factoring
 If more than one grammar production rules has a common prefix
string, then the top-down parser cannot make a choice as to which of
the production it should take to parse the string in hand.

 Example:- If a top-down parser encounters productions like

A → αβ | αγ | …

 then it cannot determine which production to follow to parse the
string, as both productions start with the same prefix α.

 To remove this confusion, we use a technique called left factoring.

54
Left Factoring
 Left factoring transforms the grammar to make it useful for top-down parsers.
In this technique, we make one production for each common prefixes and the
rest of the derivation is added by new productions.

 The above productions can be written as

A → αA’

A’ → β | γ | …

 Now the parser has only one production per prefix, which makes it easier to
make decisions.

A concrete example:
<stmt> → IF <boolean> THEN <stmt> | IF <boolean> THEN <stmt> ELSE <stmt>
is transformed into
<stmt> → IF <boolean> THEN <stmt> S’
S’ → ELSE <stmt> | ε

55
Bottom-Up parsing

 Bottom-up parsing works just the reverse of
top-down parsing: it starts from the input string and
reduces it step by step until it reaches the start
symbol.

 It starts from the terminals and ends at the start
symbol, tracing the rightmost derivation in reverse.

56
Bottom-Up parsing
 Example:

 Input string : a + b * c

 Production rules:

 S→E

 E→E+T

 E→E*T

 E→T

 T → id

57
Adding a Lexical Analyzer
 Typical tasks of the lexical analyzer:
 Remove white space (blank, tab, newline, etc.) and comments
 Encode constants as tokens
Constants: for now, consider only integers.
E.g., for input 31 + 28, what is the output (token representation)?
input : 31 + 28
output: <num, 31> <+, > <num, 28>
num, + : tokens
31, 28 : attribute values (lexemes) of the integer token num

58
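The scanning described above can be sketched as a minimal lexer for integers and single-character operators (our own illustration; the tuple representation of tokens is an assumption):

```python
# Skip white space, group digit runs into a <num, value> token, and pass
# single-character operators through with no attribute.

def lex(src):
    tokens, i = [], 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1                                   # remove white space
        elif c.isdigit():
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1                               # extend the digit run
            tokens.append(("num", int(src[i:j])))    # <num, value>
            i = j
        else:
            tokens.append((c, None))                 # operator, no attribute
            i += 1
    return tokens

print(lex("31 + 28"))  # [('num', 31), ('+', None), ('num', 28)]
```

This reproduces the slide's output `<num, 31> <+, > <num, 28>` for the input `31 + 28`.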
Adding a Lexical Analyzer
 Recognizing
 Identifiers:- are names of variables, arrays, functions...
 A grammar treats an identifier as a token.
input : count = count + increment;
output : <id,1> <=, > <id,1> <+, > <id, 2>;

 Symbol table:

index  token  attribute (lexeme)
0
1      id     count
2      id     increment
3
 Keywords are reserved, i.e., they cannot be used as identifiers.

59
Adding a Lexical Analyzer

 Interface to lexical analyzer.

60
Adding a Lexical Analyzer

 A lexical analyzer.

61
LEXICAL ERRORS:
 Lexical errors are the errors thrown by your lexer when it is
unable to continue,
 which means that there is no way to recognize a
lexeme as a valid token for your lexer.
 (If already-recognized valid tokens don’t match
any of the right-hand sides of your grammar rules, that is a syntax error instead.)
 Error-recovery actions may then be taken.

62
FINITE AUTOMATON
 A recognizer for a language is a program that takes a string x, and
answers “yes” if x is a sentence of that language, and “no” otherwise.

 We call the recognizer of the tokens as a finite automaton.

 A finite automaton can be deterministic (DFA) or non-deterministic (NFA).

 This means that we may use a deterministic or non-deterministic
automaton as a lexical analyzer.

 Both deterministic and non-deterministic finite automata recognize
regular sets.

63
FINITE AUTOMATON
 Deterministic – faster recognizer, but it may take more
space.

 Non-deterministic – slower, but it may take less space

 Deterministic automata are widely used in lexical
analyzers.

 First, we define regular expressions for tokens; then we
convert them into a DFA to get a lexical analyzer for our
tokens.
64
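A DFA-based recognizer can be sketched as a transition table (our own illustration for an identifier token: a letter followed by letters or digits; the state names and character classes are assumptions):

```python
# A DFA for identifiers, encoded as a transition table over character
# classes. A missing entry in the table means the DFA rejects.

def char_class(c):
    if c.isalpha(): return "letter"
    if c.isdigit(): return "digit"
    return "other"

# transitions[state][class] -> next state
transitions = {
    "start": {"letter": "ident"},
    "ident": {"letter": "ident", "digit": "ident"},
}
accepting = {"ident"}

def recognizes(s):
    state = "start"
    for c in s:
        state = transitions.get(state, {}).get(char_class(c))
        if state is None:          # no transition: reject immediately
            return False
    return state in accepting

print(recognizes("count2"))  # True
print(recognizes("2count"))  # False
```

A real lexer would run several such automata (or one combined DFA) and return the longest match as the next token.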
INPUT BUFFERING
 The lexical analyzer scans the characters of the source program one at a
time to discover tokens. Because a large amount of time
can be consumed scanning characters, specialized
buffering techniques have been developed to reduce the
amount of overhead required to process an input character.

 Buffering techniques:
 Buffer pairs
 Sentinels

65
INPUT BUFFERING
 Lexical Analysis scans input string from left to right one
character at a time to identify tokens. It uses two pointers to
scan tokens
 Begin Pointer (bptr) − It points to the beginning of the string to be
read.
 Look Ahead Pointer (lptr) − It moves ahead to search for the end of
the token.
Example − For statement int a, b;
 Both pointers start at the beginning of the string, which is stored in the
buffer.

66
INPUT BUFFERING

•Look Ahead Pointer scans buffer until the token is found.

•After processing the token ("int"), both pointers are set to the next token ('a'),
and this process is repeated for the whole program.

67
Symbol Table
 The symbol table is a data structure used in compiler design. Compilers
keep track of the occurrence of various entities, including variable names,
function names, objects, classes, and interfaces in the symbol table.

 It is utilized in the compiler's different phases


 Lexical Analysis: New table entries are created in the table, For example,
entries about tokens.
 Syntax Analysis: Adds information about attribute type, dimension, scope,
line of reference, use, etc. in the table.
 Semantic Analysis: Checks for semantics in the table, i.e., verifies that
expressions and assignments are semantically accurate (type checking) and
updates the table appropriately.

68
Symbol Table
 Intermediate Code generation: The symbol table is used to
determine how much and what type of run-time storage is allocated, as
well as to add temporary variable data.

 Code Optimization: Uses information from the symbol table for


machine-dependent optimization.

 Target Code generation: Uses the address information of the


identifier in the table to generate code.

69
Operations Of Symbol Table
Operation      Function
allocate       To allocate a new empty symbol table
free           To remove all entries and free the storage of a symbol table
lookup         To search for a name and return a pointer to its entry
insert         To insert a name in a symbol table and return a pointer to its entry
set_attribute  To associate an attribute with a given entry
get_attribute  To get an attribute associated with a given entry

70
Symbol Table
 A symbol table is a table that binds names to information. We
need a number of operations on symbol tables to accomplish
this:
 We need an empty symbol table, in which no name is defined.
 We need to be able to bind a name to a piece of information.
 We need to be able to look up a name in a symbol table to
find the information.
 We need to be able to enter a new scope.
 We need to be able to exit a scope.
71
Implementation of symbol tables
 There are many ways to implement symbol tables, but the most
important distinction between these is how scopes are handled. This
may be done using

 A persistent (or functional) data structure, or

 Imperative (or destructively-updated) data structure.

 A persistent data structure has the property that no operation on the


structure will destroy it. Conceptually, a new modified copy is made of
the data structure whenever an operation updates it, hence preserving
the old structure unchanged.
72
Implementation of symbol tables
 In the imperative approach, only one copy of the symbol table
exists, so explicit actions are required to store the information
needed to restore the symbol table to a previous state.

 This can be done by using an auxiliary stack.

 When an update is made, the old binding of a name that is


overwritten is recorded (pushed) on the auxiliary stack.

73
Simple persistent symbol tables
 Empty: An empty symbol table is an empty list.

 Binding: A new binding (name/information pair) is added to the


front of the list.

 Lookup: The list is searched until a pair with a matching name


is found. The information paired with the name is then returned.

 Enter: The old list is remembered, i.e., a reference is made to it.

 Exit: The old list is recalled, i.e., the above reference is used.

74
A simple imperative symbol table
 Empty: An empty symbol table is an empty stack.

 Binding: A new binding (name/information pair) is pushed on top of


the stack.

 Lookup: The stack is searched top-to-bottom until a matching name is


found.

 Enter: We push a marker on the top of the stack.

 Exit: We pop bindings from the stack until a marker is found. This is
also popped from the stack.

75
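The five operations above can be sketched with an explicit stack and scope markers (a minimal illustration; the class and method names are our own):

```python
# An imperative, stack-based symbol table: bindings are (name, info) pairs,
# entering a scope pushes a marker, and exiting pops back to that marker.

MARKER = object()          # unique sentinel separating scopes

class SymbolTable:
    def __init__(self):
        self.stack = []                        # Empty: an empty stack

    def bind(self, name, info):
        self.stack.append((name, info))        # Binding: push the pair

    def lookup(self, name):
        for entry in reversed(self.stack):     # Lookup: search top-to-bottom
            if entry is not MARKER and entry[0] == name:
                return entry[1]
        return None                            # name is not bound

    def enter(self):
        self.stack.append(MARKER)              # Enter: push a scope marker

    def exit(self):
        while self.stack.pop() is not MARKER:  # Exit: pop up to the marker
            pass

table = SymbolTable()
table.bind("x", "int")
table.enter()
table.bind("x", "float")       # inner binding shadows the outer x
print(table.lookup("x"))       # float
table.exit()
print(table.lookup("x"))       # int
```

Searching top-to-bottom is what makes the innermost binding shadow outer ones, and popping to the marker restores the outer scope in one operation.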
Implementation Techniques
 If a compiler only needs to process a small quantity of data, the
symbol table can be implemented as an unordered list, which is
simple to code but only works for small tables.

 The following methods can be used to create a symbol table:


 Linear (sorted or unsorted) list
 Binary Search Tree
 Hash table

76
ABSTRACT STACK MACHINE
 An abstract machine is for intermediate code generation/execution.

 Instruction classes: arithmetic / stack manipulation / control flow

 3 components of abstract stack machine


 Instruction memory : abstract machine code, intermediate
code(instruction)
 Stack
 Data memory

77
ABSTRACT STACK MACHINE
 An example of stack machine operation.

for input (5+a)*b, the intermediate code is: push 5, rvalue 2, ...

78
ABSTRACT STACK MACHINE
 L-values and r-values
 l-value of a : the address of location a
 r-value of a : if a is a location, then the content of location a;
 if a is a constant, then the value a

 eg) a := 5 + b;

 lvalue a ⇒ 2 (address of a), rvalue 5 ⇒ 5, rvalue of b ⇒ 7 (content of b)

79
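The machine's main instruction classes can be sketched as a small interpreter (our own illustration; the memory layout, with a at address 2 and b at address 3, and the operand values are assumed for the example):

```python
# An abstract stack machine: a program (instruction memory), a stack, and a
# data memory mapping addresses to values.

def run(program, data):
    stack = []
    for instr in program:
        op = instr[0]
        if op == "push":          # push a constant
            stack.append(instr[1])
        elif op == "rvalue":      # push the content of a location
            stack.append(data[instr[1]])
        elif op == "lvalue":      # push the address of a location
            stack.append(instr[1])
        elif op == ":=":          # store: top is the r-value, below it the address
            value = stack.pop(); addr = stack.pop()
            data[addr] = value
        elif op == "+":
            r, l = stack.pop(), stack.pop(); stack.append(l + r)
        elif op == "*":
            r, l = stack.pop(), stack.pop(); stack.append(l * r)
    return stack, data

# (5 + a) * b, assuming a = 7 at address 2 and b = 2 at address 3
data = {2: 7, 3: 2}
program = [("push", 5), ("rvalue", 2), ("+",), ("rvalue", 3), ("*",)]
stack, data = run(program, data)
print(stack)  # [24]  ->  (5 + 7) * 2
```

Note how `rvalue` fetches a location's content while `lvalue` would push its address, matching the l-value/r-value distinction on the previous slide; an assignment like a := 5 + b would compile to lvalue 2, push 5, rvalue 3, +, :=.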
