Efi Arazi School of Computer Science


Efi Arazi School of Computer Science Digital Systems Construction

Compiler I:
Syntax Analysis

Elements of Computing Systems, Nisan & Schocken, MIT Press, 2005, www.idc.ac.il/tecs , Chapter 10: Compiler I: Syntax Analysis slide 1

Course map

[Figure: course map. Each level is built on the abstract interface of the level below it, from the software hierarchy at the top down to the hardware hierarchy at the bottom.]

Human Thought
   -> Abstract design (Chapters 9, 12)
H.L. Language & Operating Sys.
   -> Compiler (Chapters 10 - 11)
Virtual Machine
   -> VM Translator (Chapters 7 - 8)
Assembly Language
   -> Assembler (Chapter 6)
Machine Language
   -> Computer Architecture (Chapters 4 - 5)
Hardware Platform
   -> Gate Logic (Chapters 1 - 3)
Chips & Logic Gates
   -> Electrical Engineering
Physics


Copyright © Shimon Schocken


The big picture
Modern compilers:
• Front-end: from high-level language to some intermediate language
• Back-end: from the intermediate language to binary code.

[Figure: Some language, Some Other language, ..., and the Jack language are each translated by their own compiler (Some compiler, Some Other compiler, Jack compiler) into a common Intermediate code (VM). The intermediate code is handled by a VM implementation over CISC platforms, a VM implementation over RISC platforms, a VM emulator written in a high-level language, or a VM implementation over the Hack platform. These target CISC machine language, RISC machine language, ..., and Hack machine language, running on a CISC machine, a RISC machine, other digital platforms (each equipped with its VM implementation), any computer, and the Hack computer. The Jack compiler is labeled "Compiler lectures"; the VM implementation over the Hack platform is labeled "VM lectures".]


Compiler architecture (front end)


[Figure: the big-picture diagram with the Jack compiler expanded. Jack Program -> Jack Compiler -> VM code. Inside the Jack Compiler: the Syntax Analyzer (Chapter 10), made of a Tokenizer followed by a Parser, which emits XML code; and Code Generation (Chapter 11), which emits VM code.]

Front-end:
• Syntax analysis: understanding the semantics implied by the source code
   - Tokenizing: creating a list of "atoms"
   - Parsing: matching the atom list with the language grammar
   - XML output = proof that the syntax analyzer is parsing correctly
• Code generation: reconstructing the semantics using the target syntax.



Tokenizing / Lexical analysis

• Remove white space
• Construct a token list (language atoms)
• Things to worry about:
   - Language-specific rules: e.g. how to treat "++"
   - Language-specific token types: keyword, identifier, operator, constant, ...
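
The "++" question is whether the tokenizer should emit one token or two. A common answer in C-like languages is longest match: at each position, take the longest string that is a legal symbol. The sketch below illustrates that rule only; the symbol table and the function names are hypothetical and are not taken from the slides or from the Jack language.

```python
# Hypothetical symbol table for a C-like language (an assumption for this
# sketch; it is not the Jack symbol set).
SYMBOLS = {"++", "+", "+=", "<=", "<", "==", "=", ";", "(", ")"}
MAX_LEN = max(len(s) for s in SYMBOLS)


def next_symbol(text, pos):
    """Return the longest symbol that starts at text[pos], or None."""
    for length in range(MAX_LEN, 0, -1):
        candidate = text[pos:pos + length]
        if candidate in SYMBOLS:
            return candidate
    return None


def split_symbols(text):
    """Split a string made of symbols and white space into a token list."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():          # white space only separates tokens
            pos += 1
            continue
        symbol = next_symbol(text, pos)
        if symbol is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(symbol)
        pos += len(symbol)
    return tokens


print(split_symbols("++"))    # ['++']
print(split_symbols("+ +"))   # ['+', '+']
print(split_symbols("+++"))   # ['++', '+']
```

Under this rule white space matters: "++" is one token while "+ +" is two, which is exactly the kind of language-specific decision the slide refers to.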


Jack Tokenizer
Source code

   if (x < 153) {let city = "Paris";}

Tokenizer's output

   <tokens>
     <keyword> if </keyword>
     <symbol> ( </symbol>
     <identifier> x </identifier>
     <symbol> &lt; </symbol>
     <integerConstant> 153 </integerConstant>
     <symbol> ) </symbol>
     <symbol> { </symbol>
     <keyword> let </keyword>
     <identifier> city </identifier>
     <symbol> = </symbol>
     <stringConstant> Paris </stringConstant>
     <symbol> ; </symbol>
     <symbol> } </symbol>
   </tokens>
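
One way to produce output of this shape is a single regular expression with one named group per token type. The following is a minimal sketch, not the course's reference tokenizer: the names `tokenize` and `tokens_to_xml` are made up for this example, and the regular expression is a simplification of the Jack lexical rules.

```python
import re
from xml.sax.saxutils import escape

KEYWORDS = {
    "class", "constructor", "function", "method", "field", "static", "var",
    "int", "char", "boolean", "void", "true", "false", "null", "this",
    "let", "do", "if", "else", "while", "return",
}

# One named group per lexical category. Comments come before symbols so that
# "/" is not mistaken for the division symbol.
TOKEN_RE = re.compile(r'''
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<integerConstant>\d+)
  | (?P<stringConstant>"[^"\n]*")
  | (?P<word>[A-Za-z_]\w*)
  | (?P<symbol>[{}()\[\].,;+\-*/&|<>=~])
  | (?P<space>\s+)
''', re.VERBOSE | re.DOTALL)


def tokenize(source):
    """Return a list of (token_type, text) pairs for a Jack source string."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"illegal character {source[pos]!r} at offset {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind in ("space", "comment"):
            continue                      # white space and comments are dropped
        if kind == "word":
            kind = "keyword" if text in KEYWORDS else "identifier"
        elif kind == "stringConstant":
            text = text[1:-1]             # strip the surrounding double quotes
        tokens.append((kind, text))
    return tokens


def tokens_to_xml(tokens):
    """Render the token list in the <tokens> format shown above."""
    lines = ["<tokens>"]
    lines += [f"<{kind}> {escape(text)} </{kind}>" for kind, text in tokens]
    lines.append("</tokens>")
    return "\n".join(lines)


print(tokens_to_xml(tokenize('if (x < 153) {let city = "Paris";}')))
```

`escape` turns `<` into `&lt;`, which is why the symbol appears in escaped form in the XML.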



Parsing

• Each language is characterized by a grammar
• A text is given:
   - The parser, using the grammar, can either accept or reject the text
   - In the process, the parser performs a complete analysis of the text
• The language can be:
   - Context-dependent (English, ...)
   - Context-free (Jack, ...).


Examples
Context free: (5+3)*2 - sqrt(9*4)

   The expression has a single parse tree:

   -
      *
         +
            5
            3
         2
      sqrt
         *
            9
            4

Context dependent: she discussed sex with her doctor

   The sentence has two parse trees:

   parse 1:
   discussed
      she
      with
         sex
         her doctor

   parse 2:
   with
      discussed
         she
         sex
      her doctor
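
Because the context-free expression has only one parse tree, its meaning can be computed mechanically by walking that tree. A small sketch follows, assuming a nested-tuple encoding of the tree; the encoding and the `evaluate` function are illustrative choices, not part of the slides.

```python
import math

# Parse tree of (5+3)*2 - sqrt(9*4).
# Inner nodes are (operator, child, ...); leaves are plain numbers.
TREE = ("-",
        ("*", ("+", 5, 3), 2),
        ("sqrt", ("*", 9, 4)))


def evaluate(node):
    """Evaluate a parse tree bottom-up: children first, then the operator."""
    if isinstance(node, (int, float)):
        return node
    op, *children = node
    values = [evaluate(child) for child in children]
    if op == "+":
        return values[0] + values[1]
    if op == "-":
        return values[0] - values[1]
    if op == "*":
        return values[0] * values[1]
    if op == "sqrt":
        return math.sqrt(values[0])
    raise ValueError(f"unknown operator {op!r}")


print(evaluate(TREE))   # 10.0
```

The English sentence offers no such guarantee: a program holding only the words cannot tell which of the two trees was intended without outside context.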



A typical grammar (C/Java-like)

   program: statement;

   statement: whileStatement
            | ifStatement
            | // other statement possibilities ...
            | '{' statementSequence '}'

   whileStatement: 'while' '(' expression ')' statement

   ifStatement: simpleIf
              | ifElse

   simpleIf: 'if' '(' expression ')' statement

   ifElse: 'if' '(' expression ')' statement 'else' statement

   statementSequence: '' // null, i.e. the empty sequence
                    | statement ';' statementSequence

   expression: // definition of an expression comes here

   // more definitions follow

code sample

   while (some expression) {
      if (some expression)
         some statement;
      while (some expression) {
         some statement;
         if (some expression)
            some statement;
      }
      while (some expression) {
         some statement;
         some statement;
      }
   }

code sample

   if (some expression) {
      statement;
      while (some expression)
         statement;
      statement;
   }
   if (some expression)
      if (some expression)
         some statement;
   }

• Simple (terminal) forms / complex (non-terminal) forms
• Grammar = set of rules on how to construct complex forms from simpler forms
• Highly recursive.


Parse tree

Grammar (excerpt):

   program: statement;

   statement: whileStatement
            | ifStatement
            | // other statement possibilities ...
            | '{' statementSequence '}'

   whileStatement: 'while' '(' expression ')' statement
   ...

Input Text:

   while (count<=100) {
      /** demonstration */
      count++;
      // ...

Tokenized:

   while ( count <= 100 ) { count ++ ; ...

Parse tree:

   statement
      whileStatement
         'while'
         '('
         expression: count <= 100
         ')'
         statement
            '{'
            statementSequence
               statement: count ++
               ';'
               statementSequence: ...




Recursive descent parsing

code sample

   while (some expression) {
      some statement;
      some statement;
      while (some expression) {
         while (some expression)
            some statement;
         some statement;
      }
   }

• Highly recursive
• LL(1) grammars: the first token determines in which rule we are
• In other grammars you have to look ahead more than one token
• Jack is almost LL(1).

Parsing routines, one per grammar rule:
• parseStatement()
• parseWhileStatement()
• parseIfStatement()
• parseStatementSequence()
• parseExpression().
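
The routines listed above can be sketched as one method per grammar rule, each choosing what to do by peeking at the current token. This is a minimal sketch under stated assumptions, not the course's parser: it follows the C/Java-like grammar from the earlier slide literally (so every statement inside braces is followed by ';'), the placeholder tokens `stmt` and `expr` stand in for "some statement" and "some expression", and the nested-list tree format is invented for the example.

```python
class Parser:
    """Recursive descent parser for the C/Java-like grammar on the slides."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Current token, or None at end of input (one token of lookahead)."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected):
        """Consume the current token, which must equal `expected`."""
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.peek()!r}")
        self.pos += 1
        return expected

    def parse_statement(self):
        token = self.peek()
        if token == "while":
            return self.parse_while_statement()
        if token == "if":
            return self.parse_if_statement()
        if token == "{":
            self.eat("{")
            sequence = self.parse_statement_sequence()
            self.eat("}")
            return ["block", sequence]
        return ["simple", self.eat("stmt")]   # placeholder for other statements

    def parse_while_statement(self):
        self.eat("while")
        self.eat("(")
        condition = self.parse_expression()
        self.eat(")")
        return ["while", condition, self.parse_statement()]

    def parse_if_statement(self):
        # simpleIf and ifElse start with the same tokens, so the shared prefix
        # is parsed once and 'else' is checked afterwards.
        self.eat("if")
        self.eat("(")
        condition = self.parse_expression()
        self.eat(")")
        node = ["if", condition, self.parse_statement()]
        if self.peek() == "else":
            self.eat("else")
            node.append(self.parse_statement())
        return node

    def parse_statement_sequence(self):
        sequence = []
        while self.peek() not in ("}", None):
            sequence.append(self.parse_statement())
            self.eat(";")
        return sequence

    def parse_expression(self):
        return self.eat("expr")               # placeholder for real expressions


tokens = "while ( expr ) { stmt ; if ( expr ) stmt else stmt ; }".split()
print(Parser(tokens).parse_statement())
```

The recursion in the code mirrors the recursion in the grammar: `parse_statement` calls `parse_while_statement`, which calls `parse_statement` again for the loop body.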

A linguist view on parsing

Parsing:

One of the mental processes involved in sentence comprehension, in which the listener determines the syntactic categories of the words, joins them up in a tree, and identifies the subject, object, and predicate, a prerequisite to determining who did what to whom from the information in the sentence.

(Steven Pinker, The Language Instinct)



The Jack grammar

Notation:

   'x':    x appears verbatim
   x:      x is a language construct
   x?:     x appears 0 or 1 times
   x*:     x appears 0 or more times
   x|y:    either x or y appears
   (x,y):  x appears, then y.
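
Each piece of this notation maps onto an ordinary control structure in a hand-written parser: a verbatim 'x' becomes an equality check, x* becomes a loop, and x? becomes a conditional. The sketch below illustrates this on two rules of the Jack grammar, `varDec: 'var' type varName (',' varName)* ';'` and `returnStatement: 'return' expression? ';'`. The function names are hypothetical, tokens are plain strings, and an expression is reduced to a single token to keep the example short.

```python
def parse_var_dec(tokens):
    """varDec: 'var' type varName (',' varName)* ';'  ->  (type, [names])"""
    pos = 0

    def take(expected=None):
        """Consume one token; if `expected` is given it must match verbatim."""
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        if expected is not None and token != expected:
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        pos += 1
        return token

    def at(value):
        return pos < len(tokens) and tokens[pos] == value

    take("var")                 # 'x': appears verbatim
    var_type = take()           # x: a language construct (type)
    names = [take()]            # x: a language construct (varName)
    while at(","):              # x*: 0 or more repetitions of (',' varName)
        take(",")
        names.append(take())
    take(";")
    return var_type, names


def parse_return(tokens):
    """returnStatement: 'return' expression? ';'  ->  expression or None"""
    if tokens[:1] != ["return"] or tokens[-1:] != [";"]:
        raise SyntaxError("expected 'return' ... ';'")
    body = tokens[1:-1]
    if len(body) > 1:           # assumption: an expression is a single token
        raise SyntaxError("expression too long for this sketch")
    return body[0] if body else None    # x?: 0 or 1 times


print(parse_var_dec("var int i , j , k ;".split()))   # ('int', ['i', 'j', 'k'])
print(parse_return("return ;".split()))               # None
```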

Jack syntax analyzer in action
Source code

   class Bar {
      method Fraction foo(int y) {
         var int temp; // a variable
         let temp = (xxx+12)*-63;
         ...
         ...

Syntax analyzer

• Using the language grammar, a programmer can write a syntax analyzer program
• The syntax analyzer takes a source text file and attempts to match it against the language grammar
• If successful, it generates a parse tree in some structured format, e.g. XML.

Syntax analyzer's output

   <varDec>
     <keyword> var </keyword>
     <keyword> int </keyword>
     <identifier> temp </identifier>
     <symbol> ; </symbol>
   </varDec>
   <statements>
     <letStatement>
       <keyword> let </keyword>
       <identifier> temp </identifier>
       <symbol> = </symbol>
       <expression>
         <term>
           <symbol> ( </symbol>
           <expression>
             <term>
               <identifier> xxx </identifier>
             </term>
             <symbol> + </symbol>
             <term>
               <integerConstant> 12 </integerConstant>
             </term>
           </expression>
           ...
           ...

This syntax analyzer's algorithm:

• If xxx is non-terminal, output:

   <xxx>
      Recursive code for the body of xxx
   </xxx>

• If xxx is terminal (keyword, symbol, constant, or identifier), output:

   <xxx>
      xxx value
   </xxx>
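
The two output rules can be illustrated with a short recursive function. The sketch below assumes the parse tree already exists as nested `(name, body)` pairs, where the body of a terminal is a string and the body of a non-terminal is a list of child nodes; that representation and the name `tree_to_xml` are choices made for this example. A real analyzer may instead write the tags directly from its parsing routines as it goes.

```python
from xml.sax.saxutils import escape


def tree_to_xml(node, depth=0):
    """Return the XML lines for a parse tree node."""
    name, body = node
    pad = "  " * depth
    if isinstance(body, str):
        # Terminal: <xxx> value </xxx> on a single line.
        return [f"{pad}<{name}> {escape(body)} </{name}>"]
    # Non-terminal: opening tag, recursive output for the body, closing tag.
    lines = [f"{pad}<{name}>"]
    for child in body:
        lines += tree_to_xml(child, depth + 1)
    lines.append(f"{pad}</{name}>")
    return lines


# Parse tree of:  var int temp;
VAR_DEC = ("varDec", [
    ("keyword", "var"),
    ("keyword", "int"),
    ("identifier", "temp"),
    ("symbol", ";"),
])

print("\n".join(tree_to_xml(VAR_DEC)))
```

Printing `VAR_DEC` this way reproduces the `<varDec>` element shown in the analyzer's output.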

Summary and next step


• Syntax analysis: understanding syntax
• Code generation: constructing semantics

[Figure: Jack Program -> Jack Compiler -> VM code. Inside the Jack Compiler: the Syntax Analyzer (Chapter 10), made of a Tokenizer followed by a Parser, which emits XML code; and Code Generation (Chapter 11), which emits VM code.]

The code generation challenge:

• Extend the syntax analyzer into a full-blown compiler that, instead of generating passive XML code, generates executable VM code
• Two challenges: (a) handling data, and (b) handling commands.



Perspective
• The parse tree can be constructed on the fly
• Bottom-up compilation is harder and more powerful
• Syntax analyzers are typically built using tools like:
   - Lex for tokenizing
   - Yacc for parsing
• The Jack language is intentionally simple:
   - Statement prefixes: let, do, ...
   - No operator priority
   - No error checking
   - Basic data types, etc.
• Typical languages are richer, requiring more powerful compilers
• The Jack compiler: designed to illustrate the key ideas that underlie modern compilers, leaving advanced features to more advanced courses.
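
To see what "no operator priority" means in practice, the sketch below evaluates a flat expression strictly left to right. Left-to-right order is an assumption made for this illustration; the slides only say that priority is not defined. The function name and the token format are likewise hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_left_to_right(tokens):
    """Evaluate 'number op number op ...' with no operator priority."""
    result = int(tokens[0])
    for i in range(1, len(tokens), 2):
        result = OPS[tokens[i]](result, int(tokens[i + 1]))
    return result


print(eval_left_to_right("2 + 3 * 4".split()))   # 20, i.e. (2 + 3) * 4
```

A language with conventional priority would give 14 for the same text, so a programmer working without priority rules has to use parentheses to force the intended grouping.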
