Oz Language Syntax

C Language Syntax
The devil is in the details.

Traditional proverb.
God is in the details.
Traditional proverb.
I don't know what is in those details,
but it must be something important!
Irreverent proverb.
This appendix defines the syntax of the complete language used in the book,
including all syntactic conveniences. The language is a subset of the Oz language
as implemented by the Mozart system. The appendix is divided into six sections:
- Section C.1 defines the syntax of interactive statements, i.e., statements that
  can be fed into the interactive interface.
- Section C.2 defines the syntax of statements and expressions.
- Section C.3 defines the syntax of the nonterminals needed to define statements
  and expressions.
- Section C.4 lists the operators of the language with their precedence and
  associativity.
- Section C.5 lists the keywords of the language.
- Section C.6 defines the lexical syntax of the language, i.e., how a character
  sequence is transformed into a sequence of tokens.
To be precise, this appendix defines a context-free syntax for a superset of the
language. This keeps the syntax simple and easy to read. The disadvantage of a
context-free syntax is that it does not capture all syntactic conditions for legal
programs. For example, take the statement local X in <statement> end. The
statement that contains this one must declare all the free variable identifiers of
<statement>, possibly minus X. This is not a context-free condition.
This appendix defines the syntax of a subset of the full Oz language, as defined
in [55, 87]. This appendix differs from [87] in several ways: it introduces nestable
constructs, nestable declarations, and terms to factor the common parts of
statement and expression syntax; it defines interactive statements and for loops; it
leaves out the translation to the kernel language (which is given for each linguistic
abstraction in the main text of the book); and it makes other small simplifications
for clarity (but without sacrificing precision).

<interStatement> ::= <statement>
                   | declare { <declarationPart> }+ [ <interStatement> ]
                   | declare { <declarationPart> }+ in <interStatement>
Table C.1: Interactive statements.

<statement>      ::= <nestCon(statement)> | <nestDec(<variable>)>
                   | skip | <statement> <statement>
<expression>     ::= <nestCon(expression)> | <nestDec('$')>
                   | <unaryOp> <expression>
                   | <expression> <evalBinOp> <expression>
                   | '$' | <term> | self
<inStatement>    ::= [ { <declarationPart> }+ in ] <statement>
<inExpression>   ::= [ { <declarationPart> }+ in ] [ <statement> ] <expression>
<in(statement)>  ::= <inStatement>
<in(expression)> ::= <inExpression>
Table C.2: Statements and expressions.
C.1 Interactive statements
Table C.1 gives the syntax of interactive statements. An interactive statement is
a superset of a statement; in addition to all regular statements, it can contain
a declare statement. The interactive interface must always be fed interactive
statements. All free variable identifiers in the interactive statement must exist in
the global environment; otherwise the system gives a "variable not introduced"
error.
C.2 Statements and expressions
Table C.2 gives the syntax of statements and expressions. Many language constructs
can be used in either a statement position or an expression position. We call such
constructs nestable. We write the grammar rules to give their syntax just once,
in a way that works for both statement and expression positions. Table C.3 gives
the syntax for nestable constructs, not including declarations. Table C.4 gives the
syntax for nestable declarations. The grammar rules for nestable constructs and
declarations are templates with one argument. The template is instantiated each
time it is used. For example, <nestCon(α)> defines the template for nestable
constructs without declarations. This template is used twice, as
<nestCon(statement)> and <nestCon(expression)>, and each corresponds to one
grammar rule.

<nestCon(α)> ::= <expression> ( '=' | ':=' | ',' ) <expression>
               | '{' <expression> { <expression> } '}'
               | local { <declarationPart> }+ in [ <statement> ] <α> end
               | '(' <in(α)> ')'
               | if <expression> then <in(α)>
                 { elseif <expression> then <in(α)> }
                 [ else <in(α)> ] end
               | case <expression> of <pattern> [ andthen <expression> ] then <in(α)>
                 { '[]' <pattern> [ andthen <expression> ] then <in(α)> }
                 [ else <in(α)> ] end
               | for { <loopDec> }+ do <in(α)> end
               | try <in(α)>
                 [ catch <pattern> then <in(α)>
                   { '[]' <pattern> then <in(α)> } ]
                 [ finally <inStatement> ] end
               | raise <inExpression> end
               | thread <in(α)> end
               | lock [ <expression> then ] <in(α)> end
Table C.3: Nestable constructs (no declarations).

<nestDec(α)> ::= proc '{' α { <pattern> } '}' <inStatement> end
               | fun [ lazy ] '{' α { <pattern> } '}' <inExpression> end
               | functor α
                 [ import { <variable> [ at <atom> ]
                          | <variable> '('
                            { (<atom> | <int>) [ ':' <variable> ] }+ ')'
                          }+ ]
                 [ export { [ (<atom> | <int>) ':' ] <variable> }+ ]
                 define { <declarationPart> }+ [ in <statement> ] end
               | class α { <classDescriptor> }
                 { meth <methHead> [ '=' <variable> ]
                   ( <inExpression> | <inStatement> ) end }
                 end
Table C.4: Nestable declarations.

<term>    ::= [ '!' ] <variable> | <int> | <float> | <character>
            | <atom> | <string> | unit | true | false
            | <label> '(' { [ <feature> ':' ] <expression> } ')'
            | <expression> <consBinOp> <expression>
            | '[' { <expression> }+ ']'
<pattern> ::= [ '!' ] <variable> | <int> | <float> | <character>
            | <atom> | <string> | unit | true | false
            | <label> '(' { [ <feature> ':' ] <pattern> } [ '...' ] ')'
            | <pattern> <consBinOp> <pattern>
            | '[' { <pattern> }+ ']'
Table C.5: Terms and patterns.
C.3 Nonterminals for statements and expressions
Tables C.5 and C.6 define the nonterminal symbols needed for the statement and
expression syntax of the preceding section. Table C.5 defines the syntax of terms
and patterns. Note the close relationship between terms and patterns. Both are used
to define partial values. There are just two differences: (1) patterns can contain
only variable identifiers, whereas terms can contain expressions, and (2) patterns
can be partial (using '...'), whereas terms cannot.
Table C.6 defines nonterminals for the declaration parts of statements and loops,
for unary operators, for binary operators ("constructing" operators <consBinOp>
and "evaluating" operators <evalBinOp>), for records (labels and features), and for
classes (descriptors, attributes, methods, etc.).

<declarationPart> ::= <variable> | <pattern> '=' <expression> | <statement>
<loopDec>         ::= <variable> in <expression> [ '..' <expression> ] [ ';' <expression> ]
                    | <variable> in <expression> ';' <expression> ';' <expression>
                    | break ':' <variable> | continue ':' <variable>
                    | return ':' <variable> | default ':' <expression>
                    | collect ':' <variable>
<unaryOp>         ::= '~' | '@' | '!!'
<binaryOp>        ::= <consBinOp> | <evalBinOp>
<consBinOp>       ::= '#' | '|'
<evalBinOp>       ::= '+' | '-' | '*' | '/' | div | mod | '.' | andthen | orelse
                    | ':=' | ',' | '=' | '==' | '\=' | '<' | '=<' | '>' | '>='
                    | '::' | '=:' | '\=:' | '=<:'
<label>           ::= unit | true | false | <variable> | <atom>
<feature>         ::= unit | true | false | <variable> | <atom> | <int>
<classDescriptor> ::= from { <expression> }+ | prop { <expression> }+
                    | attr { <attrInit> }+
<attrInit>        ::= ( [ '!' ] <variable> | <atom> | unit | true | false )
                      [ ':' <expression> ]
<methHead>        ::= ( [ '!' ] <variable> | <atom> | unit | true | false )
                      [ '(' { <methArg> } [ '...' ] ')' ]
                      [ '=' <variable> ]
<methArg>         ::= [ <feature> ':' ] ( <variable> | '_' | '$' ) [ '<=' <expression> ]
Table C.6: Other nonterminals needed for statements and expressions.

C.4 Operators
Table C.7 gives the precedence and associativity of all the operators used in the
book. All the operators are binary infix operators, except for three cases. The
minus sign ~ is a unary prefix operator. The hash symbol # is an n-ary
mixfix operator. The ". :=" is a ternary infix operator that is explained in the
next section. There are no postfix operators. The operators are listed in order of
increasing precedence, i.e., tightness of binding. The operators lower in the table
bind tighter. We define the associativities as follows:
- Left. For binary operators, this means that repeated operators group to the left.
  For example, 1+2+3 means the same as ((1+2)+3).
- Right. For binary operators, this means that repeated operators group to the
  right. For example, a|b|X means the same as (a|(b|X)).
- Mixfix. Repeated operators are actually just one operator, with all expressions
  being arguments of the operator. For example, a#b#c means the same as
  '#'(a b c).
- None. For binary operators, this means that the operator cannot be repeated. For
  example, 1<2<3 is an error.
Parentheses can be used to override the default precedence.
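The four associativities can be made concrete with a small sketch. It is written
in Python rather than Oz, and the function name group and its string-based
representation are assumptions made only for this illustration; it handles a chain
of one repeated operator and says nothing about precedence between different
operators.

```python
def group(operands, op, assoc):
    """Return the explicit grouping of `operands` joined by one repeated operator.

    Hypothetical helper: `assoc` is one of "left", "right", "mixfix", "none".
    """
    if len(operands) == 1:
        return operands[0]
    if assoc == "left":
        # Repeated operators group to the left: ((a op b) op c)
        acc = operands[0]
        for x in operands[1:]:
            acc = f"({acc}{op}{x})"
        return acc
    if assoc == "right":
        # Repeated operators group to the right: (a op (b op c))
        acc = operands[-1]
        for x in reversed(operands[:-1]):
            acc = f"({x}{op}{acc})"
        return acc
    if assoc == "mixfix":
        # One n-ary operator whose arguments are all the operands
        return f"'{op}'({' '.join(operands)})"
    if assoc == "none":
        # The operator may not be repeated
        if len(operands) > 2:
            raise SyntaxError(f"operator {op} cannot be repeated")
        return f"({operands[0]}{op}{operands[1]})"
    raise ValueError(f"unknown associativity: {assoc}")


print(group(["1", "2", "3"], "+", "left"))    # ((1+2)+3)
print(group(["a", "b", "X"], "|", "right"))   # (a|(b|X))
print(group(["a", "b", "c"], "#", "mixfix"))  # '#'(a b c)
```

Calling group(["1", "2", "3"], "<", "none") raises SyntaxError, which mirrors the
rule that 1<2<3 is an error.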
Operator                             Associativity
=                                    right
:=  ". :="                           right
orelse                               right
andthen                              right
==  \=  <  =<  >  >=  =:  \=:  =<:   none
::                                   none
|                                    right
#                                    mixfix
+  -                                 left
*  /  div  mod                       left
,                                    right
~                                    left
.                                    left
@  !!                                left
Table C.7: Operators with their precedence and associativity.
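Figure C.1: The ternary operator ". :=". (Two abstract syntax trees. S . I := X is
a single ". :=" node with three children: S (dictionary), I (index), and X (any
ref). (S . I) := X is a ":=" node with two children: a "." node (cell), whose
children are S (dictionary or record) and I (index), and X (any ref).)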
C.4.1 Ternary operator
There is one ternary (three-argument) operator, ". :=", which is designed for
dictionary and array updates. It has the same precedence and associativity as :=.
It can be used in an expression position like :=, where it has the effect of an
exchange. The statement S.I:=X consists of a ternary operator with arguments S, I,
and X. This statement is used for updating dictionaries and arrays. This should not
be confused with (S.I):=X, which consists of the two nested binary operators . and
:=. The latter statement is used for updating a cell that is inside a dictionary.
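The two statements can be modeled outside Oz to see what each one changes. The
following is only a sketch in Python: the class Cell and the functions dict_update
and cell_update are hypothetical names introduced here, and a Python dict stands
in for an Oz dictionary.

```python
class Cell:
    """Stand-in for an Oz cell: a mutable box holding one value."""

    def __init__(self, value):
        self.value = value


def dict_update(s, i, x):
    """Model of S.I:=X (ternary operator): store X under index I in S."""
    s[i] = x


def cell_update(s, i, x):
    """Model of (S.I):=X: first look up S.I, then assign X to the cell found."""
    c = s[i]
    if not isinstance(c, Cell):
        raise TypeError("first argument of := must be a cell")
    c.value = x


c = Cell(1)
d = {"k": c}

cell_update(d, "k", 2)      # the dictionary entry is unchanged; the cell now holds 2
print(d["k"] is c, c.value)

dict_update(d, "k", 3)      # the dictionary entry is replaced; the cell is untouched
print(d["k"], c.value)
```

In the first call the dictionary still refers to the same cell and only the cell
content changes. In the second call the dictionary entry itself is overwritten.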
The parentheses are highly significant! Figure C.1 shows the difference in abstract
syntax between S.I:=X and (S.I):=X. In the figure, (cell) means any cell or object
attribute, and (dictionary) means any dictionary or array.
The distinction is important because dictionaries can contain cells. To update a
dictionary D, we write D.I:=X. To update a cell in a dictionary containing cells,
we write (D.I):=X. This has the same effect as local C=D.I in C:=X end but
is more concise. The first argument of the binary operator := must be a cell or an
object attribute.
C.5 Keywords
Table C.8 lists the keywords of the language in alphabetic order. Keywords marked
with (*) exist in Oz but are not used in the book. Keywords in boldface can be used
as atoms by enclosing them in quotes. For example, 'then' is an atom, whereas
then is a keyword. Keywords not in boldface can be used as atoms directly, without
quotes.

andthen     default       false     lock         require (*)
at          define        feat (*)  meth         return
attr        dis (*)       finally   mod          self
break       div           for       not (*)      skip
case        do            from      of           then
catch       else          fun       or (*)       thread
choice      elsecase (*)  functor   orelse       true
class       elseif        if        otherwise    try
collect     elseof (*)    import    prepare (*)  unit
cond (*)    end           in        proc
continue    export        lazy      prop
declare     fail          local     raise
Table C.8: Keywords.
C.6 Lexical syntax
This section defines the lexical syntax of Oz, i.e., how a character sequence is
transformed into a sequence of tokens.
<variable>  ::= (uppercase char) { (alphanumeric char) }
              | ` { <variableChar> | <pseudoChar> } `
<atom>      ::= (lowercase char) { (alphanumeric char) } (except no keyword)
              | ' { <atomChar> | <pseudoChar> } '
<string>    ::= " { <stringChar> | <pseudoChar> } "
<character> ::= (any integer in the range 0...255)
              | & <charChar> | & <pseudoChar>
Table C.9: Lexical syntax of variables, atoms, strings, and characters.
<variableChar> ::= (any inline character except `, \, and NUL)
<atomChar>     ::= (any inline character except ', \, and NUL)
<stringChar>   ::= (any inline character except ", \, and NUL)
<charChar>     ::= (any inline character except \ and NUL)
<pseudoChar>   ::= \ <octdigit> <octdigit> <octdigit>
                 | (\x | \X) <hexdigit> <hexdigit>
                 | \a | \b | \f | \n | \r | \t
                 | \v | \\ | \' | \" | \` | \&
Table C.10: Nonterminals needed for lexical syntax.
<int>      ::= [ ~ ] <nzdigit> { <digit> }
             | [ ~ ] 0 { <octdigit> }+
             | [ ~ ] (0x | 0X) { <hexdigit> }+
             | [ ~ ] (0b | 0B) { <bindigit> }+
<float>    ::= [ ~ ] { <digit> }+ . { <digit> } [ (e | E) [ ~ ] { <digit> }+ ]
<digit>    ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<nzdigit>  ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<octdigit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
<hexdigit> ::= <digit> | a | b | c | d | e | f
             | A | B | C | D | E | F
<bindigit> ::= 0 | 1
Table C.11: Lexical syntax of integers and floating point numbers.
C.6.1 Tokens
Variables, atoms, strings, and characters
Table C.9 defines the lexical syntax for variable identifiers, atoms, strings, and
characters in strings. Unlike the previous sections which define token sequences,
this section defines character sequences. An alphanumeric character is a letter
(uppercase or lowercase), a digit, or an underscore character. Single quotes are
used to delimit atom representations that may contain nonalphanumeric characters
and backquotes are used in the same way for variable identifiers. Note that an
atom cannot have the same character sequence as a keyword unless the atom is
quoted. Table C.10 defines the nonterminals needed for table C.9. "Any inline
character" includes control characters and accented characters. The NUL character
has character code 0 (zero).
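The unquoted and quoted forms of variables and atoms can be told apart by their
first character. The sketch below, in Python, is an illustration under stated
assumptions: the name classify is hypothetical, letters are restricted to ASCII,
SAMPLE_KEYWORDS is only a small sample of entries from the keyword table (not the
full list), and the body of a quoted token is not validated against the character
classes or pseudo-characters.

```python
import re

# Assumption: a small illustrative sample of keywords, not the complete table.
SAMPLE_KEYWORDS = frozenset(
    {"then", "if", "else", "end", "local", "proc", "fun", "case", "of"}
)

# Assumption: ASCII letters only; alphanumeric = letter, digit, or underscore.
_VARIABLE = re.compile(r"[A-Z][A-Za-z0-9_]*")
_PLAIN_ATOM = re.compile(r"[a-z][A-Za-z0-9_]*")


def classify(token, keywords=SAMPLE_KEYWORDS):
    """Classify one token as 'variable', 'atom', 'keyword', or 'other'."""
    if _VARIABLE.fullmatch(token):
        return "variable"
    if _PLAIN_ATOM.fullmatch(token):
        # An unquoted atom may not spell a keyword.
        return "keyword" if token in keywords else "atom"
    if len(token) >= 2 and token[0] == "'" and token[-1] == "'":
        return "atom"       # quoted atom; body not checked in this sketch
    if len(token) >= 2 and token[0] == "`" and token[-1] == "`":
        return "variable"   # backquoted variable; body not checked
    return "other"


for tok in ["Max", "max", "then", "'then'", "`my var`", "_x"]:
    print(tok, classify(tok))
```

The pair then and 'then' shows the rule from the text: the bare form is a keyword,
and the quoted form is an atom.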
Integers and floating point numbers
Table C.11 defines the lexical syntax of integers and floating point numbers. Note
the use of the ~ (tilde) for the unary minus symbol.
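The rules for numbers translate almost directly into regular expressions. The
following Python sketch transcribes the alternatives of Table C.11 literally; the
names INT, FLOAT, and number_kind are assumptions made for this illustration.
Taken literally, the second integer alternative needs at least one digit after the
leading 0, so this sketch does not accept a lone 0.

```python
import re

# One alternative per line of the <int> rule; ~ is the optional unary minus.
INT = re.compile(
    r"~?(?:[1-9][0-9]*"        # <nzdigit> { <digit> }
    r"|0[0-7]+"                # 0 { <octdigit> }+
    r"|0[xX][0-9a-fA-F]+"      # (0x | 0X) { <hexdigit> }+
    r"|0[bB][01]+)"            # (0b | 0B) { <bindigit> }+
)

# { <digit> }+ . { <digit> } [ (e | E) [ ~ ] { <digit> }+ ]
FLOAT = re.compile(r"~?[0-9]+\.[0-9]*(?:[eE]~?[0-9]+)?")


def number_kind(text):
    """Return 'int', 'float', or None for a candidate numeric token."""
    if FLOAT.fullmatch(text):
        return "float"
    if INT.fullmatch(text):
        return "int"
    return None


for tok in ["~42", "0x1F", "0b101", "3.14", "1.", "~1.0e~3", "-5", ".5", "1e5"]:
    print(tok, number_kind(tok))
```

Note that -5 is not a number token, because the minus sign of a literal is written
with a tilde, and that a float needs digits before the decimal point.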
C.6.2 Blank space and comments
Tokens may be separated by any amount of blank space and comments. Blank space
is one of the characters tab (character code 9), newline (code 10), vertical tab (code
11), form feed (code 12), carriage return (code 13), and space (code 32). A comment
is one of three possibilities:
- A sequence of characters starting from the character % (percent) until the end of
  the line or the end of the file (whichever comes first).
- A sequence of characters starting from /* and ending with */, inclusive. This
  kind of comment may be nested.
- The single character ? (question mark). This is intended to mark the output
  arguments of procedures, as in
  proc {Max A B ?C} ... end
  where C is an output. An output argument is an argument that gets bound inside
  the procedure.
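The three kinds of comment, including nesting, can be handled with a depth
counter. The following Python sketch is an illustration only: strip_comments is a
hypothetical name, and the function deliberately ignores strings, quoted atoms,
and character literals, so a % or ? inside such a token would be treated as a
comment here even though a real lexer must leave it alone.

```python
def strip_comments(src):
    """Remove %-comments, nested /* */ comments, and ? markers from src.

    Simplification: quoting is ignored, so this is not a full lexer.
    """
    out = []
    i, n = 0, len(src)
    depth = 0  # current nesting depth of /* ... */ comments
    while i < n:
        two = src[i:i + 2]
        if two == "/*":
            depth += 1
            i += 2
        elif two == "*/" and depth > 0:
            depth -= 1
            i += 2
        elif depth > 0:
            i += 1  # inside a block comment: skip the character
        elif src[i] == "%":
            # Line comment: skip up to (not including) the newline or end of input
            while i < n and src[i] != "\n":
                i += 1
        elif src[i] == "?":
            i += 1  # the single-character comment
        else:
            out.append(src[i])
            i += 1
    if depth > 0:
        raise SyntaxError("unterminated /* comment")
    return "".join(out)


print(strip_comments("X = 1 % set X\nY"))
print(strip_comments("a /* b /* c */ d */ e"))
print(strip_comments("proc {Max A B ?C} ... end"))
```

The second call shows nesting: the inner */ closes only the inner comment, so the
text d is still inside the outer comment and is dropped.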
