ARM Assembly Language Programming
Peter Knaggs
and
Stephen Welsh
Contents
List of Programs
Preface
1 Introduction
1.1 The Meaning of Instructions
1.1.1 Binary Instructions
1.2 A Computer Program
1.3 The Binary Programming Problem
1.4 Using Octal or Hexadecimal
1.5 Instruction Code Mnemonics
1.6 The Assembler Program
1.6.1 Additional Features of Assemblers
1.6.2 Choosing an Assembler
1.7 Disadvantages of Assembly Language
1.8 High-Level Languages
1.8.1 Advantages of High-Level Languages
1.8.2 Disadvantages of High-Level Languages
1.9 Which Level Should You Use?
1.9.1 Applications for Machine Language
1.9.2 Applications for Assembly Language
1.9.3 Applications for High-Level Language
1.9.4 Other Considerations
1.10 Why Learn Assembler?
2 Assemblers
2.1 Fields
2.1.1 Delimiters
2.1.2 Labels
2.2 Operation Codes (Mnemonics)
2.3 Directives
2.3.1 The DEFINE CONSTANT (Data) Directive
2.3.2 The EQUATE Directive
2.3.3 The AREA Directive
2.3.4 Housekeeping Directives
2.3.5 When to Use Labels
2.4 Operands and Addresses
2.4.1 Decimal Numbers
2.4.2 Other Number Systems
2.4.3 Names
2.4.4 Character Codes
3 ARM Architecture
3.1 Processor modes
3.2 Registers
3.2.1 The stack pointer, SP or R13
3.2.2 The Link Register, LR or R14
3.2.3 The program counter, PC or R15
3.2.4 Current Processor Status Registers: CPSR
3.3 Flags
3.4 Exceptions
3.5 Instruction Set
3.5.1 Conditional Execution: ⟨cc⟩
3.5.2 Data Processing Operands: ⟨op1⟩
3.5.3 Memory Access Operands: ⟨op2⟩
4 Instruction Set
4.0.4 Branch instructions
4.0.5 Data-processing instructions
4.0.6 Status register transfer instructions
4.0.7 Load and store instructions
4.0.8 Coprocessor instructions
4.0.9 Exception-generating instructions
4.0.10 Conditional Execution: ⟨cc⟩
5 Addressing Modes
5.1 Data Processing Operands: ⟨op1⟩
5.1.1 Unmodified Value
5.1.2 Logical Shift Left
5.1.3 Logical Shift Right
5.1.4 Arithmetic Shift Right
5.1.5 Rotate Right
5.1.6 Rotate Right Extended
5.2 Memory Access Operands: ⟨op2⟩
5.2.1 Offset Addressing
5.2.2 Pre-Index Addressing
5.2.3 Post-Index Addressing
6 Programs
6.1 Example Programs
6.1.1 Program Listing Format
6.1.2 Guidelines for Examples
6.2 Trying the examples
6.3 Trying the examples from the command line
6.3.1 Setting up TextPad
6.4 Program Initialization
6.5 Special Conditions
6.6 Problems
7 Data Movement
7.1 Program Examples
7.1.1 16-Bit Data Transfer
7.1.2 One's Complement
7.1.3 32-Bit Addition
7.1.4 Shift Left One Bit
7.1.5 Byte Disassembly
7.1.6 Find Larger of Two Numbers
7.1.7 64-Bit Addition
7.1.8 Table of Factorials
7.2 Problems
7.2.1 64-Bit Data Transfer
7.2.2 32-Bit Subtraction
7.2.3 Shift Right Three Bits
7.2.4 Halfword Assembly
7.2.5 Find Smallest of Three Numbers
7.2.6 Sum of Squares
7.2.7 Shift Left n bits
8 Logic
9 Program Loops
9.1 Program Examples
9.1.1 Sum of numbers
9.1.2 Number of negative elements
9.1.3 Find Maximum Value
9.1.4 Normalize A Binary Number
9.2 Problems
9.2.1 Checksum of data
9.2.2 Number of Zero, Positive, and Negative numbers
9.2.3 Find Minimum
9.2.4 Count 1 Bits
9.2.5 Find element with most 1 bits
10 Strings
10.1 Handling data in ASCII
10.2 A string of characters
10.2.1 Fixed Length Strings
10.2.2 Terminated Strings
10.2.3 Counted Strings
10.3 International Characters
10.4 Program Examples
10.4.1 Length of a String of Characters
10.4.2 Find First Non-Blank Character
10.4.3 Replace Leading Zeros with Blanks
10.4.4 Add Even Parity to ASCII Characters
10.4.5 Pattern Match
10.5 Problems
10.5.1 Length of a Teletypewriter Message
10.5.2 Find Last Non-Blank Character
10.5.3 Truncate Decimal String to Integer Form
10.5.4 Check Even Parity and ASCII Characters
10.5.5 String Comparison
11 Code Conversion
11.1 Program Examples
11.1.1 Hexadecimal to ASCII
11.1.2 Decimal to Seven-Segment
11.1.3 ASCII to Decimal
11.1.4 Binary-Coded Decimal to Binary
11.1.5 Binary Number to ASCII String
11.2 Problems
11.2.1 ASCII to Hexadecimal
11.2.2 Seven-Segment to Decimal
11.2.3 Decimal to ASCII
11.2.4 Binary to Binary-Coded-Decimal
11.2.5 Packed Binary-Coded-Decimal to Binary String
11.2.6 ASCII string to Binary number
12 Arithmetic
12.1 Program Examples
12.1.2 64-Bit Addition
12.1.3 Decimal Addition
12.1.4 Multiplication
12.1.5 32-Bit Binary Divide
12.2 Problems
12.2.1 Multiple precision Binary subtraction
12.2.2 Decimal Subtraction
12.2.3 32-Bit by 32-Bit Multiply
List of Programs
12.3 addbcd.s Add two packed BCD numbers to give a packed BCD result
12.4a mul16.s 16 bit binary multiplication
12.4b mul32.s Multiply two 32 bit numbers to give a 64 bit result (corrupts R0 and R1)
12.5 divide.s Divide a 32 bit binary number by a 16 bit binary number and store the quotient and remainder (there is no 'DIV' instruction in ARM!)
13.1a insert.s Examine a table for a match - store a new entry at the end if no match found
13.1b insert2.s Examine a table for a match - store a new entry if no match found (extends insert.s)
13.2 search.s Examine an ordered table for a match
13.3 head.s Remove the first element of a queue
13.4 sort.s Sort a list of values (simple bubble sort)
Preface
Broadly speaking, you can divide the history of computers into four periods: the mainframe, the
mini, the microprocessor, and the modern post-microprocessor. The mainframe era was
characterized by computers that required large buildings and teams of technicians and operators to keep
them going. More often than not, both academics and students had little direct contact with the
mainframe: you handed a deck of punched cards to an operator and waited for the output to
appear hours later. During the mainframe era, academics concentrated on languages and compilers,
algorithms, and operating systems.
The minicomputer era put computers in the hands of students and academics, because university
departments could now buy their own minis. As minicomputers were not as complex as
mainframes and because students could get direct hands-on experience, many departments of computer
science and electronic engineering taught students how to program in the native language of the
computer: assembly language. In those days, the mid-1970s, assembly language programming
was used to teach both the control of I/O devices and the writing of programs (i.e., assembly
language was taught rather like high-level languages). The explosion of computer software had
not taken place, and if you wanted software you had to write it yourself.
The late 1970s saw the introduction of the microprocessor. For the first time, each student was
able to access a real computer. Unfortunately, microprocessors appeared before the introduction
of low-cost memory (both primary and secondary). Students had to program microprocessors
in assembly language because the only storage mechanism was often a ROM with just enough
capacity to hold a simple single-pass assembler.
The advent of the low-cost microprocessor system (usually on a single board) ensured that virtually
every student took a course on assembly language. Even today, most courses in computer science
include a module on computer architecture and organization, and teaching students to write
programs in assembly language forces them to understand the computer's architecture. However,
some computer scientists who had been educated during the mainframe era were unhappy with
the microprocessor, because they felt that the 8-bit microprocessor was a retrograde step: its
architecture was far more primitive than the mainframes they had studied in the 1960s.
The 1990s is the post-microprocessor era. Today's personal computers have more power and
storage capacity than many of yesterday's mainframes, and they have a range of powerful software
tools that were undreamed of in the 1970s. Moreover, the computer science curriculum of the
1990s has exploded. In 1970 a student could be expected to be familiar with every field of computer
science. Today, a student can be expected only to browse through the highlights.
The availability of high-performance hardware and the drive to include more and more new
material in the curriculum have put pressure on academics to justify what they teach. In particular,
many are questioning the need for courses on assembly language.
If you regard computer science as being primarily concerned with the use of the computer, you
can argue that assembly language is an irrelevance. Does the surgeon study metallurgy in order
to understand how a scalpel operates? Does the pilot study thermodynamics to understand how
a jet engine operates? Does the news reader study electronics to understand how the camera
operates? The answer to all these questions is no. So why should we inflict assembly language
and computer architecture on the student?
First, education is not the same as training. The student of computer science is not simply being
trained to use a number of computer packages. A university course leading to a degree should
also cover the history and the theoretical basis for the subject. Without a knowledge of computer
architecture, the computer scientist cannot understand how computers have developed and what
they are capable of.
The argument for teaching assembly language programming today can be divided into two com-
ponents: the underpinning of computer architecture and the underpinning of computer software.
Assembly language teaches how a computer works at the machine (i.e., register) level. It is
therefore necessary to teach assembly language to all those who might later be involved in computer
architecture, either by specifying computers for a particular application or by designing new
architectures. Moreover, the von Neumann machine's sequential nature teaches students the
limitations of conventional architectures and, indirectly, leads them on to unconventional architectures
(parallel processors, Harvard architectures, data flow computers, and even neural networks).
It is probably in the realm of software that you can most easily build a case for the teaching of
assembly language. During a student's career, he or she will encounter a lot of abstract concepts in
subjects ranging from programming languages, to operating systems, to real-time programming,
to AI. The foundation of many of these concepts lies in assembly language programming and
computer architecture. You might even say that assembly language provides bottom-up support
for the top-down methodology we teach in high-level languages. Consider some of the following
examples (taken from the teaching of Advanced RISC Machines Ltd (ARM) assembly language).
Data types
Students come across data types in high-level languages and the effects of strong and weak
data typing. Teaching an assembly language that can operate on bit, byte, word and long
word operands helps students understand data types. Moreover, the ability to perform any
type of assembly language operation on any type of data structure demonstrates the need
for strong typing.
Addressing modes
A vital component of assembly language teaching is addressing modes (literal, direct, and
indirect). The student learns how pointers function and how pointers are manipulated. This
aspect is particularly important if the student is to become a C programmer. Because an
assembly language is unencumbered by data types, it gives students a much simpler view of
pointers. The ARM has complex addressing modes that support
direct and indirect addressing, generated jump tables and handling of unknown memory
offsets.
Recursion
The recursive calling of subroutines often causes a student problems. You can use an assem-
bly language, together with a suitable system with a tracing facility, to demonstrate how
recursion operates. The student can actually observe how the stack grows as procedures are
called.
Protected-mode operation
Members of the ARM family operate in either a privileged mode or a user mode. The
operating system operates in the privileged mode and all user (applications) programs run in
the user mode. This mechanism can be used to construct secure or protected environments in
which the effects of an error in one application can be prevented from harming the operating
system (or other applications).
Input-output
Many high-level languages make it difficult to access I/O ports and devices directly. By
using an assembly language we can teach students how to write device drivers and how to
control interfaces. Most real interfaces are still programmed at the machine level by accessing
registers within them.
All these topics can, of course, be taught in the appropriate courses (e.g., high-level languages,
operating systems). However, teaching them in an assembly language course paves the
way for future studies and also shows the student exactly what is happening within the machine.
Conclusion
A strong case can be made for the continued teaching of assembly language within the computer
science curriculum. However, an assembly language cannot be taught just as if it were another
general-purpose programming language, as it was ten years ago. Perhaps more than
any other component of the computer science curriculum, teaching an assembly language supports
a wide range of topics at the heart of computer science. An assembly language should not be used
just to illustrate algorithms, but to demonstrate what is actually happening inside the computer.
1 Introduction
A computer program is ultimately a series of numbers and therefore has very little meaning to a
human being. In this chapter we will discuss the levels of human-like language in which a computer
program may be expressed. We will also discuss the reasons for and uses of assembly language.
1.1 The Meaning of Instructions
The instruction set of a microprocessor is the set of binary inputs that produce defined actions
during an instruction cycle. An instruction set is to a microprocessor what a function table is to a
logic device such as a gate, adder, or shift register. Of course, the actions that the microprocessor
performs in response to its instruction inputs are far more complex than the actions that logic
devices perform in response to their inputs.
1.1.1 Binary Instructions
The microprocessor (like any other computer) only recognises binary patterns as instructions or
data; it does not recognise characters or octal, decimal, or hexadecimal numbers.
1.2 A Computer Program
A computer program includes more than instructions; it also contains the data and the
memory addresses that the microprocessor needs to accomplish the tasks defined by the
instructions. Clearly, if the microprocessor is to perform an addition, it must have two numbers to add
and a place to put the result. The computer program must determine the sources of the data and
the destination of the result as well as the operation to be performed.
All microprocessors execute instructions sequentially unless an instruction changes the order of
execution or halts the processor. That is, the processor gets its next instruction from the next
higher memory address unless the current instruction specifically directs it to do otherwise.
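The sequencing rule can be sketched with a toy interpreter. This is a minimal Python illustration, not a model of the ARM: the instruction names (INC, JUMP, HALT) and the function run are invented for the example, and the list index stands in for the memory address.

```python
def run(program):
    """Execute a toy program and return the addresses visited, in order.

    `program` is a list of (operation, operand) pairs. The operation
    names are hypothetical and exist only for this sketch.
    """
    pc = 0        # program counter: address of the next instruction
    trace = []    # addresses in the order they were executed
    while True:
        op, arg = program[pc]
        trace.append(pc)
        if op == "HALT":
            break            # halting stops the processor
        elif op == "JUMP":
            pc = arg         # the instruction changes the order of execution
        else:
            pc += 1          # default: the next higher address
    return trace


program = [
    ("INC", None),   # address 0
    ("JUMP", 3),     # address 1: skips over address 2
    ("INC", None),   # address 2: never executed
    ("HALT", None),  # address 3
]
print(run(program))  # [0, 1, 3]
```

Only the JUMP and HALT cases depart from the default of stepping to the next address.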
Ultimately, every program is a set of binary numbers. For example, this is a snippet of an ARM
program that adds the contents of memory locations 8094 and 8098 and places the result in
memory location 809C (addresses in hexadecimal):
11100101100111110001000000010000
11100101100111110000000000001000
11100000100000010101000000000000
11100101100011110101000000001000
This is a machine language, or object, program. If this program were entered into the memory of
an ARM-based microcomputer, the microcomputer would be able to execute it directly.
1.3 The Binary Programming Problem
There are many difficulties associated with creating programs as object, or binary machine
language, programs. These are some of the problems:
• The programs are difficult to understand or debug. (Binary numbers all look the same,
particularly after you have looked at them for a few hours.)
• The programs do not describe the task which you want the computer to perform in anything
resembling a human-readable format.
• The programmer often makes careless errors that are very difficult to locate and correct.
For example, the following version of the addition object program contains a single error. Try
to find it:
11100101100111110001000000010000
11100101100111110000000000001000
11100000100000010101000000000000
11100110100011110101000000001000
Although the computer handles binary numbers with ease, people do not. People find binary
programs long, tiresome, confusing, and meaningless. Eventually, a programmer may start
remembering some of the binary codes, but such effort should be spent more productively.
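Rather than comparing the listings by eye, you can let the computer do it. The sketch below, in Python, compares the last word of the correct program with the last word of the erroneous one. The function name differing_bits is an assumption of this example. An exclusive-OR leaves a 1 in every position where the two words disagree.

```python
def differing_bits(word_a, word_b):
    """Return the bit positions (0 = least significant) at which two
    equal-length binary strings differ."""
    if len(word_a) != len(word_b):
        raise ValueError("words must have the same length")
    diff = int(word_a, 2) ^ int(word_b, 2)   # 1 wherever the bits disagree
    return [bit for bit in range(len(word_a)) if (diff >> bit) & 1]


# Last word of the correct listing and of the erroneous listing.
correct = "11100101100011110101000000001000"
erroneous = "11100110100011110101000000001000"

print(differing_bits(correct, erroneous))  # [24, 25]
```

The two words disagree in bits 24 and 25, a difference that is easy to miss among thirty-two digits.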
1.4 Using Octal or Hexadecimal
We can improve the situation somewhat by writing instructions using octal or hexadecimal
numbers, rather than binary. We will use hexadecimal numbers because they are shorter, and because
they are the standard for the microprocessor industry. Table 1.1 defines the hexadecimal digits
and their binary equivalents. The ARM program to add two numbers now becomes:
E59F1010
E59F0008
E0815000
E58F5008
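Each hexadecimal digit stands for exactly four binary digits, so the conversion is mechanical. A minimal Python sketch (the names hex_table and binary_to_hex are ours) lists the sixteen digits with their binary equivalents and converts three words of the addition program:

```python
def hex_table():
    """Return (hexadecimal digit, four-bit binary pattern) pairs for 0 to F."""
    return [(f"{value:X}", f"{value:04b}") for value in range(16)]


def binary_to_hex(word):
    """Convert a 32-character binary string to eight hexadecimal digits."""
    if len(word) != 32 or set(word) - {"0", "1"}:
        raise ValueError("expected 32 binary digits")
    return f"{int(word, 2):08X}"


for digit, bits in hex_table():
    print(digit, bits)

print(binary_to_hex("11100101100111110001000000010000"))  # E59F1010
print(binary_to_hex("11100000100000010101000000000000"))  # E0815000
print(binary_to_hex("11100101100011110101000000001000"))  # E58F5008
```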
At the very least, the hexadecimal version is shorter to write and not quite so tiring to examine.
Errors are somewhat easier to find in a sequence of hexadecimal digits. The erroneous version of
the addition program, in hexadecimal form, becomes:
E59F1010
E59F0008
E0815000
E68F5008
The hexadecimal version of the program is still difficult to read or understand; for example, it
does not distinguish operations from data or addresses, nor does the program listing provide any
suggestion as to what the program does. What does E59F1010 or E0815000 mean? Memorising a card full
of codes is hardly an appetising proposition. Furthermore, the codes will be entirely different for
a different microprocessor and the program will require a large amount of documentation.
1.5 Instruction Code Mnemonics
An obvious programming improvement is to assign a name to each instruction code. The
instruction code name is called a mnemonic or memory jogger.
In fact, all microprocessor manufacturers provide a set of mnemonics for the microprocessor
instruction set (they cannot remember hexadecimal codes either). You do not have to abide by the
manufacturer's mnemonics; there is nothing sacred about them. However, they are standard for
a given microprocessor, and therefore understood by all users. These are the instruction codes
that you will find in manuals, cards, books, articles, and programs. The problem with selecting
instruction mnemonics is that not all instructions have obvious names. Some instructions do
(for example, ADD, AND, ORR), others have obvious contractions (such as SUB for subtraction, EOR
for exclusive-OR), while still others have neither. The result is such mnemonics as BIC, STMIA,
and even MRS. Most manufacturers come up with some reasonable names and some hopeless ones.
However, users who devise their own mnemonics rarely do much better.
Along with the instruction mnemonics, the manufacturer will usually assign names to the CPU
registers. As with the instruction names, some register names are obvious (such as A for
Accumulator) while others may have only historical significance. Again, we will use the manufacturer's
suggestions simply to promote standardisation.
If we use standard ARM instruction and register mnemonics, as defined by Advanced RISC
Machines, our ARM addition program becomes:
The program is still far from obvious, but at least some parts are comprehensible. ADD is a
considerable improvement over E0815000. The LDR mnemonic does suggest loading data into a register
or memory location. We now see that some parts of the program are operations and others are
addresses. Such a program is an assembly language program.
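The link between the hexadecimal words and the mnemonics can be made explicit by picking the words apart. The following Python sketch decodes only the two instruction shapes used in the addition program, following the classic 32-bit ARM field layout (condition in bits 31 to 28, registers in bits 19 to 16 and 15 to 12). The function names reg and decode are ours, and the output spells out PC-relative offsets explicitly, which may differ from how a source listing would name the memory locations.

```python
def reg(number):
    """Name a register as ARM listings usually do (R15 is the PC)."""
    return "PC" if number == 15 else f"R{number}"


def decode(word):
    """Decode one 32-bit word; anything outside the two supported
    shapes raises ValueError."""
    if (word >> 28) != 0xE:
        raise ValueError("only the 'always' condition is handled")
    rn = (word >> 16) & 0xF   # base register or first operand
    rd = (word >> 12) & 0xF   # destination (source register for STR)

    # LDR/STR of a word: immediate offset added to the base, no write-back.
    if (word & 0x0FE00000) == 0x05800000:
        name = "LDR" if word & (1 << 20) else "STR"   # bit 20: 1 = load
        return f"{name} {reg(rd)}, [{reg(rn)}, #{word & 0xFFF}]"

    # ADD with an unshifted register as second operand, flags unchanged.
    if (word & 0x0FF00FF0) == 0x00800000:
        return f"ADD {reg(rd)}, {reg(rn)}, {reg(word & 0xF)}"

    raise ValueError(f"unsupported instruction word {word:08X}")


for word in (0xE59F1010, 0xE59F0008, 0xE0815000, 0xE58F5008):
    print(f"{word:08X}  {decode(word)}")
```

The third word, E0815000, is the ADD; the other three are loads and stores that address memory relative to the program counter.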
1.6 The Assembler Program
How do we get the assembly language program into the computer? We have to translate it, either
into hexadecimal or into binary numbers. You can translate an assembly language program by
hand, instruction by instruction. This is called hand assembly.
The following table illustrates the hand assembly of the addition program:
Hand assembly is a rote task which is uninteresting, repetitive, and subject to numerous minor
errors. Picking the wrong line, transposing digits, omitting instructions, and misreading the codes
are only a few of the mistakes that you may make. Most microprocessors complicate the task even
further by having instructions with different lengths. Some instructions are one word long while
others may be two or three. Some instructions require data in the second and third words; others
require memory addresses, register numbers, or who knows what?
Assembly is a rote task that we can assign to the microcomputer. The microcomputer never
makes any mistakes when translating codes; it always knows how many words and what format
each instruction requires. The program that does this job is an assembler. The assembler
program translates a user program, or source program written with mnemonics, into a machine
language program, or object program, which the microcomputer can execute. The assembler's
input is a source program and its output is an object program.
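To make the idea concrete, here is a minimal sketch of an assembler in Python. It is an illustration under stated assumptions only: it understands three mnemonics (ADD, LDR, STR), always uses the "always execute" condition, accepts only non-negative decimal offsets below 4096, and treats a semicolon as the start of a comment. The names assemble and assemble_line, and the source lines with explicit PC-relative offsets, are ours. A real assembler handles far more.

```python
import re

REGISTERS = {f"R{n}": n for n in range(16)}
REGISTERS.update({"SP": 13, "LR": 14, "PC": 15})

# Base encodings: 'always' condition, all register and offset fields zero.
BASE = {"ADD": 0xE0800000, "LDR": 0xE5900000, "STR": 0xE5800000}


def assemble_line(line):
    """Translate one source line into a 32-bit instruction word."""
    mnemonic, operands = line.split(None, 1)
    mnemonic = mnemonic.upper()
    if mnemonic not in BASE:
        raise ValueError(f"unknown mnemonic: {mnemonic}")
    # Commas and square brackets are the delimiters between operands.
    fields = [f.strip().upper() for f in re.split(r"[,\[\]]", operands)]
    fields = [f for f in fields if f]
    if len(fields) != 3:
        raise ValueError(f"expected three operands: {line}")
    rd = REGISTERS[fields[0]]
    rn = REGISTERS[fields[1]]
    if mnemonic == "ADD":
        last = REGISTERS[fields[2]]          # second operand register
    else:
        if not fields[2].startswith("#"):
            raise ValueError(f"expected an immediate offset: {line}")
        last = int(fields[2][1:])            # 12-bit offset
        if not 0 <= last < 4096:
            raise ValueError(f"offset out of range: {line}")
    return BASE[mnemonic] | (rn << 16) | (rd << 12) | last


def assemble(source):
    """Translate a source program (text) into a list of object words."""
    words = []
    for line in source.splitlines():
        line = line.split(";", 1)[0].strip()   # drop comments and blanks
        if line:
            words.append(assemble_line(line))
    return words


SOURCE = """
    LDR R1, [PC, #16]   ; load the first number
    LDR R0, [PC, #8]    ; load the second number
    ADD R5, R1, R0      ; add them
    STR R5, [PC, #8]    ; store the result
"""

for word in assemble(SOURCE):
    print(f"{word:08X}")
```

The input is a source program and the output is the list of object words, here E59F1010, E59F0008, E0815000 and E58F5008.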
Assemblers have their own rules that you must learn. These include the use of certain markers
(such as spaces, commas, semicolons, or colons) in appropriate places, correct spelling, the proper
control of information, and perhaps even the correct placement of names and numbers. These
rules are usually simple and can be learned quickly.
1.6.1 Additional Features of Assemblers
Most assemblers do more than translate mnemonics into binary codes. Additional features include:
• Allowing the user to assign names to memory locations, input and output devices, and even
sequences of instructions
• Converting data or addresses from various number systems (for example, decimal or
hexadecimal) to binary and converting characters into their ASCII or EBCDIC binary codes
• Telling the loader program where in memory parts of the program or data should be placed
• Allowing the user to assign areas of memory as temporary data storage and to place fixed
data in areas of program memory
• Providing the information required to include standard programs from program libraries, or
programs written at some other time, in the current program
• Allowing the user to control the format of the program listing and the input and output
devices employed
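Two of these conveniences, names for memory locations and conversion from other number systems or characters, can be sketched in a few lines of Python. The notation used here (a 0x prefix for hexadecimal, 0b for binary, single quotes around a character) and the name RESULT are assumptions made for the illustration, not the syntax of any particular assembler.

```python
def operand_value(text, symbols=None):
    """Return the number that an operand stands for.

    Accepts a name found in `symbols`, a quoted character such as 'A',
    or a number written in decimal, hexadecimal (0x...) or binary (0b...).
    """
    symbols = symbols or {}
    if text in symbols:
        return symbols[text]            # user-assigned name
    if len(text) == 3 and text[0] == "'" and text[2] == "'":
        return ord(text[1])             # character to its ASCII code
    return int(text, 0)                 # base taken from the prefix


symbols = {"RESULT": 0x809C}            # hypothetical name for a memory location

for operand in ["65", "0x41", "0b1000001", "'A'", "RESULT"]:
    print(f"{operand:>10} -> {operand_value(operand, symbols):016b}")
```

The first four operands all produce the same binary pattern, which is the point: the programmer writes whichever form is clearest and the assembler does the conversion.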
The assembler does not solve all the problems of programming. One problem is the tremendous gap
between the microcomputer instruction set and the tasks which the microcomputer is to perform.
Computer instructions tend to do things like add the contents of two registers, shift the contents
of the Accumulator one bit, or place a new value in the Program Counter. On the other hand, a
user generally wants a microcomputer to do something like print a number, look for and react to
a particular command from a teletypewriter, or activate a relay at the proper time. An assembly
language programmer must translate such tasks into a sequence of simple computer instructions.
The translation can be a difficult, time-consuming job.
Furthermore, if you are programming in assembly language, you must have detailed knowledge of
the particular microcomputer that you are using. You must know what registers and instructions
the microcomputer has, precisely how the instructions affect the various registers, what addressing
methods the computer uses, and a mass of other information. None of this information is relevant
to the task which the microcomputer must ultimately perform.
In addition, assembly language programs are not portable. Each microcomputer has its own
assembly language which reflects its own architecture. An assembly language program written for
the ARM will not run on a 486, Pentium, or Z8000 microprocessor. For example, the addition
program written for the Z8000 would be:
LD R0,%6000
ADD R0,%6002
LD %6004,R0
The lack of portability not only means that you will not be able to use your assembly language
program on a different microcomputer, but also that you will not be able to use any programs that
were not specifically written for the microcomputer you are using. This is a particular drawback
for new microcomputers, since few assembly language programs exist for them. The result, too
frequently, is that you are on your own. If you need a program to perform a particular task, you
are not likely to find it in the small program libraries that most manufacturers provide. Nor are
you likely to find it in an archive, journal article, or someone's old program file. You will probably
have to write it yourself.
1.8 High-Level Languages
The solution to many of the difficulties associated with assembly language programs is to use,
instead, high-level or procedure-oriented languages. Such languages allow you to describe tasks in
forms that are problem-oriented rather than computer-oriented. Each statement in a high-level
language performs a recognisable function; it will generally correspond to many assembly language
instructions. A program called a compiler translates the high-level language source program into
object code or machine language instructions.
Many different high-level languages exist for different types of tasks. If, for example, you can
express what you want the computer to do in algebraic notation, you can write your program in
FORTRAN (Formula Translation Language), the oldest of the high-level languages. Now, if you want
to add two numbers, you just write the sum as a single algebraic statement.
That is a lot simpler (and shorter) than either the equivalent machine language program or the
equivalent assembly language program. Other high-level languages include COBOL (for business
applications), BASIC (a cut-down version of FORTRAN designed to prototype ideas before coding
them in full), C (a systems-programming language), C++ and Java (object-orientated general
development languages).
1.8.1 Advantages of High-Level Languages
Machine Independence
High-level languages solve many other problems associated with assembly language programming.
The high-level language has its own syntax (usually defined by an international standard). The
language does not mention the instruction set, registers, or other features of a particular computer.
The compiler takes care of all such details. Programmers can concentrate on their own tasks; they
do not need a detailed understanding of the underlying CPU architecture. For that matter, they
do not need to know anything about the computer they are programming.
Portability
Programs written in a high-level language are portable, at least in theory. They will run on
any computer that has a standard compiler for that language.
At the same time, all previous programs written in a high-level language for prior computers are
available to you when programming a new computer. This can mean thousands of programs in
the case of a common language like C.
1.8.2 Disadvantages of High-Level Languages
Syntax
One obvious problem is that, as with assembly language, you have to learn the rules or syntax
of any high-level language you want to use. A high-level language has a fairly complicated set of
rules. You will find that it takes a lot of time just to get a program that is syntactically correct
(and even then it probably will not do what you want). A high-level computer language is like
a foreign language. If you have talent, you will get used to the rules and be able to turn out
programs that the compiler will accept. Still, learning the rules and trying to get the program
accepted by the compiler does not contribute directly to doing your job.
Cost of Compilers
Another obvious problem is that you need a compiler to translate a program written in a high-level
language into machine language. Compilers are expensive and use a large amount of memory.
While most assemblers occupy only a few KBytes of memory, compilers would occupy far larger
amounts of memory. A compiler could easily require over four times as much memory as an
assembler. So the amount of overhead involved in using the compiler is rather large.
Furthermore, only some compilers will make the implementation of your task simpler. Each
language has its own target problem area; for example, FORTRAN is well-suited to problems
that can be expressed as algebraic formulas. If, however, your problem is controlling a display
terminal, editing a string of characters, or monitoring an alarm system, your problem cannot
be easily expressed. In fact, formulating the solution in FORTRAN may be more awkward and
more difficult than formulating it in assembly language. The answer is, of course, to use a more
suitable high-level language. Languages specifically designed for tasks such as those mentioned
above do exist; they are called system implementation languages. However, these languages are
less widely used.
Inefficiency
High-level languages do not produce very efficient machine language programs. The basic reason
for this is that compilation is an automatic process which is riddled with compromises to allow for
many ranges of possibilities. The compiler works much like a computerised language translator:
sometimes the words are right but the sentence structures are awkward. A simple compiler cannot
know when a variable is no longer being used and can be discarded, when a register should be
used rather than a memory location, or when variables have simple relationships. The experienced
programmer can take advantage of shortcuts to shorten execution time or reduce memory usage.
A few compilers (known as optimizing compilers) can also do this, but such compilers are much
larger than regular compilers.
1.9 Which Level Should You Use?
Which language level you use depends on your particular application. Let us briefly note some of
the factors which may favor particular levels:
If hardware will ultimately be the largest cost in your application, or if speed is critical, you should
favor assembly language. But be prepared to spend much extra time in software development in
exchange for lower memory costs and higher execution speeds. If software will be the largest cost
in your application, you should favor a high-level language. But be prepared to spend the extra
money required for the supporting hardware and software.
Of course, no one except some theorists will object if you use both assembly and high-level lan-
guages. You can write the program originally in a high-level language and then patch some sections
in assembly language. However, most users prefer not to do this because it can create havoc in
debugging, testing, and documentation.
1.10 Why Learn Assembler?
Given the advance of high-level languages, why do you need to learn assembly language programming?
The reasons are:
2. Many microcomputer users will continue to program in assembly language since they need
the detailed control that it provides.
6. Almost all microcomputer programmers ultimately find that they need some knowledge of
assembly language, most often to debug programs, write I/O routines, speed up or shorten
critical sections of programs written in high-level languages, utilize or modify operating
system functions, and understand other people's programs.
The rest of these notes will deal exclusively with assembler and assembly language programming.
2 Assemblers
This chapter discusses the functions performed by assemblers, beginning with features common
to most assemblers and proceeding through more elaborate capabilities such as macros and con-
ditional assembly. You may wish to skim this chapter for the present and return to it when you
feel more comfortable with the material.
As we mentioned, today's assemblers do much more than translate assembly language mnemonics
into binary codes. But we will describe how an assembler handles the translation of mnemonics
before describing additional assembler features. Finally we will explain how assemblers are used.
2.1 Fields
Assembly language instructions (or statements) are divided into a number of fields.
The operation code field is the only field which can never be empty; it always contains either an
instruction mnemonic or a directive to the assembler, sometimes called a pseudo-instruction,
pseudo-operation, or pseudo-op.
The operand or address field may contain an address or data, or it may be blank.
The comment and label fields are optional. A programmer will assign a label to a statement or
add a comment as a personal convenience: namely, to make the program easier to read and use.
Of course, the assembler must have some way of telling where one field ends and another begins.
Assemblers often require that each field start in a specific column. This is a fixed format.
However, fixed formats are inconvenient when the input medium is paper tape; fixed formats are
also a nuisance to programmers. The alternative is a free format where the fields may appear
anywhere on the line.
2.1.1 Delimiters
If the assembler cannot use the position on the line to tell the fields apart, it must use something
else. Most assemblers use a special symbol or delimiter at the beginning or end of each field.
whitespace   Between label and operation code, between operation code and
             address, and before an entry in the comment field
comma        Between operands in the address field
asterisk     Before an entire line of comment
semicolon    Marks the start of a comment on a line that contains preceding code
The most common delimiter is the space character. Commas, periods, semicolons, colons, slashes,
question marks, and other characters that would not otherwise be used in assembly language
programs also may serve as delimiters. The general form of layout for the ARM assembler is:
label <whitespace> operation code <whitespace> operands <whitespace> ; comment
You will have to exercise a little care with delimiters. Some assemblers are fussy about extra spaces
or the appearance of delimiters in comments or labels. A well-written assembler will handle these
minor problems, but many assemblers are not well-written. Our recommendation is simple: avoid
potential problems if you can. The following rules will help:
• Do not use extra spaces; in particular, do not put spaces after commas that separate
operands, even though the ARM assembler allows you to do this.
• Include standard delimiters even if your assembler does not require them. Then it will be
more likely that your programs are in correct form for another assembler.
2.1.2 Labels
The label field is the first field in an assembly language instruction; it may be blank. If a label
is present, the assembler defines the label as equivalent to the address into which the first byte
of the object code generated for that instruction will be loaded. You may subsequently use the
label as an address or as data in another instruction's address field. The assembler will replace
the label with the assigned value when creating an object program.
The ARM assembler requires labels to start at the first character of a line. However, some other
assemblers also allow you to have the label start anywhere along a line, in which case you must
use a colon (:) as the delimiter to terminate the label field. Colon delimiters are not used by the
ARM assembler.
Labels are most frequently used in Branch or SWI instructions. These instructions place a new
value in the program counter and so alter the normal sequential execution of instructions. B 150₁₆
means place the value 150₁₆ in the program counter. The next instruction to be executed will
be the one in memory location 150₁₆. The instruction B START means place the value assigned
to the label START in the program counter. The next instruction to be executed will be the one at
the address corresponding to the label START. Figure 2.1 contains an example.
Figure 2.1: When the machine language version of this program is executed, the instruction
B START causes the address of the instruction labeled START to be placed in the
program counter. That instruction will then be executed.
Using a label rather than a numeric address has several advantages:
• The label can easily be moved, if required, to change or correct a program. The assembler
will automatically change all instructions that use the label when the program is reassembled.
• The assembler can relocate the whole program by adding a constant (a relocation constant)
to each address in which a label was used. Thus we can move the program to allow for the
insertion of other programs or simply to rearrange memory.
• The program is easier to use as a library program; that is, it is easier for someone else to
take your program and add it to some totally different program.
• You do not have to figure out memory addresses. Figuring out memory addresses is particularly
difficult with microprocessors which have instructions that vary in length.
You should assign a label to any instruction that you might want to refer to later.
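The relocation idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the symbol table is a plain dictionary, and the label names, addresses and the function name `relocate` are invented for the example.

```python
def relocate(symbols, relocation_constant):
    """Return a new symbol table with every label address shifted by a constant."""
    return {name: address + relocation_constant
            for name, address in symbols.items()}


# Hypothetical labels and addresses, as if the program had been assembled at address 0.
symbols = {"START": 0x0000, "LOOP": 0x0008, "DONE": 0x0014}

moved = relocate(symbols, 0x1000)
for name, address in moved.items():
    print(f"{name:6} {address:#06x}")
```

Because every reference goes through a label, moving the program only requires adding the same constant to each label's address; no instruction has to be rewritten by hand.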
The next question is how to choose a label. The assembler often places some restrictions on the
number of characters (usually 5 or 6), the leading character (often must be a letter), and the
trailing characters (often must be letters, numbers, or one of a few special characters). Beyond
these restrictions, the choice is up to you.
Our own preference is to use labels that suggest their purpose, i.e., mnemonic labels. Typical
examples are ADDW in a routine that adds one word into a sum, SRCHETX in a routine that searches
for the ASCII character ETX, or NKEYS for a location in data memory that contains the number of
key entries. Meaningful labels are easier to remember and contribute to program documentation.
Some programmers use a standard format for labels, such as starting with L0000. These labels are
self-sequencing (you can skip a few numbers to permit insertions), but they do not help document
the program.
Some label selection rules will keep you out of trouble. We recommend the following:
• Do not use labels that are the same as operation codes or other mnemonics. Most assemblers
will not allow this usage; others will, but it is confusing.
• Do not use labels that are longer than the assembler recognises. Assemblers have various
rules, and often ignore some of the characters at the end of a long label.
• Avoid special characters (non-alphabetic and non-numeric) and lower-case letters. Some
assemblers will not permit them; others allow only certain ones. The simplest practice is to
stick to capital letters and numbers.
• Start each label with a letter. Such labels are always acceptable.
• Do not use labels that could be confused with each other. Avoid the letters I, O, and Z and
the numbers 0, 1, and 2. Also avoid things like XXXX and XXXXX. Assembly programming is
difficult enough without tempting fate or Murphy's Law.
• When you are not sure if a label is legal, do not use it. You will not get any real benefit
from discovering exactly what the assembler will accept.
These are recommendations, not rules. You do not have to follow them but don't blame us if you
waste time on unnecessary problems.
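Several of these recommendations are mechanical enough to check automatically. The following Python sketch does so under stated assumptions: the length limit of 8 characters and the short list of reserved mnemonics are placeholders chosen for the example, not properties of any particular assembler, and `label_problems` is a hypothetical helper name.

```python
import re

# Assumed values for illustration; consult your assembler for the real ones.
MAX_LENGTH = 8
RESERVED = {"ADD", "B", "SWI", "EQU", "DCB", "DCW", "END"}


def label_problems(label, max_length=MAX_LENGTH):
    """Return a list of recommendations that the label breaks (empty if none)."""
    problems = []
    if not re.fullmatch(r"[A-Z][A-Z0-9]*", label):
        problems.append("start with a capital letter; use only capitals and digits")
    if len(label) > max_length:
        problems.append(f"longer than {max_length} characters")
    if label in RESERVED:
        problems.append("same as an operation code or directive")
    return problems


for candidate in ["NKEYS", "SRCHETX", "ADD", "2FAST", "start"]:
    print(candidate, label_problems(candidate))
```

The check for labels that could be confused with each other is left out on purpose: that judgment depends on the reader, not on the spelling of a single label.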
One main task of the assembler is the translation of mnemonic operation codes into their binary
equivalents. The assembler performs this task using a fixed table much as you would if you were
doing the assembly by hand.
The assembler must, however, do more than just translate the operation codes. It must also
somehow determine how many operands the instruction requires and what type they are. This
may be rather complex: some instructions (like a Stop) have no operands, others (like a Jump
instruction) have one, while still others (like a transfer between registers or a multiple-bit shift)
require two. Some instructions may even allow alternatives; for example, some computers have
instructions (like Shift or Clear) which can either apply to a register in the CPU or to a memory
location. We will not discuss how the assembler makes these distinctions; we will just note that it
must do so.
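Although we do not discuss the real mechanism, the basic idea of a fixed table that records both the binary code and the number of operands can be sketched in Python. The mnemonics, the numeric codes and the function name `translate` are all invented for this illustration; they are not ARM encodings.

```python
# Hypothetical table: mnemonic -> (operation code, number of operands required).
OPCODES = {
    "STOP": (0x00, 0),
    "JUMP": (0x10, 1),
    "MOVE": (0x20, 2),
}


def translate(statement):
    """Look up the mnemonic and check that the operand count matches the table."""
    parts = statement.split(None, 1)
    mnemonic = parts[0].upper()
    operands = [op.strip() for op in parts[1].split(",")] if len(parts) > 1 else []
    if mnemonic not in OPCODES:
        raise ValueError(f"unknown operation code: {mnemonic}")
    code, required = OPCODES[mnemonic]
    if len(operands) != required:
        raise ValueError(f"{mnemonic} needs {required} operand(s), got {len(operands)}")
    return code, operands


print(translate("STOP"))
print(translate("JUMP START"))
print(translate("MOVE R1,R2"))
```

A statement such as `JUMP` with no operand is rejected, which is the kind of error an assembler reports when an instruction is given the wrong number of operands.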
2.3 Directives
Some assembly language instructions are not directly translated into machine language instructions.
These instructions are directives to the assembler; they assign the program to certain areas
in memory, define symbols, designate areas of memory for data storage, place tables or other fixed
data in memory, allow references to other programs, and perform minor housekeeping functions.
To use these assembler directives or pseudo-operations a programmer places the directive's mnemonic
in the operation code field, and, if the specified directive requires it, an address or data in the
address field.
Different assemblers use different names for those operations but their functions are the same.
Housekeeping directives include:
We will discuss these pseudo-operations briefly, although their functions are usually obvious.
The define constant directive treats the data as a permanent part of the program.
The format of a define constant directive is usually quite simple. An instruction like:
DZCON DCW 12
will place the number 12 in the next available memory location and assign that location the name
DZCON. Every DC directive usually has a label, unless it is one of a series. The data and label may
take any form that the assembler permits.
More elaborate define constant directives that handle a large amount of data at one time are
provided, for example:
A single directive may fill many bytes of program memory, limited perhaps by the length of a
line or by the restrictions of a particular assembler. Of course, you can always overcome any
restrictions by following one define constant directive with another:
Microprocessor assemblers typically have some variations of standard define constant directives.
Define Byte or DCB handles 8-bit numbers; Define Word or DCW handles 32-bit numbers or addresses.
Other special directives may handle character-coded data. The ARM assembler also defines DCD
(Define Constant Data), which may be used in place of DCW.
The EQUATE directive assigns the numeric value in its operand field to the label in its label field.
Here are two examples:
TTY EQU 5
LAST EQU 5000
Most assemblers will allow you to define one label in terms of another, for example:
The label in the operand field must, of course, have been previously defined. Often, the operand
field may contain more complex expressions, as we shall see later. Double name assignments (two
names for the same data or address) may be useful in patching together programs that use different
names for the same variable (or different spellings of what was supposed to be the same name).
Note that an EQU directive does not cause the assembler to place anything in memory. The as-
sembler simply enters an additional name into a table (called a symbol table) which the assembler
maintains.
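The behaviour of EQU can be modelled in Python as nothing more than an entry in a dictionary. In this sketch the name FINAL and the function `process_equates` are hypothetical, added only to show one label defined in terms of another; TTY and LAST are the examples used above.

```python
def process_equates(lines):
    """Build a symbol table from EQU statements; nothing is placed in memory."""
    symbols = {}
    memory = []  # stays empty: an EQU generates no object code
    for line in lines:
        label, directive, operand = line.split()
        if directive != "EQU":
            raise ValueError(f"not an EQU statement: {line}")
        if operand.isdigit():
            value = int(operand)
        elif operand in symbols:
            value = symbols[operand]  # a label defined in terms of another label
        else:
            raise ValueError(f"{operand} has not been previously defined")
        symbols[label] = value
    return symbols, memory


symbols, memory = process_equates([
    "TTY   EQU 5",
    "LAST  EQU 5000",
    "FINAL EQU LAST",  # hypothetical second name for the same value
])
print(symbols)
print(memory)
```

Reversing the last two lines would raise an error, which mirrors the requirement that a label in the operand field must already be defined.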
When do you use a name? The answer is: whenever you have a parameter that you might want to
change or that has some meaning besides its ordinary numeric value. We typically assign names to
time constants, device addresses, masking patterns, conversion factors, and the like. A name like
DELAY, TTY, KBD, KROW, or OPEN not only makes the parameter easier to change, but it also adds to
program documentation. We also assign names to memory locations that have special purposes;
they may hold data, mark the start of the program, or be available for intermediate storage.
What name do you use? The best rules are much the same as in the case of labels, except that
here meaningful names really count. Why not call the teletypewriter TTY instead of X15, a bit
time delay BTIME or BTDLY rather than WW, the number of the GO key on a keyboard GOKEY
rather than HORSE? This advice seems straightforward, but a surprising number of programmers
do not follow it.
Where do you place the EQUATE directives? The best place is at the start of the program, under
appropriate comment headings such as i/o addresses, temporary storage, time constants,
or program locations. This makes the definitions easy to find if you want to change them.
Furthermore, another user will be able to look up all the definitions in one centralised place.
Clearly this practice improves documentation and makes the program easier to use.
Definitions used only in a specific subroutine should appear at the start of the subroutine.
The assembler maintains a location counter (comparable to the computer's program counter) which
contains the location in memory of the instruction or data item being processed. An area directive
causes the assembler to place a new value in the location counter, much as a Jump instruction
causes the CPU to place a new value in the program counter. The output from the assembler
must not only contain instructions and data, but must also indicate to the loader program where
in memory it should place the instructions and data.
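A location counter is easy to model. The Python sketch below follows the description above, in which an area directive with a numeric operand places a new value in the location counter; the `$` prefix for hexadecimal, the fixed size of four bytes per statement and the function name `assign_addresses` are assumptions made for the illustration.

```python
def assign_addresses(lines, start=0, size=4):
    """Pair each statement with the value of the location counter when it is processed."""
    location = start
    placed = []
    for line in lines:
        fields = line.split()
        if fields[0] == "AREA":
            # The directive generates no code; it only changes the location counter.
            location = int(fields[1].lstrip("$"), 16)
            continue
        placed.append((location, line))
        location += size  # assumed: every statement occupies one 4-byte word
    return placed


program = [
    "AREA $1000",
    "MOV R0,#1",
    "ADD R0,R0,#2",
    "AREA $2000",
    "MOV R1,R0",
]
for address, statement in assign_addresses(program):
    print(f"{address:#06x}  {statement}")
```

The list of (address, statement) pairs is the information the assembler must pass on so that the loader knows where each item belongs in memory.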
Microprocessor programs often contain several AREA statements for the following purposes:
Still other origin statements may allow room for later insertions, place tables or data in memory,
or assign vacant memory space for data buffers. Program and data memory in microcomputers
may occupy widely separate addresses to simplify the hardware. Typical origin statements are:
AREA RESET
AREA $1000
AREA INT3
The assembler will assume a fake address if the programmer does not put in an AREA statement.
The AREA statement at the start of an ARM program is required, and its absence will cause the
assembly to fail.
END marks the end of the assembly language source program. This must appear in the file or a
missing END directive error will occur.
INCLUDE will include the contents of a named file into the current file. When the included file
has been processed the assembler will continue with the next line in the original file. For
example the following line
INCLUDE MATH.S
will include the content of the file math.s at that point of the file.
You should never use a label with an include directive. Any labels defined in the included file
will be defined in the current file, hence an error will be reported if the same label appears
in both the source and include file.
An include file may itself include other files, which in turn could include other files, and so
on; however, the level of includes the assembler will accept is limited. It is not recommended
you go beyond three levels for even the most complex of software.
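The way an assembler processes nested includes can be sketched as a recursive expansion. In this Python illustration the files are held in a dictionary rather than on disk, and the limit of three levels is simply the recommendation above used as an assumed hard limit; `expand` is a hypothetical name.

```python
MAX_DEPTH = 3  # assumed limit, taken from the recommendation above


def expand(name, files, depth=0, max_depth=MAX_DEPTH):
    """Return the lines of a file with every INCLUDE replaced by the named file's lines."""
    if depth > max_depth:
        raise RecursionError(f"includes nested deeper than {max_depth} levels")
    output = []
    for line in files[name]:
        fields = line.split()
        if len(fields) == 2 and fields[0] == "INCLUDE":
            output.extend(expand(fields[1], files, depth + 1, max_depth))
        else:
            output.append(line)  # processing resumes with the next line of this file
    return output


files = {
    "MAIN.S": ["    MOV R0,#1", "    INCLUDE MATH.S", "    END"],
    "MATH.S": ["    ADD R0,R0,#2"],
}
for line in expand("MAIN.S", files):
    print(line)
```

A file that includes itself would recurse until the depth limit is exceeded, at which point the sketch reports an error instead of looping forever.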
1. All EQU directives must have labels; they are useless otherwise, since the purpose of an EQU
is to define its label.
2. Define Constant and Define Storage directives usually have labels. The label identifies the
first memory location used or assigned.
The assembler allows the programmer a lot of freedom in describing the contents of the operand or
address field. But remember that the assembler has built-in names for registers and instructions
and may have other built-in names. We will now describe some common options for the operand
field.
ADD 100
means add the contents of memory location 100₁₀ to the contents of the Accumulator.
It is good practice to enter numbers in the base in which their meaning is the clearest: that is,
decimal constants in decimal; addresses and BCD numbers in hexadecimal; masking patterns or
bit outputs in hexadecimal.
2.4.3 Names
Names can appear in the operand field; they will be treated as the data that they represent.
Remember, however, that there is a difference between operands and addresses. In an ARM
assembly language program the sequence:
FIVE EQU 5
ADD R2, #FIVE
will add the number 5, the value assigned to the name FIVE (not the contents of memory
location 5), to the contents of register R2, because the # prefix marks an immediate operand.
Assemblers vary in what expressions they accept and how they interpret them. Complex expressions
make a program difficult to read and understand.
• Masks and BCD numbers in decimal, ASCII characters in octal, or ordinary numerical
constants in hexadecimal serve no purpose and therefore should not be used.
• Keep expressions simple and obvious. Don't rely on obscure features of the assembler.
2.5 Comments
All assemblers allow you to place comments in a source program. Comments have no effect on the
object code, but they help you to read, understand, and document the program. Good commenting
is an essential part of writing computer programs; programs without comments are very difficult
to understand.
We will discuss commenting along with documentation in a later chapter, but here are some
guidelines:
• Use comments to tell what application task the program is performing, not how the micro-
computer executes the instructions.
• Comments should say things like "is temperature above limit?", "linefeed to TTY", or
"examine load switch".
• Comments should not say things like "add 1 to Accumulator", "jump to Start", or "look at
carry". You should describe how the program is affecting the system; internal effects on the
CPU should be obvious from the code.
• Keep comments brief and to the point. Details should be available elsewhere in the docu-
mentation.
• Do not comment standard instructions or sequences that change counters or pointers; pay
special attention to instructions that may not have an obvious meaning.
• Comment all definitions, describing their purposes. Also mark all tables and data storage
areas.
• Be consistent in your terminology. You can (should) be repetitive; you need not consult a
thesaurus.
• Leave yourself notes at points that you find confusing: for example, "remember carry was set
by last instruction". If such points get cleared up later in program development, you may
drop these comments in the final documentation.
A well-commented program is easy to use. You will recover the time spent in commenting many
times over. We will try to show good commenting style in the programming examples, although
we often over-comment for instructional purposes.
Although all assemblers perform the same tasks, their implementations vary greatly. We will not
try to describe all the existing types of assemblers; we will merely define the terms and indicate
some of the choices.
A cross-assembler is an assembler that runs on a computer other than the one for which it assembles
object programs. The computer on which the cross-assembler runs is typically a large computer
with extensive software support and fast peripherals. The computer for which the cross-assembler
assembles programs is typically a micro like the 6809 or MC68000.
A self-assembler or resident assembler is an assembler that runs on the computer for which it
assembles programs. The self-assembler will require some memory and peripherals, and it may
run quite slowly compared to a cross-assembler.
A microassembler is an assembler used to write the microprograms which define the instruction
set of a computer. Microprogramming has nothing specifically to do with programming microcomputers,
but has to do with the internal operation of the computer.
A meta-assembler is an assembler that can handle many different instruction sets. The user must
define the particular instruction set being used.
A one-pass assembler is an assembler that goes through the assembly language program only
once. Such an assembler must have some way of resolving forward references, for example, Jump
instructions which use labels that have not yet been defined.
A two-pass assembler is an assembler that goes through the assembly language source program
twice. The first time the assembler simply collects and defines all the symbols; the second time
it replaces the references with the actual definitions. A two-pass assembler has no problems with
forward references but may be quite slow if no backup storage (like a floppy disk) is available;
then the assembler must physically read the program twice from a slow input medium (like a
teletypewriter paper tape reader). Most microprocessor-based assemblers require two passes.
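The two passes can be sketched in Python as follows. This is a simplified model rather than a real assembler: it assumes that a label starts in the first column, that every statement occupies one 4-byte word, and that an operand is replaced only when it is exactly a label name. The program text and the function name `two_pass` are invented for the example.

```python
def two_pass(lines, origin=0, word=4):
    """Pass 1 collects the labels; pass 2 replaces label references with addresses."""
    # Pass 1: define every symbol by recording the location counter at each label.
    symbols = {}
    statements = []
    location = origin
    for line in lines:
        if not line.strip():
            continue
        fields = line.split()
        if not line[0].isspace():      # a label starts in the first column
            symbols[fields[0]] = location
            fields = fields[1:]
        if fields:
            statements.append((location, fields))
            location += word

    # Pass 2: every label is now known, so forward references cause no difficulty.
    resolved = []
    for address, fields in statements:
        operation = fields[0]
        operand = fields[1] if len(fields) > 1 else None
        if operand in symbols:
            operand = symbols[operand]
        resolved.append((address, operation, operand))
    return symbols, resolved


program = [
    "START   B    DONE",        # forward reference: DONE is defined later
    "        ADD  R0,R0,#1",
    "        B    START",       # backward reference
    "DONE    MOV  R1,R0",
]
symbols, resolved = two_pass(program)
print(symbols)
for entry in resolved:
    print(entry)
```

A one-pass assembler meeting `B DONE` on the first line would not yet know the address of DONE; here the first pass has already recorded it before any reference is replaced.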
2.7 Errors
Assemblers normally provide error messages, often consisting of an error code number. Some
typical errors are:
2.8 Loaders
The loader is the program which actually takes the output (object code) from the assembler and
places it in memory. Loaders range from the very simple to the very complex. We will describe a
few different types.
A bootstrap loader is a program that uses its own first few instructions to load the rest of itself
or another loader program into memory. The bootstrap loader may be in ROM, or you may have
to enter it into the computer memory using front panel switches. The assembler may place a
bootstrap loader at the start of the object program that it produces.
A relocating loader can load programs anywhere in memory. It typically loads each program
into the memory space immediately following that used by the previous program. The programs,
however, must themselves be capable of being moved around in this way; that is, they must be
relocatable. An absolute loader, in contrast, will always place the programs in the same area of
memory.
A linking loader loads programs and subroutines that have been assembled separately; it resolves
cross-references, that is, instructions in one program that refer to a label in another program.
Object programs loaded by a linking loader must be created by an assembler that allows external
references. An alternative approach is to separate the linking and loading functions and have the
linking performed by a program called a link editor and the loading done by a loader.
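The work of a linking loader can be outlined in Python. The module format used here (a size, a table of exported labels and a list of external references) is an assumption invented for this sketch, as are the module contents and the function name `link`; real object file formats carry much more information.

```python
def link(modules):
    """Load modules one after another and resolve references between them."""
    base = 0
    bases = []
    global_symbols = {}
    # Step 1: place each module immediately after the previous one (relocation).
    for module in modules:
        bases.append(base)
        for name, offset in module["exports"].items():
            if name in global_symbols:
                raise ValueError(f"label defined twice: {name}")
            global_symbols[name] = base + offset
        base += module["size"]
    # Step 2: resolve each cross-reference to the final address of its label.
    fixups = []
    for module, module_base in zip(modules, bases):
        for offset, name in module["refs"]:
            if name not in global_symbols:
                raise KeyError(f"unresolved external reference: {name}")
            fixups.append((module_base + offset, global_symbols[name]))
    return global_symbols, fixups


# Hypothetical, separately assembled modules.
main = {"size": 16, "exports": {"MAIN": 0}, "refs": [(8, "SQRT")]}
math = {"size": 32, "exports": {"SQRT": 4}, "refs": []}

global_symbols, fixups = link([main, math])
print(global_symbols)
print(fixups)
```

Each fixup pairs the address of a referring instruction with the address it must refer to; a link editor would produce the same information but leave the actual loading to a separate loader.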
3 ARM Architecture
This chapter outlines the ARM processor's architecture and describes the syntax rules of the ARM
assembler. Later chapters of this book describe the ARM's stack and exception processing system
in more detail.
Figure 3.1 on the following page shows the internal structure of the ARM processor. The ARM
is a Reduced Instruction Set Computer (RISC) system and includes the attributes typical of that
type of system:
• A load/store model of data-processing where operations can only operate on registers and not
directly on memory. This requires that all data be loaded into registers before an operation
can be performed; the result can then be used for further processing or stored back into
memory.
• A small number of addressing modes with all load/store addresses being determined from
registers and instruction fields only.
In addition to these traditional features of a RISC system the ARM provides a number of additional
features:
• Separate Arithmetic Logic Unit (ALU) and shifter giving additional control over data pro-
cessing to maximize execution speed.
• Conditional execution of instructions to reduce pipeline flushing and thus increase execution
speed.
The ARM supports the seven processor modes shown in table 3.1.
Mode changes can be made under software control, or can be caused by external interrupts or
exception processing.
Most application programs execute in User mode. While the processor is in User mode, the
program being executed is unable to access some protected system resources or to change mode,
other than by causing an exception to occur (see 3.4 on page 29). This allows a suitably written
operating system to control the use of system resources.
The modes other than User mode are known as privileged modes. They have full access to system
resources and can change mode freely. Five of them are known as exception modes: FIQ (Fast
Interrupt), IRQ (Interrupt), Supervisor, Abort, and Undefined. These are entered when specific
exceptions occur. Each of them has some additional registers to avoid corrupting User mode state
when the exception occurs (see 3.2 for details).
The remaining mode is System mode; it is not entered by any exception and has exactly the same
registers available as User mode. However, it is a privileged mode and is therefore not subject to
the User mode restrictions. It is intended for use by operating system tasks which need access to
system resources, but wish to avoid using the additional registers associated with the exception
modes. Avoiding such use ensures that the task state is not corrupted by the occurrence of any
exception.
3.2 Registers
The ARM has a total of 37 registers. These comprise 30 general purpose registers, 6 status registers
and a program counter. Figure 3.2 illustrates the registers of the ARM. Only fifteen of the general
purpose registers are available at any one time depending on the processor mode.
There is a standard set of eight general purpose registers (R0–R7) that are always available, no
matter which mode the processor is in. These registers are truly general-purpose, with no special
uses being placed on them by the processor's architecture.
A few registers (R8–R12) are common to all processor modes with the exception of the FIQ
mode. This means that to all intents and purposes these are general registers and have no special
use. However, when the processor is in the fast interrupt mode these registers are replaced with a
different set of registers (R8_fiq–R12_fiq). Although the processor does not give any special
purpose to these registers, they can be used to hold information between fast interrupts. You can
consider them to be static registers. The idea is that you can make a fast interrupt even faster
by holding information in these registers.
The general purpose registers can be used to handle 8-bit bytes, 16-bit half-words¹, or 32-bit
words. When we use a 32-bit register in a byte instruction only the least significant 8 bits are
used. In a half-word instruction only the least significant 16 bits are used. Figure 3.3 demonstrates
this.
The remaining registers (R13–R15) are special purpose registers and have very specific roles:
R13 is also known as the Stack Pointer, while R14 is known as the Link Register, and R15 is
the Program Counter. The User (usr) and System (sys) modes share the same registers. The
exception modes all have their own versions of R13 and R14. Making a reference to register R14
will assume you are referring to the register for the current processor mode. If you wish to refer
to the user mode version of this register you have to refer to the R14_usr register. You may only
refer to registers from other modes when the processor is in one of the privileged modes, i.e., any
mode other than user mode.
1 Although the ARM does allow for Half-Word instructions, the emulator we are using does not.
Modes
         Privileged Modes
                   Exception Modes
User     System    Supervisor  Abort     Undefined  Interrupt  Fast Interrupt
R0       R0        R0          R0        R0         R0         R0
R1       R1        R1          R1        R1         R1         R1
R2       R2        R2          R2        R2         R2         R2
R3       R3        R3          R3        R3         R3         R3
R4       R4        R4          R4        R4         R4         R4
R5       R5        R5          R5        R5         R5         R5
R6       R6        R6          R6        R6         R6         R6
R7       R7        R7          R7        R7         R7         R7
R8       R8        R8          R8        R8         R8         R8_fiq
R9       R9        R9          R9        R9         R9         R9_fiq
R10      R10       R10         R10       R10        R10        R10_fiq
R11      R11       R11         R11       R11        R11        R11_fiq
R12      R12       R12         R12       R12        R12        R12_fiq
R13      R13       R13_svc     R13_abt   R13_und    R13_irq    R13_fiq
R14      R14       R14_svc     R14_abt   R14_und    R14_irq    R14_fiq
PC       PC        PC          PC        PC         PC         PC
There are also one or two status registers depending on which mode the processor is in. The Current
Processor Status Register (CPSR) holds information about the current status of the processor
(including its current mode). In the exception modes there is an additional Saved Processor Status
Register (SPSR) which holds information on the processor's state before the system changed into
this mode, i.e., the processor status just before an exception.
The stack is typically used to store temporary values. It is normal to store the contents of any
registers a function is going to use on the stack on entry to a subroutine. This leaves the register
free for use during the function. The routine can then recover the register values from the stack
on exit from the subroutine. In this way the subroutine can preserve the value of the register and
not corrupt the value as would otherwise be the case.
• On entry to the subroutine store R14 to the stack with an instruction of the form:
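A commonly used form of such an instruction is sketched below together with the matching exit sequence. It assumes a full descending stack addressed through R13 and a routine that uses R4 and R5 as work registers; both choices are illustrative rather than taken from the text.

```asm
        STMFD   R13!, {R4, R5, R14}     ; entry: push the work registers and the return address
        ; ... body of the subroutine, free to use R4 and R5 ...
        LDMFD   R13!, {R4, R5, PC}      ; exit: restore the registers and return by loading the PC
```

Loading the saved R14 value straight into the PC restores the registers and returns in a single instruction.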
See Chapter ?? on page ?? for further details of using the stack, and Chapter 15 on page 113 for
further details on using subroutines.
When an exception occurs, the exception mode's version of R14 is set to the address after the
instruction which has just been completed. The SPSR is a copy of the CPSR just before the
exception occurred. The return from an exception is performed in a similar way to a subroutine
return, but using slightly different instructions to ensure full restoration of the state of the program
that was being executed when the exception occurred. See 3.4 on page 29 for more details.
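As a hedged illustration of these return instructions, the two forms below are those conventionally used on classic ARM cores; the handler bodies are omitted. In both, using the S suffix with the PC as destination also copies the SPSR back into the CPSR. The amount subtracted differs because the value placed in R14 depends on the type of exception.

```asm
        MOVS    PC, R14             ; return from a software interrupt handler
        SUBS    PC, R14, #4         ; return from an IRQ or FIQ handler
```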
When an instruction reads the PC the value returned is the address of the current instruction plus
8 bytes. This is the address of the instruction after the next instruction to be executed².
2 This is caused by the processor having already fetched the next instruction from memory while it was deciding
what the current instruction was. Thus the PC is still the next instruction to be executed, but that is not the
instruction immediately after the current one.
This way of reading the PC is primarily used for quick, position-independent addressing of nearby
instructions and data, including position-independent branching within a program.
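The following fragment is a small sketch of this behaviour, assuming the processor is executing 32-bit ARM instructions; the label name is illustrative.

```asm
        ADD     R0, PC, #0      ; the PC reads as the address of this instruction + 8
        MOV     R1, #1          ; located at the ADD's address + 4
there   MOV     R2, #2          ; located at the ADD's address + 8, so R0 = address of 'there'
```

Because the address is formed relative to the PC, the fragment gives the correct result wherever the code is loaded in memory.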
An exception to this rule occurs when an STR (Store Register) or STM (Store Multiple Registers)
instruction stores R15 . The value stored is UNKNOWN and it is best to avoid the use of these
instructions that store R15 .
When an instruction writes to R15 the normal result is that the value written is treated as an
instruction address and the system starts to execute the instruction at that address³.
The exception modes also have a saved processor status register (SPSR), that is used to preserve
the value of the CPSR when the associated exception occurs. Because the User and System modes
are not exception modes, there is no SPSR available.
Figure 3.4 shows the format of the CPSR and the SPSR registers.
31  30  29  28  27 ··· 8   7   6   5    4 ··· 0
N   Z   C   V   SBZ        I   F   SBZ  Mode
The processor's status is split into two distinct parts: the User flags and the System Control
flags. The upper halfword is accessible in User mode and contains a set of flags which can be
used to affect the operation of a program, see section 3.3. The lower halfword contains the System
Control information.
Any bit not currently used is reserved for future use and should be zero; these bits are marked SBZ in
the figure. The I and F bits control whether Interrupts (I) or Fast Interrupts (F) are allowed; setting
a bit disables the corresponding interrupt. The Mode
bits indicate which operating mode the processor is in (see 3.1 on page 23).
The system flags can only be altered when the processor is in a privileged mode. User mode programs
cannot alter the status register except for the condition code flags.
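A minimal sketch of altering the system control bits from a privileged mode follows. It uses the MRS and MSR instructions listed in the instruction table; the field suffix _c (control byte only) follows common ARM assembler syntax and may differ in other tools.

```asm
        MRS     R0, CPSR            ; copy the current status register into R0
        BIC     R0, R0, #0x80       ; clear bit 7, the I bit, to allow interrupts
        MSR     CPSR_c, R0          ; write back the control byte only
```

Executed in User mode, the MSR would leave the control bits unchanged.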
3.3 Flags
The upper four bits of the status register contain a set of four flags, collectively known as the
condition code. The condition code flags are:
Negative (N)
Zero (Z)
Carry (C)
Overflow (V)
3 As the processor has already fetched the instruction after the current instruction it is required to flush the
instruction pipeline and start again. This will cause a short, but not significant, delay.
The condition code can be used to control the flow of the program execution. The condition code is often
abbreviated to just ⟨cc⟩.
N The Negative (sign) flag takes on the value of the most significant bit of a result. Thus
when an operation produces a negative result the negative flag is set, and a positive
result causes the negative flag to be reset. This assumes the values are in standard
two's complement form. If the values are unsigned the negative flag can be ignored or
used to identify the value of the most significant bit of the result.
Z The Zero flag is set when an operation produces a zero result. It is reset when an
operation produces a non-zero result.
C The Carry flag holds the carry from the most significant bit produced by arithmetic
operations or shifts. After a subtraction the carry flag is the inverse of a borrow: it is set
when no borrow was needed and reset when the subtraction required a borrow.
V The Overflow flag is set when the result of a signed arithmetic operation is too large or
too small to be represented in a register.
Many instructions can modify the flags; these include comparison, arithmetic, logical and move
instructions. Most of these instructions have an S qualifier which tells the processor whether or not to set the
condition code flags.
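The effect of the S qualifier on the four flags can be sketched with a pair of subtractions. The register choices are arbitrary, and the flag values in the comments follow from the definitions of N, Z, C and V on the ARM.

```asm
        MOV     R0, #5
        SUBS    R1, R0, #5      ; R1 = 0:  Z set, N clear, C set (no borrow), V clear
        SUBS    R2, R0, #6      ; R2 = -1: N set, Z clear, C clear (borrow), V clear
        SUB     R3, R0, #6      ; R3 = -1, but without the S the flags are left unchanged
```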
3.4 Exceptions
Exceptions are generated by internal and external sources to cause the processor to handle an event,
such as an externally generated interrupt or an attempt to execute an undefined instruction. The
ARM supports seven types of exception, and provides a privileged processing mode for each
type. Table 3.2 lists the type of exception and the processor mode associated with it.
When an exception occurs, some of the standard registers are replaced with registers specific to the
exception mode. All exception modes have their own Stack Pointer (SP) and Link (LR) registers.
The fast interrupt mode has more registers (R8_fiq–R12_fiq) for fast interrupt processing.
Reset occurs when the Reset pin is held low; this is normally when the system is first turned on or when
the reset button is pressed.
Software Interrupt is generally used to allow user mode programs to call the operating system.
The user program executes a software interrupt (SWI, A.18 on page 135) instruction with an
argument which identifies the function the user wishes to perform.
Prefetch Abort occurs when the processor attempts to access memory that does not exist.
Data Abort occurs when attempting to access a word on a non-word aligned boundary. The
lower two bits of a memory address must be zero when accessing a word.
Interrupt occurs when an external device asserts the IRQ (interrupt) pin on the processor. This
can be used by external devices to request attention from the processor. An interrupt cannot
be interrupted, except by a fast interrupt.
Fast Interrupt occurs when an external device asserts the FIQ (fast interrupt) pin. This is
designed to support data transfer and has sufficient private registers to remove the need for
register saving in such applications. A fast interrupt cannot be interrupted.
When an exception occurs, the processor halts execution after the current instruction. The state
of the processor is preserved in the Saved Processor Status Register (SPSR) so that the original
program can be resumed when the exception routine has completed. The address of the instruction
the processor was just about to execute is placed into the Link Register of the appropriate processor
mode. The processor is now ready to begin execution of the exception handler.
The exception handlers are located at pre-defined locations known as exception vectors. It is the
responsibility of an operating system to provide suitable exception handling.
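As a sketch of what an operating system might place at the exception vectors, the table below uses the vector addresses of the classic ARM architecture, one word per exception starting at address 0. The handler labels are hypothetical and would have to be defined elsewhere in the program.

```asm
        ; exception vector table, assumed to be placed at address 0x00000000
        B       ResetHandler        ; 0x00  Reset
        B       UndefHandler        ; 0x04  Undefined instruction
        B       SWIHandler          ; 0x08  Software interrupt
        B       PrefetchHandler     ; 0x0C  Prefetch abort
        B       DataHandler         ; 0x10  Data abort
        MOV     R0, R0              ; 0x14  Reserved (no operation)
        B       IRQHandler          ; 0x18  Interrupt
        B       FIQHandler          ; 0x1C  Fast interrupt
```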
3.5 Instruction Set
Why are a microprocessor's instructions referred to as an instruction set? Because the microprocessor
designer selects the instruction complement with great care; it must be easy to execute
complex operations as a sequence of simple events, each of which is represented by one instruction
from a well-designed instruction set.
Assembly language often frightens users who are new to programming. Yet taken in isolation, the operations
involved in the execution of a single instruction are usually easy to follow. Furthermore, you need
not attempt to understand all the instructions at once. As you study each of the programs in
these notes you will learn about the specific instructions involved.
Table 4.1 lists the instruction mnemonics. This provides a survey of the processor's capabilities,
and will also be useful when you need a certain kind of operation but are either unsure of the
specific mnemonics or not yet familiar with what instructions are available.
See Chapter ?? and Appendix ?? for a detailed description of the individual instructions and
chapters 7 through to 15 for a discussion on how to use them.
The ARM instruction set can be divided into six broad classes of instruction.
Before we look at each of these groups in a little more detail there are a few ideas which belong
to all groups worthy of investigation.
Mnemonic  Meaning                         Mnemonic  Meaning
ADC       Add with Carry                  MVN       Logical NOT
ADD       Add                             ORR       Logical OR
AND       Logical AND                     RSB       Reverse Subtract
BAL       Unconditional Branch            RSC       Reverse Subtract with Carry
B⟨cc⟩     Branch on Condition             SBC       Subtract with Carry
BIC       Bit Clear                       SMLAL     Mult Accum Signed Long
BLAL      Unconditional Branch and Link   SMULL     Multiply Signed Long
BL⟨cc⟩    Conditional Branch and Link     STM       Store Multiple
CMP       Compare                         STR       Store Register (Word)
EOR       Exclusive OR                    STRB      Store Register (Byte)
LDM       Load Multiple                   SUB       Subtract
LDR       Load Register (Word)            SWI       Software Interrupt
LDRB      Load Register (Byte)            SWP       Swap Word Value
MLA       Multiply Accumulate             SWPB      Swap Byte Value
MOV       Move                            TEQ       Test Equivalence
MRS       Load SPSR or CPSR               TST       Test
MSR       Store to SPSR or CPSR           UMLAL     Mult Accum Unsigned Long
MUL       Multiply                        UMULL     Multiply Unsigned Long
Mnemonic  Condition                   Mnemonic  Condition
CS        Carry Set                   CC        Carry Clear
EQ        Equal (Zero Set)            NE        Not Equal (Zero Clear)
VS        Overflow Set                VC        Overflow Clear
GT        Greater Than                LT        Less Than
GE        Greater Than or Equal       LE        Less Than or Equal
PL        Plus (Positive)             MI        Minus (Negative)
HI        Higher Than                 LO        Lower Than (aka CC)
HS        Higher or Same (aka CS)     LS        Lower or Same
Table 4.2 on page 42 shows a list of the condition codes and their mnemonics. To indicate that an
instruction is conditional we simply place the mnemonic for the condition code after the mnemonic
for the instruction. If no condition code mnemonic is used the instruction will always be executed.
For example the following instruction will move the value of the register R1 into the R0 register
only when the Carry flag has been set; R0 will remain unaffected if the C flag was clear.
MOVCS R0, R1
Note that the Greater and the Less conditions are for use with signed numbers while the Higher
and Lower conditions are for use with unsigned numbers. These condition codes only really make
sense after a comparison (CMP) instruction, see A.5 on page 129.
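The difference between the signed and unsigned conditions can be sketched as follows. Each pair of instructions leaves the larger of two values in the first register without using a branch; the register choices are arbitrary.

```asm
        CMP     R0, R1          ; compare R0 with R1, setting the flags
        MOVLT   R0, R1          ; signed: if R0 < R1 then R0 = R1, so R0 = max(R0, R1)

        CMP     R2, R3          ; compare R2 with R3
        MOVLO   R2, R3          ; unsigned: if R2 is lower than R3 then R2 = R3
```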
Most data-processing instructions can also update the condition codes according to their result.
Placing an S after the mnemonic will cause the flags to be updated. For example there are two
versions of the MOV instruction:
MOV R0, #0 Will move the value 0 into the register R0 without setting the flags.
MOVS R0, #0 Will do the same, move the value 0 into the register R0, but it will also set
the condition code flags accordingly: the Zero flag will be set, the Negative flag
will be reset and the Carry and oVerflow flags will not be affected.
If an instruction has this ability we denote it using ⟨S⟩ in our description of the instruction. The
⟨S⟩ always comes after the ⟨cc⟩ (conditional execution) modification if it is given. Thus the full
description of the move instruction would be:
MOV⟨cc⟩⟨S⟩ Rd, ⟨op1⟩
With all this in mind what does the following code fragment do?
MOVS R0, R1
MOVEQS R0, R2
MOVEQ R0, R3
The first instruction will move R1 into R0 unconditionally, but it will also set the N and Z flags
accordingly. Thus the second instruction is only executed if the Z flag is set, i.e., the value of R1
was zero. If the value of R1 was not zero the instruction is skipped. If the second instruction is
executed it will copy the value of R2 into R0 and it will also set the N and Z flags according to
the value of R2. Thus the third instruction is only executed if R1 and R2 are both zero.
Unmodified Value
You can use a value or a register unmodified by simply giving the value or the register name. For
example the following instructions will demonstrate the two methods:
MOV R0, #1234 Will move the immediate constant value 1234 (decimal) into the register R0
MOV R0, R1 Will move the value in the register R1 into the register R0
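One caveat, stated here as a property of the ARM instruction encoding rather than of the emulator used in these notes: an immediate operand is stored as an 8-bit value rotated right by an even number of bits, so not every constant fits. The value 1234 is one that does not, and an assembler may reject it. The fragment below sketches the usual alternatives; the LDR Rd, =value form is a pseudo-instruction offered by common ARM assemblers, and its availability is an assumption about the tool in use.

```asm
        MOV     R0, #255            ; fits: an 8-bit value with no rotation
        MOV     R0, #0x3FC          ; fits: 0xFF rotated right by 30 bits (0xFF shifted left by 2)
        LDR     R0, =1234           ; pseudo-instruction: the assembler finds a way to load 1234
```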
Logical Shift Left
This will take the value of a register and shift the value up, towards the most significant bit, by n
bits. The number of bits to shift is specified by either a constant value or another register. The
lower bits of the value are replaced with zeros. This is a simple way of performing a multiply by
a power of 2 (× 2^n).
MOV R0, R1, LSL #2 R0 will become the value of R1 shifted left by 2 bits. The value of R1
is not changed.
MOV R0, R1, LSL R2 R0 will become the value of R1 shifted left by the number of bits
specified in the R2 register. R0 is the only register to change; both R1
and R2 are not affected by this operation.
If the instruction is to set the status register, the carry flag (C) is the last bit that was shifted out
of the value.
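Because the shift is applied to an operand of another instruction, a left shift can be combined with an addition or subtraction to multiply by constants that are not themselves powers of two. A brief sketch, with arbitrary registers:

```asm
        MOV     R0, R1, LSL #3          ; R0 = R1 * 8
        ADD     R0, R1, R1, LSL #2      ; R0 = R1 + R1 * 4  = R1 * 5
        RSB     R0, R1, R1, LSL #4      ; R0 = R1 * 16 - R1 = R1 * 15
```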
Logical Shift Right
Logical Shift Right is very similar to Logical Shift Left except it will shift the value to the right,
towards the least significant bit, by n bits. It will replace the upper bits with zeros, thus providing
an efficient unsigned divide by 2^n function (÷ 2^n). The number of bits to shift may be specified
by either a constant value or another register.
MOV R0, R1, LSR #2 R0 will take on the value of R1 shifted to the right by 2 bits. The
value of R1 is not changed.
MOV R0, R1, LSR R2 As before R0 will become the value of R1 shifted to the right by the
number of bits specified in the R2 register. R1 and R2 are not altered
by this operation.
If the instruction is to set the status register, the carry flag (C) is the last bit to be shifted out of
the value.
Arithmetic Shift Right
The Arithmetic Shift Right is rather similar to the Logical Shift Right, but rather than replacing
the upper bits with a zero, it maintains the value of the most significant bit. As the most significant
bit is used to hold the sign, this means the sign of the value is maintained, thus providing a signed
divide by 2^n operation (÷ 2^n), with the result rounded towards minus infinity.
MOV R0, R1, ASR #2 Register R0 will become the value of register R1 shifted to the right
by 2 bits, with the sign maintained.
MOV R0, R1, ASR R2 Register R0 will become the value of the register R1 shifted to the
right by the number of bits specified by the R2 register. R1 and R2
are not altered by this operation.
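The difference between the two right shifts only shows when the top bit of the value is set. The sketch below applies both to the value -8; the registers are arbitrary.

```asm
        MVN     R1, #7              ; R1 = NOT 7 = 0xFFFFFFF8, which is -8
        MOV     R2, R1, LSR #2      ; R2 = 0x3FFFFFFE: zeros enter at the top, the sign is lost
        MOV     R3, R1, ASR #2      ; R3 = 0xFFFFFFFE = -2: the sign bit is copied in
```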
Given the distinction between the Logical and Arithmetic Shift Right, why is there no Arithmetic
Shift Left operation?
As a signed number is stored in two's complement the uppermost bits hold the sign of the number.
These bits can be considered insignificant unless the number is of a sufficient size to require their
use. Thus an Arithmetic Shift Left is not required as the sign is automatically preserved by the
Logical Shift.
Rotate Right
In the Rotate Right operation, each bit shifted out of the least significant end of the value is
copied into the most significant end. In this way none of the bits
in the value are lost, but are simply moved from the lower bits to the upper bits of the value.
If the instruction is to set the status register, the Carry (C) flag is set to the last bit rotated out.
MOV R0, R1, ROR #2 This will rotate the value of R1 right by two bits. The most significant bit
of the resulting value will be the same as the second least significant bit of
the original value. The second most significant bit will be the same
as the least significant bit of the original value. In the S version the Carry flag will be set to the
second least significant bit of the original value. The value of R1 is
not changed by this operation.
MOV R0, R1, ROR R2 Register R0 will become the value of the register R1 rotated to the
right by the number of bits specified by the R2 register. R1 and R2
are not altered by this operation.
An Add With Carry (ADC, A.1 on page 127) to a zero value provides this service for a single bit.
The designers of the instruction set believe that a Rotate Left by more than one bit would never
be required, thus they have not provided a ROL function.
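A rotate left can also be obtained from the Rotate Right operator, since rotating a 32-bit value left by n bits gives the same result as rotating it right by 32 - n bits. A minimal sketch, with an arbitrary choice of registers:

```asm
        MOV     R0, R1, ROR #28     ; rotate R1 left by 4 bits (32 - 4 = 28)
        MOV     R0, R1, ROR #31     ; rotate R1 left by 1 bit  (32 - 1 = 31)
```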
Rotate Right Extended
This is similar to a Rotate Right by one bit. The extended part refers to the fact that this function
moves the value of the Carry (C) flag into the most significant bit of the value, and the least
significant bit of the value into the Carry (C) flag. Thus it allows the Carry flag to be propagated
through multi-word values, thereby allowing values larger than 32 bits to be used in calculations.
MOV R0, R1, RRX The register R0 becomes the same as the value of the register R1 rotated
through the carry flag by one bit. The most significant bit of the value
becomes the same as the current Carry flag, while in the S version the Carry flag will be the
same as the least significant bit of R1. The value of R1 will not be changed.
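As a sketch of carry propagation through a multi-word value, the two instructions below shift a 64-bit quantity right by one bit. The pairing of R1 (upper word) with R0 (lower word) is an assumption made for the example.

```asm
        MOVS    R1, R1, LSR #1      ; shift the upper word; its old bit 0 drops into the Carry flag
        MOV     R0, R0, RRX         ; shift the lower word; the Carry flag enters at bit 31
```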
Constant Value
An immediate constant value can be provided. If no offset is specified an immediate constant
value of zero is assumed.
Register
The offset can be specified by another register. The value of the register is added to the
address held in another register to form the final address.
Scaled
The offset is specified by another register which can be scaled by one of the shift operators
used for ⟨op1⟩. More specifically by the Logical Shift Left (LSL), Logical Shift Right (LSR),
Arithmetic Shift Right (ASR), ROtate Right (ROR) or Rotate Right Extended (RRX) shift
operators, where the number of bits to shift is specified as a constant value.
Offset Addressing
In offset addressing the memory address is formed by adding (or subtracting) an offset to or from
the value held in a base register.
LDR R0, [R1] Will load the register R0 with the 32-bit word at the memory
address held in the register R1 . In this instruction there is no
offset specified, so an offset of zero is assumed. The value of
R1 is not changed in this instruction.
LDR R0, [R1, #4] Will load the register R0 with the word at the memory ad-
dress calculated by adding the constant value 4 to the memory
address contained in the R1 register. The register R1 is not
changed by this instruction.
LDR R0, [R1, R2] Loads the register R0 with the value at the memory address
calculated by adding the value in the register R1 to the value
held in the register R2 . Both R1 and R2 are not altered by
this operation.
LDR R0, [R1, R2, LSL #2] Will load the register R0 with the 32-bit value at the memory
address calculated by adding the value in the R1 register to the
value obtained by shifting the value in R2 left by 2 bits. Both
registers, R1 and R2, are not affected by this operation.
This is particularly useful for indexing into a complex data structure. The start of the data
structure is held in a base register, R1 in this case, and the offset to access a particular field within
the structure is then added to the base address. Placing the offset in a register allows it to be
calculated at run time rather than fixed. This allows for looping through a table.
A scaled value can also be used to access a particular item of a table, where the size of the item
is a power of two. For example, to locate item 7 in a table of 32-bit values we need only shift the
index value 6 left by 2 bits (6 × 2^2) to calculate the value we need to add as an offset to the start
of the table held in a register, R1 in our example. Remember that computers count from zero,
thus we use an index value of 6 rather than 7. A 32-bit number requires 4 bytes of storage which
is 2^2, thus we only need a 2-bit left shift.
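The table lookup just described can be sketched in two instructions. R1 is assumed to hold the address of the start of the table, and R2 is used for the index.

```asm
        MOV     R2, #6                  ; index of item 7, counting from zero
        LDR     R0, [R1, R2, LSL #2]    ; R0 = word at address R1 + (6 * 4)
```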
Pre-Index Addressing
In pre-index addressing the memory address is formed in the same way as for offset addressing.
The address is not only used to access memory, but the base register is also modified to hold
the new value. In the ARM system this is known as a write-back and is denoted by placing an
exclamation mark at the end of the ⟨op2⟩ code.
Pre-index addressing can be particularly useful in a loop as it can be used to automatically increment
or decrement a counter or memory pointer.
LDR R0, [R1, #4]! Will load the register R0 with the word at the memory address
calculated by adding the constant value 4 to the memory ad-
dress contained in the R1 register. The new memory address
is placed back into the base register, register R1 .
LDR R0, [R1, R2]! Loads the register R0 with the value at the memory address
calculated by adding the value in the register R1 to the value
held in the register R2. The offset register, R2, is not altered
by this operation; the register holding the base address, R1,
is modified to hold the new address.
LDR R0, [R1, R2, LSL #2]! First calculates the new address by adding the value in the
base address register, R1, to the value obtained by shifting
the value in the offset register, R2, left by 2 bits. It will then
load the 32-bit value at this address into the destination register, R0.
The new address is also written back into the base register, R1.
The offset register, R2, will not be affected by this operation.
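The effect of the write-back can be followed through two consecutive loads. The starting address 0x1000 is an arbitrary value assumed for the example.

```asm
        ; assume R1 = 0x1000 on entry
        LDR     R0, [R1, #4]!       ; R0 = word at 0x1004; R1 becomes 0x1004
        LDR     R2, [R1, #4]!       ; R2 = word at 0x1008; R1 becomes 0x1008
```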
Post-Index Addressing
In post-index addressing the memory address is the base register value. As a side-effect, an offset
is added to or subtracted from the base register value and the result is written back to the base
register.
Post-index addressing uses the value of the base register without modification. It then applies the
modification to the address and writes the new address back into the base register. This can be
used to automatically increment or decrement a memory pointer after it has been used, so it is
pointing to the next location to be used.
As the instruction must perform a write-back we do not need to include an exclamation mark.
Rather we move the closing bracket to include only the base register, as that is the register holding
the memory address we are going to access.
LDR R0, [R1], #4 Will load the register R0 with the word at the memory address
contained in the base register, R1 . It will then calculate the
new value of R1 by adding the constant value 4 to the current
value of R1 .
LDR R0, [R1], R2 Loads the register R0 with the value at the memory address
held in the base register, R1 . It will then calculate the new
value for the base register by adding the value in the offset
register, R2, to the current value of the base register. The
offset register, R2, is not altered by this operation.
LDR R0, [R1], R2, LSL #2 First loads the 32-bit value at the memory address contained in
the base register, R1, into the destination register, R0. It will
then calculate the new value for the base register by adding the
current value to the value obtained by shifting the value in the
offset register, R2, left by 2 bits. The offset register, R2, will
not be affected by this operation.
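Post-index addressing suits loops that walk through a table. The sketch below adds up four consecutive words; it assumes R1 holds the address of the first word, and the label and the count of four are illustrative.

```asm
        MOV     R0, #0              ; running total
        MOV     R2, #4              ; number of words to add
loop    LDR     R3, [R1], #4        ; fetch the next word, then advance R1 by 4
        ADD     R0, R0, R3          ; add it to the total
        SUBS    R2, R2, #1          ; one fewer word to go; sets Z when the count reaches zero
        BNE     loop                ; repeat until all four words have been added
```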
4 Instruction Set
Why are a microprocessor's instructions referred to as an instruction set? Because the micropro-
cessor designer selects the instruction complement with great care; it must be easy to execute
complex operations as a sequence of simple events, each of which is represented by one instruction
from a well-designed instruction set.
Assembly language often frightens users who are new to programming. Yet taken in isolation, the operations
involved in the execution of a single instruction are usually easy to follow. Furthermore, you need
not attempt to understand all the instructions at once. As you study each of the programs in
these notes you will learn about the specific instructions involved.
Table 4.1 lists the instruction mnemonics. This provides a survey of the processor's capabilities,
and will also be useful when you need a certain kind of operation but are either unsure of the
specific mnemonics or not yet familiar with what instructions are available.
Appendix A gives a detailed description of the individual instructions while chapters 7 through
to 15 provide a discussion on how to use them.
The ARM instruction set can be divided into six broad classes of instruction.
Before we look at each of these groups in a little more detail there are a few ideas which belong
to all groups worthy of investigation.
Mnemonic  Meaning                         Mnemonic  Meaning
ADC       Add with Carry                  MVN       Logical NOT
ADD       Add                             ORR       Logical OR
AND       Logical AND                     RSB       Reverse Subtract
BAL       Unconditional Branch            RSC       Reverse Subtract with Carry
B⟨cc⟩     Branch on Condition             SBC       Subtract with Carry
BIC       Bit Clear                       SMLAL     Mult Accum Signed Long
BLAL      Unconditional Branch and Link   SMULL     Multiply Signed Long
BL⟨cc⟩    Conditional Branch and Link     STM       Store Multiple
CMP       Compare                         STR       Store Register (Word)
EOR       Exclusive OR                    STRB      Store Register (Byte)
LDM       Load Multiple                   SUB       Subtract
LDR       Load Register (Word)            SWI       Software Interrupt
LDRB      Load Register (Byte)            SWP       Swap Word Value
MLA       Multiply Accumulate             SWPB      Swap Byte Value
MOV       Move                            TEQ       Test Equivalence
MRS       Load SPSR or CPSR               TST       Test
MSR       Store to SPSR or CPSR           UMLAL     Mult Accum Unsigned Long
MUL       Multiply                        UMULL     Multiply Unsigned Long
There is a Branch and Link (BL) option that also preserves the address of the instruction after
the branch in R14, the LR. This provides a subroutine call which can be returned from by copying
the LR into the PC.
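A minimal sketch of a subroutine call and return using Branch and Link follows. The labels and the doubling routine are illustrative only.

```asm
        MOV     R0, #21
        BL      double          ; R14 = address of the next instruction, then branch
stop    B       stop            ; execution resumes here with R0 = 42

double  ADD     R0, R0, R0      ; R0 = R0 * 2
        MOV     PC, R14         ; return by copying the link register into the PC
```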
• Arithmetic/logic instructions
• Comparison instructions
• Multiply instructions
Arithmetic/logic instructions
There are twelve arithmetic/logic instructions which share a common instruction format. These
perform an arithmetic or logical operation on up to two source operands, and write the result to a
destination register. They can also optionally update the condition code flags based on the result.
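Of the two source operands, one is always a register. The other is either an immediate value or a
register value, optionally shifted.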
If the operand is a shifted register, the shift amount can be either an immediate value or the
value of another register. Four types of shift can be specified. Every arithmetic/logic instruction
can therefore perform an arithmetic/logic and a shift operation. As a result, ARM does not have
dedicated shift instructions.
Comparison instructions
There are four comparison instructions which use the same instruction format as the arith-
metic/logic instructions. These perform an arithmetic or logical operation on two source operands,
but do not write the result to a register. They always update the condition flags based on the
result.
The source operands of comparison instructions take the same forms as those of arithmetic/logic
instructions, including the ability to incorporate a shift operation.
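Two of the comparison instructions are sketched below, each followed by a conditional instruction that acts on the flags it has just set. The registers and constants are arbitrary.

```asm
        CMP     R0, #10         ; flags from R0 - 10; the result itself is discarded
        MOVEQ   R1, #1          ; R1 = 1 only if R0 was equal to 10
        TST     R0, #1          ; flags from R0 AND 1; R0 is unchanged
        ADDEQ   R2, R2, #1      ; Z set means bit 0 was clear, so count an even value
```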
Multiply instructions
Multiply instructions come in two classes. Both types multiply two 32-bit register values and store
their result:
32-bit result Normal. Stores the 32-bit result in a register.
64-bit result Long. Stores the 64-bit result in two separate registers.
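The two classes can be sketched with the MUL and UMULL instructions from the mnemonic table. The register choices are arbitrary, apart from keeping the destination registers distinct from the first source register, which older ARM cores require.

```asm
        MUL     R0, R1, R2          ; R0 = lower 32 bits of R1 * R2
        UMULL   R3, R4, R1, R2      ; unsigned 64-bit product: R3 = lower word, R4 = upper word
```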
The Count Leading Zeros (CLZ) instruction determines the number of zero bits at the most
significant end of a register value, up to the first 1 bit. This number is written to the destination
register of the CLZ instruction.
Load Register instructions can load a 32-bit word, a 16-bit halfword or an 8-bit byte from memory
into a register. Byte and halfword loads can be automatically zero-extended or sign-extended as
they are loaded.
Store Register instructions can store a 32-bit word, a 16-bit halfword or an 8-bit byte from a
register to memory.
Load and Store Register instructions have three primary addressing modes, all of which use a base
register and an offset specified by the instruction:
• In offset addressing, the memory address is formed by adding or subtracting an offset to or
from the base register value.
• In pre-indexed addressing, the memory address is formed in the same way as for offset
addressing. As a side-effect, the memory address is also written back to the base register.
• In post-indexed addressing, the memory address is the base register value. As a side-effect,
an offset is added to or subtracted from the base register value and the result is written back
to the base register.
In each case, the offset can be either an immediate or the value of an index register. Register-based
offsets can also be scaled with shift operations.
As the PC is a general-purpose register, a 32-bit value can be loaded directly into the PC to
perform a jump to any address in the 4GB memory space.
Load Multiple (LDM) and Store Multiple (STM) instructions perform a block transfer of any
number of the general-purpose registers to or from memory. Four addressing modes are provided:
• pre-increment
• post-increment
• pre-decrement
• post-decrement
The base address is specified by a register value, which can be optionally updated after the
transfer. As the subroutine return address and PC values are in general-purpose registers, very
efficient subroutine entry and exit sequences can be constructed with LDM and STM:
• A single STM instruction at subroutine entry can push register contents and the return address
onto the stack, updating the stack pointer in the process.
• A single LDM instruction at subroutine exit can restore register contents from the stack, load
the PC with the return address, and update the stack pointer.
LDM and STM instructions also allow very efficient code for block copies and similar data movement
algorithms.
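The entry and exit sequences described above can be modelled in Python. This is only a sketch: it assumes a stack that grows downwards (a pre-decrement store paired with a post-increment load), represents memory as a dictionary of word addresses, and uses function names and addresses of our own invention.

```python
def stm_push(memory, sp, values):
    """Pre-decrement store: the first value in the list ends at the lowest address."""
    for value in reversed(values):
        sp -= 4                      # move the pointer before each store
        memory[sp] = value
    return sp                        # updated stack pointer (write-back)

def ldm_pop(memory, sp, count):
    """Post-increment load: read `count` words, then return the updated pointer."""
    values = []
    for _ in range(count):
        values.append(memory[sp])
        sp += 4                      # move the pointer after each load
    return values, sp

memory = {}
sp = 0x8000                                        # hypothetical initial stack pointer
sp = stm_push(memory, sp, [0x11, 0x22, 0x9000])    # two registers plus a return address
restored, sp = ldm_pop(memory, sp, 3)              # the last word would be loaded into the PC
print(restored, hex(sp))
```

After the pop, the stack pointer is back at its original value and the saved values are returned in their original order.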
By specifying the same register for steps 2 and 3, the contents of a memory location and a register
are interchanged.
The swap operation performs a special indivisible bus operation that allows atomic update of
semaphores. Both 32-bit word and 8-bit byte semaphores are supported.
Data transfer instructions These transfer coprocessor data to or from memory. The address
of the transfer is calculated by the ARM processor.
Software interrupt instructions SWI instructions cause a software interrupt exception to
occur. These are normally used to make calls to an operating system, to request an OS-defined
service. The exception entry caused by a SWI instruction also changes to a privileged
processor mode. This allows an unprivileged task to gain access to privileged functions, but only
in ways permitted by the OS.
Mnemonic  Condition                  Mnemonic  Condition
CS        Carry Set                  CC        Carry Clear
EQ        Equal (Zero Set)           NE        Not Equal (Zero Clear)
VS        Overflow Set               VC        Overflow Clear
GT        Greater Than               LT        Less Than
GE        Greater Than or Equal      LE        Less Than or Equal
PL        Plus (Positive)            MI        Minus (Negative)
HI        Higher Than                LO        Lower Than (aka CC)
HS        Higher or Same (aka CS)    LS        Lower or Same
In addition to the above, the following types of instruction cause an Undefined Instruction
exception to occur:
• most instruction words that have not yet been allocated a meaning as an ARM instruction.
In each case, this exception is normally used either to generate a suitable error or to initiate
software emulation of the instruction.
Table 4.2 shows a list of the condition codes and their mnemonics. To indicate that an instruction
is conditional we simply place the mnemonic for the condition code after the mnemonic for the
instruction. If no condition code mnemonic is used the instruction will always be executed.
For example, the following instruction will move the value of the register R1 into the R0 register
only when the Carry flag is set; R0 will remain unaffected if the C flag is clear.
MOVCS R0, R1
Note that the Greater and the Less conditions are for use with signed numbers while the Higher
and Lower conditions are for use with unsigned numbers. These condition codes only really make
sense after a comparison (CMP) instruction, see A.5 on page 129.
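To make the difference between the signed and unsigned conditions concrete, the following Python sketch models the flags produced by a comparison and the test each condition mnemonic applies to them. The function names are ours; the flag tests follow the standard ARM definitions (for example, HI is "C set and Z clear", GT is "Z clear and N equal to V").

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(a, b):
    """Flags (N, Z, C, V) as set by a comparison that computes a - b."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = bool(result >> 31)                         # sign bit of the result
    z = result == 0
    c = a >= b                                     # carry set means no borrow occurred
    v = bool(((a ^ b) & (a ^ result)) >> 31)       # signed overflow
    return n, z, c, v

def condition_passed(cond, n, z, c, v):
    """Return True if the condition mnemonic holds for the given flags."""
    tests = {
        "EQ": z, "NE": not z,
        "CS": c, "HS": c, "CC": not c, "LO": not c,
        "MI": n, "PL": not n,
        "VS": v, "VC": not v,
        "HI": c and not z, "LS": (not c) or z,
        "GE": n == v, "LT": n != v,
        "GT": (not z) and n == v, "LE": z or n != v,
    }
    return bool(tests[cond])

flags = cmp_flags(0x87654321, 0x12345678)
print(condition_passed("HI", *flags), condition_passed("GT", *flags))
```

Here 0x87654321 is the higher value when treated as unsigned, but it is negative when treated as signed, so HI passes while GT fails.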
Most data-processing instructions can also update the condition codes according to their result.
Placing an S after the mnemonic will cause the flags to be updated. For example there are two
versions of the MOV instruction:
MOV R0, #0 Will move the value 0 into the register R0 without setting the flags.
MOVS R0, #0 Will do the same, move the value 0 into the register R0, but it will also set
the condition code flags accordingly: the Zero flag will be set, the Negative flag
will be reset and the Carry and oVerflow flags will not be affected.
If an instruction has this ability we denote it using ⟨S⟩ in our description of the instruction. The
⟨S⟩ always comes after the ⟨cc⟩ (conditional execution) modification if it is given. Thus the full
description of the move instruction would be:
MOV⟨cc⟩⟨S⟩ Rd, ⟨op1⟩
With all this in mind what does the following code fragment do?
MOVS R0, R1
MOVEQS R0, R2
MOVEQ R0, R3
The first instruction will move R1 into R0 unconditionally, but it will also set the N and Z flags
accordingly. Thus the second instruction is only executed if the Z flag is set, i.e., the value of R1
was zero. If the value of R1 was not zero the instruction is skipped. If the second instruction is
executed it will copy the value of R2 into R0 and it will also set the N and Z flags according to
the value of R2. Thus the third instruction is only executed if R1 and R2 are both zero.
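The same reasoning can be traced in Python. The sketch below models only the Z flag, which is all the three instructions depend on; the function name run_fragment is our own.

```python
def run_fragment(r1, r2, r3):
    """Trace the three-instruction fragment and return the final value of R0."""
    r0 = r1                 # MOVS R0, R1    always executed, Z reflects R1
    z = (r0 == 0)
    if z:                   # MOVEQS R0, R2  executed only if Z is set
        r0 = r2
        z = (r0 == 0)       # the S suffix updates Z again, now from R2
    if z:                   # MOVEQ R0, R3   executed only if Z is still set
        r0 = r3
    return r0

print(run_fragment(0, 7, 9))
```

In other words, R0 receives R1 if it is non-zero, otherwise R2 if that is non-zero, otherwise R3.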
5 Addressing Modes
The majority of the instructions relate to data processing of some form. One of the operands
to these instructions is routed through the Barrel Shifter. This means that the operand can be
modified before it is used. This can be very useful when dealing with lists, tables and other
complex data structures. We denote instructions of this type as taking one of their arguments from
⟨op1⟩.
An ⟨op1⟩ argument may come from one of two sources, a constant value or a register, and be
modified in five different ways. See Chapter ?? for more detailed information.
MOV R0, #1234 Will move the immediate constant value 1234 (decimal) into the register R0
MOV R0, R1 Will move the value in the register R1 into the register R0
This will take the value of a register and shift the value up, towards the most significant bit, by n
bits. The number of bits to shift is specified by either a constant value or another register. The
lower bits of the value are replaced with zeros. This is a simple way of performing a multiply by
a power of 2 (×2^n).
MOV R0, R1, LSL #2 R0 will become the value of R1 shifted left by 2 bits. The value of R1
is not changed.
MOV R0, R1, LSL R2 R0 will become the value of R1 shifted left by the number of bits
specified in the R2 register. R0 is the only register to change; R1
and R2 are not affected by this operation.
If the instruction is to set the status register, the carry flag (C) is the last bit that was shifted out
of the value.
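A small Python model may help to show both the multiply-by-a-power-of-two effect and where the carry comes from. The function name lsl is ours, and the sketch assumes a shift count between 1 and 31 on a 32-bit value.

```python
MASK32 = 0xFFFFFFFF

def lsl(value, n):
    """Logical shift left by n bits (1 <= n <= 31). Returns (result, carry_out)."""
    value &= MASK32
    carry = (value >> (32 - n)) & 1      # the last bit shifted out of the top
    return (value << n) & MASK32, carry  # low bits are filled with zeros

print(lsl(5, 2))
```

Shifting 5 left by 2 bits gives 20, that is 5 × 2^2, and no 1 bit is lost so the carry out is 0.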
Logical Shift Right is very similar to Logical Shift Left except it will shift the value to the right,
towards the least significant bit, by n bits. It will replace the upper bits with zeros, thus providing
an efficient unsigned divide by 2^n function (|÷2^n|). The number of bits to shift may be specified
by either a constant value or another register.
MOV R0, R1, LSR #2 R0 will take on the value of R1 shifted to the right by 2 bits. The
value of R1 is not changed.
MOV R0, R1, LSR R2 As before R0 will become the value of R1 shifted to the right by the
number of bits specified in the R2 register. R1 and R2 are not altered
by this operation.
If the instruction is to set the status register, the carry flag (C) is the last bit to be shifted out of
the value.
The Arithmetic Shift Right is rather similar to the Logical Shift Right, but rather than replacing
the upper bits with a zero, it maintains the value of the most significant bit. As the most significant
bit is used to hold the sign, this means the sign of the value is maintained, thus providing a signed
divide by 2^n operation (÷2^n).
MOV R0, R1, ASR #2 Register R0 will become the value of register R1 shifted to the right
by 2 bits, with the sign maintained.
MOV R0, R1, ASR R2 Register R0 will become the value of the register R1 shifted to the
right by the number of bits specified by the R2 register. R1 and R2
are not altered by this operation.
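The difference between the two right shifts is easiest to see on a negative number. The Python sketch below models both on 32-bit values; the function names are ours and a shift count between 1 and 31 is assumed.

```python
MASK32 = 0xFFFFFFFF

def lsr(value, n):
    """Logical shift right: zeros enter at the top. Returns (result, carry_out)."""
    value &= MASK32
    carry = (value >> (n - 1)) & 1       # the last bit shifted out of the bottom
    return value >> n, carry

def asr(value, n):
    """Arithmetic shift right: copies of the sign bit enter at the top."""
    value &= MASK32
    carry = (value >> (n - 1)) & 1
    if value & 0x80000000:
        value -= 1 << 32                 # reinterpret the bit pattern as a negative number
    return (value >> n) & MASK32, carry  # Python's >> keeps the sign of a negative int

minus_eight = 0xFFFFFFF8                 # -8 in two's complement
print(hex(lsr(minus_eight, 2)[0]), hex(asr(minus_eight, 2)[0]))
```

The arithmetic shift turns -8 into -2 (0xFFFFFFFE), whereas the logical shift produces a large positive number. Note that an arithmetic shift rounds towards minus infinity, so -7 shifted right by one bit gives -4 rather than -3.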
Given the distinction between the Logical and Arithmetic Shift Right, why is there no Arithmetic
Shift Left operation?
As a signed number is stored in two's complement the uppermost bits hold the sign of the number.
These bits can be considered insignificant unless the number is of a sufficient size to require their
use. Thus an Arithmetic Shift Left is not required as the sign is automatically preserved by the
Logical Shift.
In the Rotate Right operation, each bit shifted out of the least significant end of the value is
copied back into the most significant end. In this way none of the bits in the value are lost, but
are simply moved from the lower bits to the upper bits of the value. If the instruction is to set
the status register, the carry flag (C) is the last bit that was rotated.
MOV R0, R1, ROR #2 This will rotate the value of R1 right by two bits. The most significant bit
of the resulting value will be the same as the second least significant
bit of the original value. The second most significant bit will be the
same as the least significant bit of the original value. In the S version
the Carry flag will be set to the second least significant bit of the
original value. The value of R1 is not changed by this operation.
MOV R0, R1, ROR R2 Register R0 will become the value of the register R1 rotated to the
right by the number of bits specified by the R2 register. R1 and R2
are not altered by this operation.
An Add With Carry (ADC, A.1 on page 127) to a zero value provides this service for a single bit.
The designers of the instruction set believe that a Rotate Left by more than one bit would never
be required, thus they have not provided a ROL function.
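The rotate can also be modelled in Python. The sketch below is our own illustration (function names included) for rotate counts between 1 and 31. It also shows that, on a 32-bit value, a rotate left by n bits produces the same result as a rotate right by 32 - n bits.

```python
MASK32 = 0xFFFFFFFF

def ror(value, n):
    """Rotate right by n bits (1 <= n <= 31). Returns (result, carry_out)."""
    value &= MASK32
    result = ((value >> n) | (value << (32 - n))) & MASK32
    carry = result >> 31                 # the last bit rotated ends up in bit 31
    return result, carry

def rol(value, n):
    """Rotate left by n bits, expressed as a rotate right by 32 - n bits."""
    return ror(value, 32 - n)[0]

print(hex(ror(0x12345678, 8)[0]), hex(rol(0x80000001, 1)))
```

The low byte 0x78 reappears at the top of the word, and the left rotate moves the top bit round into bit 0.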
This is similar to a Rotate Right by one bit. The extended part refers to the fact that this function
moves the value of the Carry (C) flag into the most significant bit of the value, and the least
significant bit of the value into the Carry (C) flag. Thus it allows the Carry flag to be propagated
through multi-word values, thereby allowing values larger than 32 bits to be used in calculations.
MOV R0, R1, RRX The register R0 becomes the same as the value of the register R1 rotated
through the carry flag by one bit. The most significant bit of the value
becomes the same as the current Carry flag while, in the S version, the Carry
flag becomes the same as the least significant bit of R1. The value of R1 will
not be changed.
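The multi-word use can be sketched in Python. The model below shifts a 64-bit value, held as two 32-bit words, right by one bit: the high word is shifted first and its lost bit is carried into the low word by the extended rotate. The function names are ours.

```python
MASK32 = 0xFFFFFFFF

def rrx(value, carry_in):
    """Rotate right extended by one bit. Returns (result, carry_out)."""
    value &= MASK32
    result = (carry_in << 31) | (value >> 1)   # the old carry enters at bit 31
    return result, value & 1                   # the old bit 0 becomes the new carry

def shift_right_64(high, low):
    """Shift the 64-bit value high:low right by one bit."""
    carry = high & 1                 # bit that a flag-setting LSR by 1 would put in C
    high = (high & MASK32) >> 1
    low, _ = rrx(low, carry)         # the carry propagates into the low word
    return high, low

high, low = shift_right_64(0x00000003, 0x00000002)
print(hex(high), hex(low))
```

The result is the same as shifting 0x0000000300000002 right by one bit as a single 64-bit number.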
5.2 Memory Access Operands: ⟨op2⟩
The memory address used in the memory access instructions may also be modified by the barrel
shifter. This provides for more advanced access to memory which is particularly useful when
dealing with more advanced data structures. It allows pre- and post-increment instructions that
update memory pointers as a side effect of the instruction. This makes loops which pass through
memory more efficient. We denote instructions of this type as taking one of their arguments from
⟨op2⟩. For a full discussion of the ⟨op2⟩ addressing mode we refer the reader to Chapter ?? on
page ??.
There are three main methods of specifying a memory address (⟨op2⟩), all of which include an
offset value of some form. This offset can be specified in one of three ways:
Constant Value
An immediate constant value can be provided. If no offset is specified an immediate constant
value of zero is assumed.
Register
The offset can be specified by another register. The value of the register is added to the
address held in another register to form the final address.
Scaled
The offset is specified by another register which can be scaled by one of the shift operators
used for ⟨op1⟩. More specifically by the Logical Shift Left (LSL), Logical Shift Right (LSR),
Arithmetic Shift Right (ASR), Rotate Right (ROR) or Rotate Right Extended (RRX) shift
operators, where the number of bits to shift is specified as a constant value.
In offset addressing the memory address is formed by adding (or subtracting) an offset to or from
the value held in a base register.
LDR R0, [R1] Will load the register R0 with the 32-bit word at the memory
address held in the register R1. In this instruction there is no
offset specified, so an offset of zero is assumed. The value of
R1 is not changed in this instruction.
LDR R0, [R1, #4] Will load the register R0 with the word at the memory address
calculated by adding the constant value 4 to the memory
address contained in the R1 register. The register R1 is not
changed by this instruction.
LDR R0, [R1, R2] Loads the register R0 with the value at the memory address
calculated by adding the value in the register R1 to the value
held in the register R2 . Both R1 and R2 are not altered by
this operation.
LDR R0, [R1, R2, LSL #2] Will load the register R0 with the 32-bit value at the memory
address calculated by adding the value in the R1 register to the
value obtained by shifting the value in R2 left by 2 bits. Neither
register, R1 nor R2, is affected by this operation.
This is particularly useful for indexing into a complex data structure. The start of the data
structure is held in a base register, R1 in this case, and the offset to access a particular field within
the structure is then added to the base address. Placing the offset in a register allows it to be
calculated at run time rather than fixed. This allows for looping through a table.
A scaled value can also be used to access a particular item of a table, where the size of the item
is a power of two. For example, to locate item 7 in a table of 32-bit values we need only shift the
index value 6 left by 2 bits (6 × 2^2) to calculate the value we need to add as an offset to the start
of the table held in a register, R1 in our example. Remember that computers count from zero,
thus we use an index value of 6 rather than 7. A 32-bit number requires 4 bytes of storage which
is 2^2, thus we only need a 2-bit left shift.
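The address arithmetic for the table example can be checked with a short Python sketch. The table address and contents are made up for illustration, and the function name element_address is ours.

```python
def element_address(base, index, shift):
    """Address formed by a base register plus an index register scaled by LSL #shift."""
    return base + (index << shift)

table_base = 0x1000                                # hypothetical start address of the table
# Eight 32-bit items, four bytes apart; item 1 holds 100, item 2 holds 200, and so on.
table = {table_base + 4 * i: (i + 1) * 100 for i in range(8)}

address = element_address(table_base, 6, 2)        # index 6 selects item 7
print(hex(address), table[address])
```

The offset is 6 × 4 = 24 bytes (0x18), so item 7 is found 0x18 bytes past the start of the table.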
In pre-index addressing the memory address is formed in the same way as for offset addressing.
The address is not only used to access memory, but the base register is also modified to hold
the new value. In the ARM system this is known as a write-back and is denoted by placing an
exclamation mark at the end of the ⟨op2⟩ code.
Pre-index addressing can be particularly useful in a loop as it can be used to automatically increment
or decrement a counter or memory pointer.
LDR R0, [R1, #4]! Will load the register R0 with the word at the memory address
calculated by adding the constant value 4 to the memory
address contained in the R1 register. The new memory address
is placed back into the base register, register R1.
LDR R0, [R1, R2]! Loads the register R0 with the value at the memory address
calculated by adding the value in the register R1 to the value
held in the register R2. The offset register, R2, is not altered
by this operation; the register holding the base address, R1,
is modified to hold the new address.
LDR R0, [R1, R2, LSL #2]! First calculates the new address by adding the value in the
base address register, R1, to the value obtained by shifting
the value in the offset register, R2, left by 2 bits. It will then
load the 32-bit value at this address into the destination register, R0.
The new address is also written back into the base register, R1.
The offset register, R2, will not be affected by this operation.
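A Python model of the write-back may make the order of events clearer. This is a sketch under our own assumptions: memory is a dictionary of word addresses, registers are held in a dictionary, and the function name and data are invented for illustration.

```python
def ldr_pre_index(memory, regs, rd, rn, offset):
    """Model a pre-indexed load with write-back, e.g. LDR Rd, [Rn, #offset]!"""
    address = regs[rn] + offset      # form the address first
    regs[rd] = memory[address]       # load from the new address
    regs[rn] = address               # write the new address back to the base

memory = {0x2000: 10, 0x2004: 20, 0x2008: 30}
regs = {"R0": 0, "R1": 0x2000}

ldr_pre_index(memory, regs, "R0", "R1", 4)
print(regs["R0"], hex(regs["R1"]))
```

Because the offset is applied before the access, the first load reads the word at 0x2004, not the word at the original base address.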
In post-index addressing the memory address is the base register value. As a side-effect, an offset
is added to or subtracted from the base register value and the result is written back to the base
register.
Post-index addressing uses the value of the base register without modification. It then applies the
modification to the address and writes the new address back into the base register. This can be
used to automatically increment or decrement a memory pointer after it has been used, so it is
pointing to the next location to be used.
As the instruction must perform a write-back we do not need to include an exclamation mark.
Rather we move the closing bracket to include only the base register, as that is the register holding
the memory address we are going to access.
LDR R0, [R1], #4 Will load the register R0 with the word at the memory address
contained in the base register, R1 . It will then calculate the
new value of R1 by adding the constant value 4 to the current
value of R1 .
LDR R0, [R1], R2 Loads the register R0 with the value at the memory address
held in the base register, R1. It will then calculate the new
value for the base register by adding the value in the offset
register, R2, to the current value of the base register. The
offset register, R2, is not altered by this operation.
LDR R0, [R1], R2, LSL #2 First loads the 32-bit value at the memory address contained in
the base register, R1, into the destination register, R0. It will
then calculate the new value for the base register by adding the
current value to the value obtained by shifting the value in the
offset register, R2, left by 2 bits. The offset register, R2, will
not be affected by this operation.
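Post-index addressing is a natural fit for stepping through a table. The Python sketch below models a post-indexed load and uses it in a loop that sums three words. The memory contents, register usage and function name are our own assumptions.

```python
def ldr_post_index(memory, regs, rd, rn, offset):
    """Model a post-indexed load, e.g. LDR Rd, [Rn], #offset"""
    address = regs[rn]               # use the base register unmodified
    regs[rd] = memory[address]       # load from that address
    regs[rn] = address + offset      # then update the base register

memory = {0x2000: 10, 0x2004: 20, 0x2008: 30}
regs = {"R0": 0, "R1": 0x2000, "R2": 0}

for _ in range(3):
    ldr_post_index(memory, regs, "R0", "R1", 4)   # R1 moves on to the next word
    regs["R2"] += regs["R0"]                      # accumulate the total in R2

print(regs["R2"], hex(regs["R1"]))
```

After the loop the pointer in R1 is left pointing just past the last word that was read.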
6 Programs
The only way to learn assembly language programming is through experience. Throughout the rest
of this book each chapter will introduce various aspects of assembly programming. The chapter
will start with a general discussion, then move on to a number of example programs which will
demonstrate the topic under discussion. The chapter will end with a number of programming
problems for you to try.
Each example is written and assembled as a stand-alone program. They can be downloaded from
the web site (see footnote 1).
2. The forms in which data and addresses appear are selected for clarity rather than for
consistency. We use hexadecimal numbers for memory addresses, instruction codes, and BCD
data; decimal for numeric constants; binary for logical masks; and ASCII for characters.
1 http://dec.bournemouth.ac.uk/support/sem/sysarch/examples.zip
6. Simple and clear structures are emphasised, but programs are written as efficiently as possible
within this guideline. Notes accompanying programs often describe more efficient procedures.
8. Programs end with a SWI &11 (Software Interrupt) instruction. You may prefer to modify
this by replacing the SWI &11 instruction with an endless loop instruction such as:
To test one of the example programs, first obtain a copy of the source code. The best way of doing
this is to type in the source code presented in this book, as this will help you to understand the
code. Alternatively you can download the source from the web site, although you won't gain the
same knowledge of the code.
Go to the start menu and call up the Armulate program. Next open the source file using the
normal File | Open menu option. This will open your program source in a separate window
within the Armulate environment.
The next step is to create a new Project within the environment. Select the Project menu option,
then New. Give your project the same name as the source file that you are using (there is no
need to use a file extension; it will automatically be saved as a .apj file).
Once you have given the file a name, a further dialog will open as shown in figure 6.1 on the
next page.
Click the Add button, and you will again be presented with a file dialog, which will display the
source files in the current directory. Select the relevant source file and OK the dialog. You will
be returned to the previous dialog, but you will see now that your source file is included in the
project. OK the Edit Project dialog, and you will be returned to the Armulate environment,
now with two windows open within it, one for the source code and one for the project.
We recommend that you always create a .list listing file for each project that you create. Do
this by selecting the Options menu with the project window in focus, then the Assembler item.
This will open the dialog shown in figure 6.2 on the facing page.
Enter -list [yourfilename].list into the Other text box and OK the dialog.
You have now created your project and are ready to assemble and debug your code.
Additional information on the Armulator is available via the help menu item.
6.3 Trying the Examples from the Command Line
When developing the example programs, we found the Armulate environment too clumsy. We
used the TextPad editor and assembled the programs from the command line. The Armulate
environment provides commands for use from the command line:
1. Assembler
The command line assembler is used to create an object file from the program source code.
During the development of the add program (program 7.3a) we used the command line:
3. Debugger
Finally it is necessary to debug the load image. This can be done in one of two ways, using
a command line debugger or the windows debugger. In either case they require a load image
(add in our example). To use the command line debugger (known as the source debugger)
the following command is used:
ARMSD add
However, the command driven nature of this system is confusing and hard to use for even
the most experienced of developers. Thus we suggest you use the windows based debugger
program:
WINDBG add
This will provide you with the same debugger you would have seen had you used the
Windows-based Armulate environment.
Download Derek Law's ARM Assembler Syntax Definition file from the TextPad web site. You
can find this under the Syntax Definition sub-section of the Add-ons section of the Download page.
Unpack the armasm.syn from the arm.zip file into the TextPad Samples directory.
Having installed the Syntax Definitions you should now add a new Document Class to TextPad.
Run TextPad and select the New Document Class... wizard from the Configure menu. The wizard
will now take you through the following steps:
1. The Document Class requires a name. We have used the name ARM Assembler.
2. The Class Members, the file name extension to associate with this document class. We
associate all .s and .list files with this class: *.s,*.list
3. Syntax Highlighting. The next dialog is where we tell TextPad to use syntax highlighting;
simply check the Enable Syntax Highlighting box. We now need to tell it which syntax
definition file to use. If the armasm.syn file was placed in the Samples directory, this will
appear in the drop down list, and should be selected.
While this will create the new document class, you will almost certainly want to change the colour
settings for this document class. This class uses the different levels of Keyword colouring for
different aspects of the syntax as follows:
Keywords 1 Instructions
Keywords 2 Co-processor and pseudo-instructions
Keywords 3 Shift-addresses and logical directives
Keywords 4 Registers
Keywords 5 Directives
Keywords 6 Arguments and built-in names
You will probably want to set the color (sic) setting for all of these types to the same settings.
We have set all but Keywords 2 to the same colour scheme. To alter the color setting you should
select the Preferences... option from the Configure menu.
In the Preference dialog (shown in figure 6.4 on the next page), open the Document Classes
section and then your new document class (ARM Assembler). Now you should select the colors
section. This will now allow you to change the colours for any of the given color settings.
Finally you may like to consider adding a File Type Filter to the Open File dialog. This
can be done by selecting the File Type Filter entry in the Preference dialog. Simply click on the
New button, add the description (ARM Assembler (*.s, *.list)) and wildcard (*.s;*.list) details.
Finally click on the OK button.
Note the use of a comma to separate the wildcards in the description, and the use of a semi-colon
(without spaces) in the wildcard entry.
2 http://www.textpad.com
6.4 Program Initialization
All of the programming examples presented in these notes pay particular attention to the correct
initialization of constants and operands. Often this requires additional instructions that may
appear superfluous, in that they do not contribute directly to the solution of the stated problem.
Nevertheless, correct initialization is important in order to ensure the proper execution of the
program every time.
We want to stress correct initialization; that is why we are going to emphasize this aspect of
problems.
For the same reasons we pay particular attention to special conditions that can cause a
program to fail. Empty lists and zero indexes are two of the most common circumstances overlooked
in sample problems. It is critically important when using microprocessors that you learn with your
very first program to anticipate unusual circumstances; they frequently cause your program to fail.
You must build in the necessary programming steps to account for these potential problems.
6.6 Problems
Each chapter will now end with a number of programming problems for you to try. They have
been provided to help you understand the ideas presented in the chapter. You should use the
programming examples as guidelines for solving the problems. Don't forget to run your solutions
on the ARMulator to ensure that they are correct.
1. Comment each program so that others can understand it. The comments can be brief and
ungrammatical. They should explain the purpose of a section or instruction in the program,
but should not describe the operation of instructions, that description is available in manuals.
For example the following line:
You do not have to comment each statement or explain the obvious. You may follow the
format of the examples but provide less detail.
2. Emphasise clarity, simplicity, and good structure in programs. While programs should be
reasonably efficient, do not worry about saving a single byte of program memory or a few
microseconds.
3. Make programs reasonably general. Do not confuse parameters (such as the number of
elements in any array) with fixed constants (such as the code for the letter C).
6. Use symbolic notation for address and data references. Symbolic notation should also be
used even for constants (such as DATA_SELECT instead of 2_00000100). Also use the clearest
possible form for data (such as 'C' instead of 0x43).
7. Use meaningful names for labels and variables, e.g., SUM or CHECK rather than X or Z.
8. Execute each program with the emulator. There is no other way of ensuring that your
program is correct. We have provided sample data with each problem. Be sure that the
program works for special cases.
7 Data Movement
This chapter contains some very elementary programs. They will introduce some fundamental
features of the ARM. In addition, these programs demonstrate some primitive tasks that are
common to assembly language programs for many different applications.
This program solves the problem in two simple steps. The first instruction loads data register R1
with the 16-bit value in location Value. The next instruction saves the 16-bit contents of data
register R1 in location Result.
As a reminder of the necessary elements of an assembler program for the ARMulator, notice that
this, and all the other example programs have the following elements. Firstly there must be an
ENTRY directive. This tells the assembler where the first executable instruction is located. Next
there must be at least one AREA directive, at the start of the program, and there may be other
AREA directives to define data storage areas. Finally there must be an END directive, to show where
the code ends. The absence of any of these will cause the assembly to fail with an error.
Another limitation to bear in mind is that ARMulator instructions will only deal with BYTE (8
bits) or WORD (32 bit) data sizes. It is possible to declare HALF-WORD (16 bit) variables by the use
of the DCW directive, but it is necessary to ensure consistency of storage of HALF-WORD by the use
of the ALIGN directive. You can see the use of this in the first worked example.
In addition, under the RISC architecture of the ARM, it is not possible to directly manipulate
data in storage. Even if no actual manipulation of the data is taking place, as in this first example,
it is necessary to use the LDR or LDRB and STR or STRB to move data to a different area of memory.
This version of the LDR instruction moves the 32-bit word contained in memory location Value
into a register and then stores it using the STR instruction at the memory location specified by
Result.
Notice that, by default, every program is allocated a literal pool (a storage area) after the last
executable line. In the case of this, and most of the other programs, we have formalised this by
the use of the AREA Data1, DATA directive. Instruction on how to find addresses of variables
will be given in the seminars.
This program solves the problem in three steps. The first instruction moves the contents of
location Value into data register R1. The next instruction MVN takes the logical complement of
data register R1. Finally, in the third instruction the result of the logical complement is stored in
Value.
Note that any data register may be referenced in any instruction that uses data registers, but note
the use of R15 for the program counter, R14 for the link register and R13 for the stack pointer.
Thus, in the LDR instruction we've just illustrated, any of the general purpose registers could have
been used.
The LDR and STR instructions in this program, like those in Program 7.1, demonstrate one of
the ARM's addressing modes. The data reference to Value as a source operand is an example
of immediate addressing. In immediate addressing the offset to the address of the data being
referenced (less 8 bytes) is contained in the extension word(s) following the operation word of the
instruction. As shown in the assembly listing, the offset to the address corresponding to Value is
found in the extension word for the LDR and STR instructions.
The ADD instruction in this program is an example of a three-operand instruction. Unlike the LDR
instruction, this instruction's third operand not only represents the instruction's destination but
may also be used to calculate the result. The format:
As with any microprocessor, there are many instruction sequences you can execute which will solve
the same problem. Program 7.3b, for example, is a modification of Program 7.3a and uses offset
addressing instead of immediate addressing.
Program 7.3b: add2.s Add two numbers and store the result
1 ; Add two numbers and store the result
2
3 TTL Ch4Ex4 - add2
4 AREA Program, CODE, READONLY
5 ENTRY
6
7 Main
8 LDR R0, =Value1 ; Load the address of first value
9 LDR R1, [R0] ; Load what is at that address
10 ADD R0, R0, #0x4 ; Adjust the pointer
11 LDR R2, [R0] ; Load what is at the new addr
12 ADD R1, R1, R2 ; ADD together
13 LDR R0, =Result ; Load the storage address
14 STR R1, [R0] ; Store the result
15 SWI &11 ; All done
16
The ADR pseudo-instruction introduces a new addressing mode, offset addressing, which we
have not used previously. Immediate addressing lets you define a data constant and include that
constant in the instruction's associated object code. The assembler format identifies immediate
addressing with a # preceding the data constant. The size of the data constant varies depending
on the instruction. Immediate addressing is extremely useful when small data constants must be
referenced.
The ADR pseudo-instruction could be replaced by the use of the instruction LDR together with the
use of the = to indicate that the address of the data should be loaded rather than the data itself.
The second addressing mode, offset addressing, uses immediate addressing to load a pointer
to a memory address into one of the general purpose registers.
Program 7.3b also demonstrates the use of base register plus offset addressing. In this example
we have performed this operation manually on line 10 (ADD R0, R0, #0x4), which increments the
address stored in R0 by 4 bytes or one WORD. There are much simpler and more efficient ways of
doing this, such as pre-index or post-index addressing which we will see in later examples.
Another advantage of this addressing mode is its faster execution time as compared to immediate
addressing. This improvement occurs because the address extension word(s) does not have to be
fetched from memory prior to the actual data reference, after the initial fetch.
A final advantage is the flexibility provided by having R0 hold an address instead of being fixed as
part of the instruction. This flexibility allows the same code to be used for more than one address.
Thus if you wanted to add the values contained in consecutive variables Value3 and Value4, you
could simply change the contents of R0.
The MOV instruction is used to perform a logical shift left. Using the operand format of the MOV
instruction shown in Program 7.4, a register can be shifted left by an immediate count of 0 to 31
bits. Another form of the LSL operation allows the shift count to be specified in another
register.
Input: Value = 5F
Output: Result = 050F
Program 7.5: nibble.s Disassemble a byte into its high and low order nibbles
This is an example of byte manipulation. The ARM allows most instructions which operate on
words also to operate on bytes. Thus, by using the B suffix, all the LDR instructions in Program 7.5
become LDRB instructions, therefore performing byte operations. The STR instruction must remain,
since we are storing a halfword value. If we were only dealing with a one byte result, we could use
the STRB byte version of the store instruction.
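The effect of Program 7.5 can also be checked away from the assembler. The following Python sketch is an illustration only: the function name split_nibbles is our own and is not part of the original program. It separates a byte into its high and low order nibbles and places one nibble in each byte of a halfword, as in the sample data above.

```python
def split_nibbles(value):
    """Return a 16-bit halfword with the high nibble of `value` in the
    upper byte and the low nibble in the lower byte."""
    value &= 0xFF                # work on a single byte, as LDRB does
    high = (value >> 4) & 0x0F   # high order nibble
    low = value & 0x0F           # low order nibble
    return (high << 8) | low     # one nibble per byte of the halfword


print(f"{split_nibbles(0x5F):04X}")  # prints 050F for the sample input 5F
```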
Remember that the MOV instruction performs register-to-register transfers. This use of the MOV
instruction is quite frequent.
Generally, it is more efficient in terms of program memory usage and execution time to minimise
references to memory.
Sample Problems
a b
Input: Value1 = 12345678 12345678
Value2 = 87654321 0ABCDEF1
The Compare instruction, CMP, sets the status register flags as if the source, R2, were
subtracted from the destination, R1. The order of the operands is the same as the operands in the
subtract instruction, SUB.
The conditional transfer instruction BHI transfers control to the statement labeled Done if the
unsigned contents of R1 are higher than the contents of R2. Otherwise, the next
instruction (on line 12) is executed. At Done, register R1 will always contain the larger of the two
values.
The BHI instruction is one of several conditional branch instructions. To change the program to
operate on signed numbers, simply change the BHI to BGE (Branch if Greater than or Equal to):
...
CMP R1, R2
BGE Done
...
You can use Table 7.1 when performing signed and unsigned comparisons.
Note that the same instructions are used for signed and unsigned addition, subtraction, or
comparison; however, the conditional branches that test the result are different.
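To see why the choice of branch matters, the sketch below models the two interpretations of a 32-bit pattern in Python. It is an illustration under our own assumptions: the helper names are hypothetical, and the functions model only the outcome of a compare followed by an unsigned or a signed branch, not the flags themselves.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def larger_unsigned(a, b):
    """Larger of two 32-bit patterns, compared as unsigned numbers."""
    return a if (a & MASK32) > (b & MASK32) else b


def larger_signed(a, b):
    """Larger of two 32-bit patterns, compared as signed numbers."""
    return a if to_signed32(a) > to_signed32(b) else b
```

With the values of sample problem (a), the unsigned comparison selects 87654321 while the signed comparison selects 12345678, because 87654321 has its most significant bit set and is therefore negative when treated as a signed number.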
The conditional branch instructions are an example of program counter relative addressing. In
other words, if the branch condition is satisfied, control will be transferred to an address relative
to the current value of the program counter. Dealing with compares and branches is an important
part of programming. Don't confuse the sense of the CMP instruction. After a compare, the relation
tested is:
destination condition source
For example, if the condition is less than, then you test for destination less than source. Become
familiar with all of the conditions and their meanings. Unsigned compares are very useful when
comparing two addresses.
Here we introduce several important and powerful instructions from the ARM instruction set. As
before, at line 8 we use the LDR instruction which causes register R0 to hold the starting address
of Value1. At line 9 the instruction LDR R1, [R0] fetches the first 4 bytes (32 bits) of the 64-bit
value, starting at the location pointed to by R0, and places them in the R1 register. Line 10 loads
the second 4 bytes, the lower half of the 64-bit value, from the memory address pointed to by
R0 plus 4 bytes ([R0, #4]). Between them R1 and R2 now hold the first 64-bit value; R1 has
the upper half while R2 has the lower half. Lines 11 to 13 repeat this process for the second 64-bit
value, reading it into R3 and R4.
Next, the two low order words, held in R2 and R4, are added, and the result stored in R6.
This is all straightforward, but note now the use of the S suffix to the ADD instruction. This forces
the update of the flags as a result of the ADD operation. In other words, if the addition
results in a carry, the carry flag bit will be set.
Now the ADC (add with carry) instruction is used to add the two high order words, held in R1 and
R3, but taking into account any carry resulting from the previous addition.
Finally, the result is stored using the same technique as we used to load the values (lines 16 to 18).
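The carry propagation described above can be modelled in a few lines of Python. This is a sketch for checking the arithmetic by hand, not a translation of the program; the function name add64 is our own, and each argument is assumed to be an unsigned 32-bit value.

```python
MASK32 = 0xFFFFFFFF


def add64(a_hi, a_lo, b_hi, b_lo):
    """Add two 64-bit values held as (high word, low word) pairs."""
    low_sum = a_lo + b_lo
    carry = low_sum >> 32            # 1 if the low words overflowed (the C flag)
    high_sum = a_hi + b_hi + carry   # add with carry into the high words
    return high_sum & MASK32, low_sum & MASK32
```

For example, adding 1 to 00000001 FFFFFFFF overflows the low word, so the carry increments the high word and the result is 00000002 00000000.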
Program 7.8: factorial.s Lookup the factorial from a table by using the address of the memory
location
1 ; Lookup the factorial from a table using the address of the memory location
2
3 TTL Ch4Ex9 - factorial
4 AREA Program, CODE, READONLY
5 ENTRY
6
7 Main
8 LDR R0, =DataTable ; Load the address of the lookup table
9 LDR R1, Value ; Offset of value to be looked up
10 MOV R1, R1, LSL #0x2 ; Data is declared as 32bit - need
11 ; to quadruple the offset to point at the
12 ; correct memory location
13 ADD R0, R0, R1 ; R0 now contains memory address to store
14 LDR R2, [R0]
15 LDR R3, =Result ; The address where we want to store the answer
16 STR R2, [R3] ; Store the answer
17
18 SWI &11
19
20 AREA DataTable, DATA
21
22 DCD 1 ;0! = 1 ; The data table containing the factorials
23 DCD 1 ;1! = 1
24 DCD 2 ;2! = 2
25 DCD 6 ;3! = 6
26 DCD 24 ;4! = 24
27 DCD 120 ;5! = 120
28 DCD 720 ;6! = 720
29 DCD 5040 ;7! = 5040
30 Value DCB 5
31 ALIGN
32 Result DCW 0
33
34 END
The approach to this table lookup problem, as implemented in this program, demonstrates the
use of offset addressing. The first two LDR instructions load register R0 with the start address of
the lookup table (see note 1), and register R1 with the contents of Value.
The actual calculation of the entry in the table is determined by the MOV R1, R1,
LSL #0x2 instruction, which multiplies the index in R1 by four. The word contents of register R1
are then added to the word contents of register R0 to form the effective address used to index the
table entry. When R1 is used in this manner, it is referred to as an index register.
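The address calculation can be illustrated with a small Python model. It is a sketch under stated assumptions: memory is represented by a bytes object holding the factorial table from Program 7.8, the words are assumed to be stored in little-endian order, and the names FACTORIALS, MEMORY and lookup are our own.

```python
import struct

# The factorial table from Program 7.8 as 32-bit little-endian words (assumed layout)
FACTORIALS = [1, 1, 2, 6, 24, 120, 720, 5040]
MEMORY = struct.pack("<8I", *FACTORIALS)


def lookup(base, index):
    """Fetch the 32-bit word at base + (index << 2)."""
    address = base + (index << 2)                        # scale the index by four, add the base
    return struct.unpack_from("<I", MEMORY, address)[0]  # load the word at that address
```

With the table starting at offset 0, lookup(0, 5) reads the word at byte offset 20 and returns 120, the factorial of 5.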
7.2 Problems
1 Note that we are using a LDR instruction as the data table is sufficiently far away from the instruction that an
ADR instruction is not valid.
Sample Problems
Test A Test B
Input: VALUE 415D7834 9284C15D
Input: LIST 0C
02
06
09
Sample Problems
Test A Test B
Input: VALUE 182B 182B
COUNT 0003 0020
In the first case the value is to be shifted left by three bits, while in the second case the same
value is to be shifted by thirty-two bits.
8 Logic
Program 8.7a: factorial.s Lookup the factorial from a table by using the address of the
memory location
1 ; Lookup the factorial from a table using the address of the memory location
2
3 TTL Ch4Ex9 - factorial
4 AREA Program, CODE, READONLY
5 ENTRY
6
7 Main
8 LDR R0, =DataTable ; Load the address of the lookup table
9 LDR R1, Value ; Offset of value to be looked up
10 MOV R1, R1, LSL #0x2 ; Data is declared as 32bit - need
11 ; to quadruple the offset to point at the
12 ; correct memory location
13 ADD R0, R0, R1 ; R0 now contains memory address to store
14 LDR R2, [R0]
15 LDR R3, =Result ; The address where we want to store the answer
16 STR R2, [R3] ; Store the answer
17
18 SWI &11
19
20 AREA DataTable, DATA
21
22 DCD 1 ;0! = 1 ; The data table containing the factorials
23 DCD 1 ;1! = 1
24 DCD 2 ;2! = 2
25 DCD 6 ;3! = 6
26 DCD 24 ;4! = 24
27 DCD 120 ;5! = 120
28 DCD 720 ;6! = 720
29 DCD 5040 ;7! = 5040
30 Value DCB 5
31 ALIGN
32 Result DCW 0
33
34 END
9 Program Loops
The program loop is the basic structure that forces the CPU to repeat a sequence of instructions.
Loops have four sections:
1. The initialisation section, which establishes the starting values of counters, pointers, and
other variables.
2. The processing section, where the actual data manipulation occurs. This is the section that
does the work.
3. The loop control section, which updates counters and pointers for the next iteration.
4. The concluding section, that may be needed to analyse and store the results.
The computer performs Sections 1 and 4 only once, while it may perform Sections 2 and 3 many
times. Therefore, the execution time of the loop depends mainly on the execution time of Sections
2 and 3. Those sections should execute as quickly as possible, while the execution times of Sections
1 and 4 have less effect on overall program speed.
The computer may not execute the processing section of the while loop at all. The repeat-until
loop is more natural, but the while loop is often more efficient and eliminates the problem of
going through the processing sequence once even when there is nothing to process.
The computer can use the loop structure to process large sets of data (usually called arrays). The
simplest way to use one sequence of instructions to handle an array of data is to have the
program increment a register (usually an index register or stack pointer) after each iteration.
Then the register will contain the address of the next element in the array when the computer
repeats the sequence of instructions. The computer can then handle arrays of any length with a
single program.
    ...
        Loop Control Section
    Until task completed
    Concluding Section
Algorithm 9.1b
    Initialisation Section
    While task incomplete
        Processing Section
    Repeat
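The difference between the two loop forms shows up as soon as the array is empty. The Python sketch below is an illustration only, with function names of our own choosing; it sums a list once with the test at the end of the loop and once with the test at the start.

```python
def sum_repeat_until(values):
    """Repeat-until form: the processing section runs before the test."""
    total = 0
    index = 0
    while True:
        total += values[index]      # processing section (fails on an empty list)
        index += 1                  # loop control section
        if index >= len(values):    # until task completed
            break
    return total


def sum_while(values):
    """While form: the test comes first, so an empty list is handled."""
    total = 0
    index = 0
    while index < len(values):      # while task incomplete
        total += values[index]      # processing section
        index += 1                  # loop control section
    return total
```

Both functions give the same answer for a non-empty list, but only sum_while copes with an empty one; sum_repeat_until attempts to read an element that does not exist.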
Register indirect addressing is the key to processing arrays since it allows you to vary the
actual address of the data (the effective address) by changing the contents of a register. The
autoincrementing mode is particularly convenient for processing arrays since it automatically
updates the register for the next iteration. No additional instruction is necessary. You can even have
an automatic increment by 2 or 4 if the array contains 16-bit or 32-bit data or addresses.
Although our examples show the processing of arrays with autoincrementing (adding 1, 2, or 4 after
each iteration), the procedure is equally valid with autodecrementing (subtracting 1, 2, or 4 before
each iteration). Many programmers find moving backward through an array somewhat awkward
and difficult to follow, but it is more efficient in many situations. The computer obviously does
not know backward from forward. The programmer, however, must remember that the processor
increments an address register after using it but decrements an address register before using it.
This difference affects initialisation as follows:
1. When moving forward through an array (autoincrementing), start the register pointing to
the lowest address occupied by the array.
2. When moving backward through an array (autodecrementing), start the register pointing
one step (1, 2, or 4) beyond the highest address occupied by the array.
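The two initialisation rules can be tried out with a simple model. In the Python sketch below, which is an illustration under our own assumptions, memory is a bytes object, elements are little-endian, and the function names are hypothetical. Note where the pointer starts in each case.

```python
def sum_forward(memory, start, count, size=4):
    """Post-increment: use the pointer, then step it forward."""
    pointer = start                  # lowest address occupied by the array
    total = 0
    for _ in range(count):
        total += int.from_bytes(memory[pointer:pointer + size], "little")
        pointer += size              # increment after use
    return total


def sum_backward(memory, start, count, size=4):
    """Pre-decrement: step the pointer back, then use it."""
    pointer = start + count * size   # one step beyond the highest element
    total = 0
    for _ in range(count):
        pointer -= size              # decrement before use
        total += int.from_bytes(memory[pointer:pointer + size], "little")
    return total
```

Both routines visit exactly the same elements; only the starting value of the pointer and the order of the update and the access differ.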
Program 9.1a: sum16.s Add a series of 16 bit numbers by using a table address
1 * Add a series of 16 bit numbers by using a table address look-up
2
3 TTL Ch5Ex1
4 AREA Program, CODE, READONLY
5 ENTRY
6
7 Main
8 LDR R0, =Data1 ;load the address of the lookup table
9 EOR R1, R1, R1 ;clear R1 to store sum
10 LDR R2, Length ;init element count
11 Loop
12 LDR R3, [R0] ;get the data
13 ADD R1, R1, R3 ;add it to r1
14 ADD R0, R0, #+4 ;increment pointer
15 SUBS R2, R2, #0x1 ;decrement count with zero set
16 BNE Loop ;if zero flag is not set, loop
17 STR R1, Result ;otherwise done - store result
18 SWI &11
19
20 AREA Data1, DATA
21
22 Table DCW &2040 ;table of values to be added
23 ALIGN ;32 bit aligned
24 DCW &1C22
25 ALIGN
26 DCW &0242
27 ALIGN
28 TablEnd DCD 0
29
30 AREA Data2, DATA
31 Length DCW (TablEnd - Table) / 4 ;because we're having to align
32 ALIGN ;gives the loop count
33 Result DCW 0 ;storage for result
34
35 END
Program 9.1b: sum16b.s Add a series of 16 bit numbers by using a table address look-up
1 * Add a series of 16 bit numbers by using a table address look-up
2 * This example has nothing in the lookup table, and the program handles this
3
4 TTL Ch5Ex2
5 AREA Program, CODE, READONLY
6 ENTRY
7
8 Main
9 LDR R0, =Data1 ;load the address of the lookup table
10 EOR R1, R1, R1 ;clear R1 to store sum
11 LDR R2, Length ;init element count
12 CMP R2, #0
13 BEQ Done
14 Loop
15 LDR R3, [R0] ;get the data that R0 points to
16 ADD R1, R1, R3 ;add it to r1
17 ADD R0, R0, #+4 ;increment pointer
18 SUBS R2, R2, #0x1 ;decrement count with zero set
19 BNE Loop ;if zero flag is not set, loop
20 Done
21 STR R1, Result ;otherwise done - store result
22 SWI &11
23
24 AREA Data1, DATA
25
26 Table ;Table is empty
27 TablEnd DCD 0
28
29 AREA Data2, DATA
30 Length DCW (TablEnd - Table) / 4 ;because we're having to align
31 ALIGN ;gives the loop count
32 Result DCW 0 ;storage for result
33
34 END
Program 9.2a: countneg.s Scan a series of 32 bit numbers to find how many are negative
1 * Scan a series of 32 bit numbers to find how many are negative
2
3 TTL Ch5Ex3
4 AREA Program, CODE, READONLY
5 ENTRY
6
7 Main
8 LDR R0, =Data1 ;load the address of the lookup table
9 EOR R1, R1, R1 ;clear R1 to store count
10 LDR R2, Length ;init element count
11 CMP R2, #0
12 BEQ Done ;if table is empty
13 Loop
14 LDR R3, [R0] ;get the data
15 CMP R3, #0
16 BPL Looptest ;skip next line if +ve or zero
17 ADD R1, R1, #1 ;increment -ve number count
18 Looptest
19 ADD R0, R0, #+4 ;increment pointer
20 SUBS R2, R2, #0x1 ;decrement count with zero set
Program 9.2b: countneg16.s Scan a series of 16 bit numbers to find how many are negative
1 * Scan a series of 16 bit numbers to find how many are negative
2
3 TTL Ch5Ex4
4 AREA Program, CODE, READONLY
5 ENTRY
6
7 Main
8 LDR R0, =Data1 ;load the address of the lookup table
9 EOR R1, R1, R1 ;clear R1 to store count
10 LDR R2, Length ;init element count
11 CMP R2, #0
12 BEQ Done ;if table is empty
13 Loop
14 LDR R3, [R0] ;get the data
15 AND R3, R3, #0x8000 ;bit wise AND to isolate the 16th bit
16 CMP R3, #0x8000 ;is the sign bit set?
17 BNE Looptest ;skip next line if +ve or zero
18 ADD R1, R1, #1 ;increment -ve number count
19 Looptest
20 ADD R0, R0, #+4 ;increment pointer
21 SUBS R2, R2, #0x1 ;decrement count with zero set
22 BNE Loop ;if zero flag is not set, loop
23 Done
24 STR R1, Result ;otherwise done - store result
25 SWI &11
26
27 AREA Data1, DATA
28
29 Table DCW &F152 ;table of values to be tested
30 ALIGN
31 DCW &7F61
32 ALIGN
33 DCW &8000
34 ALIGN
35 TablEnd DCD 0
36
37 AREA Data2, DATA
38 Length DCW (TablEnd - Table) / 4 ;because we're having to align
39 ALIGN ;gives the loop count
40 Result DCW 0 ;storage for result
41
42 END
9.1. PROGRAM EXAMPLES 75
Program 9.3: largest16.s Scan a series of 16 bit numbers to find the largest
1 * Scan a series of 16 bit numbers to find the largest
2
3 TTL Ch5Ex5
4 AREA Program, CODE, READONLY
5 ENTRY
6
7 Main
8 LDR R0, =Data1 ;load the address of the lookup table
9 EOR R1, R1, R1 ;clear R1 to store largest
10 LDR R2, Length ;init element count
11 CMP R2, #0
12 BEQ Done ;if table is empty
13 Loop
14 LDR R3, [R0] ;get the data
15 CMP R3, R1 ;compare with the largest so far
16 BCC Looptest ;skip next line if lower (unsigned)
17 MOV R1, R3 ;otherwise keep the new largest
18 Looptest
19 ADD R0, R0, #+4 ;increment pointer
20 SUBS R2, R2, #0x1 ;decrement count with zero set
21 BNE Loop ;if zero flag is not set, loop
22 Done
23 STR R1, Result ;otherwise done - store result
24 SWI &11
25
26 AREA Data1, DATA
27
28 Table DCW &A152 ;table of values to be tested
29 ALIGN
30 DCW &7F61
31 ALIGN
32 DCW &F123
33 ALIGN
34 DCW &8000
35 ALIGN
36 TablEnd DCD 0
37
38 AREA Data2, DATA
39
40 Length DCW (TablEnd - Table) / 4 ;because we're having to align
41 ALIGN ;gives the loop count
42 Result DCW 0 ;storage for result
43
44 END
9.2 Problems
Note: Checksums are often used to ensure that data has been correctly read. A checksum calcu-
lated when reading the data is compared to a checksum that is stored with the data. If the two
checksums do not agree, the system will usually indicate an error, or automatically read the data
again.
Sample Problem:
Sample Problem:
8D489867 (Negative)
21202549 (Positive)
00000000 (Zero)
E605546C (Negative)
00000004 (Positive)
Sample Problem:
Sample Problem:
Input: NUM 2866B794 = 0010 1000 0110 0110 1011 0111 1001 0100
Output: NUMBITS 0F = 15
Sample Problem:
START 205A15E3 (0010 0000 0101 1010 0001 0101 1110 0011 13)
256C8700 (0010 0101 0110 1100 1000 0111 0000 0000 11)
295468F2 (0010 1001 0101 0100 0110 1000 1111 0010 14)
29856779 (0010 1001 1000 0101 0110 0111 0111 1001 16)
9147592A (1001 0001 0100 0111 0101 1001 0010 1010 14)
Microprocessors often handle data which represents printed characters rather than numeric quan-
tities. Not only do keyboards, printers, communications devices, displays, and computer terminals
expect or provide character-coded data, but many instruments, test systems, and controllers also
require data in this form. ASCII (American Standard Code for Information Interchange) is the
most commonly used code, but others exist.
We use the standard seven-bit ASCII character codes, as shown in Table 10.1; the character code
occupies the low-order seven bits of the byte, and the most significant bit of the byte holds a 0 or
a parity bit.
• The codes for the numbers and letters form ordered sequences. Since the ASCII codes for
the characters '0' through '9' are 30₁₆ through 39₁₆ you can convert a decimal digit to the
equivalent ASCII character (and ASCII to decimal) by simply adding (or subtracting) the ASCII
offset: 30₁₆ = ASCII '0'. Since the codes for letters (41₁₆ through 5A₁₆ and 61₁₆ through 7A₁₆) are in
order, you can alphabetise strings by sorting them according to their numerical values.
• The computer does not distinguish between printing and non-printing characters. Only the
I/O devices make that distinction.
• An ASCII I/O device handles data only in ASCII. For example, if you want an ASCII printer
to print the digit 7, you must send it 37₁₆ as the data; 07₁₆ will ring the bell. Similarly,
if a user presses the 9 key on an ASCII keyboard, the input data will be 39₁₆; 09₁₆ is the
tab key.
• Many ASCII devices do not use the entire character set. For example, devices may ignore
many control characters and may not print lower-case letters.
• Despite the definition of the control characters many devices interpret them differently. For
example they typically use control characters in a special way to provide features such as
cursor control on a display, and to allow software control of characteristics such as rate of
data transmission, print width, and line length.
80 CHAPTER 10. STRINGS
MSB
LSB 0 1 2 3 4 5 6 7 Control Characters
0 NUL DLE SP 0 @ P ` p NUL Null DLE Data link escape
1 SOH DC1 ! 1 A Q a q SOH Start of heading DC1 Device control 1
2 STX DC2 " 2 B R b r STX Start of text DC2 Device control 2
3 ETX DC3 # 3 C S c s ETX End of text DC3 Device control 3
4 EOT DC4 $ 4 D T d t EOT End of tx DC4 Device control 4
5 ENQ NAK % 5 E U e u ENQ Enquiry NAK Negative ack
6 ACK SYN & 6 F V f v ACK Acknowledge SYN Synchronous idle
7 BEL ETB ' 7 G W g w BEL Bell, or alarm ETB End of tx block
8 BS CAN ( 8 H X h x BS Backspace CAN Cancel
9 HT EM ) 9 I Y i y HT Horizontal tab EM End of medium
A LF SUB * : J Z j z LF Line feed SUB Substitute
B VT ESC + ; K [ k { VT Vertical tab ESC Escape
C FF FS , < L \ l | FF Form feed FS File separator
D CR GS - = M ] m } CR Carriage return GS Group separator
E SO RS . > N ^ n ~ SO Shift out RS Record separator
F SI US / ? O _ o DEL SI Shift in US Unit separator
SP Space DEL Delete
• Each ASCII character occupies eight bits. This allows a large character set but is wasteful
when only a few characters are actually being used. If, for example, the data consists entirely
of decimal numbers, the ASCII format (allowing one digit per byte) requires twice as much
storage, communications capacity, and processing time as the BCD format (allowing two
digits per byte).
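The conversion between decimal digits and their ASCII codes mentioned in the first point amounts to adding or subtracting a constant. The following Python sketch shows the arithmetic; the names ASCII_ZERO, digit_to_ascii and ascii_to_digit are our own, and the range checks are an addition for illustration.

```python
ASCII_ZERO = 0x30  # the ASCII code for the character '0'


def digit_to_ascii(digit):
    """Convert a decimal digit 0..9 to its ASCII code."""
    if not 0 <= digit <= 9:
        raise ValueError("not a decimal digit")
    return digit + ASCII_ZERO


def ascii_to_digit(code):
    """Convert an ASCII code for '0'..'9' back to the digit value."""
    if not 0x30 <= code <= 0x39:
        raise ValueError("not an ASCII digit")
    return code - ASCII_ZERO
```

Thus the digit 7 becomes the code 37 (hexadecimal), and the code 39 (hexadecimal) received from a keyboard becomes the value 9.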
The assembler includes a feature to make character-coded data easy to handle: single quotation
marks around a character indicate the character's ASCII value. For example,
is the same as
The first form is preferable for several reasons. It increases the readability of the instruction, and it
also avoids errors that may result from looking up a value in a table. The program does not
depend on ASCII as the character set, since the assembler handles the conversion using whatever
code it has been designed for.
Individual characters on their own are not really all that helpful. As humans we need a string
of characters in order to form meaningful text. In assembly programming it is normal to have
to process one character at a time. However, the assembler does at least allow us to store a
string of bytes (characters) in a friendly manner with the DCB directive. For example, line 26 of
program 10.1a is:
Binary: 48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 0D
Text: H e l l o , SP W o r l d CR
Use Table 10.1 to check that this is correct. In order to make the program just that little bit more
readable, line 5 defines the label CR to have the value for a Carriage Return (0D₁₆).
There are three main methods for handling strings: Fixed Length, Terminated, and Counted. It
is normal for a high level language to support just one method. C/C++ and Java all support the
use of Zero-Terminated strings, while Pascal and Ada use counted strings. Although it is possible
to provide your own support for the alternative string type it is seldom done. A good programmer
will use a mix of methods depending on the nature of the strings concerned.
A fixed length string has an immediate advantage in that the management of the strings is simple
when compared to the alternative methods. For example we only need one label for an array of
strings, and we can calculate the starting position of the nth string by a simple multiplication.
This advantage is however also a major disadvantage. For example a person's name can be anything
from two characters to any number of characters. Although it would be possible to reserve sufficient
space for the longest of names this amount of memory would be required for all names, including
the two letter ones. This is a significant waste of memory.
It would be possible to reserve just ten characters for each name. When a two letter name appears
it would have to be padded out with spaces in order to make the name ten characters in length.
When a name longer than ten characters appears it would have to be truncated down to just ten
characters thus chopping off part of the name. This requires extra processing and is not entirely
friendly to users who happen to have a long name.
When there is little memory and all the strings are known in advance it may be a good idea to
use fixed length strings. For example, command driven systems tend to use fixed length strings
for the list of commands.
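The padding, the truncation and the multiplication used to find the nth string can all be seen in a short Python model. This is a sketch only: the ten character width follows the example in the text, while the names NAME_LENGTH, pack_names and nth_name are our own.

```python
NAME_LENGTH = 10  # every entry occupies exactly ten bytes, as in the example above


def pack_names(names):
    """Pad with spaces or truncate so that each name is NAME_LENGTH bytes."""
    return b"".join(
        name.encode("ascii")[:NAME_LENGTH].ljust(NAME_LENGTH, b" ")
        for name in names
    )


def nth_name(table, n):
    """Return the nth entry (counting from zero) with the padding removed."""
    start = n * NAME_LENGTH          # a single multiplication finds the entry
    return table[start:start + NAME_LENGTH].rstrip(b" ").decode("ascii")
```

Packing the names Al, Christopher and Stephen gives a 30 byte table in which the second entry has been cut down to Christophe, which illustrates both the wasted space and the loss of data described above.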
Over the years several different sentinels have been used. These include $ (24₁₆), EOT (End of
Transmission, 04₁₆), CR (Carriage Return, 0D₁₆), LF (Line Feed, 0A₁₆) and NUL (No character,
00₁₆). Today the most commonly used sentinel is the NUL character, primarily because it is used
by C/C++. The NUL character also has a good feeling about it, as it is represented by the value
0, has no other meaning and is easier to detect than any other character. This is frequently
referred to as a Null- or Zero-Terminated string or simply as an ASCIIZ string.
The terminated string has the advantage that it can be of any length. Processing the string is
fairly simple: you enter a loop, processing one character at a time until you reach the sentinel.
The disadvantage is that the sentinel character cannot appear in the string. This is another reason
why the NUL character is such a good choice for the sentinel.
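The loop described above is short enough to model directly. The Python sketch below is an illustration with names of our own choosing; it counts the characters before the sentinel, with the sentinel value passed as a parameter so that the same routine serves both NUL and Carriage Return terminated strings.

```python
def terminated_length(memory, start, sentinel=0x00):
    """Count the bytes from `start` up to, but not including, the sentinel."""
    length = 0
    while memory[start + length] != sentinel:   # test each character in turn
        length += 1
    return length
```

For the string "Hello, World" followed by a Carriage Return, calling terminated_length with a sentinel of 0x0D gives 12.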
A counted string may appear rather clumsy at first. Having the length of the string as a binary
value has a distinct advantage over the terminated string. It allows the use of the counting
instructions that have been included in many instruction sets. This means we can ignore the testing for
a sentinel character and simply decrement our counter, which is a far faster method of working.
To scan through an array of strings we simply point to the first string, and add the length count
to our pointer to obtain the start of the next string. For a terminated string we would have to
scan for the sentinel for each string.
There are two disadvantages with the counted string. The first is that the string has a maximum
length, 255 characters or 64K depending on the size of the count value (8- or 16-bit), although it is
normally felt that 64K should be sufficient for most strings. The second disadvantage is their
perceived complexity. Many people feel that the complexity of the counted string outweighs the
speed advantage.
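Stepping through an array of counted strings can be sketched as follows. The layout is an assumption made for the illustration: each string is stored as an 8-bit count followed immediately by that many characters, and the function name counted_strings is our own.

```python
def counted_strings(memory):
    """Yield each string from a packed array of counted strings.

    Each string is stored as an 8-bit length followed by that many characters.
    """
    pointer = 0
    while pointer < len(memory):
        length = memory[pointer]                 # the count byte
        start = pointer + 1
        yield memory[start:start + length].decode("ascii")
        pointer = start + length                 # jump straight to the next string
```

No character is examined in order to find the next string; the pointer is simply advanced by the count, which is the speed advantage referred to above.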
As computing expands outside of the English speaking world we have to provide support for
languages other than standard American. Many European languages use letters that are not
available in standard ASCII, for example: œ, Œ, ø, Ø, æ, Æ, ł, Ł, ß, ¡, and ¿. This is particularly
important when dealing with names: Ångstrøm, Karlstraße or Łukasiewicz.
The ASCII character set is not even capable of handling English correctly. When we borrow a
word from another language we also use its diacritic marks (or accents). For example I would
rather see pâté on a menu than pate. ASCII does not provide support for such accents.
To overcome this limitation the international community has produced a new character encoding,
known as Unicode. In Unicode the character code is two bytes long: the first byte indicates which
character set the character comes from, while the second byte indicates the character position
within the character set. The traditional ASCII character set is incorporated into Unicode as
character set zero. In the revised C standard a new data type of wchar_t was defined to cater for
this new wide character.
While Unicode is sufficient to represent the characters from most modern languages, it is not
sufficient to represent all the written languages of the world, ancient and modern. Hence an
extended version, known as Unicode-32 is being developed where the character set is a 23-bit
value (three bytes). Unicode is a subset of Unicode-32, while ASCII is a subset of Unicode.
Although we do not consider Unicode you should be aware of the problem of international character
sets and the solution Unicode provides.
Program 10.1a: strlencr.s Find the length of a Carriage Return terminated string
10.4. PROGRAM EXAMPLES 83
Program 10.4: setparity.s Set the parity bit on a series of characters store the amended
string in Result
1 ; Set the parity bit on a series of characters store the amended string in Result
2
3 TTL Ch6Ex5
4
5 AREA Program, CODE, READONLY
6 ENTRY
7
8 Main
9 LDR R0, =Data1 ;load the address of the lookup table
10 LDR R5, =Pointer
11 LDRB R1, [R0], #1 ;store the string length in R1
12 CMP R1, #0
13 BEQ Done ;nothing to do if zero length
14 MainLoop
15 LDRB R2, [R0], #1 ;load the first byte into R2
16 MOV R6, R2 ;keep a copy of the original char
17 MOV R2, R2, LSL #24 ;shift so that we are dealing with msb
18 MOV R3, #0 ;zero the bit counter
19 MOV R4, #7 ;init the shift counter
20
21 ParLoop
22 MOVS R2, R2, LSL #1 ;left shift
23 BPL DontAdd ;if msb is not a one bit, branch
24 ADD R3, R3, #1 ;otherwise add to bit count
25 DontAdd
26 SUBS R4, R4, #1 ;update shift count
27 BNE ParLoop ;loop if still bits to check
28 TST R3, #1 ;is the parity even
29 BEQ Even ;if so branch
30 ORR R6, R6, #0x80 ;otherwise set the parity bit
31 STRB R6, [R5], #1 ;and store the amended char
32 BAL Check
33 Even STRB R6, [R5], #1 ;store the unamended char if even pty
34 Check SUBS R1, R1, #1 ;decrement the character count
35 BNE MainLoop
36
37 Done SWI &11
38
44 ALIGN
45 Match DCD &FFFF ;storage for parity characters
46
47 END
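The parity calculation performed by the inner loop of Program 10.4 can be checked with a short Python model. This is a sketch of the same idea rather than a translation of the listing; the function name set_even_parity is our own. It counts the one bits in the low seven bits of a character and sets bit 7 when that count is odd.

```python
def set_even_parity(char):
    """Set bit 7 when the low seven bits hold an odd number of ones."""
    ones = 0
    for bit in range(7):            # examine bits 0 to 6 only
        if char & (1 << bit):
            ones += 1
    if ones & 1:                    # odd count: set the parity bit
        return char | 0x80
    return char                     # even count: leave the character unchanged
```

The characters 31, 32 and 33 (hexadecimal) become B1, B2 and 33 respectively, since the first two contain three one bits and the last contains four.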
Program 10.5b: strcmp.s Compare null terminated strings for equality assume that we have
no knowledge of the data structure so we must assess the individual strings
1 ; Compare two null terminated strings for equality
2
3 TTL Ch6Ex7
4
5 AREA Program, CODE, READONLY
6 ENTRY
7
8 Main
9 LDR R0, =Data1 ;load the address of the lookup table
10 LDR R1, =Data2
11 LDR R2, Match ;assume strings not equal, set to -1
12 MOV R3, #0 ;init register
13 MOV R4, #0
14 Count1
15 LDRB R5, [R0], #1 ;load the next byte of string 1 into R5
16 CMP R5, #0 ;is it the terminator
17 BEQ Count2 ;if so, go on to count string 2
18 ADD R3, R3, #1 ;increment count
19 BAL Count1
20 Count2
21 LDRB R5, [R1], #1 ;load the next byte of string 2 into R5
22 CMP R5, #0 ;is it the terminator
23 BEQ Next ;if so, both lengths are known
24 ADD R4, R4, #1 ;increment count
25 BAL Count2
26
27 Next CMP R3, R4
28 BNE Done ;if they are different lengths,
29 ;they can't be equal
30 CMP R3, #0 ;test for zero length if both are
31 BEQ Same ;zero length, nothing else to do
32 LDR R0, =Data1 ;need to reset the lookup table
33 LDR R1, =Data2
34
35 * if we got this far, we now need to check the string char by char
36 Loop
37 LDRB R5, [R0], #1 ;character of first string
38 LDRB R6, [R1], #1 ;character of second string
39 CMP R5, R6 ;are they the same
40 BNE Done ;if not the strings are different
41 SUBS R3, R3, #1 ;use the string length as a counter
42 BEQ Same ;if we got to the end of the count
43 ;the strings are the same
44 BAL Loop ;not done, loop
45
46 Same
47 MOV R2, #0 ;clear the -1 from match (0 = match)
48 Done
49 STR R2, Match ;store the result
50 SWI &11
51
52 AREA Data1, DATA
53 Table1 DCB "Hello, World", 0 ;the string
54 ALIGN
55
56 AREA Data2, DATA
57 Table2 DCB "Hello, worl", 0 ;the string
58
10.5 Problems
Sample Problem:
Sample Problems:
Test A Test B
Input: START String String
String 37 ( 7) 41 ( A)
0D (CR) 20 (Space)
48 ( H)
41 ( A)
54 ( T)
20 (Space)
20 (Space)
0D (CR)
Sample Problems:
Test A Test B
Input: START String String
LENGTH 4 3
Note that in the second case (Test B) the output is unchanged, as the number is assumed to be
671..
Sample Problems:
Test A Test B
Input: START String String
String 3 03
B1 (1011 0001) B1 (1011 0001)
B2 (1011 0010) B6 (1011 0110)
33 (0011 0011) 33 (0011 0011)
Sample Problems:
1. Some conversions can easily be handled by algorithms involving arithmetic or logical func-
tions. The program may, however, have to handle special cases separately.
2. More complex conversions can be handled with lookup tables. The lookup table method
requires little programming and is easy to apply. However the table may occupy a large
amount of memory if the range of input values is large.
3. Hardware is readily available for some conversion tasks. Typical examples are decoders
for BCD to seven-segment conversion and Universal Asynchronous Receiver/Transmitters
(UARTs) for conversion between parallel and serial formats.
In most applications, the program should do as much as possible of the code conversion work.
Most code conversions are easy to program and require little execution time.
92 CHAPTER 11. CODE CONVERSION
19
20 AREA Data1, DATA
21 Digit
22 DCD &0C ;the hex digit
23
24 AREA Data2, DATA
25 Result DCD 0 ;storage for result
26
27 END
Program 11.1b: wordtohex.s Convert a 32 bit hexadecimal number to an ASCII string and
output to the terminal
1 * now something a little more adventurous - convert a 32 bit
2 * hexadecimal number to an ASCII string and output to the terminal
3
4 TTL Ch7Ex2
5
6 AREA Program, CODE, READONLY
7 ENTRY
8 Mask EQU 0x0000000F
9
10 start
11 LDR R1, Digit ;load the digit
12 MOV R4, #8 ;init counter
13 MOV R5, #28 ;control right shift
14 MainLoop
15 MOV R3, R1 ;copy original word
16 MOV R3, R3, LSR R5 ;right shift the correct number of bits
17 SUB R5, R5, #4 ;reduce the bit shift
18 AND R3, R3, #Mask ;mask out all but the ls nibble
19 CMP R3, #0xA ;is the number < 10 decimal
20 BLT Add_0 ;then branch
21
22 ADD R3, R3, #"A"-"0"-0xA ;add offset for 'A' to 'F'
23
24 Add_0 ADD R3, R3, #"0" ;convert to ASCII
25 MOV R0, R3 ;prepare to output
26 SWI &0 ;output to console
27 SUBS R4, R4, #1 ;decrement counter
28 BNE MainLoop
29
30 MOV R0, #&0D ;add a CR character
31 SWI &0 ;output it
32 SWI &11 ;all done
33
34 AREA Data1, DATA
35 Digit DCD &DEADBEEF ;the hex word
36
37 END
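The shift, mask and adjust sequence of Program 11.1b can be followed more easily in a high level form. The Python sketch below mirrors the structure of the loop, with a shift count that starts at 28 and falls by four each time; the function name word_to_hex is our own, and the characters are collected in a string rather than written to the console.

```python
def word_to_hex(word):
    """Return the eight ASCII hex digits of a 32-bit word, most significant first."""
    chars = []
    shift = 28                               # start with the top nibble
    for _ in range(8):
        nibble = (word >> shift) & 0x0F      # isolate one hex digit
        shift -= 4
        if nibble < 0xA:
            code = nibble + ord("0")         # '0' to '9'
        else:
            code = nibble + ord("A") - 0xA   # 'A' to 'F'
        chars.append(chr(code))
    return "".join(chars)
```

For the word DEADBEEF used in the program, the function returns the eight characters "DEADBEEF".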
Main
        LDR     R0, =Data1              ;load the start address of the table
        EOR     R1, R1, R1              ;clear register for the code
        LDRB    R2, Digit               ;get the digit to encode
        CMP     R2, #9                  ;is it a valid digit?
        BHI     Done                    ;no - leave the result clear

        ADD     R0, R0, R2              ;advance the pointer
        LDRB    R1, [R0]                ;and get the code byte
Done
        STR     R1, Result              ;store the result
        SWI     &11                     ;all done

        AREA    Data1, DATA
Table   DCB     &3F                     ;the binary conversions table
        DCB     &06
        DCB     &5B
        DCB     &4F
        DCB     &66
        DCB     &6D
        DCB     &7D
        DCB     &07
        DCB     &7F
        DCB     &6F
        ALIGN

        AREA    Data2, DATA
Digit   DCB     &05                     ;the number to convert
        ALIGN

        AREA    Data3, DATA
Result  DCD     0                       ;storage for result

        END
Program 11.4b: ubcdtohalf2.s Convert an unpacked BCD number to binary using MUL
* convert an unpacked BCD number to binary using MUL

        TTL     Ch7Ex6

        AREA    Program, CODE, READONLY
        ENTRY

Main
        LDR     R0, =BCDNum             ;load address of BCD number
        MOV     R5, #4                  ;init counter
        MOV     R1, #0                  ;clear result register
        MOV     R2, #0                  ;and final register
        MOV     R7, #10                 ;multiplication constant

Loop
        MOV     R6, R1                  ;copy the running total
        MUL     R1, R6, R7              ;mult by 10
        LDRB    R4, [R0], #1            ;load digit and incr address
        ADD     R1, R1, R4              ;add the next digit
        SUBS    R5, R5, #1              ;decr counter
        BNE     Loop                    ;if count != 0, loop

        STR     R1, Result              ;store the result
        SWI     &11                     ;all done

        AREA    Data1, DATA
BCDNum  DCB     &02,&09,&07,&01         ;an unpacked BCD number
        ALIGN

        AREA    Data2, DATA
Result  DCD     0                       ;storage for result

        END
Program 11.5: halftobin.s Store a 16bit binary number as an ASCII string of '0's and '1's
Digit  Code   7  6  5  4  3  2  1  0
              0  g  f  e  d  c  b  a
0      3F     0  0  1  1  1  1  1  1
1      06     0  0  0  0  0  1  1  0
2      5B     0  1  0  1  1  0  1  1
3      4F     0  1  0  0  1  1  1  1
4      66     0  1  1  0  0  1  1  0
5      6D     0  1  1  0  1  1  0  1
6      7D     0  1  1  1  1  1  0  1
7      07     0  0  0  0  0  1  1  1
8      7F     0  1  1  1  1  1  1  1
9      6F     0  1  1  0  1  1  1  1
A      77     0  1  1  1  0  1  1  1
B      7C     0  1  1  1  1  1  0  0
C      39     0  0  1  1  1  0  0  1
D      5E     0  1  0  1  1  1  1  0
E      79     0  1  1  1  1  0  0  1
F      71     0  1  1  1  0  0  0  1
Segment layout: a (top), b (upper right), c (lower right), d (bottom), e (lower left), f (upper left), g (middle).
11.2 Problems
Sample Problems:
Test A Test B
Input: A_DIGIT 43 (C) 36 (6)
Output: H_DIGIT 0C 06
Sample Problems:
Test A Test B
Input: CODE 4F 28
Output: NUMBER 03 FF
Sample Problems:
Test A Test B
Input: DIGIT 07 55
Sample Problem:
Sample Problem:
Sample Problems:
Test A Test B
Input: STRING 31 (1) 31 (1)
31 (1) 31 (1)
30 (0) 30 (0)
31 (1) 31 (1)
30 (0) 30 (0)
30 (0) 37 (7)
31 (1) 31 (1)
30 (0) 30 (0)
NUMBER Valid
No Error Error
Output: D2 (1101 0010) 00 ( )
ERROR 0 ( ) FF ( )
12 Arithmetic
Most processors provide for both signed and unsigned binary arithmetic. Signed numbers are
represented in two's complement form. This means that the operations of addition and subtraction
are the same whether the numbers are signed or unsigned.
Multiple-precision binary arithmetic requires simple repetitions of the basic instructions. The
Carry flag transfers information between words. It is set when an addition results in a carry or
a subtraction results in a borrow. Add with Carry and Subtract with Carry use this information
from the previous arithmetic operation.
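As a minimal sketch of multiple-precision addition on the ARM, the following program adds two 64 bit numbers held as pairs of words. The names Add64Sketch, Value1, Value2 and Sum are hypothetical and chosen only for this illustration; SWI &11 is assumed to end the program, as in the other listings. Note that on the ARM the Carry flag is cleared, rather than set, when a subtraction borrows; the SBC instruction allows for this, so a SUBS/SBC pair can be used in the same way for subtraction.

```asm
* sketch only: 64 bit addition built from two 32 bit additions

        TTL     Add64Sketch
        AREA    Program, CODE, READONLY
        ENTRY

Main
        LDR     R6, =Value1             ;address of the first 64 bit number
        LDR     R0, [R6]                ;low word of the first number
        LDR     R1, [R6, #4]            ;high word of the first number
        LDR     R6, =Value2             ;address of the second 64 bit number
        LDR     R2, [R6]                ;low word of the second number
        LDR     R3, [R6, #4]            ;high word of the second number
        ADDS    R4, R0, R2              ;add low words, Carry flag records the carry out
        ADC     R5, R1, R3              ;add high words plus the Carry flag
        LDR     R6, =Sum
        STR     R4, [R6]                ;store low word of the sum
        STR     R5, [R6, #4]            ;store high word of the sum
        SWI     &11                     ;all done

        AREA    Data1, DATA
Value1  DCD     &FFFFFFFF, &00000001    ;low word first: &00000001FFFFFFFF
Value2  DCD     &00000001, &00000002    ;low word first: &0000000200000001
Sum     DCD     0, 0                    ;receives &0000000400000000

        END
```

The low words overflow (&FFFFFFFF + 1), so the ADDS sets the Carry flag and the ADC adds it into the high words: 1 + 2 + 1 = 4.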
Decimal arithmetic is a common enough task for microprocessors that most have special instructions
for this purpose. These instructions may either perform decimal operations directly or correct
the results of binary operations to the proper decimal form. Decimal arithmetic is essential in
such applications as point-of-sale terminals, check processors, order entry systems, and banking
terminals.
You can implement decimal multiplication and division as series of additions and subtractions,
respectively. Extra storage must be reserved for results, since a multiplication produces a result
twice as long as the operands. A division contracts the length of the result. Multiplications and
divisions are time-consuming when done in software because of the repeated operations that are
necessary.
Program 12.3: addbcd.s Add two packed BCD numbers to give a packed BCD result
* add two packed BCD numbers to give a packed BCD result

        TTL     Ch8Ex3
        AREA    Program, CODE, READONLY
        ENTRY

Mask    EQU     0x0000000F

Main
        LDR     R0, =Result             ;address for storage
        LDR     R1, BCDNum1             ;load the first BCD number
        LDR     R2, BCDNum2             ;and the second
        LDRB    R8, Length              ;init counter
        ADD     R0, R0, #3              ;adjust for offset
        MOV     R5, #0                  ;carry

Loop
        MOV     R3, R1                  ;copy what is left in the data register
        MOV     R4, R2                  ;and the other number
        AND     R3, R3, #Mask           ;mask out everything except low order nibble
        AND     R4, R4, #Mask           ;mask out everything except low order nibble
        MOV     R1, R1, LSR #4          ;shift the original number one nibble
        MOV     R2, R2, LSR #4          ;shift the original number one nibble
        ADD     R6, R3, R4              ;add the digits
        ADD     R6, R6, R5              ;and the carry
        CMP     R6, #0xA                ;is it 10 or more?
        BLT     RCarry1                 ;if not, reset the carry to 0
        MOV     R5, #1                  ;otherwise set the carry
        SUB     R6, R6, #0xA            ;and subtract 10
        B       Next
RCarry1
        MOV     R5, #0                  ;carry reset to 0

Next
        MOV     R3, R1                  ;copy what is left in the data register
        MOV     R4, R2                  ;and the other number
        AND     R3, R3, #Mask           ;mask out everything except low order nibble
        AND     R4, R4, #Mask           ;mask out everything except low order nibble
        MOV     R1, R1, LSR #4          ;shift the original number one nibble
        MOV     R2, R2, LSR #4          ;shift the original number one nibble
        ADD     R7, R3, R4              ;add the digits
        ADD     R7, R7, R5              ;and the carry
        CMP     R7, #0xA                ;is it 10 or more?
        BLT     RCarry2                 ;if not, reset the carry to 0
        MOV     R5, #1                  ;otherwise set the carry
        SUB     R7, R7, #0xA            ;and subtract 10
        B       Loopend

RCarry2
        MOV     R5, #0                  ;carry reset to 0
Loopend
        MOV     R7, R7, LSL #4          ;shift the second digit processed to the left
        ORR     R6, R6, R7              ;and OR in the first digit to the ls nibble
        STRB    R6, [R0], #-1           ;store the byte, and decrement address
        SUBS    R8, R8, #1              ;decrement loop counter
        BNE     Loop                    ;loop while > 0
        SWI     &11                     ;all done

        AREA    Data1, DATA
Length  DCB     &04
        ALIGN
BCDNum1 DCB     &36, &70, &19, &85      ;an 8 digit packed BCD number

        AREA    Data2, DATA
BCDNum2 DCB     &12, &66, &34, &59      ;another 8 digit packed BCD number

        AREA    Data3, DATA
Result  DCD     0                       ;storage for result

        END
12.1.4 Multiplication
16-Bit
32-Bit
Program 12.4b: mul32.s Multiply two 32 bit numbers to give a 64 bit result (corrupts R0 and R1)
* multiply two 32 bit numbers to give a 64 bit result
* (corrupts R0 and R1)

        TTL     Ch8Ex4
        AREA    Program, CODE, READONLY
        ENTRY

Main
        LDR     R0, Number1             ;load first number
        LDR     R1, Number2             ;and second
        LDR     R6, =Result             ;load the address of result
        MOV     R5, R0, LSR #16         ;top half of R0
        MOV     R3, R1, LSR #16         ;top half of R1
        BIC     R0, R0, R5, LSL #16     ;bottom half of R0
        BIC     R1, R1, R3, LSL #16     ;bottom half of R1
        MUL     R2, R0, R1              ;partial result: low * low
        MUL     R0, R3, R0              ;partial result: high * low
        MUL     R1, R5, R1              ;partial result: high * low
        MUL     R3, R5, R3              ;partial result: high * high
        ADDS    R0, R1, R0              ;add middle parts
        ADDCS   R3, R3, #&10000         ;add in any carry from above
        ADDS    R2, R2, R0, LSL #16     ;LSB 32 bits
        ADC     R3, R3, R0, LSR #16     ;MSB 32 bits

        STR     R2, [R6]                ;store LSB
        ADD     R6, R6, #4              ;increment pointer
        STR     R3, [R6]                ;store MSB
        SWI     &11                     ;all done

        AREA    Data1, DATA
Number1 DCD     &12345678               ;a 32 bit binary number
Number2 DCD     &ABCDEF01               ;another
        ALIGN

        AREA    Data2, DATA
Result  DCD     0, 0                    ;storage for the 64 bit result (two words)
        ALIGN

        END
12.2 Problems
Sample Problem:
That is,
2F5B856884C32546706C9567
− 14DF409885B81095A3BC1284
1A7C44CFFF0B14B0CCB082E3
Sample Problem:
That is,
367019857834
− 126634593269
240385264565
Output: PROD1 0000 72EC B8C2 5B60 (MULU product is 72ECB8C25B60₁₆)
        PROD2 0000 72EC B8C2 5B60 (Shift product is 72ECB8C25B60₁₆)
13 Tables and Lists
Tables and lists are two of the basic data structures used with all computers. We have already seen
tables used to perform code conversions and arithmetic. Tables may also be used to identify or
respond to commands and instructions, provide access to files or records, define the meaning of keys
or switches, and choose among alternate programs. Lists are usually less structured than tables.
Lists may record tasks that the processor must perform, messages or data that the processor must
record, or conditions that have changed or should be monitored.
Program 13.1a: insert.s Examine a table for a match - store a new entry at the end if no
match found
* examine a table for a match - store a new entry at
* the end if no match found

        TTL     Ch9Ex1
        AREA    Program, CODE, READONLY
        ENTRY

Main
        LDR     R0, List                ;load the start address of the list
        LDR     R1, NewItem             ;load the new item
        LDR     R3, [R0]                ;copy the list counter
        LDR     R2, [R0], #4            ;init counter and increment pointer
        LDR     R4, [R0], #4            ;get the first item
Loop
        CMP     R1, R4                  ;does the item match the list?
        BEQ     Done                    ;found it - finished
        SUBS    R2, R2, #1              ;no - decrement the counter
        LDR     R4, [R0], #4            ;get the next item
        BNE     Loop                    ;and loop

        SUB     R0, R0, #4              ;adjust the pointer
        ADD     R3, R3, #1              ;increment the number of items
        STR     R3, Start               ;and store it back
        STR     R1, [R0]                ;store the new item at the end of the list

Done    SWI     &11

        AREA    Data1, DATA
Start   DCD     &4                      ;length of list
        DCD     &5376                   ;items
        DCD     &7615
        DCD     &138A
        DCD     &21DC
Store   %       &20                     ;reserve &20 (32) bytes of storage

        AREA    Data2, DATA
NewItem DCD     &16FA
List    DCD     Start

        END
Program 13.1b: insert2.s Examine a table for a match - store a new entry if no match found
extends insert.s
* examine a table for a match - store a new entry if no match found
* extends Ch9Ex1

        TTL     Ch9Ex2
        AREA    Program, CODE, READONLY
        ENTRY

Main
        LDR     R0, List                ;load the start address of the list
        LDR     R1, NewItem             ;load the new item
        LDR     R3, [R0]                ;copy the list counter
        LDR     R2, [R0], #4            ;init counter and increment pointer
        CMP     R3, #0                  ;is it an empty list?
        BEQ     Insert                  ;yes - so store it
        LDR     R4, [R0], #4            ;not empty - move to 1st item
Loop
        CMP     R1, R4                  ;does the item match the list?
        BEQ     Done                    ;found it - finished
        SUBS    R2, R2, #1              ;no - decrement the counter
        LDR     R4, [R0], #4            ;get the next item
        BNE     Loop                    ;and loop

        SUB     R0, R0, #4              ;adjust the pointer
Insert  ADD     R3, R3, #1              ;incr list count
        STR     R3, Start               ;and store it
        STR     R1, [R0]                ;store new item at the end

Done    SWI     &11                     ;all done

        AREA    Data1, DATA
Start   DCD     &4                      ;length of list
        DCD     &5376                   ;items
        DCD     &7615
        DCD     &138A
        DCD     &21DC
Store   %       &20                     ;reserve &20 (32) bytes of storage

        AREA    Data2, DATA
NewItem DCD     &16FA
List    DCD     Start

        END
        ENTRY

Main
        LDR     R0, =NewItem            ;load the address past the list
        SUB     R0, R0, #4              ;adjust pointer to point at last element of list
        LDR     R1, NewItem             ;load the item to test
        LDR     R3, Start               ;init counter by reading index from list
        CMP     R3, #0                  ;are there zero items
        BEQ     Missing                 ;zero items in list - error condition
        LDR     R4, [R0], #-4           ;get the last item
Loop
        CMP     R1, R4                  ;does the item match the list?
        BEQ     Done                    ;found it - finished
        BHI     Missing                 ;if the one to test is higher, it's not in the list
        SUBS    R3, R3, #1              ;no - decr counter
        LDR     R4, [R0], #-4           ;get the next item
        BNE     Loop                    ;and loop
                                        ;if we get to here, it's not there either
Missing MOV     R3, #0xFFFFFFFF         ;flag it as missing

Done    STR     R3, Index               ;store the index (either index or -1)
        SWI     &11                     ;all done

        AREA    Data1, DATA
Start   DCD     &4                      ;length of list
        DCD     &0000138A               ;items
        DCD     &000A21DC
        DCD     &001F5376
        DCD     &09018613

        AREA    Data2, DATA
NewItem DCD     &001F5376
Index   DCD     0                       ;a full word, since STR stores 32 bits
List    DCD     Start

        END
* each item consists of a pointer to the next item, and some data
Item1   DCD     Item2                   ;pointer
        DCB     30, 20                  ;data

Item2   DCD     Item3                   ;pointer
        DCB     30, 0xFF                ;data

Item3   DCD     0                       ;pointer (NULL)
        DCB     30,&87,&65              ;data

        END
13.2 Problems
Sample Problems:
Test A Test B
Input Output Input Output
6C20432E
2054A346 D0102596
05723A64 3A64422B
12576C20 6C20432E
Sample Problems
Test A Test B
Input Output Input Output
ITEM
Table Table
7A35B310 7A35B310
LIST
Table 0005 No change
since ITEM
00000004 00000005
already in
09250037 09250037 09250037
7A35B310 list.
29567322 29567322 29567322
A356A101
A356A101 7A35B310
E235C203
E235C203 A356A101
E235C203
Sample Problem:
ITEM
item1 item1
23854760 23854760
QUEUE
item1 item2 item2
00000001 00000001
item2
00000123 00000123
item3
item3
00123456 00000000 00123456
23854760 00000000
Input Output
LENGTH
4A4B4C 13
00000004
LIST
4A4B41 37
414243 07 (ABC) (JKL)
444B41 3F
4A4B4C 13 (JKL) (JKA)
414243 07
4A4B41 37 (JKA) (DKA)
444B41 3F (DKA) (ABC)
Sample Problem:
INDEX 00000010
112 CHAPTER 14. THE STACK
15 Subroutines
None of the examples that we have shown thus far is a typical program that would stand by itself. Most
real programs perform a series of tasks, many of which may be used a number of times or be common to
other programs.
The standard method of producing programs which can be used in this manner is to write subroutines
that perform particular tasks. The resulting sequences of instructions can be written once, tested once,
and then used repeatedly.
There are special instructions for transferring control to subroutines and restoring control to the main
program. We often refer to the special instruction that transfers control to a subroutine as Call, Jump, or
Branch to a Subroutine. The special instruction that restores control to the main program is usually called
Return.
In the ARM the Branch-and-Link instruction (BL) is used to Branch to a Subroutine. This saves the
return address (the address of the instruction following the BL) in the Link Register (LR or R14) before
placing the starting address of the subroutine in the program counter (PC or R15). The ARM does not have
a standard Return from Subroutine instruction like other processors; rather, the programmer should copy
the value in the Link Register into the Program Counter in order to return to the instruction after the
Branch-and-Link instruction. Thus, to return from a subroutine you should use the instruction:
MOV PC, LR
Should the subroutine wish to call another subroutine it will have to save the value of the Link Register
before calling the nested subroutine.
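The following sketch illustrates one common way of preserving the Link Register across a nested call: the outer subroutine pushes LR onto a stack on entry and pops it directly into the PC on exit. The names NestedCallSketch, Outer, Double, Stack and StackEnd are hypothetical, R13 is assumed to be used as a full descending stack pointer, and SWI &11 is assumed to end the program as in the other listings.

```asm
* sketch only: a subroutine that calls another subroutine

        TTL     NestedCallSketch
        AREA    Program, CODE, READONLY
        ENTRY

Main
        LDR     R13, =StackEnd          ;initialise the stack pointer
        MOV     R0, #3                  ;value to work on
        BL      Outer                   ;LR = address of the SWI below
        SWI     &11                     ;all done, R0 = 7

Outer
        STMFD   R13!, {LR}              ;save return address, the BL below overwrites LR
        BL      Double                  ;nested call
        ADD     R0, R0, #1              ;R0 = (2 * 3) + 1
        LDMFD   R13!, {PC}              ;pop the saved return address straight into PC

Double
        ADD     R0, R0, R0              ;leaf routine, LR is left untouched
        MOV     PC, LR                  ;return from subroutine

        AREA    Stack1, DATA
Stack   %       40                      ;reserve 40 bytes of memory for the stack
StackEnd
        DCD     0

        END
```

A leaf routine such as Double, which calls nothing else, can return with MOV PC, LR without saving anything.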
Relocatable
The code can be placed anywhere in memory. You can use such a subroutine easily, regardless of
other programs or the arrangement of the memory. A relocating loader is necessary to place the
program in memory properly; the loader will start the program after other programs and will add
the starting address or relocation constant to all addresses in the program.
Position Independent
The code does not require a relocating loader; all program addresses are expressed relative to the
program counter's current value. Data addresses are held in registers at all times. We will discuss
the writing of position independent code later in this chapter.
Reentrant
The subroutine can be interrupted and called by the interrupting program, giving the correct results
for both the interrupting and interrupted programs. Reentrant subroutines are required for
event-based systems such as a multitasking operating system (Windows or Unix) and embedded
real time environments. It is not difficult to make a subroutine reentrant. The only requirement is
that the subroutine uses just registers and the stack for its data storage, and the subroutine is self
contained in that it does not use any value defined outside of the routine (global values).
Recursive
The subroutine can call itself. Such a subroutine clearly must also be reentrant.
Most programs consist of a main program and several subroutines. This is useful as you can use known
pre-written routines when available and you can debug and test the other subroutines properly and remember
their exact effects on registers and memory locations.
You should provide sufficient documentation such that users need not examine the subroutine's internal
structure. Among necessary specifications are:
In order to be really useful, a subroutine must be general. For example, a subroutine that can perform
only a specialized task, such as looking for a particular letter in an input string of fixed length, will not
be very useful. If, on the other hand, the subroutine can look for any letter, in strings of any length, it
will be far more helpful.
In order to provide subroutines with this flexibility, it is necessary to provide them with the ability to
receive various kinds of information. We call data or addresses that we provide the subroutine parameters.
An important part of writing subroutines is providing for transferring the parameters to the subroutine.
This process is called Parameter Passing.
The registers often provide a fast, convenient way of passing parameters and returning results. The
limitations of this method are that it cannot be expanded beyond the number of available registers; it
often results in unforeseen side effects; and it lacks generality.
The trade-off here is between fast execution time and a more general approach. Such a trade-off is common
in computer applications at all levels. General approaches are easy to learn and consistent; they can be
automated through the use of macros. On the other hand, approaches that take advantage of the specific
features of a particular task require less time and memory. The choice of one approach over the other
depends on your application, but you should take the general approach (saving programming time and
simplifying documentation and maintenance) unless time or memory constraints force you to do otherwise.
Using this method of parameter passing, the subroutine can simply assume that the parameters are
there. Results can also be returned in registers, or the addresses of locations for results can be passed as
parameters via the registers. Of course, this technique is limited by the number of registers available.
Processor features such as register indirect addressing, indexed addressing, and the ability to use any
register as a stack pointer allow far more powerful and general ways of passing parameters.
If you place the parameter block immediately after the subroutine call, the address of the parameter
block is automatically placed into the Link Register by the Branch and Link instruction. The subroutine
must modify the return address in the Link Register in addition to fetching the parameters. Using this
technique, our example would be modified as follows:
BL Subr
DCD BufferLen ;Buffer Length
DCD BufferA ;Buffer A starting address
DCD BufferB ;Buffer B starting address
The subroutine saves the prior contents of the CPU registers, then loads the parameters and adjusts the
return address as follows:
The addressing mode [LR], #4 will read the value at the address pointed to by the Link Register and then
move the register on by four bytes. Thus at the end of this sequence the value of LR has been updated to
point to the next instruction after the parameter block.
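A minimal sketch of this technique is given below. It is an illustration written for this discussion rather than the original listing: the names InlineParamSketch, Copy, Stack and StackEnd, the buffer contents, and the choice of a byte copy as the subroutine's task are all assumptions, and BufferLen is assumed to be a constant defined with EQU. R13 is assumed to be a full descending stack pointer.

```asm
* sketch only: parameters placed immediately after the BL instruction

        TTL     InlineParamSketch
        AREA    Program, CODE, READONLY
        ENTRY

BufferLen EQU   4                       ;hypothetical buffer length in bytes

Main
        LDR     R13, =StackEnd          ;initialise the stack pointer
        BL      Subr                    ;LR = address of the parameter block
        DCD     BufferLen               ;Buffer Length
        DCD     BufferA                 ;Buffer A starting address
        DCD     BufferB                 ;Buffer B starting address
        SWI     &11                     ;Subr returns here, past the block

Subr    STMFD   R13!, {R0-R3}           ;save working registers
        LDR     R0, [LR], #4            ;Buffer Length, LR steps past it
        LDR     R1, [LR], #4            ;Buffer A starting address
        LDR     R2, [LR], #4            ;Buffer B starting address, LR now at the SWI
Copy    LDRB    R3, [R1], #1            ;copy one byte from Buffer A
        STRB    R3, [R2], #1            ;to Buffer B
        SUBS    R0, R0, #1              ;decrement the byte count
        BNE     Copy                    ;loop until all bytes are copied
        LDMFD   R13!, {R0-R3}           ;recover working registers
        MOV     PC, LR                  ;return to the adjusted address

        AREA    Data1, DATA
BufferA DCB     1, 2, 3, 4              ;source bytes
BufferB %       4                       ;destination, receives 1, 2, 3, 4

        AREA    Stack1, DATA
Stack   %       40                      ;reserve 40 bytes of memory for the stack
StackEnd
        DCD     0

        END
```

Because each LDR uses post-indexing on LR, no separate instruction is needed to skip the parameter block before returning.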
This parameter passing technique has the advantage of being easy to read. It has, however, the
disadvantage of requiring parameters to be fixed when the program is written. Passing the address of the
parameter block in via a register allows the parameters to be changed as the program is running.
The subroutine must begin by loading parameters into CPU registers as follows:
Subr    STMFD   R13!, {R0, R1, R2, R14} ; save working registers to stack
        LDR     R0, [R12, #0]           ; Buffer Length in R0
        LDR     R1, [R12, #4]           ; Buffer A starting address
        LDR     R2, [R12, #8]           ; Buffer B starting address
        ...                             ; Main function of subroutine
        LDMFD   R13!, {R0, R1, R2, R14} ; Recover working registers
        MOV     PC, LR                  ; Return to caller
In this approach, all parameters are passed and results are returned on the stack.
The stack grows downward (toward lower addresses). This occurs because elements are pushed onto the
stack using the pre-decrement address mode. The use of the pre-decrement mode causes the stack pointer
to always contain the address of the last occupied location, rather than the next empty one as on some
other microprocessors. This implies that you must initialise the stack pointer to a value higher than the
largest address in the stack area.
When passing parameters on the stack, the programmer must implement this approach as follows:
1. Decrement the system stack pointer to make room for parameters on the system stack, and store
them using offsets from the stack pointer, or simply push the parameters on the stack.
2. Access the parameters by means of offsets from the system stack pointer.
3. Store the results on the stack by means of offsets from the system stack pointer.
4. Clean up the stack before or after returning from the subroutine, so that the parameters are removed
and the results are handled appropriately.
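The four steps above can be sketched as follows. This is an illustration under stated assumptions, not a listing from the text: the names StackParamSketch, AddTwo, Stack and StackEnd are hypothetical, R13 is assumed to be the system stack pointer, and the stack is full descending as described above.

```asm
* sketch only: parameters and result passed on the stack

        TTL     StackParamSketch
        AREA    Program, CODE, READONLY
        ENTRY

Main
        LDR     R13, =StackEnd          ;stack pointer starts above the stack area
        MOV     R0, #20                 ;first parameter
        MOV     R1, #22                 ;second parameter
        SUB     R13, R13, #4            ;step 1: make room for the result
        STMFD   R13!, {R0, R1}          ;step 1: push the two parameters
        BL      AddTwo
        ADD     R13, R13, #8            ;step 4: remove the parameters
        LDMFD   R13!, {R2}              ;step 4: pop the result, R2 = 42
        SWI     &11                     ;all done

AddTwo
        STMFD   R13!, {R4, R5}          ;save working registers (8 more bytes)
        LDR     R4, [R13, #8]           ;step 2: first parameter
        LDR     R5, [R13, #12]          ;step 2: second parameter
        ADD     R4, R4, R5              ;the work of the subroutine
        STR     R4, [R13, #16]          ;step 3: store result in the reserved word
        LDMFD   R13!, {R4, R5}          ;recover working registers
        MOV     PC, LR                  ;return from subroutine

        AREA    Stack1, DATA
Stack   %       40                      ;reserve 40 bytes of memory for the stack
StackEnd
        DCD     0

        END
```

The offsets inside AddTwo are 8 bytes larger than they would be at the moment of the call, because the subroutine has pushed two working registers of its own.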
Regardless of our approach to passing parameters, we can specify the parameters in a variety of ways. For
example, we can:
pass-by-value
Where the actual values are placed in the parameter list. The name comes from the fact that it is
only the value of the parameter that is passed into the subroutine rather than the parameter itself.
This is the method used by most high level programming languages.
pass-by-reference
The addresses of the parameters are placed in the parameter list. The subroutine can access the value
directly rather than a copy of the parameter. This is much more dangerous as the subroutine can
change a value you don't want it to.
pass-by-name
Rather than passing either the value or a reference to the value, a string containing the name of the
parameter is passed. This is used by very high level languages or scripting languages. This is very
flexible but rather time consuming as we need to look up the value associated with the variable
name every time we wish to access the variable.
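The difference between the first two methods can be sketched with a single variable passed both ways in R0. The names ByValueRefSketch, IncValue, IncRef and Counter are hypothetical and used only for this illustration.

```asm
* sketch only: pass-by-value compared with pass-by-reference

        TTL     ByValueRefSketch
        AREA    Program, CODE, READONLY
        ENTRY

Main
        LDR     R0, Counter             ;pass-by-value: R0 holds a copy (5)
        BL      IncValue                ;R0 = 6, Counter in memory is still 5
        LDR     R0, =Counter            ;pass-by-reference: R0 holds the address
        BL      IncRef                  ;Counter in memory is now 6
        SWI     &11                     ;all done

IncValue
        ADD     R0, R0, #1              ;changes only the copy in R0
        MOV     PC, LR                  ;return from subroutine

IncRef
        LDR     R1, [R0]                ;fetch the caller's variable (corrupts R1)
        ADD     R1, R1, #1
        STR     R1, [R0]                ;and change it in place
        MOV     PC, LR                  ;return from subroutine

        AREA    Data1, DATA
Counter DCD     5

        END
```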
        AREA    Stack1, DATA
Value1  DCD     0xFFFF
Value2  DCD     0xDDDD
Value3  DCD     0xAAAA
Value4  DCD     0x3333

        AREA    Data2, DATA
Stack   %       40                      ;reserve 40 bytes of memory for the stack
StackEnd
        DCD     0

        END
* =========================
* Hexdigit subroutine
* =========================

* Purpose
* Hexdigit subroutine converts a Hex digit to an ASCII character
*
* Initial Condition
* R0 contains a value in the range 00 ... 0F
*
* Final Condition
* R0 contains ASCII character in the range '0' ... '9' or 'A' ... 'F'
*
* Registers changed
* R0 only
*
* Sample case
* Initial condition R0 = 6
* Final condition R0 = 36 ('6')

Hexdigit
        CMP     R0, #0xA                ;is it > 9
        BLT     Addz                    ;if not skip the next
        ADD     R0, R0, #"A" - "0" - 0xA ;adjust for A .. F

Addz
        ADD     R0, R0, #"0"            ;convert to ASCII
        MOV     PC, LR                  ;return from subroutine

        AREA    Data1, DATA
HDigit  DCB     6                       ;digit to convert
AChar   DCB     0                       ;storage for ASCII character

        END
Program 15.1f: bystack.s A more complex subroutine example program passes variables to the routine
using the stack
        END
15.6 Problems
Write both a calling program for the sample problem and at least one properly documented subroutine
for each problem.
Test A Test B
Input: R0 43 `C' 36 `6'
Output: R0 0C 06
Input: R0 String
STRING 42 `B'
32 `2'
46 `F'
30 `0'
Output: R0 0000B2F0
Sample Problems:
Sample Problems:
Test A B
R1 String
String
Input: 6100
43 `C' 32 `2'
61 `a' 50 `P'
74 `t' 49 `I'
String String
0D CR 0D CR
Output: R1 + 4 + 0
(CR) (2)
Test A Test B
R0 String String
String
Input:
03 03
47 47
AF AF
18 19
Output: Z 00 FF
Note that 19₁₆ is 0001 1001₂, which has three 1 bits and thus has odd parity.
String String
Test A Test B
R0
String
Input:
03 (Length) 03 (Length)
41 (`A') 61 (`a')
42 (`B') 62 (`b')
126 CHAPTER 16. INTERRUPTS AND EXCEPTIONS
A ARM Instruction Definitions
Operation
A Register Transfer Language (RTL) / pseudo-code description of what the instruction does. For
details of the register transfer language, see section ?? on page ??.
Syntax
<cc> Condition Codes
<op1> Data Movement Addressing Modes
<op2> Memory Addressing Modes
<S> Set Flags bit
Description
Written description of what the instruction does. This will interpret the formal description given
in the operation part. It will also describe any additional notations used in the Syntax part.
Exceptions
This gives details of which exceptions can occur during the instruction. Prefetch Abort is not listed
because it can occur for any instruction.
Usage
Suggestions and other information relating to how an instruction can be used effectively.
Condition Codes
Indicates what happens to the CPU Condition Code Flags if the set flags option were to be set.
Notes
Contains any additional explanation that we cannot fit into the previous categories.
Appendix B provides a summary of the more common instructions in a more compact manner, using the
operation section only.
ADC Add with Carry
Operation <cc>: Rd ← Rn + <op1> + CPSR(C)
          <cc><S>: CPSR ← ALU(Flags)
Syntax ADC<cc><S> Rd, Rn, <op1>
Description The ADC (Add with Carry) instruction adds the value of <op1> and the Carry flag to the
value of Rn and stores the result in Rd. The condition code flags are optionally updated,
based on the result.
ADDS R4,R0,R2
ADC R5,R1,R3
ADC R5,R1,R3
to:
ADCS R5,R1,R3
the resulting values of the flags indicate:
The following instruction produces a single-bit Rotate Left with Extend operation (33-bit
rotate through the Carry flag) on R0:
ADCS R0,R0,R0
See Data-processing operands - Rotate right with extend for information on how to perform
a similar rotation to the right.
Condition Codes
The N and Z flags are set according to the result of the addition, and the C and V flags are
set according to whether the addition generated a carry (unsigned overflow) and a signed
overflow, respectively.
ADD Add
Operation <cc>: Rd ← Rn + <op1>
          <cc><S>: CPSR ← ALU(Flags)
Syntax ADD<cc><S> Rd, Rn, <op1>
Description Adds the value of <op1> to the value of register Rn, and stores the result in the destination
register Rd. The condition code flags are optionally updated, based on the result.
Usage The ADD instruction is used to add two values together to produce a third.
Condition Codes
The N and Z flags are set according to the result of the addition, and the C and V flags are
set according to whether the addition generated a carry (unsigned overflow) and a signed
overflow, respectively.
AND Bitwise AND
Operation <cc>: Rd ← Rn ∧ <op1>
          <cc><S>: CPSR ← ALU(Flags)
Syntax AND<cc><S> Rd, Rn, <op1>
Usage AND is most useful for extracting a field from a register, by ANDing the register with a
mask value that has 1s in the field to be extracted, and 0s elsewhere.
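As a small illustration of field extraction, the sketch below isolates bits 11 to 8 of a word and then moves them down to the bottom of the register. The names FieldSketch and Value and the choice of field are assumptions made for this example only.

```asm
* sketch only: extract a four bit field with AND and a mask

        TTL     FieldSketch
        AREA    Program, CODE, READONLY
        ENTRY

Main
        LDR     R0, Value               ;R0 = &12345678
        AND     R1, R0, #&F00           ;1s in bits 11..8 only, R1 = &00000600
        MOV     R1, R1, LSR #8          ;move the field down, R1 = &00000006
        SWI     &11                     ;all done

        AREA    Data1, DATA
Value   DCD     &12345678

        END
```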
Condition Codes
The N and Z flags are set according to the result of the operation, and the C flag is set to
the carry output generated by <op1> (see 5.1 on page 45). The V flag is unaffected.
B, BL Branch, Branch and Link
Operation <cc><L>: LR ← PC − 4
          <cc>: PC ← PC + <offset>
Syntax B<L><cc> <offset>
Description The B (Branch) and BL (Branch and Link) instructions cause a branch to a target address,
and provide both conditional and unconditional changes to program flow.
The BL (Branch and Link) instruction stores a return address, the address of the instruction
following the BL, in the link register (LR or R14).
The <offset> specifies the target address of the branch. The address of the next instruction
is calculated by adding the offset to the program counter (PC), which contains the address
of the branch instruction plus 8.
• Storing a group of registers and R14 to the stack on subroutine entry, using an
instruction of the form:
Notes Branching backwards past location zero and forwards over the end of the 32-bit address
space is UNPREDICTABLE.
CMP Compare
Operation <cc>: ALU(0) ← Rn - <op1>
          <cc>: CPSR ← ALU(Flags)
Syntax CMP<cc> Rn, <op1>
Description The CMP (Compare) instruction compares a register value with another arithmetic value.
The condition flags are updated, based on the result of subtracting <op1> from Rn, so that
subsequent instructions can be conditionally executed.
Condition Codes
The N and Z flags are set according to the result of the subtraction, and the C and V flags
are set according to whether the subtraction generated a borrow (unsigned underflow) and
a signed overflow, respectively.
EOR Exclusive OR
Operation <cc>: Rd ← Rn ⊕ <op1>
          <cc><S>: CPSR ← ALU(Flags)
Syntax EOR<cc><S> Rd, Rn, <op1>
Description The EOR (Exclusive OR) instruction performs a bitwise Exclusive-OR of the value of
register Rn with the value of <op1>, and stores the result in the destination register Rd.
The condition code flags are optionally updated, based on the result.
Usage EOR can be used to invert selected bits in a register. For each bit, EOR with 1 inverts
that bit, and EOR with 0 leaves it unchanged.
Condition Codes
The N and Z flags are set according to the result of the operation, and the C flag is set to
the carry output bit generated by the shifter. The V flag is unaffected.
LDM Load Multiple
Operation if <cc>
    IA: addr ← Rn
    IB: addr ← Rn + 4
    DA: addr ← Rn - (#<registers> * 4) + 4
    DB: addr ← Rn - (#<registers> * 4)
    for each register Ri in <registers>
        IB: addr ← addr + 4
        DB: addr ← addr - 4
        Ri ← M(addr)
        IA: addr ← addr + 4
        DA: addr ← addr - 4
    <!>: Rn ← addr
Syntax LDM<cc><mode> Rn<!>, {<registers>}
Description The LDM (Load Multiple) instruction is useful for block loads, stack operations and
procedure exit sequences. It loads a subset, or possibly all, of the general-purpose registers
from sequential memory locations.
The general-purpose registers loaded can include the PC. If they do, the word loaded for
the PC is treated as an address and a branch occurs to that address.
The register Rn points to the memory location to load the values from. Each of the registers
listed in <registers> is loaded in turn, reading each value from the next memory address as
directed by <mode>, one of:
IB Increment Before
DB Decrement Before
IA Increment After
DA Decrement After
The base register writeback option (<!>) causes the base register to be updated with the
final address calculated during the transfer.
The registers are loaded in sequence, the lowest-numbered register from the lowest memory
address, through to the highest-numbered register from the highest memory address.
If the PC (R15) is specified in the register list, the instruction causes a branch to the address
loaded into the PC.
Exceptions Data Abort
Condition Codes
The condition codes are not affected by this instruction.
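As an illustration of a block load, the sketch below uses LDM in Increment After mode with writeback to read four consecutive words into four registers. The names BlockLoadSketch and Table and the data values are assumptions made for this example.

```asm
* sketch only: load four words with a single LDM instruction

        TTL     BlockLoadSketch
        AREA    Program, CODE, READONLY
        ENTRY

Main
        LDR     R0, =Table              ;R0 points at the first word
        LDMIA   R0!, {R4-R7}            ;R4 = &11, R5 = &22, R6 = &33, R7 = &44
                                        ;writeback leaves R0 = Table + 16
        SWI     &11                     ;all done

        AREA    Data1, DATA
Table   DCD     &11, &22, &33, &44

        END
```

The lowest-numbered register (R4) receives the word at the lowest address, whatever order the registers are written in the list.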
LDR Load Register
Operation <cc>: Rd ← M(<op2>)
Syntax LDR<cc> Rd, <op2>
Description The LDR (Load Register) instruction loads a word from the memory address calculated by
<op2> and writes it to register Rd.
If the PC is specified as register Rd, the instruction loads a data word which it treats as an
address, then branches to that address.
Usage Using the PC as the base register allows PC-relative addressing, which facilitates
position-independent code. Combined with a suitable addressing mode, LDR allows 32-bit memory
data to be loaded into a general-purpose register where its value can be manipulated. If
the destination register is the PC, this instruction loads a 32-bit address from memory and
branches to that address.
To synthesize a Branch with Link, precede the LDR instruction with MOV LR, PC.
Condition Codes
The condition codes are not affected by this instruction.
Notes If <op2> specifies an address that is not word-aligned, the instruction attempts to load a
byte. The result is UNPREDICTABLE and the LDRB instruction should be used.
If <op2> specifies base register writeback (!), and the same register is specified for Rd and
Rn, the results are UNPREDICTABLE.
If the PC (R15) is specified for Rd, the value must be word aligned, otherwise the result is
UNPREDICTABLE.
LDRB Load Register Byte
Operation <cc>: Rd(7:0) ← M(<op2>)
          <cc>: Rd(31:8) ← 0
Syntax LDR<cc>B Rd, <op2>
Description The LDRB (Load Register Byte) instruction loads a byte from the memory address calculated
by <op2>, zero-extends the byte to a 32-bit word, and writes the word to register Rd.
Exceptions Data Abort
Usage LDRB allows 8-bit memory data to be loaded into a general-purpose register where it can
be manipulated.
Using the PC as the base register allows PC-relative addressing, to facilitate
position-independent code.
Condition Codes
The condition codes are not affected by this instruction.
MOV Move
Operation ⟨cc⟩: Rd ← ⟨op1⟩
          ⟨cc⟩⟨S⟩: CPSR ← ALU(Flags)
Syntax MOV⟨cc⟩⟨S⟩ Rd, ⟨op1⟩
Description The MOV (Move) instruction moves the value of ⟨op1⟩ to the destination register Rd. The
condition code flags are optionally updated, based on the result.
Usage Typical uses of MOV include:
• Performing a shift without any other arithmetic or logical operation. A left shift by n
can be used to multiply by 2^n.
• Branching: when the PC is the destination of the instruction, a branch occurs. The instruction:
MOV PC, LR
can therefore be used to return from a subroutine (see instructions B and BL on
page 129).
Condition Codes
The N and Z flags are set according to the value moved (post-shift if a shift is specified),
and the C flag is set to the carry output bit generated by the shifter (see 5.1 on page 45).
The V flag is unaffected.
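As a rough illustration of the shift-as-multiply use and of the flag rules just described, the following Python sketch models MOVS Rd, Rm, LSL #n on 32-bit values. The helper name `movs_lsl` is hypothetical, and the sketch deliberately handles only left shifts of 1 to 31 bits.

```python
MASK32 = 0xFFFFFFFF


def movs_lsl(rm: int, shift: int):
    """Model MOVS Rd, Rm, LSL #shift for 1 <= shift <= 31 (hypothetical helper).

    Returns the 32-bit result and the N, Z and C flags. V is left out
    because MOV does not change it.
    """
    if not 1 <= shift <= 31:
        raise ValueError("this sketch only handles shifts of 1..31")
    rm &= MASK32
    rd = (rm << shift) & MASK32
    flags = {
        "N": (rd >> 31) & 1,            # top bit of the result
        "Z": int(rd == 0),              # result is zero
        "C": (rm >> (32 - shift)) & 1,  # last bit shifted out of the register
    }
    return rd, flags


rd, flags = movs_lsl(5, 3)  # 5 shifted left by 3 is 5 * 2**3
```

The result only equals Rm times 2^n while no significant bits are shifted out of the 32-bit register.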
MVN Move Negative
Operation ⟨cc⟩: Rd ← NOT(⟨op1⟩)
          ⟨cc⟩⟨S⟩: CPSR ← ALU(Flags)
Syntax MVN⟨cc⟩⟨S⟩ Rd, ⟨op1⟩
Description The MVN (Move Negative) instruction moves the logical one's complement of the value of
⟨op1⟩ to the destination register Rd. The condition code flags are optionally updated,
based on the result.
Condition Codes
The N and Z flags are set according to the result of the operation, and the C flag is set to
the carry output bit generated by the shifter (see 5.1 on page 45). The V flag is unaffected.
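The one's complement that MVN writes can be modelled in Python as a bitwise NOT masked to 32 bits. This is a minimal sketch; the helper name `mvn` is an assumption made for the example.

```python
MASK32 = 0xFFFFFFFF


def mvn(op1: int) -> int:
    """Model MVN Rd, <op1>: the 32-bit one's complement of op1."""
    return ~op1 & MASK32


all_ones = mvn(0)   # every bit set, which reads as -1 in two's complement
minus_six = mvn(5)  # NOT 5 is the bit pattern of -6 in two's complement
```

This shows why MVN is a convenient way to load small negative constants: the value written is the bit pattern of -(op1 + 1).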
ORR Bitwise OR
Operation ⟨cc⟩: Rd ← Rn ∨ ⟨op1⟩
          ⟨cc⟩⟨S⟩: CPSR ← ALU(Flags)
Syntax ORR⟨cc⟩⟨S⟩ Rd, Rn, ⟨op1⟩
Description The ORR (Logical OR) instruction performs a bitwise (inclusive) OR of the value of register
Rn with the value of ⟨op1⟩, and stores the result in the destination register Rd. The
condition code flags are optionally updated, based on the result.
Usage ORR can be used to set selected bits in a register. For each bit, OR with 1 sets the bit, and
OR with 0 leaves it unchanged.
Condition Codes
The N and Z flags are set according to the result of the operation, and the C flag is set to
the carry output bit generated by the shifter (see 5.1 on page 45). The V flag is unaffected.
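The bit-setting rule in the Usage paragraph (OR with 1 sets a bit, OR with 0 leaves it unchanged) can be checked with a short Python model. The helper name `orr` and the sample bit patterns are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF


def orr(rn: int, op1: int) -> int:
    """Model ORR Rd, Rn, <op1> on 32-bit values."""
    return (rn | op1) & MASK32


status = 0b0101
with_bit1 = orr(status, 0b0010)  # bit 1 becomes 1, other bits unchanged
unchanged = orr(status, 0b0000)  # OR with 0 changes nothing
already = orr(status, 0b0001)    # setting a bit that is already 1 is harmless
```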
SBC Subtract with Carry
Operation ⟨cc⟩: Rd ← Rn - ⟨op1⟩ - NOT(CPSR(C))
          ⟨cc⟩⟨S⟩: CPSR ← ALU(Flags)
Syntax SBC⟨cc⟩⟨S⟩ Rd, Rn, ⟨op1⟩
Description The SBC (Subtract with Carry) instruction is used to synthesize multi-word subtraction.
SBC subtracts the value of ⟨op1⟩ and the value of NOT(Carry flag) from the value of
register Rn, and stores the result in the destination register Rd. The condition code flags
are optionally updated, based on the result.
Usage If register pairs R0,R1 and R2,R3 hold 64-bit values (R0 and R2 hold the least significant
words), the following instructions leave the 64-bit difference in R4,R5:
SUBS R4,R0,R2
SBC R5,R1,R3
Condition Codes
The N and Z flags are set according to the result of the subtraction, and the C and V flags
are set according to whether the subtraction generated a borrow (unsigned underflow) and
a signed overflow, respectively.
Notes If ⟨S⟩ is specified, the C flag is set to:
1 if no borrow occurs
0 if a borrow occurs
In other words, the C flag is used as a NOT(borrow) flag. This inversion of the borrow
condition is usually compensated for by subsequent instructions. For example:
• The SBC and RSC instructions use the C flag as a NOT(borrow) operand, performing
a normal subtraction if C == 1 and subtracting one more than usual if C == 0.
• The HS (unsigned higher or same) and LO (unsigned lower) conditions are equivalent
to CS (carry set) and CC (carry clear) respectively.
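The SUBS and SBC pairing for 64-bit subtraction, together with the NOT(borrow) convention for the C flag, can be traced with a small Python model. The helper names `subs`, `sbc` and `sub64` are assumptions made for this sketch; the register roles follow the R0 to R5 example in the Usage paragraph.

```python
MASK32 = 0xFFFFFFFF


def subs(rn: int, op1: int):
    """Model SUBS: return the 32-bit result and the C flag (1 = no borrow)."""
    result = (rn - op1) & MASK32
    carry = int(rn >= op1)  # C is NOT(borrow)
    return result, carry


def sbc(rn: int, op1: int, carry: int) -> int:
    """Model SBC: Rd = Rn - op1 - NOT(C)."""
    return (rn - op1 - (1 - carry)) & MASK32


def sub64(a: int, b: int) -> int:
    """Subtract two 64-bit values using the SUBS/SBC pair from the text."""
    r0, r1 = a & MASK32, (a >> 32) & MASK32  # R0 = low word, R1 = high word
    r2, r3 = b & MASK32, (b >> 32) & MASK32  # R2 = low word, R3 = high word
    r4, carry = subs(r0, r2)                 # SUBS R4, R0, R2
    r5 = sbc(r1, r3, carry)                  # SBC  R5, R1, R3
    return (r5 << 32) | r4


difference = sub64(0x100000000, 1)  # the low word borrows from the high word
```

When the low words borrow, SUBS leaves C clear and SBC subtracts one more from the high words, which is exactly the compensation the Notes describe.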
STM Store Multiple
Operation if ⟨cc⟩
    IA: addr ← Rn
    IB: addr ← Rn + 4
    DA: addr ← Rn - (#⟨registers⟩ * 4) + 4
    DB: addr ← Rn - (#⟨registers⟩ * 4)
    for each register Ri in ⟨registers⟩, lowest-numbered first
        M(addr) ← Ri
        addr ← addr + 4
    ⟨!⟩ IA, IB: Rn ← Rn + (#⟨registers⟩ * 4)
    ⟨!⟩ DA, DB: Rn ← Rn - (#⟨registers⟩ * 4)
Syntax STM⟨cc⟩⟨mode⟩ Rn⟨!⟩, {⟨registers⟩}
Description The STM (Store Multiple) instruction stores a subset (or possibly all) of the general-purpose
registers to sequential memory locations.
The register Rn specifies the base register used to store the registers. Each register given
in ⟨registers⟩ is stored in turn, storing each register in the next memory address as directed
by ⟨mode⟩, which can be one of:
IB Increment Before
DB Decrement Before
IA Increment After
DA Decrement After
If the base register writeback option (⟨!⟩) is specified, the base register (Rn) is modified
with the new base address.
⟨registers⟩ is a list of registers, separated by commas, and specifies the set of registers to
be stored. The registers are stored in sequence, the lowest-numbered register to the lowest
memory address, through to the highest-numbered register to the highest memory address.
Usage STM is useful as a block store instruction (combined with LDM it allows efficient block copy)
and for stack operations. A single STM used in the entry sequence of a procedure can push the
return address and general-purpose register values on to the stack, updating the stack
pointer in the process.
Condition Codes
The condition codes are not affected by this instruction.
Notes If R15 (PC) is given as the base register (Rn), the result is UNPREDICTABLE.
If Rn is specified in ⟨registers⟩ and base register writeback (⟨!⟩) is specified:
• If Rn is the lowest-numbered register specified in ⟨registers⟩, the original value of Rn
is stored.
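The address arithmetic for the four modes can be sketched in Python. The model follows the rule stated in the Description, that the lowest-numbered register always goes to the lowest address, and it returns a map of addresses to register numbers rather than touching a real memory. The function name `stm` and its return shape are assumptions made for the example.

```python
def stm(mode: str, rn: int, registers, writeback: bool = False):
    """Model the addresses used by STM<mode> Rn{!}, {registers}.

    Returns (stores, new_rn), where stores maps each address written to the
    number of the register stored there.
    """
    n = len(registers)
    start = {
        "IA": rn,              # Increment After: first store at Rn
        "IB": rn + 4,          # Increment Before: first store at Rn + 4
        "DA": rn - 4 * n + 4,  # Decrement After: last store at Rn
        "DB": rn - 4 * n,      # Decrement Before: last store at Rn - 4
    }[mode]
    # Lowest-numbered register goes to the lowest address.
    stores = {start + 4 * i: reg for i, reg in enumerate(sorted(registers))}
    if writeback:
        rn = rn + 4 * n if mode in ("IA", "IB") else rn - 4 * n
    return stores, rn


# A typical procedure-entry push: STMDB SP!, {R4, R5, LR} with SP = 0x1000.
pushed, new_sp = stm("DB", 0x1000, [14, 4, 5], writeback=True)
```

In this example the three registers occupy 0xFF4, 0xFF8 and 0xFFC, and the stack pointer is left pointing at the lowest of them.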
STR Store Register
Operation ⟨cc⟩: M(⟨op2⟩) ← Rd
Syntax STR⟨cc⟩ Rd, ⟨op2⟩
Description The STR (Store Register) instruction stores a word from register Rd to the memory address
calculated by ⟨op2⟩.
Usage Combined with a suitable addressing mode, STR stores 32-bit data from a general-purpose
register into memory. Using the PC as the base register allows PC-relative addressing,
which facilitates position-independent code.
Condition Codes
The condition codes are not affected by this instruction.
Notes Using the PC as the source register (Rd) will cause an UNKNOWN value to be written.
If ⟨op2⟩ specifies base register writeback (!), and the same register is specified for Rd and
Rn, the results are UNPREDICTABLE.
The address calculated by ⟨op2⟩ must be word-aligned. The result of a store to a non-
word-aligned address is UNPREDICTABLE.
STRB Store Register Byte
Description The STRB (Store Register Byte) instruction stores a byte from the least significant byte of
register Rd to the memory address calculated by ⟨op2⟩.
Usage Combined with a suitable addressing mode, STRB writes the least significant byte of a
general-purpose register to memory. Using the PC as the base register allows PC-relative
addressing, which facilitates position-independent code.
Condition Codes
The condition codes are not affected by this instruction.
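To contrast the word store with the byte store, here is a minimal Python model. The helper names `str_word` and `strb`, the `bytearray` memory and the little-endian byte order are assumptions made for the illustration.

```python
MASK32 = 0xFFFFFFFF


def str_word(memory: bytearray, addr: int, rd: int) -> None:
    """Model STR: store all 32 bits of rd at a word-aligned address."""
    if addr % 4 != 0:
        # The text describes a non-word-aligned STR as UNPREDICTABLE.
        raise ValueError("unaligned word store")
    memory[addr:addr + 4] = (rd & MASK32).to_bytes(4, "little")


def strb(memory: bytearray, addr: int, rd: int) -> None:
    """Model STRB: store only the least significant byte of rd."""
    memory[addr] = rd & 0xFF


memory = bytearray(8)
str_word(memory, 0, 0x12345678)  # writes four bytes
strb(memory, 4, 0xABCD)          # writes only 0xCD; the 0xAB part is ignored
```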
SUB Subtract
Operation ⟨cc⟩: Rd ← Rn - ⟨op1⟩
          ⟨cc⟩⟨S⟩: CPSR ← ALU(Flags)
Syntax SUB⟨cc⟩⟨S⟩ Rd, Rn, ⟨op1⟩
Description Subtracts the value of ⟨op1⟩ from the value of register Rn, and stores the result in the
destination register Rd. The condition code flags are optionally updated, based on the
result.
Usage SUB is used to subtract one value from another to produce a third. To decrement a register
value (in Rx) use:
SUBS Rx, Rx, #1
SUBS is useful as a loop counter decrement, as the loop branch can test the flags for the
appropriate termination condition, without the need for a separate compare instruction such as:
CMP Rx, #0
The single SUBS both decrements the loop counter in Rx and checks whether it has reached zero.
Condition Codes
The N and Z flags are set according to the result of the subtraction, and the C and V flags
are set according to whether the subtraction generated a borrow (unsigned underflow) and
a signed overflow, respectively.
Notes If ⟨S⟩ is specified, the C flag is set to:
1 if no borrow occurs
0 if a borrow occurs
In other words, the C flag is used as a NOT(borrow) flag. This inversion of the borrow
condition is usually compensated for by subsequent instructions. For example:
• The SBC and RSC instructions use the C flag as a NOT(borrow) operand, performing
a normal subtraction if C == 1 and subtracting one more than usual if C == 0.
• The HS (unsigned higher or same) and LO (unsigned lower) conditions are equivalent
to CS (carry set) and CC (carry clear) respectively.
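The loop-counter idiom from the Usage paragraph and the flag rules from the Notes can both be traced with a small Python model. The helper names `subs` and `count_down` are hypothetical; `count_down` stands in for a loop that ends with SUBS Rx, Rx, #1 followed by a branch taken while Z is clear.

```python
MASK32 = 0xFFFFFFFF


def subs(rn: int, op1: int):
    """Model SUBS Rd, Rn, <op1>: return the result and the N, Z, C flags."""
    result = (rn - op1) & MASK32
    flags = {
        "N": (result >> 31) & 1,
        "Z": int(result == 0),
        "C": int(rn >= op1),  # 1 when no borrow occurs (HS), 0 on borrow (LO)
    }
    return result, flags


def count_down(start: int) -> int:
    """Count the passes of a loop controlled only by SUBS and the Z flag."""
    if start < 1:
        raise ValueError("this sketch expects a counter of at least 1")
    rx, passes = start, 0
    while True:
        passes += 1               # loop body would go here
        rx, flags = subs(rx, 1)   # SUBS Rx, Rx, #1
        if flags["Z"]:            # branch back only while Z is clear
            break
    return passes


passes = count_down(3)
```

No compare is needed inside the loop because the Z flag set by SUBS already says whether the counter has reached zero.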
SWI Software Interrupt
Operation ⟨cc⟩: R14_svc ← PC + 8
          ⟨cc⟩: SPSR_svc ← CPSR
          ⟨cc⟩: PC ← 0x00000008
Syntax SWI⟨cc⟩ ⟨value⟩
Description Causes a SWI exception (see 3.4 on page 29).
Usage The SWI instruction is used as an operating system service call. The method used to select
which operating system service is required is specified by the operating system, and the SWI
exception handler for the operating system determines and provides the requested service.
One typical method is:
• ⟨value⟩ specifies which service is required, and any parameters needed by the selected
service are passed in general-purpose registers.
Condition Codes
The flags will be affected by the operation of the software interrupt. It is not possible to
say how they will be affected. The status of the condition code flags after a
software interrupt is UNKNOWN.
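The idea of selecting a service by ⟨value⟩ and passing parameters in registers can be sketched as a lookup table in Python. Everything here is invented for illustration: the service numbers, the two services, and the dictionary standing in for the register file do not correspond to any real operating system.

```python
# Hypothetical services; a real operating system defines its own numbering.
def service_add(regs: dict) -> None:
    """Invented service 1: R0 <- R0 + R1."""
    regs["R0"] = (regs["R0"] + regs["R1"]) & 0xFFFFFFFF


def service_clear(regs: dict) -> None:
    """Invented service 2: R0 <- 0."""
    regs["R0"] = 0


SERVICES = {1: service_add, 2: service_clear}


def swi_handler(value: int, regs: dict) -> None:
    """Dispatch on the SWI value; parameters and results travel in regs."""
    service = SERVICES.get(value)
    if service is None:
        raise ValueError(f"unknown service number {value}")
    service(regs)


regs = {"R0": 2, "R1": 3}
swi_handler(1, regs)  # plays the role of SWI 1 with parameters in R0 and R1
```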
SWP Swap
Operation ⟨cc⟩: ALU(0) ← M(Rn)
          ⟨cc⟩: M(Rn) ← Rm
          ⟨cc⟩: Rd ← ALU(0)
Syntax SWP⟨cc⟩ Rd, Rm, [Rn]
Description Swaps a word between registers and memory. SWP loads a word from the memory address
given by the value of register Rn. The value of register Rm is then stored to the memory
address given by the value of Rn, and the original loaded value is written to register Rd . If
the same register is specified for Rd and Rm, this instruction swaps the value of the register
and the value at the memory address.
Condition Codes
The condition codes are not affected by this instruction.
Notes If a data abort is signaled on either the load access or the store access, the loaded value is
not written to Rd. If a data abort is signaled on the load access, the store access does
not occur.
SWPB Swap Byte
Operation ⟨cc⟩: ALU(0) ← M(Rn)
          ⟨cc⟩: M(Rn) ← Rm(7:0)
          ⟨cc⟩: Rd(7:0) ← ALU(0)
Syntax SWP⟨cc⟩B Rd, Rm, [Rn]
Description Swaps a byte between registers and memory. SWPB loads a byte from the memory address
given by the value of register Rn. The value of the least significant byte of register Rm is
stored to the memory address given by Rn, the original loaded value is zero-extended to a
32-bit word, and the word is written to register Rd. If the same register is specified for Rd
and Rm, this instruction swaps the value of the least significant byte of the register and
the byte value at the memory address.
Usage The SWPB instruction can be used to implement semaphores, in a similar manner to that
shown for SWP instructions in "Semaphore instructions".
Condition Codes
The condition codes are not affected by this instruction.
Notes If a data abort is signaled on either the load access or the store access, the loaded value is
not written to Rd. If a data abort is signaled on the load access, the store access does
not occur.
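One way to picture the semaphore use mentioned in the Usage paragraph is a single lock byte that is claimed by swapping a "locked" value into it and inspecting the old value. The Python below is a single-threaded sketch of that idea only. The names `swpb`, `try_acquire` and `release`, and the encoding 0 = free, 1 = locked, are assumptions; on real hardware the point is that the load and store happen as one instruction.

```python
FREE, LOCKED = 0, 1  # assumed encoding of the lock byte


def swpb(memory: bytearray, rn: int, rm: int) -> int:
    """Model SWPB Rd, Rm, [Rn]: return the old byte (zero-extended) and
    store the least significant byte of rm at address rn."""
    old = memory[rn]
    memory[rn] = rm & 0xFF
    return old


def try_acquire(memory: bytearray, addr: int) -> bool:
    """Swap LOCKED into the lock byte; we own the lock if it was FREE."""
    return swpb(memory, addr, LOCKED) == FREE


def release(memory: bytearray, addr: int) -> None:
    """Mark the lock byte as free again."""
    memory[addr] = FREE


memory = bytearray(4)
first = try_acquire(memory, 0)   # lock was free, so this attempt succeeds
second = try_acquire(memory, 0)  # lock is already held, so this one fails
release(memory, 0)
third = try_acquire(memory, 0)   # free again after release
```

A failed attempt is harmless because it only rewrites the value that was already in the lock byte.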
B ARM Instruction Summary
Addressing mode                Syntax                        Operation
Register Pre-indexed           [Rn, Rm]!                     ⟨op2⟩ ← Rn + Rm
                                                             Rn ← ⟨op2⟩
Immediate Post-indexed         [Rn], #±⟨value⟩               ⟨op2⟩ ← Rn
                                                             Rn ← Rn + IR(value)
Register Post-indexed          [Rn], Rm                      ⟨op2⟩ ← Rn
                                                             Rn ← Rn + Rm
Scaled Register Post-indexed   [Rn], Rm, ⟨shift⟩ #⟨value⟩    ⟨op2⟩ ← Rn
                                                             Rn ← Rn + Rm shift IR(value)
Where ⟨shift⟩ is one of: LSL, LSR, ASR, ROR or RRX and has the same effect as for ⟨op1⟩
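The difference between pre-indexed and post-indexed addressing is easiest to see by computing both the address used and the updated base register. The Python sketch below does this for the modes listed above. The function names are hypothetical, each returns the pair (address used, new Rn), and the scaled form models only LSL.

```python
MASK32 = 0xFFFFFFFF


def register_pre_indexed(rn: int, rm: int):
    """[Rn, Rm]! : the sum is used as the address and written back to Rn."""
    op2 = (rn + rm) & MASK32
    return op2, op2


def immediate_post_indexed(rn: int, value: int):
    """[Rn], #value : Rn is used as the address, then Rn <- Rn + value."""
    return rn, (rn + value) & MASK32


def register_post_indexed(rn: int, rm: int):
    """[Rn], Rm : Rn is used as the address, then Rn <- Rn + Rm."""
    return rn, (rn + rm) & MASK32


def scaled_register_post_indexed(rn: int, rm: int, lsl: int):
    """[Rn], Rm, LSL #lsl : Rn is the address, then Rn <- Rn + (Rm << lsl)."""
    return rn, (rn + ((rm << lsl) & MASK32)) & MASK32


# Walking a word array with [Rn], #4: each access uses Rn, then steps it on.
rn, addresses = 0x2000, []
for _ in range(3):
    addr, rn = immediate_post_indexed(rn, 4)
    addresses.append(addr)
```

Post-indexing suits array walks like this one because the base register is always left pointing at the next element.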
ARM Instructions
Instruction          Syntax                       Operation
Add with Carry       ADC⟨cc⟩⟨S⟩ Rd, Rn, ⟨op1⟩     ⟨cc⟩: Rd ← Rn + ⟨op1⟩ + CPSR(C)
Add                  ADD⟨cc⟩⟨S⟩ Rd, Rn, ⟨op1⟩     ⟨cc⟩: Rd ← Rn + ⟨op1⟩
Bitwise AND          AND⟨cc⟩⟨S⟩ Rd, Rn, ⟨op1⟩     ⟨cc⟩: Rd ← Rn & ⟨op1⟩
Branch               B⟨cc⟩ ⟨offset⟩               ⟨cc⟩: PC ← PC + ⟨offset⟩
Branch and Link      BL⟨cc⟩ ⟨offset⟩              ⟨cc⟩: LR ← PC + 8
Software Interrupt   SWI⟨cc⟩ ⟨value⟩
Swap                 SWP⟨cc⟩ Rd, Rm, [Rn]         ⟨cc⟩: Rd ← M(Rn)
                                                  ⟨cc⟩: M(Rn) ← Rm
Swap Byte            SWP⟨cc⟩B Rd, Rm, [Rn]        ⟨cc⟩: Rd(7:0) ← M(Rn)(7:0)
                                                  ⟨cc⟩: M(Rn)(7:0) ← Rm(7:0)
Index
Terminated . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81