Operating Systems
Author
Dr. Sukomal Pal
Associate Professor
Department of Computer Science & Engineering
IIT (BHU), Varanasi, UP
Reviewer
Dr. Sandeep Kumar
Associate Professor
Computer Science and Engineering Department,
IIT Roorkee, Uttarakhand
BOOK AUTHOR DETAILS
Sukomal Pal, Associate Professor, Department of Computer Science & Engineering, IIT (BHU),
Varanasi, UP - 221005
Email ID: [email protected]
July, 2023
ISBN : 978-81-963773-1-1
All rights reserved. No part of this work may be reproduced in any form, by mimeograph
or any other means, without permission in writing from the All India Council for Technical
Education (AICTE).
Further information about All India Council for Technical Education (AICTE) courses may be
obtained from the Council Office at Nelson Mandela Marg, Vasant Kunj, New Delhi-110070.
Printed and published by All India Council for Technical Education (AICTE), New Delhi.
Disclaimer: The website links provided by the author in this book are placed for informational, educational and reference purposes only. The Publisher does not endorse these website links or the views of the speakers / content of the said weblinks. In case of any dispute, all legal matters are to be settled under Delhi jurisdiction only.
ACKNOWLEDGEMENT
The author is grateful to the authorities of AICTE, particularly Prof. T. G. Sitharam, Chairman;
Dr. Abhay Jere, Vice-Chairman; Prof. Rajive Kumar, Member-Secretary; Dr. Ramesh
Unnikrishnan, Advisor-II and Dr. Sunil Luthra, Director, Training and Learning Bureau for their
planning to publish the books on Operating Systems. We sincerely acknowledge the valuable
contributions of the reviewer of the book, Dr Sandeep Kumar, Associate Professor, IIT Roorkee, for making it student-friendly.
Writing a book is always a dream in an academician’s mind – and my late father Damodar Pal
sowed the seed of inspiration in me. My PhD supervisor, Prof Mandar Mitra, introduced me to the realm of operating systems; he not only taught me how to teach the subject but also how to live and play with its different incarnations. At the onset of drafting the manuscript, he supplied some insightful material and necessary guidance. During the drafting, I was also assisted by many colleagues and students, mainly through their thoughtful suggestions and constructive feedback. I am also indebted to my student Shivam Solanki, who supplied me with a set of questions collected from different sources.
This book is an outcome of various suggestions of AICTE members, experts and authors who
shared their opinions and thoughts to further develop engineering education in our country.
Acknowledgements are due to the contributors and different workers in this field whose
published books, review articles, papers, photographs, footnotes, references and other valuable
information enriched us at the time of writing the book.
PREFACE
The book titled “Operating Systems” is an outcome of my teaching of operating systems courses. The motivation for writing this book is to expose engineering students to the fundamentals of operating systems as well as to enable them to gain an insight into the subject. Keeping in mind the purpose of wide coverage as well as the need to provide essential supplementary information, we have included the topics recommended by AICTE in a very systematic and orderly manner throughout the book. Efforts have been made to explain the fundamental concepts of the subject in the simplest possible way.
During the preparation of the manuscript, several standard textbooks were consulted, and questions were accordingly developed with answer keys and hints. Emphasis has also been laid on definitions and on explaining concepts with easy real-life examples that students can readily relate to. Each chapter ends with a summary and pointers to resources for further learning.
The book starts with an introduction to the concept of operating systems, placing it appropriately within the computer black box as a set of programs at the core of system software. It covers the evolution of the computer as well as that of the operating system side by side. In the second unit, some fundamental concepts like program, process and threads are developed, along with their activities and interactions. The third unit deals with co-ordination among different cooperating processes in a multiprogramming environment. Process synchronization is discussed in reasonable detail with sample pseudo-codes. One severe fall-out of concurrent execution of several co-operating processes is deadlock, when none of the processes / threads can proceed. The definition of deadlock, when it can form, how it can be avoided, and the remedies if it is formed are discussed in Unit 4. Memory is the second most important component of any computer. Program code and data are stored in different memory elements and brought to main memory and registers for processing. How operating systems manage this code and data in the main memory and cache is discussed in Unit 5. The final chapter deals with the management of peripherals and devices by an operating system.
Even though several books on operating systems are available in the market, this book provides
all the necessary introductory materials in a very concise manner. However, ‘Know More’
sections are also provided for the inquisitive students in each chapter. Questions are designed following Bloom’s taxonomy, incorporating the latest relevant ones from different competitive examinations.
I sincerely hope that the book will inspire the students to learn and discuss the ideas behind
operating systems and will surely contribute to the development of a solid foundation of the
subject. We would be thankful for all constructive comments and suggestions which will contribute
to the improvement of the future editions of the book. It gives me immense pleasure to place this
book in the hands of the teachers and students.
Dr. Sukomal Pal
OUTCOME BASED EDUCATION
For the implementation of an outcome-based education the first requirement is to develop an
outcome-based curriculum and incorporate an outcome-based assessment in the education
system. By going through outcome-based assessments evaluators will be able to evaluate whether
the students have achieved the outlined standard, specific and measurable outcomes. With the
proper incorporation of outcome-based education there will be a definite commitment to achieve
a minimum standard for all learners without giving up at any level. At the end of the programme
running with the aid of outcome-based education, a student will be able to arrive at the following
outcomes:
PO1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering
problems.
PO2. Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.
PO3. Design / development of solutions: Design solutions for complex engineering problems
and design system components or processes that meet the specified needs with
appropriate consideration for the public health and safety, and the cultural, societal, and
environmental considerations.
PO4. Conduct investigations of complex problems: Use research-based knowledge and
research methods including design of experiments, analysis and interpretation of data,
and synthesis of the information to provide valid conclusions.
PO5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and
modern engineering and IT tools including prediction and modeling to complex
engineering activities with an understanding of the limitations.
PO6. The engineer and society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent responsibilities
relevant to the professional engineering practice.
PO7. Environment and sustainability: Understand the impact of the professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of, and
need for sustainable development.
PO8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities
and norms of the engineering practice.
PO9. Individual and team work: Function effectively as an individual, and as a member or
leader in diverse teams, and in multidisciplinary settings.
PO10. Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as, being able to comprehend and
write effective reports and design documentation, make effective presentations, and give
and receive clear instructions.
PO11. Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member
and leader in a team, to manage projects and in multidisciplinary environments.
PO12. Life-long learning: Recognize the need for, and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological
change.
COURSE OUTCOMES
After completion of the course the students will be able to:
CO-1. Create processes and threads.
CO-2. Develop algorithms for process scheduling for a given specification of CPU utilization, throughput, turnaround time, waiting time and response time.
CO-3. For a given specification of memory organization, develop the techniques for optimally allocating memory to processes by increasing memory utilization and for improving the access time.
CO-4. Design and implement a file management system.
CO-5. For given I/O devices and an OS (specify), develop the I/O management functions in the OS as part of a uniform device abstraction by performing operations for synchronization between the CPU and I/O controllers.
GUIDELINES FOR TEACHERS
To implement Outcome Based Education (OBE) knowledge level and skill set of the students
should be enhanced. Teachers should take a major responsibility for the proper implementation
of OBE. Some of the responsibilities (not limited to) of the teachers in the OBE system may be as follows:
● Within reasonable constraints, they should manoeuvre time to the best advantage of all students.
● They should assess the students only upon certain defined criteria, without considering any other potential ineligibility to discriminate against them.
● They should try to grow the learning abilities of the students to a certain level before they leave the institute.
● They should try to ensure that all the students are equipped with quality knowledge as well as competence after they finish their education.
● They should always encourage the students to develop their ultimate performance capabilities.
● They should facilitate and encourage group work and team work to consolidate newer approaches.
● They should follow Bloom’s taxonomy in every part of the assessment.
Bloom’s Taxonomy

Level      | Teacher should check                       | Student should be able to     | Possible Mode of Assessment
Create     | Students’ ability to create                | Design or Create              | Mini project
Evaluate   | Students’ ability to justify               | Argue or Defend               | Assignment
Analyse    | Students’ ability to distinguish           | Differentiate or Distinguish  | Project / Lab Methodology
Apply      | Students’ ability to use information       | Operate or Demonstrate        | Technical Presentation / Demonstration
Understand | Students’ ability to explain the ideas     | Explain or Classify           | Presentation / Seminar
Remember   | Students’ ability to recall (or remember)  | Define or Recall              | Quiz
GUIDELINES FOR STUDENTS
Students should take equal responsibility for implementing the OBE. Some of the responsibilities
(not limited to) for the students in OBE system are as follows:
Students should be well aware of each UO before the start of a unit in each and every
course.
Students should be well aware of each CO before the start of the course.
Students should be well aware of each PO before the start of the programme.
Students should think critically and reasonably with proper reflection and action.
Learning of the students should be connected and integrated with practical and real life
consequences.
Students should be well aware of their competency at every level of OBE.
ABBREVIATIONS AND SYMBOLS
General Terms
Abbreviation | Full form
CAS | Compare & Swap
CLI | Command Line Interface
CS | Critical Section
CSP | Critical Section Problem
DLL | Dynamic Link Library
DMA | Direct Memory Access
FCFS | First Come First Served
FIFO | First In First Out
HAL | Hardware Abstraction Layer
I/O | Input / Output
IPC | Interprocess Communication
JVM | Java Virtual Machine
KLT | Kernel Level Thread
LRU | Least Recently Used
LWP | Light Weight Process
Mutex | Mutual Exclusion
NRU | Not Recently Used
OPT | Optimal
OS | Operating System
PCB | Process Control Block
PT | Page Table
RAG | Resource Allocation Graph
RAID | Redundant Array of Inexpensive Disks
RM | Resident Manager
RTOS | Real Time Operating System
SC | Second Chance
TSL | Test & Set Lock
ULT | User Level Thread
VM | Virtual Machine
VMM | Virtual Machine Manager
CONTENTS
Foreword iv
Acknowledgement v
Preface vi
Outcome Based Education vii
Course Outcomes ix
Guidelines for Teachers x
Guidelines for Students xi
Abbreviations and Symbols xii
UNIT 1: INTRODUCTION
1.3.10 Embedded Systems 12
1.5.8 Monitoring 19
1.7.2 Microkernel 23
1.8 OS CASE STUDIES 27
1.8.1 UNIX 27
1.8.1.1 System Design 27
1.8.1.2 User Perspective 28
1.8.1.3 OS Services 28
1.8.2 WINDOWS 29
1.8.2.1 System Design 29
1.8.2.2 User Perspective 30
1.8.2.3 OS Services 31
UNIT SUMMARY 31
EXERCISES 32
PRACTICAL 33
KNOW MORE 33
2.7 THREADS 45
2.7.1 Definition 45
PRACTICAL 78
KNOW MORE 78
3.1.1 Shared Memory Model 82
3.2 SYNCHRONIZATION 87
3.8.2 The Readers-Writers Problem 106
UNIT 4: DEADLOCKS
5.4 MEMORY ALLOCATION 149
6.3.1 Interrupt Handlers 189
INDEX 222
1 Introduction
UNIT SPECIFICS
Through this unit we have discussed the following aspects:
● Concept of Operating Systems
● Generations of Operating systems
● Types of Operating Systems, OS Services, System Calls
● Structure of an OS - Layered, Monolithic, Microkernel Operating Systems
● Concept of Virtual Machine
● Case study on UNIX and WINDOWS Operating System.
This chapter introduces operating systems and their uses in computers. It also provides an intuitive understanding
of the entire book in a nutshell. Emphasis is placed on lucidly discussing the topic with simple real-life examples that
a novice reader can readily relate to and remember the underlying concept easily. The examples should not be taken
as precise and exact, but approximate ones to serve the purpose in general.
Besides several multiple-choice questions and questions of short and long answer types (marked in two categories following the lower and higher orders of Bloom’s taxonomy), assignments through several numerical problems, a list of references and suggested readings are given in the unit so that one can go through them for practice. It is important to note that, for getting more information on various topics of interest, appropriate URLs and QR codes have been provided in different sections, which can be accessed or scanned for relevant supportive knowledge.
We also have a “Know More” section, which has been carefully designed so that the supplementary information provided in this part becomes beneficial for the users of the book. This section mainly highlights the initial activity, examples of some interesting facts, analogies, the history of the development of the subject focusing on the salient observations and findings, timelines starting from the development of the concerned topics up to recent times, applications of the subject matter in day-to-day life and/or a variety of industrial applications, case studies related to the topic, and finally material to serve the inquisitiveness and curiosity of the readers.
RATIONALE
This introductory unit on operating systems helps students to get a primary idea about the system software that
works at the core of a computer system and as an important layer between the bare hardware and the users. It
starts with the rudimentary division of a computing system into hardware and software and then where and why
an operating system fits in this binary division. This basic understanding is very important to start the study of
operating systems properly. It then discusses different activities of an operating system and its evolution over
the years. All these are discussed with simple real-life examples for important concepts and necessary details
to develop the subject. Keeping in mind the need of intended readers, efforts are made to keep the content
minimum and language simple.
Operating Systems is an important subject of Computer Science. As a user of any computing device from a
large-size mainframe or a server of a datacentre to a personal computer or a smartphone, even a smart domestic
appliance like a washing machine or dishwasher, we interact with operating system interfaces. Hence, a basic understanding of operating systems is necessary for everyone in today’s knowledge-driven world. For an academician or a practitioner of computer science, it is an absolute necessity, as it enriches the understanding of both system software and application programs. This enables one to comprehend the functions and behaviours of different computing devices, what they can do and what they cannot, and thus use computers in a better way in everyday life. At the same time, it equips one to design software, write programs and implement different algorithms in an efficient and effective manner.
PRE-REQUISITES
Basics of Computer Organization and Architecture
Fundamentals of Data Structures
Introductory knowledge of Computer Programming
UNIT OUTCOMES
CO-1: To learn the mechanisms of OS to handle processes and threads and their communication.
CO-2: To learn the mechanisms involved in memory management in contemporary OS.
CO-3: To gain knowledge of distributed operating system concepts, including architecture, mutual exclusion algorithms, deadlock detection algorithms and agreement protocols.
CO-4: To know the components and management aspects of concurrency management.
An operating system (OS) works as an intermediary between the users (programmers or end-users) and the hardware. An OS provides an “easy-to-use” platform to the users over the “difficult-to-use” bare-bone hardware.
A simple analogy can be made with the railway operation here. A train can certainly run on the tracks without a
station. However, it will be difficult to get in and out of the train for the passengers (the users). Railways not only
provide platforms for easy boarding and deboarding but also several facilities for passengers’ convenience. Railways provide drivers, guards and signalling staff, along with a signalling system, for the smooth running of the train. Ticket counters
at the station enable us to buy tickets for journeys and other services in the station premises. Think of different
services like waiting rooms, washrooms, food stalls and so on.
Keep this example of a railway system in mind to understand different activities of an operating system. Rail tracks,
some signals or train rakes can be considered analogous to the hardware components of a computer. Railway
reservation system, on the other hand, is a software in every sense.
The entire railway system is built to serve two basic purposes: 1. to cater to passengers (users) and 2. to manage railway resources (trains, other physical resources and manpower). Loosely put, an operating system is like the
railways system. Its functions can thus also be viewed from two perspectives: 1) User View and 2) System View as
briefly described below.
User View
In most cases, a personal computer (PC) or a laptop is used by a single user (Fig 1.4) at a given point of time. The
user monopolizes the resources (hardware resources like processors, memories, I/O devices etc or software
resources like programs, files, databases etc). The primary job of the operating system is to ensure ease of use of
the resources.
However, in a multi-user environment (Fig. 1.5), several users work on a single system (mainframe, minicomputer,
workstation or a server) connected through their own terminals. The users share different computing resources
(h/w and s/w) and exchange information. Here, resource utilization in an equitable and fair manner is very
important. The job of an OS is not only to offer ease of use to the users, but also to ensure that every user gets a
fair share of the resources (both h/w and s/w) and maximise the overall performance (e.g., maximum resource
utilization, lowest overall CPU time etc) of the system from all the users’ point of view.
Smartphones and tablets of today’s world are single-user computers, but they connect to a server and cloud
through cellular or wireless networks. The OS provides a touchscreen (or keypad) interface for the user and also interacts with a remote server and/or a cloud to provide services to the user.
However, there are a few embedded systems where the operating system rarely, if ever, interacts with the user (home appliances like refrigerators, washing machines and dishwashers, or car indicators).
System View
As an OS can directly interact with the hardware of the system while user programs cannot, the OS must ensure resource allocation to all users. These resources can be hardware resources like CPU time, memory blocks, I/O and
network devices as well as software resources like programs, file systems etc. From a computer's point of view,
these resources need to be controlled, managed and allocated by an operating system. Hence, an OS is also seen
as a control program or resource allocator for a computer.
The need for interactive computing was met in time-sharing phase. The computer was connected to several
terminals where each terminal can offer interaction with a user. While a user interacts with the terminal (I/O
operation), the processor could execute other users’ programs as processor speed is much faster than that of a
user. It provided an illusion as if each user had a dedicated processor. CTSS (acronym for Compatible Time-Sharing System), designed at MIT, USA, and MULTICS (MULTiplexed Information and Computing Service), designed jointly by MIT, General Electric and AT&T Bell Labs, were two early examples of time-sharing systems. Later, UNIX was developed at Bell Labs (first released in 1971) following the principles of MULTICS. Concepts of file systems, file protection and passwords also came as part of time-sharing
systems.
Concurrent Programming was the next logical step. Computers gradually evolved into such a complex system
that the problems due to multiprogramming and time-sharing features (deadlocks, to be discussed later) could not
be solved in an ad hoc fashion. Conceptual basis was developed to design complex systems in a principled manner
that can offer simultaneous execution of several tasks without problems. Synchronization primitives like
semaphore, monitor etc (discussed later) were introduced. The THE operating system (named after Technische Hogeschool Eindhoven, the Technical School of Eindhoven, the Netherlands) by Edsger Dijkstra (1968) and the RC 4000 operating system by Per Brinch Hansen (1970) included concurrent programming.
The above phases of OS development discussed so far considered self-contained, stand-alone systems. With
the advancement of networking and communication technologies, the need of sharing computations among several
computer systems was felt. Specific responsibilities were assigned to different computers connected to do a certain
task in a cooperative manner. Servers and resource sharing became common with remote procedure calls (RPC),
and distributed programming as dominant concepts leading to development of Distributed Systems. WFS File
Server, Amoeba, Unix with additional layers of distributed computing are examples of distributed systems.
Real Time systems1 are those that respond to events within predictable and specific time constraints. They are used
in several time-critical systems like air traffic control systems, process control systems, autonomous driving
systems, robotics etc. Often sensors send the data to the computer and output is produced within a specific time
so that the appropriate next action initiates; otherwise, the system fails (think of robot actions). Precise timeliness, time synchronization among different agents and priority-based actions are important attributes of real-time operating systems (RTOS). An RTOS is characterized as a small, fast, responsive and deterministic OS. VxWorks is an RTOS.
An embedded operating system is a small operating system that lies within a larger machine, e.g., on a microcontroller within a robotic arm. Often an RTOS is used as an embedded system when timeliness and reliability are critical. Symbian (used in early cellular phones) is an embedded operating system.
Depending on the features, operating systems can be divided into several categories.
Based on the mode of data entry and response time (the time between request for a service from a system and the
first response from it), some types are briefly mentioned below.
Based on the number of users, processors, and programs and their connections, operating systems can be classified
into several categories as given below.
instructions to these special processors and monitor their status but cannot directly control the operation of the
devices. While the controllers work, the general-purpose CPU remains free from I/O device management, and the OS uses it to execute another program. Operating systems working on a uniprocessor system are simple in design. However, very few present-day computer systems are single-processor systems.
In a multiuser system, each user feels as if she uses a system dedicated to her (time-multiplexing) (Fig 1.9 - Fig. 1.11). The OS allocates the resources in a fair and orderly manner to all the users, without bothering the users. Security is a major issue here. The OS must ensure that
each user works within her own authorized area (for her program and data) and does not transgress beyond her
authority. OS also needs to track usages of resources by each user and to pre-empt the resource(s) and/or user
when a user unduly monopolizes a set of resources and others wait indefinitely for them. Workstations and servers
are multiuser systems.
Fig 1.9: Main memory in a multiprogram system (OS programs, Appn 1 … Appn N, common codes and free space)
Fig 1.10: Multiprogramming in a multi-core system (true parallelism: Appn 1 on Core1, Appn 2 on Core2, … Appn N on CoreN)
Fig 1.11: Multiprogramming in a 1-core system (concurrency with time-sharing or time-interleaving)
It is worth noting that most of our computers are both multiuser and multiprogram systems. Most of today’s single-user systems, like mobile devices, are also multiprogram ones.
We must keep in mind that these kinds of classification can overlap with one another, as the criteria of division are different. One OS can belong to several of the above types.
An operating system manages the entire hardware of a computer and serves its users. To do so, it must perform
many tasks. It starts soon after the power is switched on in a computer. To boot a computer, a small bootstrap
code kept in the firmware of the computer (ROM or its variant) needs to be executed. The bootstrap initializes all
necessary hardware (CPU registers, memory contents and I/O device controllers). It locates the OS kernel (the core of the operating system) on secondary storage (usually an HDD, or some external media such as disks or flash media), then loads it into main memory (RAM) and initiates kernel execution, leaving the control to it. This process is called bootstrapping.
The kernel takes control of the computer after bootstrapping and provides services to the system and its users.
Some services are provided by other system modules of the OS, outside the kernel, that are loaded along with the
kernel at boot time - these are known as system daemons (systemd is one such daemon in Linux systems). Once
the kernel and system daemons are loaded in memory, the system is considered completely booted and waits for
some events to occur.
Operating systems are event driven. They remain idle as long as there are no programs to execute and no I/O
requests to serve. Events are signalled via interrupts. Hardware components raise interrupt signals through device
controllers (like a keyboard through a keyboard controller) to the corresponding device driver (part of the OS
managing the device). The OS kernel listens to it and takes appropriate action executing an interrupt service routine
(ISR). These interrupts are called hardware interrupts. There can be software interrupts also, known as traps or
exceptions which occur when some illegal operations are attempted by programs (like division by zero).
Appropriate ISRs are invoked and executed for software interrupts as well. All these interrupts are assigned a priority level. If more than one interrupt occurs simultaneously, high-priority interrupts are served before the low-priority ones. Often user programs can explicitly request OS services to access some system resources (e.g., memory
allocation, scanning input, printing output, system files etc) through a special operation known as system calls
(discussed in the next section).
The interrupts, service routines and system calls are some of the mechanisms through which an OS either receives
notifications from or controls and manages different resources of a computer system. This overall management can
be classified into a few broad categories as briefly discussed below.
Operating systems work as the resource manager of a computer. These resources are processes, memory,
filesystem and I/O devices. We briefly introduce here how these resources are managed by operating systems in
general.
While the OS kernel resides in the memory as long as the computer is running, other processes come and go. If we can accommodate many processes, the degree of multiprogramming increases, but a processor core can serve only one process at a time. A higher number of processes will cause a higher number of context switches (switching a CPU core from one process to another). How many processes can be kept, for how long, and when a process needs to be pre-empted (removed) are some of the important management issues.
Fig 1.12: Memory Hierarchy (registers, cache, main memory and beyond: size and access time increase down the hierarchy, while cost per bit increases up the hierarchy)
1.4.1.3 File-system Management
Program code and data are stored temporarily in registers, cache
and main memory during the program execution. But, in the long term, they are stored in secondary (HDD) and
tertiary storage (CD, floppy, DVD, pen drive, magnetic tape etc). These media store data persistently (they can retain information even when the computer is shut down). The information (code + data) is stored in the logical unit of files. A file is a sequence of records. Each file is a device-independent concept (e.g., a .txt file is a .txt, no matter
what physical device stores it or an .exe is always an .exe irrespective of storage media). But each physical
storage medium has different physical characteristics with diverse ways of storing and retrieving data. Operating
systems provide abstraction of files and map the logical files onto physical media.
The file management subsystem, a part of an OS, also helps organize the files into directories (or folders), which users regard as logical collections of related files.
Specifically, an operating system does the following jobs as part of file system management:
1. formatting the media into file system type (e.g., DOS, Windows, Unix file system etc.)
2. mapping files onto physical media
3. creation, modification, and deletion of files
4. creation, modification, organization, deletion of directories (and sub-directories)
5. copying (backing up) files and directories from one media to another.
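Several of these jobs map directly onto POSIX calls. A minimal sketch of file and directory creation and deletion (the names demo_dir and demo.txt are illustrative; error handling is minimal):

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* Create a directory, create a file inside it, then delete both.
   Returns 0 on success, -1 on any failure. */
int create_and_remove(void) {
    if (mkdir("demo_dir", 0755) < 0)                 /* directory creation */
        return -1;
    int fd = open("demo_dir/demo.txt",
                  O_CREAT | O_WRONLY, 0644);         /* file creation */
    if (fd < 0)
        return -1;
    close(fd);
    if (unlink("demo_dir/demo.txt") < 0)             /* file deletion */
        return -1;
    return rmdir("demo_dir");                        /* directory deletion */
}
```

The point is that the operating system exposes these management jobs to programs through a small set of syscalls, independent of the physical medium underneath.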
1.4.1.4 I/O Management
Processes use various I/O devices during their execution. These devices are made of different physical materials, have different physical
characteristics, and thus require different handling techniques. Performance of a computer is dependent on proper
management and control of the I/O devices, and application programmers need to be shielded from their low-level
details, which can be diverse and complex. Operating systems provide a general device driver for related
categories of I/O devices and specific ones when necessary. The drivers interact with the device-controllers and
manage the devices. Operating systems provide users simpler interfaces to interact with the devices. In most of the
operating systems, an I/O subsystem does the job, by specifically providing:
1. a memory management module to buffer, cache, and spool data transfer
2. a general device driver interface
3. specific device driver interfaces.
I/O Management will be discussed in detail in Module 6.
A computer system implements various kinds of protection schemes: some are at the hardware level and some at
the software (or operating system) level.
Some processors support more than two operating modes (e.g., Intel has 4 protection rings or modes, ARMv8 has
7 modes).
Also, we need to understand the distinctions between user and kernel mode, process, and system (or kernel) space
and process and system context (Fig 1.14). User codes run in user mode and process context and can access only
process address space. System calls and exceptions are handled in process context but in kernel mode and can
access both process and kernel space. Interrupts are handled in system context, kernel mode and access only
system space.
Even though a system has adequate protection, it can fail and/or be vulnerable to inappropriate access to its
resources. A user's authentication information may be stolen; her code and data can be copied or deleted. Such
vulnerabilities can spread across the system, come through viruses and worms, and materialise as identity theft,
denial-of-service and/or theft-of-service attacks. Preventing some of these attacks is the job of operating
systems, and some OSs offer security measures for the same. All modern OSs maintain a list of users and assign
user-ids (UIDs). During login, UIDs are checked and only on successful authentication are users allowed to use the
operating system. All processes (and sub-processes) are associated with the UIDs and monitored for use of
resources. In some operating systems, users are grouped based on their privileges to access files and other
resources (e.g., group-id or GIDs in UNIX/Linux systems). Only privileged groups can access some of the resources.
Operating systems provide different services to the users and application programs. Recall that users view an
operating system as a service provider (Sec 1.1.1). These services make the computer easier to use. They
vary from one OS to another. Here, we list some of the general and common services: first, from the users'
perspective and then from the system's perspective.
These services are to provide an easy-to-use environment to the users. However, the computer system also requires
some services for its smooth operation and improvement of overall performance. Some of these services are briefly
mentioned below that are common to most operating systems.
19 | Introduction
1.5.8 Monitoring
Use of all resources by all processes and their users needs to be closely monitored in terms of CPU usage,
main memory usage, usage of caches, and of different I/O buffers, in real time. Operating systems keep track of the
same and help the super user (system administrator) take punitive action, if required.
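As a small illustration (assuming a POSIX system), a process can query the kernel's accounting of its own resource usage with getrusage(); monitoring tools such as top build on this kind of per-process bookkeeping:

```c
#include <sys/resource.h>

/* Query the kernel's per-process accounting: CPU time split into user
   and kernel (system) time, among other counters. Returns 1 if the
   query succeeds and the reported times are sane. */
int cpu_time_sane(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) < 0)
        return 0;
    /* ru_utime: CPU time spent in user mode; ru_stime: in kernel mode */
    return ru.ru_utime.tv_sec >= 0 && ru.ru_stime.tv_sec >= 0;
}
```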
User programs make system calls when they need to execute privileged instructions. These are like function calls
offered by operating systems to user programs mainly for accessing the hardware. As given in Fig 1.13 and 1.14,
system calls are executed in kernel mode but initiated by the user process. Hence, they run in process context and
access both process space and kernel space.
A system enters kernel mode on three kinds of events: interrupts (from an I/O device to the processor,
e.g., when reading from an input device or writing to an output device is complete), exceptions (generated due to
an error in a running process) or system calls (explicit requests from a running process). An operating system
treats all three in a similar fashion.
In each case, an interrupt signal (int <n>, where n is an integer pointing to interrupt type) is generated and
the kernel receives the control. It immediately suspends the normal execution of the processor and saves some
important information related to the running state (program counter value, process status word or PSW) of the
suspended process so that the processor can resume execution from the point of suspension at a suitable later
time. It then consults a system call table (in Linux, it is called a dispatch table) (Fig 1.16). Corresponding to interrupt
number (n), an appropriate interrupt handler (or interrupt service routine, ISR) is invoked. The ISR executes in
kernel mode and system space (ISRs reside in system space only).
However, it is important to understand the differences among system calls, exceptions and interrupts. Interrupts
can come from any I/O device that may be active due to any process, not necessarily the currently running one.
Hence, it is an asynchronous event and therefore, depending on its priority level, the hardware interrupts can be
serviced immediately or later.
Exceptions are caused by illegal instructions executed in the process space and user mode, and are synchronous
events. They need to be handled by the kernel in kernel mode but in process context. The handler may access the user
area of the kernel as well as system space. System calls are very much like exceptions, with the difference that they
are lawful requests from the running process.
Once the ISR (typically called syscall()) completes its execution, the kernel checks and sets the return value or
error status in appropriate registers, restores the saved state information of the suspended process from its user
area of the kernel space, switches back to user mode, and returns control to the suspended process. Note that
a mode change is a privileged instruction and can be done in kernel mode, kernel space and kernel context only.
Syscalls can be thought of as buying tickets for the services of an operating system. Just as we can buy tickets for
different services at a railway station, we can request different system calls (Fig 1.17 and Fig 1.18). An application
program may need several syscalls to complete its intended task.
For example, a simple program for copying some content (say, only first names) from an input file (having first
names and surnames) to an output file involves several I/O operations or syscalls like:
i. opening the input file (1)
ii. opening the output file (2)
within a loop, till there is content in (1):
iii. reading the input file (1)
iv. writing to the output file (2)
and finally:
v. closing file (1)
vi. closing file (2).
Fig 1.18: Syscalls are like railway counters
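The steps above can be sketched with the four classic UNIX syscalls open, read, write and close (a minimal sketch: it copies bytes verbatim rather than extracting first names, filenames come from the caller, and error handling is minimal):

```c
#include <fcntl.h>
#include <unistd.h>

/* Copy src to dst using the four classic UNIX syscalls:
   open (steps i, ii), read (iii), write (iv), close (v, vi).
   Returns 0 on success, -1 on failure. */
int copy_file(const char *src, const char *dst) {
    char buf[4096];
    ssize_t n;

    int in = open(src, O_RDONLY);                              /* syscall i  */
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);   /* syscall ii */
    if (out < 0) { close(in); return -1; }

    while ((n = read(in, buf, sizeof buf)) > 0)   /* iii: read until EOF    */
        if (write(out, buf, (size_t)n) != n) {    /* iv: write what we read */
            close(in); close(out); return -1;
        }

    close(in);    /* syscall v  */
    close(out);   /* syscall vi */
    return n < 0 ? -1 : 0;
}
```

Even this tiny task needs six syscalls, which is exactly the point of the railway-counter analogy above.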
Each of these operations is done through a different syscall (Fig 1.17) that the user program requests from the
underlying operating system. Exact nomenclatures of syscalls (function-names)
are different from one operating system to another (Windows and Unix syscalls are different, even within the same
family it can vary from one version to another). For an application programmer, it is difficult to remember all these
syscall function-names. Also, the program written under one OS (say Windows) cannot run on another operating
system (say Unix) if the user program uses direct syscall functions. Operating systems therefore provide a system
call interface that interacts with different compilers, shells and programming language libraries (Often system call
interface lies within a runtime environment or RTE that comes bundled with the OS). Application programming
languages (like C, C++, Java) directly talk to system call interfaces on behalf of the programs to make necessary
syscalls and offer application programming interfaces (APIs) to the programmers. These APIs are functions available
in the standard libraries of the programming languages (e.g., libc for standard C library) that correspond to
different syscalls. (APIs can be loosely compared to different ticket booking mechanisms for railway services like
using mobile apps, browsers, or agents).
As shown in Fig 1.19, user programs use APIs provided by library functions for making system calls. The library routine
transfers control to the kernel's system call interface, which changes the processor mode (user to kernel) (Step 1). The
interface raises an interrupt signal with the necessary number (remember int n) (Step 2). In the system call table,
it is resolved which system call is to be invoked (Step 3). Once the appropriate ISR completes execution, control
goes back to the system call interface (Step 4). Return value is checked for error messages. If no error is found,
program state and other status variables are restored, with change in operating mode (from kernel mode to user
mode) and control is returned to the user program so that execution can resume from the point where it was
interrupted (Step 5).
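On Linux, the API-versus-trap distinction of Steps 1-5 can be observed with the generic syscall(2) wrapper (a Linux-specific sketch; SYS_getpid is the syscall number behind the getpid() API):

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

/* getpid() is the libc API; syscall(SYS_getpid) triggers the trap
   (int n / syscall instruction) directly via the generic syscall(2)
   wrapper. Returns 1 if both report the same process id, as they should. */
int api_matches_raw_syscall(void) {
    pid_t via_api = getpid();                    /* library API entry (Step 1) */
    pid_t via_raw = (pid_t)syscall(SYS_getpid);  /* direct trap into the kernel */
    return via_api == via_raw;
}
```

Both paths end in the same kernel routine; the API merely hides the trap mechanics and the OS-specific syscall numbers from the programmer.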
Also, from the designer's viewpoint, the operating system must have the following two properties:
i. portability: the same OS can easily run on different underlying hardware architecture
ii. extensibility: newer features can be easily added and incorporated into the existing OS.
These requirements often compete, and meeting all of them together is difficult. Designing an operating system is
a complicated task with some trade-offs. The requirements of multiplexing, isolation and interaction are met with the
kernel mode of operation. But the question is: how much of the OS operation should run in kernel mode?
This question drives different structural organization of the OS family and offers a few types of OS as briefly
discussed below.
1.7.1 Monolithic Kernel
Original Unix was a monolithic OS. Due to its simplicity, speed and efficiency, the monolithic design still has a partial presence in some of
the latest versions of Unix, Linux and Windows systems.
1.7.2 Microkernel
An opposite design strategy can be to keep the amount of kernel mode code to a bare minimum and leave most of
the OS services offered in user mode. This type of OS organization is called a microkernel. The kernel provides minimal
process and memory management along with an interprocess communication facility. The bulk of the OS services, including
device drivers, the filesystem and system call handling, run as processes (like user processes) and are called servers.
Processes (both user- and system-level ones alike) interact with the OS servers using message passing via the kernel.
The scheme has several advantages like:
i. good portability: since the kernel is small, it is easily portable and manageable.
ii. greater reliability: problems in OS servers do not cause kernel to fail.
iii. easy extensibility: adding newer functionalities or modifying the existing ones is simple due to robust
implementation of the isolation.
On the other hand, it has a few disadvantages as well like:
i. increased communication: heavy amount of message passing through kernel for different OS services.
ii. increased use of space: every message is copied in two different process address spaces (of requester and
server) as well as in kernel space.
iii. poor performance: increase in the overall workload of the kernel and thereby drop in overall performance.
Mach (developed at CMU in the 1980s) originally introduced the microkernel concept. Darwin, the core of Apple's
macOS and iOS, uses a microkernel. Several embedded operating systems, like QNX, use a microkernel architecture.
THE multiprogramming system (in the 1960s) implemented a pure layered architecture. However, the THE system had
a hardware-dependent layered design that fails the portability requirement. A few modern OSs use a limited
number of layers, with more functionalities added in each layer.
The approach is a mixture of microkernel and layered architectures. Very few services constitute the core
components; other services are dynamically attached to (or "inserted" into) the kernel as loadable kernel modules
(LKMs). The LKMs can be removed from the kernel during runtime as well.
Linux systems use this modular approach.
Virtual machines are non-real or illusory computing environments created over a single real, physical computer
system. Often, we simultaneously need different operating systems but are limited by hardware constraints (e.g.,
having a single CPU, single memory and a single set of I/O devices) (Fig 1.23). Different operating systems run
simultaneously on a single set of hardware (a real physical computing system) where each such OS ‘feels’ as if it is
exclusively owning the system hardware. Each such OS can be considered as a virtual machine (VM) (Fig 1.25). Note
that this setup is different from a computer with multiple boot options. In a multi-boot system, the hard disk is
partitioned, only a single OS is booted at a time, and it works solo till it is shut down. No two OSs can be
booted at the same time. But virtualization enables simultaneous booting and working of multiple OSs on a single
real machine (on different virtual machines). A loose analogy is the difference between an actor playing roles in
different movies (multi-boot) and an actor playing double / triple / multiple roles in a single movie (VMs).
Virtual machine implementation consists of a few components (Fig 1.25). The base has the hardware components,
known as the host. The host is managed by a virtual machine manager (VMM) that virtualizes the computing
environment (as if creating replicas of the underlying hardware) into several virtual machines (Fig 1.23 and Fig 1.25).
Each virtual machine offers a ‘feel’ of an independent hardware machine and can run an OS on it. Each OS can run
processes and use virtual resources independently, oblivious of the fact that there are other OSs running
simultaneously on the same real machine. The OSs running on VMs are called guest OSs and the application
processes on them are guest applications.
VMMs are also called hypervisors and can be of several types as described below.
Type-0 Hypervisor: Underlying hardware supports virtualization through creation and management of VMs via
firmware. Mainframe and large size servers like IBM LPARs, Oracle LDOMs contain Type-0 VMMs.
Type-1 Hypervisor: VMMs here are more like operating systems that interact with both host hardware and virtual
machines as intermediaries. VMs almost work like processes running on the host OS or Type-1 hypervisors. VMWare
ESX, Citrix XenServer are examples of Type-1 hypervisors.
Type-2 Hypervisor: These hypervisors are applications that run on some host OS and allow some other OS to run
inside the application. Obviously, these hypervisors offer very limited features and compromised performance.
VMWare Workstation and Fusion, Oracle VirtualBox are examples of Type-2 hypervisors.
Virtual machines are very popular for cross-platform software development and testing due to the following
reasons.
i. Cost-effectiveness: One does not need different hardware to test a newly developed OS as a VMM can
simulate the same. Similarly, a new application can be developed and tested for multiple OS platforms
using VMs.
ii. Isolation: VMs provide isolation between the host OS and guest OSs as well as between any two guest OSs.
A bug or virus or worm within a particular OS cannot play havoc on other OSs.
iii. Consolidation: In data centers, two or more lightly loaded systems can be combined as virtual machines to
run on a single real system. This way, load can be consolidated and balanced with better resource utilization.
iv. Live migration: Some VMMs include a feature that allows a guest to move from one host system to
another without any interruption. This live migration also enables better resource management.
Sometimes virtualization comes with different flavours. For example, paravirtualization does not offer pure
virtualization where all the hardware is simulated as per the need of a guest OS. Rather, the guest OS can customize
itself to the basic set of virtual hardware. This scheme reduces the volume of virtualization software.
Virtualization can also happen at application level. For example, Java Virtual Machine (JVM) is actually
virtualization of the programming environment. JVM provides an execution environment that is independent of
operating systems. Java programs are compiled into bytecodes that are executed in a JVM irrespective of the OS
that is running the JVM.
Virtualization, discussed so far, takes care of different OSs running on the same instruction set architecture (same
processor). However, when programs compiled in one instruction-set architecture (guest) need to be run on
another instruction set (host), the entire guest instruction-set needs to be converted. Emulator software, sitting on
the host system, translates each of the guest instructions into a host instruction and enables execution of the guest
executables. This emulation is also virtualization of the instruction set - which is complicated and challenging, but very
popular, particularly in gaming software.
Present day cloud computing is enabled by massive virtualization of hardware resources over the Internet. The
processing and storage are offered as a service to the users but are actually done in remote data centers.
There are quite a few operating systems in the market. However, most of the general-purpose OSs belong to either
of the two popular families: UNIX and WINDOWS. We shall briefly provide an overview of the two families here.
1.9.1 UNIX
UNIX, first developed in 1971 (See Sec 1.2.3), is one of the most successful operating systems. It has been widely
used and is still available in variants and offshoots with different open-source and commercial versions, in
academia, research and the business world. UNIX is a multi-user, multiprogramming OS. It has simplicity and
elegance in its design from the system’s point of view. It also provides simplicity, clarity, and ease-of-use from the
user's point of view.
executable program. These programs and editors (vi and ed) interact with the kernel invoking system calls. Even
the user program (a.out) can be in this layer. A standard C compiler (cc) is found in the outermost layer which
invokes a pre-compiler (cpp), 2-pass compiler (comp), assembler (as), and linker-loader (ld) from the lower-layer.
Other applications remain in the outermost layer that can use different lower-layer programs to invoke syscalls.
This architecture is generic: different variants have different numbers of layers, and UNIX, being open source since
inception, allows extension of the hierarchy of layers.
Filesystem: UNIX has a hierarchical filesystem (Fig 1.27) where ‘/’ is the root and all directories and files are arranged
in a tree structure. Leaf nodes are files and non-leaf nodes are directories and subdirectories. UNIX treats all
files and folders alike, as unformatted streams of bytes. Every file is considered unique by the system and is identified
by the path from the root to the leaf (e.g., /usr/src/test.c). Even devices are treated as files; e.g., tty01
and tty02 represent two devices.
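The "devices are files" idea can be demonstrated with /dev/null, a device node present on virtually every UNIX system, using the same open/write/close syscalls as for ordinary files:

```c
#include <fcntl.h>
#include <unistd.h>

/* Open the /dev/null device node and write to it with the same syscalls
   used for ordinary files. Returns the number of bytes the device
   accepted (it silently discards them), or -1 on failure. */
int write_to_device(void) {
    int fd = open("/dev/null", O_WRONLY);   /* a device, opened like a file */
    if (fd < 0)
        return -1;
    int n = (int)write(fd, "discarded\n", 10);
    close(fd);
    return n;
}
```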
Processing environment: A program is any executable binary file (e.g., a.out), but during execution, UNIX sees
it as a process (a running instance of a program). Several processes can run concurrently. A process can create
another process (using the fork() system call), execute a program within itself (using the exec() syscall),
and communicate with other processes using IPC mechanisms (e.g., signals and pipes).
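A minimal sketch of this process model using the real fork() and waitpid() syscalls (the child would call exec() in practice; here it just exits with a status the parent collects):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* fork() clones the calling process; the child would normally call exec()
   to run a new program, but here it simply exits with status 42. The
   parent collects that status with waitpid(). */
int spawn_and_wait(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */
    if (pid == 0)
        _exit(42);                      /* child: exec() would normally go here */
    int status;
    if (waitpid(pid, &status, 0) < 0)   /* parent waits for the child */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```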
System primitives: Shell is a powerful tool that comes with UNIX offering a few building block primitives. These
primitives help users write small modular programs that can be combined to create complex programs. One such
primitive is redirection of I/O. Processes access standard input, standard output and standard error of the CLI
as three files and can independently redirect any of them to another location. Another useful primitive is the pipe,
where the output of one process can be treated as input to another process.
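The pipe primitive is itself available as a syscall. A single-process sketch of the mechanism (real shells combine pipe() with fork() and dup2() to connect two commands):

```c
#include <string.h>
#include <unistd.h>

/* pipe() yields two file descriptors: bytes written to fd[1] can be read
   back from fd[0]. Shells build "cmd1 | cmd2" from this plus fork() and
   dup2(). Copies the round-tripped message into buf; returns its length. */
int pipe_roundtrip(char *buf, size_t len) {
    int fd[2];
    if (pipe(fd) < 0)
        return -1;
    const char *msg = "hello";
    write(fd[1], msg, strlen(msg));          /* producer end */
    int n = (int)read(fd[0], buf, len - 1);  /* consumer end */
    if (n < 0)
        n = 0;
    buf[n] = '\0';
    close(fd[0]);
    close(fd[1]);
    return n;
}
```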
1.9.1.3 OS Services
UNIX provides the following standard services. Privileged services are provided in kernel mode.
Process management: The kernel does process creation, termination, suspension and communication. It ensures
fair process scheduling using time-sharing.
Memory management: UNIX kernel allocates memory to all executing processes ensuring isolation between the
kernel space and user space and among the address spaces of several user processes. It also takes care of virtual
memory when main memory is low.
File management: For persistent storage, UNIX formats the disk space, allocates it to different files and directories,
and allows users to organize and manage it. It also provides well-structured security at folder and file levels.
I/O Management: UNIX allows processes to access I/O devices like terminals, disk drives, network devices in a
controlled manner.
Interrupt and exception handling: UNIX allows peripheral devices and the system clock to asynchronously interrupt the
CPU, and supports synchronous exception handling. Interrupts have defined priority levels. When a high-priority
interrupt is serviced, all interrupts below its priority level are blocked.
Even though earlier versions of UNIX had only CLI, present-day UNIX also has GUIs. Linux and open-source versions
have made UNIX freely distributed and well accepted across the globe.
1.9.2 WINDOWS
Windows, developed by Microsoft (MS) Corporation, is a family of operating systems that are perhaps the most
used OSs across the world. It started with Windows 1.0 in 1985 that came following MS-DOS (1981), a collaborative
effort of MS and IBM for IBM personal computers. Windows 10 and Windows Server 2016, with annual updates, are
the latest in the family and are briefly discussed here.
In the user mode, processes and dynamic-link libraries (DLLs) are executed.
1. Processes: Windows considers processes of 4 categories.
i. user processes: These are Windows applications developed by users.
ii. service processes: These are Windows services that run independent of user logins, like the Task
Scheduler and Print Spooler services; MS SQL Server and Exchange Server services also belong here.
iii. system processes: These are some fixed processes not considered Windows services like logon
process, Session Manager process.
iv. environment subsystem processes: These are part of support for other OS environments like OS/2
and POSIX systems by Windows. However, they are now discontinued.
2. Sub-system DLLs: DLLs are stand-alone executable routines that are linked by applications (different processes);
they translate API functions to the lower-level native system calls.
3. NTDLL.dll: This contains the Windows lower-level system calls that are executed in the kernel mode.
2. indirectly using local procedure call (LPC) or RPC where message passing technique is used via kernel space.
1.9.2.3 OS Services
Process management: Windows kernel does process creation, termination, suspension and communication. Kernel
uses a scheduler (called the thread dispatcher) for CPU time management with pre-emption (forceful eviction of threads
from the CPU). Windows uses objects to manage processes and threads (named EPROCESS and ETHREAD objects
respectively). Windows also supports fibers (sub-threads) and jobs (groups of processes).
Memory management: Windows divides main memory into two halves, allocated almost equally to user processes
(the lower memory region, called the user half) and kernel use (the upper memory region, called the kernel half).
Virtual memory with paging is used.
File management: For persistent storage, Windows formats the disk space, allocates it to different files and
directories, and allows users to organize and manage it. Files are managed in terms of volumes, where a master file
table (MFT) takes care of each volume. Files are protected through security mechanisms.
I/O Management: Windows I/O manager along with device drivers does the I/O management. Windows I/O
subsystem covers device drivers, filesystem drivers, network drivers, a cache and message buffers.
Interrupt and exception handling: The generic name for interrupts, exceptions and system calls in Windows is trap.
Traps have 32 interrupt request levels, or IRQLs (0-31). Hardware interrupts have high IRQLs (3-31), followed by
software interrupts (IRQLs 2 and 1). A processor always runs at a single IRQL; a normal user thread executes at
IRQL 0.
Windows has greatly evolved over the years from a PC-based OS to a complex Windows-as-a-service (WaaS) in a
cloud computing environment implementing virtualization. Microsoft has a good deal of its documentation with
other resources at https://docs.microsoft.com/en-us/. The most authoritative source to learn about the latest version
of Windows is [YIR17].
UNIT SUMMARY
An operating system is the core of system software; it runs all the time, from booting to
shutdown of a computing device. It acts as the intermediary between the bare hardware of the system and
its users.
It provides a lot of services to the users so that they need not bother about the specifics of the underlying
hardware, as that can vary across systems. An OS allocates the hardware whenever user programs need
it, and controls and manages it. An OS also manages execution of user programs through process
management.
OSs have evolved a lot since the 1950s as computer systems did with time. From the open shop era when
there was no OS to today’s cloud computing, OS has seen batch processing, multiprogramming, time-
sharing, concurrent programming, personal computing, distributed computing and embedded systems.
Based on needs, OS has various types like: batch system, multiprogramming system, interactive system,
multi-user systems, distributed systems, embedded and realtime systems.
Again, based on organization and architecture, there are variations like monolithic, microkernel, hybrid and
loadable kernel modules.
These classifications are not mutually exclusive. Most of the available operating systems actually belong to
several categories simultaneously.
However, most of the OSs do process management, memory management, file management, I/O device
management and provide security and protection to hardware & software entities and the users.
OS ensures protection in collaboration with the hardware through different isolation schemes like operating
modes (kernel and user), address spaces (kernel space and process space) and execution context (system
and process).
The isolation schemes are assisted by another set of hardware-software mechanisms: interrupts, exceptions
and system calls.
The chapter concludes with case studies of two popular OS families: UNIX and Windows.
EXERCISES
Q1. Which of the following standard C library functions will always invoke a system call when executed from a
single-threaded process in a UNIX/Linux operating system?
Q2. Which combination of the following features will suffice to characterize an OS as a multiprogrammed OS?
(a). More than one program may be loaded into main memory at the same time for execution.
(b). If a program waits for certain events such as I/O, another program is immediately scheduled for
execution.
(c) If the execution of a program terminates, another program is immediately scheduled for execution.
Q3. Fork is
A. the creation of a new job
B. the dispatching of a task
C. increasing the priority of a task
D. the creation of a new process
Numerical Problems
Q1. How many of the following instructions should be privileged _____
1) set mode to kernel mode 2) reboot 3) read the program status word 4) disable interrupts 5) write the
instruction register
Q2. How many bits are required to control Windows IRQLs?
PRACTICAL
1. Install any Linux operating system on your computer. There are many free OSs available at
https://distrowatch.com/ and Internet tutorials on installing Linux.
2. Check on the Internet, how Ubuntu can be activated and used from Windows and explore the Ubuntu CLI.
3. Learn details of different syscalls in UNIX and Windows. [Hint: For Unix commands, try man <cmd> or
info <cmd>]
KNOW MORE
● Evolution of Computer systems as well as that of operating systems can be studied from [Mil11],
[SGG18], [Hal15], [Han00].
● Types of operating systems can be learned more from [Dha09].
● Operation of Interrupts is detailed in [Sta12].
● Virtualization was covered from [Hal09] and [SGG18].
● [Bac05] and [Vah12] give a broad overview as well as details of UNIX.
● Windows is discussed in reasonably great detail in [Hal15] and [SGG18].
● However, to work with Windows in exploratory details, one must refer [YIR17].
[Bac05] Maurice J Bach: The Design of the UNIX Operating System, Prentice Hall of India, 2005.
[CKM16] Russ Cox, Frans Kaashoek, Robert Morris: xv6, a simple, Unix-like teaching operating system,
available at https://www.cse.iitd.ac.in/~sbansal/os/book-rev9.pdf
[Dha09] Dhananjay M. Dhamdhere: Operating Systems, A Concept-Based Approach, McGraw Hill, 2009.
[HA09] Sibsankar Haldar and Alex A Aravind: Operating Systems, Pearson Education, 2009.
[Hal15] Sibsankar Haldar: Operating Systems, Self-Edition 1.1, 2015.
[Han00] Per Brinch Hansen: The Evolution of Operating Systems, 2000 (available at http://brinch-
hansen.net/papers/2001b.pdf) (as on 8-Jul-2022).
[Mil11] Milan Milenkovic: Operating Systems - Concepts and Design, 2nd edition, Tata McGraw Hill, 2011.
[SGG18] Abraham Silberschatz, Peter B Galvin, Greg Gagne: Operating System Concepts, 10th Edition,
Wiley, 2018.
[Sta12] William Stallings: Operating Systems Internals and Design Principles, 7th Edition, Prentice Hall,
2012.
[Vah12] Uresh Vahalia: UNIX Internals, The New Frontiers, Pearson, 2012.
[YIR17] Pavel Yosifovich, Alex Ionescu, Mark E. Russinovich, and David A. Solomon: Windows Internals,
Seventh Edition (Part 1 and 2), Microsoft, 2017. https://docs.microsoft.com/en-
us/sysinternals/resources/windows-internals (as on 8-Jul-2022).
UNIT SPECIFICS
Through this unit we have discussed the following aspects:
● Processes: Definition, Process Relationship, Different states of a Process, Process State transitions,
Process Control Block (PCB), Context switching
● Thread: Definition, Various states, Benefits of threads, Types of threads, Concept of multithreads
● Process Scheduling: Foundation and Scheduling objectives, Types of Schedulers
● Scheduling criteria: CPU utilization, Throughput, Turnaround Time, Waiting Time, Response Time
● Scheduling algorithms: Pre-emptive and Non preemptive, FCFS, SJF, RR;
● Multiprocessor scheduling
● Real Time scheduling: RM and EDF.
This chapter introduces the basic units of program execution: processes and threads. A process is a
running instance of a set of instructions, called a program. Execution of a program is facilitated and
managed by the OS on the computing hardware. Operating system sees it in terms of a process, allocates
resources to the process and its sub-units, called threads, and allows the CPUs to execute the instructions
for a process and threads. We develop necessary concepts centred around program execution with a
focus on CPU scheduling.
Like the previous unit, this one provides many multiple-choice questions, short- and long-answer
questions following Bloom's taxonomy, assignments through several numerical problems, and a list of
references and suggested readings. Note that, for more information on various topics of interest,
appropriate URLs and QR codes have been provided in different sections, which can be accessed or scanned
for relevant supportive knowledge. A "Know More" section is also designed to provide supplementary
information and cater to the inquisitiveness and curiosity of the students.
RATIONALE
This unit on process management starts with the discussion on functioning of an operating system in
detail. The unit helps students understand the fundamental concepts of program execution under the
control of an OS. A program is a set of instructions stored in persistent memory. These instructions need to
be brought into main memory and then executed on the processor, using different hardware units, to produce
the desired output. In a multi-user, multi-program environment, several programs from several users run concurrently.
But every program essentially needs a processor to execute and other resources (both software and
hardware) to complete its intended task. How these resources are allocated, when they are allocated,
deallocated and reclaimed - are central questions to understand the overall functioning of an OS. We
define the fundamental concepts of program execution - processes and threads here. We also define other
necessary concepts related to process management. How different resources are allocated to processes,
how their usages are tracked and monitored, which data structures help them in this tracking are
discussed here. Out of several hardware resources, the processor or CPU is the most important one. How
a CPU is allocated to a process, for how much time, when it is taken off from the process - are discussed
as part of CPU Scheduling. All these are discussed with respect to uniprocessor, multiprocessor and real
time operating systems.
This unit builds the fundamental concepts to understand the functioning of an OS. The concepts will
be used in all the forthcoming units of the book.
Operating Systems | 36
PRE-REQUISITES
● Basics of Computer Organization and Architecture
● Fundamentals of Data Structures
● Fundamentals of Algorithms
● Introductory knowledge of Computer Programming
● Introduction to Operating Systems (Unit I of the book)
UNIT OUTCOMES
List of outcomes of this unit is as follows:
U2-O1: Define a process, a thread, PCB, context switch, performance metrics.
U2-O2: Describe the life cycle of a process through different states, PCB, different scheduling
algorithms.
U2-O3: Understand the state transitions of a process and context switching.
U2-O4: Realize the need of threads and their differences with processes.
U2-O5: Analyse and compare different CPU scheduling algorithms.
U2-O6: Design CPU scheduling algorithms to optimize performance.
CO-1: To learn the mechanisms of OS to handle processes and threads and their communication.
CO-2: To learn the mechanisms involved in memory management in contemporary OS.
CO-3: To gain knowledge of distributed operating system concepts, including architecture,
mutual exclusion algorithms, deadlock detection algorithms and agreement protocols.
CO-4: To know the components and management aspects of concurrency management.
When a program is executed, it becomes a process. A process is a program in execution. It is an active entity
and dynamically changes. An OS considers processes as units of program execution or simply, computation.
When we talk of multiprogramming, we mean that multiple processes run simultaneously. A process is much
more than a program. For an OS, a process subsumes a program which is a sequence of instructions (code),
but also contains data and a lot of other entities. A single program can have multiple instances as multiple
processes running at the same time on a given machine. For example, a word processor (a program) can open
several documents; each can be considered a separate process (the data of each document is different),
and the OS tracks each process individually. As a loose analogy, a person can be considered a program, but
she can be a mother, a daughter-in-law, wife at the same time in a family and play different roles
simultaneously. Different roles can be considered as processes that can come from a single program (a single
person).
Some processes belong to application or user programs (called user processes), and some to
OS programs (called kernel processes).
Each process holds some attributes assigned by the OS as follows.
Process-id: a process identifier (often referred to as pid)
User-id: the process is owned by a specific user (owner’s user identifier or user-id, referred as uid)
Process Group-id: Every process is supposed to belong to a group, based on the task. The group has a process
group identifier (or pgid)
Address space: main memory space (known as process address space) where it stores
i. program (code or text)
ii. static data
iii. dynamic data (heap and stack)
A. Text Section: It stores the program code (in executable form, not the source code). All the instructions
are stored here.
B. Data Section: This section stores the data used by the process. Some data comes attached to the
program code and cannot be dynamically changed (globally declared and initialized as read-only data) - we
call this program data. However, most of the data belong to the two major classes that follow.
i. Static data: This data is statically bound to program code and can be allocated space during compilation.
This can be initialized as read-only data (program data) or initialized read-write data or uninitialized data (Fig
2.2).
ii. Dynamic data: This data is not allocated during compilation, rather can only be allocated during program
execution or in the run-time. It can grow or shrink during execution depending on the requirement of the
process. It has two important sub-sections.
39 | Processes, Threads and their Scheduling
a. Heap: During program execution the process dynamically allocates memory (as done by malloc() in
Fig. 2.2) based on requirement and deallocates it when the need is over. Often the actual data size of such
need is not known during code development or programming and is left to run-time. Such data is
allocated from the heap space. The space is allocated from the free space of the main memory by the OS
after getting a syscall from the process. Thus, the heap can grow in size, requesting free space from the
OS. This causes an increase in the process address space also. It again shrinks in size when the space is
freed by the process via a syscall and reclaimed by the OS.
b. Stack: This space is used for the arguments, local variables and return values of a function or a method
within a source program. For each function call, the stack stores these variables and data structures. When
several functions are called in succession, space for each called but not-yet-terminated function is
maintained here. The stack space thus grows with function calls and shrinks as the functions
terminate.
The given example C source code (Fig 2.2), when compiled using the GNU C compiler (gcc),
produces a program named a.out. Portions of the different sections discussed above and the process
address space can be inspected using the shell command size -A a.out.
3. https://www.csl.mtu.edu/cs4411.ck/www/NOTES/process/fork/create.html (as on 21-Jul-2022)
4. https://ece.uwaterloo.ca/~dwharder/icsrts/Tutorials/fork_exec/ (as on 21-Jul-2022)
The state of any object defines and/or represents its circumstance, situation or form. Any dynamic object
changes its state from one to another. A process is very much a dynamic object that goes through various
states as described below (see Fig 2.5).
New: This is the first state of a process. When a process is created or a program is invoked the OS creates a
new execution context (recall Sec 1.4.2.3), allocates a process address space in the main memory and other
necessary per-process resources in the kernel mode.
Ready: Once the per-process resources are created, the process becomes ready for execution. It needs a
processor (actually, a core of a processor, to be specific) to be allocated.
Running: As soon as a processor is allocated, the process starts executing instructions from the program
text. Here, the program runs in user mode; however, for privileged instructions, it can switch to kernel mode
as well.
Waiting: When the process needs an I/O to be done or explicitly waits (via a system call like wait()), the
process is taken off the processor and is considered to be waiting. When the I/O is complete or the explicit
wait is over, the process becomes ready and joins the ready queue. It can run only when it is allocated to the
processor.
Terminated: When the process completes (normally or even abnormally), the process address space is
reclaimed by the OS. All process-related resources are also de-allocated. This state is called the terminated
state. A process cannot be made to run from this state. As mentioned above, the state in which a process has
completed execution but its resources are not yet deallocated is known as the zombie state.
Nomenclatures of these states vary from one OS to another. However, other than the running state, all the
other states are handled in kernel mode only. A switch from kernel to user mode, or vice versa, can happen
only in the running state of a process.
After a process is created, it changes its states from one to another as it proceeds in its life cycle (very much
like infancy, childhood, adolescence, youth, middle-age, old-age of a person). Along with the states, there
are several other parameters of a live process that the OS kernel has to keep track of. For example, which
instruction a process is currently running and thus where the CPU will find the next instruction from
(remember program counter or PC), how many special purpose registers (SPRs) like stack pointer (SP),
program status word (PSW), condition codes (CC) it is using and what are their values, how many general
purpose registers (GPRs) the current process holds, what are their values, what are values of base register
and limit register with respect to the process, how many I/O devices it is currently allotted, etc. This
information is particularly important when a process moves from running to waiting or running to ready
state, because we have to resume the process exactly from the same condition where it was suspended
(before the state transition). It is as if we need to take a snapshot of the running condition of the process, with
the values of all the controlling variables and accounting parameters, and preserve that snapshot. All such control
information related to a process collectively defines an execution context of the process and is called the
process context (recall the definition of execution context in Sec 1.4.2.3). In a multiprogramming environment
where resources are shared among several processes, maintaining and keeping track of process contexts is
absolutely critical for correct and smooth running of the system.
The OS kernel maintains a special data-structure called process control block (PCB) or process descriptor in
its kernel space for each live process. This is a per-process data structure that stores the context of a process.
A user process may not need it for its running, but the kernel maintains it for managing and monitoring the
process and providing protection to other processes.
The PCB has a number of attributes, some of which are the following (Fig 2.6).
Process id: Every process has a unique identifier.
User id: The owner of the process.
Process state: The current state of the process, which can be new, ready, running, waiting, terminated, etc.
Scheduling information: For CPU allocation to a process, process priority, pointers to scheduling
queues, and other scheduling parameters need to be maintained.
Memory-management information: Information like value of the base and limit registers and the
page tables, or segment tables.
Accounting information: Information like amount of CPU time used, wait time, time limits etc.
Software context: This can be a list of open files, open sockets (ip-address + port-address, used for
communicating with remote processes) and memory regions.
Hardware context: A number of hardware details need to be kept track of, such as
o Program counter - stores the address of the next instruction to be executed
o Stack pointer - points to the top of the stack of the procedure that is being executed
o Other CPU registers - values held by accumulators, PSW, CC and other GPRs
o I/O devices - list of I/O devices held by the process
Pointers to different data structures: There are other data structures that are needed by the kernel
for managing a process - pointers to all of such data structures are kept in the PCB which are
dynamically added and deleted.
When the CPU is changed from one process to another, the context of the first process is saved and that of
the second process is loaded into appropriate registers and other data structures. We call this context
switching or process switching. Mind that process switching happens from one process to another in relation
to a CPU allotment and is managed by the OS. But, mode switching (user mode to kernel mode or vice versa)
is essentially a processor mode activity - that happens within the context of a running process. Context switch
is a kernel activity and seen only in a multiprogramming environment. It is done to improve the performance
of the OS (to increase throughput, reduce average execution time for a set of processes etc).
A context switch involves the following steps.
● The hardware context of the processor (PC, SP, PSW and other registers) is saved.
● The PCB of the running process is updated with the hardware context. The process state is changed from
running to the appropriate state (waiting, ready or terminated), along with other relevant fields
including accounting information.
● The PCB is put into the appropriate queue (ready queue, blocked-on-some-event queue, I/O queue etc.).
● Another PCB is selected based on the priority and position of its process in the scheduler queue.
● The selected PCB is updated with the new process state (from its earlier state to the running state).
● Memory-management data structures are updated (e.g., the base and limit registers of the processor).
● The hardware context of the processor is restored from the selected PCB.
A context switch thus takes some amount of time to complete these tasks. In comparison, processor mode
switching (user to kernel or vice-versa) is less time-consuming.
2.7 THREADS
Processes have been discussed so far as units of program execution and resource allocation & utilization.
Traditional OSs allocate computing resources at process level and monitor execution of each process as if it
has a single flow of execution. The flow is suspended when it goes for some system call or I/O operation.
However, in many tasks, there are many independent subtasks that can be done in parallel. For example, in
real life, you can solve numerical problems and listen to music simultaneously. A busy lawyer
handles several cases simultaneously, since a single case does not get hearing dates continuously, but
after intervals of days. After one hearing, while a case waits for the next hearing, the lawyer can handle other
cases.
In the world of computing, the task of matrix multiplication can be divided into several independent subtasks.
If A (of order m×n) and B (of order n×p) are two matrices that are multiplied to generate another matrix C (of order m×p),
then C[i, j] = Σ A[i, k] * B[k, j], summing k from 1 to n, for i = 1, ..., m and j = 1, ..., p.
Here, each of the (m×p) entries of matrix C can be computed independently and finally assembled together to
generate matrix C.
Recent advances in processor architecture provide multiple CPU units within a single processor and even
multiple cores within a single CPU. Hence, if a task can be intelligently divided into several independent
subtasks, they can be allocated to different cores of a CPU and the entire task can be efficiently accomplished
in a short period of time.
These facts have been instrumental in bringing the concept of threads.
2.7.1 Definition
A thread is a single flow of execution and considered a basic unit of CPU utilization. A process can have one
or more threads. A traditional process is considered to have a single thread of control, but most modern OSs
support processes to have more than one thread.
Each thread can run independently. If there are multiple CPUs or multicores within a CPU, threads of a single
process can execute in parallel simultaneously. Each thread has a thread ID, and holds a program counter
(PC), a register set, and a stack on its own. However, code section, data section, and other operating-system
resources, such as open files and signals are shared by all threads within a process (Fig 2.8).
Improved performance: Multithreading enables division of a task into several independent subtasks,
each of which can be performed by a thread. This reduces the blocking time of processes increasing
overall CPU utilization and user responsiveness. This is particularly useful in interactive applications
where the user does not have to wait for completion of one action before invoking another, especially
when such action is time-consuming.
Resource sharing: Threads belonging to a process share the memory and the resources by default. This
increases the overall resource utilization by a set of processes of a system.
Low cost: Allocating memory and resources to processes are costly in terms of both space and time.
Thread creation takes almost 10 times less time than a process creation. A thread switch is also less
expensive than a process switch in terms of space and time.
Scalability: Several threads of a process can run in parallel on different CPU cores, whereas a single-
threaded process can run on only one processor, no matter how many cores are available. Threading
thus unleashes the advantage of exploiting the full potential of modern multi-core multi-processor
architecture.
However, there are a few disadvantages as well.
Increased stack space: Since each thread needs a stack that comes from the stack space of the
corresponding process, a restriction is usually set on per-thread stack size. A thread's stack cannot
always grow on demand, which is often a bottleneck for application development.
Increased complexity: Multi-threaded applications exhibit non-deterministic behaviour as ordering of
threads is difficult to implement. Designing and developing concurrent multi-threaded applications,
debugging and correcting them are very complex and demanding exercises.
Overall, advantages often outweigh disadvantages and use of multi-threading is on the rise across
applications.
Threads are created, managed and destroyed in the user space by the threads library. User heap space
maintains the thread descriptors and user stack space is divided into thread stack spaces. In a pure ULT
system, the kernel allocates only a single CPU core to the process and thread concurrency is achieved at the
user level via threads library. Only a single ULT can run at a time while other threads need to wait in blocking
or ready state. True parallel execution is therefore not possible in a pure ULT system. To interact with the
kernel, a ULT first makes a thread API call provided by the threads library (Fig 2.10). But the OS can see only
processes. So the call is converted to a process API call by the system library, which in turn issues
a process system call to the underlying OS. ULTs are entirely managed by the threads library, and any
communication from a ULT to the kernel can happen only on behalf of the entire process.
Advantages:
Since ULTs are managed in user space, thread management does not require any mode switch (user to
kernel mode).
Thread switching is less costly in space and time than context switching.
Application programs need not be changed depending on whether the OS supports multi-threading or
not.
Disadvantages:
Thread-level concurrency is limited as true parallel execution is not possible.
User threads can directly interact with the kernel space, with the necessary mode switch (user to kernel mode).
A ULT, through a KLT, can make a system call independently of other threads from the same process (Fig 2.12).
There can thus be two or more system calls from a single process at the same time.
Advantages:
KLTs help to achieve true parallelism and provide substantial speed up in execution.
Disadvantages:
Since thread management happens in kernel space, every thread switch results in a mode switch (user
mode to kernel mode and vice versa). A mode switch is an order of magnitude more time-consuming than
a pure ULT switch.
KLTs have scalability issues. When a very high number of KLTs is required, the kernel space requirement
also increases, burdening the system in terms of main memory usage.
Both pure ULT and pure KLT have their pros and cons. Some systems therefore use a mixed or hybrid kind
of threading.
Many-to-many model: Here, many ULTs map to a smaller or equal number of KLTs. How many KLTs are
assigned to a process varies depending on the application or the architecture. On a multi-core processor, a
higher number of KLTs can be allocated (Fig 2.16).
Although theoretically a higher number of cores can run multiple threads in parallel leading to increase in
performance, the speed-up is not linear. This follows Amdahl’s Law that justifies the diminishing return.
speed-up = (time to execute a program using a single thread) / (time to execute the same program using multiple threads)
= 1 / (s + (1 − s)/N)
where s is the fraction of serial code that cannot be parallelized, and N is the number of threads.
Fig 2.17 shows the variation of speed-up vs the number of cores that run threads in parallel, ignoring the
overhead of creating threads. With a higher fraction of serial (or non-parallelizable) code, speed-up diminishes.
For example, with 50% serial code, the maximum speed-up is 2, attainable only when a huge number of CPU
cores is employed. When the overhead of creating threads and of thread switches is considered, speed-up is
even worse. In fact, speed-up falls after reaching a peak, due to the increasing cost of thread creation and
switching.
CPU is the most important resource of any computing system as it is the only resource that executes
instructions. Every process needs a CPU core non-sharably to execute its code (like every train needs a track).
During its life-time, a process uses CPU for execution and other resources (persistent memory and I/O
devices) for other activities like read, write, display, print and so on. When the process uses other resources
and does not require CPU, the CPU remains idle (recall from Fig 2.7). In a single process system, this is not a
problem as the process can monopolize the CPU. But in a multiprogramming system where several processes
are waiting to access a CPU core, keeping the CPU idle is a waste of time. To maximize the performance of a
system, the CPU utilization should be maximum. In other words, whenever a CPU is free, we should allow
other processes to use it. But at a single point in time, only one process can use the CPU. In a single-core CPU,
all the processes that require the CPU should be allocated some CPU time one after another. Even in multi-core
CPU or multiprocessor systems, the number of processes is much higher than the number of cores. Hence,
not all processes get a CPU core as and when required. So which process should get a CPU core, when and
for how long - are very important questions. Remember from Unit 1 (Sec 1.1) that resource allocation is an
important responsibility of any OS and CPU is the most important resource. Process scheduling or CPU
scheduling is the task of an OS dealing with the allocation of a CPU or a CPU core to a set of processes.
CPU Scheduling is a kernel activity that involves context switch and change of states in the processes (from
ready to running and running to ready or running to wait/blocked states). CPU scheduling is managed by an
OS program - known as CPU scheduler. The scheduler runs in quick intervals, checks the queue of ready
processes and allocates a CPU core to one process when the core is free.
Short-term scheduler: Once processes are brought into the CPU ready queue, which of them will be
assigned the CPU next, what the selection criteria for CPU allocation will be, when it will be assigned and
for how long - these fine-grained decisions are taken by the short-term scheduler. It is also called the dispatcher.
The job of the CPU scheduler or dispatcher is to
a) ensure the context switch from one process to another;
b) switch to user mode (from kernel mode);
c) jump to the appropriate location in the user program to start / resume the process.
The aim of a CPU scheduler is to achieve “goodness” with respect to some measurable criteria. Different
systems have different criteria, some are user-oriented and some system-oriented. Goodness according to
each criterion is measured in terms of a performance metric. Some performance metrics are defined below.
CPU Utilization: Utilization of any resource is defined as the ratio of its busy time to the total time, including
its idle time. Hence,
CPU utilization = (CPU busy time) / (total time) = (CPU busy time) / (CPU busy time + CPU idle time)
The utilization can be expressed as a fraction or a percentage. It lies between 0 and 1 (or 0% and
100%). The higher its value, the better the performance of the entire system, as it means the CPU has less
idle time. (Use the top command on MacOS and UNIX-based systems to see CPU utilization: a value of 40%
or less means a lightly loaded system; 90% or more means a highly loaded system.)
Throughput: Throughput is used to measure the performance of any system in terms of units of work or
task accomplished in unit time. In case of CPU scheduling, it is defined as the number of processes
completed in unit time (say in 1 second). Its value can be any positive real number per unit time. For a
set of long processes, throughput can be a fractional value (say, 0.05 per sec), whereas for very short
processes, we can have integers (say, 10 per sec).
Turnaround Time: It is the total time from when a process is created to the time of its completion. Hence, it is
the sum of the wait time in the ready queue, CPU execution time, wait time in the I/O queue and time for
doing I/O. Mathematically,
Turnaround (TA) time = (wait time in ready queue) + (CPU execution time)
+ (wait time in I/O queue) + (time for doing I/O).
For any process, instruction execution in CPU and I/O activities are not contiguous. Both these activities
rather happen in spells - few CPU bound instructions are followed by an I/O bound action and then again CPU
bound instructions and so on.
These spells are also called bursts. A CPU burst (a continuous sequence of CPU-bound instructions) is followed
by an I/O burst and vice versa. Any process can be considered a sequence of several CPU bursts and I/O
bursts, mandatorily starting and ending with a CPU burst.
Burst time is defined as the time spent executing the activity in the burst, excluding the wait time in
the queue. Hence, TA time can also be defined as the sum of all CPU bursts, I/O bursts and wait times
in different queues:
TA time = Σ (CPU burst times) + Σ (I/O burst times) + Σ (wait times in different queues)
TA time is any positive real number, usually expressed in microseconds, milliseconds or seconds.
Waiting Time: A process has to wait for any resources if there is a high demand for the resource. Every
resource is generally associated with a queue, where processes wait to access the resource. Waiting time
is the time spent in the queue for the resource starting from joining the queue to using the resource. In
CPU scheduling, waiting time means waiting in the CPU ready queue, unless otherwise mentioned.
Response Time: In interactive processes, users are more interested in getting the responses from the
system. Users may tolerate delay in completion of the entire task if it may take time and continue in the
background. In those cases, turnaround time is not that important, but what matters is the time spent in
getting the first response from the system. Hence, response time is defined as the time between
submission of a request and getting the first response from the system (not completion of production of
the response).
For a good scheduling algorithm, we expect high CPU utilization, high throughput, low TA time, low wait time
and low response time. However, not all such criteria can be met in a single algorithm. There are different
algorithms to prioritize different criteria. We shall learn a few algorithms here.
A little thought over the above cases shows that some of the cases (Circumstances 1, 2, 5) need eviction of
the CPU from a currently executing process, unless it voluntarily releases it. In other cases (Circumstances 3,
4, 6), the CPU can be voluntarily released by the executing process. It is up to the designer of the OS to decide
whether the OS will apply force to evict CPU from the running process or not. If scheduling algorithms allow
forceful eviction (or preemption) of the CPU, they are called preemptive algorithms. When no preemption is
allowed and the processes can only voluntarily release CPU, corresponding scheduling algorithms are called
non-preemptive algorithms. Obviously, non-preemptive algorithms have the potential problem of
monopolizing the CPU, particularly by long processes when other processes suffer from indefinite block or
starvation. The starvation can lead to a catastrophe if the executing process goes into an infinite loop due to
some programming errors. Preemptive algorithms do not suffer from this problem. But they cause frequent
context switches and incur associated overhead. Also, a context switch can lead to a serious issue when the
preemption occurs in the middle of a modification of a shared data. If the process modifying the shared data
could not complete it before the context switch, and another process uses the data immediately after - the
second process gets incorrect data. This problem, called data inconsistency problem, adds complexity and is
discussed in process synchronization. Nevertheless, most modern OSs (Windows, Linux, UNIX, MacOS) use
preemptive algorithms nowadays.
In First-Come First-Served (FCFS) scheduling, processes are scheduled in the order of their arrival; a
scheduled process holds the CPU until it terminates or
leaves the CPU for some I/O operation. When the CPU is free, the process that has the earliest arrival time is
scheduled next. FCFS can be implemented using a FIFO ready queue. Even though it is simple, it is not a very
efficient algorithm as far as performance is concerned. The long processes can hold the CPU for long causing
starvation to late comers (See Example 1).
If we compare Example 1 and Example 2, the processes have the same CPU burst times but different arrival
times. Processes arrive at different time points in Ex 1 but at the same time in Ex 2. We considered the same
arrival time in Ex 2, to illustrate the non-preemptive SJF algorithm.
It seemingly shows performance gain in both average waiting time and average TA time. However, the comparison
is not fair, as the processes have different arrival times in the two problems. Hence, we revisit the problem of
Example 1 in Example 3 with the non-preemptive SJF algorithm again.
Example 3 shows improvements over the FCFS. But can we do any better? What if we can suspend P1 as soon
as P2 arrives with 3ms of execution when P1 has 4ms of execution left? Can there be any gain if we preempt
P1 and run P2? In other words, what will be the gain in a preemptive SJF? In preemptive SJF, we check CPU
burst time of every process whenever a new process joins the ready queue. We schedule a new process
preempting the current one only if the new process has the smallest CPU burst time. This is therefore also
called the Shortest Remaining Time Next (SRTN) or Shortest-Remaining-Time-First (SRTF) algorithm. Let us
revisit Example 1 with SRTN or preemptive SJF below.
Compare Example 3 and Example 4 carefully. We have done better both in terms of average waiting time
and average TA time in preemptive SJF or SRTN.
SJF, though elegant, is difficult to implement as we do not know the CPU bursts of the processes before they
execute. Sometimes, the next CPU burst of a process is estimated from its past CPU bursts as an exponential
average:
τ_{n+1} = α t_n + (1 − α) τ_n,
where τ_{n+1} is the estimate for the next CPU burst, t_n is the most recently observed CPU burst, and τ_n is
the previous estimate. α is the weightage (0 ≤ α ≤ 1) given to the real burst time and (1 − α) to the
estimated burst time (at the n-th time) in the new estimate. The estimated burst times can be used for
implementing SJF algorithms.
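The estimator is a one-liner; α = 0.5 below is a common textbook choice, and the helper for folding a whole history of bursts is added for illustration:

```python
def next_estimate(tau_n, t_n, alpha=0.5):
    """tau_{n+1} = alpha * t_n + (1 - alpha) * tau_n."""
    return alpha * t_n + (1 - alpha) * tau_n

def estimate_series(tau_0, bursts, alpha=0.5):
    """Fold a sequence of observed bursts into successive estimates."""
    estimates = [tau_0]
    for t in bursts:
        estimates.append(next_estimate(estimates[-1], t, alpha))
    return estimates

print(estimate_series(10, [6, 4, 13, 13]))
```

With α = 1 the estimate tracks only the last observed burst; with α = 0 history dominates and observations are ignored.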
Let us see the same problem with higher quantum value in Example 6.
With higher quantum, processes with smaller CPU bursts suffer, especially if they join the ready queue late
(see P4 in Example 6). If we use a very high time quantum (say 5ms or higher in Example 6), it becomes FCFS.
However, the number of context switches decreases (4 in Example 6 compared to 11 in Example 5).
We did not consider the overhead of a context switch here, but it is not always negligible. Therefore, the
time quantum must be much greater than the context-switch time. The quantum is generally kept between 10 and
100 milliseconds in modern OSs, while a context switch takes on the order of a few microseconds.
The RR algorithm can be implemented using a circular queue and a timer that interrupts to invoke the
dispatcher when the time quantum expires, causing a context switch. The dispatcher picks the next process
from the front of the queue.
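That circular-queue mechanism can be sketched as below. One convention is assumed here (arrivals during a slice join the ready queue before the preempted process re-joins at the tail); real schedulers may order this differently, and the data is illustrative:

```python
from collections import deque

def round_robin(procs, quantum):
    """procs: list of (name, arrival, burst); returns completion times."""
    procs = sorted(procs, key=lambda p: p[1])
    remaining = {n: b for n, a, b in procs}
    arrival = {n: a for n, a, b in procs}
    pending = deque(n for n, a, b in procs)   # not yet arrived, by arrival time
    ready, t, finish = deque(), 0, {}

    def admit(now):                           # move arrived processes to ready
        while pending and arrival[pending[0]] <= now:
            ready.append(pending.popleft())

    admit(0)
    while ready or pending:
        if not ready:                         # idle until the next arrival
            t = arrival[pending[0]]
            admit(t)
        cur = ready.popleft()
        run = min(quantum, remaining[cur])    # one time slice (or less)
        t += run
        remaining[cur] -= run
        admit(t)                              # slice-time arrivals enter first
        if remaining[cur] == 0:
            finish[cur] = t
        else:
            ready.append(cur)                 # preempted: back to the tail
    return finish

print(round_robin([("P1", 0, 3), ("P2", 1, 4), ("P3", 2, 2)], quantum=2))
```

Varying `quantum` here reproduces the behaviour discussed above: a tiny quantum maximizes context switches, while a very large one degenerates into FCFS.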
Here, we considered a non-preemptive version of the priority algorithm. It suffers from the problem of
indefinite blocking, causing starvation of other waiting processes. Hence, preemptive priority-based
algorithms (whenever a higher-priority process arrives, the current process is preempted and the
higher-priority one is scheduled immediately) or RR-based priority algorithms are more popular. The RR-based
algorithm is also implemented using multi-level priority queues, where each queue stores processes of the
same priority level. Every process executes for a time quantum and is then preempted and put at the end of
the ready queue of the same priority level. Often multi-level priority queues with feedback are used, where
short processes (for example, interactive processes) are put in queues with higher priority and a small time
quantum, and long processes (batch jobs) are put in queues with lower priority and a large time quantum.
Processes can move from a high-priority queue to a low-priority one after executing a time slice, and from
low to high priority after waiting in a queue beyond a threshold time.
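The feedback idea can be sketched with just two levels. This is a deliberately reduced model: all processes arrive at time 0, promotion by waiting time (aging) is omitted, and the queue count, quanta, and process names are illustrative:

```python
from collections import deque

def mlfq(bursts, q0=2, q1=4):
    """bursts: {name: burst}. Level 0: high priority, small quantum q0.
    Level 1: low priority, large quantum q1. Exhausting a slice demotes."""
    high = deque(bursts)            # every process starts at the top level
    low = deque()
    remaining = dict(bursts)
    t, finish = 0, {}
    while high or low:
        level, q = (high, q0) if high else (low, q1)  # top level drains first
        cur = level.popleft()
        run = min(q, remaining[cur])
        t += run
        remaining[cur] -= run
        if remaining[cur] == 0:
            finish[cur] = t
        else:
            low.append(cur)         # used its full slice: demote (feedback)
    return finish

print(mlfq({"A": 3, "B": 6}))
```

Short interactive jobs finish while still at the high-priority level; long batch jobs sink to the low-priority queue, just as the paragraph above describes.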
a. Load balancing: As SMP systems can have independent scheduling per core, some of the cores might be
overloaded while others are lightly loaded. Load balancing is particularly necessary when cores have their
private ready queues. This is achieved by push migration (a special process that runs periodically to check
the loads of each processor and pushes some threads from a highly loaded processor to a lightly loaded one)
or pull migration (an idle processor pulls threads from the queue of a loaded processor). Some SMP systems
use both push and pull migrations.
b. Processor Affinity: When a process / thread runs on a processor, some of its code and data remain in the
cache attached to that processor. When the process (or thread) migrates to a different processor, the
corresponding cache does not have the code and data, and they need to be fetched again. Had the process /
thread been scheduled on the old processor again, the old copies of the code and data could be reused, and
time could be saved. An OS therefore often attempts to schedule a given thread on a single processor, even
though the allotment is not always guaranteed (this situation is called soft affinity), or allows processes
to make system calls requesting scheduling on a given processor (hard affinity).
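On Linux, hard affinity is exposed through the sched_setaffinity system call, which Python's os module wraps directly. A Linux-specific sketch (the function name is mine; pid 0 means the calling process):

```python
import os

def pin_to_cpu(cpu):
    """Pin the calling process to a single CPU; return the previous mask."""
    old = os.sched_getaffinity(0)       # current set of allowed CPUs
    os.sched_setaffinity(0, {cpu})      # hard affinity: only this CPU now
    return old

if __name__ == "__main__":
    old_mask = pin_to_cpu(0)
    print("now restricted to:", os.sched_getaffinity(0))
    os.sched_setaffinity(0, old_mask)   # restore the original mask
```

After the call, the kernel will schedule the process only on the named CPU, so its cached code and data stay warm at the cost of losing load-balancing flexibility.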
are used for background processes that run for longer duration. The littles save energy while the bigs deliver
performance. Windows 10 supports HMP scheduling.
For a hard RTOS scheduler, interrupt latency must be less than the event latency. Often interrupts are
disabled when some kernel data structures are accessed. RTOS requires that disabling interrupts can be
allowed only for a very short time period. Also, to minimize dispatch latency, the RTOS kernel needs to be
preemptive. Whenever a high-priority process arrives, the CPU should be allocated to it in the minimum time
possible, preempting any running low-priority process and freeing the resources held by it. Hence, an RTOS
should have priority-based preemptive scheduling.
With respect to RTOS scheduling, there are some important concepts and terms that need to be discussed.
RTOSs are mostly used within embedded systems which collect data from sensors at regular intervals. The
tasks that are performed at regular intervals are called periodic and the time between initiation of two
successive such tasks is called a period (p) (Fig 2.26).
In preemptive scheduling, a running task may not complete in one go (as it would under non-preemptive
scheduling); however, the time required for completing the task (t) must be smaller than the deadline (d).
Generally, 0 ≤ t ≤ d ≤ p. The rate of a periodic task is the frequency with which the task appears (1/p) in
unit time and is often expressed in Hertz (Hz).
Deadline (d) is a time-constraint by which a task should either start or end. Typically, completion or end
deadlines are more popular.
There are some tasks which occur from time to time, but not in regular intervals. They are called aperiodic.
For example, closing of a valve in a duct when fluid level reaches a certain threshold.
With this background we can discuss two scheduling algorithms that are popularly used in RTOSs.
One important concept related to hard RT scheduling is schedulability - i.e., whether a given set of periodic
processes can be at all scheduled or not, meeting all the hard deadlines. This is checked in terms of CPU
utilization.
If the set of processes P_1, P_2, …, P_n have execution times t_1, …, t_n with periods p_1, …, p_n
respectively, then the utilization of each task is given by U_i = t_i / p_i.
For the entire set of processes, the sum of these utilizations must be less than or equal to the maximum
possible utilization, i.e. 1.
In other words, t_1/p_1 + t_2/p_2 + ⋯ + t_n/p_n ≤ 1.
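The check is a direct translation of the sum. As a point of comparison (not from the text above), the Liu–Layland sufficient bound for rate-monotonic scheduling, U ≤ n(2^(1/n) − 1), is included too, since utilization ≤ 1 by itself guarantees schedulability only under EDF:

```python
def utilization(tasks):
    """tasks: list of (execution_time, period); total CPU utilization."""
    return sum(t / p for t, p in tasks)

def edf_schedulable(tasks):
    """EDF can meet all deadlines iff total utilization is at most 1."""
    return utilization(tasks) <= 1.0

def rm_guaranteed(tasks):
    """Liu-Layland sufficient (not necessary) bound for rate-monotonic."""
    n = len(tasks)
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)

print(utilization([(1, 3), (1, 4), (1, 6)]))
```

A task set can fail the RM bound yet still be RM-schedulable; failing the EDF bound, however, means no algorithm can meet all the deadlines on one CPU.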
In Example 10, a little thought will reveal that if, instead of EDF, we follow a non-preemptive FCFS
algorithm, T1 would continue for 25 ms. When T1 releases the CPU, we have already missed the end-deadline of
T2, which is (5 + 20) = 25 ms itself.
EDF can also be applied to periodic tasks (see Example 11). The example can also be tried with the RM
algorithm. The RM schedulability criterion will show that there is no guarantee of finding a solution; in
fact, applying RM to Example 11 leads to a situation where we cannot meet all the deadlines. Students are
strongly encouraged to try it on their own and get convinced.
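A sketch of preemptive EDF for aperiodic tasks, in 1 ms steps. The task set in the demo is illustrative (it is not Example 10 or 11):

```python
def edf(tasks):
    """tasks: list of (name, arrival, exec_time, absolute_deadline).
    Returns (finish_times, set_of_tasks_that_missed_their_deadline)."""
    remaining = {n: e for n, a, e, d in tasks}
    arrival = {n: a for n, a, e, d in tasks}
    deadline = {n: d for n, a, e, d in tasks}
    t, finish, missed = 0, {}, set()
    while remaining:
        ready = [n for n in remaining if arrival[n] <= t]
        if not ready:
            t += 1
            continue
        cur = min(ready, key=lambda n: deadline[n])  # earliest deadline first
        remaining[cur] -= 1
        t += 1
        if remaining[cur] == 0:
            del remaining[cur]
            finish[cur] = t
            if t > deadline[cur]:
                missed.add(cur)
    return finish, missed

print(edf([("T1", 0, 10, 33), ("T2", 4, 3, 10)]))
```

When T2 arrives at 4 ms with the nearer deadline, it immediately preempts T1, which is exactly the behaviour the FCFS comparison above lacks.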
In both the scheduling algorithms related to RTOSs, we assumed that tasks or processes are independent.
However, they may be interdependent. Precedence constraints then necessitate a topological sort of the
processes to find an execution order. A deadline-based scheduling policy may involve other constraints as
well; some of them will be discussed in later units.
UNIT SUMMARY
This chapter introduced the basic units of program execution: processes and threads.
A process is a running instance of a program which is a set of instructions. Execution of a
program is facilitated and managed by an OS on the computing hardware. An OS sees it in terms
of a process and allocates resources to the processes.
A process may have more than one independent flow of execution, each of them can be
executed concurrently. Each such execution flow is called a thread. Processes are units of
resource allocation by an OS while threads are units of work.
There are different terms related to a process: process states, process control block, context
switching and similarly that of threads like thread states, thread types and multi-threading.
Every process has a life cycle: it is first created and assigned resources. Once it becomes ready to run, it
is allocated the CPU and begins execution. When it needs I/O, it goes to the wait state and is then
rescheduled for the CPU. After one or more CPU and I/O bursts, it finally terminates.
Processes are connected to each other in a parent, child and siblings relationship.
Threads are of two types: ULTs and KLTs. ULTs are managed in user applications through
threads library. KLTs are managed by the OS.
There are different CPU scheduling algorithms for single-processor systems. Some are preemptive, where the
scheduler forcibly removes a currently running process from the CPU. Non-preemptive ones rest on voluntary
release of the CPU by the processes.
FCFS algorithm schedules processes according to the time of arrival. SJF gives preferences to
the shortest remaining job at any moment. The RR algorithm offers a fixed time slice to all
processes in the queue. Priority scheduling prefers important processes over the not-so-
important ones.
Different scheduling algorithms have different purposes to serve. Their performance can be
measured using different metrics like CPU utilization, throughput, average wait time, average
turnaround time, response time etc.
Multiprocessor systems are nowadays quite commonplace even for PCs. Load balancing,
processor affinity are two important issues in homogeneous multiprocessor scheduling.
Heterogeneous multiprocessing is seen in mobile computing devices nowadays to save battery
power.
Scheduling in real time systems is deadline driven. Hard real time systems require that deadlines
to be met for all tasks. Soft real time systems need predictable and time bound responses. The
RM algorithm is used for periodic RTOS tasks, but EDF can be used for both periodic and
aperiodic tasks.
EXERCISES
Q1. In UNIX, which of the following commands is used to set task priority?
A init
B nice
C kill
D ps [UGC NET CS (2012)]
Q5. Which combination of the following features will suffice to characterize an OS as a multi-
programmed OS?
a. More than one program may be loaded into main memory at the same time for execution
b. If a program waits for certain events such as I/O, another program is immediately scheduled for
execution
c. If the execution of a program terminates, another program is immediately scheduled for execution.
A. (a) B. (a) and (b) C. (a) and (c) D. (a), (b) and (c) [GATE (2002)]
Q6. Consider the following statements with respect to user-level threads and kernel-supported
threads
I. context switch is faster with kernel-supported threads
II. for user-level threads, a system call can block the entire process
III. Kernel supported threads can be scheduled independently
IV. User level threads are transparent to the kernel
Which of the above statements are true?
Q9. Which scheduling policy is most suitable for a time-shared operating system?
A. Shortest Job First B. Round Robin
C. First Come First Serve D. Elevator [GATE (1995)]
Q10. Four jobs to be executed on a single processor system arrive at time 0 in the order A, B, C, D.
Their CPU burst time requirements are 4, 1, 8, 1 time units respectively. The completion time
of A under round robin scheduling with a time slice of one time unit is ____.
Q11. The process state transition diagram of an operating system is as given below. Which of the
following must be false about the above operating system?
Q1. On a system with N CPUs, what is the minimum number of processes that can be in the
ready, running and blocked states?
Q2. What is the principal advantage of multiprogramming?
Q3. What is the principal disadvantage of too much multiprogramming?
Q4. What are the differences between process switch and thread switch? When do they occur?
Q5. What are process execution modes? Explain their purpose.
Q6. What is the difference between turnaround time and response time?
Q7. What is the purpose of a ready queue?
Q8. Explain two level thread scheduling.
Q9. On a system using round robin scheduling what would be the effect of including one process
twice in the list of processes?
Q10. In the following process state transition diagram for a uniprocessor system, assume that there
are always some processes in the ready state:
Q1. The operating system protects one process from another one. Why does it not protect one thread
from its sibling thread?
Q2. What are user-threads and kernel-threads? Write the similarities and differences between them.
Q3. Why does the UNIX system use the zombie state? Is this an execution state of a thread or a
process?
Q4. Define system throughput and CPU utilization. Are these two related to one another?
Q5. Explain FCFS scheduling and discuss its advantages and disadvantages.
Q6. What is priority-based scheduling? Explain the difference between preemptive priority
scheduling and non preemptive priority scheduling.
Q7. Is a non-preemptive scheduling algorithm a good choice for an interactive system? Justify your
answer.
Q8. Draw the process state transition diagram of an OS in which (i) each process is in one of the five
states: created, ready, running, blocked (i.e., sleep or wait), or terminated, and (ii) only non-
preemptive scheduling is used by the OS. Label the transitions appropriately.
Q9. How is uniprocessor scheduling different from multiprocessor scheduling? Explain.
Q10. What are the issues of real time scheduling? Discuss its specialities in comparison to
uniprocessor systems.
Numerical Problems
Q1. Consider the following set of processes, with the arrival times and the CPU-burst times given in
milliseconds.
Process  Arrival Time  Burst Time
P1       0             5
P2       1             3
P3       2             3
P4       4             1
What is the average turnaround time for these processes with the preemptive shortest remaining
processing time first (SRPT) algorithm? [GATE(2004)]
Q2. Consider three CPU-intensive processes, which require 10, 20 and 30 time units and arrive at
times 0, 2 and 6, respectively. How many context switches are needed if the operating system
implements a shortest remaining time first scheduling algorithm? Do not count the context
switches at time zero and at the end. [GATE(2006) ISRO (2009)]
Q3. Consider three processes, all arriving at time zero, with total execution time of 10, 20 and 30
units, respectively. Each process spends the first 20% of execution time doing I/O, the next 70%
of time doing computation, and the last 10% of time doing I/O again. The operating system uses
a shortest remaining compute time first scheduling algorithm and schedules a new process
either when the running process gets blocked on I/O or when the running process finishes its
compute burst. Assume that all I/O operations can be overlapped as much as possible. For
what percentage of time does the CPU remain idle? (up to 2 decimal places)
Q4. Consider the following four processes with arrival times (in milliseconds) and their
length of CPU bursts (in milliseconds) as shown below:
Process p1 p2 p3 p4
These processes are run on a single processor using the preemptive Shortest Remaining Time First
scheduling algorithm. If the average waiting time of the processes is 1 millisecond, then the
value of Z is _____? [GATE(2019)]
Q5. Consider a uniprocessor system executing three tasks T1 ,T2 and T3 each of which is composed
of an infinite sequence of jobs (or instances) which arrive periodically at intervals of 3, 7 and
20 milliseconds, respectively. The priority of each task is the inverse of its period, and the
available tasks are scheduled in order of priority, with the highest-priority task scheduled
first. Each instance of T1, T2 and T3 requires an execution time of 1, 2 and 4 milliseconds,
respectively. Given that all tasks initially arrive at the beginning of the 1st millisecond and task
preemptions are allowed, the first instance of T3 completes its execution at the end
of_____________________milliseconds. [GATE (2015)]
Q6. Consider the set of processes with arrival time (in milliseconds), CPU burst time (in milliseconds),
and priority (0 is the highest priority) shown below. None of the processes have I/O burst time.
Process  Arrival Time  Burst Time  Priority
p1       0             11          2
p2       5             28          0
p3       12            2           3
p4       2             10          1
p5       9             16          4
The average waiting time (in milliseconds) of all the processes using preemptive priority scheduling
algorithm is ____.
PRACTICAL
1. Study different process-management-related POSIX calls like: i. fork() ii. exec() iii. wait() iv. sleep()
v. kill() vi. exit() vii. getpid() viii. getppid(), and use them in your program.
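As a starting point for this exercise, the fork()/exec()/wait() pattern can be sketched through Python's thin os wrappers around the same POSIX calls (UNIX-only; the helper function and its name are mine):

```python
import os

def run_in_child(argv):
    """fork a child, exec argv in it, and wait for its exit code."""
    pid = os.fork()
    if pid == 0:                        # child: replace our image with argv
        try:
            os.execvp(argv[0], argv)    # never returns on success
        finally:
            os._exit(127)               # reached only if exec failed
    _, status = os.waitpid(pid, 0)      # parent blocks, like POSIX wait()
    return os.waitstatus_to_exitcode(status)

if __name__ == "__main__":
    print("parent pid:", os.getpid())
    print("child exited with:", run_in_child(["true"]))
```

The same structure, translated to C with fork(2), execvp(3) and waitpid(2), is what the practical asks for.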
3. For a set of processes with arrival times and CPU burst times provided, implement FCFS, SJF,
RR algorithm. Use different POSIX calls to simulate the same and find out average waiting time,
TA time.
KNOW MORE
Process creation and management in UNIX & Linux are elaborately discussed and demonstrated in
[RR03] and [SR05] for hands-on experiences. Similarly for Windows [YIR17] contains the manual.
UNIX processes and their scheduling are detailed in [Bac05] and [Vah12] and about UNIX threads
in [Vah12], while for Windows threads [YIR17] stands as the authentic source.
For general discussion on processes, threads and their scheduling [SGG18], [Sta12] and [Hal15] are
good books. [Mil11] and [Sta12] contain good accounts of scheduling.
Discussion on multiprocessing environments is covered in [SGG18] and [Sta12].
For real-time systems, [Nar14] provides a brief but nice overview. Real-time scheduling is elaborately
covered in [SGG18], [Sta12] and [Nar14].
[Bac05] Maurice J Bach: The Design of the UNIX Operating System, Prentice Hall of India, 2005.
[HA09] Sibsankar Haldar and Alex A Aravind: Operating Systems, Pearson Education, 2009.
[Hal15] Sibsankar Haldar: Operating Systems, Self Edition 1.1, 2015.
[Mil11] Milan Milenkovic: Operating Systems - Concepts and Design, 2nd edition, Tata McGraw Hill,
2011
[Nar14] Naresh Chauhan: Principles of Operating Systems, Oxford University Press, 2014.
[RR03] Kay A. Robbins, Steven Robbins: Unix™ Systems Programming: Communication,
Concurrency, and Threads, Prentice Hall, 2003.
[SR05] Richard W Stevens, Stephen A Rago: Advanced Programming in the UNIX Environment (2nd
Edition), Addison-Wesley Professional, 2005.
[SGG18] Abraham Silberschatz, Peter B Galvin, Greg Gagne: Operating Systems Concepts,10th
Edition, Wiley, 2018.
[Sta12] William Stallings: Operating Systems Internals and Design Principles, 7th Edition, Prentice
Hall, 2012.
[Vah12] Uresh Vahalia: UNIX Internals, The New Frontiers, Pearson, 2012.
[YIR17] Pavel Yosifovich, Alex Ionescu, Mark E. Russinovich, and David A. Solomon: Windows
Internals, Seventh Edition (Part 1 and 2), Microsoft, 2017. https://docs.microsoft.com/en-
us/sysinternals/resources/windows-internals (as on 8-Jul-2022).
UNIT SPECIFICS
RATIONALE
This unit on interprocess communication and process synchronization starts with the discussion on different
IPC models and techniques in reasonable detail. The unit helps students learn the fundamental concepts of
communication techniques and some examples of their implementations. Interprocess communication increases the
utilization of available resources in a computer and its overall efficiency, through increased modularity and
reduced redundancy of code. But concurrent execution on the IPC data structures creates serious issues, like
race conditions, leading to data inconsistency and thus program malfunction. The sections of code within
the cooperating processes where shared data structures are accessed are called critical sections. Access to
these critical sections needs to be done in mutual exclusion to each other. The problems arising out of
simultaneous attempts to access these critical sections are called critical section problems. Necessary
definitions with relevant examples are provided and other necessary concepts are developed. Basic primitives
required towards solution to critical section problems and offered at hardware level are described. A few more
powerful primitives, developed using the basic tools at the algorithmic level as well as at the
operating-system and high-level programming-language levels, are then discussed. Finally, a few classic and
standard IPC problems are explained, and it is shown in detail how their solutions can be designed using
different synchronization tools.
detail. This unit builds the fundamental concepts to understand the concurrent (and parallel) programming
environment of an OS. The concepts developed here are central and critical to the utilization of computing
resources and their management by OS and will be used in other forthcoming units of the book.
PRE-REQUISITES
UNIT OUTCOMES
List of outcomes of this unit is as follows:
U3-O1: Define process communication models, race condition, critical section, different solution primitives
and tools like mutex, semaphores, monitors.
U3-O2: Describe methods of process communication like shared memory model, message passing model and
their implementation, different critical section problems and their solutions using various
synchronization tools and primitives.
U3-O3: Understand the need for process cooperation and the problems arising out of sharing data and
resources among cooperating processes.
U3-O4: Realize the importance of process coordination and synchronized execution of critical sections.
U3-O5: Analyze and compare different process synchronization techniques at different levels of hardware
and software.
U3-O6: Design solutions to different classical IPC problems as well as some novel (non-classical) problems.
Processes can execute within a multiprocess OS in two ways: either they share some information among
themselves, or they do not. When they share information (code and/or data) during execution, they are called
cooperating processes; otherwise, independent processes. Cooperating processes collaborate to accomplish a task
through various interprocess communication (IPC) techniques. They belong to either of the two popular IPC models:
shared memory (SM) and message passing.
A process can detach itself from the shared memory when its use is over, but the SM remains in main memory
until it is explicitly destroyed by some process (not necessarily the creator). If several processes want to
access the space simultaneously, the OS kernel does not have any control over it. Concurrent access to the
shared memory thus has to be managed at the user level only.
Fig 3.1b and Fig 3.1c show a simple example of SM implementation in a Unix-based system. The writer process (Fig
3.1b) reads an input string from the console and writes it on the shared memory. The reader process (Fig 3.1c) can
attach to it and then read from the shared memory, if it has not been destroyed. The reader can also write to
it. Any other process can attach to the shared memory and use it freely. Students are strongly encouraged to
run the code and play with it by modifying the programs. (They can learn more about the necessary syscalls
via man <shm-service-name>.)
The data structure buffer and the variable count are accessed by both processes. The variable in is used
only by the producer and the variable out only by the consumer; the variable count is updated by both.
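These shared variables form the classic bounded buffer. A single-process Python sketch of the two sides is below; the busy-wait loops and the non-atomic updates of count are precisely where the trouble begins once a producer and a consumer run this concurrently:

```python
BUF_SIZE = 5
buffer = [None] * BUF_SIZE
in_ = out = count = 0        # 'in' is a Python keyword, hence in_

def produce(item):
    global in_, count
    while count == BUF_SIZE: # busy-wait while the buffer is full
        pass
    buffer[in_] = item
    in_ = (in_ + 1) % BUF_SIZE
    count += 1               # not atomic: a read-modify-write sequence

def consume():
    global out, count
    while count == 0:        # busy-wait while the buffer is empty
        pass
    item = buffer[out]
    out = (out + 1) % BUF_SIZE
    count -= 1               # races with the producer's count += 1
    return item
```

In a single thread the logic is trivially correct; with two concurrent processes, interleaved count updates can lose increments, which is exactly the data-inconsistency scenario this unit goes on to analyse.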
3.1.2.3 Pipes
Pipes are an asynchronous, unidirectional message-passing mechanism between two related processes. Pipes are
created in kernel space and are generally unnamed. Usually, parent and child processes communicate through
unnamed pipes (Fig 3.5).
In UNIX, a pipe is treated almost like a file. However, each process has two file descriptors for a pipe: one for read
and another for write. Writer uses the write-descriptor and closes the read-descriptor, while the reader process
uses the read-descriptor, closing the write-descriptor (Fig 3.5b). In shell programming, pipes are used to send
5 https://users.cs.cf.ac.uk/dave/C/node25.html
output of one command to be used as input of another command. For example, the two popular commands in
‘ls | more’ are connected by a pipe, denoted by ‘|’: the output of ls is sent as the input of more. In the
UNIX shell, a series of commands can be cascaded using pipes this way.
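The descriptor discipline described above (writer closes the unused read end, reader closes the unused write end) can be sketched with Python's os wrappers around pipe(), fork(), write() and read(); the helper name is mine:

```python
import os

def pipe_demo(message: bytes) -> bytes:
    """Send message from a child writer to the parent reader over a pipe."""
    r, w = os.pipe()               # unnamed, unidirectional, kernel-space
    pid = os.fork()
    if pid == 0:                   # child: the writer
        os.close(r)                # close the unused read descriptor
        os.write(w, message)
        os.close(w)
        os._exit(0)
    os.close(w)                    # parent: the reader closes the write end
    data = b""
    while chunk := os.read(r, 1024):   # read until writer closes -> EOF
        data += chunk
    os.close(r)
    os.waitpid(pid, 0)
    return data

print(pipe_demo(b"hello through a pipe"))
```

Closing the write end in the parent matters: otherwise the read loop would never see end-of-file, since the kernel keeps the pipe open while any write descriptor exists.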
3.1.2.5 Sockets
Sockets are endpoints of a bi-directional communication channel through which two processes communicate. The
processes can be related or unrelated, local or remote. A socket represents a port on a host machine through which
a process sends or receives data. Sockets implement indirect IPC, i.e., any process (including the sender) that
connects to the other end of the channel, i.e., another socket, can receive or send data (Fig 3.6a).
Sockets are mostly used in client-server configurations between two remote processes. Each socket is
supposed to have a host address where the host id depends on the domain (domain can be UNIX or Internet).
In the Internet domain, a host address consists of a 32-bit IP address and a 16-bit port number. In the UNIX
domain, it is a unique name, like a filename.
An operating system provides the following system calls to implement a socket.
i. socket(): to create a socket. It returns a socket descriptor.
ii. bind(): the server binds the created socket to a local port.
iii. listen(): the socket is made ready to communicate and waits for requests from other processes.
iv. connect(): the other process (client) connects to the remote socket (server-side) as given by the host
address.
v. accept(): the server accepts the request of a remote host and creates another socket at the server for
communication.
vi. close(): closes the socket.
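The call sequence above can be sketched with a loopback server and client in one program; the client runs in a forked child, and binding to port 0 asks the kernel for any free port (the helper name is mine):

```python
import os
import socket

def socket_round_trip(payload: bytes) -> bytes:
    # Server side: socket() -> bind() -> listen() -> accept().
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))       # port 0: the kernel picks a free port
    srv.listen(1)
    addr = srv.getsockname()
    pid = os.fork()
    if pid == 0:                     # child plays the client role
        srv.close()
        cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        cli.connect(addr)            # client side: socket() -> connect()
        cli.sendall(payload)
        cli.close()
        os._exit(0)
    conn, _ = srv.accept()           # accept() yields a per-connection socket
    data = b""
    while chunk := conn.recv(1024):  # read until the client closes
        data += chunk
    conn.close()
    srv.close()
    os.waitpid(pid, 0)
    return data

print(socket_round_trip(b"hello over a socket"))
```

Note that accept() returns a new socket for the conversation while the original listening socket stays free to accept further clients, which is how servers handle many connections at once.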
Once connection is established, the socket is accessed like a file within the host programs and data is read
from or written to the socket.
Sockets can be implemented over different transport protocols like TCP or UDP on top of IP. Fig 3.6b shows a
block diagram of the sequence of data flow in a client-server implementation of sockets using TCP.
Fig 3.7 shows an actual implementation on a single UNIX system. Both programs need to be executed
simultaneously to see the communication. Sockets are implemented here at the user level with the APIs
provided by the OS. Control lies with the OS kernel, as the necessary message buffers are created in kernel
space. When several processes simultaneously attempt to access the same socket, how this is managed is a
kernel prerogative.
Sockets are an important part of networking and socket programming is considered an integral part of
network programming. You can learn more on sockets and their implementations from the given links6.
3.2 SYNCHRONIZATION
In both the IPC models, across the implementation schemes discussed, there are several shared data
structures: a shared-memory region, or a shared message buffer and shared variables. All the cooperating
processes either share data through them or modify them. It may happen that more than one process attempts
to access the same data structure (or shared variable) at the same time.
Simultaneous reads of a given shared data item by several processes may lead to contention in a
single-processor system as to who gets the first chance to read. Even though this is a scheduling issue, it
is not a serious problem: all processes are supposed to read the same value of the variable or see the same
state of the shared data structure.
But when simultaneous read and write attempts are made, or simultaneous writes are attempted on a shared
data item by more than one process, their execution order has serious implications.
If the writer writes before the reader, while the read should have happened before any write, the reader
gets the modified data. This may lead to an undesirable effect. Similarly, if the read of a data item should
happen after a write but occurs in the opposite order, it is also a potentially serious issue.
6 https://www.ibm.com/docs/en/ztpf/1.1.0.15?topic=considerations-unix-domain-sockets
https://users.cs.cf.ac.uk/dave/C/node28.html#SECTION002800000000000000000
There are several techniques and tools that an OS uses to implement mutual exclusion. There are also a few
more desirable properties related to critical sections that we shall discuss.
However, let us first define the problem formally, and then turn to the solutions.
Any good solution to the critical section problem must have the following three properties:
Mutual Exclusion: One and only one process is allowed to execute in a critical section corresponding to a shared
data object at any time. In other words, access to the CS is done mutually exclusively. This is also known as the
safety property.
Progress: If no process is executing in a CS, but some other process(es) want(s) to enter the CS, then the processes
which are not in the remainder section (that means processes in either entry or exit or critical sections) will decide
which process can enter in the CS next. Also, this decision must be taken within a bounded time. This is also known
as finite arbitration or liveness property.
Bounded wait: Once a process has made a request to enter a critical section, there must be a limit or bound on
how many times other processes can be allowed to enter the CS before the requesting process is granted access to
enter the CS. This is also called the property of starvation freedom.
Property 1 is self-explanatory.
Property 2 ensures that every process gets a chance to enter a CS if it wants to. The decision as to which process
will go into the CS next is taken by processes which are either in the entry section, critical section or exit section.
The processes in the remainder section have either already completed execution of the CS some time back
and/or are not currently interested in the CS - hence these processes are excluded. The rest of the processes are
immediate stakeholders and thus take part in the decision. The bound on the time ensures that the decision is
actually taken and not indefinitely postponed for some reasons or other.
Property 3 ensures that the wait for going into a CS for every process is well defined. A process cannot go into a CS
indiscriminately denying access to other processes.
All three properties must be satisfied in any solution to a CSP. Note that Property 2 does not necessarily
ensure Property 3. For example, out of n processes in the system, a proper subset of processes (say P_1,
P_2, …, P_k) can collude to deprive another proper subset of processes (P_{k+1}, P_{k+2}, …, P_n) of
entering the CS (0 < k < n).
These three properties are essential ones of a good solution to a CSP. There are a few other desirable properties as
well.
One of them is the property of fairness - i.e., there should not be any undue priority to any process for
entering the critical section when other processes are waiting. Fairness can be implemented in different
ways, such as FCFS fairness (no overtaking of a waiting process by another to enter a CS) or LRU fairness
(the process that received service least recently gets service next).
Solutions to CSPs can be implemented in different ways and at different layers of abstraction. Some of the
synchronization tools are available at the basic hardware level, at the extended hardware level, or at different
software levels. Some examples of solution tools or primitives are shown in Fig 3.11. While some are supportive
tools, others are ready-made solutions built on the basic or extended tools. We shall discuss their implementation
details below and analyse how many of the necessary and desirable criteria each satisfies.
memory location may do. In a uniprocessor system, only one instruction can be executed by the processor at a
single point in time. After invoking a memory read or memory write operation, the CPU stalls or blocks the process.
It does not proceed until the memory operation is over. Hence, no other instructions from the same process can
invoke the same memory access in a non-preemptive kernel. In a preemptive system, the CPU can go to other
threads or other processes that can attempt to read or write from the same memory location. However, even in a
preemptive kernel, only one memory operation is generally allowed at a time. When one memory read / write is
going on, other attempts to the same memory location are blocked by the hardware. Another memory operation
by the same or another process is allowed only after the current one is completed. Simultaneous attempts at
memory access are, however, serialized in an arbitrary manner - whichever process executes the memory operation
instruction first gets to access the memory. The memory operation is atomic - i.e., once it is started, the memory
access hardware ensures that it is completed in a mutually exclusive manner.
This atomic memory access feature is only basic, as it provides only mutual exclusion. It does not have any
mechanism to ensure progress (Property 2) and bounded wait (Property 3). Also, often a critical section includes
memory access to compound data structures involving several memory cells. Ensuring synchronization among
simultaneous accesses to several memory cells is not trivial with only atomic memory access.
3.7.1.2 Disabling interrupts
One way to prevent simultaneous attempts at executing the same
critical section is to disallow preemption, i.e., not to allow any
interrupts to occur during CS execution. However, disabling interrupts can
only be done in kernel mode; user processes cannot use this technique.
A kernel process can disable interrupts from devices, the timer and other
processes (traps) in the entry section before going into
the critical section. After execution of the CS is over, the process re-enables
interrupts so that interrupts from the timer, other devices and processes can
again be delivered. Another kernel process can then disable interrupts and enter the CS.
This scheme ensures mutual exclusion, but not progress and bounded wait.
More than just disabling and enabling interrupts needs to be done in entry and exit sections respectively to achieve
other properties.
Also, disabling interrupts for long due to a lengthy critical section can be detrimental to the system performance,
as it under-utilizes the peripheral devices and reduces concurrency. Moreover, although the scheme can be easily
implemented in a uniprocessor system, it is very difficult to implement in a multiprocessor system. Ensuring
mutual exclusion there requires blocking other processors whenever an attempt to access the
same critical section is made. This is non-trivial. Hence, even if we implement interrupt disabling, we need other
synchronization mechanisms, especially for multiprocessor systems.
The variable lock in the case of testAndset (Fig 3.13a) is checked in the entry section of every process. Whichever
process gets the first chance to set it (i.e., finds false as the return value of testAndset()) can enter
the critical section. Other processes find the return value of testAndset() to be true and loop in the entry
section. When the first process completes the critical section, it resets the lock (making lock = false) in the exit
section. The remaining processes can then again compete to enter the critical section. Only one process at a time
can set the lock and thus enter the critical section; hence mutual exclusion is ensured.
The implementation using compareAndswap() (Fig 3.13b) is exactly the same except in the entry section, where
we replace testAndset. The lock initially has value 0. Hence the first process that executes
compareAndswap() finds return value 0, sets lock = 1, and enters the CS. Other processes find 1 as the return
value of compareAndswap() and loop in the entry section. After CS execution is done, lock is reset to 0 in the
exit section to allow other waiting processes to acquire the lock and enter their CS. Mutual exclusion is thus ensured.
Both implementations also ensure that which process goes into the CS next is decided only by processes in the
entry and exit sections. Hence, progress is ensured by both. But there is no guarantee as to which process
among several contenders will get the chance to go into the CS; a group of processes may be deprived
indefinitely if they cannot acquire the lock. Hence bounded wait is not met by the above implementations.
We shall see later implementations that meet all the necessary properties, but those involve other shared data
structures and their orderly management.
A synchronizing instruction like CAS is typically not used directly for mutual exclusion. Rather, CAS is more often
used in implementing atomic increment or decrement of a variable (recall that an increment / decrement of a shared
variable like count can cause a race condition, Sec 3.1.2.2). A function for atomic decrement of an integer var
can be called within a program as:
decrement (&var);
The function can be implemented in the following way.
The value of var is decremented exactly once, by the
one process whose CAS executes first.
The increment can be done in the same manner.
This implementation ensures atomic updates: no
two processes can update the variable at the same time. But it may
not always mitigate a race condition. For example, in the bounded buffer implementation of the multiple producers-
consumers problem, if a producer produces one message and puts it in the empty buffer, two or more consumers
can come out of the while loop (busy-wait loop) in the entry section if the busy-wait checks are not done in mutual
exclusion (Fig 3.4c) before any modification of the buffer. But only one process can consume; the others cannot.
Consequently, the system can end up in an inconsistent state or race condition.
Here, Thread 1 is supposed to print 10, as it should initially be in a busy-wait state due to the while loop and can
proceed only after Thread 2 executes. This is the expected behaviour.
But if the processor / compiler reorders instructions in Thread 2, flag may be set to true before x = 10. If
the thread-switch happens immediately after the flag is set, Thread 1 can print 0 - not the expected behaviour.
If instruction reordering happens in Thread 1 instead, i.e., x is printed before flag is checked, x can again be
displayed as 0 - undesired behaviour.
In a shared-memory multiprocessor environment, this kind of instruction reordering involving memory loads / stores
can lead to inconsistency. To mitigate this, a hardware instruction is used that propagates any memory updates
within a processor to all threads running on other processors. These instructions are called memory barriers or
memory fences. When a memory barrier is executed, the system ensures that all load and store instructions are
completed and visible to other processors before any subsequent load or store can happen. The memory barriers
are thus used to synchronize events across processors. To revisit our example above, memory barriers can ensure
that the flag is checked before printing x in Thread 1. Also, x is guaranteed to get updated before the flag is
set in Thread 2 (Fig 3.15).
These hardware supports for synchronization do not directly provide solutions to CSP. But they are used to design
solutions. Nevertheless, their use is limited to kernel processes, as user level processes cannot directly access these
hardware tools and instructions.
The two processes are assumed to have a similar structure and run similar code concurrently. The variable
turn denotes whose turn it is to enter the critical section and is initialized to 0 (there is no harm if it is initialized
to 1). Variable self denotes the process's own number and other that of its counterpart. In the entry section,
each process checks whether the turn is set for it (Fig 3.16b or Fig 3.16c). If not, it stays in the busy-wait loop;
otherwise it can execute the critical section. In the exit section, the process that executed the critical section
changes the value of turn so that the other process can enter the CS next. Since turn can be either 0 or 1 at a given
moment, only one process can enter and execute the CS; hence mutual exclusion is satisfied. Progress is also ensured
as the exit section hands the turn to the other process. Bounded wait is also met: a process waits for at most one
turn of the other process, since the two alternately execute the CS when both want to.
Even though all 3 necessary conditions are satisfied, it assumes that both processes would like to enter CS all the
time. That is not a realistic situation. Consider a case when one of the processes (say P0) does not want to enter CS, but
turn is set to 0. Now the other process (here, P1) cannot enter CS, even though it wants to get into the CS. This will
cause indefinite wait for the process wanting to get into the CS. Hence, progress will not be ensured in case one of
the processes does not want to get into the CS. Also, after executing a CS, a process cannot go into the CS next time,
if the other process does not execute the CS in between.
Variable turn is declared volatile to tell the compiler that accesses to turn should not be reordered - reordering
the instructions that update turn can also cause indefinite wait (see Sec 3.7.2.4).
Both the flag variables and turn are boolean shared variables, and not intended to be reordered by compilers
(declared volatile). They are all initialized to 0 (the algorithm will behave the same with other initialisations also).
In the entry section, a process sets its intent flag, but sets turn in favour of the other process, as if out of courtesy
(compare Fig 3.17b and Fig 3.17c). Under concurrent execution, whichever process sets the turn variable last prevails.
If the other process also intends to enter the CS (as seen in flag[1-i]) and the turn favours it, the process loops
in a busy wait and lets the other process go into the CS. Since turn can be either 0 or 1 at a moment, only one process
can enter the CS; hence mutual exclusion is met. Also, if the other process does not intend to go into the CS, the
process can enter the CS irrespective of the value of turn. Hence, progress is also ensured. In the exit section, a
process resets its intent flag so that the other process, if intending, can enter the CS. Hence bounded wait is also satisfied.
test-and-set(). The TSL returns 0, and the process gets into the CS by breaking out of the busy-wait while loop. Since
TSL is an atomic operation, only the first process out of several simultaneous contenders can set the lock
uninterrupted. Other processes get return value 1 and loop around in the busy-wait state. Hence mutual exclusion is ensured.
The process getting into the CS voluntarily resets its intent flag in the entry section. This ensures that the process
executing CS now will not try to have another turn immediately.
In the exit section, the algorithm finds the next intending process to enter the CS in a principled manner. The next
process is the intending process (flag[j] is set) with the next higher process-id in the circular sequence {0, 1, 2, ..., n-1}.
If one is found, its intent flag is reset, as that process is certainly looping in the busy-wait condition in the entry section
of process Pj. Process Pj can thus come out of the while loop and enter the CS; the lock is effectively transferred
to Pj. If no such process is found among the remaining processes (j == i), the lock is reset (= 0) so that any
process that intends to enter the CS in the future can acquire it.
Since flag[j] is already reset in the exit section, resetting it again within the entry section of process Pj is
redundant. But a process does not know by which route it reached the CS, and this redundant action does no
harm - it helps achieve progress.
The implementation checks the turn in a circular fashion over all the processes with ids 0, 1, 2, …, n-1. Hence,
no process waits more than (n-1) turns between notifying its intent to enter the CS and actually getting the chance.
Hence, bounded wait is also met.
Like earlier, the updates on the shared variables flag[j] and lock are very important, and they are to be done
in the given order in all the processes. Instruction reordering can create race conditions and affect synchronization.
Hence, we declared them as volatile.
The two operations acquire() and release() are considered atomic. They can be implemented using
hardware primitives like TSL or CAS.
This type of mutex lock has a busy-wait loop in which a process intending to enter the CS loops around. Such locks
are, therefore, also called spinlocks.
Even though we encountered busy-wait loops before, spinlocks are different from them. In spinlocks we check only
the lock variable avail whereas in other busy-waiting loops we can have a predicate involving one or more
variables. Thus, spinlocks are simpler to check, and computationally lighter.
Still, spinlocks are not good from a performance point of view. A process interested in entering a CS spins to
get the lock and wastes CPU cycles. On a single-core system, spinning is especially wasteful: while one thread spins,
the lock holder cannot run to release the lock until a context switch occurs.
Nonetheless, spinlocks are beneficial when critical sections are very small, needing mutual exclusion only for a short
duration. A thread can 'spin' on acquiring a lock on one processing core while another thread executes the critical
section on another core of a multi-core processor. This obviates the need to block a process for mutual exclusion
and incur a context switch, which is costly in time and other overheads.
3.7.4.2 Semaphores
Semaphores are improvements and generalizations over mutex locks. A semaphore S can be considered as an
integer variable that is, after initialisation, accessed and updated only by two atomic operations wait() and
signal() (Fig 3.20). The semaphore integer variable (val) keeps track of simultaneous access to a critical section
that can be allowed. It is initialized with an integer indicating the maximum number of such simultaneous accesses
(often simultaneous reads of a CS data item are allowed, but simultaneous reads & writes must be done mutually
exclusively). wait() allows the use of the semaphore and decrements val. When no more simultaneous access
is allowed (val <= 0), a process spins in busy-wait. signal() increments the semaphore value to allow other waiting
processes to use the semaphore. wait() is also known as down() or P() (short for the Dutch term proberen, meaning
'to test') and signal() as up() or V() (short for the Dutch term verhogen, meaning 'to increment') in the literature.
Semaphore types
There are two types of semaphore.
A counting semaphore allows a limited number of processes to simultaneously access a shared resource
(including reading). When non-negative, the semaphore value represents how many more processes can still be
allowed simultaneous access to the shared resource. When negative, its magnitude shows the number of processes
waiting to access the shared resource.
A restricted type is the binary semaphore that is like a mutex lock. Its val thus can be 0 or 1. However, mutex lock
(or spinlock) is different from binary semaphore in that mutex requires the same process to unlock it that locked it.
On the other hand, binary semaphores are operated by any process that has access to it (not necessarily the same
process).
Implementation of a semaphore
As mentioned earlier, busy-waiting is a waste of CPU time. Instead of spinning, a process can block and have
a context switch to let other processes execute while its competitor(s) are executing in the CS. The semaphores,
therefore, get rid of busy-wait loops by maintaining a list of such blocked processes. The necessary changes in the
implementation are shown in Fig 3.21.
A semaphore is always initialized with a non-negative integer. Then its value is inspected and updated only by two
functions. In the wait function sem_wait(), semaphore value is decremented first and then the calling process
is blocked, if the semaphore value becomes negative. The blocked processes wake up only through a call of signal
function (sem_signal()) invoked by some other process. In the signal function, semaphore value is
incremented first and if it becomes non-positive (≤ 0), a blocked process is woken up and allowed to continue. The
list of processes is implemented using a pointer to the linked list of PCBs of blocked processes.
The two functions sem_wait() and sem_signal() must be executed atomically. In other words, these
functions can themselves be considered critical sections for a semaphore. Hence, they must be implemented by
disabling interrupts (Sec 3.7.1.2) or by using CAS or mutex locks. Implementation of semaphores needs judicious
consideration of processor architecture (single-core or SMP multi-core or heterogeneous) and basic hardware
synchronization primitives.
Use of semaphores
Semaphores are offered by the OS to ease the job of synchronization for application programmers. Their primary
use is in mutual exclusion of a critical section (CS) among a set of cooperating processes. A binary semaphore s is
initialized with value 1. A process that wants to execute a CS calls sem_wait(s) in the entry section. If no other
process is in the CS, it can enter. In the exit section, it calls sem_signal(s) to let others in (Fig 3.22a). The code
looks simple and tidy from the application programmer's end.
A binary semaphore can also be used for ensuring serialization of events, tasks or statements. Suppose we want to
ensure that statement S1 of process P1 executes before statement S2 of process P2, where both
processes are running concurrently (recall the limitation of using a memory barrier, which needs kernel access, in Sec
3.7.2.4). We can use a semaphore sync, initialized to 0 (Fig 3.22b). Since sync has initial value 0, P2 blocks in
sem_wait() and cannot execute S2. Once S1 in P1 has executed and sem_signal() has incremented the
semaphore sync, S2 in P2 can execute.
Counting semaphores are often used for managing simultaneous access of a resource by more than one process. A
counting semaphore can keep track of the accesses to resources that have multiple instances like scanners, printers,
shared buffers, files etc. and can stop further attempts when the maximum limit is reached.
We shall soon see more use of semaphores in solving some of the classical critical section problems.
The synchronizing tools discussed so far are elementary in nature. The hardware tools are the most basic and are
used to develop slightly more sophisticated ones. But even then, tools like mutex locks, spinlocks or semaphores do
not provide ready-made solutions to CSPs. They need to be used intelligently along with their associated functions.
A little sloppiness in their use can cause problems, like the silly mistakes in Fig 3.23.
compilation. We shall get undesired results in an irregular fashion (synchronization issues come up only occasionally,
depending on the dynamic conditions of several processes). Hence, they are not easily reproducible and are very
difficult to detect and debug.
Different programming languages provide a few synchronization primitives built on the elementary tools. These
high-level primitives relieve the developers from the hassle of painstakingly invoking the correct procedures with
correct parameters every time.
Below we mention two broad such categories.
3.7.5.2 Monitors
Monitors are more powerful and sophisticated tools than critical regions provided by programming languages. They
can be considered as abstract data types (ADT) that encapsulate both data and methods, resembling objects in C++
or Java. A user can define her own monitor based on her need using the prototype as given in Fig 3.24a. Each
monitor has the provision of defining a set of shared variables that can represent the states of the monitor, a set of
condition variables whose values determine the progress of the monitor, and a set of functions that can be executed
in a mutually exclusive manner. A process enters a monitor by invoking a function or method within it. Within an
invoked function, the parameters, shared variables and condition variables defined can be accessed. A condition
variable here is like that in a conditional critical region: based on its value, the process either proceeds with
execution of the monitor function or blocks. Since only one monitor function can be active at a given time,
several processes may wait or block to enter a given monitor. Again, within a monitor function, a process can check
a condition variable and block itself. A condition variable is a shared variable on which several processes can
block. Hence, there can be two sets of blocked processes: one set that has not yet entered the
monitor (inactive processes), and another set of processes that are within the monitor (and hence active) but
blocked on condition variables. Inactive processes can enter a monitor when no other process is active in the
monitor; they do not depend on any condition variable.
Each condition variable x within a monitor is associated with two functions: x.wait() and x.signal(). Much like
a semaphore, x.wait() blocks an active process running within the monitor, and x.signal() wakes up a blocked active
process, if any. If there is no blocked process, x.signal() has no effect (unlike a semaphore's signal).
However, once x.signal() is invoked by a process (say A) and there is a process (say B) waiting in x.wait(), a
pertinent question is: which process gets to execute inside the monitor immediately?
There are two possible answers as strategies given below.
1. Signal and wait: Process A signals and then waits until B completes execution in the monitor.
2. Signal and continue: Process A signals and continues while process B waits until A completes execution in the
monitor.
Either strategy may be followeded in an implementation; both have their advantages and disadvantages
and are used in different implementations. Java and C# support monitors.
of such processes that have signalled and are waiting. Let us call
this variable next_count. A call to the monitor will then look
like Fig 3.25. A process enters after making a wait call on mutex.
After execution of the monitor function, it signals next if there are
other processes waiting inside the monitor. If not, mutex is
signalled to allow processes waiting outside the monitor.
Now, for each condition variable (say x), we need to implement x.wait()
and x.signal(). For each such x, a binary semaphore sem_x (initialized
to 0) can be used together with a counter variable x_count. Illustrative
implementations of x.wait() and x.signal() are shown in Fig 3.26.
On a wait due to x, x_count is incremented. If there are processes waiting
inside the monitor (next_count > 0), one of them is woken up;
otherwise, processes waiting outside the monitor are allowed to enter.
The calling process then goes to wait on the condition variable x.
On a signal on x, we check how many processes are waiting on x. If there is at least one, we need to wake up a
process. The calling process exercises signal-and-wait: next_count is incremented first, then the process waiting
on x is woken up, and the calling process itself goes to wait.
3.7.6.1 EventCounts
Eventcounts are abstract objects that take integer values to keep track of occurrences of events. Each eventcount
corresponds to events of a particular event-type. There are three operations defined on an eventcount E:
i. advance(E): increments the value of E by 1. It indicates the occurrence of an event of the given event type.
ii. await(E, v): blocks the calling process until E reaches the value v.
iii. read(E): reads the current value of E.
Eventcounts are initialized to 0 and then operated upon only by these primitive operations inside the cooperating
processes. These operations may happen concurrently in an uncontrolled manner and need not
be executed mutually exclusively.
7
http://www.cs.uml.edu/~bill/cs515/Eventcounts_Sequencers_Reed_Kanodia_79.pdf (as on 23-Sep-2022)
3.7.6.2 Sequencers
Sequencers, like eventcounts, are abstract objects, but they are used to ensure ordering of a set of events of a
particular type. Often we need to know which of several processes should execute an event first (e.g.,
simultaneous write attempts), as that can decide other follow-up events. A sequencer S ensures sequencing of a
set of events by issuing tokens (a token is a sequence number, like those we take for service in banks, bakeries,
reservation counters etc.). The operation defined on the abstract object S is ticket(S), which always generates non-
negative integers (S is initialized to 0). Two calls to ticket(S) return two different numbers that indicate the serial
order of their operations.
While eventcounts can be used independently for handling concurrent processes, sequencers are always used along
with eventcounts, especially to ensure mutual exclusion.
Solutions to several standard critical section problems (CSPs) can be designed using eventcounts and the
combination of eventcounts and sequencers. Even synchronizing tools like semaphores can be designed using the
two primitives as illustrated in Fig. 3.27. A semaphore S should have an eventcount S.E and a sequencer S.T. The
initial value of the eventcount before calling the sem_wait() is represented by S.I.
With necessary background on the interprocess communication, synchronization, critical section problems and
their solution attempts using different synchronizing tools, we discuss a few classical IPC problems where there are
a few CSPs. Let us describe the problems and their solutions.
In case of multiple producers, we need to synchronize simultaneous putting of items onto the buffer (similar to
multiple writes) as well as reading the buffer by the consumer. However, the above solution ensures mutual
exclusion of accessing the bounded buffer, no matter how many processes are involved. Progress and bounded
wait are met for single producer-single consumer cases, but not for multiple producers.
The problem can also be solved using other synchronizing tools like a critical region (see [Hal15]), a monitor ([Sta12])
or an eventcount and a sequencer ([RK79]).
We provide a solution here (Fig 3.30) considering priority to readers: no reader should wait unless a writer has
already accessed the critical section. It allows simultaneous reads and counts the readers using a shared variable
r_count (initialized to 0). The reader process first increments r_count and then checks its value. If it is the first
reader, it should stop any writer and thus invokes wait for binary semaphore rw_mutex (initialized to 1). Ensuring
mutual exclusion with a writer is the responsibility of the first reader only, successive readers need not bother about
it. At the end of a read, every reader decrements the r_count. Update to r_count is also a critical section,
which is done in mutual exclusion using another binary semaphore mutex (initialized to 1). If the reader is the last
reader, it needs to unlock the critical section by signaling rw_mutex and allowing a writer.
The writer process is simple. It does write in mutual exclusion to read. If any reader is already within the CS, it waits.
As the readers have priority, the writers may wait indefinitely, causing starvation of writers. Hence, mutual
exclusion is maintained in the solution, but neither progress nor bounded wait is guaranteed for the writers.
Solution with priority to writers is provided in [Sta12]. [Dow16] contains interesting variations with detailed
discussion on the solution using semaphores. [Hal15] provides solutions using critical region and condition
variables. [RK79] illustrates the solution using eventcounts and sequencers.
then (1 & 4)] and they keep on taking their turns, one philosopher (Philosopher 3) has to starve indefinitely. Hence,
the solution is not starvation-free when n is odd (check yourself that for even n, the above solution is starvation-
free).
Also, in the extreme case, there is a possibility that each philosopher picks up the left fork first, and before the right
one can be picked up, her neighbour picks it up. This leads to a situation where everyone holds a fork in the left hand
while the right hand stays empty: nobody can eat, and there is a complete stalemate or deadlock. The situation can
linger forever unless resources are preempted externally. Hence the solution is not deadlock-free.
There can be starvation-free and deadlock-free solutions imposing some restrictions like
1. At most (n-1) philosophers are allowed at the table when there are n forks, OR
2. n is even and we ensure that even-numbered philosophers (numbered 0, 2, 4, …) pick up their left forks first
and then the right ones, while odd-numbered philosophers pick up their right forks first, followed by the left.
Obviously, these restrictions are not very general-purpose and are difficult to implement in a dynamic system. Also,
remember the fallibility of a programmer while coding with semaphores (the errors are not detectable by compilers,
and the runtime scenarios are difficult to re-create). Let us therefore try a solution with a more powerful tool - a
monitor (Fig 3.33).
monitor (Fig 3.33).
Since a monitor ensures mutual exclusion among procedures inside it, get_forks() and put_forks()
execute undisturbed. When the parallel execution starts, the first philosopher that invokes get_forks(), gets
both the forks and completes eating. Any subsequent philosopher may have to wait till the first philosopher
completes eating and puts down the forks. The solution is deadlock-free; however, as provided, it is not starvation-
free. If all the philosophers are allowed to proceed in parallel [as shown by parbegin in Fig 3.33], starvation can
happen to one or more philosophers. To avoid starvation, the philosophers need to be called in a sequence (look
at the parbegin part) - an exercise left for the reader.
UNIT SUMMARY
This chapter introduced the interprocess communication mechanisms following two models:
shared memory and message passing. Different schemes of the message passing model are
discussed with examples of implementation.
Interprocess communication in a multiprogramming environment can create race conditions due
to concurrent execution of certain sections of code where shared data are accessed and
updated. These sections are called critical sections. Concurrent execution of these sections gives
rise to a class of problems known as critical section problems (CSPs).
Critical sections are to be accessed by cooperating processes in mutual exclusion to each other.
But the ideal solutions to CSPs need to have the properties of progress and bounded wait also.
Solutions to the CSPs are designed using different synchronization tools. Some of the tools are
available at the basic hardware level like atomic memory access and disabling interrupts. Some
tools like TSL and CAS offer atomic operations including checking and setting a variable. These
primitives help design solutions to CSP involving 2-processes like Peterson’s solution as well as
that of n-processes (n>2).
Popular synchronization tools like mutexes and semaphores assist developers in designing synchronization solutions among various user processes. But using them requires a great amount of care and diligence during coding. Silly mistakes in their use are neither easily detectable nor reproducible and can cause serious synchronization issues.
Synchronization support in the form of critical regions, condition variables and monitors is offered by some higher-level programming languages. A critical region is one or more critical sections which are executed in mutual exclusion. A condition variable forces a process to block until a condition is met. A monitor is an object consisting of local variables, condition variables and methods. Each method is invoked in mutual exclusion to the others. Also, there may be processes blocked on a particular condition variable.
When used intelligently, all these tools offer elegant solutions to the classical synchronization
problems like producer-consumer problem, readers-writers problem, dining philosophers’ problem
etc.
EXERCISES
Multiple Choice Questions
Q4. Consider Peterson's algorithm for mutual exclusion between two concurrent processes i and j. The program executed by process i is shown below.
repeat
flag[i] = true;
turn = j;
while (P) do no-op;
Enter critical section, perform actions, then
exit critical section
flag[i] = false;
Perform other non-critical section actions.
until false;
For the program to guarantee mutual exclusion, the predicate P in the while loop should be
A. flag[j] = true and turn = i B. flag[j] = true and turn = j
C. flag[i] = true and turn = j D. flag[i] = true and turn = i [GATE (2001)]
Q5. The semaphore variables full, empty and mutex are initialized to 0, n and 1 respectively. Process P1
repeatedly adds one item at a time to a buffer of size n, and process P2 repeatedly removes one item at a time
from the same buffer using the programs given below. In the programs K, L, M and N are unspecified
statements.
P1 P2
Q6. Suppose we want to synchronize two concurrent processes P and Q using binary semaphores S and T.
The code for the processes P and Q is shown below. Synchronization statements can be inserted only at points
W, X, Y and Z
Process P: Process Q:
while (1) { while (1) {
W: Y:
print '0'; print '1';
print '0'; print '1';
X: Z:
} }
Which of the following will ensure that the output string never contains a substring of the form 01ⁿ0 or 10ⁿ1, where n is an odd positive integer?
A. P(S) at W, V(S) at X, P(T) at Y, V(T) at Z, S and T initially 1
B. P(S) at W, V(T) at X, P(T) at Y, V(S) at Z, S and T initially 1
C. P(S) at W, V(S) at X, P(S) at Y, V(S) at Z, S initially 1
D. V(S) at W, V(T) at X, P(S) at Y, P(T) at Z, S and T initially 1 [GATE (2003)]
Q7. Consider two processes P1 and P2 accessing the shared variables X and Y protected by two binary semaphores SX and SY respectively, both initialized to 1. P and V denote the usual semaphore operators, where P decrements the semaphore value and V increments it. The pseudo-code of P1 and P2 is as follows:
P1: P2:
While true do { While true do {
L1 : ................ L3 : ................
L2 : ................ L4 : ................
X = X + 1; Y = Y + 1;
Y = Y - 1; X = X - 1;
V(SX); V(SY);
V(SY); V(SX);
} }
In order to avoid deadlock, the correct operators at L1, L2, L3 and L4 are respectively.
A. P(SY), P(SX); P(SX), P(SY)
B. P(SX), P(SY); P(SY), P(SX)
C. P(SX), P(SX); P(SY), P(SY)
D. P(SX), P(SY); P(SX), P(SY) [GATE (2004)]
Q8. Given below is a program which when executed spawns two concurrent processes:
semaphore X : = 0 ;
/* Process now forks into concurrent processes P1 & P2 */
P1 P2
repeat forever repeat forever
V (X) ; P(X) ;
Compute ; Compute ;
P(X) ; V(X) ;
Consider the following statements about processes P1 and P2:
1. It is possible for process P1 to starve
2. It is possible for process P2 to starve.
Q9. The enter_CS() and leave_CS() functions to implement critical section of a process are realized using
test-and-set instruction as follows:
void enter_CS(X)
{ while test-and-set(X); }
void leave_CS(X)
{ X = 0;}
In the above solution, X is a memory location associated with the CS and is initialized to 0. Now consider
the following statements:
The code for P10 is identical except it uses V(mutex) in place of P(mutex). What is the largest number
of processes that can be inside the critical section at any moment?
A.1 B. 2 C. 3 D. None
Q11. Consider the following threads, T1, T2, and T3 executing on a single processor, synchronized using
three binary semaphore variables, S1, S2, and S3, operated upon using standard wait() and signal(). The
threads can be context switched in any order and at any time.
T1 T2 T3
boolean flag[2];/*initially 0 */
int turn;
How does the solution differ from Peterson’s solution? Check and
justify whether the solution satisfies all the necessary criteria.
Q5. Consider Dijkstra’s solution to the n-process CSP (n > 1) as given in Fig 3.35. The processes P1, P2, …, Pn share the following variables with the given initialisation among them.
enum pr_state = {idle, want_cs, in_cs};
int n; /* no. of processes >1 */
shared volatile pr_state flag[n] = {idle,..., idle};
shared volatile int turn =0;
Check and justify whether the solution meets all necessary criteria or not. If not, suggest the necessary changes in the given pseudo-code to fulfil the unmet criteria.
Q6. The bakery algorithm is one of the first solutions to the n-process CSP. Fig 3.37 shows the pseudocode for process Pi. The algorithm was proposed by Leslie Lamport, who named it so as it mimics the service to customers in a bakery (or a bank, a reservation counter, a pizza outlet etc.). n processes arrive at the bakery and each one first takes a token (the sequence number for getting service). Each process gets the chance to enter the CS strictly according to its token number.
Each process modifies its own variable but checks the values of others’ in the for loop and waits. Analyze
the algorithm given and answer the following:
i. Justify whether two or more processes can get the same token number or not.
ii. How is mutual exclusion of the CS maintained?
iii. Does the solution have all the necessary properties of a solution to a CSP? Justify.
iv. Is there a bound on the token number?
v. What can be the issues in the above solution for uniprocessor and multiprocessor systems? How can
they be addressed?
vi. Discuss if a bakery algorithm can be designed with the help of eventcounts and sequencers.
Q7. Provide an algorithmic solution to n-process (n>2) CSP using CAS. Does it meet all the necessary properties?
Justify.
Q8. Discuss the similarities and differences between a mutex and a binary semaphore.
Q9. Design a solution to the readers-writers problem with priority to writers, i.e., once a writer is waiting, no newly arriving reader should start reading before the writer gets its turn.
Q10. Describe the dining philosophers’ problem and its solution using a monitor for 7 philosophers.
Numerical Problems
Q1. Consider three concurrent processes P1, P2 and P3 as shown below, which access a shared variable D that has been initialized to 100.
P1 P2 P3
. . .
. . .
D=D+20 D=D-50 D=D+10
The processes are executed on a uniprocessor system running a time-shared operating system. If the minimum and
maximum possible values of D after the three processes have completed execution are X and Y respectively,
then the value of Y - X is______?
[GATE (2019)]
ANS : 80
Q2. Two concurrent processes P1 and P2 use four shared resources R1, R2, R3 and R4, as shown below.
P1 P2
Compute; Compute;
Use R1; Use R1;
Use R2; Use R2;
Use R3; Use R3;
Use R4; Use R4;
Both processes are started at the same time, and each resource can be accessed by only one process at a
time. The following scheduling constraints exist between the access of resources by the processes:
P2 must complete use of R1 before P1 gets access to R1
P1 must complete use of R2 before P2 gets access to R2
P2 must complete use of R3 before P1 gets access to R3
P1 must complete use of R4 before P2 gets access to R4
There are no other scheduling constraints between the processes. If only binary semaphores are used to
enforce the above scheduling constraints, what is the minimum number of binary semaphores needed?
[GATE (2005)]
ANS : 2
Q3. Processes P1 and P2 use critical_flag in the following routine to achieve mutual exclusion. Assume that
critical_flag is initialized to FALSE in the main program.
get_exclusive_access ( )
{
if (critical_flag == FALSE) {
critical_flag = TRUE;
critical_region();
critical_flag = FALSE;
}
}
Consider the following statements.
i. It is possible for both P1 and P2 to access critical_region concurrently.
ii. This may lead to a deadlock.
Which of the above statements hold(s)?
ANS : (i)=true and (ii)=false
Q4. The enter_CS() and leave_CS() functions to implement critical section of a process are realized using test-and-
set instruction as follows:
void enter_CS(X)
{
while(test-and-set(X));
}
void leave_CS(X)
{
X = 0;
}
In the above solution, X is a memory location associated with the CS and is initialized to 0. Now consider the
following statements:
Q5. The following two processes P1 and P2 that share a variable B with an initial value of 2 execute concurrently.
P1()
{
C = B – 1;
B = 2*C;
}
P2()
{
D = 2 * B;
B = D - 1;
}
The number of distinct values that B can possibly take after the execution is______? [GATE (2015)]
ANS: 3
Q6. A counting semaphore was initialized to 10. Then 6 P (wait) operations and 4 V (signal) operations were completed on this semaphore. The resulting value of the semaphore is_____? [GATE (1998)]
ANS : 8
Q7. Consider a non-negative counting semaphore S. The operation P(S) decrements S, and V(S) increments S.
During an execution, 20 P(S) operations and 12 V(S) operations are issued in some order. The largest initial
value of S for which at least one P(S) operation will remain blocked is_____? [GATE (2016)]
ANS: 7
PRACTICAL
1. Write a program to create two child processes (or two threads) that share a variable. Allow the processes (or threads) to run concurrently. In one process (thread), increment the variable; in the other, decrement it, printing the values as you go. See whether race conditions appear or not.
2. In the same manner, implement the producer-consumer problem using a bounded buffer, enacting one process (or thread) as a producer and another as a consumer. From the producer process (or thread), write onto the buffer and print the item (it may be an integer representing the item). Do you observe any situation where nothing is printed for an indefinite amount of time (deadlock)?
3. Create a shared memory. Write a program to write onto the shared memory and print the content written.
Write another program to read from it and print the content. Each time, print the process id as well.
From a number of different terminals, run several instances of readers and writers and see their concurrent
execution. Observe starvation and deadlock, if any.
4. Consult the necessary documentation from the web and the references, and learn to solve the concurrency issues observed above using semaphores.
5. Implement the dining philosophers’ problem in Java using a monitor.
KNOW MORE
Interprocess communication mechanisms in general are described in [Hal15]. For practical implementation in UNIX, the necessary details can be found in [RR03] and [SR05]. A more detailed discussion, with theoretical treatment, of IPC and semaphores in UNIX systems can be obtained from [Vah12] and [Bac05].
Race conditions, mutual exclusion, critical sections and different synchronization tools are discussed in
general in [Hal15], [SGG18] and [Sta12].
Different algorithmic efforts towards CSPs like Dekker’s solution, Dijkstra’s solution, the bakery algorithm, the sleeping barber problem and several others are discussed briefly in [Hal15] and elaborately in [Dow16] with implementation help.
Classical synchronization problems like the producer-consumer problem, the readers-writers problem and the dining philosophers’ problem are in general well explained with elaborate diagrams in [Sta12].
Synchronization primitives as offered in Windows OS are discussed in [YIR17].
[Bac05] Maurice J Bach: The Design of the UNIX Operating System, Prentice Hall of India, 2005.
[Dow16] Allen B. Downey: The Little Book of Semaphores, 2e, Green Tea Press, 2016 (available at
https://greenteapress.com/semaphores/LittleBookOfSemaphores.pdf as on 9-Oct-2022).
[Hal15] Sibsankar Haldar: Operating Systems, Self Edition 1.1, 2015.
[RR03] Kay A. Robbins, Steven Robbins: Unix™ Systems Programming: Communication, Concurrency,
and Threads, Prentice Hall, 2003.
[SR05] W. Richard Stevens, Stephen A. Rago: Advanced Programming in the UNIX Environment (2nd
Edition), Addison-Wesley Professional, 2005.
[SGG18] Abraham Silberschatz, Peter B Galvin, Greg Gagne: Operating Systems Concepts,10th Edition,
Wiley, 2018.
[Sta12] William Stallings: Operating Systems Internals and Design Principles, 7th Edition, Prentice Hall,
2012.
[Vah12] Uresh Vahalia: UNIX Internals, The New Frontiers, Pearson, 2012.
[YIR17] Pavel Yosifovich, Alex Ionescu, Mark E. Russinovich, and David A. Solomon: Windows Internals,
Seventh Edition (Part 1 and 2), Microsoft, 2017. https://docs.microsoft.com/en-
us/sysinternals/resources/windows-internals (as on 8-Jul-2022).
UNIT SPECIFICS
Through this unit we have discussed the following aspects:
Deadlocks: Definition, Necessary and sufficient conditions for Deadlock, Deadlock Prevention
Deadlock Avoidance: Banker’s algorithm, Deadlock detection and Recovery.
This chapter discusses a negative fallout of concurrent execution - deadlocks. A process (or more specifically a thread) needs a number of resources to accomplish a job. While some are hardware resources like processors, registers, main memory, printers, scanners etc., some are software like files, shared objects, sockets etc., and some are combinations of hardware and software objects, including synchronizing constructs (e.g., locks, semaphores, mutexes, critical regions, monitors etc.). If the resources are non-shareable (i.e., they need to be accessed mutually exclusively) and finite in number, simultaneous demands from several threads throw a challenge to the system - while one thread holds a resource, others demanding the same resource have to wait. If the holding of and waiting for a set of resources by several threads are such that everyone waits for the release of one or more resources held by some other, none can proceed, and all fall into a state of indefinite starvation known as a deadlock. The concept of deadlock, its formation criteria, and prevention and avoidance principles and mechanisms are discussed. If deadlocks cannot be prevented for some reason, how they can be detected and how the system can recover from them are also explained. For every concept, wherever required, necessary definitions, algorithms and adequate examples are provided.
Like previous units, a number of multiple-choice questions as well as questions of short and long answer
types following Bloom’s taxonomy, assignments through a number of numerical problems, a list of references
and suggested readings are provided. It is important to note that for getting more information on various topics
of interest, appropriate URLs and QR code have been provided in different sections which can be accessed or
scanned for relevant supportive knowledge. “Know More” section is also designed for supplementary
information to cater to the inquisitiveness and curiosity of the students.
RATIONALE
This unit on deadlocks starts with an informal introduction to the concept of different stalemate situations. A few real-life examples of deadlocks are provided, clearly pointing out the differences with livelocks, before going into the technical terms in the context of operating systems.
technical terms in the context of operating systems. Necessary definitions are then introduced so that the concept can
be discussed with appropriate rigour and precision. Different types of computing resources are mentioned, and which types can cause deadlock are clearly pointed out. Also, under what conditions a deadlock will result (the necessary and sufficient conditions) is discussed in reasonable detail.
whether it can be avoided in the runtime, or, if it happens, how to recover from it are explained with necessary
algorithms and examples.
This unit builds the fundamental concepts to understand deadlocks - a negative fallout of the concurrent execution
environment of an OS. The concepts developed here are central and critical to comprehend and appreciate the
interaction of threads (also processes) with computing resources.
PRE-REQUISITES
UNIT OUTCOMES
List of outcomes of this unit is as follows:
U4-O1: Define deadlocks, livelocks, resources, necessary conditions for deadlock formation.
U4-O2: Describe a deadlock situation and its difference with a livelock, different deadlock handling
techniques, Banker’s algorithm for deadlock avoidance and detection, recovery from a deadlock.
U4-O3: Understand the connection among several conditions leading to a deadlock and thus how to
prevent, avoid and recover from a deadlock.
U4-O4: Realize the overhead involved in deadlock prevention, avoidance mechanisms.
U4-O5: Analyze and compare different deadlock handling mechanisms.
U4-O6: Design cost-effective and practical solutions for handling deadlocks in an OS.
CO-1: To learn the mechanisms of OS to handle processes and threads and their communication
CO-2: To learn the mechanisms involved in memory management in contemporary OS.
CO-3: To gain knowledge on distributed operating system concepts that includes architecture, Mutual
exclusion algorithms, deadlock detection algorithms and agreement protocols.
CO-4: To know the components and management aspects of concurrency management
4.1 INTRODUCTION
The term deadlock comes from two words: dead and lock. It symbolizes a lock that is closed and whose key is, as it were, lost. In real life, a deadlock means a situation where a group of entities (at least two people or objects) are engaged with each other in such a way that none in the group can proceed, as another from the group obstructs. As a dead lock cannot open on its own and needs to be broken by external force, a deadlock situation does not resolve on its own.
Deadlock or stalemate situations are often found in real life. At a road crossing, uncontrolled traffic often causes
deadlock (Fig 4.1). No cars can move as there are no spaces in any of the directions. These deadlocks are not
resolved unless some external efforts are applied (by traffic police or voluntary efforts from individuals).
However, sometimes a group of entities temporarily face obstructions from each other, but they themselves can
try and resolve it. If the entities involved can come out of the stalemate on their own - it is not a deadlock. Even
though their attempts may fail repeatedly, eventually they can come out of the stalemate, maybe after several
attempts. For example, in Fig 4.2 the cars are not actually in a deadlock if there are spaces behind them. The cars can back up a little. The apparent lock can easily be resolved if one pair of opposite cars backs up and waits (say, the north- and south-bound cars) and allows the other pair (say, the east- and west-bound ones) to go forward.
If both pairs of cars simultaneously come back and try to go forward at the same time, there will be a locking
situation. Their attempts to go forward may fail, if the attempts are synchronized each time - this kind of lock is
called a livelock, and not a deadlock. In a livelock, the entities do not hold the resources (here free space in front
of the cars) continuously. Rather, they can attempt to make progress, but the attempts continuously fail due to
some reason. One can hope that their attempts will succeed, and they can come out of the lock after one or more
attempts. How many attempts will be needed, however, is completely unpredictable.
On the contrary, the situation would be a deadlock if there were no spaces behind, when other cars also line up in all four directions as illustrated below (Fig 4.3).
Traffic deadlock at a crossroad can be explained with an example. Suppose four fleets of cars are approaching
the crossroad from four directions but have not crossed the junction. There are open spaces in front marked as
A, B, C, D (Fig 4.3a). All the cars want to go straight crossing the junction. For example, cars from west would like
to go straight to the east crossing region A & D, cars from south head north crossing B & A and so on. If the
junction is signalled and the cars stop at the crossing, there need not be any problem.
However, if there is no signaling system, there is a possibility of a deadlock. The deadlock happens when
every car proceeds straight simultaneously such that the east-bound first car occupies space A, the north-bound
space B, the west-bound space C and the south-bound space D. No space is left for any of the cars to move ahead,
neither to move back as well (almost) unlike Fig 4.2. The stalemate will continue forever (Fig 4.3b). The open
spaces (A, B, C, D) are important resources here. Deadlock happened as each car on the front occupied a piece of
8 Picture courtesy: https://www.worldatlas.com/articles/the-biggest-traffic-jams-in-history.html
9 Picture courtesy: https://twitter.com/cartoonlka/status/1069775359695093761
land and needed another that was occupied by another car and their occupation and requirement of space made
a chain or cycle.
4.2 DEFINITION
In operating systems, deadlock is a serious problem caused by concurrent execution of processes (or threads). It
refers to a situation where a set of concurrent processes (or threads) perpetually block or starve for want of some
resources held by some other processes (or threads) within the set. The processes (or threads) cannot come out
of the situation on their own.
Concurrency offers a set of advantages like increased CPU utilization and throughput but throws serious
challenges as well. In the last unit (Unit 3), we studied the issue of race conditions due to attempts of concurrent
execution of critical sections. Remember that two of the necessary conditions of solutions to CSPs are progress
or liveness (all processes or threads involved will progress and no process or thread will block forever) and
bounded wait or starvation freedom (no process should wait or starve indefinitely). Critical section problems can
cause starvation to one or more processes (or threads), specific to execution of critical sections.
But the issue of deadlock is more general and pervasive. The processes (or threads) involved in a deadlock cannot
proceed any further (not only execution of critical sections but non-critical sections as well). Deadlock is
characterized by the following:
i. it is caused for want of computing resources (of any type).
ii. the starvation is perpetual.
iii. starvation occurs to more than one process (or thread) simultaneously.
iv. the processes (or threads) in the set have dependencies on each other in such a manner that they cannot come out of the perpetual stalemate on their own.
A deadlock differs from a livelock in that the starvation in a livelock is not permanent, and the entities involved in the livelock can resolve it on their own without necessarily requiring external effort.
4.2.1 Examples
Recall the dining philosophers’ problem in Sec 3.8.3. In the first naive attempt to solve the problem, every
philosopher picks up the left fork first and then the right fork. Picking up the forks was considered a critical section
and thus was guarded using semaphores. However, in an extreme case, when every philosopher is hungry at the
same time, everyone can pick up her left fork and cannot get the chance to pick up the right fork. All the
philosophers wait for the right fork, but nobody gets it, as nobody puts down a fork before finishing eating. This literally creates an indefinite starvation or deadlock. (In real life, this could be a livelock, as any of the philosophers can voluntarily release a fork out of courtesy and allow her neighbouring colleague to proceed to eat! But in a programmed environment, this courtesy cannot be seen unless programmed!!)
In a multithreaded environment, semaphores or mutex locks can cause deadlocks as illustrated below. The example
uses POSIX mutex locks (Fig 4.4). Two threads thread1 and thread2 acquire two locks mutex1 and mutex2
for doing some thread-specific critical section. thread1 acquires the locks mutex1 followed by mutex2,
whereas thread2 acquires them in the reverse order. In a single processor system, thread1 can acquire lock
mutex1 immediately followed by thread2 acquiring mutex2. Or, in a multiprocessor system, both the threads
can acquire the first mutex locks simultaneously before either can get the next lock. Then, no thread can successfully
acquire two mutex locks. The thread1 holds mutex1 and waits for mutex2 held by thread2 and vice-versa.
Thus, neither can proceed. This kind of deadlock, although it occurs only occasionally, is quite commonplace and not easily detectable.
The above problem can be resolved using POSIX pthread_mutex_trylock() (Fig 4.5). Here, each thread acquires the second lock only if it is available; otherwise it immediately releases the mutex it already holds. However, this may cause a livelock if both threads acquire their first mutex lock simultaneously: neither gets the other lock, as the invocation of pthread_mutex_trylock() fails, and both simultaneously release the already-acquired mutex locks (mutex1 by thread1 and mutex2 by thread2). The livelock continues if the threads retry simultaneously. The stalemate can be broken if each thread retries at random times.
With the background, we shall discuss deadlocks in more detail, specifically in the context of operating systems.
To that end, we are required to define and discuss a few concepts as given below.
4.2.2 Resources
A computing resource can be any object (hardware or software) that a process (or thread) requires to complete its execution. Hardware resources can be processors, network cards, memory elements and I/O devices; software resources can be files, shared objects (in UNIX, .so files), sockets, messages or synchronization tools like semaphores, mutex locks etc. A computing system can have one or more instances of each resource type, but only a finite number of instances. If the resources are shareable among the processes (or threads), i.e., the resources can be accessed simultaneously by more than one thread (like read access to a file by several readers), there will not be problems of deadlock. But often the resources are non-shareable, i.e., they cannot be accessed simultaneously by more than one thread (e.g., simultaneous read and write of a file, or simultaneous writes to it). This non-shareable use can happen on the following two types of resources.
Reusable resources: The use of the resource does not expire, i.e., the resource can be used by several threads one
after another without any loss. Example: processors, memory elements, network devices, I/O devices, files, sockets,
locks, semaphores etc.
Consumable resources: The resource is for single use. Once it is used by a thread, it no longer exists. Example: ephemeral messages.
Each resource category has only a finite number of instances of each resource type. For example, a computer can
have only a few processors, a finite number of registers, memory cards, network cards, printers, scanners, sockets,
buffers, semaphores, mutex locks etc.
Deadlocks occur because of the non-shareable use of reusable and consumable resources that are finite in number. When the total demand for a resource type is more than the available number of its instances (e.g., there are 3 processors in a system, but 5 processes want to run simultaneously), some of the demanding processes (or threads) need to be blocked. If one or more processes (or threads) demand one or more non-shareable resources held by others in the group in such a way that everyone blocks, deadlock happens.
From this point onward, we shall consider threads as the stakeholders in a deadlock. Resources are considered demanded / requested by threads and used by threads (and not processes). Also, the discussion of deadlocks will be done in the system context, as deadlocks in the user context are supposed to be dealt with by the application developers.
With the technical background given above, we can now define a deadlock more precisely. A deadlock can occur if
all the following conditions are satisfied simultaneously.
i. Mutual Exclusion in use of resources: Only when resources are used by threads non-shareably can a possibility of deadlock emerge. There should be at least one resource that is used by threads in a mutually exclusive way, i.e., only one thread can use an instance of the resource at a time. If another thread wants to use the same instance of the resource, it needs to wait till the first thread releases the resource.
ii. Hold and Wait for resources: During execution, threads are allowed to hold one or more resources and, at
the same time, request to acquire a few more resources held by other thread(s).
iii. No Preemption of resources: None of the resources are preempted from the threads that hold them. A
thread releases the resources voluntarily when either their need is over, or the thread terminates.
iv. Unresolvable Circular Wait: A set of threads T = {T1, T2, …, Tn} hold and wait for resources from a set R = {R1, R2, …, Rn} in such a way that T1 → R1, R1 → T2, T2 → R2, …, Tn → Rn and Rn → T1, i.e., the threads and resources make a cycle in the resource allocation graph with assignment and request edges.
But as soon as one thread in the cycle completes, it can release an instance of the resource it holds; the waiting thread can then acquire it, and the cycle is broken. A deadlock-like situation (not an actual deadlock) is thus resolved (then Fig 4.7a becomes Fig 4.5).
However, if a further resource request is made as shown in Fig 4.7b, there will be two cycles in the graph. Neither can be resolved if the other three conditions (Conditions i - iii) also hold true. Then the threads involved in the cycles are deadlocked.
All the above four conditions (i - iv) are therefore necessary and sufficient to form a deadlock. They are necessary because the failure of even a single condition prevents a deadlock from forming. For example, in most running systems, the first three conditions (Conditions i - iii) are often satisfied, meaning that there is a possibility of a deadlock in the system, but the system may not be in a deadlock. If the fourth condition is also satisfied, then a deadlock happens for sure when all resources have a single instance. When there are multiple instances of resources, a cycle denotes only a possibility of a deadlock; whether there is a deadlock or not depends on whether the requests can be fulfilled after some time or not.
No criteria other than the above four (Conditions i - iv) are needed to form a deadlock - hence these four conditions are sufficient.
Deadlocks are an undesired fallout of concurrency. They happen when all four conditions stated above hold true simultaneously. To stop occurrences of deadlocks, we must make sure that the four conditions are never all true at any point of time. In other words, at least one of the four conditions must be negated.
In most cases, the condition of mutual exclusion is non-negotiable, simply because resources which are non-shareable cannot be shared. We must thus negate one of the other three conditions.
Now requests for resources and their allocations are very dynamic in nature. Keeping track of this dynamism for
hundreds of threads and resources and then taking appropriate actions require both space and time. It is thus up
to the OS designers to decide what strategy can be adopted to handle deadlocks, based on the constraints in space
and time. The strategies are clubbed into the following three categories.
1. Deadlock Prevention: Requests for resources are monitored and allowed to be made only if all the four
conditions are not satisfied simultaneously.
2. Deadlock Avoidance: The threads notify their overall need of resources in advance and the resources are
allocated only if the allocation is safe (it does not lead to the possibility of a deadlock).
3. Deadlock Detection & Recovery: Deadlocks are allowed to happen. But they are detected, and appropriate
recovery actions are taken.
Steps are taken so that, at no time, all the four conditions of a deadlock are met simultaneously. We consider each
of the conditions again and discuss how a particular condition can be prevented from occurring.
Negating mutual exclusion: Mutual exclusion exists only because some resources are inherently non-
shareable. Hence, when the resources are non-shareable, mutual exclusion is an absolute necessity and thus cannot
be compromised.
Deadlock avoidance requires advance knowledge of the total system resources, the current state of resource allocation to different threads and their outstanding resource-needs to
complete execution. We call a system safe if all the threads in the system can complete their execution with the
available resources without facing any deadlock. When the outstanding needs of a system of threads (even for a
single thread) cannot be met with the available resources, we call the system unsafe. An unsafe system does not
necessarily mean a deadlock but indicates the possibility of a deadlock in future (that may or may not come in
reality) (Fig 4.8). A system can go from a safe state to an unsafe one, and vice versa. However, from an unsafe state,
it can go to a deadlocked state from which a system cannot come out on its own.
Deadlock avoidance algorithms check when a system attempts to slip from a safe state to an unsafe one and stop
it so that a deadlock can never arise. Given any scenario of resource allocation, the algorithms try to find a safe
sequence of threads (Ti1, Ti2, …, Tin) in which the outstanding resource needs of each of the threads
{T1, T2, …, Tn} can be met (the order i1, …, in is not necessarily 1, …, n) without putting any of the threads in an unsafe state.
If a sequence can be found, resource allocation and resource reclamation (when a thread completes its execution,
all its resources are reclaimed, and they add to the available resources) must be done in the sequence to avoid
deadlock. If more than one such sequence is found, any one of them can be used. If no such sequence exists, the
system is in an unsafe state which can lead to a deadlock. Requests for new resources (incremental demands) are
not honored then.
Example: Let us consider a system of three threads {T1, T2, T3} using a single resource R that has (3n + 2) instances
(n > 2). T1, T2 and T3 require (2n + 1), (n + 2), and (2n − 1) instances to complete their execution
respectively.
Initially (at time t0), if all the threads notify their total requirement and the threads execute only sequentially one
after another, each can complete its execution safely in any order.
However, during concurrent execution, at some time, say t1, threads T1, T2 and T3 hold (n + 1), 2, and (n − 1)
instances of R respectively. Hence, the number of unallocated or available instances is = (3n + 2) − {(n + 1) + 2 +
(n − 1)} = n units. The outstanding needs of T1, T2 and T3 are n, n and n units respectively. Here, the total
outstanding need of all the threads = 3n units > n. Hence, the outstanding needs of all the threads cannot
be met simultaneously. However, if we have a sequential allocation and reclamation of resources, any sequence of
allocation is safe. Hence, the system is in a safe state (in fact, there are six safe sequences possible, namely:
(T1, T2, T3) or (T1, T3, T2) or (T2, T1, T3) or (T2, T3, T1) or (T3, T1, T2) or (T3, T2, T1)).
Now consider that, at another time t2, each of T1, T2, T3 makes an incremental request of one more instance of R. If
such requests are met, T1, T2 and T3 will hold (n + 2), 3 and n instances respectively, with each having an
outstanding need of (n − 1) instances. But the number of available instances = (3n + 2) − {(n + 2) + 3 + n} =
(n − 3) units. Hence, the outstanding need of no thread can be satisfied, and no safe sequence can be obtained.
This will be a deadlock situation. This kind of allocation at time t2 throws the system into an unsafe state which then
leads to a deadlock.
However, note that allocation to only a single thread (say, T1) will not put the system in an unsafe state, as there
will be (n − 1) instances available. Even though the other two threads (T2, T3) have outstanding needs of n units each,
we can allow thread T1 to proceed, allocating all the available (n − 1) instances to it. When it completes, all the
resources it holds can be reclaimed ((2n + 1) for T1), allowing any one of the other two threads (T2 or T3) to proceed
to completion first, and then the third, in a sequence.
A little thought will reveal that, at time t2, we can always have at least one safe sequence if allocation is made to a
single thread. As soon as we allocate any resource to a second thread (at time t2, before the first thread
completes its execution), we enter an unsafe state (we fail to find a safe sequence).
Deadlock avoidance algorithms check safety whenever a new request is made (at time t2) and allow allocation only
if the system remains safe after such allocation. If the system becomes unsafe (as in the above case), the allocation
is not granted.
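The reasoning of this example can be checked mechanically. Below is a minimal sketch in Python (the function name is_safe and the concrete value n = 3 are illustrative choices, not from the text) that decides whether a single-resource-type state is safe by repeatedly looking for a thread whose outstanding need fits in the available instances:

```python
def is_safe(total, alloc, max_need):
    """Return a safe sequence of thread indices, or None if the state is unsafe."""
    avail = total - sum(alloc)
    need = [m - a for m, a in zip(max_need, alloc)]
    finished = [False] * len(alloc)
    sequence = []
    while len(sequence) < len(alloc):
        # find any unfinished thread whose remaining need fits in avail
        for i, done in enumerate(finished):
            if not done and need[i] <= avail:
                avail += alloc[i]          # thread i runs to completion; reclaim
                finished[i] = True
                sequence.append(i)
                break
        else:
            return None                    # no thread can proceed: unsafe
    return sequence

n = 3                                      # a concrete value for the (3n + 2)-instance example
total = 3 * n + 2
max_need = [2 * n + 1, n + 2, 2 * n - 1]

# State at time t1: the threads hold (n+1), 2, (n-1) instances -- safe
print(is_safe(total, [n + 1, 2, n - 1], max_need))

# State at time t2: one more instance granted to every thread -- unsafe
print(is_safe(total, [n + 2, 3, n], max_need))
```

Running this with n = 3 confirms the example: the t1 state yields a safe sequence, while granting one extra instance to all three threads at t2 leaves no thread able to finish.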
One of the most popular deadlock avoidance techniques is known as Banker’s algorithm - which always ensures
that a system of threads is in a safe state before and after any allocation of resources. The name comes from the
fact that a bank needs to allow withdrawal of cash in such a way that it can meet cash requirements of all its
customers at any given time.
The algorithm considers n threads {T1, T2, …, Tn} and m resource-types {R1, R2, …, Rm}, each having one or more
instances. Let us define some vectors and matrices necessary for discussing the algorithm.
Resources: Total available resources are represented by an m-dimensional vector,
RES = [r1, r2, …, rm] where each rj indicates the total number of instances of resource-type Rj in the system.
Each thread Ti has an allotment or requirement of resources represented as a vector [ai1, ai2, …, aim] where aij
represents the number of instances of resource-type Rj with Ti.
Maximum resource needs: Total requirement of all resource-types by different threads is represented by an (n × m)
matrix
MAX = [[a11, a12, …, a1m], [a21, a22, …, a2m], …, [an1, an2, …, anm]] where MAX[i][j] indicates the maximum need of
thread Ti for resource-type Rj, given by aij.
Resource allocation: Similarly, the current allocation of resources at a given time is also represented by another (n ×
m) matrix,
ALLOC = [[a′11, a′12, …, a′1m], [a′21, a′22, …, a′2m], …, [a′n1, a′n2, …, a′nm]] where a′ij stands for the number of
instances of type Rj allocated to Ti.
Available resources: As resources are allocated to threads, the free and available instances of resources reduce. The
current number of available instances of resources is represented by an m-dimensional vector
AVAIL = [v1, v2, …, vm] where each vj indicates the number of instances available at a given moment for
resource-type Rj.
Outstanding needs: Once the threads are allocated resources, the remaining resource needs of the threads are also
represented by an (n × m) matrix
NEED = [[a′′11, …, a′′1m], [a′′21, …, a′′2m], …, [a′′n1, …, a′′nm]] where a′′ij stands for the number of
instances of type Rj still needed by Ti to complete its execution.
Resource requests: Another matrix of (n × m) dimension represents the new (incremental) needs of all the threads,
REQ = [[a′′′11, …, a′′′1m], [a′′′21, …, a′′′2m], …, [a′′′n1, …, a′′′nm]] where a′′′ij stands for the number of
instances of type Rj newly needed by Ti.
Among the consistency conditions that hold among these structures at any moment are:
2. RES[j] ≥ Σi ALLOC[i][j] for all j (the sum of allocated instances of any resource-type cannot be more than the
total number of instances at any moment)
4. NEED[i][j] = MAX[i][j] − ALLOC[i][j] ≥ 0 for all i, j
1. Checking safety of the system, given the available resources (vector AVAIL), the current state of allocation (matrix
ALLOC), and the outstanding need (matrix NEED), is done by function check_safety(). It tries to first find a single
thread whose current requirements can be fulfilled with the available instances of resources (Step 1). If the thread
gets the resources, completes its execution and returns all the resources, the check is repeated for another thread (Step 2)
and so on. This way it is checked whether the current needs of all the threads can be satisfied or not. Thus, if a complete
sequence of all the threads that can complete their execution is found, the function declares safety (Step 3) and
the sequence is called a safe sequence.
Step 1 here needs a search of maximum n threads to find the first thread in the sequence, followed by that of
maximum (n − 1) threads for the second, and so on. For each thread Ti, we need to check the NEED[i] vector, requiring m
comparisons. Hence, the function has a complexity of O(m n²).
2. Whenever a thread, say Ti, makes an additional request as given by REQ[i], before granting, the
algorithm checks whether the request can be granted safely (shown in function grant_request()). First, if the
incremental request is more than its outstanding need, it is outright rejected, flagging an error message (Steps 1 & 2). If
the request is within the declared maximum need but more than the resources available at present, the thread is not granted the
resources and must wait till the resources become available (Step 3). Otherwise, it is assumed as if the resources
are granted: we modify the vectors (Step 4) and check for safety (Step 5) before the actual allocation. If the assumed
allocation is safe, permission for allocation is granted and the actual allocation is done. This function is called for each
thread requesting resources and involves m comparisons of resources; excluding the embedded safety check, it thus has a complexity of O(m).
Example: Consider the following snapshot of a system with five threads T0 - T4 and four resource-types A, B, C
and D, where AVAIL = [1 5 2 0]:
ALLOC =
     A B C D
T0   0 0 1 2
T1   1 0 0 0
T2   1 3 5 4
T3   0 6 3 2
T4   0 0 1 4
NEED =
     A B C D
T0   0 0 0 0
T1   0 7 5 0
T2   1 0 0 2
T3   0 0 2 0
T4   0 6 4 2
b. T0 does not need any resources. So, when T0 is complete, AVAIL = AVAIL + ALLOC[0] = [1 5 3 2]. With this,
T2 (NEED[2] = [1 0 0 2]) and then T3 (NEED[3] = [0 0 2 0]) can complete, making AVAIL = [2 14 11 8].
Now, either T1 or T4 can complete. Considering T1 gets over first, AVAIL = [3 14 11 8] ≥ NEED[4]. Hence, T4 can also
complete. The system is safe, and one of the safe sequences is T0→T2→T3→T1→T4.
Now suppose T1 makes an additional (incremental) request REQ[1] = [0 4 2 0]. The request is within T1's
outstanding need and within AVAIL, so assume it is granted: AVAIL becomes [1 1 0 0] and the matrices are modified as
follows.
modified ALLOC =
     A B C D
T0   0 0 1 2
T1   1 4 2 0
T2   1 3 5 4
T3   0 6 3 2
T4   0 0 1 4
modified NEED =
     A B C D
T0   0 0 0 0
T1   0 3 3 0
T2   1 0 0 2
T3   0 0 2 0
T4   0 6 4 2
Hence, the system will be safe (one safe sequence is T0→T2→T1→T3→T4) after the grant of the request from T1. It
can be immediately granted safely.
Banker’s algorithm can be easily implemented at the user level and students should be encouraged to do it as a
programming exercise.
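As a starting point for that exercise, here is one possible sketch in Python. The function names check_safety() and grant_request() follow the discussion above, and the data follows the worked snapshot (AVAIL = [1 5 2 0]); the data layout (lists of lists for the matrices) is an implementation choice, not prescribed by the text:

```python
def check_safety(avail, alloc, need):
    """Return a safe sequence of thread indices, or None if the system is unsafe."""
    n, m = len(alloc), len(avail)
    work = avail[:]                 # working copy of AVAIL
    finished = [False] * n
    seq = []
    while len(seq) < n:
        for i in range(n):
            if not finished[i] and all(need[i][j] <= work[j] for j in range(m)):
                # assume Ti runs to completion and returns its allocation
                work = [work[j] + alloc[i][j] for j in range(m)]
                finished[i] = True
                seq.append(i)
                break
        else:
            return None             # no thread could proceed
    return seq

def grant_request(i, req, avail, alloc, need):
    """Grant REQ[i] to thread i only if the resulting state stays safe."""
    m = len(avail)
    if any(req[j] > need[i][j] for j in range(m)):
        raise ValueError("request exceeds declared maximum need")
    if any(req[j] > avail[j] for j in range(m)):
        return False                # resources not yet available: thread must wait
    # tentatively allocate, then test safety
    avail2 = [avail[j] - req[j] for j in range(m)]
    alloc2 = [row[:] for row in alloc]
    need2 = [row[:] for row in need]
    for j in range(m):
        alloc2[i][j] += req[j]
        need2[i][j] -= req[j]
    if check_safety(avail2, alloc2, need2) is None:
        return False                # granting would make the state unsafe
    avail[:], alloc[i][:], need[i][:] = avail2, alloc2[i], need2[i]
    return True

# Snapshot from the worked example (resource types A, B, C, D)
avail = [1, 5, 2, 0]
alloc = [[0, 0, 1, 2], [1, 0, 0, 0], [1, 3, 5, 4], [0, 6, 3, 2], [0, 0, 1, 4]]
need  = [[0, 0, 0, 0], [0, 7, 5, 0], [1, 0, 0, 2], [0, 0, 2, 0], [0, 6, 4, 2]]

print(check_safety(avail, alloc, need))                    # a safe sequence exists
print(grant_request(1, [0, 4, 2, 0], avail, alloc, need))  # True: safely granted
```

Note that check_safety() may report a different safe sequence than the one named in the example; any safe sequence is equally acceptable.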
Tools: Linux kernel ensures that the resources are acquired in a proper order so that deadlock does
not occur. However, Linux also provides a feature-rich tool lockdep to check locking order in the
kernel10.
10 https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt (as on 4-Nov-2022)
A wait-for graph is obtained from a RAG by collapsing the resource nodes: it contains an edge Ti → Tj if
and only if Ti → Rq and Rq → Tj in the original RAG. The edge Ti → Tj indicates that Ti waits for a resource held by
Tj. For example, a wait-for graph for the RAG in Fig 4.4 can be drawn as in Fig 4.10. Fig 4.11 shows how a wait-for
graph can be drawn from a RAG. A cycle in the wait-for graph denotes a deadlock and all the threads in the cycle
are deadlocked. A cycle in a graph can be detected in O(n²) time, where n is the number of nodes in the graph.
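Cycle detection on a wait-for graph can be sketched with a standard depth-first search. In the Python sketch below, the adjacency dictionary and the thread names are made-up illustrations, not taken from Fig 4.10:

```python
def find_cycle(edges, nodes):
    """DFS-based cycle detection on a wait-for graph given as {u: [v, ...]} adjacency."""
    WHITE, GREY, BLACK = 0, 1, 2       # unvisited / on current DFS path / finished
    colour = {t: WHITE for t in nodes}

    def dfs(u):
        colour[u] = GREY
        for v in edges.get(u, []):
            if colour[v] == GREY:      # back edge: v is on the current path => cycle
                return True
            if colour[v] == WHITE and dfs(v):
                return True
        colour[u] = BLACK
        return False

    return any(colour[t] == WHITE and dfs(t) for t in nodes)

# Hypothetical wait-for graph: T1 waits for T2, T2 for T3, T3 for T1 (deadlock)
print(find_cycle({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}, ["T1", "T2", "T3"]))  # True
print(find_cycle({"T1": ["T2"], "T2": ["T3"]}, ["T1", "T2", "T3"]))                # False
```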
Tools: In Linux, BCC toolkit can detect potential deadlock using deadlock_detector that finds
cycles in the mutex locks in the user code11.
11 https://github.com/iovisor/bcc (as on 4-Nov-2022)
Example: For the following matrices, check whether there is any deadlock.
Soln. Here, for T3, the present allocation is all zero. Hence, it cannot cause any deadlock and thus is left out of
the deadlock detection algorithm.
Hence, the detection algorithm stops with T0 and T1 unmarked, or with hold_res[0] = hold_res[1] = 1, i.e. a
deadlock exists in the system and T0 & T1 are in deadlock.
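The marking algorithm used in this solution can be sketched as follows. Since the original matrices of the exercise are not reproduced here, the small state at the bottom is an illustrative placeholder (two threads waiting on each other); hold_res[i] == 1 means thread i remains unmarked, i.e., deadlocked:

```python
def detect_deadlock(avail, alloc, req):
    """Return hold_res, where hold_res[i] == 1 iff thread i is deadlocked."""
    n, m = len(alloc), len(avail)
    work = avail[:]
    # Threads holding no resources cannot cause a deadlock: mark them done up front
    hold_res = [1 if any(alloc[i]) else 0 for i in range(n)]
    progressed = True
    while progressed:
        progressed = False
        for i in range(n):
            if hold_res[i] and all(req[i][j] <= work[j] for j in range(m)):
                # Ti's outstanding request can be met: assume it completes and reclaim
                work = [work[j] + alloc[i][j] for j in range(m)]
                hold_res[i] = 0
                progressed = True
    return hold_res

# Illustrative 3-thread, 2-resource-type state: T0 and T1 wait on each other
avail = [0, 0]
alloc = [[1, 0], [0, 1], [0, 0]]
req   = [[0, 1], [1, 0], [0, 0]]
print(detect_deadlock(avail, alloc, req))   # [1, 1, 0]: T0 and T1 are deadlocked
```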
Resource Preemption: The other alternative is to forcefully take the resources from the threads that are involved
in the cycle and give them to other threads of the cycle that can complete execution. Pertinent questions are as
follows:
(i) selection of victims: which resources are to be chosen and from which threads? The deciding factor can be the
cost of preemption: number of resources, percentage of completed execution, etc. The preempted resources need
to be re-assigned to the thread and the thread re-executed, if not the entire process.
(ii) rollback: The victim process cannot proceed for want of preempted resources. If it is not aborted, then it needs
to be rolled back to a safe state from which it can resume its operation when the resources are re-assigned.
Appropriate mechanisms should be in place so that process states are stored for possible rollback.
(iii) starvation: If cost is the only deciding factor behind victim selection, then it is possible that the same victims
are selected repeatedly. The victim processes then face repeated denial of progress or starvation. To mitigate this,
the victim selection algorithm may incorporate the number of rollbacks happening to a thread as one of the deciding
factors.
UNIT SUMMARY
This chapter discusses deadlocks in concurrent execution of threads (or processes).
Deadlocks happen when requests for non-shareable computing resources cannot be met at all.
Deadlocks emerge when two or more threads hold some resources and simultaneously request
for some more which are held by some other threads. All the involved threads make a cycle in the
resource allocation graph.
There are four necessary and sufficient conditions needed to form a deadlock: mutual exclusion,
hold and wait, no preemption and circular wait.
Mutual exclusion of resources means resources are used in a non-shareable manner (i.e., one
resource instance is used by only one thread at a time).
Hold & wait means a thread can hold some resources and wait for getting some more.
No preemption does not allow forceful release of resources from any threads.
Circular wait is the formation of a cycle in the resource allocation graph involving threads and
resources.
Deadlocks are handled in three ways: prevention, avoidance and detection & recovery.
Prevention techniques are the most restrictive ones that negate at least one of the four necessary
conditions.
Avoidance techniques are comparatively lenient, where safety of the system of threads is checked
before each new allocation of resources. Banker’s algorithm is a popular deadlock avoidance
technique.
In detection & recovery, deadlocks are allowed to happen. At appropriate intervals, detection
algorithms are run. When detected, either involved processes are terminated or resources are
preempted.
EXERCISES
Multiple Choice Questions
Q1. Which of the following statements is/are TRUE with respect to deadlocks?
Q2. A system has 6 identical resources and N processes competing for them. Each process can request at
most 2 resources. Which one of the following values of N could lead to a deadlock?
A. 1 B. 2 C. 3 D. 4 [GATE(2015)]
Q3. Which of the following is not true with respect to deadlock prevention and deadlock avoidance schemes ?
A. In deadlock prevention, the request for resources is always granted if resulting state is safe
B. In deadlock avoidance, the request for resources is always granted, if the resulting state is safe
C. Deadlock avoidance requires knowledge of resource requirements a priori
D. Deadlock prevention is more restrictive than deadlock avoidance [ISRO(2017)]
Q4. Consider a system with 3 processes that share 4 instances of the same resource type. Each process can
request a maximum of K instances. Resources can be requested and released only one at a time. The largest
value of K that will always avoid deadlock is ___. [GATE (2018)]
Q5. In a system, there are three types of resources: E, F and G. Four processes P0, P1, P2 and P3 execute
concurrently. At the outset, the processes have declared their maximum resource requirements using a matrix
named Max as given below. For example, Max[P2, F] is the maximum number of instances of F that P2 would
require. The number of instances of the resources allocated to the various processes at any given state is given
by a matrix named Allocation. Consider a state of the system with the Allocation matrix as shown below, and in
which 3 instances of E and 3 instances of F are the only resources available.
Allocation Max
E F G E F G
P0 1 0 1 P0 4 3 1
P1 1 1 2 P1 2 1 4
P2 1 0 3 P2 1 3 3
P3 2 0 0 P3 5 4 1
From the perspective of deadlock avoidance, which one of the following is true?
A. The system is in safe state
B. The system is not in safe state, but would be safe if one more instance of E were available
C. The system is not in safe state, but would be safe if one more instance of F were available
D. The system is not in safe state, but would be safe if one more instance of G were available
[GATE(2018)]
Q6. Consider the following snapshot of a system running n concurrent processes. Process i is holding Xi
instances of a resource R, 1<=i<=n . Assume that all instances of R are currently in use. Further, for all
i, process i can place a request for at most Yi additional instances of R while holding the Xi instances it already
has. Of the n processes, there are exactly two processes p and q such that Yp=Yq=0 . Which one of the
following conditions guarantees that no other process apart from p and q can complete execution?
A. Xp + Xq < Min{Yk ∣ 1 ≤ k ≤ n, k ≠ p, k ≠ q}
B. Xp + Xq < Max{Yk ∣ 1 ≤ k ≤ n, k ≠ p, k ≠ q}
C. Min(Xp, Xq) ≥ Min{Yk ∣ 1 ≤ k ≤ n, k ≠ p, k ≠ q}
D. Min(Xp, Xq) ≤ Max{Yk ∣ 1 ≤ k ≤ n, k ≠ p, k ≠ q} [GATE(2019)]
Q7. A system has 3 user processes each requiring 2 units of resource R. The minimum number of units of R
such that no deadlock will occur is:
A. 3
B. 5
C. 4
D. 6
Short and Long Answer Type Questions
Q1. “Livelock can be opened, but a deadlock needs to be broken.” Justify or refute the statement.
Q2. Justify why the necessary conditions of a deadlock are sufficient.
Q3. Construct a resource allocation graph using four process and four resources such that
a. the graph has a cycle and processes are deadlocked,
b. the graph has a cycle but the processes are not in a deadlock.
Q4. Discuss different types of deadlock prevention techniques.
Q5. What do you mean by safety of a system? Explain how it is related to deadlock in the system.
Q6. Illustrate with examples when a system is not safe, but a deadlock is not formed in the system.
Q7. Discuss how banker’s algorithm is related to avoidance as well as detection of deadlocks.
Q8. Discuss different issues with recovery from a deadlock.
Q9. Why do most commercial OS not implement any OS handling mechanisms? Explain.
Numerical Problems
Q1. A system has 9 user processes each requiring 3 units of resource R. What is the minimum number of units
of R such that no deadlock will occur?
[Hint: a deadlock happens when each process holds resources less than its maximum demand, but the outstanding
need of none can be fulfilled from the available resources. Hence, for deadlock, NEED[i] > AVAIL for all i. No deadlock
means NEED[i] <= AVAIL for at least one i] [Ans. 19]
Q2. If there are 7 units of resource R in the system and each process in the system requires 2 units of resource
R, then how many processes can be present at maximum so that no deadlock will
occur? [Ans. 6]
Q3. Consider a system having m resources of the same type being shared by n processes. Resources can be
requested and released by processes only one at a time. Derive the condition necessary for the system to be
deadlock-free. [Ans. sum of max need < m+n]
Q4. Suppose there are 4 tape drives, 2 plotters, 3 scanners and 1 CD drive in a system. They are allocated to
3 processes in the following order: P1: [0 0 1 0] P2: [2 0 0 1] and P3: [0 1 2 0]. If the processes request for
additional needs as P1: [2 0 0 1] P2: [1 0 1 0] and P3:[2 1 0 0], check whether the requests can be safely met.
PRACTICAL
Q1. Implement a deadlock detection algorithm while there are single instances of every resource. Input number
of resources and that of processes and different edges among resources and processes. Use any language of
your choice.
Q2. Implement banker’s algorithm in any language of your choice with number of processes, number of
resources as inputs. Also take the maximum number of resource instances, allocation matrix and immediate
need matrix as inputs to determine safety of a system.
Q3. Explore lockdep and bcc toolkit to learn their use in Linux kernel.
KNOW MORE
Deadlocks are discussed in general with good detail in [SGG18]. It nicely points out the differences with
livelock with examples from POSIX threads.
[Sta12] explains deadlock with timing diagrams to illustrate the difference between the possibility of a
deadlock and actual deadlock. It also contains good examples of different types of resources that can
cause deadlock. The book also provides a nice summary of three deadlock handling techniques.
[Hal15] sees deadlock as part of process synchronization and provides a brief and summarized version in
general.
[Dow16] illustrates a few examples of deadlock involving synchronizing tools: semaphores, barriers, mutex
locks.
[Dha09] gives a general introduction followed by brief discussion on deadlocks in UNIX and Windows
systems. The book also provides rich references to the seminal and original work on deadlocks.
[Bac05] and [Vah12] discuss deadlocks and their avoidance in the UNIX system, specifically in the locks
and file system, in single-processor, multiprocessor and distributed environments.
[YIR17] contains the issue of deadlock in Windows operating systems.
[Bac05] Maurice J Bach: The Design of the UNIX Operating System, Prentice Hall of India, 2005.
[Dha09] Dhananjay M. Dhamdhere: Operating Systems, A Concept-Based Approach, McGraw Hill, 2009.
[Dow16] Allen B. Downey: The Little Book of Semaphores, 2e, Green Tea Press, 2016 (available at
https://greenteapress.com/semaphores/LittleBookOfSemaphores.pdf as on 9-Oct-2022).
[Hal15] Sibsankar Haldar: Operating Systems, Self Edition 1.1, 2015.
[SGG18] Abraham Silberschatz, Peter B Galvin, Greg Gagne: Operating Systems Concepts,10th Edition,
Wiley, 2018.
[Sta12] William Stallings: Operating Systems Internals and Design Principles, 7th Edition, Prentice Hall,
2012.
[Vah12] Uresh Vahalia: UNIX Internals, The New Frontiers, Pearson, 2012.
[YIR17] Pavel Yosifovich, Alex Ionescu, Mark E. Russinovich, and David A. Solomon: Windows Internals,
Seventh Edition (Part 1 and 2), Microsoft, 2017. https://docs.microsoft.com/en-
us/sysinternals/resources/windows-internals (as on 8-Jul-2022).
UNIT SPECIFICS
Through this unit we have discussed the following aspects:
Memory Management: Basic concept, Logical and Physical address map, Memory allocation:
Contiguous Memory allocation – Fixed and variable partition– Internal and External fragmentation
and Compaction; Paging: Principle of operation – Page allocation – Hardware support for paging,
Protection and sharing, Disadvantages of paging.
Virtual Memory: Basics of Virtual Memory – Hardware and control structures – Locality of reference,
Page fault, Working Set, Dirty page/Dirty bit – Demand paging, Page Replacement algorithms:
Optimal, First in First Out (FIFO), Second Chance (SC), Not recently used (NRU) and Least Recently
used (LRU).
This chapter discusses the role of memory in computers. Memory is the second most important hardware after
the processor. A processor constantly interacts with the memory during execution. While all the programs
persistently remain in either secondary (HDD) or tertiary memory (removable media), they are temporarily
brought in main memory for execution. The processor can fetch (and store) instructions and data from (to) the
main memory and not from secondary or tertiary storage. Hence all processes are loaded in the main memory.
But main memory is costly in price and thus small. How this space can be judiciously utilized so that we can
maximize CPU utilization, throughput and overall performance of a computer is the motivation of this chapter.
First, we study how main memory is managed by an operating system and how secondary storage can augment
the management to improve performance.
Like previous units, several multiple-choice questions as well as questions of short and long answer types
following Bloom’s taxonomy, assignments through numerical problems, a list of references and suggested
readings are provided. It is important to note that for getting more information on various topics of interest,
appropriate URLs and QR code have been provided in different sections which can be accessed or scanned for
relevant supportive knowledge. “Know More” section is also designed for supplementary information to cater
to the inquisitiveness and curiosity of the students.
RATIONALE
This unit starts with enumerating and introducing different types of memories available in a computer and their
interaction with the processor. The largest memory unit that a processor can directly fetch (and store) instructions
and data from (to) is the main memory. All programs residing in secondary or tertiary memory are therefore brought
into main memory for execution. The main memory is costly and thus small in size. Different parts of a program may
be stored in different areas of memory. How they are uniformly referenced and accessed is discussed through logical
and physical addressing schemes. It is followed by a discussion on how processes are allocated space in the memory.
When the main memory is inadequate and/or a process is so large that it cannot be accommodated in the main
memory, how secondary memory can support as a back-up memory is talked about in the virtual memory section.
The intricacies and nuances of data transfer between main memory and secondary memory, intervention of the
processor and responses of the operating system in the memory management are discussed.
143 | Memory Management
This unit builds the fundamental concepts to understand memory management issues of an OS, introduces
necessary terms and terminologies, and details important techniques. The concepts form the core of computation and
interaction between the processor and memory under the control of an operating system.
PRE-REQUISITES
UNIT OUTCOMES
5.1 INTRODUCTION
Memory is the second most important resource after the processor in a computer. Although there are several
memory elements like registers, cache, ROM, RAM, secondary memory (hard disk), tertiary memory (recall Ch.1,
Fig 1.1) in a computer, RAM or random-access memory is known as the main memory.
Registers and cache memory are closest to the processor (Fig 1.7 - 1.8) and can be accessed by it directly, but
they are very expensive and thus of very small capacity. They cannot accommodate either the operating system or
the other programs that are executed by the processor. Main memory is the farthest memory unit from the processor
that it can still access directly, and it is the one that can accommodate the OS as well as user applications during their execution.
Processors make all memory references with respect to this memory. Main memory is volatile in nature - it keeps
the code and data (both for system and application programs) as long as the computer is on. Hence, both the OS
and other programs need to be loaded on the main memory (hereinafter referred to as memory only) from the
secondary or tertiary memory after each start-up and/or execution (Fig 5.1).
The OS kernel remains loaded in the low memory region of the memory as long as the system runs (Fig 1.9, Fig 2.1).
Once loaded, the OS divides the memory into two parts: kernel space for storing the OS; and the user space for
storing the application processes. In a single-programming system, the OS remains in the kernel space and only one
application program can reside in the user space. But in today’s multi-programming environment, the OS needs to
further divide the user space so that several user programs can coexist in the memory. How many programs can be
accommodated in the memory decides the degree of multiprogramming. When one process goes for I/O, the CPU
remains idle. We can schedule another process only if it is available in main memory. Thus, CPU usage can be
maximized if we can accommodate in the memory as many processes as possible. But the main memory is much
smaller compared to other permanent storage devices. If a process with a large address space is loaded in the
memory, it potentially precludes loading of other processes, reducing the degree of multiprogramming. Can a
process be partially loaded? If yes, how much of it is to be loaded now, and when and where is the remaining portion
to be loaded later? These are some of the important issues. Main memory space management is thus an important
OS task.
Specifically, some of the critical questions related to this space management are as follows.
1. How many processes are to be loaded in the memory?
2. Which processes will be allocated space? In other words, what will be the selection criteria for the
processes during memory allocation?
3. How much space will be allocated to a given process and in what way?
We shall investigate these broad questions in this chapter of memory management. The first part will focus on the
basic space allocation techniques in the main memory. The later part (virtual memory) will discuss how to handle
space requirements of large processes that may go beyond the available main memory space with the help of
secondary memory.
For example, suppose a process has an address space with base address 1204 and length 476 bytes. The base
register will contain the value 1204 and the limit register 476. The end-address of the address space is then (1204 + 476) =
1680, so the legal references are addresses 1204 to 1679, i.e., base ≤ m < base + limit. If the memory reference (m)
generated by the CPU lies within this range, the address is valid. But if the reference falls outside these boundaries
(less than the base address, or at or beyond the end-address), the address reference is illegal. A hardware scheme
traps the error and invokes a suitable interrupt routine (illegal memory access). The base register and limit register
thus together provide protection from illegal memory references.
These registers are handled only by the OS in kernel mode. Appropriate values are populated there whenever a
process is scheduled to run in CPU. The OS thus can save its own kernel space from illegal memory access, as well
as address spaces of other processes.
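The hardware check described above amounts to two comparisons per memory reference. A small sketch in Python (the function check_address simulates the hardware; it is an illustration, using the numbers from the example and the common convention that the limit counts bytes, so the last legal address is base + limit − 1):

```python
def check_address(m, base, limit):
    """Simulate the base/limit hardware check: valid iff base <= m < base + limit."""
    if base <= m < base + limit:
        return m                                        # legal reference, goes to memory
    raise MemoryError(f"illegal memory access at {m}")  # hardware trap -> OS interrupt routine

base, limit = 1204, 476
print(check_address(1204, base, limit))    # 1204: first legal address
print(check_address(1679, base, limit))    # 1679: last legal address (base + limit - 1)
# check_address(1680, base, limit) would raise MemoryError (beyond the address space)
```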
Consider, for example, the declaration
int a, b;
Here a and b are two placeholders whose relative byte positions in the object code can be 14 bytes (for a) and 18 bytes (for b)
from a particular reference point (say, the starting location of the program or of the current module). We may not
from a particular reference point (say, the starting location of the program or of the current module). We may not
know the actual memory address (known as absolute or real addresses) of the identifiers until the process is loaded
in the main memory. This kind of relative addressing does not refer to the real addresses in the main memory but
is helpful in doing address calculations and address references in a logical space. Also, some of the addresses remain
unknown in the compiled code if they belong to different source files and static libraries. These are resolved only
after linking (however, as relative / relocatable addresses). Similarly, some of the addresses are resolved only after
loading (due to shared libraries), while some can be resolved only during execution of the in-memory executable
(dynamically linked libraries or DLLs).
Hence, generating absolute memory addresses involves several resolution-related steps. We call this procedure of
resolving addresses to different identifiers as address binding. This binding depends on the computer architecture
and programming environment. Address binding is divided into three categories based on the time of its occurrence
as given below.
1. Compile-time address binding: The absolute addresses are generated during the compilation of the source
code by the compiler itself. In other words, the object code itself is the final in-memory binary executable.
This is only possible if the source has only a single file and does not use any other sources, static libraries,
shared files or dynamic libraries (see Fig 5.3). Also, the executable must be placed only at a fixed location
in the memory every time it is loaded and is limited by the maximum size allowable by the memory
manager. Obviously, this is the most restrictive address binding technique and is rarely used nowadays. In
MS-DOS, .COM files are examples of compile-time binding.
2. Load-time address binding: If all the real addresses are not known at compilation time, the compiler generates relocatable code based on relative addresses. Addresses left unresolved after compilation are resolved by the linker (static parts like other source files or static libraries) or the linker-loader (shared files). All these addresses are in relocatable format. The absolute addresses are generated during loading of the
executable based on the relocation-value of the base address (e.g., if the relocation-value is 0, all
relocatable addresses themselves become absolute addresses; if the value is 12000, all real addresses are
12000 + relocatable addresses). Once the executable image is loaded, the addresses are fixed in the
memory and the image cannot be relocated any further during execution. However, in a separate loading
(or re-loading), the image can be relocated based on relocation-value. This binding does not need re-
compilation of source text for reloading. But dynamic relocation of the executable in run-time is not
possible.
3. Run-time address binding: This is the most flexible address binding scheme. Real addresses are computed
only before accessing the identifiers. Otherwise, they are referenced in the relative addressing mode, even
after loading. The in-memory image does all references in relative or relocatable addressing style. A
memory management unit (MMU) generates the actual or real addresses during the memory accesses.
Since address resolution is done at run-time, the executable image can be dynamically relocated anywhere
in the memory anytime - and it does not need any re-compilation of the source code, even across the
systems. Most modern operating systems support run-time address binding.
In high level languages, variables and functions are known by their symbolic names (e.g., variables like a, count;
function calculate_interest etc.) as supported by the languages. This namespace is called program identifier
space. Again, during compilation, they are represented as numbered identifiers (id1, id2, id3 etc.) or
placeholders in different memory elements (either registers, memory or stack). In the object code, they are referred
to in terms of relocatable addresses relative to some reference point (say, start of an object code). All these
references are done in the relocatable or relative addressing mode as if the identifiers are available at those
addresses. This namespace is called logical address space, as the address references and address arithmetic here
are computed logically, and not physically. All the addresses generated and referenced in logical address space are
called logical or virtual addresses.
However, in the run-time, when the identifiers need to be accessed, their real or absolute addresses in the memory
are evaluated. These addresses are called physical or real addresses.
There is a one-to-one mapping between identifiers, their logical addresses and their physical addresses (Fig 5.4). The identifiers from the program address space go through the logical address space to the physical memory address space, but to users it appears as a direct correspondence between identifiers and their physical addresses.
For compile-time and load-time bindings, the in-memory executables already have all the addresses resolved, and
thus the CPU here can generate physical addresses.
But nowadays, in most modern systems (run-time binding), the CPU generates logical addresses from an in-memory image of the executable. The MMU calculates the real addresses, based on the memory allocation technique used, and puts the physical addresses on the memory address register (MAR).
Alternatively, it is also said that addresses generated by the CPU are logical addresses, but addresses written on
MAR are physical addresses. For compile-time & load-time bindings, logical addresses and physical addresses are
the same.
We shall discuss some of the memory allocation techniques in the following sections. However, a simple implementation of the MMU can be thought of using a relocation register that stores the relocation value. When this value is added to a logical address generated by the CPU, we obtain the physical address (Fig 5.5). A limit register stores the length of the process; every logical address is checked against it to prevent illegal memory access.
Memory allocation is the primary task in memory management. Main memory is the largest memory element that
CPUs can directly access. It is smaller and faster than secondary memory but volatile in nature. Hence, programs
along with data need to be loaded there each time before their execution.
If a program is large, requiring more than available memory, it cannot be loaded entirely. In the earlier days (single
programming environment), the application developer had to decide which part of the program would be loaded
at a given time. The program had to be designed into several parts (modules) so that the main module with a
currently executed one along with its necessary data fit in the available memory. When another module needed to
be loaded, it used to replace the in-memory counterpart from the secondary memory. This technique is called
overlaying.
Obviously, overlaying was a concern for application developers, as program development was constrained by the
hardware. It did not have portability even in a single-programming environment as program design may require
changes if available memory space is different. In a multiprogramming environment, it is almost impossible for the
program developer to keep track of the available memory space that changes dynamically. Also, the application
developer ideally should be kept free from the burden of this kind of micro-management.
Hence, the job of memory allocation is delegated to the OS in most of the modern systems. The OS itself remains
in the main memory, and fulfills this responsibility in kernel mode to optimize the overall performance of the
system. An OS accomplishes this with the help of secondary memory using different allocation schemes. We shall
start with the following three basic schemes.
i. Contiguous Allocation: a process in its entirety is allocated a contiguous memory space.
ii. Paging: a process is divided into a number of equal-sized pages; pages are loaded into available frames.
iii. Segmentation: a process is divided into a number of unequal-sized segments; segments are loaded.
We shall briefly discuss each of the above allocation techniques along with the issues involved therein followed by
some of the combinational schemes.
Dynamic partitioning starts with no external fragmentation at all. But gradually it keeps on adding external
fragmentation as processes leave and occupy the memory. Often it comes to a situation where the total available
memory is more than the required space of a new process, but it cannot be allocated as the space is not contiguous,
but fragmented.
For example, suppose in a memory of 32 MB, the OS occupies 8 MB of space, leaving 24 MB free (Fig 5.7a). A process P1 of size 10 MB arrives and is allocated space (b). Similarly, P2 and then P3 arrive and occupy space (c-d), leaving 1 MB free. Now, a process P4 of size 8 MB comes, but there is not enough space. P1, the only process occupying a space large enough, is then swapped out (assuming it was not running). P4 takes 8 MB of space, leaving 2 MB empty (e). Similarly, P5 leaves another 1 MB hole (f). Now, if a process P6 of, say, 3 MB arrives, we have 4 MB of empty space, but it is not contiguous and thus cannot be allocated under the contiguous allocation scheme.
How can the issue of external fragmentation be effectively managed, as it keeps on increasing with continuous
arrival of newer processes? Two popular solutions applied are: i. compaction and ii. placement algorithms.
Compaction: Processes are pushed towards one end of the memory and small holes are acquired and added to
make a bigger hole. Compaction routine is run by the OS to reclaim the wasted space at regular intervals or when
memory utilization falls below a pre-decided threshold. This is like the defragmentation utility that Windows
systems provide as ‘Disk Defragmenter’ to make the hard disk drives organized in a compact way and to free some
disk space.
Compaction, however, cannot solve the problem of internal fragmentation. Often dynamic partitioning also leaves
traces of internal fragmentation. Instead of allocating part of a word (4 bytes), spaces are allocated in multiples of
a word to minimize the overhead of management (e.g., if a process needs 2046 bytes of space, 2048 bytes of space
is allocated).
Compaction involves movement of many live processes within memory and needs resolving a lot of memory
references. This is hugely time consuming and takes substantial processor time.
Placement Algorithms: Another way to minimize fragmentation is applied while placing new processes into the available holes. The OS maintains a list of available holes, with their sizes, as processes leave the memory. Before a new process is allocated space, the list is searched to find the most appropriate hole. There are
different algorithms for allocating holes to the requesting processes. The strategies discussed below are some of
them.
i. First-fit: A process is allocated the first hole that is found large enough to accommodate it. It is quick to
place the processes but suffers from potential external fragmentation.
ii. Best-fit: A process is allocated the smallest hole, found after exhaustively searching the list, that is just large enough to accommodate it, leaving the smallest possible leftover space in each allocation.
iii. Worst-fit: A process is allocated the largest hole, found after exhaustively searching the list, that can accommodate it. Even though it seems counter-intuitive at first, the idea is that the space remaining after allocation is still large enough to be useful and is added back to the list of empty holes.
Which one of the above strategies is the best depends on the size-distribution of the processes that arrive in a given
system. However, in most of the cases, in general, the first-fit is seen to be the simplest and fastest. The best-fit is
often the worst performer.
One of the IBM’s mainframe operating systems, OS/MVT (Multiprogramming with a Variable Number of Tasks) used
dynamic partitioning.
Even though the buddy system minimizes external fragmentation to some extent, internal fragmentation is very much present, as we have to allocate space in sizes of 2^i words whereas the need may be much less (see P4 in Fig 5.8b). Also, two empty blocks can only be merged if they are next to each other. Otherwise, compaction is necessary.
As evident from the discussion so far, memory utilization would improve if we can
i. reduce the block size so that internal fragmentation is minimized;
ii. allocate more than one small block to a process instead of a single big block;
iii. avoid compaction by keeping a mechanism in place to know which blocks are allocated to a process and where they are.
Essentially, these are done in the other two basic memory allocation techniques: paging and segmentation.
5.4.2 Paging
Memory is divided into several small blocks of equal size. For the convenience of address translation and data transfer, the block-size is kept a power of 2, such as 1024 bytes (1K), 4K, 1 MB or higher. The blocks in the memory are
called frames. Program code and program data residing in the secondary device are also considered a sequence of
blocks of the same size, called pages. Processes are loaded in units of pages into the available frames. The frames
need not be allocated contiguously in the memory.
For example, the logical byte-address 2056 (100000001000 in binary) with a 1024B (1K) page-size has the <page#, offset> representation <10, 0000001000>, i.e., page 2 with offset 8. If the page-table shows the corresponding frame as 517 (i.e., the mapping 10 → 1000000101), then the real address in the memory is <1000000101, 0000001000>, or byte location 529,416 (Fig 5.9).
The pages, frames, page-tables are managed by the OS. The kernel allocates a page-table to each process and keeps
track of the allocated frames. A process address space is considered as a contiguous sequence of pages, each of the
same size as that of a frame. The OS also maintains a global frame-table (FT) to keep track of the occupied and
available frames. When a frame is allocated to a process, its page table as well as the global frame table are updated.
An entry in the FT stores corresponding process id. If the page is also used by other processes, it stores those process
ids along with reference counts of the page. If the frame is free, it is added to the list of available frames. When a
process needs a page to be loaded, the list is searched, and the first available frame is allocated. When a process
terminates, or is idle, its frames are reclaimed by the OS so that they can be allocated to a new process. When a
process is no more active, its page table is also destroyed. The allocation and release keep on occurring dynamically
in a system, and the pages belonging to a process thus are unlikely to get contiguous frames (Fig 5.10).
For any memory reference, its logical address <page#, offset> is first determined. The page number is first searched in the TLB. If it is found (TLB hit), the corresponding frame number is directly used in forming the physical address. If not (TLB miss), the page-table needs to be searched. The PTBR provides the base address of the in-memory page table. The rest of the address translation mechanism is similar to what has already been discussed.
The use of a TLB avoids a search in the page-table and thus saves one memory access in case of a TLB hit. In case of a TLB miss, at least 2 memory accesses (1 for finding the frame-id + 1 for the actual data access) are required. Out of the total number of memory references, the fraction of TLB hits is called the hit-ratio (0 ≤ hit-ratio ≤ 1), and the fraction of TLB misses is (1 − hit-ratio).
Average memory access time is given by (assuming negligible TLB lookup time)
effective memory access time = hit-ratio × memory access time + (1 − hit-ratio) × 2 × memory access time.
For example, if memory access time is 20ns, with hit-ratio = 0.7, effective memory access time = 0.7 × 20 + (1 − 0.7) × 2 × 20 = 14 + 12 = 26ns.
Hence, placing page tables in the memory slows down effective memory access (here, from 20ns to 26ns). It is better if the TLB hit-ratio can be increased (say, to 90 percent or higher). This, however, requires a large TLB in the processor unit, which is costly.
Alternatively, some processors use multiple layers of TLB caches in a cascaded fashion (for TLB misses in L1, L2 is
searched and so on). For example, Intel Core i7 has 128-entry instruction cache and 64-entry data cache in L1
followed by 512-entry L2 cache. Effective memory access time requires information on hit-ratio and the required
number of clock-cycles to search in each layer and time of the memory-access.
Since TLBs are hardware features that come with the processor, the OS must customize paging implementation
according to the TLB implementation in the system.
the page numbers. In other words, we need to have a page-table with 2^20 entries (32-bit addresses) to 2^52 entries (64-bit addresses). Handling such a large page table is difficult.
Thus, one popular solution scheme is to use hierarchical page tables. For example, with 4K pages, an
implementation of hierarchical paging can be the following.
The implementation will look like the following (Fig 5.12). The first-level page table (p1) gives the location of the
second level page table (p2) and so on. The innermost page table provides the actual frame id within which the
memory address is found.
A high number of hierarchy levels involves a high overhead of address translation, both in terms of space and time. Depending on the maximum allowable page-size and the average process address space, the designer must make a trade-off on the number of levels.
To minimize the space requirement due to high number of page tables in the hierarchical scheme, two other
implementations are also popular: hashed page table and inverted page table.
In the hashed page-table implementation, the page number of the logical address is passed to a hash function. The hash function maps all pages to a smaller set of hash-values. Corresponding to each hash-value, the page-id and its frame-id are stored. To avoid collisions among several pages with the same hash-value, a linked list is maintained wherever necessary. A hash table is much smaller than a page table. The search time is linear in the length of the linked list in case of collision. The scheme is shown in Fig 5.13.
In the inverted page table implementation, only one global page-table is maintained for all the live processes. The
logical address thus needs to include process-id (pid) also and looks like the tuple: <pid, page-id,
offset>.
Inverted page table is a frame table (also termed as page-frame-table) with entries in the form of <pid, page-
id> arranged in the increasing order of allocated frame-ids. For each address-reference from a process, the entire
page-frame-table is sequentially searched for the query <pid, page-id> and the index of the table itself provides the
frame-id. This frame-id is used in forming the physical address as <frame-id, offset>. The scheme is shown in Fig
5.14.
The table is sorted in the increasing order of frame-id, but the search is on <pid, page-id>. Thus, searching is exhaustive. To cut down the search time, a hash table is used, where a given <pid, page-id> is hashed into a hash-table that stores the frame values.
A page can be a read-only (ro) page or read-write (rw) page. A read-only page needs to be protected from write
attempts. Each page is marked by a special bit in the page table to show whether it is ro or rw. Before writing on
any page, its writability field is checked.
Similarly, sometimes entries in a page-table may refer to pages belonging to earlier loaded page-tables that were not flushed properly, or may contain junk values in the frame-id field. Another special bit therefore shows whether a given entry belongs to the current page-table or not; in other words, whether the entry is for a valid page or an invalid page. The bit is called the valid-invalid bit. A frame-id is used to form a physical address only if the valid bit is set (v). Hence, considering these together, each entry in a page-table accommodates two extra bits, as shown in Fig 5.15. Any attempt to write to a read-only page or to access an invalid page leads to an error (trap) and invokes the appropriate interrupt routine.
Shared libraries mentioned in Sec 5.2.2 implement the above idea. This is used in many applications like compilers,
window systems, database systems wherever the code is re-entrant (a reusable routine that does not change and
multiple processes can invoke, interrupt, and re-invoke simultaneously). Also recall Shared Memory Model (Sec
3.1.1) that can be implemented using paging and page sharing.
5.4.2.6 Disadvantages of paging scheme
Since a process can be allocated at most as many frames as it has pages, there is no external fragmentation in paging. However, if the process size is not an exact multiple of the page-size, some space in the last page remains unused. E.g., a process of size 5200 bytes in a paging scheme with 1K page-size needs ⌈5200/1024⌉ = 6 pages with internal fragmentation (here, 6 × 1024 − 5200 = 944 bytes remain unusable within the last 1K page). Even 1 byte of space beyond a multiple of the page-size requires a new page allocation. Hence, on average, half the page-size per process is wasted due to internal fragmentation. Thus, the smaller the page-size, the less the fragmentation. On the other hand, programs and data are loaded in terms of pages. Hence, it is convenient if the page-size matches the block-size of data transfer; otherwise the number of I/O operations required will be high.
5.4.3 Segmentation
Segmentation is another non-contiguous memory allocation technique where space is allocated to processes in the
units of logical blocks or segments from a user’s perspective. Any program can be considered as a collection of
logical segments. For example, a C implementation of the quicksort algorithm may have the following logical sections: the functions main(), read(), quicksort(), partition(), swap() etc. and the data section. Each of the
functions and data sections can be considered as a segment and they can be independently loaded onto the
memory (Fig 5.17a). Each segment is assigned a number (numbering is not done by the OS, but by either the
compiler, linker or loader) and any memory reference in the logical space is done relative to the beginning of a
segment. Hence, logical addresses have the form <seg-id, offset> where seg-id represents segmentation id
and offset is the location of the address from the base-address of the segment. Base-addresses of all the segments
are maintained in a segment table by the OS.
Segmentation can be implemented using contiguous allocation where the segment table stores the base-address
and size of each segment (Fig 5.17b). For each logical address in the form <seg-id, offset>, the offset is first checked
whether it is within the size of the corresponding segment. If yes, the base-address of the segment is added to the
offset to get the physical address (Fig 5.18).
obtained from the MSBs of the <offset> field. The rest of the offset (LSBs) provide offset within a page (Fig
5.19).
While segmentation provides a more user-oriented view of a user program and memory allocation to it, the unequal
sizes of the segments make the address translation a little clumsy and not as simple as the paging scheme.
In most modern systems, segmentation is done by the compilers, and paging is implemented as the memory allocation technique. Both paging and segmentation remove the restriction of contiguous memory allocation: pages or segments can be anywhere in the memory. Till now, it has been assumed that an entire process with all its code and data is loaded and remains stationed in the main memory during execution. However, most modern operating systems do not impose this restriction and allow only a portion of the process address space to remain in the main memory during execution.
Virtual memory is an ‘illusion’ of a larger memory over the real physical memory using a part of secondary memory.
As the name suggests (the word virtual is inspired by Physics, where virtual images are formed by mirrors and some types of lenses), this memory is not real main memory, but an illusion of the same. In a loose
analogy, virtual memory is like the inflated market capitalization of a business organization while real memory is its
enterprise value or real worth. However, an organization can also be under-valued in market capitalization, but
virtual memory always projects an enlarged main memory hiding the loans from the secondary memory.
Technically, virtual memory (VM) is seen as a memory management scheme to enable execution of processes
without requiring them to be fully memory-resident, i.e., only a small part of their code and data can be in the
memory, while the remaining majority can reside in a back-up store of the secondary memory.
2) every program has a locality of references: memory references within a time window are usually made to
locations that are spatially close to each other (e.g., accessing successive array elements or nodes in a linked
list) and often the same instructions are executed several times repeatedly (within a loop)
3) a program may have a lot of non-essential code to tackle error conditions and/or special cases that are rarely
encountered.
The above facts facilitate realization of virtual memory in which the frequently used code and data within a locality
reside in the memory and infrequently accessed portions are brought to memory only when they are required.
VM also provides the application programmers the freedom from the constraints of physical memory space and its
allocation policies. An application program can go beyond the boundary of limited and costly real memory space,
and the programmer need not bother about placement of the program and data (or process address space) in the
memory during execution.
Even though the scheme is convenient to the programmers, VM is complicated to implement. It needs support from
the hardware units and operating system software. The hardware units provide support for address translation.
The software, called the virtual memory manager (VMM), takes care of issues like when to load a portion of code and/or data (a page or a segment), and when and how to write it back to the backing store. The VMM implements a set of placement and replacement algorithms. We shall discuss in detail the nuances of virtual memory in the following
subsections.
Virtual memory allows partial loading of a process to begin its execution. The OS thus starts by loading only the
initial piece of the process (a few pages or a segment) to the memory that includes the initial set of instructions and
the data that it refers to. This portion of the process that is in the memory is called the resident set of the process.
Execution goes smoothly as long as memory references fall within the resident set. However, when a reference goes beyond it, as flagged by the page table or the segment table, a software interrupt is generated, indicating a memory access fault. The OS then suspends the ongoing process and puts it in the waiting state. The OS also issues a disk I/O request to bring in the page or the segment corresponding to the logical address that caused the memory access fault. Once the demanded piece (page or segment) is brought into the memory, an I/O completion interrupt through the processor notifies the OS. The OS then places the blocked process in the ready queue to resume its execution.
When there is not enough space in the main memory, some piece of the process address space is replaced to the
backing store of the secondary memory. A coarse-level broad overview is shown in Fig 5.20.
In VM, main memory is not increased, but the programmer is provided a perception (or illusion) of a larger main
memory with the support from several hardware units. Following units and/or phenomena cover the contribution
of different hardware components in implementation of VM.
Detecting a page-fault needs at least one memory access. If the page table is long and implemented in a hierarchical fashion (Sec. 5.4.2.3, Fig. 5.12), detecting the page-fault itself (Step 1) will take more than one memory access. Once the page is re-loaded and the instruction is restarted, another memory access is required. Hence, handling a page-fault needs at least 2 memory accesses (in the order of nanoseconds). However, the major time goes into the I/O, as disk access is much slower (in the order of milliseconds). If we assume that a memory access takes 100 nanoseconds and disk I/O takes 10 milliseconds, then, with page-fault rate p (p is the fraction of page-faults among all page references, 0 ≤ p ≤ 1),
effective memory access time = 100 × (1 − p) + p × 10 × 10^6 nanoseconds = (100 + 9,999,900 × p) nanoseconds.
Even after ignoring several other factors, a page-fault increases memory access time enormously. To keep the effective memory access time within a tolerable limit (say, within 10% above the usual memory access time), we need (100 + 9,999,900 × p) ≤ 110, or p ≤ 0.000001, i.e., page-faults need to be very rare, in the order of one in a million page-references.
only for very few moments. The diagram clearly shows that the memory references have temporal locality (a page
is referred to in quick succession for some time, as shown in region t) and spatial locality (in a short window of time,
locations that are close to each other are referenced frequently). All the regions a-h show both temporal as well as
spatial locality of references.
Even though the above corresponds to execution of a particular program, all program executions follow this locality
model. When a function is invoked, the program control jumps from the locality of the calling function to that of
the called function. Local variables are accessed, and computations happen in that locality. Once the function
returns to the caller, execution again happens in the spatial locality of the caller. Memory references thus move
from one locality to another, with or without overlap among them (see localities at time t1 and t2). If we can load
into memory only the pages from the active regions instead of all the pages, we can save main memory space. This
can accommodate more processes in the memory. It leads to an increase in the degree of multiprogramming. Also,
temporal locality helps in reducing the number of page-faults.
Page id: . . 2 6 1 5 7 7 7 5 1 6 2 3 4 1 2 3 4 4 3 4 3 4 4 4 1 3 2 3 4 4 3 4 4 4 . . .
Here, a window of size Δ ending at time T1 covers the references 2 6 1 5 7 7 7 5 1 6 (working set {1, 2, 5, 6, 7}), while the next window of size Δ, ending at time T2, covers 2 3 4 1 2 3 4 4 3 4 (working set {1, 2, 3, 4}).
WSS and page-fault rate (PFR) have a close relationship (Fig 5.23). When a process starts execution in a new locality,
page faults increase as the pages referenced are not available. However, the same set of pages are soon referred
again due to locality. As the pages are available in the memory, page fault rate drops. Hence, it is important to
estimate the working set window (Δ) and allocate an adequate number of frames to accommodate all working sets for a process. An ideal working-set size is the number of pages referenced between two troughs in the above time-diagram. If it is made smaller, more page-faults will result on average.
Working set changes as the program executes. A new page is added to and an old page is removed from the working
set as the working set window moves. Keeping track of dynamically changing working sets for all the processes
needs considerable hardware support. Each page reference needs to be remembered to decide whether to keep
the page in the memory or not. Necessary book-keeping and protection mechanisms are to be in-place for each
page in the page table.
is not available in the memory. In the simple paging scheme, it means that the page is not a legitimate page of the
process. But in virtual memory, a non-resident page may be a legal page of the process, but not loaded in the
memory. In that case, the trap generated is analyzed by the OS and if it is a legitimate page, it is loaded from the
secondary memory. Also, another bit per page is used to notify whether the page is read-only (ro) or read-write
(rw). If the page is writable (rw), the process can modify it. If the page is modified by the process, the modified copy
needs to be backed up in swap space, especially when there is no free frame, and the page needs to be swapped
out. But, if the page is ro, or it is rw but not modified, we do not need to swap out the page as the swap space has
a copy of it. Instead, just invalidating the page will serve the purpose. This can save time-consuming I/O. We thus
need to check whether a rw page in the memory is modified. Another bit is, therefore, assigned for each entry in
the page table to mark whether the page is modified or not. This bit is called a dirty / modify bit. If the bit is set,
the page is understood to have been modified and needs to be backed up when swapped out (see Sec 5.5.3.2
below) or the process terminates.
Hence, virtual memory implementation needs at least 3 bits for each entry
in the page table as shown in (Fig 5.24). The dirty bit (drt) is particularly useful
when we decide to swap out a victim page from the main memory.
There can be a few more control information like reference number
(discussed in Sec 5.5.3.2), position in the swap space etc. in each entry of a
page-table.
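The three control bits can be sketched as flags in a hypothetical page-table entry. The bit positions and helper names below are illustrative only — real hardware uses different layouts:

```python
# Illustrative PTE flag layout (assumed for demonstration).
VALID = 1 << 0   # page resident in memory (valid/invalid bit)
RW    = 1 << 1   # 1 = read-write (rw), 0 = read-only (ro)
DIRTY = 1 << 2   # dirty/modify bit: set on first write to the page

def on_write(pte):
    """Simulate a store to the page: trap if non-resident or read-only."""
    if not (pte & VALID):
        raise RuntimeError("page fault: page not resident")
    if not (pte & RW):
        raise RuntimeError("protection fault: read-only page")
    return pte | DIRTY          # hardware sets the dirty bit on a write

def needs_writeback(pte):
    # Only a modified (dirty) page must be copied back to swap on eviction.
    return bool(pte & DIRTY)
```

A clean (dirty bit clear) page can simply be invalidated on eviction, since the swap space already holds an identical copy — this is exactly the I/O saving described above.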
Virtual memory is implemented employing active support of different OS software. Three key decisions related to
VM implementation are:
1. whether to implement VM or not
2. whether to use paging or segmentation or both
3. which memory management technique to use.
Early OSs (MS-DOS, early UNIX) did not support VM as the underlying hardware did not provide address translation
mechanisms and support other necessary functions.
Pure segmentation where each segment is provided contiguous memory space is becoming rare. Most OSs use
paging as the basic memory management technique. Hence, even if segmentation is used, segmentation with
paging is mostly used.
Although the first two decisions are hardware-driven, performance of the VM implementation depends on a few
software issues as follows.
1. Fetch policy: when to bring pages (or segments) to the memory
2. Placement & Replacement policy: where and how to place the fetched page(s)
3. Resident Set Management: how many frames to be allocated per process
4. Load Control: how many processes to be accommodated in the memory (degree of multiprogramming).
The issues are discussed as follows.
loading redundant pages that are not referenced during a particular run of a program, especially the routines that
are rarely invoked.
But, how much of the address space needs to be loaded before a process can start? In simple paging (without
VM), all the pages need to be loaded. If this is one extreme (a stringent restriction for simple paging!) of the
spectrum in the paging scheme, the other extreme can be ‘do not load a page until and unless it is required’. That
is, each page of a process is fetched only when it is demanded. This is called pure demand paging. At first glance,
pure demand paging seems beneficial as no unnecessary page is loaded. It saves main-memory space that can then
accommodate more processes. It can thus increase the degree of multiprogramming, CPU utilization and
throughput. But pure demand paging also causes a page-fault on the very first reference to every page, leading to
a serious performance issue. Hence, pure demand paging is not a good idea either. Typically, demand paging is
implemented to increase the degree of multiprogramming while keeping the page-fault-rate (PFR) as low as
possible.
In case the dirty-bit is set, demand paging must also incur the cost of writing the page back to the swap space. This
increases the effective memory access time even further.
An OS also has to manage this extra work (i-iv). To be specific, it must select a victim frame, free the frame and
replace its page. An OS uses different algorithms to select the victim frame. They are called page replacement
algorithms. Choice of the algorithm may be based on many factors - but its performance is decided by the number
of page faults. For a fixed number of frames, the less the number of page-faults, the better is the algorithm in
performance. We shall discuss the following page replacement algorithms here.
1. Optimal (OPT)
2. First in First Out (FIFO)
3. Second Chance (SC)
4. Not recently used (NRU) and
5. Least Recently used (LRU)
For the sake of comparison, we shall consider a single sequence of memory references in a small time window
(assume they come from a program execution). For example, this string of references (called a reference string),
in terms of byte locations, is:
1240, 2243, 3450, 4456, 2345, 1645, 5658, 6745, 2234, 1343, 2654, 3674, 7856, 6542, 3654, 2346, 1234, 2543,
3432, 6676
Considering 1 KB (1024 bytes) page-size, corresponding page references are:
1, 2, 3, 4, 2, 1, 5, 6, 2, 1, 2, 3, 7, 6, 3, 2, 1, 2, 3, 6.
Operating Systems | 168
We shall primarily assume that four (4) frames are allocated to the process.
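The byte-to-page conversion above is just integer division by the page size, as this quick sketch shows:

```python
PAGE_SIZE = 1024   # 1 KB pages

byte_refs = [1240, 2243, 3450, 4456, 2345, 1645, 5658, 6745, 2234, 1343,
             2654, 3674, 7856, 6542, 3654, 2346, 1234, 2543, 3432, 6676]

# Page number = byte address // page size; the offset within the page
# (addr % PAGE_SIZE) does not matter for page-fault analysis.
page_refs = [addr // PAGE_SIZE for addr in byte_refs]
print(page_refs)
# [1, 2, 3, 4, 2, 1, 5, 6, 2, 1, 2, 3, 7, 6, 3, 2, 1, 2, 3, 6]
```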
This is the best possible algorithm that has the minimum page fault rate. The principle used here is: replace a page
that is not going to be used for the longest time.
The algorithm ensures that a page once brought into the memory is kept if it is to be used in the future. When a
new page is required to replace an old one, the victim must be the one not to be used in the future. If such a page
is not found, then the victim must be the page to be used in the most distant future. Let us consider the example
to understand the algorithm better (Fig 5.25).
The first four page-references cause mandatory page faults. At position 5, page-id 2 is already there in the memory
(frame 2). Similar is the case at position 6 for page-id 1. At position 7, page-id 5 must replace a page. Here, page-id
4 is chosen, as it is not used at all in future. Similarly, at position 8, frame 4 (page 5) is again chosen as the victim.
Positions 9-12 do not cause any page-faults. But at position 13, page 7 must replace page 1 even though page 1 is
used again in future: there is no choice, as page 1's next use is the most distant in the future among the resident
pages. At position 17, page 1 again replaces page 7.
Thus, we have a total of 8 page-faults with 4 page-replacements for the given reference string of 20 pages.
You can check that, with a higher number of frames allocated to the process, the number of page-faults can be
reduced, but never increased (try with 3 and 5 frames to convince yourself).
This is called the optimal algorithm as we cannot reduce the number of page faults any further using any other
algorithms for the given reference string and given number of frames.
But this is impossible to implement as it is based on future page references. During program execution, at a given
instant, we do not know, for sure, which page will be used in future. Nevertheless, it is used as the benchmark for
evaluating the performance of other algorithms.
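The OPT walk-through above can be reproduced with a short simulation (an illustrative sketch; `opt_faults` is a name chosen here, not a standard API):

```python
def opt_faults(refs, nframes):
    """Count page-faults under the optimal (OPT) replacement policy."""
    frames, faults = [], 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) < nframes:
            frames.append(page)
        else:
            # Evict the resident page whose next use lies farthest in the
            # future (or that is never used again).
            def next_use(p):
                try:
                    return refs.index(p, i + 1)
                except ValueError:
                    return float("inf")
            victim = max(frames, key=next_use)
            frames[frames.index(victim)] = page
    return faults

refs = [1, 2, 3, 4, 2, 1, 5, 6, 2, 1, 2, 3, 7, 6, 3, 2, 1, 2, 3, 6]
print(opt_faults(refs, 4))   # 8, matching the walk-through above
```

Note that the simulator needs the whole reference string in advance — precisely why OPT is a benchmark, not an implementable policy.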
This is the simplest possible algorithm for page replacement. When replacement is required, the page that came in
first (to memory), goes out first. In other words, the algorithm replaces the page that has been staying in the
memory for the longest time. To understand this, let us illustrate with the same reference string (Fig 5.26).
The algorithm results in 14 page-faults with 10 page replacements for the given string with 4 frames. One can try
with different numbers of frames and check that: with 1 frame, there will be 20 page-faults; 2 frames → 18;
3 → 16; 5 → 12; 6 → 10; 7 → 7; 8 → 7. For the given string, 7 is the minimum number of page-faults that is
bound to happen, as there are 7 distinct page-ids.
In general, with increase in the number of frames, the number of page-faults decreases (Fig 5.27a). But this is not
always true. In some of the page replacement algorithms including FIFO, increasing the number of frames
sometimes causes an increase in the number of page faults. For example, for the page-reference string: 0, 1, 2, 3,
0, 1, 6, 0, 1, 2, 3, 6, we see the number of page-faults increase from 3 frames (9 faults) to 4 frames (10 faults) in
FIFO (Fig 5.27b). This anomalous phenomenon is called Belady’s anomaly (named after László "Les" Bélády).
The FIFO algorithm presumes that the page brought into memory first is the best candidate to go out first: because
of locality, it is thought less likely to be used again. But reality might be different, as the example illustrates. The
most striking drawback of FIFO is that it does not consider the usage history of a page: no matter whether a page
was used in the recent past (one or more times), the oldest page is chosen as the victim.
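A FIFO simulation in the same style (again an illustrative sketch) reproduces both the 14 faults above and Belady's anomaly:

```python
from collections import deque

def fifo_faults(refs, nframes):
    """Count page-faults under FIFO replacement."""
    frames, queue, faults = set(), deque(), 0
    for page in refs:
        if page in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            frames.discard(queue.popleft())   # oldest page goes out first
        frames.add(page)
        queue.append(page)
    return faults

refs = [1, 2, 3, 4, 2, 1, 5, 6, 2, 1, 2, 3, 7, 6, 3, 2, 1, 2, 3, 6]
print(fifo_faults(refs, 4))                   # 14

# Belady's anomaly: more frames, yet more faults
belady = [0, 1, 2, 3, 0, 1, 6, 0, 1, 2, 3, 6]
print(fifo_faults(belady, 3), fifo_faults(belady, 4))   # 9 10
```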
In the example, starting from frame-id 0, the next pointer moves in the clockwise direction to find a victim frame
with use-bit = 0. Frame-id 3 fulfills this criterion, so its page 167 is replaced (Fig 5.28a) by the new page 29
(Fig 5.28b). All the pages skipped with u=1 are reset to 0 [frame-ids 0, 1, 2], the new page is set with u=1, and the
pointer moves on to the next frame, 4.
The SC algorithm checks whether the page is used or not but cannot check the order of use. Also, it does not
distinguish between a page that has been only read and another that is modified. Remember that replacing a
modified page is costly as it needs to be backed up. But the page that is only read or not modified does not need to
be backed up. It thus can save I/O.
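A minimal sketch of the clock-style Second Chance algorithm, following the use-bit handling described above (filling free frames before the pointer starts sweeping is an implementation choice made here):

```python
def second_chance_faults(refs, nframes):
    """Count page-faults under the Second Chance (clock) policy."""
    frames = []                  # circular buffer of [page, use_bit]
    ptr, faults = 0, 0
    for page in refs:
        hit = next((f for f in frames if f[0] == page), None)
        if hit:
            hit[1] = 1                      # give the page a second chance
            continue
        faults += 1
        if len(frames) < nframes:
            frames.append([page, 1])        # free frame: just load the page
            continue
        while frames[ptr][1] == 1:          # sweep, clearing use-bits
            frames[ptr][1] = 0
            ptr = (ptr + 1) % nframes
        frames[ptr] = [page, 1]             # replace the first u=0 frame
        ptr = (ptr + 1) % nframes
    return faults

refs = [1, 2, 3, 4, 2, 1, 5, 6, 2, 1, 2, 3, 7, 6, 3, 2, 1, 2, 3, 6]
print(second_chance_faults(refs, 4))
```

On this particular string SC happens to match FIFO's fault count; its advantage over FIFO shows up on workloads where recently used pages would otherwise be evicted purely by age.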
It is a more refined version of the Second Chance algorithm that takes into consideration the above aspect. Along
with the use-bit (u), modify-bit (m) is also checked to select the victim frame.
The frames are considered to belong to the following four categories:
i. not recently used and not modified (u=0, m=0)
ii. not recently used but modified (u=0, m=1)
iii. recently used but not modified (u=1, m=0)
iv. recently used and modified (u=1, m=1).
Step 1: The first frame belonging to the first category (u=0, m=0) is selected as the victim frame as it has the lowest
I/O cost involved. Since the page is not modified, it does not need to be backed up. When a frame is bypassed by
the moving next pointer, use-bits are not changed (unlike simple SC algorithm).
Step 2. If such a frame is not found in the first sweep of the circular buffer, the first frame from the second category
(u=0, m=1) is chosen as the victim frame. Even though the frame is modified, since it is not used in the recent past,
it is assumed unlikely to be used in future.
However, in this second sweep, when the next pointer moves clockwise and skips a frame, it changes the frame's
use-bit from u=1 to u=0.
Step 3. If Step 2 fails, the moving next pointer comes back to the starting position, but all use-bits are 0 now. We
re-run Step 1 and, if needed, Step 2 to find the victim frame.
NRU is also known as the enhanced SC algorithm. Although it may need several iterations, and thus a little more
time for victim selection, it can minimize the cost of page-faults (by minimizing the I/O cost of back-up).
The SC and NRU are also called clock algorithms, as their pointer sweeps the frames the way a clock hand does.
This is another important page replacement algorithm that performance-wise goes quite close to the OPT
algorithm. Although it is not possible to exactly predict the future use of a page, its past usage can help us in arriving
at a better guess for most of the pages.
We assume that if a page has not been used for long it is less likely to be used again soon. On the contrary, the
pages that are recently used are likely to be used soon due to locality of references. Hence the victim should be the
frame that holds the page used in the most distant past or is the least recently used page. Fig 5.29 shows the running
example again using LRU.
The algorithm causes 10 page-faults with 6 page replacements. LRU performs much better than FIFO. It rectifies the
problem in FIFO where, regardless of a page's recent usage, the oldest page is replaced. If the reference string is
read in the opposite direction (right to left), LRU behaves as the reverse of the optimal algorithm.
However, implementing LRU is not trivial without hardware support. The past usage of a page needs to be kept
track of before choosing a victim frame. This can be done if every page-use is time-stamped. Before the page
replacement, the timestamp of last use is checked for each frame and the one with the oldest timestamp is chosen
as the victim. Instead of timestamp, counters can also be used. For every page reference, a counter associated with
a frame is incremented by 1. The frame with the lowest counter-value is selected as the victim.
Another alternative is the use of a stack. Whenever a page is referenced, it is put on the top. If the page is already
available in the memory, it is taken out of the stack (from wherever it is, not necessarily the top) and pushed on
top again. When a page is to be replaced, the victim is available at the bottom of the stack (Fig 5.30). For this
reason, LRU is also called a stack-based algorithm.
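The stack idea translates directly into code. A sketch (our own, with the stack bottom at index 0 and the top at the end of the list):

```python
def lru_faults(refs, nframes):
    """Count page-faults under LRU, using the stack formulation."""
    stack, faults = [], 0        # index 0 = bottom (least recently used)
    for page in refs:
        if page in stack:
            stack.remove(page)   # pull the page out, wherever it sits...
        else:
            faults += 1
            if len(stack) == nframes:
                stack.pop(0)     # ...victim is at the bottom of the stack
        stack.append(page)       # referenced page always goes on top
    return faults

refs = [1, 2, 3, 4, 2, 1, 5, 6, 2, 1, 2, 3, 7, 6, 3, 2, 1, 2, 3, 6]
print(lru_faults(refs, 4))
```

For the running string with 4 frames this simulation yields 10 faults — between OPT's 8 and FIFO's 14, illustrating LRU's position in Fig 5.31.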
However, the above algorithms are not an exhaustive list. There can be other algorithms like:
LIFO (Last-In-First-Out) or MRU (Most Recently Used): the last page brought in is replaced. Even though it seems
counterintuitive, for some cyclic reference strings it may be the closest approximation to OPT.
MFU (Most Frequently Used): the most frequently occurring page is replaced.
…etc.
Among the page replacement algorithms, OPT is the best performer in comparison, but it is not practically
implementable. LRU comes close to it and is implemented with active support from the hardware. Clock algorithms
are the next best and are thus considered LRU-approximation algorithms. FIFO is the simplest to implement, but in
general the worst performer (Fig 5.31).
Taking these two factors into account, two frame allocation policies are adopted.
Fixed Allocation: Each process gets a fixed number of frames, decided at process creation or load time. The
number may depend on the type of process (batch, interactive, or the application type) and its size. If a
page-fault occurs and there is no free frame in the allocated set, page-replacement must be done. The
allocated number of frames does not change during the execution of the process.
Variable Allocation: The number of frames allocated to a process changes during its execution depending
on its paging behavior. If the occurrence of page-faults increases, more frames are allocated. On the
contrary, if the page-fault rate decreases, some of the allocated frames are taken away so that they can
be used for other processes.
Variable allocation is more powerful, but at the cost of increased software overhead. The OS has to monitor the
page-fault rates (PFRs) of all the processes, and the allocation needs support from the hardware, including the processor.
The use of allocation policy also depends on the page-replacement policy: local or global.
Local Replacement Policy: When a page-fault occurs and there is no free frame available in the allocated
set, a victim frame is chosen from there only. The referenced page must be loaded in the victim frame
replacing (overwriting or swapping out) the old page.
Global Replacement Policy: All the unlocked frames or resident pages are candidates for replacement,
regardless of the processes that own the pages. The benefits of variable allocation can be best leveraged
in global replacement policy only. A process facing high occurrences of page-faults can take free frames
from any of the processes.
However, when there are no free frames, a process encountering a high PFR will snatch frames from another
process. That process may, in turn, suffer from an increased PFR and snatch frames from yet other processes.
Gradually this may have a spiralling effect leading to a very high overall PFR.
Hence, blind use of variable allocation with global replacement is not good. The use can be moderated by adopting
the local replacement policy first, monitoring the page-fault-rates of different processes and if needed, adjusting
the allocation by taking extra frame(s) from a process with very low page-fault-rate and allocating to another with
very high PFR.
Such a dynamic mechanism is difficult to implement. However, the working set strategy (Sec 5.5.2.4) is a useful and
popular attempt.
If thrashing is detected, degree of multiprogramming should be reduced by suspending one or more processes.
Which process(es) need to be suspended depends on many factors, such as
process-priority: low-priority processes are easy targets
page-fault rates: processes with high PFRs may be chosen
resident set size: processes with a small resident set can be re-loaded easily later
process-size: the largest process will free a lot of frames
activation-time: the process activated last has the lowest cost of re-starting
remaining time: process(es) with a large remaining time will hold and use a lot of resources
…etc.
The choice is decided by the OS designer based on one or more of the above factors.
Virtual memory implementation with demand paging is summarized in the following flow-diagram.
2. 1-level simple paging with TLB (hit-ratio = h) & in-memory page table:
EAT = h × (t + m) + (1 − h) × (t + 2 × m)
(t = TLB-search time, often negligible; m = memory-access time. For TLB misses, one access m for the PT, the
other to access the physical address)
3. 2-level simple paging with TLB (hit-ratio = h) & in-memory page tables:
EAT = h × (t + m) + (1 − h) × (t + 3 × m)
(for TLB misses, 2 × m to access the PTs, the other m for the physical address) ...
4. Demand paging (PFR = p) with TLB (hit-ratio = h), single-level in-memory page table:
EAT = h × (t + m) + (1 − h) × { t + m + (1 − p) × m + p × (s + t + 2 × m) }
(for a TLB miss, (t + m) is compulsory; one more access m if the page is in memory; for a page-fault, page-fault
handling (service time s, reloading the page) is followed by restarting the page-search (t + 2 × m))
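With illustrative numbers (all parameter names and values below are assumed for demonstration: h = TLB hit ratio, t = TLB-search time, m = memory-access time, p = page-fault rate, s = page-fault service time), the demand-paging expression can be evaluated as:

```python
def eat_demand_paging(h, t, m, p, s):
    """Effective access time for demand paging with a TLB.
    h: TLB hit ratio, t: TLB-search time, m: memory access time,
    p: page-fault rate, s: page-fault service time (same time unit as m)."""
    miss_cost = t + m + (1 - p) * m + p * (s + t + 2 * m)
    return h * (t + m) + (1 - h) * miss_cost

# Hypothetical figures: t = 1 ns, m = 100 ns, s = 8 ms (in ns)
print(eat_demand_paging(h=0.98, t=1, m=100, p=0.001, s=8_000_000))
```

Even a tiny p inflates the result enormously because s is millions of times larger than m — the numerical reason the PFR must be kept very low.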
UNIT SUMMARY
This chapter starts with enumerating different memory elements: registers, cache, main memory,
secondary and tertiary memory.
Main memory is the largest and farthest unit of memory that the processor can directly access.
All programs are loaded into the main memory for execution.
A program can have different components spread across different parts of main memory.
The components are referenced in a logical space and then they are finally converted to physical
addresses through address binding.
Main memory space is allocated to processes using three basic techniques: contiguous allocation,
paging and segmentation.
In contiguous allocation, entire process address space gets memory in a single space.
In paging, a process is divided into several equal-sized pages and pages are loaded in the memory.
In segmentation, a process is divided into several logical components that are of different sizes;
segments are loaded.
Pages are managed through page-tables that OS maintains per process.
In virtual memory, not all the pages are loaded at the same time; a few are memory-resident while
the majority remain in the backing store of secondary memory.
When a referenced page is not available in memory, a page-fault occurs.
Page-fault handling is a time-consuming activity: it involves swapping in the page from the backing
store and swapping out a modified page when no free frame is available.
Pure demand paging causes a page-fault on the very first reference to every page and is thus not recommended.
No page-faults would mean the simple paging scheme, with a low degree of multiprogramming, low CPU
utilization and low throughput.
Very high degree of multiprogramming may cause very high page-faults and thrashing.
Thrashing is an undesired phenomenon when a process remains busy in handling page-faults
without doing any computation.
The page-fault-rate should thus be kept as low as possible, within an upper and a lower threshold.
If page-fault rate goes beyond the upper threshold, one or more processes need to be suspended
and page-frames released should be allocated to processes suffering from high PFR.
EXERCISES
Multiple Choice / Objective Questions
Q1. Which of the following actions is/are typically not performed by the operating system when switching context
from process A to process B?
A. Saving current register values and restoring saved register values for process B .
B. Changing address translation tables.
C. Swapping out the memory image of process A to the disk.
D. Invalidating the translation look-aside buffer. [GATE (1999)]
Q2. A 1000 Kbyte memory is managed using variable partitions but no compaction. It currently has two partitions
of sizes 200 Kbytes and 260 Kbytes respectively. The smallest allocation request in Kbytes that could be denied
is for
A. 151
B. 181
C. 231
D. 541 [GATE(1996)]
Q3. Consider six memory partitions of size 200 KB, 400 KB, 600 KB, 500 KB, 300 KB, and 250 KB, where KB
refers to kilobyte. These partitions need to be allotted to four processes of sizes 357 KB, 210 KB, 468 KB and
491 KB in that order. If the best fit algorithm is used, which partitions are NOT allotted to any process?
Q4. In which one of the following page replacement policies, Belady’s anomaly may occur?
A. FIFO B. Optimal C. LRU D. MRU [GATE (2009)]
Q5. The page size is 4 KB (1 KB = 2^10 bytes) and the page table entry size at every level is 8 bytes. A process P is
currently using 2 GB (1 GB = 2^30 bytes) of virtual memory, which the OS has mapped to 2 GB of physical memory.
The minimum amount of memory required for the page table of P across all levels is _________ KB
A. 4108
B. 1027
C. 3081
D. 4698 [GATE(2021)]
Q7. In a system with 32 bit virtual addresses and 1 KB page size, use of one-level page tables for virtual to
physical address translation is not practical because of
A. the large amount of internal fragmentation
B. the large amount of external fragmentation
C. the large memory overhead in maintaining page tables
D. the large computation overhead in the translation process [GATE (2003)]
Q8. Consider a virtual memory system with FIFO page replacement policy. For an arbitrary page access pattern,
increasing the number of page frames in main memory will
A. always decrease the number of page faults
B. always increase the number of page faults
C. sometimes increase the number of page faults
D. never affect the number of page faults [GATE(2001)]
Q9. Assume that in a certain computer, the virtual addresses are 64 bits long and the physical addresses are
48 bits long. The memory is word-addressable. The page size is 8 KB and the word size is 4 bytes. The
Translation Look-aside Buffer (TLB) in the address translation path has 128 valid entries. At most, how many
distinct virtual addresses can be translated without any TLB miss?
A. 16 x 2^10
B. 8 x 2^20
C. 4 x 2^20
D. 256 x 2^10 [GATE(2019)]
Q10. Consider a process executing on an operating system that uses demand paging. The average time for a
memory access in the system is M units if the corresponding memory page is available in memory, and D units
if the memory access causes a page fault. It has been experimentally measured that the average time taken for
a memory access in the process is X units. Which one of the following is the correct expression for the page
fault rate experienced by the process?
A. (D – M) / (X – M)
B. (X – M) / (D – M)
C. (D – X) / (D – M)
D. (X – M) / (D – X) [GATE(2018)]
Q11. A processor uses 2-level page tables for virtual to physical address translation. Page tables for both levels
are stored in the main memory. Virtual and physical addresses are both 32 bits wide. The memory is byte
addressable. For virtual to physical address translation, the 10 most significant bits of the virtual address are
used as index into the first level page table while the next 10 bits are used as index into the second level page
table. The 12 least significant bits of the virtual address are used as offset within the page. Assume that the
page table entries in both levels of page tables are 4 bytes wide. Further, the processor has a translation look-
aside buffer (TLB), with a hit rate of 96%. The TLB caches recently used virtual page numbers and the
corresponding physical page numbers. The processor also has a physically addressed cache with a hit rate of
90%. Main memory access time is 10 ns, cache access time is 1 ns, and TLB access time is also 1 ns.
Assuming that no page faults occur, the average time taken to access a virtual address is approximately (to the
nearest 0.5 ns)
A. 1.5 ns
B. 2 ns
C. 3 ns
D. 4 ns [GATE(2003)]
Q12. A multilevel page table is preferred in comparison to a single level page table for translating virtual address
to physical address because
A. It reduces the memory access time to read or write a memory location.
B. It helps to reduce the size of the page table needed to implement the virtual address space of a process.
C. It is required by the translation lookaside buffer.
D. It helps to reduce the number of page faults in page replacement algorithms. [GATE(2009)]
a. fragmentation vs segmentation
b. segmentation vs paging
c. best-fit vs worst-fit
d. buddy system vs equal partitioning
e. paging vs demand paging
f. LRU vs NRU
a) compaction
b) working set
c) thrashing
d) page-fault-rate (PFR)
e) degree of multiprogramming
f) resident set management
g) relationship between PFR and working set
Numerical Problems
Q1. Consider a main memory system that consists of 8 memory modules attached to the system bus, which is
one word wide. When a write request is made, the bus is occupied for 100 nanoseconds (ns) by the data,
address, and control signals. During the same 100 ns, and for 500 ns thereafter, the addressed memory module
executes one cycle accepting and storing the data. The (internal) operation of different memory modules may
overlap in time, but only one request can be on the bus at any time. The maximum number of stores (of one
word each) that can be initiated in 1 millisecond is_________? (ANS : 10000) [GATE(2014)]
Q2. A process has been allocated 3 page frames. Assume that none of the pages of the process are available
in the memory initially. The process makes the following sequence of page references (reference string): 1, 2,
1, 3, 7, 4, 5, 6, 3, 1. If optimal page replacement policy is used, how many page faults occur for the above
reference string ______? (ANS :7) [GATE (2007)]
Q3. A demand paging system takes 100 time units to service a page fault and 300 time units to replace a dirty
page. Memory access time is 1 time unit. The probability of a page fault is p. In case of a page fault, the
probability of page being dirty is also p. It is observed that the average access time is 3 time units. Then the
value of p is_______? (ANS : 0.019[approx]) [GATE (2007)]
Q4. A system uses FIFO policy for page replacement. It has 4 page frames with no pages loaded to begin
with. The system first accesses 100 distinct pages in some order and then accesses the same 100 pages but
now in the reverse order. How many page faults will occur? (ANS: 196) [GATE (2010)]
Q5. A system uses 3 page frames for storing process pages in main memory. It uses the Least Recently Used
(LRU) page replacement policy. Assume that all the page frames are initially empty. What is the total number
of page faults that will occur while processing the page reference string given below? 4, 7, 6, 1, 7, 6, 1, 2, 7,
2 (ANS: 6) [GATE(2014)]
Q6. Consider a computer system with ten physical page frames. The system is provided with an access
sequence (a1, a2, ...a20, a1, a2, ...a20), where each ai is a distinct virtual page number. The difference in the
number of page faults between the last-in-first-out page replacement policy and the optimal page replacement
policy is_________ number. (ANS: 1) [GATE (2016)]
PRACTICAL
Q1. Write a program to implement the contiguous memory allocation and visually display the output when
dynamically a set of processes comes and memory is allocated.
Q2. Write a program that will take a page-reference string as input and determine the number of page-faults for
(i) OPT (ii) FIFO (ii) LRU and (iv) SC algorithms.
Q3. In a UNIX or Linux system, explore the following commands (learn using man <command>) to see page
table, page-faults and other page-related activities for a process or several processes:
(i) ps command
KNOW MORE
Memory Management and Virtual Memory are discussed in general with good detail as two separate
chapters in [SGG18], [Sta12], [Hal15] and [Dha09].
[SGG18] covers address binding and explains the implementation of paging hardware, especially in different
architectures and commercial systems. It also discusses newer technologies like memory compression.
[Sta12] also covers security issues, including attacks on and protection of memory. It also provides a very
organized and holistic view of virtual memory, with emphasis on the implementation of the TLB.
[Hal15] clarifies different types of address spaces and illustrates their interaction. It covers segmentation
well, both in memory management and in virtual memory.
[Dha09] treats memory management as two separate concerns, such as heap-space management and kernel-stack
management. It provides a good account of kernel-space allocation and a mathematical framework for finding
memory access time.
[Bac05] and [Vah12] discuss memory management in the UNIX system. While [Bac05] discusses
swapping and demand paging there, [Vah12] is more comprehensive. [Vah12] discusses UNIX virtual
memory implementation in several architectures and systems like SVR4, SVR 4.2, Mach, Solaris 2.4, 4.3
& 4.4 BSD.
[YIR17] contains implementational details of memory management and virtual memory in Windows
operating systems across different architectures.
[Bac05] Maurice J Bach: The Design of the UNIX Operating System, Prentice Hall of India, 2005.
[Dha09] Dhananjay M. Dhamdhere: Operating Systems, A Concept-Based Approach, McGraw Hill, 2009.
[Hal15] Sibsankar Haldar: Operating Systems, Self Edition 1.1, 2015.
[SGG18] Abraham Silberschatz, Peter B Galvin, Greg Gagne: Operating Systems Concepts,10th Edition,
Wiley, 2018.
[Sta12] William Stallings: Operating Systems Internals and Design Principles, 7th Edition, Prentice Hall,
2012.
[Vah12] Uresh Vahalia: UNIX Internals, The New Frontiers, Pearson, 2012.
[YIR17] Pavel Yosifovich, Alex Ionescu, Mark E. Russinovich, and David A. Solomon: Windows Internals,
Seventh Edition (Part 1 and 2), Microsoft, 2017. https://docs.microsoft.com/en-
us/sysinternals/resources/windows-internals (as on 8-Jul-2022).
UNIT SPECIFICS
Through this unit we have discussed the following aspects:
I/O Hardware: I/O devices, Device controllers, Direct memory access, Principles of I/O Software:
Goals of Interrupt handlers, Device drivers, Device independent I/O software, Secondary-Storage
Structure: Disk structure, Disk scheduling algorithms
Disk Management: Disk structure, Disk scheduling - FCFS, SSTF, SCAN, C-SCAN, Disk reliability,
Disk formatting, Boot-block, Bad blocks
File Management: Concept of File, Access methods, File types, File operation, Directory structure,
File System structure, Allocation methods (contiguous, linked, indexed), Free-space management (bit
vector, linked list, grouping), directory implementation (linear list, hash table), efficiency and
performance.
This chapter discusses the role of input and output devices in a computer. I/O devices are the gateways to
interact with a computing system. Users and application programs provide inputs through input devices and
receive outputs through output devices. Each such device has some hardware components like device
controllers, DMA, I/O ports, and I/O bus, that are connected to other hardware components like CPU and
memory through the system bus. However, there are also a few software components, like the I/O subsystem and
device drivers, provided by the operating system that coordinate with the different hardware components and the device.
We start with an introduction to different hardware devices and components and then the software needed in
I/O operations. We also discuss the disk, an important I/O device that persistently stores code and data for a
computer. Its physical structure, functionalities and management are discussed in detail. We then delve into files,
the software abstraction of data. With reasonable depth and rigor, we cover the structure and management
of files.
Like previous units, a number of multiple-choice questions, questions of short- and long-answer types following
Bloom's taxonomy, assignments through a number of numerical problems, and a list of references and suggested
readings are provided. For more information on various topics of interest, appropriate URLs and QR codes have been
provided in different sections, which can be accessed or scanned for relevant supportive knowledge. A "Know More"
section is also designed for supplementary information to cater to the inquisitiveness and curiosity of the students.
RATIONALE
A computer interacts with users or applications through I/O devices: it takes inputs through one or more input
devices and provides output through one or more output devices. How this interaction happens, specifically how the
operating system manages this interaction is the content of this chapter. The chapter begins with the definition of I/O
devices and their interaction with other necessary hardware components like I/O controllers, DMA, I/O ports, I/O
bus, processor, memory and system bus. It is followed by discussion on necessary software components like I/O
subsystem and device drivers. We then focus on the most important I/O device, one that persistently stores code and
data - the disk. Necessary details of disk management are discussed. Data is stored on the storage device, and used in
applications, through the abstraction of files. Files are software entities that are used across a multitude of
physical storage media. Files and their management are thus an important concept. How an operating system creates
and manages files is discussed in reasonable detail in the last part.
This unit builds the fundamental concepts to understand I/O devices and their management in a computer. It
introduces necessary terms and terminologies related to different I/O devices and I/O operations.
PRE-REQUISITES
UNIT OUTCOMES
The outcomes of this unit are as follows:
U6-O1: Define different hardware components like device controllers, DMA, I/O ports, I/O buffers, I/O
bus, files, directories and so on
U6-O2: Describe the data transfer mechanism between memory and an I/O device, operation of a DMA,
disk formatting, disk scheduling algorithms, disk space allocation, implementation of file system
and directory structure
U6-O3: Understand the issues in I/O management, variety and diversity in I/O devices, their interfaces,
intricacies in disk management, disk space allocation and access
U6-O4: Realize the need of files, the concept of device-independent abstraction of storage units and their
management from OS perspective
U6-O5: Analyze and compare pros and cons of different disk scheduling algorithms, disk allocation
techniques, directory structure implementations
U6-O6: Design an I/O management system choosing the most appropriate techniques available or
prescribing one for a given use-case scenario to minimize overall I/O time
6.1 INTRODUCTION
A computer interacts with the user through a variety of hardware devices. Some of them are used to accept
inputs (keyboard, mouse, joystick, scanner, screen-reader etc.), some to show outputs (display unit, printer), some
to run other systems (computer-driven robotic devices), and some to communicate with other computing units (network
devices). These devices are in general called input/output devices or I/O devices. These hardware devices do not
form the core of the computational components (processor, bus and memory) and remain in the periphery of a
computer (they are, hence, also called peripheral devices) (see Fig 1.1). I/O devices vary widely in shape, size,
functionality, and input and output format. Handling I/O devices involves many complexities and is thus one of the
most difficult parts of a computer system.
Despite differences at different levels, one thing is common at a very high level. All I/O devices either store or carry
data that are used by the processor.
Recall from Unit 1 that an operating system provides an “easy-to-use” interface for the users to use the barebone
hardware. The OS not only takes care of the computing components of the hardware, but also of these I/O devices
which deal with data for storage or communication. An OS provides “easy-to-use” interfaces to different application
programs and kernel modules for a huge variety of devices. I/O management is thus a very important job of an OS.
First, we shall discuss the I/O hardware units followed by the software involved in the interaction with them in
general. We shall then focus on a particular device type: the disk device (in disk management). Finally, we shall
discuss storage, organization and management of data in the hardware devices in the abstraction of files (in file
management).
A wide variety of hardware is used in computers. Except for the essential few like the processor, memory and
communication bus, most devices belong to the peripherals. They are mostly used by a computer to interact
with the outer world and are thus known as I/O devices (however, not all peripheral devices are I/O devices; e.g., a
timer is a peripheral device but not an I/O device). The interaction needs both hardware and software. In the
following, we first discuss the different hardware units.
read-only / write-only / read-write: A device can only take input (e.g., a keyboard) or produce only
output (e.g., a printer) or does both read and write (e.g., a disk).
slow or fast: A device can be very slow transferring a few bytes a second (e.g., a keyboard), while
another can be very fast, transferring several MBs per second (e.g., NICs).
transient or persistent: Some devices store data for very short duration (e.g., NICs), while some can
store for long periods (e.g., disks).
serial or parallel: A device can transmit one bit at a time (bit-stream device), while another can
transmit several bits simultaneously in parallel.
sequential or random access: A device can support access of data only sequentially (e.g., a tape drive)
or can support random access from any region of the storage (e.g., a magnetic disk).
sharable or exclusive: A device can be concurrently accessed by several processes (e.g., a disk) or can
be used by only a single process at a time (a graphics plotter).
Each of these devices gets connected to the host computer through a hardware component called a device controller
or I/O controller. An I/O device is controlled by the controller and transmits data through an I/O bus (Fig. 6.1).
There are different interface lines connected to a general-purpose processor (Fig. 6.3a) through which a processor
gets various signals and inputs as well as sends and provides output. Two of them are interrupt request (INTR) and
interrupt acknowledgement (ACK) lines. I/O devices draw the attention of the CPU (Fig. 6.3b) through INTR.
Even though there can be several I/O devices, only one interrupt activation signal (IRQ) goes to the CPU at a
time; which one is decided by the interrupt controller (another component of the processor, different from an I/O
controller) (Fig. 6.3b).
The interrupt controller uses a multiplexer to select only one out of several simultaneous IRQ lines, based on the
priority of the device or some other criteria (Fig 6.4). Until the I/O controller receives an acknowledgement (ACK)
from the CPU, the signal remains active. Once the ACK is received, the signal is deactivated, and the I/O device can
go back to its normal operation. In many systems, separate lines are maintained for maskable and non-maskable
interrupts. Non-maskable interrupts are immediately sent to the CPU, while the maskable ones can be turned off by
the CPU before executing critical instructions. In a programmable interrupt controller (PIC), separate mask registers
are provided to control masking of IRQ lines by the CPU.
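The mask-register selection just described can be sketched in a few lines. This is a toy model: the bit layout and the priority convention (lowest-numbered line wins) are assumptions, and real PICs such as the 8259A differ in detail.

```python
def select_irq(pending, mask):
    """Pick which IRQ line to forward to the CPU.

    pending: bitmask of raised IRQ lines (bit i set = line i active)
    mask:    bitmask of lines the CPU has masked off
    Returns the line number to service, or None if nothing is ready.
    Assumes the lowest-numbered line has the highest priority.
    """
    ready = pending & ~mask                    # masked lines are simply ignored
    if ready == 0:
        return None
    return (ready & -ready).bit_length() - 1   # index of the lowest set bit
```

A non-maskable line would bypass `mask` entirely and be forwarded as soon as it is raised.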
CPU - I/O Controller Interaction: An I/O controller works on behalf of a CPU to get some I/O operation done by an
I/O device. I/O devices vary widely in their interfaces, and the controller insulates the CPU from the low-level
differences. The processor architecture supports a few special I/O instructions, and the processor executes those
instructions (like IN, OUT) to operate the controller. The CPU writes the command on the designated I/O port and the
input data on the input port, if any, and waits for the completion of the intended I/O operation.
This wait can happen in two ways. The I/O operations are also divided based on the wait-type.
187 | I/O Management
One, the processor can continuously or intermittently check the Status register of the controller. If the I/O is
complete, it also reads the Output register. The processor remains busy with the I/O operation during the entire
interval, from issuing the I/O command till its completion (successful or error). This type of busy-wait
handshaking is called programmed I/O.
Two, the processor populates the command register along with the Input register(s) and goes back to other
activities. When the device completes the intended operation, the controller raises an interrupt request (INTR) to
draw the attention of the processor. The processor, on receiving the INTR, invokes the appropriate interrupt service
routine (ISR). The ISR checks the Status register and the Output register of the controller and does other necessary
work. This option is called interrupt-driven I/O.
Programmed I/O does not need a context switch, saving the time and logistical overhead of context switching.
However, it can be used only if the I/O device is quite fast and the controller responds quickly.
But, in general, most of the I/O devices are much slower than the processor and hence, most contemporary systems
implement interrupt-driven I/O. When the I/O device takes time to do the I/O operation, the processor can execute
other instructions for other processes. The processor and I/O controllers can execute in parallel.
The processor only needs to check for the presence of the INTR signal intermittently. Generally, the processor does
it at the end of every instruction cycle and addresses the interrupt first, if any, suspending the current process
and invoking an interrupt service routine (ISR). When execution of the ISR is complete, either the suspended process
is resumed, or execution of another program is started, as decided by the ISR.
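The two wait styles can be contrasted in code. The sketch below is a toy model: the `Controller` class, its register names and the timer-based "device" are all illustrative, not a real device interface.

```python
import threading

class Controller:
    """Toy I/O controller exposing Status and Output registers."""
    def __init__(self):
        self.status = "BUSY"
        self.output = None

    def start_read(self, data, on_complete=None):
        def device_work():                      # the device finishes on its own
            self.output = data
            self.status = "READY"
            if on_complete:                     # interrupt-driven: raise "INTR",
                on_complete(self.output)        # which invokes the ISR callback
        threading.Timer(0.01, device_work).start()

def programmed_io(ctl):
    """Busy-wait (poll) on the Status register until the I/O completes."""
    ctl.start_read(b"data")
    while ctl.status != "READY":                # the CPU does nothing else here
        pass
    return ctl.output

def interrupt_driven_io(ctl, isr):
    """Hand completion to an ISR callback; the CPU is free in the meantime."""
    ctl.start_read(b"data", on_complete=isr)
```

In `programmed_io` the caller burns cycles polling; in `interrupt_driven_io` it returns immediately and the ISR runs only when the device signals completion.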
The I/O ports of all the controllers in a system make a composite I/O address space. I/O activity involving these I/O
ports of I/O controllers is called port-mapped I/O.
In some systems, instead of I/O ports in the controller, a certain portion of main memory is used for I/O control.
The CPU can do I/O operations very much like memory accesses (read/write). This kind of I/O activities are called
memory-mapped I/O.
For data transfer between main memory and an I/O device, the CPU is involved only in the initial setup. The CPU first
arranges a memory buffer for the transfer. It conveys to the I/O controller the address of the buffer, the number of
bytes to be transferred and the direction of transfer (from or to memory). The DMA then transfers the data. Only at
the end of the transfer does the DMA (or the I/O controller) interrupt the CPU. In between, the CPU remains free and
can do other work. Instead of byte-by-byte (or word-by-word) involvement, the CPU participates only at the beginning
and the end.
However, the DMA transfers the data byte-by-byte or word-by-word or block-by-block. It uses the system bus for
the transfer.
When DMA transfers data of one byte or one word at a time, it uses the host bus in an interleaved fashion along
with other activities of the CPU. This intermittent use of the host bus is also called cycle stealing of DMA transfer.
DMA can also use burst mode or block transfer mode where DMA uses the host bus uninterrupted. Other devices
are not allowed to use the system bus at that time. Obviously, the bus needs to support the burst mode.
DMA can also transfer data at high speed in a single cycle, bypassing the DMA registers. DMA needs to activate the
necessary control signals at both the source and the destination. For example, for a secondary-memory to main-
memory transfer, DMA simultaneously enables the read signal at the secondary memory and the write request at the
main memory. This mode of data transfer is called fly-by mode or single-access mode.
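The setup-then-interrupt pattern of a DMA transfer can be simulated as follows. This is only a sketch: the byte-copy stands in for the hardware's bus activity, and the callback plays the part of the completion interrupt.

```python
def dma_transfer(memory, buffer_addr, device_data, on_complete):
    """Simulate a device-to-memory DMA transfer.

    The CPU's only contribution is the setup: buffer address, byte
    count and direction.  The copy itself happens without the CPU,
    and the completion "interrupt" fires once, at the very end.
    """
    count = len(device_data)
    memory[buffer_addr:buffer_addr + count] = device_data  # DMA moves the data
    on_complete(count)                                     # end-of-transfer INTR

ram = bytearray(64)             # the memory buffer arranged by the CPU
completions = []
dma_transfer(ram, 8, b"hello", completions.append)
```

Note that the "CPU" (the caller) touches no data bytes itself; it only supplies the three setup parameters and receives one notification.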
Interrupts are signals that I/O devices raise to draw the attention of a CPU (Sec 3.1.2.1). Modern computing systems
are mostly interrupt-driven. Computers achieve multiprogramming because of interrupts. An interrupt disrupts the
normal activity of a CPU. Normally a CPU sequentially executes instructions of a program. An interrupt forces it to
stop and execute another set of instructions from another program. At the end of each instruction, the CPU checks
the interrupt request line (INTR) and needs to handle the interrupt, if there is any.
Interrupt handling is an extremely important task and involves both hardware and software. While handling of a
few interrupts can be deferred during critical processing (by masking low priority interrupts), some interrupts need
immediate attention of the operating system. Interrupt handling thus requires hardware mechanisms (like
identifying an interrupt type, its priority-level, assigning a number to it, putting an entry in the interrupt vector table
and pointing to the memory address corresponding to its interrupt service routine or ISR through a pointer etc.)
and necessary software like the ISR to handle the interrupt.
I/O request scheduling: For sharable devices, there is no requirement of scheduling. But for non-sharable
devices, if there are concurrent requests from several processes, there needs to be a scheduling algorithm
(like CPU scheduling discussed in Unit 2) (Fig. 6.9). While FCFS can work for most of the cases, sometimes
process priority or other constraints can play a role. For every device, I/O subsystem determines the best
order among the pending requests. While one is allocated a non-sharable I/O device (with a single
instance), the others need to wait for it in a queue.
Coordinating I/O operations: I/O operations like read and write are typically synchronous, or blocking: the
process must wait till the I/O is complete. If the I/O is lengthy, this can hurt the performance of the
process. If the process has other tasks that can be done independently, it can use an alternative:
asynchronous, non-blocking I/O operations.
In asynchronous I/O operations, a process initiates the I/O and then leaves the task to the OS (I/O
subsystem). The process goes back to its own other work. When I/O is complete, an interrupt or call-back
mechanism notifies the process about I/O completion. The process can then perform the subsequent
actions.
Managing data cache: For block devices, the I/O subsystem maintains data caches or I/O buffers. After a
block of data is read, it is temporarily kept in the cache. Before a block is read from the device, the cache is
checked first. If the block is found there (a cache hit), the time to read it from the device (which is far more
costly than reading from kernel memory) is saved.
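The cache-hit path amounts to a lookup before the device read. A minimal sketch follows; a real buffer cache is a fixed-size pool with an eviction policy such as LRU, which is omitted here.

```python
class BufferCache:
    """Minimal block cache: check the cache before touching the device."""
    def __init__(self, read_from_device):
        self.read_from_device = read_from_device   # the slow path
        self.cache = {}
        self.hits = self.misses = 0

    def read_block(self, block_no):
        if block_no in self.cache:                 # cache hit: no device I/O
            self.hits += 1
            return self.cache[block_no]
        self.misses += 1
        data = self.read_from_device(block_no)     # costly device read
        self.cache[block_no] = data
        return data
```

Reading the same block twice performs only one device read; the second access is served from kernel memory.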
The mass-storage medium closest to the processor where code and data can reside persistently is the secondary
storage. The two primary categories of secondary storage devices are hard-disk drives (HDDs) and solid-state disks
(SSDs).
We shall study them in greater detail below.
spindle. Platters have very thin (10 - 20 nm) coatings of magnetic material (iron oxides) on both sides that
store the data (Fig. 6.10).
For each platter, there are two read-write heads to access its either side. The heads do not touch the disk surface,
although the separation is extremely narrow (about one-millionth of an inch). The heads are mounted on arms that
enable horizontal movement over the platters. The arms are fixed on the arm-assembly and can only have linear
motion along the radius of the platter.
Data is stored on both sides of each platter. The entire space of all the platters together makes the total disk
space. Each platter surface is divided into hundreds of circular rings; each such ring or circular stripe is called a
track. A track is divided into a number of sectors. All the tracks at the same radius across the platters together
make a cylinder. A typical sector stores 512 or 1024 bytes of data. All cylinders, tracks and sectors are numbered.
Each sector is uniquely referred to by a tuple <cylinder-no, track-no, sector-no>.
Cylinder number starts from the periphery (0) and increases towards the center. The one closest to the spindle has
the highest cylinder id.
A disk is rotated at a high speed (3600 rpm to 15000 rpm) by a disk-drive motor. To access a particular sector, the
appropriate cylinder (and track) needs to be identified and the r/w head brought over it through radial
movement of the arm. The platter then needs to rotate so that the beginning of the sector comes under the
r/w head. Hence, data access from a disk involves the time for arm movement (called seek time) and the time of
rotation (called rotational latency).
Disk-access time = seek-time + rotational latency.
Once the sector is perfectly located, data transfer can take place. Including data transfer, effective disk access time
= seek-time + rotational latency + data transfer time.
Seek time is on average 5 - 25 ms, and one complete rotation of the platter takes 8 - 16 ms in modern disks.
Data transfer time depends on a few factors like the amount of data, the position of the sectors involved (contiguous
or spread over the disk) and speed of rotation of the disk.
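The access-time formula can be checked with representative numbers. The seek time, rpm and transfer rate below are illustrative, not taken from the text; the average rotational latency is taken as half a rotation.

```python
def disk_access_time_ms(seek_ms, rpm, nbytes, transfer_mb_per_s):
    """Effective access time = seek + average rotational latency + transfer."""
    rotational_ms = 0.5 * 60_000 / rpm            # half a rotation, in ms
    transfer_ms = nbytes / (transfer_mb_per_s * 1e6) * 1e3
    return seek_ms + rotational_ms + transfer_ms

# e.g. a 7200-rpm disk with a 10 ms seek, reading one 512-byte sector at
# 100 MB/s: seek and rotation dominate, the transfer itself is negligible
t = disk_access_time_ms(10, 7200, 512, 100)
```

For small reads, reducing seek time and rotational latency (the goal of disk scheduling) matters far more than the raw transfer rate.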
The total to-and-fro movement of the r/w head is = (97 - 52) + (187 - 97) + (187 - 36) + (121 - 36) + (121 -
13) + (123 - 13) + (123 - 64) + (66 - 64) = 45 + 90 + 151 + 85 + 108 + 110 + 59 + 2 = 650 cylinders.
The algorithm is easy to understand and implement, but not good at all from the performance point of view. The
r/w arm has to move back and forth several times that results in very high overall seek-time.
The total to-and-fro movement is = (64 - 52) + (66 - 64) + (66 - 36) + (36 - 13) + (97 - 13) + (121 - 97) +
(123 - 121) + (187 - 123) = 241 cylinders.
The performance is much improved over FCFS, even though it may not be the best for the given reference string.
The algorithm can be implemented by repeatedly selecting, from the pending requests, the cylinder closest to the
current head position (for example, using a priority queue keyed on the distance from the current head).
The algorithm is elegant and easy to implement. However, it suffers from a few problems. It always looks for the
local minimum (the closest cylinder from the current position) without considering a global optimum. Hence, it can
still involve a few back-and-forth movements of the head (though far fewer than FCFS). More serious is the
starvation problem: if requests keep arriving near the current head position, they are served before older requests
that lie far away, which can starve the latter indefinitely.
Total head movement for the reference string is = (64-52) + (66 - 64) + (97- 66) + (121 - 97) + (123 - 121) + (187 -
123) + (199 - 187) + (199 - 36) + (36 - 13) = (199 - 52) + (199 - 13) = 147 + 186 = 333 cylinders.
The performance is not that great, although much better than FCFS. When the head moves in a particular direction,
if the requests also come in the same direction, the requests will be served immediately. However, the requests
coming in the opposite direction will have to wait. The wait is the longest for the requests at the opposite end to
the current direction of the head. For example, in the above reference string, requests for Cylinder 36 and Cylinder
13 arrive at position 3 and 5 but are served at position 7 and 8 respectively.
Assuming a uniform distribution of request arrivals, by the time the r/w head reaches one end, very few unserved
requests remain in front of it; most of the unserved requests lie behind the head and are covered on the reverse
sweep. The most affected (waiting the longest) are the requests that lie near the opposite end of the platter. Under
a uniform distribution, most of them arrived earlier than those in the middle of the range. This non-uniform delay
in service is a problem with the SCAN algorithm.
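The three policies can be compared directly in code. The reference string used below - head at cylinder 52, pending requests 97, 187, 36, 121, 13, 123, 64 and 66 on a 0-199-cylinder disk - is inferred from the worked calculations (the figure defining it is not reproduced here).

```python
def fcfs(start, requests):
    total, pos = 0, start
    for r in requests:                      # serve strictly in arrival order
        total += abs(r - pos)
        pos = r
    return total

def sstf(start, requests):
    pending, pos, total = list(requests), start, 0
    while pending:                          # always pick the closest cylinder
        nxt = min(pending, key=lambda r: abs(r - pos))
        total += abs(nxt - pos)
        pos = nxt
        pending.remove(nxt)
    return total

def scan(start, requests, max_cyl=199):
    total, pos = 0, start
    if any(r >= start for r in requests):   # sweep up to the disk edge first
        total += max_cyl - pos
        pos = max_cyl
    below = [r for r in requests if r < start]
    if below:                               # then reverse to the lowest request
        total += pos - min(below)
    return total

queue = [97, 187, 36, 121, 13, 123, 64, 66]
```

With this queue, FCFS travels the farthest, SSTF the least, and SCAN falls in between, matching the qualitative comparison in the text.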
The probability of data loss thus exponentially decreases when we use multiple disks to store the same data with
some redundancy. This resulted in a redundant array of independent disks (RAID) [the term ‘inexpensive’ used
earlier is now replaced by ‘independent’].
RAID technology uses several identical disks to store data. The array of disks is seen as a single 'logical' storage
unit managed by a single 'logical' disk controller. The low-level multiplicity of disks is hidden from the user. RAID
has different types, depending upon how the data is organized. Data is either divided (striped) and/or replicated
(mirrored) among the disks in a RAID. Striping means dividing data into multiple units and storing each unit on a
different disk. Striping can be done at the bit level (each bit of a byte is saved on a different disk), or at the
byte, word, sector or block level. Often parity information is also added in striping. Initially six levels of RAID
were proposed (Fig. 6.15 and Fig. 6.16), as given below.
RAID Level 0: Only striping is used, without any redundancy. Data can be accessed from the disks in parallel, so the
data transfer rate is very fast.
RAID Level 1: Striping is used along with mirroring. The total disk capacity required is double the amount of data
that can be stored. Data reads can be done very fast in parallel, but data writes are slower due to mirroring.
RAID Level 2: Striping is used. Instead of mirroring, error-detection and correction techniques (for example,
Hamming codes) are used. Parity bits are stored separately for each stripe on some disks. Depending on the error
detection and correction level, the number of extra disks varies.
RAID Level 3: Similar to RAID Level 2 but uses only one redundant disk for storing bit-level parity information (also
called bit-interleaved parity).
RAID Level 4: Like RAID Level 3, but stores block-level parity information.
RAID Level 5: Like RAID Level 4, but stores block-level parity information distributed across several disks.
A newer RAID Level 6 is similar to Level 5 but has an additional redundant disk, providing dual redundancy with
distributed parity.
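The parity scheme behind levels 3-6 rests on the XOR property that any one missing block can be rebuilt from all the others. A toy sketch with three data "disks" holding one two-byte stripe each (the sizes are illustrative; real arrays work on whole sectors):

```python
from functools import reduce

def xor_blocks(a, b):
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))

data_disks = [b"\x01\x02", b"\x0f\x00", b"\xff\x10"]   # one stripe, 3 disks
parity = reduce(xor_blocks, data_disks)                # stored on a parity disk

# if disk 1 fails, XORing the survivors with the parity rebuilds its contents
recovered = reduce(xor_blocks, [data_disks[0], data_disks[2], parity])
```

This is why a single parity disk (Levels 3 and 4) or distributed parity (Level 5) tolerates the loss of any one disk in the array.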
Different RAID levels offer diverse choices that a designer can opt for based on requirements. RAID 0 provides faster
data transfer (superior read-write performance) but no fault tolerance or reliability. On the other hand, RAID 1
provides good read/write performance and fault tolerance, but the usable disk capacity is halved.
RAID was initially conceptualized to induce reliability against disk failures for cost-effective disks. However, the
technology evolved gradually, and RAID is now used in workstations and large-scale data centres involving
expensive disks as well.
In sector interleaving, consecutive numbers are assigned keeping a physical gap of one or more sectors. For
example, in Fig. 6.19, a gap of one sector is interleaved. There can be interleaving of zero or more sectors;
zero-sector interleaving is the same as linear numbering.
Two sectors interleaving will create a circular sequence like: 0, 3, 6, 1, 4, 7, 2, 5.
Three sectors interleaving will create a circular sequence like: 0, 2, 4, 6, 1, 3, 5, 7; and so on.
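Both sequences above can be generated by walking around the track, leaving `gap` physical sectors between consecutive logical numbers and sliding forward when a slot is already occupied (a sketch; the actual numbering is fixed at low-level formatting time):

```python
def interleave(n_sectors, gap):
    """Assign logical sector numbers around a track with `gap` physical
    sectors between consecutive logical numbers."""
    slots = [None] * n_sectors
    pos = 0
    for logical in range(n_sectors):
        while slots[pos] is not None:          # slot taken: slide to the next
            pos = (pos + 1) % n_sectors
        slots[pos] = logical
        pos = (pos + gap + 1) % n_sectors
    return slots
```

`interleave(8, 2)` reproduces the two-sector sequence and `interleave(8, 3)` the three-sector one, while a gap of zero gives plain linear numbering.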
Sector skewing is aimed at minimizing the delay due to arm movement. Suppose that while the arm moves by one
track, the disk rotates by three sectors. To keep continuity of data access from one track to the next, the sector
numbers are assigned with an offset that compensates for the track-to-track seek time relative to the rotational
latency. For example, in Fig. 6.20, while the arm moves from the inner track to the outer track, the platter rotates
by 3 sectors; from sector 7 of the inner track, the r/w head can directly start accessing sector 0 of the outer
track. This is particularly useful when data is accessed sequentially across consecutive tracks.
6.4.4.2 Partitioning
Once the physical formatting is done at the production site, logical formatting follows using an OS. In between,
however, a physical disk is optionally divided into a number of partitions using some partitioning tool (e.g.,
GParted in Linux) that comes bundled with the OS software. Each partition is considered a logical disk and treated
as if it were a hardware disk device (or a mini-disk). A number of consecutive cylinders make a partition. Each
partition can hold a file system, including a swap space.
Each partition mandatorily contains a boot block (the 0th block) followed by a usable area. This area is formatted
according to the filesystem in the later stage.
In the first partition, the boot block is followed by a partition block that stores information about all the partitions
in the disk. This partition block can be optionally replicated in all the partitions (Fig. 6.21).
Partitioning also adds to the reliability of the disk. Each partition can be formatted separately by an OS into different
file systems and maintained by possibly different OSs. Data corruption in one partition does not cause problems in
another as the partitions are considered separate.
2. Basic-Input-Output-System (BIOS): the CPU points to a fixed location of ROM. This is called BIOS code. It
continues the power-on self-test and initializes other hardware components of the computer system including
storage devices.
3. Boot loader: The BIOS searches the storage devices (floppy, CD, HDD) sequentially for a kernel image. If found,
the BIOS loads a tiny piece of code, called the bootloader, into RAM. The bootloader is kept in the first
block of the device so that the BIOS can easily find it; this block is called the boot block. The bootloader is a
simple piece of code that stores the location (e.g., the partition information of a disk) where the entire OS is
saved persistently and from where it can be loaded.
4. Loading the OS proper: The entire operating system is stored in a partition, the pointer to which is stored in
the bootloader of the boot block. When the bootloader is loaded into RAM and executed, it in turn refers to the
OS proper and loads the kernel and the different subsystems as per requirement.
This can be explained with a specific example. In Windows systems, a disk drive can be divided into several
partitions. A partition can store the OS and device drivers. This partition is called boot partition. But the bootloader
is placed in the very first block of the hard disk - this boot block is called Master Boot Record (MBR).
The MBR contains the boot code and a partition table. Booting the Windows system starts using the POST and BIOS
steps as indicated above. Then the BIOS accesses and executes the boot code of MBR. The boot code enables the
storage device controller and the storage device to locate the boot partition through the partition table. The first
sector of the boot partition (called the boot sector) points to the kernel. The rest of the booting is taken up by the
kernel that loads different OS subsystems and services (Fig. 6.22).
may find it to be a bad block. The controller reports it to the OS as an I/O error. The controller also replaces the
block with a spare block so that the next request to block 18 is transferred to the replacement block by the
controller.
This strategy bypasses the involvement of the OS, and the OS remains unaware of the replacement. But at the
physical level, this can cause unplanned redirections of the read/write head and extra rotations, and affect the
optimisation of disk scheduling algorithms. Most disks are thus provided with a few spare sectors in each cylinder,
and a spare cylinder, at the time of formatting. The controller tries to replace a bad block from the same cylinder
whenever possible.
An alternative to sector sparing is sector slipping. Here, if the bad block is logical block no. 18 and the next
spare sector is, say, at logical block no. 196, then logical block 195 is remapped to 196, 194 to 195, and so on,
until logical block 18 maps to physical block 19. Shifting every block down by one in this way leaves the bad block
unused.
Soft errors are thus recovered using sector sparing or sector slipping. Hard errors are not recoverable and result
in loss of data; the data can then only be restored manually from a back-up.
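The sector-slipping remap in the example (bad block 18, next spare at 196) reduces to a one-line shift (a sketch; the block numbers follow the example in the text):

```python
def slipped_physical(logical, bad=18, spare=196):
    """Logical-to-physical block mapping after sector slipping.

    Every block from `bad` up to `spare - 1` slides down one physical
    slot, so the bad physical block is skipped; blocks outside that
    range keep their identity mapping.
    """
    if bad <= logical < spare:
        return logical + 1
    return logical
```

Compared with sector sparing, the remapped blocks stay physically consecutive, so sequential access suffers no extra head movement.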
All the discussions so far in Section 6.4 considered HDDs. Use of SSDs is on the rise as a permanent storage device
nowadays. SSDs, also called non-volatile memory (NVM) devices, are electrical and electronic, rather than
mechanical in nature. They contain a controller and semiconductor chips. However, discussion on these devices is
beyond the scope of this book.
Other than real-time applications, most applications use files as inputs (to read data from) and outputs (to store results).
A file outlives the lifetime of a program that uses or creates it and can be shared among several programs
simultaneously or at different times. A file can move from one medium (say, flash drive) to another (say HDD or
magnetic tape) without any compromises on the content (data) or other logical attributes of the content (data types
or permissible operations on the data). However, the data at the physical level can be stored differently in different
media. Even though a file may be divided into separate blocks and stored at different physical locations within a
device, the user remains unaware of these physical variations and sees the file as a continuous stream of bytes (Fig.
6.24).
A file may consist of one or more sub-units. The smallest logical unit within a file is called a field. A field can be a
single value like firstname of a person, employee-number, a date, or a hash-value of a password etc. A field is
characterized by length (a single byte or several bytes), and data type (e.g., binary, ASCII string, decimal value etc.).
A record is a collection of related fields within a file that can be considered as a logical unit by a program. For
example, an employee name with employee number, date-of-birth, address is a record.
A file may contain a single field or several records. The records may be of similar nature, of similar length or of
variable nature and/or length.
A database contains several files logically related to each other. A database management system is another layer of
software working on top of a file management system and is beyond the scope of this book. We focus on files, file
systems and file management systems here.
A file is created, accessed, manipulated, and deleted by a user or an application program and is referenced by a
name. Every file belongs to a class of files depending on a set of properties. Such classes are called filesystems. An
operating system supports one or more filesystems. An OS also manages files belonging to different file systems
through file management systems. A file can belong to only one filesystem at a time in a given system.
of the pointer and then advances the pointer. Similarly, write_next() starts writing from the current position
and, at the end of writing, places the pointer after the end of the unit written.
Some systems also provide an abstract method rewind() to get to the beginning of the file from any position (Fig.
6.25). Even though the records seem to be consecutive, on the real disk they can be sparsely located. The file
management system takes care of the translation from this logical file address to the actual storage-medium address.
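The pointer-based sequential interface can be sketched as follows (the method names follow the text; storing records in an in-memory list is a simplification of the real storage layer):

```python
class SequentialFile:
    """Toy sequential-access file: a current-position pointer over records."""
    def __init__(self, records=()):
        self.records = list(records)
        self.pos = 0                      # the file pointer

    def read_next(self):
        rec = self.records[self.pos]      # read at the pointer...
        self.pos += 1                     # ...then advance it
        return rec

    def write_next(self, rec):
        self.records[self.pos:self.pos + 1] = [rec]
        self.pos += 1                     # pointer ends up past the unit written

    def rewind(self):
        self.pos = 0                      # back to the beginning of the file
```

Each read_next()/write_next() leaves the pointer just past the record it touched, so repeated calls walk the file in order; rewind() resets the walk.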
File type            Usage                                               Extensions
source code          source code in languages like C, C++, Java,         .c, .java, .pl, .asm
                     Perl, assembly
library              libraries or shared objects used in source code     .lib, .a, .so, .dll
archive / compress   for compression of files, storage and archives      .zip, .rar, .bzip, .bz2, .tar
MacOS supports file types. Along with the file type, it also keeps track of the application that created the file,
as the file-creator. For example, if a file was created by a word processor, that application is invoked by the OS
when the file is opened.
UNIX supports six different file types: 1. regular 2. directory 3. symbolic link 4. device (character or block device) file
5. FIFO (named pipe) and 6. socket.
A regular file is unformatted data. Most of the file types in Table 6.1 belong to this type. The FMS is not supposed to interpret the data: users are supposed to maintain the internal structure of the data, and the application is supposed to interpret that structure. Other file types, not mentioned in the table above, are however interpreted by the FMS. For example, a directory contains a list of filenames, each with a reference to its metadata. This way UNIX associates names with file objects, but the file objects themselves are treated as nameless entities.
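The six UNIX file types can be distinguished programmatically from the mode bits that stat() returns. A minimal Python sketch, assuming a POSIX system (file_type is a hypothetical helper written for this example, not a standard call):

```python
# Classify a path into one of the six UNIX file types using the mode
# bits returned by lstat() (lstat so that symlinks are not followed).
import os, stat

def file_type(path):
    mode = os.lstat(path).st_mode
    if stat.S_ISREG(mode):   return "regular"
    if stat.S_ISDIR(mode):   return "directory"
    if stat.S_ISLNK(mode):   return "symbolic link"
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return "device"
    if stat.S_ISFIFO(mode):  return "FIFO"
    if stat.S_ISSOCK(mode):  return "socket"
    return "unknown"

assert file_type("/") == "directory"   # the root is always a directory
```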
6.5.4.1 Create
Users are allowed to create new files and/or directories within a file system, unless the concerned file system is
read-only. Every file is associated with a set of attributes.
The create operation requires the name of the file / directory to be created, its container directory and values for the attributes. On successful completion, the FMS creates an empty file / directory, allocates space for it, makes an entry in the container directory, and initializes the attributes with the supplied values or default ones. The FMS itself populates some fields, like creation time, owner of the file / directory, and permission attributes. If space is allocated, its address is also recorded in the metadata of the file.
Creating a directory involves a few extra steps like initializing directory content, search structures etc.
6.5.4.2 Delete
Users are also allowed to delete a file, or a directory with all the files under it, unless the concerned file system is read-only. A delete operation requires only the name of the file / directory to be deleted. On success, the FMS removes the entry from its container directory first. In a multiprogramming environment, however, the file may still be in use by other processes; hence the file content is deleted, and the space held by the file / directory and its metadata is freed, only when no other process is using the same file / directory.
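UNIX exhibits this behaviour directly: unlink() removes the directory entry at once, while the content survives until the last open descriptor is closed. A small sketch, assuming a UNIX-like system (file name and contents are illustrative):

```python
# Deleting a file that is still open: the directory entry disappears
# immediately, but the data remains readable through the open fd.
import os, tempfile

path = os.path.join(tempfile.mkdtemp(), "victim.txt")
with open(path, "w") as f:
    f.write("still here")

fd = os.open(path, os.O_RDONLY)           # the file is now "in use"
os.unlink(path)                           # directory entry removed
assert not os.path.exists(path)           # the name is gone...
assert os.read(fd, 10) == b"still here"   # ...but the content is not
os.close(fd)                              # only now is the space freed
```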
6.5.4.3 Open
Every file needs to be opened before any application can access it for any purpose. The FMS needs the name of the file to be opened along with a few additional values like the opening mode (read / write / append etc.). An open operation checks whether the user has the required access permission. If not, the operation fails with a negative response. If yes, two objects called file descriptors (fd) are created for the file-application pair. One fd (the external fd) is returned to the application, and the other (the internal fd) is kept by the operating system. Both are used to operate on the same file: the external fd works as a symbolic link to the internal one, which actually operates on the file.
In some operating systems, the open operation also creates a file pointer (fp) to keep track of read / write operations. The file pointer holds the byte offset with respect to the start of the file (the start position being byte 0) (Fig. 6.28).
6.5.4.4 Close
Any opened file, at the end of use, needs to be closed. A close operation requires the file descriptor as an argument.
It releases the internal file descriptor, file pointer and other resources allocated during open operation. The time
between opening and closing the file is called a file session (Fig. 6.28).
6.5.4.5 Reposition
The reposition operation positions the file pointer. It takes the file descriptor and an offset as inputs and places the file pointer at the desired location in the file (Fig. 6.28). This operation is allowed only on random-access files.
6.5.4.6 Read
Reading a file means copying the contents of a file to an I/O buffer. Read operation takes as inputs a file descriptor
(of the opened file), a positive integer (the number of bytes to be read), and a buffer address. On success, the
required number of bytes starting from the current file pointer is copied into the buffer. At the end of read, the file
pointer is repositioned to the end of the last byte read.
6.5.4.7 Write
A write operation writes a string onto a file starting at the position of the current file pointer. It takes as arguments
a file descriptor, the string to be written and the length of the string. It can overwrite the earlier content of the file
or can append after the current file pointer. If space is needed to complete the write, the space is also allocated for
the same if available. At the end of the write, it repositions the file-pointer to the end of the last byte written.
6.5.4.8 Truncate
Truncate operation cuts the file length. It takes a file descriptor and a positive integer as inputs and reduces the file
size to the specified number of bytes. The extra space is freed.
The above operations are considered as basic operations on files. There can be a few more operations depending
on the operating system and the file management system involved therein like memory mapping, locking a file from
access etc.
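A single file session exercising these basic operations can be sketched with the POSIX-style calls in Python's os module (the file name and contents are illustrative):

```python
# One file session: create, write, reposition, read, truncate, close.
import os, tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.dat")

fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)  # create + open
os.write(fd, b"hello, file systems")               # fp moves past the write
os.lseek(fd, 7, os.SEEK_SET)                       # reposition to byte 7
data = os.read(fd, 4)                              # read 4 bytes from there
assert data == b"file"                             # bytes 7..10 of the string
os.ftruncate(fd, 5)                                # truncate to 5 bytes
os.close(fd)                                       # end of the file session

assert os.path.getsize(path) == 5                  # only "hello" remains
```

Each call carries the file descriptor returned by open(), and the kernel maintains the file pointer between calls, exactly as described above.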
Fig. 6.29 provides an example of the different fields that a directory entry may contain. How this information is stored varies across file systems. Some of this metadata may be stored as part of the header record of a file, and the rest as part of the directory. This splitting reduces the directory size, making it easier to load into main memory. On the other hand, the file itself must then be accessed to ascertain whether a user can access it or not.
The simplest form of directory structure is a list of entries, one for each file within the directory. A directory can thus be implemented as a sequential file. But this takes considerable time when searching for a particular file in a directory with many entries. Before creating a file, it must be ensured that the same filename is not already present, and this involves searching the entire list. Even though this is workable for a single user, it is problematic for multiple users of the directory, as concurrent accesses involve synchronization issues.
One advancement over this is a two-level structure: one directory, implemented as a sequential file, for each user, and a master directory containing only the users. Even if this serves the purpose for a small multi-user system, it does not scale to a large one. Neither of these schemes allows subdirectories, which we often need to logically organize files in some particular order.
A more flexible and popular approach employs a hierarchical arrangement, or a tree-structure as shown in Fig. 6.30.
At the root, there is a system-wide master directory. All the files within the system are divided into a set of
subdirectories. Each subdirectory can get further divided into another set of subdirectories and so on. All the files
remain as leaves, whereas subdirectories are intermediate nodes (Fig. 6.30a). A file and/or subdirectory, other than the master directory, can be added, deleted or modified at any level if the user has the necessary permissions.
The UNIX system implements this scheme, where root (/) is the master directory that contains a set of subdirectories like dev (for managing devices), bin (executables), usr (users) and so on. Each of the first-level subdirectories can have a few files or subdirectories. Each file or subdirectory is uniquely referenced by a branch starting from the root, for example /bin/ls or /dev/disk1. Each user is supposed to have a home directory (/usr/home) where user contents can be saved. Nevertheless, a user can create any subdirectory, and store any file at any level below the root directory, if she has the necessary permissions (Fig. 6.30b).
A file system is a self-contained system that manages itself. Some separate the structural part from the functional aspects: the structural part is called the file system, whereas the functional part is called the file management system (FMS), which is part of an OS. To be specific, Fig. 6.31 shows the logical structure of the FMS, whereas Fig. 6.30 shows the structure of a file system.
The applications and the users see a file as a contiguous sequence of records or bytes. This logical view gives us logical blocks (l-blocks). The FMS translates these l-blocks into physical blocks (p-blocks) in steps, following the different layers shown in Fig. 6.31 and using the different metadata of Fig. 6.32. For example, for a given filename supplied by an application, the root directory is searched to find the container directory of the file. The container directory contains a pointer to the metadata of the file, which contains a pointer to the start address of the file content (Fig. 6.33).
Another problem is that of external fragmentation. As files are deleted or relocated, p-blocks are freed, but such free spaces are not contiguous. Remember our discussion on contiguous main memory space allocation (Sec 5.4.1); the issues and solutions are relevant here as well.
In other words, the p-blocks of a file form a linked list. To access a p-block, one needs to find its sequence number in the logical file space, and then sequentially traverse the required number of p-blocks.
Linked allocation solves the issue of external fragmentation. There is also no problem when a file grows: no relocation of p-blocks is necessary, and only the file metadata needs to be modified.
However, a major drawback is the mandatory sequential access of file blocks; no random access is possible. A p-block search takes time linear in the file length, which is costly for a large file. Also, each p-block must hold a block header, trailer and a pointer, which increases the overall space overhead of the scheme.
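A toy model of linked allocation illustrates the linear cost of reaching logical block k; the block numbers and the dictionary standing in for the disk are illustrative:

```python
# Linked allocation sketch: each p-block stores a pointer to the next.
# Reaching logical block k requires following k links: no random access.

NEXT = {}        # toy disk: block number -> next block number (None = end)
DATA = {}        # block number -> payload

# Build a 4-block file whose p-blocks are scattered: 7 -> 12 -> 3 -> 9
chain = [7, 12, 3, 9]
for i, b in enumerate(chain):
    DATA[b] = f"block {i}"
    NEXT[b] = chain[i + 1] if i + 1 < len(chain) else None

def read_logical_block(start, k):
    """Follow k links from the file's first p-block: O(k) disk accesses."""
    b = start
    for _ in range(k):
        b = NEXT[b]
    return DATA[b]

assert read_logical_block(7, 2) == "block 2"   # visits p-blocks 7, 12, 3
```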
Another problem is reliability. Since a number of links are involved in tracking the p-blocks of a file, damage to a single link may break the entire chain, and the file may become partially or fully inaccessible.
= 2^30 bits = 128 MB of bitmap), this may not be possible. With continuously increasing disk space, keeping the entire bitmap in the main memory can thus be space-consuming.
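A bitmap allocator can be sketched as follows. The first assertion checks the space arithmetic above (2^30 blocks need 2^30 bits, i.e. 128 MB); the 32-block toy disk is illustrative:

```python
# Bitmap free-space management: bit i is 1 if block i is free.
assert (2**30) // 8 // (1024 * 1024) == 128   # 2^30 bits -> 128 MB

bitmap = bytearray([0b11111111] * 4)   # toy disk: 32 blocks, all free

def alloc_block():
    """Find the first set bit, clear it, and return its block number."""
    for byte_idx, byte in enumerate(bitmap):
        if byte:                                   # a free block in this byte
            bit = (byte & -byte).bit_length() - 1  # lowest set bit
            bitmap[byte_idx] &= ~(1 << bit)
            return byte_idx * 8 + bit
    raise RuntimeError("disk full")

def free_block(n):
    bitmap[n // 8] |= 1 << (n % 8)

assert alloc_block() == 0
assert alloc_block() == 1
free_block(0)
assert alloc_block() == 0    # freed block is immediately reusable
```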
6.5.8.3 Grouping
Grouping is a modification of the linked free-list
approach. Here, instead of making a single list containing all the free blocks, groups of n blocks are made. The first free block stores the addresses of n free blocks. The first (n - 1) of these blocks are plainly free, while the n-th one contains the addresses of the next n free blocks, and so on. Hence, the addresses of (n - 1) free blocks can be found quickly, unlike in the standard linked-list approach above.
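The grouping scheme can be sketched with a toy free list. Block numbers and the group size are illustrative; the last address in each index block points to the next index block:

```python
# Grouping sketch (n = 4): an index block lists 3 plainly free blocks
# plus the address of the next index block (None = no further group).
disk = {10: [11, 12, 13, 20], 20: [21, 22, 23, None]}
head = 10          # current index block

def alloc_block():
    """Hand out free blocks from the current group."""
    global head
    group = disk[head]
    if len(group) > 1:            # the plainly free blocks come first
        return group.pop(0)
    nxt = group[0]                # last entry: next index block
    if nxt is None:
        raise RuntimeError("disk full")
    freed, head = head, nxt       # the index block itself becomes usable
    return freed

assert [alloc_block() for _ in range(4)] == [11, 12, 13, 10]
assert alloc_block() == 21        # now serving from the second group
```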
The hashing scheme drastically reduces the directory search time (from linear to constant time). Insertion and deletion of files are straightforward and managed in constant time.
One problem with the hashing scheme is the fixed size of the hash table. For example, suppose a directory has a hash table of 64 entries and the file names are hashed to values 0 to 63. Even for an increase of one new file (65 entries), one needs to grow the hash table to accommodate 128 entries and define a new hash function. All the directory entries then need to be rehashed to reflect the new hash-function values.
Otherwise, we must use an overflow (chained) hash table, where each hash value can hold multiple entries due to collisions. Whenever a collision happens, the entry is added to a linked list for that hash value. Search time increases marginally due to the linear search in case of collisions, but it is still much faster than a linear search of the whole directory.
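A chained (overflow) hash directory can be sketched as follows. The 64-slot table mirrors the example above; the hash function and inode numbers are illustrative:

```python
# Chained hash directory: constant-time lookup on average, with a short
# per-slot list (standing in for a linked list) on collision.
SLOTS = 64

def h(name):
    return sum(name.encode()) % SLOTS       # toy hash function

directory = [[] for _ in range(SLOTS)]      # slot -> list of (name, inode)

def create(name, inode):
    bucket = directory[h(name)]
    if any(n == name for n, _ in bucket):   # duplicate-name check is cheap:
        raise FileExistsError(name)         # only one bucket is scanned
    bucket.append((name, inode))

def lookup(name):
    for n, inode in directory[h(name)]:     # linear scan only on collision
        if n == name:
            return inode
    raise FileNotFoundError(name)

create("notes.txt", 101)
create("a.out", 102)
assert lookup("notes.txt") == 101
```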
6.5.10.3 Caching
Caching is an important technique to minimize disk accesses and to speed up I/O activities. Part of main memory is dedicated to caching to exploit temporal and spatial locality of data blocks. Some operating systems use page caching to cache both file data and process data. Effective cache management techniques can maximize cache hits and minimize cache misses. In the majority of cases, the least recently used (LRU) page replacement technique proves effective. But for sequential file accesses, where a page already seen is not going to be used again, LRU does not work. In such cases, free-behind (freeing a page as soon as the next page is requested) and read-ahead (reading a requested page along with prefetching a few subsequent pages) techniques are employed.
6.5.10.4 Buffering
Sometimes storage device controllers have on-board caches that can temporarily store a track or a few blocks of
data.
Some operating systems also maintain a special section of main memory, called the buffer cache, to speed up I/O operations. A buffer cache operating alongside a page cache sometimes results in double caching, which wastes memory, CPU cycles and I/O cycles, and can also lead to inconsistencies in the filesystem. Hence, some systems provide a unified buffer cache.
All these techniques and algorithms together improve the performance of a file system as well as that of a disk. No single technique can be called the most effective, nor can any one of them optimize all the performance issues. Based on a given situation, the operating system must dynamically decide on the best technique and deploy it.
UNIT SUMMARY
This chapter discusses the role of input and output devices in a computer.
I/O devices are the gateways to interact with a computer.
Users and application programs provide inputs through input devices and receive outputs through
output devices.
An I/O device is connected through an I/O bus to a device controller that has a few I/O ports, and an
optional DMA.
Normally, data to/from an I/O device goes to main memory through the processor.
A DMA controller relieves the processor of continuous involvement in the data transfer during an I/O operation.
I/O operations are managed by an OS through different software components like I/O subsystem and
device drivers.
A device driver is a low-level OS module that interacts with I/O controllers and manages I/O activities.
A device driver can handle more than one device of similar type.
Disk is an important I/O device that persistently stores code and data for a computer.
Hard disk drives (HDD) are cheap and popular mass storage devices. An HDD is a collection of several very thin magnetic platters that store data. Data is accessed through the radial movement of a r/w head and the rotation of the disk structure.
A disk is divided into a number of cylinders, where each cylinder is a set of concentric circles at the same radius across the platters. Each such circle within a platter is called a track. Each track is further divided into a number of sectors, each of either 512 bytes or 1024 bytes.
When a sector comes under the r/w head, data transfer takes place.
Disk access time is much higher than processor computation time or main memory access time. Hence, disk management is a very important activity of an OS.
Different disk scheduling algorithms are proposed to reduce seek time (time of r/w head movement).
Files are device-independent software abstractions to store and use persistent data.
A file is a sequence of data blocks. Users and applications access persistent data in the units of files.
Files are organized in a hierarchy of directories. Efficient file system management needs effective
directory implementation.
System performance largely depends on good I/O management that consists of disk management and
file management.
EXERCISES
Multiple Choice / Objective Questions
Q2. What is the bit rate of a video terminal unit with 80 characters/line, 8 bits/character and horizontal sweep
time of 100 µs (including 20 µs of retrace time)?
A. 8 Mbps
B. 6.4 Mbps
C. 0.8 Mbps
D. 0.64 Mbps [GATE(2004)]
(sweep time is the time for a signal to reach its maximum value, retrace time is the time to fall from the maximum
to original value)
Q3. Which one of the following is true for a CPU having a single interrupt request line and a single interrupt
grant line?
A. Neither vectored interrupt nor multiple interrupting devices are possible.
B. Vectored interrupts are not possible but multiple interrupting devices are possible.
C. Vectored interrupts and multiple interrupting devices are both possible.
D. Vectored interrupt is possible but multiple interrupting devices are not possible. [GATE (2005)]
Q4. Normally user programs are prevented from handling I/O directly by I/O instructions in them. For CPUs
having explicit I/O instructions, such I/O protection is ensured by having the I/O instructions privileged. In a CPU
with memory mapped I/O, there is no explicit I/O instruction. Which one of the following is true for a CPU with
memory mapped I/O?
A. I/O protection is ensured by operating system routine (s)
B. I/O protection is ensured by a hardware trap
C. I/O protection is ensured during system configuration
D. I/O protection is not possible [GATE (2005)]
Q5. Which of the following DMA transfer modes and interrupt handling mechanisms will enable the highest I/O
band-width?
A. Transparent DMA and Polling interrupts
B. Cycle-stealing and Vectored interrupts
C. Block transfer and Vectored interrupts
D. Block transfer and Polling interrupts [GATE (2006)]
Q6. Consider a computer system with DMA support. The DMA module is transferring one 8-bit character in one
CPU cycle from a device to memory through cycle stealing at regular intervals. Consider a 2 MHz processor. If
0.5% processor cycles are used for DMA, the data transfer rate of the device is __________ bits per second.
A. 80000
B. 10000
C. 8000
D. 1000 [GATE(2021)]
Q7. Which one of the following facilitates the transfer of bulk data from hard disk to main memory with the
highest throughput?
A. DMA based I/O transfer
B. Interrupt driven I/O transfer
C. Polling based I/O transfer
D. Programmed I/O transfer [GATE (2022)]
Q8. Suppose the following disk request sequence (track numbers) for a disk with 100 tracks is given: 45, 20,
90, 10, 50, 60, 80, 25, 70. Assume that the initial position of the R/W head is on track 50. The additional distance
that will be traversed by the R/W head when the Shortest Seek Time First (SSTF) algorithm is used compared
to the SCAN (Elevator) algorithm (assuming that SCAN algorithm moves towards 100 when it starts execution)
is _________ tracks
A. 8
B. 9
C. 10
D. 11
Answers: 1. A 2. B 3. C 4. A 5. C 6. A 7. A 8. C
Short Answer Questions
Q1. Explain the interaction among a device, a device controller and the CPU.
Q2. Discuss different transfer modes in a DMA.
Q3. Differentiate between port-mapped I/O and memory mapped I/O.
Q4. Explain the scenarios when polling I/O and interrupt-driven I/O are beneficial.
Q5. Describe the organization of a magnetic disk.
Q6. Explain different stages of disk formatting.
Q7. With necessary diagrams, explain different types of blocks like superblock, boot block, partition block.
Q8. Both main memory and disk are storage units. Explain the similarities and differences in the space allocation
and free space management.
Numerical Problems
Q1. A disk drive has 8 usable surfaces with 110 tracks per surface. If each track has 96 sectors and each sector
is 512 bytes, what is the size of the disk?
Q2. If we want to store 300,000 logical records of 120-bytes long in the above disk (as in Q1.), how many
surfaces, tracks and sectors will be necessary?
Q3. In Q2., assume that the disk rotates at 360 rpm. The processor reads from the disk using interrupt-driven
I/O with one interrupt per byte. If it takes 2.5 microseconds to process each interrupt, calculate the percentage
of time spent in I/O handling (neglect seek time).
Q4. Suppose you have a 4-drive RAID array with 200GB per drive. Calculate available data storage capacity
for different RAID levels: 0, 1, 2, 3, 4, 5, 6.
Q5. A disk having 500 cylinders (0 to 499) needs to serve a reference string: 144, 10, 123, 75, 304, 281, 480.
If the r/w head is at 250, calculate the total head movement (in cylinders) if the disk scheduling algorithm is: (i)
SSTF (ii) SCAN (iii) C-SCAN and (iv) FCFS.
Q6. Suppose a disk has a label “160GB SATA HDD 7200rpm 3MB/s transfer rate” and 200 sectors per track
with sector size 512 bytes. What is the average rotational latency? What is the average transfer time to read
one sector of data?
PRACTICAL
Q1. In a UNIX or Linux system, learn and try different shell commands like df, ls, fdisk, fsck, mkdir, mkfs, sfdisk, parted etc.
Q2. Check the filesystem hierarchy in a UNIX or Linux system using the command 'ls /'. Make a tree structure showing how to reach your home directory.
Q3. In a UNIX or Linux system, try 'ls -l' from your present working directory and see the entire record for each file entry. Learn what each of the letters means in the first string, like 'drwxr-xr-x'. How can you change them?
Q4. In a Windows system, open the command prompt and type dir. See the output and understand what it shows.
Q5. In a Windows system, on the command prompt, type tree (or TREE) to see the filesystem hierarchy.
Q6. In a Windows system, on the command prompt, type help to learn about other Windows commands and
try some commands related to disk and file management.
Q7. Write a program to design and implement a file management system.
Q8. Write a program to perform operations for synchronization between CPU and I/O controllers.
KNOW MORE
I/O management is a vast area and is discussed in good detail, usually as two separate parts (storage management and file systems) with one or more chapters each, in [SGG18], [Sta12], [Hal15] and [Dha09].
[SGG18] covers SSD devices and flash drives with reasonable details under NVM storage devices. It also
discusses file system mounting and recovery mechanisms.
[Sta12] also covers the DMA and RAID structures vividly. This also provides a brief account of different
file systems found in UNIX, Linux and Windows systems.
[Hal15] discusses disk formatting techniques very nicely. It also covers file system journaling and virtual
file systems with a focus on UNIX systems.
[Dha09] especially covers error recovery and buffering part quite well.
[Bac05] has a complete chapter each on the buffer cache and the I/O subsystem, and two chapters on the file systems of the UNIX OS. [Vah12] dedicates four chapters to UNIX file systems.
[YIR17] contains implementational details of I/O system management in Windows operating systems
across different architectures.
[Bac05] Maurice J Bach: The Design of the UNIX Operating System, Prentice Hall of India, 2005.
[Dha09] Dhananjay M. Dhamdhere: Operating Systems, A Concept-Based Approach, McGraw Hill, 2009.
[Hal15] Sibsankar Haldar: Operating Systems, Self-Edition 1.1, 2015.
[SGG18] Abraham Silberschatz, Peter B Galvin, Greg Gagne: Operating System Concepts, 10th Edition, Wiley, 2018.
[Sta12] William Stallings: Operating Systems Internals and Design Principles, 7th Edition, Prentice Hall, 2012.
[Vah12] Uresh Vahalia: UNIX Internals, The New Frontiers, Pearson, 2012.
[YIR17] Pavel Yosifovich, Alex Ionescu, Mark E. Russinovich, and David A. Solomon: Windows Internals,
Seventh Edition (Part 1 and 2), Microsoft, 2017. https://docs.microsoft.com/en-
us/sysinternals/resources/windows-internals (as on 17-Mar-2023).
[Bac05] Maurice J Bach: The Design of the UNIX Operating System, Prentice Hall of India, 2005.
[CKM16] Russ Cox, Frans Kaashoek, Robert Morris: xv6, a simple, Unix-like teaching operating system,
available at https://www.cse.iitd.ac.in/~sbansal/os/book-rev9.pdf
[Dha09] Dhananjay M. Dhamdhere: Operating Systems, A Concept-Based Approach, McGraw Hill, 2009.
[Dow16] Allen B. Downey: The Little Book of Semaphores, 2e, Green Tea Press, 2016 (available at
https://greenteapress.com/semaphores/LittleBookOfSemaphores.pdf as on 9-Oct-2022).
[HA09] Sibsankar Haldar and Alex A Aravind: Operating Systems, Pearson Education, 2009.
[Hal15] Sibsankar Haldar: Operating Systems, Self-Edition 1.1, 2015.
[Han00] Per Brinch Hansen: The Evolution of Operating Systems, (2000) (available at http://brinch-
hansen.net/papers/2001b.pdf, as on 8-Jul-2022).
[Mil11] Milan Milenkovic: Operating Systems - Concepts and Design, 2nd edition, Tata McGraw Hill, 2011
[Nar14] Naresh Chauhan: Principles of Operating Systems, Oxford University Press, 2014.
[RR03] Kay A. Robbins, Steven Robbins: Unix™ Systems Programming: Communication, Concurrency,
and Threads, Prentice Hall, 2003.
[SGG18] Abraham Silberschatz, Peter B Galvin, Greg Gagne: Operating System Concepts, 10th Edition,
Wiley, 2018.
[SR05] Richard W Stevens, Stephen A Rago: Advanced Programming in the UNIX Environment (2nd
Edition), Addison-Wesley Professional, 2005.
[Sta12] William Stallings: Operating Systems Internals and Design Principles, 7th Edition, Prentice Hall,
2012.
[Vah12] Uresh Vahalia: UNIX Internals, The New Frontiers, Pearson, 2012.
[YIR17] Pavel Yosifovich, Alex Ionescu, Mark E. Russinovich, and David A. Solomon: Windows Internals,
Seventh Edition (Part 1 and 2), Microsoft, 2017. https://docs.microsoft.com/en-
us/sysinternals/resources/windows-internals (as on 8-Jul-2022).
Course outcomes (COs) for this course can be mapped to the programme outcomes (POs) after the completion of the course, and a correlation can be made for the attainment of POs to analyze the gap. After proper analysis of the gaps in the attainment of POs, necessary measures can be taken to overcome them.
[CO-PO mapping table: rows CO-1 to CO-6, columns for the programme outcomes, to be filled in on completion of the course.]
The data filled in the above table can be used for gap analysis.
INDEX

A
Address Space, 16, 38
application, 4
application software, 4

B
Banker's algorithm, 119, 120, 130, 133, 135, 137
batch processing, 7
Batch Systems, 9
Belady, 169, 177, 178
Boot, 182, 200, 201
bootstrap, 12
bootstrapping. See bootstrap
Burst time, 56, 77

C
CAS, 93, 94, 96, 98, 99, 109, 113, 114
Circular Wait, 126, 128
CLI, 18, 27, 28, 29, 33
Closed Shop, 7
cloud computing, 27, 31
computer, 4, 224
Concurrent Programming, 7
context switch. See CONTEXT SWITCH
CONTEXT SWITCH, 43
context switches, 14, 44, 57, 61, 63, 76, 98, 154
controller, 12, 185, 186, 187, 188, 189, 190, 197, 199, 201, 202, 215, 217
CPU Utilization, 56
critical region, 101, 106, 109
critical section, 80, 81, 89, 90, 91, 92, 93, 95, 98, 100, 101, 104, 106, 109, 110, 112, 113, 116, 122, 124, 125
CS, 32, 72, 89, 90, 91, 92, 93, 95, 96, 97, 98, 99, 100, 106, 112, 114, 116. See critical section
C-SCAN, 182, 196, 218
CSP, 90, 91, 95, 96, 103, 109, 113, 114
cylinder, 193, 194, 195, 202

D
deadlock, 2, 36, 107, 108, 109, 110, 111, 112, 116, 117, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 133, 134, 135, 136, 137, 138, 139, 140, 143, 183
Demand paging, 142, 166, 167, 175
Device drivers, 30, 182, 189, 190
Dining Philosophers, 107, 109
Distributed Systems, 8, 12
DLL, 29
DMA, 182, 183, 187, 188, 190, 215, 216, 217, 218

E
ease of use, 5
Embedded System, 8
Embedded Systems, 12
Emulator, 27
Emulation, 27
exceptions, 12, 15, 17, 18, 19, 20, 31, 189
Execution context, 16
extensibility, 22, 23

F
FCFS, 35, 57, 58, 61, 63, 70, 72, 76, 78, 91, 182, 191, 193, 195, 218
FIFO, 58, 83, 85, 104, 112, 116, 142, 167, 168, 169, 170, 171, 172, 177, 178, 179, 205
filesystem, 13, 18, 23, 28, 30, 31, 200, 202, 203, 209, 214, 218
File-system, 14
formatting, 14, 182, 183, 198, 199, 200, 201, 202, 205, 217, 218
fragmentation, 142, 143, 150, 151, 152, 153, 158, 177, 178, 211, 212

G
GENERATIONS, 6

H
HAL, 30
hardware, 4
high-level languages, 4
Hold & Wait, 128
Hybrid Systems, 9
hyperthreading, 65
hypervisors, 26

I
I/O, ix, 5, 6, 7, 9, 12, 13, 14, 15, 18, 19, 20, 21, 25, 28, 29, 30, 31, 41, 42, 43, 44, 45, 54, 56, 57, 58, 64, 72, 73, 77, 124, 144, 158, 161, 162, 163, 166, 167, 170, 171, 173, 174, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 202, 207, 209, 214, 215, 216, 217, 218
I/O subsystem, 15, 31, 182, 190, 191, 192, 202, 209, 215, 218
Interactive Systems, 9
interrupts, 7, 12, 13, 15, 16, 19, 20, 29, 31, 33, 44, 63, 66, 83, 91, 92, 99, 109, 186, 188, 189, 216
IPC, 28, 30, 80, 81, 82, 83, 84, 85, 87, 88, 104, 105, 113, 117

J
Java Virtual Machine, 27
JVM. See Java Virtual Machine

K
Kernel, 16, 23, 30, 31, 38, 48, 73, 74, 92
KLT, 48, 49, 50, 51, 63

L
livelock, 120, 121, 122, 123, 124, 138, 140
Load Control, 166, 173
LRU, 91, 142, 167, 171, 172, 177, 178, 179, 214
LWPs, 47, 50, 63

M
machine language, 4
Memory, 11, 13, 29, 31, 38, 43, 44, 64, 65, 82, 91, 94, 142, 144, 145, 149, 152, 153, 157, 158, 161, 164, 179, 180, 187
message passing, 12, 18, 23, 31, 80, 81, 82, 83, 84, 109
Microkernel, 1, 23
Monitors, 80, 101, 102
Monolithic, 1, 23
MQ, 83, 84
multiplexing, 11, 22, 50
Multi-processor Systems, 10
Multiprogram Systems, 11
multiprogramming, 7, 14, 19, 25, 27, 31, 37, 42, 43, 44, 54, 55, 75, 80, 88, 109, 143, 144, 149, 150, 164, 165, 166, 167, 172, 173, 174, 176, 179, 188, 193, 206
Multi-user Systems, 10
Mutex, 98

N
No Preemption, 126, 128
NRU, 142, 167, 170, 171, 179

O
Open Shop, 6
operation mode, 15
OPT, 167, 168, 171, 172, 179

P
page, 43, 142, 143, 153, 154, 155, 156, 157, 158, 159, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 204, 214
Page Table, 155
Paging, 142, 149, 153, 154, 166
paravirtualization, 27
partitions, 149, 150, 152, 176, 200, 201, 209
PCB, 35, 36, 42, 43, 44, 48
Personal Computing, 7
Peterson, 80, 96, 109, 110, 113
Pipes, 84
portability, 22, 23, 24, 25, 149, 190
process, ix, 8, 12, 13, 14, 16, 17, 18, 19, 20, 22, 23, 24, 28, 29, 30, 31, 32, 33, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57, 61, 63, 64, 65, 66, 67, 72, 73, 74, 75, 76, 77, 78, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 106, 107, 109, 110, 112, 113, 114, 115, 116, 117, 119, 122, 124, 136, 137, 138, 139, 140, 142, 144, 145, 146, 148, 149, 150, 151, 152, 153, 154, 156, 157, 158, 160, 161, 162, 163, 165, 166, 167, 168, 170, 172, 173, 174, 176, 177, 178, 179, 185, 187, 189, 191, 192, 193, 205, 206, 214, 217
Process, 10, 13, 29, 31, 35, 37, 38, 42, 43, 44, 54, 76, 77, 78, 89, 102, 110, 111, 136, 138
PROCESS CONTEXT, 42
Processing mode, 15
processor, 4, 7, 9, 10, 14, 15, 16, 19, 20, 21, 27, 31, 33, 35, 37, 41, 43, 44, 45, 46, 47, 52, 64, 65, 66, 72, 74, 77, 87, 88, 91, 94, 98, 100, 106, 112, 124, 140, 142, 143, 144, 145, 151, 154, 155, 161, 162, 173, 176, 178, 182, 184, 185, 186, 187, 192, 193, 200, 205, 215, 216, 217
producer-consumer, 84, 104, 105, 106, 109, 113, 117
Protection, 15, 19, 142, 157, 165

R
race condition, 81, 88, 94, 113
RAG, 125, 126, 128, 133, 139
RAID, 197, 198, 217, 218
Rate Monotonic, 67
Readers-Writers, 106
real addresses, 146, 147, 148
Real Time systems, 8, 66
Real-Time Scheduling, 66
Real-time Systems, 9
resource allocation, 6, 45, 54, 72, 124, 125, 126, 129, 133, 137, 139
resource management. See resource manager
resource manager, 13
Resource Management, 13
resource utilization, 5, 7, 27, 47, 128
Response Time, ix, 35, 57
RM, 35, 67, 68, 70, 72
rotational latency, 163, 193, 200, 214, 218

S
SC, 142, 167, 169, 170, 171, 179
SCAN, 182, 195, 196, 216, 218
scheduling, ix, 10, 13, 19, 29, 30, 35, 36, 43, 54, 56, 57, 63, 64, 65, 66, 67, 68, 69, 71, 72, 74, 75, 76, 77, 78, 87, 88, 115, 150, 167, 182, 183, 189, 191, 193, 196, 202, 209, 215, 218
Thread scheduling. See scheduling
Security, 11, 15, 19
seek-time, 163, 193, 194, 214
Segmentation, 149, 159
Semaphores, 80, 98, 100, 117, 140, 220
shared memory, 18, 30, 80, 81, 82, 87, 94, 109, 117
Signal system, 83
SJF, 35, 58, 59, 60, 61, 63, 72, 78
software, 4
spooling, 7
SSTF, 182, 194, 216, 218
synchronization, ix, 8, 10, 30, 57, 66, 80, 81, 88, 91, 92, 95, 97, 98, 100, 101, 103, 104, 106, 107, 109, 113, 117, 124, 140, 208, 218
syscall, 15, 20, 21, 28, 39, 40, 83, 85
system calls, 12, 13, 16, 18, 19, 20, 21, 28, 29, 31, 38, 44, 49, 65, 82, 86, 125
system daemons, 12
system software, 4

T
Thrashing, 110, 174, 176
thread, 30, 31, 36, 45, 46, 47, 48, 50, 51, 52, 54, 63, 65, 72, 74, 75, 76, 78, 94, 98, 117, 119, 122, 124, 125, 126, 128, 129, 130, 131, 132, 133, 135, 136, 137
Throughput, ix, 35, 56, 174
time-sharing, 7, 9, 22, 29, 31
traps, 12, 15, 44, 83, 92, 146, 189
TSL, 92, 97, 98, 109, 113
Turnaround Time, ix, 35, 56

U
ULT, 48, 49, 50, 51, 63
UNIX, 1, 7, 9, 17, 27, 28, 29, 31, 32, 33, 34, 37, 40, 51, 54, 56, 57, 72, 76, 78, 79, 83, 84, 85, 86, 117, 118, 124, 140, 166, 179, 180, 190, 205, 208, 209, 218, 219, 220
user mode, 15, 17, 20, 21, 23, 29, 38, 41, 43, 50, 55, 92

V
virtual addresses, 147, 177
virtual machines. See VIRTUAL MACHINES
virtual machine manager. See VMM
VIRTUAL MACHINES, 25
Virtual memory, 31, 160, 161, 166, 174
VM, 25, 160, 161, 162, 166, 167
VMM. See Virtual Machine Manager

W
Waiting Time, ix, 35, 56
Windows, 7, 9, 14, 18, 21, 23, 25, 29, 30, 31, 33, 34, 37, 51, 57, 65, 66, 78, 79, 117, 118, 140, 151, 180, 201, 209, 218, 219, 220
WINDOWS, 1, 27, 29

Z
zombie, 40, 41, 76