OPERATING SYSTEMS NOTES
UNIT 1
Syllabus:
Introduction:
1.1 What operating systems do
1.2 Operating System Structure
1.3 Operating System Operations
1.4 Process Management
1.5 Memory Management
1.6 Storage Management
1.7 Protection and Security
1.8 System Structures: Operating System Services
1.9 User Operating System Interface
1.10 System Calls
1.11 Types of System Calls
1.12 System Programs
1.13 Operating System Structure
1.14 Virtual Machines
1.15 Process Concept: Process Scheduling
1.16 Operations on Processes
1.17 Interprocess Communication
1.18 Multithreaded Programming: Multithreading Models
1.19 Process Scheduling: Scheduling Criteria
1.20 Scheduling Algorithms
1.21 Multiple-Processor Scheduling
Introduction
An operating system is a program that manages the computer hardware. It also provides a basis for
application programs and acts as an intermediary between the computer user and the computer hardware.
[Figure: the operating system sits between the users/application programs and the computer hardware.]
The user view of the computer varies by the interface being used:
1. PC: monitor, keyboard, mouse and system unit. It is designed for one user to have all the
resources (monopolize its resources) and maximize performance. Ease of use is the design
criterion of the OS, with importance given to performance rather than resource utilization; the
OS is optimized for single-user expectations.
2. Terminal to Mainframe or minicomputer: Many users may be accessing the same system.
Sharing of resources and exchange of information may be the criteria. The design issue of the OS
is to maximize resource utilization, i.e. to ensure that all available CPU time, memory and I/O
are used efficiently and that no individual user takes more than his/her fair share.
3. Workstations: users can have the workstations connected to other workstations or servers.
Even though the users have dedicated resources at their disposal, they can also share
resources which are networked: like files, print servers etc. Here the OS is designed to
compromise between individual usability and resource utilization.
4. Handheld computers: They are basically standalone. They may be connected to networks and
other devices either directly by wire or through wireless modems and networking. These
devices have limitations of power, speed and interface and hence perform relatively few
remote operations. The OS design issue here is individual usability and performance per battery
life.
5. Embedded: These computers have little or no user view. They may be embedded in home
appliances, automobiles etc. The OS is designed to run efficiently without user intervention
at times.
1.1.3 Defining OS
The fundamental goal of computer systems is to execute user programs and to make solving
user problems easier. Toward this goal, computer hardware is constructed.
Since bare hardware alone is not particularly easy to use, application programs are
developed. These programs require certain common operations, such as those controlling the
I/O devices.
The common functions of controlling and allocating resources are then brought together into
one piece of software: the operating system.
A more common definition is that the operating system is the one program running at all
times on the computer (usually called the kernel), with all else being systems programs and
application programs.
One of the most important aspects of operating systems is the ability to multiprogram. A
single user cannot, in general, keep either the CPU or the I/O devices busy at all times.
Multiprogramming increases CPU utilization by organizing jobs (code and data) so that the
CPU always has one to execute.
The operating system keeps several jobs in memory simultaneously (Figure 1.1). This set of
jobs can be a subset of the jobs kept in the job pool—which contains all jobs that enter the
system.
The operating system picks and begins to execute one of the jobs in memory. Eventually, the
job may have to wait for some task, such as an I/O operation, to complete.
[Figure 1.1: Memory layout for a multiprogramming system — the operating system in low memory, with several jobs (Job 1, Job 2, Job 3) kept in memory simultaneously.]
Time-Sharing System
Time sharing (or multitasking) is a logical extension of multiprogramming.
The CPU executes multiple jobs by switching among them, but the switches occur so
frequently that the users can interact with each program while it is running.
An interactive (or hands-on) computer system provides direct communication
between the user and the system. The user gives instructions to the OS or to a program
directly through a keyboard or a mouse, and waits for immediate results.
A time-shared operating system allows multiple users to use the computer simultaneously.
Since each action or command in a time-shared system is short, only a little CPU time is
needed for each user.
A time-shared operating system uses CPU scheduling and multiprogramming to provide each
user with a small portion of a time-shared computer.
When a process executes, it typically executes for only a short time before it either finishes
or needs to perform I/O. I/O may be interactive; that is, output goes to a display for the user,
and input comes from a user keyboard, mouse, or other device.
Since the system has to maintain several jobs at a time, it must also provide memory management.
Though systems differ from each other because they are organized on different lines, there are some
commonalities, many of which are discussed here.
When only a single user is using the system (uniprogramming) the user cannot keep the CPU
busy all the time. This decreases CPU utilization and hence performance. A solution to this is to use
multiprogramming.
Introduction:
A switch from uniprogramming i.e. only one process/job in the memory to
multiprogramming was a remarkable achievement. The job-scheduling notion gave way to
multiprogramming capability. The idea here is to increase the CPU utilization even further. The fact
is that no user program can either keep the CPU or I/O devices busy all the time. Thus concurrency
of operation between the CPU and I/O subsystem is exploited to get more work done by the CPU
and hence increase CPU utilization. Multiprogramming arrangement ensures synchronization of the
CPU and I/O activities in a simple manner. We space multiplex physical memory and time multiplex
the physical processor. Such systems are sometimes also called multi-user systems.
The OS has to keep several jobs in memory simultaneously. Usually this is the subset of the
jobs kept in the job pool, because of the limited primary memory. The OS picks up and starts
executing one of the jobs. Eventually this job may not need the CPU for some reason, say an
I/O operation that must complete. Usually a DMA transfer is initiated, which does not require CPU intervention.
Instead of letting the CPU idle, the OS switches to the next job in memory. (This is called context
switching. Though a context switch is an overhead and expensive, the rationale is that the time lost
in a context switch is far lower than the time gained through CPU utilization.) The CPU remains with this job
until a request for I/O is made. Thus CPU switches from one job to another. Eventually all jobs
complete execution by acquiring the CPU in course of time. Interrupts are used to indicate
completion of I/O. Thus the OS does three major functions: Scheduling, memory management and
I/O management.
Observations:
Multiprogramming is the first instance where the OS has to make decisions for the users,
which means the OS is fairly complex.
Switching from one job to the next involves considerable effort on the part of the OS, because it
has to keep track of where it left off one job and from where to pick up the next.
All jobs that enter the system are kept in the job pool on the disk. If the jobs have to be
brought to the memory and there is not enough space the OS has to decide which jobs to
bring in.
This decision of which job to be brought to memory is called job scheduling which the OS
will have to do.
Loading jobs into memory requires that the different programs be handled
properly in memory. So memory management has to be done by the OS.
If several jobs are ready to run on the CPU a decision has to be made as to which job goes
first to the CPU. This decision making process is called CPU scheduling. The part of the OS
which decides on the job that will go to the CPU is called the dispatcher.
The OS has to make sure that when multiple programs are running concurrently they do not
affect each other's programs.
It may also have to take part in I/O management. I/O interrupt handling has to be done
when an interrupt arrives indicating the completion of I/O processing.
Advantages:
High CPU utilization.
It appears that many programs are allotted the CPU almost simultaneously.
Disadvantages:
Jobs may be different sizes so memory management is needed to accommodate them.
Since many jobs are ready to run CPU scheduling will have to be done.
Will have to do process management, disk and memory management.
Multiprogrammed systems do not function well if there are only CPU bound jobs or only I/O
bound jobs.
Does not allow user interaction with the computer.
Performance:
For better performance we need a mix of CPU and I/O bound jobs. When a proper mix is there an
increase in degree of multiprogramming will yield higher throughput.
The number of processes running simultaneously and competing for the CPU is called degree
of multiprogramming.
Features:
Interactive systems need to recognize a terminal as an input medium. Thus the user gives
instructions to the OS or a program directly using a keyboard or a mouse and waits for
immediate results. This time between request and service is called response time. The
effectiveness of the timesharing system is measured using its response time.
Allows several users to share the computer simultaneously.
The presence of other users is transparent to the user.
The actions or commands tend to be short, and hence only a little CPU time is needed for each
user. The CPU gives each program a small time slot called the time slice. The time slice δ is the largest
amount of CPU time any program can consume when scheduled to execute.
The system switches rapidly from one user to another giving an impression that the entire
computer is dedicated to his/her use even though it is shared by many users.
The time-sharing system uses CPU scheduling and multiprogramming to provide each user
with a small portion of a time-shared computer.
Each user has at least one separate program/process in memory. Each process executes for a
short time before it either finishes or needs an I/O operation. This interactive I/O takes place at
the user's speed, during which time the CPU switches to another process of a different
program of another user.
They have memory management and protection since many user tasks are in the memory
simultaneously.
It implements the concept of virtual memory (VM). In order to get reasonable response time
the jobs in the memory have to be swapped in and out of the main memory. To achieve this it
uses the concept of virtual memory which is a technique that allows the execution of a job
that may not be completely in the physical memory.
Advantage of VM is that:
The size of the program can be greater than the size of the physical memory.
It helps to separate the logical memory as viewed by the user from the physical
memory.
Time sharing systems also provide file systems.
They also provide disk management, a mechanism for concurrent execution and many CPU
scheduling techniques.
It also provides job synchronization, and communication and deadlock handling.
Dual-Mode Operation:
Since both OS code and user code are present in the system, (1) we need a mechanism to
distinguish between the two. To distinguish these codes the system needs to operate in different
modes; at the least we need two different modes of operation, hence the name dual-mode
operation. (2) The mechanism is also required to protect the OS from malicious users.
The two different modes of operation are user mode and kernel mode (also called
supervisor mode, system mode or privileged mode).
Systems provide hardware support to distinguish between the modes. A bit called the mode
bit is added to the hardware to indicate the current mode. When the mode bit is 0 it indicates that
kernel mode is on and when the bit is set to 1 it implies the user mode. The mode bit thus indicates
the task being performed is either on behalf of the user or the OS.
When the user code is being executed the system is set to user mode. When the user requests
a service from the OS via a system call it must transition from the user mode to the kernel mode.
Privileged instructions
Machine instructions that can be executed only in kernel mode, and cannot be made
easily accessible to errant users, are said to be privileged instructions. If an attempt is made to
execute a privileged instruction in user mode, the hardware treats it as an illegal operation and
traps to the OS. Examples: the instruction to switch to user mode, I/O control, timer management
and interrupt management.
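To make the trap concrete, here is a minimal sketch (an assumption: Linux on x86-64; the exact fault and signal may vary by platform). HLT is a privileged instruction, so executing it in user mode causes the hardware to trap; the kernel then terminates the process:

#include <stdio.h>

int main(void) {
    printf("attempting a privileged instruction from user mode...\n");
    fflush(stdout);             /* make sure the message appears first */
    __asm__ volatile("hlt");    /* privileged: hardware traps to the OS */
    printf("never reached\n");  /* the OS kills the process before this */
    return 0;
}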
System calls
System calls are a means by which the user program can ask the OS to perform tasks
reserved for the OS on the user program’s behalf. Or it is a method used by a process to request
action by the OS.
When a system call is invoked it usually takes the form of a trap to a specific location in the
interrupt vector.
When a system call is executed, it is treated by the hardware as a software interrupt. Control
passes through the interrupt vector to a service routine, and the mode bit is set to kernel
mode. The kernel examines the interrupting instruction to determine what system call has occurred.
The parameters passed indicate what type of service the user is requesting. Additional information
can also be passed in registers or on the stack or in memory. The kernel verifies the parameters are
correct and legal, executes the request and returns control to the instruction following the system
call.
Cases:
Very early on, MS-DOS was written for the Intel 8088 architecture, which has no mode bit; hence a
user program could wipe out the OS.
More recent processors, e.g. the Pentium, provide dual-mode operation.
Microsoft Windows 2000 and XP, Linux and Solaris also provide greater protection
mechanisms.
Errors violating the modes are detected by the hardware and handled by the OS. When an illegal
operation occurs, e.g. an access to an address not in the user's space, the hardware traps to the OS.
Control is transferred through the interrupt vector. When the error occurs the program is terminated
abnormally, an error message is given and a memory dump may be taken.
Timer:
Goals of a timer:
To ensure control over the CPU.
To prevent the user program from getting stuck in an infinite loop.
To prevent a program from not returning the resources that it holds.
To prevent the user program from running too long.
A timer is set to act like an interrupt after a specified period of time. The period may be fixed
or a variable. A variable timer is implemented by a fixed rate clock and a counter. The OS sets the
counter and every time the clock ticks the counter is decremented. When the counter reaches zero an
interrupt occurs. Before turning over control to the user, the OS ensures that the timer is set to
interrupt. When the timer interrupts, control is transferred to the OS so that a suitable action can be
taken. Only privileged instructions can modify the timer value.
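A hedged sketch of the idea (assuming a POSIX system; the OS's real timer is a hardware device, but setitimer() lets a user program observe periodic timer interrupts in the same spirit):

#include <signal.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t ticks = 0;

static void on_tick(int sig) { (void)sig; ticks++; }  /* count each timer interrupt */

int main(void) {
    struct itimerval tv = { .it_interval = { 0, 100000 },  /* fire every 100 ms */
                            .it_value    = { 0, 100000 } };
    signal(SIGALRM, on_tick);
    setitimer(ITIMER_REAL, &tv, NULL);  /* arm the countdown timer */
    while (ticks < 10)
        pause();                        /* give up the CPU until the next interrupt */
    printf("timer fired %d times\n", (int)ticks);
    return 0;
}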
Process management:
A program in execution is a process. It is a unit of work in a system.
Ex1. a time-shared user program like a compiler is a process.
Ex2. A word processing program run by an individual user on a PC is a process.
The different types of processes are:
1. OS processes called system processes as they execute system code.
2. User processes that execute user code.
They execute concurrently by time multiplexing the CPU.
A process requires resources to complete its task. Resources include CPU time, memory,
files, I/O devices etc. Along with the resources it may also require initialization data. After execution
the resources are returned to the pool.
Ex. if the function of a process is to display the status of a file on the screen then the process must be
given an input namely the name of the file.
The process acquires the resources it needs right at the time of creation or while it is running.
A single-threaded process has one PC (program counter) specifying the address of the next
instruction to execute. Such processes are sequential, i.e. the CPU executes one instruction after
the other. Even if there are two or more processes associated with a program, they are considered
separate execution sequences.
A multi-threaded process has multiple PCs each pointing to the next instruction to execute
for a given thread.
Memory Management:
The CPU can access and address only the main memory directly. Any data on the disk has to
be moved to main memory before the CPU can access it.
The CPU reads instructions from the main memory during the instruction fetch cycle and
reads and writes data to the main memory during the data fetch cycle.
Main-memory addresses are absolute addresses, and every instruction is mapped to an
absolute address before execution. Hence program execution involves accessing instructions and
data by generating the corresponding absolute addresses.
When the program terminates its memory space is freed and a new program is loaded.
In order to improve CPU utilization and to speed up the response time for the users several
users programs are kept in the main memory. Hence to be able to manage multiple programs and to
prevent conflicts memory management is required.
Design feature:
The scheme of memory management for a specific system must take into account the hardware
design of the system.
Storage Management:
One of the OS goals is to provide convenience to the users. In this regard the OS provides a
uniform logical view of information storage. The physical properties of the storage devices are
abstracted to define a logical storage unit called a file.
The different types of physical media are magnetic disk, optical disk and magnetic tape. Each
has its own physical organization and characteristics like access speed, capacity, data transfer rate
and access method (sequential or random). Each medium is controlled by a device such as a disk
drive or tape drive, which has its own unique characteristics.
File: It is a collection of related information defined by its creator. Files commonly represent
programs (source and object) and data.
Types of files
Program files, data files. Data files may be numeric, alphabetic, alphanumeric or binary. Files may
be free form like text files or may be formatted.
Mass-Storage Management:
Design criteria?
The speed of operation of the computer may depend on the speed of the disk system and the
algorithms that manipulate the subsystem.
Tertiary storage management is either done by the OS or can be left to application program.
Caching:
A cache is a fast memory used for storing information, under the assumption that we will
need the information again very soon. A copy of the information is maintained in the cache on a
temporary basis. When a particular piece of information is required during processing, the cache is
searched first. If the information is available it is used directly from the cache; otherwise the
request is sent on to the disk.
Implementation:
1. Internally programmable registers like index registers can be used as high-speed cache for
the main memory. Either the programmer or the compiler implements the register allocation
and register replacement algorithms to decide which info to be kept in registers and which in
the main memory.
2. Caches can also be implemented in hardware. Ex. most systems have instruction cache to
hold the next instruction expected to be executed. This prevents the CPU from waiting
several cycles when an instruction is to be fetched from memory.
A copy of the same information may exist at different levels in the storage hierarchy. Bulk of the
secondary storage is on magnetic disks. They may be backed up by magnetic tapes or removable
disks to protect against any loss of data. The movement of data in the hierarchy may be explicit or
implicit, depending on the hardware design and the controlling OS software.
In general transfer of data from cache to CPU registers is a hardware function and transfer from disk
to memory is controlled by the OS.
What about inconsistency of data when there are multiple copies of the same thing?
Ex. Say an integer A of file B is to be incremented. Through an I/O operation the block
containing A is copied to the main memory. This is followed by a copy to the cache and the internal
registers of the CPU. Hence copy of A appears in several places. The increment to A takes place in
the internal registers. At this point there is inconsistency of the same data. When the new value of A
is written back to the disk then the value of A changes everywhere.
I/O Subsystems:
The I/O subsystem consists of several components like:
The memory management component that includes buffering, caching and spooling.
A general device driver interface.
Drivers for specific hardware devices.
The OS must hide the peculiarities of the I/O devices from the user. Only the device drivers know
the peculiarities of each of the devices.
Protection and Security:
Protection is any mechanism for controlling the access of processes or users to the resources defined
by a computer system.
The mechanism must specify the controls to be imposed and provide a means of enforcing the controls.
What is security?
It is a mechanism to ensure that the resources of a system are used as intended under all
circumstances.
A system may have adequate protection but may not be secure. The security system must defend the
system from external and internal attacks. Attacks can be of various types like viruses, worms,
denial-of-service attacks, identity theft, theft of service etc. Ex. If a user's authentication information
is stolen, then the owner's data can be stolen, corrupted or deleted.
The mechanism of protection and security must be able to distinguish among all its users.
This is possible because the system maintains a list of all user ids. These ids are unique per user. The
authentication stage determines the appropriate user id for the user and that user id is associated with
all the user’s processes and threads. When it is required to distinguish among a set of users rather
than individual users then group functionality is implemented. A system-wide list of group names
and group ids are stored. A user can belong to one or more groups depending on the OS design and
implementation. If a user needs to escalate privileges to gain extra permissions, different
methods are provided by the OS for escalation. Ex. in UNIX the setuid attribute causes the program
to run with the user id of the owner of the file rather than the current user's id. This is the effective
user id, which is used until the privileges are turned off.
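A tiny sketch of the distinction (assuming a POSIX system): if this program were compiled and given the setuid bit (chmod u+s), running it as another user would show different real and effective user ids.

#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* real uid: who ran the program; effective uid: whose privileges apply */
    printf("real uid: %d, effective uid: %d\n", (int)getuid(), (int)geteuid());
    return 0;
}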
Network:
A network is a communication path between two or more systems. Basically, when computers
communicate they either create a network or use an existing one. Networks vary based on the type of
transport media, the protocol used and the distance between the nodes.
Types of protocols:
TCP/IP: common and supported by Windows and UNIX, ATM, and other proprietary
protocols.
Based on distance:
LAN: Local Area Network: exists within a room, floor or building.
MAN: Metropolitan Area Network: could link buildings within a city.
WAN: Wide Area Network: exists between buildings, cities or countries.
SAN: Small Area Network: Bluetooth devices can communicate over short distances of
several feet, as in a home setup.
Transmission Medium:
Copper wires, fiber strands, and wireless transmissions between microwave dishes, satellites
and radio. Infrared communication is also one.
Performance and Reliability:
Based on these two factors also the networks vary.
Introduction:
The OS provides an environment within which programs are executed: commonality.
Internally OS varies based on the makeup, algorithms and strategies used and the intended usage of
the computer system.
OS can be assessed or viewed based on the following three points:
1. Examining the services that it provides.
2. The type of the interface it makes available to the users.
3. Disassembling and looking at the components and the type of interconnection they have.
OS services:
The OS provides certain services to programs and to the users. The services provided vary from OS to
OS, even though a common class can be identified. These services are provided for the
convenience of the programmer, to make the programming task easier.
Types of services:
Services helpful to the user include program execution, I/O operations, file-system manipulation,
communication between processes, and error detection. Services for efficient system operation
include resource allocation, accounting, and protection and security. Protection ensures that access
to system resources is controlled, e.g. by having authentication like a login and password.
Protection also applies to I/O devices, modems and network adapters, so that no break-ins happen.
System Calls:
System calls provide the interface between a process and the OS. They are generally available
as assembly-language instructions and are listed in the manuals used by programmers. Some system
calls can also be made from a HLL, where they resemble predefined functions or subroutine calls;
C, C++ and Perl are commonly used. They may be generated directly inline, or a call to a special
run-time routine can make the system call.
Discuss the example of how to read from one file and copy to another file.
Ex.
In UNIX, system calls are directly invoked from a C or C++ program.
In MS Windows platforms, system calls are part of the Win32 API.
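A sketch of the file-copy example mentioned above, assuming a UNIX system (the buffer size is an arbitrary choice): each open(), read(), write() and close() below is a thin wrapper over the corresponding system call.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    char buf[4096];
    ssize_t n;
    if (argc != 3) {
        fprintf(stderr, "usage: %s source dest\n", argv[0]);
        exit(1);
    }
    int in  = open(argv[1], O_RDONLY);                           /* open the input file */
    int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644); /* create the output file */
    if (in < 0 || out < 0) { perror("open"); exit(1); }
    while ((n = read(in, buf, sizeof buf)) > 0)  /* read system call */
        write(out, buf, (size_t)n);              /* write system call */
    close(in);                                   /* close system calls */
    close(out);
    return 0;
}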
Parameters:
Sometimes parameters are passed to the system calls. Three methods are used to pass
parameters to the OS:
1. Pass parameters in registers.
2. If there are more parameters than registers, they are stored in a block or table in memory and
the address of the block is passed as a parameter in a register (as in Linux).
3. Parameters can be pushed on to the stack by the program and popped off by the OS.
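For illustration, a sketch using the Linux syscall(2) wrapper (assumption: Linux, where SYS_write is the call number for write): the wrapper places the call number and the three parameters into the registers dictated by the kernel's calling convention.

#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

int main(void) {
    const char msg[] = "hello from a raw system call\n";
    /* call number plus three parameters (fd, buffer, length), passed in registers */
    syscall(SYS_write, 1, msg, sizeof msg - 1);
    return 0;
}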
Types/categories of system calls?
Process control
o end, abort
=> normal execution, abnormal execution (causes error trap, dump of memory is
taken and error message generated). Dump written to disk and examined by the debugger.
=> in both cases the control is passed to command interpreter which reads the next
command.
=> depending on the error an error level is indicated (level 0 for normal) so that next
action can be automatically interpreted.
o load, execute
=> for loading and executing a program.
=> The command interpreter after loading starts executing the program and on
termination the control has to be returned.
=> based on whether the program is lost, saved or allowed to continue the control is
sent to the relevant location.
o create process, terminate process
=> A process or a job executing one program can load and execute another program.
If both programs have to run concurrently, we have multiprogramming.
=> For this we use the create process (or submit job) system call.
=> to terminate a process that was created terminate process is used.
o get process attributes, set process attributes
=> To control the execution of the jobs that are created.
Control can be to determine (get process attributes) and reset (set process attributes)
the attributes of a job or a process.
=> the attributes can be job’s priority, maximum allowable execution time, etc.
o wait for time
=> Having created new jobs or processes, we may need to wait for them to finish execution.
=> To wait for a certain amount of time, the wait time system call is used.
o wait event, signal event
=> When a job has to wait for a certain event to occur wait event system call is used.
=> When the event has occurred the job should signal the occurrence through the signal
event system call.
o allocate and free memory
Some system calls help in debugging.
Ex. some system calls help to dump memory.
A program trace lists each instruction as it is executed.
In single step mode a trap is executed by the CPU after each instruction. The trap is caught
by the debugger which helps in finding and correcting bugs.
A time profile provided by the OS indicates the amount of time that the program executes at
a particular location or set of locations.
A time profile requires either a tracing facility or regular timer interrupts.
File management.
o create file, delete file
=> the system call will require name of the file and perhaps some attributes.
o open, close
=> once the file is created it has to be opened with the open system call and then closed
with a close system call.
o read, write, reposition
=> once the file has been opened, read, write or reposition (e.g. skip to the end of file) may have to be
done, and the corresponding system call is used.
o get file attributes, set file attributes
=> attributes of a file like file name, file type, protection codes and accounting information
can be determined or changed using these two system calls (see the sketch after this list).
Device management
When a program is running it may need additional resources like more memory, tape drives,
access to files etc.
o request device, release device
=> make a request for additional resources. If the resources are available they are granted
and control can be returned to the user process else the program will have to wait for the
resource.
=> Since the system can have multiple users, a request has to be made for the
resource through the request system call, and the resource must be released through the
release system call once the work is done.
o read, write, reposition
=> once the device has been requested and allocated read, write and reposition can be
done on the device by using the corresponding system calls.
o get device attributes, set device attributes
o logically attach or detach devices
Information maintenance
These system calls help in the transfer of information between the user program and the OS.
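As promised above, a sketch of the get/set file attribute calls (assuming POSIX; "notes.txt" is a hypothetical file name): stat() plays the role of get file attributes and chmod() of set file attributes.

#include <stdio.h>
#include <sys/stat.h>

int main(void) {
    struct stat st;
    if (stat("notes.txt", &st) == 0) {  /* get file attributes */
        printf("size: %lld bytes, mode: %o\n",
               (long long)st.st_size, (unsigned)(st.st_mode & 0777));
        chmod("notes.txt", 0600);       /* set file attributes: owner read/write only */
    } else {
        perror("stat");
    }
    return 0;
}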
System Programs:
System programs fall between the OS and the application program. They provide convenient
environment for program development and execution. They are like user interfaces to the system
calls. The categories of system programs are:
1. File Management: These programs create, delete, copy, rename, print, dump, list and
manipulate files & directories.
2. Status Information: They help to ask the system for the date, time, amount of available memory
or disk space, number of users etc. The information is formatted and printed to the terminal
or output device.
3. File Modification: Text editors help to create and modify the contents of the files stored on
the disk or tape.
4. Programming-Language Support: Compilers, assemblers, and interpreters for common
languages come with the OS. Sometimes they come separately also.
5. Program Loading & Execution: These programs load a compiled program into memory for
execution. They can be absolute loaders, relocatable loaders, linkage editors
and overlay loaders.
6. Communications: These programs help to create connections among processes, users and
different computer systems. They allow users to send messages to one another's screens, browse the
net, send electronic mail, log in remotely, or transfer files from one machine to another.
Common programs supplied with the OS are web browsers, word processors, text formatters,
spreadsheets, database systems, compilers, games, statistical packages etc. These programs are called
system utilities or application programs.
An important system program for the OS is the command interpreter. Commands given at this level
create, delete, list, print, copy and execute files, etc. There are two approaches here. One is to have
the code that executes the command in the interpreter itself; the number of commands that are
supported then determines the size of the command interpreter. Alternatively, as in UNIX, the OS
implements most commands as system programs; the command simply identifies the file to be
loaded into memory and executed.
Advantage: Programmers can add new commands to the system by just creating new files related to
the command.
Disadvantage: Since the command is executed by a separate system program, the OS must provide a
mechanism to pass parameters from the command interpreter to the system program. This can be
clumsy: the parameter list can be big; sometimes the command interpreter and the system program
may not be memory resident at the same time; and since parameter interpretation is left to each
program's author, parameters may be inconsistently defined across the system. A minimal sketch of
the UNIX approach follows.
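A hypothetical toy interpreter, assuming UNIX (it handles single-word commands with no arguments): the interpreter contains no command code itself; it simply loads and runs the system program named by the command.

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    char line[256];
    while (printf("> "), fflush(stdout), fgets(line, sizeof line, stdin)) {
        line[strcspn(line, "\n")] = '\0';     /* strip the trailing newline */
        if (line[0] == '\0')
            continue;
        pid_t pid = fork();                   /* create a child process */
        if (pid == 0) {
            execlp(line, line, (char *)NULL); /* load the named program file */
            perror("execlp");                 /* reached only if the load failed */
            _exit(1);
        }
        waitpid(pid, NULL, 0);                /* interpreter waits, then re-prompts */
    }
    return 0;
}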
System Structure:
In order to make the OS function properly and be easily modifiable, the common approach
to development is to partition the system into small components rather than building a
monolithic structure. Each small component must be well defined, with a carefully defined set of
inputs, outputs and functionality.
Simple Structure:
Many commercial systems do not have a well-defined structure.
Ex1. MS-DOS. Such OSs started as small, simple and limited systems and grew later. MS-DOS was
originally designed to provide the most functionality in the least space; hence it was not divided into
modules carefully.
[Figure: MS-DOS layer structure, with the application program at the top.]
Ex2. UNIX was initially limited by hardware functionality. It consists of two parts: the kernel and the
system programs. The kernel has a series of interfaces and device drivers, which were added over the
years. Traditionally it has been a layered system. Everything below the system call interface and
above the physical hardware is the kernel. Kernel does CPU scheduling, memory management, file
system and other functions through system calls. This is a lot of functionality combined at one level
which makes UNIX difficult to enhance. System calls define API to UNIX which defines the user
interface.
Newer UNIX versions:
Come with advanced hardware. So the OS can be broken into smaller components.
Have greater control over the computer and the applications.
Implementers have more freedom to make changes to the inner workings of the system and
the working of the OS.
A top down approach is used where the functionality and the features are separated into
components. This helps to hide information and hence freedom to implement the low level
routines as required.
Layered Approach:
One method of modularization is the layered approach. Here the OS is broken into a
number of layers (levels), each built on top of the one below. The bottom-most layer is the hardware and the
topmost is the user interface. Each layer is like an object in which the implementation and operations,
along with the data, are encapsulated. Each layer has data structures and a set of routines that can be
invoked by the higher layers. Each layer selects and uses the functions and services of only the layers
below it (*). Each layer is implemented using only those services provided by the lower layers,
without needing to know how they are implemented. Hence each layer hides its data structures,
operations and hardware from the higher layers.
Advantage:
Modularity.
Easy debugging and system verification (because of *). The 1st layer is debugged without concern
for the rest of the layers; it just has to work correctly on the hardware right below it. Once the 1st
layer is debugged, the 2nd layer is debugged, and so on. If an error is found during the debugging
of a particular layer, the error must be in that layer, since the layers below it have already been
verified to be correct.
Because of the above advantage the design and implementation of the system is simplified.
Disadvantages:
Require careful definition of the layers.
It tends to be less efficient than other approaches. Ex. when a user program executes an I/O
operation, it executes a system call that traps to the I/O layer, which in turn calls the memory
management layer, which in turn calls the CPU-scheduling layer, and so on. At each layer
parameters may be modified and data may need to be passed. Hence each layer adds overhead
to the system call.
Thus, because of the above point, a system call takes longer than on a non-layered
system.
Changes: To get over the disadvantages, fewer layers with more functionality each are designed,
retaining the advantages of modularized code. Ex. Windows NT had a highly layered approach but
delivered low performance compared to Windows 95. Hence Windows NT 4.0 addressed this problem
by moving layers from user space to kernel space and closely integrating them.
[Figure: An operating-system layer.]
Microkernel:
As the UNIX system expanded, its kernel became large and difficult to manage. In the mid-1980s,
researchers developed the Mach OS, which modularized the kernel using the microkernel approach.
What is microkernel approach?
In this approach the OS is structured so that all non-essential components are removed
from the kernel and implemented as system-level and user-level programs. Microkernels typically
provide minimal process and memory management plus a communication facility. Communication
is through message passing.
Function of the microkernel is to provide an interface between the client program and various
services that are also running in the user space.
Advantage:
Ease of extending the OS. All new services are added to the user space and hence no
modification is required to the kernel. Changes are minimal because the kernel is just the
microkernel.
Such an OS is easy to port from one h/w to another.
Security is improved since the services run as user rather than kernel processes.
Reliability is high because if a service fails the rest of the system remains untouched.
Ex. Tru64 UNIX presents a UNIX interface to the user but is implemented with a Mach kernel. The
Mach kernel maps UNIX system calls into messages to the appropriate user-level services.
MacOS X Server OS is based on Mach kernel.
QNX is a real time OS also based on microkernel.
Windows NT uses hybrid structure since part of it is layered. It is designed to run various
applications including Win32, OS/2 and POSIX. It provides a server that runs in user space
for each application type. The kernel coordinates the message passing between the client
applications and application servers.
Modules:
Current methodology for OS design is using OO programming techniques. This helps to create
modular kernel.
Technique:
The OS has only core components.
Additional services are linked in dynamically, either at boot time or at run time, i.e. as dynamically
loadable modules (a sketch of one appears below).
Ex. Solaris, Linux and Mac OS X.
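A hedged sketch of a dynamically loadable Linux kernel module (assumptions: a Linux system with kernel headers and the usual module build setup; "hello" is a hypothetical module name). It would be loaded with insmod and removed with rmmod:

#include <linux/init.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Toy loadable kernel module");

static int __init hello_init(void)   /* runs when the module is loaded */
{
    pr_info("hello: module loaded\n");
    return 0;
}

static void __exit hello_exit(void)  /* runs when the module is removed */
{
    pr_info("hello: module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);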
[Figure: Solaris loadable modules — the core Solaris kernel surrounded by scheduling classes, file systems, loadable system calls, executable formats, STREAMS modules, miscellaneous kernel modules, and device and bus drivers.]
Mac OS X:
[Figure: Mac OS X structure — application environments and common services on top of a kernel environment containing Mach and BSD.]
Uses hybrid structure.
Top layers include application environments and a set of services providing GUI.
Below is the kernel environment which has the Mach microkernel and BSD kernel.
Mach provides memory management, supports RPC, IPC, message passing and thread
scheduling.
BSD kernel provides BSD command line interpreter, support for networking and file
systems, implementation of POSIX APIs.
The kernel environment (Mach and BSD) also provides an I/O kit for the development of
device drivers and dynamically loadable modules.
Virtual Machines:
A computer system is made of layers, with the hardware at the bottom and the kernel above it. The
kernel uses the hardware instructions to create a set of system calls for use by the outer layers. The
system programs above the kernel can use either system calls or hardware instructions and in fact
cannot differentiate between the two, even though they are accessed differently. System programs
use them to create more advanced functions. The system calls and hardware instructions are thus
treated as being at the same level by the system programs.
A virtual machine takes the layered approach to its logical conclusion. It treats hardware and
the operating system kernel as though they were all hardware. A virtual machine provides an
interface identical to the underlying bare hardware. The operating system creates the illusion of
multiple processes, each executing on its own processor with its own (virtual) memory. The
resources of the physical computer are shared to create the virtual machines. CPU scheduling can
create the appearance that users have their own processor. Spooling and a file system can provide
virtual card readers and virtual line printers. A normal user time-sharing terminal serves as the
virtual machine operator‘s console.
OS Generation:
It is easy to design an OS that is specific to one machine at one site.
system generation (SYSGEN)
More commonly, an OS must be designed to run on any of a class of machines, at different sites,
with different peripheral configurations. The system must then be configured for
each specific computer site, and this process is called system generation.
The SYSGEN program reads from a given file, asks the operator of the system for
information concerning the specific configuration of the hardware system, or probes the hardware
directly to determine what its components are.
System Boot:
The procedure of starting a computer by loading the kernel is called booting. After the kernel is
loaded, execution begins, at which point the system is said to be running.
How does the hardware know where the kernel is, or how to load it?
On most computer systems a small program called the bootstrap program or bootstrap loader
locates the kernel, loads it into memory and starts execution.
In some PCs it is a two-step process. A simple bootstrap loader fetches a more complex boot
program from the disk, which in turn loads the kernel.
In cellular phones, PDAs and consoles the entire OS (and the bootstrap) is stored in ROM
(firmware). The problem here is that changing the bootstrap requires changing the entire ROM
chip; a solution is to use EPROM.
Note: Executing from firmware is slower and more expensive than executing from RAM. Hence
some systems store the OS in firmware and a copy of it in RAM for fast execution.
For large OSs, and for OSs that change frequently, the bootstrap loader is stored in the firmware
and the OS on the disk. The firmware holds diagnostics and a small piece of code that reads a single
block from a fixed location on the disk, called the boot block; this block is loaded into memory and
executed. The advantage here is that the OS can be changed simply by writing new versions to disk.
A disk with a boot partition is called a boot disk or system disk.
Steps in booting
When the CPU receives a reset event, such as power-on, the instruction register is loaded with a
predefined memory location holding the starting address of the bootstrap program, which resides
in ROM; execution therefore starts there.
The bootstrap runs diagnostics to determine the state of the machine.
If the diagnostics pass, booting continues.
All CPU registers, device controllers and some memory locations are updated.
Process Management
Computer systems evolved from:
a single program in execution, which had complete control over the system and access to
all the system's resources, to
multiple programs in memory, executed concurrently, which required more
control and more compartmentalization.
This evolution required the introduction of the notion of a process.
Before we understand process management it is necessary to understand the concept of a process.
Process:
A program in execution.
A program in execution that competes for the CPU and other resources.
A unit of work in timesharing systems. An entity that can be assigned and executed by the
CPU.
An active entity.
Process Concept:
CPU activities
A batch system executes jobs.
A timesharing system executes user programs called tasks.
On a single-user system like Windows or Macintosh, one user may run several programs:
a word processor, web browser, e-mail package etc.
Your command interpreter is also a process.
All of these are activities, and in many respects they are similar; each of them is called a
process.
Process Details:
1. Consists of program code: called text section.
2. Current activity: indicated by the program counter and contents of the processors registers.
3. Temporary data stored in: stack that stores method parameters, return addresses and local
variables. Process may also have a heap which is dynamically allocated at run time.
4. Global variables stored in Data Section.
5. Process state: A state of a process is defined in part by the current activity it performs. The
process may be in the following states:
a. New: The process is being created.
b. Running: Instructions are being executed. At most one process can be running on
the CPU at any given time.
c. Waiting: The process is waiting for some event to occur like I/O completion or
interrupt signal. When a process is waiting it is said to be in a blocked state. A
blocked process cannot be directly scheduled even if the CPU is free.
d. Ready: The process is waiting to be assigned to the processor. Such a process is not
waiting for any external event like I/O or other interrupts.
e. Terminated: The process has finished execution.
[Figure: Process state diagram — new → (admitted) → ready → (scheduler dispatch) → running → (exit) → terminated; running → (interrupt) → ready; running → (I/O or event wait) → waiting → (completion) → ready.]
6. Process Control Block (PCB): Each process is represented in the OS by a PCB, a record
containing information associated with the process, such as the process state, process number,
program counter, registers and memory limits, along with:
g. Accounting Information: Indicates the amount of CPU and real time used, time limits,
account numbers, job or process numbers.
h. I/O status information: Includes list of I/O devices allocated to this process, list of
open files, etc.
i. Other items: can have process priority, path name etc.
7. Threads: The discussion so far assumes a process executing a single thread of instructions. This
single thread of control allows the process to perform only one task at a time.
Process Scheduling:
There are multiple processes in the system, for example:
Some process must be running in a multiprogramming environment to increase CPU
utilization.
In timesharing the CPU has to switch between different user processes.
On a uniprocessor system only one process can run at a time. If there are more processes in the
system, they must be kept somewhere until the CPU is free and they can be rescheduled.
Hence the scheduler selects an available process for execution on the CPU.
While scheduling various processes there are many objectives the OS has to choose from like:
Fairness
Good throughput
Good CPU utilization
Low turnaround time
Low waiting time
Good response time
[Figure: Queueing diagram of process scheduling — a ready queue and I/O queues served by the CPU and I/O devices, with paths for I/O request, time slice expired, and wait for an interrupt.]
Here rectangles represent different types of queues and the circle represents the resources that serve
the queue. When a process is executing one of the following events can occur:
The process can issue an I/O request and then be placed in an I/O queue.
The process could create a new process and wait for its termination.
The process could be removed from the CPU as a result of an interrupt after which it is put
back to the ready queue.
Note: The main process or the parent process can create a child process which in turn can also create
its own child process. All these processes form a tree structure with the parent process as the root.
There are some advantages for creating a child process:
1. Computational Speed-up: multiple processes result in multi-tasking. The OS can interleave the
execution of I/O-bound and CPU-bound processes to increase the degree of multiprogramming.
2. Higher priority for critical functions: A child process created to perform a critical function in an
application may be assigned higher priority than others. Such priority assignments help the OS to
meet real-time requirements in an application.
3. Protection of parent process from errors: The OS cancels the child process if an error arises during
its execution. This action does not affect the parent process. This is normally done when a software
system has to invoke an untrusted program.
Scheduler:
A process migrates between various queues during its lifetime. A scheduler is one that does
the selection of a process for scheduling purpose from the various queues maintained in the system.
Types of schedulers?
1. Long-term scheduler:
Ex. In a batch system more processes are submitted than can be executed immediately.
They are spooled on a mass storage (disk) where they are kept for later execution.
This scheduler also called the job-scheduler selects processes from this pool and loads them
into memory for execution.
This scheduler executes much less frequently.
I/O bound process- It is a process that spends more time doing I/O and spends less time on useful
computation.
CPU bound process-It is a process that spends more time doing computation and generates I/O
requests very infrequently.
2. Short-term scheduler:
Also called as CPU-scheduler.
It selects from among the processes that are ready to execute and allocates the CPU to them.
The frequency of execution is high, i.e. it must select a process very frequently.
It typically executes at least once every 100 ms.
Because of the brief time intervals, the short-term scheduler must execute very fast.
Ex. If it takes 10ms to decide to execute a process for 100ms what percent of CPU is used for
scheduling?
= 10/(100 + 10) ≈ 9%, i.e. about 9% of the CPU is used just for scheduling the work.
3. Medium-Term scheduler:
Some time-sharing systems introduce an additional intermediate level of scheduling.
This scheduler removes the processes from active contention of CPU (from memory) to
reduce the degree of multiprogramming.
The process is reintroduced later and continues from where it was left off. This scheme is
called swapping.
Swapping may be necessary to improve the process mix, or because a change in memory requirements
has overcommitted available memory, requiring memory to be freed up.
Context Switch:
The process of switching the CPU to another process by saving the state of the old process
and loading the saved state of a new process is called context switch.
The context of a process is represented by its PCB including the value of the CPU registers,
the process state (state diagram) and the memory management information.
When a context switch occurs, the kernel saves the context (state save) of the old process in
its PCB and then loads (state restore) the saved context of the new process scheduled to run.
Drawback of context switching
Context switch is an overhead because the system does no useful work during switching.
The speed of switching varies from machine to machine, depending on the memory speed, the
number of registers that must be copied and the existence of special instructions. Typical times
range from 1 to 1000 microseconds.
Other factors that context switch depends on?
It is highly dependent on the underlying hardware. Ex. In Sun UltraSPARC there are multiple
sets of registers. A context switch just involves changing the pointer from one set to another
register set. If active processes exceed the register set then the register data is copied to and
from the memory.
The more complex the OS, the more work is done during a context switch.
The more advanced the memory-management technique, the more data must be switched with
each context.
Operations on processes:
The system must provide a mechanism for creation and deletion of processes.
Process Creation:
Process created via create_process system call.
Creating process is the parent process and the created is the child process.
Each of these may again create processes forming a tree structure.
The parent has the identity of all its child processes.
The created sub process may get its resources directly from the OS or may be constrained to
the subset of resources of the parent.
The parent either partitions its resources among its children or shares its resources.
The advantage of restricting a child process to a subset of the parent's resources is that it
prevents any process from overloading the system by creating too many sub-processes.
When a child process is created initialisation data may be passed from the parent process
apart from other physical and logical resources.
With respect to execution two possibilities exist:
o Parent continues to execute concurrently with the children.
o The parent waits until some or all of the children have terminated.
With respect to the address space there are two possibilities:
o The child process is a duplicate of parent process. Ex. UNIX
o The child process has a program loaded into it. Ex. DEC VMS
o Windows NT supports both implementations.
Process Termination:
When a process finishes execution it terminates and asks the OS to delete it using the exit
system call.
All the resources of a child are returned to the parent process.
The resources of the main process including physical and virtual memory, open files, I/O
buffers are deallocated by the OS.
A process can cause the termination of another process via the abort system call.
Usually only the parent can invoke such a call; otherwise users could arbitrarily kill each
other's processes.
A parent can terminate the child process for various reasons:
o The task assigned to the child is no longer required.
o The child has exceeded the usage of some of the resources that are allocated.
o The parent wants to exit, and the OS does not allow the child to continue. If a process
terminates, normally or abnormally, all its children must also be terminated; this is
called cascading termination.
In UNIX we terminate a process using the exit system call. The parent waits for the child to exit
via a wait system call, which returns the process id of the terminated child.
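A sketch pulling the fork, exec, wait and exit calls together (assuming a UNIX system; "ls" is an arbitrary program to run):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();                 /* child is a duplicate of the parent */
    if (pid < 0) { perror("fork"); exit(1); }
    if (pid == 0) {                     /* child: load a new program into its address space */
        execlp("ls", "ls", "-l", (char *)NULL);
        _exit(127);                     /* reached only if the exec failed */
    }
    int status;
    pid_t done = wait(&status);         /* wait returns the terminated child's pid */
    printf("child %d exited with status %d\n", (int)done, WEXITSTATUS(status));
    return 0;                           /* parent terminates via exit */
}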
Cooperating Process:
The processes executing in a system can be independent processes or cooperating
processes.
Independent Process: it cannot affect or be affected by the other processes executing in the
system. Such processes typically do not share any data with any other process.
Cooperating Process: processes that can affect or be affected by other processes
executing in the system. Typically such processes share data with other processes. They require IPC
to exchange data and information.
Need for process cooperation:
Information Sharing: If several users want the same resource like a shared file an
environment to allow that must exist.
Computational Speedup: If a particular task is to run faster, we may want to break it up
into subtasks, all executing in parallel. Such speedup happens only if there are multiple
processors.
Modularity: Helps to construct a modular system dividing the system functions into separate
processes or threads.
Convenience: Every user may have many tasks to work at one time like editing, printing,
compiling etc. That can be achieved through cooperation.
Shared Memory:
One solution to the producer-consumer problem uses a buffer in shared memory. The shared data:
#define BUFFER_SIZE 10
typedef struct {
……
} item;
item buffer[BUFFER_SIZE];
int in = 0;
int out = 0;
The producer process:
item nextProduced;
while(true){
/* produce an item in nextProduced*/
while (((in + 1) % BUFFER_SIZE) == out)
; /* do nothing -- buffer is full*/
buffer[in] = nextProduced;
in = (in + 1) % BUFFER_SIZE;
}
The item produced by the producer is put in a local variable called nextProduced.
The consumer process:
item nextConsumed;
while(true){
while (in == out)
; /* do nothing*/
nextConsumed = buffer[out];
out = (out + 1) % BUFFER_SIZE;
/* consume the item in nextConsumed*/
}
The consumer process has a local variable nextConsumed in which the item to be consumed
is stored.
Note:
The buffer is empty when in == out, and full when ((in + 1) % BUFFER_SIZE) == out.
This solution does not address the situation in which both the producer and the consumer
attempt to access the shared buffer concurrently; no synchronization is done in this
code.
Message Passing:
This is a form of communication between processes without the need for a shared address
space; message passing also provides synchronization.
It is useful in a distributed environment.
Typically two primitives are used namely send(message) and receive(message).
This is the form of communication in microkernels.
Messages can be of fixed size or variable size.
System implementation is easy if the messages are of fixed size, but the programming task
is more difficult.
If message lengths are variable, the system implementation is more complex but programming
is simpler.
Also they are more suitable for small size messages.
They are slower because they are typically implemented using system calls, which require
the more time-consuming task of kernel intervention.
For two processes P & Q to communicate a communication link must exist between them.
The link can be physical or logical.
The different methods for implementing the logical link and the send and receive primitives are as
follows:
1. Direct or Indirect Communication:
2. Synchronous or Asynchronous Communication:
3. Automatic or Explicit Buffering:
4. Send by Copy or Send by Reference:
5. Fixed size or variable size messages:
Direct:
Naming is crucial for direct communication.
The process that wants to communicate must explicitly mention the recipient or the sender.
Primitives are defined as follows:
o send(P, message) => send message to process P
o receive(Q, message) => receive message from process Q.
The communication link in this form of communication has the following characteristics:
o A link is established automatically between every pair of processes that want to
communicate. The processes only need to know each other’s address.
o A link is associated with exactly two processes.
o Exactly one link exists between each pair of processes.
There is symmetry in this addressing because both the sender and receiver must mention each
other’s address.
In the asymmetric form only the sender names the recipient; the recipient is not
required to name the sender. The primitives are as follows:
o send(P, message) => send message to process P
o receive(id, message) => receive message from any process. The variable id is set to
the name of the process with which the communication has taken place.
Disadvantage of symmetric & asymmetric:
o Limited modularity.
o Changing the name of a process requires examining all other process definitions:
all references to the old name must be found so that they can be modified to the
new name.
Indirect:
Here the messages are sent to and received from mailboxes or ports.
Each mailbox has a unique identification. The primitives are send(A, message) and
receive(A, message), where A is a mailbox.
The communication link here has the following characteristics:
o A link is established between a pair of processes only if both have a shared mailbox.
o A link may be associated with more than two processes.
o A number of different links may exist between each pair of processes, each link
corresponding to one mailbox.
Synchronization:
Communication takes place through calls send( ) and receive( ) primitives in message
passing technique.
Message passing can be blocking/synchronous and nonblocking/asynchronous.
Types are:
o Blocking send: the sending process is blocked until the message is received by the
receiving process or mailbox.
o Nonblocking send: the sending process sends the message and resumes operation.
o Blocking receive: the receiver blocks until a message is available.
o Nonblocking receive: the receiver retrieves either a valid message or a null.
When the send and receive are blocking type then we have rendezvous between the sender
and receiver.
Producer-Consumer Problem with message passing:
o The producer invokes a blocking send( ) call and waits until the message is delivered to
either the receiver or the mailbox.
o When the consumer invokes receive( ) it blocks until a message is available.
Buffering:
Whether communication is direct or indirect, the messages exchanged reside in a temporary
queue. Such queues can be implemented in three ways:
Zero Capacity: this is a message system with no buffering. The queue has a maximum length
of zero, so no message can wait in it. The sender must block until the receiver receives
the message.
Bounded Capacity: queue is of finite length n. Hence at most n messages can be in the queue
at any instant of time. If the queue is not full when a new message is sent it will be placed in the
queue and the sender can continue execution without waiting. If the queue is full the sender must
block until space becomes available.
Unbounded Capacity: the queue length is potentially infinite. The sender never blocks.
Multithreading Models:
Many-to-One Model:
Many user level threads are mapped to one kernel thread.
Thread management is done at user level so it is efficient but entire process blocked if a
thread does a blocking system call.
Since only one thread can access the kernel at a time, multiple threads are unable to run in
parallel on multiprocessors. I.e. even though many user threads can be created, true
concurrency is not achieved because the kernel can schedule only one thread at a time.
Green threads - a thread library available in Solaris 2 uses this model.
Also user level thread libraries implemented on OS that do not support kernel threads use the
many-to-one model.
One-to-One Model:
Each user thread is mapped to one kernel thread.
More concurrency is achieved since another thread can run if one thread makes a blocking
system call.
Multiple threads can run in parallel in multiprocessor systems.
Problem in this method is that creating a user thread requires creating the corresponding
kernel thread. The overhead of creating kernel threads can burden the performance of the application.
Implementations of this model therefore restrict the number of threads supported by the system.
Windows NT, Windows 2000 and OS/2 implement this model.
Many-to-Many Model:
Many user level threads are multiplexed onto a smaller or equal number of kernel threads.
Number of kernel threads can be specific to a particular application or a particular machine.
This model allows greater concurrency but the user has to be careful not to create too many
threads within an application.
If one thread does a blocking system call, the kernel can schedule another thread in its place.
A variation of this is sometimes referred to as two-level model.
Ex. Solaris 2, IRIX, HP-UX and Tru64
Multiprogramming concept: A process is executed until it must wait, typically for the completion of
I/O. This helps:
To have some process running all the time.
Increase CPU utilization.
Since in a uniprocessor system only one process can run at a time, any other process must wait till it
is scheduled. Hence CPU scheduling is a must for multiprogramming, and scheduling is
fundamental to the OS.
All resources including the primary resource namely the CPU must be scheduled.
To use time productively in multiprogramming: when a process has to wait the OS takes away the
CPU from that process and gives it to another process.
What is the CPU-I/O burst cycle?
CPU burst: the time a process spends executing on the CPU between two I/O bursts.
I/O burst: the time a process spends waiting for I/O between two CPU bursts.
The following property is observed:
Typically process execution consists of a cycle of CPU and I/O wait. Processes alternate
between these states.
Process execution starts with a CPU burst, followed by an I/O burst. The last CPU burst ends
the execution.
The durations of CPU bursts have been measured extensively. They vary from
process to process and from computer to computer, but the frequency curve is generally
characterized as exponential or hyperexponential, with many short CPU bursts and few long ones.
Typically I/O bound jobs have many short CPU bursts and CPU bound jobs have few long
CPU bursts.
This property is used for building several scheduling algorithms.
CPU scheduler
When the CPU becomes idle, the OS has to select a process from the ready queue to be executed. This
selection is done by the short-term scheduler. The records in the ready queue are generally the PCBs
of the processes.
The ready queue can be implemented in different ways:
FIFO
Priority Queue
Tree
Simply an unordered linked list.
CPU scheduling decisions take place under four circumstances:
1. When a process switches from run state to wait state. Ex. I/O request.
2. When a process switches from run state to ready state. Ex. In case of interrupt
3. When a process switches from wait state to ready state. Ex. Say I/O complete.
4. When a process terminates.
In the condition 1 & 4 the scheduling scheme is called nonpreemptive.
Else it is called preemptive scheduling.
What is nonpreemptive (cooperative) scheduling?
In this form of scheduling once the CPU is allocated to a process the process keeps the CPU
until it either terminates or switches to the wait state.
Advantage:
Does not need special h/w like timer.
Ex. Used in Microsoft Windows 3.1 and Apple Macintosh
What is preemptive scheduling?
In this form of scheduling, the CPU allocated to a process can be taken away from it before the
process terminates or switches to the wait state.
Disadvantage:
Cost.
Coordination is required to access shared data.
Note: Preemption has an effect on the design of the OS kernel. Say the kernel is in the midst of a
system call on behalf of a process and the process gets preempted; suppose another activity then needs
to read or modify the same kernel data structure the system call was using. There is chaos. Some
systems deal with this kind of situation by waiting for the system call to complete before switching.
This way the kernel structure can be kept simple, but this kernel execution model is poor for
supporting real-time applications.
A dispatcher is a component of the CPU scheduling function. The dispatcher is the one that gives
control of the CPU to the process selected by the short-term scheduler. The dispatcher must be fast.
There are 3 functional parts in a dispatcher:
1. Switch context.
2. Switching to user mode.
3. Jumping to the proper location in the user program to restart that program.
Dispatch latency is the time it takes for the dispatcher to stop one process and start another.
4. Waiting Time: It is the sum of the periods a process spends waiting in the ready queue. The CPU
scheduling algorithm affects only the amount of time a process spends waiting in the ready queue,
not the time it spends executing or doing I/O.
5. Response Time: Applicable to interactive systems. It is the time from the submission of a request
until the first response is produced, i.e. the time it takes to start responding, not the time it takes
to output the full response. This time should be minimal.
Scheduling Algorithms:
1. FCFS:
Simplest.
Non-Preemptive.
The process that requested the CPU first gets it.
Easily implemented with a FIFO queue.
The new process is inserted at the tail of the queue.
The process at the head of the queue is given to the CPU when it is free.
Problem with this algorithm is that the average waiting time is often quite long.
The average wait times in this algorithm vary substantially if the burst times vary greatly.
This algorithm is troublesome in time-sharing systems, since a process holds on to the CPU
until termination or an I/O request.
Performance of FCFS algorithm in a dynamic situation: Convoy Effect:
Say there is one CPU bound process and many I/O bound processes. While the CPU bound
process is holding the CPU, all the others finish their I/O and move to the ready queue. At this point the I/O
devices are idle. Eventually the CPU bound process finishes its CPU burst and moves to an I/O
device queue. Now all the I/O bound processes, which have very short CPU bursts,
execute quickly and move back to the I/O device queues, and the CPU sits idle. The CPU
bound process then moves back to the ready queue and gets the CPU, and again all the I/O bound
processes end up waiting in the ready queue behind it. This effect, with all the short processes
waiting for one big process to release the CPU, is called the convoy effect; it results in lower CPU
and device utilization.
PROBLEMS:
Ex1.
Process Burst Time
P1 24ms
P2 3ms
P3 3ms
Gantt Chart:
| P1 (0-24) | P2 (24-27) | P3 (27-30) |
Wait time for P1 = 0ms
Wait time for P2 = 24ms
Wait time for P3 = 27ms
Average wait time = ( 0 + 24 + 27)/3 = 17ms
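As an illustrative sketch, the following computes the FCFS waiting times for this example directly from the burst order (the array values are the ones in the table above):
#include <stdio.h>

int main(void) {
    int burst[] = {24, 3, 3};           /* P1, P2, P3 arrive in this order */
    int n = 3, wait = 0, total = 0;
    for (int i = 0; i < n; i++) {
        printf("Wait time for P%d = %dms\n", i + 1, wait);
        total += wait;
        wait += burst[i];               /* the next process waits for all earlier bursts */
    }
    printf("Average wait time = %.2fms\n", (double)total / n);   /* prints 17.00ms */
    return 0;
}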
SJF: Shortest Job First (Shortest next CPU burst, shortest remaining time):
The length of the next CPU burst is associated with each process in this algorithm.
When the CPU is available it is assigned to the process with the smallest next CPU burst.
If two processes have the same burst time, FCFS is used to break the tie.
It is an optimal algorithm because it gives the minimum average waiting time for a given set of
processes: moving a short process ahead of a long one decreases the short process's wait time
more than it increases the long process's wait time, so the average wait time decreases.
Problem:
o The real difficulty is knowing the length of the next CPU request. It can be used with the
long-term scheduler in batch processing, where a user-supplied time limit serves as the estimate.
o It cannot be implemented exactly at the level of short-term scheduling, even though it is
optimal: there is no way to know the length of the next CPU burst, though we can predict it.
Calculation of the approximate CPU burst of a process:
We can calculate an approximation of the next CPU burst.
The next CPU burst is generally predicted as an exponential average of the measure lengths of the
previous CPU bursts.
Let tn be the length of the nth CPU burst, and let τn+1 be the predicted value for the next CPU burst.
The exponential average is defined as:
τn+1 = α tn + (1 − α) τn, where 0 <= α <= 1.
tn contains the most recent information, while τn stores the past history; α controls the relative
weight of recent and past history in the prediction.
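A small illustrative sketch of the prediction (α = 0.5, the initial guess τ0 = 10 and the burst values are made-up inputs):
#include <stdio.h>

int main(void) {
    double alpha = 0.5;                            /* weight of the most recent burst */
    double tau = 10.0;                             /* initial guess for the first burst */
    double bursts[] = {6, 4, 6, 4, 13, 13, 13};    /* measured CPU bursts (illustrative) */
    for (int i = 0; i < 7; i++) {
        printf("predicted %.2f, actual %.0f\n", tau, bursts[i]);
        tau = alpha * bursts[i] + (1 - alpha) * tau;   /* exponential average */
    }
    return 0;
}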
Ex1. (Non-Preemptive):
Process Burst Time (ms)
P1 6
P2 8
P3 7
P4 3
| P4 (0-3) | P1 (3-9) | P3 (9-16) | P2 (16-24) |
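Working out the waiting times from the chart: P4 = 0, P1 = 3, P3 = 9, P2 = 16; average wait time = (0 + 3 + 9 + 16)/4 = 7ms.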
Ex2. Preemptive
Process Arrival Time Burst Time
P1 0 8
P2 1 4
P3 2 9
P4 3 5
| P1 (0-1) | P2 (1-5) | P4 (5-10) | P1 (10-17) | P3 (17-26) |
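Working out the waiting times from the chart: P1 = 10 − 1 = 9 (P1 is preempted at t = 1 and resumes at t = 10), P2 = 0, P3 = 17 − 2 = 15, P4 = 5 − 3 = 2; average wait time = (9 + 0 + 15 + 2)/4 = 6.5ms.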
Priority Scheduling:
SJF is a special case of the general priority scheduling algorithm: the larger the CPU burst, the
lower the priority. The priority p is taken as the inverse of the (predicted) next CPU burst.
The process with the highest priority is allocated the CPU.
Equal priority processes are scheduled as FCFS.
Priorities are generally some fixed range of numbers like 0 to 7.
There is no general agreement on whether 0 is the lowest or the highest priority. It varies
from system to system.
Priorities can be defined internally or externally. Internal priorities use some measurable
quantity or quantities (time limits, memory requirements, number of open files, ratio of average
I/O burst to average CPU burst) to compute the priority of a process. External priorities are
set by criteria outside the OS, like the importance of a process, the type and amount paid for computer
use, political factors, the department sponsoring the work etc.
They can be preemptive or non-preemptive.
In preemptive mode, if the arriving process has higher priority than the running process,
the running process is preempted.
In non-preemptive mode, if the arriving process has higher priority than the running process,
the arriving process is simply put at the head of the ready queue.
Disadvantage:
o Indefinite Blocking/Starvation: A process that is ready to run but cannot get the CPU is
considered blocked. This algorithm can leave some low-priority processes waiting indefinitely
for the CPU: in a heavily loaded system, a steady stream of high-priority processes can
prevent a low-priority process from ever getting the CPU.
o Solution: Aging: a technique of gradually increasing the priority of processes that wait in
the system for a long time.
Process   Burst Time   Priority   Arrival   Start   Wait   Finish   TA
P1        10           3          0         6       6      16       16
P2        1            1          0         0       0      1        1
P3        2            4          0         16      16     18       18
P4        1            5          0         18      18     19       19
P5        5            2          0         1       1      6        6
Gantt chart:
| P2 (0-1) | P5 (1-6) | P1 (6-16) | P3 (16-18) | P4 (18-19) |
Round Robin (RR) Scheduling:
Designed for time-sharing systems. A small unit of time called a time quantum is defined.
The ready queue is a circular FIFO queue of processes. The CPU scheduler goes
round the queue, allocating the CPU to each process for up to 1 time quantum.
The first process in the ready queue is dispatched with the timer on.
If the process’s CPU burst is less than 1 time quantum then the process will itself give up the
CPU otherwise the timer will go off and an interrupt happens to do context switch. The old
process goes to the tail of the queue.
The average wait time in RR is fairly long.
EX.
Process Burst Time
P1 24
P2 3
P3 3
Time quantum = 4ms
| P1 (0-4) | P2 (4-7) | P3 (7-10) | P1 (10-14) | P1 (14-18) | P1 (18-22) | P1 (22-26) | P1 (26-30) |
Wait times: P1 = 10 − 4 = 6ms, P2 = 4ms, P3 = 7ms. Average wait time = 17/3 = 5.66ms.
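As an illustrative sketch, the following simulates RR with q = 4 on the same three processes (all arriving at time 0, as in the example; with simultaneous arrivals the simple cyclic scan below reproduces the queue order) and reports the waiting times above:
#include <stdio.h>

int main(void) {
    int burst[] = {24, 3, 3}, remaining[] = {24, 3, 3};
    int n = 3, q = 4, t = 0, done = 0;
    int finish[3];
    /* round robin: cycle over the processes, giving each up to q ms per turn */
    while (done < n) {
        for (int i = 0; i < n; i++) {
            if (remaining[i] == 0) continue;      /* already finished */
            int run = remaining[i] < q ? remaining[i] : q;
            t += run;
            remaining[i] -= run;
            if (remaining[i] == 0) { finish[i] = t; done++; }
        }
    }
    double total = 0;
    for (int i = 0; i < n; i++) {
        int wait = finish[i] - burst[i];          /* all arrive at t = 0 */
        printf("Wait time for P%d = %dms\n", i + 1, wait);
        total += wait;
    }
    printf("Average wait time = %.2fms\n", total / n);   /* prints 5.67ms */
    return 0;
}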
If there are n processes in the ready queue and the time quantum is q, then each process gets 1/n of the
CPU time in chunks of at most q time units, and each process must wait no longer than (n − 1)q time
units until its next time quantum.
For example, if there are 5 processes with a time quantum of 20ms, then each process will get up to
20ms every 100ms.
Performance: Depends on the size of the quantum.
When the time quantum is very large, RR degenerates to FCFS. When it is very small, RR resembles
processor sharing: it appears to the users as though each of the n processes has its own processor
running at 1/n the speed of the real processor.
The time quantum must be large with respect to the context-switch time; otherwise too much time is
spent on switching. Turnaround time also depends on the size of the time quantum: in general the
average turnaround time does not necessarily improve as the time quantum increases.
Multilevel Feedback Queue Scheduling:
Ex. Consider a multilevel feedback queue scheduler with three queues:
Queue0 - quantum 8ms
Queue1 - quantum 16ms
Queue2 - FCFS
o A process entering the ready queue is put in queue0. If it does not complete in 8ms it is put in
the tail of queue1. If queue0 is empty then the head of queue1 gets 16ms. If it does not
complete it is put in queue2. Processes in queue2 are run FCFS basis if queue0 and queue1
are empty.
o This algorithm gives highest priority to a process that requires a time quantum of 8ms or less.
Such processes finish their work and go for I/O. Processes that need more than 8ms but less
than 24ms are in the next lower queue. Long processes automatically sink to the lowest level
which is FCFS.
o The following parameters define the multilevel feedback queue scheduler:
o The number of queues.
o The scheduling algorithm for each queue.
o The method used to determine when to upgrade a process to a higher-priority queue.
o The method used to determine when to demote a process to a lower-priority queue.
o The method used to determine which queue a process will enter when the process
needs service.
It is the most general CPU scheduling algorithm. It can be configured to match a specific system
under design.
Multiple-Processor Scheduling:
o When multiple processors are available, CPU scheduling becomes more complex.
o There is no single best algorithm as such.
Homogeneous system:
o When all processors are identical it is said to be a homogenous system. In terms of
functionality any processor can be used to run any process in the queue. Hence load sharing
can occur.
o If we provide a separate queue for each processor, one processor could sit idle with an
empty queue while another is overloaded. To prevent this we can use a common ready queue.
o Two scheduling approaches are used:
o 1. Each processor is self-scheduling. Each processor examines the common ready
queue and selects a process to execute. This is called symmetric multiprocessing.
o Here we have to make sure that two processors do not select the same process and
that processes are not lost from the queue.
o 2. To avoid the above problems, one processor is appointed as the scheduler for the other
processors, creating a master-slave structure. Some systems have all scheduling
decisions, I/O processing and other system activities handled by a single master
server.
o This is asymmetric multiprocessing.
o It is simpler than the symmetric form.
o In heterogeneous systems, where the processors are different, only programs compiled for a
given processor's instruction set can run on that processor.
Load Balancing:
In SMP to get maximum performance it is important to keep workload balanced among all the
processors.
Else if one or more processors sit idle the load on other processors increases. The wait queue on
highly loaded processors increases.
Hence load balancing attempts to keep the workload evenly distributed on all processors.
Load balancing is required only when each processor has its own private queue.
When there is a common queue from which processors pick processes, no load balancing is
required: an idle processor simply takes the next process from the common queue.
There are two approaches to load balancing:
o Push migration: a specific task periodically checks the load on the processors. If it finds
an imbalance, it evenly distributes the load by pushing processes from the overloaded
processor to less loaded or idle processors.
o Pull migration: An idle processor pulls a waiting task from a busy processor.
Linux implements both methods.
Symmetric Multithreading:
SMP allows multiple threads to run concurrently on multiple processors.
Instead of scheduling onto physical processors, logical processors are provided. This is called
symmetric multithreading (SMT), or hyperthreading technology on Intel processors.
SMT: Idea:
Create multiple logical processors on the same physical processor, presenting a view of
several logical processors to the OS.
Each logical processor has its own architecture state, which includes the general-purpose and
machine-state registers.
Thread scheduling:
There are user-level and kernel-level threads.
User-level threads are managed by a thread library.
To run on a CPU, user-level threads must ultimately be mapped onto kernel-level threads,
possibly through lightweight processes (LWPs).
UNIT 2
Syllabus:
Synchronization:
1. THE CRITICAL SECTION PROBLEM
2. PETERSON'S SOLUTION
3. SYNCHRONIZATION HARDWARE
4. SEMAPHORES
5. CLASSICAL PROBLEMS OF SYNCHRONIZATION
6. MONITORS
7. SYNCHRONIZATION EXAMPLES
8. DEAD LOCKS: SYSTEM MODEL
9. DEADLOCK CHARACTERIZATION
10. METHODS FOR HANDLING DEADLOCKS
11. DEADLOCK PREVENTION
12. DEADLOCK AVOIDANCE AND DETECTION
13. RECOVERY FROM DEADLOCK.
SYNCHRONIZATION
Cooperating Processes
Cooperating processes affect or are affected by other processes executing in the system.
They may directly share a logical address space (through threads) or share data through files
or messages.
The problem with sharing data is that concurrent access can make the data inconsistent.
Race Conditions
In operating systems, processes that are working together share some common storage (main
memory, file etc.) that each process can read and write.
When two or more processes are reading or writing some shared data and the final result
depends on who runs precisely when, we have a race condition.
Concurrently executing threads that share data need to synchronize their operations and
processing in order to avoid race conditions on the shared data.
Only one thread at a time should be allowed to examine and update a shared variable.
Race conditions are also possible inside operating systems themselves.
If the ready queue is implemented as a linked list and the ready queue is being manipulated
during the handling of an interrupt, then interrupts must be disabled to prevent another
interrupt from being handled before the first one completes.
If interrupts are not disabled, the linked list could become corrupt.
THE CRITICAL SECTION PROBLEM
Consider a system consisting of n processes {P0, P1, ..., Pn-1}. Every process has a segment of
code called its critical section, in which the process may be changing common variables, updating
a table, writing a file etc.
When one process is executing in its critical section, no other process is allowed to execute in
its critical section.
The critical-section problem is to design a protocol that the processes can use in order to cooperate.
The general structure of a process is:
do {
entry section
critical section
exit section
remainder section
} while(1);
A solution to the critical-section problem must satisfy the following three requirements:
1. Mutual exclusion. If process Pi is executing in its critical section, then no other processes
can be executing in their critical sections.
2. Progress. If no process is executing in its critical section and some processes wish to enter
their critical sections, then only those processes that are not executing in their remainder
sections can participate in the decision on which will enter its critical section next, and this
selection cannot be postponed indefinitely.
3. Bounded waiting. There exists a bound, or limit, on the number of times that other processes
are allowed to enter their critical sections after a process has made a request to enter its
critical section and before that request is granted.
We assume that each process is executing at a nonzero speed. However, we can make no assumption
concerning the relative speed of the n processes.
At a given point in time, many kernel-mode processes may be active in the operating system. As a
result, the code implementing an operating system(kernel code) is subject to several possible race
conditions.
Consider as an example a kernel data structure that maintains a list of all open files in the
system. This list must be modified when a new file is opened or closed (adding the file to the
list or removing it from the list).
If two processes were to open files simultaneously, the separate updates to this list could
result in a race condition.
Other kernel data structures that are prone to possible race conditions include structures for
maintaining memory allocation, for maintaining process lists, and for interrupt handling. It is
up to kernel developers to ensure that the operating system is free from such race conditions.
Two general approaches are used to handle critical sections in operating systems: preemptive
kernels and nonpreemptive kernels. A preemptive kernel allows a process to be preempted while it
is running in kernel mode; a nonpreemptive kernel does not, so a kernel-mode process runs until it
exits kernel mode, blocks, or voluntarily yields the CPU. A nonpreemptive kernel is therefore
essentially free from race conditions on kernel data structures.
Why, then, would anyone favor a preemptive kernel over a nonpreemptive one?
A preemptive kernel is more suitable for real-time programming, as it will allow a real-time process
to preempt a process currently running in the kernel. Furthermore, a preemptive kernel may be more
responsive, since there is less risk that a kernel-mode process will run for an arbitrarily long period
before relinquishing the processor to waiting processes.
PETERSON'S SOLUTION
Let the two processes be P0 and P1. When presenting process Pi, we use j to denote the other
process (j = 1 - i). Two preliminary algorithms are discussed first.
Algorithm1:
Here the processes share a common variable turn, which is initialised to 0/1.
If turn == i, then Pi is allowed to execute in its critical section.
Structure of Pi is as follows:
do {
while (turn != i);
critical section
turn = j;
remainder section
} while(1);
For process P0:
do {
    while (turn != 0);
    critical section
    turn = 1;
    remainder section
} while(1);
For process P1:
do {
    while (turn != 1);
    critical section
    turn = 0;
    remainder section
} while(1);
This solution meets the mutual exclusion requirement because it allows only one process to
enter the CS at any given time.
It does not meet the progress requirement, because it requires strict alternation of the
processes in executing their critical sections. If Pj wants to enter the CS, then Pi must
previously have executed and set turn to j. But if Pi does not want to execute, there is no
way for turn to become j, and Pj can never enter its CS.
Algorithm2:
In Algorithm 1, no information is retained about the state of each process; only whose turn it is.
The variable turn is therefore replaced by an array boolean flag[2];
The elements of the array are initialised to false.
If flag[i] is true, then Pi is ready to enter its critical section.
Structure of process Pi:
do {
flag [i] = true;
while (flag [j]);
critical section
flag [i] = false;
remainder section
} while(1);
For process P0:
do {
    flag[0] = true;
    while (flag[1]);
    critical section
    flag[0] = false;
    remainder section
} while(1);
For process P1:
do {
    flag[1] = true;
    while (flag[0]);
    critical section
    flag[1] = false;
    remainder section
} while(1);
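Algorithm 2 by itself fails the progress requirement: if both processes set their flags to true at the same time, each waits on the other forever. Peterson's solution combines the flag array of Algorithm 2 with the turn variable of Algorithm 1. Structure of process Pi:
do {
    flag[i] = true;     /* Pi is ready to enter */
    turn = j;           /* but politely lets Pj go first */
    while (flag[j] && turn == j)
        ;               /* busy wait */
    critical section
    flag[i] = false;
    remainder section
} while(1);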
Pi can be prevented from entering the CS only if it is stuck in the loop while (flag[j] && turn ==
j);. If Pj is not ready to enter, then flag[j] is false and Pi can enter; similarly for Pj. Once
Pi completes its CS, it sets flag[i] to false, which means Pj can now enter the CS. Hence progress
and bounded waiting are satisfied, and since turn can favour only one process at a time, mutual
exclusion also holds.
SYNCHRONIZATION HARDWARE
In general any solution to CS will require a lock mechanism. A process must acquire a lock
before entering the CS and release the lock when it exits the CS.
do {
acquire lock
critical section
release lock
remainder section
} while(true);
In uniprocessor systems, the critical-section problem could be solved by not allowing interrupts
to occur while a shared variable is being modified.
This solution is not feasible in multiprocessor systems, since disabling interrupts requires
passing a message to all processors, which is time consuming and decreases the efficiency of the
system. (Consider, for example, a system clock that is kept updated by interrupts.)
Hence many machines provide hardware instructions.
Many systems have simple hardware instruction that are effectively used to solve the critical
section problem.
Ex. TestAndSet instruction, Swap instruction.
These instructions have to be executed atomically, without interruption.
Hardware solutions are more efficient than the software ones.
1. TestAndSet instruction:
Executed atomically.
A boolean variable lock is used with this instruction that is initialised to false.
It is used to implement mutual exclusion.
It is a function that returns a boolean value.
Used in IBM/370.
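The standard definition of the instruction and its use for mutual exclusion (with lock initialised to false) are:
boolean TestAndSet(boolean *target) {
    boolean rv = *target;   /* return the old value ... */
    *target = true;         /* ... and set the lock, in one atomic step */
    return rv;
}

do {
    while (TestAndSet(&lock))
        ;                   /* spin until lock was false */
    critical section
    lock = false;           /* release the lock */
    remainder section
} while(1);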
Mutual Exclusion: When a process enters the CS, it sets lock to true by executing the
TestAndSet instruction. Any other process wanting to enter the CS will wait in the while loop
until lock is set back to false by the process leaving the CS.
Progress: It is valid, because if the second process does not want to enter the CS now, the first
process can still enter.
Bounded Waiting: When a process exits the CS, the selection of the next Pj to enter the CS is
arbitrary: it's a race! Hence this simple scheme does not satisfy bounded waiting.
2. Swap instruction:
Swap instruction exchanges the contents of two memory locations.
Swap instruction is also executed atomically.
Used in Intel IA-32 and IA-64.
Mutual exclusion is implemented by declaring a global boolean variable lock, which is
initialised to false.
Each process has its own local boolean variable key.
void Swap(boolean *a, boolean *b) {
    boolean temp = *a;
    *a = *b;
    *b = temp;
}
do {
    key = true;
    while (key == true)
        Swap(&lock, &key);
    Critical section
    lock = false;
    Remainder section
} while(1);
Explanation:
The shared variable lock is initialised to false. Each process uses a local variable key that is
initialised to true. The only process that enters the CS is the one that finds lock to be false; it
shuts out all other processes by setting lock to true. When a process leaves the CS it resets lock
to false so that the next process can gain access to the CS.
Mutual exclusion: if lock is false, no process is in its CS; if lock is true, exactly one process is in its CS.
Progress: also holds, since even if one process does not want to enter the CS, another still can.
An algorithm using the TestAndSet instruction that satisfies all three requirements of the
critical-section problem:
The data structures used are boolean waiting[n]; and boolean lock; both initialised to false.
do {
    waiting[i] = true;
    key = true;
    while (waiting[i] && key)
        key = TestAndSet(&lock);
    waiting[i] = false;
    Critical section
    j = (i + 1) % n;
    while ((j != i) && !waiting[j])
        j = (j + 1) % n;
    if (j == i)
        lock = false;
    else
        waiting[j] = false;
    Remainder section
} while(1);
Process Pi can enter its CS only if either waiting[i] == false or key == false. The value of key
becomes false only when TestAndSet finds lock == false: the first process to execute the
TestAndSet instruction will get key == false, and all others have to wait. waiting[i] becomes
false only if another process leaving its critical section sets it so, and only one waiting[i] is
set to false at a time, which means mutual exclusion holds.
Since a process executing its CS either sets lock to false or waiting[j] to false this will
definitely allow a waiting process to enter its critical section. Hence Progress condition is verified.
The bounded waiting can be verified as follows: When a process leaves its CS it scans
through the array waiting in the cyclic order (i+1, i+2, …….n-1, 0, 1,…, i-1). The first entry in this
ordering whose waiting[j] == true will enter the critical section. Any process waiting to enter CS will
not have to wait for more than n-1 turns.
Hence all three requirements of critical section problem are verified. Hence this algorithm forms a
solution.
SEMAPHORES
The solutions to the CS problem discussed so far (i.e. TestAndSet and Swap) are not easy to
generalize to more complex problems, and they are complicated for application programmers to use.
Hence a synchronization tool called the semaphore is used.
Semaphores are high-level constructs used to synchronize concurrent processes.
First given by Dijkstra.
A semaphore S is an integer variable that, apart from initialisation, is accessed only through
two standard atomic operations: wait( ) and signal( ).
(Note: wait( ) and signal( ) were originally termed P and V, from the Dutch words proberen,
to test, and verhogen, to increment.)
Definition of wait:
wait(S) {
    while (S <= 0)
        ; // no-op (busy wait)
    S--;
}
Definition of signal:
signal(S) {
    S++;
}
Modifications to the integer value of the semaphore must be executed indivisibly,
i.e. no two processes can modify the same semaphore simultaneously.
Hence in wait( ) the two steps while (S <= 0); and S--; must be executed without interruption.
When a process is in its CS, any other process trying to enter its CS must loop continuously in
the entry code. This looping is a problem in a multiprogramming environment, where a single CPU
is shared among several processes. A semaphore whose waiters spin continuously in this way while
waiting for the lock is called a spinlock.
Disadvantage? Busy waiting wastes CPU cycles.
Advantage? No context switch is required when a process must wait on a lock. Context switch takes
time and is expensive. Hence when locks are to be held for a short time spinlocks are useful.
They are often employed in a multiprocessor system where one thread can spin on one processor
while another thread performs its critical section on another processor.
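As an illustrative sketch (assuming POSIX semaphores from <semaphore.h> and pthreads; the notes do not name an API), a semaphore initialised to 1 protects a critical section shared by two threads:
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

sem_t s;
long counter = 0;

void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        sem_wait(&s);     /* entry section */
        counter++;        /* critical section */
        sem_post(&s);     /* exit section */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    sem_init(&s, 0, 1);                   /* binary semaphore, initial value 1 */
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);   /* always 200000, thanks to the semaphore */
    sem_destroy(&s);
    return 0;
}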
Deadlock and Starvation:
Consider two processes P0 and P1, each needing the two semaphores S and Q, both initialised to 1:
P0: wait(S); wait(Q); ...
P1: wait(Q); wait(S); ...
Suppose P0 executes wait(S) and then P1 executes wait(Q). When P0 executes wait(Q), it must
wait until P1 executes signal(Q). Similarly, when P1 executes wait(S), it must wait until P0
executes signal(S). Since these signal operations can never execute, P0 and P1 are deadlocked.
Deadlock: We say a set of processes is deadlocked if every process in the set is waiting for an
event that can be caused only by another process (also waiting) in the set.
A problem related to deadlock is indefinite blocking, or starvation. Indefinite blocking may
occur if we add and remove processes from the list associated with the semaphore in LIFO order.
Binary Semaphores:
A binary semaphore is one whose integer value is restricted to the values 0 and 1.
Advantage: Simpler to implement than counting semaphore depending on the underlying h/w.
Drawbacks of semaphores:
A process that uses a semaphore has to know which other processes also use the semaphore,
and it may also have to know how they are using it.
The semaphore operations must be placed carefully inside a process. The omission of a P (wait)
or a V (signal) can result in inconsistencies or deadlock.
Programs using semaphores can be extremely hard to verify for correctness.
CLASSICAL PROBLEMS OF SYNCHRONIZATION
Readers-Writers Problem (using semaphores mutex and wrt):
Solutions to the two variations of this problem can result in starvation: in the first the writers
are starved, and in the second the readers are starved.
If a writer is in its CS and n readers are waiting, then one reader is queued on wrt and the
remaining n-1 readers are queued on mutex.
Dining-Philosophers Problem:
Solution:
Represent each chopstick by a semaphore.
A philosopher grabs a chopstick by executing the wait operation on it and releases it
by executing the signal operation.
The shared data is: semaphore chopstick[5]; where all the elements are initialised to 1.
Structure of philosopher i:
do {
    wait(chopstick[i]);
    wait(chopstick[(i + 1) % 5]);
    ...
    eat
    ...
    signal(chopstick[i]);
    signal(chopstick[(i + 1) % 5]);
    ...
    think
    ...
} while(1);
This guarantees that no two neighbours eat simultaneously, but it can deadlock if all five
philosophers become hungry at once and each grabs the left chopstick first.
MONITORS
Though semaphores are a convenient and effective mechanism for process synchronization,
using them incorrectly can result in timing errors that are difficult to detect.
These errors happen only if some particular execution sequences take place, and these
sequences occur rarely.
Also, when semaphores are used incorrectly, deadlocks may result.
To deal with such situations, researchers have developed the monitor type.
A monitor is a high-level synchronization construct.
It is characterized by set of user-defined operations.
The representation of monitor types is as follows:
Syntax:
monitor monitor-name
{
    shared variables declaration
    procedure body P1 (….) {
        …..
    }
    …
    procedure body Pn (….) {
        …..
    }
    initialization code (….) {
        …..
    }
}
The monitor construct ensures that only one process at a time can be active within a monitor.
Additional synchronization mechanisms are provided by the conditional construct.
A programmer who needs to define his own synchronization scheme can define one or more
variables of type condition:
condition x, y; Only wait and signal can be invoked on these variables, i.e. x.wait( ); and
x.signal( );
x.wait( ); the process invoking this operation is suspended until another process invokes x.signal( ).
x.signal( ); resumes exactly one suspended process; if no process is suspended, it has no effect.
Dining-Philosophers Solution Using Monitors:
Assume that a philosopher is allowed to pick up chopsticks only if both are available.
States of a philosopher: thinking, hungry and eating:
enum {thinking, hungry, eating} state[5];
Philosopher i can set state[i] = eating only if the two neighbours are not eating, i.e.
(state[(i+4) % 5] != eating) && (state[(i+1) % 5] != eating).
Condition to be declared: condition self[5]; where philosopher i delays itself when hungry but
unable to obtain both chopsticks.
Chopsticks: the distribution of the chopsticks is controlled by the monitor dp i.e.
dp.pickup(i);
……
eat
…..
dp.putdown(i);
monitor dp
{
enum{thinking, hungry, eating} state[5];
condition self[5];
void pickup(int i) {
    state[i] = hungry;
    test(i);
    if (state[i] != eating)
        self[i].wait( );
}
void putdown(int i) {
state[i] = thinking;
test((i+4) % 5);
test((i+1) % 5);
}
void test(int i) {
if ((state[(i+4) %5]!= eating ) && (state[i] == hungry) && (state[(i + 1)%5]
!=eating)) {
state[i] = eating;
self[i].signal();
}
}
void init( ) {
for(int i=0; i<5; i++)
state[i] = thinking;
}
}
Implementing a Monitor Using Semaphores
We now consider a possible implementation of the monitor mechanism using semaphores.
For each monitor, a semaphore mutex (initialized to 1) is provided.
A process must execute wait (mutex) before entering the monitor and must execute signal
(mutex) after leaving the monitor.
Since a signaling process must wait until the resumed process either leaves or waits, an
additional semaphore, next, is introduced, initialized to 0, on which the signaling processes
may suspend themselves.
An integer variable next_count is also provided to count the number of processes suspended
on next. Thus, each external procedure F is replaced by:
wait(mutex);
    ...
    body of F
    ...
if (next_count > 0)
    signal(next);
else
    signal(mutex);
Mutual exclusion within a monitor is ensured.
We can now describe how condition variables are implemented. For each condition x, we introduce
a semaphore x_sem and an integer variable x_count, both initialized to 0. The operation x.wait( )
can now be implemented as:
x_count++;
if (next_count > 0)
    signal(next);
else
    signal(mutex);
wait(x_sem);
x_count--;
The operation x.signal( ) can be implemented as:
if (x_count > 0) {
    next_count++;
    signal(x_sem);
    wait(next);
    next_count--;
}
This implementation is applicable to the definitions of monitors given by both Hoare and Brinch-
Hansen. In some cases, however, the generality of the implementation is unnecessary, and a
significant improvement in efficiency is possible.
A conditional-wait construct of the form x.wait(c); can also be provided, where c is an integer
expression that is evaluated when the wait( ) operation is executed. The value of c, which is
called a priority number, is then stored with the name of the process that is suspended. When
x.signal( ) is executed, the process with the smallest associated priority number is resumed next.
The same difficulties are encountered with the use of semaphores, and these difficulties are similar
in nature to those that encouraged us to develop the monitor constructs in the first place.
One possible solution to the current problem is to include the resource-access operations
within the ResourceAllocator monitor.
However, using this solution means that scheduling is done according to the built-in
monitor-scheduling algorithm rather than the one we have coded. To ensure that the
processes observe the appropriate sequences, we must inspect all the programs that make use
of the ResourceAllocator monitor and its managed resource.
We must check two conditions to establish the correctness of this system.
First, user processes must always make their calls on the monitor in a correct sequence.
Second, we must be sure that an uncooperative process does not simply ignore the mutual-
exclusion gateway provided by the monitor and try to access the shared resource directly,
without using the access protocols.
Only if these two conditions can be ensured can we guarantee that no time-dependent errors
will occur and that the scheduling algorithm will not be defeated. Although this inspection
may be possible for a small, static system, it is not reasonable for a large system or a dynamic
system.
DEAD LOCKS:
When a process requests resources and the resources are not available at that time, the process
enters a waiting state. A waiting process may never again change state if the resources it has
requested are held by other waiting processes. This situation is called deadlock.
SYSTEM MODEL
Resources in the system are finite, and several processes compete for them; hence the OS must
distribute them.
Processes request resources, and when a resource is not available they wait.
When the requested resource is held by another waiting process, the resulting situation may be
deadlock.
Resources:
o Finite.
o Partitioned to different types.
o Can be physical resources like printers, tape drives, CPU, memory space or logical
resources like files, semaphores and monitors.
o There can be any number of instances of a resource type.
o Ex. If there are two printers then there are two instances of the type printer.
o Each type may have several identical instances.
o If the allocation of any instance of a resource type will satisfy any request for that
type, then the instances are identical; otherwise they must be treated as separate types.
o Deadlocks can involve different resource types.
A system table records whether each resource is free or allocated and to which process.
Deadlock-A set of processes is in a deadlock state when every process in the set is waiting for an
event that can be caused only by another process in the set.
The events here are resource acquisition and release.
Consequence of deadlocks
Processes never finish execution.
System resources are tied up.
New jobs not allowed to start.
Performance degradation.
DEADLOCK CHARACTERIZATION
For a deadlock to occur, the following four conditions must hold simultaneously in
a system:
1. Mutual Exclusion:
There must be at least one resource that must be held in a non-sharable mode.
Allow one process to use the resource at a time.
If another process requests a resource when in use the requesting process must wait until
released.
2. Hold & Wait:
A process must be holding at least one resource and waiting to acquire additional
resources that are currently being held by other processes.
3. No Preemption:
Resources cannot be pre-empted. Resources have to be given up voluntarily after completing
the task.
4. Circular Wait:
A set of processes {P0, P1, P2, … Pn} of waiting processes must exist such that P0 is waiting
on a resource that is held by P1, P1 waiting on a resource held by P2, and Pn-1 is waiting on
a resource held by Pn and Pn is waiting on a resource held by P0.
Resource Allocation Graph:
They are directed graphs used to describe deadlocks precisely.
A graph consists of a set of vertices V and a set of edges E. V is partitioned into the set of
processes P = {P1, P2, ..., Pn} and the set of resource types R = {R1, R2, ..., Rm}. A directed
edge Pi -> Rj is a request edge; a directed edge Rj -> Pi is an assignment edge.
[Figure: an example resource allocation graph with processes P1, P2, P3 and resource types R1-R4]
If the resource allocation graph has no cycles then there is no deadlock in the system.
But if there is a cycle then there may be a deadlock.
If each resource has only one instance then a cycle implies that there is a deadlock. All
processes in the cycle are deadlocked. In this case a cycle is a necessary and sufficient
condition to prove a deadlock.
If each resource has more than one instance then a cycle in the graph does not necessarily
imply a deadlock. Here the cycle is necessary but not sufficient condition to imply deadlock.
Ex1.
[Figure: a resource allocation graph in which processes P1, P2, P3 and resources R1-R4 form a
cycle; the processes in the cycle are deadlocked]
Ex2.
[Figure: a resource allocation graph with a cycle P1 -> R1 -> P3 -> R2 -> P1 but no deadlock]
In this resource allocation graph there is a cycle P1-> R1-> P3-> R2-> P1. But this is not deadlock.
When P4 releases R2 then the resource can be allocated to P3 breaking the cycle.
METHODS FOR HANDLING DEADLOCKS
1. Ensure that the system never enters a deadlock state, via deadlock prevention or deadlock avoidance:
Deadlock prevention is a set of methods for ensuring that at least one of the necessary
conditions cannot hold.
These methods prevent deadlock by constraining how requests for resources can be made.
Deadlock avoidance requires that the OS be given, in advance, additional information
concerning which resources a process will request and use during its lifetime.
With this information the OS can decide, for each request, whether or not the process should wait.
To decide whether a request can be serviced or must be delayed, the following are
considered:
o Resources currently available.
o Resources currently allocated.
o Future requests and releases of each process.
2. Allow the system to enter deadlock, detect and recover from it.
An algorithm examines the state of the system to determine/detect whether a deadlock
has occurred.
Then an algorithm to recover from the deadlock is run.
3. Ignore the problem and pretend that deadlocks never occur in the system.
Does not ensure that deadlock does not occur.
No mechanism for detection and recovery.
No way of recognizing a deadlock even if it occurs.
It can result in deterioration of performance, because resources are being held by processes
that cannot run. Eventually the system will stop functioning and will have to be manually restarted.
Though it may not look like a viable approach, it is used in many OSs, e.g. UNIX.
DEADLOCK PREVENTION
Ensure that at least one of the four necessary conditions does not hold; this prevents
deadlock from happening.
Mutual Exclusion:
Resources can be:
Sharable:
o Ex. read-only files.
o Do not require mutually exclusive access.
o Simultaneous access is always granted; a process never needs to wait for a sharable resource.
Non-Sharable:
o Ex. a printer.
o Cannot be shared simultaneously by several processes.
o Mutual exclusion must hold for intrinsically non-sharable resources; hence deadlock cannot,
in general, be prevented by denying this condition.
Hold & Wait:
To ensure that this condition never holds, we must guarantee that whenever a process requests a
resource it does not hold any other resources. One protocol requires each process to request and
be allocated all its resources before it begins execution; another allows a process to request
resources only when it holds none. Both protocols suffer from low resource utilization and
possible starvation.
No Preemption:
Normally, resources already allocated cannot be preempted. To prevent deadlock via this
condition, we must ensure that preemption is possible.
Protocols:
1. If a process is holding some resources and requests other resources that cannot be granted
immediately, then all resources the process is currently holding are preempted, i.e. implicitly
released.
The preempted resources are added back to the list of resources for which the process is waiting.
The process will be restarted later, when it can regain its old resources as well as the new ones
it is requesting.
Circular Wait:
One way of ensuring that this does not happen is through total ordering of resource type and
each process request only in the increasing order of enumeration.
Let R = {R1, R2, R3…., Rm} be set of resource type.
We assign a unique integer number to each resource type so that we can compare two
resources and determine whether one precedes another.
Let F: R -> N be a one-to-one function, where N is the set of natural numbers. Ex. F(tape drive) =
1, F(printer) = 12 etc.
Protocol:
Each process can request resources only in increasing order of enumeration.
Initially a process can request any number of instances of a resource type Ri; a single request
must cover all the instances needed.
Thereafter, a request for instances of type Rj may be granted only if the instances are available
and F(Rj) > F(Ri) for every type Ri the process currently holds. Alternatively, when a process
requests an instance of Rj, it must first have released any resource Ri such that F(Ri) >= F(Rj).
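As an illustrative sketch of the ordering idea with two pthread mutexes (the names and the numbering are made up, not from the notes): every thread acquires first and then second, in the same global order, so no circular wait can form.
#include <pthread.h>

pthread_mutex_t first  = PTHREAD_MUTEX_INITIALIZER;   /* F(first)  = 1 */
pthread_mutex_t second = PTHREAD_MUTEX_INITIALIZER;   /* F(second) = 2 */

void *worker(void *arg) {
    /* every thread requests the resources in increasing order of F */
    pthread_mutex_lock(&first);
    pthread_mutex_lock(&second);
    /* ... use both resources ... */
    pthread_mutex_unlock(&second);
    pthread_mutex_unlock(&first);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}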
Conclusion:
Prevention of deadlock is made by restraining how the requests are made.
This restrain makes sure that at least one of the four conditions for deadlock cannot occur.
A side effect of preventing deadlocks in this way is low device utilization and hence reduced
system throughput.
DEADLOCK AVOIDANCE
Safe State:
The resource allocation state is defined by the number of available and allocated resources and
the maximum demands of the processes.
A state is safe if the system can allocate resources to each process (up to its maximum) in some
order and still avoid deadlock. A system is in a safe state only if there exists a safe sequence.
safe sequence- A sequence of processes <P1, P2, P3,….Pn> is a safe sequence for the current
allocation state if for each Pi the resource that Pi can still request can be satisfied by the currently
available resources plus resources held by all Pj with j<i.
If resources that Pi needs are currently not available then Pi waits until all Pj have finished.
When Pi uses the resources and later releases them then Pi+1 can use them.
[Figure: safe, unsafe and deadlock state spaces; the deadlock region lies inside the unsafe region]
A safe state is not a deadlock state. A deadlock state is an unsafe state, but not all unsafe
states are deadlocks.
The system can go from a safe state to an unsafe state.
Illustration:
Let a system have 12 magnetic tape drives and 3 processes P0, P1 and P2. P0 requires 10 tape
drives, P1 may need as many as 4, and P2 may need up to 9.
Suppose at time t0, P0 is holding 5 tape drives, P1 is holding 2 and P2 is holding 2. Hence
there are 3 free tape drives.
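Working through the numbers: at t0 the sequence <P1, P0, P2> is a safe sequence. P1 needs at most 2 more drives (2 <= 3 available); when it finishes and releases its 4 drives, 5 are available. P0 then needs at most 5 more (5 <= 5); when it finishes, 10 drives are available, which covers the up to 7 more that P2 may need. Hence the state at t0 is safe.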
Resource-Allocation-Graph Algorithm (single instance of each resource type):
[Figure: resource allocation graph for deadlock avoidance, with processes P1, P2 and resources R1, R2]
Consider the above figure:
Suppose P2 requests R2. Although R2 is free, it cannot be allocated, since that allocation would
create a cycle in the graph.
A cycle indicates an unsafe state: if P1 then requests R2 and P2 requests R1, a deadlock will occur.
[Figure: the resulting unsafe resource allocation graph after granting P2's request for R2]
Disadvantages:
Not applicable in a resource allocation system with multiple instances of each resource type.
Banker’s Algorithm:
Applicable for multiple instances of each resource type.
Name given because it could be used in a banking system to ensure that the bank never
allocates available cash such that it can no longer satisfy the needs of the customer.
The following data structures are used, where n is the number of processes and m the number of
resource types:
Available: a vector of length m indicating the number of available instances of each resource type.
Max: an n x m matrix defining the maximum demand of each process. If Max[i,j] = k, then process
Pi may request at most k instances of resource type Rj.
Allocation: an n x m matrix defining the number of resources of each type currently allocated to
each process.
If Allocation[i,j] = k, then process Pi is currently allocated k instances of resource type Rj.
Need: an n x m matrix indicating the remaining resource need of each process.
If Need[i,j] = k, then process Pi may need k more instances of resource type Rj to complete its task.
Need[i,j] = Max[i,j] - Allocation[i,j]
Disadvantage:
Less efficient than the resource allocation graph algorithm.
Safety Algorithm:
This algorithm is used to find whether a system is in a safe state.
1. Let Work and Finish be vectors of length m and n respectively.
   Initialise:
   Work := Available
   Finish[i] := false for i = 1, 2, 3, ..., n
2. Find an i such that both
   a. Finish[i] = false
   b. Needi <= Work
   If no such i exists, go to step 4.
3. Work := Work + Allocationi
   Finish[i] := true
   Go to step 2.
4. If Finish[i] = true for all i, then the system is in a safe state.
This algorithm may require on the order of m x n^2 operations.
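A compact sketch of the safety algorithm in C (NPROC, NRES and the function name are illustrative, not from the notes; the matrices would be filled with the system's actual state):
#include <stdbool.h>

#define NPROC 5
#define NRES  3

/* returns true if the state described by avail/alloc/need is safe */
bool is_safe(int avail[NRES], int alloc[NPROC][NRES], int need[NPROC][NRES]) {
    int work[NRES];
    bool finish[NPROC] = {false};
    for (int j = 0; j < NRES; j++) work[j] = avail[j];   /* step 1: Work := Available */

    for (int done = 0; done < NPROC; ) {
        bool found = false;
        for (int i = 0; i < NPROC; i++) {                /* step 2: find a runnable Pi */
            if (finish[i]) continue;
            bool ok = true;
            for (int j = 0; j < NRES; j++)
                if (need[i][j] > work[j]) { ok = false; break; }
            if (ok) {                                    /* step 3: Pi finishes, releases */
                for (int j = 0; j < NRES; j++) work[j] += alloc[i][j];
                finish[i] = true;
                found = true;
                done++;
            }
        }
        if (!found) return false;  /* step 4: some Finish[i] stays false: unsafe */
    }
    return true;                   /* step 4: all Finish[i] true: safe */
}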
DEADLOCK DETECTION
Single Instance of Each Resource Type: Wait-for Graph:
[Figure: a resource allocation graph for P1-P5 and its corresponding wait-for graph]
A wait-for graph is obtained from the resource allocation graph by removing the resource nodes
and collapsing the appropriate edges. In a wait-for graph, the edge Pi -> Pj implies that process
Pi is waiting for process Pj to release a resource it needs. This edge corresponds to two edges
in the resource allocation graph, namely Pi -> Rq and Rq -> Pj for some resource Rq.
When is it a deadlock?
If a cycle exists in a wait-for graph then a deadlock exists.
Detection?
Maintain the wait-for graph and periodically invoke an algorithm that searches for a cycle in the
graph.
Here cycle detection is O(n^2), where n is the number of vertices in the graph.
Several Instances of a Resource Type:
The detection algorithm uses Available, Allocation and a Request matrix, where Request[i,j] = k
means Pi is requesting k more instances of Rj. It proceeds like the safety algorithm:
1. Let Work := Available; for each i, Finish[i] := false if Allocationi != 0, else Finish[i] := true.
2. Find an i such that Finish[i] = false and Requesti <= Work. If no such i exists, go to step 4.
3. Work := Work + Allocationi; Finish[i] := true; go to step 2.
4. If Finish[i] = false for some i, 1 <= i <= n, then the system is in a deadlocked state.
Moreover, each process Pi with Finish[i] = false is deadlocked.
This algorithm has a complexity of O(m x n^2).
Illustration:
Process    Allocation    Request    Available
           A B C         A B C      A B C
P0         0 1 0         0 0 0      0 0 0
P1         2 0 0         2 0 2
P2         3 0 3         0 0 0
P3         2 1 1         1 0 0
P4         0 0 2         0 0 2
The sequence <P0, P2, P3, P1, P4> results in Finish[i] = true for all i. Hence there is no deadlock.
Suppose P2 makes one additional request for an instance of type C then the Request matrix will be
Process    Request
           A B C
P0         0 0 0
P1         2 0 2
P2         0 0 1
P3         1 0 0
P4         0 0 2
The system will now be deadlocked: although we can reclaim the resources held by P0, the
remaining requests cannot be satisfied, so processes P1, P2, P3 and P4 are deadlocked.
RECOVERY FROM DEADLOCK
I. Process Termination:
1. Abort all deadlocked processes:
Breaks the deadlock cycle.
Aborting them all is at great expense:
o Many of the processes may have executed for a long time.
o The results of the partial computations are wasted.
o They will probably have to be recomputed later.
2. Abort one process at a time until the deadlock cycle is terminated:
Abort a process and run deadlock detection algorithm.
Considerable overhead because of this.
Not easy to abort a process because:
o Say a process was in the midst of updating a file; terminating it will leave the file in an
incorrect state.
o If there is a set of processes in deadlock, then determining which process to terminate is
difficult.
Which to terminate? Several factors are used:
1. Apply economics and abort the one that will be a minimum cost.
2. See what the priority of the process is.
3. How long a process has computed and how much longer a process will compute before
completing its task.
4. How many and what type of resources the process has used.
5. How many more resources the process needs to complete.
6. How many processes need to be terminated.
7. Whether the process is interactive or batch.
II. Resource Preemption:
Successively preempt some resources from processes and give them to other processes until the
deadlock cycle is broken. Three issues must be addressed: selecting a victim (which resources and
processes to preempt), rollback (the preempted process must be rolled back to some safe state and
restarted from there), and starvation (the same process should not always be picked as the victim).
UNIT 3
Syllabus:
1. Background
2. Swapping
3. Contiguous Memory Allocation
4. Paging
5. Structure of Page Table
6. Segmentation
7. Virtual Memory Management: Background
8. Demand Paging
9. Copy on Write
10. Page Replacement
11. Allocation of frames
12. Allocating Kernel Memory.
1. Background
Memory:
Central to the operation of the computer system.
Memory is a large array of words or bytes, each with its own address.
The memory unit sees only a stream of addresses; it does not know how they are generated or
what they are used for (instructions or data).
The program counter (PC) holds the address in memory of the next instruction to be fetched.
Memory management is concerned with managing the primary memory.
Instruction execution:
Fetch the instruction from memory.
Decode and fetch operands if required.
Execute & store the results back in the memory.
Basic Hardware:
Address Binding:
Programs are stored on the secondary storage disks as binary executable files.
When the programs are to be executed they are brought in to the main memory and placed
within a process.
The collection of processes on the disk waiting to enter the main memory forms the input
queue.
One of the processes in the input queue is selected and loaded into main memory for execution.
During execution it fetches instructions and data from main memory; after the process
terminates, its memory space is reclaimed.
During execution the process will go through different steps and in each step the address is
represented in different ways.
In the source program, addresses are symbolic.
The compiler binds the symbolic addresses to re-locatable addresses.
The loader binds the re-locatable addresses to absolute addresses.
Binding of instructions and data can be done at any step along the way:
1. Compile time: If it is known at compile time where the process will reside in memory, then
absolute code can be generated. If the starting address changes, it is necessary to re-compile the
code.
2. Load time: If it is not known at compile time where the process will reside in memory, then the
compiler generates re-locatable code. In this case the final binding is delayed until load time.
3. Execution time: If the process can be moved during its execution from one memory segment to
another, then the binding must be delayed until run time. Special hardware is needed for this. Most
general-purpose operating systems use this method.
The address generated by the CPU is called logical address or virtual address.
The address seen by the memory unit i.e., the one loaded in to the memory register is called
the physical address.
The compile-time and load-time address-binding methods generate identical logical and physical
addresses.
The execution-time address-binding scheme results in differing logical and physical addresses.
Set of logical address space generated by the programs is the logical address space.
Set of physical address corresponding to these logical addresses is the physical address space.
The mapping of virtual address to physical address during run time is done by the hardware
device called memory management unit (MMU).
The base register is also called the re-location register.
The value in the re-location register is added to every address generated by a user process at
the time the address is sent to memory. For example, if the re-location register holds 14000, an
access by the user to location 346 is mapped to physical location 14346.
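A toy sketch of the MMU's relocation-plus-limit mapping, using the register values quoted later in the memory-protection section (100040 and 74600); the trap here is just a message:
#include <stdio.h>
#include <stdlib.h>

int relocation = 100040;   /* smallest physical address of the process */
int limit = 74600;         /* range of legal logical addresses */

int map(int logical) {
    if (logical >= limit) {             /* protection check against the limit register */
        fprintf(stderr, "trap: addressing error\n");
        exit(1);
    }
    return relocation + logical;        /* dynamic relocation by the base register */
}

int main(void) {
    printf("%d\n", map(346));           /* prints 100386 */
    return 0;
}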
Dynamic Loading:
For a process to be executed it should be loaded in to the physical memory. The size of the
process is limited to the size of the physical memory.
Dynamic loading is used to obtain better memory utilization.
In dynamic loading the routine or procedure will not be loaded until it is called.
Whenever a routine is called, the calling routine first checks whether the called routine is
already loaded. If it is not, the loader is invoked to load the desired routine into memory and
update the program's address tables to reflect this change; control is then passed to the newly
loaded routine.
Advantages:
Gives better memory utilization: an unused routine is never loaded.
Does not need special operating-system support.
This method is particularly useful when large amounts of code are needed to handle
infrequently occurring cases.
2. SWAPPING
Swapping is a technique of temporarily removing inactive programs from the memory of the
system.
A process can be swapped temporarily out of the memory to a backing store and then brought
back in to the memory for continuing the execution. This process is called swapping.
Eg: In a multiprogramming environment with round-robin CPU scheduling, whenever the time
quantum expires, the process that has just finished its quantum is swapped out and a new process is
swapped into memory for execution.
A variant of this swapping policy is used with priority-based scheduling: when a low-priority
process is executing and a higher-priority process arrives, the low-priority process is swapped
out and the high-priority process is allowed to execute. This is also called roll out, roll in.
Normally a process that is swapped out will be swapped back into the same memory space
that it occupied previously. Whether this restriction holds depends on the address binding:
If the binding is done at load time, then the process is moved to same memory location.
If the binding is done at run time, then the process is moved to different memory location.
This is because the physical address is computed during run time.
Swapping requires backing store and it should be large enough to accommodate the copies of
all memory images.
The system maintains a ready queue consisting of all the processes whose memory images
are on the backing store or in memory that are ready to run.
Swapping is constrained by other factors:
o To swap a process, it should be completely idle.
o A process may be waiting for an I/O operation. If the I/O is asynchronously accessing
the user memory for I/O buffers, then the process cannot be swapped.
3. CONTIGUOUS MEMORY ALLOCATION
One of the simplest methods for memory allocation is to divide memory into several fixed-size
partitions, where each partition contains exactly one process. The degree of multiprogramming
depends on the number of partitions.
In multiple partition method, when a partition is free, process is selected from the input
queue and is loaded in to free partition of memory.
When process terminates, the memory partition becomes available for another process.
Batch OS uses the fixed size partition scheme.
The OS keeps a table indicating which part of the memory is free and is occupied.
When the process enters the system it will be loaded in to the input queue. The OS keeps
track of the memory requirement of each process and the amount of memory available and
determines which process to allocate the memory.
When a process arrives and needs memory, the OS searches for a hole large enough for this
process; a hole is a block of available memory.
If the hole is too large it is split in to two. One part is allocated to the requesting process and
other is returned to the set of holes.
The set of holes are searched to determine which hole is best to allocate. There are three
strategies to select a free hole:
o First fit:-Allocates the first hole that is big enough. This algorithm scans memory from the
beginning and selects the first available block that is large enough to hold the process.
o Best fit:-It chooses the hole that is closest in size to the request, i.e., it allocates the
smallest hole that is big enough to hold the process.
o Worst fit:-It allocates the largest hole to the process request. It searches for the largest
hole in the entire list.
First fit and best fit are the most popular algorithms for dynamic memory allocation. First fit is
generally faster. Best fit must search the entire list to find the smallest hole that is large enough.
Worst fit reduces the rate of production of very small holes.
All these algorithms suffer from fragmentation.
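A minimal sketch of the hole-selection strategies, assuming free holes are kept in a simple array of (base, size) records; all names here are illustrative:

#include <stddef.h>

struct hole { size_t base, size; };

/* First fit: return the index of the first hole big enough, or -1.
   Best fit would instead scan the whole list for the smallest adequate
   hole; worst fit would pick the largest hole in the list. */
int first_fit(const struct hole *holes, int n, size_t request) {
    for (int i = 0; i < n; i++)
        if (holes[i].size >= request)
            return i;
    return -1;
}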
Memory Protection:
Memory protection means protecting the OS from user process and protecting process from
one another.
Memory protection is provided by using a re-location register, with a limit register.
The relocation register contains the value of the smallest physical address; the limit register
contains the range of logical addresses (e.g., relocation = 100040 and limit = 74600).
Each logical address must be less than the value in the limit register; the MMU then maps the
logical address dynamically by adding the value in the relocation register.
When the CPU scheduler selects a process for execution, the dispatcher loads the re-location
and limit register with correct values as a part of context switch.
Since every address generated by the CPU is checked against these registers, we can protect
the OS and other users' programs and data from being modified.
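A sketch of the check described above, using the example values from these notes (relocation = 100040, limit = 74600). Real MMUs perform this in hardware on every reference; this C fragment is only an illustration:

#include <stdint.h>
#include <stdio.h>

/* Returns the physical address, or "traps" (here: prints and returns 0)
   when the logical address is not less than the limit register. */
uint32_t translate(uint32_t logical, uint32_t relocation, uint32_t limit) {
    if (logical >= limit) {
        fprintf(stderr, "trap: addressing error at %u\n", logical);
        return 0;
    }
    return logical + relocation;
}

int main(void) {
    /* Logical address 100 maps to physical address 100140. */
    printf("%u\n", translate(100, 100040, 74600));
    return 0;
}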
Fragmentation:
Memory fragmentation can be of two types: Internal Fragmentation, External Fragmentation
In internal fragmentation there is wasted space internal to a partition, because the block of
data loaded is smaller than the partition. Eg:-If there is a block of 50kb and a process
requests 40kb, and the block is allocated to the process, then 10kb of memory is left unused
inside the partition.
External fragmentation exists when enough total memory space exists to satisfy a request but
it is not contiguous, i.e., storage is fragmented into a large number of small holes.
External Fragmentation may be either minor or a major problem.
One solution for overcoming external fragmentation is compaction. The goal is to move all
the free memory together to form one large block. Compaction is not always possible: if
relocation is static and done at load time, compaction cannot be done; it is possible only
if relocation is dynamic and done at execution time.
Another possible solution to the external fragmentation problem is to permit the logical address
space of a process to be non-contiguous, thus allowing the process to be allocated physical memory
whenever the latter is available.
4. PAGING
Paging is a memory management scheme that permits the physical address space of a process
to be non-contiguous. Support for paging is handled by hardware.
It is used to avoid external fragmentation.
Paging also avoids the considerable problem of fitting varying-sized memory chunks onto the
backing store: when some code or data residing in main memory needs to be swapped out,
space must be found on the backing store.
Basic Method:
Physical memory is broken in to fixed sized blocks called frames (f).
Logical memory is broken in to blocks of same size called pages (p).
When a process is to be executed its pages are loaded in to available frames from backing
store.
The backing store is also divided into fixed-sized blocks of the same size as the memory frames.
The following figure shows paging hardware:
Logical address generated by the CPU is divided in to two parts: page number (p) and page
offset (d).
The page number (p) is used as index to the page table. The page table contains base address
of each page in physical memory. This base address is combined with the page offset to
define the physical memory i.e., sent to the memory unit.
The page size is defined by the hardware. It is a power of 2, typically varying between 512
bytes and 16 MB per page.
If the size of the logical address space is 2^m addressing units and the page size is 2^n units,
then the high-order m-n bits of a logical address designate the page number and the n low-order
bits designate the page offset.
Eg:-To show how to map logical memory in to physical memory consider a page size of 4 bytes and
physical memory of 32 bytes (8 pages).
a. Logical address 0 is page 0, offset 0. Page 0 is in frame 5, so logical address 0 maps to
physical address 20 [(5*4) + 0].
b. Logical address 3 is page 0, offset 3, and maps to physical address 23 [(5*4) + 3].
c. Logical address 4 is page 1, offset 0, and page 1 is mapped to frame 6, so logical address 4
maps to physical address 24 [(6*4) + 0].
d. Logical address 13 is page 3, offset 1, and page 3 is mapped to frame 2, so logical address 13
maps to physical address 9 [(2*4) + 1].
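The mapping can be replayed in a few lines of C. Frame numbers for pages 0, 1 and 3 come from the example above; the entry chosen here for page 2 (frame 1) is an assumption, since the notes do not give it:

#include <stdio.h>

int main(void) {
    int page_table[4] = {5, 6, 1, 2};   /* page -> frame (page 2 assumed) */
    int page_size = 4;
    int addrs[4] = {0, 3, 4, 13};
    for (int i = 0; i < 4; i++) {
        int p = addrs[i] / page_size;   /* page number */
        int d = addrs[i] % page_size;   /* page offset */
        printf("logical %2d -> physical %2d\n",
               addrs[i], page_table[p] * page_size + d);
    }
    return 0;   /* prints 20, 23, 24 and 9, as computed above */
}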
Hardware Support for Paging:
The hardware implementation of the page table can be done in several ways:
1. The simplest method is to implement the page table as a set of dedicated registers. These
registers must be built with very high-speed logic to make the paging-address translation fast.
Every memory access must go through the paging map. The use of registers for the page table is
satisfactory only if the page table is small.
2. If the page table is large then the use of registers is not feasible. So the page table is kept in
main memory and a page-table base register (PTBR) points to the page table. Changing page
tables then requires changing only this one register, which reduces context-switch time. The
problem with this approach is the time required to access a memory location: to access location
[i], we first index the page table using the PTBR value plus the page number, which gives the
frame number; this is combined with the page offset to produce the actual address. Thus we need
two memory accesses for each byte.
3. The solution is to use a special, fast lookup hardware cache called the translation look-aside
buffer (TLB), or associative registers. The TLB is built from associative, high-speed memory.
Each entry contains two parts: a key and a value.
When an associative register is presented with an item, the item is compared with all keys
simultaneously; if a match is found, the corresponding value field is returned, so searching is fast.
TLB is used with the page table as follows:
TLB contains only few page table entries.
When a logical address is generated by the CPU, its page number is presented to the TLB. If
the page number is found (a TLB hit), its frame number is immediately available and is used
to access memory.
If the page number is not in the TLB (a TLB miss), a memory reference to the page table is
made. When the frame number is obtained, it is used to access memory, and the translation is
added to the TLB.
If the TLB is already full of entries, the OS must select one entry for replacement.
Each time a new page table is selected (for instance, on a context switch), the TLB must be
flushed (erased) to ensure that the next executing process does not use stale translations.
The percentage of time that a page number is found in the TLB is called the hit ratio.
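A sketch of the TLB lookup sequence just described, assuming a small table searched associatively (in hardware all keys are compared in parallel; the loop below only models that). The names and the trivial replacement policy are illustrative:

#define TLB_SIZE 16

struct tlb_entry { int page, frame, valid; };
static struct tlb_entry tlb[TLB_SIZE];

int tlb_translate(int page, const int *page_table) {
    for (int i = 0; i < TLB_SIZE; i++)
        if (tlb[i].valid && tlb[i].page == page)
            return tlb[i].frame;               /* TLB hit */
    /* TLB miss: consult the in-memory page table, then install the
       translation (here we simply overwrite slot 0). */
    int frame = page_table[page];
    tlb[0].page = page; tlb[0].frame = frame; tlb[0].valid = 1;
    return frame;
}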
Protection:
Memory protection in a paged environment is provided by protection bits associated with
each frame; these bits are kept in the page table.
One bit can define a page to be read-write or read-only.
To find the correct frame number, every reference to memory goes through the page table.
At the same time that the physical address is computed, the protection bits can be checked to
verify that no writes are being made to a read-only page.
Any attempt to write to a read-only page causes a hardware trap to the OS.
This approach can be used to provide protection to read-only, read-write or execute-only
pages.
One more bit is generally added to each entry in the page table: a valid-invalid bit.
A valid bit indicates that the associated page is in the process's logical address space and is
thus a legal (valid) page.
If the bit is set to invalid, the page is not in the process's logical address space and is
illegal. Illegal addresses are trapped using the valid-invalid bit.
The OS sets this bit for each page to allow or disallow accesses to that page.
b. Hashed page table: A hashed page table handles address spaces larger than 32 bits. The virtual
page number is used as the hash value. Each location in the hash table contains a linked list of
elements that hash to that location.
Each element in the hash table contains the following three fields:
Virtual page number
Mapped page frame value
Pointer to the next element in the linked list
Working:
The virtual page number is taken from the virtual address.
The virtual page number is hashed into the hash table.
The virtual page number is compared with the first element of the linked list.
If the values match, the mapped page-frame value is used to calculate the physical address.
If they do not match, the entire linked list is searched for a matching virtual page number.
Clustered page tables are similar to hashed page tables, except that each entry in the
hash table refers to several pages.
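A sketch of the structure just described: each bucket chains the elements that hash to it. The hash function and table size are assumptions:

struct hpt_entry {
    unsigned long vpn;        /* virtual page number */
    unsigned long frame;      /* mapped page frame value */
    struct hpt_entry *next;   /* pointer to the next element in the chain */
};

#define BUCKETS 1024
static struct hpt_entry *hpt[BUCKETS];

long hpt_lookup(unsigned long vpn) {
    struct hpt_entry *e = hpt[vpn % BUCKETS];  /* hash the VPN */
    for (; e != NULL; e = e->next)
        if (e->vpn == vpn)
            return (long) e->frame;            /* match: use this frame */
    return -1;                                 /* no mapping found */
}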
c. Inverted Page Tables: Since address spaces have grown to 64 bits, traditional page tables
become a problem: even with two-level page tables, the table can be too large to handle. An
inverted page table has one entry for each real page (frame) of memory. Each entry consists of
the virtual address of the page stored in that real memory location, with information about the
process that owns the page.
Each virtual address in the system is a triple <process-id, page number, offset>.
Each inverted page table entry is a pair <process-id, page number>. When a memory reference is
made, the part of the virtual address consisting of <process-id, page number> is presented to the
memory subsystem. The inverted page table is then searched for a match. If a match is found at
entry i, then the physical address <i, offset> is generated. If no match is found, an illegal address
access has been attempted. This scheme decreases the amount of memory needed to store each
page table, but it increases the time needed to search the table when a page reference occurs: if
the whole table must be searched, the search takes far too long.
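A sketch of the inverted-table search: one entry per physical frame, and a match at entry i yields physical address <i, offset>. Field names and sizes are illustrative:

#define NFRAMES 4096

struct ipt_entry { int pid; long page; };
static struct ipt_entry ipt[NFRAMES];

long ipt_search(int pid, long page) {
    for (long i = 0; i < NFRAMES; i++)   /* linear search, hence slow */
        if (ipt[i].pid == pid && ipt[i].page == page)
            return i;                    /* i is the frame number */
    return -1;                           /* illegal address access */
}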
Shared Pages: If code is reentrant it never changes during execution, so two or more processes
can execute the same code at the same time. Each process has its own copy of registers and its
own data storage, so the data of the two processes will differ.
• Only one copy of the editor is kept in physical memory. Each user's page table maps onto the
same physical copy of the editor, but the data pages are mapped onto different frames.
• So to support 40 users we need only one copy of the editor (150k) plus 40 copies of the 50k
data space, i.e., only 2150k instead of 8000k.
6. SEGMENTATION
What is the user’s view of memory?
[Figure: a user views a program as a collection of segments - main program, subroutine, sqrt, stack, symbol table]
7. VIRTUAL MEMORY
Virtual memory is a technique that allows the execution of processes/programs that may not be
completely in memory.
It is the technique of separating the user's logical memory from physical memory.
Advantage:
Programs can be larger than the main memory size.
Abstracts the main memory into an extremely large and uniform array of storage. OR allows an
extremely large virtual memory to be provided to the programmers even though a small physical
memory is available.
Separates the logical view as seen by the user from the physical view of memory.
Frees programmers from the concerns of memory storage limitations.
Allows processes to easily share files and address spaces.
Also provides efficient mechanism for process creation.
Disadvantage:
Complexity.
Cost.
Difficulty in implementation.
Decrease in performance if not used carefully.
Since each user program takes less physical memory more programs can run at the same time.
This increases CPU utilization and throughput.
8. DEMAND PAGING
In a paging system with swapping, processes residing on the disk are swapped into memory for
execution.
In demand paging, a page is not brought into memory until it is requested.
Demand paging uses a lazy swapper: one that never swaps a page into memory unless that page
will be needed.
Since a swapper that deals with individual pages is really a pager, the swapper here is referred
to as the pager.
The pager guesses what pages are required. So instead of bringing in the entire process only the
necessary pages are brought into the memory. This decreases the swap time and the amount of
memory used.
Hence demand paging is a technique of not bringing the page until required.
The important thing is to guess the pages right and page them in so that the processes execute the
resident pages and complete execution normally.
When the first instruction is to be fetched it refers to the non-memory resident page thus causing
a page fault.
The desired page is brought to the memory.
The process of page fault and paging continues till all the desired pages are paged in.
At this point there are no more page faults and this whole scheme is referred to as pure demand
paging.
The hardware support required for demand paging is the same as that for paging and swapping.
The items are:
1. Page Table: the table has the ability to have bits for marking them valid and invalid or special
protection bits value.
2. Secondary Memory/Swap Space: holds pages that are not present in the main memory. Should be
a reasonably high-speed disk.
Not all the above steps are encountered all the time.
In general the three major components of the page-fault service time is:
1. Service the page fault interrupt.
2. Read in the page.
3. Restart the process.
Typical values of the components of the page-fault service time:
The first and third tasks typically take 1 to 100 microseconds each.
The page-switch time is typically about 24 ms: a typical hard-disk latency of 8 ms, a seek
time of 15 ms, and a transfer time of 1 ms.
9. COPY ON WRITE
Demand paging is used when reading a file from disk into memory. fork() is used to create
a process, and it can initially bypass demand paging by using a technique called page sharing.
Page sharing provides rapid process creation and reduces the number of new pages that must
be allocated to the newly created process.
Copy-on-write technique initially allows the parent and the child to share the same pages.
These pages are marked as copy- on-write pages i.e., if either process writes to a shared page,
a copy of shared page is created.
Eg:-If a process P1 tries to modify a page containing a portion of the stack, the OS recognizes
it as a copy-on-write page, creates a copy of this page, and maps it into the address
space of the child process. So the child process modifies its copied page and not the page
belonging to the parent. The new pages are obtained from the pool of free pages.
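A minimal runnable illustration on a UNIX-like system: after fork() the kernel shares the parent's pages copy-on-write, and only the page the child writes to is actually copied (the copying itself is invisible to the program):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

static char buffer[4096] = "parent data";   /* shared COW page after fork */

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {                          /* child */
        strcpy(buffer, "child data");        /* this write triggers the copy */
        printf("child sees:  %s\n", buffer);
        exit(0);
    }
    wait(NULL);
    printf("parent sees: %s\n", buffer);     /* still "parent data" */
    return 0;
}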
Memory Mapping: Standard system calls such as open(), read() and write() are used for
sequential reads of a file, and virtual memory can be used for this I/O. Memory-mapping a file
allows a part of the virtual address space to be logically associated with the file; it is done by
mapping disk blocks to pages in memory.
Victim Page
The page that is swapped out of physical memory is called the victim page.
If no frames are free, two page transfers (one out and one in) are required. This doubles
the page-fault service time and increases the effective access time accordingly.
Each page or frame may have a dirty (modify) bit associated with the hardware. The modify
bit for a page is set by the hardware whenever any word or byte in the page is written into,
indicating that the page has been modified.
When we select a page for replacement, we examine its modify bit. If the bit is set, the page
has been modified since it was read in from the disk, and we must write it back to the disk.
If the bit is not set, the page has not been modified since it was read into memory.
Therefore, if the copy of the page on disk has not been overwritten, we can avoid writing the
memory page back to the disk, since it is already there. The same applies to read-only pages,
which can never be modified.
We must solve two major problems to implement demand paging: we must develop a frame-
allocation algorithm and a page-replacement algorithm. If we have multiple processes in
memory, we must decide how many frames to allocate to each process, and when page
replacement is required we must select the frames to be replaced.
Page replacement Algorithms
FIFO Algorithm:
This is the simplest page-replacement algorithm. A FIFO replacement algorithm associates
with each page the time when that page was brought into memory.
When a page is to be replaced, the oldest one is selected.
We replace the page at the head of the queue; when a page is brought into memory, we
insert it at the tail of the queue.
Example: Consider the following reference string with three frames, initially empty.
The first three references (7,0,1) cause page faults and are brought into the empty frames.
The next reference (2) replaces page 7, because page 7 was brought in first.
Since 0 is the next reference and 0 is already in memory, we have no page fault.
The next reference (3) results in page 0 being replaced, so the next reference to 0
causes a page fault. This continues until the end of the string. There are 15 faults altogether.
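The count can be replayed in C. The reference string below is the standard textbook string assumed to accompany this example (the notes do not reprint it); with three frames the program reports 15 faults:

#include <stdio.h>

int main(void) {
    int ref[] = {7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1};
    int frames[3] = {-1, -1, -1};
    int oldest = 0, faults = 0;
    for (int i = 0; i < 20; i++) {
        int hit = 0;
        for (int j = 0; j < 3; j++)
            if (frames[j] == ref[i]) hit = 1;
        if (!hit) {                        /* page fault */
            frames[oldest] = ref[i];       /* replace the oldest page */
            oldest = (oldest + 1) % 3;
            faults++;
        }
    }
    printf("page faults: %d\n", faults);   /* prints 15 */
    return 0;
}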
Belady's Anomaly
For some page-replacement algorithms, the page-fault rate may increase as the number of
allocated frames increases. The FIFO replacement algorithm can suffer from this problem.
Optimal Algorithm
The optimal page-replacement algorithm was devised mainly in response to Belady's Anomaly.
The optimal algorithm has the lowest page-fault rate of all algorithms.
An optimal page-replacement algorithm exists and has been called OPT.
Its working is simple: "Replace the page that will not be used for the longest period of
time." Example: consider the following reference string.
The first three references cause faults that fill the three empty frames.
The reference to page 2 replaces page 7, because 7 will not be used until reference 18,
whereas page 0 will be used at 5 and page 1 at 14.
With only 9 page faults, optimal replacement is much better than FIFO, which had 15
faults.
This algorithm is difficult to implement because it requires future knowledge of the reference
string.
LRU Approximation
A true LRU page-replacement algorithm must update the page-usage information on every
memory reference; if this updating is done in software, the cost increases greatly.
Hardware LRU mechanisms, on the other hand, tend to degrade execution performance and
substantially increase hardware cost. For this reason, simple and efficient algorithms that
approximate LRU have been developed, using hardware support in the form of a reference bit.
A reference bit is associated with each memory frame and is automatically set to 1 by the
hardware whenever the page is referenced. A single reference bit per frame can be used to
approximate LRU removal: the page-removal software periodically resets all reference bits to 0,
while the execution of user jobs causes some reference bits to be set back to 1.
If the reference bit is 0 then the page has not been referenced since the last time the reference
bit was set to 0.
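A sketch of this approximation, with illustrative names: the hardware sets ref_bit on every access, the sweep below clears it periodically, and any frame whose bit is still 0 at replacement time has not been used since the last sweep:

struct frame { int page; unsigned ref_bit; };

void sweep(struct frame *frames, int n) {
    for (int i = 0; i < n; i++)
        frames[i].ref_bit = 0;    /* periodic software reset */
}

int pick_victim(const struct frame *frames, int n) {
    for (int i = 0; i < n; i++)
        if (frames[i].ref_bit == 0)
            return i;             /* not referenced since last sweep */
    return 0;                     /* all referenced: fall back to frame 0 */
}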
11.ALLOCATION OF FRAMES
The allocation policy in a virtual memory controls the operating system decision regarding
the amount of real memory to be allocated to each active process.
In a paging system if more real pages are allocated, it reduces the page fault frequency and
improved turnaround throughput.
If too few pages are allocated to a process its page fault frequency and turnaround times may
deteriorate to unacceptable levels.
The minimum number of frames per process is defined by the architecture; the maximum is
defined by the amount of available physical memory.
Equal Allocation: Split the m available frames among the n processes so that each process
gets an equal share, m/n frames.
Proportional Allocation: Allocate the memory according to the size of each process.
Let si be the size of the logical memory for process pi, and let the total number of frames be m.
Let S = the sum of all si.
If we allocate ai frames to process pi, then
ai = (si / S) x m
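A worked instance (the numbers are illustrative, following the usual textbook example): with m = 62 free frames and two processes of sizes s1 = 10 and s2 = 127, S = 137, so a1 = (10/137) x 62 ≈ 4 frames and a2 = (127/137) x 62 ≈ 57 frames.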
With multiple processes competing for frames, we can classify page replacement into two
broad categories: a) Local replacement: requires that each process select frames from only
its own set of allocated frames. b) Global replacement: allows a process to select a frame
from the set of all frames; even if a frame is currently allocated to some other process, one
process can take a frame from another. With local replacement the number of frames allocated
to a process does not change, whereas with global replacement it may. Global replacement
generally results in greater system throughput.
Slab Allocation (an alternate strategy for kernel memory allocation):
A slab is one or more physically contiguous pages.
Cache consists of one or more slabs.
Single cache for each unique kernel data structure.
Each cache filled with objects – instantiations of the data structure.
When cache created, filled with objects marked as free.
When structures stored, objects marked as used.
If slab is full of used objects, next object allocated from empty slab.
If no empty slabs, new slab allocated.
Benefits include no fragmentation, fast memory request satisfaction.
Syllabus:
1. File Concept.
2. Access Methods
3. Directory Structure
4. File System Mounting
5. File Sharing
6. Protection
7. Implementing File Systems: File System Structure
8. File System Implementation
9. Directory Implementation
10. Allocation Methods
11. Free Space Management
12. Efficiency and Performance
13. Recovery
FILE SYSTEM
The file system is the most visible part of the OS. It has two parts: a collection of files, each
storing related data, and a directory structure that organizes the information about all the files.
Some systems have a third component, partitions, used to logically separate large collections
of directories.
1. File Concept
A file is a named collection of related information recorded on the storage media. The
information stored either on disk, tape, optical media or any other storage device require
uniform logical view of the information. This view is provided by the file system.
According to the type of file it has a defined structure.
o Text File: sequence of characters organized into lines or pages.
o Source File: sequence of subroutines and functions each of which is organized into
declarations and executable statements.
o Object File: is a sequence of bytes organized into blocks understandable by the system’s
linker.
o Executable File: is a series of code section (binary) that the loader can bring into memory
and execute.
A file has a name for convenience. Some systems are case-sensitive in file naming and some
are not.
File attributes
A file has attributes which vary from system to system.
Some of the attributes are:
o Name: symbolic file name kept in human readable form.
o Identifier: is a unique tag which is usually a number and identifies the file within the file
system and is in the non-human readable format.
o Type: depicts the different file types supported by the system.
o Location: it is pointer to the device and to the location of the file on that device.
o Size: indicates the current size of the file in bytes or words or blocks and also possibly
the maximum allowed size.
o Protection: represents the access control information like read/write/execute.
o Time, date, and user identification: record the creation time, last modification and last use.
Useful for usage monitoring, protection and security.
File operations
Creating a file:
o space must be found for the file
o a new file entry in the directory must be made.
Writing a file:
o System call is used to specify the name of the file and the information to be written.
o With the file name the system searches directory for the file location.
o A write pointer indicates the location from where writing is to be done in the file.
o The current operation location is kept in the current-file-position pointer.
Reading a file:
o System call is used to specify the name. With the name the directory is searched for the
file location.
o Read pointer points to the location from where reading is to be done.
o The current operation location is kept in the current-file-position pointer.
Repositioning within a file:
o Set the current-file-position pointer to a given value.
o This file operation is known as seek.
Deleting a file:
o Search the directory for the file location with the name as index.
o Release the file space for reuse.
Truncating a file:
o Erase desired contents of the file but still keep its attributes.
Append: add new information at the end of the existing file.
Renaming: giving a different name to the existing file.
Copy: creating a new file, reading from the old file and writing those contents into the new
one.
Open: Open system call issued to bring up the file.
o Requires the file name and searches the directory for the location.
o Accepts access mode information create, read, read-write etc.
o Makes an entry in the open file table.
o Open file returns a pointer to the entry in the open-file table.
o Some systems open the file implicitly as soon as the file is referenced. Some systems
require that the user explicitly open the file.
o Close: when the file is no longer required the entry is removed from the open-file table.
The open-file table keeps information about all opened files. This is to avoid constant search
through the directory.
UNIX implementation:
o In UNIX the open and close are more complicated because it operates in a multiuser
environment. Here several users may open the file simultaneously.
o Two level internal tables are maintained:
A per-process table: keeps track of all the files opened by the process. Access
rights, accounting information and read/write pointers are also kept in this table. Each entry
points to the system-wide table.
System-wide table: contains process-independent information such as the location of the file
on disk, access dates and file size. It has an open count associated with each file,
indicating the number of processes that have the file open. A close of the file
decreases the count and an open of the file increases the count.
Information associated with opening a file
o File Pointer: if systems do not include offset as part of read and write system calls to
track the read and write location the current file position pointer is used. The pointer is
unique to each process.
o File Open Count: tracks the number of opens on the file; when the counter reaches
zero after the last close, the table entry can be removed.
o Disk Location of the File: the information needed to locate the file on disk is kept in
memory to avoid having to read it from disk for each operation.
o Access Rights: per process access modes are stored.
File Locks:
Some OS provide a locking mechanism for open files. File locks are used when the file is shared
by several processes.
Ex. a system log file can be accessed and modified by a number of processes in the system.
A shared lock is a reader lock which can be acquired by several processes concurrently. An
exclusive lock is like a writer lock. Only one process at a time can acquire it. Systems can
provide either mandatory or advisory file locking mechanisms.
Mandatory file lock:
If a lock is mandatory, then once a process acquires an exclusive lock the OS will prevent any
other process from accessing the locked file. In this mechanism the OS ensures locking
integrity.
Ex. Windows OS.
Advisory file lock:
In this case the OS will not prevent another process from accessing a locked file. It is the
responsibility of the software developers to ensure that the locks are appropriately acquired and
released.
Ex. UNIX OS.
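A minimal sketch of advisory locking using the POSIX flock() call (the file name system.log is illustrative). LOCK_SH requests a shared (reader) lock; LOCK_EX would request an exclusive (writer) lock. Other processes are constrained only if they also use flock():

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/file.h>

int main(void) {
    int fd = open("system.log", O_RDONLY);
    if (fd < 0) return 1;
    if (flock(fd, LOCK_SH) == 0) {   /* acquire shared (reader) lock */
        /* ... read the shared file ... */
        flock(fd, LOCK_UN);          /* release the lock */
    }
    close(fd);
    return 0;
}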
File types
The common technique for implementing the file types is to include it as a part of the file
name. The name is split into name and extension separated by a period.
File Structures
The type of the file may indicate the structure of the file. Source and object files have structures
that are suited to programs that call them. Some files have structures that are understood by the
OS. Some have system-supported file structures with sets of special operations for users to
manipulate them. This means several file structures may need to be supported.
Disadvantage:
The OS has to support several file structures.
Complexity of the OS increases.
New applications also face problems if the OS requires information about structures not
supported by it.
Solution:
Some OS restrict to minimum number of structures.
Ex. In UNIX and MS-DOS each file is a sequence of 8bit bytes and no interpretation of these
bits is made by the OS. Each application must include code to interpret the file to the appropriate
structure. But all OS must support at least one structure. This scheme gives more flexibility.
Macintosh supports limited structures. It expects the file to contain two parts resource fork and a
data fork. The resource fork contains information of interest to the user. The OS provides tools
for allowing modification to data in resource fork. Ex. wanting to relabel the button in own
language. The data fork contains program code or data namely traditional file contents.
2. Access Methods
The information stored in files has to be accessed and read into computer memory. The access
can be done in several ways. Some systems provide only one access method, while others
(such as IBM systems) support many access methods.
Sequential Access:
Sequential access is the simplest access method. Based on tape model of a file.
Information in the file is processed in order; one record after another.
Most common method.
Ex. editors and compilers access file by this method.
During the read operation (read next) after reading one portion the file pointer automatically
moves to the next portion in order.
Write (write next) appends at the end of the file and advances the pointer to the end.
Advantage: Simple
Disadvantage: No random access.
3. Directory Structure
The number of files on disks today is very large. Need to organize them.
The organization is done in two parts.
First the disk is split into one or more partitions, also called minidisks or volumes. Each
partition can be treated as a separate storage device. Partitions can also be grouped to build
larger volumes.
Second each partition contains information about files within. This information is kept in
device directory/volume table of contents.
The directory records name, location, size and type for all files on that partition.
Directory overview
Different operations are performed on the directories:
o Search for a file: Search the directory for the entry of a particular file through its symbolic
name. We should also be able to find all files whose names match a particular pattern.
o Create a file: New files need to be created and added to the directory.
o Delete a file: When a file is no longer needed, it has to be removed from the directory.
o List a directory: Need to list the entries in the directory.
o Rename a file: Change the name of the file. Renaming may allow the position of the file in
the directory structure to be changed.
o Traverse the file system: Trying to access every directory and every file in the directory.
Single-Level Directory
Simplest, easy to support and understand.
All files of all users are contained in the same directory.
The problem is that when the number of entries increases all files in the directory must have
unique names. It becomes difficult to remember all file names as the number of files increases. It
also leads to confusion of file names between different users.
Two-Level Directory
Create a separate directory for each user called the user file directory (UFD). Each of the UFD
has similar structure but lists only the files of a single user.
The master file directory (MFD) is indexed by user name or account number has each entry
pointing to the UFD of that user. Several UFDs may have files of the same name, but within
UFD it has to be unique.
When a file is to be created the OS searches only that user's UFD. Similarly, to delete a file,
only that user's UFD is searched. The user directories themselves can also be created and deleted.
Advantage:
The two level directory solves the name collision problem.
Isolates one user from the other.
Disadvantage:
Isolation has advantage but when cooperation between users on some file is required it is not
possible.
Some systems do not allow this sharing at all.
To name that file both the user name and the file name must be given.
A two level directory is a tree structure of height 2. Root of the tree is MFD and the children are
the UFDs. The descendents of the UFDs are the files.
To specify a file we need to specify the file name from the root to the leaf. This is the path.
Every file in the system has a path name.
Ex. the programs like loaders, compilers, utility routines and libraries are defined as files. When a
command comes these files are read by the loader and executed. The file name is searched in the
local UFD. This will require a copy of the system files in each UFD. That is waste of space. Instead
a special user directory is created to contain all system files. So when a file name is given to be
loaded first the UFD is searched. If not present then the system searches the special user directory.
This sequence of directories searched is called as search path.
Tree-Structured Directories
The two level directory or the two level tree can be extended to a tree of arbitrary length. This
allows the creation of sub directories. Files can be organized here. Most common approach.
Ex. MS DOS.
There is a root directory. A directory has files or sub directories.
Does not allow sharing of files and directories between users. Creating separate subdirectories
helps in better structure and also to put related files together. Ex. the subdirectory programs may
contain source programs, subdirectory bin may contain all binaries.
Every file has a unique path from the root, through the subdirectories, to the file. One bit in
each entry is used to indicate whether it is a file or a directory.
Special system calls for creating and deleting directories.
Each user has current directory where all the files of current interest are present.
When a file name is given the current directory is searched else if not here the entire path name
has to be specified.
The path names can be absolute: starts from the root and follows down to the specified file, or
relative: which defines the path from the current directory.
When a directory is to be deleted and there are still entries in it, two approaches are followed:
Systems like MS-DOS will not delete a directory unless it is empty, so the user must first
delete every entry in it; this may mean considerable work.
Alternatively if a delete directory command is given then all entries are also deleted. Like in
UNIX rm command. This is a convenient policy but dangerous.
With tree structure users can access other user files also by specifying either absolute or relative
path names.
Acyclic-Graph Directories
It allows directories to have shared subdirectories and files.
Same file or directory may be in two different directories.
A graph with no cycles is a generalization of the tree structure subdirectories scheme.
Shared files and subdirectories can be implemented by using links. A link is a pointer to
another file or a subdirectory. A link is implemented as absolute or relative path.
An acyclic-graph directory structure is more flexible than a simple tree structure, but it is
also more complex.
Advantage of acyclic:
Simplicity of the algorithms to traverse the graph.
More flexible.
Disadvantage of acyclic:
More complex than tree structure.
Problem with acyclic method is ensuring that there are no cycles.
Avoiding traversing shared sections of the acyclic graph twice for performance reasons.
Files may have multiple absolute path names, so distinct file names may refer to the same
file. This is similar to the aliasing problem. We therefore have to make sure the same file is
not processed twice, say when traversing the whole file system to collect statistics.
Another problem is deletion of shared files.
o Solution1: remove the file whenever someone deletes it. But this will lead to dangling
pointers.
o Solution2: using a link. Deletion of a link does not affect the original file. But if file entry
is deleted then the links will be dangling. We can keep track of the associated links also
and remove them but it is expensive.
o OR we can leave the links as it is and when referenced later determine that the file does
not exist and treat it as an illegal file name. In UNIX it is implemented in this manner. It
is up to the user to realize that the original file is gone.
Solution3: deletion of the file is left pending till all references to it are deleted. Keep a list of
all references to the file. When the file reference list is empty the file can be deleted. Problem
here is the potential large size of the list. Hence we keep only the count of number of
references. When count is 0 the file is deleted. The UNIX system uses this approach for
nonsymbolic links called as hard links by keeping the reference count in the file information
block.
When a two level directory is taken and allow users to create directories under them a tree
structure results. It is simple. Just add directories at the leaves of the tree.
When links are added to the tree structure the tree structure is destroyed and a graph is obtained.
4. File System Mounting
[Figure: file-system mounting - an unmounted volume is attached to the directory tree at a mount point (directories shown: /, users, sue, jane, prog, doc, help)]
Ex. In Macintosh whenever the system encounters a disk for the first time the file system on the
device is searched for. If there is one the OS automatically mounts the file system to the root and
adds a folder icon on the screen labelled with the same name as the file system. The users will be
able to click on the icon and display the newly mounted file.
In Microsoft Windows the OS maintains an extended two level directory structure with devices and
partitions assigned a drive letter, then is the path and then the file name. Like drive-
letter:\path\to\file. The OS automatically discovers all devices and mount all located file systems at
boot time.
In UNIX the mount commands are explicit. A system configuration file contains a list of devices and
mount points for automatic mounting at boot time.
5. File Sharing
Need for file sharing is to help users to collaborate and reduce the effort required to achieve the goal.
It has several inherent difficulties but may be required at many instances.
Multiple users
When multiple users share files, file sharing, file naming and file protection become important
issues.
When multiple-user sharing is supported by the system, the system must mediate file sharing.
Access to other users' files may be granted by default, or access may have to be explicitly
granted.
For sharing more file attributes are required.
To incorporate sharing and protection the concept of owner and group is used. The owner can
change file attributes (perform all operations on files) whereas a group defines a subset of users
who can share access to files (perform a subset of operations). The owner and group id are stored
with the file attributes.
When a user requests an operation on a file the userid is compared with the owner attribute to
determine if the requesting user is the owner of the file. Then they can also be compared to group
ids. This comparison will indicate the operations that can be performed.
Remote File Systems: Networking allows the sharing of files across machines. Three methods
have evolved:
1. Manually transfer files between machines via programs like ftp. ftp supports both anonymous
and authenticated access: with anonymous access, files can be transferred even without having
an account on the remote machine.
2. Use DFS (distributed file system) where remote directories are visible from a local machine.
3. Use of www: a browser is needed to gain access to the remote files and then separate operations
are used to transfer files.
Consistency Semantics:
It is an important criterion for evaluating any file system that supports file sharing.
The semantics specify how the multiple users of the system are to access shared files.
The semantics are implemented as code in the file system.
They specify when the modifications of data by one user will be observable by the other users.
UNIX Semantics:
Writes to an open file by a user are visible immediately to other users that have this file open.
Advancing the current location pointer by one affects all sharing users.
Here the file has a single image that interleaves all accesses.
Session Semantics:
The AFS (Andrew file system) uses this semantics.
Writes to an open file are not immediately visible to other users who have the same file opened.
Once the file is closed the changes made to it are visible only in sessions starting later. Already
opened instances of the file do not reflect the changes.
6. Protection
Protection can be provided in many ways.
Types of Access
Access is permitted or denied based on the factors like
Read. Read from the file.
Write. Write or rewrite the file.
Execute. Load the file into memory and execute it.
Append. Write new information at the end of the file.
Delete. Delete the file and free its space for possible reuse.
List. List the name and attributes of the file.
Access control can be done through an access-control list (ACL), which specifies each user's
name and the types of access allowed. Its disadvantage is the length of the list.
To reduce the length of the list many systems use three classifications on users like
owner/group/universe.
Other protection mechanism is to associate a password.
[Figure: layered file-system design, showing the file-organization module above I/O control and the devices]
In-memory structures:
In-memory mount table: contains information about each mounted volume.
In-memory directory structure cache: holds the directory information of recently accessed
directories.
The system wide open file table: has a copy of FCB of each open file.
The per process open file table: contains pointer to the appropriate entry in the system wide open
file table.
When a new file is to be created the logical file system is called. It allocates a new FCB. Then the
appropriate directory is read in to memory, updated with a new name and then written back to the
disk.
FCB:
File permissions
File dates (create, access, write)
File owner, group, ACL
File size
File data blocks or pointers to file data blocks
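An FCB might be sketched in C as below; the field types and the fixed block-pointer array are assumptions for illustration, not a description of any particular OS:

#include <sys/types.h>
#include <time.h>

struct fcb {
    mode_t permissions;                 /* file permissions */
    time_t created, accessed, written;  /* file dates */
    uid_t  owner;                       /* file owner */
    gid_t  group;                       /* group (ACL omitted here) */
    off_t  size;                        /* file size */
    long   blocks[12];                  /* data blocks, or pointers to them */
};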
A virtual file system (VFS), or virtual file system switch, is an abstraction layer on top of a more
concrete file system. The purpose of a VFS is to allow client applications to access different types
of concrete file systems in a uniform way. A VFS can, for example, be used to access local and
network storage devices transparently, without the client application noticing the difference.
The VFS distinguishes local files from remote ones, and local files are further distinguished
according to their file-system types. The VFS activates file-system-specific operations to handle
local requests according to their file types.
9. Directory Implementation
Linear List:
The directory is implemented using a linear list of file names.
To create a new file we search the directory to be sure that there is no name collision. Then the
new entry is added at the end of the directory.
To delete a file the directory is searched for the file name then the space allocated is released.
To reuse the directory entry the following can be done:
o Mark the entry as unused.
o Attach it to the list of free directory entries.
o Copy the last entry of the directory into the freed location.
Advantage: Simplest method.
Disadvantage: Linear search used which can be time consuming. It may have noticeable slow
access time.
Solution: use binary search on a sorted list or use a cache.
Hash Table:
This is another data structure used for implementing directories.
The hash table takes the value computed from the file name and returns a pointer to the file name
in the linear list.
This greatly decreases the search time.
Care must be taken to avoid collisions: where two file names hash to the same location.
Another problem is the size of the hash table and corresponding function that is to be used.
A chained-overflow hash table can be used: each hash entry is a linked list, and collisions are
resolved by adding the new entry to the linked list.
10. Allocation Methods
Contiguous Allocation:
If a file is n blocks long and starts at location b, then it occupies blocks b, b+1,
b+2, ..., b+n-1.
The file-allocation table entry for each file indicates the address of the starting block and the
length of the area allocated for this file. Contiguous allocation is the best from the point of
view of an individual sequential file.
It is easy to retrieve a single block. Multiple blocks can be brought in one at a time to
improve I/O performance for sequential processing. Sequential and direct access can be
supported by contiguous allocation.
Contiguous allocation algorithm suffers from external fragmentation.
Another problem with contiguous allocation is determining how much space is needed for a
file. When the file is created, the total amount of space it will need must be found and
allocated.
Characteristics:
Supports variable-size portions.
Pre-allocation is required.
Requires only single entry for a file.
Allocation frequency is only once.
Advantages:
Supports variable-size portions.
Easy to retrieve single block.
Accessing a file is easy.
It provides good performance.
Disadvantage:
Pre-allocation is required.
It suffers from external fragmentation.
Linked Allocation
It solves the problem of external fragmentation. This allocation is on the basis of an
individual block. Each block contains a pointer to the next block in the chain. The disk block can be
scattered anywhere on the disk. For example, a file of five blocks might start at block 9 and continue
at block 16, then block 1, then block 10, and finally block 25. Each block contains a pointer to the
next block. These pointers are not made available to the user. Thus, if each block is 512 bytes in
size, and a disk address (the pointer) requires 4 bytes, then the user sees blocks of 508 bytes. The
directory contains a pointer to the first and the last blocks of the file.
There is no external fragmentation since only one block is needed at a time. The size of a file need not be
declared when it is created. A file can continue to grow as long as free blocks are available.
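Reading the i-th block of a file under linked allocation means chasing i pointers from the first block. A sketch, using the example chain above (9 -> 16 -> 1 -> 10 -> 25); the in-memory next[] array stands in for the pointer stored in each disk block:

/* Return the disk block holding the i-th block of the file. */
int nth_block(int start, int i, const int *next) {
    int b = start;
    while (i-- > 0)
        b = next[b];   /* follow the pointer stored in each block */
    return b;
}
/* With next[9]=16, next[16]=1, next[1]=10, next[10]=25,
   nth_block(9, 3, next) returns 10 -- direct access costs i reads. */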
Advantages:
No external fragmentation.
Compaction is never required.
Pre-allocation is not required.
Disadvantage:
Files are accessed sequentially.
Space required for pointers.
Reliability is not good.
Cannot support direct access.
Indexed Allocation
The file allocation table contains a separate one level index for each file. The index has one entry
for each portion allocated to the file. The i th entry in the index block points to the i th block of the file.
The indexes are not stored as part of the file-allocation table; rather, the index is kept in a separate
block, and the entry in the file-allocation table points to that block. Allocation can be made in
either fixed-size or variable-size blocks. When the file is created, all pointers in the index block are
set to nil. When an entry is made, a block is obtained from the free-space manager. Allocation by
fixed-size blocks eliminates external fragmentation, whereas allocation by variable-size blocks
improves locality. Indexed allocation supports both direct access and sequential access to the file.
Advantages:
Supports both sequential and direct access.
No external fragmentation. Faster than the other two methods.
Supports fixed size and variable sized blocks.
Disadvantage:
Suffers from wasted space.
Pointer overhead is generally greater than with linked allocation.
Syllabus:
1. Mass storage structures
2. Disk structure
3. Disk Attachment
4. Disk Scheduling Methods
5. Disk management
6. Swap-Space Management.
7. Protection: Goals of protection
8. Principles of protection
9. Domain of protection
10. Access matrix
11. Implementation of access matrix
12. Access control
13. Revocation of access rights
14. Capability-Based systems.
Secondary-Storage Structure
Below the file-system structures sit the secondary and tertiary storage devices.
Disks provide the bulk of storage in today's machines. Conceptually they are simple.
Each disk platter has a flat circular shape. Platter diameters range from 1.8 to 5.25 inches.
The two surfaces of a platter are covered with magnetic material. A read-write head flies
just above each surface of every platter. The heads are attached to a disk arm that moves all
the heads as a unit.
The surface of the platter is logically divided into circular tracks. Tracks are divided into sectors.
There are hundreds of sectors in each track. Set of tracks makes up the cylinder. There are many
concentric cylinders in a disk drive.
When the disk is in use the drive motor spins at high speed. Most drives rotate at 60 to 200 times
per second.
Disk speed has two parts:
Transfer rate: is the rate at which the data flow between the drive and the computer;
typical rates are several mega bytes per second.
Positioning time: also called as random access time or seek time: is the time to move the
disk arm to the desired cylinder and can be several milliseconds.
The time for the desired sector to rotate to the disk head is called rotational latency.
Since the head flies on an extremely thin cushion of air (measured in microns), there is a
danger that the head will make contact with the disk surface, thereby damaging the magnetic
surface. This accident is called a head crash.
Disks can be Removable: allows different disks to be mounted as needed. They generally consist
of one platter held in a plastic case to prevent damage.
Floppy Disks: inexpensive removable magnetic disk, which have a soft plastic case containing a
flexible platter. The head generally sits on the disk surface. So they rotate more slowly than hard
disk. Storage capacity is 1.44MB or more.
A disk drive is attached to the computer by a set of wires called an I/O bus. Different types of buses are:
EIDE: Enhanced Integrated Drive Electronics
ATA: Advanced Technology Attachment
SATA: serial ATA
USB: Universal Serial Bus
FC: Fiber Channel and
SCSI: Small Computer System Interface.
The data transfers on a bus is carried out by special electronic processors called controllers. The
host controller is the controller at the computer end and the disk controller is built into each disk
drive. To perform a disk I/O the computer places a command into the host controller using
memory-mapped I/O ports. The host controller then sends the command via messages to the disk
controller and the disk controller operates the disk drive to carry out the command.
Magnetic Tapes:
Tapes act mainly as backup for infrequently used information. For storing large quantities of data
that is too big for the disk.
As a medium for transferring information from one medium to another. But they are not used for
normal working of the machine. Since they are very slow.
It is kept in a spool and is wound or rewound past a read-write head. Moving to the correct spot
on the tape may take several minutes.
Tape capacities are generally 20GB to 200GB. Tapes and their drives are categorized by width,
and tapes are named according to technology.
2. Disk structure
Disks are addressed as large one-dimensional arrays of logical blocks, where the logical block
is the smallest unit of transfer. The logical block size is usually 512 bytes, although some disks
can be low-level formatted to use a different logical block size, such as 1024 bytes.
The logical blocks are mapped to sectors sequentially. Sector 0 is the first sector of the first track
on the outermost cylinder.
The mapping proceeds from that entire track to the rest of the tracks on that cylinder and then to
the other cylinder from the outermost to the innermost.
The farther a track is from the centre, the more sectors it can hold; the number of sectors per
track decreases as we move from the outer tracks inwards.
The outermost tracks typically hold 40% more sectors than the innermost ones.
On storage media that use CLV (constant linear velocity), the density of bits per track is
uniform.
The drive increases its rotation speed as the head moves from the outer to the inner tracks, to
keep the same rate of data moving under the head. Ex. CD-ROM and DVD-ROM drives.
In other drives the rotation speed is held constant and the bit density decreases from inner
tracks to outer tracks to keep the data rate constant. This method is used in hard disks and is
known as CAV (constant angular velocity).
3. Disk Attachment
Host-Attached Storage:
Common to all small systems.
Storage access is via I/O ports.
Ports have several technologies:
o Desktop PC uses I/O bus architecture called IDE or ATA. This architecture supports only
two drives per bus.
o Newer technology used is SATA.
o High-end workstations and servers use sophisticated architecture like SCSI and FC.
SCSI:
Is a bus architecture.
Its physical medium is a ribbon cable having many (50 to 68) conductors.
This protocol supports max of 16 devices on the bus. I.e. 1 controller card called
SCSI initiator and 15 storage devices called SCSI targets.
The SCSI targets are normally disks.
Each SCSI target can address up to 8 logical units, such as the components of a RAID
(Redundant Array of Inexpensive Disks) array.
FC:
It is a high-speed serial architecture.
It can operate over fiber of four conductor copper cable.
They are of two types:
1. A large switched fabric having 24-bit address space. This is the basis of SANs (Storage Area
Networks). They support large address space and use switch nature of communication. They
have greater flexibility.
2. An arbitrated loop, another FC variant, which can address 126 devices.
2. Network-Attached-Storage (NAS):
Storage-Area Network:
Problem:
Consider the following disk queue with the requests for I/O blocks on cylinders 98, 183, 37, 122, 14,
124, 65, 67. Let the head start at 53. What is the total head movement? Apply FCFS.
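A sketch that computes the answer for this FCFS problem; it prints 640 cylinders:

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int queue[] = {98, 183, 37, 122, 14, 124, 65, 67};
    int head = 53, total = 0;
    for (int i = 0; i < 8; i++) {
        total += abs(queue[i] - head);   /* seek distance for this request */
        head = queue[i];
    }
    printf("total head movement: %d cylinders\n", total);   /* 640 */
    return 0;
}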
2. SSTF Scheduling:
Shortest Seek Time First scheduling.
Service all requests close to the current head position before moving the head far away, i.e.,
choose the request with the minimum seek time from the current head position.
Increase in performance.
Like SJF scheduling.
Not an optimal algorithm.
Problem: Consider the following disk queue with the requests for I/O blocks on cylinders 98, 183,
37, 122, 14, 124, 65, 67. Let the head start at 53. What is the total head movement? Apply SSTF.
Note: say the current service is at cylinder 14 and a request for 189 is pending. If a request for
17 arrives immediately, the head moves from 14 to 17, keeping 189 waiting. If many more
requests near 14 keep arriving, they will all be serviced before 189, which can keep 189 waiting
for a very long time (starvation).
3. SCAN Scheduling:
The disk arm starts at one end of the disk and moves towards the other end servicing requests on
the way till it reaches the other end.
At the other end the direction of the head is reversed servicing requests as it moves.
The head SCANS back and forth and hence the name.
Also called as an elevator algorithm. The request coming ahead of the head will get service and
the requests coming behind the head will have to wait.
Problem:
Consider the following disk queue with the requests for I/O blocks on cylinders 98, 183, 37, 122, 14,
124, 65, 67. Let the head start at 53. What is the total head movement? Apply SCAN. Say the head is
moving towards 0.
4. C-SCAN Scheduling:
Circular SCAN.
Provides more uniform wait time.
When the head reaches one end it immediately returns to the beginning without servicing any
requests in this journey.
It treats the cylinders as a circular list that wrap around.
Problem:
Consider the following disk queue with the requests for I/O blocks on cylinders 98, 183, 37, 122, 14,
124, 65, 67. Let the head start at 53. What is the total head movement? Apply C-SCAN. Say the head
is moving towards 0.
5. LOOK Scheduling:
Same as SCAN but allow the arm to go only to the farthest request.
In C-LOOK same as C-SCAN but only till farthest request.
LOOK and C-LOOK: [Figure: LOOK and C-LOOK head movement across the request cylinders 0, 14, 37, 53, 65, 67, 98, 122, 124, 183, 199]
Assignment Problem:
Work Queue: 23, 89, 132, 42, 55, 13, 75, 144, 189, 12, 187
There are 200 cylinders numbered from 0 to 199.
The disk head starts at cylinder 100.
The disk head moves towards 0.
Calculate the total head movement using FCFS, SSTF, SCAN, C_SCAN, LOOK, C-LOOK.
5. Disk management
Low Level formatting:
Before a disk can store data it must be divided into sectors that the disk controller can read and
write. This process is called low-level formatting or physical formatting. Low-level formatting
fills the disk with a special data structure for each sector.
The data structure consists of a header, a data area and a trailer. The header and the trailer hold
information used by the disk controller, such as the sector number and an ECC (error-correcting
code).
When the disk controller writes to sector/data area the ECC is updated with the value calculated
from all the bytes in the data area.
When a sector is read the ECC is recalculated and compared with the stored value. If the
numbers mismatch that means the sector is corrupted and the sector is bad.
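The read-time check can be made concrete with a small sketch (an illustration only; Python's CRC-32
stands in for the controller's real ECC, which can also correct small errors rather than merely
detect them):

    import zlib

    def write_sector(data: bytes):
        # the controller stores an ECC computed from all bytes in the data area
        return {"data": data, "ecc": zlib.crc32(data)}

    def read_sector(sector):
        # on every read the ECC is recalculated and compared with the stored value
        if zlib.crc32(sector["data"]) != sector["ecc"]:
            raise IOError("ECC mismatch: data area corrupted, sector is bad")
        return sector["data"]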
Most disks are low-level formatted in the factory, which also checks for bad sectors.
When the disk controller is instructed to low-level format the disk, it can be told how large a
data area to leave between the header and the trailer.
Typical sizes are 256, 512 and 1024 bytes. The larger the sector size, the fewer sectors per track;
fewer headers and trailers leave more room on each track for user data.
To hold files on the disk, the OS records its own data structures on it.
The OS does two things:
1. Partition:
Partition the disk into one or more groups of cylinders each acting like a separate disk.
2. Logical Formatting:
The OS stores the initial file system data structures on the disk which represent maps of free and
allocated space.
To increase efficiency most file systems group blocks together into larger chunks called clusters.
Boot Block
Bootstrap program
When the computer is powered up or restarted/rebooted it needs an initial program to run. This
initial program is called the bootstrap program.
It initializes all aspects of the system, including CPU registers, device controllers and the
contents of main memory, and then starts the OS.
It finds the OS kernel on the disk, loads it into memory and jumps to an initial address to begin
OS execution.
The full bootstrap program is stored at a fixed location in a partition called the boot block; a
tiny loader held in ROM brings it into memory.
The disk is divided into partitions, and (in Windows 2000) the partition that contains the OS and
the device drivers is called the boot partition.
A disk that has a boot partition is called a boot disk or system disk.
Master Boot Record (MBR): Windows 2000 places the boot code in the first sector of the hard disk,
called the MBR.
Bad Blocks
Since disks have moving parts and small tolerances, they are prone to failure.
Sometimes the whole disk must be replaced; more often only individual sectors get corrupted.
These corrupted sectors are called bad blocks.
Generally the data in a bad block is lost.
How are bad blocks handled?
On some disks the bad blocks are handled manually.
Ex. the MS-DOS format command checks for bad blocks. If it finds one, it writes a special value
into the corresponding FAT (File Allocation Table) entry so the block is never allocated.
On more sophisticated disks, the controller logically replaces each bad sector with one of a pool
of spare sectors; this is called sector sparing or forwarding. Whenever the bad sector is accessed,
the controller translates the request into the address of the replacement sector. (This redirection
by the controller invalidates any optimisation done by the OS's disk-scheduling algorithm.)
As an alternative to sector sparing, some controllers use sector slipping, shifting every sector
between the bad sector and the spare down by one position so that the replacement stays nearby.
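A minimal sketch of sector sparing (an illustration only; a real controller keeps this remap table
in its firmware and draws replacements from its own spare-sector pool):

    spare_map = {}                    # bad sector -> replacement sector
    spares = [10000, 10001, 10002]    # hypothetical spare-sector pool

    def mark_bad(sector):
        spare_map[sector] = spares.pop(0)   # forward the bad sector to a spare

    def translate(sector):
        # every request passes through the remap before reaching the platter,
        # which is why controller sparing defeats OS-level scheduling decisions
        return spare_map.get(sector, sector)

    mark_bad(87)
    print(translate(87), translate(88))     # -> 10000 88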
6. Swap-Space Management
Swap-space management is a low-level OS task. It provides an extension of physical memory in the
form of virtual memory on the disk, and tries to provide enough throughput to offset the
performance lost to swapping.
Swap space is an area of disk that temporarily holds a process's memory image. How it is used
depends on the memory-management algorithm: systems that swap use it to hold whole process images,
while paging systems use it to store pages pushed out of memory. The amount of swap space needed
varies with the amount of physical memory, from a few MB to a few GB. UNIX allows multiple swap
spaces, usually placed on separate disks.
The swap space can reside in two places:
it can be carved out of the normal file system, or put in a separate disk partition.
If it is in the file system, the normal file-system routines for creation, naming and allocation of
space can be used.
This approach is easy but inefficient, since navigating the directory structure and the disk
allocation data takes time. When the swap space is a separate partition, a separate swap-space
manager is used to allocate and deallocate its blocks. A fixed amount of swap space is allocated
when the partition is created. The manager uses algorithms optimized for speed rather than storage
efficiency. Internal fragmentation may exist, but it can be tolerated because data lives in the
swap space only briefly.
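A minimal sketch of the kind of fixed-slot allocator such a swap manager might use (an illustration
only; real swap maps also keep per-slot use counts):

    class SwapMap:
        def __init__(self, nslots):
            self.free = [True] * nslots     # one flag per fixed-size slot

        def alloc(self):
            # simple first-fit scan: optimized for speed, not storage efficiency
            for slot, is_free in enumerate(self.free):
                if is_free:
                    self.free[slot] = False
                    return slot
            raise MemoryError("swap space exhausted")

        def release(self, slot):
            self.free[slot] = True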
In Solaris, swap space is allocated for a page only when the page is forced out of physical memory.
This gives better performance.
7. Protection
Processes in the system must be protected from one another's activities; otherwise normal operation
may be disrupted. Protection refers to a mechanism for controlling the access of programs, processes
or users to the resources defined by the computer system. It includes both the specification of the
controls and a means of enforcing them. Security is a measure of confidence that the integrity of a
system and its data will be preserved.
Goals of protection
To provide a means to distinguish between authorized and unauthorized usage.
To prevent mischievous, intentional violation of an access restriction by a user.
To ensure that each program component active in the system uses system resources only in ways
consistent with stated policies. (This gives a reliable system.)
To detect latent errors at the interfaces between component subsystems. (This can improve
reliability: early detection helps prevent a failing subsystem from contaminating others.)
To enforce the policies governing resource usage.
8. Principles of protection
The time-tested guiding principle for protection is the principle of least privilege. It states
that programs, users and even systems should be given just enough privileges to perform their
tasks.
An OS following this principle implements its features, programs, system calls and data structures
so that the failure or compromise of a component does the minimum damage. Such an OS provides
fine-grained access control.
It provides mechanisms to enable privileges when they are needed and to disable them when they are
not.
Accesses to privileged functions leave audit trails, which enable a programmer, systems
administrator or law-enforcement officer to trace the protection and security activity on the
system.
We can create separate accounts for each user with just the privileges that the user needs.
At any given time a process should be able to access only those resources it currently requires;
this is called the need-to-know principle.
9. Domain of protection
A protection domain specifies the set of resources that a process may access.
The domain defines a set of objects and the types of operation that may be invoked on each object.
The ability to execute an operation on an object is called an access right. A domain is a
collection of access rights, each of which is an ordered pair <object-name, rights-set>.
Ex. if domain D has the access right <file F, {read, write}>, then a process executing in domain D
can both read and write file F, and can perform no other operation on it.
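A minimal sketch of a domain as a collection of <object, rights-set> pairs (an illustration only;
the names are made up):

    # each domain maps an object name to the set of rights it grants on it
    domain_D = {"F": {"read", "write"}}

    def allowed(domain, obj, op):
        # a process in the domain may invoke op on obj only if the right is present
        return op in domain.get(obj, set())

    print(allowed(domain_D, "F", "write"))    # True
    print(allowed(domain_D, "F", "execute"))  # False: no other operation on F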
A domain can be realized in several ways:
1. Each user may be a domain. Here the set of objects that can be accessed depends on the identity
of the user. Domain switching occurs when one user logs out and another logs in.
2. Each process may be a domain. Here the set of objects that can be accessed depends on the
identity of the process. Domain switching occurs when one process sends a message to another
process and waits for the response.
3. Each procedure may be a domain. Here the set of objects that can be accessed corresponds to the
local variables defined within the procedure. Domain switching occurs when a procedure call is
made.
Examples of domain switching:
1. UNIX: a domain is associated with the user-id. Switching is normally done through the setuid
bit: executing a file whose setuid bit is set temporarily changes the user-id to that of the file's
owner. The danger is that a secretly planted setuid file can later hand its owner's privileges to
whoever runs it.
2. To prevent this, some systems put the privileged programs in a special directory.
The OS is designed to change the user-id of any program run from this directory.
This eliminates any secret setuid programs, though it is less flexible than the setuid approach.
3. Some systems are more restrictive and protective:
they do not allow any change of user-id.
Special techniques are used instead to provide access to privileged facilities:
a daemon process is started at boot time and runs under a special user-id;
users run a special program which sends a request to the daemon whenever they need the facility.
2. MULTICS:
The protection domains are organized hierarchically in a ring structure, with rings numbered 0-7.
Each ring corresponds to a single domain.
A process executing in D0 has the most privileges: if j < i, then a process executing in Dj has
more privileges than one executing in Di.
MULTICS has a segmented address space and each segment is a file.
Each segment is associated with one of the rings.
The segment descriptor includes the ring number and three access bits for read/write/execute.
Each process has a current-ring-number counter identifying the ring in which it is currently
executing.
A process executing in ring i can access a segment associated with ring k only if k >= i.
Domain switching in MULTICS happens when a process crosses from one ring to another.
This switching is done in a controlled manner.
To allow controlled switching, the segment descriptor's ring field is extended to include the
following:
o Access bracket: a pair of integers b1, b2 such that b1 <= b2.
o Limit: an integer b3 such that b3 > b2.
o List of gates: identifies the entry points at which the segment may be called.
If a process executing in ring i calls a procedure with access bracket (b1, b2), the call is
allowed if b1 <= i <= b2, and the ring number of the process remains i. Otherwise a trap is issued
and handled as follows:
If i < b1, the call is allowed, since it is a transfer to a ring with fewer privileges.
If i > b2, the call is allowed only if b3 >= i and the call is directed to one of the designated
entry points in the list of gates. This limits how access rights can be acquired.
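The call rules can be summarized in a short sketch (a restatement of the rules above; the function
and parameter names are made up):

    def may_call(i, b1, b2, b3, entry, gates):
        # a process in ring i calls a segment with access bracket (b1, b2),
        # limit b3 and a list of designated gate entry points
        if b1 <= i <= b2:
            return True        # allowed; the process stays in ring i
        if i < b1:
            return True        # transfer towards a less privileged ring
        # i > b2: allowed only within the limit and through a designated gate
        return b3 >= i and entry in gates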
The disadvantage of the ring structure is that it does not allow enforcement of the need-to-know
principle: if an object must be accessible in Dj but not in Di, we need j < i, but then every
segment accessible in Di is also accessible in Dj.
The MULTICS protection scheme is also more complex and less efficient than those of most other
systems, and performance suffers accordingly.
10. Access Matrix
The access matrix is a general model of protection: rows represent domains, columns represent
objects, and each entry access(i, j) is the set of operations that a process executing in domain
Di may invoke on object Oj.
Ex. an access matrix with four domains and four objects:

Domain   F1           F2      F3           Printer
D1       read                 read
D2                                         print
D3                    read    execute
D4       read,write           read,write
A right marked with an asterisk (*) carries the copy right: a process in Di holding the right R* on
Oj may copy R into the Oj column of any other row.
Ex. before the copy:

Domain   F1        F2       F3
D1       execute   write*
D2       execute   read*
D3       execute            execute

After a process in D2 copies its read right on F2 into D3:

Domain   F1        F2       F3
D1       execute   write*
D2       execute   read*
D3       execute   read     execute
Variants of the copy right:
A right may be moved rather than copied: it is copied from access(i,j) to access(k,j) and then
removed from access(i,j). This is a transfer of the right rather than a copy.
Propagation of the copy right may itself be limited (limited copy): when the right R* is copied
from access(i,j) to access(k,j), only the right R, not R*, is created, so a process executing in
domain Dk cannot copy the right R any further.
A system may support only one of these copy-right variants, or all of them.
Owner:
The owner right allows the addition of new rights and the removal of existing ones.
If access(i,j) includes the owner right, then a process executing in Di can add or remove any right
in any entry of column j.
Note: both copy and owner allow changes only within a column.
To allow changes to the entries in a row we use the control right.
Control:
If access(i,j) includes the control right, then a process executing in Di can remove any access
right from row j.
The copy and owner rights limit the propagation of access rights, but they do not guarantee that
information will not migrate outside its execution environment. This is called the confinement
problem.
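A minimal sketch of limited-copy propagation in an access matrix (an illustration only; a trailing
'*' marks the copy right, and the domain and object names are made up):

    # access[domain][object] is the set of rights held on that object
    access = {
        "D2": {"F2": {"read*"}},
        "D3": {"F2": set()},
    }

    def copy_right(src, dst, obj, right):
        # Di may copy right R into another row of the SAME column only if it
        # holds R*; limited copy creates plain R, so Dk cannot copy it further
        if right + "*" in access[src][obj]:
            access[dst][obj].add(right)

    copy_right("D2", "D3", "F2", "read")
    print(access["D3"]["F2"])                # -> {'read'}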
Lock-key scheme: each object has a list of unique bit patterns called locks, and each domain has a
list of unique bit patterns called keys; a process executing in a domain can access an object only
if the domain has a key that matches one of the object's locks.
With this scheme, a key can be associated with several objects, and several keys can be associated
with each object, providing maximum flexibility. In key-based schemes, the operations of defining
keys, inserting them into lists, and deleting them from lists should not be available to all users.
In particular, it would be reasonable to allow only the owner of an object to set the keys for that
object. This choice, however, is a policy decision that the protection system can implement but
should not define.