OS-module 2 Notes
Multithreading Models
Support for threads may be provided either at the user level, for user threads, or by the kernel, for
kernel threads.
User Threads :
• Thread management done by user-level threads library
• Three primary thread libraries: POSIX Pthreads, Win32 threads and Java threads
Kernel Threads:
• Supported by the Kernel
• Examples: Windows XP/2000, Solaris, Linux, Tru64 UNIX, Mac OS X
Ultimately, a relationship must exist between user threads and kernel threads. Now we look at three
common ways of establishing such a relationship.
Different types of thread models: Many-to-One, One-to-One, Many-to-Many
1) Many-to-one model:
• Maps many user-level threads to one kernel thread.
• Because only one thread can access the kernel at a time, multiple threads are unable to run in
parallel on multiprocessors.
• Thread management is done by the thread library in user space, so it is efficient; but the entire
process will block if a thread makes a blocking system call.
• Examples: Solaris Green Threads, GNU Portable Threads
2) One-to-One
• Maps each user thread to a kernel thread.
• It provides more concurrency than the many-to-one model by allowing another thread to run
when a thread makes a blocking system call
• It also allows multiple threads to run in parallel on multiprocessors.
• Examples : Windows NT/XP/2000, Linux, Solaris 9 and later
• Drawback : creating a user thread requires creating the corresponding kernel thread which can
burden the performance of an application
3) Many-to-many model
• Allows many user level threads to be mapped to many kernel threads.
• Allows the operating system to create a sufficient number of kernel threads
• The number of kernel threads may be specific to either a particular application or a particular
machine. When a thread performs a blocking system call, the kernel can schedule another thread for
execution.
• Ex: Solaris prior to version 9, Windows NT/2000 with the ThreadFiber package
Comparison of 3 models:
• The many-to-one model lets the developer create as many user threads as they wish, but true
concurrency is not gained because the kernel can schedule only one kernel thread at a time.
• The one-to-one model allows for greater concurrency, but the developer has to be careful not to
create too many threads within an application.
• The many-to-many model has neither of these shortcomings: Developers can create as many
user threads as necessary, and corresponding kernel threads can run in parallel on a
multiprocessor.
Two-level model
• Maps many user-level threads to a smaller or equal number of kernel threads.
• Similar to the many-to-many model, except that it also allows a user-level thread to be bound to a
kernel thread.
• Examples : IRIX, HP-UX, Tru64 UNIX, Solaris 8 and earlier
Thread Libraries
• Thread library provides programmer with API for creating and managing threads
• Two primary ways of implementing
• Library entirely in user space
• Kernel-level library supported by the OS
POSIX Pthreads
• Pthreads refers to the POSIX standard (IEEE 1003.1c) API for thread creation and synchronization.
The standard specifies the behavior of the thread library; the implementation is left to the library
developers, and Pthreads is common on UNIX-like operating systems.
Pthreads Example
The statement pthread_t tid declares the identifier for the thread we will create. Each thread has a set
of attributes, including stack size and scheduling information. The pthread_attr_t attr declaration
represents the attributes for the thread. We set the attributes in the function call pthread_attr_init
(&attr).
A separate thread is created with the pthread_create() function call. In addition to passing the thread
identifier and the attributes for the thread, we also pass the name of the function where the new thread
will begin execution; in this case, the runner() function. Last, we pass the integer parameter that was
provided on the command line, argv[1].
At this point, the program has two threads: the initial (or parent) thread in main() and the summation
(or child) thread performing the summation operation in the runner() function. After creating the
summation thread, the parent thread will wait for it to complete by calling the pthread_join() function.
The summation thread will complete when it calls the function pthread_exit(). Once the summation
thread has returned, the parent thread will output the value of the shared data sum.
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

int sum; /* this data is shared by the thread(s) */

/* The thread will begin control in this function */
void *runner(void *param)
{
    int i, upper = atoi(param);

    sum = 0;
    for (i = 1; i <= upper; i++)
        sum += i;

    pthread_exit(0);
}
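Only the runner() function appears above. The following sketch of the main() function, which together with runner() forms a complete program, follows the pattern described in the paragraph above; the usage message is an assumption, not part of the notes.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

extern int sum;              /* shared with runner() in the listing above */
void *runner(void *param);

int main(int argc, char *argv[])
{
    pthread_t tid;           /* the thread identifier */
    pthread_attr_t attr;     /* set of thread attributes */

    if (argc != 2) {
        fprintf(stderr, "usage: a.out <integer value>\n");
        return -1;
    }

    pthread_attr_init(&attr);                     /* get the default attributes */
    pthread_create(&tid, &attr, runner, argv[1]); /* create the summation thread */
    pthread_join(tid, NULL);                      /* wait for it to exit */
    printf("sum = %d\n", sum);
    return 0;
}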
Win32 Threads:
The technique for creating threads using the Win32 thread library is similar to the Pthreads technique
in several ways. We illustrate the Win32 thread API in the C program shown in Figure below. We must
include the windows.h header file when using the Win32 API.
Program:
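The listing itself is not reproduced in these notes; the following is a minimal sketch of the same pattern. It creates one thread with CreateThread(), waits for it with WaitForSingleObject(), and then prints a shared sum. The Summation() function name and the parameter value are illustrative assumptions.

#include <windows.h>
#include <stdio.h>

static DWORD Sum;   /* data shared by the threads */

/* the thread runs in this function */
DWORD WINAPI Summation(LPVOID Param)
{
    DWORD Upper = *(DWORD *)Param;
    for (DWORD i = 1; i <= Upper; i++)
        Sum += i;
    return 0;
}

int main(void)
{
    DWORD ThreadId;
    DWORD Param = 5;
    HANDLE ThreadHandle = CreateThread(
        NULL,        /* default security attributes */
        0,           /* default stack size */
        Summation,   /* thread function */
        &Param,      /* parameter to the thread function */
        0,           /* default creation flags */
        &ThreadId);  /* returns the thread identifier */

    if (ThreadHandle != NULL) {
        WaitForSingleObject(ThreadHandle, INFINITE);  /* wait for the thread to finish */
        CloseHandle(ThreadHandle);
        printf("sum = %lu\n", Sum);
    }
    return 0;
}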
Java Threads
• Java threads are managed by the JVM.
• Typically implemented using the threads model provided by the underlying OS.
• Java threads may be created by:
o Extending the Thread class
o Implementing the Runnable interface
Java Multithreaded Program
Threading Issues
Thread Cancellation
• Thread cancellation is the task of terminating a thread (the target thread) before it has completed;
the cancellation request may come from other threads.
• Canceling a thread asynchronously may not free a necessary system-wide resource.
Signal Handling
• A signal is used in UNIX systems to notify a process that a particular event has occurred.
• A signal is generated by the occurrence of a particular event. A generated signal is delivered to
a process. Once delivered, signal must be handled.
• Synchronous signals include illegal memory access and division by 0.
• Synchronous signals are delivered to the same process that performed the operation that caused
the signal.
• Asynchronous signal is generated by an event external to a running process, for example
terminating a process with specific keystrokes (such as <control><C>) and having a timer expire.
• An asynchronous signal is sent to another process.
• Delivering signals is more complicated in multithreaded programs, where a process may have
multiple threads.
• The following delivery options exist:
◦ Deliver the signal to the thread to which the signal applies.
◦ Deliver the signal to every thread in the process.
◦ Deliver the signal to certain threads in the process.
◦ Assign a specific thread to receive all signals for the process.
• Synchronous signals need to be delivered to the thread causing the signal and not to other
threads in the process. Some asynchronous signals such as a signal that terminates a process
(<control><C>) should be sent to all threads.
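As an illustration of the last option (a sketch, not from the notes): with the POSIX API, a program can block a signal in every thread with pthread_sigmask() and designate one thread to receive it synchronously with sigwait().

#include <pthread.h>
#include <signal.h>
#include <stdio.h>

/* the designated thread waits synchronously for a pending signal */
static void *signal_thread(void *arg)
{
    sigset_t *set = arg;
    int sig;
    sigwait(set, &sig);
    printf("signal %d delivered to the designated thread\n", sig);
    return NULL;
}

int main(void)
{
    sigset_t set;
    pthread_t tid;

    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    /* block SIGINT in the main thread; threads created afterwards inherit the mask */
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    pthread_create(&tid, NULL, signal_thread, &set);
    pthread_join(tid, NULL);
    return 0;
}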
Thread Pools
• Whenever a web server receives a request, it creates a separate thread to service the request.
• The first issue with this approach is the amount of time required to create the thread prior to
servicing the request.
• The second issue is that if every concurrent request is serviced in a new thread, there is no bound
on the number of threads created; unlimited threads could exhaust system resources, such as CPU time
or memory.
• One solution is to use a thread pool: create a number of threads at process startup and place them
into a pool, where they sit and wait for work.
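A minimal sketch of the idea using Pthreads (not from the notes): a fixed set of worker threads is created once at startup, and each request is placed on a small bounded queue protected by a mutex and condition variables instead of spawning a new thread. The names submit(), worker(), handle_request() and the sizes are illustrative assumptions, not a standard API.

#include <pthread.h>
#include <stdio.h>
#include <stdint.h>

#define POOL_SIZE  4
#define QUEUE_SIZE 16

typedef void (*task_fn)(void *);
struct task { task_fn fn; void *arg; };

static struct task queue[QUEUE_SIZE];
static int head = 0, tail = 0, count = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;

/* called whenever a request arrives: no thread is created here */
static void submit(task_fn fn, void *arg)
{
    pthread_mutex_lock(&lock);
    while (count == QUEUE_SIZE)
        pthread_cond_wait(&not_full, &lock);
    queue[tail].fn = fn;
    queue[tail].arg = arg;
    tail = (tail + 1) % QUEUE_SIZE;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

/* each pool thread loops waiting for work instead of being created per request */
static void *worker(void *unused)
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (count == 0)
            pthread_cond_wait(&not_empty, &lock);
        struct task t = queue[head];
        head = (head + 1) % QUEUE_SIZE;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
        t.fn(t.arg);                /* service the request */
    }
    return NULL;
}

static void handle_request(void *arg)
{
    printf("request %d serviced\n", (int)(intptr_t)arg);
}

int main(void)
{
    pthread_t workers[POOL_SIZE];
    for (int i = 0; i < POOL_SIZE; i++)
        pthread_create(&workers[i], NULL, worker, NULL);
    for (intptr_t r = 0; r < 8; r++)
        submit(handle_request, (void *)r);
    pthread_exit(NULL);   /* keep the worker threads alive, as a server would */
}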
Scheduler Activations
• Both the many-to-many and two-level models use an intermediate data structure between user and
kernel threads – the lightweight process (LWP).
• To the user-thread library, an LWP appears to be a virtual processor on which the process can
schedule user threads to run.
• Each LWP is attached to a kernel thread, and it is kernel threads that the operating system
schedules to run on physical processors. If a kernel thread blocks (such as while waiting for an I/O
operation to complete), the LWP blocks as well; up the chain, the user-level thread attached to the
LWP also blocks.
• The scheme for communication between the user-thread library and the kernel is known as
scheduler activation.
• The kernel provides an application with a set of virtual processors, and the application schedules
user threads onto available virtual processors.
• The operating system schedules kernel threads on physical processors.
• The kernel must inform an application about certain events; this is known as an upcall. Upcalls are
handled by an upcall handler in the thread library.
CPU Scheduler
• Whenever the CPU becomes idle, the operating system must select one of the processes in the
ready queue to be executed. The selection is carried out by the short-term scheduler.
• CPU scheduling decisions may take place when a process:
1. Switches from running state to waiting state.
2. Switches from running to ready state.
3. Switches from waiting to ready.
4. Terminates.
• Scheduling under circumstances 1 and 4 is nonpreemptive or cooperative; otherwise it is
preemptive.
• Under nonpreemptive scheduling, once the CPU has been allocated to a process, the process
keeps the CPU until it releases the CPU either by terminating or by switching to the waiting state.
• Consider the case of two processes that share data. While one is updating the data, it is
preempted. The second process then tries to read the data, which are in an inconsistent state. In such
cases we need mechanisms to coordinate access to shared data.
Dispatcher
• Dispatcher module gives control of the CPU to the process selected by the short-term
scheduler; this involves:
• switching context
• switching to user mode
• jumping to the proper location in the user program to restart that program
• Dispatch latency – time it takes for the dispatcher to stop one process and start another
running.
Scheduling Criteria
• Different CPU-scheduling algorithms have different properties, and the choice of a particular
algorithm may favor one class of processes over another.
• Following criteria have been suggested for comparing CPU-scheduling algorithms:
◦ CPU utilization – keep the CPU as busy as possible.
◦ Throughput – number of processes that complete their execution per time unit.
◦ Turnaround time – Interval from the time of submission of a process to the time of
completion.
◦ Waiting time – The sum of periods spent waiting in the ready queue.
◦ Response time – amount of time it takes from the submission of a request until the first
response is produced, not output.
Scheduling Algorithms
First-Come, First-Served (FCFS) Scheduling
• The process that requests the CPU first is allocated the CPU first.
• When a process enters the ready queue, its PCB is linked onto the tail of the queue. When the
CPU is free, it is allocated to the process at the head of the queue.
• Consider the following set of processes that arrive at time 0 in the order P1, P2, P3, with the length
of each CPU burst given in milliseconds (a worked illustration with assumed numbers appears at the
end of this list).
• FCFS scheduling result is shown in the following Gantt Chart (bar chart that illustrates a
particular schedule including start and finish time of each participating process).
• Thus average waiting time under FCFS policy is generally not optimal.
• FCFS scheduling algorithm is nonpreemptive. Once the CPU has been allocated to a process,
that process keeps the CPU until it releases the CPU, either by terminating or by requesting I/O.
◦ Thus FCFS algorithm is troublesome for time-sharing systems.
◦ Convoy effect – Many short processes wait for one big process to get off the CPU. Consider
one CPU-bound and many I/O-bound processes.
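As a worked illustration of the waiting-time calculation above (the burst lengths are assumed here, since the notes do not include the process table): with bursts P1 = 24 ms, P2 = 3 ms and P3 = 3 ms arriving in that order, FCFS runs P1 from 0 to 24, P2 from 24 to 27 and P3 from 27 to 30, so the waiting times are 0, 24 and 27 ms and the average is (0 + 24 + 27)/3 = 17 ms. If the same processes arrived in the order P2, P3, P1, the average waiting time would drop to (0 + 3 + 6)/3 = 3 ms, which is why FCFS is generally not optimal.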
Shortest-Job-First (SJF) Scheduling
• SJF associates with each process the length of its next CPU burst and allocates the CPU to the
process with the smallest next burst (FCFS breaks ties).
• SJF is optimal – it gives the minimum average waiting time for a given set of processes.
• Although SJF is optimal, it cannot be implemented at the level of the short-term scheduler: there is
no way for the short-term scheduler to know the length of the next CPU burst.
◦ One approach is to predict the length of the next CPU burst, using the lengths of previous CPU
bursts and exponential averaging.
• The prediction is computed by exponential averaging:
◦ t_n = actual length of the nth CPU burst
◦ τ_(n+1) = predicted value for the next CPU burst
◦ α, 0 ≤ α ≤ 1
◦ Define: τ_(n+1) = α·t_n + (1 − α)·τ_n
• α controls the relative weight given to the most recent burst and to past history in the prediction.
Commonly, α is set to 1/2. Expanding the formula gives
τ_(n+1) = α·t_n + (1 − α)·α·t_(n−1) + … + (1 − α)^j·α·t_(n−j) + … + (1 − α)^(n+1)·τ_0.
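A tiny illustration of the averaging formula; the burst history and the initial guess τ_0 below are assumed values, not from the notes.

#include <stdio.h>

/* tau_next = alpha * t_actual + (1 - alpha) * tau_prev */
static double predict(double alpha, double t_actual, double tau_prev)
{
    return alpha * t_actual + (1.0 - alpha) * tau_prev;
}

int main(void)
{
    double tau = 10.0;                      /* assumed initial guess tau_0 */
    double bursts[] = { 6, 4, 6, 4, 13 };   /* assumed burst history, in ms */
    for (int i = 0; i < 5; i++) {
        tau = predict(0.5, bursts[i], tau);
        printf("after burst %d: predicted next burst = %.1f ms\n", i + 1, tau);
    }
    return 0;
}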
• When next CPU burst of the newly arrived process is shorter than what is left of the currently
executing process,
◦ Preemptive SJF algorithm will preempt the currently executing process.
◦ Nonpreemptive SJF algorithm will allow the currently running process to finish its CPU burst.
• Example (preemptive SJF): average waiting time = [(10-1)+(1-1)+(17-2)+(5-3)]/4 = 26/4 = 6.5 msec.
Priority Scheduling
• A priority is associated with each process and CPU is allocated to the process with highest
priority.
• SJF is priority scheduling where priority is the inverse of predicted next CPU burst time.
• We assume that low numbers represent high priority.
• Consider the following set of processes, assumed to have arrived at time 0 in the order P1, P2, . .
• When priority of the newly arrived process is higher than the priority of the currently executing
process, Preemptive priority scheduling algorithm will preempt the currently executing process.
Nonpreemptive priority scheduling algorithm will put the new process at the head of the ready queue.
• A major problem with priority scheduling algorithm is indefinite blocking, or starvation.
• A priority scheduling algorithm can leave some low priority processes waiting indefinitely. A
steady stream of high priority processes can prevent a low-priority process from ever getting the CPU.
• A solution to the problem of indefinite blockage of low-priority processes is aging.
• Aging is a technique of gradually increasing the priority of processes that wait in the system for
long time.
Round-Robin (RR) Scheduling
• The round-robin (RR) scheduling algorithm is designed especially for time-sharing systems.
• Each process gets a small unit of CPU time (1 time quantum q), usually 10-100 milliseconds.
After this time has elapsed, the process is preempted and added to the end of the ready queue.
• Timer interrupts every quantum to schedule next process.
• The process may have a CPU burst of less than 1 time quantum. In this case process itself will
release the CPU voluntarily.
• Consider the following set of processes that arrive at time 0, with a time quantum of 4 msec (a
small simulation reproducing these numbers appears below, after this list).
• P1 waits for 6 milliseconds (10 - 4), P2 waits for 4 milliseconds, and P3 waits for 7 milliseconds.
Thus the average waiting time is 17/3 = 5.66 milliseconds.
• The performance of the RR algorithm depends heavily on the size of the time quantum q.
• We also need to consider the effect of context switching on the performance of RR scheduling.
If the time quantum is 1 time unit and if we have only one process of 10 time units, then 9 context
switches will occur slowing the execution of the process.
• Time quantum should be large with respect to context switch time. If context switch time is 10
percent of the time quantum, then about 10 percent of the CPU time will be spent on context
switching.
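A short simulation reproducing the RR arithmetic above. The burst lengths (24, 3 and 3 ms) are assumed, since the notes do not include the process table; because all processes arrive at time 0, cycling through them in index order matches the RR queue order.

#include <stdio.h>

#define N 3

int main(void)
{
    int burst[N]  = { 24, 3, 3 };   /* assumed CPU bursts, in ms */
    int remain[N] = { 24, 3, 3 };
    int finish[N] = { 0, 0, 0 };
    int quantum = 4, clock = 0, left = N;

    while (left > 0) {
        for (int i = 0; i < N; i++) {          /* all processes arrive at time 0 */
            if (remain[i] == 0) continue;
            int slice = remain[i] < quantum ? remain[i] : quantum;
            clock += slice;
            remain[i] -= slice;
            if (remain[i] == 0) { finish[i] = clock; left--; }
        }
    }

    int total_wait = 0;
    for (int i = 0; i < N; i++) {
        int wait = finish[i] - burst[i];       /* arrival time is 0 for all */
        printf("P%d waits %d ms\n", i + 1, wait);
        total_wait += wait;
    }
    printf("average waiting time = %d/%d = %.2f ms\n",
           total_wait, N, (double)total_wait / N);
    return 0;
}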
Multilevel Queue Scheduling
• A multilevel queue scheduling algorithm partitions the ready queue into several separate queues.
• The processes are permanently assigned to one queue, based on some property of the process,
such as process priority or process type.
• For example, separate queues might be used for foreground (interactive) processes and
background (batch) processes.
• Each queue has its own scheduling algorithm: the foreground queue might be scheduled by an RR
algorithm, while the background queue is scheduled by an FCFS algorithm.
• In multilevel queue scheduling processes are permanently assigned to a queue when they enter
the system.
Multilevel Feedback Queue Scheduling
• The multilevel feedback queue scheduling algorithm allows a process to move between queues.
• If a process uses too much CPU time, it will be moved to a lower priority queue.
• A process that waits too long in a lower-priority queue may be moved to a higher-priority queue;
this prevents starvation.
• A multilevel feedback queue scheduler is defined by the following parameters:
• The number of queues
• The scheduling algorithm for each queue
• The method used to determine when to upgrade a process
• The method used to determine when to demote a process
• The method used to determine which queue a process will enter when that process needs service
• Three queues:
◦ Q0 –RR, time quantum 8 milliseconds
◦ Q1 –RR, time quantum 16 milliseconds
◦ Q2 –FCFS
• Scheduling
◦ A process entering ready queue is put in queue Q0
◦ A process in Q0 is given a time quantum of 8 milliseconds.
◦ If it does not finish in 8 milliseconds, it is moved to the tail of queue Q1
◦ If queue Q0 is empty, the process at the head of Q1 is given a time quantum of 16 milliseconds.
◦ If it still does not complete, it is preempted and moved to queue Q2
◦ Q2 is serviced when Q0 and Q1 are empty.
Thread Scheduling
• On operating systems that support them, it is kernel-level threads – not processes – that are
scheduled by the operating system.
• To run on a CPU, user-level threads must be mapped to an associated kernel-level thread.
• This mapping may be indirect and may use a lightweight process (LWP).
• In the many-to-one and many-to-many models, the thread library schedules user-level threads to
run on an available LWP.
• This scheme is known as process-contention scope (PCS), since competition for the LWP takes
place among threads belonging to the same process.
• Next, the kernel thread is scheduled onto an available physical CPU using system-contention scope
(SCS) – competition for the CPU takes place among all kernel threads in the system.
Multiple-Processor Scheduling
• CPU scheduling becomes more complex when multiple CPUs are available.
• Homogeneous processors – We can use any available processor to run any process in the
queue.
• Asymmetric multiprocessing:
◦ All scheduling decisions, I/O processing and other system activities handled by a single
processor.
◦ only one processor accesses the system data structures, reducing the need for data sharing.
• Symmetric multiprocessing (SMP):
◦ Each processor is self-scheduling, all processes may be in a common ready queue, or each
processor may have its own private queue of ready processes.
◦ We must ensure that two processors do not choose the same process.
Processor affinity
• The data most recently accessed by the process populate the cache for the processor.
• If the process migrates to another processor, the contents of cache must be invalidated for the
first processor and the cache for the second processor must be repopulated.
• Because the cost of invalidating and repopulating caches is high, most SMP systems try to avoid
migrating processes from one processor to another.
• Processor affinity – a process has an affinity for the processor on which it is currently running.
• Soft affinity – the operating system attempts to keep a process on the same processor but does not
guarantee it.
• Hard affinity – the process can specify that it must not migrate to other processors, so avoiding
migration is guaranteed (e.g., via a call such as Linux's sched_setaffinity(), sketched below).
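A small Linux-specific sketch of hard affinity, shown only as an illustration (the notes do not name a particular API): sched_setaffinity() pins the calling process to a set of CPUs, here CPU 0 only.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);                       /* allow only CPU 0 */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {  /* pid 0 = calling process */
        perror("sched_setaffinity");
        return 1;
    }
    printf("pinned to CPU 0; the kernel will not migrate this process\n");
    return 0;
}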
Load Balancing
• Load balancing attempts to keep the workload evenly distributed across all processors in an
SMP system.
• Load balancing is necessary only on systems where each processor has its own private queue.
• With common ready queue load balancing is unnecessary, because once a processor becomes
idle, it extracts a runnable process from the queue.
• Two approaches for load balancing:
Push migration – a specific task periodically checks the load on each processor, and if an imbalance
is found pushes task from overloaded CPU to other CPUs.
Pull migration – idle processor pulls a waiting task from busy processor.
• Pulling or pushing a process from one processor to another loses the benefit of the data already in
that processor's cache, so load balancing often counteracts processor affinity.
Pthread Scheduling
Pthread API allows specifying either PCS or SCS during thread creation.
PTHREAD_SCOPE_PROCESS schedules threads using PCS scheduling.
PTHREAD_SCOPE_SYSTEM schedules threads using SCS scheduling.
The Pthread API provides two functions for getting and setting the contention scope policy:
pthread_attr_setscope(pthread_attr_t *attr, int scope)
pthread_attr_getscope(pthread_attr_t *attr, int *scope)
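A minimal sketch of using these two calls; note that some systems (Linux, for example) support only PTHREAD_SCOPE_SYSTEM, so the set call may fail there.

#include <pthread.h>
#include <stdio.h>

void *runner(void *param) { return NULL; }

int main(void)
{
    pthread_t tid;
    pthread_attr_t attr;
    int scope;

    pthread_attr_init(&attr);
    if (pthread_attr_getscope(&attr, &scope) == 0)
        printf("default scope: %s\n",
               scope == PTHREAD_SCOPE_PROCESS ? "PTHREAD_SCOPE_PROCESS"
                                              : "PTHREAD_SCOPE_SYSTEM");

    /* request SCS scheduling; this may fail where only one scope is supported */
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);

    pthread_create(&tid, &attr, runner, NULL);
    pthread_join(tid, NULL);
    return 0;
}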
********************************************************************************
Background:
Concurrent access to shared data may result in data inconsistency. Maintaining data consistency
requires mechanisms to ensure the orderly execution of cooperating processes. Suppose that we
wanted to provide a solution to the consumer-producer problem that fills all the buffers. We can do so
by having an integer counter that keeps track of the number of full buffers. Initially, counter is set to 0.
It is incremented by the producer after it produces a new buffer and is decremented by the consumer
after it consumes a buffer.
/* shared data for the bounded buffer (assumed declarations) */
#define BUFFER_SIZE 10
item buffer[BUFFER_SIZE];
int in = 0, out = 0;
int counter = 0;          /* number of full buffers */

/* producer */
while (true)
{
    /* produce an item and put it in nextProduced */
    while (counter == BUFFER_SIZE)
        ;   /* do nothing: buffer is full */
    buffer[in] = nextProduced;
    in = (in + 1) % BUFFER_SIZE;
    counter++;
}
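The matching consumer loop, sketched to mirror the producer above (nextConsumed is assumed to be declared like nextProduced):

/* consumer */
while (true)
{
    while (counter == 0)
        ;   /* do nothing: buffer is empty */
    nextConsumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    counter--;
    /* consume the item in nextConsumed */
}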
Race Condition
Race Condition: a situation in which several processes access and manipulate the same data
concurrently and the outcome of the execution depends on the particular order in which the accesses
take place. To avoid a race condition, only one process at a time may manipulate the variable counter.
Critical Section
Each process has a segment of code, called a critical section, in which it may be changing shared data.
The general structure of a typical process Pi consists of an entry section (requesting permission to enter
the critical section), the critical section itself, an exit section, and a remainder section, repeated in a
loop (see the sketch after the three requirements below). A solution to the critical-section problem must
satisfy the following three requirements:
1. Mutual Exclusion - If process Pi is executing in its critical section, then no other processes can be
executing in their critical sections
2. Progress - If no process is executing in its critical section and there exist some processes that wish
to enter their critical section, then the selection of the processes that will enter the critical section next
cannot be postponed indefinitely
3. Bounded Waiting - A bound must exist on the number of times that other processes are allowed to
enter their critical sections after a process has made a request to enter its critical section and before that
request is granted
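The general structure referred to above, written as the usual loop skeleton (a sketch, not the notes' own figure):

do {
    /* entry section: request permission to enter the critical section */

    /* critical section */

    /* exit section: release the permission */

    /* remainder section */
} while (true);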
Peterson’s Solution
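The notes do not reproduce the algorithm, so the following is a sketch of the standard two-process version for processes Pi and Pj (with j = 1 - i). It uses two shared variables: int turn (whose turn it is to enter) and boolean flag[2] (whether a process is ready to enter).

/* structure of process Pi */
do {
    flag[i] = TRUE;
    turn = j;
    while (flag[j] && turn == j)
        ;   /* busy wait */

    /* critical section */

    flag[i] = FALSE;

    /* remainder section */
} while (TRUE);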
Synchronization Hardware
• Many systems provide hardware support for the critical-section problem in the form of special
atomic instructions, such as TestAndSet and Swap, that execute as one uninterruptible unit.
TestAndSet Instruction
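The classic definition of the TestAndSet() instruction and the mutual-exclusion loop built on it, sketched from the standard algorithm: lock is a shared boolean initialized to FALSE, and the whole TestAndSet() body executes atomically in hardware.

boolean TestAndSet(boolean *target)
{
    boolean rv = *target;
    *target = TRUE;
    return rv;
}

do {
    while (TestAndSet(&lock))
        ;   /* do nothing */

    /* critical section */

    lock = FALSE;

    /* remainder section */
} while (TRUE);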
Swap Instruction
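The classic definition of the Swap() instruction used in the solution below; the hardware executes it atomically (a sketch, not the notes' own figure).

void Swap(boolean *a, boolean *b)
{
    boolean temp = *a;
    *a = *b;
    *b = temp;
}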
• Shared Boolean variable lock initialized to FALSE; Each process has a local Boolean variable
key
• Solution:
do {
    key = TRUE;
    while (key == TRUE)
        Swap(&lock, &key);

    /* critical section */

    lock = FALSE;

    /* remainder section */
} while (TRUE);
Semaphores
• Synchronization tool that does not require busy waiting
• Semaphore S – integer variable
• Two standard operations modify S: wait() and signal()
▪ Originally called P() and V()
• Less complicated than hardware-based solutions.
• Can only be accessed via two indivisible (atomic) operations, sketched below.
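The classic busy-waiting definitions of the two operations, sketched here for reference (a blocking implementation is given under Semaphore Implementation below):

wait(S) {
    while (S <= 0)
        ;   /* busy wait */
    S--;
}

signal(S) {
    S++;
}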
Semaphore Implementation
• Must guarantee that no two processes can execute wait () and signal () on the same semaphore
at the same time
• Thus, implementation becomes the critical section problem where the wait and signal code are
placed in the critical section
▪ Could now have busy waiting in critical section implementation
▪ But implementation code is short
▪ Little busy waiting if critical section rarely occupied
• Note that applications may spend lots of time in critical sections and therefore this is not a good
solution
Implementation of signal:

signal(semaphore *S)
{
    S->value++;
    if (S->value <= 0) {
        remove a process P from S->list;
        wakeup(P);
    }
}
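For completeness, a sketch of the matching semaphore structure and wait() operation in the same pseudocode style; block() and wakeup() stand for the kernel primitives that suspend and resume a process.

typedef struct {
    int value;
    struct process *list;   /* queue of processes waiting on this semaphore */
} semaphore;

wait(semaphore *S)
{
    S->value--;
    if (S->value < 0) {
        add this process to S->list;
        block();
    }
}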
Deadlock and Starvation
• Deadlock – two or more processes are waiting indefinitely for an event that can be caused by
only one of the waiting processes
• Let S and Q be two semaphores initialized to 1, and suppose P0 and P1 issue the operations in
opposite orders:
P0: wait(S); wait(Q); … signal(S); signal(Q);
P1: wait(Q); wait(S); … signal(Q); signal(S);
• If P0 executes wait(S) and then P1 executes wait(Q), each process is left waiting for a signal() that
only the other (now blocked) process can issue, so P0 and P1 are deadlocked.
• Starvation (indefinite blocking) – a process may never be removed from the semaphore queue in
which it is suspended, for example if waiting processes are resumed in LIFO order.
Bounded-Buffer Problem
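The notes do not include the listing; below is a sketch of the classic semaphore solution with n buffers, using semaphores mutex = 1, full = 0 and empty = n.

/* producer */
do {
    /* produce an item in nextProduced */
    wait(empty);
    wait(mutex);
    /* add nextProduced to the buffer */
    signal(mutex);
    signal(full);
} while (true);

/* consumer */
do {
    wait(full);
    wait(mutex);
    /* remove an item from the buffer into nextConsumed */
    signal(mutex);
    signal(empty);
    /* consume the item in nextConsumed */
} while (true);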
Readers-Writers Problem
• First variation – no reader is kept waiting unless a writer has already obtained permission to use the
shared object.
• Second variation – once a writer is ready, it performs its write as soon as possible.
• Both variations may lead to starvation, leading to even more variations.
• The problem is solved on some systems by the kernel providing reader-writer locks.
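A sketch of the classic solution to the first variation (not the notes' own listing), using semaphores rw_mutex = 1 and mutex = 1 and an integer read_count = 0:

/* writer */
do {
    wait(rw_mutex);
    /* writing is performed */
    signal(rw_mutex);
} while (true);

/* reader */
do {
    wait(mutex);
    read_count++;
    if (read_count == 1)
        wait(rw_mutex);     /* first reader locks out writers */
    signal(mutex);
    /* reading is performed */
    wait(mutex);
    read_count--;
    if (read_count == 0)
        signal(rw_mutex);   /* last reader lets writers back in */
    signal(mutex);
} while (true);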
Dining-Philosophers Problem
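The notes give only the problem name here; a sketch of the straightforward semaphore attempt follows, with semaphore chopstick[5], each element initialized to 1. (This version can deadlock if every philosopher picks up the left chopstick at the same time.)

/* philosopher i */
do {
    wait(chopstick[i]);
    wait(chopstick[(i + 1) % 5]);
    /* eat */
    signal(chopstick[i]);
    signal(chopstick[(i + 1) % 5]);
    /* think */
} while (true);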
Monitors
• A high-level abstraction that provides a convenient and effective mechanism for process
synchronization
• Abstract data type whose internal variables are accessible only by code within the monitor's
procedures.
• Only one process may be active within the monitor at a time.
• But a monitor is not powerful enough to model some synchronization schemes.

monitor monitor-name
{
    // shared variable declarations
    procedure P1 (…) { …. }
    …
    procedure Pn (…) { …… }

    initialization code (…) { … }
}
Condition Variables
• condition x, y;
• Two operations on a condition variable:
▪ x.wait () – a process that invokes the operation is suspended until x.signal ()
▪ x.signal () – resumes one of processes (if any) that invoked x.wait ()
▪ If no process is suspended in x.wait() on the variable, then x.signal() has no effect on the variable
• If process P invokes x.signal (), with Q in x.wait () state, what should happen next?
▪ If Q is resumed, then P must wait
• Options include
▪ Signal and wait – P waits until Q leaves monitor or waits for another condition
▪ Signal and continue – Q waits until P leaves the monitor or waits for another condition
▪ Both have pros and cons – language implementer can decide
▪ Monitors implemented in Concurrent Pascal take a compromise approach:
➢ P executing signal immediately leaves the monitor, and Q is resumed
▪ Implemented in other languages including Mesa, C#, Java
/* from the DiningPhilosophers monitor solution: every philosopher starts out thinking */
initialization_code()
{
    for (int i = 0; i < 5; i++)
        state[i] = THINKING;
}
}
A monitor to allocate a single resource; a process calling acquire() may be suspended on the condition
variable x, with waits ordered by the time argument:

monitor ResourceAllocator
{
    boolean busy;
    condition x;

    void acquire(int time)
    {
        if (busy)
            x.wait(time);
        busy = TRUE;
    }

    void release()
    {
        busy = FALSE;
        x.signal();
    }

    initialization_code()
    {
        busy = FALSE;
    }
}