CS330 Operating System Part V


CS330: Operating Systems

Lecture 23
Virtual Memory: Address Translation
The address space abstraction provides the same memory view to all processes. However, this address
space is virtual, and it is the responsibility of the OS to provide this view. Users use several APIs
to manage this virtual view, and have no direct control over the physical memory resource.

Translation at address space granularity


Consider the two extremes. In the first, each byte of virtual memory is mapped to a byte of
physical memory, with the mapping information stored in some translation table; the hardware
uses this table to map virtual addresses to physical addresses. In the other extreme, we allocate
physical memory of the same size as the address space. Under the latter scheme, the physical
memory for a process can start at any address, but must be contiguous.

A detour to a bit of x86 ISA


An ISA defines the encoding of opcodes, addressing modes, and register operands. At a high level,
instructions contain an opcode and operands. Operands can be specified in several ways:

1. Register : mov %rcx, %rax (value of %rcx moved to %rax)


2. Immediate: mov $5, %rax (value 5 moved to %rax)
3. Absolute: mov 0x800000, %rax (value at address 0x800000 moved to %rax)
4. Indirect: mov (%rcx), %rax (value at memory address stored in %rcx moved to %rax)
5. Displacement: mov -16(%rcx), %rax (value at memory address %rcx - 16 moved to %rax)

Note: the rbp register marks the start of the stack frame of a function. The compiler
does not know stack addresses at compile time; it uses the rsp and rbp registers to access function stacks.

An overview of how functions are called

// OS during binary load (exec)

load_new_executable(PCB *current, File *exe)
{
    verify_executable(exe);
    reinit_address_space(current->mm_state);
    allocate_phys_mem(current);
    load_exe_to_physmem(current, exe);
    // set up the user stack pointer and entry point
    set_user_sp(current->mm_state->stack_start);
    set_user_pc(current->mm_state->code_start);
    return_to_user();
}

For example:

Physical memory of 8KB is allocated, and code and data are loaded. The PCB memory state is
updated based on the executable format.

When the process returns to user space (after
exec), the registers are loaded with virtual addresses
(e.g., PC = 0 and SP = 8KB). Code is loaded
into physical memory (at 20KB). At the start of the
function, therefore, PC = 10 and SP would be around
8KB. How is the translation carried out? → Using
a process-specific base register, whose value is
added to the virtual address to get the physical
address.

How is memory isolation achieved? → Even after the base register is added, a
process could access an arbitrary memory location, so we
need to ensure memory isolation. The hardware can provide support for a limit register,
which is used to first verify whether or not the address access is legitimate.
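
A minimal sketch of this base-limit check in C (the register variables and the fault handling are illustrative; real hardware performs this check on every access):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static uint64_t base_reg  = 20 * 1024;  /* example: process loaded at 20KB */
static uint64_t limit_reg =  8 * 1024;  /* example: 8KB address space      */

/* Hardware-style translation with a base-limit check. */
static uint64_t translate(uint64_t vaddr) {
    if (vaddr >= limit_reg)   /* out-of-bounds access: enforce isolation */
        abort();              /* stands in for the hardware fault        */
    return base_reg + vaddr;  /* physical address = base + virtual       */
}

int main(void) {
    printf("VA 0x10 -> PA 0x%lx\n", (unsigned long)translate(0x10));
    return 0;
}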

What happens during a context switch? The base and limit register values are saved to and
restored from the PCB of the process (during user-to-OS context switching).

Advantages: the number of operations the hardware must perform is very small. However, the
disadvantages are manifold.
Disadvantages: The physical memory allocated must be at least the address space size. This is
unrealistic, goes against the philosophy of the address space abstraction, and limits the size of
the address space. It is also memory inefficient: the unused space between heap and stack is
allocated but never used. It also limits the degree of multiprogramming, because it limits the
physical resource available.

Lecture 24
Virtual Memory: Segmentation
We saw that address translation at address space granularity was memory inefficient and
inflexible. We can now refine the scheme to segment-level granularity. We would now require
more base-limit pairs, one for each segment.

How does the CPU decide which segment to use?


The CPU does not know what is code and what is data. To make the CPU aware of the segment an
address belongs to, possible schemes are:

• Explicit addressing: Part of the address is used to specify the segment. For example, consider a
virtual address space of 8KB, which requires an address length of 13 bits. The first two bits
can be used to specify the segment: for instance, 00 for code, 01 for data, and 11 for stack.
The remaining bits are used as the offset (see the sketch after this list).
Issues with this style of addressing:
– Inflexible: the maximum size of each segment is fixed in advance.
– Wastage of virtual address space.
Note: However, allocation can still be done on demand; when a segment is allocated, it is
allocated completely.
• Implicit addressing: The hardware judges the segment based on the operation, i.e., the
hardware selects the segment register. This works because certain operations are mostly limited
to certain segments. For example:
– Code segment for instruction accesses: fetch address, jump target, call address.
– Stack segment for stack operations (push, pop) and indirect addressing using SP, BP.
– Data segment for other addresses.
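
A minimal sketch of the explicit-addressing decode for the 8KB example above (segment encodings as listed; the helper name is illustrative):

#include <stdint.h>
#include <stdio.h>

/* 13-bit VA: top 2 bits select the segment, low 11 bits are the offset. */
static void decode(uint16_t va) {
    unsigned seg    = (va >> 11) & 0x3;  /* 00 code, 01 data, 11 stack */
    unsigned offset =  va & 0x7FF;       /* low 11 bits                */
    printf("VA 0x%04x -> segment %u, offset 0x%03x\n", va, seg, offset);
}

int main(void) {
    decode(0x0040);  /* segment 00 (code)                      */
    decode(0x1FF0);  /* segment 11 (stack), near the top       */
    return 0;
}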

How is stack growth in the opposite direction handled?

Apart from base and limit registers, we thus require more information about each segment.

For the stack, the direction bit can be negative, so that the hardware calculates addresses correctly.
An S bit can be used to specify privilege, specifically in the code segment.
R, W, X bits can be used to enforce isolation and sharing.

Segmentation in reality

Instead of a few specific segments, there can be many segments, with the behaviour of each
captured using the flags and other metadata corresponding to it. The Descriptor Table Register
(DTR) is used to access the descriptor table. The number of descriptors depends on the
architecture. We may need separate descriptors for user and kernel mode.

Note: the CS, SS, DS and other segment registers hold the offset of the corresponding entry in
the descriptor table.

Changing the privilege level amounts to pointing the code segment register to the appropriate
descriptor offset.

In the case of a context switch, we now have some more information to store: specifically, the
process's segment registers.

Advantages of Segmentation: The operations for address translation are still easy and
straightforward. It avoids the memory wastage from unused addresses that the address-space-granularity
translation scheme suffered from.
Disadvantages: External fragmentation: the physical memory is divided into usable chunks, but
they are scattered throughout, which may leave several small unusable chunks. Also, it cannot
support discontiguous sparse mappings.

Lecture 25
Virtual Memory: Paging
The idea behind paging is to partition the address space into fixed-size blocks (called pages).
The physical memory is partitioned in a similar way (into page frames).
The responsibility of the OS is to set up the mapping between virtual and physical addresses.
The responsibility of the hardware is to walk the translation tables to translate a virtual
address to a physical address.

Example
Consider a virtual address space of 32KB and a page size of 256 bytes. The address length would
be at least 15 bits, of which the first 7 (the MSB 7) can be used to find the page number,
the number of pages being 128. The LSB 8 bits give the offset within a page. For
instance, VA = 0x0510 corresponds to page number 5 and offset 16.
Similarly, consider a physical memory of size 64KB with a page frame size of 256 bytes (the
same as the page size), which means the number of page frames is 256. The required address
length is thus 16 bits: the MS 8 bits specify the page frame number, and the remaining
bits give the offset within the page frame. For instance, PA = 0x1F51 corresponds to page frame
number 31 and offset 81.
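
The split can be checked with a few shifts and masks (a sketch for the numbers above):

#include <stdio.h>

/* 256-byte pages: the low 8 bits are the offset, the upper bits the page number. */
int main(void) {
    unsigned va = 0x0510, pa = 0x1F51;
    printf("VA 0x%04x: page %u, offset %u\n", va, va >> 8, va & 0xFF);   /* page 5, offset 16   */
    printf("PA 0x%04x: frame %u, offset %u\n", pa, pa >> 8, pa & 0xFF);  /* frame 31, offset 81 */
    return 0;
}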

Paging example

Each entry in the table is called a page table entry (PTE).

PTW(vaddr V, PTable P) { // page table walk (performed by the hardware)
    // input: virtual address, page table
    // returns: physical address
    Entry = P[V >> 8];                         // top bits index the page table
    if (Entry.present) {
        return (Entry.PFN << 8) + (V & 0xFF);  // frame base + page offset
    }
    Raise PageFault;
}

In the example depicted in the figure, VA 0x10 translates to PA 0x110, while VA 0x7FF0
translates to PA 0x3F0.

Where is the page table stored?


The page table is stored in RAM. The page table base register (CR3 in x86_64) contains its
address.

Structure of PTE

Consider again the Paging example figure: the code segment (page 0, read and execute) would
have a PTE of something like 0x125; the data segment (pages 2 and 3, read and write) PTEs of
0x207 and 0x407; and the stack (page 127, again read and write) a PTE of something like 0x307.
Thus the PTE, apart from the PFN, also stores privilege information, permissions, and
other flags.
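
A sketch of how such a PTE could be unpacked; the flag bit positions below are assumptions for illustration (the lecture's exact layout is in the figure), though the PFN-at-bit-8 placement matches 0x307 mapping to PFN 3:

#include <stdint.h>

#define PTE_PRESENT   (1u << 0)   /* assumed flag positions */
#define PTE_WRITE     (1u << 1)
#define PTE_EXEC      (1u << 2)
#define PTE_PFN_SHIFT 8           /* PFN sits above the flag bits */

static inline uint32_t pte_pfn(uint32_t pte)      { return pte >> PTE_PFN_SHIFT; }
static inline int      pte_present(uint32_t pte)  { return (pte & PTE_PRESENT) != 0; }
static inline int      pte_writable(uint32_t pte) { return (pte & PTE_WRITE) != 0; }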

What is the maximum physical memory size supported?


If we consider even the reserved/unused bits, we would have

1024 (number of page frames possible = 2^10) × 256 (page size) = 256KB

Lecture 26-27
Virtual memory: Multi-level Paging - I
One level of page table may not be enough. Consider the case of a 32-bit system, having an
address space of 4GB.

What could be the page size for this address space? Too large a page size results in internal
fragmentation. Assuming a page size of 4KB, under one-level paging we have 2^20 entries, which
cannot be held in a single page. Therefore multi-level paging is used in modern-day
systems.

4-level page tables: 48-bit VA (Intel x86-64)

For a virtual address space of size 2^48, with page size and page frame size of 4KB: four levels
of paging, with entry size 64 bits.
The hardware translates by repeatedly accessing the page tables stored in physical memory. Inside
a page table entry, the 12 LSBs hold the access flags.
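
Each level's table holds 2^9 = 512 entries of 8 bytes, i.e. exactly one 4KB page; the 48-bit VA therefore splits into four 9-bit indices plus a 12-bit offset, which can be extracted as follows (the example address is arbitrary):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t va = 0x00007f0012345678ULL;  /* an arbitrary user-space VA */
    printf("L4 index: %llu\n", (unsigned long long)((va >> 39) & 0x1FF));
    printf("L3 index: %llu\n", (unsigned long long)((va >> 30) & 0x1FF));
    printf("L2 index: %llu\n", (unsigned long long)((va >> 21) & 0x1FF));
    printf("L1 index: %llu\n", (unsigned long long)((va >> 12) & 0x1FF));
    printf("offset  : %llu\n", (unsigned long long)(va & 0xFFF));
    return 0;
}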

Paging efficiency
Consider a simple loop:
sum = 0;
for (int i = 0; i < n; i++)
sum += i;

// the corresponding assembly:


0x20100: mov $0, %rax;
0x20102: mov %rax, -8(%rbp); // sum = 0
0x20104: mov $0, %rcx;       // ctr = 0
0x20106: cmp $10, %rcx;      // ctr < 10?
0x20109: jge 0x2011f;        // exit loop if ctr >= 10
0x2010f: add %rcx, %rax;
0x20111: mov %rax, -8(%rbp); // sum += ctr
0x20113: inc %rcx;           // ++ctr
0x20115: jmp 0x20106;        // loop back
0x2011f: ....

The number of instruction accesses: loop (10 × 6) + others (3 setup + 2 exit check) = 65
- Memory accesses during translation: 65 × 4 = 260
Data/stack accesses: initialization (1) and loop (10) = 11
- Memory accesses during translation: 11 × 4 = 44
So accessing just one code page and one stack page required over 300 memory accesses (304) for
translation alone.

Paging with TLB: Translation efficiency


The TLB is a hardware cache that stores page-to-PFN mappings. It avoids page walks after the
first access to a page.

Address translation (TLB + PTW)

The OS cannot make entries in the TLB, but it can flush entries out. There can be separate TLBs
for instructions and data, as well as multi-level TLBs.

Sharing TLBs across applications


Assume that a process A is currently executing, and now B is scheduled.
We cannot leave the TLB entries as they are, since they can now lead to incorrect translations.
Alternatively, we can flush the entire TLB on every process context switch. This ensures
correctness, but is inefficient for frequently switching processes. A better solution is a
modified TLB with an Address Space Identifier (ASID) attached to each entry in the TLB.
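
A toy ASID-tagged TLB lookup in C (a sketch; real TLBs are set-associative hardware structures, and these names are illustrative):

#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 16

struct tlb_entry {
    bool     valid;
    uint16_t asid;  /* address space identifier of the owning process */
    uint64_t vpn;   /* virtual page number  */
    uint64_t pfn;   /* physical frame number */
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Hit only if both the VPN and the current ASID match, so entries of
   other processes can stay cached across context switches. */
static bool tlb_lookup(uint16_t cur_asid, uint64_t vpn, uint64_t *pfn) {
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].asid == cur_asid && tlb[i].vpn == vpn) {
            *pfn = tlb[i].pfn;
            return true;   /* TLB hit: no page table walk needed */
        }
    }
    return false;          /* miss: hardware walks the page table */
}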

Page fault
Page faults are required to support memory over-commitment through lazy allocation and
swapping. In x86, specific error codes are pushed onto the kernel stack.

if (!pte.valid ||
    (access == write && !pte.write) ||
    (cpl != 0 && pte.priv == 0)) {
    CR2 = Address;           // faulting virtual address
    errorCode = pte.valid    // bit 0: page present
              | access << 1  // bit 1: write access
              | cpl << 2;    // bit 2: user mode
    Raise pageFault;
} // Simplified

The hardware invokes the page fault handler, handing it the error code and the faulting virtual
address. The OS handles the page fault either by fixing it or by raising SIGSEGV.
HandlePageFault(u64 address, u64 error_code)
{
    if (AddressExists(current->mm_state, address) &&
        AccessPermitted(current->mm_state, error_code)) {
        PFN = allocate_pfn();
        install_pte(address, PFN);
        return;
    }
    RaiseSignal(SIGSEGV);
}

Lecture 28
Page fault and Swapping
When the number of free PFNs is running low and allocate_pfn() may be called, the
OS uses its page replacement policy to swap out certain pages from DRAM onto the
disk.
A methodology: update the present bit to 0 in the PTE, so that any access to the page
through the virtual address results in a page fault. The swap address also has to be maintained
in the PTE. The content of the PFN is now on the swap device; any future translation using the
PTE will result in a page fault, and the page fault handler will copy the page back from the
swap device.
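
In the pseudocode style of these notes, the swap-out step might look as follows; every helper here is hypothetical, mirroring names such as getPTE and swap_in used below:

swap_out(u64 address)
{
    pte = getPTE(address);        // PTE of the victim page
    PFN = pte_to_pfn(pte);        // frame currently backing it
    slot = allocate_swap_slot();  // space on the swap device
    write_to_swap(slot, PFN);     // copy the page contents out
    set_swapped_pte(pte, slot);   // present = 0, S = 1, slot kept in the PTE
    flush_tlb_entry(address);     // drop the now-stale translation
    free_pfn(PFN);                // the frame can now be reused
}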

Thus we modify HandlePageFault:


HandlePageFault(u64 address, u64 error_code)
{
    if (AddressExists(current->mm_state, address) &&
        AccessPermitted(current->mm_state, error_code)) {
        PFN = allocate_pfn();
        if (is_swapped_pte(address))        // Check if the PTE is swapped out
            swap_in(getPTE(address), PFN);  // Copy the swap block to PFN
        install_pte(address, PFN);          // and update the PTE
        return;
    }
    RaiseSignal(SIGSEGV);
}

Page replacement
Objective: minimize the number of page faults (due to swapping).
The three parameters are: (i) a given sequence of accesses to virtual pages, (ii) the number of
memory pages (frames), (iii) the page replacement policy.
Metrics to measure effectiveness: number of page faults, page fault rate, average memory
access time.
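
The last metric can be made concrete with the standard effective-access-time expression (a textbook formula, not from the lecture): AMAT = (1 − p) · t_mem + p · t_fault, where p is the page fault rate, t_mem the DRAM access time, and t_fault the fault service time (dominated by the swap device). Since t_fault is orders of magnitude larger than t_mem, even a small p degrades AMAT badly.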

After swapping a page out, the corresponding TLB entry has to be flushed. Swapping is initiated
by the OS. We need the S bit to know that the page is swapped out, since a page fault might not
have involved swapping (e.g., lazy allocation).

Belady’s optimal algorithm


Strategy: Replace the page that will be referenced furthest in the future.
E.g.: # of frames = 3, reference sequence (in temporal order): 1, 3, 1, 5, 4, 1, 2, 5, 2, 2, 5, 3
# of page faults = 6 (3 cold-start misses result in page faults, no swapping)
Belady's MIN is proven to be optimal, but it is impractical as it requires knowledge of future accesses.

First in First Out (FIFO)


Strategy: Replace the page that has been in memory for the longest time.

E.g.: # of frames = 3, reference sequence (in temporal order): 1, 3, 1, 5, 4, 1, 2, 5, 2, 2, 5, 3


# of page faults = 8 (3 cold-start misses)

FIFO suffers from an anomaly known as Belady's anomaly:

- With an increased # of frames, the # of page faults may also increase!
For instance, consider the access sequence: 0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4
# of page faults with 3 frames (9) < # of page faults with 4 frames (10).
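
The anomaly is easy to reproduce with a small simulator; a sketch in C (the function name is illustrative):

#include <stdio.h>
#include <string.h>

/* Count page faults for FIFO replacement with nframes frames (nframes <= 16). */
static int fifo_faults(const int *seq, int n, int nframes) {
    int frames[16], head = 0, used = 0, faults = 0;
    memset(frames, -1, sizeof(frames));
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < used; j++)
            if (frames[j] == seq[i]) { hit = 1; break; }
        if (hit) continue;
        faults++;
        if (used < nframes)
            frames[used++] = seq[i];       /* fill a free frame */
        else {
            frames[head] = seq[i];         /* evict the oldest resident page */
            head = (head + 1) % nframes;
        }
    }
    return faults;
}

int main(void) {
    int seq[] = {0, 1, 2, 3, 0, 1, 4, 0, 1, 2, 3, 4};
    int n = sizeof(seq) / sizeof(seq[0]);
    printf("3 frames: %d faults\n", fifo_faults(seq, n, 3));  /* prints 9  */
    printf("4 frames: %d faults\n", fifo_faults(seq, n, 4));  /* prints 10 */
    return 0;
}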

Least recently used (LRU)


Strategy: Replace the page that has not been referenced for the longest time.
E.g.: # of frames = 3, reference sequence (in temporal order): 1, 3, 1, 5, 4, 1, 2, 5, 2, 2, 5, 3
# of page faults = 7 (3 cold-start misses)
LRU is shown to be useful for workloads with access locality.

An exact implementation of LRU is not practical with a single accessed bit. It can be
approximated using CLOCK, a useful approximation to LRU in which pages are placed on the dial
of a clock and a hand sweeps over them: the first page found (going around the dial) whose
accessed bit is clear is the one evicted, while set bits are cleared as the hand passes.
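
A sketch of the CLOCK victim-selection loop (the names and the fixed frame count are illustrative):

#include <stdbool.h>

#define NFRAMES 64

static bool accessed[NFRAMES];  /* set by hardware on each access to the page */
static int  hand;               /* current position of the clock hand */

/* Sweep the dial: pages with the accessed bit set get a second chance
   (bit cleared, hand moves on); the first page found with the bit clear
   is chosen as the victim. */
static int clock_pick_victim(void) {
    for (;;) {
        if (!accessed[hand]) {
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        accessed[hand] = false;
        hand = (hand + 1) % NFRAMES;
    }
}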

LRU follows the stack property (or inclusion property) of eviction algorithms; FIFO does
not, which is the reason behind Belady's anomaly. Let A = {a1, a2, ..., an} be the access order
and p a cache size, and at some point k let the cache contents be S. If, for another cache size
q (< p), the cache contents at the same point k are S′ and S′ ⊆ S always holds, then the
eviction rule follows the property.
