Module 4 Memory (BKM)


The Memory System

Dr. Bimal Kumar Meher


Associate Professor
Dept. of CSE
Basic concepts
• The maximum size of the memory is determined by the
addressing scheme.
• Example: A 16-bit computer that generates 16-bit addresses
is capable of addressing up to 64K memory locations.
• The number of memory locations represents the size of the
address space of the computer.

[Figure: Connection of the memory to the processor. The processor's
MAR drives a k-bit address bus (up to 2^k addressable locations), and
its MDR connects to an n-bit data bus (word length = n bits). Control
lines carry R/W, MFC, etc.]
Read/Write Operations in Memory
• The processor reads the data from the memory by loading
the address of the required memory location into the MAR
and setting the R/W line to 1.
• The memory responds by placing the data from the
addressed location onto the data lines and confirms this
action by asserting the MFC (memory function completed)
signal.
• After receiving the MFC, the processor loads the data on the
data lines into the MDR.
• Similarly, the processor writes data into a memory location
by loading the address of this location into the MAR and
loading the data into the MDR.
• It also sets the R/W line to 0.
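The read and write sequences above can be condensed into a small
software model. The following is a minimal Python sketch, not real
hardware: the Memory class, its access method, and the boolean
standing in for the MFC signal are illustrative names of our own.

# A minimal model of the processor-memory handshake described above.
# Real hardware uses electrical signals, not method calls.
class Memory:
    def __init__(self, size):
        self.cells = [0] * size

    def access(self, mar, mdr, rw):
        """rw = 1 means Read, rw = 0 means Write, as in the slides."""
        if rw == 1:
            data = self.cells[mar]      # place data on the data lines
            return data, True           # True models the MFC signal
        self.cells[mar] = mdr           # store the MDR contents
        return None, True

# Processor side: load MAR and MDR, set R/W, wait for MFC.
mem = Memory(64 * 1024)                 # 64K locations (16-bit addresses)
mem.access(mar=0x1234, mdr=42, rw=0)    # write 42 to location 0x1234
data, mfc = mem.access(mar=0x1234, mdr=None, rw=1)
assert mfc and data == 42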
Memory access time vs Memory cycle time
 Memory access time: This is the time that elapses between
the initiation and the completion of a memory operation.
 Example: The time between the Read and the MFC signals.
 Memory cycle time: It is the minimum time delay required
between the initiation of two successive memory operations.
 Example: The time between the two successive Read
operations.
 Note: The cycle time is usually slightly longer than the access
time, depending on the implementation of the memory unit.
 Note: An important design issue is to provide a computer
with as large and as fast a memory as possible, within a given
cost target.
 Common techniques to increase the effective size and speed
of the memory:
 Cache memory (to increase the effective speed).
 Virtual memory (to increase the effective size).
Memory Hierarchy
[Figure: The memory hierarchy, from top to bottom: processor
registers, primary (L1) cache, secondary (L2) cache, main memory,
and magnetic-disk secondary memory. Moving down the hierarchy,
size increases; moving up, speed and cost per bit increase.]
Memory Hierarchy (contd…)
• Fastest access is to the data held in processor registers.
• Registers are at the top of the memory hierarchy.
• A relatively small amount of memory that can be implemented
on the processor chip is called the processor cache.
• Two levels of cache:
• Level 1 (L1) cache is on the processor chip.
• Level 2 (L2) cache is in between main memory and processor.
• Next level is main memory, implemented as SIMMs. Much
larger, but much slower than cache memory.
• Next level is magnetic disks. Huge amount of inexpensive
storage.
• Note: Speed of memory access is critical; the idea is to bring
instructions and data that will be used in the near future as
close to the processor as possible.
Internal organization of RAM
• A memory unit is called random access memory (RAM) if any
location can be accessed for a Read or Write operation in some
fixed amount of time independent of the location’s address.
• This is in contrast to magnetic disks or tapes, whose access time
depends on the address or the position of the data.
• RAM uses semiconductor integrated circuits for its
implementation.
• It consists of memory cells that are organized in the form of an
array.
• Each memory cell can hold one bit of information.
• One row is one memory word.
• All cells of a row are connected to a common line, known as the
word line.
• Word line is connected to the address decoder.
• Sense/write circuits are connected to the data input/output
lines of the memory chip.
Internal organization of RAM chip
[Figure: Internal organization of a 16 × 8 memory chip. Address lines
A0–A3 feed an address decoder that activates one of the 16 word
lines W0–W15. Each row of 8 flip-flop (FF) cells stores one memory
word. Sense/Write circuits connect the cell columns to the data
input/output lines b7–b0, under the control of the R/W and CS
(chip select) lines.]
SRAM vs DRAM
• Static RAM (SRAM):
• These are fast memories but come at a higher cost, because their
cells require several transistors.
• Consist of circuits that are capable of retaining their state as long
as power is applied.
• Volatile memories, because their contents are lost when power
is interrupted.
• Access times of static RAMs are in the range of a few
nanoseconds.
• Dynamic RAMs (DRAMs):
• Contain fewer transistors than SRAMs.
• Do not retain their state for a long time: each cell stores data as
charge on a capacitor, which leaks away.
• Contents must therefore be refreshed periodically.
• Contents may be refreshed while accessing them for reading.
• Access times of DRAMs are in the range of tens of
nanoseconds.
DRAM Packages
 Placing large memory systems directly on the motherboard
will occupy a large amount of space.
 Also, this arrangement is inflexible since the memory
system cannot be expanded easily.
 Packaging considerations have led to the development of
larger memory units known as
 SIMM (Single In-line Memory Module) and
 DIMM (Dual In-line Memory Module).
 Memory modules are an assembly of memory chips on a
small board that plugs vertically onto a single socket on the
motherboard.
 Occupy less space on the motherboard.
 Allows for easy expansion by replacement.
Cache Memory
 What is the need of cache memory?
 Processor is much faster than the main memory.
 As a result, the processor has to spend much of its
time waiting while instructions and data are being
fetched from the main memory.
 This idle time of the processor leads to poor
performance.
 Speed of the main memory cannot be increased
beyond a certain point.
 Cache memory is an architectural arrangement
which makes the main memory appear faster to the
processor than it really is.
 Cache memory is based on the property of computer
programs known as locality of reference.
Locality of Reference
 Analysis of programs indicates that many instructions in
localized areas of a program are executed repeatedly
during some period of time, while the others are accessed
relatively less frequently.
 These instructions may be the ones in a loop, nested
loop or few procedures calling each other repeatedly.
 This is called locality of reference.
 Temporal locality of reference:
 Recently executed instruction is likely to be executed
again very soon.
 Spatial locality of reference:
 Instructions with addresses close to a recently executed
instruction are likely to be executed soon.
Cache memory operations

[Figure: Processor ↔ Cache ↔ Main memory]

• When the processor issues a Read request, a block of words is
transferred from the main memory to the cache, one word at a time.
• Subsequent references to the data in this block of
words are found in the cache block (also called
cache line).
• So, at any given time, only some blocks in the
main memory are held in the cache.
Cache memory operations

[Figure: Processor ↔ Cache ↔ Main memory]

• A mapping function decides where the blocks of the main
memory are kept in the cache memory.
• When the cache is full, and a block of words
needs to be transferred from the main memory,
some block of words in the cache must be
replaced. This is determined by a replacement
algorithm.
Cache hit
• The existence of a cache is transparent to the processor. When
the processor issues a Read or Write request and the data is
available in the cache, it is called a Read or Write hit.
• Following are the operations done during the hit:
• Read hit:
 The data is simply read/obtained from the cache.
• Write hit:
 The cache holds a replica of the contents of the addressed
main memory locations.
 Case I- Contents of the cache and the main memory may
be updated simultaneously. This is the write-through
protocol.
 Case II- Update the contents of the cache, and mark it as
updated by setting a bit known as the dirty bit or
modified bit. The contents of the main memory are
updated when this block is replaced. This is write-back or
copy-back protocol.
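The two write-hit policies can be contrasted in a few lines. This is a
deliberately simplified Python sketch operating on single words rather
than whole blocks; the dictionaries and function names are our own.

# cache maps address -> (value, dirty_bit); main_memory is a plain dict.
cache, main_memory = {}, {}

def write_through(addr, value):
    # Case I: update the cache and the main memory simultaneously.
    cache[addr] = (value, False)
    main_memory[addr] = value

def write_back(addr, value):
    # Case II: update only the cache and set the dirty (modified) bit.
    cache[addr] = (value, True)

def evict(addr):
    # On replacement, a dirty entry must be copied back to main memory.
    value, dirty = cache.pop(addr)
    if dirty:
        main_memory[addr] = value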
Cache miss
• If the data is not present in the cache, then a Read miss or Write
miss occurs. Following are the operations done during a miss:
• Read miss:
 Block of words containing this requested word is transferred
from the memory to the cache.
 After the block is transferred, the desired word is forwarded to
the processor.
 The desired word may also be forwarded to the processor as
soon as it is transferred without waiting for the entire block to
be transferred. This is called load-through or early-restart.
• Write miss:
 If write-through protocol is used, then the data is written
directly into the main memory.
 If write-back protocol is used, the block containing the
addressed word is first brought into the cache, then the desired
word in the cache is overwritten with new information.
Cache Coherence Problem
• This is a data inconsistency problem in memories.
• It occurs when data is transferred between the
main memory and the disk, and copies of those
data blocks are also present in the cache.
• Valid bit: Each block is provided with a bit called
the valid bit. If the block contains valid data, this
bit is set to 1; otherwise it is set to 0.
• When is it set to 0 and when is it set to 1?
• Valid bits are set to 0,
• when the power is just turned on or
• main memory is loaded with new programs and data
from the disk.
• Valid bit is set to 1,
• when a block is loaded into the cache for the first time.
Cache Coherence Problem (contd…)
• Situation 1:
• Assume that there is a data transfer from the disk
to the main memory using DMA. It bypasses the
cache for cost and performance reasons.
• So, a check is made to determine whether the
block being loaded is currently present in the
cache.
• If it is present, then this will lead to the cache
coherence problem because of data inconsistency.
• So, the solution is to set the valid bit to 0 to ensure
that any stale or outdated data does not exist in
the cache.
Cache Coherence Problem (contd…)
• Situation 2:
• Assume that data is transferred from main memory
to disk and the cache uses write-back protocol.
• In this case, the data in the memory might not
reflect the changes that have been made in the
cached copy.
• This also leads to the cache coherence problem.
• One solution to this problem is to flush the cache
by forcing the dirty data (modified data) to be
written back to memory before the DMA transfer
takes place.
• The operating system can do this easily without
affecting performance because such disk transfers
don’t occur often.
Mapping functions
 Mapping functions determine how memory blocks
are placed in the cache.
 Three mapping functions:
 Direct mapping
 Associative mapping
 Set-associative mapping.
 Let us take a simple processor example to
understand these mappings:
 Cache consisting of 128 blocks of 16 words each.
 Total size of cache is 2048 (2K) words.
 Main memory is addressable by a 16-bit address.
 Main memory has 4K blocks of 16 words each.
 Therefore, main memory has 64K words.
Direct mapping
[Figure: Direct-mapped cache. Main memory blocks 0–4095 map onto
the 128 cache blocks; each cache block carries a tag field. Main
memory address fields: Tag (5) | Block (7) | Word (4).]
• Block j of the main memory maps to block (j modulo 128) of the
cache.
• Example: block 0 maps to cache block 0, block 129 maps to cache
block 1, and so on.
• The memory address is divided into three fields (illustrated in the
sketch below):
- Low-order 4 bits determine one of the 16 words in a block.
- When a new block is brought into the cache, the next 7 bits
determine in which cache block this new block is placed.
- High-order 5 bits determine which of the 32 possible memory
blocks is currently present in the cache block. These are the tag bits.
• Problem: More than one memory block is mapped onto the same
position in the cache.
• May lead to contention for cache blocks even if the cache is not full.
• The contention is resolved by allowing the new block to replace the
old block, leading to a trivial replacement algorithm.
• Simple to implement, but not very flexible.
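For this running example (16-bit addresses, a cache of 128 blocks of
16 words), extracting the three fields is simple bit manipulation. A
Python sketch follows; the function name is ours, but the 5/7/4 split
is the one shown above.

# Decompose a 16-bit address: Tag (5) | Block (7) | Word (4).
def direct_map_fields(addr):
    word  = addr & 0xF           # low-order 4 bits: word within the block
    block = (addr >> 4) & 0x7F   # next 7 bits: cache block number
    tag   = addr >> 11           # high-order 5 bits: tag
    return tag, block, word

# Memory block 129 starts at address 129 * 16; since 129 mod 128 = 1,
# it lands in cache block 1 with tag 1.
print(direct_map_fields(129 * 16))   # (1, 1, 0)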
Associative mapping
[Figure: Associative-mapped cache. Any main memory block 0–4095
can be placed in any of the 128 cache blocks; each cache block carries
a 12-bit tag. Main memory address fields: Tag (12) | Word (4).]
• Here, a main memory block can be placed into any cache block
position.
• The memory address is divided into two fields:
- Low-order 4 bits identify the word within a block.
- High-order 12 bits, called tag bits, identify a memory block
when it is resident in the cache.
• Flexible, and uses cache space efficiently.
• Replacement algorithms can be used to replace an existing block in
the cache when the cache is full.
• Drawback: Cost is higher than a direct-mapped cache because of the
need to search all 128 tag patterns to determine whether a given
block is in the cache.
Set-Associative mapping
[Figure: 2-way set-associative cache. The 128 cache blocks are
grouped into 64 sets of two blocks each; each block carries a 6-bit
tag. Main memory address fields: Tag (6) | Set (6) | Word (4).]
• It is a combination of direct and associative mapping.
• Blocks of the cache are grouped into sets.
• It allows a block of the main memory to reside in any block of a
specific set.
• Example: Let us divide the cache into 64 sets, with two blocks per
set.
• Memory blocks 0, 64, 128, etc. map to cache set 0, and can occupy
either of the two positions in that set.
• The memory address is divided into three fields (see the sketch
below):
- The 6-bit set field determines the set number.
- The 6-bit tag field is checked to see if the desired block of a set
is present.
• Note: The number of blocks per set is a design choice.
• A cache that has k blocks per set is called a k-way set-associative
cache.
• One extreme is to have all the blocks in one set, requiring no set
bits (fully associative mapping).
• The other extreme, one block per set, is the same as direct mapping.
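The corresponding 6/6/4 field extraction can be sketched the same
way (the function name is ours).

# Decompose a 16-bit address: Tag (6) | Set (6) | Word (4).
def set_assoc_fields(addr):
    word = addr & 0xF            # low-order 4 bits: word within the block
    s    = (addr >> 4) & 0x3F    # next 6 bits: set number
    tag  = addr >> 10            # high-order 6 bits: tag
    return tag, s, word

# Memory blocks 0, 64, and 128 all map to set 0 (block number mod 64):
for block in (0, 64, 128):
    print(block, '->', set_assoc_fields(block * 16))
# prints 0 -> (0, 0, 0), 64 -> (1, 0, 0), 128 -> (2, 0, 0)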
Example
• A computer uses 32-bit byte-addressing. It uses a 2-way set-
associative cache with a capacity of 32KB. Each cache block
contains 16 bytes. Calculate the number of bits in the TAG, SET,
and OFFSET fields of a main memory address.
• Solution:
• Since a cache block contains 16 bytes, the OFFSET field must contain 4 bits
(2^4 = 16).
• To determine the number of bits in the SET field, we need to determine the
number of sets.
• Each set contains 2 cache blocks (2-way associative), so a set contains 32 bytes.
• There are 32K bytes in the entire cache, so there are 32KB/32B = 1K sets.
Thus the SET field contains 10 bits (since 1K = 2^10).
• Finally, the TAG field contains the remaining 18 bits (= 32 - 4 - 10). So, the
main memory address is decomposed as TAG (18) | SET (10) | OFFSET (4).
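The same arithmetic can be verified with a few lines of Python; the
variable names are ours, and the parameters come straight from the
example.

import math

# 32-bit addresses, 32KB 2-way set-associative cache, 16-byte blocks.
address_bits, capacity, block_size, ways = 32, 32 * 1024, 16, 2

offset_bits = int(math.log2(block_size))              # 4
sets        = capacity // (block_size * ways)         # 1024 sets
set_bits    = int(math.log2(sets))                    # 10
tag_bits    = address_bits - set_bits - offset_bits   # 18
print(tag_bits, set_bits, offset_bits)                # 18 10 4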
Problems
1. A block-set-associative cache consists of a total of 64 blocks
divided into 4-block sets. The main memory contains 4096
blocks, each consisting of 128 words.
a) How many bits are there in a main memory address?
b) How many bits are there in the TAG, SET, and WORD fields?
2. A computer system has a main memory consisting of 1M
16-bit words. It also has a 4K-word cache organized in the
block-set-associative manner, with 4 blocks per set and 64
words per block. Calculate the number of bits in each of
the TAG, SET, and WORD fields of the main memory
address format.
3. A block set associative cache consists of a total of 128
blocks divided into 4-block sets. The main memory contains
4K blocks of 128 words each.
a) How many bits are there in main memory address and WORD field?
b) How many bits are there in each of the TAG and SET?
Replacement Algorithm
• If the cache is full and a new block is to be brought into the cache,
then the cache controller must decide which of the old blocks has
to be replaced.
• In the direct mapping method, the position of each block is pre-
determined, so there is no need for a replacement strategy.
• In the associative and set-associative methods, the block position is
not pre-determined.
• A rule of thumb is that blocks which are likely to be referenced in
the near future should not be replaced.
• Although this is not easy to decide, locality of reference helps to
choose the correct block in most cases.
• Therefore, a reasonable strategy is to select for replacement the
block that has not been referenced for the longest time.
• This block is called the least recently used block and the
technique is called the LRU replacement algorithm.
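A minimal sketch of LRU bookkeeping for one set of an associative
cache, using Python's OrderedDict to track recency; the class name
and interface are our own.

from collections import OrderedDict

# One cache set (or a fully associative cache). Keys are block
# numbers; dictionary order tracks recency of use.
class LRUSet:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # hit: now most recently used
            return 'hit'
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
        self.blocks[block] = True
        return 'miss'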
Hit Rate and Miss Penalty
• Hit rate: It is the ratio of no. of successful attempts to access the
data (called a hit) to all the attempted accesses in a cache.
• Miss rate: It is the ratio of no. of missed accesses to all the
attempted accesses in the cache.
• Miss penalty: when there is a miss, then extra actions are
required to bring the desired data to the cache. This is called miss
penalty.
• Hit rate can be improved by increasing block size, while keeping
cache size constant
• Miss penalty can be reduced if load-through approach is used
when loading new blocks into cache.
• Let h be the hit rate, M be the miss penalty (time to access data in
main memory), and C be the time to access data in the cache.
• Then the average access time experienced by the processor is:
Tavg = hC + (1 - h)M
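A quick check of the formula with illustrative numbers of our own
choosing: a 95% hit rate, a 1-cycle cache, and a 100-cycle main memory.

def avg_access_time(h, C, M):
    """Tavg = h*C + (1 - h)*M, as defined above."""
    return h * C + (1 - h) * M

print(avg_access_time(0.95, 1, 100))   # 5.95 cycles on average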
Virtual Memory
 Recall that an important challenge in the
design of a computer system is to provide a
large, fast memory system at an affordable
cost.
 Architectural solutions to increase the effective
speed and size of the memory system.
 Cache memories were developed to increase
the effective speed of the memory system.
 Virtual memory is an architectural solution to
increase the effective size of the memory
system.
Virtual Memory (contd..)
 We know that, the addressable memory space
depends on the number of address bits used by the
processor.
 For example, if a processor issues 32-bit
addresses, the addressable memory space is 4G
bytes.
 Some programs need even more than 4GB to
execute, and the main memory may be insufficient
to accommodate them.
 Large programs that cannot fit completely into the
main memory have their parts stored on secondary
storage devices such as magnetic disks.
 Pieces of programs must be transferred to the
main memory from secondary storage before
they can be executed.
Virtual memory larger than Physical memory
[Figure: a virtual address space larger than the physical memory,
with virtual pages mapped to page frames in memory or to disk.]
Virtual Memory (contd..)
 Techniques that automatically move program and
data between main memory and secondary
storage when they are required for execution are
called virtual-memory techniques.
 Programs and processors reference an instruction
or data independent of the size of the main
memory.
 Processor issues binary addresses for instructions
and data.
 These binary addresses are called logical or
virtual addresses.
Virtual memory organization
[Figure: Virtual memory organization. The processor issues a virtual
address to the MMU; the MMU issues a physical address to the cache,
which accesses the main memory; DMA transfers move pages between
the main memory and disk storage.]
• The memory management unit (MMU) translates virtual addresses
into physical addresses.
• If the desired data or instructions are in the main memory, they are
fetched as described for the cache memory.
• If the desired data or instructions are not in the main memory, they
must be transferred from secondary storage to the main memory.
• The MMU causes the operating system to bring the data from the
secondary storage into the main memory.
Concept of Page
 Assume that program and data are composed of
fixed-length units called pages.
 A page consists of a block of words that occupy
contiguous locations in the main memory.
 Page is a basic unit of information that is transferred
between secondary storage and main memory.
 Size of a page commonly ranges from 2K to 16K
bytes.
 Pages should not be too small, because the access
time of a secondary storage device is much larger
than that of the main memory.
 Pages should not be too large, else a large portion
of the page may not be used, and it will occupy
valuable space in the main memory.
Page Table and PTBR
 Each virtual or logical address generated by a
processor is interpreted as a virtual page number
(high-order bits) plus an offset (low-order bits) that
specifies the location of a particular byte within that
page.
 Information about the main memory location of each
page is kept in the page table. Information includes:
 main memory address where the page is stored.
 current status of the page.
 An area of the main memory that can hold a page is
called a page frame.
 Starting address of the page table is kept in a page
table base register (PTBR).
Page Table and PTBR(contd..)
• Virtual page number generated by the processor is
added to the contents of the PTBR.
• This provides the address of the corresponding
entry in the page table.
• The contents of this location in the page table give
the starting address of the page if the page is
currently in the main memory.
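In software terms, the lookup described above is one indexed read
followed by an address concatenation. A minimal sketch, assuming a
4KB page size and using a Python list to stand in for the page table
in main memory; all names are ours.

PAGE_SIZE = 4096                      # an assumed 4KB page size

# page_table[i] holds the page frame of virtual page i, or None if the
# page is not in main memory; indexing it stands in for PTBR + i.
page_table = [None] * 16
page_table[2] = 7                     # virtual page 2 -> page frame 7

def translate(virtual_addr):
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    frame = page_table[vpn]               # read the page table entry
    if frame is None:
        raise RuntimeError('page fault')  # page not in main memory
    return frame * PAGE_SIZE + offset     # physical address

print(hex(translate(2 * PAGE_SIZE + 0x10)))   # 0x7010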
Location of Page Table
 Where should the page table be located?
 Page table is used by the MMU for every read and
write access to the memory.
 Ideal location for the page table is within the
MMU.
 But, page table is quite large.
 Since the MMU is implemented as part of the processor
chip, it is impossible to include a complete page table
on the chip.
 Therefore, page table is kept in the main memory.
 A copy of a small portion of the page table can be
accommodated within the MMU.
 A small cache called TLB is incorporated inside the
MMU for this purpose.
Address translation using Page Table
[Figure: Address translation using the page table. The virtual address
from the processor is interpreted as a virtual page number plus an
offset. The page table base register (PTBR) holds the starting address
of the page table; PTBR + virtual page number gives the address of
the corresponding page table entry. The entry holds the starting
location of the page (the page frame) in the main memory, together
with control bits. Page frame + offset forms the physical address in
the main memory.]
Address translation (contd..) - Control bits
 Page table entry for a page also includes some control bits
which describe the status of the page while it is in the main
memory.
 One bit indicates the validity of the page.
 Indicates whether the page is actually loaded into the main memory.
 Allows the operating system to invalidate the page without actually
removing it.
 One bit indicates whether the page has been modified during
its residency in the main memory.
 This bit determines whether the page should be written back to the
disk when it is removed from the main memory.
 Similar to the dirty or modified bit in case of cache memory.
 Other control bits for various other types of restrictions that
may be imposed.
• For example, a program may only have read permission for a page, but
not write or modify permissions.
Translation Lookaside Buffer (TLB)
 A small cache called the Translation Lookaside Buffer (TLB)
is included in the MMU.
 TLB holds page table entries of the most recently
accessed pages.
 Recall that cache memory holds most recently accessed
blocks from the main memory.
 Operation of the TLB and page table in the main
memory is similar to the operation of the cache and
main memory.
 Page table entry for a page includes:
 Address of the page frame where the page resides in
the main memory.
 Some control bits.
 In addition to the above for each page, TLB must hold the
virtual page number for each page.
Associative-mapped TLB
[Figure: Associative-mapped TLB. The virtual page number field of
the virtual address is compared with the virtual page numbers stored
in the TLB; on a match (hit), the corresponding page frame bits are
concatenated with the offset to form the physical address; on a
mismatch (miss), the page table in the main memory is consulted.]
• High-order bits of the virtual address generated by the processor
select the virtual page.
• These bits are compared to the virtual page numbers in the TLB.
• If there is a match, a hit occurs and the corresponding address of
the page frame is read.
• If there is no match, a miss occurs and the page table within the
main memory must be consulted.
• Set-associative mapped TLBs are found in commercial processors.
Associative-mapped TLB (contd..)
 How to keep the entries of the TLB coherent with
the contents of the page table in the main memory?
 Operating system may change the contents of the
page table in the main memory.
 Simultaneously it must also invalidate the
corresponding entries in the TLB.
 A control bit is provided in the TLB to invalidate an
entry.
 If an entry is invalidated, then the TLB gets the
information for that entry from the page table.
 Follows the same process that it would follow if
the entry is not found in the TLB or if a “miss”
occurs.
Address Translation Steps
 Given a virtual address, the MMU looks into the
TLB for the referenced page.
 If the page table entry for this page is found in
the TLB, the physical address is obtained
immediately.
 If there is a miss in the TLB, then the required
entry is obtained from the page table in the main
memory and the TLB is updated.
 What happens if a program generates an access
to a page that is not in the main memory?
 In this case, a page fault is said to occur.
 Whole page must be brought into the main
memory from the disk, before the execution
can proceed.
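Putting the steps together: the sketch below, in Python with a plain
dict standing in for the TLB and the same assumed 4KB pages as in
the earlier sketch, checks the TLB first, falls back to the page table,
and signals a page fault when the page is absent.

PAGE_SIZE = 4096
page_table = [None] * 16
page_table[2] = 7                  # virtual page 2 -> page frame 7
tlb = {}                           # virtual page number -> page frame

def translate_with_tlb(virtual_addr):
    vpn, offset = divmod(virtual_addr, PAGE_SIZE)
    if vpn in tlb:                            # TLB hit: physical address
        frame = tlb[vpn]                      # obtained immediately
    else:                                     # TLB miss: consult the
        frame = page_table[vpn]               # page table in main memory
        if frame is None:                     # page fault: the OS must
            raise RuntimeError('page fault')  # bring the page from disk
        tlb[vpn] = frame                      # update the TLB
    return frame * PAGE_SIZE + offset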
Address Translation Steps (contd…)
 When the MMU detects a page fault, the following
actions take place:
 MMU asks the operating system to intervene by
raising an exception.
 Processing of the active task which caused the
page fault is interrupted.
 Control is transferred to the operating system.
 Operating system copies the requested page
from secondary storage to the main memory.
 Once the page is copied, control is returned to
the task which was interrupted.
Page Replacement
 When a new page is to be brought into the main
memory from secondary storage, the main memory
may be full.
 Some page from the main memory must be
replaced to keep the new pages.
 How to choose which page to replace?
 This is similar to the replacement that occurs
when the cache is full.
 The principle of locality of reference can also be
applied here.
 There are numerous page replacement algorithms,
but we will discuss three of them as follows:
 FIFO
 Optimal
 LRU
Basic Page Replacement
1. Find the location of the desired page on disk.
2. Find a free frame:
- If there is a free frame, use it.
- If there is no free frame, use a page replacement
algorithm to select a victim frame.
3. Bring the desired page into the (newly) free frame;
update the page and frame tables.
4. Restart the process.
Page Replacement Algorithms
• Objective is to achieve lowest page-fault rate
• Evaluate algorithm by running it on a particular string of
memory references (reference string) and computing the
number of page faults on that string
• In all our examples, the reference string is
7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1
FIFO Page Replacement
• The page that entered a frame first is the first to leave.
• Example: To find the no. of page faults for a given string of
page numbers.
• We take a Reference string to denote the page numbers
referred over a period of time:
7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1
• We also assume that, only 3 frames (3 pages can be in
memory at a time per process) are available.

• Tracing this string with 3 frames gives 15 page faults, as the
sketch below confirms.
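A short Python sketch (the function name is ours) reproduces the
count.

from collections import deque

def fifo_faults(refs, num_frames):
    """Count page faults under FIFO replacement."""
    frames, faults = deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == num_frames:
                frames.popleft()       # evict the page that entered first
            frames.append(page)
    return faults

refs = [7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1]
print(fifo_faults(refs, 3))   # 15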
Belady’s Anomaly
• It states that adding more frames can sometimes cause more
page faults in the FIFO page replacement algorithm!
• Example : Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
• 3 frames (3 pages can be in memory at a time per process): faults
occur on references 1, 2, 3, 4, 1, 2, 5, 3, 4, giving 9 page faults.
• 4 frames: faults occur on references 1, 2, 3, 4, 5, 1, 2, 3, 4, 5,
giving 10 page faults.
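Using the fifo_faults sketch from the previous slide, the anomaly is
easy to reproduce.

belady = [1,2,3,4,1,2,5,1,2,3,4,5]
print(fifo_faults(belady, 3))   # 9
print(fifo_faults(belady, 4))   # 10 - more frames, yet more faults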
Optimal Page Replacement Algorithm
• Replace the page that will not be used for the longest period of time.
• Example: 7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1

• Tracing the string with 3 frames gives 9 page faults, which is
optimal; no other algorithm can do better.


• How do you know this?
• Can’t read the future
• This algorithm is difficult to implement since it needs the future
knowledge of the referenced string.
• Mainly used for comparing performance of other algorithms.
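Although it cannot be implemented online, the optimal policy is easy
to simulate offline once the whole reference string is known. A
sketch, with a function name of our own:

def optimal_faults(refs, num_frames):
    """Count faults when evicting the page whose next use is farthest away."""
    frames, faults = [], 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)
            continue
        # Pages never referenced again get infinity and are evicted first.
        def next_use(p):
            future = refs[i + 1:]
            return future.index(p) if p in future else float('inf')
        frames.remove(max(frames, key=next_use))
        frames.append(page)
    return faults

refs = [7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1]
print(optimal_faults(refs, 3))   # 9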
Least Recently Used (LRU) Algorithm
• It uses the past knowledge of page references rather than
future knowledge.
• Replace the page that has not been used for the longest period
of time.
• Associate time of last use with each page

• Tracing the string with 3 frames gives 12 faults – better than
FIFO but worse than OPT.


• Generally good algorithm and frequently used
• There are two implementations possible: stack and counter.
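A sketch of the counting version, using Python's OrderedDict as one
way to realize the 'stack' implementation mentioned above; the
function name is ours.

from collections import OrderedDict

def lru_faults(refs, num_frames):
    """Count page faults under LRU replacement."""
    frames, faults = OrderedDict(), 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)       # now the most recently used
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.popitem(last=False)     # evict the least recently used
        frames[page] = True
    return faults

refs = [7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1]
print(lru_faults(refs, 3))   # 12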
