Memory System Design
A ROM is a form of semiconductor memory technology used where the data is written once and then
not changed. In view of this it is used where data needs to be stored permanently, even when the power
is removed - many memory technologies lose the data once the power is removed. As a result, this type
of semiconductor memory technology is widely used for storing programs and data that must survive
when a computer or processor is powered down. For example the BIOS of a computer will be stored in
ROM. As the name implies, data cannot be easily written to ROM. Depending on the technology used
in the ROM, writing the data into the ROM initially may require special hardware. Although it is often
possible to change the data, this again requires special hardware to erase the data before new data can be
written in.
The different memory types or memory technologies are detailed below:
DRAM: Dynamic RAM is a form of random access memory. DRAM uses a capacitor to store each bit
of data, and the level of charge on each capacitor determines whether that bit is a logical 1 or 0.
However these capacitors do not hold their charge indefinitely, and therefore the data needs to be
refreshed periodically. As a result of this dynamic refreshing it gains its name of being a dynamic RAM.
DRAM is the form of semiconductor memory that is often used in equipment including personal
computers and workstations where it forms the main RAM for the computer.
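The refresh behaviour described above can be illustrated with a small simulation. This is a sketch, not a model of real hardware: the leak rate, threshold, and refresh interval are all assumed values chosen only to show why an unrefreshed capacitor eventually loses its bit.

```python
# Illustrative sketch (not real hardware): a DRAM cell's capacitor
# leaks charge over time, so stored bits must be refreshed before
# the charge decays past the sense threshold.

LEAK_PER_MS = 0.02      # assumed fractional charge loss per millisecond
THRESHOLD = 0.5         # charge level below which a stored 1 reads as 0

def read_bit(charge):
    """A stored 1 is only recovered while charge stays above the threshold."""
    return 1 if charge > THRESHOLD else 0

def simulate(ms_elapsed, refresh_interval_ms=None):
    """Track the charge of a cell holding a 1, optionally refreshing it."""
    charge = 1.0
    for t in range(1, ms_elapsed + 1):
        charge -= LEAK_PER_MS
        if refresh_interval_ms and t % refresh_interval_ms == 0:
            charge = 1.0    # refresh: reading and rewriting restores full charge
    return read_bit(charge)

print(simulate(64))                         # without refresh the bit is lost
print(simulate(64, refresh_interval_ms=8))  # with refresh the bit survives
```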
EEPROM: This is an Electrically Erasable Programmable Read Only Memory. Data can be written to
it and it can be erased using an electrical voltage. This is typically applied to an erase pin on the chip.
Like other types of PROM, EEPROM retains the contents of the memory even when the power is turned
off. Also like other types of ROM, EEPROM is not as fast as RAM.
EPROM: This is an Erasable Programmable Read Only Memory. This form of semiconductor
memory can be programmed and then erased at a later time. This is normally achieved by exposing the
silicon to ultraviolet light. To enable this to happen there is a circular window in the package of the
EPROM to enable the light to reach the silicon of the chip. When the EPROM is in use, this window is
normally covered by a label, especially when the data may need to be preserved for an extended period.
The EPROM stores its data as a charge on a capacitor. There is a charge storage capacitor for each cell
and this can be read repeatedly as required. However it is found that after many years the charge may
leak away and the data may be lost. Nevertheless, this type of semiconductor memory used to be widely
used in applications where a form of ROM was required, but where the data needed to be changed
periodically, as in a development environment, or where quantities were low.
P-RAM / PCM: This type of semiconductor memory is known as Phase change Random Access Memory, P-
RAM, or just Phase Change Memory, PCM. It is based around a phenomenon where a form of chalcogenide
glass changes its state or phase between an amorphous state (high resistance) and a polycrystalline state (low
resistance). It is possible to detect the state of an individual cell and hence use this for data storage. Currently
this type of memory has not been widely commercialized, but it is expected to be a competitor for flash
memory.
PROM: This stands for Programmable Read Only Memory. It is a semiconductor memory which can
only have data written to it once - the data written to it is permanent. These memories are bought in a
blank format and they are programmed using a special PROM programmer. Typically a PROM will
consist of an array of fusible links, some of which are "blown" during the programming process to
provide the required data pattern.
SDRAM: Synchronous DRAM. This form of semiconductor memory can run at faster speeds than
conventional DRAM. It is synchronized to the clock of the processor and is capable of keeping two sets
of memory addresses open simultaneously. By transferring data alternately from one set of addresses,
and then the other, SDRAM cuts down on the delays associated with non-synchronous RAM, which
must close one address bank before opening the next.
SRAM: Static Random Access Memory. This form of semiconductor memory gains its name from the
fact that, unlike DRAM, the data does not need to be refreshed dynamically. It is able to support faster
read and write times than DRAM (typically 10 ns against 60 ns for DRAM), and in addition its cycle
time is much shorter because it does not need to pause between accesses. However it consumes more
power, is less dense and more expensive than DRAM. As a result of this it is normally used for caches,
while DRAM is used as the main semiconductor memory technology.
MEMORY ORGANIZATION
Memory Interleaving:
Pipeline and vector processors often require simultaneous access to memory from two or more
sources. An instruction pipeline may require the fetching of an instruction and an operand at the same
time from two different segments.
Similarly, an arithmetic pipeline usually requires two or more operands to enter the pipeline at
the same time. Instead of using two memory buses for simultaneous access, the memory can be
partitioned into a number of modules connected to common address and data buses. A
memory module is a memory array together with its own address and data registers. Figure 9-13 shows a
memory unit with four modules. Each memory array has its own address register AR and data register
DR.
The address registers receive information from a common address bus and the data registers
communicate with a bidirectional data bus. The two least significant bits of the address can be used to
distinguish between the four modules. The modular system permits one module to initiate a memory
access while other modules are in the process of reading or writing a word and each module can honor a
memory request independent of the state of the other modules.
The advantage of a modular memory is that it allows the use of a technique called interleaving.
In an interleaved memory, different sets of addresses are assigned to different memory modules. For
example, in a two-module memory system, the even addresses may be in one module and the odd
addresses in the other.
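The address split described above (low-order bits select the module) can be sketched in a few lines. This follows the four-module example in the text; the power-of-two module count is what makes the bit-slicing work.

```python
# Sketch of low-order interleaving with four modules: the two least
# significant address bits select the module, and the remaining bits
# select the word within that module.

NUM_MODULES = 4  # must be a power of two for this bit-slicing to work

def decode(address):
    module = address & (NUM_MODULES - 1)  # low 2 bits pick the module
    word = address >> 2                   # remaining bits address within module
    return module, word

# Consecutive addresses land in different modules, so sequential
# accesses can proceed in parallel:
for addr in range(8):
    print(addr, decode(addr))
```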
Capacity:
It is the global volume of information the memory can store. As we move from top to bottom in the Hierarchy, the
capacity increases.
Access Time:
It is the time interval between the read/write request and the availability of the data. As we move from top to bottom in
the Hierarchy, the access time increases.
Performance:
In early computer systems designed without a memory hierarchy, the speed gap between the CPU registers and main
memory grew because of the large difference in their access times. This lowered the performance of the system, so an
enhancement was required. The enhancement came in the form of the memory hierarchy design, which increases the
performance of the system. One of the most significant ways to increase system performance is minimizing how far
down the memory hierarchy one has to go to manipulate data.
Cost per bit:
As we move from bottom to top in the Hierarchy, the cost per bit increases i.e. Internal Memory
is costlier than External Memory.
Cache Memories:
The cache is a small and very fast memory, interposed between the processor and the main memory. Its purpose is to
make the main memory appear to the processor to be much faster than it actually is. The effectiveness of this approach
is based on a property of computer programs called locality of reference.
Analysis of programs shows that most of their execution time is spent in routines in which many instructions are
executed repeatedly. These instructions may constitute a simple loop, nested loops, or a few procedures that repeatedly
call each other.
The cache memory can store a reasonable number of blocks at any given time, but this number is small compared to
the total number of blocks in the main memory. The correspondence between the main memory blocks and those in the
cache is specified by a mapping function.
When the cache is full and a memory word (instruction or data) that is not in the cache is referenced, the cache control
hardware must decide which block should be removed to create space for the new block that contains the referenced
word. The collection of rules for making this decision constitutes the cache's replacement algorithm.
Cache Hits
The processor does not need to know explicitly about the existence of the cache. It simply issues Read and Write
requests using addresses that refer to locations in the memory. The cache control circuitry determines whether the
requested word currently exists in the cache.
If it does, the Read or Write operation is performed on the appropriate cache location. In this case, a read
or write hit is said to have occurred.
Cache Misses
A Read operation for a word that is not in the cache constitutes a Read miss. It causes the block of words containing
the requested word to be copied from the main memory into the cache.
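The benefit of a high hit rate can be quantified with the standard average memory access time (AMAT) formula. The timings below are illustrative assumptions, not figures from the text.

```python
# Hedged sketch: average memory access time shows why a small, fast
# cache makes main memory "appear" faster. The 10 ns / 100 ns timings
# and the 95% hit rate are assumed values for illustration.

def amat(hit_time, miss_penalty, hit_rate):
    """Average access time = hit time + miss rate * miss penalty."""
    return hit_time + (1 - hit_rate) * miss_penalty

# A 10 ns cache in front of memory with a 100 ns miss penalty,
# hitting 95% of the time, averages out to 15 ns per access:
print(amat(hit_time=10, miss_penalty=100, hit_rate=0.95))
```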
Cache Mapping:
There are three different types of mapping used for cache memory: direct mapping, associative mapping, and
set-associative mapping. These are explained below.
Direct mapping
The simplest way to determine cache locations in which to store memory blocks is the direct-mapping technique. In
this technique, block j of the main memory maps onto block j modulo 128 of the cache, as depicted in Figure 8.16.
Thus, whenever one of the main memory blocks 0, 128, 256, . . . is loaded into the cache, it is stored in cache block 0.
Blocks 1, 129, 257, . . . are stored in cache block 1, and so on. Since more than one memory block is mapped onto a
given cache block position, contention may arise for that position even when the cache is not full.
For example, instructions of a program may start in block 1 and continue in block 129, possibly after a branch. As this
program is executed, both of these blocks must be transferred to the block-1 position in the cache. Contention is
resolved by allowing the new block to overwrite the currently resident block.
With direct mapping, the replacement algorithm is trivial. Placement of a block in the cache is determined by its
memory address. The memory address can be divided into three fields, as shown in Figure 8.16. The low-order 4 bits
select one of 16 words in a block.
When a new block enters the cache, the 7-bit cache block field determines the cache position in which this block must
be stored. The high-order 5 bits of the address form a tag, which is stored alongside the block in the cache. On each
access, the tag bits of the address are compared with the tag of the block currently in that cache position. If they
match, then the desired word is in that block of the cache. If there is no match, then the block containing the required
word must first be read from the main memory and loaded into the cache.
The direct-mapping technique is easy to implement, but it is not very flexible.
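The address split in Figure 8.16 can be sketched directly. The 16-bit address width is implied by the field sizes given above (4-bit word field, 7-bit block field, 5-bit tag).

```python
# Sketch of the direct-mapping address split described above: a 16-bit
# word address with a 4-bit word field, a 7-bit cache-block field, and
# a 5-bit tag.

def split_address(addr):
    word = addr & 0xF            # low 4 bits: one of 16 words in a block
    block = (addr >> 4) & 0x7F   # next 7 bits: one of 128 cache blocks
    tag = addr >> 11             # high 5 bits: identifies the memory block
    return tag, block, word

# Main memory blocks 1 and 129 map to the same cache block position,
# distinguished only by their tags:
print(split_address(1 << 4))    # block 1 of main memory
print(split_address(129 << 4))  # block 129: same cache block, different tag
```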
Associative Mapping
In the associative-mapping method, a main memory block can be placed into any cache
block position. In this case, 12 tag bits are required to identify a memory block when it is resident in the
cache. The tag bits of an address received from the processor are compared to the tag bits of each block
of the cache to see if the desired block is present.
It gives complete freedom in choosing the cache location in which to place the memory block,
resulting in a more efficient use of the space in the cache. When a new block is brought into the cache, it
replaces (ejects) an existing block only if the cache is full. In this case, we need an algorithm to select
the block to be replaced.
To avoid a long delay, the tags must be searched in parallel. A search of this kind is called an associative search.
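The lookup can be sketched as follows. Note the hedge in the comments: hardware compares all tags in parallel, while this loop only models the outcome of that comparison, not its timing.

```python
# Sketch of an associative lookup: the requested block's tag is compared
# against the tag of every cache block. In hardware these comparisons
# run in parallel; this sequential loop only models the result.

def associative_lookup(cache_tags, tag):
    """Return the cache position holding `tag`, or None on a miss."""
    for position, stored_tag in enumerate(cache_tags):
        if stored_tag == tag:
            return position
    return None

cache_tags = [830, 12, 407, 96]  # illustrative 12-bit tags, in any order
print(associative_lookup(cache_tags, 407))  # hit
print(associative_lookup(cache_tags, 999))  # miss
```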
Set-Associative Mapping
Another approach is a combination of the direct- and associative-mapping techniques. The blocks of the
cache are grouped into sets, and the mapping allows a block of the main memory to reside in any block of
a specific set. Hence, the contention problem of the direct method is eased. At the same time, the
hardware cost is reduced by decreasing the size of the associative search.
An example of this set-associative-mapping technique is shown in Figure 8.18 for a
cache with two blocks per set. In this case, memory blocks 0, 64, 128, . . . , 4032 map
into cache set 0, and they can occupy either of the two block positions within this set.
Having 64 sets means that the 6-bit set field of the address determines which set of the
cache might contain the desired block. The tag field of the address must then be
associatively compared to the tags of the two blocks of the set to check if the desired
block is present. This two-way associative search is simple to implement.
The number of blocks per set is a parameter that can be selected to suit the requirements
of a particular computer. For the main memory and cache sizes in Figure 8.18, four
blocks per set can be accommodated by a 5-bit set field, eight blocks per set by a 4-bit set
field, and so on. The extreme condition of 128 blocks per set requires no set bits and
corresponds to the fully-associative technique, with 12 tag bits.
The other extreme, one block per set, is the direct-mapping method.
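The two-way split from Figure 8.18 can be sketched the same way as the direct-mapping split: 64 sets give a 6-bit set field, the 4-bit word field is unchanged, and the remaining 6 bits of the 16-bit address form the tag.

```python
# Sketch of the two-way set-associative address split from Figure 8.18:
# a 4-bit word field, a 6-bit set field (64 sets), and a 6-bit tag.

def split_address(addr):
    word = addr & 0xF               # low 4 bits: word within the block
    set_index = (addr >> 4) & 0x3F  # next 6 bits: one of 64 sets
    tag = addr >> 10                # high 6 bits: compared within the set
    return tag, set_index, word

# Memory blocks 0, 64, 128, ... all map into set 0 and are told apart
# by their tags:
print(split_address(0 << 4))    # block 0  -> set 0
print(split_address(64 << 4))   # block 64 -> set 0, different tag
```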
Replacement Algorithms
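The text notes earlier that when the cache is full, an algorithm must select the block to be replaced, but does not specify which one. Least recently used (LRU) is a common choice; the sketch below is offered as an assumption, not as the text's prescribed method.

```python
# Hedged sketch of LRU replacement: eject the block that has gone
# unused for the longest time.

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # insertion order doubles as recency order

    def access(self, block, data=None):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # mark as most recently used
            return self.blocks[block]        # hit
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # eject least recently used
        self.blocks[block] = data
        return None                          # miss

cache = LRUCache(2)
cache.access(1, "a")
cache.access(2, "b")
cache.access(1)          # touching block 1 makes block 2 the LRU
cache.access(3, "c")     # block 2 is ejected to make room
print(list(cache.blocks))
```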
Write Policies
Write-through protocol:
Here the cache location and the main memory location are updated simultaneously.
Write-back protocol:
This technique updates only the cache location and marks it with an associated
flag bit called the dirty/modified bit.
The word in the main memory will be updated later, when the block containing this
marked word is to be removed from the cache to make room for a new block.
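The two policies above can be contrasted in a minimal sketch. This models a single cache block only, as an illustration of the dirty-bit mechanism rather than a full cache.

```python
# Minimal sketch contrasting the two write policies: write-through
# updates main memory on every write, while write-back defers the
# update (via the dirty bit) until the block is ejected.

class Block:
    def __init__(self, data):
        self.data = data
        self.dirty = False

def write(block, main_memory, addr, value, policy):
    block.data = value
    if policy == "write-through":
        main_memory[addr] = value   # memory updated simultaneously
    else:                           # write-back
        block.dirty = True          # memory updated only on ejection

def eject(block, main_memory, addr):
    if block.dirty:
        main_memory[addr] = block.data  # deferred write-back
        block.dirty = False

main_memory = {0: "old"}
blk = Block("old")
write(blk, main_memory, 0, "new", "write-back")
print(main_memory[0])   # main memory not yet updated
eject(blk, main_memory, 0)
print(main_memory[0])   # updated when the block is removed
```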
To reduce the delay on a read miss, the load-through (early restart) protocol is used: the requested word is forwarded
to the processor as soon as it is read from main memory, without waiting for the entire block to be loaded into the cache.