
International Journal of Engineering and Technology Volume 2 No. 5, May, 2012

A Review of Architectures - Intel Single Core, Intel Dual Core and AMD Dual Core Processors and the Benefits

A.I. Fasiku¹, J. B. Olawale², O. T. Jinadu³

¹ Department of Computer Science, The Federal University of Technology, Akure, Ondo State, Nigeria.
² Department of Computer Engineering, Rufus Giwa Polytechnic, Owo, Ondo State, Nigeria.
³ Department of Computer Science, Rufus Giwa Polytechnic, Owo, Ondo State, Nigeria.

ABSTRACT

Computer architectures are approaching physical and technological barriers that make increasing the speed of a single core exceedingly difficult and economically infeasible. As a result, hardware architects have begun to design microprocessors with multiple processing cores that operate independently and share the same address space. However, increasing the number of cores on a single chip generates challenges with memory and cache coherence, as well as communication between the cores. AMD and Intel have addressed this problem by redesigning the processor architecture to put multiple CPU cores on a chip. This paper describes the architecture of single core and multicore processor technology.

Keywords: Computer Architectures, Single and Multicore Processors

I. INTRODUCTION

In 1945, mathematician John von Neumann, with the aid of J. Presper Eckert and John Mauchly, wrote a memo proposing the creation of an Electronic Discrete Variable Automatic Computer, more famously known as the EDVAC. In that paper, von Neumann suggested the stored-program model of computing. In the von Neumann architecture, the program is a sequence of instructions stored sequentially in the computer's memory. The program's instructions are executed one after the other in a linear, single-threaded fashion.

As time went on, advancements in mainframe technology expanded upon the ideas presented by von Neumann. The 1960s saw the advent of time-sharing operating systems. Run on large mainframe computers, these operating systems first introduced the concept of concurrent program execution. Multiple users could access a single mainframe computer simultaneously and submit jobs for processing. The operating system handled the details of allocating Central Processing Unit (CPU) time for each individual program. At this time, concurrency existed at the process level, and the job of task switching was left to the systems programmer [1].

The last 30 years have seen the computer industry driven primarily by faster and faster uniprocessors; those days have come to a close. Emerging in their place are microprocessors containing multiple processor cores that are expected to exploit parallelism. Modern computer systems are made up of multicore processors, e.g. Dual Core CPUs. Dual Core is an architecture that refers to a Central Processing Unit with two complete execution cores in a single processor. The two cores, their caches, and their cache controllers are all built together on a single IC. A Dual Core processor executes two threads by running each thread on a different core. Thus Dual Core processors improve multithreaded throughput and deliver the advantages of parallel computing to properly threaded mainstream applications [3].

A dual core processor is an integrated circuit with two central processing units. This allows an operating system to multitask, or perform different functions simultaneously, at the hardware level as opposed to the software level. This uses less energy and generates less heat than multitasking on one or even two single-core processors of similar performance. Additionally, CPU-intensive operations will not overload dual core processors, as they can be assigned one core in the background while the other core handles foreground operations. In order for a computer to utilize both processor cores, the operating system must support thread-level parallelism (TLP), that is, the ability to send new instructions to the central processing units before receiving the results of previous instructions. Most modern operating systems, such as Linux and Microsoft Windows, support TLP. While dual core processors are essentially two processors on a single chip, the advantages of having both processors in such close proximity go far beyond the obvious space savings [8].
II. THREAD LEVEL PARALLELISM

A thread is a flow of control through a program with a single execution point. It is a sequence or stream of executable code within a process that is scheduled for execution by the operating system on a processor or core. All processes have a primary thread; the primary thread is a process's flow of control or thread of execution. A process with multiple threads has as many flows of control as there are threads. Each thread executes independently and concurrently with its own sequence of instructions. A process with multiple threads is called multithreaded.

Threads execute independent concurrent tasks of a program. Threads can be used to simplify the program structure of an application with inherent concurrency, in the same way that functions and procedures make an application's structure simpler by encapsulating functionality. Threads can encapsulate concurrent functionality and use minimal resources shared in the address space of a single process, as compared to an application which uses multiple processes. This contributes to an overall simpler program structure being seen by the operating system. Threads can improve the throughput and performance of the application if used correctly, by utilizing multicore processors concurrently. Each thread is assigned a subtask for which it is responsible, and the thread independently manages the execution of that subtask. Each thread can be assigned a priority reflecting the importance of the subtask it is executing [6].
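To make this concrete, the following is a minimal sketch (ours, not the paper's) of a multithreaded process using POSIX threads, the C-level interface most operating systems expose for the threads described above. The thread count and function names are illustrative.

/* A minimal multithreaded process using POSIX threads.
   Compile with: gcc demo.c -pthread */
#include <pthread.h>
#include <stdio.h>

/* Subtask executed by each thread; the argument selects the work item. */
static void *subtask(void *arg) {
    int id = *(int *)arg;
    printf("thread %d running its subtask\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    int ids[4];

    /* The primary thread spawns four additional flows of control. */
    for (int i = 0; i < 4; i++) {
        ids[i] = i;
        pthread_create(&threads[i], NULL, subtask, &ids[i]);
    }
    /* Wait for every subtask to finish before the process exits. */
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);
    return 0;
}

On a multicore system the operating system is free to dispatch each of these threads to a different core, which is exactly the thread-level parallelism the section describes.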
A. Threads Model Implementation

There are three implementation models for threads:

a) User or application level threads are processes or threads executing instructions in the program or a linked library. They make no calls to the operating system kernel, are not visible to the operating system, and cannot be scheduled to a processor core. Each thread does not have its own thread context. So, as far as simultaneous execution of threads is concerned, there is only one thread per process running at any given time, and only a single processor core is allocated to that process. There may be thousands or tens of thousands of user-level threads for a single process, but they have no impact on the system resources.

b) Kernel level threads are processes or threads making system calls, such as accessing resources or throwing exceptions. Kernel-level threads reside in kernel space and are kernel objects. With kernel threads, each user thread is mapped to or bound to a kernel thread for the life of the user thread; once the user thread terminates, both threads leave the system. The operating system scheduler manages, schedules, and dispatches these threads, and the runtime library requests a kernel-level thread for each user-level thread. The operating system's memory management and scheduling subsystem must be considered for very large numbers of user-level threads; a context is created for each thread, and each of the threads from a process can be assigned to a processor core as resources become available.

c) Hybrid threads are the combination of user and kernel level threads. A hybrid thread implementation is a cross between user and kernel threads and allows both the library and the operating system to manage threads. User threads are managed by the runtime library scheduler, and the kernel threads are managed by the operating system scheduler. With this implementation, a process has its own pool of kernel threads. The user threads that are runnable are dispatched by the runtime library and are marked as available threads ready for execution. The operating system selects a user thread and maps it to one of the available kernel threads in the pool. More than one user thread may be assigned to the same kernel thread.

One of the big differences between these implementations is the mode in which they exist and the ability of the threads to be assigned to a processor. User- and kernel-level threads also become important when determining a thread's scheduling model and contention scope. Contention scope determines which threads a given thread contends with for processor usage, and it also becomes very important in relation to the operating system's memory management for large numbers of threads.
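As an illustration of contention scope, the POSIX thread attribute interface lets a program request either system-wide or process-local contention. The sketch below is illustrative, not prescriptive; many platforms (Linux among them) support only system contention scope, which is why the return value is checked.

/* Requesting a contention scope through POSIX thread attributes.
   PTHREAD_SCOPE_SYSTEM asks for a thread scheduled by the kernel
   against all threads in the system (a kernel-level thread);
   PTHREAD_SCOPE_PROCESS asks for contention only within the
   process (user-level scheduling). */
#include <pthread.h>
#include <stdio.h>

static void *task(void *arg) { (void)arg; return NULL; }

int main(void) {
    pthread_attr_t attr;
    pthread_t tid;

    pthread_attr_init(&attr);
    /* Not every platform supports both scopes, so check the result. */
    if (pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM) != 0)
        fprintf(stderr, "system contention scope not supported here\n");

    pthread_create(&tid, &attr, task, NULL);
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}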
III. ARCHITECTURE OF SINGLE CORE PROCESSOR

The Pentium 4 processor brand refers to Intel's line of single-core desktop and laptop Central Processing Units (CPUs), introduced on November 20, 2000 at a speed of 1.5 GHz and shipped through August 8, 2008. They are the 7th generation of the x86 microarchitecture, called NetBurst; this was the company's first new design since the introduction of the P6 microarchitecture of the Pentium Pro CPUs in 1995.

The Pentium 4 processor is designed to deliver performance across applications where end users can truly appreciate and experience better performance. For example, it allows a much better user experience in areas such as the Internet, audio and streaming video, image processing, video content creation, speech recognition, 3D applications and games, multimedia, and multitasking user environments. The Pentium 4 processor enables real-time MPEG2 video encoding and near real-time MPEG4 encoding, allowing efficient video editing and video conferencing. It delivers world-class performance on 3D applications and games, such as Quake 3, enabling a new level of realism and visual quality [8].

The Pentium 4 processor has 42 million transistors implemented on Intel's 0.18 µm CMOS process, with six levels of aluminium interconnect. It has a die size of 217 mm² and consumes 55 watts of power at 1.5 GHz. Its 3.2 GB/second system bus helps to provide the high data bandwidths needed to supply data to today's and tomorrow's demanding applications. It adds 144 new, 128-bit Single Instruction Multiple Data (SIMD) instructions called SSE2 (Streaming SIMD Extensions 2) that improve the performance of multimedia, content creation, scientific, and engineering applications.
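As a hedged illustration of what the SSE2 extensions provide (the example is ours, not the paper's), the following C fragment uses compiler intrinsics to add four pairs of 32-bit integers with a single 128-bit SIMD instruction.

/* SSE2 via compiler intrinsics: one PADDD instruction adds four
   32-bit integer lanes at once. Compile on x86 with: gcc -msse2 demo.c */
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdio.h>

int main(void) {
    __m128i a = _mm_set_epi32(4, 3, 2, 1);     /* lanes 1,2,3,4 */
    __m128i b = _mm_set_epi32(40, 30, 20, 10); /* lanes 10,20,30,40 */
    __m128i sum = _mm_add_epi32(a, b);         /* all four adds in one go */

    int out[4];
    _mm_storeu_si128((__m128i *)out, sum);
    printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]); /* 11 22 33 44 */
    return 0;
}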
executed next in the program and prepares them to be
A. The NetBurst Microarchitecture

The Pentium 4 is a processor with a deep pipeline supporting multiple issue with speculation. The Pentium 4 uses an aggressive out-of-order speculative microarchitecture, called NetBurst, that is deeply pipelined with the goal of achieving high instruction throughput by combining multiple issue and high clock rates, unlike the microarchitecture used in the Pentium III [7].

This processor requires balancing and tuning of many microarchitectural features that compete for processor die cost and for design and validation efforts. Fig. 1 shows the basic Intel NetBurst microarchitecture of the Pentium 4 processor. There are four main sections in this architectural diagram of the Pentium 4:

a) The in-order front end
b) The out-of-order execution engine
c) The integer and floating-point execution units
d) The memory subsystem of the Pentium 4 microarchitecture [4]

Fig. 1: Basic block diagram of Intel NetBurst

1) In-Order Front End: The in-order front end is the part of the machine that fetches the instructions to be executed next in the program and prepares them to be used later in the machine pipeline. Its job is to supply a high-bandwidth stream of decoded instructions to the out-of-order execution core, which will do the actual completion of the instructions. The front end has highly accurate branch prediction logic that uses the past history of program execution to speculate where the program is going to execute next. The predicted instruction address, from this front-end branch prediction logic, is used to fetch instruction bytes from the Level 2 (L2) cache. These IA-32 instruction bytes are then decoded into basic operations called micro-operations (uops) that the execution core is able to execute [5].

The NetBurst microarchitecture has an advanced Level 1 (L1) instruction cache called the Execution Trace Cache. Unlike conventional instruction caches, the Trace Cache sits between the instruction decode logic and the execution core, as shown in Fig. 1. In this location the Trace Cache is able to store the already decoded IA-32 instructions, or micro-operations (uops).
Storing already decoded instructions removes the IA-32 decoding from the main execution loop. Typically the instructions are decoded once, placed in the Trace Cache, and then used repeatedly from there, like a normal instruction cache on previous machines. The IA-32 instruction decoder is used only when the machine misses the Trace Cache and needs to go to the L2 cache to get and decode new IA-32 instruction bytes.

The front-end decoder translates each IA-32 instruction into a series of micro-operations (uops), which are similar to typical RISC instructions. The micro-operations are then executed by a dynamically scheduled speculative pipeline. A trace cache is a type of instruction cache that holds sequences of instructions to be executed, including nonadjacent instructions separated by branches; a trace cache tries to exploit the temporal sequencing of instruction execution rather than the spatial locality exploited in a normal cache.
2) Out-of-Order Execution Logic: The out-of-order execution engine is where the instructions are prepared for execution. The out-of-order execution logic has several buffers that it uses to smooth and re-order the flow of instructions to optimize performance as they go down the pipeline and get scheduled for execution. Instructions are aggressively reordered to allow them to execute as quickly as their input operands are ready. This out-of-order execution allows instructions in the program following delayed instructions to proceed around them, as long as they do not depend on those delayed instructions. Out-of-order execution allows execution resources such as the ALUs and the cache to be kept as busy as possible executing independent instructions that are ready to execute.

The retirement logic is what reorders the instructions, executed in an out-of-order manner, back into the original program order. This retirement logic receives the completion status of the executed instructions from the execution units and processes the results so that the proper architectural state is committed (or retired) according to the program order. The Pentium 4 processor can retire up to three micro-operations per clock cycle. This retirement logic ensures that exceptions occur only if the operation causing the exception is the oldest, non-retired operation in the machine. This logic also reports branch history information to the branch predictors at the front end of the machine so they can train with the latest known-good branch-history information [5].

3) Integer and Floating-Point Execution Units: The execution units are where the instructions are actually executed. This section includes the register files that store the integer and floating-point data operand values that the instructions need to execute. The execution units include several types of integer and floating-point execution units that compute the results, and also the L1 data cache that is used for most load and store operations.

4) Memory Subsystem: The memory subsystem, shown in Fig. 1, includes the L2 cache and the system bus. The L2 cache stores both instructions and data that cannot fit in the Execution Trace Cache and the L1 data cache. The external system bus is connected to the backside of the second-level cache and is used to access main memory when the L2 cache has a cache miss, and to access the system I/O resources [4].
IV. OVERVIEW OF MULTICORE PROCESSOR

The multi-core processor is the most recent evolution in computing technology. A multi-core processor is composed of two or more independent cores. It can be described as an integrated circuit which has two or more individual processors, called cores. Manufacturers typically integrate the cores onto a single integrated circuit die, known as a Chip Multiprocessor (CMP). A dual-core processor contains two cores (such as the AMD Phenom II X2, AMD Turion II P520 Dual-Core, Intel Pentium Dual-Core and Intel Core Duo), a quad-core processor contains four cores (such as the AMD Phenom II X4 and Intel's 2010 Core line, which includes three levels of quad-core processors), and a hexa-core processor contains six cores (such as the AMD Phenom II X6 or Intel Core i7 Extreme Edition 980X). A multi-core processor implements multiprocessing in a single physical package. Designers may couple cores in a multi-core device together tightly or loosely. For example, cores may or may not share caches, and they may implement message passing or shared memory inter-core communication methods.

In today's digital world, the demands for complex 3D simulations, streaming media files, added levels of security, more sophisticated user interfaces, larger databases, and more on-line users are beginning to exceed single-core processor capabilities. Multi-core processors enable true multitasking. On single-core systems, multitasking can max out CPU utilization, resulting in decreased performance as operations have to wait to be processed. On multi-core systems, since each core has its own cache, the operating system has sufficient resources to handle most compute-intensive tasks in parallel [8].

The most recent advances in microprocessor design for desktop computers involve putting multiple processors on a single computer chip. These multicore designs are completely replacing the traditional single core designs that have been the foundation of desktop computers. IBM, Sun, Intel, and AMD have all changed their chip pipelines from single core processor production to multicore processor production. This has prompted computer vendors such as Dell, HP, and Apple to change their focus to selling desktop computers with multicores.

Approaches to designing and implementing application software that will take advantage of multicore processors are radically different from the techniques used in single core development. The focus of software design and development will have to change from sequential programming techniques to parallel and multithreaded programming techniques. The standard developer's workstation and the entry-level server are now multiprocessors capable of hardware-level multithreading, multiprocessing and parallel processing. Although sequential programming and single core application development have a place and will remain with us, the ideas of multicore application design and development are now in the mainstream [6].

A. Multicore Architectures

Chip Multiprocessors (CMPs) come in multiple flavors: two-processor (dual core), four-processor (quad core), and eight-processor (octa-core) configurations. Some configurations are multithreaded; some are not. There are several variations in how cache and memory are approached in the new Chip Multiprocessors, and the approaches to processor-to-processor communication vary among different implementations. The Chip Multiprocessor implementations from the major chip manufacturers also differ in how each processor handles the Input/Output (I/O) bus and the Front Side Bus (FSB).

Again, most of these differences are not visible when looking strictly at the logical view of an application that is being designed to take advantage of a multicore architecture. Fig. 2 illustrates three common architectural configurations that support multiprocessing [6].

Fig. 2: Architectural Configuration of Multiprocessing

• Configuration 1 in Fig. 2 uses hyperthreading. Like Chip Multiprocessors, a hyperthreaded processor allows two or more threads to execute on a single chip. However, in a hyperthreaded package the multiple processors are logical instead of physical. There is some duplication of hardware, but not enough to qualify as a separate physical processor. So hyperthreading allows the processor to present itself to the operating system as multiple complete processors when in fact there is a single processor running multiple threads.

• Configuration 2 in Fig. 2 is the classic multiprocessor. In configuration 2, each processor is on a separate chip with its own hardware.

• Configuration 3 represents the current trend in multiprocessors. It provides complete processors on a single chip.
What is important to remember is that each configuration presents itself to the developer as a set of two or more logical processors capable of executing multiple tasks concurrently. The challenge for system programmers, kernel programmers, and application developers is to know when and how to take advantage of this.
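One concrete way a developer takes advantage of these configurations is to ask the operating system how many logical processors it presents and, where needed, to pin a thread to one of them. The sketch below is Linux-specific and illustrative: sysconf and the non-portable pthread_setaffinity_np extension are assumed to be available.

/* Query the logical processor count and pin the calling thread to
   logical processor 0. Compile with: gcc affinity.c -pthread */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Logical processors currently online: physical cores, or
       hyperthreaded logical CPUs as in configuration 1. */
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    printf("logical processors: %ld\n", n);

    /* Restrict this thread to logical processor 0. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        fprintf(stderr, "could not set affinity\n");
    return 0;
}

Note that the count reported here does not distinguish configuration 1 from configuration 3: a hyperthreaded single core and a true dual core both present two logical processors to software.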
V. INTEL CORE 2 DUO PROCESSOR ARCHITECTURE

Intel's Core 2 Duo is one of Intel's series of multicore processors. Some multicore processors are enhanced with hyperthreading, giving each core two logical processors. The first of Intel's multicore processors was the Intel® Pentium® processor Extreme Edition, introduced in April of 2005. It had dual cores and supported hyperthreading, giving the system four logical processors. The Core 2 Duo multicore processor was introduced in 2006 and offered not only multiple cores but also lower power consumption; it has no hyperthreading but supports 64-bit architecture. Fig. 3 shows a block diagram of the Intel Core 2 Duo. The Core 2 Duo processor has two 64-bit cores and two 64 KB level 1 caches, one for each core. The level 2 cache is shared between the cores and can be up to 4 MB. Either core can utilize up to 100 percent of the available L2 cache. This means that when the other core is underutilized and is, therefore, not requiring much L2 cache, the more active core can increase its usage of L2.

Fig. 3: Block diagram of Intel Core 2 Duo Processor Architecture

1) Northbridge and Southbridge:

Besides the Central Processing Units (CPUs), the next most important component of the motherboard is the chipset. The chipset, shown in Fig. 3, is a group of integrated circuits designed to work together that connects the CPUs to the rest of the components on the motherboard. It is an integrated part of the motherboard and, therefore, cannot be removed or upgraded. It is manufactured to work with a specific class or series of CPUs in order to optimize its performance and the performance of the system in general. The chipset moves data back and forth from the CPU to the various components of the motherboard, including memory, graphics card, and Input/Output (I/O) devices, as shown in Fig. 3. All communication to the CPU is routed through the chipset. The chipset comprises two chips: the Northbridge and the Southbridge. These names were adopted based on the locations of the chips on the motherboard and the purposes they serve.
The Northbridge is located in the northern region, north of the components on the motherboard, and the Southbridge is located in the southern region, south of the components on the motherboard. Both serve as bridges or connections between devices; they bridge components to make sure that data goes where it is supposed to go [6].

• The Northbridge, also called the Memory Controller Hub, communicates directly with the Central Processor via the Front Side Bus (FSB). It connects the CPUs with high-speed devices such as main memory. It also connects the CPUs with Peripheral Component Interconnect Express (PCI-E) slots and the Southbridge via an internal bus. Data is routed through the Northbridge first before it reaches the Southbridge.

• The Southbridge, also called the Input/Output (I/O) Controller, is slower than the Northbridge. Because it is not directly connected to the CPUs, it is responsible for the slower capabilities of the motherboard, like I/O devices such as audio, disk interfaces, and so on. The Southbridge is connected to the BIOS support via the Serial Peripheral Interface (SPI), six PCI-E slots, and other I/O devices not shown on the diagram. The Serial Peripheral Interface enables the exchange of data (1 bit at a time) between the Southbridge and the BIOS support using a master-slave configuration. It also operates in full duplex, meaning that data can be transferred in both directions.
2) Dual Core Cache:

Cache is memory placed between the processor and main system memory. Cache is not as fast as registers, but it is faster than Random Access Memory (RAM). It holds more than the registers but does not have the capacity of main memory. Cache increases the effective memory transfer rate and the overall processor performance. Cache is used to contain copies of data or instructions recently used by the processor. Small chunks of memory are fetched from main memory and stored in cache in anticipation that they will be needed by the processor. Programs tend to exhibit both temporal locality and spatial locality:

• Temporal locality is the tendency to reuse recently accessed instructions or data.

• Spatial locality is the tendency to access instructions or data that are physically close to items that were most recently accessed.

One of the primary functions of cache is to take advantage of this temporal and spatial locality characteristic of a program. Cache is often divided into two levels: level 1 and level 2 [7].

3) Level 1 Cache:

Level 1 cache is small in size, sometimes as small as 16 KB. L1 cache is usually located inside the processor and is used to capture the most recently used bytes of instruction or data.

4) Level 2 Cache:

Level 2 cache is bigger and slower than L1 cache. Currently it is often placed on the motherboard (outside the processor), but this is slowly changing. L2 cache is currently measured in megabytes. L2 cache can hold an even bigger chunk of the most recently used instructions and data, and items in their near vicinity, than L1 holds. Because L1 and L2 are faster than general-purpose RAM, the more correct the guesses of what the program is going to do next, the better the overall system performance, because the right chunks of data will be located in either L1 or L2 cache. This saves a trip out to either Random Access Memory or Virtual Memory or, even worse, external storage.
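A small, hedged demonstration of spatial locality (our example, with an arbitrary matrix size): C stores arrays in row-major order, so the first loop below walks memory sequentially and benefits from whole cache lines fetched at once, while the second strides across rows and defeats them. Timing the two loops on real hardware makes the difference visible.

/* Spatial locality: row-major vs. column-major traversal. */
#include <stdio.h>

#define N 1024
static double m[N][N];

int main(void) {
    double sum = 0.0;

    /* Cache-friendly: consecutive accesses are adjacent in memory. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += m[i][j];

    /* Cache-hostile: each access jumps N*sizeof(double) bytes ahead,
       so almost every fetched cache line is used only once. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += m[i][j];

    printf("%f\n", sum);
    return 0;
}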
A dual core processor design could provide any of the following for each physical processor:

• each processor could have its own on-die cache,

• the design could provide for the on-die cache to be shared by the two processors, or

• each processor could have a portion of on-die cache that is exclusive to it, together with a portion of on-die cache that is shared between the two cores.

The two processors in a dual core package could also have an on-die communication path between them, so that putting snoops and requests out on the Front Side Bus (FSB) is not necessary. Both processors must, however, have a communication path to the computer system's front side bus.

VI. AMD DUAL CORE PROCESSOR ARCHITECTURE

Advanced Micro Devices (AMD), Inc., released its dual-core Opteron server/workstation processors on 22 April 2005, and its dual-core desktop processors, the Athlon 64 X2 family, on 31 May 2005.
In terms of architecture, AMD and Intel have quite different ways of dealing with the issue of multicore systems. Fig. 4 shows a very simplified diagram of a dual-core Opteron designed by AMD. Each of the K8 cores has its own, independent L2 cache on board, but the two cores share a common system request queue. They also share a dual channel DDR memory controller and a set of HyperTransport links to the outside world. Access to these I/O resources is adjudicated via a crossbar, or switch, so that each CPU can talk directly to memory or I/O as efficiently as possible. In some respects, the dual-core Opteron acts very much like a sort of SMP system on a chip, passing data back and forth between the two cores internally. To the rest of the system I/O infrastructure, though, the dual-core Opteron looks more or less like the single-core version [8].

Fig. 4: AMD Dual Core Processor Architecture

Each processor (whether dual core or single) has its own local dual channel DDR memory controller, and the processors talk to one another and to I/O chips via point-to-point HyperTransport links running at 1 GHz. The total possible bandwidth flowing through the 940 pins of an Opteron 875 is 30.4 GB/s. With a HyperTransport link, the Opteron 275 can theoretically hit 22.4 GB/s. AMD uses a cache coherency protocol called MOESI (Modified, Owned, Exclusive, Shared, Invalid), in which a CPU that owns certain data has the data in its cache, has modified it, and yet makes it available to other CPUs. Data so flagged in an Opteron cache can be delivered directly from the cache of CPU 0 into the cache of CPU 1 via a CPU-to-CPU HyperTransport link, without having to be written to main memory. This interface runs at the speed of the CPU, so transfers from the cache on core 0 into the cache on core 1 should happen very, very quickly. This is in stark contrast to the Intel design, where Modified, Exclusive, Shared and Invalid (MESI) updates are communicated over the front side bus [2].
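For readers unfamiliar with the protocol, the sketch below lists the five MOESI states and the role the Owned state plays in the cache-to-cache transfer just described. The enum and program are purely illustrative, not AMD's implementation.

/* Illustrative sketch of the five MOESI cache-line states. */
#include <stdio.h>

enum moesi_state {
    MODIFIED,   /* dirty, and the only copy in the system           */
    OWNED,      /* dirty but shared: this cache answers other CPUs'  */
                /* requests for the line, with no memory writeback   */
    EXCLUSIVE,  /* clean, and the only copy                          */
    SHARED,     /* read-only copy; a dirty master may exist as Owned */
    INVALID     /* the line holds no valid data                      */
};

int main(void) {
    enum moesi_state line = OWNED;
    if (line == OWNED)
        printf("serve the line cache-to-cache; skip main memory\n");
    return 0;
}

The Owned state is exactly what MESI lacks: under MESI, a modified line must be written back to memory before another cache can share it.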
VII. CONCLUSION

The requirements for successfully delivering hardware-enhanced threading and multicore processing capability go beyond critical silicon manufacturing capacity and technology. The promise of a better user experience also depends on software: unless we develop parallel user-level applications, it will be difficult to harness the full power of multicore processor technology.

Multiple cores will provide easy benefits for multithreaded workloads, but many applications written for uniprocessors will not automatically benefit from CMP designs. The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used. The speedup of a parallel system is the ratio between the time taken by a single processor to solve a given problem and the time taken by a parallel system consisting of n processors to solve the same problem. This is expressed in equation 1.1.

Speedup(n) = T(1) / T(n) ------ 1.1

It appears that the advent of these multi-core architectures will finally force a radical change in how applications are programmed. Specifically, developers must consider how to direct the collaboration of many concurrent threads of execution to solve a single problem.
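The fraction-limited improvement described above is Amdahl's law. The short, illustrative C program below (ours, not the paper's) computes the resulting bound on the speedup of equation 1.1 for a program whose parallelizable fraction p is 0.95; the numbers show why even a well-designed dual core cannot double performance.

/* Amdahl's law: speedup on n cores when fraction p is parallelizable. */
#include <stdio.h>

static double amdahl(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    for (int n = 1; n <= 8; n *= 2)
        printf("p = 0.95, n = %d cores -> speedup %.2f\n", n, amdahl(0.95, n));
    return 0;
}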

REFERENCES

[1] S. Akhter and J. Roberts (2006); "Multi-Core Programming (Increasing Performance through Software Multi-threading)", Published by Richard Bowles, Library of Congress Cataloging in Publication Data, Printed in the United States of America.

[2] S. Anirban (2006); "Dual Core Processors – A Brief Overview", uploaded April 12th, 2006, [email protected], retrieved April 2011.

[3] V. Saxena and M. Shrivastava (2009); "UML Modeling and Performance Evaluation of Multithreaded Programs on Dual Core Processor", Ambedkar University (Central University), Lucknow, India, published by International Journal of Hybrid Information Technology, Vol. 2, No. 3, July 2009.

[4] J. Emer (2005); "Microprocessor Evolution: 4004 to Pentium-4", Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts.

[5] G. Hinton, D. Sager, M. Upton, D. Boggs, D. Carmean, A. Kyker and P. Roussel (2001); "The Microarchitecture of the Pentium® 4 Processor", Desktop Platforms Group, Intel Corp., Intel Technology Journal, Q1 2001.

[6] C. Hughes and T. Hughes (2008); "Professional Multicore Programming (Design and Implementation for C++ Developers)", Published by Wiley Publishing, Inc., 10475 Crosspoint Boulevard, Indianapolis, IN 46256, www.wiley.com.

[7] D.A. Patterson and J.L. Hennessy (2007); "Computer Architecture: A Quantitative Approach", 4th Edition, Published by Morgan Kaufmann Publishers, Inc., San Francisco, California, Printed in the United States of America.

[8] A. I. Fasiku (2012); "Performance Evaluation of Multicore Processors", M.Tech Thesis, Federal University of Technology, Akure, Nigeria.
