Fallacies and Pitfalls
The purpose of this section, which will be found in every chapter, is to explain some
commonly held misbeliefs or misconceptions that you should avoid. We call such misbeliefs
fallacies. When discussing a fallacy, we try to give a counterexample.
We also discuss pitfalls, which are easily made mistakes. Often pitfalls are generalizations of principles
that are true in a limited context. The purpose of these sections is to help you avoid making
these errors in computers that you design.
Fallacy Multiprocessors are a silver bullet.
The switch to multiple processors per chip around 2005 did not come from some
breakthrough that dramatically simplified parallel programming or made it easy to build
multicore computers. The change occurred because there was no other option due to the ILP
wall and the power wall. Multiple processors per chip do not guarantee lower power; it's
certainly possible to design a multicore chip that uses more
power. The potential is just that it's possible to continue to improve performance by
replacing a high-clock-rate, inefficient core with several lower-clock-rate, efficient cores. As
technology improves to shrink transistors, this can shrink both capacitance and the supply
voltage a bit so that we can get a modest increase in the number of cores per generation.
For example, for the last few years Intel has been adding two cores per generation.
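As a rough illustration of why the trade can work, recall the dynamic power relationship from Section 1.5: dynamic power scales with capacitive load times the square of the voltage times the frequency. The short sketch below uses illustrative numbers chosen only for this example, not measurements of any real processor, to compare one aggressive core against four slower, lower-voltage cores.

    # Dynamic power scales roughly as C * V^2 * f (Section 1.5). The numbers below
    # are illustrative assumptions, not measurements of any real processor.
    def dynamic_power(capacitance, voltage, frequency):
        return capacitance * voltage ** 2 * frequency

    # One aggressive core: high clock rate, high supply voltage.
    one_fast_core = dynamic_power(capacitance=1.0, voltage=1.2, frequency=3.3e9)

    # Four simpler cores: each at half the clock rate and a reduced voltage.
    four_slow_cores = 4 * dynamic_power(capacitance=1.0, voltage=0.85, frequency=1.65e9)

    # On a sufficiently parallel workload, the four half-speed cores offer up to 2x the
    # throughput of the single fast core for roughly the same total power.
    print(four_slow_cores / one_fast_core)   # about 1.0 with these numbers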
As we shall see in Chapters 4 and 5, performance is now a programmer's burden. The La-Z-Boy programmer era of relying on hardware designers to make their programs go faster
without lifting a finger is officially over. If programmers want their programs to go faster with
each generation, they must make their programs more parallel.
The popular version of Moore's law (increasing performance with each generation of
technology) is now up to programmers.
Pitfall Falling prey to Amdahl's heartbreaking law.
Virtually every practicing computer architect knows Amdahl's law. Despite this, we almost all
occasionally expend tremendous effort optimizing some feature before we measure its
usage. Only when the overall speedup is disappointing do we recall that we should have
measured first before we spent so much effort
enhancing it!
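A quick application of Amdahl's law shows how heartbreaking this can be. In the sketch below, the 10% usage fraction and the 10x enhancement are assumptions chosen purely for illustration.

    # Amdahl's law: overall speedup when only a fraction of execution time is enhanced.
    # The 10% fraction and 10x enhancement below are assumptions chosen for illustration.
    def amdahl_speedup(fraction_enhanced, speedup_enhanced):
        return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

    # We sped up a feature by 10x, then measured and found it is used 10% of the time.
    print(amdahl_speedup(0.10, 10))   # about 1.10 -- an overall gain of just 10%
    # Even an infinite speedup of that feature could not beat 1 / (1 - 0.10), about 1.11.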
Pitfall A single point of failure.
The calculations of reliability improvement using Amdahl's law on page 48 show that
dependability is no stronger than the weakest link in a chain. No matter how much more
dependable we make the power supplies, as we did in our example, the single fan will limit
the reliability of the disk subsystem. This Amdahl's law observation led to a rule of thumb for
fault-tolerant systems to make sure that every component was redundant so that no single
component failure could bring down the whole system. Chapter 6 shows how a software
layer avoids single points of failure inside warehouse-scale computers.
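To see the weakest-link effect numerically, consider the sketch below, which assumes independent, exponentially distributed failures so that component failure rates simply add. The MTTF values are illustrative placeholders in the spirit of the page 48 example, not the figures used there.

    # Assuming independent, exponentially distributed failures, component failure rates
    # add, so system MTTF = 1 / sum(1 / MTTF_i). The MTTF values are illustrative
    # placeholders, not the figures from the page 48 example.
    def system_mttf(mttf_hours):
        return 1.0 / sum(1.0 / m for m in mttf_hours)

    baseline = system_mttf([200_000,      # power supply
                            200_000,      # fan
                            1_000_000])   # disk
    better_power = system_mttf([2_000_000,    # 10x more dependable (redundant) power supplies
                                200_000,      # the single fan, unchanged
                                1_000_000])   # disk

    # Improving the power supplies 10x raises system MTTF only from about 91,000 to
    # about 154,000 hours; the lone fan now dominates the failure rate.
    print(round(baseline), round(better_power))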
Fallacy Hardware enhancements that increase performance improve energy efficiency or
are at worst energy neutral.
Esmaeilzadeh et al. [2011] measured SPEC2006 on just one core of a 2.67 GHz Intel Core i7
using Turbo mode (Section 1.5). Performance increased by a factor of 1.07 when the clock
rate increased to 2.94 GHz (or a factor of 1.10), but the i7 used a factor of 1.37 more joules
and a factor of 1.47 more watt-hours!
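Working through those ratios makes the point concrete; the sketch below simply divides the reported factors and introduces no new data.

    # Ratios reported by Esmaeilzadeh et al. [2011] for Turbo mode on one i7 core.
    performance_gain = 1.07   # speedup when the clock rose from 2.67 to 2.94 GHz
    energy_growth = 1.37      # factor of additional joules consumed

    # Performance per joule relative to the non-Turbo baseline:
    print(performance_gain / energy_growth)   # about 0.78, a 22% loss in energy efficiency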
Fallacy The rated mean time to failure of disks is so high that disks practically never fail.
Rated MTTFs of around 1,000,000 hours imply that only about 0.9% of disks would fail per year, or 4.4% over a 5-year lifetime. Moreover,
those high numbers are quoted assuming limited ranges of temperature and vibration; if
they are exceeded, then all bets are off. A survey of disk drives in real environments [Gray
and van Ingen 2005] found that 3% to 7% of drives failed per year, for an MTTF of about
125,000 to 300,000 hours. An even larger study found annual disk failure rates of 2% to 10%
[Pinheiro, Weber, and Barroso 2007]. Hence, the real-world MTTF is about 2 to 10 times
worse than the manufacturers' MTTF.
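The conversion between an MTTF and an annual failure rate is straightforward, as the sketch below shows; the 1,000,000-hour figure is a round rated value consistent with the 0.9% rate above, and the field MTTFs are the ones just cited.

    HOURS_PER_YEAR = 8760

    # Annual failure rate implied by an MTTF, assuming disks run 24 hours a day and
    # failed disks are replaced by ones with the same reliability characteristics.
    def annual_failure_rate(mttf_hours):
        return HOURS_PER_YEAR / mttf_hours

    print(annual_failure_rate(1_000_000))   # about 0.009: the rated 0.9% per year
    print(annual_failure_rate(300_000))     # about 0.03:  3% per year (field, best case)
    print(annual_failure_rate(125_000))     # about 0.07:  7% per year (field, worst case)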
Fallacy Peak performance tracks observed performance.
The only universally true definition of peak performance is the performance level a
computer is guaranteed not to exceed. Figure 1.20 shows the percentage of peak
performance for four programs on four multiprocessors. It varies from 5% to 58%. Since the
gap is so large and can vary significantly by benchmark, peak performance is not generally
useful in predicting observed performance.
Pitfall Fault detection can lower availability.
This apparently ironic pitfall arises because computer hardware has a fair amount of state that
may not always be critical to proper operation. For example, it is not fatal if an error occurs
in a branch predictor, as only performance may suffer.
In processors that try to aggressively exploit instruction-level parallelism, not all the
operations are needed for correct execution of the program. Mukherjee et al. [2003] found
that less than 30% of the operations were potentially on the critical path for the SPEC2000
benchmarks running on an Itanium 2.
The same observation is true about programs. If a register is dead in a program (that is,
the program will write it before it is read again), then errors do
not matter. If you were to crash the program upon detection of a transient fault in a dead
register, it would lower availability unnecessarily. Sun Microsystems lived this pitfall in 2000
with an L2 cache that included
parity, but not error correction, in its Sun E3000 to Sun E10000 systems. The SRAMs they
used to build the caches had intermittent faults, which parity detected. If the data in the
cache were not modified, the processor simply reread the data from memory. Since the
designers did not protect the cache with ECC
(error-correcting code), the operating system had no choice but to report an error for dirty
data and crash the program. Field engineers found no problems on inspection in more than
90% of the cases.
To reduce the frequency of such errors, Sun modified the Solaris operating system to scrub
the cache by having a process that proactively writes dirty data to memory. Since the
processor chips did not have enough pins to add ECC, the only hardware option for dirty
data was to duplicate the external cache, using the copy without the parity error to correct
the error.
The pitfall is in detecting faults without providing a mechanism to correct them. These
engineers are unlikely to design another computer without ECC on external caches.
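A tiny sketch of the underlying coding issue: a single parity bit can tell you that a word was corrupted but not which bit to flip back, whereas an ECC such as a single-error-correcting Hamming code can both locate and repair the bit. The example below uses a Hamming(7,4) code purely as an illustration; it is not the code Sun used.

    # A single parity bit detects one flipped bit but cannot say which bit it was.
    def parity(bits):
        return sum(bits) % 2

    # Hamming(7,4): a single-error-correcting code, shown only to illustrate what ECC
    # buys; it is not the specific code any vendor used.
    def hamming_encode(d):                       # d = [d1, d2, d3, d4]
        p1 = d[0] ^ d[1] ^ d[3]
        p2 = d[0] ^ d[2] ^ d[3]
        p3 = d[1] ^ d[2] ^ d[3]
        return [p1, p2, d[0], p3, d[1], d[2], d[3]]   # codeword positions 1..7

    def hamming_correct(c):                      # returns (corrected codeword, error position)
        s1 = c[0] ^ c[2] ^ c[4] ^ c[6]           # covers positions 1, 3, 5, 7
        s2 = c[1] ^ c[2] ^ c[5] ^ c[6]           # covers positions 2, 3, 6, 7
        s3 = c[3] ^ c[4] ^ c[5] ^ c[6]           # covers positions 4, 5, 6, 7
        pos = s1 + 2 * s2 + 4 * s3               # 0 means no error; otherwise the bad position
        if pos:
            c[pos - 1] ^= 1                      # ECC repairs the flipped bit in place
        return c, pos

    data = [1, 0, 1, 1]
    word = data + [parity(data)]                 # parity-protected word
    word[2] ^= 1                                 # a transient fault flips one bit
    print(parity(word[:-1]) != word[-1])         # True: parity detects the error...
    # ...but gives no way to know which of the five bits to flip back.

    code = hamming_encode(data)
    code[2] ^= 1                                 # the same single-bit fault
    fixed, where = hamming_correct(code)
    print(where, fixed == hamming_encode(data))  # ECC locates position 3 and repairs the codeword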