RMSS (1.5 Sigma Shift Paper) PDF
All Rights Reserved. No part of this book may be used or reproduced in any manner whatsoever
without written permission from the publisher except in the case of brief quotations embodied in
critical articles and reviews.
Publisher:
Palladyne Publishing
Distributor:
Tri Star Visual Communications
3110 North 35th Avenue, Suite 4
Phoenix, Arizona 85017
(602) 269-2900
[email protected]
ISBN 0-9715235-1-7
Table of Contents
Foreword
Two pillars of seemingly mystical origin and uncertain composition have long
supported the practice of six sigma. The first pillar is characterized by the quantity "six"
in the phrase "six sigma." The second pillar is related to the 1.5 sigma shift. This book
sets forth the theoretical constructs and statistical equations that underpin and validate
both of these pillars, as well as several other intersecting issues related to the subject.
The reader should be aware that this book has been prepared from a design
engineering perspective. Owing to this, it can fully support many of the aims associated
with design-for-six-sigma (DFSS). Although skewed toward design engineers, this book
provides a methodology for risk analysis that would be of keen interest to producibility
engineers. In addition, the book is also intended for quality professionals and process
engineers that are responsible for the "qualification" of a process prior to its adoption.
With these aims in mind, the ensuing discussion will mathematically demonstrate
that the "1.5 sigma shift" can be attributable solely to the influence of random error. In
this context, the 1.5 sigma shift is a statistically based correction for scientifically
compensating or otherwise adjusting a postulated model of instantaneous reproducibility
for the inevitable consequences associated with random sampling variation. Naturally,
such an adjustment (1.5 sigma shift) is only considered and instituted at the opportunity
level of a product configuration. Thus, the model performance distribution of a given
critical performance characteristic can be effectively attenuated for many of the
operational uncertainties associated with a design-process qualification (DPQ).
Based on this quasi-definition, it should be fairly evident that the 1.5 sigma shift factor
can often be treated as a "statistical correction," but only under certain engineering
conditions that would generally be considered “typical.” By all means, the shift factor
(1.5 sigma) does not constitute a "literal" shift in the mean of a performance distribution
– as many quality practitioners and process engineers falsely believe or try to postulate
through uninformed speculation and conjecture. However, its judicious application during
the course of designing a system, product, service, event, or activity can greatly facilitate
the analysis and optimization of "configuration repeatability."
1
Based on application experiences, Mr. William “Bill” Smith proposed the 1.5σ shift factor more than 18
years ago (as a compensatory measure for use in certain reliability and engineering analyses). At that
time, the author of this book conducted several theoretical studies into its validity and judiciously
examined its applicability to design and process work. The generalized application components were
subsequently published in several works by this author (see bibliography). While serving at Motorola,
this author was kindly asked by Mr. Robert “Bob” Galvin not to publish the underlying theoretical
constructs associated with the shift factor, as such “mystery” helped to keep the idea of six sigma alive.
He explained that such a mystery would help “keep people talking about six sigma in the many hallways
of our company.” To this end, he fully recognized that no matter how valid an initiative may be, if people
stop talking about it, interest will be greatly diminished, or even lost. In this vein, he rightfully believed
that the 1.5σ mystery would motivate further inquiry, discussion and lively debate – keeping the idea
alive as six sigma seated itself within the corporation. For such wisdom and leadership, this author
expresses his deepest gratitude. However, after 18 years, the time has come to reveal the theoretical
basis of six sigma and that of the proverbial 1.5σ shift.
2
At all times, the reader must remain cognizant of the fact that the field of producibility assessment is
relatively new territory in terms of engineering. Because of this, and its enormous scope, the full and
complete articulation of certain details is not possible within the confines of this book. As a
consequence, emphasis is placed on the development of a conceptual understanding at many points in
the discussion. Also recognize that the focus of this book is on the statistical theory and supporting
engineering rationale surrounding the 1.5σ shift factor advocated by the practice of six sigma.
Nonetheless, the discussion is framed in the context of design engineering and producibility analysis.
3
Although a particular set of subgroup-to-subgroup centering errors may be classified as “random,” their
individual existence is nonetheless unique and real. Just because the subgroup-to-subgroup variation is
random does not preclude the resulting units of product from being different. In short, the subgroup
differences (in terms of centering) may be statistically insignificant (owing to random sampling error), but
the difference is real – in an absolute sense, no matter how small or large. Owing to this, it is rational to
assert that such variation would induce unit-to-unit differences in reliability.
4
With this as a backdrop, this writer feels compelled to acknowledge the very fine technical contributions
and enhancements provided over the years by such accomplished engineers as Dr. Thomas Cheek, Dr.
Jack Prins, Dr. Douglas Mader, Dr. Ron Lawson and Mr. Reigle Stewart, just to name a few. During this
author’s years at Motorola, their personal insights and “late at night over a beer, pencil and calculator”
discussions significantly aided in adding to the body of six sigma research. Perhaps most of all, this
writer would like to recognize Mr. Robert “Bob” Galvin. His many words of wisdom, piercing leadership
acumen and personal encouragement provided this scientist the “intellectually-rich and politically-risk-free” environment from which to reach out and question conventional thinking. Only with his support
were the beginnings of this investigator’s journey made possible (and relatively painless). He is truly an
icon of leadership and a solid testament to what can happen when a senior executive embodies and
empowers the “idea of ideas.”
5
Given the model Y = f (X), it should be recognized that the function can be of a linear or nonlinear form.
For a linear transfer function f, we would rightfully expect that any given incremental change in X would
necessarily induce a corresponding and incremental change in Y. Given the same increment of change
in X, a nonlinear function would induce a disproportional change in Y.
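To make this distinction concrete, a minimal sketch is offered below (Python; the two transfer functions are hypothetical and chosen purely for illustration).

```python
# Hypothetical transfer functions used only to illustrate how f transmits change.
def f_linear(x):
    return 2.0 * x + 5.0          # linear f: dY is proportional to dX everywhere

def f_nonlinear(x):
    return 0.5 * x ** 2           # nonlinear f: dY depends on where X sits

dx = 1.0                          # the same increment of change in X
for x0 in (1.0, 5.0, 10.0):
    dy_lin = f_linear(x0 + dx) - f_linear(x0)
    dy_non = f_nonlinear(x0 + dx) - f_nonlinear(x0)
    print(f"X = {x0:4.1f}:  linear dY = {dy_lin:5.2f}   nonlinear dY = {dy_non:5.2f}")

# The linear case returns dY = 2.00 regardless of X; the nonlinear case
# returns a different (disproportional) dY at each operating point.
```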
6
Generally speaking, such variation can be of the random or nonrandom variety. Random variation is
also referred to as “white noise,” whereas nonrandom variation is referenced as “black noise.”
7
As may be apparent, such a deviation from expectation could be the result of a random or nonrandom
effect (as the case may be). Of course, the discovery, classification, and subsequent study of such
effects are of central concern to the field of mathematical statistics.
8
Of interest, most errors can be classified into one of four broad categories: 1) random transient; 2)
nonrandom transient; 3) random temporal; and 4) nonrandom temporal. While transient errors are
relatively instantaneous in nature, temporal errors require time to be fully created or otherwise
manifested. It goes without saying that random errors cannot be predicted or otherwise forecast (in a statistical
sense) whereas nonrandom errors can be. In this context, random errors do not have an “assignable
cause,” but the occurrence of nonrandom errors can be assigned. This is to say that nonrandom errors
can be directly attributed to the influence of one or more independent variables or some interactive
combination thereof.
9
This theoretical understanding naturally assumes that the partial derivatives associated with the
contributing Xs have been rank ordered in terms of influence and then subjected to the transformative
process f. Under this assumption, the residual error will decrease as the accumulation of influence
increases. Of course, the inverse of this is also true.
forced to grapple with the case ε > 0. Hence, the ever-present need for
mathematical statistics.
With respect to any dependent variable Y, each X within the
corresponding system of causation exerts a unique and contributory influence
(W). Of course, the weight of any given X is provided in the range 0.0 <
10
To this end, a statistical experiment is often designed and executed. Such experiments are intended to
efficiently isolate the underlying variable effects that have an undue effect on the mean and variance of
Y. As a part of such an exercise, a polynomial equation is frequently developed so as to interrelate or
otherwise associate Y to the “X effects” that prove to be of statistical and practical concern. In such
cases, it is not feasible to isolate the exhaustive set of Xs and all of their independent and interactive
effects. In other words, it would not make pragmatic or economic sense to attempt a full explanation or
accounting of the observed behavior in Y. Consequently, we observe that ε > 0 and conclude that the
given set of causative variables is not exhaustive.
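As a simple illustration of this point, consider the sketch below (Python/NumPy, with simulated data and hypothetical effects). A polynomial is fitted to relate Y to the one X retained for study; because the remaining causative variables are deliberately left out of the model, the residual error ε remains greater than zero.

```python
import numpy as np

rng = np.random.default_rng(7)              # reproducible illustration
x1 = np.linspace(0, 10, 50)                 # the retained (studied) factor
x2 = rng.normal(0, 1, 50)                   # an unstudied factor, left out of the model
y = 3.0 + 1.2 * x1 + 0.4 * x1 ** 2 + 2.0 * x2   # hypothetical "true" system of causation

coef = np.polyfit(x1, y, deg=2)             # polynomial relating Y to the X of concern
residual = y - np.polyval(coef, x1)         # what the fitted model cannot explain

print("fitted coefficients:", np.round(coef, 2))
print("residual standard deviation (epsilon > 0):", round(residual.std(ddof=1), 2))
```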
11
For the moment, let us postulate that N is exhaustive. As any given X is made to vary, we would
naturally observe some corresponding variation in Y, subject only to the mechanistic nature of the
function f. Of course, such variation (in Y and X) is also referred to as “error.” Thus, the function f is
able to “transmit” the error from X to Y. If the errors assignable to X are independent and random, the
corresponding errors in Y will likewise be independent and random. Naturally, the inverse of this would
be true – nonrandom error in X would transmit to Y in the form of nonrandom error. From a more
technical perspective, it can be said that any form of autocorrelated error in X would necessarily
transmit to Y in a consistent and predictable fashion – to some extent, depending on the function f. In
any such event, it is quite possible that a particular “blend” of nonrandom input variation could be
transmitted through the given function in such a way that the output variation would not exhibit any
outward signs of autocorrelation (for any given lag condition). Since Y would exhibit all the statistical
signs of random behavior, it would be easy to falsely conclude that the underlying system of causation
is non-deterministic.
12
Uncertainty is often manifested when: a) one or more causative variables are not effectively contained
within the composite set of such variables; b) the transfer function f is not fully valid or reliable; c) one or
more of the causative (independent) variables has undergone a momentary or temporal change of
state; d) two or more of the causative (independent) variables are somehow made interactive,
instantaneously or longitudinally; or e) some combination thereof.
14
In many cases, the nature of such error is often so complex, compounded, and confounded that existing
analytical technologies do not have the “diagnostic power” to discern or otherwise “source trace” its
independent origins through the many chains of causation. When it is not pragmatically feasible or
economically sensible to “track down” the primary sources of variation, we simply declare (assume) that
each individual error constitutes an “anomaly.” For any given anomaly, the circumstantial state of the
underlying cause system is momentarily declared to be “indeterminate” and, as a consequence, the
perturbation is treated as if it emanated from a system of random causes.
15
The momentary or longitudinal blending (mix) of many independent variables (each with a unique
weighting) can effectively “mask” the presence of a nonrandom signal condition inherent to the
dependent variable Y. As may be apparent, this would create the illusion of random variation (with
respect to Y). However, as the sources of variation (Xs) are progressively blocked or otherwise
neutralized (by virtue of a rational sampling scheme coupled with the appropriate analytical tools), the
dominant signal conditions would then be discernable from the white noise. When such a signal
condition is detected, the composite (total) variation would no longer be considered fully random. In
other words, as the background variations are minimized, the likelihood of detecting some type or form
of underlying signal increases. From a purely classical point-of-view, some would assert that nothing in
nature happens by chance (everything is theoretically deterministic). In other words, everything moves
in some form of trend, shift, or cycle. Holding this as an axiom, it would then be reasonable to assert
that Y is always perfectly predictable (theoretically speaking), regardless of how complex or
sophisticated the underlying system of causation may be. Accepting that Y = f (X1, … , XN) and given
that the influence of all variables is effectively eliminated except that of XK, then Y would necessarily
exhibit the same behavior as XK (momentarily and longitudinally). Thus, it can be theoretically argued
that any collective set of independent variables, each having a unique signal effect of a nonrandom
nature, can be momentarily or longitudinally blended or otherwise mixed in such a manner so as to form
a seemingly nondeterministic system. When this type of condition is at hand, it is often far more
convenient (for purposes of analysis) to assume a random model than it is to progress under the
constraints of a nonrandom model.
16
As an independent agent, any given source of variation has the capacity and capability to induce a
transient or temporal effect (error). However, when two or more such forces work in unison (at certain
operational settings), it is often possible to form an effect that is larger than the simple sum of their
independent contributions. In general, as the number of independent contributory forces increases, it
becomes less likely that the resulting effect (error) can be dissected or otherwise decomposed for
independent consideration and analysis. Consequently, higher order interactions are often treated as a
random effect when, in reality, that effect is comprised of several deterministic causes.
17
To better understand the idea of autocorrelation, let us consider a set of time-series data. Given this,
we say that the data are “sequentially realized over time.” First, let us consider a lag one condition. For
this condition, it can be said that any given error cannot be used to statistically forecast the next
observed error. For a lag two condition, the error from the two previous time periods cannot be used
(individually or collectively) to forecast the next observed error. Of course, this line of reasoning would
apply for all possible lag conditions. If no statistical correlation is observed over each of the possible
lags, it would then be reasonable to assert that the data is not patterned (the data would be free of any
discernable trends, shifts, or cycles).
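By way of illustration, the brief sketch below (Python/NumPy, with a simulated error series standing in for real data) estimates the correlation at several lag conditions; coefficients hovering near zero at every lag would support the assertion that the data are free of discernable trends, shifts, or cycles.

```python
import numpy as np

def lag_correlation(series, lag):
    """Pearson correlation between the series and itself shifted by `lag` periods."""
    return np.corrcoef(series[:-lag], series[lag:])[0, 1]

rng = np.random.default_rng(11)
errors = rng.normal(0, 1, 200)            # simulated, purely random error series

for lag in range(1, 6):
    r = lag_correlation(errors, lag)
    print(f"lag {lag}: r = {r:+.3f}")

# For a random series every coefficient hovers near zero; a trend, shift,
# or cycle would surface as a statistically significant coefficient at one
# or more of the lag conditions.
```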
18
From the dictionary, it should be noted that the word “temporal” is taken to mean “of or related to time.” Of
course, this definition could be applied to a short-term or long-term effect. However, for purposes of this
book and six sigma work, we naturally apply its meaning in a long-term sense. For example, when
characterizing a performance variable, we often seek to accomplish two things. First, we attempt to
isolate the short-term influence of random, “transient” effects (instantaneous errors). In general,
transient errors usually prove to be of the random variety. Second, we isolate those factors that require
the passage of time before their unique character can be fully identified or otherwise assessed. Such
errors are time-dependent and, as a consequence, are often referred to as “temporal errors.” From this
perspective, it is easy to understand how the collective influence of transient effects can govern the
short-term capability (instantaneous reproducibility) of a process. Given this, it is now easy to reason
how the total set of temporal effects (coupled with the aggregate transient effects) determines the long-term capability (sustainable reproducibility) of a process. Again, we must take notice of the fact that any given
transient or temporal effect can be of the random or nonrandom variety. However, as previously stated,
transient effects most generally induce a random influence whereas temporal effects are generally
manifested as both.
19
The reader should recognize that the idea of “error” and that of a “defect” are closely related, but not
necessarily synonymous. For example, let us postulate the marriage of a certain process to a
symmetrical bilateral specification, such that µ = T. In addition, we will also postulate the existence of a
particular error described as δi = Yi - µy. In this case, the deviation δi is fully recognized as an error, but
its vectored magnitude may not be large enough to constitute a defect (nonconformance to
specification).
20
The reader is again reminded that our discussion is based on the assumption of a random normal
variable.
21
The colorful term denominator management is used to describe the practice of inflating or otherwise
distorting the denominator term of the classic quality metric called defects-per-opportunity. As should
be apparent to the informed practitioner, such a practice is most often applied to effectively mask or
confound the true quality of a product or service. For example, consider a simple printed circuit board
(PCB) that employs through-hole technology. In this case, we will exemplify the soldered connection
between the two leads of a standard carbon resistor and the PCB. Given this, it is understood that each
component lead must be adequately soldered to the PCB at two different but related points (i.e., on the
top-side and bottom-side of the board). For the sake of discussion, let us say that the performance
category called “solder joint pull strength” is the CTQ of concern. Given the nature of this CTQ and
application technology at hand, it should be quite evident that each PCB connection constitutes an
independent opportunity to realize a pull-test failure. In other words, each lead of the resistor
represents a defect opportunity. If one lead of the resistor passes the pull test and the other lead fails
the test, then the defects-per-opportunity metric would be properly presented as dpo = d / o = 1 / 2 =
.50. A more liberal perspective would hold there are four defect opportunities since there would exist
four separate solder joints. In this event, the defects-per-opportunity would be wrongfully reported as
dpo = d / o = 1 / 4 = .25. Even more liberal would be the case that advocates six defect opportunities –
four solder joints and two leads. Taken to an extreme, some conniving managers might even try to say
there exist eight defect opportunities – four solder joints, two leads, and two through-holes. In this case,
the product quality would be given as dpo = d / o = 1 / 8 = .125. In this way, management could
inappropriately create a 4X quality improvement by simply changing the “rules of defect accounting.”
Thus, we have improvement by denominator management. To avoid such an error of leadership, we
must recognize that any given unit of product or service will inherently possess “Y” number of critical
failure modes, where each mode has “X” number of active chances. Thus, the total number of defect
opportunities can be described by the general relation O = Σ( Y * X ).
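The resistor example reduces to a few lines of arithmetic. The sketch below (Python, with the defect and opportunity counts taken from the discussion above) shows how the same single failure yields progressively “better” dpo figures as the opportunity count is liberalized.

```python
defects = 1                          # one lead fails the pull test

# Opportunity counts under progressively more "liberal" accounting rules.
counting_rules = {
    "two leads (proper)": 2,
    "four solder joints": 4,
    "joints + leads": 6,
    "joints + leads + holes": 8,
}

for rule, opportunities in counting_rules.items():
    dpo = defects / opportunities
    print(f"{rule:24s} dpo = {dpo:.3f}")

# Moving from o = 2 to o = 8 "improves" dpo from .500 to .125 -- a 4X gain
# created purely by changing the rules of defect accounting.
```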
22
For purposes of simplified communication, the author shall define the term “product” to mean any form of
deliverable resulting from a commercial or industrial endeavor or process. In some cases the “product”
may be a process, such as those often encountered in the service sector. In addition, any performance
characteristic that is vital to customer or provider satisfaction will be herein referred to as a “critical-to-
quality characteristic,” or CTQ for short.
23
There is usually a performance expectation for each and every critical feature in a system. Of course,
such specifications and requirements are derived from higher-order negotiations between the customer
and provider about what constitutes “value” in the business relationship. When such value is achieved or
exceeded, even for a single CTQ, we can say that entitlement has been realized. In this sense, value
entitlements are rightful expectations related to the various aspects of product utility, access and worth.
For example, there are three primary physical aspects (expectations) of utility – form, fit, and function. In
terms of access, we have three basic needs – volume, timing and location. With respect to worth, there
exist three fundamental value states – economic, intellectual and emotional.
24
Holistically speaking, any design feature (or requirement) constitutes a quality characteristic.
Interestingly, such characteristics are also known as “potential defect opportunities.” If a defect
opportunity is vital or otherwise critical to the realization of quality, it is most typically called a critical-to-
quality characteristic and designated as a “CTQ.”
25
Based on this, it is only natural that the defects-per-unit (dpu) will increase as the number of CTQs is
increased, given a constant and uniform level of process capability. As a result of this, the DPU metric is
not a good comparative index for purposes of benchmarking. In other words, DPU should not be used to
compare the inherent quality capability of one deliverable to some other type of deliverable, owing to
differences in complexity. However, by normalizing the DPU to the opportunity level, and then converting
the defect rate to a sigma value (equivalent Z), it is possible to compare apples-to-oranges, if you will.
Only then do we have a level playing field for purposes of benchmarking and for subsequently comparing
dissimilar phenomena.
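A minimal sketch of this normalization is offered below (Python with SciPy; the unit, defect, and opportunity counts are hypothetical). The dpu is first normalized to the opportunity level and the resulting defect rate is then converted to an equivalent Z.

```python
from scipy.stats import norm

# Hypothetical deliverables of very different complexity.
units = 1000
defects_a, opportunities_a = 50, 10        # simple product: 10 CTQs per unit
defects_b, opportunities_b = 50, 200       # complex product: 200 CTQs per unit

for name, d, o in [("simple", defects_a, opportunities_a),
                   ("complex", defects_b, opportunities_b)]:
    dpu = d / units
    dpo = dpu / o                           # normalize to the opportunity level
    z_equiv = norm.isf(dpo)                 # equivalent Z for the observed defect rate
    print(f"{name:8s} dpu = {dpu:.3f}  dpo = {dpo:.5f}  Z.equiv = {z_equiv:.2f}")

# Both products exhibit the same dpu, yet the opportunity-normalized metric
# (and its equivalent Z) reveals the complex product's superior per-opportunity
# capability -- a like-for-like basis for benchmarking.
```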
26
We naturally recognize that a symmetrical-bilateral specification is arguably the most common type of
performance requirement. As a consequence, this particular type of design expectation was selected to
conventionally idealize a statistically-based definition of six sigma capability. Nevertheless, we must also
acknowledge the existence of asymmetrical-bilateral specifications, as well as unilateral specifications
(one-sided). While the unilateral case can be defined by either side of a symmetrical-bilateral
specification (with or without a nominal specification), the short-term error rate is consequently reduced
to one defect-per-billion-opportunities, or DPBO = 1.0. However, the asymmetrical bilateral case
presents some interesting challenges when attempting to define a six sigma level of capability. For
example, consider an asymmetrical-bilateral performance specification while recognizing that a normal
distribution is symmetrical – indeed, an interesting set of circumstances. Given this framework, a six
sigma level of capability must be conditionally associated with the most restrictive side of the
specification. In other words, the capability must be made relational to the smallest semi-tolerance zone.
But if for some pragmatic reason it is more beneficial to locate the process center off target (in the form
of a static mean offset), the short-term definition of six sigma becomes highly relative. For such
instances, sound statistical reasoning must prevail so as to retain a definition that is rational, yet
theoretically sound.
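For the unilateral case just mentioned, the short-term defect expectation at six sigma can be verified directly, as in the brief check below (Python with SciPy).

```python
from scipy.stats import norm

# Short-term (centered) six sigma capability against a single specification limit.
tail_probability = norm.sf(6.0)              # area beyond +6 sigma
dpbo = tail_probability * 1e9                # defects per billion opportunities

print(f"P(Z > 6) = {tail_probability:.3e}  ->  DPBO = {dpbo:.2f}")
# Prints roughly DPBO = 0.99, i.e., about one defect-per-billion-opportunities.
```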
27
It must be recognized that the short-term standard deviation (root-mean-square) is a statistical measure
of random error that, when properly estimated, provides an index of instantaneous reproducibility. In this
regard, it only reports on the relative extent to which random background variation (extraneous noise)
influences the “typical mean deviation” that can be expected at any given moment in time. In this sense,
it constitutes the magnitude of instantaneous error that emanates from the system of causation and is,
therefore, a measure of inherent capability, also called entitlement capability.
28
The uninformed reader should understand that unity (per se) is statistically constituted by 100 percent of
the area under the normal distribution. Given this, we naturally recognize that the “tails” of a normal
distribution bilaterally extend to infinity. However, conventional quality practice often “trims the tails” of
such a distribution and declares that unity exists between the three sigma limits. This is done in the
interests of enjoying certain analytical conveniences. Of course, this convention logically assumes the
area extending beyond the three-sigma limits is trivial and, therefore, inconsequential. Perhaps such an
assumption is reasonable when balancing statistical precision against the demands of quality reporting.
Figure 3.3.1
Depiction of a Six Sigma Critical-to-Quality Characteristic that Reflects Transient and Temporal Sources of Random Error
[Figure: Case A (short-term, 6.0σ, ppm = .001) and Case B (long-term, 4.5σ) shown against T = 100 and USL = 130 (scale values 115.0 and 122.5), separated by a 1.5σA offset, with a 50 percent margin indicated.]
30
These assertions will be theoretically demonstrated later in this discussion. For the moment, the reader
is kindly asked to faithfully accept this premise without proof.
31
However, such a shock effect is often manifested as a transient (short-term) disturbance to the process
center. When this happens, the probability of a defect temporarily increases. Of course, the exact
duration of this effect is generally indeterminate, owing to the random nature of the underlying system of
causation. Because of this, it should now be apparent that if a design engineer seeks to establish a long-
term safety margin of M = .25, the short-term marginal expectation must be generously greater than 25
percent. By enlargement of M, the engineer is able to provide a more realistic level of “guard banding”
that cushions a performance distribution against certain types of disturbances resulting from transient
and temporal effects that tend to upset process centering. Again, more will be said about this later in this
book.
Figure 3.4.1
Depiction of a Long-term Six Sigma Critical-to-Quality Characteristic
Presented as an Equivalent Short-term Shifted Distribution
32
As this discussion point would naturally imply, the population standard deviation is fully known a priori
and genuinely reflects all known and unknown sources of random error (white noise).
33
For purposes of this discussion, it will be known to the reader (by definition) that the given sample
consisting of N = 4 members prescribes a “rational” sub-grouping of the measurements. In recognition
of conventional quality practice, this assertion stipulates that the observed within-group errors are fully
independent, random and normally distributed. Furthermore, sampling plans that involve the formation
of rational subgroups often rely on a subgroup size within the general range 4 ≤ N ≤ 6, where the typical
subgroup size is often defined as N = 5. Since subgroup size is positively correlated to statistical
precision, it is proposed that the case of N = 4 can be pragmatically and operationally viewed as a
“worst-case” sampling construct, especially when declaring the expected theoretical error associated
with process centering. In other words, a design engineer is often not privy to the sampling plan that
manufacturing intends to implement (for purposes of statistical process control). As a consequence, the
design engineer should be somewhat pessimistic when attempting to analyze the influence of natural
process centering errors on design performance. Hence, the reliance on “worst-case” sampling
assumptions when analyzing the producibility of a design (prior to its release for production).
34
For example, the distribution of sampling averages is one of several theoretical constructs that is
essential to the proper construction and operation of an Xbar and R chart. Such statistical devices are
often employed during the course of production to ensure the proper and sufficient management of
process centering.
35
Knowledge of the distribution of sample averages would make it possible (and highly advantageous) to
account for natural process centering error during the course of design. In this manner, the natural and
expected errors in process centering (as would be normally experienced during the course of
production) could be effectively neutralized or otherwise managed at the time of design configuration.
Of course, the principles of robust design and mathematical optimization could be invoked to realize this
aim.
36
The reader must recognize that such a level of confidence ( 1 - α = 1 - .0027 = .9973, or 99.73 percent) is frequently employed in the quality sciences, especially in the application of statistical process control (SPC) charts. Statistically speaking, this particular level of confidence is defined or otherwise circumscribed by the ± 3.0σXbar limits commonly associated with the distribution of sampling averages.
37
The reader must recognize that a critical-to-quality characteristic (CTQ) is, by definition, a defect
opportunity (assuming it is actively assessed and reported). To illustrate, let us consider a product
called “Z.” As expected, Z would most likely consist of Y number of CTQs, where any given CTQ could
have X number of occurrences. Therefore, the total number of defect opportunities per unit of product
would be computed as O = Σ(Y * X).
38
Of sidebar interest, the advanced reader will understand that the Poisson distribution can be employed
to establish the throughput yield of a process (likelihood of zero defects) when the dpu is known or has
been rationally estimated. This is done by considering the special case of r = 0 (zero defects), where
the quantity Y = [(dpu)^r * e^(-dpu)] / r! is reduced to Y = e^(-dpu). In this reduced form, the quantity Y represents the statistical probability of first-time yield. In other words, e^(-dpu) is a quantity that reports the statistical probability of a unit of product (or service) being realized with zero defects (based on the historical dpu or projection thereof).
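A short numerical sketch of this reduction is given below (Python; the dpu value is hypothetical).

```python
import math

dpu = 0.25                                   # hypothetical historical defects-per-unit

def poisson_prob(r, dpu):
    """Poisson probability of exactly r defects given the dpu."""
    return (dpu ** r) * math.exp(-dpu) / math.factorial(r)

throughput_yield = poisson_prob(0, dpu)      # special case r = 0 reduces to exp(-dpu)
print(f"Y = exp(-{dpu}) = {throughput_yield:.4f}")   # about .7788, or 77.9 percent
```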
39
Generally speaking, the shift factor is added to an estimate of long-term capability in order to remove
long-term influences, therein providing an approximation of the short-term capability. Conversely, the
shift factor is subtracted from an estimate of the short-term capability in order to inject long-term
influences, thereby providing an approximation of the long-term capability. For example, if the long-
term capability of a process was known to be 4.5σ, and we seek to approximate the short-term
capability, then 1.5σ would be added to 4.5σ, therein providing the short-term estimate of 6.0σ.
Conversely, if the short-term capability was known to be 6.0σ, and we seek to approximate the long-
term capability, then 1.5σ must be subtracted from 6.0σ, therein providing the long-term estimate of
4.5σ.
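These two directions of adjustment amount to nothing more than adding or subtracting the shift constant, as the short sketch below suggests (Python, using the conventional 1.5σ value and the 4.5σ/6.0σ figures from the example).

```python
Z_SHIFT = 1.5   # the conventional shift factor

def short_term_from_long_term(z_lt, shift=Z_SHIFT):
    """Approximate Z.st by removing long-term influences."""
    return z_lt + shift

def long_term_from_short_term(z_st, shift=Z_SHIFT):
    """Approximate Z.lt by injecting long-term influences."""
    return z_st - shift

print(short_term_from_long_term(4.5))   # 6.0, the short-term estimate
print(long_term_from_short_term(6.0))   # 4.5, the long-term estimate
```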
40
In other words, it is a “best guess” in the light of ambiguity – especially in the absence of actual short-
term performance information. As such, it must not be viewed as an empirical measure of inherent
capability or instantaneous reproducibility – as many uninformed practitioners might falsely believe. It is
simply a rational “ballpark” approximation, expectation, or projection of short-term performance – made
in the absence of empirical data or experiential information.
41
Naturally, this assumes that the process of verification is perfect. This is to say that the test or
inspection process is fully devoid of any type or form of error. In other words, the probability of decision
error is zero, regardless of its nature -- Type I ( α ) or Type II ( β ).
42
In this particular case example, the “centering drift” is biased toward the USL, owing to the fact that an
outside dimension (OD) is being considered. If an inside dimension (ID) is being considered, the drift will
be biased toward the LSL.
43
Utility has to do with the form, fit and function of a deliverable. Access has to do with the various timing,
volume and location aspects associated with the delivery of a product or service. Worth covers the
emotional value, intellectual value and economic value of any given deliverable.
44
We naturally recognize that the configuration and composition of a system’s design is unique in every
case. In fact, the interactions within and between these two aspects of a design can spawn a very
complex system of classification in terms of scope and depth. Owing to this, we often see the Pareto
principle at play when considering a certain aspect of the design. Translated, this principle holds that a
certain 15 percent of a system’s complexity will fully account for 85 percent of the value associated with a
given aspect of quality (utility, access, worth). However, when the various aspects of quality are
considered as a collective whole, the Pareto principle is often severely mitigated or otherwise distorted.
In general, however, the Pareto Principle (85/15 rule) will emerge and become self-apparent as a given
system of quality classification is hierarchically and progressively interrogated. The reader should be
aware that many practitioners advocate the rule of Pareto to be 80/20. Regardless of analytical
precision, the main lesson undergirding the Pareto Principle is that the vital few often have more influence than the trivial many.
45
In the name of pragmatic communication, this author has made liberal use of the term “worst-case.”
For the given context, it must not be interpreted as a mathematical absolute or engineering construct, but
rather as a statistical boundary condition (much like the natural limits of a confidence interval). For
example, one of the confidence bounds related to some mean (or standard deviation) can be thought of
as the “statistical worst-case condition” of that parameter. In this context, the term is quite relative to
such things as alpha risk and degrees-of-freedom, not to mention various distributional considerations.
Nonetheless, its use carries high appeal for those not intimately involved with the inner workings of
statistical methods. More will be said on this topic later on in the discussion.
Figure 4.1.1
Visualization of the Design Margins Imposed on CTQ4
At this point, the analyst decided it would be necessary to set forth the
short-term standard deviation that would be associated with CTQ4. Using the
short-term system-level producibility analysis as a backdrop, she computed
the quantity σA = (SL – T) / ZST = (130 –100) / 4.0 = 7.50. Of course, this
particular standard deviation represents or otherwise constitutes the
instantaneous capability of CTQ4.46 For purposes of our discussion, we will
46
Instantaneous capability only reports on the short-term reproducibility of a characteristic. In other words,
it only considers the influence of random background variations (white noise, or pure error as some
would say). In this context, the instantaneous (short-term) capability offers a moment-in-time “snapshot”
of the expected performance error. An extension of this idea provides the understanding of “longitudinal
capability.” The longitudinal capability (also called temporal capability) not only considers the influence
of black noise (nonrandom variations), but includes the influence of white noise as well. In the real world,
short-term capability is always greater than long-term capability (in terms of Z) for a wide range of
pragmatic reasons (e.g., the influence of tool wear, machine set-up and the like). Only when there is an
absence of black noise will the two forms of capability be equal. Under this condition, the characteristic
of concern is (by definition) said to be in a perfect state of “statistical control.” In other words, variation in
the characteristic’s performance is free of assignable and special causes and, as a consequence, is
subject to only random sources of error.
47
The short-term variation model (SVM) is offered as an analytical contrast to the long-term variation model
(LVM). By definition, the SVM only reflects the influence of random variation (extraneous error of a
transient nature), also called “white noise.” The LVM not only reflects random sources of error, but
nonrandom sources as well. In this sense, the LVM echoes “gray noise” because it reflects the mixture
of random and nonrandom sources of error. The differential between the SVM and the LVM portrays the
pure effect of nonrandom variation, or “black noise” as it is often called. In general, it can be said that the
influence of random error determines the bandwidth of a performance distribution, whereas the signal
(central tendency) of that distribution is governed by nonrandom error. Thus, we say that T = W + B,
where T is the total noise, W is the white noise and B is the black noise. Owing to this relationship, it
should be apparent that the total noise can be decomposed into its component parts for independent
analysis and optimization.
48
There are a number of different types and forms of variation design models (VDMs), such as that for a
hypothetical mean and variance. In most cases, the VDM is a theoretical construct (or set of constructs)
that is postulated so as to engage or otherwise facilitate some type of design-related analysis, simulation
or optimization. Interestingly, in more progressive design organizations, the VDMs are provided in a
database that consists of actual process capabilities and various types of parametric data. Such
databases provide a distinct advantage when attempting to “mate” a CTQ specification to a production
process. The pairing of a design specification with the performance capability of a candidate process is
a key topic in the field of design for six sigma, or DFSS as it is most often called. Of course, the primary
aim of such “pairing” is to optimize all value entitlements, not just those that are product performance or
quality related.
49
As a theoretical construct, the notion of degrees of freedom is fully independent of time, but not so in
practice. For example, it would take an infinite period of time to produce an infinite number of units.
However, when considering the many approaches to the conduct of a producibility analysis, it should be
recognized that any given VDM containing infinite degrees of freedom can be declared as a short-
term or long-term model, depending on application circumstances. For the case scenario at hand, we
can say that the designer postulated the referenced VDM as a short-term construct, owing to the
application context. In other words, the designer treated the given VDM as an “instantaneous model,”
versus a “temporal model.” As a result, the analytical focus is on “error expansion” as compared with
“error contraction.” More will be said later in this discussion about these two unique but interrelated
concepts.
50
During the execution of a design-process qualification (DPQ) it is often not possible to obtain the
measurements by way of a random sampling plan. For example, a newly developed process might be
brought “on-line” just long enough to get it qualified for full-scale production. Given this, it is likely that
only a few units of product will be produced (owing to the preparation and execution costs associated
with a short production run). As yet another example, the candidate process might currently exist (and
have a performance history), but has been selected to produce a newly developed product (with no
production history). In either case, there is no “steady stream” of measurements from which to “randomly
sample”. When such constraints are at hand, such as presented in our application scenario, the
performance measurements are usually taken in a sequential fashion. Owing to this, one must often
assume that the resulting measurements are independent and random (for purposes of statistical
analysis). The validity of this assumption can be somewhat substantiated by autocorrelation (testing the
measurements at various lag conditions). If the resulting correlation coefficients are statistically
insignificant (for the first several lag conditions) it is reasonable to assume that the measurements are
random – even though they were sequentially obtained. Given the general absence of correlation, it would
then be rational to assert their random nature and independence. In addition, we are also often forced to
assume that the measurements are normally distributed. Employing a simple normal probability plot can
test this assumption (to a reasonable extent). In essence, we are often forced to employ a sequential
sampling strategy and then subsequently utilize a family of statistical tools that assumes the data is
normal, independent, and random. Fortunately, many statistical procedures (such as those often used
during a DPQ) are relatively robust to moderate violations of the aforementioned assumptions.
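In practice, both assumption checks described above can be scripted in a few lines, as sketched below (Python with SciPy; the measurement vector is simulated merely to illustrate the mechanics).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(29)
measurements = rng.normal(100, 7.5, 30)        # simulated sequential DPQ measurements

# 1) Autocorrelation at the first several lag conditions.
for lag in range(1, 5):
    r, p = stats.pearsonr(measurements[:-lag], measurements[lag:])
    print(f"lag {lag}: r = {r:+.3f}  (p = {p:.3f})")

# 2) Normality check via a normal probability plot (correlation of the ordered
#    data with their theoretical normal quantiles).
(osm, osr), (slope, intercept, r_plot) = stats.probplot(measurements, dist="norm")
print(f"probability-plot correlation = {r_plot:.3f}")

# Insignificant lag correlations plus a probability-plot correlation near 1.0
# lend reasonable support to the independence, randomness, and normality
# assumptions -- even though the data were gathered sequentially.
```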
51
Statistically speaking, we recognize that the given sample size (n = 30) constitutes a point of “diminishing
return” with respect to “precision of estimate.” To better understand this point, let us consider the
standard error of the mean. This particular statistic is defined by the quantity σ/sqrt(n). Now suppose we
were to plot this quantity for various cases of n under the condition σ = 1.0. Such a plot would reveal
several break points, or “points of diminishing return.” The first point occurs at about n = 5, the second
point at around n = 30, and the third point in the proximity of n = 100. Thus, as n is incrementally
increased, the quantity σ/sqrt(n) decreases disproportionately. This is one of the reasons statisticians
often say that n = 30 is the ideal sample size – it represents a rational tradeoff between statistical
precision and sampling costs.
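The break points mentioned above are easy to see numerically, as the brief sketch below indicates (Python, with σ set to 1.0 per the discussion).

```python
import math

sigma = 1.0
for n in (2, 5, 10, 30, 50, 100, 300):
    se = sigma / math.sqrt(n)                 # standard error of the mean
    print(f"n = {n:3d}:  sigma/sqrt(n) = {se:.3f}")

# The gain in precision per added observation falls off sharply after about
# n = 5, again near n = 30, and once more in the proximity of n = 100.
```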
52
The reader should recognize that many manufacturing organizations “buy off” on a process (during the
design phase) on the basis of only a few samples. In fact, some execute a practice called “first article
inspection.” From a statistical point of view, this is a very spurious practice, since it is virtually impossible
to construct meaningful (useful) confidence intervals with only a few degrees of freedom. Without proof,
there are valid reasons for supporting the case of n = 30. For the purpose of process qualification, it may
be necessary to form g rational subgroups consisting of n observations to realize ng = 30 samples.
Rational subgrouping is often employed to block sources of black noise. In essence, such a practice
enables the benefits of a larger df, but minimizes the likelihood of “black noise contamination” in the final
estimate of instantaneous capability.
Since the given DPQ called for n = 30 samples, the analyst recognized
that she would only have df = n – 1 = 30 – 1 = 29 degrees of freedom with
which to statistically verify the SVM during the course of process evaluation.
Given this, she reasoned to herself that such a sampling constraint might
produce a biased estimate of the “true” short-term process standard deviation.
Owing to this phenomenon, there would exist some statistical likelihood of
rejecting a candidate process that might have otherwise been fully qualified.
As we shall come to understand, the implications of this are quite profound.
For example, if the true short-term process standard deviation of a
particular “candidate process” is in reality 7.50, it is quite likely that a limited
sampling of n = 30 will reveal a biased short-term standard deviation. This is
to say that any given estimate of instantaneous reproducibility could provide a
short-term standard deviation greater than 7.50, owing to a pragmatic
constraint on the degrees of freedom made available for the process
53
The idea of an alpha state can be applied to any type of sampling distribution (empirical or theoretical) or,
more specifically, to any or all of the parameters associated with a sampling distribution (such as
measures of central tendency and variability, or mean and standard deviation, respectively). Owing to
this, a “statistical worst-case distribution” is also called the “alpha sampling distribution.” As such, it
constitutes a producibility risk condition that prescribes the “statistical state of affairs” in the presence of
random sampling error.
54
By convention, alpha risk is often established at the .05 level. Of course, this translates to 95 percent
decision confidence. Since the statistical analysis of a design almost always involves multiple decisions,
a higher level of decision confidence is often required to compensate for the degradation of confidence
when considering the cross-multiplication of decision probabilities. Therefore, we impose a 99.5 percent
level of confidence (as a convention) for purposes of producibility analysis. This substantially improves
the aggregate confidence when considering multiple decisions. For example, a .95 level of decision
σ̂B = σ̂A sqrt( ( n − 1 ) / χ²(1−α) ) = 7.5 sqrt( ( 30 − 1 ) / 13.12 ) = 11.15 .
Eq.( 4.4.1 )
Thus, she was able to compute (estimate) the worst-case condition of the
short-term standard deviation model (SVM). The analyst then concluded that
if the DPQ team isolated a process that exhibited a short-term standard
deviation of 7.50 (on the basis of n = 30 random samples), it would be
possible that the “true” standard deviation could be as large as 11.15 (worst-
case condition). Obviously, if such a magnitude of variation eventually
proved to be true (because of random error in the qualification sample), there
would be a practical (as well as statistical) discontinuity between the analyst’s
reproducibility expectation and reality.55
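The analyst's worst-case expansion can be reproduced in a few lines, as sketched below (Python with SciPy, using the case values α = .005, df = 29, and σA = 7.50).

```python
import math
from scipy.stats import chi2

sigma_a = 7.50          # postulated short-term standard deviation (SVM)
n = 30                  # qualification sample size
alpha = 0.005           # producibility decision risk
df = n - 1

chi2_low = chi2.ppf(alpha, df)                        # about 13.12
sigma_b = sigma_a * math.sqrt(df / chi2_low)          # worst-case expansion, Eq. (4.4.1)

print(f"chi-square(.005, 29) = {chi2_low:.2f}")
print(f"worst-case sigma.B   = {sigma_b:.2f}")        # about 11.15
```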
confidence applied to 10 decisions provides an aggregate (joint) confidence of only 60 percent, whereas
a .995 level reveals the joint certainty to be about 95 percent. Of course, it is fully recognized that some
circumstances might require a more stringent alpha while others might tolerate a more relaxed criterion.
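The degradation of joint confidence across multiple decisions is simple to verify, as in the two-line check below (Python).

```python
decisions = 10
print(round(0.95 ** decisions, 3))    # about 0.599 -- only ~60 percent joint confidence
print(round(0.995 ** decisions, 3))   # about 0.951 -- roughly 95 percent joint confidence
```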
55
For the reader’s edification, it should be pointed out that the general combination α = .005 and df = 29 is arguably the most “generally optimal” set of such decision criteria to employ when conducting a design-
process qualification (DPQ). When considering this, and other factors, the given combination offers a
standard convention from which to initiate the practice of design for six sigma (DFSS). Of course, this
particular convention should give way to other combinations as DFSS becomes entrenched in an
organization. Experience will naturally show the path to more optimal conditions of α and df, owing to the
unique circumstances associated with each application – sampling costs, destructive testing, production
volume, background knowledge, internal procedures, customer requirements and so on. Owing to the
consideration of these and many other factors, the combination α = .005 and df = 29 was employed by
this researcher to originally establish the first six sigma DPQ and subsequently validate the 1.5σ shift
factor. Again, it must be recognized that this particular set of decision criteria was judiciously selected
and practiced by this researcher for an array of theoretical and pragmatic reasons, many of which are far
beyond the scope and intent of this book. Consequently, we must recognize and always bear in mind
that the 1.5σ shift factor is a dynamic construct of a theoretical nature. As such, it is only retained in
static form when the aforementioned decision criteria are considered and practiced as a convention.
56
Based on an expected short-term standard deviation of 7.50, and given that α = .50 and df = 29, the 50th percentile of the chi-square distribution reveals a theoretical short-term standard deviation of 7.68. This
small discrepancy is attributable to the fact that df = 29. However, as the degrees of freedom
approaches infinity, the consequential discrepancy would necessarily approach zero. Thus, we
recognize the 50/50 odds that the true short-term standard deviation will be greater (or less) than 7.50.
Eq.( 4.5.1 )
57
Following a DPQ, it is conventional practice to continually monitor the instantaneous and longitudinal
capability of CTQs. For a continuous, high-volume performance characteristic (such as CTQ4), this task
can be effectively accomplished by way of a statistical process control device called an “Xbar and S
chart.” The use of such an oversight mechanism forces the white noise (extraneous variations) to
appear in the S chart while the influence of black noise (nonrandom variations) is forced to emerge in the
Xbar chart. The general use of such analytical tools requires the implementation of a sampling technique
called “rational subgrouping.” Essentially, this method of sampling forces the random sampling errors to
be retained within groups, while the nonrandom sources of error are forced to develop between groups.
By virtue of the merits associated with rational subgrouping, one-way analysis of variance can be
naturally employed to interrogate the root-mean-square of the error term (within-group standard
deviation). As would be intuitively apparent to the seasoned practitioner, the within-group component is
a direct measure of instantaneous reproducibility. As a natural consequence, the various components of
error can be subsequently utilized to formulate certain other indices of short-term capability (ZST, Cp, Ppk
and so on). To achieve this aim, we employ the general model SST = SSW + SSB, where SS is the sum
of squares, T is the total estimate, W is the within group estimate and B is the between group estimate.
In this form, the SSW term can be continually updated to obtain an ongoing estimate of the background
noise (random error) without the contaminating influence of nonrandom sources of error, as this type of
error is continually integrated into the SSB term. As a side benefit of this, the SSB term can be employed
to establish the “equivalent mean shift.” Of course, all of these assertions can be directly verified by
mathematical examination or by way of a simple Monte Carlo simulation. More will be said about this
later on in the discussion.
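A minimal sketch of this decomposition is offered below (Python/NumPy, with simulated subgrouped data standing in for real process measurements). The within-group sum of squares furnishes the estimate of instantaneous reproducibility, while the between-group component absorbs the nonrandom centering error.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated rational subgroups: white noise within groups, a drifting center between groups.
g, n = 20, 5                                   # 20 subgroups of size 5
centers = 100 + rng.normal(0, 3.0, g)          # black noise: subgroup-to-subgroup centering error
data = centers[:, None] + rng.normal(0, 7.5, (g, n))   # white noise within each subgroup

grand_mean = data.mean()
ss_within = ((data - data.mean(axis=1, keepdims=True)) ** 2).sum()
ss_between = (n * (data.mean(axis=1) - grand_mean) ** 2).sum()
ss_total = ((data - grand_mean) ** 2).sum()

sigma_within = np.sqrt(ss_within / (g * (n - 1)))      # short-term (instantaneous) estimate

print(f"SST = {ss_total:.1f}  =  SSW ({ss_within:.1f}) + SSB ({ss_between:.1f})")
print(f"within-group (short-term) sigma = {sigma_within:.2f}")
```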
58
From this perspective, it should be evident that a process capability ratio of Cp = 2.00 defines a six sigma
level of performance. For this level of capability, only 50 percent of the design bandwidth is consumed
by the process bandwidth. Of course, the remaining 50 percent of the design bandwidth is dedicated as
“design margin.” Given this, it should be self-evident that a process capability ratio of Cp = 2.00
corresponds to a 50 percent design margin (M = .50). Here again, the criterion of “six sigma” would be
fully satisfied.
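The correspondence between Cp = 2.00 and a 50 percent design margin follows directly from the definitions, as the short sketch below indicates (Python; the lower specification limit is assumed symmetric about the target for illustration).

```python
usl, lsl, sigma_st = 130.0, 70.0, 5.0        # hypothetical symmetric specification, T = 100

cp = (usl - lsl) / (6 * sigma_st)            # process capability ratio
margin = 1 - 1 / cp                          # fraction of the design bandwidth left as margin

print(f"Cp = {cp:.2f}, design margin M = {margin:.2f}")   # Cp = 2.00, M = .50
```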
59
The reader is admonished to recognize that the general practice of worst-case analysis is, in and of itself,
generally not a good thing. Such a position is rational because the statistical probability of such events
often proves to be quite remote. For example, if the probability of failure for a single characteristic is 10
percent and there are only five such interdependent characteristics, the likelihood of worst-case would be .10^5, or .00001. Of course, this translates to one in 100,000. Obviously, this probabilistic circumstance
would imply an “overly conservative” design. Although the aim of worst-case design is to secure a
“guarantee” of conformance to performance standards, it usually does nothing more than suboptimize
the total value chain. However, when descriptive and inferential statistics are integrated into the general
practice of worst-case analysis, we are able to scientifically secure a “conditional guarantee” of sorts, but
without absorbing any of the principal drawbacks. In other words, the application of modern statistics to
worst-case analysis provides us a distinct opportunity to significantly enhance the ways and means by
which the producibility of a product design can be assessed and subsequently optimized. From this
perspective, it is easy to understand why this six sigma practitioner advocates the use of applied
statistics when establishing performance specifications (nominal values and tolerances) during the
course of product and process design.
T + 3σ̂A + Zshift σ̂A = T + 3σ̂B .
Eq. ( 5.1.1 )
Applying this equality to the case data, the analyst reaffirmed that T =
100, σA = 7.50, Zshift = 1.46, σB = 11.15. In addition, she used the conventional
value of Z = 3.0 as a constant to prescribe the upper limit of unity. By
substitution, the analyst computed the equality as 100 + (3 * 7.5) + (1.46 *
7.5) = 100 + (3 * 11.15) = 133.45.
Recognizing the equality of these two quantities, the analyst was able to
successfully establish that the upper limit of unity related to case A exactly
coincided with the worst-case condition given by case B. Simply aligning the
elements of Eq. (5.1.1) with the corresponding elements provided in figure
4.6.1 provides even greater insight into the stated equality. To further such
insight, she determined that the standardized equivalent mean offset (Zshift) could be expressed as

Zshift = ( 3σ̂B − 3σ̂A ) / σ̂A = ( (3 * 11.15) − (3 * 7.50) ) / 7.50 = 1.46 .
Eq. ( 5.1.2 )
Thus, she was able to recognize that the quantity Zshift describes the
relative differential between the mean of case A2 and that of case B, but
scaled by the nominal standard deviation associated with case A. Because of
the nature of Zshift, the analyst clearly understood that it could not be
technically referenced as a “mean shift” in the purest and most classical sense.
However, she did come to understand that it could be declared as an
“equivalent mean shift,” but only in a theoretical sense. From another
perspective, she recognized that the quantity Zshift provided her with yet
another benchmark from which to gain deeper insight into the “statistical
worst-case” condition of the design, but only with respect to the temporal
reproducibility of CTQ4 set in the context of a DPQ.
Owing to this line of reasoning, the analyst concluded that the quantity
Zshift is simply a compensatory static (stationary) off-set in the mean of a
theoretical performance distribution (TPD) reflecting the potential influence
of dynamic sampling error (of a random nature) that would otherwise inflate
the postulated short-term standard deviation of the TPD (at the time of
performance validation). In light of this understanding, the analyst noted to
herself that Zshift cannot be statistically described as a “true” standard normal
deviate, simply because its existence is fully dependent upon the chi-square
distribution (owing to the theoretical composition of σB).
Stated in more pragmatic terms, the analyst recognized that Zshift does
not imply a “naturally occurring” shift in a distribution mean (in an absolute or
classical sense). Rather, it is an “equivalent and compensatory shift”
employed to statistically emulate or otherwise account for the long-term
c = σ̂B / σ̂A = 11.15 / 7.50 = 1.488 ≅ 1.49 ,
Eq. ( 5.2.1 )
where the resultant 1.49 indicated that the SVM (σA = 7.50) should be
artificially inflated or “expanded” to about 149 percent of its postulated value to account for the
potential effect of statistical worst-case sampling error.
At this point, the analyst decided to transform σA to the standard normal
case. Thus, she was able to declare that σA = 1.0 and thereby constitute a unit-
less quantity. For this case, she then rationalized that c2 = ( n - 1) / χ2, where
χ2 is the chi-square value corresponding to selected α and df. Thus, the
analyst was able to establish the theoretical connection between c and the chi-
square distribution. Given this, she then formulated Zshift from a rather unique
perspective by providing the standardized equivalent mean shift in the form:
Zshift = 3( c − 1 ) .
Eq. ( 5.2.2 )

Since the quantity χ² / ( n − 1 ) is distributed as F with ( n − 1 ) and infinite degrees of freedom, the expansion factor could likewise be written as

c = σ̂B / σ̂A = sqrt( 1 / F ) .
Eq. ( 5.3.1 )
As may be apparent, from Eq. 5.2.2 and Eq. 5.3.1 the analyst was able
to formulate the quantity Zshift = 3( sqrt(1 / F ) - 1 ). Using this particular
relationship as a backdrop, she then computed Zshift = 3( sqrt( 1/ .4525) - 1) =
1.46, or 1.50 in its rounded form. Of course, she referenced the F distribution
with the appropriate degrees of freedom. In this case, she utilized df = ( n – 1 ) = ( 30 – 1 ) = 29 degrees of freedom in the numerator term and declared infinite
degrees of freedom for the denominator term. In addition to this, she
referenced the F distribution with a decision confidence of C = (1 - α ) = ( 1 -
.005) = .995, or 99.5 percent. Given these criteria, the analyst discovered that
F = .4525.
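The preceding arithmetic is easy to reproduce. The short Python sketch below (assuming the SciPy library is available; the variable names are illustrative) recovers the chi-square value, the inflation factor c, the equivalent F value, and Zshift for the n = 30, α = .005 scenario discussed above.

from scipy.stats import chi2

n = 30                          # sampling size used in the DPQ example
df = n - 1                      # degrees of freedom
alpha = 0.005                   # risk level, so C = 1 - alpha = .995

chi2_low = chi2.ppf(alpha, df)          # lower 0.5 percent point, about 13.12
c = ((n - 1) / chi2_low) ** 0.5         # c^2 = (n - 1)/chi-square, so c is about 1.49
F = chi2_low / (n - 1)                  # F = chi-square/(n - 1), about .4525
z_shift = 3 * ((1 / F) ** 0.5 - 1)      # Zshift = 3( sqrt(1/F) - 1 ), about 1.46

print(round(chi2_low, 3), round(c, 3), round(F, 4), round(z_shift, 2))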
σ̂_C = σ_A √( χ²_(α, n−1) / (n − 1) ) = 7.50 √( 13.121 / (30 − 1) ) = 5.045 ≅ 5.0 .
Eq. (7.1.1)
She then informed one of her colleagues that the given value of 5.045
represents the criterion short-term sampling standard deviation that must be
acquired by the production manager upon execution of the DPQ in order for
the related process to “qualify” as a viable candidate for full-scale production.
By isolating a process with a short-term sampling standard deviation of 5.045
or less (at the time of the DPQ), the production manager would be at least
99.5 percent certain that the true short-term standard deviation would not be
greater than σA = 7.50, given df = n – 1 = 30 –1 = 29 at the time of sampling.
As should be evident, such a “target” standard deviation will virtually ensure
that the design engineer’s minimum producibility expectation will be met.
Merging the statistical mechanics of a confidence interval with the idea
of design margins and specification limits, the analyst computed the same
result by interrogating the relation

σ_A = ( SL − T ) / ( 3 Cp ) = ( 130 − 100 ) / ( 3 × 1.33 ) ≅ 7.50 .
Eq. (7.1.2)
Substituting this expression into Eq. (7.1.1) then yields

σ̂_C = √( ( SL − T )² χ²_(α, n−1) / ( 9 Cp² (n − 1) ) ) = 5.045 ≅ 5.0 .
Eq. (7.1.3)
In turn, the short-term capability that must be evidenced at the time of the DPQ follows as

Cp.DPQ = ( SL − T ) / ( 3 σ̂_C ) = ( 130 − 100 ) / ( 3 × 5.045 ) ≅ 2.0 ,
Eq. (7.2.1)
or, equivalently,
Cp.DPQ = c Cp = 1.49 × 1.33 ≅ 2.0 .
Eq. (7.2.2)
Since the analyst’s goal was to net Cp = 1.33 with at least 99.5 percent
certainty, she determined that a process capability of Cp = 2.0 or greater would
have to be discovered upon execution of the DPQ (based on df = 29). Of
course, this was to say that the ±3.0σST limits of such a process would
naturally consume only one-half of the specification bandwidth (tolerance
zone), owing to the understanding that 1 / Cp = .50, or 50 percent. Of course,
it is fully recognized that this quantity is just another form of design margin.
If such a process could be isolated and subsequently qualified, there would be
a very small risk (α = .005) of inappropriately accepting that process as a
candidate for adoption and implementation. Figure 7.2.1 graphically
summarizes all of the fundamental assertions related to our discussion thus
far.
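For readers who wish to check the qualification criterion numerically, the following sketch (again assuming SciPy; the inputs are simply the example values used above) recovers σ̂_C and the capability that must be evidenced at the time of the DPQ.

from scipy.stats import chi2

SL, T = 130.0, 100.0            # specification limit and target from the example
Cp_goal = 4.0 / 3.0             # desired long-term capability, Cp = 1.33
n, alpha = 30, 0.005

sigma_A = (SL - T) / (3 * Cp_goal)                 # 7.50, the design expectation
chi2_low = chi2.ppf(alpha, n - 1)                  # about 13.12
sigma_C = sigma_A * (chi2_low / (n - 1)) ** 0.5    # about 5.045, the DPQ criterion
Cp_required = (SL - T) / (3 * sigma_C)             # about 2.0

print(round(sigma_A, 2), round(sigma_C, 3), round(Cp_required, 2))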
σ̂_spec = ( SL − T ) / 6 = ( 130 − 100 ) / 6 = 5.0 .
Eq. (7.4.1)
60
The idea of independence is essential to the existence of modern statistics and the practice of six sigma.
To illustrate, consider a cupcake pan. If we prepared n = 8 cupcakes from the same “mix” and then put
them in a standard 8-hole pan for purposes of baking, we could subsequently measure the “rise height”
of each cupcake once removed from the oven. In this scenario, all n = 8 cupcakes would have likely
experienced very little difference in the baking conditions during preparation. In other words, each hole
would have simultaneously experienced the same causal conditions during preparation and baking. As a
consequence, we could not consider the “within pan” measurements to be independent of each other. It
is interesting to notice that by preparing all n = 8 cupcakes at the same time, we would likely have
“blocked” the influence of many variables (of a random and nonrandom nature).
61
The reader is kindly asked to remember that throughout this portion of this book the term “process
center” is used without regard to the nominal (target) specification. It simply implies the central location
of a normal distribution relative to some continuous scale of measure. Furthermore, it must always be
remembered that the employment of a rational sampling strategy gracefully supports the study of
autocorrelations and time-series phenomenon during the course of a process characterization and
optimization initiative. Of course, this is another discussion in and of itself – perhaps at some other time.
62
Numerous times, this practitioner of six sigma has witnessed (after the fact) precious resources
squandered on new capital-intensive technology because the true capability of the existing technology
was not properly estimated, or was improperly computed. It is professionally shameful that, virtually
every day across the world, many key quality and financial decisions are founded upon highly biased
indices of capability. Arguably, the most common error in the use of Cp is the inclusion of a standard
deviation that, unknown to the analyst, was confounded with or otherwise contaminated by sources of
nonrandom error (black noise) – thereby providing a less favorable estimate of short-term capability.
63
So as to facilitate the execution of a process characterization study, this six sigma practitioner has often
employed a rational sampling strategy that is “open ended” with respect to subgroup size. In other
words, the sample size of any given subgroup is undefined at the onset of sampling. The performance
measurements are progressively and sequentially accumulated until there is a distinct "slope change" in the plotted cumulative sums-of-squares. As a broad and pragmatic generality, the cumulative sums-of-squares will aggregate as a relatively straight line on a time series plot, but only if the progressive variations are inherently random. It can generally be said that as the sampling progresses, a pragmatic
change in process centering will reveal itself in the form of a change in slope. Naturally, the source of
such a change in slope is attributable to the sudden introduction of nonrandom variation. Of course, the
point at which the slope change originated is also a declaration for terminating the interval of subgroup
sampling. Although it can be argued that this particular procedure is somewhat inefficient, it does help to
ensure that virtually all of the key sources of random error have had the opportunity to influence the
subgroup measurements. At the same time, the sampling disallows the aggregate mean square
(variance) from being biased or otherwise contaminated by the influence of assignable causes. After
defining the first subgroup, the second subgroup is formed in the same manner, but not necessarily with
the same sample size. This process continues until the “cumulative pooled variance” has reached a
rational plateau (in terms of its cumulative magnitude). Naturally, the square root of this quantity
constitutes the composite within-group error and is presented as the “short-term” standard deviation. As
such, it constitutes the instantaneous capability of the process and constitutes the pragmatic limit of
reproducibility. However, it is often necessary to continue the formation of such subgroups until all
principal sources of nonrandom variation have been accounted for. This objective is usually achieved
once the between-group mean square has reached its zenith (in terms of its cumulative magnitude). At
this point, the composite sums-of-squares can be formed and the total error variance estimated. Of
course, the square root of this quantity constitutes the overall error and is known as the “long-term”
standard deviation. As such, it constitutes the sustainable capability of the process. The relative
differential between these two indices constitutes the extent of “centering control” that has been exerted
during the interval of sampling. To facilitate the computational mechanics of such analyses, several
years ago this researcher provided the necessary statistical methods to MiniTab. In turn, they created
the “add-on” analytics (and reports) now known as the “Six Sigma Module.” This particular module has
been specifically designed to facilitate the execution of a six sigma process characterization study.
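As a rough illustration of this open-ended accumulation, the sketch below (in Python) tracks the cumulative sums-of-squares for a simulated measurement stream and flags the first pronounced change in slope. The data stream, the window width, and the 3-to-1 slope criterion are assumptions made purely for illustration; they are not part of the procedure described above.

import numpy as np

rng = np.random.default_rng(1)
stream = np.concatenate([rng.normal(100, 2, 60),     # purely random variation
                         rng.normal(104, 2, 40)])    # process center drifts upward

cum_ss = []
for t in range(2, len(stream) + 1):
    x = stream[:t]
    cum_ss.append(np.sum((x - x.mean()) ** 2))       # cumulative sums-of-squares

slopes = np.diff(cum_ss)                             # per-observation increase
baseline = slopes[:30].mean()                        # slope while variation is random
window = 10
for i in range(30, len(slopes) - window):
    if slopes[i:i + window].mean() > 3 * baseline:   # sustained change in slope
        print("close the subgroup after roughly", i + 2, "observations")
        break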
σ_T² = σ_1² + σ_2² + ... + σ_N² .
Eq. (8.2.3)
Obviously, as the leverage or vital few sources of replication error are
discovered and subsequently reduced in strength and number, our capability
and capacity to replicate a set of “success conditions” will improve
accordingly.
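As a small numeric illustration of Eq. (8.2.3), the sketch below (with arbitrary, assumed component values) returns the total standard deviation as the root-sum-of-squares of the component standard deviations, underscoring how the largest source dominates the total.

sigmas = [7.5, 3.0, 2.0, 1.0]                 # assumed component standard deviations
total_variance = sum(s ** 2 for s in sigmas)  # sigma_T^2 = sigma_1^2 + ... + sigma_N^2
total_sigma = total_variance ** 0.5
print(round(total_sigma, 2))                  # about 8.4, dominated by the 7.5 source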
When reporting the performance of an industrial or commercial product,
process or service, it is customary and recommended practice to prepare three
separate but related measures of capability. The first performance measure
reflects the minimum capability or "longitudinal reproducibility" of the
characteristic under investigation. In this case, the performance measure is
given by σ_T². As previously indicated, the total error accounts for all sources
64
The instantaneous reproducibility of a manufacturing process reflects the state of affairs when the
underlying system of causation is entirely free of nonrandom sources of error (i.e., free of variations
attributable to common or "assignable" causes). Given this condition, the related process would be
operating at the upper limit of its capability. Hence, this particular category of error cannot be further
reduced (in a practical or economical sense) without a change in technology, materials or design.
65
For purposes of process characterization, such a sampling strategy has been thoroughly described by
Harry and Lawson (1990). The theoretical principles and practical considerations that underpin the issue
of "rational sub-grouping" have also been addressed by Juran (1979), as well as Grant and Leavenworth
(1980). In the context of this book, it should be recognized that the intent of a rational sampling strategy
is to allow only random effects within groups. This is often accomplished by “blocking” on the variable
called "time." When this is done, the nonrandom or assignable causes will tend to occur between
groups. Hence, one-way ANOVA can be readily employed to decompose the variations into their
respective components. With this accomplished, the practitioner is free to make an unbiased estimate of
the instantaneous reproducibility of the process under investigation. With the same data set, an estimate
of the sustained reproducibility may also be made, independent of existing background noise.
Eq. ( 8.2.5 )
Eq. ( 9.1.1 )
SS_W = Σ_(j=1..g) Σ_(i=1..n_j) ( X_ij − X̄_j )² ,
Eq. (9.1.2)
where X̄_j is the average of the jth subgroup. The second component to be examined is the between-groups sum of squares. Considering the general case, this partition can be described by

SS_B = Σ_(j=1..g) n_j ( X̄_j − X̄ )² ,
Eq. (9.1.3)
or, for the special case where n_1 = n_2 = ... = n_g = n, the between-group sum of squares can be presented in the form

SS_B = n Σ_(j=1..g) ( X̄_j − X̄ )² .
Eq. (9.1.4)
Thus, we may now say that
SS_T = SS_B + SS_W ,
Eq. (9.1.5)
with the corresponding degrees of freedom partitioned as
ng − 1 = ( g − 1 ) + g( n − 1 ) .
Eq. (9.1.8)
MS_T = SS_T / ( ng − 1 )
Eq. (9.2.1)
and
MS_B = SS_B / ( g − 1 ) ,
Eq. (9.2.2)
MS_W = SS_W / ( g( n − 1 ) ) .
Eq. (9.2.3)
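A minimal sketch of this decomposition is given below; the simulated subgroups and their parameters are assumptions made only for illustration and are not data from the text.

import numpy as np

rng = np.random.default_rng(7)
g, n = 25, 5                                         # g subgroups of size n
centers = rng.normal(100, 3, g)                      # between-group (temporal) drift
data = rng.normal(centers[:, None], 2.0, (g, n))     # within-group (random) error

grand = data.mean()
ss_w = ((data - data.mean(axis=1, keepdims=True)) ** 2).sum()   # Eq. (9.1.2)
ss_b = n * ((data.mean(axis=1) - grand) ** 2).sum()             # Eq. (9.1.4)
ss_t = ((data - grand) ** 2).sum()                              # total sums-of-squares

ms_t = ss_t / (n * g - 1)        # Eq. (9.2.1)
ms_b = ss_b / (g - 1)            # Eq. (9.2.2)
ms_w = ss_w / (g * (n - 1))      # Eq. (9.2.3)

print(round(ss_b + ss_w, 2), round(ss_t, 2))          # verifies Eq. (9.1.5)
print(round(ms_w ** 0.5, 2), round(ms_t ** 0.5, 2))   # short-term vs. long-term std dev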
66
The reader must bear in mind that a temporal source of error requires the passage of time before it can
develop or otherwise realize its full contributory influence – such as the effects of machine set-up, tool
wear, environmental changes and so on.
67
From a process engineering perspective, the system of causation relates directly to the technologies
employed to create the performance characteristic. Thus, the measure of instantaneous reproducibility
(short-term standard deviation) reports on how well the implemented technologies could potentially
function in a world that is characterized by only random variations (error), and where the process is
always postulated to be centered on its nominal specification (target value).
68
For example, consider a high-speed punch press. Natural wear in the die is not generally discernable
within a small subgroup (say n=5), but can be made to appear between subgroups in the context of a
rational sampling strategy. With this understanding as a backdrop, we can say that the influence of die
wear will be manifested over time in the form of a “drifting process center.” To some extent, wear in the
die does exert an influence during the short periods of subgroup sampling, but that effect is so miniscule
it would not be practical or economical to analytically separate its unique contributory effect for
independent consideration. With respect to the within-group partition, the influence of die wear should
remain confounded with the many other sources of background noise. Consequently, its momentary
influence should be reflected in the MSW term, but its temporal influence should be reflected in the MSB
term. Thus, the long-term standard deviation would likely be considerably larger than the short-term
standard deviation. Given this line of reasoning, it is easy to understand why nonrandom sources of
temporal error generally have a destabilizing effect on the subgroup averages rather than their respective
standard deviations. Based on this line of reasoning, we must recognize that a rational sampling
strategy can greatly facilitate “bouncing” nonrandom variations into the between-group partition while
restraining the inherent random variations within groups. Only in this manner can the short-term
standard deviation (instantaneous capability) of a given performance characteristic be contrasted or
otherwise made relative to the long-term standard deviation (temporal capability).
69
A temporal source of error generally exerts its unique and often interactive influence over a relatively
protracted period of time. Although such errors can be fully independent or interactive by nature, their
aggregate (net) effect generally tends to “inflate” or otherwise “expand” the relative magnitude of the
short-term standard deviation. Naturally, a statistically significant differential between the temporal and
instantaneous estimates of reproducibility error necessarily constitutes the relative extent to which the
process center is “controlled” over time. This understanding cannot be overstated, as it is at the core of
six sigma practice, from a process as well as design point of view.
as the standard deviation of the response ... [so as] to relate the
component tolerances and the response tolerance.
σ_T² = ( c σ_W )² ,
Eq. (10.1.1)
or
c = σ_T / σ_W ,
Eq. (10.1.2)
where c is the relative magnitude of inflation imposed or otherwise overlaid
on the measure of instantaneous reproducibility.
In this context, c is a corrective measure used to adjust the
instantaneous reproducibility of a performance characteristic. This
corrective device is intended to generally account for the influence of random temporal error, as well as an array of transient effects that periodically perturb the process center. For most processes, the magnitude of this correction typically falls in the range

1.4 ≤ c ≤ 1.8 .
Eq. (10.1.3)
c = √( [ SS_T / ( ng − 1 ) ] / [ SS_W / ( g( n − 1 ) ) ] ) .
Eq. (10.1.4)
Since SS_T = SS_B + SS_W, it follows that
SS_B / SS_W + 1 = c² ( ng − 1 ) / ( g( n − 1 ) ) ,
Eq. (10.1.6)
so that
SS_B = SS_W [ c² ( ng − 1 ) − g( n − 1 ) ] / ( g( n − 1 ) ) .
Eq. (10.1.7)
n Σ_(j=1..g) ( X̄_j − X̄ )² = σ̂_W² [ c² ( ng − 1 ) − g( n − 1 ) ] ,
Eq. (10.1.8)
Σ_(j=1..g) ( X̄_j − X̄ )² = σ̂_W² [ c² ( ng − 1 ) − g( n − 1 ) ] / n .
Eq. (10.1.9)
Σ_(j=1..g) ( X̄_j − X̄ )² / g = σ̂_W² [ c² ( ng − 1 ) − g( n − 1 ) ] / ( ng ) .
Eq. (10.1.10)
Taking the square root of both sides, we are left with the "typical" absolute
mean deviation (shift). Of course, this is provided in the form
δ = √( Σ_(j=1..g) ( X̄_j − X̄ )² / g ) = σ̂_W √( [ c² ( ng − 1 ) − g( n − 1 ) ] / ( ng ) ) .
Eq. (10.1.11)
Standardizing this deviation by σ̂_W then provides
ZShift.Typ = √( [ c² ( ng − 1 ) − g( n − 1 ) ] / ( ng ) ) ,
Eq. (10.1.12)
or, in its simplified form,
ZShift = √( ( c² − 1 )( ng − 1 ) / ( ng ) ) .
Eq. (10.1.14)
Figure 10.1.1
71
The mathematically inclined reader will quickly recognize that such a proposal is somewhat spurious
from a theoretical point of view. However, the author asserts that the practical benefits tied to the
proposal far outweigh the theoretical constraints. For example, when c=1, the uninformed practitioner
would intuitively reason that ZShift = 0 since the variances are equal. However, from Eq. (10.1.13) it is
apparent that ZShift would necessarily prove to be greater than zero, owing to a differential in the degrees
of freedom. Needless to say, this would present the uninformed practitioner with a point of major
contention or confusion. Although less precise, Eq. (10.1.14) provides a more intuitive result over the
theoretical range of c. It should also be noted that this operation has a negligible effect on ZShift for
typical combinations of n and g. As a consequence of these characteristics, the author believes the
application of Eq. (10.1.13) as a corrective device or compensatory measure is justified, particularly in
the spirit of many conventional design engineering practices and forms of producibility analysis.
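Although Eq. (10.1.13) is not restated here, the footnote's point is easy to see by evaluating Eq. (10.1.12) and Eq. (10.1.14) side by side. In the sketch below, n = 5 and g = 25 are assumed purely for illustration.

def z_shift_typ(c, n, g):
    # Eq. (10.1.12): retains the degrees-of-freedom differential
    return ((c ** 2 * (n * g - 1) - g * (n - 1)) / (n * g)) ** 0.5

def z_shift(c, n, g):
    # Eq. (10.1.14): simplified form, exactly zero when c = 1
    return ((c ** 2 - 1) * (n * g - 1) / (n * g)) ** 0.5

n, g = 5, 25
for c in (1.0, 1.5, 1.8):
    print(c, round(z_shift_typ(c, n, g), 3), round(z_shift(c, n, g), 3))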
[Plot residue: ZShift (vertical axis) versus c (horizontal axis, .75 to 3.25) for ng = 10, 100, and 1000.]
Figure 10.1.2
The Effect of ng on ZShift for c=1 to c=3.
Figure 10.1.3
The Effect of ng on ZShift for c=1.5 to c=2.0.
4 ≤ n ≤ 6 and 25 ≤ g ≤ 100.
72
The vast majority of benchmarking data gathered by this researcher and practitioner (since 1984) has
revealed the process capability associated with a great many products and services to exist in the range
of 3.5σ to 4.5σ, with the trailing edges dropping off at 3.0σ and 5.0σ respectively. Obviously, this tends
to imply that the typical CTQ of a product or service will exhibit a performance capability of about 4.0σ.
This is to be generally expected, given the conventional practice of establishing 25 percent design
margins. Of course, 4.0σ is the equivalent form of such a safety margin.
F = MS_B / MS_W .
Eq. (10.3.1)
By standardizing to the case NID(0,1), it will be recognized that
F = MS_B ,
Eq. (10.3.2)
and since MS_B = SS_B / ( g − 1 ), it follows that
F = SS_B / ( g − 1 ) ,
Eq. (10.3.3)
or, upon rearrangement,
SS_B = F ( g − 1 ) .
Eq. (10.3.4)
For the special case of equal subgroup sizes, substitution of Eq. (9.1.4) then gives
n Σ_(j=1..g) ( X̄_j − X̄ )² = F ( g − 1 ) .
Eq. (10.3.5)
Dividing both sides by ng yields
Σ_(j=1..g) ( X̄_j − X̄ )² / g = F ( g − 1 ) / ( ng ) ,
Eq. (10.3.6)
and subtracting the null expectation ( g − 1 )/( ng ) from both sides provides
Σ_(j=1..g) ( X̄_j − X̄ )² / g − ( g − 1 )/( ng ) = F ( g − 1 )/( ng ) − ( g − 1 )/( ng ) .
Eq. (10.3.7)
Taking the square root of the right-hand side then defines the standardized shift
ZShift = √( ( F − 1 )( g − 1 ) / ( ng ) ) .
Drawing upon the merits of our previous discussion, we may now state the equality
c = √( 1 + ( F − 1 )( g − 1 ) / ( ng − 1 ) ) .
Eq. (10.3.10)
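The equality can be verified directly, since Eq. (10.3.10) reproduces c as defined by Eq. (10.1.4) for any set of sums of squares. The values below are arbitrary assumptions used only to exercise the relations.

g, n = 25, 5
ss_w, ss_b = 240.0, 120.0                    # assumed within- and between-group SS

ms_w = ss_w / (g * (n - 1))
ms_b = ss_b / (g - 1)
F = ms_b / ms_w                              # Eq. (10.3.1)

c_direct = (((ss_b + ss_w) / (n * g - 1)) / ms_w) ** 0.5      # Eq. (10.1.4)
c_from_F = (1 + (F - 1) * (g - 1) / (n * g - 1)) ** 0.5       # Eq. (10.3.10)
z_shift = ((F - 1) * (g - 1) / (n * g)) ** 0.5                # standardized shift

print(round(F, 2), round(c_direct, 4), round(c_from_F, 4), round(z_shift, 2))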
Z = ( X − µ ) / σ
Eq. (10.4.1)
Z_ST = ( T − SL ) / σ̂_ST
Eq. (10.4.2)
interest and σ̂_ST is an estimator of the short-term population standard deviation σ_ST. Notice that Eq. (10.4.2) assumes µ = T. However, when the mean does not coincide with the target value, we may calculate
Z_1 = ( X̄ − SL ) / σ̂_ST ,
Eq. (10.4.3)
where X̄ is the grand mean of the sample data, or the estimator of µ.
Due to dynamic perturbations of a transient or temporal nature, we often
witness an inflation of the initial short-term standard deviation that, over many
cycles of a process, will degrade the value of ZST . To compensate for this
phenomenon, we calculate the quantity
Z_2 = ( T − SL ) / σ̂_LT = ( T − SL ) / ( σ̂_ST c )
Eq. (10.4.4)
Cp = ( T − SL ) / ( 3 σ̂_ST ) = Z_ST / 3
Eq. (10.4.6)
In accordance with existing literature, we may account for the effect of a static
mean offset by computing the ratio
k_1 = ( T − X̄ ) / ( T − SL ) ,
Eq. (10.4.7)
which may be restated in the form
1 − k_1 = ( X̄ − SL ) / ( T − SL ) .
Eq. (10.4.8)
Finally, the cross multiplication of Eq. (10.4.6) and Eq. (10.4.8) reveals
Cpk1 = Cp ( 1 − k_1 ) = [ ( T − SL ) / ( 3 σ̂_ST ) ] [ ( X̄ − SL ) / ( T − SL ) ] = ( X̄ − SL ) / ( 3 σ̂_ST ) = Z_1 / 3 ,
Eq. (10.4.9)
from which it follows that
Z_1 / Z_ST = ( 1 − k_1 ) .
Eq. (10.4.11)
By analogy, we write the equation
Z_2 / Z_ST = 1 − k_2 ,
Eq. (10.4.12)
and
Z_3 / Z_ST = 1 − k_3 .
Eq. (10.4.13)
By the manipulation of Eq. (10.4.4) we discover that
σ̂_ST / σ̂_LT = 1 / c = ( 1 − k_2 ) ,
Eq. (10.4.14)
from which we observe
k_2 = 1 − 1/c .
Eq. (10.4.15)
By substitution, we recognize that
1 − k_3 = ( 1 − k_1 )( 1 − k_2 ) = ( 1 − k_1 ) / c ,
Eq. (10.4.16)
which may be rearranged to reveal
k_3 = 1 − ( 1 − k_1 )( 1 − k_2 ) .
Eq. (10.4.17)
Thus, from the latter arguments, it follows that
k_3 = k_1 + k_2 − k_1 k_2 .
Eq. (10.4.19)
Cpk2 = Cp ( 1 − k_2 ) = Cp / c ,
Eq. (10.4.20)
and
Cpk3 = Cp ( 1 − k_3 ) = Cp ( 1 − k_1 ) / c = Cpk1 / c .
Eq. (10.4.21)
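To consolidate the foregoing relations, the sketch below evaluates the Cp and Cpk family for one assumed set of inputs; the numbers are illustrative only and the capability index is computed from the magnitude of the specification half-width.

T, SL = 100.0, 130.0          # target and specification limit (assumed)
sigma_st = 7.5                # short-term standard deviation estimate (assumed)
x_bar = 103.0                 # observed grand mean, i.e., a static offset (assumed)
c = 1.5                       # assumed inflation of the short-term standard deviation

Cp = abs(SL - T) / (3 * sigma_st)             # magnitude form of Eq. (10.4.6)
k1 = (T - x_bar) / (T - SL)                   # Eq. (10.4.7), static mean offset
k2 = 1 - 1 / c                                # Eq. (10.4.15), dynamic inflation
k3 = k1 + k2 - k1 * k2                        # Eq. (10.4.19), combined effect

Cpk1 = Cp * (1 - k1)                          # static offset only
Cpk2 = Cp / c                                 # Eq. (10.4.20), inflation only
Cpk3 = Cpk1 / c                               # Eq. (10.4.21), both effects

print(round(Cp, 3), round(k1, 2), round(k2, 2), round(k3, 2))
print(round(Cpk1, 3), round(Cpk2, 3), round(Cpk3, 3))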
The classical form of the Monte Carlo method is a very powerful and commonly used simulation tool; however, in the instance of product design, it frequently suffers a major limitation -- the approach assumes a static universe for each selected c.d.f. In most industrial applications, the µ and σ² associated with any given c.d.f. are not necessarily immobile. In fact, it is almost an idealization to make such an assumption. For example, such nonrandom phenomena as tool wear, supplier selection, personnel differences, equipment calibration, etc. will synergistically contribute to
nonrandom parametric perturbations. For many varieties of nonlinear transfer
functions, the resulting pattern of D and V will appear random. In fact, many
analytical methods would substantiate such a qualitative assertion.
As may be apparent, the previously mentioned limitation can adversely
influence the decision making process during the course of product
configuration (design). Therefore, it is here postulated that to account for
these seemingly random perturbations, it becomes mandatory to sample from

S = ( µ_ij , σ²_ij ) ,  i = 1 to r ;  j = 1 to c .
Eq. (11.3.1)
To begin our discussion on the use of chaos theory and fractal geometry,
let us qualitatively define what is meant by the term "chaos." According to
Gleick (1987), the phenomenon of chaos may be described by its unique
properties. In a mathematical sense, chaos is " ... the complicated, aperiodic,
attracting orbits of certain (usually low-dimensional) dynamical systems." It
is also described by " ... the irregular unpredictable behavior of deterministic,
nonlinear dynamical systems." Yet another description is " ... dynamics with
positive, but finite, metric entropy ... the translation from math-ease: behavior
that produces information (amplifies small uncertainties), but is not utterly
unpredictable."
Obviously, the latter descriptions may prove somewhat bewildering to
the uninformed reader, to say the least. So that we may better understand the
unique properties associated with the chaos phenomenon, let us consider a
simple example. Suppose that we have some experimental space given by Q,
where Q is a quadrilateral. We shall label the northwest corner as A, the
southwest corner as B, the southeast corner as C, and the northeast corner as
D.
With the task of labeling accomplished, we must locate a randomly
selected point within the confines of Q. This is a starting point and shall be
referred to as τ0. Next, we shall select a random number, r, between 0 and 1.
Based on the value of r, we follow one of three simple rules:
Rule 1: If r <= .333, then move one-half the distance to vertex A
Figure 11.4.1
Fractal Mosaic Created by Successive Iteration of a Rule Set
x_(t+1) = φ x_t + θ_1 ( φ − x_t )
Eq. (11.5.1)
and
y_(t+1) = φ y_t + θ_2 ( φ − y_t ) ,
Eq. (11.5.2)
where, if
r ≤ ξ, then θ_1 = θ_2 = 0 ,
Rule (11.5.1)
or, if
r ≥ 1 − ξ, then θ_1 = 1, θ_2 = 0 ;
Rule (11.5.2)
otherwise (ξ < r < 1 − ξ), θ_1 = 0, θ_2 = 1 .
Rule (11.5.3)
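A minimal sketch of this iteration is offered below. The value φ = .5 follows the footnote to figure 11.4.1, while ξ = 1/3, the random seed, and the starting point are assumptions made only for illustration.

import random

phi, xi = 0.5, 1.0 / 3.0
x, y = 0.2, 0.7                               # arbitrary starting point within Q
points = []

random.seed(3)
for _ in range(5000):
    r = random.random()
    if r <= xi:                               # Rule (11.5.1)
        theta1, theta2 = 0, 0
    elif r >= 1 - xi:                         # Rule (11.5.2)
        theta1, theta2 = 1, 0
    else:                                     # Rule (11.5.3)
        theta1, theta2 = 0, 1
    x = phi * x + theta1 * (phi - x)          # Eq. (11.5.1)
    y = phi * y + theta2 * (phi - y)          # Eq. (11.5.2)
    points.append((x, y))
# Plotting the accumulated (x, y) pairs should yield a mosaic akin to figure 11.4.1.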
ω_i = τ_i sin θ ,
Eq. (11.6.1)
where θ is defined as
θ = α − π/4 ,
Eq. (11.6.2)
and
α = tan⁻¹( τ_i ) .
Eq. (11.6.3)
73
The reader should recognize that the mosaic displayed in figure 11.4.1 was generated by letting φ = .5,
[Figure residue: ω_i plotted against the time axis, bounded by ±.3535, with reference vertices α = (0, 1), β = (0, 0), and γ = (1, 0).]
Figure 11.6.1
Transformation of the Fractal Mosaic into a Time Series
z_t = φ z_(t−1) .
Eq. (11.6.4)
x_(t+1) = ( 1 + φ ) x_t − φ x_(t−1) ,
Eq. (11.6.6)
and for the case θ_1 = 1, we find that
x_(t+1) = φ x_t + ( 1 − φ ) x_(t−1) .
Eq. (11.6.7)
Naturally, the y coordinate of the Cartesian system would reflect the same mathematical constructs. From either perspective, Eq. (11.6.6) and Eq. (11.6.7) will be recognized as the deterministic portion of an AR(2) time series model of the general form
X_t = Σ_(i=1..p) φ_i X_(t−i) + a_t ,
Eq. (11.6.8)
where a_t is the shock at time t. Notice that the shock is also referred to as "white noise." It is imperative to understand that the stochastic nature of Eq. (11.6.8) manifests itself in the distribution of a_t. This particular distribution is described by g(a_t) = N(0, σ_at). Of course, it has been well established that a
great many industrial processes display a time dependency of some form. It is
interesting that the time series phenomenon is also displayed by the chaotic
pattern described in this book. The reader is directed to Box and Jenkins
(1976) for a more thorough discussion on the nature of autoregressive models.
Now that we have generated the set {ω1, ω2, ..., ωn}, each ωi may be
employed as a new universe mean (µi). This is done for purposes of
enhancing the Monte Carlo simulation; e.g., it provides a mechanism for
introducing dynamic perturbations in process centering during the course of
simulation. The same logic and methodology may be applied to the
distribution variance (σ²). However, for the sake of illustration, we shall
constrain the ensuing example by only perturbing µ.
Let us suppose that we are concerned with the likelihood of assembly
pertaining to a certain product design, say a widget such as displayed in figure
11.7.1.
Figure 11.7.1
Illustration of the Widget Product Example
Cp = Δ / ( 3σ ) ,
Eq. (11.7.1)
G_i = X_i5 − Σ_(j=1..4) R_ij ,
Eq. (11.7.2)
74
Recognize that such information is obtained from a process characterization study. The reader is
directed to Harry and Lawson (1988) for additional information on this topic.
k = ( µ − T ) / Δ .
Eq. (11.7.3)
ω_max = 1 / ( 2√2 ) .
Eq. (11.7.4)
Since the maximum mean shift is kΔ, it can be demonstrated that the scaling factor is
ρ = 2√2 kΔ .
Eq. (11.7.5)
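The sketch below outlines one way such a dynamically perturbed simulation might be organized: each cycle draws fresh universe means before the part dimensions are sampled, and the perturbation is scaled per Eq. (11.7.5) so that its maximum equals kΔ. The widget dimensions, tolerances, and the sinusoidal stand-in for the chaos-derived series {ωi} are assumptions for illustration; they are not the values behind figures 11.7.2 through 11.7.5.

import math
import random

random.seed(5)
part_nominals = [25.0, 25.0, 25.0, 25.0]   # nominal sizes of parts 1 through 4 (assumed)
slot_nominal = 101.0                        # nominal size of the mating slot, part 5 (assumed)
sigma = 0.05                                # common short-term standard deviation (assumed)
k, delta = 0.75, 0.15                       # mean-offset fraction and semi-tolerance (assumed)

rho = 2 * math.sqrt(2) * k * delta          # Eq. (11.7.5): rho * omega_max = k * delta
gaps = []
for i in range(10000):
    parts = []
    for j, nominal in enumerate(part_nominals):
        omega = math.sin(i / 40.0 + j) / (2 * math.sqrt(2))   # stand-in for an omega_i series
        parts.append(random.gauss(nominal + rho * omega, sigma))
    omega_slot = math.sin(i / 40.0 + 4) / (2 * math.sqrt(2))
    slot = random.gauss(slot_nominal + rho * omega_slot, sigma)
    gaps.append(slot - sum(parts))          # assembly gap per Eq. (11.7.2)

mean_gap = sum(gaps) / len(gaps)
sd_gap = (sum((v - mean_gap) ** 2 for v in gaps) / (len(gaps) - 1)) ** 0.5
print(round(mean_gap, 3), round(sd_gap, 3))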
Figure 11.7.2
Effects of Dynamic Mean Perturbations on the
Widget Monte Carlo Simulation
(Note: ordinates not to scale)
[Histogram residue: simulated assembly gap distributions; No Shift: Z = 7.1; Shifted: Z = 4.5.]
Figure 11.7.3
Effect of Dynamic Mean Perturbations on the
Widget Assembly Gap
Figure 11.7.4
Autocorrelation (Lag=1) of the Widget Assembly
Gap Under the Condition k=.00
Figure 11.7.5
Autocorrelation (Lag=1) of the Widget Assembly
Gap Under the Condition k=.75
Where the theory and practice of six sigma is concerned, there has been
much debate since its inception in 1984. Central to this debate is the idea of
reproducibility, but often discussed in the form of process capability – a contrast
of the actual operating bandwidth of a response characteristic to some established,
theoretical or expected performance bandwidth. It is widely recognized among
quality professionals that there exist many types of performance metrics to report on process capability, and that all such valid measures are interrelated by a web of well-established statistics and application practices.
In order to better understand the pragmatic logic of six sigma, we have underscored the key ideas, tenets, and statistical concepts that form its core. More specifically, this book has presented and interrogated the theoretical underpinnings of the six sigma performance standard and its companion correction, the 1.5σ shift.
Guideline 3: In general, if the originating data are discrete by nature, the resulting Z
transform should be regarded as long-term. The logic of this guideline is simple: a fairly
large number of cycles or time intervals is often required to generate enough nonconformities from which to produce a relatively stable estimate of Z. Hence, it is reasonable to regard such an estimate as one that reflects long-term, temporal influences.
Guideline 4: In general, if the originating data are continuous by nature and were
gathered under the constraint of sequential or random sampling across a very limited
number of cycles or time intervals, the resulting Z value should be regarded as short-
term. The logic of this guideline is simple: data gathered over a very limited number of cycles or time intervals only reflect random influences (white noise) and, as a consequence, tend to exclude temporal sources of variation.
Briggs, J. and Peat, F. (1989). Turbulent Mirror. Harper and Row, New York, NY.
Evans, D.H. (1974). "Statistical Tolerancing: The State of the Art, Part I: Background," Journal of Quality Technology, 6 (4), pp. 188-195.
Evans, D.H. (1975). "Statistical Tolerancing: The State of the Art, Part II: Methods for Estimating Moments," Journal of Quality Technology, 7 (1), pp. 1-12.
Evans, D.H. (1975). "Statistical Tolerancing: The State of the Art, Part III: Shifts and Drifts," Journal of Quality Technology, 7 (2), pp. 72-76.
Gleick, J. (1987). Chaos: Making a New Science. Penguin Books, New York, NY.
Grant, E.L. and Leavenworth, R.S. (1972). Statistical Quality Control (4th Edition). McGraw-Hill, New York, NY.
Harry, M.J. (1986). The Nature of Six Sigma Quality. Motorola University Press, Motorola Inc., Schaumburg, IL.
Harry, M.J. and Lawson, R.J. (1988). Six Sigma Producibility Analysis and Process Characterization. Publication Number 6σ-3-03/88. Motorola University Press, Motorola Inc., Schaumburg, IL.
Harry, M.J. and Prins, J. (1991). The Vision of Six Sigma: Mathematical Constructs Related to Process Centering. Publication pending. Motorola University Press, Motorola Inc., Schaumburg, IL.
Harry, M.J. and Schroeder, R. (2000). Six Sigma: The Breakthrough Management Strategy Revolutionizing the World's Top Corporations. Doubleday, New York, NY.
Juran, J.M., Gryna, F.M., and Bingham, R.S. (1979). Quality Control Handbook. McGraw-Hill, New York, NY.
Krasner, S. (1990). The Ubiquity of Chaos. American Association for the Advancement of Science, Washington, D.C.
Mandelbrot, B. (1982). The Fractal Geometry of Nature. W.H. Freeman, San Francisco, CA.
Mood, A. and Graybill, F. (1963). Introduction to the Theory of Statistics (2nd Edition). McGraw-Hill, New York, NY.
Motorola Inc. (1986). Design for Manufacturability: Eng 123 (Participant Guide). Motorola Training and Education Center, Motorola Inc., Schaumburg, IL.
Pearson, E.S. and Hartley, H.O. (1972). Biometrika Tables for Statisticians, Vol. 2. Cambridge University Press, Cambridge.