Batch Fermentation Modeling, Monitoring, and Control (Chemical Industries, Vol. 93)
Ali Cinar
Satish J. Parulekar
Cenk Undey
Illinois Institute of Technology
Chicago, Illinois, U.S.A.
Gülnur Birol
Northwestern University
Evanston, Illinois, U.S.A.
ISBN: 0-8247-4034-3
Headquarters
Marcel Dekker, Inc.
270 Madison Avenue, New York, NY 10016
tel: 212-696-9000; fax: 212-685-4540
The publisher offers discounts on this book when ordered in bulk quantities. For more infor-
mation, write to Special Sales/Professional Marketing at the headquarters address above.
Neither this book nor any part may be reproduced or transmitted in any form or by any
means, electronic or mechanical, including photocopying, microfilming, and recording, or
by any information storage and retrieval system, without permission in writing from the
publisher.
Founding Editor
HEINZ HEINEMANN
linear processes with nonstationary and correlated data. Methods are in-
troduced for finding optimal reference trajectories and operating condi-
tions, and for manufacturing the product profitably in spite of variations in
the characteristics of raw materials and ambient conditions, malfunctions
in equipment, and variations in operator judgment and experience. The
book presents both fundamental and data-based empirical modeling meth-
ods, several monitoring techniques ranging from simple univariate statisti-
cal process control to advanced multivariate process monitoring techniques,
many fault diagnosis paradigms and a variety of simple to advanced process
control approaches. The integration of techniques in model development,
signal processing, data reconciliation, process monitoring, fault detection
and diagnosis, quality control, and process control for a comprehensive ap-
proach in managing batch process operations by a supervisory knowledge-
based system is illustrated. Most of these methods have been presented in
various conferences and have been discussed in research journals, but they
have not appeared in books for the general technical audience. The focus of
the book is on batch fermentation in pharmaceutical processes. However,
the methods presented can be used for batch processes in other areas by
paying attention to the special characteristics of a specific process.
The book will be a useful resource for engineers and scientists working
with fermentation processes, as well as students in biotechnology, mod-
eling, reaction engineering, quality control, and process control courses.
One objective of the book is to provide detailed information for under-
standing, comparing, and implementing new techniques reported in the
research literature. Various paradigms are introduced in each subject to
provide a balanced view. Some of them are based on the research of the
authors, while others have been proposed by other researchers. A well-
documented industrial process, penicillin fermentation, is used throughout
the book to illustrate the methods, their strengths and limitations. An-
other objective is to provide a detailed case study to the reader to practice
these methods and become comfortable in using them. Data sets, mod-
els, and software are provided to encourage the reader to gain hands-on
experience. A dynamic simulator for batch penicillin fermentation is avail-
able as a web-based application and downloadable material. The fermen-
tation simulator, batch process monitoring software, and software tools
for supervision of batch process operations are provided at the website
www.chee.iit.edu/~cinar/batchbook.html.
Convincing the reader about the strengths and limitations of the tech-
niques discussed in this book would be impossible without reference to
proper theory. Theoretical derivations are kept at an appropriate level to
enhance the readability of the text, and references are provided for readers
seeking more rigorous theoretical treatment.
The book also discusses recent advances that may have an impact on the
next generation of modeling, monitoring, and control methods. Metabolic
pathway engineering, real-time knowledge-based systems, and nonlinear dy-
namics are introduced as some of the powerful paradigms that would be of
interest.
This book could not have been written without the strong cooperation
of the authors and the sacrifices of many family members and friends. The
labor and agony of writing a multidisciplinary book tested the strength of
several relationships. All four authors are grateful for the encouragement
and support they have received from their loved ones. One of the authors,
Cenk Undey, has done a magnificent job in coordinating the work of all
authors, integrating the manuscript and providing technical support in the
use of LaTeX to the others. All four authors are also grateful to Dr. Inanç
Birol for contributing an important chapter on System Science Methods for
Nonlinear Model Development (Chapter 5). It is certain that the impact
of the methods and tools discussed in that chapter will increase in future
years in analyzing the dynamics of many nonlinear batch fermentation pro-
cesses and developing new monitoring and control methods. His insight
and knowledge have enhanced the value of the book. It seems that no book
can be published free of errors. As time progresses, errors, omissions, and
better ways to express the material discussed in the book will be discovered.
Each author apologizes for the remaining errors and agrees that they are
the fault of the other three.
Ali Cinar
Satish J. Parulekar
Cenk Undey
Gülnur Birol
Contents

Preface
Nomenclature

1 Introduction
1.1 Characteristics of Batch Processes
1.2 Focus Areas of the Book
1.2.1 Batch Process Modeling
1.2.2 Process Monitoring
1.2.3 Process Control
1.2.4 Fault Diagnosis
1.3 Penicillin Fermentation
1.4 Outline of the Book

Appendix
Bibliography
Nomenclature
I Identity matrix
ka, ks,kh Maximum specific growth rates of apical, subapical, and hyphal
cells, respectively, in Eq. 2.25 (1/h)
kiG Gas-side mass transfer coefficient for specie i, defined in
Eq. 2.4 (m/h)
r^trans Specific rate of transport of specie q from the cells of type j to the abiotic phase (g/{g cell type j · h})
S Riccati transformation matrix in Eqs. 7.136 and 7.137
S Covariance matrix of scores in Ch. 4, 6 and 8
SB Between-class scatter matrix
Sf Defined in Eq. 7.132
SPF(q) Plant fault TFM
Spl Pooled estimate of Σ
SPN(q) Plant noise TFM
Sw Within-class scatter matrix
Sy Total scatter matrix
S Concentration of limiting substrate in the abiotic phase
(liquid) (g/L)
S Distinct states of a discrete-time system in Ch. 8
S Limiting substrate
s Laplace transform variable
Si Score distance based on the PC model for fault i
t,T Scores vector and matrix, respectively, in Ch. 4, 6 and 8
tf Duration of a bioreactor operation (h)
T Sampling period (h)
u m-dimensional vector of manipulated inputs
u, U PLS scores vector and matrix, respectively, in Ch. 4, 6 and 8
u1, u2, u3 Rates of the three metamorphosis reactions (Eqs. 2.22-2.24) (1/h)
W Projection matrix
W Weight matrix in Ch. 4, 6 and 8
W Wavelet transform
Greek Letters
αj Constant characteristic of a particular metabolite Pj in Eq. 2.21
β, B Vector and matrix of regression coefficients, respectively
fi(t) Step-response functions in Eq. 7.142
βj Constant characteristic of a particular metabolite Pj in
Eq. 2.21 (1/h)
μa, μs, μh Specific growth rates of apical, subapical, and hyphal cells, respectively, defined in Eq. 2.25 (1/h)
μ0 Characteristic of a particular strain in Eq. 2.17 (1/h)
Subscripts
abiotic Abiotic phase
biotic Biotic phase
f At the end of bioreactor operation (t = tf)
F Bioreactor feed, gas feed or liquid feed as appropriate
J Partial derivative with respect to J (Sections 7.2.5 and 7.3.2)
m, max Maximum values of a variable
min Minimum value of a variable
r Reference state/value
sp Set point
syn, util Synthesis and utilization, respectively (Section 2.6.3)
0 Initial conditions
s Steady-state conditions (Section 7.3)
−1 Reciprocal of a scalar or inverse of a matrix
Superscripts
c Complex conjugate
T Transpose of a matrix
* Optimal trajectory/value or desired trajectory/value
Abbreviations
adj A Adjoint of a matrix A, Eqs. 7.119 and 7.120
AHPCA Adaptive hierarchical principal component analysis
AIC Akaike information criterion
ANN Artificial neural network
AO Additive outlier
AR Autoregressive
ARL Average run length
ARMA Autoregressive moving average
ARMAX Autoregressive moving average with exogenous inputs
ARX Autoregressive model with exogenous inputs
BJ Box-Jenkins
CCC Concentration control coefficient
CPCA Consensus principal components analysis
CUMPRESS Cumulative prediction sum of squares
CUSUM Cumulative sum
CV Canonical variate
CVA Canonical variates analysis
CVSS Canonical variate state space (models)
d.f. Degrees of freedom
diag A Diagonal matrix containing the diagonal elements of a matrix A
MS Mass spectrometer
MSE Mean squared error
MSMPCA Multiscale multiway principal component analysis
MSPM Multivariate statistical process monitoring
MV Multivariate
NAR Nonlinear autoregressive
NARMAX Nonlinear autoregressive moving average with exogenous inputs
NLTS Nonlinear time series
NO Normal operation
NOC Normal operating conditions
NPETM Nonlinear polynomial models with exponential and trigonometric
functions
OD Optical density
OE Output error
OVAT One-variable-at-a-time
PARAFAC Parallel factor analysis
PC Principal component
PCA Principal components analysis
PCD Parameter change detection (method)
PCR Principal components regression
PDA Principal differential analysis
PDF Probability distribution function
PLS Partial least squares (Projection to latent structures)
PRESS Prediction sum of squares
PSSE Penalized sum of squared error
PSSH Pseudo-steady state hypothesis
QQ Quantile-Quantile
RGA Relative gain array
RQ Respiratory quotient
RTKBS Real-time knowledge-based systems
RVWLS Recursive variable weighted least squares
RWLS Recursive weighted least-squares
SISO Single-input single-output
SNR Signal-to-noise ratio
SPC Statistical process control
SPE Squared prediction error
SPM Statistical process monitoring
SS Sum of squares
SSE Sum of squares explained
SSR Regression sum of squares
SSY Sum of squares on Y-block
STFT Short-time Fourier transform
SV Singular values
SVD Singular value decomposition
TFM Transfer function matrix
UCL Upper control limit
UWL Upper warning limit
VIP Variable influence on projection
Introduction
Batch processes have been around for many millennia, probably since the
beginning of human civilization. Cooking, bread making, tanning, and wine
making are some of the batch processes that humans relied upon for survival
and pleasure. The term "batch process" is often used to refer generically
to both batch and fed-batch operations. In the former case, all ingredients
used in the operation are fed to the processing vessel at the beginning of
the operation and no addition or withdrawal of material takes place during
the batch run. In the latter, material can be added during the batch run.
For brevity, the term batch is used in this text to refer to both batch and
fed-batch operations when there is no need to distinguish between them.
The term fed-batch is used to denote addition of material in some portions
of an otherwise batch operation.
Batch processes have received increasing attention in the second half of
the twentieth century. Specialty chemicals, materials for microelectronics,
and pharmaceuticals are usually manufactured using batch processes. One
reason for this revival is the advantages of batch operation when there is
limited fundamental knowledge and detailed process models are not avail-
able. Batch processes are easier to set up and operate with limited knowl-
edge when compared to continuous processes. The performance of the
process can be improved by iterative learning from earlier batch runs. A
second reason is the increasing pressure to start commercial production of
novel materials once patents have been issued to recover research and de-
velopment costs before competing products affect prices. Another reason
is the ability to use the facilities for many products with little or no hard-
ware modification. Many pharmaceutical products are produced in limited
quantities and the plant manufactures a specific product for a short period
of time before switching to another product. Batch operation is usually
more efficient than continuous operation for frequent product changes and
small amounts of products.
Although batch processes are simple to set up and operate, modeling,
those for beer, whiskey, pickles, or sauerkraut, but the conventional design
evolved in the 1940's as the pharmaceutical companies scaled up reactors
for antibiotics from shake flasks and milk bottles to stirred tanks with fea-
tures to discourage entry of contaminating organisms. Typical sizes for
commercial production bioreactors are 60,000 to 200,000 liters, but there
are a few that are considerably larger. One famous bioreactor that was
known as the Merck hot dog was a cylinder laying on its side with four or
five agitators mounted along the top. Its dimensions were 3.6 m diameter
by 27 m long. The world's largest industrial bioreactor is still ICI's
air-lift system first operated at the Billingham, U.K. plant for producing
single-cell protein in 1979. The size of a bioreactor is limited by its ability
to remove the heat generated by cellular metabolism. Volume goes up by a
dimension cubed while area depends on a dimension squared. This means
that the volume of culture fluid overwhelms the heat transfer area when
the fermenter is very large. Products based on genetic engineering tend
to be produced in small amounts and are suited to much smaller biore-
actors. Furthermore, production cultures derived from plant, animal, or
insect cells require expensive media which contain many more special nu-
trients than those present in media employed for synthesis of antibiotics,
vitamins, and other products with bulk markets. The microorganisms that
make antibiotics, in particular, are relatively easy to cultivate because their
products discourage the growth of other microorganisms. Animal cell cul-
tures, in contrast, have no self-protection and cannot compete with hardy,
rapidly-growing microorganisms that find the media delectable [133].
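The cube-square argument can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the vessel geometry (a cylinder with height twice the diameter), the jacket-area formula, and the idea that cooling capacity tracks jacketed area are simplifying assumptions, not design rules from the text.

```python
import math

def jacket_area_to_volume(d, h):
    """Wall-plus-bottom jacket area over volume for a cylindrical vessel (1/m)."""
    volume = math.pi * (d / 2) ** 2 * h
    area = math.pi * d * h + math.pi * (d / 2) ** 2  # side wall + bottom head
    return area / volume

# Geometrically similar vessels (h = 2 d): area/volume falls as 1/d,
# so metabolic heat (proportional to volume) outgrows the transfer area.
for d in (1.0, 2.0, 4.0):
    ratio = jacket_area_to_volume(d, 2 * d)
    print(f"d = {d:.0f} m: A/V = {ratio:.2f} per m")
```

For h = 2d the ratio reduces to 4.5/d, which is why very large fermenters run out of heat-transfer area before they run out of volume.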
ing, process control, and fault diagnosis tools that are the building blocks
for supervision of process operation and intervention decisions. The book
focuses on these four topics: modeling, monitoring, control, and diagno-
sis. Introductory remarks on these four topics are given in the following
subsections.
[Figure: penicillin production train — shake-flask vegetative growth feeding the bioreactor, followed by a first extractor, rotary filter, and solvent purification column.]
techniques.
Chapter 8 discusses various fault diagnosis techniques. One approach is
based on determining first the variables that contribute to the increase in
the statistic that indicates an out-of-control signal and then using process
knowledge to reason about the source causes that will affect those vari-
ables to identify the likely causes of faulty operation. The contribution
plots method is presented in the first part of the chapter. Automating
the integration of the variables indicated by contribution plots and process
knowledge with a knowledge-based system (KBS) is discussed in the last
section of the chapter. Section 8.2 of the chapter is devoted to multivari-
ate statistical classification techniques such as discriminant analysis and
Fisher's discriminant function, and their integration with PCA. Section 8.3
focuses on a variety of model-based techniques from systems science for
fault diagnosis. Generalized likelihood ratio, parity relations, observers,
Kalman filter banks, and hidden Markov models are presented. Section 8.4
is devoted to model-free fault diagnosis techniques such as limit checking,
hardware redundancy and KBSs. The last section outlines real-time su-
pervisory KBSs that integrate SPM, contribution plots and KBS rules to
provide powerful fault diagnosis systems.
Chapter 9 introduces some related developments in modeling, dynamic
optimization, and integration of various tasks in batch process operations
management. Metabolic engineering, metabolic flux analysis and metabolic
control analysis concepts are introduced and their potential contributions
to modeling is discussed. Dynamic optimization and its potential in indus-
trial applications is discussed and compared with classical and advanced
automatic control approaches. The integration of various tasks in process
operation using a supervisory knowledge-based system is outlined for on-
line process supervision.
Background Information and Road Maps to Use the Book
The book is written for professionals and students interested in batch
fermentation process operations. It requires little background information in
various areas such as biotechnology, statistics, system theory, and process
control. Introductory materials in biotechnology can be found in various
process engineering books [35, 426, 546]. Applied statistics books for engi-
neers and scientists [78, 167, 400, 626] provide the basic theory and tech-
niques. A reference for multivariate statistics [262] would be useful for
Chapters 6 and 8. Several good textbooks are available for basic concepts
in process control [366, 438, 541]. Advanced books in all these areas are
referenced in appropriate chapters in the book.
Ideally, the chapters in this book should be read in the sequence they appear.

Chapter 2. Kinetics and Process Models
in contact with a liquid phase. Whether the cells are suspended in the liquid
phase (suspension culture) or attached to a suitable solid support (immo-
bilized) and in contact with the liquid phase, the interactions between the
two phases [biotic phase (cell population) and abiotic phase (liquid)] must
be considered and fully accounted for. Both phases are multicomponent
systems. The abiotic phase usually contains all of the nutrients essential
for cell growth and various end products of cellular metabolism that are
excreted. Some of the end products may undergo further reactions in this
phase. A classic example is the hydrolysis of antibiotics such as penicillin
in the liquid medium. Transport of nutrients from abiotic phase to biotic
phase is essential for utilization of these for cell growth and maintenance and
for formation of a host of metabolic intermediates and end products. Some
of the end products are retained within the cells (intracellular metabolites),
while others are excreted by the cells (transport from biotic phase to abi-
otic phase). The large number of chemical reactions occurring within a cell
result in accumulation or depletion of energy. Exchange of energy between
abiotic and biotic phases must be accounted for to determine the culture
temperature. The temperature of the abiotic phase usually determines the
temperature of the biotic phase. Some of the cellular reactions impact the
acid-base equilibria in the biotic phase and in turn the pH of the abiotic
phase, which in turn influences cellular activities and transport processes
across the abiotic - biotic two-phase interface. In addition to transport of
essential nutrients and end products of cellular metabolism between the two
phases, one must also consider transport of ionic species (such as protons
and cations). As a result of cellular reactions, the properties of the abiotic
phase, such as viscosity, may change during the course of cell cultivation.
y = g(x,u,d). (2.2)
Ni = kiG (CiG − C*iG),   kiG = DiG/δG,   (2.4)
with C*iG denoting the concentration of specie i in the gas phase at the gas-
liquid interface, kiG the gas-side mass transfer coefficient for specie i, DiG
the molecular diffusivity of i in the gas phase, and δG the thickness of the
gas-side boundary layer. On the other side of the gas-liquid interface, one
must consider in the liquid-side boundary layer the transport of specie i by
molecular diffusion. Such transport occurs in parallel with consumption or
generation, as appropriate, of specie i as a result of cellular metabolism by
cells present in the liquid-side boundary layer and is therefore influenced
by the latter. Precise description of events occurring in the liquid-side
boundary layer then requires solution of conservation equations which ac-
count for diffusion of specie i and its participation in one or more reactions
within cells leading to its consumption or generation. Similar conservation
equations must also be considered for all species that are non-volatile and
participate in cellular reactions. These conservation equations are typically
nonlinear second-order (spatially) ordinary (partial) differential equations
and simultaneous solution of these can be a computationally challenging
task. For this reason, it is assumed that cellular reactions occur to negligi-
ble extents in the liquid-side boundary layer. Since the gas-liquid interface
has infinitesimal capacity to retain specie i, flux of specie i must be con-
tinuous at the gas-liquid interface and the gas and liquid phases must be
at equilibrium with respect to species i at the gas-liquid interface. When
the two phases are dilute with respect to i, the equilibrium is described by
Henry's law. The following relations then apply at the gas-liquid interface.
d(ρV)/dt = ρF QF − ρ Q,   (2.7)
with ρ and Q denoting the density and volumetric effluent rate, respec-
tively, of the culture and ρF and QF the density and volumetric flow rate,
respectively, of the sterile feed (usually liquid). Q is trivial in batch and
fed-batch operations, while QF is trivial in a batch operation. The mass
balance in Eq. 2.7 is simplified via a customary assumption that ρF and
ρ are not significantly different. This assumption is reasonable since the
densities of nutrient medium and biomass are not substantially different.
The simplified form of Eq. 2.7 is
dV/dt = QF − Q.   (2.8)
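Eq. 2.8 integrates trivially when the flows are piecewise constant; the short sketch below, with made-up flow rates, shows the linear volume increase of a fed-batch run (Q = 0).

```python
def integrate_volume(v0, qf, q, t_end, dt=0.01):
    """Euler integration of dV/dt = QF - Q (Eq. 2.8), constant flows assumed."""
    v, t = v0, 0.0
    while t < t_end - 1e-12:
        v += (qf - q) * dt
        t += dt
    return v

# Fed-batch: Q = 0 and a constant feed QF make V grow linearly in time.
v_final = integrate_volume(v0=50.0, qf=2.0, q=0.0, t_end=10.0)
print(round(v_final, 2))  # 70.0
```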
The culture is comprised of the biotic phase (cell mass) and the abiotic
(extracellular) phase. Let the concentration of biomass (X, cell mass) be
denoted as Cx (Cx = X, the notation popular in the biochemical engineering
literature) and the density of biomass as ρb. It is then not difficult
to deduce that the volume fractions of the biotic and abiotic phases in the
culture are Cx/ρb and (1 − Cx/ρb), respectively. The volumes of biotic
and abiotic phases (Vb and Va, respectively) and the volumetric flow rates
of these phases in the bioreactor effluent (in the case of continuous culture,
Qb and Qa, respectively) can then be expressed as

Vb = (Cx/ρb) V,   Va = (1 − Cx/ρb) V,   Qb = (Cx/ρb) Q,   Qa = (1 − Cx/ρb) Q.   (2.9)
While neglecting the volume fraction of the biotic phase is customary for
cultures dilute in biomass (Cx ≪ ρb), in cultures that are concentrated in
biomass, neglecting the volume fraction of the biotic phase may lead to
certain pitfalls.
Before proceeding further, some comments are in order regarding bases
for different species. For a particular specie, the choice of basis depends on
how the specie is monitored. Thus, the biomass concentration is expressed
on the basis of unit culture volume, concentrations of species present in
the abiotic phase are expressed on the basis of unit abiotic phase volume,
and concentrations of intracellular species are usually expressed on the ba-
sis of unit biomass amount. For rate processes occurring entirely in the
abiotic phase, the basis is the volume of the abiotic phase (Va), while for
rate processes occurring in the biotic phase (metabolic reactions) and at
the interface between the abiotic and biotic phases (such as species trans-
port), the basis is the amount of the biotic phase. On a larger scale, a
single cell is viewed also as a catalyst (hence the name biocatalyst), or in a
stricter sense, an autocatalyst, since resource utilization and generation of
end products of cellular metabolism are promoted by the cell. The rates of
proliferation/replication of a living species and other processes associated
with it (utilization of resources and synthesis of end products) are as a
result proportional to the amount of the living species.
Approaches to representation of the biotic phase according to the num-
ber of components (species) used for such representation and whether or
not the biotic phase is viewed as a heterogeneous collection of discrete cells
have been succinctly classified by Fredrickson and Tsuchiya [166]. Repre-
sentations which view each cell as a multicomponent mixture are referred
to as structured representations, while those which view the biotic phase
as a single component (like any specie in the abiotic phase) are termed un-
structured representations. An unsegregated representation is based on use
of average cellular properties and does not account for cell-to-cell hetero-
geneity. A segregated representation, on the other hand, involves descrip-
tion of behavior of discrete, heterogeneous cells suspended in the culture
and thereby accounts for cell-to-cell variations. The segregated-structured
representation is most suitable for a bioreactor. In order to have tractable
representations of biotic phase, it is often assumed that the cell-to-cell varia-
tions do not substantially influence the kinetic processes in the biotic phase.
The segregated representation can then be reduced to an unsegregated rep-
resentation based on average cell properties. The discussion in this chapter
is limited to the latter perspective. With this in mind, the conservation
equation for the biotic phase can be stated as
loss due to cell death or cell lysis, and μnet the net specific growth rate,
respectively, the basis for each being unit biomass amount. It must be kept
in mind that Cx (later referred to also as X) represents the concentration of
viable cell mass. The mass fraction of dead cells in the total cell population
is considered to be negligible. In view of Eqs. 2.8 and 2.10, the temporal
variation in Cx can be described as
dCx/dt = (μnet − D) Cx,   (2.11)
with D denoting the dilution rate for the culture. The mass balance above
applies for all three reactor operations under consideration, with D being
trivial for a batch operation.
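The interplay between growth at μnet and dilution at D in the balance above can be explored numerically. In the sketch below the Monod form for the growth rate, the constant substrate concentration, and all parameter values are illustrative assumptions, not values from the text.

```python
def simulate_biomass(x0, v0, qf, mu_max, ks, s, t_end, dt=0.001):
    """Euler integration of dX/dt = (mu_net - D) X with D = QF / V (Eq. 2.11).
    Substrate concentration S is held constant purely for illustration."""
    x, v, t = x0, v0, 0.0
    while t < t_end - 1e-12:
        mu = mu_max * s / (ks + s)   # illustrative Monod growth rate (1/h)
        d = qf / v                   # dilution rate (1/h)
        x += (mu - d) * x * dt
        v += qf * dt                 # Eq. 2.8 with Q = 0 (fed-batch)
        t += dt
    return x

# Batch run (QF = 0, D = 0): plain exponential growth at mu.
x_batch = simulate_biomass(x0=0.1, v0=50.0, qf=0.0,
                           mu_max=0.11, ks=0.004, s=10.0, t_end=10.0)
```

With a feed turned on (qf > 0), the same routine shows the biomass concentration rising more slowly, since dilution by the feed partially offsets growth.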
The conservation equation for a specie i in the abiotic phase in its
general form can be expressed as
with CiF denoting the concentration of specie i in the nutrient feed, Ri^gen
the rate of generation of specie i due to any reactions in the abiotic phase,
ri^trans the biomass-specific rate of transport of specie i from the biotic phase
to the abiotic phase, and Qa being trivial in batch and fed-batch operations.
When the specie is supplied in the feed gas (as is the case with oxygen),
CiF is trivial and Ni is non-trivial. For species which are not transported
into the biotic phase (for example, macromolecules like starch), ri^trans is
trivial. Although the bulk of the chemical transformations occur in the biotic
phase, some species may undergo reactions in the abiotic phase and for
these species Ri^gen is non-trivial. Two examples of this situation are acidic
or enzymatic hydrolysis of starch to generate readily metabolizable carbo-
hydrates and degradation of antibiotics and enzymes in the abiotic phase. This
sets the stage for accounting for the intracellular components (components
of the biotic phase).
The conservation equation for a specie i in biotic phase can be succinctly
stated as
d(Ci Cx V)/dt = (ri^gen − ri^trans) Cx V,   (2.14a)

dCi/dt = ri^gen − ri^trans − μ Ci,   (2.14b)
with ri^gen being the net rate of generation of specie i in the biotic phase
exclusive of the rate of its loss from the biotic phase due to cell death or
cell lysis. In the case of cell lysis, specie i will be introduced into the abiotic
phase at a rate proportional to Cx V, and this must be accounted for in Ri^gen
in Eq. 2.12.
Conservation equations for intracellular species (and therefore tempo-
ral variations in quantities of these) are not accounted for in an unstruc-
tured representation of kinetics of cellular processes (the so-called unstruc-
tured kinetic models). For species that are present in both abiotic and
biotic phases, examples of which include readily metabolizable nutrients
and metabolites that are excreted by the cells, no differentiation is (can
be) made between ri^gen (specie synthesis in the biotic phase) and ri^trans
(specie transport across the outer cell membrane into the abiotic phase).
The conservation equation for a specie i in the abiotic phase is then based
on Eq. 2.12 and is expressed as
This representation then involves conservation equations for cell mass, key
nutrients, and metabolites of interest (target products), the rates of gen-
eration/consumption of the individual species being expressed in general
in terms of concentrations of nutrients in the abiotic phase, Ni (Ni = Ci for
nutrient Ni as per the notation commonly used in the biochemical engineering
and biotechnology literature), cell mass concentration, X (X = Cx),
concentrations of metabolites of interest (Pj, with Pj = Cj for an extracellular
metabolite and Pj = Cj Cx for an intracellular metabolite), and other
parameters such as culture pH and temperature (T). For biomass (cell mass),
the specific cell growth rate is therefore expressed as
μ = μ(N1, N2, ..., P1, P2, ..., X, pH, T). Consumption of a
nutrient Ni implies that the rate of its generation in the biotic phase, ri^gen,
is negative [Eq. 2.15], with the consumption rate usually being expressed as
(−ri^gen) = σi(N1, N2, ..., P1, P2, ..., X, μ, pH, T) and being referred to
as the cell mass-specific uptake rate of nutrient Ni. Similarly, for a target
metabolite Pj, the rate of its generation in the biotic phase, rj^gen (whether
or not the metabolite is excreted) [Eqs. 2.14a and 2.15], is expressed as
rj^gen = εj(N1, N2, ..., P1, P2, ..., X, μ, pH, T), with εj being referred to
as the cell mass-specific production rate of metabolite Pj.
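As a concrete instance of such rate expressions, the sketch below pairs a Monod form for μ with a Luedeking-Piret-type form for the specific production rate εj; both functional forms and all parameter values are illustrative choices, not expressions taken from the text.

```python
def mu(s, mu_max=0.11, ks=0.004):
    """Illustrative Monod form for the specific growth rate mu(S) (1/h)."""
    return mu_max * s / (ks + s)

def epsilon_j(s, alpha=0.05, beta=0.002):
    """Illustrative Luedeking-Piret-type specific production rate:
    a growth-associated term plus a non-growth-associated term."""
    return alpha * mu(s) + beta

# As S -> 0, growth stops and production collapses to the
# non-growth-associated term beta.
print(round(epsilon_j(10.0), 5), round(epsilon_j(0.0), 5))
```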
[Eqs. 2.17 and 2.18 with Table 2.1: popular functional forms of φi(Ni) for the specific growth rate, including saturation and substrate-inhibition forms [23, 24, 35]]
with Yx/Ni being the biomass yield with respect to nutrient Ni, mi (referred
to as the maintenance coefficient) accounting for consumption of Ni for
maintenance activities of cells, and aij being the amount of Ni utilized
for production of unit amount of metabolite Pj. (The reciprocal of aij
is commonly referred to as the yield of Pj with respect to Ni.) Since
production of metabolites Pj is a part of cellular metabolism, the biomass
yield in Eq. 2.19 is the apparent biomass yield (and not the true biomass
yield) when consumption of Ni for production of Pj is accounted for directly
[as in the last term in Eq. 2.19] and also indirectly via uptake of Ni for
biomass production. When the last term in Eq. 2.19 is trivial, the cell
mass yield (Yx/Ni) in this relation represents the true cell mass yield. A
direct accounting of utilization of a nutrient Ni for production of metabolite
Pj, as represented by the last term in Eq. 2.19, is justified only when the
amount of Pj is substantial, so that utilization of Ni for synthesis of Pj is
comparable to that for cell growth. The significance of utilization of Ni for
cell maintenance relative to utilization of the same nutrient for cell growth
increases as the specific cell growth rate is reduced.
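The structure of Eq. 2.19 — growth demand, maintenance, and product-synthesis demand for a nutrient — can be sketched directly. The function below and its parameter values are illustrative only; it also shows how the apparent yield μ/σi falls below the true yield once the product term is active.

```python
def sigma_i(mu, eps_list, y_true=0.45, m=0.014, a_list=None):
    """Specific uptake rate of nutrient Ni in the spirit of Eq. 2.19:
    growth demand + maintenance + demand for each metabolite Pj."""
    a_list = a_list or []
    product_demand = sum(a * e for a, e in zip(a_list, eps_list))
    return mu / y_true + m + product_demand

mu = 0.10          # 1/h, illustrative specific growth rate
eps = [0.008]      # g P/(g cell . h), illustrative production rate
sigma = sigma_i(mu, eps, a_list=[1.2])
y_apparent = mu / sigma   # apparent biomass yield, below the true 0.45
print(round(sigma, 4), round(y_apparent, 3))
```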
with εj0 being a constant characteristic of the metabolite Pj. The popular
forms of χji(Ni) and ψjk(Pk) are based on the experimental observations
for a particular strain and the metabolite Pj. Synthesis of a metabolite Pj
may be
[Table: popular forms of ψjk(Pk) and related rate modifiers — e.g., K'jk/(K'jk + Pk) [8, 9], exp(−αjk Pk) [8, 9], (1 − Pk/Pkm) [47, 190, 239, 326, 395, 544, 600], and (1 − X/Xm) [325, 470]]
sion and differentiation. During tip extension, some apical cells become
subapical cells. Branching refers to formation of new apical compartments
from the cells in the subapical compartment. The subapical cells further
away from the tip become more and more vacuolated as their age increases.
As a result, cells further away from the tip contain large vacuoles. These
cells, which form the hyphal compartment, play an important role in trans-
port of protoplasm toward the tip section. Formation of vacuolated hyphal
cells from the subapical cells is referred to as differentiation. The transi-
tion from active subapical cells to completely vacuolated hyphal cells takes
place gradually. The hyphal cells located in the vicinity of the subapical
compartment are therefore assumed to retain the metabolic activity and
ability to grow as do the subapical cells. This has been accounted for in the
formulation of the model by considering that a fraction fh of the hyphal
cells is metabolically active ([423, 424, 425]).
The kinetic expressions for branching, tip extension and differentiation
are considered to be first order in cell type being transformed, which leads
to the following rate expressions for the metamorphosis reactions under
consideration.
Branching (1):    Z_s → Z_a,    u_1 = k_u1 Z_s    (2.22)

Extension (2):    Z_a → Z_s,    u_2 = k_u2 Z_a    (2.23)

Differentiation (3):    Z_s → Z_h,    u_3 = k_u3 φ_3(S) Z_s,    φ_3(S) = K_3/(K_3 + S)    (2.24)
In Eqs. 2.22 - 2.24, Z_a, Z_s, and Z_h represent the mass fractions of apical,
subapical, and hyphal cells, respectively, in the total cell population, u_j
(j = 1, 2, 3) the rates of the three metamorphosis reactions, and k_uj (j =
1, 2, 3) the kinetic coefficients for these. Differentiation is assumed to be
inhibited by the carbon source (usually glucose, S = glucose concentration
in the abiotic phase). The form of φ_3(S) in Eq. 2.24 is a special case of the
form of φ_i(N_i) (N_i = S) in Table 2.1. The specific growth rate of each cell
type has been represented by Monod kinetics.
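The metamorphosis rates (Eqs. 2.22–2.24), together with Monod-type growth of each cell type, can be sketched as follows; the saturation constant K_3 and all numeric values are illustrative assumptions.

```python
def metamorphosis_rates(Z_a, Z_s, k_u1, k_u2, k_u3, S, K3):
    """First-order rates of branching, extension, and differentiation,
    with differentiation inhibited by glucose S (Eq. 2.24 form)."""
    u1 = k_u1 * Z_s                    # branching:       Z_s -> Z_a
    u2 = k_u2 * Z_a                    # extension:       Z_a -> Z_s
    u3 = k_u3 * Z_s * K3 / (K3 + S)    # differentiation: Z_s -> Z_h
    return u1, u2, u3

def monod_mu(mu_max, S, K_s):
    """Monod-type specific growth rate for a single cell type."""
    return mu_max * S / (K_s + S)
```

At high glucose concentration the differentiation rate approaches zero, so the actively growing compartments dominate, consistent with the inhibition described in the text.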
then can be described by Eqs. 2.14a and 2.14b (c_i = Z_i, i = a, s, h; r_i^trans
= 0) with r_i^gen (i = a, s, h) being

r_a^gen = u_1 − u_2 + μ_a Z_a,    r_s^gen = u_2 − u_1 − u_3 + μ_s Z_s,    (2.26)
Via addition of Eq. 2.14b for the three cell types and in view of the identity
Z_a + Z_s + Z_h = 1, the expression for the specific cell growth rate μ can
be deduced.
μ = k_1 (1 − exp(−p_int/K_1))    (2.29)

In the relations above, k_1, k_2, and K_1 are the kinetic parameters. The
dependence of μ on the intracellular phosphate concentration p_int is expressed
by the Tessier equation. Cell lysis
releases phosphate into the abiotic phase in quantity proportional to the
cell mass phosphate content (Y_P/X) and the intracellular phosphate concentration
(p_int). This release must be accounted for in the mass balance for
extracellular phosphate, which is described by Eq. 2.12 with C_p = p and
N_p = 0, and R_p^gen and r_p^trans being
In Eq. 2.30, k_3 and K_2 are kinetic parameters. The conservation equation
for intracellular phosphate is provided by Eq. 2.14b with c_p = p_int and
r_p^gen = −Y_P/X μ. The specific phosphate utilization rate for generation of
biomass is therefore considered to be proportional to the specific cell growth
rate. It should be noted that the expression for r_p^trans in Eq. 2.30 is similar
to the Monod expression. The transport of many species from the abiotic
phase to the biotic phase and vice versa is facilitated by transport proteins
(such as permeases). The rate expressions for transport of species across the
outer cell membrane therefore bear close resemblance to the rate expressions
for enzyme-catalyzed reactions (such as the Michaelis-Menten expression).
Finally, the mass balance for the alkaloid (target metabolite) is provided
by Eq. 2.15 with C_a = a, N_a = 0, R_a^gen = 0, and r_a^gen being provided
by

r_a^gen = k_4 K_3 / (K_3 + p_int),    (2.31)
with K_3 and k_4 being the kinetic parameters. It is evident from Eqs. 2.29
and 2.31 that while increasing intracellular phosphate content is conducive
to cell growth, it inhibits alkaloid synthesis due to repression of phosphatase
activity. In this chemically structured model, the rates of all key
kinetic activities, viz., cell growth, phosphate consumption (−r_p^gen), and
alkaloid production, are expressed in terms of the conditions prevailing in
the biotic phase, viz., p_int in the present case. The state variables [Eq. 2.1] in
this model therefore are x = [X p p_int a]^T for batch and continuous cultures
and x = [X p p_int a V]^T for a fed-batch culture.
source, denoted as m). The uptake of these two nutrients is confined mainly
to hyphae and swollen hyphal fragments. Of the three cell types, only the
swollen hyphal fragments are primarily capable of synthesizing CPC. Ex-
perimental studies have indicated that the rate of CPC synthesis is directly
related to the activity of enzymes responsible for this. These enzymes are
induced by intracellular methionine and are repressed by glucose. Only
hyphae are capable of replication (growth). The rate of reaction (i) is
expressed as a function of concentrations of hyphae, glucose and methionine,
while the rate of reaction (ii) is expressed as a function of concentrations
of glucose and swollen hyphal fragments. Let Zh, Zs and Za denote the
mass fractions of hyphae, swollen hyphal fragments, and arthrospores, re-
spectively, in the total cell population. Then the conservation equations
for the three cell types can be expressed as in Eq. 2.14a with c_i = Z_i,
r_i^trans = 0, i = h, s, a. The net rates of generation of the three cell types,
r_i (i = h, s, a), are expressed as ([35, 374])
r_h = (μ′ − β − k_D) Z_h,    r_s = β Z_h − (γ + k_D) Z_s,    r_a = γ Z_s − k_D Z_a    (2.33)

with k_D being the kinetic coefficient for cell death or cell lysis and the
specific rates μ′, β and γ being expressed as
... + β m_ih Z_h/Z_s − γ m_is,

r_a^gen = −k_3a m_ia + γ m_is Z_s/Z_a,    r_a^trans = 0.    (2.41)
The first terms on the right sides of the expressions for r_h^gen and r_s^gen account
for biosynthesis of methionine in hyphae and swollen hyphal fragments,
respectively. The terms in the expressions above containing β and γ represent
(dis)appearance of methionine in a particular cell type population
associated with interconversion between two cell types. The presence of
glucose in the abiotic medium is considered to increase the rate of methionine
utilization for protein synthesis. The estimation of the kinetic
parameters in Eq. 2.41 has been based on comparison of the experimentally
measured and model-predicted values of the average intracellular methionine
concentration, (m_i)_avg, the value predicted by the model being
m_ih Z_h + m_is Z_s + m_ia Z_a.
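The cell-type balance rates of Eq. 2.33 and the model-predicted population-average intracellular methionine concentration can be sketched as follows; all numeric values used are illustrative.

```python
def cell_type_rates(Z_h, Z_s, Z_a, mu_p, beta, gamma, k_D):
    """Net generation rates of hyphae, swollen hyphal fragments, and
    arthrospores (Eq. 2.33); mu_p is the specific replication rate mu'."""
    r_h = (mu_p - beta - k_D) * Z_h
    r_s = beta * Z_h - (gamma + k_D) * Z_s
    r_a = gamma * Z_s - k_D * Z_a
    return r_h, r_s, r_a

def avg_methionine(m_ih, m_is, m_ia, Z_h, Z_s, Z_a):
    """Population-average intracellular methionine concentration,
    weighted by the mass fractions of the three cell types."""
    return m_ih * Z_h + m_is * Z_s + m_ia * Z_a
```

The weighted average is what the model compares against the experimentally measured (m_i)_avg when estimating the kinetic parameters.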
and V_mE, K_E, K, a, n, and η being the kinetic parameters. The effect of
catabolite repression by glucose is included in the glucose-dependent term
(n > 1). The subscript (t − t_l) denotes evaluation at time t′ = t − t_l, with
t_l representing the time lag between induction and gene expression. Finally,
the mass balance for the target product (p), cephalosporin C (C_p = p), is
expressed as in Eq. 2.15 with
(2.43)
The second expression in Eq. 2.43 accounts for degradation of cephalosporin
C in the abiotic phase. The magnitudes of various kinetic parameters for
the structured model are reported in [35] and [374]. The state variables
(Eq. 2.1) in this model therefore are x = [X g m Z_h Z_s m_ih m_is m_ia e p]^T
for batch and continuous cultures and x = [X g m Z_h Z_s m_ih m_is m_ia e p
V]^T for a fed-batch culture. The identity Z_h + Z_s + Z_a = 1 implies that
only two of the three fractions of the cell population are independent state
variables.
and P, respectively. The conservation equations for mRNA and P then are
provided by Eq. 2.14b [329, 330] with c_i = [i], i = mRNA, P and

r_mRNA^gen = k_p η − k_d [mRNA],    r_P^gen = k_q ξ [mRNA] − k_e [P].    (2.44)
In Eqs. 2.44, k_p and k_q are the kinetic coefficients for transcription of the
gene and translation of the mRNA, η the efficiency of promoter utilization, ξ
the efficiency of utilization of the mRNA at the ribosomes, and k_d and k_e
the kinetic coefficients for deactivation of the mRNA and the active protein,
respectively. For intracellular proteins, r_P^trans is trivial, while for proteins
partially excreted from living cells, r_P^trans is non-trivial and positive. In
balanced growth, pseudo-steady state hypothesis (PSSH) is often invoked
for the specific mRNA and the target protein, i.e., the rate of intracellular
accumulation of each species (left side of Eq. 2.14b) is considered to be
insignificant compared to rates of other processes (the non-trivial terms
on the right side of Eq. 2.14b). Application of PSSH for an intracellular
protein results in the following algebraic relations.
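As a sketch of how PSSH yields algebraic levels: setting the accumulation term to zero in each intracellular balance, with synthesis, first-order deactivation, and dilution by growth at rate μ as assumed here (not necessarily the book's exact relations), gives closed-form expressions.

```python
def pssh_levels(k_p, eta, k_d, k_q, xi, k_e, mu):
    """Pseudo-steady-state mRNA and protein levels obtained from
    d[i]/dt = (synthesis) - (deactivation) - mu*[i] = 0 for each species.
    All symbols follow Eq. 2.44; the dilution term mu*[i] is an assumption."""
    mRNA = k_p * eta / (k_d + mu)        # transcription vs. decay + dilution
    P = k_q * xi * mRNA / (k_e + mu)     # translation vs. decay + dilution
    return mRNA, P
```

Both levels fall as μ grows, reflecting the dilution of intracellular species by biomass formation.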
The cell mass-specific rate of synthesis of the target protein, r_P^gen, therefore
can be deduced. From this the cell mass-specific production rate of the
target protein (ε_P, total rate of protein production in the culture = ε_P X V)
can be obtained as follows. If the cells are subject to death, then
ε_P = r_P^gen − r_d[P], while if the cells are subject to lysis, then assuming
total release of protein from the cells undergoing lysis into the abiotic phase,
R_P^gen V_a = r_d[P] X V and in that case ε_P = r_P^gen. If the target protein
is partially excreted, then one must consider mass balances for it in both
biotic and abiotic phases with r_P^trans providing the linkage between the two
balances.
The rate of expression of an operator-regulated gene depends on the
efficiency of transcription of that gene (η), which in turn is determined by
interactions of modulating species at operator sites and RNA polymerase
binding. This efficiency is thus proportional to the probability that the
operator site O is not bound to repressor protein R. The genetically structured
model involves a large number of model parameters representing various
molecular interactions. A specific genetic change would affect only certain
interactions and therefore specific model parameters. Further details on
this model [329, 330] are spared here and interested readers are referred
2.7. Case Studies 49
to the source references ([33, 329, 330]). For Escherichia coli, the kinetic
parameters for the transcription and translation processes, viz., k_p and k_q,
have been correlated to the specific cell growth rate (μ), with both parameters
increasing with increasing μ [329, 330]. Such kinetic models will
allow mapping of the nucleotide sequence into cell population productivity
and therefore afford the user the capability for systematic optimization of
cloned DNA inserts and, in the long run, the genetic makeup of the organism.
Unstructured Models
Mass balance equations can be summarized as follows.

Input variables: glucose feed temperature, glucose feed flow rate, aeration
rate, agitator power input, coolant flow rate, acid/base flow rate.

Output variables: culture volume, fermenter temperature, generated heat,
pH, and concentrations of glucose, biomass, penicillin, dissolved oxygen,
and carbon dioxide.

Model Structure:
dX/dt = f(X, S, C_L, H, T)
dS/dt = f(X, S, C_L, H, T)
dC_L/dt = f(X, S, C_L, H, T)
dP/dt = f(X, S, C_L, H, T, P)
dCO_2/dt = f(X, H, T)
dH/dt = f(X, H, T)
Biomass: The dependence of the specific growth rate on the carbon and oxygen
substrates was assumed to follow Contois kinetics [36] to account for biomass
inhibition. Biomass growth has been described by Eq. 2.11 with
C_x = X and μ_net = μ, the specific growth rate μ being

μ = μ_x [S / (K_x X + S)] [C_L / (K_ox X + C_L)]    (2.47)

in the original model [36]. The variables and parameters used are defined
in Tables 2.3 and 2.4.
In order to include the effects of environmental variables such as pH
and temperature, biomass formation can be related to these variables by
introducing their effects in the specific growth rate expression [61] to give:

μ = μ_x [S / (K_x X + S)] [C_L / (K_ox X + C_L)] [1 / (1 + K_1/[H+] + [H+]/K_2)]
    [k_g exp(−E_g/RT) − k_d exp(−E_d/RT)]    (2.48)
This would in turn affect the utilization of substrate and the production
of penicillin. Direct effects of pH and temperature on penicillin production
Table 2.4. Initial conditions, kinetic and controller parameters for normal
operation (adapted from [61])

Time: t (h)

Initial Conditions / Value
Biomass concentration: X (g/L) = 0.1
Carbon dioxide concentration: CO_2 (mmol/L) = 0.5
Culture volume: V (L) = 100
Dissolved oxygen concentration: C_L (= C_L* at saturation) (g/L) = 1.16
Heat generation: Q_rxn (cal) = 0
Hydrogen ion concentration: [H+] (mol/L) = 10^-5.5
Penicillin concentration: P (g/L) = 0
Substrate concentration: S (g/L) = 15
Temperature: T (K) = 297

Kinetic and Controller Parameters / Value
Activation energy for growth: E_g (cal/mol) = 5100
Arrhenius constant for growth: k_g
Activation energy for cell death: E_d (cal/mol) = 52000
Arrhenius constant for cell death: k_d = 10^33
Constant: K_1 (mol/L) = 10^-10
Constant: K_2 (mol/L) = 7x10^-5
Constant relating CO_2 to growth: α_1 (mmol CO_2/g biomass) = 0.143
Constant relating CO_2 to maintenance energy: α_2 (mmol CO_2/(g biomass h)) = 4x10^-7
Constant relating CO_2 to penicillin production: α_3 (mmol CO_2/(L h)) = 10^-4
Constant: p
Constant: b = 0.60
Constants in K_la: α, β = 72, 0.5
Constant in F_loss: λ (1/h) = 2.5x10^-4
Constant in heat generation: r_q2 (cal/(g biomass h)) = 1.6783x10^-4
Cooling water flow rate: F_c (L/h)
Contois saturation constant: K_x (g/L) = 0.15
Density x heat capacity of medium: ρC_p (cal/(L °C)) = 1/1580
Density x heat capacity of cooling liquid: ρ_c C_pc (cal/(L °C)) = 5/2000
Feed substrate concentration: S_f (g/L) = 600
Feed flow rate of substrate: F (L/h)
Feed temperature of substrate: T_f (K) = 298
Heat transfer coefficient of cooling/heating liquid: a (cal/(h °C)) = 1050
Inhibition constant: K_p (g/L) = 0.0002
Inhibition constant for product formation: K_I (g/L) = 0.10
Maintenance coefficient on substrate: m_x (1/h) = 0.014
Maintenance coefficient on oxygen: m_o (1/h) = 0.467
Maximum specific growth rate: μ_x (1/h) = 0.092
Oxygen limitation constants: K_ox, K_op (no limitation) = 0
Oxygen limitation constants: K_ox, K_op (with limitation) = 2x10^-2, 5x10^-4
Penicillin hydrolysis rate constant: K (1/h) = 0.04
pH (base): K_c, τ_I (h), τ_D (h) = 8x10^-4, 4.2, 0.2655
pH (acid): K_c, τ_I (h), τ_D (h) = 1x10^-4, 8.8, 0.125
Specific rate of penicillin production: μ_p (1/h) = 0.005
Temperature (cooling): K_c, τ_I (h), τ_D (h) = 70, 0.5, 1.6
Temperature (heating): K_c, τ_I (h), τ_D (h) = 5, 0.8, 0.05
Yield constant: Y_x/s (g biomass/g glucose) = 0.45
Yield constant: Y_x/o (g biomass/g oxygen) = 0.04
Yield constant: Y_p/s (g penicillin/g glucose) = 0.90
Yield constant: Y_p/o (g penicillin/g oxygen) = 0.20
Yield of heat generation: r_q1 (cal/g biomass) = 60
are not considered due to the complex nature of the phenomena and the
unavailability of experimental data.
A typical inhibition term that includes the hydrogen ion concentration [H+]
is introduced into the specific growth rate expression; it appears as the
[H+]-dependent term in the square brackets of Eq. 2.48. The values of K_1
and K_2 are chosen to be in the range of their typical values in the
literature [426, 545].
The specific growth rate of a microorganism increases with temperature up
to a certain microorganism-specific value, beyond which a rapid decrease
is observed. This decrease might be treated as a death rate [545]. These
effects are reflected in the temperature-dependent term in Eq. 2.48, with
k_g and E_g being the pre-exponential constant and activation energy for
cell growth, and k_d and E_d being the pre-exponential constant and
activation energy for cell death, respectively. Typical values for these
parameters were taken from the literature [545]. An adjustment has been
made so that an increase in temperature enhances biomass formation up
to 35°C.
Penicillin:
The production of penicillin is described by non-growth-associated product
formation kinetics. The hydrolysis of penicillin is also included in the
rate expression [36] for completeness:

dP/dt = ε_p X − K P − (P/V)(dV/dt)    (2.49)

where ε_p is the specific penicillin production rate defined as:

ε_p = μ_p [S / (K_p + S (1 + S/K_I))] [C_L^p / (K_op X + C_L^p)]    (2.50)
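The substrate-inhibited, oxygen-limited form of ε_p (Eq. 2.50) can be sketched as below; the oxygen exponent p defaults to 1 here for illustration.

```python
def eps_p(S, C_L, X, mu_p, K_p, K_I, K_op, p_exp=1.0):
    """Specific penicillin production rate (Eq. 2.50 form): substrate
    inhibition at high S via the S^2/K_I term, oxygen limitation via
    a Contois-like term; p_exp is the oxygen exponent p of Table 2.4."""
    substrate = S / (K_p + S * (1.0 + S / K_I))
    oxygen = C_L**p_exp / (K_op * X + C_L**p_exp)
    return mu_p * substrate * oxygen
```

Because of the S^2/K_I term the substrate factor peaks at an intermediate glucose concentration and falls off at high S, which is what makes glucose feeding policy matter in fed-batch operation.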
Substrates:
Each of the two substrates (glucose and oxygen) is utilized largely for
biomass growth, penicillin formation, and cell maintenance [36]. The mass
balances for glucose and oxygen for the variable-volume fed-batch operation
therefore are

Glucose:

dS/dt = −(μ/Y_x/s) X − (ε_p/Y_p/s) X − m_x X + F S_f/V − (S/V)(dV/dt)    (2.51)

Dissolved Oxygen:

dC_L/dt = −(μ/Y_x/o) X − (ε_p/Y_p/o) X − m_o X + K_la (C_L* − C_L) − (C_L/V)(dV/dt)    (2.52)
with yield coefficients Y_x/s, Y_p/s, Y_x/o, and Y_p/o and maintenance
coefficients m_x and m_o being constants characteristic of a particular
penicillin-producing strain. Whereas Bajpai and Reuss [36] have considered
the overall mass transfer coefficient K_la to be constant, we have assumed
K_la to be a function of the agitation power input P_w and the flow rate of
oxygen f_g, as suggested by [35]:

K_la = α (f_g)^(1/2) (P_w/V)^β    (2.53)
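The dependence of K_la on aeration and agitation (Eq. 2.53 as reconstructed here) is a one-liner; α and β are the corresponding Table 2.4 constants.

```python
def k_la(f_g, P_w, V, alpha, beta):
    """Overall oxygen mass transfer coefficient as a function of the
    oxygen flow rate f_g and agitation power input P_w (Eq. 2.53 form)."""
    return alpha * f_g ** 0.5 * (P_w / V) ** beta
```

Both increasing aeration and increasing the power-per-volume ratio raise K_la, which is why these appear among the manipulated inputs of the process.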
Volume Change:
The change in the bioreactor volume during fed-batch process operation
is provided by a modified form of Eq. 2.8, which is

dV/dt = F + F_a/b − F_loss    (2.54)

with F the substrate feed rate, F_a/b the acid/base addition rate, and
F_loss the evaporative loss.
Culture Temperature:
Neglecting all other sources of heat generation except that caused by
microbial reactions, the volumetric heat production rate is given as:

dQ_rxn/dt = r_q1 (dXV/dt) + r_q2 X V    (2.55)

where r_q1 is assumed to be constant and might be treated as a yield
coefficient [426]. During the product synthesis phase, when the rate of
biomass formation is rather low, there is still significant heat generation
associated with metabolic maintenance activities. Therefore, we have
included the second term on the right-hand side of Eq. 2.55 to account for
the heat production during maintenance. Because the heat generation and
CO_2 evolution show similar profiles, their production rates due to growth
(dX/dt) and biomass (X) should have the same ratio as a first approximation.
Based on this observation, r_q2 is calculated and tabulated in Table 2.4.
The energy balance is written based on a coiled-type heat exchanger which
is suitable for a laboratory-scale fermentor [424]:

dT/dt = (F/V)(T_f − T) + [Q_rxn − a F_c^(b+1) / (F_c + a F_c^b / (2 ρ_c C_pc))] / (V ρ C_p)    (2.56)
Carbon Dioxide:
The introduction of variables that are easy to measure yet important
in terms of their information content has been very helpful in predicting
other important process variables. One such variable is CO_2, from which
biomass may be predicted with high accuracy. In this work, CO_2 evolution
is assumed to be due to growth, penicillin biosynthesis and maintenance
requirements, as suggested by [398]. The CO_2 evolution is:

dCO_2/dt = α_1 (dX/dt) + α_2 X + α_3    (2.57)
Here, the values of α_1, α_2 and α_3 are chosen to give CO_2 profiles similar
to the predictions of [398].
The extended model developed consists of differential equations 2.11,
2.49, and 2.51-2.57, which are solved simultaneously.
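A reduced sketch of such a simultaneous solution is given below, with three states only (X, S, P), simple Monod growth standing in for Eq. 2.48, constant volume, and the batch phase only; rate constants follow Table 2.4 where available (μ_x, Y_x/s, Y_p/s, m_x, μ_p, K_p, K_I, K), with K_x reused as a Monod constant for simplicity.

```python
from scipy.integrate import solve_ivp

def rhs(t, y):
    """Reduced batch penicillin model: biomass X, glucose S, product P.
    Monod growth stands in for the full Eq. 2.48 expression."""
    X, S, P = y
    S = max(S, 0.0)                     # guard against overshoot below zero
    mu = 0.092 * S / (0.15 + S)         # specific growth rate (1/h)
    eps = 0.005 * S / (0.0002 + S * (1.0 + S / 0.1))   # Eq. 2.50 form
    dX = mu * X
    dS = -mu * X / 0.45 - eps * X / 0.90 - 0.014 * X   # growth, product, maintenance
    dP = eps * X - 0.04 * P             # production minus hydrolysis
    return [dX, dS, dP]

sol = solve_ivp(rhs, (0.0, 50.0), [0.1, 15.0, 0.0], max_step=0.1)
X_end, S_end, P_end = sol.y[:, -1]      # states at t = 50 h
```

The full model simply adds the remaining states (C_L, CO_2, T, [H+], V) and their couplings to the same right-hand-side function.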
Branching:    u_1 = k_u1 Z_s    (2.58)

Extension:    u_2 = k_u2 Z_a    (2.59)

Differentiation:    u_3 = k_u3 Z_s K_3/(K_3 + S)    (2.60)
Mass Balance Equations
Growth of apical and subapical cells is described by saturation type
kinetics including effects of both glucose and oxygen in multiplicative form.
The motivation for this is an earlier modeling work on penicillin production
by Bajpai and Reuss [36] where growth has been described by Contois
kinetics. Here, in order to reduce the model complexity, Monod kinetics
has been used for describing the growth as suggested by Nielsen [423].
Zangirolami et al. [685] suggest that hyphal cells may still retain, to some
extent, the metabolic activity and growth ability exhibited in the subapical
compartment and consider a growing fraction (f_h) of hyphal cells in their
model. On the other hand, Nielsen [423] suggests that hyphal cells have a
metabolism completely different from the actively growing apical and
subapical cells; hence, they are believed not to contribute to the overall
growth process, and μ_h is assumed to be zero. For simplicity, the growth
rate of hyphal cells (μ_h) is also considered to be trivial here, based on
Nielsen's work [423]. The overall specific growth rate (μ), which is an
average of the growth rates of the individual compartments, is then obtained
as μ = μ_a Z_a + μ_s Z_s.
In view of the above, the conservation equations for the three compartments
(components) of the cell population can be expressed as (Z_a + Z_s + Z_h = 1)

dZ_a/dt = u_1 − u_2 + (μ_a − μ) Z_a        apical cells    (2.64)

dZ_s/dt = u_2 − u_1 − u_3 + (μ_s − μ) Z_s    subapical cells    (2.65)

dZ_h/dt = u_3 − μ Z_h                       hyphal cells    (2.66)
The terms μZ_i (i = a, s, h) in Eqs. 2.64, 2.65, and 2.66 account for
dilution associated with biomass formation. The random fragmentation at
different positions in individual hyphal elements leads to a distribution in
characteristics of the population, such as the mass and numbers of total
tips and actively growing tips [423]. Estimation of the average properties
of the hyphal elements has been addressed theoretically by Nielsen [423]
and experimentally using image analysis by Yang et al. [674, 675]. In
this case, we have made use of this population model based on the average
properties of the hyphal elements [423]. In summary,
Hyphal element balance:

de/dt = (φ − D) e    (2.67)
(2.69)
For oxygen:

dC_L/dt = K_la (C_L* − C_L) − a_o m̂ e (V/V_abiotic) − (C_L/V_abiotic)(dV_abiotic/dt)    (2.74)

where a_o = (1/Y_x/o) μ + (1/Y_p/o) ε_p Z_s + m_o.    (2.75)
For penicillin:

dP/dt = ε_p Z_s m̂ e (V/V_abiotic) − K P − (P/V_abiotic)(dV_abiotic/dt)    (2.76)
The last term in Eqs. 2.72, 2.74 and 2.76 is due to the volume correction
that is applied to the glucose, penicillin and oxygen concentrations, since
these are based on the liquid volume (V_abiotic). The biomass concentration,
X (= m̂ e, with e the number of hyphal elements per culture volume and m̂
the average mass per element), is on the other hand based on the culture
volume (V).
In Eq. 2.73, m_s and ε_p are the maintenance coefficient on glucose and the
specific rate of product formation, respectively, and α_a and α_s are the
stoichiometric biomass yield coefficients for the apical and subapical cell
compartments, respectively. The last term on the right-hand side of α_s
(Eq. 2.73) reflects the fact that the target antibiotic is synthesized only
by subapical cells. The
dissolved oxygen balance (Eq. 2.74) can similarly be expressed after
accounting for oxygen consumption due to cell growth, cell maintenance and
product formation; m_o is the maintenance coefficient on oxygen in Eq. 2.75.
The mass balance for penicillin in Eq. 2.76 accounts for hydrolysis/degradation
of the antibiotic, with K being the degradation/hydrolysis coefficient. The
form of ε_p in Eq. 2.77 is chosen so as to reflect the inhibitory effects
observed at high biomass and glucose concentrations.
These balances [Eqs. 2.72, 2.74 and 2.76] reduce to the standard balances
without volume correction when the biotic-phase volume is negligible, since
V_abiotic ≈ V in that case.
of actively growing tips and mass. The model parameters are presented in
Table 2.5. Parameters related to growth and substrate consumption were
taken from Nielsen [423]. Again for simplicity, the growth kinetics of apical
and subapical compartments are assumed to be the same resulting in the
same stoichiometric yield coefficients for the two compartments and the
same maximum specific growth rates (ka = ks).
In all simulations, a batch operation is considered to be followed by a
fed-batch operation. The transition from batch culture to fed-batch culture
occurs when the level of glucose concentration reaches a threshold value (10
g/L); such threshold values are commonly used in industrial scale penicillin
production. The predictions of the model presented here under different
operating conditions were compared with various experimental data. Note
that most of the parameters are specific to the strain employed, substrate
used and culture parameters such as pH, and temperature. Hence, this work
focuses on capturing the general dynamic behavior of penicillin production
rather than concentrating on strain- or medium-specific conditions. A set
of simulation results is illustrated in Figures 2.7 through 2.13. As with
the unstructured model, the simulated results exhibit four distinct
growth-based phases, shown in Figures 2.7 through 2.13.
Figure 2.7. Time course of the apical fraction of the cells based on the
structured model.
Figure 2.8. Time course of the subapical fraction of the cells based on the
structured model.
Figure 2.9. Time course of the hyphal fraction of the cells based on the
structured model.
68 Chapter 3. Experimental Data Collection
3.1 Sensors
Sensors may be categorized as on-line and off-line. On-line sensors are
preferred because they provide process information quickly, without
disrupting the process or introducing sampling and cultivation delays, with
fewer human errors, and they allow arbitrary measurement frequencies.
Off-line analysis techniques are used because of the difficulty and expense
of developing sterilizable probes or constructing a sampling system for some
process variables and product properties.
Sensors must possess several characteristics that meet the specifications
for use in a particular application [366, 439, 475]:
Accuracy is the degree of conformity to standard when the device is op-
erated under specified conditions. This is typically described in terms
of maximum percentage of deviation expected based on a full-scale
reading on the device specification sheet.
Precision (Repeatability) is the exactness with which a measuring in-
strument repeats indications when it measures the same property un-
der the same conditions. Sensors display a drift in time which can be
corrected by periodic calibration.
Range is the difference between the minimum and the maximum values
of the sensor output within the intended operating limits. Accuracy
and precision generally improve as the range is reduced, which
implies that a small range would be preferred. However, the range
must be large enough to span the expected variation of the process
variable under typical operating conditions, including disturbances
and set point changes.
Durability refers to the endurance of a sensor under the exposure to dif-
ferent operational conditions (pH, temperature, acidity). Since most
of the industrial scale cultivations require extensive periods of oper-
ation time for completion (2-20 days), the sensor response should be
stable for extended periods.
Reliability is the degree of how well a sensor maintains both precision
and accuracy over its expected lifetime. Reliability is a function of
the failure rate, failure type, ease of maintenance, and robustness of
the sensor.
Response Time is the time it takes for the sensor output to reach its final
value. It indicates how quickly the sensor will respond to changes in
the environment. This parameter indicates the speed of the sensor
and must be compared with the speed of the process.
3.1. Sensors 69
On-line Sensors
On-line sensors are crucial for monitoring and controlling a process for its
safe and optimal performance. These instruments can be classified as
1. sensors that do not come in contact with the cultivation broth (e.g.,
a thermocouple),
2. in-situ sensors that are immersed directly into the cultivation broth
and hence are in contact with it (e.g., pH meter, dissolved oxygen
probe and level sensor).
3. other sensors, such as tachometer and rotameter.
When the sensors/probes come in direct contact with the cultivation
broth, one potential problem is maintaining aseptic conditions. Under these
conditions, the probe should be sterilizable and should be placed so as
to avoid any possible leakage from/to the bioreactor through the
connections. The seal is usually accomplished by elastomer "O" rings that
also provide easy insertion of the probe.
The location of the sensor in the fermenter is very important since the
contents of the bioreactor are usually heterogeneous. As a result, the mea-
surements of variables that are critical for control action will be dependent
on the location of the sensor. Conventionally, sensors are placed in the
midsection of the vessel, though placement somewhere else may also be
considered depending on the design of the bioreactor. A sensor should be
placed in a region with sufficient turbulence to maintain the surface of the
sensor clean and avoid build-up of material on it. Besides corrupting the
sensor output, such build-up may lead to fouling of the sensor.
In the absence of in-situ sensors, on-line analysis of medium compo-
nents is preferred. The main idea is to sample the medium automatically
by collecting it in a loop that has a relatively small volume compared to
the cultivation broth and to analyze it. Automatic sampling can be per-
formed in two ways: (1) direct withdrawal of sample by using a syringe
Off-line Sensors
Off-line analysis becomes a viable option especially when there is a need
to measure a large number of medium components in order to improve the
understanding of the process. Disadvantages of off-line analysis include in-
[Figure: schematic of a computer-based data acquisition and control system.
Analog signals from the fermenter's sensors pass through an A/D converter
to the data acquisition and control computer, which communicates with a
supervisory computer and with on-line and off-line analyzers, and drives
the final control elements through a D/A converter.]
of discrete states. Since the number of bits in the digital code is finite, A/D
conversion results in a finite resolution, rounding off an analog number to
the nearest digital level and producing a quantization error.
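The quantization step and error bound of an n-bit A/D converter can be illustrated directly; the 0-10 V range and 12-bit resolution below are illustrative values, not taken from the text.

```python
def quantize(value, v_min, v_max, n_bits):
    """Round an analog value to the nearest level of an n-bit A/D
    converter spanning [v_min, v_max]; returns (digital value, step)."""
    levels = 2 ** n_bits
    step = (v_max - v_min) / (levels - 1)       # finite resolution
    code = round((value - v_min) / step)
    code = max(0, min(levels - 1, code))        # saturate at the rails
    return v_min + code * step, step

digital, step = quantize(3.1416, 0.0, 10.0, n_bits=12)
# The quantization error is bounded by step / 2 within the range.
```

Adding one bit halves the step, and hence halves the worst-case quantization error.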
The functionality and ease of use of commercially available data col-
lection and processing packages have significantly improved over the years.
Most commercial data acquisition software in the market are capable of
• capturing and recording process data over time
• data reconciliation and outlier detection
• custom tailoring data treatment according to the user's needs
• transferring the data to other software
• sending out commands or data to control instruments and final control
elements
• alarm generation and handling
• inputting time series data from any device into any application pro-
gram
• creating charts and graphs that automatically update with real-time
data from serial devices
• performing real time analysis of data
• storing and compressing the data
Most software works with popular operating systems such as Windows 2000
and Unix. The user-friendly graphical user interface (GUI) of the software
provides
a convenient environment for the user. Simple, menu driven, step by step
set-up is possible in most commercial software due to the interactive nature
of the GUI. Hierarchical password protection personalizes access for each
user. In most
applications, controllers can be designed and set points can be changed as
a function of any parameter using simple pictorial function blocks avoiding
any programming.
where x_i, i = 1:p are the factors, e is the random and systematic error,
and y is the response variable. Approximating this equation by using a
Taylor series expansion:

y = b_0 + b_1 x_1 + b_2 x_2 + ... + b_p x_p + b_12 x_1 x_2 + ... + b_ij x_i x_j + ...
    + b_11 x_1^2 + ... + b_pp x_p^2 + e    (3.2)
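For p = 2 factors, the second-order approximation of Eq. 3.2 can be fit by ordinary least squares; the data below are synthetic, generated only to illustrate the fit.

```python
import numpy as np

# Synthetic two-factor data: y = b0 + b1*x1 + b2*x2 + b12*x1*x2 + b11*x1^2 + e
rng = np.random.default_rng(0)
x1 = rng.uniform(-1.0, 1.0, 20)
x2 = rng.uniform(-1.0, 1.0, 20)
y = 3.0 + 1.5 * x1 - 2.0 * x2 + 0.8 * x1 * x2 + 0.5 * x1**2 \
    + rng.normal(0.0, 0.01, 20)          # small random error e

# Design matrix columns: b0, b1, b2, b12, b11, b22 (Eq. 3.2 with p = 2)
A = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
```

The fitted coefficients recover the generating values within the noise level, showing how the truncated Taylor expansion is estimated from designed experiments.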
feed rate (R), bioreactor temperature (T) and two different strains (S) of
the inoculum on the total amount of product (yield Y) in a fed-batch run.
The high (+) and low (−) settings for feed rate R (L/h) and temperature
T (°C) are 0.08, 0.02 and 35, 17, respectively. Two strains, A (−) and B
(+) are used. It is assumed that approximately 5% higher production is
reached when strain A is used (first four runs in Table 3.2). The fictitious
experiments and penicillin production information are listed in a tabular
form (Table 3.2).
Table 3.2. Data from a 23 full factorial design for investigating the effects of
substrate feed rate (R L/h), bioreactor temperature (T °C) and inoculum
strains (S) on the total amount of product (Y grams) in a fed-batch run.
Run R T S Y
1 - - - 69.24
2 + - - 214.82
3 - + - 59.45
4 + + - 133.49
5 - - + 65.78
6 + - + 201.93
7 - + + 57.07
8 + + + 126.82
cillin) yield? Consider for example runs 1 and 3 in Table 3.2: The variation
in the yield is due to a variation in T and experimental error. In fact, there
are four pairs of runs in Tables 3.1 and 3.2 where R and S have identical
values in each pair and T is at two different levels. The variations in yield
with variation in temperature for the four pairs and the corresponding R
and S settings are listed in Table 3.3.
where ȳi+ and ȳi− are the average responses for the + and − levels of
variable i, respectively. Hence, for T:

T = (y3 + y4 + y7 + y8)/4 − (y1 + y2 + y5 + y6)/4   (3.4)

Similar equations can be developed for the other main effects. The main effects
of all three factors are T = −43.73, R = 106.38, and S = −6.35. □
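The arithmetic can be checked with a short script; a minimal sketch using the Table 3.2 data, with the runs assumed to be in standard order (R varying fastest, then T, then S):

```python
# Main effects from the 2^3 full factorial data of Table 3.2.
# Runs are assumed to be in standard (Yates) order: R fastest, then T, then S.
Y = [69.24, 214.82, 59.45, 133.49, 65.78, 201.93, 57.07, 126.82]

# Factor-level signs for each run: -1 = low, +1 = high.
signs = {
    "R": [-1, +1, -1, +1, -1, +1, -1, +1],
    "T": [-1, -1, +1, +1, -1, -1, +1, +1],
    "S": [-1, -1, -1, -1, +1, +1, +1, +1],
}

def main_effect(factor):
    """Average response at the + level minus the average at the - level."""
    plus = [y for y, s in zip(Y, signs[factor]) if s > 0]
    minus = [y for y, s in zip(Y, signs[factor]) if s < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

for f in "RTS":
    print(f, main_effect(f))
```

Running this reproduces the values quoted above (T ≈ −43.73, R = 106.38, S = −6.35).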
All eight observations are used to compute the information on each of
the main effects, providing a fourfold replicate of the differences. To secure
the same precision in the OVAT approach for estimating the main effect of
temperature, eight experiments have to be conducted, four at each level of
temperature, while the other two inputs are fixed at one of their respective
levels. A total of 24 experiments (a threefold increase) is needed to obtain
the estimates of the three main effects. In general, a p-fold (p = number of
factors) increase in the number of experiments is needed for OVAT over
the full factorial approach. Even if all changes are made with respect to
a common experimental condition in the OVAT design, (p + 1)/2 times more
experiments are needed than for the full factorial design [78].
The implicit assumption in the OVAT design is that the main effect
observed for one factor will remain the same at different settings of the other
factors. In other words, the variables act on the response additively. If this
assumption is correct, the results based on the OVAT design will provide
complete information about the effects of various factors on the response
even though the OVAT design would necessitate more experiments to match
the precision of factorial design. If the assumption is not appropriate, data
based on factorial design (unlike the OVAT design) can detect and estimate
interactions between factors that lead to nonadditivity [78].
R1 × R3 = (y1 + y3 + y6 + y8)/4 − (y2 + y4 + y5 + y7)/4

Similarly,

R1 × R2 = (y1 + y4 + y5 + y8)/4 − (y2 + y3 + y6 + y7)/4

R2 × R3 = (y1 + y2 + y7 + y8)/4 − (y3 + y4 + y5 + y6)/4   (3.7)

(c) Three-factor interactions
The interactions of a higher number of factors are denoted using the same
convention. For example, the three-factor interaction between feed rate,
temperature, and strain is denoted by R × T × S. Three-factor interactions
are computed using similar equations. The interaction between the
three factors (factor levels as listed in Table 3.1) and illustrated in Figure
3.2 is computed by using two-factor interactions. The interaction between
R1 and R2 for one level of R3 (−) is [(y4 − y3) − (y2 − y1)]/2 and for the
other level, R3 (+), it is [(y8 − y7) − (y6 − y5)]/2. Half of their difference (for
R3[+] − R3[−]) is defined as the three-factor interaction:
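Because an interaction's sign column is the elementwise product of its parent factors' columns, one routine can compute effects of any order. A sketch using the Table 3.2 data (run order assumed standard):

```python
# Effects of any order from the 2^3 design of Table 3.2: the sign column of
# an interaction is the elementwise product of the parent factors' columns.
Y = [69.24, 214.82, 59.45, 133.49, 65.78, 201.93, 57.07, 126.82]
signs = {
    "R": [-1, +1, -1, +1, -1, +1, -1, +1],
    "T": [-1, -1, +1, +1, -1, -1, +1, +1],
    "S": [-1, -1, -1, -1, +1, +1, +1, +1],
}

def effect(factors):
    """Effect of a main factor ("R") or an interaction ("RT", "RTS")."""
    col = [1] * len(Y)
    for f in factors:
        col = [c * s for c, s in zip(col, signs[f])]
    plus = [y for y, c in zip(Y, col) if c > 0]
    minus = [y for y, c in zip(Y, col) if c < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

print("RT  =", effect("RT"))   # two-factor interaction of feed rate and temperature
print("RTS =", effect("RTS"))  # three-factor interaction
```

For these data the RT interaction (about −34.5) is the only sizable interaction, consistent with the discussion of Figure 3.3 below.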
The levels of factors such as those displayed in Table 3.2 can be used to
generate a table of contrast coefficients that facilitates the computation of
the effects (Table 3.4). The signs of the main effects are generated using
the signs indicating the factor levels. The signs of the interactions are
generated by multiplying the signs of the corresponding experiment levels
(main effect signs). For example, the main effect T is calculated by using
the signs of the third column:
Table 3.4. Signs for calculating the effects from a 23 full factorial design.
Last column (product) for use in the fermentation example
1 = 2345
2 = 1345
3 = 1245
4 = 1235
5 = 1234
12 = 345
13 = 245
14 = 235
15 = 234
23 = 145
24 = 135
25 = 134
34 = 125
35 = 124
45 = 123
design selected satisfy the relationship 123 = −45. Consequently, the 123
and 45 interactions are confounded. The individual interactions 123 and
45 are called aliases of each other. A relationship such as 5 = 1234 used
to construct the 2⁵⁻¹ design is called the generator of the design. Recall
that the numbers 1 to 5 used above or the uppercase letters used in Section
3.3.1 denote a factor, and a column of − and + signs indicates its level.
The multiplication of the elements of a column by another column having
identical elements is represented as 1 × 1 = 1² = I. Similarly, 2 × 2 = I and
I × I = I. Furthermore, 2 × I = 2. Hence,
The relation I = 12345 is called the defining relation of the design and is
the key for determining all confoundings. For example, multiplying both
sides of the defining relation by 1 yields 1 = 2345, indicating that the main
effect 1 is confounded with the four-factor interaction 2345. All confounding
patterns for the 2⁵⁻¹ design with the defining relation I = 12345 are given
in Table 3.6.
The complementary half-fraction design for 2⁵⁻¹ is made up of all the
entries in Table 3.5 without the asterisk in the "half-fraction" column. Its
defining relation is I = −12345, the "−" sign indicating that the − level of
the 1234 interaction is used. Higher fractions such as 1/4 or 1/8 may also be
of interest because of limited resources to conduct experiments. Then,
additional defining relations must be used to design the experiment plan.
The selection of the defining contrasts and confounded effects becomes
more challenging as the number of factors and the level of the fractions
increase, necessitating systematic procedures such as the algorithm proposed
by Franklin [164].
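The construction of a half fraction from its generator, and the aliases implied by the defining relation, can be sketched as follows (factor numbering as in the text; the fraction built here is the + fraction with generator 5 = 1234):

```python
from itertools import product

# Sketch: build a 2^(5-1) half-fraction design with generator 5 = 1234,
# then derive the alias of any effect from the defining relation I = 12345.

# Full 2^4 design in factors 1-4; factor 5's column is the product 1*2*3*4.
runs = []
for levels in product((-1, +1), repeat=4):
    x5 = levels[0] * levels[1] * levels[2] * levels[3]
    runs.append(levels + (x5,))

def alias(effect):
    """Alias of an effect (a set of factor numbers) under I = 12345:
    multiplying the effect into the defining word cancels repeated factors."""
    return frozenset({1, 2, 3, 4, 5}).symmetric_difference(effect)

print(sorted(alias({1})))     # main effect 1 is confounded with 2345
print(sorted(alias({4, 5})))  # interaction 45 is confounded with 123
```

In the 16 runs generated this way, the sign column of the 123 interaction is identical to that of 45, which is exactly what confounding means.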
the standard errors must be computed. If replicate runs are made at each
set of experimental conditions, the variation between their outcomes may
be used to estimate the standard deviation of a single observation and
consequently the standard deviation of the effects [78]. For a specific
combination of experimental conditions, ni replicate runs made at the ith set
of experimental conditions yield an estimate si² of the variance σ² having
νi = ni − 1 degrees of freedom. In general, the pooled estimate of the run
variance for g sets of experimental conditions is

s² = (ν1 s1² + ν2 s2² + ... + νg sg²) / (ν1 + ν2 + ... + νg) .   (3.12)
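A sketch of the pooled-variance calculation of Eq. 3.12; the replicate yields below are hypothetical:

```python
# Pooled estimate of the run variance (Eq. 3.12): sample variances s_i^2
# with nu_i = n_i - 1 degrees of freedom from g sets of replicate runs.
from statistics import variance

replicate_sets = [  # hypothetical replicate yields at 3 experimental conditions
    [69.2, 71.0, 68.5],
    [214.8, 210.3, 216.1, 212.9],
    [59.4, 61.2],
]

nus = [len(r) - 1 for r in replicate_sets]    # degrees of freedom nu_i
s2s = [variance(r) for r in replicate_sets]   # sample variances s_i^2

s2_pooled = sum(nu * s2 for nu, s2 in zip(nus, s2s)) / sum(nus)
print(s2_pooled, "with", sum(nus), "degrees of freedom")
```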
Table 3.7. Data and computed values for the Q-Q plot of the main effects and
interactions for the experimental data in Tables 3.1 and 3.2 and the standard
Normal distribution

Pi = (i − 3/8) / (n + 1/4)   (3.13)
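The plotting positions of Eq. 3.13 and the corresponding standard Normal quantiles can be computed with `statistics.NormalDist` (Python 3.8+); the effect values below are the seven effects recomputed from the Table 3.2 data:

```python
# Q-Q plot coordinates (Eq. 3.13): the i-th ordered effect is plotted against
# the standard Normal quantile of P_i = (i - 3/8) / (n + 1/4).
from statistics import NormalDist

# R, T, RT, S, RS, TS, RTS effects recomputed from the Table 3.2 yields.
effects = [106.38, -43.735, -34.485, -6.35, -3.43, 1.825, 1.285]
n = len(effects)
ordered = sorted(effects)
P = [(i - 3/8) / (n + 1/4) for i in range(1, n + 1)]
quantiles = [NormalDist().inv_cdf(p) for p in P]

for q, e in zip(quantiles, ordered):
    print(f"{q:6.3f}  {e:8.2f}")
```

Plotting `ordered` against `quantiles` gives the Q-Q plot of Figure 3.3; the large effects (R, T, RT) fall off the straight line traced by the small ones.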
[Figure 3.3. Q-Q plot of the main effects and interactions against the standard Normal distribution; the points labeled R, T, and RT are marked.]
same main effects and interactions noted earlier (R, T, RT) deviate sub-
stantially from the Normal distribution (i.e., from the line of unit slope
passing through the origin in Figure 3.3).
where B1 is the "material balance" coefficient matrix of the reduced con-
straints with a residual vector (reduced balance residuals)

e = B1 x .   (3.16)

The optimal value of the objective function F is

F = e^T He^{-1} e ~ χ²m   (3.17)

which follows a chi-squared (χ²) distribution with m degrees of freedom,
where m is the rank of He [113]. Additional restrictions such as flow rates
being positive or zero may be introduced so that (x + a) is not negative.
This framework can be combined with principal components analysis for
gross error detection and reconciliation [259, 591].
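A minimal sketch of the test in Eq. 3.17, assuming a two-dimensional residual vector with a hypothetical diagonal covariance He:

```python
# Gross-error test of Eq. 3.17: F = e^T He^{-1} e is compared with a
# chi-squared critical value with m = rank(He) degrees of freedom.
e = [0.8, -2.5]            # hypothetical reduced balance residuals
He = [[0.25, 0.0],
      [0.0, 0.64]]         # hypothetical (diagonal) residual covariance

# He is diagonal here, so He^{-1} e reduces to elementwise division.
F = sum(ei * ei / He[i][i] for i, ei in enumerate(e))

CHI2_95_2DF = 5.991        # tabulated 95% chi-squared critical value, 2 d.f.
print(f"F = {F:.2f}; gross error suspected: {F > CHI2_95_2DF}")
```

For a full covariance matrix the same test requires a proper matrix inverse (or a Cholesky solve) instead of the elementwise division.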
Other data reconciliation and gross error detection paradigms have been
proposed for linear processes operating at steady state. A serial strategy for
detecting and identifying multiple gross errors eliminates sequentially mea-
surements susceptible to gross errors, recomputes a test statistic, and com-
pares it against a critical value [258, 519]. The use of the generalized likelihood
ratio method (Section 8.3) for identifying abrupt changes [651] has been pro-
posed to discriminate between gross measurement errors and process faults
(for example, between malfunctions of flow rate sensors and leaks) [409]. Since
more than one outlier may exist in data, some outliers may be masked by
other dominating outliers in their vicinity. A patch of outlying successive
measurements is common in time series data, and masking of outliers by
other outliers is a problem that must be addressed. One approach for
determining patches of outliers is the generalization of the leave-one-out
technique to leave-k-out diagnostics. However, at times the presence of
a gross outlier will have sufficient influence such that deletion of aberrant
values elsewhere in the data has little effect on the estimate. More subtle
types of masking occur when moderate outliers exist close to one another
[379]. These types of masking can often be effectively uncovered by an
iterative deletion process that removes suspected outlier(s) from the data
and recomputes the diagnostics.
Several modeling methods have been proposed to develop empirical
models when outliers may exist in data [91, 595]. The strategy used in some
of these methods first detects and deletes the outlier(s), then identifies the
time series models. A more effective approach is to accommodate the pos-
sibility of outliers by suitable modifications of the model and/or method of
analysis. For example, mixture models can be used to accommodate certain
types of outliers [10]. Another alternative is the use of robust estimators
that yield models (regression equations) that represent the data accurately
in spite of outliers in data [69, 219, 524]. One robust estimator, the L1
estimator, involves the use of the least absolute values regression estimator
rather than the traditional least sum of squares of the residuals (the
least squares approach). The magnitudes of the residuals (the differences
between the measured values and the values estimated by the model equa-
tion) have a strong influence on the model coefficients. Usually an outlier
yields a large residual. Because the least squares approach takes the square
of the residuals (hence it is called L2 regression, indicating that the
residual is squared), the outliers distort the model coefficients more than in
L1 regression, which uses the absolute values of the residuals [523]. An im-
proved group of robust estimators includes the M estimator [216, 391, 245]
that substitutes a function of the residual for the square of the residual
and the Generalized M estimator [15, 217] that includes a weight function
based on the regressor variables as well. An innovative approach, the least
trimmed squares (LTS) estimator uses the first h ordered squared residuals
in the sum of squares (h < n, where n is the number of data points), thereby
excluding the n — h largest squared residuals from the sum and consequently
allowing the fit to stay away from the influence of potential outliers [523].
A different robust estimator, the least median squares (LMS) estimator, is
based on the medians of the residuals and better tolerates outliers in both
dependent and independent (regressor) variables [523].
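The robustness difference between L2 and L1 fitting can be illustrated on synthetic straight-line data with one gross outlier; here the L1 fit is approximated by iteratively reweighted least squares (one common approximation, not the only algorithm), and all data are illustrative:

```python
# Least squares (L2) vs. least absolute values (L1) straight-line fits on
# data with one gross outlier; L1 approximated by IRLS.

def wls(x, y, w):
    """Weighted least-squares line y = b0 + b1*x."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b1 = (sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
          / sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x)))
    return ym - b1 * xm, b1

def l1_fit(x, y, iters=50, eps=1e-6):
    """IRLS approximation to the L1 (least absolute deviations) fit:
    each pass reweights points by the reciprocal of their residual."""
    w = [1.0] * len(x)
    for _ in range(iters):
        b0, b1 = wls(x, y, w)
        w = [1.0 / max(abs(yi - b0 - b1 * xi), eps) for xi, yi in zip(x, y)]
    return b0, b1

x = [float(i) for i in range(10)]
y = [2.0 * xi for xi in x]      # true line: y = 2x
y[9] += 50.0                    # a single gross outlier

b0_l2, b1_l2 = wls(x, y, [1.0] * 10)
b0_l1, b1_l1 = l1_fit(x, y)
print("L2 slope:", b1_l2, " L1 slope:", b1_l1)
```

The L2 slope is dragged well away from the true value of 2 by the single outlier, while the L1 slope stays close to it.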
Subspace modeling techniques such as principal components analysis
(PCA) provide another framework for outlier detection [224, 591, 592] and
data reconciliation. PCA is discussed in detail in Section 4.1. One advan-
tage of PCA based methods is their ability to make use of the correlations
among process variables, while most univariate techniques are of limited use
because they ignore variable correlations. A method that integrates
PCA and sequential analysis [592] to detect outliers in linear processes op-
erated at steady state is outlined in the following paragraphs. Then, a PCA
based outlier detection and data reconciliation approach for batch processes
is discussed.
PCA can be used to build the model of the process when it is operating
properly and the data collected do not have any outliers. In practice, the
data sets from good process runs are collected, inspected and cleaned first.
Then the PCA model is constructed to provide the reference information.
When a new batch is completed, its data are transformed using the same
PCs and its scores (see Section 4.1) are compared to those of the reference
model. Significant increases in the scores indicate potential outliers. Since
the increases in scores may be caused by abnormalities in process operation,
the outlier detection activities should be integrated with fault detection
activities. The PCA framework can also be used for data reconciliation as
illustrated in the example given in this section.
Consider a set of linear combinations of the reduced balance residuals
e defined in Eq. 3.16:
ye = We^T e = Λe^{-1/2} Ue^T e   (3.18)

where Λe is a diagonal matrix whose elements are the magnitude-ordered
eigenvectors of He (detailed discussion of PCA computations are presented
in Section 4.1). The elements of vector ye are called PC scores and cor-
respond to individual principal components (PC). The random variable e
has a statistical distribution with mean 0 and covariance matrix He
(e ~ (0, He)). Consequently, ye ~ (0, I) where I denotes the identity ma-
trix (a diagonal matrix with 1s in the main diagonal), and the correlated
variables e are transformed into an uncorrelated set (ye) with unit vari-
ances. Often the measured variables are Normally distributed about their
mean values. Furthermore, the central limit theorem would be applicable
to the PCs. Consequently, ye is assumed to follow the Normal distribution
(ye ~ N(0, I)) and the test statistic for each PC is

ye,i = (We^T e)i ~ N(0,1),   i = 1, ..., m   (3.19)
which can be tested against tabulated threshold values. When an outlier
is detected by noting that one or more ye,i are greater than their threshold
values, Tong and Crowe [592] proposed the use of contribution plots
(Section 8.1) to identify the cause of the outlier detected. They have also
advocated the use of a sequential analysis approach [625] to make statistical
inferences, with fewer observations, for testing whether the mean values of
the PCs are zero.
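A sketch of score-based outlier screening in the two-variable case, where the PCA model can be written in closed form; the reference data and the 3-sigma threshold are illustrative:

```python
from math import atan2, cos, sin, sqrt
from statistics import mean

# PCA-based outlier screening for two correlated variables: a PC model is
# built from clean reference data, and a new sample's unit-variance scores
# are tested against a +/-3 sigma threshold. All data are illustrative.
ref_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ref_y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.2, 7.9]

def pca_model(x, y):
    """Closed-form PCA of two variables: means, PC angle, eigenvalues."""
    xm, ym = mean(x), mean(y)
    xc = [v - xm for v in x]
    yc = [v - ym for v in y]
    n1 = len(x) - 1
    sxx = sum(v * v for v in xc) / n1
    syy = sum(v * v for v in yc) / n1
    sxy = sum(a * b for a, b in zip(xc, yc)) / n1
    theta = 0.5 * atan2(2 * sxy, sxx - syy)       # direction of the first PC
    l1 = (sxx * cos(theta) ** 2 + 2 * sxy * sin(theta) * cos(theta)
          + syy * sin(theta) ** 2)                # variance along PC1
    l2 = sxx + syy - l1                           # trace is conserved
    return xm, ym, theta, l1, l2

def scores(model, x, y):
    """Unit-variance PC scores of an observation."""
    xm, ym, theta, l1, l2 = model
    xc, yc = x - xm, y - ym
    t1 = (xc * cos(theta) + yc * sin(theta)) / sqrt(l1)
    t2 = (-xc * sin(theta) + yc * cos(theta)) / sqrt(l2)
    return t1, t2

model = pca_model(ref_x, ref_y)
t1, t2 = scores(model, 8.0, 2.0)   # new sample that breaks the correlation
print("outlier suspected:", abs(t1) > 3 or abs(t2) > 3)
```

A univariate check would pass the new sample (both 8.0 and 2.0 lie within the reference ranges); it is only the second-PC score, which measures departure from the correlation structure, that flags it.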
Outlier detection in batch processes can be done by extending the PCA
based approach by using the multiway PCA (MPCA) framework discussed
in Section 4.5.1. The MPCA model or reference models based on other
paradigms such as functional data analysis (Section 4.4) representing the
reference trajectories can also be used for data reconciliation by substituting
"reasonable" estimated values for outliers or missing observations. The
example that follows illustrates how the MPCA models can be used for
outlier detection and data reconciliation.
Example Consider a data set collected from a fed-batch penicillin fer-
mentation process. Assume that there are a few outliers in some of the
variables such as glucose feed rate and dissolved oxygen concentration due
to sensor probe failures. This scenario is realized by adding small and large
outliers to the values of these variables as shown in Figure 3.4. Locations
of the outliers for the two variables are shown in Table 3.8.
In this example, a multiway PCA (MPCA) model with four principal com-
ponents is developed out of a reference set (60 batches, 14 variables, 2000
samples) for this purpose. A number of multivariate charts are then con-
structed to unveil the variables that might contain outlying data points and
the locations of the outliers in those variables. The first group of charts one
might inspect is the SPE, T2 charts and the charts showing variable con-
tributions to these statistics. Contribution plots are discussed in detail in
Section 8.1. Both SPE and T2 charts signal the outliers and their locations
correctly (Figure 3.5), but they do not give any information about which
variable or variables have outliers. At this point of the analysis, contribu-
tion (to SPE and T2 values) plots are inspected to find out the variables
responsible for inflating SPE and T2. Since outliers will be projected far-
ther from the plane defined by MPCA model, their SPE values are expected
to be very high. Consistently, SPE contribution plot indicates two variables
Table 3.8. Locations of the added outliers (sample numbers)
Glucose feed rate (no. 3): 500, 750, 800, 1000, 1500, 1505, 1510
Dissolved O2 conc. (no. 6): 450, 700, 900, 1400, 1405, 1410
[Figure 3.4. Glucose feed rate (top) and dissolved oxygen concentration (bottom) profiles over 2000 samples, with the added outliers.]
Figure 3.5. Multivariate charts for detecting and diagnosing outliers. Vari-
able 3 is glucose feed rate and variable 6 is dissolved oxygen concentration.
[Figure: measured and estimated (reconciled) glucose feed rate (top) and dissolved oxygen concentration (bottom) profiles; the outliers at the sample numbers listed in Table 3.8 are replaced by model-based estimates.]
where n is the current sampling time and na and nb are the lengths of the
past sampling time windows for the y and x signals, respectively. This is the
3.5. Data Pretreatment: Signal Noise Reduction 101
[Figure panels: (b) filtered signal with a poor MA filter; (c) filtered signal with a good MA filter; (d) filtered signal with a poor ARMA filter; (e) filtered signal with a good ARMA filter.]
Figure 3.9. Biplot of the first two score vectors (t1 and t2, respectively) of
the MPCA model representing normal operation, with 95 and 99% control
limits.
is used to ensure that the energy of the scaled and translated signals is
the same as that of the mother wavelet. The scale parameter specifies the location
in the frequency domain and the translation parameter determines the location
in the time domain. This equation can be interpreted as the inner product of x(t)
with the scaled and translated versions of the basis function Ψ [116]:

W(a, b) = ⟨x(t), Ψa,b(t)⟩   (3.26)

Ψa,b(t) = (1/√|a|) Ψ((t − b)/a)   (3.27)
Scaled and translated versions of the basis functions are obtained from
the mother wavelet (Eq. 3.27). The discrete wavelet transform is used to
reduce the computational burden without losing significant information. To
obtain the discretized wavelet transform, the scale and translation parameters
are discretized as a = 2^j and b = 2^j k. Then, there exists a Ψ with good
time-frequency localization properties such that the discretized wavelets
Figure 3.10. CO2 concentration profile before and after MPCA based de-
noising.
constitute an orthonormal basis. For this reason, although there are other
choices for discretization, dyadic discretization is used frequently [116]. The
discretized wavelet function becomes

Ψj,k(t) = 2^{-j/2} Ψ(2^{-j} t − k),   with coefficients ⟨x, Ψj,k⟩,

where ⟨· , ·⟩ indicates the inner product operation. Mallat [362] developed
a fast pyramid algorithm for wavelet decomposition based on successive
filtering and dyadic downsampling. Figure 3.11 represents this process for
one scale. The input signal X is filtered by a low pass filter L(n) and a high
pass filter H(n) in parallel obtaining the projection of the original signal
onto wavelet function and scaling function. Dyadic downsampling is applied
to the filtered signal by taking every other coefficient of the filtered output.
The same procedure is repeated for the next scale to the downsampled
output of L(n) shown as Al, since the low pass output includes most of
the original signal content. By applying this algorithm successively, scaling
coefficients aj and wavelet coefficients dj at different scales j can be found
as

aj = (L ∗ aj−1)↓2 ,   dj = (H ∗ aj−1)↓2 .   (3.32)
Increasing the scale yields scaling coefficients that become increasingly
smoother versions of the original signal. The original signal can be com-
puted recursively by adding the wavelet coefficients at each scale and the
scaling coefficients at the last scale.
The Haar wavelet [116] is the simplest wavelet function that can be used
as a basis function to decompose the data into its scaling and wavelet
coefficients. It is defined as

w(t) = 1 for 0 ≤ t < 1/2,  −1 for 1/2 ≤ t < 1,  0 otherwise   (3.33)

and its graphical representation is shown in Figure 3.12. The scaling and
wavelet coefficients for the Haar wavelet are [1, 1] and [1, −1], respectively.
The Haar wavelet transform gives better results if the process data contain jump
discontinuities. Most batch process data by nature contain such disconti-
nuities, which makes the Haar wavelet a suitable basis function for decomposing
batch process data. A noisy process signal (CO2 evolution rate) was decom-
posed into four scales using the Haar wavelet in Figure 3.13. The low frequency
component (dominating nonlinear dynamics) of the original signal (upper-
most panel) is found in the scaling coefficients at the last scale, whereas the
high frequency components that are mostly comprised of noise appear in the
wavelet coefficients at different scales.
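The pyramid analysis and synthesis steps with the Haar filters can be sketched as follows (a dyadic-length signal is assumed; the 1/√2 normalization keeps the transform orthonormal):

```python
from math import sqrt

# Mallat's pyramid algorithm with the Haar wavelet: at each scale the signal
# splits into scaling (low-pass) and wavelet (high-pass) coefficients with
# dyadic downsampling; the original signal is recovered exactly.

def haar_step(x):
    """One scale of Haar analysis: returns (scaling, wavelet) coefficients."""
    a = [(x[2 * k] + x[2 * k + 1]) / sqrt(2) for k in range(len(x) // 2)]
    d = [(x[2 * k] - x[2 * k + 1]) / sqrt(2) for k in range(len(x) // 2)]
    return a, d

def haar_decompose(x, scales):
    """Pyramid decomposition: wavelet coeffs per scale + final scaling coeffs."""
    details = []
    a = list(x)
    for _ in range(scales):
        a, d = haar_step(a)
        details.append(d)
    return a, details

def haar_reconstruct(a, details):
    """Inverse transform: merge scaling and wavelet coefficients scale by scale."""
    for d in reversed(details):
        merged = []
        for ak, dk in zip(a, d):
            merged += [(ak + dk) / sqrt(2), (ak - dk) / sqrt(2)]
        a = merged
    return a

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]   # illustrative data
approx, details = haar_decompose(signal, 3)
rebuilt = haar_reconstruct(approx, details)
print("max reconstruction error:",
      max(abs(s - r) for s, r in zip(signal, rebuilt)))
```

Because the transform is orthonormal, the signal energy is split exactly between the scaling and wavelet coefficients, which is what makes coefficient thresholding (next) a well-behaved denoising operation.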
Wavelets are widely used to remove the noise from signals by extracting
the low frequency content and removing the high frequency content above
a threshold value. The denoised signal is obtained by reconstructing the
signal by applying inverse wavelet transform to the scaling and thresholded
wavelet coefficients. Thresholding, a crucial step of wavelet denoising, can
be applied as either soft or hard thresholding. Hard thresholding (Eq. 3.34)
removes the wavelet coefficients smaller than the threshold and replaces
them with zero:

δh(x) = x if |x| > λ,  0 otherwise   (3.34)

where δh(x) denotes the thresholded value of x. Soft thresholding shrinks the
wavelet coefficients which are greater than the threshold value towards zero
by subtracting the threshold value from the wavelet coefficients as well:

δs(x) = x − λ if x > λ,  0 if |x| ≤ λ,  x + λ if x < −λ   (3.35)
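The two thresholding rules of Eqs. 3.34 and 3.35 in code (the threshold and coefficient values are hypothetical):

```python
# Hard and soft thresholding of wavelet coefficients (Eqs. 3.34 and 3.35).

def hard_threshold(x, lam):
    """Keep coefficients whose magnitude exceeds the threshold; zero the rest."""
    return x if abs(x) > lam else 0.0

def soft_threshold(x, lam):
    """Zero small coefficients and shrink the survivors toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

coeffs = [0.9, -0.2, 0.05, -1.4, 0.3]   # hypothetical wavelet coefficients
lam = 0.25
print([hard_threshold(c, lam) for c in coeffs])
print([soft_threshold(c, lam) for c in coeffs])
```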
Different methods for selecting the threshold value have been suggested in
the literature by Donoho and co-workers [132]. These methods are grouped
[Figure 3.13. Four-scale Haar wavelet decomposition of the noisy signal: the original signal, the wavelet coefficients at scales 1-4, and the scaling coefficients at the last scale.]
The subscript X/S denotes that the cell yield (X) is based on substrate
(S). This notation is especially important when there is more than
one substrate which significantly influences cell mass yield. This defi-
nition of yield can be extended to non-biomass products (P) with the
basis being substrate consumed or biomass produced:

Y_P/S = amount of product produced / amount of substrate consumed = ΔP/ΔS

or

Y_P/X = amount of product produced / amount of cell mass produced = ΔP/ΔX

The cell mass yield based on oxygen (Y_X/O) and the yield of ATP (adeno-
sine triphosphate, Y_ATP/X) can be obtained in an analogous manner.
Respiratory Quotient, RQ, is defined as the rate of carbon dioxide for-
mation divided by the rate of oxygen consumption in aerobic growth:

RQ = rate of CO2 formation / rate of O2 consumption
This ratio can be calculated from on-line measurements of feed and
exit CO2 and O2 using CO2 and O2 analyzers. If the nature of the
major extracellular product(s) is known (i.e., x, y, z of CHxOyNz),
then it is possible to calculate the parameters α, β, γ and δ in Eq.
3.37 from experimental measurement of RQ and one other measure-
ment. If no significant amount of extracellular product is formed, as
in some cell growth processes, then it is evident from Eqs. 3.38-3.41
(β = 0) that only one measurement such as RQ is needed to calculate
the stoichiometric coefficients.
Degree of Reductance of an organic compound is defined as the number
of electrons available for transfer to oxygen upon combustion of the
compound to CO2, N2 and H2O. It is also defined as the number
of equivalents of available electrons per g atom of the compound.
The numbers of equivalents for carbon, hydrogen, oxygen and nitrogen
are 4, 1, −2 and −3, respectively. In view of this, for different cell
compositions, the degree of reductance can be calculated. Examples of
degree of reductance values for a wide range of compounds can
be found in [514].
energy while the latter consumes energy. However, some energy is always
lost as heat. For this reason, in large-scale processes, it is necessary to
remove this heat so that the culture is maintained at its optimum temper-
ature. When there is negligible amount of extracellular product formation
under aerobic conditions, the growth reaction (Eq. 3.36) may be rewritten
as,
The total heat evolved (ΔQ) during growth can be calculated from an
enthalpy balance

(3.50)

Typical values of Y_kcal range between 0.096 and 0.126 g/kcal for many
microorganisms.
When a significant amount of product is present, based on the stoichio-
metric description of cell growth (Eq. 3.36), the total heat evolved (Eq. 3.48)
should be modified to

(3.51)

(3.52)
Example
In this example, stoichiometric balances and calculation of yield coefficients
will be illustrated for growth of Penicillium chrysogenum and penicillin
production. For growth, a simple stoichiometric model can be used that
is based on the theoretical analysis of biosynthesis and polymerization by
Nielsen [424] and is given by:
Both penicillin G and α-AAA would accumulate. From the above stoi-
chiometry (Eqs. 3.54 and 3.55), the theoretical yield of penicillin on either
glucose, ammonia, or sulfate can be calculated based on the definition
of the yield coefficient for the two cases (in which α-AAA is either recycled
or discarded) [112]. Theoretical yield coefficients are presented in Table
3.10 [112], where the stoichiometry of Eq. 3.54 is used in case 1 and the
stoichiometry of Eq. 3.55 is used in case 2.
120 Chapter 4. Linear Data-Based Model Development
should not be used for extrapolation. There are numerous well established
techniques for linear input-output model development. Nonlinear input-
output model development techniques have been proposed during the last
four decades, but they have not been widely accepted. There are more than
twenty different paradigms, and depending on the type of nonlinearities in
the data, some paradigms work better than others for describing a specific
process. The design of experiments to collect data and the amount of data
available have an impact on the accuracy and predictive capability of the
model developed. Data collection experiments should be designed such that
all key features of the process are excited in the frequency ranges of inter-
est. Since the model may have terms that are composed of combinations
of inputs and/or outputs, exciting and capturing the interactions among
variables is crucial. Hence, the use of routine operational data for model
development, without any consideration of exciting the key features of the
model, may yield good fits to the data, but provide models that have poor
predictive ability. The amount of data needed for model development increases
in the order of first-principles models, linear input-output models, and
nonlinear input-output models.
Biochemical processes have become increasingly instrumented in recent
years. More variables are being measured and data are being recorded more
frequently [304, 655]. This creates a data overload, and most of the use-
ful information gets hidden in large data sets. There is a large amount of
correlated or redundant information in these process measurements. This
information must be compressed in a manner that retains the essential in-
formation about the process, extracts process knowledge from measurement
information, and presents it in a form that is easy to display and interpret.
A number of methods from multivariate statistics, systems theory and ar-
tificial intelligence for data based model development are presented in this
chapter.
Model development may have various goals. These goals warrant consid-
eration of the following cases. One case is the interpretation and modeling
of one block of data such as measurements of process variables. Princi-
pal components analysis (PCA) may be useful for this to retain essential
process information while reducing the size of the data set. A second case
is the development of a relationship between two groups of data such as
process variables and product variables, the regression problem. PCA re-
gression or partial least squares (PLS) regression techniques would be good
candidates for addressing this problem. Discrimination and classification
are activities related to process monitoring that lead to fault diagnosis.
PCA and PLS based techniques as well as artificial neural networks (ANN)
and knowledge-based systems may be considered for such problems. Since
all these techniques are based on process data, the reliability of data is
4.1. Principal Components Analysis 121
the data collected from various periods of plant operation when the per-
formance is good. The PCA model development is based on this data set.
This model can be used to detect outliers in data, data reconciliation, and
deviations from NO that indicate excessive variation from normal target
or unusual patterns of variation. Operation under various known upsets
can also be modelled if sufficient historical data are available to develop
automated diagnosis of source causes of abnormal process behavior [488].
Principal Components (PC) are a new set of coordinate axes that are
orthogonal to each other. The first PC indicates the direction of largest
variation in data, the second PC indicates the largest variation unexplained
by the first PC in a direction orthogonal to the first PC (Fig. 4.1). The
number of PCs is usually less than the number of measured variables.
T = XP ,   X = TP^T + E   (4.2)

S = PLP^T   (4.3)

X = UΣV^T   (4.4)
Figure 4.2. Data preprocessing: Scaling of the variables, (a) Raw data,
(b) After mean-centering only, (c) After variance-scaling only, (d) After
autoscaling (mean-centering and variance-scaling) [145, 181].
where tr(S) = tr(L). A more precise method that requires large computa-
tional time is cross-validation [309, 659]. Cross-validation is implemented
by excluding part of the data, performing PCA on the remaining data, and
computing the prediction error sum of squares (PRESS) using the data re-
tained (excluded from model development). The process is repeated until
4.2. Multivariable Regression Techniques 125
every observation is left out once. The order A is selected as the one that minimizes
the overall PRESS. Two additional criteria for choosing the optimal number
of PCs have also been proposed by Wold [659] and Krzanowski [309], related
to cross-validation. Wold [659] proposed checking the following ratio:

R = PRESS_A / RSS_{A-1}   (4.7)

where RSS_A is the residual sum of squares after the Ath principal component
based on the PCA model. When R exceeds unity upon addition of another
PC, it suggests that the Ath component did not improve the prediction
power of the model and it is better to use A − 1 components. Krzanowski
[309] suggested the ratio

W = [(PRESS_{A-1} − PRESS_A)/D_m] / [PRESS_A/D_A]   (4.8)

D_m = I + K − 2A ,   D_A = K(I − 1) − Σ_{i=1}^{A} (I + K − 2i)

where D_m and D_A denote the degrees of freedom required to fit the Ath
component and the degrees of freedom remaining after fitting the Ath
component, respectively. If W exceeds unity, then this criterion suggests that the Ath
component could be included in the model [435].
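Given a table of cross-validation results, Wold's criterion of Eq. 4.7 is a one-line computation (the PRESS and RSS values below are hypothetical):

```python
# Choosing the number of PCs with Wold's criterion (Eq. 4.7).
# PRESS_A and RSS_A values are hypothetical cross-validation results.
press = {1: 120.0, 2: 80.0, 3: 75.0, 4: 74.5}
rss = {0: 150.0, 1: 95.0, 2: 70.0, 3: 66.0}

def wold_R(A):
    """Wold's ratio R = PRESS_A / RSS_(A-1); stop adding PCs once R > 1."""
    return press[A] / rss[A - 1]

for A in (1, 2, 3, 4):
    print(A, wold_R(A))
```

For these numbers R first exceeds unity at A = 3, suggesting a model with A = 2 components.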
• The correlation between any two predictors exceeds 0.95 (only colin-
earity between two predictors can be assessed).
• Stepwise regression
• Ridge regression
Computation of F-statistics:
Regression sum of squares: SSR = Σ(ŷi − ȳ)², with p degrees of freedom
(d.f.); Error sum of squares: SSE = Σ(yi − ŷi)², with d.f. = m − p − 1.
Denote a model of order r by M2 and a model of order r + 1 by M1, and
their error sums of squares by SSE2 and SSE1, respectively. Then

F = [(SSE2 − SSE1)/((r + 1) − r)] / [SSE1/(m − r − 2)] .   (4.14)
Y = TB + E (4.17)
where the optimum matrix of regression coefficients B is obtained as
B = (TTT)-1TTY . (4.18)
Substitution of Eq. 4.18 into Eq. 4.17 leads to trivial E's. The inversion
of TTT should not cause any problems due to the mutual orthogonality of
the scores. Score vectors corresponding to small eigenvalues can be left out
in order to avoid colinearity problems. Since principal components regres-
sion is a two-step method, there is a risk that useful predictive information
would be discarded with a principal component that is excluded. Hence
caution must be exercised while leaving out vectors corresponding to small
eigenvalues.
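Equations 4.17-4.18 can be sketched as follows (illustrative NumPy code, not the book's implementation; the data are assumed mean-centered and the retained scores are obtained from the SVD of X):

```python
import numpy as np

def pcr(X, Y, n_pc):
    """Principal components regression sketch (Eqs. 4.17-4.18).
    X and Y are assumed mean-centered; returns coefficients mapping X to Y."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:n_pc].T                      # retained loadings
    T = X @ P                            # scores
    # B = (T^T T)^{-1} T^T Y; T^T T is diagonal because scores are orthogonal
    B = np.linalg.solve(T.T @ T, T.T @ Y)
    return P @ B                         # coefficients in the original variables

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
beta = np.array([[1.0], [0.5], [-2.0], [0.0], [0.0]])
Y = X @ beta + 0.01 * rng.standard_normal((50, 1))
X -= X.mean(0); Y -= Y.mean(0)
coef = pcr(X, Y, n_pc=5)                 # full order recovers ordinary least squares
```

Dropping score vectors with small eigenvalues (n_pc < 5) trades a small bias for much better conditioning, which is exactly the caution noted above.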
In the Y data:
q_1 = Y^T t_1 / (t_1^T t_1) ,    u_1 = Y q_1 / (q_1^T q_1)   (4.22)
130 Chapter 4. Linear Data-Based Model Development
Figure 4.3. The matrix relationships in PLS [145]. T and U are the PLS
score matrices of the X and Y blocks, respectively; P contains the X loadings; W and
Q represent the weight matrices for each block; E and F are the residual matrices
formed by the variation in the data that was left out of the modeling.
b_1 = u_1^T t_1 / (t_1^T t_1)   (4.24)
Once the scores and loadings have been calculated for the first latent vari-
able, X- and Y-block residuals are computed as
E_1 = X - t_1 p_1^T   (4.25)
F_1 = Y - b_1 t_1 q_1^T   (4.26)
The entire procedure is now repeated for the next latent variable, start-
ing with Eq. 4.21. X and Y are replaced with the residuals E_1 and F_1,
respectively, and all subscripts are incremented by 1. Hence, the variabil-
ity explained by the earlier latent variables is filtered out from X and Y
by replacing them in the next iteration with their residuals, which contain
only the unexplained variation.
4.3. Input-Output Modeling of Dynamic Processes 131
Several enhancements have been made to the PLS algorithm [118, 198,
343, 363, 664, 660, 668] and software is available for developing PLS models
[472, 548].
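The one-latent-variable-at-a-time extraction and deflation described above can be sketched with a minimal NIPALS-style loop (an illustration only; normalization conventions and convergence tests vary between published implementations):

```python
import numpy as np

def nipals_pls(X, Y, n_lv, tol=1e-10, max_iter=500):
    """NIPALS PLS sketch: extract latent variables one at a time and
    deflate the X and Y blocks with their residuals (Eqs. 4.25-4.26 style)."""
    T, P, W, Q = [], [], [], []
    E, F = X.copy(), Y.copy()
    for _ in range(n_lv):
        u = F[:, [0]]                            # start from a Y column
        for _ in range(max_iter):
            w = E.T @ u / (u.T @ u); w /= np.linalg.norm(w)
            t = E @ w                            # X-block scores
            q = F.T @ t / (t.T @ t)              # Y-block loadings
            u_new = F @ q / (q.T @ q)            # Y-block scores
            if np.linalg.norm(u_new - u) < tol:
                u = u_new; break
            u = u_new
        p = E.T @ t / (t.T @ t)                  # X loadings
        b = (u.T @ t / (t.T @ t)).item()         # inner-relation coefficient
        E = E - t @ p.T                          # deflate X (Eq. 4.25 style)
        F = F - b * t @ q.T                      # deflate Y (Eq. 4.26 style)
        T.append(t); P.append(p); W.append(w); Q.append(b * q)
    return np.hstack(T), np.hstack(P), np.hstack(W), np.hstack(Q)

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 6)); X -= X.mean(0)
Y = X[:, :2] @ np.array([[1.0], [-1.0]]) + 0.05 * rng.standard_normal((40, 1))
Y -= Y.mean(0)
T, P, W, Q = nipals_pls(X, Y, n_lv=3)
Yhat = T @ Q.T                                   # prediction from the scores
```

Because each round works on the residual matrices, the variability captured by earlier latent variables cannot be reused by later ones.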
Disturbances d(t), residuals e(t) = y(t) - ŷ(t), and random noise attributed
to inputs, outputs, and state variables are also represented by column vectors
of appropriate dimensions in a similar manner.
because they describe the relationship of the present value of the output
to external variables but do not provide any knowledge about the physical
description of the processes they represent.
A general linear discrete time model for a single variable y(t) can be
written as
y(t) = η(t) + w(t)   (4.28)
where w(t) is a disturbance term such as measurement noise and η(t) is the
noise-free output
η(t) = G(q, θ) u(t)   (4.29)
with the rational function G(q, θ) and input u(t). The function G(q, θ)
relates the inputs to the noise-free outputs, whose values are not known because
the measurements of the outputs are corrupted by disturbances such as
measurement noise. The parameters of G(q, θ) (such as b_i in Eq. 4.30) are
collected in the vector θ, and q is called the shift operator (Eq. 4.31).
Assume that relevant information for the current value of the output y(t) is
provided by past values of y(t) for n_y previous time instances and past
values of u(t) for n_u previous instances. The relationship between these
variables is
y(t) + f_1 y(t-1) + ⋯ + f_{n_y} y(t - n_y) = b_1 u(t-1) + ⋯ + b_{n_u} u(t - n_u)   (4.31)
where
η(t) = G(q, θ) u(t)   with   G(q, θ) = B(q)/F(q) .   (4.34)
Often the inputs may have a delayed effect on the output. If there is a
delay of nk sampling times, Eq. (4.30) is modified as
η(t) + f_1 η(t-1) + ⋯ + f_{n_y} η(t - n_y)   (4.35)
   = b_1 u(t - n_k) + b_2 u(t - (n_k + 1)) + ⋯ + b_{n_u} u(t - (n_u + n_k - 1)) .
The disturbance term can be expressed in the same way
w(t) = H(q, θ) e(t)   (4.36)
where e(t) is white noise and
H(q, θ) = C(q)/D(q) = (1 + c_1 q^{-1} + ⋯ + c_{n_c} q^{-n_c}) / (1 + d_1 q^{-1} + ⋯ + d_{n_d} q^{-n_d}) .   (4.37)
The model (Eq. 4.28) can be written as
y(t) = G(q, θ) u(t) + H(q, θ) e(t)   (4.38)
where the parameter vector θ contains the coefficients b_i, c_i, d_i and f_i of the
transfer functions G(q, θ) and H(q, θ). The model structure is described
by five parameters: n_y, n_u, n_k, n_c, and n_d. Since the model is based
on polynomials, its structure is finalized when the parameter values are
selected. These parameters and the coefficients are determined by fitting
candidate models to data and minimizing some criteria based on reduction
of prediction error and parsimony of the model.
The model represented by Eq. (4.38) is known as the Box-Jenkins
(BJ) model, named after the statisticians who have proposed it [79]. It
has several special cases:
• Output error (OE) model. When the properties of disturbances
are not modeled and the noise model H(q) is chosen to be identity
(n_c = 0 and n_d = 0), the noise source w(t) is equal to e(t), the
difference (error) between the actual output and the noise-free output.
• AutoRegressive Moving Average model with eXogenous in-
puts (ARMAX). If the same denominator is used for G and H,
A(q) = F(q) = D(q) = 1 + a_1 q^{-1} + ⋯ + a_{n_a} q^{-n_a} .   (4.39)
Hence Eq. (4.38) becomes
A(q) y(t) = B(q) u(t) + C(q) e(t)   (4.40)
where A(q)y(t) is the autoregressive term (regressing on previous values
of the same variable y(t)), C(q)e(t) is the moving average of the
white noise e(t), and B(q)u(t) represents the contribution of external
inputs. Use of a common denominator is reasonable if the dominating
disturbances enter the process together with the inputs.
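The ARMAX structure of Eq. (4.40) can be illustrated by direct simulation of the difference equation. The sketch below is illustrative only; the coefficient values are hypothetical, not taken from the book:

```python
import numpy as np

def simulate_armax(a, b, c, u, e):
    """Simulate A(q)y(t) = B(q)u(t) + C(q)e(t)  (Eq. 4.40 form).
    a, b, c hold the coefficients a_i, b_i, c_i of A(q), B(q), C(q);
    B(q) acts on u(t-1), u(t-2), ... (one-sample input delay)."""
    n = len(u)
    y = np.zeros(n)
    for t in range(n):
        ar = sum(-a[i] * y[t - 1 - i] for i in range(len(a)) if t - 1 - i >= 0)
        ex = sum(b[i] * u[t - 1 - i] for i in range(len(b)) if t - 1 - i >= 0)
        ma = e[t] + sum(c[i] * e[t - 1 - i] for i in range(len(c)) if t - 1 - i >= 0)
        y[t] = ar + ex + ma                     # autoregressive + input + MA noise
    return y

rng = np.random.default_rng(0)
u = np.sign(rng.standard_normal(200))           # PRBS-like excitation input
e = 0.1 * rng.standard_normal(200)              # white noise e(t)
y = simulate_armax(a=[-0.7], b=[0.5], c=[0.3], u=u, e=e)
```

With a = [-0.7], the model reads y(t) = 0.7 y(t-1) + 0.5 u(t-1) + e(t) + 0.3 e(t-1), a stable first-order ARMAX process.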
θ̂ = arg min_θ (1/N) Σ_{t=1}^{N} [y(t) − ŷ(t|θ)]²   (4.41)
where "arg min" denotes the minimizing argument. This criterion has to be
extended to prevent overfit of data. A larger model with many parameters
may fit data used for model development very well, but it may give large
prediction errors when new data are used. Several criteria have been pro-
posed to balance model fit and model complexity. Two of them are given
here to illustrate how they balance accuracy and parsimony:
• Akaike's Information Criterion (AIC)
[Figure: the iterative system identification procedure begins with designing the experiment and collecting data.]
second part of this section. They order state variables according to the
magnitude of their contributions in explaining the variation in data. State-
space models also provide the structure for developing state estimators
where one can estimate corrected values of state variables, given process
input and output variables and estimated values of process outputs. State
estimators are discussed in the last part of this section.
The notation in Eq. (4.50) can be simplified by using x_k or x(k) to denote
x(t_k). The dimensions of the system matrices are
A: n × n ,   B: n × m ,   C: p × n ,   D: p × m .
These models are called linear time-invariant models. Mild nonlinear-
ities in the process can often be described better by making the matrices
in the model equations (4.49) or (4.50) time dependent. This is indicated by
symbols such as A(t) or F_k.
Disturbances
Disturbances are inputs to a process. Some disturbances can be measured,
others arise and their presence is only recognized because of their influence
on process and/or output variables. The state-space model needs to be
x(t) = f(x(t),u(t),w(t))
y(t) = h(x(t),u(t),w(t)) (4.53)
where w(t) denotes disturbances. It is necessary to describe w(t) in order
to compute how the state variables and outputs behave. If the disturbances
are known and measured, their description can be appended to the model.
For example, the linear state-space model can be written as
ẋ(t) = A x(t) + B u(t) + E_1 w_1(t)
y(t) = C x(t) + D u(t) + E_2 w_2(t)   (4.54)
where w_1(t) and w_2(t) are disturbances affecting the state variables and
outputs, respectively, and E_1 and E_2 are the corresponding coefficient ma-
trices. This model structure can also be used to incorporate modeling
uncertainties (represented by w_1(t)) and measurement noise (represented
by w_2(t)).
Another alternative is to develop a model for unknown disturbances to
describe w(t) as the output from a dynamic system with a known input
uw(t) that has a simple functional form.
where the subscript w indicates state variables, inputs and functions of the
disturbance(s). Typical choices for input forms may be an impulse, white
noise, or infrequent random step changes. Use of fixed impulse and step
changes leads to deterministic models, while white noise or random impulse
and step changes yield stochastic models [347]. The disturbance model is
appended to the state and output model to build an augmented dynamic
model with known inputs.
u_ss:
f(x_ss, u_ss) = 0 .   (4.56)
If f(x, u) has continuous partial derivatives in the neighborhood of the
stationary solution x = x_ss, u = u_ss, then for ℓ = 1, ⋯, n:
f_ℓ(x, u) = f_ℓ(x_ss, u_ss) + (∂f_ℓ/∂x_1)(x_ss, u_ss)(x_1 − x_ss,1) + ⋯   (4.57)
The Jacobian matrices of the resulting linear model are

    [ ∂f_1/∂x_1  ⋯  ∂f_1/∂x_n ]        [ ∂f_1/∂u_1  ⋯  ∂f_1/∂u_m ]
A = [     ⋮      ⋱      ⋮     ] ,  B = [     ⋮      ⋱      ⋮     ]   (4.58)
    [ ∂f_n/∂x_1  ⋯  ∂f_n/∂x_n ]        [ ∂f_n/∂u_1  ⋯  ∂f_n/∂u_m ]

with all derivatives evaluated at (x_ss, u_ss).
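The Jacobians of Eq. (4.58) can be approximated numerically by forward finite differences. The sketch below is illustrative (the bilinear kinetic model and its rate constant are hypothetical, and the chosen point is not required to be a true steady state for the Jacobian computation itself):

```python
import numpy as np

def linearize(f, x_ss, u_ss, eps=1e-6):
    """Finite-difference Jacobians A = df/dx and B = df/du at (x_ss, u_ss)
    (Eqs. 4.57-4.58). f maps (x, u) -> dx/dt."""
    n, m = len(x_ss), len(u_ss)
    f0 = np.asarray(f(x_ss, u_ss))
    A = np.zeros((n, n)); B = np.zeros((n, m))
    for j in range(n):                       # perturb each state variable
        dx = np.zeros(n); dx[j] = eps
        A[:, j] = (np.asarray(f(x_ss + dx, u_ss)) - f0) / eps
    for j in range(m):                       # perturb each input
        du = np.zeros(m); du[j] = eps
        B[:, j] = (np.asarray(f(x_ss, u_ss + du)) - f0) / eps
    return A, B

# Hypothetical bilinear kinetics: x1' = -k*x1*u1, x2' = k*x1*u1 - x2
def f(x, u, k=2.0):
    return np.array([-k * x[0] * u[0], k * x[0] * u[0] - x[1]])

A, B = linearize(f, x_ss=np.array([1.0, 2.0]), u_ss=np.array([1.0]))
```

At this point the analytic Jacobians are A = [[-2, 0], [2, -1]] and B = [[-2], [2]], which the finite-difference estimate reproduces.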
gular values (SV) of a covariance matrix (the ratio of the specific SV to the
sum of all the SVs [21]) generated by singular value decomposition (SVD), or
an information theoretic approach such as the Akaike Information Criterion
(AIC) [315].
The Hankel matrix (Eq. 4.65) is used to develop subspace models. It
expresses the covariance between future and past stacked vectors of output
measurements. If the stacked vectors of future (3^) and past (y^K} data
are given as
Yfc
Yfc+i yfc-2
and »** = (4.64)
Yk-K
the Hankel matrix (note that H#j is different than the H matrix in Eq.
(4.63)) is
A2
A3
(4.65)
Aj AJ+/C-I
where A^ is the autocovariance of y/t's which are i time period apart and
E[·] denotes the expected value of a stochastic variable. K and J are the past
and future window lengths. The non-zero singular values of the Hankel
matrix determine the order of the system, i.e., the dimension of the state
vector. The non-zero and dominant singular values of H_JK are
chosen by inspection of the singular values or metrics such as the AIC.
CV (canonical variate) realization requires that covariances of future
and past stacked observations be conditioned against any singularities by
taking their square roots. The Hankel matrix is scaled by using R_K and
R_J defined in Eq. (4.67). The scaled Hankel matrix H̄_JK and its singular
value decomposition are given as
H̄_JK = R_J^{-1/2} H_JK R_K^{-1/2} = U Σ V^T   (4.66)
where
R_J = E[ y_J^f (y_J^f)^T ] ,   R_K = E[ y_K^p (y_K^p)^T ] .   (4.67)
U_{Jp×n} contains the n left singular vectors of H̄_JK, Σ_{n×n} contains the singular
values (SV), and V_{Kp×n} contains the n right singular vectors of the decompo-
sition. The subscripts associated with U, Σ and V denote the dimensions
of these matrices. The SVD matrices in Eq. 4.66 include only the SVs and
eigenvectors corresponding to the n state variables retained in the model.
The full SV matrix Σ has dimension Jp × Kp and it contains the SVs in a
descending order. If the process noise is small, all SVs smaller than the nth
SV are effectively zero and the corresponding state variables are excluded
from the model.
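As a numerical illustration (not from the book), the following sketch builds the Hankel matrix of output autocovariances for a scalar second-order autoregressive process and inspects its singular values; the AR coefficients and window lengths are hypothetical:

```python
import numpy as np

def hankel_order(y, J=5, K=5):
    """Singular values of the Hankel matrix of output autocovariances
    (Eq. 4.65 style) for a scalar output series y."""
    N = len(y)
    y = y - y.mean()
    # Sample autocovariances Λ_0 ... Λ_{J+K-1}
    lam = np.array([y[i:] @ y[:N - i] / N for i in range(J + K)])
    # H[i, j] = Λ_{i+j+1}, i.e., the block starts at Λ_1
    H = np.array([[lam[i + j + 1] for j in range(K)] for i in range(J)])
    return np.linalg.svd(H, compute_uv=False)

# Second-order AR process: two dominant singular values expected
rng = np.random.default_rng(3)
e = rng.standard_normal(20000)
y = np.zeros(20000)
for t in range(2, 20000):
    y[t] = 1.2 * y[t - 1] - 0.5 * y[t - 2] + e[t]
sv = hankel_order(y[1000:])                 # discard the initial transient
n = int(np.sum(sv > 0.1 * sv[0]))           # crude order estimate by inspection
```

In theory the autocovariance Hankel matrix of this process has rank 2, so two singular values dominate and the rest reflect only sampling error.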
The state variables evolve according to
x_{k+1} = F x_k + G u_k + H_1 v_k
y_k = C x_k + D u_k + H_2 v_k .   (4.69)
[Figure 4.6: the time of the estimate relative to the span of available data in (a) filtering, (b) smoothing, and (c) prediction.]
In smoothing, the time of the estimate falls within the span of measurement
data available. The state of the process at some prior time is estimated
based on all measurements collected up to the current time. In prediction,
the time of the estimate occurs after the last available measurement. The
state of the process in some future time is estimated. The discussion in this
section focuses on filtering (Fig. 4.6), and in particular on Kalman filtering
technique.
An estimate x̂ of a state variable x is computed using the measured
outputs y. An unbiased estimate x̂ has the same expected value as that of
the variable being estimated.
[Figure: block diagram of the Kalman filter. Process error sources drive the process state x(t); measurement error sources corrupt the observation y(t); the Kalman filter combines the measurements with a priori information to produce the state estimate x̂(t).]
x_k = F_{k-1} x_{k-1} + w_{k-1}   (4.70)
where x_k is an abbreviation for x(t_k), and the subscript of F_{k-1} indicates
that it is time dependent (F(t_{k-1})). Note that the time index is shifted
back by 1 with respect to the discrete time state-space model description
in Eq. (4.50) to emphasize the filtering problem. w_k is a zero-mean, white
(Gaussian) sequence with covariance Q_k, and the system is not subjected
to external inputs (unforced system) (G(t_k) = 0). The measured output
equation is
equation is
y_k = C_k x_k + v_k   (4.71)
where v_k is a vector of random noise with zero mean and covariance R_k
corrupting the output measurements y_k. Given the prior estimate of x_k,
denoted by x̂_k^-, a recursive estimator is sought to compute an updated esti-
mate x̂_k^+ based on the measurements y_k. The recursive estimator uses only the
most recent values of measurements and prior estimates, avoiding the need
for a growing storage of past values. The updated estimate is a weighted
sum of x̂_k^- and y_k:
x̂_k^+ = K_k' x̂_k^- + K_k y_k   (4.72)
where K_k' and K_k are (yet) unspecified time-varying weighting matrices.
Expressing the estimates as the sum of the unknown true values and the
estimation errors denoted by x̃_k, and inserting the equation for x̂_k^- and
Eq. (4.71) in Eq. (4.72), the estimation error x̃_k^+ can be computed.
The corresponding estimation error is derived from Eqs. (4.71), (4.73) and
(4.76) as
x̃_k^+ = (I − K_k C_k) x̃_k^- + K_k v_k .   (4.77)
The error covariance matrix P_k changes when new measurement informa-
tion is used.
where P_k^- and P_k^+ are the prior and updated error covariance matrices,
respectively [182].
From Eq. (4.76), the updated estimate is equal to the prior estimate
corrected by the error in predicting the last measurement, and the magni-
tude of the correction is determined by the "gain" K_k. If the criterion for
choosing K_k is to minimize a weighted scalar sum of the diagonal elements
of the error covariance matrix P_k^+, the cost function J_k could be
Substituting Eq. (4.80) in Eq. (4.78) provides a simpler expression for P_k^+
[182]:
P_k^+ = (I − K_k C_k) P_k^- .   (4.81)
The equations derived so far describe the state estimate and error co-
variance matrix behavior across a measurement. The extrapolation of these
entities between measurements is
x̂_k^- = F_{k-1} x̂_{k-1}^+   (4.82)
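The measurement-update and extrapolation equations can be sketched together for a scalar random walk observed in noise (an illustration only; the noise covariances and the number of steps are hypothetical tuning choices):

```python
import numpy as np

def kalman_step(x_prior, P_prior, y, F, C, Q, R):
    """One discrete Kalman filter cycle: measurement update (Eqs. 4.76,
    4.80, 4.81) followed by time propagation (Eqs. 4.82-4.83), unforced."""
    # Gain and measurement update
    S = C @ P_prior @ C.T + R
    K = P_prior @ C.T @ np.linalg.inv(S)
    x_post = x_prior + K @ (y - C @ x_prior)
    P_post = (np.eye(len(x_prior)) - K @ C) @ P_prior
    # Propagation to the next sampling instant
    x_next = F @ x_post
    P_next = F @ P_post @ F.T + Q
    return x_post, P_post, x_next, P_next

# Scalar random walk observed in unit-variance measurement noise
F = np.array([[1.0]]); C = np.array([[1.0]])
Q = np.array([[0.01]]); R = np.array([[1.0]])
rng = np.random.default_rng(0)
x_true, x_hat, P = 0.0, np.array([0.0]), np.array([[10.0]])
errs = []
for _ in range(300):
    x_true += 0.1 * rng.standard_normal()        # true state drifts
    y = np.array([x_true]) + rng.standard_normal(1)
    x_post, P_post, x_hat, P = kalman_step(x_hat, P, y, F, C, Q, R)
    errs.append((x_post[0] - x_true) ** 2)
```

The filtered mean-square error settles well below the raw measurement noise variance of 1, illustrating the weighting between the prior estimate and the new measurement.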
y = C(t) x + v   (4.85)
Ṗ(t) = A(t) P(t) + P(t) A^T(t) + E(t) Q(t) E^T(t)
        − P(t) C^T(t) R^{-1}(t) C(t) P(t)   (4.86)
Description            Equation
Process model          ẋ = A x + E w ,   w ~ N(0, Q)
Output model           y = C x + v ,   v ~ N(0, R)
Initial conditions     E[x(0)] = x_0 ,   E[(x(0) − x_0)(x(0) − x_0)^T] = P_0 ,   R^{-1} exists
State estimate         dx̂/dt = A x̂ + K (y − C x̂) ,   x̂(0) = x_0
Error covariance       Ṗ = A P + P A^T + E Q E^T − P C^T R^{-1} C P ,   P(0) = P_0
Kalman gain matrix     K = P C^T R^{-1}   when E[w(t) v^T(τ)] = 0
                       K = (P C^T + E Z) R^{-1}   when E[w(t) v^T(τ)] = Z δ(t − τ)
the arguments (t) from the matrices in the equations of Table 4.2 for com-
pactness. Time dependency of system matrices A, C, and E will indicate
which matrices in other equations are time dependent.
If the process is excited by deterministic inputs u (either a deterministic
disturbance or a control signal), the procedures for computing P and K
remain the same, but the estimators are modified. For the discrete time
process, state equation Eq. (4.70) becomes
f(x, t) = f(x̂, t) + (∂f/∂x)|_{x = x̂} (x − x̂) + ⋯   (4.96)
By using the first two terms of the expansion in Eq. (4.96) an approximate
differential equation for the estimation error covariance matrix is obtained:
F(x̂(t), t) = (∂f(x(t), t) / ∂x(t)) |_{x(t) = x̂(t)}   (4.100)
To complete the filtering algorithm, update equations that account for new
measurement information must be developed. Assume that the estimate of
x(t) and its associated covariance matrix are propagated using Eqs. (4.98)
and (4.99), and denote the solutions at time t_k by x̂_k^- and P_k^-. When a new
measurement y_k is received, the updated estimates are
x̂_k^+ = x̂_k^- + K_k (y_k − h_k(x̂_k^-))   (4.101)
with
P_k^+ = (I − K_k H_k(x̂_k^-)) P_k^- .   (4.102)
The same approach as in the linear case is used to determine the optimal filter
gain matrix:
K_k = P_k^- H_k^T(x̂_k^-) [ H_k(x̂_k^-) P_k^- H_k^T(x̂_k^-) + R_k ]^{-1}   (4.103)
where
H_k(x̂_k^-) = (∂h_k(x(t_k)) / ∂x(t_k)) |_{x(t_k) = x̂_k^-} .   (4.104)
real shift in the level of the process variables caused by nonstationary dis-
turbances such as changes in the impurity level of the feedstock. Stochastic
nonstationary disturbances force the process to drift away from determinis-
tic model predictions. The presence of a disturbance state model in addition
to white noise variables w_k and v_k will provide the necessary information
for tracking the trajectories of state variables. A common practice for elim-
inating the offset caused by nonstationary disturbances (instead of using
the disturbance state model) is to increase the Kalman filter gain K_k ei-
ther directly or indirectly by augmenting the magnitude of the state noise
covariance matrix Q_{k-1} (refer to Eqs. (4.80) and (4.83)). This will reduce
the bias, but will also increase the sensitivity of the Kalman filter to mea-
surement noise, just like the effect of increasing the proportional gain of a
feedback controller with only proportional action. The addition of the non-
stationary disturbance model will have an effect similar to integral action
in feedback controllers to eliminate the offset.
Since most Kalman filter applications for processes with nonstationary
disturbances are for processes with external inputs u and involve processes
that are typically nonlinear, the incorporation of the nonstationary distur-
bance model will be illustrated using an EKF for processes with external
inputs. The nonlinear process is described by the augmented state equation,
with the nonstationary disturbance states appended to the process states
and the corresponding block-diagonal state noise covariance.
The measurements are represented by Eq. (4.95), which can be modified if
the inputs u directly affect the measurements. The Kalman filter, and the
recursive relations to compute the filter gain matrix K_k and the covari-
ance propagation matrices P_k^- and P_k^+, are given by Eqs. (4.101), (4.103),
(4.102), and (4.99) or (4.83), respectively.
A challenge in this approach is the selection of the covariance matrices Q_k,
R_k, and P_0. Process knowledge and simulation studies must be used to
find an acceptable set of these tuning parameters to prevent biased and
poor estimates of state variables. Knowledge of the initial state x_0 affects
the accuracy of estimates as well. If x_0 is initially unknown, the EKF can
be restarted from the beginning with each new measurement using the up-
dated estimate x̂_{0|k}. Convergence to the unknown x_0 is usually achieved
during the early stages of the estimation, but there is a substantial increase
in computational load. If feedback control is used during the batch run,
rapid convergence to initially unknown disturbance and parameter states
can be achieved using this reiterative Kalman filter. One approach to im-
plement the reiterative estimation of XQ is to combine a recursive nonlinear
parameter estimation procedure with the EKF [300].
ẋ = f(x, u, w)
y = h(x, u, v)   (4.112)
where x, u, y are the state, input (control), and output (measurement) vec-
tors, w and v are disturbance and measurement noise vectors, and f and
h are nonlinear vector functions. Assume that the batch run can be parti-
tioned into N operating regimes that can be represented sufficiently well by
local model structures
ẋ = f_i(x, u, w, θ_i)
y = h_i(x, u, v, θ_i)   (4.113)
parameterized with the vector θ_i. Each local model will be valid in its
particular operating regime. Denote by φ_i the operating point (described
by some x, u, y) representing a specific regime Φ_i. The whole batch run
(the full range of operation) is composed of N regimes: {Φ_1, ⋯, Φ_N} = Φ.
The selection of variables to characterize an operating regime will be process
dependent, containing a subset of state variables, inputs, and disturbances.
Assume the existence of a smooth model validity function ρ_i that has a value
close to 1 at operating points where model i of Eq. (4.113) is a good
description of the process, and close to 0 otherwise. Define an interpolation
function ω_i(φ), normalized so that Σ_{i=1}^{N} ω_i(φ) = 1. To guarantee a global
model, not all local model validity functions should vanish at any operating
point φ.
The modeling framework consists of three tasks [156]:
• Decompose the operating range of the process into a number
of operating regimes that completely cover the whole range of oper-
ation (complete batch run). This can be achieved based on process
knowledge or by using computerized decomposition tools on the basis
of an informative data sequence [156].
• Develop a local model structure using process knowledge and
data. Assign local model validity functions.
• Identify local model parameters. The unknown parameter sets
θ_1, ⋯, θ_N are identified. If the models are linear, many readily
available model identification methods and tools can be used.
Attention must be paid during data collection to generate data that
contain significant information for all operating regimes.
δẋ = A_i δx + B_i δu ,   δy = C_i δx + D_i δu   (4.116)
where the deviation variables (Eq. 4.60) are defined using x_ss = x_{0,i} and
y_ss = y_{0,i}, and the elements of the Jacobian matrices are derived by eval-
uating the derivatives of f and h at the corresponding operating point
(x_{0,i}, y_{0,i}) such that
A_i = (∂f/∂x) |_{(x_{0,i}, y_{0,i})} .   (4.117)
A linear time-varying global model is constructed by using the local
models to approximate the nonlinear dynamics of a batch run. The time-
varying model is obtained by interpolating between local models by using
model validity functions ρ_i(t), which are similar to the interpolation function
ω_i of Eq. 4.114. Model validity functions are estimates of the validity
of the various local models at different operating points, and for N local models
ρ(t) = [ρ_1(t), ρ_2(t), ⋯, ρ_N(t)] ,   Σ_{i=1}^{N} ρ_i(t) = 1 .   (4.118)
The state-space matrices of the global model are then parametrized in terms
of ρ(t) as an LPV model:
{A[ρ(t)], B[ρ(t)], C[ρ(t)], D[ρ(t)]} .   (4.119)
The LPV model dynamics can be constructed in terms of the model validity
functions as
ẋ = A[ρ(t)] x + B[ρ(t)] u
y = C[ρ(t)] x + D[ρ(t)] u   (4.120)
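The interpolation in Eqs. (4.118)-(4.120) amounts to a validity-weighted combination of local matrices. A minimal sketch (the two local models are hypothetical, illustrative numbers, not taken from the book):

```python
import numpy as np

def lpv_matrices(A_local, validity):
    """Interpolate local state-space matrices with model validity weights
    rho(t): A[rho] = sum_i rho_i * A_i, with sum_i rho_i = 1 enforced."""
    rho = np.asarray(validity, dtype=float)
    rho = rho / rho.sum()                      # normalize to sum to one
    A = sum(r * Ai for r, Ai in zip(rho, A_local))
    return A, rho

# Two hypothetical local models for early/late phases of a batch run
A1 = np.array([[-0.2, 0.0], [0.1, -0.5]])
A2 = np.array([[-1.0, 0.2], [0.0, -2.0]])
A_mid, rho = lpv_matrices([A1, A2], validity=[0.5, 0.5])
```

As the batch progresses and the validity weights shift from the first model toward the second, the interpolated matrices track the changing local dynamics.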
with noise covariance matrices Q_k and R_k for w and v, respectively. A
moving horizon estimator that updates the ρ_i's based on a past window of
data of length n_e can be developed. At the start of the batch, the number
of data samples is fewer than n_e and the measurement data history is given
by data of different length.
p(j | Y_k) = p(Y_k | j) p(j) / Σ_{i=1}^{N} p(Y_k | i) p(i)   (4.124)
p(j | Y(0)) = 1/N ,   k < n_e
p(j | Y(k − n_e)) = p̄_1 or (1 − p̄_1)/(N − 1) ,   k ≥ n_e   (4.129)
where p̄_1 > (1 − p̄_1)/(N − 1) and N is the number of local models. The rela-
tive magnitude of p̄_1 with respect to the other prior probabilities depends on the
expected magnitude of disturbances; for large disturbances p̄_1 is chosen closer
to the uniform value [312].
Local model dynamics are affected by disturbances entering the batch
process, and the PDFs of various local models may become identical. The
proximity of model outputs to the optimal profiles may be used to select
the best local model with a moving horizon Bayesian estimator (MHBE)
with time-varying tuning parameters [312]. The aim of the MHBE is to
assign greater credibility to a model when plant outputs are closer to the
outputs around which the model is identified. This reduces the covariance
of model residuals Ω_{i,k} for model i at time k. This approach is implemented
by reducing the noise covariance matrices Q_{i,k} and R_{i,k}, which may be used
as the tuning parameters for the respective Kalman filters. Relating these
covariances to deviations from the optimal output trajectories,
where y_{0,i} is the optimal output profile for local model i and a is a tuning
parameter, Q_{i,k} and R_{i,k} of the most likely local model are reduced. The
Euclidean norm in Eqs. 4.130 is defined as ‖x‖² = x^T x. Consequently,
the residual covariance matrix Ω_{i,k} is reduced as well and the probability
of model i is increased. The parameter a reflects the trust in the model
and at higher values promotes rapid transition between models [312]. Case
studies reported in [312] indicate that model predictive control of a batch
reactor using local models provided better control than model predictive
control with extended Kalman filters.
run as
u = [u^T(0)  u^T(1)  ⋯  u^T(N−1)]^T
y = [y^T(1)  y^T(2)  ⋯  y^T(N)]^T   (4.131)
d = [d^T(1)  d^T(2)  ⋯  d^T(N)]^T
where N is the batch run data length. Given the initial condition y_ini =
y(0), a nonlinear model relating the outputs to inputs and disturbances is
expressed as
where Δu_{k+1} = u_{k+1} − u_k, Δy_{ini,k+1} = y_{ini,k+1} − y_{ini,k}, w_k and v_k are zero-
mean, independently and identically distributed random noise sequences
with respect to k, and e_k is the noise-free (not measurable) part of the
error trajectory. Matrices G_u and G_{ini} are linear system approximations.
The same modeling approach can be applied to secondary outputs s (out-
puts that are not used in control systems) and quality variables q and the
resulting models can be combined. Define the error trajectory vector for
the controlled outputs, secondary outputs and quality variables as
B* . (4.134)
*-*»i
with w_m(t) = 1 for all t. The representation (Lx_k)²(t) is used to underline
the pointwise nature of time-variant data [495]. Defining the K × m re-
gressor matrix T(t) and the K-dimensional dependent variable vector Δ(t)
as
w_j(t) = Σ_i c_ji φ_i(t)   (4.143)
where the mL coefficients c = [c_ji]_{j=1,⋯,m; i=1,⋯,L} are stored as a column vector.
The estimates ĉ are the solution of Rc = −s resulting from the minimiza-
tion of the quadratic form
J(c) = c^T R c + 2 c^T s + const   (4.145)
with
s_j = Σ_{k=1}^{K} ∫ (D^m x_k)(t) φ_j(t) x_k(t) dt .   (4.146)
The integrals are evaluated numerically by using traditional tools such as
the trapezoidal rule.
An alternative computation of w can be made by attaching a penalty term
to Eq. (4.139):
PSSE(L) = Σ_{k=1}^{K} ∫ (Lx_k)²(t) dt + Σ_{j=0}^{m−1} σ_j ∫ w_j(t)² dt   (4.147)
4.4. Functional Data Analysis 161
[Figure: the original data compared with estimates from (1) Method 1, (2) Method 2, (3) Method 3, (4) PDA, and (5) PDA with local weighting, plotted against time in hours.]
X = Σ_{a=1}^{A} t_a ⊗ P_a + E   (4.149)
with dimensions X (I × J × K), t_a (I × 1), P_a (J × K), and E (I × J × K) .   (4.150)
Figure 4.10. Batch data representation and unfolding process. The rows
are batches and the columns are the variables, v_j, sampled at each time t_k
[433].
the final product qualities are not well predicted by process measurements
when the residuals in the Y space (SPE_Y, computed as the sum of the squared
Y-block residuals) are large. The W, P, and Q matrices of the MPLS model
bear all the structural information about how the process variables behaved
and how they are related to the final quality variables. Implementation of
this technique is discussed in Section 6.4.4.
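The batch-wise unfolding that underlies MPCA/MPLS can be sketched as follows (illustrative NumPy code, not the book's implementation; random numbers stand in for real batch trajectories, and the residual-based SPE is the per-batch sum of squared reconstruction errors):

```python
import numpy as np

def mpca(X3, n_pc):
    """Multiway PCA sketch: unfold the I x J x K batch array batch-wise
    into an I x (JK) matrix, autoscale each column, then apply PCA."""
    I, J, K = X3.shape
    X = X3.reshape(I, J * K)                   # batch-wise unfolding
    mu, sd = X.mean(0), X.std(0)
    sd[sd == 0] = 1.0                          # guard constant columns
    Xs = (X - mu) / sd                         # autoscaling
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    P = Vt[:n_pc].T                            # loadings
    T = Xs @ P                                 # one score vector per batch
    E = Xs - T @ P.T                           # residual matrix
    spe = np.sum(E ** 2, axis=1)               # SPE: one value per batch
    return T, P, spe

rng = np.random.default_rng(4)
X3 = rng.standard_normal((20, 4, 50))          # 20 batches, 4 vars, 50 times
T, P, spe = mpca(X3, n_pc=3)
```

Each row of T summarizes an entire batch, which is what makes score plots and SPE charts usable for batch-to-batch monitoring.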
Figure 4.12. CPCA and HPCA methods [640, 665]. The X data matrix is
divided into b blocks (X_1, X_2, ⋯, X_b), with block b having m_{xb} variables.
Wold et al. [665] used this framework for modeling process data (hundreds of
variables) from a catalytic cracker.
Another PLS algorithm called multiblock PLS (MBPLS) has been in-
troduced to deal with data blocks [629, 640, 666]. The algorithm could
handle many types of pathway relationships between the blocks. It is log-
ically specified from left to right. Left end blocks are defined as blocks
that predict only, while right end blocks are blocks that are predicted but
do not predict. Interior blocks both predict and are predicted. The main
difference between this method and HPLS is that in MBPLS each X block
is used in a PLS cycle with the Y block to calculate the block scores t_b, while
in HPLS t_b is calculated as in CPCA. The basic methodology is illustrated
in Figure 4.14 where there is a single Y block and two X blocks and the
algorithm is given as
Figure 4.13. HPLS method [640, 665]. The X data matrix is divided into b
blocks (X_1, X_2, ⋯, X_b), with block b having m_{xb} variables, while only one
Y block containing m_Y variables is present.
Another application of the same algorithm is also reported for the case
of wet granulation and tableting [638]. An improvement (with respect to
ordinary PLS method) in prediction of a number of pharmaceutical tablet
properties was reported.
When extra information is available (such as feed conditions, initial
conditions, raw material qualities, etc.), this information should be incor-
porated in the multiway MBPLS framework. A general block interaction
can be depicted for a typical batch process as shown in Figure 4.15. In this
typical multiblock multiway regression case (multiblock MPLS), the blocks
are the matrix containing the set of initial conditions used for each batch,
Z(I × N), the three-way array of measurements made on each variable at
each batch, X(I × J × K), and Y(I × M) containing the quality measurements
made on batches.
MBPLS technique [297] by dividing process data into two blocks based on
different polymerization phases and also incorporating a matrix of initial
conditions. An improvement in the interpretation of multivariate charts
4.5. Multivariate Statistical Paradigms 173
and fault detection sensitivity on individual phases are reported. The abil-
ity to relate the faults detected to initial conditions was another benefit of
multiblock modeling that included relations between initial conditions and
final product quality.
[Figure 4.15: block interactions for a typical batch process: the initial conditions block Z, the three-way process measurement array X, and the quality block Y, with batches as rows and variables as columns.]
and C are rotationally unique; i.e., there can be no multiple solutions for the
calculated set of loadings. In terms of data manipulations, the PARAFAC
model is a simplification of the Tucker model in two ways [555]:
1. The number of components in all three modes (I, J, K) is equal, and
2. There is no interaction between latent variables of different modes.
The use of MPCA and three-way techniques for SPM has been reported
in [641, 646].
Benefits
• Adaptive Behavior. ANNs have the ability to adapt, or learn, in
response to their environment through training. A neural network
can easily be retrained to deal with minor changes in the operational
and/or environmental conditions. Moreover, when it is operating in a
nonstationary environment, it can be designed to adjust its synaptic
weights in real time. This is an especially valuable asset in adaptive
pattern classification and adaptive control.
Limitations
• Long Training Times. When structurally complex ANNs or inappro-
priate optimization algorithms are used, training may take unreason-
ably long times.
• Necessity of Large Amount of Training Data. If the size of input-
output data is small, ANNs may not produce reliable results. ANNs
provide more accurate models and classifiers when large amounts of
historical data rich in variations are available.
u_k = Σ_{j=1}^{m} w_{kj} x_j   (4.160)
and
y_k = φ(v_k)   (4.161)
where v_k = u_k + b_k is the activation potential.
[Figure: nonlinear model of a neuron: the inputs are multiplied by synaptic weights (including the bias as a weight on a fixed input), summed in a linear combiner, and passed through an activation function to produce the output.]
where x_1, x_2, ⋯, x_m are the input signals; w_{k1}, w_{k2}, ⋯, w_{km} are
the synaptic weights of neuron k; u_k is the linear combiner output of the
input signals; b_k is the bias; v_k is the activation potential (or induced local
field); φ(·) is the activation function; and y_k is the output signal of the neu-
ron. The bias is an external parameter providing an affine transformation
to the output u_k of the linear combiner.
Several activation functions are available. The four basic types illus-
trated in Figure 4.17 are:
1. Threshold Function. Also known as the McCulloch-Pitts model [377].
2. Piecewise-linear Function.
3. Sigmoid Function.
φ(v) = 1 / (1 + e^{−av})   (4.164)
where a is the slope parameter.
4. Hyperbolic Tangent Function. This is a form of sigmoid function, but
it produces values in the range [−1, +1] instead of [0, 1]:
φ(v) = tanh(v) = (e^v − e^{−v}) / (e^v + e^{−v}) .   (4.165)
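The four activation functions and the neuron equations (4.160)-(4.161) can be sketched directly. The code below is illustrative; the piecewise-linear breakpoints follow the common ±1/2 convention, and the input/weight values are hypothetical:

```python
import numpy as np

def threshold(v):                 # McCulloch-Pitts unit: 1 if v >= 0, else 0
    return np.where(v >= 0.0, 1.0, 0.0)

def piecewise_linear(v):          # saturates at 0 and 1 outside |v| < 1/2
    return np.clip(v + 0.5, 0.0, 1.0)

def sigmoid(v, a=1.0):            # Eq. 4.164; a is the slope parameter
    return 1.0 / (1.0 + np.exp(-a * v))

def tanh_act(v):                  # Eq. 4.165; output in [-1, +1]
    return np.tanh(v)

def neuron(x, w, b, phi=sigmoid):
    """Single neuron: v_k = sum_j w_kj x_j + b_k, then y_k = phi(v_k)."""
    return phi(np.dot(w, x) + b)

# Hypothetical inputs and weights: v = 0.1 - 0.4 + 0.2 + 0.1 = 0
y = neuron(x=np.array([0.5, -1.0, 2.0]), w=np.array([0.2, 0.4, 0.1]), b=0.1)
```

Here the activation potential is exactly zero, so the sigmoid output is 0.5; swapping in `tanh_act` as `phi` would give 0 instead, illustrating the shifted output range.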
Processing units (neurons) are linked to each other to form a network as-
sociated with a learning algorithm. A neural network can be formed with
any kind of topology (architecture). In general, three kinds of network
topologies are used [226]:
Single-layer feedforward networks include an input layer of source nodes
that projects onto an output layer of neurons (computation nodes),
but not vice versa. They are also called feedforward or acyclic networks.
Since the computation takes place only on the output layer
nodes, the input layer does not count as a layer (Figure 4.18(a)).
180 Chapter 4. Linear Data-Based Model Development
Figure 4.17. Basic types of activation functions, including the sigmoid function and the hyperbolic tangent function.
A well-known recurrent topology is the Jordan network [306]. The activation values of the output units are
fed back into the input layer through a set of extra units called the
state units. Learning takes place in the connections between input and
hidden units as well as hidden and output units.
hidden units as well as hidden and output units. Recurrent networks
are useful for pattern sequencing (i.e., following the sequences of the
network activation over time). The presence of feedback loops has
a profound impact on the learning capability of the network and on
its performance [226]. Applications to chemical process modeling and
identification have been reported [97, 616, 679].
Before proceeding with training the network, an appropriate network architecture
should be declared. This can be done in either a static or a dynamic
manner. Many ad hoc techniques for static network structure selection
are based on pruning redundant nodes by testing a range of
network sizes, i.e., numbers of hidden nodes. More systematic techniques for network
architecture selection for feedforward networks have also been proposed
[301, 335, 482, 627, 628]. Reed [499] gives a partial survey of pruning algorithms,
and recent advances can be found in the neural network literature
[144, 404].
Learning with a teacher (supervised learning), in which the network
weights are adjusted to reduce the error between the desired response
(target value) and the actual response (computed value) of the network.
This corrective algorithm is repeated iteratively until a preset convergence
criterion is reached. One of the most widely used supervised training
algorithms is the error backpropagation or generalized delta rule
proposed by Rumelhart and others [527, 637].
Learning without a teacher, in which the network must find the regularities
in the training data by itself. This paradigm has two subgroups:
1. Reinforcement learning/Neurodynamic programming,
where learning the relationship between inputs and outputs is
performed through continued interaction with the environment
to minimize a scalar index of performance. This is closely related
to Dynamic Programming [53].
2. Unsupervised learning, or self-organized learning, where there
is no external teacher or critic to oversee the learning process.
Once the network is tuned to the statistical regularities of the
input data, it forms internal representations for encoding the input
automatically [48, 226].
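The supervised (generalized delta rule) paradigm above can be sketched for a one-hidden-layer network as follows. This is a toy illustration, not the full algorithm of [527]: the training data, network size, and learning rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# made-up regression data: target y = sin(2*pi*x)
X = rng.uniform(0.0, 1.0, size=(200, 1))
Y = np.sin(2 * np.pi * X)

n_hidden = 10
W1 = rng.normal(0.0, 0.5, size=(1, n_hidden))   # input -> hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))   # hidden -> output weights
b2 = np.zeros(1)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

eta = 0.5                                       # learning rate (arbitrary)
losses = []
for epoch in range(2000):
    # forward pass: sigmoid hidden layer, linear output layer
    H = sigmoid(X @ W1 + b1)
    Yhat = H @ W2 + b2
    E = Yhat - Y
    losses.append(float(np.mean(E ** 2)))
    # backward pass: gradients of the mean squared error
    dY = 2 * E / len(X)
    dW2 = H.T @ dY
    db2 = dY.sum(axis=0)
    dH = (dY @ W2.T) * H * (1 - H)              # sigmoid derivative term
    dW1 = X.T @ dH
    db1 = dH.sum(axis=0)
    # gradient-descent weight corrections
    W2 -= eta * dW2; b2 -= eta * db2
    W1 -= eta * dW1; b1 -= eta * db1
```

The recorded `losses` should decrease as the error is propagated backward and the weights are corrected, mimicking the iterative convergence criterion described above.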
Figure 4.19. A hypothetical feedforward ANN with one hidden layer for
estimating substrate and biomass concentrations in fed-batch penicillin
fermentation. Inputs: batch age, substrate feed, OUR (oxygen uptake rate), and a fixed bias input; outputs: S (substrate conc.) and X (biomass conc.); w_{ij} and w_{jk} are the weight vectors associated with the interconnected layers.
3. Input variables with nonrandom features, called external (exogenous) (X) variables.
Volterra series models [624] do not utilize previous values of the depen-
dent variable, while nonlinear autoregressive moving average models with
exogenous variables (NARMAX) (Eqs. 4.171-4.174) use all three types of
variables. Model structures are either linear or nonlinear in the parameters.
The parameter estimation task is much less computation-intensive if the
model parameters appear in a linear structure, since this permits the use of
well-developed parameter estimation techniques for linear modeling paradigms.
NARMAX, bilinear (Eq. 4.170), and threshold models (Eq. 4.177) are
linear in the parameters, while exponential models are nonlinear in the parameters.
Volterra models have been utilized by Wiener [644] for the study of non-
linear systems by constructing transformations of Volterra series in which
the successive terms are orthogonal. Expressing y(t) as a function of current
and past values of a zero-mean white noise process e(t),

y(t) = \sum_{i=1}^{\infty} g_i\, e(t-i) + \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} g_{ij}\, e(t-i)e(t-j) + \sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\sum_{k=1}^{\infty} g_{ijk}\, e(t-i)e(t-j)e(t-k) + \cdots \qquad (4.167)

where the kernels are the partial derivatives of y(t) with respect to the lagged noise terms,

g_i = \frac{\partial y(t)}{\partial e(t-i)}, \qquad g_{ij} = \frac{\partial^2 y(t)}{\partial e(t-i)\,\partial e(t-j)}, \qquad \ldots \qquad (4.168)
When the input u(t) and output y(t) are both observable, the Volterra series
can be represented in terms of the input by replacing e(t) with u(t). If the
system is linear, only the first-order term is present and the model
is completely characterized by the transfer function g_i of the system. For
nonlinear processes, additional terms in Eq. (4.167) must be included, and
the generalized transfer functions concept is used [479].
4.7. Extensions of Linear Modeling Techniques 187
where e(t) is a sequence of iid random variables, and a_i, \beta_i, and \delta are model
parameters. Since \delta appears in the argument of the exponential term, the model
estimation problem is computationally more challenging.
Bilinear models [394] cannot describe several types of nonlinearities such
as limit cycles, but they have a simple form that can describe processes
where products of two variables appear in equations derived from first prin-
ciples. The general form of a bilinear model is

y(t) = \sum_{i=1}^{p} a_i\, y(t-i) + \sum_{j=1}^{q} b_j\, e(t-j) + \sum_{i=1}^{m}\sum_{j=1}^{k} c_{ij}\, y(t-i)\, e(t-j) + e(t) \qquad (4.170)

The general (multivariable) NARMAX model is

\mathbf{y}(t) = \mathbf{f}\big(\mathbf{y}(t-1), \ldots, \mathbf{y}(t-n_y),\; \mathbf{u}(t-1), \ldots, \mathbf{u}(t-n_u),\; \mathbf{e}(t-1), \ldots, \mathbf{e}(t-n_e)\big) + \mathbf{e}(t) \qquad (4.171)

where

\mathbf{y}(t) = \begin{bmatrix} y_1(t) \\ \vdots \\ y_m(t) \end{bmatrix}, \qquad \mathbf{u}(t) = \begin{bmatrix} u_1(t) \\ \vdots \\ u_r(t) \end{bmatrix}, \qquad \mathbf{e}(t) = \begin{bmatrix} e_1(t) \\ \vdots \\ e_m(t) \end{bmatrix} \qquad (4.172)
are the system output, input and noise, respectively, ny,nu, and ne are the
maximum lags in the output, input and noise, respectively, { e ( t } } is a zero
mean iid sequence, and f (•) is some vector valued nonlinear function.
NARMAX models can be illustrated by a NAR model

y_q(t) = f_q\big(y_1(t-1), \ldots, y_1(t-n_y), \ldots, y_m(t-1), \ldots, y_m(t-n_y)\big) + e_q(t), \qquad q = 1, \ldots, m \qquad (4.173)

where each f_q(\cdot) can be expanded as a polynomial in the lagged outputs, for example

y_q(t) = \theta_0 + \sum_{i=1}^{n_y} \theta_i\, y_q(t-i) + \sum_{i=1}^{n_y}\sum_{j=i}^{n_y} \theta_{ij}\, y_q(t-i)\, y_q(t-j) + \cdots + e_q(t) \qquad (4.174)
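Because a polynomial expansion such as Eq. (4.174) is linear in the parameters \theta, a NARX-type model can be fitted by ordinary least squares. The sketch below simulates a made-up first-order process with a bilinear cross-term and recovers its coefficients; all numerical values are illustrative assumptions, not data from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# simulate y(t) = 0.5 y(t-1) + 0.3 u(t-1) + 0.2 y(t-1) u(t-1) + e(t)
n = 2000
u = rng.uniform(-1.0, 1.0, size=n)
e = 0.01 * rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t-1] + 0.3 * u[t-1] + 0.2 * y[t-1] * u[t-1] + e[t]

# regressor matrix of monomials in the lagged output and input
Phi = np.column_stack([y[:-1], u[:-1], y[:-1] * u[:-1]])
theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
```

The estimated `theta` should be close to the true coefficients (0.5, 0.3, 0.2), illustrating why models that are linear in the parameters are much cheaper to identify.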
predicted in terms of past and present inputs and outputs. This approach
is similar to linear subspace state-space modeling [316, 415, 613]. The
appeal of linear and nonlinear subspace state-space modeling is the ability
to develop models with error prediction for a future window of output
(window length selected by user) and with a well-established procedure that
minimizes trial-and-error and iterations. An illustrative example of such
modeling is presented based on a simulated continuous chemical reactor
that exhibits multiple steady states in the outputs for a fixed level of the
input [122].
Models with a small number of monomials are usually adequate to de-
scribe the dynamic behavior of most real processes. Methods have been
developed for the combined structure selection and parameter estimation
problem based on Gram-Schmidt orthogonalization [94]. The selection of
monomials is carried out by balancing the reduction in residuals against the
increase in model complexity. Criteria such as the Akaike Information Criterion
(AIC) are used to guide the termination of the modeling effort. A variant of
AIC is given in Eq. 4.42.
Threshold Models. A threshold model switches among a set of linear submodels according to the regime of a delayed output:

y(t) = a_0^{(j)} + \sum_{i=1}^{m} a_i^{(j)}\, y(t-i) + \sum_{i=0}^{m-1} b_i^{(j)}\, e(t-i) \qquad (4.177)
where the appropriate parameter set (a_i^{(j)}, b_i^{(j)}) is selected based on y(t-d) \in
R_j, j = 1, \ldots, l. Here R_j = (r_{j-1}, r_j] with the linearly ordered real numbers
r_0 < r_1 < \cdots < r_l called the threshold parameters, and d is the delay
parameter [24]. The identification of threshold models involves estimation
of model parameters and selection of d and r_j. The threshold model (Eq.
4.177) can be reduced to an AR structure by setting b_0^{(j)} = 1 and b_i^{(j)} =
0, i = 1, \ldots, m-1. External input variables can also be incorporated, and
the condition for selection of parameter sets may be based on the input
variables. The submodels may also be nonlinear functions such as NARX
and NARMAX models.
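Since each regime of a threshold model is linear in its parameters, identification for a fixed delay d and fixed thresholds reduces to regime-wise least squares. A minimal two-regime sketch follows; the coefficients, noise level, threshold (zero), and d = 1 are all assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# simulate a two-regime threshold AR(1) model:
#   y(t) = 0.6 y(t-1) + e(t)   if y(t-1) <= 0   (regime 1)
#   y(t) = -0.4 y(t-1) + e(t)  if y(t-1) > 0    (regime 2)
n = 5000
y = np.zeros(n)
e = 0.3 * rng.normal(size=n)
for t in range(1, n):
    a = 0.6 if y[t-1] <= 0.0 else -0.4
    y[t] = a * y[t-1] + e[t]

# regime-wise least squares with known delay d = 1 and threshold r = 0
ylag, ynow = y[:-1], y[1:]
low = ylag <= 0.0
a_low = float(np.sum(ylag[low] * ynow[low]) / np.sum(ylag[low] ** 2))
a_high = float(np.sum(ylag[~low] * ynow[~low]) / np.sum(ylag[~low] ** 2))
```

In practice d and the thresholds r_j are not known and must themselves be selected, e.g., by searching over candidate values and comparing residual criteria.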
Models Based on Spline Functions. Spline functions provide a non-
parametric nonlinear regression method with piecewise polynomial fitting.
A spline function is a piecewise polynomial where polynomials of degree q
join at the knots \kappa_i, i = 1, \ldots, k, and satisfy the continuity conditions for the
function itself and for its q-1 derivatives [658]. Often continuity of the
first and second derivatives is enough; hence cubic splines (q = 3) have
been popular. One-sided and two-sided power univariate basis functions
for representing qth order splines are

b_q(x; \kappa) = (x - \kappa)_+^q, \qquad b_q^{\pm}(x; \kappa) = [\pm(x - \kappa)]_+^q

where the subscript + indicates that the term is evaluated for positive values
of the argument; the basis function has a value of zero for negative values of the argument.
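A regression on the one-sided (truncated) power basis is linear in its coefficients once the knots are fixed, so a cubic spline (q = 3) can be fitted by least squares. In the sketch below the target function, number of sample points, and equally spaced knots are all arbitrary illustrative choices.

```python
import numpy as np

# made-up smooth target function
x = np.linspace(0.0, 1.0, 201)
y = np.sin(2 * np.pi * x)

q = 3                                  # cubic spline
knots = np.arange(1, 8) / 8.0          # interior knots at k/8 (arbitrary)

# design matrix: global cubic polynomial plus one-sided terms (x - kappa)_+^3
cols = [x ** p for p in range(q + 1)]
cols += [np.where(x > k, (x - k) ** q, 0.0) for k in knots]
B = np.column_stack(cols)

coef, *_ = np.linalg.lstsq(B, y, rcond=None)
fit = B @ coef
max_err = float(np.max(np.abs(fit - y)))
```

With knots spaced 1/8 apart, the piecewise cubic reproduces the smooth target to well within plotting accuracy; the truncated power basis is numerically ill-conditioned for many knots, and B-spline bases are preferred in production code.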
The multivariate adaptive regression splines (MARS) method [169]
is an extension of the recursive partitioning method. Friedman [169] de-
scribes the evolution of the method and presents algorithms for build-
ing MARS models. An introductory level discussion with applications in
chemometrics is presented by Sekulic and Kowalski [539]. Spline fitting is
generalized to higher dimensions and multivariable systems by generating
basis functions that are products of univariate spline functions,

B_m(\mathbf{x}) = \prod_{r=1}^{K_m} \big[s_{rm}\,(x_{v(r,m)} - \kappa_{rm})\big]_+

so that the model becomes

\hat{y} = a_0 + \sum_{m=2}^{M} a_m B_m(\mathbf{x})

where a_0 is the coefficient of the constant basis function B_1, s_{rm} = \pm 1, x_{v(r,m)} is the predictor variable appearing in the rth factor of the mth product, \kappa_{rm} is the corresponding knot, and the sum
is over all the basis functions B_m produced by the selection procedure.
Basis function selection is carried out in two steps. The first step is forward
recursive partitioning which selects candidate basis functions. The second
step is backward stepwise deletion, which removes splines that carry
redundant information. Both steps are implemented by evaluating a lack-of-fit
function [169]. A recent study reports the comparison of models developed
by MARS and ANN with sigmoid functions [480].
Nonlinear Polynomial Models with Exponential and Trigonomet-
ric Terms (NPETM). If process behavior follows nonlinear functions such
as trigonometric, exponential, or logarithmic functions, restricting the model
structure to polynomials would yield a model that has a large number
of terms and acceptable accuracy over only a limited range of predictor variables.
Basically, several monomials are included in the model in order to
describe approximately the functional behavior of that specific exponential
or trigonometric relation. For example,
Cascade Systems
Cascade structures [210, 367] are composed of serially connected static
nonlinear and dynamic linear transfer function blocks. This structure is ap-
propriate when the process has static nonlinearities. The structure is called
u_a = c_{0a} + c_{1a}\, t_a + c_{2a}\, t_a^2 + h_a \qquad (4.182)

where a represents the model dimension, c_{0a}, c_{1a}, and c_{2a} are constants,
and h_a is a vector of residuals. This quadratic function can be generalized
to other nonlinear functions of t_a:

u_a = f(t_a) + h_a \qquad (4.183)

where f(\cdot) may be a polynomial, exponential, or logarithmic function.
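For a fixed latent dimension a, fitting the quadratic inner relation of Eq. (4.182) is an ordinary polynomial regression of the output scores u_a on the input scores t_a. A sketch with made-up score vectors (all numerical values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# made-up score vectors: u_a depends quadratically on t_a plus residuals h_a
t_a = np.linspace(-2.0, 2.0, 100)
h_a = 0.05 * rng.normal(size=t_a.size)
u_a = 1.0 + 2.0 * t_a - 0.5 * t_a ** 2 + h_a

# least-squares fit of u_a = c0 + c1 t_a + c2 t_a^2
c2, c1, c0 = np.polyfit(t_a, u_a, 2)
```

Replacing `np.polyfit` with a fit of any other f(t_a), e.g., an exponential, gives the generalization of Eq. (4.183).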
Contributing author
Inanç Birol
One of nature's greatest mysteries is the reason why she is understandable.
Yet, she is understandable, and the language she speaks
is mathematics. The history of science is full of breakthroughs when the
mathematics of a certain type of behavior is understood. The power of
that understanding lies in the fact that, stemming from it, one can build a
model for the phenomenon, which enables the prediction of the outcome of
experiments that are yet to be performed.
Now, it is a part of scientific folklore [195], how Lorenz [350] realized
the phenomenon that was later given the name deterministic chaos; how it
stayed unnoticed on the pages of the Journal of Atmospheric Sciences for
a period of time only to be rediscovered by other scientists; and how it un-
folded a new scientific approach. In fact, the existence of chaotic dynamics
has been known to mathematicians since the turn of the century. The birth
of the field is commonly attributed to the work of Poincare [473]. Subse-
quently, the pioneering studies of Birkhoff [55], Cartwright [89], Littlewood
[344], Smale [551], Kolmogorov [284] and others built the mathematical
foundations of nonlinear science. Still, it was not until the wide utilization
196 Chapter 5. Nonlinear Model Development
of digital computers for scientific studies in the late 1970s that the field made
its impact on the sciences and engineering. It has been demonstrated that chaos
is relevant to problems in fields as diverse as chemistry, fluid mechanics, bi-
ology, ecology, electronics and astrophysics. Now that it has been shown
to manifest itself almost anywhere scientists look, the focus is shifted from
cataloging chaos, to actually learning to live with it. In this chapter, we are
going to introduce basic definitions in nonlinear system theory, and present
methods that use these ideas to analyze chaotic experimental time series
data, and develop models.
x(i+1) = f(x(i)) \qquad (5.2)
Poincare Map
Although most physical systems manifest continuous dynamics, maps arise
naturally in many applications. Furthermore, even when the natural state-
ment of a problem is in continuous time, it is often possible and sometimes
desirable to transform the continuous dynamics to a map. Note, however
5.1. Deterministic Systems and Chaos 197
Phase Volume
The way the phase space volume changes in time is an important property
of systems with continuous or discrete dynamics. Select a subset of the
phase space with a positive finite (hyper-) volume, and evolve the points
in this subset in time. If the volume defined by the new subset is always
equal to the initial volume, the dynamics under investigation belongs to a
conservative system, such as a Hamiltonian system. If, on the other hand,
that volume is changing in time, we have a nonconservative system. If the
phase volume of the system always increases, the system will be structurally
unstable, and the trajectories will diverge to infinity. Thus we cannot ob-
serve such systems for long, and they are not of much interest. The class of
systems with shrinking phase volume in time are called dissipative systems.
They are structurally stable, and the methods introduced in this chapter
are directed at studying such systems.
The rate of change of the phase space volume for a continuous flow
defined by Eq. (5.1) is given by the trace of the tangent flow matrix (the
Jacobian matrix evaluated along the flow),

r = \mathrm{tr}\,\frac{\partial \mathbf{F}}{\partial \mathbf{x}} \qquad (5.4)
If this rate is positive (negative), then the phase space volume grows (shrinks)
in time.
\frac{\partial \mathbf{F}}{\partial \mathbf{x}} = \begin{bmatrix} -\sigma & \sigma & 0 \\ -x_3 + \rho & -1 & -x_1 \\ x_2 & x_1 & -\beta \end{bmatrix} \qquad (5.6)

which has a trace r = -\sigma - 1 - \beta, that is less than zero. Thus, an initial
phase space volume V(0) shrinks with time as V(t) = V(0)e^{rt}.
A similar definition is made for the map of Eq. (5.2), using the magnitude
of the determinant of the tangent flow matrix,

r = \left| \det \frac{\partial \mathbf{f}}{\partial \mathbf{x}} \right| \qquad (5.7)

Eq. (5.7) defines the factor by which the n-dimensional phase space volume
changes. If this factor is greater (less) than one, the phase space volume
grows (shrinks) at the next iteration.
Example 2 Phase volume change of a map
Consider the two-dimensional Henon map [231], given by

x_1(i+1) = \alpha - x_1^2(i) + \beta x_2(i) \qquad (5.8)
x_2(i+1) = x_1(i)

where \alpha and \beta are constants. The tangent flow matrix for this system,

\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \begin{bmatrix} -2x_1 & \beta \\ 1 & 0 \end{bmatrix} \qquad (5.9)

has a constant determinant magnitude r = |\beta|. The hypervolume defined in this phase
space is in fact an area, since the phase space is two dimensional. If the
absolute value of the parameter \beta is less than one, then the area shrinks
by a factor of |\beta| at each iteration. □
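The area contraction in Example 2 can be checked numerically by mapping a small parallelogram of initial conditions through one iteration of Eq. (5.8) and comparing areas. The parameter values and base point below are arbitrary choices for illustration.

```python
def henon(x1, x2, alpha=1.4, beta=0.3):
    """One iteration of the map in Eq. (5.8)."""
    return alpha - x1 ** 2 + beta * x2, x1

def cross(ax, ay, bx, by):
    """z-component of the cross product, i.e., the signed area."""
    return ax * by - ay * bx

# small parallelogram spanned by (eps, 0) and (0, eps) at an arbitrary point
x1, x2, eps = 0.1, 0.1, 1e-3
p0 = henon(x1, x2)
p1 = henon(x1 + eps, x2)
p2 = henon(x1, x2 + eps)
area0 = eps * eps
area1 = abs(cross(p1[0] - p0[0], p1[1] - p0[1], p2[0] - p0[0], p2[1] - p0[1]))
ratio = area1 / area0   # should equal |beta|
```

The ratio equals |beta| independently of the base point, in agreement with the constant determinant of Eq. (5.9).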
Systems with phase space contraction, such as the ones presented in the
last two examples, are commonly characterized by the presence of attrac-
tors. The trajectories of the system originating from a specific region of the
phase space are attracted to a bounded subset of the phase space, called
the attractor, and that specific region that hosts all such initial conditions
is called the basin of attraction for that attractor.
\frac{d^2\theta}{dt^2} + \zeta\frac{d\theta}{dt} + \sin\theta = \alpha\cos(\beta t) \qquad (5.10)

where \theta is the angular displacement from the vertical, \zeta is the damping
coefficient accounting for friction, and \alpha and \beta are the forcing amplitude
and forcing frequency, respectively. Note that this is a non-autonomous
second order system. Applying the definitions \omega = d\theta/dt and \phi = \beta t,
we can transform the system into an autonomous third order system. First,
consider a pendulum with no forcing (\alpha = 0), which reduces the phase space
dimension to two. The system will have infinitely many steady states,
located at \theta = \pm k\pi and \omega = 0, with k = 0, 1, 2, \ldots. The steady states for
even k (corresponding to the lower vertical position) are stable, and those
for odd k (corresponding to the upper vertical position) are unstable. If
we also set \zeta = 0 to eliminate friction, the pendulum will swing back-and-forth,
or rotate in an infinite loop, determined solely by its initial conditions,
as shown in Figure 5.2.a. Note that the pendulum with no friction is a
Hamiltonian system, hence it conserves the phase space volume due to the
Liouville theorem. If, however, we consider a finite friction, the energy
of the trajectories will eventually be consumed, and the pendulum will
come to a halt at one of its steady states (Figure 5.2.b). To better
understand this mechanism, take a closer look at the trajectories
near the two types of steady states. Near the stable ones (Figure 5.3.a),
the trajectories spiral down to the steady state. Near the unstable steady
states, trajectories approach the (saddle) steady state from one direction,
and are repelled in another direction. There are some trajectories that
seem to violate the uniqueness of the solution; they approach the steady
state from opposite directions to meet at it, and diverge
from it in opposite directions, starting from the steady state. If we consider
the time aspect of the problem, the uniqueness condition is not actually
violated, since it takes infinitely long to converge to the steady state, on the
trajectories that end in there. Similarly, for the trajectories that emanate
from the steady state, it takes infinitely long to diverge from the steady
state. Another property of the trajectories that converge to this steady
state is the partitioning of the phase space in the sense that trajectories on
the right hand side of this trajectory cannot cross over to the left hand side
of it, and vice versa. Therefore, they define the boundaries of the basins of
attraction.
Next, introducing the forcing back into the system, we have a three di-
mensional phase space. For certain combinations of driving amplitude and
frequency, we observe a rich dynamic behavior, which neither converges to
a steady state, nor gets attracted by a limit cycle. Instead, the trajectories
explore a finite subset of the phase space, converging to a strange attractor.
When the system converges to a steady state (also called a limit point),
the limit set of the system in phase space is an object of zero dimension.
When it converges to a limit cycle, the limit set is still an object of integer
dimension (one). However, when the system exhibits a rich dynamic behav-
ior, such as the one shown in Figure 5.2.c, the limit set is a fractal object
with a non-integer dimension. We will discuss the concept of dimension in
the next section in more detail.
One way of identifying chaotic behavior is using the Poincare surface
of section technique. For example, let us consider the periodically driven
pendulum again, and use a surface of section on the angle of the forcing term,
\phi. If we operate the system with \zeta = 0.4, \alpha = 1 and \beta = 2/3, it converges
to a periodic trajectory, which gives a single point in the Poincare surface
of section of Figure 5.4.b. If we operate it with \zeta = 0.4, \alpha = 1.4 and
\beta = 2/3, the dynamics is richer, and we observe a fractal object
resembling the shape in the projection of the attractor on the (\theta, \omega)-plane
(Figure 5.4.d). This kind of an attractor is called a strange attractor.
The manifestation of chaos in the dynamics of a system is often asso-
ciated with a sensitive dependence on its initial conditions. If we initialize
our driven pendulum with slightly different initial conditions around its
strange attractor, initially nearby trajectories diverge exponentially in time
as shown by solid and dotted curves in Figure 5.5.a.
If we observe a block of initial conditions, shown in Figure 5.5.b, for 4
units of simulation time, the volume element that we started with shrinks
in one direction, and is stretched in another. If we keep on observing the
system, since the trajectories stay confined in a certain region of the phase
space, the volume element cannot perform this shrinking and stretching
without eventually folding on itself (Figure 5.5.c). This stretch-and-fold
routine repeats itself as the dynamics further evolves. Hence, we will even-
tually find points that were arbitrarily close initially, separated in the phase
space by a finite distance. In fact, the stretch-and-fold is the very mecha-
nism that generates the fractal set of the strange attractor. D
Figure 5.2. Phase space of the pendulum, projected on the (\theta, \omega) plane. (a) Trajectories
for the no-friction case with several initial conditions, either oscillating or
rotating around the stable steady state. (b) Trajectories for several initial
conditions converge to the stable steady state when there is friction. (c)
The chaotic trajectory of the driven pendulum with friction.
Figure 5.3. Trajectories near steady states of the pendulum, \theta = i\pi, (a) for
i even, and (b) for i odd.
Figure 5.4. Periodically driven pendulum: (a) goes to a limit cycle for \zeta =
0.4, \alpha = 1 and \beta = 2/3; (b) strobing it with a frequency that is a multiple
of the oscillation frequency results in a single point in the Poincare section;
(c) if we operate the system with \zeta = 0.4, \alpha = 1.4 and \beta = 2/3, it leads to
this chaotic orbit, and (d) strobing this motion results in a fractal object
in the Poincare section.
Using the length of the ith ellipsoidal principal axis p_i(t), we can define the
ith Lyapunov exponent of the system from

\lambda_i = \lim_{t\to\infty} \frac{1}{t} \ln \frac{p_i(t)}{p_i(0)} \qquad (5.12)

when the limit exists. The \lambda_i are conventionally ordered from largest to
smallest. Note that this definition is akin to the definition of eigenvalues
for linear systems, but unlike the eigenvalues, there is no unique direction
associated with a given Lyapunov exponent. This is understandable, since
the eigenvalue is a local definition, and, characterizes a steady state, while
the Lyapunov exponent is a time average associated with a principal axis,
that continuously changes orientation as it evolves.
As one classifies linear systems using their eigenvalues, Lyapunov spec-
tra can be used to classify the asymptotic behavior of nonlinear systems.
For example, for a system to be dissipative, the sum of its Lyapunov expo-
nents should be negative. Likewise, if we have a Hamiltonian system, the
sum of its Lyapunov exponents should be zero, due to the volume preserv-
ing property of such systems. A continuous dynamical system is chaotic, if
it has at least one positive Lyapunov exponent.
In the investigation of chaotic systems, we have mentioned that third-
order systems have a special importance. For third order dissipative sys-
tems, we can easily classify the possible spectra of attractors in four groups,
based on Lyapunov exponents.
1. (-, -, -): a fixed point,
2. (0, -, -): a limit cycle,
3. (0, 0, -): a 2-torus,
4. (+, 0, -): a strange attractor.
Therefore, the last configuration is the only possible third-order chaotic
system. However, in a continuous fourth-order dissipative system, there are
three possible types of strange attractors, with Lyapunov spectra (+, 0, -, -),
(+, 0, 0, -) and (+, +, 0, -). Note that all three configurations have at least
one vanishing Lyapunov exponent. In fact, it is required by the theorem of
Haken [215] that the system should have at least one zero Lyapunov exponent
if the trajectory of its attractor does not contain a fixed point. The last
case, where there are two positive Lyapunov exponents, is called hyperchaos.
The classical Lyapunov exponent computation method of Wolf et al.
[669] is based on observing the long time evolution of the axes of an in-
finitesimal sphere of states. It is implemented by defining the principal
(b)
Figure 5.6. Time evolution of the fiducial trajectory and the principal axis
(axes), (a) The largest Lyapunov exponent is computed from the growth
of length elements, (b) The sum of the largest two Lyapunov exponents is
computed from the growth of area elements.
axes, with initial conditions that are separated as small as the computer
arithmetic allows, and by evolving these using the nonlinear model equa-
tions. The trajectory followed by the center of the sphere is called the fidu-
cial trajectory. The principal axes are defined throughout the flow via the
linearized equations of an initially orthonormal vector frame "anchored" to
the fiducial trajectory. To implement the procedure, the fiducial trajectory
on the attractor is integrated simultaneously with the vector tips defin-
ing n arbitrarily oriented orthonormal vectors. Eventually, each vector in
the set tends to fall along the local direction of most rapid growth (or a
least rapid shrink for a non-chaotic system). On the other hand, the col-
lapse toward a common direction causes the tangent space orientation of all
axis vectors to become indistinguishable. Therefore, after a certain inter-
val, the principal axis vectors are corrected into an orthonormal set, using
the Gram-Schmidt reorthonormalization. Projection of the evolved vectors
onto the new orthonormal frame correctly updates the rates of growth of
each of the principal axes, providing estimates of the Lyapunov exponents.
Following this procedure, the rate of change of a length element l around
the fiducial trajectory, as shown in Figure 5.6.a, would indicate the dominant
Lyapunov exponent, with

\lambda_1 = \frac{1}{t_{k+1} - t_k} \ln \frac{l(t_{k+1})}{l(t_k)} \qquad (5.13)

Similarly, the growth of an area element a defined by two principal axes
(Figure 5.6.b) yields the sum of the two largest Lyapunov exponents,

\lambda_1 + \lambda_2 = \frac{1}{t_{k+1} - t_k} \ln \frac{a(t_{k+1})}{a(t_k)} \qquad (5.14)

The principal axis vectors \mathbf{G}_i are propagated along the fiducial trajectory
by the linearized flow,

\frac{d\mathbf{G}_i}{dt} = \frac{\partial \mathbf{F}}{\partial \mathbf{x}}\, \mathbf{G}_i \qquad (5.15)
Routes to Chaos
Unlike continuous flows, discrete maps need not have a minimum phase
space dimension to exhibit chaotic behavior. Since the values are attained
at discrete instances, orbit crossings in data representations are mostly
artifacts of the representation, hence do not pose the existence-uniqueness problems of
continuous flows. Even a first order discrete map can produce chaotic behavior,
as shown in the following example.
The logistic map is a one dimensional nonlinear system, given by the difference
equation

x(i+1) = \mu\, x(i)\,\big(1 - x(i)\big) \qquad (5.16)

which was originally proposed to model population dynamics in a limited
resource environment [375]. The population size, x(i), at instant i is a
normalized quantity. It can be easily shown that a choice of \mu in the range
[0, 4] guarantees that, if we start with a physically meaningful population
size, i.e., x(0) \in [0, 1], the population size stays in [0, 1].
If we simulate the system with an initial condition x(0) = 0.1, we will
obtain the results shown in Figure 5.7 for various \mu values. The system
goes to a steady state for \mu = 1 and \mu = 2, but as \mu is further increased to
2.9, the behavior of the convergence to a steady state is qualitatively different
from the previous cases: it reaches the steady state in an oscillatory
manner. For \mu = 3.3, the oscillations are not damped anymore, and we
have periodic behavior, every other value of x(i) being equal for large i. The
system is said to have a two-period oscillation in this regime. For \mu = 3.5,
the asymptotic behavior of the system is similar to the previous case, but
this time we have a four-period oscillation. The demonstrated increase
in the period is actually common to many nonlinear systems, and is called
period doubling. The period-doubling mechanism is a route to chaos that
has been studied extensively, since it is encountered in many dynamical
systems. One interesting finding is that period doubling may be characterized
by a universal number independent of the underlying dynamics. In
our example, if we label the kth period-doubling value of \mu with \mu_k, then

\delta = \lim_{k\to\infty} \frac{\mu_k - \mu_{k-1}}{\mu_{k+1} - \mu_k} \qquad (5.17)

where \delta \approx 4.6692 is the universal Feigenbaum constant.
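The period-doubling cascade described above can be reproduced directly by iterating Eq. (5.16) and inspecting the asymptotic values of x(i). A minimal sketch (transient length and tolerance are arbitrary choices):

```python
def logistic_orbit(mu, x0=0.1, transient=2000, n=8):
    """Iterate Eq. (5.16), discard a transient, return the next n values."""
    x = x0
    for _ in range(transient):
        x = mu * x * (1.0 - x)
    orbit = []
    for _ in range(n):
        x = mu * x * (1.0 - x)
        orbit.append(x)
    return orbit

def period(orbit, tol=1e-9):
    """Smallest p with x(i+p) = x(i) over the sample (up to tol), else None."""
    n = len(orbit)
    for p in range(1, n):
        if all(abs(orbit[i] - orbit[i + p]) < tol for i in range(n - p)):
            return p
    return None   # no short period found (e.g., a chaotic orbit)

# mu = 2 -> fixed point; mu = 3.3 -> two-period; mu = 3.5 -> four-period
```

For \mu = 3.9 no short period is detected, consistent with the chaotic regime of Figure 5.7.f.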
Figure 5.7. Simulation results of the logistic map for (a) \mu = 1, (b) \mu = 2,
(c) \mu = 2.9, (d) \mu = 3.3, (e) \mu = 3.5, and (f) \mu = 3.9.
\mu = 3.83 (Figure 5.9), which is evidence that this map has a region in the
parameter space (\mu for this example) where it experiences chaotic behavior.
After this periodic window, we again observe a chaotic regime, but this
time the chaos is reached via the intermittency route. There are three
documented intermittency routes [540].
R + 2P \rightarrow 3P \qquad (5.18)

with decay

P \rightarrow D \qquad (5.19)

as a paradigm for population dynamics of sexually reproducing species [64,
65], with k_p and d_p representing the birth and death rates of the species
P, respectively. If we let these reactions occur in two coupled identical
Figure 5.8. Bifurcation diagrams of the logistic map, (a) showing the stable
(solid curve) and the unstable (dashed curve) steady states of the system
versus the system parameter, \mu, and (b) showing the limiting values of x(i)
versus the system parameter, \mu.
continuous stirred tank reactors fed with a volumetric flow rate f and a
coupling strength g, as shown in Figure 5.10, we can write the material
balance equations as

\frac{dr_i}{dt} = -k_p r_i p_i^2 + f(r_0 - r_i) + g(r_j - r_i) \qquad (5.20)
\frac{dp_i}{dt} = k_p r_i p_i^2 - d_p p_i - f p_i + g(p_j - p_i)

for i, j = 1, 2 with j \neq i, where r_0 is the feed concentration of the reactant R.
Figure 5.10. Two coupled identical CSTRs with cubic autocatalytic species
P.
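The balances of Eq. (5.20) can be integrated with any standard ODE scheme; a fourth-order Runge-Kutta sketch is given below. All parameter values (k_p, d_p, r_0, f, g), the step size, and the initial conditions are arbitrary illustrative assumptions, not the values used to generate the figures.

```python
def rhs(state, kp=1.0, dp=0.01, r0=1.0, f=0.007, g=0.001):
    """Right-hand sides of the coupled-CSTR balances, Eq. (5.20)."""
    r1, p1, r2, p2 = state
    dr1 = -kp * r1 * p1**2 + f * (r0 - r1) + g * (r2 - r1)
    dp1 = kp * r1 * p1**2 - dp * p1 - f * p1 + g * (p2 - p1)
    dr2 = -kp * r2 * p2**2 + f * (r0 - r2) + g * (r1 - r2)
    dp2 = kp * r2 * p2**2 - dp * p2 - f * p2 + g * (p1 - p2)
    return [dr1, dp1, dr2, dp2]

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(state)
    k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = rhs([s + dt * k for s, k in zip(state, k3)])
    return [s + dt / 6.0 * (a + 2*b + 2*c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

state = [1.0, 0.1, 0.9, 0.2]   # slightly asymmetric initial conditions
dt = 0.1
for _ in range(20000):
    state = rk4_step(state, dt)
```

Because the coupling terms cancel pairwise and the feed bounds the total inventory, the concentrations remain bounded; whether the long-time behavior is steady, periodic, or chaotic depends on the parameter values, as the bifurcation diagrams in the figures illustrate.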
5.2. Nonlinear Time Series Analysis 215
Figure 5.12. Two projections of the chaotic orbit of the autocatalysis sys-
tem.
\mathbf{x}_i = \big[y_{i-(m-1)\tau},\; y_{i-(m-2)\tau},\; \ldots,\; y_i\big]^T \qquad (5.21)

where m is called the embedding dimension, and \tau the time delay. The
embedding theorems of Takens [582] and Sauer et al. [535] show that,
under some conditions, if the sequence \{y_i\} is representative of a scalar
measurement of a state, and m is selected large enough, the time delay
coordinates provide a one-to-one image of the orbit of the underlying
dynamics.
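Constructing the time delay coordinate vectors of Eq. (5.21) from a scalar series is mechanical; a minimal sketch follows (the sampled sine wave is an arbitrary stand-in for measured data):

```python
import math

def delay_embed(y, m, tau):
    """Return the delay vectors [y(i-(m-1)tau), ..., y(i)] of Eq. (5.21)."""
    start = (m - 1) * tau
    return [[y[i - (m - 1 - j) * tau] for j in range(m)]
            for i in range(start, len(y))]

# toy scalar series: a sampled sine wave with period 100
y = [math.sin(2 * math.pi * i / 100.0) for i in range(1000)]
vectors = delay_embed(y, m=3, tau=25)
```

Each vector is one reconstructed phase space point; the first (m-1)\tau samples are consumed building up the history, so the reconstructed orbit is slightly shorter than the series.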
The mutual information content between y(t - \tau) and y(t) is

I_\tau = \sum_{a,b} P_{ab}(\tau)\, \ln \frac{P_{ab}(\tau)}{P_a P_b} \qquad (5.22)

where P_{ab} is estimated from the normalized histogram of the joint distribution,
and P_a and P_b are the marginal distributions for y(t - \tau) and y(t),
respectively. Similar to the correlation coefficient guided selection of the
time delay, we should select the \tau where I_\tau attains its first local minimum.
Note that, for our example data, the choice of \tau with both methods is
around 25 seconds (Figure 5.14). The drift towards zero in both quantities
is due to the finite data size of 17,000 points, with a sampling rate of 2 measurements
per second. Although both methods of choosing a time delay give
useful guidelines, one should always make a reality check with the data.
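The histogram estimator behind Eq. (5.22) can be sketched as follows. The number of bins and the test signal (a sampled sine wave) are arbitrary choices; with a quarter-period delay the dependence between y(t - \tau) and y(t) is weakest, so I_\tau drops relative to \tau = 0.

```python
import numpy as np

def mutual_information(y, tau, bins=32):
    """Plug-in estimate of Eq. (5.22) from a normalized 2-D histogram."""
    n = len(y) - tau
    a, b = y[:n], y[tau:tau + n]          # y(t - tau) and y(t)
    pab, _, _ = np.histogram2d(a, b, bins=bins)
    pab /= pab.sum()                      # joint distribution P_ab
    pa = pab.sum(axis=1)                  # marginal P_a
    pb = pab.sum(axis=0)                  # marginal P_b
    mask = pab > 0
    return float(np.sum(pab[mask] *
                        np.log(pab[mask] / np.outer(pa, pb)[mask])))

# sampled sine wave with period 100 (illustrative stand-in for data)
t = np.arange(4000)
y = np.sin(2 * np.pi * t / 100.0)
i0 = mutual_information(y, 0)
i25 = mutual_information(y, 25)           # quarter period
```

Scanning `tau` over a range and locating the first local minimum of the returned values implements the selection rule described above.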
Figure 5.13. (a) Time series data of the blood oxygen concentration of a
sleep apnea patient. (b) Selection of a too-small time delay (\tau = 2) hides
the information in the data. (c) Selecting a more appropriate time delay
(\tau = 25) reveals more detail in the data.
Figure 5.14. The correlation coefficient (solid line), and the mutual infor-
mation content (dashed line) versus the time delay.
neighbors in the data set. The second nearest neighbors are used to avoid
the impossibility of comparing with a zero distance, should we have a
repeating pattern in the data set. Although it is computationally
straightforward to come up with a numerical value for d using this algo-
rithm, the accuracy of that value is often in doubt. A rule of thumb is to
use at least 10^{d/2} data points to compute d [526].
Note that the box counting dimension computed from Eq. (5.23) need
not be an integer. In fact, it is certainly non-integer for a strange attractor,
hence the name strange. On the other hand, the phase space dimension m
is an integer, and to host the attractor, it should be greater than or equal
to the box counting dimension of the attractor, d. Although we stick with
the box counting dimension in our arguments, there are other definitions
of (fractal) dimensions, such as correlation and Lyapunov dimensions, but
selecting one or the other would not change the line of thought, as different
measurements of dimension of the same strange attractor should not differ
in a way to contain an integer value in the range. Thus, no matter which
definition we use for the fractal dimension, we have the same necessary
condition m ≥ d, and the same sufficient condition m > 2d.
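A box-counting estimate of d can be sketched as follows: count the occupied boxes N(ε) at a sequence of scales and fit the slope of log N(ε) against log(1/ε). This is a minimal illustration under our own choices (dyadic scales, a least-squares slope), not the book's algorithm.

```python
import numpy as np

def box_counting_dimension(points, n_scales=8):
    """Estimate the box-counting (capacity) dimension of a point set:
    d = slope of log N(eps) versus log(1/eps) over dyadic scales."""
    pts = np.asarray(points, dtype=float)
    span = np.ptp(pts, axis=0)
    span[span == 0] = 1.0                        # guard degenerate axes
    pts = (pts - pts.min(axis=0)) / span         # rescale to the unit cube
    log_inv_eps, log_n = [], []
    for k in range(1, n_scales + 1):
        eps = 2.0 ** (-k)
        boxes = np.unique(np.floor(pts / eps).astype(int), axis=0)
        log_inv_eps.append(k * np.log(2.0))
        log_n.append(np.log(len(boxes)))
    slope, _ = np.polyfit(log_inv_eps, log_n, 1)
    return slope
```

In line with the rule of thumb above, the estimate is only trustworthy when the data set is large enough that the finest boxes are still well populated.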
Using these guidelines, one may be tempted to use an embedding di-
mension equal to the next integer value after 2d. In an ideal case, where
there is no noise in the infinitely many data points, such a selection would
be sound and safe. However, in a more realistic setup, if m is chosen too
large, the noise in the data will decrease the density of points defining the
attractor. In this analysis we are interested in finite dimensional determin-
istic systems, whereas noise is an infinite dimensional process that fills each
available dimension in a reconstructed phase space. Increasing m beyond
what is minimally required has the effect of unnecessarily increasing the
level of contamination of data with noise [669]. A method to determine the
minimal sufficient embedding dimension is called the false nearest neighbor
method [276].
Suppose that the minimal embedding dimension for our dynamics is m0,
for which a time delay state-space reconstruction would give us a one-to-
one image of the attractor in the original phase space. With the topolog-
ical properties preserved, the neighbors of a given point are mapped onto
neighbors in the reconstructed space. If we try to embed the attractor
in an m-dimensional space with m < m0, the topological structure would
no longer be preserved. Points would be projected into neighborhoods of
other points to which they would not belong in higher dimensions. Such
data points are called false neighbors. To find the minimal embedding di-
mension, we should require the fraction of the false neighbors to be less
than a heuristic value.
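A minimal sketch of the false nearest neighbor test in the spirit of Kennel et al. [276] follows. The distance-ratio threshold (here 10) and the 1% heuristic are common choices, not values taken from the text, and the brute-force neighbor search is for clarity rather than speed.

```python
import numpy as np

def embed(y, m, tau):
    """Time-delay embedding vectors of dimension m and delay tau."""
    n = len(y) - (m - 1) * tau
    return np.column_stack([y[i * tau : i * tau + n] for i in range(m)])

def false_nearest_fraction(y, m, tau, r_tol=10.0):
    """Fraction of nearest neighbours in dimension m that separate
    strongly when the (m+1)-th coordinate is added."""
    x_m1 = embed(y, m + 1, tau)
    x_m = embed(y, m, tau)[: len(x_m1)]   # keep vectors that exist in m+1
    false = 0
    for i in range(len(x_m)):
        d = np.linalg.norm(x_m - x_m[i], axis=1)
        d[i] = np.inf                     # exclude the point itself
        j = int(np.argmin(d))             # nearest neighbour in dimension m
        extra = abs(x_m1[i, -1] - x_m1[j, -1])
        if d[j] > 0 and extra / d[j] > r_tol:
            false += 1
    return false / len(x_m)

def minimal_embedding_dimension(y, tau, m_max=8, threshold=0.01):
    """Smallest m whose false-neighbour fraction falls below a heuristic."""
    for m in range(1, m_max + 1):
        if false_nearest_fraction(y, m, tau) < threshold:
            return m
    return m_max
```

For a clean periodic signal the fraction drops essentially to zero at m = 2, the dimension in which the orbit unfolds into a closed curve.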
signals from broadband signals. Methods for this [172] are over fifty years
old and are well developed.
In more general terms, in filtering the noise from the signal, we are
separating the information-bearing signal and the interference from the en-
vironment. In the case of a narrowband signal, such as signals from a linear
system, in a broadband environment, the distinction is quite straightfor-
ward. The frequency domain is the appropriate space to perform the sepa-
ration, and looking at the Fourier spectrum is sufficient to differentiate the
signal from noise.
Similar to the linear case, if the nonlinear process signal and the con-
tamination are located in significantly distinct frequency bands, Fourier
techniques are still indicative. In sampling dynamic systems, if for exam-
ple the Fourier spectrum of the system is bounded from above at a cut-off
frequency f_c, Shannon's sampling theorem states that, by choosing a sam-
pling frequency f_s > 2f_c, the signal can be perfectly reconstructed [172].
However, in the case of signals that come from sources that are dynamically
rich, such as chaotic systems, both the signal and the contamination are
typically broadband, and Fourier analysis is not of much assistance in mak-
ing the separation. It has been shown analytically that the frequency spectrum
of a system that follows the intermittency route to chaos has a 1/f tail [540].
When the orbits converge to a strange attractor, which is a fractal limit set,
it again has a 1/f tail in the frequency domain. Thus, for dynamically rich
systems, no matter how high one considers the cut-off, the filtered portion
of the signal will still have more information. This can be easily seen from
the signal to noise ratio of a signal s, whose power content up to a frequency
f_b is P, and for frequencies greater than f_b, it goes proportional to 1/f.
This ratio

SNR = [ P + a ∫_{f_b}^{f_c} df/f ] / [ a ∫_{f_c}^{∞} df/f ]    (5.24)

with a a real positive proportionality constant, vanishes for all f_c < ∞.
Furthermore, we cannot practically consider a very large f_c, since most
of the measurements are done with the aid of digital computers with finite
clock frequencies. Nevertheless, we will be gathering measurements from
such sources with finite sampling frequencies, and still wish to filter the
data for the underlying signal. Another problem caused by finite sampling
is the so-called aliasing effect. That is, in the Fourier domain, the power
contributions coming from the replicas of the original signal centered at
multiples of the sampling frequency are not negligible either.
If we can make the assumption that the signal we seek to separate is
coming from a low-order system with specific geometric structure in its
state space, we can make use of a deterministic system model or a Markov
chain model, and seek for model parameters or transition probabilities via
a time domain matching filter. The geometric structure of a system in
its state space is characteristic for each chaotic process, which enables us
to distinguish its signal from others. These separating techniques have a
significant assumption about the nature of the process generating the signal,
that is, the 'noise' we wish to separate from the 'signal' should be coming
from a high-order chaotic source. Depending on the a priori information
we have about the underlying system dynamics, various filtering problems
can be stated.
• If we know the exact dynamics that generated the signal,
x_{i+1} = f(x_i)    (5.25)

with x_i ∈ R^n (i.e., an n-dimensional real vector) and f(·) : R^n → R^n
(i.e., an n-dimensional vector function that takes an n-dimensional
argument), we can use this knowledge to extract the signal satisfying
the dynamics. This method is referred to as the regression technique.
• If we have a filtered signal from the system of interest extracted at
some prior time, we can use this pivot signal to establish statistics
of the evolution on the attractor, and use them to separate the signal in
the new set of measurements. This is gray box identification.
• If we know nothing about the underlying process and have just one
instance of measurements, then we must start by making simplifying
assumptions. Such assumptions may be that the dynamics is deter-
ministic, and that it has a low-dimensional state space. This is black
box identification.
Although as the problem bleaches out, the task of separating the signal
from noise gets easier, the real life cases unfortunately favor darker shade
situations. Various linear filtering and modeling techniques were discussed
in Chapter 4.
To filter out noise in the time series signal, we will make use of the serial
dependencies among the measurements, which cause the delay vectors to fill
the available m-dimensional space in an inhomogeneous fashion. There is
a rich literature on nonlinear noise reduction techniques [117, 295]. In this
section we will briefly discuss one approach that exploits the geometric
structure of the attractor by using local approximations.
The method is a simple local approximation that replaces the central
coordinate of each embedding vector by the local average of this coordinate.
The practical issues in implementing this technique are as follows [228]. If
the data represents a chaotic dynamics, initial errors in the first and the
last coordinates will be magnified through time. Thus, they should not be
replaced by local averages. Secondly, except for oversampled data sets, it
is desirable to choose a small time delay. Next, the embedding dimension,
m, should be chosen higher than 2d + 1, with d being the fractal dimension
of the attractor. Finally, the neighborhood should be defined by selecting
a neighborhood radius r that is large enough to cover the extent of the
contaminating noise, yet smaller than the typical radius of curvature of
the attractor. These conditions may not always be satisfied
simultaneously. As we have been stressing repeatedly for other aspects of
nonlinear data analysis, the process of filtering should be carried out in
several attempts, by trying different tuning parameters, associated with a
careful evaluation of the results, until they look reasonably satisfactory.
The filtering algorithm is as follows: each embedding vector x_i is compared
with all others, and its central coordinate is replaced by the neighborhood
average

ŷ_{i+(m−1)/2} = [ Σ_j U(r − ||x_i − x_j||) y_{j+(m−1)/2} ] / [ Σ_j U(r − ||x_i − x_j||) ]

where U(·) is the unit step function, and || · || is the vector norm. Note
that the first and the last (m − 1)/2 data points will not be filtered
by this averaging.
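The local-average filter can be sketched directly from this description. The code below is an illustration under our own naming, with the neighborhood membership written as a unit-step (radius) test; the first and last central-coordinate positions are left untouched, as required.

```python
import numpy as np

def local_average_filter(y, m=5, tau=1, r=1.0):
    """Simple nonlinear noise reduction: replace the central coordinate
    of each embedding vector by the average over its r-neighbourhood."""
    y = np.asarray(y, dtype=float)
    n = len(y) - (m - 1) * tau
    x = np.column_stack([y[i * tau : i * tau + n] for i in range(m)])
    mid = (m - 1) // 2                   # index of the central coordinate
    filtered = y.copy()                  # endpoints remain unfiltered
    for i in range(n):
        # unit-step neighbourhood: all vectors within radius r of x_i
        mask = np.linalg.norm(x - x[i], axis=1) < r
        filtered[i + mid * tau] = x[mask, mid].mean()
    return filtered
```

As stressed in the text, m, τ, and r are tuning parameters: several passes with different values, each followed by visual inspection of the reconstructed attractor, are usually needed.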
Again consider the time series data of Example 6 (Figure 5.16.a). Applying
the local averaging noise filtering method to the signal with an embedding
dimension of m = 5, a time delay of τ = 9 sec., and a neighborhood radius
of r = 400, we obtain the filtered signal shown in Figure 5.16.b. Note how
the orbits became crisp, and how the conic shape of the attractor became
visible in the filtered signal. □
Figure 5.16. The reconstructed state space of the blood oxygen concentra-
tion signal with τ = 9 sec. projected in two dimensions (a) before filtering,
and (b) after filtering.
and then use various criteria to determine the parameters a. Thus building
an understanding of local neighborhoods, one can build up a global non-
linear model by piecing the local models to capture much of the attractor
structure.
The main departure from linear modeling techniques is to use the state
space and the attractor structure dictated by the data itself, rather than
to resort to some predefined algorithmic approach. It is likely that there
is no algorithmic solution [511] to how to choose a model structure for
chaotic systems, as the data from the dynamics dictate properties that are
characteristic for the underlying structure.
If we are going to build a continuous model, we need the time derivatives
of the measured quantities, which are generally not available. However, one
should avoid numerical differentiation whenever possible, as it amplifies
measurement noise. One remedy is to smooth the data before taking the
time derivatives. The smoothing techniques usually involve least-squares
fit of the data using some known functional form, e.g., a polynomial. In-
stead of approximating the time series data by a single (thus of high order)
polynomial over the entire range of the data, it is often desirable to replace
each data point by the value taken on by a (low order) least-squares poly-
nomial relevant to a subrange of 2M + 1 points, centered, where possible,
at the point for which the entry is to be modified. Thus, each smoothed
value replaces a tabulated value. For example, if we consider a first order
least squares fit with three points, the smoothed values, ȳ_i, in terms of the
measured values are
ȳ_1 = (5y_1 + 2y_2 − y_3) / 6    (5.33)
ȳ_i = (y_{i−1} + y_i + y_{i+1}) / 3    (5.34)
ȳ_N = (−y_{N−2} + 2y_{N−1} + 5y_N) / 6    (5.35)
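As an illustration, the classical three-point first-order least-squares smoothing (a centred average in the interior, with least-squares line values at the two endpoints) can be written in a few lines; the equation bodies in this extraction were damaged, so the endpoint formulas below are the standard ones and should be checked against the original text.

```python
import numpy as np

def smooth3(y):
    """Three-point, first-order least-squares smoothing:
    interior points become a centred 3-point average; the endpoints
    take the value of the local least-squares line."""
    y = np.asarray(y, dtype=float)
    s = y.copy()
    s[1:-1] = (y[:-2] + y[1:-1] + y[2:]) / 3.0          # interior average
    s[0] = (5.0 * y[0] + 2.0 * y[1] - y[2]) / 6.0       # left endpoint
    s[-1] = (-y[-3] + 2.0 * y[-2] + 5.0 * y[-1]) / 6.0  # right endpoint
    return s
```

Because the fit is first order, any exactly linear data set is reproduced unchanged, which is a convenient sanity check for the endpoint coefficients.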
Next, we can consider the reconstructed state space, and seek which set
of coordinates give enough information about the time derivative of each
coordinate. This is illustrated with an example.
In the four dimensional reconstructed state space of the blood oxygen con-
centration signal (Table 5.1), we would like to investigate the mutual infor-
mation contents of Table 5.2.
From an information theoretic point of view, the more state components
we compare with a given time derivative, the more information we would
gather. For example, the mutual information between ẋ1 and (x1, x2) would
be greater than or equal to the mutual information between ẋ1 and x1.
Therefore, for each ẋk, the last line of Table 5.2 would be the largest
entry. Of course, this would depend on the choice of time delay τ we use
to reconstruct the state-space. If we plot this dependence (Figure 5.17),
the mutual information contents of all time derivatives behave similarly,
making a peak around τ = 5 sec, and a dip around τ = 22 sec. For
modeling purposes, this plot suggests the use of a time delay of τ = 5 sec.
For this choice of time delay, we can investigate how information is
gathered about the time derivatives by filling out the mutual information
table (Table 5.2) and observing the information provided by various subsets
of coordinates. At this point, we again resort to our judgement about the
system, and tailor the functional dependencies of the coordinates guided by
our knowledge about the system and educated guesses. We are interested in
5.3. Model Development 231
Table 5.1. The mutual information contents to be computed for the four
dimensional reconstructed state space of the blood oxygen concentration
signal, with k = 1, 2, 3, 4. The mutual information content between the
time derivative of each x_k and the entries in the right-hand column are
computed.

ẋk  |  x1;  x2;  x3;  x4;  the six pairs xi,xj;  the four triples xi,xj,xl;
       x1,x2,x3,x4  (the row labels of Table 5.2)
Figure 5.17. The mutual information content between ẋk and x1, x2, x3, x4
versus the time delay, for k = 1, 2, 3, 4.
Table 5.2. The mutual information contents for the four dimensional re-
constructed state space of the blood oxygen concentration signal.

                 ẋ1         ẋ2         ẋ3         ẋ4
x1            2.665e-01  3.664e-01  2.493e-01  1.250e-01
x2            2.188e-01  2.679e-01  3.651e-01  2.476e-01
x3            1.507e-01  2.190e-01  2.675e-01  3.627e-01
x4            7.433e-02  1.508e-01  2.186e-01  2.665e-01
x1,x2         4.312e-01  5.043e-01  4.183e-01  3.214e-01
x1,x3         3.819e-01  5.770e-01  4.283e-01  4.098e-01
x1,x4         3.240e-01  5.125e-01  4.398e-01  3.486e-01
x2,x3         2.612e-01  4.315e-01  5.029e-01  4.152e-01
x2,x4         2.614e-01  3.829e-01  5.751e-01  4.260e-01
x3,x4         2.013e-01  2.616e-01  4.302e-01  4.999e-01
x1,x2,x3      4.533e-01  5.979e-01  5.318e-01  4.431e-01
x1,x2,x4      4.568e-01  5.800e-01  6.052e-01  4.752e-01
x1,x3,x4      4.098e-01  6.041e-01  5.422e-01  5.347e-01
x2,x3,x4      2.905e-01  4.535e-01  5.959e-01  5.287e-01
x1,x2,x3,x4   4.802e-01  6.268e-01  6.257e-01  5.562e-01
ẋ1 = f1(x1, x2),    (5.36)
ẋ2 = f2(x1, x3),    (5.37)
ẋ3 = f3(x2, x4),    (5.38)
ẋ4 = f4(x3, x4).    (5.39)
□
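The table-driven selection behind Eqs. (5.36)–(5.39), picking for each derivative the coordinate pair with the largest mutual information, is easy to automate. The sketch below is our own illustration; the dictionary layout of the table is an assumption of the example, not the book's data structure.

```python
from itertools import combinations

def best_subsets(mi, n_states=4, size=2):
    """For each derivative xdot_k, pick the coordinate subset of the given
    size with the largest mutual information content."""
    best = {}
    for k in range(1, n_states + 1):
        best[k] = max(combinations(range(1, n_states + 1), size),
                      key=lambda s: mi[(s, k)])
    return best
```

Applied to the pair rows of Table 5.2, this reproduces exactly the argument lists of Eqs. (5.36)–(5.39).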
The conventional usage of the methods introduced in this chapter in
system modeling is to reconstruct a phase space using a scalar measurement.
In the following example we will demonstrate how these concepts can be
used to narrow down the phase space using multivariate measurements from
a fermentation process [59].
(Schematic of the fermentation system; labels recovered from the extraction:
flowmeter, sample and exhaust lines, nutrient medium feed, condenser, break,
antifoam, debubbler, water line, and harvest vessel.)
variables would make a loosely quilted pattern, leaving tracks on the phase
space. At the other extreme, if the variables reveal a dense pattern yielding
no significant information about each other, (d < 0.5 and I < 0.5), these
are considered to be INDEPENDENT. If two arrays of data leave tracks on
the phase space by moderately filling it and display a considerable amount
of information about each other, and are highly correlated (0.5 < d < 1,
/ > 0.5 and p > 0.6), then one of the two arrays can be discarded in
favor of the other, since measuring both would be REDUNDANT. For other
combinations of /, d and p, two arrays of data will be considered to be
COUPLED.
Our measurement space is 9-dimensional, and our measurement vector
is composed of samples of the vector [X, D, I, S, R, Q, P, α, γ]^T. When
we compute the capacity dimension of this signal we find d = 2.98. This
capacity dimension yields a sufficient embedding dimension of n = 6. On
the other hand, due to the statistical fluctuations, we may as well have a
capacity dimension that is slightly above 3.0. In such a case, we should
be computing an embedding dimension of n = 7. However, this is not the
case, as the actual dimension of this signal must be an integer value (3 in
this case), due to the assumed non-chaotic nature of the signal. Therefore,
we choose n = 6.
This choice of embedding dimension for a 9-dimensional signal implies
that at least 3 of the entries in the measurement vector should be discarded.
In agreement with this finding, if we look at the eigenvalues of the covariance
Pair   d     I     ρ      Class
DQ    0.65  0.51   0.13   C
DR    0.71  0.60  -0.12   C
DS    0.59  0.53  -0.69   R
DX    1.26  0.58   0.26   I
Dα    0.85  0.66   0.23   C
Dγ    0.84  0.51   0.21   C
DP    1.03  0.64   0.49   I
QR    0.75  0.55   0.63   R
QS    0.28  0.47   0.25   I
QX    0.59  0.53   0.38   C
Qα    1.11  0.51  -0.36   I
Qγ    0.78  0.41  -0.07   C
QP    1.10  0.52  -0.17   I
ID    1.40  0.62   0.13   I
IQ    0.70  0.60   0.05   C
IR    0.79  0.62  -0.45   C
IS    0.59  0.51  -0.34   C
IX    0.95  0.58   0.92   R
Iα    0.91  0.62  -0.08   C
Iγ    0.96  0.53  -0.10   C
IP    1.33  0.63  -0.18   I
Rα    1.16  0.65  -0.34   I
Rγ    1.00  0.48  -0.16   I
RP    1.09  0.63  -0.09   I
SR    0.31  0.57   0.38   C
Sα    0.63  0.50  -0.53   C
Sγ    0.47  0.34  -0.26   I
SP    0.77  0.50  -0.58   C
XR    0.40  0.62  -0.07   C
XS    0.36  0.54  -0.15   C
Xα    1.01  0.61  -0.25   I
Xγ    0.93  0.52  -0.10   C
XP    1.36  0.59  -0.14   I
αP    1.19  0.64   0.50   I
γα    0.87  0.53   0.46   C
γP    0.59  0.52   0.35   C
Ẋ = f1(X, S, R, γ),    (5.42)
Ṡ = f2(X, S, R, P, α),    (5.43)
Ṙ = f3(X, S, R),    (5.44)
Ṗ = f4(S, P, γ),    (5.45)
α̇ = f5(S, α, γ),    (5.46)
γ̇ = f6(X, D, α, γ).    (5.47)
Writing down such a set of generic model equations for potential mod-
els reduces the computational effort of parameter estimation to about one-
fourth, while increasing the reliability of the model constructed by increas-
ing its degrees of freedom. In a modeling attempt with 100 data points,
this corresponds to about a four-fold increase in reliability.
Matlab [372] is arguably the most widely used interactive numerical pro-
gramming environment in science and engineering. The name is an acronym
for MATrix LABoratory. It has a symbolic computation interface using
Maple. It is a commercial product. More information can be obtained from
http://www.mathworks.com/. Platforms: Unix flavors (including Linux),
Windows and Macintosh.
XPP [146] is a package for simulating dynamical systems that can handle
differential equations, difference equations, Volterra integral equations, dis-
crete dynamical systems and Markov processes. The name is an acronym for
X-windows Phase Plane. The data structure used by XPP is compatible with
AUTO, and XPP also offers a graphical user interface for AUTO. It is free
software that can be obtained from http://www.math.pitt.edu/~bard/xpp/xpp.html.
Online documentation is also available from the same address.
Platforms: Unix flavors (including Linux).
Statistical Process
Monitoring
Monitoring and control of batch processes are crucial tasks in a wide variety
of industrial processes such as pharmaceutical processes, specialty chemi-
cals production, polymer production and fermentation processes. Batch
processes are characterized by prescribed processing of raw materials for a
finite duration to convert them to products. A high degree of reproducibil-
ity is necessary to obtain successful batches. With the advent of process
computers and recent developments in on-line sensors, more data have be-
come available for evaluation. Usually, a history of the past successful and
some unsuccessful batches exist. Data from successful batches characterize
the normal process operation and can be used to develop empirical process
models and process monitoring systems.
The goal of statistical process monitoring (SPM) is to detect the exis-
tence, magnitude, and time of occurrence of changes that cause a process to
deviate from its desired operation. The methodology for detecting changes
is based on statistical techniques that deal with the collection, classification,
analysis, and interpretation of data. Traditional statistical process control
(SPC) has focused on monitoring quality variables at the end of a batch
and if the quality variables are outside the range of their specifications,
making adjustments (hence control the process) in subsequent batches. An
improvement of this approach is to monitor quality variables during the
progress of the batch and make adjustments if they deviate from their ex-
pected ranges. Monitoring quality variables usually delays the detection
of abnormal process operation because the appearance of the defect in the
quality variable takes time. Information about quality variations is encoded
in process variables. The measurement of process variables is often highly
automated and more frequent, enabling speedy refinement of measurement
information and inferencing about product quality. Monitoring of process
variables is useful not only for assessing the status of the process, but also
244 Chapter 6. Statistical Process Monitoring
Null hypothesis: H0 : μ = μ0
Alternate hypothesis: H1 : μ ≠ μ0

First α is selected to compute the confidence limit for testing the hy-
pothesis, then a test procedure is designed to obtain a small value for β,
if possible. β is a function of sample size and is reduced as sample size
increases. Figure 6.1 represents this hypothesis testing graphically.
(Figure 6.1: sampling distributions of x̄ under the null and alternate
hypotheses, with the critical value and the α and β regions marked.)
upper control limit (UCL) and the lower control limit (LCL).
Two Shewhart charts (sample mean and standard deviation, or the
range) are plotted simultaneously. Sample means are inspected in order
to assess between-sample variation (process variability over time). Tra-
ditionally, this is done by plotting the Shewhart mean chart (x̄ chart, x̄
represents the average (mean) of x). However, one has to make sure that there
is no significant change in within-sample variation, which may give an er-
roneous impression of changes in between-sample variation. The mean
values at times t−2 and t−1 in Figure 6.3 look similar, but the within-sample
variation at time t−1 is significantly different from that of the sample at
time t−2. Hence, it is misleading to state that between-sample variation
is negligible and the process level is constant. Within-sample variations
of the samples at times t−2 and t are similar; consequently, the difference in
variation between these samples is meaningful. The Range chart (R chart) or
(Figure 6.3: individual points (○) and sample means (●) versus time.)
be used to develop the x chart (rather than the x̄ chart), and the range chart
is developed by using the "moving range" concept discussed in Subsection
6.1.3.
The assumptions of Shewhart charts are:
• The distribution of the data is approximately Normal.
• The sample group sizes are equal.
• All sample groups are weighted equally.
• The observations are independent.
Describing Variation
The location or central tendency of a variable is described by its mean,
median, or mode. The spread or scatter of a variable is described by its
range or standard deviation. For small sample sizes (n < 6, where n is the
number of observations in a sample), the range chart or the standard
deviation chart can be used. For
larger sample sizes, the efficiency of computing the variance from the range
is reduced drastically. Hence, the standard deviation charts should be used
when n > 10.
One or more observations may be made at each sampling instant. The
collection of all observations at a specific sampling time is called a sample.
The convention on summation and representation of mean values is

x̄_i = (1/n) Σ_{j=1}^{n} x_ij ,    x̿ = (1/mn) Σ_{i=1}^{m} Σ_{j=1}^{n} x_ij    (6.1)
The x chart considers only the current data value in assessing the status
of the process. Run rules have been developed to include historical infor-
mation such as trends in data. The run rules sensitize the chart, but they
also increase the false alarm probability. The warning limits are useful in
developing additional rules (run rules) in order to increase the sensitivity
of Shewhart charts. The warning limits are established at the "2-sigma" level,
which corresponds to α/2 = 0.02275. Hence,

UWL = Target + 2σ    LWL = Target − 2σ    (6.3)
The random variable R/σ is called the relative range. The parameters of its
distribution depend on sample size n, with the mean being d2. An estimate
of σ (estimates are denoted by a hat, σ̂) can be computed from the range
data by using

σ̂ = R̄ / d2

Defining

D3 = 1 − 3 d3/d2    and    D4 = 1 + 3 d3/d2    (6.8)

the control limits become

UCL = R̄ D4    and    LCL = R̄ D3    (6.9)
which are tabulated for various values of n and are available in many SPC
references and in the Table of Control Chart Constants in the Appendix.
The x̄ chart
The estimator for the mean process level (centerline) is x̿. Since the
estimate of the standard deviation of the mean process level is σ̂/√n =
R̄/(d2 √n), the control limits are

UCL = x̿ + 3R̄/(d2 √n)    LCL = x̿ − 3R̄/(d2 √n)    (6.10)
Example Consider the following data set where three measurements have
been collected at each sampling time in Table 6.1. The first twenty samples
are used to develop the monitoring charts and the last five samples are
monitored by using these charts.
Data used in the development of the SPM charts by computing the mean
and standard deviation and calculating the control limits are also plotted
to check if any of these samples are out of control. If not, the charts are
used as developed. If there are any out of control points, special causes for
such behavior are investigated. If such causes are found, the corresponding
data are excluded from the data set used for chart development and the
chart limits are computed again. Since there are no data out of control for
the first 20 samples, the charts are used as developed for monitoring the
five "new" samples.
6.1. SPM Based on Univariate Techniques 251
The overall mean, range, and standard deviation are 19.48, 2.80 and
1.18, respectively. The mean and range charts are developed by using
the overall mean and range values in Eqs. 6.10 and 6.11. The resulting
Shewhart charts are displayed in Figure 6.4. The mean of sample 22 is out
of control while the range chart is in control, indicating a significant shift in
level. Both the mean and range are out of control at sample 23, indicating
significant change in both level and spread of the sample.
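A compact sketch of the x̄ and R chart construction follows. This is an illustration, not the book's code; the constants A2, D3, and D4 are taken from standard SPC tables for subgroups of size n = 3, matching the example above.

```python
import numpy as np

# Control-chart constants for subgroup size n = 3 (standard SPC tables)
A2, D3, D4 = 1.023, 0.0, 2.574

def xbar_r_limits(data):
    """Shewhart xbar and R chart limits from an m x n array of subgroup
    data (one row per sampling time); constants above assume n = 3."""
    means = data.mean(axis=1)
    ranges = data.max(axis=1) - data.min(axis=1)
    xbb, rbar = means.mean(), ranges.mean()       # overall mean and range
    return {
        "xbar": (xbb - A2 * rbar, xbb, xbb + A2 * rbar),  # LCL, CL, UCL
        "R": (D3 * rbar, rbar, D4 * rbar),
    }
```

In practice, as described above, the limits are first computed from the reference batches, out-of-control points with assignable causes are removed, and the limits are recomputed before monitoring new samples.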
Figure 6.4. Shewhart charts for the mean (CL = x̿) and range (CL = R̄).
The S chart is preferable for monitoring variation when the sample size
is large or varying from sample to sample. Although S² is an unbiased es-
timate of σ², the sample standard deviation S is not an unbiased estimator
of σ. For a variable with a normal distribution, S estimates c4 σ, where c4
is a parameter that depends on the sample size n. The standard deviation
of S is σ √(1 − c4²). When σ is to be estimated from past data,

S̄ = (1/m) Σ_{i=1}^{m} S_i    (6.12)

and S̄/c4 is an unbiased estimator of σ. The exact values for c4 are given
in the Table of Control Chart Constants in the Appendix. An approximate
relation based on sample size n is

c4 ≈ 4(n − 1) / (4n − 3)    (6.13)
The S Chart
The control limits of the S chart are

UCL = B4 S̄    and    LCL = B3 S̄
Example The mean and standard deviation charts are developed by using
the overall mean and standard deviation values in Eqs. 6.16 and 6.18. The
resulting Shewhart charts are displayed in Figure 6.5. The means of samples
22 and 23 are out-of-control, while the standard deviation chart is out-of-
control for sample 23, providing similar results as x and R charts.
Interpretation of x Charts
The x charts must be used along with a spread chart. The process
spread must be in-control for proper interpretation of the x chart.
The x chart considers only the current data value in assessing the status
of the process. In order to include historical information such as trends in
data, run rules have been developed. The run rules sensitize the chart,
but they also increase the false alarm probability. If k run rules are used
simultaneously and rule i has a Type I error probability of α_i, the overall
Type I error probability α_total is

α_total = 1 − Π_{i=1}^{k} (1 − α_i)
Figure 6.5. Shewhart chart for mean (CL = x) and standard deviation
(CL = S).
where p is the probability that a sample exceeds the control limits, R is the
run length and E[·] denotes the expected value. For an x̄ chart with 3σ
limits, the probability that a point will be outside the control limits even
though the process is in control is p = 0.0027. Consequently, the in-control
average run length is ARL(0) = 1/p = 1/0.0027 ≈ 370. For other types of
charts such as CUSUM, it is difficult or impossible to derive ARL(0) values
based on theoretical arguments. Instead, the magnitude of the level change
to be detected is selected and Monte Carlo simulations are run to compute
the run lengths, their averages and variances.
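The Monte Carlo approach mentioned here can be sketched for the simplest case, an x̄ chart with 3σ limits and normal observations; the function names and run counts are our own choices for illustration.

```python
import numpy as np

def shewhart_run_length(shift=0.0, limit=3.0, rng=None, max_n=100000):
    """Number of samples until a Shewhart chart signals, for a given
    standardized mean shift."""
    rng = rng or np.random.default_rng()
    for i in range(1, max_n + 1):
        if abs(rng.standard_normal() + shift) > limit:
            return i
    return max_n

def monte_carlo_arl(shift=0.0, n_runs=2000, seed=0):
    """Average run length estimated by Monte Carlo simulation."""
    rng = np.random.default_rng(seed)
    return np.mean([shewhart_run_length(shift, rng=rng) for _ in range(n_runs)])
```

For shift = 0 the estimate should scatter around the theoretical 370; for a large shift the ARL collapses to a few samples, which is how the detection speed of competing charts is compared in practice.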
is plotted against the sample number i. CUSUM charts are more effective
than Shewhart charts in detecting small process shifts, since they combine
256 Chapter 6. Statistical Process Monitoring
information from several samples. CUSUM charts are effective with samples
of size 1. The CUSUM values can be computed recursively
S_i = (x_i − μ0) + S_{i−1} .    (6.22)
If the process is in-control at the target value μ0, the CUSUM S_i should
meander randomly in the vicinity of 0. If the process mean is shifted, an
upward or downward trend will develop in the plot. Visual inspection of
changes of slope indicates the sample number (and consequently the time)
of the process shift. Even when the mean is on target, the CUSUM S_i may
wander far from the zero line and give the appearance of a signal of change
in the mean. Control limits in the form of a V-mask were employed when
CUSUM charts were first proposed in order to decide that a statistically
significant change in slope has occurred and the trend of the CUSUM plot
is different than that of a random walk. CUSUM plots generated by a
computer became more popular in recent years and the V-mask has been
replaced by upper and lower confidence limits of one-sided CUSUM charts.
One-sided CUSUM charts are developed by plotting

S_H(i) = max[ 0, x_i − (μ0 + K) + S_H(i−1) ]
S_L(i) = max[ 0, (μ0 − K) − x_i + S_L(i−1) ]    (6.23)

Given the α and β probabilities, the size of the shift in the mean to be
detected (Δ), and the standard deviation of the average value of the variable
x (σ_x̄), the reference value K and the decision interval H are

K = Δ/2 ,    H = (σ_x̄² / Δ) ln[ (1 − β) / α ] ,    (6.24)

respectively. The starting values are usually set to zero, S_H(0) = S_L(0) = 0.
When S_H(i) or S_L(i) exceeds the decision interval H, the process is out-of-
control. Average Run Length (ARL) based methods are usually utilized to
find the chart parameter values H and K. The rule of thumb for ARL(Δ)
for detecting a shift of magnitude Δ in the mean when Δ ≠ 0 and Δ > K
is

ARL(Δ) = 1 + H / (Δ − K) .    (6.27)
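The tabular (one-sided) CUSUM recursions are short enough to write out directly; the sketch below is an illustration with our own naming, returning both sums and the index of the first signal.

```python
import numpy as np

def tabular_cusum(x, mu0, k, h):
    """One-sided (tabular) CUSUM: returns the high/low cumulative sums
    and the index of the first out-of-control sample (or None)."""
    sh = sl = 0.0
    SH, SL, alarm = [], [], None
    for i, xi in enumerate(x):
        sh = max(0.0, xi - (mu0 + k) + sh)   # accumulates upward shifts
        sl = max(0.0, (mu0 - k) - xi + sl)   # accumulates downward shifts
        SH.append(sh)
        SL.append(sl)
        if alarm is None and (sh > h or sl > h):
            alarm = i
    return np.array(SH), np.array(SL), alarm
```

Because each sample beyond the reference value K adds to the sum, a sustained small shift is flagged after roughly H/(Δ − K) samples, consistent with the rule of thumb in Eq. (6.27).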
(One-sided CUSUM chart of the example data versus observation number,
with the decision interval H = 2.62 marked at ±2.62.)
V(MA_t) = (1/a²) Σ_{i=t−a+1}^{t} V(x_i) = σ²/a    (6.31)

so the control limits are

UCL, LCL = μ0 ± 3σ/√a    (6.32)
In general, the span a and the magnitude of the shift to be detected are
inversely related.
Spread Monitoring by Moving-Range Charts
In a moving-range chart, the range of two consecutive sample groups of
size a are computed and plotted. For a > 2,
MRt= max(xi) - mm(x z ) , i = (t — a + l),t (6.33)
The computation procedure is:
1. Select the range size a. Often a = 2.
2. Obtain estimates of the average moving range MR̄ and σ̂ = MR̄/d₂ by
using the moving ranges MR_t of span a. For a total of m samples:

MR̄ = (1/(m − a + 1)) Σ_{t=a}^{m} MR_t   (6.34)
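A small sketch of the moving-range computation (Eqs. 6.33-6.34); the constant d₂ = 1.128 for a = 2 is a standard control-chart constant, and the data are made-up:

```python
import numpy as np

def moving_range_chart(x, a=2, d2=1.128):
    """Moving ranges of span a (Eq. 6.33), their mean (Eq. 6.34), and the
    spread estimate sigma_hat = MRbar/d2 (d2 = 1.128 is the constant for a = 2)."""
    x = np.asarray(x, dtype=float)
    m = len(x)
    MR = np.array([x[t - a + 1:t + 1].max() - x[t - a + 1:t + 1].min()
                   for t in range(a - 1, m)])       # one range per window
    MRbar = MR.mean()                               # average of the m - a + 1 ranges
    return MR, MRbar, MRbar / d2

MR, MRbar, sigma_hat = moving_range_chart([5.0, 5.2, 4.9, 5.1], a=2)
```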
weight to more recent data and has a fading memory where old data are
discarded from the average. Since the EWMA is a weighted average of
several consecutive observations, it is insensitive to nonnormality in the
distribution of the data. It is a very useful chart for plotting individual
observations (n = 1). If the x_i are independent random variables with variance
σ²/n, the variance of z_i is

σ_{z_i}² = (σ²/n) (w/(2 − w)) [1 − (1 − w)^{2i}]   (6.37)
The last term (in brackets) in Eq. 6.37 quickly approaches 1 as i increases
and the variance reaches a limiting value. Often the asymptotic expres-
sion for the variance is used for computing the control limits. The weight
constant w determines the memory of EWMA, the rate of decay of past
sample information. For w = 1, the chart becomes a Shewhart chart. As
w → 0, the EWMA approaches a CUSUM. A good value for most cases is in
the range 0.2 < w < 0.3. A more appropriate value of w for a specific ap-
plication can be computed by considering the ARL for detecting a specific
magnitude of level shift or by searching w which minimizes the prediction
error for a historical data set by an iterative least squares procedure. At
least 50 observations should be utilized in such procedures. The EWMA is also
known as geometric moving average, exponential smoothing, or first order
pole filter.
The upper and lower control limits are calculated as

UCL = μ₀ + 3σ_{z_i}
CL = z₀ = μ₀
LCL = μ₀ − 3σ_{z_i}
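The EWMA recursion and the exact-variance control limits of Eq. 6.37 can be sketched as follows (μ₀, σ, and the data are assumed example values; L = 3 gives the usual three-sigma limits):

```python
import numpy as np

def ewma_chart(x, mu0, sigma, w=0.2, L=3.0):
    """EWMA z_i = w*x_i + (1-w)*z_{i-1} with z_0 = mu0, and control limits
    built from the exact variance of Eq. 6.37 (individual observations, n = 1)."""
    z = mu0
    zs, ucl, lcl = [], [], []
    for i, xi in enumerate(x, start=1):
        z = w * xi + (1.0 - w) * z
        var = sigma ** 2 * (w / (2.0 - w)) * (1.0 - (1.0 - w) ** (2 * i))
        zs.append(z)
        ucl.append(mu0 + L * np.sqrt(var))
        lcl.append(mu0 - L * np.sqrt(var))
    return np.array(zs), np.array(ucl), np.array(lcl)

zs, ucl, lcl = ewma_chart([1.0, 0.5, -0.3], mu0=0.0, sigma=1.0, w=0.2)
```

Because the bracketed term in Eq. 6.37 approaches 1, the limits widen during the first few samples and then settle at the asymptotic value μ₀ ± 3σ√(w/(2 − w)).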
[Figure: EWMA chart of 25 observations, showing the mean and the upper and lower control limits against sample number.]
Figure 6.10. SPE and T² charts for continuous process monitoring based
on PCA, with 95% and 99% control limits and the envelope of NOC (normal
operating conditions).
puts that describe product quality are collected less frequently since these
measurements are expensive and time consuming. Although it is possible to
measure some quality variables on-line by means of sophisticated devices,
measurements are generally made off-line in the quality control laboratory.
Process data contain important information about both the quality of the
product and the performance of the process operation. PLS models can be
used in two ways:
Quality monitoring. The correlation between the process variables and
the quality variables can be determined through the PLS model. This
statistical model provides information for estimating product quality
from process data.
Statistical process control. The PLS model can also be used to quickly
detect process upsets and unexpected behavior. When an assignable
cause is detected, necessary actions can be taken to prevent any dam-
age to process performance and/or product quality.
266 Chapter 6. Statistical Process Monitoring
dimensions for monitoring while for prediction more PLS dimensions are
needed in order to improve the precision of the predictions.
Once the PLS model is built, the squared prediction error (SPE) can be
calculated for either the X or the Y block model (Eqs. 6.38 and 6.39)

SPE_{x,i} = Σ_j (x_{ij} − x̂_{ij})²   (6.38)

SPE_{y,i} = Σ_j (y_{ij} − ŷ_{ij})²   (6.39)

where x̂ and ŷ are the predicted observations in X and Y using the PLS model,
respectively, and i and j denote observations and variables in X or Y, respectively.
x̂ and ŷ in Eqs. 6.38 and 6.39 are calculated for new observations as
follows:

t_{a,new} = Σ_{j=1}^{m} x_{new,j} w_{a,j}   (6.40)

x̂_{new,j} = Σ_{a=1}^{A} t_{a,new} p_{a,j}   (6.41)

where w_{a,j} denotes the weights, p_{a,j} the loadings for the X block (process
variables) of the PLS model, t_{a,new} the scores of the new observations, and b
the vector of regression coefficients used to predict ŷ from the scores.
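As a sketch of Eqs. 6.38, 6.40 and 6.41 (the weight matrix W and loading matrix P below are made-up numbers; a full PLS implementation would apply the weights to deflated data or use the corrected weights W(PᵀW)⁻¹):

```python
import numpy as np

# Made-up weight (W) and loading (P) matrices for m = 3 variables, A = 2 components
W = np.array([[0.7, 0.1],
              [0.5, -0.3],
              [0.2, 0.9]])
P = np.array([[0.6, 0.2],
              [0.6, -0.2],
              [0.3, 0.8]])

def pls_scores_and_spe(x_new, W, P):
    """Scores t_a = sum_j x_j w_aj (Eq. 6.40 style), reconstruction
    xhat_j = sum_a t_a p_aj (Eq. 6.41), and SPE_x = sum of squared residuals (Eq. 6.38)."""
    t = x_new @ W            # scores of the new observation
    x_hat = t @ P.T          # reconstruction of the new observation
    spe = float(np.sum((x_new - x_hat) ** 2))
    return t, x_hat, spe

t, x_hat, spe = pls_scores_and_spe(np.array([1.0, 0.5, -0.2]), W, P)
```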
Multivariate control charts based on the squared prediction errors (SPE_X
and SPE_Y), biplots of the scores (t_a vs. t_{a+1}) and Hotelling's statistic
(T²) are constructed with their control limits. The control limits at significance
level α/2 for a new independent t-score under the assumption of
normality at any time interval are

±t_{n−1,α/2} s_est (1 + 1/n)^{1/2}

where n and s_est are the number of observations and the estimated standard
deviation of the score sample at the chosen time interval, and t_{n−1,α/2} is
the critical value of the Student's t-test with n − 1 degrees of freedom at
significance level α/2 [214, 435]. Hotelling's statistic (T²) for a new
independent t vector is calculated as [594]

T² = t_newᵀ S⁻¹ t_new ∼ [A(n² − 1)]/[n(n − A)] F_{A,n−A}   (6.44)
where S is the estimated covariance matrix of the PLS model scores, A the number
of latent variables retained in the model and F_{A,n−A} the F-distribution
value. The control limits on SPE charts can be calculated by an approximation
of the χ² distribution given as SPE_α = g χ²_{h,α}. This equation is
well approximated as [148, 255, 435]

SPE_α ≈ g h [1 − 2/(9h) + z_α (2/(9h))^{1/2}]³   (6.45)
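A hedged sketch of the SPE limit computation: g and h are estimated here by matching the mean m and variance v of nominal-operation SPE values (g = v/2m, h = 2m²/v, following the cited approximation), and the χ² quantile is evaluated with the Wilson-Hilferty form of Eq. 6.45:

```python
import numpy as np
from statistics import NormalDist

def spe_limit(spe_nominal, alpha=0.01):
    """Approximate SPE control limit: moment matching gives g = v/(2m) and
    h = 2m^2/v; the chi-square quantile uses the Wilson-Hilferty approximation."""
    m = float(np.mean(spe_nominal))
    v = float(np.var(spe_nominal, ddof=1))
    g, h = v / (2.0 * m), 2.0 * m ** 2 / v
    z = NormalDist().inv_cdf(1.0 - alpha)            # standard normal quantile
    return g * h * (1.0 - 2.0 / (9.0 * h) + z * np.sqrt(2.0 / (9.0 * h))) ** 3

limit = spe_limit([1.0, 2.0, 1.5, 2.5, 1.2, 1.8, 0.9, 2.2], alpha=0.01)
```

The limit sits well above the nominal mean SPE, so only unusually large residuals trigger an alarm.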
SPE (Figure 6.12(c)) and T² (Figure 6.12(e)) charts for the X block have
detected this change on time. The biplot of the latent variables also shows an
excursion from the in-control region defined by the ellipses, and the score
values return to the in-control region after the change is over (Figure
6.12(b)). The SPE of the Y block shows an out-of-control situation as well (Figure
6.12(f)). Although the disturbance is over after the 150th observation (Figures
6.12(c)-6.12(e)), product quality seems to deteriorate because the prediction
capability of the PLS model becomes poor after the 150th observation (Figure
6.12(d)), suggesting a change in the quality space that is different from
the one reflected by the PLS model.
Figure 6.12. SPM charts based on PLS model for monitoring a faulty case
(step decrease in substrate feed rate): (a) number of latent variables, (b)
biplot of latent variables, (c) SPE for the X block, (d) observed vs. predicted
product concentration, (e) T², (f) SPE for the Y block.
6.3. Data Length Equalization 271
• Curve registration
at specified time intervals and then adjusted with respect to the indicator
variable. In this technique, a measure of the maturity or percent comple-
tion of any batch is provided by the percentage of its final value that has
been attained by the indicator variable at the current time. Several suc-
cessful applications of this approach can be found in the literature, mostly
for batch/semi-batch polymerization processes, reaction extent or percent
of component fed being the indicator variables [296, 418]. An application
for fermentation processes has also been given in the literature [522].
Choosing an indicator variable in batch fermentation processes depends
on the process operation and characteristics. If the process is a batch fer-
mentation, the choice of this variable is simpler than processes with batch
and fed-batch phases. For batch fermentations, there may be several vari-
ables, which can serve as indicator variables such as substrate concentra-
tion, product concentration or product yield. In the fed-batch case, in
addition to the aforementioned variables, percent substrate fed is also an
indicator variable. This percentage is calculated by fixing the total amount
of substrate added into the fermenter based on some performance crite-
ria. This end point (total amount of substrate fed), which is eventually
reached in all batches, defines a maturity point. For more complex oper-
ations such as batch operation followed by fed-batch operation, which is
very common for non-growth associated products such as antibiotics, dif-
ferent approaches to choosing indicator variables can be considered. Batch
and fed-batch phases of the operation can be treated separately so that
appropriate indicator variables can be determined for individual phases.
Implementation of this two-phase operation is illustrated in the following
example.
Example. Assume that data are available from 5 runs of a batch fol-
lowed by fed-batch penicillin fermentation. Potential process variables are
shown in Figure 6.14 for all batches before data pretreatment. Based on
simulation studies, data were collected using a 0.2 h sampling interval on
each variable for each batch resulting in total batch lengths varying be-
tween 403.8 h (2019 observations) and 433.6 h (2168 observations). When
these variables are assessed for use as an indicator variable, none of them
seem appropriate. Most of these variables contain discontinuities because of
the two operating regions (batch and fed-batch) and some of them are not
smooth or monotonically increasing/decreasing. Since none of the variables
can be chosen as an indicator variable that spans the whole duration of fer-
mentation, a different approach is suggested. The solution is to look for
different indicator variables for each operating region. In order to achieve
this mixed approach fermentation data are analyzed. For the first operating
(batch operation) region, substrate concentration in the fermenter can be
Figure 6.14. Output variables for the five batches. S: Substrate conc.,
DO: Dissolved oxygen conc., X: Biomass conc., P: Penicillin conc., V: Culture
volume, CO₂: CO₂ conc., T: Temperature in the fermenter and Q:
Generated heat [62, 603].
considered as a good candidate since it can be started from the same initial
value and terminated at the same final value for each batch. The initial and
final substrate concentrations are fixed to 15 g/L and 0.4 g/L, respectively
to implement this idea. Instead of reporting data as a function of time for
these batches, data are reported on each variable for each batch at every
decrease of 0.5 g/L in substrate concentration using linear interpolation.
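The re-reporting of trajectories against the indicator variable can be sketched with linear interpolation; the function below is illustrative (its name and defaults are assumptions), using the 15 g/L to 0.4 g/L substrate range with 0.5 g/L decrements described above:

```python
import numpy as np

def resample_on_indicator(ind, data, start=15.0, stop=0.4, step=0.5):
    """Report each variable (columns of `data`) at every `step` decrease of a
    monotonically decreasing indicator variable, using linear interpolation
    (np.interp needs an increasing axis, hence the flips)."""
    grid = np.arange(start, stop, -step)              # 15.0, 14.5, ..., 0.5 g/L
    ind = np.asarray(ind, dtype=float)
    data = np.atleast_2d(np.asarray(data, dtype=float))
    cols = [np.interp(grid[::-1], ind[::-1], data[::-1, j])[::-1]
            for j in range(data.shape[1])]
    return grid, np.column_stack(cols)

# Each batch now has one row per 0.5 g/L of substrate consumed
grid, out = resample_on_indicator(np.array([15.0, 10.0, 5.0, 0.4]),
                                  np.array([[30.0], [20.0], [10.0], [0.8]]))
```

Because every batch is reported on the same indicator grid, batches of different durations end up with equal data lengths.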
Choosing substrate concentration decrease (substrate consumption) as
an indicator variable for the batch operation provides another advantage:
it defines the end of batch operation, in other words the switching point
to fed-batch operation. Since the operating conditions are slightly different
and there are some random changes in microbial phenomena, the switch-
ing point is reached at different times for each batch resulting in different
[Figure: variable profiles for the batches reported (a) against time for the
batch phase and (b) against indicator-variable sample number for the batch
and fed-batch phases, after data length equalization.]
patterns between reference data and the new data are matched resulting
in the same data length as the reference data [252, 533]. Basic descriptions
of DTW and different algorithms for implementing it have been reported
[252, 406, 533, 485].
One of the pioneering implementations of DTW to bioprocesses was
suggested by Gollmer and Posten [197] on the detection of important pro-
cess events including the onset of new phases during fermentations. They
have provided a univariate scheme of DTW for recognition of phases in
batch cultivation of S. cerevisiae and detection of faults in fed-batch E.
coli cultivations. Another application of DTW (by Kassidas et al. [270])
has focused on batch trajectory synchronization/equalization. They have
provided a multivariate DTW framework for both off-line and on-line time
alignment and discussed a case study based on polymerization reactor data.
To introduce the DTW theory, consider two sets of multivariate observations:
a reference set R of dimensions M × P and a test set T of
dimensions N × P (Eq. 6.48). These sets can be formed from any multivariate
observations of fermentation processes (or batch processes in general),
where j and i denote the observation indices in R and T, respectively,
and P is the number of measured variables in both sets, p = 1, 2, ..., P.

R(j, p): Reference Set, j = 1, 2, ..., M
T(i, p): Test Set, i = 1, 2, ..., N

R = [r_{1p}, r_{2p}, ..., r_{jp}, ..., r_{Mp}]
T = [t_{1p}, t_{2p}, ..., t_{ip}, ..., t_{Np}]   (6.48)
Data lengths N and M will not be equal most of the time because the
operating time is usually adjusted by the operators to get the desired prod-
uct quality and yield in response to variations in input properties for each
batch run and the randomness caused by complex physiological phenomena
inherent in biochemical reactions. This problem could be overcome using
linear time alignment and normalization based on linear interpolation or
extrapolation techniques. Let i and j be the time indices of the observations
in the T and R sets, respectively. In linear time normalization, the
dissimilarity between T and R for any variable trajectory is simply defined
as

D(T, R) = Σ_{i=1}^{N} d(i, j),   j = (M/N) i   (6.49)
Figure 6.18. Linear time alignment of two trajectories with different durations
[484]; local timing differences fall in the temporal fluctuation region
around the diagonal.
Note that the dissimilarity measure d(t_i, r_j) between T and R is denoted as
d(i, j) for simplicity of notation in Eq. 6.49. Hence, the distortion measure
assessment will take place along the diagonal straight line of the rectangular
(t_i, r_j) plane shown in Figure 6.18. Linear time normalization implicitly
assumes that the temporal trajectory variations are proportional to the
duration of the batch (or the number of samples made on each variable).
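Before moving to nonlinear warping, linear time alignment itself can be sketched in a few lines: the test trajectory of length N is mapped onto the reference length M through the proportional index j = (M/N)i, implemented by interpolation:

```python
import numpy as np

def linear_time_align(t_sig, M):
    """Linearly stretch/compress a length-N test trajectory to length M
    using the proportional (diagonal-path) index relation j = (M/N) i."""
    N = len(t_sig)
    return np.interp(np.linspace(1, N, M),       # proportional time axis
                     np.arange(1, N + 1),
                     np.asarray(t_sig, dtype=float))

aligned = linear_time_align([0.0, 1.0, 2.0, 3.0], M=7)
```

This global stretching is exactly the diagonal-path assumption; DTW replaces it with a locally adaptive path.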
However, since the timing differences between the two batches will be local
and not global, a more general time alignment and normalization scheme
would be appealing, including the use of nonlinear warping functions that
relate the indices of the variables in two trajectory sets to a common "normal"
time axis k. Time warping has been developed to deal with these
issues by using the principles of dynamic programming (that is why it is
called dynamic time warping) [252, 485, 533].
The objective in time warping is to match the elements of each pattern
(trajectory in our case) T and R so as to minimize the discrepancy in each
pair of samples. Similar events will be aligned when this is achieved using
a nonlinear warping function. This function will shift some feature vectors
in time, compress and/or expand others to obtain minimum distances. For
each vector pair in T and R, DTW is performed on an M × N grid under
a number of constraints. An example of pattern matching between two
vectors is shown in Figure 6.19.
280 Chapter 6. Statistical Process Monitoring
Figure 6.19. Nonlinear time warping process details. The point at (5,4)
aligns t(5) with r(4).
The most commonly used local distance for multivariate data is the weighted
quadratic distance

d[c(k)] = [t_{i(k)} − r_{j(k)}]ᵀ W [t_{i(k)} − r_{j(k)}]   (6.54)

where W is a positive definite weight matrix whose diagonal entries w_p show
the relative importance of each variable p.
The warping path C = {c(1), c(2), ..., c(K)} consists of grid points

c₁(k) = i(k),   c₂(k) = j(k)

and the normalized total distance along the path is

D[i(k), j(k)] = D(t, r) = (1/N(w)) Σ_{k=1}^{K} d[c(k)] w(k)   (6.55)

where D[i(k), j(k)] = D(t, r) is the normalized total distance between the two
trajectories along the path of length K, w(k) is a weighting function for the
local distances and N(w) is a normalization factor which is a function of
the weighting function. Now, the problem is reduced to an optimization
problem that can be written as

D*(C*) = min_C (1/N(w)) Σ_{k=1}^{K} d[c(k)] w(k)   (6.56)
where D*(C) denotes the minimum normalized total distance and C* the
optimal path. This optimization problem can be efficiently solved by using
dynamic programming techniques [49, 53]. Dynamic programming is a well
known optimization technique used extensively in operations research for
solving sequential decision problems. The decision rules that determine
the next point (location) to be visited following a current point i are called
the "policy". Dynamic programming determines the policy that leads to the
minimum cost, moving from point 1 to point i based on the Principle of
Optimality defined by Bellman [49] as
An optimal policy has the property that, whatever the initial state
and decision are, the remaining decisions must constitute an optimal
policy with regard to the state resulting from the first decision.
For the time normalization problem this principle can be recast as follows [270,
406, 484]:
1. A globally optimal path is also locally optimal. If C* is determined
as the optimal path, any (i,j) point on C* is also optimal.
2. The optimal path to the grid point (i,j) only depends on the values
of the previous grid points.
Before delving into the integration/formulation of dynamic programming
and time warping, constraints on the warping function must be discussed.
2. Monotonicity Conditions.
The temporal order of the measurements collected on each variable is
of crucial importance to their physical meaning. Hence, imposing a
reasonable monotonicity constraint (monotonically nondecreasing sequence
requirement) to maintain the temporal order while performing
DTW is necessary:

i(k − 1) ≤ i(k),   j(k − 1) ≤ j(k)   (6.60)
If (i, j) is the kth path point in the grid shown in Figure 6.21, then
the previous path point c(k − 1) can only be chosen from a set of
preceding points (Eq. 6.60). In this simple example, [i(k), j(k)] can
only be reached from [i(k), j(k − 1)], [i(k − 1), j(k − 1)] or
[i(k − 1), j(k)]. This is also known as the "no slope constraint" case. Obviously,
the control of the slope would be of importance for the correct
alignment. Sakoe and Chiba [533] have proposed a slope constraint
on the warping function using a slope intensity measure P = q/p.
[Figure: (a) minimum slope and (b) maximum slope of the warping path under the slope constraint P = q/p.]
w(k). This function depends only on the local path and controls the
contribution of each local time distortion d[i(k), j(k)].
Figure 6.24. Local continuity constraints studied by Myers et al. [406, 533].
(c) (d)
Figure 6.25. Sakoe-Chiba slope weightings for Type III local continuity
constraint studied by Myers et al. [406, 533].
For instance, when Type (c) and Type (d) slope weighting constraints
are used, the overall normalization factors would be

N(w) = Σ_{k=1}^{K} w(k) = i(K) − i(0) = N   (6.68)

and

N(w) = Σ_{k=1}^{K} w(k) = i(K) − i(0) + j(K) − j(0) = N + M   (6.69)

respectively.
pression) can be defined using the two parameters Q_max and Q_min:

Q_max = max_t ( Σ_{k=1}^{K_t} [i_t(k) − i_t(k − 1)] / Σ_{k=1}^{K_t} [j_t(k) − j_t(k − 1)] )   (6.70)

Q_min = min_t ( Σ_{k=1}^{K_t} [i_t(k) − i_t(k − 1)] / Σ_{k=1}^{K_t} [j_t(k) − j_t(k − 1)] )   (6.71)

where the maximum and minimum are taken over all legal sequences t of
local path moves of length K_t. These parameters define the global path
constraints

1 + (i(k) − 1)/Q_max ≤ j(k) ≤ 1 + Q_max (i(k) − 1)   (6.73)

M + Q_max (i(k) − N) ≤ j(k) ≤ M + (i(k) − N)/Q_max   (6.74)
Eq. 6.73 defines the range of the points that can be reached using a
legal path based on a local constraint from the beginning point (1,1).
Likewise, Eq. 6.74 specifies the range of points that have a legal
path to the ending point (N, M) defined by Itakura [252] (Itakura
constraints). Figure 6.27 shows the effects of the global constraints
on the optimal search region defined by the parallelogram (Itakura
constraints) in the (N, M) grid. An additional global path constraint
has been proposed by Sakoe and Chiba [533] as

|i(k) − j(k)| ≤ K₀   (6.75)

where K₀ denotes the maximum allowable absolute temporal difference
between the two variable trajectories at any given sampling
instance. This constraint further decreases the search range as well as
the potential misalignments by trimming off the edges of the paral-
lelogram in the grid.
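The global constraints of Eqs. 6.73-6.75 amount to a feasibility test for each grid point; the function below is an illustration (names and the Q_max = 2 default are assumptions), not the book's algorithm:

```python
def in_search_region(i, j, N, M, Q_max=2.0, K0=None):
    """Feasibility test for grid point (i, j): inside the Itakura parallelogram
    (Eqs. 6.73-6.74) and, optionally, the band |i - j| <= K0 (Eq. 6.75)."""
    lo = max(1 + (i - 1) / Q_max,          # reachable from (1, 1) by a legal path
             M + Q_max * (i - N))          # can still reach (N, M) legally
    hi = min(1 + Q_max * (i - 1),
             M + (i - N) / Q_max)
    ok = lo <= j <= hi
    if K0 is not None:
        ok = ok and abs(i - j) <= K0       # Sakoe-Chiba band
    return ok
```

Pruning infeasible points before the dynamic programming pass reduces both the computation and the chance of pathological alignments.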
Table 6.4. Allowable local path specifications and associated Q_max and
Q_min values for different types of local continuity constraints given in Figure
6.24 [484]. Types II-IV: Q_min = 1/2; Type V: Q_max = 3, Q_min = 1/3;
Type VI: Q_max = 3/2, Q_min = 2/3; Type VII: Q_min = 1/3.
rithms. The two most commonly used ones are Types (c) and (d)
Figure 6.27. Global path constraints on the search grid (N × M) [484] for
Q_max = 2 and K₀ = 2N − M, given that K₀ ≥ |i(K) − j(K)|.
Consequently, the optimal path will pass through each point on t but may
skip some on r.
A symmetric algorithm, however, will transform both time axes onto a
temporarily defined common axis with a common index, k. In this case,
the optimal path will go through all the points in both trajectory sets. Fur-
thermore, for the same reference trajectory set, the number of points in the
optimal path will be different for each new test trajectory set. Although
each test trajectory individually will be synchronized with the reference
trajectory, they will not be synchronized with each other, resulting in a set
of individually synchronized batch trajectories with unequal batch lengths.
If an asymmetric DTW is used, it will skip some points in the test trajec-
tory but will produce synchronized (with reference and each other) batch
trajectories having equal length with the reference set. Depending on the
choice of the reference set, some of the inconsistent features in T that may
cause false alarms in statistical process monitoring will be left out. In order
to compromise between the two extremes, solutions that are presented in
the following sections have been suggested [270].
where D(C_k) = D_A(i, j). Eq. 6.80 defines the accumulated distance that is
comprised of the cost of the particular point [i(k), j(k)] itself and the cheapest
cost path associated with it. The second term in Eq. 6.80 requires a
decision on the predecessor. In Table 6.5, dynamic programming recursion
equations are summarized for different local continuity constraints when
Type (d) slope weighting is used. Note that slope weightings of the paths in
Table 6.5 are smoothed according to Figure 6.26. To illustrate the progress
of dynamic programming procedure, assume that Type (d) slope weighting
(symmetric) in Eq. 6.66 and Type III local continuity constraint (Figures
6.24(c) and 6.25(d)) are used (Figure 6.24).
During the forward phase, the transition cost to the accumulated distance
at point (i, j) can be found by solving the following simple minimization
problem (dynamic programming recursion equation)

D_A(i, j) = min{ D_A(i − 2, j − 1) + (3/2)[d(i − 1, j) + d(i, j)],
                 D_A(i − 1, j − 1) + 2 d(i, j),
                 D_A(i − 1, j − 2) + (3/2)[d(i, j − 1) + d(i, j)] }   (6.81)

The local continuity constraints chosen above mean that the point (i, j) can
only be reached from points (i − 2, j − 1), (i − 1, j − 1) or (i − 1, j − 2),
as shown in Figure 6.24(c). To initialize the iterative process, D_A(1, 1) can
be assigned the value 2 d(1, 1). The forward phase finishes when point [i(K), j(K)]
is reached and the minimum normalized distance, D*(C), in Eq. 6.56 is
computed (note that N(w) = i(K) + j(K) for Type (d) slope weighting).
At this point the second phase, the reconstruction of the optimal
path, starts. Starting from point [i(K), j(K)] (say i(K) = N and
j(K) = M) in the search grid and using the stored information on optimal
transitions, first the predecessor of point (N, M) is located, then the
predecessor of the latter is identified. This is repeated until point (1, 1) is
reached. At the end of the second phase, the optimal path is reconstructed
and, as a consequence, the pattern indices are matched accordingly.
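To make the two-phase procedure concrete, here is a minimal DTW sketch using the basic no-slope-constraint local paths (1,0), (0,1), (1,1) with w(k) = 1, rather than the Type III/Type (d) combination discussed above:

```python
import numpy as np

def dtw(t_sig, r_sig):
    """Minimal symmetric DTW: forward accumulation of D_A over the grid,
    then backward reconstruction of the optimal path from (N, M) to (1, 1)."""
    N, M = len(t_sig), len(r_sig)
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            d = (t_sig[i - 1] - r_sig[j - 1]) ** 2        # local distance
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, i, j = [], N, M                                  # backward phase
    while i > 0 and j > 0:
        path.append((i, j))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[N, M], path[::-1]

dist, path = dtw([0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.0])
```

Here the path dwells at the repeated reference value so that similar features are matched even though the signals have different lengths.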
Example. Consider two artificial signals to illustrate the DTW algorithm:
r is a (1 × 60) reference signal and t is a (1 × 50) test signal (Figure 6.28(a)).
The boundary conditions are assumed as

r(0) = t(0) = 0 and j(M) = 60, i(N) = 50.   (6.82)
Table 6.5. Accumulated distance formulations used for some of the local
constraints for the Type (d) slope weighting case [406, 484]: the dynamic
programming recursion D_A(i, j) = min{·} for local continuity constraint
Types I-IV and the Itakura constraint, where the diagonal (1,1) move
contributes 2 d(i, j).
Type V local continuity constraint was used along with a Type (d) slope
weighting function to produce the results in Figure 6.29 (see Figure 6.24(e)
and Table 6.4 for definitions). For simplicity, the Sakoe-Chiba band global
path constraint was applied, as shown by the shaded region in Figure 6.29.
The resulting synchronized signals are now comparable since the similar
features were aligned by DTW (Figure 6.28(b)). The signal magnitudes at
the end points were different, and DTW has preserved this difference while
adjusting the time scale.
Figure 6.28. (a) Reference r and test t signals before DTW; (b) reference r
and test t signals after DTW.
based technique. One can calculate the means and standard deviations for
each variable in each batch trajectory set, take the average of those, and
use the averages to autoscale the set of trajectories to a common y-axis scale;
this was the approach used in the following example in this book. The average
mean and standard deviation should be stored for scaling and rescaling of future
batches. The iterative synchronization procedure that will be presented here is an
adaptation of the approach of Kassidas and co-workers [270].
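The common autoscaling step can be sketched as follows (an illustration, not the book's code): per-batch means and standard deviations are averaged across batches, and the averages are stored for scaling future batches:

```python
import numpy as np

def common_autoscale(trajectory_sets):
    """Autoscale a list of (N_l x P) batch trajectory matrices: average the
    per-batch means and standard deviations, scale every batch with those
    common values, and return them for rescaling future batches."""
    means = np.mean([T.mean(axis=0) for T in trajectory_sets], axis=0)
    stds = np.mean([T.std(axis=0, ddof=1) for T in trajectory_sets], axis=0)
    scaled = [(T - means) / stds for T in trajectory_sets]
    return scaled, means, stds

scaled, means, stds = common_autoscale([np.array([[0.0], [2.0]]),
                                        np.array([[2.0], [4.0]])])
```

Using one common mean and standard deviation per variable keeps the batches comparable after scaling, which per-batch autoscaling would not.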
Consider a set of trajectories T_ℓ, ℓ = 1, ..., L, from normal operation. Each
trajectory set in these batches will contain unequal data lengths as well as
unsynchronized trajectories. Each T_ℓ is a matrix of size N_ℓ × P, where N_ℓ is the
number of observations and P is the number of variables, as given in Eq.
6.48. It is also assumed that a reference batch trajectory set R_f (M × P)
has been defined. The objective is to synchronize each T_ℓ with R_f.
After scaling and choosing one of the batches as reference batch run,
the next step becomes deciding on which DTW algorithm to implement.
If a symmetric algorithm is used, the resulting trajectories will be of equal
length that is greater than the length before synchronization since a sym-
metric algorithm projects the time indices of both test and reference trajec-
tories onto a common time scale. After each round of synchronization for
each batch, the resulting batch lengths will be different: each
test batch will be synchronized with the reference batch individually, but
not with the other test batches. However, if one chooses to implement an asymmetric
algorithm, the optimal path will go through each point on the reference
batch run but could skip some points on the test set. The resulting tra-
jectories will be of equal length with the reference set and each test set
will be synchronized with each other. Since the synchronized trajectories
may not contain all the data points that were in the original trajectories
before synchronization, some inconsistent features of the test trajectories
may be left out. A combined algorithm (a symmetric DTW followed by an
asymmetric DTW) has been proposed by Kassidas et al. to compromise
between the two extremes [270].
According to their proposition, conventional symmetric DTW is first
applied to each batch trajectory set. The resulting expanded trajectories
are then exposed to an asymmetric synchronization step. If more than
one point of T is aligned with one point of R_f, they suggested taking the
average of these points of T and aligning this average point with the particular
point of R_f [270]. As shown in Figure 6.30 for one variable, both t_i and
t_{i+1} are aligned with the same r_j of the reference trajectory. In this case,
the following averaged point is aligned with r_j instead
[Figure: flowchart of the iterative synchronization procedure: choose a reference batch R_f = T_ℓ, then define the boundary conditions for DTW.]
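The averaging rule for the asymmetric step can be sketched as follows; `path` holds 1-based (i, j) index pairs from the symmetric DTW pass, and the example values are made up:

```python
import numpy as np

def asymmetric_average(path, t_sig):
    """Average all test points t_i aligned with the same reference index j,
    so the synchronized trajectory has one value per reference point."""
    groups = {}
    for i, j in path:                       # path holds 1-based (i, j) pairs
        groups.setdefault(j, []).append(t_sig[i - 1])
    return np.array([np.mean(groups[j]) for j in sorted(groups)])

# t_2 and t_3 are both aligned with r_2, so their average is reported for r_2
sync = asymmetric_average([(1, 1), (2, 2), (3, 2), (4, 3)],
                          [10.0, 20.0, 30.0, 40.0])
```

Every synchronized test trajectory then has exactly the reference length M, so the batches are synchronized with the reference and with each other.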
Figure 6.33. Percent weight change for the variables after five iterations.
where D⁻¹ is the integration operator and C₀ and C₁ are arbitrary constants
[495]. Imposing the constraints h(0) = 0 and h(T₀) = T_i gives C₀ = 0
and C₁ = T_i/[{D⁻¹ exp(D⁻¹q)}(T₀)]. Hence, h depends on q. The time
warping function h can be estimated by minimizing a measure F_i of the fit
of x_i[h_i(t)] to y. A penalty term in F_i based on q permits the adjustment
of the smoothness of h_i [495]. To estimate the warping function h_i, one
minimizes

F_i = Σ_j ∫ α_j(t) {y_j(t) − x_ij[h_i(t)]}² W_j dt + η ∫ q(t)² dt   (6.90)
where the W_j are weight matrices [495]. Weight matrices W_j allow for general
weighting of the elements and weight functions α_j(t) permit unequal
weighting of the fit to a certain target over time [495]. Parameter η adjusts
the penalty on the degree of smoothness, and q is expressed as a linear
combination of B-spline bases

q(t) = Σ_{k=0}^{K} c_k B_k(t)
B-splines are used in this study as the polynomial basis for performing
curve registration because calculating the coefficients of the polynomial
is well defined. When estimating the solution to transforming particular
waveforms into the B-spline domain, the required number of calculations
increases linearly with the number of data points [494]. The derivative of
F with respect to the B-spline coefficient vector c is obtained through the
chain rule,

∂F/∂c = Σ_{j=1}^{N} (∂h(t_j)/∂c) (∂F/∂h)
[Figure: reference and test pH profiles over a 500 h fermentation, with three landmarks (1, 2, 3) marked on the test trajectory.]
before it will be stretched to align the estimated landmark. After the first
landmark, the warped curve lies above the center dashed-line, indicating
that the values after the first landmark location need to be compressed to
align the second and third landmarks with respect to the mean-value land-
marks. This makes intuitive sense, because an estimated landmark that is
advanced before a mean-value landmark must shift the data in a direction
that will align similar process events.
The on-line optimization procedure sequentially searches for the optimal
location of the landmarks of the test data with respect to the reference
mean landmarks. The following procedure is given for the mixed landmarks
case and can be modified to implement in an adaptive hierarchical
PCA framework for online SPM:

1. Initialize the estimated landmarks vector L_i.

2. For i = 1, ..., m (m: number of landmarks in the reference set).

3. Collect values of the test trajectory that contain landmark information
up to time K. Choose the time K_i for the ith landmark so that it will span
the reference landmarks range as K_i ≥ argmax(ℓ_i).
[Figure: trajectories of dissolved oxygen, biomass, pH, and the base and acid flow rates over a 500 h fermentation batch.]
similar line of thought can be followed for detecting the temporal location
of the third landmark which is the start of the death phase towards har-
vesting the fermentation. At this phase, biomass concentration begins to
decline, resulting in a decrease in the hydrogen ion concentration level. Note
that a set-point gap is defined for the acid flow rate controller action to avoid
excessive acid addition during the simulations, resulting in a small increase
in pH right after the third landmark [61]. Therefore, the instant when acid
addition takes place after stationary phase can be used to determine the
location of the third landmark and the beginning of the death phase.
Once the decision about the choice of the variables that contain landmark
information is made, these variables (biomass concentration (X), base
(F_base) and acid (F_acid) flow rates in Figure 6.36) are investigated in each
batch of the reference set and the landmark locations are stored. In this
example, the reference landmark locations matrix ℓ_m is of size (3 × 40). Note
Figure 6.38. Variable profiles (a) before alignment and (b) after alignment;
(c) warping functions; (d) deformation functions.

Figure 6.39. Unregistered and registered variable profiles.
Variable profiles in the reference batch set are aligned similarly, so that the same set of nonlinear warping functions (Figure 6.38(c)) is used to align the rest of the variable profiles. Since the same physiological events affect most variables, such as concentration profiles, their landmark locations overlap in time (Figure 6.39).
Online alignment of variable profiles of a new batch using landmark registration
After aligning the reference batch profiles, the necessary information is available for implementing the alignment procedure for a new batch online in real-time. The iterative online landmark estimation procedure described earlier is used in this example. The necessary information from reference batch alignment includes the reference landmark locations matrix ℓ_m and its mean vector, and the aligned reference set to calculate average profiles. Since a combined landmark location vector is used from different process variables, the corresponding reference profile is used as a comparative template while implementing the online procedure. For instance, the inflection point in the biomass concentration profile determines the location of the first landmark.
314 Chapter 6. Statistical Process Monitoring
Figure 6.41. Quality variables measured at the end-of-batches for the ref-
erence set (listed in Table 6.7).
$\pm\, t_{n-1,\alpha/2}\, s_{\mathrm{ref}} \left(1 + \frac{1}{n}\right)^{1/2}$   (6.95)

where n and s_ref are the number of observations and the estimated standard deviation of the t-score sample at a given time interval k (the mean is always 0), and $t_{n-1,\alpha/2}$ is the critical value of the Studentized variable with n − 1 degrees of freedom at significance level α/2 [435].
The axis lengths of the confidence ellipsoids in the direction of the ath principal component are given by [262]

$D = \mathbf{t}_a^T \mathbf{S}^{-1} \mathbf{t}_a\, \frac{I}{(I-1)^2} \sim B_{(A/2,\,(I-A-1)/2)}$   (6.97)

where t_a is a vector of A scores [254] and S is the (A × A) estimated covariance matrix, which is diagonal due to the orthogonality of the t-scores [594]. The statistic in Eq. 6.97 is called the D statistic.
[Figure: trajectories of process variables Var #4–Var #14 versus number of samples.]
$D_i = \sum_{a=1}^{A} \frac{t_{ia}^2}{s_a^2}\, \frac{I}{(I-1)^2}$   (6.98)

which is compared against the Beta-distribution limit $B_{(A/2,\,(I-A-1)/2)}$.   (6.99)
6.4. Multivariable Batch Processes 323
$Q_i = \mathbf{e}_i \mathbf{e}_i^T = \sum_{c=1}^{JK} e_{ic}^2$   (6.102)

where e_i is the ith row of E, I is the number of batches in the reference set, A is the number of PCs retained in the model, and t_a is a vector of A scores [254].
Statistical limits on the Q-statistic are computed by assuming that the data have a multivariate normal distribution [253, 254]. The control limits for the Q-statistic are given by Jackson and Mudholkar [255] based on Box's [76] formulation for quadratic forms. Box's approximation states that

$Q \sim g\chi_h^2$   (6.104)

and the control limit with significance level α is

$Q_\alpha = \theta_1 \left[ \frac{h_0 c_\alpha \sqrt{2\theta_2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} \right]^{1/h_0}$   (6.105)

where $c_\alpha$ is the standard normal deviate at significance level α.
$g = \theta_2/\theta_1, \qquad h = \theta_1^2/\theta_2,$

$h_0 = 1 - \frac{2\theta_1\theta_3}{3\theta_2^2}.$   (6.107)
The θ_i's can be estimated from the estimated covariance matrix of residuals (the residual matrix used in Eq. 6.103) for use in Eq. 6.105 to develop control limits on Q for comparing residuals of batches. Since the covariance matrices EᵀE (JK × JK) and EEᵀ (I × I) have the same non-zero eigenvalues [435], EEᵀ can be used in estimating the θ_i's due to its smaller size for covariance estimation as

$V = \frac{EE^T}{I-1}, \qquad \theta_i = \mathrm{trace}(V^i), \quad i = 1, 2, 3.$   (6.108)
A simplified approximation for Q-limits has also been suggested in [148] by rewriting Box's equation (Eq. 6.104) with the assumption $\theta_2^2 \approx \theta_1\theta_3$. Eq. 6.105 can be used together with Eq. 6.108 to calculate control limits for the sum of squared residuals when comparing batches (Q_i in Eq. 6.103).
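Eqs. 6.105 and 6.108 can be combined in a short sketch; `scipy` is assumed available for the normal deviate $c_\alpha$, and `q_limit` is a hypothetical helper name:

```python
import numpy as np
from scipy.stats import norm

def q_limit(E, alpha=0.01):
    """Jackson-Mudholkar control limit on Q from the residual
    matrix E (I x JK). Sketch of Eqs. 6.105 and 6.108:
    theta_i = trace(V**i) with V = E E^T / (I - 1)."""
    I = E.shape[0]
    V = E @ E.T / (I - 1)
    t1, t2, t3 = (np.trace(np.linalg.matrix_power(V, i)) for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * t1 * t3 / (3.0 * t2 ** 2)
    c = norm.ppf(1.0 - alpha)  # standard normal deviate c_alpha
    return t1 * (h0 * c * np.sqrt(2.0 * t2) / t1
                 + 1.0 + t2 * h0 * (h0 - 1.0) / t1 ** 2) ** (1.0 / h0)
```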
In order to calculate SPE values throughout the batch as soon as the batch is complete, Eq. 6.110 is used for each observation at measurement time k [435]:

$\mathrm{SPE}_k = \sum_{j=1}^{J} e(k,j)^2.$   (6.110)

Calculated SPE values for each time k using Eq. 6.110 follow a χ² (chi-squared) distribution (Eq. 6.104, [76]). This distribution can be
approximated at each time interval k by matching moments:

$g = \frac{v}{2m}, \qquad h = \frac{2m^2}{v}$

where m and v are the estimated mean and variance of the SPE at a particular time interval k, respectively. It was reported that these matching moments are susceptible to error in the presence of outliers in the data or when the number of observations is small. Outliers should be eliminated as discussed in Section 3.4.2.
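The moment-matching step can be sketched as follows (hypothetical helper name; `scipy` supplies the chi-squared quantile):

```python
import numpy as np
from scipy.stats import chi2

def spe_limit(spe_ref_k, alpha=0.01):
    """Moment-matched chi-squared control limit for SPE at one
    time interval k, from reference-batch SPE values: matches
    g*chi2_h to the sample mean m and variance v, with
    g = v/(2m) and h = 2m**2/v."""
    m = np.mean(spe_ref_k)
    v = np.var(spe_ref_k, ddof=1)
    g, h = v / (2.0 * m), 2.0 * m ** 2 / v
    return g * chi2.ppf(1.0 - alpha, h)
```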
Contribution plots are used for fault diagnosis. Both T² and SPE charts produce an out-of-control signal when a fault occurs, but they do not provide any information about its cause. Variable contributions to T² and SPE values indicate which variable(s) are responsible for the deviation from normal operation. The T² statistic monitors the systematic variation and the SPE statistic monitors the residual variation. Hence, in the case of a process disturbance, either of these statistics will exceed its control limits. If only the T² statistic is out of control, the model of the process is still valid, but the contributions of each process variable to this statistic should be investigated to find a cause for the deviation from normal operation. If SPE is out of control, a new event is found in the data that is not described by the process model. Contributions of each variable to SPE will unveil the variable(s) responsible for that deviation.
Contribution plots are discussed in more detail as a fault diagnosis
tool in Section 8.1.
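For SPE, the contributions are simply the squared residuals of each variable at the flagged observation; this can be sketched as (names hypothetical):

```python
import numpy as np

def spe_contributions(x_row, x_hat_row):
    """Per-variable contributions to SPE at one observation:
    the squared residuals; they sum exactly to the SPE value."""
    return (x_row - x_hat_row) ** 2

def diagnose(x_row, x_hat_row):
    """Return the index of the variable contributing most to SPE."""
    return int(np.argmax(spe_contributions(x_row, x_hat_row)))
```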
Explained variance, loadings and weights plots highlight the variabilities of batch profiles. The explained variance is calculated by comparing the real process data with the MPCA model estimates. It can be calculated as a function of batch number, time, or variable number. The value of explained variance becomes higher if the model accounts for more variability in the data and for the correlation that exists among the variables. Variance plots over time can be used as an indicator of the phenomenological/operational changes that occur during the process evolution [291]. This measure can be computed as

$\mathrm{SS\ explained,\ \%} = \frac{\hat{\sigma}^2}{\sigma^2} \times 100$   (6.112)

where SS stands for 'sum of squares', and σ² and σ̂² are the true and estimated sums of squares, respectively.
Loadings also represent variability across the entire data set. Although the loadings look like contributions, a practical difference occurs when some of the contributions of the process variables have values much smaller than their corresponding loadings, and vice versa. In the case of MPLS-based empirical modeling, variable contributions to weights (W) carry valuable information since these weights summarize the relationship between the X and Y blocks. There are several ways to present this information as charts. The overall effect of all of the process variables on quality variables over the course of the process can be plotted, or this can be done for a specific period of the process to reflect the change in the effect of the predictor block (X). Recently, Wold et al. [145] suggested yet another statistic, coining the term Variable Influence on Projection (VIP), using the following formula

$\mathrm{VIP}_j = \left[ \frac{J \sum_{a=1}^{A} w_{ja}^2\, (\mathrm{SSY}_{a-1} - \mathrm{SSY}_a)}{\mathrm{SSY}_0 - \mathrm{SSY}_A} \right]^{1/2}$   (6.113)

where SSY_a is the residual sum of squares of the Y block after a latent variables.
where t_new denotes the scores of the new batch calculated by using the P (JK × A) loadings from the MPCA model with A PCs. If the scores of the new batch are close to the origin and its residuals are small, its operation is similar to that of the reference batches representing normal operation. The sum of squared residuals Q for the new batch over all the time periods can be calculated as $Q = \mathbf{e}\mathbf{e}^T = \sum_{c=1}^{KJ} e(c)^2$ for a quick comparison with Q values of reference batches. The D statistic (Eq. 6.97) can also be used to get an overall view. These statistics give only summary information about the new batch with respect to the behavior of the reference set; they do not present instantaneous changes that might have occurred during the progress of the batch. It is common practice to use on-line MPCA algorithms to obtain temporal SPE and T² values. These charts are introduced in Section 6.5.1. However, T² and cumulative score plots are used along with the variable contributions in this example to find the variable(s) responsible for deviation from NO. T² is computed for each sampling instant using Eq. 6.101. Scores are calculated for each sampling instant and summed until the end of the batch to reach the final score value. Limits on individual scores are given in Eq. 6.95.
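A minimal sketch of these end-of-batch summary statistics, assuming an orthonormal loadings matrix P and the reference score variances s (all names hypothetical; the D value omits the I/(I−1)² scaling of Eq. 6.97):

```python
import numpy as np

def project_batch(x_new, P, s):
    """Project an unfolded, scaled batch x_new (1 x KJ) onto an
    MPCA model with loadings P (KJ x A); s holds the reference
    score variances (the diagonal of S)."""
    t = x_new @ P                    # scores (1 x A)
    e = x_new - t @ P.T              # residual vector (1 x KJ)
    Q = float((e * e).sum())         # sum of squared residuals
    D = float((t ** 2 / s).sum())    # Mahalanobis-type distance
    return t, Q, D
```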
The MPCA model can be utilized to classify a completed batch as 'good' or 'bad'. Besides providing information on the similarity of a newly finished batch to the batches in the reference set, the MPCA model is also used to assess the progress during the run of a finished batch. Temporal score evolution plots and SPE and T² charts are generally used along with contribution plots to further investigate a finished batch.
Example. The MPCA-based SPM framework is illustrated for a simulated data set of fed-batch penicillin production presented in Section 6.4.1. The two main steps of this framework are the model development stage, which uses a historical reference batch database that defines normal operation, and the process monitoring stage, which uses the developed model for monitoring a new batch.

Explained variance (%) of the X-block:

    PC no.   This PC   Cumulative
    1        16.15     16.15
    2        10.34     26.49
    3         7.89     34.38
    4         5.96     40.34
[Figure 6.45. MPCA model statistics, including (e) explained variance over process evolution and (f) variable contributions to the principal components.]
The lowest line in both figures represents the percent explained by the first PC, the next line above shows the percent explained by the first two PCs together, and so on. Several operational and physiological phases throughout the fed-batch penicillin fermentation are detected from this plot. Comparing the relative increases in the four curves that indicate the cumulative variation explained, the first PC explains most of the variability in the first phase, which corresponds to the batch operation (switching from batch to fed-batch at around measurement 85), while the second PC explains variability in the second phase (fed-batch operation/exponential growth phase). This is a common observation in MPCA because the correlation of the process variables in each phase changes over the progress of a batch. Figure 6.45(f) shows that the dominant variables in the first principal component are 5, 7, 8, 9, 13 and 14. These variables contain physiological change information and their profiles look similar (see Figure 6.31). Variable 10 and others are explained mostly by the second and third components. The first principal component explains most of the batch operation phase and the exponential growth phase in fed-batch operation, where most of the process dynamics take place (in the associated variables 5, 7, 8, 9, 13 and 14). The second and additional principal components capture variability mostly in the fed-batch operation, where variable 10 (carbon dioxide evolution) is dominant. Figure 6.45(e) indicates a decrease in explained variance between approximately the 40th and 60th measurements for all four PCs, preceding the switch to fed-batch operation, because the variability of the process variables is low in this period. To increase phase-based explained variability, multiple-model approaches have also been suggested [130, 291, 605]. An example is given in Section 6.4.5.
Process monitoring stage: The MPCA model developed here is used to monitor finished batches to classify them as 'good' or 'bad', to investigate past batch evolution, and to detect and diagnose abnormalities. A batch scenario including a small downward drift fault is simulated (Section 6.4.1, Figure 6.44 and Table 6.8). New batch data are processed with the MPCA model using Eq. 6.114 after proper equalization/synchronization, unfolding and scaling. The same set of multivariate SPM charts is plotted (Figure 6.46). The score biplots in Figures 6.46(a) and 6.46(b) detect that the new batch (batch number 42) is operated differently, since its scores fall outside of the NO region defined by the MPCA model scores. Both D and Q statistics also indicate an out-of-control batch. Now that the batch is classified as out-of-control, the time of occurrence of the deviation and the variables that have contributed to increasing the values of the statistics can be determined. The aforementioned temporal T² chart based on cumulative scores and individual score plots can be used here. The T² value goes out-of-control as shown in Figure 6.47(a), the same out-of-
Figure 6.47. End-of-batch fault detection and diagnosis for a faulty batch.
during and/or at the end of the batch [298, 434, 663]. When a batch is finished, a block of recorded process variables X_new (K × J) and a vector of quality measurements y_new (1 × M), which are usually obtained with a delay due to quality analysis, become available. X_new (K × J) is unfolded to x_new (1 × KJ), and both x_new and y_new are scaled using the reference batch set scaling factors. Then, they are processed with the MPLS model loadings and weights, which contain structural information on the behavior of the NOC set, as

$\hat{\mathbf{t}}_{\mathrm{new}} = \mathbf{x}_{\mathrm{new}} \mathbf{W} (\mathbf{P}^T \mathbf{W})^{-1}$   (6.115)

$\hat{\mathbf{y}}_{\mathrm{new}} = \hat{\mathbf{t}}_{\mathrm{new}} \mathbf{Q}^T$   (6.116)
where t_new (1 × A) denotes the predicted t-scores, y_new (1 × M) the predicted quality variables, and e and f the residuals.
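Eqs. 6.115 and 6.116 can be sketched as below; `Q_load` stands for the quality-space loadings (the name is an assumption, chosen to avoid a clash with the Q statistic):

```python
import numpy as np

def mpls_project(x_new, W, P, Q_load):
    """End-of-batch MPLS projection sketch (Eqs. 6.115-6.116).

    x_new : (1 x KJ) unfolded, scaled process data
    W, P  : (KJ x A) weight and loading matrices
    Q_load: (M x A) quality-space loadings
    """
    t_new = x_new @ W @ np.linalg.inv(P.T @ W)   # Eq. 6.115
    y_hat = t_new @ Q_load.T                     # Eq. 6.116
    e = x_new - t_new @ P.T                      # process residuals
    return t_new, y_hat, e
```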
of time for an initial fast assessment before the real Y is available. New batch data are processed with the MPLS model at the end of the batch as shown in Eqs. 6.115 and 6.116 after proper equalization/synchronization, unfolding and scaling, resulting in multivariate SPM charts (Figures 6.50 and 6.51) for detection and diagnosis. Figure 6.50 summarizes several statistics to compare the new batch with the reference batches. Figures 6.50(a) and 6.50(b) indicate that there is a dissimilarity between the new batch and the NO batches in both the process and quality spaces. Scores of the new batch in both spaces fall outside of the in-control regions defining NO in Figures 6.50(c) and 6.50(d). These charts suggest that an unusual event occurred in the new batch and should be investigated further. To find out when the process went out-of-control and which variables were responsible, the SPEx chart and a variety of contribution plots are used (Figure 6.51). The SPEx chart of the process space in Figure 6.51(a) reveals a deviation from NO; the process goes out-of-control around the 570th observation. The overall variable contributions to SPEx in Figure 6.51(b) over the course of the batch run indicate that variable 9
Figure 6.52. Data arrangement and blocking of variables for process stages
and phases.
Stage 1 is the wet granulation of the fine powder mix of active ingredient(s) and other pharmaceutical excipients. The objective of the granulation is to increase the particle size by agglomerating this fine powder through the addition of a binder solution under continuous mixing. The particle size increase promotes higher bioavailability of the drug. At this stage, the amount of binder used and its addition rate affect the particle size increase. The amount of binder solution and its addition rate are predefined based on experimental design studies. We have assumed a fixed total amount of binder solution in the simulation studies. Binder addition rate, impeller speed, and power consumption are taken as the measured process variables at this stage. Stage 1 is operated in two phases: phase 1, dry mixing for a fixed time interval, and phase 2, binder addition while mixing (Fig. 6.53). Since there are small fluctuations in binder flow rate in each batch, the final time of the second phase is variable, producing unequal batch lengths for stage 1. These differences should be eliminated prior to multivariate statistical modeling. To equalize data lengths in phase 2, percent binder addition is used as an indicator variable.
Figure 6.53. Phase structure of the first stage. F_b: binder addition rate, P_w: agitator power consumption.
Stage 2 is the drying stage where a fluid bed dryer is used. The wet gran-
ulates are dried using hot airflow to decrease their moisture content.
The increase in product temperature is measured as an indicator of
drying. Airflow rate, inflow air temperature, drying rate, and prod-
uct moisture are also measured. Product temperature is found to be
appropriate as an indicator variable for this stage, and measurements
on each variable are interpolated on every 0.5 °C increase in product
temperature, resulting in 63 observations (Figs. 6.54c and 6.54d).
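The re-indexing on an indicator variable can be sketched with `np.interp` (hypothetical helper name; `np.interp` requires a monotonically increasing indicator trajectory):

```python
import numpy as np

def resample_on_indicator(indicator, variables, grid):
    """Re-index measured variables on an indicator variable instead
    of time, e.g. every 0.5 deg C increase in product temperature.

    indicator : (N,) monotonically increasing indicator trajectory
    variables : (N, J) measured process variables
    grid      : target indicator values, e.g. np.arange(T0, T1, 0.5)
    """
    return np.column_stack(
        [np.interp(grid, indicator, variables[:, j])
         for j in range(variables.shape[1])])

# e.g. 63 observations on 0.5 deg C steps over a 31 deg C rise:
# grid = np.arange(35.0, 66.5, 0.5)
```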
MPCA model development stage for data blocks: There are two operational phases in the first stage of the granule production process. The first phase contains dry mixing for a fixed time, and the second phase involves
wet massing. Since the total amount of binder to be added is fixed, the exact completion of the second phase is reached when all of the binder solution is consumed. Data from the first phase are collected based on the fixed operation time, resulting in the unfolded matrix X_{i,j1k1}. Data arrangement in the second phase is based on a fixed indicator variable (percent binder addition), resulting in X_{i,j2k2}. The index pairs j1, k1 and j2, k2 denote the variables and observation numbers of each phase, respectively. The overall performance can also be investigated by appending these matrices to form an augmented matrix

$\mathbf{X} = [\,\mathbf{X}_{i,j1k1}\;\; \mathbf{X}_{i,j2k2}\,].$
Two local models for phase 1 and phase 2, and one overall model, are developed using the MPCA technique and compared. Variance plots can be useful for comparing different models and investigating the changing variance structure in the overall data. The variance explained is higher (by 45.67%) for the local model of phase 1 than for the overall model, whereas the variances explained are much closer but still higher (by 4.22%) for the local model of phase 2 (Fig. 6.55). This is expected, since the same event occurs in the second phase (Fig. 6.53). Local models explain more information (17.98% more for the whole process) based on computations of sum of squared errors and data lengths in each phase.
Process monitoring stage: A new batch with a small drift in impeller speed introduced at 0.5 min (10th observation in phase 1 of stage 1) was monitored after its completion. Note that the fault starts early in the first phase. Both SPE and T² plots for the local and overall models in Figure 6.56 indicate that there is a deviation from NO. Since the overall model performance is not high in the first phase, the early departure is caught later with monitoring based on the overall model than with the local model (Figure 6.56b), and many false alarms are observed in the SPE plot (Figure 6.56a). The advantages of using the local model for phase 1 are:

1. The false alarms observed with the overall model for phase 1 in the SPE plot are eliminated.
[Figure 6.56. SPE and T² plots versus observation number for the overall model (OM) and local model (LM), with 95% and 99% control limits.]
This case illustrates that local models provide the capability to detect earlier the small trends and departures from NO that will be propagated to the next phase and eventually cause significant deviation, thus allowing process operators to improve their operations.
Online monitoring was performed in each processing stage based on adaptive hierarchical PCA (AHPCA). For this multistage process, AHPCA is limited to individual stages due to interstage discontinuity. To overcome this problem, different AHPCA models are developed for each stage. Different weightings can also be applied to better account for the changing phase structure.

To illustrate online monitoring, a case is generated where a small drift in impeller speed is introduced (dashed line) in the first stage and a step increase (dashed line) in inflow air rate in the second stage. Each AHPCA model successfully detected and diagnosed the problem online in each stage for the overall process (Fig. 6.57).
5. Develop MPCA models for the coefficients at each scale for the past
batches
8. Identify the scales that violate the detection limits as important scales
10. Check the state of the process by comparing the reconstructed data
with detection limits
The data set representing normal operation is decomposed into wavelet coefficients for each variable trajectory. MPCA models are developed at each scale. The overall MPCA model for all scales is obtained by reconstructing the decomposed reference data. Wavelet decomposition is applied to new batch data using the same wavelet function. For each scale, the T² and SPE values of the new batch are compared with control limits computed based on reference data. The scales that violate the detection limits are considered important scales for describing the critical events in the current data. The inverse wavelet transform is applied recursively to the important scales to reconstruct the signal. The new batch is considered out-of-control if the T² and/or SPE values of the reconstructed signal violate the control limits.
the dimension of the data used. For the selection of scales the following formula can be used:

$L = \log_2 n - 5$   (6.118)

where L is the number of scales and n is the number of observations.
Example. The MSMPCA-based SPM framework is illustrated for a simulated data set of fed-batch penicillin production presented in Section 6.4.1. The two main steps of this framework are the model development stage, which uses a historical reference batch database that defines normal operation, and the process monitoring stage, which uses the developed model for monitoring a new batch.
MSMPCA model development stage: A reference data set of equalized/synchronized (Figures 6.42 and 6.43), unfolded and scaled 40 good batches (each batch contains 14 variables and 764 measurements, resulting in a three-way array of size X(40 × 14 × 764)) is used. After unfolding by preserving the batch direction (I), the unfolded array becomes X(40 × 10696).
Each variable trajectory in X is decomposed into its approximation and detail coefficients in three scales using the Daubechies 1 wavelet family, which is chosen arbitrarily. Although Eq. 6.118 suggests four scales, a decomposition level of three is found sufficient in this case. Since the original signals can be reconstructed from their approximation coefficients at the coarsest level and detail coefficients at each level, those coefficients are stored for MPCA model development (Table 6.12). Then, MPCA models with five PCs are developed at each scale and MV control limits are calculated.
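Since Daubechies 1 is the Haar wavelet, the three-level decomposition of a variable trajectory can be sketched without a wavelet library (a minimal illustration, not the implementation used in the book):

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar (Daubechies 1) transform: returns
    approximation and detail coefficients."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:                      # pad odd-length signals
        x = np.append(x, x[-1])
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def decompose(x, levels=3):
    """Multilevel decomposition: keep the coarsest approximation
    and the detail coefficients of every level (enough to
    reconstruct x)."""
    details = []
    a = np.asarray(x, dtype=float)
    for _ in range(levels):
        a, d = haar_dwt(a)
        details.append(d)
    return a, details
```

A step change in the signal shows up in the coarse-scale detail coefficients, which is the effect exploited in the example below.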
Process monitoring stage: The MPCA models developed at each scale are used to monitor a new batch. A faulty batch with a small step decrease in glucose feed between measurements 160 and 200 is mean-centered and scaled similarly to the reference set, and 1-D wavelet decomposition is performed on the variable trajectories using Daubechies 1 wavelets. This three-level decomposition is illustrated for the penicillin concentration profile (variable 8 in the data set, x₈ = a₀) in Figure 6.58. Note that the effect of the step change on this variable becomes more visible as one goes to coarser
scales. The starting and end points of the fault are more apparent in the detail coefficients (d₃) of the third scale, since detail coefficients are sensitive only to changes and this sensitivity increases at coarser scales. SPE on each scale is calculated based on the MPCA models on scales. An augmented version of SPE values at all scales is presented in Figure 6.59. The 99% control limit is violated at scale m = 3 in both its approximation and detail coefficients. There are also some violations at scale two, but no violation is detected at the first scale, hence this scale is eliminated. Fault detection performances of conventional MPCA and MSMPCA are also compared in Figure 6.60. The lower portion of this figure represents the SPE of the approximation coefficients. The first out-of-control signal is detected at point 162 and the return to NO is detected at point 208 at that scale on SPE, whereas conventional MPCA detects the first out-of-control signal at the 165th measurement and the return to NO at the 213th measurement. In addition, MSMPCA-based SPE contains no false alarms, but conventional MPCA has 16 false alarms after the process returns to NO. The advantage of MSMPCA stems from the combined use of PCA and wavelet decomposition. The relationship between the variables is decorrelated by MPCA and the relationship between the stochastic measurements is decorrelated by the wavelet decomposition.
Figure 6.59. SPE on different scales of the decomposed faulty batch data.
Darker line represents 99% control limit.
Figure 6.61. Fault detection and diagnosis by MSMPCA. Dashed line rep-
resents 99% control limit on SPE charts.
1. Use the MSPM tools with variable trajectories that are combinations
of real data (up to the present time in the batch) and estimates of
the future portion of the trajectories to the end of the batch
4. Use estimators for predicting the final product quality and base batch
monitoring on this estimate.
where x̂_{new,k} denotes the full variable measurement vector (1 × KJ) that is estimated at each k onwards to the end of the batch run, t_{new,k} (1 × A) the predicted scores at sampling time k from the P loadings, and e_{new,k} (1 × KJ) the residuals vector at time k. To construct the control limits for on-line monitoring of new batches, each reference batch is passed through the on-line monitoring algorithm above, as if it were a new batch, and its predicted scores (t_{new,k}) and squared prediction errors (SPE_k) are stored at each sampling interval k.
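Collecting the stored SPE_k values into time-varying limits can be sketched as follows; this empirical-quantile variant is an assumption for illustration, and the moment-matched chi-squared limits described earlier could be substituted:

```python
import numpy as np

def online_limits(reference_spe, alpha=0.01):
    """Time-varying SPE control limits built by passing each
    reference batch through the on-line algorithm and collecting
    SPE_k per sampling interval.

    reference_spe : (I, K) array, SPE of reference batch i at time k
    Returns a (K,) vector of empirical (1 - alpha) limits.
    """
    return np.quantile(reference_spe, 1.0 - alpha, axis=0)
```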
Example. MPCA-based on-line SPM framework is illustrated using the
same simulated data set of fed-batch penicillin production presented in Sec-
tion 6.4.1. The large downward drift fault in glucose feed rate is used as
a case study (Figure 6.44 and data set X3 (764 x 14) in Table 6.8). The
model development stage and the MPCA model developed are the same as
in Section 6.4.3, with the exception that the construction of control limits
is performed by passing each batch data in the reference set through the
estimation-based on-line SPM procedure. The process monitoring stage
6.5. On-line Monitoring of Fermentation Processes 355
[Figure 6.63. T² and SPE charts for the original data and estimation Methods 1–3, versus measurements.]
depends on the estimation method used. All three methods are implemented in this example. The greatest difference among the data estimation methods is observed in the T² chart in Figure 6.63(a). The out-of-control signal is first detected by the second technique (the future values of disturbances remain constant at their current values over the remaining batch period) at the 325th measurement in the T² chart. The SPE chart detected the fault around the 305th measurement for all of the techniques. Variable contributions to SPE and T² and score biplots are presented for Method 2. Contribution plots revealed the variables responsible for the deviation from NO when the out-of-control state was detected: variables 3 and 5 in the SPE contributions (Figure 6.63(d)) at the 305th measurement, and variables 3 and 5 (and 7, 13, 14 to a lesser extent) in the T² contribution plot (Figure 6.63(c))
where (1:kJ, a) indicates the elements of the ath column from the first row up to the kJth row. The missing values are predicted by restricting them to be consistent with the values already observed and with the correlation structure that exists between the process variables as defined by the MPLS model. It is reported that this approach gives t-scores very close to their final values as X_new is filled with measured data (k increases), and that it works well after 10% of the batch evolution is completed [433, 434, 435].
When a new variable measurement vector is obtained and k is incremented, the scores t(1,a)_{new,k} can be estimated and used in MPLS (Eqs. 6.115 and 6.116). There are no residuals f in the quality variables space during on-line monitoring, since the actual values of the quality variables will be known only at the end of the batch. Each batch in the reference database is passed through the on-line MPLS algorithm as if it were a new batch to construct control limits. Since MPLS provides predictions of the final product qualities at each sampling interval, confidence intervals for those can also be developed [434]. The confidence intervals at significance level α for an individual predicted final quality variable y are given as [434]

$\hat{y}_{\mathrm{new}} \pm t_{I-A-1,\alpha/2}\, \mathrm{MSE}^{1/2} \left( 1 + \mathbf{t}_{\mathrm{new}}^T (\mathbf{T}^T\mathbf{T})^{-1} \mathbf{t}_{\mathrm{new}} \right)^{1/2}$   (6.123)
where T is the scores matrix, $t_{I-A-1,\alpha/2}$ is the critical value of the Studentized variable with I − A − 1 degrees of freedom at significance level α/2, and the mean squared error of prediction (MSE) is given as

$\mathrm{MSE} = \frac{\sum_{i=1}^{I} (y_i - \hat{y}_i)^2}{I - A - 1}.$
(r_{ak}) at time k and the recent history (t_{a(k-1)}), playing a role similar to that of the exponential weighting factor in an EWMA model. The consensus matrix R_{ak} is formed from the t_{a(k-1)} and r_{ak} column vectors, and the weight vector w_{ak} is computed as w_{ak} = R_{ak}^T t_{ak} for calculating the new score vector t_{ak} = R_{ak} w_{ak}. Then, t_{ak} is normalized and checked for convergence. If convergence is achieved, the X_{ak} blocks are deflated as X_{(a+1)k} = X_{ak} − t_{ak} p_{ak}^T to calculate the next dimension (a is increased by 1). The converged latent vectors are computed for a given a for all k; then a is incremented by 1 and the process is repeated until a = A. The model generated can be used to monitor future batches by storing p_{ak}, w_{ak}, and d_k for a = 1, …, A. As data are collected from the new batch and stored as row vectors x_k, the values for r_{ak}, t_{ak}, and x_{(a+1)k} are computed at time k for a = 1, …, A
by using

$t_{ak} = [\,t_{a(k-1)} \;\; d_k r_{ak}\,]\, w_{ak},$   (6.126)
The score and error values at each k can be plotted for MSPM of the
batch. Since no missing data estimation is required in AHPCA, the control
limits are calculated directly using the residuals and scores from the model
building stage.
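As a minimal sketch (not the authors' implementation), the online score update of Eq. 6.126 for a single latent dimension can be written as follows; the function and variable names are illustrative:

```python
import numpy as np

def ahpca_score_update(t_prev, d_k, r_k, w_k):
    """One AHPCA score update at time k for a single latent dimension
    (Eq. 6.126):  t_k = [t_{k-1}  d_k r_k] w_k.
    t_prev : score(s) carried over from time k-1
    d_k    : weighting scalar at time k (EWMA-like forgetting role)
    r_k    : score summary of the current measurement block
    w_k    : stored 2-element weight vector from the model-building stage"""
    # Consensus matrix R_k = [t_{k-1}  d_k r_k], then project with w_k
    R = np.column_stack([np.atleast_1d(t_prev), d_k * np.atleast_1d(r_k)])
    return R @ np.asarray(w_k)
```

Because the stored w_k is applied directly, no estimation of future measurements is needed, which is the property the text notes for AHPCA.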
Example. The AHPCA-based SPM framework is illustrated using the same simulated data set of fed-batch penicillin production presented in Section 6.4.1. The two main steps of this framework are a model development stage, which uses a historical reference batch database that defines normal operation, and a process monitoring stage, which uses the developed model to monitor a new batch.
AHPCA model development stage: An AHPCA model is developed from a data set of 37 good batches that were equalized/synchronized (Figures 6.42 and 6.43), unfolded and scaled (each batch containing 14 variables and 764 measurements, resulting
Percent variance explained in the X-block:

PC no.   This PC   Cumulative
1        28.25     28.25
2        20.05     48.30
3        10.29     58.59
Figure 6.71. AHPCA model (with three PCs) statistics: (a) biplots of 37 reference runs, (b) cumulative explained variance with one, two and three PCs.
Figure 6.72. On-line monitoring of a faulty batch using AHPCA. The subscript "140–180" in panels (b) and (d) indicates that contributions are averaged between the 140th and 180th measurements.
The process measurements array X can be unfolded to X (IK × J) by preserving the variable direction [232, 552, 663]. In this case, X can be thought of as a combination of slices of matrices of size (K × J) for each batch (Figure 6.73(a)). X is formed after rearrangement of these slices. This type of unfolding suggests a different multivariate modeling approach [232, 606, 663]. Batch evolution can be monitored by developing an MPLS model between X (IK × J) and a time stamp vector z (IK × 1) (Figure 6.75(b)). In this case, MPLS decomposes X and z into a combination of a scores matrix T (IK × R), a loadings matrix P (J × R), a vector q (R × 1) and a weight matrix W (J × R), with sizes different from the conventional MPLS decomposition discussed in Section 4.5.2:
X = TP^T + E
z = Tq + f    (6.129)
Since the size of the matrix resulting from the operation W(P^T W)^{-1} is J × R, online monitoring of the new batch can be performed without any estimation of future values.
In the pre-processing step, X is mean-centered by subtracting variable
means and usually scaled to unit variance. This pre-processing differs from
the conventional approach (Figure 6.74(c)-(d)) in that the dynamic non-
linear behavior of trajectories in X is retained (Figure 6.74(e)-(f)). This
technique can also be combined with conventional MPLS for predicting
product quality after the completion of the batch run [606, 663].
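A minimal numpy-only sketch of this variable-wise unfolding and an MPLS-style decomposition (Eq. 6.129) using single-response NIPALS; this is an illustration of the idea, not the exact algorithm of the cited references:

```python
import numpy as np

def unfold_variable_wise(X3):
    """Unfold a 3-way array (I batches x K times x J variables) into
    X (IK x J), preserving the variable direction."""
    I, K, J = X3.shape
    return X3.reshape(I * K, J)

def pls1_nipals(X, z, n_comp):
    """Minimal single-response NIPALS PLS between mean-centered
    X (IK x J) and z (IK,):  X = T P^T + E,  z = T q + f  (Eq. 6.129)."""
    X = X - X.mean(0)
    z = z - z.mean()
    T, P, Q, W = [], [], [], []
    for _ in range(n_comp):
        w = X.T @ z                       # weight vector
        w = w / np.linalg.norm(w)
        t = X @ w                         # scores
        p = X.T @ t / (t @ t)             # X loadings
        q = (z @ t) / (t @ t)             # z loading
        X = X - np.outer(t, p)            # deflate X
        z = z - q * t                     # deflate z
        T.append(t); P.append(p); Q.append(q); W.append(w)
    return np.array(T), np.array(P), np.array(Q), np.array(W)
```

Fitting z as a time stamp vector against the variable-wise unfolded X yields the local-batch-time predictor described above.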
Figure 6.75. (a) MPLS model blocks for predicting product quality; (b) MPLS model blocks for predicting progress of the batch.
Figure 6.76. Predicted local batch time for the entire process duration. The
peak corresponds to switching from batch to fed-batch operation.
This model provides information about the relationship between time and
the evolution of process variable trajectories. The predicted time stamp
vector Zpred can then be used as an indicator variable such that process
variables are re-sampled on percent increments of this derived variable. It
is assumed that variable trajectories contain sufficient information to fairly
predict batch time in MPLSV modeling. This assumption implies that
variable trajectories increase or decrease roughly linearly in each time region. Local batch time prediction produces weak results when there are discontinuities, or when variables exhibit simultaneous piecewise linear dynamics during the evolution of the batch. As illustrated in Figure 6.76 with fed-batch penicillin fermentation data, the predicted time shows non-increasing or decreasing behavior in the region around the discontinuity, which makes it inappropriate for data alignment. Similar results were also reported for industrial data [641].
A solution is proposed to this problem by partitioning the entire process
into major operational phases [606]. Two different data alignment methods
are used. For the general case when batches in the reference data set are of
unequal length and no appropriate indicator variable is found, an MPLSV
model is developed between X and local time stamps vector z for each pro-
cess phase. Process variable trajectories are then re-sampled with respect to
the percent completion of the predicted local batch time vector z_pred. A vector
t̄ ± 3σ    (6.131)

where t̄ are the average estimated scores and σ their standard deviations [663].
When a new batch is monitored with the model parameters of MPLSV,
estimated scores of this new batch will also be nonlinear. After mean-centering these scores, which reduces the nonlinearity, it is possible to construct tighter control limits by using Eq. 6.95. This modification allows faster fault detection, as discussed in the case studies. When
an out-of-control status is detected with either type of score plots, variable
contributions are checked for fault diagnosis.
Online Prediction of Product Quality. It is advantageous to use
MPLSV type models for online monitoring because it is not necessary to
Figure 6.78. Predicted local batch times (z_pred) in Phases 1 and 2 with control limits (dashed lines).
Figure 6.79. (a) Partitioning of process measurements space and (b) re-
structuring for online quality prediction framework.
batch/fed-batch switching point is found for each batch and the data are divided into two sets, phase 1 and phase 2. Because the third variable (substrate feed) is zero in the batch phase, only 13 variables are left in this
first set. Data alignment is performed by using the IV technique. Since an
indicator variable is not available for the entire batch run, separate indi-
cator variables are selected for each phase. Variable 9, the culture volume
decrease, is a good candidate to be chosen as an indicator variable for phase
1. A new variable called 'percent substrate fed' is calculated from variable
3 (substrate feed) and used as an indicator variable for phase 2 data set.
This variable is added as the 15th variable to the data set of phase 2. It
is assumed that fed-batch phase is completed when 25 L of substrate is
added to the fermenter. Data are re-sampled by linear interpolation at
each 1 percent completion of volume decrease for phase 1 and at each 0.2
percent of the total substrate amount added for phase 2. Data alignment thus yields an equal number of data points in each phase, with data lengths K1 = 101 and K2 = 501, respectively.
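The IV-based re-sampling step can be sketched as follows, assuming a monotonically increasing indicator variable; the function and argument names are hypothetical:

```python
import numpy as np

def align_on_indicator(data, iv, n_points):
    """Re-sample trajectories on equal percent increments of an
    indicator variable (IV), e.g. percent culture-volume decrease
    (phase 1) or percent substrate fed (phase 2).
    data     : (K x J) raw measurements for one batch
    iv       : (K,) monotonically increasing indicator variable
    n_points : number of re-sampled points (e.g. 101 for 1% steps)"""
    # Map the IV onto 0..100 % completion, then interpolate each
    # variable trajectory onto a uniform percent grid.
    pct = 100.0 * (iv - iv[0]) / (iv[-1] - iv[0])
    grid = np.linspace(0.0, 100.0, n_points)
    return np.column_stack([np.interp(grid, pct, data[:, j])
                            for j in range(data.shape[1])])
```

Calling this with n_points = 101 for phase 1 (1% steps) and n_points = 501 for phase 2 (0.2% steps) reproduces the data lengths used in the example.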
MPLS model development stage: Model development includes two stages.
In the first stage, an MPLSV model is developed between process variables
matrix (unfolded in variable direction) and an indicator variable. This
model is used for online SPM purposes. The second stage involves devel-
oping predictive MPLSB models between available data partitions matrix
(rearranged process variables matrix in batch direction) and end-of-batch
quality matrix.
An MPLSV model is developed for phase 1 between autoscaled X1 (IK1 × J1) and the IV vector z1 (IK1 × 1) by using 5 latent variables. The number of latent variables should be chosen large enough to explain most of the information in the z1 block, because the MPLSV model is used to predict batch evolution. Cross-validation is used to determine the number of latent variables. X1 (IK1 × J1) can be rearranged into matrix X1 (I × K1J1) to develop an MPLSB model to obtain an estimate of end-of-batch quality at the end of phase 1. Since all K1 measurements of the first phase have been recorded by the beginning of the second phase, no estimation of variable trajectories is required and the I × KJ partitioning can be used for modeling. Autoscaled X1 (I × K1J1) and the product quality matrix Y (I × M) are used as predictor and predicted blocks, respectively. Similarly, another MPLSV model is developed for phase 2 between autoscaled X2 (IK2 × J2) and the IV vector z2 (IK2 × 1).
In the second modeling stage, quality prediction models are developed.
To develop the first MPLSB model, data are collected in 50% increments of phase 1, resulting in two data partitions X_{1,1} and X_{1,2} (Figure 6.79b). A similar approach is followed in phase 2 for every 20% increase in phase 2 evolution, resulting in five data partitions (X_{2,n}, n = 1, …, 5). MPLSB
Table 6.14. Explained variance of MPLSB models for online quality prediction

Model no.   X-block (%)   Y-block (%)
1           61.57         68.85
2           61.27         71.27
3           58.85         89.21
4           60.62         95.07
5           63.10         97.31
6           63.35         98.39
7           63.39         98.89
Chart                   % completion of IV   Measurement no.
T2                      48.4                 269
Linear Score LV 2       50                   276
Linear Score LV 5       51.6                 283
Nonlinear Score LV 2    51.6                 283
Nonlinear Score LV 5    52                   285
Linear Score LV 4       59.4                 319
Nonlinear Score LV 4    60.6                 324
Linear Score LV 3       70.2                 368
Nonlinear Score LV 3    84.2                 433
SPE                     -                    -
of the batch, the contribution plot for SPE signals an unusual situation
for variable 3 (Figure 6.80c). Variables 3 and 11 are found to be the variables most affected by the fault according to the T2 contribution plot. The plot of deviations from average batch behavior is ineffective in indicating the most affected variable(s) in this case (Figure 6.83a).
Quality prediction ability of the integrated MSPM framework is also
tested via two cases. A normal batch is investigated first. As expected, SPE
plot produced no out-of-control signal and final product quality on all five
variables (shown as a solid star) is successfully predicted (Figure 6.84). The
prediction capability is somewhat poor in the beginning because of limited
data, but it gets better as more data become available. In the second case,
where a drift of magnitude −0.05% h⁻¹ is introduced into the substrate feed rate at the beginning of the fed-batch phase until the end of operation, the SPE plot
signaled out-of-control right after the sixth quality prediction point (80%
completion of phase 2). Because MPLSB model is not valid beyond this
point no further confidence limit is plotted (Figure 6.85). Although the
predictions of MPLSB model might not be accurate for the seventh (and
final) value, the framework generated fairly close predictions of the inferior
quality. Predicting the values of end-of-batch quality during the progress of the batch provided useful insight for anticipating the effects of excursions from normal operation on final quality. □
Figure 6.80. Control charts of SPE and T2 for the entire process duration, and contributions of variables to SPE and T2 for a selected interval after the out-of-control signal is detected in Phase 2, with 95% and 99% control limits (dashed-dotted and dashed lines).
dx/dt = f_x(x, u, v),    y = f_y(x, w),    q = f_q(x(t_f))    (6.132)

where x are the state variables, u the manipulated inputs, v and w the state and output disturbances, y the measured outputs, and q the final product quality at the end of the batch (t = t_f). If a fundamental model
Figure 6.81. Nonlinear scores in Phase 2 with control limits (dashed lines).
of the process were available, the final product quality could be estimated by using an extended Kalman filter. When a fundamental dynamic model is not available, an empirical model could be developed by using historical data records of successful batches. The problem may be cast as a regression problem where the measurements y up to the current time t_c and the inputs u up to the end of the batch are used at any time t_c to estimate q. Note that the inputs at t = t_c, …, t_f have not been implemented yet and have to be assumed. A linear predictor for final product quality has been proposed by using a least-squares estimator obtained through biased regression (by using PCA or PLS) and extended to recursive least-squares prediction through a Kalman filter [531].
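A simplified illustration of such a biased-regression predictor, using PCR in place of the PLS/Kalman machinery of [531]; all names are hypothetical:

```python
import numpy as np

def pcr_quality_predictor(Xhist, qhist, n_pc):
    """Fit a biased-regression (PCR) predictor of final quality q.
    Xhist : (I x m) one row per reference batch, holding measurements y
            up to t_c together with the (assumed) future inputs u
    qhist : (I,) final quality of the reference batches
    Returns a function mapping a new regressor row to a q estimate."""
    xm, qm = Xhist.mean(0), qhist.mean()
    Xc = Xhist - xm
    # Principal directions of the regressor block (biased regression:
    # regress q on the leading scores only, not on all m variables)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_pc].T                          # loadings (m x n_pc)
    T = Xc @ V                               # scores of reference batches
    b = np.linalg.lstsq(T, qhist - qm, rcond=None)[0]
    return lambda x: qm + (np.asarray(x) - xm) @ V @ b
```

Refitting (or recursively updating) the predictor as t_c advances gives quality estimates that improve as more of the batch is observed.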
Figure 6.82. Linear scores in Phase 2 with 95% and 99% control limits
(dashed-dotted and dashed lines).
(6.134)

Just as the state x_k in Section 4.3.2 was holding relevant process information from sampling times k − 1, …, 1 for predicting future process behavior
Figure 6.84. Online predictions for end-of-batch quality values for a normal
batch. Dotted straight line indicates the average value of a quality variable
based on reference batches.
Figure 6.85. Online predictions for end-of-batch quality for the faulty batch.
The size of the lifted output vector y may be too large to develop subspace models quickly. This high-dimensionality problem can be alleviated by applying PCA prior to subspace identification [134]. Since y may have a high degree of collinearity, there is a potential to reduce the number of variables significantly. If the number of principal components is selected correctly, the residuals are mostly noise that tends to be batchwise uncorrelated, and the principal components will retain the important features of batch-to-batch behavior. Applying PCA to project y of length JK to a lower-dimensional space y̲ of size a such that a ≪ JK, the state-space model based on y is
x_{i+1} = A x_i + K ε_i,    y̲_i = C x_i + ε_i    (6.136)
where y is defined by

y = Θ y̲ + E    (6.137)

with the columns of matrix Θ being the principal directions (loadings) and E being the PCA residuals matrix.
Several monitoring charts can be developed to detect abnormal batch-to-batch behavior. T2 and Q charts of the principal components would be one alternative, but T2 charts of the states x_i and prediction errors ε_i offer better alternatives. The use of a small-window CUSUM chart of the prediction-error T2 has also been proposed [134]. It filters out high-frequency variation in ε_i over i and enhances trends by accumulating the deviation over a number of batches. A window size of 5 batches provides a good compromise between capturing trends and delay in indicating large increases in the prediction error of one batch run.
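A small-window CUSUM of the prediction-error T2 statistic can be sketched as follows (illustrative only; the in-control target value for T2 is an assumed input):

```python
import numpy as np

def windowed_cusum_t2(t2, target, window=5):
    """Small-window CUSUM of the prediction-error T2 over batches:
    accumulate deviations of T2 from its in-control target over the
    last `window` batches, filtering high-frequency batch-to-batch
    variation while enhancing sustained trends."""
    dev = np.asarray(t2, dtype=float) - target
    return np.array([dev[max(0, i - window + 1): i + 1].sum()
                     for i in range(len(dev))])
```

With window = 5, a sustained shift accumulates over five batches before the statistic saturates, matching the compromise described above.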
Process Control
7.1 Introduction
It should be evident from the discussion and various illustrations in Chap-
ter 2 that even in its simplest form, the representation of dynamic and
steady-state behavior of a bioprocess is multivariate in nature. Even a
simple unstructured kinetic representation for biomass formation would require knowledge / monitoring / prediction of a minimum of two variables, namely the concentrations / amounts of biomass and of at least one species (substrate) which leads to production of cell mass. Recognizing that biomass
formation is the sum total of a large number of intracellular chemical reac-
tions, each of which is catalyzed by an enzyme, and that activity of each
enzyme is very sensitive to intracellular pH and temperature, one can ap-
preciate that this simplest black box representation would be applicable
only if the intracellular pH and temperature, and therefore indirectly the
culture (composite of abiotic and biotic phases) pH and temperature were
kept invariant. Considering that the pH and temperature in the biotic por-
tion of the culture and culture as a whole, if left uncontrolled, would vary
with time because of the large number of intracellular chemical reactions,
it is obvious that maintaining the culture pH and temperature at desired
values would require addition of an acid or a base and addition/removal of
thermal energy (heating/cooling) as appropriate. Thus, even in the sim-
plest scenario where the focus in the forefront is on formation of biomass
and consumption of a single substrate, one must consider in the background
manipulation of rates of acid/base addition and heating/cooling to keep
culture pH and temperature at the desired values.
Having realized that one must always deal with multivariate problems
when dealing with biological reactors, the dimension of the system rep-
resentation will depend on the nature of that kinetic representation em-
ployed (complexity of the kinetic model if one is available or complexity
of the bioprocess under consideration if a kinetic model is not available),
mode of operation of bioreactor [whether batch, fed-batch, or continuous,
with/without recycle (after selective removal of a portion of the bioreactor
contents using a separation technique)], and other external influences, such
384 Chapter 7. Process Control
dx/dt = f(x, u, d),    x(0) = x_0,    (7.1)
with x denoting the state variables which represent the status of the cell
culture in the bioreactor, and u and d representing the input variables
which indirectly influence the status of the cell culture. The input variables
are further classified into manipulated inputs (u) and disturbance variables
(d). Let n, m and m_d denote the numbers of state variables, manipulated inputs and disturbance variables, respectively. Not all state variables can be measured.
Some of the state variables, which cannot be measured or can be measured
less frequently, are estimated from measurements of other variables that are
measured frequently by using estimators (Section 4.3). It must therefore
be realized that only some of the state variables may be monitored or
estimated. The set of variables which can be measured will be referred to
as bioreactor outputs, y, with the number of outputs being p. The relations
among the state variables and the output (measured) variables can then be
succinctly stated as
y = h(x). (7.2)
The functions f (•) and h(-) are in general nonlinear. But for mathemat-
ical convenience in developing the control equations, these functions are
linearized. Linear state-space equations are discussed in Section 7.4.
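The linearization of f(·) and h(·) about an operating point can be sketched numerically as below; this finite-difference helper is a hypothetical illustration, not a construct from the text:

```python
import numpy as np

def jacobian(fun, z0, eps=1e-6):
    """Forward-difference Jacobian of fun at z0."""
    z0 = np.asarray(z0, float)
    f0 = np.asarray(fun(z0), float)
    J = np.zeros((f0.size, z0.size))
    for j in range(z0.size):
        z = z0.copy()
        z[j] += eps
        J[:, j] = (np.asarray(fun(z), float) - f0) / eps
    return J

def linearize(f, h, x0, u0):
    """Local linear state-space model  dx'/dt = A x' + B u',  y' = C x'
    about an operating point (x0, u0) of  dx/dt = f(x, u),  y = h(x)."""
    A = jacobian(lambda x: f(x, np.asarray(u0, float)), x0)  # df/dx
    B = jacobian(lambda u: f(np.asarray(x0, float), u), u0)  # df/du
    C = jacobian(h, x0)                                      # dh/dx
    return A, B, C
```

The resulting (A, B, C) triple is the linear state-space form used in the control developments of Section 7.4.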
then in view of relation 7.2, the objective function in Eq. 7.4 can be re-
stated as in Eq. 7.3. Besides the constraints imposed by the conservation
7.2. Open-Loop (Optimal) Control 389
where u_min and u_max denote the lower and upper bounds, respectively. Maximization of the objective function is then accomplished via maximization of the Hamiltonian H with respect to u. The Hamiltonian is defined as [84]

H = g(x, u) + λ^T f(x, u),    (7.10)
with λ being the vector of adjoint variables associated with Eqs. 7.1. The variation in λ with time is described by

dλ^T/dt = −∂H/∂x,    (7.11)

with the boundary condition

λ_i(t_f) = ∂G/∂x_i(t_f)    if x_i(t_f) is not specified.    (7.13)
The conditions in Eqs. 7.12 and 7.13 are applicable for once-through operations, i.e., process operations where x(0) and x(t_f) are independent (i.e., δx(0) and δx(t_f) are not identical). In cyclic operation of a bioreactor, the operation modes under consideration here (batch, fed-batch and continuous) and certain sequences of these are repeated, with t_f being the duration of a cycle. In this case, x(t) satisfies the periodic boundary conditions
The boundary conditions for the adjoint variables are then obtained as

λ_i(t_f) = ∂G/∂x_i + λ_i(0)    if x_i(t_f) is not specified.    (7.16)
In view of the conditions above (Eqs. 7.12–7.14 and 7.16), δJ can be expressed as

δJ = ∫_0^{t_f} [(∂H/∂u) δu(t)] dt.    (7.17)
If some components of u*(t) include segments (sections) where u_i = (u_i)_min or (u_i)_max, then δu_i(t) must be positive at (u_i)_min and δu_i(t) must be negative at (u_i)_max. Since δJ is expected to be non-positive, the following conditions must be satisfied on the optimal trajectory for u_i, u_i*(t):

u_i*(t) = (u_i)_min    if ∂H/∂u_i < 0,    (7.18)
u_i*(t) = (u_i)_max    if ∂H/∂u_i > 0,    (7.19)
∂H/∂u_i = 0    if (u_i)_min < u_i*(t) < (u_i)_max.    (7.20)
Further details on the derivation of Eqs. 7.10 through 7.20 are discussed in
Bryson and Ho [84]. As long as H is a nonlinear function of u_i, Eq. 7.20 provides an explicit expression for u_i*(t).
with u' being the vector obtained from u by excluding u_i, then u_i(t) cannot be obtained explicitly from the condition in Eq. 7.20 if h_i is trivial over a finite time interval (t_1 < t < t_2). The control over each such finite interval is referred to as singular control, and the time interval is referred to as a singular control interval. Singular control problems are especially difficult to handle due to difficulties associated with identification of the singular arc (trajectory of u_i*(t)), and estimation of when to transit from boundary control [u_i* = (u_i)_min or (u_i)_max] to singular control and vice versa. Triviality of h_i over a finite time interval implies triviality of the first and higher
derivatives of h_i with respect to time over the entire time interval. As will
become evident in the illustrations presented later, this property is used to
identify u* in a singular control interval. Admissibility of singular control
is related to the kinetics of the process being optimized, i.e., the elements
of f. If the bioprocess kinetics is such that singular control is not admis-
sible, then the optimal control policy (trajectory of a manipulated input
Ui) would involve operation at the lower or upper bounds for Ui [(ui)m-m or
(iti)maxj or a composite of operations at the lower and upper bounds such
that
u*(t) = (wi) m in if hi < 0 (7.22a)
<(*) = Mmax if h,>Q (7.22b)
The trajectory of u*(t) then may involve one or more switches from the
lower bound to upper bound and vice versa. The values of t at which such
switches occur are called switching times.
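The bang-bang logic of Eqs. 7.22a–b can be sketched as below; the tolerance used to flag candidate singular intervals is an assumption of this sketch:

```python
import numpy as np

def bang_bang_policy(dH_du, u_min, u_max, tol=1e-9):
    """Optimal input from the switching function h_i = dH/du_i
    (Eqs. 7.22a-b): u* = u_min where h_i < 0 and u* = u_max where
    h_i > 0.  |h_i| <= tol flags a candidate singular interval, where
    u* must instead be found from the triviality of h_i's time
    derivatives; such points are marked with NaN here."""
    h = np.asarray(dH_du, float)
    return np.where(h < -tol, u_min, np.where(h > tol, u_max, np.nan))
```

Scanning the sign changes of the returned policy along t locates the switching times between the lower and upper bounds.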
Next we consider situations where the integral constraints in Eqs. 7.5 are applicable. Let there be a equality constraints and b inequality constraints. The original vector of state variables can be augmented by additional (a + b) state variables satisfying the following relations

dx_j/dt = φ_j(x, u),    x_j(0) = 0,    j = (n + 1), (n + 2), …, (n + a)    (7.23)

with x = [x_1 x_2 … x_n]^T being the original vector of n state variables. The
vector of state variables may have to be further augmented if the objective
function cannot be directly expressed in the form displayed in Eq. 7.3.
Consider for example batch and/or fed-batch operation of a bioprocess.
For cost-effective operation, it may be of interest to maximize productivity
of the target product P. The objective function in this case would be
J = [P(t_f)V(t_f) − P(0)V(0)]/t_f.    (7.25)
For a single-cycle operation, P(0) and V(0) will be known (specified). The
objective function above can be expressed as in Eq. 7.3 by augmenting the
vector of state variables by an additional variable satisfying the following
The objective function in Eq. 7.25 can now be expressed as in Eq. 7.3 with
g(x, u) being trivial. The application of optimal control theory then follows
as discussed before with the vector of state variables now consisting of n
process variables and additional state variables satisfying relations such as
Eqs. 7.23, 7.24 and 7.26. Since the Hamiltonian in Eq. 7.10 is independent of x_j (j = n + 1, n + 2, …, n + a + b), it follows from Eq. 7.11 that the corresponding adjoint variables λ_j (j = n + 1, n + 2, …, n + a + b) are time-invariant. For a steady-state operation, all adjoint variables λ_j (j = 1, 2, …, n + a + b + 1) are time-invariant and still provided by Eq. 7.11.
While integral constraints can be handled by augmenting the state variable
space, if the process to be optimized is subject to algebraic constraints of
the type
φ_j(x, u) = 0    and    ψ_j(x, u) ≤ 0,    (7.27)
then the Hamiltonian in Eq. 7.10 must be appended to account for these
constraints. The case study which follows provides an illustration of this.
The subscripts X, S and P used in the above and elsewhere in the case
studies in this chapter denote partial derivatives of a quantity (such as a
specific rate or ratio of two specific rates) with respect to X, S and P,
respectively. The Hamiltonian must be constant (= H*) on the optimal
path, H* being zero when tf is not specified (Eq. 7.14).
The trajectory of x(t) will, in general, comprise an interior arc (V < V_m) and a boundary arc (V = V_m). When the orders of the boundary control and singular control are both unity, there is no jump in the adjoint variables at each junction point of the boundary and interior arcs [392, 447, 454]. Once the bioreactor is full (V = V_m), its operation continues on the boundary arc (F = 0) until t = t_f [392].
It is evident from Eqs. 7.35 and 7.37 that the Hamiltonian is linear in both F and S_F. Admissibility of singular control must therefore be examined. The conditions for admissibility of singular control can be obtained from Eq. 7.20 as

and

(S_F − S − ρP) V = c_2 = (S_F − S_0 − ρP_0) V_0    (7.47)
Satisfaction of the relations in Eq. 7.45 implies that the bioreactor dynamics can be completely described by two algebraic relations, Eqs. 7.46 and 7.47, and two differential equations among Eqs. 7.28–7.31. Utilizing the triviality of h_i and dh_i/dt, it can be deduced that the bioreactor trajectories must lie on the surface
X/a = (S_F − S)/b = P/c    (7.50)

then X, S and P lie on the same line during that cycle (Eqs. 7.46 and 7.47). The expressions for the singular surface and the control policy during singular control, Eqs. 7.48 and 7.49, then reduce to
When Eq. 7.55 is satisfied, the following necessary and sufficient condition for admissibility of singular control is obtained in view of the triviality of h_i and dh_i/dt in a singular control interval:
(7.57)
/Vx /Vs /VP
The above relation provides the description of the singular surface in the
three-dimensional concentration space (X, S, P). The feeding policy during
singular control is obtained from the triviality of d²h_i/dt² as

F_s/V =    (7.58a)

where

δ = α + Xα_X − (S_F − S)β_X + Pγ_X    (7.58b)
The feeding policy during singular control is obtained as in Eq. 7.58a (since d²h_i/dt² is trivial), with α, β and γ being defined as
AX + B(SF - S) + CP = 0, (7.61)
X, S and P lie on the same plane during that cycle (see Eq. 7.54). The bioreactor trajectories can then be completely described in a two-dimensional phase-plane (S − X if B = 0, or X − P if B ≠ 0), with the singular arc (X, S and P moving along the singular arc) being the intersection of the singular surface in Eq. 7.57 (if B is non-zero) or Eq. 7.59 (if B = 0) with the plane in Eq. 7.61.
The feed point (S = S_F, X = P = 0) lies on the plane in Eq. 7.61. Further, in a strictly batch operation (F = 0), it can be deduced from Eqs. 7.29–7.31 and 7.53 that the concentration trajectories will lie on the plane in Eq. 7.61. Moreover, for a continuous culture at steady state, X, S and P also lie on the plane in Eq. 7.61. In a typical cycle of a fed-batch operation, the increase in V implies that the bioreactor state (in terms of concentrations) moves closer to the plane in Eq. 7.61 if not already on it at the start of that cycle (see Eq. 7.54). In a cyclic fed-batch operation, the bioreactor contents are partially or completely withdrawn at the end of each cycle, which is followed by rapid addition of fresh feed. The initial state of the reactive portion of the next cycle, (X_0, S_0, P_0), therefore moves closer to the plane in Eq. 7.61 if not already on it. One can conclude then that in a repeated fed-batch operation with reproducible cycles, all concentration trajectories will lie on this plane. The trajectories in a batch operation in the two-dimensional phase-plane (S − X if B = 0, or X − P if B ≠ 0) will in
general be nonlinear. These can in some cases have inflection points. The locus of inflection points is described by one of the following surfaces [447]:

if σ = aμ + bε    (7.62)

(7.66)

ε_1(P) = exp(−k_1 P),    ε_2(P) = exp(−k_2 P).    (7.67)
The third example pertains to ethanol production from cellulose hydrolysate by S. cerevisiae. The following kinetic expressions have been used for description of this bioprocess [190, 600]:

μ = μ_m (1 − P/P_m) S/(K_S + S),    ν = ν_m (1 − P/P′_m) S/(K′_S + S)
cyclic batch operation. Each cycle of a cyclic batch operation consists of filling the reactor rapidly (F_m → ∞) with feed to increase the reactor volume from an initial volume V_0 to V_m (0 < V_0 < V_m), followed by a batch operation until the objective function J is maximized, and then terminating the cycle by rapid withdrawal of the reactor contents to reduce the bioreactor volume from V_m to V_0. A batch operation is normally continued until the stoichiometrically limiting nutrient (the limiting substrate here) is completely utilized and/or product synthesis is terminated, for this ensures that the bioreactor contents at the end of each cycle will have the maximum product concentration for a given feed composition. The batch reactor trajectories would terminate at (X_f, S_f, P_f) starting from (X_0, S_0, P_0), with the feed point being (0, S_F, 0). The relations among the three concentration variables at these points are

X_0/X_f = (S_F − S_0)/(S_F − S_f) = P_0/P_f = V_0/V_m

Cyclic batch operations with V_0 = 0 are referred to as operations without recycle (from one batch to the next), while those with non-zero V_0 are termed operations with recycle.
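One cycle of such a cyclic batch operation can be sketched as below. The kinetic form and all parameter values are hypothetical illustrations (NOT those of Eq. 7.69), with product-inhibited Monod growth, growth-associated product formation, and explicit Euler integration assumed:

```python
import numpy as np

# Hypothetical illustrative parameters: max specific growth rate,
# product-inhibition limit, saturation constant, and yields.
mu_m, P_m, K_S, Y_XS, Y_PS = 0.4, 90.0, 1.0, 0.1, 0.45

def batch_cycle(Xf, Sf, Pf, V0, Vm, SF, dt=0.01, t_end=60.0):
    """One cycle of a cyclic batch operation: rapid fill from V0 to Vm
    with feed (0, SF, 0) starting from the previous cycle's end state
    (Xf, Sf, Pf), then batch operation (F = 0) until the limiting
    substrate is nearly exhausted; returns concentrations at withdrawal."""
    r = V0 / Vm                     # retained fraction: the fill step gives
    X = Xf * r                      # X0/Xf = (SF-S0)/(SF-Sf) = P0/Pf = V0/Vm
    S = Sf * r + SF * (1.0 - r)
    P = Pf * r
    t = 0.0
    while S > 1e-3 and t < t_end:   # batch phase, explicit Euler steps
        mu = mu_m * max(1.0 - P / P_m, 0.0) * S / (K_S + S)
        dX = mu * X
        X += dt * dX
        S -= dt * dX / Y_XS
        P += dt * Y_PS * dX / Y_XS
        t += dt
    return X, S, P
```

Chaining calls to `batch_cycle` (feeding each cycle's output back in) simulates repeated batch operation with recycle; V0 = 0 gives operation without recycle.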
For the kinetics described in Eq. 7.66, a unique and locally asymptotically stable limit point on the singular arc is guaranteed for all S_F with the exception of very low S_F (Figure 7.1(a)). In a cyclic operation, the bioreactor trajectories in fed-batch mode terminate at a limit point, and the trajectories approach the limit point in a single-cycle operation. The overall product-to-substrate yield for each of the three operations under consideration and the substrate and product concentrations at the limit point, S_l and P_l, respectively, all increase with increasing S_F (Figure 7.1). It is evident from Figure 7.1(b) that for the kinetics under consideration, the cyclic fed-batch operations are superior to cyclic batch operations with recycle, which in turn are superior to cyclic batch operations without recycle. The differential in the product yield between the optimal (fed-batch) operation and the two suboptimal (batch) operations increases as S_F is increased (Figure 7.1(b)). The maximum theoretical yield of ethanol based on glucose is 0.5111. For the kinetic parameters considered, Eq. 7.69, the overall ethanol yields for cyclic fed-batch operations exceed the maximum theoretical yield for S_F in excess of 76.6 g/L. The magnitudes of some of the kinetic parameters in this S_F range are therefore suspect.
Results for the kinetics described in Eqs. 7.66, 7.67, and 7.69 are presented in Figure 7.2. The singular control in this case has a richer variety. The number of limit points for the singular arc is (i) zero if S_F < 3.1957 g/L or if S_F > 125.317 g/L, (ii) one if 3.1957 g/L < S_F < 8.0894 g/L, and (iii) two if 8.0894 g/L < S_F < 125.317 g/L. The critical S_F (125.317 g/L)
Figure 7.1. Profiles of (a) S_l (dashed) and P_l and (b) overall product yields for repeated batch operation without recycle (lower solid curve) and repeated fed-batch operation (upper solid curve) for the fermentation described by Eq. 7.66. The dashed curve in (b) represents the upper (open) bound on the profiles of the overall product yield for repeated batch operations with recycle [447].
also represents the bifurcation point for the limit-point curve (Figure 7.2).
Limit points lying on the lower branch (portion CDE) of the limit-point
7.2. Open-Loop (Optimal) Control 403
Figure 7.2. Profiles of S_l (dashed curve ABCDE) and overall product yields
for repeated batch operation without recycle (lower solid curve) and re-
peated fed-batch operation (upper solid curve). Singular control in each
cycle of a repeated fed-batch operation terminates at a locally stable limit
point [S_l lying on the upper branch (portion ABC) of the limit-point curve
ABCDE]. The non-labeled dashed curve represents the upper (open) bound
on the profiles of the overall product yield for repeated batch operations
with recycle [447].
curve are unstable. Parulekar [447] has established that fed-batch opera-
tions terminating at an unstable limit point are not feasible. The profiles
of overall product yield in Figure 7.2 illustrate the superiority of cyclic
fed-batch operation with singular control terminating at the stable limit
point over cyclic batch operations. These profiles also reveal the substan-
tial improvement in yield that can be obtained with recycle in a cyclic batch
operation.
Optimizations based on highly lumped models such as the ones considered
in Eqs. 7.66 and 7.67 may be sensitive to variations in the kinetic parameters,
some of which carry significant uncertainty. For the kinetic parameters
considered in Eq. 7.69, the predicted maximum product yield exceeded the
theoretical maximum yield beyond a certain SF. This indicates that these
parameter values are not accurate enough to be used for fed-batch
optimization. Sensitivity of the objective function (max-
Case 3. The three specific rates are functions of S and P but have no
linear relations among them.
The objective function in Eq. 7.3 is considered to be independent of Xf
and tf, both of which are not specified, and the corresponding adjoint
variable is trivial as a result. Further, g in Eq. 7.3 is considered to be
trivial. Termination of bioreactor operation in a particular cycle must occur
in singular control or batch mode, the final reactor volume being Vm in
either case. It has been shown that λ2 is trivial during singular control.
Triviality of the switching function and its time derivative during singular
control then provides the following necessary and sufficient condition for
admissibility of singular control and description of the singular arc
∂/∂S (ε/σ) = 0.    (7.73)
The intersections of the singular arc and the locus of inflection points have
special significance with respect to singular control. These intersections are
The complex exponential notation used here simplifies the analysis [549,
632, 634]. Let ρij(ω) (i, j = 1, 2) denote the individual elements of Π(ω).
Then Eq. 7.84 can be deduced to have the form

δJ = (1/2) Σ_{i=1}^{2} ρii(ωi) ri² + (r1 r2/T) ∫_0^T Z21 dt,
Z21 = [Re(ρ21) cos(φ2 − φ1) + Im(ρ21) sin(φ2 − φ1)].    (7.85)
In what follows, we examine the forms Eq. 7.85 reduces to when the number
of inputs subject to periodic variation is 1 or 2.
Unequal forcing frequencies.

ω = min(ω1, ω2),  τ = max(τ1, τ2),  ωj τj = ωτ = 1,  j = 1, 2.    (7.86)
It is evident from Eq. 7.87 that the interaction between the control variables
u1 and u2 vanishes when ω1 ≠ ω2. Simultaneous periodic variations in u1
and u2 may provide improvement in process performance vis-à-vis periodic
variations in u1 or u2 alone for those intervals of ω where both ρ11 and
ρ22 are positive. For a particular η (= r2/r1), the optimum frequency (ω0)
then is the frequency at which (ρ11 + ρ22 η²) is maximized.
Equal forcing frequencies.
When ω1 = ω2 = ω, Eq. 7.85 assumes the form

δJ = (1/2)[ρ11(ω) r1² + ρ22(ω) r2²] + r1 r2 f(φ, ω),
f(φ, ω) = [Re(ρ21) cos(φ) + Im(ρ21) sin(φ)],  φ = φ2 − φ1.    (7.88)
The third term on the right side of Eq. 7.88 represents the interaction
between the control variables u1 and u2. A positive effect of interaction
between the two control variables in forced periodic operation involving
perturbations in both u1 and u2 requires that f be positive. Maximization
of δJ for a particular steady state requires that f be maximized.
The maximum over ζ of G2 = ρ22 + 2fζ + ρ11ζ² (attained for ρ11 < 0) is

(f² − ρ11ρ22)/(−ρ11),    (7.91a)

and the maximum over η of G1 = ρ11 + 2fη + ρ22η² (attained for ρ22 < 0) is

(f² − ρ11ρ22)/(−ρ22).    (7.93a)
Figure 7.3. (a) Portraits of G1 and η for ρ22 < 0 and portraits of G2
and ζ for ρ11 < 0. G1 = ρ11 + 2fη + ρ22η², G2 = ρ22 + 2fζ + ρ11ζ².
[(x, y) = (η, G1) and (x, y) = (ζ, G2).] The profiles 1, 2 and 3 correspond
to ρ11 > 0, ρ11 = 0 and ρ11 < 0, respectively, when (x, y) = (η, G1) and
to ρ22 > 0, ρ22 = 0 and ρ22 < 0, respectively, when (x, y) = (ζ, G2).
(b) Profiles of δJ = c (c an arbitrary constant) when min(ρ11, ρ22) > 0,
max(ρ11, ρ22) > 0 and f > 0 [449].
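The interaction term of Eq. 7.88 is maximized over the phase difference at φ* = arg(ρ21), giving fmax = |ρ21|. A small numerical sketch (the value of ρ21 below is made up, not taken from the text):

```python
import cmath
import math

def f(phi, rho21):
    # interaction term of Eq. 7.88: f = Re(rho21) cos(phi) + Im(rho21) sin(phi)
    return rho21.real * math.cos(phi) + rho21.imag * math.sin(phi)

rho21 = complex(0.3, 0.4)        # hypothetical frequency-response element
phi_star = cmath.phase(rho21)    # optimal phase difference phi2 - phi1
f_max = abs(rho21)               # maximum positive interaction

# a coarse scan over phi confirms the analytical optimum
scan = max(f(2 * math.pi * k / 1000, rho21) for k in range(1000))
print(round(f(phi_star, rho21), 6), round(f_max, 6))
```

Choosing the phase difference between the two forcing signals this way extracts the largest positive contribution from the off-diagonal element ρ21.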
ζ1 = f/(−ρ11),  0 < f ≤ |ρ21|,    (7.93b)
7.3. Forced Periodic Operations 413
There therefore are infinitely many sets of r1 and r2 which lead to the same
value of δJ for a particular ω when min(ρ11, ρ22) > 0, max(ρ11, ρ22) > 0 and
f > 0. An optimum amplitude ratio, r1/r2, is as a result not admissible.
The contribution of the off-diagonal elements of Π, ρjk (j ≠ k; j, k =
1, 2 in the present case), to δJ is more significant than that of the diagonal
elements, ρjj (j = 1, 2) [449, 450, 571, 572, 573]. In the case study that
follows, the forcing frequencies of inputs subject to periodic variation are
therefore considered to be equal.
with w (w > 0) being the cost of limiting substrate relative to the price
of the desired product. The term involving the coefficient v in Eq. 7.95
accounts for the difference between the price of those products (other than
the desired product) whose formation is associated with cell growth and
the cost of separation of cell mass from the desired product relative to the
price of the desired product. The objective function in Eq. 7.95 is there-
fore appropriate for optimizing operation of continuous bioprocesses that
generate growth-associated and non-growth associated products and incor-
porates costs associated with separation of the desired product from cell
mass. In both steady-state and periodic operations of continuous cultures,
the input variable space is defined by the inequality constraints
0 ≤ D ≤ D* and 0 ≤ SF ≤ SF*    (7.96)

with D* being the dilution rate beyond which retention of cells is not pos-
sible in a steady-state continuous culture and SF* the maximum permissible
concentration of the limiting substrate in the bioreactor feed (usually de-
cided by solubility limits of the substrate in the feed medium).
Since the performance index considered here (Eq. 7.95) is non-positive
for steady-state operations at D = 0, D = D* or SF = 0 (P = X = 0
for D > D* or SF = 0), the optimal steady-state solutions cannot lie on
the boundaries D = 0, D = D* and SF = 0 of the control variable space.
The optimal steady-state solutions may therefore lie strictly in the interior
of the control variable space (defined by the inequality constraints in Eq.
7.96) or on the boundary SF = SF*.
The expressions for the scalars and vectors involved in evaluation of
Π(ω) have the form

x = [X  S  P]^T,  f = [f1  f2  f3]^T,  h = D[P + vX − wSF],  u = [D  SF]^T,

B = ∂f/∂u = [ −X         0
              (SF − S)    D
              −P         0 ],

A = ∂f/∂x = [ μ − D + Xμ_X    Xμ_S          Xμ_P
              −(σ + Xσ_X)     −(D + Xσ_S)   −Xσ_P
              ε + Xε_X        Xε_S          Xε_P − D ],    (7.97)
7.3. Forced Periodic Operations 415
with f1, f2, and f3 being the right sides of Eqs. 7.29-7.31, respectively,
and the elements of P being

P11 = (2μ_X + μ_XX X)λ1 − (2σ_X + σ_XX X)λ2 + (2ε_X + ε_XX X)λ3,
P12 = (μ_S + μ_XS X)λ1 − (σ_S + σ_XS X)λ2 + (ε_S + ε_XS X)λ3,
P13 = (μ_P + μ_XP X)λ1 − (σ_P + σ_XP X)λ2 + (ε_P + ε_XP X)λ3,
P22 = (μ_SS λ1 − σ_SS λ2 + ε_SS λ3) X,
P23 = (μ_SP λ1 − σ_SP λ2 + ε_SP λ3) X,
P33 = (μ_PP λ1 − σ_PP λ2 + ε_PP λ3) X.    (7.98)
The adjoint variables at a steady state are obtained from solution of Eq.
7.81, which in this case assumes the form

m1λ1 − m2λ2 + m3λ3 = −vD,
m4λ1 − m5λ2 + m6λ3 = 0,
m7λ1 − m8λ2 + m9λ3 = −D,

with

m1 = μ − D + Xμ_X,  m2 = σ + Xσ_X,  m3 = ε + Xε_X,
m4 = Xμ_S,  m5 = D + Xσ_S,  m6 = Xε_S,
m7 = Xμ_P,  m8 = Xσ_P,  m9 = Xε_P − D.    (7.99)
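Eq. 7.99 is a linear 3 × 3 system in the adjoint variables (λ1, λ2, λ3). A sketch solving it by Cramer's rule; the coefficients m1..m9, v, and D below are made-up numbers, not values from the text:

```python
# Solve the steady-state adjoint system of Eq. 7.99 by Cramer's rule.
# All numbers are illustrative, not taken from the text.
v, D = 0.1, 0.3
m = [0.05, 0.8, 0.6, 0.02, 0.45, 0.3, -0.01, 0.1, -0.25]   # m1..m9

A = [[m[0], -m[1], m[2]],      # m1*l1 - m2*l2 + m3*l3 = -v*D
     [m[3], -m[4], m[5]],      # m4*l1 - m5*l2 + m6*l3 = 0
     [m[6], -m[7], m[8]]]      # m7*l1 - m8*l2 + m9*l3 = -D
b = [-v * D, 0.0, -D]

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

dA = det3(A)
lam = []
for col in range(3):
    Ai = [row[:] for row in A]       # replace one column with b
    for r in range(3):
        Ai[r][col] = b[r]
    lam.append(det3(Ai) / dA)
print([round(x, 3) for x in lam])
```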
For the problem formulation described by Eqs. 7.29-7.31, depending on
the nature of the relations among the three rate processes, various bioprocesses
can be classified into three types as [448, 449]: (I) bioprocesses where σ
and ε are each related linearly to μ, (II) bioprocesses where μ, σ and ε are
related by a single linear relation, and (III) bioprocesses where μ, σ and ε
are not related linearly. For type I bioprocesses, the relations in Eq. 7.45
are applicable. In steady-state operations and forced periodic operations
with variation in D alone, the state variables X, S, and P satisfy the
stoichiometric relations in Eq. 7.50 (SF = SF0). For type II bioprocesses,
the three specific rates are related linearly as in Eq. 7.53. In steady-
state operations and forced periodic operations with variation in D alone,
the state variables X, S, and P satisfy the stoichiometric relation in Eq.
7.61 (SF = SF0). The analysis of forced periodic operation is simplified
considerably if the bioreactor state (X, S, P) in steady-state and forced
periodic operations satisfies Eq. 7.50 or Eq. 7.61, as appropriate [448, 449].
For details of the analysis of the forced periodic operations of the three
bioprocess types, the reader should refer to [448, 449].
Forced periodic operations of continuous cultures may in some situations
extend the regions of the operating parameter space where non-washout so-
lutions are admissible [322]. The extension of the regions of admissibility of
the meaningful states of continuous cultures via forced periodic operations
Example 2.
This example pertains to type I bioprocesses with μ being dependent on
S, X, and P (Eq. 7.50), μ_X and μ_P being negative. Case studies of these
bioprocesses include fermentations producing alcohols [47, 239, 325, 326,
338, 395, 544]. For numerical illustration, μ is expressed as [326, 395]
with
For the parameters listed in Eq. 7.101, the maximum number of non-trivial
steady states is two. When two non-trivial steady states are admissible, one
of these is locally, asymptotically stable and the other is unstable. Where
Figure 7.4. (a) Operating diagram for Example 1 with the cell growth
following Monod kinetics (μm = 1.0 h⁻¹, KS = 0.05 g L⁻¹, and KI → ∞
in Table 3; D in h⁻¹ and SF in g L⁻¹). μ(SF) = D on the curve ABCDE.
Forced periodic operation with variations in D and SF is superior to steady-
state operation (i) at all frequencies in region I [(SF0, D0) lying below the
curve ABFG], (ii) for 0 < ω < ω1 and ω2 < ω < ∞ (ω2 > ω1) in
region II [(SF0, D0) lying between the curves BFG and BCDE], and (iii)
for all ω except ω = ω* for (SF0, D0) lying on the curve BFG (fmax² =
ρ11ρ22 at ω = ω*). (b) For a particular SF0, forced periodic operation
involving variations in D and SF is superior to steady-state operation for
(z, D0) (z = ω²) lying outside the curve ABCDEF [fmax² = ρ11ρ22 on the
curve ABCDEF, fmax² < ρ11ρ22 for (z, D0) enclosed by the curve ABCDEF,
and fmax² > ρ11ρ22 for (z, D0) lying outside the curve ABCDEF]. Regions
I and II in (a) correspond to D0 < D* and D0 > D*, respectively. D*
corresponds to the asterisk [449].
D0 < μ1(SF0) [under the curve ABCEFG in Figure 7.5(a)], a unique lo-
cally, asymptotically stable non-trivial steady state is admissible, the global
asymptotic stability of which is also assured since the washout state is un-
stable. Two non-trivial steady states are admissible in a portion of the
SF − D space where D0 > μ1(SF0) [(SF0, D0) lying inside the envelope
CEFHC in Figure 7.5(a)]. One of the non-trivial steady states and the
washout state are locally stable in this portion. On the curve CEF [ex-
cluding points C and F, μ1(SF0) = D0], a unique non-trivial steady state,
which is globally, asymptotically stable, is admissible.
Since positive J is of interest, it follows that (v + c) must be positive (Eq.
7.97). In the entire region of the SF − D space where a stable non-trivial
steady state is admissible [below the curve ABCHFG in Figure 7.5(a)], peri-
odic control with variation in D alone is not proper. When u = SF, ρ22(ω)
is positive for some ω in region I [(SF0, D0) lying to the right of the curve
FIJ and below the curve FG, Figure 7.5(a)] and negative for all ω in region
II [(SF0, D0) lying below the curve ABCHFIJ] and for (SF0, D0) lying
on the curve FIJ excluding point F. In region I, forced periodic operations
subject to weak variations in SF will yield superior performance vis-à-vis
steady-state operation.
The effect of simultaneous periodic variations in D and SF on the biore-
actor performance was examined for v = w = 0 (Eq. 7.95). In these opera-
tions, δJ is positive at all frequencies in region I and for ω1 < ω < ∞ (ω1 ≠
0; ω1 depends on D0 and SF0) in region II. On the interface between the
two regions [(SF0, D0) lying on the curve FIJ excluding point F], δJ is
positive for ω > 0.
The following conditions must be satisfied at the optimal steady state (D =
Figure 7.5. Results for Example 2. (a) Operating diagram for Example 2 (D
in h⁻¹ and SF in g L⁻¹). μ1(SF) = D on the curve ABCEFG. Non-trivial
steady states are admissible for (SF, D) lying below the curve ABCHFG.
The number of non-trivial steady states is (i) one for (SF, D) lying below
the curve ABCEFG and on the curve CEF (excluding points C and F) and
(ii) two for (SF, D) lying inside the envelope CEFHC. Periodic operations
involving weak variations in SF are superior to steady-state operation only
in region I [(SF0, D0) lying to the right of the curve FIJ and below the curve
FG]. For v = w = 0, periodic operations involving weak variations in SF and
D are superior to periodic operations involving weak variations in SF and
to steady-state operation (i) at all frequencies in region I and for (SF0, D0)
lying on the curve FIJ (excluding point F) and (ii) for ω1 < ω < ∞
(ω1 > 0) in region II [(SF0, D0) lying below the curve ABCHFIJ]. The
asterisk denotes the optimal steady state for v = w = 0. (b) Portraits of
Q [curve 1: Q = ρ22(ω0); curve 2: Q = G2u(ω0)] and SF0, and of φ1(ω0) and
SF0 (dashed curve), for D0 = 0.3045 h⁻¹. (c) Portraits of G2 for ζ = 0.0417
and ω (dashed curve) and G2u and ω (solid curve) for D0 = 0.3045 h⁻¹ and
SF0 = 101.5 g L⁻¹. ζ = 0.0417 is the optimum amplitude ratio (G2 = G2u)
at ω = 1.904 cycles h⁻¹. G2 = ρ22 + 2fζ + ρ11ζ², G2u = ρ22 − fmax²/ρ11
[449].
7.90 with f = fmax) is provided in Figure 7.5(b) for various SF0's in re-
gion I (ω = ω0 in both types of operations). The benefit of simultaneous
variation in D and SF over variation in SF alone is self-evident. The differ-
ential in the maximum improvement attainable in the two forced periodic
operations is significantly sensitive to SF0. For certain sets of operating
parameters (SF0, D0), therefore, periodic operations involving variation
exclusively in SF can be substantially inferior to those involving variations
in both D and SF. In the narrow range of SF0 considered in Figure 7.5(b),
there is substantial variation in the optimum phase difference that
leads to maximum positive interaction between D and SF (f = fmax). The
optimum frequency ω0 for forced periodic operation involving variation in
SF alone decreases with increasing SF0 (profile not shown).
For D0 = 0.3045 h⁻¹ and SF0 = 101.5 g L⁻¹, variations in G2 (G2 =
ρ22 − f²/ρ11, ρ11 < 0, Eq. 7.90) for the optimal amplitude ratio (G2 for
f = fmax) and G2 for a fixed amplitude ratio (ζ = 0.0417) are presented
in Figure 7.5(c). The amplitude ratio in the latter case is the optimal am-
plitude ratio only at ω = 1.904 cycles h⁻¹. Periodic operations employing
this amplitude ratio are suboptimal at other frequencies [Figure 7.5(c)].
The difference between the performance of the periodic operation employing
the optimal amplitude ratio and that of the periodic operation employing a
fixed amplitude ratio increases as the deviation of ω from the frequency for
which the fixed amplitude ratio is the optimal one [ω = 1.904 cycles h⁻¹ in
Figure 7.5(c)] increases.
Example 3.
The expressions for μ, σ and ε are provided in Eq. 7.68, with the parameter
values being [190, 600]

n = 1, μm = 0.4 h⁻¹, εm = 1.4 h⁻¹, KS = 0.476 g/L,
K'S = 0.666 g/L, Pm = 87 g/L, P'm = 114 g/L, KI = 203.49 g/L,
K'I = 303.03 g/L, YP/S = 0.47.    (7.103)
A unique non-trivial steady state is admissible in that portion of the SF − D
space where μ1(SF0) > D0 [(SF0, D0) lying below the curve ABCDEF
in Figure 7.6]. The non-trivial steady state does not undergo any Hopf
bifurcations and, since the washout state is unstable when μ1(SF0) > D0,
the non-trivial steady state is globally, asymptotically stable.
Application of the π-criterion was considered for v = 0 (Eq. 7.95). Periodic
control was found not to be proper when u = D in the entire region where
μ1(SF0) > D0. When u = SF, ρ22(ω) is positive for some ω in region
I [(SF0, D0) lying to the right of the curve DGH and below the curve
DEF] and negative for all ω in region II [(SF0, D0) lying below the curve
ABCDGH] and for (SF0, D0) lying on the curve DGH (excluding point D)
The output is

y(t) = C(t) x(t).    (7.106)

A(t), B(t), C(t) and E(t) are the appropriately dimensioned system ma-
trices with the respective multiplying vectors, the elements of which are
partial derivatives evaluated at the reference state (xr, ur, dr). If the ref-
erence state happens to be a steady state (admissible only in a continuous
bioreactor operation), then the system matrices are time-invariant. In that
case, the state-space representation in Eqs. 7.104 and 7.106 can be trans-
formed into a transfer function representation by applying the Laplace
transform to Eqs. 7.104 and 7.106:

y(s) = G(s)u(s) + Gd(s)d(s),    (7.107)

with the transfer functions having the form

G(s) = C(sI − A)⁻¹B,  Gd(s) = C(sI − A)⁻¹E.    (7.108)
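With time-invariant matrices, the transfer function can be evaluated numerically at any complex frequency. A minimal sketch for a hypothetical two-state, single-input, single-output linearization (all numbers are made up, not values from the text):

```python
# Evaluate G(s) = C (sI - A)^(-1) B for a hypothetical two-state linearization.
# The matrices below are illustrative numbers only.
A = [[-0.5, 0.1],
     [0.2, -1.0]]
B = [0.8, 0.0]        # single manipulated input
C = [1.0, 0.0]        # single measured output

def G(s):
    # M = sI - A (2x2 complex), inverted via the adjugate formula
    m11, m12 = s - A[0][0], -A[0][1]
    m21, m22 = -A[1][0], s - A[1][1]
    det = m11 * m22 - m12 * m21
    x1 = (m22 * B[0] - m12 * B[1]) / det   # (sI - A)^(-1) B
    x2 = (-m21 * B[0] + m11 * B[1]) / det
    return C[0] * x1 + C[1] * x2

print(abs(G(0j)), abs(G(1j)))   # steady-state gain and gain at omega = 1
```

Evaluating G(iω) over a grid of frequencies gives the frequency response used in the forced-periodic and controller-design analyses of this chapter.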
As mentioned previously, the process models for biological reactors are in-
herently nonlinear due to the large number of chemical reactions occurring in
a typical cell. Where kinetic descriptions are available, the values of model
In Eqs. 7.109 and 7.110, Gc(s) and Gm(s) represent the transfer function
matrices for the controllers and measuring devices, respectively, and ym(s)
and yd(s) the Laplace transforms of the vector of measured outputs and
7.4. Feedback Control 425
The relative gain array (RGA) has some interesting properties, which
are listed below.
1. RGA is a symmetric matrix.
2. The elements of RGA in any row or any column add up to unity.
3. The elements of RGA are dimensionless.
4. The gain in the open-loop pairing yi with uj when all other loops are
closed (operating), K̂ij, is related to the open-loop gain for this pair
(Kij) as

λij = Kij / K̂ij.    (7.114)
equivalent process gain matrix from the nonlinear process model (Eqs. 7.1
and 7.2), the individual gains, (K)ij (between yi and uj), being obtained
as

(K)ij ≈ [yi(t) − yi(t − Δt)]/[uj(t) − uj(t − Δt)].    (7.115)
The choice of Δt is somewhat arbitrary. Witcher [657] has recommended
Δt to be 20 to 100% of the dominant time constant of the process. The
magnitude of Δt is reduced by the process time delay, if any, in the effect of uj
on yi, dij [173]. One can then proceed with obtaining the RGA as described
earlier (Eq. 7.114). This equivalent RGA has been referred to as the
dynamic relative gain array. We will continue to refer to it as RGA. During
the transient operation of a bioprocess from an initial state to a final state
(this may be a steady state for continuous bioprocess operation) in a single
operation (run or experiment), the elements of the process gain matrix
and hence the elements of RGA may alter significantly. The input-output
pairings therefore may not be the same throughout the operation and may
have to be switched on one or more occasions.
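For a 2 × 2 system the RGA follows directly from the steady-state gains, with λ11 = K11K22/(K11K22 − K12K21) and the remaining elements fixed by the unit row and column sums. A sketch with made-up gains:

```python
# RGA of a 2x2 steady-state gain matrix (gain values are illustrative):
# lambda11 = K11*K22 / (K11*K22 - K12*K21); rows and columns sum to unity.
K = [[2.0, 0.5],
     [0.4, 1.0]]

det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
lam11 = K[0][0] * K[1][1] / det
rga = [[lam11, 1.0 - lam11],
       [1.0 - lam11, lam11]]

# lam11 (about 1.11) is closest to unity, so pairing y1-u1, y2-u2 is preferred
print([[round(v, 3) for v in row] for row in rga])
```

Re-evaluating the gains (and hence this array) along a batch trajectory is exactly the dynamic-RGA procedure described above.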
It should be apparent from Eq. 7.114 that even though the elements
of RGA involve comparison of the open-loop gain between an input uj and an
output yi with the closed-loop gain for this pair (when all other control
loops except the loop controlling yi by manipulating uj are closed), RGA
can be estimated solely from the open-loop gains. Although the discussion
related to estimating the interaction among inputs and outputs thus far
has been based on availability of a mathematical description of the process,
the so-called process model, one should not be under the impression that
availability of a model is essential for estimation of RGA and decision on
input-output pairing (controller configuration). When process models are
not available, or when available models are reliable only in a narrow region of oper-
ating conditions, it is still possible to obtain the RGAs from experimental
data. In an uncontrolled process, one can implement changes in an input
(one input at a time) and observe the changes in various output variables.
The elements of the process gain matrix, K, can then be obtained, similar
to Eq. 7.115, as
this case, the exact number of sets being pCmt [= p!/{mt!(p − mt)!}]. The
relative gain arrays for all sets must be obtained. Comparison of the RGAs
for these subsystems will reveal which subsystem has the RGA closest to the
ideal situation (elements corresponding to a particular input-output pairing
as close to unity as possible) and therefore will provide the best possible
control.
A system where mt > p is an overdefined system, since there are not
enough output variables to be controlled with the available manipulable
input variables (mt). The number of controllers in this situation
is p and only p inputs can be manipulated. The remaining (mt − p) inputs
would therefore not be manipulated and can be used for process optimiza-
tion. If they cannot be regulated, then they will be classified as disturbances.
Multiple independent sets (subsystems) of input-output pairing are candi-
dates in this case, the exact number of sets being mtCp [= mt!/{p!(mt − p)!}].
The relative gain arrays for all sets must be obtained. Comparison of the
RGAs for these subsystems will reveal which subsystem has the RGA closest
to the ideal situation (elements corresponding to a particular input-output
pairing as close to unity as possible) and therefore will provide the best
possible control.
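The count of candidate square subsystems, mtCp = mt!/(p!(mt − p)!), can be enumerated directly. A sketch with hypothetical dimensions (p = 2 controlled outputs, mt = 4 manipulable inputs):

```python
from itertools import combinations
from math import comb

p, m_t = 2, 4                               # hypothetical: 2 outputs, 4 inputs
subsets = list(combinations(range(m_t), p)) # candidate input subsets to pair

# m_t C_p = m_t!/(p!(m_t - p)!) square subsystems, each with its own RGA
print(len(subsets), comb(m_t, p))
```

Each enumerated subset defines one square subsystem whose RGA would be compared against the others as described above.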
and

KI(t) = [K(t)]⁻¹ diag K(t),  [K(t)]⁻¹ = adj K(t)/|K(t)|.    (7.120)

In Eqs. 7.119 and 7.120, adj M denotes the adjugate (classical adjoint)
matrix of M. Some words of caution are in order here. Perfect decoupling
is possible only if the process model is perfect and reliable. However, even
with imperfect process models, decoupling can be applied with considerable
success. The dynamic decouplers being based on model inverses (Eqs. 7.119
and 7.120), they can be implemented only if the inverses are causal and
stable. For further discussion of this and other related issues, the reader
should refer to Ogunnaike and Ray [438].
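A static version of Eq. 7.120 for a 2 × 2 gain matrix (the gains are made up): the decoupler KI = K⁻¹ diag K makes the compensated gain K·KI diagonal, so each controller sees only its own loop gain.

```python
# Static decoupler from Eq. 7.120 for a 2x2 gain matrix (illustrative gains):
# K_I = K^(-1) diag(K), so the compensated gain K*K_I = diag(K) is diagonal.
K = [[2.0, 0.5],
     [0.4, 1.0]]
det = K[0][0] * K[1][1] - K[0][1] * K[1][0]
Kinv = [[K[1][1] / det, -K[0][1] / det],     # adj K / |K|
        [-K[1][0] / det, K[0][0] / det]]
KI = [[Kinv[0][0] * K[0][0], Kinv[0][1] * K[1][1]],   # columns scaled by
      [Kinv[1][0] * K[0][0], Kinv[1][1] * K[1][1]]]   # diag(K)

comp = [[sum(K[i][k] * KI[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
print([[round(v, 12) for v in row] for row in comp])  # off-diagonals ~ 0
```

With an imperfect model of K the off-diagonal terms do not cancel exactly, which is the caution raised in the text.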
The application of the relative gain array method involves pairings among
actual process inputs and outputs. For non-square process gain matrices
(the number of inputs not being the same as the number of outputs), use
of minimal controller configuration implies that either some inputs cannot
be manipulated (overdefined system) or some outputs cannot be controlled
(underdefined system). This problem does not arise when the controller
configuration is based on SVD since all inputs and outputs are involved in
the feedback control (Eq. 7.129).
with

δx(t) = x(t) − x*(t),  δu(t) = u(t) − u*(t),  and  δJ = J − J*.    (7.131)
Notice that the definitions of P, Q and R are the same as those in Eqs.
7.79. The matrices P(t), Q(t), R(t) and Sf are evaluated at the optimal
trajectories of x and u, viz., x(t) = x*(t) and u(t) = u*(t). The vector
of state variables x considered here includes the n process variables which
influence the process kinetics and up to (a + b + 1) additional state variables,
the time-variance of which is described by Eqs. 7.23, 7.24 and 7.26. P, R
and Sf are symmetric matrices. One can then work with the following
perturbation equations obtained from Eq. 7.1 via linearization around the
open-loop optimal policy [u(t) = u*(t), x(t) = x*(t)] for the fixed initial
condition stated in Eq. 7.1, viz., x(0) = x0.
d(δx)/dt = A(t) δx + B(t) δu,  δx(0) = δx0.    (7.133)
7.5. Optimal Linear-Quadratic Feedback Control 435
with

A(t) = (∂f/∂x)|_{x*, u*},  B(t) = (∂f/∂u)|_{x*, u*}.    (7.134)
The equation above represents the process behavior for initial conditions
in a close neighborhood of x0. The definitions of the system matrices A and B
are the same as those in Eqs. 7.79 and 7.104. The variation in the objective
function in Eq. 7.130 can be arranged in the following quadratic form

δJ = −(1/2) δxᵀ(tf) Sf δx(tf) − (1/2) ∫_0^{tf} [δxᵀ P δx + 2 δxᵀ Q δu + δuᵀ R δu] dt.    (7.135)
The objective of the optimal feedback control is then to minimize the degra-
dation in the process performance (δJ < 0) due to perturbations in x and
u. Maximization of δJ then requires solution of Eq. 7.133 and the asso-
ciated adjoint variable equations. The boundary conditions for δx(t) are
provided at t = 0, while those for the adjoint variables λ(t) are known at
t = tf. The solution to the resulting two-point boundary value problem
can be conveniently expressed using the Riccati transformation, wherein
the adjoint variables and the corresponding state variables are related as
[498, 560]

λ(t) = S(t) δx(t).    (7.136)
For the objective functional in Eq. 7.135, the variation of the n × n matrix
S(t) with t is described by the following Riccati equation

−dS/dt = S A + Aᵀ S − (S B + Q) R⁻¹ (Qᵀ + Bᵀ S) + P,  S(tf) = Sf.    (7.137)

The solution to Eq. 7.137 is then employed to relate the manipulated inputs
to the state variables as per the following perturbation feedback control law
[84, 498]

u(t) = u*(t) − K(t) [x(t) − x*(t)],    (7.138)

with

K(t) = R⁻¹(Qᵀ + BᵀS).    (7.139)
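Eqs. 7.137–7.139 reduce, for a single state and input with Q = 0, to a scalar Riccati equation that can be integrated backward from S(tf) = Sf. A sketch with made-up coefficients (not values from the text):

```python
# Scalar instance of Eqs. 7.137-7.139 with Q = 0 (illustrative coefficients):
#   -dS/dt = 2*A*S - (B*S)**2/R + P,  S(t_f) = S_f,  K(t) = B*S(t)/R
A, B, P, R, S_f = -0.2, 1.0, 1.0, 0.5, 0.0
t_f, n = 2.0, 2000
dt = t_f / n

S = S_f
gains = []
for _ in range(n):                       # march backward from t_f toward 0
    dS_dt = -(2.0 * A * S - (B * S) ** 2 / R + P)
    S -= dS_dt * dt                      # explicit Euler step in reverse time
    gains.append(B * S / R)              # time-varying proportional gain K

print(round(S, 3), round(gains[-1], 3))  # S(0) and K(0)
```

The stored gain trajectory K(t) is what the perturbation feedback law of Eq. 7.138 would apply along the nominal optimal trajectory.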
For implementation of the feedback control policy outlined in Eqs. 7.137-
7.139, knowledge of the optimal open-loop control policies is required. If
the initial condition x(0) is altered, the entire nonlinear open-loop opti-
mal control policy must be recalculated, since nonlinear optimal control
problems, such as the ones encountered with bioprocesses, depend nonlin-
early on the initial process conditions. A set of optimal open-loop control
policies over a range of nominal initial conditions x0 must be calculated
and stored prior to implementation of optimal feedback control. The corre-
sponding trajectories of controller gains, K(t), based on solution of the Riccati
equation, Eq. 7.137, should be calculated and stored. The on-line feedback
control can then be implemented by identifying the closest initial condi-
tion (among the stored values) to the actual initial condition and using
the corresponding trajectory of proportional controller gain matrix, K(t),
for feedback control. The procedure described here is useful for designing
optimal proportional controllers with time-varying gains. Besides the pro-
portional action, the other two controller actions, viz., the derivative and
integral actions, can be built in with certain modifications of the problem
considered earlier [135, 498]. For example, integral action can be added by
inclusion of time derivative of u in the objective function J or by augment-
ing the state variables by p auxiliary state variables z(t) with
[Figure 7.9. Past and future trajectories in model predictive control: measured outputs ym(k), past inputs u(k), the target trajectory, predicted outputs y(k + i), and future control moves up to u(k + m − 1).]
the process output will change in the future if no control action is taken
(model-based prediction) and to target control action as a compensatory
effect for what will need to be corrected after the full effects of the previously
implemented control action have been completely realized. This is the
motivation behind the MPC methodology.
The MPC design methodology consists of four elements: (i) specifica-
tion of reference trajectories for the process outputs, (ii) model-based pre-
diction of process outputs, (iii) model-based computation of control action,
and (iv) update of the error prediction for future control action. The varia-
tions among different MPC schemes are based primarily on how each element is
implemented in the scheme. The continuous-time process operation
is comprised of successive time intervals. The four elements of MPC must
be updated in each time interval. For this reason, it is convenient to work
with discrete-time models for the process and controllers. Discrete-time
models are naturally well suited since most MPC schemes are implemented
using digital computers. Techniques for transformation of continuous-time
models into discrete-time models have been discussed earlier in Chapter 4.
ries for the process outputs, y*(k) (Figure 7.9). For an individual output,
this can be a fixed set-point value or a trajectory. The second element in-
volves prediction of the trajectory of process outputs y in response to changes
in the manipulated variables u in the absence of further control action. At
the present time k (t = kT, T = sampling period), the behavior of the
process is predicted over a horizon p. For discrete-time systems, this leads
to prediction of y(k + 1), y(k + 2), ..., y(k + i) for i sample times into the
future based on all actual past control actions u(k), u(k − 1), ..., u(k − j)
(Figure 7.9). In the third element of MPC, the same model as that used in
the second element is employed to calculate control trajectories that lead
to optimization of a specified objective function, which typically may in-
clude minimization of the predicted deviation of the process outputs from
the target trajectories over the prediction horizon and minimization of the
expense of control effort in driving the process outputs to their respective
target trajectories. This is equivalent to constructing and utilizing a suit-
able model inverse to predict trajectories of the manipulated inputs. This
optimization must of course be accomplished while satisfying pre-specified
operating constraints. This element therefore involves prediction of the
control sequence u(k), u(k + 1), ..., u(k + m − 1) required for achieving
the desired output behavior p sampling times into the future [from t = kT
to t = (k + p − 1)T] (Figure 7.9). Usually, the prediction horizon p is
larger than the control horizon m. For the computations, all control com-
mands for times (k + m) to (k + p) are kept constant at their values at
time (k + m − 1). This reduces the computational burden during real-time
optimization. The last element of MPC involves comparison of the output
measurements ym(k) to the model-predicted values of the same, y(k). The pre-
diction error e(k) = ym(k) − y(k) [not to be confused with the controller
input, e(k) = ym(k) − y*(k)] is then used to update future predictions
y(k) = Σ_{i=0}^{k} g(i) u(k − i),    (7.141)
with g(i) being the impulse-response functions of the process. The step-
response model can be expressed as

y(k) = Σ_{i=0}^{k} β(i) Δu(k − i),    (7.142)

with β(i) being the step-response functions for the process and Δu(k) =
u(k) − u(k − 1). For all real, causal systems, both g(0) and β(0) are consid-
ered to be trivial, hence such systems will exhibit the mandatory one-step
delay. For a process represented by the two model forms in Eqs. 7.141 and
7.142, the equivalency of the two models follows by equating the coefficients
of u(k − i), i = 0, 1, ..., k, leading to the following relations:
β(i) = Σ_{j=0}^{i} g(j),  g(i) = β(i) − β(i − 1).    (7.143)
The effect of time delay is included when d > 1. For real, causal processes,
it follows that a(0) = b(0) = 0. The coefficients a(i) and b(i) and the
time delay, d, in Eq. 7.144 must be identified by fitting the model to
experimental process data. The linearized continuous-time versions of the
nonlinear continuous-time state-space models, such as those in Eqs. 7.104 and
7.106 obtained from linearization of Eqs. 7.1 and 7.2, can be transformed
into time-series models as in Eq. 7.144 with relative ease.
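The impulse/step-response equivalence in Eq. 7.143 is easy to verify numerically (the g(i) sequence below is made up):

```python
# beta(i) = sum_{j<=i} g(j)  and back:  g(i) = beta(i) - beta(i-1)  (Eq. 7.143)
g = [0.0, 0.4, 0.3, 0.2, 0.1]   # g(0) = 0: the mandatory one-step delay

beta, total = [], 0.0
for gi in g:                     # running sum gives the step response
    total += gi
    beta.append(total)

g_back = [beta[0]] + [beta[i] - beta[i - 1] for i in range(1, len(beta))]
ok = all(abs(a - b) < 1e-12 for a, b in zip(g_back, g))
print(beta, ok)
```

The same β(i) coefficients are the entries of the dynamic matrix used by DMC below.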
7.6. Model Predictive Control 441
Figure 7.10. The elements of DMC: The "reference trajectory" is the set-
point line [438].
From BA Ogunnaike and WH Ray. Process Dynamics, Modeling, and Control. New York: Oxford
University Press, Inc., 1994. Used by permission.
that

y(k + i) = Σ_{j=1}^{i} β(j) Δu(k + i − j) + w(k + i),  i = 1, 2, ..., m,

y(k + i) = Σ_{j=1}^{m} β(i − m + j) Δu(k + m − j) + w(k + i),
i = m + 1, m + 2, ..., p.    (7.148)
Eq. 7.148 may be rewritten succinctly in matrix-vector form and rearranged
into ê(k + 1) = X Δu(k), Eq. 7.152, where X is the p × m dynamic matrix
built from the step-response coefficients. The left hand side of Eq. 7.152
represents the predicted deviation of the process output from the desired
set-point trajectory in the absence of further control action, and the
right hand side the predicted change in the process output resulting from
the control action Δu(k).
The horizon over which control moves are computed is always smaller
than the horizon chosen for output prediction (i.e., m < p). As a result,
Eq. 7.152 represents an overdetermined system of equations for which no
exact solution exists. A satisfactory "solution" to Eq. 7.152 may then
be obtained by minimizing an appropriate metric of the difference between
its left hand and right hand sides. One such metric is the objective in
Eq. 7.153.
min_{Δu(k)} J = [e(k + 1) − X Δu(k)]^T [e(k + 1) − X Δu(k)]
             + κ [Δu(k)]^T Δu(k),   κ > 0   (7.153)
The second term on the right hand side of Eq. 7.153 reflects a penalty
against excessive control action. The necessary condition for minimization
of J with respect to Δu(k) is that the derivative vector ∂J/∂Δu(k) be
zero. Applying this condition to Eq. 7.153 leads to the following
feedback control law [438].
(X^T X + κI) Δu(k) = X^T e(k + 1)  ⟹  Δu(k) = (X^T X + κI)^{-1} X^T e(k + 1).
(7.154)
The projected error vector requires the vector of future values of the
effects of unmeasured disturbances on the process output, values that are
not available at the present time k. In the absence of better information,
w(k + i) is estimated as

w(k + i) = y_m(k) − ŷ(k),   i = 1, 2, …, p.   (7.155)
It is not advisable to implement the entire control sequence Δu(k), Δu(k +
1), …, Δu(k + m − 1) calculated from Eq. 7.154 in quick succession, for
the following reasons. It is impossible to anticipate precisely, over the
next m sampling intervals, the process-model mismatch and unmodeled
disturbances that will cause the actual state of the process to differ from
the model predictions used to compute this sequence of control actions.
Additionally, the process set point may change at any time over the next m
intervals as better information becomes available on the status of the
process. The precomputed control sequence is inherently incapable of
reflecting changes which occur after the computation. For these reasons, as
mentioned earlier, the MPC strategy is to implement only the first control
action, Δu(k), and repeatedly execute the following steps.
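The receding-horizon procedure can be sketched for the unconstrained SISO case of Eq. 7.154; the function names and the first-order step-response coefficients used below are illustrative assumptions, not from the text.

```python
import numpy as np

def dynamic_matrix(beta, p, m):
    """Build the p x m dynamic matrix X from step-response coefficients,
    where beta[i-1] holds beta(i); X[i, j] multiplies Delta-u(k + j)."""
    X = np.zeros((p, m))
    for i in range(p):
        for j in range(min(i + 1, m)):
            X[i, j] = beta[i - j]
    return X

def dmc_move(X, e, kappa):
    """First element of the least-squares control law
    du = (X'X + kappa I)^-1 X' e (Eq. 7.154); receding horizon means
    only du[0] is actually implemented."""
    m = X.shape[1]
    du = np.linalg.solve(X.T @ X + kappa * np.eye(m), X.T @ e)
    return du[0]
```

With κ = 0 and p = m the law inverts the model exactly; increasing κ shrinks the computed moves, which is the move-suppression effect described above.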
with β_31 and β_32 being the parameters indicative of the sensitivity of y_3
to changes in u_1 and u_2, respectively. The procedure for applying the
four elements of MPC to MIMO processes is the same as that for SISO
processes, except that the matrices and vectors involved are much larger.
For example, the relation between the predicted error vector and the
control action in Eq. 7.152 is also applicable for MIMO processes with
X = [ X_11  X_12 ; X_21  X_22 ; ⋮  ⋮ ]   (7.157)

where block X_ij is the dynamic matrix of step-response coefficients
relating output i to input j.
Nonlinear MPC
Many processes have significant nonlinearities that challenge successful
implementation of linear MPC. This has motivated the development of
nonlinear MPC (NMPC), which relies on the use of a nonlinear process model.
NMPC has the potential of improving process operation, but it also poses
challenging theoretical and practical problems, mostly because of the
nonlinear optimization problem that must be solved at each sampling
instant in real time to compute the control moves.
Many nonlinear model representations were discussed in Section 4.3.
Consider the general form expressed in Eqs. (4.44)–(4.45)

x(k + 1) = f(x(k), u(k)),   y(k) = h(x(k), u(k))   (7.161)

where x(k) is a condensed form of the notation x(t_k) used in Section
4.3.2. The optimization problem in NMPC formulation can be expressed
as finding the values of u to optimize the objective function J subject to
constraints [234, 381]
min_{u(k|k), …, u(k + m − 1|k)} J = L_0[y(k + p|k)]
   + Σ_{j=0}^{m−1} L_j[y(k + j|k), u(k + j|k)]   (7.162)

The reference trajectory is corrected with the estimated disturbance:

y_r(k) = y_sp − d(k)
d(k) = y(k) − ŷ(k|k)   (7.168)
where y_sp are the set points of the outputs, y(k) are the measured values
of the outputs, ŷ(k|k) are the output estimates obtained from the nonlinear
model Eq. (7.161), and d(k) are the estimated disturbances. This disturbance
model assumes that plant-model mismatch is attributable to a step dis-
turbance in the output that remains constant over the prediction horizon
[234]. A method for incorporating integral action based on steady-state
target optimization has been developed [381].
Simultaneous state and disturbance estimation can be performed by
augmenting the state-space model:
x(k + 1) = f(x(k), u(k))
d(k + 1) = d(k)   (7.169)
y(k) = h(x(k), u(k)) + d(k)
where d(k) is a constant output disturbance. The augmented process model
can be used for designing a nonlinear observer. A general theory for nonlin-
ear observer design is not available, and input-output models are preferred
over state-space models when full state feedback is not available. A list of
NMPC applications with simulations and experimental studies is given in
[234] along with a discussion of computational issues and future research
directions.
Heuristic tuning guidelines are discussed in [381] and summarized in
[234]. For stable systems, the sampling interval should be selected to
provide a compromise between the on-line computation load and closed-loop
performance. There is an inverse relationship between the sampling interval
and the allowable modeling error. Smaller control horizons (m) yield more
sluggish output responses and more conservative input moves, while large
values of m increase the computational burden. Large prediction horizons
(p) cause more aggressive control and a heavier computational burden. The
weighting matrices (Q, R, S) depend on the scaling of the problem; usually
they are diagonal matrices with positive elements. The parameter values can
be tuned via simulation studies.
Computational constraints and stability of the controlled system are
critical issues in NMPC. The need to solve the nonlinear programming
problem in real time necessitates efficient and reliable nonlinear
programming techniques and MPC formulations with improved computational
speed. Successive linearization of the model equations, sequential model
solution and optimization, and simultaneous model solution and optimization
are some of the approaches proposed in recent years [234, 381].
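As a concrete toy illustration of the sequential solution-and-optimization idea, the sketch below performs a coarse grid search over m control moves for a hypothetical scalar model with the constant output-disturbance correction described above; the plant equations, horizons, and candidate grid are invented for illustration, not taken from the text.

```python
import itertools
import numpy as np

# Hypothetical scalar nonlinear plant x(k+1) = f(x,u), y = h(x),
# used only for illustration.
def f(x, u):
    return 0.8 * x + 0.2 * np.tanh(u)

def h(x):
    return x

def nmpc_move(x, d, ysp, m=2, p=5, candidates=np.linspace(-2, 2, 21)):
    """One NMPC step: search over m moves (inputs held constant from
    k+m-1 to k+p-1), minimizing the sum of squared set-point errors,
    with the constant disturbance estimate d added to each prediction."""
    best_u, best_J = None, np.inf
    for u_seq in itertools.product(candidates, repeat=m):
        xp, J = x, 0.0
        for i in range(p):
            u = u_seq[min(i, m - 1)]        # inputs frozen after k+m-1
            xp = f(xp, u)
            J += (ysp - (h(xp) + d)) ** 2   # step disturbance model
        if J < best_J:
            best_u, best_J = u_seq[0], J
    return best_u   # receding horizon: implement only the first move
```

A real NMPC implementation would replace the grid search with a nonlinear programming solver; the grid merely makes the optimization-at-every-sample structure explicit.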
Fault Diagnosis
One approach for FDD that appeals to plant personnel is to first iden-
tify process variables that have significant influence on an out-of-control
signal issued by process monitoring tools, and then to reason based on
their process knowledge about the possible source causes that affect these
variables. The influence of process variables can be determined by contri-
bution plots discussed in Section 8.1. The second stage of this indirect FDD
approach can be automated by using knowledge-based systems. Many FDD
techniques are based on direct pattern recognition and discrimination that
diagnoses the fault directly from process data and models. Their founda-
tions are built on signal processing, machine learning and statistics theory.
In some techniques, trends in process variables are compared directly to
a library of patterns that represent normal and faulty process behavior.
The closest match is used to identify the status of the process. Statistical
discrimination and classification analysis, and Fisher's discriminant func-
tion are some of the techniques drawn from statistical theory. They are
discussed in Section 8.2. Other model-based FDD techniques are based
on signal processing and systems science theory such as Kalman filters,
residuals analysis, parity relations, hidden Markov models, and parameter
estimation. They are introduced in Section 8.3. Artificial neural networks
provide FDD techniques that rely on fundamentals of classification and
machine learning drawn from statistics and computer science. Knowledge-based
systems (KBS) provide another group of FDD techniques that have roots
in artificial intelligence. KBSs and their use in integrating and supervis-
ing various model-based and model-free FDD techniques are discussed in
Section 8.4.
Faults can be classified as abrupt (sudden) faults and incipient (slowly
developing) faults. Abrupt faults may lead to catastrophic consequences.
They need to be detected quickly to prevent compromise of safety, produc-
tivity or quality. Incipient faults are usually associated with maintenance
problems (heat exchange surfaces getting covered with deposits) or devia-
tion trends in critical process activities from normal behavior (trends in cell
growth in penicillin production). Incipient faults are typically small and
consequently more difficult to detect. Multivariate techniques are more
useful in their detection (See Chapter 6) since these techniques make use
of information from all process measurements and can notice burgeoning
trends in many variables and integrate that information to reach a decision.
Quick detection may not be as critical for maintenance related problems,
but deviations in critical process activities are usually time critical. The
time behavior of faults can be grouped into a few generic types: jump (also
called step or bias change), intermittent, and drift (Figure 8.1). Jumps in
sensor readings are often caused by bias changes or breakdown. Wrong
manual recording of data entries or loose wire connections that lose contact
where x_new,jk is the jkth element of x_new (1 × JK), x̂_new,jk is its
prediction by the model, and e_new,jk is the corresponding residual.
Recently, control limits for variable contributions to Q-residuals were
suggested by Westerhuis et al. [639] to compare the residuals of the new
batch to the residuals of the NO data. If a particular variable has high
residuals in the NO set, it can also be expected to have high residuals in
the new batch. The control limits are calculated similarly to those of the
Q-statistic discussed in Section 6.4.2 (Eqs. 6.104–6.111). The residuals
matrix E of the reference set that is used to calculate contribution limits
is obtained by "monitoring" each reference batch with one of the on-line
SPM techniques discussed in Sections 6.5.1 and 6.5.2.
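A minimal numerical sketch of such contribution limits, assuming the residuals are available as arrays (the function names and the mean-plus-3σ limit construction used here are illustrative assumptions):

```python
import numpy as np

def spe_contributions(e_new):
    """Variable contributions to the Q-statistic (SPE): the squared
    residual of each variable; they sum to the SPE itself."""
    return e_new ** 2

def contribution_limits(E_ref):
    """Empirical upper/lower limits per variable contribution:
    mean +/- 3 std over the reference (NO) batches' contributions."""
    C = E_ref ** 2                        # contributions of each NO batch
    mu, sd = C.mean(axis=0), C.std(axis=0)
    return mu + 3 * sd, np.maximum(mu - 3 * sd, 0.0)
```

A new batch's contribution exceeding its upper limit flags that variable as a likely carrier of the fault signature.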
Contributions of process variables to the D-statistic. Two different
approaches for calculating variable contributions to the D-statistic have
been proposed. The first approach, introduced by Miller et al. [389] and by
MacGregor et al. [355], calculates the contribution of each process variable
to a separate score. The first step in this approach is to determine the t
score that is above its own confidence limits. Constructing confidence
limits on individual scores is discussed and formulated in Section 6.4.2
(Eq. 6.95). The next step is to calculate the contribution of each element
x_new,jk of the new batch run to the rth score [389, 639]

c_jk,r = x_new,jk p_jk,r   (8.4)
The sum of the contributions in Eq. 8.4 is equal to the score t_new,r of the
new batch.
The second approach was proposed by Nomikos [432]. This approach
calculates contributions of each process variable to the D-statistic instead
of contributions to separate scores:

c_jk^D = Σ_{r=1}^{R} (t_new,r / s_r²) x_new,jk p_r,jk   (8.5)

In Eq. 8.5, the contribution of each element x_new,jk to the D-statistic
is summed over all R components. This formulation is valid for the case of
orthogonal scores, because S^{-1}, the inverse of the covariance matrix
of the reference set scores T, then becomes diagonal and its diagonal
elements s_r² are used. The loadings P of the MPCA model are also assumed
to be orthogonal so that P^T P = I. Westerhuis et al. [639] have extended
Nomikos' [432] formulation to cases where scores and loadings are
non-orthogonal. According to this generalization, the D-statistic is
calculated as follows:
According to this generalization, D-statistic is calculated as follows:
T 1 T
n
•'-'new —tfnew°
— —^new
Q- *L new — f Q-
0
jk=l
jk=l
JK
W •> j-v
X s~*iL> (Q &\
/ _^ J™ ^ '
jk=l
Hence, the contribution of new observation vector xnewjk of the new batch
to the D-statistic is calculated as
/~iD iT o —1 „ T^^1 ^"P/^n"P\ (R r7\
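The generalized contribution of Eq. 8.7 can be sketched in a few lines of numpy (variable names are illustrative); the check that the contributions sum to D_new mirrors Eq. 8.6.

```python
import numpy as np

def d_contributions(x_new, P, S):
    """Variable contributions to the D-statistic for possibly
    non-orthogonal loadings P (JK x R) and reference score covariance
    S (R x R), following the generalization of Eqs. 8.6-8.7."""
    t_new = x_new @ P @ np.linalg.inv(P.T @ P)        # least-squares scores
    v = P @ np.linalg.inv(P.T @ P) @ np.linalg.solve(S, t_new)
    return x_new * v                                   # c_jk, sums to D_new
```

For orthonormal loadings and diagonal S this reduces to Nomikos' formulation in Eq. 8.5.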
The upper control limit (UCL) is obtained as the mean of the variable
contributions at each time interval plus three times the corresponding
standard deviation. The UCL obtained by this calculation is not considered
to have statistical significance, but it is useful for detecting
contributions that are higher than those of the NO batches in the reference
set. A lower control limit (LCL) can also be developed in the same
manner. If it is preferred to sum contributions over all time instances or
over all process variables, then the control limits are obtained by summing
the means of the corresponding jackknifed contributions from the reference
set. The standard deviation of these summed means can be calculated as
[639]

σ_k* = ( Σ_{k=1}^{K} σ_k² )^{1/2},   σ_j* = ( Σ_{j=1}^{J} σ_j² )^{1/2}   (8.8)

where σ_k and σ_j are the standard deviations of the summed mean contri-
butions over all process variables and all time instances, respectively. If
the sum of the contributions over all variables at each time instance is
used, one can zoom into the region(s) where the summed contributions exceed
the control limits calculated by using σ_k in Eq. 8.8.
It is always good practice to check the individual process variable plots
for those variables diagnosed as responsible for flagging an out-of-control
situation. When the number of variables is large, analyzing contribution
plots and corresponding variable plots to reason about the faulty condition
may become tedious and challenging. All these analyses can be automated
and linked with real-time diagnosis [436, 607] by means of knowledge-based
systems.
Example. Consider a reference data set of 42 NO batches from fed-batch
penicillin fermentation process (see Section 6.4.1). An on-line SPM frame-
work is developed with that data set (X(42 x 14 x 764)). The model devel-
opment stage and the MPCA model developed are the same as in Section
6.4.3, except that the construction of control limits is performed by passing
each batch data in the reference set through the estimation-based on-line
SPM procedure. Estimation method 2 (the future values of disturbances
being assumed to remain constant at their current values over the remaining
batch period) discussed in Section 6.5.1 is chosen for on-line SPM. A new
batch scenario with a small downward drift on glucose feed rate (variable
3) between 180th and 300th measurements (Figure 8.3(d)) is produced for
illustration of contribution plots. Both the SPE (Figure 8.2(a)) and T²
(Figure 8.2(c)) charts detected the out-of-control situation, between the
250th and 310th and the 270th and 290th measurements, respectively.
Variable contributions summed over the out-of-control intervals for SPE
and T² are shown in Figures 8.2(b) and 8.2(d). Since these summations
represent
Figure 8.2. On-line monitoring results with contribution limits for a faulty
batch.
the faulty situation after the fault has developed long enough to affect
related variables, most of the variable contributions in Figures 8.2(b) and
8.2(d) violate the control limits. The real fault is the drift in glucose
feed rate (variable 3), which is highly correlated with glucose
concentration (variable 5), dissolved oxygen concentration (variable 6),
biomass concentration (variable 7), penicillin concentration (variable 8),
culture volume (variable 9), heat generated (variable 13), and cooling
water flow rate (variable 14). Note that penicillin concentration (variable
8) in Figure 8.2(b) and dissolved oxygen concentration (variable 6) in
Figure 8.2(d) have the highest contributions to SPE and T² during the
out-of-control period. Variable contributions to T² over all of the
variables at each time instant are also presented in Figure 8.3(a) as
another indicator for detecting out-of-control situations.
8.1. Contribution Plots 461
Figure 8.3. On-line monitoring results with contribution limits for a faulty
batch.
The minimum expected cost of misclassification (ECM) rule allocates an
observation x to the population π_k for which

Σ_{i=1, i≠k}^{g} c(k|i) p_i f_i(x)   (8.12)

is smallest [18, 262]. If all misclassification costs are equal, the event
described by data x will be assigned to the population π_k for which
Σ_{i=1, i≠k}^{g} p_i f_i(x) is smallest. This means that the omitted term
p_k f_k(x) is largest. Consequently, the minimum ECM rule for equal
misclassification costs becomes [262]:

Allocate x to π_k if p_k f_k(x) > p_i f_i(x) for all i ≠ k.
If the populations are multivariate Normal with densities

f_k(x) = (2π)^{−p/2} |Σ_k|^{−1/2} exp[ −(1/2)(x − μ_k)^T Σ_k^{-1} (x − μ_k) ]   (8.14)

and all misclassification costs are equal, then x is allocated to π_k if

ln p_k f_k(x) = ln p_k − (p/2) ln(2π) − (1/2) ln|Σ_k|
              − (1/2)(x − μ_k)^T Σ_k^{-1} (x − μ_k) = max_i ln p_i f_i(x).   (8.15)

The constant (p/2) ln(2π) is the same for all populations and can be ignored
in discriminant analysis. The quadratic discrimination score for the ith
population, d_i^Q(x), is defined as [262]

d_i^Q(x) = ln p_i − (1/2) ln|Σ_i| − (1/2)(x − μ_i)^T Σ_i^{-1} (x − μ_i),
           i = 1, …, g.   (8.16)
The generalized variance |Σ_i|, the prior probability p_i, and the
Mahalanobis distance all contribute to the quadratic score d_i^Q(x). Using
the discriminant scores, the minimum total probability of misclassification
rule for Normal populations and unequal covariance matrices becomes [262]:

Allocate x to π_k if d_k^Q(x) is the largest of all d_i^Q(x), i = 1, …, g.
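Eq. 8.16 and the allocation rule translate directly into a short sketch, using sample estimates in place of the population quantities (function names are illustrative):

```python
import numpy as np

def quadratic_score(x, p_i, xbar_i, S_i):
    """Quadratic discrimination score of Eq. 8.16 with sample estimates:
    ln p_i - 0.5 ln|S_i| - 0.5 (x - xbar)' S_i^-1 (x - xbar)."""
    diff = x - xbar_i
    sign, logdet = np.linalg.slogdet(S_i)
    maha = diff @ np.linalg.solve(S_i, diff)
    return np.log(p_i) - 0.5 * logdet - 0.5 * maha

def allocate(x, priors, means, covs):
    """Assign x to the population with the largest quadratic score."""
    scores = [quadratic_score(x, p, m, S)
              for p, m, S in zip(priors, means, covs)]
    return int(np.argmax(scores))
```

With equal priors and identity covariances this reduces to nearest-mean classification, which makes the rule easy to sanity-check.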
In practice, the population means and covariances (μ_i and Σ_i) are unknown.
Computations are based on historical data sets of classified observations,
and the sample means (x̄_i) and covariance matrices (S_i) are used in Eq.
(8.16). A simplification is possible if the population covariance matrices
Σ_i are equal for all i. Then Σ_i = Σ and Eq. (8.16) reduces to

d_i^Q(x) = ln p_i − (1/2) ln|Σ| − (1/2) x^T Σ^{-1} x + μ_i^T Σ^{-1} x
         − (1/2) μ_i^T Σ^{-1} μ_i.

Since the second and third terms are independent of i, they are the same for
all d_i^Q(x) and can be ignored in classification. Since the remaining terms
8.2. Statistical Techniques for Fault Diagnosis 465
consist of a constant for each i (ln p_i − (1/2) μ_i^T Σ^{-1} μ_i) and a
linear combination of the components of x, a linear discriminant score is
defined as

d_i^L(x) = ln p_i + x̄_i^T S^{-1} x − (1/2) x̄_i^T S^{-1} x̄_i

where the pooled covariance estimate is

S = [ (n_1 − 1) S_1 + (n_2 − 1) S_2 + ⋯ + (n_g − 1) S_g ]
    / (n_1 + n_2 + ⋯ + n_g − g).   (8.20)
[Figure: (a) residual and 95% confidence level vs. time index; (d) chosen
class vs. time index.]
The combined discriminant for fault model i weighs the score and residual
tests,

c_i = q (s_i / s_i,α) + (1 − q)(r_i / r_i,α),

where s_i and r_i are the score distance and residual based on the PC model
for fault i, s_i,α and r_i,α are the corresponding score distance and
residual thresholds, and q is a weight between 0 and 1. To weigh scores and
residuals according to the amount of variation in the data explained by
each, q is set equal to the fraction of the total variance explained by the
scores. The combined discriminant value thus calculated gives an indication
of the degree of certainty of the diagnosis; statistics less than 1 indicate
a good fit to the chosen model. If no model results in a statistic less than
1, none of the models provides an adequate match to the observation.
The FDD system design includes development of PC models for NO and
faulty operation, and computation of threshold limits using historical data
sets collected during normal plant operation and operation under specific
faults. The implementation of the FDD system at each sampling time
starts with monitoring. The model describing NO is used with new data
to decide if the current operation is in-control. If there is no significant
evidence that the process is out-of-control, further analysis is not necessary
and the procedure is concluded for that measurement time. If score or
residual tests exceed their statistical limits, there is significant evidence
that the process is out-of-control. Then, the PC models for all faults are
used to carry out the score and residuals tests, and discriminant analysis
is performed by using PC models for various faults to diagnose the source
cause of abnormal behavior.
Discrimination and Diagnosis of Multiple Disturbances
In fault diagnosis, where process behavior due to different faults is de-
scribed by different models, it is useful to have a quantitative measure of
[Figure: classification regions "Classify as π_1" and "Classify as π_2"
separated by the discriminant boundary.]

The within-class and between-class scatter matrices sum to the total
scatter matrix, S_t = S_w + S_B.
The first FDA vector w_1, which maximizes the scatter between classes (S_B)
while minimizing the scatter within classes (S_w), is obtained from

max_{w ≠ 0} (w^T S_B w) / (w^T S_w w)   (8.36)
under the assumption that S_w is invertible [139, 99]. The second FDA
vector is calculated to maximize the scatter between classes while
minimizing the scatter within classes among all axes perpendicular to the
first FDA vector w_1. Additional FDA vectors are determined, if necessary,
by using the same maximization objective and orthogonality constraint.
These FDA vectors w_a form the columns of an optimal W and are the
generalized eigenvectors corresponding to the largest eigenvalues in

S_B w_a = λ_a S_w w_a   (8.37)

where W_a contains the first a FDA vectors [99]. The allocation rule is:
Allocate x_0 to π_k if d_k(x_0) is the largest of all d_i(x_0), i = 1, …, g.
The classification rule is used in conjunction with Bayes' rule [262, 99]
to compute the posterior probability (Eq. 8.13) that the class membership
of the observation x_0 is i, under the assumption that
Σ_{k=1}^{g} P(π_k|x) = 1. This assumption may lead to a situation where the
observation is classified wrongly into one of the fault cases used to
develop the FDA discriminant when an unknown fault occurs. Chiang et al.
[99] proposed several screening
procedures to detect unknown faults. One of them involves computing an
FDA-related T² statistic before applying Eq. 8.38:

T_i² = (x_0 − x̄_i)^T W_a (W_a^T S_i W_a)^{-1} W_a^T (x_0 − x̄_i)   (8.39)
T_0² = x_0^T P_a Λ_a^{-1} P_a^T x_0   (8.41)

where Λ_a is the (a × a) diagonal matrix containing the eigenvalues and P_a
holds the loading vectors. A set of threshold values based on NO and the
known fault classes is calculated using Eq. 8.40. If T_0² < T_α², it is
concluded that this is a known class (either NO or faulty), and the FDA
assignment rule is used to diagnose the fault class (or the NO class if the
process is in-control).
The second combined algorithm (FDA/PCA) deploys FDA initially to
determine the most probable fault class i. Then it uses the PCA T²
statistic to find out whether the observation x_0 is truly associated with
fault class i.
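The FDA directions of Eq. 8.36 can be computed from the generalized eigenvalue problem of Eq. 8.37; the sketch below solves eig(S_w^{-1} S_B) for a small labeled data set (function name and data handling are illustrative assumptions):

```python
import numpy as np

def fda_vectors(X, labels, n_vectors):
    """FDA directions as generalized eigenvectors of S_B w = lambda S_w w,
    computed via eig(inv(S_w) S_B); columns sorted by eigenvalue."""
    X = np.asarray(X, float)
    xbar = X.mean(axis=0)
    Sw = np.zeros((X.shape[1],) * 2)
    Sb = np.zeros_like(Sw)
    for c in np.unique(labels):
        Xc = X[labels == c]
        xc = Xc.mean(axis=0)
        Sw += (Xc - xc).T @ (Xc - xc)           # within-class scatter
        Sb += len(Xc) * np.outer(xc - xbar, xc - xbar)  # between-class
    vals, vecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1]
    return vecs[:, order[:n_vectors]].real
```

For two classes separated along one coordinate, the leading FDA vector should point essentially along that coordinate, which gives a quick correctness check.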
[Figure: multilayer feedforward ANN with input variables, a hidden layer,
an output layer, and bias nodes.]
(first stage) is followed by another ANN (second stage) that uses the
outputs of the previous one to determine the level of deterioration
(severity of the deviations) [631]. Such a network will have a number of
outputs equal to the number of causes times the number of levels of
deterioration; this considerably increases the computational requirements.
A cascaded hierarchically layered network has also been suggested for
simultaneously detecting multiple faults [630]. Recently, an alternative
two-stage framework was suggested for the use of ANNs in FDD [360]. In this
two-stage network, a primary network is trained to determine basic process
trends (increasing, decreasing, and steady), including the level of change.
The secondary network receives the outputs from the primary network and
assigns them to the particular faults that it is trained for. It is
reported that when the network receives data for an unknown fault, it
assigns the fault either to normal operation or to an untrained-faults
class [360].
Most ANN-based FDD architectures assume that input-output pairs are
available on-line. But in fermentation processes, very important state
variables such as biomass and substrate concentrations are measured
off-line in the laboratory, while measurements of variables such as
dissolved oxygen and carbon dioxide concentrations are available on-line.
To develop a reliable ANN-based FDD scheme, values of infrequently measured
(or off-line available) variables must be provided as well. This can be
done by including state observers or estimators such as Extended Kalman
Filters (EKF) (Section 6.5.4) in the FDD framework. Such a cascaded
ANN-based fault diagnosis system (Figure 8.7), designed particularly for
fermentation processes (glutamic acid fermentation in particular), was
proposed by Liu [345]. A typical ANN architecture is used in the
classifier, which is a multi-layer feed-forward network.
The residual statistics for the test sample are then generated by
using the PLS calibration model. The statistical test compares the residual
statistics of the test sample with the statistics of the calibration set to
detect any significant departures.
Denote by R_i the ith N × 1 residual column of the N × p residual block
matrix R. The statistic for testing the null hypothesis of the equality of
means from two normal populations with equal and unknown variances is

t = ( R̄_i,test − R̄_i,model ) / ( σ_pi (1/N + 1/N_t)^{1/2} ) ~ t_{N+N_t−2}   (8.42)

where R̄_i,test and R̄_i,model denote the maximum likelihood estimates of the
residual means for variable i in the test sample and the calibration set,
σ_pi is the pooled standard deviation of the two residual populations for
the ith variable, N and N_t denote the sizes of the calibration and testing
populations, and t_{N+N_t−2} is the t-distribution with N + N_t − 2 degrees
of freedom [140].
The statistic for testing the null hypothesis of the equality of variances
from two normal populations with unknown means is [140]

s_i,test² / s_i,model² ~ F_{N_t−1, N−1}   (8.43)
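Both statistics translate directly into a short sketch (Eqs. 8.42–8.43); the significance thresholds from the t and F distributions are not computed here, and the function names are illustrative.

```python
import numpy as np

def residual_mean_test(r_test, r_model):
    """Two-sample t statistic (Eq. 8.42) comparing residual means of a
    test sample and a calibration set; pooled std, N + Nt - 2 dof."""
    Nt, N = len(r_test), len(r_model)
    sp2 = (((Nt - 1) * np.var(r_test, ddof=1)
            + (N - 1) * np.var(r_model, ddof=1)) / (N + Nt - 2))
    return ((np.mean(r_test) - np.mean(r_model))
            / np.sqrt(sp2 * (1 / N + 1 / Nt)))

def residual_var_test(r_test, r_model):
    """Variance-ratio statistic (Eq. 8.43), F with Nt-1 and N-1 dof."""
    return np.var(r_test, ddof=1) / np.var(r_model, ddof=1)
```

In use, each statistic would be compared against the appropriate t or F quantile at the chosen significance level to decide whether the test residuals depart from the calibration residuals.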
[Figure: block diagram of a controlled process. Controlled inputs u_C are
corrupted by actuator faults δu_C and actuator noise v_C; the process is
subject to faults, disturbances d, and process noise v_P; measured inputs
u_M and measured outputs y are corrupted by input and output sensor faults
(δu_M, δy) and sensor noise (v_M, v_y).]
u_C°(t) = u_C(t) + δu_C(t) + v_C(t)
u_M°(t) = u_M(t) − δu_M(t)   (8.45)
y°(t) = y(t) − δy(t) − v_y(t)
The relation in Eq. (8.44) is between the nominal inputs and outputs. This
relationship can be expanded to show explicitly the faults, noises, and
process disturbances (Eq. 8.46), where S_F(q) is the combined fault TFM and
S_N(q) is the combined noise TFM,
where component and actuator faults are modeled by E_F f_k and sensor
faults are modeled by F_F f_k. The unknown inputs affecting the actuators
and process dynamics are introduced by E_d d_k and the unknown inputs to
the sensors are introduced by F_d d_k. This modified representation is used
in illustrating the use of Kalman filters and observers in subsequent
sections.
H_0 : μ_r = 0   no fault
H_1 : μ_r ≠ 0   fault   (8.50)
where μ_r is the mean of the residual vector. Because of limited data, the
test is conducted using the sample mean of the residuals r̄ instead of μ_r.
The test may be conducted on a single residual at a given time (r(t)), a
single residual over a time window l (r̃(t) = [r(t), …, r(t − l)]^T), or an
average residual over the window l (r̄(t, l) = [1/(l + 1)] Σ_{j=0}^{l}
r(t − j)). The same tests can be conducted on a vector of residuals, where
r(t) = [r_1(t), …, r_n(t)]^T, r̃(t) = [r^T(t), …, r^T(t − l)]^T, and
r̄(t, l) = [1/(l + 1)] Σ_{j=0}^{l} r(t − j). The tests are designed for a
specified false alarm rate α, and a Normal distribution and zero mean of
the residuals are assumed. χ² tests are used for fault detection. They can
be developed for scalar or vector residuals. A detailed discussion of
scalar and vector residual tests is given in [189]. The tests for vector
residuals are summarized below.
H_0 : ε_n(r̄) ≤ χ²_{n,α}   no fault
H_1 : ε_n(r̄) > χ²_{n,α}   fault   (8.54)

with the covariance matrix Σ_r̄ = E[r̄ r̄^T]. The test statistic

ε_n(r̄) = r̄(t)^T Σ_r̄^{-1} r̄(t)   (8.55)

follows a χ² distribution with n degrees of freedom; the corresponding
Normal likelihood function and the covariance of the windowed average
residual are given in [189] (Eqs. 8.56–8.57).
Consider a sequence of residuals r(i) whose mean changes at an unknown
time q:

μ(i) = μ_0   if i ≤ q − 1
μ(i) = μ_1   if i ≥ q
The detection problem can be phrased as a hypothesis testing problem [44].

H_0 : q > k   no change
H_1 : q ≤ k   change   (8.61)
This is an easy case, since the new value of the mean (μ_1) is known and
only the change time is investigated. The likelihood ratio between these
two hypotheses is

Λ_k(r) = Π_{i=q}^{k} [ p_1(r(i)) / p_0(r(i)) ]   (8.62)

and the corresponding log-likelihood ratio is

ln Λ_k(r) = S_q^k(μ_0, δ)   (8.64)

where

S_q^k(μ_0, δ) = (δ/σ²) Σ_{i=q}^{k} ( r(i) − μ_0 − δ/2 )

and δ = μ_1 − μ_0 is the change magnitude, which is known in this case. The
jump
time q is not known. Consequently, q in the likelihood ratio (Eq. 8.62) and
the log-likelihood ratio (Eq. 8.64) should be replaced by its maximum
likelihood estimate q̂_k under hypothesis H_1:

q̂_k = arg max_{1 ≤ q ≤ k} [ Π_{i=0}^{q−1} p_0(r(i)) ] [ Π_{i=q}^{k} p_1(r(i)) ]
The resulting decision function g_k is compared with a threshold τ:

H_0 : g_k ≤ τ   no change at time k
H_1 : g_k > τ   change at time k   (8.68)
Hence, the detector signals a jump of magnitude δ in the mean at the first
time where

g_k = S_1^k(μ_0, δ) − min_{1 ≤ j ≤ k} S_1^j(μ_0, δ) > τ   (8.69)
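The detector of Eq. 8.69 is the classical CUSUM recursion; the sketch below covers the scalar Gaussian case (the residual sequence and tuning values in the test are invented for illustration):

```python
import numpy as np

def cusum_detect(r, mu0, delta, sigma2, tau):
    """CUSUM detector for a jump of known magnitude delta in the mean:
    alarm at the first k where g_k = S_1^k - min_j S_1^j exceeds tau."""
    S, Smin = 0.0, 0.0
    for k, rk in enumerate(r):
        S += (delta / sigma2) * (rk - mu0 - delta / 2)  # log-LR increment
        if S - Smin > tau:
            return k          # first alarm time
        Smin = min(Smin, S)   # running minimum of the cumulative sum
    return None               # no change detected
```

The threshold τ trades off detection delay against false alarm rate, mirroring the Type I / Type II discussion elsewhere in this chapter.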
When the magnitude of the change is also unknown, it is replaced by its
maximum likelihood estimate

δ̂_k = arg max_δ S_q^k(μ_0, δ),

which leads to the average deviation from μ_0 over the window,

S̄_q^k(μ_0) = [1/(k − q + 1)] Σ_{i=q}^{k} ( r(i) − μ_0 ).   (8.71)
where K denotes the constant term preceding the exponential part (it
remains the same for all hypotheses tested), z stands for the type of
observation used (r, r̃, or r̄), and μ_z(t) is the corresponding mean. The
simplified log-likelihood function L[·] is defined such that log K is
omitted, because it cancels out when the likelihood ratio is defined.
The ML test consists of the following procedure:

1. Compute the maximum likelihood estimates of the residual mean from the
observations under the various hypotheses H_j:

μ̂_z,j(t) = arg max_{μ_z(t)} log L[(z(t), μ_z(t)) | H_j],   j = 1, …, f   (8.75)

where the H_j are the hypotheses about the various possible faults that
impose constraints on the estimates of the mean, and f is the number of
faults. The hypotheses are a function of the properties of the residual
generators, such as the directional or structured residuals discussed
below.
2. Compute the conditional likelihood functions using the observations and
the conditional estimates:

log L_j(t) = log L[(z(t), μ̂_z,j(t))],   j = 1, …, f   (8.76)
The most likely fault is the one that yields the highest log-likelihood
value. Extensions of the ML approach with additional checks to account for
the uncertainty in the decision caused by signal noise are discussed in
[189].
p = W y   (8.78)

These conditions assure that the rows of W are orthogonal and that W spans
the left null space of C. Consequently,

p = W δy   (8.80)

Hence, the parity equations are independent of x and contain only the
errors δy caused by faults. Furthermore, the columns of W define q distinct
fault directions, each associated with only one of the measurements. If
there is a significant increase in the ith direction of p, it indicates a
faulty measurement y_i.
The residual vector r = y − Cx̂ is related to p by r = W^T p, where
x̂ = (C^T C)^{-1} C^T y is the least squares estimate of x. The FDD problem
can then be stated as a two-step procedure: (1) find x̂ and compute r;
(2) detect and diagnose the faulty measurements by parity checks. This
concept has been extended during the last three decades to handle more
complex cases involving faults, disturbances, and noise. A short discussion
of the formulation of residual generators and parity equations is given
below.
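For the static case, the construction of W and the parity check can be sketched as follows (SVD-based; the measurement matrix and bias used in the test are invented for illustration):

```python
import numpy as np

def parity_matrix(C):
    """W whose rows form an orthonormal basis of the left null space of
    C (so that W C = 0): the trailing left singular vectors of C."""
    U, s, Vt = np.linalg.svd(C)
    rank = int(np.sum(s > 1e-10))
    return U[:, rank:].T

def parity_vector(W, y):
    """Parity vector p = W y (Eq. 8.78); independent of the state x and
    sensitive only to the measurement errors delta-y (Eq. 8.80)."""
    return W @ y
```

A fault-free measurement vector y = Cx yields p ≈ 0, while a bias on one sensor shows up as a nonzero parity vector, which is exactly the detection mechanism described above.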
Residual Generators. A residual generator is a linear discrete dynamic
algorithm acting on the observable variables [189]

r(t) = V(q) u(t) + W(q) y(t)   (8.81)

where r(t) is the vector of residuals, and V(q) and W(q) are TFMs. Noting
that r(t) must be zero when all inputs u(t) and outputs y(t) are zero, and
substituting y(t) = G(q)u(t) into Eq. (8.81), yields
(V(q) + W(q)G(q))u(t) = 0. Hence Eq. (8.81), the computational form of the
residual generator, can be written as

r(t) = W(q) ( y(t) − G(q) u(t) )   (8.82)

The term in brackets in Eq. (8.82) can be substituted using Eq. (8.48) to
yield the internal form of the residual generator

r(t) = W(q) ( S_F(q) f(t) + S_N(q) n(t) )   (8.83)
Ideally, residuals r(t) should only be affected by faults. If specific unique
residuals patterns for each fault could be generated, fault detection and
isolation would reduce to checking the violation of limits of residuals and
recognizing the patterns. However, disturbances, noise and modeling errors
(nuisance inputs) contribute to residuals as well and interfere with FDD.
The residual generator should be designed such that the effects of these
nuisance inputs on the residuals are as small as possible, leading to robust
residual generators. The differences in the properties of these three nui-
sance inputs determine the approach used in marginalizing them. Additive
disturbances and modeling errors have similar temporal behavior to addi-
tive faults. Explicit decoupling of residuals from disturbances and modeling
errors is necessary to improve the detection and diagnosis capability of the
residuals.
Noise usually has much higher frequency content than fault signals, and zero
mean value. Therefore, filtering the residual signals with low-pass filters
reduces the effects of noise without affecting the fault signals significantly.
In addition, testing the residuals against some threshold value as opposed
to testing them for nonzero values reduces false alarms caused by noise.
There is a tradeoff between the number of false alarms and the number of
missed alarms which is affected by the level of thresholds selected (Type I
and Type II errors).
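A minimal numpy sketch of this noise handling; the noise level, fault size, filter window, and threshold below are illustrative choices only:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
r = rng.normal(0.0, 0.5, n)           # residual: zero-mean noise ...
r[250:] += 2.0                        # ... plus an additive fault at t = 250

# Low-pass filter (moving average) attenuates the high-frequency noise
win = 15
r_f = np.convolve(r, np.ones(win) / win, mode="same")

# Testing against a threshold rather than against zero reduces false alarms;
# the threshold level trades Type I (false) vs Type II (missed) alarms
T = 1.0
alarms = np.abs(r_f) > T
print(int(alarms[:240].sum()), int(alarms[260:].sum()))
```

Raising T suppresses false alarms before t = 250 at the cost of delaying or missing detection of small faults.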
The residual generator should be designed to improve fault isolation.
The residual set should have different patterns for particular faults. Resid-
ual sets designed with the isolation objective are called enhanced residuals.
There are two enhancement approaches, structured and directional. In
structured residuals, each residual responds to a different set of faults and
is insensitive to others. Threshold tests are applied to each element of the
residual vector and the test results are converted to a fault code vector
s(t) of binary digits. Defining a residual threshold vector T, s_i(t) = 1 if
|r_i(t)| > T_i; s_i(t) = 0 otherwise. The pattern of the fault code vector (a
binary string) is matched against the library of fault signatures for diagnosis.
Directional residuals generate fault-specific vector directions β_j, and a
scalar transfer function γ_j(q) in that direction describes the dynamics of the
fault [189]

r(t | f_j) = β_j γ_j(q) f_j(t)   (8.84)

where β_j is the direction of the jth fault. Fault diagnosis is based on
associating r(t | f) with the closest fault direction in the fault library.
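The structured-residual logic can be sketched as follows; the signature library, thresholds, and residual values are hypothetical:

```python
import numpy as np

# Hypothetical fault signature library: binary codes for 3 residuals
signatures = {
    "sensor 1 fault": (1, 0, 1),
    "sensor 2 fault": (1, 1, 0),
    "actuator fault": (0, 1, 1),
}

def fault_code(r, T):
    """Threshold each residual: s_i = 1 if |r_i| > T_i, else 0."""
    return tuple(int(abs(ri) > Ti) for ri, Ti in zip(r, T))

T = (0.5, 0.5, 0.5)                 # residual thresholds
r = np.array([1.2, 0.1, -0.9])      # observed residual vector
s = fault_code(r, T)

# Match the binary string against the library for diagnosis
diagnosis = [name for name, sig in signatures.items() if sig == s]
print(s, diagnosis)                 # (1, 0, 1) ['sensor 1 fault']
```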
The implementation of the residual generator may be done either in
input-output form (Eq. (8.46)) or in the equivalent state-space form. (Note
the conventional use of G in the state-space representation, which is different
from its use as the TFM G(q), and the difference between the conventional use of
F in the state-space representation and F_F and F_D.)
The residual responses are specified such that detection and diagnosis
are enhanced. For additive faults and disturbances (noise and multiplicative
faults are neglected), define the specifications as

r(t) = Z_F(q) f(t)   (8.85)

Z_D(q) = 0   (8.86)

where Z_F(q) is the specified response to faults and Z_D(q) the specified (zero) response to disturbances.
Comparing the internal residual expression Eq. (8.83) (ignoring the noise
term) with the specifications in Eqs. (8.85)-(8.86), one can deduce that

W(q) G_F(q) = Z_F(q),   W(q) G_D(q) = 0   (8.87)

The residual generator is obtained by solving Eq. (8.87) for W(q). Detailed
examples in [189] illustrate the technique and its extensions with multiplica-
tive faults and disturbances. Other extensions include integration of parity
relation design and residual evaluation with GLR test and whitening fil-
ters for FDD of dynamic stochastic processes [464]. An implementation
of this approach to continuous pasteurization systems, and a comparison of
the parity space approach with a statistical approach that combines T^2 and
SPE tests with contribution plots, illustrate the strengths and limitations
of both techniques [292].
and disturbances, the Kalman filter being used for the stochastic case that
includes noise
x_{k+1} = F x_k + G u_k
y_k = C x_k   (8.88)
The observer with a gain matrix K_ob has a structure similar to the Kalman
filters discussed in Section 4.3.2, viz.,

x̂_{k+1} = F x̂_k + G u_k + K_ob (y_k − ŷ_k)
ŷ_k = C x̂_k   (8.89)
The relations for the state estimation error e = x − x̂ and the output
estimation error e_y = y − ŷ for the system with faults and disturbances
become

e_{k+1} = (F − K_ob C) e_k + (E_F − K_ob F_F) f_k + (E_D − K_ob F_D) d_k
e_{y,k} = C e_k + F_F f_k + F_D d_k   (8.90)
the sensor whose reading is being estimated. The residuals are checked using
threshold logic to diagnose a faulty sensor. Reduced-order or nonlinear
estimators can also be used to develop FDD systems with Kalman filters
and diagnostic observers. The equivalence between parity relation based
and diagnostic observer based FDD has been shown [162, 188].
FDD Using Robust Observers for Unknown Inputs
Deterministic observers and filters were used in the previous section to
estimate state variables and outputs. The effects of disturbances and noise
were accounted for by using nonzero threshold limits for residuals. Robust
observers can be designed by including disturbances [163] or both
disturbances and noise [427]. To illustrate the methodology and design
challenges, robust residual generation using unknown deterministic input
(disturbance) observers [163] is discussed. Consider the process model
x_{k+1} = F x_k + G u_k + E_F f_k + E_D d_k
y_k = C x_k + F_F f_k + F_D d_k   (8.91)
Define a linear transformation
z_k = T x_k   (8.92)
for the fault free system and the robust unknown input observer
z_{k+1} = R z_k + S y_k + J u_k   (8.93)
with the residual
r_k = L_1 z_k + L_2 y_k   (8.94)
such that if f_k = 0 then lim_{k→∞} r_k = 0 for all u and d, and for all initial
conditions x_0 and z_0. If f_k ≠ 0, then r_k ≠ 0. The estimation error equation
for the observer is
e_{k+1} = z_{k+1} − T x_{k+1}   (8.95)
        = R z_k + S y_k + J u_k − T F x_k − T G u_k − T E_F f_k − T E_D d_k
Substituting for x_{k+1} and y_k, and imposing that the error should be
independent of the state variables, control inputs, and disturbances, the
following equations are established:
T F − R T = S C
J = T G
T E_D = 0
S F_D = 0   (8.96)
T E_F ≠ 0
S F_F ≠ 0
8.3. Model-based Fault Diagnosis Techniques 495
where the last two equations ensure that the residual is nonzero if there is
a fault (f_k ≠ 0). The equations for y_k and r_k and Eq. (8.95) lead to

L_1 T + L_2 C = 0
L_2 F_D = 0   (8.97)
L_2 F_F ≠ 0
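One way to satisfy the disturbance-decoupling condition T E_D = 0 while keeping fault sensitivity (T E_F ≠ 0) is to build the rows of T from the left null space of E_D. A numpy sketch with hypothetical fault and disturbance directions:

```python
import numpy as np

# Hypothetical directions for a 3-state model (terms as in Eq. 8.91)
E_D = np.array([[1.0], [2.0], [0.0]])    # disturbance entry direction
E_F = np.array([[0.0], [0.0], [1.0]])    # fault entry direction

# Rows of T must satisfy T E_D = 0: take the left null space of E_D
U, _, _ = np.linalg.svd(E_D)
T = U[:, 1:].T                           # 2 x 3, spans null(E_D^T)

# Disturbances are decoupled; faults still reach the transformed state
print(np.allclose(T @ E_D, 0), np.allclose(T @ E_F, 0))
```

Whether T E_F ≠ 0 actually holds must be checked for the given directions; if the fault direction lies in the range of E_D, robust decoupling of this kind makes the fault invisible as well.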
θ = [1 a_1 ⋯ a_n b_0 ⋯ b_m]^T   (8.99)

Identify the model parameters θ from process data (u, y). Then determine
the physical parameter values p using the inverse relationship p = f^{-1}(θ),
and compute the changes in p, Δp. Use threshold logic or other tools to
determine the magnitude of the changes Δp and the presence of faults.
A more general framework can be established for modeling the changes
in the eigenstructure of a data-based model in state-space form (the F matrix
of a discrete-time equation such as Eq. (8.49)) or time series form (an AR
or ARMA model). The version discussed below will provide detection of
change in univariate systems. Extension to multivariable processes has
been developed [45]. Additional steps are necessary for diagnosis if mul-
tiple faults are possible. For the case of additive changes, the cumulative
sum to be computed becomes
S_r^k(p_θ0, p_θ1) = Σ_{i=r}^k ln [ p_θ1(y_i | y_{i−1}) / p_θ0(y_i | y_{i−1}) ]   (8.101)

where p_θ1 reflects the change of magnitude θ at time r, and the stacked
output values are

y_{k−1} = [y_{k−1} y_{k−2} ⋯ y_1]^T   (8.102)
The GLR is

Λ_k(p_θ0, p_θ1) = max_{1≤r≤k} max_{θ_1} S_r^k(p_θ0, p_θ1)   (8.103)

and the GLR test becomes

H_0 : Λ_k(p_θ0, p_θ1) < τ   no change at time k
H_1 : Λ_k(p_θ0, p_θ1) > τ   change at time k   (8.104)
Significant savings in computation time can be generated by using a two-
model approach [50]. For illustration, consider a two-model approach for
on-line detection of change in scalar AR models
y_t = Σ_{i=1}^n a_i^0 y_{t−i} + e_t,   var(e_t) = (σ^0)^2,   for t ≤ r − 1   (8.105)

y_t = Σ_{i=1}^n a_i^1 y_{t−i} + e_t,   var(e_t) = (σ^1)^2,   for t ≥ r   (8.106)

Define the parameter vectors

θ^p = [a_1^p ⋯ a_n^p σ^p]^T,   p = 0, 1   (8.107)
The on-line change detection can be formulated as a GLR test using Eqs.
(8.101)-(8.104). If the AR model M_0 (with parameter vector θ^0) for the
no-change hypothesis is not known, identify it with a recursive growing-memory
filter. For each possible change time r, identify the after-change AR model
M_1 using data for the time window [r, k] and compute the log-likelihood
ratio S_r^k. Maximize S_r^k over r. Simplifications for saving computation time
and other distance measures between models M_0 and M_1 are discussed in
[50].
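A sketch of the GLR test of Eqs. (8.103)-(8.104), simplified to a mean change of unknown magnitude in an i.i.d. Gaussian sequence rather than a full AR model; the noise level, change time, and magnitude are invented:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, tau_true = 1.0, 120
y = rng.normal(0.0, sigma, 200)
y[tau_true:] += 1.5                      # change of unknown magnitude at tau

def glr(y, sigma):
    """Maximize the log-likelihood ratio over the change time r; the
    unknown magnitude is maximized in closed form (segment mean)."""
    k = len(y)
    best, r_hat = -np.inf, 0
    for r in range(1, k):
        seg = y[r:]
        nu_hat = seg.mean()              # ML estimate of the magnitude
        S = len(seg) * nu_hat**2 / (2 * sigma**2)
        if S > best:
            best, r_hat = S, r
    return best, r_hat

Lambda, r_hat = glr(y, sigma)
print(round(Lambda, 1), r_hat)           # compare Lambda to a threshold tau
```

For the AR case of Eqs. (8.105)-(8.107), the inner maximization is replaced by identifying the after-change model M_1 on the window [r, k], as described above.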
[414]. The new features of the PCD method include use of recursive variable
weighted least squares (RVWLS) with adaptive forgetting, and an implicit
parametrization scheme that estimates the process level at each sampling
instant. RVWLS parameter updating with adaptive forgetting provides
better tracking of abrupt changes in the process parameters than the usual
RWLS updating, and it reduces the number of false detections of change as
well. The detection capabilities of the PCD method are superior to those of
methods based on forecast residuals for highly positively correlated processes.
As autocorrelation increases, the improvement of PCD over residuals-based SPM
methods becomes more significant. The PCD method possesses several
attractive features for on-line, real-time operation: its computations are
efficient, its implementation is easy, and the resulting charts are clearly
interpretable. The implicit parametrization feature of PCD provides a statistic
for the process level (mean) which is used to detect and distinguish between
changes in the level and eigenstructure of a time series. Model eigenstructure
is determined by the roots (or eigenvalues) of a model. It is related to the
order and parameter values of AR or ARMA models, and has a direct effect
on the level (bias) of the variable described by the model and its variance
(spread). Based on the values assigned by PCD to various indicators one
can determine if an eigenstructure change has occurred and if so whether
this involves a level change, a spread change, or both. The outcome of
implicit parametrization confirms the existence or lack of a level change,
and provides the magnitude of the level change [414, 411].
diate previous state. Called a discrete-time, first-order Markov chain, this
special case is represented as P[q_t = S_j | q_{t−1} = S_i]. If the state transition
probabilities a_ij from state i to state j are not time dependent,

a_ij = P[q_t = S_j | q_{t−1} = S_i],   1 ≤ i, j ≤ N
This process is an observable Markov model because the process outputs are
the set of states and each state corresponds to a deterministically observable
event. The outputs in any given state are not random. A simple Markov
chain with three states is presented in Figure 8.9.
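Such a first-order chain can be simulated in a few lines; the transition matrix below is hypothetical (with a_13 = a_32 = 0, loosely following Figure 8.9):

```python
import numpy as np

# Transition matrix A = [a_ij] for a 3-state observable Markov chain
A = np.array([[0.6, 0.4, 0.0],
              [0.2, 0.5, 0.3],
              [0.1, 0.0, 0.9]])

rng = np.random.default_rng(0)
state, path = 0, [0]
for _ in range(1000):
    state = rng.choice(3, p=A[state])    # next state depends only on current
    path.append(state)

# Long-run state fractions approach the stationary distribution pi = pi A
w, v = np.linalg.eig(A.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()
counts = np.bincount(path, minlength=3) / len(path)
print(np.round(pi, 3), np.round(counts, 3))
```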
Hidden Markov Models
Consider the case where the stochastic process is observed only through
a set of stochastic processes that produce the sequence of observations.
The states are not directly observable, they are inferred from the observa-
tions. An example would be a process consisting of a few containers that
are filled with marbles of multiple colors. Each container has marbles of
all colors, but the fraction of marbles of a certain color in each container
varies. At each observation time, a marble rolls out through a channel that
connects all containers, but the observer does not see the container that
dispenses the marble and does not know the rule that selects the container
that dispenses the marble. The dispensing of marbles generates a finite ob-
servation sequence of colors which can be modeled as the observable output
of an HMM. A simple HMM of this process would have each state corre-
sponding to one of the containers and for which a marble color probability
is defined for each state. The choice of containers is determined by the
state transition matrix A = [a^-] of the HMM.
The elements of the HMM as depicted in Figure 8.10, include [279, 484]:
N The number of states in the model. The model is called ergodic if any
state can be reached from any other state. Generally all states are
interconnected. Denote the state at time t as qt.
M The number of distinct observation symbols per state, the alphabet size.
The individual symbols are denoted as V = {v_1, ⋯, v_M}.

A = {a_ij} The state transition probability distribution, where
a_ij = P[q_{t+1} = S_j | q_t = S_i].
Figure 8.9. A Markov chain with three states (labelled s_1, s_2 and s_3) and
selected transitions (a_13 and a_32 set to 0).
that would be generated with the given model. Hence, if these se-
quences are similar, one may accept that the model used is describing
the process that generated these observations.
The HMM is formulated in two stages: a training stage that solves Problem 3,
and a testing stage that addresses Problems 1 and 2. Problem 1
is solved using either the forward or the backward computation procedure,
Problem 2 is solved using the Viterbi algorithm, and Problem 3 is solved us-
ing the Expectation-Maximization (EM) (also called Baum-Welch) method
[279, 484]. Details of the algorithms, implementation issues, and illustra-
tions are given in both references.
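Problem 1 (evaluating the probability of an observation sequence given the model) can be sketched with the forward procedure; the two-container marble HMM below, with its transition and color probabilities, is invented for illustration:

```python
import numpy as np

# Hypothetical two-container marble HMM: states = containers,
# observation symbols = marble colors {0: red, 1: blue}
A  = np.array([[0.7, 0.3],      # state transition probabilities a_ij
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],      # P(color | container): container 1 mostly red
               [0.2, 0.8]])     # container 2 mostly blue
pi = np.array([0.5, 0.5])       # initial state distribution

def forward(obs):
    """Forward procedure for Problem 1: evaluate P(O | model)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

obs = [0, 0, 1]                 # observed colors: red, red, blue
p = forward(obs)
print(round(p, 4))              # -> 0.1193
```

The forward recursion sums over all hidden state sequences in O(N^2 T) operations instead of enumerating the N^T sequences explicitly.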
Pattern recognition systems combined with finite-state HMMs have
been used for fault detection in dynamic systems [557]. The HMM pa-
rameters are derived from gross failure statistics. Wavelet-domain HMMs
have also been proposed for feature extraction and trend analysis [671].
A wavelet-based smoothing algorithm filters high-frequency noise. A
trajectory shape analysis technique called triangular episodes converts the
smoothed data into a semi-qualitative form, and membership functions
transform the information into a symbolic representation. The symbolic data
are classified with a set of sequence-matching HMMs for trend analysis.
This approach is extended to detection and classification of abnormal pro-
cess situations using multidimensional hidden Markov trees [37, 579]. The
case studies discussed in these publications illustrate the application of the
method to various continuous processes.
Figure 8.10. Elements of an HMM: given the known observation sequence O
(feature vectors over time) and the model λ = (A, B, C), determine the most
likely state sequence q_1 q_2 ⋯ q_T (q_t = S_i, i = 1, ⋯, N) that best
explains the observation sequence O.
different sensor readings indicate a sensor fault. When three or more sensors
are used, a voting mechanism may be established to diagnose the faulty
sensor(s). Another hardware-based FDD approach is self-diagnosing or smart
sensors that can check the correctness of their own operation. These approaches
are usually more expensive than FDD based on analytical redundancy, but the
cost may be justifiable for mission-critical measurements and equipment.
A simple model-free FDD is based on limit checking. Each measurement
is compared to its upper and lower (preset) limits, and exceeding the limits
indicates a fault. The limit checking approach can be made more elaborate
by defining warning and alarm limits, and by monitoring time trends (run
rules in univariate SPC in Section 6.1.1). An important disadvantage of
the limit checking approach is the need to interpret the alarms generated.
A single disturbance that travels through the process can generate many
alarms. Extensive process knowledge and process operation experience are
necessary to determine the source cause of the alarms. Knowledge-based
systems (KBS) can automate alarm interpretation.
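Limit checking with warning and alarm bands can be sketched as follows; the limits and readings are hypothetical:

```python
def check_limits(value, warn=(2.0, 8.0), alarm=(1.0, 9.0)):
    """Limit checking with warning and alarm bands (hypothetical limits):
    alarm limits are wider than warning limits, so an alarm implies the
    measurement has left the warning band as well."""
    lo_a, hi_a = alarm
    lo_w, hi_w = warn
    if value < lo_a or value > hi_a:
        return "alarm"
    if value < lo_w or value > hi_w:
        return "warning"
    return "normal"

readings = [5.0, 8.5, 9.4, 1.5]
print([check_limits(v) for v in readings])
# -> ['normal', 'warning', 'alarm', 'warning']
```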
Logic reasoning using ladder diagrams and hard-wired systems was useful
for FDD in the latter part of the 20th century. More recently, fuzzy logic and
FDD based on fuzzy logic have gained popularity. Fuzzy logic systems are
discussed in Section 8.4.1 as an integral part of KBS.
Software-based logic reasoning, especially real-time KBSs, has become
more abundant with the increase of computation power and reduction of
computer costs. Object-oriented real-time KBSs and their use in logic
reasoning are discussed in Section 8.4.1. KBSs can also provide a super-
8.4. Model-free Fault Diagnosis Techniques 503
μ_A(x) = 1 if x ∈ A,   μ_A(x) = 0 if x ∉ A   (8.114)
[Figure: (c) membership function μ_A(x) of a trapezoidal fuzzy set over the
universe U; (d) the linguistic variable temperature, some of its linguistic
values, and their membership values.]
where '+' denotes the union of elements (/ does not indicate division) and
μ_A(x_i) is the grade of membership of x_i for the n membership values. For the
temperature example, Eq. 8.116 becomes
Theory and applications of FL, and the integration of KBS and FL for the
control of fermentation processes, are discussed in the literature [285]-[422].
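The trapezoidal membership functions shown in the figure can be sketched as follows; the breakpoints chosen for the linguistic value "warm" are invented:

```python
def trapmf(x, a, b, c, d):
    """Trapezoidal membership function mu_A(x): rises linearly on [a, b],
    equals 1 on [b, c], falls linearly on [c, d], and is 0 outside [a, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Hypothetical linguistic value "warm" for the variable temperature (deg C)
warm = lambda t: trapmf(t, 15, 20, 25, 30)
print([warm(t) for t in (10, 17.5, 22, 27.5, 35)])
# -> [0.0, 0.5, 1.0, 0.5, 0.0]
```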
[Figure: structure of the real-time KBS. The G2 knowledge-base holds a
multivariate statistics rule-base and a process rule-base; the G2 inference
engine, coupled to a process history database, recognizes the physiological
state of the fermentation process from the outputs y and the physiological
state variables, and performs control strategy selection and control action u
on the fermentation process.]
to detect the out-of-control situation and reports it along with the time of
its occurrence. The RTKBS continues to use the statistical rule-base to
find out the responsible variable(s) by analyzing contribution plots. Based
on the contribution limit violations, a conclusion is reached on the respon-
sible variable(s). At this point process expertise is required. Hence, the
RTKBS turns to the process-specific rule-base to further investigate the
situation and generate advice to isolate the problem. In the example
shown in Figure 8.16, the process rule-base is used by the RTKBS to infer
that the problem is with the glucose feed. The RTKBS also checks cer-
tain variables that are highly correlated with glucose feed such as glucose
and biomass concentrations in the fermenter to verify its conclusion. Since
these variables are also affected (determined by analyzing their contribu-
tion values), the certainty about a potential glucose feed failure is high
and these findings are reported to the operator by displaying them on the
monitor. The messages include the time of detection of the deviation, the
input variable(s) responsible, and the process variable(s) affected by the
Figure 8.15. Fed-batch penicillin production process flow chart and profiles
of process variables in G2 environment.
514 Chapter 8. Fault Diagnosis
[Screen captures from the G2 environment: a monitoring display at batch age
121.0 h (exponential growth phase) showing the T^2 chart and the contributions
to T^2, and a display at batch age 381.5 h with the end-of-batch report
(24.3.2002 20:20:53): total batch time 381.5 hours; productivity measures:
maximum penicillin concentration 1.303 g/L at 378.0 h, final penicillin
concentration 1.302 g/L at 381.5 h, total penicillin produced 135 264 g.]
Chapter 9. Related Developments
• Amplifying a flux-controlling step
• Using them as a research tool to test basic ideas about metabolic regulation.
Metabolic flux analysis (MFA) and metabolic control analysis (MCA) are
mathematical tools that have become widely applicable in metabolic en-
gineering. Both tools are interrelated and widely used in metabolic engi-
neering research [331, 426, 565]. They are useful in developing models of
metabolic activity in a biochemical system. They would be instrumental in
developing detailed first principles models of fermentation processes.
The flux is a fundamental determinant of cell physiology and a critical
parameter of a metabolic pathway [565]. The pathway is the sequence of
feasible and observable biochemical reaction steps linking the input and
output metabolites. Consider the linear metabolic pathway in Figure 9.1(a),
where A is the input metabolite, B is the output metabolite, v_i denotes
the reaction rate of the ith reaction step, and E_i the corresponding enzyme.
The flux J of this linear pathway is equal to the rates of the individual
reactions at steady state [565]:

J = v_1 = v_2 = ⋯ = v_i = ⋯ = v_L   (9.1)
Figure 9.1(b) shows a two-step pathway catalyzed by the enzymes E_1 and E_2:

S → X → P   (9.2)
The flux of conversion of S to P at steady state is denoted by J. The
steady state is uniquely defined by the parameters of the system: the levels
of enzyme activities E_1 and E_2, the substrate concentration S, and the
product concentration P [565]. Given the values of these parameters, the
intermediate metabolite concentration c_X and the pathway flux J can be
determined. If any parameter value is altered, a new steady state is reached
and c_X and J are changed.
One objective of MCA is to relate the variables of a metabolic system
to its parameters and then determine the sensitivity of a system variable to
system parameters [565]. These sensitivities summarize the extent of sys-
temic flux control exercised by the activity of an enzyme in the pathway.
One can also solve for the concentrations of intracellular metabolites and
determine their sensitivities to enzyme activities or other system parame-
ters. The sensitivities are represented by control coefficients that indicate
how a parameter affects the behavior of the system at steady state. The
flux control coefficients (FCC) are the relative change in steady-state flux
resulting from an infinitesimal change in the activity of an enzyme of the
pathway divided by the relative change of the enzymatic activity [565]:
C_E^J = (E / J) (dJ / dE) = d ln J / d ln E   (9.3)
Because enzymatic activity is an independent system parameter, its change
affects the flux both directly and indirectly through changes caused in other
system variables, as indicated by the total derivative symbol in Eq. 9.3.
FCCs are dimensionless and for linear pathways they have values from 0 to
1. For branched pathways, FCCs can be generalized to describe the effect
of each of the L enzyme activities on each of the L fluxes through various
reactions [565]:
C_i^{J_k} = (E_i / J_k) (dJ_k / dE_i) = d ln J_k / d ln E_i,   i, k = 1, ⋯, L   (9.4)
where J_k is the steady-state flux through the kth reaction in the pathway
and E_i is the activity of the ith enzyme. A similar definition is developed
based on the rate of the ith reaction (v_i) [565]:
C_i^{J_k} = (v_i / J_k) (dJ_k / dv_i)   (9.5)
The FCCs for branched pathways may have any positive or negative value.
The normalization in the definition of FCCs leads to their sum being equal
to one (the flux-control summation theorem).
9.2. Contributions of MFA and MCA to Modeling 527
The concentration control coefficients (CCC) express the sensitivities of the
metabolite concentrations to the enzyme activities [565]:

C_i^{X_j} = (E_i / c_j) (dc_j / dE_i) = d ln c_j / d ln E_i,   i = 1, ⋯, L,   j = 1, ⋯, K   (9.7)
where c_j denotes the concentration of X_j. Because the level of any
intermediate X_j remains unchanged when all enzyme activities are changed by
the same factor, the sum of all CCCs for each of the K metabolites is equal
to zero [565]:

Σ_{i=1}^L C_i^{X_j} = 0,   j = 1, ⋯, K   (9.8)
Eq. 9.8 implies that for each metabolite at least one enzyme exerts negative
control. For example, in the two-step pathway of Eq. 9.2 the CCC C_2^X will
normally be negative because c_X will decrease when the activity of E_2 is
increased [565].
The control coefficients are systemic properties of the overall metabolic
system. Local properties of individual enzymes in the metabolic network
can be described by elasticity coefficients such as the sensitivities of reaction
rates with respect to metabolite concentrations. The elasticity of the ith
reaction rate with respect to the concentration of metabolite X_j is the ratio
of the relative change in the reaction rate caused by an infinitesimal change
in the metabolite concentration, assuming that none of the other system
variables change from their steady-state values:

ε_{X_j}^{v_i} = (c_j / v_i) (∂v_i / ∂c_j) = ∂ ln v_i / ∂ ln c_j   (9.9)

Elasticity coefficients may also be defined for other compounds that influence
a reaction rate but may not be pathway intermediates.
The relationship between FCCs and elasticity coefficients is expressed
by the flux-control connectivity theorem, which indicates how the local enzyme
kinetics determine the systemic flux control: for each metabolite X_j,
Σ_{i=1}^L C_i^J ε_{X_j}^{v_i} = 0.
For the two-step pathway of Eq. 9.2, the connectivity theorem gives

C_1^J ε_X^{v_1} + C_2^J ε_X^{v_2} = 0

or

C_1^J / C_2^J = − ε_X^{v_2} / ε_X^{v_1}   (9.12)
indicating that large elasticities are associated with small FCCs. For exam-
ple, reactions operating close to thermodynamic equilibrium are normally
very sensitive to variations in metabolite concentrations; their elasticities
are large indicating that flux control for such reactions would be small [565].
Connectivity theorems have also been developed for CCCs.
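The FCC definition of Eq. 9.3 and the summation theorem can be checked numerically on a toy two-step pathway; the linear rate laws v1 = k1 E1 (S − X) and v2 = k2 E2 X are hypothetical, chosen so the steady-state flux has the closed form J = a b S / (a + b) with a = k1 E1 and b = k2 E2:

```python
import numpy as np

k1, k2, S = 1.0, 1.0, 10.0          # hypothetical kinetic constants

def flux(E1, E2):
    """Steady-state flux of S -> X -> P with the assumed linear kinetics."""
    a, b = k1 * E1, k2 * E2
    return a * b * S / (a + b)

def fcc(i, E1=2.0, E2=1.0, h=1e-6):
    """Flux control coefficient C_i^J = (E_i/J) dJ/dE_i (Eq. 9.3),
    approximated by a finite difference in enzyme activity."""
    E = np.array([E1, E2], float)
    J0 = flux(*E)
    dE = np.zeros(2)
    dE[i] = h * E[i]
    return (E[i] / J0) * (flux(*(E + dE)) - J0) / dE[i]

C1, C2 = fcc(0), fcc(1)
print(round(C1, 3), round(C2, 3), round(C1 + C2, 3))
# -> 0.333 0.667 1.0  (summation theorem for a linear pathway)
```

Here flux control is concentrated in the second step (C2 > C1) because its enzyme level is lower in the assumed operating point.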
Methodology:
  Model-based: fixed model (requires accuracy of the model) or refined model
  (requires persistency of excitation).
  Model-free: evolution/interpolation (subject to the curse of dimensionality)
  or reference tracking (the issue of what to track for optimality).
surements at the end of a batch run are available, they could be used in
determining the optimal operation policy of the next batch. Consider the
kth run of a batch process where process measurements from the previous
(k − 1) batches and measurements up to the current time t_l of the kth
batch are available. The optimal input policy for the remaining time interval
[t_l, t_f] of the kth batch can be determined by solving the optimization
problem:
min_{u^k} J^k = L(x^k(t_f, θ))   (9.13)

such that  dx^k/dt = F(x^k, θ, u^k) + d^k(t),   x^k(0) = x_0^k
           y^k = H(x^k, θ) + v^k(t)
           S(x^k, θ, u^k) ≤ 0,   T(x^k(t_f, θ)) ≤ 0

given y^j(i), i = 1, ⋯, N for j = 1, ⋯, k − 1, and i = 1, ⋯, l for j = k
where the superscript k denotes the kth batch run; x^k(t), u^k(t), y^k(t), d^k(t),
and v^k(t) denote the state, input, output, disturbance, and measurement
noise vectors, respectively; S(·) is a vector of path constraints, and T(·)
is a vector of terminal constraints; y^j(i) denotes the ith measurement
vector collected during the jth batch run, and N the total number of
measurements during a run. The optimization utilizes information from the
previous k − 1 batch runs and measurements up to time t_l of the current
batch to reduce the uncertainty in the parameter vector θ and to determine
the optimal input policy for the remainder of the current batch run k.
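A deliberately simplified scalar illustration of this batch-to-batch idea, with an invented terminal-yield model, noise level, and parameter-refinement rule (not the formulation of Eq. 9.13 itself):

```python
import numpy as np

# Toy model (hypothetical): terminal yield L = theta*u - 0.5*u**2, where
# theta is unknown; each run we measure the noisy yield, refine the
# estimate of theta, and re-optimize the input for the next batch.
rng = np.random.default_rng(0)
theta_true = 4.0

def run_batch(u):
    return theta_true * u - 0.5 * u**2 + rng.normal(0, 0.05)

theta_hat, us, yields = 2.0, [], []
for k in range(8):
    u = theta_hat                      # model optimum: dL/du = theta - u = 0
    y = run_batch(u)
    us.append(u)
    yields.append(y)
    # crude parameter refinement from the latest measurement
    theta_hat = (y + 0.5 * u**2) / u
print(round(us[-1], 2), round(yields[-1], 2))
```

Even this crude refinement converges toward the true optimum input u = theta in a few runs; in practice, persistency of excitation and measurement noise make the refinement step far more delicate, as discussed below.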
The optimization approaches that rely on process measurements to up-
date the inputs can be divided into two main groups: model-based tech-
niques and model-free techniques [73]. Model-based techniques use the
mathematical model of the batch process to predict the evolution of the run,
compute the cost sensitivity with respect to input variations, and update
the inputs. Measurement information is used to improve the estimates of
the state variables and parameters. The estimation and optimization tasks
are repeated over time (as frequently as at each sampling time), yielding
significant computational burden. In this repeated optimization approach
the model can be fixed or refined during the batch run and its optimization.
If the model is fixed, a higher level of model accuracy is necessary. If the
model parameters are known with accuracy and uncertainty is caused by
disturbances, the fixed model can yield satisfactory results. If model re-
finement such as estimation of model parameters is carried out during the
run, the initial model may not need to have high accuracy. The tradeoff is
heavier computational burden and addition of persistent excitation to in-
put signals in order to generate data rich in dynamic information for more
reliable model identification. Unfortunately, the requirement for sufficient
Model-based: fixed model [131, 683, 684]; refined model [2, 6, 380].
Model-free: evolution [108, 687]; reference tracking [155, 307, 486];
interpolation [537, 596, 673].
Appendix
D_3 = 1 − 3 d_3/d_2,   D_4 = 1 + 3 d_3/d_2
Bibliography
[3] B Abraham and A Chuang. Outlier detection and time series model-
ing. Technometrics, 31:241-248, 1989.
[25] AJ Assis and RM Filho. Soft sensors development for on-line biore-
actor state estimation. Comp. Chem. Engng., 24:1099-1103, 2000.
[26] B Atkinson and F Mavituna. Biochemical Engineering and Biotech-
nology Handbook. Stockton Press, New York, 1991.
[27] M Aynsley, A Hofland, GA Montague, D Peel, and AJ Morris. A
real-time knowledge based system for the operation and control of
a fermentation plant. In Proc. American Control Conference, pages
1992-1997, San Diego, CA, 1990.
[28] M Aynsley, A Hofland, AJ Morris, GA Montague, and C Di Massimo.
Artificial intelligence and the supervision of bioprocesses (real-time
knowledge-based systems and neural networks). Adv. in Biochem.
Eng. Biotech., 48:1-27, 1993.
[29] MJ Bagajewicz. Process Plant Instrumentation: Design and Upgrade.
C.H.I.P.S. Books, Weimar, Texas, 2001.
[30] JE Bailey. Periodic operation of chemical reactors: A review. Chem.
Engng Commun., 1:111-124, 1973.
[31] JE Bailey. Toward a science of metabolic engineering. Science,
252:1668-1675, 1991.
[32] JE Bailey. Mathematical modelling and analysis in biochemical en-
gineering: Past accomplishments and future opportunities. Biotech-
nology Progress, 14:8-20, 1998.
[33] JE Bailey, MA Hjortso, SB Lee, and F Srienc. Kinetics of product
formation and plasmid segregation in recombinant microbial popula-
tions. Ann. N. Y. Acad. Sci., 413:71-87, 1983.
[34] JE Bailey and FJM Horn. Comparison between two sufficient condi-
tions for improvement of an optimal steady-state process by periodic
operation. J. Optim. Theor. Applic., 18:378-384, 1971.
[35] JE Bailey and DF Ollis. Biochemical Engineering Fundamentals.
McGraw-Hill, New York, 2nd edition, 1986.
[36] RK Bajpai and M Reuss. A mechanistic model for penicillin produc-
tion. J. Chem. Technol. Biotechnol., 30:332-344, 1980.
[37] A Bakhtazad, A Palazoglu, and JA Romagnoli. Detection and classifi-
cation of abnormal process situations using multidimensional wavelet
domain hidden Markov trees. Comp Chem Engng, 24:769-775, 2000.
[41] V Barnett. The study of outliers: Purpose and model. Appl. Statist.,
27:242-250, 1978.
[42] V Barnett and T Lewis. Outliers in Statistical Data. John Wiley,
New York, 1978.
[76] GEP Box. Some theorems on quadratic forms applied in the study of
analysis of variance problems: Effect of inequality of variance in one-
way classification. The Annals of Mathematical Statistics, 25:290-302,
1954.
[77] GEP Box and NR Draper. Empirical Model Building and Response
Surfaces. Wiley, New York, 1987.
[78] GEP Box, WG Hunter, and JS Hunter. Statistics for Experimenters.
Wiley, New York, 1978.
[79] GEP Box, GM Jenkins, and GC Reinsel. Time Series Analysis:
Forecasting and Control. Prentice-Hall, Inc., Englewood Cliffs, NJ,
3rd edition, 1994.
[80] DR Brillinger. Discussion on linear functional relationships (by
P. Sprent). J. R. Statist. Soc. B, 28:294-294, 1966.
[81] EH Bristol. On a new measure of interactions for multivariable process
control. IEEE Trans. Auto. Control, AC-11:133, 1966.
[82] R Bro and AK Smilde. Centering and scaling in component analysis.
J Chemometrics, Submitted.
[83] RW Brockett. Volterra series and geometric control theory. Automat-
ica, 12:167-176, 1976.
[84] AE Bryson and YC Ho. Applied Optimal Control. Hemisphere, Wash-
ington, D.C., 1975.
[85] IY Caldwell and APJ Trinci. The growth unit of the mould
Geotrichum candidum. Arch. Microbiol., 88:1-10, 1973.
[86] DC Cameron and F Chaplen. Developments in metabolic engineering.
Current Opinions in Biotechnology, 8:175-180, 1997.
[87] DC Cameron and IT Tong. Cellular and metabolic engineering. Appl.
Biochem. Biotech., 38:105-140, 1993.
[88] R Carlson. Preludes to a screening experiment: A tutorial. Chemo-
metrics and Intelligent Laboratory Systems, 14:103-114, 1992.
[89] http://turnbull.dcs.st-and.ac.uk/history/Mathematicians/
Cartwright.html. [Accessed 26 November 2002].
[90] M Casdagli, T Sauer, and JA Yorke. Embedology. J. Stat. Phys.,
65:579-616, 1991.
[94] S Chen and SA Billings. Orthogonal least squares methods and their
application to non-linear system identification. Int. J. Control, 50:1873-
1896, 1989.
[96] S Chen and L Lou. Joint estimation of model parameters and outlier
effects in time series. J. Amer. Statist. Assoc., 88:284-297, 1993.
[172] RA Gabel and RA Roberts. Signals and Linear Systems. John Wiley,
New York, 3rd edition, 1980.
[267] H Kacser and JA Burns. The control of flux. Symp. Soc. Exp. Biol.,
27:65-104, 1973.
[268] MN Karim and SL Rivera. Artificial neural networks in bioprocess
state estimation. Adv. in Biochem. Eng. Biotech., 46:1-33, 1992.
[269] TW Karjala and DM Himmelblau. Dynamic data rectification by
recurrent neural networks and the extended Kalman filter. AIChE J,
42:2225, 1996.
[270] A Kassidas, JF MacGregor, and PA Taylor. Synchronization of batch
trajectories using dynamic time warping. AIChE Journal, 44(4):864-
875, 1998.
[271] SM Kay. Fundamentals of Statistical Signal Processing: Detection
Theory. Prentice Hall, New Jersey, 1998.
[272] DB Kell and HV Westerhoff. Metabolic control theory: Its role in
microbiology and biotechnology. FEMS Microbiol. Rev., 39:305-320,
1986.
[279] DH Kil and FB Shin. Pattern Recognition and Prediction with Appli-
cations to Signal Characterization. AIP Press, Woodbury, NY, 1996.
[281] G Kitagawa. On the use of AIC for the detection of outliers.
Technometrics, 21:193-199, 1979.
[284] http://turnbull.dcs.st-and.ac.uk/history/Mathematicians/
Kolmogorov.html. [Accessed 26 November 2002].
[340] W. Liebert and H. G. Schuster. Proper choice of the time delay for
the analysis of chaotic time series. Phys. Lett. A, 142:107-111, 1989.
[342] CT Lin and CS George Lee. Neural Fuzzy Systems. Prentice-Hall PTR,
Upper Saddle River, NJ, 1996.
[343] F Lindgren, P Geladi, S Rannar, and S Wold. Interactive variable
selection (IVS) for PLS. Part I: Theory and algorithms. J. Chemo-
metrics, 8:349-363, 1994.
[344] http://turnbull.dcs.st-and.ac.uk/history/Mathematicians/
Littlewood.html. [Accessed 26 November 2002].
[345] W Liu. An extended Kalman filter and neural network cascade fault
diagnosis strategy for glutamic acid fermentation process. Artificial
Intelligence in Engineering, 13:131-140, 1999.
[346] L Ljung. System Identification: Theory for the User. Prentice-Hall,
Englewood Cliffs, NJ, 2nd edition, 1999.
[347] L Ljung and T Glad. Modeling of Dynamic Systems. Prentice-Hall,
Englewood Cliffs, New Jersey, 1994.
[348] C Loeblein, JD Perkins, B Srinivasan, and D Bonvin. Economic per-
formance analysis in the design of on-line batch optimization systems.
J Proc Cont, 9:61-78, 1999.
[349] A Lorber, L Wangen, and B Kowalski. A theoretical foundation for
the PLS algorithm. J. Chemometrics, 1:19-31, 1987.
[350] EN Lorenz. Deterministic nonperiodic flow. J. Atmos. Sci.,
20:130-141, 1963.
[351] CA Lowry and DC Montgomery. A review of multivariate control
charts. IIE Transactions, 27:800-810, 1995.
[352] CA Lowry, WH Woodall, CW Champ, and SE Rigdon. A multivariate
exponentially weighted moving average control chart. Technometrics,
34(1):46-53, 1992.
[353] R Luo, M Misra, SJ Qin, R Barton, and DM Himmelblau. Sensor fault
detection via multiscale analysis and parametric statistical inference.
Ind. Eng. Chem. Res., 37:1024-1032, 1998.
[354] H Lütkepohl. Introduction to Multiple Time Series Analysis.
Springer, Berlin, Germany, 1991.
[355] JF MacGregor, C Jaeckle, C Kiparissides, and M Koutoudi. Pro-
cess monitoring and diagnosis by multiblock PLS methods. AIChE
Journal, 40(5):826-838, 1994.
[372] The MathWorks, Inc. MATLAB, Version 6.1. Natick, MA, 2001.
[373] MATLAB Signal Processing Toolbox (Version 5): User's Guide. The
MathWorks, Inc., Natick, MA, 2000.
[379] RG McMillan. Tests for one or two outliers in normal samples with
unknown variance. Technometrics, 13:87-100, 1971.
[470] A Pinches and LJ Pallent. Rate and yield relationships in the pro-
duction of xanthan gum by batch fermentations using complex and
chemically defined growth media. Biotechnol. Bioeng., 28:1484-
1496, 1986.
[486] S Rahman and S Palanki. State feedback synthesis for on-line opti-
mization in the presence of measurable disturbances. AIChE Journal,
42:2869-2882, 1996.
[490] GK Raju and CL Cooney. Active learning from process data. AIChE
Journal, 44(10):2199-2211, 1998.
[660] S Wold. Nonlinear partial least squares modelling: II. Spline inner
relation. Chemometrics and Intelligent Laboratory Systems, 14:71-84,
1992.