A High Reliability Control System
J. Callahan, J. Collins, A. Qualls, IUCF, 2401 Milo B. Sampson Lane, Bloomington, IN 47408
W. Hunt, Indiana University Computer Science Dept., Bloomington, IN 47405
Figure 2: 16 Bit, 4 Channel DAC/ADC Block Diagram. [Block diagram: each channel (CH 1 to CH 3 are labeled) has an ADC input, 16 bit registers, compare blocks, an integrator, and a DAC output.]

3 REDUNDANCY
To combat long-term drift, we take advantage of a feature that was originally designed for reliability reasons, as will be described presently. Each DAC can be read back by the central computer and its output compared to its input, providing a direct measure of drift. If the DAC is in calibration, the computer examines the output of the power supply as measured by the ADC. If drift is detected here, we know that either the ADC or the power supply is in error, but not which. However, each power supply is controlled by two DAC/ADC modules in an actively redundant configuration. The central computer therefore has the ability to compare the outputs of two separate ADCs. If both agree to within calibration accuracy, we know that the power supply has probably drifted or malfunctioned. If the ADCs differ, we know the problem is with the ADCs. In that case, the central computer can switch the module to its backup, and the primary module can then be removed and repaired or recalibrated. This feature also permits us to automatically scan the system and determine that all DACs/ADCs and power supplies are functioning and are within specifications. A key requirement of this approach is to calibrate power supplies and DACs/ADCs which are nearing their calibration limits.

4 COMMUNICATIONS LINK
The communication modules which link the DAC/ADC modules to the main computer do not have automatic switchover. To protect against communications failure, the primary and backup DAC/ADC modules are located in different crates. A failure in the communications [...]

Another major design goal is reliability. Beam time is expensive. We wanted CIS to be highly reliable, and we were also laying groundwork for the proposed Light Ion Synchrotron and for possible medical applications of proton therapy, which require high reliability. The first step in “reliability by design” is to ensure that individual modules have as high a reliability as possible. Good design, careful component selection, and heavy parts derating yield a high reliability design. The next step is carefully controlled manufacture to ensure that the design reliability is actually achieved. At IUCF, we developed an ISO9000 compliant production facility (as yet unaudited) with full Electrostatic Discharge (ESD) protection to produce these modules. We have had approximately 50 units in service for 9 months with no failures to date.

5.1 MEAN TIME BETWEEN FAILURE (MTBF)
However, when large numbers of modules are used in a system, the individual failure rates add, and hence the overall system MTBF can be quite low even when the individual MTBFs are high. One of the best ways to improve system MTBF is to provide active redundancy. Active redundancy provides a backup module which takes over automatically when the primary module fails. What makes active redundancy so attractive is that the probability that two independent devices both fail in a given interval is the product of the individual failure probabilities. When those probabilities are low, the product is extremely low. If failures are detected immediately and the failed unit replaced quickly, the system MTBF can approach years.

In CIS, critical devices are controlled by two DAC/ADC modules. The primary module is normally in control. The DAC/ADC module has a field programmable gate array (FPGA), designated the “health” FPGA, which continuously monitors fourteen (14) parameters within the module. If these parameters go out of their established ranges, a failure is detected. The primary DAC/ADC module then relinquishes control to the backup. The switchover occurs in less than a millisecond and will not usually be noticed by the controlled device at all. The failure information is sent to the main computer.

The primary unit is also monitored by its backup module via a “heartbeat” line. If the primary unit fails catastrophically or loses power, the backup unit takes over automatically. As mentioned previously, the main computer can also direct a changeover to the backup module if the primary module drifts out of calibration.
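
The MTBF argument of Section 5.1 can be illustrated numerically. The sketch below is not taken from the CIS design. It assumes the common constant-failure-rate (exponential) model with independent failures, under which failure rates add for a system that stops when any one module fails. The function names and the 100,000 hour module MTBF are hypothetical values chosen only for illustration.

```python
import math


def series_mtbf(mtbfs_hours):
    """MTBF of a system that fails when any one module fails.

    Assumes constant, independent failure rates, so the rates (1/MTBF) add.
    """
    total_rate = sum(1.0 / m for m in mtbfs_hours)
    return 1.0 / total_rate


def failure_probability(mtbf_hours, interval_hours):
    """Probability that one module fails within the interval (exponential model)."""
    return 1.0 - math.exp(-interval_hours / mtbf_hours)


def pair_failure_probability(mtbf_hours, interval_hours):
    """Probability that both modules of an independent redundant pair fail
    within the same interval: the product of the individual probabilities."""
    p = failure_probability(mtbf_hours, interval_hours)
    return p * p


if __name__ == "__main__":
    module_mtbf = 100_000.0  # hypothetical per-module MTBF in hours
    n_modules = 50           # roughly the number of units mentioned in the text
    repair_window = 24.0     # hypothetical time to replace a failed unit, hours

    print("series MTBF (h):", series_mtbf([module_mtbf] * n_modules))
    print("single failure prob:", failure_probability(module_mtbf, repair_window))
    print("pair failure prob:", pair_failure_probability(module_mtbf, repair_window))
```

With these assumed numbers, fifty modules in series have a system MTBF fifty times lower than a single module, while the chance that a redundant pair fails together within one repair window is the square of an already small probability.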
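
The drift diagnosis described in Section 3 amounts to a short decision procedure. The following is a minimal sketch of that logic, not the CIS software. The function name, the argument names, the single shared tolerance, and the assumption that all readings are scaled to the same units as the setpoint are hypothetical simplifications.

```python
def diagnose(dac_setpoint, dac_readback, adc_primary, adc_backup, tolerance):
    """Classify a drift condition from one setpoint and three readings.

    All values are assumed to be in the same units (for example volts).
    Returns one of "dac", "ok", "power_supply", or "adc".
    """
    # Step 1: read the DAC back and compare its output with its input.
    if abs(dac_readback - dac_setpoint) > tolerance:
        return "dac"

    # Step 2: the DAC is in calibration, so examine the power supply
    # output as measured by the primary ADC.
    if abs(adc_primary - dac_setpoint) <= tolerance:
        return "ok"

    # Step 3: drift was seen. Compare the two independent ADCs.
    # Agreement points at the power supply, disagreement at the ADCs.
    if abs(adc_primary - adc_backup) <= tolerance:
        return "power_supply"
    return "adc"


if __name__ == "__main__":
    print(diagnose(5.0, 5.0, 5.2, 5.2, 0.01))
    print(diagnose(5.0, 5.0, 5.2, 5.0, 0.01))
```

A result of "adc" corresponds to the case in which the central computer would switch the module to its backup so that the primary can be repaired or recalibrated.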
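
The three switchover paths described above (health monitoring, loss of heartbeat, and a changeover commanded by the main computer) can be modeled in a few lines. This is a toy software model for illustration only. The actual mechanism is implemented in hardware on the modules, and the class name, method names, and parameter limits here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RedundantPair:
    """Toy model of a primary/backup DAC/ADC pair (all names hypothetical)."""
    limits: dict                      # parameter name -> (low, high)
    active: str = "primary"
    failure_log: list = field(default_factory=list)

    def _switch(self, reason):
        # Control passes to the backup once; the reason is recorded so it
        # can be reported to the main computer.
        if self.active == "primary":
            self.active = "backup"
            self.failure_log.append(reason)

    def health_check(self, readings):
        """Health monitor: any monitored parameter out of range is a failure."""
        for name, value in readings.items():
            low, high = self.limits[name]
            if not (low <= value <= high):
                self._switch(f"{name} out of range: {value}")
                return False
        return True

    def heartbeat(self, primary_alive):
        """The backup watches the heartbeat line and takes over if it stops."""
        if not primary_alive:
            self._switch("heartbeat lost")

    def commanded_changeover(self):
        """The main computer directs a changeover, e.g. after detecting drift."""
        self._switch("commanded by main computer")


if __name__ == "__main__":
    pair = RedundantPair(limits={"supply_volts": (4.75, 5.25)})
    pair.health_check({"supply_volts": 5.6})
    print(pair.active, pair.failure_log)
```

Keeping the three triggers as separate methods mirrors the text, where each path can cause the switchover independently of the others.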
6 CONCLUSION
The non-ramping control system displays high reliability
and very low drift. Other portions of the CIS control
system are described in other papers.