EASIS
Electronic Architecture and System Engineering for Integrated Safety Systems
Deliverable D3.2 Part 1: Guidelines for Dependability and Hazard Analysis
Authors
Dr Olle BRIDAL, Volvo Technology (work task leader; editor for Appendices A and D; co-editor for Appendix C)
Mr Michael AMANN, ZF (editor for Appendix E)
Mr Marko AUERSWALD, Bosch
Mr Jonas EDÉN, Volvo Technology
Mr Marc GRANIOU, PSA
Mr Christoph HELLWIG, TRW
Dr Bernhard JOSKO, OFFIS
Mr Thorsten KIMMESKAMP, Universität Duisburg-Essen
Mr Andrew KINGSTON, TRW
Mr Roman KRZEMIEN, ZF
Mr Michel LEEMAN, Valeo
Mr Ola LUNDKVIST, Volvo Technology
Dr Andrea PIOVESAN, C.R.F. (co-editor for Appendix C)
Mr Christian SCHEIDLER, DaimlerChrysler
Mr Paul TIPLADY, TRW
Dr David WARD, MIRA (editor for Appendix B)
The Consortium:
DaimlerChrysler (D) Bosch (D) Continental Teves (D)
C.R.F (I) DAF Trucks (NL) DECOMSYS (A)
dSPACE (D) ETAS (D) Philips (D)
LEAR (E) MIRA (UK) Motorola (D)
OFFIS (D) Opel (D) PSA (F)
REGIENOV (F) TRW Automotive (D) Universität Duisburg-Essen (D)
Valeo (F) Vector (D) Volvo (S)
ZF (D)
Executive summary
Unlike many other industrial sectors (aerospace, railway, military, etc), the automotive industry
lacks a standardized approach for how to deal with system safety issues in the development and
design of software-based systems. This lack of an automotive-applicable safety standard was one
of the major motivators for the EASIS project.
By analyzing, adapting, extending and defining methods and techniques for dependability and
system safety, this report defines and investigates a set of development process activities tailored
to the specific needs of the automotive sector. These activities are organized into a Dependability
Activity Framework. Recommendations and guidelines for how to carry out each activity within this
framework are given. The activities investigated are:
• Hazard identification: Identification of the undesirable vehicle-level states and behaviours
that may be created by the system being considered.
• Hazard classification: Assessment of the degree of undesirability of each hazard.
• Hazard occurrence analysis: Investigation of the cause-and-effect relationships that result
in a hazard, and estimation of the resulting hazard occurrence probabilities.
• Dependability requirements: Establishment of requirements on the product and process
to ensure that risks are sufficiently low or eliminated.
• Verification that the dependability requirements are met by the system design and
implementation.
• Safety Case: Construction of a convincing argument that the system is as safe as can
reasonably be demanded.
It should be noted that these activities are not completely orthogonal to each other. The hazard
occurrence analysis is primarily concerned with providing hints about whether, and where,
dependability mechanisms should be introduced as the system is being developed. However,
hazard occurrence analysis also plays a role in the verification activity.
This report is the result of EASIS Work Task 3.1 which is concerned with analyzing, adapting,
extending and defining methods and techniques for carrying out dependability-related activities in
the development of an Integrated Safety System (ISS). Of particular interest are the potential and
adaptability of existing dependability analysis approaches to the automotive domain.
The objective of this work is to provide guidelines for dependability-related activities that should be
performed during the development of an ISS. These activities are defined so that they are
applicable regardless of the specific development process model used. Thus, the aim is not to
define a process but to describe dependability-related activities that should be part of any
development process for an ISS.
In fact, the dependability issues for an ISS are in many ways the same as for any automotive
system. If a general methodology for dependability of automotive electronics had been established
prior to this project, our work could have focused on the ISS-specific issues. However, as such a
methodology does not exist, our work addresses the wider topic of 'dependability of automotive
electronic systems' rather than being limited to ISS-specific issues.
Dependability has been defined as "the trustworthiness of a computing system such that reliance
can justifiably be placed on the service it delivers" [2]. It is a wide concept that encompasses the
notions of reliability, availability, safety and maintainability but our focus is primarily on safety
issues. However, the safety benefits provided by integrated safety systems, in terms of their
contribution to road traffic safety when they deliver the intended service, are outside the scope of
our work. Instead, we focus on the safety implications when the integrated safety function for some
reason does not operate as intended. In other words, we are primarily interested in the potential
failures and degraded modes of the system rather than its nominal operation. We additionally
address the possibility that the correct operation of the system might be hazardous in some driving
situations, but such issues are not in the focus of our work.
It should be noted that work is currently on-going to define an ISO standard (ISO 26262) for
functional safety of automotive electronic systems. The scope of this upcoming standard and the
scope of this EASIS Deliverable are partially overlapping. However, there are obviously major
differences between a research project and standardisation activities. In EASIS, we investigate
and discuss different approaches for how to deal with dependability and provide a set of
recommendations. The ISO standard, on the other hand, is concerned with defining a set of
requirements that have to be met if compliance with the standard is to be claimed. It
deserves to be mentioned that several individuals participating in EASIS WP3 are also members of
the ISO 26262 committee, thus allowing for a mutual influence between EASIS and the ISO work.
The structure of this report is based on the dependability activity framework shown in Figure 1. In
the figure, all blocks except "development and design of the integrated safety system" represent
dependability-specific activities and are thus within the scope of this report. Although this
framework does not perfectly match any particular lifecycle as defined in a standard such as
IEC 61508 [4] or ARP 4754 [3], it shows the dependability activities that should be performed
during the development of a potentially safety-critical¹ system.
It is important to understand that these activities are typically not performed in strict sequence.
Instead, all activities are performed more or less in parallel throughout the development process.
This means that the inputs and outputs of an activity will be different in different phases of the
development lifecycle. However, there is typically an emphasis on the upper levels in the figure in
early phases and on lower levels in later phases. For example, the result of an early hazard
identification will typically be a list of coarsely described hazards. In a later phase, the hazards are
more precisely described. The motivation behind this "activity view" as opposed to a "process
view" is further explained in Appendix A.
We have consciously avoided referring to this framework as a "lifecycle" since the sequence of
activities is not our prime concern. Furthermore, we are only concerned with hazards occurring
during vehicle operation and with the corresponding risk-limiting activities during development.
Hazards potentially occurring during part storage, logistics, vehicle assembly, maintenance,
disposal, etc are beyond the scope of this work, as are risk-limiting actions and activities that are
outside the control of the vehicle manufacturers and suppliers. For example, speed limits, road
traffic management and driver education are not considered at all. We are mainly interested in
those hazards that are directly associated with potential failures of the considered system.
The framework is sufficiently general to allow existing process models, standards and guidelines to
be mapped to it. By not being associated with any existing process, standard or guideline, this
framework allows us to investigate each activity with open eyes and with a minimum of
presuppositions. A discussion of existing standards and guidelines that have influenced the
definition of this framework is given in Appendix A along with more detailed discussions about the
framework.
¹ Here, "safety-critical" means that the system may contribute to the occurrence of hazards, for example when the system fails to work
as intended, regardless of the magnitude of this contribution.
Figure 1: The Dependability Activity Framework. Identification of hazards feeds classification of hazards and hazard occurrence analysis; these feed the establishment of dependability-related requirements, which are subject to verification and validation of the dependability-related requirements and to Safety Case construction, all in interaction with the development and design of the integrated safety system.
Table 1 provides a summary of the dependability-related activities, including the typical information
flow between them. From the table, it should be clear that the outlined framework is generic in the
sense that it is applicable to the development of any safety-critical system.
In a real development scenario, the activities included in the Dependability Activity Framework will
be integrated into an overall development process. The relationship between the dependability
activity framework and the EASIS Engineering Process (EEP) [5] is shown in Table 2 below. In
each intersection between a "dependability activity" and an "EEP step", the specific dependability-
related action relevant at this stage of the EEP is briefly described. Based on the Table, the
following comments can be made:
• Hazard identification and classification are mainly carried out within EEP step 1.2 "Perform
PHA" (Preliminary Hazard Analysis). However, these activities are then repeated in various
stages of the EEP, since hitherto unknown hazards may be introduced as the development
progresses.
• Hazard occurrence analysis is performed at every step in the EEP that represents a more
detailed design (conceptual design in step 1.2, FAA model in step 2.4, FDA model in step
4.4, refined FDA model in step 5.5).
• Since the dependability activity "Establishment of dependability-related requirements"
covers everything that the system shall fulfil in order to be sufficiently dependable, it is
relevant in almost every step of the EEP.
• The Safety Case construction is not explicitly included in the EEP. Thus, this activity cannot
be mapped to any step of the EEP.
Table 2: Mapping between the dependability activity framework and the EEP. For each EEP step, the actions relevant to each dependability activity are listed: hazard identification (Appendix B), hazard classification (Appendix B), hazard occurrence analysis (Appendix C) and establishment of dependability-related requirements (Appendix D).

1.3 Definition of risk mitigation requirements
• Hazard identification: identify hazards introduced due to risk mitigation measures
• Hazard classification: classify hazards introduced due to risk mitigation measures
• Hazard occurrence analysis: perform hazard graph analysis after applying risk mitigation
• Requirements: identify requirements for risk mitigation
1.4 Capture structured system requirements
• Requirements: document the decided dependability-related requirements
2 Analyze functional architecture
2.1 Specify functional architecture (no dependability-specific actions)
2.2 Specify dynamic behaviour (no dependability-specific actions)
2.3 Specify function behaviour
• Requirements: refine dependability requirements to implementation-related levels
2.4 FAA hazard analysis
• Hazard identification: identify (new) hazards
• Hazard classification: classify (new) hazards
• Hazard occurrence analysis: assess residual risk level
• Requirements: identify risk mitigation measures
2.5 Validate FAA model
• Requirements: check whether the FAA model conforms to requirements
3 Analyze hardware architecture
3.1 Design system/HW architecture
• Requirements: refine dependability requirements to implementation-related levels
3.2 Identify HW failure model
• Hazard occurrence analysis: perform hazard occurrence analysis
• Requirements: define HW failure detection and reaction
3.3 Identify necessary HW redundancy
• Requirements: specify necessary HW redundancy
4 Design of functional architecture concept
4.1 Design of sensor/actuator algorithms
• Requirements: specify sensor/actuator error detection and handling mechanisms
4.2 Design of functional behaviour
• Requirements: mode management, with respect to intentionally degraded modes due to detected errors
4.3 Refinement of functional interfaces
• Requirements: evaluation of error values, partly relevant here
4.4 Basic design hazard analysis
• Hazard identification: identify (new) hazards
• Hazard classification: classify (new) hazards
• Hazard occurrence analysis: assess residual risk level
• Requirements: identify risk mitigation measures
4.5 Validate Basic FDA model
• Requirements: check whether the basic FDA model conforms to requirements
5 Refine functional architecture to fulfil safety requirements
The activities of the EASIS Dependability Activity Framework and the relationship between them
are investigated and discussed in detail in the Appendices:
• Appendix A: Process frameworks for dependability
• Appendix B: Hazard identification and classification
• Appendix C: Hazard occurrence analysis
• Appendix D: Establishment of dependability-related requirements
• Appendix E: Safety Case construction
3 Guidelines
The following guidelines for dependability-related aspects of system development are given in
condensed format. They are based on the discussion and findings given in Appendices A-E. For
each recommendation, there is a reference to the appendix where the subject is further elaborated.
Guidelines marked with *Val have been selected to be validated (with respect to their
appropriateness) in the EASIS Work Task 5.2 demonstrator. This selection has taken both the
feasibility of validation and the available amount of resources in WT5.2 into account. See EASIS
deliverable D.5.5 for more on this validation.
3.1.1 Dependability should be considered from the beginning of the development. (A.3)
3.1.2 Automotive safety-oriented standards should be followed when applicable. The upcoming
ISO 26262 standard, expected to become effective in 2008, will be particularly relevant.
(A.2.7)
3.1.3 The overall approach to dependability during the development should be aligned with the
dependability activity framework. (A.3) *Val
Objective of hazard identification: Production of a list of undesired vehicle states associated with
the system of concern.
3.2.1 A clear understanding of the scope of the system should exist before hazard identification is
carried out. In particular, the boundary between the system of concern and the outside
world should be identified. In an early stage of the development, this boundary may be
defined at a conceptual level whereas more details will be added in later stages. (B.1.1,
A.3)
3.2.2 In an early stage, hazards should be identified based on existing knowledge about the
system. This knowledge is typically quite coarse and conceptual. (A.3) *Val
3.2.3 In a later stage, more detailed descriptions of the system should be investigated in the
search for hazards that have not yet been identified. (A.3)
3.2.4 Hazards should be defined so that their occurrence is not affected by the driving situation.
(A.3.1) *Val
3.2.5 Hazards should be defined as undesirable states at the vehicle level, resulting from
undesirable states in the system of concern. (A.3.1)
3.2.6 In addition to hazards caused by faults, potential hazards associated with the fault-free
function of the system should be identified. (A.3.2, B.2.2.7) *Val
3.2.7 In the search for hazards, different use scenarios that are relevant with respect to the
system of concern should be considered. (B.2.1.3)
3.2.8 The suitability of splitting a particular hazard into several distinct hazards with different
characteristics should be considered. (B.2.1.3)
3.2.9 Since different hazard identification methods complement each other, several methods
should be used. (B.2.3) *Val
3.2.10 Use of the following methods for hazard identification should be considered:
• Hazard and Operability studies, HAZOP, at the external output interface of the
system of concern. (B.2.2.3)
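To illustrate such a guide-word-driven search, the following minimal Python sketch applies a set of generic guide words to the output interface of a hypothetical system of concern. The guide words and signal names are illustrative assumptions only, not part of the EASIS guidelines.

```python
# Illustrative sketch only: a table-driven HAZOP pass over the external
# output interface of a hypothetical system of concern. The guide words
# and the example signals are assumptions for illustration.

GUIDE_WORDS = ["no", "more", "less", "reverse", "early", "late", "unintended"]

def hazop_candidates(signals):
    """Combine each output signal with each guide word to generate
    candidate deviations, which the analysis team then assesses for
    vehicle-level hazard potential."""
    for signal in signals:
        for word in GUIDE_WORDS:
            yield f"{word.upper()} {signal}"

if __name__ == "__main__":
    # Hypothetical output interface of a brake-by-wire function.
    outputs = ["requested brake torque", "brake light activation"]
    for deviation in hazop_candidates(outputs):
        print(deviation)   # e.g. "NO requested brake torque"
```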
Objective of hazard classification: Assessment of the criticality of each hazard. Here, the criticality
is a measure of how critical an occurrence of the hazard is.
3.3.1 Each identified hazard should be classified with respect to the criticality of an occurrence of
the hazard, in order to allow the hazard to be properly addressed in the system
development. (A.3.1) *Val
3.3.2 A classification scheme of an established standard should be used, if available. The
ISO 26262 standard, tentatively scheduled for publication in 2008, will contain such a
scheme. (B.3.1.4)
3.3.3 In the absence of an established standard, consider using one of the following hazard
classification schemes
• ASIL classification as defined in the upcoming ISO 26262 standard. (B.3.1.4) *Val
• MISRA risk graph (B.3.1.3) including its Controllability factor. (B.3.1.2)
• The novel approach defined as part of the EASIS work. (B.3.2)
3.3.4 The hazard classification should take into account the characteristics of the hazard,
including:
• the magnitude of the resulting deviation from the intended behaviour. (B.2.1.3)
• the duration of the hazard. (B.3.2.1.4)
• whether or not the driver is informed about the existence of the hazard. (B.3.2.1.4)
3.3.5 The hazard classification should take into account the exposure to driving situations in
which the hazard may lead to negative consequences. (B.3.1.1, B.3.1.3, B.3.1.4, B.3.2.2)
3.3.6 The hazard classification should take into account the ability of the driver (or other road
user) to avoid negative consequences of the hazard. (B.3.1.1, B.3.1.2, B.3.1.3, B.3.1.4,
B.3.2.2)
3.3.7 The hazard classification should take into account the severity associated with each
potential outcome. (B.3.1.1, B.3.1.3, B.3.1.4, B.3.1.5.2, B.3.2.1, B.3.2.2)
3.3.8 Use of a classification approach that is based on vague criteria like "occasional",
"improbable", "catastrophic", "critical" should be avoided, unless these terms are clearly
defined in a way that ensures that they are not open to subjective interpretation. (B.3.2.1.1)
3.3.9 Company policy, legal considerations and any other similar factors should be allowed to
affect the classification assigned to a hazard. (B.3.2.1.3)
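As an illustration of a classification scheme of the kind recommended above, the following Python sketch computes a classification from severity, exposure and controllability parameters, in the style of the ASIL determination of the (then upcoming) ISO 26262 referenced in 3.3.3. The mapping reproduces the published determination table via a simple additive shortcut and should be read as illustrative rather than normative.

```python
# Sketch of an S/E/C-based hazard classification in the style of the
# ISO 26262 ASIL scheme. The lookup below reproduces the published
# ISO 26262-3 determination table via the additive shortcut; it is
# illustrative, not normative.

ASIL = {7: "ASIL A", 8: "ASIL B", 9: "ASIL C", 10: "ASIL D"}

def classify(severity, exposure, controllability):
    """severity S1..S3, exposure E1..E4, controllability C1..C3
    (higher number = more severe / more frequent / harder to control)."""
    assert 1 <= severity <= 3 and 1 <= exposure <= 4 and 1 <= controllability <= 3
    return ASIL.get(severity + exposure + controllability, "QM")

# Example: a highly severe hazard in a common driving situation that the
# driver can rarely control: S3 + E4 + C3 = 10, i.e. ASIL D.
print(classify(3, 4, 3))  # ASIL D
print(classify(1, 2, 2))  # QM
```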
Objectives of hazard occurrence analysis: Understand the relationship between underlying faults
and resulting hazards. Assess the occurrence probability (or frequency) of each hazard.
3.4.1 The identified hazards should be analysed with respect to their occurrence (based on a
description of the system to be implemented), in order to allow each hazard to be properly
addressed during the development of the system. (A.3)
3.4.2 In an early stage, hazard occurrence analysis should be based on existing knowledge
about the system. This knowledge is typically quite coarse and conceptual. (C.2.1) *Val
3.4.3 In a later stage, hazard occurrence analysis should be based on more detailed descriptions
of the system. (C.2.2, C.2.3)
3.4.4 In an early stage, error detection and handling mechanisms should not be considered in the
hazard occurrence analysis. The main purpose of the analysis at this early stage is to
identify where such mechanisms are needed. (C.2.1) *Val
3.4.5 In a later stage, the hazard occurrence analysis could consider the identified and defined
mechanisms as being part of the system. The main purpose of the analysis at this later
stage is to permit an assessment of whether these mechanisms are appropriate. (This
illustrates that hazard occurrence analysis plays an important role during verification and
validation.) Two types of analysis can be distinguished (C.2.3):
• Qualitative analysis: Check that there are no weak points
• Quantitative analysis: Determine the residual risk
3.4.6 The depth of the hazard occurrence analysis should reflect the criticality of the hazard as
determined in the hazard classification. Thus, the higher the criticality of a particular hazard,
the more effort should be put into the occurrence analysis for this hazard. (C.2)
3.4.7 Hazard occurrence analysis should be performed at different levels of detail, reflecting the
hierarchical structure typically found in automotive electronics. The detailed levels are
typically managed by Tier 1 and Tier 2 suppliers while the higher levels are managed by the
OEM. (C.2)
3.4.8 Qualitative Fault Tree Analysis should be used to investigate the causal relationships
between random and systematic faults and errors on one hand and hazards on the other.
(C.4.3.2) *Val
3.4.9 Quantitative Fault Tree Analysis could be used to estimate the probability (or frequency) of
the hazard. However, the quantitative analysis typically has to be limited to only cover those
faults for which the probabilities (or frequencies) can be estimated, such as random
hardware faults. Furthermore, it is usually extremely difficult to estimate the probability of an
error that is not detected by any of a large number of implemented error detection
mechanisms at different levels in the system. Thus, the analysis will be quite coarse and
the results shall be interpreted more as an indication than a definite answer. (C.2.2,
C.4.3.2)
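The following minimal Python sketch illustrates the quantitative fault tree arithmetic described above for a small hypothetical tree. All basic-event probabilities are invented for illustration and, as guideline 3.4.9 stresses, such results should be read as indications only.

```python
# Minimal quantitative fault tree sketch for guideline 3.4.9. Basic-event
# probabilities (per operating hour) are invented; in practice such figures
# are usually only defensible for random hardware faults.

def AND(*probs):
    """Probability that all independent input events occur."""
    p = 1.0
    for x in probs:
        p *= x
    return p

def OR(*probs):
    """Probability that at least one independent input event occurs."""
    q = 1.0
    for x in probs:
        q *= (1.0 - x)
    return 1.0 - q

# Hypothetical tree: the hazard occurs if the sensor fails undetected
# (sensor fault AND monitoring fault) OR the actuator driver fails.
p_sensor   = 1e-5
p_monitor  = 1e-3
p_actuator = 1e-7
p_hazard = OR(AND(p_sensor, p_monitor), p_actuator)
print(f"estimated hazard probability: {p_hazard:.2e} per hour")
```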
3.4.10 FMEA should be performed. (C.4.3.1) *Val
3.4.11 When an FMEA is to be performed, the appropriate starting level (initiating "failure mode")
at the present stage of the development process should be carefully selected. The basic
rule is "as detailed low level as possible, based on the present knowledge about what the
final system design will look like". At least two such levels should be addressed, with the
corresponding FMEAs typically referred to as "System-FMEA" and "Design-FMEA" in the
automotive industry. (C.4.3.1)
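As an illustration of linking FMEA worksheets at two levels, the following Python sketch uses field names and an RPN-style rating that are common automotive practice; they are assumptions for illustration and are not prescribed by this document.

```python
# Sketch of how FMEA records at two levels (guideline 3.4.11) might be
# linked. The risk priority number (RPN = S * O * D) and the 1..10 rating
# scales reflect common automotive FMEA practice, assumed for illustration.

from dataclasses import dataclass, field

@dataclass
class FmeaEntry:
    item: str
    failure_mode: str          # the initiating failure mode at this level
    effect: str                # effect at the next-higher level
    severity: int
    occurrence: int
    detection: int
    lower_level: list = field(default_factory=list)  # linked Design-FMEA rows

    @property
    def rpn(self):
        return self.severity * self.occurrence * self.detection

design_row = FmeaEntry("wheel-speed sensor", "output stuck at zero",
                       "wrong speed signal to ECU", 8, 3, 4)
system_row = FmeaEntry("ABS function", "no braking intervention",
                       "vehicle-level hazard: loss of ABS", 9, 2, 5,
                       lower_level=[design_row])
print(system_row.rpn, design_row.rpn)  # 90 96
```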
3.4.12 Use of Markov modelling should be considered. Markov models are particularly appropriate
for the analysis of fault-tolerant and fail-safe systems, especially when the idea is that the
first fault shall be repaired before a second fault occurs. (C.4.5.1)
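The following Python sketch illustrates the kind of Markov model meant in guideline 3.4.12: a duplex system in which the first fault is expected to be repaired before a second fault occurs. All rates are invented for illustration.

```python
# Sketch of a Markov model for guideline 3.4.12. States: 0 = both channels
# OK, 1 = one channel failed (under repair), 2 = both failed (system down).
# Rates are illustrative assumptions.

lam = 1e-4   # per-channel failure rate [1/h]
mu = 1e-1    # repair rate [1/h], i.e. a mean repair time of 10 h

def transient(p, dt, t_end):
    """Forward Euler integration of the chain's state probabilities."""
    t = 0.0
    while t < t_end:
        p0, p1, p2 = p
        p = [p0 + dt * (-2 * lam * p0 + mu * p1),
             p1 + dt * (2 * lam * p0 - (lam + mu) * p1),
             p2 + dt * (lam * p1)]
        t += dt
    return p

p = transient([1.0, 0.0, 0.0], dt=0.1, t_end=10_000)
print(f"P(system down) after 10000 h: {p[2]:.2e}")
```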
3.4.13 Particular care should be taken to ensure that dependent faults are not treated as being
independent in the analysis. Thus, dependencies between different faults, errors and
component failures should be considered in the analysis whenever it is relevant. (C.5)
3.4.14 In the analysis of the contribution from potential software faults to the occurrence of a
hazard, too much detail should be avoided. Thus, the potential failure modes (rather than
the underlying faults) of the software should be considered as "root faults" in the analysis.
(C.6.1.1)
3.4.15 The following faults and other sources of hazards should be considered in the hazard
occurrence analysis. (C.3.3):
• Permanent hardware faults
• Transient hardware faults
• Environmental conditions that could make the system behave in an undesirable way
(Example: weather and visibility conditions for systems based on radars and vision
sensors)
• External faults, i.e. faults outside the system of concern
• Specification faults
• Hardware design faults
• Software design faults
• Faults in development tools: Code generator fault, compiler fault, etc
• Manufacturing faults
• Intentional malicious interactions with the system
[…] In other words: the containment region defines the border where fault propagation must stop.
(D.3.2.2, D.3.2.4)
3.5.13 Fault tolerance requirements should be checked for completeness and consistency. Use of
a formal model as proposed in D.3.2.4 is recommended. (D.3.2.2)
3.5.14 The system architecture, in terms of the overall system structure and any constituent
redundancy structures, should be specified with due consideration of dependability issues.
(D.3.3)
3.5.15 When the hazard occurrence analysis shows that a particular fault may lead to a hazard
and this causal relationship is forbidden by a fault tolerance requirement, the causal path
from fault to hazard should be effectively broken. This can be achieved by a modification of
the system architecture (D.3.3) or by the introduction of an error detection mechanism and
a corresponding reaction. (D.3.4.3)
3.5.16 When the hazard occurrence analysis shows that a particular fault may lead to a hazard
and the resulting hazard probability is above the limit set by a hazard probability
requirement, the causal path from fault to hazard should be broken sufficiently effectively to
make the system meet the probabilistic requirement. This can be achieved by a
modification of the system architecture (D.3.3) or by the introduction of an error detection
mechanism and a corresponding reaction. (D.3.4.3)
3.5.17 Specific mechanisms for error detection and associated reaction should be specified as
requirements. In an early phase, such mechanisms may be specified somewhat
superficially, to be refined in more detail in a later development phase. (D.3.4)
3.5.18 In the search for appropriate requirements on specific error detection mechanisms, at least
the following mechanism types should be considered:
• plausibility checks. (D.3.4.1.1)
• electrical monitoring. (D.3.4.1.2)
• comparison of redundant information. (D.3.4.1.3)
• mechanisms for detection of errors in CPU and in memory. (D.3.4.1.4)
• monitoring of communication. (D.3.4.1.5)
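Two of the mechanism types listed above are illustrated by the following Python sketch: a plausibility check and a comparison of redundant information. All limits and tolerances are invented for illustration.

```python
# Sketch of two detection mechanism types from guideline 3.5.18.
# Limits and tolerance values are illustrative assumptions.

def plausibility_check(speed_kmh, accel_ms2):
    """Reject physically implausible sensor readings."""
    return 0.0 <= speed_kmh <= 300.0 and abs(accel_ms2) <= 15.0

def redundant_compare(primary, secondary, tolerance=2.0):
    """Flag an error when two redundant sensors disagree too much."""
    return abs(primary - secondary) <= tolerance

ok = (plausibility_check(speed_kmh=120.0, accel_ms2=3.0)
      and redundant_compare(primary=120.0, secondary=119.2))
print("signals accepted" if ok else "error detected")
```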
3.5.19 For each error detection mechanism, the applicability of the following types of reactions to a
detected error should be considered. (D.3.4.2):
• switch to a degraded mode
• information to the driver
• reset of the affected functions
• storage of Diagnostic Trouble Codes (DTCs)
3.5.20 In the definition of the criteria for determining whether a detected error should lead to a
specific reaction, the suitability of the following "error filter" mechanisms should be
considered. (D.3.4.2):
• Up/down counter, counting up for each detection of an error and counting down
when the error is not detected: When the counter reaches a threshold, the reaction
is initiated.
• Timeout mechanism: When the error has been continuously detected during some
predefined time interval, the reaction is initiated.
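The two error filter mechanisms above can be sketched as follows in Python; thresholds, step sizes and time values are illustrative assumptions.

```python
# Sketch of the two "error filter" mechanisms named in guideline 3.5.20.

class UpDownCounter:
    """Counts up on each detection, down (to >= 0) when the error is
    absent; the reaction is triggered when a threshold is reached."""
    def __init__(self, up=2, down=1, threshold=10):
        self.count, self.up, self.down, self.threshold = 0, up, down, threshold

    def step(self, error_detected):
        if error_detected:
            self.count += self.up
        else:
            self.count = max(0, self.count - self.down)
        return self.count >= self.threshold   # True -> initiate reaction

class TimeoutFilter:
    """Triggers only if the error is detected continuously for a
    predefined time interval."""
    def __init__(self, timeout_ms=200):
        self.elapsed_ms, self.timeout_ms = 0, timeout_ms

    def step(self, error_detected, dt_ms):
        self.elapsed_ms = self.elapsed_ms + dt_ms if error_detected else 0
        return self.elapsed_ms >= self.timeout_ms

flt = UpDownCounter()
for detected in [True, True, True, True, True, False, True]:
    if flt.step(detected):
        print("reaction initiated")   # fires once the threshold is reached
```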
3.5.21 In the specification of an error detection mechanism, the following questions should be
answered. (D.3.4.4):
• Which component performs the check?
• What is checked?
• When is the check made?
• What are the confirmation criteria, with respect to the result of the check, for
performing an action in response to a detected error? (Note: several different
actions may be possible, with different confirmation criteria)
• What action is performed by the component for each of the confirmation criteria?
• How does the component action propagate from the detecting component to a
system-level reaction?
• When a reaction to a detected error has been initiated, what is the 'healing criterion'
for reversing this action?
• What action is performed by the component when the healing criterion is fulfilled?
• How does this component action propagate from the detecting component to a
system-level reaction to the disappearance of the error?
3.5.22 Requirements on specific error detection mechanisms and corresponding reactions should
be verified using both fault injection techniques and design reviews. (D.3.4.5)
3.5.23 For quantitative requirements on hardware failure metrics, appropriate standards should be
followed when applicable. (The upcoming ISO 26262 standard, expected to become
effective in 2008, will be particularly relevant.) (D.3.5)
3.5.24 Critical functional requirements should be identified based on the results of the hazard
occurrence analysis. (D.3.7)
3.5.25 Critical functional requirements should be treated with particular care regarding verification.
The possibility of formal verification should be considered. (D.3.7)
3.5.26 For every identified hazard, introduction of limitations of the authority of the system (in the
time and value domains, i.e. duration and magnitude) should be considered, in order to
reduce the criticality of the hazards. (D.3.8)
3.5.27 Functional limitations, when specified, should be implemented close to the actual output of
the system. (D.3.8)
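As an illustration of guidelines 3.5.26 and 3.5.27, the following Python sketch clamps a hypothetical steering intervention in both the value domain and the time domain; the limits are invented for illustration.

```python
# Sketch of an authority limitation: the output of the system of concern
# is clamped in magnitude (value domain) and duration (time domain), and
# would be placed immediately before the actual output per 3.5.27.
# All limit values are illustrative assumptions.

MAX_TORQUE_NM = 30.0   # value-domain limit on the steering intervention
MAX_ACTIVE_S = 2.0     # time-domain limit on a continuous intervention

class AuthorityLimiter:
    def __init__(self):
        self.active_time = 0.0

    def limit(self, requested_torque, dt):
        # Accumulate the time the intervention has been active; reset
        # whenever no intervention is requested.
        self.active_time = self.active_time + dt if requested_torque else 0.0
        if self.active_time > MAX_ACTIVE_S:
            return 0.0     # duration exceeded: suppress the intervention
        return max(-MAX_TORQUE_NM, min(MAX_TORQUE_NM, requested_torque))
```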
3.5.28 Design process requirements should be specified in accordance with applicable standards.
The upcoming ISO 26262 standard, expected to become effective in 2008, will be
particularly relevant. (D.3.9)
3.5.29 Requirements on isolation and diversity should be specified, when appropriate, based on
requirements on independence between system components with respect to the
occurrence of faults. (D.3.10)
3.5.30 In the specification of isolation requirements, the following methods should be considered.
(D.3.10.1):
• physical separation
• physical screens (housing, filtering, etc) between the components
• avoidance of common entities (hardware, data, etc)
• methods for isolation between SW modules running on the same CPU
3.5.31 In the specification of diversity requirements, the following methods should be considered.
(D.3.10.2):
• different software development teams
• different hardware
• different tools
3.5.32 In the search for appropriate requirements to put on external systems, the following
questions should be considered. (D.3.12.3)
• Is there a need for additional information from an external system to the system of
concern, besides the normal and/or obvious?
• Can another system be used to implement (partial) redundancy to the system of
concern?
• Can the effects of a failure of the system of concern be mitigated (at the vehicle
level) by an external system?
• Can the effects of a failure of the system of concern be mitigated (at the vehicle
level) by an external system that detects and reacts to this failure?
3.5.33 Introduction of the following information in the user manual should be considered. (D.3.13):
• explanation of the system's functionality, capabilities and inherent limitations
• description of the Human-Machine Interface (HMI)
• descriptions of how the driver shall react to HMI information from the system
• description of system behaviour that could be surprising (example: pedal vibration
during ABS regulation)
• explanation that the driver is responsible for the control of the vehicle
3.5.34 Introduction of the following information in the instruction manual and similar documentation
should be considered, particularly for those maintenance actions that could lead to hazards
when performed incorrectly. (D.3.13):
• instructions for how to identify root faults
• instructions for how to perform repair actions
• instructions for assembly and mounting
• instructions for how to perform calibration
• instructions for how to check that a repair action has been correctly performed
Objective of Safety Case: communicate a clear, comprehensive and defensible argument that a
system is acceptably safe to operate in a given context.
Safety Case Process
3.6.1 It is recommended that the activities that lead to the creation of the Safety Case are planned,
and that the plan is documented. Planning of these activities includes scheduling and
assignment to appropriately experienced staff. (E.4.1.1)
3.6.2 A record of all discovered hazards and associated information should be kept. This is a
living document that requires updating as new information becomes available. This record
is an essential input to the Safety Case construction. (E.4.1.2)
3.6.3 A Safety Case Lifecycle should be defined which at least states if and when the following
phases of the Safety Case have to be reached: (E.4.1.2)
• Preliminary Safety Case
• Interim Safety Cases
• Final Safety Case
3.6.4 There should be one incremental Safety Case which grows during the lifecycle (as indicated
in recommendation 3.6.3 above) instead of several new documents at each phase. (E.4)
3.6.5 The Safety Case has to be instantiated at the earliest stage possible and its preliminary
phase should at least include a preliminary Safety Plan and a preliminary Hazard
Identification. (E.4.1.1)
3.6.6 Stages for Safety Case Reports need to be defined. It is recommended that a Safety Case
Report is delivered at the end of each stage to demonstrate milestones in development.
The Safety Case Report could be done in a Checklist format. (E.4.2)
3.6.7 Safety Case Maintenance activities after the “Start Of Production” should be defined. This
includes document update as well as change management activities. Causes could be
changes made to the system or occurrence of accidents showing that the system is not
fulfilling the primary claim of the Safety Case. (E.4.3)
3.6.8 A definition of “safe enough” for the product under development and the context in which it
will be developed and operated is required. This must relate (as a minimum) to local and
international standards and to prevailing social and judicial expectations in the countries of
development and deployment. (E.2.3)
3.6.9 The primary claim of the Safety Case should be expressed as "the system is safe enough",
"the system is as safe as can reasonably be demanded" or a similar expression. (E.8.2)
3.6.10 If graphical representation of the Safety Case is required, the use of tools that support the
Safety Case construction is recommended. (E.11)
3.6.11 The inclusion of supporting documents (e.g. System Description, Validation Results) into
the Safety Case needs to be defined. There are the following options of which the first one
is recommended: (E.7.2)
1. Local links that refer to documents presenting the relevant information, which results
in many documents
2. The content of the documents themselves is included in the Safety Case document, which
results in one huge Safety Case document.
Assessment
3.6.12 Independent Safety Case Assessment (by a person not involved in this Safety Case) is
recommended. (E.9)
Graphical representation
3.6.13 Graphical representation of the Safety Case is recommended to improve readability and to
support the argument structure. (E.5)
3.6.14 It is necessary to choose one notation to be used throughout the Safety Case Lifecycle.
(E.5)
3.6.15 A style guide should be defined and used. (E.5)
Argument design
3.6.16 General principles of argumentation (e.g. simplicity and logical soundness) should be
considered, with the aim of facilitating the understanding. (E.3.4.1)
3.6.17 The use of Safety Case Patterns is suggested. Guidance on which pattern to choose is given
where the structure itself is described. (E.8.2.2)
3.6.18 Safety Case Modules are highly recommended for (E.8.3)
• supplier-OEM relationships, to support the distributed construction of Safety Cases
• the process part of the Safety Case, to support semi-prescriptive documentation of
that part
3.6.27 The Product can be further divided by applying the following Patterns. (E.6.3.2)
1. The “ALARP” Pattern decomposes the hazards into different categories according
to their criticality. The different categories may require different types and levels of
evidence.
2. The “Functional Decomposition” Pattern decomposes the high level goals into sub-
goals that can be more easily addressed.
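As an illustration of the "Functional Decomposition" pattern, the following Python sketch represents a goal structure in a GSN-like form (goals decomposed into sub-goals, each supported by evidence). All claim texts and evidence names are invented for illustration.

```python
# Sketch of a goal decomposition in the spirit of the "Functional
# Decomposition" pattern (3.6.27). The GSN-like goal/evidence structure
# and all claim texts are illustrative assumptions.

from dataclasses import dataclass, field

@dataclass
class Goal:
    claim: str
    subgoals: list = field(default_factory=list)
    evidence: list = field(default_factory=list)   # e.g. test reports

top = Goal("The integrated safety system is as safe as can reasonably be demanded",
           subgoals=[
               Goal("Braking function hazards are adequately mitigated",
                    evidence=["FTA report", "fault injection results"]),
               Goal("Steering function hazards are adequately mitigated",
                    evidence=["FMEA report", "design review minutes"]),
           ])

def print_tree(goal, indent=0):
    print(" " * indent + goal.claim)
    for e in goal.evidence:
        print(" " * (indent + 2) + "evidence: " + e)
    for g in goal.subgoals:
        print_tree(g, indent + 2)

print_tree(top)
```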
3.6.28 Process assessment has to be carefully considered. It is necessary to assess whether the
attributes or reference processes are met. (E.6.3.1)
3.6.29 Correct application of the process has to be demonstrated. The supporting evidence for this
includes: (E.6.3.1)
1. Existence of the artefacts
2. Correctness of the artefacts
3. Correct timing and order of process steps
Appendix A: Process frameworks for dependability
The activities included in any process may be arranged in a framework that shows how they are
related to each other and to the overall aim of the process. For dependability in general, and safety
in particular, there are a number of such frameworks proposed in standards, guidelines and
technical papers. This appendix defines a dependability activity framework based on an evaluation
of some existing dependability frameworks (lifecycles, processes, etc) with respect to their
applicability to the development of automotive integrated safety systems.
Most of the standards and guidelines reviewed in this document are specifically concerned with
safety aspects rather than dependability in general. Still, these safety-oriented approaches provide
some guidance regarding dependability issues other than safety, such as availability and reliability.
The main objective of the document is to arrive at a development process view that gives structure
to the EASIS Work Task 3.1 (Dependability requirements and hazard analysis techniques). The
aim is to identify a set of dependability-specific activities that should be carried out during the
development of an Integrated Safety System, or indeed any automotive control system.
By using existing standards and approaches as input to our subdivision of the dependability work
into a set of activities, it is ensured that proper consideration of state-of-the-art knowledge and
procedures is made in this subdivision.
Existing standards and other approaches for dependability processes differ in their scope. Some
are concerned with the entire system while others only address the software aspects. Some are
safety-specific and others are more general. For these reasons, a direct comparison between
lifecycles proposed in different standards and guidelines is seldom feasible. In the following
subsections, several proposed lifecycles are evaluated with respect to their suitability for
development of automotive integrated safety systems.
In a joint effort, the European Organisation for Civil Aviation Equipment (EUROCAE) and the
Society of Automotive Engineers (SAE) have developed guidelines for certification of
highly integrated and complex systems installed on aircraft.
The intention was to provide applicants and certification authorities with a universal basis for
demonstrating compliance with airworthiness requirements. These guidelines have been
developed with the contribution of the major aviation authorities, the Federal Aviation
Administration (FAA) and the Joint Aviation Authorities (JAA).
The requirements and guidelines are contained in the following documents:
• SAE ARP-4754: Certification considerations for highly integrated aircraft [2].
• EUROCAE ED-12B / RTCA DO-178B: Software considerations in airborne systems and
equipment certification [8].
• EUROCAE ED-80/ RTCA DO-254: Design assurance guidance for airborne electronic
hardware [9].
• SAE ARP-4761: Guidelines and methods for conducting the safety assessment process on
civil airborne systems and equipment [3].
ARP-4754 provides a safety-directed development model. This model is based on the notion of
hierarchical system decomposition during development. It contains processes for requirements
capture, validation, system development and verification.
DO-178B and DO-254 become applicable at the point where requirements have been allocated to
specific hardware and software components. These documents define guidelines for the
specification, development and verification of these components.
Finally, ARP-4761 defines in detail the assessment activities of the ARP-4754 development model.
The suggested techniques span from methods to establish the functional safety requirements at
aircraft level, to detailed safety assessment at item level.
A discussion on specialized topics proposed by ARP-4761 is outside the scope of this document.
ARP 4754 [2] was published in 1996 by the SAE (Society of Automotive Engineers). It is intended
to provide designers, manufacturers, installers and certification authorities with a common international
basis for demonstrating compliance with airworthiness requirements applicable to complex
systems that integrate multiple aircraft-level functions and have failure modes with the potential to
result in unsafe aircraft operating conditions.
ARP 4754 addresses the total life cycle for systems that implement aircraft-level functions. It
excludes specific coverage of detailed systems, software and hardware design processes beyond
those of significance in establishing the safety of the implemented system.
More detailed coverage of the software aspects of design is dealt with in RTCA document DO-
178B and its EUROCAE counterpart, ED-12B. Coverage of complex hardware aspects of design
is dealt with in RTCA document DO-254.
The ARP-4754 model is schematically illustrated in Figure A.1, where the inter-relations of the
safety assessment process activities with the main development process activities are highlighted.
In reality, there are many feedback loops within and among these relationships, though they have
been omitted from the figure for clarity.
Figure A.1: The ARP-4754 safety assessment and development model. Aircraft-level requirements and the aircraft-level FHA drive the allocation of aircraft functions to systems; system-level FHAs accompany the development of the system architecture, supported by PSSAs and CCAs; requirements are then allocated to hardware and software, and SSAs cover the system implementation through to certification.
At the first level of this breakdown, the functional requirements of the aircraft are supplemented
with safety requirements and are allocated to a number of systems.
The safety requirements for these systems are established from the results of the aircraft level
Functional Hazard Assessment (FHA).
At the second level of hierarchical breakdown (“system”), the potential functional failures of each
system are assessed by a system level Functional Hazard Assessment (FHA), and decisions are
taken on an appropriate system architecture that could meet the system requirements.
The Preliminary System Safety Assessment (PSSA) of the system architecture follows.
The aim of the PSSA is to establish the safety requirements for each sub-system or item of the
system architecture. Sub-system safety requirements include the definition of appropriate
development assurance levels for each sub-system. When the breakdown process has reached
the stage of implementation, these development assurance levels define the techniques and their
rigour for the specification, development and assessment of hardware and software.
As the system architecture is likely to contain parallel, dissimilar and multiple channel elements,
assumptions of independence of failure between these elements shall be derived and
substantiated. The Common Cause Analysis (CCA) is therefore appropriate at this stage in order
to establish requirements for the physical or functional separation of the sub-systems or items.
At the final stage, the System Safety Assessment (SSA) is conducted to collect, analyse and
document evidence that the system implementation meets the safety requirements established by
the FHA and PSSA. This verification process shall appropriately cover system and component
level development and verification activities.
In the context of system safety assessment (SSA), the results from the CCA provide the arguments
that substantiate assumptions of independence between parallel, dissimilar components.
According to the ARP-4754 model, the system safety requirements are established by a Functional
Hazard Assessment comprising:
• The examination of aircraft and system functions and the identification of potential
functional failures.
• The assessment of the effects of functional failure conditions.
• The classification of each failure condition based on the identified effects.
The standard provides references to airworthiness regulations which define, for each aircraft
category, a proper failure classification schema based on the severity of the failure effects, in terms
of reduced system (aircraft and subsystem) performance, injuries to users (crew, passengers), and
the mitigating or aggravating contributions to the failure effects coming from the actual operating
condition (take-off, landing, etc.).
A typical failure classification applicable to FAR/JAR Part 25 aircraft is depicted in Table A.1.
Table A.1 Relationship between the severity of a functional failure condition, the
quantitative safety requirement for the function and the development level for the system
ARP-4754 recognizes that only with respect to random hardware failures is it possible to quantify
and apply reliability prediction in evaluating whether probabilistic safety requirements are met.
Contributions to critical functional failures coming from hardware systematic faults (i.e. faults
introduced by humans in the specification, design, manufacturing and installation of HW
components) and from software faults are generally not quantifiable. In consequence, the
verification of these kinds of failures against quantitative safety requirements is impossible.
This problem is addressed by the ARP-4754 introduction of the concept of “Development
Assurance Level”.
The development assurance level determines the degree of process control applied to the
development process, with the purpose of achieving qualitative indications that a system meets
its safety objectives. In particular, the development assurance level addresses the following
development process drivers:
• the extent of, and the methods to be used for, the safety assessment of the system (e.g.
management of complexity, Fault Tree Analysis and relevant quantitative analysis, FMEA)
• the extent of the data and documentation formally issued within the system development
life cycle
• the extent, traceability and independence of the system verification process
• the extent and applicability of the traceability requirements for the change management
process
The ARP-4754 also shows how to use the development levels during the design decomposition
process, as a useful mechanism which simplifies and rationalises the process of allocating system
safety requirements to components of the system architecture.
At the lowest levels of the hierarchical system decomposition, the above mechanism should provide
a development assurance level for the software and hardware items, as expected by the RTCA
DO-178B and RTCA DO-254 guidelines.
In the ARP-4754, system verification is driven by the System Safety Assessment (SSA)
process. The suggested SSA takes a functional viewpoint on system verification.
For verification at the system and hardware levels, the guidelines propose the selective, combined
use of Fault Tree Analysis (FTA), Failure Mode and Effects Analysis (FMEA), Markov models and
dependence diagrams.
For the verification of software it is recognised that software failure rates and their contribution to
critical functional failures cannot be quantified. There are no techniques to verify software against
quantitative safety requirements.
The verification of software is addressed using the software development assurance level
according to RTCA DO-178B guidelines which relates each development assurance level to a list
of requirements for the specification, development and testing of software.
The guidelines do not regard a structured safety case as a requirement for certification. The
decision on an appropriate set of certification data is a result of negotiations between applicants,
i.e. aircraft manufacturers, and certification authorities.
DO-178B was written by a group of experts from certification authorities and companies developing
airborne software. It provides guidelines for the production of software for airborne systems and
equipment.
The objective of the guideline is to assure that software performs its intended function with a level
of confidence in safety that complies with airworthiness requirements.
These guidelines specify:
• Objectives for software life cycle processes.
• Description of activities and design considerations for achieving those objectives.
• Description of the evidence that indicates that the objectives have been satisfied.
Figure: The DO-178B software life cycle processes. The planning process establishes standards and the development environment; the development process produces requirements, design and code against defined verification criteria; and the integral processes (verification, configuration management, quality assurance and certification liaison) run alongside, yielding verification results.
ARP 4754 identifies the relationships with DO-178B in the following terms:
“The point where requirements are allocated to hardware and software is also the point where the
guidelines of this document transition to the guidelines of DO-178B (for software), DO-254 (for
complex hardware), and other existing industry guidelines.
The following data is passed to the software and hardware processes as part of the requirements
allocation:
• Requirements allocated to hardware.
• Requirements allocated to software.
• Development assurance level for each requirement and a description of associated failure
condition(s), if applicable.
• Allocated failure rates and exposure interval(s) for hardware failures of significance.
• Hardware/software interface description (system design).
• Design constraints, including functional isolation, separation, and partitioning requirements.
• System validation activities to be performed at the software or hardware development level,
if any.
• System verification activities to be performed at the software or hardware development
level.”
The standard provides requirements to perform tool qualification if a tool is used to automate
software development processes that are typically performed by humans. Tool Operational
Requirements must be provided, and the tool must be verified against the operational
requirements.
A.2.1.4.3 Off-the-Shelf Software
The use of off-the-shelf, including commercial off-the-shelf (COTS), software is permitted by the
standard; however, the software must be verified to provide the verification assurance as defined
for the criticality level of the system.
During the software life cycle processes various data is produced to plan, explain, record or
provide evidence of activities. The document discusses the characteristics and contents of such
software lifecycle data.
ARP-4754 and DO-178B have been successfully used by the avionics industry for many years.
From a theoretical point of view, these guidelines could be applied to automotive systems as well.
However, the following issues act as limiting factors to be considered:
• International codes and regulations available for the aerospace sector have reached a high
level of maturity that is not present in the automotive sector. This leads to evident problems
in classifying the severity of system failure conditions, and therefore in assigning the safety
requirements to automotive systems.
• Due to the very different production volumes in the aerospace and automotive sectors, the
quantitative safety requirements (failure rate per hour) proposed by ARP-4754 and DO-
178B are not applicable to automotive systems. Furthermore, precise failure data
collection is very difficult to achieve in the automotive field.
• The activities requested by ARP-4754 and DO-178B for the development processes are
very expensive and time consuming compared to typical automotive lifecycles.
• In the automotive sector, dependability-related activities often involve pre-series trials. This
approach is not compatible with the ARP-4754 and DO-178B guidelines.
A.2.2 DO-254
DO-254 “Design assurance guidance for airborne electronic hardware” [9] is the hardware
counterpart of DO-178B “Software considerations in airborne systems and equipment certification”.
It is based around the same framework of certification and system safety, and recognizes
interactions with the system development process and the software life cycle process.
The guidance uses the same five system development assurance levels corresponding to the five
classes of failure conditions as the other airborne standards. These five levels are related in DO-
254 to five hardware design assurance levels for which definitions are given.
Hardware safety assessment is required as part of, and in support of, the system safety assessment.
The objective of the system safety assessment is to show that applicable systems and equipment
(including the hardware) have satisfied the applicable aircraft certification safety requirements.
The document gives a hardware design life cycle, which has the following stages:
• Planning
• Design
o Requirements capture
o Conceptual design
o Detailed design
o Implementation
o Production transition
• Validation and verification (sic)
• Configuration management
• Process assurance
• Certification liaison
A further section gives requirements for the hardware lifecycle data e.g. the need to produce a
Plan for Hardware Aspects of Certification (PHAC). This is analogous to the PSAC (Plan for
Software Aspects of Certification) required by DO-178B.
There is a section on “Additional considerations”. Some notable items from this section include
information on previously-developed hardware and COTS.
Generally DO-254 is more concerned with the process than with detailed recommendations on
specific hardware designs or specific techniques and measures to apply. Probabilistic failure rate
measures are permitted (and indeed assumed) but are not defined in the document.
DO-254 is considered to be applicable to complex hardware designs including ASICs and PLDs.
FAA AC 33-28-2 implies that DO-254 should be followed for PLDs (namely a device purchased as
an electronic part and altered to perform an application-specific function). DO-254 also refers to
firmware, although a definition is not given in the document. Firmware is to be classified as
hardware or software and treated appropriately, i.e. if it is classified as hardware, DO-254 is to be
followed, but if classified as software, DO-178B is to be followed.
A.2.3 Example system safety process from Delphi for by-wire automotive systems
The system safety analysis process exemplified (rather than proposed) by Delphi Automotive
Systems for by-wire automotive systems [1] differs from the other processes studied in this report
since it has no official status. Nevertheless, it is specifically aimed at the automotive domain and
therefore of some interest here.
The process is schematically depicted in Figure A.4. The bottom row of this figure shows the
system safety activities.
Figure A.4: Example system safety process from Delphi Automotive Systems (from [1]). The development phases (conceptual design, requirements analysis, architecture design, detailed design, verification and validation, production and deployment) are accompanied by the system safety activities: system safety program plan, preliminary hazard analysis, hazard control specifications (safety requirements), detailed hazard analysis, hazard control specifications (diagnostics, safety design margins, etc.), safety verification, and a comprehensive safety report.
IEC 61508 [10] is an international standard concerned with the functional safety of electrical,
electronic and programmable electronic (E/E/PE) safety-related systems. In understanding how to
apply the standard, there are some important points that have to be considered:
The standard is generic, and the intention is that industry sectors produce their own standards
based on it. To this end the normative parts (parts 1–4) of the standard have been designated by
IEC as a “basic safety publication” which means that these parts have to be used to prepare the
sector-specific standards. In practice this is often interpreted as meaning that a clause-by-clause
sector-specific application of each part is required.
Furthermore, if a sector-specific standard does not yet exist, IEC 61508 can be applied directly as
the applicable standard. However, a justification has to be provided for any sector-specific or
application-specific deviation from the standard.
The standard was developed against the background of industrial process control. In the IEC
61508 model, there is “equipment under control” (EUC) which can adversely affect its environment.
Safety functions are added (either as a stand-alone protection system or in a control system
associated with the EUC) to reduce the risks associated with the hazards of the system to an
acceptable level. Where these safety functions are realized in an electrical system, and/or an
electronic system, and/or a programmable electronic system the standard applies. If any safety
functions are realized entirely through some other means, then the standard does not apply to
those functions (although there may be other standards that do apply). This is illustrated in Figure
A.5 below adapted from Part 5 of the standard.
Figure A.5: The IEC 61508 overall safety lifecycle (adapted from Part 5 of the standard), running from concept through safety requirements allocation and overall safety validation to decommissioning or disposal, with feedback to the appropriate lifecycle phase when changes are made.
• Some safety functions are described as “continuous” or “high demand”, where the function is
operative for a high proportion of the system up-time. These safety functions are typically found
within an EUC control system.
• For risk reduction allocated to safety functions implemented in E/E/PE systems, a safety
integrity level (SIL) is used as a measure of the necessary risk reduction required from that
function. There are 4 SILs, with SIL 1 representing the least requirement for risk reduction, and
SIL 4 the greatest. Higher SILs translate into greater reliability required of the safety functions.
• Example (according to the model of risk reduction envisaged by the authors of IEC 61508): an EUC without protective measures has a hazard with a probability of occurrence of 2 × 10⁻⁴. The tolerable risk is a probability of occurrence of 1 × 10⁻⁷. The required average probability of failure on demand of a low-demand protection system is therefore 10⁻⁷ / (2 × 10⁻⁴) = 5 × 10⁻⁴; thus the protection system is allocated SIL 3 (see Table 2 of Part 1 in [10]). This is the target failure measure for the safety function. A worked sketch of this arithmetic is given after this list.
• Where a system or component is described as being “a SIL n system”, it means that the
system or component is capable of supporting safety functions up to (and including) those
allocated SIL n. A SIL is not per se a property of a system or component.
• For safety functions where a random failure rate can be calculated, demonstration of the
requirements of the SIL is achieved by showing that the random failure rate is within the
bounds for that SIL. This is a random safety integrity requirement. Additionally, other
techniques and measures are applied to control random faults. The rigour of both the
techniques and their application increases with the SIL. These are systematic safety integrity
requirements.
• For safety functions where a random failure rate cannot be calculated (for example, for almost
any function based on software), demonstration of the requirements of the SIL is achieved by
applying appropriate techniques and measures in the design, implementation and verification.
Again, the rigour of both the techniques and their application increases with the SIL. A number
of very detailed and specific lists of techniques and measures are given. This includes
measures to avoid systematic faults. These are further systematic safety integrity
requirements.
• The safety integrity requirements determine the rigour of the processes that must be followed
in developing the system (Stage 9 of the lifecycle). In addition, these requirements determine
the rigour of the validation activities necessary to demonstrate that the system has achieved
the required level of safety (Stages 7 and 13).
• There are requirements on installation and commissioning (Stages 8 and 12), operation and
maintenance (Stages 6 and 14) and decommissioning (Stage 16). These appear to be quite
specific to the model of a protection system being added to a large industrial installation.
• Any changes to the system (Stage 15) have to be analysed for their impact and the change
taken back to the appropriate stage of the lifecycle.
• An independent assessor is usually required to provide independent assurance that the system
has been implemented in accordance with its safety requirements. The standard expects that
the assessor will be involved at all stages of system development.
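The risk-reduction arithmetic in the example above can be sketched as follows. This is an illustrative calculation only: the PFDavg bands are those given for low-demand mode in Table 2 of Part 1 of [10], while the function and variable names are our own.

```python
# Illustrative sketch of the low-demand SIL allocation arithmetic from the
# example in the list above. Bands follow IEC 61508-1 Table 2 (low-demand
# mode): average probability of failure on demand (PFDavg) per SIL.
SIL_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]

def allocate_sil(hazard_rate: float, tolerable_rate: float) -> tuple[float, int]:
    """Return the target failure measure (PFDavg) and the SIL it falls into."""
    target_pfd = tolerable_rate / hazard_rate
    for sil, low, high in SIL_BANDS:
        if low <= target_pfd < high:
            return target_pfd, sil
    raise ValueError("target failure measure outside the SIL 1-4 bands")

pfd, sil = allocate_sil(hazard_rate=2e-4, tolerable_rate=1e-7)
print(f"target PFDavg = {pfd:.0e} -> SIL {sil}")  # target PFDavg = 5e-04 -> SIL 3
```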
A.2.4.2 Issues in applying IEC 61508 to automotive systems
In seeking to apply IEC 61508 to automotive systems, the following issues are evident.
• The safety lifecycle does not align well with the typical product development processes
followed by automotive manufacturers and their suppliers. Automotive system development is
based on a number of iterations or “samples”, while the vehicles also have defined product
development lifecycles. Both of these lifecycles have key gateways or milestones that do not
have any specific functional safety requirements. See also “engineering” in the EASIS state-of-
the-art deliverable D0.1.2.
• As part of the automotive development lifecycle, automotive systems are usually subject to a
test-based Type Approval process. The system is fully developed and validated, then
approved, then released for series production. There is little provision within this process for
independent safety assessment throughout the lifecycle as required by IEC 61508.
• Automotive safety-related systems are rarely the classical “protection” systems envisaged by
IEC 61508. Instead, safety functions are part of the actual functioning of the system and it is
often difficult to make an arbitrary distinction between the EUC and the safety functions. For
example, the IEC’s “functional safety zone” website states that ABS is an example of an on-
demand protection system. In reality, the functioning of “ABS” on a modern vehicle is closely
bound up with the operation of the powertrain to implement a wide range of stability control
functions including conventional ABS.
• Practitioners therefore tend implicitly to allocate SILs to systems rather than functions.
• Automotive integrity/dependability requirements are often concerned with wider issues than
safety integrity alone. IEC 61508 makes little mention of human factors, yet drivers are an
integral part of the “system”.
• The techniques and measures listed by SIL are very specific and many of them are only
applicable to the process control sector. Conversely, techniques and measures that may be
commonplace in the automotive industry are not mentioned. Thus, for example, if a system
was developed using model-based control and automated code generation, a detailed
justification and analysis would need to be presented in order to comply with IEC 61508.
• The standard does not address the supply chain structure commonly found in the automotive
industry.
A.2.4.3 Misconceptions
A.2.4.4 Summary
IEC 61508 is a reference standard for safety-related systems and contains many recognized “good
practice” techniques for the engineering of safety-related systems. It is a very prescriptive
approach, which can prove difficult to interpret in sectors with different requirements. The MISRA
Guidelines (see section A.2.5) were the first published interpretation of IEC 61508 for the
automotive industry. The EASIS approach will need to take account of its requirements, but some
parts will need careful interpretation and implementation.
A.2.5 The MISRA Guidelines
Development Guidelines for Vehicle Based Software [7] (often known as the “MISRA Guidelines”) were developed in the early 1990s by a UK-based consortium of automotive companies representing vehicle manufacturers, the supply chain and consultants. The Guidelines were written
against the background of the development that was taking place on IEC 61508 and
acknowledged that, while the track record of the automotive industry in respect of embedded
software was good, a recognized industry position on embedded software development for safety-
related vehicle control systems was needed.
The team producing the Guidelines had access to draft material from the committees that
produced IEC 61508, and many of the concepts and principles in that standard are embodied in
the Guidelines. The Guidelines sought to incorporate these principles within the context of the
standard approaches for automotive engineering. The authors of the Guidelines did not consider a
clause-by-clause interpretation of IEC 61508 as at that time the structure of the standard was not
confirmed.
As well as the Development Guidelines, there are 9 supporting reports that give additional
information and background to some of the recommendations in the Guidelines.
Although the Guidelines have been in use worldwide in the 10 years since their publication, a
number of issues can be identified.
Since the Guidelines contain no formal mapping to IEC 61508, even though many of its principles are implemented, it is not always evident that the Guidelines provide an automotive industry implementation of IEC 61508. Similarly, there is sometimes a perception that the Guidelines carry less weight than IEC 61508 because they are not a standard, notwithstanding that in terms of Product Liability and Product Safety legislation they represent best practice and are therefore equally applicable.
In the UK the Health and Safety Executive (HSE), the government agency responsible for enforcing health and safety legislation in industry and closely involved in the development of IEC 61508, has stated: “IEC 61508 will be used as a reference standard for determining whether a reasonably practicable level of safety has been achieved when E/E/PE systems are used to carry out safety functions. The extent to which Directorates/Divisions [of HSE] use IEC 61508 will depend on individual circumstances, whether any sector standards based on IEC 61508 have been developed and whether there are existing specific industry standards or guidelines.”
The guidelines are based around “integrity levels” rather than SIL. Although many practitioners
now use “SIL” when referring to the MISRA Guidelines, the MISRA definition of “integrity” is wider
than IEC 61508’s definition of “safety integrity” as it also encompasses the wider implications of
system failure such as economic or environmental loss as well as personal injury.
Furthermore, the techniques and measures in the Guidelines are largely not graded by integrity level: apart from Table 3 in the Guidelines, which shows the rigour required of the software development process as the integrity level increases, there is no grading of any of the other recommendations.
The Guidelines assume that safety analysis processes will be carried out. A preliminary safety
analysis is explicitly required which encompasses hazard identification and hazard classification at
the concept stage of a system or early in its lifecycle. Detailed safety analysis is implied but not
described in detail beyond a short paragraph in the “Integrity” report.
The preliminary safety analysis required corresponds approximately to the hazard identification
and classification addressed in Appendix B of this deliverable.
The detailed safety analysis required corresponds approximately to the hazard occurrence
analysis addressed in Appendix C of this deliverable. Note that the MISRA Guidelines assume that
detailed safety analysis is an iterative activity. There is an implication that to some extent,
preliminary safety analysis may also be an iterative activity.
There is a perception that the Guidelines are only concerned with software. In fact the Guidelines
advocate a systems engineering approach, but provide guidance mostly for the software aspects.
The Guidelines do not explicitly address recent technology developments that may be used in
software development such as model-based development and automatic code generation.
A.2.5.3 Summary
The MISRA Guidelines represent an approach based on IEC 61508 that takes account of the
requirements of automotive systems and is, to some extent, more goal-based than prescriptive.
They could provide a good starting point for defining the EASIS approach, particularly to software
development.
A.2.6 The MISRA Safety Analysis guidelines
The MISRA Safety Analysis (SA) guidelines are a new publication that gives guidance on the
management of functional safety in the context of automotive programs. The MISRA SA guidance
on functional safety management techniques is based around a safety management lifecycle. This
lifecycle is aligned with both the IEC 61508 functional safety lifecycle and the typical product
development lifecycles used for vehicles.
As well as providing a functional safety management framework, the MISRA SA guidelines contain
a detailed process for preliminary safety analysis and detailed safety analysis. The preliminary
safety analysis process is concerned with enabling safety requirements to be identified as part of
the process of setting targets or attributes for a new vehicle or a new vehicle system. It includes
guidance on hazard identification and hazard classification, including the use of a risk graph
technique for hazard classification. The risk graph incorporates the previous “controllability”
technique from the 1994 MISRA Guidelines. The safety requirements include the setting of random
and systematic safety integrity requirements. The random safety integrity requirements are
specified in terms of a target failure rate per hour, and the systematic safety integrity requirements
as a safety integrity level (SIL).
The detailed safety analysis process is concerned with iteratively applying inductive and deductive
analysis techniques to the design to confirm that the safety requirements have been implemented.
It is noted that the most common form of inductive analysis used in the automotive industry is
FMEA, and some advice is provided on applying FMEA in the context of a functional safety
management approach. Similarly, it is noted that FTA is a commonly-used deductive technique,
particularly as a means of predicting the target failure rate.
The MISRA SA guidelines are a goal-orientated approach to managing functional safety, within
which the requirements of standards such as IEC 61508 and the proposed ISO 26262 (see below)
can be met. The MISRA SA approach is compatible with these standards, in particular the
proposed ISO 26262, since in general it specifies the activities that are required but does not
prescribe the techniques that have to be used. The only potential incompatibility is that MISRA SA
suggests a different hazard classification scheme to that proposed in the draft ISO 26262.
However MISRA SA permits the user to select an appropriate scheme, provided it is documented
and justified in the company (or project) safety policy.
The MISRA SA scheme for hazard classification is discussed further in Appendix B.
A.2.7 ISO 26262
In November 2005 a new ISO Working Group, ISO/TC22/SC3/WG16 “Functional safety”, held its
inaugural meeting. The purpose of this ISO activity is to develop a new standard based on IEC
61508 for the functional safety of electrical, electronic and programmable electronic systems used
in safety-related applications in road vehicles. The new standard will be applicable to road vehicles
of classes M, N and O as defined by the Type Approval Directive 70/156/EEC.
At the present time, the working drafts are confidential to the ISO Working Group members. This
summary is therefore based on publicly available information such as conference presentations.
The Working Group is proposing that the new ISO standard, to be known as ISO 26262 [14], will
have the following parts:
• Part 1: Glossary
• Part 2: Management of functional safety
• Part 3: Concept phase
• Part 4: Product development – system
• Part 5: Product development – hardware
• Part 6: Product development – software
• Part 7: Production and operation
• Part 8: Supporting processes
Additional parts are also foreseen containing “Annex” material.
The motivation for developing ISO 26262 is that there are well-documented issues in applying
IEC 61508 directly to automotive systems (see for example Section A.2.4.2 of this document).
Thus ISO 26262 will consider:
• Using automotive, rather than process industry, lifecycle models (e.g. AutomotiveSPICE)
• Hazard analysis and risk assessment aligned to the automotive sector
• Validation practices in the automotive industry, for example hardware-in-the-loop tests, the use
of “LabCar”-type approaches.
One key difference between IEC 61508 and ISO 26262 is that it is proposed to use the concept of
automotive safety integrity level (ASIL) instead of SIL. There are four ASILs, ASIL A – ASIL D,
which represent the risk reduction required from a system with ASIL D being the highest. However
the rigour associated with ASIL D (for systematic techniques and measures) is considered broadly
equivalent to that required by IEC 61508 SIL 3, with no equivalent proposed for SIL 4. At the time
of writing this document, it was not clear whether ISO 26262 would include any numerical or
probabilistic requirements associated with ASILs.
The ASIL is determined by performing hazard identification and hazard classification. Hazard
classification considers three parameters:
• Exposure, which relates to the probability of being exposed to the hazard
• Controllability, which relates to the probability of the driver being able to control the hazardous
situation
• Severity, which relates to the severity of the outcome of the hazard.
This hazard classification scheme is further discussed in Appendix B.
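Since the draft classification table was not public at the time of writing, the sketch below only illustrates the mechanics of a three-parameter classification lookup. The parameter granularities, the scoring rule and the resulting classes are invented for illustration and do not reproduce the draft ISO 26262 scheme.

```python
# Hypothetical three-parameter hazard classification lookup. The table
# contents are invented; only the lookup mechanics are meant to be
# illustrative of an exposure/controllability/severity scheme.
ASIL_TABLE: dict[tuple[int, int, int], str] = {}
for s in range(1, 4):          # severity S1..S3 (assumed granularity)
    for e in range(1, 5):      # exposure E1..E4 (assumed granularity)
        for c in range(1, 4):  # controllability C1..C3 (assumed granularity)
            score = s + e + c  # purely illustrative aggregation rule
            if score <= 5:
                cls = "no ASIL required"
            else:
                cls = "ASIL " + "ABCD"[min(score - 6, 3)]
            ASIL_TABLE[(s, e, c)] = cls

def classify(severity: int, exposure: int, controllability: int) -> str:
    return ASIL_TABLE[(severity, exposure, controllability)]

print(classify(3, 4, 3))  # worst case  -> 'ASIL D'
print(classify(1, 1, 1))  # benign case -> 'no ASIL required'
```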
A.2.8 Def Stan 00-56
Def Stan 00-56 is the UK Ministry of Defence standard for “Safety Management Requirements for
Defence Systems”[6]. It is in two parts, Part 1 being “Requirements” and Part 2 “Guidelines on
establishing a means of compliance with Part 1.” In the latest version, Issue 3, only compliance
with Part 1 is mandatory. With the release of Issue 3, the standards Def Stan 00-54 [4] (hardware)
and Def Stan 00-55 [5] (software) have been withdrawn. At the time of preparing this report, Def
Stan 00-55 was still available but its status was shown as “Obsolescent”.
The latest version of the standard is much less prescriptive than the previous version and is built
upon a goal-based approach. It provides an overall framework for safety management, but much of
the detail of implementation is permitted to be a project decision.
Part 1 sets out the framework for safety management. It is based on the following principles:
• The overall objective of the standard is to demonstrate that the risks associated with a system are broadly acceptable or, where a broadly acceptable level cannot be achieved, that the risks are tolerable and have been reduced ALARP (as low as reasonably practicable). These concepts are discussed in
more detail in Appendix D of this deliverable.
• The standard may be applied to any system, not just to electronic or programmable systems.
• The emphasis is on safety being considered at the earliest possible stage of a project, and for
safety management activities to be included in the project plan from the outset.
• Safety management must be integrated into an overall systems engineering approach.
• There must be an auditable safety management system.
• A Safety Case is developed and maintained.
• All credible hazards and accidents are identified, the accident sequences defined, the risks
associated with the hazards are determined; and the risks are demonstrably reduced to a
broadly acceptable level, or to a tolerable level and ALARP.
• There are monitoring and reporting mechanisms for failures and for accidents or “near misses”.
• The provision for an independent safety assessor is included.
For the management of safety, the standard requires that a project appoint a Safety Manager and that a Safety Committee be established. Decisions of the Safety Committee (e.g. to accept a
tolerable risk based on ALARP) have to form part of the Safety Case. A Safety Management Plan
has to be generated and updated throughout the life of the project.
An important part of the safety management activities is the establishment of a hazard log. A
hazard log is the primary mechanism for providing traceability of the risk management process and
assurance of the effective management of hazards and accidents. The hazard log shall be updated
through the life of the project to ensure that it accurately reflects risk management activities.
The standard defines risk management as comprising the following stages:
1. Hazard identification
2. Hazard analysis
3. Risk estimation
4. Risk and ALARP evaluation
5. Risk reduction
6. Risk acceptance
The standard notes that the combination of activities 1 to 3 is sometimes referred to as “risk
analysis” and the combination of activities 1 to 4 is sometimes referred to as “risk assessment”.
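Def Stan 00-56 prescribes no particular format for the hazard log described above. As a minimal sketch of the traceability idea, an entry might record fields loosely mirroring the six risk management stages just listed; all field names below are our own assumptions.

```python
# Minimal sketch of a hazard log entry; field names are assumptions chosen
# to mirror the six risk management stages listed above, not a prescribed
# Def Stan 00-56 format.
from dataclasses import dataclass, field

@dataclass
class HazardLogEntry:
    hazard_id: str
    description: str                                              # stage 1: hazard identification
    accident_sequences: list[str] = field(default_factory=list)  # stage 2: hazard analysis
    risk_estimate: str = "not yet estimated"                      # stage 3: risk estimation
    alarp_argument: str = ""                                      # stage 4: risk and ALARP evaluation
    mitigations: list[str] = field(default_factory=list)          # stage 5: risk reduction
    accepted_by: str = ""                                         # stage 6: risk acceptance
    status: str = "open"                                          # open / mitigated / closed

entry = HazardLogEntry(
    hazard_id="HZ-042",
    description="Undemanded braking intervention by the integrated safety system",
)
entry.mitigations.append("Plausibility check on the deceleration request")
```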
The standard does not require a specific technique to be adopted for these activities, but the method that is chosen has to be demonstrably adequate and suitable.
Part 2 of the standard recognizes that existing standards may be used as part of a means of compliance, particularly where a SIL (or equivalent) is used to link safety requirements explicitly to development rigour. However, it is stated that because of the assumptions that are implicit in the safety integrity level schemes of these standards, problems may arise in novel applications or if a scheme is applied outside the domain, regulatory regime or application for which it was intended.
A.2.8.3 Summary
Def Stan 00-56 presents a goal-based approach to the management of safety and provides a model for a less prescriptive framework that can be used for the management of system safety
across a variety of applications and domains. It can also provide the basis for developing a
framework from existing standards for a specific application or domain. The overall goal-based
approach, and the frameworks presented in Part 2 for dealing with complex electronics and
ALARP, are important inputs to the definition of an EASIS approach.
A.2.9 MIL-STD-882D
MIL-STD-882D [11], issued by the US Department of Defense, defines and describes a set of activities that shall be performed throughout the system life cycle when MIL-STD-882 is required in a solicitation or contract. It does not define a development process in terms of how and when the interaction between the activities shall take place. The activities required by the standard are the following:
1. Documentation of the system safety approach
2. Identification of hazards
3. Assessment of mishap¹ risk
4. Identification of mishap risk mitigation measures
5. Reduction of mishap risk to an acceptable level
6. Verification of mishap risk reduction
7. Review of hazards and acceptance of residual mishap risk by the appropriate authority
8. Tracking of hazards, their closures, and residual mishap risk
Some comments concerning the overall approach described in the standard and its applicability to
the development of automotive integrated safety systems are given below.
• The standard defines a hazard as "any real or potential condition that can cause injury,
illness, or death to personnel; damage to or loss of a system, equipment or property; or
damage to the environment". This definition encompasses a wide range of conditions
including the existence of explosive substances at a particular location. Such a wide
definition is appropriate for the type of systems addressed by the standard but in the EASIS
WT 3.1 work we have a more narrow scope. The hazards considered in the dependability
work package of EASIS are primarily associated with failures of automotive electronic
systems. Thus, the probabilities of these hazards are determined by the design of the
system. In other words, the hazard probabilities are a result of the system development
rather than an input to it.
• Although the standard does not explicitly state that the activities 2-6 are sequential steps (with or without iteration) in a process, this seems to be an underlying assumption. Thus, the approach appears to be "risk reduction in an existing system so that the residual risk is acceptably low" rather than "development of a new system so that the residual risk is acceptably low". For our purposes in the EASIS project, the latter approach is more appropriate.
¹ A mishap is defined as an unplanned event or series of events resulting in death, injury, occupational illness, damage to or loss of equipment or property, or damage to the environment.
• The mishap risk is defined as "an expression of the impact and possibility of a mishap in
terms of potential mishap severity and probability of occurrence". The standard requires
that the mishap risk of each identified hazard is assessed. For the types of systems
considered in the EASIS project, however, the relationship between hazards on the one
hand and severities and probabilities of mishaps on the other is typically too complex to
allow a single probability and a single severity to be defined. Typically, a given hazard can
lead to several different consequences with different severities and different probabilities.
• The issue of how to determine the appropriate acceptable risk level is not mentioned in the
standard.
In summary, the standard is not particularly well suited for the type of systems considered in
EASIS. Still, parts of the standard provide valuable input to our investigation of the dependability-
related activities to be carried out during the development of an integrated safety system.
A.2.10 UNECE regulations ECE-R13H and ECE-R79
The United Nations Economic Commission for Europe (UNECE) is tasked with creating a uniform set of regulations for vehicle design to aid global trade. Annex 18 of ECE-R13H for brake systems [12] and Annex 6 of ECE-R79 for steering systems [13] are concerned with "Special Requirements
to be Applied to the Safety Aspects of Complex Electronic Vehicle Control Systems". These
annexes define requirements for documentation, fault strategy and verification. The requirements
may be summarized as follows:
• The function(s) of the system as well as the physical and logical system
structure shall be documented.
• The safety concept shall be documented. The safety concept is a description of
the measures designed into the system to ensure safe operation, for example
functional degradation or switch to a backup system when an error has been
detected.
• Analysis of the system's reaction to specific faults shall be documented, for
example in an FMEA and/or an FTA. The warning signals to be given to the
driver (and/or service inspection personnel) shall be documented for each
defined fault condition.
• Both the fault-free function and the safety concept shall be verified. For the
verification of the safety concept, error injection shall be carried out at the
discretion of the type approval authority. The verification results shall correspond
to the documented analysis, such that the safety concept and its implementation
are confirmed as being adequate.
From this brief summary, it should be evident that these annexes of the ECE regulations are not
particularly relevant to the investigation of dependability process frameworks in EASIS.
A.3 The EASIS dependability activity framework
It is well-known that a simple "waterfall" development process, without iterations, is not suitable for complex automotive electronics. Thus, the dependability-related activities in the development of an Integrated Safety System will not follow each other in strict succession. Each activity will be revisited several times during the development, with the results influencing other activities. Figure A.7 shows a simplified example, for purposes of illustration only, of how the dependability-related activities may progress in time in a particular development project.
Depending on a multitude of factors, this chart could have virtually any shape. Some factors
influencing the shape of the chart are:
• The actual system being developed
• Role of OEM and suppliers in the development of the system
• Company-internal processes
• Applicable standards
For Integrated Safety Systems (ISS) in particular, components and subsystems already existing in
the vehicle are often involved in realizing the new function provided by the ISS. For example, both
the brake control system and the engine control system might be involved in realizing an
Integrated Safety Function (ISF), partly by making use of existing functionality and partly by
incorporating new features for the ISF of concern. This makes it even more difficult to define a
generic dependability process.
[Figure A.7: Example timeline showing how the dependability-related activities may progress in a particular development project: hazard occurrence analysis, dependability-related requirements (progressing from high level to detailed), verification of dependability, and Safety Case.]
Since the process chart may have virtually any shape, EASIS Work Task 3.1 does not make an
attempt at defining a generic dependability process. Instead, our focus is on the specific dependability-related activities that should be carried out; process issues as such are investigated in EASIS Work Package 4. Rather than trying to define the stepwise process in terms of the order
of activities and their interaction, EASIS WT 3.1 investigates each activity separately and provides
guidelines and recommendations for how to perform the activity, in early as well as late phases of
the development lifecycle. The information available as "input" to each activity and the information
to be produced by the activity will obviously be different depending on where we are in the
development process. For example, "identification of hazards" covers the following:
• how hazards can be identified at an early stage, when there is only a rough idea of the function(s) to be provided by the system and almost nothing is known about the design and implementation
• how hazards can be identified when the design work has progressed to a detailed level
• how hazards can be identified when there is a working prototype available
The dependability activity framework on which the EASIS WT 3.1 work is based is given in Figure
A.8. The figure shows the dependability-related activities that should be performed when an
integrated safety system is being developed. The definition of this framework is based on the
findings of the investigation of existing approaches in section A.2.
[Figure A.8: The EASIS dependability activity framework. The activities shown are: Identification of hazards; Classification of hazards; Hazard occurrence analysis; Establishment of dependability-related requirements; Verification and validation of dependability-related requirements; Safety Case construction; and the development and design of the integrated safety system itself. The information flows between the activities are labelled with the letters A to K.]
The information represented by some of these arrows varies with the development phase, as follows:

Arrow H:
• Very early phase: Descriptions of how specific faults (related to the rough conceptual architecture) may contribute to the occurrence of specific hazards.
• Early phase: Descriptions of how specific faults (related to the conceptual architecture) may contribute to the occurrence of specific hazards.
• Middle phase: Descriptions of how specific faults (related to the system design) may contribute to the occurrence of specific hazards.
• Late phase: Descriptions of how specific faults (related to the detailed system design) may contribute to the occurrence of specific hazards.

Arrow I:
• All phases: A list of hazards in which every hazard is categorized with respect to how undesirable it is.

Arrows J-K:
• Very early phase: Implementation-independent requirements (e.g. tolerable probability of hazards above a certain criticality classification, tolerable probability of specific hazards, degree of fault tolerance, etc.). Conceptual solutions to identified problems.
• Early phase: Requirements on error detection mechanisms and reaction to detected errors, in terms of "what" rather than "how". Requirements on which parts of the system should be designed using some particular method or process (e.g. programming language).
• Middle phase: Refined requirements on error detection mechanisms and reaction to detected errors, broken down to the subsystems.
• Late phase: Detailed requirements on error detection mechanisms and reaction to detected errors, for each subsystem.
From the description of the EASIS dependability activity framework, it should be obvious that the scope of the activities within the framework is heavily dependent on what is meant by a "hazard".
Thus, we need a very precise description of what we mean by a hazard in this framework, much
more precise than typical hazard definitions such as "a condition that may lead to an accident". It is
important to understand that such a precise description, which is developed below, is only needed
to define the scope of the activities within this framework. The purpose of the explanation is not to
suggest a new definition of "hazard".
Figure A.9 shows the scope of three analysis activities: hazard identification, hazard occurrence
analysis and hazard classification.
[Figure A.9: The scope of the three analysis activities. Hazard identification sits at the boundary between hazard occurrence analysis (system-internal analysis) and hazard classification (system-external analysis).]
A hazard is here understood as an undesirable condition or state of a vehicle that could lead to an undesirable outcome, depending on other factors that can influence the outcome (cf. Appendix B).
This meaning of "hazard", for the purposes of the EASIS WT 3.1 work, is explained and motivated
by the following:
• It may certainly seem strange to define the "hazard" concept in a way that covers any
deviation from the desired behaviour, regardless of whether this behaviour is safety-
critical or not. However, with respect to the dependability activity framework, a benign
unwanted condition and a dangerous unwanted condition differ only in their degree of
undesirability. Furthermore, the EASIS project is concerned with Integrated Safety
Systems and such systems are by definition related to safety. For the purposes of our
work, any undesired condition can therefore be considered a hazard. Whether or not a
particular such condition really affects the safety is analysed in the Hazard
Classification activity, not in the Hazard Identification activity. In order to allow the
Hazard Identification to be carried out before Hazard Classification (which is obviously
necessary), we do not limit the hazard concept to safety-critical conditions only.
• It is tempting to define a hazard as an undesired behaviour rather than an underlying
condition or state. However, the relationship between a state (or condition) and the
corresponding behaviour is often dependent on the driving situation. Likewise, the
relationship between a given vehicle behaviour and its effects is also dependent on the
driving situation. If hazards were defined as undesired behaviour, the driving situation
would influence both the occurrence and the outcome of the hazards, generally
resulting in a complex analysis which does not allow the occurrence and the outcome to
be studied independently. This would prevent a clear separation between system-
internal and system-external analysis. Therefore, we prefer a state-based definition of
"hazard" to a behaviour-based one. However, it should be noted that when the state (or
condition) always - or almost always - results in a corresponding undesired vehicle
behaviour, it is often more practical to define the hazard as the undesired vehicle
behaviour as illustrated by the first example below:
o Example: One undesired behaviour of an airbag control system is the inflation of the
airbag in a non-collision situation. The underlying state, i.e. the corresponding
hazard, could be described as "inability to avoid inflating the airbag". Whenever this
inability occurs, it will result in an airbag inflation, so the hazard is more
conveniently described as "inflation of the airbag in a non-collision situation".
o Example: Another undesired behaviour of an airbag control system is the non-
inflation of the airbag in a collision situation. The underlying state, i.e. the
corresponding hazard, could be described as "inability to inflate the airbag". In this
case, the inability will only lead to the undesired behaviour if a collision occurs.
• Hazards are defined with respect to the vehicle and not with respect to the system of
concern. If hazards were defined with respect to the system, they would typically be
expressed in a very complex way which would make them difficult to understand and
difficult to reason about as shown in the following example.
o Example: For conventional ABS in cars, the hazards could in principle be defined in
terms of the state of electrically-controlled hydraulic valves with respect to "open"
and "closed", since the ABS output boundary would typically be chosen to coincide
with the state of these valves (or with the corresponding control signals to the
valves). The hydraulic lines, brake callipers, brake pads, brake disks, tires and road
surface are usually not considered to be parts of the ABS system even though they
influence the ABS operation.
It should be noted that the examples provided above are heavily simplified. All relevant
characteristics of the hazards should be included in the description of a specific hazard. Thus, a
hazard can often be broken down into several different hazards with different characteristics.
Examples of such characteristics are:
• The magnitude of the potential deviation from the desired behaviour (force, velocity,
etc)
• The duration of the hazard. (Some hazards are characterized by a specific duration
because of the way the system is, or may be, implemented)
• Information provided to the driver about the existence of the hazard. (For the airbag
example, there is obviously a large difference between "airbag inoperable and driver is
informed about this" and "airbag inoperable and driver is not informed about this".)
The subject of hazard descriptions is addressed in more detail in Appendix B.
With the proposed interpretation of "hazard", the investigation of the causal relationships that lead to hazards is in principle confined to the system under consideration². The behaviour of the driver, the traffic situation and other environmental conditions do not influence the hazard occurrence. This simplifies the investigation of "what might cause the hazard". The driver behaviour, the traffic, etc. will only affect the effects of the hazard. Thus, the investigation of the hazard is separated into a system-internal analysis ("What might cause the hazard?") and a system-external analysis ("What are the potential outcomes and how likely are they given that the hazard occurs?"), as shown in Figure A.9.
² The root cause may be system-external, for example electromagnetic interference, water, mechanical vibrations or physical damage. It may also be located in a different system which provides input data to the system of concern. In all of these cases, however, there is a system-internal event or state that can be considered as a cause of the hazard in the Hazard Occurrence Analysis.
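One way to express this separation formally (our notation, not the deliverable's) is to factor the probability of a given outcome into a hazard-occurrence term, which by the above depends only on the system, and a consequence term, which depends on the driving situation:

```latex
% H = hazard, O = outcome, s = driving situation. The first factor is the
% subject of hazard occurrence analysis (system-internal); the second is
% the subject of hazard classification (system-external).
\[
  P(O) = P(H) \cdot P(O \mid H),
  \qquad
  P(O \mid H) = \sum_{s} P(O \mid H, s) \, P(s)
\]
```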
A.3.2 Special cases requiring a different approach than the one outlined in the dependability
activity framework
For hazards potentially caused by faults in the system of concern or in its input information, the
dependability activity framework outlined in this document is a logical way of defining the
dependability-related activities that should be performed when a system is being developed.
However, there are some cases when the framework is less appropriate:
• A particular hazard may be continuously present due to natural limitations of the technology
employed. Integrated safety applications typically use environment-sensing components
such as radar, laser and cameras. It is well-known that natural limitations of these
technologies as well as environmental conditions may cause the sensors and their
associated signal processing to deliver incorrect information about the state of the physical
objects that they monitor. Even if a system is fault-free, the behaviour of the vehicle may
therefore still be undesired from the driver's point of view. The inability to cope with a
particular environmental situation is thus present continuously and the occurrence of the
undesired vehicle behaviour will depend on the occurrence of this environmental situation.
• Another case is when the intended system behaviour may be undesirable in some specific
(typically rare) situations. For example, most people would agree that it should not be
possible to start a vehicle without the proper key (or some similar device) but it is possible
to envision scenarios in which it would be highly desirable to be able to start the vehicle
without it. Many other - and better - examples can be found.
• A third case is when the user may misunderstand the operation of the system or may have
wrong expectations about its capabilities.
• A fourth case is when the driver may become distracted by the operation of the system or
by the information provided via the Human-Machine Interface.
In all of these cases, the dependability approach cannot be separated into a system-internal
analysis (Hazard Occurrence Analysis) and a system-external analysis (Hazard Classification).
Such hazards are in principle outside the scope of our work, but the following general
recommendations concerning how to deal with them can be given:
• Identify hazards associated with natural limitations, specific situations, misunderstandings,
distraction (or associated with any other phenomenon that is not related to faults, errors or
failures)
• For each such hazard, investigate the feasibility and suitability of the following actions:
o Selection of a technology that has few inherent limitations
o Provision of information to the user about specific situations and how to handle
them (typically in the user manual)
o Provision of information to the user about the operation, capabilities and limitations
of the system (typically in the user manual)
o Particular care in the design of the Human-Machine Interface, including the
displaying of visual information
o Introduction of in-vehicle stickers such as the well-known warnings concerning
passenger airbags and child seats
o Re-definition of the basic functionality of the system
References
[1] S. Amberkar, J. G. D’Ambrosio, B. T. Murray, J. Wysocki, B. J. Czerny, "A System-Safety Process for By-
Wire Automotive Systems", SAE 2000 World Congress, SAE 2000-01-1056, 2000.
[2] ARP 4754: Certification Considerations for Highly-Integrated or Complex Aircraft Systems, Society of
Automotive Engineers, 1996.
[3] ARP 4761: Guidelines and methods for conducting the safety assessment process on civil airborne
systems and equipment, Society of Automotive Engineers
[4] Def Stan 00-54 Requirements for Safety Related Electronic Hardware in Defence Equipment, UK Ministry
of Defence, 1997.
[5] Def Stan 00-55 Requirements for Safety Related Software in Defence Equipment, UK Ministry of Defence,
1997.
[6] Def Stan 00-56 Safety Management Requirements for Defence Systems, UK Ministry of Defence, 2004.
[7] Development Guidelines for Vehicle Based Software, MISRA, 1994.
[8] DO-178B Software Considerations in Airborne Systems and Equipment Certification, RTCA, 1999.
[9] DO-254 Design assurance guidance for airborne electronic hardware, RTCA
[10] IEC 61508, Functional safety of electrical/electronic/programmable electronic safety-related systems, IEC,
1998.
[11] MIL-STD-882D Standard Practice for System Safety, U.S. Department of Defense, 2000.
[12] ECE R13-H, Uniform provisions concerning the approval of passenger cars with regard to braking, United Nations Economic Commission for Europe.
[13] ECE R79, Uniform provisions concerning the approval of vehicles with regard to steering equipment, United Nations Economic Commission for Europe.
[14] C. Jung, Stand des ISO-Standards zur Funktionalen Sicherheit für die Automobilindustrie, (mainly in
English), Presentation at Safetronic 2005
Appendix B: Hazard identification and classification
The objective of this Appendix is to provide guidance on the processes of hazard identification and
hazard classification within the context of the dependability activities of an Integrated Safety
System. In the context of this Appendix, the emphasis is understood to be particularly on the
functional safety aspects of dependability, although the subject in general has a much wider scope.
In broad terms, hazard identification is concerned with a process of determining the hazards that
are associated with a system. Hazard classification is a process of determining the criticality of a
hazard in terms of its potential consequences given that the hazard occurs. This classification
provides the basis for determining requirements related to the prevention of the hazard. In generic
standards (such as in [5]), this is often expressed as the necessary risk reduction in order to
reduce the hazard risk to a broadly acceptable level. This concept is shown in Figure B.1 below
(adapted from [5]). For automotive electronic systems, the main emphasis is usually on preventing
the hazard occurring, or reducing the probability of it occurring, such that the resulting hazard risk
is at or better than the “broadly acceptable” level. Note that this part of the analysis is outside the
scope of this Appendix (see Appendix C “Hazard Occurrence Analysis”).
One of the most fundamental concepts of the process outlined above is the word “hazard”. In the
system safety context, the meaning of this word differs from the natural language understanding of
“hazard” (and indeed the dictionary definition). In natural language usage, a hazard is anything that is potentially dangerous or has the potential to cause harm, even if the combination of circumstances leading to actual harm is improbable. Most states of a system have the potential to
cause harm or lead to an accident given certain conditions. In the natural language usage, a
moving vehicle is in a hazardous state; however it is not possible to design any transportation
system where the vehicles remain stationary. In the system safety definition, a vehicle travelling
along an empty road in good weather, with good road surface conditions, etc. does not constitute a
hazard. A hazard would be, for example, that the engine produces more torque than the driver
demands. This may or may not lead to an accident depending on the ability of the driver to react to
the hazard, including the application of appropriate backup systems or other mitigating measures.
The choice of boundary for the system and for hazard identification is therefore very important. The boundary chosen needs to include the states over which the system designer has control, but there is no purpose in defining the boundary to include all conditions that could contribute to a particular accident, as many of these will be outside the control of the system designer.
The following meaning of “hazard” is assumed in the context of the EASIS dependability framework. It is based on consideration of a number of system safety standards, notably [3]. Note that this is not intended as a definition but as an explanation of what types of hazards are identified, classified and analysed in the EASIS dependability framework. Appendix A discusses this issue further.
A hazard is an undesirable condition or state of a vehicle that could lead to an
undesirable outcome depending on other factors that can influence the outcome.
This recognizes the following sequence of events that occurs in order for a fault in a system, or other event, to lead to an accident and ultimately to harm:
fault (or other initiating event) → hazard → accident → harm
When a particular system is being considered, we are generally only concerned with those hazards
that result from that system. Note, however, that for integrated safety systems it may also be
necessary to consider those hazards that result from emergent properties, that is, from interactions
between individual systems. An example of such an emergent property may be seen by
considering a traction control function and a cruise control function. The traction control function
detects that the drive wheels are slipping and reduces engine torque. A side-effect of this is that
the vehicle slows down, so the cruise control requests increased engine torque to compensate.
Unless the interaction of these systems is correctly defined (for example, by the engine
management system having a function to prioritise torque requests and cancel a conflicting
function) then the two functions could “fight” each other.
The objective of the Hazard Identification section is to explain what is meant by hazard identification, to describe the different options available when performing it, and to recommend an EASIS hazard identification process.
Hazard identification is the process of finding the vehicle-level states and conditions that could
contribute to the occurrence of undesirable outcomes and that are associated with the particular
system of concern. In this document, we are primarily concerned with hazards associated with
potential malfunctions, but hazards associated with fault-free operation of the system are also
addressed to some extent. Note that a clear separation between correct function and malfunction
can only be made if there is a complete functional specification available, which is usually not the
case. There might be hazards created by the correct function of the system of concern which
should be considered. Example: a correctly deploying airbag could seriously injure a child in a
reverse-facing child seat. This is clearly a hazard that should be identified and handled but as long
as it is not a function of the airbag system to detect this situation the hazard is not related to a
malfunction.
Without having identified the hazards of the system, it is impossible to analyse the associated risks
and subsequently address these during the system development.
The hazards that will be focused upon in this section are the ones related to Integrated Safety
Systems. A very simple model of such a system is shown in Figure B.2. The system monitors the
environment via inputs, it acts on the vehicle through an interface and it determines this action in
either one processing unit (electronic control unit, ECU) or distributed over several ECUs
interconnected via one or more communication networks.
[Figure B.2: Simple model of an Integrated Safety System. Sensors (S) provide input from the environment; one or more interconnected ECUs (ECU1, ECU2, ECU3) process this input; actuators (A) form the interface through which the system acts on the vehicle, producing the vehicle effect. Three numbered viewpoints for hazard identification are marked: (1) the effect on the vehicle, (2) the inside of the system under consideration, and (3) the interface between the system and the vehicle.]
1. Hazards can be identified by looking at the effect on the vehicle, i.e. at the undesirable vehicle-level behaviour that may be produced.
2. Hazards can be identified by looking at what might cause them, looking inside the system under consideration. For example, a sensor could be faulty or an ECU could make an inappropriate decision due to a hardware or software fault.
3. Hazards can be identified by looking at the interfaces between the system under consideration and the vehicle. For example, the system under consideration could create an undesirable mechanical torque via an actuator.
As input to the hazard identification, as much information as possible about the system should be
gathered. Once identified, the hazards may be analysed with respect to criticality (“classification of
hazards”) and occurrence (“hazard occurrence analysis”). This is illustrated in Figure B.3 which
shows the position of the hazard identification in the EASIS Dependability Activity Framework.
[Figure B.3: The position of hazard identification in the EASIS Dependability Activity Framework, feeding into classification of hazards and hazard occurrence analysis, alongside the development and design of the integrated safety system, the establishment of dependability-related requirements, verification and validation of these requirements, and Safety Case construction.]
A hazard can usually be expressed as a specific inability of the vehicle to behave as desired
and/or intended.
All relevant hazard characteristics should be included in the description of a specific hazard. Thus,
a hazard can often be broken down into several different hazards with different characteristics.
Examples of such characteristics are:
• The magnitude of the potential deviation from the desired behaviour (force, velocity,
etc)
• The duration of the hazard. (Some hazards are characterized by a specific duration
because of the way the system is, or may be, implemented)
• Information provided to the driver about the existence of the hazard. (For the airbag
example, there is obviously a large difference between "airbag inoperable and driver is
informed about this" and "airbag inoperable and driver is not informed about this".)
A description of a hazard does not necessarily have to be static throughout the development
project. In the beginning of the project it might be useful to formulate the hazard in very coarse
terms. In later stages when more detailed information about the system implementation is
available, the hazard may be more precisely defined, taking into consideration magnitudes and
duration of deviations as well as other characteristics of the hazard.
Some examples of hazard formulations are given below:
• Undemanded inflation of the airbag
• Inability to inflate the airbag
• Time-limited inability to inflate the airbag
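To illustrate how such a description may be refined, a hazard could be held in a record whose characteristic fields are filled in as more implementation detail becomes available. The structure below is a sketch of this idea using the airbag formulations above; the field names are our own, not a prescribed EASIS format.

```python
# Sketch of a hazard description that is refined over the project lifetime.
# Field names are illustrative, not a prescribed EASIS format.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Hazard:
    statement: str                          # the hazard formulation itself
    magnitude: Optional[str] = None         # deviation in force, velocity, etc.
    duration: Optional[str] = None          # e.g. time-limited vs. permanent
    driver_informed: Optional[bool] = None  # is the driver warned?

# Early in the project: coarse formulation only.
coarse = Hazard(statement="Inability to inflate the airbag")

# Later: the same hazard broken down by characteristics.
refined = [
    Hazard("Time-limited inability to inflate the airbag",
           duration="single ignition cycle", driver_informed=True),
    Hazard("Permanent inability to inflate the airbag",
           driver_informed=False),
]
```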
This section gives an overview of the various methods available for hazard identification.
Table B.1 shows the template used in this document for describing the hazard identification
methods. It is based on a template in [2].
Table B.1: Template used for describing the hazard identification methods
Development engineer perspective: Description of how the development engineer will use this method.
Organisational perspective: Description of how the organisation handles this method.
Preconditions: What will need to be available before this method can be used?
Applicability range: What types of error will it find (HW/SW/HMI/organisational)?
Life cycle stage: The earliest life cycle stage where the method is applicable.
Experience in automotive application: Has the method been used previously in automotive development?
Related methods: Alternate, overlapping or complementary methods.
Availability and tool support: Indicates the availability of tools and the commercial readiness of the method.
Maturity: The extent to which the method is ready for “prime time” and has proven itself useful in application.
Evaluation
B.2.2.2 Checklists
Checklists
References used: None
Alternate names: None
Primary objective: The goal of using checklists is to reduce the possibility that a known hazard, which is relevant for the system being considered, is missed when hazards associated with this system are identified.
Description: The basic tool for identifying hazards, and the one most easily used, is a checklist. This lists the most common hazards which should be taken into consideration when analysing a system in order to identify hazards. The checklists are developed by domain and dependability experts, and should contain all hazards that are known to be relevant for a vehicle, thus being potentially relevant for any automotive system. The hazards on the checklists could also be accompanied by a pre-decided hazard classification. (For further information about hazard classification see section B.3.)
Reference information
Development engineer perspective: The checklist helps the development engineer to reduce the possibility that common hazards are missed in the hazard identification. It is important to stress that the developer should not assume that all hazards have been identified just because the checklist has been run through and "boxes have been ticked". There might be hazards associated with the particular system that are not covered by the generic checklist.
Organisational perspective: The checklists can be constructed from several different sources. They could, for example, be constructed from earlier experiences of actual accidents/incidents, either from similar systems or from other types of systems. Other good sources are, of course, experts: both application experts (for the system at hand) and dependability experts (for all types of systems). For the development of the checklists it is important to connect the hazards to possible accident scenarios and to the constraints the system has to abide by.
Preconditions: Before the development engineer can start using the checklists for hazard identification, he/she needs to have the checklists ready (usually from someone else) and a good understanding of the system and the system boundary.
Applicability range: Checklists cover all types of hazard identification, depending on the area the actual checklist is defined for.
Life cycle stage: Checklists are primarily useful in early life cycle stages.
Experience in automotive application: Checklists are widespread in the automotive industry overall, but we do not know of any checklists containing "standard" hazards for automotive systems.
Related methods: None
Availability and tool support: Checklists are a very simple technique without a great need for advanced tools, but it is easy to imagine a web-based tool for administrating the checklists.
Maturity: Checklists are commonly used in the automotive industry. However, they may need updating to address hazards associated with advanced future systems.
Evaluation
Ease of integration: Checklists are very easy to integrate with most other methods.
Documentability: Checklists leave an easy trail of documents.
Advantages: Easy to use for the engineer. The possibility that a relevant hazard is forgotten is reduced. Checklists can be re-used and continuously improved.
Disadvantages: Will only help in finding known hazards. A particular system may have hazards that are hitherto unknown and therefore missing from the checklist. Thus, checklists should be used with some care.
We would like to point out that the tables above, of course, are not official hazard lists from the
EASIS project, but only simplified examples.
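As a sketch of how checklist-driven identification might be mechanized, consider the following; the checklist entries are invented examples in the same simplified spirit, not an official list.

```python
# Sketch of checklist-driven hazard identification. The entries are
# invented examples, not an official EASIS hazard list.
GENERIC_CHECKLIST = [
    "Undemanded acceleration",
    "Loss of braking",
    "Undemanded steering intervention",
    "Undemanded inflation of the airbag",
    "Inability to inflate the airbag",
]

def identify_hazards(is_relevant) -> list[str]:
    """Run through the checklist, applying engineering judgement per entry.

    Caution: ticking every box does not mean all hazards have been found;
    system-specific hazards may be missing from the generic list."""
    return [entry for entry in GENERIC_CHECKLIST if is_relevant(entry)]

# Example: for an airbag control system, only the airbag entries apply.
print(identify_hazards(lambda entry: "airbag" in entry))
```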
HAZOP
References used: [7], [10], [11]
Alternate names: HAZards and OPerability analysis
Primary objective: Identification of potential deviations from the expected operation of the system and identification of the hazards corresponding to these deviations.
Description: HAZOP assumes accidents are caused by deviations from the intended operation, such as no signal or wrong signal value in an electronic system. It is a qualitative technique based on the question: "What if [entity] [attribute] is [guideword]?", where [entity] is a label associated with an interconnection between components, [attribute] is a property of this entity and [guideword] is a word expressing a type of deviation (e.g. "no", "more", "less", "reverse").
Reference information
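To make the questioning scheme concrete, the sketch below enumerates "What if ...?" questions over entities, attributes and guidewords. The entities and attributes are invented, and the guideword set shown is a commonly used subset rather than anything mandated by the referenced sources.

```python
# Sketch of HAZOP question generation: one question per
# (entity, attribute, guideword) combination.
GUIDEWORDS = ["no", "more", "less", "reverse", "other than", "early", "late"]

ENTITIES = {
    "wheel speed signal": ["value", "timing"],
    "brake torque request": ["value", "timing"],
}

def hazop_questions():
    for entity, attributes in ENTITIES.items():
        for attribute in attributes:
            for guideword in GUIDEWORDS:
                yield f"What if {entity} {attribute} is '{guideword}'?"

for question in list(hazop_questions())[:3]:
    print(question)
```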
Functional Hazard Assessment (FHA)
Description: FHA comprises the following steps:
1. Identification of the potential failures of each function, for example:
• Loss of function (omission)
• Unintended function (commission)
• Incorrect operation of function (stuck, high, low, etc.)
2. Postulation of hazards based on the failures in these functions.
3. Determination of the effects of each failure. Whenever appropriate, effects are determined in combination with other contributing factors, e.g. environmental factors.
Organisational The organisation should have a defined process for how to
perspective carry out the FHA analysis.
Preconditions The functions of the system have to be defined before a
functional hazard assessment can be carried out.
Applicability range FHA will find hazards related to the functionality of the
system.
Life cycle stage The hazard identification part of FHA can be performed as
soon as the system functions have been defined even if
these definitions are coarse and conceptual.
Experience in automotive application FFA has been used at Volvo Car Corporation, as described in [9].
Related methods Both FMEA and HAZOP are methods that focus on other
aspects than the functionality and can thus be used in
parallel to FHA.
Availability and tool support We are not aware of any specific tool support for this method.
Maturity The technique has been used for a while in the aeronautic
industry, but is quite new as a tool in the automotive
industry.
Evaluation
Ease of integration The method is quite easy to understand and can be used in
parallel with other methods.
Documentability The tables resulting from the analysis provide a good
documentation.
Advantages FHA is a systematic and structured technique. It is also
relatively simple and straightforward to apply.
Disadvantages It is quite hard to find hazards that are caused by fault
combinations.
Figure B.4 shows an example state transition model with states and transitions describing the
behaviour of a hypothetical cruise control system. The following questions (and many more) may
be formulated during the hazard identification process:
• What happens if the system moves from the Stand By state to the Active state when the
"set +/-" condition is not fulfilled?
• What happens if the system does not go into the Stand By state from the Active state when
the driver does a "cancel" action?
The associated hazards can be described as:
• Spontaneous activation of the cruise control
• Not possible to deactivate the cruise control
[Figure B.4: State transition model of a hypothetical cruise control system, with states Off, Stand By, Active and Faulty, and transitions on, off, set +/-, cancel, error and repair.]
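Questioning of this kind lends itself to systematic generation. The following sketch (Python, with a hypothetical transition table approximating the Figure B.4 model) emits, for every transition, one question about unwanted occurrence and one about omission:

# Hypothetical encoding of the Figure B.4 model: (source, condition, target).
TRANSITIONS = [
    ("Off", "on", "Stand By"),
    ("Stand By", "set +/-", "Active"),
    ("Stand By", "off", "Off"),
    ("Active", "cancel", "Stand By"),
    ("Active", "off", "Off"),
    ("Faulty", "repair", "Off"),
]

def hazard_questions(transitions):
    """Generate 'what if' questions for unwanted and omitted transitions."""
    for src, cond, dst in transitions:
        yield (f"What happens if the system moves from the {src} state to the "
               f"{dst} state when the '{cond}' condition is not fulfilled?")
        yield (f"What happens if the system does not move from the {src} state "
               f"to the {dst} state when '{cond}' occurs?")

for question in hazard_questions(TRANSITIONS):
    print(question)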
B.2.2.6 FMEA
B.2.2.7 Identification of hazards not related to failures
As discussed in section A.3.2 of Appendix A, hazards may exist even when there is no fault, error
or failure involved. Such hazards may be identified by careful consideration of relevant questions,
including the following:
• Are there any inherent limitations in the employed technology that could make the system behave in an undesirable way in particular situations?
• Are there any potential scenarios in which the intended functionality of the system is
undesirable?
• Is there a possibility that the driver (or other person) may interact with the system in an
inappropriate manner due to a misunderstanding of its functionality?
• Is there a possibility that the driver (or other person) may have wrong expectations about the capabilities and limitations of the system?
• Is there a possibility that the driver (or other person) is distracted by the behaviour of the
system or by the information provided via the Human-Machine Interface?
Hazards of this type are in principle outside the scope of our work so here we only conclude that
such hazards have to be considered in addition to the failure-related hazards.
It should be noted that a fault may lead to more than one identified hazard. The resulting
combination can be considered as a separate hazard and should be included in the list of hazards.
The possibility of such combined hazards should be considered when the hazard identification is
carried out. This implies that an iteration is necessary between the Hazard Identification and the
Hazard Occurrence analysis, since the latter investigates the relation between causes and the
resulting hazards.
As a simple example, let us assume that the following hazards have been identified for a
Collision Avoidance System (CAS):
• H1: unnecessary reduction of engine torque
• H2: unnecessary activation of brakes
It is quite obvious that any fault that makes the CAS falsely believe that a collision is about
to occur would lead to a simultaneous occurrence of H1 and H2. Thus, it is reasonable to
assume that sensor faults and faults in the CAS software or hardware may lead to the
combination H1+H2. A third hazard should therefore be defined as follows:
• H3: unnecessary reduction of engine torque and simultaneous activation of
brakes
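As a minimal sketch of how such combinations can be derived mechanically, the following Python fragment (with hypothetical fault and hazard names) emits a combined hazard whenever a single fault maps to more than one identified hazard:

# Hypothetical mapping from identified faults to the hazards they may cause.
FAULT_TO_HAZARDS = {
    "false collision detection": {"H1: unnecessary torque reduction",
                                  "H2: unnecessary brake activation"},
    "brake actuator fault": {"H2: unnecessary brake activation"},
}

def combined_hazards(fault_to_hazards):
    """Return hazard combinations that share a common cause."""
    return {" + ".join(sorted(hazards))
            for hazards in fault_to_hazards.values()
            if len(hazards) > 1}

print(combined_hazards(FAULT_TO_HAZARDS))
# -> {'H1: unnecessary torque reduction + H2: unnecessary brake activation'}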
Some hazard identification techniques are less effective than others for identification of combined
hazards. For example, a HAZOP performed on the outputs of a system would typically not find
combined hazards since it considers one output at a time.
The most effective way to identify hazards appears to be to use several methods since they
complement each other. Checklists can be used for an easy start followed by HAZOP, FHA and
'Hazard Identification Based on State Transition Models' for the different aspects they bring into the
analysis. FMEAs are already a part of typical automotive development processes and it is a simple
task to transfer the "Effects" listed in the FMEA tables to the hazard list. Identification of hazards
that are not related to failures (see B.2.2.7) should also be made and the results incorporated in
the list of hazards.
In an early phase the hazards are typically described somewhat coarsely; they are then refined into more detail in later phases.
It should be ensured that people with different competences and viewpoints are involved in the
hazard identification.
Finally, it needs to be pointed out that hazards may be identified by other methods than those
listed above. For example, a vehicle test drive may reveal a hitherto unknown hazard. In fact, any
activity performed during the development of a system may have hazard identification as a “side-
effect”. It is essential that such hazards are communicated and included in the list of hazards.
Here, we only conclude that a hazard reporting scheme should exist so that hazards detected
during development are properly identified and documented.
B.3 Hazard classification
In Section B.3.1, a number of existing approaches to hazard classification are analysed. A novel
approach is presented in Section B.3.2.
B.3.1 Existing approaches
A number of existing approaches to hazard classification have been examined and are
summarized in the following subsections.
B.3.1.1 IEC 61508 risk graph
IEC 61508 presents a risk graph approach in Part 5, “Examples of methods for the
determination of safety integrity levels” [5]. Note that this is an informative part of the standard
rather than one of the normative parts. Furthermore, it must be emphasized that the risk graph
presented in the standard is an example. The parameters used in the risk graph and their
weightings need to be developed for each sector and/or application, and should be defined in
sector specific standards. It should also be noted that risk graphs are typically applied for low-
demand (protection) systems where a specific safety-related system is added to an existing
process or system to act as a risk-reduction measure1.
The risk graph is based on the following equation:
R = f(f, C)
where:
R is the risk associated with the hazardous event with no protection measures in place;
f is the frequency of the hazardous event with no protection measures in place;
C is the consequence of the hazardous event (e.g. could be related to injury or to
environmental harm)
f represents a generalized (but not specified) function that combines the parameters. In the
most general sense, this could be by a quantitative means (in which case the combination
is by multiplication), or by a qualitative means (such as is used in a risk graph scheme).
The frequency f of the hazardous event is considered to be made up of three influencing factors:
• The frequency of and duration of exposure in the hazardous zone;
• The possibility of avoiding the hazardous event;
• The probability of the hazardous event taking place (without any protective measures),
which is called the probability of the unwanted occurrence.
This leads to the following parameters in a general risk graph scheme:
C consequence of the hazardous event
F frequency of, and exposure time in, the hazardous zone
P possibility of avoiding the hazardous event
W probability of the unwanted occurrence
1The model of hazards and risks used in IEC 61508 is different from the automotive model, as discussed in
Appendix A. We present the IEC formulation of a risk graph as an example of the type of approach to
hazard classification that can be used.
It may also be necessary to develop additional or alternative risk parameters depending on the
application sector and the technologies in use.
These parameters are combined using a graph (note that “graph” is used in the mathematical
sense).
[Figure B.5: Generic risk graph. Starting from the consequence parameter (CA to CD), branches for F (FA/FB) and P (PA/PB) lead to one of six outcome rows; the columns W3, W2 and W1 then give the outcome: (a, -, -), (1, a, -), (2, 1, a), (3, 2, 1), (4, 3, 2) or (b, 4, 3).]
Figure B.5 shows the basic structure for a generic risk graph. Each of the parameters C, F and P
is evaluated in turn leading to the requirements shown under the column “W3”. This column shows
the required risk reduction assuming that all of the risk reduction is to be achieved by E/E/PE
(electrical, electronic or programmable electronic) systems. If there are other non-technology
based means of risk reduction (i.e. external risk reduction facilities) then it can be argued that a
different rating for the W parameter is used, which reduces the risk reduction required for the
E/E/PE systems. The required risk reduction is indicated by the legend in the appropriate box as
follows:
- no safety requirements
a no special safety requirements
1–4 SIL required for the E/E/PE system
b a single E/E/PE system is not sufficient to achieve the required risk reduction
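For illustration only, such a risk graph can be encoded as a lookup from the path taken through the C, F and P parameters to one of the outcome rows of Figure B.5, with the W rating selecting the column. The path-to-row mapping below is hypothetical; in a real application it must be read from the risk graph calibrated for the sector:

# Outcome rows of the generic risk graph (columns W3, W2, W1), top to bottom.
ROWS = [
    {"W3": "a", "W2": "-", "W1": "-"},
    {"W3": "1", "W2": "a", "W1": "-"},
    {"W3": "2", "W2": "1", "W1": "a"},
    {"W3": "3", "W2": "2", "W1": "1"},
    {"W3": "4", "W2": "3", "W1": "2"},
    {"W3": "b", "W2": "4", "W1": "3"},
]

# Hypothetical (C, F, P) paths to row indices -- for illustration only; the
# real paths must be taken from the risk graph calibrated for the application.
PATH_TO_ROW = {
    ("CA", None, None): 0,
    ("CB", "FA", "PA"): 1, ("CB", "FA", "PB"): 2,
    ("CB", "FB", "PA"): 2, ("CB", "FB", "PB"): 2,
    ("CC", "FA", "PA"): 2, ("CC", "FA", "PB"): 3,
    ("CC", "FB", "PA"): 3, ("CC", "FB", "PB"): 3,
    ("CD", "FA", "PA"): 3, ("CD", "FA", "PB"): 4,
    ("CD", "FB", "PA"): 4, ("CD", "FB", "PB"): 5,
}

def risk_graph(c, f, p, w):
    """Follow the graph and return '-', 'a', a SIL (1-4) or 'b'."""
    return ROWS[PATH_TO_ROW[(c, f, p)]][w]

print(risk_graph("CC", "FB", "PB", "W2"))  # -> '2'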
Certain aspects of the risk graph approach as presented in IEC 61508 can be difficult to apply
directly to automotive systems, where safety is an inherent part of the functionality of the system
and not achieved through a separate system or function. For example the W parameter is difficult
to interpret in the automotive sector and different approaches have therefore been used as
described in the sections below about the MISRA Risk Graph and ISO WD 26262 Risk Graph
approaches.
The parameters in the risk graph have to be developed according to the application. For example,
a scheme defining classes for each of the C, F, P and W parameters might be used.
Please note that such classifications are for illustration only. For examples of how the risk graph
has been interpreted in practice, see the discussion below of the MISRA Risk Graph and ISO WD
26262 Risk Graph approaches.
It is possible to assign numerical ranges to some of the parameters. In this case, the risk graph is
now a semi-qualitative approach and can be referred to as a “calibrated risk graph”. This approach
is described further in the sector-specific development of IEC 61508 for safety instrumented
systems in the process industry sector (see Part 3 of IEC 61511 [6]).
A “weighting” can be applied to one or more of the parameters. For example, it might be decided
that for the highest consequence parameter CD this has a far greater weight than F or P; and for
the next highest consequence parameter CC that this and F have a far greater weight than P. In
this case the risk graph would be modified as shown in Figure B.6.
[Figure B.6: Risk graph of Figure B.5 with weighting applied: for consequence CC the P branch is dropped (only F is evaluated), and for CD the graph leads directly to the bottom row (b, 4, 3).]
B.3.1.2 Controllability
The “Controllability” approach to hazard classification was developed for road transport
applications. It was first introduced by the EC-funded project DRIVE Safely [8]. It was then
adopted and enhanced by the UK Government supported project MISRA [13]. Since then it has
been used to assess the risks associated with a variety of novel in-vehicle and roadside systems.
It has also been recommended for use in assessing the safety properties of integrated traffic
control systems [16]. A summary of the approach is presented here, and fuller details along with
examples can be found in the MISRA Technical Report Controllability [14] or the MISRA Safety
Analysis Guidelines.
In general, it has been found that the Controllability approach is best suited to hazards
characterized by moving vehicle scenarios. Hazards that are not associated with motion of the
vehicle may be better addressed by other techniques such as a Risk Graph. An example would be
an anti-trap function on a window lift system – this is an example of a classical protection system
as envisaged by IEC 61508 and it is difficult to interpret some of the Controllability concepts when
considering its hazards.
The term “Controllability” refers to the ability of the driver, another vehicle occupant, or another
person interacting with the system to control the safety of the situation following a failure. This
approach recognizes that in road transport scenarios, a failure in a system does not necessarily
lead to an accident. Depending on other factors, which are discussed more fully in the
Controllability report, the driver or other operator of the system may be able to react to the failure
situation and prevent an accident from occurring. This chain of events may be represented
diagrammatically as shown below.
Failure —may lead to→ Loss of control —may lead to→ Accident
Each hazard is classified by assigning it one of five Controllability categories. The controllability
categories are defined in the following table. Note that the first column is a short descriptor for the
Controllability category; each category is fully defined only by the text in the second column.
To arrive at a Controllability category for a hazard, the following four parameters are considered
and graded:
1. Level of system inter-dependency (I)
2. Loss of authority or control due to the hazard (A)
3. Provision of backup or mitigation (B)
4. Reaction time (T)
The first two parameters are concerned with the importance of the system under discussion, and
the amount of authority or influence that it has to affect, or maintain, the safety of the situation.
The second two issues are concerned with what the user(s), normally the driver(s), can do to
maintain control of the safety of the situation in the event that the hazard under consideration
occurs.
Initially these parameters are considered separately and a grade from A (worst case, high value) to
E (best case, low value) is assigned to each one. Note that each parameter may take any grade
out of A, B, C, D or E. In the sections below that describe the parameters, the descriptions given
against grades A, C and E show the range over which the parameters are graded. See the section
on benchmarking of hazard classification for an example which explains this.
Once grades for the four parameters have been obtained, then the final consolidated grade is
considered. In general, the highest of the grades is taken. Note that a simple average of the
grades is not taken since the grades do not have the same dimensions; they actually define a
point in the four-dimensional space of the parameters. Once a possible final grade has been
chosen, the full definition of the corresponding Controllability category (Grade A implies
“Uncontrollable”, Grade B implies “Difficult to Control” etc.) is studied to confirm that it does indeed
reflect the controllability of the safety of the situation that results from the hazard under
consideration. The final grade for each hazard should be chosen carefully and reasonably, since
the highest final grade will be used to define the safety integrity requirements of the system, and
the risks created by this system must be demonstrated to be broadly acceptable or tolerable and
reduced as low as reasonably practicable (ALARP).
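A minimal sketch of this consolidation rule, assuming the grades are recorded as the letters A (worst) to E (best); the short category descriptors below are indicative only, since the full textual definitions remain normative:

# Short descriptors only; the full textual category definitions are normative.
CATEGORY_NAMES = {
    "A": "Uncontrollable",
    "B": "Difficult to control",
    "C": "Debilitating",
    "D": "Distracting",
    "E": "Nuisance only",
}

def consolidate(i, a, b, t):
    """Propose a Controllability category from the I, A, B and T grades.

    'A' is the worst case, so the alphabetically lowest letter is the
    highest grade. The result is a proposal that the analyst must confirm
    against the full textual definition of the category.
    """
    grade = min(i, a, b, t)
    return grade, CATEGORY_NAMES[grade]

print(consolidate("E", "B", "B", "C"))  # -> ('B', 'Difficult to control')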
It is usual practice to record the grading allocated to the four parameters that are used to assign the Controllability category, along with any notes or observations; examples of such records are given in the benchmarking section (B.3.3).
B.3.1.2.1 Parameters
B.3.1.2.1.1 Level of system inter-dependency
The parameter “level of system interdependency” is concerned with system integration. It relates to
the degree to which other systems are relying on the correct functioning of this system for their
own correct functioning, e.g. when this system provides data for use by other systems in this
vehicle. It should be a functional dependency, not just the existence of a communications link
(which is a design issue that will be assessed later during detailed safety analyses). The concern
is not whether this system is part of a tightly integrated application, which would warrant a safety
analysis in its own right, but whether this system is providing data for other, and distinct, systems
that will modify their functionality according to the value of that data.
The grades for this parameter are allocated as follows:
A. Full functional inter-dependency
B. ↑
C. Partial functional dependency
D. ↑
E. Autonomous system or function
B.3.1.2.1.2 Loss of authority or control due to the hazard
The parameter “loss of authority or control due to the hazard” relates to the system under
investigation, which is inside the system boundary. Each hazard will reduce the authority of this
system and/or the ability of the user(s), or vehicle occupant(s), to maintain a safe situation.
The grades for this parameter are allocated as follows:
A. Full authority/control lost
B. ↑
C. Partial authority/control lost
D. ↑
E. No effect
B.3.1.2.1.3 Provision of backup or mitigation
The parameter “provision of backup or mitigation” relates to any other functions outside the
system boundary that are being, or may be, used to control the safety of the situation following a
failure.
B.3.1.2.1.4 Reaction time
The parameter “reaction time” refers to the speed with which the user(s), normally the driver(s),
must be able to apply the backup functions outside the system boundary in order that they will
succeed in creating a safe state. The grades for this parameter are allocated as follows:
A. Much faster than humanly possible
B. ↑
C. Similar to human
D. ↑
E. Similar to normal traffic situation
Note that:
• “Similar to normal traffic situation” in this context refers to the speed of reaction necessary to
maintain normal safe traffic conditions, i.e. the road user, or the driver, does not have to
perform any extra or different tasks to maintain the safety of the situation.
• “Similar to human” refers to a speed of reaction necessary in an emergency situation, but
within the capabilities of most road users. This is a scenario where road users, or drivers, have
to perform one or more tasks that they did not expect to have to do until the hazard under
consideration occurred.
• “Much faster than humanly possible” refers to scenarios, such as platooning, in which vehicles
are legitimately not under the immediate control of their drivers.
B.3.1.3 The MISRA Risk Graph
The MISRA Safety Analysis Guidelines [19] incorporate a Risk Graph approach to hazard
classification. The MISRA Risk Graph maintains the Controllability scheme that has been proven
over several years of use for moving vehicle hazards, while incorporating an additional scheme
that is more suited to the non-moving vehicle and protection system hazards. The MISRA Risk
Graph has also been designed to deal with possible future systems whose control authority is
greater than a single vehicle, for example, co-operative driving systems and infrastructure-based
systems.
The MISRA Risk Graph caters for three types of hazards:
• Hazards associated with (loss of control of) a moving vehicle – these are referred to as “moving
vehicle hazards” or “hazards associated with the control of a moving vehicle”
• Hazards that are not associated with loss of control of a moving vehicle – these are referred to
as “non-moving vehicle hazards” or “hazards not associated with the control of a moving
vehicle”
• Hazards of protection systems (in the classical sense of IEC 61508) – this is a subset of the
non-moving-vehicle case.
The MISRA Risk Graph approach considers three input parameters:
• The potential severity of the outcome of the hazard
• The frequency of exposure to the hazard
• The possibility to avoid the hazard.
The output parameter is a hazard classification which is called the “hazard risk”. This is the risk
associated with the hazardous event given that the hazard has occurred.
The MISRA Risk Graph may therefore be expressed in the following way:
Risk = f(S, F, P) – Used for hazards not associated with control of a moving vehicle, or for hazards
that will be mitigated by a classical protection system.
Risk = f(S, F, C) – Used for hazards that are associated with control of a moving vehicle
where:
S represents the potential severity of the consequences of the hazard
F represents the frequency of exposure to the hazard
P represents the probability of failing to avoid the hazardous event
C represents “controllability”, which is an estimate of the degree of control over the safety of the
situation following a hazard
f represents a general function that combines the parameters.
Note that the purpose of the C and P parameters is the same; however, experience has shown that
an assessment of controllability is naturally suited to those hazards characterized by the moving
vehicle whereas this is not the case for other types of hazard.
For the moving vehicle case F is currently assumed to be constant but is shown in the formula for
completeness.
The MISRA Risk Graph is shown in the diagram below:
Hazards not associated with the control of a moving vehicle (parameters S, F, P):
                    P1    P2
S1 (F1 or F2)       R1    R2
S2, F1              R2    R3
S2, F2              R3    R4
Hazards associated with the control of a moving vehicle (parameters S, C):
       C0    C1    C2    C3    C4
S2     NR    R1    R2    R3    R4
S3     NR    R2    R3    R4    R5
NR = No Risk
For each hazard, the risk graph is followed systematically from left to right in order to arrive at a
risk value R for that hazard. R is the Hazard Risk and represents the risk associated with a
hazardous event, given an occurrence of the hazard. The R values are therefore hazard
classifications.
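As a sketch, and assuming the table values recovered in the risk graph above, the lookup can be expressed as follows (the function name hazard_risk is illustrative):

# Hazard risk for hazards not associated with control of a moving vehicle,
# indexed by (S, F, P).
NON_MOVING = {
    ("S1", "F1", "P1"): "R1", ("S1", "F1", "P2"): "R2",
    ("S1", "F2", "P1"): "R1", ("S1", "F2", "P2"): "R2",
    ("S2", "F1", "P1"): "R2", ("S2", "F1", "P2"): "R3",
    ("S2", "F2", "P1"): "R3", ("S2", "F2", "P2"): "R4",
}

# Hazard risk for moving vehicle hazards, indexed by (S, C).
MOVING = {
    ("S2", "C0"): "NR", ("S2", "C1"): "R1", ("S2", "C2"): "R2",
    ("S2", "C3"): "R3", ("S2", "C4"): "R4",
    ("S3", "C0"): "NR", ("S3", "C1"): "R2", ("S3", "C2"): "R3",
    ("S3", "C3"): "R4", ("S3", "C4"): "R5",
}

def hazard_risk(s, f=None, p=None, c=None):
    """Return the hazard risk R for a hazard classified with the MISRA graph."""
    if c is not None:              # moving vehicle hazard: Risk = f(S, F, C)
        return MOVING[(s, c)]
    return NON_MOVING[(s, f, p)]   # non-moving/protection: Risk = f(S, F, P)

print(hazard_risk("S2", f="F1", p="P2"))  # -> 'R3'
print(hazard_risk("S2", c="C4"))          # -> 'R4'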
In the MISRA Safety Analysis Guidelines approach, these hazard classifications are used
subsequently to set safety requirements (including random and systematic safety integrity
requirements) such that the probability of occurrence of hazards is less than or equal to the
broadly acceptable risk. Note that this approach is subtly different from the concept of risk
reduction encountered in IEC 61508 but very similar to the use of ASILs in ISO WD 26262 [3][11].
Usually the highest R value of all the hazards of the system leads to the safety integrity requirements through a defined mapping from R values to integrity levels.
Strictly speaking this aspect of the use of the MISRA Risk Graph is not part of hazard classification
but is provided in this section for completeness.
This has been done since some of the names (particularly “Debilitating” and “Distracting”) have
proven to be confusing when translated into languages other than English.
The Controllability classification still has to be performed by considering the four intermediate
parameters (I, A, B, T) – see section B.3.1.2.1.
The textual definitions remain the normative description of what each Controllability category
describes, although the severity descriptors have been removed since this is now covered by the
first part of the risk graph (e.g. “C4” is defined as “This relates to failures whose effects are not
controllable by the vehicle occupant(s). The outcome cannot be influenced by a human
response.“).
B.3.1.4 ISO WD 26262
Work is presently ongoing to define an ISO standard (Working Draft ISO WD 26262) for “Functional
Safety” [11][3]. The hazard analysis and risk assessment in WD 26262 is based on the concept of
ASIL (Automotive Safety Integrity Level). The ASIL appears to represent the degree to which an
individual failure mode of the considered system needs to be avoided. Thus, the derivation of the
ASIL classification is essentially a hazard classification activity. There are four ASIL levels: A, B, C
and D, with ASIL D representing the most critical hazards. In fact there is one more level outside
the ASIL range that is called “QM” which means that standard Quality Management techniques are
considered to be sufficient.
The Figure below illustrates the reasoning behind the ASIL classification. The risk associated with
a specific hazardous event depends on the frequency of the event and the severity (S) of the
resulting harm. The frequency depends on the occurrence rate (represented by ASIL) of the failure
mode considered, the exposure (E) to situations in which the hazardous event could occur and the
controllability (C) i.e. the degree to which humans can avoid the hazardous event.
In the classification scheme, the Exposure and Controllability parameters may take the following
values.
Exposure Controllability
E4: 1 C3: 1
E3: 0.1 C2: 0.1
E2: 0.01 C1: 0.01
E1: 0.001
The ASIL classification is then determined by the following table, with the MAIS ("Maximum
Abbreviated Injury Scale") numbers representing the accident severity in terms of the resulting
injury associated with an accident.
Probability E*C
Severity              1     0.1   0.01   0.001   0.0001
S0: No injuries       QM    QM    QM     QM      QM
S1: MAIS 1-2          B     A     QM     QM      QM
S2: MAIS 3-4          C     B     A      QM      QM
S3: MAIS 5-6          D     C     B      A       QM
As the Exposure and Controllability are discretised into levels with a factor of ten between the
levels, it is reasonable to assume that the levels really should be interpreted as:
E4: 0.1-1 C3: 0.1-1
E3: 0.01-0.1 C2: 0.01-0.1
E2: 0.001-0.01 C1: 0.001-0.01
E1: 0.0001-0.001
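A sketch of the resulting table lookup, using integer powers of ten for the discretised E and C values so that the E*C column is selected exactly:

# Discretised parameter values from the classification scheme, as powers of ten.
E_EXP = {"E1": -3, "E2": -2, "E3": -1, "E4": 0}
C_EXP = {"C1": -2, "C2": -1, "C3": 0}
# ASIL per severity row; columns correspond to E*C = 1, 0.1, 0.01, 0.001, 0.0001.
TABLE = {
    "S0": ["QM", "QM", "QM", "QM", "QM"],
    "S1": ["B",  "A",  "QM", "QM", "QM"],
    "S2": ["C",  "B",  "A",  "QM", "QM"],
    "S3": ["D",  "C",  "B",  "A",  "QM"],
}

def asil(severity, exposure, controllability):
    """Look up the ASIL; integer exponents avoid floating point issues."""
    column = -(E_EXP[exposure] + C_EXP[controllability])  # -log10(E*C)
    column = min(column, len(TABLE[severity]) - 1)        # E1*C1 falls off the table: QM
    return TABLE[severity][column]

print(asil("S2", "E3", "C3"))  # -> 'B' (first case in the example below)
print(asil("S2", "E4", "C3"))  # -> 'C' (second case)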
The following observations can be made based on this rather limited information about the ASIL
approach.
• Application of the ASIL approach can create paradoxical results, as illustrated by the
following hypothetical example:
Consider a transient failure mode that in almost 10% of all driving situations is
guaranteed to result in an accident of severity S2. Thus, we assume that this failure
mode is completely uncontrollable (C=C3) in such driving situations. The ASIL
classification is S2*E3*C3 = ASIL B.
Let us now consider another transient failure mode that in 11% of all driving
situations could result in an accident of the same severity S2. We assume that the
driver has slightly less than 90% chance of averting danger, meaning that just over
10% of all drivers are not able to control the damage (C=C3). The ASIL
classification is S2*E4*C3 = ASIL C.
Thus, the first case represents around 10% conditional probability of an accident,
given an occurrence of the hazard. The second case represents around 1.1%
conditional probability of an accident of the same severity, given an occurrence of
the hazard. The first case is therefore almost ten times worse than the second. Still, the
second case has a higher ASIL ranking and this is clearly paradoxical.
(Of course, estimation of precise values such as 10%, 11% and 90% above cannot
be made in reality, but the example still illustrates the principle of the paradox; see
also the numerical sketch after this list.)
• It is not clear how to handle the case when the same failure mode can lead to different severities, with different probability distributions over the severities in different driving situations. These driving situations could also have different controllability classes, which further complicates the issue.
• An accident that results in S2 effects is not much different from an accident that results in
S3 effects. With respect to the electronic system considered, the differentiation between S2
and S3 does not seem necessary.
• The important characteristic of a failure mode is the conditional likelihood that it will lead to
a particular Severity level given an occurrence of the failure mode. It is not clear if the E
and C factors together provide a complete picture of this conditional probability.
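The arithmetic behind the paradox in the first observation can be checked in a few lines (the percentages are the hypothetical ones used above):

# Case 1: hazard relevant in almost 10% of driving situations (E3) and then
# uncontrollable (C3, taken here as probability 1.0).
p1 = 0.10 * 1.0     # ~10% conditional accident probability -> ASIL B
# Case 2: hazard relevant in ~11% of situations (E4); just over 10% of
# drivers cannot avert danger (C3, taken here as probability 0.10).
p2 = 0.11 * 0.10    # ~1.1% conditional accident probability -> ASIL C
print(p1 / p2)      # ~9.1: the ASIL B case is almost ten times worse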
B.3.1.5 MIL-STD-882D
Mishap risk assessment values (MIL-STD-882D):
                Catastrophic   Critical   Marginal   Negligible
Frequent              1            3          7          13
Probable              2            5          9          16
Occasional            4            6         11          18
Remote                8           10         14          19
Improbable           12           15         17          20
Mishap risk assessment values can then be used to group individual hazards into “mishap risk
categories”. Mishap risk categories are then used to generate specific action such as mandatory
reporting of certain hazards to management for action or formal acceptance of the associated
mishap risk. In the table below, an example listing of mishap risk categories and the associated
assessment values is given. In this example, the system management has determined that mishap
risk assessment values 1 through 5 constitute “High” risk while values 6 through 9 constitute
“Serious” risk.
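A sketch of the matrix and category boundaries as a lookup (the values are those quoted above; the boundary below “Serious” is left open, as the standard leaves it to system management):

# Mishap risk assessment values from MIL-STD-882D (probability x severity).
MISHAP_RISK = {
    "Frequent":   {"Catastrophic": 1,  "Critical": 3,  "Marginal": 7,  "Negligible": 13},
    "Probable":   {"Catastrophic": 2,  "Critical": 5,  "Marginal": 9,  "Negligible": 16},
    "Occasional": {"Catastrophic": 4,  "Critical": 6,  "Marginal": 11, "Negligible": 18},
    "Remote":     {"Catastrophic": 8,  "Critical": 10, "Marginal": 14, "Negligible": 19},
    "Improbable": {"Catastrophic": 12, "Critical": 15, "Marginal": 17, "Negligible": 20},
}

def mishap_risk_category(probability, severity):
    """Map a mishap risk value to the example categories quoted above."""
    value = MISHAP_RISK[probability][severity]
    if value <= 5:
        return "High"
    if value <= 9:
        return "Serious"
    return "Medium/Low"  # boundary to be set by system management

print(mishap_risk_category("Occasional", "Critical"))  # value 6 -> 'Serious'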
Higher risk categories indicate greater need for mishap risk reduction.
The standard gives suggested classifications for severity, which are reproduced here. Again these
need to be interpreted for the system under consideration; for example, the standard says that “…
The dollar values shown in this table should be established on a system by system basis
depending on the size of the system being considered to reflect the level of concern.”
The standard gives the following suggested mishap probability levels. It notes that the definitions
of descriptive words may have to be modified based on the quantity of items involved; and that the
expected size of the fleet or inventory should be defined prior to undertaking an assessment of the
system.
It should be noted that the approach described above evaluates the mishap risk, not the hazard
risk. The mishap risk is a combination of the severity of the mishap with the probability of the
mishap. The probability of a mishap is a combination of the probability of occurrence of the hazard
with the conditional probability of a mishap given that a hazard occurs. Thus comparing this to the
model used in EASIS it is observed that:
• The mishap severity classes are relevant to this Appendix
• The mishap probability levels combine the probability of occurrence of the hazard (relevant
to Appendix D) and the conditional probability of a mishap (relevant to this Appendix)
For hazard classification, therefore, the MIL-STD-882D approach could be difficult to apply directly
in the automotive applications.
The Society of Automotive Engineers (SAE) has published a Recommended Practice J1739 [18]
for FMEA (Failure Mode and Effects Analysis). With respect to hazard classification, the Severity
scale in J1739 is of some interest. This scale is given below:
• 10: Hazardous without warning
Very high severity ranking when a potential failure mode affects safe vehicle operation
and/or involves noncompliance with government regulation without warning.
• 9: Hazardous with warning
Very high severity ranking when a potential failure mode affects safe vehicle operation
and/or involves noncompliance with government regulation with warning.
• 8: Very High
Vehicle/item inoperable (loss of primary function).
• 7: High
Vehicle/Item operable, but at a reduced level of performance. Customer very
dissatisfied.
• 6: Moderate
Vehicle/Item operable, but Comfort/Convenience item(s) inoperable. Customer
dissatisfied.
• 5: Low
Vehicle/Item operable, but Comfort/Convenience item(s) operable at a reduced level of
performance. Customer somewhat dissatisfied.
• 4: Very Low
Fit and Finish/Squeak and Rattle item does not conform. Defect noticed by most
customers (greater than 75%).
• 3: Minor
Fit and Finish/Squeak and Rattle item does not conform. Defect noticed by 50% of
customers.
• 2: Very Minor
Fit and Finish/Squeak and Rattle item does not conform. Defect noticed by
discriminating customers (less than 25%).
• 1: None
No discernible effect.
Some comments and observations concerning this scale are given below.
• Only two of the levels are related to safety issues: 9 and 10. The distinction between
them is solely determined by whether the driver is informed about the failure or not.
Thus, the scale appears to be too coarse to be of much use for hazard classification.
• According to the scale, a very dangerous failure mode with warning is assigned a lower
severity than a slightly dangerous failure mode without warning. It is easy to think of
examples where this classification differs from the intuitive understanding of severity.
• The failure modes that have a severity classification of 9 or 10 should be analysed in
more depth, using one of the more sophisticated approaches described in this
document such as the ISO WD 26262 approach, the MISRA risk graph or the
alternative hazard classification approach.
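In line with the last observation, the transfer of safety-relevant FMEA rows into the hazard list can be automated by flagging severities 9 and 10 (a sketch with hypothetical FMEA rows):

# Hypothetical FMEA rows: (failure mode, effect, J1739 severity 1-10).
FMEA_ROWS = [
    ("Brake pressure sensor stuck high", "Unwanted full braking", 10),
    ("Telltale LED open circuit", "Warning lamp inoperative", 4),
]

def safety_relevant(rows, threshold=9):
    """Select effects that require a proper hazard classification."""
    return [row for row in rows if row[2] >= threshold]

for mode, effect, severity in safety_relevant(FMEA_ROWS):
    print(f"{effect} (severity {severity}) -> add to hazard list")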
B.3.2 A novel hazard classification approach
In this section, a new hazard classification approach developed in the EASIS project is described.
First, however, the background underlying the approach is explained.
B.3.2.1 Background
The following observations regarding the relationship between hazards and their consequences
can be made:
• The relationship between hazards and their consequences is typically much more
complex than 'a hazard causes a consequence', particularly in the extremely complex
automotive environment. Typically, a hazard contributes to the occurrence of an
outcome and this contribution may range from very weak to very strong depending on
the characteristics of the hazard.
• When a particular hazard occurs, several different outcomes are possible. It is not
necessarily the outcome with the worst severity that is the most important one as it may
be relevant only in extremely rare driving scenarios. Other less severe outcomes may
be much more likely, possibly to the point that consideration of these outcomes
determines the criticality of the hazard.
Concerning the conditional probability of each potential outcome, vague expressions such as
”likely”, ”unlikely”, ”extremely unlikely” and ”almost impossible” should be avoided in the hazard
classification unless these are defined quantitatively. Qualitative expressions are only meaningful if
there is a reference that they can be related to. For example, consider the following statements:
• ”John is very short”
• ”Peter's apartment is quite big”
Within a society with a common comprehension of human dimensions and apartment sizes, these
statements provide meaningful information about John's stature and Peter's apartment. In contrast,
consider the following statement:
• If hazard H5 occurs, it is very unlikely that it will result in a fatal outcome
This statement does not convey much information. ”Very unlikely” could mean any probability from
perhaps 10^-7 to 10^-2. The reason for this vagueness is of course that there is no typical probability
that can be used as a reference. Note that a quantitative definition such as:
”very unlikely” is defined as a probability between 10^-4 and 10^-3
would mean that ”very unlikely” is a (discretised) quantitative measure rather than a qualitative
measure.
In some cases, it needs to be decided whether the hazard should be judged against a reference
situation in which the system operates as intended or against a reference situation in which the
vehicle is not equipped with the system at all. These two cases are clarified by the following
examples:
o The criticality of the hazard ”front airbag inoperable” may be determined based on a
comparison with ”front airbag works as intended”
o The criticality of the hazard “front airbag inoperable” may be determined based on a
comparison with “vehicle is not equipped with a front airbag”
The selection of whether to use “works as intended” or “not equipped” as the reference would
typically be determined based on whether the system can be considered as standard equipment or
not. This means that the choice is time-dependent. Systems and functions that were once
considered optional tend to become standard equipment as the years go by.
When a hazard occurs in real life, the outcome will be determined by the following factors, some of
which may be more or less relevant depending on the nature of the actual hazard:
• The particular hazard, in terms of how it affects the vehicle
• Speed of the vehicle
• Position and speeds of surrounding vehicles and other objects
• The state of the vehicle (for example cruise control on/off, actual gear)
• Road characteristics (crossing, highway, city street, curve, uphill/downhill, wide/narrow,
fences, etc)
• The road surface conditions (wet/dry, gravel, asphalt, snow, etc)
• Visibility conditions (day, night, fog, sun glare, etc)
• The skill and mental conditions of the drivers involved (experienced/novice, tired/alert,
alcohol influence, etc)
• Other
Defining a classification methodology that perfectly accounts for all these factors is simply not
possible, so the classification methodology has to be based on a simplified view of the relationship
between hazards and outcomes.
It should be noted that legal requirements and company policy may influence the classification of a
particular hazard. If two hazards are equivalent with respect to the conditional probabilities of their
potential outcomes, they may still need to be classified differently due to legal requirements or
company policy. We do not believe that it is possible to define a formalized classification scheme
that covers all characteristics of hazards perfectly. Thus, the classification approach should allow
hazards to be assigned a criticality level different from the one resulting from any formal
classification method.
Two characteristics of a hazard that are often important to the hazard classification are the hazard
duration and whether or not the driver is notified about the existence of the hazard.
Hazard duration
The following examples show why the duration of the hazard is important in the hazard
classification.
• The hazard “front airbag inoperable during a time less than one second” can
only lead to an effect if there is a simultaneous front-end collision. Thus, this
hazard is not particularly critical. (In this example we assume that there is no
causal relationship between the airbag inoperability and the collision.)
• The hazard “front airbag permanently inoperable” can only lead to an effect if
there is a collision. It is obviously a more critical hazard than the one-second
example above, but still not extremely critical as collisions are rare events.
• The hazard “full engine torque produced during 100 ms without a good reason”
is not particularly critical since the speed increase will be very small in this short
time.
However, a short-duration hazard does not necessarily have a low criticality. Consider the
hazard “front airbag not able to remain passive” which is equivalent to “front airbag is
activated (inflated) when not needed”. This hazard has an extremely short duration but is
still quite critical since it will lead to a safety-related effect whenever it occurs. (Here, we
ignore the extremely unlikely scenario that a fault in the system causes a triggering of the
airbag and a front-end collision happens to occur simultaneously.)
Driver notification
Another important characteristic of a hazard is whether or not the driver is informed about
the existence of the hazard. This distinction is important for hazards that have a long
duration and that may lead to effects a long time after the beginning of this duration.
Typically, such hazards are present permanently until a repair action is carried out in a
service station. An example is the hazard “front airbag permanently inoperable”. If the driver
(or possibly a service station) is informed about the existence of the hazard, the necessary
repair action could be carried out. Without any such information about the lack of airbag
functionality to the driver, this hazard may remain undetected until a front-end collision
occurs. Thus, the driver notification will have an impact on the Exposure parameter. To
summarize this discussion, we conclude that hazards caused by detectable (and therefore
reportable) errors in the system are usually more benign than hazards caused by
undetectable errors.
For some hazards, the driver notification could additionally result in the driver adapting
his/her driving style. For example, if the driver is informed that there is something wrong
with the parking brake, he/she can be expected to avoid parking on a steep slope. This
illustrates another reason why the hazard classification should consider whether the driver
is informed or not about the existence of the hazard.
Finally, it should be noted that the driver may notice the occurrence of a hazard by other
means than by visual information such as telltales and instrument displays. The hazard
could for example lead to a vehicle behaviour that makes the driver understand that
something is wrong. This type of “driver notification” should also be considered in the
hazard classification when relevant.
Below, a hazard classification method is outlined based on the observations above. This
classification can be said to represent the criticality of the hazard. By criticality, we mean the
combination of the severities of the potential effects and the conditional probability of these
severities, given an occurrence of the hazard. Thus, the criticality is independent of the hazard
occurrence rate.
It might seem strange to introduce a new concept such as Hazard Criticality, when SIL (Safety
Integrity Level) is already an established concept. However, the SIL as defined in IEC 61508 [5] is
a discretised value of the target probability of a dangerous failure (of a "safety function") and
therefore more requirement-oriented than classification-oriented. We believe Hazard Criticality is a
more appropriate term to use in the hazard classification since it separates the classification from
any target probabilities. Such probabilistic requirements, for example the SIL as defined in IEC
61508, should not be embedded in the hazard classification, but should result from a
consideration of both the hazard criticality and the tolerable risk.
The hazard classification approach is based on three parameters:
• Severity
• Exposure
• Possibility of non-avoidance
Severity
The Severity is a measure of the harm associated with each safety-related potential outcome of the
hazard:
• S2: One or more fatalities and/or major injuries
• S1: One or more minor injuries
With these severity levels, only those outcomes that are related to safety are considered. It is
certainly possible, however, to extend the classification approach to cover any undesirable system
states and not just hazards. This means that not only safety but also reliability can be addressed
by the methodology. For example, additional Severity levels representing e.g. ”major customer
dissatisfaction”, ”minor customer dissatisfaction”, etc. could be introduced.
Exposure
The Exposure represents the probability that a particular outcome is possible, given that the
hazard occurs. For a hazard that occurs at time t and that has a very short duration, the Exposure
is simply the probability that the vehicle at time t is in a situation in which the hazard is relevant
with respect to the particular outcome considered. For a hazard with a longer duration, the
Exposure is a measure of the probability that a situation arises in which the hazard might lead to a
particular outcome before the hazard disappears.
Possibility of non-avoidance
This is quite similar to the MISRA controllability concept, the main difference being that the
possibility of non-avoidance is described in quantitative terms. It is a measure of how likely it is that
a particular consequence will occur, given an occurrence of the hazard and given that the vehicle
is in a particular situation in which the consequence is possible.
The classification methodology for the determination of a Hazard Criticality (HC) associated with
any given hazard can now be described in the form of an algorithm:
For i = 1 to 2
    Prob := 0
    For each driving situation in which the hazard may lead to severity Si
        Determine the Exposure (E) with respect to Si for this situation
        Determine the Possibility of non-avoidance (P) of Si in this situation
        Prob := Prob + E*P
    End For
    Determine a tentative HC level (A-E for S2, A-D for S1) for Si according to the table below
End For
Determine the HC as the highest tentative HC level found
Prob    <0.00001   0.00001-0.0001   0.0001-0.001   0.001-0.01   0.01-0.1   >0.1
S2      -          HC A             HC B           HC C         HC D       HC E (or D)
S1      -          -                HC A           HC B         HC C       HC D
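The algorithm and the table translate directly into executable form. The sketch below assumes that, for each severity level, the relevant driving situations are supplied as (E, P) pairs:

# Band thresholds for Prob and the corresponding HC levels per severity.
BANDS = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
HC_LEVELS = {
    "S2": [None, "A", "B", "C", "D", "E"],   # 'E' may be reduced to 'D'
    "S1": [None, None, "A", "B", "C", "D"],
}

def band(prob):
    """Index of the probability band that prob falls into."""
    return sum(prob >= threshold for threshold in BANDS)

def hazard_criticality(situations):
    """situations maps 'S1'/'S2' to a list of (E, P) pairs per driving situation."""
    tentative = []
    for severity, pairs in situations.items():
        prob = sum(e * p for e, p in pairs)     # Prob := Prob + E*P
        level = HC_LEVELS[severity][band(prob)]
        if level is not None:
            tentative.append(level)
    return max(tentative, default=None)         # highest tentative HC level

# "Unwanted deployment of airbag" from the benchmarking exercise below:
print(hazard_criticality({"S2": [(0.5, 0.01)], "S1": [(0.9, 0.1)]}))  # -> 'C'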
In the light of the issues discussed in sections B.3.2.1.2-B.3.2.1.3, the resulting HC might need to
be modified to account for some factors that are not captured by the algorithm above:
• For systems and functions that are far from standard equipment, it might be appropriate to
reduce the criticality of hazards that can be described as “the vehicle behaves as if it is not
equipped with the system at all”. For example, the criticality of the hazard ”collision
avoidance function unavailable” should perhaps be reduced, since this function is (today)
not considered an essential part of the vehicle. The decision about how far to reduce the
criticality, possibly all the way to ”no HC classification”, should be determined by:
o how the system is marketed (for example: ”The collision avoidance system is not
guaranteed to avoid collisions in all situations and the driver is still responsible for
controlling the vehicle in a safe manner”)
o whether the driver can be expected to adapt his/her driving style to the existence of
the system
• Legal issues and company policy might make the final classification differ from the level
determined by the algorithm presented above.
The novel approach for hazard classification is quite similar to the WD 26262 approach presented
in section B.3.1.4. The main difference is in the quantitative rather than qualitative estimation of the
Exposure (E) and Possibility of non-avoidance (P). The use of quantitative conditional probabilities is
certainly unusual. Below, the reasons for selecting a quantitative approach are given.
• First, it should be noted that the determination of E and C in the WD 26262 approach is
somewhat quantitative too since the different levels are defined by numerical measures.
o E2 represents an exposure less than 1%
o E3 represents an exposure between 1% and 10%
o C1 represents less than 1%
o C2 represents 1% to 10%
• Unlike the ASIL approach, the discretisation of E and P is postponed until E and P have
been estimated and the results combined into a single (estimation of) conditional
probability. The ASIL approach leads to large discretisation errors since every intermediate
value of E and C is rounded to the next power of ten before the values are combined.
• E*C in the ASIL classification is in principle a measure of the conditional probability of the
specified harm, given an occurrence of the hazard. With this interpretation, the conditional
probabilities and the corresponding ASIL levels for ASIL B-D (for severity S3) are:
• 0.01-1 ASIL D
• 0.001-0.1 ASIL C
• (0.0001)-0.01 ASIL B
There are obviously overlaps between these ASIL ranges. A hazard may actually be up to ten
times more likely to cause a specific level of harm than another hazard and still get a lower
ASIL rating. This overlap problem is eliminated, or at least significantly reduced, by the
novel approach.
• The novel approach is better adapted than other methods to deal with hazards for which
the exposure is extremely low. If the exposure is below 0.00001, the HC classification will
be "none". The ISO WD 26262, on the other hand, may end up with ASIL A for such
hazards. The following highly hypothetical example illustrates why ASIL A may be an
inappropriate classification in such a case:
o A system that protects car occupants from falling meteorites is theoretically
conceivable in convertible cars. One of the hazards of such a system can be
defined as "inability to provide protection". In the ISO WD 26262 approach, this
hazard would be classified as ASIL A (based on S3, E1, C3). Due to the low
exposure, i.e. the low probability of the car being hit by a falling meteorite, the novel
approach would instead end up with a Hazard Criticality of "none". Most people
would probably agree that the unavailability of such a meteorite protection system is
an extremely minor hazard, at least on this planet. (More realistic examples
involving extremely rare scenarios could be given.)
• The major drawback of the novel quantitative approach is that it is typically more difficult to
argue why a particular exposure probability (e.g. 0.035) has been chosen than to argue
why a particular exposure class (E1, E2, E3 or E4) has been chosen. This observation may
make the quantitative approach unacceptable for practical application in the automotive
industry. However, it should be noted that the quantitative classification method could be
complemented with guidelines (tables, etc) on how to do the estimations. The
establishment of such tables is outside the scope of the EASIS project.
B.3.3 Benchmarking of hazard classification
In this section we present a comparison between different hazard classification schemes using a
small number of hazards to conduct a benchmarking exercise.
The following hazard classification schemes were used. The rationale for choosing each one is
given.
• The “ASIL” approach of ISO WD 26262. This was chosen since ISO 26262 will be the future
standard for functional safety of automotive electronic systems and the scheme represents an
emerging automotive approach.
• The MISRA Risk Graph approach [19]. This was chosen since this represents another
automotive approach that has been used successfully for a number of years. The MISRA Risk
Graph incorporates the established “Controllability” approach that was originally developed for
telematic/ITS applications. Therefore the MISRA approach is seen as one method that can
take account of future systems as well as today’s systems.
• The EASIS approach. This new approach proposed by the EASIS project should be compared
against these existing approaches.
The results of the benchmarking using the different methods are presented. Note that all of the
results presented are the result of an exercise for the purposes of the EASIS project and should
not be taken as representative of the hazard classifications that are to be applied to a production
system. For any production system, a hazard analysis has to be carried out starting from the
constraints, system boundary, assumptions, etc. applicable to that system and the vehicle it will be
installed in.
The results of applying the ISO method independently by two different analysts are presented in
the following tables.
B.3.3.3.1.1 Analyst 1
B.3.3.3.1.2 Analyst 2
The results of applying the MISRA method independently by two different analysts are presented in
the following sections.
B.3.3.3.2.1 Analyst 1
The controllability classification is C4, which is defined as “This relates to failures whose effects
are not controllable by the vehicle occupant(s). The outcome cannot be influenced by a human
response. Avoidance of an accident is usually extremely difficult.” Although the firing of the airbag
does not affect the systems with which the driver controls the vehicle, the driver may be severely
distracted or even rendered unconscious by the event. Thus C4 is an appropriate classification.
Note, the “A” parameter in controllability has been graded as “E”. This is on the basis that no
functionality within the system boundary has been lost with which the driver can control the safety
of the situation following the occurrence of the hazard. Similarly the “B” parameter has been
graded “E” as all the standard vehicle control functions are assumed to be unaffected.
This leads to a hazard classification of R4.
Airbag function unavailable without warning
It should be noted that the hazard is that the airbag function is unavailable (presumably due to
some fault) but that this hazard will not lead to an unwanted occurrence until a demand should be
made to operate the airbag system. Using the MISRA Risk Graph, the following classification is
made:
Severity S2 (maximum of one fatality)
Moving/non-moving Non-moving – this is a hazard associated with a protection function
Unwanted full braking (around 10 m/s^2), not limited in time
Using the MISRA Risk Graph and treating this as a moving vehicle hazard, the controllability parameters are graded as follows:
I = E: It is assumed that no other systems are dependent on the correct functioning of the stand-alone braking system.
A = B: It is assumed that steering is unaffected (assuming ABS is functioning) but there is no longitudinal control of the vehicle possible.
B = B/A: It is assumed that steering is unaffected (assuming ABS is functioning) but there is no longitudinal control of the vehicle possible and there are no other systems with which this control can be effected. Note that if the analysis had been concerned with a hazard such as loss of engine power this would have been ranked “C” since steering, brakes, etc. would be unaffected in the short term.
T = B: The driver will have to react extremely quickly to a very unusual and unexpected situation, but it should be obvious what needs to be done.
Resulting Controllability: C3/C4
The controllability classification is C3, which is defined as “This relates to failures whose effects
are not normally controllable by the vehicle occupant(s) but could, under favourable
circumstances, be influenced by an experienced human response. Avoidance of an accident is
usually very difficult.”
It could also be argued that the main outcome of this hazard is that another vehicle collides with
this vehicle. In this case the controllability classification C4 would be more appropriate as there is
nothing the driver of this vehicle can do to prevent this. This would be reflected by choosing the
grade “A” for backups (since no backups are available with which to control the outcome).
If this hazard was alternatively considered from the perspective of an incorrect request for braking
by an emergency brake assist function, then this analysis could also be conducted by treating this
as a failure of a protection system. In this case the following apply:
Severity S2 (maximum of one fatality)
Moving/non-moving Non-moving – this is a hazard associated with a protection function
Exposure F2 the hazard could occur at any time during driving
Possibility to avoid P2 no possibility to avoid hazard (worst case: hit by another car)
This leads to a hazard classification of R4.
Therefore, overall, taking a conservative approach leads to a hazard classification of R4.
Unwanted full braking, 1 s duration
This hazard is again assumed to originate from the brake system itself, rather than from a higher-
level system making an incorrect demand for braking which the brake system then (correctly)
provides. Using the MISRA Risk Graph, the following classification is made:
Controllability parameters:
I = E: It is assumed that no other systems are dependent on the correct functioning of the stand-alone braking system.
A = B: We assume that steering is unaffected (assuming ABS is functioning) but there is no longitudinal control of the vehicle possible.
B = B: We assume that steering is unaffected (assuming ABS is functioning) but there is no longitudinal control of the vehicle possible and there are no other systems with which this control can be effected.
T = C/B: The driver will have to react quickly but, provided they are driving so as always to leave an “escape route” (as recommended by driving instruction organizations), they should still be able to steer the vehicle to a safe location.
Resulting Controllability: C3
The principal distinction between this and the previous hazard is that whilst they both remove all but
the absolute minimum of control from the driver, on this occasion full control is returned after one
second. The final outcome then depends on how well the driver is able to handle this unexpected
and unusual situation.
The controllability classification is C3, which is defined as “This relates to failures whose effects
are not normally controllable by the vehicle occupant(s) but could, under favourable
circumstances, be influenced by an experienced human response. Avoidance of an accident is
usually very difficult.” This is still a reasonable classification since it is necessary to consider the
effect on the driver after the event has occurred – will they be shocked and how quickly would they
recover? Discussions with a vehicle dynamics expert have indicated that for the majority of
drivers, this situation would still be a severe shock and different drivers would react in different
ways to this.
Again a similar argument could be made to the above concerning collision with another vehicle
from behind, although in the case of a short duration event it is more likely that the other driver can
recover, providing there is a suitable/recommended inter-vehicle gap.
This leads to a hazard classification of R3.
“Collision avoidance by braking” function unavailable without warning
Using the MISRA Risk Graph and treating this as a “protection” function the following classification
applies:
Severity S2 (maximum of one fatality)
Moving/non-moving Non-moving – this is a hazard associated with a protection function
Exposure F1 exposure to a situation where it would be needed is rare
Possibility to avoid P2 no possibility to avoid hazard
This leads to a hazard classification of R3.
Alternatively, this could be considered as a moving vehicle hazard. In this case we have:
Severity S2 (maximum of one fatality)
Moving/non-moving Moving – the vehicle is assumed to be driving and the driver’s control of the
safety of the situation must be assessed
Controllability parameters:
I = C: The collision avoidance system provides data for the braking system, not only to initiate the collision avoidance service, but also to say that the service is not required.
A = E: No control over basic vehicle functions has been lost.
B = E: No control over basic vehicle functions has been lost, therefore the driver can use them to avoid a hazardous situation.
T = A: If the vehicle is already in a situation where the collision avoidance function is required, it is unlikely that a human could react in the time required to prevent a collision.
Resulting Controllability: C4
The controllability classification is C4, which is defined as “This relates to failures whose effects
are not controllable by the vehicle occupant(s). The outcome cannot be influenced by a human
response. Avoidance of an accident is usually extremely difficult.” If the vehicle is already in a
situation where the collision avoidance is required, it is unlikely that a human could react in the
time required to prevent a collision. Thus C4 is an appropriate classification.
This leads to a hazard classification of R4.
Note that in practice the correct choice of which part of the risk graph to apply will depend on a
precise definition of the system and the safety envelope, which are not available for the purposes
of this exercise.
Total loss of service brake function
The controllability classification is C4, which is defined as “This relates to failures whose effects are not controllable by the vehicle occupant(s). The outcome cannot be influenced by a human response. Avoidance of an accident is usually extremely difficult.” The initial reaction of most drivers to a failure to brake is to press the brake pedal harder. In some circumstances, by the time the driver has recognized this is not working it may be too late to apply the backups. Thus C4 is an appropriate classification.
This leads to a hazard classification of R4.
B.3.3.3.2.2 Analyst 2
| Hazard | S | F/C | P | R |
|---|---|---|---|---|
| Unwanted deployment of airbag | S2 | C3 | - | R3 |
| Airbag function permanently unavailable, without any information to the driver about this unavailability | S2 | F1 | P2 | R3 |
| Unwanted full braking (around 10 m/s²), not limited in time | S2 | C2 | - | R2 |
| Unwanted full braking (around 10 m/s²), 1 second duration | S2 | C2 | - | R2 |
| “Collision avoidance by braking” function unavailable, without any information to the driver about this unavailability | S2 | F1 | P2 | R3 |
| Total loss of service brake function, i.e. not possible for the driver to brake using the brake pedal | S2 | C3 | - | R3 |
The results of applying the proposed EASIS method to the hazard (by "Analyst 2" only) are
presented in the following table.
| Failure (Hazard) | S - Severity | E - Exposure | P - Possibility of non-avoidance | E*P | HC - Hazard Criticality (preliminary) | HC - Hazard Criticality (final) |
|---|---|---|---|---|---|---|
| Unwanted deployment of airbag | S2 | 0.5 | 0.01 | 0.005 | C | C |
| | S1 | 0.9 | 0.1 | 0.09 | C | |
| Airbag function permanently unavailable, without any information to the driver about this unavailability | S2 | 0.0005 (note: the remainder of the vehicle life is considered here) | 1 | 0.0005 | B | A (note: the lack of airbag function does not cause the accident; furthermore, the consequences could be fatal even if the airbag had worked, so the HC is reduced from B to A) |
| | S1 | 0.005 (note: the remainder of the vehicle life is considered here) | 1 | 0.005 | B | |
| Unwanted full braking (around 10 m/s²), not limited in time | S2 | 0.5 | 0.01 | 0.005 | C | C |
| | S1 | 0.9 | 0.1 | 0.09 | C | |
The following table presents a summary of the results of hazard classification using different
approaches.
| Hazard | ISO1 | ISO2 | MISRA1 | MISRA2 | EASIS |
|---|---|---|---|---|---|
| Unwanted deployment of airbag | D | C | R4 | R3 | C |
| Airbag function unavailable | A | A | R3 | R3 | A |
| Unwanted full braking (around 10 m/s²), not limited in time | D | C | R4 | R2 | C |
| Unwanted full braking (around 10 m/s²), 1 second duration | B | B | R3 | R2 | B |
| “Collision avoidance by braking” function unavailable | B | A | R3 | R3 | A |
| Total loss of service brake function | D | D | R4 | R3 | D |
The results presented above are the result of an exercise for the purposes of the EASIS project
and should not be taken as representative of the hazard classifications that are to be applied to a
production system. For any production system, a hazard analysis has to be carried out starting
from the constraints, system boundary, assumptions, etc. applicable to that system and the vehicle
it will be installed in.
The ISO approach is viewed by many as the method that will be applied in the future. Hence the
results of the hazard classification benchmarking are evaluated with respect to that method. In
order to make a meaningful comparison, we have restated the results with respect to the
equivalent ISO classifications in the following table:
| Hazard | ISO1 | ISO2 | MISRA1 | MISRA2 | EASIS |
|---|---|---|---|---|---|
| Unwanted deployment of airbag | D | C | D | C | C |
| Airbag function unavailable | A | A | C | C | A |
| Unwanted full braking (around 10 m/s²), not limited in time | D | C | D | B | C |
| Unwanted full braking (around 10 m/s²), 1 second duration | B | B | C | B | B |
| “Collision avoidance by braking” function unavailable | B | A | C | C | A |
| Total loss of service brake function | D | D | D | C | D |
In making this comparison, the following mappings have been assumed. Note that in both these
cases, these mappings only apply for hazards where the severity has been rated as S2/S3 (ISO),
S2 (MISRA), S2 (EASIS). This applied to all of the hazards benchmarked. In general it is not easy
to compare the hazard classifications like-for-like as the following severity mappings seem to
apply:
| ISO | MISRA | EASIS |
|---|---|---|
| S0 | (S0) | No equivalent |
| S1 | S1 | S1 |
| S2 | S2 | S2 |
| S3 | S2 | S2 |
| No equivalent | S3 | No equivalent |
For comparing MISRA results to ISO results, the following mapping of hazard classifications was
used:
| MISRA | ISO |
|---|---|
| NR | QM |
| R1 | ASIL A |
| R2 | ASIL B |
| R3 | ASIL C |
| R4 | ASIL D |
| R5 | No equivalent |
For comparing EASIS results to ISO results, the following mapping of hazard classifications was
used:
| EASIS | ISO |
|---|---|
| HC A | ASIL A |
| HC B | ASIL B |
| HC C | ASIL C |
| HC D | ASIL D |
| HC E | |
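For quickly restating results, these two mappings can be captured directly. The following minimal sketch is our own illustration (the dictionary names are invented); the values are simply those of the two tables above:

```python
# Sketch: the MISRA->ISO and EASIS->ISO mappings from the two tables above.
MISRA_TO_ISO = {
    "NR": "QM",
    "R1": "ASIL A",
    "R2": "ASIL B",
    "R3": "ASIL C",
    "R4": "ASIL D",
    # R5 has no ISO equivalent
}

EASIS_TO_ISO = {
    "HC A": "ASIL A",
    "HC B": "ASIL B",
    "HC C": "ASIL C",
    "HC D": "ASIL D",
    # HC E has no ISO equivalent given in the table above
}

print(MISRA_TO_ISO["R4"])    # ASIL D
print(EASIS_TO_ISO["HC A"])  # ASIL A
```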
The EASIS mapping is explained as follows. In ISO WD26262 E and C are defined as ranges.
For example, E4 means an exposure range > 10% and C2 means that 1%–10% of the drivers
cannot control the damage. The relationship between “ISO E*C” and “EASIS E*C” can therefore be
summarized as follows:
| ISO E*C (calculated from the class values of E and C) | Possible E and C combinations | Actual values represented by E and C, expressed as ranges | Range of actual E*C values | EASIS E*C ranges corresponding to the ISO E*C range |
|---|---|---|---|---|
| 1 | E4, C3 | E4: (0.1–1), C3: (0.1–1) | 0.01–1 | 0.01–0.1 and 0.1–1 |
| 0.1 | E4, C2 or E3, C3 | E4: (0.1–1), C2: (0.01–0.1); E3: (0.01–0.1), C3: (0.1–1) | 0.001–0.1 | 0.001–0.01 and 0.01–0.1 |
| 0.01 | E4, C1 or E3, C2 or E2, C3 | E4: (0.1–1), C1: (0.001–0.01); E3: (0.01–0.1), C2: (0.01–0.1); E2: (0.001–0.01), C3: (0.1–1) | 0.0001–0.01 | 0.0001–0.001 and 0.001–0.01 |
| 0.001 | E3, C1 or E2, C2 or E1, C3 | E3: (0.01–0.1), C1: (0.001–0.01); E2: (0.001–0.01), C2: (0.01–0.1); E1: (0.0001–0.001), C3: (0.1–1) | 0.00001–0.001 | 0.00001–0.0001 and 0.0001–0.001 |
This table shows that there is not a one-to-one mapping between E*C in ISO and the E*C ranges in EASIS. Instead there is an overlapping relationship, also visible in the earlier table that shows the HC/ASIL relationship. Note that for an actual Exposure*Controllability value of 0.05, the ISO approach would lead to an E*C classification of either 1 or 0.1 (see the first two rows above), depending on how this 0.05 figure is decomposed into E and C factors. The EASIS approach does not have this overlap possibility, since every actual Exposure*Controllability value will be mapped to exactly one of the intervals from 0.00001–0.0001 to 0.1–1. This is one of the main distinctions of the EASIS scheme: the discretisation into distinct levels is done after the E*C multiplication, not before as in ISO WD26262.
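The difference in the order of discretisation can be made concrete with a small sketch. This is our own illustration, not part of either method definition: the helper names are invented, and the class values used are the representative range bounds quoted above.

```python
# Contrast of the two orders of discretisation discussed above.
import math

def iso_ec(e_class: int, c_class: int) -> float:
    """ISO-style: E and C are discretised into class values first, then
    multiplied (E4 -> 1, E3 -> 0.1, ...; C3 -> 1, C2 -> 0.1, ...)."""
    return 10.0 ** (e_class - 4) * 10.0 ** (c_class - 3)

def easis_ec_interval(e: float, c: float) -> tuple:
    """EASIS-style: the actual values are multiplied first, and the product
    is then discretised into one of the decade intervals 0.00001-0.0001 ... 0.1-1."""
    lower = 10.0 ** math.floor(math.log10(e * c))
    return (lower, 10 * lower)

# An actual Exposure*Controllability value of 0.05 maps to exactly one
# EASIS interval ...
print(easis_ec_interval(0.5, 0.1))   # (0.01, 0.1)
print(easis_ec_interval(0.05, 1.0))  # (0.01, 0.1), the same interval
# ... but to two different ISO E*C classes, depending on how the 0.05
# figure is decomposed into E and C factors:
print(iso_ec(4, 3))  # E4, C3 -> 1
print(iso_ec(4, 2))  # E4, C2 -> 0.1
```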
For the hazards “Unwanted deployment of airbag” and “Total loss of service brake function” the
results found between the different methods were largely consistent. The difference in rankings
between ASIL C and ASIL D is mainly due to different analyst perspectives on the ability of the
driver to influence the outcome. In a real-life hazard analysis, these results would be subjected to a
more detailed review, including the use of domain experts.
For the unwanted full braking hazards, there was some variability in the results although in general
all the methods classified them as ASIL C or D for unlimited duration and ASIL B or C for duration
limited to one second. Thus it is observed that the methods generally indicate a lower classification
for a shorter duration hazard, in line with expectations.
For the hazard “airbag function unavailable”, the MISRA method classifies this as ASIL C whereas
both the ISO and EASIS approaches classify this as ASIL A. This results from the option in the
MISRA Risk Graph to treat this as a hazard associated with failure of a protection function.
For the hazard “collision avoidance by braking unavailable”, the greatest variability in the results was seen, ranging from ASIL A to ASIL D. The MISRA method led to classifications of ASIL C or D, whereas ISO and EASIS gave rise to ASIL A and ASIL B. The results from the ISO and EASIS methods are principally due to the “exposure” rating being chosen as E1 or E2 depending on the
analyst’s viewpoint. This choice of exposure parameter is from the perspective of situations where
collision avoidance might be required rather than saying it is a hazard present during all driving. In
the MISRA method, if the controllability approach is chosen the exposure parameter is not
evaluated as it is considered that hazards associated with the control of a moving vehicle could
occur at any time the vehicle is being driven. Alternatively the hazard risk for this type of hazard
can be evaluated according to failure to operate on demand.
For the braking-related hazards, in most cases the results obtained using the MISRA approach are
fairly consistent, in that the hazards associated with braking functions are ranked R3 (ISO ASIL C)
or R4 (ISO ASIL D). The classification of the unwanted full braking hazard with 1 second duration
could be argued as R4 from a more conservative viewpoint.
At first sight, the potential for the MISRA approach to classify a hazard such as “collision
avoidance by braking unavailable” at a similar level to complete loss of braking may appear to
show a wide variability between the approaches. However the difference is almost entirely due to
the ISO approach permitting a much wider range of exposure values. In effect, the ISO approach
is arguing for reduced risk reduction requirements based on the low-demand nature of the system
whereas the MISRA approach tends to a more conservative result. Furthermore, it could be
argued that the ISO approach effectively has an “ALARP” type argument built into its exposure
classes (it is argued that the frequency of exposure to the hazard is so small that the required
effort to achieve a higher risk reduction is disproportionate). On the other hand, this step would be
explicitly considered in the “risk analysis” phase of a full MISRA analysis which has not been done
for the purposes of this exercise. Such a risk analysis could also take into account wider issues, including purely commercial considerations such as the perception of product quality.
Clearly such issues will need to be addressed in defining the “broadly acceptable” risk associated
with ISS and whether different levels of risk may need to apply depending on the type of system.
This is particularly the case with the expectation that ISS will contribute to changing (improving) the
current level of “broadly acceptable” risk.
B.3.3.6 Conclusions
It was found that in general the three hazard classification approaches gave largely consistent results. Some variability due to the analysts' different perspectives was observed, but this would largely be eliminated in a real-life analysis by the definition of the safety envelope, system boundary, etc., all of which were unknown for the purposes of this exercise.
The MISRA and EASIS approaches were both found to have some advantages compared to the
ISO approach:
• The MISRA method has the option to use an alternative mechanism for hazard classification
for “protection”-like functions such as collision avoidance and airbag systems, particularly for
failure to operate on demand.
• The EASIS method avoids the possibility of overlaps and anomalies in ASIL allocation that may
occur when using the ISO method.
The widest variability between methods was seen when applying them to “on demand” functions that may be required to operate very infrequently. The ISO (and EASIS) approaches tend to consider these from the perspective of the overall risk reduction required for the vehicle, whereas the MISRA approach considers them from the perspective of the risk reduction required from the specific system and tends to a much more conservative estimate.
We can make the following general observations about hazard classification schemes applied to ISS. A correct approach to hazard classification is to first examine the hazard and the way the system of concern contributes to it, and then to choose the parameters used for hazard classification.
Possible ways that the system of concern might contribute to hazards are:
• Hazard risk reduction: In this case a hazard exists that is present even without the system of concern (a non-system hazard). The system of concern provides a safety function to reduce the risk of this hazard. The criticality of the safety function depends on the necessary risk reduction for the non-system hazard risk. The system hazard is the inability to provide the safety function.
• Hazard creation: Here three different cases can be distinguished:
• Hazard creation by an error state or failure of the system of concern
• Hazard creation by a dangerous function of the system of concern
• Hazard creation by a non-functional interaction of the system of concern with its
environment.
Furthermore, the created hazards can be assigned to one of two categories depending on their possible effects:
• Hazards that have an effect on the controllability of the moving vehicle by the driver
• Hazards that have further effects
Note that some hazards might fall into both categories and thus have to be considered from both
perspectives. The following diagram clarifies this view:
[Figure: the system of concern either provides risk reduction (alongside risk reduction by other systems) or creates hazards; following hazard activation and hazard exposure, controllability-related effects are classified using Classification Scheme II and further effects using Classification Scheme III.]
Depending on the system of concern's hazard contribution and the hazard effect, three different classification schemes with different parameter sets might be suitable:
• Classification Scheme I: HC = f (Non-System Hazard Risk) = f (Severity, Frequency)
• Classification Scheme II: HC = f (Exposure, Controllability, Severity)
• Classification Scheme III: HC = f (Exposure, Possibility to avoid, Severity)
The following points should be noted:
• It may be possible to omit the severity category in classification scheme II. The only possible distinction may be between light and heavy vehicles: we are always considering a possible crash, and the number of victims might vary depending on whether it is a light or a heavy vehicle that goes out of control.
• In classification scheme III a finer granularity of the severity parameter might be suitable.
• The possibility to avoid in classification scheme III should take into account factors such as the P and W parameters from the IEC 61508 Part 5 Annex D risk graph, i.e. the possibility that a human can avoid the consequences of an activated hazard and the possibility that the activated hazard might not necessarily lead to damage or harm.
• The ISO/WD 26262 and EASIS approaches can be used for classification schemes II and III by careful interpretation of the parameters. They cannot be used for classification scheme I, even if the parameters exposure and controllability are not estimated separately and the product E*C is instead replaced by the parameter F (Frequency). The reason is that the product E*C is a conditional probability and not the frequency needed in classification scheme I.
• The MISRA approach distinguishes between classification schemes II and III but does not fully account for classification scheme I (it can be applied to such hazards, but the classification scheme depending only on severity and frequency has not been developed). For classification scheme II, the MISRA approach does not at present include the exposure parameter, which can lead to a different result for some hazards (e.g. “unavailability of the collision avoidance function”) if this scheme is used to classify such hazards.
B.4 References
[1] ARP4761, Guidelines and methods for conducting the safety assessment process on civil airborne
systems and equipment, SAE Committee S-18, Society of Automotive Engineers, Inc., August 1995
[2] ATM Safety Techniques and Toolbox Issue 1.0, EUROCONTROL/FAA, January 24 2005;
http://www.eurocontrol.int/eec/public/standard_page/safety_doc_techniques_and_toolbox.html
[3] M. Findeis, Functional Safety in the Automotive Industry, Process and methods, Presentation at VDA
Winter Meeting, February 2006; http://www.vda-wintermeeting.de/downloads2006/Matthias_Findeis.pdf
[4] Interim Defence Standard 00-56, Safety Management Requirements for Defence Systems, Issue 3, UK
Ministry of Defence, December 2004.
[5] IEC 61508, Functional safety of electrical/electronic/programmable electronic safety-related systems, IEC,
1998–2005 (8 parts).
[6] IEC 61511, Functional safety – safety instrumented systems for the process industry sector, IEC, 2003–
2004 (3 parts).
[7] Interim Defence Standards 00-58 Issue 1: HAZOP Studies on Systems Containing Programmable
Electronics, UK Ministry of Defence, July 1996
[8] Towards a European Standard: The Development of Safe Road Transport Informatic Systems, Project
DRIVE Safely (V1051), 1992
[9] P. Johannessen, C. Grante, A. Alminger, U. Eklund, J. Torin, “Hazard Analysis in Object Oriented Design
of Dependable Systems”, International Conference on Dependable Systems and Networks (DSN'01),
2001.
[10] P. H. Jesty, K. M. Hobley, R. Evans, I. Kendall, “Safety analysis of vehicle-based systems”, Aspects of Safety Management: Proceedings of the Ninth Safety Critical Systems Symposium, F. Redmill & T. Anderson (eds.), Springer-Verlag, 2000, pp. 90-110; http://www.misra.org.uk/papers/SCSC00-SA.PDF
[11] C. Jung, Stand des ISO-Standards zur Funktionalen Sicherheit für die Automobilindustrie, (mainly in
English), Presentation at Safetronic 2005
[12] N. Leveson, Safeware - System, Safety and Computers, Addison Wesley, 1995.
[13] Development Guidelines for Vehicle Based Software, MIRA, 1994. Also available as ISO/TR 15497:2000
[14] Controllability, MISRA Technical Report, Version 1, May 2004; http://www.misra.org.uk
[15] Y. Papadopoulos, J. McDermid, R. Sasse, G. Heiner, “Analysis and synthesis of the behaviour of complex
programmable electronic systems in conditions of failure”, Reliability Engineering and System Safety 71,
2001, pp. 229-247
[16] Framework for the development and assessment of safety-related UTMC2 systems, UTMC22 Project;
http://www.utmc.gov.uk/utmc22/pdf/utmc22-framework.pdf
[17] MIL-STD-882D, Standard Practice for System Safety, U.S. Department of Defense, 2000.
[18] SAE J1739, Potential Failure Mode and Effects Analysis in Design (Design FMEA), Potential Failure Mode
and Effects Analysis in Manufacturing and Assembly, Processes (Process FMEA), and Potential Failure
Mode and Effects Analysis for Machinery (Machinery FMEA), SAE International, 2002
[19] Guidelines for the Safety Analysis of Vehicle-Based Programmable Systems, MISRA, in preparation;
http://www.misra.org.uk
According to Appendix A, the EASIS dependability framework is based on the concept of a hazard.
More specifically, a hazard in this framework is taken to mean the following:
A hazard is an undesirable condition or state of a vehicle that could lead to an undesirable
outcome depending on other factors that can influence the outcome.
The objective of this Appendix is to provide guidance for the identification and investigation of the
causal relationships between potential faults in the system of concern and the resulting hazards.
Typically, for highly integrated and complex systems, these relationships involve several “intermediate” failure conditions in different system layers.
The most relevant topics related to the activities addressed by this Appendix are the following:
• Role of the hazard occurrence analysis in different stages of the system lifecycle:
The general approach and the outputs of a hazard occurrence analysis depend on the
system development phase in which the analyses are performed. This topic is
addressed in section C.2.
The Dependability Activity Framework that forms the basis for Part 1 of this EASIS deliverable is
illustrated in Figure C.1 and described in more detail in Appendix A. It can be seen that the hazard
occurrence analysis supports several other activities in the system development process:
• Hazard Identification (see Appendix B)
• Establishment of dependability-related requirements (see Appendix D)
• Safety Case construction (see Appendix E)
[Figure C.1: The Dependability Activity Framework: identification of hazards, classification of hazards, hazard occurrence analysis, establishment of dependability-related requirements, verification and validation of dependability-related requirements, and Safety Case construction, all alongside the development and design of the integrated safety system.]
Considering the overall system development process sketched in Figure C.2, it is possible to
identify three main phases in which the approach used to perform the hazard occurrence analysis
may be different: early, intermediate and later system development phases.
[Figure C.2: Early, intermediate and later hazard occurrence analyses mapped onto the system development process (requirements, design, test).]
In early phases of the development, the hazard occurrence analysis mainly focuses on supporting the hazard identification and classification activities (see Appendix B) and the extraction of dependability requirements (see Appendix D) by providing a clear view of how each identified hazard is related to its potential causes.
The following information is typically available for hazard occurrence analysis in the early phases
of the design process:
• The conceptual architecture of the system
• The list of the system high-level functions
• The list of system hazards
Two main objectives can be achieved by the hazard occurrence analysis:
• Completeness of the hazards list. Starting from the system description, the use of analysis based on inductive reasoning methods (section C.4.2), like FMEA, allows systematic identification of the system root causes of hazards. In this context, the hazard occurrence analysis can be used as input to the “hazard identification” activity to complete the hazards list.
• Investigation of the system's causal relationships. Starting from each identified hazard, the use of analysis based on deductive reasoning methods (section C.4.2), like FTA, allows investigation of how the system root causes combine and propagate across the system components to produce the specific hazard. In this context, the hazard occurrence analysis can be used as input to the “dependability-related requirements” activity.
In the early phases of the design process, the hazard occurrence analysis is typically performed in
a qualitative form. Thus, the causal relationships between hazards and their causes are
investigated, not the actual probabilities or rates of the hazards.
In the intermediate phases of the development process, the hazard occurrence analysis mainly focuses on supporting the establishment of dependability-related requirements (see Appendix D) by providing a clear view of how each identified hazard is related to its potential causes and, when possible, an estimation of the hazard occurrence probability.
The following information is typically available for hazard occurrence analysis in the intermediate phases of the design process:
• Detailed descriptions of the system and its subsystems in terms of HW and SW
components
• The list of the system primary functions with their allocation to the components
• The list of system hazards
The hazard occurrence analysis is typically split between the vehicle manufacturer and the
involved supplier, so that the OEM investigates how system/subsystem failures lead to vehicle-
level hazards, while the suppliers investigate how component faults may lead to system/subsystem
failures. However, the nature of this responsibility split depends heavily on the type of system
considered, so it is not possible to define a preferred approach for how to assign these
responsibilities.
As the main objective of the hazard occurrence analysis is to determine how the system root causes can lead to the identified hazards, analysis based on deductive reasoning methods (section C.4.2) is recommended.
For automotive systems, Fault Tree Analysis is particularly useful during the intermediate phases
of a system development process. Starting from an undesired top level hazard (output from
hazard identification activities, see Appendix B), a Fault Tree Analysis systematically determines
all credible single faults and failure combinations of the system functional blocks at the next lower
level which could cause this hazard event. The analysis proceeds down through successively more
detailed (i.e., lower) levels of the system (subsystems and components) until a root cause is
identified. FTA is discussed in more detail in section C.4.3.2.
The main goals achievable by using a qualitative FTA in the hazard occurrence analysis are:
• Investigation of single and multiple-fault effects
• Investigation of dependent failure sources (common cause analysis)
• Investigation of fail-safe design attributes (fault-tolerant and error-tolerant).
These “outputs” of the hazard occurrence analysis are the starting point for the next activities of
dependability-related requirements specification (see Appendix D). In this context, the advantages
coming from FTA application are:
• Clear evidence of architectural features and dependability functional requirements such
as redundancies, independence, diversity etc.
In the final phases of the development process, the hazard occurrence analysis focuses on supporting the verification of dependability-related requirements (see Appendix D) and on documenting the dependability activities performed on the system (see Appendix E).
The analyses performed in the previous phases of the system development process are re-used
and refined in the context of the verification process to check that the implemented design meets
its dependability requirements.
As an example, let us consider a quantitative FTA conducted on a specific system hazard identified during the early or intermediate phases of the development process. The outputs expected from the analysis are:
• Quantification of the hazard probability of occurrence
• Allocation of probability budgets to lower-level components
• Evaluation of exposure intervals, latency, and “at-risk” intervals with regard to their
overall impact on the system.
In the context of requirements verification, the same FTA is used for the following purposes:
• To confirm the qualitative assessments with reference to the implemented detailed
design of the system and its components (subsystems/items as applicable)
• To check the probability budgets allocated to the lower-level subsystems and root causes against their actual failure occurrence, derived from historical failure rate data of similar equipment already in field use, reliability analyses, tests or laboratory data
• To evaluate the quantitative importance of root causes with respect to the top level
hazard occurrence.
The FMEA results are collected into forms, which typically include fields for the following information:
• Identification of component, signal, and/or function
• Failure modes and associated hardware failure rates (numerical or categorical)
• Failure effects (directly and/or at the next higher level)
• Detectability and means of detection
• Compensating actions (i.e. automatic or manual)
• Operating phase in which the failure occurs
• Severity of failure effects
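As an illustration only (the field names and the example record below are our own assumptions, not a prescribed EASIS format), such a form row can be represented as a simple record type:

```python
# Sketch: a record type mirroring the FMEA form fields listed above.
from dataclasses import dataclass

@dataclass
class FmeaRow:
    item: str                # component, signal and/or function
    failure_mode: str
    failure_rate: str        # numerical rate or category
    local_effect: str        # effect at the item itself
    next_level_effect: str   # effect at the next higher level
    detectable: bool
    detection_means: str
    compensating_action: str # automatic or manual
    operating_phase: str
    severity: int            # e.g. 1 (negligible) to 10 (catastrophic)

# Invented example entry:
row = FmeaRow(
    item="wheel speed sensor",
    failure_mode="signal stuck at zero",
    failure_rate="remote",
    local_effect="no speed information from one wheel",
    next_level_effect="ABS control degraded",
    detectable=True,
    detection_means="plausibility check against other wheels",
    compensating_action="automatic fallback to conventional braking",
    operating_phase="driving",
    severity=7,
)
print(row)
```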
A concept for the distinction between fault, error and failure has already been proposed in [21]. As
this concept is widely used within the fault tolerance community, we have adopted it for our work in
EASIS.
In order to understand the difference between the three terms, it is important to take into account
the notion of “reliability”, which according to [21] is considered a “measure of the success with
which the system conforms to some authoritative specification of its behaviour”.
• a failure is a situation in which “the behaviour of a system deviates from the specification”
• an error is that part of the system state which is incorrect and may thus lead to a failure
• a fault is the adjudged cause for an error
A short example follows in order to clarify this relationship: In the sense of the terminology which
has been described above, an open circuit is a typical example of a fault. This fault may lead to a
wrong value of a program variable, which – being part of an incorrect system state – is an error.
This error may cause e. g. a steering system not to react to user input in a correct way, which
clearly would be a failure as this behaviour most likely deviates from the specification.
As described in [22], a fault model describes the possible faults of a system regarding both
structure (fault location) and function (type of wrong behaviour).
Examples of fault locations are CPU processing core, RAM, ROM, sensor, actuator, power supply
and communication network.
Examples of wrong behaviour are omitted output, wrong output, delayed output, an output being
transmitted two or more times to the same recipient or the transmission of an output to a wrong
recipient. In most of these examples, the "output" could be a hardwired signal or a signal
transmitted on a communication link, but some examples are only applicable for communication
signals.
Even correct output has to be considered, as a faulty component can deliver correct results in between faulty ones. In order to structure fault types, distinctions can be made between timing and value faults, detectable and non-detectable faults, as well as Byzantine and non-Byzantine faults. A more detailed distinction is made in the section on fault types in EASIS Deliverable D1.2 [23].
A postulate on fault occurrence and duration may also be part of a fault model, leading to a classification into transient (i.e. a temporary fault that does not reoccur within a defined time period), intermittent (i.e. alternating between being present and not present) and permanent (i.e. constantly present) faults.
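As a small illustrative sketch (the type names and the example are our own, not part of the referenced fault model), these three aspects of a fault model can be captured as follows:

```python
# Sketch: a record type for a fault model entry: location, type of wrong
# behaviour, and duration class. All concrete values are invented examples.
from dataclasses import dataclass
from enum import Enum

class Duration(Enum):
    TRANSIENT = "temporary; does not reoccur within a defined time period"
    INTERMITTENT = "alternating between being present and not present"
    PERMANENT = "constantly present"

@dataclass
class Fault:
    location: str         # e.g. CPU core, RAM, ROM, sensor, actuator, network
    wrong_behaviour: str  # e.g. omitted, wrong, delayed or duplicated output
    duration: Duration

fault = Fault("communication network", "delayed output", Duration.INTERMITTENT)
print(fault)
```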
Discussions during several work-package meetings made it clear that a common view of terms and concepts is hard to establish, as people from different backgrounds use terms differently. One suggestion was a generalization of the term “fault” to also include “unexpected events” as long as they lead to “undesirable system behaviour”. Several ideas of what could possibly be regarded as a fault have been brought up, among them faulty specifications but also, e.g., intentional and malicious interaction with a system.
This chapter proposes a life-cycle model based on [22] to arrange all suggested terms in a broader
concept in order to achieve a common understanding.
A typical life-cycle of a technical system consists of the phases
• Design
• Production and
• Operation.
The design phase comprises specification, implementation and documentation; the production
phase includes both the manufacturing of hardware and the compilation of software. During the
operation phase both operating in a narrow sense as well as maintenance have to be considered.
This is illustrated in Figure C.4.
[Figure C.4: Life-cycle phases: design (specification, implementation, documentation), production (hardware manufacturing, software compilation) and operation (operation in a narrow sense, maintenance).]
During any of these phases, faults may occur. Regarding the design phase, specification (e.g. wrong definition of operating conditions), implementation (e.g. software bugs which occur even despite a correct specification) and documentation (which may be misleading and thus cause wrong usage) are all subject to faults.
As far as the production phase is concerned, both faults arising during the manufacturing of the hardware and, e.g., wrong compilation of software have to be taken into account.
In the operation phase, a further distinction between faults caused by disturbances (e.g.
mechanical or electrical influence), wear out or physical influence can be made. Handling faults
and maintenance faults also fall into this category.
Based on this model, all proposals from project partners (numbered i.-x.) that have been made
during the discussions can now be considered.
i. Specification errors
As already described, these errors refer to the transition of user requirements to a system
specification and can well be incorporated into the above-mentioned model.
ii. External faults combined with an inability of the system to cope with them
Here it is important to distinguish between those faults which shall be tolerated according to
the system specification and those faults which the system is, again according to its
specification, not required to cope with. Of course, though in practice not all situations can
ever be taken into account, any safety-critical system should be designed in a way that is
as robust as reasonably possible.
iii. Human-made software faults (including mistakes made at any stage between
specification and coding)
These faults count as “implementation faults”. One possible countermeasure is diverse
software development.
iv. Faults in development tools (e.g. code generators and compilers)
Although there is little possibility to foresee such faults, they can lead to subsequent faults
during the “production”-phase.
v. Hardware design faults
According to the abovementioned model, these faults refer to specification, implementation
or documentation of hardware.
vi. Hardware manufacturing faults
Such faults fall into the “production”-phase, as described above.
vii. Permanent hardware faults caused by wear-out and/or environmental stress
(temperature, humidity, vibration, etc)
Typically, the specification describes requirements regarding e. g. minimal/maximal
operating temperature. Depending on whether this specification is inappropriate or whether
the specification has not been implemented correctly, the initial cause may be attributed to
different phases. However, in any case the resulting breakdown happens during the
“operation”-phase.
viii. Transient hardware faults caused by electromagnetic interference (radiated
interference, supply voltage anomalies) and other types of radiation
As in vii., this topic may again be related to the specification but is usually attributed to the
“operation”-phase.
ix. System-external conditions affecting the inputs to the system (for example scenarios
that prevent sensors from correctly measuring physical quantities)
In contrast to vii., these “system-external conditions” do not necessarily lead to permanent
hardware faults. For example fog can lead to misreading of a sensor value. However, it is
still a question of the design of a system to take into account all conditions under which the
system shall operate in a correct manner.
x. Intentional and malicious interaction with the system
These security problems play a role when telematics functions shall provide input for other
functions or even if they just share the same resources which may be possible in Integrated
Safety Systems. The topic of security may be important for the gateway-subtask, but is not
addressed here.
As a conclusion, one might say that most of the proposals which have been made can be subsumed under the life-cycle concept described above. However, the term “unexpected event” is possibly not a real generalization, as faults do not necessarily have to be unexpected. On the other hand, a fault does not always lead to undesirable behaviour, as it may be masked by fault tolerance mechanisms. Referring to chapter 3.1, it thus seems most reasonable to keep the meaning of fault as an “adjudged cause for an error”, because the overall concept is then clearer.
Different analysis techniques may be usefully selected depending on how they explore the
relationship between causes and effect inside the system of concern, the type of the results
expected, i.e. qualitative and/or quantitative, and the applicability for analysis of static and dynamic
systems respectively.
The hazard occurrence analysis can be performed either in a qualitative form, or in a quantitative
form depending on the final scope for which it is planned.
By qualitative analysis, the causal relationships leading to a hazard are investigated, while by quantitative analysis, the results from the qualitative analysis are used to calculate probabilities of hazard occurrence, using the probabilities of lower-level components as input data to this calculation.
A typical hazard occurrence analysis method based on deductive reasoning is Fault Tree Analysis (FTA), further discussed in section C.4.3.2. By this technique, the cause-effect relationship leading to a system hazard is investigated starting from its known effect on the system and deducing all the possible causes.
Exploratory reasoning is a third method used to link unknown causes to unknown effects
involving the discovery or invention of a hypothesis to explain a novel phenomenon. An exploratory
analysis may be structure generating, model generating, or hypothesis generating. This is a form
of explanation-based reasoning, but there is no ready-to-hand set of knowledge structures from
which to construct an explanation. The situation assessor must search long-term memory for
appropriate knowledge.
A typical hazard occurrence analysis method based on exploratory reasoning is the HAZard and
OPerability study (HAZOP), further discussed in section C.4.3.3. In this case the cause-effect
relationship leading to a system hazard is investigated starting from a specified deviation and
exploring both possible causes and possible effects.
Figure C.5 (adapted from [11]) summarizes the different approaches to analysing the cause-effect relationship using three common techniques based respectively on inductive, deductive and exploratory reasoning. These three techniques are to some extent complementary and can often be used together.
[Figure C.5: Graphical comparison between inductive, deductive and exploratory reasoning. Inductive analysis starts with known causes and explores possible consequences; deductive analysis starts with a known consequence and explores possible causes; exploratory analysis (HAZOP) starts with a single deviation and explores both possible causes and possible consequences.]
In this section, the most common analysis techniques used to investigate hazard occurrence in static systems are discussed. These techniques do not provide real-time information on whether the conditions in a system are becoming hazardous and might finally lead to an accident or injury, and they are not applicable to dynamic systems where temporal issues need to be considered.
An overview of analysis techniques for dynamic systems is given in section C.4.5.
Failure mode and effects analysis (FMEA) is an example of an inductive technique, as it starts from
known causes and explores possible consequences.
FMEA originated in the aerospace industry. However, it is now an accepted technique in many
other sectors, including military, rail and automotive. Generic standards [6] are available giving guidance on its application.
In a design FMEA, any existing design controls are already accounted for in the assessment of the Occurrence. This means that the Detection parameter is superfluous unless the Occurrence parameter refers to a highly hypothetical and unrealistic system which has been developed without any design controls whatsoever.
In some FMEAs the Detection parameter is used to indicate the likelihood that the failure cause is
detected during actual operation, i.e. after the system has been released. However, depending on
whether or not a failure cause is detected, the effects (and thus the Severity) will be different. The
FMEA template is not well suited to describe multiple effects of the same basic failure mode. It
seems much more appropriate to treat the "detected" and "undetected" cases separately in the
FMEA, i.e. these should be analysed in separate rows of the FMEA form. The Occurrence rating of
detected and undetected faults, respectively, shall then be based on an analysis of the efficiency of
the error detection mechanism. Furthermore, it is usually possible to determine whether a
particular fault (e.g. a short circuit at a specific point in the system) will be detected by the
implemented error detection mechanisms or not. Therefore, the effects and the corresponding
Severity of a given fault can often be determined without any need for an estimation of the
Detection parameter on a 1-10 scale.
To summarize the discussion on the Detection parameter, we conclude that the Occurrence
parameter and the Severity parameter associated with a failure mode together provide sufficient
information about the risk. This is in line with the general understanding that a risk is a combination
of an occurrence probability and the associated severity.
A further issue that has to be considered is the system boundary and the point at which the effects
are observed. There are usually three boundaries that have to be considered:
• The boundary of the “target of evaluation” – the system, subsystem or component on
which the analysis is being performed;
• The system boundary (usually the point at which the system's sensors and actuators observe and act on the equipment under control);
• The event boundary at which the hazardous occurrence will be observed (usually the
vehicle).
This brief overview of the FMEA technique is primarily concerned with the analysis principle, i.e. the bottom-up inductive reasoning for hazard occurrence analysis. However, it should be noted that an FMEA addresses the entire EASIS dependability framework, as shown in the table below.
Fault tree analysis (FTA) is an example of a deductive technique, as it starts from known
consequences and explores possible causes. Descriptions of the technique can be found in
standards such as [5], [8].
To conduct an FTA, it is first necessary to have identified the top-level hazard or hazards of a system. Typically these will have been obtained from an inductive technique such as FMEA. The technique is very flexible since the analysis can be conducted to any appropriate level. Therefore it is extremely important to consider the boundary of the analysis. In application to automotive systems, the top-level hazard will normally be at the vehicle level. The causes of this hazard can then be investigated.
A particularly important feature of FTA is that it permits the combinations of faults leading to a particular hazard to be identified. The information in an FTA is usually presented in a hierarchical format, where individual events in the hierarchy are combined using Boolean logic in the form of “AND” and “OR” gates. An “AND” gate represents a combination of events that must all occur in order for the next highest event to be triggered, whereas an “OR” gate represents a combination of events of which at least one must occur in order for the next highest event to be triggered. A separate tree has to be created for each top-level hazard.
Figure C.7 shows an example incomplete fault tree for the hazard “no power from engine” that
demonstrates the basic features of an FTA.
[Figure C.7: Example (incomplete) fault tree for the hazard “no power from engine”, with OR and AND gates combining lower-level events.]
Rectangle – a fault event that usually results from the combination of one or more basic
faults
Circle – a basic component fault with no statistical dependence on any other events
denoted by a circle or diamond
Diamond – a basic fault event not developed to its cause
Double diamond – an important undeveloped fault event that requires further development
to complete the fault tree
OR gate – one or more of the input events can cause the output event
AND gate – all of the input events must occur to cause the output event
Starting from a top-level hazardous event, the immediate events leading to this event are listed,
combined as appropriate with an “AND” or “OR” gate. Each sub-event is further analysed until
either a basic component fault is reached or a basic fault event is reached that cannot be
developed further. This can be the case either because the information is not available, or
because the fault is not sufficiently relevant to the analysis being conducted.
The completed fault tree can be used for further analyses:
1. A “minimal cut set” analysis can be performed using Boolean algebra on the tree. This identifies the minimal combinations of events that can lead to the top-level event in the tree (a small illustrative sketch follows after this list).
2. Occurrence rates for the top-level hazards can be calculated based on available
probability data for the lowest events in the tree.
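The following minimal sketch is our own illustration (the tree encoding and the event names are invented); it shows how minimal cut sets can be derived from a small fault tree by recursive Boolean expansion followed by absorption:

```python
# Sketch: cut-set generation for a small fault tree. Gates are nested tuples;
# basic events are strings.
def cut_sets(node):
    """Return the set of cut sets (frozensets of basic events) for a node."""
    if isinstance(node, str):                  # basic event
        return {frozenset([node])}
    op, *children = node
    child_sets = [cut_sets(c) for c in children]
    if op == "OR":                             # union of the children's cut sets
        return set().union(*child_sets)
    if op == "AND":                            # cross-product of the children's cut sets
        acc = {frozenset()}
        for cs in child_sets:
            acc = {a | b for a in acc for b in cs}
        return acc
    raise ValueError(op)

def minimal(sets):
    """Discard any cut set that is a superset of another (Boolean absorption)."""
    return {s for s in sets if not any(t < s for t in sets)}

# Toy tree: the top event occurs if the sensor fails, or if both channels fail.
tree = ("OR", "sensor", ("AND", "channel_A", "channel_B"))
print(minimal(cut_sets(tree)))
# -> {frozenset({'sensor'}), frozenset({'channel_A', 'channel_B'})}
```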
In summary, it can be concluded that Fault Tree Analysis is an effective method for performing
Hazard Occurrence Analysis. More specifically, it provides the following benefits over inductive
methods like FMEA:
• The analysis efforts are focused at those issues that really represent potential
problems, i.e. the identified hazards
• The FTA technique allows investigation of cases when two or more faults may together
lead to a hazard
• The FTA technique allows the identification of not-so-obvious causes of hazards. For example, it could be the case that a sensor signal has the following characteristics in a particular system:
• If the signal error is smaller than some limit ε0, the top-level hazard will not occur (simply because the error is small)
• If the signal error is larger than some limit ε1, the top-level hazard will not occur, since plausibility checks effectively detect such large errors
• Thus, the most critical case is when the magnitude of the signal error is between ε0 and ε1, particularly when it is very close to the upper limit ε1. This type of dependency would in principle be impossible to find by an inductive method such as FMEA.
It should be noted that fault trees may be generated automatically from a system model. The SETTA project [30], funded by the CEC under contract number IST-2000-10043, addressed the systems engineering of safety-related systems with a special focus on time-triggered systems. One of the objectives of the SETTA project was better integration of the functional development and safety analysis processes. Within SETTA, a prototypical fault-tree synthesis toolset was developed by DaimlerChrysler in co-operation with the University of York. The synthesis algorithm performs a backward traversal of the data flow graphs given by the system model. The toolset consists of a Matlab/Simulink GUI for failure mode annotations, a Simulink-to-XML converter and a fault-tree synthesis tool. The concept is illustrated in Figure C.8.
[Figure C.8: Fault-tree synthesis concept, from the annotated system model to the fault tree and its evaluation by cut-set analysis.]
The entity is the lowest level of component, system or function that will be examined in the analysis. In the original form of HAZOP, the entities were usually “flows” (of information or of an agent such as electricity, air pressure or a fluid). However, the entities can be any identified component of a system that has suitable attributes.
The attribute is an identifiable state or property of the entity.
The guideword describes a deviation from the intended design behaviour. There is a basic standard set of guidewords, although these usually need to be interpreted in the context of the analysis being undertaken. Although HAZOP has been applied in many different contexts, it has been found that this basic set of guidewords is always applicable, even if the guidewords require interpretation.
The standard guidewords and their generic meanings are [10], [11]:

| Guideword (generic properties) | Meaning |
|---|---|
| No | The complete negation of the design intention: no part of the intention is achieved and nothing else happens |
| More | A quantitative increase over what was intended |
| Less | A quantitative decrease over what was intended |
| As well as | All the design intention is achieved together with additions (i.e. a qualitative increase over what was intended) |
| Part of | Only some of the design intention is achieved (i.e. a qualitative decrease over what was intended) |
| Reverse | The logical opposite of the intention is achieved |
| Other than | Complete substitution, where no part of the original intention is achieved but something quite different happens |

| Guideword (timing) | Meaning |
|---|---|
| Early | Something happens earlier than expected, relative to clock time |
| Late | Something happens later than expected, relative to clock time |
| Before | Something happens before it is expected, relating to order or sequence |
| After | Something happens after it is expected, relating to order or sequence |
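Since a HAZOP study systematically examines every combination of entity, attribute and guideword, the bookkeeping can be sketched in a few lines. This is our own illustration: the entities and attributes are invented examples, while the guidewords are the standard set tabulated above.

```python
# Sketch: generating HAZOP study prompts from entities, attributes and the
# standard guidewords. Each prompt would then be examined for credible
# causes and consequences.
GUIDEWORDS = ["No", "More", "Less", "As well as", "Part of", "Reverse",
              "Other than", "Early", "Late", "Before", "After"]

entities = {
    "wheel speed signal": ["value", "timing"],
    "brake demand message": ["value", "timing", "routing"],
}

for entity, attributes in entities.items():
    for attribute in attributes:
        for guideword in GUIDEWORDS:
            print(f"'{guideword}' deviation of {attribute} of {entity}")
```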
Besides Fault Tree Analysis, there are other "tree-based" analysis techniques such as event-tree
analysis (ETA) and cause-consequence analysis (CCA). These are briefly overviewed in the
following subsections.
Event tree analysis is a method for illustrating the sequence of outcomes which may arise after the occurrence of a selected initiating event.
Unlike a fault tree, an event tree is based on inductive reasoning. It is mainly used in consequence analysis, for pre-incident and post-incident application.
An example of an event tree is given in Figure C.9. The left side connects to the initiating event and the right side to the damage states; the column headings identify the systems involved; the nodes carry branching probabilities obtained from the system analysis. If the path goes up at a node, the corresponding system succeeded; if it goes down, it failed.
It should be noted that the probabilities of the final outcomes in Figure C.9 are approximated based
on the assumption that the "fails" probabilities are much less than unity. For example, the
probability P1 x P5 is more correctly described as P1 x (1-P2) x (1-P3) x (1-P4) x P5.
[Figure C.9: Event tree for the initiating event “short circuit” (probability P1), with successive branches for detection of the short circuit (fails with probability P2), selection of appropriate action (fails with P3), switch to backup system (fails with P4) and availability of the backup system (unavailable with P5). The failure outcomes have approximate probabilities P1 × P5, P1 × P4, P1 × P3 and P1 × P2.]
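The arithmetic behind this approximation can be illustrated with a short sketch (our own illustration; the numeric probabilities are invented example values):

```python
# Sketch of the event-tree arithmetic for the short-circuit example above.
P1 = 1e-3   # short circuit (initiating event)
P2 = 1e-2   # detection of short circuit fails
P3 = 1e-2   # selection of appropriate action fails
P4 = 1e-2   # switch to backup system fails
P5 = 1e-2   # backup system unavailable

# Exact probability of the path "everything succeeds except the backup":
exact = P1 * (1 - P2) * (1 - P3) * (1 - P4) * P5
# Approximation used in the figure, valid when the "fails" probabilities
# are much less than unity:
approx = P1 * P5

print(exact)    # about 9.7e-06
print(approx)   # 1e-05
```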
Details on how to carry out event tree analysis as well as its benefits and restrictions are
documented in literature [12], [14].
The ETA method appears to have rather limited applicability for hazard occurrence analysis in the context of the EASIS Dependability Activity Framework.
Cause-consequence analysis (CCA) is a blend of fault tree and event tree analysis. This technique
combines cause analysis (described by fault trees) and consequence analysis (described by event
trees), and hence deductive and inductive analysis is used.
The purpose of CCA is to identify chains of events that can result in undesirable consequences.
With the probabilities of the various events in the CCA diagram, the probabilities of the various consequences can be calculated, thus establishing the risk level of the system. Figure C.10 shows a typical CCA.
This technique was invented by the Risø laboratories in Denmark for use in the risk analysis of nuclear power stations. However, it can also be adapted by other industries for estimating the safety of protective and other systems.
Details on how to carry out cause consequence analysis as well as the benefits and restrictions of
it are documented in literature [13], [14].
The tree-based methods are mainly used to find the cut-sets leading to undesired events. Event trees and fault trees have been widely used in probabilistic risk assessment to quantify the probabilities of occurrence of accidents and other undesired events leading to loss of life or economic losses.
However, the use of fault trees and event trees is confined to static, logic modelling of accidents. By giving the same treatment to hardware failures and human errors in fault tree and event tree analysis, the conditions affecting human behaviour cannot be modelled explicitly. This affects the assessed level of dependency between events.
A brief description of the most common techniques for the analysis of dynamic systems is given in this section. The main objective is to give a conceptual overview of these techniques, together with literature references where their specific features, such as strengths, weaknesses and difficulties of application, can be evaluated in more detail.
Markov modelling allows analysis of the time-dependent behaviour of dynamic systems. The
transitions between system states are modelled as stochastic events. In a time-homogeneous
continuous-time discrete-state Markov process, each possible transition is associated with a
constant rate that represents the probability of the transition firing within an infinitesimally small
time interval divided by the length of that interval.
The system state probabilities P(t) in a continuous Markov system analysis are obtained by the solution of a coupled set of first-order, constant-coefficient differential equations known as the Chapman-Kolmogorov matrix equation:

dP(t)/dt = M · P(t)

where M is the matrix of coefficients whose off-diagonal elements are the transition rates and whose diagonal elements are such that the matrix columns sum to zero.
Markov models are particularly useful for analysis of fault-tolerant systems in which repair and/or
recovery is possible. An example is given in Figure C.11 for the case when two faults together may
cause a failure. Each fault has the occurrence rate λ and the repair (or recovery) rate is µ. The
probability of the state "Failure of system" may be calculated from the rates λ and µ.
[Figure C.11: Markov model of a system in which two faults together cause a failure: state 0 (fault-free) transitions to state 1 (one fault) at rate 2λ; state 1 returns to state 0 at the repair rate µ and transitions to state 2 (failure of system) at rate λ.]

For this model the Chapman-Kolmogorov equation reads:

| P0'(t) |   | -2λ     µ     0 |   | P0(t) |
| P1'(t) | = |  2λ  -(µ+λ)   0 | · | P1(t) |
| P2'(t) |   |   0     λ     0 |   | P2(t) |
An application of Markov modelling to a hold-up tank problem is discussed in the literature [15], while a Paté-Cornell study on fire propagation for a subsystem on board an off-shore platform is presented in [16]. An approach for the symbolic approximation of the state probabilities of Markov models of repairable fault-tolerant systems is described in [31].
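As an illustration of how the two-fault example above can be evaluated numerically, the following sketch (our own, with invented example rates) uses the matrix exponential P(t) = e^(Mt) P(0):

```python
# Sketch: numerical solution of the two-fault Markov model above.
import numpy as np
from scipy.linalg import expm

lam = 1e-4   # fault occurrence rate per hour (invented example value)
mu = 1e-1    # repair/recovery rate per hour (invented example value)

# Columns sum to zero, as required by the Chapman-Kolmogorov formulation.
M = np.array([
    [-2 * lam,          mu, 0.0],  # state 0: fault-free
    [ 2 * lam, -(mu + lam), 0.0],  # state 1: one fault present
    [     0.0,         lam, 0.0],  # state 2: failure of system (absorbing)
])

P0 = np.array([1.0, 0.0, 0.0])     # start in the fault-free state
t = 10_000.0                       # hours
P_t = expm(M * t) @ P0             # P(t) = e^{Mt} P(0)
print(P_t)                         # probability of each state at time t
```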
The dynamic event tree analysis method (DETAM) is an approach that treats the time-dependent evolution of plant hardware states, process variable values and operator states over the course of a scenario (e.g. an accident). In general, a dynamic event tree is an event tree in which branchings are allowed at different points in time.
This approach is defined by five characteristic sets:
- the branching set,
- the set of variables defining the system state,
- the branching rules,
- the sequence expansion rules, and
- the quantification tools.
The branching set refers to the set of variables that determine the space of possible branches at any node in the tree. The branching rules are used to determine when a branching should take place (e.g. at a constant time step). The sequence expansion rules are used to limit the number of sequences.
This approach can be used to represent a wide variety of operator behaviours, to model the consequences of operator actions, and to serve as a framework for the analyst to employ a causal model for errors of commission. It thus allows emergency procedures to be tested and supports identifying where and how changes can be made to improve their effectiveness. An analysis of the accident sequence for a steam generator tube rupture is presented in the literature [17].
The techniques discussed above address the deficiencies found in fault/event tree methodologies when analysing dynamic scenarios, i.e. systems whose configuration changes over time. However, there are also limitations to their usage.
Markov modelling requires the explicit identification of all possible system states and the transitions between these states. This is a problem, as it may be difficult to envision the entire set of possible system states.
DETAM can solve this problem through the use of an implicit state-transition definition. The drawbacks of such implicit techniques are implementation-oriented. With the large tree structures generated by the DETAM approach, large computing resources are required. The second problem is that the implicit methodologies may require a considerable amount of analyst effort in data gathering and model construction.
Common cause failure analysis studies those system-internal failures that for some reason cannot be considered independent of each other.
C.5.1 Definitions
The terminology concerning common cause failures has changed over the years. Traditionally, only common mode failures were considered. Later, the term common cause failure was introduced, referring to a wider group of failures and superseding common mode failures. However, at that time the idea that common cause failure was synonymous with common mode failure was widespread.
The difference between common cause and common mode failures was clarified when the term dependent failures was introduced to supersede and encompass common cause failures, common mode failures and “cascade failures”. Cascade failures include all dependent failures that are not common cause failures.
The safety and reliability directorate of the United Kingdom Atomic Energy Authority gives the
following definitions of dependent, common cause, common mode and cascade failures [28] and a
graphical explanation is provided in Figure C.12:
• Dependent failure: the likelihood of a set of events, the probability of which cannot be
expressed as simple product of the unconditional failure probabilities of the individual
events.
• Common cause failure: this is a specific type of dependent failure that arises in redundant components, where simultaneous (or near-simultaneous) multiple failures occur in different channels as the result of a single shared cause.
• Common mode failure: This term is reserved for common-cause failures in which
multiple items fail in the same mode.
• Cascade failure: These are all those dependent failures that are not Common Cause,
i.e. they do not affect redundant components.
Figure C.12 Graphical relationship between dependent failures, common cause failures (including common mode failures) and cascade failures
Given two dependent events A and B, the probability that both events A and B happen is not equal
to the product of the two unconditional probabilities:
Prob(A and B) = Prob(A) • Prob(B|A) = Prob(B) • Prob(A|B) ≠ Prob(A) • Prob(B)
Dependent failure is thus the broadest category, including both common cause/mode failures and cascade failures.
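As a purely hypothetical numeric illustration: if Prob(A) = Prob(B) = 0.01 but a shared cause makes Prob(B|A) = 0.5, then Prob(A and B) = Prob(A) • Prob(B|A) = 0.01 • 0.5 = 0.005, whereas an (incorrect) independence assumption would predict Prob(A) • Prob(B) = 0.0001, underestimating the joint failure probability by a factor of 50.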
The definitions in section C.5.1 are not applicable to typical automotive architectures of today since
redundant components are rarely employed. They are however highly relevant for future
automotive architectures in which redundant units are used to implement e.g. Steer by Wire (see
Figure C.13).
Figure C.13 Redundant Steer-by-Wire architecture with two Fail-Silent Units (FSU1, FSU2), each with basic control and actuator, supplied from two batteries (Bat1, Bat2) via a dual bus connection
Figure C.14 Example of FSU, comprising a main CPU and a supervisor with internal power supply, communication interface and external communication
With this enlarged definition, common mode failure is a category of special interest. A common
mode failure in both the main CPU and the supervisor will not be detected at the FSU level and will
propagate. This is in contrast to other common cause failures, which will be detected by the
comparators between the supervisor and the main CPU.
Examples:
• Common cause (but not common mode) failure: a high intensity electromagnetic field may cause a fault in the main CPU and a different fault in the supervisor, as their HW is different. The consequences of the faults will be detected by comparison of internal values between the main CPU and the supervisor, switching the FSU into fail-silent mode. Then the other FSU (located in another area of the vehicle, well protected against the electromagnetic field) will take over.
• Common mode failure: a common specification fault in an algorithm used in the main
CPU and the supervisor will not be detected by comparisons between the two channels.
It will thus propagate to the output and may cause a spurious action.
As these examples show, a common mode failure is a particular type of common cause failure that
is neither detectable by comparators nor by voters.
Cascade failures are all dependent failures that are not common cause failures. So they relate to
dependent failures affecting a single channel.
Examples:
- At ECU level, failure of the voltage regulator will make the other sections of the ECU
unavailable all at once.
- At ECU level, a high intensity electromagnetic field may cause both the CPU to behave
erratically and its watchdog to fail silent
- At system level, failure of a speed sensor could produce the total/partial loss of many
functions: ABS, Cruise Control, Driver Speed Display…
A safety barrier is a function whose purpose is to avoid the propagation of an internal fault up to
the output of the system. Safety barriers are put in places where propagation of internal faults may
cause hazardous failures.
Examples:
- A watchdog to detect and handle wrong execution of the CPU program (a minimal sketch is given after this list).
- A supervisor to detect and handle wrong acquisitions, wrong calculations and wrong
execution of program of a CPU
- A redundant low beam lamp (“loss of one lamp” will not propagate to the hazard “loss of
front lighting”)
- A redundant Fail-Silent Unit (FSU) for an X-by-Wire system
- A plausibility check on some data
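As an illustration of the first barrier in the list above, the following minimal C model shows the watchdog principle. The cycle counts and the fault-injection step are invented for this sketch and do not describe any real watchdog peripheral.

    #include <stdio.h>
    #include <stdbool.h>

    #define WDG_TIMEOUT 3   /* cycles allowed between two watchdog services */

    static int wdg_counter = WDG_TIMEOUT;

    static void wdg_service(void) { wdg_counter = WDG_TIMEOUT; }

    /* Returns true when the watchdog expires, i.e. when the safe state
     * must be entered.                                                  */
    static bool wdg_tick(void) { return --wdg_counter < 0; }

    int main(void)
    {
        for (int cycle = 0; cycle < 10; cycle++) {
            bool program_flow_ok = (cycle < 5);  /* inject a fault from cycle 5 on */

            if (program_flow_ok)
                wdg_service();  /* normal execution services the watchdog */

            if (wdg_tick()) {
                puts("watchdog expired -> outputs forced to fail-silent state");
                return 0;       /* the barrier stops the fault from propagating */
            }
        }
        return 0;
    }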
Dependent failure events do not necessarily cause more acute problems than independent
failures.
Example:
- At ECU level, the cascade failure of the voltage regulator causing the failure of both the CPU and its watchdog will only result in the unavailability of the CPU; it will fail silent. The loss of the watchdog has no consequence in that case, so this is not a more severe failure than the loss of the CPU alone.
A dependent failure event causes more severe problems when it causes both an internal fault and the inhibition of a safety barrier whose purpose is to avoid propagation of that fault (see Figure C.15). The problem in that case is that the probability of the hazardous failure is as high as if there were no safety barrier.
Example:
- At ECU level, a high intensity electromagnetic field may cause both the CPU to behave erratically and its watchdog to fail silent. The consequence may be a spurious actuator command. In this case, the failure of the WD has a consequence.
Figure C.15 Sequence of events leading to a problematic dependent failure
The root cause explains the mechanism underlying the transition of a component from the available state to a failed or functionally unavailable state.
For example:
- If two components are located in the same enclosure and they are susceptible to high
humidity, a common cause failure could occur as a result of an event outside the
enclosure but causing high humidity in the enclosure. In this case high humidity is the
root cause of failure for the two components.
Given the existence of the root cause, the coupling factor explains why a particular cause affects
several components. It creates linking conditions to cause multiple components to fail in a
correlated fashion.
For example:
- Location in the same enclosure is a coupling factor for those components susceptible to
high humidity.
Figure C.16 shows the mechanism of failure of multiple components. When there is a coupling
factor (e.g. same location) and a trigger event (e.g. failure of an air conditioning system) occurs,
the root cause (e.g. high humidity) results in multiple component failures.
Figure C.16 Mechanism of failure of multiple components: a root cause, acting through a coupling factor, causes components a, b, …, n to fail
This section discusses the use of FTA and FMEA to identify and analyse dependent failures. It also describes methods used in the most significant standards and how they are applicable to the automotive field.
FTA is a well adapted method for analysis of dependent failures. It shows where to focus the
identification process of harmful dependencies. Every “AND” gate of the FTA will be carefully
analysed, to see if the events in the input paths are truly independent.
Every hazard of the system, identified in the PHA, may be analysed by FTA at system level and
then detailed by FTA at component level. FTA at system level will show where to look for common
cause and common mode failures at system level. FTA at component level will also show where
cascade failure modes may be harmful.
FTA may be a qualitative analysis, or a quantitative one where random HW failures are concerned. However, even when the dependent failure analysis is based on a quantified FTA, it is difficult to imagine how the dependency analysis itself could be quantified: the problem is to quantify the level of dependency. Dependent failure analysis based on FTA should therefore be regarded as a qualitative method only.
As dependent failure events cause particular problems only when they cause both an internal fault and the inhibition of a safety barrier whose purpose is to avoid propagation of that fault, "Runtime FMEA" may also be used to focus the identification process on harmful dependencies. In a "Runtime FMEA", the detection measures are safety barriers, so every FMEA line where the failure effect is hazardous is analysed to assess the dependence between the cause column and the detection measure column.
The aerospace industry uses the term common cause analysis to address what the nuclear industry calls dependent failure analysis. In common cause analysis, three different issues are identified, which are addressed with zonal safety analysis, particular risks analysis and common mode analysis [24], [25].
Zonal Safety Analysis addresses concerns regarding equipment installations, interference between systems, the robustness of the system against possible maintenance errors, and the independence claims made for equipment installed in close proximity on the aircraft.
The whole aircraft is divided into several zones and for each of these zones a zonal safety analysis
is performed. The objective of the zonal safety analysis is to ensure that the system design meets
the safety objective with respect to:
- Basic installation;
- Effect of failures on aircraft;
- Implication of maintenance errors;
- Verification that the design meets the FTA independence claims.
Particular Risk Analysis addresses specific events listed by airworthiness regulations that
potentially may cause a failure inside the system itself. For each risk the possible consequences
for the whole aircraft should be evaluated; if one of the risks may affect safety, proper measures
should be taken. These particular risks may influence several zones.
The following particular risks are set out in [24].
- Fire
- High energy devices (non-containment):
o Engine
o Auxiliary Power Unit
o Fans
- High pressure bottles
- High pressure Air Duct Rupture
- High temperature Air Duct Leakage
- Leaking fluids:
o Fuel
o Hydraulic
o Battery acid
o Water
- Hail, Ice, Snow
- Bird strike
- Tyre burst, flailing tread
- Wheel rim release
- Lightning strike
- High Intensity Radiation Fields
- Flailing Shafts
- Bulkhead rupture
Basically, components with the same hardware and software could be susceptible to common
mode failures due to couplings arising from particular risks, or other causes. Therefore the
principal task of the analysis is to look for couplings and to evaluate to what extent ‘root causes’
could affect coupled components.
Identifying coupling is the major task and is very much dependent on the expertise of the analyst;
several check lists have been tailored to help in discovering couplings.
It is important to point out that common cause failure analysis in the aerospace industry is purely a
qualitative analysis.
Where two zones are segregated by a firewall, for example, they can be considered as independent (except with respect to particular risks).
Common Mode Analysis is applicable at system and ECU level as it provides an assessment that
the independence claims made in the Fault Tree Analysis are valid.
The standard [27] defines common cause failure as "failure, which is the result of one or more events, causing coincident failures of two or more separate channels in a multiple channel system, leading to system failure".
It stipulates qualitative methods to analyse common cause failures [27]:
- General quality control
- Design reviews
- Verification and testing by an independent team
- Analysis of real incidents with feedback on new development
For software [27] recommends (or highly recommends, depending on the SIL level) common cause
analysis of diverse software.
It also proposes [27] a quantitative method to evaluate the occurrence of common-cause hardware failures. This method is relevant only for multi-channel systems. It gives a relation (the β factor) between the occurrence of common-cause hardware failures on two or more channels and the occurrence of random hardware failures on a single channel; a numeric illustration is given after the list below. The β factor is computed using checklists on the following subjects:
- Separation / Segregation
- Diversity / Redundancy
- Complexity / design / maturity / experience
- Assessment / analysis and feedback of data
- Procedures / human interface
- Competence / training / safety culture
- Environmental control
- Environmental testing
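As a purely numeric illustration of the β factor relation (the rates, operating time and β value below are invented, and the calculation is a simplification of the method in [27] that ignores repair):

    #include <stdio.h>

    int main(void)
    {
        const double lambda = 1e-7; /* random HW failure rate per channel, per hour */
        const double beta   = 0.02; /* assumed common cause fraction (invented)     */
        const double t      = 1e4;  /* operating time, hours                        */

        /* Common cause failures defeat both channels at once, at rate beta * lambda. */
        const double p_ccf = beta * lambda * t;

        /* Both channels failing independently within t: ((1 - beta) * lambda * t)^2. */
        const double p_ind = ((1.0 - beta) * lambda * t) * ((1.0 - beta) * lambda * t);

        printf("P(dual failure, common cause): %e\n", p_ccf); /* ~2e-5 */
        printf("P(dual failure, independent):  %e\n", p_ind); /* ~1e-6 */
        return 0;
    }

Even a small β can thus dominate the probability of losing both channels, which is why the checklists above aim at justifying a low β.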
Moreover, the questions asked in these checklist tables are not always applicable to the automotive field.
Generally, software development does not follow the same rules and constraints that are traditionally associated with hardware development. Several important issues should be considered when applying the qualitative analysis reported in section C.4 to automotive software development:
• In the automotive industry, the traditional "waterfall" software development process is being
replaced more and more by "spiral" or "iterative" development methods. This has become
necessary to keep in line with the rapidly increasing and changing customer expectations.
Thus, the software is often under continuous development and improvement. A result of this is that software FMEAs or FTAs are potentially very difficult and resource-intensive to maintain in line with ongoing changes.
• The results of a software FMEA or FTA should be available early enough in the
development process to have an added value in the series product and development
processes. Detailed software FMEAs and FTA are potentially very time consuming.
• Detailed analysis of potential types of software faults can usually be translated to
equivalent faults on the high level software architectural level.
To understand the concept of equivalent faults, consider an example with a lateral acceleration signal that has been corrected for offset and gain before being used in the control system. A microprocessor calculation error (one fault type listed in Ref [18]) could potentially result in the lateral acceleration variable being permanently set to 2.0 m/s² although the vehicle is driving straight ahead. From a system level, it is not important whether the microprocessor calculation error occurred in the offset correction module or in the gain correction module. In both cases, the signal used later in the system has the same content and the effect on the system can be considered equivalent.
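A minimal C sketch of this example; the signal chain, the constants and the fault variants are all invented for illustration:

    #include <stdio.h>

    /* Intended processing chain for the lateral acceleration signal. */
    static double offset_stage(double raw) { return raw - 0.15; }  /* invented offset */
    static double gain_stage(double v)     { return v * 1.02;  }   /* invented gain   */

    /* Faulty variants: a calculation error leaves the stage output stuck at a
     * wrong value (chosen here so that both faults give 2.0 m/s^2 downstream). */
    static double offset_stage_faulty(double raw) { (void)raw; return 2.0 / 1.02; }
    static double gain_stage_faulty(double v)     { (void)v;   return 2.0; }

    int main(void)
    {
        double raw = 0.15;  /* driving straight ahead: true value is 0 m/s^2 */

        double healthy = gain_stage(offset_stage(raw));
        double fault_a = gain_stage(offset_stage_faulty(raw)); /* fault in offset module */
        double fault_b = gain_stage_faulty(offset_stage(raw)); /* fault in gain module   */

        /* From the system level both faults are equivalent: the control system
         * sees the lateral acceleration permanently stuck at 2.0 m/s^2.        */
        printf("healthy: %.2f   fault A: %.2f   fault B: %.2f\n",
               healthy, fault_a, fault_b);
        return 0;
    }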
The application of quantitative analysis to automotive software is generally considered
impracticable since it is not possible to quantify the probabilities of software faults.
This problem is addressed by the introduction of the concept of Development Assurance Level
[24], [25].
The "Development Assurance Level” addresses the SW development process to introduce
appropriate degree of process control, with the purpose of achieving qualitative indicators that a
system meets its safety objectives.
A large number of comprehensive recommendations for software development processes have
been issued together with assessment reference models by the most common International
Standards (see ref. [26], [27]).
Understandably a number of papers use slightly differing terminology and descriptions to explain
software FMEA techniques. Similarly the scope of components to be analyzed can vary
significantly.
In this document, an FMEA as applied to software is called a software FMEA, although this analysis can be carried out at different levels of detail. The paper "Software FMEA Techniques" [20] also talks about two levels of software FMEA, called "System Software FMEA" and "Detailed Software FMEA".
In this document, a software FMEA corresponds closely to a "detailed software FMEA", and a system FMEA that includes the software architecture corresponds closely to the "system software FMEA", but includes not only the key software components but also the system hardware components.
C.6.1.1 Software FMEA Compared to System Level FMEA that includes SW Architecture
The FMEA examples that follow will briefly discuss the potential results from three FMEA methods applied to the development of automotive software. Before proceeding, however, it is useful to review a simple software example that indicates the types of issues that arise.
The example selected is a subroutine that should return the highest of three numbers. It quickly becomes clear that numerous potential faults could be contained within this type of calculation. Typical fault examples could be a coding mistake that results in calculating with mismatched intermediate data types, returning the lowest variable, or writing the result to the wrong location. In effect, the number of potential faults is nearly endless even for this small calculation, which indicates that such a software analysis is being conducted either at too low a level or with inappropriate ‘failure’ modes. Two of these fault variants are sketched below.
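A minimal C sketch of the subroutine and two of the many conceivable faulty variants (all invented for illustration):

    #include <stdio.h>

    /* Intended implementation: return the highest of three numbers. */
    int max3(int a, int b, int c)
    {
        int m = a;
        if (b > m) m = b;
        if (c > m) m = c;
        return m;
    }

    /* Fault example 1: comparison direction reversed -> returns the lowest. */
    int max3_fault1(int a, int b, int c)
    {
        int m = a;
        if (b < m) m = b;
        if (c < m) m = c;
        return m;
    }

    /* Fault example 2: mismatched intermediate data type -> silent truncation
     * (300 typically wraps to 44 on two's complement targets).               */
    int max3_fault2(int a, int b, int c)
    {
        signed char m = (signed char)a;   /* intermediate too small for int range */
        if (b > m) m = (signed char)b;
        if (c > m) m = (signed char)c;
        return m;
    }

    int main(void)
    {
        printf("%d %d %d\n", max3(1, 300, 7), max3_fault1(1, 300, 7),
               max3_fault2(1, 300, 7));   /* prints "300 1 44" */
        return 0;
    }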
Approaching potential faults from another direction, it is always possible to construct faults that
result in the output taking any value possible. As such, it appears to make sense for a software
FMEA to assume that a fault will result in the worst case for the output of a function. Extrapolating
this result to larger software functions still results in potential faults that cause the outputs of this
larger function to take on any value possible.
Since the potential effects on the vehicle clearly depend on the system functionality and its ability to influence vehicle operation via actuators, communication buses etc., it appears to make sense to consider software faults at the high system architecture level rather than at the detailed software level.
This leads to the conclusion that system level FMEA (or ETA) methods including software at the
architectural level, together with additional process and product requirements that result from a
very generic software FMEA (see following chapter) can provide the vast majority of the value
which may be available from performing a very detailed, product specific software FMEA.
Other examples in the following chapters also lead to the suggestion that recommendations developed from a detailed automotive software FMEA will tend to be similar, independent of the actual product being developed. That is to say, the recommendations would be the same whether the product is a windscreen wiper, a rear heated window or a steering system. Furthermore, it is suggested that the methods to resolve the potential issues that arise are well documented in numerous process and coding standards and other types of documentation already available.
It is however recognized that a high level analysis cannot identify specific types of faults at all points where the corruption of individual variables may couple through the software system to result in critical failures. The effectiveness of this will be discussed further in the safety FMEA section. Some recognition of the cost of the various analyses may be appropriate: difficult or costly analyses should generally be limited to safety critical systems. It is likely that analysis methods at the system level will be considerably more cost effective to create and maintain than those at a detailed software level. This conclusion is also reached in reference [20].
The system level FMEA or equivalent should include typical types of microprocessor faults such as
execution errors (see ref [20]) and the effects of critical variables being corrupted.
There are a number of FMEA methods and formats used in the development of automotive
systems that could be applied to the software development.
Some examples include:
- Process FMEA
- Component FMEA
- Safety FMEA
In all of these FMEA methods there are values assigned for severity, detection and occurrence that
can have different meanings for the different types or styles of FMEA.
- In a process FMEA, the method often involves consideration of what could happen in the manufacturing (development) process, what effect this could have, how likely this may be, and ensuring that measures are in place to detect any issues that could occur so that these can be corrected.
- In a component FMEA, the method often involves consideration of what could happen
in the finished product (or component), what effect this could have, how this would be
detected in the development process (and field) and what is the probability that this may
occur.
- In a safety FMEA, the method often involves the assumption that particular faults will occur, and then considers the effects of this occurrence in the field. Only if the effects of the occurrence are considered unacceptable will the probability of this occurrence be considered, to assess whether the system (or component) meets acceptable safety standards.
In all of the FMEA examples it is usually necessary to clearly define whether the severity of the occurrence is analysed assuming that a fault detection and correction feature has worked as designed (either software, hardware or other detection and correction mechanisms). For example, if an engine controller detects a sensor error and switches to a limp home function that allows the engine to keep running, the severity of this is usually considered lower than if the engine stops running altogether.
These various types of FMEA can be applied to the software and development process and
product. The following chapters discuss potential results of such analysis methods related to
software development.
There are a number of stages in the software development process where faults could be
introduced into the production product. Analysis of the software development process will usually
identify these possibilities. For example faults may be introduced during the following development
stages:
- Requirements
- Specification
- Design
- Implementation & Coding
Analysis of the types of faults that can be introduced using a process FMEA typically result in the
need for appropriate verification and validation steps in the software development process that
covers all stages of development.
In the case of software development there are already a large number of comprehensive
recommendations for software development processes available together with assessment
reference models (for example [26] - Spice and CMM(I)). As such, the developer may ask if a
software process FMEA is necessary or if it is more effective to align to or tailor one of the more
recognized and emerging standards together with regular assessments designed to uncover weak
points in the development process and product.
When reviewing the fault types in a software component FMEA, the large number of possible fault types and the difficulty in assessing the effects of these different types of faults allow the possible failure modes to be reduced to two categories:
- The software output or operation is correct
- The software output or operation is incorrect (for example too high or too low)
This can be justified by analysing an example with a sensor input signal represented within the software as a 16-bit signed variable, as sketched below. If the variable is higher than the correct value at a given point in time, this may have been caused by a signal overflow, a wrong gain factor, a wrong offset value or many other fault types. The number of potential causes is almost unlimited.
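A small C illustration of how several unrelated fault types collapse into the same two observable categories; all values are invented:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        int16_t correct = 1000;  /* true sensor value (invented scale) */

        /* Three unrelated causes, one observable failure mode: "value incorrect". */
        int16_t overflow     = (int16_t)(30000 + 30000); /* overflow wraps the value */
        int16_t wrong_gain   = (int16_t)(1000 * 3);      /* wrong gain factor        */
        int16_t wrong_offset = (int16_t)(1000 + 500);    /* wrong offset value       */

        printf("correct=%d overflow=%d gain=%d offset=%d\n",
               correct, overflow, wrong_gain, wrong_offset);
        return 0;
    }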
Additionally, the effects of a particular type of fault can also be reviewed. In this case the EASIS document "Description of Fault Types for EASIS" [18] was used and equivalent faults were applied to the software. Analysis of these types of faults suggests that their effects may be unpredictable in the worst case. For example, if a software module writes results to the wrong address (using a wrong pointer), this could potentially result in any variable being too high or too low, and it is almost impractical to predict what effect this may have on the system for all variables and contents.
The developer may ask if a software component FMEA is necessary or if it is more effective to
align to one of the more recognized and emerging standards together with regular assessments
designed to uncover weak points in the development process and product.
Safety FMEAs are usually concerned only with analysing the safety aspects of a system. The severity scale is weighted more heavily towards the safety criteria, with increased resolution for safety-relevant effects. For example, a system switching into a safe degraded mode of operation with a warning lamp may be rated 5 from a safety viewpoint, whilst in other FMEAs the severity for a warning lamp may be 8 or 9. Priority is then given to analysing cases with potentially high severity.
FMEAs performed to support safety evaluation of products are focused on examining the product
for its immunity to single point failure events. That is, single point failures that could result in a
safety event must be shown to be eliminated from the design. That is typically extended to include
those dual point failure events which include an initial failure that is not detectable by the hardware
and/or software of the system. Safety related FMEAs have included system, software, and
hardware. These FMEAs are usually separate, with software FMEAs being performed both at the
system and detailed levels.
Software FMEAs applied to a product for evaluation of safety aspects of the system have been
shown to be of value. However, they are both expensive and time consuming at the detailed level.
Incorporation of the system level software FMEA of reference [20] into a combined hardware and
software system level FMEA may be advantageous both technically and for cost reasons. Most of
the value provided by software FMEAs is provided by the system level evaluation. However,
evaluation only at the system level cannot completely ensure that single point failure paths are not
included in the system and software design.
Verification of immunity from single point failures due to corruption of single variables has been one of the crucial outputs of detailed software FMEA. These FMEAs, while requiring significant resource commitments, have been feasible for many safety critical designs, including electronic steering systems by various tier 1 suppliers. Completion of a detailed SW FMEA in a time frame that supports the design process has been contingent on the use of a traditional ‘waterfall’ type development process, a well structured design, and a comparatively small software size (less than 128K compiled size).
Upcoming designs appear to be unlikely to routinely include these properties. The increasing size
of automotive software, coupled with a conversion to a spiral development process, limit the
effectiveness of current methods of detailed software FMEA. Completion of these FMEAs in a time
frame that supports design appears problematic. This makes it difficult to recommend the method
except for relatively small, safety critical systems. Automated aids may eventually have some potential to extend the method to the larger systems now being developed in the automotive industry. Such automated aids do not currently exist.
The EASIS consortium concludes that whilst a software FMEA may have benefits in automotive
software development, the results and recommendations from a low level, detailed software FMEA
cannot generally be completed in a time frame that adequately supports the spiral design process
which is becoming more standard in the automotive industry. Process-type considerations are generally well documented in the many comprehensive process and coding standards and other types of documentation available.
For a particular product, a system level FMEA including key components of the software
architecture is likely to provide all of the value of the current system level software FMEA and
much of the value of the detailed software FMEA when used in conjunction with numerous, existing
process and product recommendations. The use of detailed software FMEAs cannot be
recommended at this time except in limited cases. The current state of technology for detailed
software FMEAs does not allow cost effective and timely completion of the analyses when a spiral
(incremental) development process is in use.
The benefits of a system level product and process approach are potentially equivalent value, available earlier, with less resource, and more practical to maintain. The system level analysis (FMEA or equivalent) should include typical types of microprocessor hardware faults such as execution errors (see ref [20]) and the effects of critical software variables being corrupted. The development process must include validation.
A general overview of Fault Tree Analysis (FTA) is given in section C.4.3.2 of this appendix. The
idea of applying the FTA technique specifically to the software has been a topic of some interest in
the academic and industrial communities for some time. However, a number of observations
indicate that Software FTA may have significant limitations for automotive software. Some of these
observations are related to automotive-specific characteristics of software development, whereas
others are concerned with the software FTA method itself. These observations are elaborated in
the following subsections.
Short time to market, limited costs and a high degree of configurability are all essential features required of automotive software. These features, combined with the need to reuse, combine and improve different versions and modules of pre-developed software, have led to a change in the way automotive software is created. Model-based techniques are now increasingly used.
With the Model Based Approach the software development can be considered as a chain of
activities starting with the specification, passing through the modelling and its validation and ending
with automated code generation. Errors occurring in these activities are clearly not detectable by a
deductive technique of analysis such as FTA. In reality, this problem is primarily addressed by
introducing appropriate verification and validation steps in the software development process,
covering all stages of development.
The model based approach, widely used in automotive, seems to discourage the execution of
detailed software FTAs in favour of system level FTAs. At the system level the efficiency of
traditional FTAs conducted on the system physical components can be improved, when necessary,
by considering the software components described by the software model. In other words, a
specific deviation from the intended service to be provided by a specific software component can
be considered as a basic event in a system level FTA. Such an FTA would then include both
hardware-related and software-related basic events.
When applying FTA to software, and in particular to automotive software, it appears that some of
the main FTA strengths are lost. In fact:
• FTA is not suitable to assess the reliability of the software because it is not possible to
express this feature by a top event. This is a very important limitation as software reliability
has a high impact both on the safety and the availability of a system.
• The software architectures used in the automotive field are mainly simplex; no redundancies, except for monitoring and recovery functions, are provided. This means that the FTA output would be a simple list of elementary events without any conditional events (AND gates).
• Only a very limited class of failures originating from software can be considered
“deterministic”, hence manageable by an analytical approach.
So, the limited results expected from an FTA on a single software top event are generally considered disproportionate to the resources and time needed to assess the thousands of code lines typically related to one single top event.
C.6.4.3 Calibration
As already mentioned, automotive software is conceived and organized to be used in more than one application or in multiple variants of the same application.
As an example, we may consider the Electronic Stability Program (ESP) function often found in vehicles of the medium and large classes. From the OEM point of view, the management of the ESP function can be quite complex; the ESP could be “optional” on a particular vehicle model or could have different characteristics for different vehicle models (small / large sedan, station wagon, van, sport, etc.).
In order to take all these variations into account, the software shall be conceived to allow changes
of sets of parameters, usually called calibrations, to tailor the software to each vehicle model
variant. The final calibration typically takes place quite late in the product development lifecycle.
In many cases the calibration process involves the tuning of a large number of parameters for which only general guidelines can be created. Thus, the calibration is a potential source of errors whose effects cannot be efficiently predicted by analytical assessments.
The high potential for calibration-related problems, as indicated by field failure warranty data,
combined with the low efficiency of analytical methods has encouraged the search for alternative
methods for software assessment based on testing and process quality check.
At the current state of technology, detailed software FTAs are considered incompatible with the
development processes used in the automotive sector. However, software faults can be accounted
for in system-level FTAs by exploiting model based techniques where software is described in
terms of interacting “components”. In any case, the contribution of potential software defects to the
occurrence of system hazards can only be assessed qualitatively, not quantitatively.
In addition to system level FTAs, the primary means for hazard occurrence analysis with respect to
software should be an assessment of the processes for software development and quality
management.
This section describes an analysis method that is particularly suited to the early phases of system
and software development, when only outline information is available as an input to the analysis. It
requires, in terms of personnel, the technical know-how of system engineers, hardware engineers
and software engineers, with knowledge of the application. In terms of documentary input, it requires only a system block diagram – the more information this contains, the more useful the process output will be.
C.6.5.1 Overview
The primary goal of the analysis method is to determine whether any hazardous outputs exist at
the system block level. For the purposes of this analysis, the system block design includes both
hardware and software blocks, and any blocks of functionality that will need to be present in the
final design, but have not yet been allocated to either hardware or software.
The method can be used either for system analysis, where the whole system block diagram is
analysed, or for software analysis, where only the system blocks allocated to software are
analysed. The former is preferred, for two main reasons.
First, there may still be some blocks that are not allocated to either hardware or software. These may not be properly analysed if only hardware or software blocks are studied.
Second (and possibly more significant), there are occasions when hardware functional blocks can mitigate software failures, and vice versa. If the whole system is not analysed together, these synergies may be overlooked.
At this stage the System Block Diagram is colour coded to show the presence of mitigated
and unmitigated failures within the functional blocks at the system level.
6 Identify Residual Risk, in which the areas where risk remains after applying the identified mitigating mechanisms to the system design are identified. Corrective actions are proposed to reduce the risk from those areas. Examples of corrective actions include the addition of a mitigating function and, for software, specific checks on a particular variable.
It will be noted that the activities recommended for inclusion in this analysis method cover several
areas of the EASIS Dependability Framework. Specifically, steps 1 & 2 are basic preparation work
for the subsequent steps; steps 3 & 4 are part of ‘Hazard Identification and Classification’ (covered
in more detail in Appendix B); step 5 is part of the process of deducing Dependability
Requirements (for more of which please see Appendix D); and step 6 is split across other areas of
this appendix (residual risk) and Appendix D (corrective actions leading potentially to further
requirements).
In terms of documentation, the process takes as its input a document which exists within any state
of the art automotive development process, and produces as its output on the one hand
modifications & annotations on the system block diagram, and on the other hand corrective
actions, mitigations and design artefacts, all of which can be added to the change control
mechanisms and design processes used in the same state of the art development process.
This analysis method effectively uncovers shortcomings in the system design that can be simply
overcome by the addition of functionality (software or hardware) or other system modifications
(changes to the manual or service procedures), at a stage in the lifecycle when such addition and
change can be readily planned into the overall level of work. Similarly, it can highlight
shortcomings that can be mitigated in hardware at a stage before the hardware is finalised.
However, if the method is used before the System Block Diagram is properly stable, unnecessary
mitigations and corrective actions may be requested, leading to unnecessary cost. Additionally,
important mitigations and corrective actions may be overlooked, leading to less safe designs. This
disadvantage is easily overcome by ensuring the System Block Diagram is at an appropriate level
of maturity before attempting to apply the process.
C.7 References
[1] IST Program EASIS project, Technical Annex, Description of work, 2003.
[2] MIL-HDBK-217F. Reliability Prediction of Electronic Equipment, US
Department of Defense, 1991
[3] NPRD-95, Non-electronic Parts Reliability Data, Reliability Analysis Center (RAC), 1995.
[4] IEC 60812, Analysis techniques for system reliability - Procedure for failure
mode and effects analysis (FMEA), IEC, 1985
[5] IEC 61025, Fault tree analysis (FTA), IEC, 1990
[6] BS 5760-5, Reliability of systems, equipment and components. Guide to
failure modes, effects and criticality analysis (FMEA and FMECA), British
Standards Institution, 1991.
[7] MIL-STD-1629A, Procedures for performing a failure mode, effects and
criticality analysis, U.S. Department of Defense, 1980.
[8] Def Stan 00-41, Reliability and maintainability: MoD guide to practices and procedures, Issue 3, UK Ministry of Defence, 1993.
[9] SAE J 1739, Potential Failure Mode and Effects Analysis in Design (Design
FMEA), Potential Failure Mode and Effects Analysis in Manufacturing and
Assembly Processes (Process FMEA), and Potential Failure Mode and
Effects Analysis for Machinery (Machinery FMEA), Surface Vehicle
Recommended Practice, SAE International, Rev. August 2002
[10] Def Stan 00-58, HAZOP studies on systems containing programmable
electronics, Issue 2, UK Ministry of Defence, 2000.
[11] F. Redmill, M. Chudleigh, and J. Catmur, System Safety: HAZOP and
Software HAZOP, John Wiley and Sons, 1999, ISBN 0-471-98280-6.
[12] H Raafat, Risk Assessment Methodologies, University of Portsmouth, ISBN
1 069959434
[13] L.M. Ridley and J.D. Andrews. “Application of the Cause-Consequence
Diagram Method to Static Systems”, Reliability Engineering and System
Safety vol. 75, no. 1, Jan. 2002, pp. 47-58(12)
[14] J. Suokas and V. Rouhiainen, Quality Management of Safety and Risk
Analysis. Elsevier Science Publishers B.V, 1993.
[15] N. Siu. “Risk Assessment for dynamic systems: An overview”, Reliability Engineering and System Safety, Vol. 43, 1994, pp. 43-73.
[16] M. E. Paté-Cornell. “Risk Analysis and Risk Management for Offshore
Platforms: Lessons from the Piper Alpha Accident”, Journal of Offshore
Mechanics and Arctic Engineering, Vol. 115, Aug 1993, pp. 179-190.
[17] C. Acosta and N. Siu, “Dynamic event trees in accident sequence analysis:
application to steam generator tube rupture”, Reliability Engineering and
System Safety, Vol 41, 1993, pp. 135-154.
[18] IST Program EASIS project, “Description of Fault Types for EASIS”, 2005
[19] Guidelines for the Use of the C Language in Vehicle Based Software,
MISRA, 1998
[20] P. Goddard. “Software FMEA Techniques”, Proceedings of the IEEE Annual Reliability and Maintainability Symposium, 2000.
Table of contents
D.1 Introduction.....................................................................................................D-1
D.2 Overview of requirement types.......................................................................D-3
D.2.1 Requirement hierarchy example .........................................................D-5
D.2.2 Ideal properties of requirements .........................................................D-6
D.2.3 Relationships among the Requirement Types ....................................D-6
D.2.4 The relationship between Hazard Criticality and Dependability
Requirements......................................................................................D-7
D.3 Requirement types .......................................................................................D-11
D.3.1 Hazard probability requirements .......................................................D-11
D.3.2 Fault tolerance and functional degradation requirements.................D-19
D.3.3 Requirements on system architecture ..............................................D-45
D.3.4 Requirements on specific error detection mechanisms and
corresponding reactions....................................................................D-46
D.3.5 Quantitative requirements for hardware architecture........................D-53
D.3.6 Requirements on the avoidance of non-systematic faults ................D-58
D.3.7 Critical functional requirements.........................................................D-59
D.3.8 Requirements for functional limitations .............................................D-61
D.3.9 Requirements on the design process ...............................................D-63
D.3.10 Requirements on Isolation and Diversity ..........................................D-69
D.3.11 Requirements on the manufacturing process ...................................D-71
D.3.12 Requirements for systems external to the system of concern ..........D-72
D.3.13 Requirements on user manual and service manual..........................D-77
D.4 References ...................................................................................................D-80
D.1 Introduction
In this appendix, the issue of how to determine and specify appropriate dependability-related
requirements for an Integrated Safety System is addressed. Methods and analysis techniques to
identify dependability requirements are discussed, covering the entire process from high-level
implementation-independent requirements to low-level implementation-dependent design
requirements.
The appendix is organised according to different requirement types. For each such type, the
following aspects are addressed:
• How to determine suitable requirements of this type for a given system, including how to
validate that the requirements are appropriate
• The characteristics of the requirement type in terms of its meaning, expressive power, how
to formulate the requirements, limitations as well as specific difficulties associated with
defining requirements of this type
• Relationship between requirements at different levels of detail and relationship to other
types of requirements with respect to the decomposition of requirements from higher levels
to lower levels (for example from system level to subsystem and component levels)
• Examples of requirements of this type
• Verification issues
Figure D.1 shows the place of the requirements activity within the dependability activity framework
that is defined in Appendix A section A.3 (“The EASIS dependability activity framework”). As
implied by the Figure, the inputs to the establishment of dependability-related requirements are:
• A list of hazards that have been classified with respect to criticality
• A description of the relationship between hazards and their causes, including qualitative
(cause-effect) and quantitative (probabilistic) analysis of hazard occurrence
• Functional and physical descriptions of the system of concern, in more or less detail
depending on the current development phase e.g. conceptual phase, early phase, late
phase
Since the requirements obviously affect the system design, and since the analysis performed on the design can generate additional requirements, the process is iterative; the figure should therefore not be perceived as a sequence of steps to be taken in the development.
It is important to understand that requirements concerning the overall approach to dependability are not within the scope of this appendix. Overall process issues are instead dealt with in EASIS deliverable D3.2 as a whole, and particularly in the definition of the dependability activity framework (see Appendix A).
Figure D.1 The place of the requirements activity within the dependability activity framework: identification of hazards, classification of hazards and hazard occurrence analysis feed the establishment of dependability-related requirements, which interacts with the development and design of the integrated safety system, the verification and validation of dependability-related requirements, and safety case construction
The requirement types investigated in this document are briefly presented below. It is important to
understand that the main purpose of defining these types is to facilitate a logical structuring of the
document. As each requirement type has its own characteristics and its own considerations
concerning how to determine, formulate and verify the requirements, these types form the basis for
the structure of this appendix.
Hazard probability requirements
Requirements on the tolerable probability of each potential hazard can be considered as
the most fundamental type of safety requirement. A system that has a sufficiently low
probability of creating hazards is in principle fit for its intended use with respect to safety,
regardless of whether or not it fulfils all other of its dependability-related requirements.
Conversely, if the hazard probabilities are above the tolerable limit, the system is not
acceptable regardless of whether or not it fulfils other dependability-related requirements.
Fault tolerance requirements
As faults are the root causes of failures, it is desirable to break the cause-consequence
chain between faults and failures. At the very least, the resulting failure mode (at the user-
perceivable level) should be as safe as possible. Fault tolerance requirements describe the
allowed relationships between faults and failures in terms of the required degree of
functionality when a given fault (for example "any single fault") or combination of faults
exist.
In its purest form, fault tolerance means that full functionality shall be provided by the
system when a given fault (or combination of faults) is present. For the purpose of this
document, however, this requirement type is not limited to such pure fault tolerance.
Functional degradation is also included. It is often sufficient that the system of concern
enters a degraded operation mode (for example a failsafe mode) in response to an existing
fault, rather than maintaining full functionality.
Fault tolerance requirements can be decomposed into more detailed requirements on
mechanisms to implement for achieving fault tolerance.
Requirements on system architecture
Based on the findings of dependability analyses, it is often appropriate to specify
requirements on the system architecture. For example, requirements on the existence and
utilisation of redundancy may be specified.
Requirements on specific error detection mechanisms and corresponding reactions
As a result of the requirements analysis process, specific dependability mechanisms can
be identified. Such mechanisms typically involve an error detection method and a
corresponding reaction. The reaction typically includes a change of operation mode (for
example a transition to a safe state), information to the driver and the storage of a DTC
(Diagnostic Trouble Code). Descriptions of these mechanisms can be provided to the
hardware and software designers for implementation.
Quantitative requirements for HW architecture
Although hazard probability requirements represent the most obvious quantitative-
probabilistic requirement type, other types of quantitative requirements are also addressed
in existing and upcoming standards. These deal with metrics such as Safe Failure Fraction,
Dangerous Failure Coverage and Monitoring Coverage. In short, these metrics describe the
proportion of faults that lead to a particular outcome. For example the Safe Failure Fraction
is the rate of faults that do not directly lead to a dangerous failure mode divided by the total
failure rate.
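A minimal numeric sketch of the Safe Failure Fraction as defined above; the failure rates are invented examples, not values from any standard:

    #include <stdio.h>

    int main(void)
    {
        /* Invented failure rates, per hour. */
        const double lambda_safe      = 8.0e-7; /* faults not directly leading to a
                                                   dangerous failure mode           */
        const double lambda_dangerous = 2.0e-7; /* faults directly leading to a
                                                   dangerous failure mode           */

        const double sff = lambda_safe / (lambda_safe + lambda_dangerous);
        printf("Safe Failure Fraction: %.0f %%\n", 100.0 * sff); /* prints 80 % */
        return 0;
    }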
A hypothetical example that shows how different types of requirements may be related to each
other is given in Figure D.2. This example has been produced for the sole purpose of this
document and does not represent any existing system, so the numbers and requirements given
are purely fictional. The system under consideration is a conventional cruise control system and
the hazard considered is "inability to deactivate the cruise control when the driver is pressing the
brake pedal". Although cruise control can not be considered as an Integrated Safety System, this
example gives an idea about how requirements might relate to each other.
The figure shows how the top level probabilistic requirement is broken down into a set of lower
level design-oriented requirements, finally ending in detailed implementation-level requirements.
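As a purely invented numeric illustration of such a breakdown (it does not reproduce Figure D.2): a top-level requirement of at most 10^-8 occurrences of the hazard per hour might be apportioned as at most 5 • 10^-9 per hour for the brake pedal sensing path and at most 5 • 10^-9 per hour for the deactivation path in the controller hardware, while the contribution of software faults is addressed through process requirements rather than a probability figure, since software fault probabilities cannot be meaningfully quantified.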
Regardless of the requirement type, there are some properties that the requirements should ideally
fulfil. Some of these properties relate to individual requirements while others relate to the complete
set of requirements for a system. These properties are the following:
• Completeness: All relevant requirements shall be included. Thus, the requirements
specification should distinguish the behaviour of the desired system from that of any other,
undesired system that might be designed. It should however be kept in mind that this
EASIS document is only concerned with dependability aspects.
• Non-ambiguity: The requirements shall not be open to interpretation.
• Consistency: The requirements shall not contradict each other.
• Correctness: The requirements shall represent the desired behaviour or properties. For
each requirement, it shall be validated that the requirement is appropriately defined and
that it really provides a benefit. Here, "validation" of a requirement is taken to mean the task
of ensuring that a requirement is consistent with what is intended.
• Atomicity: Each requirement shall represent a single "designable" entity.
• Verifiability: The requirements shall be formulated in a way that makes it possible to verify
that they are fulfilled. For each requirement, a method and criteria for verification shall be
defined.
• Traceability: It shall be possible to trace between requirements at different hierarchical
levels, for example between system-level requirements and requirements on individual
components. Furthermore, tracing between every hazard and the requirements related to
this hazard shall be possible, in both directions.
It is of course not necessary that these properties hold at all times during the requirements
engineering process. However, the final requirements should preferably fulfil the listed criteria.
The use of a dedicated requirements management tool is strongly recommended. Such a tool
typically supports most of the requirements properties discussed above.
Within the list of requirement types above, there are three distinct requirement categories:
• Product Requirements These requirements will lead to functional behaviour, or will
affect the established functional behaviour, within the system under development or another
system.
Within this category are Hazard Probability, Fault Tolerance, Fault Avoidance, Quantitative
H/W, Dependability Mechanism, Critical Functional, Functional Limitation, and Separation/
Isolation/ Diversity requirements.
For requirements that affect systems other than the system under development, no
knowledge can be assumed concerning whether those requirements will be implemented.
• Process Requirements These requirements are concerned with the process used to
develop, verify or validate the system under development.
Within this category are Requirements on the Design Process, Requirements on the
Manufacturing Process and (via the Requirements on the Service Manual) Requirements on
the Maintenance Process.
• Affected Systems This category of requirements indicates the system to which the particular requirement (be it product or process) relates. This will generally be the system under development, and this is assumed to be the case if no system is specifically referenced. Since a system tends to be mentioned only when it is not the system under development, separate requirement types exist here for external systems, even though a genuinely orthogonal set of requirement types would treat the affected system as an attribute rather than as a type of its own. Nevertheless these types are retained because it is useful to use ‘external systems’ as a starting point for determining dependability requirements that may not otherwise be obvious.
Within this ‘category’ are Requirements on External Systems and Requirements on the User Manual and Service Manual.
The first two of these three categories are (at a low enough level in the requirements hierarchy)
independent of each other – requirements that affect the process are unlikely to be product
requirements. However, both product and process requirements will be fulfilled in a system, and
that system may be either the system under development or some external system.
It should be noted that the distinctions drawn here will not be true in all cases. At higher levels of
abstraction (higher up the hierarchy of requirements) the distinctions are less clear. However,
proper elaboration of the requirements will lead to atomic requirements, which will by definition
represent a single “designable” entity (see ‘Ideal Properties of Requirements’). Such a
requirement can only impact one system, and a “designable entity” will not be both a product entity
and a process entity.
Process requirements must eventually get nailed down to a particular process (design,
manufacture, maintenance, etc) and step (design, implementation, verification, End-Of-Line-Test,
planned maintenance, repair, etc) in order to be implemented. Similarly product requirements
must eventually get nailed down to a particular discipline (software, hardware, mechanical,
hydraulic, etc) for implementation.
Let us consider a purely hypothetical case, beginning with the hazard “Unstable vehicle”, which
has as one of its causes “Worn nurgling valve pinion”, and as another cause “Fully extended
nurgling valve”. These hazard causes lead to the requirement “The wear on the nurgling valve
pinion shall be inspected at every standard service interval”. The only way to inspect the nurgling
valve pinion is to fully extend the nurgling valve, which of course leads to the second cause of the
hazard. It is also known that retraction of the nurgling valve during inspection can lead to injury to
the inspector. If we propose a software requirement ‘The nurgling valve shall be fully extended by diagnostic command’, we have two apparently conflicting requirements, which means we cannot put a duration limit on the full-extension activation that is safe for the vehicle, because it is not safe (or not practical) for the inspector. As is often the case, we have conflicting requirements and several possible solutions, some or all of which may be used together:
• We can reconsider the inspection method (creating maintenance process requirements).
• We can increase our confidence that the nurgling valve is never extended when it could make
the vehicle unstable (creating software functional requirements, if we decide to provide an
unlikely and unique combination of inputs with which to ‘gate’ the extension request).
• We can limit the extension of the nurgling valve under normal circumstances (creating software
functional requirements).
This example shows how a dependability requirement, that has arisen from a Hazard and its
cause, can lead to requirements on both the process and the product. In fact, the process
requirement will probably also lead to a service manual requirement concerning how to prepare the
vehicle for inspection of the nurgling valve.
In conclusion, there are relationships between the requirement types listed in this document that
depend on whether they are process requirements, product requirements, or an attribute (‘system
affected’) of one of these two types of requirement.
In order to determine the nature of the relationship between the criticality of the hazards of the
system under development and any dependability requirements deduced for those hazards, we
should first consider why we are gathering dependability requirements. And the answer we find to
this question is that we have a system which we believe can lead to certain hazards, and we wish
to protect users of the system from those hazards, or reduce their likelihood of occurrence to an
acceptable level. In other words, we want to design a dependable system – a system upon which
the user can depend.
So the second question we need to consider concerns the nature of the system we are designing.
Generally speaking, we do not start the design of our system with a completely blank sheet of
paper. We begin with a ‘black box’. We know quite a lot about the system we wish to control, and
we have a fairly good idea of how we want to control it. We also have some idea of the
signals and sensors we have available, and what actuators we will need to control. So we know
what it is we want the black box to do, in broad terms, and we know with some degree of certainty
what the inputs to and outputs from this black box will be. Given this black box, with its inputs and
outputs, we can perform Hazard Identification and Hazard Classification to identify the hazards
associated with this black box and to make an estimate of the criticality of the hazards. (Hazard
Identification and Hazard Classification are described in Appendix B.)
Once we have determined criticality, we can start to determine dependability requirements for the
black box, and maybe the rest of the system as well. Now there is an intuitive link between the
criticality of the system (as given by the most critical hazard exhibited by the system) and the
dependability that the system will need to exhibit. If a system has only low criticality hazards it can
(by definition) do little or no harm through any direct action of its outputs or through faults on its
inputs. So it seems sensible to assume that dependability requirements will generally not be
sought in any great depth. However, if we have a system in which we have determined that failures
may very well lead to highly critical hazards, we accept that we will need to expend significant
effort ensuring that we do what is reasonably practical to avoid the situations which might lead to
those fatal consequences. Moreover, we only need to expend effort in proportion to the criticality
of the hazard when uncovering dependability requirements for each hazard.
It is important to realize that the hazards associated with a system usually have different criticality,
and that it is not necessarily appropriate to let the hazard with the highest criticality dictate the
development of the total system. By way of an example, consider a system that has one extremely
dangerous failure mode FM1 and another much less dangerous failure mode FM2. The Hazard
Classification would then result in a higher criticality assigned to FM1 than to FM2. The
requirements concerning the prevention, avoidance and/or mitigation of FM1 would consequently
be stronger than the requirements concerning the prevention, avoidance, mitigation etc of FM2.
The hazard probability requirements, for example, would typically differ between these two failure
modes.
To assist in this study, Figure D.3 shows the two potential routes to dependability requirements,
and the methods used to ensure that those requirements are properly related to the criticality of
the hazards associated with the system to which they apply.
[Figure D.3: Routes to dependability requirements. Left chain: Identified Hazard → Classified
Hazard → Hazard Probability Requirement → Elaborated Dependability Requirement, with the
ALARP principle applied during elaboration. Central chain: Elicited Dependability Requirements,
obtained from the system description, experience, specific guidelines etc, are tested for
applicability against the hazard-based requirements. Right chain: requirements are tested for
support by the Safety Case. Key to symbols: artifact, activity, principle or general guideline.]
Each elicited requirement is tested for applicability: either there is a top-level hazard-based
requirement to which it contributes, or there is not. In the first case, an existing requirement is
found with which the new requirement can be associated. In the second case, the requirement can
be rejected. This principle is shown in the central (green) vertical chain of artefacts and activities
in Figure D.3.
A further justification for inclusion or exclusion of requirements is given in the Safety Case, which
can be used to show that enough requirements (and no more) have been implemented to justify
the statement of the primary goal “The system is safe enough to operate in the given environment”.
The ‘enough and no more’ part is again founded on the ALARP principle. This final principle is
depicted by the right-most (blue) vertical chain of artefacts and activities in Figure D.3. Note that a
requirement is only accepted into the set of all dependability requirements if it is both supported by
the Safety Case and applicable to one or more top-level hazard-based requirements.
To conclude, requirements are directly related to the criticality of the hazards as represented by
the results of the Hazard Classification, through elaboration (whereupon they have a place in the
hierarchy) or through testing elicited requirements for their place in the hierarchy.
In general there are two means of attaching probabilities to hazards: quantitative and qualitative.
In a quantitative scheme, numerical values are attached to the hazard probabilities; for example in
terms of a frequency of occurrence per year or per operating hour, or as a statistical probability.
Note that when numerical values are expressed as a statistical probability, such a quantity is
dimensionless. However, when expressed as a frequency of occurrence, the quantity is not
dimensionless (and usually has dimensions of [time⁻¹] or similar).
Care should therefore be taken when expressing quantitative requirements to ensure that the units
are clearly stated. This is particularly important when comparisons have to be made between
quantities. For example, reliability data for an electronic component is usually expressed as a
failure rate λ, typically given in failures per 10⁶ hours. However, the statistical probability of
failure (which is dimensionless) is given by:
P(t) = 1 − e^(−λt)
If λt is small, this equation reduces to P(t) ≈ λt. Note that in this example, the statistical probability
can only be derived by determining a time period t over which this probability is calculated. This
would usually be the projected service life of a component or the vehicle. For example, if a vehicle
is designed to have a service life of 10 000 hours and a component or system has a failure rate of
0.5 per 10⁶ hours, i.e. 5 × 10⁻⁷ per hour, then we find

P(t)exact = 1 − e^(−0.005) ≈ 0.00499
P(t)approx = λt = 0.005
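This calculation is straightforward to automate. The following minimal Python sketch (function and
variable names are illustrative, not part of this deliverable) reproduces the worked example:

    import math

    def failure_probability(failure_rate_per_hour: float, hours: float) -> float:
        """Exact probability of at least one failure within the given period,
        assuming a constant failure rate (exponential model)."""
        return 1.0 - math.exp(-failure_rate_per_hour * hours)

    lam = 5e-7    # failure rate: 0.5 per 10^6 hours
    t = 10_000    # projected service life in hours

    exact = failure_probability(lam, t)   # 1 - e^(-0.005), approx. 0.00499
    approx = lam * t                      # small-lambda-t approximation: 0.005
    print(f"exact = {exact:.5f}, approx = {approx:.5f}")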
In a qualitative scheme, a number of discrete classes are used, with definitions of the hazard
probabilities and a description of what they mean. An example of such a qualitative classification is
given in the following table:
In this example, the frequency classes have been matched to a textual description of the frequency
(or occurrence). It is possible to extend a qualitative scheme to a semi-quantitative scheme, where
the frequency classes have numerical values attached to them, either as a range or as an upper
limit.
The following example is taken from the pharmaceutical industry and shows how verbal descriptors
(typically used on medicine labels to describe the frequency of occurrence of side-effects to the
medicine) are related to statistical probabilities:
A commentary on this approach [4] notes that, “The available literature suggests that a statistical
approach to describing risk is often met with satisfaction by the recipients. However, research into
what individuals understand by terms such as ‘very rare’, ‘common’ etc., suggests that the current
EU guidelines on verbal descriptors are not correctly matched with statistical probabilities. In
general, it appears that the public equate the verbal descriptors (very rare, common etc.) to risks
that are substantially higher than those defined in regulatory documents. Perceiving very small
risks is particularly problematic and a number of models have been proposed in the literature to
help with this. One scale is based on a different set of verbal descriptors (high, moderate, low, very
low and minimal), but this too may not be in accord with people’s actual interpretations.”
For this reason, such requirements should be considered as targets to be demonstrated
rather than absolute values to be proven in testing.
A hazard probability requirement shall therefore be stated as an unambiguous numerical target. It
is acceptable to use qualitative measures early in the development of a product but these need to
be changed to quantitative requirements or calibrated so that an objective verification can be
made.
The numerical target is derived by considering a numerical definition of broadly acceptable risk.
The numerical definition of broadly acceptable risk is then combined with the hazard classification
in order to arrive at a target. A simple model of how this can be done is as follows:
• Broadly acceptable risk = B
• Hazard classification for hazard n = RHn
• Requirement on hazard probability = B/RHn
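As a purely numerical illustration of this model (the values of B and RHn below are invented for the
example and are not taken from any standard or safety policy):

    def hazard_probability_target(broadly_acceptable_risk: float,
                                  hazard_classification: float) -> float:
        # Requirement on hazard probability = B / RHn
        return broadly_acceptable_risk / hazard_classification

    # Hypothetical figures: B = 1e-6 unwanted outcomes per operating hour is
    # deemed broadly acceptable; hazard n carries a classification factor of 100.
    target = hazard_probability_target(1e-6, 100)   # -> 1e-8 per operating hour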
Consider a typical automotive risk model where the hazard classification is a function of the
severity of the outcome of the hazard (the hazardous event) and the probability of the hazardous
event occurring. As discussed in Appendix B on hazard classification, the probability of the
hazardous event depends on:
• The probability of the failure occurring;
• The probability of the driver (or other person) being exposed to the hazardous situation;
• The probability of the driver (or other person) being unable to take the expected actions to
control the outcome of the hazardous situation.
For a given situation the latter two probabilities are usually constant but both can potentially give a
degree of risk reduction compared to the unprotected hazard. Therefore in order to ensure that the
overall risk is broadly acceptable it is necessary to set requirements for the first probability, namely
the probability of the failure occurring.
The MISRA Safety Analysis guidelines give an example of how safety targets for random failures
can be derived from an overall measure such as the definition of broadly acceptable risk in society.
In this example, hazards are classified using a classification R1 … R5. A target is set for each
classification which represents the maximum acceptable rate of a random failure occurring if
the level of risk is to remain broadly acceptable. Thus (for example) for R4 the maximum
acceptable rate is 10⁻⁸ per operating hour for a single instance of a system. If a hazard denoted
H1 had been classified as R4 using the MISRA approach, then an equivalent hazard probability
requirement could be written as:
“The occurrence rate of hazard H1 shall be less than 10⁻⁸ per operating hour.”
In order to derive hazard probability requirements using a scheme such as the one described
above, it is necessary to understand what is meant by broadly acceptable risk.
In reality, there is no such thing as absolute safety, and any activity has a risk associated with it
even if the probability is vanishingly small. The main aim of risk analysis and risk reduction in
safety-related systems engineering is to reduce the risks associated with a system to a level that is
“broadly acceptable”. There are a number of definitions available as to what constitutes “tolerable”
or “broadly acceptable” risk, but in general this is understood to mean a level of risk that is
equivalent to the risk exposure in everyday society.
The following risk categories are commonly encountered (see also the section below on ALARP):
• Unacceptable: this level of risk can never be accepted
• Tolerable: this level of risk can be accepted if further reduction is impracticable
• Broadly acceptable: this is equivalent to the risk exposure in everyday society
In deriving hazard probability requirements for a vehicle or a system, the objective is to take the
classified hazards and determine what measures have to be applied to reduce or mitigate these
risks to the “broadly acceptable” level. It is necessary to define a safety policy (at the company,
and/or product, and/or project level) in which the “broadly acceptable” risk is defined. Given the
identified risk reduction, associated requirements on hazard probabilities have to be derived.
Again for automotive systems, this is usually achieved by controlling the hazard probability so that
the overall hazard risk is reduced to a broadly acceptable level.
If a quantitative approach to hazard probabilities is being used, then the tolerable or broadly
acceptable risk will need to be a specific numerical target, e.g. x per year, y per hour. It will be
necessary to show how this level of tolerable risk relates to the risk exposure in everyday society;
for example, if there is a declared acceptable risk available in the literature (e.g. [3]) then this can
be used as a starting point. Note that such available targets usually relate to the overall risk and
broadly acceptable occurrence rates have to be derived from these.
If a qualitative approach is being used then the tolerable risk is likely to be expressed as a matrix
as shown in Section D.3.1.5.2. In this case, the categories will still need to be calibrated in some
way.
It is sometimes discovered in the course of analysing a system that it is not possible to reduce the
risks to a “broadly acceptable” level, but it is still desired to implement the system as the benefits
far outweigh the associated risks. In this case, the principle of “tolerable risk” applies along with the
concept of reducing risks ALARP (as low as reasonably practicable). An alternative term that may
be encountered is SFAIRP (so far as is reasonably practicable). These are essentially the same;
however, SFAIRP is the term most often used in legislation (e.g. the UK’s Health and Safety at
Work Act) and ALARP is the term used by practitioners.
A risk has been reduced ALARP when it has been demonstrated that the cost of any further risk
reduction, where the cost includes the loss of capability as well as financial or other resource
costs, is grossly disproportionate to the benefit obtained from that risk reduction. Further details of
the ALARP principle may be found in Def Stan 00-56 Part 2 Issue 3 [2].
In Def Stan 00-56 [2] the following definitions of risk are found:
• Broadly acceptable risk: A level of risk that is sufficiently low that it may be tolerated without the
need to demonstrate that the risk is ALARP.
• Tolerable risk: A level of risk between broadly acceptable and unacceptable that may be
tolerated when it has been demonstrated to be ALARP.
• Unacceptable risk: A level of risk that is tolerated only under exceptional circumstances.
Note that IEC 61508 defines “tolerable risk” as “risk which is accepted in a given context based on
the current values of society”.
Establishing the tolerability of risk from a system depends on a number of factors, including the
technology involved, the purpose of the system, its domain and the expectations of society. The
UK HSE’s publication “Reducing risks, protecting people” [3] has some useful examples in
Appendix 4 showing the risks associated with certain industrial sectors and means of
transportation.
The latest version of the UK’s Def Stan 00-56 has some helpful guidance on this subject, which
has been reproduced and adapted below.
The approach to establishing tolerable risk is basically the same whether a quantitative or
qualitative approach to risk assessment is being adopted. The general principles are:
• Reduce system risks to a broadly acceptable level
• If it is not possible to reduce the risks to a broadly acceptable level, then demonstrate that the
risks are tolerable and have been reduced ALARP.
The definitions of “broadly acceptable”, “tolerable” and “unacceptable” need to be established
depending on the sector and the application. Figure D.4 from Def Stan 00-56 [2] shows how this is
applied for both quantitative and qualitative risk assessment.
[Figure D.4: Tolerability of risk (after Def Stan 00-56). Quantitative scale (X per year):
unacceptable risk, tolerable risk, broadly acceptable risk. Qualitative scale: Class A (very high
likelihood), Class B (high likelihood), Class D (low likelihood).]
The following table from Def Stan 00-56 Part 2 [2] is useful in understanding how tolerable risk and
ALARP are related:
If quantitative hazard probability requirements are being used, the boundaries for tolerable risk and
broadly acceptable risk would be set as numerical probability targets e.g. X per year, Y per hour.
As explained in section D.3.1.4, broadly acceptable occurrence rates have to be derived from
these values for broadly acceptable risk.
If qualitative hazard probability requirements are being used, a discrete classification of risks can
be adopted (see IEC 61508 [1] part 5 and Def Stan 00-56 [2]). The risk classification is carried out
by combining the frequency of occurrence with the consequence, in order to create a matrix of risk
classes. Note that this is a separate analysis from hazard classification. The table below shows an
example of a risk classification matrix adapted from IEC 61508. Note that, as with many of these
examples taken from IEC 61508, the actual population of the table is sector-specific. For this
reason the entries in the table are purposely not symmetrical.
                  Consequence
Frequency      Critical   Severe   Marginal   Negligible
Frequent       I          I        I          II
Probable       I          I        II         III
Occasional     I          II       III        III
Remote         II         III      III        IV
Improbable     III        III      IV         IV
Incredible     IV         IV       IV         IV
In this example, the risk classes might be defined as follows. Note that these are examples and
actual interpretations are required for sectors and applications.
For these example risk classes, Class IV is “broadly acceptable”, Class III risks have to be reduced
ALARP and Class I risks are “unacceptable”. Class II risks are considered just inside the ALARP
region, and also have to be reduced ALARP but require a much higher level of justification.
In this example, the “consequence” classes might be defined as follows:
Category     Definition
Critical     Multiple deaths
Severe       A single death; and/or multiple severe injuries or severe occupational illnesses
Marginal     A single severe injury or occupational illness; and/or multiple minor injuries or
             minor occupational illnesses
Negligible   At most a single minor injury or minor occupational illness
Similarly the “frequency” classes might be defined as per the example in Section D.3.1 above.
If numerical values are associated with the frequencies, they have to be derived by considering the
operational profile of the system and a number of standards and guidelines are available (e.g. [3]).
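To illustrate how such a scheme might be mechanised, the following Python sketch encodes the
example matrix and risk-class interpretations given above (the tolerability wording follows the
example classes; all names are illustrative):

    # Risk classification matrix from the example above:
    # rows are frequency classes, columns are consequence classes.
    RISK_MATRIX = {
        "Frequent":   {"Critical": "I",   "Severe": "I",   "Marginal": "I",   "Negligible": "II"},
        "Probable":   {"Critical": "I",   "Severe": "I",   "Marginal": "II",  "Negligible": "III"},
        "Occasional": {"Critical": "I",   "Severe": "II",  "Marginal": "III", "Negligible": "III"},
        "Remote":     {"Critical": "II",  "Severe": "III", "Marginal": "III", "Negligible": "IV"},
        "Improbable": {"Critical": "III", "Severe": "III", "Marginal": "IV",  "Negligible": "IV"},
        "Incredible": {"Critical": "IV",  "Severe": "IV",  "Marginal": "IV",  "Negligible": "IV"},
    }

    # Interpretation of the example risk classes.
    TOLERABILITY = {
        "I":   "unacceptable",
        "II":  "reduce ALARP (higher level of justification required)",
        "III": "reduce ALARP",
        "IV":  "broadly acceptable",
    }

    def classify(frequency: str, consequence: str) -> tuple[str, str]:
        risk_class = RISK_MATRIX[frequency][consequence]
        return risk_class, TOLERABILITY[risk_class]

    print(classify("Occasional", "Severe"))   # ('II', 'reduce ALARP ...')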
For further discussion of the ALARP principle please refer to Def Stan 00-56 Part 2 Issue 3 [2]. The
standard also gives some guidance on applying ALARP to “complex” electronic systems (broadly
equivalent to E/E/PES, particularly PES, in IEC 61508 [1] terminology).
A general principle is that safety will be, so far as is reasonably practicable, at least as good as
that required by law. The key issue is that any tolerability criteria that are applied should be
justifiable and auditable.
The tolerability figures in [3] are given in terms of the individual risk of fatality per year; that is, the
total risk of being killed that a hypothetical individual is exposed to in any one year. These
represent the total risk to which an individual is exposed, so when considering specific risks an
assessment should be made as to how they contribute towards the overall level of risk. In addition,
consideration should also be taken of how many individuals are likely to be exposed to the risks. It
should also be noted that risk management for a system is usually considered in terms of the
likelihood of an accident occurring. Care needs to be taken in the use of statistics to ensure that
there is no confusion about what is being measured and the measurement units that are being
used.
The GALE (globally at least equivalent) and GAMAB principles (French: “globalement au moins
aussi bon” = globally at least as good) assume that there is already an existing “acceptable”
solution with a baseline risk and require that any new solution shall in total be at least as good. The
use of “globally” is important in this context, because it allows for trade-offs: an individual aspect of
the safety system may indeed be worsened if it is compensated for by an improvement elsewhere.
The GALE or GAMAB principle is similar to the ALARP principle but it is not equivalent to it. The
crucial difference is that an approach based on ALARP requires all hazards to be demonstrated to
have the associated risks reduced to the broadly acceptable level, or if it is not practical to reduce
a particular hazard to the broadly acceptable risk level that the risk associated with that hazard is
tolerable and has been reduced ALARP. GALE or GAMAB permits a higher risk to remain with
some hazards if overall the level of risk remains broadly acceptable. The final decision on whether
to use an ALARP or GALE/GAMAB approach will depend on the precedents both within the legal
system and the application sector and these may be territory-dependent. However it appears that
GALE/GAMAB is of most relevance to development of systems where an existing system is being
replaced or upgraded and there is suitable baseline data to compare it with. For a novel or
distributed system it would appear that ALARP is more appropriate.
The German MEM (minimum endogenous mortality) principle takes as its starting point the fact
that there are various age-dependent death rates in society and that a portion of each death rate is
caused by technological systems. The requirement that follows is that a new system shall not
“significantly” increase a technologically-caused death rate for any age group. Ultimately, this
means that the age group with the lowest technologically caused death rate, the group of 5 to 15
year olds, is the reference level.
Table D.1 provides a compact summary of the findings concerning hazard probability
requirements.
General characteristics: Requirements for maximum acceptable failure rates in order to maintain
a level of risk that is broadly acceptable.

How to determine suitable requirements: Use hazard classification results and a suitable measure
of broadly acceptable risk to set maximum acceptable failure rates for each hazard classification
category, in order to keep the risk associated with the hazard at the broadly acceptable level.

How to express requirements: Requirements should be expressed as an unambiguous and
verifiable statement (see examples).

Specific difficulties: Measures of broadly acceptable risk are usually specified in terms of the
probability (or rate of occurrence) of an unwanted outcome (e.g. a fatality), and a calculation is
needed to relate these to failure rates of (e.g.) electronic components. The failure rates or
probabilities are usually targets rather than absolute values (see “verification issues”).

Relationship to other types of requirements: Closely related to fault tolerance requirements and
quantitative requirements for hardware architecture. For example, specific architectural features
such as multiple lanes or redundant elements may be necessary to achieve the required failure rate.

Relationship to other requirements of the same type: Not applicable.
D.3.2.1 Introduction
Fault tolerance, in a general sense, is the property of a system to exhibit some desired behaviour
even in the presence of faults and errors. The required behaviour can range from full application
functionality (complete fail-operational behaviour), through partial or degraded functionality, down
to the simple exclusion of wrong or dangerous functionality (fail-silent behaviour). The assumed
faults must be specified with respect to their maximum number, the locations of occurrence and
the malfunctions taken into account (such as stuck-at values, omission of output values, delay of
operations, etc). Consequently, fault tolerance requirements must sufficiently express the
assumed faults.
Fault tolerance requirements need to be broken down into requirements on the service to be
provided by the countermeasures. Otherwise the work of the system designers may remain
unclear and, even worse, it may be impossible to see whether the fault tolerance requirements are
satisfied. In particular, the goals to be checked by analysis tools may be ambiguous. It should be
noted that requirements on the service provided by countermeasures are in principle outside the
scope of "fault tolerance and functional degradation requirements". However, this section D.3.2
takes a holistic approach to fault tolerance and functional degradation, covering the entire range
from top-level requirements to detailed requirements on the technical countermeasures.
In complex systems with strong cost-efficiency constraints, such as the integrated safety systems
addressed in the EASIS project, it is of particular importance to require the right degree of fault
tolerance of the various functions realized by different components. This means individual
requirements in the various hardware and software regions of the system rather than “flat”
homogeneous requirements throughout the whole system. By a fine-grained formulation of fault
tolerance requirements, a balance can be kept between a sufficient degree of safety on the one
side and a low redundancy overhead on the other. Typically, different components require
individual fault assumptions, and, depending on the application, different extents of degraded
functionality may be acceptable.
It should also be noted that there is no need for just one single fault tolerance layer in a system.
Instead a staggered approach could be both more effective and more efficient. Having various fault
tolerance layers to prevent vertical fault propagation and additional countermeasures against hori-
zontal fault propagation will allow for the provision of (partly) reliable services independent of the
application service on higher layers. Reliable end-to-end communication is an example of such a
staggered approach.
Once the fundamental system structure is defined, requiring a reliable service is identical to
requiring fault tolerance properties of certain components. Furthermore, the allocation of fault
tolerance to a system structure implies defining where fault propagation is allowed among compo-
nents, and where it must be prevented by fault tolerance techniques. This also includes the most
primitive, yet effective, contribution to fault tolerance: the isolation of components by disallowing
interactions among them.
D.3.2.2 Procedure
Besides faults and degradation of functionality, fault tolerance requirements should also consider
structural aspects such as fault propagation, isolation and the location of fault tolerance
techniques, insofar as high cost efficiency of the system is aimed at. This leads to the following
basic steps for expressing fault tolerance requirements.
Please note that very important initial steps like the definition of the safety integrity level are not
dealt with here. They are part of the process frameworks described in WT 3.1.1 (appendix A).
• non-plausible value
• maximum torque exceeded
b2) For each malfunction identify the components whose faults may lead to or at least contri-
bute to the respective malfunction anywhere in the system. On the basis of this informa-
tion the fault regions will be defined (see step c).
This step is related to Appendix C which deals with analyzing the relationship between
faults and failures.
c) Form fault regions: This definition clarifies what is counted as a single fault. A fault region is a
set of components whose internal disturbances are counted as exactly one fault, regardless of
where the disturbances are located, how many occur and how far they stretch within the
fault region. A fault region may contain just a single transistor or even a complete node of the
network. The definition of the fault regions fixes the granularity of the fault tolerance concept.
Simultaneous fault occurrences in two fault regions are counted as a double fault.
This topic is connected to WT 3.1.3 (appendix C) in which the relationship between faults and
failures and the concept of a fault region are explained.
c1) Define sets of hardware and/or software components that form a fault region with respect
to a malfunction. Typically these sets are disjoint. If components do not fall into any of the
fault regions they belong to the so-called hard core, where no fault tolerance is required
(due to a high degree of perfection of these components, for example).
d) Form containment regions: For each fault region the task of the fault tolerance technique has to
be expressed. The containment region is the set of components which may be adversely af-
fected by the respective malfunction. In other words: the containment region expresses the
borderline where fault propagation must stop. On occurrence of a fault in a fault region, the
components outside the corresponding containment region must not become erroneous. More
precisely: the functions of the components outside must not violate the allowed degraded
functionalities of the services they provide.
d1) For each fault region a containment region has to be defined.
A “natural” condition must hold for all containment regions: The components in the
containment region must be a superset of the components of the corresponding fault
region – or set of fault regions in the case of multiple fault tolerance.
d2) For each containment region the allowed degraded functionality of the components out-
side must be defined. This step can be considered a refinement of step a2). In a2) the
basic ideas are expressed whereas here the desired degree of fault tolerance is ex-
pressed in terms of malfunctions, fault regions and containment regions.
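The artefacts produced by steps b) to d) lend themselves to a simple machine-checkable
representation. The following Python sketch (types and names are illustrative assumptions, and
the region contents are taken from the example in section D.3.2.4.4) records one malfunction
together with its fault region and containment region, and verifies the “natural” superset condition
from step d1):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class FaultToleranceEntry:
        malfunction: str                # identified malfunction (step b)
        fault_region: frozenset        # components counted as one fault (step c)
        containment_region: frozenset  # where fault propagation must stop (step d)
        allowed_degradation: str       # functionality allowed outside (step d2)

    def natural_condition_holds(entry: FaultToleranceEntry) -> bool:
        # d1): the containment region must be a superset of the fault region.
        return entry.fault_region <= entry.containment_region

    entry = FaultToleranceEntry(
        malfunction="non-plausible value",
        fault_region=frozenset({"S1"}),
        containment_region=frozenset({"S1", "P1", "P3"}),
        allowed_degradation="c",   # correct service outside the containment region
    )
    assert natural_condition_holds(entry)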
A single walk through the steps above cannot be expected to be sufficient for complex systems.
Instead, a number of iterations will be necessary due to the fundamental nature of faults. When
the functionality is first specified at the application level there are no faults in the system (apart
from flaws in the specification itself). Faults can only occur when the functions are implemented by
components, which are never perfect. With increasing detail of the system design, additional
considerations of faults may become necessary. Consequently, additional countermeasures may
become necessary, and the fault tolerance concept may appear in a new light, which may cause
revisions. For this reason a clear approach for the (potentially repeated) formulation of fault
tolerance requirements is helpful.
When fault tolerance requirements are formulated for complex systems, specification flaws of
well-known kinds may appear:
• Fault tolerance requirements may be incomplete. Example: fault regions and/or containment
regions might be missing for some identified malfunctions.
• Fault tolerance requirements may be inconsistent. Example: fault regions might extend beyond
their respective containment regions.
Checking the fault tolerance requirements for such flaws should be done by the personnel ex-
pressing them. These checks can be performed separately requirement by requirement. In addi-
tion, once the structure of interactions among components is known, one can also check the
requirements jointly on the basis of a coarse and simple system model. In this model all the
components, malfunctions, fault regions, fault propagations through interactions among them,
containment regions and the allowed degradation of functionality are expressed.
Note that a model of the interactions is not necessary to express fault tolerance requirements, as
can be seen from the steps in section D.3.2.2. However, if one is able to distinguish proper
interaction from wrong behaviour at the interfaces between components, then a primitive model of
the fault tolerance properties becomes “operational”. The interactions need not be expressed in
much detail. A distinction between correct interaction and various classes of malfunctions is
sufficient to model fault propagation among components. The propagations can then be analysed
to obtain the set of components to which faults spread. Fault tolerance exists to the degree that
the containment regions are not violated by fault propagations.
Moreover, the proper definition of the containment regions can also be checked: one simply has to
look at the fault-infected region and the effect on the functionality of the components there to see
whether the intended fault tolerance is achieved.
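Such a coarse interaction model can be made “operational” with very little machinery. The Python
sketch below (illustrative names; a deliberately simplified model that ignores the distinction
between malfunction classes and assumes monotonic propagation, cf. section D.3.2.6) computes
the set of components reachable by fault propagation from a fault region and compares it with the
containment region:

    def propagation_closure(fault_region: set, interactions: dict) -> set:
        # 'interactions' maps each component to the set of components that can
        # be affected by its malfunctions; the closure is the infected set.
        infected = set(fault_region)
        frontier = list(fault_region)
        while frontier:
            component = frontier.pop()
            for successor in interactions.get(component, set()):
                if successor not in infected:
                    infected.add(successor)
                    frontier.append(successor)
        return infected

    def containment_violated(fault_region, containment_region, interactions):
        return not propagation_closure(fault_region, interactions) <= containment_region

    # Hypothetical structure: faults of S1 can affect P1 and P3 only.
    interactions = {"S1": {"P1", "P3"}, "P1": set(), "P3": set()}
    print(containment_violated({"S1"}, {"S1", "P1", "P3"}, interactions))   # False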
This can be seen as a connection to Appendix E which deals with Safety Case construction.
A mathematical model is presented to express the origin, the propagation and the potential
containment of faults by the fault tolerance techniques applied in the system. The model can be
implemented by a program (or a database) to check whether the defined containment regions are
violated.
Besides the usual mathematics, the following notation is used: Let a = (a₁, …, aₙ) be an element of
A₁ × … × Aₙ. Then a|Aᵢ denotes the projection of a to Aᵢ, which is equal to aᵢ. The notation ℘(S)
expresses the powerset of a set S.
The system structure is modelled in a completely static way. Between any pair of potentially inter-
acting components instances of services are located to express the functionality of the interaction.
The model abstracts from timing and all dynamic aspects. For both components and services types
are defined.
Let C be a set of components.
Let CT be a set of component types. C and CT must not share elements: C ∩ CT = ∅.
Let t: C → CT be a function defining the type of each component. A component is
considered an instance of a component type.
Let ST be a set of service types. Depending on its type each component provides some ser-
vices of particular service types. Remark: Sometimes the service is called “function” of
the component. Here we call it “service” for the sake of a clearer distinction from the
various mathematical functions. The sets C, CT and ST must not share elements:
(C ∪ CT) ∩ ST = ∅.
Let nu : CT × ST → ℕ₀ be a function expressing the number of used services. Each compo-
nent type can use zero, one or more instances of a service type. A voter uses three
calculation results as its input, for example.
Let nr : CT × ST → ℕ₀ ∪ {∞} be a function expressing the number of service realizations.
Each component type can implement zero, one or more instances of a service type. A
power supply may realize two 5V outputs, for example. A file server may implement file
access for any number of clients. This unbounded number of service instances is
expressed by ∞.
Def: S = { (x, y, z) ∈ C × ST × ℕ : 1 ≤ z ≤ nu(t(x), y) }
is the set of services. Note that the set of services cannot be chosen freely. Instead it
follows from the set of components and the service types they use. Remark: The
realization of services by components does not necessarily lead to elements of S,
because the services may not be used by any component. For this reason the definition
of S is based on function nu rather than nr.
Def: u : C → ℘(S) such that u(x) = { y ∈ S : y|C = x }
returns the set of services used by a component.
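The definition of S can be transcribed almost literally into code. The following Python sketch
instantiates the example of Figure D.5 below (component types ct1 and ct2, service types st1 and
st2) and confirms that exactly seven services result:

    # Components and their types, as in the example of Figure D.5.
    t = {"c1": "ct1", "c2": "ct1", "c3": "ct2"}

    # nu(component type, service type): number of used service instances.
    nu = {("ct1", "st1"): 2, ("ct1", "st2"): 1,
          ("ct2", "st1"): 1, ("ct2", "st2"): 0}

    # S = { (x, y, z) in C x ST x N : 1 <= z <= nu(t(x), y) }
    S = {(x, y, z)
         for x in t
         for y in ("st1", "st2")
         for z in range(1, nu[(t[x], y)] + 1)}

    assert len(S) == 7   # s1 ... s7, as in the text

    def u(x):
        # u(x) = { s in S : s|C = x }: the services used by component x.
        return {s for s in S if s[0] == x}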
Let r: S → C define the realization of a service by a component. This is the central function
to express the system structure (“who delivers what to whom?” or, more precisely,
“which component provides which service for some other component?”). The definition
of r must not request quantities of services from components that they are unable to
realize according to their component type and function nr. For this reason the following
condition must be satisfied:

∀ x ∈ S:  | { y ∈ r⁻¹({r(x)}) : y|ST = x|ST } |  ≤  nr( t(r(x)), x|ST )

The left side of the inequality is the quantity of service x requested from component r(x),
i.e. the cardinality of the inverse image of the one-element set {r(x)}, restricted to
services of the same type as x. The right side is the quantity of service x that component
r(x) is able to realize according to its type t(r(x)).
Figure D.5 depicts an example of a system structure with two component types and two service
types CT = {ct1, ct2}, ST = {st1, st2}. The first component type ct1 provides a realization of one
instance of st1 (nr = 1 in upper part of the figure) and uses two instances of st1 (nu = 2) and one
instance of st2 (nu = 1), whereas ct2 realizes three instances of st1 (nr = 3) and an arbitrary
number of instances of st2 (nr = ∞), and uses only one instance of st1 (nu = 1). This structure is
instantiated as follows: There are two components c1 and c2 of type ct1, and one component c3 of
type ct2. This results in exactly seven services: S = {s1, ... , s7}. Function u simply “translates”
function nu from the type definitions to the concrete component and service instances. Function r
expresses the allocation of services in the system:
• Component c3 provides service type st1 once to c1 (s1) and twice to c2 (s4 and s5), as well as
service type st2 once to c1 (s3) and once to c2 (s6),
[Figure D.5: Instantiated system structure. The seven services are s1 = (c1, st1, 1),
s2 = (c1, st1, 2), s3 = (c1, st2, 1), s4 = (c2, st1, 1), s5 = (c2, st1, 2), s6 = (c2, st2, 1) and
s7 = (c3, st1, 1); the functions u and r connect them to the components c1, c2 and c3.]
The fault model comprises the description of fault origins in components and their propagation to
further components through services and components. The fault model is not deterministic,
because each cause may have various effects. This is expressed in the model by defining sets of
malfunctions for each service.
Let F be a set of component faults, where ok ∈ F. The element ok denotes the fault free
case. F must be disjoint from C, CT and ST: (C ∪ CT ∪ ST) ∩ F = ∅.
Let pf : CT → ℘(F), where ∀ x ∈ CT: ok ∈ pf(x),
express the potential faults of each component type. For all components of a given type
the sets of potential faults are the same.
Let M be a set of service malfunctions, where ok ∈ M. The element ok denotes the correct
function. Except for the ok element, M must be disjoint from C, CT, ST and F:
(C ∪ CT ∪ ST ∪ F) ∩ M = {ok}.
Let pmu : ST → ℘(M), where ∀ x ∈ ST: ok ∈ pmu(x),
express the potential malfunctions towards the component using a service of a given
type. For all services of a given type the sets of potential malfunctions towards the
using components are the same.
Let pmr : ST → ℘(M), where ∀ x ∈ ST: ok ∈ pmr(x),
express the potential malfunctions towards the component realizing the service of a
given type. Compared to pmu, this is the reverse direction. Typically, the component
providing a service is less affected by malfunctions than the using component. How-
ever, sometimes wrong usage has a negative influence back to the realizing compo-
nent. For all services of a given type the sets of potential malfunctions towards the
realizing components are the same.
For each component type ct ∈ CT and each service type st ∈ ST realized by a component c of
type ct (formally nr(ct, st) > 0 and t(c) = ct), the following behaviour function br_ct,st expresses
fault propagation to each realized service s of type st (formally s|ST = st). The value of the function
br_ct,st depends on all faults and malfunctions affecting c.
• In the same way, the following functions have to be defined for ct1: bu_ct1,st2
Fault tolerance is defined in the following way. For a set of faults within a set of components, called
the fault region, the set of malfunctions of a larger set of components, called the containment
region, must not exceed a given set of allowed malfunctions. Typically one allows only {ok}, to
require complete fault tolerance.
Let FT ⊆ ℘(C) × ℘(F) × ℘(C) × ℘(M) be the fault tolerance specification. Each element
(FR, SF, CR, SM) ∈ FT expresses that, when faults from SF occur within the fault region
FR, the components outside the containment region CR must exhibit only malfunctions
from the allowed set SM.
D.3.2.4.4 Example
[Figure D.6: Example system: sensors S1 and S2, processors P1–P4, comparator/adapter pairs
CA1 (comparator C1, adapter A1) and CA2 (comparator C2, adapter A2), and buses B1 and B2.]
In the diagram, fault regions are marked in red. The names of the fault regions are identical or
similar to those of the respective components. Each sensor, processor or bus forms a fault region
of its own, whereas each pair of a comparator and an adapter forms a joint fault region.
Containment regions are not depicted, to avoid an overly complex illustration.
The functions and potential malfunctions of the components are as follows:
Fault-free sensor Si:     no input → output: c
Faulty sensor Si:         no input → output: c, v, o
                          (no undetectably wrong sensor values are assumed here)
Fault-free processor Pi:  on input c → output: c
                          on input v, o → output: e
Faulty processor Pi:      on any input → output: c, v, w, dc, dv, dw, o
Fault-free comp./ad. CAi: on input pair (x, y) → output: c
                          where x = c and y ∈ {c, e, v, dv, dw, o},
                          or vice versa: x ∈ {c, e, v, dv, dw, o} and y = c.
                          on input pair (w, w) → output: w, e
                          on any other input pair → output: e
                          (timeout check in the comparator is assumed)
Faulty comp./ad. CAi:     on any input pair (x, y) → output: c, v, dc, dv, o
                          where x ∉ {w, dw} and y ∉ {w, dw}.
                          (comparator cannot create undetectably wrong values because
                          CRC generation in the processors is assumed)
                          on any other input pair → output: c, v, w, dc, dv, dw, o
Fault-free bus Bi:        on input pair (x, y) → output: c
                          where x = c and y ∈ {c, v, o},
                          or vice versa: x ∈ {c, v, o} and y = c.
                          on any other input pair → output: v, w, dv, dw, o
Faulty bus Bi:            on input pair (x, y) → output: c, v, o
                          where x ∈ {c, v, o} and y ∈ {c, v, o}.
                          else on input pair (x’, y’) → output: c, v, dc, dv, o
                          where x’ ∈ {c, v, dc, dv, o} and y’ ∈ {c, v, dc, dv, o}.
                          on any other input pair → output: c, v, w, dc, dv, dw, o
                          (bus cannot create undetectably wrong values due to CRC protection)
Note that the above dependencies of the functions and malfunctions on the inputs of the com-
ponents could (and should) be given in more detail, whereas the distinction into the malfunction
classes c, e, v, w, dc, dv, dw and o seems sufficient for many types of systems.
Now the fault regions and containment regions can be specified as follows. Here, only some
examples are given, not the complete list:

Malfunctions   Fault region   Containment region   Function outside (degradation)
c, v, o        S1             S1, P1, P3           c
c, v, o        S2             S2, P2, P4           c
of B2. This violates the specification. None of the busses transfers the correct value. By insertion of
bus guardians the system failure could have been prevented.
The mathematical model described above makes it possible to define fault tolerance requirements
precisely, in a formal way. However, the formulas involved are rather difficult to handle. In order to
ease the use of the model, a prototypical tool, “FT.rex”, has been developed. It provides all the
means of the mathematical model in a more user-friendly way.
The scope of the tool is identical to the scope of the mathematical model. It allows the user to
specify the system structure, fault model and fault tolerance specification of a given system.
FT.rex builds up a relational database model of the entered data and uses Microsoft Access as a
basis. It also checks whether the data entered is consistent; for example, malfunctions cannot be
assigned to a component which does not exist.
The main benefit of FT.rex is that it requires the user to explicitly formulate the fault tolerance
requirements in a logical way. In doing so, design flaws can become obvious at an early stage.
It has to be noted that FT.rex is not a design tool. It can be used to accompany the design process,
though. In any case, the focus lies on specifying the fault tolerance characteristics of a given
system and finding possible flaws or inconsistencies.
The user interface is split into three sections, “system structure”, “fault model” and “fault
tolerance specification”, according to the scope described before. The main menu groups these
three sections together on one screen (see Figure D.7).
Each screen in FT.rex offers a short help area at the upper right corner and a menu at the lower
right corner. In case of the main menu, the user can open the form he wants to edit, quit the
program or show an “about”-dialog. The design of all screens is similar, so that it should be easy to
navigate through the program.
FT.rex is started by double-clicking the “FT.rex.mdb” file. This should cause Microsoft Access to
open and automatically show the FT.rex main menu. The user can navigate through the whole
program without using the Microsoft Access menus. However, as this is a prototypical tool, the
user is not prevented from doing so.
Each GUI element has been made using the form editor of Microsoft Access. The usage of these
elements is not explained in detail here, as it should be familiar to any user of the program.
In short, the data is entered in tables, where a blue title bar shows the general logical contents of
the table (in terms of the model presented in chapter D.3.2.4). The column headers inside the
tables show the meaning of each field. A row of data is selected by clicking on the marker
(triangle) at the left side of the table. A new row can be entered at the end of the table, marked
with an asterisk. Navigating through the data is done with the control elements at the bottom
(back, forward, go to, etc).
The following chapters describe in more detail the three different topics of system structure, fault
model and fault tolerance specification. These are covered in the same order as the buttons of the
main menu: from top to bottom in each of the three topics.
Each button in the “system structure” section opens a window in which the user may enter the
respective data.
At first, the user should provide information about the component types which occur in the system.
This is done by clicking the button “Manage Component Types” in the main menu. An example
is the abstract component type “motor” (cf. Figure D.8).
The result is a complete list of component types. After entering that list, a click on “close form”
returns to the main menu. The dialog “component types” can be opened again at any time.
After specifying the component types of the system, instances of these component types have to
be entered. For example, a system may contain two identical motors.
As shown in Figure D.9, each component instance is assigned to a specific component type which
is selected on the left side of the window. The two tables dynamically adjust themselves to each
other.
After specifying the component instances, the list of service types is defined (not shown here, as it
is identical to the definition of component types).
These service types are then assigned to specific component types. For each pair of a service type
and a component type, two numbers nr and nu specify how often the component type realizes the
service type and how often it uses this service type (see Figure D.10). For example, any motor
realizes torque for an unbounded number of other components (encoded as 99 in the tool) but
does not use the service type torque itself (0).
After this basic relationship is defined, the actual assignment of components (not component
types!) and service types has to be done. The goal is to obtain an identifiable instance of a
service which is delivered from a specific component to a specific other component. This is done
by assigning the using and the realizing component to a service type, together with an
unambiguous number (see Figure D.11).
The program takes care that the number of specified possible instances of a service (defined in
the dialog “Manage number of service realisations/utilisations”) is not exceeded.
This section describes possible faults of component types and service types and their relationship
to each other.
First, a number of component faults has to be specified in general (not shown here). Then, these
component faults are assigned as possible faults to a specific component type (see Figure D.12).
Malfunctions of service types are handled separately. After defining all possible malfunctions in
general, they are assigned to the usage or realization of services. Figure D.13 shows the
assignment of possible malfunctions to the usage of a service type. Assignment concerning the
realization is done in the same way.
Once the malfunctions have been assigned to the realisation or utilisation of services, they can be
grouped together with regard to service types or component types; this is done identically for both
realisation and utilisation (not shown here). Grouping is needed because these groups of
malfunctions define the concrete behaviour of a component when realizing or using services.
This behaviour defines the propagation of faults in the system and is shown in Figure D.14.
Each row in the table has to be read as follows:
[Diagram: a row relates a component type and a realized service type via the behaviour function
br; the inputs of br are the component’s fault and the malfunction groups (from pmu and pmr) of
the services the component uses and realizes.]
The same principle is used when defining the fault propagation regarding the usage of services.
Lastly the desired behaviour of the system regarding fault tolerance can be specified. This is done
by clicking on the respective button in the section “fault tolerance specification” in the main menu.
In the fault tolerance-dialog (see Figure D.16), the user can specify several sets of fault tolerant
behaviour. Each set is characterised by
• a fault region FR
• the component faults which shall be tolerated SF
• the containment region CR
• the malfunctions still allowed at the border of the containment region SM
The first two elements describe the faulty behaviour which shall be tolerated. The containment
region describes the desired border of the propagation of these faults. The last element
describes the degradation in terms of allowed malfunctions (e.g. “reduced speed”).
The respective menu is structured into five different sections. First the user shall select a given set
of fault tolerance specifications in the top frame (or enter a new one). These specifications are
simply numbered.
After that, the user can enter the fault region, faults, containment region and degradation in the
four frames below. The fault and containment regions are specified by simply listing a number of
components. Concerning faults and malfunctions, FT.rex takes care that only faults or malfunctions
valid for these regions can be selected.
When modelling systems and errors using the method proposed in this document, one has to take
special care with regard to specifying the correct level on which errors are propagated.
A short example:
[Figure D.17: Layered example: application-layer processes A1 and A2 communicate via
network-layer components C1 and C2, located on Node 1 and Node 2.]
D.3.2.6.2 Monotonicity
The propagation of errors in the model is regarded as monotonic; that is, no correction of errors
can be modelled at the current stage.
[Figure D.18: Application processes A, B and C, a switch U and a comparator V.]
The system shown in Figure D.18 consists of three application processes (A, B, C), a switch (U)
and a comparator (V). Initially, the result of A is used. If V discovers a difference between A and C,
this causes U to switch to B instead. The whole system can thus tolerate a single fault in A, B or C.
Now we consider A being faulty (B and C are assumed fault-free). At first, switch U forwards the
erroneous result for a short period of time. Then V detects the deviation of the results of A and C,
causing U to switch to B. Finally a correct value is output to subsequent processes.
Whether the model reports such a transiently erroneous output thus depends on the sequence in
which the states of the different components are evaluated. As a program is not able to foresee
this sequence, situations like the one described above have to be avoided. For non-monotonic
systems a complete state-space analysis is of course possible. The method described here is only
valid for monotonic systems, where it can be used as a faster substitute for a complete state-space
analysis; it cannot be used for non-monotonic systems.
D.3.2.7 Conclusion
Three ways to support the formulation of requirements have been presented in the preceding
chapters:
The procedure can be divided into four steps: description of the system and the required
degradation, description of the possible malfunctions, forming of fault regions and specification of
containment regions. It is important to note that a single walk through these steps is not likely
to be enough for formulating fault tolerance requirements. Instead, a number of iterations of these
steps will be necessary in most cases. Once the requirements are expressed, a check of the
requirements is recommended.
The mathematical model presented in this document describes the formal background of fault
tolerance requirements. It provides clear descriptions for the structure of the system, the fault
model and the desired fault tolerance characteristics. However, due to the formal nature of the
model, these descriptions need to be made as easy to use as possible.
The prototypical tool represents one way of facilitating the formulation of fault tolerance
requirements. By using an entity-relationship model of the mathematical formulae, it is possible to
express fault tolerance requirements as a set of tables in a database management system. The
prototypical implementation “FT.rex” (standing for “fault-tolerance requirements”), which has been
presented here, is a first attempt at such a tool.
As a result of the work in EASIS, some aspects of the proposed model have proven important for
the user to keep in mind. They are described in more detail in the text above; in short, (a) in
layered systems the level of error propagation has to be chosen correctly, and (b) non-monotonic
behaviour (for example self-repair) is not supported by the proposed model. Despite this limitation,
the proposed model can be used as the basis for a rather fast analysis of monotonic systems,
compared with classical methods such as state-space analysis. In any case, the mere fact that the
use of a formal model requires the user to express his or her thoughts clearly can be regarded as a
benefit for the detection of possible flaws in fault tolerance requirements. Thus, even without
actually using all features, the formal expression of fault tolerance requirements is likely to
contribute to increasing the reliability of the final system.
D.3.2.8 Summary
Table D.2 provides a brief summary of the findings concerning fault tolerance requirements
presented in this document.
Requirements on the system architecture are often highly appropriate when a dependable system
is to be developed. It may of course be debated whether such "requirements" are really
requirements or if they are part of the system design work. However, as hinted at in the
introduction (D.1) to this appendix, our requirement scope covers both implementation-
independent and implementation-dependent issues.
This requirement type is most easily explained by a few examples:
• It may be the case that a specific action performed by a system may lead to a critical
hazard if it is performed at the wrong time. It may then be appropriate to require that the
action shall only be performed when two independent subsystems both agree that the
action shall be carried out. The system architecture features necessary for such a "two
out of two" (2oo2) scheme may be specified in the form of system architecture
requirements. For example, one subsystem could control an actuator while the other
subsystem controls the power supply to the actuator (a minimal sketch of this scheme
follows the list below).
• If the hazard occurrence analysis shows that a failure of a particular sensor may lead to
a critical hazard, it may be appropriate to require that the system is equipped with
redundant sensors. Such redundancy requirements can be seen as system architecture
requirements. In this case, an error detection mechanism that capitalises on this
redundancy would additionally have to be specified. Error detection is discussed in
more detail in section D.3.4.
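As a minimal sketch of the 2oo2 actuator example above (the class and its boolean interface are
illustrative assumptions, not a prescribed design):

    class GatedActuator:
        """The critical action is produced only when two independent
        subsystems agree: one commands the actuator, the other enables
        its power supply (2oo2)."""

        def __init__(self):
            self.power_enabled = False   # controlled by subsystem 2
            self.commanded = False       # controlled by subsystem 1

        def acts(self) -> bool:
            return self.power_enabled and self.commanded

    actuator = GatedActuator()
    actuator.commanded = True       # subsystem 1 requests the action
    print(actuator.acts())          # False: subsystem 2 has not agreed
    actuator.power_enabled = True   # subsystem 2 agrees
    print(actuator.acts())          # True: both subsystems agree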
When redundant components are employed, the issues related to independence between the
components are very important. Section D.3.10 discusses requirements concerning independence
in more detail.
A summary of the issues related to system architecture requirements is given in Table D.3 below.
D.3.4 Requirements on specific error detection mechanisms and corresponding reactions
In order to achieve the necessary degree of dependability, mechanisms for error detection during runtime have to be implemented in any but the most trivial systems. This section is concerned with the establishment of requirements that describe the error detection mechanisms and the corresponding reactions to be implemented in the system of concern. Such requirements can be provided to the hardware and software designers for implementation.
It is important to understand that this type of requirement may be expressed at different levels of
detail. In an early phase of the development process, it may be sufficient to specify that a certain
entity, for example vehicle speed data, shall be checked with respect to plausibility. In a later
stage, this plausibility check could be specified in more detail. Thus, this requirement type covers a
broad range from pure requirements to detailed descriptions of the design. Consider the following
example.
Requirement¹: The system shall check the longitudinal acceleration sensor signal
according to the following:
o Every 10 ms, the system shall read the acceleration sensor value.
o If the sensor value is found to represent an acceleration larger than +3 m/s^2, this
value shall be replaced by the latest sensor value that did not exceed this threshold.
Furthermore, an error counter shall be incremented by COUNT_UP steps, unless
the error counter is already at its MAX_COUNT value. In case the incrementation
results in a value higher than MAX_COUNT, the value of the error counter shall be
set to MAX_COUNT.
o If the sensor value is found to represent an acceleration of +3 m/s^2 or less, an
error counter shall be decremented by COUNT_DOWN steps, unless the error
counter is already at its MIN_COUNT value. In case the decrementation results in a
value lower than MIN_COUNT, the value of the error counter shall be set to
MIN_COUNT.
o If the error counter has a value of ERROR_THRESHOLD or higher, the system
shall set all its outputs to their passive* states for the remainder of the current
driving cycle.
* the last section of the requirement should additionally include a description of (or a
reference to) the concept of "passive states"
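To make the intent concrete, the following sketch shows one possible implementation of this example requirement in C. It is purely illustrative: the function and variable names are invented here, the values chosen for COUNT_UP, COUNT_DOWN, MAX_COUNT, MIN_COUNT and ERROR_THRESHOLD are placeholders (the requirement leaves them open), and the latching of the passive state for the remainder of the driving cycle is assumed to be handled by the caller.

    /* Illustrative sketch of the example requirement; called every 10 ms. */
    #include <stdint.h>
    #include <stdbool.h>

    #define ACCEL_LIMIT      3.0f  /* +3 m/s^2 plausibility threshold */
    #define COUNT_UP         4     /* placeholder values              */
    #define COUNT_DOWN       1
    #define MAX_COUNT        100
    #define MIN_COUNT        0
    #define ERROR_THRESHOLD  40

    static float   last_good_value = 0.0f; /* latest plausible sensor value */
    static int32_t error_counter   = MIN_COUNT;

    /* Returns the (possibly substituted) sensor value; sets *go_passive
       when the debounced error is confirmed. */
    float check_longitudinal_accel(float sensor_value, bool *go_passive)
    {
        float value = sensor_value;

        if (sensor_value > ACCEL_LIMIT) {
            /* Implausible: substitute the latest plausible value and
               count up, saturating at MAX_COUNT. */
            value = last_good_value;
            error_counter += COUNT_UP;
            if (error_counter > MAX_COUNT) error_counter = MAX_COUNT;
        } else {
            /* Plausible: remember the value and count down, saturating
               at MIN_COUNT. */
            last_good_value = sensor_value;
            error_counter -= COUNT_DOWN;
            if (error_counter < MIN_COUNT) error_counter = MIN_COUNT;
        }

        if (error_counter >= ERROR_THRESHOLD) {
            *go_passive = true; /* outputs to passive states, to be latched
                                   for the remainder of the driving cycle */
        }
        return value;
    }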
Is this example a requirement or a description of an implementation? Obviously, the border between requirements and implementation can be difficult to define. We will not attempt to define such a border here. For the purpose of this section, any error detection mechanism description that could be handed over to someone for further detailing or implementation is considered to be a requirement.
¹ The example is given as a single requirement but it may be argued that it is better to split this into a number of atomic requirements. The main requirement could state that the sensor shall be checked, another requirement could specify the sample rate, a third could describe the incrementation of the error counter, etc. However, this would typically lead to many cross-references between the requirements and therefore a more complex requirement structure.
It should be noted that issues related to error detection mechanisms are partly addressed in
section D.3.2 (“Fault tolerance requirements”), albeit at a higher abstraction level.
It is beyond the scope of this document to describe all principles for error detection that could be
employed in an Integrated Safety System. However, the following subsections provide an overview
of the most common principles in order to clarify the scope of this requirement type.
Plausibility checks are heavily used in automotive electronic systems. Such checks are typically
applied to the data entering a control unit. This input could be a sensor signal or some data
received via a communication network from another control unit. However, plausibility checks can
be applied to any data for which it is possible to define a plausibility criterion. Thus, inputs as well
as intermediate calculation results and final outputs could be checked for plausibility.
The plausibility criterion could be more or less complex, from a simple “within defined range” check
to a complex criterion involving the relationship between the dynamic behaviour of a number of
entities. In short, anything that is known in advance about an entity or about the relationship
between entities could potentially be used to define plausibility mechanisms. Much of the data
processed by electronic control systems represents physical quantities such as velocity,
acceleration, pressure, torque, temperature, etc. Such quantities should follow the laws of nature
and are therefore particularly well-adapted to plausibility checks.
Plausibility checks are effective regardless of what the underlying fault that causes the error might
be. Implausible data can be detected regardless of whether the root fault is a specification fault, a
software design fault, a hardware design fault, a manufacturing fault, a random hardware fault or
any other fault.
Detection of abnormal electrical signal levels can be considered as a special case of plausibility
checks. Two examples of electrical monitoring mechanisms are:
• Monitoring of supply voltage
• Checks intended to detect short-circuit and open-circuit
In a system based on redundant hardware components that perform the same task, error detection
is basically a matter of comparing the outputs from the redundant units and checking if the
difference fulfils an agreement criterion.
This error detection technique is particularly relevant for sensors and processing units.
Redundancy facilitates the detection of errors originating from random hardware faults in such
components.
It should be noted that requirements for the comparison of redundant information typically lead to architectural requirements (see D.3.3) stating that the underlying redundancy shall exist in the system.
There are many ways of detecting errors in processing units. Some examples of such mechanisms
are:
• Processor built-in test
• Check of execution flow, based on signature monitoring or similar technique
• Watchdog timer
• ROM/Flash memory checksum (see the sketch after this list)
• Write/Read test of RAM
• Exceptions inbuilt in the processor
• Check of memory accesses with respect to allowed address range
• Questions/Answers check performed by processor-external component
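As an illustration, the following sketch shows a simple variant of the ROM/Flash memory checksum mentioned above. It is a minimal example under stated assumptions, not a recommendation: the additive checksum is chosen for brevity, whereas a CRC would typically be preferred in practice, and the start address, length and reference value would come from the linker and flash layout of the actual target.

    #include <stdint.h>
    #include <stdbool.h>

    /* Computes a simple additive checksum over the ROM image and compares
       it with a reference value stored separately (e.g. in the last word
       of the image). */
    bool rom_checksum_ok(const uint8_t *rom, uint32_t length, uint32_t reference)
    {
        uint32_t sum = 0u;
        for (uint32_t i = 0u; i < length; i++) {
            sum += rom[i];
        }
        return sum == reference;
    }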
In distributed systems, the communication network that ties the involved control units together is
obviously an important part of the system. By performing checks on the incoming messages, a unit
can detect errors caused by faults in other units and in the communication itself. The error
detection principles that can be employed include the following:
• Mechanisms inbuilt in communication protocol (error detection code, frame format check
etc)
• Message timeout check (message not received within the expected time window)
• Update timeout check (message not indicated as updated by the transmitting unit within the
expected time window)
• Application-level checksum check (a checksum protecting a signal from its generation by the application layer in one unit to its reception by the application layer in another unit, not just the signalling on the communication link)
• Message sequence counter check
• Consistent view in distributed systems (membership, etc)
As mentioned in section D.3.4.1.1, plausibility checks may of course also be applied to data
received via the communication network.
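Two of the mechanisms listed above, the message timeout check and the message sequence counter check, are sketched below in C. The sketch is purely illustrative; the structure, the names and the 50 ms timeout value are assumptions made for the example.

    #include <stdint.h>
    #include <stdbool.h>

    #define MSG_TIMEOUT_MS 50u   /* expected reception window (assumed) */

    typedef struct {
        uint32_t last_rx_time_ms; /* time of the latest reception     */
        uint8_t  last_sequence;   /* latest received sequence counter */
    } rx_monitor_t;

    /* Called on each reception; returns false if a message was lost or
       duplicated (counter did not increment by one, modulo 256). */
    bool check_sequence(rx_monitor_t *m, uint8_t sequence, uint32_t now_ms)
    {
        bool ok = (uint8_t)(m->last_sequence + 1u) == sequence;
        m->last_sequence   = sequence;
        m->last_rx_time_ms = now_ms;
        return ok;
    }

    /* Called cyclically; returns false if the message has timed out. */
    bool check_timeout(const rx_monitor_t *m, uint32_t now_ms)
    {
        return (now_ms - m->last_rx_time_ms) <= MSG_TIMEOUT_MS;
    }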
A typical first step in the specification of specific error detection mechanisms is to decide upon a
set of "default" error detection mechanisms to implement. These mechanisms are those that can
be decided upon without any preceding hazard occurrence analysis, since "everyone" knows that
they will have to be implemented anyway. Typical examples are ROM checksum and monitoring of
supply voltage. These default mechanisms shall be included in the requirement specification and
shall thereafter be considered as included in the system design. In other words, any hazard
occurrence analysis performed after this will be based on the assumption that these mechanisms
are implemented. Of course, it will later have to be verified that these mechanisms are indeed implemented in the final system design, but that is a separate verification activity.
When a hazard occurrence analysis has been carried out, either at a conceptual or a detailed
level, its results will describe the relationship between faults and errors on one hand and hazards
on the other. If a quantitative analysis has been performed, the results will additionally include
estimations of the contribution from a particular fault to the hazard probability.
In order to enable suitable error detection requirements to be specified, some higher level
requirements must have already been defined. More specifically, requirements on hazard
probability and/or fault tolerance have to exist. Such requirements are addressed in other sections
of this appendix. Examples of such requirements are:
• “The occurrence rate of hazard H9 shall be less than 10^(-7) per hour”
• “A single fault shall not lead to hazard H9”
• “A pressure sensor fault shall not lead to hazard H4”
Thus, the following information is assumed to be available, at least partly, when requirements on
specific error detection mechanisms are to be specified:
• Identification of the faults that can lead to a particular hazard (i.e. results of the Hazard
Occurrence Analysis, see Appendix C)
• Estimations of occurrence rates for those faults that can lead to the hazards (i.e. results of the
Hazard Occurrence Analysis, see Appendix C)
• Requirements that specify the faults that are not allowed to create a particular hazard (i.e. fault
tolerance requirements, see section D.3.2)
• Requirements on tolerable occurrence rates of each hazard (i.e. hazard probability
requirements, see section D.3.1)
Based on this information, requirements for specific error detection mechanisms can be
determined as follows:
1. If the hazard occurrence analysis shows that a particular fault can lead to a particular hazard
and this is forbidden by a fault tolerance requirement, there are two options:
a) Change the system architecture radically so that this causal path from fault to hazard no
longer exists. (See section D.3.3.)
b) Define an error detection mechanism that breaks the cause-consequence chain from fault
to hazard. If this mechanism does not completely eliminate the error propagation from the
fault to the hazard, the mechanism needs to be redefined or additional mechanisms have
to be defined. This process is repeated until the propagation is eliminated or until the
effectiveness² of the combined mechanisms is so high that it is virtually impossible for the
fault to create the hazard. In the latter case, the original fault tolerance requirement is not
strictly met but the solution may still be approved if the decision policy supports approval
of such deviations.
2. If the hazard occurrence analysis shows that the occurrence rate of a particular hazard
exceeds the tolerable limit defined by a hazard occurrence rate requirement, there are two
options:
a) Change the system architecture radically so that a tolerably low hazard occurrence rate is
achieved. (See section D.3.3.)
b) Define error detection mechanisms that partially break the error propagation path from
those faults that significantly contribute to the occurrence of the considered hazard. If the
combination of these mechanisms does not provide sufficiently effective³ protection against
the error propagation, they need to be redefined or additional mechanisms have to be
defined. This is a complex process that implies a loop between the definition of error
detection mechanisms and the analysis of the effectiveness of the combined mechanisms.
The loop is typically stopped when the hazard occurrence requirement is met, but an
ALARP requirement (see section D.3.1.5) would additionally mean that the loop is continued until the cost of any further risk reduction is grossly disproportionate to the benefit obtained.
² The effectiveness of the error detection mechanisms can be estimated by an iteration of the Hazard Occurrence Analysis, with the mechanisms assumed to be implemented in the system.
³ See footnote 2.
A requirement that the system shall perform a specific error detection mechanism and
corresponding reaction is simply expressed as “The system shall...” followed by a description of
the mechanism. This description should include at least the following:
• Which component (which hardware unit? which software module?) performs the check?
• What is checked?
• When is the check made?
• What is the criterion for determining that there is a possible error?
• What action, if any, is taken by the component when a possible error is detected?
• What does this component action lead to at the system and vehicle levels, and how
does the component action propagate to the system level?
• What is the criterion for determining that a possible error is to be considered as a real
error?
• What action is taken by the detecting component when an error is considered to be a
real error?
• What does this component action lead to at the system and vehicle levels, and how
does the component action propagate to the system level?
• What is the criterion for considering that a real error has disappeared?
The list above only considers two stages: the error is first considered as possible and then as real.
It may be possible that more stages are used. For example, the system could enter a degraded
mode in a first stage, switch off a function in a second stage and, in a third stage, inform the driver
and set a Diagnostic Trouble Code.
For the verification of requirements on specific error detection mechanisms, there are basically two complementary approaches. Whenever possible, both should be used; however, sometimes only one of them is feasible.
• Review of the system design including software code to check that the mechanism has been
implemented and that it does not contain any design fault.
• Fault injection testing in which the fault or error is injected and it is checked that the system
reacts in the specified manner.
Table D.4 provides a compact summary of the findings concerning requirements on specific error
detection mechanisms and corresponding reactions.
D.3.5 Quantitative requirements for hardware architecture
“No single point faults are allowed to cause a hazard” is probably the simplest qualitative requirement we can imagine concerning the architecture of a system. But actual systems retain single point faults, because it would not be cost-effective to suppress all of them. How, then, can we specify how much single point fault risk is tolerable?
A simple approach may be to qualitatively assess the occurrence of the single point faults and to tolerate only those with a negligible occurrence. The weakness of this approach lies in translating “negligible occurrence” into an unambiguous requirement.
A way out for random hardware faults is to quantify the occurrence of single point faults. In that
case we may either
• Allocate a part of the hazard probability requirement to single point faults. This is an
approach from the top.
• Allow only a quantified proportion of the faults to be single point faults. This is an approach from the bottom. This approach has been chosen in the IEC 61508 standard [1] with the Safe Failure Fraction (SFF) and in the ISO 26262 draft [5] with the Dangerous Failure Coverage (DFC).
In the IEC 61508 standard, a SIL requirement (see section D.3.1.3) is broken down into a combination of two hardware architecture requirements (see the architectural constraints tables of IEC 61508-2 [1]).
The first one is a required level of Hardware Fault Tolerance (HFT) and the second one is a
minimum level of Safe Failure Fraction (SFF).
A level of HFT is defined as follows: “a hardware fault tolerance of N means that N+1 faults could cause a loss of the safety function. In determining the hardware fault tolerance no account shall be taken of other measures that may control the effects of faults such as diagnostics”.
This definition is not directly applicable for us, as in the automotive field safety functions and main functions are not separate. Thus the expression “loss of safety function” is misleading. An alternative definition of HFT could be: an HFT of N means that N+1 faults could lead to a dangerous failure mode, without taking into account the effect of diagnostics.
To derive the SFF, random hardware faults are divided into three categories:
• Dangerous faults, which may lead to a dangerous failure mode. The total failure rate of these faults is denoted ΣλD.
• Dangerous detected faults, which may lead to a dangerous failure mode but are detected so that the system can be put into a safe mode. This is a subset of the dangerous faults. The total failure rate of these faults is denoted ΣλDD.
• Safe faults, which cannot lead to a dangerous failure mode even in combination with other faults. The total failure rate of these faults is denoted ΣλS.
SFF is defined in the IEC 61508 as follows: “the safe failure fraction of a subsystem is defined as
the ratio of the average rate of safe failures plus dangerous detected failures of the subsystem to
the total average failure rate of the subsystem”. SFF = ( ΣλS + ΣλDD )/( ΣλS + ΣλD )
So SFF is the ratio of failure rates of faults that do not directly lead to a dangerous failure mode
over the total failure rate of the subsystem.
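As a purely illustrative numerical example (all failure rates are hypothetical): consider a subsystem failure mode with ΣλS = 400 FIT of safe faults and ΣλD = 100 FIT of dangerous faults, of which ΣλDD = 90 FIT are detected. Then SFF = (400 + 90)/(400 + 100) = 490/500 = 98%.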
The definition of SFF makes clear that it is considered not at system level but at subsystem level. As HFT gives the level of redundancy, one interpretation is that SFF must be calculated on single-channel structures (HFT = 0) only.
In order to verify an SFF requirement, it is necessary to categorize every failure mode of every component of the considered subsystem into one of the three categories (dangerous faults, dangerous detected faults and safe faults). It is important to note that this classification generally differs between the different failure modes of the subsystem; it is generally not possible to derive a single SFF covering all the dangerous failure modes of the subsystem. A particular SFF therefore characterizes the subsystem for only one of its failure modes.
All these HW faults are then quantified, using for instance standard failure rate databases. The quantification of dangerous detected faults and dangerous faults must also take into account the efficiency of the diagnostics: the effect of a dangerous fault may be detected by software with a given efficiency, so a proportion of the failure rate will be allocated to the dangerous detected category and the rest to the dangerous category. Thus the SFF requirement is linked to the error detection requirements.
Let’s take an example to illustrate what has been said concerning HFT and SFF: the low beam function. The hazard considered is a spurious cut-off of both lamps, with a SIL 2 requirement.
Let’s assume that:
• An ECU receives the requests from the driver and controls both lamps
• There is an independent power transistor for each lamp. There are two different power
supply lines for the two transistors.
In this example (see Figure D.19), the ECU has to be broken down into two parts. The controller
and the input receiving the driver request have an HFT of 0. The power stage has an HFT of 1.
So an HFT of N means that there are N redundant channels in the considered subsystem.
Figure D.19 Example system for low beam control: the driver’s requests enter the CPU part of the ECU (HFT = 0), which commands a left and a right power driver (HFT = 1), each feeding its own lamp from a separate power supply line (+Bat 1, +Bat 2).
If we refer to the architectural constraints table, compliance with a SIL 2 requirement means that the CPU subsystem shall have an SFF between 60% and 90%. The power driver subsystem, as it has an HFT of 1, does not need an SFF better than 60%. As both power drivers are identical, the SFF requirement for each power driver is the same as for both taken together. The SFF then has to be calculated for the CPU and power driver blocks, using for instance the FMEA technique, to verify compliance. In this FMEA, the failure modes (FM) are the failure modes of the electronic parts (resistors, transistors, etc) and the effects (E) are the failure modes of the hardware block (CPU or power driver).
A frequently mentioned weakness of SFF is that it can easily be tuned: adding a non-safety-critical function to the subsystem improves the SFF. This can be overcome by considering, in the calculation of SFF, only those components that have at least one dangerous failure mode.
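A hypothetical example illustrates both the weakness and the remedy: a subsystem with ΣλS = 400 FIT, ΣλD = 100 FIT and no detection has SFF = 400/500 = 80%. Adding an unrelated, non-safety-critical function contributing 500 FIT of safe failures raises the SFF to 900/1000 = 90% without any real safety improvement; excluding components without dangerous failure modes from the calculation removes this effect.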
Another drawback of SFF is that two architectures cannot be compared with a finer granularity than the SIL. For instance, a SIL 2 system may be realised using two redundant channels, each with an SFF of 75%, or using a single channel with an SFF of 95%. Which one is better? Because of the link between HFT and SFF, it is not possible to make sharp comparisons between different architectures. As the effort needed to derive SFF values is considerable, they should not only be useful for assessing compliance but also for design purposes.
An alternative to the architectural constraints of IEC 61508 is currently being discussed in the ISO working group in charge of the future ISO 26262 standard. In this alternative, HFT is not considered; the quantitative requirement characterises the whole system for a particular hazard and derives from the ASIL level. To define this alternative metric, the random hardware faults of the whole system are divided into four categories:
• Dangerous faults, leading directly to the hazard. The total failure rate of these faults is denoted ΣλD. It is important to note that this category is not equivalent to the dangerous faults used in the definition of SFF; here, dangerous faults are always single point faults.
• Potentially dangerous faults, leading to the hazard only in combination with other independent faults. The total failure rate of these faults is denoted ΣλPD. This category is divided into two subsets:
o Controlled dangerous faults, which are potentially dangerous faults prevented from leading to the hazard by a safety mechanism. The total failure rate of these faults is denoted ΣλCD.
o Losses of safety mechanisms, where the purpose of a safety mechanism is to avoid the propagation of a fault up to the hazard. The total failure rate of these faults is denoted ΣλB.
One may argue that a combination of HW faults may lead to a hazard without any of the faults being the loss of a safety mechanism. The fault model used here considers that the additional conditions needed for a fault to lead to a hazard are always present on purpose. On real automotive hardware, this is certainly true at system level. When it comes to the detailed hardware circuitry, this model is somewhat simplistic but is considered acceptable.
• Safe faults, which cannot lead to a dangerous failure mode even in combination with other faults. The total failure rate of these faults is denoted ΣλS.
A first proposal, described in the first draft of ISO 26262 and called Dangerous Failure Coverage (DFC), is defined as the ratio of the failure rate of controlled dangerous faults to the failure rate of dangerous faults plus controlled dangerous faults: DFC = ΣλCD/(ΣλCD + ΣλD).
DFC addresses single point faults. A weakness compared to SFF is that, since it does not take safe failures into account, an intrinsically safe design gains no credit with this metric.
To overcome this weakness, another variant of DFC, closer to SFF as it includes the safe faults, has been proposed. It is defined as the ratio of the failure rate of safe faults plus controlled dangerous faults to the total failure rate of the system excluding the safety mechanisms: New metric = (ΣλCD + ΣλS)/(ΣλD + ΣλCD + ΣλS).
This second variant of DFC excludes the hardware dedicated to the implementation of safety mechanisms (ΣλB). This means that in order to derive it, a distinction has to be made between hardware implementing the functions and hardware implementing the safety mechanisms, which is not always possible.
A third variant of DFC, simpler to define and to derive, could be imagined: (ΣλPD + ΣλS)/(ΣλD + ΣλPD + ΣλS) = 1 − ΣλD/Σλ.
With this third variant, only two categories of faults have to be distinguished: dangerous faults (= single point faults) and all the others, where the others include multiple point failures and safe faults. There is no longer any need to define controlled dangerous faults and safety mechanisms.
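A purely illustrative comparison of the three variants, with hypothetical failure rates: assume ΣλD = 5 FIT, ΣλCD = 80 FIT, ΣλB = 15 FIT (so ΣλPD = 95 FIT) and ΣλS = 100 FIT, giving a total of Σλ = 200 FIT. The first variant then yields DFC = 80/(80 + 5) ≈ 94%, the second variant (80 + 100)/(5 + 80 + 100) = 180/185 ≈ 97%, and the third variant 1 − 5/200 = 97.5%.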
In any case, a component of the system is only taken into consideration if at least one of its failure modes is a dangerous or a controlled dangerous fault. The metric therefore cannot be tuned by adding safe functions or components.
Verification of a DFC requirement, like that of an SFF requirement, requires sorting every failure mode of every component of the system into one of the categories. All these HW faults are then quantified; the quantification of dangerous faults must take into account the efficiency of the diagnostics.
DFC has a major advantage over SFF/HFT in that it always allows comparison between different
architectures as it is not linked to a second requirement like HFT.
There is a link between SFF and DFC requirements on the one hand and hazard occurrence requirements on the other. Both are quantitative requirements concerning random hardware faults. In most cases, the single point faults addressed by SFF/DFC are major contributors to the occurrence of a hazard. One may argue that hazard occurrence is really what matters, so why consider an SFF or DFC requirement at all? Verification of a quantitative requirement on hazard occurrence is rather difficult, as the calculations have to take into account factors such as exposure duration. SFF or DFC requirements are easier to calculate since time is not considered. Another advantage is that, while verification of SFF/DFC and of hazard occurrence both use estimated component failure rates, SFF and DFC use them in a ratio and are therefore more robust against failure rate inaccuracies.
The hazard probability requirement gives a target for a “young” system fresh off the production line. As time passes, random hardware faults that do not directly cause a hazard may appear and will subsequently increase the probability of the hazard. These faults may remain unnoticed (latent) because they have no functional consequence. It is therefore important to detect as many of these faults as possible and to signal them to the driver, to induce him or her to have the car repaired. This is what the Monitoring Coverage (MC) of the ISO 26262 draft is all about; it is not to be confused with the diagnostic coverage of IEC 61508.
If we consider the same fault categories as defined for DFC, the faults that we do not want to be latent are the potentially dangerous faults, composed of the controlled dangerous faults and the losses of safety mechanisms. MC may thus be defined as the ratio of the failure rate of the potentially dangerous faults detected by the driver to the failure rate of all potentially dangerous faults: MC = ΣλPD,det/ΣλPD, where ΣλPD,det denotes the total failure rate of the detected potentially dangerous faults.
Detection by the driver may be achieved through a built-in diagnostic test triggering a warning lamp on the dashboard, or through a functional degradation noticeable by the driver. In the latter case, assumptions regarding the efficiency of detection by the driver have to be made in order to calculate MC. Also, the duration between detection by the driver and repair is not taken into account; it is considered sufficiently short to be negligible.
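As a purely illustrative example with hypothetical rates: if the potentially dangerous faults amount to ΣλPD = 95 FIT, of which 76 FIT are detectable by the driver (via a warning lamp or a noticeable functional degradation), then MC = 76/95 = 80%.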
Obviously, there is a link between the MC requirement and the hazard occurrence requirement. As mentioned before, latent faults occurring during the lifetime of the system will increase the probability of the hazard, and these latent faults have to be taken into account in the calculation of the hazard probability. But as the hazard probability (due to random hardware faults) is a function of many different things, the link between the hazard probability requirement and the quantification of latent hardware faults is not as straightforward as with an MC requirement.
Table D.5 provides a compact summary of the findings concerning quantitative requirements for
hardware architecture.
D.3.6 Fault avoidance requirements
When the hazard occurrence analysis shows that a particular fault can lead to a particular hazard, one of the following solutions is typically employed:
• Introduction of redundant components (for example multiple sensors or multiple
processors) to achieve fault tolerance, thereby breaking the causal relationship between
fault and hazard
• Introduction of error detection mechanisms and functional degradation when an error is
detected
However, in some cases such techniques may be inappropriate and/or insufficient. For example:
• The cost of redundancy might be grossly disproportionate to the benefits gained.
• Component redundancy might be infeasible.
• Detection of the error might be infeasible.
• A degraded mode might be impossible to define.
• Reliability considerations may dictate that a functional degradation should not occur more
often than some predefined limit.
• The probability of multiple faults that together lead to a hazard may be too high in a system,
even if any single fault is prevented from leading to a hazard.
For these reasons, specification of requirements on fault avoidance is often appropriate. Examples
of such requirements are:
• an upper limit on the occurrence rate of a particular fault
• a requirement on the MTBF (Mean Time Between Failure) for a particular fault
• a requirement that a particular fault shall not occur at all (for example, a connector could be
assigned a requirement that a mechanical disconnection due to vibration shall not be
possible)
Obviously, it is not feasible to place a fault avoidance requirement on every single component of a system. For example, the following represents an extremely unsuitable choice of abstraction level:
• An open circuit of capacitor C126 shall occur less than once per 10^6 hours
• A short circuit of resistor R48 shall occur less than once per 10^6 hours
In this case, a much more appropriate requirement could be:
• The hardware fault rate in the Electronic Brake Controller Unit shall be less than 10^(-4) per
hour
Note that requirements concerning the avoidance of design faults and manufacturing faults are
addressed in sections D.3.9 and D.3.11 respectively.
D.3.7 Critical functional requirements
In the context of integrated safety systems, one might consider all functional requirements to be critical functional requirements, since a safety system always deals with the avoidance of critical situations. But this view of the functional behaviour is too strong. For example, a collision avoidance system may have the functional requirement that information is presented on the dashboard in a particular manner. Whether the information is shown in yellow or in red is not critical; signalling the braking to the driver is less important than activating the braking itself.
Critical functional requirements are those that may lead to a highly critical hazard if they are
violated. Hence it is essential that these functional requirements are correctly implemented. From
that point of view it is desirable to analyse the software carefully with respect to such requirements.
It is recommended to use, if applicable, formal verification techniques to guarantee that the
software behaves 100 % correctly with respect to the given critical requirements.
The first question is which of the given functional requirements are critical and which are not. The classification might not be unambiguous. For example, is the correct functioning of the indicator lights critical or not? Consider the situation where a car sets its indicator lights to turn left and the dashboard signal indicates that the lights are working correctly, but the lights are in fact not working. A following car observes the braking of the car in front, assumes that it will stop, and hence starts to overtake it. But the vehicle in front turns left, and the accident is inevitable.
To obtain a classification, the functional requirements have to be reviewed. One has to ask which are the main functional requirements specified to avoid critical situations, and which are less critical. To identify the critical ones, one may ask: “What will happen if the requirement is violated? Will this lead to a catastrophic situation?”
Examples of critical functional requirements are:
• No airbag ignition without a crash.
• All doors will be unlocked if a crash is detected.
• No automatic braking (initiated for example by a collision avoidance system) without a
critical situation.
• If the vehicle is close to a crash, the collision avoidance system will activate the braking system.
Additional critical functional requirements can be derived from the hazard analysis. Identifying hazardous events/situations may lead to functional requirements stating that the (software) system shall never bring about these hazardous situations. For example, a hazardous event may be the situation where acceleration is active during braking. This leads to the functional requirement that the acceleration request signal shall always be low while braking is active.
There are two classical patterns to classify critical functional requirements:
• Safety Properties
expressing that “something bad never happens”. Typically these are invariant properties.
o “Never: airbag ignition and no crash signal” meaning no airbag ignition without a
crash.
o “Never: doors unlocked and crash signalled”.
• Progress Properties
expressing that “something good will happen soon”. Such a property will specify a required
reaction depending on a specific situation.
o Whenever a crash is detected the airbag will be ignited within 25 ms.
o If the collision avoidance system detects a situation close to a crash the braking
system will be activated.
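As an illustration of how such patterns could be formalized, assuming a linear temporal logic (LTL) style notation in which G means “always” and F means “eventually” (here with a time bound), the airbag examples above could be written as:
• Safety property: G ¬(airbag_ignition ∧ ¬crash_detected)
• Progress property: G (crash_detected → F≤25ms airbag_ignition)
The signal names and the bounded operator are illustrative only; the concrete syntax depends on the formal language chosen (see Part 2 of these Guidelines).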
Critical functional requirements should be treated with particular attention regarding verification. In general, it is not feasible to formally prove that an entire distributed system fulfils all its functional specifications. But proving that a specific requirement is fulfilled by a specific software module is within the scope of today’s technology, for example by formal methods. It is recommended to use a formal approach for verifying the correctness of a software system with respect to the identified critical requirements. A non-trivial task here is the formalization of the requirements in a formal language suitable for formal verification. Details on this, and on the formal verification approach itself, are given in Part 2 of these Guidelines.
Countermeasures are needed to avoid the loss of critical functionality caused by hardware failures. Hence critical functional requirements are also related to fault tolerance requirements and fault avoidance requirements.
Table D.7 below provides a compact summary of the findings concerning critical functional
requirements.
General characteristics: Critical functional requirements are those functional requirements which are related to the avoidance of critical behaviour.
How to determine suitable requirements: Usually the functional requirements are given. In a review process, the critical ones have to be identified: which are the main functional requirements specified to avoid critical situations?
How to express requirements: Typically, critical functional requirements are expressed (1) as relationships expressing required reactions in identified states which may lead to critical situations, or (2) in the form of invariants expressing what should never happen.
Specific difficulties: The difficulty lies in separating the critical requirements from the non-critical ones.
Relationship to other types of requirements: Countermeasures are needed to avoid the loss of critical functionality caused by a failure. Hence critical functional requirements are also related to fault tolerance requirements and fault avoidance requirements.
Relationship to other requirements of the same type: None.
Verification issues: Critical functional requirements should be treated with particular attention regarding verification. If possible, formal verification techniques should be used to verify the consistency of the system with respect to the functional requirements. A non-trivial task is the formalization of the requirements in a formal language suitable for formal verification.
Examples: “No ignition of the airbag in a non-crash situation.”
“In a crash situation the ignition of the airbag should occur within 25 ms.”
“In a near-crash situation (e.g. expressed in terms of vehicle speed and distance to an obstacle) the brake system should be activated.”
“In a crash situation all doors should be unlocked.”
D.3.8 Functional limitation requirements
In many cases, risks can be reduced by a limitation of the authority of the considered system. For
example, we can consider a Collision Avoidance (CA) system that applies braking without driver
demand when the vehicle is about to collide with a forward object. Braking as hard as possible
seems to be a good strategy; after all, the braking should only be applied when the vehicle is very close to a crash. One potential hazard of a CA system is obviously a totally unnecessary activation
of the brakes in a perfectly normal driving situation. The higher the brake force that the CA is
allowed to create, the more critical this particular hazard will be. There is a trade-off to be made
here between the desirable behaviour in a collision situation and the system safety requirements.
Functional limitation requirements are concerned with limiting the authority of control systems in
order to reduce the criticality of the hazards. In reality, this means that a significant portion of the
occurrence rate of a highly critical hazard is moved to a less critical hazard. Such authority
limitations can be made with respect to magnitude (for example brake force) and/or the duration of
the intervention. Obviously, a CA system would never need to brake longer than a very short time,
so a limitation of the duration would not affect the correct function but it would reduce the
occurrence of the hazard “long-duration unwanted activation of the brakes”.
Regarding the implementation of such functional limitations, it is strongly recommended that the limitation is implemented close to the actual actuation. In the collision avoidance example, a limitation in the brake controller would be more effective than a limitation in the Collision Avoidance controller, assuming that the latter is a dedicated hardware unit separate from the brake controller. In fact, the limitation should ideally be implemented in the actual control of the brake actuator (or even in the actuator itself, if possible). A limitation only provides protection against “upstream” errors; any error occurring “downstream” of the limitation would of course not be handled.
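The following sketch illustrates, in C, how such a limitation might look in the brake controller, using the magnitude and duration values from the examples in Table D.8 below (50 bar, two seconds). It is a simplified illustration: the names, the one-millisecond call rate and the reset behaviour are assumptions made for the example.

    #include <stdint.h>

    #define CA_MAX_PRESSURE_BAR 50u    /* magnitude limit */
    #define CA_MAX_DURATION_MS  2000u  /* duration limit  */

    static uint32_t ca_active_ms = 0u;

    /* Called every millisecond with the Collision Avoidance brake request;
       returns the limited request passed on to the actuator control. */
    uint32_t limit_ca_brake_request(uint32_t requested_bar)
    {
        if (requested_bar == 0u) {
            ca_active_ms = 0u;         /* intervention over: reset timer */
            return 0u;
        }
        if (ca_active_ms >= CA_MAX_DURATION_MS) {
            return 0u;                 /* duration limit reached */
        }
        ca_active_ms++;

        /* Magnitude limit. */
        return (requested_bar > CA_MAX_PRESSURE_BAR) ? CA_MAX_PRESSURE_BAR
                                                     : requested_bar;
    }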
A functional limitation would split a previously defined hazard into two different hazards, one
representing the case when the limitation works and another representing the case when the
limitation is violated. This split would then necessitate an update of the hazard identification results
and a re-iteration of the hazard classification for these two cases.
Concerning the issue of defining suitable limitations, every hazard should be checked with respect
to the possibility of limiting the criticality by limiting the authority of the system.
In particular, the possibility of functional limitations should be investigated if the quantitative results of the hazard occurrence analysis show that hazard probability requirements are not met.
Table D.8 provides a compact summary of the findings concerning requirements for functional
limitations.
General characteristics: Functional limitation requirements limit the authority of a system with respect to its influence on the behaviour of the vehicle.
How to determine suitable requirements: For every failure mode, investigate whether a functional limitation is feasible and desirable.
How to express requirements: A description of the limitation and a specification of where the limitation is to be allocated.
Specific difficulties: Trade-offs with functional requirements emanating from the desired behaviour of the system.
Relationship to other types of requirements: Hazard probability requirements constitute important inputs to the identification of suitable functional limitations, particularly when these probabilities are compared to the results of a quantitative hazard occurrence analysis.
Relationship to other requirements of the same type: None.
Verification issues: Verification is relatively straightforward:
• Fault injection testing in which the limitation will be violated unless the limitation works as specified
• Review of the implementation of the functional limitation
Examples: “For the Collision Avoidance function, the brake controller shall limit the brake actuator command to a level corresponding to 50 bars hydraulic pressure.”
“For the Collision Avoidance function, the duration of brake actuator commands issued by the brake controller shall be limited to two seconds.”
D.3.9 Design process requirements
This section describes design process requirements and splits them into two main categories. It identifies several kinds of process requirements, where they originate from, their characteristics, their relationship to other requirements, and their verification. This section also provides examples of this type of requirement.
It should be noted that the topic of process requirements for software development and design is
covered in much more detail by EASIS Deliverable D4.1 "EASIS Engineering Process Framework"
[6]. Therefore, only an overview of this highly important topic is provided here.
D.3.9.1 Context
The following table gives a short overview of the two design process categories:
Category 1: Construct or adapt a process according to process requirements. Assess a process against a reference process (which fulfils the process requirements) or directly against the process requirements.
Category 2: Select tools, methods and techniques within the development process during the system development.
Process requirements can place real constraints on the design of a system. For example, an
organization may specify that a specific set of CASE tools must be used for system development
because it has experience with these tools. If these tools do not support object-oriented
development, this means that the system architecture cannot be object-oriented.
The aim of process requirements is a systematic way of designing the system. A typical design process requirement is therefore that qualified and competent staff have to use qualified or certified tools to develop the system.
The characteristics of the design process requirements differ between the two categories:
• Category 1 requirements are functional requirements on the design process and requirements on the process attributes.
• Category 2 requirements are non-functional requirements on the product but functional requirements on the design process.
Process requirements can be found in a great variety of standards. IEEE Std 830-1998 gives the following three subcategories of process requirements:
• Delivery requirements
• Implementation requirements
• Standards requirements.
This standard also defines attributes that the formulated requirements have to exhibit. Some examples are:
1. Correct
2. Unambiguous
3. Complete
4. Consistent
5. Verifiable
6. Modifiable
7. Traceable
The process requirements are applicable to the overall system design (engineering) process and
to specific steps or phases within the process.
Most requirements are expressed in textual or tabular form. It is also possible to use graphical notations to describe complex relations within the process, e.g. the lifecycle model / process model of the design process.
Textual form is mostly used to express how a specific activity, method, technique or tool has to be applied. Tabular form is mostly used to provide a compact overview or to present a choice between specific methods, techniques or tools.
The following paragraphs describe how design process requirements can be determined. The primary sources are laws, regulations, internal policies and strategies, and industry best practices.
Category 1: Design process requirements are mainly found in the relevant standards (see Appendix A for an overview). The purpose of these requirements is to set up a development environment which helps to avoid systematic failures.
Category 2: Some standards and guidelines allow tailoring of the development process, meaning that a company can adapt the reference process to its needs. This approach is widely used in standards and norms, where successful procedures are “preserved”. Standards like IEC 61508 also provide lists of methods that can be applied in the lifecycle. The selection of these methods produces requirements that originate from the design process.
Another source of design process requirements is process improvement, where new ideas are introduced into the process and existing processes are measured and adapted.
The main relationship to other requirement types is that process requirements can place constraints on the design of a system, which may mean that other requirements cannot be fulfilled or need to be changed. Beyond this, there is no relationship of deeper interest, because process requirements have only little influence on the functional behaviour or the properties of the system.
The verification and validation of design process requirements is very different for the two categories.
Category 1: The process requirements are assessed either through a process assessment using a reference process model, or with Process FMEA or similar analysis techniques, to determine whether all requirements are fulfilled. The best-known and most widely used process reference models are:
• CMMI
• (Automotive) SPICE
• ISO 9000/9001
• IEC 61508
The verification that an assessed process is correctly followed can be done in several ways:
• Activities and work products could be included in a safety plan or project plan and checked
at several phases within the development process
• Tools, methods and techniques could be documented in the process description and
checked at several phases within the development process
This also provides a very important link to the final dependability assessment and to the process branch of the Safety Case, because the work products are used as evidence of the correct application of the development process. More information about this can be found in Appendix E.
Category 2: The lowest-level process requirements can be verified by:
• Reviews
• Checklists
• Analysis (e.g. Process FMEA)
These activities should show that the requirements are correctly fulfilled.
Table D.9 provides a compact summary of the findings concerning design process requirements.
General characteristics: Process requirements describe how a system shall be developed.
How to determine suitable requirements: Process requirements are usually retrieved from standards such as IEC 61508 or DO-178B; they may also document the working style of a company (selection of tools and methods).
How to express requirements: Textual and sometimes graphical (e.g. lifecycle) representation.
Specific difficulties: Usually requirements for a design process are not part of the system development.
Relationship to other types of requirements: Process requirements can place constraints on other requirement types.
Relationship to other requirements of the same type: None.
Verification issues: The requirements for the process are usually assessed using a reference model (e.g. Automotive SPICE). Lower-level process requirements can be verified with reviews, checklists and/or analysis.
Examples: Requirements for the design process:
• “A development lifecycle has to be defined and documented”
• “Roles and their responsibilities have to be defined and documented”
Requirements due to a process:
• “Code generation shall be used. One of the following tools shall be used for this purpose …”
• “The software shall be compliant with MISRA C.”
D.3.10 Requirements on isolation and diversity
When two or more items must be independent (i.e. not susceptible to dependent failures such as the common cause failures and cascade failures described in Appendix C), it is appropriate to specify requirements concerning their isolation and their diversity. Diversity is used to avoid having the same systematic faults in both items; isolation techniques prevent the propagation of faults between the items.
D.3.10.1 Isolation
For hardware, isolation may be achieved through physical separation (minimum distance between
the two items) or with physical screens (different housing, filtering on common power supply, etc).
Separation and physical screens will be used
• To reduce common sensitivity to the environment (coupling)
• Against external aggression (e.g. external metallic object creating a short circuit between
redundant items)
• To avoid propagation from one item to the other (overheating, electromagnetic interference,
etc).
Another aspect of isolation is to avoid using common resources for the items we want to be
independent (no common power supply, no common data network, no common critical input signal,
etc).
For software components running on the same CPU, isolation is usually called partitioning. It may
be achieved using techniques such as modularity, no shared data, memory protection unit, etc.
Partitioning will be used:
• when two software components must be independent for functional reasons. For example:
when one component monitors the other, a failure of the monitored component must not
disrupt the functionality of the monitoring component.
• when software components with different criticality are executed on the same CPU, as a failure of the low-integrity component must not disrupt the functioning of the high-integrity component.
Isolation concerns both systematic and random faults.
D.3.10.2 Diversity
Diversity is used to avoid having the same systematic faults in two or more items. Diversity
requirements may concern the items themselves or the means used to produce them (team, tools,
etc).
Some examples are:
• Diversity in design: use of different CPU cores in the main channel and the supervisor channel of an FSU.
• Diversity requirement on tool: use different compilers to generate software for redundant
channels.
• Diversity against common sensitivity to the environment (coupling): different technologies
between redundant sensors used in a steering angle sensor (e.g. optical and magnetic).
D.3.10.3 How to determine, refine and verify requirements on isolation and diversity
D.3.11 Requirements on the manufacturing process
Requirements on the manufacturing process are in principle outside the scope of this work, so only a brief overview is given here.
Requirements may be placed both on the manufacturing process for a component or subsystem
and on the assembly in the vehicle factory. Such requirements typically come from an identification
of a particular manufacturing (or assembly) error that could give rise to a hazard later on. For
example, it may be the case that a connector needs to be treated in a special way or that the
fastening of a component needs to be made in a particular manner. This type of requirement is typically expressed as an instruction to the factory staff.
Additional comments concerning requirements on the manufacturing process are given in Table
D.11 below.
D.3.12 Requirements on external systems
This section describes dependability requirements that apply to external systems, and covers
aspects such as where to look to find them, their typical characteristics, their relationships to other
requirements (both other types of requirements and within the hierarchical requirements structure),
and their verification. It also gives some examples of this type of requirement.
In the simplest conceivable world, requirements are elaborated from very general statements of
need at a high level, down through some intermediate levels, until they can be stated in the form of
‘what this system must do’ (and potentially how well it must do it). For systems of systems, this
elaboration will at some point lead to some design decisions being made that will involve
separating off requirements that apply to unique systems. This process may be repeated at many
levels for very large systems, where a detailed and complex hierarchy of systems exists. For
current vehicle systems, the hierarchy is relatively simple, comprising the transport infrastructure
(roads, signs, traffic controls etc) at the highest level, the vehicle at the next level, then control
systems (steering, braking, powertrain, various chassis and body controllers), and finally the
electronic hardware, software, mechanical and hydraulic systems. Within the lifetime of this
document, the ‘control systems’ layer is likely to become hierarchical in its own right, with the
advent of central controllers and smart actuators. See Figure D.20.
Figure D.20 Hierarchy of Systems: the transport infrastructure sits at the top, above the vehicle; the vehicle contains the control systems (Control System A, Central Controller B with its Smart Satellites C and D), each of which is in turn built from hydraulic, mechanical, electronic hardware and software elements.
For the purposes of the discussion of requirements on external systems as it applies in this
document, an external system is any system within the greater system (the vehicle) that is not a
lower part of the current hierarchy. With reference to Figure D.20, if Control System A is the
system of concern, Controllers B, C & D are all external systems, and the Vehicle can be
considered the parent system. If the system of concern is Central Controller B, Control System A is
an external system, but Smart Satellites C and D are probably not, and again the Vehicle is the
parent system.
Note that there are circumstances under which sub-systems of the system of concern may be
considered external systems, for instance if the sub-system is being developed by a third party.
At some stage in the development of a system, design decisions will be made that lead to a set of
sub-systems being developed. This is true for an arbitrary system at any level of the hierarchy
outlined above. Part of this particular activity will be to define the interfaces between the various
sub-systems. This activity will be documented in something that may be called an interface
specification. Subsequent development of the interface specification may be greatly influenced by
the requirements determined here for each sub-system, and by any similar activity performed
during development of the external system.
A key term used when discussing requirements is ‘Stakeholder’. In general terms, a stakeholder is
anyone who has an interest in the system of concern. The interest could be functional, financial,
legal, social, or in other categories. Stakeholders include such individuals or bodies as the system
developer, the system purchaser (or a ‘proxy’ representative, for instance in the case of consumer
goods), legislative organisations, standards developers etc. In the case of an Automotive
Integrated Safety System, the stakeholders will include the system integrator for a sub-system, the
system integrator for the vehicle, the system developers for ‘neighbouring’ systems and sub-
systems, the vehicle developer, organisations responsible for the infrastructure and so on. For this
particular discussion, the key stakeholders are the systems integrators, and developers of
‘neighbouring’ systems. These would have been the key stakeholders involved in the creation of
the interface specifications, and they will have an ongoing interest in ensuring that it is both
accurate and suitable for the vehicle being developed.
Generally, requirements on external systems are fairly straightforward to spot – they will refer to an
aspect of the greater system over which the system of concern has no direct control.
The requirement itself can be expressed in any of the ways listed under the other types of
requirement, since it will always be one of those types in addition to being a requirement on an
external system. The important point about this type of requirement is that it needs no further
elaboration for use within the system of concern. The development of the external system may be
affected by the addition of the requirement but that is beyond the scope of the system of concern.
Acknowledgement of the existence of this type of requirement leads to an additional attribute being
present for all requirements. This attribute can be considered Boolean in nature at its root, its value
being determined by answering the question “Does this requirement apply to the system of
concern?”. With more knowledge of the system the type of the attribute becomes enumerated
(potentially multi-valued), and is determined by answering the question “To which system (or
systems) does this requirement apply?”.
Requirements on external systems need no further elaboration for the system of concern.
However the progress of the requirement within the affected system (acceptance, development,
validation, deployment for instance) may need to be coordinated in order that the various systems
co-operate as expected.
It is of the utmost importance however that the requirement is communicated to the development
team for the relevant external system. This is by no means a trivial task, since the various sub-
systems in the wider system may be at different stages of development. However, a ‘live’
document such as the interface specification alluded to above will aid greatly in this task, as it is an
obvious place to collect and collate requirements from and for other systems.
While strictly beyond the scope of this document, it is nevertheless interesting to note that this is a
‘double-edged sword’. External system owners, in their efforts to determine dependability
requirements, may request that certain requirements are placed on ‘our’ system (the system of
concern). These will in their turn need to be examined for potential hazards, and for their
integration into the rest of the system design.
There are many places to look for dependability requirements that may apply to external systems.
The most obvious place to start is at the definition of the interfaces (for example in the interface
specifications, referred to above), but there are questions that may usefully be asked in an attempt
to elicit other requirements:
• Are there circumstances under which the system of concern needs to know more than is immediately obvious about the state of another system? Examples are the validity or age of an incoming signal, or the value of an output from the external system to an actuator beyond the normal scope of the system of concern. This may arise under abnormal conditions: in reversionary modes, for instance, the system of concern may modify its behaviour depending on the state of an external system, which may require that system to publish more information than is needed for normal operation (see the sketch after this list).
• Can requirements for redundancy in the system of concern be satisfied by having an external
system make additional data available over an existing communications channel? This may
present a more cost-effective solution to a need for redundancy than straightforward duplication.
• Are there circumstances or states of the system of concern during which the overall integrity of
the parent system can be better maintained by altering the behaviour of an external (sibling)
system? This leads to a whole set of further questions, relating to the design of the parent
system and the sibling system as well as the system of concern.
• How should external systems react to the behaviour that the system of concern may exhibit when
in its various reversionary modes? These reversionary modes may not have been envisaged at
the time that the parent system was designed and the system requirements distributed amongst
the various control systems.
• Does mitigation of a fault that may otherwise lead to a hazard require non-standard behaviour of
an external system?
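As an illustration of the first bullet above, the following minimal sketch (hypothetical names; Python used only for illustration) shows an external system publishing a signal together with its validity and age, so that the system of concern can judge whether the data is usable:

from dataclasses import dataclass

@dataclass
class PublishedSignal:
    # Hypothetical interface record for a signal published by an external system.
    value: float   # e.g. yaw rate in rad/s
    valid: bool    # validity flag set by the publishing system
    age_ms: int    # time since the value was last refreshed, in milliseconds

def usable(signal: PublishedSignal, max_age_ms: int = 100) -> bool:
    # The system of concern acts on the data only if it is valid and fresh
    # enough; otherwise it might, for instance, enter a reversionary mode.
    return signal.valid and signal.age_ms <= max_age_ms

print(usable(PublishedSignal(value=0.02, valid=True, age_ms=40)))  # True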
Requirements of this type should first be examined to determine which of the other categories
described in this document they fall into, and should then be formulated as defined there. The fact
that they are targeted at an external system does not affect the way they are formulated initially,
although the need to comply with processes and guidelines used in the design of the external
system may require adjustments to the policies suggested herein.
It is entirely possible that the requirements discovered for application to external systems are in
fact ‘simple’ functional requirements (as opposed to dependability requirements) when presented
to the external system. In this case, other guidelines need to be sought for formulation of functional
requirements.
Similarly, methods of validation and verification should be determined based on the ‘base type’ of
the requirement. These are presented in this document for dependability related requirements, but
will be specified elsewhere for other types of requirement.
This classification of requirements is orthogonal to the other classifications. Any and all other types
of requirements can be requirements on external systems as well as whatever other type they are.
Thus the relationships between this type of requirement and other types, in terms of the breakdown of requirements, are as specified in the ‘base type’ of the requirement. Note, however,
the comments below on relationships within the hierarchy of requirements.
Within the hierarchy of requirements for the system of concern, requirements on external systems
will generally be ‘leaf nodes’. That is, they may have come to light during elaboration of more
general requirements (and so will have ‘parent’ requirements within the hierarchy), but, being
relevant to systems other than the one of concern, they will not be further elaborated within this
hierarchy (and so will have no ‘child’ requirements or specifications within the hierarchy).
This type of dependability requirement is most likely to appear near the top of the hierarchy, at a
point where interaction with other systems is being considered. For instance, they may be
uncovered during determination of mitigating actions for the causes of hazards. However it is
entirely possible that requirements on external systems are deduced at the very lowest level of
detail.
In terms of the external system’s hierarchy of requirements, these requirements will appear at the
apex of a hierarchy, since they will have no direct parent within the external system. Within that
external system, they may then develop a hierarchy of their own.
Requirements on external systems can fall into any of the other categories of requirement
specified within this document. The particular uniqueness is that they do not apply to the system of
14.11.2006 2.0 D-75
EASIS Deliverable D3.2 Part 1 - App D
concern, but to some other system in the larger system-of-systems (generally the vehicle, but
potentially the transport infrastructure too).
Section D.3.12.1 above clarified the concept of ‘external systems’. When it comes to verification,
particularly of requirements on external systems, the boundaries become somewhat cloudy.
Verification requires the developer to look beyond the system boundaries, with the aid of the
interface specification, to discover and either stimulate or imitate the interfaces that the system
sees there. At integration, verification requires that the interfaces between systems are tested in
situ. Specifically, one or more of the following (or others) may be checked:
• the response in system Y (or Z) to a stimulus (value change, event, state change) in system Z (or
Y) is correct according to the overall system requirements (emphasis on tests on the overall
system); or
• the data (signals or events) published by system Y is interpreted correctly in system Z that
subscribes to that data (emphasis on tests in the receiving system Z); or
• the data (signals or events) published by system Z is presented correctly to system Y that
subscribes to that data (emphasis on tests in the transmitting system Z).
If we consider system Y to be the system of concern, then we can consider system Z to be an
external system. Any requirement we place on system Z (its generation of a stimulus, its response
to a stimulus, its reception of data or its transmission of data) will need verifying for correctness. In
all three cases listed above, the verification activities lay emphasis away from the system of
concern, either on the external system or on the parent system.
Thus there are two significant aspects to verification of requirements on external systems:
• Verification of External Systems.
Verification of an external system is beyond the scope of this document. However, if that external
system is also developed to the guidelines present in this document set, it will be verified as a
‘system of concern’ in its own right, and will consider requirements from external sources in its
verification plan.
• Verification of Integrated Systems.
Verification at the integration of the system of concern with other systems can be considered as
verification of a larger system, and can also be subject to the guidelines present in this document
set. As such, the total set of requirements, including dependability requirements sourced from
one system but applicable to other systems, will be verified for that parent system as part of the
verification plan for the parent system.
Validating that the chosen requirement is correct and that it is correctly targeted at the appropriate
external system requires the validation team to study the requirements of the parent system, and
ask the question “Does this requirement (on a system external to the System of Concern)
represent the best way of satisfying the requirements of the parent system?”. It will not be a trivial
matter finding an answer to this question, given the conflicting nature of requirements on
performance, cost, delivery schedule, resourcing, reuse, enhancement capability etc. It is difficult
to give guidance from the context of this document, but in general, the guidelines that are given for
a system will apply also to the parent system, albeit to a greater or lesser degree depending on the
context.
Inevitably, the final decision on whether the requirement is valid remains with the owner of the
parent system, whose task it is to balance and allocate the requirements across the whole system,
and to define and ensure implementation of an appropriate validation plan.
Table D.12 provides a compact summary of the findings concerning requirements on external
systems.
Some hazards may be dealt with by the incorporation of appropriate information in the owner’s
manual. This is particularly suitable when a hazard is characterized by a misunderstanding of how
the system works. Although such hazards are in principle outside the scope of EASIS WT 3.1, a
few comments are given here. Some examples of this type of user manual information are:
• Explanation of how the system works to prevent the driver from misunderstanding its
operation
• Description of the Human/Machine Interface (HMI) including how the driver should interact
with the system
• Explanation of the inherent limitations of the system so that the driver does not expect too
much
• Description of the driving scenarios in which the system may perform inadequately
• System behaviour characteristics that the driver should be aware of (for example that the
brake pedal oscillates during ABS braking)
• Descriptions of how the user should react to error information such as “Service required”
• Explanation that the existence of a particular Safety System (collision warning, collision
avoidance, airbag, etc) does not warrant a less careful driving style than normal
For any system, the points above (and possibly other similar issues) should be considered.
Corresponding instructions shall be introduced in the user manual whenever appropriate.
If it has been identified that inadequate service actions might give rise to hazards, the service
manual shall highlight this by providing appropriate instructions on how to perform service.
Examples of requirements on the service manual (or other service instructions) are:
• Instructions for identification of root fault
• Assembly and mounting instructions (torque for fastening bolts, etc)
• Instructions for specific activities such as calibration of sensors
• Instructions for how to verify that a service action has been correctly made
These instructions may be complemented by warning text stickers on the components themselves.
Concerning maintenance, it may be noted that two types of hazards are possible:
• Hazards to the user of the vehicle (or to other road users) as the result of an incorrectly
performed maintenance action
• Hazards to the service technician in case he/she performs a maintenance action in the
wrong way
We are primarily concerned with the first of these, but the second one should obviously not be
forgotten.
Table D.13 below provides a compact summary of the findings concerning requirements on user
manual and instruction manual.
Table D.13 - Essential characteristics of requirements on user manual and service manual
D.4 References
[1] IEC 61508, Functional safety of electrical/electronic/programmable electronic safety-related systems, IEC,
1998.
[2] Interim Defence Standard 00-56 Issue 3: Safety Management Requirements for Defence Systems, UK
Ministry of Defence, 2004.
[3] Reducing risks, protecting people: HSE’s decision-making process (“R2P2”), ISBN 0 7176 2151 0, UK
Health & Safety Executive, Her Majesty’s Stationery Office, 2001.
[4] Always Read The Leaflet: Getting the best information with every medicine. Report of the Committee on
Safety of Medicines Working Group on Patient Information, The Stationery Office, ISBN 0 11 703409 6,
2005.
[5] C. Jung, Stand des ISO-Standards zur Funktionalen Sicherheit für die Automobilindustrie, (mainly in
English), Presentation at Safetronic 2005.
[6] EASIS Engineering Process Framework, EASIS Deliverable D4.1, 2006.
This Appendix has two main objectives. Firstly it introduces the overall ideas and the structure of
the Safety Case itself. By comparing the concept of Safety Case to standards and guidelines the
need is clarified and the importance of this type of confirmation is underlined. The second objective
of this Appendix is to provide guidance on the methods and processes available for constructing a
Safety Case for an Integrated Safety System. The intention is not to provide a prescriptive
approach that must be followed; rather, to recommend the principal features that must be found
and documented in an appropriate method and the key stages that must be followed in
constructing a Safety Case.
Furthermore, a proposal for assessment is given and its application explained. Assessing the Safety Case allows the confidence that forms its basis to be reasoned about. On the last pages, tool support is touched upon.
This Appendix consists of the following parts:
• Introduction: a description of what a Safety Case is intended to achieve, and some history
of the Safety Case.
• A listing of standards is given and a description of the structure and principles of Safety
Cases is expounded.
• Safety life cycle: describes the underlying stages, tactics and phases, and what content should be achieved at each.
• Notations: an introduction and comparison of the two most common graphical notations is
given.
• Overview of the Safety Case: this is a possible breakdown and shows an approach for
creating a Safety Case.
• Elements of a Safety Case: this chapter offers an idea of what contents a Safety Case should have and what actions can be taken to create evidence.
• Architectural approaches from software engineering are explained and applied.
• Safety Case Assessment: description of a method to ensure that the Safety Case is
sufficiently trustworthy
• Safety Case Engineering is explained.
• Tools: some relevant tools are analyzed
• Illustrations: some illustrations of Safety Case relevant documents
This chapter provides an overview of the history and the different definitions of a Safety Case and
its content. This is needed because the Safety Case is a relatively new concept for the automotive industry and has mainly been used within the UK. A detailed introduction to the different topics related to a Safety Case, i.e. its construction and content, is therefore provided.
E.2.1 History
As a short introduction, the history of the Safety Case is given, explaining where Safety Cases first appeared and why they became necessary.
As Kelly stated in his doctoral thesis [1], the first Safety Case arose from a catastrophic accident at the Windscale nuclear plant. In 1957 a fire broke out, forcing the release of radioactive material into the atmosphere; the accident has been estimated to have caused the death of 32 people and 260 cases of cancer from the radiation. As a
consequence the Nuclear Installations Act (NIA) 1965 was introduced. The aim was to regulate the
installations of all commercial nuclear reactors in the UK. As a part of this Act the Nuclear
Installations Inspectorate (NII) was founded to control all nuclear reactors in the UK. After handing
out a special report including justifications of safety of the design construction and operation of the
plant the NII agreed to an operating license. This report could be seen as the first Safety Case.
There are some other examples where such safety regulations were developed like: CIMAH
(“Control of Industrial Major Accident Hazards”, 1984) as the consequence of an accident in
Flixborough 1974, the Health and Safety at Work Act (HSWA) 1974, the Ionizing Radiations
Regulations 1985, the Radio Substances Act 1993 and many more.
Before these safety regulations were introduced, safety was not ignored; there were standards and safety thinking. However, this new understanding of safety is more thorough and better documented, as will be shown in the following chapters. It also offers the possibility of placing the responsibility for safety explicitly with respect to the customer or regulator.
The main question in this chapter is whether standards guarantee an adequate level of safety. Safety standards advise processes and practices and the level of safety to be reached. In the past they prescribed what must be done and what is not allowed; this thinking has since changed towards the goal-based approach. The problem with prescriptive standards is that in times of fast technical development, safety needs change fast as well. A more flexible approach is needed which allows more innovation. The approach should allow the safety engineer to demonstrate an adequate level of safety with evidence that fits the requirements individually set for the specific product. In a prescriptive approach this is difficult.
Prescriptive In a prescriptive approach, the actions that must be carried out, and those that must not, are laid down and must be adhered to. The question is whether the documents that are provided are sufficient and necessary. This depends on the system and, as stated above, the system requirements change as technology changes.
Goal-based In contrast to the prescriptive approach, goal-based development only states the claim of what should be achieved. This approach requires a lot of experience. The difference is that the requirements must be set up as needed by a specific argumentation, which is not specified by a standard.
[Figure: prescriptive and goal-based approaches, each aiming to satisfy the requirements]
On the one hand the goal-based approach has advantages, such as much easier innovation; on the other hand the level of trustworthiness can suffer. Use of prescriptive standards can reduce the onus on the supplier to achieve ongoing risk reduction through life. Prescriptive safety regulations sometimes lead to a reduced sense of ownership of safety, with safety understood as a “tick in the box” (a naïve view, but sometimes seen).
As well as the design of such a structure, the assessment of goal-based justifications is more involved: the assessor has to understand the structure first, and only then can he or she start to think about its quality. This underlines that the basics (argument structures) presented in the next chapter are absolutely necessary for this type of safety certification.
The first point of interest in this Appendix is the definition of a Safety Case. The main problem is that a Safety Case is more than just a set of documents written with regard to safety. To stay close to its fundamentals, the most common existing definitions will be compared.
As Kelly illustrates a Safety Case should “communicate a clear, comprehensive and defensible
argument that a system is acceptably safe to operate in a given context” [2].
Acceptably safe Means that a tolerable risk remains. The Safety Case should explain that the underlying system is safe enough to operate. What counts as safe enough depends on the industry concerned, on society and on legal considerations.
Context Context-free safety is impossible. It should be declared exactly which application in which environment is being discussed.
Clear Clear is about being understandable and having a good structure.
Comprehensive Comprehensive is about being acceptably complete, which also contributes to clarity by ensuring that the full argument is present.
Understandably, the Safety Case is more than just documentation. It collects all the information associated with safety, gives an overview and builds up the connections between the items. Clear comprehension is therefore essential.
Another quite similar definition is given by Bishop and Bloomfield (Participants in the SHIP
Project): “A Safety Case is a documented body of evidence that provides a convincing and valid
argument that a system is adequately safe for a given application in a given environment” [3].
“Clear” and “comprehensive” can be taken to be implied by the documented body.
“Convincing and valid” means almost the same as “defensible”.
“Adequately” is similar to “acceptably”.
MoD Def Stan 00-56 [4]: “The Safety Case is the primary means of demonstrating safety. It should
show, from an early stage in the acquisition process, how safety will be achieved. Thus the Safety
Case will initially identify the means by which safety will be achieved and demonstrated; at later
stages detailed arguments and supporting evidence will be developed and refined. Once the
system is operational, the Safety Case will demonstrate how safety will be maintained.”
The first definition from Tim Kelly is preferred in this appendix, but the three definitions have been
shown to be sufficiently similar as to be considered equivalent.
In safety-related systems, where failures can lead to catastrophic or at least dangerous consequences, Safety Cases are used to demonstrate that a system is acceptably safe. As stated in the introduction, Safety Cases are already in use for nuclear power plants, aircraft and railways. Some relevant standards are listed below, and the content of a selection is examined in some detail in Appendix A.
E.3.1 Standards
Some might think safety is guaranteed by standards. Unfortunately this is not enough. Safety
standards advise processes and practices but not the level of safety to be reached. Furthermore,
standards are given for a specific type of product but in most cases not for a specific design. Most
standards use limits like Safety Integrity Level or Development Assurance Level (see
Chapter E.9.2).
Some of the relevant standards are mentioned below:
ARP 4754 Certification Considerations for Highly-Integrated or Complex Aircraft Systems
was written by the Systems Integration Requirements Task Group AS-1C, ASD SAE, on 10 April 1996. This document discusses the certification aspects of
highly-integrated or complex systems installed on aircraft, taking into account the
overall aircraft operating environment and functions. [28]
CAP 670 Air Traffic Services Safety Requirements published by Safety Regulation Group
Civil Aviation Authority 2005. This standard can be downloaded from the
homepage: www.caa.co.uk. “CAP 670 Air Traffic Services Safety Requirements
describes the manner in which approval is granted, the means by which Air
Traffic Service (ATS) providers can gain approval and the ongoing process
through which approval is maintained.” [37]
Def Stan RELIABILITY AND MAINTAINABILITY ASSURANCE GUIDES was created by
00-42 the UK Ministry of Defence. This Defence Standard provides guidance on accommodating Ministry of Defence (MOD) Reliability and Maintainability (R&M) practices, procedures and requirements in the design process. It was one of the first standards to introduce the concept of a Reliability Case, with a similar structure of argument. [38]
Def Stan REQUIREMENTS FOR SAFETY RELATED ELECTRONIC HARDWARE IN
00-54 DEFENCE EQUIPMENT. This Part of the Interim Standard describes the
requirements for procedures and technical practices for the acquisition of Safety
Related Electronic Hardware (SREH). Compliant procedures and practices shall
be required by all MOD Authorities involved in the original procurement of SREH,
whether COTS, reused or application specific, and during maintenance and
replacement. [30]
Def Stan REQUIREMENTS FOR SAFETY RELATED SOFTWARE IN DEFENCE
00-55 EQUIPMENT, dated August 1997. It summarizes its contents as follows: “The first
Part of the Standard describes the requirements for procedures and technical
practices for the development of Safety Related Software (SRS). The second
Part of the Standard contains guidance on the requirements contained in Part 1.
This guidance serves two functions: it elaborates on the requirements in order to
make conformance easier to achieve and assess; and it provides technical
background.” [31]
It might be claimed that other documents (e.g. the safety plan) are sufficient to demonstrate an adequate level of safety. Here the arguments for using a Safety Case instead are considered.
• Provide a demonstration that an adequate level of safety is reached (Identify and justify
unsolved hazards, ensure that risks are acceptably low and present or reference evidence)
• Explain how safety is maintained throughout the lifetime of the system
• Minimize licensing risk (ability to demonstrate adequate safety to the regulators and
assessors)
• Minimize commercial risk (ensuring maintenance and implementation costs are acceptable)
In conclusion, these four advantages mean that a Safety Case helps the regulator and the customer to understand the risks and costs associated with the product, and to see that these risks have been effectively managed.
The main components fall into the following three categories. The understanding of these elements will improve in the course of this report. Graphical notations are introduced in later chapters; from there on it is easy to grasp the full significance of the following three components:
Requirements: Point of discussion
Arguments: Explanation and relationship between requirements and evidence
Evidence: Information that supports the claim that the safety requirements and objectives are met
E.3.4 Arguments
An argument is the act of inferring a conclusion from premised propositions. A conclusion should always be either TRUE or FALSE.
An argument is considered valid if the conclusion can be logically derived from its premises.
Otherwise the argument is considered invalid.
An argument is considered sound if it is valid and all premises are true.
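As a small constructed illustration (our own example, not taken from the cited sources): consider the argument “Premise 1: every identified hazard has been mitigated; Premise 2: H17 is an identified hazard; Conclusion: H17 has been mitigated”. The argument is valid, since the conclusion follows logically from the premises. It is sound only if both premises are actually true: if hazard identification was incomplete and a mitigation was missed, Premise 1 is false and the argument, while still valid, is unsound.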
It should be mentioned that the argument has an important but often neglected role. It goes hand-
in-hand with the evidence. Evidence without argument is unexplained (for example thousands of
test results without the link to the objective) and an argument without evidence is unfounded.
Qualitative Compliance with rules that have an indirect link to the desired attributes (example: staff skills and experience). These arguments are more difficult to evaluate; the ratings might be assigned by expert judgment.
Designing arguments, and evaluating them, requires a clear and demonstrable understanding of their elements and structure. Govier [5] uses a graphical notation:
[Figure: Govier's argument support structures – single support, linked support and convergent support]
An argument is said to be hybrid if it does not fit one of the three structures described above. In the convergent support pattern, further differentiation is achieved by examining independence. There are two kinds: conceptual independence (different underlying theories) and mechanistic independence (different approaches but the same theory). This becomes more relevant when the Safety Case is analyzed in more detail.
As Kelly observes in his doctoral thesis [1]: “Extra structure such as this makes the process of
constructing a safety justification more predictable and manageable, e.g. so that the forms of
premise required to justify a particular conclusion are known.”
It is recommended that the three Argument structures above are understood and kept in mind
while creating a Safety Case, because they help to build up sound and strong Arguments.
Toulmin’s scheme [6] asserts that arguments consist of grounds, claims, warrants and backing:
• Claims are statements the truth of which needs to be confirmed.
• The justification for the claim is based on grounds “specific facts about a precise situation
that clarify and make good the claim”.
• The general rule that licenses the step from the grounds to the claim is called the warrant.
• As a basis for the warrant, Toulmin introduces backing, which includes the validation of the scientific and engineering laws used.
[Figure: Toulmin's scheme – grounds support the claim via a warrant, which in turn rests on backing]
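As a small constructed illustration of the scheme (hypothetical content, not drawn from a real Safety Case): the claim could be “the ABS function is acceptably safe”; the grounds could be fault injection test results showing that every single sensor failure is detected within the fault tolerance time; the warrant would then be the general rule that a system which detects and handles every single sensor failure within the fault tolerance time cannot reach the hazardous state; and the backing would be the analysis from which that fault tolerance time was derived.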
E.3.5 Evidence
At this stage, evidence is explained only roughly; a more detailed catalogue will be given later (see chapter E.7.6, Evidences). This part is only descriptive.
Evidence at different stages can be:
• Sub-claims (separately explained with sub arguments)
• Descriptions (of the actual design e.g. Block diagrams)
• Facts (test results, Analysis Reviews, formal methods, etc.)
• Assumptions (sometimes necessary but not always real)
To obtain evidence, Bloomfield and Bishop [3] suggest taking a closer look at the following points:
• design
• development processes
• simulated experience (testing; this can be divided into several sub-tests)
• prior field experience (analyzing the past)
Within [7] there is only one top goal; in most cases this is “{System X} is safe enough to operate”. A more detailed look at this statement, as well as a closer look at arguments, will be given later. The Safety Case is mostly divided into several sub-goals, but these sub-goals are part of the argument itself. The sub-goals are often of the following kinds:
• Reliability and availability
• Security (from external attack)
• Functional correctness
• Time response
• Maintainability
• Usability and Accuracy
• Robustness to overload
• Modifiability
The primary objective was to show that the system is “safe enough”. For this reason the ALARP (“As Low As Reasonably Practicable”) principle is one of the most important foundations of the Safety Case; for further information the reader is referred to Appendix D. The principle is a mixture of philosophical and legal ideas. It tries to differentiate whether the risk is:
a) Too big and must not be tolerated
b) Small enough to be neglected or
c) The risk lies between a) and b) and therefore it is decreased as much as practicable
The ALARP principle says that the risk should be minimized, or at least kept within a practicable region. The responsible individual or organization tries to show that the risk is tolerated by society, and that further reduction of the risk would be out of proportion to its costs (“financial trade-off between cost and level of risk” [8]). Later in this appendix, contextual information is added for which this principle forms the basis. Furthermore, the so-called ALARP pattern can be applied (refer to chapter E.8.2, Reusing Arguments by applying Patterns).
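To make the three regions concrete, the following minimal sketch classifies a yearly risk figure into the three ALARP regions; the function name and threshold values are purely illustrative assumptions, not values prescribed by ALARP or by this document (Python used only for illustration):

def alarp_region(risk_per_year: float,
                 intolerable: float = 1e-4,           # illustrative threshold only
                 broadly_acceptable: float = 1e-6):   # illustrative threshold only
    # Classify a risk level into the three regions a), b) and c) above.
    if risk_per_year > intolerable:
        return "a) too big: must not be tolerated"
    if risk_per_year < broadly_acceptable:
        return "b) small enough to be neglected"
    return "c) ALARP region: reduce as far as practicable"

print(alarp_region(3e-5))  # lies between a) and b): reduce further if practicable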
The methods, analysis techniques and frameworks described within subtasks A to D of WT3.1 are used as arguments within the Safety Case. The validation and verification results for the requirements of Appendix E are used as evidence. The EEP described by WP4 is used for the argumentation about the development process. Results of WT3.1 will be found in the Product branch of the Safety Case (E.6.3.2) and in the Process branch (E.6.3.1); results of WP4 will be found in the Process branch (E.6.3.1).
A Safety Case can be developed at different stages; historically it was not. In the past, the design was created first and the Safety Case was produced at the end. Safety Case developers and practitioners have, however, found it better to create the Safety Case incrementally, as described in the next paragraph.
First it should be mentioned that the Safety Case always has to be started at the earliest phase
possible.
The Defence Standard 00-56 [4] says: “The Safety Case should be initiated at the earliest possible
stage in the Safety Program so that hazards are identified and dealt with while the opportunities for
their exclusion exist”. The idea is to start the so-called preliminary Safety Case when development of the system is started. This may make it possible to handle some of the hazards before the planning stage is completed. If the design of the product is already finished, or production has even started, it is more difficult and expensive to rectify an aspect of the product likely to cause or lead to a hazard. Another problem that arises if the safety analysis is started late is that the arguments can be less reliable, because the design can no longer be influenced by safety decisions.
This leads to an evolving Safety Case. It can be compared to the Design Lifecycle as shown in the
following graph:
[Figure: design lifecycle from requirements to complete system implementation, with the Preliminary, Interim and Final Safety Case produced at corresponding stages]
The Preliminary Safety Case is started as soon as a relatively stable and controlled system
requirements specification is available. After this stage discussions with the customer can
commence about possible safety issues (hazards). The second phase is the Interim Safety Case, which is produced after the first system design and tests. The third phase is the Operational Safety Case, which is produced prior to in-service use. All in all, it is one incremental Safety Case rather than several completely new ones.
Before discussing the Preliminary Safety Case and what it is, the Safety Plan is explained. It consists of the following points:
• Safety Process Definition, Tasks, Schedule, Resources, Work Products
• Roles, Responsibilities
• Staff Competencies, Skills and Experience Matrix
• Reporting Arrangements
• Contractual Agreements
• Dispute Resolution Provision
The Safety Plan is created on the basis of the safety capabilities which are previously developed
and the particular requirements of the project. Resources allocated to safety tasks and risk
mitigation work would be based on past experience of the organization. Progress of safety work on
the project would be monitored against these provisions.
Before the Preliminary Safety Case can be started, the following has to be accomplished:
• Define key safety processes, roles and responsibilities
• Identify safety properties
• Preliminary Hazard Analysis (systematic review of system design concept) and Risk
Estimation (check the severity level and likelihood of detected hazards)
As a result of this the Preliminary Safety Case should include the following:
• System contents and top safety requirements
• Important standards or points of principal concern for the final Safety Case
• Main safety concerns (results of Risk Estimation and Hazard Analysis)
• Safety Analysis or tests
• Development description (Change Management, coding standards, etc.)
• A starting point for the collection of supporting evidence, and an explanation of how it will be ensured that the system is safe
So in fact the Safety Plan already includes most of the information that is needed to create the
Preliminary Safety Case.
As described in the YELLOW BOOK Volume 2 (page 1-4, figure 1-1) [9] on railway safety, the evolution of the Safety Case is a sequence of several safety activities, some of which a safety authority is required to endorse. Typically the process starts with the preliminary safety plan and goes on with the Hazard Log (1) and Hazard Identification (2). Risk assessment and the setting up of safety requirements come next. The next step is the preparation, endorsement and implementation of a safety plan (as explained above). An independent safety assessor should carry out a safety assessment (that is, an evaluation of the safety or risks) and produce a report. The final action is the preparation of the final Safety Case and, with it, the Safety Approval. The three phases of the Safety Case merge into one another, so the borders are fluid. It should be mentioned that the Safety Case has to be continuously updated and maintained.
[Figure: Preliminary Safety Case, Interim Safety Case and Operational Safety Case along a time axis, with the Safety Authority involved throughout]
1 Hazard Log: lists all potential hazards of the project or product in its environment.
2 Hazard Identification: identify hazards through a systematic hazard analysis process encompassing detailed analysis of system hardware and software, the environment (in which the system will exist), and the intended use. Consider and use historical hazard and mishap data, including lessons learned from other systems.
As defined in Def Stan 00-56 [4], a Safety Case Report is a summary of the Safety Case that is produced at specific stages, either periodically or after particular achievements, to ensure that the Safety Case is being done properly. It also provides insight into all safety management activities. It is like a snapshot taken for particular reasons, e.g. to provide evidence that the requirements of a standard are met.
It is conceivable to plan these reports at specific times during development; the definition of these points can be seen as defining milestones.
The safety requirements are an integral part of the Safety Case report, as is the hazard log. In fact the Safety Case report is a summarizing documentation of the Safety Case, which itself is a “documented body of evidence… accessible at different levels of detail”. For developing, communicating and reviewing, it is necessary that a report is created. The safety lifecycle and the development of a Safety Case should be tightly linked, and the selection of testing and evidence for the software should depend upon the Safety Case. In JSP 318B [10] the Safety Case evolves during the whole safety life cycle. Figure E.8 also explicitly shows the differentiation between the evolving Safety Case and the final Safety Case report that is produced.
Figure E.8: Role of the Safety Case report JSP 318B [10]
The safety argument is often poorly communicated through the textual narrative of Safety Case
reports. The Goal Structuring Notation (GSN) and ASCAD, presented in the next chapter, have been developed to provide a clear, structured approach to developing and presenting safety arguments.
As already described the Safety Case report is “a report that summarizes the arguments and
evidences of the Safety Case, and documents progress against the safety program”. This function
implies the following contents:
• Executive Summary
A summary of the whole Safety Case is necessary. All important facts and argument steps can be included here, giving an overview.
• Summary of System Description
This is to ensure that the same system is underlying.
• Assumptions
All assumptions should be summarized so that the reader knows under which conditions
the safety should be guaranteed (e.g. numbers of personnel, training levels, time in service,
operating environment etc.).
• Progress against the Program
The current status has to be described as well so that the reader knows how much
progress has been made.
As William S. Greenwell, Elisabeth A. Strunk, and John C. Knight noted in their work [11], faults often reveal themselves shortly after deployment. This implies a clear relation between system development and failure analysis. It is intuitive to prevent failures before they can occur. To do so, failure analysis should be carried out early, so that potential failures (accidents etc.) are discovered at an early stage and the results can be used to prevent them from being built in. The earlier they are known the better, in terms of cost, product and company image, easier defect removal, etc.
Three main fault categories are defined [12]:
1) Random failures that are within the scope of the system’s safety requirements;
2) Attempts to operate the system outside of its intended environment; and
3) Failures resulting from defects that compromise the system’s ability to meet its safety
requirements
Several problems make it hard to identify the causes of a field fault:
1) Complexity of the system
2) Informality of the failure analysis process
3) Differences in designs and in development practices
Failures that lead to hazards can be an indication of an insufficient level of safety. Two Safety Cases can be defined: the Safety Case before the failure (Original) and after it (Revised). The post-failure Safety Case is essentially the result of correcting the original, flawed safety argument.
E.5 Notations
This section describes why using graphical notations, rather than plain text, to explain the arguments in a Safety Case is a great advantage. After presenting arguments in favour of the usefulness of notations, the two most common ones are introduced and explained.
The details required by most standards, and the complexity of the systems involved, make Safety Cases grow very fast, so they quickly reach a size at which clarity is lost and it becomes very difficult for most people to read them. Linear documents (e.g. Figure E.10: Free text format) are difficult to get an overview of, because the structure of the arguments is not easy to find. Furthermore, there is often one person holding all the information and arguments together, and only this person knows what he or she is referring to.
Many people work together in creating the Safety Case, typically leading to many different and conflicting views and sources.
Referring to Bloomfield and Bishop's work [3], the main problems during construction of Safety Cases without graphical and other tool support are the following:
Size and complexity
As can be imagined, the Safety Case quickly grows to a huge size. Chapter E.7, “The contents of the Safety Case”, describes what a Safety Case should include. Complexity is the other side of the same problem: evidence is usually technical and hard to understand, and without graphical support it is almost impossible to understand why the system will be safe.
Co-ordinating and presenting results from many different sources
The Safety Case consists of many analyses and test results from many different independent people. Were they all from one person, the trustworthiness of the evidence would be lower, but this diversity of sources compounds the problem of collation and presentation.
Use of free-format text
Using a free text format makes it more difficult to see the key points. Furthermore, the English used in many free-text Safety Cases is poor. Figure E.10 (free text format), taken from Kelly's paper [13], provides an example:
For hazards associated with warnings,
the assumptions of [7] Section 3.4
associated with the requirement to
present a warning when no equipment
failure has occurred are carried
forward. In particular, with respect
to hazard 17 in section 5.7 [4] that
for test operation, operating limits
will need to be introduced to protect
against the hazard, whilst further
data is gathered to determine the
extent of the problem.
E.5.2 ASCAD
The ASCAD (Adelard Safety Case Development) notation was developed by Adelard LLP, a company dealing with software and safety in general. ASCAD is a complete Safety Case development strategy, based on the Claim-Argument-Evidence structure.
SHIP (Safety of Hazardous Industrial Processes in the Presence of Design Faults) was a project whose overall objective was to devise a means of assessing, ideally numerically, the achieved reliability or safety of a system in the presence of design faults.
The SHIP model of the Safety Case defines the following elements:
• Claims about properties of the system or subsystem
• Evidence used as the basis of the safety argument
• Argument that links the evidence to the claims via a series of inference rules
• Inference rules that provide the transformational rules
Three types of argument are distinguished (a small worked illustration follows this list):
• Deterministic – relying upon axioms, logic and proof
• Probabilistic – relying upon probabilities and statistical analysis
• Qualitative – relying upon adherence to standards, design codes etc.
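As a small constructed illustration of a probabilistic argument (our own example, not from the SHIP project): assuming a constant failure rate λ, the probability of at least one failure during an exposure time t is P = 1 − e^(−λt). For λ = 10^-7 per hour and t = 5000 hours this gives P ≈ 5 × 10^-4, a figure that could then be compared against a quantitative safety target.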
With this definition graphics like the following can be developed to describe the Safety Case:
[Figure: the SHIP Safety Case model – evidence and argument together result in conclusions, and conclusions together result in the claim]
Adelard developed a notation for linking the evidence to the claim. The notation has two common names: ASCAD (Adelard Safety Case Development) and CAE (Claim Argument Evidence). In the following, the name ASCAD is used because it is the official name according to the Adelard homepage (www.adelard.com), Adelard being its developers. Claims are represented by blue circles, evidence by purple squares, and arguments by green squares with rounded corners (as illustrated in Figure E.12).
The phrasing in the ASCAD notation is relatively free; there is no prescribed form. Existing ASCAD networks use the following word structures:
Claims Claims are statements that force a TRUE/FALSE conclusion and are stated as <noun> <predicate> phrases.
Arguments Arguments are stated as <noun> <predicate> phrases. This can be done with a sentence starting with “Argument because …”, but it is also possible to formulate any sentence that explains the connection between the upper claim and the element below.
Evidence Normally evidence is stated as a noun phrase; only the name or description of the document itself is given.
Other Other statements can be of any kind; they are only additional information that does not itself constitute an argument. Justifications, comments, etc. can be stated here, as can lists of staff, etc.
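The following minimal sketch (hypothetical names and example content; Python used only for illustration) shows how a small Claim-Argument-Evidence tree following these phrasing conventions could be represented:

from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str   # "claim" | "argument" | "evidence" | "other"
    text: str
    children: list = field(default_factory=list)

# Claim and argument as <noun> <predicate> phrases; evidence as noun phrases.
tree = Node("claim", "Brake-by-wire function is acceptably safe", [
    Node("argument", "Argument because all identified hazards are mitigated", [
        Node("evidence", "Hazard analysis report"),
        Node("evidence", "Fault injection test results"),
    ]),
])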
To find out what ASCAD has in common with Toulmin's scheme, both notation structures are illustrated in one figure:
[Figure: mapping of Toulmin's warrant, grounds and claims onto the ASCAD elements]
As mentioned in the ASCE manual [7], the typical Safety Case structure review process consists of
the following steps:
1) Make an explicit set of claims about the system.
2) Identify the supporting evidence.
3) Provide a set of safety arguments that link the claims to the evidence.
4) Make refinements based on review and evaluation.
5) Make clear the assumptions and judgments underlying the arguments.
6) Allow different viewpoints and levels of detail.
The Goal Structuring Notation (GSN) is a graphical argumentation notation developed at the University of York. A good starting point for GSN is Kelly's PhD thesis [1].
[Figure: the GSN elements – Goal (e.g. G1, phrased as <Subject> <Predicate>), Context, Model (M1), Justification (J1), Assumption (A1), Solution (normally a description with no <Subject> <Predicate> phrasing), and Strategy (S1, S2, phrased “Argument by/over …”), with sub-goals G1.1 and G1.2]
One of the main differences between the two notations is the naming of the elements; in other respects they are quite similar. This short section describes the elements and the phrasing of the GSN notation.
Within this description the terms <noun-phrase> and <verb-phrase> are used with the following
meanings:
A noun phrase consists of a pronoun or noun with any associated modifiers, including adjectives,
adjective phrases, adjective clauses, and other nouns in the possessive case.
A verb phrase consists of a verb, its direct and/or indirect objects, and any adverb, adverb
phrases, or adverb clauses which happen to modify it. The predicate of a clause or sentence is
always a verb phrase.
Goal What the argument must show to be true. This can be a requirement, a target or a constraint. Kelly proposed in his doctoral thesis [1] that a goal should be a statement that is always either TRUE or FALSE, and of the form <noun-phrase> <verb-phrase>, where the noun-phrase is the subject of the statement and the verb-phrase is the predicate (e.g. “system is safe”).
Strategy Breaks down a goal into a number of sub-goals, or leads the goal to the solution. It is recommended that strategies are of one of the following forms: “Argument by <approach>”, “Argument over <approach>”, “Argument using <approach>”, “Argument of <approach>” – the focus of these is the argument itself. Strategies can be stated explicitly between goals and sub-goals or solutions (the next element explained), but where the connection is clearly understandable without them they can be left out; a strategy exists as an aid to understanding. For illustration, a small example from simple mathematics: when solving an equation for x, the step of dividing both sides (formally annotated, e.g. “/: x”) can be left out, but writing it down documents how we got from one line to the next – just as a strategy documents how a goal is broken down.
In addition, GSN has two display rules that drive the annotation of nodes which are yet “to be developed” or “to be instantiated”. The corresponding symbols are (usually) appended at the bottom of the goal:
[Figure: annotation symbols for “to be developed”, “to be instantiated”, and “to be developed and instantiated”]
Furthermore, there are special relations, adding annotations to the arrows to express a multiplicity of 0…n (a), 0 or 1 (b), or exactly 1 element (c):
[Figure: arrow multiplicity annotations – (a) labelled n for 0…n, (b) optional, (c) exactly one element]
The Is_A relation provides a basis for the expression of super-type and sub-type relations between GSN entities, and can therefore be used to establish type hierarchies. It is demonstrated as follows:
[Figure: goals G2 “All hazards are eliminated” and G3 “Process carried out properly”; G2 is supported by the generic goal Ghazard “Hazard i is eliminated (i = 1 to n)”]
In the figure, Ghazard carries the combined annotation “to be developed” and “to be instantiated”. The arrow between G2 and Ghazard is marked with n, making clear that this relation exists n times.
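A minimal sketch of how the goal structure of this example could be represented (hypothetical names; Python used only for illustration), with one instance of Ghazard per identified hazard reflecting the n-fold relation:

from dataclasses import dataclass, field

@dataclass
class Goal:
    gid: str
    statement: str            # phrased as <noun-phrase> <verb-phrase>
    undeveloped: bool = True  # corresponds to the "to be developed" annotation
    children: list = field(default_factory=list)

hazards = ["H1", "H2", "H3"]  # placeholder hazard list
g2 = Goal("G2", "All hazards are eliminated", undeveloped=False,
          children=[Goal(f"Ghazard-{h}", f"Hazard {h} is eliminated")
                    for h in hazards])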
Figure E.19 shows the relationship between the Goal Structuring Notation (GSN) and the original Toulmin concepts: a GSN “goal” is equivalent to a claim, which is “solved” by strategies (equivalent to backing and warrants), sub-goals and solutions (which can be related to grounds):
[Figure E.19: mapping of Toulmin's warrant, claims and grounds onto GSN goal, strategy and solution]
As Kelly described in his work “A six step Method for Developing Arguments in the Goal Structuring Notation” [14], the construction of a GSN network can be divided into six steps:
[Figure: the six-step method; the steps visible include Step 1: identify goals to be supported, Step 3: identify strategies to support the goals, Step 5: elaborate strategy, and Step 6: define the basic solution]
[Figure: a main goal decomposed, via a model, into sub-goals 1 and 2]
There are three things that should be borne in mind while creating Arguments in a Safety Case:
• Intelligibility: Consider the reader.
• Clarity: Everything should be stated in such a way that misunderstanding is avoided.
• Invulnerability: Explain everything from the ground up.
This method supports all three points.
At first the elements are compared. In the following table the three main types of element are set side by side, to underline that apart from the colours and the names the main elements are the same:
Toulmin ASCAD GSN
Claim Claim Goal
Warrant + Backing Argument Strategy
Grounds Evidence Solution
In addition, GSN provides further graphical elements, such as Justification (J) and Assumption (A).
Table E.3: Comparison of graphical methods II
Direction of arrows The semantics of both notations are actually the same. In GSN the links read passively, “[parent] is solved by [child]”, but the arrow could equally be drawn the other way round to read “[child] solves [parent]”. Similarly, ASCAD could use downward arrows reading “has evidence”, etc. Effectively, both have a direction of argumentation support from child nodes to parent node; the difference lies only in how the links are verbalized. The direction also indicates the flow of argumentation and construction, e.g. top-down or bottom-up. A special case is the arrows to the additional elements in GSN (mainly drawn horizontally); these arrows are used to link the additional elements to a goal.
GSN is more specific GSN uses “models”, “assumptions”, “context” and “justification”, whereas the ASCAD notation only uses “other”. Furthermore, there are symbols for “undeveloped” and “uninstantiated” nodes. This is more specific and helps to ensure that all important facts are listed; for example, the model element encourages an exact description of the system.
ASCAD is easier to understand and to present to someone with a non-safety background.
Luke Emmet from Adelard explained when to choose which notation by the following lines [15]:
“Each author needs to find the best trade-off for his/her argument between being
suitably expressive but not being too verbose. Most of our users find a mixture of
graphical elements and explanatory narrative behind the nodes to be about right.
…
Some people prefer GSN in contexts where a "top down" planning/decomposition
flavor is helpful (e.g. perhaps planning a Safety Case and evidence to be
collected). Others prefer CAE in a context where evidence might already exist and
you want to show the best claim that it can support. But you can use both for each
of these purposes.
Some customers use both at different times in their process.
In summary, we support both notations equally (and some others besides), and it
is up to you to choose the one that suits your needs and context best. At Adelard
we use both with a slight preference for CAE.”
This may be the best answer to the question of which notation to use, although it should be mentioned that Luke Emmet (as he also notes in the lines above) works for Adelard, the company that developed ASCAD and implemented the “ASCE” tool (described later). In the following, the GSN structure will be used.
Advantages:
Regardless of the differences, both notations have great advantages over the free text format for developers of Safety Cases. The five main advantages are:
1. Help to construct Safety Cases – with these trees the structure is much more clearly arranged, and during construction it is easier to follow up “limbs”.
2. Overview of structure – a graphic is a fast way to survey all items and their relations.
3. Contents – all contents are represented in the graph.
4. Easy to understand – once the notations are learned and understood.
5. Guidance – provided by literature and the internet.
Disadvantages:
The main disadvantages should be mentioned as well:
1. Learning is necessary – in contrast to the free text format, how to create the notation has to be learned: not only the content but how to “draw” it. Reading is quite easy, but writing can be complicated.
2. Quality is not addressed – the notation itself says nothing about the quality of an argument. To address quality, information such as the SAL has to be added (see E.9.2, Assessment with SAL).
This section is concerned with identifying the necessary activities and information to construct a
Safety Case. Several “Safety Case Development Methodologies” have been developed. “Arguing
Safety – A Systematic Approach to Managing Safety” [1] gives a good introduction and provides an
overview of past and current research concerning Safety Case development. In particular, the
works of the following projects have been presented:
• ASAM (A Safety Argument Manager), ASAM-II and SAM
• SHIP (Safety of Hazardous Industrial Processes)
• Communication in Safety Cases
• Adelard Safety Case Development Method (ASCAD)
• SERENE (Safety and Risk Evaluation using Bayesian Nets)
As stated by the ASCAD manual, a typical Safety Case development process consists of the
following steps:
• Make an explicit set of claims about the system.
• Identify the supporting evidence.
• Provide a set of safety arguments that link the claims to the evidence.
• Make refinements based on review and evaluation.
• Make clear the assumptions and judgments underlying the arguments.
• Allow different viewpoints and levels of detail.
There are a number of different stakeholders who are involved in Safety Case development or evaluation, each of whom approaches the Safety Case from a different viewpoint, such as:
• Safety specialists involved in detail.
• Development staff that may not have safety as their primary concern.
• Project managers.
• Operators.
• Senior staff who accept equipment as safe to enter service.
• Supply chain who may be involved in the production of safety evidence or arguments about
sub systems.
The first approach introduced in this appendix compares the safety development with the engineering process. In an engineering process the main system is typically divided into subsystems:
[Figure: a goal structure dividing the argument over the components of the system (“Aufteilung in Komponenten”: division into components), using strategies such as “Argument over …” and “Argument by …” applied to attributes (e.g. PFH) and functions]
The following description is not the only possible way to build up a Safety Case. It is just an idea based on a personal conversation with Dr. Robert Weaver; it is not a standard form and there is no guarantee of its completeness or correctness.
The first step involves differentiating between process-based requirements and the other, product requirements. Since the product requirements change more often, they have to be reviewed each time anything changes.
As illustrated in Figure E.24, this first split is traceable. In the process-based argumentation there will be topics like qualified staff, project planning, standards and other project-based topics, whereas the product-based argumentation contains design aspects, functional reasoning and safety components.
[Figure E.24: top-level split of the argument into the goals “Process is safe” and “Product is safe”]
Checking the process should answer the question of how the product is produced. In many cases there are standards for the industry concerned. These should always be noted in the Safety Case, along with a description of how they have been met. Some standards define minimum requirements that should be reached (e.g. SIL 3 according to IEC 61508). The process can be divided into development and production.
[Figure: process argument fragment: goal "Process developed in a way that it is safe", decomposed into "Hazard identification and assessment performed" and "Preliminary hazard identification done", with contexts such as a hazard checklist, preliminary HI results, previous experience (old lists) and a description of all system functions; the PHI results feed a functional hazard analysis, and the HI and FHA results serve as evidence.]
3 The system is the thing which is developed. The development is divided into how the thing is developed, which is the process part,
and what is developed, e.g. functions and properties, which is the product part.
The product itself should be shown to be safe enough. This can be supported by standards as well,
but it has to be shown that this is sufficient and that the norms and standards have been met. One
possibility is to use risk-based arguments, for example over the individual hazards. The arguments
can be split according to these criteria.
[Figure: product argument fragment: goal "Product is safe", decomposed by an argument over risks/hazards and an argument by functions, down to system or subsystem level.]
6 step method
As Kelly describes in the appendix of his work "A six step method…" [14], arguments can be
developed from requirements. In the first step, different categories of requirement, functional or
performance, are identified. Of course there are many more, but at this stage those two are
sufficient.
[Figure: top-level goal G1 "requirements are met".]
Figure E.29 is just an example, used only to illustrate the development of an argument. The
only safety-related function could be correct braking when a difficult track section is imminent
(G2). There can be other safety-related requirements as well, for example concerning in-service
use, development or safety criteria, but in the following the focus lies on those two.
Another idea would be to analyse the evidence already available and determine the best claim
that can be supported by it. In this case the ASCAD notation is used. As already stated, both
notations would be possible, but because of its defined phrasing this one was chosen at this
stage.
[Figure: a best claim supported by an argument over two items of already given evidence (Evidence I and Evidence II).]
To find out what the contents of a Safety Case are, already existing Safety Cases are analyzed and
similarities between them are uncovered. In general, equivalent topics are included. The next
pages provide a short description of these topics.
E.7.1 Topics
There are some important main topics, which are introduced first and explained in the following
paragraphs. Of course most of the contents of the preliminary Safety Case are included in the final
Safety Case. It cannot be decided which topic belongs to which phase, because the development
should be incremental.
1) Introduction
2) Description of the system
3) Standards
4) Quality and Safety Management
5) Current status and development process
6) Configuration and Change Management
7) Conclusion
E.7.1.1 Introduction
This part is, as its name suggests, the first part of every Safety Case. It should introduce the Safety
Case itself and describe how it is partitioned. It names the most important contents and gives a
general preview of how they are handled. Also, the motivation for creating the Safety Case, and
whether it inherits from other Safety Cases (e.g. of sub-systems), can be added. In the end there
should be a clear overview of the Safety Case and its contents (an executive summary). The input
will be given by the points below.
E.7.1.2 Description of the system
One of the most important points in the Safety Case is always an adequate model description
(what kind of model it should be and what will be modelled). It is essential to ensure the system is
always discussed in the same context. Without this information the Safety Case is useless: it
would be dangerous to believe in safety without having a clear view of which system is being
studied. At the end there should be at least the following documents: requirements for staff
(preparatory training for new staff, experience, …), an exact environment description (in- or outdoor,
road or terrain), model description, extensions, adjustments, aims, main functions, and more; in
short, everything that helps to describe the system in more detail.
E.7.1.3 Standards
At this point every single method applied should be mentioned, so that it can be reconstructed how
a certain level of safety is to be achieved, and so that it can finally be checked whether everything
was done.
E.7.1.6 Configuration and Change Management
Where can information regarding the timeliness of actions and the versions of artefacts be found?
If it cannot be ensured that changes reach all affected parties, it may be that other changes
are affected or, at worst, disregarded. This can cause confusion (concurrent work,
misunderstandings, …). Good change and configuration management can provide a
solution to this.
E.7.1.7 Conclusion
As stated in Safety Case and Safety Case Report [17], the Safety Case body of information will
include outputs from all the safety management activities conducted on a project. The following is
a list of documents that could be included, but which documents really are required depends on
the arguments. Normally the documents are a subset of the following:
a. Safety Plans;
b. Disposal Plans;
c. Hazard Log;
d. Register of Legislation and other significant Requirements;
e. Minutes of Preliminary Safety Case meetings;
f. Safety Reports (e.g. Hazard Identification, Hazard Analysis, Risk Estimation, Risk
Evaluation);
g. Safety Assessment or Safety Case Reports for particular aspects of the system or
activities associated with the system (e.g. Software Safety Case, Disposal Safety
Assessment);
h. Safety Requirements;
i. Records of Design Reviews and Safety Reviews;
j. Verification Cross Reference Index;
k. Incident reports and records of their investigation and resolution;
l. Safety Audit Plans;
m. Safety Audit Reports;
n. Records of Safety advice received;
o. Results of Safety inspections;
p. Records of Safety approvals (e.g. Certificates);
q. Minimum Equipment List (i.e. vital to Safe operation);
r. Emergency and Contingency Plans and Arrangements;
s. Limitations on Safe Use;
t. Master Data and Assumptions List;
u. Evidence of compliance with Legislation and Standards;
v. Evidence of adequacy of tools and methods used;
w. List of people and their activities;
x. Signed statement;
y. Results of Tests and Trials;
z. Plans for Tests and Trials;
aa. System description;
bb. Design process description;
cc. Verification results;
These documents can be either included in the Safety Case or referenced by it. Both ways have
their advantages and disadvantages. Including the documents has the benefit that all information
is present in one document. The main disadvantages are the size of the document (it can be
expected to run to several hundred pages) and that the intellectual property is exposed completely
in the Safety Case. The latter disadvantage is sensitive because the Safety Case is a "public"
document which should be presentable to everyone interested. Therefore referencing the
supporting documents is recommended: the Safety Case contains the argument why the system is
safe enough, and any interested person can ask for the supporting documents, which are
identified uniquely in the Safety Case.
This section covers a description of the safety arguments. The safety arguments link the claims
to evidence. Examples of methods such as the Goal Structuring Notation (GSN) are given to
illustrate the essential features of this step.
Within “A Methodology for Safety Case Development” [3] three types of arguments are
mentioned: deterministic, probabilistic and qualitative.
They state further: “The choice of argument will depend on the available evidence and the type of
claim. For example claims for reliability would normally be supported by statistical arguments, while
other claims (e.g. for maintainability) might rely on more qualitative arguments such as adherence
to codes of practice.”
Because arguments can neither be categorized nor standardized, their reuse is not simple. This
discussion is deferred to chapter E.8.2, “Reusing Arguments by applying Patterns”, which
discusses the reusability of arguments.
This section covers a description of the supporting evidence. It is concerned with determining
which sources could be used, e.g.
• Design
• Results of Safety related activities
• Development Process
• Verification and Validation
• Field experience
The choice of argument will depend in part on the availability of such evidence, e.g. claims for
reliability might be based on field experience for an established design, and on development
processes and reliability testing for a new design.
For EASIS, new sources of evidence may need to be defined and described.
Before categorizing elements, the focus lies on the difference between HW and SW. Then
similarities between already existing examples are sought. After extensive consideration of the
claims it was found that it makes no sense to categorize them. Goals, except the main goal, are
part of arguments. But before the arguments are analyzed further, the evidence is the point
of interest.
Today most systems include SW, regardless of which systems are considered, from calculators to
cars and, of course, computers. There are, however, big differences between hardware and software,
which are analyzed in the following.
With regard to testing the following differences exist (numbering as in [18]):

1. Hardware: failures arise in development, production or maintenance. Software: failures are caused by humans during software development.
7. Hardware: reliability can be increased by redundancy. Software: redundancy often does not help much.
8. Hardware: failure rates often follow the same pattern. Software: no exact failure patterns exist.

Table E.4: Differences between hardware and software with regard to testing [18]
Generally hardware and software have different design and development processes and
consequently different behaviour. This behaviour of course causes different failures, which can
lead to hazards. In general it should be mentioned that any failure has to be removed, or at least
detected and taken into account. In safety-critical systems the importance of finding failures is
much higher. The structure of the failure rates differs between hardware and software, as
illustrated on the next pages.
The hardware failure rate has the following structure:
[Figure: failure rate f over time t across releases 1, 2, 3, …]
Conclusion:
The most important difference between HW and SW is that HW failures are probabilistic while SW
failures are systematic. This is why the breakdown of HW can be estimated, e.g. by the MTTF
(Mean Time To Failure). For this reason many documents discuss only HW failures, and there are
special analysis and test techniques which can only be applied to HW.
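To illustrate the statistical treatment that is possible for HW (but not for systematic SW failures), the following Python sketch estimates the MTTF from observed failure times under an assumed constant failure rate. It is an illustration only; the figures and names are invented.

import math

def mttf(times_to_failure):
    # Estimate Mean Time To Failure as the average of observed failure times.
    return sum(times_to_failure) / len(times_to_failure)

def reliability(t, mttf_value):
    # Survival probability at time t, assuming a constant failure rate
    # (exponential model), the usual assumption for random HW failures.
    return math.exp(-t / mttf_value)

# Invented example figures: operating hours until failure of five test units.
observed = [11000.0, 9500.0, 12200.0, 10400.0, 11900.0]
m = mttf(observed)
print(f"Estimated MTTF: {m:.0f} h")
print(f"Reliability over 5000 h: {reliability(5000.0, m):.2%}")

No comparable estimate exists for systematic SW failures, which is exactly the point made above.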
E.7.6 Evidences
The evidences address different aspects of the goal (e.g. assessment technique (SAL, review,
etc.), tests, analysis, etc.). It always has to be ensured that each of them is defensible enough to
confirm the underlying statement. There are three principles that should be borne in mind while
planning evidence:
1) Know the goals (necessary and sufficient: only as much evidence as is required to confirm the
underlying statement, and no more)
2) Satisfy the requirements
3) Proof has to be efficient
E.7.6.1 Attributes
Important aspects when dividing evidence are independence, relevance, coverage [20] and
strength. These points are further discussed below.
Independence:
Level | Type of independence
I | Conceptual and mechanistic independence
II | Conceptual but not mechanistic independence
III | Mechanistic but not conceptual independence
IV | No independence
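The table can be encoded as a simple mapping. The following Python sketch (illustrative names only) derives the level from the two independence properties of a pair of evidence items.

def independence_level(conceptual: bool, mechanistic: bool) -> str:
    # Map the two independence properties to the levels I-IV of the table above.
    if conceptual and mechanistic:
        return "I"    # conceptual and mechanistic independence
    if conceptual:
        return "II"   # conceptual but not mechanistic
    if mechanistic:
        return "III"  # mechanistic but not conceptual
    return "IV"       # no independence

# Example: two analyses based on different theories but the same mechanism.
print(independence_level(conceptual=True, mechanistic=False))  # -> "II"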
As Adelard states in the manual of the ASCE tool, sources of evidence could include the
following:
• Hazard Log: listing all potential hazards of the product in its environment, using the results
of a HAZOPS activity, etc.
• Probabilistic design assessments and qualitative design studies: These sources of
evidence can be divided into: inductive (e.g. FMEA) and deductive (e.g. FTA) Methods.
• Resource estimates: Resources used for the implementation and the associated Safety
Cases are assessed by estimation methods.
• Design techniques: Some design techniques provide prior evidence.
• Independent certification (e.g. COTS products)
• Experience from existing systems in field operation
The most important methods for obtaining evidence are: tests, reviews, analysis and formal
methods. For further reading the reader is referred to EASIS WT 3.2. At this stage it is just
mentioned that the trade-off between formality and complexity has to be considered very carefully.
This is illustrated in the following figure:
[Figure: evidence methods ordered by increasing formality (tests, reviews, analysis, formal methods); complexity/costs and trustworthiness rise with formality.]
In software design, architectural principles are applied to achieve a better structure and to avoid
systematic failures. Safety engineering also profits from these benefits. There are three main
principles that are explained in this chapter: Tactics, Patterns and Modules.
In architecture, design decisions that influence the handling of attributes are called Tactics. If
multiple tactics are applied, the combination is called a strategy. This leads to the following
definition of safety Tactics:
Safety Tactics are design decisions that influence safety attributes and their handling.
As a consequence it becomes clear that the attributes have to be identified first. In this report
safety is considered, so safety attributes are the starting point. In his MSc thesis, Weihang Wu [27]
bases his tactics on the following failure attributes:
Failure classification
• Failure cause
• Failure behaviour
• Failure property
These points can be further sub-classified. Weihang Wu finally arrived at the following structure:
[Figure: failure classification tree with the sub-attributes tolerability, detectability, timing and environment.]
In Safety Tactics some important headings should be covered. The list below illustrates them:

Aim: At the beginning the reader should get a short summary, so that he knows whether this Tactic is suitable for his project.
Description: This description is more extensive than the aim. Here a more detailed description is proposed.
Rationale: The basic principle, which normally is represented in a GSN structure.
Applicability: In this part of the tactic description any rules and circumstances are given.
Consequences: What happens if failures occur?
Side effects: Are other attributes affected by this Tactic?
Practical Strategies: Some strategies that result from this Tactic.
Patterns: Sometimes Patterns implement tactics. Those could be mentioned here so that a clear link is provided.
Related tactics: To ensure that the correct tactic is applied and that related ones are considered as well, they should be mentioned here.
Kelly compared some already existing pattern methods and chose the one from the so-called Gang
of Four [21]. This pattern format has the following topics:

Pattern name: Name of the pattern (e.g. argument name).
Intent: It is important that, by reading this section, there is a clear understanding of what is being attempted.
Also Known As: If another name is imaginable it should be noted in this section (maybe it is already used by a different author).
Motivation: Here a kind of help for other engineers trying to interpret and apply the following description of the pattern correctly (e.g. previous experiences, problems, etc.) should be given.
Structure: A clear structure is necessary (e.g. G1 solved by Sol1 over Str1). If done so, it is possible to refer to any element.
The following Patterns were found while creating GSN structures for a better understanding of the
subject matter. The Top-Level Spider Pattern is a structure that was repeated in almost every GSN
because of the necessity of a good start: the detailed model description, the definition of “safe
enough”, the assumptions made and the justifications why a specific structure can be chosen all
have to be stated at the beginning.
The idea of the “Systematic Fault Avoidance Pattern” is that the first decomposition should be
between process and product. By doing so, two different aspects are handled: what processes
were executed to ensure safety, and how were the safety-related processes implemented?

Intent: The intent of this Pattern is to separate the processes and the system design. The design
can only be safe if the way to it was carried out properly.
Structure: [GSN structure: top goal G1 "{System X} is safe", with contexts C1 "Product development processes of {System X}" and C2 "Safety related development processes of {System X}"; strategy S1 "Argument over process and product" decomposes G1 into goals G2 "Development {Process A} carried out properly", G3 "Safety provision {Process B} carried out properly" and G4 "{Product C} designed safe".]
Applicability: This is a very general pattern with a wide range of applicability. For applying it, C1
and C2 have to be established first. For this you should check which standards can or should be
applied (e.g. IEC 61508). This can be addressed by the development plan and the safety plan.
At the end you should have a list of the processes carried out. The list should be updated
in later stages, but then the corresponding elements have to be changed as well.
Consequences: This pattern results in three goals which have to be solved. G2 and G3 will have
further decomposition, which is not the topic of this pattern, but it should be mentioned that
processes mostly have standards which describe the way they should be handled (e.g. qualified
staff, SIL, etc.). Another pattern could be created including contextual information about these
ways. G4 should be broken down further as well (e.g. by Functional Decomposition).

Implementation: Start by identifying the lists which should be presented in C1 and C2. This step
can be reviewed and has to be repeated, or rather extended, in later development stages. The
processes will have to be explained in more detail (e.g. model) and the underlying standards and
guidelines should be applied.
Possible Pitfalls:
• Processes can be left out because the user thinks they are not important enough.
• Processes identified during product development are not implemented in G2 or G3.
Example: [GSN structure: top goal G1 "System 'Safe Speed' is safe", with contexts C1 "List of development processes according to SPICE" and C2 "Safety plan according to IEC 61508"; strategy S1 "Argument over process and product" decomposes G1 into G2 "Development processes carried out properly", G3 "Safety plan processes carried out properly" and G4 "'Safe Speed' control unit is safe".]
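For illustration only, the pattern instantiation above can also be written down as data. The following Python sketch uses freely invented names; the lists mirror the GSN "solved by" and "in context of" relations, and it is not part of the original pattern description.

from dataclasses import dataclass, field

# Minimal GSN element model (names are illustrative, not a standard API).
@dataclass
class Node:
    node_id: str
    kind: str                                    # "Goal", "Strategy", "Context"
    text: str
    children: list = field(default_factory=list)  # "solved by" links
    contexts: list = field(default_factory=list)  # "in context of" links

# Instantiation of the Systematic Fault Avoidance Pattern for "Safe Speed".
g1 = Node("G1", "Goal", 'System "Safe Speed" is safe')
s1 = Node("S1", "Strategy", "Argument over process and product")
c1 = Node("C1", "Context", "List of development processes according to SPICE")
c2 = Node("C2", "Context", "Safety plan according to IEC 61508")
g2 = Node("G2", "Goal", "Development processes carried out properly")
g3 = Node("G3", "Goal", "Safety plan processes carried out properly")
g4 = Node("G4", "Goal", '"Safe Speed" control unit is safe')

g1.children.append(s1)
s1.contexts.extend([c1, c2])
s1.children.extend([g2, g3, g4])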
Top-Level-Spider Pattern
Author: Roman Krzemien, Michael Amann
Intent: The intent of this Pattern is to instantiate the top-level goal and the surrounding elements.
In other words, the environmental context is set up.
Structure: [GSN structure: top goal G1 "{System X} is safe enough", surrounded by model M1 "System description", justification J1 "{System X} is of a suitable type for this", assumption A1 "Assumptions made for reasoning this" and context C1 "Definition of acceptably safe".]
Applicability: It is a very general pattern with a wide range of applicability. For applying this Pattern
it should be checked which standards can or should be applied (e.g. IEC 61508). This can be
addressed by the development plan and the safety plan. M1 should be instantiated first so that
the underlying system is fixed. This, and C1 which describes the definition of safe enough, are
absolutely fundamental.

Consequences: This pattern can be seen as the starting point of every Safety Case. It is possible
to add more than these contexts; by this you can ensure that all relevant information is added.
Detailed information anchors the system in its environment, and the danger of setting up the
system under wrong assumptions is minimized.

Implementation: Start by identifying the detailed model description. The next step is the detailed
instantiation of the necessary definition of what safe enough means. This can help implementing
the structure. Often the other two points are closely linked, so it is better to start with instantiating
the assumptions and then the justifications.
Possible Pitfalls:
• The role of the assumptions and justifications can be neglected and thus poorly
communicated.
• A poorly communicated model description and definition of safe enough imply wrong
argumentation within the structure.
Example: [GSN structure: top goal G1 "Jaguar XK8 Electronic throttle is safe enough", with model M1 "XK8 AJ26 Electronic Throttle", justification J1 "ALARP principle applied in pattern", assumption A1 "Throttle should allow instantiation of ALARP principle" and context C1 "Safe enough means that IEC 61508 was satisfied".]
Kelly described a Safety Case Pattern Catalogue in Appendix B of his doctoral thesis [1]. Among
others, it contains the following patterns:
• ALARP Argument (top-down)
• Functional Decomposition
• Hazard Directed Integrity Level Argument
• Control System Architecture Breakdown (no specific domain)
• Diverse Argument (general)
• Safety Margin (construction)
E.8.2.3 AntiPatterns
In addition to providing patterns for standardizing what can be done, anti-patterns can be used to
illustrate things that should not be done.
The main reasons for poor arguments are:
- Fallacious arguments
- Incomplete arguments
Negative examples, called AntiPatterns, can be used to prevent a structure from being badly or
poorly set up. There are recurring mistakes made in several Safety Cases. One of them is that
process and product aspects are mixed and thus poorly communicated. Corresponding to the
“Systematic Fault Avoidance Pattern” explained above, an AntiPattern can be instantiated.
The structure of AntiPatterns is almost the same as the structure of Patterns. The main difference
is that AntiPatterns have two solutions instead of one problem and one solution.
AntiPattern Name: Every AntiPattern should get a unique name, so that misleading argumentation is minimized.
Also Known As: If there are different names imaginable, they should be stated here.
Most Frequent Scope: It helps to see that this AntiPattern could apply if the scope is given and highlighted at the beginning.
Refactored Solution Name: Here the name of the better structure is stated.
Refactored Solution Type: The scope of the new structure is given.
Background: This should be given to explain to the user why the refactored structure is better.
General Form of this AntiPattern: This field shows the GSN structure that should be changed.
Symptoms and Consequences: Here the consequences of the existing structure are stated to show the difference.
Typical Causes: Reasons that lead to such a structure can be given here.
Known Exceptions: If there are exceptions under which such a structure does not result in a poor or misleading structure, they should be mentioned.
Refactored Solutions: The general form of the correct or rewritten structure is provided.
Variations: If variations are thinkable, they can be added here.
Example: An example is always helpful, so it should be added to improve applicability.
Related Solutions: Related solutions help to apply this AntiPattern if the author already knows the related one.
Examples of AntiPatterns are the Formal Methods AntiPattern, the History AntiPattern, and the
two below. The following AntiPattern is called Convergent & Linked Support. Together with the
Overlapping Linked Support AntiPattern it is possible to refactor hybrid support into convergent
and linked support. The AntiPattern can be used to refactor the solution into a better form, at this
stage to help refactor poor argument structures. A more detailed discussion of AntiPatterns can be
found in [26].
The first step when applying a Pattern is to ensure that the Pattern really fits the underlying
situation. Normally there is not much to be shown, but it should be mentioned (e.g. as a context:
“Pattern can be applied because…”). As a second step the six step method can be applied,
following the structure provided by the Pattern itself. After everything is done the new structure
should be examined again, to check whether all the consequences and implementation facts are
accounted for. It should be mentioned that the use of AntiPatterns is analogous.
E.8.3 Modules
As stated at the beginning, Safety Cases grow very fast in size and complexity. Furthermore, the
work is not done by one person alone but is divided amongst groups, teams, several individuals, or
even companies (e.g. OEM and supplier). For these reasons the idea of decomposing complex
systems into sub-systems has become more and more important. In this chapter the idea of
structuring and managing complex system constructions by using “modules” is discussed. The
affinity to software architecture is obvious.
Architecture itself has two main principles: firstly it provides a plan; secondly it offers an abstract
representation to manage the complexity of the system.
The following points are of importance when considering architecture:
• The assumptions, justifications and intended environment should be visible
• Component arguments and evidences have to be structured
• Relationships and interdependencies should be made clear
Comparing this to the definition of software architecture on page 21 of the book “Software
Architecture in Practice” [22] (“The software architecture of a program or computing system is the
structure or structures of the system, which comprise software elements, the externally visible
properties of those elements, and the relationships among them.“) brings up a similarity to Safety
Case design. The idea arises that the principles used in software architecture are similar to those
in Safety Case engineering.
Principles that arise while thinking about modular systems are:
System level:
• The cooperation of the modules allows the operation of the system
• The system consists of several modules
Module level:
• The modules should be kept as independent as possible
• The modules help to understand and to work with the whole system by decomposing it into
smaller systems
The units into which a Safety Case is divided using logical grouping principles are called Safety
Case Modules. These modules are strictly connected to each other, and their relations have to be
noted carefully. The connections are called “ports” in this report. Outside GSN, ports are the only
remaining link to other modules; GSN has special notations for modules, but at this stage the
focus lies on the theory around modules and their relations. As illustrated above, another important
aspect arises: interactions and dependencies have to be analyzed very carefully. These
interactions of systems are exactly reflected by modular contracts.
It is important that all interactions and relations are documented very carefully so that this
information is not lost. For each module, at least the following should be recorded:
• objectives addressed by the module
• evidence presented within the module
E.8.3.5 Contracts
In so-called contracts the relations are documented. They should be created every time two or
more modules touch each other. This corresponds to software engineering, where these
dependencies are also captured (defined ports/interfaces).

Safety Case Module Contract
Participant Modules: e.g. Module 1, Module 2, Module 3
Resolved Away Goal, Context and Solution References between Participant Modules:
Cross Referenced Item | Source Module | Sink Module
Away Goal 1 | Module 2 | Module 3
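For illustration, such a contract can be recorded as a small data structure. The following Python sketch uses invented field names and simply mirrors the contract table above; it is not a prescribed format.

from dataclasses import dataclass

# Illustrative record of one cross-module reference in a Safety Case
# module contract (field names are assumptions, not a fixed format).
@dataclass
class CrossReference:
    item: str           # e.g. an away goal, context or solution
    source_module: str  # module that states the reference
    sink_module: str    # module that resolves it

@dataclass
class ModuleContract:
    participants: list
    resolved_references: list

contract = ModuleContract(
    participants=["Module 1", "Module 2", "Module 3"],
    resolved_references=[
        CrossReference("Away Goal 1", source_module="Module 2",
                       sink_module="Module 3"),
    ],
)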
This chapter should point out the main advantages and the main dangers.
Advantages:
As mentioned in the introduction of this chapter, Safety Cases are mostly created by more than
one person. To manage this it is necessary to split the whole into smaller parts and control the
parts as described above (cf. Figure E.39). The decomposition offers the possibility of working
concurrently and separately on the development of the Safety Case (e.g. in different groups),
which makes it faster and more trustworthy (because different people work on it).
Another advantage is that, in case of changes, reviewing the whole Safety Case can be limited to
reviewing the affected module. In software architecture, for example, it is assumed that the system
functionality is still maintained even if an unaffected SW component is not analyzed further.
Dangers:
It should be mentioned that even standards address modularisation (e.g. MoD Def Stan 00-55
talks about the division of system and software boundaries and a contract enforcing a link to the
system claims). A more detailed analysis of the occurrence of modules in standards is given in the
third section of Kelly’s paper “Managing Complex Safety Cases” [24].
With reference to Tim Kelly’s paper, the following potential dangers in Safety Case decomposition
are listed:
• It will be difficult to ensure that all interactions between subsystems are treated carefully,
especially when they come from different suppliers. A solution to this problem could be to
assign at least one element to the safety considerations raised by those interactions.
• The weight of the partial Safety Cases has to be checked by staff who know the total
Safety Case. Care should be taken that weak subsystem components do not undermine
stronger dependent parts.
• The effect or effort can be increased when responsibilities can be distributed. But, vice
versa, dividing responsibilities carries the potential danger of setting the “borders” too
close. In this case it can happen that two or more engineers each believe a part to be
another person’s responsibility.
As with most dangers it is almost impossible to eliminate them, but by careful planning, in this
case of the overall Safety Case, the dangerous areas can be controlled and the level of danger can
be minimized.
For better illustration, graphical notations are used. It is possible to include the modular approach
in GSN. It is important to note that the reference between the module and its parent system has to
be clearly defined. Packages from the UML (Unified Modelling Language) standard are illustrated
in the following picture:
There are two types of systems where modular Safety Cases are recommended in this report:
• Components Off The Shelf (COTS) and
• Systems Of Systems (SOS)
Both are ways of dividing work or systems into smaller parts. Figure E.44 illustrates how
different systems work together and what happens if a new system is integrated. The
dependencies and interactions have to be considered carefully so that the Safety Case remains
sound.
Figure E.44 illustrates that some components interact with others. In many cases these
components are produced by different teams or even suppliers. There is a difference depending on
whether a system is reused, designed for a specific system, or modified. COTS products are
normally reused and have different requirements and evidence than SOS, which are normally
produced to fit a specific environment.
SOS: With many systems it is impossible to require every single system to be safe. In the
automotive industry there are a lot of suppliers; company XYZ, say, builds gearshift systems for a
lot of different OEMs. The idea of the modular SC offers the possibility to ensure that their part of
the whole system is safe. This relation should be documented in any case (cf. other subtasks of
WT 3.1). The documentation of these interfaces could be done by contracts. This justifies the idea
of modular SCs.
These types play an important role when the relationship between supplier and OEM (Original
Equipment Manufacturer) is considered. The underlying system (e.g. a car) has a lot of
independent subsystems that are built by different manufacturers (e.g. the gearbox).
In this case modular contracts could be a way of not disclosing company secrets.
E.8.4 COTS
COTS products are Commercial Off-The-Shelf components, i.e. standard commercial software
developed without any particular application in mind. The collection of component claims and
supporting arguments and evidences alone is not sufficient. A commercial vendor (e.g. of real-time
operating systems) can provide a Safety Case for his product. In this case a contract should be
delivered as well, to ensure that the component is introduced and applied correctly. It should be
a “partially complete” and “ready to use” SC. The claims, arguments and evidences should be
stated clearly and reasonably so that this part can be included in the system SC.
Bought products have the advantage that they reduce costs in the development stage.
Furthermore, they are already produced, so the time to develop is reduced. In many
applications (e.g. aircraft, automotive industry, etc.) these components are absolutely necessary.
These components have positive and negative aspects. The main disadvantages are:
• unknown design
• risk and failure analysis is difficult
The original developers often agree to provide a safety certificate, but this normally costs an
additional fee and comes with only very rough evidences. Sometimes the vendors are not even
willing to provide these evidences. In this case the only possibility is to create strong requirements.
For the Safety Case this implies the following:
The Safety Case should ensure safety. This can only be done if the COTS component is safe. A
modular Safety Case ensuring this can help here. It must be ensured that no failure of the COTS
system can result in hazards.
The basic conditions should be established before the product is selected and, once it is selected,
recorded in a contract. As soon as those conditions are written down, the decision of which vendor
should be chosen becomes easier.
Tim Kelly and Fan Ye suggest using so-called Safety Case contracts similar to those used for
modules. They further propose to address the SAL (a Safety Case assessment method which is
introduced later). By doing so, the necessary confidence is ensured. The distribution in such a
concept allows the creation of modules; the modular concept was already discussed in previous
chapters.
This distribution has effects on the FTA as well: for the buyer it results in a black-box approach.
[Figure: FTA with a main module and a COTS module whose content is unknown (black box).]
E.9 Assessment
Until now the Safety Case structure and its contents have been considered. The next point of
interest is an assessment method. Several methods for allocating risk are mentioned in standards.
The main advantages of a good assessment model are:
• Improved system safety
• A defensible best-practice statement
• Certified, evaluated systems have a competitive advantage
• Confidence in the product is increased
• Costs are lower if the system is known to be acceptably safe
This chapter first introduces a method for Safety Case assessment (SAL), which will be pursued
later, and it introduces SIL. At the end of the chapter a two-level assessment is introduced.
Safety assurance may be defined as “a qualitative statement expressing the degree of confidence
that a safety claim is true” [29]. Following this definition, a qualitative valuation with regard to the
main argument is sought. The Safety Assurance Level (SAL) has been defined as follows: “SAL is
the level of confidence that a safety argument element (goal or solution) meets its objective” [26].
E.9.2.1 Procedure
Looking at the SAL assessment procedure, it becomes apparent that the starting point is at the
very top of the GSN structure, which often is “system is safe enough to operate”. From there the
assessment works downwards until all goals are examined. Then it is possible to assign a SAL to
the evidence and check it stepwise towards the top again.
Three phases are identified:
1) Top-level Safety Assurance Level
2) Parent-child goal analysis
3) Specify SAL for evidence
[Figure: the three phases against the argument hierarchy over time: (1) top-level SAL, (2) parent-child goal analysis working downwards, (3) SALs for the evidence checked stepwise back to the top.]
Analogous to SIL (Safety Integrity Level) and DAL (Development Assurance Level), SAL is
defined on four levels, which indicate the required level of assurance; the highest level is four.
The possible levels of relevance are: (Near) Valid, High, Medium and Low.
The next table gives an example of possible starting SAL criteria. If the hazard is critical, then the
failure to meet the top-level claim would equate to the potential of the system to cause a critical
hazard, giving SAL 3.

Negligible hazards: SAL 1 | Marginal hazards: SAL 2 | Critical hazards: SAL 3 | Catastrophic hazards: SAL 4
Linked support means that several elements together support the main argument. For a given
parent SAL and relevance of the child elements, the required child SAL is:

Parent SAL | Relevance | Child SAL
S4 | Valid/Near Valid | S4
S3 | Valid/Near Valid | S3
S3 | High | S4
S2 | Valid/Near Valid | S2
S2 | High | S3
S2 | Medium | S4
S1 | Valid/Near Valid | S1
S1 | High | S2
S1 | Medium | S3
S1 | Low | S4
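Read as a lookup table, this rule can be encoded directly. The following Python sketch (names chosen freely for illustration) reproduces the table and raises KeyError for combinations the table does not permit.

# Lookup for linked support: given the parent SAL and the relevance of the
# child elements, the table above yields the SAL required of every child.
LINKED_SUPPORT = {
    ("S4", "Valid/Near Valid"): "S4",
    ("S3", "Valid/Near Valid"): "S3",
    ("S3", "High"): "S4",
    ("S2", "Valid/Near Valid"): "S2",
    ("S2", "High"): "S3",
    ("S2", "Medium"): "S4",
    ("S1", "Valid/Near Valid"): "S1",
    ("S1", "High"): "S2",
    ("S1", "Medium"): "S3",
    ("S1", "Low"): "S4",
}

def required_child_sal(parent_sal: str, relevance: str) -> str:
    # Return the child SAL demanded by linked support; raises KeyError
    # if the combination is not permitted by the table.
    return LINKED_SUPPORT[(parent_sal, relevance)]

print(required_child_sal("S2", "High"))  # -> "S3"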
[Figure: linked-support example: goal "System XYZ is safe" (SAL 2), decomposed by "Argument by AB" into sub-goals "Sub-system A is safe" and "Sub-system B is safe", each SAL 3.]
In convergent support, several premises separately support the conclusion, so the assurance of
the parent goal is shared among independent child elements. It is important to identify the
independence of the child elements. This can be either mechanistic (same theory applied in
different ways) or conceptual (different underlying theories). Conceptual independence is desirable
as it provides higher assurance.
Parent SAL | Independence | Child elements | Child SALs
S4 | Conceptual | 2 | S2 and S4
S4 | Conceptual | 2 | S3 and S3
S3 | Conceptual | 2 | S1 and S3
S3 | Conceptual | 2 | S2 and S2
S2 | Mechanistic | 2 | S1 and S1
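The convergent-support table can be encoded the same way. Again an illustrative Python sketch with freely chosen names: it returns the permitted pairs of child SALs for a parent SAL and a given kind of independence between two child elements.

# Permitted child SAL combinations for convergent support with two
# independent child elements, transcribed from the table above.
CONVERGENT_SUPPORT = {
    ("S4", "Conceptual"): [("S2", "S4"), ("S3", "S3")],
    ("S3", "Conceptual"): [("S1", "S3"), ("S2", "S2")],
    ("S2", "Mechanistic"): [("S1", "S1")],
}

def child_sal_options(parent_sal: str, independence: str):
    # List the permitted (child SAL, child SAL) pairs, empty if none.
    return CONVERGENT_SUPPORT.get((parent_sal, independence), [])

print(child_sal_options("S4", "Conceptual"))  # -> [('S2', 'S4'), ('S3', 'S3')]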
[Figure: convergent-support example: goal G1 "Car is safe enough" (SAL 4), argued by speed-reduction equipment (S1) over G2 "Manual brakes safe enough" (SAL 3) and G3 "'SafeSpeed' safe enough" (SAL 2).]
[Figure: example relating SIL functions to SAL: a SIL 4 function argued over SIL (with a Def Stan justification), decomposed into SAL 4 functions and SIL 3 functions.]
• SAL describes the level of confidence which can be placed in an argument. Safety Integrity
Levels (SIL) and Development Assurance Levels (DAL) do not necessarily target the
evidence with regard to the main goal.
• Furthermore, SAL helps to focus and transparently structure the Safety Case argument.
• SAL can be attached to all types of evidence.
• By using SAL the required assurance of the evidence is set at this stage and thus each
item of evidence will have its own required level of thoroughness. SAL justifies the
concentration of effort upon parts of the argument that require greater assurance. This
provides a more rational approach to the selection of evidence based upon the argument
being generated and allows safety engineers to produce suitable evidence of the correct
weight.
It has been shown that the degree to which the top-level argument is satisfied can be
approximated. But is this enough? Can the quality of the argument be approximated? The
arguments are defined as TRUE/FALSE conclusions, so their outcome cannot be quantified
directly. But an attempt can be made using the evidence:
[Figure: decision flow for assessing evidence quality, beginning with the creation of a document and yes/no branches.]
E.10 Process
In this report a process (Lat. procedere = to go ahead) is defined as a well-defined sequence of
system states or working steps that are required for the design, planning, implementation and
operation of the underlying system. As illustrated in the next figure, development can be divided
into the phases mentioned above.
[Figure: development divided into phases, e.g. development and implementation.]
Processes can be found in many phases of development, and in development itself. Almost every
workflow is defined to guarantee that best practices are carried out properly in future. It is
important to note that this approach affects safety as well. Building a product in a safe way cannot
guarantee the absence of failures, but it can minimize their presence. The idea is to set up a
process which helps to avoid systematic failures. This approach is widely used in standards, where
successful procedures are written down. Standards like IEC 61508 also provide a list of methods
which could be applied in the lifecycle. The selection of these methods produces requirements
which have to be met by the design process. The difference between random hardware failures
and systematic failures is that random hardware failure rates can be predicted with reasonable,
quantifiable accuracy, whereas systematic failures cannot be accurately predicted or statistically
quantified, because the events leading to them cannot be easily predicted.
In general one could say: the better the development process, the lower the systematic failure
rate. Best practices should be guaranteed throughout the whole engineering lifecycle.
[Figure: V-model of the engineering lifecycle: system requirements, system architecture, software requirements and software design lead down to software development; software integration, data testing, system integration and system test lead back up to total system requirement acceptance, with adjustment loops and concurrent engineering principles applied throughout.]
Safety processes can be seen as the previously defined application flow of safety engineering
techniques. Some standards provide specific cases for each product to define its process steps.
[Figure: the dependability activities (identification of hazards, HI; classification of hazards, HC; hazard occurrence analysis, HO; establishment of dependability-related requirements; verification and validation of dependability-related requirements; Safety Case construction) arranged around the development and design of the integrated safety system and its system design.]
This chapter focuses on the relationship between engineering and the Safety Case. There is an
ongoing two-way relation between engineering and safety, which is illustrated in the following
figure:
[Figure: coupling of the engineering process (defining requirements and system definition; development of design and structure; implementation/development/production; integration; system acceptance) with the safety process (hazard identification and classification; hazard occurrence analysis; safety engineering), including an acceptance check ("ok?") whose failure leads to new system requirements.]
An attempt at coupling those two processes is illustrated above. A better structured process, the
so-called EASIS Engineering Process (EEP), was created during the EASIS project.
The members of the EASIS project defined this special engineering process. As illustrated in the
next figure, the process has clearly defined start and end states.
Safety Case can be performed. All documents have to be reviewed. During the whole life cycle the
documents mentioned above become more detailed, and new versions are issued; not only the
system progresses but the Safety Case itself as well. Functional as well as design decisions are
made throughout the system development, so the system changes several times. These changes
need not result in big changes visible from outside. In any case it is an ongoing process and
implies interim Safety Cases and finally, after the final releases of the documents have been
reviewed, a dependability verification. Again, this is not a prescriptive method.
This evolutionary nature of the standards has resulted in the majority providing a ‘cook book’ of
processes that are recommended, dependent on the risk that the system presents to human life.
Not every process defined by a standard or guideline completely fits the underlying industry and
the usual processes carried out by the companies. In those cases processes are slightly changed
so that they become applicable. These changed company processes have to be checked as to
whether they still fulfil the intended functions. This is what is meant by process assessment. There
are two main questions that have to be answered when this aspect of safety is examined:
• Is the process able to create a safe product?
• Is the process carried out properly?
The first point leads to a philosophical discussion: who decides what a safe product is? In a semi-
prescriptive approach this could be taken to be the process provided by the underlying standard.
In a pure goal-based approach this problem is hard to address. The second question is easier to
answer: the best practices can be checked by a checklist, a review, etc.
It should be mentioned that the order of processes is important and has to be checked as well.
E.10.7 Problems
E.11 Tools
Tools to generate evidence (sources of evidence are explained in chapter E.7.6, Evidences):
• Safety analysis tools
• Tools for collecting and analyzing field experience
• Test tools
• FMEA and FTA (a tool for this: IQFMEA)
• PHA and HAZOP to identify risks and safety concerns
Tools to support notations/ Integrated Safety Case Tools:
• ASCE (Adelard Safety Case Environment)
• ISCaDE (Integrated Safety Case Development Environment)
• eSafetyCase (electronic Safety Case)
• GSNCaseMaker (GSN Case Maker)
• SAM (Safety Argument Manager)
ASCE stands for Assurance and Safety Case Environment. This graphical hypertext tool helps to
create, review, analyse and disseminate safety and assurance documentation. Its underlying
concept is that argument and evidence have to go hand in hand, and it supports both the GSN and
the CAE notation. First the notation is chosen, then drawing the network can begin. Generally
speaking it is a useful tool for creating hypertext documentation, providing graphical support for
arguments, and illustrating and building technical relations. The availability of plug-ins in the newer
version provides good and useful extensions. The main functions in review:
• Drawing/creating networks and thus (e.g. Safety Case) structures.
• Checking network structure and correct spelling.
• Support for status fields and extensions like “undeveloped” (represented by a diamond
below the box).
• Export as a Word or HTML document.
• Adding text to each element, both directly into the field and as a link to the web, to another
element or to another file.
• The order in which items should appear in the HTML export can be defined.
• Work can be protected by setting a password (case-sensitive, letters and numbers). It is
possible to view protected files, but changes cannot be saved without entering the correct
password.
• Definition of so-called “schemas” (e.g. a Why-Because-Analysis, with its own nodes,
checking rules, status fields, etc.).
• Locking of opened files: only one editor can be open, to prevent concurrent work.
• Plug-ins can be generated and applied to use the tool more efficiently (e.g. pop-up windows
to present and collect information from the user, analysis of the structure of a network,
propagation of data values over the network, connection to third-party tools and file
formats, interaction with the content of the HTML editor, exporting data).
Another possibility is to specify the elements; the design is in the hands of the user. The given list
includes: Title-Id, Title-Description, Type, Has External Reference, Development required,
Instantiation required, Completed, Resourced, Risk and Confidence.
Checking the network structure is especially useful in complex systems. The underlying rules
can be defined by the user. The following templates are already included in ASCE:
In GSN networks, ASCE checks:
"Network circularities"
"Strategies must be solved by at least one sub-goal"
"Goals must be solved by at least one goal, strategy or solution"
"Solutions must not be solved by anything"
"Only one top level node (excluding notes)"
"Eventually, all Option nodes should be removed"
"Eventually, all n-iteration and 0/1 choice links should be removed"
"Solutions, Assumptions, Justifications and Contexts, should only have incoming links"
"All nodes should eventually have status completed"
ASCAD network check rules:
"Network circularities"
"Claim with no sub-claim, argument or evidence"
"Argument with no sub-claim, argument or evidence"
"Claim or evidence with direct evidence link"
"Floating claim"
"Evidence with claim or argument as input"
"All nodes should eventually have status completed"
One advantage is that boxes next to the rules can be ticked to make them optional (e.g. "All nodes
should eventually have status completed" can be waived during the implementation of a
structure). These rules help to create a complete structure.
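As an aside, such structural rules are straightforward to automate outside the tool as well. The following Python sketch is an assumption-laden illustration (not ASCE functionality): it implements two of the GSN rules listed above over a simple in-memory network.

# Network represented as {node_id: (kind, [solved-by child node_ids])}.
def has_circularity(network) -> bool:
    # Detect "network circularities" by depth-first search.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in network}

    def visit(n) -> bool:
        colour[n] = GREY
        for child in network[n][1]:
            if colour[child] == GREY:
                return True           # back edge: a cycle exists
            if colour[child] == WHITE and visit(child):
                return True
        colour[n] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in network)

def solutions_with_children(network):
    # Rule: "Solutions must not be solved by anything".
    return [n for n, (kind, children) in network.items()
            if kind == "Solution" and children]

net = {
    "G1": ("Goal", ["S1"]),
    "S1": ("Strategy", ["G2"]),
    "G2": ("Goal", ["Sn1"]),
    "Sn1": ("Solution", []),
}
print(has_circularity(net), solutions_with_children(net))  # False []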
E.11.1.2 Improvements
There can only be one so-called “main image”. This implies the use of only one HTML file,
meaning that only the newest version can be in a folder. What about change management?
There should be a better standard rule for the export order, such as: after any element, take the
model/assumption/justification/context next, and go left first if possible.
Another possible improvement would be the support of Patterns (explained in chapter E.8.2,
“Reusing Arguments by applying Patterns”).
Because ASCE networks are plain XML, an external application to report on them in any way
required can always be written (even if the tool does not support it directly). ASCE also supports
user plug-ins which allow for custom reporting behaviours.
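For example, a minimal external report could be sketched as follows. Note that the element and attribute names used here are invented for illustration, not the real ASCE schema, which would have to be inspected first.

import xml.etree.ElementTree as ET

# Hypothetical sketch only: "node" and "type" below are assumed names,
# not ASCE's actual XML vocabulary.
def count_node_types(path):
    root = ET.parse(path).getroot()
    counts = {}
    for node in root.iter("node"):          # assumed element name
        kind = node.get("type", "unknown")  # assumed attribute name
        counts[kind] = counts.get(kind, 0) + 1
    return counts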
E.11.2 ISCaDE
ISCaDE means Integrated Safety Case Development Environment and is a product of rcm2 Ltd.
The procedure for creating such a network is illustrated in the following figure:
Figure E.64: Example Hazard Log Form (taken from Fararooy's paper [25])
ISCaDE uses DOORS (Dynamic Object-Oriented Requirements System) as its database for
managing requirements. DOORS is a COTS product of Telelogic Ltd. in which all requirements
and their validations are stored, managed or at least linked.
It is possible to combine hazard analysis techniques with intelligent requirements management.
The idea was that the created environment should span all relevant sectors, like a hazard log, risk
assessment, safety requirements and graphical support (e.g. GSN).
Standard support: ISCaDE supports the import of several standards that help to develop requirements for ensuring safety already in early design phases. Any structured Word document can be turned into a database table in which attributes (e.g. validation and verification criteria) may be assigned to table entries.
Hazard Log: Each hazard can be addressed by a unique Hazard ID and easily inserted as a DOORS object. It can be shown in a window including several fields. Connected fields are: Issue; Category/Scenario (hazards can be categorized for easier handling); Hazard (a description of each hazard can be given); Cause and Consequences (can be given in detail); Probability (of the hazard occurring); Severity (how catastrophic the consequences are if it occurs); overall risk rating (category given by risk matrices); Safeguard/Mitigation (what can be done to lessen the consequences); Severity, Probability and overall risk rating after mitigation (showing whether the risk is ALARP if the safeguard practices are carried out); Action (what should be done); Owner (who carries it out); and Notes (for additional information).
Gap Analysis: The gap analysis identifies whether any unsolved hazards are left. These can be new hazards implied by standards, or newly developed requirements that handle existing hazards but are not yet satisfied.
Risk Matrices: ISCaDE categorizes the hazards from the hazard log by means of risk matrices.
SC Notations: ISCaDE is able to generate a notation (e.g. GSN) autonomously.
It should be noted that all project information, including safety, is managed within a single object-
oriented database with a multi-user environment, which implies no duplicated or missing effort and
transparency of information to all team members.
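For illustration, the hazard log fields listed above map naturally onto a simple record type. The following Python sketch uses freely chosen field names; it is not the actual ISCaDE/DOORS schema.

from dataclasses import dataclass

# Illustrative hazard log record mirroring the fields described above
# (field names chosen here; the real tool schema may differ).
@dataclass
class HazardLogEntry:
    hazard_id: str
    category: str
    description: str
    cause: str
    consequences: str
    probability: str           # of the hazard occurring
    severity: str              # how catastrophic the consequences are
    risk_rating: str           # from the risk matrix
    mitigation: str
    residual_probability: str  # after mitigation
    residual_severity: str
    residual_risk_rating: str  # shows whether the risk is ALARP
    action: str
    owner: str
    notes: str = ""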
Open questions are:
• Is an automated approach trustworthy?
• Are all necessary contexts added?
• Are all arguments formulated in an appropriate manner?
E.11.3 eSafetyCase
The eSafetyCase (electronic Safety Case) is a Praxis High Integrity Systems product. It is based
on the application of underlying standards. A few such formats are integrated and can be chosen
directly (e.g. DefStan 00-56, JSP 518, …). These formats demand different documents that
together ensure safety. The tool eSafetyCase gives guidance in adding such documents. A
Microsoft® Visio plug-in helps to generate arguments and integrate them by creating GSN
structures. It is possible to export the eSafetyCase as HTML and as PDF, whereby HTML is
preferred because of its browsing benefits.
E.11.4 GSNCaseMaker
The GSNCaseMaker is a software application based on Microsoft® Visio. It was developed by ERA
Technology Ltd. in cooperation with CET Advantage Ltd and the University of York. Wherever a
clearly stated argument is needed, GSN is a useful approach.
E.11.5 SAM
The Safety Argument Manager is an older tool which is no longer sold. In SAM, arguments are
expressed in a combination of goal structures and Toulmin’s argument form. Its graphical
presentation looks as follows:
E.12 Illustrations
This chapter contains several illustrations which could be used, with minor adaptations, in the
development of a safety-related system.
A Safety Case checklist can be created either from a defined process (e.g. standards or a
company process) or from the steps planned in a preliminary Safety Case.
A checklist should include as a minimum, for each item, an owner (who is responsible for ensuring
the activity occurs), an author (who is responsible for actually creating the relevant
documentation), a due date, and a completion date.
The following lists the items of an example Safety Case status/checklist, with additional notes
indicating potential future developments and streamlining.
Checklist items:
• System Description - Preliminary
• Develop Preliminary Hazard List
• SIL/DL Estimated (includes, for example, the SFF calculation from IEC 61508, or the equivalent from ISO 26262 when it is available)
• Preliminary System Level FMEA
All preceding items must be complete by Quote Approval.
• Safety Case Program Plan
• Safety Case Schedule / Checklist
• Integrate SC Tasks into Project Plan - Completed
• Preliminary Safety Requirements
All preceding items must be complete by Proposal Acceptance.
• System Description - Detailed
• Preliminary Hazard Analysis
• Safety Integrity Level / Difficulty Level Calculation
• Hazard Verification Test Requirements
• Safety Requirements Analysis: System, HW, SW
• System Level FTA
• System Level FMEA
Close actions before Design Verification.
• System Level Software FMEA
• Component Level FMEAs
• Hazard Verification Test Results
All preceding items must be complete by Concept Verification.
• Power and Grounds Sneak Circuit Analysis
• Common Mode/Common Cause Analysis
All preceding items must be complete by Design Verification.
• Maintenance Safety Analysis
• Process FMEA

Notes: The main benefit of the technical items listed here is that they ensure the product is correctly scoped prior to final price agreement. This helps minimise the resource (part of the ALARP principle, applied to the process) and hardware cost tensions that always occur as real projects progress.
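The minimum checklist information described earlier (owner, author, due date, completion date) plus the gating milestones can be captured in a small data model. The following Python sketch is illustrative only; all names and dates are assumptions.

from dataclasses import dataclass
from datetime import date
from typing import Optional

# Sketch of the minimum checklist record described above: owner, author,
# due date and completion date per item, plus the gating milestone.
@dataclass
class ChecklistItem:
    title: str
    owner: str                       # responsible that the activity occurs
    author: str                      # creates the relevant documentation
    due: date
    milestone: str                   # e.g. "Quote Approval"
    completed: Optional[date] = None

def open_items_for(milestone: str, items: list) -> list:
    # Items that must still be closed before the given milestone.
    return [i for i in items
            if i.milestone == milestone and i.completed is None]

items = [
    ChecklistItem("System Description - Preliminary", "A. Owner",
                  "B. Author", date(2006, 3, 1), "Quote Approval"),
]
print([i.title for i in open_items_for("Quote Approval", items)])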
The following figure illustrates a part of the process Safety Case using the ASCAD notation. In
the figure the first phase of the EEP (“Requirements Specification”) is shown. This phase is divided
into four sub-steps. Each step has its own evidence, which is the outgoing document of that step;
the outgoing document is the evidence that a process step has been done. Two work products (A2
and A4) are updated in later phases and are therefore marked as incremental documents. The
grey arrows show the flow of the documents into subsequent process steps, e.g. A1 is an input
document for step 1.4.
[Figure: ASCAD structure: claim A15 "All process steps are done and all work products are created", with subclaim 1 "Requirements are specified", which in turn has the subclaims 1.1 "Natural Language Requirements are captured", 1.2 "PHA is performed", 1.3 "Risk Mitigation Requirements are defined" and 1.4 "Structured System Requirements are specified". The evidences are A1 "Natural Language Requirements Specification", A2-1 "PHA results", A3 "Risk Mitigation Requirements" and A4-1 "Structured System Requirements"; A2 "PHA Analysis Results" and A4 "Structured System Requirements Specification" support these and are marked as incremental documents.]
Figure E.69: Illustration of the Process Part using the EEP from WP 4
E.13 References
[1] Kelly, T. P.: “Arguing Safety – A Systematic Approach to Managing Safety Cases” DPhil Thesis, Department of
Computer Science, University of York, UK, September 1998
[2] Kelly, T. P. and Weaver, R. A.: “The Goal Structuring Notation – A safety Argument Notation”
[3] Bishop, P. and Bloomfield, R.: A Methodology for Safety Case Development, Adelard, London, UK
[4] U.K. Ministry of Defence “00-56 Safety Management Requirements for Defence Systems,” Ministry of Defence,
Defence Standard December 1996.
[5] Govier, T. : “A Practical Study of Argument”, 3rd ed. Belmont CA: Wadsworth, 1992
[6] Toulmin, S.: “The Uses of Argument.” Cambridge: Cambridge University Press, 1958
[7] Adelard LLP: ASCE tool manual (www.adelard.com)
[8] Tiemeyer, B.: ”Performance evaluation of satellite navigation and Safety Case development”, Universität der
Bundeswehr München, 2002
[9] “Yellow Book” (page 1 – 4, figure 1- 1)
[10] U.K. Ministry of Defence, “JSP 318B - Regulation of the Airworthiness of Ministry of Defence Aircraft”, UK
Ministry of Defence, 1999.
[11] Greenwell, W. S., Strunk, E. A. , and Knight, J. C.. “Failure Analysis and the Safety Case Lifecycle.”, University
of Virginia, 2004
[12] Greenwell, W. S: “Pandora: An Approach to Analyzing Safety-Related Digital System Failures”
[13] Kelly, T. P.: “A Systematic Approach to Safety Case Management”, University of York, UK, 2003
[14] Kelly, T. P.: “A six step Method for Developing Arguments in the Goal Structuring Notation”, York, England,
September 1999
[15] Emmet, Luke: email from the 15th of February 2006
[16] Fenn, J. and Jepson, B.: “Putting Trust into Safety Arguments“, BAE Systems, Wharton
Aerodrome, Preston, Lancashire, PR4 1AX
[17] MoD SMP12 Safety Case and Safety Case Report, July 2004
[18] Thaller, G. E.: “Software-Test-Verification and Validation”, 2nd edition, Hanover 2002
[19] Ehrenberger, W.: “Software verification- procedures for proving software reliability”, Hanser-Verlag 2002
[20] Weaver, R.A.; McDermid, J.A.; and Kelly, T.P.: “Software Safety Arguments: Towards a Systematic
Categorisation of Evidence”, Proceedings of the 20th International System Safety Conference, Denver, 2002.
[21] Gamma, E.; Helm, R.; Johnson, R. and Vlissides, J.: “Design Patterns: Abstraction and Reuse of Object-
Oriented Design,” presented at ECOOP'93 - Object- Oriented Programming, 7th European Conference,
Kaiserslautern, Germany, 1993.
[22] Bass, L.; Clements, P. and Kazman, R.: “Software Architecture in Practice”, 2nd edition, Addison
Wesley, 2003
[23] Bate, J. and Kelly, T. P.: “Architectural Considerations in the Certification of Modular Systems”,
(SAFECOMP'02), publisher: Springer-Verlag, September 2002.
[24] Kelly, T. P.: “Managing Complex Safety Cases” (SSS'03), Springer-Verlag, February 2003
[25] Fararooy, S.: “Managing a System Safety Case in an Integrated Environment”, Esher, Surrey UK
[26] Weaver, R. A.: “The Safety of Software – Constructing and Assuring Arguments”, DPhil Thesis
[27] Wu, W.: “Safety Tactics for Software Architecture Design”, MSc Thesis
[28] ARP 4754: “Certification Considerations for Highly-Integrated or Complex Aircraft Systems”, Society of
Automotive Engineers, 1996.
[29] ARP 4761: “Guidelines and methods for conducting the safety assessment process on civil airborne systems
and equipment”, Society of Automotive Engineers
[30] Def Stan 00-54: “Requirements for Safety Related Electronic Hardware in Defence Equipment”, UK Ministry of
Defence, 1997.
[31] Def Stan 00-55: “Requirements for Safety Related Software in Defence Equipment”, UK Ministry of Defence,
1997.