D3.2 Part1 Guidelines Dependability Hazard Analysis


European Commission
EASIS Deliverable D3.2 Part 1

Contract number: 2002-507690

EASIS
Electronic Architecture and System Engineering for Integrated Safety Systems

Report type: Deliverable D3.2 Part 1

Report name: Guidelines for establishing dependability requirements and performing hazard analysis

Report status: Public

Version number: Version 2.0

Date of preparation: 14.11.2006

© 2006 The EASIS Consortium


Authors
Dr Olle BRIDAL (1), Volvo Technology
Mr Michael AMANN (2), ZF
Mr Marko AUERSWALD, Bosch
Mr Jonas EDÉN, Volvo Technology
Mr Marc GRANIOU, PSA
Mr Christoph HELLWIG, TRW
Dr Bernhard JOSKO, OFFIS
Mr Thorsten KIMMESKAMP, Universität Duisburg-Essen
Mr Andrew KINGSTON, TRW
Mr Roman KRZEMIEN, ZF
Mr Michel LEEMAN, Valeo
Mr Ola LUNDKVIST, Volvo Technology
Dr Andrea PIOVESAN (3), C.R.F.
Mr Christian SCHEIDLER, DaimlerChrysler
Mr Paul TIPLADY, TRW
Dr David WARD (4), MIRA

(1) Worktask leader. Editor for Appendices A and D. Co-editor for Appendix C.
(2) Editor for Appendix E.
(3) Co-editor for Appendix C.
(4) Editor for Appendix B.

The Consortium:
DaimlerChrysler (D), Bosch (D), Continental Teves (D),
C.R.F. (I), DAF Trucks (NL), DECOMSYS (A),
dSPACE (D), ETAS (D), Philips (D),
LEAR (E), MIRA (UK), Motorola (D),
OFFIS (D), Opel (D), PSA (F),
REGIENOV (F), TRW Automotive (D), Universität Duisburg-Essen (D),
Valeo (F), Vector (D), Volvo (S),
ZF (D)


Table of contents

Executive summary
1 Introduction and objective
2 The dependability activity framework
3 Guidelines
3.1 General guidelines
3.2 Guidelines for hazard identification
3.3 Guidelines for hazard classification
3.4 Guidelines for hazard occurrence analysis
3.5 Guidelines for the establishment of dependability-related requirements
3.6 Guidelines for Safety Case construction
References


Executive summary

Unlike many other industrial sectors (aerospace, railway, military, etc.), the automotive industry
lacks a standardized approach for how to deal with system safety issues in the development and
design of software-based systems. This lack of an automotive-applicable safety standard was one
of the major motivators for the EASIS project.
By analyzing, adapting, extending and defining methods and techniques for dependability and
system safety, this report defines and investigates a set of development process activities tailored
to the specific needs of the automotive sector. These activities are organized into a Dependability
Activity Framework. Recommendations and guidelines for how to carry out each activity within this
framework are given. The activities investigated are:
• Hazard identification: Identification of the undesirable vehicle-level states and behaviours
that may be created by the system being considered.
• Hazard classification: Assessment of the degree of undesirability of each hazard.
• Hazard occurrence analysis: Investigation of the cause-and-effect relationships that result
in a hazard, and estimation of the resulting hazard occurrence probabilities.
• Dependability requirements: Establishment of requirements on the product and process
to ensure that risks are sufficiently low or eliminated.
• Verification that the dependability requirements are met by the system design and
implementation.
• Safety Case: Construction of a convincing argument that the system is as safe as can
reasonably be demanded.
It should be noted that these activities are not completely orthogonal to each other. The hazard
occurrence analysis is primarily concerned with providing hints about whether, and where,
dependability mechanisms should be introduced as the system is being developed. However,
hazard occurrence analysis also plays a role in the verification activity.


1 Introduction and objective

This report is the result of EASIS Work Task 3.1 which is concerned with analyzing, adapting,
extending and defining methods and techniques for carrying out dependability-related activities in
the development of an Integrated Safety System (ISS). Of particular interest are the potential and
adaptability of existing dependability analysis approaches to the automotive domain.
The objective of this work is to provide guidelines for dependability-related activities that should be
performed during the development of an ISS. These activities are defined so that they are
applicable regardless of the specific development process model used. Thus, the aim is not to
define a process but to describe dependability-related activities that should be part of any
development process for an ISS.
In fact, the dependability issues for an ISS are in many ways the same as for any automotive
system. If a general methodology for dependability of automotive electronics had been established
prior to this project, our work could have focused on the ISS-specific issues. However, as such a
methodology does not exist, our work addresses the wider topic of 'dependability of automotive
electronic systems' rather than being limited to ISS-specific issues.
Dependability has been defined as "the trustworthiness of a computing system such that reliance
can justifiably be placed on the service it delivers" [2]. It is a wide concept that encompasses the
notions of reliability, availability, safety and maintainability, but our focus is primarily on safety issues. However, the safety benefits provided by integrated safety systems, in terms of their contribution to road traffic safety when they deliver the intended service, are outside the scope of our work. Instead, we focus on the safety implications when the integrated safety function for some
reason does not operate as intended. In other words, we are primarily interested in the potential
failures and degraded modes of the system rather than its nominal operation. We additionally
address the possibility that the correct operation of the system might be hazardous in some driving
situations, but such issues are not in the focus of our work.
It should be noted that work is currently on-going to define an ISO standard (ISO 26262) for
functional safety of automotive electronic systems. The scope of this upcoming standard and the
scope of this EASIS Deliverable are partially overlapping. However, there are obviously major
differences between a research project and standardisation activities. In EASIS, we investigate
and discuss different approaches for how to deal with dependability and provide a set of
recommendations. The ISO standard, on the other hand, is concerned with defining a set of
requirements that have to be met if compliance with the standard is to be claimed. It
deserves to be mentioned that several individuals participating in EASIS WP3 are also members of
the ISO 26262 committee, thus allowing for a mutual influence between EASIS and the ISO work.


2 The dependability activity framework

The structure of this report is based on the dependability activity framework shown in Figure 1. In
the figure, all blocks except "development and design of the integrated safety system" represent
dependability-specific activities and are thus within the scope of this report. Although this
framework does not perfectly match any particular lifecycle as defined in a standard such as
IEC 61508 [4] or ARP 4754 [3], it shows the dependability activities that should be performed
during the development of a potentially safety-critical (1) system.
It is important to understand that these activities are typically not performed in strict sequence.
Instead, all activities are performed more or less in parallel throughout the development process.
This means that the inputs and outputs of an activity will be different in different phases of the
development lifecycle. However, there is typically an emphasis on the upper levels in the figure in
early phases and on lower levels in later phases. For example, the result of an early hazard
identification will typically be a list of coarsely described hazards. In a later phase, the hazards are
more precisely described. The motivation behind this "activity view" as opposed to a "process
view" is further explained in Appendix A.
We have consciously avoided referring to this framework as a "lifecycle" since the sequence of
activities is not our prime concern. Furthermore, we are only concerned with hazards occurring
during vehicle operation and with the corresponding risk-limiting activities during development.
Hazards potentially occurring during part storage, logistics, vehicle assembly, maintenance,
disposal, etc are beyond the scope of this work, as are risk-limiting actions and activities that are
outside the control of the vehicle manufacturers and suppliers. For example, speed limits, road
traffic management and driver education are not considered at all. We are mainly interested in
those hazards that are directly associated with potential failures of the considered system.
The framework is sufficiently general to allow existing process models, standards and guidelines to
be mapped to it. By not being associated with any existing process, standard or guideline, this
framework allows us to investigate each activity with open eyes and with a minimum of
presuppositions. A discussion of existing standards and guidelines that have influenced the
definition of this framework is given in Appendix A along with more detailed discussions about the
framework.

(1) Here, "safety-critical" means that the system may contribute to the occurrence of hazards, for example when the system fails to work as intended, regardless of the magnitude of this contribution.


[Figure: block diagram showing the dependability activities ("Identification of hazards", "Classification of hazards", "Hazard occurrence analysis", "Establishment of dependability-related requirements", "Safety Case construction", "Verification and validation of dependability-related requirements") alongside "Development and design of the integrated safety system".]

Figure 1 Dependability activity framework

Table 1 provides a summary of the dependability-related activities, including the typical information
flow between them. From the table, it should be clear that the outlined framework is generic in the
sense that it is applicable to the development of any safety-critical system.


Table 1 Overview of the dependability activities. The receiving activity of each output is the activity that lists it among its inputs.

Activity: Development and design of the integrated safety system
Description: All development activities except those specifically related to hazard analysis and dependability-related requirements.
Inputs: From "Establishment of dependability-related requirements": dependability-related requirements. (There are of course many other inputs to this phase, but they are not considered here.)
Outputs: Functional description (the purpose of the system, in terms of the service it is intended to provide); system design (drawings, software, design specifications, accompanying documentation such as the user manual and service manual); physical product.

Activity: Identification of hazards (Appendix B)
Description: Identification of the ways in which the system, in normal or abnormal operation, may contribute to the occurrence or severity of undesired events. (Estimation of the magnitude of this contribution is outside the scope of hazard identification, as it is related to hazard classification.)
Inputs: From "Development and design": functional description, system design, (physical product). From "Hazard occurrence analysis": hazards identified by the hazard occurrence analysis.
Outputs: List of identified hazards.

Activity: Classification of hazards (Appendix B)
Description: Quantification of the degree of undesirability of each identified hazard.
Inputs: From "Hazard identification": list of identified hazards.
Outputs: List of identified and classified hazards.

Activity: Hazard occurrence analysis (Appendix C)
Description: Identification and investigation of the cause/consequence chains that lead to a given hazard. Estimation of hazard occurrence rates.
Inputs: From "Hazard identification": list of identified hazards. From "Development and design": system design, (physical product).
Outputs: Cause/consequence chains resulting in hazards; estimated hazard occurrence rates; hazards identified by the hazard occurrence analysis.

Activity: Establishment of dependability-related requirements (Appendix D)
Description: Formulation of requirements on the design process and on the product.
Inputs: From "Development and design": system design. From "Hazard classification": list of identified and classified hazards. From "Hazard occurrence analysis": cause/consequence chains resulting in hazards, estimated hazard occurrence rates.
Outputs: Dependability-related requirements.

Activity: Verification and validation of dependability-related requirements (Appendix D)
Description: Demonstration that the dependability-related requirements are fulfilled (verification) and that the system fulfils the dependability expectations of typical users and other stakeholders (validation). Parts of the verification involve hazard occurrence analysis techniques.
Inputs: From "Establishment of dependability-related requirements": dependability-related requirements. From "Development and design": functional description, system design, physical product.
Outputs: Pass/fail results. In the case of "fail", some action will obviously have to be taken, but this is not shown in the activity framework. Depending on the characteristics of the "fail", the action could for example be a design change, a re-evaluation or a requirements change.

Activity: Safety Case construction (Appendix E)
Description: Argumentation of why the system is considered to be adequately safe for its intended purpose.
Inputs: All outputs from all activities of the activity framework.
Outputs: Safety case.

In a real development scenario, the activities included in the Dependability Activity Framework will
be integrated into an overall development process. The relationship between the dependability
activity framework and the EASIS Engineering Process (EEP) [5] is shown in Table 2 below. In
each intersection between a "dependability activity" and an "EEP step", the specific dependability-
related action relevant at this stage of the EEP is briefly described. Based on the table, the
following comments can be made:
• Hazard identification and classification are mainly carried out within EEP step 1.2 "Perform
PHA" (Preliminary Hazard Analysis). However, these activities are then repeated in various
stages of the EEP, since hitherto unknown hazards may be introduced as the development
progresses.
• Hazard occurrence analysis is performed at every step in the EEP that represents a more detailed design (conceptual design in step 1.2, FAA (Functional Analysis Architecture) model in step 2.4, FDA (Functional Design Architecture) model in step 4.4, refined FDA model in step 5.5).
• Since the dependability activity "Establishment of dependability-related requirements"
covers everything that the system shall fulfil in order to be sufficiently dependable, it is
relevant in almost every step of the EEP.
• The Safety Case construction is not explicitly included in the EEP. Thus, this activity cannot be mapped to any step of the EEP.

Table 2 Relationship between dependability activity framework and the EEP. For each EEP step, the actions relevant to the four dependability activities are listed: hazard identification (Appendix B), hazard classification (Appendix B), hazard occurrence analysis (Appendix C) and establishment of dependability-related requirements (Appendix D). (FAA = Functional Analysis Architecture; FDA = Functional Design Architecture.)

1 Specify requirements
1.1 Capture natural language requirements: -
1.2 Perform PHA (Preliminary Hazard Analysis): identify hazards; classify hazards; assess probability of hazards and perform hazard graph analysis; establish tolerable risk targets.
1.3 Definition of risk mitigation requirements: identify hazards introduced due to risk mitigation measures; classify hazards introduced due to risk mitigation measures; perform hazard graph analysis after applying risk mitigation; identify requirements for risk mitigation.
1.4 Capture structured system requirements: document the decided dependability-related requirements.

2 Analyze functional architecture
2.1 Specify functional architecture: -
2.2 Specify dynamic behaviour: -
2.3 Specify function behaviour: refine dependability requirements to implementation-related levels.
2.4 FAA hazard analysis: identify (new) hazards; classify (new) hazards; assess residual risk level; identify risk mitigation measures.
2.5 Validate FAA model: check whether the FAA model conforms to requirements.

3 Analyze hardware architecture
3.1 Design system/HW architecture: refine dependability requirements to implementation-related levels.
3.2 Identify HW failure model: hazard occurrence analysis; define HW failure detection and reaction.
3.3 Identify necessary HW redundancy: specify necessary HW redundancy.

4 Design of functional architecture concept
4.1 Design of sensor/actuator algorithms: specify sensor/actuator error detection and handling mechanisms.
4.2 Design of functional behaviour: mode management, with respect to intentionally degraded modes due to detected errors.
4.3 Refinement of functional interfaces: evaluation of error values, partly relevant here.
4.4 Basic design hazard analysis: identify (new) hazards; classify (new) hazards; assess residual risk level; identify risk mitigation measures.
4.5 Validate basic FDA model: check whether the basic FDA model conforms to requirements.

5 Refine functional architecture to fulfil safety requirements
5.1 Design of sensor/actuator related diagnostics: refine dependability requirements to implementation levels.
5.2 Design of application related plausibility checks: refine dependability requirements to implementation levels.
5.3 Design of counter-measures: refine dependability requirements to implementation levels.
5.4 Design of degraded modes: refine dependability requirements to implementation levels.
5.5 Design hazard analysis: identify (new) hazards; classify (new) hazards; hazard occurrence analysis and assess residual risk level; identify risk mitigation measures.
5.6 Validate FDA model: check whether the final FDA model conforms to requirements.

The activities of the EASIS Dependability Activity Framework and the relationship between them
are investigated and discussed in detail in the Appendices:
• Appendix A: Process frameworks for dependability
• Appendix B: Hazard identification and classification
• Appendix C: Hazard occurrence analysis
• Appendix D: Establishment of dependability-related requirements
• Appendix E: Safety Case construction


3 Guidelines

The following guidelines for dependability-related aspects of system development are given in
condensed format. They are based on the discussion and findings given in Appendices A-E. For
each recommendation, there is a reference to the appendix where the subject is further elaborated.
Guidelines marked with *Val have been selected to be validated (with respect to their
appropriateness) in the EASIS Work Task 5.2 demonstrator. This selection has taken both the
feasibility of validation and the available amount of resources in WT5.2 into account. See EASIS
deliverable D.5.5 for more on this validation.

3.1 General guidelines

3.1.1 Dependability should be considered from the beginning of the development. (A.3)
3.1.2 Automotive safety-oriented standards should be followed when applicable. The upcoming
ISO 26262 standard, expected to become effective in 2008, will be particularly relevant.
(A.2.7)
3.1.3 The overall approach to dependability during the development should be aligned with the
dependability activity framework. (A.3) *Val

3.2 Guidelines for hazard identification

Objective of hazard identification: Production of a list of undesired vehicle states associated with
the system of concern.
3.2.1 A clear understanding of the scope of the system should exist before hazard identification is
carried out. In particular, the boundary between the system of concern and the outside
world should be identified. In an early stage of the development, this boundary may be
defined at a conceptual level whereas more details will be added in later stages. (B.1.1,
A.3)
3.2.2 In an early stage, hazards should be identified based on existing knowledge about the
system. This knowledge is typically quite coarse and conceptual. (A.3) *Val
3.2.3 In a later stage, more detailed descriptions of the system should be investigated in the
search for hazards that have not yet been identified. (A.3)
3.2.4 Hazards should be defined so that their occurrence is not affected by the driving situation.
(A.3.1) *Val
3.2.5 Hazards should be defined as undesirable states at the vehicle level, resulting from
undesirable states in the system of concern. (A.3.1)
3.2.6 In addition to hazards caused by faults, potential hazards associated with the fault-free
function of the system should be identified. (A.3.2, B.2.2.7) *Val
3.2.7 In the search for hazards, different use scenarios that are relevant with respect to the
system of concern should be considered. (B.2.1.3)
3.2.8 The suitability of splitting a particular hazard into several distinct hazards with different
characteristics should be considered. (B.2.1.3)
3.2.9 Since different hazard identification methods complement each other, several methods
should be used. (B.2.3) *Val
3.2.10 Use of the following methods for hazard identification should be considered:
• Hazard and Operability studies (HAZOP) at the external output interface of the system of concern. (B.2.2.3)
• HAZOP at input interfaces and system-internal interfaces. (B.2.2.3)
• Functional Hazard Assessment / Functional Failure Analysis (FHA/FFA). (B.2.2.4)
• Hazard identification based on state transition models. (B.2.2.5)
• Failure Mode and Effects Analysis (FMEA). (B.2.2.6)
• Checklists containing predefined automotive hazards, if available. (B.2.2.2)
3.2.11 People with different perspectives should be involved in the hazard identification activity:
application engineers, system developers, developers working with other systems
interacting with the system of concern, etc. (B.2.3)
3.2.12 When hazards have been identified, the possibility of hazard combinations resulting from a
single cause should be considered. Such a hazard combination should from then on be
considered as a hazard in itself. (B.2.2.8) *Val

3.3 Guidelines for hazard classification

Objective of hazard classification: Assessment of the criticality of each hazard. Here, the criticality
is a measure of how critical an occurrence of the hazard is.
3.3.1 Each identified hazard should be classified with respect to the criticality of an occurrence of
the hazard, in order to allow the hazard to be properly addressed in the system
development. (A.3.1) *Val
3.3.2 A classification scheme of an established standard should be used, if available. The
ISO 26262 standard, tentatively scheduled for publication in 2008, will contain such a
scheme. (B.3.1.4)
3.3.3 In the absence of an established standard, consider using one of the following hazard
classification schemes
• ASIL classification as defined in the upcoming ISO 26262 standard. (B.3.1.4) *Val
• MISRA risk graph (B.3.1.3) including its Controllability factor. (B.3.1.2)
• The novel approach defined as part of the EASIS work. (B.3.2)
3.3.4 The hazard classification should take into account the characteristics of the hazard,
including:
• the magnitude of the resulting deviation from the intended behaviour. (B.2.1.3)
• the duration of the hazard. (B.3.2.1.4)
• whether or not the driver is informed about the existence of the hazard. (B.3.2.1.4)
3.3.5 The hazard classification should take into account the exposure to driving situations in
which the hazard may lead to negative consequences. (B.3.1.1, B.3.1.3, B.3.1.4, B.3.2.2)
3.3.6 The hazard classification should take into account the ability of the driver (or other road
user) to avoid negative consequences of the hazard. (B.3.1.1, B.3.1.2, B.3.1.3, B.3.1.4,
B.3.2.2)
3.3.7 The hazard classification should take into account the severity associated with each
potential outcome. (B.3.1.1, B.3.1.3, B.3.1.4, B.3.1.5.2, B.3.2.1, B.3.2.2)
3.3.8 Use of a classification approach that is based on vague criteria like "occasional",
"improbable", "catastrophic", "critical" should be avoided, unless these terms are clearly
defined in a way that ensures that they are not open to subjective interpretation. (B.3.2.1.1)
3.3.9 Company policy, legal considerations and any other similar factors should be allowed to
affect the classification assigned to a hazard. (B.3.2.1.3)
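The factors named in 3.3.4-3.3.7 are typically combined in a risk-graph style lookup. The sketch below is purely illustrative: the 1-4 factor scales, the summation rule and the class boundaries are invented for this example and are not the actual MISRA or ISO 26262 tables.

```python
# Illustrative risk-graph style classification combining severity, exposure
# and controllability (see 3.3.5-3.3.7). The scales and the mapping from the
# combined score to a class are HYPOTHETICAL, not a published scheme.

def classify_hazard(severity: int, exposure: int, controllability: int) -> str:
    """Each factor is rated 1 (lowest) .. 4 (highest).
    Returns 'QM' (no special measures) up to 'C4' (most stringent class)."""
    for factor, name in ((severity, "severity"), (exposure, "exposure"),
                         (controllability, "controllability")):
        if factor not in (1, 2, 3, 4):
            raise ValueError(f"{name} must be 1..4, got {factor}")
    # Higher combined score -> more stringent class (illustrative rule).
    total = severity + exposure + controllability
    if total <= 5:
        return "QM"
    return f"C{min(total - 5, 4)}"

# Example: high severity (4), medium exposure (2), hard to control (3)
print(classify_hazard(4, 2, 3))
```

A real scheme would use calibrated tables rather than a sum, but the structure (several rated factors feeding one class) is the same.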


3.4 Guidelines for hazard occurrence analysis

Objectives of hazard occurrence analysis: Understand the relationship between underlying faults
and resulting hazards. Assess the occurrence probability (or frequency) of each hazard.
3.4.1 The identified hazards should be analysed with respect to their occurrence (based on a
description of the system to be implemented), in order to allow each hazard to be properly
addressed during the development of the system. (A.3)
3.4.2 In an early stage, hazard occurrence analysis should be based on existing knowledge
about the system. This knowledge is typically quite coarse and conceptual. (C.2.1) *Val
3.4.3 In a later stage, hazard occurrence analysis should be based on more detailed descriptions
of the system. (C.2.2, C.2.3)
3.4.4 In an early stage, error detection and handling mechanisms should not be considered in the
hazard occurrence analysis. The main purpose of the analysis at this early stage is to
identify where such mechanisms are needed. (C.2.1) *Val
3.4.5 In a later stage, the hazard occurrence analysis could consider the identified and defined
mechanisms as being part of the system. The main purpose of the analysis at this later
stage is to permit an assessment of whether these mechanisms are appropriate. (This
illustrates that hazard occurrence analysis plays an important role during verification and
validation.) Two types of analysis can be distinguished (C.2.3):
• Qualitative analysis: Check that there are no weak points
• Quantitative analysis: Determine the residual risk
3.4.6 The depth of the hazard occurrence analysis should reflect the criticality of the hazard as
determined in the hazard classification. Thus, the higher the criticality of a particular hazard,
the more effort should be put into the occurrence analysis for this hazard. (C.2)
3.4.7 Hazard occurrence analysis should be performed at different levels of detail, reflecting the
hierarchical structure typically found in automotive electronics. The detailed levels are
typically managed by Tier 1 and Tier 2 suppliers while the higher levels are managed by the
OEM. (C.2)
3.4.8 Qualitative Fault Tree Analysis should be used to investigate the causal relationships
between random and systematic faults and errors on one hand and hazards on the other.
(C.4.3.2) *Val
3.4.9 Quantitative Fault Tree Analysis could be used to estimate the probability (or frequency) of
the hazard. However, the quantitative analysis typically has to be limited to only cover those
faults for which the probabilities (or frequencies) can be estimated, such as random
hardware faults. Furthermore, it is usually extremely difficult to estimate the probability of an
error that is not detected by any of a large number of implemented error detection
mechanisms at different levels in the system. Thus, the analysis will be quite coarse and
the results shall be interpreted more as an indication than a definite answer. (C.2.2,
C.4.3.2)
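The gate arithmetic behind such a quantitative fault tree evaluation can be sketched as follows, under the assumption of independent basic events (an assumption that 3.4.13 warns must be checked). The tree structure and the per-hour probabilities below are invented for illustration.

```python
# Minimal quantitative fault-tree evaluation, assuming INDEPENDENT basic
# events. Tree structure and probabilities are made-up illustrations.

def or_gate(*probs):
    # P(A or B or ...) = 1 - product(1 - p_i) for independent events
    result = 1.0
    for p in probs:
        result *= (1.0 - p)
    return 1.0 - result

def and_gate(*probs):
    # P(A and B and ...) = product(p_i) for independent events
    result = 1.0
    for p in probs:
        result *= p
    return result

# Hypothetical tree: the hazard occurs if the sensor fails undetected
# (sensor fault AND monitor fault) OR the actuator driver fails.
p_sensor   = 1e-4   # sensor fault, per hour (illustrative)
p_monitor  = 1e-3   # monitoring mechanism fails to detect (illustrative)
p_actuator = 1e-6   # actuator driver fault, per hour (illustrative)

p_hazard = or_gate(and_gate(p_sensor, p_monitor), p_actuator)
print(f"{p_hazard:.3e}")  # roughly 1.1e-06 per hour
```

The sketch makes the caveat in 3.4.9 concrete: the result is only as good as the basic-event estimates, and only random hardware faults typically have credible numbers.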
3.4.10 FMEA should be performed. (C.4.3.1) *Val
3.4.11 When an FMEA is to be performed, the appropriate starting level (initiating "failure mode")
at the present stage of the development process should be carefully selected. The basic
rule is "as detailed low level as possible, based on the present knowledge about what the
final system design will look like". At least two such levels should be addressed, with the
corresponding FMEAs typically referred to as "System-FMEA" and "Design-FMEA" in the
automotive industry. (C.4.3.1)
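A common way to prioritise the failure modes recorded in such an FMEA is the Risk Priority Number, RPN = severity x occurrence x detection, each rated 1..10. The worksheet below is a hypothetical sketch; the items and ratings are invented.

```python
# Sketch of an FMEA worksheet with Risk Priority Numbers (RPN), a common
# automotive FMEA practice. Failure modes and ratings are hypothetical.

from dataclasses import dataclass

@dataclass
class FmeaRow:
    item: str
    failure_mode: str
    severity: int     # 1 (no effect) .. 10 (hazardous)
    occurrence: int   # 1 (remote) .. 10 (very high)
    detection: int    # 1 (detection almost certain) .. 10 (no detection)

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection

rows = [
    FmeaRow("wheel speed sensor", "signal stuck at last value", 8, 3, 4),
    FmeaRow("CAN transceiver",    "message corruption",         7, 2, 2),
    FmeaRow("brake actuator",     "no actuation on demand",     9, 2, 5),
]

# Rank failure modes so the highest-RPN items get attention first.
for row in sorted(rows, key=lambda r: r.rpn, reverse=True):
    print(f"RPN {row.rpn:3d}  {row.item}: {row.failure_mode}")
```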
3.4.12 Use of Markov modelling should be considered. Markov models are particularly appropriate
for the analysis of fault-tolerant and fail-safe systems, especially when the idea is that the
first fault shall be repaired before a second fault occurs. (C.4.5.1)
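As a sketch of such a model, consider a duplex system in which a first channel fault should be repaired (rate mu) before a second fault (rate lam) brings the system down. The rates, the mission time and the Euler integration step below are illustrative choices, not recommended values.

```python
# Sketch of a Markov model for a duplex (fault-tolerant) system. States:
#   0 = both channels OK, 1 = one channel failed (degraded), 2 = system failed.
# lam (failure rate) and mu (repair rate) are illustrative, per hour.

lam = 1e-4   # per-channel failure rate (illustrative)
mu  = 1e-1   # repair rate, i.e. mean repair time of 10 h (illustrative)

def failure_probability(hours: float, dt: float = 0.01) -> float:
    """Probability of reaching the failed state within `hours`, by simple
    Euler integration of the state equations of the Markov chain."""
    p = [1.0, 0.0, 0.0]
    for _ in range(int(hours / dt)):
        d0 = -2 * lam * p[0] + mu * p[1]          # leave OK when either channel fails
        d1 = 2 * lam * p[0] - (lam + mu) * p[1]   # enter from OK; leave via repair or 2nd fault
        d2 = lam * p[1]                           # second fault before repair -> failed
        p = [p[0] + d0 * dt, p[1] + d1 * dt, p[2] + d2 * dt]
    return p[2]

print(f"{failure_probability(1000):.2e}")
```

The model captures exactly the repair-before-second-fault race described above: making mu larger (faster repair) directly lowers the probability of ever reaching state 2.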


3.4.13 Particular care should be taken to ensure that dependent faults are not treated as being
independent in the analysis. Thus, dependencies between different faults, errors and
component failures should be considered in the analysis whenever it is relevant. (C.5)
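The impact of wrongly assuming independence can be illustrated with the beta-factor model used in, for example, IEC 61508: a fraction beta of each channel's failure probability is attributed to a shared cause that fails both channels at once. The numbers below are invented.

```python
# Beta-factor illustration of dependent (common-cause) faults in a duplex
# system: a fraction `beta` of each channel's failure probability stems from
# a shared cause and fails both channels together. Numbers are illustrative.

def duplex_failure(p_channel: float, beta: float) -> float:
    """Failure probability of a 1-out-of-2 system under the beta-factor model."""
    p_independent = (1.0 - beta) * p_channel   # fails only its own channel
    p_common = beta * p_channel                # fails both channels together
    # System fails if both channels fail independently, or the common cause strikes.
    return p_independent ** 2 + p_common

p = 1e-4
print(f"independent assumption: {duplex_failure(p, 0.0):.1e}")   # 1.0e-08
print(f"with 2% common cause:   {duplex_failure(p, 0.02):.1e}")  # about 200x worse
```

Even a small common-cause fraction dominates the result, which is why the guideline insists that dependencies be modelled explicitly.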
3.4.14 In the analysis of the contribution from potential software faults to the occurrence of a
hazard, too much detail should be avoided. Thus, the potential failure modes (rather than
the underlying faults) of the software should be considered as "root faults" in the analysis.
(C.6.1.1)
3.4.15 The following faults and other sources of hazards should be considered in the hazard
occurrence analysis. (C.3.3):
• Permanent hardware faults
• Transient hardware faults
• Environmental conditions that could make the system behave in an undesirable way
(Example: weather and visibility conditions for systems based on radars and vision
sensors)
• External faults, i.e. faults outside the system of concern
• Specification faults
• Hardware design faults
• Software design faults
• Faults in development tools: Code generator fault, compiler fault, etc
• Manufacturing faults
• Intentional malicious interactions with the system

3.5 Guidelines for the establishment of dependability-related requirements

Objective of the establishment of dependability-related requirements: Impose requirements on the system design and implementation (both process and product) so that hazards are prevented to a sufficiently high degree.
3.5.1 Dependability-related requirements should be determined based on the findings from the
hazard identification, hazard classification and hazard occurrence analysis as well as
knowledge about the system design. However, some requirements may be determined
based on only a subset of this information. (D.1)
3.5.2 The final consolidated requirements should be: (D.2.2)
• complete
• non-ambiguous
• correct
• atomic
• verifiable
• consistent with each other
3.5.3 It is recommended that a requirements management tool be used. (D.2.2)
3.5.4 In the search for requirements to be specified, the following requirement types should be
considered. (D.2)
• Requirements on the probability of the identified hazards
• Requirements on fault tolerance and graceful functional degradation

• Requirements on the system architecture


• Requirements that specific mechanisms for error detection and error handling shall
be implemented
• Quantitative (probabilistic) requirements on the hardware architecture
• Requirements on the avoidance of random (i.e. non-systematic) faults
• Critical functional requirements that necessitate particularly effective verification
• Requirements for functional limitations
• Requirements on design process
• Requirements on isolation and diversity to reduce the possibility of common-cause
faults
• Requirements on manufacturing processes
• Requirements on systems external to the system of concern
• Requirements on the contents of user manuals
• Requirements on the contents of service manuals
3.5.5 Requirements on hazard probabilities should be considered more as targets, aims or
intentions than as verifiable requirements. Alternatively, these requirements could be
expressed quantitatively, together with the criteria for verifying that the requirement is met
(calculation method, etc.). (D.3.1.3)
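The quantitative alternative in guideline 3.5.5 can be illustrated by a check of a residual hazard rate against a target. All numbers below (failure rate, diagnostic coverage, target) are invented placeholders, and the single-fault model is an assumption made for the example.

```python
# Checking a quantitative hazard probability requirement (guideline 3.5.5).
# All numbers are invented placeholders, not values from the deliverable.

TARGET = 1e-7          # required maximum hazard rate [1/h]
LAMBDA_SENSOR = 2e-6   # failure rate of the contributing sensor [1/h]
DIAG_COVERAGE = 0.99   # fraction of sensor faults detected and handled safely

def residual_hazard_rate(lam, coverage):
    """Only undetected faults are assumed to propagate to the hazard."""
    return lam * (1.0 - coverage)

rate = residual_hazard_rate(LAMBDA_SENSOR, DIAG_COVERAGE)
print(f"residual hazard rate = {rate:.1e}/h; "
      f"requirement {'met' if rate <= TARGET else 'NOT met'}")
```

Stating the calculation method alongside the numeric target, as the guideline suggests, is what turns such a target into a verifiable requirement.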
3.5.6 Requirements on hazard probabilities should be derived from applicable standards, if
available. In the absence of such standards, such requirements should be based on
consideration of either "maximum broadly acceptable risk" or "maximum tolerable risk". For
the latter case, the ALARP ("As Low As Reasonably Practicable") concept is highly
relevant. (D.3.1.4-D.3.1.5)
3.5.7 To enable requirements related to fault tolerance and graceful functional degradation to be
determined, all services of the system should be listed. For each service, the degradation
which is allowed with respect to requirements on the safe operation of the whole system
should be defined. (D.3.2.2, D.3.2.4)
3.5.8 For each degradation defined according to guideline 3.5.7, the circumstances in which it
must be reached should be determined. (D.3.2.2)
3.5.9 To enable requirements related to fault tolerance and graceful functional degradation to be
determined, assumed faults should be specified with respect to how many faults could
occur in parallel, their potential locations and the possible malfunctions (such as stuck-at
values, omission of output values, delayed operations, etc.). (D.3.2.2, D.3.2.4)
3.5.10 All malfunctions of services which will lead to invocation of the envisaged fault tolerance
strategy should be listed. For each malfunction, the components whose faults may
contribute to the respective malfunction anywhere in the system should be identified.
(D.3.2.2, D.3.2.4)
3.5.11 As part of the process for defining requirements related to fault tolerance and graceful
functional degradation, fault regions should be formed. A fault region is a set of
components whose internal disturbances are counted as exactly one fault, regardless of
where the disturbances are located, how many occur and how far they stretch within
the fault region. (D.3.2.2, D.3.2.4)
3.5.12 As part of the process for defining requirements related to fault tolerance and graceful
functional degradation, containment regions should be formed. For each fault region, the
task of the fault tolerance technique has to be expressed. The containment region is the set
of components which may be adversely affected by the respective malfunction. In other
words, the containment region defines the border where fault propagation must stop.
(D.3.2.2, D.3.2.4)
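Guidelines 3.5.11 and 3.5.12 can be captured in a small data model. The following sketch, with invented component names, represents a fault region and its containment region and checks whether the set of components affected by a fault stays inside the containment border.

```python
# A minimal data model for fault regions and containment regions
# (guidelines 3.5.11-3.5.12). Component names are invented for illustration.

from dataclasses import dataclass

@dataclass(frozen=True)
class FaultRegion:
    name: str
    components: frozenset   # internal disturbances count as exactly one fault

@dataclass(frozen=True)
class ContainmentRegion:
    fault_region: FaultRegion
    components: frozenset   # fault propagation must stop at this border

def propagation_contained(region, affected):
    """True if every component affected by the fault lies inside the border."""
    return set(affected) <= region.components

sensor_fr = FaultRegion("wheel-speed sensor", frozenset({"sensor", "sensor_wiring"}))
sensor_cr = ContainmentRegion(sensor_fr,
                              frozenset({"sensor", "sensor_wiring", "input_filter"}))

print(propagation_contained(sensor_cr, {"sensor", "input_filter"}))    # True
print(propagation_contained(sensor_cr, {"sensor", "brake_actuator"}))  # False
```

A formal model of this kind also gives a natural basis for the completeness and consistency checks of guideline 3.5.13.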
3.5.13 Fault tolerance requirements should be checked for completeness and consistency. Use of
a formal model as proposed in D.3.2.4 is recommended. (D.3.2.2)
3.5.14 The system architecture, in terms of the overall system structure and any constituent
redundancy structures, should be specified with due consideration of dependability issues.
(D.3.3)
3.5.15 When the hazard occurrence analysis shows that a particular fault may lead to a hazard
and this causal relationship is forbidden by a fault tolerance requirement, the causal path
from fault to hazard should be effectively broken. This can be achieved by a modification of
the system architecture (D.3.3) or by the introduction of an error detection mechanism and
a corresponding reaction. (D.3.4.3)
3.5.16 When the hazard occurrence analysis shows that a particular fault may lead to a hazard
and the resulting hazard probability is above the limit set by a hazard probability
requirement, the causal path from fault to hazard should be broken sufficiently effectively to
make the system meet the probabilistic requirement. This can be achieved by a
modification of the system architecture (D.3.3) or by the introduction of an error detection
mechanism and a corresponding reaction. (D.3.4.3)
3.5.17 Specific mechanisms for error detection and associated reaction should be specified as
requirements. In an early phase, such mechanisms may be specified only coarsely, to be
refined in more detail in a later development phase. (D.3.4)
3.5.18 In the search for appropriate requirements on specific error detection mechanisms, at least
the following mechanism types should be considered:
• plausibility checks. (D.3.4.1.1)
• electrical monitoring. (D.3.4.1.2)
• comparison of redundant information. (D.3.4.1.3)
• mechanisms for detection of errors in CPU and in memory. (D.3.4.1.4)
• monitoring of communication. (D.3.4.1.5)
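Two of the mechanism types listed above can be sketched in a few lines. The signal names, limits and tolerances below are invented for illustration: a plausibility check on a speed signal (physical range plus physically possible gradient) and a comparison of redundant readings.

```python
# Sketches of two of the listed mechanism types: a plausibility check and a
# comparison of redundant information. Signal names and limits are invented.

def plausible(speed_kmh, prev_kmh, dt_s, v_max=300.0, a_max=15.0):
    """The signal must lie in its physical range and must not change faster
    than the physically possible acceleration (a_max, in m/s^2)."""
    in_range = 0.0 <= speed_kmh <= v_max
    gradient_ok = abs(speed_kmh - prev_kmh) / 3.6 <= a_max * dt_s
    return in_range and gradient_ok

def redundant_agree(a, b, tolerance):
    """Comparison of two redundant readings of the same quantity."""
    return abs(a - b) <= tolerance

print(plausible(100.5, 100.0, 0.01))  # small step within the gradient limit
print(plausible(102.0, 100.0, 0.01))  # 2 km/h jump in 10 ms: implausible
print(redundant_agree(99.8, 100.1, tolerance=1.0))
```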
3.5.19 For each error detection mechanism, the applicability of the following types of reactions to a
detected error should be considered. (D.3.4.2):
• switch to a degraded mode
• information to the driver
• reset of the affected functions
• storage of Diagnostic Trouble Codes (DTCs)
3.5.20 In the definition of the criteria for determining whether a detected error should lead to a
specific reaction, the suitability of the following "error filter" mechanisms should be
considered. (D.3.4.2):
• Up/down counter, counting up for each detection of an error and counting down
when the error is not detected: When the counter reaches a threshold, the reaction
is initiated.
• Timeout mechanism: When the error has been continuously detected during some
predefined time interval, the reaction is initiated.
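The up/down counter described above can be sketched as a small class. The step sizes and threshold below are illustrative, not prescribed values.

```python
# A minimal up/down "error filter" as in guideline 3.5.20: the counter steps up
# on each detected error, down when the check passes, and the reaction is
# triggered once the threshold is reached. Parameter values are illustrative.

class ErrorFilter:
    def __init__(self, up=2, down=1, threshold=6):
        self.up, self.down, self.threshold = up, down, threshold
        self.counter = 0

    def update(self, error_detected):
        """Feed one check result per cycle; returns True once the
        confirmation threshold is reached."""
        if error_detected:
            self.counter = min(self.counter + self.up, self.threshold)
        else:
            self.counter = max(self.counter - self.down, 0)
        return self.counter >= self.threshold

f = ErrorFilter()
samples = [True, True, False, True, True]     # intermittent detections
print([f.update(s) for s in samples])         # -> [False, False, False, False, True]
```

Weighting the up-step more heavily than the down-step, as here, makes the filter confirm an intermittent error faster than it "heals" it, which is the usual debouncing behaviour.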
3.5.21 In the specification of an error detection mechanism, the following questions should be
answered. (D.3.4.4):
• Which component performs the check?


• What is checked?
• When is the check made?
• What are the confirmation criteria, with respect to the result of the check, for
performing an action in response to a detected error? (Note: several different
actions may be possible, with different confirmation criteria)
• What action is performed by the component for each of the confirmation criteria?
• How does the component action propagate from the detecting component to a
system-level reaction?
• When a reaction to a detected error has been initiated, what is the 'healing criterion'
for reversing this action?
• What action is performed by the component when the healing criterion is fulfilled?
• How does this component action propagate from the detecting component to a
system-level reaction to the disappearance of the error?
3.5.22 Requirements on specific error detection mechanisms and corresponding reactions should
be verified using both fault injection techniques and design reviews. (D.3.4.5)
3.5.23 For quantitative requirements on hardware failure metrics, appropriate standards should be
followed when applicable (the upcoming ISO 26262 standard, expected to become
effective in 2008, will be particularly relevant). (D.3.5)
3.5.24 Critical functional requirements should be identified based on the results of the hazard
occurrence analysis. (D.3.7)
3.5.25 Critical functional requirements should be treated with particular care regarding verification.
The possibility of formal verification should be considered. (D.3.7)
3.5.26 For every identified hazard, introduction of limitations of the authority of the system (in the
time and value domains, i.e. duration and magnitude) should be considered, in order to
reduce the criticality of the hazards. (D.3.8)
3.5.27 Functional limitations, when specified, should be implemented close to the actual output of
the system. (D.3.8)
3.5.28 Design process requirements should be specified in accordance with applicable standards.
The upcoming ISO 26262 standard, expected to become effective in 2008, will be
particularly relevant. (D.3.9)
3.5.29 Requirements on isolation and diversity should be specified, when appropriate, based on
requirements on independence between system components with respect to the
occurrence of faults. (D.3.10)
3.5.30 In the specification of isolation requirements, the following methods should be considered.
(D.3.10.1):
• physical separation
• physical screens (housing, filtering, etc) between the components
• avoidance of common entities (hardware, data, etc)
• methods for isolation between SW modules running on the same CPU
3.5.31 In the specification of diversity requirements, the following methods should be considered.
(D.3.10.2):
• different software development teams
• different hardware
• different tools

3.5.32 In the search for appropriate requirements to put on external systems, the following
questions should be considered. (D.3.12.3)
• Is there a need for additional information from an external system to the system of
concern, besides the normal and/or obvious?
• Can another system be used to implement (partial) redundancy to the system of
concern?
• Can the effects of a failure of the system of concern be mitigated (at the vehicle
level) by an external system?
• Can the effects of a failure of the system of concern be mitigated (at the vehicle
level) by an external system that detects and reacts to this failure?
3.5.33 Introduction of the following information in the user manual should be considered. (D.3.13):
• explanation of the system's functionality, capabilities and inherent limitations
• description of the Human-Machine Interface (HMI)
• descriptions of how the driver shall react to HMI information from the system
• description of system behaviour that could be surprising (example: pedal vibration
during ABS regulation)
• explanation that the driver is responsible for the control of the vehicle
3.5.34 Introduction of the following information in the instruction manual and similar documentation
should be considered, particularly for those maintenance actions that could lead to hazards
when performed incorrectly. (D.3.13):
• instructions for how to identify root faults
• instructions for how to perform repair actions
• instructions for assembly and mounting
• instructions for how to perform calibration
• instructions for how to check that a repair action has been correctly performed

3.6 Guidelines for Safety Case construction

Objective of Safety Case: communicate a clear, comprehensive and defensible argument that a
system is acceptably safe to operate in a given context.
Safety Case Process
3.6.1 It is recommended that the activities that lead to the creation of the Safety Case are planned,
and that the plan is documented. Planning of these activities includes scheduling and
assignment to appropriately experienced staff. (E.4.1.1)
3.6.2 A record of all discovered hazards and associated information should be kept. This is a
living document that requires updating as new information becomes available. This record
is an essential input to the Safety Case construction. (E.4.1.2)
3.6.3 A Safety Case Lifecycle should be defined which at least states if and when the following
phases of the Safety Case have to be reached: (E.4.1.2)
• Preliminary Safety Case
• Interim Safety Cases
• Final Safety Case


3.6.4 There should be one incremental Safety Case which grows during the lifecycle (as indicated
in recommendation 3.6.3 above), rather than several new documents at each phase. (E.4)
3.6.5 The Safety Case has to be instantiated at the earliest stage possible and its preliminary
phase should at least include a preliminary Safety Plan and a preliminary Hazard
Identification. (E.4.1.1)
3.6.6 Stages for Safety Case Reports need to be defined. It is recommended that a Safety Case
Report is delivered at the end of each stage to demonstrate milestones in development.
The Safety Case Report could be produced in a checklist format. (E.4.2)
3.6.7 Safety Case maintenance activities after the “Start Of Production” should be defined. This
includes document updates as well as change management activities. Triggers could be
changes made to the system or the occurrence of accidents showing that the system does
not fulfil the primary claim of the Safety Case. (E.4.3)
3.6.8 A definition of “safe enough” for the product under development and the context in which it
will be developed and operated is required. This must relate (as a minimum) to local and
international standards and to prevailing social and judicial expectations in the countries of
development and deployment. (E.2.3)
3.6.9 The primary claim of the Safety Case should be expressed as “the system is safe enough”,
“the system is as safe as can reasonably be demanded” or a similar expression. (E.8.2)
3.6.10 If graphical representation of the Safety Case is required, the use of tools that support the
Safety Case construction is recommended. (E.11)
3.6.11 The inclusion of supporting documents (e.g. System Description, Validation Results) into
the Safety Case needs to be defined. There are the following options, of which the first
is recommended: (E.7.2)
1. Local links that refer to documents presenting the relevant information, which results
in many documents.
2. The content of the documents themselves is included in the Safety Case document,
which results in one very large Safety Case document.
Assessment
3.6.12 Independent Safety Case Assessment (by a person not involved in this Safety Case) is
recommended. (E.9)
Graphical representation
3.6.13 Graphical representation of the Safety Case is recommended to improve readability and to
support the argument structure. (E.5)
3.6.14 It is necessary to choose one notation to be used throughout the Safety Case Lifecycle.
(E.5)
3.6.15 A style guide should be defined and used. (E.5)
Argument design
3.6.16 General principles of argumentation (e.g. simplicity and logical soundness) should be
considered, with the aim of facilitating the understanding. (E.3.4.1)
3.6.17 The use of Safety Case Patterns is suggested. Which Patterns to choose is described
below, where the recommended structure is presented. (E.8.2.2)
3.6.18 Safety Case Modules are highly recommended for (E.8.3)
• supplier-OEM relationships, to support the distributed construction of Safety Cases
• the process part of the Safety Case, to support its semi-prescriptive documentation


• the argumentation for “re-use” components


• the decomposition of the system into sub-systems.
3.6.19 Each element should have a unique identifier so that misunderstandings are minimized. (E.5)
3.6.20 Element phrasing should be made in a uniform manner. This includes statements and
documents. (E.4.1)
• In the preliminary Safety Case the phrasing should be in future tense, e.g. “The
system will be safe”.
• Once an argument is sufficiently demonstrated, the phrasing should be changed into
present tense, e.g. “The system is safe”.
3.6.21 The supporting evidence, context and model information should include at least the
following: (E.7.6)
• descriptions of the system design
• descriptions of the overall development process, including how the dependability
activities have been performed, and the process assessment
• the results of the hazard identification, hazard classification and hazard occurrence
analysis
• the dependability-related requirements, the verification plans and the verification
result reports
3.6.22 Simple arguments should be used to facilitate readability. Argument elements are not
always necessary but help with reasoning about the structure. (E.3.4)
3.6.23 The arguments should provide the following: (E.3.4)
1. Justification of the belief that the claim is supported by the existing evidence
2. Help to understand the relation between the evidence and the claim
3.6.24 The following supplementary elements should be included where appropriate to enhance
the understandability. (E.5.3.2)
1. Justification of the selection of the specific methods used for identifying,
classifying and analyzing hazards as well as other methods used for establishing
and verifying safety-related requirements.
2. Model description of the specific system design (mechanisms, design process,
etc).
3. Documentation of any assumptions made. When possible, assumptions
should be proven and converted into justifications in the Safety Case.
4. Contextual information (e.g. lists) can support better understanding of the
relations between the elements. Additionally lists and other contextual information
can be used to repeat relevant information from different places in the Safety Case.
Typical contexts are: definitions, (hazard) lists and descriptions.
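The element types discussed in this section (claims, supporting evidence, unique identifiers, local links to supporting documents) can be modelled minimally as follows. All identifiers, texts and document names are invented for illustration; this is a sketch, not the EASIS notation.

```python
# Minimal sketch of Safety Case elements with unique identifiers (3.6.19)
# and claim/evidence links (3.6.23). All concrete names are invented.

from dataclasses import dataclass, field

@dataclass
class Element:
    uid: str    # unique identifier, e.g. "G1" or "E4"
    text: str

@dataclass
class Claim(Element):
    supported_by: list = field(default_factory=list)  # sub-Claims or Evidence

@dataclass
class Evidence(Element):
    document: str   # local link to a supporting document (option 1 in 3.6.11)

top = Claim("G0", "The system is acceptably safe to operate in the given context.")
haz = Claim("G1", "All identified hazards are sufficiently controlled.")
haz.supported_by.append(
    Evidence("E1", "Hazard occurrence analysis results", "hazard_analysis.pdf"))
top.supported_by.append(haz)

def uids(claim):
    """Collect all identifiers reachable from a claim, e.g. for uniqueness checks."""
    out = [claim.uid]
    for child in claim.supported_by:
        out += uids(child) if isinstance(child, Claim) else [child.uid]
    return out

print(uids(top))  # ['G0', 'G1', 'E1']
```

A machine-readable structure of this kind makes the uniqueness check of guideline 3.6.19 trivial to automate.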
Recommended structure
The following structures defined within EASIS are recommended but variations are possible. Other
Patterns and references can be found in section E.8.2.1 of Appendix E.
3.6.25 The “Top-level spider” Pattern should be used. It provides a basic description of the
system in the context of the Safety Case, including the model description, the assumptions
and justifications made, and the definition of “safe enough”. (E.8.2.2)
3.6.26 The “Systematic Fault Avoidance” Pattern should be used to separate the process
arguments from the product arguments. (E.8.2.2)


3.6.27 The product argument can be further subdivided by applying the following Patterns. (E.6.3.2)
1. The “ALARP” Pattern decomposes the hazards into different categories according
to their criticality. The different categories may require different types and levels of
evidence.
2. The “Functional Decomposition” Pattern decomposes the high level goals into sub-
goals that can be more easily addressed.
3.6.28 Process assessment has to be carefully considered. It is necessary to assess whether the
attributes or reference processes are met or not. (E.6.3.1)
3.6.29 Correct application of the process has to be demonstrated. The supporting evidence for this
includes: (E.6.3.1)
1. Existence of the artefacts
2. Correctness of the artefacts
3. Correct timing and order of process steps


References

[1] EASIS Project Technical Annex, 24.09.2003.

[2] J.C. Laprie (ed.), "Dependability: Basic Concepts and Terminology", Dependable Computing and Fault-
Tolerant Systems series, Vol. 5, Springer-Verlag, 1992.
[3] ARP 4754: Certification Considerations for Highly-Integrated or Complex Aircraft Systems, Society of
Automotive Engineers, 1996.
[4] IEC 61508: Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems,
IEC, 1998.
[5] EASIS Deliverable D4.1: EASIS Engineering Process.




Deliverable D3.2 Part 1 – Appendix A

Process frameworks for dependability

Version number: 2.0

Date of preparation: 14.11.2006

© 2006 The EASIS Consortium





Table of contents

A.1 Introduction and objective........................................................................... A-1


A.2 Existing dependability process frameworks................................................ A-2
A.2.1 ARP 4754 and DO-178B ................................................................. A-2
A.2.1.1 Certification in aerospace sector ......................................... A-2
A.2.1.2 ARP 4754 principal features ................................................ A-2
A.2.1.3 DO-178B principal features ................................................. A-6
A.2.1.4 Overall processes overview................................................. A-7
A.2.1.5 Applying the standards to automotive systems ................... A-9
A.2.2 DO-254............................................................................................ A-9
A.2.3 Example system safety process from Delphi for by-wire
automotive systems....................................................................... A-10
A.2.4 IEC 61508 ..................................................................................... A-12
A.2.4.1 Principal features of the standard ...................................... A-12
A.2.4.2 Applying the standard to automotive systems ................... A-14
A.2.4.3 Misconceptions .................................................................. A-15
A.2.4.4 Summary ........................................................................... A-15
A.2.5 Development guidelines for vehicle-based software ..................... A-15
A.2.5.1 Principal features ............................................................... A-16
A.2.5.2 Applying the Guidelines ..................................................... A-16
A.2.5.3 Summary ........................................................................... A-17
A.2.6 MISRA Safety Analysis Guidelines ............................................... A-17
A.2.7 ISO “Functional safety” activity...................................................... A-18
A.2.8 UK MoD Def Stan 00-56................................................................ A-19
A.2.8.1 Def Stan 00-56 Part 1 ........................................................ A-19
A.2.8.2 Def Stan 00-56 Part 2 ........................................................ A-20
A.2.8.3 Summary ........................................................................... A-21
A.2.9 US MIL-STD 882D ........................................................................ A-21
A.2.10 ECE standards R13 and R79 ........................................................ A-22
A.3 The EASIS dependability activity framework............................................ A-23
A.3.1 Meaning of "hazard" in the EASIS dependability activity
framework...................................................................................... A-26
A.3.2 Special cases requiring a different approach than the one
outlined in the dependability activity framework ............................ A-29
References ............................................................................................... A-30


A.1 Introduction and objective

The activities included in any process may be arranged in a framework that shows how they are
related to each other and to the overall aim of the process. For dependability in general, and safety
in particular, there are a number of such frameworks proposed in standards, guidelines and
technical papers. This appendix defines a dependability activity framework based on an evaluation
of some existing dependability frameworks (lifecycles, processes, etc) with respect to their
applicability to the development of automotive integrated safety systems.
Most of the standards and guidelines reviewed in this document are specifically concerned with
safety aspects rather than dependability in general. Nevertheless, these safety-oriented approaches
also provide some guidance on dependability attributes other than safety, such as availability and
reliability.
The main objective of the document is to arrive at a development process view that gives structure
to the EASIS Work Task 3.1 (Dependability requirements and hazard analysis techniques). The
aim is to identify a set of dependability-specific activities that should be carried out during the
development of an Integrated Safety System, or indeed any automotive control system.
Using existing standards and approaches as input to this subdivision of the dependability work
into a set of activities ensures that state-of-the-art knowledge and procedures are properly taken
into account.


A.2 Existing dependability process frameworks

Existing standards and other approaches for dependability processes differ in their scope. Some
are concerned with the entire system while others only address the software aspects. Some are
safety-specific and others are more general. For these reasons, a direct comparison between
lifecycles proposed in different standards and guidelines is seldom feasible. In the following
subsections, several proposed lifecycles are evaluated with respect to their suitability for
development of automotive integrated safety systems.

A.2.1 ARP 4754 and DO-178B

A.2.1.1 Certification in aerospace sector

In a joint effort, the European Organisation for Civil Aviation Equipment (EUROCAE) and the
Society of Automotive Engineers (SAE) have developed guidelines for the certification of
highly integrated and complex systems installed on aircraft.
The intention was to provide applicants and certification authorities with a universal basis for
demonstrating compliance with airworthiness requirements. These guidelines have been
developed with the contribution of the major aviation authorities, the Federal Aviation
Administration (FAA) and the Joint Aviation Authorities (JAA).
The requirements and guidelines are contained in the following documents:
• SAE ARP-4754: Certification considerations for highly integrated aircraft [2].
• EUROCAE ED-12B / RTCA DO-178B: Software considerations in airborne systems and
equipment certification [8].
• EUROCAE ED-80/ RTCA DO-254: Design assurance guidance for airborne electronic
hardware [9].
• SAE ARP-4761: Guidelines and methods for conducting the safety assessment process on
civil airborne systems and equipment [3].
ARP-4754 provides a safety-directed development model. This model is based on the notion of
hierarchical system decomposition during development. It contains processes for requirements
capture, validation, system development and verification.
DO-178B and DO-254 become applicable at the point where requirements have been allocated to
specific hardware and software components. These documents define guidelines for the
specification, development and verification of these components.
Finally, ARP-4761 defines in detail the assessment activities of the ARP-4754 development model.
The suggested techniques span from methods to establish the functional safety requirements at
aircraft level, to detailed safety assessment at item level.
A discussion on specialized topics proposed by ARP-4761 is outside the scope of this document.

A.2.1.2 ARP 4754 principal features

ARP 4754 [2] was published in 1996 by the SAE (Society of Automotive Engineers). It is intended
to provide designers, manufacturers, installers and certification authorities with a common international
basis for demonstrating compliance with airworthiness requirements applicable to complex
systems that integrate multiple aircraft-level functions and have failure modes with the potential to
result in unsafe aircraft operating conditions.


ARP 4754 addresses the total life cycle for systems that implement aircraft-level functions. It
excludes specific coverage of detailed systems, software and hardware design processes beyond
those of significance in establishing the safety of the implemented system.
More detailed coverage of the software aspects of design is provided in RTCA document DO-
178B and its EUROCAE counterpart, ED-12B. Coverage of the complex hardware aspects of design
is provided in RTCA document DO-254.

A.2.1.2.1 Overall life-cycle overview

ARP-4754 provides a safety-directed development model based on the notion of hierarchical
system decomposition during development. It contains processes for requirements capture,
validation, system development and verification at the higher levels of the system hierarchy.
The “supporting” processes involved in the certification activities are:
• Certification Process and Coordination
• Safety Assessment Process
• Validation of Requirements
• Implementation Verification
• Configuration Management
• Process Assurance
The most relevant topics related to the EASIS WT3.1 objectives are reported in the Certification
Process and Coordination section and in the Safety Assessment Process section.
These topics are presented in the following sections.

A.2.1.2.2 Safety Assessment process model

The ARP-4754 model is schematically illustrated in Figure A.1, where the inter-relations of the
safety assessment process activities with the main development process activities are highlighted.
In reality, there are many feedback loops within and among these relationships, though they have
been omitted from the figure for clarity.


[Figure A.1 depicts the safety assessment process (aircraft-level FHA, system-level FHA sections,
PSSAs, CCAs and SSAs) running in parallel with, and feeding, the system development process
(aircraft-level requirements, allocation of aircraft functions to systems, development of system
architecture, allocation of requirements to hardware and software, system implementation, and
certification).]

Figure A.1 Simplified ARP 4754 Safety assessment process model

At the first level of this breakdown, the functional requirements of the aircraft are supplemented
with safety requirements and are allocated to a number of systems.
The safety requirements for these systems are established from the results of the aircraft level
Functional Hazard Assessment (FHA).

At the second level of hierarchical breakdown (“system”), the potential functional failures of each
system are assessed by a system level Functional Hazard Assessment (FHA), and decisions are
taken on an appropriate system architecture that could meet the system requirements.

The Preliminary System Safety Assessment (PSSA) of the system architecture follows.
The aim of the PSSA is to establish the safety requirements for each sub-system or item of the
system architecture. Sub-system safety requirements include the definition of appropriate
development assurance levels for each sub-system. When the breakdown process has reached
the stage of implementation, these development assurance levels define the techniques and their
rigour for the specification, development and assessment of hardware and software.

As the system architecture is likely to contain parallel, dissimilar and multiple channel elements,
assumptions of independence of failure between these elements shall be derived and
substantiated. The Common Cause Analysis (CCA) is therefore appropriate at this stage in order
to establish requirements for the physical or functional separation of the sub-systems or items.

At the final stage, the System Safety Assessment (SSA) is conducted to collect, analyse and
document evidence that the system implementation meets the safety requirements established by
the FHA and PSSA. This verification process shall appropriately cover system and component
level development and verification activities.
In the context of system safety assessment (SSA), the results from the CCA provide the arguments
that substantiate assumptions of independence between parallel, dissimilar components.

A.2.1.2.3 Safety requirements

According to the ARP-4754 model, the system safety requirements are established by a Functional
Hazard Assessment comprising:
• The examination of aircraft and system functions and the identification of potential
functional failures.
• The assessment of the effects of functional failure conditions.
• The classification of each failure condition based on the identified effects.
The standard refers to airworthiness regulations which define, for each aircraft category, a failure
classification scheme based on the severity of the failure effects, expressed in terms of reduction in
system (aircraft and subsystem) performance, injuries to users (crew, passengers), and the
mitigating or aggravating contribution of the actual operating condition (take-off, landing, etc.).
A typical failure classification applicable to FAR/JAR Part 25 aircraft is shown in Table A.1.

Table A.1 Relationship between the severity of a functional failure condition, the
quantitative safety requirement for the function and the development assurance level for
the system

Failure Condition Class   Quantitative Safety Requirement (failures/h)   Development Assurance Level
Catastrophic              P < 10^-9                                      A
Hazardous                 P < 10^-7                                      B
Major                     P < 10^-5                                      C
Minor                     None                                           D
No safety effect          None                                           E

The referenced airworthiness regulations (FAA, JAR AC 25.1309-1A) associate each of the five
classes of failure conditions with:
• a functional safety requirement expressed as a maximum rate of failure per operating
hour for random hardware failures (derived from in-service data collection), and
• a development assurance level, i.e. a qualitative indicator used to reduce the effects of
systematic failures (e.g. hardware design and manufacturing errors, software errors,
installation errors) during the system life-cycle.
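The mapping in Table A.1 can be captured in a small lookup. The sketch below is illustrative only; the dictionary layout and the function name are our own assumptions, not part of ARP-4754:

```python
# Illustrative encoding of Table A.1: failure-condition class ->
# (maximum probability of failure per operating hour, development assurance level).
# A probability of None means no quantitative requirement applies.
FAILURE_CLASSES = {
    "Catastrophic":     (1e-9, "A"),
    "Hazardous":        (1e-7, "B"),
    "Major":            (1e-5, "C"),
    "Minor":            (None, "D"),
    "No safety effect": (None, "E"),
}

def requirements_for(failure_class):
    """Return (quantitative safety requirement, development assurance level)."""
    return FAILURE_CLASSES[failure_class]

print(requirements_for("Hazardous"))  # (1e-07, 'B')
```

Note that the quantitative requirement applies only to random hardware failures; the development assurance level addresses systematic failures, as discussed in the next section.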

A.2.1.2.4 Development Assurance Level

ARP-4754 recognizes that only with respect to random hardware failures is it possible to quantify
and apply reliability prediction in evaluating whether probabilistic safety requirements are met.

Contributions to critical functional failures coming from systematic hardware faults (i.e. faults
introduced by humans in the specification, design, manufacturing and installation of hardware
components) and from software faults are generally not quantifiable. In consequence, these kinds
of failures cannot be verified against quantitative safety requirements.
ARP-4754 addresses this problem by introducing the concept of the “Development Assurance
Level”.
The development assurance level shapes the development process by introducing an appropriate
degree of process control, with the purpose of providing qualitative evidence that a system meets
its safety objectives. In particular, the development assurance level determines:
• the extent of the safety assessment of the system and the methods to be used (e.g.
management of complexity, Fault Tree Analysis with the associated quantitative analysis, FMEA)
• the extent of the data and documentation formally issued within the system development
life cycle
• the extent, traceability and independence of the system verification process
• the extent and applicability of the traceability requirements for the change management
process
ARP-4754 also shows how to use the development assurance levels during the design
decomposition process, as a useful mechanism that simplifies and rationalises the allocation of
system safety requirements to components of the system architecture.
At the lowest level of the hierarchical system decomposition, this mechanism should yield a
development assurance level for the software and hardware items, as expected by the RTCA
DO-178B and RTCA DO-254 guidelines.

A.2.1.2.5 System verification

In ARP-4754, system verification is driven by the System Safety Assessment (SSA) process.
The suggested SSA takes a functional viewpoint on system verification.
For verification at system and hardware level, the guidelines propose the selective, combined
use of Fault Tree Analysis (FTA), Failure Mode and Effect Analysis (FMEA), Markov models and
dependence diagrams.
For the verification of software, it is recognised that software failure rates and their contribution to
critical functional failures cannot be quantified; there are no techniques for verifying software
against quantitative safety requirements.
The verification of software is therefore addressed using the software development assurance level
according to the RTCA DO-178B guidelines, which relate each development assurance level to a
list of requirements for the specification, development and testing of software.

A.2.1.2.6 Safety Case

The guidelines do not require a structured safety case for certification. The appropriate set of
certification data is instead decided through negotiation between the applicant (i.e. the aircraft
manufacturer) and the certification authorities.

A.2.1.3 DO-178B principal features

ED-12/DO-178B [8] was published in 1992 by EUROCAE (a non-profit organization addressing
aeronautical technical problems) and RTCA (Requirements and Technical Concepts for Aviation),
an association of U.S. aeronautical organizations from both government and industry.

It was written by a group of experts from certification authorities and companies developing
airborne software. It provides guidelines for the production of software for airborne systems and
equipment.
The objective of the guideline is to assure that software performs its intended function with a level
of confidence in safety that complies with airworthiness requirements.
These guidelines specify:
• Objectives for software life cycle processes.
• Description of activities and design considerations for achieving those objectives.
• Description of the evidence that indicates that the objectives have been satisfied.

A.2.1.4 Overall processes overview

DO-178B is not a development standard for software; it is an assurance standard. DO-178B is
neutral with respect to development methods. Developers are free to choose their own methods,
provided the results satisfy the assurance criteria of DO-178B in the areas of planning,
requirements definition, design and coding, integration, verification, configuration management,
and quality assurance.
The document structures activities as a hierarchy of “processes”, depicted in Figure A.2.
DO-178B defines three top-level groups of processes:
• The software planning processes that define and coordinate the activities of the software
development and integral processes for a project.
• The software development processes that produce the software product. These processes
are the software requirements process, the software design process, the software coding
process, and the integration process.
• The integral processes that ensure the correctness, control, and confidence of the
software life-cycle processes and their outputs. The integral processes are the software
verification process, the software configuration management process, the software quality
assurance process, and the certification liaison process. The integral processes are
performed concurrently with the software development processes throughout the software
life-cycle.

[Figure: the planning process coordinates the development process (requirements, design, code,
produced against standards and checked against verification criteria) and the integral processes
(verification, configuration management, quality assurance, certification liaison).]
Figure A.2 DO-178B Life-cycle process structure.


A.2.1.4.1 Relationship between ARP 4754 and DO-178B

ARP 4754 and DO-178B are complementary guidelines:


• ARP 4754 provides guidelines for the system level processes.
• DO-178B provides guidelines for the software development processes.
The information flow between the system processes and the software processes is shown in
Figure A.3.

[Figure: the system life-cycle processes of ARP 4754, including the system safety assessment
process, pass the system requirements allocated to software, the software level(s), design
constraints and the hardware definition and architecture to the software life-cycle processes of
DO-178B; in return they receive fault containment boundaries, error sources identified/eliminated,
and software requirements and architecture.]
Figure A.3 Relationship between ARP-4754 and DO-178B Processes

ARP 4754 identifies the relationships with DO-178B in the following terms:
“The point where requirements are allocated to hardware and software is also the point where the
guidelines of this document transition to the guidelines of DO-178B (for software), DO-254 (for
complex hardware), and other existing industry guidelines.
The following data is passed to the software and hardware processes as part of the requirements
allocation:
• Requirements allocated to hardware.
• Requirements allocated to software.
• Development assurance level for each requirement and a description of associated failure
condition(s), if applicable.
• Allocated failure rates and exposure interval(s) for hardware failures of significance.
• Hardware/software interface description (system design).
• Design constraints, including functional isolation, separation, and partitioning requirements.
• System validation activities to be performed at the software or hardware development level,
if any.
• System verification activities to be performed at the software or hardware development
level.”

A.2.1.4.2 Tool Qualification


The standard provides requirements to perform tool qualification if a tool is used to automate
software development processes that are typically performed by humans. Tool Operational
Requirements must be provided, and the tool must be verified against the operational
requirements.

A.2.1.4.3 Off-the-Shelf-Software

The use of off-the-shelf, including commercial off-the-shelf (COTS), software is permitted by the
standard; however, the software must be verified to provide the verification assurance as defined
for the criticality level of the system.

A.2.1.4.4 Safety Case

During the software life cycle processes various data is produced to plan, explain, record or
provide evidence of activities. The document discusses the characteristics and contents of such
software lifecycle data.

A.2.1.5 Applying the standards to automotive systems

ARP-4754 and DO-178B have been used successfully by the avionics industry for many years.
In principle, these guidelines could be applied to automotive systems as well.
However, the following issues act as limiting factors:
• International codes and regulations in the aerospace sector have reached a level of
maturity that is not present in the automotive sector. This makes it difficult to classify the
severity of system failure conditions and therefore to assign safety requirements to
automotive systems.
• Owing to the very different production volumes of the aerospace and automotive sectors,
the quantitative safety requirements (failure rate per hour) proposed by ARP-4754 and
DO-178B are not applicable to automotive systems. Furthermore, precise failure data
collection is very difficult to achieve in the automotive field.
• The development-process activities required by ARP-4754 and DO-178B are very
expensive and time-consuming compared with actual automotive lifecycles.
• In the automotive sector, dependability-related activities often involve pre-series trials. This
approach is not compatible with the ARP-4754 and DO-178B guidelines.

A.2.2 DO-254

DO-254 “Design assurance guidance for airborne electronic hardware” [9] is the hardware
counterpart of DO-178B “Software considerations in airborne systems and equipment certification”.
It is based around the same framework of certification and system safety, and it recognizes the
interactions with the system development process and the software life-cycle process.
The guidance uses the same five system development assurance levels corresponding to the five
classes of failure conditions as the other airborne standards. These five levels are related in DO-
254 to five hardware design assurance levels for which definitions are given.
Hardware safety assessment is required as part of, and in support of, the system safety
assessment. The objective of the system safety assessment is to show that applicable systems
and equipment (including the hardware) satisfy the applicable aircraft certification safety
requirements.


The document gives a hardware design life cycle, which has the following stages:
• Planning
• Design
o Requirements capture
o Conceptual design
o Detailed design
o Implementation
o Production transition
• Validation and verification (sic)
• Configuration management
• Process assurance
• Certification liaison
A further section gives requirements for the hardware lifecycle data e.g. the need to produce a
Plan for Hardware Aspects of Certification (PHAC). This is analogous to the PSAC (Plan for
Software Aspects of Certification) required by DO-178B.
There is a section on “Additional considerations”. Some notable items from this section include
information on previously-developed hardware and COTS.
Generally DO-254 is more concerned with the process than with detailed recommendations on
specific hardware designs or specific techniques and measures to apply. Probabilistic failure rate
measures are permitted (and indeed assumed) but are not defined in the document.
DO-254 is considered to be applicable to complex hardware designs including ASICs and PLDs.
FAA AC 33-28-2 implies that DO-254 should be followed for PLDs (namely a device purchased as
an electronic part and altered to perform an application-specific function). DO-254 also refers to
firmware, although a definition is not given in the document. Firmware is to be classified as
hardware or software and treated accordingly, i.e. if it is classified as hardware DO-254 is to be
followed, whereas if it is classified as software DO-178B is to be followed.

A.2.3 Example system safety process from Delphi for by-wire automotive systems

The system safety analysis process exemplified (rather than proposed) by Delphi Automotive
Systems for by-wire automotive systems [1] differs from the other processes studied in this report
since it has no official status. Nevertheless, it is specifically aimed at the automotive domain and
therefore of some interest here.
The process is schematically depicted in Figure A.4. The bottom row of this figure shows the
system safety activities.


[Figure: the development phases (conceptual design, requirements analysis, architecture design,
detailed design, verification and validation, production and deployment) aligned with the system
safety activities (system safety program plan, preliminary hazard analysis, hazard control
specifications / safety requirements, detailed hazard analysis, hazard control specifications such
as diagnostics and design safety margins, safety verification, comprehensive safety report).]

Figure A.4 Example System Safety Process from Delphi Automotive Systems (from [1])

Some of the steps of this process are briefly explained below:


• In the Preliminary Hazard Analysis (PHA), potential hazards are identified and the
likelihood and severity of incidents that could result from each hazard are assessed.
• The Safety Requirements are generated based on the results of the Preliminary
Hazard Analysis. These requirements typically influence the architecture of the
system and include high-level requirements such as the tolerable risk. Thus, they
serve to refine the conceptual design into a system architecture.
• In the Detailed Hazard Analysis, the system architecture is analysed with respect to
the causal relationships that lead to hazards.
• The hazard control specifications (diagnostics, design safety margins, etc.) are
generated based on the findings of the Detailed Hazard Analysis.
• The Safety Verification typically involves fault injection testing to enable the hazard
controls to be verified.
• The Comprehensive Safety Report, or Safety Case, summarizes the results of
analyses performed and the steps taken to reduce potential risk, identifies the
residual potential risk remaining, describes why this level of risk is acceptable, and
justifies the belief of the System Safety Working Group that their assessment is
accurate.
As only an overview of an example process is given in [1], it is neither possible nor relevant to
make an evaluation of this approach. Our comments below should therefore be considered as
nothing more than some observations concerning the example process.
• It seems as if no iterations are performed within the process as there is no
information flow from the right to the left in Figure A.4. In a real-world development
undertaking, such a "waterfall" approach is rarely, if ever, realistic.
• The Conceptual Design represents the starting point of the process. This does not
reflect the fact that hazards can be identified and classified based on a description of
the intended purpose of the system, regardless of how it is to be implemented. Such
identification and classification of hazards could and should influence the conceptual
design.
• In the Preliminary Hazard Analysis (PHA), only an extremely rough estimation of the
likelihood of a given incident can be assessed as no information about the actual
system design is available other than the (rough) conceptual design. This means that
the results of the PHA might be misleading regarding the likelihood of incidents.


A.2.4 IEC 61508

IEC 61508 [10] is an international standard concerned with the functional safety of electrical,
electronic and programmable electronic (E/E/PE) safety-related systems. In understanding how to
apply the standard, there are some important points that have to be considered:
The standard is generic, and the intention is that industry sectors produce their own standards
based on it. To this end the normative parts (parts 1–4) of the standard have been designated by
IEC as a “basic safety publication” which means that these parts have to be used to prepare the
sector-specific standards. In practice this is often interpreted as meaning that a clause-by-clause
sector-specific application of each part is required.
Furthermore, if a sector-specific standard does not yet exist, IEC 61508 can be applied directly as
the applicable standard. However, a justification has to be provided for any sector-specific or
application-specific deviation from the standard.
The standard was developed against the background of industrial process control. In the IEC
61508 model, there is “equipment under control” (EUC) which can adversely affect its environment.
Safety functions are added (either as a stand-alone protection system or in a control system
associated with the EUC) to reduce the risks associated with the hazards of the system to an
acceptable level. Where these safety functions are realized in an electrical system, and/or an
electronic system, and/or a programmable electronic system the standard applies. If any safety
functions are realized entirely through some other means, then the standard does not apply to
those functions (although there may be other standards that do apply). This is illustrated in Figure
A.5 below adapted from Part 5 of the standard.

[Figure: residual risk, tolerable risk and EUC risk placed on a scale of increasing risk; the
necessary minimum risk reduction (∆R) and the actual risk reduction are divided between the part
of the risk covered by other technology, the part covered by E/E/PE systems (the scope of
application of IEC 61508) and the part covered by external facilities.]
Figure A.5 Risk reduction in IEC61508

A.2.4.1 Principal features of the standard

The principal features of the standard are as follows:


• There is an explicit safety lifecycle that has to be followed. It covers all stages of a project from
concept through to decommissioning. It is assumed that the safety lifecycle is followed in
addition to any product development lifecycle (such as the lifecycle for a new chemical plant).
This lifecycle is shown in Figure A.6 below.


[Figure: the IEC 61508 overall safety lifecycle: concept; overall scope definition; hazard and risk
analysis; overall safety requirements; safety requirements allocation; overall planning; realisation
of E/E/PE safety-related systems; realisation of safety-related systems based on other technology;
realisation of external risk reduction facilities; overall installation and commissioning; overall safety
validation; overall operation, maintenance and repair; overall modification and retrofit (returning to
the appropriate lifecycle phase); decommissioning or disposal.]

Figure A.6 The IEC 61508 lifecycle


• The basic premise of the standard is that the EUC has hazards associated with it, and that
safety functions have to be added to the EUC and/or the EUC control system to reduce the
risks associated with the hazards to an acceptable level. A process of hazard analysis and risk
classification is followed (Stage 3 of the lifecycle). This process identifies safety requirements
(Stage 4), the necessary level of risk reduction required, and allocates this risk reduction to
various systems and facilities (Stage 5). These may include E/E/PE systems. The necessary
safety functions to be implemented within these systems are identified.
• Safety requirements are divided into safety functional requirements (i.e. a statement of the
functions that a system must perform in order to achieve the required level of safety) and safety
integrity requirements. Safety integrity requirements are related to the probability of the safety
functions being correctly performed under all the stated conditions within a stated period of
time. Safety integrity as defined by IEC 61508 is therefore a reliability measure associated
with the safety functions.
• Some safety functions are described as “on demand” or “low demand”, where the safety
function does not operate unless it is required to. These safety functions are typically found in
protection systems that are separate to an EUC control system.


• Some safety functions are described as “continuous” or “high demand”, where the function is
operative for a high proportion of the system up-time. These safety functions are typically found
within an EUC control system.
• For risk reduction allocated to safety functions implemented in E/E/PE systems, a safety
integrity level (SIL) is used as a measure of the necessary risk reduction required from that
function. There are 4 SILs, with SIL 1 representing the least requirement for risk reduction, and
SIL 4 the greatest. Higher SILs translate into greater reliability required of the safety functions.
• Example (according to the model of risk reduction envisaged by the authors of IEC 61508): an
EUC without protective measures has a hazard with a probability of occurrence of 2 × 10^-4. The
tolerable risk is a probability of occurrence of 1 × 10^-7. The necessary risk reduction is therefore
a factor of 2000, i.e. a low-demand protection system must achieve an average probability of
failure on demand of at most 5 × 10^-4; thus the protection system is allocated SIL 3
(see Table 2 of Part 1 in [10]). This is the target failure measure for the safety function.
• Where a system or component is described as being “a SIL n system”, it means that the
system or component is capable of supporting safety functions up to (and including) those
allocated SIL n. A SIL is not per se a property of a system or component.
• For safety functions where a random failure rate can be calculated, demonstration of the
requirements of the SIL is achieved by showing that the random failure rate is within the
bounds for that SIL. This is a random safety integrity requirement. Additionally, other
techniques and measures are applied to control random faults. The rigour of both the
techniques and their application increases with the SIL. These are systematic safety integrity
requirements.
• For safety functions where a random failure rate cannot be calculated (for example, for almost
any function based on software), demonstration of the requirements of the SIL is achieved by
applying appropriate techniques and measures in the design, implementation and verification.
Again, the rigour of both the techniques and their application increases with the SIL. A number
of very detailed and specific lists of techniques and measures are given. This includes
measures to avoid systematic faults. These are further systematic safety integrity
requirements.
• The safety integrity requirements determine the rigour of the processes that must be followed
in developing the system (Stage 9 of the lifecycle). In addition, these requirements determine
the rigour of the validation activities necessary to demonstrate that the system has achieved
the required level of safety (Stages 7 and 13).
• There are requirements on installation and commissioning (Stages 8 and 12), operation and
maintenance (Stages 6 and 14) and decommissioning (Stage 16). These appear to be quite
specific to the model of a protection system being added to a large industrial installation.
• Any changes to the system (Stage 15) have to be analysed for their impact and the change
taken back to the appropriate stage of the lifecycle.
• An independent assessor is usually required to provide independent assurance that the system
has been implemented in accordance with its safety requirements. The standard expects that
the assessor will be involved at all stages of system development.
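The allocation arithmetic in the SIL example above can be sketched as follows. The probability-of-failure-on-demand bands are the low-demand target failure measures from Table 2 of Part 1 of the standard; the function names are illustrative assumptions, not terminology from IEC 61508:

```python
# Sketch of low-demand SIL allocation: the tolerable risk divided by the
# unprotected EUC risk gives the target average probability of failure on
# demand (PFD) for the protection system.
def required_pfd(euc_risk, tolerable_risk):
    """Target average probability of failure on demand for the safety function."""
    return tolerable_risk / euc_risk

def low_demand_sil(pfd):
    """Map a target PFD to a low-demand SIL per Table 2 of IEC 61508 Part 1."""
    bands = [(1e-5, 1e-4, 4), (1e-4, 1e-3, 3), (1e-3, 1e-2, 2), (1e-2, 1e-1, 1)]
    for low, high, sil in bands:
        if low <= pfd < high:
            return sil
    raise ValueError("target PFD outside the SIL 1-4 low-demand range")

pfd = required_pfd(euc_risk=2e-4, tolerable_risk=1e-7)
print(low_demand_sil(pfd))  # 3, as in the worked example
```

Note that the SIL so determined is allocated to the safety function, not to the system implementing it, a distinction discussed under misconceptions below.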

A.2.4.2 Applying the standard to automotive systems

In seeking to apply IEC 61508 to automotive systems the following issues are evident.
• The safety lifecycle does not align well with the typical product development processes
followed by automotive manufacturers and their suppliers. Automotive system development is
based on a number of iterations or “samples”, while the vehicles also have defined product
development lifecycles. Both of these lifecycles have key gateways or milestones that do not
have any specific functional safety requirements. See also “engineering” in the EASIS state-of-
the-art deliverable D0.1.2.

• As part of the automotive development lifecycle, automotive systems are usually subject to a
test-based Type Approval process. The system is fully developed and validated, then
approved, then released for series production. There is little provision within this process for
independent safety assessment throughout the lifecycle as required by IEC 61508.
• Automotive safety-related systems are rarely the classical “protection” systems envisaged by
IEC 61508. Instead, safety functions are part of the actual functioning of the system and it is
often difficult to make an arbitrary distinction between the EUC and the safety functions. For
example, the IEC’s “functional safety zone” website states that ABS is an example of an on-
demand protection system. In reality, the functioning of “ABS” on a modern vehicle is closely
bound up with the operation of the powertrain to implement a wide range of stability control
functions including conventional ABS.
• Practitioners therefore tend implicitly to allocate SILs to systems rather than functions.
• Automotive integrity/dependability requirements are often concerned with wider issues than
safety integrity alone. IEC 61508 makes little mention of human factors, yet drivers are an
integral part of the “system”.
• The techniques and measures listed by SIL are very specific and many of them are only
applicable to the process control sector. Conversely, techniques and measures that may be
commonplace in the automotive industry are not mentioned. Thus, for example, if a system
was developed using model-based control and automated code generation, a detailed
justification and analysis would need to be presented in order to comply with IEC 61508.
• The standard does not address the supply chain structure commonly found in the automotive
industry.

A.2.4.3 Misconceptions

Some common misconceptions in applying the standard are as follows:


• "All parts of the standard are normative". In fact only parts 1 – 4 are normative. In particular,
part 5 (method for determining SILs) and part 7 (overview of techniques and measures) are
informative and therefore examples of what can be done.
• "SILs apply to a system". In fact they apply to safety functions – see discussion above.
• "Only quantitative risk assessment (i.e. demonstration of a failure rate) can be applied". In fact
the standard recognizes and allows both quantitative and qualitative risk assessment.
• "IEC 61508 provides a 'check list' approach to compliance". In fact it is not sufficient to simply
“tick a box”; there has to be skilled and competent understanding and application of the
techniques and measures, often alongside the need to convince an independent assessor.

A.2.4.4 Summary

IEC 61508 is a reference standard for safety-related systems and contains many recognized “good
practice” techniques for the engineering of safety-related systems. It is a very prescriptive
approach, which can prove difficult to interpret in sectors with different requirements. The MISRA
Guidelines (see section A.2.5) were the first published interpretation of IEC 61508 for the
automotive industry. The EASIS approach will need to take account of its requirements, but some
parts will need careful interpretation and implementation.

A.2.5 Development guidelines for vehicle-based software

Development Guidelines for Vehicle Based Software [7] (often known as the “MISRA Guidelines”)
were developed in the early 1990s by a UK-based consortium of automotive companies
representing vehicle manufacturers, the supply chain and consultants. The Guidelines were written
against the background of the development that was taking place on IEC 61508 and
acknowledged that, while the track record of the automotive industry in respect of embedded
software was good, a recognized industry position on embedded software development for safety-
related vehicle control systems was needed.
The team producing the Guidelines had access to draft material from the committees that
produced IEC 61508, and many of the concepts and principles in that standard are embodied in
the Guidelines. The Guidelines sought to incorporate these principles within the context of the
standard approaches for automotive engineering. The authors of the Guidelines did not consider a
clause-by-clause interpretation of IEC 61508 as at that time the structure of the standard was not
confirmed.
As well as the Development Guidelines, there are 9 supporting reports that give additional
information and background to some of the recommendations in the Guidelines.

A.2.5.1 Principal features

The notable features of the approach adopted in the Guidelines are:


• The emphasis on safety activities in the development lifecycle. There is not a specified
lifecycle, although a generic lifecycle is used to indicate where the various safety engineering
activities have to be applied.
• The use of “controllability” to classify hazards (see Appendix B). This is a different approach
from those traditionally used in pursuit of applying IEC 61508; although a similar process is
used by the aviation industry and IEC 61508 Part 5 references the MISRA Guidelines as an
example of another approach to SIL determination.
• The use of (safety) integrity levels to classify systems and/or functions according to the level of
risk mitigation required.
• A summary table recommending the degree of process rigour required with increasing (safety)
integrity level.
• The requirement for independent safety assessment.

A.2.5.2 Applying the Guidelines

Although the Guidelines have been in use worldwide in the 10 years since their publication, a
number of issues can be identified.

A.2.5.2.1 Relationship to IEC 61508

Since there is no formal mapping with IEC 61508 in the Guidelines, despite many of the principles
being implemented, it is not always evident that the Guidelines provide an automotive industry
implementation of IEC 61508. Similarly, there is sometimes a perception that the Guidelines carry
less weight than IEC 61508, as they are not a standard, notwithstanding that in terms of Product
Liability and Product Safety legislation they represent best practice and therefore are equally
applicable.
In the UK the Health and Safety Executive (HSE), the government agency responsible for
enforcing health and safety legislation in industry and who are closely involved in the development
of IEC 61508, have stated “IEC 61508 will be used as a reference standard for determining
whether a reasonably practicable level of safety has been achieved when E/E/PE systems are
used to carry out safety functions. The extent to which Directorates/Divisions [of HSE] use IEC
61508 will depend on individual circumstances, whether any sector standards based on IEC 61508
have been developed and whether there are existing specific industry standards or guidelines.”
14.11.2006 2.0 A-16
EASIS Deliverable D3.2 Part 1

A.2.5.2.2 Use of SILs

The Guidelines are based around “integrity levels” rather than SILs. Although many practitioners
now use “SIL” when referring to the MISRA Guidelines, the MISRA definition of “integrity” is wider
than IEC 61508’s definition of “safety integrity” as it also encompasses the wider implications of
system failure such as economic or environmental loss as well as personal injury.
Furthermore, the techniques and measures in the Guidelines are not graded by SIL. Apart from
Table 3 in the Guidelines, which shows the rigour required in the software development process by
increasing SIL, there is no grading by SIL of any of the other recommendations.

A.2.5.2.3 Safety analysis

The Guidelines assume that safety analysis processes will be carried out. A preliminary safety
analysis is explicitly required which encompasses hazard identification and hazard classification at
the concept stage of a system or early in its lifecycle. Detailed safety analysis is implied but not
described in detail beyond a short paragraph in the “Integrity” report.
The preliminary safety analysis required corresponds approximately to the hazard identification
and classification addressed in Appendix B of this deliverable.
The detailed safety analysis required corresponds approximately to the hazard occurrence
analysis addressed in Appendix C of this deliverable. Note that the MISRA Guidelines assume that
detailed safety analysis is an iterative activity. There is an implication that to some extent,
preliminary safety analysis may also be an iterative activity.

A.2.5.2.4 Scope of the Guidelines

There is a perception that the Guidelines are only concerned with software. In fact the Guidelines
advocate a systems engineering approach, but provide guidance mostly for the software aspects.
The Guidelines do not explicitly address recent technology developments that may be used in
software development such as model-based development and automatic code generation.

A.2.5.3 Summary

The MISRA Guidelines represent an approach based on IEC 61508 that takes account of the
requirements of automotive systems and is, to some extent, more goal-based than prescriptive.
They could provide a good starting point for defining the EASIS approach, particularly to software
development.

A.2.6 MISRA Safety Analysis Guidelines

The MISRA Safety Analysis (SA) guidelines are a new publication that gives guidance on the
management of functional safety in the context of automotive programs. The MISRA SA guidance
on functional safety management techniques is based around a safety management lifecycle. This
lifecycle is aligned with both the IEC 61508 functional safety lifecycle and the typical product
development lifecycles used for vehicles.
As well as providing a functional safety management framework, the MISRA SA guidelines contain
a detailed process for preliminary safety analysis and detailed safety analysis. The preliminary
safety analysis process is concerned with enabling safety requirements to be identified as part of
the process of setting targets or attributes for a new vehicle or a new vehicle system. It includes
guidance on hazard identification and hazard classification, including the use of a risk graph
technique for hazard classification. The risk graph incorporates the previous “controllability”
technique from the 1994 MISRA Guidelines. The safety requirements include the setting of random
and systematic safety integrity requirements. The random safety integrity requirements are
specified in terms of a target failure rate per hour, and the systematic safety integrity requirements
as a safety integrity level (SIL).
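The relationship between a target failure rate per hour and the probability of failure over a given exposure time can be sketched as follows, assuming a constant failure rate (exponential model). The failure rate and exposure time in the example are purely illustrative and are not values taken from the MISRA SA guidelines:

```python
import math

def probability_of_failure(failure_rate_per_hour: float, exposure_hours: float) -> float:
    """Probability of at least one failure during the exposure time,
    assuming a constant failure rate (exponential failure model)."""
    return 1.0 - math.exp(-failure_rate_per_hour * exposure_hours)

# Hypothetical example: a target failure rate of 1e-8 per hour over
# 10,000 hours of vehicle operation.
p = probability_of_failure(1e-8, 10_000)
print(p)  # approximately 1e-4, since 1 - exp(-x) is close to x for small x
```

For small values of the product of failure rate and exposure time, the probability is approximately that product, which is why target failure rates per hour are often used directly as a proxy for risk.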
The detailed safety analysis process is concerned with iteratively applying inductive and deductive
analysis techniques to the design to confirm that the safety requirements have been implemented.
It is noted that the most common form of inductive analysis used in the automotive industry is
FMEA, and some advice is provided on applying FMEA in the context of a functional safety
management approach. Similarly, it is noted that FTA is a commonly-used deductive technique,
particularly as a means of predicting the target failure rate.
The MISRA SA guidelines are a goal-orientated approach to managing functional safety, within
which the requirements of standards such as IEC 61508 and the proposed ISO 26262 (see below)
can be met. The MISRA SA approach is compatible with these standards, in particular the
proposed ISO 26262, since in general it specifies the activities that are required but does not
prescribe the techniques that have to be used. The only potential incompatibility is that MISRA SA
suggests a different hazard classification scheme to that proposed in the draft ISO 26262.
However MISRA SA permits the user to select an appropriate scheme, provided it is documented
and justified in the company (or project) safety policy.
The MISRA SA scheme for hazard classification is discussed further in Appendix B.

A.2.7 ISO “Functional safety” activity

In November 2005 a new ISO Working Group, ISO/TC22/SC3/WG16 “Functional safety” held its
inaugural meeting. The purpose of this ISO activity is to develop a new standard based on IEC
61508 for the functional safety of electrical, electronic and programmable electronic systems used
in safety-related applications in road vehicles. The new standard will be applicable to road vehicles
of classes M, N and O as defined by the Type Approval Directive 70/156/EEC.
At the present time, the working drafts are confidential to the ISO Working Group members. This
summary is therefore based on publicly available information such as conference presentations.
The Working Group is proposing that the new ISO standard, to be known as ISO 26262 [14], will
have the following parts:
• Part 1: Glossary
• Part 2: Management of functional safety
• Part 3: Concept phase
• Part 4: Product development – system
• Part 5: Product development – hardware
• Part 6: Product development – software
• Part 7: Production and operation
• Part 8: Supporting processes
Additional parts containing “Annex” material are also foreseen.
The motivation for developing ISO 26262 is that there are well-documented issues in applying
IEC 61508 directly to automotive systems (see for example Section A.2.4.2 of this document).
Thus ISO 26262 will consider:
• Using automotive, rather than process industry, lifecycle models (e.g. AutomotiveSPICE)
• Hazard analysis and risk assessment aligned to the automotive sector
• Validation practices in the automotive industry, for example hardware-in-the-loop tests, the use
of “LabCar”-type approaches.


One key difference between IEC 61508 and ISO 26262 is that it is proposed to use the concept of
automotive safety integrity level (ASIL) instead of SIL. There are four ASILs, ASIL A – ASIL D,
which represent the risk reduction required from a system, with ASIL D being the highest. However,
the rigour associated with ASIL D (for systematic techniques and measures) is considered broadly
equivalent to that required by IEC 61508 SIL 3, with no equivalent proposed for SIL 4. At the time
of writing this document, it was not clear whether ISO 26262 would include any numerical or
probabilistic requirements associated with ASILs.
The ASIL is determined by performing hazard identification and hazard classification. Hazard
classification considers three parameters:
• Exposure, which relates to the probability of being exposed to the hazard
• Controllability, which relates to the probability of the driver being able to control the hazardous
situation
• Severity, which relates to the severity of the outcome of the hazard.
This hazard classification scheme is further discussed in Appendix B.
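The principle of combining the three parameters can be sketched as follows. Since the working drafts of ISO 26262 were confidential at the time of writing, the class definitions and the additive rule below are invented for illustration only and are not the standard's actual classification table:

```python
# Hypothetical illustration of combining Severity (S), Exposure (E) and
# Controllability (C) into an ASIL. The mapping is invented for this
# example and is NOT the table from ISO 26262.

SEVERITY = {"S1": 1, "S2": 2, "S3": 3}
EXPOSURE = {"E1": 1, "E2": 2, "E3": 3, "E4": 4}
CONTROLLABILITY = {"C1": 1, "C2": 2, "C3": 3}

def classify(severity: str, exposure: str, controllability: str) -> str:
    """Map (S, E, C) to an ASIL using an invented additive rule: the
    higher the combined score, the more risk reduction is required."""
    score = (SEVERITY[severity] + EXPOSURE[exposure]
             + CONTROLLABILITY[controllability])
    if score >= 10:
        return "ASIL D"
    if score == 9:
        return "ASIL C"
    if score == 8:
        return "ASIL B"
    if score == 7:
        return "ASIL A"
    return "QM"  # no ASIL required; normal quality management suffices

print(classify("S3", "E4", "C3"))  # highest combination -> "ASIL D"
```

The point of the sketch is that lowering any one parameter (e.g. a hazard that the driver can more easily control) lowers the required integrity level, which is the essential difference from a severity-only classification.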

A.2.8 UK MoD Def Stan 00-56

Def Stan 00-56 is the UK Ministry of Defence standard for “Safety Management Requirements for
Defence Systems”[6]. It is in two parts, Part 1 being “Requirements” and Part 2 “Guidelines on
establishing a means of compliance with Part 1.” In the latest version, Issue 3, only compliance
with Part 1 is mandatory. With the release of Issue 3, the standards Def Stan 00-54 [4] (hardware)
and Def Stan 00-55 [5] (software) have been withdrawn. At the time of preparing this report, Def
Stan 00-55 was still available but its status was shown as “Obsolescent”.
The latest version of the standard is much less prescriptive than the previous version and is built
upon a goal-based approach. It provides an overall framework for safety management, but much of
the detail of implementation is permitted to be a project decision.

A.2.8.1 Def Stan 00-56 Part 1

Part 1 sets out the framework for safety management. It is based on the following principles:
• The overall objective of the standard is to demonstrate that the risks associated with a system
are broadly acceptable or, where broad acceptability cannot be achieved, that the risks are
tolerable and reduced ALARP (as low as reasonably practicable). These concepts are discussed in
more detail in Appendix D of this deliverable.
• The standard may be applied to any system, not just to electronic or programmable systems.
• The emphasis is on safety being considered at the earliest possible stage of a project, and for
safety management activities to be included in the project plan from the outset.
• Safety management must be integrated into an overall systems engineering approach.
• There must be an auditable safety management system.
• A Safety Case is developed and maintained.
• All credible hazards and accidents are identified, the accident sequences defined, and the
risks associated with the hazards determined; the risks are demonstrably reduced to a
broadly acceptable level, or to a tolerable level and ALARP.
• There are monitoring and reporting mechanisms for failures and for accidents or “near misses”.
• The provision for an independent safety assessor is included.


For the management of safety, the standard requires that a project appoint a Safety Manager and
that a Safety Committee is established. Decisions of the Safety Committee (e.g. to accept a
tolerable risk based on ALARP) have to form part of the Safety Case. A Safety Management Plan
has to be generated and updated throughout the life of the project.
An important part of the safety management activities is the establishment of a hazard log. A
hazard log is the primary mechanism for providing traceability of the risk management process and
assurance of the effective management of hazards and accidents. The hazard log shall be updated
through the life of the project to ensure that it accurately reflects risk management activities.
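As an illustration of the traceability a hazard log provides, a minimal record might be structured as follows. The field names and status values are assumptions made for this sketch, not a format prescribed by Def Stan 00-56:

```python
from dataclasses import dataclass, field

@dataclass
class HazardLogEntry:
    """Minimal hazard log record providing traceability from a hazard
    through its accident sequences and mitigations to its status."""
    hazard_id: str
    description: str
    accident_sequences: list = field(default_factory=list)
    mitigations: list = field(default_factory=list)
    status: str = "open"  # e.g. open / tolerable-ALARP / broadly acceptable

# Hypothetical entry, updated as risk management activities progress
entry = HazardLogEntry(
    hazard_id="HZ-001",
    description="Unintended braking at speed",
    accident_sequences=["rear-end collision by following vehicle"],
)
entry.mitigations.append("plausibility check on brake demand signal")
entry.status = "tolerable-ALARP"
print(entry.hazard_id, entry.status)
```

Because the log is updated throughout the project, each entry accumulates the evidence trail (sequences, mitigations, decisions) that later feeds the Safety Case.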
The standard defines risk management as comprising the following stages:
1. Hazard identification
2. Hazard analysis
3. Risk estimation
4. Risk and ALARP evaluation
5. Risk reduction
6. Risk acceptance
The standard notes that the combination of activities 1 to 3 is sometimes referred to as “risk
analysis” and the combination of activities 1 to 4 is sometimes referred to as “risk assessment”.
The standard does not require a specific technique to be adopted for these activities, but the
method that is chosen has to be demonstrably adequate and suitable.
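The six stages can be sketched as an iterative loop in which risk reduction is repeated until the evaluation accepts the residual risk. This is a schematic illustration only; the standard does not prescribe such an algorithm, and the numeric risk measure used here is an assumption for the example:

```python
def risk_management(hazard, is_acceptable, reduce_risk, max_iterations=10):
    """Schematic of the Def Stan 00-56 stages: after identification,
    analysis and estimation (stages 1-3), evaluate the risk (stage 4),
    reduce it (stage 5) and re-evaluate until it can be accepted (stage 6)."""
    risk = hazard["initial_risk"]
    for _ in range(max_iterations):
        if is_acceptable(risk):      # stage 4: risk and ALARP evaluation
            return risk              # stage 6: risk acceptance
        risk = reduce_risk(risk)     # stage 5: risk reduction
    raise RuntimeError("residual risk could not be reduced to an acceptable level")

# Hypothetical usage: risk expressed as a number, halved per mitigation
residual = risk_management(
    {"initial_risk": 8.0},
    is_acceptable=lambda r: r < 1.0,
    reduce_risk=lambda r: r / 2,
)
print(residual)  # 0.5
```

The loop structure reflects the standard's note that stages 1-4 together form "risk assessment": each pass through the evaluation either accepts the residual risk or triggers another round of reduction.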

A.2.8.2 Def Stan 00-56 Part 2

Part 2 provides guidance and amplification on Part 1, but it is not mandatory.


It contains further guidance on the matter of tolerable risk and ALARP. This has been explored
more fully in Appendix D of this deliverable.
This part also includes detailed guidance on “complex electronic” systems. It is stated that this part
of the standard is purposely more detailed due to the withdrawal of Def Stan 00-54 and Def Stan
00-55, which previously covered these subjects in great detail. “Complex electronic” in this context
means any system or element of a system that is implemented in software or in custom hardware.
It is also stated to be applicable to elements such as ASICs, the data relating to software (e.g.
digital maps, configuration data) and COTS items (Commercial-Off-The-Shelf, both hardware and
software).
The approach to complex electronic systems is based on:
• Identifying potential failures of complex electronic elements that may contribute to existing
hazards or cause new ones
• Deriving safety requirements and additional mitigation, including safety integrity requirements
(both quantitative and qualitative)
• Providing evidence for the failure modes and failure rates of the system.
• Providing a body of evidence to support the claim in the Safety Case that the complex
electronic element is adequately safe, through various means including:
o Direct evidence: analysis, demonstration, quantitative, review and qualitative means
o Process evidence, for example the procedures and tools used to develop the system
o Counter evidence, for example from failed tests or in-service incidents.
This part of the standard cross-references IEC 61508 and DO-178B amongst other standards. It
notes that for complex electronic systems, these standards already define good practice

particularly where a SIL (or equivalent) is used to link safety requirements explicitly to development
rigour. However it is stated that because of the assumptions that are implicit to the safety integrity
level schemes in these standards, problems may arise in novel applications or if a scheme is
applied outside the domain, regulatory regime or application for which it was intended.

A.2.8.3 Summary

Def Stan 00-56 presents a goal-based approach to the management of safety and provides a
model for a less prescriptive framework that can be used for the management of system safety
across a variety of applications and domains. It can also provide the basis for developing a
framework from existing standards for a specific application or domain. The overall goal-based
approach, and the frameworks presented in Part 2 for dealing with complex electronics and
ALARP, are important inputs to the definition of an EASIS approach.

A.2.9 US MIL-STD 882D

The MIL-STD-882D [11], issued by the US Department of Defense, defines and describes a set of
activities that shall be performed throughout the system life cycle when MIL-STD-882 is required in
a solicitation or contract. It does not define a development process in terms of how and when the
interaction between the activities shall be made. The activities required by the standard are the
following:
1. Documentation of the system safety approach
2. Identification of hazards
3. Assessment of mishap¹ risk
4. Identification of mishap risk mitigation measures
5. Reduction of mishap risk to an acceptable level
6. Verification of mishap risk reduction
7. Review of hazards and acceptance of residual mishap risk by the appropriate authority
8. Tracking of hazards, their closures, and residual mishap risk
Some comments concerning the overall approach described in the standard and its applicability to
the development of automotive integrated safety systems are given below.
• The standard defines a hazard as "any real or potential condition that can cause injury,
illness, or death to personnel; damage to or loss of a system, equipment or property; or
damage to the environment". This definition encompasses a wide range of conditions
including the existence of explosive substances at a particular location. Such a wide
definition is appropriate for the type of systems addressed by the standard, but the EASIS
WT 3.1 work has a narrower scope. The hazards considered in the dependability
work package of EASIS are primarily associated with failures of automotive electronic
systems. Thus, the probabilities of these hazards are determined by the design of the
system. In other words, the hazard probabilities are a result of the system development
rather than an input to it.
• Although the standard does not explicitly state that the activities 2-6 are sequential steps
(with or without iteration) in a process, this seems to be an underlying assumption. Thus,
the approach appears to be "risk reduction in an existing system so that the residual risk is
acceptably low" rather than "development of a new system so that the residual risk is
acceptably low". For our purposes in the EASIS project, the latter approach is more
appropriate.

¹ A mishap is defined as an unplanned event or series of events resulting in death, injury, occupational illness, damage to or loss of equipment or property, or damage to the environment.
• The mishap risk is defined as "an expression of the impact and possibility of a mishap in
terms of potential mishap severity and probability of occurrence". The standard requires
that the mishap risk of each identified hazard is assessed. For the types of systems
considered in the EASIS project, however, the relationship between hazards on the one
hand and severities and probabilities of mishaps on the other is typically too complex to
allow a single probability and a single severity to be defined. Typically, a given hazard can
lead to several different consequences with different severities and different probabilities.
• The issue of how to determine the appropriate acceptable risk level is not mentioned in the
standard.
In summary, the standard is not particularly well suited for the type of systems considered in
EASIS. Still, parts of the standard provide valuable input to our investigation of the dependability-
related activities to be carried out during the development of an integrated safety system.

A.2.10 ECE standards R13 and R79

The United Nations Economic Commission for Europe (UNECE) is tasked with creating a uniform
set of regulations for vehicle design to aid global trade. Annex 18 of ECE-R13H for brake systems
[12] and Annex 6 of ECE-R79 for steering systems [13] are concerned with "Special Requirements
to be Applied to the Safety Aspects of Complex Electronic Vehicle Control Systems". These
annexes define requirements for documentation, fault strategy and verification. The requirements
may be summarized as follows:
• The function(s) of the system as well as the physical and logical system
structure shall be documented.
• The safety concept shall be documented. The safety concept is a description of
the measures designed into the system to ensure safe operation, for example
functional degradation or switch to a backup system when an error has been
detected.
• Analysis of the system's reaction to specific faults shall be documented, for
example in an FMEA and/or an FTA. The warning signals to be given to the
driver (and/or service inspection personnel) shall be documented for each
defined fault condition.
• Both the fault-free function and the safety concept shall be verified. For the
verification of the safety concept, error injection shall be carried out at the
discretion of the type approval authority. The verification results shall correspond
to the documented analysis, such that the safety concept and its implementation
are confirmed as being adequate.
From this brief summary, it should be evident that these annexes of the ECE regulations are not
particularly relevant to the investigation of dependability process frameworks in EASIS.


A.3 The EASIS dependability activity framework

It is well-known that a simple "waterfall" development process, without iterations, is not suitable for
complex automotive electronics. Thus, the dependability-related activities in the development of an
Integrated Safety Systems will not follow each other in strict succession. Each activity will be
revisited several times during the development with the results influencing other activities. Figure
A.7 shows a simplified example, for purpose of illustration only, of how the dependability-related
activities may progress in time in a particular development project.
Depending on a multitude of factors, this chart could have virtually any shape. Some factors
influencing the shape of the chart are:
• The actual system being developed
• Role of OEM and suppliers in the development of the system
• Company-internal processes
• Applicable standards
For Integrated Safety Systems (ISS) in particular, components and subsystems already existing in
the vehicle are often involved in realizing the new function provided by the ISS. For example, both
the brake control system and the engine control system might be involved in realizing an
Integrated Safety Function (ISF), partly by making use of existing functionality and partly by
incorporating new features for the ISF of concern. This makes it even more difficult to define a
generic dependability process.

[Figure: timeline chart running from start of development to start of production, indicating for each activity the periods of work and the document releases: Functional Description; System Design (conceptual, then detailed); Hazard Identification and Classification; Hazard Occurrence Analysis; Dependability-related Requirements (high level, then detailed); Verification of Dependability; Safety Case. The legend distinguishes work activities, document releases, and activities that are not part of the dependability process.]

Figure A.7 Example dependability process


Since the process chart may have virtually any shape, EASIS Work Task 3.1 does not make an
attempt at defining a generic dependability process. Instead, our focus is on the specific
dependability-related activities that should be carried out; process issues are investigated in
EASIS Work Package 4. Rather than trying to define the stepwise process in terms of the order
of activities and their interaction, EASIS WT 3.1 investigates each activity separately and provides
guidelines and recommendations for how to perform the activity, in early as well as late phases of
the development lifecycle. The information available as "input" to each activity and the information
to be produced by the activity will obviously be different depending on where we are in the
development process. For example, "identification of hazards" covers the following:
• how hazards can be identified at an early stage, when there is only a rough idea of the
function(s) provided by the system and almost nothing is known about the design and
implementation
• how hazards can be identified when the design work has progressed to a detailed level
• how hazards can be identified when there is a working prototype available
The dependability activity framework on which the EASIS WT 3.1 work is based is given in Figure
A.8. The figure shows the dependability-related activities that should be performed when an
integrated safety system is being developed. The definition of this framework is based on the
findings of the investigation of existing approaches in section A.2.

[Figure: block diagram of the dependability activities, connected by information flows labelled A to K: Identification of hazards, Classification of hazards, Hazard occurrence analysis, Establishment of dependability-related requirements, Verification and validation of dependability-related requirements, and Safety Case construction, all interacting with the development and design of the integrated safety system.]

Figure A.8 Dependability activity framework


The activity "Development and design of the integrated safety system" includes all development
and design activities that are not directly concerned with dependability. This includes the
development of the actual ISS function (or functions) of the system, tradeoffs between
requirements, considerations of constraints and the actual design work.
The activities that are included in the framework are the following:
• Hazard identification: Identification of the undesirable vehicle-level states and
behaviours that may be caused by the system being considered


• Hazard classification: Categorization of the degree of undesirability associated with each
identified hazard
• Hazard occurrence analysis: Identification and investigation of the
cause/consequence chains that may lead to a given hazard
• Establishment of dependability-related requirements: Formulation of requirements
on the design process and on the finally implemented system
• Verification and validation: Checking of whether or not the dependability-related
requirements are fulfilled (verification), and checking of whether or not the system fulfils
the dependability expectations of typical users and other stakeholders (validation)
• Safety Case construction: Production of an artefact that communicates a clear,
comprehensive and defensible argument that a system is acceptably safe to operate in
a given context.
Table A.2 shows a typical information flow at different points in time in the development lifecycle.
The information flow will of course depend on the overall development process and on the specific
system being developed (service provided by the system, system complexity, degree of novelty,
etc). Thus, the table should only be interpreted as an example of the information that may be
exchanged between the activities.

Table A.2 Typical flow of information between activities


Arrow A-D:
• Very early phase: The purpose of the system, in terms of the service it is intended to provide. A rough conceptual architecture of the system.
• Early phase: Descriptions of the functions performed by the system. Descriptions of the external interfaces of the system. Conceptual system architecture. Executable conceptual models.
• Middle phase: Descriptions of the system functions and subfunctions. Descriptions of the subsystems, their interfaces and interaction. Executable models.
• Late phase: Detailed descriptions of functions and subfunctions. Source code. Detailed descriptions of the subsystems, their interfaces and interaction. Hardware design. Executable detailed models. Physical system.

Arrow E-F:
• Very early phase: Coarse descriptions of identified hazards.
• Early phase: Descriptions of identified hazards.
• Middle phase: Descriptions of identified hazards.
• Late phase: Detailed descriptions of identified hazards.

Arrow G:
• Very early phase: Hazards identified by analysis (FMEA and similar techniques) of the rough conceptual architecture.
• Early phase: Hazards identified by analysis (FMEA and similar techniques) of the conceptual architecture.
• Middle phase: Hazards identified by analysis (FMEA and similar techniques) of the system.
• Late phase: Hazards identified by analysis (FMEA and similar techniques) of the system.

Arrow H:
• Very early phase: Descriptions of how specific faults (related to the rough conceptual architecture) may contribute to the occurrence of specific hazards.
• Early phase: Descriptions of how specific faults (related to the conceptual architecture) may contribute to the occurrence of specific hazards.
• Middle phase: Descriptions of how specific faults (related to the system design) may contribute to the occurrence of specific hazards.
• Late phase: Descriptions of how specific faults (related to the detailed system design) may contribute to the occurrence of specific hazards.

Arrow I:
• All phases: A list of hazards in which every hazard is categorized with respect to how undesirable it is.

Arrow J-K:
• Very early phase: Implementation-independent requirements (e.g. tolerable probability of hazards above a certain criticality classification, tolerable probability of specific hazards, degree of fault tolerance, etc.). Conceptual solutions to identified problems.
• Early phase: Requirements on error detection mechanisms and reaction to detected errors, in terms of "what" rather than "how".
• Middle phase: Refined requirements on error detection mechanisms and reaction to detected errors, broken down to the subsystems.
• Late phase: Detailed requirements on error detection mechanisms and reaction to detected errors, for each subsystem. Requirements on which parts of the system should be designed using some particular method or process (e.g. a particular programming language).

A.3.1 Meaning of "hazard" in the EASIS dependability activity framework

From the description of the EASIS dependability activity framework, it should be obvious that the
scope of the activities within the framework is heavily dependent on what is meant by a "hazard".
Thus, we need a very precise description of what we mean by a hazard in this framework, much
more precise than typical hazard definitions such as "a condition that may lead to an accident". It is
important to understand that such a precise description, which is developed below, is only needed
to define the scope of the activities within this framework. The purpose of the explanation is not to
suggest a new definition of "hazard".
Figure A.9 shows the scope of three analysis activities: hazard identification, hazard occurrence
analysis and hazard classification.


[Figure: a cause/consequence chain embedded in the environment (driver, traffic, road, visibility conditions, etc): root causes of the hazard, causal relationships that lead to the hazard, the hazard itself, causal relationships that lead to effects, and the potential effects of hazards. Hazard identification addresses the hazard itself; hazard occurrence analysis (system-internal analysis) covers the chain from root causes to the hazard; hazard classification (system-external analysis) covers the chain from the hazard to its potential effects.]

Figure A.9 Scope of the analysis activities


The questions to be answered by these three activities are:
• Hazard identification: At some appropriately selected point between root causes and
potential effects, what are the undesirable events and states?
• Hazard occurrence analysis: What can cause a hazard to occur? How often will the
hazard occur?
• Hazard classification: How undesirable are the hazards? Given that a hazard occurs,
which effects may occur with which likelihood? How severe are these effects?
In order to enable the system-internal and system-external issues associated with a hazard to be
analysed separately from each other, the hazards should be expressed in a way that makes their
occurrences independent of the traffic situation, driver behaviour, road conditions and other
system-external conditions. This leads to the following meaning of "hazard" for the purpose of the
EASIS WT 3.1 work:

A hazard is an undesirable condition or state of a vehicle that could lead to an


undesirable outcome depending on other factors that can influence the outcome

This meaning of "hazard", for the purposes of the EASIS WT 3.1 work, is explained and motivated
by the following:
• It may certainly seem strange to define the "hazard" concept in a way that covers any
deviation from the desired behaviour, regardless of whether this behaviour is safety-
critical or not. However, with respect to the dependability activity framework, a benign
unwanted condition and a dangerous unwanted condition differ only in their degree of
undesirability. Furthermore, the EASIS project is concerned with Integrated Safety
Systems and such systems are by definition related to safety. For the purposes of our
work, any undesired condition can therefore be considered a hazard. Whether or not a
particular such condition really affects safety is analysed in the Hazard
Classification activity, not in the Hazard Identification activity. In order to allow the
Hazard Identification to be carried out before Hazard Classification (which is obviously
necessary), we do not limit the hazard concept to safety-critical conditions only.
• It is tempting to define a hazard as an undesired behaviour rather than an underlying
condition or state. However, the relationship between a state (or condition) and the
corresponding behaviour is often dependent on the driving situation. Likewise, the
relationship between a given vehicle behaviour and its effects is also dependent on the
driving situation. If hazards were defined as undesired behaviour, the driving situation

would influence both the occurrence and the outcome of the hazards, generally
resulting in a complex analysis which does not allow the occurrence and the outcome to
be studied independently. This would prevent a clear separation between system-
internal and system-external analysis. Therefore, we prefer a state-based definition of
"hazard" to a behaviour-based one. However, it should be noted that when the state (or
condition) always - or almost always - results in a corresponding undesired vehicle
behaviour, it is often more practical to define the hazard as the undesired vehicle
behaviour as illustrated by the first example below:
o Example: One undesired behaviour of an airbag control system is the inflation of the
airbag in a non-collision situation. The underlying state, i.e. the corresponding
hazard, could be described as "inability to avoid inflating the airbag". Whenever this
inability occurs, it will result in an airbag inflation, so the hazard is more
conveniently described as "inflation of the airbag in a non-collision situation".
o Example: Another undesired behaviour of an airbag control system is the non-
inflation of the airbag in a collision situation. The underlying state, i.e. the
corresponding hazard, could be described as "inability to inflate the airbag". In this
case, the inability will only lead to the undesired behaviour if a collision occurs.
• Hazards are defined with respect to the vehicle and not with respect to the system of
concern. If hazards were defined with respect to the system, they would typically be
expressed in a very complex way which would make them difficult to understand and
difficult to reason about as shown in the following example.
o Example: For conventional ABS in cars, the hazards could in principle be defined in
terms of the state of electrically-controlled hydraulic valves with respect to "open"
and "closed", since the ABS output boundary would typically be chosen to coincide
with the state of these valves (or with the corresponding control signals to the
valves). The hydraulic lines, brake callipers, brake pads, brake disks, tires and road
surface are usually not considered to be parts of the ABS system even though they
influence the ABS operation.
It should be noted that the examples provided above are heavily simplified. All relevant
characteristics of the hazards should be included in the description of a specific hazard. Thus, a
hazard can often be broken down into several different hazards with different characteristics.
Examples of such characteristics are:
• The magnitude of the potential deviation from the desired behaviour (force, velocity,
etc)
• The duration of the hazard. (Some hazards are characterized by a specific duration
because of the way the system is, or may be, implemented)
• Information provided to the driver about the existence of the hazard. (For the airbag
example, there is obviously a large difference between "airbag inoperable and driver is
informed about this" and "airbag inoperable and driver is not informed about this".)
The subject of hazard descriptions is addressed in more detail in Appendix B.
With the proposed interpretation of "hazard", the investigation of the causal relationships that lead
to hazards is in principle confined to the system under consideration2. The behaviour of the
driver, the traffic situation and other environmental conditions do not influence the hazard
occurrence. This simplifies the investigation of "what might cause the hazard". The driver
behaviour, the traffic, etc will only influence the effects of the hazard. Thus, the investigation of the
hazard is separated into a system-internal analysis ("What might cause the hazard?") and a
system-external analysis ("What are the potential outcomes and how likely are they given that the
hazard occurs?"), as shown in Figure A.9.

2 The root cause may be system-external, for example electromagnetic interference, water, mechanical vibrations or physical damage. It
may also be located in a different system which provides input data to the system of concern. In all of these cases, however, there is a
system-internal event or state that can be considered as a cause of the hazard in the Hazard Occurrence Analysis.

A.3.2 Special cases requiring a different approach than the one outlined in the dependability
activity framework

For hazards potentially caused by faults in the system of concern or in its input information, the
dependability activity framework outlined in this document is a logical way of defining the
dependability-related activities that should be performed when a system is being developed.
However, there are some cases when the framework is less appropriate:
• A particular hazard may be continuously present due to natural limitations of the technology
employed. Integrated safety applications typically use environment-sensing components
such as radar, laser and cameras. It is well-known that natural limitations of these
technologies as well as environmental conditions may cause the sensors and their
associated signal processing to deliver incorrect information about the state of the physical
objects that they monitor. Even if a system is fault-free, the behaviour of the vehicle may
therefore still be undesired from the driver's point of view. The inability to cope with a
particular environmental situation is thus present continuously and the occurrence of the
undesired vehicle behaviour will depend on the occurrence of this environmental situation.
• Another case is when the intended system behaviour may be undesirable in some specific
(typically rare) situations. For example, most people would agree that it should not be
possible to start a vehicle without the proper key (or some similar device) but it is possible
to envision scenarios in which it would be highly desirable to be able to start the vehicle
without it. Many other - and better - examples can be found.
• A third case is when the user may misunderstand the operation of the system or may have
wrong expectations about its capabilities.
• A fourth case is when the driver may become distracted by the operation of the system or
by the information provided via the Human-Machine Interface.
In all of these cases, the dependability approach cannot be separated into a system-internal
analysis (Hazard Occurrence Analysis) and a system-external analysis (Hazard Classification).
Such hazards are in principle outside the scope of our work, but the following general
recommendations concerning how to deal with them can be given:
• Identify hazards associated with natural limitations, specific situations, misunderstandings,
distraction (or associated with any other phenomenon that is not related to faults, errors or
failures)
• For each such hazard, investigate the feasibility and suitability of the following actions:
o Selection of a technology that has few inherent limitations
o Provision of information to the user about specific situations and how to handle
them (typically in the user manual)
o Provision of information to the user about the operation, capabilities and limitations
of the system (typically in the user manual)
o Particular care in the design of the Human-Machine Interface, including the
displaying of visual information
o Introduction of in-vehicle stickers such as the well-known warnings concerning
passenger airbags and child seats
o Re-definition of the basic functionality of the system


References

[1] S. Amberkar, J. G. D’Ambrosio, B. T. Murray, J. Wysocki, B. J. Czerny, "A System-Safety Process for By-
Wire Automotive Systems", SAE 2000 World Congress, SAE 2000-01-1056, 2000.
[2] ARP 4754: Certification Considerations for Highly-Integrated or Complex Aircraft Systems, Society of
Automotive Engineers, 1996.
[3] ARP 4761: Guidelines and methods for conducting the safety assessment process on civil airborne
systems and equipment, Society of Automotive Engineers
[4] Def Stan 00-54 Requirements for Safety Related Electronic Hardware in Defence Equipment, UK Ministry
of Defence, 1997.
[5] Def Stan 00-55 Requirements for Safety Related Software in Defence Equipment, UK Ministry of Defence,
1997.
[6] Def Stan 00-56 Safety Management Requirements for Defence Systems, UK Ministry of Defence, 2004.
[7] Development Guidelines for Vehicle Based Software, MISRA, 1994.
[8] DO-178B Software Considerations in Airborne Systems and Equipment Certification, RTCA, 1999.
[9] DO-254 Design assurance guidance for airborne electronic hardware, RTCA
[10] IEC 61508, Functional safety of electrical/electronic/programmable electronic safety-related systems, IEC,
1998.
[11] MIL-STD-882D Standard Practice for System Safety, U.S. Department of Defense, 2000.
[12] ECE R13-H, Uniform provisions concerning the approval of passenger cars with regard to braking,
United Nations Economic Commission for Europe
[13] ECE R79, Uniform provisions concerning the approval of vehicles with regard to steering equipment,
United Nations Economic Commission for Europe
[14] C. Jung, Stand des ISO-Standards zur Funktionalen Sicherheit für die Automobilindustrie, (mainly in
English), Presentation at Safetronic 2005



EASIS Deliverable D3.2 Part 1 - App B

Deliverable D3.2 Part 1 – Appendix B

Hazard identification and classification

Version number: 2.0

Date of preparation: 14.11.2006

© 2006 The EASIS Consortium



Table of contents

B.1 Introduction and objective
  B.1.1 What is a hazard?
B.2 Hazard Identification
  B.2.1 Objective of the Hazard Identification section
    B.2.1.1 What is meant by hazard identification?
    B.2.1.2 Position in dependability framework
    B.2.1.3 How to formulate hazards
  B.2.2 Hazard identification methods
    B.2.2.1 Template method description
    B.2.2.2 Checklists
    B.2.2.3 HAZOP, HAZards and OPerability Analysis
    B.2.2.4 FHA, Functional Hazard Assessment
    B.2.2.5 Hazard Identification Based on State Transition Models
    B.2.2.6 FMEA
    B.2.2.7 Identification of hazards that are not related to system failures
    B.2.2.8 Combined hazards
  B.2.3 Recommended approach for Hazard Identification
B.3 Hazard classification
  B.3.1 Existing approaches
    B.3.1.1 Risk graph
    B.3.1.2 Controllability
    B.3.1.3 MISRA risk graph
    B.3.1.4 ASIL classification in ISO Working Draft 26262
    B.3.1.5 MIL-STD-882D
    B.3.1.6 Severity classification in SAE J1739
  B.3.2 An alternative hazard classification approach
    B.3.2.1 Background
    B.3.2.2 A novel method for hazard classification
  B.3.3 Hazard classification benchmarking
    B.3.3.1 The hazards considered
    B.3.3.2 The schemes considered
    B.3.3.3 Results of hazard classification
    B.3.3.4 Summary of results
    B.3.3.5 Discussion of results
    B.3.3.6 Conclusions
B.4 References



B.1 Introduction and objective

The objective of this Appendix is to provide guidance on the processes of hazard identification and
hazard classification within the context of the dependability activities of an Integrated Safety
System. In the context of this Appendix, the emphasis is understood to be particularly on the
functional safety aspects of dependability, although the subject in general has a much wider scope.
In broad terms, hazard identification is concerned with a process of determining the hazards that
are associated with a system. Hazard classification is a process of determining the criticality of a
hazard in terms of its potential consequences given that the hazard occurs. This classification
provides the basis for determining requirements related to the prevention of the hazard. In generic
standards (such as in [5]), this is often expressed as the necessary risk reduction in order to
reduce the hazard risk to a broadly acceptable level. This concept is shown in Figure B.1 below
(adapted from [5]). For automotive electronic systems, the main emphasis is usually on preventing
the hazard occurring, or reducing the probability of it occurring, such that the resulting hazard risk
is at or better than the “broadly acceptable” level. Note that this part of the analysis is outside the
scope of this Appendix (see Appendix C “Hazard Occurrence Analysis”).

[Figure B.1: A risk scale increasing from left to right, marked with the residual risk, the broadly acceptable risk and the hazard risk. The necessary minimum risk reduction [∆R] is the distance from the hazard risk down to the broadly acceptable risk; the actual risk reduction, from the hazard risk down to the residual risk, must be at least as large.]

Figure B.1: Principle of risk reduction (adapted from [5])


Hazard identification and hazard classification are commonly part of the process known overall as
“hazard analysis” or “safety analysis”.
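The risk-reduction principle of Figure B.1 can be expressed as simple arithmetic along the risk axis. The sketch below is illustrative only: the function names and the numerical risk values are invented for the example and are not EASIS-defined quantities.

```python
# Sketch of the risk-reduction principle of Figure B.1, with risk expressed
# as a difference along the risk axis. All numbers are invented examples
# (e.g. probability of the hazard leading to harm per hour of operation).

def necessary_risk_reduction(hazard_risk, broadly_acceptable_risk):
    """Minimum reduction [Delta R] needed to bring the hazard risk
    down to the broadly acceptable level."""
    return hazard_risk - broadly_acceptable_risk

def is_acceptable(hazard_risk, actual_reduction, broadly_acceptable_risk):
    """The residual risk must be at or below the broadly acceptable
    level, i.e. the actual reduction must be at least the necessary
    minimum reduction."""
    residual = hazard_risk - actual_reduction
    return residual <= broadly_acceptable_risk

delta_r = necessary_risk_reduction(1e-5, 1e-8)  # roughly 9.99e-06
ok = is_acceptable(1e-5, 9.995e-6, 1e-8)        # residual 5e-9: acceptable
```

For automotive electronic systems the emphasis is usually on making the actual reduction large enough through prevention of the hazard, as noted above.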

B.1.1 What is a hazard?

One of the most fundamental concepts of the process outlined above is the word “hazard”. In the
system safety context, the meaning of this word differs from the natural language understanding of
“hazard” (and indeed the dictionary definition). The natural language usage of hazard is anything
that is potentially dangerous or has the potential to cause harm, even if the combination of
circumstances leading to actual harm is improbable. Most states of a system have the potential to
cause harm or lead to an accident given certain conditions. In the natural language usage, a
moving vehicle is in a hazardous state; however it is not possible to design any transportation
system where the vehicles remain stationary. In the system safety definition, a vehicle travelling
along an empty road in good weather, with good road surface conditions, etc. does not constitute a
hazard. A hazard would be, for example, that the engine produces more torque than the driver
demands. This may or may not lead to an accident depending on the ability of the driver to react to
the hazard, including the application of appropriate backup systems or other mitigating measures.
The choice of boundary for the system and for hazard identification is therefore very important.
The boundary chosen needs to include the states over which the system designer has control, but
there is no purpose in defining the boundary to include all conditions that could contribute to a
particular accident, as many of these will be outside the control of the system designer.
The following meaning of “hazard” is assumed in the context of the EASIS dependability
framework. It is based on consideration of a number of system safety standards, notably [3]. Note
that this is not intended as a definition but as an explanation of the types of hazards that are
identified, classified and analysed in the EASIS dependability framework. Appendix A discusses
this issue further.
A hazard is an undesirable condition or state of a vehicle that could lead to an
undesirable outcome depending on other factors that can influence the outcome.
This recognizes the following sequence of events that occurs in order for a fault in a system, or
other event, to lead to an accident and ultimately to harm:

[Figure: sequence from fault (or other event), via hazard, to accident and harm]
When a particular system is being considered, we are generally only concerned with those hazards
that result from that system. Note, however, that for integrated safety systems it may also be
necessary to consider those hazards that result from emergent properties, that is, from interactions
between individual systems. An example of such an emergent property may be seen by
considering a traction control function and a cruise control function. The traction control function
detects that the drive wheels are slipping and reduces engine torque. A side-effect of this is that
the vehicle slows down, so the cruise control requests increased engine torque to compensate.
Unless the interaction of these systems is correctly defined (for example, by the engine
management system having a function to prioritise torque requests and cancel a conflicting
function) then the two functions could “fight” each other.
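The torque-request prioritisation mentioned in the example can be sketched in code. The following is a minimal illustration, not part of the EASIS framework: the function names, the participating functions and the priority ordering are all assumptions made for the example.

```python
# Illustrative sketch (not from the EASIS deliverable): an engine management
# arbiter that prioritises torque requests so that a safety function such as
# traction control overrides a comfort function such as cruise control.
# Priorities and function names are assumptions made for this example.

PRIORITY = {"traction_control": 2, "cruise_control": 1, "driver_pedal": 0}

def arbitrate_torque(requests):
    """Select the torque request of the highest-priority active function.

    `requests` maps function name -> requested torque in Nm, or None if
    the function is inactive. Returns (winning function, torque).
    """
    active = {name: t for name, t in requests.items() if t is not None}
    if not active:
        raise ValueError("no active torque request")
    winner = max(active, key=lambda name: PRIORITY[name])
    return winner, active[winner]

# Traction control limits torque while cruise control asks for more; the
# higher-priority safety request wins, so the functions cannot "fight":
winner, torque = arbitrate_torque(
    {"driver_pedal": 120.0, "cruise_control": 150.0, "traction_control": 40.0}
)
```

Defining such an explicit arbitration point is one way of making the emergent interaction between the two functions a designed, analysable behaviour rather than an accidental one.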


B.2 Hazard Identification

B.2.1 Objective of the Hazard Identification section

The objective of the Hazard Identification section is to explain what is meant by hazard
identification, to present the different options available when performing hazard identification, and
to recommend an EASIS hazard identification process.

B.2.1.1 What is meant by hazard identification?

Hazard identification is the process of finding the vehicle-level states and conditions that could
contribute to the occurrence of undesirable outcomes and that are associated with the particular
system of concern. In this document, we are primarily concerned with hazards associated with
potential malfunctions, but hazards associated with fault-free operation of the system are also
addressed to some extent. Note that a clear separation between correct function and malfunction
can only be made if there is a complete functional specification available, which is usually not the
case. There might be hazards created by the correct function of the system of concern which
should be considered. Example: a correctly deploying airbag could seriously injure a child in a
rearward-facing child seat. This is clearly a hazard that should be identified and handled, but as
long as it is not a function of the airbag system to detect this situation, the hazard is not related to
a malfunction.
Without having identified the hazards of the system, it is impossible to analyse the associated risks
and subsequently address these during the system development.
The hazards that will be focused upon in this section are the ones related to Integrated Safety
Systems. A very simple model of such a system is shown in Figure B.2. The system monitors the
environment via inputs, it acts on the vehicle through an interface and it determines this action in
either one processing unit (electronic control unit, ECU) or distributed over several ECUs
interconnected via one or more communication networks.

[Figure B.2: The system under consideration, consisting of sensors (S) and actuators (A) connected to ECU1, ECU2 and ECU3 via communication networks. The system monitors the environment via inputs, determines its action in the ECUs, and acts on the vehicle through an interface, producing an effect on the vehicle. The numbers 1-3 in the figure mark the three viewpoints listed below.]

Figure B.2 Model of the system


Although the model in Figure B.2 is very simple, it is sufficiently detailed to illustrate that hazards
can be identified in three fundamentally different ways:
1 Hazards can be identified by looking at the effect on the vehicle and the surroundings. For
example, the movement of the vehicle could deviate from the intended.


2 Hazards can be identified by looking at what might cause it, looking inside the system under
consideration. For example, a sensor could be faulty or an ECU could make an inappropriate
decision due to a hardware or software fault.
3 Hazards can be identified by looking at the interfaces between the system under consideration
and the vehicle. For example, the system under consideration could create an undesirable
mechanical torque via an actuator.

B.2.1.2 Position in dependability framework

As input to the hazard identification, as much information as possible about the system should be
gathered. Once identified, the hazards may be analysed with respect to criticality (“classification of
hazards”) and occurrence (“hazard occurrence analysis”). This is illustrated in Figure B.3 which
shows the position of the hazard identification in the EASIS Dependability Activity Framework.

[Figure B.3: The EASIS dependability activity framework. Identification of hazards feeds classification of hazards and hazard occurrence analysis, which support the establishment of dependability-related requirements; the framework also comprises development and design of the integrated safety system, verification and validation of dependability-related requirements, and Safety Case construction.]

Figure B.3 Dependability activity framework

B.2.1.3 How to formulate hazards

A hazard can usually be expressed as a specific inability of the vehicle to behave as desired
and/or intended.
All relevant hazard characteristics should be included in the description of a specific hazard. Thus,
a hazard can often be broken down into several different hazards with different characteristics.
Examples of such characteristics are:
• The magnitude of the potential deviation from the desired behaviour (force, velocity,
etc)
• The duration of the hazard. (Some hazards are characterized by a specific duration
because of the way the system is, or may be, implemented)
• Information provided to the driver about the existence of the hazard. (For the airbag
example, there is obviously a large difference between "airbag inoperable and driver is
informed about this" and "airbag inoperable and driver is not informed about this".)
A description of a hazard does not necessarily have to be static throughout the development
project. In the beginning of the project it might be useful to formulate the hazard in very coarse
terms. In later stages when more detailed information about the system implementation is


available, the hazard may be more precisely defined, taking into consideration magnitudes and
duration of deviations as well as other characteristics of the hazard.
Some examples of hazard formulations are given below:
• Undemanded inflation of the airbag
• Inability to inflate the airbag
• Time-limited inability to inflate the airbag
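As a minimal sketch of how such characteristics could be recorded during hazard identification, the following hypothetical data structure carries a hazard description together with the characteristics discussed above. The field names are illustrative assumptions, not EASIS-defined terms.

```python
# Illustrative sketch: a hazard record carrying the characteristics discussed
# in the text (magnitude, duration, driver information). Field names are
# assumptions made for this example, not terms defined by EASIS.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Hazard:
    description: str                   # e.g. "Inability to inflate the airbag"
    magnitude: Optional[str] = None    # size of deviation (force, velocity, ...)
    duration: Optional[str] = None     # e.g. "time-limited", "permanent"
    driver_informed: Optional[bool] = None  # is the driver told of the hazard?

# A coarse early formulation, later refined into hazards that differ only
# in their characteristics (here: whether the driver is informed):
coarse = Hazard("Inability to inflate the airbag")
refined = [
    Hazard("Inability to inflate the airbag", duration="permanent",
           driver_informed=True),
    Hazard("Inability to inflate the airbag", duration="permanent",
           driver_informed=False),
]
```

This mirrors the remark above that a hazard description may start coarse and become more precise as implementation detail becomes available.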

B.2.2 Hazard identification methods

This section gives an overview of the various methods available for hazard identification.

B.2.2.1 Template method description

Table B.1 shows the template used in this document for describing the hazard identification
methods. It is based on a template in [2].

Table B.1 Template method

“Name of the method”
• References used: References used to assess the method
• Alternate names: Other names the method is known by
• Primary objective: The purpose of the method (what hazards will it find)
• Description: A description of the process of applying the method

Reference information
• Development engineer perspective: Description of how the development engineer will use this method.
• Organisational perspective: Description of how the organisation handles this method.
• Preconditions: What will need to be available before this method can be used?
• Applicability range: What types of error will it find (HW/SW/HMI/organisational)?
• Life cycle stage: The earliest life cycle stage where the method is applicable.
• Experience in automotive application: Has the method been used previously in automotive development?
• Related methods: Alternate, overlapping or complementary methods.
• Availability and tool support: Indicates the availability of tools and the commercial readiness of the method.
• Maturity: The extent to which the method is ready for “prime time” and has proven itself useful in application.

Evaluation
• Ease of integration: Is the method easy to understand and use? Is it easy to integrate with other methods?
• Documentability: The degree to which the method lends itself to auditable documentation.
• Advantages: How well will this method help find hazards? Other general advantages of the method?
• Disadvantages: Any restrictions on applicability?

B.2.2.2 Checklists

Table B.2 Checklist

Checklists
• References used: None
• Alternate names: None
• Primary objective: The goal of using checklists is to reduce the possibility that a known hazard, which is relevant for the system being considered, is missed when hazards associated with this system are identified.
• Description: The basic tool for identifying hazards, and the one most easily used, is a checklist. This lists the most common hazards which should be taken into consideration when analysing a system in order to identify hazards. The checklists are developed by domain and dependability experts, and should contain all hazards that are known to be relevant for a vehicle, thus being potentially relevant for any automotive system. The hazards on the checklists could also be accompanied by a pre-decided hazard classification. (For further information about hazard classification see section B.3.)

Reference information
• Development engineer perspective: The checklist helps the development engineer to reduce the possibility that common hazards are missed in the hazard identification. It is important to stress that the developer should not assume that all hazards have been identified just because the checklist has been run through and “boxes have been ticked”. There might be hazards associated with the particular system that are not covered by the generic checklist.
• Organisational perspective: The checklists can be constructed from several different sources. They could, for example, be constructed from earlier experiences of actual accidents/incidents, either from similar systems or from other types of systems. Other good sources are, of course, experts: both application experts (for the system at hand) and dependability experts (for all types of systems). For the development of the checklists it is important to connect the hazards to possible accident scenarios and the constraints the system has to abide by.
• Preconditions: Before the development engineer can start using the checklists for hazard identification, he/she needs to have the checklists ready (usually from someone else) and a good understanding of the system and the system boundary.
• Applicability range: Checklists cover all types of hazard identification, depending on the area the actual checklist is defined for.
• Life cycle stage: Checklists are primarily useful in early life cycle stages.
• Experience in automotive application: Checklists overall are widespread in the automotive industry, but we do not know of any checklists containing “standard” hazards for automotive systems.
• Related methods: None
• Availability and tool support: Checklists are a very simple technique without a great need for advanced tools, but it is easy to imagine a web-based tool for administrating the checklists.
• Maturity: Checklists are commonly used in the automotive industry. However, they may need updating to address hazards associated with advanced future systems.

Evaluation
• Ease of integration: Checklists are very easy to integrate with most other methods.
• Documentability: Checklists leave an easy trail of documents.
• Advantages: Easy to use for the engineer. The possibility that a relevant hazard is forgotten is reduced. Checklists can be re-used and continuously improved.
• Disadvantages: Will only help finding known hazards. A particular system may have hazards that are hitherto unknown and therefore missing from the checklist. Thus, checklists should be used with some care.

Table B.3 Example hazard checklist (general hazard list)

Hazard
☐ Sudden acceleration
☐ No brake capability
☐ Skewed (imbalanced) braking
☐ Braking too little
☐ Braking too much
☐ Airbag does not work
☐ …

Using the hazard checklist above (Table B.3) when doing hazard identification for a brake system,
checking off the hazards relevant for your system would look like Table B.4:

Table B.4 Hazard list for the brake system

Hazards for Brake system
☐ Sudden acceleration
☑ No brake capability
☑ Skewed (imbalanced) braking
☑ Braking too little
☑ Braking too much
☐ Airbag does not work
☐ …

We would like to point out that the tables above are, of course, not official hazard lists from the
EASIS project, but only simplified examples.
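The checklist procedure illustrated by Tables B.3 and B.4 can be sketched as follows. The hazard texts are the simplified examples above; the function, its name and its output format are assumptions made for illustration.

```python
# Sketch of the checklist method: start from a general hazard checklist
# (cf. Table B.3) and tick the entries relevant for the system at hand
# (cf. Table B.4). Hazard texts are the simplified examples from the text.

GENERAL_CHECKLIST = [
    "Sudden acceleration",
    "No brake capability",
    "Skewed (imbalanced) braking",
    "Braking too little",
    "Braking too much",
    "Airbag does not work",
]

def apply_checklist(checklist, relevant):
    """Return (mark, hazard) pairs for the system-specific hazard list.

    The checklist is only a starting point: system-specific hazards that
    are missing from the generic list must still be sought separately.
    """
    return [("[x]" if h in relevant else "[ ]", h) for h in checklist]

brake_system = apply_checklist(
    GENERAL_CHECKLIST,
    relevant={"No brake capability", "Skewed (imbalanced) braking",
              "Braking too little", "Braking too much"},
)
for mark, hazard in brake_system:
    print(mark, hazard)
```

As the table in the Checklists method description warns, ticking every box does not mean all hazards have been identified; the unmarked and missing entries still need engineering judgement.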


B.2.2.3 HAZOP, HAZards and OPerability Analysis

Table B.5 HAZOP

HAZOP
References used [7], [10], [11]
Alternate names HAZards and OPerability analysis
Primary objective Identification of potential deviations from the expected
operation of the system and identification of the hazards
corresponding to these deviations.
Description HAZOP assumes accidents are caused by deviations from
the intended operation, such as no signal or wrong signal
value in an electronic system.
It is a qualitative technique based on the question: ”What if
[entity] [attribute] is [guideword]?”, where [entity] is a label
associated with an interconnection between components,
the [attribute] is a property of this entity and [guideword] is a
Reference information

word that describes a deviation from the design intent. The


technique is performed systematically, considering each
entity in turn.
As described in [10], for automotive systems the technique
could be reduced so that the entities considered are only
those associated with the outputs of the control system.
Attributes are thus derived from behaviour at the electro-
mechanical or communications boundary and the
guidewords are application specific.
For each deviation, possible causes and possible
consequences should be documented in order to later
analyse the relevance of the hazard.
In comparison with techniques such as checklists, HAZOP
is able to elicit hazards in new designs and hazards that
have not been considered previously.
HAZOP can be performed at different levels of the system,
but we are primarily concerned with the usage at the system
output boundary, including the Human-Machine Interface. If
HAZOP is applied to system-internal entities, the method is
quite similar to FMEA, further described in Appendix C.


Development engineer perspective
HAZOP is a group process in which the HAZOP team considers the following:
1 The intended functionality of the system in terms of its interaction with its environment
2 The potential deviations from the intended functionality, as defined by the entity-attribute-guideword combinations
3 The causes of these deviations from the design intention (technically outside the scope of hazard identification)
4 The consequences of the identified deviations
The results are typically documented in a table like the following example:
Entity          Attribute  Guideword  Interpretation  Causes     Consequences
Brake actuator  Force      None       No brake force  .........  No deceleration of vehicle

Note that the "Causes" column in this example is technically a part of Hazard Occurrence Analysis, which is addressed in Appendix C of this EASIS deliverable.
Organisational perspective The organisation should have a defined process for how to carry out the HAZOP analysis. The HAZOP team typically includes application specialists, system specialists from the subsystems involved and a facilitator who is an expert on the HAZOP process.
Preconditions Before a HAZOP analysis can be done, the HAZOP team needs to have an understanding of the intended functionality of the system. Furthermore, a suitable set of guidewords needs to be defined.
Applicability range HAZOP will help find potential failures of a system.
Life cycle stage The earliest possible life-cycle stage where HAZOP can be
used is when the system boundaries have been defined and
the design intentions of the systems are clear. Based on
this, the entities and attributes to be analysed can be
defined.
Experience in automotive application The HAZOP method was originally developed for chemical plants but has also been used in other domains including automotive [10].
Related methods FHA and FFA are related to HAZOP but consider the
functionality rather than the interconnections.
FMEA is in some ways similar to HAZOP, but is suitable for a later stage when more information about the design is available.
Availability and tool support HAZOP is readily usable for the automotive industry. Several tools to aid in the HAZOP process are available.
Maturity HAZOP was developed in the 1960s and is commonly used in the processing industry. It has recently been applied in the automotive industry [10].


Ease of integration HAZOP is easy to understand and is complementary to other hazard identification methods.
Documentability The resulting tables document the work and provide good
insight into the analysis.
Advantages HAZOP encourages creative thinking of what might go
wrong while still being a systematic method. It is particularly
strong when used as a group tool. It can be used to elicit
hazards in new products as well as hazards that have not
been considered previously.
Disadvantages Using HAZOP can be quite time-consuming. Furthermore, it
is quite hard to find hazards that are caused by fault
combinations.

Example entities: brake calliper, airbag, engine system


Example attributes: force, expansion, data flow, value, timing, level
Example guidewords:

Guideword Possible interpretation


Missing The output is not produced
Too high The magnitude of the output is higher than intended
Too low The magnitude of the output is lower than intended
As well as Intended output but with additional result
Part of Only part of the intended activity occurs
Reverse The opposite of what is intended occurs
Other than The intended does not happen but something else happens
Early Something happens earlier than what is intended
Late Something happens later than what is intended
Before Something precedes something else that it should succeed
After Something succeeds something else that it should precede
Inadvertent Something happens when it should not
Stuck The output does not change value
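Because the question template "What if [entity] [attribute] is [guideword]?" is fully systematic, the set of deviation questions for a HAZOP session can be enumerated mechanically. The following Python sketch is illustrative only; the entities, attributes and guidewords are assumed examples, not a prescribed EASIS list:

```python
from itertools import product

# Assumed example entities with their attributes, and a reduced set of
# the example guidewords above (illustration only).
entities = {
    "brake actuator": ["force", "timing"],
    "airbag": ["timing"],
}
guidewords = ["missing", "too high", "too low", "early", "late", "stuck"]

def hazop_questions(entities, guidewords):
    """Enumerate 'What if [entity] [attribute] is [guideword]?' questions."""
    for entity, attributes in entities.items():
        for attribute, guideword in product(attributes, guidewords):
            yield f"What if {entity} {attribute} is {guideword}?"

# 3 entity/attribute pairs x 6 guidewords = 18 deviation questions
questions = list(hazop_questions(entities, guidewords))
```

Each generated question is then assessed by the HAZOP team; the enumeration only guarantees coverage of the combinations, not the interpretation, causes or consequences.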


B.2.2.4 FHA, Functional Hazard Assessment

Table B.6 Functional Hazard Assessment

Functional Hazard Assessment


References used [1], [9], [15]
Alternate names Functional Hazard Analysis, FHA, and Functional Failure
Analysis, FFA, have both been used to denote this method
(with some minor variations).
Primary objective To find hazards related to the function or malfunction of the
system
Description A Functional Hazard Assessment (FHA) is a typical
preliminary step to identify and classify potentially hazardous
failure conditions, and to describe them in functional and
operational terms. An FHA is qualitative and is conducted
using experienced engineering and operational judgment.
Development engineer perspective
Sequence as described in [1]:
1. Examination of each function for potential failure modes in three classes:
• Loss of function (omission)
• Function provided when not required (commission)
• Incorrect operation of function (stuck, high, low, etc.)
2. Postulation of hazards based on the failures in these functions.
3. Determination of the effects of each failure. Whenever appropriate, effects are determined in combination with other contributing factors, e.g. environmental factors.
Organisational perspective The organisation should have a defined process for how to carry out the FHA analysis.
Preconditions The functions of the system have to be defined before a
functional hazard assessment can be carried out.
Applicability range FHA will find hazards related to the functionality of the
system.
Life cycle stage The hazard identification part of FHA can be performed as
soon as the system functions have been defined even if
these definitions are coarse and conceptual.
Experience in automotive application FFA has been used at Volvo Car Corporation as described in [9].
Related methods Both FMEA and HAZOP are methods that focus on other
aspects than the functionality and can thus be used in
parallel to FHA.


Availability and tool support We are not aware of any specific tool support for this method.
Maturity The technique has been used for a while in the aeronautic industry, but is quite new as a tool in the automotive industry.

Ease of integration The method is quite easy to understand and can be used in
parallel with other methods.
Documentability The tables resulting from the analysis provide a good
documentation.
Advantages FHA is a systematic and structured technique. It is also
relatively simple and straightforward to apply.
Disadvantages It is quite hard to find hazards that are caused by fault
combinations.

Table B.7 Vehicular example


Function                Failure mode  Interpretation         Effect on system
Acceleration available  Omission      No acceleration        No acceleration when needed
                        Commission    Sudden acceleration    Unexpected acceleration
                        Stuck         Constant acceleration  Unexpected acceleration
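The examination step of the FHA sequence (each function considered under omission, commission and incorrect operation) can likewise be enumerated mechanically before the team postulates hazards and effects. The following Python sketch is illustrative; the function names are assumed examples:

```python
# Illustrative sketch: postulating candidate failure conditions by
# examining each function under the three FHA failure-mode classes.
FAILURE_CLASSES = {
    "omission": "loss of function",
    "commission": "function provided when not required",
    "incorrect": "incorrect operation of function (stuck, high, low, etc.)",
}

def postulate_failure_conditions(functions):
    """Return one (function, failure mode, description) row per combination."""
    return [
        (function, mode, description)
        for function in functions
        for mode, description in FAILURE_CLASSES.items()
    ]

# Assumed example functions; each yields three rows for the FHA table
rows = postulate_failure_conditions(["acceleration", "braking"])
```

The interpretation of each row and the effect on the system remain a matter of engineering judgment, as the method description states.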

B.2.2.5 Hazard Identification Based on State Transition Models

Table B.8 Hazard Identification Based on State Transition Models

Hazard Identification Based on State Transition Models


References used None
Alternate names None
Primary objective Hazard identification based on state transition models is used to find behaviours that might lead to hazardous situations.
Description The method is used to systematically analyse what happens
if the system is not in the expected state.
Development engineer 1 Go through the state machine description and analyse
perspective what might happen to the system if:
• a spontaneous transition between states occurs
• a state transition does not occur when it should.
2 Document the behavioural discrepancies and the
associated hazards


Organisational perspective None specific.
Preconditions A state machine model of the system needs to exist before
the analysis can be carried out.
Applicability range This method will help find problems with the system
behaviour in terms of its state transitions.
Life cycle stage The method is applicable as soon as a state diagram model
of the system, at any level of abstraction detail, is available.
Experience in automotive application The method has only been informally used.
Related methods None.
Availability and tool support We are not aware of any specific tool support for this method.
Maturity Unknown
Ease of integration The technique can be used in parallel to other methods
focusing on other system aspects.
Documentability Documenting the results should be quite straightforward.
Advantages Analysing each state transition from a "what-if" perspective is a very systematic method. As long as complex and detailed state transition models are avoided, the technique is very simple to apply.
Disadvantages If the behaviour is complex the resulting state machine might
be too large to analyse by hand. (However, the abstraction
level may be chosen at a high level so that the complexity is
kept low, for example as shown in the Cruise Control
example in Figure B.4.)

Figure B.4 shows an example state transition model with states and transitions describing the
behaviour of a hypothetical cruise control system. The following questions (and many more) may
be formulated during the hazard identification process:
• What happens if the system moves from the Stand By state to the Active state when the
"set +/-" condition is not fulfilled?
• What happens if the system does not go into the Stand By state from the Active state when
the driver does a "cancel" action?
The associated hazards can be described as:
• Spontaneous activation of the cruise control
• Not possible to deactivate the cruise control


[State transition diagram: states Off, Stand By, Active and Faulty; transitions include on (Off to Stand By), off (back to Off), set +/- (Stand By to Active), cancel (Active to Stand By), error (to Faulty) and repair]
Figure B.4 Example Cruise Control State Machine
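The two "what if" questions of step 1 can be generated systematically from any state transition list. The Python sketch below is illustrative; the transition list is one plausible reading of the cruise control example in Figure B.4, not an authoritative model:

```python
# Assumed (source state, triggering event, target state) transitions,
# read informally from the cruise control example of Figure B.4.
TRANSITIONS = [
    ("Off", "on", "Stand By"),
    ("Stand By", "off", "Off"),
    ("Stand By", "set +/-", "Active"),
    ("Active", "cancel", "Stand By"),
    ("Active", "off", "Off"),
    ("Stand By", "error", "Faulty"),
    ("Active", "error", "Faulty"),
    ("Faulty", "repair", "Off"),
]

def what_if_questions(transitions):
    """For each transition, ask about its spontaneous occurrence and
    about its non-occurrence when the triggering condition holds."""
    questions = []
    for source, event, target in transitions:
        questions.append(
            f"What happens if the system moves from {source} to {target} "
            f"when the '{event}' condition is not fulfilled?")
        questions.append(
            f"What happens if the system stays in {source} although "
            f"'{event}' occurred?")
    return questions

questions = what_if_questions(TRANSITIONS)  # two questions per transition
```

This mirrors the two example questions given above for the Stand By and Active states; the associated hazards are then formulated from the answers.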

B.2.2.6 FMEA

In addition to being a method for analysis of cause-consequence relationships that lead to hazards, the FMEA process is also an important hazard identification source. Hazards identified during this process should be brought together with the hazards identified by other methods into the hazards list.
As identified in the HAZOP description (section B.2.2.3), FMEA and HAZOP are similar, where
HAZOP should be used at the boundary of the system and FMEA on the inside of the system. This
also points to the fact that it is possible to extend the FMEA process with keywords that help
identify failure modes.
FMEA is further described in Appendix C.

B.2.2.7 Identification of hazards that are not related to system failures

As discussed in section A.3.2 of Appendix A, hazards may exist even when there is no fault, error
or failure involved. Such hazards may be identified by careful consideration of relevant questions,
including the following:
• Are there any inherent limitations in the employed technology that could make the system behave in an undesirable way in particular situations?
• Are there any potential scenarios in which the intended functionality of the system is
undesirable?
• Is there a possibility that the driver (or other person) may interact with the system in an
inappropriate manner due to a misunderstanding of its functionality?
• Is there a possibility that the driver (or other person) may have wrong expectations about the capabilities and limitations of the system?
• Is there a possibility that the driver (or other person) is distracted by the behaviour of the
system or by the information provided via the Human-Machine Interface?
Hazards of this type are in principle outside the scope of our work so here we only conclude that
such hazards have to be considered in addition to the failure-related hazards.


B.2.2.8 Combined hazards

It should be noted that a fault may lead to more than one identified hazard. The resulting
combination can be considered as a separate hazard and should be included in the list of hazards.
The possibility of such combined hazards should be considered when the hazard identification is
carried out. This implies that an iteration is necessary between the Hazard Identification and the
Hazard Occurrence analysis, since the latter investigates the relation between causes and the
resulting hazards.
As a simple example, let us assume that the following hazards have been identified for a
Collision Avoidance System (CAS):
• H1: unnecessary reduction of engine torque
• H2: unnecessary activation of brakes
It is quite obvious that any fault that makes the CAS falsely believe that a collision is about
to occur would lead to a simultaneous occurrence of H1 and H2. Thus, it is reasonable to
assume that sensor faults and faults in the CAS software or hardware may lead to the
combination H1+H2. A third hazard should therefore be defined as follows:
• H3: unnecessary reduction of engine torque and simultaneous activation of
brakes
Some hazard identification techniques are less effective than others for identification of combined
hazards. For example, a HAZOP performed on the outputs of a system would typically not find
combined hazards since it considers one output at a time.
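Once the Hazard Occurrence Analysis has produced a mapping from faults to the hazards they cause, candidate combined hazards can be derived mechanically: any single fault that causes more than one hazard implies a hazard combination. The Python sketch below is illustrative and uses the CAS example above; the fault names are assumptions:

```python
# Illustrative fault-to-hazard mapping for the CAS example.
# H1: unnecessary reduction of engine torque
# H2: unnecessary activation of brakes
fault_effects = {
    "radar sensor fault": {"H1", "H2"},        # assumed example fault
    "brake pedal sensor fault": {"H2"},        # assumed example fault
    "CAS software fault": {"H1", "H2"},        # assumed example fault
}

def combined_hazards(fault_effects):
    """Return the distinct hazard combinations caused by single faults."""
    return {frozenset(hazards)
            for hazards in fault_effects.values()
            if len(hazards) > 1}

# {H1, H2} appears once even though two faults cause it; it would be
# documented in the hazard list as the new combined hazard H3.
combos = combined_hazards(fault_effects)
```

This illustrates the iteration between Hazard Identification and Hazard Occurrence Analysis described above: the fault-to-hazard mapping must exist before combined hazards can be derived from it.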

B.2.3 Recommended approach for Hazard Identification

The most effective way to identify hazards appears to be to use several methods since they
complement each other. Checklists can be used for an easy start followed by HAZOP, FHA and
'Hazard Identification Based on State Transition Models' for the different aspects they bring into the
analysis. FMEAs are already a part of typical automotive development processes and it is a simple
task to transfer the "Effects" listed in the FMEA tables to the hazard list. Identification of hazards
that are not related to failures (see B.2.2.7) should also be made and the results incorporated in
the list of hazards.
In an early phase the hazards are typically described somewhat coarsely, to be refined in more detail in later phases.
It should be ensured that people with different competences and viewpoints are involved in the
hazard identification.
Finally, it needs to be pointed out that hazards may be identified by other methods than those
listed above. For example, a vehicle test drive may reveal a hitherto unknown hazard. In fact, any
activity performed during the development of a system may have hazard identification as a “side-
effect”. It is essential that such hazards are communicated and included in the list of hazards.
Here, we only conclude that a hazard reporting scheme should exist so that hazards detected
during development are properly identified and documented.


B.3 Hazard classification

In Section B.3.1, a number of existing approaches to hazard classification are analysed. A novel
approach is presented in Section B.3.2.

B.3.1 Existing approaches

A number of existing approaches to hazard classification have been examined and are
summarized in the following subsections.

B.3.1.1 Risk graph

IEC 61508 [5] presents a risk graph approach in Part 5, “Examples of methods for the
determination of safety integrity levels” [5]. Note that this is an informative part of the standard
rather than one of the normative parts. Furthermore, it must be emphasized that the risk graph
presented in the standard is an example. The parameters used in the risk graph and their
weightings need to be developed for each sector and/or application, and should be defined in
sector-specific standards. It should also be noted that risk graphs are typically applied for low-demand (protection) systems where a specific safety-related system is added to an existing process or system to act as a risk-reduction measure¹.
The risk graph is based on the following equation:
R = f(f, C)
where:
R is the risk associated with the hazardous event with no protection measures in place;
f (the argument) is the frequency of the hazardous event with no protection measures in place;
C is the consequence of the hazardous event (e.g. it could be related to injury or to environmental harm);
f (the function) represents a generalized (but not specified) function that combines the parameters. In the most general sense, the combination could be quantitative (in which case it is by multiplication) or qualitative (such as is used in a risk graph scheme).
The frequency f of the hazardous event is considered to be made up of three influencing factors:
• The frequency of, and duration of, exposure in the hazardous zone;
• The possibility of avoiding the hazardous event;
• The probability of the hazardous event taking place (without any protective measures),
which is called the probability of the unwanted occurrence.
This leads to the following parameters in a general risk graph scheme:
C consequence of the hazardous event
F frequency of, and exposure time in, the hazardous zone
P possibility of avoiding the hazardous event
W probability of the unwanted occurrence

¹ The model of hazards and risks used in IEC 61508 is different from the automotive model, as discussed in Appendix A. We present the IEC formulation of a risk graph as an example of the type of approach to hazard classification that can be used.


It may also be necessary to develop additional or alternative risk parameters depending on the
application sector and the technologies in use.
These parameters are combined using a graph (note that “graph” is used in the mathematical
sense).

[Figure: risk graph. Starting from the consequence parameter (CA to CD), the graph branches over F (FA, FB) and P (PA, PB) into six output rows; under the columns W3 / W2 / W1 the rows read (a, -, -), (1, a, -), (2, 1, a), (3, 2, 1), (4, 3, 2) and (b, 4, 3)]
Figure B.5 Generic structure of a risk graph

Figure B.5 shows the basic structure for a generic risk graph. Each of the parameters C, F and P
is evaluated in turn leading to the requirements shown under the column “W3”. This column shows
the required risk reduction assuming that all of the risk reduction is to be achieved by E/E/PE
(electrical, electronic or programmable electronic) systems. If there are other non-technology
based means of risk reduction (i.e. external risk reduction facilities) then it can be argued that a
different rating for the W parameter is used, which reduces the risk reduction required for the
E/E/PE systems. The required risk reduction is indicated by the legend in the appropriate box as
follows:
- no safety requirements
a no special safety requirements
1–4 SIL required for the E/E/PE system
b a single E/E/PE system is not sufficient to achieve the required risk reduction
Certain aspects of the risk graph approach as presented in IEC 61508 can be difficult to apply
directly to automotive systems, where safety is an inherent part of the functionality of the system
and not achieved through a separate system or function. For example the W parameter is difficult
to interpret in the automotive sector and different approaches have therefore been used as
described in the sections below about the MISRA Risk Graph and ISO WD 26262 Risk Graph
approaches.
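Once calibrated, a risk graph is essentially a lookup from the parameter grades to a required risk reduction. The Python sketch below encodes the generic graph of Figure B.5; the additive row selection is one plausible reading of the generic structure and must not be taken as a calibrated scheme for any real application:

```python
# Output legend per row of the generic risk graph, for columns W3/W2/W1:
# '-' no safety requirements, 'a' no special safety requirements,
# '1'..'4' required SIL, 'b' a single E/E/PE system is not sufficient.
ROW_OUTPUTS = {
    1: ("a", "-", "-"),
    2: ("1", "a", "-"),
    3: ("2", "1", "a"),
    4: ("3", "2", "1"),
    5: ("4", "3", "2"),
    6: ("b", "4", "3"),
}

def risk_graph(c, f, p, w):
    """c in 'A'..'D', f in 'A'/'B', p in 'A'/'B', w in 1..3.
    Assumed additive row selection for the generic (unweighted) graph."""
    if c == "A":
        row = 1                          # CA leads directly to the top row
    else:
        row = ("ABCD".index(c) + 1       # consequence contribution
               + "AB".index(f)           # exposure branch
               + "AB".index(p))          # avoidance branch
    return ROW_OUTPUTS[row][3 - w]       # columns ordered W3, W2, W1
```

For example, under these assumptions CB/FA/PA at W3 yields SIL 1, and CD/FB/PB at W3 yields "b", matching the bottom row of the figure.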


B.3.1.1.1 Example parameters

The parameters in the risk graph have to be developed according to the application. For example,
the following scheme might be used:

Risk parameter Classification


Consequence CA Minor injury
CB Serious permanent injury or one death
CC Multiple deaths
CD Very many deaths
Frequency and time of exposure FA Rare to more often exposure in the
hazardous zone
FB Frequent to permanent exposure in the
hazardous zone
Possibility of avoiding the hazardous event PA Possible under certain conditions
PB Almost impossible
Probability of the unwanted occurrence W1 A very slight probability and only a few
unwanted occurrences likely
W2 A slight probability and few unwanted
occurrences likely
W3 A relatively high probability and frequent
unwanted occurrences likely

Please note that these classifications are for illustration only. For examples of how the risk graph
has been interpreted in practice, see the discussion below of the MISRA Risk Graph and ISO WD
26262 Risk Graph approaches.
It is possible to assign numerical ranges to some of the parameters. In this case, the risk graph is
now a semi-qualitative approach and can be referred to as a “calibrated risk graph”. This approach
is described further in the sector-specific development of IEC 61508 for safety instrumented
systems in the process industry sector (see Part 3 of IEC 61511 [6]).

B.3.1.1.2 Weighting of the parameters

A “weighting” can be applied to one or more of the parameters. For example, it might be decided
that for the highest consequence parameter CD this has a far greater weight than F or P; and for
the next highest consequence parameter CC that this and F have a far greater weight than P. In
this case the risk graph would be modified as shown in Figure B.6.


[Figure: risk graph as in Figure B.5, but with the CC branch considering only F (outputs (3, 2, 1) for FA and (4, 3, 2) for FB under W3/W2/W1) and CD leading directly to the bottom row (b, 4, 3)]
Figure B.6 Example of risk graph with weighted parameters

B.3.1.2 Controllability

The “Controllability” approach to hazard classification was developed for road transport
applications. It was first introduced by the EC-funded project DRIVE Safely [8]. It was then
adopted and enhanced by the UK Government supported project MISRA [13]. Since then it has
been used to assess the risks associated with a variety of novel in-vehicle and roadside systems.
It has also been recommended for use in assessing the safety properties of integrated traffic
control systems [16]. A summary of the approach is presented here, and fuller details along with
examples can be found in the MISRA Technical Report Controllability [14] or the MISRA Safety
Analysis Guidelines.
In general, it has been found that the Controllability approach is best suited to hazards
characterized by moving vehicle scenarios. Hazards that are not associated with motion of the
vehicle may be better addressed by other techniques such as a Risk Graph. An example would be
an anti-trap function on a window lift system – this is an example of a classical protection system
as envisaged by IEC 61508 and it is difficult to interpret some of the Controllability concepts when
considering its hazards.
The term “Controllability” refers to the ability of the driver, another vehicle occupant, or another
person interacting with the system to control the safety of the situation following a failure. This
approach recognizes that in road transport scenarios, a failure in a system does not necessarily
lead to an accident. Depending on other factors, which are discussed more fully in the
Controllability report, the driver or other operator of the system may be able to react to the failure
situation and prevent an accident from occurring. This chain of events may be represented
diagrammatically as shown below.
Failure --may lead to--> Loss of control --may lead to--> Accident

Each hazard is classified by assigning it one of five Controllability categories. The controllability
categories are defined in the following table. Note that the first column is a short descriptor for the
Controllability category; each category is fully defined only by the text in the second column.


Controllability category Definition


Uncontrollable This relates to failures whose effects are not controllable by the
road user, or vehicle occupants, and which are most likely to lead
to extremely severe outcomes. The outcome cannot be
influenced by a human response.
Difficult to control This relates to failures whose effects are not normally
controllable by the road user, or vehicle occupants but could,
under favourable circumstances, be influenced by a mature
human response. They are likely to lead to very severe
outcomes.
Debilitating This relates to failures whose effects are usually controllable by a
sensible human response and, whilst there is a reduction in the
safety margin, can usually be expected to lead to outcomes
which are at worst severe.
Distracting This relates to failures which produce operational limitations, but
a normal human response will limit the outcome to no worse than
minor.
Nuisance only This relates to failures where safety is not normally considered to
be affected, and where customer satisfaction is the main
consideration.

To arrive at a Controllability category for a hazard, the following four parameters are considered
and graded:
1. Level of system inter-dependency (I)
2. Loss of authority or control due to the hazard (A)
3. Provision of backup or mitigation (B)
4. Reaction time (T)
The first two parameters are concerned with the importance of the system under discussion, and
the amount of authority or influence that it has to affect, or maintain, the safety of the situation.
The second two issues are concerned with what the user(s), normally the driver(s), can do to
maintain control of the safety of the situation in the event that the hazard under consideration
occurs.
Initially these parameters are considered separately and a grade from A (worst case, high value) to
E (best case, low value) is assigned to each one. Note that each parameter may take any grade
out of A, B, C, D or E. In the sections below that describe the parameters, the descriptions given
against grades A, C and E show the range over which the parameters are graded. See the section
on benchmarking of hazard classification for an example which explains this.
Once grades for the four parameters have been obtained, then the final consolidated grade is
considered. In general, the highest of the grades is taken. Note that a simple average of the
grades is not taken since the grades do not have the same dimensions; they actually define a
point in the four-dimensional space of the parameters. Once a possible final grade has been
chosen, the full definition of the corresponding Controllability category (Grade A implies
“Uncontrollable”, Grade B implies “Difficult to Control” etc.) is studied to confirm that it does indeed
reflect the controllability of the safety of the situation that results from the hazard under
consideration. The final grade for each hazard should be chosen carefully and reasonably, since
the highest final grade will be used to define the safety integrity requirements of the system, and
the risks created by this system must be demonstrated to be broadly acceptable or tolerable and
reduced as low as reasonably practicable (ALARP).


It is usual practice to record the grading allocated to the four parameters that are used to assign
the Controllability category, along with any notes or observations. An example is shown:

Hazard                                           I  A  B  T  Controllability category  Comments
Engine produces less torque than driver demands  E  C  C  C  Debilitating
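The consolidation rule described above (take the highest, i.e. worst, of the four parameter grades as the candidate final grade) can be sketched as follows. This is illustrative only; as stated above, the candidate grade must still be confirmed against the full Controllability category definition before being adopted:

```python
# Controllability categories keyed by grade, 'A' worst to 'E' best.
CATEGORY = {
    "A": "Uncontrollable",
    "B": "Difficult to control",
    "C": "Debilitating",
    "D": "Distracting",
    "E": "Nuisance only",
}

def consolidated_grade(i, a, b, t):
    """Return the worst of the four parameter grades (I, A, B, T) and
    the corresponding candidate Controllability category."""
    worst = min((i, a, b, t))  # 'A' sorts before 'E', so min() is worst
    return worst, CATEGORY[worst]

# Example row from the table above: I=E, A=C, B=C, T=C -> Debilitating
grade, category = consolidated_grade("E", "C", "C", "C")
```

Note that this deliberately does not average the grades, for the reason given above: the four parameters have different dimensions.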

B.3.1.2.1 Parameters

B.3.1.2.1.1 Level of system interdependency

The parameter “level of system interdependency” is concerned with system integration. It relates to
the degree to which other systems are relying on the correct functioning of this system for their
own correct functioning, e.g. when this system provides data for use by other systems in this
vehicle. It should be a functional dependency, not just the existence of a communications link
(which is a design issue that will be assessed later during detailed safety analyses). The concern
is not whether this system is part of a tightly integrated application, which would warrant a safety
analysis in its own right, but whether this system is providing data for other, and distinct, systems
that will modify their functionality according to the value of that data.
The grades for this parameter are allocated as follows:
A. Full functional inter-dependency
B. ↑
C. Partial functional dependency
D. ↑
E. Autonomous system or function

B.3.1.2.1.2 Loss of authority or control due to the hazard

The parameter “loss of authority or control due to the hazard”, relates to the system under
investigation, which is inside the system boundary. Each hazard will reduce the authority of this
system and/or the ability of the user(s), or vehicle occupant(s), to maintain a safe situation.
The grades for this parameter are allocated as follows:
A. Full authority/control lost
B. ↑
C. Partial authority/control lost
D. ↑
E. No effect

B.3.1.2.1.3 Provision of backup or mitigation

The parameter “provision of backup or mitigation”, relates to any other functions outside the
system boundary that are being, or may be, used to control the safety of the situation following a
failure.


The grades for this parameter are allocated as follows:


A. No other functions available
B. ↑
C. Other functions available, but with reduced functionality or safe state
D. ↑
E. Full redundancy or diversity, or functions not affected
Note that in the case of a system failure, “full redundancy or diversity” refers to other systems that
can provide a backup function, but by other means; it does not refer to a multi-channel
implementation of the function that has failed (this is a design issue and will be one way of
achieving the required safety integrity requirements).

B.3.1.2.1.4 Reaction time

The parameter “reaction time” refers to the speed with which the user(s), normally the driver(s),
must be able to apply the backup functions outside the system boundary in order that they will
succeed in creating a safe state. The grades for this parameter are allocated as follows:
A. Much faster than humanly possible
B. ↑
C. Similar to human
D. ↑
E. Similar to normal traffic situation
Note that:
• “Similar to normal traffic situation” in this context refers to the speed of reaction necessary to
maintain normal safe traffic conditions, i.e. the road user, or the driver, does not have to
perform any extra or different tasks to maintain the safety of the situation.
• “Similar to human” refers to a speed of reaction necessary in an emergency situation, but
within the capabilities of most road users. This is a scenario where road users, or drivers, have
to perform one or more tasks that they did not expect to have to do until the hazard under
consideration occurred.
• “Much faster than humanly possible” refers to scenarios, such as platooning, in which vehicles
are legitimately not under the immediate control of their drivers.

B.3.1.3 MISRA risk graph

The MISRA Safety Analysis Guidelines [19] incorporate a Risk Graph approach to hazard
classification. The MISRA Risk Graph maintains the Controllability scheme that has been proven
over several years of use for moving vehicle hazards, while incorporating an additional scheme
that is more suited to the non-moving vehicle and protection system hazards. The MISRA Risk
Graph has also been designed to deal with possible future systems whose control authority is
greater than a single vehicle, for example, co-operative driving systems and infrastructure-based
systems.
The MISRA Risk Graph caters for three types of hazards:
• Hazards associated with (loss of control of) a moving vehicle – these are referred to as “moving
vehicle hazards” or “hazards associated with the control of a moving vehicle”


• Hazards that are not associated with loss of control of a moving vehicle – these are referred to
as “non-moving vehicle hazards” or “hazards not associated with the control of a moving
vehicle”
• Hazards of protection systems (in the classical sense of IEC 61508) – this is a subset of the
non-moving-vehicle case.
The MISRA Risk Graph approach considers three input parameters:
• The potential severity of the outcome of the hazard
• The frequency of exposure to the hazard
• The possibility to avoid the hazard.
The output parameter is a hazard classification which is called the “hazard risk”. This is the risk
associated with the hazardous event given that the hazard has occurred.
The MISRA Risk Graph may therefore be expressed in the following way:
Risk = f(S, F, P) – Used for hazards not associated with control of a moving vehicle, or for hazards
that will be mitigated by a classical protection system.
Risk = f(S, F, C) – Used for hazards that are associated with control of a moving vehicle
where:
S represents the potential severity of the consequences of the hazard
F represents the frequency of exposure to the hazard
P represents the probability of failing to avoid the hazardous event
C represents “controllability”, which is an estimate of the degree of control over the safety of the
situation following a hazard
f represents a general function that combines the parameters.
Note that the purpose of the C and P parameters is the same; however, experience has shown that
an assessment of controllability is naturally suited to those hazards characterized by the moving
vehicle whereas this is not the case for other types of hazard.
For the moving vehicle case F is currently assumed to be constant but is shown in the formula for
completeness.
The MISRA Risk Graph is shown in the diagram below:


The risk graph first poses the question: is the hazard associated with the control of a
moving vehicle?

If No (hazards not associated with control of a moving vehicle), the path is S, then F, then P:

    S1, F1/F2:  P1 -> R1   P2 -> R2
    S2, F1:     P1 -> R2   P2 -> R3
    S2, F2:     P1 -> R3   P2 -> R4

If Yes (hazards associated with control of a moving vehicle), the path is S, then C:

         C0   C1   C2   C3   C4
    S2   NR   R1   R2   R3   R4
    S3   NR   R2   R3   R4   R5

    NR = No Risk

For each hazard, the risk graph is followed systematically from left to right in order to arrive at a
risk value R for that hazard. R is the Hazard Risk and represents the risk associated with a
hazardous event, given an occurrence of the hazard. The R values are therefore hazard
classifications.
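As a non-normative sketch, the traversal of the risk graph described above can be expressed in Python. The function name and the string encodings of the parameters are assumptions made for illustration only; the tabulated R values follow the graph reproduced above.

```python
def misra_risk(severity, moving, controllability=None, frequency=None, p_avoid=None):
    """Illustrative lookup of the MISRA Risk Graph.

    severity: "S1", "S2" or "S3"; moving: True for hazards associated with
    control of a moving vehicle. For moving-vehicle hazards supply
    controllability "C0".."C4"; for non-moving hazards supply frequency
    "F1"/"F2" and p_avoid "P1"/"P2". For S1 non-moving hazards the
    frequency is not evaluated (F1/F2 are combined in the graph).
    """
    if moving:
        # Right-hand branch: Risk = f(S, C). S1 is not accessible here,
        # since any moving-vehicle hazard could potentially be fatal.
        table = {
            "S2": {"C0": "NR", "C1": "R1", "C2": "R2", "C3": "R3", "C4": "R4"},
            "S3": {"C0": "NR", "C1": "R2", "C2": "R3", "C3": "R4", "C4": "R5"},
        }
        return table[severity][controllability]
    # Left-hand branch: Risk = f(S, F, P); F only matters for severity S2.
    if severity == "S1":
        return {"P1": "R1", "P2": "R2"}[p_avoid]
    if severity == "S2":
        rows = {"F1": {"P1": "R2", "P2": "R3"},
                "F2": {"P1": "R3", "P2": "R4"}}
        return rows[frequency][p_avoid]
    raise ValueError("severity not covered by this branch of the graph")
```

For example, an uncontrollable-in-part (C3) moving-vehicle hazard of severity S2 yields R3, while a non-moving S1 hazard with avoidance class P2 yields R2.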
In the MISRA Safety Analysis Guidelines approach, these hazard classifications are used
subsequently to set safety requirements (including random and systematic safety integrity
requirements) such that the probability of occurrence of hazards is less than or equal to the
broadly acceptable risk. Note that this approach is subtly different from the concept of risk
reduction encountered in IEC 61508 but very similar to the use of ASILs in ISO WD 26262 [3][11].
Usually the highest R value of all the hazards of the system leads to the safety integrity
requirements according to the following mapping:

Hazard            Safety integrity requirements
classification    Systematic    Random
NR                "SIL 0"       No requirement
R1                SIL 1         10^-5 ≤ P/hr < 10^-4
R2                SIL 1         10^-6 ≤ P/hr < 10^-5
R3                SIL 2         10^-7 ≤ P/hr < 10^-6
R4                SIL 3         10^-8 ≤ P/hr < 10^-7
R5                SIL 4         10^-9 ≤ P/hr < 10^-8
Strictly speaking this aspect of the use of the MISRA Risk Graph is not part of hazard classification
but is provided in this section for completeness.
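As a small non-normative illustration, the mapping above can be encoded as a lookup that picks the highest R value among a system's hazards; the function name and data layout are assumptions for illustration.

```python
# Illustrative mapping from MISRA hazard classification (R value) to
# safety integrity requirements: (systematic SIL, random-failure band
# as lower/upper bounds on dangerous failure probability per hour).
SIL_MAP = {
    "NR": ("SIL 0", None),
    "R1": ("SIL 1", (1e-5, 1e-4)),
    "R2": ("SIL 1", (1e-6, 1e-5)),
    "R3": ("SIL 2", (1e-7, 1e-6)),
    "R4": ("SIL 3", (1e-8, 1e-7)),
    "R5": ("SIL 4", (1e-9, 1e-8)),
}

_ORDER = ["NR", "R1", "R2", "R3", "R4", "R5"]

def integrity_requirements(hazard_classes):
    """Usually the highest R value of all the hazards of the system leads
    to the safety integrity requirements; select it and return the
    corresponding (systematic SIL, random P/hr band)."""
    worst = max(hazard_classes, key=_ORDER.index)
    return SIL_MAP[worst]
```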


The following features of the MISRA Risk Graph should be noted:


1. The severity levels are classified as follows:
S1 – Minor Injury
S2 – Serious permanent injury to one or more persons; death to one person
S3 – Death to several people
The MISRA approach considers that S2 is the highest severity that can apply to a single
autonomous vehicle carrying a small number of passengers. S3 is intended for use when
considering very high occupancy vehicles, or systems that could affect the safety of multiple
vehicles. Nevertheless, the S3 severity category in the MISRA approach does not consider the
highest consequence envisaged in IEC 61508 e.g. the C4 consequence in the example risk graph
in Part 5 which has an example definition of “Very many people killed”.
Note further that in the MISRA Risk Graph, S2 is equivalent to the classes S2 and S3 in the
proposed ISO approach (see next section).
2. The MISRA Risk Graph contains an implicit assumption that any hazard associated with control
of a vehicle could potentially lead to a fatality. Therefore Controllability is not accessible in the risk
graph for a severity of class S1.
3. The frequency of exposure is only considered for S2 hazards that are not associated with the
control of a moving vehicle. For moving vehicle hazards, the assumption is that the vehicle
occupants are exposed to the potential hazards for a significant proportion of the time over which a
vehicle is used. For severity S1, the exposure is considered to be of little consequence. It is
therefore only for S2 non-moving-vehicle hazards that the exposure needs to be evaluated.
4. In the MISRA Safety Analysis Guidelines, the Controllability categories have been relabelled as
follows:

Old name               New label
Uncontrollable         C4
Difficult to control   C3
Debilitating           C2
Distracting            C1
Nuisance only          C0

This has been done since some of the names (particularly “Debilitating” and “Distracting”) have
proven to be confusing when translated into languages other than English.
The Controllability classification still has to be performed by considering the four intermediate
parameters (I, A, B, T) – see section B.3.1.2.1.
The textual definitions remain the normative description of what each Controllability category
describes, although the severity descriptors have been removed since this is now covered by the
first part of the risk graph (e.g. “C4” is defined as “This relates to failures whose effects are not
controllable by the vehicle occupant(s). The outcome cannot be influenced by a human
response.“).

B.3.1.4 ASIL classification in ISO Working Draft 26262

Work is presently ongoing to define an ISO standard (Working Draft ISO WD26262) for “Functional
Safety” [11] [3]. The hazard analysis and risk assessment in WD26262 is based on the concept of
ASIL (Automotive Safety Integrity Level). The ASIL appears to represent the degree to which an
individual failure mode of the considered system needs to be avoided. Thus, the derivation of the
ASIL classification is essentially a hazard classification activity. There are four ASIL levels: A, B, C
and D, with ASIL D representing the most critical hazards. In fact there is one more level outside
the ASIL range that is called “QM” which means that standard Quality Management techniques are
considered to be sufficient.
The Figure below illustrates the reasoning behind the ASIL classification. The risk associated with
a specific hazardous event depends on the frequency of the event and the severity (S) of the
resulting harm. The frequency depends on the occurrence rate (represented by ASIL) of the failure
mode considered, the exposure (E) to situations in which the hazardous event could occur and the
controllability (C) i.e. the degree to which humans can avoid the hazardous event.

In the classification scheme, the Exposure and Controllability parameters may take the following
values.
Exposure Controllability
E4: 1 C3: 1
E3: 0.1 C2: 0.1
E2: 0.01 C1: 0.01
E1: 0.001

The ASIL classification is then determined by the following table, with the MAIS ("Maximum
Abbreviated Injury Scale") numbers representing the accident severity in terms of the resulting
injury associated with an accident.


                    Probability E*C
Severity            1     0.1   0.01   0.001   0.0001
S0: No injuries     QM    QM    QM     QM      QM
S1: MAIS 1-2        B     A     QM     QM      QM
S2: MAIS 3-4        C     B     A      QM      QM
S3: MAIS 5-6        D     C     B      A       QM

As the Exposure and Controllability are discretised into levels with a factor of ten between the
levels, it is reasonable to assume that the levels really should be interpreted as:
E4: 0.1-1 C3: 0.1-1
E3: 0.01-0.1 C2: 0.01-0.1
E2: 0.001-0.01 C1: 0.001-0.01
E1: 0.0001-0.001
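Under this interpretation, the ASIL determination amounts to multiplying the E and C weights and reading off the table column for the resulting decade. A minimal sketch in Python (the function name and encodings are illustrative assumptions; the draft standard remains the normative source):

```python
import math

# Numeric weights for Exposure and Controllability as listed above.
E = {"E1": 1e-3, "E2": 1e-2, "E3": 1e-1, "E4": 1.0}
C = {"C1": 1e-2, "C2": 1e-1, "C3": 1.0}

# Rows of the ASIL table above, indexed by severity class; the columns
# correspond to E*C products of 1, 0.1, 0.01, 0.001 and 0.0001.
ASIL_ROWS = {
    "S0": ["QM", "QM", "QM", "QM", "QM"],
    "S1": ["B", "A", "QM", "QM", "QM"],
    "S2": ["C", "B", "A", "QM", "QM"],
    "S3": ["D", "C", "B", "A", "QM"],
}

def asil(severity, exposure, controllability):
    """Determine the ASIL from the WD 26262-style table reproduced above."""
    product = E[exposure] * C[controllability]
    # Column 0 holds product 1, column 1 holds 0.1, and so on; products
    # below 0.0001 stay in the rightmost (QM-dominated) column.
    column = min(4, round(-math.log10(product)))
    return ASIL_ROWS[severity][column]
```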

The following observations can be made based on this rather limited information about the ASIL
approach.
• Application of the ASIL approach can create paradoxical results, as illustrated by the
following hypothetical example:
Consider a transient failure mode that in almost 10% of all driving situations is
guaranteed to result in an accident of severity S2. Thus, we assume that this failure
mode is completely uncontrollable (C=C3) in such driving situations. The ASIL
classification is S2*E3*C3 = ASIL B.
Let us now consider another transient failure mode that in 11% of all driving
situations could result in an accident of the same severity S2. We assume that the
driver has slightly less than a 90% chance of averting danger, meaning that just over
10% of all drivers are not able to control the damage (C=C3). The ASIL
classification is S2*E4*C3 = ASIL C.
Thus, the first case represents around 10% conditional probability of an accident,
given an occurrence of the hazard. The second case represents around 1.1%
conditional probability of an accident of the same severity, given an occurrence of
the hazard. The first case is therefore almost ten times worse than the second. Still,
the second case has the higher ASIL ranking, which is clearly paradoxical.
(Of course, estimation of precise values such as 10%, 11% and 90% above cannot
be made in reality, but the example still illustrates the principle of the paradox.)
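The arithmetic behind the paradox can be made explicit with the hypothetical numbers above (these figures are assumptions from the example, not measurements):

```python
# Case 1: hazard relevant in ~10% of driving situations (class E3),
#         accident then guaranteed (completely uncontrollable, C3).
p_accident_case1 = 0.10 * 1.00      # conditional accident probability

# Case 2: hazard relevant in ~11% of driving situations (class E4),
#         just over 10% of drivers fail to avert danger (still C3).
p_accident_case2 = 0.11 * 0.10      # conditional accident probability

# Case 1 is roughly nine times more likely to end in an S2 accident,
# yet the table assigns it ASIL B (S2*E3*C3) while case 2 receives
# the higher ASIL C (S2*E4*C3).
ratio = p_accident_case1 / p_accident_case2
```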
• It is not clear how to handle the case where the same failure mode can lead to different
severities, with different probability distributions between the severities in different driving
situations. These driving situations could very well have different controllability classes,
which further complicates the issue.
• An accident that results in S2 effects is not much different from an accident that results in
S3 effects. With respect to the electronic system considered, the differentiation between S2
and S3 does not seem necessary.
• The important characteristic of a failure mode is the conditional likelihood that it will lead to
a particular Severity level given an occurrence of the failure mode. It is not clear if the E
and C factors together provide a complete picture of this conditional probability.


• It is somewhat confusing that a high value of the Controllability represents "difficult to
control" and a low value represents "easily controllable". In other words, the values seem
to represent the "non-controllability".

B.3.1.5 MIL-STD-882D

In MIL-STD-882D [17], a risk classification approach is used, based on a qualitative assessment of
mishap risk. This means assessing the severity and probability of the mishap risk associated with
each identified hazard, i.e., determining the potential negative impact of the hazard on personnel,
facilities, equipment, operations, the public, and the environment, as well as on the system itself.
The process is based on a number of tables shown in Appendix A of the standard, including a
“mishap risk assessment matrix” that combines severity and probability. Other techniques are also
permitted subject to formal agreement within the program. Once mishap risk has been identified,
then appropriate measures have to be introduced to mitigate the mishap risk and reduce it to an
acceptable level.
Note the following definitions in the standard:
Hazard: Any real or potential condition that can cause injury, illness, or death to personnel;
damage to or loss of a system, equipment or property; or damage to the environment.
Mishap: An unplanned event or series of events resulting in death, injury, occupational illness,
damage to or loss of equipment or property, or damage to the environment.
Mishap risk: An expression of the impact and possibility of a mishap in terms of potential mishap
severity and probability of occurrence.
The tables found in the standard are as follows, noting that these are examples that have to be
confirmed for the particular system or application under consideration and agreed between the
program manager and the developer.

B.3.1.5.1 Example mishap risk assessment matrix

Severity
Probability Catastrophic Critical Marginal Negligible
Frequent 1 3 7 13
Probable 2 5 9 16
Occasional 4 6 11 18
Remote 8 10 14 19
Improbable 12 15 17 20

Mishap risk assessment values can then be used to group individual hazards into “mishap risk
categories”. Mishap risk categories are then used to generate specific action such as mandatory
reporting of certain hazards to management for action or formal acceptance of the associated
mishap risk. In the table below, an example listing of mishap risk categories and the associated
assessment values is given. In this example, the system management has determined that mishap
risk assessment values 1 through 5 constitute “High” risk while values 6 through 9 constitute
“Serious” risk.


Mishap risk         Mishap risk category   Mishap risk acceptance level
assessment value
1 – 5               High                   Component Acquisition Executive
6 – 9               Serious                Program Executive Officer
10 – 17             Medium                 Program Manager
18 – 20             Low                    As directed

Higher risk categories indicate greater need for mishap risk reduction.
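The example matrix and category bands above can be combined into a single lookup. The following Python sketch is illustrative only (function name and encodings are assumptions); in practice the matrix and bands must be agreed between the program manager and the developer.

```python
# Example MIL-STD-882D mishap risk assessment matrix, as tabulated above:
# rows are probability levels, columns are severity categories.
MATRIX = {
    "Frequent":   {"Catastrophic": 1,  "Critical": 3,  "Marginal": 7,  "Negligible": 13},
    "Probable":   {"Catastrophic": 2,  "Critical": 5,  "Marginal": 9,  "Negligible": 16},
    "Occasional": {"Catastrophic": 4,  "Critical": 6,  "Marginal": 11, "Negligible": 18},
    "Remote":     {"Catastrophic": 8,  "Critical": 10, "Marginal": 14, "Negligible": 19},
    "Improbable": {"Catastrophic": 12, "Critical": 15, "Marginal": 17, "Negligible": 20},
}

def mishap_risk_category(probability, severity):
    """Return (assessment value, risk category) using the example bands:
    1-5 High, 6-9 Serious, 10-17 Medium, 18-20 Low."""
    value = MATRIX[probability][severity]
    if value <= 5:
        return value, "High"
    if value <= 9:
        return value, "Serious"
    if value <= 17:
        return value, "Medium"
    return value, "Low"
```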

B.3.1.5.2 Example severity classes

The standard gives suggested classifications for severity, which are reproduced here. Again these
need to be interpreted for the system under consideration; for example, the standard says that “…
The dollar values shown in this table should be established on a system by system basis
depending on the size of the system being considered to reflect the level of concern.”

Description    Category   Environmental, safety, and health result criteria
Catastrophic   I          Could result in death, permanent total disability, loss
                          exceeding $1m, or irreversible severe environmental
                          damage that violates law or regulation.
Critical       II         Could result in permanent partial disability, injuries or
                          occupational illness that may result in hospitalization
                          of at least three personnel, loss exceeding $200k but
                          less than $1m, or reversible environmental damage
                          causing a violation of law or regulation.
Marginal       III        Could result in injury or occupational illness resulting
                          in one or more lost work day(s), loss exceeding $10k but
                          less than $200k, or mitigatible environmental damage
                          without violation of law or regulation where restoration
                          activities can be accomplished.
Negligible     IV         Could result in injury or illness not resulting in a
                          lost work day, loss exceeding $2k but less than $10k, or
                          minimal environmental damage not violating law or
                          regulation.

B.3.1.5.3 Example mishap probability levels

The standard gives the following suggested mishap probability levels. It notes that the definitions
of descriptive words may have to be modified based on the quantity of items involved; and that the
expected size of the fleet or inventory should be defined prior to undertaking an assessment of the
system.


Description   Level   Specific individual item                        Fleet or inventory
Frequent      A       Likely to occur often in the life of an item,   Continuously
                      with a probability of occurrence greater        experienced
                      than 10^-1 in that life
Probable      B       Will occur several times in the life of an      Will occur frequently
                      item, with a probability of occurrence less
                      than 10^-1 but greater than 10^-2 in that life
Occasional    C       Likely to occur some time in the life of an     Will occur several
                      item, with a probability of occurrence less     times
                      than 10^-2 but greater than 10^-3 in that life
Remote        D       Unlikely but possible to occur in the life      Unlikely, but can
                      of an item, with a probability of occurrence    reasonably be expected
                      less than 10^-3 but greater than 10^-6 in       to occur
                      that life
Improbable    E       So unlikely, it can be assumed occurrence       Unlikely to occur, but
                      may not be experienced, with a probability      possible
                      of occurrence less than 10^-6 in that life

B.3.1.5.4 Evaluation of the MIL-STD-882D approach

It should be noted that the approach described above evaluates the mishap risk, not the hazard
risk. The mishap risk is a combination of the severity of the mishap with the probability of the
mishap. The probability of a mishap is a combination of the probability of occurrence of the hazard
with the conditional probability of a mishap given that a hazard occurs. Thus comparing this to the
model used in EASIS it is observed that:
• The mishap severity classes are relevant to this Appendix
• The mishap probability levels combine the probability of occurrence of the hazard (relevant
to Appendix D) and the conditional probability of a mishap (relevant to this Appendix)
For hazard classification, therefore, the MIL-STD-882D approach could be difficult to apply directly
in automotive applications.

B.3.1.6 Severity classification in SAE J1739

The Society of Automotive Engineers (SAE) has published a Recommended Practice J1739 [18]
for FMEA (Failure Mode and Effects Analysis). With respect to hazard classification, the Severity
scale in J1739 is of some interest. This scale is given below:
• 10: Hazardous without warning
Very high severity ranking when a potential failure mode affects safe vehicle operation
and/or involves noncompliance with government regulation without warning.
• 9: Hazardous with warning
Very high severity ranking when a potential failure mode affects safe vehicle operation
and/or involves noncompliance with government regulation with warning.
• 8: Very High
Vehicle/item inoperable (loss of primary function).
• 7: High
Vehicle/Item operable, but at a reduced level of performance. Customer very
dissatisfied.


• 6: Moderate
Vehicle/Item operable, but Comfort/Convenience item(s) inoperable. Customer
dissatisfied.
• 5: Low
Vehicle/Item operable, but Comfort/Convenience item(s) operable at a reduced level of
performance. Customer somewhat dissatisfied.
• 4: Very Low
Fit and Finish/Squeak and Rattle item does not conform. Defect noticed by most
customers (greater than 75%).
• 3: Minor
Fit and Finish/Squeak and Rattle item does not conform. Defect noticed by 50% of
customers.
• 2: Very Minor
Fit and Finish/Squeak and Rattle item does not conform. Defect noticed by
discriminating customers (less than 25%).
• 1: None
No discernible effect.
Some comments and observations concerning this scale are given below.
• Only two of the levels are related to safety issues: 9 and 10. The distinction between
them is solely determined by whether the driver is informed about the failure or not.
Thus, the scale appears to be too coarse to be of much use for hazard classification.
• According to the scale, a very dangerous failure mode with warning is assigned a lower
severity than a slightly dangerous failure mode without warning. It is easy to think of
examples where this classification differs from the intuitive understanding of severity.
• The failure modes that have a severity classification of 9 or 10 should be analysed in
more depth, using one of the more sophisticated approaches described in this
document such as the ISO WD 26262 approach, the MISRA risk graph or the
alternative hazard classification approach.

B.3.2 An alternative hazard classification approach

In this section, a new hazard classification approach developed in the EASIS project is described.
First, however, the background underlying the approach is explained.

B.3.2.1 Background

The following observations regarding the relationship between hazards and their consequences
can be made:
• The relationship between hazards and their consequences is typically much more
complex than 'a hazard causes a consequence', particularly in the extremely complex
automotive environment. Typically, a hazard contributes to the occurrence of an
outcome and this contribution may range from very weak to very strong depending on
the characteristics of the hazard.
• When a particular hazard occurs, several different outcomes are possible. It is not
necessarily the outcome with the worst severity that is the most important one as it may
be relevant only in extremely rare driving scenarios. Other less severe outcomes may
be much more likely, possibly to the point that consideration of these outcomes
determines the criticality of the hazard.


Based on these observations, a hazard should be classified based on:


• The severity associated with each potential outcome.
• The conditional probability of each potential outcome, given that the hazard occurs.
(This probability can usually only be coarsely estimated.)
Unlike many existing approaches to hazard classification, this acknowledges the fact that a hazard
may lead to different outcomes.
In the following subsections, some further observations on which the alternative classification
approach is based are given.

B.3.2.1.1 Qualitative versus quantitative probability estimations

Concerning the conditional probability of each potential outcome, vague expressions such as
”likely”, ”unlikely”, ”extremely unlikely” and ”almost impossible” should be avoided in the hazard
classification unless these are defined quantitatively. Qualitative expressions are only meaningful if
there is a reference that they can be related to. For example, consider the following statements:
• ”John is very short”
• ”Peter's apartment is quite big”
Within a society with a common comprehension of human dimensions and apartment sizes, these
statements provide meaningful information about John's stature and Peter's apartment. In contrast,
consider the following statement:
• If hazard H5 occurs, it is very unlikely that it will result in a fatal outcome
This statement does not convey much information. ”Very unlikely” could mean any probability from
perhaps 10^-7 to 10^-2. The reason for this vagueness is of course that there is no typical probability
that can be used as a reference. Note that a quantitative definition such as:
”very unlikely” is defined as a probability between 10^-4 and 10^-3
would mean that ”very unlikely” is a (discretised) quantitative measure rather than a qualitative
measure.

B.3.2.1.2 Reference situation to consider in the classification

In some cases, it needs to be decided whether the hazard should be judged against a reference
situation in which the system operates as intended or against a reference situation in which the
vehicle is not equipped with the system at all. These two cases are clarified by the following
examples:
o The criticality of the hazard ”front airbag inoperable” may be determined based on a
comparison with ”front airbag works as intended”
o The criticality of the hazard “front airbag inoperable” may be determined based on a
comparison with “vehicle is not equipped with a front airbag”
The selection of whether to use “works as intended” or “not equipped” as the reference would
typically be determined based on whether the system can be considered as standard equipment or
not. This means that the choice is time-dependent. Systems and functions that were once
considered optional tend to become standard equipment as the years go by.


B.3.2.1.3 Factors involved in the classification

When a hazard occurs in real life, the outcome will be determined by the following factors, some of
which may be more or less relevant depending on the nature of the actual hazard:
• The particular hazard, in terms of how it affects the vehicle
• Speed of the vehicle
• Position and speeds of surrounding vehicles and other objects
• The state of the vehicle (for example cruise control on/off, actual gear)
• Road characteristics (crossing, highway, city street, curve, uphill/downhill, wide/narrow,
fences, etc)
• The road surface conditions (wet/dry, gravel, asphalt, snow, etc)
• Visibility conditions (day, night, fog, sun glare, etc)
• The skill and mental conditions of the drivers involved (experienced/novice, tired/alert,
alcohol influence, etc)
• Other
Defining a classification methodology that perfectly accounts for all these factors is simply not
possible, so the classification methodology has to be based on a simplified view of the relationship
between hazards and outcomes.
It should be noted that legal requirements and company policy may influence the classification of a
particular hazard. If two hazards are equivalent with respect to the conditional probabilities of their
potential outcomes, they may still need to be classified differently due to legal requirements or
company policy. We do not believe that it is possible to define a formalized classification scheme
that covers all characteristics of hazards perfectly. Thus, the classification approach should allow
hazards to be assigned a criticality level different from the one resulting from any formal
classification method.

B.3.2.1.4 Hazard duration and driver notification

Two characteristics of a hazard that are often important to the hazard classification are the hazard
duration and whether or not the driver is notified about the existence of the hazard.
Hazard duration
The following examples show why the duration of the hazard is important in the hazard
classification.
• The hazard “front airbag inoperable during a time less than one second” can
only lead to an effect if there is a simultaneous front-end collision. Thus, this
hazard is not particularly critical. (In this example we assume that there is no
causal relationship between the airbag inoperability and the collision.)
• The hazard “front airbag permanently inoperable” can only lead to an effect if
there is a collision. It is obviously a more critical hazard than the one-second
example above, but still not extremely critical as collisions are rare events.
• The hazard “full engine torque produced during 100 ms without a good reason”
is not particularly critical since the speed increase will be very small in this short
time.
However, a short-duration hazard does not necessarily have a low criticality. Consider the
hazard “front airbag not able to remain passive” which is equivalent to “front airbag is
activated (inflated) when not needed”. This hazard has an extremely short duration but is
still quite critical since it will lead to a safety-related effect whenever it occurs. (Here, we
ignore the extremely unlikely scenario that a fault in the system causes a triggering of the
airbag and a front-end collision happens to occur simultaneously.)
Driver notification
Another important characteristic of a hazard is whether or not the driver is informed about
the existence of the hazard. This distinction is important for hazards that have a long
duration and that may lead to effects a long time after the beginning of this duration.
Typically, such hazards are present permanently until a repair action is carried out in a
service station. An example is the hazard “front airbag permanently inoperable”. If the driver
(or possibly a service station) is informed about the existence of the hazard, the necessary
repair action could be carried out. Without any such information about the lack of airbag
functionality to the driver, this hazard may remain undetected until a front-end collision
occurs. Thus, the driver notification will have an impact on the Exposure parameter. To
summarize this discussion, we conclude that hazards caused by detectable (and therefore
reportable) errors in the system are usually more benign than hazards caused by
undetectable errors.
For some hazards, the driver notification could additionally result in the driver adapting
his/her driving style. For example, if the driver is informed that there is something wrong
with the parking brake, he/she can be expected to avoid parking in a steep slope. This
illustrates another reason why the hazard classification should consider whether the driver
is informed or not about the existence of the hazard.
Finally, it should be noted that the driver may notice the occurrence of a hazard by other
means than by visual information such as telltales and instrument displays. The hazard
could for example lead to a vehicle behaviour that makes the driver understand that
something is wrong. This type of “driver notification” should also be considered in the
hazard classification when relevant.

B.3.2.2 A novel method for hazard classification

Below, a hazard classification method is outlined based on the observations above. This
classification can be said to represent the criticality of the hazard. By criticality, we mean the
combination of the severities of the potential effects and the conditional probability of these
severities, given an occurrence of the hazard. Thus, the criticality is independent of the hazard
occurrence rate.
It might seem strange to introduce a new concept such as Hazard Criticality, when SIL (Safety
Integrity Level) is already an established concept. However, the SIL as defined in IEC 61508 [5] is
a discretised value of the target probability of a dangerous failure (of a "safety function") and
therefore more requirement-oriented than classification-oriented. We believe Hazard Criticality is a
more appropriate term to use in the hazard classification since it separates the classification from
any target probabilities. Such probabilistic requirements, for example the SIL as defined in IEC
61508, should not be embedded in the hazard classification, but should result from a
consideration of both the hazard criticality and the tolerable risk.
The hazard classification approach is based on three parameters:
• Severity
• Exposure
• Possibility of non-avoidance


Severity
The Severity is a measure of the harm associated with each safety-related potential outcome of the
hazard
• S2: One or more fatalities and/or major injuries
• S1: One or more minor injuries
With these severity levels, only those outcomes that are related to safety are considered. It is
certainly possible, however, to extend the classification approach to cover any undesirable system
states and not just hazards. This means that not only safety but also reliability can be addressed
by the methodology. For example, additional Severity levels representing e.g. ”major customer
dissatisfaction”, ”minor customer dissatisfaction”, etc. could be introduced.
Exposure
The Exposure represents the probability that a particular outcome is possible, given that the
hazard occurs. For a hazard that occurs at time t and that has a very short duration, the Exposure
is simply the probability that the vehicle at time t is in a situation in which the hazard is relevant
with respect to the particular outcome considered. For a hazard with a longer duration, the
Exposure is a measure of the probability that a situation arises in which the hazard might lead to a
particular outcome before the hazard disappears.
Possibility of non-avoidance
This is quite similar to the MISRA controllability concept, the main difference being that the
possibility of non-avoidance is described in quantitative terms. It is a measure of how likely it is that
a particular consequence will occur, given an occurrence of the hazard and given that the vehicle
is in a particular situation in which the consequence is possible.
The classification methodology for the determination of a Hazard Criticality (HC) associated with
any given hazard can now be described in the form of an algorithm:

For i = 1 to 2
Prob:= 0
For each driving situation in which the hazard may lead to severity Si
Determine Exposure (E) with respect to Si for this situation
Determine Possibility of non-avoidance (P) of Si in this situation
Prob:= Prob + E*P
End For
Determine a tentative HC level (A-E or A-D) for Si according to the table
below
End For
Determine the HC as the highest tentative HC level found so far

       Prob
       <0.00001    0.00001-0.0001   0.0001-0.001   0.001-0.01   0.01-0.1   >0.1
S2     -           HC A             HC B           HC C         HC D       HC E (or D)
S1     -           -                HC A           HC B         HC C       HC D
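As an illustration, the algorithm and the table can be sketched in Python. This is a minimal sketch under our own naming; the band boundaries follow the table above, with the lower bound of each band taken as inclusive and the top ">0.1" band as strictly greater (the exact boundary convention is our assumption, not stated in the text):

```python
# Sketch of the HC determination algorithm described above. The band
# boundaries follow the table; the data layout (severity, E, P tuples)
# and all names are illustrative assumptions.

# Lower bounds of the tentative HC bands for each severity level.
S2_BANDS = [(1e-5, "A"), (1e-4, "B"), (1e-3, "C"), (1e-2, "D")]
S1_BANDS = [(1e-4, "A"), (1e-3, "B"), (1e-2, "C")]

ORDER = [None, "A", "B", "C", "D", "E"]  # ranking of HC levels

def tentative_hc(prob, bands, top):
    """Map an accumulated conditional probability to a tentative HC level."""
    if prob > 0.1:
        return top   # the ">0.1" column: HC E for S2, HC D for S1
    level = None     # below the lowest band: no HC classification
    for lower_bound, hc in bands:
        if prob >= lower_bound:
            level = hc
    return level

def hazard_criticality(situations):
    """situations: iterable of (severity, exposure, non_avoidance) tuples,
    one per driving situation, with severity in {"S1", "S2"}.
    Returns the HC level determined by the algorithm, or None."""
    best = None
    for sev, bands, top in (("S2", S2_BANDS, "E"), ("S1", S1_BANDS, "D")):
        prob = sum(e * p for s, e, p in situations if s == sev)
        hc = tentative_hc(prob, bands, top)
        if ORDER.index(hc) > ORDER.index(best):
            best = hc
    return best
```

With the figures later used for "total loss of service brake function" in section B.3.3.3.3 (S2: E=0.5, P=0.2; S1: E=1.0, P=0.1), `hazard_criticality` returns "D", matching the preliminary HC in that table.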


In the light of the issues discussed in sections B.3.2.1.2-B.3.2.1.3, the resulting HC might need to
be modified to account for some factors that are not captured by the algorithm above:
• For systems and functions that are far from standard equipment, it might be appropriate to
reduce the criticality of hazards that can be described as “the vehicle behaves as if it is not
equipped with the system at all”. For example, the criticality of the hazard ”collision
avoidance function unavailable” could perhaps be reduced, since this function is (today)
not considered an essential part of the vehicle. The decision about how far to reduce the
criticality, possibly all the way to ”no HC classification”, should be determined by:
o how the system is marketed (for example: ”The collision avoidance system is not
guaranteed to avoid collisions in all situations and the driver is still responsible for
controlling the vehicle in a safe manner”)
o whether the driver can be expected to adapt his/her driving style to the existence of
the system
• Legal issues and company policy might make the final classification differ from the level
determined by the algorithm presented above.

B.3.2.2.1 Comments on the quantitative probabilistic approach versus qualitative approaches

The novel approach for hazard classification is quite similar to the WD 26262 approach presented
in section B.3.1.4. The main difference is the quantitative rather than qualitative estimation of the
Exposure (E) and the Possibility of non-avoidance (P). The use of quantitative conditional
probabilities is certainly unusual. Below, the reasons for selecting a quantitative approach are given.
• First it should be noted that the determination of E and C in the WD 26262 approach is
somewhat quantitative too, since the different levels are defined by numerical measures.
o E2 represents an exposure less than 1%
o E3 represents an exposure between 1% and 10%
o C1 represents less than 1%
o C2 represents 1% to 10%
• Unlike the ASIL approach, the discretisation of E and P is postponed until E and P have
been estimated and the results combined into a single (estimation of) conditional
probability. The ASIL approach leads to large discretisation errors since every intermediate
value of E and C is rounded to the next power of ten before the values are combined.
• E*C in the ASIL classification is in principle a measure of the conditional probability of the
specified harm, given an occurrence of the hazard. With this interpretation, the conditional
probabilities and the corresponding ASIL levels for ASIL B-D (for severity S3) are:
o 0.01-1: ASIL D
o 0.001-0.1: ASIL C
o (0.0001)-0.01: ASIL B
There are obviously overlaps between these ASIL ranges. A hazard may actually be up to
ten times more likely to cause a specific level of harm than another hazard and still get a
lower ASIL rating. This overlap problem is eliminated, or at least significantly reduced, by
the novel approach.
• The novel approach is better adapted than other methods to deal with hazards for which
the exposure is extremely low. If the exposure is below 0.00001, the HC classification will
be "none". The ISO WD 26262, on the other hand, may end up with ASIL A for such
hazards. The following highly hypothetical example illustrates why ASIL A may be an
inappropriate classification in such a case:
o A system that protects car occupants from falling meteorites is theoretically
conceivable in convertible cars. One of the hazards of such a system can be
defined as "inability to provide protection". In the ISO WD 26262 approach, this
hazard would be classified as ASIL A (based on S3, E1, C3). Due to the low
exposure, i.e. the low probability of the car being hit by a falling meteorite, the novel
approach would instead end up with a Hazard Criticality of "none". Most people
would probably agree that the unavailability of such a meteorite protection system is
an extremely minor hazard, at least on this planet. (More realistic examples
involving extremely rare scenarios could be given.)
• The major drawback of the novel quantitative approach is that it is typically more difficult to
argue why a particular exposure probability (e.g. 0.035) has been chosen than to argue
why a particular exposure class (E1, E2, E3 or E4) has been chosen. This observation may
make the quantitative approach unacceptable for practical application in the automotive
industry. However, it should be noted that the quantitative classification method could be
complemented with guidelines (tables, etc) on how to do the estimations. The
establishment of such tables is outside the scope of the EASIS project.
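The overlap between ASIL ranges noted above can be shown with a small numerical sketch. The E and C class boundaries follow the percentages quoted earlier in this section; the additive ASIL lookup for severity S3 and all names are our own illustrative assumptions:

```python
# Numerical illustration of the overlap between ASIL ranges (severity S3).
# The class boundaries follow the percentages quoted in the text; the
# additive lookup and all names are illustrative assumptions.

def e_class(e):
    """Discretise a quantitative exposure into an E class."""
    if e > 0.1:
        return 4
    if e > 0.01:
        return 3
    if e > 0.001:
        return 2
    return 1

def c_class(c):
    """Discretise the fraction of drivers unable to avoid the harm into a C class."""
    if c > 0.1:
        return 3
    if c > 0.01:
        return 2
    return 1

def asil_s3(e, c):
    """ASIL for severity S3: E4/C3 gives D, and each step down in E or C
    lowers the level by one (an additive rendering of the lookup table)."""
    return {7: "D", 6: "C", 5: "B", 4: "A"}.get(e_class(e) + c_class(c), "QM")

# Hazard X is roughly eight times more likely to cause the specified harm
# than hazard Y (E*C of ~0.0098 versus ~0.0012), yet X receives the lower
# ASIL because E and C are discretised before being combined.
x_e, x_c = 0.099, 0.099   # E3, C2 -> ASIL B
y_e, y_c = 0.110, 0.011   # E4, C2 -> ASIL C
```

As a sanity check, the same lookup reproduces the ASIL A of the meteorite example above (S3 with an E1-level exposure and C3).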

B.3.3 Hazard classification benchmarking

In this section we present a comparison between different hazard classification schemes using a
small number of hazards to conduct a benchmarking exercise.

B.3.3.1 The hazards considered

The hazards to be benchmarked are:


1. Unwanted deployment of airbag
2. Airbag function unavailable, without any information to the driver about this unavailability
3. Unwanted full braking (around 10 ms-2), not limited in time
4. Unwanted full braking (around 10 ms-2), 1 second duration
5. “Collision avoidance by braking” function unavailable, without any information to the driver
about this unavailability
6. Total loss of service brake function, i.e. not possible for the driver to brake using the brake
pedal.
The choice of hazards is motivated by the following considerations:
• EASIS is concerned with integrated safety systems. It is possible to envisage such systems
creating hazards 1–5.
• One extremely critical hazard is included (6).
• The list includes both hazards that typically have an immediate effect (1, 3, 4 and to some
extent 6) and hazards that may have an effect long after their first occurrence (2, 5 and to
some extent 6).
• The list includes both hazards that are critical in extremely rare driving situations (2, 5) and
hazards that are critical in everyday driving (1, 3, 4, 6)
• Since integrated safety systems typically involve automatic braking, particular emphasis has
been placed on failures associated with this function (3, 4, 5).


• Airbags (hazard 1) can be considered to be a part of a conventional vehicle, whereas “collision
avoidance by braking” is more of a “future” technology. By including hazards 2 and 5, we can
investigate how the classification approaches deal with this difference.
• Hazards 3 and 4 allow us to investigate how the duration of the malfunction is accounted for in
the different classification approaches.
• In total, there are six hazards. This seems to be sufficient to cover different classes of
hazards while still being few enough to allow the results of the hazard classifications to be
discussed and commented on with reasonable effort.

B.3.3.2 The schemes considered

The following hazard classification schemes were used. The rationale for choosing each one is
given.
• The “ASIL” approach of ISO WD 26262. This was chosen since ISO 26262 will be the future
standard for functional safety of automotive electronic systems and the scheme represents an
emerging automotive approach.
• The MISRA Risk Graph approach [19]. This was chosen since this represents another
automotive approach that has been used successfully for a number of years. The MISRA Risk
Graph incorporates the established “Controllability” approach that was originally developed for
telematic/ITS applications. Therefore the MISRA approach is seen as one method that can
take account of future systems as well as today’s systems.
• The EASIS approach. This new approach proposed by the EASIS project should be compared
against these existing approaches.

B.3.3.3 Results of hazard classification

The results of the benchmarking using the different methods are presented. Note that all of the
results presented are the result of an exercise for the purposes of the EASIS project and should
not be taken as representative of the hazard classifications that are to be applied to a production
system. For any production system, a hazard analysis has to be carried out starting from the
constraints, system boundary, assumptions, etc. applicable to that system and the vehicle it will be
installed in.

B.3.3.3.1 ISO WD26262 method

The results of applying the ISO method independently by two different analysts are presented in
the following tables.

B.3.3.3.1.1 Analyst 1

Unwanted deployment of airbag: S3, E4, C3 => ASIL D
Severity S3 – Driver distraction and possible incapacitation, meaning the worst case is an accident
leading to life-threatening injuries.
Exposure E4 – The driver is exposed to this potential hazard during the majority of all driving
situations, as the airbag is continually ready for action. (Recall that E is the probability of exposure
to a driving situation where the accident can potentially happen.)
Controllability C3 – The driver may be distracted and/or disorientated by the airbag firing and may
react instinctively, e.g. jerk the steering wheel. In this scenario, even such a small deviation may
require skilled and rapid intervention to correct. Estimate 50% of drivers could have an initial panic
reaction. Furthermore, there is a high likelihood of the driver being disorientated or even rendered
unconscious. Although C2 may have seemed more appropriate initially given how drivers would
react (if conscious), the potential for incapacity leads to a classification of C3.

Airbag function permanently unavailable, without any information to the driver about this
unavailability: S3, E1, C3 => ASIL A
Severity S3 – We consider the situations where the airbag should have deployed, which by
definition are scenarios leading to an S3 outcome.
Exposure E1 – Accidents are very rare events, occurring less than once per year per vehicle.
Controllability C3 – No driver action is possible to control the outcome of this hazard.

Unwanted full braking (around 10 ms-2), not limited in time: S3, E4, C3 => ASIL D
Severity S3 – Driving at high speed on a motorway with heavy traffic.
Exposure E4 – Very common driving scenario.
Controllability C3 – The expected task of the driver of this vehicle is to maintain steering control of
the vehicle as it slows (presumably to a halt). Other traffic participants will be expected to
recognize and react to the sudden braking and stopping of this vehicle.

Unwanted full braking (around 10 ms-2), 1 second duration: S2, E4, C2 => ASIL B
Severity S2 – Driving at high speed on a motorway with heavy traffic. Speed reduction is about
36 km h-1, so the effects are severe rather than life-threatening as per the guidance in Annex A of
the WD.
Exposure E4 – Very common driving scenario.
Controllability C2 – The expected task of the driver is to maintain control and then recover from
the deceleration. Other traffic participants will be expected to recognize and react to the sudden
braking of this vehicle.

“Collision avoidance by braking” function unavailable, without any information to the driver about
this unavailability: S3, E2, C3 => ASIL B
Severity S3 – Almost certain collision with another vehicle. Consider here the worst-case impact
(∆v>40 km h-1) as per the guidance in Annex A of the WD.
Exposure E2 – Chosen as a conservative estimate (there are likely to be a few instances a year
where traffic might slow suddenly on a motorway, and if the vehicle is fitted with CA a driver may
start to rely on it – risk compensation). Hence exposure is based on the frequency of sudden
braking (whether initiated by the driver or by the system) rather than on exposure to the situation
where CA would be required.
Controllability C3 – The expected task is to apply the brakes manually. However, by the time the
driver is required to brake manually it will almost certainly be too late to avoid a collision.

Total loss of service brake function, i.e. not possible for the driver to brake using the brake pedal:
S3, E4, C3 => ASIL D
Severity S3 – Possible collision with another vehicle. Consider here the worst-case impact
(∆v>40 km h-1) as per the guidance in Annex A of the WD.
Exposure E4 – Brakes are required to be operative all the time.
Controllability C3 – The expected task is to use the handbrake, gears and steering to slow the
vehicle and reach a safe location. This requires a high degree of skill. A few drivers may be able to
control the vehicle in favourable circumstances.


B.3.3.3.1.2 Analyst 2

Unwanted deployment of airbag: S3, E4, C2 => ASIL C
Severity S3 – The driver may be so distracted that a life-threatening accident occurs.
Exposure E4 – Situations where life-threatening accidents may occur represent over 10% of the
driving time.
Controllability C2 – 1%-10% of drivers are assumed to be incapable of avoiding a life-threatening
accident.
(No need to investigate S2 and S1 effects, as the ASIL will not be above C anyway.)

Airbag function permanently unavailable, without any information to the driver about this
unavailability: S3, E1, C3 => ASIL A
Severity S3 – The lack of airbag function could lead to life-threatening effects.
Exposure E1 – Collisions are very rare events, occurring less than once per year per vehicle.
Controllability C3 – Once the collision occurs, the driver cannot control the situation.
(No need to investigate S1-S2 effects, as the ASIL is limited to A by the E1 parameter.)

Unwanted full braking (around 10m/s^2), not limited in time: S3, E4, C2 => ASIL C
Severity S3 – Driving at high speed with another vehicle behind.
Exposure E4 – Situations where life-threatening accidents may occur represent over 10% of the
driving time.
Controllability C2 – 1%-10% of drivers are assumed to be incapable of avoiding a life-threatening
accident. (In this case, Controllability refers to the ability of the driver in the vehicle behind.)
(No need to investigate S1-S2 effects, as the ASIL will not be above C anyway.)

Unwanted full braking (around 10m/s^2), 1 second duration: S2, E4, C2 => ASIL B
Severity S2 – Driving at high speed with another vehicle behind. Speed reduction is about
36 km/h, so the effects are severe rather than life-threatening.
Exposure E4 – Situations where severe accidents may occur represent 1%-10% of the driving
time.
Controllability C2 – 1%-10% of drivers are assumed to be incapable of avoiding a severe accident.
(In this case, Controllability refers to the ability of the driver in the vehicle behind.)
(No need to investigate S1 effects, as the ASIL will not be above B anyway.)

‘Collision avoidance by braking’ function unavailable, without any information to the driver about
this unavailability: S3, E1, C3 => ASIL A
Severity S3 – The lack of collision avoidance could lead to life-threatening effects.
Exposure E1 – Collisions are very rare events, occurring less than once per year per vehicle.
Controllability C3 – Once the collision occurs, the driver cannot control the situation.
(No need to investigate S1-S2 effects, as the ASIL is limited to A by the E1 parameter.)

Total loss of service brake function, i.e. not possible for the driver to brake using the brake pedal:
S3, E4, C3 => ASIL D
Severity S3 – The lack of brake function could lead to life-threatening effects.
Exposure E4 – Situations where life-threatening accidents may occur represent over 10% of the
driving time.
Controllability C3 – More than 10% of drivers are assumed to be incapable of avoiding a
life-threatening accident.
(No need to investigate S1-S2 effects, as D is the highest possible ASIL rating.)


B.3.3.3.2 MISRA Risk Graph

The results of applying the MISRA method independently by two different analysts are presented in
the following sections.

B.3.3.3.2.1 Analyst 1

Unwanted deployment of airbag


It is assumed that this is a simple single-stage driver airbag system and that no other systems are
dependent on the correct functioning of the airbag system. This means in particular that the
deployment of the airbag is not used to initiate functions such as cutting off the engine fuel supply
or to send an automatic “mayday” call.
Using the MISRA Risk Graph, the following classification is made:
Severity S2 (maximum of one fatality)
Moving/non-moving Moving – the vehicle is assumed to be driving and the driver’s control of the
safety of the situation must be assessed
Controllability
I: E – It is assumed that no other systems are dependent on the correct functioning of the airbag
system.
A: E – No functionality is lost by any of the systems with which the driver can control the vehicle.
B: E – All systems with which the driver can control the vehicle remain available.
T: A – The driver may be severely distracted or even rendered unconscious by the event, therefore
the reaction time is considered to be much faster than human to reflect this.
Controllability classification: C4

The controllability classification is C4, which is defined as “This relates to failures whose effects
are not controllable by the vehicle occupant(s). The outcome cannot be influenced by a human
response. Avoidance of an accident is usually extremely difficult.” Although the firing of the airbag
does not affect the systems with which the driver controls the vehicle, the driver may be severely
distracted or even rendered unconscious by the event. Thus C4 is an appropriate classification.
Note, the “A” parameter in controllability has been graded as “E”. This is on the basis that no
functionality within the system boundary has been lost with which the driver can control the safety
of the situation following the occurrence of the hazard. Similarly the “B” parameter has been
graded “E” as all the standard vehicle control functions are assumed to be unaffected.
This leads to a hazard classification of R4.
Airbag function unavailable without warning
It should be noted that the hazard is that the airbag function is unavailable (presumably due to
some fault) but that this hazard will not lead to an unwanted occurrence until a demand should be
made to operate the airbag system. Using the MISRA Risk Graph, the following classification is
made:
Severity S2 (maximum of one fatality)
Moving/non-moving Non-moving – this is a hazard associated with a protection function


Exposure F1 exposure to an accident is rare


Possibility to avoid P2 no possibility to avoid hazard
This leads to a hazard classification of R3.
Unwanted full braking, unlimited duration
This hazard could originate either from within the brake system itself, or from a higher-level system
making an incorrect demand for braking (e.g. for an emergency stop) which the brake system then
(correctly) provides. This hazard is considered from the perspective of a stand-alone braking
system for the reasons given above. The analysis could also be considered from the perspective
of a system providing a function such as collision avoidance or emergency stop that commands
the braking system to generate maximum braking demand. In a real-life analysis, such decisions
would be made based on having an appropriate specification for the system being analysed
including a precise definition of the system boundary. A fuller discussion of this subject can be
found in the forthcoming MISRA Safety Analysis guidelines [19].
Using the MISRA Risk Graph, the following classification is made:
Severity S2 (maximum of one fatality)
Moving/non-moving Moving – the vehicle is assumed to be driving and the driver’s control of the
safety of the situation must be assessed

Controllability
I: E – It is assumed that no other systems are dependent on the correct functioning of the
stand-alone braking system.
A: B – It is assumed that steering is unaffected (assuming ABS is functioning) but no longitudinal
control of the vehicle is possible.
B: B/A – It is assumed that steering is unaffected (assuming ABS is functioning) but no
longitudinal control of the vehicle is possible and there are no other systems with which this
control can be effected. Note that if the analysis had been concerned with a hazard such as loss of
engine power, this would have been ranked “C” since steering, brakes, etc. would be unaffected in
the short term.
T: B – The driver will have to react extremely quickly to a very unusual and unexpected situation,
but it should be obvious what needs to be done.
Controllability classification: C3/C4

The controllability classification is C3, which is defined as “This relates to failures whose effects
are not normally controllable by the vehicle occupant(s) but could, under favourable
circumstances, be influenced by an experienced human response. Avoidance of an accident is
usually very difficult.”
It could also be argued that the main outcome of this hazard is that another vehicle collides with
this vehicle. In this case the controllability classification C4 would be more appropriate as there is
nothing the driver of this vehicle can do to prevent this. This would be reflected by choosing the
grade “A” for backups (since no backups are available with which to control the outcome).


If this hazard was alternatively considered from the perspective of an incorrect request for braking
by an emergency brake assist function, then this analysis could also be conducted by treating this
as a failure of a protection system. In this case the following apply:
Severity S2 (maximum of one fatality)
Moving/non-moving Non-moving – this is a hazard associated with a protection function
Exposure F2 the hazard could occur at any time during driving
Possibility to avoid P2 no possibility to avoid hazard (worst case: hit by another car)
This leads to a hazard classification of R4.
Therefore, overall, taking a conservative approach leads to a hazard classification of R4.
Unwanted full braking, 1 s duration
This hazard is again assumed to originate from the brake system itself, rather than from a higher-
level system making an incorrect demand for braking which the brake system then (correctly)
provides. Using the MISRA Risk Graph, the following classification is made:

Severity S2 (maximum of one fatality)


Moving/non-moving Moving – the vehicle is assumed to be driving and the driver’s control of the
safety of the situation must be assessed

Controllability
I: E – It is assumed that no other systems are dependent on the correct functioning of the
stand-alone braking system.
A: B – We assume that steering is unaffected (assuming ABS is functioning) but no longitudinal
control of the vehicle is possible.
B: B – We assume that steering is unaffected (assuming ABS is functioning) but no longitudinal
control of the vehicle is possible and there are no other systems with which this control can be
effected.
T: C/B – The driver will have to react quickly, but provided they are driving to always leave an
“escape route” (as recommended by driving instruction organizations) they should still be able to
steer the vehicle to a safe location.
Controllability classification: C3

The principal distinction between this and the previous hazard is that whilst they both remove all but
the absolute minimum of control from the driver, on this occasion full control is returned after one
second. The final outcome then depends on how well the driver is able to handle this unexpected
and unusual situation.
The controllability classification is C3, which is defined as “This relates to failures whose effects
are not normally controllable by the vehicle occupant(s) but could, under favourable
circumstances, be influenced by an experienced human response. Avoidance of an accident is
usually very difficult.” This is still a reasonable classification since it is necessary to consider the
effect on the driver after the event has occurred – will they be shocked and how quickly would they
recover? Discussions with a vehicle dynamics expert have indicated that for the majority of
drivers, this situation would still be a severe shock and different drivers would react in different
ways to this.
Again a similar argument could be made to the above concerning collision with another vehicle
from behind, although in the case of a short duration event it is more likely that the other driver can
recover, providing there is a suitable/recommended inter-vehicle gap.
This leads to a hazard classification of R3.
“Collision avoidance by braking” function unavailable without warning
Using the MISRA Risk Graph and treating this as a “protection” function the following classification
applies:
Severity S2 (maximum of one fatality)
Moving/non-moving Non-moving – this is a hazard associated with a protection function
Exposure F1 exposure to a situation where it would be needed is rare
Possibility to avoid P2 no possibility to avoid hazard
This leads to a hazard classification of R3.

Alternatively, this could be considered as a moving vehicle hazard. In this case we have:
Severity S2 (maximum of one fatality)
Moving/non-moving Moving – the vehicle is assumed to be driving and the driver’s control of the
safety of the situation must be assessed
Controllability
I: C – The collision avoidance system provides data for the braking system, not only to initiate the
collision avoidance service, but also to say that the service is not required.
A: E – No control over basic vehicle functions has been lost.
B: E – No control over basic vehicle functions has been lost, therefore the driver can use them to
avoid a hazardous situation.
T: A – If the vehicle is already in a situation where the collision avoidance function is required, it is
unlikely that a human could react in the time required to prevent a collision.
Controllability classification: C4

The controllability classification is C4, which is defined as “This relates to failures whose effects
are not controllable by the vehicle occupant(s). The outcome cannot be influenced by a human
response. Avoidance of an accident is usually extremely difficult.” If the vehicle is already in a
situation where the collision avoidance is required, it is unlikely that a human could react in the
time required to prevent a collision. Thus C4 is an appropriate classification.
This leads to a hazard classification of R4.
Note that in practice the correct choice of which part of the risk graph to apply will depend on a
precise definition of the system and the safety envelope, which are not available for the purposes
of this exercise.


Total loss of service brake


This is a moving vehicle hazard. Thus for the hazard classification the following applies:
Severity S2 (maximum of one fatality)
Moving/non-moving Moving – the vehicle is assumed to be driving and the driver’s control of the
safety of the situation must be assessed
Controllability
I: E – It is assumed that no other systems are dependent on the correct functioning of the
stand-alone braking system.
A: C – Partial control over basic vehicle functions has been lost – the driver is unable to control
the longitudinal motion of the vehicle with the service brake. It is assumed that the park brake is
unaffected and can be used when the vehicle is in motion.
B: C – The park brake and transmission can be used to slow the vehicle, and the steering is
unaffected (at least in the short term and assuming it is an independent system from the brakes).
T: A – Once the driver has demanded braking (including a reaction time), and has recognized that
the loss of brakes is real, an extremely fast response is needed to apply the backups.
Controllability classification: C4

The controllability classification is C4, which is defined as “This relates to failures whose effects
are not controllable by the vehicle occupant(s). The outcome cannot be influenced by a human
response. Avoidance of an accident is usually extremely difficult.” The initial reaction of most
drivers to a failure to brake is to press the brake pedal harder. In some circumstances, by the time
the driver has recognized this is not working it may be too late to apply the backups. Thus C4 is
an appropriate classification.
This leads to a hazard classification of R4.

B.3.3.3.2.2 Analyst 2

Hazard                                                                                    S   F/C  P   R
Unwanted deployment of airbag                                                             S2  C3   -   R3
Airbag function permanently unavailable, without any information to the driver
about this unavailability                                                                 S2  F1   P2  R3
Unwanted full braking (around 10 ms-2), not limited in time                               S2  C2   -   R2
Unwanted full braking (around 10 ms-2), 1 second duration                                 S2  C2   -   R2
“Collision avoidance by braking” function unavailable, without any information to
the driver about this unavailability                                                      S2  F1   P2  R3
Total loss of service brake function, i.e. not possible for the driver to brake using
the brake pedal                                                                           S2  C3   -   R3


B.3.3.3.3 EASIS method

The results of applying the proposed EASIS method to the hazards (by "Analyst 2" only) are
presented in the following table.
Unwanted deployment of airbag:
S2: E = 0.5, P = 0.01, E*P = 0.005 => preliminary HC C
S1: E = 0.9, P = 0.1, E*P = 0.09 => preliminary HC C
Final HC: C.

Airbag function permanently unavailable, without any information to the driver about this
unavailability:
S2: E = 0.0005, P = 1, E*P = 0.0005 => preliminary HC B (note: the remainder of the vehicle life
is considered here)
S1: E = 0.005, P = 1, E*P = 0.005 => preliminary HC B (note: the remainder of the vehicle life is
considered here)
Final HC: A. (Note: The lack of airbag function does not cause the accident. Furthermore, the
consequences could be fatal even if the airbag had worked. => reduce HC from B to A.)

Unwanted full braking (around 10m/s^2), not limited in time:
S2: E = 0.5, P = 0.01, E*P = 0.005 => preliminary HC C
S1: E = 0.9, P = 0.1, E*P = 0.09 => preliminary HC C
Final HC: C.

Unwanted full braking (around 10m/s^2), 1 second duration:
S2: E = 0.01, P = 0.005, E*P = 0.00005 => preliminary HC A
S1: E = 0.1, P = 0.01, E*P = 0.001 => preliminary HC B
Final HC: B.

‘Collision avoidance by braking’ function unavailable, without any information to the driver about
this unavailability:
S2: E = 0.0005, P = 1, E*P = 0.0005 => preliminary HC B (note: the remainder of the vehicle life
is considered here)
S1: E = 0.005, P = 1, E*P = 0.005 => preliminary HC B (note: the remainder of the vehicle life is
considered here)
Final HC: A. (Note: The lack of the collision avoidance function does not cause the accident.
Furthermore, collision avoidance cannot today be considered a part of the standard vehicle
equipment. => reduce HC from B to A, or to QM, depending on how the collision avoidance is
marketed.)

Total loss of service brake function, i.e. not possible for the driver to brake using the brake pedal:
S2: E = 0.5, P = 0.2, E*P = 0.1 => preliminary HC D
S1: E = 1.0, P = 0.1, E*P = 0.1 => preliminary HC C
Final HC: D.


B.3.3.4 Summary of results

The following table presents a summary of the results of hazard classification using different
approaches.
Classification
Hazard                                                       ISO1  ISO2  MISRA1  MISRA2  EASIS
Unwanted deployment of airbag                                D     C     R4      R3      C
Airbag function unavailable                                  A     A     R3      R3      A
Unwanted full braking (around 10 ms-2), not limited in time  D     C     R4      R2      C
Unwanted full braking (around 10 ms-2), 1 second duration    B     B     R3      R2      B
“Collision avoidance by braking” function unavailable        B     A     R3      R3      A
Total loss of service brake function                         D     D     R4      R3      D

The results presented above are the result of an exercise for the purposes of the EASIS project
and should not be taken as representative of the hazard classifications that are to be applied to a
production system. For any production system, a hazard analysis has to be carried out starting
from the constraints, system boundary, assumptions, etc. applicable to that system and the vehicle
it will be installed in.

B.3.3.5 Discussion of results

B.3.3.5.1 Mapping between schemes

The ISO approach is viewed by many as the method that will be applied in the future. Hence the
results of the hazard classification benchmarking are evaluated with respect to that method. In
order to make a meaningful comparison, we have restated the results with respect to the
equivalent ISO classifications in the following table:
Classification
Hazard                                                       ISO1  ISO2  MISRA1  MISRA2  EASIS
Unwanted deployment of airbag                                D     C     D       C       C
Airbag function unavailable                                  A     A     C       C       A
Unwanted full braking (around 10 ms-2), not limited in time  D     C     D       B       C
Unwanted full braking (around 10 ms-2), 1 second duration    B     B     C       B       B
“Collision avoidance by braking” function unavailable        B     A     C       C       A
Total loss of service brake function                         D     D     D       C       D


In making this comparison, the following mappings have been assumed. Note that in both cases
these mappings only apply to hazards where the severity has been rated as S2/S3 (ISO), S2
(MISRA) or S2 (EASIS); this applies to all of the hazards benchmarked. In general it is not easy
to compare the hazard classifications like-for-like, as the following severity mappings seem to
apply:

ISO             MISRA   EASIS
S0              (S0)    No equivalent
S1              S1      S1
S2, S3          S2      S2
No equivalent   S3      No equivalent

For comparing MISRA results to ISO results, the following mapping of hazard classifications was
used:

MISRA ISO
NR QM
R1 ASIL A
R2 ASIL B
R3 ASIL C
R4 ASIL D
R5 No equivalent
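Applied mechanically, the mapping above reproduces the restated MISRA columns of the comparison table. The following Python sketch is our own illustration (the dictionary and function names are not part of any standard tooling):

```python
# Hypothetical helper (our naming) that restates MISRA rankings as the
# equivalent ISO classifications, following the mapping table above.
MISRA_TO_ISO = {
    "NR": "QM",
    "R1": "ASIL A",
    "R2": "ASIL B",
    "R3": "ASIL C",
    "R4": "ASIL D",
    "R5": None,  # no ISO equivalent
}

def restate(misra_rank: str) -> str:
    """Return the ISO classification equivalent to a MISRA ranking."""
    iso = MISRA_TO_ISO[misra_rank]
    if iso is None:
        raise ValueError(f"{misra_rank} has no ISO equivalent")
    return iso

# Raw MISRA1/MISRA2 rankings for three of the benchmarked hazards
# (taken from the summary table in section B.3.3.4):
results = {
    "Unwanted deployment of airbag": ("R4", "R3"),
    "Airbag function unavailable": ("R3", "R3"),
    "Total loss of service brake function": ("R4", "R3"),
}
for hazard, (m1, m2) in results.items():
    print(f"{hazard}: {restate(m1)} / {restate(m2)}")
```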

For comparing EASIS results to ISO results, the following mapping of hazard classifications was
used:

EASIS ISO
HC A ASIL A
HC B ASIL B
HC C ASIL C
HC D ASIL D
HC E

The EASIS mapping is explained as follows. In ISO WD26262 E and C are defined as ranges.
For example, E4 means an exposure range > 10% and C2 means that 1%–10% of the drivers
cannot control the damage. The relationship between “ISO E*C” and “EASIS E*C” can therefore be
summarized as follows:

The ISO E*C value (calculated from the ranges represented by E and C), the possible E and C
combinations, the actual values represented by those ranges (expressed as an ISO E*C range),
and the corresponding EASIS E*C ranges:

• ISO E*C = 1: E4, C3 (E4: 0.1–1, C3: 0.1–1); actual range 0.01–1; EASIS E*C ranges
  0.01–0.1 and 0.1–1
• ISO E*C = 0.1: E4, C2 (E4: 0.1–1, C2: 0.01–0.1) or E3, C3 (E3: 0.01–0.1, C3: 0.1–1);
  actual range 0.001–0.1; EASIS E*C ranges 0.001–0.01 and 0.01–0.1
• ISO E*C = 0.01: E4, C1 (E4: 0.1–1, C1: 0.001–0.01) or E3, C2 (E3: 0.01–0.1, C2: 0.01–0.1)
  or E2, C3 (E2: 0.001–0.01, C3: 0.1–1); actual range 0.0001–0.01; EASIS E*C ranges
  0.0001–0.001 and 0.001–0.01
• ISO E*C = 0.001: E3, C1 (E3: 0.01–0.1, C1: 0.001–0.01) or E2, C2 (E2: 0.001–0.01,
  C2: 0.01–0.1) or E1, C3 (E1: 0.0001–0.001, C3: 0.1–1); actual range 0.00001–0.001;
  EASIS E*C ranges 0.00001–0.0001 and 0.0001–0.001
This table shows that there is not a one-to-one mapping between E*C in ISO and the E*C ranges
in EASIS. Instead there is an overlapping relationship, also visible in the earlier table that shows
the HC/ASIL relationship. Note that for an actual Exposure*Controllability value of 0.05, the ISO
approach would lead to an E*C classification of either 1 or 0.1 (see the first two rows above),
depending on how this 0.05 figure is decomposed into E and C factors. The EASIS approach does
not have this overlap possibility, since every actual Exposure*Controllability value is mapped to
exactly one of the intervals from 0.00001–0.0001 to 0.1–1. This is one of the main distinctions of
the EASIS scheme: the discretisation into distinct levels is done after the E*C multiplication, not
before as in ISO WD 26262.
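The distinction can be made concrete with a small sketch (our own illustration; the half-open decade-interval convention (lower, upper] is an assumption for boundary cases). Classifying the same actual Exposure*Controllability value of 0.05 the ISO way, by discretising E and C separately before multiplying, gives different results for different decompositions, while the EASIS way, multiplying first, does not:

```python
import math

def decade_class(value):
    """Upper bound of the decade interval (lower, upper] containing value,
    e.g. 0.05 -> 0.1, 0.5 -> 1.0 (the half-open convention is assumed)."""
    return 10.0 ** math.ceil(math.log10(value))

def iso_ec(e, c):
    """ISO-style: discretise E and C into decade classes first, then multiply."""
    return decade_class(e) * decade_class(c)

def easis_ec(e, c):
    """EASIS-style: multiply the actual values first, then discretise."""
    return decade_class(e * c)

# The same actual Exposure*Controllability = 0.05, decomposed in two ways:
print(iso_ec(0.5, 0.1))   # E4 (0.1-1) x C2 (0.01-0.1) -> class 0.1
print(iso_ec(0.2, 0.25))  # E4 (0.1-1) x C3 (0.1-1)    -> class 1.0
print(easis_ec(0.5, 0.1), easis_ec(0.2, 0.25))  # both land in 0.01-0.1 -> 0.1
```

Both decompositions describe the same actual risk contribution, yet the ISO-style classification differs by a factor of ten depending on the decomposition, while the EASIS-style classification is stable.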

B.3.3.5.2 Comparison of results

For the hazards “Unwanted deployment of airbag” and “Total loss of service brake function” the
results found between the different methods were largely consistent. The difference in rankings
between ASIL C and ASIL D is mainly due to different analyst perspectives on the ability of the
driver to influence the outcome. In a real-life hazard analysis, these results would be subjected to a
more detailed review, including the use of domain experts.
For the unwanted full braking hazards, there was some variability in the results although in general
all the methods classified them as ASIL C or D for unlimited duration and ASIL B or C for duration
limited to one second. Thus it is observed that the methods generally indicate a lower classification
for a shorter duration hazard, in line with expectations.


For the hazard “airbag function unavailable”, the MISRA method classifies this as ASIL C whereas
both the ISO and EASIS approaches classify this as ASIL A. This results from the option in the
MISRA Risk Graph to treat this as a hazard associated with failure of a protection function.
For the hazard “collision avoidance by braking unavailable” the greatest variability in the results
was seen, ranging from ASIL A to ASIL D. The MISRA method led to classifications of ASIL C or D,
whereas ISO and EASIS gave rise to ASIL A and ASIL B. The results from the ISO and EASIS
methods are principally due to the “exposure” rating being chosen as E1 or E2 depending on the
analyst’s viewpoint. This choice of exposure parameter is from the perspective of situations where
collision avoidance might be required, rather than treating it as a hazard present during all driving. In
the MISRA method, if the controllability approach is chosen, the exposure parameter is not
evaluated, as it is considered that hazards associated with the control of a moving vehicle could
occur at any time the vehicle is being driven. Alternatively, the hazard risk for this type of hazard
can be evaluated according to failure to operate on demand.
For the braking-related hazards, in most cases the results obtained using the MISRA approach are
fairly consistent, in that the hazards associated with braking functions are ranked R3 (ISO ASIL C)
or R4 (ISO ASIL D). The classification of the unwanted full braking hazard with 1 second duration
could be argued as R4 from a more conservative viewpoint.
At first sight, the potential for the MISRA approach to classify a hazard such as “collision
avoidance by braking unavailable” at a similar level to complete loss of braking may appear to
show a wide variability between the approaches. However the difference is almost entirely due to
the ISO approach permitting a much wider range of exposure values. In effect, the ISO approach
is arguing for reduced risk reduction requirements based on the low-demand nature of the system
whereas the MISRA approach tends to a more conservative result. Furthermore, it could be
argued that the ISO approach effectively has an “ALARP” type argument built into its exposure
classes (it is argued that the frequency of exposure to the hazard is so small that the required
effort to achieve a higher risk reduction is disproportionate). On the other hand, this step would be
explicitly considered in the “risk analysis” phase of a full MISRA analysis, which has not been done
for the purposes of this exercise. Such a risk analysis could also take into account wider issues,
such as purely commercial considerations like the perception of product quality.
Clearly such issues will need to be addressed in defining the “broadly acceptable” risk associated
with ISS and whether different levels of risk may need to apply depending on the type of system.
This is particularly the case with the expectation that ISS will contribute to changing (improving) the
current level of “broadly acceptable” risk.

B.3.3.6 Conclusions

It was found that in general the three hazard classification approaches gave largely consistent
results. Some variability due to different perspectives of the analysts was observed, but this would
be eliminated in a real-life analysis due to the definition of the safety envelope, system boundary,
etc. which are all unknown for the purposes of this exercise.
The MISRA and EASIS approaches were both found to have some advantages compared to the
ISO approach:
• The MISRA method has the option to use an alternative mechanism for hazard classification
for “protection”-like functions such as collision avoidance and airbag systems, particularly for
failure to operate on demand.
• The EASIS method avoids the possibility of overlaps and anomalies in ASIL allocation that may
occur when using the ISO method.
The widest variability between methods was seen when they were applied to “on demand” functions
that may be required to operate very infrequently. The ISO (and EASIS) approaches tend to consider
these from the perspective of the overall risk reduction required for the vehicle, whereas the MISRA
approach considers them from the perspective of the risk reduction required from the specific system
and tends to a much more conservative estimate.

We can make the following general observations about hazard classification schemes applied to
ISS. A correct approach to hazard classification is first to examine the hazard and the way the
system of concern contributes to it, and then to choose the parameters used for hazard
classification.
Possible ways that the system of concern might contribute to hazards are:
• Hazard risk reduction: In this case a hazard exists even without the system of concern being
  present (a non-system hazard). The system of concern provides a safety function to reduce the
  risk of this hazard. The criticality of the safety function depends on the risk reduction necessary
  for the non-system hazard. The system hazard is the inability to provide the safety function.
• Hazard creation: Here three different cases can be distinguished:
• Hazard creation by an error state or failure of the system of concern
• Hazard creation by a dangerous function of the system of concern
• Hazard creation by a non-functional interaction of the system of concern with its
environment.
Furthermore, the created hazards can be assigned to one of two categories depending on their
possible effects:
• Hazards that have an effect on the controllability of the moving vehicle by the driver
• Hazards that have further effects
Note that some hazards might fall into both categories and thus have to be considered from both
perspectives. The following diagram clarifies this view:

[Diagram: if the system of concern provides a safety function for risk reduction, the related system
hazard is the inability to provide that function; the associated non-system hazard, with risk
R = f(Frequency, Severity) after any risk reduction by other systems, is classified with Classification
Scheme I. If the system of concern creates a hazard, then, given hazard activation and hazard
exposure, controllability-related effects are classified with Classification Scheme II and further
effects with Classification Scheme III.]
Depending on the system of concern's hazard contribution and the hazard effect, three different
classification schemes might be suitable, with different parameter sets:
• Classification Scheme I: HC = f (Non-System Hazard Risk) = f (Severity, Frequency)
• Classification Scheme II: HC = f (Exposure, Controllability, Severity)
• Classification Scheme III: HC = f (Exposure, Possibility to avoid, Severity)
The following points should be noted:
• It may be possible to omit the severity category in classification scheme II. The only possible
distinction may be between light and heavy vehicles, since we are always considering a
possible crash and the number of victims might vary depending on whether it is a light or heavy
vehicle that gets out of control.


• In classification scheme III a finer granularity of the severity parameter might be suitable.
• The possibility to avoid in classification scheme III should take into account factors such as the
  P and the W parameters from the IEC 61508 Part 5 Annex D risk graph, i.e. the possibility
  that a human can avoid the consequences of an activated hazard and the possibility that the
  activated hazard might not necessarily lead to damage or harm.
• The ISO/WD 26262 and EASIS approaches can be used for classification schemes II and III by
  careful interpretation of the parameters. They cannot be used for classification scheme I, even if
  the parameters exposure and controllability are not estimated separately and the product E*C is
  instead replaced by a parameter F (Frequency). The reason is that the product E*C is a
  conditional probability and not the frequency needed in classification scheme I.
• The MISRA approach distinguishes between classification schemes II and III but does not fully
  account for classification scheme I (it can be applied to such hazards, but a classification
  scheme depending only on severity and frequency has not been developed). For classification
  scheme II, the MISRA approach does not at present include the exposure parameter, which can
  lead to a different result for some hazards (e.g. “unavailability of the collision avoidance
  function”) if this scheme is used to classify such hazards.
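The last point, that E*C is a conditional probability rather than a frequency, can be illustrated numerically. All figures below are invented for illustration only:

```python
# Invented illustration figures:
failure_rate_per_hour = 1e-6   # rate of the hazard-creating system failure
exposure = 0.01                # E: fraction of driving time spent in the relevant situation
controllability = 0.1          # C: fraction of drivers unable to control the outcome

# Schemes II/III combine conditional probabilities; the product is still a
# dimensionless probability and says nothing about how often the hazard occurs:
e_times_c = exposure * controllability                        # ~1e-3

# Scheme I needs an actual frequency, which only appears once a rate
# (here the failure rate per hour) enters the calculation:
accident_rate_per_hour = failure_rate_per_hour * e_times_c    # ~1e-9 per hour

print(e_times_c, accident_rate_per_hour)
```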



Deliverable D3.2 Part 1 – Appendix C

Hazard occurrence analysis

Version number: 2.0

Date of preparation: 14.11.2006

© 2006 The EASIS Consortium



Table of contents

C.1 Introduction and objective........................................................................... C-1


C.2 Role of Hazard Occurrence Analysis in the development life cycle ........... C-2
C.2.1 Hazard Occurrence Analysis in the Early phases ............................. C-3
C.2.2 Hazard Occurrence Analysis in the Intermediate phases ................. C-4
C.2.3 Hazard Occurrence Analysis in the Final phases ............................. C-5
C.3 Description of the hazard cause types ....................................................... C-7
C.3.1 Concept of fault, error and failure...................................................... C-7
C.3.2 Concept of a fault model ................................................................... C-7
C.3.3 Possible generalization of "fault" to "unexpected event leading to
undesirable system behaviour" ......................................................... C-8
C.4 Description of the investigation methods.................................................. C-11
C.4.1 Qualitative and Quantitative Analysis.............................................. C-11
C.4.2 Inductive, Deductive and Exploratory reasoning ............................. C-11
C.4.3 Description of common techniques of analysis ............................... C-12
C.4.3.1 Failure mode and effects analysis (FMEA) .............................. C-12
C.4.3.2 Fault tree analysis (FTA) .......................................................... C-15
C.4.3.3 Hazard and operability study (HAZOP) .................................... C-17
C.4.4 Other techniques of analysis for static systems .............................. C-19
C.4.4.1 Event tree analysis (ETA)......................................................... C-19
C.4.4.2 Cause-Consequence Analysis ................................................. C-20
C.4.4.3 Discussion and Conclusion ...................................................... C-20
C.4.5 Techniques of analysis for dynamic systems .................................. C-21
C.4.5.1 Markov Modelling ..................................................................... C-21
C.4.5.2 Dynamic Event Tree Analysis .................................................. C-22
C.4.5.3 Discussion and Conclusion ...................................................... C-22
C.5 Investigation of dependent failures........................................................... C-23
C.5.1 Definitions ............................................................................... C-23
C.5.1.1 Dependent failures ................................................................... C-24
C.5.1.2 Common cause/mode failures in automotive systems ............. C-24
C.5.1.3 Cascade failure ...................................................................... C-25
C.5.1.4 Dependent failures and safety barriers .................................... C-26
C.5.1.5 Root Causes and Coupling Factors ......................................... C-27
C.5.2 Kinds of dependent failure causes .................................................. C-28
C.5.3 Methods for identifying and analyzing dependent failures .............. C-28
C.5.3.1 FTA for dependent failure analysis .......................................... C-28
C.5.3.2 FMEA for dependent failure analysis ....................................... C-28


C.5.3.3 SAE-ARP 4754 and SAE-ARP 4761........................................ C-29


C.5.3.4 IEC 61508 ...................................................................... C-30
C.6 Investigation of software failures .............................................................. C-32
C.6.1 Describing Software FMEA Techniques and Different Levels of
Analysis ............................................................................... C-32
C.6.1.1 Software FMEA Compared to System Level FMEA that
includes SW Architecture ......................................................... C-33
C.6.2 The Different FMEA Methods and Formats..................................... C-33
C.6.2.1 Process FMEA applied to Software.......................................... C-34
C.6.2.2 Component FMEA applied to Software .................................... C-35
C.6.2.3 Safety FMEA applied to Software ............................................ C-35
C.6.3 Conclusion on application of FMEA to software.............................. C-36
C.6.4 Software Fault Tree Analysis .......................................................... C-36
C.6.4.1 Software for automotive applications ....................................... C-36
C.6.4.2 The FTA Approach ................................................................... C-37
C.6.4.3 Calibration ...................................................................... C-37
C.6.4.4 Conclusions on software FTA .................................................. C-37
C.6.5 System Software Block Failure Analysis ......................................... C-38
C.6.5.1 Overview ...................................................................... C-38
C.6.5.2 Process description.................................................................. C-38
C.6.5.3 Advantages and disadvantages of the method ........................ C-39
C.7 References ...................................................................................... C-40


C.1 Introduction and objective

According to Appendix A, the EASIS dependability framework is based on the concept of a hazard.
More specifically, a hazard in this framework is taken to mean the following:
A hazard is an undesirable condition or state of a vehicle that could lead to an undesirable
outcome depending on other factors that can influence the outcome.
The objective of this Appendix is to provide guidance for the identification and investigation of the
causal relationships between potential faults in the system of concern and the resulting hazards.
Typically, for highly integrated and complex systems, these relationships involve several
“intermediate” failure conditions in different system layers.
The most relevant topics related to the activities addressed by this Appendix are the following:
• Role of the hazard occurrence analysis in different stages of the system lifecycle: The
  general approach and the outputs of a hazard occurrence analysis depend on the system
  development phase in which the analyses are performed. This topic is addressed in section C.2.
• Description of the hazard cause types: The ever-increasing complexity of automotive
  systems leads to a large variety of potential failure conditions. The main hazard causes
  considered relevant for automotive electronic systems are addressed in section C.3.
• Description of the investigation methods: There are differences between analysis
  techniques in terms of how they explore the relationship between cause and effect (induction,
  deduction, exploratory), the type of results expected (qualitative, quantitative) and the
  presentation of the results (graphical, tabular, etc.). This topic is addressed in section C.4.
• Investigation of dependent failures: Future generations of automotive systems are expected
  to be increasingly based on redundant and standardized architectures that are able to operate
  despite failures of a limited number of their hardware or software components. In such
  architectures, the functional independence between the redundant units is of particular
  importance. This topic is addressed in section C.5.
• Investigation of software failures: The investigation of the relationship between design faults
  in the software and resulting hazards is a complex issue. It is only partially approachable by
  traditional methods used for hardware analyses. This topic is addressed in section C.6.

The concept of hazard occurrence analysis is nothing new. Investigations of the relationships
between faults and failures (or hazards) have always been, and will always be, part of any
system development undertaking. For this reason, this appendix does not attempt to cover the
entire breadth and depth of hazard occurrence analysis. For example, no survey of available
tools for FMEA, FTA and other methods is reported. Instead, this appendix provides an overview
of hazard occurrence analysis methods and focuses on some issues of particular relevance for
the development of automotive control systems in general and Integrated Safety Systems in
particular.


C.2 Role of Hazard Occurrence Analysis in the development life cycle

The Dependability Activity Framework that forms the basis for Part 1 of this EASIS deliverable is
illustrated in Figure C.1 and described in more detail in Appendix A. It can be seen that the hazard
occurrence analysis supports several other activities in the system development process:
• Hazard Identification (see Appendix B)
• Establishment of dependability-related requirements (see Appendix D)
• Safety Case construction (see Appendix E)

[Diagram showing the dependability activities: identification of hazards, classification of hazards,
hazard occurrence analysis, establishment of dependability-related requirements, development and
design of the integrated safety system, verification and validation of dependability-related
requirements, and safety case construction.]

Figure C.1 Dependability Activity Framework


The higher the hazard criticality (as determined by the Hazard Classification), the more rigour is
needed in the Hazard Occurrence Analysis. Furthermore, the overall approach and the level of
detail of the analysis are determined by the level of detail of the available system information and,
hence, by the system development phase in which the analysis is made.


Considering the overall system development process sketched in Figure C.2, it is possible to
identify three main phases in which the approach used to perform the hazard occurrence analysis
may be different: early, intermediate and later system development phases.

[Diagram: the system development process (requirements, design, test) spans the Concept
Development, Preliminary Design and Engineering Development phases. The system definition
evolves from preliminary (functions, requirements) through detailed (functions, architecture,
requirements) to subsystem definition (function allocation, subsystem interaction, derived
requirements). Early, Intermediate and Later Hazard Occurrence Analyses are performed in the
corresponding early, intermediate and later phases.]

Figure C.2 Hazard Occurrence Analysis in the System Development Process

C.2.1 Hazard Occurrence Analysis in the Early phases

In early phases of the development, the hazard occurrence analysis mainly supports the hazard
identification and classification activities (see Appendix B) and the extraction of dependability
requirements (see Appendix D) by providing a clear view of how each identified hazard is related
to its potential causes.
The following information is typically available for hazard occurrence analysis in the early phases
of the design process:
• The conceptual architecture of the system
• The list of the system high-level functions
• The list of system hazards
Two main objectives can be achieved by the hazard occurrence analysis:
• Completeness of the hazards list. Starting from the system description, analysis based on
  inductive reasoning methods (section C.4.2) like FMEA allows the identification of the system
  root causes of hazards. In this context, the hazard occurrence analysis can be used as input to
  the “hazard identification” activity to complete the hazards list.


• Investigation of the system’s causal relationships. Starting from each identified hazard,
  analysis based on deductive reasoning methods (section C.4.2) like FTA allows investigation of
  how the system root causes combine and propagate across the system components to produce
  the specific hazard. In this context, the hazard occurrence analysis can be used as input to the
  “dependability-related requirements” activity.
In the early phases of the design process, the hazard occurrence analysis is typically performed in
a qualitative form. Thus, the causal relationships between hazards and their causes are
investigated, not the actual probabilities or rates of the hazards.
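The two reasoning directions can be pictured on a small cause-effect graph. The graph below is entirely made up for illustration; inductive analysis (FMEA-like) walks forward from a root cause to the effects it can reach, while deductive analysis (FTA-like) walks backward from a hazard to its potential root causes:

```python
# Made-up cause-effect edges (fault -> effect); hazards are the sinks.
edges = [
    ("sensor_fault", "wrong_speed_signal"),
    ("sw_defect", "wrong_speed_signal"),
    ("wrong_speed_signal", "unwanted_braking"),  # vehicle-level hazard
    ("valve_stuck", "no_braking"),               # vehicle-level hazard
]

def forward(node):
    """Inductive (FMEA-like): all effects reachable from a root cause."""
    out = set()
    for cause, effect in edges:
        if cause == node:
            out |= {effect} | forward(effect)
    return out

def backward(node):
    """Deductive (FTA-like): all causes that can lead to a given hazard."""
    out = set()
    for cause, effect in edges:
        if effect == node:
            out |= {cause} | backward(cause)
    return out

print(sorted(forward("sensor_fault")))       # effects of one root cause
print(sorted(backward("unwanted_braking")))  # root causes of one hazard
```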

C.2.2 Hazard Occurrence Analysis in the Intermediate phases

In the intermediate phases of the development process, the hazard occurrence analysis mainly
supports the establishment of dependability-related requirements (see Appendix D) by providing a
clear view of how each identified hazard is related to its potential causes and, when possible, an
estimation of the hazard occurrence probability.
The following information is typically available for hazard occurrence analysis in the intermediate
phases of the design process:
• Detailed descriptions of the system and its subsystems in terms of HW and SW components
• The list of the system primary functions with their allocation to the components
• The list of system hazards

The hazard occurrence analysis is typically split between the vehicle manufacturer and the
involved suppliers, so that the OEM investigates how system/subsystem failures lead to vehicle-
level hazards, while the suppliers investigate how component faults may lead to system/subsystem
failures. However, the nature of this responsibility split depends heavily on the type of system
considered, so it is not possible to define a preferred approach for how to assign these
responsibilities.

As the main objective of the hazard occurrence analysis is to determine how the system root
causes can lead to the identified hazards, analysis based on deductive reasoning methods
(section C.4.2) is recommended.

For automotive systems, Fault Tree Analysis is particularly useful during the intermediate phases
of a system development process. Starting from an undesired top level hazard (output from
hazard identification activities, see Appendix B), a Fault Tree Analysis systematically determines
all credible single faults and failure combinations of the system functional blocks at the next lower
level which could cause this hazard event. The analysis proceeds down through successively more
detailed (i.e., lower) levels of the system (subsystems and components) until a root cause is
identified. FTA is discussed in more detail in section C.4.3.2.

The main goals achievable by using a qualitative FTA in the hazard occurrence analysis are:
• Investigation of single and multiple-fault effects
• Investigation of dependent failure sources (common cause analysis)
• Investigation of fail-safe design attributes (fault-tolerant and error-tolerant).
These “outputs” of the hazard occurrence analysis are the starting point for the next activities of
dependability-related requirements specification (see Appendix D). In this context, the advantages
coming from FTA application are:
• Clear evidence of architectural features and dependability functional requirements such
as redundancies, independence, diversity etc.


• Systematic propagation of dependability functional requirements to lower breakdown level
  components of the system.
• Systematic propagation of development process control (system integrity level) to lower
  breakdown level components.
• Evaluation of design modifications with regard to their impact on dependability (mainly safety,
  reliability and availability).
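Qualitatively, the investigation of single and multiple fault effects and of dependent failure sources amounts to computing the minimal cut sets of the fault tree. The following sketch uses a made-up two-channel example with a shared power supply; it assumes AND/OR gates only and is not intended as production FTA tooling:

```python
from itertools import product

# Fault tree nodes: ("basic", name), ("or", [children]), ("and", [children]).
def cut_sets(node):
    """Return the cut sets of a node as a set of frozensets of basic events."""
    kind = node[0]
    if kind == "basic":
        return {frozenset([node[1]])}
    child_sets = [cut_sets(child) for child in node[1]]
    if kind == "or":
        # Any child's cut set causes the gate's output event.
        return set().union(*child_sets)
    # "and": combine one cut set from each child.
    return {frozenset().union(*combo) for combo in product(*child_sets)}

def minimal(sets):
    """Drop any cut set that is a strict superset of another one."""
    return {s for s in sets if not any(t < s for t in sets)}

# Made-up example: the hazard occurs if both redundant channels fail, and
# each channel fails on its own sensor fault OR a shared power-supply fault.
tree = ("and", [
    ("or", [("basic", "sensor_1"), ("basic", "power")]),
    ("or", [("basic", "sensor_2"), ("basic", "power")]),
])
for cs in sorted(minimal(cut_sets(tree)), key=sorted):
    print(sorted(cs))
```

The shared power supply shows up as a single-point (common cause) cut set, while the independent sensor faults cause the hazard only in combination; exactly the kind of result that motivates the dependent failure analysis of section C.5.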
In some cases a quantification of the hazard occurrence (probability or frequency) may be
requested for the evaluation of exposure intervals, latency, and “at-risk” intervals with regard to
their overall impact on the system, or for complying with specific dependability requirements.
In such cases the hazard occurrence analysis shall support these activities (addressed in
Appendix B and Appendix D respectively) by providing a quantification of the probability of
hazard occurrence.
With respect to the quantitative analysis in the automotive sector, it is important to note the
following:
• Due to very large production volumes and diverse applications, precise collection of
failure data from the field is very difficult and expensive to achieve.
• Unlike in other fields such as aerospace and railway, specific dependability
requirements (in terms of a maximum acceptable probability, or rate, of occurrence) are
not yet addressed by international standards or certification authority regulations.
However, the upcoming ISO 26262 standard [29] may include guidance on this issue.
• It is recognized that only with respect to random hardware faults will it be possible to
estimate the actual hazard occurrence rates in a given system. Contributions to system
hazards coming from systematic hardware failures (i.e. faults introduced by humans
during the specification, design, manufacturing and installation of HW components) and
from software faults are generally not quantifiable.
• Quantitative hazard occurrence analysis is not an easy task even for random faults as there
are typically many different mechanisms implemented at different system levels for
preventing faults from leading to specific hazards. It is usually extremely difficult to estimate
the probability that a particular fault will propagate through all these mechanisms and
thereby cause the hazard. Estimation of fault rates might be relatively easy, but the
combined coverage of a large number of dependability mechanisms is much more difficult
to investigate.
Quantitative hazard occurrence analysis is therefore a difficult task which should be limited to
safety-related and confined applications.
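The combined-coverage difficulty described above can be made concrete with a first-order sketch. Assuming, purely for illustration, that the detection and containment mechanisms act independently (which is rarely strictly true), each stopping a fraction c_i of faults, the residual hazard rate is the fault rate multiplied by the product of the escape probabilities:

```python
def hazard_rate(fault_rate_per_hour, coverages):
    """First-order hazard occurrence rate: a fault causes the hazard
    only if it escapes every mechanism (coverage c_i = fraction of
    faults the mechanism stops). Assumes independent mechanisms."""
    escape = 1.0
    for c in coverages:
        escape *= (1.0 - c)
    return fault_rate_per_hour * escape

# Illustrative numbers: fault rate 1e-5/h, three mechanisms with
# 90 %, 99 % and 50 % coverage -> residual rate 5e-9/h
rate = hazard_rate(1e-5, [0.90, 0.99, 0.50])
```

Since real mechanisms are rarely independent (common-cause failures), such an estimate tends to be optimistic; it shows the structure of the calculation rather than a defensible number.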

C.2.3 Hazard Occurrence Analysis in the Final Phases

In the final phases of the development process, the hazard occurrence analysis focuses on
supporting the verification of dependability-related requirements (see Appendix D) and on
documenting the dependability activities performed on the system (see Appendix E).
The analyses performed in the previous phases of the system development process are re-used
and refined in the context of the verification process to check that the implemented design meets
its dependability requirements.
As an example, let us consider a quantitative FTA conducted on a specific system hazard
identified during the early or intermediate phases of the development process. The outputs
expected from the analysis are:
• Quantification of the hazard probability of occurrence
• Allocation of probability budgets to lower-level components

• Evaluation of exposure intervals, latency, and “at-risk” intervals with regard to their
overall impact on the system.
In the context of requirements verification, the same FTA is used for the following purposes:
• To confirm the qualitative assessments with reference to the implemented detailed
design of the system and its components (subsystems/items as applicable)
• To check the budget probabilities allocated to the lower level subsystems and root
causes against their actual failure occurrence coming from failure rate historical data of
similar equipment already in field use, reliability analysis, tests or laboratory data
• To evaluate the quantitative importance of root causes with respect to the top level
hazard occurrence.
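The budget-checking purpose in the second bullet amounts to comparing each allocated probability budget with the value actually observed in the field or in tests. A minimal sketch, in which the component names and numbers are invented for illustration:

```python
def check_budgets(allocated, observed):
    """Compare each component's allocated failure-probability budget
    with its observed (field/test) value and return the violations.
    Both dicts map component name -> failure probability per operating
    hour (illustrative units)."""
    return {name: (observed[name], budget)
            for name, budget in allocated.items()
            if observed.get(name, 0.0) > budget}

allocated = {"sensor": 1e-7, "ecu": 1e-8}   # illustrative budgets
observed  = {"sensor": 5e-8, "ecu": 3e-8}   # illustrative field data
violations = check_budgets(allocated, observed)  # only 'ecu' exceeds budget
```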

For automotive systems, it is common practice to perform a (qualitative) FMEA in order to
document the dependability activities performed on the system.

The FMEA results are collected into forms that typically include fields for the following
information:
• Identification of component, signal, and/or function
• Failure modes and associated hardware failure rates (numerical or categorical)
• Failure effects (directly and/or at the next higher level)
• Detectability and means of detection
• Compensating actions (i.e. automatic or manual)
• Operating phase in which the failure occurs
• Severity of failure effects

C.3 Description of the hazard cause types

C.3.1 Concept of fault, error and failure

A concept for the distinction between fault, error and failure has already been proposed in [21]. As
this concept is widely used within the fault tolerance community, we have adopted it for our work in
EASIS.
In order to understand the difference between the three terms, it is important to take into account
the notion of “reliability”, which according to [21] is considered a “measure of the success with
which the system conforms to some authoritative specification of its behaviour”.
• a failure is a situation in which “the behaviour of a system deviates from the specification”
• an error is that part of the system state which is incorrect and may thus lead to a failure
• a fault is the adjudged cause for an error

This leads to a logical chain which is depicted in Figure C.3:

fault → error → failure

Figure C. 3 Concept of fault, error and failure

A short example follows in order to clarify this relationship: In the sense of the terminology which
has been described above, an open circuit is a typical example of a fault. This fault may lead to a
wrong value of a program variable, which – being part of an incorrect system state – is an error.
This error may cause e. g. a steering system not to react to user input in a correct way, which
clearly would be a failure as this behaviour most likely deviates from the specification.
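The chain above can be made concrete with a toy sketch. The sensor values and the threshold are invented for illustration; the point is only where the fault, the error and the failure appear:

```python
# Toy illustration of the fault -> error -> failure chain:
# an open circuit (fault) yields a wrong variable value (error),
# which surfaces as behaviour deviating from the spec (failure).

def read_sensor(open_circuit):
    # Fault: an open circuit makes the input read 0 instead of the
    # true value (fixed at 42 here purely for illustration).
    return 0 if open_circuit else 42

def control_step(open_circuit):
    value = read_sensor(open_circuit)   # error: incorrect system state if faulty
    return value > 10                   # spec: must return True (actuator engaged)

assert control_step(open_circuit=False) is True   # correct behaviour
assert control_step(open_circuit=True) is False   # failure: deviates from spec
```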

C.3.2 Concept of a fault model

As described in [22], a fault model describes the possible faults of a system regarding both
structure (fault location) and function (type of wrong behaviour).
Examples of fault locations are CPU processing core, RAM, ROM, sensor, actuator, power supply
and communication network.
Examples of wrong behaviour are omitted output, wrong output, delayed output, an output being
transmitted two or more times to the same recipient or the transmission of an output to a wrong
recipient. In most of these examples, the "output" could be a hardwired signal or a signal
transmitted on a communication link, but some examples are only applicable for communication
signals.
Even correct output has to be considered, as a faulty component can deliver correct results in
between faulty ones. In order to structure fault types, distinctions can be made between timing and
value faults, detectable and non-detectable faults, as well as Byzantine and non-Byzantine faults. A
more detailed distinction is made in the section on fault types in EASIS Deliverable D1.2 [23].

A postulation on fault occurrence and duration may also be part of a fault model leading to a
classification into transient (i.e. temporary fault that does not reoccur within a defined time period),
intermittent (i.e. alternating between being present and not present) and permanent (i.e. constantly
present) faults.

C.3.3 Possible generalization of "fault" to "unexpected event leading to undesirable system behaviour"

Discussions during several work package-meetings made it clear that a common view of terms or
concepts is hard to establish as people from different background use terms differently. One
suggestion was a generalization of the term “fault” to also include “unexpected events” as long as
they lead to “undesirable system behaviour”. Several ideas of what possibly could be regarded as
a fault have been brought up, among them faulty specifications but also e. g. intentional and
malicious interaction with a system.
This chapter proposes a life-cycle model based on [22] to arrange all suggested terms in a broader
concept in order to achieve a common understanding.
A typical life-cycle of a technical system consists of the phases
• Design
• Production and
• Operation.

The design phase comprises specification, implementation and documentation; the production
phase includes both the manufacturing of hardware and the compilation of software. During the
operation phase both operating in a narrow sense as well as maintenance have to be considered.
This is illustrated in Figure C.4.

[Figure: Design (Specification, Implementation, Documentation); Production (Hardware,
Software); Operation (Operating, Maintenance)]
Figure C. 4 Simplified life-cycle

During any of these phases, faults may occur. Regarding the design phase, specification (e.g.
wrong definition of operating conditions), implementation (e.g. software bugs which occur despite
a correct specification) and documentation (which may be misleading and thus cause wrong
usage) are all subject to faults.
As far as the production phase is concerned, both faults which arise during the manufacturing of
the hardware and e. g. wrong compilation of software have to be taken into account.

In the operation phase, a further distinction between faults caused by disturbances (e.g.
mechanical or electrical influence), wear out or physical influence can be made. Handling faults
and maintenance faults also fall into this category.
Based on this model, all proposals from project partners (numbered i.-x.) that have been made
during the discussions can now be considered.
i. Specification errors
As already described, these errors refer to the transition of user requirements to a system
specification and can well be incorporated into the above-mentioned model.
ii. External faults combined with an inability of the system to cope with them
Here it is important to distinguish between those faults which shall be tolerated according to
the system specification and those faults which the system is, again according to its
specification, not required to cope with. Of course, though in practice not all situations can
ever be taken into account, any safety-critical system should be designed in a way that is
as robust as reasonably possible.
iii. Human-made software faults (including mistakes made at any stage between
specification and coding)
These faults count as “implementation faults”. One possible countermeasure is diverse
software development.
iv. Faults in development tools (e.g. code generators and compilers)
Although there is little possibility to foresee such faults, they can lead to subsequent faults
during the “production”-phase.
v. Hardware design faults
According to the abovementioned model, these faults refer to specification, implementation
or documentation of hardware.
vi. Hardware manufacturing faults
Such faults fall into the “production”-phase, as described above.
vii. Permanent hardware faults caused by wear-out and/or environmental stress
(temperature, humidity, vibration, etc)
Typically, the specification describes requirements regarding e. g. minimal/maximal
operating temperature. Depending on whether this specification is inappropriate or whether
the specification has not been implemented correctly, the initial cause may be attributed to
different phases. However, in any case the resulting breakdown happens during the
“operation”-phase.
viii. Transient hardware faults caused by electromagnetic interference (radiated
interference, supply voltage anomalies) and other types of radiation
As in vii., this topic may again be related to the specification but is usually attributed to the
“operation”-phase.
ix. System-external conditions affecting the inputs to the system (for example scenarios
that prevent sensors from correctly measuring physical quantities)
In contrast to vii., these “system-external conditions” do not necessarily lead to permanent
hardware faults. For example fog can lead to misreading of a sensor value. However, it is
still a question of the design of a system to take into account all conditions under which the
system shall operate in a correct manner.
x. Intentional and malicious interaction with the system
These security problems play a role when telematics functions shall provide input for other
functions or even if they just share the same resources which may be possible in Integrated
Safety Systems. The topic of security may be important for the gateway-subtask, but is not
addressed here.
In conclusion, most of the proposals that have been made can be subsumed under the life-cycle
concept described above. However, the term "unexpected event" is arguably no real generalization,
as faults do not necessarily have to be unexpected. Conversely, a fault does not always lead to
undesirable behaviour, as it may be masked by fault-tolerance mechanisms. Referring to section
C.3.1, it therefore seems most reasonable to retain the meaning of fault as the "adjudged cause for
an error", because the overall concept then remains clearer.

C.4 Description of the investigation methods

Different analysis techniques may usefully be selected depending on how they explore the
relationship between causes and effects within the system of concern, the type of results expected
(i.e. qualitative and/or quantitative), and their applicability to the analysis of static and dynamic
systems respectively.

C.4.1 Qualitative and Quantitative Analysis

The hazard occurrence analysis can be performed either in a qualitative or in a quantitative form,
depending on its intended scope.
In a qualitative analysis, the causal relationships leading to a hazard are investigated, while in a
quantitative analysis the results of the qualitative analysis are used to calculate probabilities of
hazard occurrence, using the failure probabilities of lower-level components as input data to this
calculation.

C.4.2 Inductive, Deductive and Exploratory reasoning

Inductive reasoning, in its general meaning, works by moving from specific observations to
broader generalizations and theories. Informally, we sometimes call this a "bottom-up" approach.
In inductive reasoning, we begin with specific observations and measures, begin to detect patterns
and regularities, formulate some tentative hypotheses that we can explore, and finally end up
developing some general conclusions or theories.

[Figure: bottom-up chain: Observation → Pattern → Tentative Hypothesis → Theory]
The inductive method forms the basis for analyses such as Failure Mode and Effects Analysis
(FMEA), further discussed in section C.4.3.1. By this technique the cause-effect relationship
leading to a system hazard is investigated starting from known hazard causes and exploring the
possible consequences (effects).

Deductive reasoning, in its general meaning, works the other way, from the more general to the
more specific. Sometimes this is informally called a "top-down" approach. We might begin by
thinking up a theory about our topic of interest. We then narrow that down into more specific
hypotheses that we can test, and narrow down even further when we collect observations to
address the hypotheses. This ultimately leads us to be able to test the hypotheses with specific
data, providing a confirmation (or not) of our original theories.

[Figure: top-down chain: Theory → Hypothesis → Observation → Confirmation]
If inductive analysis details what can happen, deductive analysis informs how it happens. An
example of a deductive method would be Fault Tree Analysis (FTA), further discussed in section
C.4.3.2. By this technique, the cause-effect relationship leading to a system hazard is investigated
starting from its known effect on the system and deducing all the possible causes.
Exploratory reasoning is a third method, used to link unknown causes to unknown effects by
discovering or inventing a hypothesis to explain a novel phenomenon. An exploratory analysis may
be structure generating, model generating, or hypothesis generating. This is a form of
explanation-based reasoning in which there is no ready-to-hand set of knowledge structures from
which to construct an explanation, so the analyst must search for the appropriate knowledge.
A typical hazard occurrence analysis method based on exploratory reasoning is the HAZard and
OPerability study (HAZOP), further discussed in section C.4.3.3. In this case the cause-effect
relationship leading to a system hazard is investigated starting from a specified deviation and
exploring both possible causes and possible effects.
The following Figure C.5 (adapted from [11]) summarizes the different approaches to analysing the
cause-effect relationship using three common techniques based respectively on inductive,
deductive and exploratory reasoning. These three techniques are to some extent complementary
and can often be used together.

[Figure: FMEA starts with a single cause and explores possible consequences; FTA starts with a
single consequence and explores possible causes; HAZOP starts with a single deviation and
explores both possible causes and possible consequences]

Figure C.5 Graphical comparison between inductive, deductive and exploratory reasoning.

C.4.3 Description of common techniques of analysis

In this section the most common analysis techniques used for investigating hazard
occurrence in static systems are discussed. These techniques do not provide real-time information
on whether the conditions in a system are becoming hazardous, which may finally lead to an
accident or injury. They are not applicable to dynamic systems where temporal issues need to be
considered.
An overview of analysis techniques for dynamic systems is given in section C.4.5.

C.4.3.1 Failure mode and effects analysis (FMEA)

Failure mode and effects analysis (FMEA) is an example of an inductive technique, as it starts from
known causes and explores possible consequences.
FMEA originated in the aerospace industry. However, it is now an accepted technique in many
other sectors, including military, rail and automotive. Generic standards [6] are available giving
guidance on FMEA, as well as a number of sector-specific standards or guidelines ([7], [8]),
including the SAE J1739 recommended practice [9] for surface vehicles.
Broadly speaking, the FMEA process can be split into the following distinct steps:
• The components of the system are identified
• The possible failure modes of the system's components are identified
• The effects of these failures on the system are evaluated
• The effects on the system are prioritized according to a number of pre-defined rules
• Design changes are proposed in order to reduce, or eliminate, the effect of the fault on
the system
• The analysis is iterated to confirm that the design changes have had a suitable impact
on the effect of the fault.
More specifically, these steps may be described as follows:
• The system components may be defined at a high abstraction level (e.g. electronic unit)
or at a more detailed level (e.g. resistor, capacitor). Thus, the FMEA technique may be
used in early as well as late design stages if appropriate abstraction levels are selected.
• The potential faults of all the components of the system are identified and listed. As the
number of components within a system increases, the number of failure modes will
typically increase as well.
• The effects of each failure mode are evaluated in turn. A single fault may give rise to
more than one effect. Each of these cases should be listed separately in the FMEA. In
general, a single fault is considered in isolation; multiple faults are not usually
considered due to the effort involved, even though a single fault may lead to further
faults in a cascade effect.
• Each failure mode and effect combination is ranked according to its severity and
likelihood of occurrence. In the most basic form of FMEA the rankings are by
classification (e.g. “catastrophic”, “critical”, “major”, “minor”) although the classifications
may be application-dependent.
• In a variant of FMEA, FMECA (failure mode, effects and criticality analysis), the criticality
of the effect is also evaluated. Again in its most basic form this is normally done with a
discrete classification. This is often combined with an evaluation of probabilities or
frequencies of occurrence, and for each event the criticality and probability can be
plotted on a grid or criticality matrix. This permits identification of the priorities for
corrective action.
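The criticality grid mentioned above can be represented as a simple lookup table. A minimal sketch; the severity and occurrence class names and the priority labels are illustrative assumptions, not taken from any standard:

```python
# Illustrative criticality matrix: keys combine a severity class with an
# occurrence class; values are corrective-action priorities. All names
# here are assumptions chosen for the example.
MATRIX = {
    ("catastrophic", "frequent"): "intolerable",
    ("catastrophic", "remote"):   "undesirable",
    ("minor",        "frequent"): "undesirable",
    ("minor",        "remote"):   "tolerable",
}

def priority(severity, occurrence):
    """Look up the corrective-action priority for one event."""
    return MATRIX[(severity, occurrence)]

assert priority("minor", "remote") == "tolerable"
```

Plotting each event at its (severity, occurrence) cell then immediately shows which events need corrective action first.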
In the automotive industry, it is usual to perform a variation of FMECA (which is usually called
“FMEA” even though strictly speaking it is “FMECA”). In this form of the analysis, each fault-effect
pair is given three scores, each between 1 and 10, for Severity, (probability of) Occurrence, and
Detection. These factors are usually multiplied to create a risk priority number (RPN). It should be
noted that these three factors are subjective, and the RPN derived from them is usually used to
determine which failures should receive attention (for example, by Pareto ranking of the RPNs). It
is, however, usually recommended that other controls are introduced, since it is possible to miss
an important result, for example when a relatively high value for one of the parameters is obscured
by low values of other parameters resulting in a relatively low RPN.
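The RPN calculation and Pareto ranking described above can be sketched as follows; the two failure entries and their ratings are invented for illustration, and the example deliberately reproduces the caveat that a high-severity entry can end up with a low RPN:

```python
def rpn(severity, occurrence, detection):
    """Risk priority number as used in automotive FMEA:
    each factor is a subjective rating from 1 to 10."""
    for v in (severity, occurrence, detection):
        assert 1 <= v <= 10
    return severity * occurrence * detection

# Pareto ranking of fault-effect pairs (entries are illustrative)
rows = [("relay weld", 9, 2, 3), ("connector corrosion", 4, 6, 5)]
ranked = sorted(rows, key=lambda r: rpn(*r[1:]), reverse=True)
# Caveat from the text: the severity-9 entry ranks below the
# severity-4 entry (RPN 54 vs 120), so RPN alone can mislead.
```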
The Detection parameter in a typical FMEA/FMECA deserves some mentioning here. This
parameter is traditionally intended to represent the likelihood that the failure mode is detected
before the system is released for production. (It is implicitly assumed that such a detection would
lead to an elimination of the problem before the system is released). However, since the
Occurrence parameter is a measure of the expected occurrence in the system during actual post-release use, the likelihood of detecting the problem before the system is put into use should be
accounted for in the assessment of the Occurrence. This means that the Detection parameter is
superfluous unless the Occurrence parameter refers to a highly hypothetical and unrealistic system
which has been developed without any design controls whatsoever.
In some FMEAs the Detection parameter is used to indicate the likelihood that the failure cause is
detected during actual operation, i.e. after the system has been released. However, depending on
whether or not a failure cause is detected, the effects (and thus the Severity) will be different. The
FMEA template is not well suited to describe multiple effects of the same basic failure mode. It
seems much more appropriate to treat the "detected" and "undetected" cases separately in the
FMEA, i.e. these should be analysed in separate rows of the FMEA form. The Occurrence rating of
detected and undetected faults, respectively, shall then be based on an analysis of the efficiency of
the error detection mechanism. Furthermore, it is usually possible to determine whether a
particular fault (e.g. a short circuit at a specific point in the system) will be detected by the
implemented error detection mechanisms or not. Therefore, the effects and the corresponding
Severity of a given fault can often be determined without any need for an estimation of the
Detection parameter on a 1-10 scale.
To summarize the discussion on the Detection parameter, we conclude that the Occurrence
parameter and the Severity parameter associated with a failure mode together provide sufficient
information about the risk. This is in line with the general understanding that a risk is a combination
of an occurrence probability and the associated severity.
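Splitting one failure mode into separate "detected" and "undetected" rows, with the Occurrence of each row derived from the coverage of the run-time error-detection mechanism, can be sketched as follows (the rate, coverage and severity values are illustrative):

```python
def split_rows(base_rate, coverage, severity_detected, severity_undetected):
    """Split one failure mode into 'detected' and 'undetected' FMEA rows.
    coverage = fraction of occurrences caught by the error-detection
    mechanism during operation; each row gets its own severity, since
    the effects differ depending on whether the fault is detected."""
    return [
        {"case": "detected",   "rate": base_rate * coverage,
         "severity": severity_detected},
        {"case": "undetected", "rate": base_rate * (1.0 - coverage),
         "severity": severity_undetected},
    ]

# Illustrative: 1e-6/h failure mode, 95 % detection coverage
rows = split_rows(1e-6, 0.95, severity_detected=3, severity_undetected=9)
```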
A further issue that has to be considered is the system boundary and the point at which the effects
are observed. There are usually three boundaries that have to be considered:
• The boundary of the “target of evaluation” – the system, subsystem or component on
which the analysis is being performed;
• The system boundary (usually the point at which the system's sensors and actuators
observe and act on the equipment under control);
• The event boundary at which the hazardous occurrence will be observed (usually the
vehicle).
This brief overview of the FMEA technique is primarily concerned with the analysis principle, i.e.
the bottom-up inductive reasoning for hazard occurrence analysis. However, it should be noted
that an FMEA addresses the entire EASIS dependability framework, as shown in the table below.

FMEA concept → Related dependability activity:
• Effects → Hazard Identification
• Severity → Hazard Classification
• Causal relationship between "failure modes" and effects → (Qualitative) Hazard Occurrence Analysis
• Occurrence → (Quantitative or semi-quantitative) Hazard Occurrence Analysis
• Recommended actions → Establishment of Dependability-Related Requirements
• FMEA document → Safety Case Construction

C.4.3.2 Fault tree analysis (FTA)

Fault tree analysis (FTA) is an example of a deductive technique, as it starts from known
consequences and explores possible causes. Descriptions of the technique can be found in
standards such as [5], [8].
To conduct an FTA, it is first necessary to have an identified top-level hazard or hazards of a
system. Typically these will have been obtained from an inductive technique such as FMEA. The
technique is very flexible since the analysis can be conducted to any appropriate level. Therefore
it is extremely important to consider the boundary of the analysis. In application to automotive
systems, the top-level hazard will normally be at the vehicle level. The causes of this hazard can
then be investigated.
A particularly important feature of FTA is that it permits the combinations of faults leading to a
particular hazard to be identified. The information in an FTA is usually presented in a hierarchical
format, where individual events in the hierarchy are combined using Boolean logic in the form of
"AND" and "OR" gates. An "AND" gate represents a combination of events that must all be fulfilled
in order for the next highest event to be triggered, whereas an "OR" gate represents a combination
of events of which at least one must be fulfilled in order for the next highest event to be triggered.
A separate tree has to be created for each top level hazard.
Figure C.6 shows an example (incomplete) fault tree for the hazard "no power from engine" that
demonstrates the basic features of an FTA.

[Figure: the top event "No power from engine" is fed by an OR gate with inputs "Mechanical
fault", "No fuelling" and "No spark". "No fuelling" is in turn fed by an OR gate with inputs "Fuel
tank empty", "Fuel system fault", "Injector fault" and "Incorrect output from engine management";
the last of these is developed further through an AND gate (not shown).]

Figure C.6 Example of a typical Fault Tree Analysis.

In this fault tree, the following symbols have been used:

Rectangle – a fault event that usually results from the combination of one or more basic
faults
Circle – a basic component fault with no statistical dependence on any other events
denoted by a circle or diamond
Diamond – a basic fault event not developed to its cause

Double diamond – an important undeveloped fault event that requires further development
to complete the fault tree
OR gate – one or more of the input events can cause the output event

AND gate – all of the input events must occur to cause the output event

Starting from a top-level hazardous event, the immediate events leading to this event are listed,
combined as appropriate with an “AND” or “OR” gate. Each sub-event is further analysed until
either a basic component fault is reached or a basic fault event is reached that cannot be
developed further. This can be the case either because the information is not available, or
because the fault is not sufficiently relevant to the analysis being conducted.
The completed fault tree can be used for further analyses:

1. The minimal cut sets can be generated by performing Boolean algebra on the tree.
These identify the minimal combinations of events that can lead to the top-level
event in the tree.
2. Occurrence rates for the top-level hazards can be calculated based on available
probability data for the lowest events in the tree.
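Both follow-on analyses can be illustrated on a small AND/OR tree. The tree fragment and the basic-event probabilities below are invented for illustration; the top-event estimate uses the rare-event approximation (sum over minimal cut sets) and assumes independent basic events:

```python
import math
from itertools import product

def cut_sets(node):
    """Cut sets of an AND/OR fault tree. A node is either a basic-event
    name (str) or a tuple (gate, children) with gate in {'OR', 'AND'}."""
    if isinstance(node, str):
        return [frozenset([node])]
    gate, children = node
    child_sets = [cut_sets(c) for c in children]
    if gate == "OR":                      # any child cut set suffices
        return [cs for sets in child_sets for cs in sets]
    # AND: combine one cut set from every child
    return [frozenset().union(*combo) for combo in product(*child_sets)]

def minimal(sets):
    """Keep only cut sets with no proper subset among the others."""
    return [s for s in sets if not any(t < s for t in sets)]

# Invented fragment loosely modelled on the "no power from engine" example
tree = ("OR", ["mechanical_fault",
               ("OR", ["tank_empty", "injector_fault"]),
               ("AND", ["ecu_fault", "watchdog_fault"])])
mcs = minimal(cut_sets(tree))

# Rare-event approximation of the top-event probability from assumed
# basic-event probabilities (independence assumed)
p = {"mechanical_fault": 1e-6, "tank_empty": 1e-3,
     "injector_fault": 1e-5, "ecu_fault": 1e-4, "watchdog_fault": 1e-2}
top = sum(math.prod(p[e] for e in s) for s in mcs)
```

The AND gate shows why cut sets matter: the pair {ecu_fault, watchdog_fault} contributes only the product of its two probabilities, far less than either event alone.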

In summary, it can be concluded that Fault Tree Analysis is an effective method for performing
Hazard Occurrence Analysis. More specifically, it provides the following benefits over inductive
methods like FMEA:

• The analysis efforts are focused on those issues that really represent potential
problems, i.e. the identified hazards
• The FTA technique allows investigation of cases when two or more faults may together
lead to a hazard
• The FTA technique allows the identification of not-so-obvious causes of hazards. For
example, it could be the case that a sensor signal may have the following
characteristics in a particular system:
• If the signal error is smaller than some limit ε0, the top-level hazard will not occur (simply
because the error is small)

• If the signal error is larger than some limit ε1, the top-level hazard will not occur since
plausibility checks effectively detect such large errors

• Thus, the most critical case is when the magnitude of the signal error is between ε0 and ε1,
especially when it is very close to the upper limit ε1. This type of dependency would in
principle be impossible to find by an inductive method such as FMEA.
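The ε0/ε1 dependency above amounts to a simple predicate; the threshold values in the example are illustrative:

```python
def hazard_possible(signal_error, eps0, eps1):
    """The hazard can occur only when the error magnitude exceeds eps0
    (large enough to matter) but stays below eps1 (small enough to
    evade the plausibility checks)."""
    return eps0 < abs(signal_error) < eps1

assert hazard_possible(0.8, eps0=0.5, eps1=2.0)       # critical band
assert not hazard_possible(0.1, eps0=0.5, eps1=2.0)   # too small to matter
assert not hazard_possible(5.0, eps0=0.5, eps1=2.0)   # caught by checks
```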

It should be noted that fault trees may be generated automatically from a system model. The
SETTA project [30], funded by the CEC under contract number IST-2000-10043, addressed the
systems engineering of safety-related systems with a special focus on time-triggered systems. One
of the objectives of the SETTA project was a better integration of the functional development and
safety analysis processes. Within SETTA, a prototypical fault-tree synthesis toolset has been developed by
DaimlerChrysler in co-operation with the University of York. The synthesis algorithm performs a
backward traversal of the data flow graphs given by the system model. The toolset consists of a
Matlab/Simulink-GUI for failure mode annotations, a Simulink to XML Converter and a fault-tree
synthesis tool. The concept is illustrated in Figure C.7.

[Figure: failure mode annotations are made in Matlab/Simulink; a fault tree is synthesized from
the annotated model and then evaluated by cut-set analysis]

Figure C.7 Fault-tree synthesis based on Simulink models


The graphical user interface enables a safety analyst to annotate failure modes in Simulink
models. For easier interfacing with other modelling tools, an XML file is generated from the Simulink
model structure and failure annotations. The fault tree synthesis tool reads the XML file and
generates a fault tree based on the failure annotations in the Simulink model.
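The backward-traversal idea can be sketched on a toy data-flow graph. This is purely illustrative and is not the SETTA toolset: the block names, failure modes and graph structure are all invented:

```python
# Toy sketch of fault-tree synthesis by backward traversal of a
# data-flow graph (inspired by the approach above; NOT the SETTA tool).
# Edges give each block's data-flow predecessors; each block carries
# its annotated local failure modes.

predecessors = {"actuator_cmd": ["controller"],
                "controller":   ["sensor"],
                "sensor":       []}
local_failures = {"actuator_cmd": ["driver_stage_fault"],
                  "controller":   ["sw_bug"],
                  "sensor":       ["sensor_drift"]}

def synthesize(block):
    """Wrong output of a block is caused (OR) by one of its own
    failure modes or by wrong output of any predecessor block."""
    causes = list(local_failures[block])
    causes += [synthesize(p) for p in predecessors[block]]
    return ("OR", block, causes)

tree = synthesize("actuator_cmd")  # nested OR tree down to the sensor
```

Real tools additionally handle multiple failure modes per signal, AND combinations from redundancy, and loops in the graph; the sketch only shows the backward-traversal skeleton.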

C.4.3.3 Hazard and operability study (HAZOP)

Hazard and operability study (HAZOP/HAZOPS) is an example of an exploratory technique, as it
starts from a specified deviation and explores both possible causes and possible consequences.
The HAZOP technique was originally developed in the chemical engineering sector, but has since
been extended and applied in many different sectors [11]. For any form of HAZOP, the basic
process is the same. A system or process is examined for deviations from its design intent. The
analysis works backwards to identify possible causes of the deviation and forwards to identify
possible consequences.
HAZOP is based on a series of entities, attributes and guidewords, and the hazard analysis is
conducted by asking questions in the form:

What if [entity].[attribute] = [guideword] ?

The entity is the lowest level of component, system or function that will be examined in the
analysis. In the original form of HAZOP, the entities were usually "flows" (of information, or of an
agent such as electricity, air pressure or a fluid). However, the entities can be any identified
component of a system that has suitable attributes.
The attribute is an identifiable state or property of the entity.
The guideword describes a deviation from the intended design behaviour. There is a basic
standard set of guidewords although these usually need to be interpreted in the context of the
analysis being undertaken. Although HAZOP has been applied in many different contexts, it has
been found that this basic set of guidewords is always applicable, even if the guidewords require
interpretation.
The standard guidewords and their generic meanings are [10], [11]:

Guideword (generic properties)   Meaning
No           The complete negation of the design intention – no part of the intention is
             achieved and nothing else happens
More         A quantitative increase over what was intended
Less         A quantitative decrease over what was intended
As well as   All the design intention is achieved together with additions (i.e. a qualitative
             increase over what was intended)
Part of      Only some of the design intention is achieved (i.e. a qualitative decrease over
             what was intended)
Reverse      The logical opposite of the intention is achieved
Other than   Complete substitution, where no part of the original intention is achieved but
             something quite different happens

Guideword (timing)               Meaning
Early        Something happens earlier than expected relative to clock time
Late         Something happens later than expected relative to clock time
Before       Something happens before it is expected, relating to order or sequence
After        Something happens after it is expected, relating to order or sequence

An example question, applied to a valve controlling pneumatic or hydraulic pressure in a system,
would be:

What if Valve.Position = Maximum ?

Here the generic “more” property has been identified with a specific state of the entity.
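The question scheme lends itself to systematic enumeration; a minimal sketch, where the entities and attributes are illustrative assumptions rather than taken from the deliverable:

```python
# Enumerate HAZOP questions of the form
#   What if [entity].[attribute] = [guideword] ?
entities = {"Valve": ["Position", "ResponseTime"],
            "Sensor": ["Output"]}
guidewords = ["No", "More", "Less", "As well as", "Part of",
              "Reverse", "Other than", "Early", "Late", "Before", "After"]

def hazop_questions(entities, guidewords):
    for entity, attributes in entities.items():
        for attribute in attributes:
            for gw in guidewords:
                yield f"What if {entity}.{attribute} = {gw} ?"

questions = list(hazop_questions(entities, guidewords))
# 3 attributes x 11 guidewords = 33 candidate deviations to review
```

Each generated question is then interpreted in context by the analysis team; many will be discarded as not meaningful, which is part of the normal HAZOP process.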
HAZOP can be applied to a conceptual design and also to operational conditions. It is considered
to be particularly effective for new systems or novel technologies.
Finally, it should be mentioned that the "entity-attribute-guideword" approach may also be applied
in an FMEA.


C.4.4 Other techniques of analysis for static systems

Besides Fault Tree Analysis, there are other "tree-based" analysis techniques such as event-tree
analysis (ETA) and cause-consequence analysis (CCA). These are briefly overviewed in the
following subsections.

C.4.4.1 Event tree analysis (ETA)

Event tree analysis is a method for illustrating the sequence of outcomes which may arise after the
occurrence of a selected initial event.
Unlike a fault tree, an event tree is based on inductive reasoning. It is mainly used in consequence
analysis for pre-incident and post-incident application.
An example of an event tree is given in Figure C.8. The left side connects with the initiator, the
right side with damage state; the top defines the systems; nodes call for branching probabilities
obtained from the system analysis. If the path goes up at the node, the system succeeded. If it
goes down, it failed.
It should be noted that the probabilities of the final outcomes in Figure C.8 are approximated based
on the assumption that the "fails" probabilities are much less than unity. For example, the
probability P1 x P5 is more correctly described as P1 x (1-P2) x (1-P3) x (1-P4) x P5.
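This approximation can be checked numerically; a minimal sketch with illustrative "fails" probabilities (not values from the deliverable):

```python
# Outcome "short circuit occurs; detection, selection and switch succeed;
# backup fails": exact probability vs. the approximation P1 x P5.
P1, P2, P3, P4, P5 = 1e-3, 1e-2, 1e-2, 1e-2, 1e-2  # illustrative "fails" probabilities

exact = P1 * (1 - P2) * (1 - P3) * (1 - P4) * P5
approx = P1 * P5

rel_error = (approx - exact) / exact
# With "fails" probabilities much less than unity, the approximation
# overestimates the outcome probability only slightly (here by about 3 %).
```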

[Figure: event tree for the initiating event "short circuit" (probability P1), with branchings for detection of the short circuit, selection of appropriate action, switch to backup, and backup system availability; each failing branch (probabilities P2…P5) leads to a failure outcome with approximate probability P1 x P2, P1 x P3, P1 x P4 or P1 x P5, while the fully successful path succeeds]
Figure C.8 Example of a typical Event Tree Analysis.

Details on how to carry out event tree analysis as well as its benefits and restrictions are
documented in literature [12], [14].
The ETA method appears to have quite limited applicability for analysing hazard occurrence in the context of
the EASIS Dependability Activity Framework.


C.4.4.2 Cause-Consequence Analysis

Cause-consequence analysis (CCA) is a blend of fault tree and event tree analysis. This technique
combines cause analysis (described by fault trees) and consequence analysis (described by event
trees); hence both deductive and inductive analysis are used.
The purpose of CCA is to identify chains of events that can result in undesirable consequences.
With the probabilities of the various events in the CCA diagram, the probabilities of the various
consequences can be calculated, thus establishing the risk level of the system. Figure C.9 below
shows a typical CCA.

Figure C.9 Example of a typical Cause-Consequence Analysis.

This technique was developed at the Risø National Laboratory in Denmark for use in risk analysis of
nuclear power stations. However, it can also be adapted by other industries to estimate the
safety of protective or other systems.
Details on how to carry out cause consequence analysis as well as the benefits and restrictions of
it are documented in literature [13], [14].

C.4.4.3 Discussion and Conclusion

The tree-based methods are mainly used to find cut-sets leading to the undesired events. In fact,
event tree and fault tree have been widely used to quantify the probabilities of occurrence of
accidents and other undesired events leading to the loss of life or economic losses in probabilistic
risk assessment.
However, the usage of fault trees and event trees is confined to static, logic modelling of accidents.
Since fault tree and event tree analysis give the same treatment to hardware failures and human
errors, the conditions affecting human behaviour cannot be modelled explicitly. This affects the
assessed level of dependency between events.


C.4.5 Techniques of analysis for dynamic systems

A brief description of the most common techniques for the analysis of dynamic systems is given in this
section. The main objective is to give a conceptual overview of these techniques, together with
literature references where their specific features, such as strengths, weaknesses and difficulties
of application, can be evaluated in more detail.

C.4.5.1 Markov Modelling

Markov modelling allows analysis of the time-dependent behaviour of dynamic systems. The
transitions between system states are modelled as stochastic events. In a time-homogeneous
continuous-time discrete-state Markov process, each possible transition is associated with a
constant rate that represents the probability of the transition firing within an infinitesimally small
time interval divided by the length of that interval.
The system state probabilities PP(t) in a continuous Markov system analysis are obtained by the
solution of a coupled set of first order, constant coefficient differential equations known as the
Chapman-Kolmogorov matrix equation:

dPP(t)/dt = M ⋅ PP(t),

where M is the matrix of coefficients whose off-diagonal elements are the transition rates and
whose diagonal elements are such that the matrix columns sum to zero.
Markov models are particularly useful for analysis of fault-tolerant systems in which repair and/or
recovery is possible. An example is given in Figure C.10 for the case when two faults together may
cause a failure. Each fault has the occurrence rate λ and the repair (or recovery) rate is µ. The
probability of the state "Failure of system" may be calculated from the rates λ and µ.

[Figure: Markov model with states 0 "Fault-free", 1 "One fault" and 2 "Failure of system"; transition 0→1 with rate 2λ, 1→0 with repair rate µ, and 1→2 with rate λ]
Figure C.10 Example of a Markov model


For the example model above, the Chapman-Kolmogorov matrix equation takes the following form
that can easily be solved manually or with the aid of a suitable tool:

⎡ P0'(t) ⎤   ⎡ -2λ      µ     0 ⎤   ⎡ P0(t) ⎤
⎢ P1'(t) ⎥ = ⎢  2λ   -(µ+λ)   0 ⎥ ⋅ ⎢ P1(t) ⎥
⎣ P2'(t) ⎦   ⎣   0      λ     0 ⎦   ⎣ P2(t) ⎦

An application of Markov modelling to a hold-up tank problem is discussed in literature [15], while a
Pate-Cornell study on fire propagation for a subsystem on board an off-shore platform is proposed
in [16]. An approach for symbolic approximation of the state probabilities of Markov models of
repairable fault-tolerant systems is described in [31].
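As a sketch of how the matrix equation for this example may be solved, the state probabilities can be integrated numerically; the rates λ = 1e-4/h and µ = 1e-1/h below are illustrative assumptions, not values from the deliverable:

```python
# Forward-Euler integration of dPP/dt = M . PP(t) for the two-fault
# Markov model (states: 0 fault-free, 1 one fault, 2 system failure).
lam, mu = 1e-4, 1e-1            # illustrative fault rate and repair rate (per hour)
M = [[-2 * lam,        mu, 0.0],
     [ 2 * lam, -(mu + lam), 0.0],
     [      0.0,       lam, 0.0]]   # columns sum to zero

def step(P, dt):
    dP = [sum(M[i][j] * P[j] for j in range(3)) for i in range(3)]
    return [P[i] + dt * dP[i] for i in range(3)]

P = [1.0, 0.0, 0.0]             # start in the fault-free state
dt, t_end = 0.01, 1000.0
for _ in range(int(t_end / dt)):
    P = step(P, dt)
# P[2] now approximates the probability of system failure at t_end;
# the probabilities still sum to one, since the matrix columns sum to zero.
```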


C.4.5.2 Dynamic Event Tree Analysis

Dynamic event tree analysis method (DETAM) is an approach that treats time-dependent evolution
of plant hardware states, process variable values, and operator states over the course of a
scenario (e.g. an accident). In general, a dynamic event tree is an event tree in which branchings
are allowed at different points in time.
This approach is defined by five characteristic sets:
- branching set,
- set of variables defining the system state,
- branching rules,
- sequence expansion rules,
- quantification tools.

The branching set refers to the set of variables that determine the space of possible branches at
any node in the tree. Branching rules, on the other hand, are the rules used to determine when a
branching should take place (e.g. at a constant time step). The sequence expansion rules are used to
limit the number of sequences.
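The branching and expansion rules can be sketched in a few lines; the branch points, probabilities and pruning threshold below are invented for illustration:

```python
# Sketch of dynamic event tree expansion: branch at constant time steps,
# prune with a simple sequence expansion rule (drop very unlikely paths).
branch_points = [("pump",     {"runs": 0.99,  "fails": 0.01}),
                 ("operator", {"acts": 0.9,   "omits": 0.1}),
                 ("valve",    {"opens": 0.95, "stuck": 0.05})]
PRUNE_BELOW = 1e-3   # sequence expansion rule: discard paths below this

sequences = [((), 1.0)]                 # (history, probability)
for name, outcomes in branch_points:    # one branching per time step
    expanded = []
    for history, prob in sequences:
        for outcome, p in outcomes.items():
            new_prob = prob * p
            if new_prob >= PRUNE_BELOW:          # expansion rule
                expanded.append((history + ((name, outcome),), new_prob))
    sequences = expanded
# Without pruning there would be 2 x 2 x 2 = 8 sequences; pruning removes
# the least likely ones while keeping the total probability close to 1.
```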
This approach can be used to represent a wide variety of operator behaviours, to model the
consequences of operator actions, and to serve as a framework for the analyst to employ a
causal model for errors of commission. Thus it allows emergency procedures to be tested and
identifies where and how changes can be made to improve their effectiveness. An analysis of the
accident sequence for a steam generator tube rupture is presented in literature [17].

C.4.5.3 Discussion and Conclusion

The techniques discussed above address the deficiencies found in fault/event tree methodologies
when analysing dynamic scenarios, i.e. systems whose configuration changes over time.
However, there are also limitations to their usage.
Markov modelling requires the explicit identification of all possible system states and the transitions
between these states. This is a problem as it may be difficult to envision the entire set of possible
system states.
DETAM can solve the problem through the use of an implicit state-transition definition. The drawbacks
of such implicit techniques are implementation-oriented. With the large tree structure generated
by the DETAM approach, large computer resources are required. A second problem is
that implicit methodologies may require a considerable amount of analyst effort in data
gathering and model construction.


C.5 Investigation of dependent failures

Common cause failure analysis studies those system-internal failures that for some reason cannot
be considered independent of each other.

C.5.1 Definitions

The terminology concerning common cause failures has changed over the years. Traditionally,
only common mode failures were considered. Later, the definition of common cause failure was
introduced referring to a wider group of failures superseding common mode failures. However, at
that time the idea that common cause failure was synonymous with common mode failure was
widespread.

The issue regarding the difference between common cause and common mode was clarified when the term
dependent failures was introduced to supersede and encompass common cause failures, common mode
failures and “cascade failures”. Cascade failures are all dependent failures that are not common
cause failures.

The safety and reliability directorate of the United Kingdom Atomic Energy Authority gives the
following definitions of dependent, common cause, common mode and cascade failures [28]; a
graphical explanation is provided in Figure C.11:

• Dependent failure: a set of events whose probability cannot be expressed as the
simple product of the unconditional failure probabilities of the individual events.
• Common cause failure: a specific type of dependent failure that arises in
redundant components, where simultaneous (or near-simultaneous) multiple failures
occur in different channels as a result of a single shared cause.
• Common mode failure: this term is reserved for common cause failures in which
multiple items fail in the same mode.
• Cascade failure: all those dependent failures that are not common cause failures,
i.e. they do not affect redundant components.

[Figure: Venn diagram – the set of dependent failures contains common cause failures and cascade failures; common mode failures are a subset of common cause failures]

Figure C.11 Dependent failures categories


C.5.1.1 Dependent failures

Given two dependent events A and B, the probability that both events A and B happen is not equal
to the product of the two unconditional probabilities:
Prob(A and B) = Prob(A) • Prob(B|A) = Prob(B) • Prob(A|B) ≠ Prob(A) • Prob(B)
This is the broadest category, including both common cause/mode failures and cascade failures.
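A small numeric illustration of this inequality, with invented probabilities:

```python
# Two dependent events: A = "cooling fails", B = "component overheats".
p_a = 0.01           # Prob(A)
p_b_given_a = 0.5    # Prob(B|A): much higher once cooling has failed
p_b = 0.02           # unconditional Prob(B)

joint = p_a * p_b_given_a          # Prob(A and B) = Prob(A) * Prob(B|A) = 0.005
independent = p_a * p_b            # 0.0002 -- wrong if A and B are dependent
# Assuming independence here would underestimate the joint probability 25-fold.
```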

C.5.1.2 Common cause/mode failures in automotive systems

The definitions in section C.5.1 are not applicable to typical automotive architectures of today since
redundant components are rarely employed. They are however highly relevant for future
automotive architectures in which redundant units are used to implement e.g. Steer-by-Wire (see
Figure C.12).

[Figure: Steer-by-Wire architecture – two batteries (Bat1, Bat2), a dual bus connection, and two fail-silent units (FSU1, FSU2), each providing basic control of one of two redundant actuators acting on the wheels VL and VR]
Figure C.12 Steer by Wire System example (extracted from[18]).


In such a system, the fail-silent units FSU1 and FSU2 are redundant components (and channels)
which are susceptible to common cause/mode failures (caused by EMI or a common SW bug, for
instance). As FSU1 and FSU2 are used to ensure availability of the Steer-by-Wire system in case
of a single failure of one of the FSUs, a common mode or a common cause failure affecting both
FSUs will have the same consequence: both FSUs will go into fail-silent mode and the system will no
longer be available. So in such a configuration, there is no real difference between common cause
and common mode.
Now, let us try a slightly enlarged definition of common cause failure. In the common cause failure
definition in C.5.1, there are two different ideas of redundancy: different components and different
channels. If we consider the internal architecture of the FSU as given by Figure C.13, the
supervisor can be considered as an independent channel, so there are two channels in one
component. So, let us keep the idea of different channels and drop the idea of different
components. The definition becomes:
Common Cause Failure: a specific type of dependent failure where simultaneous (or near
simultaneous) failures occur in different channels due to a single shared cause.


[Figure: internal architecture of an FSU – sensors feed inputs to a CPU and driver stage commanding the actuators, monitored by a supervisor; an internal power supply and a communication interface connect to the external power supply and communication network]
Figure C.13 Example of FSU (extracted from[18])

With this enlarged definition, common mode failure is a category of special interest. A common
mode failure in both the main CPU and the supervisor will not be detected at the FSU level and will
propagate. This is in contrast to other common cause failures, which will be detected by the
comparators between the supervisor and the main CPU.
Examples:
• Common cause (but not common mode) failure: a high intensity electromagnetic field
may cause a fault in the main CPU and a different fault in the supervisor as their HW is
different. The consequences of the faults will be detected by comparison of internal
values between the main CPU and the supervisor, switching the FSU into fail-silent mode.
Then the other FSU (located in another area of the vehicle well protected against the
electromagnetic field) will take over.
• Common mode failure: a common specification fault in an algorithm used in the main
CPU and the supervisor will not be detected by comparisons between the two channels.
It will thus propagate to the output and may cause a spurious action.
As these examples show, a common mode failure is a particular type of common cause failure that
is neither detectable by comparators nor by voters.

C.5.1.3 Cascade failure

Cascade failures are all dependent failures that are not common cause failures. So they relate to
dependent failures affecting a single channel.
Examples:
- At ECU level, failure of the voltage regulator will make the other sections of the ECU
unavailable all at once.
- At ECU level, a high intensity electromagnetic field may cause both the CPU to behave
erratically and its watchdog to fail silent


- At system level, failure of a speed sensor could produce the total/partial loss of many
functions: ABS, Cruise Control, Driver Speed Display…

C.5.1.4 Dependent failures and safety barriers

A safety barrier is a function whose purpose is to avoid the propagation of an internal fault up to
the output of the system. Safety barriers are put in places where propagation of internal faults may
cause hazardous failures.
Examples:
- A watchdog to detect and handle wrong execution of the CPU program.
- A supervisor to detect and handle wrong acquisitions, wrong calculations and wrong
execution of program of a CPU
- A redundant low beam lamp (“loss of one lamp” will not propagate to the hazard “loss of
front lighting”)
- A redundant Fail-Silent Unit (FSU) for an X-by-Wire system
- A plausibility check on some data

Dependent failure events do not necessarily cause more acute problems than independent
failures.
Example:
- At ECU level, the cascade failure of the voltage regulator causing the failure of both the
CPU and its watchdog will only result in the unavailability of the CPU: it will fail
silent. The loss of the watchdog has no consequence in that case, so it is not a
more severe failure than the loss of the CPU alone.

A dependent failure event causes more severe problems when it causes both an internal fault and
the inhibition of a safety barrier whose purpose is to avoid propagation of that fault (see Figure
C.14). The problem in that case is that the probability of occurrence of the hazardous failure is as
high as if there were no safety barrier.
Example:
- At ECU level, a high intensity electromagnetic field may cause both the CPU to behave
erratically and its watchdog to fail silent. The consequence may be spurious actuator
command. In this case, the failure of the WD has a consequence.

[Figure: a single root cause produces both Fault 1 (failure of an internal function, which propagates to a hazard) and Fault 2 (loss of the related safety barrier)]
Figure C.14 Sequence of events leading to a problematic dependent failure


C.5.1.5 Root Causes and Coupling Factors

The root cause explains the mechanism underlying the transition from available to failed or
functionally unavailable.
For example:
- If two components are located in the same enclosure and they are susceptible to high
humidity, a common cause failure could occur as a result of an event outside the
enclosure but causing high humidity in the enclosure. In this case high humidity is the
root cause of failure for the two components.

Given the existence of the root cause, the coupling factor explains why a particular cause affects
several components. It creates linking conditions to cause multiple components to fail in a
correlated fashion.
For example:
- Location in the same enclosure is a coupling factor for those components susceptible to
high humidity.

Figure C.15 shows the mechanism of failure of multiple components. When there is a coupling
factor (e.g. same location) and a trigger event (e.g. failure of an air conditioning system) occurs,
the root cause (e.g. high humidity) results in multiple component failures.

[Figure: a root cause acting through a coupling factor leads to the correlated failure of components a, b, …, n]
Figure C.15 Root Causes and Coupling Factors relationship.


C.5.2 Kinds of dependent failure causes

There are causes internal to the system, such as:
- Design fault in a common software module
- Design fault in a common hardware component
- Specification fault
- Fault leading to corruption of a global variable in software
- Power and information networks
- Production fault
- New technology

There are causes external to the system, such as:
- Mechanical aggression because of misuse
- Mechanical aggression following a crash
- Extreme environmental conditions (temperature, electromagnetic field, humidity,
vibration, shock, etc)
- Fire
- Leaking fluids (Fuel, Hydraulic, Battery acid, Water …)
- Wrong maintenance.

C.5.3 Methods for identifying and analyzing dependent failures

This section discusses use of FTA and FMEA to identify and analyze dependent failures. It also
describes methods used in the most significant standards and how they are applicable to the
automotive field.

C.5.3.1 FTA for dependent failure analysis

FTA is a well-adapted method for the analysis of dependent failures. It shows where to focus the
identification process of harmful dependencies: every “AND” gate of the FTA will be carefully
analysed to see if the events in the input paths are truly independent.
Every hazard of the system, identified in the PHA, may be analysed by FTA at system level and
then detailed by FTA at component level. FTA at system level will show where to look for common
cause and common mode failures at system level. FTA at component level will also show where
cascade failure modes may be harmful.
An FTA may be a qualitative analysis, or quantitative where it concerns random HW failures.
However, even when starting from a quantified FTA, it is difficult to imagine how a quantified
dependent failure analysis could be performed, since the problem is to quantify the level of
dependency. So dependent failure analysis based on FTA should be regarded as a qualitative method only.
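As a sketch of how this focusing step might be supported by a simple check, assuming a hypothetical tagging of basic events with candidate root causes (all names and causes below are invented):

```python
# Flag AND gates whose input events share a potential root cause,
# i.e. where the independence assumption of the FTA may not hold.
root_causes = {                     # basic event -> set of possible root causes
    "cpu_erratic": {"EMI", "supply_loss"},
    "watchdog_silent": {"EMI", "supply_loss"},
    "lamp1_out": {"vibration"},
    "lamp2_out": {"corrosion"},
}

def suspicious_and_gates(and_gates):
    """Return (gate, shared causes) for every AND gate with a common cause."""
    findings = []
    for name, inputs in and_gates.items():
        shared = set.intersection(*(root_causes[e] for e in inputs))
        if shared:
            findings.append((name, shared))
    return findings

gates = {"spurious_actuation": ["cpu_erratic", "watchdog_silent"],
         "loss_of_front_lighting": ["lamp1_out", "lamp2_out"]}
flagged = suspicious_and_gates(gates)
```

Such a check can only point the analyst at suspect gates; judging whether the shared cause is credible remains a qualitative, expert task.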

C.5.3.2 FMEA for dependent failure analysis

As dependent failure events only cause particular problems when they cause both an internal fault
and the inhibition of a safety barrier whose purpose is to avoid propagation of that fault, a “Runtime
FMEA” may also be used to focus the identification process of harmful dependencies. In a
“Runtime FMEA”, the detection measures are safety barriers, so every FMEA line where the failure
effect is hazardous will be analysed to assess the dependence between the cause column and the
detection measure column.


C.5.3.3 SAE-ARP 4754 and SAE-ARP 4761

Aerospace industries use the term common cause analysis to address what the nuclear industries
call dependent failure analysis. In common cause analysis they identify three different issues which
they address with zonal safety analysis, particular risks analysis and common mode analysis [24],
[25].
Zonal Safety Analysis addresses concerns regarding equipment installation, interference
between systems, the robustness of the system against possible maintenance errors, and the
independence claims made for equipment installed in close proximity on the aircraft.
The whole aircraft is divided into several zones and for each of these zones a zonal safety analysis
is performed. The objective of the zonal safety analysis is to ensure that the system design meets
the safety objective with respect to:

- Basic installation;
- Effect of failures on aircraft;
- Implication of maintenance errors;
- Verification that the design meets the FTA independence claims.

Particular Risk Analysis addresses specific events listed by airworthiness regulations that
potentially may cause a failure inside the system itself. For each risk the possible consequences
for the whole aircraft should be evaluated; if one of the risks may affect safety, proper measures
should be taken. These particular risks may influence several zones.
The following particular risks are set out in [24].

- Fire
- High energy devices (non-containment):
o Engine
o Auxiliary Power Unit
o Fans
- High pressure bottles
- High pressure Air Duct Rupture
- High temperature Air Duct Leakage
- Leaking fluids:
o Fuel
o Hydraulic
o Battery acid
o Water
- Hail, Ice, Snow
- Bird strike
- Tyre burst, flailing tread
- Wheel rim release
- Lightning strike
- High Intensity Radiation Fields
- Flailing Shafts
- Bulkhead rupture

Common Mode Analysis


According to ARP 4761, common mode analysis should be performed in the lifecycle after
Functional Hazard Analysis and Preliminary System Safety Analysis. Its aim is to verify that all the
inputs to all AND gates (both explicit and implicit) in the failure logic analysis (Fault Tree Analysis,
Dependence Diagram, Markov Analysis etc.) are independent.


Basically, components with the same hardware and software could be susceptible to common
mode failures due to couplings arising from particular risks, or other causes. Therefore the
principal task of the analysis is to look for couplings and to evaluate to what extent ‘root causes’
could affect coupled components.
Identifying coupling is the major task and is very much dependent on the expertise of the analyst;
several checklists have been tailored to help in discovering couplings.
It is important to point out that common cause failure analysis in the aerospace industry is purely a
qualitative analysis.

Applicability in the automotive field:


Particular risks analysis seems applicable at car level. Obviously, the particular risks list should be
adapted.
Zonal safety analysis seems applicable. We may for instance create two zones:
- Underhood
- Passenger compartment

As these two zones are segregated by the firewall, they can be considered independent (except
with regard to particular risks).
Common Mode Analysis is applicable at system and ECU level as it provides an assessment that
the independence claims made in the Fault Tree Analysis are valid.

C.5.3.4 IEC 61508

This standard defines common cause failure as “failure, which is the result of one or more events,
causing coincident failures of two or more separate channels in a multiple channel system, leading
to system failure” [27].
It stipulates qualitative methods to analyse common cause failures [27]:
- General quality control
- Design reviews
- Verification and testing by an independent team
- Analysis of real incidents with feedback on new development

For software [27] recommends (or highly recommends, depending on the SIL level) common cause
analysis of diverse software.
It also proposes [27] a quantitative method to evaluate the occurrence of common-cause hardware
failures. This method is relevant only for multi-channel systems. It gives a relation (β factor)
between the occurrence of common-cause hardware failures on two or more channels and the
occurrence of random hardware failures on a single channel. The β factor is computed using
checklists on the following subjects:
- Separation / Segregation
- Diversity / Redundancy
- Complexity / design/ maturity / experience
- Assessment / analysis and feedback of data
- Procedures / human interface
- Competence / training / safety culture
- Environmental control
- Environmental testing
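The quantitative idea can be illustrated with a minimal sketch; the β value and channel failure rate below are invented for illustration and are not taken from IEC 61508:

```python
# Illustration of the beta-factor idea: the rate of common cause
# failures is taken as a fraction (beta) of the single-channel
# random hardware failure rate. Values below are invented examples.
beta = 0.02              # 2 %, an illustrative result of such checklists
lambda_channel = 1e-6    # random hardware failure rate of one channel, per hour

lambda_ccf = beta * lambda_channel   # common cause failure rate of the system
# Even with two redundant channels, the system failure rate cannot be
# driven below this common-cause floor of 2e-8 per hour.
```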


Applicability in the automotive field


Applying the IEC 61508 β factor method as it stands seems difficult. As the standard states:
- The justification of the relationship between probability of hardware common cause
failure and probability of random hardware failure is tenuous.
- There is no known data on hardware-related common cause failure available for the
calibration of the methodology. Therefore the tables to compute the β factor are based
on engineering judgement.

Moreover, the questions asked in the tables are not always applicable to the automotive field.


C.6 Investigation of software failures

Generally, software development does not follow the same rules and constraints that are
traditionally associated with hardware development. Several important issues should be
considered when applying the qualitative analyses reported in section C.4 to automotive
software development:
• In the automotive industry, the traditional "waterfall" software development process is being
replaced more and more by "spiral" or "iterative" development methods. This has become
necessary to keep in line with the rapidly increasing and changing customer expectations.
Thus, the software is often under continuous development and improvement. A result of
this is that software FMEAs or FTA are potentially very difficult and resource intensive to
maintain in line with ongoing changes.
• The results of a software FMEA or FTA should be available early enough in the
development process to have an added value in the series product and development
processes. Detailed software FMEAs and FTA are potentially very time consuming.
• Detailed analysis of potential types of software faults can usually be translated to
equivalent faults on the high level software architectural level.
To understand the concept of equivalent faults an example could be considered with a lateral
acceleration signal that has been corrected for offset and gain before being used in the control
system. A microprocessor calculating error (one fault type listed in Ref [18]) could potentially result
in the lateral acceleration variable being permanently set to 2.0 m/s² although the vehicle is driving
straight ahead. From a system level, it is not important if the microprocessor calculation error
occurred in the offset correction module or the gain correction module. In both cases, the signal
used later in the system has the same content and the effect on the system can be considered
equivalent.
The application of quantitative analysis to automotive software is generally considered
impracticable since it is not possible to quantify the probabilities of software faults.

This problem is addressed by the introduction of the concept of Development Assurance Level
[24], [25].
The "Development Assurance Level” addresses the SW development process to introduce
appropriate degree of process control, with the purpose of achieving qualitative indicators that a
system meets its safety objectives.
A large number of comprehensive recommendations for software development processes, together
with assessment reference models, have been issued in the most common international
standards (see ref. [26], [27]).

C.6.1 Describing Software FMEA Techniques and Different Levels of Analysis

Understandably a number of papers use slightly differing terminology and descriptions to explain
software FMEA techniques. Similarly the scope of components to be analyzed can vary
significantly.
In this paper an FMEA as applied to software is called a software FMEA although this analysis can
be carried out at different levels of detail. The paper in reference [20], Software FMEA
Techniques, also talks about two levels of software FMEA, called "System Software FMEA" and
"Detailed Software FMEA".
In this paper a software FMEA corresponds closely to a "detailed software FMEA" and a system
FMEA that includes the software architecture corresponds closely to the "system software FMEA"
but includes not only the key software components but also the system hardware components.


C.6.1.1 Software FMEA Compared to System Level FMEA that includes SW Architecture

The FMEA examples later in this chapter will briefly discuss the potential results from three
FMEA methods applied to the development of automotive software. However, before proceeding it
is useful to review a simple software example that indicates the types of issues that arise.
The example selected is a subroutine that should return the highest of three numbers. It quickly
becomes clear that numerous potential faults could be contained within even this simple
calculation. Typical examples include a coding mistake that results in calculating with mismatched
intermediate data types, returning the lowest variable instead of the highest, or writing the result
to the wrong location. In effect, the number of potential faults is nearly endless even for this small
calculation. This indicates that the software analysis is being conducted either at too low a level
or with inappropriate ‘failure’ modes.
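A minimal Python sketch of this example may help. The correct subroutine and two representative faulty variants (both invented here purely for illustration) show how even a trivial calculation admits many distinct faults that all manifest as "wrong output":

```python
def max_of_three(a, b, c):
    """Intended behaviour: return the highest of three numbers."""
    return max(a, b, c)

# Representative coding faults of the kind discussed above (illustrative only).
def max_of_three_returns_lowest(a, b, c):
    # Fault: the comparison logic is inverted, returning the lowest value.
    return min(a, b, c)

def max_of_three_truncates(a, b, c):
    # Fault: an intermediate calculation uses a mismatched (integer) type,
    # silently discarding the fractional part of the inputs.
    return max(int(a), int(b), int(c))
```

Enumerating every such fault is hopeless; the practical consequence, as argued below, is to treat the output as capable of taking any value in the worst case.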
Approaching potential faults from another direction, it is always possible to construct faults that
result in the output taking any value possible. As such, it appears to make sense for a software
FMEA to assume that a fault will result in the worst case for the output of a function. Extrapolating
this result to larger software functions still results in potential faults that cause the outputs of this
larger function to take on any value possible.
Since the potential effects on the vehicle clearly depend on the system functionality and its ability
to influence vehicle operation via actuators, communication buses, etc., it appears to make sense
to consider software faults at the high system architecture level rather than at the detailed
software level. This leads to the conclusion that system level FMEA (or ETA) methods that include
software at the architectural level, together with additional process and product requirements
resulting from a very generic software FMEA (see the following chapter), can provide the vast
majority of the value obtainable from performing a very detailed, product-specific software FMEA.
Other examples in the following chapters also suggest that recommendations developed from a
detailed automotive software FMEA tend to be similar, independent of the actual product being
developed; the recommendations would be the same whether the product is a windscreen wiper,
a heated rear window or a steering system. Furthermore, the methods for resolving the potential
issues that arise are well documented in numerous process standards, coding standards and
other types of documentation already available.
It is, however, recognized that a high level analysis cannot identify all points where the corruption
of individual variables may couple through the software system to result in critical failures. The
effectiveness of this approach is discussed further in the safety FMEA chapter. The cost of the
various analyses should also be taken into account: difficult or costly analyses should generally
be limited to safety critical systems. Analysis methods at the system level are likely to be
considerably more cost effective to create and maintain than those at a detailed software level, a
conclusion also reached in reference [20].
The system level FMEA or equivalent should include typical types of microprocessor faults such as
execution errors (see ref [20]) and the effects of critical variables being corrupted.
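The effect of a corrupted critical variable can, for instance, be made detectable by storing the variable redundantly together with its bitwise complement, a common defensive technique. The Python sketch below is purely illustrative (the 16-bit width, the complement check and all names are assumptions, not EASIS requirements):

```python
class ProtectedVariable:
    """Critical variable stored with its bitwise complement so that
    corruption of either copy is detectable on read (illustrative sketch)."""

    def __init__(self, value=0):
        self.set(value)

    def set(self, value):
        # Store the value and its 16-bit complement as the check copy.
        self._value = value & 0xFFFF
        self._check = ~value & 0xFFFF

    def get(self):
        # A healthy pair XORs to all-ones; anything else signals corruption.
        if (self._value ^ self._check) != 0xFFFF:
            raise RuntimeError("critical variable corrupted")
        return self._value
```

A system level FMEA row for a critical variable can then assume the corruption happens and assess whether such a mechanism detects it before the value reaches an actuator.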

C.6.2 The Different FMEA Methods and Formats

There are a number of FMEA methods and formats used in the development of automotive
systems that could be applied to the software development.
Some examples include:
- Process FMEA
- Component FMEA
- Safety FMEA

In all of these FMEA methods there are values assigned for severity, detection and occurrence that
can have different meanings for the different types or styles of FMEA.

- In a process FMEA, the method often involves consideration of what could happen in
the manufacturing (development) process, what effect this could have, how likely this
may be, and whether measures are in place to detect any issues that occur so that
they can be corrected.

- In a component FMEA, the method often involves consideration of what could happen
in the finished product (or component), what effect this could have, how this would be
detected in the development process (and field) and what is the probability that this may
occur.

- In a safety FMEA, the method often assumes that particular faults will occur and then
considers the effects of this occurrence in the field. Only if the effects are considered
unacceptable is the probability of the occurrence assessed, to determine whether the
system (or component) meets acceptable safety standards.

In all of these FMEA types it is usually necessary to define clearly whether the severity of an
occurrence is analysed assuming that a fault detection and correction feature (in software,
hardware or another mechanism) has worked as designed. For example, if an engine controller
detects a sensor error and switches to a limp-home function that allows the engine to keep
running, the severity is usually considered lower than if the engine stops running
altogether.
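For illustration, the severity, occurrence and detection ratings mentioned above are conventionally combined into a risk priority number (RPN = S × O × D). The sketch below uses assumed 1-10 scales and invented ratings for the limp-home example; the scales and thresholds actually used are organisation- and project-specific:

```python
def risk_priority_number(severity, occurrence, detection):
    """Conventional FMEA risk priority number.

    Each rating is assumed here to be on a 1-10 scale, 10 being worst
    (for detection, 10 means the fault is least likely to be detected).
    """
    for rating in (severity, occurrence, detection):
        if not 1 <= rating <= 10:
            raise ValueError("FMEA ratings are conventionally 1-10")
    return severity * occurrence * detection

# Limp-home example: a detected sensor fault with a safe degraded mode
# scores far lower than an undetected fault that stops the engine.
rpn_limp_home = risk_priority_number(severity=5, occurrence=3, detection=2)
rpn_engine_stop = risk_priority_number(severity=8, occurrence=3, detection=7)
```

The different FMEA styles listed above differ mainly in how these three ratings are interpreted, not in the arithmetic.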
These various types of FMEA can be applied to the software and development process and
product. The following chapters discuss potential results of such analysis methods related to
software development.

C.6.2.1 Process FMEA applied to Software

There are a number of stages in the software development process where faults could be
introduced into the production product. Analysis of the software development process will usually
identify these possibilities. For example faults may be introduced during the following development
stages:
- Requirements
- Specification
- Design
- Implementation & Coding

Analysis of the types of faults that can be introduced, using a process FMEA, typically results in
the need for appropriate verification and validation steps in the software development process,
covering all stages of development.
In the case of software development there is already a large number of comprehensive
recommendations for software development processes available, together with assessment
reference models (for example, SPICE and CMM(I) [26]). The developer may therefore ask
whether a software process FMEA is necessary, or whether it is more effective to align with or
tailor one of the more recognized and emerging standards, combined with regular assessments
designed to uncover weak points in the development process and product.


C.6.2.2 Component FMEA applied to Software

When reviewing the fault types in a software component FMEA, the large number of possible
fault types and the difficulty of assessing their effects suggest reducing the failure modes to two
categories:
- The software output or operation is correct
- The software output or operation is incorrect (for example too high or too low)

This can be justified by considering a sensor input signal represented within the software as a
16-bit signed variable. If the variable is higher than the correct value at a given point in time, this
may have been caused by a signal overflow, a wrong gain factor, a wrong offset value or many
other fault types; the number of potential causes is almost unlimited.
The effects of particular fault types can also be reviewed. Here, the EASIS document
"Description of Fault Types for EASIS" (see reference [18]) was used and equivalent faults were
applied to the software. The analysis suggests that the effects of such faults may be
unpredictable in the worst case. For example, if a software module writes results to the wrong
address (using a wrong pointer), potentially any variable could become too high or too low, and it
is practically impossible to predict what effect this may have on the system for all
variables and contents.
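The point that many distinct root causes collapse into the single observable failure mode "value incorrect" can be sketched as follows; the fault injections and numeric values are invented for illustration (`ctypes.c_int16` is used only to mimic 16-bit signed wrap-around):

```python
import ctypes

CORRECT_VALUE = 1000  # raw sensor reading expected at this point in time

def apply_gain_fault(raw):
    # Wrong gain factor: the reading is scaled by an erroneous constant.
    return raw * 2

def apply_offset_fault(raw):
    # Wrong offset value added to the reading.
    return raw + 500

def apply_overflow_fault(raw):
    # 16-bit signed wrap-around: a large intermediate result overflows,
    # so the sign of the error is not even predictable in general.
    return ctypes.c_int16(raw + 40000).value

faulty_values = {
    "wrong gain": apply_gain_fault(CORRECT_VALUE),
    "wrong offset": apply_offset_fault(CORRECT_VALUE),
    "overflow": apply_overflow_fault(CORRECT_VALUE),
}
```

Three unrelated causes, one observable symptom: the variable no longer holds the correct value, which is all a component FMEA can usefully record.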
The developer may again ask whether a software component FMEA is necessary, or whether it is
more effective to align with one of the more recognized and emerging standards, combined with
regular assessments designed to uncover weak points in the development process and product.

C.6.2.3 Safety FMEA applied to Software

Safety FMEAs are usually concerned only with analysing the safety aspects of a system. The
severity scale is weighted more heavily towards safety criteria, with increased resolution for
safety-relevant effects. For example, a system switching into a safe degraded mode of operation
with a warning lamp may be rated 5 from a safety viewpoint, whilst in other FMEAs the severity
for a warning lamp may be 8 or 9. Priority is then given to analysing cases with potentially high
severity.
FMEAs performed to support safety evaluation of products are focused on examining the product
for its immunity to single point failure events. That is, single point failures that could result in a
safety event must be shown to be eliminated from the design. That is typically extended to include
those dual point failure events which include an initial failure that is not detectable by the hardware
and/or software of the system. Safety related FMEAs have included system, software, and
hardware. These FMEAs are usually separate, with software FMEAs being performed both at the
system and detailed levels.
Software FMEAs applied to a product for evaluation of safety aspects of the system have been
shown to be of value. However, they are both expensive and time consuming at the detailed level.
Incorporation of the system level software FMEA of reference [20] into a combined hardware and
software system level FMEA may be advantageous both technically and for cost reasons. Most of
the value provided by software FMEAs is provided by the system level evaluation. However,
evaluation only at the system level cannot completely ensure that single point failure paths are not
included in the system and software design.
Verification of immunity from single point failures due to corruption of single variables has been
one of the crucial outputs of detailed software FMEA. These FMEAs, while requiring significant
resource commitments, have been feasible for many safety critical designs, including electronic
steering systems by various tier 1 suppliers. Completion of a detailed SW FMEA in a time frame
that supports the design process has been contingent on the use of a traditional ‘waterfall’ type
development process, a well structured design, and a comparatively small software size (less
than 128K compiled).


Upcoming designs appear unlikely to routinely exhibit these properties. The increasing size of
automotive software, coupled with the shift to a spiral development process, limits the
effectiveness of current methods of detailed software FMEA. Completing these FMEAs in a time
frame that supports design appears problematic, which makes it difficult to recommend the
method except for relatively small, safety critical systems. Automated aids may eventually extend
the method to the larger systems now being developed in the automotive industry, but such aids
do not currently exist.

C.6.3 Conclusion on application of FMEA to software

The EASIS consortium concludes that, whilst a software FMEA may have benefits in automotive
software development, a low level, detailed software FMEA cannot generally be completed in a
time frame that adequately supports the spiral design process, which is becoming standard in the
automotive industry. Process-type considerations are generally well covered by the many
comprehensive process standards, coding standards and other types of documentation already
available.
For a particular product, a system level FMEA including key components of the software
architecture is likely to provide all of the value of the current system level software FMEA and
much of the value of the detailed software FMEA when used in conjunction with numerous, existing
process and product recommendations. The use of detailed software FMEAs cannot be
recommended at this time except in limited cases. The current state of technology for detailed
software FMEAs does not allow cost effective and timely completion of the analyses when a spiral
(incremental) development process is in use.
A system level product and process approach potentially offers equivalent value, available
earlier, with fewer resources, and is more practical to maintain. The system level analysis (FMEA
or equivalent) should include typical types of microprocessor hardware faults, such as execution
errors (see reference [20]), and the effects of critical software variables being corrupted. The
development process must include validation.

C.6.4 Software Fault Tree Analysis

A general overview of Fault Tree Analysis (FTA) is given in section C.4.3.2 of this appendix. The
idea of applying the FTA technique specifically to the software has been a topic of some interest in
the academic and industrial communities for some time. However, a number of observations
indicate that Software FTA may have significant limitations for automotive software. Some of these
observations are related to automotive-specific characteristics of software development, whereas
others are concerned with the software FTA method itself. These observations are elaborated in
the following subsections.

C.6.4.1 Software for automotive applications

Short time to market, limited cost and a high degree of configurability are all essential features of
automotive software. These features, combined with the need to reuse, combine and improve
different versions and modules of pre-developed software, have led to a change in the way
automotive software is created: model-based techniques are now increasingly used.
With the model-based approach, software development can be considered a chain of activities
starting with the specification, passing through modelling and its validation, and ending with
automated code generation. Errors occurring in these activities are clearly not detectable by a
deductive analysis technique such as FTA. In practice, this problem is primarily addressed by
introducing appropriate verification and validation steps into the software development process,
covering all stages of development.


The model-based approach, widely used in the automotive domain, seems to discourage detailed
software FTAs in favour of system level FTAs. At the system level, the efficiency of traditional
FTAs conducted on the physical system components can be improved, when necessary, by
considering the software components described by the software model. In other words, a specific
deviation from the intended service of a specific software component can be treated as a basic
event in a system level FTA. Such an FTA then includes both
hardware-related and software-related basic events.
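As a minimal sketch (with hypothetical event names, not drawn from any EASIS document), such a mixed fault tree can be evaluated qualitatively by checking which combinations of basic events raise the top event:

```python
def top_event_occurs(events):
    """Qualitative evaluation of a small illustrative fault tree.

    Top event: "unintended actuator activation"
      OR
      |- AND
      |   |- "sensor stuck-at-high"         (hardware basic event)
      |   |- "plausibility check omitted"   (software basic event)
      |- "output driver short-circuit"      (hardware basic event)
    """
    return ("output driver short-circuit" in events) or (
        "sensor stuck-at-high" in events
        and "plausibility check omitted" in events
    )
```

Note that the software basic event only contributes a yes/no answer; in line with the conclusions below, no failure rate is attached to it.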

C.6.4.2 The FTA Approach

When applying FTA to software, and in particular to automotive software, some of the main
strengths of FTA are lost:
• FTA is not suitable for assessing the reliability of the software, because this property cannot
be expressed as a top event. This is a very important limitation, as software reliability has a
high impact on both the safety and the availability of a system.
• The software architectures used in the automotive field are mainly simplex; no redundancy is
provided except for monitoring and recovery functions. This means that the FTA output would
be a simple list of elementary events without any combination of events (AND gates).
• Only a very limited class of failures originating from software can be considered
“deterministic” and hence manageable by an analytical approach.
Consequently, the limited results expected from an FTA on a single software top event are
generally considered disproportionate to the resources and time needed to assess the
thousands of code lines typically related to a single top event.

C.6.4.3 Calibration

As already mentioned, automotive software is conceived and organized to be used in more than
one application or in multiple variants of the same application.
As an example, consider the Electronic Stability Program (ESP) function often found in vehicles
of the medium and large classes. From the OEM point of view, the management of the ESP
function can be quite complex; the ESP may be optional on a particular vehicle model, or may
have different characteristics for different vehicle models (small/large sedan, station wagon, van,
sport, etc.).
To take all these variations into account, the software must be designed to allow sets of
parameters, usually called calibrations, to be changed in order to tailor the software to each
vehicle model variant. The final calibration typically takes place quite late in the product
development lifecycle.
In many cases the calibration process involves tuning a large number of parameters for which
only general guidelines can be created. The calibration is thus a potential source of errors whose
effects cannot be efficiently predicted by analytical assessment.
The high potential for calibration-related problems, indicated by field failure warranty data,
combined with the low efficiency of analytical methods, has encouraged the search for alternative
software assessment methods based on testing and process quality checks.

C.6.4.4 Conclusions on software FTA

At the current state of technology, detailed software FTAs are considered incompatible with the
development processes used in the automotive sector. However, software faults can be accounted
for in system-level FTAs by exploiting model based techniques where software is described in
terms of interacting “components”. In any case, the contribution of potential software defects to the
occurrence of system hazards can only be assessed qualitatively, not quantitatively.


In addition to system level FTAs, the primary means for hazard occurrence analysis with respect to
software should be an assessment of the processes for software development and quality
management.

C.6.5 System Software Block Failure Analysis

This section describes an analysis method that is particularly suited to the early phases of system
and software development, when only outline information is available as an input to the analysis. It
requires, in terms of personnel, the technical know-how of system engineers, hardware engineers
and software engineers, with knowledge of the application. In terms of documentary input, it
requires only a system block diagram – the more information this has, the more useful will be the
process output.

C.6.5.1 Overview

The primary goal of the analysis method is to determine whether any hazardous outputs exist at
the system block level. For the purposes of this analysis, the system block design includes both
hardware and software blocks, and any blocks of functionality that will need to be present in the
final design, but have not yet been allocated to either hardware or software.
The method can be used either for system analysis, where the whole system block diagram is
analysed, or for software analysis, where only the system blocks allocated to software are
analysed. The former is preferred, for two main reasons.
First, there may yet exist some blocks that are not allocated to either hardware or software. These
may not be properly analysed if only hardware or software blocks are studied.
Second (and possibly more significant), there are occasions when hardware functional blocks can
mitigate software failures, and vice versa. If the whole system is not analysed together, these
synergies may be overlooked.

C.6.5.2 Process description

The method comprises six basic steps, as follows.


1 Identify Functions
– in which all functions are identified, labelled and enumerated from the System Block Diagram.
2 Identify System Operating Scenarios
– in which any way in which the system may be used or abused is listed, whether part of the
original design or not.
3 Identify Conceivable Function Failures
– in which it is assumed that if it can go wrong, it will go wrong. No assignment of failure rates
or probabilities is performed. This is principally a brainstorm, although prior experience will
lead to an initial list very quickly.
4 Classify Failures
– in which the effect and severity of each failure is determined, assuming no mitigating action
occurs.
5 Identify Mitigating Mechanisms
– in which any actions and mechanisms (software, hardware, operating instructions, etc.) that
will effectively mitigate each failure effect are proposed, and their independence assessed.
All and any mechanisms, including (but not limited to) redundancy, plausibility checking and
range limitation, should be considered.
At this stage the System Block Diagram is colour coded to show the presence of mitigated
and unmitigated failures within the functional blocks at the system level.
6 Identify Residual Risk
– in which the areas where risk remains after applying the identified mitigating mechanisms to
the system design are identified. Corrective actions are proposed to reduce the risk from
those areas. Examples of corrective actions include the addition of a mitigating function and,
for software, specific checks on a particular variable.
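As an illustration only (the record fields, colour scheme and example failures are assumptions, not part of the method definition), the failure records of steps 3-6 and the colour coding mentioned under step 5 might be captured as:

```python
from dataclasses import dataclass, field

@dataclass
class FunctionFailure:
    """One conceivable failure of a block function (steps 3-4)."""
    function: str
    description: str
    severity: str                                    # classified assuming no mitigation
    mitigations: list = field(default_factory=list)  # step 5

    @property
    def residual_risk(self):
        # Step 6: any failure without an effective mitigation remains a risk.
        return not self.mitigations

def colour_code(failures):
    """Colour each function block, as on the annotated System Block Diagram:
    red if any failure of the block is unmitigated, green otherwise."""
    colours = {}
    for f in failures:
        if f.residual_risk:
            colours[f.function] = "red"
        else:
            colours.setdefault(f.function, "green")
    return colours
```

A "red" block then triggers the corrective actions of step 6, such as adding a mitigating function or a specific check on a variable.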
It will be noted that the activities recommended for inclusion in this analysis method cover several
areas of the EASIS Dependability Framework. Specifically, steps 1 & 2 are basic preparation work
for the subsequent steps; steps 3 & 4 are part of ‘Hazard Identification and Classification’ (covered
in more detail in Appendix B); step 5 is part of the process of deducing Dependability
Requirements (for more of which please see Appendix D); and step 6 is split across other areas of
this appendix (residual risk) and Appendix D (corrective actions leading potentially to further
requirements).
In terms of documentation, the process takes as its input a document that exists within any
state-of-the-art automotive development process. As its output it produces, on the one hand,
modifications and annotations to the system block diagram and, on the other hand, corrective
actions, mitigations and design artefacts, all of which can be fed into the change control
mechanisms and design processes of the same development process.

C.6.5.3 Advantages and disadvantages of the method

This analysis method effectively uncovers shortcomings in the system design that can be simply
overcome by the addition of functionality (software or hardware) or other system modifications
(changes to the manual or service procedures), at a stage in the lifecycle when such addition and
change can be readily planned into the overall level of work. Similarly, it can highlight
shortcomings that can be mitigated in hardware at a stage before the hardware is finalised.
However, if the method is used before the System Block Diagram is properly stable, unnecessary
mitigations and corrective actions may be requested, leading to unnecessary cost. Additionally,
important mitigations and corrective actions may be overlooked, leading to less safe designs. This
disadvantage is easily overcome by ensuring the System Block Diagram is at an appropriate level
of maturity before attempting to apply the process.


C.7 References

[1] IST Program EASIS project, Technical Annex, Description of work, 2003.
[2] MIL-HDBK-217F. Reliability Prediction of Electronic Equipment, US
Department of Defense, 1991
[3] NPRD-95. Non electronic Parts Reliability Data, Reliability Analysis Center
(RAC), 1995
[4] IEC 60812, Analysis techniques for system reliability - Procedure for failure
mode and effects analysis (FMEA), IEC, 1985
[5] IEC 61025, Fault tree analysis (FTA), IEC, 1990
[6] BS 5760-5, Reliability of systems, equipment and components. Guide to
failure modes, effects and criticality analysis (FMEA and FMECA), British
Standards Institution, 1991.
[7] MIL-STD-1629A, Procedures for performing a failure mode, effects and
criticality analysis, U.S. Department of Defense, 1980.
[8] Def Stan 00-41, Reliability and maintainability: MoD guide to practices and
procedures, Issue 3, UK Ministry of Defence, 1993
[9] SAE J 1739, Potential Failure Mode and Effects Analysis in Design (Design
FMEA), Potential Failure Mode and Effects Analysis in Manufacturing and
Assembly Processes (Process FMEA), and Potential Failure Mode and
Effects Analysis for Machinery (Machinery FMEA), Surface Vehicle
Recommended Practice, SAE International, Rev. August 2002
[10] Def Stan 00-58, HAZOP studies on systems containing programmable
electronics, Issue 2, UK Ministry of Defence, 2000.
[11] F. Redmill, M. Chudleigh, and J. Catmur, System Safety: HAZOP and
Software HAZOP, John Wiley and Sons, 1999, ISBN 0-471-98280-6.
[12] H Raafat, Risk Assessment Methodologies, University of Portsmouth, ISBN
1 069959434
[13] L.M. Ridley and J.D. Andrews. “Application of the Cause-Consequence
Diagram Method to Static Systems”, Reliability Engineering and System
Safety vol. 75, no. 1, Jan. 2002, pp. 47-58(12)
[14] J. Suokas and V. Rouhiainen, Quality Management of Safety and Risk
Analysis. Elsevier Science Publishers B.V, 1993.
[15] N. Siu. “Risk Assessment for dynamic systems : An overview.” Reliability
Engineering and System Safety, Vol 43, 1994, pp. 43-73.
[16] M. E. Paté-Cornell. “Risk Analysis and Risk Management for Offshore
Platforms: Lessons from the Piper Alpha Accident”, Journal of Offshore
Mechanics and Arctic Engineering, Vol. 115, Aug 1993, pp. 179-190.
[17] C. Acosta and N. Siu, “Dynamic event trees in accident sequence analysis:
application to steam generator tube rupture”, Reliability Engineering and
System Safety, Vol 41, 1993, pp. 135-154.
[18] IST Program EASIS project, “Description of Fault Types for EASIS”, 2005
[19] Guidelines for the Use of the C Language in Vehicle Based Software,
MISRA, 1998
[20] P. Goddard. “Software FMEA Techniques”. IEEE Reliability and
Maintainability Symposium (RAMS), 2000, pp. 118-123.


[21] B. Randell, “System Reliability and Structuring”, in Computing Systems
Reliability, Cambridge University Press, 1979, pp. 1-18.
[22] K. Echtle, Fehlertoleranzverfahren, Springer-Verlag 1990.
[23] Deliverable D1.2, EASIS consortium 2005.
[24] ARP4761: Guidelines and methods for conducting the safety assessment
process on civil airborne systems and equipment, SAE Committee S-18,
Society of Automotive Engineers, Inc., August 1995
[25] ARP 4754: Certification Considerations for Highly-Integrated or Complex
Aircraft Systems, Society of Automotive Engineers, 1996.
[26] ISO/IEC TR 15504, Software Process Assessment, ISO/IEC, 1998.
[27] IEC 61508, Functional safety of electrical/electronic/programmable electronic
safety-related systems, IEC, 1998.
[28] P. Humphreyes and B.D. Johnston. Dependent Failure Procedure Guide,
SRD-R-418, United Kingdom Atomic Energy Authority, Safety and Reliability
Directorate. March 1987
[29] C. Jung, Stand des ISO-Standards zur Funktionalen Sicherheit für die
Automobilindustrie, (mainly in English), Presentation at Safetronic 2005
[30] SETTA project, Systems Engineering for Time Triggered Architectures, IST
Contract 10043, Final Document, 18 April 2002
[31] O. Bridal, Issues in the Design and Analysis of Dependable Distributed Real-
Time Systems, PhD dissertation, Department of Computer Engineering,
Chalmers University of Technology, 1997, pp. 13-18

EASIS Deliverable D3.2 Part 1 - App D

Deliverable D3.2 Part 1 – Appendix D

Establishment of dependability-related requirements

Version number: 2.0

Date of preparation: 14.11.2006

© 2006 The EASIS Consortium



Table of contents

D.1 Introduction.....................................................................................................D-1
D.2 Overview of requirement types.......................................................................D-3
D.2.1 Requirement hierarchy example .........................................................D-5
D.2.2 Ideal properties of requirements .........................................................D-6
D.2.3 Relationships among the Requirement Types ....................................D-6
D.2.4 The relationship between Hazard Criticality and Dependability
Requirements......................................................................................D-7
D.3 Requirement types .......................................................................................D-11
D.3.1 Hazard probability requirements .......................................................D-11
D.3.2 Fault tolerance and functional degradation requirements.................D-19
D.3.3 Requirements on system architecture ..............................................D-45
D.3.4 Requirements on specific error detection mechanisms and
corresponding reactions....................................................................D-46
D.3.5 Quantitative requirements for hardware architecture........................D-53
D.3.6 Requirements on the avoidance of non-systematic faults ................D-58
D.3.7 Critical functional requirements.........................................................D-59
D.3.8 Requirements for functional limitations .............................................D-61
D.3.9 Requirements on the design process ...............................................D-63
D.3.10 Requirements on Isolation and Diversity ..........................................D-69
D.3.11 Requirements on the manufacturing process ...................................D-71
D.3.12 Requirements for systems external to the system of concern ..........D-72
D.3.13 Requirements on user manual and service manual..........................D-77
D.4 References ...................................................................................................D-80


D.1 Introduction

In this appendix, the issue of how to determine and specify appropriate dependability-related
requirements for an Integrated Safety System is addressed. Methods and analysis techniques to
identify dependability requirements are discussed, covering the entire process from high-level
implementation-independent requirements to low-level implementation-dependent design
requirements.
The appendix is organised according to different requirement types. For each such type, the
following aspects are addressed:
• How to determine suitable requirements of this type for a given system, including how to
validate that the requirements are appropriate
• The characteristics of the requirement type in terms of its meaning, expressive power, how
to formulate the requirements, limitations as well as specific difficulties associated with
defining requirements of this type
• Relationship between requirements at different levels of detail and relationship to other
types of requirements with respect to the decomposition of requirements from higher levels
to lower levels (for example from system level to subsystem and component levels)
• Examples of requirements of this type
• Verification issues
Figure D.1 shows the place of the requirements activity within the dependability activity framework
that is defined in Appendix A section A.3 (“The EASIS dependability activity framework”). As
implied by the Figure, the inputs to the establishment of dependability-related requirements are:
• A list of hazards that have been classified with respect to criticality
• A description of the relationship between hazards and their causes, including qualitative
(cause-effect) and quantitative (probabilistic) analysis of hazard occurrence
• Functional and physical descriptions of the system of concern, in more or less detail
depending on the current development phase (e.g. conceptual, early or late phase)
Since the requirements obviously affect the system design, and since the analysis performed on
the design can generate additional requirements, the process is iterative; the figure should
therefore not be perceived as a fixed sequence of development steps.
It is important to understand that requirements concerning the overall approach to dependability are
not within the scope of this appendix. Overall process issues are instead dealt with in EASIS
deliverable D3.2 as a whole and particularly in the definition of the dependability activity framework
(see Appendix A).

14.11.2006 2.0 D-1


EASIS Deliverable D3.2 Part 1 - App D

[Figure: activity flow. "Identification of hazards" feeds both "Classification of hazards" and
"Hazard occurrence analysis"; their results feed "Establishment of dependability-related
requirements", which interacts with "Development and design of the integrated safety system";
"Verification and validation of dependability-related requirements" follows, and the activities
together support "Safety case construction".]
Figure D.1 EASIS dependability activity framework


D.2 Overview of requirement types

The requirement types investigated in this document are briefly presented below. It is important to
understand that the main purpose of defining these types is to facilitate a logical structuring of the
document. As each requirement type has its own characteristics and its own considerations
concerning how to determine, formulate and verify the requirements, these types form the basis for
the structure of this appendix.
Hazard probability requirements
Requirements on the tolerable probability of each potential hazard can be considered as
the most fundamental type of safety requirement. A system that has a sufficiently low
probability of creating hazards is in principle fit for its intended use with respect to safety,
regardless of whether it fulfils all of its other dependability-related requirements.
Conversely, if the hazard probabilities are above the tolerable limit, the system is not
acceptable regardless of whether or not it fulfils other dependability-related requirements.
Fault tolerance requirements
As faults are the root causes of failures, it is desirable to break the cause-consequence
chain between faults and failures. At the very least, the resulting failure mode (at the user-
perceivable level) should be as safe as possible. Fault tolerance requirements describe the
allowed relationships between faults and failures in terms of the required degree of
functionality when a given fault (for example "any single fault") or combination of faults
exists.
In its purest form, fault tolerance means that full functionality shall be provided by the
system when a given fault (or combination of faults) is present. For the purpose of this
document, however, this requirement type is not limited to such pure fault tolerance.
Functional degradation is also included. It is often sufficient that the system of concern
enters a degraded operation mode (for example a failsafe mode) in response to an existing
fault, rather than maintaining full functionality.
Fault tolerance requirements can be decomposed into more detailed requirements on the
mechanisms to be implemented for achieving fault tolerance.
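As an illustration of how a decomposed fault tolerance requirement of the form "if fault F exists, the system shall provide at least functionality level M" might be captured, the following sketch maps detected faults to a required degree of functionality. The fault names, modes and mapping are purely hypothetical, not taken from any real design:

```python
from enum import Enum

class Mode(Enum):
    FULL = "full functionality"
    DEGRADED = "degraded operation (failsafe)"
    OFF = "function deactivated"

# Hypothetical fault classes and the degree of functionality that the
# corresponding fault tolerance requirement still demands when the
# fault is present.
REQUIRED_MODE = {
    "sensor_redundant_channel_lost": Mode.DEGRADED,  # single fault tolerated
    "actuator_driver_fault": Mode.OFF,               # fail-safe shutdown
}

def select_mode(detected_faults):
    """Return the most restrictive mode demanded by the detected faults."""
    mode = Mode.FULL
    order = [Mode.FULL, Mode.DEGRADED, Mode.OFF]
    for fault in detected_faults:
        # An unknown fault is treated conservatively: fail safe.
        required = REQUIRED_MODE.get(fault, Mode.OFF)
        if order.index(required) > order.index(mode):
            mode = required
    return mode
```

With no faults present the system stays in full functionality; any combination of faults yields the most restrictive of the individually required modes.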
Requirements on system architecture
Based on the findings of dependability analyses, it is often appropriate to specify
requirements on the system architecture. For example, requirements on the existence and
utilisation of redundancy may be specified.
Requirements on specific error detection mechanisms and corresponding reactions
As a result of the requirements analysis process, specific dependability mechanisms can
be identified. Such mechanisms typically involve an error detection method and a
corresponding reaction. The reaction typically includes a change of operation mode (for
example a transition to a safe state), information to the driver and the storage of a DTC
(Diagnostic Trouble Code). Descriptions of these mechanisms can be provided to the
hardware and software designers for implementation.
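A minimal sketch of such a mechanism, pairing an error detection method (here a plausibility range check on a hypothetical pedal signal) with its specified reaction (mode change, driver information, DTC storage), might look as follows. The signal name, range and DTC identifier are invented for illustration:

```python
def check_pedal_signal(raw_value, valid_range=(0.0, 100.0)):
    """Plausibility (range) check on a sensor signal.

    Returns (ok, reaction). The reaction is the specified response to a
    detected error: a transition to a safe state, driver information,
    and a Diagnostic Trouble Code to be stored.
    """
    low, high = valid_range
    if low <= raw_value <= high:
        return True, None
    # Error detected: the corresponding reaction, as it would be handed
    # to the hardware and software designers for implementation.
    reaction = {
        "mode": "safe_state",           # e.g. deactivate the function
        "driver_info": "warning_lamp",  # inform the driver
        "dtc": "P_PEDAL_RANGE",         # hypothetical DTC identifier
    }
    return False, reaction
```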
Quantitative requirements for HW architecture
Although hazard probability requirements represent the most obvious quantitative-
probabilistic requirement type, other types of quantitative requirements are also addressed
in existing and upcoming standards. These deal with metrics such as Safe Failure Fraction,
Dangerous Failure Coverage and Monitoring Coverage. In short, these metrics describe the
proportion of faults that lead to a particular outcome. For example the Safe Failure Fraction
is the rate of faults that do not directly lead to a dangerous failure mode divided by the total
failure rate.
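Using the simplified definition above, and splitting the total failure rate into safe, dangerous-detected and dangerous-undetected parts (following common IEC 61508 usage), the metric can be computed as in the following sketch. The example rates in the comment are invented:

```python
def safe_failure_fraction(lambda_safe, lambda_dd, lambda_du):
    """Safe Failure Fraction: the share of the total failure rate that
    does not directly lead to a dangerous failure mode, i.e. safe
    failures plus dangerous-but-detected failures.

    All rates must use the same unit (e.g. failures per hour, or FIT).
    Example: lambda_safe=60, lambda_dd=30, lambda_du=10 FIT gives
    SFF = (60 + 30) / 100 = 0.9.
    """
    total = lambda_safe + lambda_dd + lambda_du
    return (lambda_safe + lambda_dd) / total
```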


Fault avoidance requirements


When it has been found that a particular fault can contribute to the occurrence of a hazard,
there are essentially three ways to handle this potential problem. Firstly, a mechanism for
breaking the cause-effect chain from the fault to the hazard can be defined and
implemented. Secondly, the situation can be accepted "as is", particularly if the fault is
considered very improbable and/or if the hazard is not highly critical. Thirdly, the fault can
be prevented from occurring in the first place. Fault avoidance requirements relate to the
third of these alternatives.
With respect to design faults, fault avoidance is covered by requirements on the design
process (see below). Similarly, the avoidance of faults introduced in the manufacturing of
hardware is covered by requirements on the manufacturing process (see below). That
leaves the fault avoidance with respect to random hardware faults as the sole topic of this
requirement type. Such fault avoidance is mainly a matter of choosing high-quality
components, achieving a high immunity to EMI (Electro-Magnetic Interference) and paying
attention to thermal and other environmental constraints. This is mainly outside our scope
and this requirement type is only superficially addressed in this document.
Critical functional requirements
The results of a hazard occurrence analysis often show that if a particular software
module violates a particular functional requirement for that module, a hazard may occur. If
so, it is obviously highly desirable that this particular software design fault is avoided. In
general, it is not feasible to prove that an entire distributed system fulfils its specification
completely but proving that a specific requirement is fulfilled by a specific software module
is within the reach of today’s technology, for example by formal methods. Thus, it makes
sense to introduce a requirement type that represents critical functional requirements. Such
requirements can be considered as functional requirements that demand a stricter
verification procedure than usual.
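As a toy illustration of discharging such a verification obligation, the following sketch exhaustively checks an invented next-state function of a cruise control module against the critical condition "if the driver is braking, CC shall be deactivated". For real software modules, a formal verification tool would take the place of the enumeration; the module logic here is purely hypothetical:

```python
from itertools import product

def cc_next_state(cc_active, brake_pressed, set_request):
    """Simplified next-state function of a cruise control module
    (illustrative only -- not taken from any real design)."""
    if brake_pressed:
        return False   # deactivation dominates all other requests
    if set_request:
        return True
    return cc_active

def verify_critical_requirement():
    """Check the critical functional requirement over the (tiny)
    boolean input space: whenever the driver is braking, the next
    state must be 'CC deactivated'."""
    for cc, brake, req in product([False, True], repeat=3):
        if brake and cc_next_state(cc, brake, req):
            return False  # counterexample found
    return True
```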
Requirements for functional limitations
In order to limit the criticality of a given failure, functional limitations may be imposed. These
limitations may affect the duration or the magnitude, or both, of some action performed by
the system. If it is known that unintended activation of a function has a non-zero probability,
it may be a good idea to impose such limitations.
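A functional limitation on both magnitude and duration can be sketched as a small limiter placed between the requesting function and the actuator, so that even an unintended activation cannot exceed given bounds. The limits below are illustrative numbers, not taken from any real system:

```python
class ActuationLimiter:
    """Limit both the magnitude and the cumulative duration of an
    actuation request (illustrative bounds)."""

    def __init__(self, max_magnitude=30.0, max_duration_s=2.0):
        self.max_magnitude = max_magnitude
        self.max_duration_s = max_duration_s
        self._active_time = 0.0

    def limit(self, request, dt_s):
        """Return the request clipped to the allowed magnitude, or 0.0
        once the activation time budget is exhausted."""
        if request == 0.0:
            self._active_time = 0.0   # request released: reset budget
            return 0.0
        self._active_time += dt_s
        if self._active_time > self.max_duration_s:
            return 0.0                # duration limit exceeded
        return max(-self.max_magnitude, min(self.max_magnitude, request))
```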
Requirements on the design process
For the avoidance of systematic faults in the design of software and hardware,
requirements on the design process may be formulated. Such requirements specify the
techniques and methods that should be used in the design process.
Requirements for isolation and diversity
In order to avoid common-cause faults that violate the typical assumption of fault
independence, specific requirements on isolation and diversity may be formulated.
Requirements on the manufacturing process
For the avoidance of systematic faults in the hardware manufacturing, requirements on the
manufacturing process may be formulated. This requirement type is mainly outside the
scope of the EASIS project but is included here for completeness.
Requirements on systems external to the system of concern
Sometimes, a hazard associated with a system can be mitigated by another system for
which the system developer is not responsible. This aspect should not be forgotten and it is
important that the organization has a defined way of transferring requirements between
teams that work on different systems. The actual requirements on the external system can
be of any type.


Requirements on user manual and service manual


One way of mitigating the effects of a potential hazard or failure is to provide warnings,
guidance and other safety-related information in user manuals. As faults may be introduced
by inappropriate service actions during maintenance, specific instructions to service
personnel should be identified during system development and integrated into the service
manual and similar documentation.

D.2.1 Requirement hierarchy example

A hypothetical example that shows how different types of requirements may be related to each
other is given in Figure D.2. This example has been produced for the sole purpose of this
document and does not represent any existing system, so the numbers and requirements given
are purely fictional. The system under consideration is a conventional cruise control system and
the hazard considered is "inability to deactivate the cruise control when the driver is pressing the
brake pedal". Although cruise control cannot be considered an Integrated Safety System, this
example gives an idea about how requirements might relate to each other.
The figure shows how the top level probabilistic requirement is broken down into a set of lower
level design-oriented requirements, finally ending in detailed implementation-level requirements.

[Figure: requirement hierarchy for the example system/function "Cruise Control" (CC) and the
example hazard "CC is not deactivated when the driver is braking". At the top is the hazard
probability requirement: "An inability to perform the 'deactivate CC when driver brakes' function
shall not occur more often than 1E-8/h". This is broken down into: a fault tolerance requirement
covering the whole chain ("if the event chain from the brake pedal depression to the CC
deactivation does not work, for whatever reason, CC shall be deactivated"); a fault tolerance
requirement on the brake pedal monitoring hardware ("if a fault prevents the CC control unit
(CCCU) from detecting that the driver is braking, CC shall be deactivated"); a fault tolerance
requirement on the CCCU itself ("if a fault in the CCCU prevents it from detecting that the driver
is braking or from initiating a deactivation, CC shall be deactivated"); and a critical function
requirement ("it shall be, e.g. formally, verified that the CCCU software fulfils the condition
'deactivate CC when driver is braking', assuming fault-free hardware"). These are decomposed into
mechanism requirements: redundant monitoring of the brake pedal position, with CC deactivation on
inconsistency; plausibility (range, etc.) checks of the signals from the pedal monitoring
hardware; CCCU self-monitoring (ROM checksum check, RAM check, a separate software check that
deactivates CC if the condition "(CC is active) AND (the driver is braking)" is fulfilled for
50 ms, predefined self-tests at power-on or continuously, and checks of memory addressing and
execution flow); and a CCCU-external unit (watchdog or separate processor) that deactivates CC if
the CCCU does not appear to operate as it should. Each mechanism is further refined by detailed
implementation-level requirements.]
Figure D.2 Requirement hierarchy example


D.2.2 Ideal properties of requirements

Regardless of the requirement type, there are some properties that the requirements should ideally
fulfil. Some of these properties relate to individual requirements while others relate to the complete
set of requirements for a system. These properties are the following:
• Completeness: All relevant requirements shall be included. Thus, the requirements
specification should distinguish the behaviour of the desired system from that of any other,
undesired system that might be designed. It should however be kept in mind that this
EASIS document is only concerned with dependability aspects.
• Non-ambiguity: The requirements shall not be open to interpretation.
• Consistency: The requirements shall not contradict each other.
• Correctness: The requirements shall represent the desired behaviour or properties. For
each requirement, it shall be validated that the requirement is appropriately defined and
that it really provides a benefit. Here, "validation" of a requirement is taken to mean the task
of ensuring that a requirement is consistent with what is intended.
• Atomicity: Each requirement shall represent a single "designable" entity.
• Verifiability: The requirements shall be formulated in a way that makes it possible to verify
that they are fulfilled. For each requirement, a method and criteria for verification shall be
defined.
• Traceability: It shall be possible to trace between requirements at different hierarchical
levels, for example between system-level requirements and requirements on individual
components. Furthermore, tracing between every hazard and the requirements related to
this hazard shall be possible, in both directions.
It is of course not necessary that these properties hold at all times during the requirements
engineering process. However, the final requirements should preferably fulfil the listed criteria.
The use of a dedicated requirements management tool is strongly recommended. Such a tool
typically supports most of the requirements properties discussed above.
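The traceability property in particular lends itself to automated checking, which is one reason a requirements management tool is valuable. The following sketch shows, with invented hazard and requirement identifiers, the kind of bidirectional hazard-to-requirement consistency checks such a tool performs:

```python
# Hypothetical hazard and requirement records (identifiers invented).
hazards = {"H1": "CC not deactivated when driver is braking"}

requirements = {
    "REQ-1": {"text": "Hazard H1 shall not occur more often than 1e-8/h",
              "traces_to_hazards": ["H1"], "parent": None},
    "REQ-2": {"text": "Redundant monitoring of the brake pedal position",
              "traces_to_hazards": ["H1"], "parent": "REQ-1"},
}

def untraced_hazards():
    """Hazards that no requirement traces to (a traceability gap)."""
    traced = {h for r in requirements.values()
              for h in r["traces_to_hazards"]}
    return set(hazards) - traced

def dangling_traces():
    """Requirement-to-hazard links that point at unknown hazards."""
    return {(rid, h) for rid, r in requirements.items()
            for h in r["traces_to_hazards"] if h not in hazards}
```

An empty result from both checks means every hazard can be traced to at least one requirement and every trace link resolves, in both directions.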

D.2.3 Relationships among the Requirement Types

Within the list of requirement types above, there are three distinct requirement categories:
• Product Requirements These requirements will lead to functional behaviour, or will
affect the established functional behaviour, within the system under development or another
system.
Within this category are Hazard Probability, Fault Tolerance, Fault Avoidance, Quantitative
H/W, Dependability Mechanism, Critical Functional, Functional Limitation, and Separation/
Isolation/ Diversity requirements.
For requirements that affect systems other than the system under development, no
knowledge can be assumed concerning whether those requirements will be implemented.
• Process Requirements These requirements are concerned with the process used to
develop, verify or validate the system under development.
Within this category are Requirements on the Design Process, Requirements on the
Manufacturing Process and (via the Requirements on the Service Manual) Requirements on
the Maintenance Process.
• Affected Systems This category of requirements indicates the system to which the
particular requirement (be it product or process) relates. This will generally be the system
under development, and this is assumed to be the case if no system is specifically referenced.
Strictly speaking, the affected system is an attribute of a requirement rather than a requirement
type in its own right, so a genuinely orthogonal set of requirement types would not contain
separate categories for external systems. Nevertheless, such categories are listed here because
'external systems' is a useful starting point for determining dependability requirements that may
not otherwise be obvious.
Within this ‘category’ are Requirements on External Systems and Requirements on the User
Manual and Service Manual
The first two of these three categories are (at a low enough level in the requirements hierarchy)
independent of each other – requirements that affect the process are unlikely to be product
requirements. However, both product and process requirements will be fulfilled in a system, and
that system may be either the system under development or some external system.
It should be noted that the distinctions drawn here will not be true in all cases. At higher levels of
abstraction (higher up the hierarchy of requirements) the distinctions are less clear. However,
proper elaboration of the requirements will lead to atomic requirements, which will by definition
represent a single “designable” entity (see ‘Ideal Properties of Requirements’). Such a
requirement can only impact one system, and a “designable entity” will not be both a product entity
and a process entity.
Process requirements must eventually be allocated to a particular process (design,
manufacture, maintenance, etc) and step (design, implementation, verification, End-Of-Line-Test,
planned maintenance, repair, etc) in order to be implemented. Similarly, product requirements
must eventually be allocated to a particular discipline (software, hardware, mechanical,
hydraulic, etc) for implementation.
Let us consider a purely hypothetical case, beginning with the hazard “Unstable vehicle”, which
has as one of its causes “Worn nurgling valve pinion”, and as another cause “Fully extended
nurgling valve”. These hazard causes lead to the requirement “The wear on the nurgling valve
pinion shall be inspected at every standard service interval”. The only way to inspect the nurgling
valve pinion is to fully extend the nurgling valve, which of course leads to the second cause of the
hazard. It is also known that retraction of the nurgling valve during inspection can lead to injury to
the inspector. If we propose a software requirement ‘The nurgling valve shall be fully extended by
diagnostic command’, we have two apparently conflicting requirements: we cannot place a duration
limit on the full-extension activation that is both safe for the vehicle and safe (or
practical) for the inspector. As is often the case, we have conflicting requirements and we have
several possible solutions, some or all of which may be used together:
• We can reconsider the inspection method (creating maintenance process requirements).
• We can increase our confidence that the nurgling valve is never extended when it could make
the vehicle unstable (creating software functional requirements, if we decide to provide an
unlikely and unique combination of inputs with which to ‘gate’ the extension request).
• We can limit the extension of the nurgling valve under normal circumstances (creating software
functional requirements).
This example shows how a dependability requirement that has arisen from a Hazard and its
cause, can lead to requirements on both the process and the product. In fact, the process
requirement will probably also lead to a service manual requirement concerning how to prepare the
vehicle for inspection of the nurgling valve.
In conclusion, there are relationships between the requirement types listed in this document that
depend on whether they are process requirements, product requirements, or an attribute (‘system
affected’) of one of these two types of requirement.

D.2.4 The relationship between Hazard Criticality and Dependability Requirements

In order to determine the nature of the relationship between the criticality of the hazards of the
system under development and any dependability requirements deduced for those hazards, we
should first consider why we are gathering dependability requirements. And the answer we find to
this question is that we have a system which we believe can lead to certain hazards, and we wish
to protect users of the system from those hazards, or reduce their likelihood of occurrence to an
acceptable level. In other words, we want to design a dependable system – a system upon which
the user can depend.
So the second question we need to consider concerns the nature of the system we are designing.
Generally speaking, we do not start the design of our system with a completely blank sheet of
paper. We begin with a ‘black box’. We know quite a lot about the system we wish to control, and
we also have a pretty good idea of how we want to control it. We also have some idea of the
signals and sensors we have available, and what actuators we will need to control. So we know
what it is we want the black box to do, in broad terms, and we know with some degree of certainty
what the inputs to and outputs from this black box will be. Given this black box, with its inputs and
outputs, we can perform Hazard Identification and Hazard Classification to identify the hazards
associated with this black box and to make an estimate of the criticality of the hazards. (Hazard
Identification and Hazard Classification are described in Appendix B.)
Once we have determined criticality, we can start to determine dependability requirements for the
black box, and maybe the rest of the system as well. Now there is an intuitive link between the
criticality of the system (as given by the most critical hazard exhibited by the system) and the
dependability that the system will need to exhibit. If a system has only low criticality hazards it can
(by definition) do little or no harm through any direct action of its outputs or through faults on its
inputs. So it seems sensible to assume that dependability requirements will generally not be
sought in any great depth. However, if we have a system in which we have determined that failures
may very well lead to highly critical hazards, we accept that we will need to expend significant
effort ensuring that we do what is reasonably practical to avoid the situations which might lead to
those fatal consequences. Moreover, we only need to expend effort in proportion to the criticality
of the hazard when uncovering dependability requirements for each hazard.
It is important to realize that the hazards associated with a system usually have different criticality,
and that it is not necessarily appropriate to let the hazard with the highest criticality dictate the
development of the total system. By way of an example, consider a system that has one extremely
dangerous failure mode FM1 and another much less dangerous failure mode FM2. The Hazard
Classification would then result in a higher criticality assigned to FM1 than to FM2. The
requirements concerning the prevention, avoidance and/or mitigation of FM1 would consequently
be stronger than the requirements concerning the prevention, avoidance, mitigation etc of FM2.
The hazard probability requirements, for example, would typically differ between these two failure
modes.
To assist in this study, Figure D.3 shows the two potential routes to Dependability
Requirements, and the methods used to ensure that those requirements are properly related to the
criticality of the Hazards associated with the system to which they apply.


[Figure: two routes into the set of all dependability requirements. Left-hand (elaboration)
route: an identified hazard is classified, a hazard probability requirement is deduced from the
classified hazard, and that requirement is elaborated into lower-level dependability
requirements. Central (elicitation) route: dependability requirements are elicited from the
requirement types, the system description, experience, specific guidelines etc.; each elicited
requirement is tested for applicability against the classified hazards, applying the ALARP
principle. Right-hand route: the Safety Case tests each requirement for support, again founded on
ALARP. A requirement enters the set of all dependability requirements only if it passes both
tests (AND).]
Figure D.3 Relating Dependability Requirements to Hazard Criticality


At the highest level in the requirements hierarchy (see the hierarchy example in Figure D.2) appear
requirements (often these are quantified Hazard Probability requirements) that are deduced
directly from the Hazard Classification, which also assigns each Hazard its criticality. Thus further
elaboration of these requirements will inherently relate the dependability requirements to the
hazard criticality. This principle is shown in the left-most (orange) vertical chain of artefacts and
activities in Figure D.3.
Criticality affects the elicitation, analysis and implementation (or rejection) of the rest of the
hierarchy of dependability requirements through the principle of ‘tolerable risk’, and the established
ALARP principle. Both of these topics are covered in detail in sections D.3.1.4 and D.3.1.5 in this
appendix. Here we shall consider how they relate to requirements.
Determining whether a requirement is applicable to the criticality of the hazards of the system
under development has two parts. The first concerns whether, given unlimited budget, the
requirement could be used to appropriately improve the dependability of the system. The second
concerns the application of ALARP, and determining whether, given the budgets and resources
available to the development team, the requirements should be used to improve the dependability
of the system.
So the problem of relating dependability requirements to hazard criticality will only arise when new
requirements are elicited, based on the categories listed in this document. As these new
requirements arise, they must be related to the hazard they aim to mitigate. If no existing hazard
would be mitigated by satisfying the requirement, consideration should be given to whether a
previously undiscovered hazard exists, or alternatively whether the requirement is a case of ‘gold-
plating’. In the first case, hazard classification needs to be re-run for this new hazard, and a top
level requirement added (possibly a quantified Hazard Probability Requirement), with which the
new requirement can be associated. In the second case, the requirement can be rejected. This
principle is shown in the central (green) vertical chain of artefacts and activities in Figure D.3.
A further justification for inclusion or exclusion of requirements is given in the Safety Case, which
can be used to show that enough requirements (and no more) have been implemented to justify
the statement of the primary goal “The system is safe enough to operate in the given environment”.
The ‘enough and no more’ part is again founded on the ALARP principle. This final principle is
depicted by the right-most (blue) vertical chain of artefacts and activities in Figure D.3. Note that a
requirement is only accepted into the set of all dependability requirements if it is both supported by
the Safety Case and applicable to one or more top-level hazard-based requirements.
To conclude, requirements are directly related to the criticality of the hazards as represented by
the results of the Hazard Classification, through elaboration (whereupon they have a place in the
hierarchy) or through testing elicited requirements for their place in the hierarchy.


D.3 Requirement types

D.3.1 Hazard probability requirements

In general there are two means of attaching probabilities to hazards: quantitative and qualitative.

D.3.1.1 Quantitative requirements

In a quantitative scheme, numerical values are attached to the hazard probabilities; for example in
terms of a frequency of occurrence per year or per operating hour, or as a statistical probability.
Note that when numerical values are expressed as a statistical probability, such a quantity is
dimensionless. However when expressed as a frequency of occurrence, the quantity is not
dimensionless (and usually has dimensions of [time⁻¹] or similar).
Care should therefore be taken when expressing quantitative requirements to ensure that the units
are clearly stated. This is particularly important when comparisons have to be made between
quantities. For example, reliability data for an electronic component is usually expressed as a
failure rate λ, typically quoted in failures per 10⁶ hours. However, the statistical
probability of failure (which is dimensionless) is given by:
P(t) = 1 − e^(−λt)
If λt is small, this equation reduces to P(t) ≈ λt. Note that in this example, the statistical probability
can only be derived by determining a time period t over which this probability is calculated. This
would usually be the projected service life of a component or the vehicle. For example, if a vehicle
is designed to have a service life of 10 000 hours and a component or system has a failure rate of
0.5 per 10⁶ hours, i.e. 5 × 10⁻⁷ per hour, then we find
P(t) = 1 − e^(−0.005) = 0.00499 (exact)
P(t) ≈ λt = 0.005 (approximation)
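The worked example above can be reproduced directly from the formula; the following sketch computes both the exact probability and the small-λt approximation:

```python
import math

def failure_probability(lam_per_hour, t_hours):
    """P(t) = 1 - exp(-lambda * t): probability of at least one failure
    of an item with constant failure rate lambda within t hours."""
    return 1.0 - math.exp(-lam_per_hour * t_hours)

# Values from the example in the text: lambda = 0.5 failures per 10^6
# hours (i.e. 5e-7 per hour), service life 10 000 hours.
lam = 0.5 / 1e6
t = 10_000.0
p_exact = failure_probability(lam, t)   # exact: about 0.00499
p_approx = lam * t                      # small-lambda*t approximation: 0.005
```

Note that the approximation always slightly overestimates the exact probability, which makes it conservative for safety calculations.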

D.3.1.2 Qualitative requirements

In a qualitative scheme, a number of discrete classes are used and definitions given of the hazard
probabilities and a description of what they mean. An example of such a qualitative classification is
given in the following table:

Frequency  | Occurrence during operational life
-----------|---------------------------------------------------------------
Frequent   | Likely to be continually experienced
Probable   | Likely to occur often
Occasional | Likely to occur several times
Remote     | Likely to occur some time
Improbable | Unlikely, but may exceptionally occur
Incredible | Extremely unlikely that the event will occur at all, given the
           | assumptions recorded about the domain and the system

In this example, the frequency classes have been matched to a textual description of the frequency
(or occurrence). It is possible to extend a qualitative scheme to a semi-quantitative scheme, where
the frequency classes have numerical values attached to them, either as a range or as an upper
limit.
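Such a semi-quantitative calibration can be sketched as a simple mapping from a numerical occurrence rate onto the qualitative classes of the table above. The upper limits below are invented for illustration and are not taken from any standard:

```python
# Hypothetical calibration: each qualitative class is given an upper
# limit on the occurrence rate (events per operating hour).
CLASS_UPPER_LIMITS = [
    ("Incredible", 1e-8),
    ("Improbable", 1e-6),
    ("Remote",     1e-4),
    ("Occasional", 1e-3),
    ("Probable",   1e-2),
    ("Frequent",   float("inf")),
]

def frequency_class(rate_per_hour):
    """Map a numerical occurrence rate onto a qualitative frequency
    class: the least frequent class whose upper limit covers the rate."""
    for name, upper in CLASS_UPPER_LIMITS:
        if rate_per_hour <= upper:
            return name
    raise ValueError("unreachable: last class has an infinite limit")
```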
The following example is taken from the pharmaceutical industry and shows how verbal descriptors
(typically used on medicine labels to describe the frequency of occurrence of side-effects to the
medicine) are related to statistical probabilities:

Verbal descriptor | EU guidelines: equivalent probability
------------------|------------------------------------------------------------
Very common       | > 10% (more than 1 in 10)
Common            | > 1% and < 10% (less than 1 in 10 but more than 1 in 100)
Uncommon          | 0.1% to 1% (less than 1 in 100 but more than 1 in 1000)
Rare              | 0.01% to 0.1% (less than 1 in 1000 but more than 1 in 10000)
Very rare         | up to 0.01% (less than 1 in 10000)

A commentary on this approach [4] notes that, “The available literature suggests that a statistical
approach to describing risk is often met with satisfaction by the recipients. However, research into
what individuals understand by terms such as ‘very rare’, ‘common’ etc., suggests that the current
EU guidelines on verbal descriptors are not correctly matched with statistical probabilities. In
general, it appears that the public equate the verbal descriptors (very rare, common etc.) to risks
that are substantially higher than those defined in regulatory documents. Perceiving very small
risks is particularly problematic and a number of models have been proposed in the literature to
help with this. One scale is based on a different set of verbal descriptors (high, moderate, low, very
low and minimal), but this too may not be in accord with people’s actual interpretations.”

D.3.1.3 Applying hazard probability requirements

In general, there is no strong recommendation for or against qualitative or quantitative
requirements. Both have advantages and disadvantages. However, qualitative techniques are
more appropriate earlier in the lifecycle. Once detailed design decisions have been made, it is
more appropriate to consider quantitative measures, or to calibrate the qualitative measures in
some way (compare the frequencies that are typically attached to the “occurrence” rankings in
failure mode and effects analysis). It is difficult to verify that (uncalibrated) qualitative requirements
have been met.
Regardless of the scheme that is chosen, the overall objective of safety engineering activities is to
determine the (unprotected) hazard risk, apply measures and techniques to ensure that the risk is
acceptably low, then to demonstrate that this has been achieved.
The required risk reduction can therefore be expressed either in quantitative or qualitative terms.
Safety integrity level (SIL) or variations such as automotive SIL (ASIL) are commonly used as a
measure of this required risk reduction. In IEC 61508 and its direct derivatives, SIL is a measure of
the reliability required of the safety functions (which are often “add ons” to a basic system). In the
automotive context, it is more appropriate to consider the SIL or ASIL as a requirement for means
to control the probability of a failure occurring such that the associated hazard risk is broadly
acceptable.
Note that in general it is only possible to demonstrate that numerical values of target failure rates
have been achieved for random failures associated with hardware elements. It is also generally
accepted that testing can only demonstrate failure rates down to around 10⁻³ to 10⁻⁴ per hour, as
demonstrating lower rates requires an infeasible test time. Therefore verification of requirements
such as “The occurrence rate of hazard H1 shall be less than 10⁻⁸ per operating hour” is not
possible to demonstrate this by testing, and usually an analytical approach such as fault tree
analysis based on available empirical data (such as failure rates for electronic components) is

14.11.2006 2.0 D-12


EASIS Deliverable D3.2 Part 1 - App D

used. For this reason, such requirements should be considered as targets to be demonstrated
rather than absolute values to be proven in testing.
A hazard probability requirement shall therefore be stated as an unambiguous numerical target. It
is acceptable to use qualitative measures early in the development of a product but these need to
be changed to quantitative requirements or calibrated so that an objective verification can be
made.
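Because such targets cannot be proven by testing, they are typically assessed analytically, e.g. by fault tree analysis. A minimal sketch of the kind of calculation involved is given below; the component failure rates, the duplex structure and the exposure interval are illustrative assumptions, not data from this guideline:

```python
def or_gate(rates):
    """Approximate top-event rate of an OR gate: any input failure causes
    the event, so the rates add (valid for small rates)."""
    return sum(rates)

def and_gate(rates, exposure_hours):
    """Rough AND-gate approximation for two independent channels: both
    must fail within the exposure interval (lambda1 * lambda2 * tau)."""
    r1, r2 = rates
    return r1 * r2 * exposure_hours

# Illustrative: each channel is built from two parts with assumed
# 1e-5 /h failure rates, and the result is checked against an
# (assumed) 1e-8 /h hazard probability target.
channel_rate = or_gate([1e-5, 1e-5])
hazard_rate = and_gate([channel_rate, channel_rate], exposure_hours=1.0)
meets_target = hazard_rate < 1e-8
```

Real fault tree analyses must of course also account for common-cause failures and repair or detection intervals, which this sketch omits.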
The numerical target is derived by considering a numerical definition of broadly acceptable risk.
The numerical definition of broadly acceptable risk is then combined with the hazard classification
in order to arrive at a target. A simple model of how this can be done is as follows:
• Broadly acceptable risk = B
• Hazard classification for hazard n = RHn
• Requirement on hazard probability = B/RHn
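A minimal numeric sketch of this model follows; the values are placeholders, since B and RHn must come from the applicable safety policy and hazard classification:

```python
def hazard_probability_target(broadly_acceptable_risk, hazard_classification):
    """Requirement on hazard probability = B / RHn (the model above).
    broadly_acceptable_risk: B, e.g. unwanted outcomes per operating hour.
    hazard_classification: RHn, a dimensionless risk weighting for hazard n."""
    return broadly_acceptable_risk / hazard_classification

# Illustrative placeholder values: B = 1e-6 per operating hour and
# RHn = 100 yield a 1e-8 per operating hour target.
target = hazard_probability_target(1e-6, 100)
```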
Consider a typical automotive risk model where the hazard classification is a function of the
severity of the outcome of the hazard (the hazardous event) and the probability of the hazardous
event occurring. As discussed in Appendix B on hazard classification, the probability of the
hazardous event depends on:
• The probability of the failure occurring;
• The probability of the driver (or other person) being exposed to the hazardous situation;
• The probability of the driver (or other person) being unable to take the expected actions to
control the outcome of the hazardous situation.
For a given situation the latter two probabilities are usually constant but both can potentially give a
degree of risk reduction compared to the unprotected hazard. Therefore in order to ensure that the
overall risk is broadly acceptable it is necessary to set requirements for the first probability, namely
the probability of the failure occurring.
The MISRA Safety Analysis guidelines give an example of how random safety targets can be
derived from an overall measure such as the definition of broadly acceptable risk in society. In this
example, hazards are classified using a classification R1 … R5. A target is set for each
classification, representing the maximum acceptable rate of occurrence of a random failure if the
level of risk is to remain broadly acceptable. Thus (for example) for R4 the maximum acceptable
rate is 10⁻⁸ per operating hour for a single instance of a system. If a hazard denoted
H1 had been classified as R4 using the MISRA approach then an equivalent hazard probability
requirement could be written as:
“The occurrence rate of hazard H1 shall be less than 10⁻⁸ per operating hour.”

D.3.1.4 Broadly acceptable risk

In order to derive hazard probability requirements using a scheme such as the one described
above, it is necessary to understand what is meant by broadly acceptable risk.
In reality, there is no such thing as absolute safety, and any activity has a risk associated with it
even if the probability is vanishingly small. The main aim of risk analysis and risk reduction in
safety-related systems engineering is to reduce the risks associated with a system to a level that is
“broadly acceptable”. There are a number of definitions available as to what constitutes “tolerable”
or “broadly acceptable” risk, but in general this is understood to mean a level of risk that is
equivalent to the risk exposure in everyday society.


The following risk categories are commonly encountered (see also the section below on ALARP):
• Unacceptable: this level of risk can never be accepted
• Tolerable: this level of risk can be accepted if further reduction is impracticable
• Broadly acceptable: this is equivalent to the risk exposure in everyday society
In deriving hazard probability requirements for a vehicle or a system, the objective is to take the
classified hazards and determine what measures have to be applied to reduce or mitigate these
risks to the “broadly acceptable” level. It is necessary to define a safety policy (at the company,
and/or product, and/or project level) in which the “broadly acceptable” risk is defined. Given the
identified risk reduction, associated requirements on hazard probabilities have to be derived.
Again for automotive systems, this is usually achieved by controlling the hazard probability so that
the overall hazard risk is reduced to a broadly acceptable level.
If a quantitative approach to hazard probabilities is being used, then the tolerable or broadly
acceptable risk will need to be a specific numerical target, e.g. x per year, y per hour. It will be
necessary to show how this level of tolerable risk relates to the risk exposure in everyday society;
for example, if there is a declared acceptable risk available in the literature (e.g. [3]) then this can
be used as a starting point. Note that such available targets usually relate to the overall risk and
broadly acceptable occurrence rates have to be derived from these.
If a qualitative approach is being used then the tolerable risk is likely to be expressed as a matrix
as shown in Section D.3.1.5.2. In this case, the categories will still need to be calibrated in some
way.

D.3.1.5 The ALARP principle

It is sometimes discovered in the course of analysing a system that it is not possible to reduce the
risks to a “broadly acceptable” level, but it is still desired to implement the system as the benefits
far outweigh the associated risks. In this case, the principle of “tolerable risk” applies along with the
concept of reducing risks ALARP (as low as reasonably practicable). An alternative term that may
be encountered is SFAIRP (so far as is reasonably practicable). These are essentially the same;
however, SFAIRP is the term most often used in legislation (e.g. the UK’s Health and Safety at
Work Act) and ALARP is the term used by practitioners.
A risk has been reduced ALARP when it has been demonstrated that the cost of any further risk
reduction, where the cost includes the loss of capability as well as financial or other resource
costs, is grossly disproportionate to the benefit obtained from that risk reduction. Further details of
the ALARP principle may be found in Def Stan 00-56 Part 2 Issue 3 [2].
In Def Stan 00-56 [2] the following definitions of risk are found:
• Broadly acceptable risk: A level of risk that is sufficiently low that it may be tolerated without the
need to demonstrate that the risk is ALARP.
• Tolerable risk: A level of risk between broadly acceptable and unacceptable that may be
tolerated when it has been demonstrated to be ALARP.
• Unacceptable risk: A level of risk that is tolerated only under exceptional circumstances.
Note that IEC 61508 defines “tolerable risk” as “risk which is accepted in a given context based on
the current values of society”.
Establishing the tolerability of risk from a system depends on a number of factors, including the
technology involved, the purpose of the system, its domain and the expectations of society. The
UK HSE’s publication “Reducing risks, protecting people” [3] has some useful examples in
Appendix 4 showing the risks associated with certain industrial sectors and means of
transportation.


The latest version of the UK’s Def Stan 00-56 has some helpful guidance on this subject, which
has been reproduced and adapted below.
The approach to establishing tolerable risk is basically the same whether a quantitative or
qualitative approach to risk assessment is being adopted. The general principles are:
• Reduce system risks to a broadly acceptable level
• If it is not possible to reduce the risks to a broadly acceptable level, then demonstrate that the
risks are tolerable and have been reduced ALARP.
The definitions of “broadly acceptable”, “tolerable” and “unacceptable” need to be established
depending on the sector and the application. Figure D.4 from Def Stan 00-56 [2] shows how this is
applied for both quantitative and qualitative risk assessment.

[Figure D.4: risk acceptance shown on two parallel scales. The quantitative scale divides risk into
unacceptable (above X per year), tolerable (between X and Y per year) and broadly acceptable
(below Y per year). The qualitative scale uses likelihood classes: Class A (very high likelihood),
Class B (high likelihood), Class C (medium likelihood) and Class D (low likelihood).]
Figure D.4 Risk acceptance concepts


The following table from Def Stan 00-56 Part 2 [2] is useful in understanding how tolerable risk and
ALARP are related:

Unacceptable risk:
   ALARP demonstrated: Cannot be tolerated (unable to be signed off), unless there are
   exceptional reasons for the activity to take place.
   ALARP not demonstrated: Cannot be tolerated (unable to be signed off).

Tolerable risk:
   ALARP demonstrated: Can be tolerated (can be signed off from an ALARP perspective).
   ALARP not demonstrated: Cannot be tolerated (unable to be signed off).

Broadly acceptable risk:
   ALARP demonstrated: Can be tolerated (can be signed off from an ALARP perspective).
   ALARP not demonstrated: Can be tolerated (can be signed off without a full ALARP
   demonstration), but the risk should be reduced wherever reasonably practicable.
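The table can also be read as a simple decision rule. A sketch follows; the function name and return strings are our paraphrases of the table entries, not wording from Def Stan 00-56:

```python
def sign_off_decision(risk_level, alarp_demonstrated):
    """Decision rule paraphrasing the Def Stan 00-56 table above.
    risk_level: 'unacceptable' | 'tolerable' | 'broadly acceptable'."""
    if risk_level == "unacceptable":
        # Tolerated only under exceptional circumstances, even if ALARP
        # has been demonstrated.
        return "cannot be signed off"
    if risk_level == "tolerable":
        return "can be signed off" if alarp_demonstrated else "cannot be signed off"
    if risk_level == "broadly acceptable":
        # Sign-off is possible even without a full ALARP demonstration,
        # though the risk should still be reduced where practicable.
        return "can be signed off"
    raise ValueError("unknown risk level: " + risk_level)
```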

D.3.1.5.1 Quantitative assessment

If quantitative hazard probability requirements are being used, the boundaries for tolerable risk and
broadly acceptable risk would be set as numerical probability targets e.g. X per year, Y per hour.
As explained in section D.3.1.4, broadly acceptable occurrence rates have to be derived from
these values for broadly acceptable risk.

D.3.1.5.2 Qualitative assessment

If qualitative hazard probability requirements are being used, a discrete classification of risks can
be adopted (see IEC 61508 [1] part 5 and Def Stan 00-56 [2]). The risk classification is carried out
by combining the frequency of occurrence with the consequence, in order to create a matrix of risk
classes. Note that this is a separate analysis from hazard classification. The table below shows an
example of a risk classification matrix adapted from these sources. Note that, as with many of these examples taken
from IEC 61508, the actual population of the table is sector-specific. For this reason the entries in
the table are purposely not symmetrical.

                 Consequence
Frequency        Critical   Severe    Marginal   Negligible
Frequent         I          I         I          II
Probable         I          I         II         III
Occasional       I          II        III        III
Remote           II         III       III        IV
Improbable       III        III       IV         IV
Incredible       IV         IV        IV         IV

In this example, the risk classes might be defined as follows. Note that these are examples and
actual interpretations are required for sectors and applications.


Risk class Interpretation


Class I Intolerable risk
Class II Undesirable risk, and tolerable only if risk reduction is impracticable or if the
costs are grossly disproportionate to the improvement gained
Class III Tolerable risk if the cost of risk reduction would exceed the improvement
gained
Class IV Negligible risk

For these example risk classes, Class IV is “broadly acceptable”, Class III risks have to be reduced
ALARP and Class I risks are “unacceptable”. Class II risks are considered just inside the ALARP
region, and also have to be reduced ALARP but require a much higher level of justification.
In this example, the “consequence” classes might be defined as follows:

Category Definition
Critical Multiple deaths
Severe A single death; and/or multiple severe injuries or severe occupational illnesses
Marginal A single severe injury or occupational illness; and/or multiple minor injuries or
minor occupational illnesses
Negligible At most a single minor injury or minor occupational illness

Similarly the “frequency” classes might be defined as per the example in Section D.3.1 above.
If numerical values are associated with the frequencies, they have to be derived by considering the
operational profile of the system; a number of standards and guidelines are available to assist (e.g. [3]).
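The example matrix and its interpretation can be sketched as a lookup table. The entries below are transcribed from the example matrix above; remember that real populations of such a matrix are sector-specific:

```python
# Risk classification matrix transcribed from the example above:
# frequency -> consequence -> risk class.
RISK_MATRIX = {
    "frequent":   {"critical": "I",   "severe": "I",   "marginal": "I",   "negligible": "II"},
    "probable":   {"critical": "I",   "severe": "I",   "marginal": "II",  "negligible": "III"},
    "occasional": {"critical": "I",   "severe": "II",  "marginal": "III", "negligible": "III"},
    "remote":     {"critical": "II",  "severe": "III", "marginal": "III", "negligible": "IV"},
    "improbable": {"critical": "III", "severe": "III", "marginal": "IV",  "negligible": "IV"},
    "incredible": {"critical": "IV",  "severe": "IV",  "marginal": "IV",  "negligible": "IV"},
}

def risk_class(frequency, consequence):
    """Combine a frequency class and a consequence class into one of the
    risk classes I..IV, as in the example matrix."""
    return RISK_MATRIX[frequency][consequence]
```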

D.3.1.5.3 Further discussion of the ALARP principle

For further discussion of the ALARP principle please refer to Def Stan 00-56 Part 2 Issue 3 [2]. The
standard also gives some guidance on applying ALARP to “complex” electronic systems (broadly
equivalent to E/E/PES, particularly PES, in IEC 61508 [1] terminology).

D.3.1.5.4 Defining tolerability criteria

A general principle is that safety will be, so far as is reasonably practicable, at least as good as
that required by law. The key issue is that any tolerability criteria that are applied should be
justifiable and auditable.
The tolerability figures in [3] are given in terms of the individual risk of fatality per year; that is, the
total risk of being killed that a hypothetical individual is exposed to in any one year. These
represent the total risk to which an individual is exposed, so when considering specific risks an
assessment should be made as to how they contribute towards the overall level of risk. In addition,
consideration should also be taken of how many individuals are likely to be exposed to the risks. It
should also be noted that risk management for a system is usually considered in terms of the
likelihood of an accident occurring. Care needs to be taken in the use of statistics to ensure that
there is no confusion about what is being measured and the measurement units that are being
used.


D.3.1.5.5 Similar approaches

The GALE (globally at least equivalent) and GAMAB principles (French: “globalement au moins
aussi bon” = globally at least as good) assume that there is already an existing “acceptable”
solution with a baseline risk and require that any new solution shall in total be at least as good. The
use of “globally” is important in this context, because it allows for trade-offs: an individual aspect of
the safety system may indeed be worsened if it is compensated for by an improvement elsewhere.
The GALE or GAMAB principle is similar to the ALARP principle, but it is not equivalent to it. The
crucial difference is that an ALARP-based approach requires every hazard to be demonstrated
either to have its associated risks reduced to the broadly acceptable level or, where that is not
practicable, to have a tolerable risk that has been reduced ALARP. GALE or GAMAB permits a higher risk to remain with
some hazards if overall the level of risk remains broadly acceptable. The final decision on whether
to use an ALARP or GALE/GAMAB approach will depend on the precedents both within the legal
system and the application sector and these may be territory-dependent. However it appears that
GALE/GAMAB is of most relevance to development of systems where an existing system is being
replaced or upgraded and there is suitable baseline data to compare it with. For a novel or
distributed system it would appear that ALARP is more appropriate.
The German MEM (minimum endogenous mortality) principle takes as its starting point the fact
that there are various age-dependent death rates in society and that a portion of each death rate is
caused by technological systems. The requirement that follows is that a new system shall not
“significantly” increase a technologically-caused death rate for any age group. Ultimately, this
means that the age group with the lowest technologically caused death rate, the group of 5 to 15
year olds, is the reference level.

D.3.1.6 Hazard probability requirements: Summary

Table D.1 provides a compact summary of the findings concerning hazard probability
requirements.
General characteristics:
   Requirements for maximum acceptable failure rates in order to maintain a level of risk that is
   broadly acceptable.
How to determine suitable requirements:
   Use hazard classification results and a suitable measure of broadly acceptable risk to set
   maximum acceptable failure rates for each hazard classification category, in order to keep the
   risk associated with the hazard at the broadly acceptable level.
How to express requirements:
   Requirements should be expressed as an unambiguous and verifiable statement (see examples).
Specific difficulties:
   Measures of broadly acceptable risk are usually specified in terms of the probability (or rate of
   occurrence) of an unwanted outcome (e.g. a fatality), and a calculation is needed to relate
   these to failure rates of (e.g.) electronic components.
   The failure rates or probabilities are usually targets rather than absolute values (see
   “Verification issues”).
Relationship to other types of requirements:
   Closely related to fault tolerance requirements and quantitative requirements for hardware
   architecture. For example, specific architectural features such as multiple lanes or redundant
   elements may be necessary to achieve the required failure rate.
Relationship to other requirements of the same type:
   Not applicable.


Verification issues:
   Qualitative measures are difficult to verify.
   Failure rates better than 10⁻³ to 10⁻⁴ per hour cannot be demonstrated by testing; an analysis
   has to be made instead.
Examples:
   “The maximum acceptable occurrence rate of hazard H1 is ‘remote’.”
   “The occurrence rate of hazard H1 shall be less than 10⁻⁸ per operating hour.”
Table D.1 – Essential characteristics of hazard probability requirements

D.3.2 Fault tolerance and functional degradation requirements

D.3.2.1 Introduction

Fault tolerance, in a general sense, is the property of a system to exhibit some desired behaviour
even in the presence of faults and errors. The requirements can range from the full application
functionality (in the case of complete fail-operational behaviour), through partial and degraded
functionality, down to the simple exclusion of wrong or dangerous functionality (in the case of
fail-silence behaviour). The assumed faults must be specified with respect to their maximum
number, the locations of occurrence and the malfunctions taken into account (such as stuck-at
values, omission of output values, delay of operations etc.). Consequently, fault tolerance
requirements must sufficiently express the assumed faults.
Fault tolerance requirements need to be broken down into requirements on the service to be
provided by the countermeasures. Otherwise the work of the system designers may remain
unclear and, even worse, it may be impossible to see whether the fault tolerance requirements are
satisfied. In particular, the goals to be checked by analysis tools may be ambiguous. It should be
noted that requirements on the service provided by countermeasures are in principle outside the
scope of "fault tolerance and functional degradation requirements". However, this section D.3.2
takes a holistic approach to fault tolerance and functional degradation, covering the entire range
from top-level requirements to detailed requirements on the technical countermeasures.
In complex systems with strong cost-efficiency constraints, as are the integrated safety systems
addressed in the EASIS project, it is of particular importance to require the right degree of fault
tolerance of the various functions realized by different components. This means individual
requirements for the various hardware and software regions of the system, rather than “flat”
homogeneous requirements throughout the whole system. A fine-grained formulation of fault
tolerance requirements keeps the balance between a sufficient degree of safety on one side
and a low redundancy overhead on the other. Typically, different components require their
individual fault assumptions. And, depending on the application, different extents of degraded
functionality may be acceptable.
It should also be noted that there is no need for just one single fault tolerance layer in a system.
Instead, a staggered approach could be both more effective and more efficient. Having various fault
tolerance layers to prevent vertical fault propagation, and additional countermeasures against
horizontal fault propagation, allows for the provision of (partly) reliable services independent of the
application service on higher layers. Reliable end-to-end communication is an example of such a
staggered approach.
Once the fundamental system structure is defined, the requirement of reliable service is identical to
requiring fault tolerance properties of certain components. Furthermore, the allocation of fault
tolerance to a system structure implies the definition of where fault propagation is allowed among
components, and where it must be prevented by fault tolerance techniques. This also includes the
most primitive, yet effective, contribution to fault tolerance: the isolation of components by
disallowing interactions among them.

D.3.2.2 Procedure

Besides faults and degradation of functionality, fault tolerance requirements should also consider
structural aspects such as fault propagation, isolation and the location of fault tolerance techniques,
insofar as high cost efficiency of the system is aimed at. This leads to the following basic steps for
expressing fault tolerance requirements.
Please note that very important initial steps like the definition of the safety integrity level are not
dealt with here. They are part of the process frameworks described in WT 3.1.1 (appendix A).

Steps of the procedure:


a) Identify services where resilience against faults is required:
a1) List the functions of the system. For each function determine the degradation which is
allowed with respect to safety requirements. The degradations can range from full
functionality (complete fail-operational) to a permanent safety output (fail-safe) or even a
wrong output (no safety requirements to satisfy).
The determination of possible degradations is related to Appendix B which deals with
investigating in which ways the service provided by the system might deviate from the
nominal service.
a2) For each degradation decide the situation in which it must be reached. These decisions
define the principal fault tolerance strategy, subject to refinements in subsequent steps.
Example of fine-grained staggered degradations:
• Full functionality in the absence of faults
• Interruptions of at most 10 ms in case of minor temporary faults
• Movement to the middle position in the presence of up to two faults whether
temporary or permanent (excluding double permanent faults in the communication
system)
• Passivation on double permanent communication faults
• Arbitrary behaviour in case of three or more faults.
b) Distinguish malfunctions:
b1) List all malfunctions of services which are relevant to the envisaged fault tolerance
strategy. The granularity may be very coarse or rather detailed.
The following “classical” list is relatively coarse:
• fail-silence
• timing failure
• omission failure
• non-code value failure
• code value failure including Byzantine behaviour.
A refinement thereof may distinguish whether a timing fault causes premature action,
minor delays or major delays. Properties of sensors, actuators and power supplies can be
considered as well, as can be seen from the following examples of malfunctions:
• loss of energy
• too high energy consumption
• undesired energy feedback

• non-plausible value
• maximum torque exceeded
b2) For each malfunction identify the components whose faults may lead to or at least
contribute to the respective malfunction anywhere in the system. On the basis of this
information the fault regions will be defined (see step c).
This step is related to Appendix C which deals with analyzing the relationship between
faults and failures.
c) Form fault regions: this definition clarifies what is counted as a single fault. A fault region is a
set of components whose internal disturbances are counted as exactly one fault, regardless of
where the disturbances are located, how many occur and how far they stretch within the fault
region. A fault region may contain just a single transistor or even a complete node of the
network. The definition of the fault regions fixes the granularity of the fault tolerance concept.
Simultaneous fault occurrences in two fault regions are counted as a double fault.
This topic is connected to WT 3.1.3 (appendix C) in which the relationship between faults and
failures and the concept of a fault region are explained.
c1) Define sets of hardware and/or software components that form a fault region with respect
to a malfunction. Typically these sets are disjoint. If components do not fall in any of the
fault regions they belong to the so-called hard core, where no fault tolerance is required
(due to a high degree of perfection of these components, for example).
d) Form containment regions: for each fault region the task of the fault tolerance technique has to
be expressed. The containment region is the set of components which may be adversely
affected by the respective malfunction. In other words: the containment region expresses the
borderline where fault propagation must stop. On fault occurrence in a fault region, all
components outside the corresponding containment region must not become erroneous. More
precisely: the functions of the components outside must not violate the allowed degraded
functionalities of the services they provide.
d1) For each fault region a containment region has to be defined.
A “natural” condition must hold for all containment regions: The components in the
containment region must be a superset of the components of the corresponding fault
region – or set of fault regions in the case of multiple fault tolerance.
d2) For each containment region the allowed degraded functionality of the components
outside must be defined. This step can be considered a refinement of step a2). In a2) the
basic ideas are expressed, whereas here the desired degree of fault tolerance is
expressed in terms of malfunctions, fault regions and containment regions.
A single walk through the steps above cannot be expected to be sufficient for complex systems.
Instead, a number of iterations will be necessary due to the fundamental nature of faults. When the
functionality is first specified at the application level there are no faults in the system (apart from
flaws in the specification itself). Faults can only occur when the functions are implemented by
components, which are always non-perfect. With increasing detail of the system design, additional
considerations of faults may become necessary. Consequently, additional countermeasures may
become necessary, and the fault tolerance concept may appear in a new light, which may cause
revisions. For this reason a clear approach for the (potentially repeated) formulation of fault
tolerance requirements is helpful.


D.3.2.3 Optional Check

When complex fault tolerance requirements have been formulated for complex systems, well-known
kinds of specification flaws may appear:

• Fault tolerance requirements may be incomplete. Example: fault regions and/or containment
regions might be missing for some identified malfunctions.

• Fault tolerance requirements may be inconsistent. Example: a fault region might not be fully
contained within its containment region, violating the superset condition of step d1).

Checking the fault tolerance requirements for such flaws should be done by the personnel
expressing them. These checks can be performed separately, requirement by requirement. In
addition, once the structure of interactions among components is known, one can also check the
requirements jointly on the basis of a coarse and simple system model. In this model all the
components, malfunctions, fault regions, fault propagations through interactions among them,
containment regions and the allowed degradations of functionality are expressed.
Note that a model of the interactions is not necessary to express fault tolerance requirements, as
can be seen from the steps in Section D.3.2.2. However, if one is able to distinguish proper
interaction from wrong behaviour at the interfaces between components, then a primitive model of
the fault tolerance properties becomes “operational”. The interactions need not be expressed in
much detail. A distinction between correct interaction and various classes of malfunctions is
sufficient to model fault propagation among components. The propagations can then be analysed
to obtain the set of components to which faults spread. Fault tolerance exists to the degree that the
containment regions are not violated by fault propagations.
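Such an operational model amounts to a reachability analysis over the interaction structure: starting from a fault region, follow the interactions along which faults may propagate and check whether any component outside the containment region is reached. A minimal sketch follows; the interaction graph and component names are hypothetical:

```python
from collections import deque

def propagation_set(origin, interactions):
    """Components reachable from the fault origin via interactions along
    which faults may propagate (breadth-first traversal)."""
    seen, queue = set(origin), deque(origin)
    while queue:
        component = queue.popleft()
        for neighbour in interactions.get(component, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

def containment_violated(fault_region, containment_region, interactions):
    """True if faults can spread beyond the containment region."""
    return not propagation_set(fault_region, interactions) <= containment_region

# Hypothetical example: faults in the sensor propagate to the filter,
# which is still inside the containment region.
interactions = {"sensor": ["filter"], "filter": [], "ecu": ["actuator"]}
ok = not containment_violated({"sensor"}, {"sensor", "filter"}, interactions)
```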
Moreover, the proper definition of the containment regions themselves can be checked: one simply
looks at the fault-infected region and the effect on the functionality of the components there, to see
whether the intended fault tolerance is achieved.
This can be seen as a connection to Appendix E which deals with Safety Case construction.

D.3.2.4 Formal Background

A mathematical model is presented to express the origin, the propagation and the potential
containment of faults by the fault tolerance techniques applied in the system. The model can be
implemented by a program (or a database) to check whether the defined containment regions are
violated.
Besides usual mathematics the following notation is used: let a = (a1, ... , an) be an element of
A1 × ... × An. Then a|Ai denotes the projection of a to Ai, which is equal to ai. The notation ℘(S)
denotes the powerset of a set S; ℕ denotes the natural numbers and ℕ0 the natural numbers including zero.

D.3.2.4.1 Structure of the System

The system structure is modelled in a completely static way. Between any pair of potentially
interacting components, instances of services are located to express the functionality of the
interaction. The model abstracts from timing and all dynamic aspects. Types are defined for both
components and services.
Let C be a set of components.
Let CT be a set of component types. C and CT must not share elements: C ∩ CT = ∅.
Let t: C → CT be a function defining the type of each component. A component is
considered an instance of a component type.

Let ST be a set of service types. Depending on its type, each component provides some
services of particular service types. Remark: sometimes the service is called the “function”
of the component. Here we call it “service” for the sake of a clearer distinction from the
various mathematical functions. The sets C, CT and ST must not share elements:
(C ∪ CT) ∩ ST = ∅.
Let nu : CT × ST → INo be a function expressing the number of used services. Each compo-
nent type can use zero, one or more instances of a service type. A voter uses three
calculation results as its input, for example.
Let nr : CT × ST → INo ∪ {∞} be a function expressing the number of services realizations.
Each component type can implement zero, one or more instances of a service type. A
power supply may realize two 5V outputs, for example. A file server may implement file
access for any number of clients. This infinite number of service instances is expressed
by ∞.
Def: S = { (x, y, z) ∈ C × ST × IN : 1 ≤ z ≤ nu(t(x), y) }
is the set of services. Note that the set of services cannot be chosen freely. Instead it
follows from the set of components and the service types they use. Remark: The
realization of services by components does not necessarily lead to elements of S,
because the services may not be used by any component. For this reason the definition
of S is based on function nu rather than nr.
Def: u : C → ℘(S) such that u(x) = { y ∈ S : y|C = x }
returns the set of services used by a component.
Let r : S → C define the realization of a service by a component. This is the central function to express the system structure (“who delivers what to whom?” or, more precisely, “which component provides which service for some other component?”). The definition of r must not request quantities of services from components that they are unable to realize according to their component type and the function nr. For this reason the following condition must be satisfied:

∀ x ∈ S: | { y ∈ r–1({ r(x) }) : y|ST = x|ST } | ≤ nr(t(r(x)), x|ST)

The left side of the inequality is the number of services of the same type as x requested from component r(x), i.e. the cardinality of the inverse image of the one-element set {r(x)}, restricted to that service type. The right side is the quantity of services of this type that component r(x) is able to realize according to its type t(r(x)).
Figure D.5 depicts an example of a system structure with two component types and two service
types CT = {ct1, ct2}, ST = {st1, st2}. The first component type ct1 provides a realization of one
instance of st1 (nr = 1 in upper part of the figure) and uses two instances of st1 (nu = 2) and one
instance of st2 (nu = 1), whereas ct2 realizes three instances of st1 (nr = 3) and an arbitrary
number of instances of st2 (nr = ∞), and uses only one instance of st1 (nu = 1). This structure is
instantiated as follows: There are two components c1 and c2 of type ct1, and one component c3 of
type ct2. This results in exactly seven services: S = {s1, ... , s7}. Function u simply “translates”
function nu from the type definitions to the concrete component and service instances. Function r
expresses the allocation of services in the system:

• Component c1 provides service s7 of type st1 to component c3.

• Component c2 provides service s2 of type st1 to component c1.

• Component c3 provides service type st1 once to c1 (s1) and twice to c2 (s4 and s5), as well as
service type st2 once to c1 (s3) and once to c2 (s6).


[Figure D.5 omitted: diagram of component types ct1 (nr = 1 for st1; nu = 2 for st1, nu = 1 for st2) and ct2 (nr = 3 for st1, nr = ∞ for st2; nu = 1 for st1), their instances c1, c2 and c3 via function t, the services s1 = (c1, st1, 1) through s7 = (c3, st1, 1) via function u, and the realization function r.]

Figure D.5 Example of a system structure
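The structural definitions above can be made executable. The following Python fragment is an illustrative encoding (not part of the EASIS tooling) of the Figure D.5 example; it derives the service set S from nu and checks the condition on the realization function r:

```python
from math import inf

# Hypothetical encoding of the Figure D.5 example; names follow the text.
ST = {"st1", "st2"}
C = {"c1": "ct1", "c2": "ct1", "c3": "ct2"}          # component -> type (function t)
nu = {("ct1", "st1"): 2, ("ct1", "st2"): 1,          # instances used per type
      ("ct2", "st1"): 1, ("ct2", "st2"): 0}
nr = {("ct1", "st1"): 1, ("ct1", "st2"): 0,          # instances realizable per type
      ("ct2", "st1"): 3, ("ct2", "st2"): inf}

# S follows from C and nu: one service instance (c, st, z) per usage slot.
S = [(c, st, z) for c in C for st in ST
     for z in range(1, nu[(C[c], st)] + 1)]

# Allocation r: who realizes each service (from the bullet list in the text).
r = {("c1", "st1", 1): "c3", ("c1", "st1", 2): "c2", ("c1", "st2", 1): "c3",
     ("c2", "st1", 1): "c3", ("c2", "st1", 2): "c3", ("c2", "st2", 1): "c3",
     ("c3", "st1", 1): "c1"}

def r_is_valid():
    # No component may be asked for more instances of a service type than
    # its component type can realize according to nr.
    for (_c, st, _z), provider in r.items():
        demanded = sum(1 for (_cc, ss, _zz), p in r.items()
                       if p == provider and ss == st)
        if demanded > nr[(C[provider], st)]:
            return False
    return True

print(len(S), r_is_valid())   # 7 True
```

Running the sketch confirms that exactly seven services result and that the allocation respects nr (e.g. c3 realizes st1 three times, which is its limit).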

D.3.2.4.2 Fault model

The fault model comprises the description of fault origins in components and of fault propagation to further components through services. The fault model is not deterministic, because each cause may have various possible effects. These effects are expressed in the model by defining sets of malfunctions for each service.
Let F be a set of component faults, where ok ∈ F. The element ok denotes the fault free
case. F must be disjoint from C, CT and ST: (C ∪ CT ∪ ST) ∩ F = ∅.
Let pf : CT → ℘(F), where ∀ x ∈ CT: ok ∈ pf(x),
express the potential faults of each component type. For all components of a given type the sets of potential faults are the same.
Let M be a set of service malfunctions, where ok ∈ M. The element ok denotes the correct function. Except for the ok element, M must be disjoint from C, CT, ST and F:
(C ∪ CT ∪ ST ∪ F) ∩ M = {ok}.
Let pmu : ST → ℘(M), where ∀ x ∈ ST: ok ∈ pmu(x),
express the potential malfunctions towards the component using a service of a given
type. For all services of a given type the sets of potential malfunctions towards the
using components are the same.
Let pmr : ST → ℘(M), where ∀ x ∈ ST: ok ∈ pmr(x),
express the potential malfunctions towards the component realizing the service of a given type. Compared to pmu, this is the reverse direction. Typically, the component providing a service is less affected by malfunctions than the using component. However, sometimes wrong usage has a negative influence back on the realizing component. For all services of a given type the sets of potential malfunctions towards the realizing components are the same.
For each component type ct ∈ CT and each service type st ∈ ST realized by a component c of
type ct, formally nr(ct, st)>0 and t(c) = ct, the following behaviour function brct,st expresses fault
propagation to each realized service s of type st, formally s|ST = st. The value of function brct,st
depends on all faults and malfunctions affecting c.

Let brct,st : pf(ct) × (×y∈ST, nu(ct, y)>0 (pmu(y), min, max)) × (×y∈ST, nr(ct, y)>0 (pmr(y), min, max)) → ℘(pmu(st))
express the behaviour of component c towards a realized service s. For each used or realized service type y, the corresponding argument is a triple (malfunction, min, max): a malfunction from pmu(y) or pmr(y), respectively, together with the minimum and maximum number of service instances exhibiting it (cf. the entries such as (ok,2,2) in the tables below).
In the absence of any fault or malfunction brct,st must deliver ok:
brct,st (ok, ... , ok) = {ok}
In the following the reverse propagation from a service using component to a service realizing
component is dealt with.
For each component type ct ∈ CT and each service type st ∈ ST used by a component c of type
ct, formally nu(ct, st)>0 and t(c) = ct, the following behaviour function buct,st expresses fault
propagation to each used service s of type st, formally s|ST = st. The value of function buct,st
depends on all faults and malfunctions affecting c.

Let buct,st : pf(ct) × (×y∈ST, nu(ct, y)>0 (pmu(y), min, max)) × (×y∈ST, nr(ct, y)>0 (pmr(y), min, max)) → ℘(pmr(st))
express the behaviour of component c towards a used service s, with arguments structured as for brct,st.
In the absence of any fault or malfunction buct,st must deliver ok:
buct,st (ok, ... , ok) = {ok}
In many cases the functions brct,st and buct,st may depend on the various variables in the same
way, because the worst fault or malfunction, respectively, is supposed to exercise the dominant
influence.
In the example depicted in Figure D.5 the following faults and malfunctions may be defined:

• F = {ok, minor, severe}

• pf(ct1) = {ok, minor}, pf(ct2) = {ok, minor, severe}

• M = {ok, silent, omitted, wrong}

• pmu(st1) = {ok, silent, omitted}, pmu(st2) = {ok, silent, wrong}

• pmr(st1) = {ok, omitted}, pmr(st2) = {ok, omitted}

• brct1,st1 is defined by the following table:


pf(ct1)   pmu(st1)      pmu(st2)   pmr(st1)   brct1,st1
ok        (ok,2,2)      (ok,1,1)   (ok,1,1)   {ok}
minor     (ok,2,2)      (ok,1,1)   (ok,1,1)   {ok, omitted}
ok        (silent,1,1)  (ok,1,1)   (ok,1,1)   {silent, omitted}
...       ...           ...        ...        ...

• buct1,st1 is defined by the following table:

pf(ct1)   pmu(st1)       pmu(st2)      pmr(st1)       buct1,st1
...       ...            ...           ...            ...
minor     (omitted,1,1)  (silent,1,1)  (omitted,1,1)  {omitted}
...       ...            ...           ...            ...

• In the same way the following function has to be defined for ct1: buct1,st2.

• For ct2 the following functions have to be defined: brct2,st1, brct2,st2, buct2,st1.
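In an implementation, such behaviour functions can be represented as lookup tables. The following Python sketch is an assumption of this document's editors, not part of the EASIS tooling; the rows are taken from the brct1,st1 table above, and the flat tuple key layout is illustrative:

```python
# Each key is (component fault, used-st1 triple, used-st2 triple,
# realized-st1 triple); each value is the set of resulting malfunctions of
# the realized st1 service, as in the br_ct1,st1 table above.
br_ct1_st1 = {
    ("ok",    ("ok", 2, 2),     ("ok", 1, 1), ("ok", 1, 1)): {"ok"},
    ("minor", ("ok", 2, 2),     ("ok", 1, 1), ("ok", 1, 1)): {"ok", "omitted"},
    ("ok",    ("silent", 1, 1), ("ok", 1, 1), ("ok", 1, 1)): {"silent", "omitted"},
}

# The fault-free row must deliver {ok}, as required by br(ok, ..., ok) = {ok}.
print(br_ct1_st1[("ok", ("ok", 2, 2), ("ok", 1, 1), ("ok", 1, 1))])
```

A table lookup of this kind is exactly what the FT.rex database representation described later amounts to.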

D.3.2.4.3 Fault Tolerance Definition

Fault tolerance is defined in the following way. For a set of faults within a set of components, called the fault region, the set of malfunctions of a larger set of components, called the containment region, must not exceed a given set of allowed malfunctions. Typically, only {ok} is allowed, requiring complete fault tolerance.
Let FT ⊂ ℘(C) × ℘(F) × ℘(C) × ℘(M)
be a fault tolerance definition of a complete system.

For some (FR, SF, CR, SM) ∈ FT we call
FR the fault region,
SF the set of faults in the fault region to be tolerated,
CR the containment region, where FR ⊂ CR must hold, and
SM the set of allowed malfunctions of all services realized by components from CR and used by components outside CR. Typically SM = {ok} for complete fault tolerance.
Now we develop a condition expressing whether or not the system is fault-tolerant with respect to the defined set FT. For this purpose a sequence of actual system states is defined. The initial state corresponds to the faults in the fault region. Subsequent states express the malfunctions caused by fault propagation. Eventually the sequence of actual states will reach a stable state. The malfunctions in this state must not exceed SM for the respective services.


Def: af : C → ℘(F), such that ∀ x ∈ FR: af(x) = SF ∩ pf(x)
and ∀ x ∈ C – FR: af(x) = {ok}
expresses the actual faults of each component. Inside the fault region the respective set of faults is considered. Outside the fault region all components are fault free.
Def: amu1 : S → ℘(M), such that ∀ x ∈ S: amu1(x) = {ok}
expresses the initial actual malfunctions of each service towards the using component. In the beginning all services are provided correctly.
Def: amr1 : S → ℘(M), such that ∀ x ∈ S: amr1(x) = {ok}
expresses the initial actual malfunctions of each service towards the component realizing it. In the beginning the usage of the services does not cause malfunctions to the components providing them.
Def: amuk+1 : S → ℘(M) for k ∈ IN such that
∀ x ∈ S: amuk+1(x) = amuk(x) ∪ brt(r(x)), x|ST(af(r(x)), amuk(y), ... , amrk(z), ...)
for all y ∈ u(r(x)) and all z with r(z) = r(x), obeying the order of parameters defined for function brt(r(x)), x|ST.
Function amuk+1 expresses the actual malfunctions of each service towards the using component. The malfunctions correspond to state k + 1, which is based on the previous state k.
Def: amrk+1 : S → ℘(M) for k ∈ IN such that
∀ x ∈ S: amrk+1(x) = amrk(x) ∪ but(x|C), x|ST(af(x|C), amuk(y), ... , amrk(z), ...)
for all y ∈ u(x|C) and all z with r(z) = x|C, obeying the order of parameters defined for function but(x|C), x|ST.
Function amrk+1 expresses the actual malfunctions of each service towards the component realizing it. The malfunctions correspond to state k + 1, which is based on the previous state k.
Theorem: The sequence of functions (amuk, amrk) for k ∈ IN eventually becomes stable,
formally: ∃ n ∈ IN: ∀ k > n: (amuk, amrk) = (amun, amrn).
Proof: Functions amuk and amrk return a set of actual malfunctions for each service. Due to the union operator in the definitions of amuk and amrk the sets of actual malfunctions may gain elements, but never lose elements for increasing k. Consequently, the cardinalities of the sets are monotonically non-decreasing. However, the cardinalities of the sets of potential malfunctions form an upper bound for each service. Since the number of services is finite, the process of adding elements to the sets of malfunctions terminates when some n is reached.

Condition: The system satisfies the fault tolerance definition FT, if

∀ (FR, SF, CR, SM) ∈ FT: ∀ x ∈ S: (r(x) ∈ CR ∧ x|C ∉ CR ⇒ amun(x) ⊂ SM)
∧ (r(x) ∉ CR ∧ x|C ∈ CR ⇒ amrn(x) ⊂ SM)

In this condition amun and amrn denote the stable functions of the sequence of functions starting with the fault set SF of the respective element of FT. Note that the condition has to be checked for each element of FT.
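The stabilisation argument in the theorem translates directly into a fixed-point iteration. The following Python fragment is a minimal sketch, assuming a single `propagate` callback in place of the typed br/bu behaviour functions; the toy data is purely illustrative:

```python
def stable_malfunctions(services, initial, propagate):
    """Iterate malfunction sets until the stable state (amu_n, amr_n) is reached."""
    am = {s: set(initial.get(s, {"ok"})) for s in services}
    while True:
        changed = False
        for s in services:
            new = propagate(s, am)      # malfunctions newly induced on service s
            if not new <= am[s]:        # union only grows sets, so this terminates
                am[s] |= new
                changed = True
        if not changed:
            return am

# Toy behaviour: a fault visible on service "a" propagates to "b", nowhere else.
def toy_propagate(s, am):
    if s == "b" and "omitted" in am["a"]:
        return {"omitted"}
    return set()

am = stable_malfunctions(["a", "b", "c"],
                         {"a": {"ok", "omitted"}}, toy_propagate)
print(sorted(am["b"]), sorted(am["c"]))   # ['ok', 'omitted'] ['ok']
```

Termination is guaranteed for exactly the reason given in the proof: the per-service sets only ever gain elements and are bounded by the finite sets of potential malfunctions.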


D.3.2.4.4 Example

The following example distinguishes these malfunctions of components:

c correct function (which is not a real malfunction, of course)
e explicit error indication
v violation of a code or a plausibility check by a wrong function value (sometimes also called a non-code value failure). The violation is detectable by an appropriate check.
w wrong value which does not violate a code, plausibility or consistency condition. This malfunction cannot be detected by an absolute test on the function value.
dc delayed provision of correct function (detectable by timeout)
dv delayed provision of a value violating a code, plausibility or consistency condition
(detectable by timeout)
dw delayed provision of a wrong value, not violating a code, plausibility or consistency
condition (detectable by timeout)
o omission of a value (equal to an infinite delay of that value)
The assumed system structure can be seen in the diagram in Figure D.6. Each of the two redundant sensors S1 and S2 forwards its readings to one processor of each of the two processor pairs (P1, P2) and (P3, P4). In this double-duplex solution, comparator C1 reads the results of (P1, P2), whereas C2 reads the results of (P3, P4). In case of equality, a comparator forwards the result via its bus adapter A1 or A2, respectively (sometimes also called a communication controller), to both busses B1 and B2. At least one bus should transfer the correct result to the receivers (not considered here) and, moreover, neither of the two busses must transfer an undetectably wrong value (malfunctions w or dw).

[Figure D.6 omitted: block diagram of sensors S1 and S2 feeding the processor pairs (P1, P2) and (P3, P4); the comparator/adapter pairs CA1 = (C1, A1) and CA2 = (C2, A2) connect to the busses B1 and B2.]

Figure D.6 Simple example for the formulation of fault-tolerance requirements


In the diagram, fault regions are marked in red. The names of the fault regions are identical or similar to those of the respective components. Each sensor, processor or bus forms a fault region of its own, whereas each pair of a comparator and an adapter forms a joint fault region. Containment regions are not depicted, to avoid overloading the illustration.
The functions and potential malfunctions of the components are as follows:
Fault free sensor Si: no input ...................................... output: c
Faulty sensor Si: no input ...................................... output: c, v, o
(no undetectably wrong sensor values are assumed here)
Fault free processor Pi: on input c .................................. output: c
on input v, o ............................... output: e
Faulty processor Pi: on any input ............................... output: c, v, w, dc, dv, dw, o
Fault free comp./ad. CAi: on input pair (x, y) ...................... output: c
where x = c and y ∈ {c, e, v, dv, dw, o},
or vice versa: x ∈ {c, e, v, dv, dw, o} and y = c.
on input pair (w, w) .................... output: w, e
on any other input pair ............... output: e
(timeout check in the comparator is assumed)
Faulty comp./ad. CAi: on any input pair (x, y) ............... output: c, v, dc, dv, o
where x ∉ {w, dw} and y ∉ {w, dw}.
(comparator cannot create undetectably wrong values because CRC
generation in the processors is assumed)
on any other input pair ............... output: c, v, w, dc, dv, dw, o
Fault free bus Bi: on input pair (x, y) ...................... output: c
where x = c and y ∈ {c, v, o},
or vice versa: x ∈ {c, v, o} and y = c.
on any other input pair ............... output: v, w, dv, dw, o.
Faulty bus Bi: on input pair (x, y) ...................... output: c, v, o
where x ∈ {c, v, o} and y ∈ {c, v, o}.
else on input pair (x’, y’) ............. output: c, v, dc, dv, o
where x’ ∈ {c, v, dc, dv, o} and y’ ∈ {c, v, dc, dv, o}.
on any other input pair ............... output: c, v, w, dc, dv, dw, o
(bus cannot create undetectably wrong values due to CRC protection)
Note that the above dependencies of the functions and malfunctions on the inputs of the components could (and should) be modelled in more detail, whereas the distinction into the malfunction classes c, e, v, w, dc, dv, dw and o seems to be sufficient for many types of systems.
Now the fault regions and containment regions can be specified as follows. Only some examples are given here, not the complete list:

Malfunctions                 fault region   containment region   function outside (degradation)
• c, v, o                    S1             S1, P1, P3           c
• c, v, o                    S2             S2, P2, P4           c
• c, e, v, w, dc, dv, dw, o  P1             P1                   c
• c, v, w, dc, dv, dw, o     CA1            CA1                  c
• c, v, w, dc, dv, dw, o     B1             B1                   c

Although the system structure in the diagram above seems to be reasonably resilient against faults, the (optional) check reveals an undesired “hole” in the fault tolerance requirement. Malfunction dv in CA1 can lead to malfunction dv on bus B1 and, simultaneously, to malfunction dv on B2. This violates the specification: neither bus transfers the correct value. The insertion of bus guardians could have prevented this system failure.
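The border-crossing condition of D.3.2.4.3 can likewise be checked mechanically. The following sketch is illustrative only: the component names follow the example, but the stable malfunction sets are assumed here rather than derived, and the check reports the violation on the containment-region border of CA1:

```python
def satisfies_ft(services, realizer, user, am, ft_defs):
    # ft_defs: list of (FR, SF, CR, SM) tuples as in D.3.2.4.3; the fault
    # injection from FR/SF is assumed to be already folded into am.
    for _fr, _sf, cr, sm in ft_defs:
        for s in services:
            if (realizer[s] in cr) != (user[s] in cr):   # border-crossing service
                if not am[s] <= sm:
                    return False
    return True

# Assumed stable state mirroring the detected hole: both bus outputs may show
# dv, while the specification allows only {'ok'} outside the containment region.
services = ["b1_out", "b2_out"]
realizer = {"b1_out": "B1", "b2_out": "B2"}
user     = {"b1_out": "Rx", "b2_out": "Rx"}
am       = {"b1_out": {"ok", "dv"}, "b2_out": {"ok", "dv"}}
ft_defs  = [({"CA1"}, {"dv"}, {"CA1", "B1"}, {"ok"})]
print(satisfies_ft(services, realizer, user, am, ft_defs))   # False
```

A system without the hole (all crossing services showing only ok) would make the same check return True.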

D.3.2.5 Prototype of a tool “FT.rex”

D.3.2.5.1 Scope of FT.rex

The mathematical model described above makes it possible to define fault tolerance requirements precisely and formally. However, the formulas involved are rather difficult to handle. In order to ease the usage of the model, a prototypical tool, “FT.rex”, has been developed. It provides all the means of the mathematical model in a more user-friendly way.
The scope of the tool is identical to that of the mathematical model. It allows the user to specify the system structure, the fault model and the fault tolerance specification of a given system.
FT.rex builds up a relational database model of the entered data and uses Microsoft Access as a basis. It also checks whether the data entered is consistent; for example, malfunctions cannot be assigned to a component which does not exist.
The main benefit of FT.rex is that it requires the user to formulate his fault tolerance requirements explicitly and logically. In doing so, design flaws can become obvious at an early stage.
It has to be noted that FT.rex is not a design tool. It can be used to accompany the design process,
though. In any case, the focus lies on specifying the fault tolerance characteristics of a given
system and finding possible flaws or inconsistencies.

D.3.2.5.2 User Interface

The user interface is split into three sections, “system structure”, “fault model” and “fault tolerance specification”, according to the scope described before. The main menu groups these three sections together on one screen (see Figure D.7).
Each screen in FT.rex offers a short help area in the upper right corner and a menu in the lower right corner. In the main menu, the user can open the form he wants to edit, quit the program or show an “about” dialog. The design of all screens is similar, so that it should be easy to navigate through the program.
FT.rex is started by double-clicking the file “FT.rex.mdb”. This causes Microsoft Access to open and automatically show the FT.rex main menu. The user can navigate through the whole program without using the Microsoft Access menus. However, as this is a prototypical tool, he is not prevented from doing so.


Figure D.7 Main menu

Each GUI element has been created using the form editor of Microsoft Access. The usage of these elements is not explained in detail here, as it should be familiar to any user of the program.
In short, the data is entered in tables, where a blue title bar shows the general logical contents of the table (in terms of the model presented in chapter D.3.2.4). The column headers inside the tables show the meaning of each field. A row of data is selected by clicking on the marker (triangle) at the left side of the table. A new row can be entered at the end of the table, marked with an asterisk. Navigating through the data is done with the control elements at the bottom (back, forward, go to, etc.).
The following chapters describe the three topics of system structure, fault model and fault tolerance specification in more detail. They are covered in the same order as the buttons of the main menu: from top to bottom in each of the three topics.

D.3.2.5.2.1 System structure

Each button in the “system structure” section opens a window in which the user may enter the respective data.


At first, the user should provide information about the component types which occur in the system. This is done by clicking on the button “Manage Component Types” in the main menu. An example is the abstract component type “motor” (cf. Figure D.8).

Figure D.8 Definition of component types

The result is a complete list of component types. After entering that list, a click on “close form”
returns to the main menu. The dialog “component types” can be opened again at any time.
After specifying the component types of the system, instances of these component types have to
be entered. For example, a system may contain two identical motors.


Figure D.9 Definition of components

As shown in Figure D.9, each component instance is assigned to a specific component type which
is selected on the left side of the window. The two tables dynamically adjust themselves to each
other.
After specifying the component instances, the list of service types is defined (not shown here, as it is identical to the definition of component types).
These service types are then assigned to specific component types. For each pair of a service type and a component type, two numbers nr and nu specify how often the component type realizes the service type and how often it uses it (see Figure D.10). For example, any motor realizes torque for an arbitrary number of other components (encoded as 99) but does not use the service type torque itself (0).


Figure D.10 Relationship between component types and service types

After this basic relationship is defined, the actual assignment of components (not component types!) and service types has to be done. The goal is to obtain an identifiable instance of a service which is delivered from one specific component to another. This is done by assigning a using and a realizing component to a service type, together with an unambiguous number (see Figure D.11).
The program ensures that the number of possible instances of a service (defined in the dialog “Manage number of service realisations/utilisations”) is not exceeded.


Figure D.11 Definition of services

This step completes the specification of the system structure.


Note that up to this point, no malfunctions or errors have been defined. The first steps explained above are only necessary to define the basic system structure to which faults and malfunctions are assigned later on.
When designing a system structure, the level of detail has to be chosen by the user. As faults are later assigned to the specified components, it is most suitable to choose the level of detail accordingly, i.e. depending on the level of detail at which faults are considered.
As the fault tolerance specification relies on a properly designed system structure, it is appropriate to re-check the structure at this time. A graphical representation may also be helpful; at present, FT.rex itself does not provide graphical output.
It has to be noted that the structure defined has to be static, i.e. no reconfiguration is considered. Also, the specified behaviour is purely descriptive: it is to be understood as a data structure with labels for the different components and services. Whether this structure is appropriate can only be checked by the designer, not by the tool. The tool, however, helps the designer re-think a draft by requiring a formal specification.


D.3.2.5.2.2 Fault Model

This section describes possible faults of component types and service types and their relationship
to each other.
First, the set of component faults has to be specified in general (not shown here). Then, these component faults are assigned as possible faults to specific component types (see Figure D.12).

Figure D.12 Assignment of component faults

Malfunctions of service types are handled separately. After defining all possible malfunctions in
general, they are assigned to the usage or realization of services. Figure D.13 shows the
assignment of possible malfunctions to the usage of a service type. Assignment concerning the
realization is done in the same way.


Figure D.13 Assignment of malfunctions to service types

Once the malfunctions have been assigned to the realisation or utilisation of services, they can be grouped with regard to service types or component types. This is done identically for both realisation and utilisation (not shown here).
Grouping is needed because these groups of malfunctions define the concrete behaviour of a component when realizing or using services.


Figure D.14 Specification of fault propagation

This behaviour defines the propagation of faults in the system and is shown in Figure D.14.
Each row in the table has to be read as follows:

• If a component of type “CT”
• that delivers a service of type “ST”
• exhibits the fault “pf”,
• and the used services have malfunctions according to “XpmuCT”,
• and the realised services have malfunctions according to “XpmrCT”,
• then each realised service of type ST will show the malfunctions listed in “PpmuST”.

A graphical illustration of this is shown in Figure D.15.


[Figure D.15 omitted: illustration of the behaviour function br mapping the component fault and the malfunctions of the used (XpmuCT) and realised (XpmrCT) services onto the malfunctions PpmuST of a realised service.]

Figure D.15 Visualisation of fault propagation

The same principle is used when defining the fault propagation regarding the usage of services.
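Read this way, each table row is essentially a record. A hypothetical encoding of one such row in Python (the field names follow the column headers shown above; the concrete values are illustrative, not taken from a real FT.rex database):

```python
# One fault-propagation rule: under fault "pf" of component type "CT", with the
# used/realised services showing malfunction groups "XpmuCT"/"XpmrCT", each
# realised service of type "ST" shows the malfunctions in "PpmuST".
rule = {
    "CT": "ct1",
    "ST": "st1",
    "pf": "minor",
    "XpmuCT": {"ok"},
    "XpmrCT": {"ok"},
    "PpmuST": {"ok", "omitted"},
}

def applies(rule, ct, st, fault):
    # A rule matches a concrete component/service/fault combination by type.
    return rule["CT"] == ct and rule["ST"] == st and rule["pf"] == fault

print(applies(rule, "ct1", "st1", "minor"))   # True
```

This is the same relational representation that FT.rex stores in its Access tables: evaluating the behaviour function reduces to selecting the matching rows.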

D.3.2.5.2.3 Fault tolerance specification

Lastly, the desired behaviour of the system regarding fault tolerance can be specified. This is done by clicking on the respective button in the section “fault tolerance specification” in the main menu.
In the fault tolerance dialog (see Figure D.16), the user can specify several sets of fault-tolerant behaviour. Each set is characterised by

• a fault region FR
• the component faults which shall be tolerated SF
• the containment region CR
• the malfunctions still allowed at the border of the containment region SM

The first two elements describe the faulty behaviour which shall be tolerated. The containment region describes the desired border of the propagation of these faults. The last element describes the degradation in terms of allowed malfunctions (e.g. “reduced speed”).


Figure D.16 Specification of fault regions and fault containment regions

The respective menu is structured into five sections. First, the user selects a given set of fault tolerance specifications in the top frame (or enters a new one). These specifications are simply numbered.
After that, the user can enter the fault region, the faults, the containment region and the degradation in the four frames below. The fault and fault containment regions are specified by simply listing a number of components. For faults and malfunctions, FT.rex ensures that only those valid for these regions can be selected.

D.3.2.6 Two important considerations

D.3.2.6.1 Level of error propagation

When modelling systems and errors using the method proposed in this document, special care has to be taken to specify the correct level at which errors are propagated.

A short example:

[Figure D.17 omitted: two nodes, each hosting an application process (A1, A2) on the application layer and a network-layer component (C1, C2).]

Figure D.17 Error propagation

Shown in Figure D.17 is an application process A1 sending data to application process A2 on another node. This is done via a communication channel and components of a network layer (C1, C2) also residing on the nodes.
If A1 sends wrong application data, this error has to be modelled. It is tempting to just follow the
flow of information from A1 over C1 and C2 to A2 and mark each of these components as
erroneous. However, one has to keep in mind that C1 and C2 themselves are not faulty and, moreover, do not become faulty by transporting wrong data! Therefore, this approach is not correct.
Instead, one has to explicitly model the relationship between A1 and A2 on the application layer
(black arrow) and specify the error propagation also on that layer (red arrow). Otherwise, any data
sent over C1 and C2 would implicitly be regarded as being corrupted by these components, which
is clearly not intended.
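In terms of the model, this means defining a separate application-layer service between A1 and A2. A minimal Python sketch of this layering (all names are illustrative):

```python
# Illustrative service allocation: the application-layer relationship between
# A1 and A2 is modelled explicitly, in addition to the network-layer services
# provided by C1 and C2.
r = {
    ("A2", "app_data", 1): "A1",    # application layer: A1 delivers data to A2
    ("A1", "transport", 1): "C1",   # network layer, node 1
    ("A2", "transport", 1): "C2",   # network layer, node 2
}

# A wrong value from A1 is a malfunction of the application-layer service ...
malfunctions = {s: {"ok"} for s in r}
malfunctions[("A2", "app_data", 1)].add("wrong")

# ... while the transport services of the fault-free C1 and C2 stay correct.
print(malfunctions[("A1", "transport", 1)])   # {'ok'}
```

With this encoding, the error propagates along the application-layer service only, and C1 and C2 are never implicitly marked as corrupting data.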

D.3.2.6.2 Monotonicity

The propagation of errors in the model is regarded as monotonic; that is, no correction of errors can be modelled at the current stage.


An example is shown in Figure D.18:

[Figure D.18 omitted: three application processes A, B and C, a switch U selecting between A and B, and a comparator V comparing the results of A and C.]

Figure D.18 Monotony

The system shown in Figure D.18 consists of three application processes (A, B, C), a switch (U) and a comparator (V). Initially, the result of A is used. If V discovers a difference between A and C, it causes U to switch to B instead. The whole system can thus tolerate a single fault in A, B or C.
Now consider A being faulty (B and C are assumed fault-free). At first, switch U forwards the erroneous result for a short period of time. Then V detects the deviation between the results of A and C, causing U to switch to B. Finally, a correct value is output to subsequent processes.
The outcome thus depends on the sequence in which the states of the different components are evaluated. As a program is not able to foresee this sequence, situations like the one described above have to be avoided. For non-monotonic systems a complete state-space analysis is of course possible. The method described here is only valid for monotonic systems, where it can be used as a faster substitute for a complete state-space analysis; it cannot be used for non-monotonic systems.

D.3.2.7 Conclusion

When it comes to the formulation of dependability-related requirements, a clear and unambiguous description of the requirements seems to be one of the most important aspects. This is especially true in complex systems, like the integrated safety systems addressed in the EASIS project.


Three ways to support the formulation of requirements have been presented in the preceding chapters:

• a procedure for activities related to the formulation of requirements
• a mathematical model for the formulation of the requirements
• a prototypical tool to allow user-friendly handling

The procedure can be divided into four steps: description of the system and the required degradation, description of the possible malfunctions, forming of fault regions, and specification of containment regions. It is important to note that a single pass through these steps is not likely to be enough for formulating fault tolerance requirements. Instead, a number of iterations will be necessary in most cases. Once the requirements are expressed, a check of the requirements is recommended.
The mathematical model presented in this document describes the formal background of fault
tolerance requirements. It provides clear descriptions of the structure of the system, the fault
model and the desired fault tolerance characteristics. However, due to the formal nature of the
model, these descriptions need to be made as easy to use as possible.
The prototypical tool represents one way of facilitating the formulation of fault tolerance
requirements. By using an entity-relationship model of the mathematical formulae, it is possible to
express fault tolerance requirements as a set of tables in a database management system. The
prototypical implementation “FT.rex” (standing for “fault-tolerance requirements”), which has been
presented here, is a first attempt at such a tool.
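The entity-relationship idea can be illustrated with a minimal sketch: fault regions and containment regions stored as relational tables and queried with plain SQL. All table and column names below are invented for illustration and do not reflect the actual FT.rex schema.

```python
import sqlite3

# Minimal sketch of storing fault tolerance requirements as relational
# tables, in the spirit of the FT.rex approach. Table and column names
# are invented for illustration; the real FT.rex schema may differ.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE component     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fault_region  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE region_member (region_id INTEGER REFERENCES fault_region(id),
                            component_id INTEGER REFERENCES component(id));
-- A containment requirement: faults arising in 'region_id' must not
-- propagate beyond the region referenced by 'contained_by'.
CREATE TABLE containment   (region_id INTEGER REFERENCES fault_region(id),
                            contained_by INTEGER REFERENCES fault_region(id));
""")
con.executemany("INSERT INTO component VALUES (?, ?)",
                [(1, "speed sensor"), (2, "controller"), (3, "actuator")])
con.executemany("INSERT INTO fault_region VALUES (?, ?)",
                [(1, "sensor region"), (2, "whole channel")])
con.executemany("INSERT INTO region_member VALUES (?, ?)",
                [(1, 1), (2, 1), (2, 2), (2, 3)])
con.execute("INSERT INTO containment VALUES (1, 1)")

# Query: which components belong to the fault region of a containment rule?
rows = con.execute("""
    SELECT c.name FROM containment ct
    JOIN region_member rm ON rm.region_id = ct.region_id
    JOIN component c      ON c.id = rm.component_id
""").fetchall()
print(rows)  # [('speed sensor',)]
```

Expressing the requirements this way makes them mechanically checkable (for example, a query can verify that every containment rule references a defined fault region), which is the main benefit the tool aims at.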

As a result of the work in EASIS, some aspects of the proposed model have proven important for
the user to keep in mind. They are described in more detail in the text itself; in short, a) in layered
systems the level of error propagation has to be chosen correctly, and b) non-monotonic
behaviour (for example self-repair) is not supported by the proposed model.
Despite this limitation, the proposed model can be used as the basis for a rather fast analysis of
monotonic systems, compared with classical methods like state-space analysis. In any case, the
mere fact that the usage of a formal model requires the user to clearly express his or her thoughts
can be regarded as a benefit for the detection of possible flaws in fault tolerance requirements. Thus,
even without actually using all features, formal expression of fault tolerance requirements is likely
to contribute to increasing the reliability of the final system.


D.3.2.8 Summary

Table D.2 provides a brief summary of the findings concerning fault tolerance requirements
presented in this document.

General characteristics: These requirements express the fault tolerance characteristics of a
given system. This also includes degradation, which can be seen as a special case of fault
tolerance characteristics.

How to determine suitable requirements: No general solution can be given here. The section
“procedure”, however, gives some hints for basic activities:

a. Identify services where resilience against faults is required

b. Distinguish malfunctions

c. Form fault regions

d. Form containment regions

Please refer to the respective section for a detailed description. It may be necessary to iterate
the procedure several times.

How to express requirements: The core feature of this section is a mathematical model to
express these kinds of requirements in a formal way. This is of course not the only possible way,
but one proposal. The description of the requirements is based on a description of a fault model
and of the system structure.

Specific difficulties: Depending on the level of detail, the specification may become very
extensive.

Relationship to other types of requirements: Hazard probability requirements may precede the
establishment of fault tolerance requirements. On the other hand, requirements on specific
dependability mechanisms and fault tolerance requirements may complement one another.

Relationship to other requirements of the same type: Not applicable

Verification issues: Verification of these kinds of requirements can be achieved by formal
methods. As EASIS Work Task 3.2 (see Part 2 of this Deliverable D3.2) already deals with such
verification issues, they are not covered in this section.

Examples: Please refer to the section “example”. Due to the nature of the mathematical model, it
is not possible to give an example in a few lines of text without the definitions given there.

Table D.2 - Essential characteristics of fault tolerance requirements


D.3.3 Requirements on system architecture

Requirements on the system architecture are often highly appropriate when a dependable system
is to be developed. It may of course be debated whether such "requirements" are really
requirements or if they are part of the system design work. However, as hinted at in the
introduction (D.1) to this appendix, our requirement scope covers both implementation-
independent and implementation-dependent issues.
This requirement type is most easily explained by a few examples:
• It may be the case that a specific action performed by a system may lead to a critical
hazard if it is performed at the wrong time. It may then be appropriate to require that the
action shall only be performed when two independent subsystems both agree that the
action shall be carried out. The system architecture features necessary for such a "two
out of two" (2oo2) scheme may be specified in the form of system architecture
requirements. For example, one subsystem could control an actuator while the other
subsystem controls the power supply to the actuator.
• If the hazard occurrence analysis shows that a failure of a particular sensor may lead to
a critical hazard, it may be appropriate to require that the system is equipped with
redundant sensors. Such redundancy requirements can be seen as system architecture
requirements. In this case, an error detection mechanism that capitalises on this
redundancy would additionally have to be specified. Error detection is discussed in
more detail in section D.3.4.
When redundant components are employed, the issues related to independence between the
components are very important. Section D.3.10 discusses requirements concerning independence
in more detail.
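The 2oo2 scheme from the first example above can be sketched in a few lines. The function names are illustrative; the point is simply that activation requires agreement of both independent subsystems, with Unit1 on the command path and Unit2 on the power path.

```python
# Sketch of a "two out of two" (2oo2) activation scheme: the actuator only
# operates when both independent subsystems agree. Here Unit1 drives the
# actuator command and Unit2 enables its power supply, so a single faulty
# unit cannot activate the actuator on its own. Names are illustrative.

def actuator_active(unit1_command: bool, unit2_power_enable: bool) -> bool:
    # Physically, the AND is realised by the series connection of the
    # command path and the power path; here it is modelled as a function.
    return unit1_command and unit2_power_enable

print(actuator_active(True, True))    # True:  both agree, action is taken
print(actuator_active(True, False))   # False: Unit2 withholds power
print(actuator_active(False, True))   # False: Unit1 withholds the command
```

Note that this scheme protects against spurious activation, not against loss of function: a failure of either unit prevents the action, which is acceptable only if the passive state is safe.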
A summary of the issues related to system architecture requirements is given in Table D.3 below.

General characteristics: Description of architectural features to be implemented in the system in
order to achieve or facilitate a high degree of dependability.

How to determine suitable requirements: Based on the results of the hazard occurrence
analysis, it should be checked whether the higher-level requirements on fault tolerance and
hazard probability are met in the existing design (conceptual, detailed, or anywhere in between).
If this is not the case, the feasibility of introducing architecture features that solve the problem
should be investigated.

How to express requirements: Requirements are expressed as architecture properties.

Specific difficulties: Specification of an architecture is a creative activity, only partly possible to
formalise.

Relationship to other types of requirements: Architectural requirements are typically derived
from requirements on hazard probability (D.3.1) and requirements on fault tolerance (D.3.2).
Such requirements often involve components that need to be independent from each other.
Requirements concerning isolation and diversity will then be important (see section D.3.10).

Relationship to other requirements of the same type: A requirement may be formulated as "the
actuator shall not be activated unless both Unit1 and Unit2 agree that an activation shall be
made". At a more technical level, the requirement may be specified as "Unit1 shall control the
high-side driver of the actuator and Unit2 shall control the low-side driver".

Verification issues: Very straightforward. It is only a matter of checking whether the system
architecture fulfils the requirement or not.

Examples: "There shall be two separate sensors for detection of an impending front-end
collision"
"There shall be two separate units that independently determine whether a collision is
impending"
"The system architecture shall be such that a collision avoidance action is only taken if both
units determine that a collision is impending"

Table D.3 - Essential characteristics of requirements on system architecture

D.3.4 Requirements on specific error detection mechanisms and corresponding reactions

In order to achieve the necessary degree of dependability, mechanisms for error detection during
runtime have to be implemented in any but the most trivial systems. This section is concerned with
establishment of requirements that describe the error detection mechanisms and the
corresponding reactions to be implemented in the system of concern. Such requirements can be
provided to the hardware and software designers for implementation.
It is important to understand that this type of requirement may be expressed at different levels of
detail. In an early phase of the development process, it may be sufficient to specify that a certain
entity, for example vehicle speed data, shall be checked for plausibility. At a later stage, this
plausibility check could be specified in more detail. Thus, this requirement type covers a broad
range from pure requirements to detailed descriptions of the design. Consider the following
example.
Requirement(1): The system shall check the longitudinal acceleration sensor signal
according to the following:
o Every 10 ms, the system shall read the acceleration sensor value.
o If the sensor value is found to represent an acceleration larger than +3 m/s^2, this
value shall be replaced by the latest sensor value that did not exceed this threshold.
Furthermore, an error counter shall be incremented by COUNT_UP steps, unless
the error counter is already at its MAX_COUNT value. In case the incrementation
results in a value higher than MAX_COUNT, the value of the error counter shall be
set to MAX_COUNT.
o If the sensor value is found to represent an acceleration of +3 m/s^2 or less, an
error counter shall be decremented by COUNT_DOWN steps, unless the error
counter is already at its MIN_COUNT value. In case the decrementation results in a
value lower than MIN_COUNT, the value of the error counter shall be set to
MIN_COUNT.
o If the error counter has a value of ERROR_THRESHOLD or higher, the system
shall set all its outputs to their passive* states for the remainder of the current
driving cycle.
* the last section of the requirement should additionally include a description of (or a
reference to) the concept of "passive states"
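The requirement above is concrete enough to be sketched directly in code. The parameter values used here (COUNT_UP, COUNT_DOWN, etc.) are illustrative, since the requirement leaves them as symbolic constants, and the initial value of the last good sample is an assumption.

```python
# Sketch of the acceleration plausibility check specified above.
# The numeric parameter values are illustrative assumptions.
ACCEL_LIMIT = 3.0          # m/s^2, from the requirement text
COUNT_UP, COUNT_DOWN = 2, 1
MIN_COUNT, MAX_COUNT = 0, 10
ERROR_THRESHOLD = 4

class AccelCheck:
    def __init__(self):
        self.counter = MIN_COUNT
        self.last_good = 0.0   # assumed initial value
        self.passive = False   # latched for the rest of the driving cycle

    def sample(self, accel: float) -> float:
        """Called every 10 ms with the raw sensor value; returns the
        (possibly substituted) value to be used by the application."""
        if accel > ACCEL_LIMIT:
            value = self.last_good                            # substitute
            self.counter = min(self.counter + COUNT_UP, MAX_COUNT)
        else:
            value = accel
            self.last_good = accel
            self.counter = max(self.counter - COUNT_DOWN, MIN_COUNT)
        if self.counter >= ERROR_THRESHOLD:
            self.passive = True    # outputs go to their passive states
        return value

chk = AccelCheck()
chk.sample(1.0)          # plausible: used as-is, becomes the last good value
out = chk.sample(9.9)    # implausible: replaced by the last good value (1.0)
chk.sample(9.9)          # second hit: the counter reaches the threshold
print(out, chk.passive)  # 1.0 True
```

Note how the saturation at MAX_COUNT and MIN_COUNT and the latched passive state follow directly from the requirement text; this one-to-one mapping is what makes such a requirement easy to verify by review and fault injection.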
Is this example a requirement or a description of an implementation? Obviously, the border
between requirements and implementation can be difficult to define. We will not attempt to define

(1) The example is given as a single requirement, but it may be argued that it would be better to split it into a number of atomic
requirements. The main requirement could state that the sensor shall be checked, another requirement could specify the sample
rate, a third could describe the incrementing of the error counter, and so on. However, this would typically lead to many
cross-references between the requirements and therefore a more complex requirement structure.


such a border here. For the purpose of this section, any error detection mechanism description that
could be handed over to someone for further detailing or implementation is considered to be a
requirement.
It should be noted that issues related to error detection mechanisms are partly addressed in
section D.3.2 (“Fault tolerance requirements”), albeit at a higher abstraction level.

D.3.4.1 Overview of error detection principles

It is beyond the scope of this document to describe all principles for error detection that could be
employed in an Integrated Safety System. However, the following subsections provide an overview
of the most common principles in order to clarify the scope of this requirement type.

D.3.4.1.1 Plausibility checks

Plausibility checks are heavily used in automotive electronic systems. Such checks are typically
applied to the data entering a control unit. This input could be a sensor signal or some data
received via a communication network from another control unit. However, plausibility checks can
be applied to any data for which it is possible to define a plausibility criterion. Thus, inputs as well
as intermediate calculation results and final outputs could be checked for plausibility.
The plausibility criterion could be more or less complex, from a simple “within defined range” check
to a complex criterion involving the relationship between the dynamic behaviour of a number of
entities. In short, anything that is known in advance about an entity or about the relationship
between entities could potentially be used to define plausibility mechanisms. Much of the data
processed by electronic control systems represents physical quantities such as velocity,
acceleration, pressure, torque and temperature. Such quantities follow the laws of nature
and are therefore particularly well suited to plausibility checks.
Plausibility checks are effective regardless of what the underlying fault that causes the error might
be. Implausible data can be detected regardless of whether the root fault is a specification fault, a
software design fault, a hardware design fault, a manufacturing fault, a random hardware fault or
any other fault.
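As a concrete sketch, the two vehicle-speed plausibility criteria used as examples later in this appendix (Table D.4) might be implemented as follows. The 10 ms sample period is an assumption, and the unit conversion in the rate check is spelled out in the comments.

```python
# Sketch of two plausibility checks on vehicle speed, following the
# example criteria in Table D.4: the value is implausible if it stays
# above 250 km/h for at least 500 ms, or if its rate of change
# corresponds to an acceleration outside [-15, +5] m/s^2.
DT = 0.010                   # s, assumed sample period
RANGE_LIMIT = 250.0          # km/h
RANGE_TIME = 0.500           # s
ACCEL_WINDOW = (-15.0, 5.0)  # m/s^2

def implausible(samples_kmh):
    """Return True if the sampled speed sequence violates either criterion."""
    time_over = 0.0
    prev = None
    for v in samples_kmh:
        # Criterion 1: sustained out-of-range value
        time_over = time_over + DT if v > RANGE_LIMIT else 0.0
        if time_over >= RANGE_TIME:
            return True
        # Criterion 2: physically impossible rate of change
        if prev is not None:
            accel = (v - prev) / 3.6 / DT   # km/h difference -> m/s^2
            if not ACCEL_WINDOW[0] <= accel <= ACCEL_WINDOW[1]:
                return True
        prev = v
    return False

print(implausible([100.0, 100.1, 100.2]))  # False: smooth and in range
print(implausible([100.0, 140.0]))         # True: +40 km/h within 10 ms
```

The same pattern (a range criterion plus a dynamics criterion) applies to most physical quantities; only the limits change.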

D.3.4.1.2 Electrical monitoring

Detection of abnormal electrical signal levels can be considered as a special case of plausibility
checks. Two examples of electrical monitoring mechanisms are:
• Monitoring of supply voltage
• Checks intended to detect short-circuit and open-circuit

D.3.4.1.3 Comparison of redundant information

In a system based on redundant hardware components that perform the same task, error detection
is basically a matter of comparing the outputs from the redundant units and checking if the
difference fulfils an agreement criterion.
This error detection technique is particularly relevant for sensors and processing units.
Redundancy facilitates the detection of errors originating from random hardware faults in such
components.
It should be noted that requirements for comparison of redundant information typically lead to
architectural requirements (see D.3.3) stating that the underlying redundancy shall exist in the system.


D.3.4.1.4 Detection of errors in processing units

There are many ways of detecting errors in processing units. Some examples of such mechanisms
are:
• Processor built-in test
• Check of execution flow, based on signature monitoring or similar technique
• Watchdog timer
• ROM/Flash memory checksum
• Write/Read test of RAM
• Exceptions inbuilt in the processor
• Check of memory accesses with respect to allowed address range
• Questions/Answers check performed by processor-external component

D.3.4.1.5 Monitoring of communication

In distributed systems, the communication network that ties the involved control units together is
obviously an important part of the system. By performing checks on the incoming messages, a unit
can detect errors caused by faults in other units and in the communication itself. The error
detection principles that can be employed include the following:
• Mechanisms inbuilt in communication protocol (error detection code, frame format check
etc)
• Message timeout check (message not received within the expected time window)
• Update timeout check (message not indicated as updated by the transmitting unit within the
expected time window)
• Application-level checksum check (checksum protecting a signal from the generation by the
application layer in one unit to the reception in an application layer in another unit, not just
the signalling on the communication link)
• Message sequence counter check
• Consistent view in distributed systems (membership, etc)
As mentioned in section D.3.4.1.1, plausibility checks may of course also be applied to data
received via the communication network.
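Two of the receive-side checks listed above, the message timeout check and the sequence counter check, can be sketched together. The 8-bit counter width and the 100 ms timeout are assumptions, not taken from any particular protocol.

```python
# Sketch of two receive-side communication checks: a message timeout
# check and a sequence counter check. The 8-bit counter width and the
# timeout value are illustrative assumptions.
TIMEOUT = 0.100   # s: a new message is expected at least every 100 ms

class RxMonitor:
    def __init__(self):
        self.last_seq = None
        self.last_time = None

    def on_message(self, seq: int, t: float) -> list:
        """Check one received message; return a list of detected errors."""
        errors = []
        if self.last_time is not None and t - self.last_time > TIMEOUT:
            errors.append("timeout")
        if self.last_seq is not None and seq != (self.last_seq + 1) % 256:
            errors.append("sequence")   # message lost, duplicated or stale
        self.last_seq, self.last_time = seq, t
        return errors

mon = RxMonitor()
print(mon.on_message(7, 0.00))   # []  (first message: no history yet)
print(mon.on_message(8, 0.05))   # []  (in sequence and in time)
print(mon.on_message(10, 0.30))  # ['timeout', 'sequence']  (9 lost, late)
```

In a real system the timeout would be driven by a periodic task rather than by message arrival, so that a completely silent transmitter is also detected; the sketch above only flags the timeout once the next message arrives.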

D.3.4.1.6 Other mechanisms

A few examples of other mechanisms are:
• A/D conversion of a known voltage (with the purpose of checking the A/D function)
• Check that an actuator command produces the expected response


D.3.4.2 Reaction when an error is detected

If an error detection mechanism is to be useful, some action must be taken in response to a
detected error. These actions are generally application-dependent, but some examples of actions
are:
• Switch off function (example: switch off ABS function in case of a permanent ABS controller
error)
• Enter a degraded mode in which the full function is not available
• Inform the driver, for example by warning lamp or text display
• Inform other systems
• Reset the system or roll back to a previous system state
• For the case when an input value is detected to be incorrect, for example a sensor out-of-
range error, the following actions can be imagined:
o Use previous value
o Estimate value
o Use a default value
• Store information about the error (DTC and other non-volatile storage) for readout during
service
In order to determine when to perform an action, there are at least two alternatives:
• Up/down counter: Every time an error (or a possible error) is indicated by the error
detection mechanism, the counter is increased a predefined amount. Every time the error is
not indicated, the counter is decreased. When the counter reaches a certain value, an
action is triggered.
• Timeout: When an error has been continuously indicated for a predefined duration, an
action is triggered. A particular case is when the timeout duration is infinitesimally small
which would mean that the action is taken immediately upon the first indication of an error.
It should be noted that intermittent errors may remain undetected if the timeout scheme is used,
since each occurrence of the error may be shorter than the timeout interval. The up/down counter
scheme, on the other hand, is better suited for the detection of intermittent errors.
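This difference between the two schemes can be demonstrated with a short simulation: an intermittent error that is present for two 10 ms cycles and absent for one is never caught by a three-cycle timeout, but an up/down counter accumulates it. All numeric values are illustrative.

```python
# Demonstration of the remark above: an intermittent error (present for
# two cycles, absent for one) defeats a timeout scheme but not an
# up/down counter scheme. All numbers are illustrative.
pattern = [True, True, False] * 20   # error indicated / not indicated

# Timeout scheme: trigger after 3 consecutive error indications.
consecutive, timeout_triggered = 0, False
for err in pattern:
    consecutive = consecutive + 1 if err else 0
    if consecutive >= 3:
        timeout_triggered = True

# Up/down counter scheme: +2 on error, -1 otherwise, trigger at 10.
counter, counter_triggered = 0, False
for err in pattern:
    counter = counter + 2 if err else max(counter - 1, 0)
    if counter >= 10:
        counter_triggered = True

print(timeout_triggered, counter_triggered)  # False True
```

The counter gains a net +3 per three-cycle period and therefore eventually reaches its threshold, while the run of consecutive indications never exceeds two, so the timeout never fires.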

D.3.4.3 Derivation of requirements

A typical first step in the specification of specific error detection mechanisms is to decide upon a
set of "default" error detection mechanisms to implement. These mechanisms are those that can
be decided upon without any preceding hazard occurrence analysis, since "everyone" knows that
they will have to be implemented anyway. Typical examples are ROM checksum and monitoring of
supply voltage. These default mechanisms shall be included in the requirement specification and
shall thereafter be considered part of the system design. In other words, any hazard
occurrence analysis performed after this point will be based on the assumption that these
mechanisms are implemented. Of course, it must later be verified that the mechanisms are indeed
present in the final system design, but that is a separate verification activity.
When a hazard occurrence analysis has been carried out, either at a conceptual or a detailed
level, its results will describe the relationship between faults and errors on one hand and hazards
on the other. If a quantitative analysis has been performed, the results will additionally include
estimations of the contribution from a particular fault to the hazard probability.
In order to enable suitable error detection requirements to be specified, some higher level
requirements must have already been defined. More specifically, requirements on hazard


probability and/or fault tolerance have to exist. Such requirements are addressed in other sections
of this appendix. Examples of such requirements are:
• “The occurrence rate of hazard H9 shall be less than 10^(-7) per hour”
• “A single fault shall not lead to hazard H9”
• “A pressure sensor fault shall not lead to hazard H4”
Thus, the following information is assumed to be available, at least partly, when requirements on
specific error detection mechanisms are to be specified:
• Identification of the faults that can lead to a particular hazard (i.e. results of the Hazard
Occurrence Analysis, see Appendix C)
• Estimations of occurrence rates for those faults that can lead to the hazards (i.e. results of the
Hazard Occurrence Analysis, see Appendix C)
• Requirements that specify the faults that are not allowed to create a particular hazard (i.e. fault
tolerance requirements, see section D.3.2)
• Requirements on tolerable occurrence rates of each hazard (i.e. hazard probability
requirements, see section D.3.1)
Based on this information, requirements for specific error detection mechanisms can be
determined as follows:
1. If the hazard occurrence analysis shows that a particular fault can lead to a particular hazard
and this is forbidden by a fault tolerance requirement, there are two options:
a) Change the system architecture radically so that this causal path from fault to hazard no
longer exists. (See section D.3.3.)
b) Define an error detection mechanism that breaks the cause-consequence chain from fault
to hazard. If this mechanism does not completely eliminate the error propagation from the
fault to the hazard, the mechanism needs to be redefined or additional mechanisms have
to be defined. This process is repeated until the propagation is eliminated or until the
effectiveness(2) of the combined mechanisms is so high that it is virtually impossible for the
fault to create the hazard. In the latter case, the original fault tolerance requirement is not
strictly met but the solution may still be approved if the decision policy supports approval
of such deviations.
2. If the hazard occurrence analysis shows that the occurrence rate of a particular hazard
exceeds the tolerable limit defined by a hazard occurrence rate requirement, there are two
options:
a) Change the system architecture radically so that a tolerably low hazard occurrence rate is
achieved. (See section D.3.3.)
b) Define error detection mechanisms that partially break the error propagation path from
those faults that significantly contribute to the occurrence of the considered hazard. If the
combination of these mechanisms does not provide sufficiently effective(3) protection against
the error propagation, they need to be redefined or additional mechanisms have to be
defined. This is a complex process that implies a loop between the definition of error
detection mechanisms and the analysis of the effectiveness of the combined mechanisms.
The loop is typically stopped when the hazard occurrence requirement is met, but an
ALARP requirement (see section D.3.1.5) would additionally mean that the loop is

(2) The effectiveness of the error detection mechanisms can be estimated by an iteration of the Hazard Occurrence Analysis, with the
mechanisms assumed to be implemented in the system.
(3) The effectiveness of the error detection mechanisms can be estimated by an iteration of the Hazard Occurrence Analysis, with the
mechanisms assumed to be implemented in the system.


continued until the cost of any further risk reduction is grossly disproportionate to the
benefit obtained.

D.3.4.4 Requirements formulation

A requirement that the system shall perform a specific error detection mechanism and
corresponding reaction is simply expressed as “The system shall...” followed by a description of
the mechanism. This description should include at least the following:
• Which component (which hardware unit? which software module?) performs the check?
• What is checked?
• When is the check made?
• What is the criterion for determining that there is a possible error?
• What action, if any, is taken by the component when a possible error is detected?
• What does this component action lead to at the system and vehicle levels, and how
does the component action propagate to the system level?
• What is the criterion for determining that a possible error is to be considered as a real
error?
• What action is taken by the detecting component when an error is considered to be a
real error?
• What does this component action lead to at the system and vehicle levels, and how
does the component action propagate to the system level?
• What is the criterion for considering that a real error has disappeared?
The list above only considers two stages: the error is first considered possible and then real.
More stages may be used. For example, the system could enter a degraded mode in a first stage,
switch off a function in a second stage and, in a third stage, inform the driver and set a Diagnostic
Trouble Code.

D.3.4.5 Verification of requirements

For the verification of requirements on specific error detection mechanisms, there are basically two
alternatives. Whenever possible, both should be used. However, it is sometimes the case that only
one verification mechanism is feasible.
• Review of the system design including software code to check that the mechanism has been
implemented and that it does not contain any design fault.
• Fault injection testing in which the fault or error is injected and it is checked that the system
reacts in the specified manner.
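The fault injection alternative can be illustrated with a toy test: an out-of-range sensor value (the injected error) is fed to a model of the system, and the specified reaction is then asserted. The system model below is an invented stand-in, not a real implementation; in practice the injection would target the real ECU software or hardware.

```python
# Sketch of fault injection testing for an error detection requirement:
# an out-of-range sensor value is injected and the specified reaction is
# checked. The 'system' below is an invented stand-in for illustration.

def system_step(sensor_kmh: float, state: dict) -> dict:
    """Toy system: switches cruise control off on implausible speed."""
    if sensor_kmh > 250.0:                    # plausibility criterion
        state["cruise_control"] = "off"       # specified reaction
    return state

def test_injected_overspeed_switches_off_cruise_control():
    state = {"cruise_control": "on"}
    system_step(80.0, state)                  # normal operation
    assert state["cruise_control"] == "on"
    system_step(999.0, state)                 # inject the erroneous value
    assert state["cruise_control"] == "off"   # verify specified reaction

test_injected_overspeed_switches_off_cruise_control()
print("fault injection test passed")
```

Each error detection requirement phrased with the checklist of section D.3.4.4 maps naturally onto one such test: the injection realises the "what is checked" clause, and the assertion realises the "what action is taken" clause.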

D.3.4.6 Requirements on specific error detection mechanisms and corresponding reactions: Summary

Table D.4 provides a compact summary of the findings concerning requirements on specific error
detection mechanisms and corresponding reactions.


General characteristics: A requirement on a specific error detection mechanism describes a
function to be performed by a system in order to detect a specific error. This includes a
description of the check to be made, the criteria for determining that an error exists, and the
action to take when an error has been detected.

How to determine suitable requirements: Some typical mechanisms may be specified at a very
early stage. Based on the results of the hazard occurrence analysis, it should be checked
whether the higher-level requirements on fault tolerance and hazard probability are met by the
design. If this is not the case, error detection mechanisms should be specified that prevent the
error propagation (from root faults to hazards) to a sufficient degree.

How to express requirements: Requirements should be expressed as descriptions of the specific
error detection mechanisms, including the actions to take when an error has been detected. A
checklist for such descriptions is given in section D.3.4.4.

Specific difficulties: Estimation of the detection coverage is difficult, particularly when many
different mechanisms at different levels can detect errors created by the same root fault. Thus, it
is difficult to know if a set of requirements is sufficient to meet the design goals.

Relationship to other types of requirements: As described in section D.3.4.3, requirements on
specific error detection mechanisms are derived from fault tolerance requirements and hazard
probability requirements, with consideration of the results from the hazard occurrence analysis.
For error detection mechanisms based on redundancy, the actual redundancy is described by
requirements on the system architecture (see D.3.3).

Relationship to other requirements of the same type: Error detection mechanisms can be
specified at different levels. In early stages, error detection requirements may be expressed quite
superficially (for example, “radar errors shall be detected and the collision avoidance function
shall be switched off in case such an error is detected”). At a later stage, the actual checks to
make, the criteria for determining that a specific action shall be taken, and the details of how this
action is carried out should be described. However, this is more a question of requirements
refinement than of different requirements.

Verification issues: Verification is relatively straightforward. Fault injection testing, combined
with design review, should be sufficient.

Examples: "The vehicle speed data shall be considered implausible if it is above 250 km/h
constantly during at least 500 ms"
"The vehicle speed data shall be considered implausible if its change rate represents an
acceleration outside the interval [-15 m/s^2, +5 m/s^2]"
"If the vehicle speed data is considered to be implausible, the Cruise Control function shall be
switched off"
"The ROM contents shall be added together using modulo-256 arithmetic. If the calculated ROM
checksum differs from the stored ROM checksum, the unit shall remain passive for the remainder
of the driving cycle" (This requirement would additionally need detailed definitions of "passive",
"driving cycle", etc.)

Table D.4 - Essential characteristics of requirements for specific error detection
mechanisms and corresponding reactions
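The last example requirement in Table D.4, the modulo-256 ROM checksum, is simple enough to sketch directly. The ROM image and stored checksum are invented values for illustration.

```python
# Sketch of the modulo-256 ROM checksum from the last example in
# Table D.4: all ROM bytes are added modulo 256 and compared with a
# stored checksum; a mismatch sends the unit passive for the rest of
# the driving cycle. The ROM image here is an invented example.

def checksum_256(rom: bytes) -> int:
    return sum(rom) % 256

rom_image = bytes([0x12, 0x34, 0x56, 0x78])
stored = checksum_256(rom_image)           # written at production time

# Power-up self test:
print(checksum_256(rom_image) == stored)   # True: ROM intact
corrupted = bytes([0x12, 0x34, 0x56, 0x79])  # one bit flipped
print(checksum_256(corrupted) == stored)     # False: go passive
```

A simple additive checksum misses some multi-bit corruptions (for example two compensating byte changes), which is why stronger codes such as CRCs are often preferred when higher detection coverage is required.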


D.3.5 Quantitative requirements for hardware architecture

Quantitative requirements for hardware architecture express architecture properties concerning
random hardware faults. A first type of such requirements aims at limiting the proportion of single
point random hardware faults (Safe Failure Fraction and Dangerous Failure Coverage). Another
type aims at limiting the proportion of random hardware latent faults (Monitoring Coverage).

D.3.5.1 Single point faults requirement

“No single point fault is allowed to cause a hazard” is probably the simplest qualitative
requirement one can imagine concerning the architecture of a system. But in actual systems,
single point faults remain, because it would not be cost-effective to suppress all of them. How,
then, can one specify how much single point fault risk is tolerable?
A simple approach may be to qualitatively assess the occurrence of the single point faults and
only tolerate those with a negligible occurrence. The weakness of this approach is: how can
“negligible occurrence” be translated into an unambiguous requirement?
A way out for random hardware faults is to quantify the occurrence of single point faults. In that
case we may either:
• Allocate a part of the hazard probability requirement to single point faults. This is a
top-down approach.
• Allow only a quantified proportion of the faults to be single point faults. This is a
bottom-up approach. It is the approach chosen in the IEC 61508 standard [1] with the
Safe Failure Fraction (SFF) and in the ISO 26262 draft [5] with the Dangerous Failure
Coverage (DFC).

In the IEC 61508 standard, a SIL requirement (see section D.3.1.3) is broken down into a
combination of two hardware architecture requirements (see the table below, from IEC 61508-2 [1]).
The first is a required level of Hardware Fault Tolerance (HFT) and the second is a
minimum level of Safe Failure Fraction (SFF).

Hardware safety integrity: architectural constraints on type A safety-related subsystems

Safe failure     Hardware fault tolerance
fraction         0          1          2
< 60%            SIL 1      SIL 2      SIL 3
60% - < 90%      SIL 2      SIL 3      SIL 4
90% - < 99%      SIL 3      SIL 4      SIL 4
≥ 99%            SIL 3      SIL 4      SIL 4

A level of HFT is defined as follows: “a hardware fault tolerance of N means that N+1 faults could
cause the loss of the safety function. In determining the hardware fault tolerance no account shall
be taken of other measures that may control the effects of faults such as diagnostics”.
This definition is not directly applicable for us, as in the automotive field safety functions and main
functions are not separate; the expression “loss of safety function” is therefore misleading. An
alternative definition of HFT could be: an HFT of N means that N+1 faults could lead to a dangerous
failure mode, without taking into account the effect of diagnostics.
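The architectural constraints table above can be turned into a small lookup function, which makes the interplay between SFF and HFT easy to explore. The function below simply encodes the type A table as shown.

```python
# Lookup of the IEC 61508-2 architectural constraints table for type A
# subsystems shown above: given the Safe Failure Fraction and the
# Hardware Fault Tolerance (0, 1 or 2), return the maximum SIL the
# architecture can claim.
_SIL_TABLE_TYPE_A = [
    (0.99, (3, 4, 4)),   # SFF >= 99%
    (0.90, (3, 4, 4)),   # 90% <= SFF < 99%
    (0.60, (2, 3, 4)),   # 60% <= SFF < 90%
    (0.00, (1, 2, 3)),   # SFF < 60%
]

def max_sil_type_a(sff: float, hft: int) -> int:
    for lower_bound, sils in _SIL_TABLE_TYPE_A:
        if sff >= lower_bound:
            return sils[hft]
    raise ValueError("SFF must lie in [0, 1]")

print(max_sil_type_a(0.95, 0))  # 3: SIL 3 with a single channel
print(max_sil_type_a(0.50, 1))  # 2: low SFF partly compensated by HFT
```

Read row-wise, the table shows the trade-off explicitly: each additional level of hardware fault tolerance raises the claimable SIL by one, and so does moving up one SFF band (up to the SIL 4 ceiling).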


To derive the SFF, random hardware faults are divided into three categories:
• Dangerous faults, which may lead to a dangerous failure mode. The total failure rate of
these faults is denoted ΣλD.
• Dangerous detected faults, which may lead to a dangerous failure mode but are detected
so that the system can be put into a safe mode. This is a subset of the dangerous faults.
The total failure rate of these faults is denoted ΣλDD.
• Safe faults, which cannot lead to a dangerous failure mode, even in combination with other
faults. The total failure rate of these faults is denoted ΣλS.
SFF is defined in IEC 61508 as follows: “the safe failure fraction of a subsystem is defined as
the ratio of the average rate of safe failures plus dangerous detected failures of the subsystem to
the total average failure rate of the subsystem”:

SFF = ( ΣλS + ΣλDD ) / ( ΣλS + ΣλD )

So the SFF is the ratio of the failure rates of faults that do not directly lead to a dangerous failure
mode to the total failure rate of the subsystem.
The definition makes clear that the SFF is not considered at system level but at subsystem level.
As the HFT gives the level of redundancy, one interpretation is that the SFF must be calculated on
single-channel structures (HFT = 0) only.
In order to verify a SFF requirement, it is necessary to categorize every failure mode of every
component of the considered subsystem into one of the three categories (dangerous faults,
dangerous detected faults and safe faults). It is important to note that this classification generally differs between the different failure modes of the subsystem: it is generally not possible to derive a single SFF covering all the subsystem’s dangerous failure modes, so a particular SFF characterizes the subsystem for only one of its failure modes.
Then, all these HW faults are quantified using for instance standard databases. Quantification of
dangerous detected faults and dangerous faults must also take into account the efficiency of
diagnostics. The effect of a dangerous fault may be detected by software with a given efficiency.
So a proportion of the failure rate will be allocated to the dangerous detected category and the rest
to the dangerous category. Thus SFF requirement is linked to error detection requirements.
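As a rough illustration of this categorization, the following sketch computes the SFF for one failure mode of a subsystem; the component names, failure rates and diagnostic coverage figures are entirely made up:

```python
# Illustrative SFF calculation for one dangerous failure mode of a
# subsystem.  Each entry gives a component failure mode, its failure rate
# (in FIT, failures per 10^9 h), its category ("safe" or "dangerous"),
# and -- for dangerous faults -- the assumed diagnostic coverage, i.e. the
# fraction of that rate detected and brought to a safe state.
failure_modes = [
    ("R48 short",    5.0, "dangerous", 0.00),  # undetected
    ("C126 open",    2.0, "safe",      0.00),
    ("CPU lock-up", 20.0, "dangerous", 0.90),  # mostly caught by watchdog
    ("ROM bit flip", 8.0, "dangerous", 0.99),  # caught by checksum
]

lam_S = lam_DD = lam_DU = 0.0
for name, rate, category, coverage in failure_modes:
    if category == "safe":
        lam_S += rate
    else:
        lam_DD += rate * coverage         # dangerous detected share
        lam_DU += rate * (1 - coverage)   # dangerous undetected share

# SFF = (sum(lambda_S) + sum(lambda_DD)) / (sum(lambda_S) + sum(lambda_D))
sff = (lam_S + lam_DD) / (lam_S + lam_DD + lam_DU)
print(f"SFF = {sff:.1%}")
```

Note how the diagnostic coverage splits the rate of each dangerous fault between the dangerous detected and dangerous (undetected) categories, exactly as described above.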

Let us take an example to illustrate what has been said concerning HFT and SFF: the low-beam function. The hazard considered is a spurious cut-off of both lamps, with a SIL 2 requirement.
Let us assume that:
• An ECU receives the requests from the driver and controls both lamps.
• There is an independent power transistor for each lamp, with two separate power supply lines for the two transistors.
In this example (see Figure D.19), the ECU has to be broken down into two parts. The controller
and the input receiving the driver request have an HFT of 0. The power stage has an HFT of 1.
So an HFT of N means that there are N redundant channels in the considered subsystem.


[Figure D.19 shows the example system: the driver’s requests enter the ECU, whose CPU (HFT = 0) drives a left and a right power driver (together HFT = 1); each power driver is fed by its own supply line (+Bat 1, +Bat 2) and controls one lamp.]
Figure D.19 Example system for low beam control

If we refer to the architectural constraints table, to comply with a SIL 2 requirement the CPU subsystem (HFT = 0) shall have an SFF of at least 60%. The power driver subsystem, as it has an HFT of 1, does not need an SFF better than 60%. As both power drivers are identical, the SFF requirement for each power driver is the same as for both taken together. The SFF then has to be calculated for the CPU and power driver blocks, for instance using the FMEA technique, to verify compliance. In this FMEA, the failure modes (FM) are the failure modes of the electronic parts (resistors, transistors, …) and the effects (E) are the failure modes of the hardware block (CPU or power driver).
A weakness of the SFF that is often mentioned is that it can easily be tuned: for instance, adding a function that is not safety-critical to the subsystem improves the SFF. This can easily be overcome by including in the SFF calculation only those components that have at least one dangerous failure mode.
Another drawback of the SFF is that two architectures cannot be compared with a finer granularity than the SIL. For instance, a SIL 2 system may be realised using two redundant channels, each with an SFF of 75%, or using a single channel with an SFF of 95%. Which one is better? Because of the link between HFT and SFF, it is not possible to make sharp comparisons between different architectures. As the effort to derive SFF values is significant, they should be useful not only for assessing compliance but also for design purposes.
An alternative to the architectural constraints of IEC 61508 is currently being discussed in the ISO working group in charge of the future ISO 26262 standard. In this alternative, HFT is not considered; the quantitative requirement characterises the whole system for a particular hazard and derives from the ASIL level. To define this alternative metric, random hardware faults of the whole system are divided into four categories:
• Dangerous faults, leading directly to the hazard. The total failure rate of these faults is denoted ΣλD. It is important to note that this category is not equivalent to the dangerous faults used in the definition of the SFF: here, dangerous faults are always single point faults.
• Potentially dangerous faults, leading to the hazard only in combination with other independent faults. The total failure rate of these faults is denoted ΣλPD. This category is divided into two subsets:
o Controlled dangerous faults, which are potentially dangerous faults prevented from leading to the hazard by a safety mechanism. The total failure rate of these faults is denoted ΣλCD.
o Loss of a safety mechanism. The purpose of a safety mechanism is to avoid propagation of a fault up to the hazard. The total failure rate of these faults is denoted ΣλB.


One may argue that a combination of HW faults may lead to a hazard without any of the faults being the loss of a safety mechanism. The fault model used here considers that the additional conditions needed to reach the hazard are always there on purpose, i.e. they are safety mechanisms. On real automotive hardware this is certainly true at system level; when it comes to the detailed hardware circuitry, the model is somewhat simplistic but is considered acceptable.
• Safe faults, which cannot lead to a dangerous failure mode even in combination with other faults. The total failure rate of these faults is denoted ΣλS.
A first proposal, described in the first draft of ISO 26262 and called Dangerous Failure Coverage (DFC), is defined as the ratio of the failure rate of controlled dangerous faults to the failure rate of dangerous faults plus controlled dangerous faults:
DFC = ΣλCD / (ΣλCD + ΣλD)
DFC addresses single point faults. A weakness compared to the SFF is that, because it does not take safe failures into account, an intrinsically safe design gains no credit with this metric.
To overcome this weakness, another variant of DFC, closer to the SFF as it includes safe faults, has been proposed. It is defined as the ratio of the failure rate of safe faults plus controlled dangerous faults of the system to the total failure rate of the system excluding safety mechanisms:
New metric = (ΣλCD + ΣλS) / (ΣλD + ΣλCD + ΣλS)
This second variant of DFC excludes hardware dedicated to the implementation of safety mechanisms (ΣλB). This means that in order to derive the DFC, a distinction has to be made between hardware implementing the functions and hardware implementing the safety mechanisms. This is not always possible.
A third variant of DFC, simpler to define and to derive, could be imagined:
(ΣλPD + ΣλS) / (ΣλD + ΣλPD + ΣλS) = 1 − ΣλD/Σλ
With this third variant there are only two categories of faults to be distinguished: dangerous faults (= single point faults) and all the others, where the others include multiple point faults and safe faults. There is no longer any need to define controlled dangerous faults and safety mechanisms.
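The three variants can be compared side by side on the same set of categorized failure rates; the numbers below are purely illustrative:

```python
# Illustrative comparison of the three DFC variants, using made-up
# failure rates (in FIT) for the four fault categories of the draft metric.
lam_D  = 3.0    # dangerous (single point) faults
lam_CD = 40.0   # controlled dangerous faults
lam_B  = 7.0    # loss of safety mechanism
lam_S  = 50.0   # safe faults

lam_PD = lam_CD + lam_B            # potentially dangerous faults
lam_total = lam_D + lam_PD + lam_S

dfc1 = lam_CD / (lam_CD + lam_D)                    # first proposal
dfc2 = (lam_CD + lam_S) / (lam_D + lam_CD + lam_S)  # includes safe faults
dfc3 = 1 - lam_D / lam_total                        # = (lam_PD + lam_S)/total

print(f"DFC variant 1: {dfc1:.1%}")
print(f"DFC variant 2: {dfc2:.1%}")
print(f"DFC variant 3: {dfc3:.1%}")
```

Note that variant 2 excludes the safety-mechanism hardware (lam_B) from both numerator and denominator, while variant 3 keeps the full total.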
In any case, a component of the system is only taken into consideration if at least one of its failure modes is a dangerous or a controlled dangerous fault. So the metric cannot be tuned by adding safe functions or components.
Verifying a DFC requirement, like verifying an SFF requirement, requires sorting every failure mode of every component of the system into one of the different categories. All these HW faults are then quantified, and the quantification of dangerous faults must take into account the efficiency of the diagnostics.
DFC has a major advantage over SFF/HFT in that it always allows comparison between different architectures, as it is not linked to a second requirement such as HFT.
There is a link between SFF and DFC requirements on the one hand and hazard occurrence requirements on the other. Both are quantitative requirements concerning random hardware faults, and in most cases the single point faults addressed by SFF/DFC are major contributors to the occurrence of a hazard. One may argue that hazard occurrence is really what matters, so why consider an SFF or DFC requirement at all? Verification of a quantitative requirement on hazard occurrence is rather difficult: the calculations have to take into account things like exposure duration. SFF or DFC requirements are easier to calculate because time is not considered. Another advantage is that, while verification of SFF/DFC and of hazard occurrence both use estimated component failure rates, SFF and DFC use them in a ratio, which makes them more robust to failure rate inaccuracies.
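The robustness claim is easy to check: scaling every component failure rate by a common factor (a systematic inaccuracy in the failure rate database) cancels out in the ratio. An illustrative sketch:

```python
# SFF and DFC are ratios of failure rates, so a common scale error in the
# underlying component database cancels out.  Illustrative values only.
lam_S, lam_DD, lam_D = 50.0, 30.0, 80.0   # FIT; lam_DD is a subset of lam_D

def sff(s, dd, d):
    # SFF = (lambda_S + lambda_DD) / (lambda_S + lambda_D)
    return (s + dd) / (s + d)

baseline = sff(lam_S, lam_DD, lam_D)
for k in (1.0, 2.0, 0.5):                 # database off by a factor k
    assert abs(sff(k * lam_S, k * lam_DD, k * lam_D) - baseline) < 1e-12

print("SFF unchanged under uniform scaling of all failure rates")
```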

D.3.5.2 Latent faults requirement

The hazard probability requirement gives a target for a “young” system fresh off the production line. As time passes, random hardware faults that do not directly cause a hazard may appear and will subsequently increase the probability of the hazard. These faults may remain unnoticed (latent) because they have no functional consequence. It is therefore important to detect as many of these faults as possible and to signal them to the driver, inducing him or her to have the car repaired. This is what the Monitoring Coverage (MC) of the ISO 26262 draft is all about; it is not to be confused with the diagnostic coverage of IEC 61508.
If we consider the same fault categories as defined for the DFC, the faults that we do not want to be latent are the potentially dangerous faults, composed of controlled dangerous faults and losses of safety mechanisms. MC may thus be defined as the ratio of the failure rate of potentially dangerous faults detected by the driver to the failure rate of potentially dangerous faults: MC = ΣλPD,detected / ΣλPD.
Detection by the driver may be achieved through a built-in diagnostic test triggering the lighting of a lamp on the dashboard, or through a functional degradation detectable by the driver. Assumptions regarding the efficiency of detection in the case of functional degradation have to be made to calculate MC. The duration between detection by the driver and repair is not taken into account; it is considered sufficiently short to be negligible.
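A minimal sketch of an MC calculation, with made-up failure rates and assumed driver-detection probabilities:

```python
# Illustrative Monitoring Coverage calculation.  Each potentially
# dangerous fault carries a failure rate (in FIT) and an assumed
# probability that the driver notices it (via a warning lamp or a
# perceptible functional degradation); all numbers are invented.
potentially_dangerous = [
    ("left power driver open", 30.0, 1.00),  # lamp dark: always noticed
    ("watchdog circuit dead",  10.0, 0.95),  # built-in test + tell-tale
    ("redundant sensor stuck",  5.0, 0.00),  # no functional consequence
]

lam_PD   = sum(rate for _, rate, _ in potentially_dangerous)
lam_PD_D = sum(rate * p for _, rate, p in potentially_dangerous)

mc = lam_PD_D / lam_PD   # MC = sum(lambda_PD,detected) / sum(lambda_PD)
print(f"MC = {mc:.1%}")
```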
Obviously, there is a link between the MC requirement and the hazard occurrence requirement. As mentioned before, latent faults occurring during the lifetime of the system will increase the probability of the hazard, and these latent faults have to be taken into account in the calculation of the hazard probability. But as the hazard probability (due to random hardware faults) is a function of many different things, the link between the hazard probability requirement and the quantification of latent hardware faults is not as straightforward as with an MC requirement.
Table D.5 provides a compact summary of the findings concerning quantitative requirements for
hardware architecture.

General characteristics: These requirements express hardware architecture properties concerning random faults in a quantitative way.
How to determine suitable requirements: SFF derives from a SIL requirement. DFC and MC derive from an ASIL requirement.
How to express requirements: They are expressed as a percentage.
Specific difficulties: -
Relationship to other types of requirements: SFF is explicitly linked to the level of hardware fault tolerance in order to comply with a SIL requirement. There is a relationship between SFF/DFC/MC and hazard occurrence requirements.
Relationship to other requirements of the same type: Not applicable
Verification issues: Verification requires evaluation of the failure rates of hardware components, but these requirements are more robust to component failure rate inaccuracies than a hazard occurrence requirement.
Examples: “The DFC for output S1 stuck at 0 shall be greater than 60%.” “The MC for wrong speed information sent on the CAN bus shall be greater than 90%.”

Table D.5 - Essential characteristics of quantitative requirements for hardware architecture


D.3.6 Requirements on the avoidance of non-systematic faults

When the hazard occurrence analysis shows that a particular fault can lead to a particular hazard,
one of the following solutions is typically employed:
• Introduction of redundant components (for example multiple sensors or multiple
processors) to achieve fault tolerance, thereby breaking the causal relationship between
fault and hazard
• Introduction of error detection mechanisms and functional degradation when an error is
detected
However, in some cases such techniques may be inappropriate and/or insufficient. For example:
• The cost of redundancy might be grossly disproportionate to the benefits gained.
• Component redundancy might be infeasible.
• Detection of the error might be infeasible.
• A degraded mode might be impossible to define.
• Reliability considerations may dictate that a functional degradation should not occur more
often than some predefined limit.
• The probability of multiple faults that together lead to a hazard may be too high in a system,
even if any single fault is prevented from leading to a hazard.
For these reasons, specification of requirements on fault avoidance is often appropriate. Examples
of such requirements are:
• an upper limit on the occurrence rate of a particular fault
• a requirement on the MTBF (Mean Time Between Failure) for a particular fault
• a requirement that a particular fault shall not occur at all (for example, a connector could be
assigned a requirement that a mechanical disconnection due to vibration shall not be
possible)
Obviously, it is not feasible to place a fault avoidance requirement on every single component of a system. For example, the following represents an extremely unsuitable choice of abstraction level:
• An open circuit of capacitor C126 shall occur less than once per 10^6 hours.
• A short circuit of resistor R48 shall occur less than once per 10^6 hours.
In this case, a much more appropriate requirement would be:
• The hardware fault rate in the Electronic Brake Controller Unit shall be less than 10^(-4) per hour.
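The roll-up from component failure rates to such a unit-level requirement is simple series reliability: the unit's fault occurrence rate is the sum of its components' rates. An illustrative sketch with invented rates:

```python
# Rolling component failure rates up to a unit-level fault avoidance
# requirement.  For independent components in series, the unit fault
# occurrence rate is the sum of the component rates.  All rates are
# illustrative (failures per hour).
component_rates = {
    "microcontroller": 2.0e-6,
    "power supply":    1.5e-6,
    "CAN transceiver": 0.5e-6,
    "connector":       1.0e-6,
}

unit_rate = sum(component_rates.values())
requirement = 1.0e-4   # "less than 10^-4 per hour"

print(f"unit fault rate = {unit_rate:.1e}/h, "
      f"requirement met: {unit_rate < requirement}")
```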
Note that requirements concerning the avoidance of design faults and manufacturing faults are
addressed in sections D.3.9 and D.3.11 respectively.


D.3.6.1 Fault avoidance requirements: Summary

General characteristics: A fault avoidance requirement specifies the degree to which a particular non-systematic fault is allowed to occur.
How to determine suitable requirements: For any fault that may lead to an undesirable effect, the appropriateness of an avoidance requirement on this fault should be considered.
How to express requirements: Fault avoidance requirements are typically expressed as an upper limit on the occurrence rate of a particular fault. However, the following alternatives are sometimes appropriate:
• a requirement on the MTBF (Mean Time Between Failures) for a particular fault
• a requirement that a particular fault shall not occur at all
Specific difficulties: None
Relationship to other types of requirements: Probabilistic requirements on the occurrence of particular hazards can obviously be broken down into probabilistic fault avoidance requirements on individual components.
Relationship to other requirements of the same type: Fault avoidance requirements can be assigned to components of different granularity. The relationship between different granularity levels is represented by trivial reliability combinatorics; for example, the fault occurrence rate of a unit is the sum of the fault occurrence rates of all its constituent components.
Verification issues: Verification is performed by a combination of analysis (using an established reliability prediction method) and stress testing.
Examples: “The wheel speed sensor shall have a reliability of 0.99 over the lifetime of the vehicle.” “Hardware faults in the engine ECU shall occur less than once per 10^4 hours.”

Table D.6 - Essential characteristics of fault avoidance requirements

D.3.7 Critical functional requirements

In the context of integrated safety systems one might consider all functional requirements to be critical functional requirements, since a safety system always deals with the avoidance of critical situations. But this view of the functional behaviour is too strong. For example, a collision avoidance system may have the functional requirement that information is given on the dashboard in a particular manner; whether the information is given in yellow or in red is not critical. Signalling the braking to the driver is less important than activating the braking itself.
Critical functional requirements are those that may lead to a highly critical hazard if they are violated. Hence it is essential that these functional requirements are correctly implemented, and from that point of view it is desirable to analyse the software carefully with respect to such requirements. It is recommended to use, where applicable, formal verification techniques to guarantee that the software behaves 100% correctly with respect to the given critical requirements.
The first question is which of the given functional requirements are critical and which are not. The classification may be ambiguous. For example, is the correct functioning of the indicator lights critical or not? Consider the situation where a car sets its indicator lights to turn left and the dashboard signal indicates that the lights are working correctly, but the lights are in fact not working. A following car observes the braking of the car in front, assumes that it will stop, and hence starts to pass it. But the vehicle in front turns left, and an accident is almost inevitable.
To obtain a classification, the functional requirements have to be reviewed. One has to ask which are the main functional requirements specified to avoid critical situations and which of the functional requirements are less critical. To identify the critical ones, one may ask: “What will happen if the requirement is violated? Will this lead to a catastrophic situation?”
Examples of critical functional requirements are:
• No airbag ignition without a crash.
• All doors will be unlocked if a crash is detected.
• No automatic braking (initiated for example by a collision avoidance system) without a critical situation.
• If the vehicle is close to a crash, the collision avoidance system will activate the braking system.
Additional critical functional requirements can be derived from the hazard analysis: identifying hazardous events or situations may lead to functional requirements stating that the (software) system should never bring about these situations. For example, a hazardous event may be the situation that acceleration is active during braking. This leads to the functional requirement that the acceleration request signal should always be low while braking is active.
There are two classical patterns for classifying critical functional requirements:
• Safety properties,
expressing that “something bad never happens”. Typically these are invariant properties.
o “Never: airbag ignition and no crash signal”, meaning no airbag ignition without a crash.
o “Never: doors locked and crash signalled”.
• Progress properties,
expressing that “something good will happen soon”. Such a property specifies a required reaction in a specific situation.
o Whenever a crash is detected, the airbag will be ignited within 25 ms.
o If the collision avoidance system detects a situation close to a crash, the braking system will be activated.
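Both patterns can be checked mechanically against an execution trace. The following sketch is illustrative only; the trace format and signal names are assumptions:

```python
# Sketch of an offline trace check for the two property patterns: a
# safety property ("never: airbag ignition and no crash signal") and a
# progress property ("whenever a crash is detected, ignition within
# 25 ms").  The trace format and signal names are assumptions.
trace = [  # (time_ms, crash_detected, airbag_ignited)
    (0,   False, False),
    (100, True,  False),
    (110, True,  True),
]

def check_safety(trace):
    # Invariant: ignition implies a crash signal in the same step.
    return all(crash or not ignited for _, crash, ignited in trace)

def check_progress(trace, deadline_ms=25):
    # Every crash detection must be followed by ignition within the deadline.
    for t, crash, _ in trace:
        if crash and not any(ignited and t <= t2 <= t + deadline_ms
                             for t2, _, ignited in trace):
            return False
    return True

print(check_safety(trace), check_progress(trace))  # True True
```

A trace checker of this kind is only a test aid; the formal verification of such properties is covered in Part 2 of these Guidelines.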
Critical functional requirements should be treated with particular attention regarding verification. In general it is not feasible to formally prove that an entire distributed system fulfils all its functional specifications, but proving that a specific requirement is fulfilled by a specific software module is within the scope of today’s technology, for example by formal methods. It is recommended to use a formal approach for verifying the correctness of a software system with respect to the identified critical requirements. A non-trivial task here is the formalization of the requirements in a formal language suitable for formal verification. Details on this and on the formal verification approach itself are given in Part 2 of these Guidelines.
Countermeasures are needed to avoid the loss of critical functionality caused by hardware failures. Hence critical functional requirements are also related to fault tolerance requirements and fault avoidance requirements.


D.3.7.1 Critical functional requirements: Summary

Table D.7 below provides a compact summary of the findings concerning critical functional
requirements.
General characteristics: Critical functional requirements are those functional requirements which are related to the avoidance of critical behaviour.
How to determine suitable requirements: Usually the functional requirements are given; in a review process the critical ones have to be identified. Which are the main functional requirements specified to avoid critical situations?
How to express requirements: Typically critical functional requirements are expressed (1) as relationships expressing required reactions in identified states which may lead to critical situations, or (2) as invariants expressing what should never happen.
Specific difficulties: The difficulty lies in separating the critical requirements from the non-critical ones.
Relationship to other types of requirements: Countermeasures are needed to avoid the loss of critical functionality caused by a failure. Hence critical functional requirements are also related to fault tolerance requirements and fault avoidance requirements.
Relationship to other requirements of the same type: -
Verification issues: Critical functional requirements should be treated with particular attention regarding verification. If possible, formal verification techniques should be used to verify the consistency of the system with respect to the functional requirements. A non-trivial task is the formalization of the requirements in a formal language suitable for formal verification.
Examples: “No ignition of the airbag in a non-crash situation.” “In a crash situation the ignition of the airbag should occur within 25 ms.” “In a near-crash situation (e.g. expressed in terms of vehicle speed and distance to an obstacle) the brake system should be activated.” “In a crash situation all doors should be unlocked.”

Table D.7 - Essential characteristics of critical functional requirements

D.3.8 Requirements for functional limitations

In many cases, risks can be reduced by limiting the authority of the considered system. For example, consider a Collision Avoidance (CA) system that applies braking without driver demand when the vehicle is about to collide with a forward object. Braking as hard as possible seems to be a good strategy; after all, braking should only occur when the vehicle is very close to a crash. However, one potential hazard of a CA system is obviously a totally unnecessary activation of the brakes in a perfectly normal driving situation. The higher the brake force that the CA system is allowed to create, the more critical this particular hazard will be. There is a trade-off to be made here between the desirable behaviour in a collision situation and the system safety requirements.
Functional limitation requirements are concerned with limiting the authority of control systems in
order to reduce the criticality of the hazards. In reality, this means that a significant portion of the
occurrence rate of a highly critical hazard is moved to a less critical hazard. Such authority


limitations can be made with respect to magnitude (for example brake force) and/or the duration of
the intervention. Obviously, a CA system would never need to brake longer than a very short time,
so a limitation of the duration would not affect the correct function but it would reduce the
occurrence of the hazard “long-duration unwanted activation of the brakes”.
Regarding the implementation of such functional limitations, it is strongly recommended that the limitation be implemented close to the actual actuation. In the collision avoidance example, a limitation in the brake controller would be more effective than a limitation in the Collision Avoidance controller, assuming the latter is a dedicated hardware unit separate from the brake controller. In fact, the limitation should ideally be implemented in the actual control of the brake actuator (or even in the actuator itself, if possible). A limitation only provides protection against “upstream” errors; any error occurring “downstream” of the limitation would of course not be handled.
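As an illustration, a magnitude-and-duration limiter placed in the brake controller could look like the following sketch; the interface and class name are hypothetical, while the 50 bar and two-second limits follow the example requirements of this section:

```python
# Sketch of a magnitude-and-duration limitation for CA brake requests,
# placed "downstream" of the Collision Avoidance function in the brake
# controller.  The interface is hypothetical; the limits follow the
# example requirements (50 bar, two seconds).
MAX_PRESSURE_BAR = 50.0
MAX_DURATION_S = 2.0

class CaBrakeLimiter:
    def __init__(self):
        self.active_since = None  # time the CA request first became non-zero

    def limit(self, requested_bar: float, now_s: float) -> float:
        if requested_bar <= 0.0:
            self.active_since = None          # request released: reset timer
            return 0.0
        if self.active_since is None:
            self.active_since = now_s
        if now_s - self.active_since > MAX_DURATION_S:
            return 0.0                        # duration limit: drop the request
        return min(requested_bar, MAX_PRESSURE_BAR)  # magnitude limit

lim = CaBrakeLimiter()
print(lim.limit(80.0, 0.0))   # clamped to 50.0
print(lim.limit(80.0, 1.5))   # still within duration: 50.0
print(lim.limit(80.0, 2.5))   # duration exceeded: 0.0
```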
A functional limitation would split a previously defined hazard into two different hazards, one
representing the case when the limitation works and another representing the case when the
limitation is violated. This split would then necessitate an update of the hazard identification results
and a re-iteration of the hazard classification for these two cases.
Concerning the issue of defining suitable limitations, every hazard should be checked with respect
to the possibility of limiting the criticality by limiting the authority of the system.
In particular, the possibility of functional limitations should be investigated if the quantitative results of the hazard occurrence analysis show that hazard probability requirements are not met.


D.3.8.1 Requirements for functional limitations: Summary

Table D.8 provides a compact summary of the findings concerning requirements for functional
limitations.

General characteristics: Functional limitation requirements limit the authority of a system with respect to its influence on the behaviour of the vehicle.
How to determine suitable requirements: For every failure mode, investigate whether a functional limitation is feasible and desirable.
How to express requirements: A description of the limitation and a specification of where the limitation is to be allocated.
Specific difficulties: Trade-offs with functional requirements emanating from the desired behaviour of the system.
Relationship to other types of requirements: Hazard probability requirements constitute important inputs to the identification of suitable functional limitations, particularly when these probabilities are compared to the results of a quantitative hazard occurrence analysis.
Relationship to other requirements of the same type: None
Verification issues: Verification is relatively straightforward:
• fault injection testing, in which the limitation will be violated unless it works as specified
• review of the implementation of the functional limitation
Examples: “For the Collision Avoidance function, the brake controller shall limit the brake actuator command to a level corresponding to 50 bar hydraulic pressure.” “For the Collision Avoidance function, brake actuator commands issued by the brake controller shall be limited to a duration of two seconds.”

Table D.8 - Essential characteristics of requirements for functional limitations

D.3.9 Requirements on the design process

This section describes design process requirements, splitting them into two main categories. It identifies several kinds of process requirements, where they originate from, their characteristics, their relationship to other requirements, and their verification, and it provides examples of this type of requirement.
It should be noted that the topic of process requirements for software development and design is
covered in much more detail by EASIS Deliverable D4.1 "EASIS Engineering Process Framework"
[6]. Therefore, only an overview of this highly important topic is provided here.


D.3.9.1 Context

There are two categories of process requirements:
• Requirements that the development process itself has to fulfil.
These requirements come from a reference process (e.g. Automotive SPICE or IEC 61508 [1]) or describe properties the process must have (e.g. traceability). They are referred to as category 1 requirements.
• Requirements used to tailor the development process.
These requirements are used to tailor an existing process when the process allows a choice among several methods, techniques or analyses. A good example is IEC 61508 [1], which allows a choice of several methods per design phase (under the constraint of the needed SIL). They are referred to as category 2 requirements.
The first category of process requirements is only briefly discussed here. A more detailed discussion of this topic is found in the EASIS WP4 deliverable D4.1, where the EASIS Engineering Process has been developed based on a reference process and other process requirements.
The second category of process requirements comprises constraints placed upon the development process or used to tailor the development process of the system.
Process requirements are typically included when large organizations with existing standards and practices and qualified technical staff are procuring systems. They may vary in detail from very specific instructions on the process which must be followed (e.g. ISO/IEC 12207:1995, IEC 61508 [1]) to more general requirements, such as that the process must be ISO 9000 conformant. They may stem from several sources:
• Laws and regulations
In some countries, laws and regulations may require conformance to certain process requirements or reference processes, e.g. the use of DO-178B or IEC 61508. Laws and regulations document society’s view on risk and safety.
• Industry best practices
Each industry sector may recommend its own process requirements due to the specific development processes in that sector; building a nuclear power plant, for example, is quite different from building a car. Examples of sector-specific requirements are the MISRA Guidelines or the EASIS Guidelines.
• Internal policies and strategies
Each company can have its own internal policies and strategies to establish best practices. This is the company’s view on risk and safety.
• External and internal process improvements
For each of the previous sources, process improvements may recommend new or changed design process requirements or a modified reference process.
• The customer for a system wishes to influence the process
Another source of design process requirements is a customer who wants to influence the development of the system, e.g. a customer who has an own process and wants the product developed in accordance with it.
The main goal of process requirements of both categories is to ensure that the system is developed in a manner that is commensurate with the criticality of the potential hazards associated with the system. A catchword for this is “correct by construction”. Process requirements include requirements and methods specified in development standards or guidelines which must be followed. These standards and guidelines define a reference process with which the actual development process has to comply. These requirements are category 1 requirements and can be seen as functional requirements on the development process. Other category 1 requirements are attribute-based requirements, e.g. that the product requirements shall be traceable within a process or that the work products and activities can be planned. These kinds of requirements can be used to set up one’s own process (also to “produce” a reference process) or to assess a specific development process. Requirements to use specific methods or tools during the development of a system are non-functional requirements and are category 2 requirements.

The following table gives a short overview on both design process categories:
Category 1: Construct or adapt a process according to the process requirements;
assess a process against a reference process (which fulfils the process
requirements) or directly against the process requirements.
Category 2: Select tools, methods and techniques within the development process during
system development.

Process requirements can place real constraints on the design of a system. For example, an
organization may specify that a specific set of CASE tools must be used for system development
because it has experience with these tools. If these tools do not support object-oriented
development, this means that the system architecture cannot be object-oriented.
The aim of process requirements is a systematic way of designing the system. A typical design
process requirement is therefore that qualified and competent staff have to use qualified or
certified tools to develop the system.

D.3.9.2 Characteristics of Design Process Requirements

The characteristics of the design process requirements differ between the two categories:
• Category 1 requirements are functional requirements on the design process and
requirements on the process attributes.
• Category 2 requirements are non-functional requirements on the product but functional
requirements on the design process.
Process requirements can be found in a great variety of standards. IEEE-Std 830 – 1998 gives the
following 3 subcategories for process requirements:
• Delivery requirements
• Implementation requirements
• Standards requirements.
This standard also specifies attributes that requirements have to exhibit. Some
examples are:
1. Correct
2. Unambiguous
3. Complete
4. Consistent
5. Verifiable
6. Modifiable
7. Traceable

The process requirements are applicable to the overall system design (engineering) process and
to specific steps or phases within the process.


D.3.9.3 How to express this type of requirement

Most requirements are expressed in textual or tabular form. It is also possible to use
graphical notations to describe complex relations within the process, e.g. the lifecycle model /
process model of the design process.
The textual form is mostly used to express how a specific activity, method, technique or tool has to
be used. The tabular form is mostly used to provide a compact overview or to present a choice
between specific methods, techniques or tools.

D.3.9.4 Determining Design Process Requirements

This section describes how design process requirements can be determined. The primary
sources are laws, regulations, internal policies and strategies, and industry best practices.
Category 1: Design process requirements are mainly found in relevant standards (see Appendix A
for an overview of relevant standards). The purpose of these requirements is to set up a
development environment which helps to avoid systematic failures.
Category 2: Some standards and guidelines allow tailoring of the development process. This
means that a company can adapt the reference process to its needs.
This approach is widely used in standards and norms, where successful procedures are
"preserved". Standards like IEC 61508 also provide lists of methods which can be applied in the
lifecycle. The selection of these methods produces requirements which originate from the design
process.
Another source of design process requirements is process improvement, where new ideas are
introduced into the process and existing processes are measured and adapted.

D.3.9.5 Relationship to Other Types of Requirements

Process requirements mainly place constraints on the design of a system. This means that other
requirements may not be fulfillable or may need to be changed. Beyond this there is no relation of
deeper interest, because process requirements have only little influence on the functional
behaviour or the properties of the system.

D.3.9.6 Examples of Design Process Requirements

Examples of design process requirements can be found on several levels.


Below are some examples for Category 1. Some of these requirements can be
found in the Automotive SPICE reference process. The following requirements belong to the
attribute-requirements subset of Category 1 requirements.
• The process shall be amenable to planning
• The product requirements and work products within the process shall be traceable
• A specific version of the final product shall be reproducible
• Each step within a process shall be reproducible.
This means that a process step with the same input shall always produce similar results,
independent of the persons who are doing the work.
• The process shall be assessable.
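The traceability attribute above can be made concrete in data terms. The following C sketch is purely illustrative (the structure, field names and requirement IDs are assumptions, not taken from Automotive SPICE): each product requirement carries links to the work products that implement and verify it, so an untraced requirement can be detected mechanically.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only: one traceability record per product requirement.
 * All identifiers below are hypothetical. */
typedef struct
{
    const char *req_id;        /* e.g. "SYS-REQ-001" (hypothetical ID) */
    const char *design_item;   /* implementing work product, or NULL */
    const char *test_case;     /* verifying work product, or NULL */
} TraceLink;

/* A requirement counts as traced only if it has both an implementing
 * and a verifying work product on record. */
static bool requirement_is_traced(const TraceLink *links, size_t n,
                                  const char *req_id)
{
    for (size_t i = 0; i < n; i++)
    {
        if ((strcmp(links[i].req_id, req_id) == 0) &&
            (links[i].design_item != NULL) &&
            (links[i].test_case != NULL))
        {
            return true;
        }
    }
    return false;   /* untraced requirements would fail an assessment */
}
```

A process assessment tool could iterate over all requirement IDs and report every one for which such a check fails.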


The following requirements belong to the functional requirements subset of Category 1
requirements.
• “A development lifecycle has to be defined and documented”
• “Roles and their responsibilities have to be defined and documented”
• “Competent and experienced staff have to work on the system development. The
competence and experience have to be documented”
• “Certified or qualified tools have to be used”
On a more detailed level of functional process requirements, e.g. the phase level of the design
process, the requirements could look like the following:
“Each requirement has to fulfil the following attributes:
• completeness, correctness, consistency
• freedom from intrinsic faults
• independence to avoid common cause (CC) failures
• avoidance of what is not related to safety
• clear and unambiguous documentation
• testability, repeatability
• predictable, defensive, modifiable design
• assessable”
This is an example from IEC 61508-3:2005, which adapts the ideas from ISO/IEC 9126. These
requirements describe the functional requirements which tools, methods or techniques have to
fulfil.
At the lowest level, specific techniques, methods or tools are the content of design process
requirements. Examples are:
• UML shall be used to design and document the system architecture and the software
architecture.
• Code Generation shall be used. One of the following tools shall be used for this purpose …
• The software shall be compliant with MISRA C.
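To illustrate what a requirement such as "the software shall be compliant with MISRA C" means in practice, the fragment below sketches a few typical MISRA-style rules: fixed-width types, explicit narrowing casts, braces on every branch, and a single exit point. The function and its limits are invented for illustration and are not taken from the MISRA Guidelines themselves.

```c
#include <stdint.h>

/* Hypothetical helper illustrating common MISRA C style rules.
 * The function name and the speed limits are assumptions. */
static uint16_t clamp_speed(int32_t raw_speed)
{
    uint16_t result;

    if (raw_speed < 0)
    {
        result = 0U;                      /* saturate negative readings */
    }
    else if (raw_speed > 250)
    {
        result = 250U;                    /* saturate implausible values */
    }
    else
    {
        result = (uint16_t)raw_speed;     /* explicit, range-checked narrowing */
    }
    return result;                        /* single exit point */
}
```

A compliance requirement of this kind would then be verified with a static analysis tool configured for the chosen rule set, rather than by inspection alone.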
It is important that there is always a justification or an assumption as to why each requirement
will help to produce a safe system. This is a very important link into the final dependability
assessment and the process branch of the Safety Case. More information about that can be
found in Appendix E.

D.3.9.7 Verification/Validation of Design Process Requirements

The verification and validation of design process requirements differ greatly between the two
categories.
Category 1: The process requirements are assessed either in a process assessment against a
reference process model, or with Process FMEA or similar analysis techniques, to determine
whether all requirements are fulfilled. The best-known and most widely used process reference
models are
• CMMI
• (Automotive) SPICE
• ISO 9000/9001
• IEC 61508


The verification that an assessed process is correctly followed could be done in several ways.
• Activities and work products could be included in a safety plan or project plan and checked
at several phases within the development process
• Tools, methods and techniques could be documented in the process description and
checked at several phases within the development process
This is also a very important link into the final dependability assessment and the process branch of
the Safety Case, because the work products are used as evidence of the correct application of
the development process. More information about that can be found in Appendix E.
Category 2: The lowest-level process requirements can be verified by
• Reviews
• Checklists
• Analysis (e.g. Process FMEA)
These activities should show that the requirements are correctly fulfilled.

D.3.9.8 Design Process requirements: Summary

Table D.9 provides a compact summary of the findings concerning design process requirements.
General characteristics Process requirements describe how a system shall be developed.
How to determine Process requirements are usually derived from standards such as
suitable requirements IEC 61508 or DO-178B; they may also document the working
style of a company (selection of tools and methods)
How to express Textual and sometimes graphical (e.g. lifecycle) representation
requirements
Specific difficulties Usually requirements for a design process are not part of the system
development.
Relationship to other Process requirements can place constraints on other requirement
types of requirements types.
Relationship to other none
requirements of the same
type
Verification issues The requirements for the process are usually assessed using a
reference model (e.g. Automotive SPICE). Lower-level process
requirements can be verified with reviews, checklists and/or analysis
Examples Requirements for the design process
• “A development lifecycle has to be defined and documented”
• “Roles and their responsibilities have to be defined and
documented”
Requirements due to a process
• Code Generation shall be used. One of the following tools
shall be used for this purpose …
• The software shall be compliant with MISRA C.

Table D.9 - Essential characteristics of design process requirements


D.3.10 Requirements on Isolation and Diversity

When two or more items must be independent (i.e. not susceptible to dependent failures like
common cause failure or cascade failure described in appendix C), it is appropriate to specify
requirements concerning their isolation and their diversity. Diversity is used to avoid having the
same systematic faults in both items. Isolation techniques prevent propagation of faults between
the items.

D.3.10.1 Isolation

For hardware, isolation may be achieved through physical separation (minimum distance between
the two items) or with physical screens (different housing, filtering on common power supply, etc).
Separation and physical screens will be used
• To reduce common sensitivity to the environment (coupling)
• Against external aggression (e.g. external metallic object creating a short circuit between
redundant items)
• To avoid propagation from one item to the other (overheating, electromagnetic interference,
etc).
Another aspect of isolation is to avoid using common resources for the items we want to be
independent (no common power supply, no common data network, no common critical input signal,
etc).
For software components running on the same CPU, isolation is usually called partitioning. It may
be achieved using techniques such as modularity, no shared data, memory protection unit, etc.
Partitioning will be used:
• when two software components must be independent for functional reasons. For example:
when one component monitors the other, a failure of the monitored component must not
disrupt the functionality of the monitoring component.
• when software components with different criticality are executed on the same CPU as a
failure of the low integrity component must not disrupt the functioning of the high integrity
component.
Isolation concerns both systematic and random faults.

D.3.10.2 Diversity

Diversity is used to avoid having the same systematic faults in two or more items. Diversity
requirements may concern the items themselves or the means used to produce them (team, tools,
etc).
Some examples are:
• Diversity in design: use of different CPU cores on the main channel and the supervisor
channel of an FSU.
• Diversity requirement on tool: use different compilers to generate software for redundant
channels.
• Diversity against common sensitivity to the environment (coupling): different technologies
between redundant sensors used in a steering angle sensor (e.g. optical and magnetic).


D.3.10.3 How to determine, refine and verify requirements on isolation and diversity

Independence requirements, for example originating from requirements on system architecture
(see section D.3.3), will be refined into isolation and diversity requirements.
Isolation requirements are design related requirements. They are directly linked to the way the
functions will be implemented. Isolation requirements cannot be defined before a preliminary
architecture is given. In preliminary design phases, general requirements on isolation of signals
and functions can be made. In more detailed design phases, these requirements will be refined.
For example, isolation requirements for critical signals identified in a preliminary phase will later be
refined into detailed separation requirements at connector and harness level.
Partitioning requirements will result from the allocation of software functions to the hardware.
When two software functions mapped to the same CPU are required to be independent from each
other, then these software functions will be put into different partitions. To ensure isolation between
the partitions, software as well as hardware techniques may be required. The level of isolation
required between the different partitions will determine the techniques to be implemented. Also, the
highest criticality level of the partitions will determine the criticality level of the partitioning
mechanism. Partitioning requirements should be refined into mechanisms and techniques to
ensure proper isolation of partitions. For example, software techniques such as asynchronous data
communication to avoid mutual blocking or time sliced CPU allocation may be required. Hardware
techniques such as the use of a memory protection unit can also be required.
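As a sketch of the software technique mentioned above, the fragment below shows one possible form of asynchronous (non-blocking) data communication between two partitions: a single writer publishes into a double buffer and never waits for the reader, so a stalled low-integrity partition cannot block a high-integrity one. All names are assumptions; a production mechanism would additionally need sequence counters or hardware support to be fully robust.

```c
#include <stdint.h>
#include <stdatomic.h>

/* Illustrative single-writer / single-reader channel between partitions. */
typedef struct
{
    uint32_t payload[2];      /* double buffer */
    atomic_uint latest;       /* index of the most recently completed write */
} AsyncChannel;

static void channel_write(AsyncChannel *ch, uint32_t value)
{
    unsigned int next = 1U - atomic_load(&ch->latest);
    ch->payload[next] = value;            /* write into the spare buffer */
    atomic_store(&ch->latest, next);      /* publish atomically, no locking */
}

static uint32_t channel_read(AsyncChannel *ch)
{
    /* the reader always sees the last completely written value */
    return ch->payload[atomic_load(&ch->latest)];
}
```

The key property for partitioning is that neither side ever blocks on the other, so a failure in one partition cannot propagate as a deadlock into its neighbour.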
As systematic faults may be introduced at different stages in the lifecycle, diversity requirements
will be determined at different levels: specification, design, manufacturing, maintenance, etc.
Whenever independence is required and the likelihood of introducing systematic faults is high, a
diversity requirement can be specified.
The following example illustrates successive refinements of an independence requirement and
interaction between the requirements. Let’s consider a functionality with associated hazards whose
criticality requires the implementation of an independent monitoring of the main function to put the
system in a safe mode in case of a failure. At this stage, redundant hardware will be required to
ensure independence of main and monitoring channel against random hardware faults. Having two
redundant HW channels, we may either require diverse or identical hardware.
If we do not require diverse hardware, additional requirements for independence of the channels
with respect to systematic faults will be needed. For hardware, a minimum distance or even a
screen between the channels may be required, to reduce common coupling from external
environment or physical influence of one channel to the other. To minimize the probability of
having the same systematic fault at the same time on both microprocessors, the instruction set can
be split between the channels. This will also reduce the likelihood of introducing systematic faults
by the tools generating the code. For software, requirements on diversity of algorithms will be
needed.
If diverse hardware is required for the channels, fewer additional requirements for independence
between the channels with respect to systematic faults will be needed. A physical separation
requirement will still be necessary, but it will be weaker, as a common coupling to the external
environment is unlikely. The requirement to split the instruction set is no longer necessary. The
requirement on diversity of algorithms will still be necessary.
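The algorithm-diversity requirement in this example can be illustrated with a small C sketch. The function names, the averaging task and the tolerance are invented for illustration: the main and monitoring channels compute the same quantity with deliberately different algorithms, and a cross-check within a tolerance is what would trigger the safe mode.

```c
#include <stdint.h>
#include <stdbool.h>

/* Main channel: straightforward summation then division. */
static int32_t main_channel_avg(const int32_t *s, uint32_t n)
{
    int64_t sum = 0;
    for (uint32_t i = 0U; i < n; i++)
    {
        sum += s[i];
    }
    return (int32_t)(sum / (int64_t)n);
}

/* Monitoring channel: deliberately diverse incremental (running) mean. */
static int32_t monitor_channel_avg(const int32_t *s, uint32_t n)
{
    int32_t avg = 0;
    for (uint32_t i = 0U; i < n; i++)
    {
        avg += (s[i] - avg) / (int32_t)(i + 1U);
    }
    return avg;
}

/* Cross-check: a disagreement beyond the tolerance would put the
 * system into its safe mode (tolerance is an assumed value). */
static bool plausibility_check(const int32_t *s, uint32_t n, int32_t tol)
{
    int32_t diff = main_channel_avg(s, n) - monitor_channel_avg(s, n);
    return (diff >= -tol) && (diff <= tol);
}
```

Because the two implementations share no code path, a systematic fault in one algorithm is unlikely to be replicated in the other, which is exactly what the diversity requirement aims for.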
Verification of isolation or diversity requirements can always be done by inspection. For hardware
isolation, verification can sometimes also be achieved by testing: when hardware isolation requires
physical separation or physical screens, the effectiveness of the implementation can be tested.
Table D.10 provides a compact summary of the findings concerning Isolation and Diversity
requirements.


General characteristics Isolation and diversity requirements concern avoidance of
dependent failures
How to determine suitable There is no simple method to determine suitable isolation or
requirements diversity requirements
How to express As there are many different types of diversity and isolation
requirements requirements, there is no unique way to express such
requirements.
Specific difficulties There is no simple method to determine suitable requirements
Relationship to other types Isolation and diversity requirements derive from independence
of requirements requirements
Relationship to other Diversity requirements may concern hardware, software,
requirements of the same specification… Different combinations of diversity requirements
type can meet the same level of independence.
Verification issues Most isolation and diversity requirements can only be verified by
inspection.
Examples Main channel and monitoring channel shall be based on diverse
hardware. Especially, the CPU cores of the redundant channels
shall not be of the same type.
Main channel and monitoring channel of the ECU shall be
physically separated with a minimum distance of 10 mm, at
connector level and PCB level.
Decision algorithm on main and monitoring channels shall be
diverse.

Table D.10 - Essential characteristics of Isolation and Diversity requirements

D.3.11 Requirements on the manufacturing process

Requirements on the manufacturing process are in principle outside the scope of this work so only
a brief overview is given here.
Requirements may be placed both on the manufacturing process for a component or subsystem
and on the assembly in the vehicle factory. Such requirements typically come from an identification
of a particular manufacturing (or assembly) error that could give rise to a hazard later on. For
example, it may be the case that a connector needs to be treated in a special way or that the
fastening of a component needs to be made in a particular manner. This type of requirement is
typically expressed as an instruction to the factory staff.
Additional comments concerning requirements on the manufacturing process are given in Table
D.11 below.


General Detailed instructions on how the manufacturing (or assembly) of a
characteristics particular component shall be performed.
How to determine If the hazard occurrence analysis shows that a specific manufacturing
suitable requirements error could lead to adverse system behaviour, instructions that seek to
avoid this error should be defined.
How to express Requirements should express the need for manufacturing instructions
requirements that avoid a particular manufacturing error. (Note: This is an output from
the dependability activities. Someone will then have to translate the
requirement into a proper manufacturing or assembly instruction.)
Specific difficulties None
Relationship to other A fault avoidance requirement (for a manufacturing fault) may be
types of requirements broken down into a manufacturing instruction.
Relationship to other None
requirements of the
same type
Verification issues For this requirement type, verification is simply a matter of checking
whether the manufacturing instruction exists and contains the specified
information.
Examples “Bolt B35 shall be fastened with a torque of Y Nm.”
“After assembly of the component, a visual inspection shall be made to
check that...”

Table D.11 - Essential characteristics of requirements on the manufacturing process

D.3.12 Requirements for systems external to the system of concern

This section describes dependability requirements that apply to external systems, and covers
aspects such as where to look to find them, their typical characteristics, their relationships to other
requirements (both other types of requirements and within the hierarchical requirements structure),
and their verification. It also gives some examples of this type of requirement.

D.3.12.1 A context for external systems

In the simplest conceivable world, requirements are elaborated from very general statements of
need at a high level, down through some intermediate levels, until they can be stated in the form of
‘what this system must do’ (and potentially how well it must do it). For systems of systems, this
elaboration will at some point lead to some design decisions being made that will involve
separating off requirements that apply to unique systems. This process may be repeated at many
levels for very large systems, where a detailed and complex hierarchy of systems exists. For
current vehicle systems, the hierarchy is relatively simple, comprising the transport infrastructure
(roads, signs, traffic controls etc) at the highest level, the vehicle at the next level, then control
systems (steering, braking, powertrain, various chassis and body controllers), and finally the
electronic hardware, software, mechanical and hydraulic systems. Within the lifetime of this
document, the ‘control systems’ layer is likely to become hierarchical in its own right, with the
advent of central controllers and smart actuators. See Figure D.20.


[Figure D.20 shows the hierarchy of systems: the Infrastructure at the top; below it the Vehicle;
below that Control System A and Central Controller B with Smart Satellites C and D; and at the
lowest level the mechanical, hydraulic, hardware and software elements of each system.]
Figure D.20 Hierarchy of Systems
For the purposes of the discussion of requirements on external systems as it applies in this
document, an external system is any system within the greater system (the vehicle) that is not a
lower part of the current hierarchy. With reference to Figure D.20, if Control System A is the
system of concern, Controllers B, C & D are all external systems, and the Vehicle can be
considered the parent system. If the system of concern is Central Controller B, Control System A is
an external system, but Smart Satellites C and D are probably not, and again the Vehicle is the
parent system.
Note that there are circumstances under which sub-systems of the system of concern may be
considered external systems, for instance if the sub-system is being developed by a third party.
At some stage in the development of a system, design decisions will be made that lead to a set of
sub-systems being developed. This is true for an arbitrary system at any level of the hierarchy
outlined above. Part of this particular activity will be to define the interfaces between the various
sub-systems. This activity will be documented in something that may be called an interface
specification. Subsequent development of the interface specification may be greatly influenced by
the requirements determined here for each sub-system, and by any similar activity performed
during development of the external system.
A key term used when discussing requirements is ‘Stakeholder’. In general terms, a stakeholder is
anyone who has an interest in the system of concern. The interest could be functional, financial,
legal, social, or in other categories. Stakeholders include such individuals or bodies as the system
developer, the system purchaser (or a ‘proxy’ representative, for instance in the case of consumer
goods), legislative organisations, standards developers etc. In the case of an Automotive
Integrated Safety System, the stakeholders will include the system integrator for a sub-system, the
system integrator for the vehicle, the system developers for ‘neighbouring’ systems and sub-
systems, the vehicle developer, organisations responsible for the infrastructure and so on. For this
particular discussion, the key stakeholders are the systems integrators, and developers of
‘neighbouring’ systems. These would have been the key stakeholders involved in the creation of
the interface specifications, and they will have an ongoing interest in ensuring that it is both
accurate and suitable for the vehicle being developed.

D.3.12.2 Characteristics of requirements on external systems

Generally, requirements on external systems are fairly straightforward to spot – they will refer to an
aspect of the greater system over which the system of concern has no direct control.


D.3.12.2.1 How to express this type of requirement

The requirement itself can be expressed in any of the ways listed under the other types of
requirement, since it will always be one of those types in addition to being a requirement on an
external system. The important point about this type of requirement is that it needs no further
elaboration for use within the system of concern. The development of the external system may be
affected by the addition of the requirement but that is beyond the scope of the system of concern.
Acknowledgement of the existence of this type of requirement leads to an additional attribute being
present for all requirements. This attribute can be considered Boolean in nature at its root, its value
being determined by answering the question “Does this requirement apply to the system of
concern?”. With more knowledge of the system the type of the attribute becomes enumerated
(potentially multi-valued), and is determined by answering the question “To which system (or
systems) does this requirement apply?”.

D.3.12.2.2 How to process this type of requirement

Requirements on external systems need no further elaboration for the system of concern.
However the progress of the requirement within the affected system (acceptance, development,
validation, deployment for instance) may need to be coordinated in order that the various systems
co-operate as expected.
It is of the utmost importance however that the requirement is communicated to the development
team for the relevant external system. This is by no means a trivial task, since the various sub-
systems in the wider system may be at different stages of development. However, a ‘live’
document such as the interface specification alluded to above will aid greatly in this task, as it is an
obvious place to collect and collate requirements from and for other systems.

D.3.12.2.3 Considering the inverse – requirements from external systems

While strictly beyond the scope of this document, it is nevertheless interesting to note that this is a
‘double-edged sword’. External system owners, in their efforts to determine dependability
requirements, may request that certain requirements are placed on ‘our’ system (the system of
concern). These will in their turn need to be examined for potential hazards, and for their
integration into the rest of the system design.

D.3.12.3 Determining requirements on external systems

There are many places to look for dependability requirements that may apply to external systems.
The most obvious place to start is at the definition of the interfaces (for example in the interface
specifications, referred to above), but there are questions that may usefully be asked in an attempt
to elicit other requirements:
• Are there circumstances under which the system of concern needs to know more than is
immediately obvious about the state of another system? For instance the validity or age of an
incoming signal, or the value of an output from the external system to an actuator beyond the
normal scope of the system of concern. This may be under abnormal conditions, where for
instance in reversionary modes the system of concern may modify its behaviour depending on
the state of an external system, which may require that system to publish more information than
is required for normal operation.
• Can requirements for redundancy in the system of concern be satisfied by having an external
system make additional data available over an existing communications channel? This may
present a more cost-effective solution to a need for redundancy than straightforward duplication.


• Are there circumstances or states of the system of concern during which the overall integrity of
the parent system can be better maintained by altering the behaviour of an external (sibling)
system? This leads to a whole set of further questions, relating to the design of the parent
system and the sibling system as well as the system of concern.
• How should external systems react to the behaviour that the system of concern may exhibit when
in its various reversionary modes? These reversionary modes may not have been envisaged at
the time that the parent system was designed and the system requirements distributed amongst
the various control systems.
• Does mitigation of a fault that may otherwise lead to a hazard require non-standard behaviour of
an external system?
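The first of these questions, knowing the validity and age of a signal published by an external system, can be made concrete with a small sketch. The structure layout, field names and the 100 ms staleness limit below are assumptions for illustration, not taken from any real interface specification.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical representation of a signal received from an external
 * system, carrying validity and age information alongside the value. */
typedef struct
{
    int32_t  value;           /* e.g. steering angle in 0.1 degree units */
    uint32_t timestamp_ms;    /* time of production in the external system */
    bool     valid;           /* validity flag set by the external system */
} ExternalSignal;

#define MAX_SIGNAL_AGE_MS 100U    /* assumed staleness limit */

/* The system of concern uses the signal only if the external system
 * marked it valid and it is not stale. */
static bool signal_usable(const ExternalSignal *sig, uint32_t now_ms)
{
    uint32_t age = now_ms - sig->timestamp_ms;   /* wrap-safe unsigned diff */
    return sig->valid && (age <= MAX_SIGNAL_AGE_MS);
}
```

A requirement of this kind placed on the external system would oblige it to publish the timestamp and validity flag, i.e. more information than is needed for normal operation.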
Requirements of this type should first be examined to determine which of the other categories
described in this document they fall into, and should then be formulated as defined there. The fact
that they are targeted at an external system does not affect the way they are formulated initially,
although the need to comply with processes and guidelines used in the design of the external
system may require adjustments to the policies suggested herein.
It is entirely possible that the requirements discovered for application to external systems are in
fact ‘simple’ functional requirements (as opposed to dependability requirements) when presented
to the external system. In this case, other guidelines need to be sought for formulation of functional
requirements.
Similarly, methods of validation and verification should be determined based on the ‘base type’ of
the requirement. These are presented in this document for dependability related requirements, but
will be specified elsewhere for other types of requirement.

D.3.12.4 Relationship to other types of requirement

This classification of requirements is orthogonal to the other classifications. Any and all other types
of requirements can be requirements on external systems as well as whatever other type they are.
Thus the relationships between this type of requirement and other types, in terms of the
breakdown of requirements, is as specified in the ‘base type’ of the requirement. Note, however,
the comments below on relationships within the hierarchy of requirements.

D.3.12.5 Relationships within the hierarchy of requirements

Within the hierarchy of requirements for the system of concern, requirements on external systems
will generally be ‘leaf nodes’. That is, they may have come to light during elaboration of more
general requirements (and so will have ‘parent’ requirements within the hierarchy), but, being
relevant to systems other than the one of concern, they will not be further elaborated within this
hierarchy (and so will have no ‘child’ requirements or specifications within the hierarchy).
This type of dependability requirement is most likely to appear near the top of the hierarchy, at a
point where interaction with other systems is being considered. For instance, they may be
uncovered during determination of mitigating actions for the causes of hazards. However it is
entirely possible that requirements on external systems are deduced at the very lowest level of
detail.
In terms of the external system’s hierarchy of requirements, these requirements will appear at the
apex of a hierarchy, since they will have no direct parent within the external system. Within that
external system, they may then develop a hierarchy of their own.

D.3.12.6 Examples of requirements on external systems

Requirements on external systems can fall into any of the other categories of requirement
specified within this document. What makes them unique is that they do not apply to the system of
concern, but to some other system in the larger system-of-systems (generally the vehicle, but
potentially the transport infrastructure too).

D.3.12.7 Verification of requirements on external systems

Section D.3.12.1 above clarified the concept of ‘external systems’. When it comes to verification,
particularly of requirements on external systems, the boundaries become somewhat cloudy.
Verification requires the developer to look beyond the system boundaries, with the aid of the
interface specification, to discover and either stimulate or imitate the interfaces that the system
sees there. At integration, verification requires that the interfaces between systems are tested in
situ. Specifically, one or more of the following (or others) may be checked:
• the response in system Y (or Z) to a stimulus (value change, event, state change) in system Z (or
Y) is correct according to the overall system requirements (emphasis on tests on the overall
system); or
• the data (signals or events) published by system Y is interpreted correctly in system Z that
subscribes to that data (emphasis on tests in the receiving system Z); or
• the data (signals or events) published by system Z is presented correctly to system Y that
subscribes to that data (emphasis on tests in the transmitting system Z).
If we consider system Y to be the system of concern, then we can consider system Z to be an
external system. Any requirement we place on system Z (its generation of a stimulus, its response
to a stimulus, its reception of data or its transmission of data) will need verifying for correctness. In
all three cases listed above, the verification activities place the emphasis away from the system of
concern, either on the external system or on the parent system.
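The second check above — that data published by system Y is interpreted correctly in the subscribing system Z — can be illustrated by a minimal integration-test sketch. The signal name, the 0.25 rpm/bit scaling and the test values are hypothetical, chosen only for illustration; they are not taken from the EASIS specifications.

```python
# Hypothetical sketch: system Y publishes engine-speed data, system Z
# subscribes to it. Signal name and scaling are illustrative assumptions.

def publish_engine_speed(rpm):
    """System Y side: encode engine speed as a raw bus value (0.25 rpm/bit)."""
    return {"signal": "EngineSpeed", "raw": int(rpm / 0.25)}

def interpret_engine_speed(frame):
    """System Z side: decode the raw bus value back to rpm."""
    assert frame["signal"] == "EngineSpeed"
    return frame["raw"] * 0.25

def test_z_interprets_y_data_correctly():
    # Emphasis on the receiving system Z: its interpretation must match
    # the value the transmitting system Y intended to publish.
    for rpm in (0, 800, 3000, 6500):
        frame = publish_engine_speed(rpm)
        assert interpret_engine_speed(frame) == rpm

test_z_interprets_y_data_correctly()
```

In a real integration environment the two functions would be the actual implementations in systems Y and Z, exercised in situ over the real interface rather than as in-process calls.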
Thus there are two significant aspects to verification of requirements on external systems:
• Verification of External Systems.
Verification of an external system is beyond the scope of this document. However, if that external
system is also developed according to the guidelines presented in this document set, it will be verified as a
‘system of concern’ in its own right, and will consider requirements from external sources in its
verification plan.
• Verification of Integrated Systems.
Verification at the integration of the system of concern with other systems can be considered as
verification of a larger system, and can also be subject to the guidelines presented in this document
set. As such, the total set of requirements, including dependability requirements sourced from
one system but applicable to other systems, will be verified for that parent system as part of the
verification plan for the parent system.

D.3.12.8 Validation of requirements on external systems

Validating that the chosen requirement is correct and that it is correctly targeted at the appropriate
external system requires the validation team to study the requirements of the parent system, and
ask the question “Does this requirement (on a system external to the System of Concern)
represent the best way of satisfying the requirements of the parent system?”. Finding an answer to
this question will not be trivial, given the conflicting nature of requirements on performance, cost,
delivery schedule, resourcing, reuse, enhancement capability, etc. It is difficult to give guidance
within the context of this document, but in general the guidelines that are given for a system will also
apply to the parent system, albeit to a greater or lesser degree depending on the context.
Inevitably, the final decision on whether the requirement is valid remains with the owner of the
parent system, whose task it is to balance and allocate the requirements across the whole system,
and to define and ensure implementation of an appropriate validation plan.

D.3.12.9 Requirements on external systems: Summary

Table D.12 provides a compact summary of the findings concerning requirements on external
systems.

General characteristics: Requirements on external systems refer to an aspect of the greater
system over which the system of concern has no direct control.

How to determine suitable requirements: A set of questions is proposed. These should be
addressed to the interfaces of the system of concern, and to its modes, in an attempt to
determine whether the system of concern places requirements on external systems.

How to express requirements: Requirements should be expressed at the highest level at which
they apply to the external system, following the guidelines provided elsewhere in the EASIS
documentation set for requirements of the type appropriate to the deduced requirement.

Specific difficulties: Agreement on ownership of requirements outside the scope of the system of
concern. Acceptance that the correct and reliable functioning of one part of the system may
depend on some aspect (function, reliability, data provision rate/content, etc.) of another system.

Relationship to other types of requirements: This type of requirement will also be another type of
requirement: either one of the types described in this document, or a ‘regular’ requirement (one
specifying the functionality of an external system rather than its dependability). Any relationships
to other requirements are a function of this other type.

Relationship to other requirements of the same type: Not applicable (see above).

Verification issues: Requirements on external systems must be verified on external systems.

Examples:
“Data concerning engine speed shall be accompanied by an indication of the validity and age of
the engine speed data.”
“System Y shall provide an indication of the state of its output P.”
“System Y shall not attempt to change the state of its output Q when [the system of concern] is in
state S.”

Table D.12 - Essential characteristics of requirements on external systems
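The first example in Table D.12 can be made concrete with a small sketch of how engine-speed data might carry validity and age indications. The field names, units and the 100 ms freshness threshold are assumptions made for illustration only; they are not prescribed by the EASIS guidelines.

```python
# Illustrative sketch: engine-speed data accompanied by validity and age
# indications; all names and the freshness threshold are invented examples.
from dataclasses import dataclass

@dataclass
class EngineSpeedMessage:
    rpm: float
    valid: bool        # the producer's own judgement of data validity
    timestamp_ms: int  # time at which the value was sampled

def is_usable(msg, now_ms, max_age_ms=100):
    """Consumer-side check: accept only valid, sufficiently fresh data."""
    return msg.valid and (now_ms - msg.timestamp_ms) <= max_age_ms

msg = EngineSpeedMessage(rpm=2500.0, valid=True, timestamp_ms=1000)
print(is_usable(msg, now_ms=1050))  # fresh data is accepted
print(is_usable(msg, now_ms=2000))  # stale data is rejected
```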

D.3.13 Requirements on user manual and service manual

Some hazards may be dealt with by the incorporation of appropriate information in the owner’s
manual. This is particularly suitable when a hazard is characterized by a misunderstanding of how
the system works. Although such hazards are in principle outside the scope of EASIS WT 3.1, a
few comments are given here. Some examples of this type of user manual information are:
• Explanation of how the system works to prevent the driver from misunderstanding its
operation
• Description of the Human/Machine Interface (HMI) including how the driver should interact
with the system

• Explanation of the inherent limitations of the system so that the driver does not expect too
much
• Description of the driving scenarios in which the system may perform inadequately
• System behaviour characteristics that the driver should be aware of (for example that the
brake pedal oscillates during ABS braking)
• Descriptions of how the user should react to error information such as “Service required”
• Explanation that the existence of a particular Safety System (collision warning, collision
avoidance, airbag, etc) does not warrant a less careful driving style than normal
For any system, the points above (and possibly other similar issues) should be considered.
Corresponding instructions shall be introduced in the user manual whenever appropriate.
If it has been identified that inadequate service actions might give rise to hazards, the service
manual shall highlight this by providing appropriate instructions on how to perform service.
Examples of requirements on the service manual (or other service instructions) are:
• Instructions for identification of root fault
• Assembly and mounting instructions (torque for fastening bolts, etc)
• Instructions for specific activities such as calibration of sensors
• Instructions for how to verify that a service action has been correctly made
These instructions may be complemented by warning text stickers on the components themselves.
Concerning maintenance, it may be noted that two types of hazards are possible:
• Hazards to the user of the vehicle (or to other road users) as the result of an incorrectly
performed maintenance action
• Hazards to the service technician if he/she performs a maintenance action incorrectly
We are primarily concerned with the first of these, but the second one should obviously not be
forgotten.

D.3.13.1 Requirements on user manual and service manual: Summary

Table D.13 below provides a compact summary of the findings concerning requirements on user
manual and service manual.

General characteristics: Requirements on user manual and service manual dictate specific
information that shall be included in a manual.

How to determine suitable requirements: This type of requirement is relevant whenever a lack of
understanding could have adverse effects.

How to express requirements: Requirements are simply stated as “The user manual shall explain
that...” or “The user manual shall describe...”. Requirements on the service manual are expressed
in the same way. (Note: This is an output from the dependability activities. Someone will then
have to translate the requirement into a proper formulation in the respective manual.)

Specific difficulties: None

Relationship to other types of requirements: None

Relationship to other requirements of the same type: None

Verification issues: Verification is simply a matter of checking that the user manual (or service
manual) contains the required information.

Examples:
“The user manual shall clearly show where the on/off switch for the collision warning is located”
“The user manual shall explain that the collision warning system is not capable of recognizing
every possible near-collision situation”
“The user manual shall explain that the radar-based functions will not perform correctly if the
radar is covered by snow or other substance”
“The service manual shall describe precisely how the radar equipment is to be mounted in the
vehicle”

Table D.13 - Essential characteristics of requirements on user manual and service manual
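The verification described above — checking that a manual contains the required information — largely remains a manual review, but parts of it can be automated. The following sketch is hypothetical: both the manual excerpt and the required phrases are invented for illustration.

```python
# Hypothetical sketch: check that required statements occur in a manual text.

REQUIRED_PHRASES = [
    "on/off switch for the collision warning",
    "not capable of recognizing every possible near-collision situation",
]

def missing_phrases(manual_text, required=REQUIRED_PHRASES):
    """Return the required phrases that do not occur in the manual text."""
    text = manual_text.lower()
    return [p for p in required if p.lower() not in text]

manual = (
    "The on/off switch for the collision warning is located on the dashboard. "
    "Note that the system is not capable of recognizing every possible "
    "near-collision situation."
)
print(missing_phrases(manual))  # an empty list means all phrases were found
```

A literal phrase match is of course only a crude aid; whether the wording actually conveys the required information still needs human judgement.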

D.4 References

[1] IEC 61508, Functional safety of electrical/electronic/programmable electronic safety-related systems, IEC,
1998.
[2] Interim Defence Standard 00-56 Issue 3: Safety Management Requirements for Defence Systems, UK
Ministry of Defence, 2004.
[3] Reducing risks, protecting people: HSE’s decision-making process (“R2P2”), ISBN 0 7176 2151 0, UK
Health & Safety Executive, Her Majesty’s Stationery Office, 2001.
[4] Always Read The Leaflet: Getting the best information with every medicine. Report of the Committee on
Safety of Medicines Working Group on Patient Information, The Stationery Office, ISBN 0 11 703409 6,
2005.
[5] C. Jung, Stand des ISO-Standards zur Funktionalen Sicherheit für die Automobilindustrie, (mainly in
English), Presentation at Safetronic 2005.
[6] EASIS Engineering Process Framework, EASIS Deliverable D4.1, 2006.


Deliverable D3.2 Part 1 – Appendix E

Safety Case construction

Version number: 2.0

Date of preparation: 14.11.2006

© 2006 The EASIS Consortium


Table of contents

E.1 Objectives and Structure of this report......................................................... E-1


E.2 Introduction, context and scope of the Safety Case..................................... E-2
E.2.1 History ................................................................................................... E-2
E.2.2 Prescriptive and Goal-based approach ................................................. E-2
E.2.3 Definition of the Safety Case................................................................. E-3
E.3 Scope and context ....................................................................................... E-5
E.3.1 Standards.............................................................................................. E-5
E.3.2 Failure Handling .................................................................................... E-7
E.3.3 Why Use Safety Cases for Integrated Safety Systems? ....................... E-7
E.3.4 Arguments ............................................................................................. E-8
E.3.5 Evidence ............................................................................................. E-10
E.3.6 Claims as defined by Adelard ............................................................. E-10
E.3.7 The ALARP principle ........................................................................... E-11
E.3.8 Links towards WT3.1 subtasks and WP4............................................ E-11
E.4 Safety life cycle .......................................................................................... E-12
E.4.1 Safety Case life cycle.......................................................................... E-12
E.4.2 Safety Case (status) report ................................................................. E-15
E.4.3 Safety Case maintenance ................................................................... E-17
E.5 Notations .................................................................................................... E-19
E.5.1 Why graphical methods?..................................................................... E-19
E.5.2 ASCAD ................................................................................................ E-20
E.5.3 Goal Structuring Notation .................................................................... E-22
E.5.4 Comparison of the two Notations ........................................................ E-28
E.5.5 Advantages and disadvantages of using GSN or ASCAD .................. E-30
E.6 Overview of the Safety Case...................................................................... E-31
E.6.1 Safety Case Development steps ......................................................... E-31
E.6.2 Relation between Engineering and Safety Case................................. E-32
E.6.3 Product and Process requirements ..................................................... E-32
E.6.4 Arguments by requirements ................................................................ E-35
E.6.5 Bottom up analysis .............................................................................. E-36
E.7 The contents of the Safety Case................................................................ E-37
E.7.1 Topics.................................................................................................. E-37
E.7.2 Documents included............................................................................ E-39
E.7.3 Safety arguments ................................................................................ E-40
E.7.4 Supporting evidence ........................................................................... E-41


E.7.5 Hardware and software ....................................................... E-41
E.7.6 Evidences............................................................................ E-43
E.8 Architectural approaches for Safety Cases................................................ E-46
E.8.1 Safety Tactics...................................................................................... E-46
E.8.2 Reusing Arguments by applying Patterns ........................................... E-47
E.8.3 Modules............................................................................................... E-54
E.8.4 COTS .................................................................................................. E-60
E.9 Assessment................................................................................................ E-62
E.9.1 Safety Case Assessment .................................................................... E-62
E.9.2 Assessment with SAL ......................................................................... E-62
E.9.3 Assessment of the Evidence ............................................................... E-67
E.10 Process ...................................................................................................... E-69
E.10.1 Correct by construction ....................................................................... E-69
E.10.2 Engineering Process ........................................................................... E-69
E.10.3 Safety Process .................................................................................... E-70
E.10.4 System Engineering and Safety Processes ........................................ E-72
E.10.5 EASIS Engineering Process ............................................................... E-74
E.10.6 Process Assessment........................................................................... E-75
E.10.7 Problems ............................................................................................. E-75
E.11 Tools .......................................................................................................... E-76
E.11.1 ASCE tool............................................................................................ E-76
E.11.2 ISCaDE ............................................................................................... E-79
E.11.3 ESafetyCase ....................................................................................... E-80
E.11.4 GSNCaseMaker .................................................................................. E-82
E.11.5 SAM .................................................................................................... E-82
E.12 Illustrations ................................................................................................. E-84
E.12.1 Safety Case Checklist ......................................................................... E-84
E.12.2 Process Part........................................................................................ E-86
E.13 References................................................................................................. E-87


E.1 Objectives and Structure of this report

This Appendix has two main objectives. Firstly, it introduces the overall ideas and the structure of
the Safety Case itself. By comparing the Safety Case concept to standards and guidelines, the
need for it is clarified and the importance of this type of confirmation is underlined. The second
objective of this Appendix is to provide guidance on the methods and processes available for
constructing a Safety Case for an Integrated Safety System. The intention is not to provide a
prescriptive approach that must be followed; rather, to recommend the principal features that must
be found and documented in an appropriate method and the key stages that must be followed in
constructing a Safety Case.
Furthermore, a proposal for assessment is given and its application is explained. Assessing the
Safety Case allows the confidence that forms its basis to be justified. Finally, tool support is briefly
discussed.
This Appendix consists of the following parts:
• Introduction: a description of what a Safety Case is intended to achieve, and some history
of the Safety Case.
• Scope and context: a listing of relevant standards, and a description of the structure and
principles of Safety Cases.
• Safety life cycle: describes the underlying stages, tactics and phases, and the content that
should be achieved.
• Notations: an introduction to, and comparison of, the two most common graphical
notations.
• Overview of the Safety Case: a possible breakdown, showing an approach for creating a
Safety Case.
• Contents of the Safety Case: describes what contents a Safety Case should have and
what actions can be taken to create evidence.
• Architectural approaches: approaches from software engineering are explained and applied.
• Safety Case assessment: description of a method to ensure that the Safety Case is
sufficiently trustworthy.
• Process: Safety Case engineering is explained.
• Tools: some relevant tools are analyzed.
• Illustrations: some illustrations of Safety Case relevant documents.


E.2 Introduction, context and scope of the Safety Case

This chapter provides an overview of the history and the different definitions of a Safety Case and
its content. This is required because the Safety Case is a relatively new concept for the
automotive industry and has so far been used mainly within the UK. Therefore a detailed
introduction to the different topics related to a Safety Case, i.e. its construction and its content, is
provided.

E.2.1 History

As a short introduction, the history of the Safety Case is given, explaining where Safety Cases
first appeared and why they became necessary.
As Kelly stated in his doctoral thesis [1], the first Safety Case arose from a catastrophic accident
at the Windscale power plant.
In 1957 a fire broke out, forcing the release of radioactive material into the atmosphere. This was
a catastrophe, causing the death of 32 people and 260 cases of cancer from radiation. As a
consequence the Nuclear Installations Act (NIA) 1965 was introduced, with the aim of regulating the
installation of all commercial nuclear reactors in the UK. As part of this Act the Nuclear
Installations Inspectorate (NII) was founded to control all nuclear reactors in the UK. Only after a
special report had been submitted, justifying the safety of the design, construction and operation of the
plant, did the NII agree to an operating license. This report could be seen as the first Safety Case.
There are other examples where such safety regulations were developed, such as: CIMAH
(“Control of Industrial Major Accident Hazards”, 1984) as a consequence of the Flixborough
accident in 1974, the Health and Safety at Work Act (HSWA) 1974, the Ionising Radiations
Regulations 1985, the Radioactive Substances Act 1993 and many more.
Safety was, however, not ignored before these regulations were introduced; there were
standards and safety thinking. But this new understanding of safety is more thorough and better
documented, as will be shown in the following chapters. It offers the possibility of putting the
responsibility for safety on the customer or regulator.

E.2.2 Prescriptive and Goal-based approach

The main question in this chapter is whether standards guarantee safety at an adequate level.
Safety standards advise on processes and practices and on the level of safety to be reached. In the
past they regulated what must be done and what must not be done. This thinking has changed
towards the goal-based approach. The problem with prescriptive standards is that in times of fast
technical development, safety changes fast as well. A more flexible approach is needed which allows
more innovation. The approach should allow the Safety Engineer to demonstrate an adequate
level of safety with evidence that fits the requirements individually set for the specific product. In a
prescriptive approach this is difficult.
Prescriptive: In a prescriptive procedure, the actions that must and must not be carried out are
prescribed and adhered to. The question is whether the documents that are provided are
sufficient and necessary. This depends on the system and, as stated above, system
requirements change as technology changes.
Goal-based: In contrast to the prescriptive approach, goal-based development only provides
the claim of what should be achieved. This approach requires a lot of experience.
The difference is that in this approach it is necessary to set up the requirements
as needed by a specific argumentation, rather than having them specified by a standard.


The two processes are as follows:

Figure E.1: Difference between prescriptive and goal-based approach
(The figure shows two parallel flows. Prescriptive: identify the standard and its requirements →
satisfy the requirements → show that the requirements are met. Goal-based: set up the
requirements → satisfy the requirements → show that the requirements are met.)


Figure E.1 illustrates the two processes. As shown, the main difference lies in setting up the
requirements: in the prescriptive approach the requirements are already provided. In the past this
was a good solution, but with fast-evolving technology the goal-based approach has become
more and more widely used.
Example: Requirements that ensure correct functioning of a car:
Prescriptive: A checklist is provided which advises the car owner what has to be checked to
guarantee that the car runs correctly and performs all its intended functions.
Goal-based: The owner himself checks the car so that wrong behaviour can be avoided, but
this requires a lot of knowledge about the car. A goal could be “brakes are working
sufficiently”, but its interpretation depends on the individual.

On the one hand the goal-based approach has advantages such as much easier innovation; on the
other hand the level of trustworthiness can suffer. Use of prescriptive standards can reduce the
onus on the supplier to achieve ongoing risk reduction through life. Prescriptive safety
regulations sometimes lead to a reduced sense of ownership of safety, with safety then being
understood as a “tick in the box” (a naïve view, but sometimes seen).
As well as the design of such a structure, the assessment of goal-based justifications is more
involved: the assessor has to understand the structure first before thinking about its quality. This
underlines that the basics (argument structures) presented in the next chapter are absolutely
necessary for this type of safety certification.

E.2.3 Definition of the Safety Case

The first point of interest in this Appendix is the definition of a Safety Case. The main problem is
that a Safety Case is more than just a number of papers written with regard to safety. To stay
close to its fundamentals, the most common existing definitions will be compared.


E.2.3.1 Different or equivalent definitions

As Kelly illustrates, a Safety Case should “communicate a clear, comprehensive and defensible
argument that a system is acceptably safe to operate in a given context” [2].

Acceptably safe: Means that there is a tolerable risk remaining. The Safety Case should explain
that the underlying system is safe enough to operate. What counts as safe enough
depends on the industry concerned, on society and on judicial considerations.
Context: Context-free safety is impossible. It should be declared exactly which
application in which environment is being discussed.
Clear: Is about being understandable and having a good structure.
Comprehensive: Is about being acceptably complete, but also contributes to clarity
by ensuring that the full argument is present.

The Safety Case is thus more than just documentation: it collects all information
associated with safety, gives an overview and builds up the connections between the items of
information. Clear comprehension is therefore essential.
Another quite similar definition is given by Bishop and Bloomfield (Participants in the SHIP
Project): “A Safety Case is a documented body of evidence that provides a convincing and valid
argument that a system is adequately safe for a given application in a given environment” [3].
“Clear” and “comprehensive” can be taken to be implied by the documented body.
“Convincing and valid” means almost the same as “defensible”.
“Acceptably” is similar to “adequately”.
MoD Def Stan 00-56 [4]: “The Safety Case is the primary means of demonstrating safety. It should
show, from an early stage in the acquisition process, how safety will be achieved. Thus the Safety
Case will initially identify the means by which safety will be achieved and demonstrated; at later
stages detailed arguments and supporting evidence will be developed and refined. Once the
system is operational, the Safety Case will demonstrate how safety will be maintained.”
The first definition from Tim Kelly is preferred in this appendix, but the three definitions have been
shown to be sufficiently similar as to be considered equivalent.


E.3 Scope and context

In safety-related systems, where failures can lead to catastrophic or at least dangerous
consequences, Safety Cases are used to demonstrate that a system is acceptably safe. As
stated in the introduction, Safety Cases are already in use for nuclear power plants, aircraft
and railways. Some relevant standards are listed below, and the content of a selection is examined in
some detail in Appendix A.

E.3.1 Standards

Some might think safety is guaranteed by standards. Unfortunately this is not enough. Safety
standards advise on processes and practices but not on the level of safety to be reached. Furthermore,
standards are given for a specific type of product but in most cases not for a specific design. Most
standards use levels such as Safety Integrity Level or Development Assurance Level (see
Chapter E.9.2).
Some of the relevant standards are mentioned below:
ARP 4754: Certification Considerations for Highly-Integrated or Complex Aircraft Systems
was written by Systems Integration Requirements Task Group AS-1C, ASD SAE,
on 10 April 1996. This document discusses the certification aspects of
highly-integrated or complex systems installed on aircraft, taking into account the
overall aircraft operating environment and functions. [28]
CAP 670: Air Traffic Services Safety Requirements, published by the Safety Regulation Group
of the Civil Aviation Authority in 2005. This standard can be downloaded from the
homepage: www.caa.co.uk. “CAP 670 Air Traffic Services Safety Requirements
describes the manner in which approval is granted, the means by which Air
Traffic Service (ATS) providers can gain approval and the ongoing process
through which approval is maintained.” [37]
Def Stan 00-42: RELIABILITY AND MAINTAINABILITY ASSURANCE GUIDES was created by
the UK Ministry of Defence. This Defence Standard provides guidance on
accommodating Ministry of Defence (MOD) Reliability and Maintainability (R&M)
practices, procedures and requirements in the design process. This standard
was one of the first to introduce the concept of a Reliability Case, with a similar
structure of argument. [38]
Def Stan 00-54: REQUIREMENTS FOR SAFETY RELATED ELECTRONIC HARDWARE IN
DEFENCE EQUIPMENT. This Part of the Interim Standard describes the
requirements for procedures and technical practices for the acquisition of Safety
Related Electronic Hardware (SREH). Compliant procedures and practices shall
be required by all MOD Authorities involved in the original procurement of SREH,
whether COTS, reused or application specific, and during maintenance and
replacement. [30]
Def Stan 00-55: REQUIREMENTS FOR SAFETY RELATED SOFTWARE IN DEFENCE
EQUIPMENT, since August 1997. It summarizes its contents as follows: “The first
Part of the Standard describes the requirements for procedures and technical
practices for the development of Safety Related Software (SRS). The second
Part of the Standard contains guidance on the requirements contained in Part 1.
This guidance serves two functions: it elaborates on the requirements in order to
make conformance easier to achieve and assess; and it provides technical
background.” [31]


Def Stan 00-56: SAFETY MANAGEMENT REQUIREMENTS FOR DEFENCE SYSTEMS was created in December 1996. The first Part of the Defence Standard describes the requirements for safety management, including hazard analysis and safety assessment. The second Part provides information and guidance on the first Part. [39]
Def Stan 00-58: HAZOP STUDIES ON SYSTEMS CONTAINING PROGRAMMABLE ELECTRONICS. This standard provides requirements for processes and technical practices for Hazard and Operability Studies (HAZOP Studies). [40]
IEC 61508: In principle this standard provides an approach for achieving functional safety. It is published in seven parts, of which only the first four contain normative requirements:
Part 1: General requirements
Part 2: Requirements for E/E/PE safety-related systems
Part 3: Software requirements
Part 4: Definitions and abbreviations
Part 5: Examples of methods for the determination of safety integrity levels
Part 6: Guidelines on the application of IEC 61508-2 and IEC 61508-3
Part 7: Overview of techniques and measures [35]
ISO 9001: QUALITY SYSTEMS - MODEL FOR QUALITY ASSURANCE IN DESIGN/DEVELOPMENT, PRODUCTION, INSTALLATION AND SERVICING.
ISO 16949: QUALITY MANAGEMENT SYSTEMS - Particular requirements for the application of ISO 9001:2000 for automotive production and relevant service part organizations. International Organization for Standardization, 2002.
MIL-STD-882D: STANDARD PRACTICE FOR SYSTEM SAFETY was written by the US Department of Defense in February 2000. A key objective of the DoD system safety approach is to include mishap risk management (consistent with mission requirements), by design, in technology development for DoD systems, subsystems, equipment, facilities, and their interfaces and operation. The DoD goal is zero mishaps. [36]
Yellow Book: Engineering Safety Management was published by Railtrack PLC in January 2000 and is distributed via its homepage: www.yellowbook-rail.org.uk. It was written to help people involved in changes to the railway (e.g. new trains) ensure that these changes contribute to improved safety.
Some of the relevant standards are categorized in the following table:

Standard / approach                                               | Standards and Guidelines | Goal-Based Approach | Note/Comment
IEC 61508                                                         | X                        |                     | Requires “Assessment of the functional safety” to show compliance with itself
ARP 4754                                                          | X                        |                     |
MISRA Guidelines                                                  | X                        | X                   | The MISRA Guidelines fall somewhere between the two: there is less prescriptive emphasis than in 61508, but it is not fully a goal-based approach
Def-Stan 00-55 issue 3 “Requirements for safety-related software” |                          | X                   | Was influenced by the work of Adelard and Kelly
Def-Stan 00-56 “Safety Management Requirements”                   |                          | X                   |
ASCAD                                                             |                          | X                   |

Table E.1 Safety Case classification

E.3.2 Failure Handling

The overall aim is:


1) Find and identify possible faults, and
2) Provide evidence that each fault is one of the following:
a) Not recurring
b) Occurring acceptably rarely
c) Having acceptably low effects
Techniques to cope with faults are:
• Fault prevention: prevent the occurrence of faults
• Fault tolerance: deliver service even if faults are present
• Fault removal: reduce the frequency of faults
• Fault forecasting: estimate the future occurrence of faults

E.3.3 Why Use Safety Cases for Integrated Safety Systems?

It is often held that other documents (e.g. the safety plan) are sufficient to demonstrate an adequate level of safety. Here the arguments for using a Safety Case instead are considered.
• Provide a demonstration that an adequate level of safety is reached (Identify and justify
unsolved hazards, ensure that risks are acceptably low and present or reference evidence)
• Explain how safety is maintained throughout the lifetime of the system
• Minimize licensing risk (ability to demonstrate adequate safety to the regulators and
assessors)
• Minimize commercial risk (ensuring maintenance and implementation costs are acceptable)
In conclusion of the previous four advantages: a Safety Case helps the regulator and the customer to understand the risks and costs the product will carry, and to see that these risks have been effectively managed.
The main components fall into the following three categories. Understanding of these elements will improve over the course of this report; once the graphical notations are introduced in later chapters, the full significance of the three components becomes easy to grasp:
Requirements: Point of discussion
Arguments: Explanation and relationship between requirements and evidence
Evidence: Information that supports the claim that the safety requirements and objectives are met

Figure E.2: Role of the Safety Argument [1] (diagram: the Safety Argument links the Safety Requirements and Objectives to the Safety Evidence)


The Argument describes the “route” from the goal to the evidence, or the other way round, depending on the preferred point of view. These elements will be explained on the following pages.

E.3.4 Arguments

An argument is the act of inferring a conclusion from premised propositions. A conclusion should always be either TRUE or FALSE.
An argument is considered valid if the conclusion can be logically derived from its premises; otherwise it is considered invalid.
An argument is considered sound if it is valid and all of its premises are true.
It should be mentioned that the argument has an important but often neglected role: it goes hand-in-hand with the evidence. Evidence without an argument is unexplained (for example, thousands of test results without a link to the objective), and an argument without evidence is unfounded.

E.3.4.1 Argument design

The overall objective of an argument is to link the evidence to the claim.


As described by Bloomfield and Bishop [3], the arguments should be of one of the following types:
Deterministic: an analytical application of predetermined rules to derive a true/false claim (example: execution time). These arguments support a claim by showing that, for logical reasons, certain hazards are not believed to occur in the real world.
Probabilistic: quantitative statistical reasoning to establish a numerical level (for example MTTF “mean time to failure” or MTTR “mean time to repair”).
Qualitative: compliance with rules that have an indirect link to the desired attributes (example: staff skills and experience). These arguments are more difficult to evaluate; the ratings might be assigned by expert judgment.
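The probabilistic argument type can be illustrated with a small numeric sketch. The operating hours, failure count and repair time below are invented for the example, and the MTTF point estimate shown is the simplest possible one:

```python
# Sketch of probabilistic evidence for a safety argument (illustrative
# numbers only): estimate MTTF from observed operating hours and the
# failure count, then derive steady-state availability from MTTF and MTTR.

def estimate_mttf(operating_hours: float, failures: int) -> float:
    """Point estimate of mean time to failure, in hours per failure."""
    if failures == 0:
        raise ValueError("no failures observed; use an interval estimate instead")
    return operating_hours / failures

def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    return mttf / (mttf + mttr)

fleet_mttf = estimate_mttf(operating_hours=500_000.0, failures=4)  # 125000.0 h
print(f"MTTF = {fleet_mttf:.0f} h, availability = {availability(fleet_mttf, 8.0):.6f}")
```

Such a figure only becomes evidence once it is linked to the claim by an explicit argument, as stressed above.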

E.3.4.2 Structure of Arguments - philosophical view

Designing and evaluating arguments requires a clear and demonstrable understanding of their elements and structure. Govier [5] uses a graphical notation:

Single support: one premise supports the conclusion.
Linked support: several premises together support the conclusion.
Convergent support pattern: several premises independently support the conclusion.

An Argument is said to be hybrid if it does not fit any of the three structures described above. Within the convergent support pattern, further differentiation is achieved by examining the nature of the independence. There are two kinds: conceptual independence (different underlying theories) and mechanistic independence (different approaches but the same theory). This becomes more relevant when the Safety Case is analyzed in more detail.
As Kelly observes in his doctoral thesis [1]: “Extra structure such as this makes the process of
constructing a safety justification more predictable and manageable, e.g. so that the forms of
premise required to justify a particular conclusion are known.”
It is recommended that the three Argument structures above are understood and kept in mind
while creating a Safety Case, because they help to build up sound and strong Arguments.
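As a sketch (not part of the deliverable), Govier's support patterns can be captured in a tiny data model: premises within one group are linked (only jointly sufficient), while separate groups provide convergent (independent) support; anything else is hybrid. All claim and premise texts below are invented:

```python
# Minimal model of Govier's support patterns: a conclusion is supported by
# one or more groups of premises. One group with one premise is "single";
# one group with several premises is "linked"; several one-premise groups
# are "convergent"; any other mix is "hybrid".

from dataclasses import dataclass, field

@dataclass
class Conclusion:
    text: str
    support_groups: list[list[str]] = field(default_factory=list)

    def pattern(self) -> str:
        groups = self.support_groups
        if len(groups) == 1 and len(groups[0]) == 1:
            return "single"
        if len(groups) == 1:
            return "linked"
        if groups and all(len(g) == 1 for g in groups):
            return "convergent"
        return "hybrid"

c = Conclusion("Braking function is acceptably safe",
               support_groups=[["FMEA shows no single-point failure"],
                               ["Field failure data", "Statistical model"]])
print(c.pattern())  # hybrid: one independent premise plus one linked pair
```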
Toulmin’s scheme [6] asserts that arguments consist of grounds, claims, warrants and backing:
• Claims are statements the truth of which needs to be confirmed.
• The justification for the claim is based on grounds “specific facts about a precise situation
that clarify and make good the claim”.
• The general argumentation from the facts to the claim is called the warrant.
• As a basis for the warrant, Toulmin introduces the backing, which includes the validation of the scientific and engineering laws used.

Figure E.3: Toulmin's scheme [6] (diagram: grounds support claims via a warrant; the warrant rests on a backing)

E.3.5 Evidence

At this stage it is only roughly explained what evidence is; this part is purely descriptive. A more detailed catalogue will be given at a later stage (see chapter E.7.6, Evidences).
Evidence at different stages can be:
• Sub-claims (separately explained with sub arguments)
• Descriptions (of the actual design e.g. Block diagrams)
• Facts (test results, Analysis Reviews, formal methods, etc.)
• Assumptions (sometimes necessary, even though not always verifiable)
To gather evidence, Bloomfield and Bishop [3] suggest taking a closer look at the following points:
• design
• development processes
• simulated experiences (testing; this can be divided in several sub-tests)
• prior field experience (analyzing the past)

E.3.6 Claims as defined by Adelard

Within [7] there is only one top goal; in most cases this is “{System X} is safe enough to operate”. A more detailed look at this statement, and a closer look at Arguments, will be given later. The Safety Case is usually divided into several sub-goals, but these sub-goals are part of the Argument itself. The sub-goals are typically of the following kinds:
• Reliability and availability
• Security (from external attack)
• Functional correctness
• Time response
• Maintainability
• Usability and Accuracy
• Robustness to overload
• Modifiability

E.3.7 The ALARP principle

The prior objective was to show that the system is “safe enough”. For this reason the ALARP (“As Low As Reasonably Practicable”) principle is one of the most important foundations of the Safety Case. For further information the reader is referred to Appendix D. It is a mixture of philosophical and judicial ideas. It distinguishes whether the risk is:
a) Too big, and must not be tolerated
b) Small enough to be neglected, or
c) Between a) and b), and therefore to be decreased as much as practicable
The ALARP principle says that the risk should be minimized, or at least brought into a practicable region. The responsible individual or organization tries to show that the risk is tolerated by society and that further reduction of the risk would be out of proportion to its cost (a “financial trade-off between cost and level of risk” [8]). Later in this appendix, contextual information based on this principle will be added. Furthermore, the so-called ALARP Pattern can be applied (refer to chapter E.8.2, Reusing Arguments by applying Patterns).
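The three-way triage above can be sketched as a small classifier. The numeric thresholds here are invented for illustration; real tolerability limits come from the applicable standard or regulator, not from this code:

```python
# Illustrative ALARP triage: classify a risk level into the three regions
# named in the text. The default limits are made-up example values.

def alarp_region(risk_per_year: float,
                 upper_limit: float = 1e-3,
                 lower_limit: float = 1e-6) -> str:
    """Classify a yearly risk figure into one of the three ALARP regions."""
    if risk_per_year > upper_limit:
        return "intolerable"          # (a) must not be tolerated
    if risk_per_year < lower_limit:
        return "broadly acceptable"   # (b) small enough to be neglected
    return "alarp"                    # (c) reduce as far as reasonably practicable

print(alarp_region(1e-4))  # alarp
```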

E.3.8 Links towards WT3.1 subtasks and WP4

The methods, analysis techniques and frameworks described in subtasks A to D of WT3.1 are used as arguments within the Safety Case. The validation and verification results for the requirements of Appendix E are used as Evidence. The EEP described by WP4 is used for the argumentation about the development process. Results of WT3.1 will be found in the Product branch of the Safety Case (E.6.3.2) and in the Process branch (E.6.3.1); results of WP4 will be found in the Process branch (E.6.3.1).
E.4 Safety life cycle

A Safety Case can be developed at different stages. Historically it was not: the design was created first and the Safety Case was produced only at the end. Safety Case developers and practitioners have since found it better to create it incrementally, as described in the next paragraph.

Figure E.4: Historical View [1] (diagram elements: Hazard Identification & Risk Estimation, Preliminary Safety Assessment, Confirmatory Analysis, Construction and Development Codes, and Test and Inspection, culminating in Production of the Safety Case)

E.4.1 Safety Case life cycle

First it should be mentioned that the Safety Case always has to be started at the earliest phase
possible.
The Defence Standard 00-56 [4] says: “The Safety Case should be initiated at the earliest possible
stage in the Safety Program so that hazards are identified and dealt with while the opportunities for
their exclusion exist”. The idea is to start the so-called preliminary Safety Case when development of the system starts. This may make it possible to handle some of the hazards before the planning stage is completed. If the product design is already finished, or production has even started, it is more difficult and expensive to rectify an aspect of the product likely to cause or lead to a hazard. Another problem that arises if the safety analysis is started late is that the Arguments can be less reliable, because the design can no longer be influenced by safety decisions.
This leads to an evolving Safety Case. It can be compared to the Design Lifecycle as shown in the
following graph:
Figure E.5: Integrated View of Safety Case Development [2] (diagram: a design lifecycle from Requirements through Design and planning, Implementation, and Integration and test to the Complete System, with the Preliminary Safety Case at the start, the Interim Safety Case in the middle and the Final Safety Case at the end)
The Preliminary Safety Case is started as soon as a relatively stable and controlled system requirements specification is available. After this stage, discussions with the customer about possible safety issues (hazards) can commence. The second phase is the Interim Safety Case, which follows the first system design and tests. The third part is the Operational Safety Case, which is produced prior to in-service use. All in all, it is one incrementally developed Safety Case rather than several completely new ones.

E.4.1.1 Preliminary Safety Case

Before discussing what a Preliminary Safety Case is, the Safety Plan is explained. It consists of the following points:
• Safety Process Definition, Tasks, Schedule, Resources, Work Products
• Roles, Responsibilities
• Staff Competencies, Skills and Experience Matrix
• Reporting Arrangements
• Contractual Agreements
• Dispute Resolution Provision
The Safety Plan is created on the basis of the safety capabilities which are previously developed
and the particular requirements of the project. Resources allocated to safety tasks and risk
mitigation work would be based on past experience of the organization. Progress of safety work on
the project would be monitored against these provisions.
Before the Preliminary Safety Case can be started, the following has to be accomplished:
• Design key safety processes, roles and responsibilities
• Identify safety properties
• Preliminary Hazard Analysis (systematic review of system design concept) and Risk
Estimation (check the severity level and likelihood of detected hazards)
As a result of this the Preliminary Safety Case should include the following:
• System contents and top safety requirements
• Important standards or points of principle concern for the final Safety Case
• Main safety concerns (results of Risk Estimation and Hazard Analysis)
• Safety Analysis or tests
• Development description (Change Management, coding standards, etc.)
• Initial evidence, and an explanation of how it will be ensured that the system is safe
So in fact the Safety Plan already includes most of the information that is needed to create the
Preliminary Safety Case.

E.4.1.2 Safety Case life cycle

As described in the YELLOW BOOK Volume 2 (page 1-4, figure 1-1) [9] on railway safety, the Safety Case evolves through a course of several safety activities. A safety authority is required to endorse some of these activities. Typically the process starts with the preliminary safety plan and goes on with the Hazard Log (see footnote 1) and Hazard Identification (see footnote 2). Risk Assessment is the next point, and then setting

1 Hazard Log: Lists all potential hazards of the project or product in its environment.
up safety requirements. The next step is the preparation, endorsement and implementation of a safety plan (as explained above). An independent safety assessor should carry out a safety assessment (i.e. an evaluation of the safety or risks) and produce a report. The final action is the preparation of the final Safety Case and, with it, the Safety Approval. The three phases of the Safety Case merge into one another, so the borders are fluid. It should be mentioned that the Safety Case has to be continuously updated and maintained.
Figure E.6: Safety life cycle (timeline, endorsed by the Safety Authority: (1) Preliminary Safety Plan, (2) Hazard Log, (3) Hazard Identification, (4) Hazard Identification Analysis, (5) Risk Assessment, (6) Set up Safety Requirements, (7) Prepare Safety Plan, (8) Implement Safety Plan, (9) Safety Assessment, (10) Safety Assessment Report, (11) Prepare Final Safety Case, (12) Safety Approval; the Preliminary, Interim and Operational Safety Case phases span this timeline and merge into one another)


The following numbered items refer to the numbers in Figure E.6:
1) The Preliminary Safety Plan should include all activities known from the outset that have to be
done to make the system acceptably safe.
2) In the Hazard Log all likely Hazards are written down, with any results recognized in the Safety
Analysis, or at later stages if they are recognized later.
3) Identifies Hazards more efficiently because of special analysis techniques (more detail out of
scope at this stage).
4) Analyses the previously identified hazards (with input from step (3)).
5) Evaluates the hazards found by the identification and analysis steps.
6) Works out requirements which have to be met before a system can be safe.
7) Prepare all activities that have to be carried out to bring the risk to an acceptably low level.
8) Describe and implement activities that have to be carried out to bring the risk to an acceptably
low level.
9) Safety Assessment provides an independent authoritative opinion on whether system safety
requirements are met or not.
10) A report of the above assessment is produced.
11) The final Safety Case is what can be seen as the “real” Safety Case and represents evidence
that all requirements are met and that all risks are down to an acceptably low level.
12) Safety Approval can be given by a Safety Authority (if present) after endorsing the Safety
Case. This can be a kind of Safety Certificate.
Note for 12: The existence of a Safety Authority depends on the industry concerned, and such an authority may not exist. In the automotive industry the relevant party can be the OEM for the supplier, or a certification body (e.g. TUEV) for supplier or OEM. Other industries have their own controlling bodies.

2 Hazard Identification: Identify hazards through a systematic hazard analysis process encompassing detailed analysis of system
hardware and software, the environment (in which the system will exist), and the intended use. Consider and use historical hazard and
mishap data, including lessons learned from other systems.
E.4.2 Safety Case (status) report

As defined in Def Stan 00-56 [4], a Safety Case Report is a summary of the Safety Case which is produced at specific stages, either periodically or after particular achievements, to ensure the Safety Case is being done properly. It also provides insight into all safety management activities. It is like a snapshot taken for a particular reason, e.g. to provide evidence that the requirements of a standard are met.
Such reports can be planned at specific times during development; defining these points can be seen as defining milestones.

Figure E.7: Safety Case report (diagram: a numbered Safety Case report, e.g. report nr. 0815, as a summary snapshot of the Safety Case)

E.4.2.1 Reason for an extra report

The safety requirements are an integral part of the Safety Case report as well as the hazard log. In
fact the Safety Case report is a summarizing documentation of the Safety Case which itself is a
“documented body of evidence… accessible at different levels of detail”. For development, communication and review purposes it is necessary to create a report. The safety lifecycle and the development of a Safety Case should be tightly linked, and the selection of testing and evidence for the software should depend upon the Safety Case. In JSP 318B [10] the Safety Case
evolves during the whole safety life cycle. Figure E.8 also explicitly shows the differentiation
between the evolving Safety Case and the final Safety Case report that is produced.
Figure E.8: Role of the Safety Case report JSP 318B [10]
The safety argument is often poorly communicated through the textual narrative of Safety Case
reports. The Goal Structuring Notation (GSN) and the ASCAD, presented within the next chapter,
have been developed to provide a clear, structured, approach to developing and presenting safety
arguments.

E.4.2.2 Contents of the report

As already described the Safety Case report is “a report that summarizes the arguments and
evidences of the Safety Case, and documents progress against the safety program”. This function
implies the following contents:
• Executive Summary
A summary of the whole Safety Case is necessary; all important facts and Argument steps can be included here to give an overview.
• Summary of System Description
This ensures that the report refers to the same underlying system.
• Assumptions
All assumptions should be summarized so that the reader knows under which conditions
the safety should be guaranteed (e.g. numbers of personnel, training levels, time in service,
operating environment etc.).
• Progress against the Program
The current status has to be described as well so that the reader knows how much
progress has been made.
• Meeting Safety Requirements


The following should be included:
• A statement describing the principal agreed Safety Requirements.
• A summary of the arguments and evidence that demonstrate how the Safety Requirements will be met, as well as how they are already met.
• Emergency and Contingency Arrangements
Which emergency measures are thought of and what to do if a failure occurs
• Operational Information
E.g. description of the operational envelopes
• Conclusions and Recommendations
Overall assessment of the safety of the system, any recommendations to enable issues
identified within the report to be resolved.
• References
• Appendices
These may include: Hazard Log, Diagrams of the Safety Case Claim and Argument
structure (e.g. Goal Structured Notation), Calculations, Analyses, List of Hazardous
Materials, List of lifting and manual handling Hazards, Safety certificates
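As a sketch, the section list above can be turned into a simple completeness check for a draft report. The section names follow the text, while the draft report itself is invented:

```python
# Check a draft Safety Case report against the required section list.

REQUIRED_SECTIONS = [
    "Executive Summary", "Summary of System Description", "Assumptions",
    "Progress against the Program", "Meeting Safety Requirements",
    "Emergency and Contingency Arrangements", "Operational Information",
    "Conclusions and Recommendations", "References", "Appendices",
]

def missing_sections(report_sections: set[str]) -> list[str]:
    """Return the required sections absent from a draft, in standard order."""
    return [s for s in REQUIRED_SECTIONS if s not in report_sections]

draft = {"Executive Summary", "Assumptions", "References"}
print(missing_sections(draft))
```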

E.4.3 Safety Case maintenance

As William S. Greenwell, Elisabeth A. Strunk, and John C. Knight noted in their work [11] faults
often reveal themselves shortly after deployment. This implies a clear relation between System
Development and Failure Analysis. It is intuitive to prevent failures before they can occur. To do so, a preliminary failure analysis should be carried out to discover potential failures (accidents etc.) at an early stage, and the results used to prevent them from being built into the system. The earlier they are known the better, in terms of cost, product and company image, easier defect removal, etc.
Three main fault categories are defined [12]:
1) Random failures that are within the scope of the system’s safety requirements;
2) Attempts to operate the system outside of its intended environment; and
3) Failures resulting from defects that compromise the system’s ability to meet its safety
requirements
Several characteristics of field fault analysis make it hard to identify the causes of a fault:
1) Complexity of the system
2) Informality of the failure analysis process
3) Differences in designs and in development practices
Failures that lead to hazards can be an indication of an insufficient level of safety. Two Safety Cases can be defined: the Safety Case before (Original) and after (Revised) the failure. The post-failure Safety Case is essentially the result of correcting the original, flawed safety argument.
Figure E.9: The Enhanced Safety-Case Lifecycle (diagram: an error observed in operation triggers fault/failure analysis against the original Safety Case and its evidence; the resulting lessons and recommendations drive system and process revision and a revised Safety Case)


Two possible techniques for Safety Case maintenance are backtracking and dependency analysis.
The first assumption is that the top level goal was not reached (e.g. system is acceptably safe).
From this starting point on incomplete or inadequate assumptions and evidence are found by
“backtracking” through the argument. The other possibility is the execution of a dependency
analysis. Recommendations in the enhanced safety-case lifecycle are suggestions made by an investigating team, intended to help system engineers create a valid post-failure Safety Case.
The first recommendation is that of a revised piece of the safety argument. System engineers
would be expected to construct a corresponding revised system design that would allow the use of
the argument fragment in the system’s updated Safety Case. The second form a recommendation
might take is that of a possible system change, always accompanied by a corresponding
postulated change to the pre-failure Safety Case. If the system engineer chooses to implement
recommended system changes, he can use the postulated change to guide the development of the
actual post-failure Safety Case.
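The backtracking idea can be sketched as a traversal of the argument structure: starting from the violated top-level goal, walk down to every leaf (piece of evidence) that supported it, since each is a candidate for re-examination. The argument graph below is invented for illustration:

```python
# Walk an argument graph (claim -> supporting sub-claims or evidence) from
# a violated goal and collect every leaf evidence item to re-examine.

def suspect_evidence(graph: dict[str, list[str]], violated_goal: str) -> set[str]:
    suspects, stack = set(), [violated_goal]
    while stack:
        node = stack.pop()
        children = graph.get(node)
        if children is None:          # leaf node: a piece of evidence
            suspects.add(node)
        else:
            stack.extend(children)
    return suspects

argument = {
    "System is acceptably safe": ["Hazard H1 mitigated", "Hazard H2 mitigated"],
    "Hazard H1 mitigated": ["FTA report", "Unit test results"],
    "Hazard H2 mitigated": ["Field trial data"],
}
print(sorted(suspect_evidence(argument, "System is acceptably safe")))
# -> ['FTA report', 'Field trial data', 'Unit test results']
```

A dependency analysis would instead start from a changed or discredited evidence item and traverse upward to find every affected claim.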
E.5 Notations

This section describes why it is a great advantage to use graphical notations for the explanation of
arguments in the Safety Case, over the use of plain text. After presenting arguments in favour of
the usefulness of notations the two most common ones will be introduced and explained.

E.5.1 Why graphical methods?

The details required by most standards, and the complexity of modern systems, make Safety Cases grow very fast, so they quickly reach a size where clarity is lost and the documents become very difficult to read. Linear documents (e.g. Figure E.10: Free text format) are difficult to survey because the structure of the arguments is hard to find. Furthermore, often a single person holds all the information and arguments together, and only this person knows what he or she refers to.
While creating the Safety Case many people work together, typically leading to many different and conflicting views and sources.
Referring to Bloomfield and Bishop's work [3], the main problems during construction of Safety Cases without graphical and other tool support are the following:
Size and complexity
As can be imagined, the Safety Case grows very quickly to a huge size. In “The contents of
the Safety Case”, chapter E.7, it is described what contents a Safety Case should include.
Complexity is the other side of the same problem. Evidence is always technical and very
hard to understand. Without graphical support it is almost impossible to understand why the
system will be safe.
Coordinating and presenting results from many different sources
The Safety Case consists of many analyses and test results from many different
independent people. Were they from one person the trustworthiness of the evidences
would be lower, but this diversity of sources compounds the problem of collation and
presentation.
Use of Free-format Text
Using a free text format makes it more difficult to see the key point. Furthermore, the English in many free-text Safety Cases is poor. Figure E.10: Free text format, taken from Kelly's [13] paper, provides an example of this:
For hazards associated with warnings,
the assumptions of [7] Section 3.4
associated with the requirement to
present a warning when no equipment
failure has occurred are carried
forward. In particular, with respect
to hazard 17 in section 5.7 [4] that
for test operation, operating limits
will need to be introduced to protect
against the hazard, whilst further
data is gathered to determine the
extent of the problem.

Figure E.10: Free text format


The idea presented here is to support the Safety Case by graphical methods.
The two most established graphical Notations are ASCAD and GSN (Goal Structuring Notation).
Both are related to the Toulmin Concept (compare with chapter E.3.4.2, Structure of Arguments -
philosophical view). First ASCAD will be described, then the GSN, and finally they will be
compared.
E.5.2 ASCAD

The ASCAD (Adelard Safety Case Development) notation was developed by Adelard LLP, a company dealing with software and safety in general. ASCAD is a complete Safety Case development strategy, based on an Evidence-Argument-Claim structure.

E.5.2.1 SHIP or Adelard Safety Case Approach

SHIP (Safety of Hazardous Industrial Processes in the Presence of Design Faults) was a project whose overall objective was to devise a means of assessing, ideally numerically, the achieved reliability or safety of a system in the presence of design faults.
The SHIP model of the Safety Case defines the elements:
• Claims about properties of the system or subsystem
• Evidence used as the basis of the safety argument
• Argument that links the evidence to the claims via a series of inference rules
• Inference rules that provide the transformational rules
Three types of argument are distinguished:
• Deterministic – relying upon axioms, logic and proof
• Probabilistic – relying upon probabilities and statistical analysis
• Qualitative – relying upon adherence to standards, design codes etc.
With this definition graphics like the following can be developed to describe the Safety Case:

Figure E.11: Argument Structure (diagram: several pieces of Evidence, in places a choice of one of several alternatives, together result in Conclusions; the Conclusions form the argument structure and together result in the Claim)

E.5.2.2 ASCAD notation

Adelard developed a notation for linking the Evidence to the Claim. The notation has two common names: ASCAD (Adelard Safety Case Development) and CAE (Claim Argument Evidence). In the following, the name ASCAD is used because it is the official one according to the homepage of Adelard (www.adelard.com), its developer. Claims are represented by blue circles, evidence by purple squares, and Arguments by green squares with rounded corners (as illustrated in Figure E.12).
Figure E.12: ASCAD Notation


Using these symbols the whole Safety Case can be described. Every relation is represented by an arrow: the Evidence leads to the Arguments, and the Arguments lead to the Claim. The structure can branch extensively, and normally there are many sub-claims. As an example of the notation, a small graph not taken from a real Safety Case is sufficient, as shown in Figure E.13: Example network in ASCAD notation.

Figure E.13: Example network in ASCAD notation


As illustrated above, the elements are linked by arrows which symbolize their relations to each other (“Is evidence for”, “Supports”, “Is a sub claim of”).
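As a sketch, such an ASCAD network can be represented with typed nodes and labelled relations of the kind just mentioned; the node texts and the network itself are invented:

```python
# Tiny model of an ASCAD/CAE network: typed nodes connected by labelled
# relations ("is evidence for", "supports", "is a sub claim of").

from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    kind: str   # "claim", "argument" or "evidence"
    text: str

Link = tuple[Node, str, Node]   # (source, relation, target)

claim    = Node("claim",    "Airbag controller is acceptably safe")
argument = Node("argument", "All identified hazards are mitigated")
evidence = Node("evidence", "FMEA report v1.2")

network: list[Link] = [
    (evidence, "is evidence for", argument),
    (argument, "supports", claim),
]

for src, rel, dst in network:
    print(f"{src.kind}: {src.text!r} --{rel}--> {dst.kind}: {dst.text!r}")
```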

E.5.2.3 Elements and phrasing used in ASCAD

The phrasing in the ASCAD notation is relatively free; there is no prescribed form. Existing ASCAD networks use the following word structures:
Claims: statements that enforce a TRUE/FALSE conclusion, stated in <noun> <predicate> phrases.
Arguments: stated in <noun> <predicate> phrases. This can be a sentence starting with “Argument because…”, but it is also possible to formulate any sentence that explains the connection between the upper claim and the element below.
Evidence Evidence is normally stated as a noun phrase; only the name or description of the
document itself is given.
Other Other statements can be of any kind and give only additional information that does
not itself constitute an argument. Justifications, comments, lists of staff, etc. can be
stated here.
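To make the element types concrete, the claim/argument/evidence structure described above can be sketched as a small data model. The following fragment is purely illustrative: the node type, field names and the example network are our own invention, not part of the ASCAD definition. It shows how evidence propagates up to a claim through arguments.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str          # "claim", "argument", "evidence" or "other"
    text: str          # free-form phrasing, as described above
    children: list = field(default_factory=list)  # elements supporting this node

    def evidence(self):
        """Collect all evidence items that ultimately support this node."""
        found = []
        for child in self.children:
            if child.kind == "evidence":
                found.append(child.text)
            found.extend(child.evidence())
        return found


# A small hypothetical network, analogous in spirit to Figure E.13.
claim = Node("claim", "System is safe")
arg = Node("argument", "Argument because all identified hazards are mitigated")
arg.children = [
    Node("evidence", "Hazard analysis report"),
    Node("evidence", "Test results for hazard mitigations"),
]
claim.children = [arg]

print(claim.evidence())
# prints ['Hazard analysis report', 'Test results for hazard mitigations']
```

The traversal makes the "is evidence for" / "supports" relations explicit: an ASCE-style tool maintains essentially this parent/child structure behind the drawn arrows.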

E.5.2.4 Relation to Toulmin

To show what ASCAD has in common with Toulmin's scheme, both notation structures are
illustrated in one figure:

Figure E.14: Toulmin's scheme ↔ ASCAD notation


Figure E.14 illustrates the relation between the two notations. The evidence is called grounds in
the Toulmin notation, and the arguments consist of warrants and backings and support the claim.
The principle is the same.

E.5.2.5 Iterative review of an ASCAD structure

As mentioned in the ASCE manual [7], a typical Safety Case structure review process consists of
the following steps:
1) Make an explicit set of claims about the system.
2) Identify the supporting evidence.
3) Provide a set of safety arguments that link the claims to the evidence.
4) Make refinements based on review and evaluation.
5) Make clear the assumptions and judgments underlying the arguments.
6) Allow different viewpoints and levels of detail.

E.5.3 Goal Structuring Notation

The Goal Structuring Notation (GSN) is a graphical argumentation notation developed by the
University of York. A good starting point for GSN is Kelly's PhD thesis [1].


In this notation the following elements are used:

Figure E.15: Example GSN (created with ASCE)


The idea is to break the goal down into sub-goals and arguments until the claims are supported by
direct reference to available evidence. As with the ASCAD notation, the elements can be linked by
arrows, which illustrate the "solved by" and "in context of" relations.

Figure E.16: Example GSN network

E.5.3.1 Elements and phrasing used in GSN

The names of the elements are one of the main differences between the two notations; apart from
the naming, the two are quite similar. This short section describes the elements and the phrasing
of the GSN notation.
Within this description, the terms <noun-phrase> and <verb-phrase> are used with the following
meanings:
A noun phrase consists of a pronoun or noun with any associated modifiers, including adjectives,
adjective phrases, adjective clauses, and other nouns in the possessive case.
A verb phrase consists of a verb, its direct and/or indirect objects, and any adverbs, adverb
phrases, or adverb clauses which happen to modify it. The predicate of a clause or sentence is
always a verb phrase.


Goal What the argument must show to be true. This can be a requirement, a target or a
constraint. Kelly proposed in his doctoral thesis [1] that the goal should be a
TRUE/FALSE statement, i.e. it should always be either TRUE or FALSE.
Furthermore, it should be of the form <noun-phrase> <verb-phrase>, where the
noun-phrase is the subject of the statement and the verb-phrase is the predicate
(e.g. "system is safe").
Strategy Breaks down a goal into a number of sub-goals or leads the goal to the solution. It
is recommended that strategies take one of the following forms: "Argument by
<approach>", "Argument over <approach>", "Argument using <approach>",
"Argument of <approach>" – the focus of these is the argument itself.
Strategies can be stated explicitly between goals and their sub-goals or solutions
(the next element explained), but if the decomposition is clearly understandable
without them they can be left out; they exist as an aid to understanding.
For better illustration, a little example from simple mathematics is given. In the
following problem a value for x is sought. To isolate x from the rest, we must
divide by x, formally written "÷ x". This step can be left out, but it is added to
argue how we get from one line to the next:

3x² + 2x = 0   | ÷ x   (assuming x is not 0)
⇔ 3x + 2 = 0

"Assuming x is not 0" is an assumption, but this element is explained later.
Solution Evidence that the sub-goals have been met. This is achieved by decomposing all
goal claims to a level where direct reference to evidence is possible. Solutions
should be stated as a noun phrase and are more or less references to external
files.
Context Context allows reference to where the concepts used are fully defined. There is no
specific form that such a reference must take.
Justification Justifications are added whenever it seems necessary to explain the rationale
behind some strategy or claim.
Assumption Elements of the context that are taken to be true. Both assumptions and
justifications should be in the <noun-phrase> <verb-phrase> form (e.g. "system
has no common failure modes"). The idea is made clear in the example above
("assuming x is not 0").
Model The model describes the underlying system. These descriptions are simply
references to information available outside the Safety Case and are therefore
stated as noun phrases.
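The GSN element types listed above can likewise be captured in a small, hypothetical data model. Only the element names below mirror the notation; the class layout, field names and the example are our own illustration. The check expresses the rule that every goal must ultimately be backed by solutions, i.e. by direct evidence.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Goal:
    statement: str                   # <noun-phrase> <verb-phrase>, e.g. "system is safe"
    strategy: Optional[str] = None   # e.g. "Argument over identified hazards"
    subgoals: List["Goal"] = field(default_factory=list)
    solutions: List[str] = field(default_factory=list)   # references to evidence
    context: List[str] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)

    def is_supported(self) -> bool:
        """A goal is supported if it has direct evidence, or if all sub-goals are."""
        if self.solutions:
            return True
        return bool(self.subgoals) and all(g.is_supported() for g in self.subgoals)


top = Goal("system is safe",
           strategy="Argument over identified hazards",
           context=["Definition of 'acceptably safe'"])
top.subgoals = [
    Goal("hazard 1 is eliminated", solutions=["FTA report, section 4"]),
    Goal("hazard 2 is eliminated"),   # still "to be developed"
]

print(top.is_supported())
# prints False: hazard 2 has no solution yet
```

A goal without solutions and without sub-goals corresponds to the "to be developed" annotation described in the next section.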


E.5.3.2 Extensions to GSN

In addition, GSN has two display rules which drive the annotation of nodes which are yet "to be
developed" or "to be instantiated". The corresponding symbols are (usually) appended to the
bottom of the goal and mark a node as:
• to be developed
• to be instantiated
• to be developed and instantiated
Furthermore, there are special relations that add numbers to the arrows to relate an element to
0…n elements, to 0 or 1 element, or to exactly 1 element.
The Is_A relation provides a basis for expressing super-type and sub-type relations between
GSN entities and can therefore be used to establish type hierarchies; it is demonstrated as
follows.

Figure E.17: “Is_A” Relation


Figure E.18: Extensions to GSN


The context "Acceptably safe" still has to be defined; the "to be instantiated" sign shows this. In
the final version there should be a detailed description of what is meant by "acceptably". G3 has
not been fully developed, and it is possible that more nodes will be added below this box; that is
the difference between "to be developed" and "to be instantiated". The arrow between G2 and
Ghazard is marked with n, which makes clear that this relation exists n times.

E.5.3.3 Relation to Toulmin

Figure E.19 shows the relationship between the Goal Structuring Notation (GSN) and the original
Toulmin concepts: a GSN "goal" is equivalent to a claim, which is "solved" by strategies
(equivalent to backing and warrants), sub-goals and solutions (which can be related to grounds):

Figure E.19: Toulmin's scheme ↔ GSN

E.5.3.4 Iterative review process

As Kelly described in his work "A Six Step Method for Developing Arguments in the Goal
Structuring Notation" [14], the construction of a GSN network can be divided into six steps:

Figure E.20: Six step GSN construction method


Step 1: Identify goals to be supported:
The argumentation flow should never be interrupted, so care must be taken that as many sub-
goals as possible are mentioned to simplify the work for the reader. The same care applies to the
exact declaration of the (sub-)goals; the context is very important. Take care that the goals
enforce a TRUE/FALSE conclusion.


Step 2: Define basis on which goals are stated:
The scope has to be clearly defined. Models are used for this purpose. They are written as a noun
phrase without a predicate and describe the system itself (e.g. "Airbus A380, Product ID:
123456"). If other information is to be added, this can be done via context elements.

Figure E.21: Step 1 and Step 2


Step 3: Identify strategies to support goals:
The question is: how can it be explained to the reader (and to the author) that the stated goal is
TRUE? To answer this question, more sub-goals or direct evidence need to be found. The
underlying logic is: sub-goals TRUE => main goal TRUE.
As already stated, strategies can be mentioned explicitly between a goal and its sub-goals, but
they should not be necessary for the understandability of the argument as a whole.
Step 4: Define basis on which strategy stated:
Compare with Step 2: it should be clear why this strategy was chosen. Assumptions can be used
to add information concerning how the strategy leads to the evidence, and justifications can
similarly be used to show why a particular strategy can be considered a solution.

Figure E.22: Step 3 and Step 4


Step 5: Elaborate strategy:
The strategy above will typically introduce new sub-goals, so Step 1 has to be revisited.
Step 6: Define basic solution:
Finally, the main goal should have been broken down until the basic goals can be proven by direct
evidence, which means by reference to external information.


There are three things that should be borne in mind while creating arguments in a Safety Case:
• Intelligibility: consider the reader.
• Clarity: everything should be stated in such a way that misunderstanding is avoided.
• Invulnerability: explain everything from the ground up.
The six step method supports all three points.
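As an illustration only, the six steps can be mimicked in a short sketch. The goals, models and solutions below are invented, and the dictionary layout is our own choice rather than part of Kelly's method; the comments map each field to the corresponding step.

```python
def develop(goal, models, strategy, subgoals, solutions):
    """One pass of the six step method for a single goal."""
    return {
        "goal": goal,            # Step 1: identify goal (a TRUE/FALSE statement)
        "models": models,        # Step 2: basis on which the goal is stated
        "strategy": strategy,    # Step 3: strategy supporting the goal
        # Step 4 (assumptions/justifications for the strategy) omitted for brevity
        "subgoals": subgoals,    # Step 5: elaborate strategy into sub-goals
        "solutions": solutions,  # Step 6: basic solutions (direct evidence)
    }

# Step 5 loops back to Step 1 for every sub-goal that is not yet solved:
leaf = develop("braking function behaves correctly", ["brake ECU model"],
               None, [], ["HIL test report"])
root = develop("system is safe", ["vehicle model"],
               "Argument over system functions", [leaf], [])

print(root["subgoals"][0]["solutions"])
# prints ['HIL test report']
```

The recursion stops when a goal carries a solution, i.e. when direct evidence has been reached, exactly as described in Step 6.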

E.5.4 Comparison of the two Notations

At first the elements are compared. In the following table, the three main types of elements are set
next to each other to underline that, apart from the colours and the names, the main elements are
the same:

TOULMIN            ASCAD      GSN
Claim              Claim      Goal
Warrant + Backing  Argument   Strategy
Grounds            Evidence   Solution

Table E.2: Comparison of graphical methods I


                               ASCAD            GSN
Methodology                    No method known  6 step method
Illustration                   (figure)         (figure)
Approach                       Bottom up        Top down; allows a planning approach
Additional graphical elements  (none)           Justification (J) and Assumption (A) nodes

Table E.3: Comparison of graphical methods II
Direction of arrows: Actually the semantics of both are the same. In GSN the links read passively,
"[parent] is solved by [child]", but the arrow could be drawn the other way round to say "[child]
solves [parent]". Similarly, ASCAD could use downward arrows saying "has evidence", etc.
Effectively both have a direction of argumentation support from child nodes to parent node; it is
just a difference in how the links are vocalized. The direction also indicates the flow of
argumentation and construction, e.g. top-down or bottom-up. Special cases are the arrows to the
additional elements in GSN (mainly drawn horizontally); these arrows are used to link the
additional elements to a goal.
GSN is more specific because of its additional node types: GSN uses "models", "assumptions",
"context" and "justification", whereas the ASCAD notation only uses "other". Furthermore, there
are signs for undeveloped and uninstantiated elements. This is more specific and helps to ensure
that all important facts are listed; for example, the model element helps one think about the exact
description of the system.
ASCAD is easier to understand and present to someone with a non-safety background.
Luke Emmet from Adelard explained when to choose which notation in the following lines [15]:
"Each author needs to find the best trade-off for his/her argument between being
suitably expressive but not being too verbose. Most of our users find a mixture of
graphical elements and explanatory narrative behind the nodes to be about right.
Some people prefer GSN in contexts where a "top down" planning/decomposition
flavor is helpful (e.g. perhaps planning a Safety Case and evidence to be
collected). Others prefer CAE in a context where evidence might already exist and
you want to show the best claim that it can support. But you can use both for each
of these purposes.
Some customers use both at different times in their process.
In summary, we support both notations equally (and some others besides), and it
is up to you to choose the one that suits your needs and context best. At Adelard
we use both with a slight preference for CAE."
This might be the best answer to the question of what to use, though it has to be mentioned that
Luke Emmet (as he also notes in the lines above) works for Adelard, the company that supported
the implementation of ASCAD and implemented the "ASCE" tool (described later). In the
following, the GSN structure will be used.


E.5.5 Advantages and disadvantages of using GSN or ASCAD

Advantages:
Regardless of the differences, both notations have great advantages over the free text format for
developers of Safety Cases. The five main advantages could be the following:
1. Help to construct Safety Cases: with these trees the structure is much more clearly
arranged, and during construction it is easier to follow up "limbs".
2. Overview of structure: a graphic is a fast way to oversee all items and their relations.
3. Contents: all contents are represented in the graph.
4. Easy to understand: once the notations are learned and understood.
5. Guidance: provided by literature and the internet.
Disadvantages:
The main disadvantages should be mentioned as well:
1. Learning is necessary: in contrast to the free text format, how to create the notation has to
be learned, meaning not only the content but how to "draw" it. Reading is quite easy, but
writing can be very complicated.
2. Quality is not addressed: the notation itself says nothing about the quality of an argument.
To do so, information like the SAL has to be added (see E.9.2, Assessment with SAL).


E.6 Overview of the Safety Case

This section is concerned with identifying the activities and information necessary to construct a
Safety Case. Several "Safety Case development methodologies" have been developed. "Arguing
Safety – A Systematic Approach to Managing Safety" [1] gives a good introduction and provides
an overview of past and current research concerning Safety Case development. In particular, the
work of the following projects is presented:
• ASAM (A Safety Argument Manager), ASAM-II and SAM
• SHIP (Safety of Hazardous Industrial Processes)
• Communication in Safety Cases
• Adelard Safety Case Development Method (ASCAD)
• SERENE (Safety and Risk Evaluation using Bayesian Nets)

E.6.1 Safety Case Development steps

As stated in the ASCAD manual, a typical Safety Case development process consists of the
following steps:
• Make an explicit set of claims about the system.
• Identify the supporting evidence.
• Provide a set of safety arguments that link the claims to the evidence.
• Make refinements based on review and evaluation.
• Make clear the assumptions and judgments underlying the arguments.
• Allow different viewpoints and levels of detail.
There are a number of different stakeholders involved in Safety Case development or evaluation,
each of whom approaches the Safety Case from a different viewpoint, such as:
• Safety specialists involved in detail.
• Development staff who may not have safety as their primary concern.
• Project managers.
• Operators.
• Senior staff who accept equipment as safe to enter service.
• Supply chain partners who may be involved in the production of safety evidence or
arguments about subsystems.


E.6.2 Relation between Engineering and Safety Case

The first approach introduced in this appendix compares the safety development with the
engineering process. In an engineering process the main system is typically divided into
subsystems:

Figure E.23: Safety Case ↔ Engineering


In Figure E.23, the Safety Case is broken down into several subsystems based on functional
decomposition. Of course, other decompositions are possible. Furthermore, this could be a way to
find more functions and attributes. Another discussion would be the link to an integrated Safety
Case, which includes the idea of an evolving Safety Case that grows with the ongoing
development of the system.

E.6.3 Product and Process requirements

The following description is not the only possible way to build up a Safety Case; it is an idea
based on personal conversation with Dr. Robert Weaver. It is not a standard form, and there is no
guarantee of its completeness or correctness.
The first step involves differentiating between process-based and other product requirements.
Since the normal requirements change more often, they have to be looked at each time anything
changes.
As illustrated in Figure E.24, this first split is traceable. The process-based argumentation will
cover topics like qualified staff, project planning, standards and other project-based topics,
whereas the product-based argumentation covers design aspects, functional reasons and safety
components.


Figure E.24: First split


After this it gets more difficult. The following points depend very strongly on the underlying
industry, and which of them are advocated has to be carefully analysed.
Care has to be taken because the processes carried out influence the product under
development; these two elements are tightly linked. The Hazard Log, for example, can change
very often, and when it changes the product requirements have to be checked as well. Seen from
the other perspective, a product requirement can influence the process that has to be carried out.

E.6.3.1 The Process

Checking the process should answer the question of how the product is produced. In many cases
there are standards for the underlying industry. These should always be noted in the Safety
Case, along with a description of how they have been met. Some standards provide minimum
requirements which should be reached (e.g. SIL 3 in IEC 61508). The process could be divided
into development and production.

Figure E.25: Process decomposition

3 The system is the thing which is developed. The development is divided into a part describing how the thing is developed (the
process part) and a part describing what is developed, e.g. functions and properties (the product part).


E.6.3.2 The Product

The product itself should be shown to be safe enough. This can be supported by standards as
well, but it has to be shown that this is sufficient and that the norms and standards have been
met. One possibility is to use risk-based arguments, for example over the individual hazards. The
arguments can be split according to these criteria.

Figure E.26: Decomposition by hazards


If this is done, a justification for the assumption that all hazards have been discovered should be
provided. But take care: this is a process activity, so it has to appear in the other limb, for
example as a hazard identification analysis.
The next step could be a functional breakdown, in which an analysis is performed to determine
whether the contribution of each individual function to a certain hazard is acceptable.

Figure E.27: Decomposition by functions


The key is to look at the types of analysis and evidence being produced during the safety lifecycle
and identify what role each plays; this will help in producing an argument.
The above idea combined with the six step method offers a possible way of creating a GSN
structure for a new Safety Case. Together with an assessment approach, this could be a way of
creating a defensible, clear argument that a system is safe enough to operate: clarity is given by
the GSN structure above, and defensibility by the assessment methods.
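The decomposition over hazards and then over functions (Figures E.26 and E.27) can be sketched as a simple completeness check. The hazard and function names below are invented, and the status table is our own illustration of the idea, not a method prescribed by this deliverable.

```python
# Each hazard maps to the functions contributing to it, and each contribution
# carries the status of its argument ("acceptable" once evidence exists).
hazards = {
    "unintended braking": {"EMS": "acceptable", "ESP": "acceptable"},
    "loss of steering assist": {"EPS": "acceptable", "ECU": "open"},
}

def open_contributions(hazards):
    """List (hazard, function) pairs not yet shown to be acceptable."""
    return [(h, f) for h, funcs in hazards.items()
            for f, status in funcs.items() if status != "acceptable"]

print(open_contributions(hazards))
# prints [('loss of steering assist', 'ECU')]
```

An empty result would correspond to the "Argument over risk/hazards" limb being fully supported; any remaining pair marks a sub-goal that is still "to be developed".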


Figure E.28: Possible way to find sub-goals


Figure E.28 illustrates how a Safety Case could be constructed using the six step method. At the
beginning, the primary goal for a system or subsystem is defined. The six step method is then
used to argue in more detail about the safety of the system until direct evidence can be provided;
at that point the method ends. Direct evidence can be found on each level, so it is possible to
stop on any level, which is shown by the arrows leading to the right side.
The six step method ensures that everything has been done in sufficient detail.
Kelly mentioned in his DPhil thesis [1]: "As illustrated, using the representation of context within
the GSN it is possible to show how evidence used as part of the product argument was derived
and also how it formed part of the process argument."

E.6.4 Arguments by requirements

As Kelly described in the appendix of his work "A six step method…" [14], arguments can be
developed from requirements. In the first step, different categories of requirement – functional or
performance, for example – are identified. Of course there are many more, but at this stage those
two should be sufficient.


Figure E.29: Arguments by requirements (G1: requirements are met, argued over functional,
reliability and performance requirements; G2: system will correctly brake; G3: probability of failure
on demand < 0.001; G4: max response time < 5 seconds)

Figure E.29 is just an example, used only to illustrate the development of an argument. The only
safety-related function could be correct braking when a difficult track section is imminent (G2).
There can be other safety-related requirements as well, for example concerning in-service use,
development and safety criteria, but in the following the focus lies on those two.

E.6.5 Bottom up analysis

Another idea would be to analyse the evidence already available and try to analyse what is the
best claim that can be supported by it. In this case ASCAD notation is used. As already stated both
notations would be possible, but because of the defined phrasing this one was chosen at this
stage.

Figure E.30: Develop best claim from evidences


The bottom-up analysis should not only be done at the end of the development. It is well suited to
a pre-defined process which creates several documents (evidence items) at certain steps. With
the bottom-up analysis it can be checked early whether the produced evidence is sound and
complete enough to argue about the safety of the system.


E.7 The contents of the Safety Case

To find out what the contents of a Safety Case are, already existing Safety Cases were analysed
and the similarities between them uncovered. In general, equivalent topics are included. The next
pages provide a short description of these topics.

E.7.1 Topics

There are some important main topics, which are listed first and then explained in the following
paragraphs. Of course, most of the contents of the preliminary Safety Case are also included in
the final Safety Case. It cannot be decided which topic belongs to which phase, because the
development should be incremental.
1) Introduction
2) Description of the system
3) Standards
4) Quality and Safety Management
5) Current status and development process
6) Configuration and Change Management
7) Conclusion

E.7.1.1 Introduction

This part is, as its name suggests, the first part of every Safety Case. It should introduce the
Safety Case itself and describe how it is partitioned. It names the most important contents and
gives a general preview of how they are handled. The motivation for creating the Safety Case,
and whether it inherits from other Safety Cases (e.g. of a sub-system), can also be added. At the
end there should be a clear overview of the Safety Case and its contents (an executive
summary). The input will be given by the points below.

E.7.1.2 Description of the system

One of the most important points in the Safety Case is always an adequate model description
(what kind of model it should be and what will be modelled). It is essential to ensure the system is
always discussed in the same context; without this information the Safety Case is useless. It
would be dangerous to believe in safety without having a clear view of which system is being
studied. At the end there should be at least the following documents: requirements for staff
(preparatory training for new staff, experience, …), an exact environment description (in- or
outdoor, road or terrain), model description, extensions, adjustments, aims, main functions, and
more. In short, everything that helps to describe the system in more detail.

E.7.1.3 Standards

What standards are underlying?
Some standards lay down what is safe enough, what has to be ensured, and how a system
can be designed. If there is an underlying standard (and generally there is), it has to be
mentioned. If it aids understanding, various facts that affect its safety (for example SILs)
can be excerpted.


What can be done to meet the standards?
This is also an important question. If there are differences between the product and the
underlying standards, they have to be mentioned, and an explanation given detailing why
they were not met or how they will be met.
Are there other guidelines?
The same applies to other guidelines. It should be weighed up which guidelines can be
used and which are not relevant. Often safety can be increased by adhering to guidelines
or other Safety Cases.

E.7.1.4 Quality and Safety Management

What is done to ensure safety?
Arguing why safety is achieved is one of the main aspects of the Safety Case. As Jane
Fenn explained in one of her papers [16], to ensure safety the SEAL (or SAL; both are
described later in this appendix) of a sub-goal should be the same as that of the parent
goal. How this is to be achieved should be described. Three attributes influence the
support of the parent goal: relevance, coverage and strength. They should be considered
carefully in this part.
How should the system be analysed and tested?
A short summary and description of the techniques used to analyse the system is useful in
this part as well. Key elements here are FMEA tables, FTA, etc.
What hazards can occur and how can they be handled?
An (at least preliminary) hazard analysis is one of the basics of this chapter as well.
Without being able to show that everything has been done to identify the hazards, there is
no confidence in the Safety Case. After they have been identified, the hazards are
analysed in more detail and steps taken toward preventing them.
Who is responsible?
Detailed information should be given about who is responsible for what in the Safety Case.
It is necessary to define what qualifications (experience, training, etc.) the staff should have.
Which requirements have to be respected?
Any important requirements should be mentioned and described.
The contents of this part should include a list of all functional and non-functional safety
requirements, roles and responsibilities, the safety and quality lifecycle, safety and quality
requirements, safety and quality standards, safety audit and assessment, supplier management,
safety checks, project safety training and any other safety- or quality-related document.
Maintenance and service can be a point as well. The safety plan should also be provided and
referenced. The Hazard Log will be the primary source of evidence, as will FMEA and FTA.

E.7.1.5 Current status and development process

Is the system adequately safe?
It has to be shown whether the system is safe at the moment or in the near future (this can
be done via the points above); otherwise the statement is unfounded and thus useless. If
the system cannot be shown to be reasonably safe, the next questions have to be
answered:
What is the situation like at the moment and where is there a need for improvement?
The comparison between now and the future is important to show where things will be
changed to make the system safe. This can be a good way of showing the customer what
the work is worth.


At this point every single method should be mentioned, so that it can be reconstructed how
a certain level of safety will be achieved and checked at the end that everything was done.

E.7.1.6 Configuration and Change Management

Where can information regarding the timeliness of actions and the versions of artefacts be found?
If it cannot be ensured that changes reach all affected parties, other changes may be
affected or, at worst, disregarded. This can cause confusion (concurrent work,
misunderstandings, …). Good change and configuration management can provide a
solution to this.

E.7.1.7 Conclusion

In the conclusion, all open questions can be answered:
What assumptions are made in the Safety Case?
At some points assumptions are made, such as: part xyz has no common mode failures.
"Assumption" here means any statement about the system that is taken as true.
Is there some risk remaining?
Often the risk cannot be totally eliminated. The remaining risks should be identified and
mentioned, and if possible mitigation ideas should be given.
Which points of interest are still outstanding? Is there room for improvement?
All outstanding issues should be mentioned. Furthermore, all possibilities for improvement
can be stated.
Are there restrictions that must be borne in mind?
If there are restrictions, regardless of where they were made, they should be catalogued in
the Safety Case.
Finally, the document should be signed by an appropriate officer of the company, with a statement
that everything was done to the best available knowledge. This guarantees that as much advice
as possible was given.

E.7.2 Documents included

As stated in Safety Case and Safety Case Report [17], the Safety Case body of information will
include outputs from all the safety management activities conducted on a project. The following is
a list of documents that could be included; which documents are really required depends on the
arguments. Normally the documents are a subset of the following:
a. Safety Plans;
b. Disposal Plans;
c. Hazard Log;
d. Register of Legislation and other significant Requirements;
e. Minutes of Preliminary Safety Case meetings;
f. Safety Reports (e.g. Hazard Identification, Hazard Analysis, Risk Estimation, Risk
Evaluation);
g. Safety Assessment or Safety Case Reports for particular aspects of the system or
activities associated with the system (e.g. Software Safety Case, Disposal Safety
Assessment);


h. Safety Requirements;
i. Records of Design Reviews and Safety Reviews;
j. Verification Cross Reference Index;
k. Incident reports and records of their investigation and resolution;
l. Safety Audit Plans;
m. Safety Audit Reports;
n. Records of Safety advice received;
o. Results of Safety inspections;
p. Records of Safety approvals (e.g. Certificates);
q. Minimum Equipment List (i.e. vital to Safe operation);
r. Emergency and Contingency Plans and Arrangements;
s. Limitations on Safe Use;
t. Master Data and Assumptions List;
u. Evidence of compliance with Legislation and Standards;
v. Evidence of adequacy of tools and methods used;
w. List of people and their activities;
x. Signed statement;
y. Results of Tests and Trials;
z. Plans for Tests and Trials;
aa. System description;
bb. Design process description;
cc. Verification results;
These documents can either be included in the Safety Case or referenced by it. Both ways have
their advantages and disadvantages. Including the documents has the benefit that all information
is present in one document. The main disadvantages are the size of the document (it could be
expected to consist of several hundred pages) and that the intellectual property is exposed
completely in the Safety Case. The latter is sensitive because the Safety Case is a “public”
document which should be presentable to everyone interested. Referencing the supporting
documents is therefore recommended: the Safety Case contains the argument why the system is
safe enough, and any interested person can ask for the supporting documents, which are
identified uniquely in the Safety Case.

E.7.3 Safety arguments

This section covers a description of the safety arguments. The safety arguments link the claims
to evidence. Examples of methods such as the Goal Structuring Notation (GSN) will be given in a
later section to illustrate the essential features of this step.
In “A Methodology for Safety Case Development” [3] three types of arguments are distinguished:
deterministic, probabilistic and qualitative.
The authors state further: “The choice of argument will depend on the available evidence and the type of
claim. For example claims for reliability would normally be supported by statistical arguments, while
other claims (e.g. for maintainability) might rely on more qualitative arguments such as adherence
to codes of practice.”


Because Arguments can be neither categorized nor standardized, their reuse is not simple. This
discussion is deferred to Section E.8.2, “Reusing Arguments by applying Patterns”, which
discusses the reusability of Arguments.

E.7.4 Supporting evidence

This section will cover a description of the supporting evidence. This is concerned with determining
which sources could be used, e.g.
• Design
• Results of Safety related activities
• Development Process
• Verification and Validation
• Field experience
The choice of argument will depend in part on the availability of such evidence, e.g. claims for
reliability might be based on field experience for an established design, and on development
processes and reliability testing for a new design.
For EASIS, new sources of evidence may be required and described.
Before categorizing elements, the focus lies on the differences between hardware and software;
then similarities between existing examples are sought. Extensive consideration of the
claims showed that it makes no sense to categorize them. Apart from the main goal, goals are
part of Arguments. Before the Arguments are analysed further, however, the evidence is the point
of interest.

E.7.5 Hardware and software

Today most systems include software, whatever the system considered: calculators, cars and of
course computers. There are, however, big differences between hardware and software,
which are analyzed in the following.
With regard to testing the following differences exist:

1. Hardware: Failures arise in development, production or maintenance.
   Software: Failures are caused by humans during software development.
2. Hardware: Failures occur due to aging.
   Software: Software does not age.
3. Hardware: Preventive maintenance prevents possible failures.
   Software: Preventive maintenance is impossible.
4. Hardware: Reliability can be seen as a function of the operating time.
   Software: Reliability is not time dependent.
5. Hardware: Reliability depends on the environment.
   Software: Often independent of the environment.
6. Hardware: Reliability can be calculated based on physical laws.
   Software: Reliability can be neither measured nor calculated exactly.
7. Hardware: Reliability can be increased by redundancy.
   Software: Redundancy often does not help much.
8. Hardware: Failure rates often follow the same pattern.
   Software: No exact failure patterns exist.
9. Hardware: Interfaces are visual.
   Software: Interfaces are conceptual.
10. Hardware: Mainly standard components are used.
    Software: Besides COTS, software is unique.

Table E.4: Differences between hardware and software with regard to testing [18]
Hardware and software generally have different design and development processes and therefore
behave differently. This different behaviour causes different failures, which can lead to hazards.
In general any failure has to be removed, or at least detected and accounted for; in safety-critical
systems the importance of finding failures is much higher. The structure of the failure rates differs
between hardware and software, as illustrated below.
The Hardware failure rate has the following structure:

Figure E.31: HW failure probability [19]


Hardware normally exhibits many failures in the beginning (e.g. failures introduced in production).
This rate decreases as these failures are eliminated, and then the hardware works largely without
problems; this is what is called “normal operation”. At a later stage the failure rate increases again
due to aging. Maintenance, for example, can shift this point, but the shape stays the same. This
behaviour is the same for electronic and mechanical systems. Software normally follows a
different structure:


Figure E.32: SW failure rates [19]


As illustrated above, the failure rates of software behave differently. Two things come to mind:
1) New releases, implemented and included in the original version, insert new sources of failures
as well. This causes “jumps” in the curve.
2) Software does not age, but it nevertheless gets worse over time.
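The two shapes discussed above can be sketched numerically. The following is an illustrative sketch only: all constants, decay rates and release times are made-up example values, not measured data.

```python
import math

def hw_failure_rate(t, early=0.05, wearout=0.001):
    """Bathtub shape: a high early rate that decays as production faults
    are eliminated, a flat 'normal operation' floor, and a wear-out term
    that grows with age."""
    return 0.01 + early * math.exp(-t / 100.0) + wearout * (t / 1000.0) ** 3

def sw_failure_rate(t, releases=(0.0, 300.0, 600.0)):
    """Within each release the rate decays as bugs are fixed; each new
    release injects fresh faults, causing a jump in the curve, and in
    this toy model later releases start slightly worse."""
    last = max(r for r in releases if r <= t)      # most recent release
    age_of_release = t - last
    injected = 0.04 + 0.01 * releases.index(last)  # worse with each release
    return injected * math.exp(-age_of_release / 150.0)

# Jump at a release boundary: the rate just after release 2 exceeds the
# rate just before it.
print(sw_failure_rate(299.9) < sw_failure_rate(300.0))  # True
```

Plotting both functions over time reproduces the qualitative shapes of Figures E.31 and E.32.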

14.11.2006 2.0 E-42


EASIS Deliverable D3.2 Part 1 - App E

Conclusion:
The most important difference between hardware and software is that hardware failures are
probabilistic whereas software failures are systematic. This is why the breakdown of hardware can
be estimated, e.g. by the MTTF (Mean Time To Failure). For this reason many documents discuss
only hardware failures, and there are special analysis and test techniques which can only be
applied to hardware.
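As a minimal sketch of the probabilistic estimation mentioned above, MTTF can be computed as the mean of observed times to failure, and a reliability function derived from it under a constant-failure-rate (exponential) assumption. The failure times below are hypothetical example data.

```python
import math

def mttf(times_to_failure):
    """Mean Time To Failure: the average of the observed failure times."""
    return sum(times_to_failure) / len(times_to_failure)

def reliability(t, mttf_hours):
    """R(t) = exp(-t / MTTF), assuming an exponential failure model
    (constant failure rate, i.e. the flat part of the bathtub curve)."""
    return math.exp(-t / mttf_hours)

observed = [1200.0, 950.0, 1100.0, 1350.0, 900.0]  # hours, hypothetical
m = mttf(observed)
print(round(m, 1))                  # 1100.0
print(round(reliability(m, m), 3))  # survival probability at t = MTTF: 0.368
```

Note that no comparable calculation exists for software, which is exactly why its systematic failures must be addressed by process evidence instead.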

E.7.6 Evidences

The evidences address different aspects of the goal (e.g. assessment techniques (SAL, review,
etc.), tests, analyses, etc.). It always has to be ensured that each of them is defensible enough to
confirm the underlying statement. Three principles should be borne in mind while planning
evidence:
1) Know the goals (necessary and sufficient: only as much evidence as is required to confirm the
underlying statement, and no more)
2) Satisfy the requirements
3) The proof has to be efficient

E.7.6.1 Attributes

Important attributes for judging evidence are independence, relevance, coverage [20] and
strength. These points are further discussed below.
Independence:

Level I: Conceptual and mechanistic independence
Level II: Conceptual but not mechanistic independence
Level III: Mechanistic but not conceptual independence
Level IV: No independence

Table E.5: Levels of independence


Two kinds are differentiated:
Conceptual independence means that the evidences differ in both their approaches and their
underlying theories (see also E.3.4.2).
Mechanistic independence means that the evidences have different approaches but follow the
same underlying theory.
Evidence that follows other evidence can weaken the requirements on complementary evidence.
The independence of evidence has to be analyzed. It will be rare that a single element of evidence
adequately supports a safety goal, so more than one item of evidence has to be used. In this case
the relationship between the elements should be considered; this can show how well they satisfy
the underlying goal.
Relevance: Some evidences are stronger than others because they are more significant.
Furthermore, some are more important to have at all (e.g. competency of
personnel).


The relevance itself can be divided further into

• Direct (direct link to the parent goal)
• Backing (direct evidence that is soundly based)
• Reinforcement (indirect evidence that can be extrapolated into direct evidence)

Coverage: This indicates how much the evidence supports the main goal. One possibility to
assess this is by SAL (compare with E.9.2, Assessment with SAL).
For example, the higher the test coverage, the greater the confidence in the test
results. The coverage of evidence identifies the proportion of the software, or of
the property of interest, for which the safety goal has (demonstrably) been met.
Strength: Different techniques for providing or obtaining evidence differ in the strength of
their trustworthiness. This strength depends on the author and on the underlying
standards. For example, while some people say an FTA is enough, others say
that an FMEA is also required (both analysis techniques are explained at a later
stage).
Conclusion: A good mixture of independence, relevance, coverage and strength is the key.
Any safety-critical goal should be supported by at least two independent items of
evidence, preferably with both conceptual and mechanistic independence, and
with conceptual independence as a minimum.
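The rule in the conclusion above can be made mechanical. The following is a hedged sketch, not a prescribed tool: the data model and all names are illustrative. It checks that a goal is supported by at least two items of evidence whose pairwise independence is at least conceptual (Level I or II in Table E.5).

```python
# Independence levels between a pair of evidence items (Table E.5)
LEVEL_I = "conceptual+mechanistic"
LEVEL_II = "conceptual"
LEVEL_III = "mechanistic"
LEVEL_IV = "none"

CONCEPTUALLY_INDEPENDENT = {LEVEL_I, LEVEL_II}

def goal_adequately_supported(evidence_items, pairwise_independence):
    """evidence_items: list of evidence names.
    pairwise_independence: dict mapping frozenset({a, b}) to a level above.
    Returns True if some pair of items is at least conceptually independent."""
    if len(evidence_items) < 2:
        return False  # a single item of evidence is rarely adequate
    return any(
        pairwise_independence.get(frozenset({a, b})) in CONCEPTUALLY_INDEPENDENT
        for i, a in enumerate(evidence_items)
        for b in evidence_items[i + 1:]
    )

# Example: an FTA plus unit tests, judged Level I against each other.
items = ["FTA", "unit tests"]
indep = {frozenset({"FTA", "unit tests"}): LEVEL_I}
print(goal_adequately_supported(items, indep))  # True
```

In practice the pairwise judgements would come from an assessor, not from code; the sketch only shows that the criterion itself is checkable.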

E.7.6.2 Primary sources of evidence

As Adelard states in the manual of the ASCE tool, sources of evidence include the following:
• Hazard Log: listing all potential hazards of the product in its environment, using results of a
HAZOPS activity, etc.
• Probabilistic design assessments and qualitative design studies: these sources of
evidence can be divided into inductive (e.g. FMEA) and deductive (e.g. FTA) methods.
• Resource estimates: resources used for the implementation and the associated Safety
Cases are assessed by estimation methods.
• Design techniques: some design techniques embody prior evidence
• Independent certification (e.g. COTS products)
• Experience from existing systems in field operation

E.7.6.3 Different methods for getting evidence

The most important methods for obtaining evidences are tests, reviews, analyses and formal
methods. For further reading the reader is referred to EASIS WT 3.2. At this stage it is only
noted that the trade-off between formality and complexity has to be considered very carefully.
This difference is illustrated in the following figure:


[Figure: tests, reviews, analyses and formal methods plotted against increasing formality;
complexity/costs and trustworthiness increase with formality]

Figure E.33: Formality and trustworthiness


E.8 Architectural approaches for Safety Cases

In software design, architectural principles are applied to achieve a better structure and to avoid
systematic failures. Safety engineering also profits from these benefits. There are three main
principles, explained in this chapter: Tactics, Patterns and Modules.

E.8.1 Safety Tactics

In architecture, design decisions that influence the handling of attributes are called Tactics. If
multiple tactics are applied together, this is called a strategy. From this follows the definition of
safety Tactics:
Safety Tactics are design decisions that influence safety attributes and their handling.
As a consequence, the attributes have to be identified first. This report considers safety, so safety
attributes are the starting point. In his MSc thesis, Weihang Wu [27] bases his tactics on the
following failure attributes:
• Failure classification
• Failure cause
• Failure behaviour
• Failure property
These points can be further sub-classified. Weihang Wu arrived at the following structure:

[Figure: failure attribute tree; a failure is refined into classification, cause, behaviour and
property, with sub-attributes provision, value, timing, SW, HW, environment, failure propagation,
failure transformation, tolerability and detectability]

Figure E.34: Safety attributes

Figure E.35: Safety Tactics


In Safety Tactics some important headings should be covered. The list below illustrates them:
Aim: At the beginning the reader should get a short summary, so that he knows
whether this Tactic is suitable for his project.
Description: This description is more extensive than the aim; a more detailed
description is given here.
Rationale: The basic principle, normally represented in a GSN structure.
Applicability: In this part of the tactic description any rules and circumstances are given.
Consequences: What happens if failures occur?
Side effects: Are other attributes affected by this Tactic?
Practical Strategies: Some strategies that result from this Tactic.
Patterns: Sometimes Patterns implement tactics. Those can be mentioned here so
that a clear link is provided.
Related tactics: To ensure that the correct tactic is applied and that related ones are
considered as well, they should be mentioned here.

E.8.2 Reusing Arguments by applying Patterns

It is important to note the potential dangers in reusing Arguments. It is absolutely necessary to
think about possible consequences, especially when safety is discussed.
Visibility and traceability: Reuse has the potential to propagate one error many times. Imagine
an insufficient argument being reused in many systems. Dealing with such
situations requires adequate visibility and traceability of the reuse process.
Contextual information not fully recognized: If a new argument has a slightly different context
to the reused one, this can debase the safety argument. Furthermore, such
reused arguments can carry assumptions which are not apparent but which
are inconsistent with the new context.
High level of systematics needed: Reuse often occurs in a random, opportunistic order and is
not carried out systematically; sometimes good opportunities for reuse are
wasted. The trouble is that reuse requires the ability, firstly, to recognise the
potential for reuse and, secondly, to recall the appropriate information.

E.8.2.1 Pattern structure

Kelly compared several existing pattern methods and chose the one from the so-called Gang
of Four [21]. This pattern format has the following topics:
Pattern name: Name of the pattern (e.g. the argument name).
Intent: Reading this section should give a clear understanding of what is being
attempted.
Also Known As: If another name is imaginable it should be noted in this section (maybe it
has already been used by a different author).
Motivation: A kind of help for other engineers trying to interpret and correctly apply
the following description of the pattern (e.g. previous experiences, problems
etc.).
Structure: A clear structure is necessary (e.g. G1 solved by Sol1 over Str1). This makes it
possible to refer to any element.


Applicability: The necessary context (e.g. environment) is important for correct application.

Participants: It should be declared what elements need to be developed or instantiated, what
context is needed, etc. The element descriptions should also explain their
function.
Collaborations: The description of how the elements of the pattern work together is given
here.
Consequences: This section should highlight, with direct reference to the elements, what has to
be done when applying the Argument pattern (e.g. goals that remain to be
supported, or assumptions to be discharged, etc.).
Implementation: The implementation of the pattern itself should be explained here. The order as
well as the techniques proposed to successfully apply the pattern can
be mentioned at this stage. As well as noting what to do, things that should be
avoided and possible misinterpretations should be written down.
Sample Code: Sample codes are examples of how the pattern could be applied. If there is only
one, it should be the most common one, but more than one example is
conceivable. The more abstract a pattern is, the more important it is to
provide concrete examples within this section.
Known Uses: This section should answer the question of how a pattern can be applied
as part of a larger safety argument within a Safety Case. If this pattern or
this Argument structure is already in use, it should be mentioned here.
Related Patterns: Safety Case Patterns that are related to this pattern should be mentioned here,
e.g. for different classes of systems.
These headings give advice on how to create patterns. As Kelly states, following this structure
provides a clear way to reuse Argument structures.

E.8.2.2 Example Patterns

The following Patterns were found while creating GSN structures to gain a better understanding of
the subject matter. The Top-Level Spider Pattern is a structure that recurred in almost every GSN
structure because of the necessity of a good start: the detailed model description, the definition of
“safe enough”, the assumptions made and the justifications of why a specific structure can be
chosen all belong at the beginning.
The idea of the “Systematic Fault Avoidance Pattern” is that the first decomposition should be
between process and product. By doing so, two different aspects are handled:
What processes were executed to ensure safety? And how were the safety-related processes
implemented?


Systematic Fault Avoidance Pattern


Author Roman Krzemien, Michael Amann

Created 13.03.2006 Last modified 14.03.2006

Intent: The intent of this Pattern is to separate the processes from the system design. The design
can only be safe if the process leading to it was carried out properly.

Also Known As Process/Product Decomposition Pattern


Motivation The safety of the underlying system shall be shown by answering the two questions:
- Is the product safe enough (based on product dependent faults)?
- Are the processes which should lead to a safe system carried out correctly (based on
systematic faults)?

Structure:
G1: {System X} is safe
S1 (solves G1): Argument over process and product
C1 (context of S1): Product development processes of {System X}
C2 (context of S1): Safety-related development processes of {System X}
G2 (solves S1): Development {Process A} carried out properly
G3 (solves S1): Safety provision {Process B} carried out properly
G4 (solves S1): {Product C} designed safe

Participants:
G1 - defines the overall objective.
S1 - the Strategy adopted to support G1.
C1 - here all development-relevant processes should be listed (e.g. ISO
9000/9001, SPICE, CMMI, etc.).
C2 - here all safety-relevant processes should be listed (e.g. IEC 61508, Def Stan
00-55, ISO 26262, etc.).
G2 - expresses the implementation of the development processes.
G3 - expresses the implementation of the safety processes.
G4 - expresses the safety of the product itself.


Collaborations:
• C1 (C2) gives a list of the elements in G2 (G3).
• The elements in the sub-structure of G4 imply those of G2/G3.
• G2 and G3 are necessary to support the implication that solving G4 is sufficient to
support G1.

Applicability: This is a very general pattern with a wide range of applicability. To apply it, C1
and C2 have to be carried out first. For this you should check which standards can or should be
applied (e.g. IEC 61508); this can be addressed by the development plan and the safety plan.
At the end you should have a list of the processes carried out. The list should be updated
in later stages, but then the corresponding elements have to be changed as well.

Consequences: This pattern results in three goals which have to be solved. G2 and G3 will be
decomposed further, which is not the topic of this pattern; it should be mentioned, though, that
processes mostly have standards which describe the way they should be handled (e.g. qualified
staff, SIL, etc.). Another pattern could be created including contextual information about these
ways. G4 should be broken down further as well (e.g. by Functional Decomposition).

Implementation: Start by identifying the lists which should be presented in C1 and C2. This step
can be reviewed and has to be repeated, or better refined, in later development stages. The
processes will have to be explained in more detail (e.g. by a model) and the underlying
standards and guidelines should be applied.

Possible pitfalls:
• Processes can be left out because the user thinks they are not important enough.
• Processes identified during product development are not implemented in G2 or G3.

Example:
G1: System “Safe Speed” is safe
S1 (solves G1): Argument over process and product
C1 (context of S1): List of development processes according to SPICE
C2 (context of S1): Safety plan according to IEC 61508
G2 (solves S1): Development processes carried out properly
G3 (solves S1): Safety plan processes carried out properly
G4 (solves S1): “Safe Speed” control unit is safe
Known uses: Simple example

Figure E.36: Systematic Fault Avoidance Pattern
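The pattern above can also be captured as plain data. The following is an illustrative sketch only: the node classes and the instantiation helper are not part of GSN itself, and the “Safe Speed” names simply mirror the pattern's own example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    ident: str    # e.g. "G1", "S1", "C2"
    kind: str     # "Goal", "Strategy" or "Context"
    text: str
    children: list = field(default_factory=list)  # "solved by" links
    contexts: list = field(default_factory=list)  # "in context of" links

def instantiate_pattern(system, dev_process, safety_process, product):
    """Fill the pattern's {placeholders} with concrete names."""
    g2 = Node("G2", "Goal", f"Development {dev_process} carried out properly")
    g3 = Node("G3", "Goal", f"Safety provision {safety_process} carried out properly")
    g4 = Node("G4", "Goal", f"{product} designed safe")
    s1 = Node("S1", "Strategy", "Argument over process and product",
              children=[g2, g3, g4],
              contexts=[
                  Node("C1", "Context", f"Product development processes of {system}"),
                  Node("C2", "Context", f"Safety-related development processes of {system}"),
              ])
    return Node("G1", "Goal", f"{system} is safe", children=[s1])

top = instantiate_pattern("System 'Safe Speed'",
                          "processes (SPICE)",
                          "processes (IEC 61508 safety plan)",
                          "'Safe Speed' control unit")
print(top.text)                                      # System 'Safe Speed' is safe
print([c.ident for c in top.children[0].children])   # ['G2', 'G3', 'G4']
```

Representing the structure this way makes the pattern's pitfalls checkable, e.g. a script could verify that every process listed in C1/C2 reappears under G2/G3.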


Top-Level-Spider Pattern
Author Roman Krzemien, Michael Amann

Created 03.07.2006 Last modified 03.07.2006

Intent: The intent of this Pattern is to instantiate the top-level goal and the surrounding elements.
In other words, the environmental context is set up.

Also Known As: Environment Pattern

Motivation: In this Pattern the focus lies on the environmental context. It should be ensured that all
important contextual information is given, so that all general conditions are set up
properly. Otherwise there is a danger of poorly communicated or missing contexts or models.

Structure:
G1: {System X} is safe enough
M1 (model for G1): System description
C1 (context of G1): Definition of acceptably safe
J1 (justification for G1): {System X} is of a suitable type for this
A1 (assumption for G1): Assumptions made for reasoning this

Participants:
G1 - defines the overall objective.
M1 - the detailed system description, explaining which model, version, etc. is used.
C1 - here the definition of safe enough should be added, to explain what general
conditions are used to ensure safety (e.g. IEC 61508, etc.).
J1 - here all relevant justifications should be addressed.
A1 - all relevant assumptions should be made at this stage.


Collaborations:
• G1 is the central object; the others are strongly dependent on it.
• M1, C1, J1 and A1 are necessary to describe G1.
• C1 and M1 are only linked with G1.

Applicability: This is a very general pattern with a wide range of applicability. To apply it, it
should be checked which standards can or should be applied (e.g. IEC 61508); this can be
addressed by the development plan and the safety plan. M1 should be instantiated first so that
the underlying system is established. M1, together with C1 (which gives the definition of safe
enough), is absolutely fundamental.


Consequences: This pattern can be seen as the starting point of every Safety Case. It is possible
to add more contexts than these; in that way you can ensure that all relevant information is added.
Detailed information anchors the system in its environment, and the danger of setting up
the system under wrong assumptions is minimized.

Implementation: Start by identifying the detailed model description. The next step is the detailed
instantiation of the necessary definition of what safe enough means; this can help in implementing
the structure. The other two points are often closely linked, so it is better to start with
instantiating the assumptions and then the justifications.

Possible pitfalls:

• The role of the assumptions and justifications can be neglected and thus poorly
communicated.
• A poorly communicated model description and definition of safe enough imply a wrong
set-up of everything that builds on them.
Example:
G1: Jaguar XK8 Electronic Throttle is safe enough
M1 (model for G1): XK8 AJ26 Electronic Throttle
C1 (context of G1): Safe enough means that IEC 61508 was satisfied
J1 (justification for G1): ALARP principle applied in pattern
A1 (assumption for G1): Throttle should allow instantiation of the ALARP principle

Known uses: Simple example

Related Patterns: Not available at this time.

Figure E.37: Top-level-spider Pattern


Kelly identified two possible classifications to categorize the Patterns presented in his thesis:
- (i) Patterns for a particular domain or class of system (e.g. the car industry), or (ii) patterns
applicable across a number of domains.
- (1) Top-Down Safety Case Patterns describe the decomposition of some objective (e.g. the
Functional Decomposition pattern above); alternatively, (2) Bottom-Up Safety Case Patterns
describe how an argument may be constructed from a piece of evidence. Mixtures of the two are
called (3) General Construction Safety Case Patterns.


Kelly describes a Safety Case Pattern Catalogue in Appendix B of his doctoral thesis [1]:

No specific domain:
  Top-Down: ALARP Argument; Functional Decomposition; Hazard Directed Integrity Level
  Argument; Control System Architecture Breakdown
  General Construction: Diverse Argument; Safety Margin
  Bottom-Up: Fault Tree Evidence
Particular domain:
  Top-Down: Compliance Pattern

Figure E.38: Categorisation of Patterns [1]


Other example Patterns are mentioned in [2]. The use of these patterns is very helpful and highly
recommended.

E.8.2.3 AntiPatterns

In addition to providing patterns for standardizing what can be done, AntiPatterns can be used to
illustrate things that should not be done.
The main reasons for poor Arguments are:
- Fallacious arguments
- Incomplete arguments
Negative examples, called AntiPatterns, can be used to prevent a structure from being badly or
poorly set up. There are recurring mistakes made in several Safety Cases; one of them is that
process and product aspects are mixed and thus poorly communicated. Just as the “Systematic
Fault Avoidance Pattern” was described above, an AntiPattern can be instantiated for it.
The structure of AntiPatterns is almost the same as the structure of Patterns. The main difference
is that AntiPatterns describe two solutions (a poor one and a refactored one) instead of one
problem and one solution.
AntiPattern Name: Every AntiPattern should get a unique name, so that misleading
argumentation is minimized.
Also Known As: If different names are imaginable, they should be stated here.
Most Frequent Scope: Giving and highlighting the scope at the beginning helps to find out
whether this AntiPattern applies.
Refactored Solution Name: Here the name of the better structure is stated.
Refactored Solution Type: The scope of the new structure is given.
Background: This should be given to explain to the user why the refactored structure is better.
General Form of this AntiPattern: This field shows the GSN structure that should be changed.
Symptoms and Consequences: Here the consequences of the existing structure are stated, to
show the difference.
Typical Causes: Reasons that lead to such a structure can be given here.
Known Exceptions: If there are exceptions in which such a structure does not result in a
poor or misleading structure, they should be mentioned.
Refactored Solutions: The general form of the corrected or rewritten structure is provided.
Variations: If variations are thinkable, they can be added here.
Example: An example is always helpful, so it should be added to improve applicability.
Related Solutions: Related solutions help to apply this AntiPattern if the author already
knows a related one.
Examples of AntiPatterns are the Formal Methods AntiPattern, the History AntiPattern, and the
two mentioned here. The Convergent & Linked Support AntiPattern, together with the
Overlapping Linked Support AntiPattern, makes it possible to refactor hybrid support into
convergent and linked support. AntiPatterns can thus be used to refactor a solution into a better
form, in this case to help refactor poor argument structures. A more detailed discussion of
AntiPatterns can be found in [26].

E.8.2.4 Applying Patterns

The first step when applying a Pattern is to ensure that the Pattern really fits the underlying
situation. Normally there is not much to be shown, but it should be mentioned (e.g. as a context:
“Pattern can be applied because…”). As a second step, the six-step method can be applied,
following the structure provided by the Pattern itself. After everything is done, the new
structure should be reviewed again to check whether all the consequences and implementation
facts are accounted for. The use of AntiPatterns proceeds in the same way.

E.8.3 Modules

As stated at the beginning, Safety Cases grow very fast in size and complexity. Furthermore, the
work is not done by one person alone but is divided amongst groups, teams, several individuals, or
even companies (e.g. OEM and supplier). For these reasons the idea of decomposing complex
systems into sub-systems has become more and more important. This chapter discusses the idea
of structuring and managing complex system constructions by using “modules”. The affinity to
software architecture is obvious.

E.8.3.1 Modules in architecture

Architecture itself serves two main purposes: firstly, it provides a plan; secondly, it offers an
abstract representation to manage the complexity of the system.
The following points are of importance when considering architecture:
• The assumptions, justifications and intended environment should be visible
• Component arguments and evidences have to be structured
• Relationships and interdependencies should be made clear
Comparing this to the definition of software architecture on page 21 of the book “Software
Architecture in Practice” [22] (“The software architecture of a program or computing system is the
structure or structures of the system, which comprise software elements, the externally visible
properties of those elements, and the relationships among them.“) reveals a similarity to Safety
Case design.


The idea arises that the principles used in software architecture are similar to those in
Safety Case engineering.
Principles that arise when thinking about modular systems are:
System level:
• The cooperation of the modules allows the operation of the system
• The system consists of several modules
Module level:
• The modules should be kept as independent as possible
• The modules help to understand and to work with the whole system by decomposing it into
smaller systems

E.8.3.2 Modules in Safety Cases

The units into which a Safety Case is divided using logical grouping principles are called
Safety Case Modules. These modules are strictly connected to each other, and their relations
have to be noted carefully. The connections are called “Ports” in this report; ports are the only
remaining link to other modules. GSN has special notations for modules, but at this stage the
focus lies on the theory around modules and their relations. As illustrated above, another
important aspect arises: interactions and dependencies have to be analyzed very carefully. These
interactions of systems are exactly what modular contracts reflect.

E.8.3.3 Module development steps

When designing a modular approach, some principles should be kept in mind:


• requirements should be traceable (during the whole lifetime)
• complete implementation
• consistency
• standards and guidelines should be met
• testability
Furthermore the development should be carried out by performing the following steps:
• set up requirements for the modules
• set up the intended behaviour
• ensure that modules can be formulated without additional information
• define interfaces and ports (between modules, inside modules, and to external systems)
• define the module structure

E.8.3.4 Module Ports

It is important that all interactions and relations are documented very carefully so that this
information is not lost. Each module port should therefore record:
• objectives addressed by the module
• evidence presented within the module


• context defined within the module
• dependencies within the module
• dependencies between modules
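The port information listed above can be captured in a small data model. The following Python sketch is illustrative only; the class and field names are assumptions made for this example and are not defined in this deliverable.

```python
from dataclasses import dataclass, field

# Hypothetical data model for Safety Case module ports; class and
# field names are illustrative assumptions, not part of this report.
@dataclass
class Port:
    objectives: list[str] = field(default_factory=list)      # objectives addressed by the module
    evidence: list[str] = field(default_factory=list)        # evidence presented within the module
    context: list[str] = field(default_factory=list)         # context defined within the module
    internal_deps: list[str] = field(default_factory=list)   # dependencies within the module
    external_deps: list[str] = field(default_factory=list)   # dependencies between modules

@dataclass
class SafetyCaseModule:
    name: str
    ports: list[Port] = field(default_factory=list)

# Example: a module for a gearbox Safety Case with one documented port
gearbox = SafetyCaseModule("Gearbox Safety Case")
gearbox.ports.append(Port(objectives=["G1: gearbox is acceptably safe"]))
```

Recording ports as explicit data rather than prose makes the interactions between modules reviewable and machine-checkable.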

E.8.3.5 Contracts

The relations between modules are documented in so-called contracts. A contract should be
created every time two or more modules come into contact (“touch each other”). This mirrors
software engineering, where such dependencies are likewise captured by defining ports and
interfaces.
Safety Case Module Contract

  Participant Modules:
    E.g. Module 1, Module 2, Module 3

  Goal      Required by    Addressed by    Goal
  Goal 1    Module 1       Module 2        Goal 2
  ...       ...            ...             ...

  Collective Context and Evidence of Participant Modules held to be consistent:
    Context:  Context 1, Assumption 1, ...
    Evidence: Solution 1, Solution 2, ...

  Resolved Away Goal, Context and Solution References between Participant Modules:
    Cross Referenced Item    Source Module    Sink Module
    Away Goal 1              Module 2         Module 3

Figure E.39: Example Module Contract [23]


Necessary module contract contents:

Definition of bounds and limitations:
    All the bounds and limitations that are assumed are accurately and
    completely defined in the contract.

Definition of capacity limits:
    For example, the maximum number of processes that can be supported.
    The determination of all the ways in which the system may fail due to
    a system-specific operational context is extremely challenging.

Ensure that system contextual assumptions hold:
    At the very minimum, it must be ensured that all system contextual
    assumptions hold when the module is integrated into the whole system.
    This is not a big problem when using standard modules (e.g. COTS that
    the vendor supports directly and has certification evidence for) but
    becomes more critical when a new system is implemented.

Dependencies on other modules within the operational environment:
    If such dependencies exist, they have to be clearly defined and stated.
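The contract contents above can be expressed as a simple data structure with an automatic consistency check. The following Python sketch is a non-authoritative illustration of Figure E.39; all class, field and function names are assumptions made for this example.

```python
from dataclasses import dataclass, field

# Illustrative sketch of a Safety Case module contract (cf. Figure E.39).
# All names here are assumptions made for this example.
@dataclass
class GoalMapping:
    goal: str          # e.g. "Goal 1"
    required_by: str   # module whose argument needs the goal supported
    addressed_by: str  # module that provides the supporting goal
    resolved_by: str   # e.g. "Goal 2"

@dataclass
class ModuleContract:
    participants: list[str]
    goal_mappings: list[GoalMapping] = field(default_factory=list)
    shared_context: list[str] = field(default_factory=list)    # held consistent across participants
    shared_evidence: list[str] = field(default_factory=list)
    resolved_refs: list[tuple[str, str, str]] = field(default_factory=list)  # (item, source, sink)

def contract_consistent(c: ModuleContract) -> bool:
    """Every module named in a goal mapping or resolved reference must be a participant."""
    named = {m.required_by for m in c.goal_mappings} | {m.addressed_by for m in c.goal_mappings}
    named |= {mod for (_, src, snk) in c.resolved_refs for mod in (src, snk)}
    return named <= set(c.participants)

# Example mirroring the entries of Figure E.39
contract = ModuleContract(
    participants=["Module 1", "Module 2", "Module 3"],
    goal_mappings=[GoalMapping("Goal 1", "Module 1", "Module 2", "Goal 2")],
    resolved_refs=[("Away Goal 1", "Module 2", "Module 3")],
)
```

Such a check catches the common mistake of referencing a module that never signed the contract.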


E.8.3.6 Main advantages and potential dangers

This chapter points out the main advantages and the main potential dangers.
Advantages:
As mentioned in the introduction of this chapter, Safety Cases are mostly created by more than
one person. To manage this it is necessary to split the whole into smaller parts and control the
parts as described above (cf. Figure E.39). The decomposition offers the possibility of working
concurrently and separately on the development of the Safety Case (e.g. in different groups),
which makes it faster and more trustworthy (because different people work on it).
Another advantage is that, in case of changes, reviewing the whole Safety Case can be limited to
reviewing the affected module. Similarly, in software architecture it is assumed that the system
functionality is still maintained even if a software component which is not affected is not analyzed
further.
Dangers:
It should be mentioned that even standards talk about modularising (e.g. the MoD Def Stan 00-55
talks about the division of system and software boundaries, and enforces a contract by requiring a
link to the system claims). A more detailed analysis of the occurrence of modules in standards is
given in the third paragraph of Kelly’s paper “Managing Complex Safety Cases” [24].
Referring to Tim Kelly’s paper, the following potential dangers in Safety Case decomposition are
listed:
• It will be difficult to ensure that all interactions between subsystems are treated carefully,
especially when they come from different suppliers. A solution to this problem could be to
assign at least one element to the safety considerations raised by those interactions.
• The weight of the partial Safety Cases has to be checked by staff who know the total
Safety Case. Care should be taken that weak subsystem components do not undermine
stronger dependent parts.
• Effectiveness can be increased when responsibilities are distributed. But, vice versa,
dividing responsibilities carries the potential danger of setting the borders too close. In this
case it can happen that two or more engineers each believe a part to be another person’s
responsibility.
As with most dangers, it is almost impossible to eliminate them, but by careful planning, in this
case of the overall Safety Case, the dangerous areas can be controlled and the level of danger
can be minimized.

E.8.3.7 Modules in GSN

For better illustration, graphical notations are used. It is possible to include the modular approach
in GSN. It is important to note that the reference between the module and its parent system has to
be clearly defined. Packages from the UML (Unified Modelling Language) standard are illustrated
in the following picture:

Figure E.40: UML modules


This presentation inspired the development of the module extensions:

Figure E.41: Module representation in GSN


Away Goals are goals that are not solved at the place where they occur but by a module. Normally
they are used as a link to another place in the document, or even to a completely new one,
depending on the module and on the preferences of the author. For this reason the name “Away
Goal” was chosen: its solution is away. There has to be a unique reference to the module so that
this relation is unique as well.
Away Goals can support arguments; this applies to strategies as well as goals and contextual
information.
Public goals are differentiated from private goals. Public goals are goals that are accessible from
outside the module. A goal always has to be public if a reference to it is to be created.

Figure E.42: Decomposition principle
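The rule that a goal can only be referenced from another module if it is public can be checked mechanically. The following sketch is a minimal illustration under assumed names; it is not part of the GSN notation itself.

```python
from dataclasses import dataclass, field

# Illustrative check that an Away Goal reference is well-formed:
# the target module must exist and expose the referenced goal as public.
# Class and function names are assumptions made for this example.
@dataclass
class Module:
    name: str
    public_goals: set[str] = field(default_factory=set)   # visible to other modules
    private_goals: set[str] = field(default_factory=set)  # internal only

@dataclass
class AwayGoal:
    goal: str
    module: str  # unique reference to the module that solves the goal

def resolvable(away: AwayGoal, modules: dict[str, Module]) -> bool:
    target = modules.get(away.module)
    return target is not None and away.goal in target.public_goals

modules = {"Braking": Module("Braking",
                             public_goals={"G2: braking is safe"},
                             private_goals={"G7: internal watchdog works"})}
```

A reference to a private goal, or to an unknown module, fails this check, which enforces the uniqueness and visibility rules stated above.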


Figure E.43: Example Modular GSN network


Modular software systems are developed by dividing the system requirements between
independent modules. The principles mentioned above offer the possibility of easier reuse,
modification and upgrading.

E.8.3.8 Types of systems

There are two types of systems for which modular Safety Cases are recommended in this report:
• Commercial Off-The-Shelf (COTS) components and
• Systems Of Systems (SOS)
Both are ways of dividing work or systems into smaller parts. Figure E.44 illustrates how
different systems work together and what happens if a new system is integrated. The
dependencies and interactions have to be considered carefully so that the Safety Case remains
sound.

Figure E.44: Car components


Figure E.44 illustrates that some components interact with others. In many cases these
components are produced by different teams or even different suppliers. There is a difference
depending on whether a system is reused, designed for a specific system, or modified. COTS
products are normally reused and have different requirements and evidence than SOS, which are
normally produced to fit a specific environment.
SOS: With many systems it is impossible to demand that every single system be safe in isolation.
In the automotive industry there are a lot of suppliers. A company XYZ builds gearshift
systems for many different OEMs. The idea of the modular Safety Case offers the possibility
to ensure that their part of the whole system is safe. This relation should be documented in
any case (cf. other subtasks of WT 3.1). The documentation of these interfaces could be
done by contracts. In this way the idea of the modular Safety Case is justified.
These types play an important role when the relationship between supplier and OEM (Original
Equipment Manufacturer) is considered. The underlying system (e.g. a car) has a lot of
independent subsystems that are built by different manufacturers (e.g. the gearbox).
In this case, modular contracts could be a way to avoid disclosing company secrets.

E.8.4 COTS

COTS products are Commercial Off-The-Shelf components, i.e. standard commercial software
developed without any particular application in mind. The collection of component claims and
supporting arguments and evidence alone is not sufficient. A commercial vendor (e.g. of real-time
operating systems) can provide a Safety Case for his product. In this case a contract should be
delivered as well, to ensure that the component is introduced and applied correctly. It should be
a “partially complete” and “ready to use” Safety Case. The claims, arguments and evidence should
be stated clearly and soundly so that this part can be included in the system Safety Case.
Bought products have the advantage that they reduce costs in the development stage.
Furthermore, since they already exist, development time is reduced. In many applications
(e.g. aircraft, automotive industry, etc.) these components are absolutely necessary.
These components have positive and negative aspects. The main disadvantages are:
• unknown design
• risk and failure analysis is difficult
The original developers often agree to provide a safety certificate, but this normally costs an
additional fee and comes with only very rough evidence. Sometimes the vendors are not even
willing to provide this evidence. In that case the only possibility is to create strong requirements.
For the Safety Case this implies the following:
The Safety Case should ensure safety. This can only be done if the COTS component is safe;
a modular Safety Case ensuring this can help here. Care should be taken that no failure of the
COTS system can result in hazards.
The basic conditions should be established before the product is selected. Once a product is
selected, the conditions have to be recorded in a contract. As soon as those conditions are written
down, the decision of which vendor to choose becomes easier.
Tim Kelly and Fan Ye suggest using so-called Safety Case contracts similar to those used for
modules. They further propose to address SAL (a Safety Case assessment method which is
introduced later). By doing so, the necessary confidence is ensured. The distribution in such a
concept allows the creation of modules; the modular concept was already discussed in previous
chapters.
This distribution has effects on the FTA as well. For the buyer it results in a black-box approach.


(Figure: a system FTA on the left; on the right, the same system FTA built from modules, with a
Main Module and a COTS Module whose content is unknown.)

Figure E.45: Black box approach


The developers of such components often provide a certificate for the safety of their product, but
this normally causes additional costs and includes only rough evidence. A better way would be an
assessment by an independent third authority.
If a component has already been in use, a proof of its correctness and safety could be derived
from this. For example, it is known that the first version of a software application often has
“teething problems” which are reduced in later versions; the same holds for hardware. So the
argument that a system is safe because it has been in use is reasonable, but hardly measurable.


E.9 Assessment

Until now, the Safety Case structure and its contents have been considered. The next point of
interest is an assessment method. In standards, several methods for allocating risk are mentioned.
The main advantages of a good assessment model are:
• Improved system safety
• A defensible statement of best practice
• Certified, evaluated systems have a competitive advantage
• Confidence in the product is increased
• Costs are lower if the system is known to be acceptably safe
This chapter first introduces a method for Safety Case assessment (SAL), which is pursued
later, and it introduces SIL. At the end of the chapter a two-level assessment is introduced.

E.9.1 Safety Case Assessment

The assessment is a subjective opinion and implies two questions:
• Is the argument solid enough?
• Is the evidence sufficient?
A potential problem here is that reliance on these opinions can become too strong too soon,
leading to contentment too early and to satisfaction with less than was initially required.
A strong argument should be both valid and sound (see the definition in E.3.4 Arguments).
Unfortunately, because of the evidence and the inferences, arguments in the GSN model are not
always sound. GSN allows arguments that have true premises but whose conclusion could be
either true or false (and are thus not even valid). Such arguments are called inductive: the
conclusion does not follow by necessity but only with some probability. A kind of probability level
can be assigned to this characteristic. The correctness of all child goals does not imply that the
main goal is true. The argument should state how much the sub-goals add to the satisfaction of
the parent goal, but this is not supported by GSN. An attempt can be made to analyze and allocate
the importance of the sub-goals, i.e. a degree of how much each helps to satisfy the higher-ranking
goal, and thereby the strength of the argument itself. By strength is meant the level of reliability
that can be placed in the argument.
If this “strength” is allocated to the relations in the GSN, the level of information included in the
network is raised; it then contains subjective impressions. The assessment depends on how much
the conclusion depends on the premises and on the probability that the premises are true.
Unfortunately, this is not measurable.

E.9.2 Assessment with SAL

Safety Assurance may be defined as: “A qualitative statement expressing the degree of confidence
that a safety claim is true” [29]. Following this definition a qualitative valuation with regard to the
main argument is sought. The Safety Assurance Level (SAL) has been defined as follows: “SAL is
the level of confidence that a safety argument element (goal or solution) meets its objective” [26].

E.9.2.1 Procedure

Looking at the SAL assessment procedure, it becomes apparent that the starting point is at the
very top of the GSN structure, which often is “system is safe enough to operate”. From there one
goes down until all goals have been examined. Then it is possible to assign a SAL to the evidence
and check this stepwise back up to the top.
Three phases are identified:
1) Top-level safety Assurance Level
2) Parent-child Goal analysis
3) Specify SAL for evidence

(Figure: the argument hierarchy over time — phase (1) at the top level, followed by phases (2)
and (3).)

Figure E.46: Phases assigning SAL


1) The truth of the goal shall be proven by an argument. The assurance of the top-level goal is
the so-called target assurance; every SAL is associated with this (ultimately required)
assurance.
2) Inductive argumentation in particular needs an explanation of how much the premises support
the conclusion. The parent goals define the required cogency of the argument step to the child
goals. In top-down analysis, premises are defined so that the conclusion is acceptable; the
child goals must provide these suitable premises. Convergent support: the relevance of the
child elements is assessed. Linked support: it is examined whether the argument is necessary.
It is helpful if all child elements fit one of these two categories; otherwise the connection to the
parent goal can be opaque.
3) Finally, it is possible to confirm that the evidence sufficiently assures the basic premises and
supports the conclusion in an acceptable way by traversing the structure from bottom to top.

E.9.2.2 Safety Assurance Levels (SAL)

Analogous to SIL (Safety Integrity Level) and DAL (Development Assurance Level), SAL is defined
in four levels. They indicate the required level of assurance; the highest level is represented by
four. Possible levels of relevance are: (Near) Valid, High, Medium and Low.
The next table is an example of possible starting SAL criteria. If the hazard is critical, then failure
to meet the top-level claim equates to the potential of the system to cause a critical hazard, i.e.
SAL 3.
Negligible hazards   Marginal hazards   Critical hazards   Catastrophic hazards
SAL 1                SAL 2              SAL 3              SAL 4

Table E.6: Example SAL classification
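Table E.6 amounts to a direct lookup from hazard class to starting SAL. The following minimal sketch is illustrative; the constant and function names are assumptions made for this example.

```python
# Starting SAL per hazard class, mirroring Table E.6.
# The dictionary and function names are illustrative assumptions.
STARTING_SAL = {
    "negligible": 1,
    "marginal": 2,
    "critical": 3,
    "catastrophic": 4,
}

def starting_sal(hazard_class: str) -> int:
    """Return the example starting SAL for a classified hazard."""
    return STARTING_SAL[hazard_class.lower()]
```

For a critical hazard this yields SAL 3, matching the worked statement above.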


For the top-down analysis it has to be decided whether the underlying argument relies on linked or
convergent support. Accordingly, the following rules have to be respected:


The SAL decomposition across child elements comes down to three factors:
• the support type of the child elements
• the assurance required by the parent goal
• the relevance of the support to the conclusion
The support types are divided into linked and convergent support. These two types are treated
differently in this concept. First, linked support is considered.

E.9.2.3 Linked support

Linked support means that several elements together support the main argument.
Parent SAL   Relevance          Child SAL
S4           Valid/Near Valid   S4
S3           Valid/Near Valid   S3
S3           High               S4
S2           Valid/Near Valid   S2
S2           High               S3
S2           Medium             S4
S1           Valid/Near Valid   S1
S1           High               S2
S1           Medium             S3
S1           Low                S4

Figure E.47: Example SAL for linked support


For linked-support elements the following rules exist: if the parent goal is SAL 4, the relevance of
the linked-support child elements to the parent goal must be valid or near valid. The linked-support
child elements must maintain the level of assurance; this is why all child elements of a SAL 4
parent in the table above must be SAL 4.
Parent goals with a lower SAL can pass on a different SAL. If their linked-support child elements
provide total coverage of the parent goal, they must also maintain the SAL of the parent goal; but
for goals which are SAL 3, 2 or 1, full assurance is not necessarily required, so it is acceptable to
have reduced relevance with respect to the parent goal, in which case the child elements get the
same or a higher SAL.
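The linked-support rules of Figure E.47 can be read as a lookup table from (parent SAL, relevance) to the minimum child SAL. The following Python transcription is a sketch; the key "valid" is assumed to stand for both Valid and Near Valid relevance.

```python
# Minimum child SAL for linked support, transcribed from Figure E.47.
# "valid" covers both Valid and Near Valid relevance.
LINKED_CHILD_SAL = {
    (4, "valid"): 4,
    (3, "valid"): 3, (3, "high"): 4,
    (2, "valid"): 2, (2, "high"): 3, (2, "medium"): 4,
    (1, "valid"): 1, (1, "high"): 2, (1, "medium"): 3, (1, "low"): 4,
}

def linked_child_sal(parent_sal: int, relevance: str) -> int:
    try:
        return LINKED_CHILD_SAL[(parent_sal, relevance)]
    except KeyError:
        # e.g. a SAL 4 parent admits only valid/near-valid relevance
        raise ValueError(f"no linked-support rule for SAL {parent_sal} "
                         f"with {relevance} relevance")
```

Note how reduced relevance pushes the required child SAL up: a SAL 2 parent with only medium relevance demands SAL 4 children.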


(Figure: a SAL 2 parent goal “System XYZ is safe”, argued by the strategy “Argument by AB” over
two child goals, “Sub-system A is safe” and “Sub-system B is safe”, each carrying SAL 3.)

Figure E.48: Child SAL linked support


Figure E.48 shows an example of how linked-support child elements get a higher SAL because
their relevance to the parent SAL is medium and not valid.

E.9.2.4 Convergent support

In convergent support, several premises separately support the conclusion, so the assurance of
the parent goal is divided among the independent child elements. It is important to establish the
independence of the child elements, which can be either mechanistic (the same theory applied in
different ways) or conceptual (different underlying theories). Conceptual independence is desirable
as it provides higher assurance.
Parent SAL   Independence   Child elements   Child SAL   Child SAL
S4           Conceptual     2                S2          S4
S4           Conceptual     2                S3          S3
S3           Conceptual     2                S1          S3
S3           Conceptual     2                S2          S2
S2           Mechanistic    2                S1          S1

Table E.7: Example SAL convergent support


As an explanation of why a child SAL can be lower, consider a situation where convergent support
is assumed (i.e. multiple sub-goals can each independently support the parent goal). There it is
feasible to reduce the assurance of a sub-goal. Often, additional protection systems are added
when the main form of protection cannot be shown to be sufficiently high. As an example: in a car,
driver awareness, brakes and steering should be sufficient to prevent an accident, but it cannot be
assured that this level is high enough, so seat-belts and airbags are also included to minimize the
effects of an accident.
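Table E.7 can likewise be treated as a lookup of admissible decompositions. The sketch below assumes, as in the table, that exactly two child elements are used; the names are illustrative.

```python
# Admissible convergent-support decompositions, transcribed from Table E.7:
# (parent SAL, independence) -> allowed pairs of child SALs (two children each).
CONVERGENT_PAIRS = {
    (4, "conceptual"): {(2, 4), (3, 3)},
    (3, "conceptual"): {(1, 3), (2, 2)},
    (2, "mechanistic"): {(1, 1)},
}

def convergent_ok(parent_sal: int, independence: str,
                  child_a: int, child_b: int) -> bool:
    """Check whether two independent child SALs jointly satisfy the parent SAL."""
    pair = tuple(sorted((child_a, child_b)))
    return pair in CONVERGENT_PAIRS.get((parent_sal, independence), set())
```

For example, a SAL 4 parent with conceptually independent children of SAL 2 and SAL 4 is admissible under this transcription, while two SAL 2 children are not.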


(Figure: a SAL 4 goal G1 “Car is safe enough”, argued via strategy S1 “Argument by speed-
reducing equipment”, convergently supported by G2 “Manual brakes safe enough” (SAL 3) and
G3 “‘SafeSpeed’ safe enough” (SAL 2).)

Figure E.49: Convergent support illustrated


The example illustrates the discovery that the manual brakes alone are not enough to support the
SAL 4 parent goal. Another possibility was identified: reducing the speed by the “SafeSpeed”
function. Together these independent child elements are enough to satisfy the main goal.
When all sub-goals are assessed, SALs can be assigned to the solutions. But it also has to be
proven that they are met. Apart from this, their assessment is the same as that of the sub-goals.
These SALs indicate how much trust can be placed in the evidence and how much it helps to show
safety. Some important influences are: faults in the presented evidence, the review level, the
experience and qualification of the personnel, etc.
Unfortunately, SAL is still under development and is not currently in use in a real application.
At this point an example created by Jane Fenn [16] is illustrative: assuming there is a function that
should be shown to be safe and it has SIL 4, if it is required to provide an argument over two
independent functions, they have to have SIL 3 and a combinator has to be used.

(Figure: a SIL 4 function decomposed following DefStan, with the justification “argument over
SIL 3”: a SIL 4 combinator combines two SIL 3 arguments.)

Figure E.50: SIL Decomposition


However, following the SAL decomposition, the following figure is obtained:

(Figure: a SAL 4 function with an “argument by linked support over independent functions”; by the
rules for linked support (justification J) both child functions inherit SAL 4. Below, a convergent-
support argument over further functions allows, by the convergent-support rules (justification J:
SAL 3 because of convergent support), two SIL 3 functions.)

Figure E.51: SAL decomposition

E.9.2.5 Compare to other assurance methods

• SAL describes the level of confidence which can be placed in an argument. Safety Integrity
Levels (SIL) and Development Assurance Levels (DAL) do not necessarily target the
evidence with regard to the main goal.
• Furthermore, SAL helps to focus and transparently structure the Safety Case argument.
• SAL can be attached to all types of evidence.
• By using SAL, the required assurance of the evidence is set at this stage, and thus each
item of evidence has its own required level of thoroughness. SAL justifies concentrating
effort on the parts of the argument that require greater assurance. This provides a more
rational approach to the selection of evidence, based upon the argument being generated,
and allows safety engineers to produce suitable evidence of the correct weight.

E.9.3 Assessment of the Evidence

It has been shown that the degree to which the top-level argument is satisfied can be
approximated. But is this enough? Can the quality of the argument be approximated? Arguments
are defined as TRUE/FALSE conclusions, so it is not possible to quantify their outcome directly.
But an attempt can be made using the evidence:


(Figure: flowchart for the assessment of evidence — a document is created; when the document is
ready, it is reviewed by independent persons (a moderator and the author) using a checklist; the
empty checklist becomes a filled checklist; if the review is passed, the document is accepted,
otherwise the cycle repeats.)

Figure E.52: Assessment of the Evidence


Idea:
Evaluate the evidence (which can be represented as a document, e.g. hazard identification
documents, etc.) by a formal review controlled by a checklist. The filled checklist could be added to
the Safety Case, or more specifically to its evidence. By this procedure the evidence would be
evaluated to ensure that it was produced properly. Deeper detail is given in Appendix D of this
deliverable.
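The review loop of Figure E.52 can be sketched as a small function. The function name and checklist items below are assumptions made purely for illustration.

```python
# Illustrative sketch of the evidence review of Figure E.52: a document is
# accepted only when it is ready and every checklist item is satisfied.
def review_evidence(document_ready: bool, checklist: dict[str, bool]) -> str:
    if not document_ready:
        return "rework"
    return "accepted" if checklist and all(checklist.values()) else "rework"

# The filled checklist would then be attached to the Safety Case evidence.
hazard_id_checklist = {
    "all operating modes considered": True,
    "reviewed by independent moderator": True,
}
```

The filled dictionary plays the role of the filled checklist in the figure: it records, per criterion, the outcome of the formal review.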


E.10 Process

In this report a process (Lat. procedere = to go ahead) is defined as a well-defined sequence of
system states or working steps that are required for the design, planning, implementation and
operation of the underlying system. As illustrated in the next figure, development can be divided
into these phases.

(Figure: development divided into the phases design and planning, implementation, and
integration and operation.)

Figure E.53: development phases


It is necessary to define them so that during those phases the necessary procedures are carried
out at the correct time and in the correct way. Such steps are very important, as known from
everyday life; examples are package inserts in pharmaceutical products or assembly instructions
for furniture.

E.10.1 Correct by construction

Processes can be found in many phases of development and in development itself. Almost every
workflow is defined to guarantee that best practices are carried out properly in the future. It is
important to note that this approach has effects on safety as well. Building a product in a safe
way cannot guarantee the absence of failures, but it can minimize their presence. The idea is to set
up a process which helps to avoid systematic failures. This approach is widely used in standards,
where successful procedures are written down. Standards like IEC 61508 also provide a list of
methods which can be applied in the lifecycle. The selection of these methods produces
requirements which meet those of the design process. The difference between random hardware
failures and systematic failures is that random hardware failure rates can be predicted with
reasonable, quantifiable accuracy, whereas systematic failures cannot be accurately predicted or
statistically quantified, because the events leading to them cannot easily be foreseen.
In general one could say: the better the development process, the lower the systematic failure
rate. Best practices should be guaranteed throughout the whole engineering lifecycle.

E.10.2 Engineering Process

A Systems Engineering process is a logical, systematic set of processes selectively used to
enable the realization of systems. This includes the above-mentioned development phases. The
construction of such a process model is very difficult, but once it is implemented, future work is
more structured.
Normally the development processes are arranged in the so-called “V-model” to underline the
connections between them; one example of such connections is test cases. This is illustrated in
the following figure:


(Figure: V-model — the descending branch comprises requirement, system requirements, system
architecture, software requirements, software design and software development; the ascending
branch comprises software integration, software testing, system integration, system test and total
system acceptance; data are adjusted between the branches and concurrent-engineering
principles apply; supporting processes are project, supplier, configuration, change, quality and
problem management.)

Figure E.54: V-model


The structure illustrated above can be found in hardware as well as in software engineering.
Process models are useful to explain the process sequence.
In each of those processes, the underlying activity flows, tools, staff and methods are determined
by the responsible “process owner”. Furthermore, the process owners define the input and output
documents that are necessary to get from one process to the next. The process itself can be
decomposed again into work steps. In each work step, documents are created (e.g. to state the
input/output). In engineering the most common notation is UML (Unified Modeling Language); this
can be used to describe failure scenario classes and phase diagrams. Sometimes so-called
“milestones” are defined; these are specific stages which are normally communicated by specific
documents.

E.10.3 Safety Process

Safety processes can be seen as the previously defined application flow of Safety Engineering
techniques. Some standards define product-specific process steps.


(Figure: identification of hazards, followed by classification of hazards and hazard occurrence
analysis, the establishment of dependability-related requirements, and Safety Case construction;
these activities run alongside the development and design of the integrated safety system and the
verification and validation of the dependability-related requirements.)

Figure E.55: EASIS dependability framework (taken from EASIS D3.1)


In Figure E.55 the safety-related activities are illustrated. The first activity is always the
identification of hazards; its results are documented in a dedicated hazard identification document.
Beforehand, a thorough system development and design description has to be produced. It should
be mentioned that this point could be seen as the most important, as it is the basis for the following
ones. The next steps are hazard classification and hazard occurrence analysis. These two can be
done in parallel, but to increase the performance of these activities it is possible to address the
classification first and assess the need for detailed occurrence analysis for the critical hazards. In
any case, these activities form the basis of the dependability-related requirements, which are to be
established next. It is important to show that these requirements are fulfilled; otherwise no trust
can be put in the safety of the system (this is called verification). Furthermore, it should be shown
that the system fulfils the expectations of typical users and other stakeholders (this is called
validation).
In summary, the basis of these safety activities is the engineering; the idea comes to mind that the
two are closely linked. There are several ideas how to achieve this, and it should be done within a
process.
One idea is illustrated in the following figure. The process could be based on the waterfall model,
but other models could be used. For a detailed analysis of the integration of the EASIS
dependability framework (DAF) into a system development process, the Appendix of D4.1 should
be consulted.


(Figure: waterfall sequence — system design, hazard identification (HI), hazard classification
(HC), hazard occurrence analysis (HO), requirements, verification and validation (V&V).)

Figure E.56: DAF using the Waterfall model

E.10.4 System Engineering and Safety Processes

This chapter focuses on the relationship between engineering and the Safety Case.
There is an ongoing two-way relation between engineering and safety, which is illustrated in the
following figure:

(Figure: Engineering <-> Safety)

Figure E.57: Relation between Safety and general Engineering


It is difficult to separate engineering from safety. They are closely linked, as decisions made
during the safety process influence engineering issues and vice versa. These tightly linked
processes could perhaps be merged; this would create a more complete development process
including more of the influencing aspects.
On the one hand, models are developed by the system design engineer to specify and to analyse
the expected behaviour of the system under consideration. This is done on different levels of
abstraction, resulting in requirements models or lower-level architectural models.
On the other hand, the envisaged system is analysed by safety specialists with respect to
malfunctions (unintended behaviour). The safety analysis, performed at each stage of the system
development, is intended to identify all possible hazards with their relevant causes. Among the
common safety analysis methods are Functional Hazard Analysis, Failure Mode and Effects
Analysis (FMEA) and Fault Tree Analysis (FTA). In part, mathematical methods are used (FTA,
stochastic models such as Markov processes). The relation between those activities is illustrated
in the next picture.


(Figure: interplay between engineering activities and the Safety Case — a safety requirement is
set up; the design is changed with the requirement in mind, yielding a new system; requirement
analysis and verification and validation check whether the requirement is fulfilled; if the check fails,
a decision is required; if it is accepted, evidence is produced.)

Figure E.58: Relation between Safety Case and engineering activities


The requirements from the safety specialists influence the system design. Even when the system
design changes, each requirement should either remain fulfilled or finally be eliminated (e.g. if a
hazardous element is replaced by another one which cannot cause such hazards).

[Figure content: the engineering activities (defining requirements and system definition; development of design and structure; implementation/development/production; integration; system acceptance) interleaved with the safety activities (hazard identification and classification; hazard occurrence analysis).]
Figure E.59: Safety integrated in engineering


An attempt at coupling these two processes is illustrated above. A better-structured process was created during the EASIS project: the so-called EASIS Engineering Process (EEP).

E.10.5 EASIS Engineering Process

The members of the EASIS project defined a dedicated engineering process. As illustrated in the next figure, this process has clearly defined start and end states.

Figure E.60: EEP (taken from EASIS WP4)

Figure E.61: Example dependability process (taken from EASIS WP 4)


The figure above illustrates how the documents that are produced help to form the Safety Case itself. As already described, there are different phases, and the documents can be related to them as well. First there is the Preliminary Safety Case, which requires a functional description and a model description. The same holds for engineering: to produce something properly, a description should be given, or at least it should be possible to give one. The further the process has progressed, the more detailed the description will be. Clearly, a very rough hazard identification and analysis can be performed at the beginning, for example once the environment is given; some hazards (e.g. a crash, when talking about a flying object) are imposed by the system's very nature. The first dependability-related requirements can be derived from these preliminary documents, so the Preliminary


Safety Case can be produced. All documents have to be reviewed. During the whole life cycle the above-mentioned documents become more detailed and new versions are issued; this means that not only the system but also the Safety Case itself progresses. Both functional and design decisions are made throughout system development, so the system changes several times, although these changes need not result in large changes visible from outside. In any case it is an ongoing process, which implies interim Safety Cases and, finally, after the final releases of the documents have been reviewed, a dependability verification. Again, this is not a prescriptive method.

E.10.6 Process Assessment

The evolutionary nature of the standards has resulted in the majority providing a 'cook book' of recommended processes, dependent on the risk that the system presents to human life. Not every process defined by a standard or guideline fits the industry in question and the processes usually carried out by its companies. In those cases processes are slightly changed so that they become applicable. These changed company processes then have to be checked to confirm that they still perform the intended functions. This is what is meant by process assessment. There are two main questions to be answered when this aspect of Safety is examined:
• Is the process able to create a safe product?
• Is the process carried out properly?
The first point leads to a philosophical discussion: who decides what a safe product is? In a semi-prescriptive approach, the answer could be taken to be the process provided by the underlying standard; in a pure goal-based approach this problem is hard to address. The second question is easier to answer: adherence to best practices can be checked by a checklist, a review, and similar means.
It should be mentioned that the order of the process steps is important and has to be checked as well.

E.10.7 Problems

The following problems arise when applying processes:

• Processes reflect 'best practice' at the time of writing, but times change. Sometimes the documented best practices are outdated and newer ones are already available, so this should be re-checked at regular intervals. When a new standard appears, it has to be checked whether the new standard is relevant for the industry, product, etc. in question.
• The authors of standards do not always give reasons for their recommendations.
• Sometimes the rationale behind the choice of processes is missing.


E.11 Tools

Tools to generate evidence (the sources of evidence are explained in chapter E.7.6, Evidences):
• Safety analysis tools
• Tools for collecting and analysing field experience
• Test tools
• FMEA and FTA tools (e.g. IQFMEA)
• PHA and HAZOP studies to identify risks and safety concerns
Tools to support notations / integrated Safety Case tools:
• ASCE (Adelard Safety Case Environment)
• ISCaDE (Integrated Safety Case Development Environment)
• eSafetyCase (electronic Safety Case)
• GSNCaseMaker (GSN Case Maker)
• SAM (Safety Argument Manager)

E.11.1 ASCE tool

ASCE stands for Assurance Safety Case Environment. This graphical hypertext tool helps to create, review, analyse and disseminate safety and assurance documentation. Its underlying concept is that argument and evidence must go hand in hand, and it supports both the GSN and CAE notations. First the notation is chosen; then drawing the network can begin. Generally speaking, it is a useful tool for creating hypertext documentation, for graphically supporting any argument, and for illustrating and building technical relations. The availability of 'plug-ins' in the newer version provides good and useful extensions. The main functions, as found in a review, are:
• Drawing/creating networks and thereby structures (e.g. Safety Case structures)
• Checking network structure and spelling
• Support for status fields and their extensions, such as "undeveloped" (represented by a diamond below the box)
• Export as a Word or HTML document
• Adding text to each element, both directly in the field and as a link to the web, to another element or to another file
• The order in which items appear in the HTML export can be defined
• Work can be protected by setting a password (letters and numbers, case-sensitive); protected files can be viewed, but changes cannot be saved without entering the correct password
• Definition of so-called "schemas" (e.g. for a Why-Because Analysis: defining nodes, checking rules, status fields, etc.)
• Locking of opened files: only one editor can have a file open, which prevents concurrent modification
• Plug-ins can be written and applied to use the tool more efficiently (e.g. pop-up windows to present and collect information from the user, analysis of the structure of a network, propagation of data values over the network, connection to third-party tools and file formats, interaction with the content of the HTML editor, and exporting data)


Figure E.62: ASCE screenshot (GSN network)


The tool can do much more, however. A report can be created containing an overview (the graph above) and an explanation of each item in HTML format. The report could look like this:
Goal G7 Requirements are already fulfilled
[Back to main map]

Parent nodes:

o Solves Strategy: Argument by level of implementation

Child nodes:
o Is solved by Strategy: A5 Argument by different analysis

The safety related requirements have been implemented properly.


Goal G8 All hazards identified
[Back to main map]

Parent nodes:

o Solves Strategy: Arguments by different analysis



Another possibility is to specify attributes of the elements; the design is in the hands of the user. The available list comprises: Title-Id, Title-Description, Type, Has External Reference, Development required, Instantiation required, Completed, Resourced, Risk and Confidence.

E.11.1.1 Checking network structure

Checking the network structure is a useful feature, especially for complex systems. The underlying rules can be defined by the user. The following templates are already included in ASCE:
In GSN networks, ASCE checks:
"Network circularities"
"Strategies must be solved by at least one sub-goal"
"Goals must be solved by at least one goal, strategy or solution"
"Solutions must not be solved by anything"
"Only one top level node (excluding notes)"
"Eventually, all Option nodes should be removed"
"Eventually, all n-iteration and 0/1 choice links should be removed"
"Solutions, Assumptions, Justifications and Contexts, should only have incoming links"
"All nodes should eventually have status completed"
ASCAD network check rules:
"Network circularities"
"Claim with no sub-claim, argument or evidence"
"Argument with no sub-claim, argument or evidence"
"Claim or evidence with direct evidence link"
"Floating claim"
"Evidence with claim or argument as input"
"All nodes should eventually have status completed"
One advantage is that the boxes next to the rules can be ticked to make them optional (e.g. "All nodes should eventually have status completed" may deliberately be left unchecked while a structure is still being implemented). These rules help to create a structure that is complete.

E.11.1.2 Improvements

There can only be one so-called "main image". This implies the use of only one HTML file, meaning that only the newest version can be kept in a folder; change management is therefore an open issue.
There should be a better default rule for the export order, such as: after any element, visit its model/assumption/justification/context next, going left first where possible.
Another possible improvement would be support for patterns (explained in chapter E.8.2, "Reusing Arguments by applying Patterns").
Because ASCE networks are plain XML, an external application can always be written to report on them in any way required (even if the tool does not support it directly). ASCE also supports user plug-ins, which allow for custom reporting behaviours.
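As a sketch of this idea, the snippet below parses a small, invented ASCE-style XML fragment (the element and attribute names are assumptions for illustration, not ASCE's actual schema) and reports nodes that do not yet have status "completed":

```python
# External reporting sketch over an ASCE-style XML network.
# The element/attribute names below are illustrative assumptions; a real
# ASCE file would have to be inspected for its actual schema.

import xml.etree.ElementTree as ET

SAMPLE = """
<network>
  <node id="G7" type="Goal" status="completed">Requirements are already fulfilled</node>
  <node id="G8" type="Goal" status="undeveloped">All hazards identified</node>
  <link source="S_A5" target="G7"/>
</network>
"""

def report_incomplete_nodes(xml_text):
    """List (id, status) of nodes whose status is not yet 'completed'
    (cf. the rule 'All nodes should eventually have status completed')."""
    root = ET.fromstring(xml_text)
    return [(n.get("id"), n.get("status"))
            for n in root.iter("node")
            if n.get("status") != "completed"]

print(report_incomplete_nodes(SAMPLE))   # [('G8', 'undeveloped')]
```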


E.11.2 ISCaDE

ISCaDE stands for Integrated Safety Case Development Environment and is a product of rcm2 Ltd. The procedure for creating such a network is illustrated in the following figure:

Figure E.63: ISCaDE procedure


The most important benefit is the reuse of safety requirements. After the Safety Argument has been broken down to a top-level requirement, that requirement can be broken down further by following typical structures; ISCaDE can do this automatically. In the next figure, an example Hazard Log form in DOORS is presented:

Figure E.64: Example Hazard Log Form (taken from Fararooy's paper [25])
ISCaDE uses DOORS (Dynamic Object-Oriented Requirements System), a Telelogic Ltd. product, as its database for managing requirements. DOORS is a COTS product in which all requirements and their validations are stored, managed, or at least linked to.
It is thus possible to combine hazard analysis techniques with intelligent requirements management. The idea was that the created environment should span all relevant areas, such as a hazard log, risk assessment, safety requirements and graphical support (e.g. GSN).


Standard support: ISCaDE supports the import of several standards that help to develop requirements for ensuring safety already in early design phases. Any structured Word document can be turned into a database table, in which attributes (e.g. validation and verification criteria) may be assigned to each table entry.

Hazard Log: Each hazard can be addressed by a unique Hazard ID and easily inserted as a DOORS object. It can be shown in a window including several fields: Issue; Category/Scenario (hazards can be categorized for easier handling); Hazard (a description of each hazard); Cause and Consequences (which can be given in detail); Probability (of the hazard occurring); Severity (how catastrophic the consequences are if it occurs); overall risk rating (a category given by risk matrices); Safeguard/Mitigation (what can be done to lessen the consequences); Severity, Probability and overall risk rating after this mitigation (showing whether the risk is ALARP once the safeguard practices are carried out); Action (what should be done); Owner (who carries it out); and Notes (for additional information).

Gap Analysis: The Gap Analysis identifies whether any unresolved hazards remain. These can be new hazards implied by standards, or newly developed requirements that address existing hazards but are not yet satisfied.

Risk Matrices: ISCaDE categorizes the hazards from the Hazard Log by means of risk matrices.

SC Notations: ISCaDE is able to generate a notation (e.g. GSN) autonomously.
It should be noted that all project information, including safety, is managed within a single object-oriented database in a multi-user environment, which avoids duplication and wasted effort and gives all team members transparent access to the information.
Open questions are:
• Is such an automated approach trustworthy?
• Are all necessary contexts added?
• Are all arguments formulated in an appropriate manner?
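The Hazard Log fields and the risk-matrix categorisation described above could be modelled roughly as follows; the field names follow the description, while the concrete matrix thresholds and example values are purely illustrative assumptions:

```python
# Sketch of a hazard-log entry and risk-matrix categorisation.
# Field names follow the ISCaDE description above; the matrix thresholds
# and example data are illustrative assumptions, not taken from the tool.

from dataclasses import dataclass
from typing import Optional

# Illustrative 4x4 risk matrix: rating = f(severity, probability)
RISK_MATRIX = {
    (sev, prob): ("intolerable" if sev + prob >= 7 else
                  "undesirable" if sev + prob >= 5 else
                  "tolerable"   if sev + prob >= 3 else
                  "negligible")
    for sev in range(1, 5) for prob in range(1, 5)
}

@dataclass
class HazardLogEntry:
    hazard_id: str
    hazard: str                 # description of the hazard
    cause: str
    consequences: str
    severity: int               # 1 (minor) .. 4 (catastrophic)
    probability: int            # 1 (improbable) .. 4 (frequent)
    mitigation: str = ""
    severity_after: Optional[int] = None
    probability_after: Optional[int] = None
    owner: str = ""
    notes: str = ""

    def risk_rating(self):
        return RISK_MATRIX[(self.severity, self.probability)]

    def risk_rating_after_mitigation(self):
        if self.severity_after is None or self.probability_after is None:
            return None
        return RISK_MATRIX[(self.severity_after, self.probability_after)]

h = HazardLogEntry("HZ-001", "Unintended braking", "Sensor fault",
                   "Rear-end collision", severity=4, probability=3,
                   mitigation="Plausibility check on sensor input",
                   severity_after=4, probability_after=1)
print(h.risk_rating(), "->", h.risk_rating_after_mitigation())
# intolerable -> undesirable
```

A gap analysis then amounts to filtering such entries for those whose post-mitigation rating is missing or still above the tolerable threshold.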

E.11.3 eSafetyCase

The eSafetyCase (electronic Safety Case) is a Praxis High Integrity Systems product. It is based on the application of underlying standards; several such formats are integrated and can be chosen directly (e.g. Def Stan 00-56, JSP 518). These formats demand different documents that together ensure safety, and eSafetyCase gives guidance in adding such documents. A Microsoft® Visio plug-in helps to generate arguments and to integrate them by creating GSN structures. An eSafetyCase can be exported as HTML or as PDF, whereby HTML is preferred because of its browsing benefits.


Figure E.65: eSafetyCase GSN structure (taken from eSafetyCase manual)

Figure E.66: eSafetyCase itself (screenshot)


E.11.4 GSNCaseMaker

GSNCaseMaker is a software application based on Microsoft® Visio. It was developed by ERA Technology Ltd. in cooperation with CET Advantage Ltd. and the University of York. Wherever a clearly stated argument is needed, GSN is a useful approach.

Figure E.67: Example eSafetyCase (taken from www.era.co.uk)


Advantages are:
• compatibility with other Microsoft Office applications,
• generation of high-quality graphical outputs,
• XML import/export capabilities,
• user-definable fields,
• support for modular GSN extensions, and
• argument review facilities.

E.11.5 SAM

The Safety Argument Manager (SAM) is a rather old tool that is no longer sold. In SAM, arguments are expressed in a combination of goal structures and Toulmin's argument form. Its graphical presentation looks as follows:


Figure E.68: Example window from argument representation using SAM


Goal 1 is solved by Goal 1.1 and Goal 1.2. At the left border of each goal are the symbols S (solution), I (informal solution), F (formal solution), M (model) and T (fault tree); if a letter is bold, its corresponding element has already been instantiated. The box with the dark heading provides additional information and can be designed by the author; the name of the author and the date are inserted automatically. As well as expressing the argument, SAM provides a drawing facility for solutions (e.g. fault trees). A hazard log provides an index for the Safety Case itself and links those documents so that they can be found easily.


E.12 Illustrations

This chapter contains several illustrations which could be used, with minor adaptations, in the development of a safety-related system.

E.12.1 Safety Case Checklist

A Safety Case checklist can be created either from a defined process (e.g. standards or a company process) or from the steps planned in a Preliminary Safety Case.
For each item, a checklist should include as a minimum: an owner (who is responsible for ensuring the activity occurs), an author (who is responsible for actually creating the relevant documentation), a due date, and a completion date.
The following lists the items of an example Safety Case status checklist, with additional notes indicating potential future developments and streamlining.
The checklist items, grouped by the project gate at which they must be complete, are:

• System Description - Preliminary
• Develop Preliminary Hazard List
• SIL/DL Estimated (this includes, for example, the SFF calculation from IEC 61508, or the equivalent from ISO 26262 when it is available)
• Preliminary System Level FMEA
All preceding items must be complete by Quote Approval.

• Safety Case Program Plan
• Safety Case Schedule / Checklist
• Integrate SC Tasks into Project Plan (completed)
• Preliminary Safety Requirements
All preceding items must be complete by Proposal Acceptance.

• System Description - Detailed
• Preliminary Hazard Analysis
• Safety Integrity Level / Difficulty Level Calculation
• Hazard Verification Test Requirements
• Safety Requirements Analysis: System, HW, SW
• System Level FTA
• System Level FMEA (close actions before Design Verification)
• System Level Software FMEA
• Component Level FMEAs
• Hazard Verification Test Results
All preceding items must be complete by Concept Verification.

• Power and Grounds Sneak Circuit Analysis
• Common Mode/Common Cause Analysis
All preceding items must be complete by Design Verification.

• Maintenance Safety Analysis
• Process FMEA

Note: the main benefit of the technical items listed before Quote Approval is that they ensure the product is correctly scoped prior to final price agreement. This helps minimise the resource tensions (part of the ALARP principle, applied to the process) and hardware cost tensions that always occur as real projects progress.


• VCRI Matrix
• Safety Analysis Report
All preceding items must be complete by Product & Process Validation.
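The bookkeeping behind such a checklist (owner, author, due date, completion date, and gates that require all preceding items to be complete) could be sketched as follows; the item and gate names come from the example checklist, while everything else is an illustrative assumption:

```python
# Sketch of checklist bookkeeping: each item carries an owner, an author,
# a due date and a completion date, and a gate (e.g. "Quote Approval") may
# only be passed once all preceding items are complete. Illustrative only.

from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ChecklistItem:
    name: str
    owner: str          # responsible for ensuring the activity occurs
    author: str         # responsible for creating the documentation
    due: date
    completed: Optional[date] = None

def gate_passable(items, gate_name, up_to_index):
    """All items preceding the gate must have a completion date."""
    blockers = [i.name for i in items[:up_to_index] if i.completed is None]
    if blockers:
        print(f"Gate '{gate_name}' blocked by: {', '.join(blockers)}")
    return not blockers

items = [
    ChecklistItem("System Description - Preliminary", "A. Smith", "B. Jones",
                  date(2006, 3, 1), completed=date(2006, 2, 20)),
    ChecklistItem("Develop Preliminary Hazard List", "A. Smith", "C. Lee",
                  date(2006, 3, 15)),
]
assert not gate_passable(items, "Quote Approval", up_to_index=2)
```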


E.12.2 Process Part

The following figure illustrates a part of the process Safety Case using the ASCAD notation. The figure shows the first phase of the EEP ("Requirements Specification"), which is divided into four sub-steps. Each step has its own evidence, namely the document produced by that step; the outgoing document is thus the evidence that a process step has been performed. Two work products (A2 and A4) are updated in later phases and are therefore marked as incremental documents. The grey arrows show the flow of a document into subsequent process steps; for example, A1 is an input document for step 1.4.

[Figure content: the claim A15 "All process steps are done and all work products are created" has the subclaim 1 "Requirements are specified", which in turn has the subclaims 1.1 "Natural Language Requirements are captured", 1.2 "PHA is performed", 1.3 "Risk Mitigation Requirements are defined" and 1.4 "Structured System Requirements are specified". These are evidenced by A1 "Natural Language Requirements Specification", A2-1 "PHA results", A3 "Risk Mitigation Requirements" and A4-1 "Structured System Requirements", where A2 "PHA Analysis Results" and A4 "Structured System Requirements Specification" are marked as incremental documents.]

Figure E.69: Illustration of the Process Part using the EEP from WP 4
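The claim/evidence decomposition shown in Figure E.69 can be read as a small tree in which a claim is supported either by evidence documents or by subclaims. A rough sketch, with names mirroring the figure and an assumed (not ASCAD-prescribed) data model:

```python
# Claim/evidence tree sketch for the process part of Figure E.69.
# The node names mirror the figure; the data model itself is an
# illustrative assumption.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Claim:
    text: str
    subclaims: List["Claim"] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)

    def is_supported(self):
        """A claim is supported if it has direct evidence, or if all of
        its subclaims are (recursively) supported."""
        if self.evidence:
            return True
        return bool(self.subclaims) and all(c.is_supported()
                                            for c in self.subclaims)

requirements = Claim("Requirements are specified", subclaims=[
    Claim("Natural Language Requirements are captured",
          evidence=["A1 Natural Language Requirements Specification"]),
    Claim("PHA is performed", evidence=["A2-1 PHA results"]),
    Claim("Risk Mitigation Requirements are defined",
          evidence=["A3 Risk Mitigation Requirements"]),
    Claim("Structured System Requirements are specified",
          evidence=["A4-1 Structured System Requirements"]),
])
top = Claim("All process steps are done and all work products are created",
            subclaims=[requirements])
assert top.is_supported()
```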


E.13 References

[1] Kelly, T. P.: “Arguing Safety – A Systematic Approach to Managing Safety Cases” DPhil Thesis, Department of
Computer Science, University of York, UK, September 1998
[2] Kelly, T. P. and Weaver, R. A.: “The Goal Structuring Notation – A safety Argument Notation”
[3] Bishop, P. and Bloomfield, R.: A Methodology for Safety Case Development, Adelard, London, UK
[4] U.K. Ministry of Defence “00-56 Safety Management Requirements for Defence Systems,” Ministry of Defence,
Defence Standard December 1996.
[5] Govier, T. : “A Practical Study of Argument”, 3rd ed. Belmont CA: Wadsworth, 1992
[6] Toulmin, S.: “The Uses of Argument.” Cambridge: Cambridge University Press, 1958
[7] Adelard LLP: ASCE tool manual (www.adelard.com)
[8] Tiemeyer, B.: ”Performance evaluation of satellite navigation and Safety Case development”, Universität der
Bundeswehr München, 2002
[9] “Yellow Book” (pages 1-4, Figure 1-1)
[10] U.K. Ministry of Defence, “JSP 318B - Regulation of the Airworthiness of Ministry of Defence Aircraft”, UK
Ministry of Defence, 1999.
[11] Greenwell, W. S., Strunk, E. A. , and Knight, J. C.. “Failure Analysis and the Safety Case Lifecycle.”, University
of Virginia, 2004
[12] Greenwell, W. S: “Pandora: An Approach to Analyzing Safety-Related Digital System Failures”
[13] Kelly, T. P.: “A Systematic Approach to Safety Case Management”, University of York, UK, 2003
[14] Kelly, T. P.: “A six step Method for Developing Arguments in the Goal Structuring Notation”, York, England,
September 1999
[15] Emmet, L.: personal communication (email), 15 February 2006
[16] Fenn, J. and Jepson, B.: “Putting Trust into Safety Arguments”, BAE Systems, Warton Aerodrome, Preston, Lancashire, PR4 1AX
[17] MoD SMP12 Safety Case and Safety Case Report, July 2004
[18] Thaller, G. E.: “Software-Test-Verification and Validation”, 2nd edition, Hanover 2002
[19] Ehrenberger, W.: “Software verification- procedures for proving software reliability”, Hanser-Verlag 2002
[20] Weaver, R.A.; McDermid, J.A.; and Kelly, T.P.: “Software Safety Arguments: Towards a Systematic
Categorisation of Evidence”, Proceedings of the 20th International System Safety Conference, Denver, 2002.
[21] Gamma, E.; Helm, R.; Johnson, R. and Vlissides, J.: “Design Patterns: Abstraction and Reuse of Object-
Oriented Design,” presented at ECOOP'93 - Object- Oriented Programming, 7th European Conference,
Kaiserslautern, Germany, 1993.
[22] Bass, L.; Clements, P.l and Kazman, R.: “Software Architecture in Practice”, 2nd edition publisher: Addison
Wesley, 2003
[23] Bate, J. and Kelly, T. P.: “Architectural Considerations in the Certification of Modular Systems”,
(SAFECOMP'02), publisher: Springer-Verlag, September 2002.
[24] Kelly, T. P.: “Managing Complex Safety Cases, (SSS'03)”, February 2003, published by Springer-Verlag
[25] Fararooy, S.: “Managing a System Safety Case in an Integrated Environment”, Esher, Surrey UK
[26] Weaver, R. A.: “The Safety of Software – Constructing and Assuring Arguments”, DPhil Thesis
[27] Wu, W.: “Safety Tactics for Software Architecture Design”, MSc Thesis
[28] ARP 4754: “Certification Considerations for Highly-Integrated or Complex Aircraft Systems”, Society of
Automotive Engineers, 1996.
[29] ARP 4761: “Guidelines and methods for conducting the safety assessment process on civil airborne systems
and equipment”, Society of Automotive Engineers
[30] Def Stan 00-54: “Requirements for Safety Related Electronic Hardware in Defence Equipment”, UK Ministry of
Defence, 1997.
[31] Def Stan 00-55: “Requirements for Safety Related Software in Defence Equipment”, UK Ministry of Defence,
1997.


[32] “Development Guidelines for Vehicle Based Software”, MISRA, 1994.


[33] “DO-178B Software Considerations in Airborne Systems and Equipment Certification”, RTCA, 1999.
[34] “DO-254 Design assurance guidance for airborne electronic hardware”, RTCA
[35] “IEC 61508, Functional safety of electrical/electronic/programmable electronic safety-related systems”, IEC,
1998.
[36] “MIL-STD-882D Standard Practice for System Safety”, U.S. Department of Defense, 2000.
[37] “CAP 670 Air Traffic Services Safety Requirements”, Safety Regulation Group Civil Aviation Authority, 2005
[38] Def Stan 00-42: “Reliability and Maintainability Assurance Guides”, UK Ministry of Defence
[39] Def Stan 00-56: “Safety Management Requirements for Defence Systems”, UK Ministry of Defence
[40] Def Stan 00-58: “HAZOP Studies on Systems Containing Programmable Electronics”, UK Ministry of Defence

