STSARCES - Annex 6 : Quantitative Analysis of Complex Electronic Systems using Fault Tree Analysis and Markov Modelling
Annex 6 - Quantitative Analysis of Complex Electronic Systems using Fault Tree Analysis and Markov Modelling
Final Report of WP2.1
SUMMARY
The risk reduction provided by the operation of a safety system can be assessed in different ways. While EN 954‑1 uses a qualitative scale of categories, IEC 61508 uses the Safety Integrity Level (SIL) as a quantitative measure. The SIL is expressed by the probability of a dangerous failure of the safety related device. A procedure is therefore needed to carry the results of a qualitative analysis over into a probabilistic evaluation. Markov models turned out to be the most appropriate tool because they can handle many of the technical features commonly used in modern safety devices. Implementing a new feature enabled the models to reveal the interdependency of the online test rate, the rate of demands on the safety function and the Safety Integrity Level.
Markov models have been developed for several system architectures typical of the machinery sector. By altering the input data, practical questions concerning basic system design parameters, such as diagnostic coverage (DC) or the need for a watchdog test, can be answered. The evaluation results demonstrate the influence of parameter variations and allow a comparison between different system architectures.
The system architectures introduced in this report are proposed as "designated architectures" for the machinery sector. Each can be assigned a category according to EN 954‑1. The basic Markov models developed here make it possible to link the categories of EN 954‑1 to the Safety Integrity Levels of IEC 61508. The link is not fixed, because input information beyond the category is needed to determine the SIL. Some exemplary evaluation results, arranged in a table, may be used to simplify the SIL assessment in some cases. Whenever a manufacturer can prove that the system structure is in accordance with one of the designated architectures and that the quantitative parameters comply with the precalculated examples, no new Markov modelling will be necessary.
Contents
1. Reasons for the need of reliability evaluations
1.1. Aims for the use of safety devices
1.2. Contributions of the standards
1.3. The investigations of this report
1.4. Fault tree analysis and Markov modelling
2. Short introduction to Markov modelling with respect to safety related systems
2.1. General
2.2. Modelling random failures of components with constant failure rates
2.3. Modelling common cause failures
2.4. Modelling online tests
2.5. Modelling demand
2.6. Modelling repair
2.7. Evaluation of a Markov model
2.8. Techniques for reducing the number of Markov states needed
3. Determination of the Safety Integrity Level according to IEC 61508 for the different modes of operation
4. Single channel system without fault detection in accordance with category B or 1 of EN 954‑1
4.1. Description
4.2. Markov model and assumptions
4.3. Result of evaluation
5. Single channel system with implemented tests in accordance with category 2 of EN 954‑1
5.1. Description
5.2. Markov model and assumptions
5.3. Result of evaluation
6. Dual channel system with comparison in accordance with category 3 or 4 of EN 954‑1
6.1. Description
6.2. Markov model and assumptions
6.3. Result of evaluation
7. Dual channel system in mixed technology in accordance with category 3 of EN 954‑1
7.1. Description
7.2. Markov model and assumptions
7.3. Result of evaluation
8. Triple channel system with comparison in accordance with category 4 of EN 954‑1
8.1. Description
8.2. Markov model and assumptions
8.3. Result of evaluation
9. Designated architectures of CES for the machinery sector
10. Conclusions
11. References
Glossary
ASIC | Application specific integrated circuit
C | Diagnostic coverage (DC)
CAT | Category (according to EN 954‑1)
CC | Current converter
CCF | Common cause factor (β)
CES | Complex electronic system
D | Drive
dang | dangerous
DC | Diagnostic coverage (C)
DCSC | Dual channel system with comparison
DCSMT | Dual channel system in mixed technology
E/E/PE | Electrical / electronic / programmable electronic
ES | Emergency stop (actuator)
EUC | Equipment under control
FTA | Fault tree analysis
IN | (Input of a) switch-off path of the drive
Ip, IP | (Input of the) switch-off path of the drive for the PED
Iw, IW | (Input of the) switch-off path of the drive for the watchdog
M | Motor
MTBD | Mean time between demands on the safety function
MTTF, MTTFd | Mean time to dangerous failure
 | (Average) probability of a dangerous failure per hour
PED | Programmable electronic device
PES | Programmable electronic safety related system
PFD | (Average) probability of failure on demand
PLC | Programmable logic controller
RC | Relay circuit
rd | Demand rate on the safety function
rr | Repair rate
rt | Test rate
S | General sensor, rotation sensor
SCS | Single channel system
SCST | Single channel system with implemented tests
SIL | Safety integrity level (according to IEC 61508)
TCSC | Triple channel system with comparison
TM | Mission time
Tr | Average repair time
Tt | Test interval
WD | Watchdog
β | Common cause factor (CCF)
λ | Dangerous failure rate
1. Reasons for the need of reliability evaluations
1.1. Aims for the use of safety devices
The operation of many technical systems involves risks of harm to people. The goal of using safety devices is to reduce these risks to an acceptable level. There is a wide range of technical risk reducing measures. The investigations described in this report focus on complex programmable electronic safety related systems, commonly referred to as PES.
By providing one or more specified safety functions the safety device must make sure that a sufficient reduction of the risk is achieved whenever the equipment under control (EUC) is operated.
1.2. Contributions of the standards
One essential aspect of implementing a safety related system is to consider the particular application in order to derive the risk reduction needed. A risk analysis has to be carried out for every potentially hazardous event that can arise in the operation of the EUC. Methods for obtaining the necessary risk reduction are presented in the standard IEC 61508‑5 [1].
The other aspect is to ensure that the claimed risk reduction is actually attained by the safety device that will be applied. As a consequence, the degree of risk reduction provided by a particular safety system has to be determined. The two standards EN 954‑1 [2] and IEC 61508 [1] both classify electronic safety devices according to their respective properties.
EN 954‑1 [2] has chosen a qualitative approach by defining five categories (B, 1, 2, 3, 4) which differ in the reaction of the safety device after the occurrence of internal faults. Thereby requirements concerning technical realisation are established indirectly.
IEC 61508 [1] distinguishes four safety integrity levels (SIL 1 ... SIL 4) in order to provide a quantitative measure for grading a system’s risk reducing capability. The SIL is expressed by the probability of a dangerous failure of the safety related device.
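The quantitative grading can be illustrated with a short sketch. It assumes the target failure measures that IEC 61508‑1 gives for the high demand or continuous mode of operation, expressed as an average probability of a dangerous failure per hour. The function and constant names are hypothetical and chosen only for this illustration.

```python
# Target failure measure bands for the high demand / continuous mode,
# as given in IEC 61508-1 (probability of a dangerous failure per hour).
# Each entry: (SIL, lower bound inclusive, upper bound exclusive).
# The names below are illustrative assumptions, not taken from the report.
SIL_BANDS_PER_HOUR = [
    (4, 1e-9, 1e-8),
    (3, 1e-8, 1e-7),
    (2, 1e-7, 1e-6),
    (1, 1e-6, 1e-5),
]


def sil_from_failure_probability_per_hour(probability_per_hour):
    """Return the SIL whose band contains the value, or None if no band matches.

    In this sketch, values outside all four bands (including values below
    the SIL 4 band) simply return None.
    """
    for sil, lower, upper in SIL_BANDS_PER_HOUR:
        if lower <= probability_per_hour < upper:
            return sil
    return None


if __name__ == "__main__":
    for value in (3e-6, 5e-7, 5e-9, 1e-4):
        print(value, "->", sil_from_failure_probability_per_hour(value))
```

A lookup of this kind is only the last step of an assessment: the probability fed into it still has to be obtained from a reliability evaluation of the safety device.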
Both standards describe technical means which can be implemented to improve the reliability of a safety related device. Said means include architectural measures, selection of appropriate system components, idle current principle, various kinds of online tests, etc.
It must be emphasised that these means are not intended to improve the availability of the equipment under control (EUC) being supervised by the safety device. In this context reliability signifies the probability that a safety device is able to perform its intended safety function(s).
Lists and descriptions of such techniques and measures can be found for example in EN 954‑1 (chapter 5), EN 954‑2 (to be published in 1999), IEC 61508‑2 (annex A) and IEC 61508‑7 (annex A).
Methods for fault detection are the subject of the work of SP, Sweden (WP 2.2) [3].
1.3. The investigations of this report
Most of the reliability improvement techniques mentioned in the standards can be considered to be well-tried or at least well-known. Nobody will have any doubts that the implementation of these measures will improve the safety device’s reliability, i.e. the probability of a safety device to be able to carry out the safety function(s) it has been developed for.
On the other hand, it is difficult to assess quantitatively what degree of improvement is actually achieved by a particular feature implemented in a particular safety system. Moreover, typical safety devices combine several technical means, for instance hardware redundancy together with a number of different component tests.
Therefore a mathematical tool is needed for evaluating reliability in order to find out which effect has been accomplished altogether.
The final goal of reliability evaluation techniques is to verify if a claimed safety integrity level (SIL) according to IEC 61508 [1] is actually met by a given complex electronic safety device.
This report will present the result of some basic investigations of simplified typical system architectures that can meet categories B, 1, 2, 3 and 4 according to EN 954‑1 [2].
These basic investigations make it possible to gain some information about the link between the categories of EN 954‑1 [2] and the safety integrity level (SIL) according to IEC 61508 [1]. In parallel, practical questions concerning basic system design parameters such as self test rates, diagnostic coverage or the need for a watchdog test can be answered with respect to IEC 61508. The results of a quantitative reliability evaluation demonstrate the influence of parameter variations and allow a comparison of different system architectures.
1.4. Fault tree analysis and Markov modelling
Besides Markov modelling, reliability block diagrams and fault trees can be used for a quantitative analysis of a safety related system. The principle of a quantitative fault tree analysis (FTA) is described in [6] and, in more detail, in [7].
However, for the evaluations presented in this document Markov modelling techniques [4], [5] have been chosen because they can handle many of the technical features usually implemented in modern safety devices. Periodic events such as online tests, in particular, can be modelled quite conveniently.
Nevertheless a qualitative fault tree analysis may be useful in connection with a Markov model. This will be demonstrated in chapter 7.2.
2. Short introduction to Markov modelling with respect to safety related systems
2.1. General
Markov models [4] are an efficient tool for evaluating the probability of the occurrence of states in which a system can dwell while a process is running. In principle this modelling technique is applicable to any type of process (e.g. biological, chemical or physical processes) as long as it meets certain requirements.
Concerning safety related devices, the considered system is given by the hardware of the device and the process is represented by the failures of the system’s components, by online tests, system repair and demand on the safety function. All this is considered during the entire mission time TM of the safety system, where “TM” means a declared span of time during which the system is permitted to be used for safety related applications. The desired result of the Markov model application in this case is the probability of a dangerous failure of the system.
The Markov approach requires that a set of system states is established which covers any single state that might occur during the time under consideration. Furthermore, these states have to be mutual exclusive, i.e. at any time a particular sample of the device must be assignable to exactly one of the states.
Passing over from one state to another is described by transition probability. It is important to note that transition probability is always related to a specified time interval Dt. The complete set of all possible transitions is represented by a set of transition probabilities in connection with adjoined source and drain states. Together with the definitions of the states this information is sufficient to establish the Markov model.
Dt must be the same for all transition probabilities within the model. Since Dt affects the value of every single transition probability it must be chosen small enough in order to ensure that the sum of all transition probabilities exiting from any of the states is smaller than one. This is because probability by definition cannot exceed the value of one. Dt could be designated as the "time base" of the calculation.
The usual graphic manifestation of the model consists of a circle (or "bubble") for each state and connective arcs for the possible transitions. The circles are labelled by the names of the states (often abbreviations for descriptions of the states) and/or a number whereas the arcs are labelled by the pertinent transition probabilities.
Markov modelling implies that all transitions depend only on the transition probabilities and the present state probabilities and not on what has happened in the past. Therefore such a model is sometimes called "memoryless". Although a complex electronic system usually contains memory, Markov modelling techniques can be applied because the memory content normally has no influence on component failures.
2.2. Modelling random failures of components with constant failure rates
Mostly component failure rates can be assumed to be constant over time. This is usually correct if
- the mission time ends before the beginning of the wear-out area,
- additional early failures can be neglected or are eliminated by burn-in,
- the component itself does not contain any redundancy.
Then the transition probability due to random failures is given by
$p = \lambda \cdot \Delta t$,
where λ is the failure rate and Δt is the time interval the transition probability is related to.
The Markov model for this simple failure process is shown in Figure 1.
Figure 1: Failure of a component with constant failure rate
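For a constant failure rate λ the exact probability of a failure within Δt is 1 − exp(−λ·Δt), and the product λ·Δt is its first-order approximation. The sketch below compares both values; the MTTF of 15 years and the chosen time steps are illustrative assumptions.

```python
import math

HOURS_PER_YEAR = 8760.0


def failure_probability_linear(lam, dt):
    """First-order transition probability: rate times time step."""
    return lam * dt


def failure_probability_exact(lam, dt):
    """Exact probability of at least one failure within dt."""
    return -math.expm1(-lam * dt)


lam = 1.0 / (15 * HOURS_PER_YEAR)  # assumed MTTF of 15 years

for dt in (1.0, 8.0, 1000.0):
    linear = failure_probability_linear(lam, dt)
    exact = failure_probability_exact(lam, dt)
    print(f"dt = {dt:7.1f} h  linear = {linear:.6e}  exact = {exact:.6e}")
```

The linear value is always slightly larger than the exact one, and the difference grows with the time step.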
It is important to point out that our scope implies that only dangerous failures of components or systems are considered. When a dangerous failure of a component occurs, in many cases this will not cause a dangerous system failure, due to the system’s inherent fail safe capability. However, a component failure is said to be dangerous if the component is thereby no longer able to perform a subfunction which helps the system to carry out its intended safety function, even if the safety function is maintained by other (redundant) system components.
Failures which only affect the availability of the process under control are out of the scope of this document’s investigations.
2.3. Modelling common cause failures
Failures of components due to the same cause are called common cause failures. They can form a severe problem for safety systems using homogenous redundancy. In the literature this topic is also treated under the term common mode faults [6]. These effects can be taken into account in a Markov model by applying the so-called beta model. This model assumes that a fraction β of the failure rate affects all components of the same type at the same time. β is a value between zero and one, usually much lower than 0.1. The residual fraction (1-β) must still be applied to each of the redundant components. Figure 2 demonstrates the Markov implementation for two redundant components or subsystems.
Figure 2: Markov representation of dual redundancy with common cause effect
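The partitioning of the failure rate in the beta model can be sketched as follows. This is an illustration under assumptions: the two symmetric states with exactly one failed component are merged into a single state, and the state names are hypothetical, so the layout need not match Figure 2 exactly.

```python
def dual_redundancy_rates(lam, beta):
    """Transition rates for two identical redundant components (beta model).

    lam  -- dangerous failure rate of one component
    beta -- fraction of lam that hits both components at the same time
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie between zero and one")
    return {
        # independent failure of either one of the two components
        ("both_ok", "one_failed"): 2.0 * (1.0 - beta) * lam,
        # common cause failure of both components at once
        ("both_ok", "both_failed"): beta * lam,
        # only one component is left, so no partitioning is applied
        ("one_failed", "both_failed"): lam,
    }


rates = dual_redundancy_rates(lam=1e-5, beta=0.05)
for (source, drain), rate in rates.items():
    print(f"{source:>10} -> {drain:<11} {rate:.3e} per hour")
```

The total rate of leaving the fully operational state is (2 − β)·λ, which is slightly lower than 2·λ because the common cause share is counted once and not per component.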
In case of three redundant components the modelling technique must be extended. A decision has to be made whether the common cause always forces all three components to fail (simple β factor model of Figure 3) or whether it produces a certain fraction of dual failures as well (multiple greek factor model of Figure 3). In the latter case a new parameter γ would have to be introduced in order to determine the ratio between dual and triple failures. This ratio depends on the distribution function of the stress which is causing the failures and on the distribution function of the component susceptibility towards the stress. For the most part these data will not be available.
Figure 3: Models for failure rate stream partitioning for triple redundancy common cause failures
Therefore the simple β model is chosen as a worst case estimation for the common cause effect of a triple redundancy. This model applies exactly if the components show a sharp threshold of susceptibility towards stress. Figure 4 depicts the Markov implementation of this model.
Figure 4: Markov representation of triple redundancy with common cause effect
(simple β factor model)
It is important to apply the modelling techniques described above to any of the model states where two or more components of same type are still operational.
2.4. Modelling online tests
If a particular component of a system is periodically tested by means of an automatic system self test, this is called an online test.
Usually online tests are not performed continuously but only at certain points of time, fixed by the test rate rt. In addition, some tests are considerably time consuming, e.g. tests of large memories. Hence the test duration establishes an upper limit for the test rate and in some cases may result in a rise of the hardware costs if the test has to be carried out very often. For that reason the test rate is an important design parameter not least in the manufacturer’s view.
Therefore the Markov models presented in this document take account of the span of time which is needed for the detection of internal component failures (so far as provided by the system in question).
The time interval between two consecutive tests is named the test interval Tt. The probability that a test is carried out during the time interval ("time base") Δt is given by
$p_t = r_t \cdot \Delta t$, where $r_t = 1/T_t$ is called the test rate.
The test can be either successful or not. There is a probability to detect a failure under the condition that a failure has occurred before. This conditional probability is called the diagnostic coverage C (or diagnostic coverage factor DC). C is a measure for the quality of a test. Therefore the probability that the test takes place is divided into two fractions:
The (transitional) probability that the test is able to detect the failure within the time interval Δt is
$C \cdot r_t \cdot \Delta t$,
whereas the probability that the test will not be successful during Δt is given by
$(1 - C) \cdot r_t \cdot \Delta t$.
If a test has failed to detect a component failure due to its limited diagnostic coverage, it is assumed that this test will not detect the failure even if it is repeated (deterministic test model). The Markov representation of this test process is shown in Figure 5.
Figure 5: Online test of a component
The left circle in Figure 5 is depicting an intermediate state with the component x failed while the online test has not yet been carried out. Additional component failures may occur in all three states of Figure 5.
Detection of a component failure implies that there are still other components operational which can carry out an appropriate safety oriented action. Consequently each state must be checked carefully in order to determine which online tests can actually be executed.
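The split of the test probability into a detected and an undetected fraction can be written down directly. The sketch below assumes the test rate is the reciprocal of the test interval; the function name and the example values (test every 8 hours, coverage 0.8, time base 1 hour) are illustrative.

```python
def online_test_probabilities(test_interval, coverage, dt):
    """Return (p_detect, p_miss) for one time step dt.

    test_interval -- time between two consecutive tests
    coverage      -- diagnostic coverage C, between zero and one
    dt            -- time base, same unit as test_interval
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie between zero and one")
    p_test = dt / test_interval  # test rate times time base
    if p_test > 1.0:
        raise ValueError("time base is larger than the test interval")
    return coverage * p_test, (1.0 - coverage) * p_test


p_detect, p_miss = online_test_probabilities(test_interval=8.0, coverage=0.8, dt=1.0)
print(f"detected: {p_detect:.4f}  not detected: {p_miss:.4f}")
```

Both fractions always add up to the probability that a test takes place at all during Δt.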
2.5. Modelling demand
IEC 61508 distinguishes Safety Integrity Levels (SILs) for safety-related systems being employed in
- low demand mode of operation
- high demand mode of operation and
- continuous mode of operation.
In order to determine the SIL for the first two modes of operation the demand on the safety function has to be modelled. A demand will lead to a hazardous situation if the safety-related system is in a “dangerous” state, which is the case whenever it is not able to fulfil its intended safety function. This is taken into account in the model by introducing transition arcs for the demand leading from any dangerous state to a common “hazard state”.
The circle on the left side of Figure 5 represents an intermediate state where component x has failed but the online test has not yet been executed. If this state is “dangerous”, it contributes to the system’s overall probability of dwelling in a dangerous state and, consequently, a “demand arc” must exit from the intermediate state. Then the probability of the intermediate state does not only depend on the failure rate of component x but also on the test rates and the demand rate. Therefore these two rates have an influence on the overall probability of the system behaving dangerously and on the SIL which is calculated from this probability (see chapter 3). Naturally test rate and demand rate also affect the number of hazardous events within the mission time of the safety system, since a demand can hit the system in an intermediate state before an online test has taken place. These effects are clearly demonstrated in chapters 1.1 and 6.3.
By regarding the test rates as well as the demand rate this type of Markov model is able to simulate precisely the influence of both rates on the SIL and helps to answer the question which test rate is necessary.
The transition probability for a demand is given by
$r_d \cdot \Delta t$, where rd is the demand rate.
For the continuous mode of operation there are continuous demands so that the potentially dangerous states are hazardous. The demand does not have to be modelled for these systems. Chapter 3 will describe how to achieve the SIL for the three different modes of operation.
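The influence of the test rate on the number of hazardous events can be made visible with a small numerical experiment. The following sketch is not one of the report's models: it assumes a single component whose failure is dangerous, five hypothetical states, no repair, and illustrative parameter values (MTTF 15 years, coverage 0.8, one demand per year, mission time 10 years).

```python
HOURS_PER_YEAR = 8760.0


def propagate(arcs, n_states, steps, dt):
    """Advance the state probabilities step by step.

    arcs maps (source, drain) to a rate; the system starts in state 0.
    """
    s = [1.0] + [0.0] * (n_states - 1)
    for _ in range(steps):
        nxt = s[:]
        for (i, j), rate in arcs.items():
            flow = s[i] * rate * dt
            nxt[i] -= flow
            nxt[j] += flow
        s = nxt
    return s


def hazard_probability(test_interval_h, mission_h=10 * HOURS_PER_YEAR, dt=1.0):
    """Probability of having reached the hazard state by the end of the mission."""
    lam = 1.0 / (15 * HOURS_PER_YEAR)  # assumed failure rate
    coverage = 0.8                     # assumed diagnostic coverage
    r_d = 1.0 / HOURS_PER_YEAR         # assumed demand rate
    r_t = 1.0 / test_interval_h
    # 0 = operational, 1 = failed but not yet tested (dangerous),
    # 2 = failure detected (safe), 3 = failure not detectable (dangerous),
    # 4 = hazard
    arcs = {
        (0, 1): lam,
        (1, 2): coverage * r_t,
        (1, 3): (1.0 - coverage) * r_t,
        (1, 4): r_d,  # demand arc from the intermediate state
        (3, 4): r_d,
    }
    return propagate(arcs, 5, int(round(mission_h / dt)), dt)[4]


fast = hazard_probability(test_interval_h=8.0)
slow = hazard_probability(test_interval_h=720.0)
print(f"test every 8 h:   P(hazard) = {fast:.4f}")
print(f"test every 720 h: P(hazard) = {slow:.4f}")
```

With the slower test, more demands hit the intermediate state before the test has run, so the hazard probability rises although failure rate and coverage are unchanged.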
2.6. Modelling repair
When a failure has been detected successfully the safety related system will carry out a predetermined action. Usually the same will happen as if the operational safety device performed its intended safety function: the process which is controlled by the device will be shut down to ensure a non-volatile safe behaviour.
Naturally, in this case shutdown is not a reaction due to an external hazardous situation but is used as an indicator to signalise the need of a repair. Furthermore, for machinery safety related systems it is assumed that process operation will be prevented until the safety related device has been repaired or replaced. We also postulate no online repair capability to be provided so far.
Assuming an average repair time Tr, the (transition) probability that a repair will take place during the time interval Δt is given by
$r_r \cdot \Delta t$, where $r_r = 1/T_r$ is referred to as the repair rate.
After repair (or replacement) system and process operation will be continued. Figure 6 presents the corresponding Markov diagram.
Figure 6: Repair of the system
Usually the assumption is made that a repaired system is as good as a new one. Actually the safety device has only been renewed in parts. Nevertheless this is a reasonable simplification as long as none of the system’s components approaches its wear-out area during the residual mission time of the system.
2.7. Evaluation of a Markov model
The original method for evaluation of a given Markov model consists in deriving a complete set of differential equations from the model and solving for the desired state probabilities as continuous functions of time. This is a difficult and time-consuming job since the number of differential equations is equal to the number of model states. Thus this method is suitable only for small models with a few states.
Fortunately a numerical solution can also be obtained if a time discrete solution is accepted. This method requires the entire set of transition probabilities to be arranged into a transition matrix. Since every transition probability in the matrix is related to the same time interval Δt, the transition matrix P itself is related to that interval as well.
In general the transition matrix P for a Markov model comprising n states consists of n rows and n columns. It is given by
$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nn} \end{pmatrix}$,
with pij being the transition probability of a transition from state i to state j during the time interval Δt. If no transition from state i to state j is possible (i.e. if there is no transition arc), the corresponding transition probability pij in the P-matrix is set to zero.
The matrix elements of the diagonal represent the probabilities that there will be a transition from a particular state to that state itself, i.e. pii is the probability that the system will not leave state i during the time interval Δt.
The transition probabilities of each row in the matrix must sum to one, because it is certain that some transition will take place during Δt, the "transitions" from states to themselves included. Thus the matrix elements of the diagonal can be calculated:
$p_{ii} = 1 - \sum_{j=1, j \neq i}^{n} p_{ij}$ (n is the number of model states.)
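Filling the transition matrix from a list of arcs and completing the diagonal can be sketched as follows. The function name, the zero-based state numbering and the three-state example are assumptions made for illustration; the off-diagonal entries are formed as rate times Δt.

```python
def build_transition_matrix(n_states, arcs, dt):
    """Build the n x n transition matrix for the time base dt.

    arcs maps (i, j) with i != j to a transition rate; states are numbered
    from zero here, whereas the text counts them from one.
    """
    P = [[0.0] * n_states for _ in range(n_states)]
    for (i, j), rate in arcs.items():
        if i == j:
            raise ValueError("diagonal elements are derived, not specified")
        P[i][j] = rate * dt
    for i in range(n_states):
        leaving = sum(P[i])  # diagonal is still zero at this point
        if leaving > 1.0:
            raise ValueError(f"time base too large for state {i}")
        P[i][i] = 1.0 - leaving  # probability of staying in state i
    return P


# Hypothetical model: 0 = operational, 1 = dangerous failure, 2 = hazard.
arcs = {
    (0, 1): 1.0e-5,        # failure rate per hour (assumed)
    (1, 2): 1.0 / 8760.0,  # one demand per year (assumed)
    (2, 0): 1.0 / 8.0,     # repair within 8 hours on average (assumed)
}
P = build_transition_matrix(3, arcs, dt=1.0)
for row in P:
    print(["%.6f" % p for p in row])
```

The explicit check on the row sum catches a time base that is too large before any state probabilities are computed.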
At any time the complete set of all state probabilities can be compiled by forming the state probability row matrix S, which is defined by
$S = (s_1, s_2, \ldots, s_n)$, where si is the state probability of state i.
The elements of S must also sum to one, because at any time the system must be in exactly one of the states.
If St is the state probability row matrix at the point of time t, the state probability row matrix for t+Δt can be calculated by using the transition matrix P related to the time interval Δt:
$S_{t+\Delta t} = S_t \cdot P$
By using this formula n times recursively, the state probability matrix for the point of time t+n×Δt is obtained. Every single multiplication of the state probability matrix by the transition matrix P causes the state probabilities to proceed by a time step of Δt. Therefore a time step of n×Δt can directly be done by using the transition matrix that is raised to the power of n:
$S_{t+n \cdot \Delta t} = S_t \cdot P^n$
Making use of one or both of these two equations it is possible to calculate any state probability desired at any time which is equal to a multiple time step of Δt.
Usually the procedure is started with a state probability row matrix that has the form
$S_0 = (1, 0, \ldots, 0)$.
This implies the assumption that at the beginning (time zero) the system’s probability of being in state No. 1 is 100% whereas the probability to be in any of the other states is equal to zero. For instance, state 1 could represent a completely operational safety related system. All the other states symbolise system conditions where one or more components have failed.
After the desired number of matrix multiplications has been executed, a new distribution of probabilities is obtained which is related to the time step that has been made. Some components will have failed during this time step. This results in a probability of state 1 which is smaller than 1 (or 100%) while the other probabilities have increased.
Some of the states represent a dangerous failure of the entire system. The probabilities of the latter states must be added to obtain the total probability of dangerous failure of the system for the pertinent point of time.
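The evaluation procedure described above can be sketched in a few lines: repeated multiplication by P, the equivalent single multiplication by the n-th power of P, and the final summation over the dangerous states. The four-state matrix and the choice of dangerous states are hypothetical; states are numbered from zero in the code.

```python
def step(S, P):
    """One time step: row vector S multiplied by matrix P."""
    n = len(S)
    return [sum(S[i] * P[i][j] for i in range(n)) for j in range(n)]


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(P, n):
    """P raised to the power n by repeated squaring."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    base = [row[:] for row in P]
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result


# Hypothetical model: 0 = operational, 1 = failure detected (safe),
# 2 and 3 = dangerous system failure. Every row sums to one.
P = [
    [0.990, 0.006, 0.003, 0.001],
    [0.100, 0.900, 0.000, 0.000],
    [0.000, 0.000, 0.950, 0.050],
    [0.000, 0.000, 0.000, 1.000],
]
DANGEROUS = (2, 3)
S0 = [1.0, 0.0, 0.0, 0.0]  # system starts fully operational
N_STEPS = 50

S = S0
for _ in range(N_STEPS):
    S = step(S, P)

S_direct = step(S0, mat_pow(P, N_STEPS))
p_dangerous = sum(S[i] for i in DANGEROUS)

print("stepwise:", ["%.6f" % s for s in S])
print("power   :", ["%.6f" % s for s in S_direct])
print(f"probability of dangerous failure: {p_dangerous:.6f}")
```

Repeated squaring needs far fewer matrix multiplications than stepping when only the final distribution is of interest; stepping is the better choice when the time course is needed, for example for averaging over the mission time.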
Usually the calculations are automated by computer software, either regular spreadsheet programs, specific Markov model software or universal mathematical software. BIA makes use of the Windows-based programs MS Excel (the well-known spreadsheet program) and Carms [8] (a specific Markov modelling program). Spot checks have confirmed that both programs deliver the same results.
2.8. Techniques for reducing the number of Markov states needed
In principle the Markov method requires every possible combination of component failures to be taken into account by a state of its own. This is due to the fact that random failures will occur in an accidental chronological order. For instance, a system comprising only 10 components will show 2^10 = 1024 different failure combinations if each component is either operational or defective. Such a great number of states is hard to handle. Fortunately there are some techniques which can be applied in order to reduce the number of states needed:
1. Combining all “dangerous” states where definitively no online test will be effective any longer. It must be taken notice of the fact that there might exist some states with no effective online test but a test will become active after the occurrence of additional component failures. A state of this kind must not be put into the collection.
2. Symmetric architectures providing homogenous redundancy allow to unite model states which show the same system behaviour as a consequence of the symmetric arrangement of the failed components. Generally, different states must not be united only because they lead to uniform system behaviour.
3. The model development can normally be terminated if all combinations with a certain number of failures have been covered. For example, if every possible combination of four component failures has been regarded by particular states, it may be adequate to unite all states with five or more failures. Reason: the more failures are needed for arriving at a definite state the less will be the probability of that state. Executing the calculation once with the collective state assumed to be “dangerous” and once with the assumption that the system is operational in that state will show whether the accuracy of the simplified model is sufficient.
All three techniques have been applied to the Markov model of the homogenous triple channel system described in chapter 8. It consists of 9 components with 512 potential failure combinations, but the number of model states has been cut down to 91, including additional states for proper handling of the online tests.
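The effect of the second technique alone can be illustrated by brute-force counting. The sketch assumes that the 9 components are arranged as three interchangeable channels of three different components each; this layout is an assumption made for the example. Two failure combinations are merged when they differ only by a permutation of the channels.

```python
from itertools import product


def count_states(n_channels=3, components_per_channel=3):
    """Return (all failure combinations, combinations after merging
    those that differ only by the order of the channels)."""
    # One channel state is a tuple of flags, 0 = operational, 1 = failed.
    channel_states = list(product((0, 1), repeat=components_per_channel))
    full = 0
    merged = set()
    for system in product(channel_states, repeat=n_channels):
        full += 1
        merged.add(tuple(sorted(system)))  # order of channels is irrelevant
    return full, len(merged)


full, merged = count_states()
print(full, merged)  # prints: 512 120
```

Symmetry alone therefore reduces 512 combinations to 120 in this assumed layout. The figure of 91 states quoted above results from all three techniques together and includes the additional states for the online tests, so the two numbers are not directly comparable.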
3. Determination of the Safety Integrity Level according to IEC 61508
for the different modes of operation
IEC 61508‑1 [1] defines in its tables 2 and 3 two target failure measures for a safety function as Safety Integrity Levels (SILs): Table 2 is allocated to an E/E/PE safety-related system operating in low demand mode of operation whilst table 3 is related to an E/E/PE safety-related system operating in high demand or continuous mode of operation. The target failure measure for low demand mode is given by the average probability of failure to perform its designed function on demand (PFD) whereas the measure for high demand or continuous mode is given by the (average) probability of a dangerous failure per hour (PDF). These definitions imply the demand on the safety function to be taken into account when the SIL of a particular system shall be ascertained. As a consequence the state “hazard” needs to be introduced into the Markov model. The term “hazard” must carefully be distinguished from the term “accident”, because in this context “hazard” means the coincidence of a dangerous system failure and a demand on the safety function, which will not necessarily lead to an accident. In the following it will be shown by using a simple example how to determine the SIL for the three different modes of operation: low demand mode, high demand mode and continuous mode of operation.
Figure 7 shows the Markov model of a safety system with a single channel for the main safety function and an additional supervision device. The supervisor function can detect a certain part of the failures in the main system and can shut down the machine (EUC) to a safe state (state 4). All undetectable failures will lead to a dangerous undetectable state (state 5). There is a probability that the supervision device has failed first and consecutively the main system fails also. This will lead to state 5 too. When the main function has failed (state 2) a demand will lead to the hazard state as long as this failure is not detected. The same is the case if there is a demand in state 5. If only the supervisor has failed (state 3) the system can still perform its designed safety function.
For low and high demand mode of operation the same Markov model is applicable, but the SIL is determined in two different ways: The PFD (probability of failure on demand) which delivers the value for the SIL table for low demand mode of operation (table 2 of IEC 61508‑1) is calculated by:
$PFD = \frac{1}{T_M} \int_0^{T_M} \left( s_2(t) + s_5(t) \right) dt$
As said before states 2 and 5 together present the probabilities of a hazard in case of the occurrence of a demand. The SIL is obtained from the average probability. Therefore the sum of these two probabilities must be integrated over and subsequently divided by the mission time TM.
λM: Failure rate main system
λS: Failure rate supervision device
rt: Test rate
rd: Demand rate
rr: Repair rate
C: Diagnostic coverage
Δt: Markov time step
Figure 7: Markov model for determining the SIL
The PDF (probability of a dangerous failure per hour) which is needed for the SIL in case of high demand mode of operation (table 3 of IEC 61508‑1) is calculated by:
$PDF = r_d \cdot \frac{1}{T_M} \int_0^{T_M} \left( s_2(t) + s_5(t) \right) dt$
As said before states 2 and 5 give the probabilities for a hazard in case that a demand occurs. To get the average probability of a dangerous failure per hour we have to calculate the average flow from states 2 and 5 to state 6. This is the average probability of the two states multiplied by the demand rate.
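Once the time course of the dangerous state probabilities is available, both target failure measures follow from a simple average. The sketch below assumes equally spaced samples of the sum of the two dangerous state probabilities and maps the results to the bands of tables 2 and 3 of IEC 61508-1 (each SIL spans one decade, with SIL 1 ending below 10^-1 for low demand and below 10^-5 per hour for high demand or continuous mode). Function names and sample values are hypothetical.

```python
def time_average(samples):
    """Mean of equally spaced samples, approximating (1/TM) * integral."""
    return sum(samples) / len(samples)


def sil_from_measure(value, sil1_upper_limit):
    """Return the SIL (1 to 4) for a target failure measure, or 0 if even
    SIL 1 is missed.

    sil1_upper_limit -- 1e-1 for table 2 (PFD), 1e-5 for table 3 (per hour)
    """
    if value >= sil1_upper_limit:
        return 0
    sil = 1
    limit = sil1_upper_limit / 10.0
    while sil < 4 and value < limit:
        sil += 1
        limit /= 10.0
    return sil


# Hypothetical samples of the summed dangerous state probabilities,
# taken at equal time distances over the mission time.
dangerous_samples = [0.0, 0.004, 0.007, 0.009, 0.010]
demand_rate_per_hour = 1.0 / 8760.0  # assumed: one demand per year

pfd = time_average(dangerous_samples)
pdf_per_hour = demand_rate_per_hour * pfd

print(f"PFD = {pfd:.4f}          -> SIL {sil_from_measure(pfd, 1e-1)}")
print(f"PDF = {pdf_per_hour:.3e} per hour -> SIL {sil_from_measure(pdf_per_hour, 1e-5)}")
```

In a real evaluation the samples would come from the stepwise matrix multiplication, with one sample per time step.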
For continuous demand mode systems states 2 and 5 are directly hazardous because the demand is present continuously. State 6 and all arcs connected with state 6 must be removed from the model. Two new repair arcs must be added. The first one points from state 2 to state 1 and the second one from state 5 to state 1. In this simple example the safe shutdown (state 4) can only be reached after the occurrence of a hazard. This means that in practice, this simple system architecture is not suitable for continuous mode of operation. However the principle of model evaluation for continuous mode can be demonstrated.
Calculating the SIL in this case requires determining the average flow into the hazardous states 2 and 5:
,
Again, the obtained value is used for table 3 of IEC 61508‑1.
During the calculations it could be shown, that the same hardware is achieving the same SIL, independently from its use in high or low demand mode of operation. It could also be shown that the PDF for continuous mode of operation equals the PDF for high demand mode operations with a very high demand rate. Both results are plausible.
4. Single channel system without fault detection
in accordance with category B or 1 of EN 954‑1
4.1. Description
The categories B or 1 according to EN 954-1 [2] imply that the system does not provide any capability of detecting internal faults. For category 1 not only basic but well-tried safety principles and components must be used, which means that a higher reliability is achieved and therefore the probability of a system failure is lower than in category B (see EN 954‑1, 6.22).
If we assume the system comprising a sensor (S), a programmable electronic device (PED) with integrated power supply for signal evaluation and a drive (D) that is controlled by the PED it can be represented in a block diagram by a simple series system. This is shown in Figure 8.
Figure 8: Block diagram of a single channel system without fault detection
Normally a single electronic device is not regarded to be a well-tried component. Thus, it is not possible to realise a category 1 single channel safety system using a PED.
Three assumptions have been made in order to determine the SIL:
1. Switching off the drive is the appropriate action to generate a safe state of the equipment under control (EUC) to which the drive belongs.
2. The safety system is not able to induce a hazardous situation by itself. The worst case which can occur is a dangerous failure, i.e. the system cannot perform its intended safety function.
3. Failures are only revealed by a demand on the safety function. This leads to a hazardous situation which will be followed by a repair.
4.2. Markov model and assumptions
Since all three components are series-connected, each of them must be operational for the safety system to be operational. Therefore a total failure rate for the system can be obtained by simply adding the failure rates of the components:
$\lambda = \lambda_S + \lambda_{PED} + \lambda_D$
Assuming constant failure rates for the components, the system failure rate is also constant. As mentioned before, only dangerous failures are regarded. The Markov model regarding system failure, demand on the safety function and repair is shown in Figure 9.
Figure 9: Markov model of the single channel system (SCS) without fault detection
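A model of this kind is small enough to be evaluated with a plain loop. The following sketch assumes three states (operational, dangerous failure, hazard) linked by failure, demand and repair, which follows the description in the text but has not been checked against Figure 9. The demand rate of one per year, the mean repair time of 8 hours and the time base of 1 hour are assumptions made for the example.

```python
HOURS_PER_YEAR = 8760.0


def scs_average_pfd(mttf_unit_years, mission_years,
                    demands_per_year=1.0, repair_hours=8.0, dt=1.0):
    """Average probability of the dangerous state over the mission time
    for three series-connected units with equal MTTF."""
    lam = 3.0 / (mttf_unit_years * HOURS_PER_YEAR)  # sum of three unit rates
    r_d = demands_per_year / HOURS_PER_YEAR
    r_r = 1.0 / repair_hours
    ok, dangerous, hazard = 1.0, 0.0, 0.0
    steps = int(round(mission_years * HOURS_PER_YEAR / dt))
    accumulated = 0.0
    for _ in range(steps):
        accumulated += dangerous
        failed = ok * lam * dt             # operational -> dangerous
        demanded = dangerous * r_d * dt    # dangerous -> hazard
        repaired = hazard * r_r * dt       # hazard -> operational
        ok += repaired - failed
        dangerous += failed - demanded
        hazard += demanded - repaired
    return accumulated / steps


for mttf, mission in ((150.0, 10.0), (15.0, 10.0), (15.0, 1.0)):
    pfd = scs_average_pfd(mttf, mission)
    print(f"MTTF per unit {mttf:5.0f} a, mission time {mission:4.1f} a: PFD = {pfd:.4f}")
```

Under these assumptions an MTTF of 150 years per unit stays below 10^-1 over 10 years, whereas 15 years per unit does so only when the mission time is cut to one year.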
4.3. Result of evaluation
Because of the elementary nature of this specific model no special matrix technique is needed (although such techniques could be applied successfully). The result can be obtained by solving a system of three differential equations describing the inputs and outputs of each state. Involving the initial conditions
,, ,,
and using the abbreviations
and
the average probability of a dangerous failure on demand for the mission time TM is given by
.
Thus the average probability of a dangerous failure per hour can be calculated by
.
These equations have been evaluated for a mission time TM of 10 years and 1 year and a (“dangerous”) mean time to failure of 15, 150 and 1500 years for each component. The result is shown in the diagram of Figure 10.
Figure 10: Average probabilities (PFD, PDF) versus MTTF for the single channel system
without fault detection
According to the diagram, for a mission time of 10 years each unit of Figure 8 needs an MTTF of 150 years to achieve SIL 1. So the safety-related system has a total MTTF of 50 years.
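The step from 150 years per unit to 50 years for the whole system follows from adding the failure rates of the series-connected units. A minimal sketch, assuming constant failure rates so that each rate is the reciprocal of the MTTF (the function name is hypothetical):

```python
def series_mttf(unit_mttfs):
    """MTTF of a series system: reciprocal of the summed failure rates."""
    return 1.0 / sum(1.0 / mttf for mttf in unit_mttfs)


total = series_mttf([150.0, 150.0, 150.0])  # sensor, PED and drive
print(f"total MTTF: {total:.1f} years")
```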
Systematic failures are not included because they are assumed to be avoided by the qualitative measures and requirements of IEC 61508. If systematic failures are included, the MTTF of the hardware must be even better than 50 years.
The calculations assuming a mission time of 10 years and no proof test show that SIL 1 cannot be achieved by complex electronics according to category B.
As defined in IEC 61508 a proof test requires each component of the safety-related system to be tested, so that after the proof test the system can be restored to an “as new” condition. We believe that proof tests are not possible for complex electronics but only for non complex electromechanics. A proof test may therefore be possible for category 1 systems.
Figure 10 also shows that SIL 1 is possible with a MTTF of 15 years per unit if the mission time is cut down to one year. This is equivalent to performing a perfect proof test once a year thus starting a “new” mission time in order to prolong the actual period of use. For electromechanical devices a MTTF of 150 years may be possible so that SIL 2 may be achievable with a proof test interval of one year. Higher SILs are not realistic even for category 1 architectures.
The single channel system without fault detection establishes a "reference system" which the systems introduced in the next chapters will be compared with. Hence, the effects of measures like hardware redundancy or online tests will be made obvious.
5. Single channel system with implemented tests
in accordance with category 2 of EN 954‑1
5.1. Description
Category 2 of EN 954-1 [2] requires self checks to be executed by the safety related system "at suitable intervals". The tests may be initiated either manually or automatically. If a fault is detected an output signal shall be generated in order to initiate an "appropriate control action". Whenever possible a safe state shall be induced.
These requirements imply "that the occurrence of a fault can lead to the loss of the safety function between the checking intervals". Additionally it must be remarked that many of the typical testing techniques do not provide a diagnostic coverage of 100%. Therefore there may exist faults within the safety device which cannot be detected by the checks.
A representative system architecture for category 2 is presented by the block diagram of Figure 11 [9].
Figure 11: Block diagram of a single channel system with implemented tests
Compared with the simple system of Figure 8 a watchdog (WD) has been added in order to monitor the operation of the programmable electronic device (PED) which is thought to be represented by a microcontroller system. In the PED a power supply is integrated. The drive (D) has two separate inputs, the first (Ip) - as usual - for the PED and a second one (Iw) for the watchdog, each providing full switch-off capability. The system is also performing periodic tests of the sensor, the switch-off path(s) of the drive and the watchdog.
Several assumptions have been made in order to ease the creation of a suitable Markov model:
1. Switching off the drive is the appropriate action to generate a safe state of the equipment under control (EUC) to which the drive belongs.
2. The safety system is not able to induce a hazardous situation by itself. The worst case which can occur is a dangerous failure, i.e. the system cannot perform its intended safety function.
3. The programmable electronic device (PED) periodically performs a self test. Detection of a dangerous failure of the PED simply results in the absence of the retrigger pulses which are normally sent repeatedly to the watchdog (WD). This online test is characterised by the test rate rtp and the diagnostic coverage Cpe, which is assigned a value between zero and one. Cpe is the conditional probability that a dangerous failure of the PED will be detected, given that it has occurred. In this case the PED is no longer able to cut off the drive via input Ip although this might be necessary. If the fault is detectable, the drive will be cut off by the watchdog via input Iw (provided that WD and Iw are both operational).
4. The sensor and the drive-internal switch-off path beginning with input Ip of the drive are tested periodically by the PED. The corresponding test rates are named rts and rtip respectively. The diagnostic coverages are assumed to be equal to one as long as the tests are carried out. The test rates can be set to zero in order to model the case that no such tests are implemented.
5. The watchdog is also tested by the PED. The corresponding test rate is called rtw and the diagnostic coverage is supposed to be equal to one. If there is no watchdog test, the rate rtw can be set to zero. There are two ways to monitor the operation of the watchdog. Its output signal can either be directly reread by the PED or the drive-internal switch-off path beginning with input Iw of the drive can be included in the test loop. In the latter case said switch-off path is also covered by the test. This can be expressed by the diagnostic coverage Ciw which is set either to zero or to one.
6. Any failure which has been detected successfully will drive the system to a non-volatile safe state with the drive cut off. The system is assumed to be disconnected from the power manually until it has been repaired or replaced by a new one.
7. If the PED has failed it will no longer perform any tests of PED-external components, i.e. S, Ip, WD and Iw are not tested in case of a failure of PED.
8. In order to describe the drive by a single dangerous failure rate a factor k out of the interval (0...1) has been introduced. Thereby the dangerous failure rates of the drive-internal switch-off paths beginning with inputs Ip and Iw respectively can be derived from the drive’s total dangerous failure rate:
5.2. Markov model and assumptions
Based on the assumptions listed in chapter 5.1 Markov model SCST (Single channel system with implemented tests) has been developed. It is plotted in Figure 12.
The state at the top is depicting the fully operational system. The ellipse-shaped state “undet dang” on the right represents a collection of dangerous states with no fault detection possible. Thus a demand from this state leads to the hazardous state at the bottom. All the other states are intermediate states where some components have failed and a detection is possible or has already happened (state “fail det” on the left). Every dangerous state of the model is labelled by the remark "dang". Any of the circle-shaped states is also labelled by names of the components that have failed. After a hazard has occurred the machine is disconnected from power and repaired. The same is valid when a failure is detected by online tests.
The Markov model is a little complicated because we wanted to simulate the time-related behaviour of the system. This is the reason why, for instance, the intermediate states “S dang”, “PED dang” and “IP dang” appear in the second row of Figure 12: some time is needed to detect the failures and during this span of time a demand on the safety function could lead to a hazard or a second unit could fail. Rows 3 and 4 show states where a second and third unit fails before the failure is detected and a demand occurs. With this complete Markov model the effect of the demand rate, the test rate and the diagnostic coverage can be studied. This is not possible with an often used simplified approach where a failure is immediately detected or not. After studying the time effects in detail we will decide whether a simplification is allowed or even fault tree analysis will be sufficient.
Figure 12: Markov model SCST of a single channel system with implemented tests
5.3. Result of evaluation
Model SCST was evaluated by using the spreadsheet program MS Excel. As a reference configuration we chose a MTTF of 15 years per unit. Drive D and sensor S can be tested with 100% coverage by the PED. The self test of the PED is executed in connection with the watchdog WD. According to [10] 80% coverage can be assumed for a tested WD. 100% diagnostic coverage is possible for a digital sensor giving an on or off signal. These signals are usual for sensors which are monitoring the position of a guard, for a safety mat or a light curtain in front of a machine. In order to test the drive it has to be checked whether the motor is moving or not. This can also be done by a sensor giving a digital output, thus we supposed 100% diagnostic coverage. Test and repair rate were both chosen as one per 8 hours. The demand rate was set to 1 per year, assuming the system to be operated in low demand mode. An overview of the reference input parameter set is given by the following table.
Parameter | Symbol | Value
MTTF of the programmable logic device (PED) | MTTFd ped | 15 years
MTTF of watchdog (WD) | MTTFd wd | 100 years
MTTF of the sensor (S) | MTTFd s | 15 years
MTTF of the drive (D), k=0.5 | MTTFd d | 15 years
Diagnostic coverage of the sensor | Cs | 1
Diagnostic coverage of the PED | Cp | 0.8
Diagnostic coverage of the drive's switch-off input for PED | Cip | 1
Diagnostic coverage of the drive's switch-off input for WD | Ciw | 1
Test rate of the sensor | rts | 1/(8 hours)
Test rate of the PED | rtp | 1/(8 hours)
Test rate of the drive's switch-off input for PED | rtip | 1/(8 hours)
Test rate of the watchdog and the drive's switch-off input for it | rtw | 1/(8 hours)
Repair rate after failure detection | rr | 1/(8 hours)
Demand rate of the safety function | rd | 1/year
Repair rate after hazardous event | rrh | 1/(8 hours)
Mission time (life time) | TM | 10 years
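The table mixes years and hours, so it can help to bring everything to one unit before comparing rates. The following is a minimal sketch, assuming constant failure rates (rate = 1/MTTF) and a year of 8760 hours; the function and variable names are illustrative and do not come from the report.

```python
# Assumption: one year is counted as 365 days of 24 hours.
HOURS_PER_YEAR = 8760.0


def rate_per_hour_from_mttf_years(mttf_years: float) -> float:
    """Constant failure rate per hour for an MTTF given in years."""
    return 1.0 / (mttf_years * HOURS_PER_YEAR)


# Reference values taken from the table above, converted to "per hour".
lambda_unit = rate_per_hour_from_mttf_years(15.0)   # PED, sensor, drive
lambda_wd = rate_per_hour_from_mttf_years(100.0)    # watchdog
test_rate = 1.0 / 8.0                                # one test per 8 hours
demand_rate = 1.0 / HOURS_PER_YEAR                   # one demand per year

# Ratio of test rate to demand rate in the reference configuration.
test_to_demand_ratio = test_rate / demand_rate

print(f"lambda per unit : {lambda_unit:.3e} 1/h")
print(f"lambda watchdog : {lambda_wd:.3e} 1/h")
print(f"test/demand     : {test_to_demand_ratio:.0f}")
```

Under these assumptions the reference test rate is roughly a thousand times the demand rate.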
Figure 13 compiles the results, i.e. the probability of failure on demand (PFD) for the reference configuration and various parameter alterations. This will be discussed in the following.
Figure 13: PFD of a single channel system with implemented tests in accordance with category 2
Failure rate of the subsystems
The last four bars in Figure 13 show the results with modified MTTFs. One can see that a change of the MTTF of all subsystems has a direct and proportional effect on the PFD. For a complex safety-related system an MTTF of 30 years may be achievable, but 100 years is normally unrealistic. Changing the MTTFs of only the sensor and the drive has no notable influence on the PFD. This is easily explained by the fact that we have assumed 100% diagnostic coverage for sensor and drive, and it justifies assuming equal MTTFs for the different components in the other parameter combinations. As a result we can say that a category 2 architecture is able to fulfil SIL 1 and, with reliable components, may in some cases achieve SIL 2.
Diagnostic coverage of the subsystems
Bars 2 to 6 show the effect of switch-off path testing and watchdog testing. In machinery applications it will normally be possible to test the switch-off path when the machine is stopped. In this case the machine can sometimes be stopped by the 2nd switch-off path and the reaction of the motor can be monitored. The second bar shows the effect of omitting this test. It is also possible to test the effect of the WD. This can only be done by using the 2nd switch-off path. If this test is omitted, the result shown by bar 3 is obtained. On the other hand, the sensor S and the normal switch-off path can be tested by the PED. If these tests of the periphery are not executed, a dramatic effect results, as shown by bar 4. If no tests are carried out except the self-test of the PED, we lose nearly 2/3 of a SIL step (bar 5). Bar 6 shows that the use of a standard controller (e.g. a PLC) without diagnostic tests results in a deterioration similar to that caused by omitting the peripheral tests. Bars 8, 9 and 10 demonstrate the major influence of the diagnostic coverage of the PED, which is similar to the influence of the failure rates. Nearly one SIL step can be gained by improving the diagnostic coverage from 90% to 99%. A processor-watchdog combination will not be able to achieve a diagnostic coverage of 99%. 90% may be possible by sophisticated means. Processor tests are very effective but they have to be combined with tests of the peripheral subsystems.
Repetition rates of the diagnostic tests
The seventh bar of Figure 13 shows what happens if the diagnostic tests are executed every 10 seconds instead of every 8 hours. Actually there is nearly no effect. Our investigations proved that a single channel system will show an effect if the test rate is not much higher than the demand rate. This is demonstrated in Figure 14. The bars indicate the number of hazardous situations per system within a mission time of 10 years. A hazard, in this sense, occurs at any time when a system which has failed dangerously is confronted with a demand on the safety function. Each bar of Figure 14 is labelled with the corresponding time Tt between consecutive online tests (the reciprocal value of the test rate rt) and the mean time between demands (“MTBD”, the reciprocal value of the demand rate rd). For these calculations the assumption was made that no repair is carried out after a hazardous event but the system is decommissioned in this case. As shown in Figure 14, for maximum test effect the test rate must be at least a factor of 100 greater than the demand rate. A factor very much greater than 100 will offer no additional benefit. If the test rate has the same order of magnitude as the demand rate this results in an increase of the number of hazardous events by a factor of about 6 or 7.
Figure 14: Single channel system with testing:
Number of hazardous events per system during a mission time of 10 years
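The dependence on the ratio of test rate to demand rate can be made plausible with a much simpler picture than model SCST. The sketch below assumes a single detectable dangerous failure and treats the next online test and the next demand as two competing exponential events; whichever comes first decides whether the failure is caught or leads to a hazard. Undetectable failures, repair and second failures are ignored, so the numbers are illustrative only and are not expected to reproduce Figure 14. The function name is hypothetical.

```python
def hazard_fraction(test_rate: float, demand_rate: float) -> float:
    """Probability that a demand arrives before the next test.

    Simplifying assumption: both the time to the next test and the time
    to the next demand are exponentially distributed and independent.
    """
    return demand_rate / (demand_rate + test_rate)


# Ratios of test rate to demand rate; the demand rate is normalised to 1.
for ratio in (1, 10, 100, 1000):
    share = hazard_fraction(test_rate=float(ratio), demand_rate=1.0)
    print(f"test rate = {ratio:>4} x demand rate -> "
          f"{share:.4f} of detectable failures meet a demand first")
```

In this simplified race a ratio of 100 leaves about 1% of the detectable failures uncaught, and raising the ratio further changes little in absolute terms, which matches the qualitative statement above.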
6. Dual channel system with comparison
in accordance with category 3 or 4 of EN 954‑1
6.1. Description
EN 954-1 [2] requires a category 3 device to remain operational if a single fault is present in any part of the system. Besides, "whenever reasonably practicable the single fault shall be detected at or before the next demand upon the safety function." This includes that not all faults must be detected and that "the accumulation of undetected faults may lead to an unintended output and a hazardous situation at the machine." Common mode failures shall be taken into account.
In addition to the above-mentioned requirements, a system that claims category 4 has to fulfil more rigid ones: the single fault shall be detected "whenever possible" and, "if this detection is not possible, then an accumulation of faults shall not lead to a loss of safety functions."
The problem of providing the safety functions after the occurrence of a fault is often solved by the implementation of redundancy. A typical example for homogeneous redundancy is given by the dual channel system depicted by Figure 15 [9]. Whether category 3 or 4 can be met depends on the extent to which faults can be detected or tolerated.
Figure 15: Block diagram of a dual channel system with comparison
The system comprises two sensors (S1, S2) of same type and two programmable electronic devices (PED1, PED2) of identical type with integrated power supply in each PED combined with a single drive (D). Either of the PEDs is connected with an individual input (IN1, IN2) of the drive. In reality the PEDs will usually be given by microcontrollers. The cross link between them is intended for data interchange.
Again, there is a number of reasonable assumptions which have been made in order to derive a suitable Markov model:
1. Switching off the drive is the appropriate action to generate a safe state of the equipment under control (EUC) the drive is belonging to.
2. The safety system is not able to induce a hazardous situation by itself. The worst case which can occur is a dangerous failure, i.e. the system cannot perform its intended safety function.
3. Periodic online tests are carried out by the two programmable electronic devices (PEDs). The complete set of tests includes:
- a self-test of PED1 controlled and monitored by PED2,
- a self-test of PED2 controlled and monitored by PED1,
- a test of the drive-internal switch-off path beginning with input IN1 of the drive, performed by PED1,
- a test of the drive-internal switch-off path beginning with input IN2 of the drive, performed by PED2,
- a comparison of the output signals of the two sensors (S1, S2), performed by PED1 and PED2 together.
Each of the tests checks subfunctions which are performed by the different components. Performing all subfunctions properly is a precondition for the safety system to provide its intended safety function(s).
4. The mutually controlled and monitored self-tests of the PEDs are characterised by a diagnostic coverage, which can be assigned a value between zero and one.
5. The diagnostic coverage related to the sensors is equal to one. In some cases the feature will be implemented, in others it won’t. This can be expressed by the diagnostic coverage which is set either to zero or to one.[1]
6. The diagnostic coverage related to the drive-internal switch-off paths beginning with inputs IN1 and IN2 of the drive is equal to one. In some cases the feature will be implemented, in others it won’t. This can be expressed by the diagnostic coverage which is set either to zero or to one.[1]
7. Any failure which has been detected successfully will lead the system to a non-volatile safe state with the drive cut off. The system is assumed to be disconnected from the power manually until it has been repaired or replaced by a new one.
8. If one PED has failed dangerously it will no longer perform the test of its related drive input. The comparison of the output signals of the sensors is also inhibited.
9. A dangerous failure of both sensors at the same time is not detectable because they deliver identical (wrong) output signals. This cannot be revealed by a comparison.
10. The failure rate of each input channel of the drive is given by:
11. Common cause effects do not hit complete channels but the two sensors, the two PEDs and the two switch-off inputs of the drive separately.
6.2. Markov model and assumptions
Based on the assumptions listed in chapter 6.1, Markov model DCSC (dual channel system with comparison) has been developed. This dual channel system is built completely symmetrically. Therefore not only the first but also the second of the techniques mentioned in chapter 2.8 could be applied in order to reduce the number of states needed. For example, it doesn’t make any difference whether S2 and PED1 have failed or S1 and PED2. Uniting every pair of such “mirror combinations” into a single state in the Markov model cuts the number of states needed nearly in half.
The resulting Markov model is plotted in Figure 16. All circle-shaped states are labelled with the components which have failed respectively. Dangerous states (red-coloured) have additionally been marked by the label "dang". State 17 was created using the first technique of chapter 2.8. It collects all dangerous states where
- the inherent faults cannot be detected without having a real demand because there is no appropriate test left running and
- no additional failure of a component will lead to a condition where a test could be successful.
Common cause effects have been taken into account by using the β model technique described in chapter 2.3 (Figure 2). For this the assumption was made that all components of the same type can be hit by a common cause failure. Therefore individual β factors were introduced for the sensors (βd), the PEDs (βp) and the switch-off inputs (βi) of the drive.
Taking the common cause effects into account results in 7 more transition arcs in the model. Furthermore, the transition probabilities of many existing arcs have to be adapted. For clarity, the additional arcs due to common cause effects are drawn in orange in Figure 16.
Figure 16: Markov model DCSC of the symmetric dual channel system
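How a β factor changes the arcs leaving the fully operational state can be sketched as follows. The code assumes the usual β-factor convention, in which a fraction β of a component's failure rate hits both identical components at once and the remaining fraction (1 - β) acts on each component independently. Chapter 2.3 is not reproduced here, so whether model DCSC uses exactly this split is an assumption, and the function names are illustrative.

```python
def beta_split(rate: float, beta: float) -> tuple[float, float]:
    """Split a failure rate into an independent and a common cause part."""
    independent = (1.0 - beta) * rate
    common_cause = beta * rate
    return independent, common_cause


def dual_channel_rates(rate: float, beta: float) -> tuple[float, float]:
    """Rates out of the 'both units ok' state for two identical units.

    Returns (rate of exactly one unit failing, rate of both failing at once).
    """
    independent, common_cause = beta_split(rate, beta)
    single_failure = 2.0 * independent   # either of the two units fails
    double_failure = common_cause        # one common cause event hits both
    return single_failure, double_failure


# Example with a normalised rate of 1.0 and the beta values used in the runs.
for beta in (0.0, 0.01, 0.05, 0.10):
    single, double = dual_channel_rates(1.0, beta)
    print(f"beta = {beta:4.2f}: single = {single:.2f}, double = {double:.2f}")
```

With β = 0 the direct arc to the double-failure state disappears, which is why common cause modelling adds arcs and also changes the weights of existing ones.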
6.3. Result of evaluation
Markov model DCSC of Figure 16 has been evaluated as a high demand system with a demand rate of 10 demands on the safety function per hour. All calculations are based on a mission time of ten years. The following set of input parameters has been chosen as reference set:
Parameter | Symbol | Value
MTTF of the sensors (S) | MTTFd s | 15 years
MTTF of the programmable logic devices (PED) | MTTFd p | 15 years
MTTF of the drive (D) | MTTFd d | 15 years
Diagnostic coverage of the sensor comparison | Cs | 1
Diagnostic coverage of the PED self tests | Cp | 0.9
Diagnostic coverage of the switch-off inputs of the drive | Ci | 1
Test rate of all online tests | rt | 1/(10 s)
Repair rate after failure detection | rr | 1/(8 hours)
Demand rate on the safety function | rd | 10/hour
Repair rate after hazardous event | rrh | 1/(8 hours)
Mission time (life time) | TM | 10 years
In the following paragraphs the influence of the different parameters on the probability of a dangerous failure per hour will be discussed. The complete compilation of all results is shown in the bar diagram of Figure 17. It should be noticed that in this diagram the probability is depicted in a logarithmic scale. The three β factors for the sensors, the PEDs and the switch-off inputs of the drive have been assigned the same value which is simply called β. Each parameter combination has been evaluated for the β values 0, 1%, 5% and 10%. In Figure 17 the β factor is indicated by the colour of the bars.
Implementation of diagnostic tests
A comparison of the results obtained by run 1, run 2, run 7 and run 8 reveals the immense impact of diagnostic coverage on the probability of a dangerous failure. Only the coverage for the PED has been altered. The step from 90% to 99% results in an improvement of about one order of magnitude (β=0) but a β factor of only 0.01 will reduce the gain to half an order of magnitude (or half a SIL step). The step from 60% to 90% only provides a smaller progress. Figure 17 shows that at least 90% diagnostic coverage is necessary to achieve SIL 2 with reasonable MTTFs (β=0.01). Comparing run 1, run 2, run 3 and run 4 gives an answer to the question whether internal online tests for PEDs are necessary. In run 4 no diagnostics were assumed while in run 3 100% diagnostics for sensors and drive and in run 7 additionally 60% diagnostics for the PED were chosen. Run 7 may be a good example for using two standard programmable logic controllers, implementing 100% diagnostics for the peripheral components and using these systems for safety functions. Figure 17 shows that this version is not much better than doing no diagnostics at all and too bad for SIL 2. Only the higher diagnostics in the PED (see run 1) brings the necessary jump into SIL 2 but a β factor of 0.05 or 0.1 will reduce the result to SIL 1.
Test rate
All tests in the system are assumed to be executed once within the same cycle. Thus, they all are related to the same test rate (or test interval). Comparing run 1 and run 5 the test rate is reduced from one test every ten seconds to one test per hour. In run 6 the test rate is only one test per eight hours. The result of the evaluation clearly shows that there is very low influence of the test rate on the probability of a dangerous failure per hour.
Figure 18 shows the influence of the test rate on dual channel systems. The mean time between demands (“MTBD”) has been kept constant but the test interval Tt has been altered. As demonstrated by the diagram, there is no significant increase in the number of hazardous events as long as the test interval is much smaller than the MTTF of a single channel. This is a fundamental difference from the single channel system, where the test rate has to be 100 times larger than the demand rate in order to avoid a substantial increase in the number of accidents. An explanation is that in the dual channel system there is still an operational channel left if the first channel has failed. The failure of the first channel plays the role of a “demand” for the remaining “single channel system”. Therefore the dominant factor here is not the ratio of test rate and demand rate but the MTTF of a single channel.
Figure 17: PDF of a dual channel system with comparison (Markov model DCSC)
Figure 18: Dual channel system with comparison: Influence of the ratio of MTTF and test interval Tt
Failure rate of the subsystems
Considering Figure 17, the impact of the subsystem failure rate can be studied by a comparison of run 1 (15 years MTTF), run 9 (30 years MTTF) and run 10 (100 years MTTF). The failure rates of all three types of components are altered in the same manner. It should be noticed that the relationship between the failure rate and the probability of a dangerous failure per hour is non-linear, but that the failure rate nevertheless has a big influence on the SIL.
Comparing run 1, run 11 and run 12 reveals what happens if only the failure rates of the sensors and the drive are altered. A lower or higher MTTF for sensors and drive than for the PED has only a small effect on the SIL. This can be explained by the 100% diagnostic coverage for the sensors and the drive. This result justifies taking the same MTTF for all three subsystems in our simulations.
Influence of Common cause
The bar diagram of Figure 17 clearly depicts the impact of the β factor for each parameter combination. The reference combination loses about 2/3 of a SIL step due to a common cause factor of 10%. A general principle is recognisable: the lower the failure probability achieved by a system, the higher the negative influence of common cause effects, no matter by which measures the low failure probability was originally achieved. For instance, there is a loss of about 1.5 SIL steps if the system with 99% diagnostic coverage of the PEDs is confronted with a β factor of 10%. A comparison with the reference parameter set (90% coverage) shows that nearly the whole benefit of the very high coverage is lost due to common cause failures (β=10% for both cases). This demonstrates the immense importance of considering common cause effects during design, development and operation.
According to part 6 of IEC 61508 a β factor of 2% can be looked upon as an achievable value for the machinery sector.
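Statements such as "2/3 of a SIL step" treat one SIL step as one order of magnitude in the failure measure, as in the phrase "half an order of magnitude (or half a SIL step)" above. A minimal sketch of this bookkeeping, assuming the high demand target bands of IEC 61508 for the probability of a dangerous failure per hour; the function names and the sample values are illustrative and are not results of the report.

```python
import math

# High demand / continuous mode bands of IEC 61508:
# SIL -> (lower bound, upper bound) of the dangerous failure probability per hour.
PFH_BANDS = {
    1: (1e-6, 1e-5),
    2: (1e-7, 1e-6),
    3: (1e-8, 1e-7),
    4: (1e-9, 1e-8),
}


def sil_from_pfh(pfh: float):
    """Return the SIL whose band contains pfh, or None if no band matches."""
    for sil, (low, high) in PFH_BANDS.items():
        if low <= pfh < high:
            return sil
    return None


def sil_steps_lost(pfh_before: float, pfh_after: float) -> float:
    """Change expressed in SIL steps (decades); positive means worse."""
    return math.log10(pfh_after / pfh_before)


# Hypothetical values, for illustration only.
print(sil_from_pfh(5e-7))                       # lies in the SIL 2 band
print(round(sil_steps_lost(2e-7, 9.3e-7), 2))   # about two thirds of a step
```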
7. Dual channel system in mixed technology
in accordance with category 3 of EN 954‑1
7.1. Description
In many applications a mixed technology is used in order to implement a safety function. A first channel is given by a standard programmable logic controller (PLC) with integrated power supply and no specific online tests, while the second channel is formed by electromechanical means. Online tests are carried out by the PLC to check the elements of the electromechanical signal path.
As an example the simplified schematic of Figure 19 depicts the implementation of an emergency stop function employing a PLC and a relay circuit.
Figure 19: Implementation of an emergency stop function using mixed technology
We assume a machine where a current converter (CC) is controlled by a standard PLC. The rotation sensor (S) is part of the speed or position control of the current converter and can be used by the PLC to monitor the motor movements.
The safety function to be implemented is the emergency stop of the dangerous movement as soon as the emergency stop device (ES) is actuated. The actuator contains two mechanically forced contacts, each of them providing a separate output signal. One signal is processed by the PLC while the other is led to a relay circuit (RC) consisting of 2 relays (or contactors respectively) with forced contacts. The emergency stop function is executed both by the PLC via the current converter and by the relay circuit. Independent random failures are assumed to happen to the PLC, the current converter, the relay circuit and the sensor, while a failure of the emergency stop actuator ES to open its contacts when the button is pressed is excluded.
The PLC software is designed so that the opening of the contact of ES immediately leads to a stop signal for the current converter. Four online tests can be modelled by our Markov model. If one of the tests is not implemented in reality the pertinent test rate may be set to zero.
Description of the online tests:
- PLC diagnostic test:
As said before a standard PLC is used. Therefore we assume only simple online tests like a watchdog and parity bit test of the memory which are common today also for standard electronics. This will result in a low diagnostic coverage Cp of perhaps 30%. The test rate is rtp. We assume that the PLC after failure detection permanently switches off the outputs connected with CC and RC.
- CC diagnostic test:
In suitable time intervals e.g. once per day or during maintenance the PLC switches off the motor movement using the current converter CC. In parallel the PLC monitors the output signal of the rotation sensor S so that it can detect the reaction of CC. If the movement is not stopped by CC the PLC permanently stops the motor via the relay circuit RC. The diagnostic coverage of this test is named Cc and the test rate is called rtc.
- Rotation sensor diagnostic test:
The diagnostic test of CC can only be effective if the rotation sensor S is able to detect the motion of the motor. To check this, the PLC reads the sensor signal after switching on the motor. If the motion is not detected, the PLC permanently stops the motor using the relay circuit RC. Diagnostic coverage of this test: Cs, test rate: rts.
- Relay circuit diagnostic test:
After a normal stop of the motor using CC and after executing the CC diagnostic test the PLC switches off the control signal for RC. Simultaneously the PLC monitors the corresponding contact(s) of the relay circuit RC. If RC does not react properly the PLC permanently stops the motor via the current converter CC. Because of the test’s simplicity the diagnostic coverage Cr can reach 100%. The test rate is titled rtr.
7.2. Markov model and assumptions
The four subsystems PLC, CC, S and RC are assumed to be hit by random failures. Because of the totally different structure of the two channels we did not assume common cause failures. The Markov model has to represent the failure of all subsystems in all possible sequences and in all combinations. We assume that the system is disconnected from power after the motor has been permanently stopped in case of failure detection. In this situation we do not have to assume further random failures occurring during repair. The repair rate is called rr.
With these assumptions we get the Markov model shown in Figure 21. All states depicted by circles are labelled with the subsystems which have failed dangerously (the exception is state 1, where everything is ok). The letter n after a subsystem's name indicates that the failure of this subsystem is not detectable. State 24 shows the permanent stopping of the motor after failure detection.
For the evaluation of the model it is necessary to know which of the states are dangerous. A fault tree is a useful tool to identify them. The fault tree of our system is depicted in Figure 20. This tree could also form the first step of a quantitative fault tree analysis (FTA), which is able to deliver probability values [6], [7], but in this case it is used as a qualitative tool only.
Figure 20: Fault tree of the dual channel system in mixed technology
According to the fault tree states 14, 23 and 25 are dangerous states which means the loss of the emergency stop function. State 25 summarises all dangerous states where no test is effective. In Figure 21 all dangerous states are additionally labelled by the appendix “dang”.
The single fault tolerance of this redundant architecture can be seen in the Markov model from the fact that no state with one faulty subsystem (states 2 to 9) is dangerous. State 26 represents the hazardous state which will be reached if an emergency stop has to be executed while the system is in a dangerous state. In the labelling of the transition arcs dt is used instead of Δt and lp, lc, ls and lr instead of λp, λc, λs and λr.
Figure 21: Markov model DCSMT of the dual channel system in mixed technology
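Arc labels of the form "rate times dt" suggest a time-stepped evaluation in which each arc carries the probability of a transition within one small interval dt. The sketch below shows this principle on a deliberately small four-state toy model (ok, dangerous failure, detected and under repair, hazard). It is not model DCSMT; the state set, the chosen dt of one hour and all names are assumptions made for illustration, and only the rate values are borrowed from the reference tables.

```python
HOURS_PER_YEAR = 8760.0  # assumption: 365 days of 24 hours


def build_matrix(lam, rt, rd, rr, rrh, dt):
    """One-step transition matrix of the toy model.

    States: 0 = ok, 1 = dangerous failure (detectable),
            2 = detected and under repair, 3 = hazard (then repaired).
    """
    P = [[0.0] * 4 for _ in range(4)]
    P[0][1] = lam * dt   # random dangerous failure
    P[1][2] = rt * dt    # online test detects the failure
    P[1][3] = rd * dt    # demand hits the failed system
    P[2][0] = rr * dt    # repair after detection
    P[3][0] = rrh * dt   # repair after hazardous event
    for i in range(4):
        P[i][i] = 1.0 - sum(P[i])  # probability of staying in state i
    return P


def step(p, P):
    """Advance the state probability vector by one interval dt."""
    n = len(p)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]


dt = 1.0  # hours; every rate * dt must stay well below 1
P = build_matrix(lam=1.0 / (15 * HOURS_PER_YEAR), rt=1.0 / 24.0,
                 rd=1.0 / HOURS_PER_YEAR, rr=1.0 / 8.0, rrh=1.0 / 8.0, dt=dt)

p = [1.0, 0.0, 0.0, 0.0]                   # start in the fully operational state
for _ in range(int(10 * HOURS_PER_YEAR)):  # mission time of 10 years
    p = step(p, P)

print("state probabilities after 10 years:", ["%.3e" % x for x in p])
```

The same loop can be laid out in a spreadsheet with one row per time step and one column per state.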
7.3. Result of evaluation
A reference input parameter combination was chosen which is shown in the following table. With a demand rate of 1/year the appliance was evaluated as a low demand system.
Parameter | Symbol | Value
MTTF of the programmable logic controller (PLC) | MTTFd plc | 15 years
MTTF of current converter (CC) | MTTFd cc | 15 years
MTTF of rotation sensor (S) | MTTFd s | 15 years
MTTF of relay circuit (RC) | MTTFd rc | 15 years
Coverage of the PLC diagnostic test | Cp | 0.3
Coverage of the CC diagnostic test | Cc | 0.9
Coverage of the rotation sensor diagnostic test | Cs | 0.9
Coverage of the relay circuit diagnostic test | Cr | 1.0
Test rate of PLC diagnostic tests | rtp | 1/hour
Test rate of the current converter diagnostic test | rtc | 1/(24 hours)
Test rate of the rotation sensor diagnostic test | rts | 1/(24 hours)
Test rate of the relay circuit diagnostic test | rtr | 1/(24 hours)
Repair rate after failure detection | rr | 1/(8 hours)
Demand rate of the emergency stop function | rd | 1/year
Repair rate after hazardous event | rrh | 1/(8 hours)
Mission time (life time) | TM | 10 years
Based on this reference data a number of simulation runs with different parameter deviations were performed. The result is shown in Figure 22. One can see that SIL 2 will hardly be achieved by the reference configuration.
Investigations revealed that a demand rate lower than 1/year results in an increase of the probability of a dangerous failure on demand. The deterioration reaches about 2/3 of a SIL step if the demand rate is assumed to be zero. This effect is independent from the other input parameters and it is due to the fact that a demand hitting a defective system will not only lead to a hazardous event but will also reveal that the system has failed dangerously. Consequently a very low demand rate will raise the fraction of systems dwelling in dangerous undetectable states. Therefore it is sensible to check the emergency stop function manually once a year.
Figure 22: Evaluation result of Markov model DCSMT
8. Triple channel system with comparison
in accordance with category 4 of EN 954‑1
8.1. Description
In rare cases the problem of providing the safety functions after the occurrence of a fault is solved by the implementation of triple redundancy. A typical example of homogeneous redundancy is given by the triple channel system depicted in Figure 23. Whether category 3 or 4 can be met depends on the extent to which faults can be detected or tolerated.
Figure 23: Block diagram of a triple channel system with comparison
The system comprises three sensors (S1, S2 and S3) of same type and three programmable electronic devices (PED1, PED2 and PED3) of identical type (with integrated power supply) in connection with a single drive (D). Each PED is connected to an individual input (IN1, IN2 and IN3) of the drive. In reality the PEDs will usually be given by microcontrollers. The three cross links between them are intended for data interchange.
8.2. Markov model and assumptions
More or less the same or equivalent assumptions are made as for the dual channel architecture in chapter 6.1 in order to derive a Markov model which can deliver comparable results.
The system contains 9 individual components: three sensors, three PEDs and three switch-off paths (“inputs”) of the drive. Supposing each component to be either operational or defective will result in 2^9 = 512 different failure combinations. Therefore all three techniques of chapter 2.8 have been applied in order to reduce the number of states needed:
- Combining all “dangerous” states where definitively no online test will be effective any longer,
- Making use of the architecture’s symmetry and
- Termination of further model development after the fourth failure in sequence.
Failures due to common cause effects have been taken into account by the simple β model, implementing the modelling principle demonstrated in Figure 4 of chapter 2.3.
The result was a Markov model consisting of 91 states and a very large number of transition arcs. Due to its complexity no drawing of it is shown in this report.
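The saving from the symmetry technique alone can be estimated by counting failure combinations. The sketch below assumes that the symmetry consists of exchanging whole channels (sensor, PED and drive input together), as in the example of S2 and PED1 versus S1 and PED2 in chapter 6.2. It merges combinations that differ only by such an exchange. The other two techniques and the extra states for detection, repair and hazard are not modelled, so the counts are not expected to match the 91 states of the final model.

```python
from itertools import product


def count_states(channels: int, parts_per_channel: int = 3):
    """Count failure combinations before and after merging mirror states.

    Each part is either ok (0) or failed (1). Two combinations are merged
    if one results from the other by permuting whole channels.
    """
    channel_states = list(product((0, 1), repeat=parts_per_channel))
    raw = set()
    merged = set()
    for combo in product(channel_states, repeat=channels):
        raw.add(combo)
        merged.add(tuple(sorted(combo)))  # canonical form under channel swaps
    return len(raw), len(merged)


print(count_states(2))  # dual channel system:   (64, 36)
print(count_states(3))  # triple channel system: (512, 120)
```

For the dual channel system 36 of 64 combinations remain, which fits the statement that the number of states is cut by nearly one half; for the triple channel system symmetry alone leaves 120 of the 512 combinations.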
8.3. Result of evaluation
The Markov model has been evaluated as a high demand system with a demand rate of 10 demands on the safety function per hour. All calculations are based on a mission time of ten years. The input parameters of the reference set are nearly identical to those in the table in chapter 6.3. Again, the MTTF of each switch-off path of the drive D is set to 30 years; all other components have an MTTF of 15 years. Deviating from the symmetric dual channel system, the three test rates for sensor comparison, PED self test and switch-off path test are set to one per day (instead of one every 10 seconds). The complete reference parameter set is presented in the following table.
Parameter | Symbol | Value
MTTF of the sensors (S) | MTTFd s | 15 years
MTTF of the programmable logic devices (PED) | MTTFd p | 15 years
MTTF of the drive (D) | MTTFd d | 10 years
Diagnostic coverage of the sensor comparison | Cs | 1
Diagnostic coverage of the PED self tests | Cp | 0.9
Diagnostic coverage of the switch-off inputs of the drive | Ci | 1
Test rate of all online tests | rt | 1/(24 hours)
Repair rate after failure detection | rr | 1/(8 hours)
Demand rate on the safety function | rd | 10/hour
Repair rate after hazardous event | rrh | 1/(8 hours)
Mission time (life time) | TM | 10 years
In the following paragraphs the influence of the different parameters on the probability of a dangerous failure per hour will be discussed. The complete compilation of all results is shown in the bar diagram of Figure 24. It should be noticed that in this diagram the probability is depicted in a logarithmic scale. The three β factors for the sensors, the PEDs and the switch-off inputs of the drive have been assigned the same value which is simply called β. Each parameter combination has been evaluated for the β values 0, 1%, 5% and 10%. In Figure 24 the β factor is indicated by the colour of the bars.
Implementation of diagnostic tests
A comparison of the results obtained by run 1, run 2, run 7 and run 8 reveals the immense impact of diagnostic coverage on the probability of a dangerous failure. Only the coverage for the PED has been altered. The step from 90% to 99% results in an improvement of about two orders of magnitude (β=0), but a β factor of only 1% nullifies this gain. The step from 60% to 90% provides only a small improvement, also for a β of 1% or 5%. Figure 24 shows that at least 90% diagnostic coverage is necessary to achieve SIL 2 with reasonable MTTFs (β=1%). Comparing run 1, run 2, run 3 and run 4 gives an answer to the question whether internal online tests for PEDs are necessary. In run 4 no diagnostics were assumed, while in run 3 100% diagnostics for sensors and drive and in run 7 additionally 60% diagnostics for the PED were chosen. Run 7 may be a good example for using three standard programmable logic controllers, implementing 100% diagnostics for the peripheral components and using these systems for safety functions. Figure 24 shows that this version leads to SIL 2 with a β of 1%. As a result we can state that we do not gain much in a triple redundant system by using high diagnostic coverage for the PEDs, because of the tremendous influence of the common cause factor.
Test rate
All tests in the system are assumed to be executed once within the same cycle. Thus, they all are related to the same test rate (or test interval). Comparing run 1 and run 5 the test rate is reduced from one test per day to one test per week. In run 6 the test rate is one test in 10 seconds. The result of the evaluation clearly shows that there is very low influence of the test rate on the probability of a dangerous failure per hour. The effects are even smaller than those at a dual channel system (see Figure 17).
Failure rate of the subsystems
Considering Figure 24, the impact of the subsystems’ failure rates can be studied by a comparison of run 1 (15 years MTTF), run 9 (30 years MTTF) and run 10 (100 years MTTF). The failure rates of all three types of components are altered in the same manner. It should be noticed that the relationship between the failure rate and the probability of a dangerous failure per hour is non-linear, but that the failure rate nevertheless has a big bearing on the SIL.
Comparing run 1, run 11 and run 12 reveals what happens if only the failure rates of the sensors and the drive are altered. A lower or higher MTTF for sensors and drive than for the PED has only a small effect on the SIL. This can be explained by the 100% diagnostic coverage for the sensors and the drive. This result justifies taking the same MTTF for all three subsystems in our simulations.
Influence of common cause
The bar diagram of Figure 24 clearly depicts the tremendous impact of the β factor for each parameter combination. The reference combination loses nearly 2 SIL steps due to a common cause factor of 10%. As stated before, a β factor of even less than 5% destroys the gain obtained by high diagnostic coverage. The gain from better subsystems is also strongly limited by the β factor. Due to common cause effects it seems to be hard to achieve SIL 3 with complex electronic systems. It should be noted that, according to IEC 61508‑6, a β factor of 1% may be achievable with diverse redundancy only.
Figure 24: Evaluation result of Markov model TCSC for the symmetric triple channel system
9. Designated architectures of CES for the machinery sector
It could be shown in this report that typical architectures used in machinery which fulfil the requirements of EN 954‑1 can be linked to the SILs of IEC 61508. Figure 25 compiles some results obtained by the Markov models presented in the preceding chapters.
In order to make different architectures comparable the input parameters for identical or similar functional units have been set to the same values. In other cases reasonable values have been assumed. (Unifying the input parameters as far as possible will sometimes lead to results differing a little from those presented earlier.)
Unless otherwise noted, the following input data have been assumed:
MTTF of sensors, PEDs and PLCs: 15 years
MTTF of switch-off paths of the drive: 30 years
MTTF of a watchdog: 100 years
MTTF of a relay circuit (two contactors): 50 years
Repair rate (after failure detection or hazardous event): 1/(8 hours)
All test rates of single channel systems: 1/(15 min)
All test rates of dual or triple channel systems: 1/(10 s)
All demand rates of single channel systems: 1/(24 hours)
All demand rates of dual or triple channel systems: 10/hour
Mission time (life time): 10 years
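The input data above can be expressed as rates per hour, which is the form a Markov model needs. The following is a minimal sketch, assuming constant failure rates (so that the rate is 1/MTTF) and a year of 8760 hours; the variable and function names are illustrative and do not come from the report.

```python
# Assumption of this sketch: one year = 365 days of 24 hours.
HOURS_PER_YEAR = 8760.0


def mttf_years_to_rate_per_hour(mttf_years):
    """Convert an MTTF in years to a constant failure rate per hour."""
    return 1.0 / (mttf_years * HOURS_PER_YEAR)


# MTTF values listed in the input data (years).
INPUT_MTTF_YEARS = {
    "sensor / PED / PLC": 15.0,
    "switch-off path of the drive": 30.0,
    "watchdog": 100.0,
    "relay circuit (two contactors)": 50.0,
}

FAILURE_RATES = {
    name: mttf_years_to_rate_per_hour(years)
    for name, years in INPUT_MTTF_YEARS.items()
}

# Remaining input data, all expressed per hour.
REPAIR_RATE = 1.0 / 8.0            # 1/(8 hours)
TEST_RATE_SINGLE = 1.0 / 0.25      # 1/(15 min)
TEST_RATE_MULTI = 3600.0 / 10.0    # 1/(10 s)
DEMAND_RATE_SINGLE = 1.0 / 24.0    # 1/(24 hours)
DEMAND_RATE_MULTI = 10.0           # 10/hour
MISSION_TIME_HOURS = 10.0 * HOURS_PER_YEAR

if __name__ == "__main__":
    for name, rate in FAILURE_RATES.items():
        print(f"{name}: {rate:.3e} failures per hour")
```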
All evaluations have been executed applying the high demand procedure. As shown in Figure 25, SILs 1 to 3 can be achieved by system architectures belonging to different categories. For category B no link to a SIL is possible. With category 2 and suitable tests running at an interval about 100 times shorter than the mean time between demands, SIL 1 is achievable. Redundancy without any diagnostic tests is comparable to category B systems and cannot be used even for SIL 1. Redundancy in mixed technology may achieve SIL 2 if online testing of the periphery is implemented. To achieve SIL 3, a redundant system needs 99% diagnostic coverage or a much better MTTF of the subsystems than we presumed for our reference systems. Given appropriate conditions, SIL 3 is possible with a triple redundant system.
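For the high demand procedure, a SIL is read off from the average probability of a dangerous failure per hour (PFH). The sketch below assumes the target failure measures for high demand or continuous mode given in IEC 61508‑1; the function name `sil_from_pfh` is hypothetical, and returning `None` outside the listed bands is a choice made for this illustration.

```python
# SIL bands for high demand / continuous mode: (SIL, lower bound, upper bound)
# in dangerous failures per hour, lower bound inclusive.
SIL_BANDS = [
    (4, 1e-9, 1e-8),
    (3, 1e-8, 1e-7),
    (2, 1e-7, 1e-6),
    (1, 1e-6, 1e-5),
]


def sil_from_pfh(pfh):
    """Return the SIL whose band contains pfh, or None if no band matches."""
    for sil, low, high in SIL_BANDS:
        if low <= pfh < high:
            return sil
    return None


if __name__ == "__main__":
    for value in (5e-5, 5e-6, 5e-7, 5e-8):
        print(value, "->", sil_from_pfh(value))
```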
Figure 25: Comparison of different architectures used in machinery
Figure 25 demonstrates that simply doubling the signal processing paths without implementing online tests (“simple redundancy”) does not provide a significant gain if the mission time is of a similar order of magnitude as the MTTF of a single channel. Other investigations of ours have shown that “simple redundancy” can only have a positive effect if the mission time is one order of magnitude smaller than the MTTF. For simple systems (e.g. contactors or valves) which can be proof tested once a year (i.e. 100% diagnostic coverage for all subsystems), simple doubling of the hardware may be useful. For complex subsystems like ASICs or PEDs, simple doubling is only useful if the MTTF is one order of magnitude bigger than the mission time (life time) of the safety system (possible e.g. for some ASICs). In all other cases online diagnostics are essential in redundant safety-related systems as well.
The results compiled in Figure 25 could be helpful for standardisation. A link may be drawn between SILs and the categories for so-called designated architectures. The architectures introduced in this chapter are proposed as designated architectures for the machinery sector. A manufacturer who can prove that his architecture is equivalent to one of the designated architectures only has to determine the MTTFdangerous of the subsystems, determine the diagnostic coverage of the online tests and, in the case of redundant systems, estimate the common cause factor. He may then derive the SIL from a table. As an example, a table of this kind is presented in the following. This table is a compilation of results achieved by choosing particular input data. New Markov modelling will be necessary only if system architectures and/or subsystem parameters are used which are not listed in the table.
There are several data banks which can be employed to determine the MTTF of hardware components, for example [11], [12], [13]. Standardisation could, for example, require the use of one of these data sources in order to attain comparable results. The diagnostic coverage can be determined using the failure model in annex A of part 2 of IEC 61508 [1]. Part 6 of IEC 61508 may be helpful to estimate the common cause factor β. Standardisation could specify one methodology for estimating the CCF. With this proposal a link between the two standards IEC 61508 and EN 954‑1 is possible. It is not a fixed link between categories and SILs, but it is applicable without individual quantification of control systems.
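The dependence of “simple redundancy” on the ratio of mission time to MTTF can be illustrated with a much simpler calculation than the Markov models of this report. The sketch below assumes constant failure rates, two independent identical channels, no tests, no repair and no common cause failures, and it compares the probability of having failed by the end of the mission time rather than the probability of a dangerous failure per hour. All function names are illustrative.

```python
import math


def unreliability_single(mission_years, mttf_years):
    """Probability that one untested channel has failed by the mission time."""
    return 1.0 - math.exp(-mission_years / mttf_years)


def unreliability_1oo2_untested(mission_years, mttf_years):
    """Probability that both independent, untested channels have failed."""
    return unreliability_single(mission_years, mttf_years) ** 2


def redundancy_gain(mission_years, mttf_years):
    """Factor by which doubling reduces the probability of failure."""
    single = unreliability_single(mission_years, mttf_years)
    dual = unreliability_1oo2_untested(mission_years, mttf_years)
    return single / dual


if __name__ == "__main__":
    for mttf in (15.0, 100.0, 1000.0):
        print(f"MTTF {mttf:6.0f} years: gain {redundancy_gain(10.0, mttf):.1f}")
```

Under these assumptions a 10 year mission with a 15 year MTTF gives a gain of only about 2, while a 100 year MTTF gives a gain of about 10, which is in line with the qualitative statement above.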
Table: Possible designated architectures for machinery

| SIL | System architecture | MTTF (years), In/Processing/Out | CCF (%) | Diagnostic coverage (%), In/Processing/Out | Cat. |
|---|---|---|---|---|---|
| - | Single PE, Single I/O | 15/15/30 | - | 0/0/0 | B |
| - | Single PE, Single I, Ext. WD(u/t) | 15/15/30 | - | 0/60/0 | B |
| - | Dual PE, Dual I/O, 1oo2 | 15/15/30 | 5 | 0/0/0 | ? |
| 1 | Single PE, Single I, Ext. WD(u/t) | 15/15/30 | - | 100/60/100 | 2 |
| 1 | Single PE, Single I, Ext. WD(u/t) | 7.5/15/10 | - | 100/60/100 | 2 |
| 1 | Dual PE, IPC, Dual I/O, 1oo2 | 15/15/30 | 5 | 100/60/100 | 3 |
| 1 | Dual PE, IPC, Dual I/O, 1oo2 | 15/15/30 | 10 | 100/90/100 | 3 |
| 1 | Dual PE, IPC, Dual I/O, 1oo2 | 45/15/60 | 10 | 100/90/100 | 3 |
| 1 | Triple PE, IPC, Triple I/O, 1oo3 | 15/15/30 | 5 | 100/60/100 | 3 |
| 1 | Triple PE, IPC, Triple I/O, 1oo3 | 15/15/30 | 10 | 100/90/100 | 4 |
| 2 | Single PE, Single I, Ext. WD(t) | 15/15/30 | - | 100/90*/100 | 2 |
| 2 | Dual PE, IPC, Dual I/O, 1oo2 | 15/15/30 | 1 | 100/90/100 | 3 |
| 2 | Dual PE, IPC, Dual I/O, 1oo2 | 30/30/60 | 5 | 100/90/100 | 3 |
| 2 | Dual PE, IPC, Dual I/O, 1oo2 | 7.5/15/10 | 1 | 100/99/100 | 4 |
| 2 | Mixed Dual Processing, Dual O, 1oo2 | ∞/(15/100)/(15/100) | - | 0/(30/100)/(100/100) | 3 |
| 2 | Triple PE, IPC, Triple I/O, 1oo3 | 15/15/30 | 1 | 100/60/100 | 3 |
| 2 | Triple PE, IPC, Triple I/O, 1oo3 | 100/100/200 | 10 | 100/90/100 | 4 |
| 3 | Single PE, Single I, Ext. WD(t) | 30/30/60 | - | 100/99*/100 | 2 |
| 3 | Dual PE, IPC, Dual I/O, 1oo2 | 45/45/90 | 1 | 100/99/100 | 4 |
| 3 | Triple PE, IPC, Triple I/O, 1oo3 | 100/100/200 | 1 | 100/90/100 | 4 |

Conditions for single channel systems:
All test rates: 1/(15 min)
Demand rate: 1/(24 h)
Repair rate: 1/(8 h)
Mission time (life time): 10 years
MTTFd of watchdog: 100 years
MTTFd of switch-off path for watchdog: equal to normal switch-off path

Conditions for dual or triple channel systems:
All test rates: 1/(24 h)
Demand rate: 10/h
Repair rate: 1/(8 h)
Mission time (life time): 10 years
MTTFd of output sensor of mixed system: 15 years (output sensor not tested)

WD(u/t): Watchdog and pertinent switch-off path untested or tested
WD(t): Watchdog and pertinent switch-off path tested
IPC: Inter-processor communication
(* not achievable by simple watchdog)
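The table lookup proposed for manufacturers can be sketched as follows. Only a few rows of the table above are encoded here for illustration; the data structure, the field names and the function `lookup_sil` are hypothetical. A combination that is not listed raises an error, reflecting the statement that new Markov modelling is then necessary.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    sil: Optional[int]          # None means no SIL can be claimed
    architecture: str
    mttf_years: str             # In/Processing/Out
    ccf_percent: Optional[int]  # None for single channel systems
    dc_percent: str             # In/Processing/Out
    category: str


# Excerpt of the table of designated architectures (not the full table).
TABLE = [
    Row(None, "Single PE, Single I/O", "15/15/30", None, "0/0/0", "B"),
    Row(1, "Dual PE, IPC, Dual I/O, 1oo2", "15/15/30", 5, "100/60/100", "3"),
    Row(1, "Dual PE, IPC, Dual I/O, 1oo2", "15/15/30", 10, "100/90/100", "3"),
    Row(2, "Dual PE, IPC, Dual I/O, 1oo2", "15/15/30", 1, "100/90/100", "3"),
    Row(2, "Dual PE, IPC, Dual I/O, 1oo2", "30/30/60", 5, "100/90/100", "3"),
    Row(3, "Dual PE, IPC, Dual I/O, 1oo2", "45/45/90", 1, "100/99/100", "4"),
]


def lookup_sil(architecture, mttf_years, ccf_percent, dc_percent):
    """Return the SIL of a listed combination; raise KeyError if not listed."""
    key = (architecture, mttf_years, ccf_percent, dc_percent)
    for row in TABLE:
        if (row.architecture, row.mttf_years,
                row.ccf_percent, row.dc_percent) == key:
            return row.sil
    raise KeyError("combination not listed: new Markov modelling is needed")


if __name__ == "__main__":
    print(lookup_sil("Dual PE, IPC, Dual I/O, 1oo2", "15/15/30", 1, "100/90/100"))
```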
10. Conclusions
Within the STSARCES research project, WP 2.1 set out to investigate systematically the effect of the test time interval on the Safety Integrity Level (SIL). In addition, the concept of a proof test that makes the control system “as good as new” is a theoretical model which is not suitable for the validation of complex electronic systems (CES) in the machinery sector. Therefore we determined the average probability of a dangerous failure per hour or the average probability of a failure on demand during the typical lifetime of a control system, i.e. 10 years. (In the report the lifetime is referred to as the “mission time”.) We showed that, without proof tests, the demand has to be introduced into the Markov model: one of the states is the hazard state, which is entered when a demand arises at a point in time at which the control system cannot perform the safety function because of an internal failure. With this model we could determine the SIL for the three modes of operation according to IEC 61508. The results are comparable, and the SIL of a CES does not depend on the mode of operation.
To determine the influence of the test time interval, intermediate states had to be introduced into our Markov models in which faults are present but have not yet been detected because the online tests have not yet been executed. With these models we could show that the test time interval is connected to the mean time to demand in a single channel system and to the mean time to dangerous failure (MTTF) of the individual channels in a multi channel system. These results can be generalised for all CESs. The generalisation justifies a dramatic simplification of the Markov modelling, which is necessary to handle existing CES in the machinery sector.
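The connection between test interval and mean time to demand in a single channel system can be made plausible with a back-of-the-envelope calculation. The sketch below is not the Markov model of this report: it assumes that a fault is already present, that the test would detect it with certainty, and that both the next test and the next demand arrive at constant rates, as transitions do in a Markov model. Under these assumptions the probability that the demand comes first is demand rate / (demand rate + test rate). The function name is illustrative.

```python
def p_demand_before_test(demand_rate, test_rate):
    """Probability that a demand arrives before the next test.

    Both events are modelled as competing transitions with constant rates
    (given in the same unit, e.g. per hour), starting from a state in which
    a detectable fault is already present.
    """
    return demand_rate / (demand_rate + test_rate)


if __name__ == "__main__":
    # Single channel reference data: demand 1/(24 h), test 1/(15 min) = 4/h.
    print(p_demand_before_test(1.0 / 24.0, 4.0))
    # Test interval 100 times shorter than the mean time between demands.
    print(p_demand_before_test(1.0, 100.0))
```

With a test interval 100 times shorter than the mean time between demands, only about 1 in 100 detectable faults would meet a demand before being detected in this simplified picture.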
This report also demonstrated that a link between the categories (CAT) of EN 954‑1 and the SILs of IEC 61508 cannot be made by a fixed relation. If we interpret a category as an architecture with a specific diagnostic coverage, a SIL can be determined using several assumptions which are common in the machinery sector and giving the MTTFdangerous as an input parameter. For realistic input data the fixed relation of the past can be derived but this is only one possibility. It can be shown that SIL 3 is hard to achieve for a mission time of 10 years with dual redundancy only.
The concept of designated architectures was developed on the basis of modelling the different typical architectures for the machinery sector. This concept, which had been proposed to IEC 61508 several years ago, was rejected there because the standard is generic and it was impossible to find generic architectures for all application sectors. However, the concept seems to be usable in a sector specific standard such as IEC 62061 [14]. This is the reason why the authors propose this concept as a link between CATs and SILs and as an input to IEC 62061. The concept is likely to be accepted by machine manufacturers because it strongly simplifies the quantification of CES in the machinery sector.
11. References
[2] EN 954‑1: 1996 Safety of machinery – Safety-related parts of control systems – Part 1: General principles for design. Brussels, December 1996.
[3] Jan Jacobson; Jacques Hérard: Methods for fault detection. STSARCES WP 2.2. Borås, September 1999.
[4] William M. Goble: Evaluating Control Systems Reliability. Instrument Society of America (ISA), Research Triangle Park, North Carolina 1992.
[5] Alessandro Birolini: Qualität und Zuverlässigkeit technischer Systeme. Springer, Berlin 1991.
[6] INRS, Dept. Ingénierie des Equipements de Travail, Lab. Sûreté des systèmes Electroniques: STSARCES WP 1.2 Software Quality and Safety Requirements, Tools for Software Fault Avoidance. Vandoeuvre, December 1999.
[7] Reliability Analysis Center (RAC): Fault Tree Analysis Application Guide. Rome, NY 1990.
[8] CARMS (Computer Aided Rate Modeling and Simulation). DAINA, Columbia Heights, MN 1994.
[9] Werner Kleinbreuer; Franz Kreuzkampf; Karlheinz Meffert; Dietmar Reinert: Categories for safety-related control systems according to EN 954-1. BIA-Report 6/97e. Sankt Augustin, September 1999.
[10] Heinz Gall; Klaus Kemp: Einsatz und Wirksamkeit von Programmlaufüberwachungen. In: atp 40 (1998), Oldenbourg, p. 40 – 48.
[11] FARADIP.THREE (Failure Rate and Failure Mode Data Bank and Failure Mode and Effect Analysis Package). Technis, Tonbridge, Kent UK 1997.
[12] Reliability Analysis Center (RAC): Automated Databook (Electronic Parts Reliability Data, Nonelectronic Parts Reliability Data). Rome, NY 1994 - 1997.
[13] SN 29500 Failure Rates of Components, Part 1 – 7, Part 9 – 10. Siemens AG, ZT TN Corporate Functions Technical Regulation and Standardization, Munich and Erlangen 1982 – 1999.
[14] IEC 62061 Functional safety of electrical, electronic and programmable electronic control systems for machinery: Working group document 1999.
[15] INRS, Dept. Ingénierie des Equipements de Travail, Lab. Sûreté des systèmes Electroniques: STSARCES WP 1: Software Quality and Safety Requirements. Vandoeuvre, December 1999.
[16] Timo Malm; Maarit Kivipuro: Safety Validation of Complex Components – Validation by Analysis. STSARCES WP 3.1. Tampere, September 1999.
[1] For machinery normally only a few digital sensors like switches are used. Monitoring of the drive is also done by digital signals. Thus a 100% diagnostic coverage is possible.