Annex 7 : Methods for fault detection
Final Report of WP2.2
European Project STSARCES
Contract SMT 4CT97-2191
Contents
Preface
Scope
Summary
1 Introduction
1.1 Diagnostic coverage
1.2 Diagnostic test interval
1.3 Architecture
2 Description of methods
2.1 Processing units
2.2 Invariable memory ranges
2.3 Variable memory ranges
2.4 I/O Units and Interface
2.5 Data paths
2.6 Power supply
2.7 Program sequence
3 Diagnostic coverage
3.1 Processing unit
3.2 Memory ranges
3.3 I/O Units and Interfaces
3.4 Data paths
3.5 Power supply
3.6 Program sequence
4 Requirements
4.1 Processing unit
4.2 Invariable memory ranges
4.3 Variable memory ranges
4.4 I/O Units and Interface
4.5 Data paths
4.6 Power supply
4.7 Program sequence
5 Validation methods
5.1 Processing units
5.2 Invariable memory ranges
5.3 Variable memory ranges
5.4 I/O Units and Interface
5.5 Data paths
5.6 Power supply
5.7 Program sequence
6 Conclusions
Preface
The European research project STSARCES (Standards for Safety Related Complex Electronic Systems) aims to harmonise validation methods for safety-related parts of machine control systems.
This is a draft issued to be the subject of further discussion. In particular, the requirements and the assignment of requirements to specific categories (see chapter 4) still have to be harmonised between the STSARCES partners.
| Revision | Date | Changes made | Distribution |
|---|---|---|---|
| 0 | 26 August 1998 | First issue | BIA, INERIS |
| 1 | 4 December 1998 | Chapters on coverage and validation methods added | BIA |
| 2 | 23 December 1998 | Changes to chapter 1.1 and requirements. Chapter 6 integrated into chapter 2. Coverage chapter is now chapter 3. Some other minor changes. | BIA, INERIS |
| 3 | 24 February 1999 | Chapter 1.3 added. Several additions to chapter 3. Requirements of chapter 4 updated. Check lists in chapter 5 modified. Some other minor changes. | All STSARCES partners |
| 4 | 10 September 1999 | Addition to chapter 1.1. Addition of figures to chapter 3. | BIA |
| 5 | 24 September 1999 | Changes to chapter 1.1: DCN definition revised. Changes to chapter 1.3: new text added, rectification of the reference in the last paragraph. Figures on applicable architecture configurations added. Changes to chapter 3: tables listing properties of methods for diagnostic coverage slightly modified. Changes to chapter 4. | All STSARCES partners |
| 5 | 29 February 2000 | Summary and conclusion added. Page numbers included in the table of contents. | INERIS |
Scope
Work Package 2.2 ”Methods for Fault detection” of the European research project STSARCES has the objective of establishing which techniques and measures can be regarded as basic or well-tried safety principles for the detection of hardware faults in machine control systems. It is part of STSARCES WP 2 ”Hardware Safety”, which also contains WP 2.1 ”Quantitative analysis of complex electronic systems using fault tree analysis and Markov modelling”.
This document contains an introduction (chapter 1), a background description of the techniques (chapter 2), the calculation and definition of diagnostic coverage (chapter 3), a listing of the requirements (chapter 4) and validation methods (chapter 5). The introduction and the background description are regarded as necessary to harmonise the opinions of the STSARCES partners. The overall hardware architecture is not discussed in this document.
Summary
The evaluation of hardware safety is an important factor when assessing the overall characteristics of a safety-related system according to the machinery directive. The increasing integration of programmable electronics in industrial equipment adds new aspects that need to be considered. The experience and knowledge gathered from existing systems and the modern approach in system development could contribute to the validation process. The present work is an attempt to lay down some reference marks on existing methods and principles considered as state of the art. It also clarifies the interdependency between architecture design, diagnostic coverage and category (i.e. the system behaviour at fault).
The basic concepts of diagnostic coverage, test interval and architecture are explained. Measures for checking processing units, memory, I/O units, interfaces, data paths, power supply and program sequence are described. These measures are then grouped into requirements for control systems to be used with safety functions of different categories.
Finally, preliminary tables are presented to summarise the aspects related to the validation of safety-related systems.
1 Introduction
Programmable electronic systems (PES) have the ability to detect faults within themselves before a fault is manifested as a failure of the system. The techniques and measures used focus on different parts of the electronic hardware and may require different amounts of system effort. It is regarded as state of the art to implement techniques for fault detection in PES used in safety-critical applications.
It is easy to imagine possible faults which can cause unexpected behaviour of the controlled machine. A bit in a memory cell may be stuck at ”0” or ”1”. The output circuits may be stuck at ”ON”. A software fault may cause a task to enter an ”eternal loop”. Interruptions in the supply power, or variations in the voltage level, may influence the execution of the software. Data transferred on serial communication lines may be distorted by interference. An internal CPU fault might cause incorrect execution. There are techniques and measures to detect such faults before the machine gets out of control.
All safety-critical systems should implement basic measures to detect faults and, where possible, control the failures which might occur. The standard EN 954-1 specifies 5 categories for system behaviour at fault. Basic safety principles should be implemented for categories B, 1, 2, 3 and 4. For categories 1, 2, 3 and 4, well-tried safety principles also have to be implemented. The presence and performance of the safety principles must be validated.
There is not yet a common understanding of whether a certain measure fulfils the requirements for safety principles for a specific category.
| Category | System behaviour | Principles to achieve safety |
|---|---|---|
| B | The occurrence of a fault can lead to the loss of the safety function. | Mainly characterised by selection of components |
| 1 | As in category B, but the probability of occurrence is lower than in category B. | " |
| 2 | Faults are detected by periodic checks at suitable intervals. The occurrence of a fault can lead to the loss of the safety function between the checks. The loss of the safety function is detected by the check. | Mainly characterised by structure |
| 3 | When a single fault occurs the safety function is always performed. Some but not all faults will be detected. Accumulation of undetected faults can lead to the loss of the safety function. | " |
| 4 | When the fault (faults) occur the safety function is always performed. The fault (faults) will be detected in time to prevent the loss of the safety function. | " |

Figure 1a. Summary of categories for system behaviour at fault according to standard EN 954-1.
1.1 Diagnostic coverage
Simple tests will not detect all hardware faults. Elaborate tests will detect many hardware faults, at the cost of considerable processing effort. The diagnostic coverage, DC, is defined as the fractional decrease in the probability of dangerous hardware failure resulting from the operation of the automatic diagnostic tests [IEC 61508-4, clause 3.8.6]. See formula [1]. If the test detects all faults, the coverage is 100%. If no faults are detectable, the coverage is 0%.
diagnostic coverage $\text{DC} = \dfrac{\sum \text{probabilities of dangerous failures detected by the diagnostic tests}}{\sum \text{probabilities of all dangerous failures}}$ [1]
Another definition of diagnostic coverage, DCN (see formula [2]), relates to the fraction of the total number of different faults that is detected by a particular test. There can be large differences between the two ways of defining the diagnostic coverage. The probability-based approach distinguishes between faults which occur with different probabilities, while the number-based approach does not. A test technique which detects the faults occurring with high probability is very likely to have a high DC, but may have a low DCN if there is a large number of low-probability faults which are not detected.
diagnostic coverage $\text{DCN} = \dfrac{\text{number of different faults detected by the test}}{\text{total number of different faults}}$ [2]
It may be hard to find numerical values for the probabilities of different faults. Sometimes the assumption is made, that all faults have the same probability. This is always an approximation of reality.
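To make the difference between the two definitions concrete, the following minimal C sketch (not part of the original report; the fault list and its probabilities are invented for illustration) computes DC according to formula [1] and DCN according to formula [2] for the same hypothetical set of faults.

```c
#include <stdio.h>

/* Hypothetical fault list for illustration: each entry has a relative
 * probability of occurrence and a flag saying whether the diagnostic
 * test detects it. */
struct fault {
    double probability;   /* relative probability of this fault          */
    int    detected;      /* 1 if the diagnostic test detects this fault */
};

/* DC, formula [1]: probability-weighted fraction of detected faults. */
static double dc(const struct fault *f, int n)
{
    double detected = 0.0, total = 0.0;
    for (int i = 0; i < n; i++) {
        total += f[i].probability;
        if (f[i].detected)
            detected += f[i].probability;
    }
    return total > 0.0 ? detected / total : 0.0;
}

/* DCN, formula [2]: fraction of the number of different faults detected. */
static double dcn(const struct fault *f, int n)
{
    int detected = 0;
    for (int i = 0; i < n; i++)
        if (f[i].detected)
            detected++;
    return n > 0 ? (double)detected / n : 0.0;
}

int main(void)
{
    /* One frequent fault is detected, five rare faults are not. */
    struct fault faults[] = {
        { 0.90, 1 }, { 0.02, 0 }, { 0.02, 0 },
        { 0.02, 0 }, { 0.02, 0 }, { 0.02, 0 },
    };
    int n = (int)(sizeof faults / sizeof faults[0]);
    printf("DC  = %.2f\n", dc(faults, n));    /* 0.90: high DC */
    printf("DCN = %.2f\n", dcn(faults, n));   /* 0.17: low DCN */
    return 0;
}
```

With one frequent fault detected and five rare faults missed, DC is high (0.90) while DCN is low (0.17), which is exactly the divergence described above.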
It is possible to make a numerical calculation of the coverage of some methods, while the coverage of other methods may have to be expressed in qualitative terms such as ”high/medium/low”. An estimation of the diagnostic coverage is needed in order to compare two diagnostic test methods.
A translation from the qualitative definition ”low/medium/high” to a quantitative measure expressed as a percentage is therefore needed. This report follows the definitions suggested by the IEC 61508 standard (see figure 1). High coverage is used for techniques and measures with a probability higher than 99% of detecting a fault. Medium coverage means a probability lower than 99% but higher than 90%. Low coverage corresponds to a diagnostic coverage greater than 60% but lower than 90%. Techniques and measures offering less than a 60% probability of detecting faults are to be avoided in safety-related parts of control systems.
Figure 1. Diagnostic coverage defined as low, medium and high.
When numerical values are needed in calculations, 60% is used for “low” coverage, 90% is used for “medium” coverage and 99% is used for “high” coverage.
The quantification of the diagnostic coverage for different methods of fault detection in memory and I/O units (in chapter 3 of this report) is based on values extracted from the IEC 61508 standard. Those values are the results of theoretical studies based on a simplified probabilistic approach. The lack of data concerning the various types of memory chips, and the assumption that the potential faults are equally distributed introduce a number of uncertainties.
Similar uncertainties are introduced for other parts of the PES. The probability of different faults in the processing unit will depend on the type of processor, the manufacturer, the production process, the design etc. It is hardly possible to state a probability that will be valid in all cases. Assuming that all possible faults are equally probable, the previous expressions [1] and [2] for the diagnostic coverage become equivalent.
Faults in the programme sequence will have different probabilities depending on the programming language, the experience of the programmer, the testing effort etc. It will not be easy to state a probability-based diagnostic coverage for the programme sequence monitor.
The most valid estimation of diagnostic coverage for a fault-detecting method should at this stage be limited to one of the three levels referred to earlier in this section: low, medium or high. The level chosen may differ depending on whether probabilities or numbers of faults are used in the definition of diagnostic coverage. However, the level will be the same if all faults are equally probable.
1.2 Diagnostic test interval
Methods for fault detection can be used at power-up to establish if the electronic system is fit to start operating. Faults should be detected at start, and operation must not be allowed to start when faults are detected. Such a power-up test will not detect faults which occur during operation. Thus it will not be suitable for systems of high safety integrity with strict requirements for behaviour at fault. Power-up testing is sometimes used in systems with moderate safety requirements, and short operating time between power-ups.
Fault detection may also be performed at run-time to find faults occurring during operation. The diagnostic test interval will be of great importance to decide for which applications the system can be used. The test interval will be the maximum time during which an undetected fault may exist.
The diagnostic test interval is defined as the interval between on-line tests to detect faults in a safety-related system that have a specified diagnostic coverage [IEC61508-4, clause 3.8.7].
The application will decide the requirement for the diagnostic test interval. It is not easy to define an exact requirement for a certain category. Although the mean time between failures is generally much greater than the diagnostic test interval, the diagnostic test interval should nevertheless not be longer than 1% of the mean time between operation demands.
1.3 Architecture
Different hardware architectures are possible for a PES intended to fulfil category 2, 3 or 4. Category 2 is often realised by a single channel structure, which is called a 1oo1 system (1 channel out of 1 provides the safety function). Category 3 may be realised by a single channel system in combination with auxiliary components such as an electromechanical module, or by a dual channel system. A PES intended to claim category 4 will probably be either dual-channel or triple-channel. Other more advanced architectures may certainly be used for any of the categories, but this is not state-of-the-art due to financial restrictions on the design.
The safety principles required will depend on the hardware architecture chosen. A dual-channel system employing safety principles of medium diagnostic coverage may provide the same probability of failure of the safety function as a single-channel system using safety principles with high diagnostic coverage. Faults in the single channel have to be detected more reliably to provide a high probability of failure-free operation. A dual-channel system may tolerate an undetected fault in one of the channels, while the other channel still provides the safety function.
In a similar way, a triple-channel system will need less efficient safety principles than a dual-channel system. These differences are reflected in the requirements on which safety principles are to be employed for a specific category (see the tables of chapter 4).
2 Description of methods
There are several aspects of a programmable electronic system which can be automatically self-checked. The following chapter gives some background and explanation of the different techniques.
2.1 Processing units
The processing unit of the PES may suffer from faults which may cause malfunction. Examples of such faults could be bit errors in one of its internal registers, or malfunctioning of the instruction decoder.
Self-testing is employed by designing software routines which test the functionality of the processing unit. Certain operations are performed, and there is only one correct result of such a test. The principle of letting a possibly defective processor check itself relies on the assumption that a fault will corrupt the result of the self-check. The self-test has to be designed in such a way that the risk of a fault corrupting the test itself is negligible.
Some tests on the processing units are very hard to perform during run-time. A processing unit in full operation will make use of all registers, flags etc. It may be easier to perform exhaustive tests of the processing unit at power-up before the application software has been started.
Another approach is to let two processing units exchange data, and then compare the result. A fault in one of the processors is supposed to be detected by the other processor.
Following are examples of measures and techniques which are often employed to detect faults in processing units:
- self test of the execution of the instruction set.
- self test of registers by patterns or walking-bit [IEC61508-7, clause A.3.1, A.3.2]
- reciprocal comparison by software between two processing units [IEC61508-7, clause A.3.5]
2.2 Invariable memory ranges
Semiconductor memories may fail to work as intended. A fault in the invariable memory will corrupt the program code and the constants stored there. The instructions to the processing unit may be distorted, and important constants and parameters will be incorrect. This results in unpredictable behaviour of the machine control system. It is important to try to detect such faults before the execution is disturbed.
The methods for checking invariable memory all rely on reading the memory cells and comparing the read values to what was originally stored there. This may be done by direct comparison with a duplicate area, or by calculating a checksum (or signature). The checksum is then compared to the sum which was originally calculated and stored in memory.
Memory tests can be quite time consuming. It may not be possible to cover the complete address range at one time. Run time checking is often performed by checking only a limited address range each time the memory test task is started. After a large number of calls to the memory test task the complete memory will have been tested.
Example: An 8 kbyte ROM memory is checksummed by adding 8 bytes to the checksum every time the memory test task is called. It will take a total of 1024 calls (8*1024/8 = 1024) before the checksum is completed. If the memory test task is started every 50 ms, the diagnostic test interval will be 51 seconds (1024 * 50 ms = 51.2 seconds).
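The following minimal C sketch illustrates such an incremental checksum task under stated assumptions: the simple additive 16-bit checksum, the task structure and the fault-injection demonstration in main are illustrative choices, not taken from the report or from IEC 61508.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define ROM_SIZE   8192u   /* 8 kbyte ROM, as in the example above      */
#define BLOCK_SIZE 8u      /* bytes added to the checksum per task call */

static uint8_t  rom_image[ROM_SIZE];  /* stands in for the real ROM     */
static uint16_t rom_reference_sum;    /* reference checksum ("stored at build time") */

static size_t   offset;       /* next byte to include in the checksum   */
static uint16_t running_sum;  /* checksum accumulated so far            */
static int      fault_found;

/* Called periodically, e.g. every 50 ms. One complete pass over the ROM
 * takes ROM_SIZE / BLOCK_SIZE = 1024 calls, which corresponds to the
 * diagnostic test interval of 51.2 s mentioned in the example. */
static void rom_check_task(void)
{
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        running_sum += rom_image[offset + i];

    offset += BLOCK_SIZE;
    if (offset >= ROM_SIZE) {                  /* one full pass completed */
        if (running_sum != rom_reference_sum)
            fault_found = 1;                   /* would enter the safe state */
        offset = 0;
        running_sum = 0;
    }
}

int main(void)
{
    /* "Build time": fill the ROM image and store the reference checksum. */
    for (size_t i = 0; i < ROM_SIZE; i++) {
        rom_image[i] = (uint8_t)i;
        rom_reference_sum += rom_image[i];
    }

    rom_image[100] ^= 0x01;                    /* inject a single-bit fault */

    for (unsigned call = 0; call < ROM_SIZE / BLOCK_SIZE; call++)
        rom_check_task();

    printf("memory fault detected: %s\n", fault_found ? "yes" : "no");
    return 0;
}
```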
It will not always be possible to use time-consuming memory tests at reset or power-up. The operator might find the time to wait for system start too long. In such cases, a simple power-up test may be combined with a more exhaustive run-time test.
Following are examples of measures and techniques which are often employed to detect faults in invariable memory ranges:
- checksum [IEC61508-7, clause A.4.2]
- 8-bit signature [IEC61508-7, clause A.4.3]
- 16-bit signature [IEC61508-7, clause A.4.4]
- replication [IEC61508-7, clause A.4.5]
2.3 Variable memory ranges
Semiconductor memories may fail to work as intended. A fault in the variable memory will corrupt all variables stored in memory. This will result in an unpredictable behaviour of the machine control system.
The methods for checking variable memory rely on different ways of stimulating the memory cells and checking that they function correctly. Test patterns of different complexity can be written and then read back. A complex test pattern will have a higher diagnostic coverage than a simple test pattern.
Example: A simple test of variable memory can be made by writing and reading bit patterns according to the following (a minimal code sketch follows the list):
- save contents of memory cell to test.
- write bit pattern 055H.
- read and compare memory cell with 055H. Handle error if no correspondence.
- write bit pattern 0AAH.
- read and compare memory cell with 0AAH. Handle error if no correspondence.
- restore contents of memory cell.
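A minimal C sketch of this pattern test is given below; the use of an ordinary static buffer as the memory under test and the simple function structure are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Tests one memory cell with the patterns 055H and 0AAH.
 * Returns 0 if the cell passes, non-zero if a fault was detected.
 * In a real system, interrupts should be disabled around the test so
 * that no other code accesses the cell while its contents are saved. */
static int ram_cell_test(volatile uint8_t *cell)
{
    uint8_t saved = *cell;         /* save contents of memory cell     */
    int fault = 0;

    *cell = 0x55;                  /* write bit pattern 055H           */
    if (*cell != 0x55)             /* read back and compare            */
        fault = 1;

    *cell = 0xAA;                  /* write complementary pattern 0AAH */
    if (*cell != 0xAA)
        fault = 1;

    *cell = saved;                 /* restore contents of memory cell  */
    return fault;
}

/* Tests a RAM range one cell at a time. As for the ROM checksum above,
 * the range can be split over many task calls to limit processing load. */
static int ram_range_test(volatile uint8_t *start, unsigned len)
{
    for (unsigned i = 0; i < len; i++)
        if (ram_cell_test(start + i))
            return 1;              /* fault detected: handle error     */
    return 0;
}

int main(void)
{
    static uint8_t test_ram[256]; /* buffer standing in for the RAM under test */
    printf("RAM test: %s\n", ram_range_test(test_ram, sizeof test_ram) ? "fault" : "ok");
    return 0;
}
```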
Testing of variable memory can also be time consuming, as described above for invariable memory.
Following are examples of measures and techniques which are often employed to detect faults in variable memory ranges:
- RAM test ”checkerboard” or ”march” [IEC61508-7, clause A.5.1]
- RAM test ”walkpath” [IEC61508-7, clause A.5.2]
- RAM test ”galpat” [IEC61508-7, clause A.5.3]
2.4 I/O Units and Interface
Interfaces to external units are present on many safety-related machine control systems. They can be analogue, digital, serial or parallel. The external communication through these interfaces can be of great importance for safety. The status of the I/O units and interfaces may be critical and should therefore be monitored.
Example: The semiconductor outputs of a safety-related PES are doubled and monitored. A fault in one output transistor will be detected, while the other transistor is still able to switch off the output load.
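The following C sketch illustrates the idea of this example; the hardware-access functions and the simulated stuck-at-ON fault are purely hypothetical stand-ins for a real output driver and its read-back circuit.

```c
#include <stdio.h>

/* Simulated hardware: two output transistors in series switch the same
 * load, so either one alone can de-energise it. Each transistor has a
 * read-back line monitored by the PES. All of this is a stand-in for
 * real output-driver hardware. */
static int channel_state[2];   /* actual state of each transistor       */
static int stuck_on[2];        /* simulated stuck-at-ON hardware faults */

static void drive_output(int ch, int on) { channel_state[ch] = stuck_on[ch] ? 1 : on; }
static int  read_back(int ch)            { return channel_state[ch]; }

/* Commands both channels and compares the read-back with the commanded
 * state. A deviation reveals a faulty transistor; the remaining healthy
 * channel is still able to switch the load off. Returns 1 on fault. */
static int set_safety_output(int on)
{
    int fault = 0;
    for (int ch = 0; ch < 2; ch++) {
        drive_output(ch, on);
        if (read_back(ch) != on)
            fault = 1;
    }
    if (fault)
        for (int ch = 0; ch < 2; ch++)
            drive_output(ch, 0);               /* force the load off     */
    return fault;
}

int main(void)
{
    stuck_on[1] = 1;                           /* inject: channel 1 stuck ON */
    int fault = set_safety_output(1);          /* switch the load on         */
    fault |= set_safety_output(0);             /* switch it off: fault shows */
    printf("fault detected: %s, load %s\n", fault ? "yes" : "no",
           (channel_state[0] && channel_state[1]) ? "energised" : "off");
    return 0;
}
```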
Following are examples of measures and techniques which are often employed to detect faults in I/O and interfaces:
- multi-channel parallel output [IEC61508-7, clause A.6.3]
- monitored outputs [IEC61508-7, clause A.6.4]
- input comparison/voting [IEC61508-7, clause A.6.5]
2.5 Data paths
Even a physically small PES may consist of several internal units which communicate. Examples of such data paths are electrical parallel buses, serial buses and optical fibres. The data paths between the internal units may fail, and should be checked.
Important signals, such as alarm signals, originating from one unit should be detected and processed by the appropriate receiving unit.
Example: An optical fibre is used as the data path between two modules of a safety-related PES. All messages include a 16-bit checksum, and all safety-related commands are transmitted twice before they are accepted and executed by the receiving module.
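The C sketch below illustrates this combination of information redundancy (a 16-bit checksum per frame) and transmission redundancy (a command is executed only after two identical copies have been received); the frame layout, the additive checksum and the function names are assumptions made for illustration.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical message frame for an internal data path (e.g. an optical
 * fibre between two modules). The field layout and the simple additive
 * 16-bit checksum are illustrative only. */
struct frame {
    uint8_t  command;      /* safety-related command code           */
    uint8_t  data[4];
    uint16_t checksum;     /* 16-bit checksum over command and data */
};

static uint16_t frame_checksum(const struct frame *f)
{
    uint16_t sum = f->command;
    for (size_t i = 0; i < sizeof f->data; i++)
        sum += f->data[i];
    return sum;
}

/* Receiver side: a safety-related command is executed only if its
 * checksum is valid and the same command has been received twice in
 * succession. Returns 1 when the command may be executed. */
static int accept_command(const struct frame *f)
{
    static uint8_t pending;
    static int     have_pending;

    if (frame_checksum(f) != f->checksum) {
        have_pending = 0;                  /* corrupted frame: discard      */
        return 0;
    }
    if (have_pending && pending == f->command) {
        have_pending = 0;                  /* second identical copy: accept */
        return 1;
    }
    pending = f->command;                  /* first copy: wait for repeat   */
    have_pending = 1;
    return 0;
}

int main(void)
{
    struct frame f = { .command = 0x10, .data = { 1, 2, 3, 4 }, .checksum = 0 };
    f.checksum = frame_checksum(&f);       /* sender appends the checksum   */
    int first  = accept_command(&f);       /* first copy: not executed yet  */
    int second = accept_command(&f);       /* second copy: may be executed  */
    printf("first=%d second=%d\n", first, second);
    return 0;
}
```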
Following are examples of measures and techniques which are often employed to detect faults in data paths:
- inspection using test patterns [IEC61508-7, clause A.7.4]
- transmission redundancy [IEC61508-7, clause A.7.5]
- information redundancy [IEC61508-7, clause A.7.6]
2.6 Power supply
All safety-related programmable electronic systems should have some kind of circuitry to ensure that operation will not be started before an adequate supply voltage has been reached. The behaviour of the processor and other electronic circuits is specified only for a specific voltage range.
Monitoring of the supply voltage level is also important during run-time. Interruptions of the supply power will cause the electronic hardware to enter an undefined range where the exact behaviour cannot be foreseen. The monitoring circuit must give an alarm in time, before the voltage reaches the threshold value. The PES should have a ”graceful death”, bringing the controlled machinery to a safe state.
The programmable electronic system should then have time to take proper action to enter a safe state. It may also be necessary to save machine status and calculated values in a non-volatile memory.
The output signal from the hardware circuit monitoring the supply power can be either static or dynamic. The output from a static monitor indicates proper power supply with a constant high or low output signal; a change of the output gives an alarm to the processor. A monitor built on a dynamic principle has a dynamic output signal, and a power failure is indicated by a constant signal.
The most common way to implement supply power monitoring is to use a standard commercial supply-voltage monitor circuit. Such ICs are available from several semiconductor manufacturers. A disadvantage of this type of circuit is that it is normally not testable, i.e. a fault in the circuit will not be noticed before the PES is shut down in an uncontrolled way.
Figure 2.6.a Example of block diagram of external circuits for power-up reset and detection of low supply voltage
Figure 2.6.b Example of block diagram of external circuits for power-up reset and detection of low supply voltage in a dual-channel system.
Over-voltage may also create unwanted behaviour of the control system. Checking facilities can be implemented to react to over-voltage before the specified operating voltage is exceeded and the behaviour can no longer be guaranteed. One example where over-voltage detection is needed is a dual-channel system using a single power supply.
Following are examples of measures and techniques which are often employed to detect faults in the power supply:
- over-voltage protection with safety shut-off [IEC61508-7, clause A.8.1].
- monitoring of secondary voltages [IEC61508-7, clause A.8.2].
- power-down with safety shut-off [IEC61508-7, clause A.8.3].
2.7 Program sequence
The execution sequence of the software may be distorted by software faults, hardware faults or environmental disturbances. This will with great probability lead to incorrect behaviour of the programmable electronic system. The consequences of such a fault cannot be foreseen, since ”anything may happen”. However, there are well-established techniques to monitor the programme sequence.
The monitoring of the programme sequence may be realised both in hardware and in software. A combination of a hardware unit (”watchdog”) with logical monitoring realised in software is the most powerful. However, less sophisticated techniques built on simple hardware or software solutions will certainly also be able to detect some errors in the programme sequence.
2.7.1 Monitoring by hardware
A watchdog is defined as a hardware design which monitors the operation of internal hardware functions, and/or application programme functions, and/or system software functions and will result in a safe condition if not periodically reset at a predetermined interval. [prEN 12978] In practice, this means a timer circuit which is triggered by the processor at a periodic interval. If the trigger does not reach the watchdog, a stop or reset signal is given to the processor.
Figure 2.7.a. Watchdog, block diagram
Figure 2.7.b. Watchdog triggering and alarm signals
There are several hardware solutions for watchdog functionality. Many microcontrollers offer a watchdog circuit on-chip. The watchdog is then programmed, activated and controlled through internal registers of the controller. There are also special circuits available to supervise microcontrollers. Such circuits often offer the facility of a watchdog. Another solution is to design a separate hardware circuitry based on a monostable flip-flop which has to be retriggered at a specified interval.
The watchdog circuit should preferably be hardware-independent of the processor itself. The same error which causes the fault in the programme sequence should not also cause the watchdog to stop functioning. There is an increased risk of this if the watchdog is integrated on the same chip as the processor. Special caution must be paid if the processor and the watchdog use the same time base: a clock fault might then affect both the microcontroller and the watchdog.
A good watchdog should be tested or fail-safe. Systems with high requirements for functional safety may let the processor fake a watchdog alarm at every power-up, or at a periodic interval. The processor will then detect whether the watchdog is non-operational. Otherwise there is a risk that a fault in the watchdog passes unnoticed until it is really needed. Another possibility is to design a watchdog circuitry in which single hardware faults cause an alarm.
2.7.2 Software aspects
Software must be used to generate the trigger to the watchdog hardware circuit. It may also be used to check the programme sequence even if no hardware has been implemented as watchdog.
The simplest way of triggering in a cyclic system is to check the program sequence once every cycle. It is not advisable to perform the checking or watchdog triggering in a dedicated interrupt routine or isolated task. Such triggering only proves that one single routine or task is running; the main program may be ”locked up” without this influencing the execution of a time-triggered interrupt routine.
A more powerful way to check the program sequence by software, is to use trigger flags and key words to indicate that all the significant parts of the software are active and executed in the correct sequence.
Figure 2.7.c. Example of program sequence monitoring by software
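A minimal C sketch of such logical monitoring combined with a hardware watchdog trigger is given below. The step names and the stubbed watchdog and safe-state functions are illustrative assumptions; the sketch checks only that every significant part has run during the cycle, not the exact order, which would require e.g. a key-word sequence.

```c
#include <stdint.h>
#include <stdio.h>

/* Flags for the significant parts of one program cycle (names are
 * illustrative). The hardware watchdog is only retriggered when every
 * part has reported execution, so a "locked up" main program or a
 * skipped step leads to a watchdog alarm. */
#define STEP_READ_INPUTS   0x01u
#define STEP_SAFETY_LOGIC  0x02u
#define STEP_WRITE_OUTPUTS 0x04u
#define ALL_STEPS (STEP_READ_INPUTS | STEP_SAFETY_LOGIC | STEP_WRITE_OUTPUTS)

static uint8_t sequence_flags;

/* Stubs standing in for the real hardware access / fault reaction. */
static void hardware_watchdog_trigger(void) { printf("watchdog retriggered\n"); }
static void enter_safe_state(void)          { printf("sequence fault: safe state\n"); }

/* Called by each significant routine when it has executed. */
static void sequence_checkpoint(uint8_t step) { sequence_flags |= step; }

/* Called once at the end of every program cycle. */
static void end_of_cycle(void)
{
    if (sequence_flags == ALL_STEPS)
        hardware_watchdog_trigger();
    else
        enter_safe_state();
    sequence_flags = 0;
}

int main(void)
{
    /* Complete cycle: all parts report in, watchdog is retriggered. */
    sequence_checkpoint(STEP_READ_INPUTS);
    sequence_checkpoint(STEP_SAFETY_LOGIC);
    sequence_checkpoint(STEP_WRITE_OUTPUTS);
    end_of_cycle();

    /* Faulty cycle: the safety logic never ran. */
    sequence_checkpoint(STEP_READ_INPUTS);
    sequence_checkpoint(STEP_WRITE_OUTPUTS);
    end_of_cycle();
    return 0;
}
```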
2.7.3 Actions at fault
A detected fault in the program sequence will require different kinds of actions depending on the type of system. It is important that the correct action is taken. It is not enough to design a watchdog into the system without properly specifying which action shall be taken at fault.
Most machines have a safe state which shall be entered when a fault is detected. A watchdog circuit may force the processor and the outputs to the safe state where signals are inactive. Another possibility is that a watchdog alarm cuts the power to the outputs and leaves the machine in the safe state (see figure 2.7.d). It must not be possible to exit this safe state without a dedicated reset action.
For applications with requirements for high availability, the watchdog alarm may be used to isolate one of the processors and let a redundant unit take over without affecting the normal operation of the system.
Figure 2.7.d. The watchdog may be used to disconnect the outputs.
A single-channel system which is intended to run in continuous mode should be reset by a watchdog alarm, and the operation should continue as soon as possible. No safe state exists in this case, and there will be risks associated with an inactive processor.
If the control system contains any values measured or calculated during run-time, it must be specified how these values shall be handled after a watchdog alarm. Can they still be trusted and used, or must all such values be reset to default?
2.7.4 Examples
Following are examples of measures and techniques which are often employed to detect faults in program sequence:
- an on-chip watchdog with separate time base without time-window, e.g. Motorola microcontroller 68HC11 [IEC61508-7, clause A.9.1]
- a watchdog with separate time base without time-window, e.g. a Maxim microprocessor supervisor IC [IEC61508-7, clause A.9.1]
- logical monitoring of programme sequence implemented in software [IEC61508-7, clause A.9.3]
- combination of temporal and logical monitoring of programme sequence [IEC61508-7, clause A.9.4]
3 Diagnostic coverage
3.1 Processing unit
The central processing unit (CPU) is responsible for executing all software instructions in their programmed sequence.
A correct function of the CPU relies upon the use of adequate features for control and avoidance of faults or dysfunction in any of its components. The arithmetic/logic unit, the control unit, the main memory, stacks and registers are integral parts of the CPU.
The arithmetic/logic unit
The arithmetic/logic unit (ALU) operates as the processor’s calculator, executing operations and returning the results to the main memory.
The operation of the arithmetic/logic unit and control unit depends on the contents of associated registers, accumulators and stacks. These accumulators, stacks, and registers hold the operands necessary to the execution of the arithmetic instructions as well as the results of arithmetic calculations or data manipulations.
The control unit
The task of the control unit is to interpret machine instructions and to issue commands to all other units such as storage devices, input and output devices, and main memory.
The current value of the control register indicates at which address the next instruction shall be fetched. The function of the instruction control unit is to generate the microprogram which results from the decoding operations performed by the instruction decoder.
The types of faults which may affect the instruction control unit may depend on register and main memory faults or on faults in the instruction decoder. The register faults are similar to memory faults. Faults in the instruction decoder may be a combination of faults in main memory and registers and faults in the decoder logic.
The main memory registers and stacks
The main memory is used for short term storage, holding data which is currently needed to carry out processing instructions.
Normally, program instructions and data are stored in different places in the main memory. The area of main memory where program instructions are held while awaiting execution is the instruction stack.
The registers:
- the accumulator, where results of computations are held
- the storage register, where data about to be used is stored
- the address register
- the general purpose registers
are used to hold current data and sometimes sections of the program to be processed.
The types of faults or dysfunctions which may affect the main memory, stacks and registers are similar to those listed in the following section (3.2 Memory ranges). Consequently, certain methods used to detect or avoid faults in the CPU may be common to those used to detect faults in memory units.
There are several methods to achieve fault diagnostic coverage of the processing unit, of which the following can be used to reach a specified rate of coverage. This work focuses on the methods listed in the IEC 61508 standard (part 2, A.4):
- Comparator
- Majority voter
- Self-test by software: limited number of patterns (one channel)
- Self-test by software : walking bit (one channel)
- Self-test supported by hardware (one channel)
- Reciprocal comparison by software
3.1.1 Comparator (IEC 61508-7, A.1.3)
This method is applicable for the detection of errors in instruction decoding and execution.
In dual or multi-channel systems, a special device provides for the detection of faults/errors by comparing data from the different channels.
The periodicity of the test execution shall comply with the specific application in order to reach an acceptable level of diagnostic coverage.
If a fail-safe comparator device is used, a continuous execution of the test procedure or an adequate periodic repetition frequency of the test should provide for a high diagnostic coverage. In the following example, the comparator is a hardware device.
Figure 3.1a
In the following example the comparator is implemented by software. The multiple processing is realised by two independent sets of data and application programs. Since there is only one processing unit, the double processing is performed in sequence as shown in the figure. The independent output results are checked by the software comparator.
A high fault diagnostic coverage is achieved if the implementation of the two application programs is diverse.
Figure 3.1 b Software comparator single channel
3.1.2 Majority voter (IEC 61508-7, A.1.4)
This method is applicable for the detection of errors in instruction decoding and execution in multi-channel systems. The method shall detect and mask failures in one of at least three hardware channels.
3.1.3 Self-test by software: limited number of patterns [one channel] (IEC 61508-7, A.3.1)
Standard techniques are used for the hardware construction. The failure detection is realised by means of specific software functions which provide for self-tests using at least two data patterns.
For example, a periodic test can be carried out using the test patterns 77hex and 88hex.
A low fault diagnostic coverage is achieved by this method.
3.1.4 Self-test by software : walking bit [one channel] (IEC 61508-7, A.3.2)
This test method focuses on faults which can affect the memory parts of the processing unit. Assuming that the hardware consists of a standard memory type without any parity bit, software test functions are implemented. Self-tests are performed by using a data pattern to check the physical storage medium (stacks, registers, accumulators).
Given a specific memory unit to check, the cells are sequentially addressed with a single one-bit value. The contents of all the cells in the register are then read back. The same procedure is repeated for the next cell until all cells have been addressed.
A medium fault diagnostic coverage is achieved by this method.
Example : test of a four-bit register
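The original four-bit register example is not reproduced here; the following C sketch shows the walking-bit principle over a four-cell storage area, using an ordinary RAM buffer as a hypothetical stand-in for the register under test.

```c
#include <stdint.h>
#include <stdio.h>

/* Walking-bit test: exactly one cell at a time holds the value 1; all
 * cells are then read back, which also reveals coupling faults between
 * cells. The four-cell size matches the "four-bit register" example. */
#define CELLS 4u

static int walking_bit_test(volatile uint8_t *cells)
{
    for (unsigned walk = 0; walk < CELLS; walk++) {
        for (unsigned i = 0; i < CELLS; i++)
            cells[i] = (i == walk) ? 1u : 0u;      /* address one cell with a 1  */

        for (unsigned i = 0; i < CELLS; i++)       /* read back every cell       */
            if (cells[i] != ((i == walk) ? 1u : 0u))
                return 1;                          /* stuck-at or coupling fault */
    }
    return 0;
}

int main(void)
{
    static volatile uint8_t reg[CELLS];            /* stand-in for the register  */
    printf("walking-bit test: %s\n", walking_bit_test(reg) ? "fault" : "ok");
    return 0;
}
```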
3.1.5 Self-test supported by hardware [one channel] (IEC 61508-7, A.3.3)
Hardware supported test method for failure detection. A periodic hardware facility may be used for monitoring a certain bit pattern according to the watchdog principle.
A medium fault diagnostic coverage is achieved by this method.
3.1.6 Reciprocal comparison by software (IEC 61508-7, A.3.5)
The method is built on the comparison result of data exchanged between the processing units of a multi-channel architecture. The types of data include results, intermediate results and test data.
The detection of discrepancies results in the generation of an error/failure message.
A high fault diagnostic coverage is achieved by this method.
3.1.7 Summary of examples
The table in figure 3.1 lists methods for checking the processing units. The list is not exhaustive, and other acceptable methods exist.

| Test method for diagnostic coverage of the processing unit | Maximum achievable fault diagnostic coverage | Reference in IEC 61508-7 |
|---|---|---|
| Comparator | High | A.1.3 |
| Majority voter | High | A.1.4 |
| Self-test by software: limited number of patterns (one channel) | Low | A.3.1 |
| Self-test by software: walking bit (one channel) | Medium | A.3.2 |
| Self-test supported by hardware (one channel) | Medium | A.3.3 |
| Reciprocal comparison by software | High | A.3.5 |

Figure 3.1 Examples of test methods for the processing unit
3.2 Memory ranges
One aspect of memory design involves ensuring the integrity of data stored in memory. Memory devices can sometimes develop permanent faults which cause the memory to function incorrectly. Faults in a memory device can affect three different functions:
- The memory cell array,
- The decoder logic
- The Read/Write logic.
Memory Cell Array
A memory unit is built of memory cells. Physical cellular alterations such as metallization short-circuits and capacitive coupling may lead to the following types of faults:
- One or more cells are stuck at 0 or 1
- A coupling occurs between one or several pairs of cells. In such a case, the transition from x to y in one cell of the pair induces a state transition in the other cell.
Decoder
The function of the decoder is to select a unique memory cell for a specific address. The occurrence of any failure in the decoder logic gives rise to one of the following behaviours:
- The decoder does not access the addressed cell and may even access non addressed cells.
- The decoder accesses multiple cells, including the addressed cell.
Depending on the logic used, the case of no access to an addressed cell is equivalent to that cell stuck at 0 or stuck at 1.
The case of multiple accesses is equivalent to coupling between cells in the memory cell array.
Consequently, the faults encountered in the decoder do not need to be treated specifically since they are in every aspect similar to the set of faults observable in a memory cell array.
Read/Write Logic
Independently of the type of memory (variable or invariable), the device can be considered to have K inputs and N outputs. The occurrence of any failure in the Read/Write logic gives rise to one of the following behaviours:
Data input lines or data output lines may interact with each other via short-circuits or capacitive coupling. These errors are equivalent to coupling between memory cells.
Output lines of the sense amplifier logic or write driver logic may be stuck at 0 or stuck at 1. In either case this fault is equivalent to stuck-at-0 or stuck-at-1 memory cells that correspond to the stuck output lines.
The purpose of the following sections is to point out adequate methods and techniques to achieve specified levels of diagnostic coverage. These techniques, adapted to the type of memory device at hand, shall concentrate on faults in the memory cells only.
3.2.1 Invariable memory ranges
There are several methods to achieve fault diagnostic coverage of memory units, of which the following can be used to reach a specified rate of coverage.
This work shall focus upon the methods listed in the IEC 61508 standard (part 2, A.5 - A.6).
- Word saving multi-bit redundancy
- Modified checksum
- Signature of one word (8 bit)
- Signature of a double word (16 bit)
- Block replication
3.2.1.1 Word saving multi-bit redundancy (IEC 61508-7, A.4.1)
The multi-bit redundancy technique is used to detect single-bit errors, 2-bit errors, 3-bit errors and all-bit errors in a 16-bit word. A common approach to error-bit detection is the simple parity technique, where one parity bit is added to each character.
The word saving multi-bit redundancy is a variation of simple parity and is achieved by the addition of several redundant bits, a so-called check code, to each word. By doing so the probability of multi-bit error detection is increased.
Data rarely consist of long continuous streams of bits; data access and recovery are broken into blocks of data.
A block code converts a fixed-length block of K data bits into a fixed-length code word of N bits, where N > K. The rate of the code is the ratio K/N, and the redundancy of the code is 1 - (K/N).
The procedure can also be used to detect addressing errors, by calculating the redundant bits for the concatenation of the data word and its address.
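As a rough sketch of the principle, the code below appends a small check code to each 16-bit data word, computed over the concatenation of the word and its address as described above. The 4-bit XOR-fold used as the check code is purely illustrative, not the coding scheme prescribed by the standard.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative 4-bit check code: XOR-fold of a 16-bit word and its address. */
static uint8_t check_code(uint16_t word, uint16_t address)
{
    uint32_t v = ((uint32_t)address << 16) | word;   /* concatenate address and data */
    v ^= v >> 16;
    v ^= v >> 8;
    v ^= v >> 4;
    return (uint8_t)(v & 0x0Fu);                     /* several redundant bits per word */
}

int main(void)
{
    uint16_t word = 0xBEEF, address = 0x0100;
    uint8_t stored = check_code(word, address);      /* stored together with the word */

    /* At read time the check bits are recomputed and compared with the stored ones. */
    if (check_code(word, address) != stored)
        fprintf(stderr, "multi-bit redundancy check failed\n");
    else
        printf("word 0x%04X at address 0x%04X verified\n", word, address);
    return 0;
}
```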
Diagnostic coverage: medium
3.2.1.2 Modified checksum (IEC 61508-7, A.4.2)
Common to all checksum techniques is that one set of code is generated by a specific algorithm. This set of code, representing the contents of a memory area, is then stored as the defined checksum value. At runtime, during self-test, a new set of code is generated by the same algorithm and compared with the stored value. If a difference occurs, an adequate measure is taken and/or a failure message is raised to bring the condition to the attention of the user/operator.
What characterises a checksum technique is the nature of the set of code generated and the type of algorithm used. The combination of these two factors makes it possible to choose the appropriate method for a specified rate of diagnostic coverage.
In the case of the modified checksum, a single word representing the contents of all words in memory is generated and saved. The algorithm applied to derive the checksum value shall use all the words in the current block of memory.
During self test, a checksum is produced from the current algorithm and compared with the previously saved checksum value. The detection of a difference shall result in a predefined invariable memory-error measure and/or message.
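A minimal sketch of such a self-test, assuming the reference checksum was stored when the memory image was built; the memory block and the simple additive algorithm are placeholders for whatever the real design uses.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Single word representing the contents of all words in the memory block. */
static uint16_t checksum(const uint16_t *mem, size_t words)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < words; i++)
        sum = (uint16_t)(sum + mem[i]);   /* every word of the block is used */
    return sum;
}

int main(void)
{
    static const uint16_t rom_image[4] = { 0x1111, 0x2222, 0x3333, 0x4444 };
    const uint16_t stored_checksum = 0xAAAA;   /* value saved when the image was built */

    /* Self-test: recompute the checksum and compare it with the stored value. */
    if (checksum(rom_image, 4) != stored_checksum)
        fprintf(stderr, "invariable memory error detected\n");
    else
        printf("checksum OK\n");
    return 0;
}
```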
Diagnostic coverage: DCN ≈ 50 %
3.2.1.3 Signature of one word (8 bit) (IEC 61508-7, A.4.3)
This technique is used to detect 1-bit errors and multi-bit errors within a word.
The cyclic redundancy check (CRC) is used in applications involving detection of small changes in blocks of data. Such applications include start-up verification of ROM code and validation of program and data correctness.
The CRC algorithm operates on a block of data as a unit, treating it as a single large binary value which is divided by the CRC polynomial, or generator polynomial. CRC polynomials are designed and constructed for the test of data blocks of limited size; larger amounts of data invalidate some of the expected properties, such as the guarantee of detecting any 2-bit errors.
The remainder of the division between the value of the data block and the CRC polynomial is the CRC result, or signature. In the case of the signature of one word technique, the CRC result shall be at least one word in size.
The signature is sent or stored along with the original data. When the data is received or recovered from storage, the CRC algorithm is reapplied and the latest signature compared with the original signature. A failure message is raised if there is a difference.
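The following sketch computes an 8-bit CRC signature over a small data block and compares it with a stored reference; the generator polynomial 0x07 and the example data are assumptions made for the illustration.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Bitwise CRC-8 with generator polynomial x^8 + x^2 + x + 1 (0x07). */
static uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0x00;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80u) ? (uint8_t)((crc << 1) ^ 0x07u)
                                : (uint8_t)(crc << 1);
    }
    return crc;                                      /* the one-word signature */
}

int main(void)
{
    static const uint8_t block[] = { 0xDE, 0xAD, 0xBE, 0xEF };
    uint8_t reference = crc8(block, sizeof block);   /* signature stored with the data */

    /* During self-test the signature is recomputed and compared. */
    if (crc8(block, sizeof block) != reference)
        fprintf(stderr, "CRC signature mismatch - failure message raised\n");
    else
        printf("signature 0x%02X verified\n", reference);
    return 0;
}
```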
Diagnostic coverage: DCN ≈ 99.6 %
3.2.1.4 Signature of a double word (16 bit) (IEC 61508-7, A.4.4)
This technique is used to detect 1-bit errors and multi-bit errors within a word, as well as all possible bit errors.
The signature of a double word technique is a variation of the signature of one word technique where the CRC result value is at least two words in size.
For 16-bit polynomials, the maximum designed data length is generally 2^15 - 1 bits, which is just less than 4K bytes. Consequently, a 16-bit polynomial is not the best choice to produce a single result representing an entire file, or even to verify a single EPROM device of size 8K or more.
Diagnostic coverage: DCN ≈ 99.998 %
3.2.1.5 Block replication (IEC 61508-7, A.4.5)
This technique is used to detect all bit errors.
Block replication is a technique used for the storage and loading of safety-related data and programs. The block of data is duplicated in a different format and stored in separate memory areas. The contents of the two memory areas are compared, and a failure message is raised if a difference is detected.
The detection of certain types of bit-errors requires that the data is stored inversely in one of the two memories and re-inverted when read.
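A minimal sketch of the comparison, assuming the second copy is stored bit-inverted in a separate memory area and re-inverted when read; the block contents are arbitrary test data.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define BLOCK_WORDS 4u

int main(void)
{
    /* Primary copy and an inverted duplicate held in a separate memory area. */
    uint16_t primary[BLOCK_WORDS] = { 0x0001, 0x0002, 0x0003, 0x0004 };
    uint16_t inverted[BLOCK_WORDS];

    for (size_t i = 0; i < BLOCK_WORDS; i++)
        inverted[i] = (uint16_t)~primary[i];          /* stored inversely */

    /* Comparison: re-invert the duplicate while reading and compare. */
    for (size_t i = 0; i < BLOCK_WORDS; i++) {
        if (primary[i] != (uint16_t)~inverted[i]) {
            fprintf(stderr, "block replication mismatch at word %zu\n", i);
            return 1;                                 /* failure message */
        }
    }
    printf("replicated blocks agree\n");
    return 0;
}
```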
Diagnostic coverage: high
3.2.1.6 Summary of examples
The table in figure 3.2.1 lists methods for checking the invariable memory ranges. The list is not exhaustive, and other acceptable methods exist.
Method for fault diagnostic coverage of invariable memory ranges | Maximum achievable fault diagnostic coverage | Reference in IEC 61508-7
Word saving multi-bit redundancy | High | A.4.1
Modified checksum | Low | A.4.2 (detects all odd errors and some of the even errors)
Signature of one word (8 bit) | High | A.4.3 (detects all one-bit errors and a high percentage of multi-bit errors)
Signature of a double word (16 bit) | High | A.4.4
Block replication | High | A.4.5
Figure 3.2.1 Examples of test methods for invariable memory
3.2.2 Variable memory ranges
3.2.2.1 RAM test ”checkerboard” or ”march” (IEC 61508-7, A.5.1)
The RAM checkerboard test is used to detect static bit errors.
The checkerboard test is based on a checker-board type pattern of 0’s and 1’s which is written into a bit-oriented memory area. The cells of the memory area under test are inspected in pairs to ensure that the contents are the same and correct. The address of the first cell in such a pair is variable and the address of the second cell is derived by bit inversion of the first address.
In the first phase, the variable address is incremented until the end of the address space of the memory area under test.
In the second phase, the address is decremented to its original value.
The test is repeated with the checkerboard pattern inverted.
A failure message is raised if any difference occurs.
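A simplified sketch, assuming a small RAM buffer stands in for the area under test; pairing cells via address bit inversion is omitted here, and the cells are simply checked in ascending and then descending address order.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define TEST_SIZE 16u   /* small RAM area standing in for the range under test */

/* Write a checkerboard pattern, read it back in both address directions,
 * then repeat with the inverted pattern. Returns 0 if no difference occurs. */
static int checkerboard_test(volatile uint8_t *ram, size_t size)
{
    const uint8_t patterns[2] = { 0x55u, 0xAAu };   /* 01010101 and its inverse */

    for (int p = 0; p < 2; p++) {
        for (size_t addr = 0; addr < size; addr++)   /* first phase: ascending   */
            ram[addr] = patterns[p];
        for (size_t addr = size; addr-- > 0; )       /* second phase: descending */
            if (ram[addr] != patterns[p])
                return -1;                           /* static bit error found   */
    }
    return 0;
}

int main(void)
{
    static uint8_t ram_under_test[TEST_SIZE];
    printf("checkerboard test: %s\n",
           checkerboard_test(ram_under_test, TEST_SIZE) == 0 ? "pass" : "fail");
    return 0;
}
```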
Diagnostic coverage: low
3.2.2.2 RAM test ”walk-path” (IEC 61508-7, A.5.2)
The RAM walk-path test is used to detect static and dynamic bit errors, as well as cross-talk between memory cells.
The first step in the walk-path test consists of initialising the chosen memory area to be tested. This is realised by writing a standard data pattern in that memory area. A bit inversion is performed on the first cell and the remaining memory area is inspected.
The second step consists of re-inverting the first cell followed by another inspection of the remaining memory area.
These two steps are repeated for all memory cells in the memory area under test.
A symmetric test is carried out by performing a bit inversion of all cells in memory under test and proceeding as described above.
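A condensed sketch of the walk-path idea, assuming a small RAM buffer; the background pattern, the buffer size and the symmetric second run are assumptions made for the example.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define TEST_SIZE 8u

/* Verify that every cell except 'skip' still holds 'expected'. */
static int check_background(const volatile uint8_t *ram, size_t size,
                            size_t skip, uint8_t expected)
{
    for (size_t i = 0; i < size; i++)
        if (i != skip && ram[i] != expected)
            return -1;
    return 0;
}

static int walk_path_test(volatile uint8_t *ram, size_t size, uint8_t background)
{
    for (size_t i = 0; i < size; i++)
        ram[i] = background;                       /* initialise the area         */

    for (size_t cell = 0; cell < size; cell++) {
        ram[cell] = (uint8_t)~background;          /* invert the cell under test  */
        if (check_background(ram, size, cell, background))
            return -1;                             /* coupling / cross-talk found */
        ram[cell] = background;                    /* re-invert the cell          */
        if (check_background(ram, size, cell, background))
            return -1;
    }
    return 0;
}

int main(void)
{
    static uint8_t ram_under_test[TEST_SIZE];
    /* Run with the standard pattern, then symmetrically with all cells inverted. */
    int ok = walk_path_test(ram_under_test, TEST_SIZE, 0x00) == 0 &&
             walk_path_test(ram_under_test, TEST_SIZE, 0xFF) == 0;
    printf("walk-path test: %s\n", ok ? "pass" : "fail");
    return 0;
}
```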
Diagnostic coverage: medium
3.2.2.3 RAM test ”galpat” or ”transparent galpat” (IEC 61508-7, A.5.3)
The Galpat and the transparent Galpat techniques are used to detect static bit errors and dynamic couplings between cells.
Description and performance of the method
The first step in the Galpat test consists of initialising a chosen memory area. This is realised by setting all cells to the value 0 or 1.
The second step consists of inverting one cell at a time.
For each inverted cell, the values of the remaining cells are inspected sequentially by single read access. The inverted cell is checked after each read access.
The same procedure is repeated with the opposite initialisation of the memory range to be tested.
In the transparent Galpat method no initialisation takes place. When a cell to be tested is chosen, the inspection of the remaining cells is realised by sequential read access followed by the generation of a signature e.g. S1, which is stored.
The next step is to invert the cell to be tested and to repeat the same inspection process of the remaining cells followed by the generation of a second signature e.g. S2, which is also stored. The signatures S2 and S1 are then compared. Any difference gives rise to an error message.
The last step in the method is the re-inversion of the cell to be tested followed by the generation of the signature e.g. S3 of all the remaining cells. A comparison of S3 and S1 is performed and any discrepancy gives rise to an error message.
All the cells in the memory range are inspected in the same manner.
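A sketch of the transparent Galpat signature comparison (S1, S2, S3) described above, assuming a small RAM buffer and a simple multiplicative signature; real implementations typically use a CRC-type signature.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

#define TEST_SIZE 8u

/* Signature over all cells except the one under test (sequential read access). */
static uint16_t signature_except(const volatile uint8_t *ram, size_t size, size_t cell)
{
    uint16_t sig = 0;
    for (size_t i = 0; i < size; i++)
        if (i != cell)
            sig = (uint16_t)(sig * 31u + ram[i]);
    return sig;
}

static int transparent_galpat(volatile uint8_t *ram, size_t size)
{
    for (size_t cell = 0; cell < size; cell++) {
        uint16_t s1 = signature_except(ram, size, cell);  /* before inversion       */
        ram[cell] = (uint8_t)~ram[cell];                  /* invert cell under test */
        uint16_t s2 = signature_except(ram, size, cell);  /* after inversion        */
        if (s1 != s2)
            return -1;                                    /* coupling detected      */
        ram[cell] = (uint8_t)~ram[cell];                  /* re-invert the cell     */
        uint16_t s3 = signature_except(ram, size, cell);
        if (s1 != s3)
            return -1;
    }
    return 0;
}

int main(void)
{
    static uint8_t ram_under_test[TEST_SIZE] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    printf("transparent galpat: %s\n",
           transparent_galpat(ram_under_test, TEST_SIZE) == 0 ? "pass" : "fail");
    return 0;
}
```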
Diagnostic coverage: high
3.2.2.4 RAM test ”Abraham” (IEC 61508-7, A.5.4)
This technique is used to detect all stuck-at and coupling errors between memory cells.
Description and performance of the method
The Abraham test is a form of variable-memory pattern test which identifies all stuck-at faults and all coupling faults between memory cells. The proportion of faults detected exceeds that of the RAM test ”galpat”. Some 30n operations are necessary to test n cells of memory.
Diagnostic coverage: high
3.2.2.5 Parity-bit for RAM (IEC 61508-7, A.5.5 One-bit redundancy - for example RAM monitoring with a parity bit)
This technique is used to detect single-bit errors, and any odd number of bit errors, in the memory range tested.
A very common approach to error detection is the single parity check code. This code appends to each K data bits an additional bit whose value is chosen to make the parity of the resulting (K+1)-bit word even (or odd).
Such a choice is said to have even (odd) parity. With even (odd) parity, a single bit error will make the parity of the received word odd (even).
The implementation of this technique for an N*8-bit memory simply requires an extra block of memory to realise an N*9-bit memory. If any single bit of the 8 stored bits is corrupted, the parity bit will be incorrect. This is also the case for 3-, 5- or 7-bit errors. However, 2-bit errors, or any even number of bit errors, will not be detected.
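A software illustration of even parity over one 8-bit data word; in a real design the ninth bit is generated and checked by the memory hardware, so the sketch only shows the underlying computation and its blind spot for even numbers of bit errors.

```c
#include <stdint.h>
#include <stdio.h>

/* Even-parity bit of an 8-bit word: the stored bit makes the total number
 * of '1' bits in the resulting 9-bit word even. */
static uint8_t even_parity(uint8_t data)
{
    data ^= data >> 4;
    data ^= data >> 2;
    data ^= data >> 1;
    return data & 1u;
}

int main(void)
{
    uint8_t stored_data   = 0x5A;                 /* value written to RAM          */
    uint8_t stored_parity = even_parity(0x5A);    /* ninth bit stored alongside it */

    /* A single-bit corruption flips the parity and is detected ... */
    uint8_t one_bit_error = stored_data ^ 0x08;
    printf("1-bit error detected: %s\n",
           even_parity(one_bit_error) != stored_parity ? "yes" : "no");

    /* ... but an even number of bit errors goes unnoticed. */
    uint8_t two_bit_error = stored_data ^ 0x18;
    printf("2-bit error detected: %s\n",
           even_parity(two_bit_error) != stored_parity ? "yes" : "no");
    return 0;
}
```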
Diagnostic coverage: low
3.2.2.6 RAM monitoring with a modified hamming code (IEC 61508-7, A.5.6)
This technique is used to detect all odd-bit errors, all 2-bit errors, and some 3-bit and multi-bit errors.
Error-Correcting Codes - Hamming Code
Data is rarely transmitted in long continuous streams of bits. Transmission of data is usually broken into blocks or messages. Each block or message is a separate unit of transmission.
A message of length n (n = m + r) consists of:
- m message bits (data bits)
- r redundant bits (check bits)
The check bits are part of the error protection and recovery mechanism built in the message.
The message of n bits size, is called an n-bit codeword.
The number of bits which differ in two codewords is called the Hamming Distance.
The significance of the Hamming distance is that if two codewords are a Hamming distance d apart, it will require d single-bit errors to convert one into the other. In other words, errors that involve fewer bits than the Hamming distance can be detected.
In most data transmission applications, all 2^m possible data messages are legal. But, depending on how check bits are computed, not all codewords are used.
Knowing how codewords are generated, gives information on how to construct a list of legal codewords and from that knowledge find the minimum Hamming distance.
This distance is the Hamming distance of the complete code.
The error-detecting and correcting properties of a code depend on its Hamming distance.
To detect d errors requires a distance d +1 code.
With such a code it is not possible that d single-bit errors can change a valid codeword into another valid codeword.
To correct d errors requires a distance 2d+1 code, so that legal codewords are so far apart that even with d changes, the original codeword is still closer than any other codeword and consequently can be uniquely determined.
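These relations can be illustrated with a short C sketch that computes the Hamming distance between two codewords and derives the detectable and correctable error counts; the example codewords are arbitrary.

```c
#include <stdint.h>
#include <stdio.h>

/* Hamming distance: the number of bit positions in which two codewords differ. */
static unsigned hamming_distance(uint16_t a, uint16_t b)
{
    uint16_t diff = (uint16_t)(a ^ b);
    unsigned d = 0;
    while (diff) {
        d += diff & 1u;
        diff >>= 1;
    }
    return d;
}

int main(void)
{
    uint16_t cw1 = 0x00FF, cw2 = 0x0F0F;   /* arbitrary example codewords */
    unsigned d = hamming_distance(cw1, cw2);

    printf("Hamming distance             : %u\n", d);
    printf("errors detectable  (d - 1)   : %u\n", d - 1);        /* distance d detects d-1 errors */
    printf("errors correctable ((d-1)/2) : %u\n", (d - 1) / 2);  /* distance d corrects (d-1)/2   */
    return 0;
}
```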
Diagnostic coverage: high
3.2.2.7 Double RAM with hardware or software comparison and read/write test (IEC 61508-7, A.5.7)
This technique is used to detect all bit errors.
The double RAM with hardware or software comparison and read/write test technique is based on duplication of the safety-related contents of a chosen memory area.
The contents of the memory are duplicated in a different format and stored in separate memory areas.
The contents of the two memory areas are compared and a failure message is raised if a difference is detected.
Diagnostic coverage: high
3.2.2.8 Summary of examples
The table in figure 3.2.2 lists methods for checking the variable memory ranges. The list is not exhaustive, and other acceptable methods exist.
Method for fault diagnostic coverage of variable memory ranges | Maximum achievable fault diagnostic coverage | Reference in IEC 61508-7
RAM test ”checkerboard” | Low | A.5.1
RAM test ”walk-path” | Medium | A.5.2
RAM test ”galpat” or ”transparent galpat” | High | A.5.3 (transparency during the operating cycle can be achieved by partitioning the memory; each partition is then tested in a different time segment)
RAM test ”Abraham” | High | A.5.4
Parity-bit for RAM | Low | A.5.5
RAM monitoring with a modified Hamming code | High | A.5.6
Double RAM with hardware or software comparison and read/write test | High | A.5.7
Figure 3.2.2 Examples of test methods for variable memory
3.3 I/O Units and Interfaces
There are several methods to achieve fault diagnostic coverage of I/O units and interfaces, of which the following can be used to reach a specified rate of coverage.
This work shall focus upon the methods listed in the IEC 61508 standard (part 2, A.7).
- Test pattern
- Code protection
- Multi-channel parallel output
- Monitored outputs
- Input comparison/voting
3.3.1 Test pattern (IEC 61508-7, A.6.1)
The aim of this test method is to detect the faulty states which may affect any input or output lines. A defined flow of data is fed via the input ports of the control system which after processing gives rise to the corresponding signals in the respective output lines. The output signal pattern is then compared to the expected signals.
The repetition frequency of this test procedure depends on the application.
Diagnostic coverage : high
Example :
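A minimal sketch of the principle, assuming a hypothetical 8-bit input and output port image and a stand-in processing function: a defined flow of test data is processed and the resulting output pattern is compared with the expected signals.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical processing of an 8-bit input port image into an output port image. */
static uint8_t process_io(uint8_t inputs)
{
    return (uint8_t)~inputs;     /* stand-in for the real control logic */
}

int main(void)
{
    /* Defined flow of test data and the output signals expected for it. */
    const uint8_t test_inputs[3]      = { 0x00, 0xFF, 0xA5 };
    const uint8_t expected_outputs[3] = { 0xFF, 0x00, 0x5A };

    for (int i = 0; i < 3; i++) {
        if (process_io(test_inputs[i]) != expected_outputs[i]) {
            fprintf(stderr, "I/O test pattern %d failed\n", i);
            return 1;            /* fault on an input or output line */
        }
    }
    printf("all I/O test patterns passed\n");
    return 0;
}
```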
3.3.2 Code protection (IEC 61508-7, A.6.2)
The aim of this test method is to detect random hardware and systematic failures in the input/output data flow. A coding procedure or a table system is used to limit the number of input or output signal combinations for which the system operates normally. By means of a decoding routine the status of the input and output signals is determined in such a way that any systematic error or random failure is detected. This information can further be used to add more protection at run-time.
Diagnostic coverage : high
Example:
A two-channel safety-related system is based on independent input signals In1 and In2.
By coding In1 and In2, information redundancy is added at the input stage. The parallel processing of these signals in the respective computers results in two output signals. If special conditions are set to generate a common status signal, any discrepancies are detected and result in appropriate safety measures.
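A rough sketch of this two-channel example, assuming a hypothetical redundant coding in which only two of the 256 possible byte values are legal input codes; the code values and the common status check are assumptions made for the illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical redundant coding of a one-bit input signal. */
#define CODE_ON   0xA5u
#define CODE_OFF  0x5Au

/* Decode a coded input; any other byte value is an illegal combination. */
static int decode(uint8_t coded, int *value)
{
    if (coded == CODE_ON)  { *value = 1; return 0; }
    if (coded == CODE_OFF) { *value = 0; return 0; }
    return -1;                              /* systematic or random failure detected */
}

int main(void)
{
    uint8_t in1 = CODE_ON, in2 = CODE_ON;   /* coded inputs In1 and In2 of the two channels */
    int v1, v2;

    /* Common status check: both channels must decode legally and agree. */
    if (decode(in1, &v1) || decode(in2, &v2) || v1 != v2) {
        fprintf(stderr, "code protection: illegal code or channel discrepancy\n");
        return 1;                           /* appropriate safety measure */
    }
    printf("both channels agree: input = %d\n", v1);
    return 0;
}
```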
3.3.3 Multi-channel parallel output (IEC 61508-7, A.6.3)
The multi-channel parallel output feature enables the detection of random hardware faults such as stuck-at faults, as well as faults resulting from external devices. Errors/failures such as timing errors, addressing errors, drift failures and transient failures may also be detected with such a method. This control technique implies that the system is built with independent outputs. The error detection procedure is carried out during the diagnostic test interval by external comparators, which can switch off the equipment concerned.
Diagnostic coverage: high
3.3.4 Monitored outputs (IEC 61508-7, A.6.4)
Individual failures can be detected by using the monitored output method, which is data flow-dependent. This method enables detection of faults/errors and failures originating from external equipment, timing errors, addressing errors, as well as drift failures and transient failures.
The error detection procedure is effective only if data flow changes appear during the diagnostic test interval. Depending on the robustness of this control mechanism, detected failures may sometimes not be traced to a specific output line.
Diagnostic coverage: high
3.3.5 Input comparison / voting (IEC 61508-7, A.6.5)
Input comparison/voting is an error/fault detection method used to detect individual failures from external equipment or units. Errors/failures such as timing errors, addressing errors, drift failures and transient failures may also be detected with such a method. This control technique implies that the system is built with independent input units. The error detection procedure is carried out during the diagnostic test interval by an external comparator, which can switch off the signal processing in the respective processing unit.
Diagnostic coverage : high
3.3.6 Summary of examples
The table in figure 3.3 lists methods for checking the I/O units and interfaces. The list is not exhaustive, and other acceptable methods exist.
Method for fault diagnostic coverage of I/O units and interfaces | Maximum achievable fault diagnostic coverage | Reference in IEC 61508-7
Test pattern | High | A.6.1
Code protection | High | A.6.2
Multi-channel parallel output | High | A.6.3
Monitored outputs | High | A.6.4
Input comparison / voting | High | A.6.5
Figure 3.3 Examples of test methods for I/O units and interfaces
3.4 Data paths
There are several methods to achieve fault diagnostic coverage of data paths, of which the following can be used to reach a specified rate of coverage.
This work shall focus upon the methods listed in the IEC 61508 standard (part 2, A.8).
- Multi-bit hardware redundancy
- Complete hardware redundancy
- Inspection using test patterns
- Transmission redundancy
- Information redundancy
3.4.1 Multi-bit hardware redundancy (IEC 61508-7, A.7.2)
The aim of this method is to detect errors during transmission on the bus and in serial transmission links. By extending the bus with two or more lines, error detection is achieved by using Hamming code techniques.
Diagnostic coverage : medium
3.4.2 Complete hardware redundancy (IEC 61508-7, A.7.3)
The aim of this method is to detect errors during the communication process between units within a system. The communication bus is doubled and the additional lines are used to detect errors.
Diagnostic coverage : high
3.4.3 Inspection using test patterns (IEC 61508-7, A.7.4)
The aim of this method is to detect static failures (stuck-at failures) and cross-talk. A data flow-independent cyclical test of the data path is used, and the observed pattern is compared with the expected values. This method is effective only if the pattern information, the test pattern reception and the pattern evaluation are independent of each other.
Diagnostic coverage : high
3.4.4 Transmission redundancy (IEC 61508-7, A.7.5)
The aim of this method is to detect and avoid transient failures. The transmission of information is repeated several times in sequence.
Diagnostic coverage : high
3.4.5 Information redundancy (IEC 61508-7, A.7.6)
The aim of this method is to detect and avoid errors in bus communication. Usually the information transfer is carried out in blocks, followed by a checksum calculation.
Diagnostic coverage : high
3.4.6 Summary of examples
The table in figure 3.4 lists methods for checking the data paths. The list is not exhaustive, and other acceptable methods exist.
Method for fault diagnostic coverage of data paths | Maximum achievable fault diagnostic coverage | Reference in IEC 61508-7
Multi-bit hardware redundancy | Medium | A.7.2
Complete hardware redundancy | High | A.7.3
Inspection using test patterns | High | A.7.4
Transmission redundancy | High | A.7.5
Information redundancy | High | A.7.6
Figure 3.4 Examples of test methods for data paths
3.5 Power supply
The possible faults cannot be defined in an exact way, as memory faults, for example, can be. Faults in the power supply may be characterised as:
- supply voltage lower than specified minimum limit.
- supply voltage higher than specified maximum limit.
- supply voltage drifting within the specified range.
- oscillations.
The voltage variations can further be characterised as fast or slow, and intermittent or singular.
Neither is it possible to exactly define the faults which are detected by a certain technique to monitor the supply power. The diagnostic coverage of the power supply monitoring cannot be calculated. It will have to be defined in terms such as low, medium or high.
Continuous monitoring without checking for faults in the circuitry can be defined to have low diagnostic coverage. A hardware or software fault in the monitoring mechanism may easily disable the supply voltage monitoring without this being observed by the processing unit.
Continuous monitoring using dynamic signals can be defined to have high diagnostic coverage. A fault in the monitoring function must then be regarded as likely to corrupt the dynamic signals. The processing unit will observe the change in the dynamic signal, and detect the fault.
Machine control systems very seldom employ advanced or redundant monitoring of the supply voltage in a single channel. Dual-channel designs usually have two independent voltage monitoring circuits. Most circuits for supply voltage monitoring may be considered to have low diagnostic coverage.
The table in figure 3.5 lists methods for checking the power supply. The list is not exhaustive, and other acceptable methods exist.
Method for fault diagnostic coverage of power supply | Maximum achievable fault diagnostic coverage | Reference in IEC 61508-7
Continuous monitoring (no checking in the circuitry) | Low | -
Continuous monitoring using dynamic signals | High | -
Figure 3.5 Examples of test methods for power supply
3.6 Program sequence
Faults in the programme sequence may be caused by hardware faults, software faults or environmental disturbances. It is not possible to define the fault modes in a precise way. Neither is it possible to calculate the number of faults which will be detected. The diagnostic coverage cannot be calculated, but has to be agreed upon by definition, e.g. as low, medium or high.
Techniques to detect faults in programme sequence are based on software, hardware or combinations of both.
A frequently used technique is a hardware circuit (‘watchdog’) triggered by the software of the processing unit within a periodic interval. This technique may be further enhanced by logical monitoring of the execution: the different parts of the software check for a correct sequence between them. Software checking may also be used without hardware circuitry.
Depending on combinations of techniques, different diagnostic coverage may be claimed.
There are some hardware aspects which will influence the diagnostic coverage:
- monitoring time base shared with CPU, or independent of CPU time base
- on-chip hardware or hardware separated from the processing unit.
- fail-safe monitoring hardware, i.e. a fault in the hardware used for monitoring the execution will cause an alarm.
There are also software aspects which will affect the diagnostic coverage:
- triggering from a non safety-related part of the software.
- testing of the program sequence monitoring function at reset.
- software monitoring of the correct sequence of the individual program sections by passing ”keys” between software modules, or by ”counting”.
Execution monitoring without hardware support is very sensitive to hardware faults in the processing unit. Faults in the processing unit may disable both the execution of the application software, and the monitoring software. Execution monitoring based only on software must be defined to have low diagnostic coverage.
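A simplified sketch of logical sequence monitoring by passing ‘keys’ between program sections, combined with a hypothetical watchdog trigger; the key values and the watchdog interface are assumptions, and on real hardware the trigger would write to a dedicated watchdog register.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical hardware watchdog trigger (placeholder for a register write). */
static void trigger_watchdog(void) { /* e.g. restart an external watchdog timer */ }

static uint16_t sequence_key;               /* key passed between program sections */

static void section_a(void)
{
    sequence_key = 0x1111u;                 /* leave the key expected by section B */
    trigger_watchdog();
}

static void section_b(void)
{
    if (sequence_key != 0x1111u) {          /* section A must have run immediately before */
        fprintf(stderr, "program sequence fault detected\n");
        exit(EXIT_FAILURE);                 /* enter the safe state */
    }
    sequence_key = 0x2222u;
    trigger_watchdog();
}

int main(void)
{
    for (int cycle = 0; cycle < 3; cycle++) {   /* cyclic main program */
        section_a();
        section_b();
    }
    printf("program sequence monitored for 3 cycles\n");
    return 0;
}
```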
A monitoring device integrated on the same chip as the processing unit, and triggered at a periodic interval, may be regarded to have low diagnostic coverage.
An external hardware circuit dedicated to execution monitoring will be less subject to common cause failures with the processing unit. It can be regarded to have medium diagnostic coverage.
An external fail-safe circuit, or an external periodically tested circuit, can be regarded to have high diagnostic coverage.
Machine control systems seldom employ advanced or redundant monitoring of the programme sequence in a single channel. Dual-channel designs usually have two independent ‘watchdogs’, and the processing units may synchronise and thereby monitor the programme sequence of the other channel.
The table in figure 3.6 lists methods for monitoring the program sequence. The list is not exhaustive, and other acceptable methods exist.
Method for fault diagnostic coverage of program sequence execution | Maximum achievable fault diagnostic coverage | Reference in IEC 61508-7
Monitoring device integrated on the same chip as the processing unit | Low | -
External hardware circuit dedicated to execution monitoring | Medium | -
External fail-safe circuit, or external periodically tested circuit | High | -
Figure 3.6 Examples of test methods for program sequence
4 Requirements
In work package 2.1 it is shown that, for system architectures typical of complex electronic systems (CES), a correspondence between the categories of EN 954-1 and the Safety Integrity Levels of IEC 61508 is possible if a complete safety function is executed by the CES. To obtain a hierarchical risk reduction between categories B, 2, 3 and 4, the following tables can be used. If the designer can justify that the assumptions for the architectures are the same as those made in chapter 1.3, the following tables can be used without quantification. Using the result of chapter xxx, a link can be made between the categories of EN 954-1 and the Safety Integrity Levels of IEC 61508 for these designated architectures for the machinery sector for CES.
According to chapter 6.2.2 of EN 954-1, complex electronic systems cannot be used to realise Category 1. For this reason, the following tables do not contain any checking principles for category 1.
The requirements listed in this report are combined to form adequate safety principles for a certain category (B, 1, 2, 3 or 4). Different safety principles are listed for the different architectures in categories 3 and 4. This is commented on in chapter 1.3.
Some of the checking principles may be applied either at power-up or continuously during run-time. This depends largely on the application, and no specific requirements are given for when (or how often) the checking must take place.
4.1 Processing unit
4.1.a Requirement: The CPU shall be checked for stuck-at failures of registers and internal RAM.
4.1.b Requirement: The decoding and execution of instructions shall be checked.
4.1.c Requirement: All registers must be checked.
4.1.d Requirement: Faults in the processing unit shall be indicated by the PES.
Requirement | Category B | Category 2 | Category 3 (single) | Category 3 (dual) | Category 4 (dual) | Category 4 (triple)
4.1.a Registers & RAM | | | x | x | x | x
4.1.b Instruction decoding | | | x | | x |
4.1.c All registers | | | x | | x |
4.1.d Indication | | | x | x | x | x
Minimum diagnostic coverage | - | - | High | Medium | High | Medium
Table 4.1 Safety principles for monitoring of the processing unit.
4.2 Invariable memory ranges
4.2.a Basic requirement: The PES shall be able to detect faults in the invariable memory.
4.2.b Additional requirement: The complete address range must be checked.
4.2.c Basic requirement: Memory failures shall be indicated by the PES.
Requirement | Category B | Category 2 | Category 3 (single) | Category 3 (dual) | Category 4 (dual) | Category 4 (triple)
4.2.a Memory check | x | x | x | x | x | x
4.2.b Complete address range | | | x | | x |
4.2.c Indication | | x | x | x | x | x
Minimum diagnostic coverage | Low | Low | High | Medium | High | Medium
Table 4.2 Safety principles for monitoring of invariable memory.
4.3 Variable memory ranges
4.3.a Basic requirement: The PES shall be able to detect faults in the variable memory.
4.3.b Additional requirement: The complete address range must be covered.
4.3.c Basic requirement: Memory failures shall be indicated by the PES.
Requirement | Category B | Category 2 | Category 3 (single) | Category 3 (dual) | Category 4 (dual) | Category 4 (triple)
4.3.a Memory check | x | x | x | x | x | x
4.3.b Complete address range | | | x | | x | x
4.3.c Indication | | x | x | x | x | x
Minimum diagnostic coverage | Low | Low | High | Medium | High | Medium
Table 4.3 Safety principles for monitoring of variable memory.
4.4 I/O Units and Interface
4.4.a Requirement: The PES shall automatically check the input and output units (digital, analogue, serial or parallel).
4.4.b Requirement: Faults detected in the I/O units and interfaces shall be indicated.
Requirement | Category B | Category 2 | Category 3 (single) | Category 3 (dual) | Category 4 (dual) | Category 4 (triple)
4.4.a I/O check | | | x | x | x | x
4.4.b Indication | | | x | x | x | x
Minimum diagnostic coverage | - | - | High | Medium | High | Medium
Table 4.4 Safety principles for monitoring of I/O units and interface.
4.5 Data paths
4.5.a Requirement: The PES shall automatically check the internal communication.
4.5.b Requirement: Faults detected in the internal communication shall be indicated.
Requirement | Category B | Category 2 | Category 3 (single) | Category 3 (dual) | Category 4 (dual) | Category 4 (triple)
4.5.a Data path check | | | x | x | x | x
4.5.b Indication | | | x | x | x | x
Minimum diagnostic coverage | - | - | High | Medium | High | Medium
Table 4.5 Safety principles for monitoring of data paths.
4.6 Power supply
4.6.a Basic requirement: The PES shall be able to detect decreases in the supply voltage, and the execution of the processor must be halted in a controlled way.
4.6.b Additional requirement: The supply voltage monitoring circuit must be dynamic, i.e. correct supply voltage is indicated by a dynamic signal.
4.6.c Additional requirement: The supply voltage monitor circuit must be fail-safe, i.e. a fault in the circuitry shall lead to a power fail alarm.
4.6.d Additional requirement: Power supply failures shall be indicated by the PES.
Requirement | Category B | Category 2 | Category 3 (single) | Category 3 (dual) | Category 4 (dual) | Category 4 (triple)
4.6.a Supply power monitoring | x | x | x | x | x | x
4.6.b Dynamic | | | x or 4.6.c | | x or 4.6.c |
4.6.c Fail-safe | | | x or 4.6.b | | x or 4.6.b |
4.6.d Indication | | | x | x | x | x
Minimum diagnostic coverage | Low | Low | High | Medium | High | Medium
Table 4.6 Safety principles for monitoring of supply power.
NOTE: It is difficult to find a good definition of diagnostic coverage for the power supply checking.
4.7 Program sequence
4.7.a Basic requirement: The PES shall have a watchdog implemented in hardware to monitor the program sequence.
4.7.b Additional requirement: The hardware (especially the time base) used for the watchdog shall be independent of the processor it is supposed to supervise.
4.7.c Additional requirement: There shall be software means of monitoring the program sequence.
4.7.d Additional requirement: The watchdog must be automatically tested by the software at power-up or at periodic intervals.
4.7.e Additional requirement: The watchdog must be fail-safe, i.e. a fault in the watchdog circuitry will lead to a watchdog alarm.
4.7.f Additional requirement: A watchdog alarm shall be indicated by the PES.
Requirement | Category B | Category 2 | Category 3 (single) | Category 3 (dual) | Category 4 (dual) | Category 4 (triple)
4.7.a Hardware watchdog | x | x | x | x | x | x
4.7.b Independent hardware | | | x | | x |
4.7.c Software monitoring | | | x | x | x | x
4.7.d Tested | | | x or 4.7.e | | x or 4.7.e |
4.7.e Fail-safe | | | x or 4.7.d | | x or 4.7.d |
4.7.f Indication | | | x | x | x | x
Minimum diagnostic coverage | Low | Low | High | Medium | High | Medium
Table 4.7 Safety principles for monitoring of program execution.
NOTE: The principle of requiring more functionality for a higher category corresponds to requiring a higher diagnostic coverage.
5 Validation methods
5.1 Processing units
Aim: To validate the checking of processing units, and to determine its diagnostic coverage.
Description: The processing unit in a safety-related PES used for machine control is most often a microprocessor or a microcontroller. The hardware reference manual of the semiconductor manufacturer will be needed to study the processing unit. Software routines for checking of the processing unit shall be analysed. The analysis can be summarised in the following check list:
Processing Units | Ap.*) | Yes | No | Comment
A. Are the processing units checked for faults? | | | |
B. Have techniques and measures according to IEC 61508-2, table A.4 been used? | | | |
C. Have the software routines used for checking of processing units been identified? | | | |
D. Have the software routines for handling of faults been identified? | | | |
E. Is the instruction decoding covered by self-checking? | | | |
F. Are the internal registers covered by self-checking? | | | |
G. Is self-checking performed at power-up? | | | |
H. Is periodic self-checking performed at run-time? | | | |
I. Has the diagnostic coverage been determined? | | | |
J. Has the software implementing the self-checking been checked without finding logical faults? | | | |
K. Are the techniques and measures used adequate as safety principles for the intended category (see chapter 4)? | | | |
*) Ap. = Applicable question (Some questions may not be applicable for all control systems.)
5.2 Invariable memory ranges
Aim: To validate the memory checking, and to calculate its diagnostic coverage.
Description: Programme code and constants will be stored in invariable memory (ROM, EPROM etc.). A memory map will give an overview of the memory ranges. The hardware circuit diagram or the microprocessor manual may have to be studied. Software routines for memory checking and handling of memory faults shall be analysed. The analysis can be summarised in the following check list:
Invariable Memory Ranges | Ap.*) | Yes | No | Comment
A. Is automatic checking of the invariable memory range made at a periodic interval? | | | |
B. Is automatic checking of the invariable memory range made at power-up? | | | |
C. Have techniques and measures according to IEC 61508-2, table A.5 been used? | | | |
D. Have the software routines used for memory checking been identified? | | | |
E. Have the software routines for handling of faults been identified? | | | |
F. Is the complete address range of the invariable memory covered by the test? | | | |
G. Has the software for memory checking been analysed without finding faults? | | | |
H. Has the software for handling of faults in memory been checked without finding faults? | | | |
I. Has the diagnostic coverage been calculated? | | | |
J. Are the techniques and measures used adequate as safety principles for the intended category (see chapter 4)? | | | |
*) Ap. = Applicable question (Some questions may not be applicable for all control systems.)
5.3 Variable memory ranges
Aim: To validate the memory checking, and to calculate its diagnostic coverage.
Description: Variables and parameters will be stored in variable memory (RAM, EEPROM etc.). A memory map will give an overview of the memory ranges. The hardware circuit diagram or the microprocessor manual may have to be studied. Software routines for memory checking and handling of memory faults shall be analysed. The analysis can be summarised in the following check list:
Variable Memory Ranges | Ap.*) | Yes | No | Comment
A. Is automatic checking of the variable memory range made at a periodic interval? | | | |
B. Is automatic checking of the variable memory range made at power-up? | | | |
C. Have techniques and measures according to IEC 61508-2, table A.6 been used? | | | |
D. Have the software routines used for memory checking been identified? | | | |
E. Have the software routines for handling of faults been identified? | | | |
F. Is the complete address range of the variable memory covered by the test? | | | |
G. Has the software for memory checking been analysed without finding faults? | | | |
H. Has the software for handling of faults in memory been checked without finding faults? | | | |
I. Has the diagnostic coverage been calculated? | | | |
J. Are the techniques and measures used adequate as safety principles for the intended category (see chapter 4)? | | | |
*) Ap. = Applicable question (Some questions may not be applicable for all control systems.)
5.4 I/O Units and Interface
Aim: To validate the checking of I/O units and interfaces (external communication), and to determine its diagnostic coverage.
Description: The control system may be equipped with I/O units and interfaces for communication with external units. A hardware circuit diagram, and possibly also the reference manual of the semiconductor I/O circuits, will be needed for the analysis. Software routines for checking of external communication shall be analysed. The analysis can be summarised in the following check list:
I/O Units and Interface | Ap.*) | Yes | No | Comment
A. Have all I/O units and interfaces been identified? | | | |
B. Are the I/O units and interfaces checked for faults at power-up? | | | |
C. Are the I/O units and interfaces checked for faults at run-time? | | | |
D. Have techniques and measures according to IEC 61508-2, table A.7 been used? | | | |
E. Have the software routines used for checking of I/O units and interfaces been identified? | | | |
F. Have the software routines for handling of faults been identified? | | | |
G. Is self-checking performed at power-up? | | | |
H. Is periodic self-checking performed at run-time? | | | |
I. Has the diagnostic coverage been determined? | | | |
J. Has the software implementing the self-checking been checked without finding logical faults? | | | |
K. Are the techniques and measures used adequate as safety principles for the intended category (see chapter 4)? | | | |
*) Ap. = Applicable question (Some questions may not be applicable for all control systems.)
5.5 Data paths
Aim: To validate the data path (internal communication) checking, and to determine its diagnostic coverage.
Description: Communication between modules of the PES is made through different data paths. A block diagram of the hardware has to be studied in combination with the circuit diagrams. Software routines for checking of internal communication shall be analysed. The analysis can be summarised in the following check list:
Data Paths | Ap.*) | Yes | No | Comment
A. Have all data paths for communication been identified? | | | |
B. Are the data paths checked for faults at power-up? | | | |
C. Are the data paths checked for faults at run-time? | | | |
D. Have techniques and measures according to IEC 61508-2, table A.8 been used? | | | |
E. Have the software routines used for checking of data paths been identified? | | | |
F. Have the software routines for handling of faults been identified? | | | |
G. Has the diagnostic coverage been determined? | | | |
H. Has the software implementing the self-checking been checked without finding logical faults? | | | |
I. Are the techniques and measures used adequate as safety principles for the intended category (see chapter 4)? | | | |
*) Ap. = Applicable question (Some questions may not be applicable for all control systems.)
5.6 Power supply
Aim: To validate the power supply monitoring, and to determine its diagnostic coverage.
Description: The monitoring of the supply voltage requires a combination of hardware and software measures. An analysis of the hardware circuit diagrams is often the best way to commence. The sensing of the power alarm signal and the power-down routines should be analysed in the source code. The analysis can be summarised in a check list:
Power Supply Monitoring | Ap.*) | Yes | No | Comment
A. Is the supply power monitored to give an alarm at low voltage ("brown-out")? | | | |
B. Is the supply power monitored to give an alarm at high voltage? | | | |
C. Have techniques and measures according to IEC 61508-2, table A.8 been used? | | | |
D. Have the hardware circuit diagrams been analysed without finding faults? | | | |
E. Have the software routines used for power supply monitoring been identified? | | | |
F. Have the software routines for handling of faults been identified? | | | |
G. Has the interface between hardware and software been analysed without finding faults? | | | |
H. Has the control flow at a low-voltage alarm been analysed without finding faults? | | | |
I. Has the time from low-voltage alarm to entered ‘safe state’ been analysed without finding faults? | | | |
J. Has the action taken at high voltage been analysed without finding faults? | | | |
K. Has the diagnostic coverage been determined? | | | |
L. Has the software implementing the self-checking been checked without finding logical faults? | | | |
M. Are the techniques and measures used adequate as safety principles for the intended category (see chapter 4)? | | | |
*) Ap. = Applicable question (Some questions may not be applicable for all control systems.)
5.7 Program sequence
Aim: To validate the programme sequence monitoring (watchdog), and to determine its diagnostic coverage.
Description: The programme sequence monitoring may be implemented in a combination of hardware and software, or in software only. The hardware circuit diagrams should be analysed for the measures in hardware. The source code should be analysed for the triggering of the hardware, the monitoring mechanisms in software and the handling of a watchdog (program sequence) alarm.
Programme Sequence Monitoring | Ap.*) | Yes | No | Comment
A. Is there a hardware device (watchdog) to monitor the execution of the software? | | | |
B. Is there software monitoring of the execution of the software? | | | |
C. Have techniques and measures according to IEC 61508-2, table A.10 been used? | | | |
D. Has the hardware circuit diagram been analysed without finding faults? | | | |
E. Will a watchdog alarm result in a reset of the microprocessor? | | | |
F. Will a watchdog alarm result in the loss of the calculated values and machine state? | | | |
G. Is the watchdog triggered at a period of less than 5 seconds? | | | |
H. Is the time base of the hardware watchdog independent of the processor time base? | | | |
I. Is the watchdog a separate hardware circuit (not integrated into the microcontroller)? | | | |
J. Is an unintended change in the program flow likely to lead to a watchdog alarm? | | | |
K. Is the watchdog triggered during the execution of the main program? | | | |
L. Has the diagnostic coverage been determined? | | | |
M. Has the software implementing the self-checking been checked without finding logical faults? | | | |
N. Are the techniques and measures used adequate as safety principles for the intended category (see chapter 4)? | | | |
*) Ap. = Applicable question (Some questions may not be applicable for all control systems.)
6 Conclusions
Summing up the results of this work, it appears beyond doubt that when safety-related functions are at stake, safety principles shall be implemented in the design of the control system. The level of safety, or the category, shall be the starting point for the choice of architecture to be implemented. This architecture shall be supported by adequate techniques and measures to detect and control faults, to minimise their repercussions on the whole system and, where necessary, to provide a smooth transition to a safe state.
Checking of processing units, memory ranges, I/O units, interfaces, data paths, power supply and program sequence is regarded as state of the art. The selection of checking techniques, and the diagnostic coverage needed, will depend on the risks associated with the machine to be controlled. A complex electronic control system of a high-risk machine shall be able to find most faults in the control system before hazardous situations can be caused. That calls for elaborate checking facilities with high coverage. A low-risk machine must also have safety principles implemented, but lower coverage can be accepted.
It has been possible to use the techniques and measures suggested in the IEC 61508 safety standard to prescribe the safety principles required by the EN 954-1 standard. IEC 61508 has established a frame of reference which can very well be used with EN 954-1 also. However, the recommendations for the safety integrity levels of IEC 61508 will have to be modified for the categories of EN 954-1.
The safety measures to be implemented are described and associated with the relevant architecture for a specific category. The checklists presented in this report will be of help both to the developer and to the assessor.