==System availability==
'''System Availability'''
*The percentage of time that the system can perform its intended function
<br>
System Availability = 1 – System Downtime
<br>
'''Downtime per year'''
{|border="1"
|width="100px"|'''Availability'''
|width="100px"|'''Nines'''
|colspan="2" width="200px"|'''Downtime'''
|-
|90%||1||36.5||days/year
|-
|99%||2||3.65||days/year
|-
|99.9%||3||8.76||hours/year
|-
|99.99%||4||52||minutes/year
|-
|99.999%||5||5||minutes/year
|}
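
The downtime column follows directly from the availability figure; as a worked check (using a 365-day year of 8760 hours):
 Downtime per year = (1 – Availability) × 8760 h
 99.9%  → 0.001  × 8760 h ≈ 8.76 hours/year
 99.99% → 0.0001 × 8760 h ≈ 52.6 minutes/year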
==System Downtime==
'''Many events cause system downtime:'''
*HW fault
*Software fault
*Vandalism
*Extreme conditions (fire, flooding, etc.)
*Power outage
*IP network failure
*Planned system maintenance
<br>
'''System Downtime = ∑ P * S * MTTR'''
{|border="0"
|width="60px"|P
|= Probability of the event taking place
|-
|S||= Severity of the event
|-
| ||= Percentage of the service affected by the fault
|-
|MTTR||= Mean Time To Repair
|-
| ||= Mean time to detect the fault + mean time to fix the fault
|}

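As an illustration of the formula above (all figures are hypothetical, not measured values), with P read as the expected number of occurrences per year:
 HW fault:     P = 2/year, S = 10% of the service,  MTTR = 4 h   → 2 × 0.10 × 4 h   = 0.8 h/year
 Power outage: P = 1/year, S = 100% of the service, MTTR = 0.5 h → 1 × 1.00 × 0.5 h = 0.5 h/year
 System Downtime ≈ 1.3 h/year  →  System Availability ≈ 1 – 1.3/8760 ≈ 99.985%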
==HW failure==
'''MTBF'''
*The probability of HW faults is calculated using MTBF figures
*MTBF ≠ System Availability (see the relation below)
<br>
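An MTBF figure alone says nothing about availability until repair time is taken into account. For a single repairable unit, the commonly used relation is:
 Availability = MTBF / (MTBF + MTTR)
For example (illustrative figures): a unit with MTBF = 200,000 hours and MTTR = 4 hours gives 200000 / 200004 ≈ 99.998% availability, while the same unit left unrepaired for two weeks (MTTR = 336 hours) gives 200000 / 200336 ≈ 99.83%.
<br>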
'''MTBF calculations'''
*Empirical method
*MIL-HDBK-217
*Telcordia
<br>
'''Empirical methods'''
*Based on statistics from the field
<br>
'''MIL-HDBK-217 and Telcordia'''
*All components are entered in a database with set environmental conditions
*Usually provide lower MTBF figures than empirical methods
**Do not reflect real usage conditions
**Use worst-case environmental conditions
<br>
'''More components give a lower MTBF''' (see the example below)
<br>
'''MTBF and single points of failure'''
<br>
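For components in series (where any single failure brings the function down), the failure rates add, so the system MTBF is always lower than that of its weakest part. A simple example with assumed figures:
 MTBF_system = 1 / (1/MTBF_1 + 1/MTBF_2 + ... + 1/MTBF_n)
 Two boards of 300,000 h and 150,000 h in series → 1 / (1/300000 + 1/150000) = 100,000 h
This is why adding components (including redundant ones) lowers the calculated MTBF, even when it improves service availability.
<br>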
==Other failures==
'''Software faults'''
*Automatic watchdog functions
*Automatic recovery functions
*Maturity of the system
*Structured software design and testing
<br>
'''Vandalism and extreme conditions'''
*Robustness to vandalism and extreme conditions
*IP and IK classification
*IP security functions to hinder denial-of-service (DoS) attacks
<br>
'''Power outage'''
*UPS and redundant power supplies
<br>
'''IP network failure'''
*Network service level
*Redundancy and switchover functions
<br>
'''Planned system maintenance'''
*Expansion, adding users, etc.
*Ability to perform maintenance without service interruptions

==Redundancy==

Redundancy is about parallelism and removing single points of failure
<br>
Redundancy usually gives lower MTBF figures
*Requires more components
<br>
Redundancy usually provides significantly higher service availability (see the example below)
*A single failure shall have minimal or no impact on service availability
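
As an illustration (assumed figures): if one unit alone is 99.9% available, two fully redundant units in parallel are unavailable only when both are down at the same time:
 Availability_parallel = 1 – (1 – A)² = 1 – (0.001)² = 99.9999%
So although the duplicated hardware roughly halves the calculated MTBF, the downtime of that function drops from hours per year to well under a minute per year, provided failures are independent and the faulty unit is repaired promptly.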

==STENTOFON System Availability==

'''Redundancy'''
*Control room redundancy and parallel call handling
*Power supply redundancy
*Alternative AlphaNet routing
*Control card AMC-IP redundancy
<br>
'''Reduced MTTR'''
*AlphaNet supervision
*Station supervision and tone test
*Network monitoring (SNMP, Syslog, OPC)
<br>
'''Software failures and recovery'''
*HW watchdog
*SW process watchdog
*Automatic recovery
<br>
'''System maintenance'''
*Centralized and remote firmware upgrade
*Hot insert and removal of cards
*Control card AMC-IP redundancy

==MTBF figures==
[http://en.wikipedia.org/wiki/MTBF MTBF] (Mean Time Between Failures) figures try to give a measure of the reliability of a system. There are two ways in which MTBF figures can be obtained:
* '''Calculation.''' There are a number of different standards for the calculation of MTBF figures, of which MIL-HDBK-217 is probably the most widely used.
* '''Statistical information.''' MTBF figures can also be calculated from data gathered about failures in real installations. For each electronic unit, the number of hours in operation is reported at the moment of failure. When this is done over a sufficiently large quantity of the same units, a realistic ‘real life’ MTBF can be calculated. The MTBF of newly designed items can still be estimated this way by comparing them to existing items of similar complexity.
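
As a sketch of the statistical method (the figures are illustrative, not actual field data): if 5,000 units of a station type have accumulated 40,000,000 operating hours in the field and 160 failures have been reported in that period, the observed MTBF is:
 MTBF = total operating hours / number of failures = 40,000,000 h / 160 ≈ 250,000 hours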
+ | |||
+ | Zenitel mainly uses statistical information, in line with common practice. | ||
+ | |||
+ | List of [[MTBF figures]] | ||
+ | |||
+ | |||
+ | |||
+ | [[Category:AMC Software]] |