Difference between revisions of "System availability"
From Zenitel Wiki
Revision as of 14:20, 4 August 2009
System availability
System Availability
- The percentage of time that the system can perform its intended function
System Availability = 1 – System Downtime (downtime expressed as a fraction of total time)
Downtime per year
Availability | Nines | Downtime | Unit |
90% | 1 | 36.5 | days/year |
99% | 2 | 3.65 | days/year |
99.9% | 3 | 8.76 | hours/year |
99.99% | 4 | 52.6 | minutes/year |
99.999% | 5 | 5.26 | minutes/year |
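The downtime figures follow from multiplying the unavailable fraction of time by the length of a year. A minimal sketch, assuming a 365-day year; the function name is illustrative only:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, leap years ignored (assumption)


def downtime_hours_per_year(availability: float) -> float:
    """Yearly downtime in hours for an availability given as a fraction (0.999 = 99.9%)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # One to five nines, as in the table above.
    for nines, availability in enumerate([0.9, 0.99, 0.999, 0.9999, 0.99999], start=1):
        hours = downtime_hours_per_year(availability)
        print(f"{nines} nines: {hours:.2f} hours/year = {hours * 60:.1f} minutes/year")
```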
System Downtime
Many events cause system downtime:
- HW fault
- Software fault
- Vandalism
- Extreme conditions (fire, flooding, etc.)
- Power outage
- IP network failure
- Planned system maintenance
System Downtime = ∑ (P × S × MTTR), summed over all events
P | = Probability of the event taking place |
S | = Severity of the event |
| = Percentage of service affected by the fault |
MTTR | = Mean Time To Repair |
| = Mean time to detect the fault + mean time to fix the fault |
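The sum can be evaluated per event type. The sketch below assumes that P is expressed as the expected number of occurrences per year, so that the result is in hours of downtime per year. The event list and all numbers are hypothetical and not taken from Zenitel data.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24


@dataclass
class DowntimeEvent:
    name: str
    events_per_year: float  # P, interpreted as expected occurrences per year (assumption)
    severity: float         # S, fraction of service affected (0..1)
    mttr_hours: float       # MTTR = time to detect + time to fix, in hours


def expected_downtime_hours(events):
    """System Downtime = sum of P * S * MTTR over all events."""
    return sum(e.events_per_year * e.severity * e.mttr_hours for e in events)


def availability(events):
    """System Availability = 1 - System Downtime, with downtime as a fraction of the year."""
    return 1.0 - expected_downtime_hours(events) / HOURS_PER_YEAR


# Hypothetical example values, for illustration only.
EXAMPLE_EVENTS = [
    DowntimeEvent("HW fault", events_per_year=0.5, severity=0.2, mttr_hours=4.0),
    DowntimeEvent("Power outage", events_per_year=1.0, severity=1.0, mttr_hours=2.0),
    DowntimeEvent("Planned maintenance", events_per_year=2.0, severity=0.5, mttr_hours=1.0),
]

if __name__ == "__main__":
    print(f"Downtime: {expected_downtime_hours(EXAMPLE_EVENTS):.1f} hours/year")
    print(f"Availability: {availability(EXAMPLE_EVENTS):.5%}")
```

Because each term is a product, halving the MTTR of an event reduces its downtime contribution as much as halving its probability.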
HW failure
MTBF
- Probability of HW faults calculated using MTBF figures
- MTBF ≠ System Availability
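Why MTBF alone does not determine availability can be illustrated with the standard steady-state relation Availability = MTBF / (MTBF + MTTR). This relation is not stated on this page; it is used here as an assumption for a single repairable unit, and the numbers are hypothetical.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time a repairable unit is up: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Same hypothetical MTBF, two different repair times.
    for mttr in (4.0, 72.0):
        a = steady_state_availability(50_000.0, mttr)
        print(f"MTBF 50000 h, MTTR {mttr:>4.0f} h -> availability {a:.5%}")
```

Two units with the same MTBF end up with different availability when one is repaired in hours and the other in days.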
MTBF calculations
- Empirical method
- MIL-HDBK-217
- Telcordia
Empirical methods
- Based on statistics from the field
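An empirical MTBF estimate divides the accumulated operating time of the installed base by the number of failures. The sketch below makes two simplifying assumptions: every unit is in service for the whole period, and a hypothetical reporting_rate parameter scales up the reported failures when not all failures are reported. All numbers are invented for illustration.

```python
def empirical_mtbf_hours(units_in_service: int,
                         hours_in_service_per_unit: float,
                         reported_failures: int,
                         reporting_rate: float = 1.0) -> float:
    """Estimate MTBF as total operating hours divided by the estimated number of failures."""
    if reported_failures <= 0:
        raise ValueError("at least one reported failure is needed for an estimate")
    if not 0.0 < reporting_rate <= 1.0:
        raise ValueError("reporting_rate must be in (0, 1]")
    total_hours = units_in_service * hours_in_service_per_unit
    estimated_failures = reported_failures / reporting_rate  # correct for unreported failures
    return total_hours / estimated_failures


if __name__ == "__main__":
    # Hypothetical: 1000 units running for one year, 20 failures reported, 95% reporting rate.
    mtbf = empirical_mtbf_hours(1000, 8760.0, 20, reporting_rate=0.95)
    print(f"Estimated MTBF: {mtbf:.0f} hours ({mtbf / 8760:.1f} years)")
```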
MIL-HDBK-217 and Telcordia
- All components are entered in a database with set environmental conditions
- Usually provide lower MTBF figures than empirical methods
- Do not include real usage conditions
- Use worst case environmental conditions
- More components give lower MTBF
- MTBF and single points of failure
Other failures
Software fault
- Automatic watchdog functions
- Automatic recovery functions
- Maturity of system
- Structured software design and test
Vandalism and Extreme conditions
- Robustness to vandalism and extreme conditions
- IP (ingress protection) and IK (impact protection) class
- IP security functions to hinder denial-of-service (DoS) attacks
Power outage
- UPS and redundant power supplies
IP network failure
- Network service level
- Redundancy and switchover functions
Planned system maintenance
- Expansion, adding users, etc.
- Ability to do maintenance without service interruptions
Redundancy
Redundancy is about parallelism and removing single points of failure
Redundancy usually gives lower MTBF figures
- Requires more components
Redundancy usually provides significantly higher service availability
- A single failure shall have minimal or no impact on service availability
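Both effects of redundancy can be shown with two textbook formulas. The sketch assumes independent components with constant failure rates, and that the service stays up as long as one redundant unit works. The function names and numbers are hypothetical.

```python
import math


def series_mtbf_hours(component_mtbfs):
    """MTBF until the first failure of any component: failure rates (1/MTBF) add up."""
    return 1.0 / sum(1.0 / m for m in component_mtbfs)


def parallel_availability(availabilities):
    """Service availability when any one of the redundant units is sufficient."""
    return 1.0 - math.prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    unit_mtbf = 100_000.0  # hours, hypothetical
    unit_availability = 0.999  # hypothetical

    # Two units instead of one: a hardware failure somewhere occurs twice as often...
    print(f"MTBF, one unit:  {series_mtbf_hours([unit_mtbf]):.0f} h")
    print(f"MTBF, two units: {series_mtbf_hours([unit_mtbf, unit_mtbf]):.0f} h")

    # ...but the service is only down when both units are down at the same time.
    print(f"Availability, one unit:  {parallel_availability([unit_availability]):.4%}")
    print(f"Availability, two units: {parallel_availability([unit_availability] * 2):.4%}")
```

The MTBF figure counts every component failure, while service availability only counts the failures that users notice.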
STENTOFON System Availability
Redundancy
- Control room redundancy and parallel call handling
- Power supply redundancy
- Alternative AlphaNet routing
- Control card redundancy
- ….
Reduced MTTR
- AlphaNet supervision
- Station supervision and tone test
- Network monitoring (SNMP, Syslog, OPC)
Software failures and recovery
- HW watchdog
- SW process watchdog
- Automatic recovery
System maintenance
- Centralized and remote firmware upgrade
- Hot insert and removal of cards
- Control card redundancy
Some MTBF figures
Zenitel USA has been keeping statistics about failures and the reasons for failures.
Zenitel USA actively encourages repairs, and the estimate is that 95% of failures are reported, including failures of equipment installed before the AlphaCom was introduced.
The following figures are based on the Zenitel USA statistics; for new equipment, a comparison is made to figures from known equipment.
The data used are the sales figures from Zenitel USA and the fault reports over an 8-year period.