From Zenitel Wiki


Revision as of 14:20, 4 August 2009

System availability

System Availability

  • The percentage of time that the system can perform its intended function


System Availability = 1 – System Downtime (downtime expressed as a fraction of total time)


Downtime per year

Availability   Nines   Downtime (365-day year)
90%            1       36.5 days/year
99%            2       3.65 days/year
99.9%          3       8.76 hours/year
99.99%         4       52.6 minutes/year
99.999%        5       5.26 minutes/year
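The table values follow from System Availability = 1 – System Downtime applied to a 365-day year (8760 hours). A short Python sketch that recomputes them; the helper name `downtime_per_year_hours` is illustrative only:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours in a 365-day year, as in the table


def downtime_per_year_hours(availability: float) -> float:
    """Expected downtime in hours per year for an availability between 0 and 1."""
    return (1.0 - availability) * HOURS_PER_YEAR


for nines in range(1, 6):
    availability = 1.0 - 10.0 ** -nines  # e.g. 3 nines -> 0.999
    hours = downtime_per_year_hours(availability)
    print(f"{availability:.3%}  {nines}  {hours / 24:8.2f} days  "
          f"{hours:8.2f} hours  {hours * 60:9.1f} minutes")
```

Each extra nine divides the yearly downtime by ten, which is why going from 99.99% to 99.999% only saves about 47 minutes per year in absolute terms.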

System Downtime

Many events cause system downtime:

  • HW fault
  • Software fault
  • Vandalism
  • Extreme conditions (fire, flooding, etc.)
  • Power outage
  • IP network failure
  • Planned system maintenance



System Downtime = ∑ P × S × MTTR, summed over all downtime events

P = Probability of the event taking place
S = Severity of the event
  = Percentage of service affected by the fault
MTTR = Mean Time To Repair
  = mean time to detect the fault + mean time to fix the fault
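A minimal Python sketch of the downtime sum, under stated assumptions: P is read as the expected number of occurrences per year, S as a fraction between 0 and 1, and MTTR in hours, so the sum gives equivalent full-outage hours per year; dividing by 8760 hours turns it into the fraction used in System Availability = 1 – System Downtime. The class and function names and all event figures are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable

HOURS_PER_YEAR = 365 * 24


@dataclass
class DowntimeEvent:
    name: str
    p: float             # assumed: expected occurrences per year
    s: float             # severity: fraction of service affected (0.0 to 1.0)
    detect_hours: float  # mean time to detect the fault
    fix_hours: float     # mean time to fix the fault

    @property
    def mttr_hours(self) -> float:
        # MTTR = mean time to detect + mean time to fix
        return self.detect_hours + self.fix_hours


def system_downtime_hours(events: Iterable[DowntimeEvent]) -> float:
    """Sum of P * S * MTTR over all events: equivalent full-outage hours per year."""
    return sum(e.p * e.s * e.mttr_hours for e in events)


def system_availability(events: Iterable[DowntimeEvent]) -> float:
    """System Availability = 1 - System Downtime, downtime as a fraction of the year."""
    return 1.0 - system_downtime_hours(events) / HOURS_PER_YEAR


# Hypothetical figures for illustration only; not measured data.
events = [
    DowntimeEvent("HW fault, one station", p=2.0, s=0.01, detect_hours=0.5, fix_hours=7.5),
    DowntimeEvent("Power outage, no UPS", p=0.5, s=1.0, detect_hours=0.0, fix_hours=2.0),
    DowntimeEvent("Planned maintenance", p=1.0, s=1.0, detect_hours=0.0, fix_hours=0.5),
]
print(f"Downtime: {system_downtime_hours(events):.2f} hours/year")
print(f"Availability: {system_availability(events):.4%}")
```

With these made-up figures the total is 1.66 hours per year, roughly 99.98% availability. The rare but total power outage contributes far more than the more frequent single-station faults, because S weights each event by how much of the service it takes down.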

HW failure


MTBF

  • The probability of HW faults is calculated from MTBF figures
  • MTBF ≠ System Availability
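Why MTBF is not the same as availability can be shown with two standard relations, assuming a constant failure rate (exponential model): the probability of at least one failure within a period t is 1 − e^(−t/MTBF), and steady-state availability is MTBF / (MTBF + MTTR). The sketch uses a hypothetical MTBF; the same hardware gives different availability depending on how quickly faults are detected and fixed.

```python
import math


def failure_probability(mtbf_hours: float, period_hours: float) -> float:
    """P(at least one failure within the period), assuming a constant failure rate 1/MTBF."""
    return 1.0 - math.exp(-period_hours / mtbf_hours)


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the unit is working: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


mtbf = 100_000.0  # hypothetical unit MTBF in hours
print(f"P(failure within one year): {failure_probability(mtbf, 8760):.1%}")
for mttr in (4.0, 72.0):  # fast vs slow detection and repair
    print(f"MTTR {mttr:5.1f} h -> availability {steady_state_availability(mtbf, mttr):.4%}")
```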


MTBF calculations

  • Empirical method
  • MIL-HDBK-217
  • Telcordia


Empirical methods

  • Based on statistics from the field


MIL-HDBK-217 and Telcordia

  • All components are entered in a database with set environmental conditions
  • Usually provide a lower MTBF figure than empirical methods
    • Do not include real usage conditions
    • Use worst-case environmental conditions


  • More components give lower MTBF
  • MTBF and single points of failure
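The point that more components lower the MTBF follows from the simplified series model behind parts-count predictions: if any single component failure fails the unit, and failure rates are constant and independent, the unit failure rate is the sum of the component rates and MTBF is its reciprocal. A sketch with hypothetical rates in FIT (failures per 10^9 hours), not values taken from MIL-HDBK-217 or Telcordia:

```python
def series_mtbf(failure_rates_per_hour):
    """MTBF of a unit that fails if any one component fails (series model).

    Assumes constant, independent component failure rates:
    lambda_total = sum(lambda_i), MTBF = 1 / lambda_total.
    """
    total_rate = sum(failure_rates_per_hour)
    return 1.0 / total_rate


FIT = 1e-9  # 1 FIT = one failure per 10^9 device-hours

# Hypothetical boards: 100 components vs 200 components, 50 FIT each.
board = [50 * FIT] * 100
bigger_board = board + [50 * FIT] * 100

print(f"{series_mtbf(board):,.0f} hours")         # about 200,000 hours
print(f"{series_mtbf(bigger_board):,.0f} hours")  # about 100,000 hours
```

Doubling the component count doubles the failure rate and halves the MTBF, which is also why redundant designs, with their extra hardware, show lower MTBF figures.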

Other failures

Software fault

  • Automatic watchdog functions
  • Automatic recovery functions
  • Maturity of system
  • Structured software design and test
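A software watchdog can be pictured as a deadline that the supervised process must keep renewing. The Python sketch below is a generic illustration, not the AlphaCom implementation; the class name, the `kick`/`check` methods and the recovery callback are hypothetical. Timestamps are passed in explicitly to keep the example deterministic; a real supervisor would read a monotonic clock such as `time.monotonic()`.

```python
from typing import Callable


class ProcessWatchdog:
    """Minimal software-watchdog sketch (hypothetical names).

    The supervised process must call kick() at least every `timeout` seconds.
    If check() finds the deadline has passed, it runs the recovery action
    (for example restarting the process) and re-arms the watchdog.
    """

    def __init__(self, timeout: float, recover: Callable[[], None], now: float):
        self.timeout = timeout
        self.recover = recover
        self.last_kick = now
        self.recoveries = 0

    def kick(self, now: float) -> None:
        # Heartbeat from the supervised process.
        self.last_kick = now

    def check(self, now: float) -> bool:
        # Called periodically by the supervisor; True if recovery was triggered.
        if now - self.last_kick > self.timeout:
            self.recover()
            self.recoveries += 1
            self.last_kick = now  # re-arm after recovery
            return True
        return False


restarts = []
wd = ProcessWatchdog(timeout=5.0, recover=lambda: restarts.append("restart"), now=0.0)
wd.kick(now=3.0)          # heartbeat at t = 3 s
print(wd.check(now=7.0))  # False: 4 s since the last heartbeat
print(wd.check(now=9.0))  # True: 6 s since the last heartbeat, recovery runs
print(restarts)           # ['restart']
```

Automatic recovery shortens the "time to detect" and "time to fix" terms of MTTR for hung software, which is where most of its availability benefit comes from.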


Vandalism and Extreme conditions

  • Robustness to vandalism and extreme conditions
  • IP (ingress protection) and IK (impact protection) class
  • IP network security functions to hinder denial-of-service (DoS) attacks


Power outage

  • UPS and redundant power supplies


IP network failure

  • Network service level
  • Redundant and switchover functions


Planned system maintenance

  • Expansion, adding users, etc.
  • Ability to do maintenance without service interruptions

Redundancy

Redundancy is about parallelism and removing single points of failure

Redundancy usually gives lower MTBF figures

  • Requires more components


Redundancy usually provides significantly higher service availability

  • A single failure shall have minimal or no impact on service availability
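The trade-off between equipment MTBF and service availability can be quantified for a redundant pair. Assuming independent failures, instantaneous switchover, and repair of a failed unit while its partner carries the service, the service is down only when all units are down at once, giving 1 − (1 − A)^n for n units, while some unit fails n times as often. The figures below are hypothetical.

```python
def parallel_availability(unit_availability: float, n: int = 2) -> float:
    """Service is up while at least one of n independent redundant units is up."""
    return 1.0 - (1.0 - unit_availability) ** n


def equipment_mtbf(unit_mtbf_hours: float, n: int = 2) -> float:
    """Mean time between failures of any one of n identical units."""
    return unit_mtbf_hours / n


mtbf, mttr = 50_000.0, 8.0  # hypothetical figures for one control unit
a_single = mtbf / (mtbf + mttr)
print(f"Single unit:    MTBF {mtbf:9,.0f} h, availability {a_single:.6%}")
print(f"Redundant pair: MTBF {equipment_mtbf(mtbf):9,.0f} h, "
      f"availability {parallel_availability(a_single):.10%}")
```

The pair halves the equipment MTBF (more repair visits) while reducing the service unavailability from roughly 1.6 × 10^-4 to roughly 2.6 × 10^-8, which is the sense in which redundancy lowers MTBF yet raises service availability.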

STENTOFON System Availability

Redundancy

  • Control room redundancy and parallel call handling
  • Power supply redundancy
  • Alternative AlphaNet routing
  • Control card redundancy
  • ….


Reduced MTTR

  • AlphaNet supervision
  • Station supervision and tone test
  • Network monitoring (SNMP, Syslog, OPC)
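Network monitoring shortens the "time to detect" part of MTTR. As a generic illustration, not the AlphaCom message format, a syslog receiver can decide which messages should raise an alarm from the PRI field defined in RFC 5424, where PRI = facility × 8 + severity and lower severity codes are more serious. The function names and sample messages are hypothetical.

```python
import re
from typing import Optional, Tuple

# RFC 5424 severity names, indexed by severity code 0..7
SEVERITIES = ["Emergency", "Alert", "Critical", "Error",
              "Warning", "Notice", "Informational", "Debug"]
PRI_RE = re.compile(r"^<(\d{1,3})>")


def decode_pri(message: str) -> Optional[Tuple[int, int]]:
    """Return (facility, severity) from a message's <PRI> prefix, or None if absent/invalid."""
    match = PRI_RE.match(message)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:  # facility 0..23 and severity 0..7 give PRI 0..191
        return None
    return divmod(pri, 8)  # PRI = facility * 8 + severity


def needs_alarm(message: str, threshold: int = 3) -> bool:
    """True for severity Error (3) or more serious (lower codes are more severe)."""
    decoded = decode_pri(message)
    return decoded is not None and decoded[1] <= threshold


# Hypothetical messages; the text after the header is illustrative only.
for msg in ("<11>1 2009-08-04T14:20:00Z host app - - - station not responding",
            "<14>1 2009-08-04T14:21:00Z host app - - - call started"):
    facility, severity = decode_pri(msg)
    print(facility, SEVERITIES[severity], needs_alarm(msg))
```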


Software failures and recovery

  • HW watchdog
  • SW process watchdog
  • Automatic recovery


System maintenance

  • Centralized and remote firmware upgrade
  • Hot insert and removal of cards
  • Control card redundancy

Some MTBF figures

Zenitel USA has kept statistics on failures and the reasons for them

Zenitel USA actively encourages repairs, and it is estimated that 95% of failures are reported, including failures of equipment installed before the AlphaCom was introduced

The following figures are based on the Zenitel USA statistics; for new equipment, a comparison is made with figures from known equipment

The data used are the sales figures from Zenitel USA and the fault reports over an 8-year period
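One simple way to turn sales figures and fault reports into an empirical MTBF is to divide the accumulated operating hours by the estimated number of failures, scaling reported failures up by the reporting rate (95% here). The sketch below is an illustration under loud assumptions, not Zenitel's actual method: it assumes units run continuously from the middle of their sale year, ignores equipment installed before the observation window, and uses made-up numbers; the function name is hypothetical.

```python
from typing import Sequence

HOURS_PER_YEAR = 365 * 24


def empirical_mtbf(units_sold_per_year: Sequence[int], reported_failures: int,
                   reporting_rate: float = 0.95) -> float:
    """Field-data MTBF estimate: accumulated unit-hours / estimated failures.

    Assumptions (illustrative only): units run continuously from the middle of
    the year they were sold until the end of the observation window, units
    installed before the window are ignored, and reported failures are scaled
    up by the reporting rate to estimate all failures.
    """
    years = len(units_sold_per_year)
    unit_hours = sum(
        units * (years - i - 0.5) * HOURS_PER_YEAR  # mid-year installation
        for i, units in enumerate(units_sold_per_year)
    )
    estimated_failures = reported_failures / reporting_rate
    return unit_hours / estimated_failures


# Made-up example: 1000 units sold per year over 8 years, 380 reported failures.
print(f"{empirical_mtbf([1000] * 8, 380):,.0f} hours")
```

In this made-up example, 32,000 unit-years (280,320,000 unit-hours) divided by an estimated 400 failures gives about 700,800 hours.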