System availability

From Zenitel Wiki

System Availability

  • The percentage of time that the system can perform its intended function


System Availability = 1 – System Downtime


Downtime per year

Availability   Nines   Downtime
90%            1       36.5 days/year
99%            2       3.65 days/year
99.9%          3       8.76 hours/year
99.99%         4       52.6 minutes/year
99.999%        5       5.3 minutes/year
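
The downtime figures follow directly from System Availability = 1 – System Downtime. As a minimal sketch, assuming a 365-day year (8760 hours) and availability given as a fraction, they can be reproduced as follows. The function name is illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, leap years ignored (assumption)


def downtime_hours_per_year(availability: float) -> float:
    """Return yearly downtime in hours for an availability given as a fraction (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


# Print the downtime for one to five nines.
for nines, availability in enumerate([0.9, 0.99, 0.999, 0.9999, 0.99999], start=1):
    hours = downtime_hours_per_year(availability)
    print(f"{nines} nines: {hours:.2f} hours/year ({hours * 60:.1f} minutes/year)")
```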

System Downtime

Many events cause system downtime:

  • HW fault
  • Software fault
  • Vandalism
  • Extreme conditions (fire, flooding, etc.)
  • Power outage
  • IP network failure
  • Planned system maintenance



System Downtime = ∑ P * S * MTTR

P = Probability of the event taking place
S = Severity of event
  = Percentage of service affected by fault
MTTR = Mean Time To Repair
  = mean time to detect fault + mean time to fix fault
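
The sum above can be sketched in code. This sketch assumes P is read as the expected number of occurrences per year, so that the result is in hours of downtime per year. All event figures below are invented for illustration and are not Zenitel data.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


@dataclass
class FailureEvent:
    name: str
    events_per_year: float  # P, interpreted as expected occurrences per year (assumption)
    severity: float         # S, fraction of service affected (0..1)
    mttr_hours: float       # MTTR = time to detect + time to fix, in hours


def expected_downtime_hours(events):
    """System Downtime = sum of P * S * MTTR over all events."""
    return sum(e.events_per_year * e.severity * e.mttr_hours for e in events)


def availability(events):
    """System Availability = 1 - System Downtime, with downtime as a fraction of the year."""
    return 1.0 - expected_downtime_hours(events) / HOURS_PER_YEAR


# Hypothetical example figures, for illustration only.
events = [
    FailureEvent("HW fault", events_per_year=0.1, severity=1.0, mttr_hours=8.0),
    FailureEvent("Power outage", events_per_year=0.5, severity=1.0, mttr_hours=2.0),
    FailureEvent("Planned maintenance", events_per_year=2.0, severity=0.25, mttr_hours=1.0),
]

print(f"Expected downtime: {expected_downtime_hours(events):.2f} hours/year")
print(f"Availability: {availability(events):.5%}")
```

Lowering any one factor (fewer events, smaller affected share of the service, or faster detection and repair) reduces that event's contribution to the total.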

HW failure


MTBF

  • Probability of HW faults calculated using MTBF figures
  • MTBF ≠ System Availability
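
Why MTBF alone does not determine availability can be illustrated with the commonly used steady-state relation Availability = MTBF / (MTBF + MTTR). This relation is not stated on this page and is brought in here as an assumption. The figures are hypothetical.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from MTBF and MTTR (textbook relation, assumed here)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Same hypothetical MTBF, two different repair times.
mtbf = 100_000.0
for mttr in (4.0, 72.0):
    a = steady_state_availability(mtbf, mttr)
    print(f"MTBF {mtbf:.0f} h, MTTR {mttr:.0f} h: availability {a:.5%}")
```

Two systems with identical MTBF figures end up with different availability when their repair times differ.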


MTBF calculations

  • Empirical method
  • MIL-HDBK-217
  • Telcordia


Empirical methods

  • Based on statistics from the field


MIL-HDBK-217 and Telcordia

  • All components are entered in a database with set environmental conditions
  • Usually provides a lower MTBF figure than empirical methods
    • Does not include real usage conditions
    • Uses worst-case environmental conditions


  • More components give a lower MTBF
  • MTBF and single points of failure
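
How the component count affects a calculated MTBF can be sketched as follows. The sketch assumes constant failure rates and a series system in which every component is a single point of failure, so the failure rates (1/MTBF) add up. The component values are hypothetical.

```python
def series_mtbf(component_mtbfs_hours):
    """MTBF of a system that fails as soon as any one component fails.

    Assumes constant, independent failure rates: the system failure rate
    is the sum of the component failure rates (1 / MTBF each).
    """
    if not component_mtbfs_hours:
        raise ValueError("at least one component is required")
    failure_rate = sum(1.0 / m for m in component_mtbfs_hours)
    return 1.0 / failure_rate


# Hypothetical components with 200,000 hours MTBF each.
for count in (1, 2, 4):
    print(f"{count} component(s): system MTBF {series_mtbf([200_000.0] * count):.0f} hours")
```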

Other failures

Software fault

  • Automatic watchdog functions
  • Automatic recovery functions
  • Maturity of system
  • Structured software design and test


Vandalism and Extreme conditions

  • Robustness to vandalism and extreme conditions
  • IP and IK class
  • IP security functions to hinder denial-of-service (DoS) attacks


Power outage

  • UPS and redundant power supplies


IP network failure

  • Network service level
  • Redundancy and switchover functions


Planned system maintenance

  • Expansion, adding users, etc.
  • Ability to do maintenance without service interruptions

Redundancy

Redundancy is about parallelism and removing single points of failure

Redundancy usually gives lower MTBF figures

  • Requires more components


Redundancy usually provides significantly higher service availability

  • A single failure should have minimal or no impact on service availability
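
The availability gain from redundancy can be sketched with a simple parallel model. It assumes that the units fail independently and that the service stays up as long as at least one unit works. Real installations also depend on switchover behaviour, which is not modelled here. The unit availability is a hypothetical figure.

```python
def parallel_availability(unit_availability: float, units: int) -> float:
    """Availability of `units` redundant units where one working unit is enough.

    Assumes independent failures: the service is down only when all units are down.
    """
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("unit_availability must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    return 1.0 - (1.0 - unit_availability) ** units


# Hypothetical unit with 99% availability, single versus duplicated.
for units in (1, 2):
    print(f"{units} unit(s): {parallel_availability(0.99, units):.4%}")
```

Under these assumptions, duplicating a 99% unit raises service availability to 99.99%, even though the duplicated system contains more components and therefore has a lower calculated MTBF.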

STENTOFON System Availability

Redundancy

  • Control room redundancy and parallel call handling
  • Power supply redundancy
  • Alternative AlphaNet routing
  • Control card AMC-IP redundancy


Reduced MTTR

  • AlphaNet supervision
  • Station supervision and tone test
  • Network monitoring (SNMP, Syslog, OPC)


Software failures and recovery

  • HW watchdog
  • SW process watchdog
  • Automatic recovery


System maintenance

  • Centralized and remote firmware upgrade
  • Hot insert and removal of cards
  • Control card AMC-IP redundancy

MTBF figures

MTBF (Mean Time Between Failures) figures try to give a measure of the reliability of a system. There are two ways in which MTBF figures can be obtained:

  • Calculation. There are a number of different standards for the calculation of MTBF figures, of which MIL-HDBK-217 is probably the most widely used.
  • Statistical information. MTBF figures can also be calculated from data gathered about failures in real installations. For each electronic unit, the number of hours in operation is reported at the moment of failure. When this is done over a sufficiently large quantity of the same units, a realistic ‘real life’ MTBF can be calculated. The MTBF of newly designed items can still be estimated this way by comparing them to existing items of similar complexity.

Zenitel mainly uses statistical information, in line with common practice.
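
A statistical MTBF estimate of this kind can be sketched as the total accumulated operating hours divided by the number of observed failures. This is a common field estimate and an assumption here, not a documented Zenitel procedure. The fleet figures are invented for illustration.

```python
def field_mtbf(total_operating_hours: float, failures: int) -> float:
    """Estimate MTBF from field data: accumulated operating hours / observed failures."""
    if failures <= 0:
        raise ValueError("at least one observed failure is needed for an estimate")
    return total_operating_hours / failures


# Hypothetical fleet: 500 units running continuously for 2 years, 20 failures reported.
units = 500
hours_per_unit = 2 * 365 * 24
estimate = field_mtbf(units * hours_per_unit, failures=20)
print(f"Estimated field MTBF: {estimate:.0f} hours")
```

The estimate only becomes meaningful with a sufficiently large number of units and failures, as noted above.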

List of MTBF figures