System availability

From Zenitel Wiki

Revision as of 14:15, 4 August 2009 by Hege2 (talk) (Other failures)

System Availability

  • The percentage of time that the system can perform its intended function


System Availability = 1 – System Downtime (downtime expressed as a fraction of total time)


Downtime per year

Availability   Nines   Downtime
90%            1       36.5 days/year
99%            2       3.65 days/year
99.9%          3       8.76 hours/year
99.99%         4       52.6 minutes/year
99.999%        5       5.3 minutes/year
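
The figures in the table follow directly from the availability definition. A minimal sketch, assuming a 365-day year (8760 hours); the function name is illustrative only:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, leap years ignored (assumption)


def downtime_hours_per_year(availability: float) -> float:
    """Yearly downtime in hours for an availability given as a fraction (0..1)."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Print one row per "nines" level, in hours and in minutes.
for nines, availability in enumerate([0.9, 0.99, 0.999, 0.9999, 0.99999], start=1):
    hours = downtime_hours_per_year(availability)
    print(f"{availability:.3%}  nines={nines}  {hours:.2f} h/year  {hours * 60:.1f} min/year")
```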

System Downtime

Many events cause system downtime:

  • HW fault
  • Software fault
  • Vandalism
  • Extreme conditions (fire, flooding, etc.)
  • Power outage
  • IP network failure
  • Planned system maintenance



System Downtime = ∑ (P * S * MTTR), summed over all events

P = Probability of the event taking place
S = Severity of the event
  = Percentage of service affected by the fault
MTTR = Mean Time To Repair
  = mean time to detect the fault + mean time to fix the fault
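
The sum can be sketched in a few lines of code. This is an illustration under stated assumptions, not a reference calculation: P is read here as the expected number of occurrences per year so that the result comes out in hours per year, and all event figures below are made up for the example.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24  # assumption: 365-day year


@dataclass
class DowntimeEvent:
    name: str
    events_per_year: float  # P, interpreted as expected occurrences per year (assumption)
    severity: float         # S, fraction of service affected (0..1)
    mttr_hours: float       # MTTR = time to detect + time to fix, in hours


def system_downtime_hours(events) -> float:
    """Sum P * S * MTTR over all events; result in hours per year."""
    return sum(e.events_per_year * e.severity * e.mttr_hours for e in events)


# Hypothetical figures, for illustration only.
events = [
    DowntimeEvent("HW fault", 0.2, 1.0, 8.0),
    DowntimeEvent("Power outage", 1.0, 0.5, 2.0),
    DowntimeEvent("Planned system maintenance", 2.0, 0.25, 1.0),
]

downtime = system_downtime_hours(events)
availability = 1.0 - downtime / HOURS_PER_YEAR
print(f"downtime: {downtime:.2f} h/year, availability: {availability:.5%}")
```

Weighting by S means that an event which only affects part of the service counts for less than a complete outage of the same duration.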

HW failure


MTBF

  • The probability of HW faults is calculated using MTBF figures
  • MTBF ≠ System Availability
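
To see why MTBF alone does not determine availability, the commonly used steady-state relation Availability = MTBF / (MTBF + MTTR) can be evaluated for one MTBF and two repair times. This is a sketch; the figures are hypothetical and the relation assumes a single repairable unit.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one repairable unit: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Same MTBF, different repair times (hypothetical figures).
fast_repair = steady_state_availability(100_000, 4)    # fault fixed within 4 hours
slow_repair = steady_state_availability(100_000, 72)   # fault fixed within 3 days
print(f"MTTR 4 h:  {fast_repair:.5%}")
print(f"MTTR 72 h: {slow_repair:.5%}")
```

The same hardware gives a noticeably different availability depending on how quickly a fault is detected and fixed.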


MTBF calculations

  • Empirical method
  • MIL-HDBK-217
  • Telcordia


Empirical methods

  • Based on statistics from the field


MIL-HDBK-217 and Telcordia

  • All components are entered in a database with set environmental conditions
  • Usually gives lower MTBF figures than empirical methods
    • Does not include real usage conditions
    • Uses worst case environmental conditions


  • More components give lower MTBF
  • MTBF and single points of failure
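
The effect of component count on MTBF can be illustrated with the usual series model: if every component is a single point of failure and failure rates are constant, the failure rates add, so the system MTBF drops as components are added. A minimal sketch with hypothetical component figures:

```python
def series_mtbf(component_mtbfs_hours) -> float:
    """MTBF of a system that fails when any one component fails.

    Assumes constant, independent failure rates, so the rates (1 / MTBF) add up.
    """
    total_failure_rate = sum(1.0 / mtbf for mtbf in component_mtbfs_hours)
    return 1.0 / total_failure_rate


# Hypothetical components: adding a third part lowers the system MTBF.
two_parts = series_mtbf([200_000, 100_000])
three_parts = series_mtbf([200_000, 100_000, 50_000])
print(f"2 components: {two_parts:.0f} h")
print(f"3 components: {three_parts:.0f} h")
```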

Other failures

Software fault

  • Automatic watchdog functions
  • Automatic recovery functions
  • Maturity of system
  • Structured software design and test


Vandalism and Extreme conditions

  • Robustness to vandalism and extreme conditions
  • IP (ingress protection) and IK (impact protection) class
  • IP network security functions to hinder denial-of-service (DoS) attacks


Power outage

  • UPS and redundant power supplies


IP network failure

  • Network service level
  • Redundancy and switchover functions


Planned system maintenance

  • Expansion, adding users, etc.
  • Ability to do maintenance without service interruptions

Redundancy

Redundancy is about parallelism and removing single points of failure

Redundancy usually gives lower MTBF figures

  • Requires more components


Redundancy usually provides significantly higher service availability

  • A single failure shall have minimal or no impact on service availability
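
Both effects of redundancy can be shown with a small calculation. This is a sketch assuming identical units, independent failures with constant failure rates, and perfect switchover; the figures are hypothetical.

```python
def parallel_availability(unit_availability: float, units: int) -> float:
    """Service is up as long as at least one of the redundant units is up."""
    return 1.0 - (1.0 - unit_availability) ** units


def mean_time_to_first_failure(unit_mtbf_hours: float, units: int) -> float:
    """Mean time until any one of the units fails (the failure rates add)."""
    return unit_mtbf_hours / units


single = parallel_availability(0.99, 1)
duplicated = parallel_availability(0.99, 2)
print(f"availability, 1 unit:  {single:.4%}")
print(f"availability, 2 units: {duplicated:.4%}")
print(f"hours to first HW failure, 1 unit:  {mean_time_to_first_failure(100_000, 1):.0f}")
print(f"hours to first HW failure, 2 units: {mean_time_to_first_failure(100_000, 2):.0f}")
```

With two units, hardware faults occur about twice as often, but the service is only lost when both units are down at the same time.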