Difference between revisions of "System availability"

Revision as of 14:20, 4 August 2009

System availability

System Availability

The percentage of time that the system can perform its intended function

System Availability = 1 – System Downtime (with downtime expressed as a fraction of total time)

Downtime per year

Availability	Nines	Downtime
90%	1	36.5	days/year
99%	2	3.65	days/year
99.9%	3	8.76	hours/year
99.99%	4	52.6	minutes/year
99.999%	5	5.3	minutes/year
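
The figures in the table follow directly from the availability definition. A minimal sketch of the arithmetic, assuming a 365-day year (the function and constant names are illustrative only):

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, leap years ignored (assumption)


def downtime_hours_per_year(availability: float) -> float:
    """Expected downtime in hours per year for an availability between 0 and 1."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    levels = [0.9, 0.99, 0.999, 0.9999, 0.99999]
    for nines, level in enumerate(levels, start=1):
        hours = downtime_hours_per_year(level)
        # Print the same quantity in days, hours and minutes for easy comparison.
        print(f"{nines} nines: {hours / 24:.2f} days, "
              f"{hours:.2f} hours, {hours * 60:.1f} minutes per year")
```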

System Downtime

Many events cause system downtime:

HW fault
Software fault
Vandalism
Extreme conditions (fire, flooding, etc.)
Power outage
IP network failure
Planned system maintenance

System Downtime = ∑ P * S * MTTR

P	= Probability of the event taking place
S	= Severity of event
	= Percentage of service affected by fault
MTTR	= Mean Time To Repair
	= mean time to detect fault + mean time to fix fault
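
The sum above can be evaluated per event type. The sketch below is one possible reading, assuming P is taken as the expected number of occurrences per year and MTTR is given in hours; all event figures are hypothetical and serve only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class FaultEvent:
    name: str
    events_per_year: float  # P, read here as expected occurrences per year (assumption)
    severity: float         # S, fraction of the service affected (0 to 1)
    mttr_hours: float       # MTTR, mean time to detect plus mean time to fix


def system_downtime_hours(events):
    """Sum P * S * MTTR over all events, giving expected downtime in hours per year."""
    return sum(e.events_per_year * e.severity * e.mttr_hours for e in events)


def availability(events, hours_per_year=8760.0):
    """System availability = 1 - downtime as a fraction of the year."""
    return 1.0 - system_downtime_hours(events) / hours_per_year


# Hypothetical input data, not taken from any field statistics.
EXAMPLE_EVENTS = [
    FaultEvent("HW fault", events_per_year=0.2, severity=1.0, mttr_hours=8.0),
    FaultEvent("Power outage", events_per_year=1.0, severity=1.0, mttr_hours=0.5),
    FaultEvent("Planned system maintenance", events_per_year=2.0, severity=0.25, mttr_hours=1.0),
]

if __name__ == "__main__":
    print(f"Downtime: {system_downtime_hours(EXAMPLE_EVENTS):.2f} hours/year")
    print(f"Availability: {availability(EXAMPLE_EVENTS):.5%}")
```

Splitting the inputs this way makes it visible which term dominates: a rare but slow-to-repair fault can cost more downtime than a frequent one that is detected and fixed quickly.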

HW failure

MTBF

The probability of HW faults is calculated using MTBF figures
MTBF ≠ System Availability
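
One way to see why MTBF alone does not determine availability is the standard steady-state relation Availability = MTBF / (MTBF + MTTR). The sketch below applies it to a single repairable unit; the MTBF and MTTR values are hypothetical.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one repairable unit: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must not be negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    mtbf = 50_000.0  # hypothetical MTBF in hours
    # Same MTBF, different repair times: availability differs.
    for mttr in (4.0, 72.0):
        print(f"MTBF {mtbf:.0f} h, MTTR {mttr:.0f} h -> "
              f"{steady_state_availability(mtbf, mttr):.5%}")
```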

MTBF calculations

Empirical method
MIL-HDBK-217
Telcordia

Empirical methods

Based on statistics from the field

MIL-HDBK-217 and Telcordia

All components are entered in a database with set environmental conditions
Usually provides a lower MTBF figure than empirical methods
- Does not include real usage conditions
- Uses worst-case environmental conditions

More components give a lower MTBF
MTBF and single points of failure
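
The effect of component count on MTBF can be sketched under the common simplifying assumption of constant failure rates, where any single component failure counts as a system failure. Failure rates then add, so the combined MTBF falls as components are added. The component values below are hypothetical.

```python
def series_mtbf(component_mtbfs_hours):
    """Combined MTBF when any one component failure counts as a failure.

    Assumes constant, independent failure rates, so the rates (1 / MTBF) add up.
    """
    if not component_mtbfs_hours:
        raise ValueError("at least one component is required")
    total_failure_rate = sum(1.0 / mtbf for mtbf in component_mtbfs_hours)
    return 1.0 / total_failure_rate


if __name__ == "__main__":
    one_card = [100_000.0]
    two_cards = [100_000.0, 100_000.0]
    print(f"One component:  {series_mtbf(one_card):.0f} hours")
    print(f"Two components: {series_mtbf(two_cards):.0f} hours")
```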

Other failures

Software fault

Automatic watch dog functions
Automatic recovery functions
Maturity of system
Structured software design and test

Vandalism and Extreme conditions

Robustness to vandalism and extreme conditions
IP and IK class
IP security functions to hinder denial-of-service (DoS) attacks

Power outage

UPS and redundant power supplies

IP network failure

Network service level
Redundant and switchover functions

Planned system maintenance

Expansion, adding users, etc.
Ability to do maintenance without service interruptions

Redundancy

Redundancy is about parallelism and removing single points of failure

Redundancy usually gives lower MTBF figures

Requires more components

Redundancy usually provides significantly higher service availability

A single failure shall have minimum or no impact on service availability
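
The two statements about redundancy can be illustrated with a small calculation. It is a sketch only, assuming identical units that fail independently, constant failure rates, and a service that stays up as long as at least one unit works; the unit figures are hypothetical.

```python
def parallel_availability(unit_availability: float, units: int) -> float:
    """Availability of a group where one working unit is enough (independent failures)."""
    if units < 1:
        raise ValueError("at least one unit is required")
    return 1.0 - (1.0 - unit_availability) ** units


def mtbf_to_first_failure(unit_mtbf_hours: float, units: int) -> float:
    """Mean time until any one of the units fails (constant failure rates assumed)."""
    if units < 1:
        raise ValueError("at least one unit is required")
    return unit_mtbf_hours / units


if __name__ == "__main__":
    unit_availability = 0.99   # hypothetical availability of a single unit
    unit_mtbf = 100_000.0      # hypothetical MTBF of a single unit, in hours
    for n in (1, 2):
        print(f"{n} unit(s): hardware MTBF {mtbf_to_first_failure(unit_mtbf, n):.0f} h, "
              f"service availability {parallel_availability(unit_availability, n):.4%}")
```

With two units, hardware faults are reported twice as often, yet the service is unavailable only when both units are down at the same time.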

STENTOFON System Availability

Redundancy

Control room redundancy and parallel call handling
Power supply redundancy
Alternative AlphaNet routing
Control card redundancy
….

Reduced MTTR

AlphaNet supervision
Station supervision and tone test
Network monitoring (SNMP, Syslog, OPC)

Software failures and recovery

HW watchdog
SW process watchdog
Automatic recovery

System maintenance

Centralized and remote firmware upgrade
Hot insert and removal of cards
Control card redundancy

Some MTBF figures

Zenitel USA has been keeping statistics about failures and the reasons for failures

Zenitel USA actively encourages repairs, and the estimate is that 95% of failures are reported, even failures of equipment installed before the AlphaCom was introduced

The following figures are based on the Zenitel USA statistics; for new equipment a comparison is made to figures from known equipment

The data used are the sales figures from Zenitel USA and the fault reports over an 8-year period

@@ Line 108: / Line 108: @@
 Redundancy usually provides significantly higher service availability
 *A single failure shall have minimum or no impact on service availability
+==STENTOFON System Availability==
+'''Redundancy'''
+*Control room redundancy and parallel call handling
+*Power supply redundancy
+*Alternative AlphaNet routing
+*Control card redundancy
+*….
+<br>
+'''Reduced MTTR'''
+*AlphaNet supervision
+*Station supervision and tone test
+*Network monitoring (SNMP, Syslog, OPC)
+<br>
+'''Software failures and recovery'''
+*HW watchdog
+*SW process watchdog
+*Automatic recovery
+<br>
+'''System maintenance'''
+*Centralized and remote firmware upgrade
+*Hot insert and removal of cards
+*Control card redundancy
+==Some MTBF figures==
+Zenitel USA has been keeping statistics about failures and the reasons for failures
+<br>
+<br>
+Zenitel USA actively encourages repairs, and the estimate is that 95% of failures are reported, even failures of equipment installed before the AlphaCom was introduced
+<br>
+<br>
+The following figures are based on the Zenitel USA statistics; for new equipment a comparison is made to figures from known equipment
+<br>
+<br>
+The data used are the sales figures from Zenitel USA and the fault reports over an 8-year period
 [[Category:AlphaCom E System]]