Tuesday, October 14, 2014

High Availability - Overview & General Concepts


High availability is a characteristic of a system. The definition of availability is
Ao = up time / total time.
This equation is not practically useful, but if (total time - down time) is substituted for up time then you have
Ao = (total time - down time) / total time.
Determining tolerable down time is practical. From that, the required availability may be easily calculated.
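As a minimal sketch of that calculation, the snippet below turns a tolerable amount of down time into a required availability using the second form of the equation. The function name, the 4-hour figure, and the 365-day year are illustrative assumptions, not values from any particular agreement.

```python
def required_availability(tolerable_downtime_hours: float, total_hours: float) -> float:
    """Ao = (total time - down time) / total time."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= tolerable_downtime_hours <= total_hours:
        raise ValueError("down time must be between 0 and total_hours")
    return (total_hours - tolerable_downtime_hours) / total_hours


# Assumption: a non-leap year of continuous operation.
HOURS_PER_YEAR = 365 * 24

# Hypothetical requirement: at most 4 hours of down time per year.
ao = required_availability(4, HOURS_PER_YEAR)
print(f"Required availability: {ao:.4%}")
```

Working from down time rather than up time keeps the input to something a business can actually state, such as "no more than four hours a year".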

High availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.

There are three principles of high availability engineering. They are

  1. Elimination of single points of failure. This means adding redundancy to the system so that failure of a component does not mean failure of the entire system.
  2. Reliable crossover. In redundant systems, the crossover point itself tends to become a single point of failure. High availability engineering must provide for reliable crossover.
  3. Detection of failures as they occur. If the two principles above are observed, then a user may never see a failure. But the maintenance activity must.
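
The three principles above can be illustrated with a small simulation. This is only a sketch under stated assumptions: `FailoverPool`, `pick_healthy`, and the two lambda health checks are hypothetical names invented for the example, and real systems would probe components over the network rather than call a local function.

```python
from typing import Callable, Dict, List, Optional


class FailoverPool:
    """Hypothetical illustration of the three principles; not a production design."""

    def __init__(self, health_checks: Dict[str, Callable[[], bool]]):
        # Principle 1: more than one component is able to serve requests.
        self.health_checks = health_checks
        # Principle 3: failures are recorded so maintenance staff can see them.
        self.failure_log: List[str] = []

    def pick_healthy(self) -> Optional[str]:
        # Principle 2: cross over to the first component that passes its check.
        for name, is_healthy in self.health_checks.items():
            try:
                ok = is_healthy()
            except Exception:
                ok = False  # a check that crashes counts as a failure
            if ok:
                return name
            self.failure_log.append(name)
        return None  # every component failed


# Simulated components: the primary is down, the standby is up.
pool = FailoverPool({"primary": lambda: False, "standby": lambda: True})
active = pool.pick_healthy()
print(active, pool.failure_log)
```

Note that the pool object is itself the crossover point, so in a real deployment it would also need redundancy to avoid becoming the single point of failure that principle 2 warns about.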

Modernization has resulted in an increased reliance on these systems. For example, hospitals and data centers require high availability of their systems to perform routine daily activities. Availability refers to the ability of the user community to obtain a service or good or to access the system, whether to submit new work, update or alter existing work, or collect the results of previous work. If a user cannot access the system, it is, from the user's point of view, unavailable. Generally, the term downtime is used to refer to periods when a system is unavailable.

Percentage calculation
Availability is usually expressed as a percentage of uptime in a given year. Each percentage translates into an allowed amount of downtime per year, month, or week, presuming that the system is required to operate continuously. For example, 99.9% availability ("three nines") allows about 8.76 hours of downtime per year. Service level agreements often refer to monthly downtime or availability in order to calculate service credits to match monthly billing cycles.
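
The translation from an availability percentage to allowed downtime is simple arithmetic, sketched below. The function names, the chosen percentages, the 365-day year, and the month defined as one twelfth of a year are assumptions made for this example; an actual service level agreement defines its own measurement periods.

```python
# Assumed period lengths in hours, for continuous operation.
PERIOD_HOURS = {
    "year": 365 * 24,
    "month": 365 * 24 / 12,
    "week": 7 * 24,
}


def allowed_downtime_hours(availability_percent: float, period_hours: float) -> float:
    """Downtime permitted in a period at the given availability percentage."""
    return period_hours * (1 - availability_percent / 100)


def format_duration(hours: float) -> str:
    """Render a number of hours as days, hours, minutes, and seconds."""
    total_seconds = round(hours * 3600)
    days, rem = divmod(total_seconds, 86400)
    h, rem = divmod(rem, 3600)
    m, s = divmod(rem, 60)
    return f"{days}d {h:02d}h {m:02d}m {s:02d}s"


# Print one row per availability level: downtime per year, month, week.
for pct in (90, 99, 99.9, 99.99, 99.999):
    row = [format_duration(allowed_downtime_hours(pct, hrs))
           for hrs in PERIOD_HOURS.values()]
    print(f"{pct:>7}%  " + "  ".join(row))
```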

Uptime and availability are not synonymous. A system can be up, but not available, as in the case of a network outage.

In the world of IT, “high availability” is a term you often encounter. There are a few things that go into having high availability and assessing it. This week we will cover the general concepts behind maintaining highly available environments.

There are two major things to consider with high availability: redundancy and separation.


Redundancy involves providing excess capacity in the design in order to account for any failures without a performance decline.

An example of redundancy would be taking a server and plugging it into not just one, but two power circuits to protect against the power failure of one.

But what if the server itself fails? Add another server and put them together in a cluster.

However, if every server in the cluster draws power from the same circuit and that circuit fails, the whole cluster would still go down. The key to protecting your availability is to double up (or even triple up) on both your equipment and your power sources.


Separation involves diversifying the power sources so that a single malfunction or power outage cannot take your servers down. A good way to improve the availability of your server or server cluster is to connect it to power sources on two different circuits. To maintain availability even during a widespread power outage, we recommend using at least one uninterruptible power supply (UPS). A UPS is effective at protecting availability because it takes very little time to assume the power burden after a mains power failure, quickly supplying energy from batteries or a flywheel.

While redundancy and separation are two different elements of ensuring high availability, it's important to note that you need both. A server failure can take your service down just as surely as a power failure can. A solid plan for high availability accounts for both redundancy and separation in order to ensure there is a plan B for any situation.
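
To see why both are needed, here is a rough sketch that combines component availabilities. It assumes failures are independent of each other, which is exactly what separation is meant to achieve, and the 0.99 figures are hypothetical numbers picked for illustration rather than measurements.

```python
from math import prod


def parallel(*availabilities: float) -> float:
    """Redundant components: the group is down only if every member is down."""
    return 1 - prod(1 - a for a in availabilities)


def series(*availabilities: float) -> float:
    """Dependent components: the chain is up only if every link is up."""
    return prod(availabilities)


# Hypothetical availabilities for a single server and a single power circuit.
SERVER = 0.99
CIRCUIT = 0.99

one_server_one_circuit = series(SERVER, CIRCUIT)
cluster_one_circuit = series(parallel(SERVER, SERVER), CIRCUIT)
cluster_two_circuits = series(parallel(SERVER, SERVER), parallel(CIRCUIT, CIRCUIT))

print(f"One server, one circuit:  {one_server_one_circuit:.6f}")
print(f"Cluster, one circuit:     {cluster_one_circuit:.6f}")
print(f"Cluster, two circuits:    {cluster_two_circuits:.6f}")
```

Under these assumptions, clustering the servers while leaving them on one circuit still caps the result at roughly the availability of that circuit; only doubling up on both sides lifts the overall figure.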
