What Does “Database High Availability” Really Mean?
For many of our customers, high availability is a key concern. Their architects spend a lot of time designing and planning for high availability of applications and databases, because it is essential for business continuity: even a short downtime can lead to lost business. That is why I decided to address the topic in this blog.
If you search for high availability, you will find many definitions. Wikipedia gives the following one:
High availability (HA) is a characteristic of a system, which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.
Key Principles of High Availability
The following are the key principles of High Availability:
- Eliminate any single point of failure: Add redundancy so that the failure of any one part of the system does not lead to the collapse of the entire system.
- Reliable crossover: In a redundant system, the crossover point itself becomes a single point of failure. Fault-tolerant systems must provide a reliable crossover or automatic switchover mechanism to avoid failure.
- Detection of failures: Failures must be detected as they occur. If the two principles above are observed and the system is proactively monitored, a user may never see a system failure.
EDB Postgres provides building blocks that cover all of these principles.
- Elimination of single points of failure – Postgres supports the following types of physical standbys:
  - Cold standby – A backup server that holds the backups and all the WAL files needed for recovery. By definition, this system is not up and running, but it can be made available when needed. Backup servers and WAL files are mainly used to create a new PostgreSQL node as part of disaster recovery.
  - Warm standby – In warm standby mode, Postgres runs in recovery mode and receives updates through archived WAL files or through log-shipping replication. In this mode, Postgres does not accept connections or queries.
  - Hot standby – In hot standby mode, Postgres runs in recovery mode and receives updates through archived WAL files or through log-shipping replication. In this mode, Postgres accepts connections and read-only queries.
  Any of the above can help eliminate single points of failure; which one to choose depends on the agreed level of performance and uptime. Since Postgres 9.0, the most popular standby mode has been hot standby.
- Reliable crossover – For a reliable crossover, i.e., switching between the master and standby nodes, EDB provides EDB Postgres Failover Manager (EFM). EFM enables automatic failover from the Postgres master node to a standby node in case of a software or hardware failure on the master. EFM uses JGroups, which provides a reliable, distributed, and redundant infrastructure without a single point of failure.
- Detection of failures – EDB Postgres Failover Manager continuously monitors the servers and detects failures. It also executes the failover from the master to one of the standbys so that the system remains available for database connections and queries. Properly configured, EFM can detect a failure and execute a failover within a few seconds.
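As a rough illustration of the detection principle, the sketch below declares a primary failed only after several consecutive health probes fail, so that a single dropped probe does not trigger a failover. This is not how EFM is implemented; the `FailureDetector` class, the `probe` callback, and the threshold of three are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailureDetector:
    """Illustrative detector: fails the primary after N consecutive missed probes."""
    probe: Callable[[], bool]  # hypothetical health check, True when the primary responds
    threshold: int = 3         # assumed value for illustration, not an EFM setting
    misses: int = 0

    def tick(self) -> bool:
        """Run one probe and return True once the primary is considered failed."""
        if self.probe():
            self.misses = 0    # any successful probe resets the counter
        else:
            self.misses += 1
        return self.misses >= self.threshold


# Simulated probe results: one transient miss run, a recovery, then a real outage.
results = iter([True, False, False, True, False, False, False])
detector = FailureDetector(probe=lambda: next(results))
outcomes = [detector.tick() for _ in range(7)]
print(outcomes)
```

Requiring consecutive misses trades a slightly longer detection time for fewer false failovers, which is the kind of tuning decision a real failover manager exposes through its configuration.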
Combining all of the above helps achieve high availability of EDB Postgres within a data center or across data centers. If you are a cloud user, you can have high availability within a region (across multiple zones) or across regions (using a backplane network supported by the cloud vendor). For a detailed walkthrough of the questions to ask when designing highly available databases, watch our on-demand webinar.
PostgreSQL Database Uptime and Availability
Uptime and availability are generally used synonymously. To achieve high availability and maintain the agreed uptime, architects work to reduce outages and downtime.
Service outages come in two main flavors:
- Planned outages
- Unplanned outages
Some people refer to them as Scheduled and Unscheduled downtime.
- Planned outage/scheduled downtime – A planned outage is the result of maintenance activities that disrupt system operation and usually cannot be avoided. Examples include patches to system software that require a reboot or a database restart. In general, a planned outage is the result of a logical, management-initiated event.
- Unplanned outage/unscheduled downtime – An unplanned outage is the result of physical failures or events, such as a hardware or software failure or an environmental anomaly. Examples include power outages, failed CPU or RAM components (or other hardware components), network failures, security breaches, and failures of applications, middleware, or the operating system.
For both kinds of outage, EDB Postgres Failover Manager can help minimize the downtime. For a planned outage, a DBA can first patch all the standbys and then use EDB Postgres Failover Manager to perform a switchover before patching the master (primary) node.
For unplanned outage/unscheduled downtime, EDB Postgres Failover Manager can detect failures and perform the failover to the appropriate standby, and make it the new master, which can then accept read/write connections and provide database services to the application. EDB Postgres Failover Manager also makes sure that the old master/primary doesn’t come back (after failover) to avoid a split-brain situation.
With EDB Postgres Failover Manager in place, an architect who wants to further reduce application unavailability can also use the multi-host connection support of the JDBC driver or libpq, in which the connection string lists several candidate hosts.
This makes a failover of the Postgres master/primary largely transparent to the application.
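As a minimal sketch of the multi-host connection idea, the helper below builds a libpq-style connection URI that lists several hosts and asks for a read-write session through libpq's `target_session_attrs` parameter. The host names `pg-node1` and `pg-node2`, the database name `appdb`, and the helper `multi_host_uri` itself are assumptions made for illustration.

```python
from typing import Sequence, Tuple
from urllib.parse import quote


def multi_host_uri(hosts: Sequence[Tuple[str, int]], dbname: str,
                   target_session_attrs: str = "read-write") -> str:
    """Build a libpq connection URI that lists several candidate hosts."""
    if not hosts:
        raise ValueError("at least one host is required")
    # libpq tries the hosts in order until one satisfies target_session_attrs.
    host_list = ",".join(f"{host}:{port}" for host, port in hosts)
    return (f"postgresql://{host_list}/{quote(dbname, safe='')}"
            f"?target_session_attrs={target_session_attrs}")


# Hypothetical two-node cluster: one primary and one standby.
uri = multi_host_uri([("pg-node1", 5432), ("pg-node2", 5432)], "appdb")
print(uri)
```

A URI of this shape can be handed to any libpq-based client. The PostgreSQL JDBC driver offers a comparable option: a comma-separated host list in the JDBC URL combined with its `targetServerType` parameter.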
Availability is usually calculated/expressed as a percentage of uptime in a given year based on the service level agreements. Some companies exclude the planned outage/scheduled downtime based on their agreements with customers on the availability of their services.
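The calculation can be sketched as follows. The function name `availability_percent` and the way planned downtime is excluded (removed from the measured period rather than counted as downtime) are assumptions for illustration; an actual service level agreement defines its own rules.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap days


def availability_percent(unplanned_minutes: float,
                         planned_minutes: float = 0.0,
                         exclude_planned: bool = False,
                         period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return availability as a percentage of the measured period."""
    measured = period_minutes
    downtime = unplanned_minutes
    if exclude_planned:
        # Assumed convention: planned windows are not part of the measured period.
        measured -= planned_minutes
    else:
        downtime += planned_minutes
    if measured <= 0:
        raise ValueError("measured period must be positive")
    return 100.0 * (measured - downtime) / measured


# About 52.6 minutes of unplanned downtime and two hours of planned maintenance.
with_planned = availability_percent(52.596, planned_minutes=120)
without_planned = availability_percent(52.596, planned_minutes=120, exclude_planned=True)
print(f"{with_planned:.4f}% vs {without_planned:.4f}%")
```

The two results differ only in how the planned maintenance window is treated, which is why the exclusion rule needs to be stated explicitly in the agreement.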
The table below translates a given availability percentage, from four nines to five nines, into the corresponding amount of time a system would be unavailable.
| Availability % | Downtime per year | Downtime per month | Downtime per week | Downtime per day |
|---|---|---|---|---|
| 99.99% (“four nines”) | 52.60 minutes | 4.38 minutes | 1.01 minutes | 8.64 seconds |
| 99.995% (“four and a half nines”) | 26.30 minutes | 2.19 minutes | 30.24 seconds | 4.32 seconds |
| 99.999% (“five nines”) | 5.26 minutes | 26.30 seconds | 6.05 seconds | 864.00 milliseconds |
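The figures in the table can be reproduced with a few lines of arithmetic. The sketch below assumes an average year of 365.25 days and a month of one twelfth of that, which matches the table values; the function name `allowed_downtime_seconds` is chosen for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Period lengths in days; the year is averaged over leap years.
PERIOD_DAYS = {"year": 365.25, "month": 365.25 / 12, "week": 7.0, "day": 1.0}


def allowed_downtime_seconds(availability_pct: float) -> dict:
    """Return the downtime budget in seconds for each period."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return {period: unavailable_fraction * days * SECONDS_PER_DAY
            for period, days in PERIOD_DAYS.items()}


budget = allowed_downtime_seconds(99.999)
for period, seconds in budget.items():
    print(f"per {period}: {seconds:.2f} s")
```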
Depending on the use case and service level agreement, EDB has been able to help customers achieve five nines with EDB Postgres.
Want to learn more about how to operate Postgres at scale, with flexible deployment options? Check out the EDB Postgres Platform.