System Design, Chapter 8: Replication and Redundancy

Redundancy is the duplication of critical components or functions of a system with the intention of increasing the reliability of the…

System Design, Chapter 8: Replication and Redundancy

Redundancy is the duplication of critical components or functions of a system with the intention of increasing the reliability of the system, usually in the form of a backup or fail-safe, or to improve actual system performance. For example, if there is only one copy of a song file stored on a single server, then losing that server means losing the song file. Since losing data is seldom a good thing, we can create duplicate or redundant copies of the song file to solve this problem.

Replication can be used to prevent the loss of a single server from causing your directory service to become unavailable.

A reliable replication topology ensures that the most recent data is available to clients across data centers, even in the case of a server failure. At a minimum, your local directory tree needs to be replicated to at least one backup server. Some directory architects say that you should replicate three times per physical location for maximum data reliability. In deciding how much to use replication for fault tolerance, consider the quality of the hardware and networks used by your directory. Unreliable hardware requires more backup servers.

To maintain the ability to read data in the directory, a suitable load balancing strategy must be put in place. Both software and hardware load balancing solutions exist to distribute read load across multiple replicas. Each of these solutions can also determine the state of each replica and to manage its participation in the load balancing topology. The solutions might vary in terms of completeness and accuracy.

To maintain write failover over geographically distributed sites, you can use multiple data center replication over WAN. This entails setting up at least two master servers in each data center, and configuring the servers to be fully meshed over the WAN. This strategy prevents loss of service if any of the masters in the topology fail. Write operations must be routed to an alternative server if a writable server becomes unavailable. Various methods can be used to reroute write operations, including Directory Proxy Server.

Now just because you have multiple servers or multiple systems– you’ve got that redundancy– doesn’t necessarily mean that your environment is highly available. High availability means that the systems will always be available regardless of what happens. With redundancy, you may have to flip a switch to move from one server to the other, or you may have to power up a new system to be able to have that system available. High availability is generally considered to be always on, always available.