System Design, Chapter 2: Sharding

Sharding
In a simple word, Sharding is the process of storing data records across multiple machines. Lets relate it with some of below picture to have more clarity, these are called Sharding strategies. If…

System Design, Chapter 2: Sharding

In a simple word, Sharding is the process of storing data records across multiple machines. Lets relate it with some of below picture to have more clarity, these are called Sharding strategies.

The Lookup strategy:

The Range strategy:

The Hash strategy:

What Drives the Need for Database Sharding?

  • You can scale the system out by adding further shards running on additional storage nodes.
  • A system can use off-the-shelf hardware rather than specialized and expensive computers for each storage node.
  • You can reduce contention and improve performance by balancing the workload across shards.
  • In the cloud, shards can be located physically close to the users that’ll access the data.
The growth in database transactions and volumes has a large impact on response times

Practicalities of Database Sharding

If Database Sharding is highly scalable, less costly, and improves performance, why hasn’t adoption of the technology been more widespread? Is it feasible for your organization?

The reality is that Database Sharding is a very useful technology, but like other approaches, there are many factors to consider that ensure a successful implementation. Further, there are some limitations and Database Sharding will not work well for every type of business application. This chapter discusses these critical considerations and how they can be addressed.

Challenges:

Due to the distributed nature of individual databases, a number of key elements must be taken into account:

  1. Avoidance of cross-shard joins — In a sharded system, queries or other statements that use inner-joins that span shards are highly inefficient and difficult to perform.
  2. Auto-increment key management- Typical auto-increment functionality provided by database management systems generate a sequential key for each new row inserted into the database. This is fine for a single database application, but when using Database Sharding, keys must be managed across all shards in a coordinated fashion.
  3. Reliability- The database tier is often the single most critical element in any reliability design, and therefore an implementation of Database Sharding is no exception. In fact, due to the distributed nature of multiple shard databases, the criticality of a well-designed approach is even greater. To ensure a fault-tolerant and reliable approach, the following items are required:
  • Automated backups of individual Database Shards.
  • Database Shard redundancy, ensuring at least 2 “live” copies of each shard are available in the event of an outage or server failure. This requires a high-performance, efficient, and reliable replication mechanism.
  • Cost-effective hardware redundancy, both within and across servers.
  • Automated failover when an outage or server failure occurs.
  • Disaster Recovery site management.

Summary:

Sharding comes in many forms. In the most basic sense, it describes breaking up a large database into many smaller databases. Sharding can include strategies like carving off tables, or cutting up tables vertically (by columns). I’m mostly referring to the strategy of “horizontal table partitioning” — dividing large tables by row. This is a relatively exotic configuration and generally used only by the most demanding applications.

  1. Good strategy for handling extreme database loads
  2. Easiest/most appropriate when most load is accessing a few tables and JOIN operations across shards are not required
  3. Success depends largely of sharding strategy and shard sizing
  4. May require painful periodic shard resizing
  5. Very high complexity
  6. Improves availability — The impact of a shard failure is small (affects only a portion of data), but sharding must be coupled with an additional strategy (like active-passive) to provide complete high-availability

Knowledge is Power!