System Design, Chapter 7: Queues
What is message queuing?
Let’s start by defining message queues, then look at how you can use one and the benefits of doing so.
A queue is a line of things waiting to be handled — in sequential order starting at the beginning of the line. A message queue is a queue of messages sent between applications. It includes a sequence of work objects that are waiting to be processed.
A message is the data transported between the sender and the receiver application; it’s essentially a byte array with some headers on top. An example of a message could be an event. One application tells another application to start processing a specific task via the queue.
The basic architecture of a message queue is simple; there are client applications called producers that create messages and deliver them to the message queue. Another application, called a consumer, connects to the queue and gets the messages to be processed. Messages placed onto the queue are stored until the consumer retrieves them.
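To make the producer, queue, and consumer roles concrete, here is a minimal in-memory sketch using Python's standard `queue` module. The `Message` and `MessageQueue` names and the `demo` function are hypothetical and exist only for illustration; a real message queue runs as a separate service and persists its messages.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Message:
    """A message: a byte payload plus a few descriptive headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class MessageQueue:
    """Hypothetical in-memory queue that stores messages until they are retrieved."""

    def __init__(self):
        self._queue = queue.Queue()  # thread-safe FIFO from the standard library

    def publish(self, message: Message) -> None:
        """Called by a producer to place a message onto the queue."""
        self._queue.put(message)

    def consume(self):
        """Called by a consumer; returns the oldest message, or None if empty."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None


def demo():
    mq = MessageQueue()
    # Producer side: create messages and deliver them to the queue.
    for i in range(3):
        mq.publish(Message(body=f"task-{i}".encode(), headers={"type": "event"}))
    # Consumer side: connect later and drain the messages in arrival order.
    received = []
    while (msg := mq.consume()) is not None:
        received.append(msg.body.decode())
    return received
```

Calling `demo()` returns the task names in the order they were published, which is the first-in, first-out behavior described above.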
Processing queue: all incoming tasks are added to the queue, and as soon as any worker has capacity, it picks up a task from the queue.
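The processing-queue idea can be sketched with a few worker threads pulling from one shared queue. This is only an illustration under simple assumptions: the `run_worker_pool` function is hypothetical, the "work" is just squaring a number, and `None` is used as a stop signal for the workers.

```python
import queue
import threading


def run_worker_pool(tasks, num_workers=3):
    """Put every task on one queue and let idle workers pick them up."""
    task_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker(worker_id):
        while True:
            task = task_queue.get()      # blocks until a task is available
            if task is None:             # sentinel: no more work for this worker
                task_queue.task_done()
                break
            outcome = task * task        # stand-in for real processing
            with results_lock:
                results.append((worker_id, outcome))
            task_queue.task_done()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for task in tasks:
        task_queue.put(task)
    for _ in threads:
        task_queue.put(None)             # one stop signal per worker
    task_queue.join()                    # wait until every item has been handled
    for t in threads:
        t.join()
    return results
```

Which worker handles which task is not fixed; it depends on which worker happens to be free, so only the set of results is predictable.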
Usage
- asynchronous communication: the client is not required to wait for the results
- fault tolerance: queues can provide some protection from service outages and failures
Queues play a vital role in managing distributed communication between different parts of any large-scale distributed system.
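Both usage points can be illustrated with a small sketch. The `BufferingQueue` class and its methods are hypothetical names, and the failure is simulated; the point is only that the producer does not wait for a result, and that a message stays queued if the consumer fails while handling it.

```python
from collections import deque


class BufferingQueue:
    """Hypothetical queue that keeps a message until a handler succeeds."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        """Producer side: enqueue and return immediately, without waiting."""
        self._messages.append(message)

    def process_next(self, handler):
        """Pass the oldest message to handler; drop it only if handler succeeds."""
        if not self._messages:
            return False
        try:
            handler(self._messages[0])
        except Exception:
            return False                 # message stays at the head for a retry
        self._messages.popleft()
        return True

    def __len__(self):
        return len(self._messages)


def demo():
    q = BufferingQueue()
    q.publish("resize-image-42")

    processed = []
    attempts = {"count": 0}

    def flaky_handler(message):
        attempts["count"] += 1
        if attempts["count"] == 1:       # simulate an outage on the first attempt
            raise RuntimeError("consumer temporarily unavailable")
        processed.append(message)

    first = q.process_next(flaky_handler)
    pending_after_failure = len(q)
    second = q.process_next(flaky_handler)
    return first, pending_after_failure, second, processed, len(q)
```

In `demo()`, the first attempt fails and the message remains queued; the second attempt succeeds and the queue is emptied.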
Kafka as a Messaging System
Kafka is a piece of technology originally developed by the folks at LinkedIn. In a nutshell, it’s sort of like a message queueing system with a few twists that enable it to support pub/sub, scaling out over many servers, and replaying of messages.
How does Kafka’s notion of streams compare to a traditional enterprise messaging system?
Messaging traditionally has two models: queuing and publish-subscribe. In a queue, a pool of consumers may read from a server and each record goes to one of them; in publish-subscribe the record is broadcast to all consumers. Each of these two models has a strength and a weakness. The strength of queuing is that it allows you to divide up the processing of data over multiple consumer instances, which lets you scale your processing. Unfortunately, queues aren’t multi-subscriber: once one process reads the data, it’s gone. Publish-subscribe allows you to broadcast data to multiple processes, but has no way of scaling processing since every message goes to every subscriber.
The consumer group concept in Kafka generalizes these two concepts. As with a queue the consumer group allows you to divide up processing over a collection of processes (the members of the consumer group). As with publish-subscribe, Kafka allows you to broadcast messages to multiple consumer groups.
The advantage of Kafka’s model is that every topic has both these properties — it can scale processing and is also multi-subscriber — there is no need to choose one or the other.
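The consumer group idea can be modeled in a few lines. This is a toy model, not Kafka's client API: the `deliver` function and the group and member names are made up, and records are handed out round-robin within a group purely for simplicity (Kafka itself divides work by assigning partitions).

```python
from collections import defaultdict
from itertools import cycle


def deliver(records, groups):
    """Deliver every record to every group, but to only one member per group.

    groups maps a group name to its list of member names.
    Returns {(group, member): [records that member received]}.
    """
    delivered = defaultdict(list)
    for group, members in groups.items():   # broadcast across groups
        turn = cycle(members)                # share the work within a group
        for record in records:
            delivered[(group, next(turn))].append(record)
    return dict(delivered)


def demo():
    groups = {
        "billing": ["billing-a", "billing-b"],  # two members split the records
        "audit": ["audit-a"],                   # a single member sees everything
    }
    return deliver(["r0", "r1", "r2", "r3"], groups)
```

Here the `billing` group behaves like a queue (each record goes to one of its two members), while `billing` and `audit` together behave like publish-subscribe (both groups see all four records).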
Kafka has stronger ordering guarantees than a traditional messaging system, too.
A traditional queue retains records in-order on the server, and if multiple consumers consume from the queue then the server hands out records in the order they are stored. However, although the server hands out records in order, the records are delivered asynchronously to consumers, so they may arrive out of order on different consumers. This effectively means the ordering of the records is lost in the presence of parallel consumption. Messaging systems often work around this by having a notion of “exclusive consumer” that allows only one process to consume from a queue, but of course this means that there is no parallelism in processing.
Kafka does it better. By having a notion of parallelism (the partition) within the topics, Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes. This is achieved by assigning the partitions in the topic to the consumers in the consumer group so that each partition is consumed by exactly one consumer in the group. By doing this we ensure that the consumer is the only reader of that partition and consumes the data in order. Since there are many partitions this still balances the load over many consumer instances. Note, however, that ordering is guaranteed only within a partition, not across the whole topic, and that a consumer group cannot usefully have more consumer instances than partitions: any extra instances sit idle.
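A rough sketch of the partition mechanics follows. Everything here is an assumption made for illustration: the function names are hypothetical, `zlib.crc32` stands in for whatever hash a real client uses, and partitions are dealt out to consumers round-robin rather than by Kafka's actual assignment strategies.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Map a record key to a partition; the same key always lands in the same one."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions(num_partitions, consumers):
    """Give each partition to exactly one consumer in the group (round-robin)."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def append_records(records, num_partitions):
    """Append (key, value) records to partitions, keeping arrival order in each."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append(value)
    return dict(partitions)
```

For example, `assign_partitions(4, ["c0", "c1"])` gives each consumer two partitions, while `assign_partitions(2, ["c0", "c1", "c2"])` leaves `c2` with an empty list. Because all records with the same key go to one partition and that partition has a single reader, their relative order is preserved.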
In a microservice architecture, services need a way to communicate without being tightly coupled to one another. Message queuing fulfills this purpose by providing a means for services to push messages to a queue asynchronously and ensure that they get delivered to the correct destination. To implement a message queue between services, you need a message broker; think of it as a mailman, who takes mail from a sender and delivers it to the correct destination.
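The mailman analogy can be sketched as a broker that keeps one queue per destination. The `Broker` class, its `send` and `receive` methods, and the destination names are hypothetical; this is not the API of RabbitMQ or any other real broker.

```python
from collections import defaultdict, deque


class Broker:
    """Hypothetical broker: takes messages from senders and files them by destination."""

    def __init__(self):
        self._queues = defaultdict(deque)   # destination name -> FIFO of messages

    def send(self, destination, body):
        """Sender side: hand the message to the broker and move on."""
        self._queues[destination].append(body)

    def receive(self, destination):
        """Receiver side: fetch the oldest message for this destination, if any."""
        pending = self._queues[destination]
        return pending.popleft() if pending else None


def demo():
    broker = Broker()
    broker.send("orders", "order #1 created")
    broker.send("emails", "send welcome email")
    broker.send("orders", "order #2 created")
    return (
        broker.receive("orders"),
        broker.receive("emails"),
        broker.receive("orders"),
        broker.receive("emails"),
    )
```

The senders never talk to the receivers directly; they only need to know the destination name, and the broker holds each message until the matching service asks for it.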
Message Broker — RabbitMQ
RabbitMQ is one of the most widely used message brokers, with over 35,000 production deployments worldwide, and is considered one of the most reliable message brokers available. RabbitMQ acts as the message broker, “the mailman”, that a microservice architecture needs.
For more details: RabbitMQ
Examples: RabbitMQ vs Kafka
1. Features
Apache Kafka: It is distributed. The data is shared and replicated with assured durability and availability.
RabbitMQ: It offers comparatively less support for these features.
2. Performance rate
Apache Kafka: Its performance rate is high, up to 100,000 messages/second.
RabbitMQ: Its performance rate is around 20,000 messages/second.
3. Processing
Apache Kafka: It allows reliable distributed log processing. Stream processing semantics are also built into Kafka Streams.
RabbitMQ: The consumer is simply FIFO-based, reading from the head of the queue and processing sequentially.
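The difference in processing style between a log and a plain FIFO queue can be shown with a toy model. Both classes below are hypothetical simplifications and not the real Kafka or RabbitMQ APIs: the log keeps every record and lets a reader choose its offset, while the FIFO queue hands out each message once from the head.

```python
from collections import deque


class Log:
    """Toy append-only log: records are kept, readers track their own offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1       # offset of the appended record

    def read(self, offset, max_records=10):
        """Return up to max_records starting at offset; nothing is removed."""
        return self._records[offset:offset + max_records]


class FifoQueue:
    """Toy FIFO queue: reading from the head removes the message."""

    def __init__(self):
        self._messages = deque()

    def put(self, message):
        self._messages.append(message)

    def get(self):
        return self._messages.popleft() if self._messages else None


def demo():
    log, fifo = Log(), FifoQueue()
    for item in ["a", "b", "c"]:
        log.append(item)
        fifo.put(item)
    first_pass = log.read(0)
    replay = log.read(0)                     # same offset again: same records
    drained = [fifo.get(), fifo.get(), fifo.get(), fifo.get()]
    return first_pass, replay, drained
```

Reading the log from offset 0 a second time returns the same records, which is what makes replay possible; the FIFO queue returns `None` once its three messages have been taken.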
I hope this article is useful for anyone looking to understand queues and message brokers. Let me know your feedback.