Near real-time communication
The Internet of Things (IoT) is an interdisciplinary field that allows a wide range of devices, from the smallest sensors to industrial machines, to communicate with and affect each other in near real time. It removes the dependency on a centralized server for processing data and replaces it with a more decentralized solution in which each device can act as a client as well as a server.
Traditionally, each device (or client) in a network could only initiate a connection to send or request data, whereas the server, or cluster of servers, only served the requests it received from the devices. This is known as the request/response pattern, and it suffers from some significant drawbacks.
Chief among these is that in a system with sensors, actuators, and servers, the sensors and actuators can’t communicate directly with each other. In a simple example, such as a sensor trying to change the state of some actuators, the sensor needs to send the request to the server first. Meanwhile, the actuators need to keep polling the server to check whether they need to change their state. This has two disadvantages. Firstly, there is a high risk that the server cannot receive messages because of heavy load and the time it needs to process requests. Secondly, actuators may use an unnecessary amount of bandwidth since they keep polling the server, even when nothing changes.
The modern way is to use the publish/subscribe (or pub/sub) pattern. This approach replaces the traditional architecture and tackles its problems. In the same example of a sensor trying to change the state of some actuators, with a similar architecture design, the server is replaced by a message broker that only manages multiple queues of messages. Both the sensors and the actuators are upgraded to be able to post, receive, and process messages on specific queues.
With this solution, the broker requires minimal time between receiving a message and acknowledging it. Also, actuators no longer need to keep polling, since they are notified only when new, relevant messages are received.
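The pattern described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real broker: the `Broker` and `Actuator` classes and the queue name `room1/cooling` are hypothetical names chosen for the example, and delivery is a plain function call rather than a network message.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]


class Broker:
    """Toy in-memory broker: keeps a list of subscribers per queue name."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, queue: str, handler: Handler) -> None:
        self._subscribers[queue].append(handler)

    def publish(self, queue: str, message: str) -> int:
        """Push the message to every subscriber and return how many were notified."""
        handlers = self._subscribers.get(queue, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


class Actuator:
    """Hypothetical actuator that changes state only when it is notified."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = "off"

    def on_message(self, message: str) -> None:
        self.state = message


broker = Broker()
fan, valve = Actuator("fan"), Actuator("valve")
broker.subscribe("room1/cooling", fan.on_message)
broker.subscribe("room1/cooling", valve.on_message)

# The sensor publishes once; both actuators are notified without polling.
notified = broker.publish("room1/cooling", "on")
print(notified, fan.state, valve.state)
```

The sensor never addresses the actuators directly and the actuators never ask for updates; the only shared knowledge is the queue name.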
MQTT is designed to be an extremely lightweight protocol. It excels in direct device-to-device communication, where devices are set in constrained environments with unstable connectivity. This is due to the MQTT design: a broker with protection mechanisms against connectivity issues, and relatively small messages with negligible headers. To be able to employ the protection mechanisms, the broker is responsible for maintaining the queues, the messages, and the list of subscribers. Because the broker holds all of this information, a typical MQTT setup can’t be scaled horizontally in an efficient manner. A typical setup can handle up to 100,000 messages per second.
A perfect set-up for MQTT is a scenario with a connected car fleet. While driving, each car can go into tunnels or remote locations with no connection. With MQTT, the broker can hold messages for a disconnected car and deliver them once it reconnects, so no messages are lost. One well-known MQTT implementation is Mosquitto, which is free to use under the Eclipse Public License. One can self-host the broker on-premise as well as in the cloud. Cloud solutions include AWS IoT Core, Azure IoT Hub, and Google Cloud IoT Core.
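The tunnel scenario can be sketched as a toy broker that buffers messages for offline subscribers. This is only an illustration of the idea; in real MQTT, comparable behaviour depends on persistent sessions and a quality-of-service level of 1 or 2. The class `BufferingBroker`, the client id `car-42`, and the topic name are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set


class BufferingBroker:
    """Toy broker that holds messages for subscribers that are offline."""

    def __init__(self) -> None:
        self._subscriptions: Dict[str, Set[str]] = defaultdict(set)  # topic -> client ids
        self._online: Set[str] = set()
        self._pending: Dict[str, Deque[str]] = defaultdict(deque)    # client id -> backlog
        self.delivered: Dict[str, List[str]] = defaultdict(list)     # client id -> received

    def subscribe(self, client_id: str, topic: str) -> None:
        self._subscriptions[topic].add(client_id)

    def connect(self, client_id: str) -> None:
        """Mark the client online and flush its backlog in arrival order."""
        self._online.add(client_id)
        backlog = self._pending[client_id]
        while backlog:
            self.delivered[client_id].append(backlog.popleft())

    def disconnect(self, client_id: str) -> None:
        self._online.discard(client_id)

    def publish(self, topic: str, message: str) -> None:
        """Deliver immediately to online subscribers, buffer for offline ones."""
        for client_id in self._subscriptions.get(topic, set()):
            if client_id in self._online:
                self.delivered[client_id].append(message)
            else:
                self._pending[client_id].append(message)


broker = BufferingBroker()
broker.subscribe("car-42", "fleet/route-updates")
broker.connect("car-42")
broker.publish("fleet/route-updates", "reroute via A9")

broker.disconnect("car-42")  # the car enters a tunnel
broker.publish("fleet/route-updates", "roadworks ahead")
broker.publish("fleet/route-updates", "charging stop added")
in_tunnel = list(broker.delivered["car-42"])

broker.connect("car-42")     # the car leaves the tunnel
after_tunnel = list(broker.delivered["car-42"])
print(in_tunnel, after_tunnel)
```

The sketch also shows why such a broker is hard to scale horizontally: the subscriptions and the backlog for each client live in the broker itself.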
AMQP is designed more towards complex routing and scalability. Each message has a header that contains delivery information and routing preferences. The message broker keeps a strict order for all the messages it receives. AMQP should be used in less constrained environments than MQTT, and in cases where message ordering, complex routing, and delivery guarantees are more important. Since the messages carry their own meta information, the message broker can be composed of multiple nodes, each capable of handling 50,000 messages per second (up to 1 million messages per second in total).
In complex industrial settings with different production lines, AMQP can be a better solution, as the complexity of a factory’s set-up can be translated one-to-one using the routing prowess of AMQP. RabbitMQ is one of the most popular AMQP brokers, and it is free to use under the Mozilla Public License. RabbitMQ also supports MQTT messages, which makes it extremely valuable for self-hosted solutions. Paid cloud solutions include Amazon MQ and Azure Service Bus.
Kafka is designed differently from MQTT and AMQP. Its queue can be considered a log containing all messages, with strict timestamps and order. A message is removed only when it expires, not when it is delivered as in the other two protocols. The subscriber (called a ‘consumer’ in Kafka) maintains its own offset to know which messages it has already read, while the message broker is responsible only for the ordering. Unlike MQTT and AMQP, Kafka is not typically used for real-time communication between devices. Instead, it is used when message order, retention, and replay are needed. Most of the work in a Kafka setup is distributed to the consumers, so it can handle millions of messages per second.
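The log-and-offset idea can be illustrated with a small in-memory sketch. It is not the Kafka client API: the `Log` and `Consumer` classes and the sample readings are hypothetical, and expiry and partitioning are left out.

```python
from typing import List


class Log:
    """Toy append-only log: reading never removes a message."""

    def __init__(self) -> None:
        self._entries: List[str] = []

    def append(self, message: str) -> int:
        self._entries.append(message)
        return len(self._entries) - 1  # offset of the newly appended message

    def read(self, offset: int, max_messages: int = 10) -> List[str]:
        return self._entries[offset:offset + max_messages]


class Consumer:
    """Each consumer tracks its own position in the log."""

    def __init__(self, log: Log) -> None:
        self._log = log
        self.offset = 0

    def poll(self) -> List[str]:
        batch = self._log.read(self.offset)
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        self.offset = offset


log = Log()
for reading in ["t=20.1", "t=20.4", "t=35.0"]:
    log.append(reading)

analytics, alerts = Consumer(log), Consumer(log)
first = analytics.poll()       # reads the three readings appended so far
log.append("t=20.2")
second = analytics.poll()      # reads only the message appended since the last poll
alerts_batch = alerts.poll()   # an independent consumer still sees everything

analytics.seek(0)              # replay from the beginning
replay = analytics.poll()
print(first, second, alerts_batch, replay)
```

Because the position lives in the consumer rather than in the broker, adding a consumer or replaying old data does not change anything for the other consumers.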
Kafka is best suited to data analysis, for example ingesting sensor data for predictive maintenance, or users’ actions on the web to predict shopping habits. Kafka is a project of the Apache Software Foundation, so it is free to use under the Apache License 2.0. Cloud-hosted solutions include Amazon MSK and Azure Event Hubs.
This section examines the three protocols in more detail against seven main criteria. These criteria explain why the protocols differ so much in usage and on what basis one should be chosen over another.
Installation and Configuration: Both AMQP and MQTT are easy to install. An application can then send a message to the broker on a specific queue, from where it is forwarded to another application. AMQP can be more difficult to configure if you need its more complex routing capabilities. Kafka is the most complicated to set up, as multiple producers, consumers, and connectors need to be installed and properly configured, which requires deeper knowledge.
Complex Routing: MQTT routes only by queue name (topic): a message published to a topic is delivered to the queue of that name. Kafka does the same, but it can additionally partition each queue so that consumers can handle the partitions separately. AMQP, on the other hand, offers a variety of routing possibilities. It can organize messages by topic name like the other two, and it can also publish messages to all subscribers connected to it. For a more complex solution, AMQP can separate messages by evaluating conditions on the headers (e.g. having conditions in the header that subscribers must meet). An additional feature of AMQP is a default queue, to which messages with no queue specified are sent automatically.
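Header-based routing can be sketched as follows. This is a simplified illustration under stated assumptions, not an AMQP client: each queue is assumed to be bound with a set of key/value conditions, and a message reaches a queue only if its headers satisfy them. The functions `headers_match` and `route`, the queue names, and the header keys are hypothetical.

```python
from typing import Dict, List, Mapping


def headers_match(binding: Mapping[str, str],
                  headers: Mapping[str, str],
                  match_all: bool = True) -> bool:
    """Return True if the message headers satisfy the binding's conditions."""
    hits = [headers.get(key) == value for key, value in binding.items()]
    return all(hits) if match_all else any(hits)


def route(bindings: Dict[str, Mapping[str, str]],
          headers: Mapping[str, str]) -> List[str]:
    """Return the sorted names of all queues whose conditions the message meets."""
    return sorted(
        queue for queue, conditions in bindings.items()
        if headers_match(conditions, headers)
    )


# Hypothetical factory set-up: one binding per queue.
bindings = {
    "line1-quality": {"line": "1", "type": "quality"},
    "line1-alarms": {"line": "1", "type": "alarm"},
    "all-alarms": {"type": "alarm"},
}

line1_alarm = route(bindings, {"line": "1", "type": "alarm", "machine": "press-7"})
line2_alarm = route(bindings, {"line": "2", "type": "alarm"})
line2_quality = route(bindings, {"line": "2", "type": "quality"})
print(line1_alarm, line2_alarm, line2_quality)
```

A single alarm from line 1 lands in both the line-specific queue and the plant-wide alarm queue, which is the kind of fan-out that topic names alone cannot express without duplicating messages.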
Scalability of infrastructure: While both MQTT and AMQP follow a single-broker paradigm in which the broker contains and maintains the queues, only AMQP can work in multi-node mode, which allows it to scale better horizontally. For Kafka, most of the logic is in the consumers, which allows it to scale better than the other two.
Message Order: Both MQTT and AMQP guarantee ordering in the queue to different extents, but both struggle to maintain it once the messages are read by the subscribers. Kafka, on the other hand, guarantees the message order within each partition regardless of conditions, thanks to its log-style queue.
Retention: Messages in MQTT are delivered only once, while in AMQP more conditions can be attached to a message (multiple reads, acknowledgments, etc.). However, once the conditions are met, the message is gone for good. Kafka doesn’t remove messages from its queue when they are read. Instead, every consumer moves its offset to the message it wants to read, which allows messages to be retained for as long as the configured expiry permits, even indefinitely (limited only by hardware considerations).
Latency: Due to MQTT’s simple logic and the small size of its messages, it adds negligible latency on top of the network transfer. Kafka needs more time to acknowledge a message, because it has to guarantee strict ordering and put the message in the correct part of the queue. This increases its latency. Depending on the AMQP setup and the complexity of the configured routing, its latency can rise to almost double that of the other two. However, with a simple set-up, it can be as fast as Kafka.
Bandwidth Usage: MQTT is designed for constrained environments, which makes its use of bandwidth very efficient. Kafka uses more bandwidth because of its message composition and the timestamps it carries. AMQP’s bandwidth usage is the worst of the three protocols, as the message headers can be extremely large, doubling the size of the data to be transferred.
IoT has been gaining more visibility in the last five years due to investments in the fields of smart homes, smart cities, connected cars, industrial IoT, and many more. It is a great fit for those fields, as it can drastically improve the communication between devices and allows real-time data collection and analytics. By using the pub/sub pattern, it overcomes the disadvantages of traditional communication patterns.
Concept Reply has faced increasingly complex use cases over time and has learned to craft architectures that combine multiple IoT protocols and different patterns to reach the optimal solution for our clients. Real-life scenarios can be very complex. For example, a customer may need to collect data and analyze it. This scenario can be solved using an MQTT service that collects the data and pushes it to a Kafka service to ease the analysis step. As another example, a customer may need both data collection and direct access to industrial machines, which requires the use of both a pub/sub pattern and a request/response pattern.
Concept Reply is a software development partner specialized in innovative IoT solutions. It offers its customers solutions for smart infrastructure, industrial IoT, and connected cars, working from the initial idea to the planning phase through to implementation, operation, and support. With its IoT specialists, the range extends from implementation in the embedded environment to gateway software and cloud applications. www.concept-reply.com