ZMQ is a communication library that has recently made it possible to use messaging patterns simply. When we need to connect two software processes, very often the only way that comes to mind is the client-server, where the server part provides a service to one or several clients. In reality this is not an actual pattern because there are many ways in which a service can be provided, indeed, it is generally a quick way to identify which part is listening compared to the one that connects. ZMQ attempts to abstract the way in which two parts connect together, focusing instead on how we want messaging to take place. The main built-in patterns of ZMQ are as follows:

ZMQ has various types of sockets to use basic messaging patterns. Through the basic patterns, it is then possible to create second level patterns, depending on the needs of the application to be set up. In this short article I will set out what were the reasons that necessitated the use of an advanced messaging pattern such as the Clone Pattern.

Module interconnection problem

When we are dealing with a distributed system, or when we want to separate the complex logic of our system on several independent modules, we must above all think about how they will have to communicate. This entails extra work and costs, so it is not always convenient to make this decision, especially when undertaking small projects. However the organisation in separate modules has some advantages including:

As the logic of the entire system is distributed over several modules, the status is also distributed. In fact, each module or component of the system has its own status. Often the most immediate solution translates into a direct connection between the modules, as shown in the figure.

This decentralised architecture is very flexible as there are no constraints. Often, however, enormous flexibility translates into management complexity. In fact, for each module the communication mode, the status update policies and also the network endpoints must be defined. As the number N of modules increases, the tendency is to obtain NxN connections, a situation that is not always pleasant.

To give a more structured cut to this problem, a centralised architecture can be used. In this type of architecture there is a special component called broker which takes on the role of sorting and storage of the status of the distributed application. Introducing a central component brings with it a series of pros and cons, which is important to evaluate:

The following figure shows a centralised broker-based architecture.

Status sharing

In order to connect multiple modules and at the same time to share a common status, ZMQ’s Clone Pattern might be the solution. The system architecture is centralised and based on brokers. However, the task of the broker is not limited to just sorting messages, but it also performs the storage function of the common status. This storage feature is a change from the generic centralised model described in the previous paragraph. This pattern is not one of the built-in ones, but it uses other basic ones to create a more complex one. The built-in patterns used in Clone are as follows:

Status sharing is made possible by the broker, as the software modules cannot communicate directly with each other. Let’s look in more detail at how this mechanism works:

  1. Synchronisation operation. It is the first operation that a module must carry out, at the time of its inclusion in the Clone network. The aim is to synchronise with the entire status of the broker (or rather only the relevant subpart). Synchronisation occurs via a pair of REQ-REP sockets (request-reply). The broker always listens for these status requests. Whenever it receives one, it interrupts its normal functioning flow to serve the request and to send the entire status to the module.
  2. Update operation. It represents the style of data propagation within the network and takes place through a pair of PUB-SUB sockets (publisher-subscriber). Each type of data is named topic and a module generally subscribes only to the topics in which it is interested. We can say that the status is a set of key-value data, where the topic assumes the meaning of key, while the value is a generic binary payload. The broker, during its normal operation, publishes updates to the modules every time a change is made to the data.
  3. Change operation. It represents the way a module makes a change to status data. In particular, each module submits its status change through a pair of PUSH-PULL sockets (pipeline). The broker in turn sequences all the changes received by the modules and, for each of them, updates its status and finally republishes the change.

Correct implementation of this pattern must take into account the disconnections, which can occur between the broker and the other modules. This functionality can be implemented by means of a diagnostic mechanism, with heartbeat packets, sent continuously by the broker as if it were a normal distribution of data. Each component must detect this heartbeat and proceed with resynchronisation of the status in case of lack of communication.

The Clone Pattern only defines the way in which messages pass between one module and another, but does not instead place constraints on the payload. In order for the software component interfaces to be clear and well defined, the payload of the exchanged messages can be defined with other formalisms such as ASN.1, MessagePack, JSON and others. However, this decision is the task of the business logic of the applications.

Using this communication mechanism in a real system a round-trip time of 2 ms per message was tested. In the presence of a sufficiently high load, for example a message of approximately 800 bytes every 200 ms, there can be a latency ranging from 15 ms to 50 ms (indicative data). This is a completely acceptable performance for applications in non-real-time or that must meet tight deadlines.

Status replication

There are contexts in which the replication of the status is fundamental to have a certain degree of redundancy of the entire system. Redundancy can be used to increase the availability or security of a system. The practical case we want to discuss is relative to the first point. That is, we want to extend this centralised architecture to obtain a hot spare mechanism (hot stand-by), to increase availability. In this context we use the term system to define the set of all the modules connected to a broker, running on a machine. We therefore want to instantiate two systems and link them together in some way.

A possible solution is to link the two brokers together. The broker component can be further extended so that it has, within it, the same connection and management logic that we find in the software modules. So a broker presents itself to the other as if it were a simple module (federation style approach), instead of having privileged communication channels (peering style approach). The system broker that assumes the role of hot spare (system B) subscribes to the entire status of the active system (system A), without any limitation. By construction, therefore, the B system will obtain all the updates and keep its status aligned with that of the A system.

System B has the same modules as system A, but they cannot be running until system A is up and running. Under this condition, the only function of the B system is to keep its status aligned. Instead, activation of the B system can take place by mean of the same heart beating mechanism previously defined. For example, if system A stops working, system B notices and takes over, becoming active. At that point, all the B system modules are started and the first operation they perform is to synchronise with the status of their broker. As a result, the overall operation of the system is redundant, increasing availability.

Of course there are some practical problems to solve, for the decision to perform the changeover between the two units:

Conclusions

ZMQ is a low-level library that provides messaging patterns, which until a few years ago were very complex to implement and use. However, by focusing on the needs of our application, we can efficiently interconnect software components. Nowadays it is much easier to implement a distributed application.