Clustering: Retained messages

As a best practice, publish all retained messages for a given topic in a cluster to the same server and with the same quality of service (QoS). Keep the number of retained messages to a minimum.
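One way a publishing application might follow this practice is to derive the target server from the topic name, so that every retained publication for a topic goes to the same server with one fixed QoS. The following is a minimal sketch under stated assumptions: the server names, the QoS value, and the helper functions are hypothetical and are not part of the product. It only builds a description of the publication and does not call any MQTT client library.

```python
import hashlib

# Hypothetical cluster members; replace with the real server addresses.
SERVERS = ["serverA.example.com", "serverB.example.com", "serverC.example.com"]

# Assumption: one fixed QoS is used for every retained publication.
RETAINED_QOS = 1


def server_for_topic(topic, servers=SERVERS):
    """Map a topic to one server in a way that is stable across runs."""
    # hashlib is used because the built-in hash() of a str can differ
    # between Python processes.
    digest = hashlib.sha256(topic.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def retained_publish_plan(topic, payload):
    """Describe a retained publication: same server and QoS for a given topic."""
    return {
        "server": server_for_topic(topic),
        "topic": topic,
        "payload": payload,
        "qos": RETAINED_QOS,
        "retain": True,
    }
```

The mapping stays the same only while the server list is unchanged; if servers are added or removed, a topic can map to a different server.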

When an application first subscribes to a topic, it expects to receive the retained messages that match the topic of the subscription, and it expects those messages to be available when it connects.

All retained messages that are published anywhere in the cluster are sent to all other servers in the cluster and are stored on each of those servers. Because every server holds a copy of every retained message, it is best practice to keep the number of retained messages to a minimum.

The servers in the cluster ensure that retained messages are synchronized, but there might be occasions when a retained message that is published on one server has not yet been sent to one or more of the other servers in the cluster. For this reason, the set of retained messages that an application receives when it connects to one server might not match the set that it receives when it connects to a different server in the cluster. As retained messages arrive at a server, they are delivered to existing subscribing applications as live messages. In this way, applications receive the new retained messages as the servers catch up.
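The effect of this delay can be illustrated with a small simulation. This is a toy model under stated assumptions, not the product's replication protocol: the `Server` and `Cluster` classes are hypothetical, topics are matched by exact name only (no wildcards), and forwarding between servers is reduced to a queue that is drained on request.

```python
class Server:
    """Toy cluster member: holds retained messages and subscriber inboxes."""

    def __init__(self, name):
        self.name = name
        self.retained = {}     # topic -> payload
        self.subscribers = []  # list of (topic, inbox)

    def store(self, topic, payload):
        # Keep the message as retained and deliver it to existing
        # subscribers as a live message.
        self.retained[topic] = payload
        for sub_topic, inbox in self.subscribers:
            if sub_topic == topic:
                inbox.append(("live", topic, payload))

    def subscribe(self, topic):
        # A new subscriber first receives the retained message, if this
        # server already has one for the topic.
        inbox = []
        if topic in self.retained:
            inbox.append(("retained", topic, self.retained[topic]))
        self.subscribers.append((topic, inbox))
        return inbox


class Cluster:
    """Toy cluster: forwarding to other members is delayed until requested."""

    def __init__(self, names):
        self.servers = {name: Server(name) for name in names}
        self.in_flight = []  # (target server name, topic, payload)

    def publish_retained(self, origin, topic, payload):
        self.servers[origin].store(topic, payload)
        for name in self.servers:
            if name != origin:
                self.in_flight.append((name, topic, payload))

    def deliver_pending(self):
        pending, self.in_flight = self.in_flight, []
        for target, topic, payload in pending:
            self.servers[target].store(topic, payload)


cluster = Cluster(["A", "B"])
early_b = cluster.servers["B"].subscribe("sensors/temp")  # subscribed before the publish
cluster.publish_retained("A", "sensors/temp", "21.5")
late_b = cluster.servers["B"].subscribe("sensors/temp")   # B has not caught up: nothing yet
inbox_a = cluster.servers["A"].subscribe("sensors/temp")  # A already holds the message
cluster.deliver_pending()                                 # B catches up
```

Before `deliver_pending()` runs, the subscriber on server A has the retained message and both subscribers on server B have nothing. Afterward, both subscribers on server B receive the same message as a live message.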

To ensure correct message processing, all servers that are connected to one another in a cluster must share a common view of the set of retained messages in the cluster. To maintain this common view, whenever a server detects that the set of retained messages that it received from a cluster member does not match the set that the other cluster members received, the unsynchronized server, after 30 minutes, requests updates from the most up-to-date server to which it is connected.

As a result, retained message updates can be lost if a server is removed from the cluster before it sends its locally published retained message updates to the other cluster members. Updates that were visible only at the removed server are not sent around the cluster, because all remaining servers consider themselves to be up-to-date with the messages that originated at the removed server.
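The removal scenario can be sketched with a toy model. The model is an assumption made for illustration only: it tracks updates from each originating server with a simple sequence number, which is not necessarily how the product compares views, and it does not model the 30-minute delay. The `Member` class and the `resync` and `scenario` functions are hypothetical names.

```python
class Member:
    """Toy cluster member that logs retained updates per originating server."""

    def __init__(self, name):
        self.name = name
        self.retained = {}  # topic -> payload
        self.log = {}       # origin -> list of (seq, topic, payload)

    def last_seq(self, origin):
        entries = self.log.get(origin, [])
        return entries[-1][0] if entries else 0

    def apply(self, origin, seq, topic, payload):
        # Accept only the next update in sequence from this origin;
        # duplicates and out-of-order updates are ignored.
        if seq != self.last_seq(origin) + 1:
            return False
        self.log.setdefault(origin, []).append((seq, topic, payload))
        self.retained[topic] = payload
        return True

    def publish_local(self, topic, payload):
        seq = self.last_seq(self.name) + 1
        self.apply(self.name, seq, topic, payload)
        return (self.name, seq, topic, payload)


def resync(members, origin):
    """Members that are behind copy updates from the most up-to-date member."""
    best = max(members, key=lambda m: m.last_seq(origin))
    copied = 0
    for member in members:
        if member is best:
            continue
        for seq, topic, payload in best.log.get(origin, []):
            if member.apply(origin, seq, topic, payload):
                copied += 1
    return copied


def scenario(forward_second_update_to_b):
    a, b, c = Member("A"), Member("B"), Member("C")
    first = a.publish_local("status/line1", "running")
    b.apply(*first)
    c.apply(*first)
    second = a.publish_local("status/line1", "stopped")
    if forward_second_update_to_b:
        b.apply(*second)
    # Server A is now removed; only B and C remain in the cluster.
    copied = resync([b, c], origin="A")
    return b.retained["status/line1"], c.retained["status/line1"], copied


lost = scenario(forward_second_update_to_b=False)
recovered = scenario(forward_second_update_to_b=True)
```

In the first scenario, the second update never left server A, so B and C agree with each other, copy nothing, and keep the older value. In the second scenario, B received the update before A was removed, so C detects that it is behind and obtains the update from B.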