Cluster monitoring

Cluster monitoring statistics show how well the cluster is functioning by reporting on the cluster members and the flow of messages between them. The statistics for each cluster member are reported relative to the cluster member on which you are viewing them.

You can use a REST Monitoring API or the Amlen WebUI to view information about the status of cluster members and the flow of messages between them. You can view the following information:

  • Cluster
    An array in which each object represents a cluster member in the same cluster as the cluster member on which you are viewing the statistics.
  • Server name
    The user-assigned name of the cluster member.
  • Server UID
    The randomly-generated unique identifier of the cluster member. The same value applies to both servers in a high availability (HA) pair.
  • Status
    The status of the cluster member.
    The value of the Status field can be one of the following values:
    • Active
      The cluster member is sending and receiving messages and cluster control information.
    • Inactive
      The remote cluster member is known to this cluster member but has not been discovered.
      A status of Inactive commonly means that the server is not running but can also indicate that there is a problem with the network or configuration.
    • Connecting
      The cluster member has been identified by Discovery but is not yet sending messages.
      Cluster members normally remain in Connecting state for a short time. If a cluster member remains in Connecting state, it is likely that its configuration is incorrect.
  • Status time
    The date and time when the status of the cluster member last changed.
    Status time information is useful in identifying cluster members that are inactive for long periods, and cluster members that frequently change state.
  • Health
    An indication of the health of the cluster member. The health of the cluster member is primarily based on the amount of memory that is available on the cluster member.
    The value of the Health field can be one of the following values:
    • Unknown
      The health of the cluster member is unknown.
      A Health status of Unknown can be returned when the cluster member is not running but can also indicate that there is a problem with the network or configuration.
    • Green
      The health of the cluster member is good. The cluster member has sufficient memory to process messaging traffic.
    • Yellow
      A Health status of Yellow is a warning that the health of the cluster member is deteriorating.
      The cluster member is getting low on memory and will start to take actions to limit the amount of memory that is being used.
      Administer the cluster member to determine the cause of the low memory condition and take appropriate remedial action.
    • Red
      A Health status of Red indicates that the health of the cluster member is bad.
      The cluster member is low on memory and will start to take actions to limit the amount of memory that is being used. Such actions include discarding QoS 1 and QoS 2 messages.
      Administer the cluster member to determine the cause of the low memory condition and take appropriate remedial action.
  • Memory
    The percentage of memory that is being used on the cluster member.
  • High availability status
    The high availability (HA) status of the cluster member.
    The value of the high availability status field can be one of the following values:
    • Unknown
      The HA status of the cluster member is unknown.
      A status of Unknown can be returned when there is no messaging traffic between the cluster member that you are viewing statistics on and the remote cluster member.
    • None
      The cluster member is not configured for HA.
    • Single
      The cluster member is configured as a member of an HA pair but is running without a standby.
    • Pair
      The cluster member is running as a member of an HA pair.
    • Error
      The cluster member is configured as a member of an HA pair but the HA status of the cluster member is in error.
  • Retained messages synchronized
    Indicates whether the cluster member has retained messages synchronized.
    A value of false indicates that the cluster member does not have retained messages synchronized; this is normal for a short time after a cluster member becomes active. A value of false that persists for longer than approximately 10 minutes might indicate that the cluster member is experiencing problems or that there is a problem with the communication between the remote cluster member and this cluster member. If a value of false persists for 30 minutes, the cluster members attempt to resynchronize retained messages automatically.
  • Reconnect
    The number of times the connection used for messaging between the remote cluster member and this cluster member has been re-established. If the remote cluster member goes down, the count is incremented by 2. A high value might indicate that the cluster member or the network is unstable.
  • Read messages
    The number of incoming messages that are received from the remote cluster member.
    The value is reset when the cluster member starts.
  • Read bytes
    The number of bytes in the incoming messages that are received from the remote cluster member.
    The value is reset when the cluster member starts.
  • Write messages
    The number of outgoing messages that are sent to the remote cluster member.
    The value is reset when the cluster member starts.
  • Write bytes
    The number of bytes in the outgoing messages that are sent to the remote cluster member.
    The value is reset when the cluster member starts.
  • Read message rate
    The number of messages per second that are received from the remote cluster member.
  • Unreliable
    The information relates to unreliable messaging (QoS 0) from this cluster member to remote cluster members.
  • Reliable
    The information relates to reliable messaging (QoS 1 or QoS 2) from this cluster member to remote cluster members.
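
The Status, Health, High availability status, and Retained messages synchronized fields described above can be checked together to find members that need attention. The following Python fragment is a minimal sketch of that idea, not the product's API. It works on an already-parsed statistics object, and every key name in it (`Cluster`, `ServerName`, `Status`, `Health`, `HAStatus`, `RetainedSync`) is an assumption made for illustration. Check the actual REST Monitoring API response for the real names.

```python
# Hypothetical sample data. The key names are assumptions, not confirmed
# REST Monitoring API field names.
SAMPLE = {
    "Cluster": [
        {"ServerName": "serverA", "Status": "Active", "Health": "Green",
         "HAStatus": "Pair", "RetainedSync": True},
        {"ServerName": "serverB", "Status": "Connecting", "Health": "Yellow",
         "HAStatus": "None", "RetainedSync": False},
        {"ServerName": "serverC", "Status": "Inactive", "Health": "Unknown",
         "HAStatus": "Error", "RetainedSync": False},
    ]
}


def members_needing_attention(stats):
    """Return {server name: [reasons]} for members showing a warning sign."""
    flagged = {}
    for member in stats.get("Cluster", []):
        reasons = []
        status = member.get("Status")
        health = member.get("Health")
        # Anything other than Active (Inactive, Connecting) is worth a look.
        if status != "Active":
            reasons.append(f"status is {status}")
        # Yellow and Red indicate low memory; Unknown may mean the member
        # is not running or cannot be reached.
        if health in ("Yellow", "Red", "Unknown"):
            reasons.append(f"health is {health}")
        if member.get("HAStatus") == "Error":
            reasons.append("HA status is Error")
        # A value of false is normal briefly after a member becomes active,
        # so treat this as a hint rather than a definite fault.
        if not member.get("RetainedSync", False):
            reasons.append("retained messages not synchronized")
        if reasons:
            flagged[member.get("ServerName", "<unnamed>")] = reasons
    return flagged


if __name__ == "__main__":
    for name, reasons in members_needing_attention(SAMPLE).items():
        print(f"{name}: {', '.join(reasons)}")
```

In practice, a check like this would also consult the Status time field, because a member that has been in Connecting state for only a few seconds is usually not a problem.
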
The following information is returned for each of the two messaging types (unreliable and reliable) that are used to send messages from this cluster member to remote cluster members:
  • Buffered messages
    The number of messages that are currently awaiting delivery to the remote cluster member.
  • Buffered messages high water mark
    The highest number of messages that have been awaiting delivery to the remote cluster member at one time.
  • Buffered bytes
    The number of bytes that are awaiting delivery to the remote cluster member.
  • Maximum bytes
    The maximum number of bytes that are allowed for buffered messages.
  • Sent messages
    The number of messages that have been successfully sent to the remote cluster member.
    For messages of QoS 1 or QoS 2, the count is incremented only after the message transfer is committed.
    The value is reset when the cluster member starts.
  • Message send rate
    The number of messages per second sent to the remote cluster member. This value represents the current rate of transmission.
  • Discarded messages
    The number of messages that were discarded because the buffered data limit was reached. Refer to the value that is displayed in the Maximum bytes field.
    The value is reset when the cluster member starts.
  • Expired messages
    The number of messages that were discarded because they exceeded the expiration time.
    The value is reset when the cluster member starts.
  • Suspend
    The number of messages that were suspended from being sent to the remote cluster member. In certain circumstances, messages are suspended when the remote cluster member cannot receive messages as quickly as this cluster member is sending them. Suspending the sending of messages allows the remote cluster member to process the messages that it has already received so that it can then continue to receive messages.
    A high suspend count indicates that the remote cluster member is having problems keeping up with the message rate from this cluster member.
    The value is reset when the cluster member starts.
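
The buffered, discarded, and suspend statistics above can be combined into a rough indicator of sending pressure for one messaging type. The following Python fragment is a sketch under stated assumptions: the key names (`BufferedBytes`, `MaxBytes`, `DiscardedMsgs`, `Suspend`) and the `suspend_threshold` parameter are hypothetical, and the threshold value is arbitrary because the text does not define what counts as a high suspend count.

```python
def buffer_report(channel, suspend_threshold=100):
    """Summarize the send statistics of one messaging type.

    `channel` is a dict for either the reliable or the unreliable messaging
    type. All key names are assumptions made for illustration.
    """
    max_bytes = channel.get("MaxBytes", 0)
    buffered = channel.get("BufferedBytes", 0)
    # Fraction of the buffered data limit in use; None if no limit is known.
    utilization = buffered / max_bytes if max_bytes > 0 else None
    return {
        "utilization": utilization,
        # Messages are discarded once the buffered data limit is reached.
        "discarding": channel.get("DiscardedMsgs", 0) > 0,
        # A high suspend count suggests that the remote member cannot
        # keep up with the message rate from this member.
        "remote_slow": channel.get("Suspend", 0) >= suspend_threshold,
    }


if __name__ == "__main__":
    # Hypothetical values for the reliable messaging type.
    reliable = {"BufferedBytes": 750, "MaxBytes": 1000,
                "DiscardedMsgs": 0, "Suspend": 120}
    print(buffer_report(reliable))
```

Because the discarded and suspend counts are reset when the cluster member starts, comparing two samples taken some time apart gives a better picture than a single reading.
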