High availability problems in clusters
There are several reasons why you might experience problems and unexpected behavior when you configure high availability (HA) in clusters of Eclipse Amlen servers. To diagnose the problem, check the following items:
- The cluster status and server status on both servers in the HA pair.
- The cluster membership configuration on both servers in the HA pair.
Error scenario 1: Did you attempt to enable clustering on an HA pair, and now one or both of your HA servers are in maintenance mode?
Check the server status and HA status of your servers. Use the Eclipse Amlen REST API GET method with the following Eclipse Amlen service URI:
http://<Server-IP:Port>/ima/v1/service/status
On each server, check the ErrorCode, ErrorMessage, ReasonCode, and ReasonString fields in the status information that is returned. The following example of status information shows these fields.
{
"Version": "v1",
"Server": {
"Name": "examplesystem01.com:9089",
"UID": "lz3Qj3Kd",
"Status": "Running",
"State": 9,
"StateDescription": "Running (maintenance)",
"ServerTime": "2016-04-13T13:32:28.546Z",
"UpTimeSeconds": 94,
"UpTimeDescription": "0 days 0 hours 1 minutes 34 seconds",
"Version": "2.0 20160413-1109",
"ErrorCode": 509,
"ErrorMessage": "Store High-Availability error."
},
"Container": {
"UUID": "bb41d6d23772d9062d1eb7c7fe6864246bafae565b7ecae32972492e63c61006"
},
"HighAvailability": {
"Status": "Active",
"Enabled": true,
"Group": "mygroup01",
"NewRole": "UNSYNC_ERROR",
"OldRole": "UNSYNC",
"ActiveNodes": 1,
"SyncNodes": 0,
"PrimaryLastTime": "",
"PctSyncCompletion": -1,
"ReasonCode": 1,
"ReasonString": "Cluster.EnableClusterMembership - CONFIG_ERROR",
"RemoteServerName": ""
}
}
A possible cause of this error condition is that cluster membership was enabled on the primary server in the HA pair but only one of the servers in the HA pair was restarted.
Restart both servers in the HA pair at the same time.
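The status check described in this scenario can be scripted. The following Python sketch is one possible approach, not part of the product: the function names are hypothetical, the admin endpoint is assumed to be reachable over plain HTTP without authentication, and maintenance mode is assumed to be recognizable from the StateDescription text shown in the example output.

```python
import json
import urllib.request


def fetch_status(base_url, timeout=10):
    """GET <base_url>/ima/v1/service/status and return the parsed JSON.

    base_url is a placeholder such as "http://<Server-IP:Port>".
    """
    url = f"{base_url}/ima/v1/service/status"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_status(status):
    """Pick out the fields that this scenario asks you to check."""
    server = status.get("Server", {})
    ha = status.get("HighAvailability", {})
    return {
        "StateDescription": server.get("StateDescription"),
        "ErrorCode": server.get("ErrorCode"),
        "ErrorMessage": server.get("ErrorMessage"),
        "NewRole": ha.get("NewRole"),
        "ReasonCode": ha.get("ReasonCode"),
        "ReasonString": ha.get("ReasonString"),
    }


def in_maintenance(status):
    """Return True if the server reports a maintenance state.

    Assumption: the StateDescription contains the word "maintenance",
    as in the example output ("Running (maintenance)").
    """
    description = status.get("Server", {}).get("StateDescription") or ""
    return "maintenance" in description.lower()
```

Call fetch_status once for each server in the HA pair and pass each result to summarize_status, so that the two sets of values can be compared side by side.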
Error scenario 2: Did you attempt to enable clustering on an HA pair, and now, after restarting servers, your HA servers are in maintenance mode?
Check the status of your servers. Use the Eclipse Amlen REST API GET method with the following Eclipse Amlen service URI:
http://<Server-IP:Port>/ima/v1/service/status
On each server, check the ErrorCode and ErrorMessage fields in the status information that is returned. The following examples of status information, one from each server in the HA pair, show these fields.
{
"Version": "v1",
"Server": {
"Name": "examplesystem01:9089",
"UID": "DnAUsuJb",
"Status": "Running",
"State": 9,
"StateDescription": "Running (maintenance)",
"ServerTime": "2016-04-13T13:20:40.702Z",
"UpTimeSeconds": 515,
"UpTimeDescription": "0 days 0 hours 8 minutes 35 seconds",
"Version": "2.0 20160413-1109",
"ErrorCode": 509,
"ErrorMessage": "Store High-Availability error."
},
"Container": {
"UUID": "bb41d6d23772d9062d1eb7c7fe6864246bafae565b7ecae32972492e63c61006"
},
"HighAvailability": {
"Status": "Active",
"Enabled": true,
"Group": "mygroup02",
"NewRole": "UNSYNC_ERROR",
"OldRole": "UNSYNC",
"ActiveNodes": 1,
"SyncNodes": 0,
"PrimaryLastTime": "2016-04-13T13:05:02Z",
"PctSyncCompletion": -1,
"ReasonCode": 2,
"ReasonString": " - DISCOVERY_TIMEOUT",
"RemoteServerName": ""
},
"Cluster": {
"Status": "Initializing",
"Name": "MyCluster",
"Enabled": true,
"ConnectedServers": 0,
"DisconnectedServers": 0
},
"Plugin": {
"Status": "Inactive",
"Enabled": false
},
"MQConnectivity": {
"Status": "Inactive",
"Enabled": false
},
"SNMP": {
"Status": "Inactive",
"Enabled": false
}
}
{
"Version": "v1",
"Server": {
"Name": "examplesystem02:9089",
"UID": "DnAUsuJb",
"Status": "Running",
"State": 9,
"StateDescription": "Running (maintenance)",
"ServerTime": "2016-04-13T19:22:50.403Z",
"UpTimeSeconds": 958,
"UpTimeDescription": "0 days 0 hours 15 minutes 58 seconds",
"Version": "2.0 20160413-1109",
"ErrorCode": 112,
"ErrorMessage": "The property value is not valid: Property: Cluster.ControlAddress Value: \"NULL\"."
},
"Container": {
"UUID": "b308915aa0525a62eaf70a8f5c08b508153caac4e6d1200eb0cd9d53396c8c62"
},
"HighAvailability": {
"Status": "Active",
"Enabled": true,
"Group": "mygroup02",
"NewRole": "UNSYNC",
"OldRole": "UNSYNC",
"ActiveNodes": 0,
"SyncNodes": 0,
"PrimaryLastTime": "",
"PctSyncCompletion": 0,
"ReasonCode": 0,
"RemoteServerName": ""
},
"Cluster": {
"Status": "Unavailable",
"Enabled": true
},
"Plugin": {
"Status": "Inactive",
"Enabled": false
},
"MQConnectivity": {
"Status": "Inactive",
"Enabled": false
},
"SNMP": {
"Status": "Inactive",
"Enabled": false
}
}
In this scenario, a value for the cluster control address had not been specified on the standby server before the cluster was enabled. A similar error scenario can occur if the cluster messaging address is not specified.
Ensure that values for control address and messaging address are specified on both members of the HA pair before you enable them for cluster membership.
Restart both servers in the HA pair.
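When a server reports that a property value is not valid, the name of the property can be read from the ErrorMessage field. The following Python sketch illustrates one way to extract it. The function name is hypothetical, and the pattern assumes that the message has the same layout as the example in this scenario.

```python
import re

# Assumption: the message layout matches the example in this scenario:
#   The property value is not valid: Property: <name> Value: "<value>".
_INVALID_PROPERTY = re.compile(r'Property:\s*(\S+)\s+Value:\s*"([^"]*)"')


def invalid_property(status):
    """Return (property, value) named in Server.ErrorMessage, or None."""
    message = status.get("Server", {}).get("ErrorMessage") or ""
    match = _INVALID_PROPERTY.search(message)
    return match.groups() if match else None
```

For the second example status in this scenario, the function returns the pair Cluster.ControlAddress and NULL, which points to the missing control address on that server.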
Error scenario 3: Did you attempt to disable clustering on an HA pair, and now, after restarting servers, your HA servers are in maintenance mode?
Check the status of your servers. Use the Eclipse Amlen REST API GET method with the following Eclipse Amlen service URI:
http://<Server-IP:Port>/ima/v1/service/status
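On each server, check the ErrorCode, ErrorMessage, ReasonCode, and ReasonString fields in the status information that is returned. The following excerpt shows example values for these fields.
"ErrorCode": 509,
"ErrorMessage": "Store High-Availability error."
"ReasonCode": 1,
"ReasonString": "Cluster.EnableClusterMembership - CONFIG_ERROR",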
A possible cause of this error condition is that cluster membership was disabled on the primary server in the HA pair while the standby server was inactive.
Disable cluster membership on both servers in the HA pair. Restart both servers.
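One way to confirm that both servers agree on cluster membership is to compare the Cluster section of their status output. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that the Enabled field in the Cluster section reflects the cluster membership setting of each server.

```python
def cluster_enabled(status):
    """Return Cluster.Enabled from a status document, or None if absent."""
    return status.get("Cluster", {}).get("Enabled")


def membership_mismatch(primary_status, standby_status):
    """Return True when the two servers report different Cluster.Enabled values."""
    return cluster_enabled(primary_status) != cluster_enabled(standby_status)
```

If membership_mismatch returns True for the two status documents, cluster membership is likely disabled on only one server of the pair.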
Error scenario 4: "ReasonString": "Store.TotalMemSizeMB - CONFIG_ERROR" is issued
The value "ReasonString": "Store.TotalMemSizeMB - CONFIG_ERROR" in the HA status indicates that the store memory configuration differs between the two nodes and that, consequently, the nodes cannot form an HA pair. This error can arise, for example, when one node runs in a Docker container that has a controlled memory configuration and the other node is installed as an RPM on the host OS, where it uses all the memory that is available on the machine.
It is best practice to ensure that the two nodes in an HA pair are identical, particularly with regard to the amount of memory that is available to them.
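A monitoring script could recognize this condition from the HA status. The following Python sketch splits a ReasonString value into its property and reason parts. The function names are hypothetical, and the "property - reason" layout is an assumption based on the ReasonString examples in this topic.

```python
def split_reason(reason_string):
    """Split 'Property - REASON' into (property, reason).

    The property part can be empty, as in " - DISCOVERY_TIMEOUT".
    """
    text = reason_string or ""
    prop, separator, reason = text.rpartition(" - ")
    if not separator:
        # No " - " separator found: treat the whole string as the reason.
        return "", text.strip()
    return prop.strip(), reason.strip()


def is_store_memory_mismatch(status):
    """Return True if the HA status reports a Store.TotalMemSizeMB CONFIG_ERROR."""
    reason_string = status.get("HighAvailability", {}).get("ReasonString")
    prop, reason = split_reason(reason_string)
    return prop == "Store.TotalMemSizeMB" and reason == "CONFIG_ERROR"
```

The same split_reason helper also separates the values shown in the other scenarios, such as Cluster.EnableClusterMembership and CONFIG_ERROR.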