Charts stop working after a few hours (AMC 3.5.4) [Released]

We use AMC with a 1 second refresh and the charts on the homepage stop refreshing completely after a few hours.

Refreshing the page will draw the chart again but it’ll slowly disappear from left to right rather than updating from the right.

Same reaction in all browsers (Chrome, IE, Firefox, Opera) on Windows 8.1.

Could we get a screenshot and the version number for AMC?

AMC Enterprise 3.5.1 with 1 sec update interval.

The charts disappear from left to right until they’re blank and the totals counters are static. The Nodes section also is static. The Namespaces section does update. I can open up the AMC url to an IP if a dev wants access to view it. It’s running on its own t2.micro instance on AWS using the latest Amazon linux AMI. It runs nothing else, AMC is all that’s installed.

Hi,

Is this a regular issue? If so, can you please provide steps to reproduce it?

If you see this again, can you capture 3 to 4 consecutive responses from the following API calls when the graphs start disappearing:

http://<AMC_Address>/aerospike/service/clusters/<cluster_id>/throughput

and

http://<AMC_Address>/aerospike/service/clusters/<cluster_id>/nodes/<comma_separated_nodes_selection>/allstats?type=all

You can also get the responses to these calls from the Network panel of the browser developer tools.
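The consecutive responses requested above could also be captured with a short script instead of the browser. This is a minimal sketch in Python, assuming the AMC endpoints answer a plain HTTP GET with a JSON body and need no login step. `collect_responses`, its parameters, and the injectable `fetch` hook are illustrative names, not part of AMC.

```python
import json
import time
import urllib.request


def collect_responses(url, count=4, interval=1.0, fetch=None):
    """Fetch `url` `count` times, `interval` seconds apart; return parsed JSON bodies."""
    if fetch is None:
        # Default fetcher: plain GET, assumes no authentication is required.
        def fetch(u):
            with urllib.request.urlopen(u, timeout=10) as resp:
                return resp.read().decode("utf-8")

    samples = []
    for i in range(count):
        samples.append(json.loads(fetch(url)))
        if i < count - 1:
            # Wait between samples so they line up with the refresh interval.
            time.sleep(interval)
    return samples
```

For example, calling `collect_responses("http://<AMC_Address>/aerospike/service/clusters/<cluster_id>/throughput", count=4, interval=1.0)` would return four parsed samples taken one second apart, with the placeholders filled in for your installation.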

Yep, did some more research and if I leave IE open with that page, it seems to keep rendering fine.

However if I don’t leave any browsers running with the page open, after a few hours (not quite sure how many exactly) the charts and nodes section will not update.

http://ec2-54-175-71-113.compute-1.amazonaws.com:8081/aerospike/service/clusters/5c79f69c-d4c1-4322-9f85-23415dfc3d9c/throughput

{"cluster_status": "on", "write_tps": {"172.31.12.156:3000": {"y": 178.0, "x": 1421431897387, "secondary": 178.0}, "172.31.12.155:3000": {"y": 233.0, "x": 1421431897395, "secondary": 233.0}, "172.31.0.79:3000": {"y": 309.0, "x": 1421431897392, "secondary": 309.0}}, "read_tps": {"172.31.12.156:3000": {"y": 228.0, "x": 1421431897387, "secondary": 265.0}, "172.31.12.155:3000": {"y": 81.0, "x": 1421431897395, "secondary": 122.0}, "172.31.0.79:3000": {"y": 546.0, "x": 1421431897392, "secondary": 587.0}}}

{"cluster_status": "on", "write_tps": {"172.31.12.156:3000": {"y": 178.0, "x": 1421431897387, "secondary": 178.0}, "172.31.12.155:3000": {"y": 233.0, "x": 1421431897395, "secondary": 233.0}, "172.31.0.79:3000": {"y": 309.0, "x": 1421431897392, "secondary": 309.0}}, "read_tps": {"172.31.12.156:3000": {"y": 228.0, "x": 1421431897387, "secondary": 265.0}, "172.31.12.155:3000": {"y": 81.0, "x": 1421431897395, "secondary": 122.0}, "172.31.0.79:3000": {"y": 546.0, "x": 1421431897392, "secondary": 587.0}}}

{"cluster_status": "on", "write_tps": {"172.31.12.156:3000": {"y": 178.0, "x": 1421431897387, "secondary": 178.0}, "172.31.12.155:3000": {"y": 233.0, "x": 1421431897395, "secondary": 233.0}, "172.31.0.79:3000": {"y": 309.0, "x": 1421431897392, "secondary": 309.0}}, "read_tps": {"172.31.12.156:3000": {"y": 228.0, "x": 1421431897387, "secondary": 265.0}, "172.31.12.155:3000": {"y": 81.0, "x": 1421431897395, "secondary": 122.0}, "172.31.0.79:3000": {"y": 546.0, "x": 1421431897392, "secondary": 587.0}}}

http://ec2-54-175-71-113.compute-1.amazonaws.com:8081/aerospike/service/clusters/5c79f69c-d4c1-4322-9f85-23415dfc3d9c/xdr/3004/nodes/172.31.12.156:3000,172.31.0.79:3000,172.31.12.155:3000

{"172.31.12.156:3000": {"xdr_status": "off", "node_status": "on"}, "172.31.12.155:3000": {"xdr_status": "off", "node_status": "on"}, "172.31.0.79:3000": {"xdr_status": "off", "node_status": "on"}}

{"172.31.12.156:3000": {"xdr_status": "off", "node_status": "on"}, "172.31.12.155:3000": {"xdr_status": "off", "node_status": "on"}, "172.31.0.79:3000": {"xdr_status": "off", "node_status": "on"}}

{"172.31.12.156:3000": {"xdr_status": "off", "node_status": "on"}, "172.31.12.155:3000": {"xdr_status": "off", "node_status": "on"}, "172.31.0.79:3000": {"xdr_status": "off", "node_status": "on"}}

http://ec2-54-175-71-113.compute-1.amazonaws.com:8081/aerospike/service/clusters/5c79f69c-d4c1-4322-9f85-23415dfc3d9c/xdr/3004/nodes/172.31.12.156:3000,172.31.0.79:3000,172.31.12.155:3000/allstats?type=all

{"xdr_status": "off", "node_status": "off"}

{"xdr_status": "off", "node_status": "off"}

{"xdr_status": "off", "node_status": "off"}

Also another issue seems to be the “objects” count for each set on the Definitions page.

The counts always seem to be way under the actual count of objects (confirmed with a scan) and go down every time the cluster is changed (node added/removed).
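The three throughput captures in the post above are identical, including the `x` values, which suggests the server kept returning the same sample. A minimal sketch of checking for that automatically, assuming `x` is a per-node sample timestamp (inferred from the values, not confirmed by AMC documentation). `stale_nodes` is a hypothetical helper name.

```python
def stale_nodes(samples, metric="read_tps"):
    """Return the sorted nodes whose "x" timestamp is identical in every sample."""
    if len(samples) < 2:
        return []  # need at least two samples to compare
    first = samples[0].get(metric, {})
    stale = []
    for node, point in first.items():
        # A node is stale when no later sample carries a different timestamp.
        if all(
            s.get(metric, {}).get(node, {}).get("x") == point["x"]
            for s in samples[1:]
        ):
            stale.append(node)
    return sorted(stale)


# Trimmed copy of one captured response from the thread (read_tps only).
captured = {
    "cluster_status": "on",
    "read_tps": {
        "172.31.12.156:3000": {"y": 228.0, "x": 1421431897387, "secondary": 265.0},
        "172.31.12.155:3000": {"y": 81.0, "x": 1421431897395, "secondary": 122.0},
        "172.31.0.79:3000": {"y": 546.0, "x": 1421431897392, "secondary": 587.0},
    },
}

# Three identical captures, as posted: every node is reported as stale.
print(stale_nodes([captured, captured, captured]))
```

On a healthy AMC the timestamps would be expected to advance between samples, so the list would come back empty.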

Hi,

We were unable to reproduce the exact scenario in which the graphs disappear from left to right, and we are still debugging the issue. However, we did find a similar bug, which is fixed and due for release in the next build.

Meanwhile, can you check whether the same issue reproduces with the AMC refresh interval set to 5 seconds?

We will also verify the object counts on the Definitions page and get back to you soon.

Happens at both intervals.

If I keep the page open in IE, it stays working. If no windows are open, revisiting in a few hours leads to the charts/refresh issue. I can open access to the AMC site to Aerospike staff if anyone wants to see the effect live.

That won’t be necessary. We have identified the bug behind the chart refresh issue and are working to ship a fix in the next AMC release.

Installed the 3.5.2 release and this issue is resolved now (outside of the known issue of a straight line with a very high number when first viewing charts after inactivity).

Object counts in the definitions page are still off.

Hi,

Could you tell me about your setup, how the values are inserted, and which client you are using? The more information the better.

Thanks Petter

Here’s a look at one of the namespaces: Cookies - used for user profile data

We have a confirmed 100MM+ objects in total, spread across 2 different sets, but the Definitions page shows much smaller numbers. Replication factor is 2 with SSD storage on AWS, with 3 x c3.8xlarge instances and 1 x i2.xlarge instance in a single cluster across 4 availability zones, using mesh with the interval set to 150ms / 20x. All operations are done using the latest version of the .NET client. There are 2 secondary indexes in the namespace.

Hi,

Please walk me through how you confirmed this as I am trying to replicate the problem.

Thanks Petter

We exported all of the data from Aerospike into an offline system for some data analysis experiments. Unique keys were above 100MM.

It seems like the definitions page counts are corrupted when the cluster nodes change. We routinely rotate the actual underlying EC2 VMs, usually every 3-4 days a node will be completely replaced by adding a new instance then waiting for migrates to complete then removing an “older” one.

Running mesh across availability zones is in no way supported.

Let's compare the numbers on the dashboard to the ones from the command line.

Grab another screenshot and please compare that to the numbers from: asinfo -v stats | grep objects

Second, please show the output from asinfo -v sets

Also, is your client always specifying a set?

Thanks Petter

Same thing happens when the entire cluster is in the same availability zone. We have a decent chunk of data running on several high-end machines, and we lose the entire cost benefit if we have to set up an entire cluster in each AZ. The latency is under 1ms between AZs and we’ve had no problems with the expanded interval timing.

Hi,

Would it be possible for you to provide the data I requested?

Thanks

AMC 3.5.4 was just released, which should fix the very high TPS shown when viewing charts after inactivity.