Experiencing pauses in data when running AerospikeClient.scanNode

mharris · August 13, 2015, 6:03pm

I am experiencing an odd behavior when running scans using AerospikeClient.scanNode (Java API) to retrieve all records in a set. (Some background that may or may not be relevant: this code is running inside Hadoop MapReduce, with one mapper for each node.)

There is a long delay (1-3 minutes) before the first record is returned, then a period of 1-3 minutes where records are returned (at a rate of a few thousand per second), then a 1-3 minute delay, then 1-3 minutes of data, and so on.

There are no other jobs running. The cluster appears to be otherwise healthy, responding to a few thousand non-job related requests per second.

Does anyone have any suggestions for what may be causing this, or how to proceed investigating?

Thanks,
Marc

helipilot50 · August 14, 2015, 8:18pm

Hi Marc

You might have a heap or GC issue in your java application.

Are you using scanAll(), or query() without a filter, to retrieve the records?

regards Peter

mharris · August 17, 2015, 3:16pm

I am using scanNode().

I increased the heap (it was at 200MB) and the problem went away.

Unfortunately we will never know if this was the real problem, because between running with the smaller heap and the bigger heap, our system went through a rolling upgrade. This is our production system and I don’t have the ability to conduct experiments on it. Thanks for your help.

helipilot50 · August 17, 2015, 3:39pm

Hi Marc

If you use query() with NO filter it is the same as a scan(), but the advantage is that the records are returned through a RecordSet in a controlled manner. There is a blocking queue between your application and the records that the nodes are returning. When the queue fills, the nodes pause the scan jobs. When you read from the queue, the scan jobs resume.

It's a very controlled way of reading large RecordSets and it makes it easier to control the heap space requirements.

I hope this helps

Peter

mharris · August 17, 2015, 3:56pm

Interesting. We actually implemented a similar solution when using scan (using a blocking queue, which would block inside our implementation of the callback method) to cope with this exact problem. Perhaps we should have used query instead?
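
For readers following along, here is a minimal sketch of the pattern described above: a scan callback that blocks on a bounded queue so the producer cannot outrun the consumer. It uses only the JDK and does not talk to Aerospike. `RecordCallback`, `startFakeScan`, and the string "records" are hypothetical stand-ins for the client's scan callback and the real scan thread, and the capacity of 100 is an arbitrary assumption.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ScanBackpressureSketch {

    /** Hypothetical stand-in for a scan callback; NOT the Aerospike interface. */
    interface RecordCallback {
        void onRecord(String record) throws InterruptedException;
    }

    // Sentinel placed on the queue to tell the consumer the scan has finished.
    private static final String END_OF_SCAN = "__END_OF_SCAN__";

    /** Simulates a scan thread that pushes `total` records into the callback. */
    static Thread startFakeScan(int total, RecordCallback callback,
                                BlockingQueue<String> queue) {
        Thread t = new Thread(() -> {
            try {
                for (int i = 0; i < total; i++) {
                    callback.onRecord("record-" + i);
                }
                queue.put(END_OF_SCAN);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "fake-scan");
        t.start();
        return t;
    }

    /** Runs the sketch and returns {recordsConsumed, maxQueueDepthObserved}. */
    static int[] run(int total, int capacity) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);

        // put() blocks while the queue is full, which stalls the scan thread
        // until the consumer catches up. That is the backpressure.
        RecordCallback callback = queue::put;

        Thread scan = startFakeScan(total, callback, queue);

        int consumed = 0;
        int maxDepth = 0;
        while (true) {
            maxDepth = Math.max(maxDepth, queue.size());
            String record = queue.take();   // blocks while the queue is empty
            if (END_OF_SCAN.equals(record)) {
                break;
            }
            consumed++;                     // real code would process the record here
        }
        scan.join();
        return new int[] {consumed, maxDepth};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] result = run(10_000, 100);
        System.out.println("consumed=" + result[0] + " maxQueueDepth=" + result[1]);
    }
}
```

The queue depth can never exceed its capacity, so the number of records held in memory at once is bounded no matter how fast the producer runs. The trade-off is that a slow consumer stalls the producing thread rather than growing the heap.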

helipilot50 · August 20, 2015, 11:48am

It’s up to you, but query controls the jobs running on each node via the protocol, so it is my favorite. Be sure to close the record set when you are finished with it.

Regards

Peter
