Scanning all records in a set


#1

Hi, I have around 6M records in a set. I would like to loop through them and perform certain actions (use each as a base record to create an index elsewhere). At first, I got a timeout error. OK, let’s increase the timeout to 300 seconds. Still failing after ~30s. OK, let’s try concurrent scanning. Still failing with the timeout error. OK, let’s ignore bins: bingo, it’s scanning through the records, but… how do I get the actual data? Accumulate keys in batches and use getMany?

Is this the intended way of handling things? Are there any alternatives?
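
For readers wondering what the "accumulate keys and batch-read" idea could look like, here is a minimal sketch in plain JavaScript. It does not call the Aerospike client at all: `batches`, `processAll`, `fetchMany`, and `handle` are hypothetical names introduced for illustration, with `fetchMany` standing in for whatever batch read the client offers (for example `getMany` in the PHP client). It assumes the keys come from a scan that skips bins, and that a record may change or disappear between the scan and the batch read.

```javascript
// Hypothetical helper: groups keys (e.g. from a no-bins scan) into
// fixed-size batches without holding all batches in memory at once.
function* batches(keys, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  let batch = [];
  for (const key of keys) {
    batch.push(key);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) {
    yield batch; // final, possibly shorter, batch
  }
}

// Hypothetical driver: fetchMany is a placeholder for the client's
// batch read; handle is the per-record action (e.g. building the index).
async function processAll(keys, size, fetchMany, handle) {
  let count = 0;
  for (const batch of batches(keys, size)) {
    const records = await fetchMany(batch);
    for (const record of records) {
      await handle(record);
      count += 1;
    }
  }
  return count;
}
```

Reading one batch at a time keeps each request short, so no single call has to stay open for the duration of the whole set. The trade-off is an extra round trip per batch compared with a scan that returns bins directly.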


#2

OK, it failed miserably even in that case:

An error occured while scanning[9] Timeout: timeout=999999 iterations=1 failedNodes=0 failedConns=0 3562595 records found

So… How do I fetch ALL my records?


#3

Are you trying to do this through AQL? What are the server and client versions? How many hops are there between the calling client application and the Aerospike server? Does the client system show that it’s hitting upper limits of CPU, memory, or network?


#4

  1. What version of the server?
  2. Which error is raised by the client?

#5

Latest PHP client, not AQL. No hops; they are on the same network (same AWS zone, same VPC).

I use an r3.4xlarge instance, which should be good enough. No replication at the moment.

3.14 server.

Error is shown above:

An error occured while scanning[9] Timeout: timeout=999999 iterations=1 failedNodes=0 failedConns=0

I tried querying the same Aerospike set using Node.js and it worked like a charm: all 8M records were processed with no issues.


#6

It could be that the PHP client and your code are taking too long to process the records, causing the socket to time out.

With server 3.14, you should be able to set the socket timeout for a scan. This has been possible since version 3.12.1:

[AER-5510] - (SCAN) Write idle-time-out now configurable from client.

I am not sure if the PHP client supports setting this property on a scan, but I would hope it does. Otherwise, you typically have only a 10-second timeout at the socket level, after which the server will close the connection. I believe this is different from the read timeout you are passing. (You can confirm this is happening by checking the server-side logs.)


#7

Thank you. I’ve coded a prototype in Node.js that I’m happy with: consistent read speed and no lost WRITEs.

So I probably have to blame the PHP client for Aerospike; it didn’t work for my case…


#8

Release 7.0.2 of the PHP client adds the necessary Aerospike::OPT_SOCKET_TIMEOUT param to the options for the query and scan methods.