Aerospike-hadoop connector closes prematurely during scan


#1

We have 6 Aerospike nodes and are using the aerospike-hadoop connector to scan all the records. After scanning about 86,000 records, the Aerospike log says the scan job is done, even though there are many more records left to scan. Here is an example log:

Jan 13 2015 20:08:26 GMT: INFO (scan): (thr_tscan.c::1188) tid 4: scan send response error returned -1 errno 104 fd 269

Jan 13 2015 20:08:26 GMT: INFO (scan): (thr_tscan.c::277) SCAN JOB DONE [id =4: ns= user_profile_store set=user_profile scanned=86462 expired=0 set_diff=0 elapsed=14602 (ms)]

Jan 13 2015 20:08:26 GMT: INFO (scan): (thr_tscan.c::1218) tid 4: no more fh. Probably client closed up connection
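
For anyone triaging similar output: on Linux, errno 104 is ECONNRESET (connection reset by peer), which fits the "client closed up connection" message. Below is a minimal sketch for pulling the counters out of a SCAN JOB DONE line so they can be compared across nodes. It assumes the line format is exactly as shown in the log above; the helper name parse_scan_done is hypothetical and not part of any Aerospike tool.

```python
import re

# Pattern written against the single SCAN JOB DONE line quoted above.
# Other server versions may format this line differently (assumption).
SCAN_DONE = re.compile(
    r"SCAN JOB DONE \[id =(?P<id>\d+): ns= (?P<ns>\S+) set=(?P<set>\S+) "
    r"scanned=(?P<scanned>\d+) expired=(?P<expired>\d+) "
    r"set_diff=(?P<set_diff>\d+) elapsed=(?P<elapsed>\d+) \(ms\)\]"
)

def parse_scan_done(line):
    """Return the fields of a SCAN JOB DONE line as a dict, or None if no match."""
    m = SCAN_DONE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    # Convert the numeric counters from strings to integers.
    for key in ("id", "scanned", "expired", "set_diff", "elapsed"):
        fields[key] = int(fields[key])
    return fields

line = (
    "Jan 13 2015 20:08:26 GMT: INFO (scan): (thr_tscan.c::277) SCAN JOB DONE "
    "[id =4: ns= user_profile_store set=user_profile scanned=86462 expired=0 "
    "set_diff=0 elapsed=14602 (ms)]"
)
info = parse_scan_done(line)
print(info["ns"], info["set"], info["scanned"], info["elapsed"])
```

Running this over the log of each server node would show whether every node stopped early or only some of them did.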

Is there an issue with the connector or did we not set it up correctly?

Thanks, Phu


#2

Phu,

I’m afraid I’ve got some questions before I can investigate …

Can you tell me the version of aerospike-hadoop you are using? I think typing the following command at the top of your aerospike-hadoop tree will tell me what I need to know:

cd aerospike-hadoop
git rev-parse --short HEAD

It would also be helpful to know the versions of the aerospike server and client.

Finally, you mentioned that the aerospike cluster contained 6 nodes. Is the hadoop job running on the same cluster or on different machines? How many hadoop nodes are there?

Thanks in advance!

Ken Sedgwick ken@aerospike.com


#3

Hey Ken,

Thanks for your reply. I built the jar from master and this is what the command returns:

git rev-parse --short HEAD

b43ac6f

Server version is 3.3.21 and client version is 3.0.28

Hadoop is running on different machines and there are 12 hadoop nodes.

Hope this helps.

Thanks, Phu


#4

Phu,

Thanks, looks like you’ve got the latest version. I’ve got an idea what might be going on and will attempt to reproduce.

Ken


#5

Phu,

I ran some tests using more hadoop nodes than aerospike nodes. I didn’t see the problem that you encountered.

Is there any chance you could send me logs containing the aerospike debugging from your run? Specifically, I’d like to confirm the split creation lines that look like this:

15/01/14 14:22:11 INFO aggregateintinput.AggregateIntInput: run starting on bin bin1
15/01/14 14:22:12 INFO client.RMProxy: Connecting to ResourceManager at as0/192.168.1.23:8032
15/01/14 14:22:14 INFO mapreduce.AerospikeConfigUtil: using aerospike.input.operation = scan
15/01/14 14:22:14 INFO mapreduce.AerospikeConfigUtil: using aerospike.input.host = localhost
15/01/14 14:22:14 INFO mapreduce.AerospikeConfigUtil: using aerospike.input.port = 3000
15/01/14 14:22:14 INFO mapreduce.AerospikeConfigUtil: using aerospike.input.namespace = test
15/01/14 14:22:14 INFO mapreduce.AerospikeConfigUtil: using aerospike.input.setname = integers
15/01/14 14:22:14 INFO mapreduce.AerospikeConfigUtil: using aerospike.input.binnames = bin1
15/01/14 14:22:14 INFO mapreduce.AerospikeInputFormat: using: localhost 3000 test integers
15/01/14 14:22:15 INFO mapreduce.AerospikeInputFormat: found 2 nodes
15/01/14 14:22:15 INFO mapreduce.AerospikeInputFormat: split: scan:BB9E4C91D192200:192.168.1.23:3000:test:integers
15/01/14 14:22:15 INFO mapreduce.AerospikeInputFormat: split: scan:BB9EBA9CC192200:192.168.1.17:3000:test:integers
15/01/14 14:22:15 INFO mapreduce.JobSubmitter: number of splits:2
15/01/14 14:22:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1413483527218_0046
15/01/14 14:22:15 INFO impl.YarnClientImpl: Submitted application application_1413483527218_0046
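
A quick way to check the split lines in a long job log is to compare the "found N nodes" count with the number of "split:" lines. In the sample above there is one split per node found. This is a minimal sketch that assumes the log format shown above; the function name summarize_splits and the field names given to the colon-separated parts of a split are illustrative guesses, not names taken from the connector.

```python
import re

# Patterns written against the sample log lines above (assumed format).
FOUND_NODES = re.compile(r"AerospikeInputFormat: found (\d+) nodes")
SPLIT = re.compile(
    r"AerospikeInputFormat: split: "
    r"(?P<op>\w+):(?P<node>[0-9A-F]+):(?P<host>[^:]+):(?P<port>\d+):"
    r"(?P<ns>[^:]+):(?P<set>\S+)"
)

def summarize_splits(log_text):
    """Return (nodes_found, splits) extracted from a hadoop job log."""
    nodes_found = None
    splits = []
    for line in log_text.splitlines():
        m = FOUND_NODES.search(line)
        if m:
            nodes_found = int(m.group(1))
            continue
        m = SPLIT.search(line)
        if m:
            splits.append(m.groupdict())
    return nodes_found, splits

sample = """\
15/01/14 14:22:15 INFO mapreduce.AerospikeInputFormat: found 2 nodes
15/01/14 14:22:15 INFO mapreduce.AerospikeInputFormat: split: scan:BB9E4C91D192200:192.168.1.23:3000:test:integers
15/01/14 14:22:15 INFO mapreduce.AerospikeInputFormat: split: scan:BB9EBA9CC192200:192.168.1.17:3000:test:integers
"""
nodes_found, splits = summarize_splits(sample)
hosts = [s["host"] for s in splits]
print(nodes_found, hosts)
```

If the number of splits does not match the number of nodes found, or a host is missing from the list, that is worth including when posting the logs.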

Regards,

Ken


#6

Hey Ken,

We have found a workaround for this issue. I appreciate your time and help with this issue!

Thanks, Phu


#7

Phu,

Glad to hear you got around the problem! What was the workaround?

Regards,

Ken