ScanAll and aql select timeout!

Hey there,

I’m getting timeouts (no matter what TIMEOUT is set to) when executing a scan.

Using Go:

Just gets to: “Printing…”
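(The gist itself isn't reproduced in this thread. For context, here is a rough sketch of the kind of scan the output below suggests, assuming the channel-based Recordset API of the 1.x Go client; the host address is a placeholder.)

    package main

    import (
        "fmt"
        "log"

        as "github.com/aerospike/aerospike-client-go"
    )

    func main() {
        fmt.Println("Connecting...")
        client, err := as.NewClient("1.2.3.4", 3000) // placeholder address
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        fmt.Println("Scanning...")
        // nil policy -> client defaults (a 1000 ms total timeout at the time of this thread)
        recordset, err := client.ScanAll(nil, "test", "demo")
        if err != nil {
            log.Fatal(err)
        }

        fmt.Println("Printing...")
        for {
            select {
            case rec, open := <-recordset.Records:
                if !open { // channel closed: the scan has finished
                    fmt.Println("done.")
                    return
                }
                fmt.Println(rec.Bins)
            case err := <-recordset.Errors:
                if err != nil {
                    log.Println("scan error:", err)
                }
            }
        }
    }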

Using AQL:

INSERT INTO test.demo (PK, foo, bar) VALUES ('key1', 123, 'abc')
SHOW SETS
+-----------+------------------+----------------+----------------+---------+----------+------------+
| n_objects | disable-eviction | set-enable-xdr | n-bytes-memory | ns_name | set_name | set-delete |
+-----------+------------------+----------------+----------------+---------+----------+------------+
| 1         | "false"          | "use-default"  | 34             | "test"  | "demo"   | "false"    |
+-----------+------------------+----------------+----------------+---------+----------+------------+

SELECT * FROM test.demo
Error: (9) Timeout: timeout=1000 iterations=1 failedNodes=0 failedConns=0

SELECT * FROM test.demo WHERE PK = 'key1'
Specifying the key works fine!

Why does fetching a single record work while scanning all of them doesn't?

Your gist doesn't show you setting a timeout explicitly, so it's using the default of 1000 ms, and your AQL query likewise fails with the default timeout value. In both cases, set a higher timeout; your data apparently needs more than the default.

Getting a single record is much faster because Aerospike works like a distributed hash table, so a single-key lookup is O(1). A scan, however, has to walk all 4096 logical partitions of the namespace, checking each record in them against the set you specified.
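For reference, here is a minimal sketch of raising the scan timeout explicitly with the 1.x Go client (the 60-second value is just an illustration); note that ScanPolicy.Timeout is a time.Duration:

    // Assumes client is an *as.Client and that "time" plus
    // as "github.com/aerospike/aerospike-client-go" are imported.
    policy := as.NewScanPolicy()
    policy.Timeout = 60 * time.Second // total allowance for the whole scan

    recordset, err := client.ScanAll(policy, "test", "demo")
    if err != nil {
        log.Fatal(err)
    }
    _ = recordset // drain recordset.Records / recordset.Errors as usual

(In aql the equivalent is SET TIMEOUT, which takes a value in milliseconds.)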


Hey @rbotzer thanks for the quick reply!
As I mentioned at the beginning of my original post: “I’m getting timeouts (no matter what TIMEOUT is set to) when executing a scan”

I’m sorry for not explicitly adding it to the Gist (that was my bad), just updated it. The same goes for the AQL part.

The output for the Golang app:

go run main.go
Connecting...
Scanning...
Printing...
done.

Adding the timeout here worked, but it didn't get or print any data at all.

And for the AQL part:

aql> SET TIMEOUT 60000
aql> SELECT * FROM test.demo
Error: (9) Timeout: timeout=60000 iterations=1 failedNodes=0 failedConns=0

Now, a key point here is that test.demo has only 1 record, as you can see under n_objects in my original post.

So, I understand your point about the lookup complexity for a single key vs a scan, but 60 seconds should be enough? :slight_smile:

Any other ideas why this might be happening?

Well, something is going on, because that's plenty of time. In general, scans of small sets (such as this one) take a long time relative to the amount of data, because of the overhead of walking all 4096 partitions; that cost amortizes well over a large set of data. This will be improved in future releases.

Still, something is wrong here. How many nodes are in your cluster, and which server and client versions are you using?

Hey @rbotzer I’m pasting the full asinfo output just in case:

asinfo -h <node_ip> -p 3000
1 :  node
     BB9D3BD70664302
2 :  statistics
     cluster_size=2;cluster_key=A607AFBCBFB508AF;cluster_integrity=true;objects=3753;sub-records=0;total-bytes-disk=25769803776;used-bytes-disk=17360640;free-pct-disk=99;total-bytes-memory=15032385536;used-bytes-memory=240226;data-used-bytes-memory=34;index-used-bytes-memory=240192;sindex-used-bytes-memory=0;free-pct-memory=99;stat_read_reqs=32273;stat_read_reqs_xdr=0;stat_read_success=26998;stat_read_errs_notfound=5275;stat_read_errs_other=0;stat_write_reqs=3852;stat_write_reqs_xdr=0;stat_write_success=3818;stat_write_errs=34;stat_xdr_pipe_writes=0;stat_xdr_pipe_miss=0;stat_delete_success=18;stat_rw_timeout=0;udf_read_reqs=0;udf_read_success=0;udf_read_errs_other=0;udf_write_reqs=0;udf_write_success=0;udf_write_err_others=0;udf_delete_reqs=0;udf_delete_success=0;udf_delete_err_others=0;udf_lua_errs=22728;udf_scan_rec_reqs=22728;udf_query_rec_reqs=22728;udf_replica_writes=0;stat_proxy_reqs=34;stat_proxy_reqs_xdr=0;stat_proxy_success=34;stat_proxy_errs=0;stat_ldt_proxy=0;stat_cluster_key_trans_to_proxy_retry=0;stat_cluster_key_transaction_reenqueue=0;stat_slow_trans_queue_push=0;stat_slow_trans_queue_pop=0;stat_slow_trans_queue_batch_pop=0;stat_cluster_key_regular_processed=0;stat_cluster_key_prole_retry=0;stat_cluster_key_err_ack_dup_trans_reenqueue=0;stat_cluster_key_partition_transaction_queue_count=0;stat_cluster_key_err_ack_rw_trans_reenqueue=0;stat_expired_objects=0;stat_evicted_objects=0;stat_deleted_set_objects=0;stat_evicted_objects_time=0;stat_zero_bin_records=0;stat_nsup_deletes_not_shipped=0;err_tsvc_requests=108;err_tsvc_requests_timeout=0;err_out_of_space=0;err_duplicate_proxy_request=0;err_rw_request_not_found=0;err_rw_pending_limit=0;err_rw_cant_put_unique=0;fabric_msgs_sent=153813;fabric_msgs_rcvd=153798;paxos_principal=BB9D3BD70664302;migrate_msgs_sent=80516;migrate_msgs_recv=146055;migrate_progress_send=0;migrate_progress_recv=0;migrate_num_incoming_accepted=32768;migrate_num_incoming_refused=0;queue=0;transactions=3007289;reaped_fds=643;scans_active=43;basic_scans_succeeded=417;basic_scans_failed=40;aggr_scans_succeeded=88;aggr_scans_failed=33;udf_bg_scans_succeeded=12;udf_bg_scans_failed=0;batch_index_initiate=0;batch_index_queue=0:0,0:0,0:0,0:0;batch_index_complete=0;batch_index_timeout=0;batch_index_errors=0;batch_index_unused_buffers=0;batch_initiate=0;batch_queue=0;batch_tree_count=0;batch_timeout=0;batch_errors=0;info_queue=0;delete_queue=0;proxy_in_progress=0;proxy_initiate=34;proxy_action=0;proxy_retry=0;proxy_retry_q_full=0;proxy_unproxy=0;proxy_retry_same_dest=0;proxy_retry_new_dest=0;write_master=3852;write_prole=3750;read_dup_prole=0;rw_err_dup_internal=0;rw_err_dup_cluster_key=0;rw_err_dup_send=0;rw_err_write_internal=0;rw_err_write_cluster_key=0;rw_err_write_send=0;rw_err_ack_internal=0;rw_err_ack_nomatch=0;rw_err_ack_badnode=0;client_connections=53;waiting_transactions=0;tree_count=0;record_refs=3753;record_locks=0;migrate_tx_objs=0;migrate_rx_objs=0;ongoing_write_reqs=0;err_storage_queue_full=0;partition_actual=8288;partition_replica=8096;partition_desync=0;partition_absent=0;partition_object_count=3753;partition_ref_count=16388;system_free_mem_pct=98;sindex_ucgarbage_found=0;sindex_gc_locktimedout=0;sindex_gc_inactivity_dur=0;sindex_gc_activity_dur=0;sindex_gc_list_creation_time=0;sindex_gc_list_deletion_time=0;sindex_gc_objects_validated=0;sindex_gc_garbage_found=0;sindex_gc_garbage_cleaned=0;system_swapping=false;err_replica_null_node=0;err_replica_non_null_node=0;err_sync_copy_null_master=0;storage_defrag_corrupt_record=0;err_write_fail_prole_unknown=0
;err_write_fail_prole_generation=0;err_write_fail_unknown=0;err_write_fail_key_exists=3;err_write_fail_generation=0;err_write_fail_generation_xdr=0;err_write_fail_bin_exists=0;err_write_fail_parameter=28;err_write_fail_incompatible_type=0;err_write_fail_noxdr=0;err_write_fail_prole_delete=0;err_write_fail_not_found=3;err_write_fail_key_mismatch=0;err_write_fail_record_too_big=0;err_write_fail_bin_name=0;err_write_fail_bin_not_found=0;err_write_fail_forbidden=0;stat_duplicate_operation=0;uptime=9090481;stat_write_errs_notfound=3;stat_write_errs_other=31;heartbeat_received_self=0;heartbeat_received_foreign=72665742;query_reqs=2;query_success=0;query_fail=2;query_abort=0;query_avg_rec_count=0;query_short_running=0;query_long_running=0;query_short_queue_full=0;query_long_queue_full=0;query_short_reqs=0;query_long_reqs=0;query_agg=0;query_agg_success=0;query_agg_err=0;query_agg_abort=0;query_agg_avg_rec_count=0;query_lookups=0;query_lookup_success=0;query_lookup_err=0;query_lookup_abort=0;query_lookup_avg_rec_count=0
3 :  features
     float;batch-index;replicas-all;replicas-master;replicas-prole;udf
4 :  cluster-generation
     5
5 :  partition-generation
     48985
6 :  edition
     Aerospike Community Edition
7 :  version
     Aerospike Community Edition build 3.6.3
8 :  build
     3.6.3
9 :  services
     1.2.3.4:3000
10 :  services-alumni
     1.2.3.4:3000

So:

  • cluster_size=2
  • Aerospike Community Edition build 3.6.3
  • Aerospike Tools 3.8.2 (aql)
  • 1.13.0 Go Client Library (Golang apps)

A quick note: your Go client version is the latest and expects a newer server. If you look at the release notes you'll see that several features require a server version newer than 3.6.

You have a 2-node cluster, so it probably would not be too difficult for you to upgrade to the latest server, currently 3.8.2.3. Let's see if the problem persists after that.

Hey @rbotzer, I just deployed the new version of the server. The aql SELECT now works!

INSERT INTO test.demo (PK, foo, bar) VALUES ('key1', 123, 'abc')

SHOW SETS

+-----------+------------------+----------------+----------------+---------+----------+------------+
| n_objects | disable-eviction | set-enable-xdr | n-bytes-memory | ns_name | set_name | set-delete |
+-----------+------------------+----------------+----------------+---------+----------+------------+
| 1         | "false"          | "use-default"  | 34             | "test"  | "demo"   | "false"    |
+-----------+------------------+----------------+----------------+---------+----------+------------+

aql> SELECT * FROM test.demo
+-----+-------+
| foo | bar   |
+-----+-------+
| 123 | "abc" |
+-----+-------+
1 row in set (0.677 secs)

I still need to do a “real” test from the Go client once all of the data from the old cluster is imported. But this works so far.

Also, something interesting happened when trying to do a ScanAll from test.demo. If I do: championSet, err := client.ScanAll(nil, "test", "demo") it works as expected, printing:

go run main.go
Connecting...
Scanning...
Printing...
da
here?!
"test:demo::ec 91 19 2d 4b 7f 8c e3 5d 5d 78 d3 4b ca 65 cb aa aa c9 60 "
{
  "foo": 123,
  "bar": "abc",
}
done.

But if I create a new scan policy and specify a large timeout value, I get a timeout! For example:

policy := as.NewScanPolicy()
policy.Timeout = 60000

championSet, err := client.ScanAll(policy, "test", "demo")

produces: 2016/05/13 15:29:49 Error processing results: read tcp 10.222.222.222:60118->1.2.3.4:3000: i/o timeout.

I have no idea why this is happening. I will post again once I import the data from the old cluster and test everything, to confirm whether the upgrade solved the problem.

Thanks a lot for your help!

Hi, you are giving the scan command only 60 microseconds to complete, which is way too short.

Use:

    policy.Timeout = 60000 * time.Millisecond

and it will perform properly.

Hey @Khosrow_Afroozeh, I'm not sure what you mean. By setting policy.Timeout = 60000 I'm effectively giving the scan 60 seconds to complete. Setting a 60-second timeout works properly after the version upgrade; before the upgrade, even 60 seconds wasn't enough. My theory is that maybe something broke? Anyway, it works now :slight_smile:

Thanks for the input though!

60000 in Go's time units (nanoseconds) == 60 microseconds.

To set the timeout to 6 seconds you need 6 * time.Second, which equals 6 * 1e9 nanoseconds.
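A quick illustration of the unit arithmetic (a sketch; the values are just examples):

    // Assumes as "github.com/aerospike/aerospike-client-go" and "time" are imported.
    // time.Duration counts nanoseconds, so a bare integer is rarely what you want.
    policy := as.NewScanPolicy()

    policy.Timeout = 60000                    // 60,000 ns == 60 microseconds (far too short)
    policy.Timeout = 60000 * time.Millisecond // 60 seconds, what was intended
    policy.Timeout = 60 * time.Second         // also 60 seconds, and clearer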

Hey @Khosrow_Afroozeh you are absolutely right!
I keep forgetting about time.Duration being nanoseconds :frowning2:

Thank you!!!

Just to update everyone and close the thread: everything is working as expected after the update.
The Go snippet (with the fixes noted by @Khosrow_Afroozeh) and the aql queries are working smoothly!

Thanks everyone! :slight_smile:

Hi, if I want to scan a small set, is it better to create a secondary index instead of using ScanAll with a timeout?

It depends on the size of the set relative to the size of the namespace it is in. If it’s very small compared to the namespace, you can do one of these:

  • Create a separate namespace for that small set.
  • Create a secondary index over a redundant bin that identifies the set (for example, an indexed single-character bin marking those records as members of that specific set); see the sketch below.
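A hedged sketch of the second option with the 1.x Go client; the bin name set_flag and index name idx_demo_flag are made up for illustration, and Statement.Addfilter / NewEqualFilter are the 1.x way of attaching an equality filter:

    // One-time setup: index the redundant "set_flag" bin (names are illustrative).
    task, err := client.CreateIndex(nil, "test", "demo", "idx_demo_flag", "set_flag", as.STRING)
    if err != nil {
        log.Fatal(err)
    }
    if err := <-task.OnComplete(); err != nil { // wait for the index build to finish
        log.Fatal(err)
    }

    // Query only the records tagged as members of the small set.
    stmt := as.NewStatement("test", "demo")
    stmt.Addfilter(as.NewEqualFilter("set_flag", "demo"))

    recordset, err := client.Query(nil, stmt)
    if err != nil {
        log.Fatal(err)
    }
    for rec := range recordset.Records { // error-channel handling omitted for brevity
        fmt.Println(rec.Bins)
    }

Whether this beats a plain ScanAll still depends on how small the set is relative to the namespace, as noted above.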