ScanAll and aql select timeout!

Hey there,

I’m getting timeouts (no matter what TIMEOUT is set to) when executing a scan.

Using Go:

Just gets to: “Printing…”
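(The gist itself isn't reproduced in this thread. For context, here is a rough sketch of the kind of scan the output below suggests, assuming the channel-based Recordset API of the 1.x Go client; the host address is a placeholder.)

    package main

    import (
        "fmt"
        "log"

        as "github.com/aerospike/aerospike-client-go"
    )

    func main() {
        fmt.Println("Connecting...")
        client, err := as.NewClient("1.2.3.4", 3000) // placeholder address
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        fmt.Println("Scanning...")
        // nil policy -> client defaults (a 1000 ms total timeout at the time of this thread)
        recordset, err := client.ScanAll(nil, "test", "demo")
        if err != nil {
            log.Fatal(err)
        }

        fmt.Println("Printing...")
        for {
            select {
            case rec, open := <-recordset.Records:
                if !open { // channel closed: the scan has finished
                    fmt.Println("done.")
                    return
                }
                fmt.Println(rec.Bins)
            case err := <-recordset.Errors:
                if err != nil {
                    log.Println("scan error:", err)
                }
            }
        }
    }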

Using AQL:

INSERT INTO test.demo (PK, foo, bar) VALUES ('key1', 123, 'abc')
SHOW SETS
+-----------+------------------+----------------+----------------+---------+----------+------------+
| n_objects | disable-eviction | set-enable-xdr | n-bytes-memory | ns_name | set_name | set-delete |
+-----------+------------------+----------------+----------------+---------+----------+------------+
| 1         | "false"          | "use-default"  | 34             | "test"  | "demo"   | "false"    |
+-----------+------------------+----------------+----------------+---------+----------+------------+

SELECT * FROM test.demo
Error: (9) Timeout: timeout=1000 iterations=1 failedNodes=0 failedConns=0

SELECT * FROM test.demo WHERE PK = 'key1'
Specifying the key works fine!

Why does fetching a single record work while scanning all of them doesn't?

Your gist doesn't show you setting a timeout explicitly, so it's using the default of 1000 ms, and your AQL query likewise fails with the default timeout value. In both cases, set a higher timeout; your data apparently needs more than the default.

Getting a single record is much faster because Aerospike works like a distributed hash table, so a single-key lookup is O(1). A scan, however, has to walk all 4096 logical partitions of the namespace, checking each record in them against the set you specified.
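For reference, here is a minimal sketch of raising the scan timeout explicitly with the 1.x Go client (the 60-second value is just an illustration); note that ScanPolicy.Timeout is a time.Duration:

    // Assumes client is an *as.Client and that "time" plus
    // as "github.com/aerospike/aerospike-client-go" are imported.
    policy := as.NewScanPolicy()
    policy.Timeout = 60 * time.Second // total allowance for the whole scan

    recordset, err := client.ScanAll(policy, "test", "demo")
    if err != nil {
        log.Fatal(err)
    }
    _ = recordset // drain recordset.Records / recordset.Errors as usual

(In aql the equivalent is SET TIMEOUT, which takes a value in milliseconds.)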


Hey @rbotzer thanks for the quick reply!
As I mentioned at the beginning of my original post: “I’m getting timeouts (no matter what TIMEOUT is set to) when executing a scan”

I’m sorry for not explicitly adding it to the Gist (that was my bad), just updated it. The same goes for the AQL part.

The output for the Golang app:

go run main.go
Connecting...
Scanning...
Printing...
done.

Adding the timeout here worked, but it didn't get or print any data at all.

And for the AQL part:

aql> SET TIMEOUT 60000
aql> SELECT * FROM test.demo
Error: (9) Timeout: timeout=60000 iterations=1 failedNodes=0 failedConns=0

Now, a key point here is that test.demo has only 1 record, as you can see under n_objects in my original post.

So, I understand your point about the lookup complexity for a single key vs a scan, but 60 seconds should be enough? :slight_smile:

Any other ideas why this might be happening?

Well, something is going on, because that's plenty of time. In general, scans of small sets (such as this one) take a long time relative to the amount of data, because of the overhead of walking all 4096 partitions; that cost amortizes well over a large set of data. This will be improved in future releases.

Still, something is wrong here. How many nodes are in your cluster, and which server and client versions are you using?

Hey @rbotzer I’m pasting the full asinfo output just in case:

asinfo -h <node_ip> -p 3000
1 :  node
     BB9D3BD70664302
2 :  statistics
     cluster_size=2;cluster_key=A607AFBCBFB508AF;cluster_integrity=true;objects=3753;sub-records=0;total-bytes-disk=25769803776;used-bytes-disk=17360640;free-pct-disk=99;total-bytes-memory=15032385536;used-bytes-memory=240226;data-used-bytes-memory=34;index-used-bytes-memory=240192;sindex-used-bytes-memory=0;free-pct-memory=99;stat_read_reqs=32273;stat_read_reqs_xdr=0;stat_read_success=26998;stat_read_errs_notfound=5275;stat_read_errs_other=0;stat_write_reqs=3852;stat_write_reqs_xdr=0;stat_write_success=3818;stat_write_errs=34;stat_xdr_pipe_writes=0;stat_xdr_pipe_miss=0;stat_delete_success=18;stat_rw_timeout=0;udf_read_reqs=0;udf_read_success=0;udf_read_errs_other=0;udf_write_reqs=0;udf_write_success=0;udf_write_err_others=0;udf_delete_reqs=0;udf_delete_success=0;udf_delete_err_others=0;udf_lua_errs=22728;udf_scan_rec_reqs=22728;udf_query_rec_reqs=22728;udf_replica_writes=0;stat_proxy_reqs=34;stat_proxy_reqs_xdr=0;stat_proxy_success=34;stat_proxy_errs=0;stat_ldt_proxy=0;stat_cluster_key_trans_to_proxy_retry=0;stat_cluster_key_transaction_reenqueue=0;stat_slow_trans_queue_push=0;stat_slow_trans_queue_pop=0;stat_slow_trans_queue_batch_pop=0;stat_cluster_key_regular_processed=0;stat_cluster_key_prole_retry=0;stat_cluster_key_err_ack_dup_trans_reenqueue=0;stat_cluster_key_partition_transaction_queue_count=0;stat_cluster_key_err_ack_rw_trans_reenqueue=0;stat_expired_objects=0;stat_evicted_objects=0;stat_deleted_set_objects=0;stat_evicted_objects_time=0;stat_zero_bin_records=0;stat_nsup_deletes_not_shipped=0;err_tsvc_requests=108;err_tsvc_requests_timeout=0;err_out_of_space=0;err_duplicate_proxy_request=0;err_rw_request_not_found=0;err_rw_pending_limit=0;err_rw_cant_put_unique=0;fabric_msgs_sent=153813;fabric_msgs_rcvd=153798;paxos_principal=BB9D3BD70664302;migrate_msgs_sent=80516;migrate_msgs_recv=146055;migrate_progress_send=0;migrate_progress_recv=0;migrate_num_incoming_accepted=32768;migrate_num_incoming_refused=0;queue=0;transactions=3007289;reaped_fds=643;scans_active=43;basic_scans_succeeded=417;basic_scans_failed=40;aggr_scans_succeeded=88;aggr_scans_failed=33;udf_bg_scans_succeeded=12;udf_bg_scans_failed=0;batch_index_initiate=0;batch_index_queue=0:0,0:0,0:0,0:0;batch_index_complete=0;batch_index_timeout=0;batch_index_errors=0;batch_index_unused_buffers=0;batch_initiate=0;batch_queue=0;batch_tree_count=0;batch_timeout=0;batch_errors=0;info_queue=0;delete_queue=0;proxy_in_progress=0;proxy_initiate=34;proxy_action=0;proxy_retry=0;proxy_retry_q_full=0;proxy_unproxy=0;proxy_retry_same_dest=0;proxy_retry_new_dest=0;write_master=3852;write_prole=3750;read_dup_prole=0;rw_err_dup_internal=0;rw_err_dup_cluster_key=0;rw_err_dup_send=0;rw_err_write_internal=0;rw_err_write_cluster_key=0;rw_err_write_send=0;rw_err_ack_internal=0;rw_err_ack_nomatch=0;rw_err_ack_badnode=0;client_connections=53;waiting_transactions=0;tree_count=0;record_refs=3753;record_locks=0;migrate_tx_objs=0;migrate_rx_objs=0;ongoing_write_reqs=0;err_storage_queue_full=0;partition_actual=8288;partition_replica=8096;partition_desync=0;partition_absent=0;partition_object_count=3753;partition_ref_count=16388;system_free_mem_pct=98;sindex_ucgarbage_found=0;sindex_gc_locktimedout=0;sindex_gc_inactivity_dur=0;sindex_gc_activity_dur=0;sindex_gc_list_creation_time=0;sindex_gc_list_deletion_time=0;sindex_gc_objects_validated=0;sindex_gc_garbage_found=0;sindex_gc_garbage_cleaned=0;system_swapping=false;err_replica_null_node=0;err_replica_non_null_node=0;err_sync_copy_null_master=0;storage_defrag_corrupt_record=0;err_write_fail_prole_unknown=0
;err_write_fail_prole_generation=0;err_write_fail_unknown=0;err_write_fail_key_exists=3;err_write_fail_generation=0;err_write_fail_generation_xdr=0;err_write_fail_bin_exists=0;err_write_fail_parameter=28;err_write_fail_incompatible_type=0;err_write_fail_noxdr=0;err_write_fail_prole_delete=0;err_write_fail_not_found=3;err_write_fail_key_mismatch=0;err_write_fail_record_too_big=0;err_write_fail_bin_name=0;err_write_fail_bin_not_found=0;err_write_fail_forbidden=0;stat_duplicate_operation=0;uptime=9090481;stat_write_errs_notfound=3;stat_write_errs_other=31;heartbeat_received_self=0;heartbeat_received_foreign=72665742;query_reqs=2;query_success=0;query_fail=2;query_abort=0;query_avg_rec_count=0;query_short_running=0;query_long_running=0;query_short_queue_full=0;query_long_queue_full=0;query_short_reqs=0;query_long_reqs=0;query_agg=0;query_agg_success=0;query_agg_err=0;query_agg_abort=0;query_agg_avg_rec_count=0;query_lookups=0;query_lookup_success=0;query_lookup_err=0;query_lookup_abort=0;query_lookup_avg_rec_count=0
3 :  features
     float;batch-index;replicas-all;replicas-master;replicas-prole;udf
4 :  cluster-generation
     5
5 :  partition-generation
     48985
6 :  edition
     Aerospike Community Edition
7 :  version
     Aerospike Community Edition build 3.6.3
8 :  build
     3.6.3
9 :  services
     1.2.3.4:3000
10 :  services-alumni
     1.2.3.4:3000

So:

  • cluster_size=2
  • Aerospike Community Edition build 3.6.3
  • Aerospike Tools 3.8.2 (aql)
  • 1.13.0 Go Client Library (Golang apps)

A quick note: your Go client version is the latest and expects a newer server. If you look at the release notes you'll see that several features require a server version newer than 3.6.

You have a 2-node cluster, so it probably would not be too difficult for you to upgrade to the latest server, currently 3.8.2.3. Let's see if the problem persists after that.

Hey @rbotzer, I just deployed the new version of the server. The aql SELECT now works!

INSERT INTO test.demo (PK, foo, bar) VALUES ('key1', 123, 'abc')

SHOW SETS

+-----------+------------------+----------------+----------------+---------+----------+------------+
| n_objects | disable-eviction | set-enable-xdr | n-bytes-memory | ns_name | set_name | set-delete |
+-----------+------------------+----------------+----------------+---------+----------+------------+
| 1         | "false"          | "use-default"  | 34             | "test"  | "demo"   | "false"    |
+-----------+------------------+----------------+----------------+---------+----------+------------+

aql> SELECT * FROM test.demo
+-----+-------+
| foo | bar   |
+-----+-------+
| 123 | "abc" |
+-----+-------+
1 row in set (0.677 secs)

I still need to do a “real” test from the Go client once all of the data from the old cluster is imported. But this works so far.

Also, something interesting happened when trying to do a ScanAll from test.demo. If I do: championSet, err := client.ScanAll(nil, "test", "demo") it works as expected, printing:

go run main.go
Connecting...
Scanning...
Printing...
da
here?!
"test:demo::ec 91 19 2d 4b 7f 8c e3 5d 5d 78 d3 4b ca 65 cb aa aa c9 60 "
{
  "foo": 123,
  "bar": "abc",
}
done.

But if I create a new scan policy and specify a large timeout value, I get a timeout! For example:

policy := as.NewScanPolicy()
policy.Timeout = 60000

championSet, err := client.ScanAll(policy, "test", "demo")

produces: 2016/05/13 15:29:49 Error processing results: read tcp 10.222.222.222:60118->1.2.3.4:3000: i/o timeout.

I have no idea why this is happening. I will post again once I import the data from the old cluster and test everything, to confirm whether the upgrade solved the problem.

Thanks a lot for your help!

Hi, you are giving the scan command only 60 microseconds to complete, which is way too short.

Use:

    policy.Timeout = 60000 * time.Millisecond

and it will perform properly.

Hey @Khosrow_Afroozeh, I'm not sure what you mean. By setting policy.Timeout = 60000 I'm effectively giving the scan 60 seconds to complete. Setting a 60-second timeout works properly after the version upgrade; before the upgrade, even 60 seconds wasn't enough. My theory is that maybe something broke? Anyway, it works now :slight_smile:

Thanks for the input though!

60000 in Go's time units (nanoseconds) == 60 microseconds.

To set the timeout to 6 seconds you need 6 * time.Second, which equals 6 * 1e9 nanoseconds.
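A quick illustration of the unit arithmetic (a sketch; the values are just examples):

    // Assumes as "github.com/aerospike/aerospike-client-go" and "time" are imported.
    // time.Duration counts nanoseconds, so a bare integer is rarely what you want.
    policy := as.NewScanPolicy()

    policy.Timeout = 60000                    // 60,000 ns == 60 microseconds (far too short)
    policy.Timeout = 60000 * time.Millisecond // 60 seconds, what was intended
    policy.Timeout = 60 * time.Second         // also 60 seconds, and clearer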

Hey @Khosrow_Afroozeh you are absolutely right!
I keep forgetting about time.Duration being nanoseconds :frowning2:

Thank you!!!

Just to update everyone and close the thread: everything is working as expected after the update.
The Go snippet (with the fixes noted by @Khosrow_Afroozeh) and the aql queries are working smoothly!

Thanks everyone! :slight_smile:

Hi, if I want to scan a small set, is it better to create a secondary index instead of using ScanAll with a timeout?

It depends on the size of the set relative to the size of the namespace it is in. If it’s very small compared to the namespace, you can do one of these:

  • Create a separate namespace for that small set.
  • Create a secondary index over a redundant bin that identifies the set (for example, an indexed single-character bin marking those records as members of that specific set); see the sketch below.
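A hedged sketch of the second option with the 1.x Go client; the bin name set_flag and index name idx_demo_flag are made up for illustration, and Statement.Addfilter / NewEqualFilter are the 1.x way of attaching an equality filter:

    // One-time setup: index the redundant "set_flag" bin (names are illustrative).
    task, err := client.CreateIndex(nil, "test", "demo", "idx_demo_flag", "set_flag", as.STRING)
    if err != nil {
        log.Fatal(err)
    }
    if err := <-task.OnComplete(); err != nil { // wait for the index build to finish
        log.Fatal(err)
    }

    // Query only the records tagged as members of the small set.
    stmt := as.NewStatement("test", "demo")
    stmt.Addfilter(as.NewEqualFilter("set_flag", "demo"))

    recordset, err := client.Query(nil, stmt)
    if err != nil {
        log.Fatal(err)
    }
    for rec := range recordset.Records { // error-channel handling omitted for brevity
        fmt.Println(rec.Bins)
    }

Whether this beats a plain ScanAll still depends on how small the set is relative to the namespace, as noted above.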