Scan and duplicated records (AER-3648)

moreno · May 14, 2015, 9:35am

Hi all,

I’m using Aerospike 3.5.4.

The cluster is composed of 3 nodes (replication factor 2).

Using the Java client (the one from the online course), I inserted 3 records into the “Users” set and 3 records into the “Tweets” set.

When the Java client scans all tweets of all users, it gets duplicated records!

I get the same result when I use a Python script or AQL as the client.

In all 3 cases, the clients connect to the cluster by specifying the IP address of just one of the 3 nodes.

If the clients specify the IP addresses of all 3 nodes when connecting, the scan runs correctly and no duplicated records are returned.

Am I missing something?

Many thanks in advance. Moreno

pratyyy · May 15, 2015, 6:24am

Hi moreno,

Can you share the following information with us?

Is the namespace the same for both sets?
The AQL command and its output.

Thanks

moreno · May 15, 2015, 6:38am

Thank you pratyyy!

Yes, namespace is the same for both the sets.

Aerospike.conf:

    namespace test {
        replication-factor 2
        memory-size 4G
        default-ttl 30d
        storage-engine memory
    }

The sets are “users” and “tweets”.

AQL output when I connect to localhost:

    aql> select * from test.users
    +------------+------------+----------+--------+---------------+------------+-------------+--------+
    | key        | username   | password | region | lasttweeted   | tweetcount | interests   | gender |
    +------------+------------+----------+--------+---------------+------------+-------------+--------+
    | "pietro"   | "pietro"   | "qwerty" | "n"    | 1430984773607 | 1          | ["inter"]   | "m"    |
    | "pietro"   | "pietro"   | "qwerty" | "n"    | 1430984773607 | 1          | ["inter"]   | "m"    |
    | "pietro"   | "pietro"   | "qwerty" | "n"    | 1430984773607 | 1          | ["inter"]   | "m"    |
    | "giuseppe" | "giuseppe" | "qwerty" | "s"    | 1430984928958 | 1          | ["sicilia"] | "f"    |
    | "moreno"   | "moreno"   | "qwerty" | "n"    | 1430984519751 | 1          | ["juve"]    | "m"    |
    | "giuseppe" | "giuseppe" | "qwerty" | "s"    | 1430984928958 | 1          | ["sicilia"] | "f"    |
    | "moreno"   | "moreno"   | "qwerty" | "n"    | 1430984519751 | 1          | ["juve"]    | "m"    |
    +------------+------------+----------+--------+---------------+------------+-------------+--------+

7 rows in set (1.392 secs)

(7 rows, although I inserted only 3.)

If I connect to another host in the cluster:

aql -h 10.100.0.4

    aql> select * from test.users
    +------------+------------+----------+--------+--------+---------------+------------+-------------+
    | key        | username   | password | gender | region | lasttweeted   | tweetcount | interests   |
    +------------+------------+----------+--------+--------+---------------+------------+-------------+
    | "giuseppe" | "giuseppe" | "qwerty" | "f"    | "s"    | 1430984928958 | 1          | ["sicilia"] |
    | "moreno"   | "moreno"   | "qwerty" | "m"    | "n"    | 1430984519751 | 1          | ["juve"]    |
    | "pietro"   | "pietro"   | "qwerty" | "m"    | "n"    | 1430984773607 | 1          | ["inter"]   |
    | "pietro"   | "pietro"   | "qwerty" | "m"    | "n"    | 1430984773607 | 1          | ["inter"]   |
    | "pietro"   | "pietro"   | "qwerty" | "m"    | "n"    | 1430984773607 | 1          | ["inter"]   |
    +------------+------------+----------+--------+--------+---------------+------------+-------------+
    5 rows in set (1.563 secs)

The same happens when I use a Python client:

    ./users.10.100.0.3.py
    ('test', 'users', u'pietro', bytearray(b'\xc8_\xf85\xd1\xfe\x00&\xfc\xe1\x0ctO,\x83\xf17!\xe3\xcf')) {'gen': 13, 'ttl': 2589144} {'username': u'pietro'}
    ('test', 'users', u'giuseppe', bytearray(b'\xbf\x90\x01\xdf\x8c=\xf1\xfe\xda\xfa\nL\xaf\xec\nQ\xa4xE\x97')) {'gen': 4, 'ttl': 2528163} {'username': u'giuseppe'}
    ('test', 'users', u'moreno', bytearray(b't]87X\xb2\x82\x01\xf9\xc7P\xbcW\x81\xffKO\xd4\xd0\x18')) {'gen': 2, 'ttl': 1901527} {'username': u'moreno'}
    ('test', 'users', u'giuseppe', bytearray(b'\xbf\x90\x01\xdf\x8c=\xf1\xfe\xda\xfa\nL\xaf\xec\nQ\xa4xE\x97')) {'gen': 4, 'ttl': 2528163} {'username': u'giuseppe'}
    ('test', 'users', u'moreno', bytearray(b't]87X\xb2\x82\x01\xf9\xc7P\xbcW\x81\xffKO\xd4\xd0\x18')) {'gen': 2, 'ttl': 1901527} {'username': u'moreno'}
    5 records.
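Until the address problem is fixed, a client could hide the symptom by discarding records whose digest it has already seen. The sketch below is a hypothetical stopgap (the name `dedupe_by_digest` is made up), applied to the (key, meta, bins) tuples printed by the Python script above; it assumes, as in that output, that the fourth element of each key is the record digest. It only masks the duplicate-node issue diagnosed later in this thread; it does not fix it.

```python
# Hypothetical stopgap: drop repeated records from a scan result by digest.
# This hides the symptom only; it does not fix the node-address problem.

PIETRO = bytearray(b'\xc8_\xf85\xd1\xfe\x00&\xfc\xe1\x0ctO,\x83\xf17!\xe3\xcf')
GIUSEPPE = bytearray(b'\xbf\x90\x01\xdf\x8c=\xf1\xfe\xda\xfa\nL\xaf\xec\nQ\xa4xE\x97')
MORENO = bytearray(b't]87X\xb2\x82\x01\xf9\xc7P\xbcW\x81\xffKO\xd4\xd0\x18')


def dedupe_by_digest(records):
    """Keep the first (key, meta, bins) tuple seen for each digest."""
    seen = set()
    unique = []
    for key, meta, bins in records:
        digest = bytes(key[3])  # key = (namespace, set, user_key, digest); bytearray is unhashable
        if digest not in seen:
            seen.add(digest)
            unique.append((key, meta, bins))
    return unique


# The five records returned by the single-host scan above.
scanned = [
    (('test', 'users', 'pietro', PIETRO), {'gen': 13, 'ttl': 2589144}, {'username': 'pietro'}),
    (('test', 'users', 'giuseppe', GIUSEPPE), {'gen': 4, 'ttl': 2528163}, {'username': 'giuseppe'}),
    (('test', 'users', 'moreno', MORENO), {'gen': 2, 'ttl': 1901527}, {'username': 'moreno'}),
    (('test', 'users', 'giuseppe', GIUSEPPE), {'gen': 4, 'ttl': 2528163}, {'username': 'giuseppe'}),
    (('test', 'users', 'moreno', MORENO), {'gen': 2, 'ttl': 1901527}, {'username': 'moreno'}),
]

unique = dedupe_by_digest(scanned)
print(len(scanned), '->', len(unique))  # 5 -> 3
```

Keying on the digest rather than the user key is deliberate: the digest is present even when the user key was not stored with the record.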

If I use a Python client that connects to both 10.100.0.3 and 10.100.0.4, the result is correct:

./users.10.100.0.3e4.py

Here are the first lines of the Python script:

    # Configure the client
    config = {'hosts': [('10.100.0.3', 3000), ('10.100.0.4', 3000)]}

And here is its output:

    ('test', 'users', u'pietro', bytearray(b'\xc8_\xf85\xd1\xfe\x00&\xfc\xe1\x0ctO,\x83\xf17!\xe3\xcf')) {'gen': 13, 'ttl': 2589074} {'username': u'pietro'}
    ('test', 'users', u'giuseppe', bytearray(b'\xbf\x90\x01\xdf\x8c=\xf1\xfe\xda\xfa\nL\xaf\xec\nQ\xa4xE\x97')) {'gen': 4, 'ttl': 2528094} {'username': u'giuseppe'}
    ('test', 'users', u'moreno', bytearray(b't]87X\xb2\x82\x01\xf9\xc7P\xbcW\x81\xffKO\xd4\xd0\x18')) {'gen': 2, 'ttl': 1901458} {'username': u'moreno'}
    3 records.

The hosts run CentOS Linux:

CentOS release 6.5 (Final)

uname -a

Linux couchbase-1 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

pratyyy · May 15, 2015, 9:13am

Thanks for the info.

Can you please also send us the output of the following command?

asinfo -v "statistics"

Thanks

moreno · May 15, 2015, 9:54am

ASINFO on node 1:

cluster_size=3 cluster_key=A5A28076E110C33B cluster_integrity=true objects=2010382 sub-records=0 total-bytes-disk=0 used-bytes-disk=0 free-pct-disk=0 total-bytes-memory=8589934592 used-bytes-memory=282140675 data-used-bytes-memory=127340916 index-used-bytes-memory=128664448 sindex-used-bytes-memory=26135311 free-pct-memory=96 stat_read_reqs=3 stat_read_reqs_xdr=0 stat_read_success=0 stat_read_errs_notfound=3 stat_read_errs_other=0 stat_write_reqs=4 stat_write_reqs_xdr=0 stat_write_success=3 stat_write_errs=1 stat_xdr_pipe_writes=0 stat_xdr_pipe_miss=0 stat_delete_success=8 stat_rw_timeout=0 udf_read_reqs=0 udf_read_success=0 udf_read_errs_other=0 udf_write_reqs=0 udf_write_success=0 udf_write_err_others=0 udf_delete_reqs=0 udf_delete_success=0 udf_delete_err_others=0 udf_lua_errs=0 udf_scan_rec_reqs=0 udf_query_rec_reqs=0 udf_replica_writes=0 stat_proxy_reqs=0 stat_proxy_reqs_xdr=0 stat_proxy_success=0 stat_proxy_errs=0 stat_ldt_proxy=0 stat_cluster_key_trans_to_proxy_retry=0 stat_cluster_key_transaction_reenqueue=0 stat_slow_trans_queue_push=0 stat_slow_trans_queue_pop=0 stat_slow_trans_queue_batch_pop=0 stat_cluster_key_regular_processed=0 stat_cluster_key_prole_retry=0 stat_cluster_key_err_ack_dup_trans_reenqueue=0 stat_cluster_key_partition_transaction_queue_count=0 stat_cluster_key_err_ack_rw_trans_reenqueue=0 stat_expired_objects=0 stat_evicted_objects=0 stat_deleted_set_objects=0 stat_evicted_set_objects=0 stat_evicted_objects_time=0 stat_zero_bin_records=0 stat_nsup_deletes_not_shipped=0 err_tsvc_requests=1 err_out_of_space=0 err_duplicate_proxy_request=0 err_rw_request_not_found=0 err_rw_pending_limit=0 err_rw_cant_put_unique=0 fabric_msgs_sent=10263439 fabric_msgs_rcvd=10263420 paxos_principal=BB9CF35A5565000 migrate_msgs_sent=10221978 migrate_msgs_recv=10263301 migrate_progress_send=0 migrate_progress_recv=0 migrate_num_incoming_accepted=16464 migrate_num_incoming_refused=0 queue=0 transactions=697914 reaped_fds=12 tscan_initiate=840 tscan_pending=0 
tscan_succeeded=836 tscan_aborted=0 batch_initiate=0 batch_queue=0 batch_tree_count=0 batch_timeout=0 batch_errors=0 info_queue=0 delete_queue=0 proxy_in_progress=0 proxy_initiate=0 proxy_action=0 proxy_retry=0 proxy_retry_q_full=0 proxy_unproxy=0 proxy_retry_same_dest=0 proxy_retry_new_dest=0 write_master=4 write_prole=28 read_dup_prole=0 rw_err_dup_internal=0 rw_err_dup_cluster_key=0 rw_err_dup_send=0 rw_err_write_internal=0 rw_err_write_cluster_key=0 rw_err_write_send=0 rw_err_ack_internal=0 rw_err_ack_nomatch=0 rw_err_ack_badnode=0 client_connections=2 waiting_transactions=0 tree_count=0 record_refs=2010382 record_locks=0 migrate_tx_objs=0 migrate_rx_objs=0 ongoing_write_reqs=0 err_storage_queue_full=0 partition_actual=2728 partition_replica=2760 partition_desync=0 partition_absent=2704 partition_object_count=2010382 partition_ref_count=8192 system_free_mem_pct=78 sindex_ucgarbage_found=0 sindex_gc_locktimedout=0 sindex_gc_inactivity_dur=758318989 sindex_gc_activity_dur=1205011 sindex_gc_list_creation_time=1200847 sindex_gc_list_deletion_time=1035 sindex_gc_objects_validated=721678292 sindex_gc_garbage_found=347617 sindex_gc_garbage_cleaned=347617 system_swapping=false err_replica_null_node=0 err_replica_non_null_node=0 err_sync_copy_null_node=0 err_sync_copy_null_master=0 storage_defrag_corrupt_record=0 err_write_fail_prole_unknown=0 err_write_fail_prole_generation=0 err_write_fail_unknown=0 err_write_fail_key_exists=0 err_write_fail_generation=0 err_write_fail_generation_xdr=0 err_write_fail_bin_exists=0 err_write_fail_parameter=0 err_write_fail_incompatible_type=0 err_write_fail_noxdr=0 err_write_fail_prole_delete=0 err_write_fail_not_found=0 err_write_fail_key_mismatch=0 err_write_fail_record_too_big=0 err_write_fail_bin_name=0 err_write_fail_bin_not_found=0 err_write_fail_forbidden=0 stat_duplicate_operation=0 uptime=761715 stat_write_errs_notfound=1 stat_write_errs_other=0 heartbeat_received_self=5051021 heartbeat_received_foreign=9711102 query_reqs=25 
query_success=15 query_fail=10 query_abort=0 query_avg_rec_count=0 query_short_queue_full=0 query_long_queue_full=0 query_short_running=15 query_long_running=0 query_tracked=0 query_agg=0 query_agg_success=0 query_agg_err=0 query_agg_abort=0 query_agg_avg_rec_count=0 query_lookups=15 query_lookup_success=15 query_lookup_err=0 query_lookup_abort=0 query_lookup_avg_rec_count=0

ASINFO on node 2:

cluster_size=3 cluster_key=A5A28076E110C33B cluster_integrity=true objects=1984282 sub-records=0 total-bytes-disk=0 used-bytes-disk=0 free-pct-disk=0 total-bytes-memory=8589934592 used-bytes-memory=278934760 data-used-bytes-memory=125643464 index-used-bytes-memory=126994048 sindex-used-bytes-memory=26297248 free-pct-memory=96 stat_read_reqs=120 stat_read_reqs_xdr=0 stat_read_success=105 stat_read_errs_notfound=15 stat_read_errs_other=0 stat_write_reqs=1492600 stat_write_reqs_xdr=0 stat_write_success=1492595 stat_write_errs=5 stat_xdr_pipe_writes=0 stat_xdr_pipe_miss=0 stat_delete_success=44 stat_rw_timeout=0 udf_read_reqs=4 udf_read_success=2 udf_read_errs_other=2 udf_write_reqs=14 udf_write_success=14 udf_write_err_others=0 udf_delete_reqs=0 udf_delete_success=0 udf_delete_err_others=0 udf_lua_errs=0 udf_scan_rec_reqs=4 udf_query_rec_reqs=4 udf_replica_writes=0 stat_proxy_reqs=0 stat_proxy_reqs_xdr=0 stat_proxy_success=0 stat_proxy_errs=0 stat_ldt_proxy=0 stat_cluster_key_trans_to_proxy_retry=0 stat_cluster_key_transaction_reenqueue=0 stat_slow_trans_queue_push=0 stat_slow_trans_queue_pop=0 stat_slow_trans_queue_batch_pop=0 stat_cluster_key_regular_processed=0 stat_cluster_key_prole_retry=0 stat_cluster_key_err_ack_dup_trans_reenqueue=0 stat_cluster_key_partition_transaction_queue_count=0 stat_cluster_key_err_ack_rw_trans_reenqueue=0 stat_expired_objects=0 stat_evicted_objects=0 stat_deleted_set_objects=0 stat_evicted_set_objects=0 stat_evicted_objects_time=0 stat_zero_bin_records=0 stat_nsup_deletes_not_shipped=0 err_tsvc_requests=5 err_out_of_space=0 err_duplicate_proxy_request=0 err_rw_request_not_found=0 err_rw_pending_limit=0 err_rw_cant_put_unique=0 fabric_msgs_sent=12659592 fabric_msgs_rcvd=12659550 paxos_principal=BB9CF35A5565000 migrate_msgs_sent=9605908 migrate_msgs_recv=9659115 migrate_progress_send=0 migrate_progress_recv=0 migrate_num_incoming_accepted=24452 migrate_num_incoming_refused=0 queue=0 transactions=2889093 reaped_fds=94 tscan_initiate=781 
tscan_pending=0 tscan_succeeded=789 tscan_aborted=0 batch_initiate=0 batch_queue=0 batch_tree_count=0 batch_timeout=0 batch_errors=0 info_queue=0 delete_queue=0 proxy_in_progress=0 proxy_initiate=0 proxy_action=0 proxy_retry=0 proxy_retry_q_full=0 proxy_unproxy=0 proxy_retry_same_dest=0 proxy_retry_new_dest=0 write_master=1492600 write_prole=1507713 read_dup_prole=0 rw_err_dup_internal=0 rw_err_dup_cluster_key=0 rw_err_dup_send=0 rw_err_write_internal=0 rw_err_write_cluster_key=0 rw_err_write_send=0 rw_err_ack_internal=0 rw_err_ack_nomatch=0 rw_err_ack_badnode=0 client_connections=4 waiting_transactions=0 tree_count=0 record_refs=1984282 record_locks=0 migrate_tx_objs=0 migrate_rx_objs=0 ongoing_write_reqs=0 err_storage_queue_full=0 partition_actual=2708 partition_replica=2712 partition_desync=0 partition_absent=2772 partition_object_count=1984282 partition_ref_count=8192 system_free_mem_pct=71 sindex_ucgarbage_found=0 sindex_gc_locktimedout=0 sindex_gc_inactivity_dur=2129846129 sindex_gc_activity_dur=3499873 sindex_gc_list_creation_time=3489177 sindex_gc_list_deletion_time=2072 sindex_gc_objects_validated=2053448283 sindex_gc_garbage_found=696100 sindex_gc_garbage_cleaned=696100 system_swapping=false err_replica_null_node=0 err_replica_non_null_node=0 err_sync_copy_null_node=0 err_sync_copy_null_master=0 storage_defrag_corrupt_record=0 err_write_fail_prole_unknown=0 err_write_fail_prole_generation=0 err_write_fail_unknown=0 err_write_fail_key_exists=4 err_write_fail_generation=0 err_write_fail_generation_xdr=0 err_write_fail_bin_exists=0 err_write_fail_parameter=0 err_write_fail_incompatible_type=0 err_write_fail_noxdr=0 err_write_fail_prole_delete=0 err_write_fail_not_found=0 err_write_fail_key_mismatch=0 err_write_fail_record_too_big=0 err_write_fail_bin_name=0 err_write_fail_bin_not_found=0 err_write_fail_forbidden=0 stat_duplicate_operation=0 uptime=2152798 stat_write_errs_notfound=1 stat_write_errs_other=4 heartbeat_received_self=14275817 
heartbeat_received_foreign=18934583 query_reqs=101 query_success=60 query_fail=31 query_abort=10 query_avg_rec_count=1259 query_short_queue_full=0 query_long_queue_full=0 query_short_running=46 query_long_running=24 query_tracked=14 query_agg=25 query_agg_success=15 query_agg_err=0 query_agg_abort=10 query_agg_avg_rec_count=3523 query_lookups=45 query_lookup_success=45 query_lookup_err=0 query_lookup_abort=0 query_lookup_avg_rec_count=0

ASINFO on node 3:

cluster_size=3 cluster_key=A5A28076E110C33B cluster_integrity=true objects=2005344 sub-records=0 total-bytes-disk=0 used-bytes-disk=0 free-pct-disk=0 total-bytes-memory=8589934592 used-bytes-memory=277329629 data-used-bytes-memory=127016768 index-used-bytes-memory=128342016 sindex-used-bytes-memory=21970845 free-pct-memory=96 stat_read_reqs=0 stat_read_reqs_xdr=0 stat_read_success=0 stat_read_errs_notfound=0 stat_read_errs_other=0 stat_write_reqs=0 stat_write_reqs_xdr=0 stat_write_success=0 stat_write_errs=0 stat_xdr_pipe_writes=0 stat_xdr_pipe_miss=0 stat_delete_success=0 stat_rw_timeout=0 udf_read_reqs=0 udf_read_success=0 udf_read_errs_other=0 udf_write_reqs=0 udf_write_success=0 udf_write_err_others=0 udf_delete_reqs=0 udf_delete_success=0 udf_delete_err_others=0 udf_lua_errs=0 udf_scan_rec_reqs=0 udf_query_rec_reqs=0 udf_replica_writes=0 stat_proxy_reqs=0 stat_proxy_reqs_xdr=0 stat_proxy_success=0 stat_proxy_errs=0 stat_ldt_proxy=0 stat_cluster_key_trans_to_proxy_retry=0 stat_cluster_key_transaction_reenqueue=0 stat_slow_trans_queue_push=0 stat_slow_trans_queue_pop=0 stat_slow_trans_queue_batch_pop=0 stat_cluster_key_regular_processed=0 stat_cluster_key_prole_retry=0 stat_cluster_key_err_ack_dup_trans_reenqueue=0 stat_cluster_key_partition_transaction_queue_count=0 stat_cluster_key_err_ack_rw_trans_reenqueue=0 stat_expired_objects=0 stat_evicted_objects=0 stat_deleted_set_objects=0 stat_evicted_set_objects=0 stat_evicted_objects_time=0 stat_zero_bin_records=0 stat_nsup_deletes_not_shipped=0 err_tsvc_requests=0 err_out_of_space=0 err_duplicate_proxy_request=0 err_rw_request_not_found=0 err_rw_pending_limit=0 err_rw_cant_put_unique=0 fabric_msgs_sent=3042378 fabric_msgs_rcvd=3042365 paxos_principal=BB9CF35A5565000 migrate_msgs_sent=3028681 migrate_msgs_recv=3042350 migrate_progress_send=0 migrate_progress_recv=0 migrate_num_incoming_accepted=5476 migrate_num_incoming_refused=0 queue=0 transactions=56866 reaped_fds=0 tscan_initiate=18 tscan_pending=0 
tscan_succeeded=18 tscan_aborted=0 batch_initiate=0 batch_queue=0 batch_tree_count=0 batch_timeout=0 batch_errors=0 info_queue=0 delete_queue=0 proxy_in_progress=0 proxy_initiate=0 proxy_action=0 proxy_retry=0 proxy_retry_q_full=0 proxy_unproxy=0 proxy_retry_same_dest=0 proxy_retry_new_dest=0 write_master=0 write_prole=0 read_dup_prole=0 rw_err_dup_internal=0 rw_err_dup_cluster_key=0 rw_err_dup_send=0 rw_err_write_internal=0 rw_err_write_cluster_key=0 rw_err_write_send=0 rw_err_ack_internal=0 rw_err_ack_nomatch=0 rw_err_ack_badnode=0 client_connections=3 waiting_transactions=0 tree_count=0 record_refs=2005344 record_locks=0 migrate_tx_objs=0 migrate_rx_objs=0 ongoing_write_reqs=0 err_storage_queue_full=0 partition_actual=2756 partition_replica=2720 partition_desync=0 partition_absent=2716 partition_object_count=2005344 partition_ref_count=8192 system_free_mem_pct=80 sindex_ucgarbage_found=0 sindex_gc_locktimedout=0 sindex_gc_inactivity_dur=10808110 sindex_gc_activity_dur=15890 sindex_gc_list_creation_time=15866 sindex_gc_list_deletion_time=2 sindex_gc_objects_validated=9360170 sindex_gc_garbage_found=0 sindex_gc_garbage_cleaned=0 system_swapping=false err_replica_null_node=0 err_replica_non_null_node=0 err_sync_copy_null_node=0 err_sync_copy_null_master=0 storage_defrag_corrupt_record=0 err_write_fail_prole_unknown=0 err_write_fail_prole_generation=0 err_write_fail_unknown=0 err_write_fail_key_exists=0 err_write_fail_generation=0 err_write_fail_generation_xdr=0 err_write_fail_bin_exists=0 err_write_fail_parameter=0 err_write_fail_incompatible_type=0 err_write_fail_noxdr=0 err_write_fail_prole_delete=0 err_write_fail_not_found=0 err_write_fail_key_mismatch=0 err_write_fail_record_too_big=0 err_write_fail_bin_name=0 err_write_fail_bin_not_found=0 err_write_fail_forbidden=0 stat_duplicate_operation=0 uptime=10861 stat_write_errs_notfound=0 stat_write_errs_other=0 heartbeat_received_self=72117 heartbeat_received_foreign=144224 query_reqs=0 query_success=0 query_fail=0 
query_abort=0 query_avg_rec_count=0 query_short_queue_full=0 query_long_queue_full=0 query_short_running=0 query_long_running=0 query_tracked=0 query_agg=0 query_agg_success=0 query_agg_err=0 query_agg_abort=0 query_agg_avg_rec_count=0 query_lookups=0 query_lookup_success=0 query_lookup_err=0 query_lookup_abort=0 query_lookup_avg_rec_count=0
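The three statistics dumps can be cross-checked mechanically. The sketch below (stdlib only; the helper names are hypothetical) parses asinfo’s key=value output and checks that every node reports the same cluster_key, a cluster_size equal to the number of nodes, and cluster_integrity=true. Run against excerpts of the dumps above it reports no problems, which suggests the cluster membership itself is consistent and the duplicates come from somewhere else.

```python
def parse_info(text):
    """Parse asinfo 'key=value' output into a dict of strings.

    asinfo separates pairs with ';'; the copies pasted above use spaces,
    so both separators are accepted.
    """
    stats = {}
    for token in text.replace(';', ' ').split():
        key, sep, value = token.partition('=')
        if sep:
            stats[key] = value
    return stats


def cluster_problems(per_node):
    """Return a list of inconsistencies; an empty list means the nodes agree."""
    stats = {node: parse_info(text) for node, text in per_node.items()}
    problems = []
    if len({s['cluster_key'] for s in stats.values()}) != 1:
        problems.append('nodes disagree on cluster_key')
    for node, s in sorted(stats.items()):
        if int(s['cluster_size']) != len(stats):
            problems.append(f"{node}: cluster_size={s['cluster_size']}, expected {len(stats)}")
        if s['cluster_integrity'] != 'true':
            problems.append(f"{node}: cluster_integrity={s['cluster_integrity']}")
    return problems


# Excerpts of the three dumps above (only their first few fields).
PER_NODE = {
    'node1': 'cluster_size=3 cluster_key=A5A28076E110C33B cluster_integrity=true objects=2010382',
    'node2': 'cluster_size=3 cluster_key=A5A28076E110C33B cluster_integrity=true objects=1984282',
    'node3': 'cluster_size=3 cluster_key=A5A28076E110C33B cluster_integrity=true objects=2005344',
}

print(cluster_problems(PER_NODE))  # []
```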

Thanks again! Moreno

pratyyy · May 15, 2015, 10:27am

Hi Moreno,

Since you have a lot of data in the namespace, it is possible that you inserted some records into this set earlier. Still, scanning through different nodes should not change the result.

Let's investigate this further. Can you please take a backup of the set in question? You can use the following command:

asbackup -d <dir_name> -n namespace_name -s set_name -h host_name

Also, can you share the result of the following command on each host?

asinfo -v "sets"

Thanks

moreno · May 15, 2015, 12:18pm

asbackup fails on all 3 nodes; the output is:

asbackup -d /MORENO_91/BACKUP -n test -s users -h 10.100.0.2

Backing up From: host 10.100.0.2 port 3000 namespace test set users bin_list (null) to directory /MORENO_91/BACKUP with scan_pct 100
Nodes are repeated or different addresses of same node. Give proper input

asinfo -v "sets" on node1:

    ns_name=test:set_name=demo2:n_objects=1340058:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;
    ns_name=test:set_name=ccc:n_objects=670322:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;
    ns_name=test:set_name=tweets:n_objects=1:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;
    ns_name=test:set_name=users:n_objects=1:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;

asinfo -v "sets" on node2:

    ns_name=test:set_name=ccc:n_objects=661095:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;
    ns_name=test:set_name=demo2:n_objects=1323181:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;
    ns_name=test:set_name=users:n_objects=3:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;
    ns_name=test:set_name=tweets:n_objects=3:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;

asinfo -v "sets" on node3:

    ns_name=test:set_name=demo2:n_objects=1336759:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;
    ns_name=test:set_name=ccc:n_objects=668581:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;
    ns_name=test:set_name=tweets:n_objects=2:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;
    ns_name=test:set_name=users:n_objects=2:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;
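With replication-factor 2 each record is stored on two nodes, so adding up the per-node n_objects for a set and dividing by 2 should give the number of distinct records, assuming n_objects counts replica copies as well as master copies. The numbers above are consistent with that: users has 1 + 3 + 2 = 6 copies, i.e. 3 records, and so does tweets. The sketch below (hypothetical helper names, stdlib only) does this arithmetic from the raw "sets" output, trimmed here to the fields it uses.

```python
# Per-node `asinfo -v "sets"` output from above, trimmed to the fields used here.
NODE_SETS = {
    'node1': 'ns_name=test:set_name=tweets:n_objects=1;ns_name=test:set_name=users:n_objects=1;',
    'node2': 'ns_name=test:set_name=users:n_objects=3;ns_name=test:set_name=tweets:n_objects=3;',
    'node3': 'ns_name=test:set_name=tweets:n_objects=2;ns_name=test:set_name=users:n_objects=2;',
}


def parse_sets(text):
    """Map (namespace, set) -> n_objects for one node's "sets" output."""
    counts = {}
    for entry in text.strip().split(';'):
        if not entry:
            continue  # the output ends with a trailing ';'
        fields = dict(f.split('=', 1) for f in entry.split(':') if '=' in f)
        counts[(fields['ns_name'], fields['set_name'])] = int(fields['n_objects'])
    return counts


def estimate_unique(node_outputs, ns, set_name, replication_factor):
    """Sum the per-node copies of a set and divide by the replication factor.

    Returns (records, leftover); a non-zero leftover means the copies do not
    add up to whole records, which could indicate migrations in progress.
    """
    copies = sum(parse_sets(out).get((ns, set_name), 0) for out in node_outputs.values())
    return divmod(copies, replication_factor)


print(estimate_unique(NODE_SETS, 'test', 'users', 2))   # (3, 0)
print(estimate_unique(NODE_SETS, 'test', 'tweets', 2))  # (3, 0)
```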

pratyyy · May 15, 2015, 1:37pm

Hi Moreno,

Judging by the backup failure, it seems the services list is somehow corrupted on one of the nodes. This might cause a node to be scanned twice.

To confirm this, can you share the output of the following command from each node?

asinfo -v "services"

Also, to be 100% sure, can you share the config file of each node?

moreno · May 15, 2015, 3:01pm

Hi pratyyy,

Node 1

    ip a
    eth0: 138.132.43.90/24
    eth1: 10.100.0.2/24
    eth1.1: 20.100.0.2/24

    asinfo -v "services"
    10.100.0.3:3000;20.100.0.3:3000;138.132.43.91:3000;10.100.0.4:3000;138.132.43.92:3000

Node 2

    ip a
    eth0: 138.132.43.91/24
    eth1: 10.100.0.3/24
    eth1.1: 20.100.0.3/24

    asinfo -v "services"
    10.100.0.2:3000;20.100.0.2:3000;138.132.43.90:3000;10.100.0.4:3000;138.132.43.92:3000

Node 3

    ip a
    eth0: 138.132.43.92/24
    eth1: 10.100.0.4/24
    (no eth1.1)

    asinfo -v "services"
    10.100.0.3:3000;20.100.0.3:3000;138.132.43.91:3000;10.100.0.2:3000;20.100.0.2:3000;138.132.43.90:3000
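Mapping each advertised address back to its owner, using the ip a output above, makes the problem visible: every peer appears in the services lists under two or three addresses (node1, for example, advertises node2 under three addresses and node3 under two). A client that treats each address as a separate node could plausibly connect to, and scan, the same server more than once, which fits the asbackup error "Nodes are repeated or different addresses of same node". The sketch is stdlib Python; the labels node1..node3 and the helper name are hypothetical.

```python
from collections import defaultdict

# Interface addresses of each node, taken from the `ip a` output above.
NODE_ADDRS = {
    'node1': {'138.132.43.90', '10.100.0.2', '20.100.0.2'},
    'node2': {'138.132.43.91', '10.100.0.3', '20.100.0.3'},
    'node3': {'138.132.43.92', '10.100.0.4'},
}
ADDR_TO_NODE = {addr: node for node, addrs in NODE_ADDRS.items() for addr in addrs}

# `asinfo -v "services"` output of each node, as posted above.
SERVICES = {
    'node1': '10.100.0.3:3000;20.100.0.3:3000;138.132.43.91:3000;10.100.0.4:3000;138.132.43.92:3000',
    'node2': '10.100.0.2:3000;20.100.0.2:3000;138.132.43.90:3000;10.100.0.4:3000;138.132.43.92:3000',
    'node3': '10.100.0.3:3000;20.100.0.3:3000;138.132.43.91:3000;10.100.0.2:3000;20.100.0.2:3000;138.132.43.90:3000',
}


def peers_by_owner(services):
    """Group the host:port entries of one services list by the node owning the host."""
    grouped = defaultdict(list)
    for entry in services.split(';'):
        host = entry.rsplit(':', 1)[0]
        grouped[ADDR_TO_NODE.get(host, 'unknown')].append(entry)
    return dict(grouped)


for node, services in SERVICES.items():
    for peer, entries in sorted(peers_by_owner(services).items()):
        print(f'{node} advertises {peer} under {len(entries)} addresses')
```

After access-address is set, each grouped list should contain exactly one entry per peer.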

/etc/aerospike/aerospike.conf (identical on all 3 nodes):

    service {
        user root
        group root
        paxos-single-replica-limit 1
        pidfile /var/run/aerospike/asd.pid
        service-threads 4
        transaction-queues 4
        transaction-threads-per-queue 4
        proto-fd-max 15000
    }

    logging {
        file /var/log/aerospike/aerospike.log {
            context any info
        }
    }

    network {
        service {
            address any
            port 3000
        }

        heartbeat {
            mode multicast
            address 239.1.99.222
            port 9918
            interval 150
            timeout 10
        }

        fabric {
            port 3001
        }

        info {
            port 3003
        }
    }

    namespace test {
        replication-factor 2
        memory-size 4G
        default-ttl 30d
        storage-engine memory
    }

    namespace bar {
        replication-factor 2
        memory-size 4G
        default-ttl 30d
        storage-engine memory
    }

moreno · May 18, 2015, 9:11am

Hi pratyyy, do you have any news?

Do you need other info?

Thanks Moreno

jyoti · May 18, 2015, 10:53am

Hi Moreno, extremely sorry for the late reply. From your asinfo results we can see that each machine has multiple NIC addresses, all of which map to the same node ID. We will enhance our scan API to handle duplicate node IDs properly.

You can resolve the issue by specifying the external address of each node with access-address in the service subsection of the network section of your config file, e.g.:

    service {
        address any
        port 3000
        access-address 10.100.0.3
    }

With access-address set, each node advertises only one IP address for client connections.

Let us know whether the above solution works for you or if you need more help.

moreno · May 18, 2015, 12:07pm

Thank you, jyoti (late reply? You and pratyyy have been so helpful :-)).

I changed aerospike.conf on all 3 nodes according to your suggestion.

Node 1:

Added row:

access-address 10.100.0.2

Node2:

Added row:

access-address 10.100.0.3

Node3:

Added row:

access-address 10.100.0.4

The output of asinfo -v "services" now looks right:

Node1:

10.100.0.3:3000;10.100.0.4:3000

Node2:

10.100.0.2:3000;10.100.0.4:3000

Node3:

10.100.0.3:3000;10.100.0.2:3000

I restarted the Aerospike service this way:

service aerospike stop

service aerospike start

The bad news is that now I see only 1 record in the “Users” set and 1 in the “Tweets” set.

Is something wrong with this sequence of commands or with the aerospike.conf update?

Thanks again Moreno

jyoti · May 18, 2015, 12:10pm

Can you run asbackup now and just send us its output?

moreno · May 18, 2015, 12:30pm

All 3 backups were ok.

Output for Node1 (for example):

    asbackup -d /MORENO_90/BACKUP -n test -s users
    Backing up From: host 127.0.0.1 port 3000 namespace test set users bin_list (null) to directory /MORENO_90/BACKUP with scan_pct 100
    Aerospike scan nodes: 3 nodes
    Node_name        Objects  Rep_fact
    BB96753A5565000  469350   2
    BB9CF35A5565000  293504   2
    BB9B1ECA5565000  329158   2
    directory "/MORENO_90/BACKUP" prepared for backup
    starting backup for node BB96753A5565000
    starting backup for node BB9CF35A5565000
    starting backup for node BB9B1ECA5565000
    May 18 2015 12:22:11 GMT: New file created /MORENO_90/BACKUP/BB9CF35A5565000_00000.asb
    May 18 2015 12:22:11 GMT: New file created /MORENO_90/BACKUP/BB96753A5565000_00000.asb
    May 18 2015 12:22:11 GMT: New file created /MORENO_90/BACKUP/BB9B1ECA5565000_00000.asb
    Complete backup for node BB9B1ECA5565000 and total backed up from this node: 0
    Complete backup for node BB96753A5565000 and total backed up from this node: 0
    Complete backup for node BB9CF35A5565000 and total backed up from this node: 1
    May 18 2015 12:22:12 GMT: backed up records 0%
    May 18 2015 12:22:12 GMT: backed up records 100%
    May 18 2015 12:22:12 GMT: Backup successfully completed.
    May 18 2015 12:22:12 GMT: Total backed up records from all nodes 1

jyoti · May 18, 2015, 12:40pm

From the asbackup output it seems you have only one record in the users set. Are you sure there are 3 records in it? You can confirm this by running the following command on each node:

asinfo -v "sets"

moreno · May 18, 2015, 1:18pm

asinfo -v "sets" reported 1 record for set_name=users.

I didn’t delete the other 2 users (or the other 2 tweets).

So I stopped Aerospike and replaced the new aerospike.conf with the older version (without access-address).

After restarting it, I had duplicated records again.

When I stopped Aerospike again, replaced aerospike.conf with the new version (with the access-address line) and restarted Aerospike, ALL Users and Tweets records had disappeared!

(asinfo -v "sets" now returns empty output.)

I don’t know whether this last test is interesting or stupid; perhaps the best thing is to delete all previous records and start again from the beginning with the correct aerospike.conf.

jyoti · May 18, 2015, 1:42pm

Your data is stored in memory only (storage-engine memory), which is why you lost your records. Sorry, I should have explained the steps properly. Whenever you restart nodes, restart them one at a time and wait for migrations to finish before restarting the next one.
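The one-node-at-a-time advice can be turned into a check to run against each node before restarting the next one: with an in-memory namespace, a restarted node comes back empty and has to receive its data from the other nodes through migrations. The sketch below is a hypothetical helper, assuming that the migrate_progress_send and migrate_progress_recv counters (both visible in the statistics dumps in this thread) drop to 0 once a node has no migrations left; how the statistics text is collected from each node (for example by running asinfo -v "statistics" on it) is left out.

```python
def migrations_finished(stats_text):
    """Return True if a node reports no outgoing or incoming migrations.

    Assumes the migrate_progress_send / migrate_progress_recv counters from
    `asinfo -v "statistics"`. A missing counter raises KeyError instead of
    being silently treated as "done".
    """
    stats = {}
    for token in stats_text.replace(';', ' ').split():
        key, sep, value = token.partition('=')
        if sep:
            stats[key] = value
    return (int(stats['migrate_progress_send']) == 0
            and int(stats['migrate_progress_recv']) == 0)


# Values as reported in this thread (no migrations running).
idle = 'migrate_progress_send=0 migrate_progress_recv=0'
# Hypothetical values for a node that is still receiving partitions.
busy = 'migrate_progress_send=0 migrate_progress_recv=37'

print(migrations_finished(idle), migrations_finished(busy))  # True False
```

In a rolling restart, every node in the cluster should pass this check before the next node is stopped.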

moreno · May 18, 2015, 1:51pm

Ok jyoti.

No problem for data loss.

I’m only doing preliminary tests with Aerospike (the ICT company where I work is evaluating whether a NoSQL DB in the cloud can be used to store real-time application data).

jyoti · May 18, 2015, 2:11pm

We are able to reproduce this issue when multiple NICs are present. It seems to be caused by a client bug, which we are fixing. Thanks for your patience. Looking forward to more questions from you.

Thanks, Jyoti

moreno · May 18, 2015, 2:22pm

Ok jyoti.

Thanks for your patience!

After the preliminary tests I’ll move on to stress tests (with related questions).
