Scan and duplicated records (AER-3648)


#1

Hi all,

I’m using Aerospike 3.5.4.

The cluster is composed of 3 nodes (replication factor 2).

Using the Java client (the one from the online course), I inserted 3 records into the “Users” set and 3 records into the “Tweets” set.

When the Java client scans all tweets of all users, it gets duplicated records!

I get the same result when I use a Python script or AQL as the client.

In all 3 cases the client connects to the cluster by specifying the IP address of one of the 3 nodes.

If the client specifies the IP addresses of all 3 nodes when connecting, the scan runs correctly and no duplicated records are returned.

Am I missing something?

Many thanks in advance. Moreno


#2

Hi moreno,

Can you share the following information with us?

  1. Is the namespace the same for both sets?
  2. The AQL command and its output.

Thanks


#3

Thank you pratyyy!

Yes, the namespace is the same for both sets.

Aerospike.conf:

namespace test { replication-factor 2 memory-size 4G default-ttl 30d storage-engine memory }

The sets are “users” and “tweets”.

AQL output when I connect to localhost:

aql> select * from test.users

+------------+------------+----------+--------+---------------+------------+-------------+--------+
| key        | username   | password | region | lasttweeted   | tweetcount | interests   | gender |
+------------+------------+----------+--------+---------------+------------+-------------+--------+
| "pietro"   | "pietro"   | "qwerty" | "n"    | 1430984773607 | 1          | ["inter"]   | "m"    |
| "pietro"   | "pietro"   | "qwerty" | "n"    | 1430984773607 | 1          | ["inter"]   | "m"    |
| "pietro"   | "pietro"   | "qwerty" | "n"    | 1430984773607 | 1          | ["inter"]   | "m"    |
| "giuseppe" | "giuseppe" | "qwerty" | "s"    | 1430984928958 | 1          | ["sicilia"] | "f"    |
| "moreno"   | "moreno"   | "qwerty" | "n"    | 1430984519751 | 1          | ["juve"]    | "m"    |
| "giuseppe" | "giuseppe" | "qwerty" | "s"    | 1430984928958 | 1          | ["sicilia"] | "f"    |
| "moreno"   | "moreno"   | "qwerty" | "n"    | 1430984519751 | 1          | ["juve"]    | "m"    |
+------------+------------+----------+--------+---------------+------------+-------------+--------+

7 rows in set (1.392 secs)

(7 rows when I inserted 3)

If I connect to another host in the cluster:

aql -h 10.100.0.4

aql> select * from test.users

+------------+------------+----------+--------+--------+---------------+------------+-------------+
| key        | username   | password | gender | region | lasttweeted   | tweetcount | interests   |
+------------+------------+----------+--------+--------+---------------+------------+-------------+
| "giuseppe" | "giuseppe" | "qwerty" | "f"    | "s"    | 1430984928958 | 1          | ["sicilia"] |
| "moreno"   | "moreno"   | "qwerty" | "m"    | "n"    | 1430984519751 | 1          | ["juve"]    |
| "pietro"   | "pietro"   | "qwerty" | "m"    | "n"    | 1430984773607 | 1          | ["inter"]   |
| "pietro"   | "pietro"   | "qwerty" | "m"    | "n"    | 1430984773607 | 1          | ["inter"]   |
| "pietro"   | "pietro"   | "qwerty" | "m"    | "n"    | 1430984773607 | 1          | ["inter"]   |
+------------+------------+----------+--------+--------+---------------+------------+-------------+

5 rows in set (1.563 secs)

The same happens when I use a Python client:

./users.10.100.0.3.py

(‘test’, ‘users’, u’pietro’, bytearray(b’\xc8_\xf85\xd1\xfe\x00&\xfc\xe1\x0ctO,\x83\xf17!\xe3\xcf’)) {‘gen’: 13, ‘ttl’: 2589144} {‘username’: u’pietro’}

(‘test’, ‘users’, u’giuseppe’, bytearray(b’\xbf\x90\x01\xdf\x8c=\xf1\xfe\xda\xfa\nL\xaf\xec\nQ\xa4xE\x97’)) {‘gen’: 4, ‘ttl’: 2528163} {‘username’: u’giuseppe’}

(‘test’, ‘users’, u’moreno’, bytearray(b’t]87X\xb2\x82\x01\xf9\xc7P\xbcW\x81\xffKO\xd4\xd0\x18’)) {‘gen’: 2, ‘ttl’: 1901527} {‘username’: u’moreno’}

(‘test’, ‘users’, u’giuseppe’, bytearray(b’\xbf\x90\x01\xdf\x8c=\xf1\xfe\xda\xfa\nL\xaf\xec\nQ\xa4xE\x97’)) {‘gen’: 4, ‘ttl’: 2528163} {‘username’: u’giuseppe’}

(‘test’, ‘users’, u’moreno’, bytearray(b’t]87X\xb2\x82\x01\xf9\xc7P\xbcW\x81\xffKO\xd4\xd0\x18’)) {‘gen’: 2, ‘ttl’: 1901527} {‘username’: u’moreno’}

5 records.

If I use a Python client that connects to hosts 10.100.0.3 and 10.100.0.4, the result is OK:

./users.10.100.0.3e4.py

Here are the first lines of the Python script:

# Configure the client

config = { 'hosts': [ ('10.100.0.3', 3000), ('10.100.0.4', 3000) ] }

and its execution:

(‘test’, ‘users’, u’pietro’, bytearray(b’\xc8_\xf85\xd1\xfe\x00&\xfc\xe1\x0ctO,\x83\xf17!\xe3\xcf’)) {‘gen’: 13, ‘ttl’: 2589074} {‘username’: u’pietro’}

(‘test’, ‘users’, u’giuseppe’, bytearray(b’\xbf\x90\x01\xdf\x8c=\xf1\xfe\xda\xfa\nL\xaf\xec\nQ\xa4xE\x97’)) {‘gen’: 4, ‘ttl’: 2528094} {‘username’: u’giuseppe’}

(‘test’, ‘users’, u’moreno’, bytearray(b’t]87X\xb2\x82\x01\xf9\xc7P\xbcW\x81\xffKO\xd4\xd0\x18’)) {‘gen’: 2, ‘ttl’: 1901458} {‘username’: u’moreno’}

3 records.
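The duplication in the two runs above can be checked mechanically by counting how often each record digest comes back from a scan. The following is only a minimal sketch: it assumes scan results shaped like the (key, meta, bins) tuples printed above, the helper names are hypothetical, and the one-byte digests are placeholders rather than the real 20-byte values.

```python
from collections import Counter


def count_by_digest(records):
    """Count how many times each record digest appears in a scan result."""
    # Each record is a (key, meta, bins) triple; key[3] is the digest.
    return Counter(bytes(key[3]) for key, _meta, _bins in records)


def duplicated_keys(records):
    """Return the user keys whose digest was returned more than once."""
    counts = count_by_digest(records)
    return sorted({key[2] for key, _meta, _bins in records
                   if counts[bytes(key[3])] > 1})


# Same order as the 5-record run above; digests are placeholders.
scan_result = [
    (("test", "users", "pietro", bytearray(b"\x01")), {"gen": 13}, {"username": "pietro"}),
    (("test", "users", "giuseppe", bytearray(b"\x02")), {"gen": 4}, {"username": "giuseppe"}),
    (("test", "users", "moreno", bytearray(b"\x03")), {"gen": 2}, {"username": "moreno"}),
    (("test", "users", "giuseppe", bytearray(b"\x02")), {"gen": 4}, {"username": "giuseppe"}),
    (("test", "users", "moreno", bytearray(b"\x03")), {"gen": 2}, {"username": "moreno"}),
]

print(duplicated_keys(scan_result))
```

Keying on the digest rather than the user key keeps the check valid even when the user key is not stored with the record.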

The hosts run CentOS Linux:

CentOS release 6.5 (Final)

uname -a

Linux couchbase-1 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux


#5

Thanks for the info.

Can you please also send us the output of this command?

asinfo -v "statistics"

Thanks


#6

ASINFO on node 1:

cluster_size=3 cluster_key=A5A28076E110C33B cluster_integrity=true objects=2010382 sub-records=0 total-bytes-disk=0 used-bytes-disk=0 free-pct-disk=0 total-bytes-memory=8589934592 used-bytes-memory=282140675 data-used-bytes-memory=127340916 index-used-bytes-memory=128664448 sindex-used-bytes-memory=26135311 free-pct-memory=96 stat_read_reqs=3 stat_read_reqs_xdr=0 stat_read_success=0 stat_read_errs_notfound=3 stat_read_errs_other=0 stat_write_reqs=4 stat_write_reqs_xdr=0 stat_write_success=3 stat_write_errs=1 stat_xdr_pipe_writes=0 stat_xdr_pipe_miss=0 stat_delete_success=8 stat_rw_timeout=0 udf_read_reqs=0 udf_read_success=0 udf_read_errs_other=0 udf_write_reqs=0 udf_write_success=0 udf_write_err_others=0 udf_delete_reqs=0 udf_delete_success=0 udf_delete_err_others=0 udf_lua_errs=0 udf_scan_rec_reqs=0 udf_query_rec_reqs=0 udf_replica_writes=0 stat_proxy_reqs=0 stat_proxy_reqs_xdr=0 stat_proxy_success=0 stat_proxy_errs=0 stat_ldt_proxy=0 stat_cluster_key_trans_to_proxy_retry=0 stat_cluster_key_transaction_reenqueue=0 stat_slow_trans_queue_push=0 stat_slow_trans_queue_pop=0 stat_slow_trans_queue_batch_pop=0 stat_cluster_key_regular_processed=0 stat_cluster_key_prole_retry=0 stat_cluster_key_err_ack_dup_trans_reenqueue=0 stat_cluster_key_partition_transaction_queue_count=0 stat_cluster_key_err_ack_rw_trans_reenqueue=0 stat_expired_objects=0 stat_evicted_objects=0 stat_deleted_set_objects=0 stat_evicted_set_objects=0 stat_evicted_objects_time=0 stat_zero_bin_records=0 stat_nsup_deletes_not_shipped=0 err_tsvc_requests=1 err_out_of_space=0 err_duplicate_proxy_request=0 err_rw_request_not_found=0 err_rw_pending_limit=0 err_rw_cant_put_unique=0 fabric_msgs_sent=10263439 fabric_msgs_rcvd=10263420 paxos_principal=BB9CF35A5565000 migrate_msgs_sent=10221978 migrate_msgs_recv=10263301 migrate_progress_send=0 migrate_progress_recv=0 migrate_num_incoming_accepted=16464 migrate_num_incoming_refused=0 queue=0 transactions=697914 reaped_fds=12 tscan_initiate=840 tscan_pending=0 
tscan_succeeded=836 tscan_aborted=0 batch_initiate=0 batch_queue=0 batch_tree_count=0 batch_timeout=0 batch_errors=0 info_queue=0 delete_queue=0 proxy_in_progress=0 proxy_initiate=0 proxy_action=0 proxy_retry=0 proxy_retry_q_full=0 proxy_unproxy=0 proxy_retry_same_dest=0 proxy_retry_new_dest=0 write_master=4 write_prole=28 read_dup_prole=0 rw_err_dup_internal=0 rw_err_dup_cluster_key=0 rw_err_dup_send=0 rw_err_write_internal=0 rw_err_write_cluster_key=0 rw_err_write_send=0 rw_err_ack_internal=0 rw_err_ack_nomatch=0 rw_err_ack_badnode=0 client_connections=2 waiting_transactions=0 tree_count=0 record_refs=2010382 record_locks=0 migrate_tx_objs=0 migrate_rx_objs=0 ongoing_write_reqs=0 err_storage_queue_full=0 partition_actual=2728 partition_replica=2760 partition_desync=0 partition_absent=2704 partition_object_count=2010382 partition_ref_count=8192 system_free_mem_pct=78 sindex_ucgarbage_found=0 sindex_gc_locktimedout=0 sindex_gc_inactivity_dur=758318989 sindex_gc_activity_dur=1205011 sindex_gc_list_creation_time=1200847 sindex_gc_list_deletion_time=1035 sindex_gc_objects_validated=721678292 sindex_gc_garbage_found=347617 sindex_gc_garbage_cleaned=347617 system_swapping=false err_replica_null_node=0 err_replica_non_null_node=0 err_sync_copy_null_node=0 err_sync_copy_null_master=0 storage_defrag_corrupt_record=0 err_write_fail_prole_unknown=0 err_write_fail_prole_generation=0 err_write_fail_unknown=0 err_write_fail_key_exists=0 err_write_fail_generation=0 err_write_fail_generation_xdr=0 err_write_fail_bin_exists=0 err_write_fail_parameter=0 err_write_fail_incompatible_type=0 err_write_fail_noxdr=0 err_write_fail_prole_delete=0 err_write_fail_not_found=0 err_write_fail_key_mismatch=0 err_write_fail_record_too_big=0 err_write_fail_bin_name=0 err_write_fail_bin_not_found=0 err_write_fail_forbidden=0 stat_duplicate_operation=0 uptime=761715 stat_write_errs_notfound=1 stat_write_errs_other=0 heartbeat_received_self=5051021 heartbeat_received_foreign=9711102 query_reqs=25 
query_success=15 query_fail=10 query_abort=0 query_avg_rec_count=0 query_short_queue_full=0 query_long_queue_full=0 query_short_running=15 query_long_running=0 query_tracked=0 query_agg=0 query_agg_success=0 query_agg_err=0 query_agg_abort=0 query_agg_avg_rec_count=0 query_lookups=15 query_lookup_success=15 query_lookup_err=0 query_lookup_abort=0 query_lookup_avg_rec_count=0


ASINFO on node 2:

cluster_size=3 cluster_key=A5A28076E110C33B cluster_integrity=true objects=1984282 sub-records=0 total-bytes-disk=0 used-bytes-disk=0 free-pct-disk=0 total-bytes-memory=8589934592 used-bytes-memory=278934760 data-used-bytes-memory=125643464 index-used-bytes-memory=126994048 sindex-used-bytes-memory=26297248 free-pct-memory=96 stat_read_reqs=120 stat_read_reqs_xdr=0 stat_read_success=105 stat_read_errs_notfound=15 stat_read_errs_other=0 stat_write_reqs=1492600 stat_write_reqs_xdr=0 stat_write_success=1492595 stat_write_errs=5 stat_xdr_pipe_writes=0 stat_xdr_pipe_miss=0 stat_delete_success=44 stat_rw_timeout=0 udf_read_reqs=4 udf_read_success=2 udf_read_errs_other=2 udf_write_reqs=14 udf_write_success=14 udf_write_err_others=0 udf_delete_reqs=0 udf_delete_success=0 udf_delete_err_others=0 udf_lua_errs=0 udf_scan_rec_reqs=4 udf_query_rec_reqs=4 udf_replica_writes=0 stat_proxy_reqs=0 stat_proxy_reqs_xdr=0 stat_proxy_success=0 stat_proxy_errs=0 stat_ldt_proxy=0 stat_cluster_key_trans_to_proxy_retry=0 stat_cluster_key_transaction_reenqueue=0 stat_slow_trans_queue_push=0 stat_slow_trans_queue_pop=0 stat_slow_trans_queue_batch_pop=0 stat_cluster_key_regular_processed=0 stat_cluster_key_prole_retry=0 stat_cluster_key_err_ack_dup_trans_reenqueue=0 stat_cluster_key_partition_transaction_queue_count=0 stat_cluster_key_err_ack_rw_trans_reenqueue=0 stat_expired_objects=0 stat_evicted_objects=0 stat_deleted_set_objects=0 stat_evicted_set_objects=0 stat_evicted_objects_time=0 stat_zero_bin_records=0 stat_nsup_deletes_not_shipped=0 err_tsvc_requests=5 err_out_of_space=0 err_duplicate_proxy_request=0 err_rw_request_not_found=0 err_rw_pending_limit=0 err_rw_cant_put_unique=0 fabric_msgs_sent=12659592 fabric_msgs_rcvd=12659550 paxos_principal=BB9CF35A5565000 migrate_msgs_sent=9605908 migrate_msgs_recv=9659115 migrate_progress_send=0 migrate_progress_recv=0 migrate_num_incoming_accepted=24452 migrate_num_incoming_refused=0 queue=0 transactions=2889093 reaped_fds=94 tscan_initiate=781 
tscan_pending=0 tscan_succeeded=789 tscan_aborted=0 batch_initiate=0 batch_queue=0 batch_tree_count=0 batch_timeout=0 batch_errors=0 info_queue=0 delete_queue=0 proxy_in_progress=0 proxy_initiate=0 proxy_action=0 proxy_retry=0 proxy_retry_q_full=0 proxy_unproxy=0 proxy_retry_same_dest=0 proxy_retry_new_dest=0 write_master=1492600 write_prole=1507713 read_dup_prole=0 rw_err_dup_internal=0 rw_err_dup_cluster_key=0 rw_err_dup_send=0 rw_err_write_internal=0 rw_err_write_cluster_key=0 rw_err_write_send=0 rw_err_ack_internal=0 rw_err_ack_nomatch=0 rw_err_ack_badnode=0 client_connections=4 waiting_transactions=0 tree_count=0 record_refs=1984282 record_locks=0 migrate_tx_objs=0 migrate_rx_objs=0 ongoing_write_reqs=0 err_storage_queue_full=0 partition_actual=2708 partition_replica=2712 partition_desync=0 partition_absent=2772 partition_object_count=1984282 partition_ref_count=8192 system_free_mem_pct=71 sindex_ucgarbage_found=0 sindex_gc_locktimedout=0 sindex_gc_inactivity_dur=2129846129 sindex_gc_activity_dur=3499873 sindex_gc_list_creation_time=3489177 sindex_gc_list_deletion_time=2072 sindex_gc_objects_validated=2053448283 sindex_gc_garbage_found=696100 sindex_gc_garbage_cleaned=696100 system_swapping=false err_replica_null_node=0 err_replica_non_null_node=0 err_sync_copy_null_node=0 err_sync_copy_null_master=0 storage_defrag_corrupt_record=0 err_write_fail_prole_unknown=0 err_write_fail_prole_generation=0 err_write_fail_unknown=0 err_write_fail_key_exists=4 err_write_fail_generation=0 err_write_fail_generation_xdr=0 err_write_fail_bin_exists=0 err_write_fail_parameter=0 err_write_fail_incompatible_type=0 err_write_fail_noxdr=0 err_write_fail_prole_delete=0 err_write_fail_not_found=0 err_write_fail_key_mismatch=0 err_write_fail_record_too_big=0 err_write_fail_bin_name=0 err_write_fail_bin_not_found=0 err_write_fail_forbidden=0 stat_duplicate_operation=0 uptime=2152798 stat_write_errs_notfound=1 stat_write_errs_other=4 heartbeat_received_self=14275817 
heartbeat_received_foreign=18934583 query_reqs=101 query_success=60 query_fail=31 query_abort=10 query_avg_rec_count=1259 query_short_queue_full=0 query_long_queue_full=0 query_short_running=46 query_long_running=24 query_tracked=14 query_agg=25 query_agg_success=15 query_agg_err=0 query_agg_abort=10 query_agg_avg_rec_count=3523 query_lookups=45 query_lookup_success=45 query_lookup_err=0 query_lookup_abort=0 query_lookup_avg_rec_count=0


ASINFO on node 3:

cluster_size=3 cluster_key=A5A28076E110C33B cluster_integrity=true objects=2005344 sub-records=0 total-bytes-disk=0 used-bytes-disk=0 free-pct-disk=0 total-bytes-memory=8589934592 used-bytes-memory=277329629 data-used-bytes-memory=127016768 index-used-bytes-memory=128342016 sindex-used-bytes-memory=21970845 free-pct-memory=96 stat_read_reqs=0 stat_read_reqs_xdr=0 stat_read_success=0 stat_read_errs_notfound=0 stat_read_errs_other=0 stat_write_reqs=0 stat_write_reqs_xdr=0 stat_write_success=0 stat_write_errs=0 stat_xdr_pipe_writes=0 stat_xdr_pipe_miss=0 stat_delete_success=0 stat_rw_timeout=0 udf_read_reqs=0 udf_read_success=0 udf_read_errs_other=0 udf_write_reqs=0 udf_write_success=0 udf_write_err_others=0 udf_delete_reqs=0 udf_delete_success=0 udf_delete_err_others=0 udf_lua_errs=0 udf_scan_rec_reqs=0 udf_query_rec_reqs=0 udf_replica_writes=0 stat_proxy_reqs=0 stat_proxy_reqs_xdr=0 stat_proxy_success=0 stat_proxy_errs=0 stat_ldt_proxy=0 stat_cluster_key_trans_to_proxy_retry=0 stat_cluster_key_transaction_reenqueue=0 stat_slow_trans_queue_push=0 stat_slow_trans_queue_pop=0 stat_slow_trans_queue_batch_pop=0 stat_cluster_key_regular_processed=0 stat_cluster_key_prole_retry=0 stat_cluster_key_err_ack_dup_trans_reenqueue=0 stat_cluster_key_partition_transaction_queue_count=0 stat_cluster_key_err_ack_rw_trans_reenqueue=0 stat_expired_objects=0 stat_evicted_objects=0 stat_deleted_set_objects=0 stat_evicted_set_objects=0 stat_evicted_objects_time=0 stat_zero_bin_records=0 stat_nsup_deletes_not_shipped=0 err_tsvc_requests=0 err_out_of_space=0 err_duplicate_proxy_request=0 err_rw_request_not_found=0 err_rw_pending_limit=0 err_rw_cant_put_unique=0 fabric_msgs_sent=3042378 fabric_msgs_rcvd=3042365 paxos_principal=BB9CF35A5565000 migrate_msgs_sent=3028681 migrate_msgs_recv=3042350 migrate_progress_send=0 migrate_progress_recv=0 migrate_num_incoming_accepted=5476 migrate_num_incoming_refused=0 queue=0 transactions=56866 reaped_fds=0 tscan_initiate=18 tscan_pending=0 
tscan_succeeded=18 tscan_aborted=0 batch_initiate=0 batch_queue=0 batch_tree_count=0 batch_timeout=0 batch_errors=0 info_queue=0 delete_queue=0 proxy_in_progress=0 proxy_initiate=0 proxy_action=0 proxy_retry=0 proxy_retry_q_full=0 proxy_unproxy=0 proxy_retry_same_dest=0 proxy_retry_new_dest=0 write_master=0 write_prole=0 read_dup_prole=0 rw_err_dup_internal=0 rw_err_dup_cluster_key=0 rw_err_dup_send=0 rw_err_write_internal=0 rw_err_write_cluster_key=0 rw_err_write_send=0 rw_err_ack_internal=0 rw_err_ack_nomatch=0 rw_err_ack_badnode=0 client_connections=3 waiting_transactions=0 tree_count=0 record_refs=2005344 record_locks=0 migrate_tx_objs=0 migrate_rx_objs=0 ongoing_write_reqs=0 err_storage_queue_full=0 partition_actual=2756 partition_replica=2720 partition_desync=0 partition_absent=2716 partition_object_count=2005344 partition_ref_count=8192 system_free_mem_pct=80 sindex_ucgarbage_found=0 sindex_gc_locktimedout=0 sindex_gc_inactivity_dur=10808110 sindex_gc_activity_dur=15890 sindex_gc_list_creation_time=15866 sindex_gc_list_deletion_time=2 sindex_gc_objects_validated=9360170 sindex_gc_garbage_found=0 sindex_gc_garbage_cleaned=0 system_swapping=false err_replica_null_node=0 err_replica_non_null_node=0 err_sync_copy_null_node=0 err_sync_copy_null_master=0 storage_defrag_corrupt_record=0 err_write_fail_prole_unknown=0 err_write_fail_prole_generation=0 err_write_fail_unknown=0 err_write_fail_key_exists=0 err_write_fail_generation=0 err_write_fail_generation_xdr=0 err_write_fail_bin_exists=0 err_write_fail_parameter=0 err_write_fail_incompatible_type=0 err_write_fail_noxdr=0 err_write_fail_prole_delete=0 err_write_fail_not_found=0 err_write_fail_key_mismatch=0 err_write_fail_record_too_big=0 err_write_fail_bin_name=0 err_write_fail_bin_not_found=0 err_write_fail_forbidden=0 stat_duplicate_operation=0 uptime=10861 stat_write_errs_notfound=0 stat_write_errs_other=0 heartbeat_received_self=72117 heartbeat_received_foreign=144224 query_reqs=0 query_success=0 query_fail=0 
query_abort=0 query_avg_rec_count=0 query_short_queue_full=0 query_long_queue_full=0 query_short_running=0 query_long_running=0 query_tracked=0 query_agg=0 query_agg_success=0 query_agg_err=0 query_agg_abort=0 query_agg_avg_rec_count=0 query_lookups=0 query_lookup_success=0 query_lookup_err=0 query_lookup_abort=0 query_lookup_avg_rec_count=0

Thanks again! Moreno


#7

Hi Moreno,

Since you have a lot of data in the namespace, it is possible that you inserted some records into this set earlier. Scanning through different nodes should not change the result.

Let’s investigate this further. Can you please take a backup of the set in question? You can use the following command:

asbackup -d <dir_name> -n namespace_name -s set_name -h host_name

Also, can you share the output of the following command on each host?

asinfo -v "sets"

Thanks


#8

asbackup fails on all 3 nodes. The output is:

asbackup -d /MORENO_91/BACKUP -n test -s users -h 10.100.0.2

Backing up From: host 10.100.0.2 port 3000 namespace test set users bin_list (null) to directory /MORENO_91/BACKUP with scan_pct 100

Nodes are repeated or different addresses of same node. Give proper input


asinfo -v "sets" node1:

ns_name=test:set_name=demo2:n_objects=1340058:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;ns_name=test:set_name=ccc:n_objects=670322:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;ns_name=test:set_name=tweets:n_objects=1:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;ns_name=test:set_name=users:n_objects=1:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;


asinfo -v "sets" node2:

ns_name=test:set_name=ccc:n_objects=661095:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;ns_name=test:set_name=demo2:n_objects=1323181:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;ns_name=test:set_name=users:n_objects=3:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;ns_name=test:set_name=tweets:n_objects=3:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;


asinfo -v "sets" node3:

ns_name=test:set_name=demo2:n_objects=1336759:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;ns_name=test:set_name=ccc:n_objects=668581:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;ns_name=test:set_name=tweets:n_objects=2:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;ns_name=test:set_name=users:n_objects=2:set-stop-write-count=0:set-evict-hwm-count=0:set-enable-xdr=use-default:set-delete=false;
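The per-node n_objects values above can be cross-checked against the number of records actually inserted. The sketch below assumes that, in this server version, a set's n_objects on a node counts both master and replica copies, so the cluster-wide sum divided by the replication factor approximates the number of unique records. The function names are hypothetical and the sample strings are abbreviated to the fields that matter.

```python
def parse_sets(info_text):
    """Parse `asinfo -v "sets"` output into {(namespace, set_name): n_objects}."""
    counts = {}
    for entry in info_text.strip().split(";"):
        if not entry:
            continue  # skip the empty piece after the trailing ';'
        fields = dict(pair.split("=", 1) for pair in entry.split(":"))
        counts[(fields["ns_name"], fields["set_name"])] = int(fields["n_objects"])
    return counts


def unique_records(per_node_outputs, set_key, replication_factor):
    """Estimate unique records: total copies across nodes / replication factor."""
    total = sum(parse_sets(text).get(set_key, 0) for text in per_node_outputs)
    return total // replication_factor


# Abbreviated versions of the three outputs above (users and tweets only).
node_outputs = [
    "ns_name=test:set_name=tweets:n_objects=1;ns_name=test:set_name=users:n_objects=1;",
    "ns_name=test:set_name=users:n_objects=3;ns_name=test:set_name=tweets:n_objects=3;",
    "ns_name=test:set_name=tweets:n_objects=2;ns_name=test:set_name=users:n_objects=2;",
]

print(unique_records(node_outputs, ("test", "users"), 2))
print(unique_records(node_outputs, ("test", "tweets"), 2))
```

Under that assumption, 1 + 3 + 2 = 6 copies with replication factor 2 gives 3 unique records per set, which matches what was inserted and suggests the duplicates come from the scan path rather than from extra stored records.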


#9

Hi Moreno,

Looking at the backup failure, it seems the services list is corrupted on at least one node. This could trigger the scan twice on the same node.

To confirm this, can you share the output of the following command from each node?

asinfo -v "services"

Also, to be 100% sure, can you share the config file of each node?


#10

Hi pratyyy,


Node 1

ip a

eth0: 138.132.43.90/24 eth1: 10.100.0.2/24 eth1.1: 20.100.0.2/24

asinfo -v "services"

10.100.0.3:3000;20.100.0.3:3000;138.132.43.91:3000;10.100.0.4:3000;138.132.43.92:3000


Node 2

ip a

eth0: 138.132.43.91/24 eth1: 10.100.0.3/24 eth1.1: 20.100.0.3/24

asinfo -v "services"

10.100.0.2:3000;20.100.0.2:3000;138.132.43.90:3000;10.100.0.4:3000;138.132.43.92:3000


Node 3

ip a

eth0: 138.132.43.92/24 eth1: 10.100.0.4/24 (no eth1.1)

asinfo -v "services"

10.100.0.3:3000;20.100.0.3:3000;138.132.43.91:3000;10.100.0.2:3000;20.100.0.2:3000;138.132.43.90:3000
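To see whether a services list advertises the same peer under several addresses, the listed endpoints can be grouped by the node that owns each IP. This is only an illustrative sketch: the node labels and the function name are hypothetical, and the interface map is copied from the "ip a" summaries above.

```python
def peers_by_node(services_text, interfaces):
    """Group the addresses in a services list by the node that owns them."""
    owner = {addr: node for node, addrs in interfaces.items() for addr in addrs}
    grouped = {}
    for endpoint in services_text.strip().split(";"):
        if not endpoint:
            continue
        host = endpoint.rsplit(":", 1)[0]  # drop the ":3000" port suffix
        grouped.setdefault(owner.get(host, "unknown"), []).append(host)
    return grouped


# Interface addresses per node, taken from the "ip a" output above.
interfaces = {
    "node1": ["138.132.43.90", "10.100.0.2", "20.100.0.2"],
    "node2": ["138.132.43.91", "10.100.0.3", "20.100.0.3"],
    "node3": ["138.132.43.92", "10.100.0.4"],
}

# Services list reported by node 1.
services_node1 = (
    "10.100.0.3:3000;20.100.0.3:3000;138.132.43.91:3000;"
    "10.100.0.4:3000;138.132.43.92:3000"
)

grouped = peers_by_node(services_node1, interfaces)
aliased = {node: addrs for node, addrs in grouped.items() if len(addrs) > 1}
print(aliased)
```

For node 1's list, node 2 shows up under three addresses and node 3 under two, so a client that treats every advertised address as a distinct node could contact the same node more than once.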


/etc/aerospike/aerospike.conf (is the same, identical, on the 3 nodes)

(sorry, copy and paste of the file is not easily readable)

service {
    user root
    group root
    paxos-single-replica-limit 1
    pidfile /var/run/aerospike/asd.pid
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 15000
}

logging {
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        address any
        port 3000
    }

    heartbeat {
        mode multicast
        address 239.1.99.222
        port 9918
        interval 150
        timeout 10
    }

    fabric {
        port 3001
    }

    info {
        port 3003
    }
}

namespace test {
    replication-factor 2
    memory-size 4G
    default-ttl 30d
    storage-engine memory
}

namespace bar {
    replication-factor 2
    memory-size 4G
    default-ttl 30d
    storage-engine memory
}


#11

Hi pratyyy, do you have any news?

Do you need other info?

Thanks Moreno


#12

Hi Moreno,

Extremely sorry for the late reply. From your asinfo output we can see that each machine exposes multiple NIC addresses, so the same node ID is advertised under several addresses. We will enhance our scan API to handle duplicate node IDs efficiently.

You can resolve the issue by specifying the external address of each node in the service sub-section of the network section of your config file, e.g. for the node at 10.100.0.3:

service {
    address any
    port 3000
    access-address 10.100.0.3
}

By adding access-address, each node will expose only one IP address for client connections.

Let us know whether this solution works for you or whether you need more help.


#13

Thank you jyoti (late reply? You and pratyyy are so collaborative with me :-)).

I changed aerospike.conf on the 3 nodes according to your suggestion.

Node 1:

Added row:

access-address 10.100.0.2

Node2:

Added row:

access-address 10.100.0.3

Node3:

Added row:

access-address 10.100.0.4

The output of asinfo -v "services" seems to be right:

Node1:

10.100.0.3:3000;10.100.0.4:3000

Node2:

10.100.0.2:3000;10.100.0.4:3000

Node3:

10.100.0.3:3000;10.100.0.2:3000

I restarted the aerospike service this way:

service aerospike stop

service aerospike start

The bad news is that now I see only 1 record in the “Users” set and 1 in the “Tweets” set.

Is something wrong in this sequence of commands or in the aerospike.conf update?

Thanks again Moreno


#14

Can you run asbackup now and share its output?


#15

All 3 backups were ok.

Output for Node1 (for example):

asbackup -d /MORENO_90/BACKUP -n test -s users

Backing up From: host 127.0.0.1 port 3000 namespace test set users bin_list (null) to directory /MORENO_90/BACKUP with scan_pct 100

Aerospike scan nodes: 3 nodes

Node_name Objects Rep_fact

BB96753A5565000 469350 2

BB9CF35A5565000 293504 2

BB9B1ECA5565000 329158 2

directory “/MORENO_90/BACKUP” prepared for backup

starting backup for node BB96753A5565000

starting backup for node BB9CF35A5565000

starting backup for node BB9B1ECA5565000

May 18 2015 12:22:11 GMT: New file created /MORENO_90/BACKUP/BB9CF35A5565000_00000.asb

May 18 2015 12:22:11 GMT: New file created /MORENO_90/BACKUP/BB96753A5565000_00000.asb

May 18 2015 12:22:11 GMT: New file created /MORENO_90/BACKUP/BB9B1ECA5565000_00000.asb

Complete backup for node BB9B1ECA5565000 and total backed up from this node: 0

Complete backup for node BB96753A5565000 and total backed up from this node: 0

Complete backup for node BB9CF35A5565000 and total backed up from this node: 1

May 18 2015 12:22:12 GMT: backed up records 0%

May 18 2015 12:22:12 GMT: backed up records 100%

May 18 2015 12:22:12 GMT: Backup successfully completed.

May 18 2015 12:22:12 GMT: Total backed up records from all nodes 1


#16

From the asbackup output, it seems you have only one record in the users set. Are you sure there are 3 records in the users set? You can confirm this by running the following command on each node:

asinfo -v "sets"


#17

asinfo -v "sets" reported 1 record for set_name=Users.

I didn’t delete the other 2 users (or the other 2 tweets).

So I stopped Aerospike and replaced the correct aerospike.conf with the older version (where access-address was not specified).

After restarting it, I had duplicated records.

When I stopped Aerospike again, put back the correct aerospike.conf (with the “access-address” row) and restarted, ALL Users and Tweets records had disappeared!

(asinfo -v "sets" now returns empty output)

I don’t know whether this last test is interesting or stupid: perhaps the best thing is to delete all previous records and start again from the beginning with the correct aerospike.conf.


#18

Your data is stored in memory only; that is why you lost your records. Sorry, I should have explained the steps properly. Whenever you restart nodes, do it one node at a time and wait for migrations to finish before restarting the next one.
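One way to apply the "wait for migrations to finish" advice is to check the migration counters in the statistics output before restarting the next node. The sketch below assumes that migrate_progress_send and migrate_progress_recv (both present in the statistics pasted earlier in this thread) reach 0 when migrations are done. The function names are hypothetical, and the parser accepts either ';' or whitespace between name=value pairs.

```python
import re


def parse_statistics(stats_text):
    """Parse `asinfo -v "statistics"` output into a {name: value} dict."""
    stats = {}
    for pair in re.split(r"[;\s]+", stats_text.strip()):
        if "=" in pair:
            name, value = pair.split("=", 1)
            stats[name] = value
    return stats


def migrations_finished(stats_text):
    """True only if both migration progress counters are present and zero."""
    stats = parse_statistics(stats_text)
    return (stats.get("migrate_progress_send") == "0"
            and stats.get("migrate_progress_recv") == "0")


# Fragments in the style of the statistics output shown earlier.
idle = "cluster_size=3 migrate_progress_send=0 migrate_progress_recv=0 queue=0"
busy = "cluster_size=3;migrate_progress_send=12;migrate_progress_recv=0"

print(migrations_finished(idle), migrations_finished(busy))
```

A rolling restart would poll this on every node and move on to the next one only when all of them report finished migrations. With in-memory namespaces this matters because a restarted node comes back empty and has to be refilled from the remaining replicas.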


#19

Ok jyoti.

No problem for data loss.

I’m only doing preliminary tests with Aerospike (the ICT company where I work is evaluating whether a NoSQL DB in the cloud can be used to store real-time application data).


#20

We are able to reproduce this issue when a node has multiple NICs. It seems to be caused by a client bug, which we are fixing. Thanks for your patience. We look forward to more questions from you.

Thanks, Jyoti


#21

Ok jyoti.

Thanks for your patience!

After the preliminary tests I’ll switch to stress tests (with related questions :-) )