Hi all, I went through another round of testing with Aerospike today and found that the cluster's overall write throughput dropped quite dramatically with each new node we added.
We're using c3.8xlarges in the recommended memory + file-backed storage configuration. Is this behavior expected?
FWIW, we're inserting single-bin records; we were topping out around ~150k writes per second on a single node, around ~100k with two nodes, and as low as ~50k with three.
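In case it helps, here is roughly what the insert loop looks like. This is a trimmed-down sketch using the Python client; the set name, bin name, and record count are placeholders rather than our exact schema:

import time
import aerospike

# Trimmed-down sketch of the single-bin insert loop (set/bin names are placeholders).
config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()

count = 1000000
start = time.time()
for i in range(count):
    key = ('signal', 'bench', i)            # (namespace, set, user key)
    client.put(key, {'payload': 'x' * 50})  # one bin per record
elapsed = time.time() - start
print('wrote %d records, ~%.0f TPS' % (count, count / elapsed))
client.close()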
Thanks in advance…
raj
March 12, 2015, 7:50am
Joshua,
Are you running with data in memory? I think that is what you meant when you said "memory +", but just confirming.
Can you grab the following:
After the run:
asinfo -v 'statistics' from all the nodes
While running:
top
iostat -xmt 5 10
iftop
Also, where are your clients running from? Are they in the same VPC / Availability Zone / region? Can you also grab a ping from your client box to your server box while you're running the load.
I am trying to understand what the bottleneck is. A c3.8xlarge is capable of doing a lot more.
– R
Here’s the namespace config:
namespace signal {
    replication-factor 1
    memory-size 60G
    default-ttl 0 # 30 days, use 0 to never expire/evict.
    ldt-enabled true
    storage-engine device {
        file /mnt/aerospike/signal.dat
        filesize 300G
        data-in-memory true # Store data in memory in addition to file.
    }
}
I was running the clients on other hosts, but that was even slower, so I switched to running the client on the same VM as the server(s) so they could just connect to localhost. That alone gave us a 3x performance increase.
I’ve spun down the other hosts now, but here’s the output of statistics:
cluster_size=1;cluster_key=1BB7FB78C975E97E;cluster_integrity=true;objects=73860812;sub-records=0;total-bytes-disk=322122547200;used-bytes-disk=16139206272;free-pct-disk=94;total-bytes-memory=64424509440;used-bytes-memory=7856325928;data-used-bytes-memory=3129233960;index-used-bytes-memory=4727091968;sindex-used-bytes-memory=0;free-pct-memory=87;stat_read_reqs=0;stat_read_reqs_xdr=0;stat_read_success=0;stat_read_errs_notfound=0;stat_read_errs_other=0;stat_write_reqs=99179388;stat_write_reqs_xdr=0;stat_write_success=99179388;stat_write_errs=0;stat_xdr_pipe_writes=0;stat_xdr_pipe_miss=0;stat_delete_success=0;stat_rw_timeout=0;udf_read_reqs=0;udf_read_success=0;udf_read_errs_other=0;udf_write_reqs=0;udf_write_success=0;udf_write_err_others=0;udf_delete_reqs=0;udf_delete_success=0;udf_delete_err_others=0;udf_lua_errs=0;udf_scan_rec_reqs=0;udf_query_rec_reqs=0;udf_replica_writes=0;stat_proxy_reqs=0;stat_proxy_reqs_xdr=0;stat_proxy_success=0;stat_proxy_errs=0;stat_ldt_proxy=0;stat_cluster_key_trans_to_proxy_retry=0;stat_cluster_key_transaction_reenqueue=0;stat_slow_trans_queue_push=634;stat_slow_trans_queue_pop=634;stat_slow_trans_queue_batch_pop=21;stat_cluster_key_regular_processed=0;stat_cluster_key_prole_retry=0;stat_cluster_key_err_ack_dup_trans_reenqueue=0;stat_cluster_key_partition_transaction_queue_count=0;stat_cluster_key_err_ack_rw_trans_reenqueue=0;stat_expired_objects=0;stat_evicted_objects=0;stat_deleted_set_objects=0;stat_evicted_set_objects=0;stat_evicted_objects_time=0;stat_zero_bin_records=0;stat_nsup_deletes_not_shipped=0;err_tsvc_requests=0;err_out_of_space=0;err_duplicate_proxy_request=0;err_rw_request_not_found=0;err_rw_pending_limit=0;err_rw_cant_put_unique=0;fabric_msgs_sent=12334;fabric_msgs_rcvd=12323;paxos_principal=BB98E85FA0A0022;migrate_msgs_sent=6148;migrate_msgs_recv=12299;migrate_progress_send=0;migrate_progress_recv=0;migrate_num_incoming_accepted=3391;migrate_num_incoming_refused=0;queue=0;transactions=99629949;reaped_fds=2;tscan_initiate=0;tscan_pending=0;tscan_succeeded=0;tscan_aborted=0;batch_initiate=0;batch_queue=0;batch_tree_count=0;batch_timeout=0;batch_errors=0;info_queue=0;delete_queue=0;proxy_in_progress=0;proxy_initiate=0;proxy_action=0;proxy_retry=0;proxy_retry_q_full=0;proxy_unproxy=0;proxy_retry_same_dest=0;proxy_retry_new_dest=0;write_master=99179388;write_prole=0;read_dup_prole=0;rw_err_dup_internal=0;rw_err_dup_cluster_key=0;rw_err_dup_send=0;rw_err_write_internal=0;rw_err_write_cluster_key=0;rw_err_write_send=0;rw_err_ack_internal=0;rw_err_ack_nomatch=0;rw_err_ack_badnode=0;client_connections=1;waiting_transactions=0;tree_count=0;record_refs=73860812;record_locks=0;migrate_tx_objs=0;migrate_rx_objs=0;ongoing_write_reqs=0;err_storage_queue_full=0;partition_actual=4096;partition_replica=0;partition_desync=0;partition_absent=0;partition_object_count=73860812;partition_ref_count=4096;system_free_mem_pct=85;sindex_ucgarbage_found=0;sindex_gc_locktimedout=0;sindex_gc_inactivity_dur=0;sindex_gc_activity_dur=0;sindex_gc_list_creation_time=0;sindex_gc_list_deletion_time=0;sindex_gc_objects_validated=0;sindex_gc_garbage_found=0;sindex_gc_garbage_cleaned=0;system_swapping=false;err_replica_null_node=0;err_replica_non_null_node=0;err_sync_copy_null_node=0;err_sync_copy_null_master=0;storage_defrag_corrupt_record=0;err_write_fail_prole_unknown=0;err_write_fail_prole_generation=0;err_write_fail_unknown=0;err_write_fail_key_exists=0;err_write_fail_generation=0;err_write_fail_generation_xdr=0;err_write_fail_bin_exists=0;err_write_fail_parameter=0;err_write_fail
_incompatible_type=0;err_write_fail_noxdr=0;err_write_fail_prole_delete=0;err_write_fail_not_found=0;err_write_fail_key_mismatch=0;err_write_fail_record_too_big=0;err_write_fail_bin_name=0;err_write_fail_bin_not_found=0;err_write_fail_forbidden=0;stat_duplicate_operation=0;uptime=69026;stat_write_errs_notfound=0;stat_write_errs_other=0;heartbeat_received_self=0;heartbeat_received_foreign=29207;query_reqs=0;query_success=0;query_fail=0;query_abort=0;query_avg_rec_count=0;query_short_queue_full=0;query_long_queue_full=0;query_short_running=0;query_long_running=0;query_tracked=0;query_agg=0;query_agg_success=0;query_agg_err=0;query_agg_abort=0;query_agg_avg_rec_count=0;query_lookups=0;query_lookup_success=0;query_lookup_err=0;query_lookup_abort=0;query_lookup_avg_rec_count=0
03/12/2015 01:36:31 PM
avg-cpu: %user %nice %system %iowait %steal %idle
1.05 0.00 0.81 0.02 0.07 98.05
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdap1 0.00 1.62 0.33 1.80 0.01 0.04 39.21 0.01 3.36 0.27 3.91 1.05 0.22
xvdb 0.00 0.03 2.18 23.35 0.09 0.96 83.80 0.80 31.38 4.96 33.85 0.28 0.72
xvdc 0.00 0.04 2.18 23.31 0.09 0.96 83.91 0.75 29.61 4.99 31.91 0.28 0.72
dm-0 0.00 0.00 4.35 46.33 0.18 1.91 84.42 1.57 30.88 4.99 33.31 0.15 0.74
03/12/2015 01:36:36 PM
avg-cpu: %user %nice %system %iowait %steal %idle
0.03 0.00 0.03 0.01 0.01 99.93
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdap1 0.00 0.60 0.20 1.00 0.00 0.01 16.00 0.00 2.67 4.00 2.40 2.67 0.32
xvdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Recreated the node as an HVM instance after finding this High Scalability post: "1 Aerospike server X 1 Amazon EC2 instance = 1 Million TPS for just $1.68/hour"
… not seeing much of an improvement.
jbuss@aero:~$ asinfo -v 'statistics'
cluster_size=1;cluster_key=89667A74D9F482F1;cluster_integrity=true;objects=33028613;sub-records=0;total-bytes-disk=322122547200;used-bytes-disk=7217096704;free-pct-disk=97;total-bytes-memory=64424509440;used-bytes-memory=3513182344;data-used-bytes-memory=1399351112;index-used-bytes-memory=2113831232;sindex-used-bytes-memory=0;free-pct-memory=94;stat_read_reqs=0;stat_read_reqs_xdr=0;stat_read_success=0;stat_read_errs_notfound=0;stat_read_errs_other=0;stat_write_reqs=35222686;stat_write_reqs_xdr=0;stat_write_success=35222683;stat_write_errs=0;stat_xdr_pipe_writes=0;stat_xdr_pipe_miss=0;stat_delete_success=0;stat_rw_timeout=0;udf_read_reqs=0;udf_read_success=0;udf_read_errs_other=0;udf_write_reqs=0;udf_write_success=0;udf_write_err_others=0;udf_delete_reqs=0;udf_delete_success=0;udf_delete_err_others=0;udf_lua_errs=0;udf_scan_rec_reqs=0;udf_query_rec_reqs=0;udf_replica_writes=0;stat_proxy_reqs=0;stat_proxy_reqs_xdr=0;stat_proxy_success=0;stat_proxy_errs=0;stat_ldt_proxy=0;stat_cluster_key_trans_to_proxy_retry=0;stat_cluster_key_transaction_reenqueue=0;stat_slow_trans_queue_push=0;stat_slow_trans_queue_pop=0;stat_slow_trans_queue_batch_pop=0;stat_cluster_key_regular_processed=0;stat_cluster_key_prole_retry=0;stat_cluster_key_err_ack_dup_trans_reenqueue=0;stat_cluster_key_partition_transaction_queue_count=0;stat_cluster_key_err_ack_rw_trans_reenqueue=0;stat_expired_objects=0;stat_evicted_objects=0;stat_deleted_set_objects=0;stat_evicted_set_objects=0;stat_evicted_objects_time=0;stat_zero_bin_records=0;stat_nsup_deletes_not_shipped=0;err_tsvc_requests=0;err_out_of_space=0;err_duplicate_proxy_request=0;err_rw_request_not_found=0;err_rw_pending_limit=0;err_rw_cant_put_unique=0;fabric_msgs_sent=591025;fabric_msgs_rcvd=591019;paxos_principal=BB92900FD0A0022;migrate_msgs_sent=588939;migrate_msgs_recv=591012;migrate_progress_send=0;migrate_progress_recv=0;migrate_num_incoming_accepted=35;migrate_num_incoming_refused=0;queue=0;transactions=35290955;reaped_fds=0;tscan_initiate=0;tscan_pending=0;tscan_succeeded=0;tscan_aborted=0;batch_initiate=0;batch_queue=0;batch_tree_count=0;batch_timeout=0;batch_errors=0;info_queue=0;delete_queue=0;proxy_in_progress=0;proxy_initiate=0;proxy_action=0;proxy_retry=0;proxy_retry_q_full=0;proxy_unproxy=0;proxy_retry_same_dest=0;proxy_retry_new_dest=0;write_master=35222704;write_prole=0;read_dup_prole=0;rw_err_dup_internal=0;rw_err_dup_cluster_key=0;rw_err_dup_send=0;rw_err_write_internal=0;rw_err_write_cluster_key=0;rw_err_write_send=0;rw_err_ack_internal=0;rw_err_ack_nomatch=0;rw_err_ack_badnode=0;client_connections=523;waiting_transactions=0;tree_count=0;record_refs=33028632;record_locks=0;migrate_tx_objs=0;migrate_rx_objs=0;ongoing_write_reqs=2;err_storage_queue_full=0;partition_actual=4096;partition_replica=0;partition_desync=0;partition_absent=0;partition_object_count=33028679;partition_ref_count=4099;system_free_mem_pct=92;sindex_ucgarbage_found=0;sindex_gc_locktimedout=0;sindex_gc_inactivity_dur=0;sindex_gc_activity_dur=0;sindex_gc_list_creation_time=0;sindex_gc_list_deletion_time=0;sindex_gc_objects_validated=0;sindex_gc_garbage_found=0;sindex_gc_garbage_cleaned=0;system_swapping=false;err_replica_null_node=0;err_replica_non_null_node=0;err_sync_copy_null_node=0;err_sync_copy_null_master=0;storage_defrag_corrupt_record=0;err_write_fail_prole_unknown=0;err_write_fail_prole_generation=0;err_write_fail_unknown=0;err_write_fail_key_exists=0;err_write_fail_generation=0;err_write_fail_generation_xdr=0;err_write_fail_bin_exists=0;err_write_fail_parameter=0;err_write_fail_
incompatible_type=0;err_write_fail_noxdr=0;err_write_fail_prole_delete=0;err_write_fail_not_found=0;err_write_fail_key_mismatch=0;err_write_fail_record_too_big=0;err_write_fail_bin_name=0;err_write_fail_bin_not_found=0;err_write_fail_forbidden=0;stat_duplicate_operation=0;uptime=656;stat_write_errs_notfound=0;stat_write_errs_other=0;heartbeat_received_self=0;heartbeat_received_foreign=686;query_reqs=0;query_success=0;query_fail=0;query_abort=0;query_avg_rec_count=0;query_short_queue_full=0;query_long_queue_full=0;query_short_running=0;query_long_running=0;query_tracked=0;query_agg=0;query_agg_success=0;query_agg_err=0;query_agg_abort=0;query_agg_avg_rec_count=0;query_lookups=0;query_lookup_success=0;query_lookup_err=0;query_lookup_abort=0;query_lookup_avg_rec_count=0
avg-cpu: %user %nice %system %iowait %steal %idle
0.32 0.00 0.25 0.08 0.32 99.03
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda 0.15 13.78 1.46 10.61 0.03 0.30 56.02 1.05 86.87 23.58 95.55 1.82 2.19
xvdb 0.04 0.02 0.16 34.94 0.00 1.45 84.67 0.60 16.99 0.25 17.07 0.23 0.81
xvdc 0.04 0.15 0.22 35.42 0.00 1.46 83.68 1.08 30.23 0.18 30.42 0.23 0.82
dm-0 0.00 0.00 0.21 70.46 0.00 2.91 84.22 1.69 23.85 0.17 23.92 0.12 0.85
03/12/2015 04:09:29 PM
avg-cpu: %user %nice %system %iowait %steal %idle
8.61 0.00 8.97 0.26 0.57 81.60
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda 0.00 0.00 0.00 0.20 0.00 0.00 8.00 0.00 20.00 0.00 20.00 20.00 0.40
xvdb 0.00 1.60 0.00 449.20 0.00 18.63 84.93 11.70 25.83 0.00 25.83 0.24 10.56
xvdc 0.00 0.40 0.00 433.80 0.00 17.96 84.78 13.92 31.05 0.00 31.05 0.24 10.56
dm-0 0.00 0.00 0.00 929.60 0.00 38.44 84.68 25.77 27.13 0.00 27.13 0.11 10.56
03/12/2015 04:09:34 PM
avg-cpu: %user %nice %system %iowait %steal %idle
8.56 0.00 8.99 0.49 0.65 81.32
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda 0.00 1.00 0.00 0.60 0.00 0.01 21.33 0.00 6.67 0.00 6.67 6.67 0.40
xvdb 0.00 0.40 0.00 846.40 0.00 35.01 84.71 12.45 14.82 0.00 14.82 0.23 19.52
xvdc 0.00 1.60 0.00 861.00 0.00 35.67 84.86 27.54 32.50 0.00 32.50 0.23 19.68
dm-0 0.00 0.00 0.00 1664.80 0.00 68.83 84.68 40.22 24.48 0.00 24.48 0.12 19.68
Hi Joshua,
We have a detailed set of procedures describing the process we used in the High Scalability post. Please find them in our Amazon Deployment Tuning Guide.
I found that and have been making as many modifications as I can to model your examples. Unfortunately, I cannot use a VPC.
My main question is whether it is expected to see write performance drop when growing the cluster… that seems very counterintuitive to me.
When using replication factor 1, you should scale linearly with each node added. With replication factor 2 you would see a performance drop due to replication when expanding from 1 to 2 nodes, but it should be linear from there on.
Now that the servers need to compete with the clients for resources, I would expect this to negatively impact your performance.
This may be true, but it was not the performance of the Aerospike cluster that improved; it was the network performance. Requiring the servers to compete with the clients for resources would only limit the performance of the servers.
Try running the clients from separate machines. Start with one, tune the number of threads for the highest TPS, and then spin up more instances with the same client configuration. In the 1M TPS procedure, each instance running a client pushed about 250K TPS, so we needed 4 client instances to fully load a single server instance.
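For example, a threaded write loop along these lines (a sketch with the Python client; the host address, set name, thread count, and key counts are placeholders to vary per run) can be re-run with different THREADS values to find the per-instance ceiling before adding more client machines:

import time
import threading
import aerospike

# Illustrative load generator: raise THREADS until TPS stops climbing,
# then add more client instances with the same settings.
THREADS = 64
KEYS_PER_THREAD = 100000

# Assumes the client object can be shared across threads; otherwise create one per thread.
client = aerospike.client({'hosts': [('10.0.0.10', 3000)]}).connect()

def writer(worker_id):
    base = worker_id * KEYS_PER_THREAD
    for i in range(KEYS_PER_THREAD):
        client.put(('signal', 'bench', base + i), {'payload': 'x' * 50})

start = time.time()
workers = [threading.Thread(target=writer, args=(w,)) for w in range(THREADS)]
for w in workers:
    w.start()
for w in workers:
    w.join()
elapsed = time.time() - start
print('%d threads: ~%.0f TPS' % (THREADS, THREADS * KEYS_PER_THREAD / elapsed))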
The initial 250+k TPS was when I was inserting into a single-node system. It dropped to the lower level when I added a second node. The nodes are both c3.8xl with the same tuning applied (except for the multi-NIC / VPC trick, which I cannot do at this time). The replication factor for this namespace is 1.
Sorry for the delay
These two points would indicate that your clients were the bottleneck, and the fact that they are Python makes that very likely. I suspect that you would have seen the TPS increase if you had increased the number of clients, indicating that the server could handle more than the clients could push.
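As a quick check of that theory (a hedged sketch, not the recommended benchmark tool): a single CPython process is limited by the GIL, so fanning the same write loop out across OS processes, each with its own client connection, shows whether the server has headroom the clients simply can't reach. Host, set name, and counts below are placeholders.

import time
import multiprocessing
import aerospike

PROCS = 8
KEYS_PER_PROC = 200000

def writer(proc_id):
    # Each process opens its own connection so the clients scale independently.
    client = aerospike.client({'hosts': [('10.0.0.10', 3000)]}).connect()
    base = proc_id * KEYS_PER_PROC
    for i in range(KEYS_PER_PROC):
        client.put(('signal', 'bench', base + i), {'payload': 'x' * 50})
    client.close()

if __name__ == '__main__':
    start = time.time()
    procs = [multiprocessing.Process(target=writer, args=(p,)) for p in range(PROCS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    elapsed = time.time() - start
    print('%d processes: ~%.0f TPS' % (PROCS, PROCS * KEYS_PER_PROC / elapsed))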
Migrations will definitely have an effect on performance, especially peak performance.
I recommend using the Java benchmark client to see how many transactions per second the server can handle. On internal machines the Java benchmark can push upwards of 300,000 TPS.