Write performance in multi-node clusters?

Joshua_Buss · March 11, 2015, 9:27pm

Hi all, I went through another round of testing with Aerospike today and found that our write throughput of the entire cluster dropped quite dramatically with each new node we added.

We’re using c3.8xlarges in the recommended memory + file backed storage configuration. Is this behavior expected?

FWIW we’re inserting single-bin records and we were topping out at ~150k writes per second on a single node, but it was as low as ~50k when using three nodes. With two nodes it was ~100k.

Thanks in advance…

raj · March 12, 2015, 7:50am

Joshua,

Are you running with data in memory? I think that is what you meant by “memory +”, but I am just confirming.

Can you grab the following?

After the run

asinfo -v 'statistics' from all the nodes

While running

top

iostat -xmt 5 10

iftop

Also, where are your clients running from? Are they in the same VPC / Availability Zone / region? Can you also grab ping output from your client box to your server box while you are running the load?

I am trying to understand what the bottleneck is. A c3.8xlarge is capable of doing a lot more.
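
If ICMP ping is blocked between the hosts, timing TCP connects to the server port gives a rough substitute. The sketch below is a minimal Python example and not an Aerospike tool; the function name is made up for illustration, and it measures connection setup time, which is only an approximation of network round-trip latency.

```python
import socket
import time


def connect_latencies_ms(host, port, attempts=10, timeout=2.0):
    """Time `attempts` TCP connects to host:port; return each duration in ms."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        # A TCP connect costs roughly one network round trip plus setup overhead.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        samples.append(elapsed * 1000.0)
    return samples
```

Run it from the client box while the load is running, for example `connect_latencies_ms("10.0.0.12", 3000)`, where the address is a placeholder and 3000 is assumed to be the service port of a default install. Comparing the minimum and maximum of the samples shows whether latency is stable under load.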

– R

Joshua_Buss · March 12, 2015, 1:37pm

Here’s the namespace config:

namespace signal {
        replication-factor 1
        memory-size 60G
        default-ttl 0 # 30 days, use 0 to never expire/evict.
        ldt-enabled true
        storage-engine device {
                file /mnt/aerospike/signal.dat
                filesize 300G
                data-in-memory true # Store data in memory in addition to file.
        }
}

I was running the clients on other hosts, but that was even slower, so I switched to running the client on the same VM as the server(s) so they could just connect to localhost. That alone gave us a 3x performance increase.

I’ve spun down the other hosts now, but here’s the output of statistics:

cluster_size=1;cluster_key=1BB7FB78C975E97E;cluster_integrity=true;objects=73860812;sub-records=0;total-bytes-disk=322122547200;used-bytes-disk=16139206272;free-pct-disk=94;total-bytes-memory=64424509440;used-bytes-memory=7856325928;data-used-bytes-memory=3129233960;index-used-bytes-memory=4727091968;sindex-used-bytes-memory=0;free-pct-memory=87;stat_read_reqs=0;stat_read_reqs_xdr=0;stat_read_success=0;stat_read_errs_notfound=0;stat_read_errs_other=0;stat_write_reqs=99179388;stat_write_reqs_xdr=0;stat_write_success=99179388;stat_write_errs=0;stat_xdr_pipe_writes=0;stat_xdr_pipe_miss=0;stat_delete_success=0;stat_rw_timeout=0;udf_read_reqs=0;udf_read_success=0;udf_read_errs_other=0;udf_write_reqs=0;udf_write_success=0;udf_write_err_others=0;udf_delete_reqs=0;udf_delete_success=0;udf_delete_err_others=0;udf_lua_errs=0;udf_scan_rec_reqs=0;udf_query_rec_reqs=0;udf_replica_writes=0;stat_proxy_reqs=0;stat_proxy_reqs_xdr=0;stat_proxy_success=0;stat_proxy_errs=0;stat_ldt_proxy=0;stat_cluster_key_trans_to_proxy_retry=0;stat_cluster_key_transaction_reenqueue=0;stat_slow_trans_queue_push=634;stat_slow_trans_queue_pop=634;stat_slow_trans_queue_batch_pop=21;stat_cluster_key_regular_processed=0;stat_cluster_key_prole_retry=0;stat_cluster_key_err_ack_dup_trans_reenqueue=0;stat_cluster_key_partition_transaction_queue_count=0;stat_cluster_key_err_ack_rw_trans_reenqueue=0;stat_expired_objects=0;stat_evicted_objects=0;stat_deleted_set_objects=0;stat_evicted_set_objects=0;stat_evicted_objects_time=0;stat_zero_bin_records=0;stat_nsup_deletes_not_shipped=0;err_tsvc_requests=0;err_out_of_space=0;err_duplicate_proxy_request=0;err_rw_request_not_found=0;err_rw_pending_limit=0;err_rw_cant_put_unique=0;fabric_msgs_sent=12334;fabric_msgs_rcvd=12323;paxos_principal=BB98E85FA0A0022;migrate_msgs_sent=6148;migrate_msgs_recv=12299;migrate_progress_send=0;migrate_progress_recv=0;migrate_num_incoming_accepted=3391;migrate_num_incoming_refused=0;queue=0;transactions=99629949;reaped_fds=2;tscan_initiate
=0;tscan_pending=0;tscan_succeeded=0;tscan_aborted=0;batch_initiate=0;batch_queue=0;batch_tree_count=0;batch_timeout=0;batch_errors=0;info_queue=0;delete_queue=0;proxy_in_progress=0;proxy_initiate=0;proxy_action=0;proxy_retry=0;proxy_retry_q_full=0;proxy_unproxy=0;proxy_retry_same_dest=0;proxy_retry_new_dest=0;write_master=99179388;write_prole=0;read_dup_prole=0;rw_err_dup_internal=0;rw_err_dup_cluster_key=0;rw_err_dup_send=0;rw_err_write_internal=0;rw_err_write_cluster_key=0;rw_err_write_send=0;rw_err_ack_internal=0;rw_err_ack_nomatch=0;rw_err_ack_badnode=0;client_connections=1;waiting_transactions=0;tree_count=0;record_refs=73860812;record_locks=0;migrate_tx_objs=0;migrate_rx_objs=0;ongoing_write_reqs=0;err_storage_queue_full=0;partition_actual=4096;partition_replica=0;partition_desync=0;partition_absent=0;partition_object_count=73860812;partition_ref_count=4096;system_free_mem_pct=85;sindex_ucgarbage_found=0;sindex_gc_locktimedout=0;sindex_gc_inactivity_dur=0;sindex_gc_activity_dur=0;sindex_gc_list_creation_time=0;sindex_gc_list_deletion_time=0;sindex_gc_objects_validated=0;sindex_gc_garbage_found=0;sindex_gc_garbage_cleaned=0;system_swapping=false;err_replica_null_node=0;err_replica_non_null_node=0;err_sync_copy_null_node=0;err_sync_copy_null_master=0;storage_defrag_corrupt_record=0;err_write_fail_prole_unknown=0;err_write_fail_prole_generation=0;err_write_fail_unknown=0;err_write_fail_key_exists=0;err_write_fail_generation=0;err_write_fail_generation_xdr=0;err_write_fail_bin_exists=0;err_write_fail_parameter=0;err_write_fail_incompatible_type=0;err_write_fail_noxdr=0;err_write_fail_prole_delete=0;err_write_fail_not_found=0;err_write_fail_key_mismatch=0;err_write_fail_record_too_big=0;err_write_fail_bin_name=0;err_write_fail_bin_not_found=0;err_write_fail_forbidden=0;stat_duplicate_operation=0;uptime=69026;stat_write_errs_notfound=0;stat_write_errs_other=0;heartbeat_received_self=0;heartbeat_received_foreign=29207;query_reqs=0;query_success=0;query_fail=0;query_
abort=0;query_avg_rec_count=0;query_short_queue_full=0;query_long_queue_full=0;query_short_running=0;query_long_running=0;query_tracked=0;query_agg=0;query_agg_success=0;query_agg_err=0;query_agg_abort=0;query_agg_avg_rec_count=0;query_lookups=0;query_lookup_success=0;query_lookup_err=0;query_lookup_abort=0;query_lookup_avg_rec_count=0

03/12/2015 01:36:31 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.05    0.00    0.81    0.02    0.07   98.05

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     1.62    0.33    1.80     0.01     0.04    39.21     0.01    3.36    0.27    3.91   1.05   0.22
xvdb              0.00     0.03    2.18   23.35     0.09     0.96    83.80     0.80   31.38    4.96   33.85   0.28   0.72
xvdc              0.00     0.04    2.18   23.31     0.09     0.96    83.91     0.75   29.61    4.99   31.91   0.28   0.72
dm-0              0.00     0.00    4.35   46.33     0.18     1.91    84.42     1.57   30.88    4.99   33.31   0.15   0.74

03/12/2015 01:36:36 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.03    0.00    0.03    0.01    0.01   99.93

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.60    0.20    1.00     0.00     0.01    16.00     0.00    2.67    4.00    2.40   2.67   0.32
xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdc              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

Joshua_Buss · March 12, 2015, 4:10pm

Recreated the node as an HVM instance after finding this: “1 Aerospike server X 1 Amazon EC2 instance = 1 Million TPS for just $1.68/hour” (High Scalability)

… not seeing much of an improvement

jbuss@aero:~$ asinfo -v 'statistics'
cluster_size=1;cluster_key=89667A74D9F482F1;cluster_integrity=true;objects=33028613;sub-records=0;total-bytes-disk=322122547200;used-bytes-disk=7217096704;free-pct-disk=97;total-bytes-memory=64424509440;used-bytes-memory=3513182344;data-used-bytes-memory=1399351112;index-used-bytes-memory=2113831232;sindex-used-bytes-memory=0;free-pct-memory=94;stat_read_reqs=0;stat_read_reqs_xdr=0;stat_read_success=0;stat_read_errs_notfound=0;stat_read_errs_other=0;stat_write_reqs=35222686;stat_write_reqs_xdr=0;stat_write_success=35222683;stat_write_errs=0;stat_xdr_pipe_writes=0;stat_xdr_pipe_miss=0;stat_delete_success=0;stat_rw_timeout=0;udf_read_reqs=0;udf_read_success=0;udf_read_errs_other=0;udf_write_reqs=0;udf_write_success=0;udf_write_err_others=0;udf_delete_reqs=0;udf_delete_success=0;udf_delete_err_others=0;udf_lua_errs=0;udf_scan_rec_reqs=0;udf_query_rec_reqs=0;udf_replica_writes=0;stat_proxy_reqs=0;stat_proxy_reqs_xdr=0;stat_proxy_success=0;stat_proxy_errs=0;stat_ldt_proxy=0;stat_cluster_key_trans_to_proxy_retry=0;stat_cluster_key_transaction_reenqueue=0;stat_slow_trans_queue_push=0;stat_slow_trans_queue_pop=0;stat_slow_trans_queue_batch_pop=0;stat_cluster_key_regular_processed=0;stat_cluster_key_prole_retry=0;stat_cluster_key_err_ack_dup_trans_reenqueue=0;stat_cluster_key_partition_transaction_queue_count=0;stat_cluster_key_err_ack_rw_trans_reenqueue=0;stat_expired_objects=0;stat_evicted_objects=0;stat_deleted_set_objects=0;stat_evicted_set_objects=0;stat_evicted_objects_time=0;stat_zero_bin_records=0;stat_nsup_deletes_not_shipped=0;err_tsvc_requests=0;err_out_of_space=0;err_duplicate_proxy_request=0;err_rw_request_not_found=0;err_rw_pending_limit=0;err_rw_cant_put_unique=0;fabric_msgs_sent=591025;fabric_msgs_rcvd=591019;paxos_principal=BB92900FD0A0022;migrate_msgs_sent=588939;migrate_msgs_recv=591012;migrate_progress_send=0;migrate_progress_recv=0;migrate_num_incoming_accepted=35;migrate_num_incoming_refused=0;queue=0;transactions=35290955;reaped_fds=0;tscan_initiate=0;
tscan_pending=0;tscan_succeeded=0;tscan_aborted=0;batch_initiate=0;batch_queue=0;batch_tree_count=0;batch_timeout=0;batch_errors=0;info_queue=0;delete_queue=0;proxy_in_progress=0;proxy_initiate=0;proxy_action=0;proxy_retry=0;proxy_retry_q_full=0;proxy_unproxy=0;proxy_retry_same_dest=0;proxy_retry_new_dest=0;write_master=35222704;write_prole=0;read_dup_prole=0;rw_err_dup_internal=0;rw_err_dup_cluster_key=0;rw_err_dup_send=0;rw_err_write_internal=0;rw_err_write_cluster_key=0;rw_err_write_send=0;rw_err_ack_internal=0;rw_err_ack_nomatch=0;rw_err_ack_badnode=0;client_connections=523;waiting_transactions=0;tree_count=0;record_refs=33028632;record_locks=0;migrate_tx_objs=0;migrate_rx_objs=0;ongoing_write_reqs=2;err_storage_queue_full=0;partition_actual=4096;partition_replica=0;partition_desync=0;partition_absent=0;partition_object_count=33028679;partition_ref_count=4099;system_free_mem_pct=92;sindex_ucgarbage_found=0;sindex_gc_locktimedout=0;sindex_gc_inactivity_dur=0;sindex_gc_activity_dur=0;sindex_gc_list_creation_time=0;sindex_gc_list_deletion_time=0;sindex_gc_objects_validated=0;sindex_gc_garbage_found=0;sindex_gc_garbage_cleaned=0;system_swapping=false;err_replica_null_node=0;err_replica_non_null_node=0;err_sync_copy_null_node=0;err_sync_copy_null_master=0;storage_defrag_corrupt_record=0;err_write_fail_prole_unknown=0;err_write_fail_prole_generation=0;err_write_fail_unknown=0;err_write_fail_key_exists=0;err_write_fail_generation=0;err_write_fail_generation_xdr=0;err_write_fail_bin_exists=0;err_write_fail_parameter=0;err_write_fail_incompatible_type=0;err_write_fail_noxdr=0;err_write_fail_prole_delete=0;err_write_fail_not_found=0;err_write_fail_key_mismatch=0;err_write_fail_record_too_big=0;err_write_fail_bin_name=0;err_write_fail_bin_not_found=0;err_write_fail_forbidden=0;stat_duplicate_operation=0;uptime=656;stat_write_errs_notfound=0;stat_write_errs_other=0;heartbeat_received_self=0;heartbeat_received_foreign=686;query_reqs=0;query_success=0;query_fail=0;query_abort
=0;query_avg_rec_count=0;query_short_queue_full=0;query_long_queue_full=0;query_short_running=0;query_long_running=0;query_tracked=0;query_agg=0;query_agg_success=0;query_agg_err=0;query_agg_abort=0;query_agg_avg_rec_count=0;query_lookups=0;query_lookup_success=0;query_lookup_err=0;query_lookup_abort=0;query_lookup_avg_rec_count=0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.32    0.00    0.25    0.08    0.32   99.03

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.15    13.78    1.46   10.61     0.03     0.30    56.02     1.05   86.87   23.58   95.55   1.82   2.19
xvdb              0.04     0.02    0.16   34.94     0.00     1.45    84.67     0.60   16.99    0.25   17.07   0.23   0.81
xvdc              0.04     0.15    0.22   35.42     0.00     1.46    83.68     1.08   30.23    0.18   30.42   0.23   0.82
dm-0              0.00     0.00    0.21   70.46     0.00     2.91    84.22     1.69   23.85    0.17   23.92   0.12   0.85

03/12/2015 04:09:29 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.61    0.00    8.97    0.26    0.57   81.60

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.00    0.00    0.20     0.00     0.00     8.00     0.00   20.00    0.00   20.00  20.00   0.40
xvdb              0.00     1.60    0.00  449.20     0.00    18.63    84.93    11.70   25.83    0.00   25.83   0.24  10.56
xvdc              0.00     0.40    0.00  433.80     0.00    17.96    84.78    13.92   31.05    0.00   31.05   0.24  10.56
dm-0              0.00     0.00    0.00  929.60     0.00    38.44    84.68    25.77   27.13    0.00   27.13   0.11  10.56

03/12/2015 04:09:34 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.56    0.00    8.99    0.49    0.65   81.32

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     1.00    0.00    0.60     0.00     0.01    21.33     0.00    6.67    0.00    6.67   6.67   0.40
xvdb              0.00     0.40    0.00  846.40     0.00    35.01    84.71    12.45   14.82    0.00   14.82   0.23  19.52
xvdc              0.00     1.60    0.00  861.00     0.00    35.67    84.86    27.54   32.50    0.00   32.50   0.23  19.68
dm-0              0.00     0.00    0.00 1664.80     0.00    68.83    84.68    40.22   24.48    0.00   24.48   0.12  19.68
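
The `statistics` output above is a single semicolon-separated list of `name=value` pairs, which is hard to read by eye. The Python sketch below parses it and derives an average write rate from two snapshots of the same node. The function names are illustrative and not part of any Aerospike tool; only the `stat_write_success` and `uptime` fields, both visible in the output above, are used.

```python
def parse_statistics(raw):
    """Parse semicolon-separated name=value output into a dict of strings."""
    # Terminal wrapping can split the output mid-token, so join the lines first.
    joined = "".join(raw.splitlines())
    stats = {}
    for pair in joined.split(";"):
        if "=" in pair:
            name, value = pair.split("=", 1)
            stats[name.strip()] = value.strip()
    return stats


def write_tps(before, after):
    """Average successful writes per second between two snapshots of one node."""
    writes = int(after["stat_write_success"]) - int(before["stat_write_success"])
    seconds = int(after["uptime"]) - int(before["uptime"])
    if seconds <= 0:
        raise ValueError("snapshots must be taken at different times")
    return writes / seconds
```

Treating a fresh boot as the first snapshot, the HVM node's figures (`stat_write_success=35222683`, `uptime=656`) work out to roughly 53,700 writes per second averaged over its whole uptime, idle time included. Two snapshots taken while the load is running give a better number.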

kporter · March 12, 2015, 7:08pm

Hi Joshua,

We have a detailed set of procedures describing the process we used in the High Scalability post. Please find them in our Amazon Deployment Tuning Guide.

Joshua_Buss · March 12, 2015, 7:34pm

I found that and have been making as many modifications as I can to model your examples. Unfortunately, I cannot use a VPC.

My main question is whether it is expected to see write performance drop when growing the cluster… that seems very counterintuitive to me.

kporter · March 13, 2015, 12:47am

When using replication factor 1, you should scale linearly for each node added. With replication factor 2 you would see a performance drop due to replication when expanding from 1 to 2 nodes, but scaling should be linear from there on.
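
As a rough check of that expectation against the figures reported earlier in the thread (~150k, ~100k, and ~50k writes per second on one, two, and three nodes), the Python sketch below compares observed cluster throughput with ideal linear scaling. The function name is illustrative, and the model assumes replication factor 1 and clients that are not themselves the limit.

```python
def scaling_efficiency(single_node_tps, cluster_tps, nodes):
    """Observed cluster throughput as a fraction of ideal linear scaling."""
    if single_node_tps <= 0 or nodes <= 0:
        raise ValueError("single_node_tps and nodes must be positive")
    expected = single_node_tps * nodes
    return cluster_tps / expected


# Approximate figures reported in this thread: node count -> cluster writes/sec.
observed = {1: 150_000, 2: 100_000, 3: 50_000}

for nodes, tps in observed.items():
    efficiency = scaling_efficiency(observed[1], tps, nodes)
    print(f"{nodes} node(s): ideal {observed[1] * nodes:,} TPS, "
          f"observed {tps:,} TPS, efficiency {efficiency:.0%}")
```

An efficiency well below 100% on a cluster with replication factor 1 points at something other than the servers, such as the load generator or the network.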

With the clients running on the same VMs as the servers, the servers now need to compete with the clients for resources. I would expect this to negatively impact your performance.

The 3x increase from connecting over localhost may be real, but it was the network path that improved, not the performance of the Aerospike cluster. Requiring the servers to compete with the clients for resources limits the performance of the servers.

Try running the clients from separate machines. Start with one client instance, tune the number of threads for the highest TPS, and then spin up more instances with the same client configuration. In the 1M TPS procedures, each instance running a client pushed about 250K TPS, so we needed 4 client instances to fully load a single server instance.
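
The sizing arithmetic above can be written down as a small Python helper. This is only a sketch: the function name is made up for illustration, and it assumes every client instance sustains the same measured rate.

```python
import math


def client_instances_needed(target_server_tps, tps_per_client_instance):
    """Client instances required to offer target_server_tps of load."""
    if tps_per_client_instance <= 0:
        raise ValueError("tps_per_client_instance must be positive")
    # Round up: a partly used instance is still a whole instance.
    return math.ceil(target_server_tps / tps_per_client_instance)


# Figures from the 1M TPS procedures: about 250K TPS per client instance.
print(client_instances_needed(1_000_000, 250_000))
```

Measure the per-instance rate first with a single tuned client, then use the result to decide how many instances to start.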

Joshua_Buss · March 13, 2015, 2:04pm

The initial 250+k TPS was when I was inserting into a single-node system. It dropped to the lower level when I added a second node. The nodes are both c3.8xl with the same tuning applied (except for the multi-nic / VPC trick which I can not do at this time). The replication factor for this namespace is 1.

kporter · March 16, 2015, 10:58pm

Are the clients still on the same nodes as the server?
Which client are you using? I have assumed the java benchmark client.
Does the performance improve after migrations have completed?
Could you provide your server configuration?

Joshua_Buss · March 27, 2015, 6:59pm

I was using clients on the same servers and on different servers. Moving the clients to remote servers only lowered my throughput substantially.
The clients were Python-based and just did inserts.
No, performance was consistent regardless of whether migrations were happening.
Server config was all defaults (with the exception of using IPs for node discovery since we’re on EC2).

kporter · May 1, 2015, 1:13am

Sorry for the delay

Your first two answers would indicate that your clients were the bottleneck, and the fact that they are Python makes that really likely. I suspect that you would have seen the TPS increase if you had increased the number of clients, which would indicate that the server could handle more than the clients could push.

Migrations should definitely have an effect on performance, especially peak performance.

I recommend using the Java benchmark client to see how many transactions per second the server can handle. On internal machines the Java benchmark can push upwards of 300,000 TPS.
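
One way to check whether a Python load generator is the limit is to time its insert loop in isolation. The sketch below is a minimal, hypothetical harness and not the Java benchmark client: `measure_tps` is a made-up name, and `write_fn` stands in for whatever call performs a single insert.

```python
import time


def measure_tps(write_fn, count):
    """Call write_fn(i) for i in range(count); return calls per second."""
    start = time.perf_counter()
    for i in range(count):
        write_fn(i)
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


# With a no-op write this measures the Python loop overhead alone, which is
# an upper bound on what one single-threaded process could ever push.
# Assumption: with the Aerospike Python client, write_fn would wrap a call
# such as client.put(("signal", "demo_set", i), {"bin": i}); "demo_set" and
# the bin name are placeholders.
ceiling = measure_tps(lambda i: None, 100_000)
print(f"no-op ceiling: {ceiling:,.0f} calls/sec")
```

If the rate measured with real writes stays flat per process while the cluster total rises as more client processes are started, the clients are the bottleneck and not the servers.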
