Write performance in multi-node clusters?


#1

Hi all, I went through another round of testing with Aerospike today and found that our write throughput of the entire cluster dropped quite dramatically with each new node we added.

We’re using c3.8xlarges in the recommended memory + file backed storage configuration. Is this behavior expected?

FWIW we’re inserting single-bin records. We topped out at ~150k per second on a single node, ~100k with two nodes, and as low as ~50k with three.

Thanks in advance…


#2

Joshua,

Are you running with data in memory? I think that is what you meant when you said “memory +”, but I am just confirming.

Can you grab the following?

After the run

asinfo -v 'statistics' from all the nodes

While running

top

iostat -xmt 5 10

iftop

Also, where are your clients running from? Are they in the same VPC / Availability Zone / region? Can you also grab ping output from your client box to your server box while you are running the load?

I am trying to understand what the bottleneck is. A c3.8xlarge is capable of doing a lot more.

– R


#3

Here’s the namespace config:

namespace signal {
        replication-factor 1
        memory-size 60G
        default-ttl 0 # 30 days, use 0 to never expire/evict.
        ldt-enabled true
        storage-engine device {
                file /mnt/aerospike/signal.dat
                filesize 300G
                data-in-memory true # Store data in memory in addition to file.
        }
}

I was running the clients on other hosts, but that was even slower, so I switched to running the client on the same VM as the server(s) so they could just connect to localhost. That alone gave us a 3x performance increase.

I’ve spun down the other hosts now, but here’s the output of statistics:

cluster_size=1;cluster_key=1BB7FB78C975E97E;cluster_integrity=true;objects=73860812;sub-records=0;total-bytes-disk=322122547200;used-bytes-disk=16139206272;free-pct-disk=94;total-bytes-memory=64424509440;used-bytes-memory=7856325928;data-used-bytes-memory=3129233960;index-used-bytes-memory=4727091968;sindex-used-bytes-memory=0;free-pct-memory=87;stat_read_reqs=0;stat_read_reqs_xdr=0;stat_read_success=0;stat_read_errs_notfound=0;stat_read_errs_other=0;stat_write_reqs=99179388;stat_write_reqs_xdr=0;stat_write_success=99179388;stat_write_errs=0;stat_xdr_pipe_writes=0;stat_xdr_pipe_miss=0;stat_delete_success=0;stat_rw_timeout=0;udf_read_reqs=0;udf_read_success=0;udf_read_errs_other=0;udf_write_reqs=0;udf_write_success=0;udf_write_err_others=0;udf_delete_reqs=0;udf_delete_success=0;udf_delete_err_others=0;udf_lua_errs=0;udf_scan_rec_reqs=0;udf_query_rec_reqs=0;udf_replica_writes=0;stat_proxy_reqs=0;stat_proxy_reqs_xdr=0;stat_proxy_success=0;stat_proxy_errs=0;stat_ldt_proxy=0;stat_cluster_key_trans_to_proxy_retry=0;stat_cluster_key_transaction_reenqueue=0;stat_slow_trans_queue_push=634;stat_slow_trans_queue_pop=634;stat_slow_trans_queue_batch_pop=21;stat_cluster_key_regular_processed=0;stat_cluster_key_prole_retry=0;stat_cluster_key_err_ack_dup_trans_reenqueue=0;stat_cluster_key_partition_transaction_queue_count=0;stat_cluster_key_err_ack_rw_trans_reenqueue=0;stat_expired_objects=0;stat_evicted_objects=0;stat_deleted_set_objects=0;stat_evicted_set_objects=0;stat_evicted_objects_time=0;stat_zero_bin_records=0;stat_nsup_deletes_not_shipped=0;err_tsvc_requests=0;err_out_of_space=0;err_duplicate_proxy_request=0;err_rw_request_not_found=0;err_rw_pending_limit=0;err_rw_cant_put_unique=0;fabric_msgs_sent=12334;fabric_msgs_rcvd=12323;paxos_principal=BB98E85FA0A0022;migrate_msgs_sent=6148;migrate_msgs_recv=12299;migrate_progress_send=0;migrate_progress_recv=0;migrate_num_incoming_accepted=3391;migrate_num_incoming_refused=0;queue=0;transactions=99629949;reaped_fds=2;tscan_initiate
=0;tscan_pending=0;tscan_succeeded=0;tscan_aborted=0;batch_initiate=0;batch_queue=0;batch_tree_count=0;batch_timeout=0;batch_errors=0;info_queue=0;delete_queue=0;proxy_in_progress=0;proxy_initiate=0;proxy_action=0;proxy_retry=0;proxy_retry_q_full=0;proxy_unproxy=0;proxy_retry_same_dest=0;proxy_retry_new_dest=0;write_master=99179388;write_prole=0;read_dup_prole=0;rw_err_dup_internal=0;rw_err_dup_cluster_key=0;rw_err_dup_send=0;rw_err_write_internal=0;rw_err_write_cluster_key=0;rw_err_write_send=0;rw_err_ack_internal=0;rw_err_ack_nomatch=0;rw_err_ack_badnode=0;client_connections=1;waiting_transactions=0;tree_count=0;record_refs=73860812;record_locks=0;migrate_tx_objs=0;migrate_rx_objs=0;ongoing_write_reqs=0;err_storage_queue_full=0;partition_actual=4096;partition_replica=0;partition_desync=0;partition_absent=0;partition_object_count=73860812;partition_ref_count=4096;system_free_mem_pct=85;sindex_ucgarbage_found=0;sindex_gc_locktimedout=0;sindex_gc_inactivity_dur=0;sindex_gc_activity_dur=0;sindex_gc_list_creation_time=0;sindex_gc_list_deletion_time=0;sindex_gc_objects_validated=0;sindex_gc_garbage_found=0;sindex_gc_garbage_cleaned=0;system_swapping=false;err_replica_null_node=0;err_replica_non_null_node=0;err_sync_copy_null_node=0;err_sync_copy_null_master=0;storage_defrag_corrupt_record=0;err_write_fail_prole_unknown=0;err_write_fail_prole_generation=0;err_write_fail_unknown=0;err_write_fail_key_exists=0;err_write_fail_generation=0;err_write_fail_generation_xdr=0;err_write_fail_bin_exists=0;err_write_fail_parameter=0;err_write_fail_incompatible_type=0;err_write_fail_noxdr=0;err_write_fail_prole_delete=0;err_write_fail_not_found=0;err_write_fail_key_mismatch=0;err_write_fail_record_too_big=0;err_write_fail_bin_name=0;err_write_fail_bin_not_found=0;err_write_fail_forbidden=0;stat_duplicate_operation=0;uptime=69026;stat_write_errs_notfound=0;stat_write_errs_other=0;heartbeat_received_self=0;heartbeat_received_foreign=29207;query_reqs=0;query_success=0;query_fail=0;query_
abort=0;query_avg_rec_count=0;query_short_queue_full=0;query_long_queue_full=0;query_short_running=0;query_long_running=0;query_tracked=0;query_agg=0;query_agg_success=0;query_agg_err=0;query_agg_abort=0;query_agg_avg_rec_count=0;query_lookups=0;query_lookup_success=0;query_lookup_err=0;query_lookup_abort=0;query_lookup_avg_rec_count=0

03/12/2015 01:36:31 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.05    0.00    0.81    0.02    0.07   98.05

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     1.62    0.33    1.80     0.01     0.04    39.21     0.01    3.36    0.27    3.91   1.05   0.22
xvdb              0.00     0.03    2.18   23.35     0.09     0.96    83.80     0.80   31.38    4.96   33.85   0.28   0.72
xvdc              0.00     0.04    2.18   23.31     0.09     0.96    83.91     0.75   29.61    4.99   31.91   0.28   0.72
dm-0              0.00     0.00    4.35   46.33     0.18     1.91    84.42     1.57   30.88    4.99   33.31   0.15   0.74

03/12/2015 01:36:36 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.03    0.00    0.03    0.01    0.01   99.93

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.60    0.20    1.00     0.00     0.01    16.00     0.00    2.67    4.00    2.40   2.67   0.32
xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdc              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
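
For comparing nodes, the `statistics` output above can be turned into name/value pairs. The following is a minimal Python sketch, not an Aerospike tool: `parse_statistics` and `write_tps` are hypothetical helper names, the sample values in the usage part are made up, and it assumes that `stat_write_success` is a cumulative counter and that `uptime` is reported in seconds.

```python
def _coerce(value):
    """Turn 'true'/'false' into bool and digit strings into int; keep the rest as text."""
    if value in ("true", "false"):
        return value == "true"
    try:
        return int(value)
    except ValueError:
        # e.g. cluster_key=1BB7FB78C975E97E stays a string
        return value


def parse_statistics(raw):
    """Parse 'name=value;name=value;...' text into a dict."""
    # Pasted output may be wrapped in the middle of a name, so join lines first.
    joined = "".join(raw.splitlines())
    stats = {}
    for pair in joined.split(";"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        stats[name.strip()] = _coerce(value.strip())
    return stats


def write_tps(before, after):
    """Average successful writes per second between two snapshots of one node."""
    # Assumption: 'uptime' is seconds since the server started.
    elapsed = after["uptime"] - before["uptime"]
    if elapsed <= 0:
        raise ValueError("snapshots must be taken at different times, oldest first")
    return (after["stat_write_success"] - before["stat_write_success"]) / elapsed


if __name__ == "__main__":
    # Made-up sample values, only to show the call pattern.
    first = parse_statistics("uptime=600;stat_write_success=1000000")
    second = parse_statistics("uptime=610;stat_write_success=2500000")
    print(write_tps(first, second))
```

Taking two snapshots per node during a run and summing the per-node results gives a server-side throughput figure that does not depend on what the clients report.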

#4

Recreated the node as an HVM instance after finding this: http://highscalability.com/blog/2014/8/18/1-aerospike-server-x-1-amazon-ec2-instance-1-million-tps-for.html

… not seeing much of an improvement :expressionless:

jbuss@aero:~$ asinfo -v 'statistics'
cluster_size=1;cluster_key=89667A74D9F482F1;cluster_integrity=true;objects=33028613;sub-records=0;total-bytes-disk=322122547200;used-bytes-disk=7217096704;free-pct-disk=97;total-bytes-memory=64424509440;used-bytes-memory=3513182344;data-used-bytes-memory=1399351112;index-used-bytes-memory=2113831232;sindex-used-bytes-memory=0;free-pct-memory=94;stat_read_reqs=0;stat_read_reqs_xdr=0;stat_read_success=0;stat_read_errs_notfound=0;stat_read_errs_other=0;stat_write_reqs=35222686;stat_write_reqs_xdr=0;stat_write_success=35222683;stat_write_errs=0;stat_xdr_pipe_writes=0;stat_xdr_pipe_miss=0;stat_delete_success=0;stat_rw_timeout=0;udf_read_reqs=0;udf_read_success=0;udf_read_errs_other=0;udf_write_reqs=0;udf_write_success=0;udf_write_err_others=0;udf_delete_reqs=0;udf_delete_success=0;udf_delete_err_others=0;udf_lua_errs=0;udf_scan_rec_reqs=0;udf_query_rec_reqs=0;udf_replica_writes=0;stat_proxy_reqs=0;stat_proxy_reqs_xdr=0;stat_proxy_success=0;stat_proxy_errs=0;stat_ldt_proxy=0;stat_cluster_key_trans_to_proxy_retry=0;stat_cluster_key_transaction_reenqueue=0;stat_slow_trans_queue_push=0;stat_slow_trans_queue_pop=0;stat_slow_trans_queue_batch_pop=0;stat_cluster_key_regular_processed=0;stat_cluster_key_prole_retry=0;stat_cluster_key_err_ack_dup_trans_reenqueue=0;stat_cluster_key_partition_transaction_queue_count=0;stat_cluster_key_err_ack_rw_trans_reenqueue=0;stat_expired_objects=0;stat_evicted_objects=0;stat_deleted_set_objects=0;stat_evicted_set_objects=0;stat_evicted_objects_time=0;stat_zero_bin_records=0;stat_nsup_deletes_not_shipped=0;err_tsvc_requests=0;err_out_of_space=0;err_duplicate_proxy_request=0;err_rw_request_not_found=0;err_rw_pending_limit=0;err_rw_cant_put_unique=0;fabric_msgs_sent=591025;fabric_msgs_rcvd=591019;paxos_principal=BB92900FD0A0022;migrate_msgs_sent=588939;migrate_msgs_recv=591012;migrate_progress_send=0;migrate_progress_recv=0;migrate_num_incoming_accepted=35;migrate_num_incoming_refused=0;queue=0;transactions=35290955;reaped_fds=0;tscan_initiate=0;
tscan_pending=0;tscan_succeeded=0;tscan_aborted=0;batch_initiate=0;batch_queue=0;batch_tree_count=0;batch_timeout=0;batch_errors=0;info_queue=0;delete_queue=0;proxy_in_progress=0;proxy_initiate=0;proxy_action=0;proxy_retry=0;proxy_retry_q_full=0;proxy_unproxy=0;proxy_retry_same_dest=0;proxy_retry_new_dest=0;write_master=35222704;write_prole=0;read_dup_prole=0;rw_err_dup_internal=0;rw_err_dup_cluster_key=0;rw_err_dup_send=0;rw_err_write_internal=0;rw_err_write_cluster_key=0;rw_err_write_send=0;rw_err_ack_internal=0;rw_err_ack_nomatch=0;rw_err_ack_badnode=0;client_connections=523;waiting_transactions=0;tree_count=0;record_refs=33028632;record_locks=0;migrate_tx_objs=0;migrate_rx_objs=0;ongoing_write_reqs=2;err_storage_queue_full=0;partition_actual=4096;partition_replica=0;partition_desync=0;partition_absent=0;partition_object_count=33028679;partition_ref_count=4099;system_free_mem_pct=92;sindex_ucgarbage_found=0;sindex_gc_locktimedout=0;sindex_gc_inactivity_dur=0;sindex_gc_activity_dur=0;sindex_gc_list_creation_time=0;sindex_gc_list_deletion_time=0;sindex_gc_objects_validated=0;sindex_gc_garbage_found=0;sindex_gc_garbage_cleaned=0;system_swapping=false;err_replica_null_node=0;err_replica_non_null_node=0;err_sync_copy_null_node=0;err_sync_copy_null_master=0;storage_defrag_corrupt_record=0;err_write_fail_prole_unknown=0;err_write_fail_prole_generation=0;err_write_fail_unknown=0;err_write_fail_key_exists=0;err_write_fail_generation=0;err_write_fail_generation_xdr=0;err_write_fail_bin_exists=0;err_write_fail_parameter=0;err_write_fail_incompatible_type=0;err_write_fail_noxdr=0;err_write_fail_prole_delete=0;err_write_fail_not_found=0;err_write_fail_key_mismatch=0;err_write_fail_record_too_big=0;err_write_fail_bin_name=0;err_write_fail_bin_not_found=0;err_write_fail_forbidden=0;stat_duplicate_operation=0;uptime=656;stat_write_errs_notfound=0;stat_write_errs_other=0;heartbeat_received_self=0;heartbeat_received_foreign=686;query_reqs=0;query_success=0;query_fail=0;query_abort
=0;query_avg_rec_count=0;query_short_queue_full=0;query_long_queue_full=0;query_short_running=0;query_long_running=0;query_tracked=0;query_agg=0;query_agg_success=0;query_agg_err=0;query_agg_abort=0;query_agg_avg_rec_count=0;query_lookups=0;query_lookup_success=0;query_lookup_err=0;query_lookup_abort=0;query_lookup_avg_rec_count=0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.32    0.00    0.25    0.08    0.32   99.03

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.15    13.78    1.46   10.61     0.03     0.30    56.02     1.05   86.87   23.58   95.55   1.82   2.19
xvdb              0.04     0.02    0.16   34.94     0.00     1.45    84.67     0.60   16.99    0.25   17.07   0.23   0.81
xvdc              0.04     0.15    0.22   35.42     0.00     1.46    83.68     1.08   30.23    0.18   30.42   0.23   0.82
dm-0              0.00     0.00    0.21   70.46     0.00     2.91    84.22     1.69   23.85    0.17   23.92   0.12   0.85

03/12/2015 04:09:29 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.61    0.00    8.97    0.26    0.57   81.60

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.00    0.00    0.20     0.00     0.00     8.00     0.00   20.00    0.00   20.00  20.00   0.40
xvdb              0.00     1.60    0.00  449.20     0.00    18.63    84.93    11.70   25.83    0.00   25.83   0.24  10.56
xvdc              0.00     0.40    0.00  433.80     0.00    17.96    84.78    13.92   31.05    0.00   31.05   0.24  10.56
dm-0              0.00     0.00    0.00  929.60     0.00    38.44    84.68    25.77   27.13    0.00   27.13   0.11  10.56

03/12/2015 04:09:34 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.56    0.00    8.99    0.49    0.65   81.32

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     1.00    0.00    0.60     0.00     0.01    21.33     0.00    6.67    0.00    6.67   6.67   0.40
xvdb              0.00     0.40    0.00  846.40     0.00    35.01    84.71    12.45   14.82    0.00   14.82   0.23  19.52
xvdc              0.00     1.60    0.00  861.00     0.00    35.67    84.86    27.54   32.50    0.00   32.50   0.23  19.68
dm-0              0.00     0.00    0.00 1664.80     0.00    68.83    84.68    40.22   24.48    0.00   24.48   0.12  19.68


#5

Hi Joshua,

We have a detailed set of procedures describing the process we used in the High Scalability post. Please find them in our Amazon Deployment Tuning Guide.


#6

I found that and have been making as many modifications as I can to model your examples. Unfortunately, I cannot use a VPC.

My main question is whether it is expected to see write performance drop when growing the cluster… that seems very counterintuitive to me.


#7

When using replication factor 1, you should scale linearly for each node added. With replication factor 2, you would see a performance drop due to replication when expanding from 1 to 2 nodes, but it should be linear from there on.
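
As a rough illustration of that scaling, here is a small Python sketch of a best-case model. It is an assumption-laden simplification, not a description of how the server works: `expected_write_tps` is a hypothetical name, every node is assumed to sustain the same write rate, and the network hop for replica writes and any client-side limits are ignored. The 150,000 figure is the single-node rate reported at the top of this thread.

```python
def expected_write_tps(per_node_tps, nodes, replication_factor=1):
    """Best-case cluster write rate if every node can sustain per_node_tps."""
    if nodes < 1 or replication_factor < 1:
        raise ValueError("nodes and replication_factor must be at least 1")
    # A record cannot have more copies than there are nodes.
    copies = min(replication_factor, nodes)
    # Each client write costs `copies` node-level writes.
    return per_node_tps * nodes / copies


for n in (1, 2, 3):
    rf1 = expected_write_tps(150_000, n, replication_factor=1)
    rf2 = expected_write_tps(150_000, n, replication_factor=2)
    print(f"{n} node(s): rf=1 {rf1:.0f} TPS, rf=2 {rf2:.0f} TPS")
```

With replication factor 1 the model gives 150k, 300k and 450k for one to three nodes, against the ~150k, ~100k and ~50k reported above, which is why a falling total points at something other than the servers. With replication factor 2 the model is flat from one to two nodes; a real cluster would come in below that because of the replica write.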

With the clients on the same VMs, the servers now need to compete with them for resources, and I would expect this to negatively impact your performance.

Running the client on the same VM may well have given you a 3x increase, but that improved the network path, not the performance of the Aerospike cluster. Requiring the servers to compete with the clients for resources limits the performance of the servers.

Try running the clients from separate machines. Start with one, tune the number of threads for the highest TPS, and then spin up more instances with the same client configuration. In the 1M TPS procedures, each instance running a client pushed about 250K TPS, so we needed 4 client instances to fully load a single server instance.
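
The tuning loop described above can be sketched in Python as follows. This is only a harness under stated assumptions: `measure_tps` and `clients_needed` are hypothetical names, and `fake_write` is a stand-in for whatever single-record write call your client makes. In CPython, threads share the interpreter lock, so a Python client may need separate processes rather than threads to scale.

```python
import math
import threading
import time


def measure_tps(write_one, threads, seconds=1.0):
    """Call write_one(i) from `threads` threads for about `seconds`; return ops/sec."""
    counts = [0] * threads
    start = time.monotonic()
    deadline = start + seconds

    def worker(slot):
        done = 0
        while time.monotonic() < deadline:
            write_one(done)
            done += 1
        counts[slot] = done

    pool = [threading.Thread(target=worker, args=(slot,)) for slot in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return sum(counts) / (time.monotonic() - start)


def clients_needed(target_tps, per_client_tps):
    """Number of client instances needed to offer target_tps in total."""
    return math.ceil(target_tps / per_client_tps)


if __name__ == "__main__":
    def fake_write(i):
        # Placeholder: replace with a real single-record write.
        pass

    # Sweep the thread count on one client machine and keep the best setting.
    results = {n: measure_tps(fake_write, n, seconds=0.5) for n in (1, 2, 4, 8)}
    best = max(results, key=results.get)
    print("best thread count:", best)
    # Figures from the 1M TPS procedures: 250K TPS per client instance.
    print("client instances:", clients_needed(1_000_000, 250_000))
```

Once the best thread count on one machine is known, the same configuration is repeated on additional client machines until the total stops rising.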


#8

The initial 250+k TPS was when I was inserting into a single-node system. It dropped to the lower level when I added a second node. The nodes are both c3.8xlarge with the same tuning applied (except for the multi-NIC / VPC trick, which I cannot do at this time). The replication factor for this namespace is 1.


#9

  1. Are the clients still on the same nodes as the server?

  2. Which client are you using? I have assumed the Java benchmark client.

  3. Does the performance improve after migrations have completed?

  4. Could you provide your server configuration?


#10

  1. I was using clients on the same servers and on different servers. Moving the clients to remote servers only lowered my throughput substantially.
  2. The clients were Python-based and just did inserts.
  3. No, performance was consistent regardless of whether migrations were happening.
  4. Server config was all defaults (with the exception of using IPs for node discovery, since we’re on EC2).

#11

Sorry for the delay :scream:

Your first two answers would indicate that your clients were the bottleneck, and the fact that they are Python makes that really likely. I suspect that you would have seen the TPS increase if you had increased the number of clients, indicating that the server could handle more than the clients could push.
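
One way to check this is to record total TPS while adding clients and see whether it keeps rising. The Python sketch below is a heuristic only: `looks_client_bound` and the `min_gain` threshold are hypothetical, and the numbers in the usage part are made up for illustration, not measurements from this thread.

```python
def looks_client_bound(tps_by_clients, min_gain=0.5):
    """Return True if total TPS keeps growing roughly in step with the client count.

    tps_by_clients maps a number of clients to the total TPS measured with them.
    min_gain is the fraction of the ideal (proportional) gain that must be seen.
    """
    if len(tps_by_clients) < 2:
        raise ValueError("need measurements for at least two client counts")
    counts = sorted(tps_by_clients)
    for prev, cur in zip(counts, counts[1:]):
        # Ideal gain if every extra client added as much as the earlier ones.
        ideal_gain = tps_by_clients[prev] * (cur - prev) / prev
        actual_gain = tps_by_clients[cur] - tps_by_clients[prev]
        if actual_gain < min_gain * ideal_gain:
            # Adding clients stopped helping: the limit is somewhere else.
            return False
    return True


if __name__ == "__main__":
    # Made-up example: throughput nearly doubles each time clients double.
    print(looks_client_bound({1: 50_000, 2: 98_000, 4: 190_000}))
    # Made-up example: a second client adds almost nothing.
    print(looks_client_bound({1: 50_000, 2: 52_000}))
```

If the total keeps climbing as clients are added, the server had headroom and the clients were the limit; if it flattens, the bottleneck is the server or the network.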

Migrations should definitely have an effect on performance, especially peak performance.

I recommend using the Java benchmark client to see how many transactions per second the server can handle. On internal machines the Java benchmark can push upwards of 300,000 TPS.