AMC :: Total and success Write TPS

Hi Guys,

While using AMC to monitor cluster throughput, I observed that my write TPS is quite confusing.

Total write TPS = 5182 ::: Success write TPS = 2182

Does this mean the remaining 3000 write transactions failed? Can anyone please explain what total and success TPS mean in the case of writes?

Yes, those 3000 writes did not result in an actual write. There are cases where such a high value can be expected, such as read-modify-write where the generation has changed since the read, in which case Aerospike fails the write with a generation error (error code 3).
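As a rough sketch of that pattern (Python client, with made-up namespace, set, and bin names), a check-and-set write like the one below counts toward total write TPS even when it fails with a generation error:

import aerospike
from aerospike import exception as ex

config = {'hosts': [('ip.address.of.node', 3000)]}
client = aerospike.client(config).connect()

# Assumes the record already exists; get() returns the current generation in meta.
key = ('test', 'demo', 'user1')
(key, meta, bins) = client.get(key)
bins['counter'] = bins.get('counter', 0) + 1

try:
    # Write only if the generation is still what we read (check-and-set).
    client.put(key, bins,
               meta={'gen': meta['gen']},
               policy={'gen': aerospike.POLICY_GEN_EQ})
except ex.RecordGenerationError:
    # The attempt is counted as a write request but not a write success (error code 3).
    print('generation changed since read; retry the read-modify-write')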

If you could run asinfo -v "statistics" -h "ip.address.of.node" we could take a look at what type of errors are happening.

Please find below the response when I ran the above command on a node. I didn't find the reason for the 3000 write failures.

cluster_size=2;cluster_key=96D38E0DCB4B6383;cluster_integrity=true;objects=836634;sub-records=0;total-bytes-disk=32212254720;used-bytes-disk=1109409024;free-pct-disk=96;total-bytes-memory=32212254720;used-bytes-memory=955374636;data-used-bytes-memory=901830060;index-used-bytes-memory=53544576;sindex-used-bytes-memory=0;free-pct-memory=97;stat_read_reqs=8920376;stat_read_reqs_xdr=0;stat_read_success=8503515;stat_read_errs_notfound=416861;stat_read_errs_other=0;stat_write_reqs=13380526;stat_write_reqs_xdr=0;stat_write_success=4877030;stat_write_errs=0;stat_xdr_pipe_writes=0;stat_xdr_pipe_miss=0;stat_delete_success=0;stat_rw_timeout=0;udf_read_reqs=0;udf_read_success=0;udf_read_errs_other=0;udf_write_reqs=0;udf_write_success=0;udf_write_err_others=0;udf_delete_reqs=0;udf_delete_success=0;udf_delete_err_others=0;udf_lua_errs=0;udf_scan_rec_reqs=0;udf_query_rec_reqs=0;udf_replica_writes=0;stat_proxy_reqs=0;stat_proxy_reqs_xdr=0;stat_proxy_success=0;stat_proxy_errs=0;stat_ldt_proxy=0;stat_cluster_key_trans_to_proxy_retry=0;stat_cluster_key_transaction_reenqueue=0;stat_slow_trans_queue_push=0;stat_slow_trans_queue_pop=0;stat_slow_trans_queue_batch_pop=0;stat_cluster_key_regular_processed=0;stat_cluster_key_prole_retry=0;stat_cluster_key_err_ack_dup_trans_reenqueue=0;stat_cluster_key_partition_transaction_queue_count=0;stat_cluster_key_err_ack_rw_trans_reenqueue=0;stat_expired_objects=0;stat_evicted_objects=0;stat_deleted_set_objects=0;stat_evicted_set_objects=0;stat_evicted_objects_time=0;stat_zero_bin_records=0;stat_nsup_deletes_not_shipped=0;err_tsvc_requests=0;err_out_of_space=0;err_duplicate_proxy_request=0;err_rw_request_not_found=0;err_rw_pending_limit=0;err_rw_cant_put_unique=0;fabric_msgs_sent=9807453;fabric_msgs_rcvd=9807446;paxos_principal=BB900007FEC8FCB;migrate_msgs_sent=8192;migrate_msgs_recv=16385;migrate_progress_send=0;migrate_progress_recv=0;migrate_num_incoming_accepted=4096;migrate_num_incoming_refused=0;queue=0;transactions=22395722;reaped_fds=7;tscan_initiate=0;tscan_pending=0;tscan_succeeded=0;tscan_aborted=0;batch_initiate=0;batch_queue=0;batch_tree_count=0;batch_timeout=0;batch_errors=0;info_queue=0;delete_queue=0;proxy_in_progress=0;proxy_initiate=0;proxy_action=0;proxy_retry=0;proxy_retry_q_full=0;proxy_unproxy=0;proxy_retry_same_dest=0;proxy_retry_new_dest=0;write_master=13380526;write_prole=9828050;read_dup_prole=0;rw_err_dup_internal=0;rw_err_dup_cluster_key=0;rw_err_dup_send=0;rw_err_write_internal=0;rw_err_write_cluster_key=0;rw_err_write_send=0;rw_err_ack_internal=0;rw_err_ack_nomatch=0;rw_err_ack_badnode=0;client_connections=137;waiting_transactions=0;tree_count=0;record_refs=836634;record_locks=0;migrate_tx_objs=0;migrate_rx_objs=0;ongoing_write_reqs=0;err_storage_queue_full=0;partition_actual=2045;partition_replica=2051;partition_desync=0;partition_absent=0;partition_object_count=836634;partition_ref_count=4096;system_free_mem_pct=81;sindex_ucgarbage_found=0;sindex_gc_locktimedout=0;sindex_gc_inactivity_dur=0;sindex_gc_activity_dur=0;sindex_gc_list_creation_time=0;sindex_gc_list_deletion_time=0;sindex_gc_objects_validated=0;sindex_gc_garbage_found=0;sindex_gc_garbage_cleaned=0;system_swapping=false;err_replica_null_node=0;err_replica_non_null_node=0;err_sync_copy_null_node=0;err_sync_copy_null_master=0;storage_defrag_corrupt_record=0;err_write_fail_prole_unknown=0;err_write_fail_prole_generation=0;err_write_fail_unknown=0;err_write_fail_key_exists=0;err_write_fail_generation=0;err_write_fail_generation_xdr=0;err_write_fail_bin_exists=0;err_write_fail_parameter=0;
err_write_fail_incompatible_type=0;err_write_fail_noxdr=0;err_write_fail_prole_delete=0;err_write_fail_not_found=0;err_write_fail_key_mismatch=0;stat_duplicate_operation=0;uptime=15594;stat_write_errs_notfound=0;stat_write_errs_other=0;heartbeat_received_self=0;heartbeat_received_foreign=206761;query_reqs=0;query_success=0;query_fail=0;query_abort=0;query_avg_rec_count=0;query_short_queue_full=0;query_long_queue_full=0;query_short_running=0;query_long_running=0;query_tracked=0;query_agg=0;query_agg_success=0;query_agg_err=0;query_agg_abort=0;query_agg_avg_rec_count=0;query_lookups=0;query_lookup_success=0;query_lookup_err=0;query_lookup_abort=0;query_lookup_avg_rec_count=0

This particular node hasn’t seen any write errors since it was last restarted:

stat_write_errs=0

Could you run the command on the other node as well?

On the other node stat_write_errs=0 as well, but in AMC the write success TPS is still much lower than the total write TPS. We are using LDTs (lmap) as part of the application.

Could you tell us which versions of AMC, Aerospike server, and server tools you are using?

Thanks for your response.

Please find the AMC, Aerospike server, and server tools versions below:

aerospike-server-community-3.4.1-1
aerospike-tools-3.4.1-1
aerospike-amc-community-3.5.2-el5

I am facing this variation between total TPS and success TPS when using LDTs, but not when using JSON. Kindly let me know the reason for the huge difference between total TPS and success TPS.

We have been able to reproduce this issue using LDTs; our engineers are currently looking into the root cause. It seems that some successful LDT writes are not incrementing stat_write_success.

The problem is that both read and write UDFs are accounted as stat_write_reqs and write_master, but only write UDFs account for stat_write_success, and a successful read UDF is not accounted under stat_read_success either.
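As a toy illustration (not actual server code, just made-up counters), this is how the gap between total and success appears when read UDFs are in the mix:

# Every UDF call is accounted as a write request up front.
stats = {'stat_write_reqs': 0, 'stat_write_success': 0, 'stat_read_success': 0}

def run_udf(is_write_udf):
    stats['stat_write_reqs'] += 1          # accounted as a write up front
    if is_write_udf:
        stats['stat_write_success'] += 1   # only write UDFs credit a success
    # read UDFs resolve as reads, but their success is not credited anywhere

for _ in range(3000):
    run_udf(is_write_udf=False)            # e.g. lmap reads
for _ in range(2000):
    run_udf(is_write_udf=True)             # e.g. lmap writes

print(stats)  # stat_write_reqs=5000, stat_write_success=2000: the AMC gap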

This is a statistics bug. Trust stat_write_success for successful writes and the err_write_fail* counters for failed writes in the

asinfo -v "statistics"

output.
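For example, a rough sketch (assuming asinfo is on the PATH; substitute your node's address) that pulls the same info string and prints those counters:

import subprocess

out = subprocess.check_output(
    ['asinfo', '-v', 'statistics', '-h', 'ip.address.of.node'], text=True)

# The output is a single line of semicolon-separated key=value pairs.
stats = dict(pair.split('=', 1) for pair in out.strip().split(';') if '=' in pair)

print('stat_write_reqs    =', stats.get('stat_write_reqs'))
print('stat_write_success =', stats.get('stat_write_success'))
for name, value in sorted(stats.items()):
    if name.startswith('err_write_fail') and value != '0':
        print(name, '=', value)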

– R

Dear Kporter,

Thank you for your response. Please let me know the root cause of this.

When a UDF is invoked, it is not known beforehand whether it will perform a read or a write. So it is accounted as a write up front and resolved as a read after the execution finishes.

The counters were not being adjusted accordingly.

– R

Hi kporter,

How are you doing?

Has this issue been fixed in any release?

Sorry for the late reply on this. The fix should have made it into our 3.5.3 server release, which was made available at the end of February. Let us know if you are still experiencing this issue, though.

This is fixed in 3.5.3. http://www.aerospike.com/download/server/notes.html#3.5.3

Release note: UDF/LDT - Server statistics fix.

We will clarify the release note. Thanks.