Hits AEROSPIKE_ERR_TIMEOUT more often than I should


#1

I am running a single node cluster in my local box - 64bit ubuntu and non-SSD. The aerospike.conf is configured to store the data in the hard drive.

I hit the timeout error way too often. Almost every 2-3 mins.

Let me know if you have faced the same.


Scan job for 16M records is failing on socket timeout
#2

Is a this data-in-memory namespace?

Could you provide a copy of /etc/aerospike/aerospike.conf?`

Could you provide a link to your Aerospike logs?

Are you running your client from the same node?

How much load are you pushing?


#3

No the data is not stored in memory primarily. The data is stored in the drive and there data-in-memory flag enabled to true which stores the data in memory in addition to file.

.conf

# Aerospike database configuration file.

# This stanza must come first.
service {
	user root
	group root
	paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
	pidfile /var/run/aerospike/asd.pid
	service-threads 4
	transaction-queues 4
	transaction-threads-per-queue 4
	proto-fd-max 15000
}

logging {
	# Log file must be an absolute path.
	file /var/log/aerospike/aerospike.log {
		context any info
	}
}

network {
	service {
		address any
		port 3000
	}

	heartbeat {
		mode multicast
		address 239.1.99.222
		port 9918

		# To use unicast-mesh heartbeats, comment out the 3 lines above and
		# use the following 4 lines instead.
#		mode mesh
#		port 3002
#		mesh-address 10.1.1.1
#		mesh-port 3002

		interval 150
		timeout 10
	}

	fabric {
		port 3001
	}

	info {
		port 3003
	}
}

namespace test {
	replication-factor 2
	memory-size 4G
	default-ttl 30d # 30 days, use 0 to never expire/evict.

	storage-engine memory
}

namespace bar {
	replication-factor 2
	memory-size 4G
	default-ttl 30d # 30 days, use 0 to never expire/evict.

	#storage-engine memory

	# To use file storage backing, comment out the line above and use the
	# following lines instead.
	storage-engine device {
		file /opt/aerospike/data/bar.dat
		filesize 16G
		data-in-memory true # Store data in memory in addition to file.
	}
}

namespace polls {
        replication-factor 1
        memory-size 4G
        default-ttl 30d # 30 days, use 0 to never expire/evict.

        #storage-engine memory

        # To use file storage backing, comment out the line above and use the
        # following lines instead.
       storage-engine device {
               file /opt/aerospike/data/polls.dat
               filesize 16G
               data-in-memory true # Store data in memory in addition to file.
       }
}

The namspace I actively use is ‘polls’(please scroll down in the .conf file to see the configurations of the namespace ‘polls’)

Yes the client is on the same node.

I am not pushing a lot of load. Just a bunch of lists and maps that lengths up to max 10-20 items.

Thanks for your reply.


#4

Running the client on the same node can result in increased timeouts, especially if the client process is resource intensive.

Likewise, a very low transaction throughput can also result in timeouts if the rate is so low that sockets are being closed.

What are you setting your timeout to?


#5

What are you setting your timeout to?

Settings for timeout connections is 1000 ms.

I have increased the timeout and tried, but that doesn’t help.

Running the client on the same node can result in increased timeouts, especially if the client process is resource intensive.

Yes, I am running the client and server on the same node(for development of course) but the load is very very less. I am not sure how this is causing such timeouts so very often.


#6

Hm, could you provide the output of:

asadm -h HOST_IP -e "show stat; asinfo -v build"

#7

When I run the command I get the following outout

~~~ stat ~~~
ERR: Unable to find any Aerospike nodes
~~~ -v build ~~~
ERR: Unable to find any Aerospike nodes

Aerospike version is 3.5.9 Operating System - ubuntu 14.04 Non-SSD, 64 bit machine.


#8

Did you replace “HOST_IP” with the hosts IP?

If you did can you check if Aerospike is still running on that host?

If you are running the command on the host itself then the HOST_IP will default to localhost:

E.G. The following would seed from localhost.

asadm -e "show stat; asinfo -v build"

#9

Sorry about that, here it is

~~~ stat ~~~ ~~test Bin Statistics~~~
NODE           :   1        
bin-names-quota:   32768
num-bin-names  :   0       

~~polls Bin Statistics~~ 
NODE           :   1        
bin-names-quota:  32768    
num-bin-names  :   25      

~~~bar Bin Statistics~~~ 
NODE           :   1        
bin-names-quota:  32768    
num-bin-names  :   17      

~~bar notification Set Statistics~~~ 
NODE                :   1         
n_objects           :   5               
ns_name             :   bar    
set-delete          :   false           
set-enable-xdr      :   use-default     
set-evict-hwm-count :   0               
set-stop-write-count:   0               
set_name            :   notification   

~~bar active_users Set Statistics~~~ 
NODE                :   1         
n_objects           :   1               
ns_name             :   bar    
set-delete          :   false           
set-enable-xdr      :   use-default     
set-evict-hwm-count :   0               
set-stop-write-count:   0               
set_name            :   active_users   

~~~~polls invite Set Statistics~~~~ 
NODE                :   1          
n_objects           :   1              
ns_name             :   polls   
set-delete          :   false          
set-enable-xdr      :   use-default    
set-evict-hwm-count :   0              
set-stop-write-count:   0              
set_name            :   invite 


~~~bar email_sent Set Statistics~~~ 
NODE                :   1          
n_objects           :   1              
ns_name             :   bar    
 set-delete          :   false          
set-enable-xdr      :   use-default    
set-evict-hwm-count :   0              
set-stop-write-count:   0              
set_name            :   email_sent    

~polls notification Set Statistics~~ 
NODE                :   1         
n_objects           :   1               
ns_name             :   polls  
set-delete          :   false           
set-enable-xdr      :   use-default     
set-evict-hwm-count :   0               
set-stop-write-count:   0               
set_name            :   notification   

~~~~~polls feed Set Statistics~~~~~ 
NODE                :   1          
n_objects           :   3              
ns_name             :   polls   
set-delete          :   false          
set-enable-xdr      :   use-default    
set-evict-hwm-count :   0             
 set-stop-write-count:   0              
set_name            :   feed   


~~~~~~bar post Set Statistics~~~~~~ 
NODE                :   1          
n_objects           :   24             
ns_name             :   bar    
 set-delete          :   false          
set-enable-xdr      :   use-default   
set-evict-hwm-count :   0              
set-stop-write-count:   0              
set_name            :   post   


~~~~~polls post Set Statistics~~~~~ NODE                :   1          n_objects           :   5              ns_name             :   polls   set-delete          :   false          set-enable-xdr      :   use-default    set-evict-hwm-count :   0              set-stop-write-count:   0              set_name            :   post   


~~polls user_people_state Set Statistics~ 
NODE                :   1    
n_objects           :   1                    
ns_name             :   polls               
 set-delete          :   false                
set-enable-xdr      :   use-default          
set-evict-hwm-count :   0 
set-stop-write-count:   0                    
set_name            :   user_people_state   

~polls active_users Set Statistics~~ 
NODE                :   1         
n_objects           :   1               
ns_name             :   polls  
set-delete          :   false           
set-enable-xdr      :   use-default     
set-evict-hwm-count :   0               
set-stop-write-count:   0               
set_name            :   active_users   

~~~~~~bar feed Set Statistics~~~~~~ 
NODE                :   1          
n_objects           :   7              
ns_name             :   bar     
set-delete          :   false          
set-enable-xdr      :   use-default    
set-evict-hwm-count :   0              
set-stop-write-count:   0              
set_name            :   feed   


~~~~~~~~~~~~~~~~~~~~~~~~~~Service Statistics~~~~~~~~~~~~~~~~~~~~~~~~~~ 
NODE                                              :   1                batch_errors                                      :   0                batch_initiate                                    :   0                batch_queue                                       :   0                batch_timeout                                     :   0                batch_tree_count                                  :   0                client_connections                                :   7                cluster_integrity                                 :   true             cluster_key                                       :   D02C51D2614BF9A1 cluster_size                                      :   1                
data-used-bytes-memory                            :   15695            delete_queue                                      :   0                err_duplicate_proxy_request                       :   0                err_out_of_space                                  :   0                err_replica_non_null_node                         :   0                err_replica_null_node                             :   0                err_rw_cant_put_unique                            :   0                err_rw_pending_limit                              :   0                err_rw_request_not_found                          :   0                err_storage_queue_full                            :   0                err_sync_copy_null_master                         :   0                err_sync_copy_null_node                           :   0                err_tsvc_requests                                 :   0                err_write_fail_bin_exists                         :   0                err_write_fail_bin_name                           :   0                err_write_fail_bin_not_found                      :   0                err_write_fail_forbidden                          :   0                err_write_fail_generation                         :   0                err_write_fail_generation_xdr                     :   0                err_write_fail_incompatible_type                  :   0                err_write_fail_key_exists                         :   0                err_write_fail_key_mismatch                       :   0                err_write_fail_not_found                          :   0                err_write_fail_noxdr                              :   0                err_write_fail_parameter                          :   0                err_write_fail_prole_delete                       :   0                err_write_fail_prole_generation                   :   0                err_write_fail_prole_unknown                      :   0                err_write_fail_record_too_big                     :   0                err_write_fail_unknown                            :   0                fabric_msgs_rcvd                                  :   0                fabric_msgs_sent                                  :   0                free-pct-disk                                     :   99               
free-pct-memory                                   :   99               heartbeat_received_foreign                        :   0                heartbeat_received_self                           :   299              
index-used-bytes-memory                           :   3200             info_queue                                        :   0                migrate_msgs_recv                                 :   0                migrate_msgs_sent                                 :   0                migrate_num_incoming_accepted                     :   0                migrate_num_incoming_refused                      :   0                migrate_progress_recv                             :   0                migrate_progress_send                             :   0                migrate_rx_objs                                   :   0                migrate_tx_objs                                   :   0                objects                                           :   50               ongoing_write_reqs                                :   0                partition_absent                                  :   0                partition_actual                                  :   12288            partition_desync                                  :   0                partition_object_count                            :   50               partition_ref_count                               :   12288            partition_replica                                 :   0                paxos_principal                                   :   BB995C9E9272D14  proxy_action                                      :   0                proxy_in_progress                                 :   0                proxy_initiate                                    :   0                proxy_retry                                       :   0                proxy_retry_new_dest                              :   0                proxy_retry_q_full                                :   0                proxy_retry_same_dest                             :   0                proxy_unproxy                                     :   0                query_abort                                       :   0                query_agg                                         :   0                query_agg_abort                                   :   0                query_agg_avg_rec_count                           :   0                query_agg_err                                     :   0                query_agg_success                                 :   0                query_avg_rec_count                               :   0                query_fail                                        :   0                query_long_queue_full                             :   0                query_long_queue_size                             :   0                query_long_running                                :   0                query_lookup_abort                                :   0                query_lookup_avg_rec_count                        :   0                query_lookup_err                                  :   0                query_lookup_success                              :   0                query_lookups                                     :   0                query_reqs                                        :   0                query_short_queue_full                            :   0                query_short_queue_size                            :   0                query_short_running                               :   0                query_success                                     :   0                query_tracked                                     :   0                queue                                             :   0                read_dup_prole                                    :   0                reaped_fds                                        :   0                record_locks                                      :   0                record_refs                                       :   50               rw_err_ack_badnode                                :   0                rw_err_ack_internal                               :   0                rw_err_ack_nomatch                                :   0                rw_err_dup_cluster_key                            :   0                rw_err_dup_internal                               :   0                rw_err_dup_send                                   :   0                rw_err_write_cluster_key                          :   0                rw_err_write_internal                             :   0                rw_err_write_send                                 :   0                
sindex-used-bytes-memory                          :   203070           sindex_gc_activity_dur                            :   0                sindex_gc_garbage_cleaned                         :   0                sindex_gc_garbage_found                           :   0                sindex_gc_inactivity_dur                          :   20000            sindex_gc_list_creation_time                      :   0                sindex_gc_list_deletion_time                      :   0                sindex_gc_locktimedout                            :   0                sindex_gc_objects_validated                       :   0                sindex_ucgarbage_found                            :   0                stat_cluster_key_err_ack_dup_trans_reenqueue      :   0                stat_cluster_key_err_ack_rw_trans_reenqueue       :   0                stat_cluster_key_partition_transaction_queue_count:   0                stat_cluster_key_prole_retry                      :   0                stat_cluster_key_regular_processed                :   0                stat_cluster_key_trans_to_proxy_retry             :   0                stat_cluster_key_transaction_reenqueue            :   0                stat_delete_success                               :   0                stat_deleted_set_objects                          :   0                stat_duplicate_operation                          :   0                stat_evicted_objects                              :   0                stat_evicted_objects_time                         :   0                stat_evicted_set_objects                          :   0                stat_expired_objects                              :   0                stat_ldt_proxy                                    :   0                stat_nsup_deletes_not_shipped                     :   0                stat_proxy_errs                                   :   0                stat_proxy_reqs                                   :   0                stat_proxy_reqs_xdr                               :   0                stat_proxy_success                                :   0                stat_read_errs_notfound                           :   0                stat_read_errs_other                              :   0                stat_read_reqs                                    :   0                stat_read_reqs_xdr                                :   0                stat_read_success                                 :   0                stat_rw_timeout                                   :   0                stat_slow_trans_queue_batch_pop                   :   0                stat_slow_trans_queue_pop                         :   0                stat_slow_trans_queue_push                        :   0                stat_write_errs                                   :   0                stat_write_errs_notfound                          :   0                stat_write_errs_other                             :   0                stat_write_reqs                                   :   0                stat_write_reqs_xdr                               :   0                stat_write_success                                :   0                stat_xdr_pipe_miss                                :   0                stat_xdr_pipe_writes                              :   0                stat_zero_bin_records                             :   0                storage_defrag_corrupt_record                     :   0                sub-records                                       :   0                system_free_mem_pct                               :   77               system_swapping                                   :   false            
total-bytes-disk                                  :   34359738368      
total-bytes-memory                                :   12884901888      transactions                                      :   21               tree_count                                        :   0                tscan_aborted                                     :   0                tscan_initiate                                    :   0                tscan_pending                                     :   0                tscan_succeeded                                   :   0                udf_delete_err_others                             :   0                udf_delete_reqs                                   :   0                udf_delete_success                                :   0                udf_lua_errs                                      :   0                udf_query_rec_reqs                                :   0                udf_read_errs_other                               :   0                udf_read_reqs                                     :   0                udf_read_success                                  :   0                udf_replica_writes                                :   0                udf_scan_rec_reqs                                 :   0                udf_write_err_others                              :   0                udf_write_reqs                                    :   0                udf_write_success                                 :   0                uptime                                            :   31               used-bytes-disk                                   :   27776            
used-bytes-memory                                 :   221965           waiting_transactions                              :   0                write_master                                      :   0                write_prole                                       :   0               

~~~~~~~~~~~~~~~~~~~bar Namespace Statistics~~~~~~~~~~~~~~~~~~~ 
NODE    :   1                              
allow_versions                 :   false                          
available-bin-names            :   32751                          
available_pct                  :   99   
cold-start-evict-ttl           :   4294967295                     
conflict-resolution-policy     :   generation                     
current-time                   :   169017220                      
data-in-memory                 :   true                           
data-used-bytes-memory         :   13720                          
default-ttl                    :   2592000                        
defrag-lwm-pct                 :   50                             
defrag-queue-min               :   0                              
defrag-sleep                   :   1000                           
defrag-startup-minimum         :   10                             
disallow-null-setname          :   false                          
enable-xdr                     :   false                          
evict-tenths-pct               :   5                              
evicted-objects                :   0                              
expired-objects                :   0                              
file :   /opt/aerospike/data/bar.dat    
filesize                       :   17179869184                    
flush-max-ms                   :   1000 
free-pct-disk                  :   99                             
free-pct-memory                :   99                             
fsync-max-sec                  :   0                              
high-water-disk-pct            :   50                             
high-water-memory-pct          :   60                             
hwm-breached                   :   false                          
index-used-bytes-memory        :   2432                           
ldt-enabled                    :   false                          
ldt-page-size                  :   8192                           
master-objects                 :   38                             
master-sub-objects             :   0                              
max-ttl                        :   0                              
max-void-time                  :   171525071                      
max-write-cache                :   67108864                       
memory-size                    :   4294967296                     
min-avail-pct                  :   5                              
non-expirable-objects          :   0                              
ns-forward-xdr-writes          :   false                          
nsup-cycle-duration            :   0                              
nsup-cycle-sleep-pct           :   0                              
obj-size-hist-max              :   100                            
objects                        :   38                             
post-write-queue               :   0                              
prole-objects                  :   0                              
prole-sub-objects              :   0                              
read-consistency-level-override:   off                            
repl-factor                    :   1                              
set-deleted-objects            :   0                              
set-evicted-objects            :   0                              
sets-enable-xdr                :   true                           
sindex-used-bytes-memory       :   0                              
single-bin                     :   false                          
stop-writes                    :   false                          
stop-writes-pct                :   90                             
sub-objects                    :   0                              
total-bytes-disk               :   17179869184                    
total-bytes-memory             :   4294967296                     
type :   device                         
used-bytes-disk                :   23168                          
used-bytes-memory              :   16152                          
write-commit-level-override    :   off  write-smoothing-period     

~~~~~~~~~~~~~~~~~~~polls Namespace Statistics~~~~~~~~~~~~~~~~~~~ 
NODE  :   1                                
allow_versions                 :  false                            
available-bin-names            :   32743                            
available_pct                  :   99 
cold-start-evict-ttl           :   4294967295                       
conflict-resolution-policy     :   generation                       
current-time                   :   169017220                        
data-in-memory                 :   true                             
data-used-bytes-memory         :   1975                             
default-ttl                    :   0                                
defrag-lwm-pct                 :   50                               
defrag-queue-min               :   0                                
defrag-sleep                   :   1000                             
defrag-startup-minimum         :   10                               
disallow-null-setname          :   false                            
enable-xdr                     :   false                            
evict-tenths-pct               :   5                                
evicted-objects                :   0                                
expired-objects                :   0                                
file                           :   /opt/aerospike/data/polls.dat    
filesize                       :   17179869184                      
flush-max-ms                   :   1000                             
free-pct-disk                  :   99                               
free-pct-memory                :   99                               
fsync-max-sec                  :   0                                
high-water-disk-pct            :   50                               
high-water-memory-pct          :   60                               
hwm-breached                   :   false                            
index-used-bytes-memory        :   768                              
ldt-enabled                    :   false                            
ldt-page-size                  :   8192                             
master-objects                 :   12                               
master-sub-objects             :   0                                
max-ttl                        :   0                                
max-void-time                  :   0                                
max-write-cache                :   67108864                         
memory-size                    :   4294967296                       
min-avail-pct                  :   5                                
non-expirable-objects          :   0                                
ns-forward-xdr-writes          :   false                            
nsup-cycle-duration            :   0                                
nsup-cycle-sleep-pct           :   0                                
obj-size-hist-max              :   100                              
objects                        :   12                               
post-write-queue               :   0                                
prole-objects                  :   0                                
prole-sub-objects              :   0                                
read-consistency-level-override:   off                              
repl-factor                    :   1                                
set-deleted-objects            :   0                                
set-evicted-objects            :   0                                
sets-enable-xdr                :   true                             
sindex-used-bytes-memory       :   203070                           
single-bin                     :   false                            
stop-writes                    :   false                            
stop-writes-pct                :   90                               
sub-objects                    :   0                                
total-bytes-disk               :   17179869184                      
total-bytes-memory             :   4294967296                       
type                           :   device

~~~~~~~~~~test Namespace Statistics~~~~~~~~~~ 
NODE                           :   1             
allow_versions                 :   false         
available-bin-names            :   32768         
cold-start-evict-ttl           :   4294967295    
conflict-resolution-policy     :   generation    
current-time                   :   169017220     
data-used-bytes-memory         :   0             
default-ttl                    :   2592000       
disallow-null-setname          :   false         
enable-xdr                     :   false         
evict-tenths-pct               :   5             
evicted-objects                :   0             
expired-objects                :   0             
free-pct-memory                :   100           
high-water-disk-pct            :   50            
high-water-memory-pct          :   60            
hwm-breached                   :   false         
index-used-bytes-memory        :   0             
ldt-enabled                    :   false         
ldt-page-size         :   8192          
master-objects                 :   0             
master-sub-objects             :   0             
max-ttl               :   0             
max-void-time                  :   0             
memory-size                    :   4294967296    
non-expirable-objects :   0             
ns-forward-xdr-writes          :   false         
nsup-cycle-duration            :   0             
nsup-cycle-sleep-pct  :   0             
objects                        :   0             
prole-objects                  :   0             
prole-sub-objects     :   0             
read-consistency-level-override:   off           
repl-factor                    :   1             
set-deleted-objects   :   0             
set-evicted-objects            :   0             
sets-enable-xdr                :   true          
sindex-used-bytes-memory       :   0             
single-bin            :   false         
stop-writes                    :   false         
stop-writes-pct                :   90            
sub-objects           :   0             
total-bytes-memory             :   4294967296    
type                           :   memory        
used-bytes-memory     :   0             
write-commit-level-override    :   off

~~~ -v build ~~~ 1 (192.168.1.5) returned:
3.5.9

Polls is the namespace I am currently using.


#10

There aren’t any timeouts appearing in these stats, also the uptime is only 31 seconds. Could you provide the stats of the server after the issue has been observed?

Also I should have asked for the latency histograms from the logs.

The service stats should be enough, could you run and provide the output of the following after the issue has been observed?

asadm -e "show stat service"
tail -n 10000 /var/log/aerospike/aerospike.log | grep "hist.c" 

#11

Here’s the hist -

I couldnt paste the out put here as it too long.

May 14 2015 12:17:01 GMT: INFO (info): (hist.c::137) histogram dump: reads (14848 total) msec
May 14 2015 12:17:01 GMT: INFO (info): (hist.c::163)  (00: 0000014848)
May 14 2015 12:17:01 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (359 total) msec
May 14 2015 12:17:01 GMT: INFO (info): (hist.c::163)  (00: 0000000359)
May 14 2015 12:17:01 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
May 14 2015 12:17:01 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (364 total) msec
May 14 2015 12:17:01 GMT: INFO (info): (hist.c::163)  (00: 0000000364)
May 14 2015 12:17:01 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
May 14 2015 12:17:01 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
May 14 2015 12:17:01 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count

#12

All 14848 reads and 359 writes succeeded in <= 1 ms. But these histograms are for the successful reads and writes.

Could you provide the output of the following after the timeouts have been observed?

asadm -e "show stat service"

Also which client are you using?


#13

Sure, I’ll do that the next time I have this error. It’s not been happening for a while now.