Aerospike service shuts down when too many writes

Hello, I’m running Aerospike Community 3.5 on an Ubuntu 14.04 server with 1GB ram and 1GHz CPU with a non SSD hard drive (just a test server).

I’m using the Go client to either fetch data via a scan or create a record.

When I run Aerospike under a high load, reads only, it performs fine. But when run something that does more than 750 concurrent writes (with 3 instances of my Go client based app running, load balanced by nginx), the database server stops after creating about 200 or so records:

Here’s my config file:

namespace default {
        replication-factor 1
        memory-size 4G
        default-ttl 0
        
        storage-engine device {
                file /opt/aerospike/data/default.dat
                filesize 2T
                data-in-memory true
        }
}

Here’s the tail of the log output when it stops (last 200 or so lines):

Jun 09 2015 18:25:58 GMT: INFO (info): (thr_info.c::4796)  system memory: free 189808kb ( 38 percent free ) 
Jun 09 2015 18:25:58 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 1014 ::: sub_objects 0
Jun 09 2015 18:25:58 GMT: INFO (info): (thr_info.c::4812)  rec refs 1014 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Jun 09 2015 18:25:58 GMT: INFO (info): (thr_info.c::4818)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Jun 09 2015 18:25:58 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (15, 248, 233) : hb (0, 0, 0) : fab (16, 16, 0)
Jun 09 2015 18:25:58 GMT: INFO (info): (thr_info.c::4830)    heartbeat_received: self 0 : foreign 0
Jun 09 2015 18:25:58 GMT: INFO (info): (thr_info.c::4831)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0 
Jun 09 2015 18:25:58 GMT: INFO (info): (thr_info.c::4844)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Jun 09 2015 18:25:58 GMT: INFO (info): (thr_info.c::4860) namespace test: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Jun 09 2015 18:25:58 GMT: INFO (info): (thr_info.c::4860) namespace default: disk inuse: 391680 memory inuse: 216591 (bytes) sindex memory inuse: 210842 (bytes) avail pct 99
Jun 09 2015 18:25:58 GMT: INFO (info): (thr_info.c::4860) namespace bar: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Jun 09 2015 18:25:58 GMT: INFO (info): (thr_info.c::4905)    partitions: actual 12288 sync 0 desync 0 zombie 0 wait 0 absent 0
Jun 09 2015 18:25:58 GMT: INFO (info): (hist.c::137) histogram dump: reads (1334 total) msec
Jun 09 2015 18:25:58 GMT: INFO (info): (hist.c::154)  (00: 0000001198) (01: 0000000027) (02: 0000000031) (03: 0000000024)
Jun 09 2015 18:25:58 GMT: INFO (info): (hist.c::163)  (04: 0000000036) (05: 0000000016) (06: 0000000002)
Jun 09 2015 18:25:58 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (689 total) msec
Jun 09 2015 18:25:58 GMT: INFO (info): (hist.c::154)  (00: 0000000601) (01: 0000000017) (02: 0000000029) (03: 0000000021)
Jun 09 2015 18:25:58 GMT: INFO (info): (hist.c::163)  (04: 0000000015) (05: 0000000006)
Jun 09 2015 18:25:58 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Jun 09 2015 18:25:58 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (689 total) msec
Jun 09 2015 18:25:58 GMT: INFO (info): (hist.c::154)  (00: 0000000601) (01: 0000000017) (02: 0000000029) (03: 0000000021)
Jun 09 2015 18:25:58 GMT: INFO (info): (hist.c::163)  (04: 0000000015) (05: 0000000006)
Jun 09 2015 18:25:58 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Jun 09 2015 18:25:58 GMT: INFO (info): (hist.c::137) histogram dump: query (1224 total) msec
Jun 09 2015 18:25:58 GMT: INFO (info): (hist.c::154)  (03: 0000000296) (04: 0000000497) (05: 0000000327) (06: 0000000068)
Jun 09 2015 18:25:58 GMT: INFO (info): (hist.c::163)  (07: 0000000036)
Jun 09 2015 18:25:58 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (1224 total) count
Jun 09 2015 18:25:58 GMT: INFO (info): (hist.c::163)  (00: 0000000689) (04: 0000000535)
Jun 09 2015 18:26:08 GMT: INFO (info): (thr_info.c::4796)  system memory: free 178984kb ( 36 percent free ) 
Jun 09 2015 18:26:08 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 1247 ::: sub_objects 0
Jun 09 2015 18:26:08 GMT: INFO (info): (thr_info.c::4812)  rec refs 1247 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Jun 09 2015 18:26:08 GMT: INFO (info): (thr_info.c::4818)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Jun 09 2015 18:26:08 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (15, 248, 233) : hb (0, 0, 0) : fab (16, 16, 0)
Jun 09 2015 18:26:08 GMT: INFO (info): (thr_info.c::4830)    heartbeat_received: self 0 : foreign 0
Jun 09 2015 18:26:08 GMT: INFO (info): (thr_info.c::4831)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0 
Jun 09 2015 18:26:08 GMT: INFO (info): (thr_info.c::4844)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Jun 09 2015 18:26:08 GMT: INFO (info): (thr_info.c::4860) namespace test: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Jun 09 2015 18:26:08 GMT: INFO (info): (thr_info.c::4860) namespace default: disk inuse: 481152 memory inuse: 266115 (bytes) sindex memory inuse: 244744 (bytes) avail pct 99
Jun 09 2015 18:26:08 GMT: INFO (info): (thr_info.c::4860) namespace bar: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Jun 09 2015 18:26:08 GMT: INFO (info): (thr_info.c::4905)    partitions: actual 12288 sync 0 desync 0 zombie 0 wait 0 absent 0
Jun 09 2015 18:26:08 GMT: INFO (info): (hist.c::137) histogram dump: reads (1801 total) msec
Jun 09 2015 18:26:08 GMT: INFO (info): (hist.c::154)  (00: 0000001618) (01: 0000000040) (02: 0000000039) (03: 0000000039)
Jun 09 2015 18:26:08 GMT: INFO (info): (hist.c::163)  (04: 0000000042) (05: 0000000021) (06: 0000000002)
Jun 09 2015 18:26:08 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (922 total) msec
Jun 09 2015 18:26:08 GMT: INFO (info): (hist.c::154)  (00: 0000000805) (01: 0000000027) (02: 0000000036) (03: 0000000028)
Jun 09 2015 18:26:08 GMT: INFO (info): (hist.c::163)  (04: 0000000017) (05: 0000000009)
Jun 09 2015 18:26:08 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Jun 09 2015 18:26:08 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (922 total) msec
Jun 09 2015 18:26:08 GMT: INFO (info): (hist.c::154)  (00: 0000000805) (01: 0000000027) (02: 0000000036) (03: 0000000028)
Jun 09 2015 18:26:08 GMT: INFO (info): (hist.c::163)  (04: 0000000017) (05: 0000000009)
Jun 09 2015 18:26:08 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Jun 09 2015 18:26:08 GMT: INFO (info): (hist.c::137) histogram dump: query (1690 total) msec
Jun 09 2015 18:26:08 GMT: INFO (info): (hist.c::154)  (03: 0000000386) (04: 0000000705) (05: 0000000482) (06: 0000000081)
Jun 09 2015 18:26:08 GMT: INFO (info): (hist.c::163)  (07: 0000000036)
Jun 09 2015 18:26:08 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (1690 total) count
Jun 09 2015 18:26:08 GMT: INFO (info): (hist.c::163)  (00: 0000000923) (04: 0000000767)
Jun 09 2015 18:26:10 GMT: INFO (drv_ssd): (drv_ssd.c::2536) device /opt/aerospike/data/default.dat: used 518016, contig-free 2097150M (2097150 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 4 (0.0/s)
Jun 09 2015 18:26:18 GMT: INFO (info): (thr_info.c::4796)  system memory: free 189368kb ( 38 percent free ) 
Jun 09 2015 18:26:18 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 1514 ::: sub_objects 0
Jun 09 2015 18:26:18 GMT: INFO (info): (thr_info.c::4812)  rec refs 1514 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Jun 09 2015 18:26:18 GMT: INFO (info): (thr_info.c::4818)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Jun 09 2015 18:26:18 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (15, 248, 233) : hb (0, 0, 0) : fab (16, 16, 0)
Jun 09 2015 18:26:18 GMT: INFO (info): (thr_info.c::4830)    heartbeat_received: self 0 : foreign 0
Jun 09 2015 18:26:18 GMT: INFO (info): (thr_info.c::4831)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0 
Jun 09 2015 18:26:18 GMT: INFO (info): (thr_info.c::4844)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Jun 09 2015 18:26:18 GMT: INFO (info): (thr_info.c::4860) namespace test: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Jun 09 2015 18:26:18 GMT: INFO (info): (thr_info.c::4860) namespace default: disk inuse: 583680 memory inuse: 322887 (bytes) sindex memory inuse: 283314 (bytes) avail pct 99
Jun 09 2015 18:26:18 GMT: INFO (info): (thr_info.c::4860) namespace bar: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Jun 09 2015 18:26:18 GMT: INFO (info): (thr_info.c::4905)    partitions: actual 12288 sync 0 desync 0 zombie 0 wait 0 absent 0
Jun 09 2015 18:26:18 GMT: INFO (info): (hist.c::137) histogram dump: reads (2334 total) msec
Jun 09 2015 18:26:18 GMT: INFO (info): (hist.c::154)  (00: 0000002078) (01: 0000000057) (02: 0000000061) (03: 0000000053)
Jun 09 2015 18:26:18 GMT: INFO (info): (hist.c::163)  (04: 0000000051) (05: 0000000031) (06: 0000000003)
Jun 09 2015 18:26:18 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (1189 total) msec
Jun 09 2015 18:26:18 GMT: INFO (info): (hist.c::154)  (00: 0000001030) (01: 0000000041) (02: 0000000046) (03: 0000000038)
Jun 09 2015 18:26:18 GMT: INFO (info): (hist.c::163)  (04: 0000000022) (05: 0000000012)
Jun 09 2015 18:26:18 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Jun 09 2015 18:26:18 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (1189 total) msec
Jun 09 2015 18:26:18 GMT: INFO (info): (hist.c::154)  (00: 0000001030) (01: 0000000041) (02: 0000000046) (03: 0000000038)
Jun 09 2015 18:26:18 GMT: INFO (info): (hist.c::163)  (04: 0000000022) (05: 0000000012)
Jun 09 2015 18:26:18 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Jun 09 2015 18:26:18 GMT: INFO (info): (hist.c::137) histogram dump: query (2225 total) msec
Jun 09 2015 18:26:18 GMT: INFO (info): (hist.c::154)  (03: 0000000443) (04: 0000000887) (05: 0000000681) (06: 0000000177)
Jun 09 2015 18:26:18 GMT: INFO (info): (hist.c::163)  (07: 0000000037)
Jun 09 2015 18:26:18 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (2225 total) count
Jun 09 2015 18:26:18 GMT: INFO (info): (hist.c::163)  (00: 0000001189) (04: 0000001036)
Jun 09 2015 18:26:28 GMT: INFO (info): (thr_info.c::4796)  system memory: free 189368kb ( 38 percent free ) 
Jun 09 2015 18:26:28 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 1514 ::: sub_objects 0
Jun 09 2015 18:26:28 GMT: INFO (info): (thr_info.c::4812)  rec refs 1514 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Jun 09 2015 18:26:28 GMT: INFO (info): (thr_info.c::4818)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Jun 09 2015 18:26:28 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (15, 248, 233) : hb (0, 0, 0) : fab (16, 16, 0)
Jun 09 2015 18:26:28 GMT: INFO (info): (thr_info.c::4830)    heartbeat_received: self 0 : foreign 0
Jun 09 2015 18:26:28 GMT: INFO (info): (thr_info.c::4831)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0 
Jun 09 2015 18:26:28 GMT: INFO (info): (thr_info.c::4844)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Jun 09 2015 18:26:28 GMT: INFO (info): (thr_info.c::4860) namespace test: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Jun 09 2015 18:26:28 GMT: INFO (info): (thr_info.c::4860) namespace default: disk inuse: 583680 memory inuse: 322887 (bytes) sindex memory inuse: 283314 (bytes) avail pct 99
Jun 09 2015 18:26:28 GMT: INFO (info): (thr_info.c::4860) namespace bar: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Jun 09 2015 18:26:28 GMT: INFO (info): (thr_info.c::4905)    partitions: actual 12288 sync 0 desync 0 zombie 0 wait 0 absent 0
Jun 09 2015 18:26:28 GMT: INFO (info): (hist.c::137) histogram dump: reads (2334 total) msec
Jun 09 2015 18:26:28 GMT: INFO (info): (hist.c::154)  (00: 0000002078) (01: 0000000057) (02: 0000000061) (03: 0000000053)
Jun 09 2015 18:26:28 GMT: INFO (info): (hist.c::163)  (04: 0000000051) (05: 0000000031) (06: 0000000003)
Jun 09 2015 18:26:28 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (1189 total) msec
Jun 09 2015 18:26:28 GMT: INFO (info): (hist.c::154)  (00: 0000001030) (01: 0000000041) (02: 0000000046) (03: 0000000038)
Jun 09 2015 18:26:28 GMT: INFO (info): (hist.c::163)  (04: 0000000022) (05: 0000000012)
Jun 09 2015 18:26:28 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Jun 09 2015 18:26:28 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (1189 total) msec
Jun 09 2015 18:26:28 GMT: INFO (info): (hist.c::154)  (00: 0000001030) (01: 0000000041) (02: 0000000046) (03: 0000000038)
Jun 09 2015 18:26:28 GMT: INFO (info): (hist.c::163)  (04: 0000000022) (05: 0000000012)
Jun 09 2015 18:26:28 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Jun 09 2015 18:26:28 GMT: INFO (info): (hist.c::137) histogram dump: query (2225 total) msec
Jun 09 2015 18:26:28 GMT: INFO (info): (hist.c::154)  (03: 0000000443) (04: 0000000887) (05: 0000000681) (06: 0000000177)
Jun 09 2015 18:26:28 GMT: INFO (info): (hist.c::163)  (07: 0000000037)
Jun 09 2015 18:26:28 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (2225 total) count
Jun 09 2015 18:26:28 GMT: INFO (info): (hist.c::163)  (00: 0000001189) (04: 0000001036)
Jun 09 2015 18:26:30 GMT: INFO (drv_ssd): (drv_ssd.c::2536) device /opt/aerospike/data/default.dat: used 583680, contig-free 2097150M (2097150 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 4 (0.0/s)
Jun 09 2015 18:26:38 GMT: INFO (info): (thr_info.c::4796)  system memory: free 189376kb ( 38 percent free ) 
Jun 09 2015 18:26:38 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 1514 ::: sub_objects 0
Jun 09 2015 18:26:38 GMT: INFO (info): (thr_info.c::4812)  rec refs 1514 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Jun 09 2015 18:26:38 GMT: INFO (info): (thr_info.c::4818)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Jun 09 2015 18:26:38 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (15, 248, 233) : hb (0, 0, 0) : fab (16, 16, 0)
Jun 09 2015 18:26:38 GMT: INFO (info): (thr_info.c::4830)    heartbeat_received: self 0 : foreign 0
Jun 09 2015 18:26:38 GMT: INFO (info): (thr_info.c::4831)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0 
Jun 09 2015 18:26:38 GMT: INFO (info): (thr_info.c::4844)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Jun 09 2015 18:26:38 GMT: INFO (info): (thr_info.c::4860) namespace test: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Jun 09 2015 18:26:38 GMT: INFO (info): (thr_info.c::4860) namespace default: disk inuse: 583680 memory inuse: 322887 (bytes) sindex memory inuse: 283314 (bytes) avail pct 99
Jun 09 2015 18:26:38 GMT: INFO (info): (thr_info.c::4860) namespace bar: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Jun 09 2015 18:26:38 GMT: INFO (info): (thr_info.c::4905)    partitions: actual 12288 sync 0 desync 0 zombie 0 wait 0 absent 0
Jun 09 2015 18:26:38 GMT: INFO (info): (hist.c::137) histogram dump: reads (2334 total) msec
Jun 09 2015 18:26:38 GMT: INFO (info): (hist.c::154)  (00: 0000002078) (01: 0000000057) (02: 0000000061) (03: 0000000053)
Jun 09 2015 18:26:38 GMT: INFO (info): (hist.c::163)  (04: 0000000051) (05: 0000000031) (06: 0000000003)
Jun 09 2015 18:26:38 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (1189 total) msec
Jun 09 2015 18:26:38 GMT: INFO (info): (hist.c::154)  (00: 0000001030) (01: 0000000041) (02: 0000000046) (03: 0000000038)
Jun 09 2015 18:26:38 GMT: INFO (info): (hist.c::163)  (04: 0000000022) (05: 0000000012)
Jun 09 2015 18:26:38 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Jun 09 2015 18:26:38 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (1189 total) msec
Jun 09 2015 18:26:38 GMT: INFO (info): (hist.c::154)  (00: 0000001030) (01: 0000000041) (02: 0000000046) (03: 0000000038)
Jun 09 2015 18:26:38 GMT: INFO (info): (hist.c::163)  (04: 0000000022) (05: 0000000012)
Jun 09 2015 18:26:38 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Jun 09 2015 18:26:38 GMT: INFO (info): (hist.c::137) histogram dump: query (2225 total) msec
Jun 09 2015 18:26:38 GMT: INFO (info): (hist.c::154)  (03: 0000000443) (04: 0000000887) (05: 0000000681) (06: 0000000177)
Jun 09 2015 18:26:38 GMT: INFO (info): (hist.c::163)  (07: 0000000037)
Jun 09 2015 18:26:38 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (2225 total) count
Jun 09 2015 18:26:38 GMT: INFO (info): (hist.c::163)  (00: 0000001189) (04: 0000001036)
Jun 09 2015 18:26:48 GMT: INFO (info): (thr_info.c::4796)  system memory: free 186372kb ( 37 percent free ) 
Jun 09 2015 18:26:48 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 1514 ::: sub_objects 0
Jun 09 2015 18:26:48 GMT: INFO (info): (thr_info.c::4812)  rec refs 1514 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Jun 09 2015 18:26:48 GMT: INFO (info): (thr_info.c::4818)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Jun 09 2015 18:26:48 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (15, 248, 233) : hb (0, 0, 0) : fab (16, 16, 0)
Jun 09 2015 18:26:48 GMT: INFO (info): (thr_info.c::4830)    heartbeat_received: self 0 : foreign 0
Jun 09 2015 18:26:48 GMT: INFO (info): (thr_info.c::4831)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0 
Jun 09 2015 18:26:48 GMT: INFO (info): (thr_info.c::4844)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Jun 09 2015 18:26:48 GMT: INFO (info): (thr_info.c::4860) namespace test: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Jun 09 2015 18:26:48 GMT: INFO (info): (thr_info.c::4860) namespace default: disk inuse: 583680 memory inuse: 322887 (bytes) sindex memory inuse: 283314 (bytes) avail pct 99
Jun 09 2015 18:26:48 GMT: INFO (info): (thr_info.c::4860) namespace bar: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Jun 09 2015 18:26:48 GMT: INFO (info): (thr_info.c::4905)    partitions: actual 12288 sync 0 desync 0 zombie 0 wait 0 absent 0
Jun 09 2015 18:26:48 GMT: INFO (info): (hist.c::137) histogram dump: reads (2334 total) msec
Jun 09 2015 18:26:48 GMT: INFO (info): (hist.c::154)  (00: 0000002078) (01: 0000000057) (02: 0000000061) (03: 0000000053)
Jun 09 2015 18:26:48 GMT: INFO (info): (hist.c::163)  (04: 0000000051) (05: 0000000031) (06: 0000000003)
Jun 09 2015 18:26:48 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (1189 total) msec
Jun 09 2015 18:26:48 GMT: INFO (info): (hist.c::154)  (00: 0000001030) (01: 0000000041) (02: 0000000046) (03: 0000000038)
Jun 09 2015 18:26:48 GMT: INFO (info): (hist.c::163)  (04: 0000000022) (05: 0000000012)
Jun 09 2015 18:26:48 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Jun 09 2015 18:26:48 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (1189 total) msec
Jun 09 2015 18:26:48 GMT: INFO (info): (hist.c::154)  (00: 0000001030) (01: 0000000041) (02: 0000000046) (03: 0000000038)
Jun 09 2015 18:26:48 GMT: INFO (info): (hist.c::163)  (04: 0000000022) (05: 0000000012)
Jun 09 2015 18:26:48 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Jun 09 2015 18:26:48 GMT: INFO (info): (hist.c::137) histogram dump: query (2225 total) msec
Jun 09 2015 18:26:48 GMT: INFO (info): (hist.c::154)  (03: 0000000443) (04: 0000000887) (05: 0000000681) (06: 0000000177)
Jun 09 2015 18:26:48 GMT: INFO (info): (hist.c::163)  (07: 0000000037)
Jun 09 2015 18:26:48 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (2225 total) count
Jun 09 2015 18:26:48 GMT: INFO (info): (hist.c::163)  (00: 0000001189) (04: 0000001036)
Jun 09 2015 18:26:50 GMT: INFO (drv_ssd): (drv_ssd.c::2536) device /opt/aerospike/data/default.dat: used 583680, contig-free 2097150M (2097150 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 4 (0.0/s)
Jun 09 2015 18:26:56 GMT: INFO (nsup): (thr_nsup.c::1319) {test} nsup start
Jun 09 2015 18:26:56 GMT: INFO (nsup): (thr_nsup.c::1237) {test} Records: 0, 0 0-vt, 0(0) expired, 0(0) evicted, 0(0) set deletes, 0(0) set evicted. Evict ttls: 0,0,0.000. Waits: 0,0,0. Total time: 5 ms
Jun 09 2015 18:26:56 GMT: INFO (nsup): (thr_nsup.c::1319) {default} nsup start
Jun 09 2015 18:26:56 GMT: INFO (nsup): (thr_nsup.c::1237) {default} Records: 1514, 1514 0-vt, 0(0) expired, 0(0) evicted, 0(0) set deletes, 0(0) set evicted. Evict ttls: 0,0,0.000. Waits: 0,0,0. Total time: 7 ms
Jun 09 2015 18:26:56 GMT: INFO (nsup): (thr_nsup.c::1319) {bar} nsup start
Jun 09 2015 18:26:56 GMT: INFO (nsup): (thr_nsup.c::1237) {bar} Records: 0, 0 0-vt, 0(0) expired, 0(0) evicted, 0(0) set deletes, 0(0) set evicted. Evict ttls: 0,0,0.000. Waits: 0,0,0. Total time: 5 ms
Jun 09 2015 18:26:58 GMT: INFO (info): (thr_info.c::4796)  system memory: free 186296kb ( 37 percent free ) 
Jun 09 2015 18:26:58 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 1514 ::: sub_objects 0
Jun 09 2015 18:26:58 GMT: INFO (info): (thr_info.c::4812)  rec refs 1514 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Jun 09 2015 18:26:58 GMT: INFO (info): (thr_info.c::4818)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Jun 09 2015 18:26:58 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (15, 248, 233) : hb (0, 0, 0) : fab (16, 16, 0)
Jun 09 2015 18:26:58 GMT: INFO (info): (thr_info.c::4830)    heartbeat_received: self 0 : foreign 0
Jun 09 2015 18:26:58 GMT: INFO (info): (thr_info.c::4831)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0 
Jun 09 2015 18:26:58 GMT: INFO (info): (thr_info.c::4844)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Jun 09 2015 18:26:58 GMT: INFO (info): (thr_info.c::4860) namespace test: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Jun 09 2015 18:26:58 GMT: INFO (info): (thr_info.c::4860) namespace default: disk inuse: 583680 memory inuse: 322887 (bytes) sindex memory inuse: 283314 (bytes) avail pct 99
Jun 09 2015 18:26:58 GMT: INFO (info): (thr_info.c::4860) namespace bar: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Jun 09 2015 18:26:58 GMT: INFO (info): (thr_info.c::4905)    partitions: actual 12288 sync 0 desync 0 zombie 0 wait 0 absent 0
Jun 09 2015 18:26:58 GMT: INFO (info): (hist.c::137) histogram dump: reads (2334 total) msec
Jun 09 2015 18:26:58 GMT: INFO (info): (hist.c::154)  (00: 0000002078) (01: 0000000057) (02: 0000000061) (03: 0000000053)
Jun 09 2015 18:26:58 GMT: INFO (info): (hist.c::163)  (04: 0000000051) (05: 0000000031) (06: 0000000003)
Jun 09 2015 18:26:58 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (1189 total) msec
Jun 09 2015 18:26:58 GMT: INFO (info): (hist.c::154)  (00: 0000001030) (01: 0000000041) (02: 0000000046) (03: 0000000038)
Jun 09 2015 18:26:58 GMT: INFO (info): (hist.c::163)  (04: 0000000022) (05: 0000000012)
Jun 09 2015 18:26:58 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Jun 09 2015 18:26:58 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (1189 total) msec
Jun 09 2015 18:26:58 GMT: INFO (info): (hist.c::154)  (00: 0000001030) (01: 0000000041) (02: 0000000046) (03: 0000000038)
Jun 09 2015 18:26:58 GMT: INFO (info): (hist.c::163)  (04: 0000000022) (05: 0000000012)
Jun 09 2015 18:26:58 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Jun 09 2015 18:26:58 GMT: INFO (info): (hist.c::137) histogram dump: query (2225 total) msec
Jun 09 2015 18:26:58 GMT: INFO (info): (hist.c::154)  (03: 0000000443) (04: 0000000887) (05: 0000000681) (06: 0000000177)
Jun 09 2015 18:26:58 GMT: INFO (info): (hist.c::163)  (07: 0000000037)
Jun 09 2015 18:26:58 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (2225 total) count
Jun 09 2015 18:26:58 GMT: INFO (info): (hist.c::163)  (00: 0000001189) (04: 0000001036)
Jun 09 2015 18:27:08 GMT: INFO (info): (thr_info.c::4796)  system memory: free 122700kb ( 24 percent free ) 
Jun 09 2015 18:27:08 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 1687 ::: sub_objects 0
Jun 09 2015 18:27:08 GMT: INFO (info): (thr_info.c::4812)  rec refs 1687 ::: rec locks 0 ::: trees 0 ::: wr reqs 1 ::: mig tx 0 ::: mig rx 0
Jun 09 2015 18:27:08 GMT: INFO (info): (thr_info.c::4818)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Jun 09 2015 18:27:08 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 1 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (45, 278, 233) : hb (0, 0, 0) : fab (16, 16, 0)
Jun 09 2015 18:27:08 GMT: INFO (info): (thr_info.c::4830)    heartbeat_received: self 0 : foreign 0
Jun 09 2015 18:27:08 GMT: INFO (info): (thr_info.c::4831)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0 
Jun 09 2015 18:27:08 GMT: INFO (info): (thr_info.c::4844)    tree_counts: nsup 0 scan 0 batch 0 dup 11553 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 1
Jun 09 2015 18:27:08 GMT: INFO (info): (thr_info.c::4860) namespace test: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Jun 09 2015 18:27:08 GMT: INFO (info): (thr_info.c::4860) namespace default: disk inuse: 650112 memory inuse: 359647 (bytes) sindex memory inuse: 308888 (bytes) avail pct 99
Jun 09 2015 18:27:08 GMT: INFO (info): (thr_info.c::4860) namespace bar: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Jun 09 2015 18:27:08 GMT: INFO (info): (thr_info.c::4905)    partitions: actual 12288 sync 0 desync 0 zombie 0 wait 0 absent 0
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::137) histogram dump: reads (2678 total) msec
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::154)  (00: 0000002256) (01: 0000000081) (02: 0000000081) (03: 0000000066)
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::154)  (04: 0000000070) (05: 0000000069) (06: 0000000035) (07: 0000000019)
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::163)  (08: 0000000001)
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (1361 total) msec
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::154)  (00: 0000001133) (01: 0000000046) (02: 0000000051) (03: 0000000051)
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::154)  (04: 0000000030) (05: 0000000026) (06: 0000000016) (07: 0000000006)
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::163)  (08: 0000000002)
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (1361 total) msec
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::154)  (00: 0000001133) (01: 0000000046) (02: 0000000051) (03: 0000000051)
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::154)  (04: 0000000030) (05: 0000000026) (06: 0000000016) (07: 0000000006)
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::163)  (08: 0000000002)
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::137) histogram dump: query (2556 total) msec
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::154)  (03: 0000000452) (04: 0000000905) (05: 0000000710) (06: 0000000213)
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::163)  (07: 0000000147) (08: 0000000126) (10: 0000000003)
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (2556 total) count
Jun 09 2015 18:27:08 GMT: INFO (info): (hist.c::163)  (00: 0000001372) (04: 0000001184)
Jun 09 2015 18:27:10 GMT: INFO (drv_ssd): (drv_ssd.c::2536) device /opt/aerospike/data/default.dat: used 695040, contig-free 2097150M (2097150 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 4 (0.0/s)

There isn’t any indication in the log snippet of a crash. Also in the ~2 min of logs there are ~1500 read and about ~200 writes transactions, which should be far from any kind of load issue.

Ok but I’m trying to do 750 concurrent writes and it fails after around 200 writes. Maybe it’s a client library issues so shall I provide a Go snippet to reproduce with ab tool or?

Just to make sure it is not one of the usual issues:

  1. On creating your client, set your ClientPolicy.LimitConnectionsToQueueSize = true and try again.
  2. Make sure your client library is up to date. ($ go get -u .)

If this doesn’t work, then we’ll have to debug using your snippet.

How does it fail? Does the server crash?

@khosrow, @kporter,

Getting closer here:

It appears that I can handle the load just fine if all I’m doing is performing an Exists() call then Put() for creating user records.

Yet, the server mysteriously stops (nothing I can see in aerospike.log) whenever I perform a Query() (and iterate over the limited results) immediately before the Put(). Also, the same problem occurs when I do my Put() with one Aerospike client instance (connection) then with another client instance (via a separate thread) I run a Query() (and iterate over another limited result list).

Note that I’m using a clustered setup + nginx for the Go webapp. So there are quite a good amount of Client instances being created.

Also note that I do indeed have LimitConnectionsToQueueSize = true (as recommended)

I can provide test code if needed but first maybe do either of you have any thoughts?

Thanks for your help.

Nicholas,

Aerospike supports your use-case through Policies. If you set the RecordExistsAction = CREATE_ONLY in WritePolicy for Put(), you’ll be sure that the record will either be created, or the operation will fail.

Checking Exists() before Put() is not atomic, and can go wrong. You can also do multiple operations in a single transaction on a Record. Refer to Operate() documentation for more info.

That being said, what you have encountered is very interesting, and merits investigation. Would be great if you could provide a snippet that reproduces the issue.

Still a problem but getting closer…

  1. I added the CREATE_ONLY policy for my Put() and the issue disappears. But could you go into some detail why it’s not “safe” to do an Exists() then a Put() with the default write policy?

  2. This issue still occurs when I run a more realistic simulation: a) creating a user then b) fetching a “news feed” (Query() with a filter on a secondary index). I’m making 1000+ concurrent calls to my endpoints with a ramp up time of 10 seconds.

To reproduce, there are two files, the Go snippet and the Jmeter load test file to simulate. ** Please follow the steps in the comments at the top of the Go file ** I’m running this on Ubuntu 14.04 with Aerospike Community 3.5 on 500MB RAM (yes, below requirements but during testing the RAM does not max out) and 1GHz CPU on a Rackspace VPS instance.

You’ll need to follow the setup directions at the top of the Go snippet

FILE 1 OF 2:

package main

import (
    "encoding/json"
    "fmt"
    "net/http"
    "runtime"
    "strconv"
    "time"

    "github.com/aerospike/aerospike-client-go"
    "github.com/codegangsta/negroni"
    "github.com/gorilla/mux"
    "github.com/jmcvetta/randutil"
)

const (
    DB_URI    = "162.209.101.94"
    NAMESPACE = "default"
)

/*
SETUP:
1. use this namespace:
namespace default {
        replication-factor 2
        memory-size 4G
        default-ttl 0

        storage-engine device {
                file /opt/aerospike/data/default.dat
                filesize 2T
                data-in-memory true
        }
}
2. import some sample data using the aql tool


CREATE INDEX usersuid ON default.users (uid) STRING
CREATE INDEX feeditemsuid ON default.feeditems (uid) STRING
CREATE INDEX feeditemsvisible ON default.feeditems (visible) NUMERIC

INSERT INTO default.feeditems (PK, uid, title, visible) VALUES ('vpieaaJreoPX1433781281', 'vpieaaJreoPX1433781281', 'Feed Item 1', 1)
INSERT INTO default.feeditems (PK, uid, title, visible) VALUES ('vpieaaJreoPX1433781282', 'vpieaaJreoPX1433781282', 'Feed Item 2', 1)
INSERT INTO default.feeditems (PK, uid, title, visible) VALUES ('vpieaaJreoPX1433781283', 'vpieaaJreoPX1433781283', 'Feed Item 3', 1)
INSERT INTO default.feeditems (PK, uid, title, visible) VALUES ('vpieaaJreoPX1433781284', 'vpieaaJreoPX1433781284', 'Feed Item 4', 0)
3. build
go build && ./aerospiketest
4. ensure things work:
curl -XPOST -d "name=adam" YOURSERVERHOSTNAME:8010/signup
curl YOURSERVERHOSTNAME:8010/feed
5. to load test:
a. open the the provided jmeter .jmx file
b. edit the 'Server Name' field in Test Plan > Thread Group > HTTP Request Defaults
*/

var client *aerospike.Client

type User struct {
    Uid  string `json:"uid",string`
    Name string `json:"name",string`
}

type FeedItem struct {
    Uid     string `json:"uid",string`
    Title   string `json:"title",string`
    Visible bool   `json:"visible",bool`
}

func main() {
    runtime.GOMAXPROCS(runtime.NumCPU())

    router := mux.NewRouter()
    router.HandleFunc("/signup", signUpHandler).Methods("POST")
    router.HandleFunc("/feed", feedHandler).Methods("GET")

    cp := aerospike.NewClientPolicy()
    cp.LimitConnectionsToQueueSize = true
    c, err := aerospike.NewClientWithPolicy(cp, DB_URI, 3000)
    if err != nil {
        panic(err)
    }
    client = c

    n := negroni.Classic()
    n.UseHandler(router)
    n.Run(fmt.Sprintf(":%v", 8010))
}

// handlers

// signUpHandler simply checks to ensure user doesn't already exist then creates a user record
func signUpHandler(w http.ResponseWriter, r *http.Request) {

    newUser, _ := newUserFromReq(r, true)

    key, err := aerospike.NewKey(NAMESPACE, "users", newUser.Uid)
    panicIfErr(err)

    // complain if user exists
    // (we're impl a consumer-friendly API, so we'd normally notify if user already exists)
    exists, err := client.Exists(nil, key)
    panicIfErr(err)
    if exists {
        panicIfErr(err)
    }

    // put record
    bins := aerospike.BinMap{
        "uid":  newUser.Uid,
        "name": newUser.Name,
    }
    err = client.Put(nil, key, bins)
    panicIfErr(err)
    fmt.Printf("created user: %+v", newUser)
    renderJSON(w, newUser, http.StatusCreated)
}

// feedHandler fetchs a list of non-user specific items
func feedHandler(w http.ResponseWriter, r *http.Request) {

    stm := aerospike.NewStatement(NAMESPACE, "feeditems")
    stm.Addfilter(aerospike.NewEqualFilter("visible", 1))

    recordset, err := client.Query(nil, stm)
    panicIfErr(err)

    items := []FeedItem{}

    for result := range recordset.Results() {
        if result.Err != nil {
            panic(result.Err)
        }

        item := FeedItem{}
        item.Uid = binStringValue(result.Record, "uid")
        item.Title = binStringValue(result.Record, "title")
        item.Visible = binBoolValue(result.Record, "visible")

        items = append(items, item)
    }

    renderJSON(w, items, http.StatusOK)
}

// user helpers

func newUserFromReq(r *http.Request, isNew bool) (*User, error) {
    u := User{}
    u.Uid = generateUid()
    u.Name = r.FormValue("name")
    return &u, nil
}

// util

func generateUid() string {
    rand, _ := randutil.String(12, randutil.Alphabet)
    ts := time.Now().Unix()
    return fmt.Sprintf("%v%v", rand, ts)
}

func renderJSON(w http.ResponseWriter, value interface{}, status int) {
    body, _ := json.Marshal(value)
    w.Header().Set("Content-Type", "application/json; charset=UTF8")
    w.Header().Set("Content-Length", strconv.Itoa(len(body)))
    w.Header().Set("Access-Control-Allow-Origin", "*")
    w.Header().Set("Access-Control-Allow-Headers", "Content-Type, Content-Length, Accept-Encoding, X-CSRF-Token, Authorization")
    w.Header().Set("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS")
    w.WriteHeader(status)
    _, _ = w.Write(body)
    return
}

func binStringValue(r *aerospike.Record, key string) string {
    if val, ok := r.Bins[key]; ok {
        if _, ok = val.(string); ok {
            return val.(string)
        }
    }
    return ""
}

func binBoolValue(r *aerospike.Record, key string) bool {
    if val, ok := r.Bins[key]; ok {
        if _, ok = val.(int); ok {
            // bool not supported in aerospike, so interpret int
            if val == 1 {
                return true
            } else {
                return false
            }
        }
    }
    return false
}

func panicIfErr(err error) {
    if err != nil {
        panic(err)
    }
}

FILE 2 OF 2:

<?xml version="1.0" encoding="UTF-8"?>
<jmeterTestPlan version="1.2" properties="2.8" jmeter="2.13 r1665067">
  <hashTree>
    <TestPlan guiclass="TestPlanGui" testclass="TestPlan" testname="Test Plan" enabled="true">
      <stringProp name="TestPlan.comments"></stringProp>
      <boolProp name="TestPlan.functional_mode">false</boolProp>
      <boolProp name="TestPlan.serialize_threadgroups">false</boolProp>
      <elementProp name="TestPlan.user_defined_variables" elementType="Arguments" guiclass="ArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
        <collectionProp name="Arguments.arguments"/>
      </elementProp>
      <stringProp name="TestPlan.user_define_classpath"></stringProp>
    </TestPlan>
    <hashTree>
      <ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup" testname="Thread Group" enabled="true">
        <stringProp name="ThreadGroup.on_sample_error">stoptestnow</stringProp>
        <elementProp name="ThreadGroup.main_controller" elementType="LoopController" guiclass="LoopControlPanel" testclass="LoopController" testname="Loop Controller" enabled="true">
          <boolProp name="LoopController.continue_forever">false</boolProp>
          <stringProp name="LoopController.loops">1</stringProp>
        </elementProp>
        <stringProp name="ThreadGroup.num_threads">2000</stringProp>
        <stringProp name="ThreadGroup.ramp_time">10</stringProp>
        <longProp name="ThreadGroup.start_time">1433789949000</longProp>
        <longProp name="ThreadGroup.end_time">1433789949000</longProp>
        <boolProp name="ThreadGroup.scheduler">false</boolProp>
        <stringProp name="ThreadGroup.duration"></stringProp>
        <stringProp name="ThreadGroup.delay"></stringProp>
      </ThreadGroup>
      <hashTree>
        <ConfigTestElement guiclass="HttpDefaultsGui" testclass="ConfigTestElement" testname="HTTP Request Defaults" enabled="true">
          <elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
            <collectionProp name="Arguments.arguments"/>
          </elementProp>
          <stringProp name="HTTPSampler.domain">YOURHOSTNAME</stringProp>
          <stringProp name="HTTPSampler.port">8010</stringProp>
          <stringProp name="HTTPSampler.connect_timeout"></stringProp>
          <stringProp name="HTTPSampler.response_timeout"></stringProp>
          <stringProp name="HTTPSampler.protocol"></stringProp>
          <stringProp name="HTTPSampler.contentEncoding"></stringProp>
          <stringProp name="HTTPSampler.path"></stringProp>
          <stringProp name="HTTPSampler.concurrentPool">4</stringProp>
        </ConfigTestElement>
        <hashTree/>
        <HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="Sign Up" enabled="true">
          <elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
            <collectionProp name="Arguments.arguments">
              <elementProp name="name" elementType="HTTPArgument">
                <boolProp name="HTTPArgument.always_encode">false</boolProp>
                <stringProp name="Argument.value">${__Random(1,10000000)}</stringProp>
                <stringProp name="Argument.metadata">=</stringProp>
                <boolProp name="HTTPArgument.use_equals">true</boolProp>
                <stringProp name="Argument.name">name</stringProp>
              </elementProp>
            </collectionProp>
          </elementProp>
          <stringProp name="HTTPSampler.domain"></stringProp>
          <stringProp name="HTTPSampler.port"></stringProp>
          <stringProp name="HTTPSampler.connect_timeout"></stringProp>
          <stringProp name="HTTPSampler.response_timeout"></stringProp>
          <stringProp name="HTTPSampler.protocol"></stringProp>
          <stringProp name="HTTPSampler.contentEncoding"></stringProp>
          <stringProp name="HTTPSampler.path">/signup</stringProp>
          <stringProp name="HTTPSampler.method">POST</stringProp>
          <boolProp name="HTTPSampler.follow_redirects">true</boolProp>
          <boolProp name="HTTPSampler.auto_redirects">false</boolProp>
          <boolProp name="HTTPSampler.use_keepalive">true</boolProp>
          <boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
          <boolProp name="HTTPSampler.monitor">false</boolProp>
          <stringProp name="HTTPSampler.embedded_url_re"></stringProp>
        </HTTPSamplerProxy>
        <hashTree/>
        <HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="Get Feed" enabled="true">
          <elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
            <collectionProp name="Arguments.arguments"/>
          </elementProp>
          <stringProp name="HTTPSampler.domain"></stringProp>
          <stringProp name="HTTPSampler.port"></stringProp>
          <stringProp name="HTTPSampler.connect_timeout"></stringProp>
          <stringProp name="HTTPSampler.response_timeout"></stringProp>
          <stringProp name="HTTPSampler.protocol"></stringProp>
          <stringProp name="HTTPSampler.contentEncoding"></stringProp>
          <stringProp name="HTTPSampler.path">/feed</stringProp>
          <stringProp name="HTTPSampler.method">GET</stringProp>
          <boolProp name="HTTPSampler.follow_redirects">true</boolProp>
          <boolProp name="HTTPSampler.auto_redirects">false</boolProp>
          <boolProp name="HTTPSampler.use_keepalive">true</boolProp>
          <boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
          <boolProp name="HTTPSampler.monitor">false</boolProp>
          <stringProp name="HTTPSampler.embedded_url_re"></stringProp>
        </HTTPSamplerProxy>
        <hashTree/>
        <ResultCollector guiclass="GraphVisualizer" testclass="ResultCollector" testname="Graph Results" enabled="true">
          <boolProp name="ResultCollector.error_logging">false</boolProp>
          <objProp>
            <name>saveConfig</name>
            <value class="SampleSaveConfiguration">
              <time>true</time>
              <latency>true</latency>
              <timestamp>true</timestamp>
              <success>true</success>
              <label>true</label>
              <code>true</code>
              <message>true</message>
              <threadName>true</threadName>
              <dataType>true</dataType>
              <encoding>false</encoding>
              <assertions>true</assertions>
              <subresults>true</subresults>
              <responseData>false</responseData>
              <samplerData>false</samplerData>
              <xml>false</xml>
              <fieldNames>false</fieldNames>
              <responseHeaders>false</responseHeaders>
              <requestHeaders>false</requestHeaders>
              <responseDataOnError>false</responseDataOnError>
              <saveAssertionResultsFailureMessage>false</saveAssertionResultsFailureMessage>
              <assertionsResultsToSave>0</assertionsResultsToSave>
              <bytes>true</bytes>
              <threadCounts>true</threadCounts>
            </value>
          </objProp>
          <objProp>
            <name></name>
            <value class="SampleSaveConfiguration">
              <time>true</time>
              <latency>true</latency>
              <timestamp>true</timestamp>
              <success>true</success>
              <label>true</label>
              <code>true</code>
              <message>true</message>
              <threadName>true</threadName>
              <dataType>true</dataType>
              <encoding>false</encoding>
              <assertions>true</assertions>
              <subresults>true</subresults>
              <responseData>false</responseData>
              <samplerData>false</samplerData>
              <xml>false</xml>
              <fieldNames>false</fieldNames>
              <responseHeaders>false</responseHeaders>
              <requestHeaders>false</requestHeaders>
              <responseDataOnError>false</responseDataOnError>
              <saveAssertionResultsFailureMessage>false</saveAssertionResultsFailureMessage>
              <assertionsResultsToSave>0</assertionsResultsToSave>
              <bytes>true</bytes>
              <threadCounts>true</threadCounts>
            </value>
          </objProp>
          <stringProp name="filename">/Users/seenickcode/Desktop/aerospiketest.csv</stringProp>
          <boolProp name="ResultCollector.success_only_logging">true</boolProp>
        </ResultCollector>
        <hashTree/>
      </hashTree>
    </hashTree>
  </hashTree>
</jmeterTestPlan>

Thanks for your help.

When using the create_only policy, the check and update if not exist is an atomic operation. When using exists followed by put is not atomic and in the time between exists returning and the put arriving, another thread may have put also put to the same key.

Back to the issue, could you run the following to see if the kernel killed the Aerospike process:

grep -i "killed process" /var/log/{syslog,messages,kern.log}

@kporter, noted, but at worst the client would simply return an error right?

I ran the test again and found this:

/var/log/kern.log:Jun 15 12:58:12 myhostname kernel: [106345.992567] Killed process 8251 (asd) total-vm:2857412kB, anon-rss:372848kB, file-rss:0kB

It would error if you supplied the appropriate policy, but if you do not supply a policy and you are attempting to create a new record. If another thread beats you to creating the record then the current thread will overwrite that data.

So appears the kernel is killing Aerospike. This normally occurs when the system has run critically low on RAM.

If you run:

grep -i "Out of memory" /var/log/{syslog,messages,kern.log}

or

grep -i "oom" /var/log/{syslog,messages,kern.log}

If those return nothing, then this should give us a clue of why the kernel killed Aerospike:

grep -C10 -i "killed process" /var/log/{syslog,messages,kern.log}

Ok so I am getting “Out of memory”, I’ve found it. Also found that:

  1. Disabling Query() and only doing Put() prevents the issue and I can handle this load totally fine. So is Query() that memory intensive?

  2. Shouldn’t there be some mechanism protect against this situation? In the past, when I had done load testing with a competitor NoSQL database, in the same situation, the client would simply get timeouts from the database connection but the database wouldn’t completely allow itself to be shut down by out of memory issues.

Again, this occurs when there’s a high volume of Query(), not Put() calls. I’m concerned this will happen again if I say increase the load to 12,000 concurrent requests with one node and this time 4GB of RAM. Or 24,000 concurrent requests with 8GB of RAM for that matter.

Thanks for bearing with my questions but I really would like this evaluation of Aerospike to come out positive for us.

We do offer some protection against running out of memory through the memory-size namespace configuration parameter. Only your namespace configuration has set it to 4 times the actual amount of RAM on this instance.

(facepalm)

Yep that did it. Can’t believe we overlooked this setting in the first place!

Thanks,

1 Like