Aerospike Crashes under heavy load


#1

Hi,

I was testing Aerospike using heavy load. I sent parallel 100 request in separate thread. during the process Aerospike crashed with status Active(Exited) I am accessing aerosppike using PHP client. so wondering what load Aerospike can handle. My concern is during heavy load it should not stop the service.

thanks. Samuel


#2

Do you mind providing information about what your cluster looks like, the number of nodes, the OS, version of the Aerospike server, available memory, configuration, etc?

Let’s figure out it’s not related to environment and setup first, then create a bug if that’s not the case.


#3

Hello rbotzer,

that was a really quick response. appreciate that. I am using the basic instance of AWS with 1gb ram. My webserver and Aerospike db is on same machine. i am logging session into aerospike. Through jmeter I created the load. and Aerospike crashed and all my session data lost. I understand the instance is small but I feel rather than slow performance aerospike crashed.

Log

Aug 31 2015 17:53:30 GMT: INFO (info): (thr_info.c::4830)    heartbeat_received: self 256 : foreign 0
Aug 31 2015 17:53:30 GMT: INFO (info): (thr_info.c::4831)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0 
Aug 31 2015 17:53:30 GMT: INFO (info): (thr_info.c::4844)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Aug 31 2015 17:53:30 GMT: INFO (info): (thr_info.c::4860) namespace sessstore: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 99
Aug 31 2015 17:53:30 GMT: INFO (info): (thr_info.c::4860) namespace cart: disk inuse: 1852032 memory inuse: 1671794 (bytes) sindex memory inuse: 0 (bytes) avail pct 99
Aug 31 2015 17:53:30 GMT: INFO (info): (thr_info.c::4905)    partitions: actual 8192 sync 0 desync 0 zombie 0 wait 0 absent 0
Aug 31 2015 17:53:30 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec
Aug 31 2015 17:53:30 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (0 total) msec
Aug 31 2015 17:53:30 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Aug 31 2015 17:53:30 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (0 total) msec
Aug 31 2015 17:53:30 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Aug 31 2015 17:53:30 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Aug 31 2015 17:53:30 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Aug 31 2015 17:53:32 GMT: INFO (drv_ssd): (drv_ssd.c::2536) device /opt/aerospike/data/sess.dat: used 0, contig-free 16383M (16383 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s)
Aug 31 2015 17:53:32 GMT: INFO (drv_ssd): (drv_ssd.c::2536) device /opt/aerospike/data/cart.dat: used 1852032, contig-free 16380M (16380 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 3 (0.0/s)
Aug 31 2015 17:53:40 GMT: INFO (info): (thr_info.c::4796)  system memory: free 804628kb ( 79 percent free ) 
Aug 31 2015 17:53:40 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 1987 ::: sub_objects 0
Aug 31 2015 17:53:40 GMT: INFO (info): (thr_info.c::4812)  rec refs 1987 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Aug 31 2015 17:53:40 GMT: INFO (info): (thr_info.c::4818)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Aug 31 2015 17:53:40 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (0, 0, 0) : hb (0, 0, 0) : fab (16, 16, 0)
Aug 31 2015 17:53:40 GMT: INFO (info): (thr_info.c::4830)    heartbeat_received: self 322 : foreign 0
Aug 31 2015 17:53:40 GMT: INFO (info): (thr_info.c::4831)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0 
Aug 31 2015 17:53:40 GMT: INFO (info): (thr_info.c::4844)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Aug 31 2015 17:53:40 GMT: INFO (info): (thr_info.c::4860) namespace sessstore: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 99
Aug 31 2015 17:53:40 GMT: INFO (info): (thr_info.c::4860) namespace cart: disk inuse: 1852032 memory inuse: 1671794 (bytes) sindex memory inuse: 0 (bytes) avail pct 99
Aug 31 2015 17:53:40 GMT: INFO (info): (thr_info.c::4905)    partitions: actual 8192 sync 0 desync 0 zombie 0 wait 0 absent 0
Aug 31 2015 17:53:40 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec
Aug 31 2015 17:53:40 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (0 total) msec
Aug 31 2015 17:53:40 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Aug 31 2015 17:53:40 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (0 total) msec
Aug 31 2015 17:53:40 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Aug 31 2015 17:53:40 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Aug 31 2015 17:53:40 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Aug 31 2015 17:53:50 GMT: INFO (info): (thr_info.c::4796)  system memory: free 804628kb ( 79 percent free ) 
Aug 31 2015 17:53:50 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 1987 ::: sub_objects 0
Aug 31 2015 17:53:50 GMT: INFO (info): (thr_info.c::4812)  rec refs 1987 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Aug 31 2015 17:53:50 GMT: INFO (info): (thr_info.c::4818)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Aug 31 2015 17:53:50 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (0, 0, 0) : hb (0, 0, 0) : fab (16, 16, 0)
Aug 31 2015 17:53:50 GMT: INFO (info): (thr_info.c::4830)    heartbeat_received: self 389 : foreign 0
Aug 31 2015 17:53:50 GMT: INFO (info): (thr_info.c::4831)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0 
Aug 31 2015 17:53:50 GMT: INFO (info): (thr_info.c::4844)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Aug 31 2015 17:53:50 GMT: INFO (info): (thr_info.c::4860) namespace sessstore: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 99
Aug 31 2015 17:53:50 GMT: INFO (info): (thr_info.c::4860) namespace cart: disk inuse: 1852032 memory inuse: 1671794 (bytes) sindex memory inuse: 0 (bytes) avail pct 99
Aug 31 2015 17:53:50 GMT: INFO (info): (thr_info.c::4905)    partitions: actual 8192 sync 0 desync 0 zombie 0 wait 0 absent 0
Aug 31 2015 17:53:50 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec
Aug 31 2015 17:53:50 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (0 total) msec
Aug 31 2015 17:53:50 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Aug 31 2015 17:53:50 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (0 total) msec
Aug 31 2015 17:53:50 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Aug 31 2015 17:53:50 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Aug 31 2015 17:53:50 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Aug 31 2015 17:53:52 GMT: INFO (drv_ssd): (drv_ssd.c::2536) device /opt/aerospike/data/sess.dat: used 0, contig-free 16383M (16383 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s)
Aug 31 2015 17:53:52 GMT: INFO (drv_ssd): (drv_ssd.c::2536) device /opt/aerospike/data/cart.dat: used 1852032, contig-free 16380M (16380 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 3 (0.0/s)
Aug 31 2015 17:54:00 GMT: INFO (info): (thr_info.c::4796)  system memory: free 807312kb ( 79 percent free ) 
Aug 31 2015 17:54:00 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 1987 ::: sub_objects 0
Aug 31 2015 17:54:00 GMT: INFO (info): (thr_info.c::4812)  rec refs 1987 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Aug 31 2015 17:54:00 GMT: INFO (info): (thr_info.c::4818)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Aug 31 2015 17:54:00 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (0, 0, 0) : hb (0, 0, 0) : fab (16, 16, 0)
Aug 31 2015 17:54:00 GMT: INFO (info): (thr_info.c::4830)    heartbeat_received: self 455 : foreign 0
Aug 31 2015 17:54:00 GMT: INFO (info): (thr_info.c::4831)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0 
Aug 31 2015 17:54:00 GMT: INFO (info): (thr_info.c::4844)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Aug 31 2015 17:54:00 GMT: INFO (info): (thr_info.c::4860) namespace sessstore: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 99
Aug 31 2015 17:54:00 GMT: INFO (info): (thr_info.c::4860) namespace cart: disk inuse: 1852032 memory inuse: 1671794 (bytes) sindex memory inuse: 0 (bytes) avail pct 99
Aug 31 2015 17:54:00 GMT: INFO (info): (thr_info.c::4905)    partitions: actual 8192 sync 0 desync 0 zombie 0 wait 0 absent 0
Aug 31 2015 17:54:00 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec
Aug 31 2015 17:54:00 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (0 total) msec
Aug 31 2015 17:54:00 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Aug 31 2015 17:54:00 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (0 total) msec
Aug 31 2015 17:54:00 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Aug 31 2015 17:54:00 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Aug 31 2015 17:54:00 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Aug 31 2015 17:54:10 GMT: INFO (info): (thr_info.c::4796)  system memory: free 567292kb ( 55 percent free ) 
Aug 31 2015 17:54:10 GMT: INFO (info): (thr_info.c::4804)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 2010 ::: sub_objects 0
Aug 31 2015 17:54:10 GMT: INFO (info): (thr_info.c::4812)  rec refs 2010 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Aug 31 2015 17:54:10 GMT: INFO (info): (thr_info.c::4818)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0 
Aug 31 2015 17:54:10 GMT: INFO (info): (thr_info.c::4828)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (62, 92, 30) : hb (0, 0, 0) : fab (16, 16, 0)
Aug 31 2015 17:54:10 GMT: INFO (info): (thr_info.c::4830)    heartbeat_received: self 521 : foreign 0
Aug 31 2015 17:54:10 GMT: INFO (info): (thr_info.c::4831)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0 
Aug 31 2015 17:54:10 GMT: INFO (info): (thr_info.c::4844)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Aug 31 2015 17:54:10 GMT: INFO (info): (thr_info.c::4860) namespace sessstore: disk inuse: 5888 memory inuse: 4117 (bytes) sindex memory inuse: 0 (bytes) avail pct 99
Aug 31 2015 17:54:10 GMT: INFO (info): (thr_info.c::4860) namespace cart: disk inuse: 1852032 memory inuse: 1671794 (bytes) sindex memory inuse: 0 (bytes) avail pct 99
Aug 31 2015 17:54:10 GMT: INFO (info): (thr_info.c::4905)    partitions: actual 8192 sync 0 desync 0 zombie 0 wait 0 absent 0
Aug 31 2015 17:54:10 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec
Aug 31 2015 17:54:10 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (46 total) msec
Aug 31 2015 17:54:10 GMT: INFO (info): (hist.c::163)  (00: 0000000046)
Aug 31 2015 17:54:10 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Aug 31 2015 17:54:10 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (46 total) msec
Aug 31 2015 17:54:10 GMT: INFO (info): (hist.c::163)  (00: 0000000046)
Aug 31 2015 17:54:10 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Aug 31 2015 17:54:10 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Aug 31 2015 17:54:10 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count

#4

You’re using too small of an instance, most likely. When you have less than 1GB to give the database server, and also add a webserver to it, you’re likely to encounter the OS’s OOM killer (out of memory). Running on such a weak box (I don’t know if you even have enough cores to support those two servers concurrently) is not simulating anything realistic.

So, we have an EC2 guide that includes things such as specific recommendations on which node to use and how to configure it. Please use it. You can alternatively run it on a VM on your own machine, if you have enough cores to spare.


#5

Thanks Rbotzer,

I understand that we used very low instance but I am not able to understand crashing and stopping the aerospike service. with 25 thread it behaved perfectly. but the moment I raised the thread it brought down the aerospike service. My RDBMS service when in running state, irrespective of any issues, load, locking it wont stop rdbms. Once the service is on it is always on.

so even if I move aerospike to new dedicated instance and if due to some issue the memory of that instance is utilized by any process. does that mean my aerospike wud stop?

is there any way I can dedicate my ram to aerospike so that no other process cud use it. say if I got 4gb ram, I’ll assign 3gb to aerospike and rest 1gb to other processes.

my concern is because of this scenario in production when there is any unexpected load it would bring my business down.


#6

One possible (and common) problem is OOM Killer where your OS will terminate a process that gets too big. Coming from PHP you should be familiar with php.ini settings like memory_limit that have a similar effect.

The RDBMS you’re familiar with is mostly file based which is why it has more room to grow (and why it’s much slower), but if you run out of disk space you’ll see similar odd behavior from the RDBMS. In the case of Aerospike, the RAM it uses is provisioned, meaning it’s allocated in advance rather than slowly growing to some unknown size. You should read the Capacity Planning article to understand more about how you select the right instance or physical node for what you need in production. Reading the operations manual will help you understand how things work and how to configure and tune. You can also post your questions about which instance you should use for your use case to the community, and get help with correct configuration. Lastly, there’s a clear knowledge base that includes a lot of useful articles, for example about sizing and configuration.

There are many companies using Aerospike in very heavy production situations, ones that RDBMSs simply cannot support, and with years of uptime for the cluster. I am going to guess that once you get past the trivial sizing and configuration issues you’ll be happy with the stability of the system.