ERROR Server is currently in readonly mode. Shutting down


#1

Hello Team,

I am running benchmark tool on the same machine at which aerospike is installed. Platform is ubuntu 14.04. Everything is working fine and I am able to connect to database but while running benchmark it is saying

2015-04-01 19:08:09 ERROR Server is currently in readonly mode. Shutting down…

After I restart aerospike service, it runs for next 2-3 minutes and after that the same error.

Running this command for benchmark test: ./target/benchmarks -h 127.0.0.1 -p 3000 -n test -k 100000000 -o S:50 -w RU,90 -z 20

Please tell what to do and troubleshoot next.


#2

Hi Amit,

Can you please share your server config (aerospike.conf) and machine details like RAM, disk etc. Also share your aerospike.log file (/var/log/aerospike/aerospike.log)

This is most probably caused due to some problem in configuration around replication factor and available RAM.

From https://www.aerospike.com/docs/architecture/data-distribution.html

In some cases, the replication factor can't be satisfied. The cluster can be configured to either decrease the replication factor and retain all data, or begin evicting the oldest data that is marked as disposable. If the cluster can't accept any more data, it will begin operating in a read-only mode until new capacity becomes available - at which point it will automatically being accepting application writes.

#3

Server Specs : Version: Intel® Xeon® CPU E5-2620 v2 @ 2.10GHz, Max Speed: 4000 MHz, Current Speed: 2100 MHz; 96GB RAM and has two HDDs (One SAS : 900G, SSD: 120G) OS: Ubuntu 14.04 64bit

Everything is kept same as default file after installation except following modifications.

namespace test {
        replication-factor 1
        memory-size 1K
        default-ttl 30d # 30 days, use 0 to never expire/evict.

        # Warning - legacy data in defined raw partition devices will be erased.
        # These partitions must not be mounted by the file system.
        storage-engine device {
                # Use one or more lines like those below with actual device paths.
#               device /dev/sdb
#               device /dev/sdc
                device /dev/sda
                # The 2 lines below optimize for SSD.
                scheduler-mode noop
                write-block-size 128K

                # Use the line below to store data in memory in addition to devices.
#               data-in-memory true
        }
}

Apr 02 2015 11:52:38 GMT: INFO (info): (thr_info.c::4628)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Apr 02 2015 11:52:38 GMT: INFO (info): (thr_info.c::4669) namespace test: disk inuse: 218649600 memory inuse: 4372992 (bytes) sindex memory inuse: 0 (bytes) avail pct 99 cache-read pct 0.00
Apr 02 2015 11:52:38 GMT: INFO (info): (thr_info.c::4644) namespace bar: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Apr 02 2015 11:52:38 GMT: INFO (info): (thr_info.c::4689)    partitions: actual 8192 sync 0 desync 0 zombie 0 wait 0 absent 0
Apr 02 2015 11:52:38 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec
Apr 02 2015 11:52:38 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (69401 total) msec
Apr 02 2015 11:52:38 GMT: INFO (info): (hist.c::163)  (00: 0000069147) (01: 0000000246) (02: 0000000008)
Apr 02 2015 11:52:38 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Apr 02 2015 11:52:38 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (69401 total) msec
Apr 02 2015 11:52:38 GMT: INFO (info): (hist.c::163)  (00: 0000069147) (01: 0000000246) (02: 0000000008)
Apr 02 2015 11:52:38 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Apr 02 2015 11:52:38 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Apr 02 2015 11:52:38 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Apr 02 2015 11:52:48 GMT: INFO (info): (thr_info.c::4580)  system memory: free 98070936kb ( 99 percent free )
Apr 02 2015 11:52:48 GMT: INFO (info): (thr_info.c::4588)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 68328 ::: sub_objects 0
Apr 02 2015 11:52:48 GMT: INFO (info): (thr_info.c::4596)  rec refs 68328 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Apr 02 2015 11:52:48 GMT: INFO (info): (thr_info.c::4602)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0
Apr 02 2015 11:52:48 GMT: INFO (info): (thr_info.c::4612)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (0, 24, 24) : hb (0, 0, 0) : fab (16, 16, 0)
Apr 02 2015 11:52:48 GMT: INFO (info): (thr_info.c::4614)    heartbeat_received: self 3019 : foreign 0
Apr 02 2015 11:52:48 GMT: INFO (info): (thr_info.c::4615)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0
Apr 02 2015 11:52:48 GMT: INFO (info): (thr_info.c::4628)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Apr 02 2015 11:52:48 GMT: INFO (info): (thr_info.c::4669) namespace test: disk inuse: 218649600 memory inuse: 4372992 (bytes) sindex memory inuse: 0 (bytes) avail pct 99 cache-read pct 0.00
Apr 02 2015 11:52:48 GMT: INFO (info): (thr_info.c::4644) namespace bar: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Apr 02 2015 11:52:48 GMT: INFO (info): (thr_info.c::4689)    partitions: actual 8192 sync 0 desync 0 zombie 0 wait 0 absent 0
Apr 02 2015 11:52:48 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec
Apr 02 2015 11:52:48 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (69401 total) msec
Apr 02 2015 11:52:48 GMT: INFO (info): (hist.c::163)  (00: 0000069147) (01: 0000000246) (02: 0000000008)
Apr 02 2015 11:52:48 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Apr 02 2015 11:52:48 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (69401 total) msec
Apr 02 2015 11:52:48 GMT: INFO (info): (hist.c::163)  (00: 0000069147) (01: 0000000246) (02: 0000000008)
Apr 02 2015 11:52:48 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Apr 02 2015 11:52:48 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Apr 02 2015 11:52:48 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Apr 02 2015 11:52:53 GMT: INFO (drv_ssd): (drv_ssd.c::2398) device /dev/sda: used 218649600, contig-free 114255M (914040 wblocks), swb-free 178, n-w 0, w-q 0 w-tot 1735 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s)
Apr 02 2015 11:52:58 GMT: INFO (info): (thr_info.c::4580)  system memory: free 98075148kb ( 99 percent free )
Apr 02 2015 11:52:58 GMT: INFO (info): (thr_info.c::4588)  migrates in progress ( 0 , 0 ) ::: ClusterSize 1 ::: objects 68328 ::: sub_objects 0
Apr 02 2015 11:52:58 GMT: INFO (info): (thr_info.c::4596)  rec refs 68328 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Apr 02 2015 11:52:58 GMT: INFO (info): (thr_info.c::4602)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0
Apr 02 2015 11:52:58 GMT: INFO (info): (thr_info.c::4612)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (0, 24, 24) : hb (0, 0, 0) : fab (16, 16, 0)
Apr 02 2015 11:52:58 GMT: INFO (info): (thr_info.c::4614)    heartbeat_received: self 3086 : foreign 0
Apr 02 2015 11:52:58 GMT: INFO (info): (thr_info.c::4615)    heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0
Apr 02 2015 11:52:58 GMT: INFO (info): (thr_info.c::4628)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Apr 02 2015 11:52:58 GMT: INFO (info): (thr_info.c::4669) namespace test: disk inuse: 218649600 memory inuse: 4372992 (bytes) sindex memory inuse: 0 (bytes) avail pct 99 cache-read pct 0.00
Apr 02 2015 11:52:58 GMT: INFO (info): (thr_info.c::4644) namespace bar: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 100
Apr 02 2015 11:52:58 GMT: INFO (info): (thr_info.c::4689)    partitions: actual 8192 sync 0 desync 0 zombie 0 wait 0 absent 0
Apr 02 2015 11:52:58 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec
Apr 02 2015 11:52:58 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (69401 total) msec
Apr 02 2015 11:52:58 GMT: INFO (info): (hist.c::163)  (00: 0000069147) (01: 0000000246) (02: 0000000008)
Apr 02 2015 11:52:58 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Apr 02 2015 11:52:58 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (69401 total) msec
Apr 02 2015 11:52:58 GMT: INFO (info): (hist.c::163)  (00: 0000069147) (01: 0000000246) (02: 0000000008)
Apr 02 2015 11:52:58 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Apr 02 2015 11:52:58 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Apr 02 2015 11:52:58 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count

#4

Hi Amit,

Sorry for the late reply on this. Looks like you need to allocate more memory… With 1K memory-size you will not be able to hold many records… Only 16 actually (each record uses 64bytes of RAM for the index). You will likely be able to go beyond (based on periodic checks for this) but the system will fairly quickly put you in stop-writes because you would be above the stop-writes-pct (default 90% of memory allocated). At that point, the database is putting itself in read-only mode to avoid using too much more memory then allocated. You can verify this in the logs (you would need to share more of the logs for me to point out where).

You can find more details on stop writes scenarios here.

Also, here is our capacity planning guide where you can find details on primary index sizing and more.

Thanks, and let us know if you have any further questions.

–meher