Hi ,
We have AWS cluster of 3 i2 nodes with 2 ephemeral nodes each (800GBx2).
Cluster uses following config:
namespace storage {
replication-factor 2
memory-size 20G
high-water-memory-pct 60
high-water-disk-pct 70
stop-writes-pct 90
default-ttl 30d # 30 days, use 0 to never expire/evict.
storage-engine memory
# ldt-enabled true
storage-engine device {
# Use the 2 lines below with actual device paths.
device /dev/sdb
device /dev/sdc
# The 2 lines below optimize for SSD.
scheduler-mode noop
write-block-size 256K
}
}
namespace memory {
replication-factor 2
memory-size 34G
high-water-memory-pct 60
high-water-disk-pct 70
stop-writes-pct 90
default-ttl 30d # 30 days, use 0 to never expire/evict.
storage-engine memory
#
storage-engine device {
#
# Use the 2 lines below with actual device paths.
device /dev/sdb1
device /dev/sdc1
# The 2 lines below optimize for SSD.
scheduler-mode noop
write-block-size 128K
Suddenly aerospike.log filled up with WARNING messages and grew up to ~7GB.
Apr 07 2015 08:39:07 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Apr 07 2015 08:39:07 GMT: INFO (drv_ssd): (drv_ssd.c::2398) device /dev/sdc1: used 0, contig-free 51199M (409592 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s)
Apr 07 2015 08:39:08 GMT: INFO (drv_ssd): (drv_ssd.c::2398) device /dev/sdb1: used 0, contig-free 51199M (409592 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s)
Apr 07 2015 08:39:14 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1310720
Apr 07 2015 08:39:14 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1310720
Apr 07 2015 08:39:14 GMT: INFO (drv_ssd): (drv_ssd.c::1218) read_all: failed as_storage_record_read_ssd()
Apr 07 2015 08:39:14 GMT: WARNING (rw): (thr_rw.c::5710) as_record_get: expired record still in system, no read: n_bins(0)
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4580) system memory: free 49546488kb ( 78 percent free )
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4588) migrates in progress ( 0 , 0 ) ::: ClusterSize 3 ::: objects 200992324 ::: sub_objects 0
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4596) rec refs 200992324 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4602) replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4612) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (66, 8101, 8035) : hb (3, 3, 0) : fab (44, 44, 0)
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4614) heartbeat_received: self 0 : foreign 5767385
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4615) heartbeat_stats: bt 0 bf 3823379 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4628) tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4669) namespace mth-storage: disk inuse: 370665007104 memory inuse: 12863508736 (bytes) sindex memory inuse: 0 (bytes) avail pct 62 cache-read pct 30.11
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4669) namespace mth-memory: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 99 cache-read pct 0.00
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4689) partitions: actual 2626 sync 2790 desync 0 zombie 0 wait 0 absent 2776
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::137) histogram dump: reads (2670500987 total) msec
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::154) (00: 2606725317) (01: 0033777283) (02: 0016810289) (03: 0006098423)
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::154) (04: 0004777100) (05: 0002102302) (06: 0000151460) (07: 0000040276)
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::163) (08: 0000018432) (09: 0000000105)
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (162161384 total) msec
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::154) (00: 0147910827) (01: 0010684599) (02: 0003135245) (03: 0000344022)
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::154) (04: 0000065583) (05: 0000016214) (06: 0000002703) (07: 0000001365)
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::163) (08: 0000000772) (09: 0000000054)
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (162161384 total) msec
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::154) (00: 0147929040) (01: 0010668592) (02: 0003133255) (03: 0000343817)
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::154) (04: 0000065572) (05: 0000016214) (06: 0000002703) (07: 0000001365)
Apr 07 2015 10:55:08 GMT: INFO (info): (thr_info.c::4669) namespace mth-memory: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 99 cache-read pct 0.00
Apr 07 2015 10:55:08 GMT: INFO (info): (thr_info.c::4689) partitions: actual 2626 sync 2790 desync 0 zombie 0 wait 0 absent 2776
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::137) histogram dump: reads (2676769539 total) msec
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::154) (00: 2612883459) (01: 0033851440) (02: 0016844668) (03: 0006099710)
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::154) (04: 0004777612) (05: 0002102377) (06: 0000151460) (07: 0000040276)
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::163) (08: 0000018432) (09: 0000000105)
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (165813223 total) msec
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::154) (00: 0151369182) (01: 0010835095) (02: 0003176342) (03: 0000345590)
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::154) (04: 0000065834) (05: 0000016258) (06: 0000002723) (07: 0000001372)
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::163) (08: 0000000773) (09: 0000000054)
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (165813223 total) msec
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::154) (00: 0151387667) (01: 0010818826) (02: 0003174343) (03: 0000345384)
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::154) (04: 0000065823) (05: 0000016258) (06: 0000002723) (07: 0000001372)
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::163) (08: 0000000773) (09: 0000000054)
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Apr 07 2015 10:55:08 GMT: INFO (drv_ssd): (drv_ssd.c::2398) device /dev/sdc1: used 0, contig-free 51199M (409592 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s)
Apr 07 2015 10:55:09 GMT: INFO (drv_ssd): (drv_ssd.c::2398) device /dev/sdb1: used 0, contig-free 51199M (409592 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s)
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: INFO (drv_ssd): (drv_ssd.c::1218) read_all: failed as_storage_record_read_ssd()
Apr 07 2015 10:55:10 GMT: WARNING (rw): (thr_rw.c::5710) as_record_get: expired record still in system, no read: n_bins(0)
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: INFO (drv_ssd): (drv_ssd.c::1218) read_all: failed as_storage_record_read_ssd()
Apr 07 2015 10:55:10 GMT: WARNING (rw): (thr_rw.c::5710) as_record_get: expired record still in system, no read: n_bins(0)
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: INFO (drv_ssd): (drv_ssd.c::1218) read_all: failed as_storage_record_read_ssd()
Apr 07 2015 10:55:10 GMT: WARNING (rw): (thr_rw.c::5710) as_record_get: expired record still in system, no read: n_bins(0)
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: INFO (drv_ssd): (drv_ssd.c::1218) read_all: failed as_storage_record_read_ssd()
Apr 07 2015 10:55:10 GMT: WARNING (rw): (thr_rw.c::5710) as_record_get: expired record still in system, no read: n_bins(0)
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: INFO (drv_ssd): (drv_ssd.c::1218) read_all: failed as_storage_record_read_ssd()
I didnt this WARNING about bad blocks in general warnings posted here.
The question is what does it mean and why there is suddenly happened?
Another question is about work with partitioned disks, do we need to format it with some "ext" Fs or leave it as is "Raw"?
We have set the TTL for most records to 14 days and exactly at this point all this problem started.
There are two major issues I see in your config:
A namespace should only have a single storage engine defined. You have defined bout memory and device. This will end up using device since it appears last but it will allocate resources for the first storage engine which will not be used. The run-time behavior would also be undefined but I believe this not to be causing your issue.
It appears you have configured the “storage” namespace to use the full raw devices /dev/sdb
and /dev/sdc
and configured the “memory” namespace to use a partition of those devices /dev/sdb1
and /dev/sdc1
. In which case the warning you are receiving would be expected when the “storage” namespace begins to overwrite the contents of the partitions in the “memory” namespace.
The devices should be unformatted and zeroized before usage begins. To zeroize a device that supports TRIM, such as SSDs, use:
Example:
sudo blkdiscard /dev/DEVICE
For other devices use:
Example:
sudo dd if=/dev/zero of=/dev/sdc bs=1M
Thank you for your answer kporter!
Maybe i am wrong but as i understand if we have only 1 physical device and 2 namespaces we can use it by splitting the Disk for 2 partitions and use each partition for 1 namespace ?
Regarding “Memory” namespace configuration using 2 types of storage engine what we mean to do is work like in memory data base that flushed to disk also.
Maybe i didn’t understood the concept.
Thanks.
Yes, this is correct. I am assuming your device names are following normal conventions and that /dev/sdb
is the full raw device and /dev/sdb1
is a partition on that device. If this is the case then the namespace containing /dev/sdb
will eventually overwrite the namespace containing the partition.
To configure a storage-engine device
to work in this way you need to add data-in-memory true
namespace storage {
replication-factor 2
memory-size 20G
high-water-memory-pct 60
high-water-disk-pct 70
stop-writes-pct 90
default-ttl 30d # 30 days, use 0 to never expire/evict.
storage-engine device {
# Use the 2 lines below with actual device paths.
device /dev/sdb
device /dev/sdc
# The 2 lines below optimize for SSD.
scheduler-mode noop
write-block-size 256K
data-in-memory true
}
}