Bad blocks and expired records


#1
Hi ,

We have AWS cluster of  3 i2 nodes with 2 ephemeral nodes each (800GBx2).
Cluster uses following config:

    namespace storage {
        replication-factor 2
        memory-size 20G
        high-water-memory-pct 60
        high-water-disk-pct   70
        stop-writes-pct       90

        default-ttl 30d # 30 days, use 0 to never expire/evict.

        storage-engine memory
#        ldt-enabled true

        storage-engine device {

        # Use the 2 lines below with actual device paths.
        device /dev/sdb
        device /dev/sdc

        # The 2 lines below optimize for SSD.
        scheduler-mode noop
        write-block-size 256K
}
}
namespace memory {
        replication-factor 2
        memory-size 34G
        high-water-memory-pct 60
        high-water-disk-pct   70
        stop-writes-pct       90

        default-ttl 30d # 30 days, use 0 to never expire/evict.

        storage-engine memory
#
        storage-engine device {
#
        # Use the 2 lines below with actual device paths.
        device /dev/sdb1
        device /dev/sdc1

        # The 2 lines below optimize for SSD.
        scheduler-mode noop
        write-block-size 128K

Suddenly aerospike.log filled up with WARNING messages and grew up to ~7GB.

Apr 07 2015 08:39:07 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Apr 07 2015 08:39:07 GMT: INFO (drv_ssd): (drv_ssd.c::2398) device /dev/sdc1: used 0, contig-free 51199M (409592 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s)
Apr 07 2015 08:39:08 GMT: INFO (drv_ssd): (drv_ssd.c::2398) device /dev/sdb1: used 0, contig-free 51199M (409592 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s)
Apr 07 2015 08:39:14 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1310720
Apr 07 2015 08:39:14 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1310720
Apr 07 2015 08:39:14 GMT: INFO (drv_ssd): (drv_ssd.c::1218) read_all: failed as_storage_record_read_ssd()
Apr 07 2015 08:39:14 GMT: WARNING (rw): (thr_rw.c::5710) as_record_get: expired record still in system, no read: n_bins(0)
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4580)  system memory: free 49546488kb ( 78 percent free )
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4588)  migrates in progress ( 0 , 0 ) ::: ClusterSize 3 ::: objects 200992324 ::: sub_objects 0
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4596)  rec refs 200992324 ::: rec locks 0 ::: trees 0 ::: wr reqs 0 ::: mig tx 0 ::: mig rx 0
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4602)  replica errs :: null 0 non-null 0 ::: sync copy errs :: node 0 :: master 0
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4612)    trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: bq 0 ::: iq 0 ::: dq 0 : fds - proto (66, 8101, 8035) : hb (3, 3, 0) : fab (44, 44, 0)
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4614)    heartbeat_received: self 0 : foreign 5767385
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4615)    heartbeat_stats: bt 0 bf 3823379 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4628)    tree_counts: nsup 0 scan 0 batch 0 dup 0 wprocess 0 migrx 0 migtx 0 ssdr 0 ssdw 0 rw 0
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4669) namespace mth-storage: disk inuse: 370665007104 memory inuse: 12863508736 (bytes) sindex memory inuse: 0 (bytes) avail pct 62 cache-read pct 30.11
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4669) namespace mth-memory: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 99 cache-read pct 0.00
Apr 07 2015 08:39:17 GMT: INFO (info): (thr_info.c::4689)    partitions: actual 2626 sync 2790 desync 0 zombie 0 wait 0 absent 2776
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::137) histogram dump: reads (2670500987 total) msec
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::154)  (00: 2606725317) (01: 0033777283) (02: 0016810289) (03: 0006098423)
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::154)  (04: 0004777100) (05: 0002102302) (06: 0000151460) (07: 0000040276)
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::163)  (08: 0000018432) (09: 0000000105)
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (162161384 total) msec
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::154)  (00: 0147910827) (01: 0010684599) (02: 0003135245) (03: 0000344022)
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::154)  (04: 0000065583) (05: 0000016214) (06: 0000002703) (07: 0000001365)
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::163)  (08: 0000000772) (09: 0000000054)
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (162161384 total) msec
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::154)  (00: 0147929040) (01: 0010668592) (02: 0003133255) (03: 0000343817)
Apr 07 2015 08:39:17 GMT: INFO (info): (hist.c::154)  (04: 0000065572) (05: 0000016214) (06: 0000002703) (07: 0000001365)





Apr 07 2015 10:55:08 GMT: INFO (info): (thr_info.c::4669) namespace mth-memory: disk inuse: 0 memory inuse: 0 (bytes) sindex memory inuse: 0 (bytes) avail pct 99 cache-read pct 0.00
Apr 07 2015 10:55:08 GMT: INFO (info): (thr_info.c::4689)    partitions: actual 2626 sync 2790 desync 0 zombie 0 wait 0 absent 2776
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::137) histogram dump: reads (2676769539 total) msec
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::154)  (00: 2612883459) (01: 0033851440) (02: 0016844668) (03: 0006099710)
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::154)  (04: 0004777612) (05: 0002102377) (06: 0000151460) (07: 0000040276)
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::163)  (08: 0000018432) (09: 0000000105)
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (165813223 total) msec
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::154)  (00: 0151369182) (01: 0010835095) (02: 0003176342) (03: 0000345590)
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::154)  (04: 0000065834) (05: 0000016258) (06: 0000002723) (07: 0000001372)
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::163)  (08: 0000000773) (09: 0000000054)
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (165813223 total) msec
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::154)  (00: 0151387667) (01: 0010818826) (02: 0003174343) (03: 0000345384)
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::154)  (04: 0000065823) (05: 0000016258) (06: 0000002723) (07: 0000001372)
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::163)  (08: 0000000773) (09: 0000000054)
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Apr 07 2015 10:55:08 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Apr 07 2015 10:55:08 GMT: INFO (drv_ssd): (drv_ssd.c::2398) device /dev/sdc1: used 0, contig-free 51199M (409592 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s)
Apr 07 2015 10:55:09 GMT: INFO (drv_ssd): (drv_ssd.c::2398) device /dev/sdb1: used 0, contig-free 51199M (409592 wblocks), swb-free 0, n-w 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 0 (0.0/s)
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: INFO (drv_ssd): (drv_ssd.c::1218) read_all: failed as_storage_record_read_ssd()
Apr 07 2015 10:55:10 GMT: WARNING (rw): (thr_rw.c::5710) as_record_get: expired record still in system, no read: n_bins(0)
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: INFO (drv_ssd): (drv_ssd.c::1218) read_all: failed as_storage_record_read_ssd()
Apr 07 2015 10:55:10 GMT: WARNING (rw): (thr_rw.c::5710) as_record_get: expired record still in system, no read: n_bins(0)
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: INFO (drv_ssd): (drv_ssd.c::1218) read_all: failed as_storage_record_read_ssd()
Apr 07 2015 10:55:10 GMT: WARNING (rw): (thr_rw.c::5710) as_record_get: expired record still in system, no read: n_bins(0)
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: INFO (drv_ssd): (drv_ssd.c::1218) read_all: failed as_storage_record_read_ssd()
Apr 07 2015 10:55:10 GMT: WARNING (rw): (thr_rw.c::5710) as_record_get: expired record still in system, no read: n_bins(0)
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: WARNING (drv_ssd): (drv_ssd.c::1192) read: bad block magic offset 1494016
Apr 07 2015 10:55:10 GMT: INFO (drv_ssd): (drv_ssd.c::1218) read_all: failed as_storage_record_read_ssd()

#2

I didnt this WARNING about bad blocks in general warnings posted here.

The question is what does it mean and why there is suddenly happened?
Another question is about work with partitioned disks, do we need to format it with some "ext" Fs or leave it as is "Raw"?

We have set the TTL for most records to 14 days and exactly at this point all this problem started.


#3

There are two major issues I see in your config:

A namespace should only have a single storage engine defined. You have defined bout memory and device. This will end up using device since it appears last but it will allocate resources for the first storage engine which will not be used. The run-time behavior would also be undefined but I believe this not to be causing your issue.

It appears you have configured the “storage” namespace to use the full raw devices /dev/sdb and /dev/sdc and configured the “memory” namespace to use a partition of those devices /dev/sdb1 and /dev/sdc1. In which case the warning you are receiving would be expected when the “storage” namespace begins to overwrite the contents of the partitions in the “memory” namespace.

The devices should be unformatted and zeroized before usage begins. To zeroize a device that supports TRIM, such as SSDs, use:

Example:

sudo blkdiscard /dev/DEVICE

For other devices use:

Example:

sudo dd if=/dev/zero of=/dev/sdc bs=1M

#4

Thank you for your answer kporter!

Maybe i am wrong but as i understand if we have only 1 physical device and 2 namespaces we can use it by splitting the Disk for 2 partitions and use each partition for 1 namespace ?

Regarding “Memory” namespace configuration using 2 types of storage engine what we mean to do is work like in memory data base that flushed to disk also.

Maybe i didn’t understood the concept.

Thanks.


#5

Yes, this is correct. I am assuming your device names are following normal conventions and that /dev/sdb is the full raw device and /dev/sdb1 is a partition on that device. If this is the case then the namespace containing /dev/sdb will eventually overwrite the namespace containing the partition.

To configure a storage-engine device to work in this way you need to add data-in-memory true

namespace storage {
    replication-factor 2
    memory-size 20G
    high-water-memory-pct 60
    high-water-disk-pct   70
    stop-writes-pct       90

    default-ttl 30d # 30 days, use 0 to never expire/evict.

    storage-engine device {
        # Use the 2 lines below with actual device paths.
        device /dev/sdb
        device /dev/sdc
        # The 2 lines below optimize for SSD.
        scheduler-mode noop
        write-block-size 256K
        data-in-memory true
    }
}