Aerospike doesn't work with local SSD disks at GCE


#1

Hello,

I am trying to set up a test instance on GCE (Google Compute Engine) using a local SSD disk, but this configuration doesn’t seem to work. On startup, Aerospike exits with an error:

Nov 10 2014 12:48:37 GMT: INFO (drv_ssd): (drv_ssd.c::3809) Opened device /dev/sdb bytes 402653184000
Nov 10 2014 12:48:37 GMT: INFO (drv_ssd): (drv_ssd.c::3698) storage: set device /dev/sdb scheduler mode to noop
Nov 10 2014 12:48:37 GMT: INFO (drv_ssd): (drv_ssd.c::995)  number of wblocks in allocator: 3072000 wblock 131072
Nov 10 2014 12:48:37 GMT: INFO (drv_ssd): (drv_ssd.c::2597) read_header: dev /dev/sdb: unable to read: rv -1 error Invalid argument
Nov 10 2014 12:48:37 GMT: CRITICAL (drv_ssd): (drv_ssd.c:ssd_load_devices:3506) unable to read disk header /dev/sdb: Invalid argument
Nov 10 2014 12:48:37 GMT: WARNING (as): (signal.c::70) SIGABRT received, aborting Aerospike Community Edition build 3.3.21
Nov 10 2014 12:48:37 GMT: WARNING (as): (signal.c::77) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_abort+0x54) [0x46eea2]
Nov 10 2014 12:48:37 GMT: WARNING (as): (signal.c::77) stacktrace: frame 1: /lib/x86_64-linux-gnu/libc.so.6(+0x321e0) [0x7f15962761e0]
Nov 10 2014 12:48:37 GMT: WARNING (as): (signal.c::77) stacktrace: frame 2: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f1596276165]
Nov 10 2014 12:48:37 GMT: WARNING (as): (signal.c::77) stacktrace: frame 3: /lib/x86_64-linux-gnu/libc.so.6(abort+0x180) [0x7f15962793e0]
Nov 10 2014 12:48:37 GMT: WARNING (as): (signal.c::77) stacktrace: frame 4: /usr/bin/asd(cf_fault_event+0x229) [0x4f954b]
Nov 10 2014 12:48:37 GMT: WARNING (as): (signal.c::77) stacktrace: frame 5: /usr/bin/asd(ssd_load_devices+0xf0) [0x4f1ee6]
Nov 10 2014 12:48:37 GMT: WARNING (as): (signal.c::77) stacktrace: frame 6: /usr/bin/asd(as_storage_namespace_init_ssd+0x3ec) [0x4f337c]
Nov 10 2014 12:48:37 GMT: WARNING (as): (signal.c::77) stacktrace: frame 7: /usr/bin/asd(as_storage_init+0x66) [0x4e9346]
Nov 10 2014 12:48:37 GMT: WARNING (as): (signal.c::77) stacktrace: frame 8: /usr/bin/asd(main+0x37b) [0x45284b]
Nov 10 2014 12:48:37 GMT: WARNING (as): (signal.c::77) stacktrace: frame 9: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f1596262ead]
Nov 10 2014 12:48:37 GMT: WARNING (as): (signal.c::77) stacktrace: frame 10: /usr/bin/asd() [0x452add]

The server config is shown below:

# Aerospike database configuration file.

# This stanza must come first.
service {
        user root
        group root
        paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
        pidfile /var/run/aerospike/asd.pid
        service-threads 4
        transaction-queues 4
        transaction-threads-per-queue 4
        proto-fd-max 15000
}

logging {
        # Log file must be an absolute path.
        file /var/log/aerospike/aerospike.log {
                context any info
        }
}

network {
        service {
                address any
                port 3000
        }

        heartbeat {
                mode multicast
                address 239.1.99.222
                port 9918

                # To use unicast-mesh heartbeats, comment out the 3 lines above and
                # use the following 4 lines instead.
                mode mesh
#               port 3002
#               mesh-address 10.1.1.1
#               mesh-port 3002

                interval 150
                timeout 10
        }

        fabric {
                port 3001
        }

        info {
                port 3003
        }
}

namespace test {
        replication-factor 2
        memory-size 28G
        default-ttl 30d # 30 days, use 0 to never expire/evict.

        storage-engine device {
                device /dev/sdb
                scheduler-mode noop
                write-block-size 128K
        }
}

I guess this problem might be related to the earlier one I had with the ACT benchmark: Problem running ACT on GCE using local SSD disk (CE 3.4.0) [Released] [Resolved]


#2

Hi Illyam,

Can you please confirm that this instance has /dev/sdb available? Also, which VM type are you using? I assume it is based on the Debian 7 backports image?


#3

Anshu,

Yes, /dev/sdb is available. Before setting up Aerospike I ran the ACT benchmark on the very same disk, and afterwards I successfully cleared the disk with dd. And yes, it is the Debian 7 backports image, with the SSD attached via the SCSI interface.


#4

Hi,

You are right. This is related to the O_DIRECT issues that you faced with ACT. Luckily, ASD has a config option that disables the flag. You can add “disable-odirect true” to the storage-engine section of the namespace, as below.

    storage-engine device {
            device /dev/sdb
            disable-odirect true
            scheduler-mode noop
            write-block-size 128K
    }
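
To check whether a device accepts O_DIRECT reads before starting the server, something like the following can help. This is only a minimal sketch and not Aerospike's own code: the function name `probe_direct_read` and the 4096-byte block size are assumptions made for illustration. It opens the path with O_DIRECT and attempts one page-aligned read. A failure with "Invalid argument" (EINVAL) would match the read_header error in the log above.

```python
import mmap
import os


def probe_direct_read(path, block_size=4096):
    """Attempt one aligned O_DIRECT read from `path` (hypothetical helper).

    Returns (True, "") on success, or (False, reason) on failure.
    """
    o_direct = getattr(os, "O_DIRECT", None)
    if o_direct is None:
        return False, "O_DIRECT is not available on this platform"
    try:
        fd = os.open(path, os.O_RDONLY | o_direct)
    except OSError as exc:
        return False, f"open failed: {exc.strerror}"
    try:
        # Anonymous mmap memory is page-aligned, which O_DIRECT requires.
        buf = mmap.mmap(-1, block_size)
        try:
            os.readv(fd, [buf])
        finally:
            buf.close()
    except OSError as exc:
        return False, f"read failed: {exc.strerror}"
    finally:
        os.close(fd)
    return True, ""


if __name__ == "__main__":
    # /dev/sdb is the device from the config above; reading it needs root.
    print(probe_direct_read("/dev/sdb"))
```

If the probe fails on the device but a plain buffered read succeeds, that points at the O_DIRECT flag rather than at the disk itself.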

#5

Thanks, it does work now.

By the way, there is one minor problem in the startup script: it prints errors, but they don’t seem to stop Aerospike from working.

$ sudo /etc/init.d/aerospike start
[....] Start aerospike: : asd/etc/init.d/aerospike: line 39: [: 18446744073692774399: integer expression expected
/etc/init.d/aerospike: line 49: [: 18446744073692774399: integer expression expected
. ok

#6

Which OS image are you using? I tried it on the backports-debian7-wheezy image and it works fine for me.


#7

The init script problem does exist and is due to the (very) large shm values set recently in the kernel. (See https://github.com/torvalds/linux/blob/master/include/uapi/linux/shm.h)

Will be fixed in the next release.
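
The number in the error message can be checked with a little arithmetic. The sketch below assumes a 64-bit unsigned long, assumes the linked header defines SHMMAX as ULONG_MAX - (1UL << 24), and assumes the init script runs under bash, whose `[` builtin parses operands as signed 64-bit integers. The helper `fits_signed_64` is hypothetical and only illustrates the range check.

```python
ULONG_MAX = 2**64 - 1  # assumption: 64-bit unsigned long
INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold


def fits_signed_64(text):
    """Return True if the decimal string fits in a signed 64-bit integer."""
    return -(2**63) <= int(text) <= INT64_MAX


reported = "18446744073692774399"  # value printed by the init script

print(int(reported) == ULONG_MAX - (1 << 24))  # True
print(fits_signed_64(reported))                # False
```

Under those assumptions the value is too large for the shell's integer comparison, which would explain the "integer expression expected" message without anything being wrong with the server itself.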


#8

This init script issue has been fixed in Server release 3.4.0 (http://www.aerospike.com/download/server/notes.html#3.4.0)