We’ve recently spun up some instances in AWS, specifically i3.4xlarge instances from the Aerospike AMI (ami-da6c3da0). Our aerospike.cfg uses a single namespace with device access to nvme0n1 and nvme1n1. Attempting to start asd results in the error below:
Feb 02 2018 20:35:44 GMT: WARNING (drv_ssd): (drv_ssd.c:3499) unable to open device /dev/nvme0n1: Permission denied
Additionally, we have followed the instructions for running as non-root. Both adding the aerospike user to the disk group and trying the udev tweak result in the same error.
Is it possible we are doing it wrong or missing something specific?
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    proto-fd-max 30000
    migrate-threads 8
    migrate-max-num-incoming 16
    nsup-period 60
}
logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}
network {
    service {
        address any
        port 3000
    }
    heartbeat {
        mode mesh
        port 3002 # Heartbeat port for this node.
        # List one or more other nodes, one ip-address & port per line:
        mesh-seed-address-port 172.20.0.41 3002
        mesh-seed-address-port 172.20.0.49 3002
        mesh-seed-address-port 172.20.0.90 3002
        mesh-seed-address-port 172.20.0.183 3002
        mesh-seed-address-port 172.20.0.253 3002
        mesh-seed-address-port 172.20.1.98 3002
        mesh-seed-address-port 172.20.1.198 3002
        mesh-seed-address-port 172.20.1.213 3002
        mesh-seed-address-port 172.20.1.217 3002
        interval 5000
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
namespace a {
    replication-factor 2
    memory-size 120G
    default-ttl 180d
    high-water-disk-pct 50
    high-water-memory-pct 90
    stop-writes-pct 95
    migrate-sleep 0
    disable-write-dup-res true
    evict-hist-buckets 10000000
    evict-tenths-pct 10
    storage-engine device {
        device /dev/nvme0n1
        device /dev/nvme1n1
        write-block-size 128K
    }
}
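For anyone debugging the same "Permission denied" on a raw device, here is a rough sketch (not from the Aerospike docs) of how one might check whether a given user should be able to open a device read/write according to the classic owner/group/other bits. The function names are made up for illustration, and the check deliberately ignores root privileges, ACLs and SELinux, so treat the result as a hint only.

```python
import grp
import os
import pwd
import stat


def can_read_write(mode, owner_uid, owner_gid, uid, gids):
    """Apply the classic owner/group/other bits (ignores root, ACLs, SELinux)."""
    if uid == owner_uid:
        need = stat.S_IRUSR | stat.S_IWUSR
    elif owner_gid in gids:
        need = stat.S_IRGRP | stat.S_IWGRP
    else:
        need = stat.S_IROTH | stat.S_IWOTH
    return (mode & need) == need


def describe_device_access(path, username):
    """Summarize ownership of `path` and whether `username` passes the bit check."""
    st = os.stat(path)
    user = pwd.getpwnam(username)
    # Primary group plus supplementary groups as listed in the group database.
    gids = {g.gr_gid for g in grp.getgrall() if username in g.gr_mem}
    gids.add(user.pw_gid)
    return {
        "path": path,
        "owner": st.st_uid,
        "group": st.st_gid,
        "mode": stat.filemode(st.st_mode),
        "user_can_read_write": can_read_write(
            st.st_mode, st.st_uid, st.st_gid, user.pw_uid, gids
        ),
    }
```

Calling `describe_device_access("/dev/nvme0n1", "aerospike")` on the affected host would show at a glance whether the disk group or the udev ownership change actually took effect. Note that group membership is read from the group database, so it does not tell you what groups an already running process has.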
Yes and no, as the documentation states that “Newly provisioned blank EBS volumes and all Ephemeral disks are already zeroed.” These hosts are all new instances with fresh ephemeral disks.
Feb 06 2018 15:46:57 GMT: INFO (as): (as.c:318) <><><><><><><><><><> Aerospike Community Edition build 3.15.1.3 <><><><><><><><><><>
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) # Aerospike database configuration file for deployments using mesh heartbeats.
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531)
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) service {
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) user aerospike
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) group aerospike
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) pidfile /var/run/aerospike/asd.pid
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) proto-fd-max 15000
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) }
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531)
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) logging {
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) # Log file must be an absolute path.
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) file /var/log/aerospike/aerospike.log {
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) context any info
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) }
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) }
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531)
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) network {
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) service {
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) address any
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) port 3000
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) }
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531)
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) heartbeat {
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) mode mesh
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) port 3002 # Heartbeat port for this node.
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531)
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) # List one or more other nodes, one ip-address & port per line:
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) # mesh-seed-address-port 10.10.10.11 3002
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) # mesh-seed-address-port 10.10.10.12 3002
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) # mesh-seed-address-port 10.10.10.13 3002
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) # mesh-seed-address-port 10.10.10.14 3002
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531)
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) interval 250
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) timeout 10
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) }
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531)
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) fabric {
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) port 3001
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) }
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531)
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) info {
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) port 3003
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) }
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) }
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531)
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) namespace bmac {
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) replication-factor 1
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) memory-size 120G
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) default-ttl 180d
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531)
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) high-water-disk-pct 50
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) high-water-memory-pct 90
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) stop-writes-pct 95
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531)
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) migrate-sleep 0
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) disable-write-dup-res true
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) evict-hist-buckets 10000000
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) evict-tenths-pct 10
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531)
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) storage-engine device {
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) device /dev/nvme0n1
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) device /dev/nvme1n1
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) write-block-size 128K
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) }
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3531) }
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3551) system file descriptor limit: 100000, proto-fd-max: 15000
Feb 06 2018 15:46:57 GMT: INFO (hardware): (hardware.c:1785) detected 16 CPU(s), 8 core(s), 1 NUMA node(s)
Feb 06 2018 15:46:57 GMT: INFO (socket): (socket.c:2549) Node port 3001, node ID bb9de6bdab97e12
Feb 06 2018 15:46:57 GMT: INFO (config): (cfg.c:3592) node-id bb9de6bdab97e12
Feb 06 2018 15:46:57 GMT: INFO (namespace): (namespace_ce.c:96) {bmac} beginning cold start
Feb 06 2018 15:46:57 GMT: INFO (drv_ssd): (drv_ssd.c:3419) usable device size must be header size 1048576 + multiple of 131072, rounding down
Feb 06 2018 15:46:57 GMT: INFO (drv_ssd): (drv_ssd.c:3525) opened device /dev/nvme0n1: usable size 1899999920128, io-min-size 512
Feb 06 2018 15:46:57 GMT: INFO (drv_ssd): (drv_ssd.c:3419) usable device size must be header size 1048576 + multiple of 131072, rounding down
Feb 06 2018 15:46:57 GMT: INFO (drv_ssd): (drv_ssd.c:3525) opened device /dev/nvme1n1: usable size 1899999920128, io-min-size 512
Feb 06 2018 15:46:57 GMT: INFO (drv_ssd): (drv_ssd.c:1045) /dev/nvme0n1 has 14495849 wblocks of size 131072
Feb 06 2018 15:46:58 GMT: INFO (drv_ssd): (drv_ssd.c:1045) /dev/nvme1n1 has 14495849 wblocks of size 131072
Feb 06 2018 15:46:58 GMT: FAILED ASSERTION (drv_ssd): (drv_ssd.c:112) /dev/nvme0n1: DEVICE FAILED open: errno 13 (Permission denied)
Feb 06 2018 15:46:58 GMT: WARNING (as): (signal.c:210) SIGUSR1 received, aborting Aerospike Community Edition build 3.15.1.3 os el6
Feb 06 2018 15:46:58 GMT: INFO (as): (signal.c:214) call stack: found 12 frames
Feb 06 2018 15:46:58 GMT: INFO (as): (signal.c:214) call stack: frame 0: /usr/bin/asd(as_sig_handle_usr1+0x36) [0x46f8c6]
Feb 06 2018 15:46:58 GMT: INFO (as): (signal.c:214) call stack: frame 1: /lib64/libc.so.6(+0x35270) [0x7fb2b452b270]
Feb 06 2018 15:46:58 GMT: INFO (as): (signal.c:214) call stack: frame 2: /lib64/libpthread.so.0(raise+0x2b) [0x7fb2b578646b]
Feb 06 2018 15:46:58 GMT: INFO (as): (signal.c:214) call stack: frame 3: /usr/bin/asd(cf_fault_event+0x216) [0x4fc2fd]
Feb 06 2018 15:46:58 GMT: INFO (as): (signal.c:214) call stack: frame 4: /usr/bin/asd(ssd_fd_get+0xae) [0x4dc9a3]
Feb 06 2018 15:46:58 GMT: INFO (as): (signal.c:214) call stack: frame 5: /usr/bin/asd(ssd_read_header+0x52) [0x4dcce8]
Feb 06 2018 15:46:58 GMT: INFO (as): (signal.c:214) call stack: frame 6: /usr/bin/asd(ssd_load_records+0xa9) [0x4e1f7f]
Feb 06 2018 15:46:58 GMT: INFO (as): (signal.c:214) call stack: frame 7: /usr/bin/asd(as_storage_namespace_init_ssd+0x4d7) [0x4e352c]
Feb 06 2018 15:46:58 GMT: INFO (as): (signal.c:214) call stack: frame 8: /usr/bin/asd(as_storage_init+0x69) [0x4d90e4]
Feb 06 2018 15:46:58 GMT: INFO (as): (signal.c:214) call stack: frame 9: /usr/bin/asd(main+0x2ef) [0x43d1c1]
Feb 06 2018 15:46:58 GMT: INFO (as): (signal.c:214) call stack: frame 10: /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fb2b4517c05]
Feb 06 2018 15:46:58 GMT: INFO (as): (signal.c:214) call stack: frame 11: /usr/bin/asd() [0x43c4c9]
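When the log is this long, it can help to pull out just the device-open failures. Below is a small sketch that looks only for the two message shapes quoted in this thread (the WARNING from the first post and the FAILED ASSERTION above); the function name is hypothetical and other Aerospike versions may word these messages differently.

```python
import re

# Patterns copied from the two messages quoted in this thread.
OPEN_WARNING = re.compile(r"unable to open device (?P<device>\S+): (?P<reason>.+)$")
OPEN_FAILED = re.compile(
    r"(?P<device>\S+): DEVICE FAILED open: errno (?P<errno>\d+) \((?P<reason>[^)]*)\)"
)


def find_device_open_errors(lines):
    """Return (device, reason) pairs for every device-open failure in `lines`."""
    errors = []
    for line in lines:
        match = OPEN_WARNING.search(line) or OPEN_FAILED.search(line)
        if match:
            errors.append((match.group("device"), match.group("reason").strip()))
    return errors
```

For example, `find_device_open_errors(open("/var/log/aerospike/aerospike.log"))` would list each device that failed to open along with the reported reason, using the log path from the configuration above.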
If you run as root, does Aerospike start? If yes, then it seems to be a permission issue; if no, let’s get it running as root first.
Assuming a permission issue, try following the second method mentioned in “Configure SSD Resources used by Namespace” at https://www.aerospike.com/docs/operations/configure/non_root which begins with “or add a udev rule to the Aerospike user…”
A bit of progress. I set up a test run on CentOS 7 with SELinux disabled and only HALF reproduced the issue. Adding the aerospike user to the disk group did not work; however, the udev rule below resulted in a successful start of the service.
KERNEL=="nvme[0-9]n1", OWNER="aerospike"
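As a quick sanity check on what that KERNEL pattern covers: udev matches kernel names with shell-style globs, and Python's `fnmatch` behaves the same way for a simple pattern like this one. The sketch below is only an approximation of udev's matcher, and the device names other than nvme0n1 and nvme1n1 are made-up examples.

```python
from fnmatch import fnmatchcase

KERNEL_PATTERN = "nvme[0-9]n1"  # pattern taken from the rule above


def rule_matches(kernel_name, pattern=KERNEL_PATTERN):
    """Return True if the kernel device name matches the glob pattern."""
    return fnmatchcase(kernel_name, pattern)


# nvme0n1 and nvme1n1 are the devices from the config; the rest are illustrative.
for name in ["nvme0n1", "nvme1n1", "nvme0n1p1", "nvme10n1", "xvda"]:
    print(name, rule_matches(name))
```

The two devices from the namespace config match; a partition such as nvme0n1p1 or a tenth controller such as nvme10n1 would not, so the pattern would need widening on hosts with partitions or more than ten NVMe devices.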
I returned to my AWS instance and, using the same udev rule as above, still get the permission denied error.
CentOS kernel details:
[centos@ip-10-0-0-150 ~]$ cat /etc/system-release && uname -a
CentOS Linux release 7.4.1708 (Core)
Linux ip-10-0-0-150.velox.adtheorent.com 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Aerospike AMI details:
[ec2-user@ip-10-0-0-92 ~]$ cat /etc/system-release && uname -a
Amazon Linux AMI release 2017.09
Linux ip-10-0-0-92 4.9.70-22.55.amzn1.x86_64 #1 SMP Wed Dec 20 23:36:28 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Yes, I made sure the udev rules were reloaded. On Amazon Linux, even rebooting the host after adding the udev rules results in the same permissions error.
What is incredibly interesting is that I just went to verify my 99-aerospike.rules file and it wasn’t there. After containing my anger, I created it again, verified its existence, reloaded the udev rules, triggered them, and THEN it started…
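Since the missing rules file turned out to be the culprit, a small idempotent helper can guard against it silently disappearing again. This is only a sketch: the directory /etc/udev/rules.d is an assumption (the thread names only the file 99-aerospike.rules), the function name is hypothetical, and it has to run as root on a real host.

```python
import os

RULE_LINE = 'KERNEL=="nvme[0-9]n1", OWNER="aerospike"'
# Assumed location; the thread only gives the file name "99-aerospike.rules".
DEFAULT_RULES_PATH = "/etc/udev/rules.d/99-aerospike.rules"

# udevadm invocations to reload the rules and re-apply them to existing devices.
RELOAD_COMMANDS = [
    ["udevadm", "control", "--reload-rules"],
    ["udevadm", "trigger"],
]


def ensure_rule(path=DEFAULT_RULES_PATH, rule=RULE_LINE):
    """Make sure `path` contains `rule`; return True if the file was changed."""
    existing = ""
    if os.path.exists(path):
        with open(path) as handle:
            existing = handle.read()
    if rule in existing.splitlines():
        return False  # rule already present, nothing to do
    with open(path, "a") as handle:
        if existing and not existing.endswith("\n"):
            handle.write("\n")
        handle.write(rule + "\n")
    return True
```

If `ensure_rule()` returns True, the commands in `RELOAD_COMMANDS` can be run in order (for instance with `subprocess.run(cmd, check=True)`) before starting asd, which mirrors the create, verify, reload and trigger sequence described above.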