We are using the Aerospike Node.js client.
Here are the versions we are running in production:
asd --version
Aerospike Enterprise Edition, build 3.5.8
npm ls | grep aerospike
Aerospike Node Client, version 1.0.50
Here are the errors returned by our production Aerospike clusters:
Error 1
{
  code: 14,
  message: 'AEROSPIKE_ERR_RECORD_BUSY',
  func: 'as_command_parse_result',
  file: 'src/main/aerospike/as_command.c',
  line: 916,
  query: {}
}
Error 2
{
  code: 9,
  message: 'Client timeout: timeout=1000 iterations=1 failedNodes=0 failedConns=0',
  func: 'as_command_execute',
  file: 'src/main/aerospike/as_command.c',
  line: 446
}
Error 3
{
  code: -1,
  message: 'Bad file descriptor',
  func: 'as_socket_read_limit',
  file: 'src/main/aerospike/as_socket.c',
  line: 413,
  input: {
    ns: 'Graph',
    set: 'SLT',
    key: '%14426123659267653m8s1804416150636',
    digest: <Buffer f6 ea a2 18 ab 51 70 cb 83 9a 3f 02 d2 69 50 d4 9e 05 83 a3>
  }
}
Is there any way to get more context on why these errors are happening, and how can we fix or prevent them?
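For illustration, here is a minimal sketch of how the client side could separate the errors that look transient (code 9, client timeout, and code 14, record busy) from the rest and retry only those with a backoff. The helper names `isRetryable` and `withRetry` are hypothetical and not part of the Aerospike client API, and treating codes 9 and 14 as safe to retry is an assumption that depends on whether the operation is idempotent. It also assumes a Node-style callback where a null error or `code === 0` means success.

```javascript
// Hypothetical helpers, not part of the Aerospike Node.js client API.
// The codes come from the errors quoted above.
const RETRYABLE_CODES = new Set([9, 14]); // 9 = client timeout, 14 = record busy

function isRetryable(err) {
  return Boolean(err) && RETRYABLE_CODES.has(err.code);
}

// `operation` is any function that takes a Node-style callback (err, result).
// Assumption: a null error or err.code === 0 means the call succeeded.
function withRetry(operation, options, done) {
  const maxAttempts = options.maxAttempts || 3;
  const baseDelayMs = options.baseDelayMs || 50;
  let attempt = 0;

  function run() {
    attempt += 1;
    operation(function (err, result) {
      if (!err || err.code === 0) {
        return done(null, result, attempt);
      }
      if (isRetryable(err) && attempt < maxAttempts) {
        // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
        const delayMs = baseDelayMs * Math.pow(2, attempt - 1);
        return setTimeout(run, delayMs);
      }
      // Record the attempt count on the error so logs show how often we retried.
      err.attempts = attempt;
      return done(err, undefined, attempt);
    });
  }

  run();
}

// Usage with a stand-in operation that fails once with "record busy", then succeeds.
let demoCalls = 0;
function demoOperation(cb) {
  demoCalls += 1;
  if (demoCalls < 2) {
    cb({ code: 14, message: 'AEROSPIKE_ERR_RECORD_BUSY' });
  } else {
    cb(null, { ok: true });
  }
}

withRetry(demoOperation, { maxAttempts: 3, baseDelayMs: 10 }, function (err, result, attempts) {
  console.log(err ? err.message : 'succeeded', 'after', attempts, 'attempt(s)');
});
```

Error 3 (code -1, 'Bad file descriptor') is deliberately left out of the retryable set, since it is reported from the socket layer rather than as a server status and we do not know whether retrying it is safe.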
Here is our aerospike.conf:
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 15000
}
logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}
network {
    service {
        address any
        port 3000
    }
    heartbeat {
        mode mesh
        port 3002
        mesh-seed-address-port XX.XX.XXX.102 3002 # X-ed out
        mesh-seed-address-port XX.XX.XXX.87 3002 # X-ed out
        interval 150
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
namespace Graph {
    replication-factor 2
    memory-size 100G
    storage-engine device {
        device /dev/sdb
        write-block-size 128k
    }
}
Here is the output of df -k on one of our nodes:
admin_augur_io@aerospike-w9wx:~$ df -k
Filesystem 1K-blocks Used Available Use% Mounted on
rootfs 10319160 1484920 8310056 16% /
udev 10240 0 10240 0% /dev
tmpfs 10748608 140 10748468 1% /run
/dev/disk/by-uuid/ac75b061-bf43-4298-9234-8a555ab0f9ac 10319160 1484920 8310056 16% /
tmpfs 5120 0 5120 0% /run/lock
tmpfs 21497200 0 21497200 0% /run/shm
/dev/sdb 12540912287890076416 12540912286858158908 0 100% /aerospike
admin_augur_io@aerospike-w9wx:~$ cat /var/log/messages | grep sdb
Sep 20 20:50:08 aerospike-w9wx kernel: [13293360.096019] EXT4-fs (sdb): last error at time 705507539: :16843009: inode 729977170: block 282578800148737
Sep 21 20:51:55 aerospike-w9wx kernel: [13379867.616034] EXT4-fs (sdb): last error at time 705507539: :16843009: inode 729977170: block 282578800148737
Sep 22 20:53:43 aerospike-w9wx kernel: [13466375.136020] EXT4-fs (sdb): last error at time 705507539: :16843009: inode 729977170: block 282578800148737
And here is the same output from our other node:
admin_augur_io@aerospike-behq:~$ df -k
Filesystem 1K-blocks Used Available Use% Mounted on
rootfs 10319160 1428364 8366612 15% /
udev 10240 0 10240 0% /dev
tmpfs 10748608 140 10748468 1% /run
/dev/disk/by-uuid/ac75b061-bf43-4298-9234-8a555ab0f9ac 10319160 1428364 8366612 15% /
tmpfs 5120 0 5120 0% /run/lock
tmpfs 21497200 0 21497200 0% /run/shm
/dev/sdb 12540912287890076416 12540912286858159544 0 100% /aerospike
There was no mention of sdb in the second node’s /var/log/messages.
Our Aerospike VMs use SSDs.
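The /dev/sdb row in both df dumps reports a 1K-block count of about 1.25e19, which is far beyond any physical disk, so the numbers df shows for that mount are probably not trustworthy. As a rough sketch, a check like the following could flag such rows automatically. The function name `findImplausibleFilesystems` and the 1 PiB threshold are assumptions chosen for illustration, and `BigInt` is used because the reported value exceeds JavaScript's safe integer range. The sketch only detects the symptom; it does not explain the cause.

```javascript
// Flag df -k rows whose reported size is physically implausible.
// The threshold is an arbitrary assumption: 2^40 1K-blocks = 1 PiB.
const MAX_PLAUSIBLE_KBLOCKS = 2n ** 40n;

function findImplausibleFilesystems(dfOutput) {
  return dfOutput
    .split('\n')
    .slice(1) // skip the header row
    .map((line) => line.trim().split(/\s+/))
    // Keep only well-formed rows: 6 columns with a numeric 1K-blocks field.
    .filter((cols) => cols.length >= 6 && /^\d+$/.test(cols[1]))
    .filter((cols) => BigInt(cols[1]) > MAX_PLAUSIBLE_KBLOCKS)
    .map((cols) => ({
      filesystem: cols[0],
      kblocks: BigInt(cols[1]),
      mount: cols[5],
    }));
}

// Two rows copied from the first node's df -k output above.
const sample = [
  'Filesystem 1K-blocks Used Available Use% Mounted on',
  'rootfs 10319160 1484920 8310056 16% /',
  '/dev/sdb 12540912287890076416 12540912286858158908 0 100% /aerospike',
].join('\n');

console.log(findImplausibleFilesystems(sample).map((fs) => fs.filesystem));
// prints [ '/dev/sdb' ]
```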