After upgrading to Aerospike 3.10.1, I get a fatal error, “Error while enumerating network routes”, on CentOS. Version 3.8.3 runs fine on the same server, so this seems strange to me.
The only change I made to the default configuration was switching the heartbeat to mesh mode:
heartbeat {
    #mode multicast
    #multicast-group 239.1.99.222
    #port 9918
    # To use unicast-mesh heartbeats, remove the 3 lines above, and see
    # aerospike_mesh.conf for alternative.
    mode mesh
    interval 150
    timeout 10
}
However, the server then aborts with the following fatal error in the log:
Dec 02 2016 04:11:09 GMT: INFO (config): (cfg.c:3344) system file descriptor limit: 100000, proto-fd-max: 15000
Dec 02 2016 04:11:09 GMT: WARNING (cf:socket): (socket.c:1595) Error while sending netlink request: 71 (Protocol error)
Dec 02 2016 04:11:09 GMT: FAILED ASSERTION (cf:socket): (socket.c:1983) Error while enumerating network routes
Dec 02 2016 04:11:09 GMT: WARNING (as): (signal.c:210) SIGUSR1 received, aborting Aerospike Community Edition build 3.10.1 os el6
Dec 02 2016 04:11:09 GMT: INFO (as): (signal.c:214) call stack: found 10 frames
Dec 02 2016 04:11:09 GMT: INFO (as): (signal.c:214) call stack: frame 0: /usr/bin/asd(as_sig_handle_usr1+0x36) [0x4a9e67]
I then changed aerospike.conf to the following, but the issue persists:
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 15000
}

logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        address any
        port 3000
    }

    heartbeat {
        # mode multicast
        # address 239.1.99.222
        # port 9918
        # To use unicast-mesh heartbeats, remove the 3 lines above, and see
        # aerospike_mesh.conf for alternative.
        # mode mesh
        mode mesh
        address 10.16.5.71
        port 3002
        #mesh-seed-address-port 10.16.5.71 3002
        #port 3002 # Heartbeat port for this node.
        interval 150
        timeout 10
    }

    fabric {
        port 3001
    }

    info {
        port 3003
    }
}

namespace product {
    replication-factor 2
    memory-size 2G
    high-water-memory-pct 90
    high-water-disk-pct 90
    default-ttl 0 # 30 days, use 0 to never expire/evict.
    storage-engine device {
        file /opt/aerospike/data/data1.dat
        file /opt/aerospike/data/data2.dat
        file /opt/aerospike/data/data3.dat
        file /opt/aerospike/data/data4.dat
        file /opt/aerospike/data/data5.dat
        filesize 4g
        write-block-size 1M
        data-in-memory false # Store data in memory in addition to file.
        defrag-startup-minimum 10 # server needs at least 10%
    }
}

namespace test1 {
    replication-factor 1
    memory-size 1G
    storage-engine memory
}

namespace test2 {
    replication-factor 2
    memory-size 1G
    storage-engine device {
        file /opt/aerospike/data/test2.dat
        filesize 16G
        data-in-memory false
    }
}
Thank you for reporting this issue. Could you tell me the exact version of the Linux kernel on that machine? That is, what is the output when you run the following command there?
uname -a
The problem is triggered when the Aerospike server asks the Linux kernel, via a netlink request, for the network routing tables. On your machine this request fails (the log reports error 71, “Protocol error”). That is why I’d like to know the exact kernel version, so that we can try to reproduce the issue with the same kernel.
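To make the failing step more concrete, here is a minimal C sketch, not Aerospike's actual code, that asks the kernel for its IPv4 routes over an `AF_NETLINK`/`NETLINK_ROUTE` socket using an `RTM_GETROUTE` dump request. The function name `count_ipv4_routes` is hypothetical, and the sketch assumes Aerospike issues a broadly similar route-dump request (it may differ in detail, e.g. by also querying IPv6). It returns the number of routes the kernel reported across all routing tables, or a negative errno if sending or receiving fails; errno 71 is `EPROTO`, the “Protocol error” seen in the log.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: dumps the kernel's IPv4 routes via rtnetlink.
 * Returns the number of RTM_NEWROUTE messages received (all tables),
 * or a negative errno on failure. */
int count_ipv4_routes(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0)
        return -errno;

    /* Request = netlink header followed by a route message header. */
    struct {
        struct nlmsghdr nh;
        struct rtmsg rt;
    } req;
    memset(&req, 0, sizeof(req));
    req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
    req.nh.nlmsg_type = RTM_GETROUTE;
    req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req.nh.nlmsg_seq = 1;
    req.rt.rtm_family = AF_INET;

    struct sockaddr_nl kernel;
    memset(&kernel, 0, sizeof(kernel));
    kernel.nl_family = AF_NETLINK; /* nl_pid 0 addresses the kernel */

    if (sendto(fd, &req, req.nh.nlmsg_len, 0,
               (struct sockaddr *)&kernel, sizeof(kernel)) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }

    /* Buffer aligned for struct nlmsghdr, as in the netlink(7) example. */
    char buf[16384] __attribute__((aligned(__alignof__(struct nlmsghdr))));
    int routes = 0;

    for (;;) {
        ssize_t len = recv(fd, buf, sizeof(buf), 0);
        if (len < 0) {
            int err = errno;
            close(fd);
            return -err;
        }
        if (len == 0) { /* unexpected end of stream; our own choice of errno */
            close(fd);
            return -EPROTO;
        }
        /* A dump may span several datagrams; walk every message in this one. */
        for (struct nlmsghdr *nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
             nh = NLMSG_NEXT(nh, len)) {
            if (nh->nlmsg_type == NLMSG_DONE) {
                close(fd);
                return routes;
            }
            if (nh->nlmsg_type == NLMSG_ERROR) {
                struct nlmsgerr *e = NLMSG_DATA(nh);
                int err = e->error; /* kernel reports a negative errno */
                close(fd);
                return err < 0 ? err : routes;
            }
            if (nh->nlmsg_type == RTM_NEWROUTE)
                routes++;
        }
    }
}
```

Called from a small `main` on the affected CentOS machine, a healthy kernel would typically yield a positive count (the local table alone holds loopback routes), while a negative value would suggest the route dump itself is being rejected independently of Aerospike.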