Error while enumerating network routes in Aerospike 3.10.1


#1

After upgrading to Aerospike 3.10.1 on CentOS, the server aborts with the fatal error “Error while enumerating network routes”. Version 3.8.3 runs properly on the server, so this is puzzling. The only change I made to the default config was switching the heartbeat to mesh mode:

heartbeat {
                #mode multicast
                #multicast-group 239.1.99.222
                #port 9918

                # To use unicast-mesh heartbeats, remove the 3 lines above, and see
                # aerospike_mesh.conf for alternative.
                mode mesh
                interval 150
                timeout 10
        }

But I get the following fatal error in the log:

Dec 02 2016 04:11:09 GMT: INFO (config): (cfg.c:3344) system file descriptor limit: 100000, proto-fd-max: 15000
Dec 02 2016 04:11:09 GMT: WARNING (cf:socket): (socket.c:1595) Error while sending netlink request: 71 (Protocol error)
Dec 02 2016 04:11:09 GMT: FAILED ASSERTION (cf:socket): (socket.c:1983) Error while enumerating network routes
Dec 02 2016 04:11:09 GMT: WARNING (as): (signal.c:210) SIGUSR1 received, aborting Aerospike Community Edition build 3.10.1 os el6
Dec 02 2016 04:11:09 GMT: INFO (as): (signal.c:214) call stack: found 10 frames
Dec 02 2016 04:11:09 GMT: INFO (as): (signal.c:214) call stack: frame 0: /usr/bin/asd(as_sig_handle_usr1+0x36) [0x4a9e67]

Could you kindly provide some clues? Thanks.
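
For anyone triaging a similar log, here is a minimal Python sketch that pulls the WARNING and FAILED ASSERTION entries out of log lines in the format shown above. The regular expression and the helper names (`parse_line`, `problems`) are assumptions made for illustration, inferred only from the sample lines in this post; they are not part of any Aerospike tooling.

```python
import re

# Pattern inferred from the sample lines in this post, e.g.
# "Dec 02 2016 04:11:09 GMT: WARNING (cf:socket): (socket.c:1595) Error while ..."
LOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3} \d{2} \d{4} \d{2}:\d{2}:\d{2} GMT): "
    r"(?P<severity>[A-Z ]+) "
    r"\((?P<context>[^)]+)\): "
    r"\((?P<source>[^:)]+):(?P<line>\d+)\) "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def problems(lines):
    """Return the parsed entries whose severity is WARNING or FAILED ASSERTION."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e and e["severity"] in ("WARNING", "FAILED ASSERTION")]


# Sample lines copied from the log excerpt above.
SAMPLE = [
    "Dec 02 2016 04:11:09 GMT: INFO (config): (cfg.c:3344) system file descriptor limit: 100000, proto-fd-max: 15000",
    "Dec 02 2016 04:11:09 GMT: WARNING (cf:socket): (socket.c:1595) Error while sending netlink request: 71 (Protocol error)",
    "Dec 02 2016 04:11:09 GMT: FAILED ASSERTION (cf:socket): (socket.c:1983) Error while enumerating network routes",
    "Dec 02 2016 04:11:09 GMT: WARNING (as): (signal.c:210) SIGUSR1 received, aborting Aerospike Community Edition build 3.10.1 os el6",
]

if __name__ == "__main__":
    for entry in problems(SAMPLE):
        print(entry["severity"], entry["source"], entry["line"], entry["message"])
```

In this log, the first problem entry is the netlink warning from socket.c, which comes immediately before the failed assertion.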


#2

I changed aerospike.conf to the following, but the issue is still there.

service {
        user root
        group root
        paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
        pidfile /var/run/aerospike/asd.pid
        service-threads 4
        transaction-queues 4
        transaction-threads-per-queue 4
        proto-fd-max 15000
}

logging {
        # Log file must be an absolute path.
        file /var/log/aerospike/aerospike.log {
                context any info
        }
}

network {
        service {
                address any
                port 3000
        }

        heartbeat {
                # mode multicast
                # address 239.1.99.222
                # port 9918

                # To use unicast-mesh heartbeats, remove the 3 lines above, and see
                # aerospike_mesh.conf for alternative.

                # mode mesh
                mode mesh
                address 10.16.5.71
                port 3002
                #mesh-seed-address-port 10.16.5.71 3002
                #port 3002 # Heartbeat port for this node.
                interval 150
                timeout 10
        }

        fabric {
                port 3001
        }

        info {
                port 3003
        }
}

namespace product {
        replication-factor 2
        memory-size 2G
        high-water-memory-pct 90
        high-water-disk-pct 90
        default-ttl 0 # Use 0 to never expire/evict.
        storage-engine device {
                file /opt/aerospike/data/data1.dat
                file /opt/aerospike/data/data2.dat
                file /opt/aerospike/data/data3.dat
                file /opt/aerospike/data/data4.dat
                file /opt/aerospike/data/data5.dat
                filesize 4g
                write-block-size 1M
                data-in-memory false # Store data in memory in addition to file.
                defrag-startup-minimum 10 # server needs at least 10%
        }
}

namespace test1 {
        replication-factor 1
        memory-size 1G
        storage-engine memory
}

namespace test2 {
        replication-factor 2
        memory-size 1G

        storage-engine device {
                file /opt/aerospike/data/test2.dat
                filesize 16G
                data-in-memory false
        }
}

#3

Thank you for reporting this issue. May I ask what the exact version of the Linux kernel on that machine is? I.e., what is the output when you run the

uname -a

command on the machine?

The problem is triggered when the Aerospike server tries to communicate with the Linux kernel in order to read the network routing tables. For some reason, this fails on your machine. Hence I’d like to know the exact version of the Linux kernel, so that we can try to reproduce the issue with that same kernel.
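
To illustrate the kind of kernel interaction involved, here is a minimal Python sketch that asks the Linux kernel for its routing table over a netlink socket. It is not the server's actual code (the server is written in C), and the helper names `build_route_dump_request` and `enumerate_routes` are made up for this example. The constants are the standard rtnetlink values from the Linux headers. If the kernel rejects the request, the sketch raises an `OSError`, which would be the Python-side analogue of the "Protocol error" in the log.

```python
import os
import socket
import struct

NETLINK_ROUTE = 0     # rtnetlink protocol
RTM_GETROUTE = 26     # request type: read routing table entries
NLM_F_REQUEST = 0x01
NLM_F_DUMP = 0x300    # NLM_F_ROOT | NLM_F_MATCH: return all entries
NLMSG_ERROR = 2
NLMSG_DONE = 3
NLMSG_HDRLEN = 16     # sizeof(struct nlmsghdr)


def build_route_dump_request(seq=1):
    """Build an RTM_GETROUTE dump request: struct nlmsghdr followed by struct rtmsg."""
    # struct rtmsg: eight u8 fields (family first) plus a u32 flags field, 12 bytes.
    payload = struct.pack("=BBBBBBBBI", socket.AF_UNSPEC, 0, 0, 0, 0, 0, 0, 0, 0)
    # struct nlmsghdr: total length, type, flags, sequence number, sender port id.
    header = struct.pack("=IHHII", NLMSG_HDRLEN + len(payload), RTM_GETROUTE,
                         NLM_F_REQUEST | NLM_F_DUMP, seq, 0)
    return header + payload


def enumerate_routes(timeout=2.0):
    """Return how many route messages the kernel sent back; raise OSError on failure."""
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_ROUTE) as sock:
        sock.settimeout(timeout)
        sock.bind((0, 0))                                  # let the kernel pick our port id
        sock.sendto(build_route_dump_request(), (0, 0))    # port id 0 is the kernel
        count = 0
        while True:
            data = sock.recv(65536)
            offset = 0
            while offset + NLMSG_HDRLEN <= len(data):
                length, msg_type = struct.unpack_from("=IH", data, offset)
                if length < NLMSG_HDRLEN:
                    raise OSError("malformed netlink message")
                if msg_type == NLMSG_DONE:
                    return count
                if msg_type == NLMSG_ERROR:
                    # The payload starts with a negative errno value.
                    err = -struct.unpack_from("=i", data, offset + NLMSG_HDRLEN)[0]
                    raise OSError(err, os.strerror(err))
                count += 1
                offset += (length + 3) & ~3                # messages are 4-byte aligned


if __name__ == "__main__":
    try:
        print("route messages received:", enumerate_routes())
    except (OSError, AttributeError) as exc:               # AttributeError: no AF_NETLINK
        print("route enumeration failed:", exc)
```

Running this on the affected machine and on a machine where the server starts normally may help show whether the failure is specific to that kernel.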

Sorry for the inconvenience!

Thomas