Configuration documentation is a mess


#1

http://www.aerospike.com/docs/operations/troubleshoot/startup

network {
        service {
        ...
                node-id-interface p2p1
                address any
                port 3000
        }
        ...
}

 * Starting aerospike
Jan 10 2017 15:08:47 GMT: FAILED ASSERTION (config): (cfg.c:1349) line 33 :: unknown config parameter name 'node-id-interface'
Jan 10 2017 15:08:47 GMT: WARNING (as): (signal.c:153) SIGINT received, shutting down
Jan 10 2017 15:08:47 GMT: WARNING (as): (signal.c:156) startup was not complete, exiting immediately

OK, moving it into the service section.

Log:

Jan 10 2017 15:14:20 GMT: INFO (as): (as.c:423) <><><><><><><><><><>  Aerospike Community Edition build 3.11.0.1  <><><><><><><><><><>
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424) # Aerospike database configuration file for deployments using raw storage.
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424) service {
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     user root
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     group root
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     paxos-single-replica-limit 2 # Number of nodes where the replica count is automatically reduced to 1.
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     paxos-protocol v4
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     pidfile /var/run/aerospike/asd.pid
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     service-threads 2
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     transaction-queues 4
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     transaction-threads-per-queue 4
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     proto-fd-max 90000
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     paxos-max-cluster-size 16
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     node-id-interface eth0
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424) }
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424) cluster {
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     mode dynamic
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     self-group-id 1
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424) }
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424) logging {
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     # Log file must be an absolute path.
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     file /var/log/aerospike/aerospike.log {
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)         context any info
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     }
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424) }
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424) network {
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     service {
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)         address any
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)         port 3000
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)         access-address 172.31.34.199
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)    }
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     heartbeat {
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)         mode mesh
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)         port 3002
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)         mesh-seed-address-port 172.31.37.249 3002
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)         interval 150
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)         timeout 40
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     }
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     fabric {
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424) 	address 172.31.34.199
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)         port 3001
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     }
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     info {
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424) 	address 172.31.34.199
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)         port 3003
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     }
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424) }
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424) namespace primary {
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     replication-factor 2
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     memory-size 23G
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     default-ttl 90d
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     stop-writes-pct 90
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     high-water-memory-pct 85
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     high-water-disk-pct 80
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     write-commit-level-override master
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     # Warning - legacy data in defined raw partition devices will be erased.
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     # These partitions must not be mounted by the file system.
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     storage-engine device {
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)         # Use one or more lines like those below with actual device paths.
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)         device /dev/xvdf
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)         defrag-lwm-pct 50
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)         # The 2 lines below optimize for SSD.
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)         scheduler-mode noop
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)         write-block-size 1M
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424) 	data-in-memory true
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424)     }
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3424) }
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3442) system file descriptor limit: 100000, proto-fd-max: 90000
Jan 10 2017 15:14:20 GMT: INFO (cf:socket): (socket.c:2563) Node port 3001, node ID bb93d7f476c7206
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3470) Rack Aware mode enabled
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:4215) Cluster Mode Dynamic: Config IP address for Self Node
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:4218) Setting node ID to 2130706433 (0x7F000001) from IP address "127.0.0.1"
Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:3492) Node id bb900017f000001
Jan 10 2017 15:14:20 GMT: INFO (namespace): (namespace_ce.c:96) ns primary beginning COLD start

Cool, isn’t it? Setting node ID to 2130706433 (0x7F000001) from IP address “127.0.0.1”, although node-id-interface is set to eth0:

# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 06:72:6c:47:7f:3d
          inet addr:172.31.34.199  Bcast:172.31.47.255  Mask:255.255.240.0
          inet6 addr: fe80::472:6cff:fe47:7f3d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:39532 errors:0 dropped:0 overruns:0 frame:0
          TX packets:28860 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:31921870 (31.9 MB)  TX bytes:4659888 (4.6 MB)

Total mess

And ifconfig lo down followed by service aerospike start results in:

Jan 10 2017 15:35:01 GMT: INFO (config): (cfg.c:3442) system file descriptor limit: 100000, proto-fd-max: 90000
Jan 10 2017 15:35:01 GMT: INFO (cf:socket): (socket.c:2563) Node port 3001, node ID bb93d7f476c7206
Jan 10 2017 15:35:01 GMT: INFO (config): (cfg.c:3470) Rack Aware mode enabled
Jan 10 2017 15:35:01 GMT: INFO (config): (cfg.c:4215) Cluster Mode Dynamic: Config IP address for Self Node
Jan 10 2017 15:35:01 GMT: INFO (config): (cfg.c:4218) Setting node ID to 2887721671 (0xAC1F22C7) from IP address "172.31.34.199"
Jan 10 2017 15:35:01 GMT: INFO (config): (cfg.c:3492) Node id bb90001ac1f22c7

What the hell is the node-id-interface option supposed to mean, then? It is being ignored completely.
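
For anyone comparing the two log excerpts above: the decimal and hex values in the "Setting node ID" lines are just the IPv4 address read as a 32-bit big-endian integer. Here is a minimal Python sketch of that arithmetic. The function names are made up for illustration, and the layout of the full "Node id" string (hex fabric port, then a 16-bit group ID, then the 32-bit node ID) is an assumption inferred from the log lines in this thread, not taken from the Aerospike source.

```python
import ipaddress


def ipv4_to_node_id(addr: str) -> int:
    """Return the IPv4 address as a 32-bit big-endian integer."""
    return int(ipaddress.IPv4Address(addr))


def full_node_id(fabric_port: int, group_id: int, addr: str) -> str:
    """Build the 'Node id' string as it appears in the log.

    Assumed layout (inferred from the log output only):
    hex fabric port + 4 hex digits of group ID + 8 hex digits of node ID.
    """
    return f"{fabric_port:x}{group_id:04x}{ipv4_to_node_id(addr):08x}"


if __name__ == "__main__":
    # Fabric port 3001 and self-group-id 1 come from the config in this post.
    for addr in ("127.0.0.1", "172.31.34.199"):
        n = ipv4_to_node_id(addr)
        print(addr, n, f"0x{n:08X}", full_node_id(3001, 1, addr))
```

With loopback up, the values correspond to 127.0.0.1; with loopback down, they correspond to the eth0 address 172.31.34.199, which is what node-id-interface eth0 should have produced in the first place.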


#3

Thanks for reporting this issue. I am sorry that this bug is causing you frustration. Aerospike may be pretty powerful, but it also has quite a few configuration options and concepts to deal with. It’s particularly frustrating if you have to deal with a bug like this on top of it all, especially when it’s such a ridiculous bug. Sorry about this.

The issue is that when I refactored our network code to support IPv6, I broke dynamic cluster mode. It ignores the node-id-interface setting and, what’s worse, always picks the first IP address - be it v4 or v6; at least this part works! - of the first network interface reported by the Linux kernel: the loopback interface.
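
One rough way to check whether a node has been hit by this is to look at the "Setting node ID ... from IP address" line in its log and see whether the address is a loopback one. The Python sketch below assumes the log line format quoted in the first post; the regular expression and the function names are illustrative, not part of any Aerospike tool.

```python
import ipaddress
import re

# Matches the log line format quoted in the first post (an assumption:
# other server versions may word this line differently).
NODE_ID_LINE = re.compile(
    r'Setting node ID to \d+ \(0x[0-9A-Fa-f]+\) from IP address "([^"]+)"'
)


def node_id_source(log_text: str):
    """Return the IP address the node ID was derived from, or None."""
    match = NODE_ID_LINE.search(log_text)
    return ipaddress.ip_address(match.group(1)) if match else None


def derived_from_loopback(log_text: str) -> bool:
    """True if the node ID was derived from a loopback address."""
    addr = node_id_source(log_text)
    return addr is not None and addr.is_loopback


if __name__ == "__main__":
    line = ('Jan 10 2017 15:14:20 GMT: INFO (config): (cfg.c:4218) '
            'Setting node ID to 2130706433 (0x7F000001) '
            'from IP address "127.0.0.1"')
    print(derived_from_loopback(line))
```

If every node in the cluster reports a loopback-derived node ID, they will all end up with the same ID, which is why this matters for rack-aware setups.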

Unfortunately, there isn’t any workaround other than using static cluster mode at the moment. (I am assuming that you need to run Aerospike in rack-aware mode.)

Of course, you can always run Aerospike without it being rack-aware. That would work, too. Most of our customers do not use rack-awareness and thus this bug went unnoticed for this long. To turn off rack-awareness, simply remove the “cluster” configuration section and the “paxos-protocol v4” line.

Let me know how things go.

Thomas


#4

I need a rack-aware cluster, so the only option for me was to “down” the loopback interface on every Aerospike start/restart. Bug reported: https://github.com/aerospike/aerospike-server/issues/169


#5

Alternatively, you can use a static cluster configuration, where you specify the node ID yourself in the configuration file of each node.

http://www.aerospike.com/docs/operations/configure/network/rack-aware