We just installed Aerospike and configured a single node cluster.
We use Aerospike Community edition 3.12.1 on Ubuntu 16.04.
First of all, we found some problems with the new systemd service manager. It is not very reliable: sometimes when we try to stop the Aerospike service, it takes a long time and still does not shut down.
But the real problem is that this single-node cluster fails most of the time when we query it. For example, look at this ordinary aql command from the command line:
$ aql -h 172.18.50.50
2017-05-15 17:08:10 WARN Failed to connect to seed 172.18.50.50 3000. AEROSPIKE_ERR_TIMEOUT , 172.18.50.50:3000
Error -1: Failed to connect
The error log does not show any details about the error. We don't know where to look; we have tried several configuration changes but still have the same problem. Any help will be much appreciated!
What does your /etc/aerospike/aerospike.conf file look like?
Are you running aql from a server separate from the single-node cluster (@ 172.18.50.50)?
Can you ping 172.18.50.50 from the server on which you are running aql?
i.e., is this an Aerospike config issue or a network issue?
Everything is being done from the same server where Aerospike is running (172.18.50.50). I'm pretty sure this is an Aerospike config issue because the network works without problems (we are not even going outside; that IP belongs to the same server that aql is run on). And yes, ping works, as it is the same server:
$ ping 172.18.50.50
PING 172.18.50.50 (172.18.50.50) 56(84) bytes of data.
64 bytes from 172.18.50.50: icmp_seq=1 ttl=64 time=0.021 ms
64 bytes from 172.18.50.50: icmp_seq=2 ttl=64 time=0.028 ms
64 bytes from 172.18.50.50: icmp_seq=3 ttl=64 time=0.014 ms
This is the aerospike config file:
# Aerospike database configuration file.
service {
user root
group root
paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
pidfile /var/run/aerospike/asd.pid
proto-fd-max 15000
}
logging {
# Log file must be an absolute path.
file /var/log/aerospike/aerospike.log {
context any info
}
}
network {
service {
address 172.18.50.50
port 3000
access-address 172.18.50.50
}
heartbeat {
# mode multicast
# multicast-group 239.1.99.222
# port 9918
mode mesh
address 172.18.50.50
port 3002 # Heartbeat port for this node.
# List one or more other nodes, one ip-address & port per line:
#asd1
mesh-seed-address-port 172.18.50.9 3002
# To use unicast-mesh heartbeats, remove the 3 lines above, and see
# aerospike_mesh.conf for alternative.
interval 150
timeout 10
}
fabric {
port 3001
address 172.18.50.50
}
info {
port 3003
address 172.18.50.50
}
}
namespace recommender {
replication-factor 1
memory-size 8G
default-ttl 1m # 30 days, use 0 to never expire/evict.
conflict-resolution-policy last-update-time
storage-engine device {
file /var/data/data.dat
filesize 10G
data-in-memory true # Store data in memory in addition to file.
}
}
@fdnieves, the community forums can provide help on a best-effort basis. If you require urgent production-level support, you can set up a contract with Aerospike to help you. They are there 24x7 for any issues...
That being said, even though I'm not a member of the staff, I'm happy to help, but please keep in mind that these troubleshooting steps and updates may not be extremely timely...
I'm curious now to know if Aerospike is bound to that port at all. Can you run "netstat -tunap" and post the output? Maybe something else has the port?
Also, can you post the log file, so that we can see if anything stands out? Maybe we can catch something you missed.
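As a complement to netstat, here is a minimal Python sketch for checking whether anything accepts TCP connections on the ports from the posted config (3000 service, 3001 fabric, 3002 heartbeat, 3003 info). A successful ping only shows that ICMP gets through; it says nothing about whether a process is listening on a given TCP port. The helper name port_is_open and the 2-second default timeout are my own choices for illustration, not anything that ships with Aerospike.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only tests that something
    accepts the connection, not that the listener is actually asd.
    """
    try:
        # create_connection raises OSError (including timeouts and
        # "connection refused") when the connection cannot be made.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open("172.18.50.50", 3000) run on the node itself should return True if something is listening on the service port at that address. If it returns False while ping works, the problem is more likely on the service side than in the network.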
Thanks for your answer Albot. This is just a side-project development so no need to hire production level support right now.
The intermittent aql failures were due to a wrong configuration. As this is a single node (and this Aerospike only runs on this machine), the "mesh-seed-address-port 172.18.50.9 3002" line we had specified pointed to a nonexistent server. It seems it was interfering with the networking a little. After commenting out that line, aql connects every time without problems.
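For anyone hitting the same thing, here is a rough Python sketch of how one might spot this kind of leftover entry: it lists the active (uncommented) mesh-seed-address-port lines in a config and flags those that do not point at the node's own address. The function names mesh_seeds and foreign_seeds are hypothetical, and the parsing is a simplification that assumes one directive per line with "#" starting a comment, as in the config posted above.

```python
from typing import Iterable, List, Tuple


def mesh_seeds(conf_text: str) -> List[Tuple[str, int]]:
    """Return (address, port) for each uncommented mesh-seed-address-port line."""
    seeds = []
    for raw in conf_text.splitlines():
        # Drop anything after '#', so commented-out seeds are ignored.
        line = raw.split("#", 1)[0].strip()
        parts = line.split()
        if len(parts) == 3 and parts[0] == "mesh-seed-address-port":
            seeds.append((parts[1], int(parts[2])))
    return seeds


def foreign_seeds(conf_text: str, own_addresses: Iterable[str]) -> List[Tuple[str, int]]:
    """Return seeds whose address is not one of this node's own addresses."""
    own = set(own_addresses)
    return [(addr, port) for addr, port in mesh_seeds(conf_text) if addr not in own]
```

Run against the original config with own_addresses set to {"172.18.50.50"}, foreign_seeds would report ("172.18.50.9", 3002), the entry that had to be commented out. On a single-node setup, any seed it reports is worth checking by hand.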