I have a great deal of experience with Aerospike in high-throughput, low-latency environments and have come to love it. However, I now have a need that doesn’t seem to be the focus of any popular NoSQL solution, so I am setting forth on a mission to accomplish it with Aerospike.
The goal is to create a widely dispersed cluster (across data centers and continents) of application servers that can each carry out their work independently, without depending on any other server. Each application server has everything it needs to complete its work; however, there is shared data that needs to be synchronized between all servers in near real-time, which I hope to accomplish using Aerospike. This solution therefore requires that every node hold a complete copy of the data.
The data set is small (< 10,000 records) and the transaction rate is low (~1 per minute). The code will run on the same physical server as the Aerospike daemon, so the client will read and write to Aerospike exclusively via the localhost IP of 127.0.0.1. The end solution achieves redundancy by having multiple application servers, so if any local dependency on a server fails, that entire application server is considered down.
My questions are:
Is this viable using Aerospike?
How can I configure the data to exist on every node? Is it simply a matter of setting “replication-factor” to the total number of nodes?
This solution isn’t using XDR; since the nodes exist across the Internet, my plan is to mesh all servers securely using SSH tunnels. This raises a problem, however: how do I configure Aerospike to cope with this unusually high latency between nodes?
I will attempt to update this thread with my progress and findings over time, as well as with any future questions or challenges that I encounter along the way.
Running a single Aerospike cluster that spans multiple regions is not a configuration Aerospike recommends, and the configuration defaults are not going to be ideal for it.
Yes, but be aware that replication-factor is not a dynamic configuration option, so you will probably want to set it a bit higher than the current total number of nodes to allow nodes to be added in the future without restarting the entire cluster to increase the replication factor.
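As a minimal sketch, assuming an in-memory namespace (the name “shared” and the memory size are just placeholders), the namespace stanza in aerospike.conf might look like this:

```
namespace shared {
    replication-factor 16   # set above the current node count to leave room for growth
    memory-size 1G          # the data set is small, so a modest allocation is fine
    default-ttl 0           # never expire records
    storage-engine memory   # keep the data purely in memory
}
```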
The first problem is that Aerospike’s defaults assume the cluster will be tightly coupled. By default, we time out a node if we do not hear a heartbeat from it within 1.5 seconds. You will need to increase the heartbeat interval and timeout to cope with the latency.
Be aware that increasing these values also increases the window where some partitions will not be available for writes due to node failure.
For such an environment you might start with an interval of 1000 and a timeout of 60, which will have each node send a heartbeat every second and time out a node if you haven’t heard a heartbeat from it in 60 intervals.
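As a rough sketch of what that could look like in aerospike.conf (mesh mode; the seed address/port should be whatever the local end of your SSH tunnel exposes, and 13002 below is purely a placeholder):

```
network {
    heartbeat {
        mode mesh
        port 3002
        # placeholder: the local end of an SSH tunnel to a remote node
        mesh-seed-address-port 127.0.0.1 13002

        interval 1000   # send a heartbeat every 1000 ms
        timeout 60      # consider a node gone after 60 missed intervals (~60 s)
    }
}
```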
Cluster nodes communicate with each other using our “fabric” layer. Again, the latency between nodes is typically expected to be low, so the default of 16 fabric-workers per node is enough to handle cross-node requests without waiting for a thread to free up. In your case this probably will not be true, so you will need to increase this value.
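If I remember right, fabric-workers sits in the service context, so raising it would look roughly like the snippet below; the value 32 is just an assumed starting point, not a documented recommendation:

```
service {
    # ... other service settings unchanged ...
    fabric-workers 32   # default is 16; raised to absorb slow cross-node round trips
}
```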
Another consideration:
You probably want your application reading from and writing to the local node. But our clients are not aware of this topology; some of them can be configured to read from replica copies, but they will read from them in a round-robin manner, which is probably not what you want.
A potential solution here is to configure the access-address to be 127.0.0.1 on each node. This will cause clients to only learn about that address. Your writes will suffer from this configuration, though, because they will need to be proxied to the primary copy of that record. Since your client is running on the same node, maybe this isn’t a deal breaker.
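A hedged sketch of that stanza, per node, in aerospike.conf:

```
network {
    service {
        address any
        port 3000
        access-address 127.0.0.1   # clients only learn this address, so each
                                   # application talks to its local node
    }
}
```

Since the replication factor covers every node, each node holds a replica of every record, so reads that are allowed to use replica copies can be served locally; writes still have to be proxied to the node that owns the primary copy, adding one cross-node round trip.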
A problem with running the application on the same node as the server is that the two will compete for the same resources. You can mitigate this to a certain degree by pinning the client and server to different NUMA nodes.
Our architecture actually has a similar layout, and we’re currently trying several different methods:
XDR between local clusters in each region, where many app servers use the local cluster and let AS handle the rest.
Separate application-managed replication, where the app handles it in a layer above the database; usually best for idempotent or high-latency work. We write to one location as a queue, then fan out writes back to each region. Good for syncing/sharing data, but a major headache for high availability, since catching up a disconnected cluster won’t be easy.
In-memory data fabric is the newest thing we’re using. Basically, IMDG/IMDF are in-memory caches that evolved like this: single-server cache > multiple-server shared cache > distributed caching + data structures > distributed caching + data + computing. Look at Hazelcast, GridGain, or Coherence for options. The IMDF code runs as a separate instance on every app server and can do local clustering with WAN transfer links. It can get complicated, but it’s a very cool solution if you need it.