Access-address virtual vs dc-int-ext-ipmap for XDR

Hi,

I wanted to ask whether there is any inherent disadvantage to using “access-address (external IP) virtual” in the service sub-stanza of the Aerospike network configuration to achieve external/internal IP mapping for XDR. The setup I’m describing runs multiple clusters across AWS and GCE.

The XDR documentation doesn’t mention the ‘virtual’ access address as an alternative; instead it describes a heavy-handed approach of defining every possible public/private IP mapping with dc-int-ext-ipmap for the remote nodes XDR sends data to (a sketch of such a stanza follows the list below). This solution is not very elegant:

  • from a configuration management perspective, each node needs to know the internal and external IPs of every remote node in the other clusters, which is ugly
  • dc-int-ext-ipmap is defined as static in the configuration reference, so adding new nodes to a remote cluster requires all XDR services to be restarted
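Roughly, the stanza I mean looks like this (the datacenter name and all addresses are placeholders; I’ve elided the other xdr directives, which vary by server version):

xdr {
    # ... other xdr directives ...

    datacenter REMOTE_DC {
        # seed address of the remote cluster, as reachable from this side
        dc-node-address-port 203.0.113.10 3000

        # one mapping per remote node: <internal-ip> <external-ip>
        dc-int-ext-ipmap 10.0.1.10 203.0.113.10
        dc-int-ext-ipmap 10.0.1.11 203.0.113.11
        dc-int-ext-ipmap 10.0.1.12 203.0.113.12
    }
}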

With the following in the service stanza:

service {
    address any
    port 3000
    access-address [EXT IP] virtual
}

the Aerospike node serves local clients on the internal IP and lets remote XDR processes connect via the external IP. The only downside I have observed so far is that the Java client constantly tries (and fails) to connect to the external IP of a local Aerospike node, because that address gets added to its list of friends. While this pollutes the INFO/DEBUG logging, is there any risk or race condition that could lead to query failures, and is there any plan to enhance the client to be aware of addresses labelled as virtual?

The only problem is, as you mentioned, that your local clients and tools will learn of the public access-address. The Java client walks your seed list until it finds a seed that responds with a list of its neighbors. It then stops walking the seed list and uses the host that provided the neighbors, plus those neighbor nodes, for transactions. Since your clients cannot reach the public interface, there will only be a single valid address for them to talk to. The result will be degraded performance, caused by the one accessible server proxying all requests on behalf of the client.
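By ‘seed list’ I mean the hosts you pass when constructing the client; a minimal sketch with placeholder addresses:

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Host;
import com.aerospike.client.policy.ClientPolicy;

public class SeedListExample {
    public static void main(String[] args) {
        // The client tries these seeds in order until one answers with its
        // neighbor list, then uses that node plus its advertised neighbors.
        Host[] seeds = {
            new Host("10.0.1.10", 3000),
            new Host("10.0.1.11", 3000),
            new Host("10.0.1.12", 3000)
        };
        AerospikeClient client = new AerospikeClient(new ClientPolicy(), seeds);
        try {
            // ... transactions ...
        } finally {
            client.close();
        }
    }
}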

You can see the number of proxy (and other) transactions by running:

asadm -e "show latency"

Hi,

If I’m following correctly, you’re saying that there may only be a danger if the client can access the public IP of the Aerospike node? In our setup that’s not possible: we have 6 nodes and many clients; each client can reach the internal IP of any node in the cluster but continually tries to add the public IP of each.

My concern was about the client’s unnecessary retries against the public IPs; is there a potential risk there? A nice feature would be for the Aerospike server to inform clients which addresses are ‘virtual’ so that the client can be configured to filter them out.

Check the proxies in your cluster. If what I said is correct, then there should be a lot of proxies; normally there would be 0. The bootstrap algorithm walks your seed nodes until one responds with a list of neighbors, at which point it no longer walks the seed list but instead walks the list of neighbors. In your setup, the node that responded and the neighbors’ access-addresses will be the only nodes your client will try to communicate with.

Like the internal-to-external mapping in XDR, the Java client also has the capability to define an address translation.
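A rough sketch of how that translation would look, assuming the ipMap field on ClientPolicy (all addresses are placeholders):

import java.util.HashMap;
import java.util.Map;

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Host;
import com.aerospike.client.policy.ClientPolicy;

public class IpMapExample {
    public static void main(String[] args) {
        ClientPolicy policy = new ClientPolicy();

        // Map addresses the cluster advertises (the public access-addresses)
        // to addresses this client can actually reach (the internal IPs).
        Map<String, String> ipMap = new HashMap<String, String>();
        ipMap.put("203.0.113.10", "10.0.1.10");
        ipMap.put("203.0.113.11", "10.0.1.11");
        ipMap.put("203.0.113.12", "10.0.1.12");
        policy.ipMap = ipMap;

        // Seed with an address the client can reach directly.
        AerospikeClient client = new AerospikeClient(policy, new Host("10.0.1.10", 3000));
        try {
            // ... transactions ...
        } finally {
            client.close();
        }
    }
}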

We don’t see any proxy requests on our cluster at the moment, but what we have observed is that in this ‘access-address virtual’ configuration, on occasion we’ll start seeing proxied requests out of the blue. When we restart a client, the proxied requests stop being registered on the server. Is this a consequence of that configuration type?

Within a local cluster the Aerospike nodes can only talk to each other on the internal addresses, so each node should have a list of both the internal and external neighbour IPs? The tend process on the client runs every second and produces logs like the following:

20150605T153013.586+0000 application-server DEBUG [tend] Alias xx.xx.xx.91:3000 failed: Error Code 11: java.net.SocketTimeoutException: connect timed out  [application-server.AerospikeModule$1.log() @ 52]
20150605T153013.586+0000 application-server WARN  [tend] Add node xx.xx.xx.91:3000 failed: Error Code 11: java.net.SocketTimeoutException: connect timed out  [application-server.AerospikeModule$1.log() @ 46]
20150605T153015.591+0000 application-server DEBUG [tend] Alias xx.xx.xx.6:3000 failed: Error Code 11: java.net.SocketTimeoutException: connect timed out  [application-server.AerospikeModule$1.log() @ 52]
20150605T153015.591+0000 application-server WARN  [tend] Add node xx.xx.xx.6:3000 failed: Error Code 11: java.net.SocketTimeoutException: connect timed out  [application-server.AerospikeModule$1.log() @ 46]

etc.
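For what it’s worth, that noise is easy enough to drop in the log callback we already register. A rough sketch, assuming the client’s com.aerospike.client.Log API:

import com.aerospike.client.Log;

public class ClientLogFilter {
    public static void install() {
        Log.setLevel(Log.Level.DEBUG);
        Log.setCallback(new Log.Callback() {
            @Override
            public void log(Log.Level level, String message) {
                // Tend-thread connect failures against the unreachable public
                // (virtual) addresses are expected in this setup, so drop them.
                if (message.contains("connect timed out")) {
                    return;
                }
                System.out.println(level + " " + message);
            }
        });
    }
}

Of course that would also hide genuine connect failures, so it’s a workaround rather than a fix.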

Right now I’m trying to keep the configuration overhead per server to a minimum, and if virtual addresses can support XDR’s needs and let the cluster scale without having to maintain the XDR mappings, I’d like to confirm that access-address virtual works for us. It doesn’t look like the aerospike-client differentiates between the internal and virtual addresses it discovers, but if it could label virtual addresses, that would solve this potential issue.

Run asadm -e "asinfo -v services"

This will return the IP addresses that each node will advertise to the client. Notice that it will only be the access-addresses that you have defined.
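You can also ask a node directly from the Java client with an info call; a quick sketch (the host and port are placeholders, and I’m assuming the static Info.request helper):

import com.aerospike.client.Info;

public class ServicesCheck {
    public static void main(String[] args) {
        // Ask one node which peer addresses it advertises to clients.
        // With access-address set, these will be the public IPs.
        String services = Info.request("10.0.1.10", 3000, "services");
        System.out.println(services);
    }
}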

If you were to add a new host to the cluster, your clients would not find it unless you added the server to your client configuration. And even then, I would expect the client not to walk that entire list, but that expectation is contradicted by the fact that you are not seeing proxies, so I will need to look into that further.

Ah, well then access-address virtual is fine to support scaling clusters with XDR but dangerous for application clients.

I ran asinfo -v services and the only IPs listed by all nodes in the cluster were the public IPs. This suggests that the only way the application client can learn the internal IPs is from the seed list supplied at client creation (fortunately, our application provides a list of all the nodes’ internal IPs).

This means that we have no choice but to use dc-int-ext-ipmap for multi-DC configurations. Will XDR writes proxy to the correct host when a remote cluster is scaled up? If so, then there is no risk of replication failure between the time a new node is added and the time the XDR service is restarted with the new node’s IP mapping.

Correct, the XDR writes will proxy to the appropriate nodes.

An alternative approach to the whole thing is to set up a VLAN between source and destination, if that’s possible. You can make the destination servers listen on the VLAN-assigned IPs, which are accessible from the source cluster. Then you don’t need the internal-to-external mapping at all, and the configuration becomes much simpler: you don’t need to specify all the IP addresses of the destination cluster in the XDR config at the source, because XDR can discover the cluster from a seed IP address.
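In that case the datacenter stanza at the source shrinks to just a seed entry or two (the addresses below are placeholders):

xdr {
    # ... other xdr directives ...

    datacenter REMOTE_DC {
        # one or two seeds are enough; XDR discovers the rest of the cluster
        dc-node-address-port 10.200.0.10 3000
        dc-node-address-port 10.200.0.11 3000
    }
}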

Thank you for the follow-up. The fact that XDR can have its forwarded writes proxied when the cluster size changes makes dc-int-ext-ipmap workable for us in a cloud environment. The configuration overhead is still undesirable, though, so if there are planned improvements to service discoverability I’d be happy to hear about them :wink:

The VLAN approach would certainly allow us to address remote machines as if they were on a common internal network, but at this stage, with the volumes of data we’re syncing, it would introduce a bottleneck and a point of failure through our VPN tunnel links. It’s an idea that’s on our roadmap, though.