Failing to connect to cluster


#1

Hi!

When I try to connect to my Aerospike server, like this:

as.NewClient("127.0.0.1", 3000)

I get this error:

Failed to connect to host(s): [127.0.0.1:3000]

I then tried adding an entry to /etc/hosts so that I could use a domain name instead:

as.NewClient("myserver.com", 3000)

and just get:

Failed to connect to host(s): [myserver.com:3000]

I’ve been able to track this down to the NodeValidator.setAliases method, in particular this line:

addresses, err := net.LookupHost(host.Name)

which returns an error if the given name isn’t a domain name. Unfortunately, it appears that even when the original name is a domain name, it gets resolved to an IP address first, and that IP address is then passed to net.LookupHost, which fails because an IP address isn’t a domain name.
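
One way to check this outside the client is a small standalone probe. The sketch below is not part of the Aerospike client; `probe` and `lookupReport` are hypothetical names introduced here. It only reports what net.ParseIP and net.LookupHost do with a given string on the local machine, and the result may differ by Go version and by which resolver (cgo or pure Go) the binary uses.

```go
package main

import (
	"fmt"
	"net"
)

// lookupReport records what the standard library does with one host string.
// Hypothetical type, used only for this diagnostic.
type lookupReport struct {
	Input     string
	IsLiteral bool     // true when net.ParseIP accepts the input as an IP address
	Addresses []string // addresses returned by net.LookupHost, nil on error
	Err       error    // error returned by net.LookupHost, if any
}

// probe runs net.ParseIP and net.LookupHost on host and collects the results.
func probe(host string) lookupReport {
	addrs, err := net.LookupHost(host)
	return lookupReport{
		Input:     host,
		IsLiteral: net.ParseIP(host) != nil,
		Addresses: addrs,
		Err:       err,
	}
}

func main() {
	// Compare an IP literal with a name that is normally defined in /etc/hosts.
	for _, h := range []string{"127.0.0.1", "localhost"} {
		r := probe(h)
		fmt.Printf("%-12s literal=%v addrs=%v err=%v\n", r.Input, r.IsLiteral, r.Addresses, r.Err)
	}
}
```

If the line for 127.0.0.1 shows a non-nil err, the lookup itself is rejecting the IP literal, which would match the failure described above.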

I’ve been able to successfully connect by making these changes:

diff --git a/node_validator.go b/node_validator.go
index dc769d1..71444c4 100644
--- a/node_validator.go
+++ b/node_validator.go
@@ -52,16 +52,23 @@ func newNodeValidator(cluster *Cluster, host *Host, timeout time.Duration) (*nod
 }
 
 func (ndv *nodeValidator) setAliases(host *Host) error {
-	addresses, err := net.LookupHost(host.Name)
-	if err != nil {
-		return err
-	}
-	aliases := make([]*Host, len(addresses))
-	for idx, addr := range addresses {
-		aliases[idx] = NewHost(addr, host.Port)
+	ip := net.ParseIP(host.Name)
+	if ip != nil {
+		aliases := make([]*Host, 1)
+		aliases[0] = NewHost(host.Name, host.Port)
+		ndv.aliases = aliases
+	} else {
+		addresses, err := net.LookupHost(host.Name)
+		if err != nil {
+			return err
+		}
+		aliases := make([]*Host, len(addresses))
+		for idx, addr := range addresses {
+			aliases[idx] = NewHost(addr, host.Port)
+		}
+		ndv.aliases = aliases
 	}
-	ndv.aliases = aliases
-	Logger.Debug("Node Validator has %d nodes.", len(aliases))
+	Logger.Debug("Node Validator has %d nodes.", len(ndv.aliases))
 	return nil
 }

This checks to see if the given host name is an IP address, and if it is it just uses it as its own alias rather than failing. I’m then able to connect to the cluster and read values from it.
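
The same idea can be sketched as a standalone function, independent of the client's Host and nodeValidator types. `resolveAliases` is a hypothetical helper that returns plain "host:port" strings; it is only meant to illustrate the check-for-IP-first logic of the diff, not to replace it.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// resolveAliases returns one "host:port" string per address that name maps to.
// An IP literal is used as its own alias without any lookup; anything else is
// resolved with net.LookupHost. Hypothetical helper, not the client's API.
func resolveAliases(name string, port int) ([]string, error) {
	p := strconv.Itoa(port)

	// IP literal: skip the lookup entirely.
	if net.ParseIP(name) != nil {
		return []string{net.JoinHostPort(name, p)}, nil
	}

	// Otherwise treat the name as a host name and resolve it.
	addrs, err := net.LookupHost(name)
	if err != nil {
		return nil, err
	}
	aliases := make([]string, len(addrs))
	for i, addr := range addrs {
		aliases[i] = net.JoinHostPort(addr, p)
	}
	return aliases, nil
}

func main() {
	aliases, err := resolveAliases("127.0.0.1", 3000)
	if err != nil {
		fmt.Println("resolve failed:", err)
		return
	}
	fmt.Println(aliases)
}
```

net.JoinHostPort is used so that IPv6 literals are bracketed correctly, for example "[::1]:3000".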

Thoughts? Is this a bug in the client or am I doing something stupid trying to connect to it with an IP address or domain name?


#2

You are not doing anything wrong. That’s how everybody connects to the database. I developed and work with the Go client on a daily basis on Mac, so this comes as a surprise.

It is probably a bug somewhere in the Go runtime, though I’d include your changes if it turns out that they work around it.

To help me investigate, could you please tell me which client version you’re using (master, or a tag), along with your Go compiler version and OS X version?


#3

I’m using the client master (as of commit 8f5f58f295a674f823ee193abf07f54a3b60d79c on Feb 17).

I’m on Yosemite:

$ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.10.2
BuildVersion:	14C109

With Go 1.3:

$ go version
go version go1.3 darwin/amd64

It’s working well for me with my fix (although I’ve changed it a little so there’s less of a code change) but I don’t know how that’ll work when there’s actually a cluster involved.

Thanks!


#4

We are using the same OS and build, though I currently use Go 1.4. Changing to Go 1.3 didn’t make a difference. Strange.

I’m going to check your code on several other machines, and if all tests pass, I’ll include it in the next release.

Please let me know if you run into any further roadblocks. Thanks.


#5

This looks like the same problem I had months ago, and I’m interested to know where you’re running this. Are you connecting to an Aerospike server inside a local VM? Are you building your program and running it inside a Linux VM? I never tried connecting directly from OS X because, at the time, the Aerospike cluster I was using to diagnose this problem was somewhere in our cloud infrastructure.

From what I saw when testing a few months ago, this issue came from cross-compiling to another architecture, where cgo is disabled by default. The underlying netgo DNS implementation seemed to be at fault: I saw an error generated on this line: https://code.google.com/p/go/source/browse/src/pkg/net/dnsclient_unix.go?name=go1.3.3#216 Are you also compiling for another architecture? (Or otherwise disabling cgo, or enabling the netgo build tag?)
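
To compare the pure Go resolver with the platform default without rebuilding, newer Go releases than the 1.3 and 1.4 discussed in this thread provide net.Resolver with a PreferGo field. The sketch below assumes such a release; `lookupWith` is a hypothetical helper, and the 2-second timeout is an arbitrary choice.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// lookupWith resolves host using either the pure Go resolver (preferGo=true)
// or the platform default (preferGo=false). Hypothetical helper for diagnosis.
func lookupWith(preferGo bool, host string) ([]string, error) {
	// Bound the lookup so a broken resolver cannot hang the program.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// PreferGo asks for Go's built-in resolver on platforms where it is available.
	r := &net.Resolver{PreferGo: preferGo}
	return r.LookupHost(ctx, host)
}

func main() {
	for _, preferGo := range []bool{false, true} {
		addrs, err := lookupWith(preferGo, "localhost")
		fmt.Printf("preferGo=%v addrs=%v err=%v\n", preferGo, addrs, err)
	}
}
```

If the two lines differ for the same host, the choice of resolver is a likely factor, which would be consistent with the cross-compilation theory above.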

Your solution looks like it would work for my issue! You can see its history on GitHub: https://github.com/aerospike/aerospike-client-go/issues/31