Aerospike PHP client library

Using PHP client version 3.3.2, when the Aerospike node is down we still get true from $db->isConnected(). We have an nginx + php-fpm setup that writes to Aerospike.

That method is unfortunately named. It would be more accurate to call it hasConnected, because it is just a flag set to true or false right after you first connect to the server. It doesn't check or re-establish a connection.

I would only use it right after calling $client = new Aerospike($config); to check for connection errors. Later, any operation such as exists, get or put will re-establish the connection if it was dropped, and return an error status if it cannot connect.

I’m still considering whether to rename the method or add code to it to actually test the connections.

Thanks for the reply. Interesting… Since we're using persistent connections, it is a problem then.

Until each of my php-fpm children is killed, it keeps throwing errors.

I noticed that setting persistence to false solves this issue.

Is there a workaround that keeps persistence and still lets us know the connection status?

That’s not how it works. Each FPM child is its own PHP process and does not share a connection with the other processes. If you choose persistent connections, all the requests sent to the same process will reuse the connection made on the first request that process received after it was spun up. Again, persistent connections mean that the process doesn’t open and close its connections to the Aerospike server on each request; it keeps them open.

The only thing you can possibly share is the cluster tending thread, if you configure aerospike.shm.use=true in your php.ini.

When a node goes down, as long as there are other nodes still up, your cluster will re-form. The cluster tending thread will recognize within a second that this event happened, get the new partition map, and start sending reads and writes to the correct node. In the interim, between the new cluster forming automatically and the cluster tending thread learning about the change, your app’s requests will get proxied to the correct node.

How many nodes do you have in the cluster, and what does the code you mention look like? Can you paste it here?

Agreed. We have very transient data, so a single r3.xlarge instance, i.e. a 1-node cluster, satisfies our requirements. What happens when the cluster tend thread determines that this 1-node cluster is down? Will the isConnected() call correctly report it?

// create aerospike connection
$opts = array(Aerospike::OPT_CONNECT_TIMEOUT => 50);
$config = array(
    "hosts" => array(
        array("addr" => "10.0.1.30", "port" => 3000)
    )
);
// use a persistent connection
$db = new Aerospike($config, true, $opts);
if (!$db->isConnected()) {
    $use_kafka = true;
} else {
    $use_kafka = false;
}

We use Kafka as a failover for Aerospike, because Aerospike crashes on us once in a while.
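Since isConnected() is only set when the connection is first made, the status each operation returns is a more reliable failover trigger. The following is a minimal sketch, not part of the client API: `writeWithFailover()` and `STATUS_OK` are hypothetical names, and `STATUS_OK = 0` assumes the value of `Aerospike::OK`, which appears as `int(0)` in the output later in this thread. The Aerospike write and the Kafka producer are passed in as callables so the sketch runs without either extension.

```php
<?php
declare(strict_types=1);

// Assumption: 0 is the success status, i.e. the value of Aerospike::OK.
// Defined locally so this sketch runs without the aerospike extension.
const STATUS_OK = 0;

/**
 * Hypothetical helper: try the Aerospike write first and fall back only when
 * that write itself reports a failure, instead of trusting isConnected().
 *
 * @param callable $primaryWrite  returns an int status, e.g. fn() => $db->put($key, $bins)
 * @param callable $fallbackWrite performs the failover write, e.g. your Kafka producer call
 * @return bool true if the primary write succeeded, false if the fallback was used
 */
function writeWithFailover(callable $primaryWrite, callable $fallbackWrite): bool
{
    $status = $primaryWrite();
    if ($status === STATUS_OK) {
        return true;
    }
    // The write failed (for example with a timeout), so hand the record to the fallback.
    $fallbackWrite();
    return false;
}
```

With the client from the question, a call might look like `writeWithFailover(fn() => $db->put($key, $bins), fn() => sendToKafka($bins))`, where `sendToKafka()` is a hypothetical stand-in for whatever Kafka producer code you already have. The decision is then made per write, so a persistent php-fpm worker recovers on its own once Aerospike answers again.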

The following script checks on a single record repeatedly, with a pause of 0.5s between iterations. If the check fails, it attempts to read the record from a node holding its replica.

<?php
$config = ['hosts'=>[['addr'=>'192.168.119.3','port'=>3000]]];
$db = new Aerospike($config, true);
var_dump("connected? ",$db->isConnected());
$key = $db->initKey('test','foo', 15537250);
$i = 1;
while (1) {
    echo "iteration: $i\n";
    $status = $db->exists($key, $meta);
    if ($status !== Aerospike::OK) {
        var_dump("error!", $db->isConnected(),$status, $db->error(),"========================");
        // let's try to read from the replica now
        $status = $db->exists($key, $meta, [Aerospike::OPT_POLICY_REPLICA => Aerospike::POLICY_REPLICA_ANY]);
        if ($status !== Aerospike::OK) {
            var_dump("failed to read from replica.", $db->error());
        } else {
            var_dump("got it from the replica:", $meta);
        }
    } else {
        var_dump($status, $meta);
    }
    time_nanosleep(0, 500000000);
    $i++;
}
?>

I then take down one of the nodes in a two-node cluster (the one holding the master copy of the record) and get the following output:

iteration: 40
int(0)
array(2) {
  ["generation"]=>
  int(1)
  ["ttl"]=>
  int(2377130)
}
iteration: 41
string(6) "error!"
bool(true)
int(9)
string(69) "Client timeout: timeout=1000 iterations=2 failedNodes=0 failedConns=0"
string(24) "========================"
string(24) "got it from the replica:"
array(2) {
  ["generation"]=>
  int(1)
  ["ttl"]=>
  int(2377130)
}
iteration: 42
string(6) "error!"
bool(true)
int(9)
string(69) "Client timeout: timeout=1000 iterations=2 failedNodes=0 failedConns=0"
string(24) "========================"
string(24) "got it from the replica:"
array(2) {
  ["generation"]=>
  int(1)
  ["ttl"]=>
  int(2377129)
}
iteration: 43
string(6) "error!"
bool(true)
int(9)
string(69) "Client timeout: timeout=1000 iterations=2 failedNodes=0 failedConns=0"
string(24) "========================"
string(24) "got it from the replica:"
array(2) {
  ["generation"]=>
  int(1)
  ["ttl"]=>
  int(2377129)
}
iteration: 44
int(0)
array(2) {
  ["generation"]=>
  int(1)
  ["ttl"]=>
  int(2377128)
}
iteration: 45
int(0)
array(2) {
  ["generation"]=>
  int(1)
  ["ttl"]=>
  int(2377128)
}

As you can see, this is one way to handle a temporary connection timeout on a read operation. I could also have slept for a while and tried again.
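The sleep-and-retry alternative mentioned above can be sketched as a small wrapper. This is an illustration, not part of the client: `retryWithPause()` and `STATUS_OK` are hypothetical, and `STATUS_OK = 0` assumes the `int(0)` success status shown in the output. The operation is passed in as a callable so the sketch runs without the aerospike extension.

```php
<?php
declare(strict_types=1);

// Assumption: 0 is the success status (Aerospike::OK), as in the output above.
const STATUS_OK = 0;

/**
 * Hypothetical helper: run $operation up to $maxAttempts times, sleeping
 * $pauseMicros microseconds between failed attempts. Returns the last status.
 */
function retryWithPause(callable $operation, int $maxAttempts, int $pauseMicros): int
{
    if ($maxAttempts < 1) {
        throw new InvalidArgumentException('maxAttempts must be at least 1');
    }
    $status = STATUS_OK;
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $status = $operation();
        if ($status === STATUS_OK) {
            break;
        }
        // Give a transient condition (e.g. a node change) time to settle,
        // but don't sleep after the final failed attempt.
        if ($attempt < $maxAttempts) {
            usleep($pauseMicros);
        }
    }
    return $status;
}
```

Because `exists()` fills `$meta` by reference, the closure has to capture it by reference too, e.g. `retryWithPause(function () use ($db, $key, &$meta) { return $db->exists($key, $meta); }, 3, 500000);`. Retrying adds latency on every failure; the replica read in the script above avoids that wait.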

To answer your question: no, it will not report a state change. It only changes state on connection, reconnect() or close(). I’m considering changing that soon because it’s misleading. I also plan to add what we’ve discussed to the documentation.

I hope the code example above helps. However, a 1-node cluster in production is no better than using memcached. The point of Aerospike is that it provides speed at scale; if you don’t need scale or durability, it might be the wrong choice. I would recommend a two-node cluster of small instances over a single one. If you insist on a single node, you can simply use a counter to track how many connection failures occur in a short period. Once the count passes a certain threshold, you can assume the node is down. As the example shows, the client keeps track and tries to reconnect.
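The failure-counter idea above can be sketched as a sliding window of failure timestamps. `FailureCounter` and its methods are hypothetical, and the threshold and window values are placeholders to tune. One caveat: PHP userland objects do not survive between requests, even when the Aerospike connection is persistent, so under php-fpm this counter only lives for one request (or for a long-running loop like the test script above). Counting across requests would need shared storage such as APCu, which is an assumption not covered in this thread.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical failure tracker: records failure timestamps and reports the node
 * as presumed down once more than $threshold failures fall inside $windowSeconds.
 */
final class FailureCounter
{
    /** @var float[] timestamps (in seconds) of recent failures */
    private array $failures = [];
    private int $threshold;
    private float $windowSeconds;

    public function __construct(int $threshold, float $windowSeconds)
    {
        $this->threshold = $threshold;
        $this->windowSeconds = $windowSeconds;
    }

    // Call after an operation returns a non-OK status; $now is injectable for testing.
    public function recordFailure(?float $now = null): void
    {
        $now = $now ?? microtime(true);
        $this->failures[] = $now;
        $this->prune($now);
    }

    // Call after a successful operation: the node answered, so start counting over.
    public function recordSuccess(): void
    {
        $this->failures = [];
    }

    // True once the number of failures inside the window passes the threshold.
    public function presumedDown(?float $now = null): bool
    {
        $this->prune($now ?? microtime(true));
        return count($this->failures) > $this->threshold;
    }

    // Drop failures that are older than the window.
    private function prune(float $now): void
    {
        $cutoff = $now - $this->windowSeconds;
        $this->failures = array_values(array_filter(
            $this->failures,
            static fn(float $t): bool => $t >= $cutoff
        ));
    }
}
```

In use, you would call `recordFailure()` whenever an operation returns a status other than `Aerospike::OK`, call `recordSuccess()` otherwise, and switch to the fallback (Kafka, in this thread) while `presumedDown()` returns true.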

We leverage the 80 GB SSD along with the 30 GB of RAM of an r3.xlarge. We also do batch reads using a secondary index, which Aerospike supports well. Using smaller instances is something I’ll explore. Thanks.