Node Not Found For Partition using AQL with Strong Consistency

Problem Description

When a namespace is configured for strong consistency, a test insert into that namespace fails, while an insert from the same client into an AP namespace succeeds.

Admin> show config like strong-consistency
~~~~~~~~~~~~~~~~~~~~~~~~~~~test Namespace Configuration (2020-03-31 17:41:24 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                            :   10342564f2bd:3000   172.17.0.4:3000   172.17.0.5:3000   172.17.0.6:3000
strong-consistency              :   false               false             false             false
strong-consistency-allow-expunge:   false               false             false             false

~~~~~~~~~~~~~~~~~~~~~~~~~~~bar Namespace Configuration (2020-03-31 17:41:24 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~
NODE                            :   10342564f2bd:3000   172.17.0.4:3000   172.17.0.5:3000   172.17.0.6:3000
strong-consistency              :   true                true              true              true
strong-consistency-allow-expunge:   false               false             false             false

Admin>

...

root@10342564f2bd:/# aql
Seed:         127.0.0.1
User:         None
Config File:  /etc/aerospike/astools.conf /root/.aerospike/astools.conf
Aerospike Query Client
Version 3.23.0
C Client Version 4.6.9
Copyright 2012-2019 Aerospike. All rights reserved.
aql> insert into test.testset (PK,value1) values(1,'value1')
OK, 1 record affected.

aql> insert into bar.testset (PK,value1) values(1,'value1')
Error: (-8) Node not found for partition bar:501

aql>

Explanation

This issue occurs when the roster has not been set for the strongly consistent namespace. If the roster is not set, the cluster cannot know which nodes to distribute the namespace data across, so partitions cannot be assigned. Displaying the partition map for the example cluster above shows that no partitions are mapped to nodes for the strongly consistent namespace (bar).

Admin> show pmap
~~~~~~~~~~~~~~~~~~~~~~~~~Partition Map Analysis (2020-03-31 17:43:04 UTC)~~~~~~~~~~~~~~~~~~~~~~~~
     Cluster   Namespace                Node      Primary    Secondary         Dead   Unavailable
         Key           .                   .   Partitions   Partitions   Partitions    Partitions
179FDC193C39   bar         10342564f2bd:3000            0            0            0             0
179FDC193C39   bar         172.17.0.5:3000              0            0            0             0
179FDC193C39   bar         172.17.0.4:3000              0            0            0             0
179FDC193C39   bar         172.17.0.6:3000              0            0            0             0
179FDC193C39   test        10342564f2bd:3000         1024         1024            0             0
179FDC193C39   test        172.17.0.5:3000           1024         1024            0             0
179FDC193C39   test        172.17.0.4:3000           1024         1024            0             0
179FDC193C39   test        172.17.0.6:3000           1024         1024            0             0
Number of rows: 8

Admin>
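
As a rough illustration of reading the partition map above, the following Python sketch sums the Primary Partitions column per namespace. It works purely on text copied from the `show pmap` output; the function name and the seven-column row layout are assumptions made for this example, not part of any Aerospike tool. It relies on the fact that an Aerospike namespace has 4096 partitions, so a healthy namespace should total 4096 primaries.

```python
from collections import defaultdict

# Data rows copied from the `show pmap` output above.
PMAP_ROWS = """\
179FDC193C39   bar         10342564f2bd:3000            0            0            0             0
179FDC193C39   bar         172.17.0.5:3000              0            0            0             0
179FDC193C39   bar         172.17.0.4:3000              0            0            0             0
179FDC193C39   bar         172.17.0.6:3000              0            0            0             0
179FDC193C39   test        10342564f2bd:3000         1024         1024            0             0
179FDC193C39   test        172.17.0.5:3000           1024         1024            0             0
179FDC193C39   test        172.17.0.4:3000           1024         1024            0             0
179FDC193C39   test        172.17.0.6:3000           1024         1024            0             0
"""

PARTITIONS_PER_NAMESPACE = 4096


def primary_partitions_by_namespace(pmap_text):
    """Sum the Primary Partitions column for each namespace (hypothetical helper)."""
    totals = defaultdict(int)
    for line in pmap_text.splitlines():
        fields = line.split()
        # Assumed layout: key, namespace, node, primary, secondary, dead, unavailable.
        # Header and footer lines fail this check and are skipped.
        if len(fields) != 7 or not fields[3].isdigit():
            continue
        totals[fields[1]] += int(fields[3])
    return dict(totals)


totals = primary_partitions_by_namespace(PMAP_ROWS)
for namespace, count in sorted(totals.items()):
    state = "fully mapped" if count == PARTITIONS_PER_NAMESPACE else "NOT fully mapped"
    print(f"{namespace}: {count} primary partitions ({state})")
```

For the sample rows, namespace bar totals zero primaries while test totals 4096, which matches the symptom described above.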

The roster defines the cluster in its normal state in terms of node membership. Without this node list, a partition map cannot be created because, unlike in AP mode, the cluster will not simply map partitions to every node it can see. If it did, consistency in the face of a network partition could not be assured. For this reason, the roster is key. To confirm that the roster is the issue, check it with the roster info command.

Admin> asinfo -v 'roster:namespace=bar'
10342564f2bd:3000 (172.17.0.3) returned:
roster=null:pending_roster=null:observed_nodes=BB9060011AC4202,BB9050011AC4202,BB9040011AC4202,BB9030011AC4202

172.17.0.5:3000 (172.17.0.5) returned:
roster=null:pending_roster=null:observed_nodes=BB9060011AC4202,BB9050011AC4202,BB9040011AC4202,BB9030011AC4202

172.17.0.4:3000 (172.17.0.4) returned:
roster=null:pending_roster=null:observed_nodes=BB9060011AC4202,BB9050011AC4202,BB9040011AC4202,BB9030011AC4202

172.17.0.6:3000 (172.17.0.6) returned:
roster=null:pending_roster=null:observed_nodes=BB9060011AC4202,BB9050011AC4202,BB9040011AC4202,BB9030011AC4202

Admin>

The output above confirms that while all cluster nodes are visible or observed, none are present in the roster.
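
The same check can be sketched in Python for use in a script. This is a minimal example that parses the response string shown above; the helper name is hypothetical, and it assumes the response keeps the `key=value:key=value` shape seen in this output, with `null` meaning an empty list.

```python
def parse_roster_response(response):
    """Turn a roster info response into a dict of lists (hypothetical helper).

    Assumes fields are separated by ':' and list values by ','.
    The literal value 'null' is treated as an empty list.
    """
    parsed = {}
    for pair in response.strip().split(":"):
        key, _, value = pair.partition("=")
        parsed[key] = [] if value == "null" else value.split(",")
    return parsed


# Response copied from one node in the output above.
SAMPLE = (
    "roster=null:pending_roster=null:"
    "observed_nodes=BB9060011AC4202,BB9050011AC4202,BB9040011AC4202,BB9030011AC4202"
)

info = parse_roster_response(SAMPLE)

# The roster is unset when neither the active nor the pending roster has nodes.
roster_is_unset = not info["roster"] and not info["pending_roster"]

# Observed nodes that are not yet part of the roster.
missing = sorted(set(info["observed_nodes"]) - set(info["roster"]))

print("roster unset:", roster_is_unset)
print("observed but not in roster:", ", ".join(missing))
```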

Solution

To resolve this issue, set the roster. This is done with the roster-set info command followed by the recluster info command. Only the principal node is expected to act on the recluster command; the other nodes ignore it.

Admin> asinfo -v 'roster-set:namespace=bar;nodes=BB9060011AC4202,BB9050011AC4202,BB9040011AC4202,BB9030011AC4202'
10342564f2bd:3000 (172.17.0.3) returned:
ok

172.17.0.5:3000 (172.17.0.5) returned:
ok

172.17.0.4:3000 (172.17.0.4) returned:
ok

172.17.0.6:3000 (172.17.0.6) returned:
ok

Admin> asinfo -v 'recluster:namespace=bar'
10342564f2bd:3000 (172.17.0.3) returned:
ignored-by-non-principal

172.17.0.5:3000 (172.17.0.5) returned:
ignored-by-non-principal

172.17.0.4:3000 (172.17.0.4) returned:
ignored-by-non-principal

172.17.0.6:3000 (172.17.0.6) returned:
ok

Admin>
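
When the node list is long, typing the roster-set string by hand is error-prone. The Python sketch below only assembles the two info command strings in the form used above from a list of node IDs; it does not contact the cluster, and the function name is an assumption made for this example. The resulting strings would still be passed to asinfo as shown in the transcript.

```python
def build_roster_commands(namespace, node_ids):
    """Return the roster-set and recluster info strings (hypothetical helper).

    The command formats mirror the ones used in this article.
    """
    if not node_ids:
        raise ValueError("at least one node ID is required to set a roster")
    nodes = ",".join(node_ids)
    return [
        f"roster-set:namespace={namespace};nodes={nodes}",
        f"recluster:namespace={namespace}",
    ]


# Node IDs taken from the observed_nodes list reported by the roster command.
observed_nodes = [
    "BB9060011AC4202",
    "BB9050011AC4202",
    "BB9040011AC4202",
    "BB9030011AC4202",
]

commands = build_roster_commands("bar", observed_nodes)
for command in commands:
    # Print in the form expected at the asadm prompt.
    print(f"asinfo -v '{command}'")
```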

Partitions for namespace bar are now mapped across the 4 nodes in the roster as expected:

Admin> show pmap
~~~~~~~~~~~~~~~~~~~~~~~~~Partition Map Analysis (2020-03-31 17:52:36 UTC)~~~~~~~~~~~~~~~~~~~~~~~~
     Cluster   Namespace                Node      Primary    Secondary         Dead   Unavailable
         Key           .                   .   Partitions   Partitions   Partitions    Partitions
B84A449E41CF   bar         10342564f2bd:3000         1024         1024            0             0
B84A449E41CF   bar         172.17.0.5:3000           1024         1024            0             0
B84A449E41CF   bar         172.17.0.4:3000           1024         1024            0             0
B84A449E41CF   bar         172.17.0.6:3000           1024         1024            0             0

The AQL command now completes properly.

aql> insert into bar.testset (PK,value1) values(1,'value1')
OK, 1 record affected.

Notes

  • The Node not found for partition error could also indicate a potential tending error.
  • It is not mandatory to include every observed node in the roster; however, if any are left out, the reasoning should be well understood.

Keywords

NODE NOT FOUND FOR PARTITION AQL CLIENT STRONG CONSISTENCY ROSTER

Timestamp

March 2020
