Aerospike routing traffic

Ivan44785372 · July 22, 2019, 7:35am

Hello. I have a 2-node cluster, and I need to send requests to only one of the nodes. I set one IP in the client configuration, but I can see that my client also connects to the second node. Is it possible that the client somehow gets the node list from the server, or that the server somehow routes traffic to another node? Is it possible to make the client send requests only to the first node?

kporter · July 23, 2019, 12:55am

The rack-aware feature of Aerospike Enterprise allows you to partition your cluster into multiple racks. With rack-aware enabled, you can then set the prefer-rack policy in the client to prefer a particular rack. This is often used in cloud environments to drastically reduce cross ‘availability zone’ latency and data-transfer costs.

Ivan44785372 · July 23, 2019, 7:37am

Thank you for your answer. So do I understand correctly that the Aerospike client gets the node list from the server, and even if I configure the client with only one server, it will send data to all of the servers? And there is no way to use only one server in a cluster with the Community Edition?

kporter · July 23, 2019, 4:25pm

Yes, that is correct. What’s the reason you want to do this?

Ivan44785372 · July 23, 2019, 4:52pm

Because of replication lag. We have two applications: the first writes to Aerospike and the second reads from it. Pretty often, when the second application reads from the other Aerospike node, it can’t find the key because of replication lag. The delay between the write and the read can be 0.1 to 0.5 sec.

kporter · July 23, 2019, 5:52pm

That shouldn’t be an issue by default. The default read policy reads from the master replica for a given partition; only if the master is unreachable will it try the replica. Writes must always use the master replica for a given partition.
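To make the routing described above concrete, here is a minimal Python sketch of how a client could pick a node per key. It is a toy model under stated assumptions, not the client's actual code: SHA-1 stands in for the RIPEMD-160 digest the real clients use, and the `PARTITION_MAP` layout and the `node_for` helper are hypothetical. What it illustrates is that the same key always maps to the same partition, so a read and a write for that key go to the same master node, while different keys are spread over both nodes.

```python
import hashlib

N_PARTITIONS = 4096  # every Aerospike namespace is split into 4096 partitions


def partition_id(key: str) -> int:
    # Stand-in hash (assumption): the real client uses a RIPEMD-160 digest
    # of the set name and key, not SHA-1.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


# Hypothetical partition map for a 2-node cluster: partition -> (master, replica).
NODES = ("aerospike1", "aerospike2")
PARTITION_MAP = {
    pid: (NODES[pid % 2], NODES[(pid + 1) % 2]) for pid in range(N_PARTITIONS)
}


def node_for(key: str, master_reachable: bool = True) -> str:
    # Reads and writes both target the master; the replica is only a fallback.
    master, replica = PARTITION_MAP[partition_id(key)]
    return master if master_reachable else replica


write_node = node_for("user:42")
read_node = node_for("user:42")
print(write_node == read_node)  # the same key is routed to the same node
```

In this model a client seeded with a single address still ends up talking to every node, because the partition map it learns from the cluster names a master for each partition.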

There is something else going on in your environment. Could you grep the logs for ‘rebalanced’?

Also what version are the Aerospike Servers and what language and client version are you using?

Ivan44785372 · July 23, 2019, 6:02pm

Hm. It’s strange. Thank you for your help.

root@aerospike1 ~ # grep -ic rebalanced /var/log/aerospike/aerospike.log
0
root@aerospike1 ~ # head -1 /var/log/aerospike/aerospike.log
Apr 30 2019 04:27:44 GMT: INFO (info): (ticker.c:171) NODE-ID bb9d4512311b36c CLUSTER-SIZE 2

The Aerospike server version is 4.3.0.7. I’m using the PHP and Go client libraries.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Object Information (2019-07-23 18:03:07 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace                              Node       Total     Repl                           Objects                   Tombstones             Pending   Rack
        .                                 .     Records   Factor        (Master,Prole,Non-Replica)   (Master,Prole,Non-Replica)            Migrates     ID
        .                                 .           .        .                                 .                            .             (tx,rx)      .
ns1      aerospike1.domain.com:3000   675.382 M   2        (35.237 M, 640.145 M, 0.000)      (0.000,  0.000,  0.000)      (0.000,  0.000)     0
ns1      aerospike2.domain.com:3000   675.402 M   2        (640.144 M, 35.258 M, 0.000)      (0.000,  0.000,  0.000)      (0.000,  0.000)     0
ns1                                          1.351 B            (675.381 M, 675.403 M, 0.000)     (0.000,  0.000,  0.000)      (0.000,  0.000)


Ivan44785372 · July 23, 2019, 10:23pm

# asadm -e "show pmap"
Seed:        [('127.0.0.1', 3000, None)]
Config_file: /root/.aerospike/astools.conf, /etc/aerospike/astools.conf
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Partition Map Analysis (2019-07-23 22:17:49 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Cluster   Namespace                              Node      Primary    Secondary         Dead   Unavailable
     Key           .                                 .   Partitions   Partitions   Partitions    Partitions
930430C64D1D   ns1      aerospike2.domain.com:3000         2078         2018            0             0
930430C64D1D   ns1      aerospike1.domain.com:3000         2018         2078            0             0

Maybe this means that aerospike2 is now the master? Is it possible to manually switch the “master” for partitions?

Ivan44785372 · July 23, 2019, 10:27pm

# asadm -e "asinfo -v 'replicas-all'"
Seed:        [('127.0.0.1', 3000, None)]
Config_file: /root/.aerospike/astools.conf, /etc/aerospike/astools.conf
aerospike2.pw.domain.com:3000 (xx.xx.xxx.xx) returned:
ns1:2,7QBQtEaaS7jdftJCZ9bfP19mI4s90p5618HZe0ypMy4ndrY52zx5of0J98fl5y+aDNCTHp1liO4GC5SlcjCiFwN4c3HE22H9okAbW4+cp6ZglCjKeQC/DMer/q/MreEvmTgZAhQxGJboN1Xk3vJO8xvFsp93fhimTc88guKEc7cYB1V6RZ8zWCcCrdQd2PtWkkEgA7p/nRe/+TrvD49PQ2jsMsg4aapuJ8PLRs+d/JSCFxPilH2E790rZtd4Uniuyh3nfFgkesJqzWbvBxDn+05l7Skbm97zqf2WMt2Erycj71UDG7uW8BTcoH4moGa5SSfPGNwZXhehofPBCeK64m4SWGkJFW3NgzzHv+L27v+UG6qxcQAAZ5wkACsP26TPHgn1VpBBJKJ+Skp2IT6T8lhMwpyREDeiPVfL1/wjB0XcT9R9RXDiMyaXIu+ID/ue5s62z1XWYe/nE7Og/aPY+8t1Om7HkoWDRAfzJgnC0DVUae5RBk/XKpydW8sBkB7/qQvRNRpdK+9s9EIThQLs/+rvhYSaC6N6sPb+ZLktG2n6ImKwrsR4WEH1pxPJrcRbNXARUTgjeWD5YxRV9qUO6xF0gI6QuawApmddiGKtsAYsudq7xwve2g9uGpm3cNTA1YO9ODjJ/MXQrnDdLdPVlcHjeskNwKAMQEXGAZuLYx4=,Ev+vS7lltEcigS29mCkgwKCZ3HTCLWGFKD4mhLNWzNHYiUnGJMOGXgL2CDgaGNBl8y9s4WKadxH59Gtajc9d6PyHjI47JJ4CXb/kpHBjWFmfa9c1hv9A8zhUAVAzUh7QZsfm/evO52kXyKobIQ2xDOQ6TWCIgedZsjDDfR17jEjn+KqFumDMp9j9UiviJwSpbb7f/EWAYuhABsUQ8HCwvJcTzTfHllWR2Dw0uTBiA2t96Owda4J7ECLUmSiHrYdRNeIYg6fbhT2VMpkQ+O8YBLGaEtbkZCEMVgJpzSJ7UNjcEKr85ERpD+sjX4HZX5lGttgw5yPmoeheXgw+9h1FHZHtp5b26pIyfMM4QB0JEQBr5FVOjv//mGPb/9TwJFsw4fYKqW++212BtbWJ3sFsDaezPWNu78hdwqg0KAPc+LojsCuCuo8dzNlo3RB38ARhGTFJMKopnhAY7ExfAlwnBDSKxZE4bXp8u/gM2fY9L8qrlhGu+bAo1WNipDT+b+EAVvQuyuWi1BCTC73sev0TABUQentl9FyFTwkBm0bS5JYF3Z1PUTuHp74KWOw2Ujukyo/ursfchp8GnOuqCVrxFO6Lf3FvRlP/WZiid51ST/nTRiVEOPQhJfCR5WZIjys/KnxCx8c2AzovUY8i0iwqaj4chTbyP1/zv7o5/mR0nOE=;ns2:1,//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////8=

aerospike1.pw.domain.com:3000 (yy.yy.yy.yy) returned:
ns1:2,Ev+vS7lltEcigS29mCkgwKCZ3HTCLWGFKD4mhLNWzNHYiUnGJMOGXgL2CDgaGNBl8y9s4WKadxH59Gtajc9d6PyHjI47JJ4CXb/kpHBjWFmfa9c1hv9A8zhUAVAzUh7QZsfm/evO52kXyKobIQ2xDOQ6TWCIgedZsjDDfR17jEjn+KqFumDMp9j9UiviJwSpbb7f/EWAYuhABsUQ8HCwvJcTzTfHllWR2Dw0uTBiA2t96Owda4J7ECLUmSiHrYdRNeIYg6fbhT2VMpkQ+O8YBLGaEtbkZCEMVgJpzSJ7UNjcEKr85ERpD+sjX4HZX5lGttgw5yPmoeheXgw+9h1FHZHtp5b26pIyfMM4QB0JEQBr5FVOjv//mGPb/9TwJFsw4fYKqW++212BtbWJ3sFsDaezPWNu78hdwqg0KAPc+LojsCuCuo8dzNlo3RB38ARhGTFJMKopnhAY7ExfAlwnBDSKxZE4bXp8u/gM2fY9L8qrlhGu+bAo1WNipDT+b+EAVvQuyuWi1BCTC73sev0TABUQentl9FyFTwkBm0bS5JYF3Z1PUTuHp74KWOw2Ujukyo/ursfchp8GnOuqCVrxFO6Lf3FvRlP/WZiid51ST/nTRiVEOPQhJfCR5WZIjys/KnxCx8c2AzovUY8i0iwqaj4chTbyP1/zv7o5/mR0nOE=,7QBQtEaaS7jdftJCZ9bfP19mI4s90p5618HZe0ypMy4ndrY52zx5of0J98fl5y+aDNCTHp1liO4GC5SlcjCiFwN4c3HE22H9okAbW4+cp6ZglCjKeQC/DMer/q/MreEvmTgZAhQxGJboN1Xk3vJO8xvFsp93fhimTc88guKEc7cYB1V6RZ8zWCcCrdQd2PtWkkEgA7p/nRe/+TrvD49PQ2jsMsg4aapuJ8PLRs+d/JSCFxPilH2E790rZtd4Uniuyh3nfFgkesJqzWbvBxDn+05l7Skbm97zqf2WMt2Erycj71UDG7uW8BTcoH4moGa5SSfPGNwZXhehofPBCeK64m4SWGkJFW3NgzzHv+L27v+UG6qxcQAAZ5wkACsP26TPHgn1VpBBJKJ+Skp2IT6T8lhMwpyREDeiPVfL1/wjB0XcT9R9RXDiMyaXIu+ID/ue5s62z1XWYe/nE7Og/aPY+8t1Om7HkoWDRAfzJgnC0DVUae5RBk/XKpydW8sBkB7/qQvRNRpdK+9s9EIThQLs/+rvhYSaC6N6sPb+ZLktG2n6ImKwrsR4WEH1pxPJrcRbNXARUTgjeWD5YxRV9qUO6xF0gI6QuawApmddiGKtsAYsudq7xwve2g9uGpm3cNTA1YO9ODjJ/MXQrnDdLdPVlcHjeskNwKAMQEXGAZuLYx4=;ns3:1,//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////8=

kporter · July 23, 2019, 10:37pm

Masters are selected for each partition using a deterministic algorithm based on the current cluster membership.

From the output you provided, you can see that aerospike2.domain.com:3000 has 2078 master (aka primary) partitions and 2018 replica (aka secondary) partitions.
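The same partition counts can be cross-checked against the `replicas-all` output posted earlier. The Python sketch below assumes each namespace entry has the form `namespace:replication-factor,bitmap,...`, where every bitmap is a base64-encoded string of 4096 bits with one bit per partition, master bitmap first. The helper names `owned_partitions` and `parse_replicas_all` are hypothetical. The sample input is built to match the all-slashes bitmap shown for the single-replica namespace `ns2`, where the one node owns every partition.

```python
import base64

N_PARTITIONS = 4096


def owned_partitions(bitmap_b64: str) -> int:
    # Each set bit marks one partition this node holds for that replica level.
    raw = base64.b64decode(bitmap_b64)
    return sum(bin(byte).count("1") for byte in raw)


def parse_replicas_all(line: str) -> dict:
    # Assumed format: "ns:<repl-factor>,<bitmap>[,<bitmap>...]" joined by ";".
    result = {}
    for entry in line.strip().split(";"):
        namespace, rest = entry.split(":", 1)
        _repl_factor, *bitmaps = rest.split(",")
        result[namespace] = [owned_partitions(b) for b in bitmaps]
    return result


# 512 bytes of 0xFF encode to 682 "/" characters followed by "8=".
full_bitmap = "/" * 682 + "8="
counts = parse_replicas_all("ns2:1," + full_bitmap)
print(counts)  # {'ns2': [4096]}
```

Run against the real `ns1` entries, the first count per node should match the Primary Partitions column of `show pmap` and the second the Secondary Partitions column, if the format assumption holds.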

The cluster seems stable since there hasn’t been a cluster disruption since the logs you grepped began. Are you sure the requests for the records are arriving after a successful write? Is it possible that a request arrives before the data was ever written or while the transaction is in progress?

pgupta · July 24, 2019, 1:11am

What I find odd is the huge disparity in the distribution of master versus replica records. For example, node 1 has 35M masters and 640M replicas. I would check the config file on each node (/etc/aerospike/aerospike.conf): are the storage-engine size allocations and high-water marks the same on both nodes? Also grep for eviction in the logs.
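A quick calculation with the figures already posted in this thread shows how large the disparity is. The Python sketch below takes the primary partition counts from `show pmap` and the master object counts (in millions) from the Namespace Object Information table. The `shares` helper is only illustrative. Partitions are split almost evenly, so master objects would be expected to split roughly evenly too, yet aerospike1 holds only about 5% of them.

```python
# Figures copied from the output posted in this thread.
PRIMARY_PARTITIONS = {"aerospike1": 2018, "aerospike2": 2078}
MASTER_OBJECTS_M = {"aerospike1": 35.237, "aerospike2": 640.144}  # millions


def shares(values: dict) -> dict:
    # Fraction of the cluster-wide total held by each node.
    total = sum(values.values())
    return {node: value / total for node, value in values.items()}


partition_share = shares(PRIMARY_PARTITIONS)
object_share = shares(MASTER_OBJECTS_M)

for node in PRIMARY_PARTITIONS:
    print(
        f"{node}: {partition_share[node]:.1%} of master partitions, "
        f"{object_share[node]:.1%} of master objects"
    )
```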

kporter · July 24, 2019, 2:55am

Ah, good point @pgupta. The older eviction algorithm could cause these imbalances in 2-node clusters.

@Ivan44785372, to confirm, could you provide the output for:

asadm -e "info"

We would expect to see the vast majority of evictions taking place on the node with fewer master objects (aerospike1).

If this is the case you have two options:

1. Add a third node, which will allow the eviction algorithm to normalize.
2. Upgrade to Aerospike 4.5.1.5 or later. Aerospike 4.5.1.5 addresses this issue with:

[AER-6000] - (KVS) Redesigned namespace supervisor (nsup), featuring expiration and eviction without transactions, and per-namespace control.

Ivan44785372 · July 24, 2019, 5:11am

    # asadm -e info
    Seed:        [('127.0.0.1', 3000, None)]
    Config_file: /root/.aerospike/astools.conf, /etc/aerospike/astools.conf
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2019-07-24 05:00:29 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                               Node               Node                   Ip       Build   Cluster   Migrations        Cluster     Cluster         Principal   Client       Uptime
                                  .                 Id                    .           .      Size            .            Key   Integrity                 .    Conns            .
    aerospike1.domain.com:3000   *BB9D4512311B36C   xx.xx.xx.xx:3000   C-4.3.0.7         2      0.000     930430C64D1D   True        BB9D4512311B36C     5793   3335:45:16
    aerospike2.domain.com:3000   BB9BC512311B36C    yy.yy.yy.yy:3000    C-4.3.0.7         2      0.000     930430C64D1D   True        BB9D4512311B36C     2717   2557:45:49
    Number of rows: 2

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Usage Information (2019-07-24 05:00:29 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Namespace                              Node       Total    Expirations,Evictions     Stop       Disk    Disk     HWM   Avail%          Mem     Mem    HWM      Stop
            .                                 .     Records                        .   Writes       Used   Used%   Disk%        .         Used   Used%   Mem%   Writes%
    ns1      aerospike1.domain.com:3000   654.490 M   (5.265 B, 15.957 M)      false    1.173 TB   30      80      55       108.174 GB   57      94     96
    ns1      aerospike2.domain.com:3000   654.510 M   (356.946 M, 0.000)       false    1.173 TB   21      80      69       107.878 GB   34      90     90
    ns1                                        1.309 B   (5.622 B, 15.957 M)               2.345 TB                            216.052 GB
    ns2       aerospike1.domain.com:3000     6.607 M   (146.546 M, 86.522 M)    false         N/E   N/E     50      N/E        2.243 GB   38      96     98
    ns2                                         6.607 M   (146.546 M, 86.522 M)             0.000 B                               2.243 GB
    ns3      aerospike2.domain.com:3000     0.000     (0.000,  0.000)          false         N/E   N/E     50      N/E        0.000 B    0       60     90
    ns3                                        0.000     (0.000,  0.000)                   0.000 B                               0.000 B
    Number of rows: 7

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Object Information (2019-07-24 05:00:29 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Namespace                              Node       Total     Repl                           Objects                   Tombstones             Pending   Rack
            .                                 .     Records   Factor        (Master,Prole,Non-Replica)   (Master,Prole,Non-Replica)            Migrates     ID
            .                                 .           .        .                                 .                            .             (tx,rx)      .
    ns1      aerospike1.domain.com:3000   654.490 M   2        (35.136 M, 619.354 M, 0.000)      (0.000,  0.000,  0.000)      (0.000,  0.000)     0
    ns1      aerospike2.domain.com:3000   654.510 M   2        (619.353 M, 35.157 M, 0.000)      (0.000,  0.000,  0.000)      (0.000,  0.000)     0
    ns1                                        1.309 B            (654.489 M, 654.511 M, 0.000)     (0.000,  0.000,  0.000)      (0.000,  0.000)
    ns2       aerospike1.domain.com:3000     6.608 M   1        (6.608 M, 0.000,  0.000)          (0.000,  0.000,  0.000)      (0.000,  0.000)     0
    ns2                                         6.608 M            (6.608 M, 0.000,  0.000)          (0.000,  0.000,  0.000)      (0.000,  0.000)
    ns3      aerospike2.domain.com:3000     0.000     1        (0.000,  0.000,  0.000)           (0.000,  0.000,  0.000)      (0.000,  0.000)     0
    ns3                                        0.000              (0.000,  0.000,  0.000)           (0.000,  0.000,  0.000)      (0.000,  0.000)
    Number of rows: 7

The configuration is a little bit different between the nodes: aerospike1 has less disk space and a smaller memory-size than aerospike2, but as far as I can see aerospike1 doesn’t have the stop-writes flag set, and it has enough disk and memory space now.

kporter · July 24, 2019, 5:40am

This will make this type of issue worse. You should have homogeneous hardware within your cluster if possible. Alternatively, you should configure all nodes to logically have the same amount of memory and disk resources.

The Expirations,Evictions column for aerospike1 reads (5.265 B, 15.957 M): notice that this node has evicted 15.957 million records.

Meanwhile, aerospike2 has evicted 0 records. This is causing a master record imbalance between these nodes. Because aerospike1 is full of replica objects, it has no room for new master objects. What you are seeing from the client is that a write succeeds and the record is immediately evicted from aerospike1; subsequent reads will not find it since it has been removed.

To resolve this issue, you need to run homogeneous hardware if possible, or make the nodes homogeneous through configuration, and upgrade to at least 4.5.1.5.

pgupta · July 24, 2019, 5:47am

Another suggestion: if you download CE 4.5.1.5 or a later version, register with the portal using your email address. You will then receive an email with a link to Aerospike Academy and login credentials, which will allow you to take a free online intro course that is a few hours long. It will give you the necessary background to understand all these configuration issues better.

Ivan44785372 · July 24, 2019, 12:45pm

Thank you very much. BTW, is it possible to have replication between 4.3.0.7 and 4.5.3.4?

kporter · July 24, 2019, 4:01pm

Yes, the problem is eviction - not replication.

Upgrade aerospike1 first, since it is the unhealthy node.
