Aerospike_err_server_full

Please explain how deletion works in term to release resources.

Aerospike Community Edition build 5.0.0.3

Filled in set “test” by data until defined memory for namespace is full and got AEROSPIKE_ERR_SERVER_FULL. Deleted 10 records from set and tried to add single record again, got AEROSPIKE_ERR_SERVER_FULL.

aql> truncate test; OK

aql> select PK from test.test where PK=1; Error: (2) 127.0.0.1:3000 AEROSPIKE_ERR_RECORD_NOT_FOUND

Attempt to add single record , again AEROSPIKE_ERR_SERVER_FULL.

So, how can it be solved without server restart?

Non-durable deletes remove a record from the index immediately and the space within the wblock on disk is available to be asynchronously reclaimed by defrag. For a wblock to be eligible for defrag, the occupied space within the wblock needs to drop below the defrag-lwm-pct.

I would need to see the associated warning in the logs to know if the problem is lack of free index space in RAM or lack of available wblocks in storage. Meanwhile here is a KB on the defrag system.

My understanding defrag is not applicable for memory engine, sorry I didn’t specified that.

I’ve just spotted message in the log

Jun 02 2020 23:41:10 GMT: WARNING (nsup): (nsup.c:667) {test} breached eviction hwm (memory), memory sz:1398820752 (894208 + 0 + 1397926544) hwm:912680550, index-device sz:0 hwm:0, disk sz:0 hwm:0

looks like it takes all available memory despite limit is set. Can it affect deletion procedure?

before load data

free -h
              total        used        free      shared  buff/cache   available
Mem:          2.9Gi       1.1Gi       1.6Gi       108Mi       294Mi       1.7Gi
Swap:         4.0Gi          0B       4.0Gi

after data loaded

free -h
              total        used        free      shared  buff/cache   available
Mem:          2.9Gi       2.8Gi        24Mi       0.0Ki       132Mi       109Mi
Swap:         4.0Gi       1.3Gi       2.7Gi
namespace test {
        replication-factor 2
        memory-size 1G
        nsup-period  10s
        high-water-memory-pct 85

        storage-engine memory
}
Jun 02 2020 23:41:02 GMT: INFO (truncate): (truncate.c:142) {users} got command to truncate to now (328837262996)
Jun 02 2020 23:41:10 GMT: WARNING (nsup): (nsup.c:667) {test} breached eviction hwm (memory), memory sz:1398820752 (894208 + 0 + 1397926544) hwm:912680550, index-device sz:0 hwm:0, disk sz:0 hwm:0
Jun 02 2020 23:41:10 GMT: WARNING (nsup): (nsup.c:727) {test} no records eligible for eviction
Jun 02 2020 23:41:11 GMT: WARNING (nsup): (nsup.c:875) {test} breached stop-writes limit (memory), memory sz:1398820752 (894208 + 0 + 1397926544) limit:966367641, disk avail-pct:100
Jun 02 2020 23:41:11 GMT: INFO (info): (ticker.c:423) {test} objects: all 13972 master 13972 prole 0 non-replica 0
Jun 02 2020 23:41:11 GMT: INFO (info): (ticker.c:483) {test} migrations: complete
Jun 02 2020 23:41:11 GMT: INFO (info): (ticker.c:501) {test} memory-usage: total-bytes 1398820752 index-bytes 894208 sindex-bytes 0 data-bytes 1397926544 used-pct 130.28
Jun 02 2020 23:41:11 GMT: INFO (info): (ticker.c:632) {test} client: tsvc (0,0) proxy (0,0,0) read (0,0,0,0,0) write (13972,6028,0,0) delete (0,0,0,0,0) udf (0,0,0,0) lang (0,0,0,0)
Jun 02 2020 23:41:11 GMT: INFO (info): (ticker.c:809) {test} scan: basic (1,1,0) aggr (0,0,0) udf-bg (0,0,0) ops-bg (0,0,0)
Jun 02 2020 23:41:11 GMT: INFO (info): (hist.c:240) histogram dump: {test}-write (20000 total) msec
Jun 02 2020 23:41:11 GMT: INFO (info): (hist.c:257)  (00: 0000019815) (01: 0000000170) (02: 0000000012) (03: 0000000002)
Jun 02 2020 23:41:11 GMT: INFO (info): (hist.c:266)  (04: 0000000001)
Jun 02 2020 23:41:20 GMT: WARNING (nsup): (nsup.c:667) {test} breached eviction hwm (memory), memory sz:1398820752 (894208 + 0 + 1397926544) hwm:912680550, index-device sz:0 hwm:0, disk sz:0 hwm:0
Jun 02 2020 23:41:20 GMT: WARNING (nsup): (nsup.c:727) {test} no records eligible for eviction
Jun 02 2020 23:41:21 GMT: WARNING (nsup): (nsup.c:875) {test} breached stop-writes limit (memory), memory sz:1398820752 (894208 + 0 + 1397926544) limit:966367641, disk avail-pct:100

Similarly, stop_writes is also evaluated every 10 seconds (server constant). It is possible for writes to a node to exceed the configured memory-size between cycles. Also note that stop-writes does not apply to replica writes, so successful writes to other nodes can push a node further beyond the stop-writes-pct.

Also, do you write expirable records - nsup is reporting that it cannot find anything that is expirable. If your use case does not use expiration, you can turn off NSUP by setting the nsup-period to 0.

Finally, there should be a message indicating how many records were deleted by truncation when the job finishes:

cf_info(AS_TRUNCATE, "{%s} truncated records (%lu,%lu)", ns->name,
				ns->truncate.n_records_this_run, ns->truncate.n_records);

Unfortunately, there is no message about a deletion. The server is in the stable state but it is impossible to add record, the memory is still in use, repeatedly reports only one warning

Jun 10 2020 22:47:18 GMT: WARNING (nsup): (nsup.c:875) {test} breached stop-writ es limit (memory), memory sz:1366182936 (873344 + 0 + 1365309592) limit:96636764 1, disk avail-pct:100

Thus, what should be done in this case? Any way to force the server to evict a deleted records?

With data-in-memory, a deleted record is gone. Eviction is basically early expiration both of which are deletes and doesn’t seem to apply to your use case (based on the log lines you have provided).

Ok, not correct term.

But do you see my problem? The deleted record should gone but it didn’t. Or it is gone but the server still doesn’t allow to insert record.

Can you reproduce this case and test?

According to this log, you are currently using 1398820752 bytes of memory which is above your stop-writes limit (966367641 bytes).

Removing records should decrease the used bytes.

You could also dynamically increase the value of memory-size as well.

Actually, the set was truncated. As far as it was the single set, logically, there is no records in database at all.

Jun 02 2020 23:41:02 GMT: INFO (truncate): (truncate.c:142) {users} got command to truncate to now (328837262996)

So I expected the server frees resources and allows to insert records. Is it wrong vision?

Ok, it is possible dynamically increase memory, but what if there is no memory available? What to do with the server if it keeps a lot of memory and doesn’t use it? The server can start from a small amount memory and constantly keeping resources after the deletion, finally hits a limit anyway.

The expectation is that if you remove a record, the memory is freed (we are in agreement here). That said, I do not see the line from truncate indicating how many records were deleted, which is troubling.

Could you provide the output of:

asadm -e "info"
asadm -e "info set"

Well, I can’t see indication how many records were deleted as well. The resources are differ from initial post, I’ve re-started server and repeated steps.

asadm -e “info” Seed: [(‘127.0.0.1’, 3000, None)] Config_file: /root/.aerospike/astools.conf, /etc/aerospike/astools.conf

        Node    Node                   Ip       Build   Cluster   Migrations        Cluster     Cluster   Principal   Client     Uptime   
           .      Id                    .           .      Size            .            Key   Integrity           .    Conns          .   
dcz:3000   *BB95   192.168.1.129:3000   C-5.0.0.3         1      0.000     3CF9CC40DF8A   True        BB95             3   00:02:34   
Number of rows: 1

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Usage Information (2020-06-11 00:36:17 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace           Node      Total   Expirations,Evictions     Stop       Disk    Disk     HWM   Avail%        Mem     Mem    HWM      Stop          PI         PI      PI     PI   
        .              .    Records                       .   Writes       Used   Used%   Disk%        .       Used   Used%   Mem%   Writes%        Type       Used   Used%   HWM%   
test        dcz:3000   17.916 K   (0.000,  0.000)         true          N/E   N/E     0       N/E      1.670 GB   100     0      90        undefined   1.094 MB   0       N/E    
test                       17.916 K   (0.000,  0.000)                  0.000 B                             1.670 GB                                        0.000 B                   
Number of rows: 2

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Object Information (2020-06-11 00:36:17 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace           Node      Total     Repl                       Objects                   Tombstones             Pending   Rack   
        .              .    Records   Factor    (Master,Prole,Non-Replica)   (Master,Prole,Non-Replica)            Migrates     ID   
        .              .          .        .                             .                            .             (tx,rx)      .   
test        dcz:3000   17.916 K   1        (17.916 K, 0.000,  0.000)     (0.000,  0.000,  0.000)      (0.000,  0.000)     0      
test                       17.916 K            (17.916 K, 0.000,  0.000)     (0.000,  0.000,  0.000)      (0.000,  0.000)            
Number of rows: 2



asadm -e "info set"
Seed:        [('127.0.0.1', 3000, None)]
Config_file: /root/.aerospike/astools.conf, /etc/aerospike/astools.conf
~~~~~~~~~~~~~~~~~~~~~~~~~Set Information (2020-06-11 00:40:21 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~
  Set   Namespace           Node      Set        Mem    Objects     Stop    Disable      Set   
    .           .              .   Delete       Used          .   Writes   Eviction   Enable   
    .           .              .        .          .          .    Count          .      Xdr   
users   test        dcz:3000   None     1.669 GB   17.916 K   0        false      N/E      
users   test                                1.669 GB   17.916 K                                
Number of rows: 2

Thanks, I’ll try to reproduce this result later this evening. Meanwhile, would you be able to test on 4.9?

Ah! you are issuing the command wrong from aql.

You need to do:

aql> truncate test.users

You have been issuing a command to truncate the namespace ‘users’ and not the set ‘users’ in the namespace ‘test’. This explains why the line indicating how many records were deletes was not appearing, because this node didn’t have the namespace ‘users’ it didn’t execute the command.

Yeap. It is. Thank you for hint.

Well, in this case I would say aql utility needs to be more precise or should have declaimer something like “Check result of the command in the asd log.”

aql> truncate test.userss;
OK

asd log
Jun 11 2020 13:22:23 GMT: INFO (truncate): (truncate.c:302) {test|userss} truncate for nonexistent set

what does aql’s “OK” mean in this case?

A namespace doesn’t have to be on every node in the cluster, also every node in the cluster may not be available at this time. Similarly, a set may not be on all nodes. So the ‘OK’ means that the server has recorded the transaction to truncate the namespace ‘users’ or the set ‘test.userss’. The nodes that received the command will relay this command to other nodes joining or returning to the cluster which may have the truncated set or namespace.

Interesting. Thank you.

However, in this particular case

Jun 11 2020 13:22:23 GMT: INFO (truncate): (truncate.c:302) {test|userss} truncate for nonexistent set

is transaction finished?

if we have 10 servers cluster. 5 are online and 5 are not available. is transaction relayed to all ‘live’ nodes and waits somewhere until rest 5 nodes be online to process it?

Side question, asked before but was ignored. Could you advise how query works with predicates in dept. For instance predicateBetween is used. Is resultset buffered on server side? If yes what is size of buffer? Or does it return record on each fetch operation?

And last: Aql crashes on drop secondary index command.

When you send a truncate command, you have the option to send a timestamp. If you do not sent a timestamp, the current time is used. This information is stored in System Metadata (SMD) and the client is then acked. SMD will asynchronously call the truncate modules callback when the module is updated. Contents of SMD are distributed to all node in the cluster or all nodes that eventually join the cluster. Many systems in Aerospike operate through SMD, they include truncate, eviction, secondary indexes, and UDF.

What happens if a truncate command sent, lets say ‘truncate test.userss;’ , however there is node with the test.userss online. Later on, some server has created with and joined to cluster. Will truncate command be applied for this node?

Yes, all SMD data will replicate to all nodes.