Aerospike_err_server_full

bindto · June 2, 2020, 2:18pm

Please explain how deletion works in term to release resources.

Aerospike Community Edition build 5.0.0.3

Filled in set “test” by data until defined memory for namespace is full and got AEROSPIKE_ERR_SERVER_FULL. Deleted 10 records from set and tried to add single record again, got AEROSPIKE_ERR_SERVER_FULL.

aql> truncate test; OK

aql> select PK from test.test where PK=1; Error: (2) 127.0.0.1:3000 AEROSPIKE_ERR_RECORD_NOT_FOUND

Attempt to add single record , again AEROSPIKE_ERR_SERVER_FULL.

So, how can it be solved without server restart?

kporter · June 2, 2020, 7:38pm

Non-durable deletes remove a record from the index immediately and the space within the wblock on disk is available to be asynchronously reclaimed by defrag. For a wblock to be eligible for defrag, the occupied space within the wblock needs to drop below the defrag-lwm-pct.

I would need to see the associated warning in the logs to know if the problem is lack of free index space in RAM or lack of available wblocks in storage. Meanwhile here is a KB on the defrag system.

bindto · June 2, 2020, 9:47pm

My understanding defrag is not applicable for memory engine, sorry I didn’t specified that.

I’ve just spotted message in the log

Jun 02 2020 23:41:10 GMT: WARNING (nsup): (nsup.c:667) {test} breached eviction hwm (memory), memory sz:1398820752 (894208 + 0 + 1397926544) hwm:912680550, index-device sz:0 hwm:0, disk sz:0 hwm:0

looks like it takes all available memory despite limit is set. Can it affect deletion procedure?

before load data

free -h
              total        used        free      shared  buff/cache   available
Mem:          2.9Gi       1.1Gi       1.6Gi       108Mi       294Mi       1.7Gi
Swap:         4.0Gi          0B       4.0Gi

after data loaded

free -h
              total        used        free      shared  buff/cache   available
Mem:          2.9Gi       2.8Gi        24Mi       0.0Ki       132Mi       109Mi
Swap:         4.0Gi       1.3Gi       2.7Gi

namespace test {
        replication-factor 2
        memory-size 1G
        nsup-period  10s
        high-water-memory-pct 85

        storage-engine memory
}

Jun 02 2020 23:41:02 GMT: INFO (truncate): (truncate.c:142) {users} got command to truncate to now (328837262996)
Jun 02 2020 23:41:10 GMT: WARNING (nsup): (nsup.c:667) {test} breached eviction hwm (memory), memory sz:1398820752 (894208 + 0 + 1397926544) hwm:912680550, index-device sz:0 hwm:0, disk sz:0 hwm:0
Jun 02 2020 23:41:10 GMT: WARNING (nsup): (nsup.c:727) {test} no records eligible for eviction
Jun 02 2020 23:41:11 GMT: WARNING (nsup): (nsup.c:875) {test} breached stop-writes limit (memory), memory sz:1398820752 (894208 + 0 + 1397926544) limit:966367641, disk avail-pct:100
Jun 02 2020 23:41:11 GMT: INFO (info): (ticker.c:423) {test} objects: all 13972 master 13972 prole 0 non-replica 0
Jun 02 2020 23:41:11 GMT: INFO (info): (ticker.c:483) {test} migrations: complete
Jun 02 2020 23:41:11 GMT: INFO (info): (ticker.c:501) {test} memory-usage: total-bytes 1398820752 index-bytes 894208 sindex-bytes 0 data-bytes 1397926544 used-pct 130.28
Jun 02 2020 23:41:11 GMT: INFO (info): (ticker.c:632) {test} client: tsvc (0,0) proxy (0,0,0) read (0,0,0,0,0) write (13972,6028,0,0) delete (0,0,0,0,0) udf (0,0,0,0) lang (0,0,0,0)
Jun 02 2020 23:41:11 GMT: INFO (info): (ticker.c:809) {test} scan: basic (1,1,0) aggr (0,0,0) udf-bg (0,0,0) ops-bg (0,0,0)
Jun 02 2020 23:41:11 GMT: INFO (info): (hist.c:240) histogram dump: {test}-write (20000 total) msec
Jun 02 2020 23:41:11 GMT: INFO (info): (hist.c:257)  (00: 0000019815) (01: 0000000170) (02: 0000000012) (03: 0000000002)
Jun 02 2020 23:41:11 GMT: INFO (info): (hist.c:266)  (04: 0000000001)
Jun 02 2020 23:41:20 GMT: WARNING (nsup): (nsup.c:667) {test} breached eviction hwm (memory), memory sz:1398820752 (894208 + 0 + 1397926544) hwm:912680550, index-device sz:0 hwm:0, disk sz:0 hwm:0
Jun 02 2020 23:41:20 GMT: WARNING (nsup): (nsup.c:727) {test} no records eligible for eviction
Jun 02 2020 23:41:21 GMT: WARNING (nsup): (nsup.c:875) {test} breached stop-writes limit (memory), memory sz:1398820752 (894208 + 0 + 1397926544) limit:966367641, disk avail-pct:100

kporter · June 10, 2020, 5:55pm

Similarly, stop_writes is also evaluated every 10 seconds (server constant). It is possible for writes to a node to exceed the configured memory-size between cycles. Also note that stop-writes does not apply to replica writes, so successful writes to other nodes can push a node further beyond the stop-writes-pct.

Also, do you write expirable records - nsup is reporting that it cannot find anything that is expirable. If your use case does not use expiration, you can turn off NSUP by setting the nsup-period to 0.

Finally, there should be a message indicating how many records were deleted by truncation when the job finishes:

cf_info(AS_TRUNCATE, "{%s} truncated records (%lu,%lu)", ns->name,
				ns->truncate.n_records_this_run, ns->truncate.n_records);

bindto · June 10, 2020, 9:00pm

Unfortunately, there is no message about a deletion. The server is in the stable state but it is impossible to add record, the memory is still in use, repeatedly reports only one warning

Jun 10 2020 22:47:18 GMT: WARNING (nsup): (nsup.c:875) {test} breached stop-writ es limit (memory), memory sz:1366182936 (873344 + 0 + 1365309592) limit:96636764 1, disk avail-pct:100

Thus, what should be done in this case? Any way to force the server to evict a deleted records?

kporter · June 10, 2020, 9:18pm

With data-in-memory, a deleted record is gone. Eviction is basically early expiration both of which are deletes and doesn’t seem to apply to your use case (based on the log lines you have provided).

bindto · June 10, 2020, 9:33pm

Ok, not correct term.

But do you see my problem? The deleted record should gone but it didn’t. Or it is gone but the server still doesn’t allow to insert record.

Can you reproduce this case and test?

kporter · June 10, 2020, 9:50pm

According to this log, you are currently using 1398820752 bytes of memory which is above your stop-writes limit (966367641 bytes).

Removing records should decrease the used bytes.

kporter · June 10, 2020, 9:50pm

You could also dynamically increase the value of memory-size as well.

bindto · June 10, 2020, 10:24pm

Actually, the set was truncated. As far as it was the single set, logically, there is no records in database at all.

Jun 02 2020 23:41:02 GMT: INFO (truncate): (truncate.c:142) {users} got command to truncate to now (328837262996)

So I expected the server frees resources and allows to insert records. Is it wrong vision?

Ok, it is possible dynamically increase memory, but what if there is no memory available? What to do with the server if it keeps a lot of memory and doesn’t use it? The server can start from a small amount memory and constantly keeping resources after the deletion, finally hits a limit anyway.

kporter · June 10, 2020, 10:33pm

The expectation is that if you remove a record, the memory is freed (we are in agreement here). That said, I do not see the line from truncate indicating how many records were deleted, which is troubling.

Could you provide the output of:

asadm -e "info"
asadm -e "info set"

bindto · June 10, 2020, 10:45pm

Well, I can’t see indication how many records were deleted as well. The resources are differ from initial post, I’ve re-started server and repeated steps.

asadm -e “info” Seed: [(‘127.0.0.1’, 3000, None)] Config_file: /root/.aerospike/astools.conf, /etc/aerospike/astools.conf

        Node    Node                   Ip       Build   Cluster   Migrations        Cluster     Cluster   Principal   Client     Uptime   
           .      Id                    .           .      Size            .            Key   Integrity           .    Conns          .   
dcz:3000   *BB95   192.168.1.129:3000   C-5.0.0.3         1      0.000     3CF9CC40DF8A   True        BB95             3   00:02:34   
Number of rows: 1

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Usage Information (2020-06-11 00:36:17 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace           Node      Total   Expirations,Evictions     Stop       Disk    Disk     HWM   Avail%        Mem     Mem    HWM      Stop          PI         PI      PI     PI   
        .              .    Records                       .   Writes       Used   Used%   Disk%        .       Used   Used%   Mem%   Writes%        Type       Used   Used%   HWM%   
test        dcz:3000   17.916 K   (0.000,  0.000)         true          N/E   N/E     0       N/E      1.670 GB   100     0      90        undefined   1.094 MB   0       N/E    
test                       17.916 K   (0.000,  0.000)                  0.000 B                             1.670 GB                                        0.000 B                   
Number of rows: 2

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Object Information (2020-06-11 00:36:17 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace           Node      Total     Repl                       Objects                   Tombstones             Pending   Rack   
        .              .    Records   Factor    (Master,Prole,Non-Replica)   (Master,Prole,Non-Replica)            Migrates     ID   
        .              .          .        .                             .                            .             (tx,rx)      .   
test        dcz:3000   17.916 K   1        (17.916 K, 0.000,  0.000)     (0.000,  0.000,  0.000)      (0.000,  0.000)     0      
test                       17.916 K            (17.916 K, 0.000,  0.000)     (0.000,  0.000,  0.000)      (0.000,  0.000)            
Number of rows: 2



asadm -e "info set"
Seed:        [('127.0.0.1', 3000, None)]
Config_file: /root/.aerospike/astools.conf, /etc/aerospike/astools.conf
~~~~~~~~~~~~~~~~~~~~~~~~~Set Information (2020-06-11 00:40:21 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~
  Set   Namespace           Node      Set        Mem    Objects     Stop    Disable      Set   
    .           .              .   Delete       Used          .   Writes   Eviction   Enable   
    .           .              .        .          .          .    Count          .      Xdr   
users   test        dcz:3000   None     1.669 GB   17.916 K   0        false      N/E      
users   test                                1.669 GB   17.916 K                                
Number of rows: 2

kporter · June 10, 2020, 10:51pm

Thanks, I’ll try to reproduce this result later this evening. Meanwhile, would you be able to test on 4.9?

kporter · June 11, 2020, 12:57am

Ah! you are issuing the command wrong from aql.

You need to do:

aql> truncate test.users

You have been issuing a command to truncate the namespace ‘users’ and not the set ‘users’ in the namespace ‘test’. This explains why the line indicating how many records were deletes was not appearing, because this node didn’t have the namespace ‘users’ it didn’t execute the command.

bindto · June 11, 2020, 11:28am

Yeap. It is. Thank you for hint.

Well, in this case I would say aql utility needs to be more precise or should have declaimer something like “Check result of the command in the asd log.”

aql> truncate test.userss;
OK

asd log
Jun 11 2020 13:22:23 GMT: INFO (truncate): (truncate.c:302) {test|userss} truncate for nonexistent set

what does aql’s “OK” mean in this case?

kporter · June 11, 2020, 5:33pm

A namespace doesn’t have to be on every node in the cluster, also every node in the cluster may not be available at this time. Similarly, a set may not be on all nodes. So the ‘OK’ means that the server has recorded the transaction to truncate the namespace ‘users’ or the set ‘test.userss’. The nodes that received the command will relay this command to other nodes joining or returning to the cluster which may have the truncated set or namespace.

bindto · June 11, 2020, 8:36pm

Interesting. Thank you.

However, in this particular case

Jun 11 2020 13:22:23 GMT: INFO (truncate): (truncate.c:302) {test|userss} truncate for nonexistent set

is transaction finished?

if we have 10 servers cluster. 5 are online and 5 are not available. is transaction relayed to all ‘live’ nodes and waits somewhere until rest 5 nodes be online to process it?

Side question, asked before but was ignored. Could you advise how query works with predicates in dept. For instance predicateBetween is used. Is resultset buffered on server side? If yes what is size of buffer? Or does it return record on each fetch operation?

And last: Aql crashes on drop secondary index command.

kporter · June 12, 2020, 6:12pm

When you send a truncate command, you have the option to send a timestamp. If you do not sent a timestamp, the current time is used. This information is stored in System Metadata (SMD) and the client is then acked. SMD will asynchronously call the truncate modules callback when the module is updated. Contents of SMD are distributed to all node in the cluster or all nodes that eventually join the cluster. Many systems in Aerospike operate through SMD, they include truncate, eviction, secondary indexes, and UDF.

bindto · June 12, 2020, 6:50pm

What happens if a truncate command sent, lets say ‘truncate test.userss;’ , however there is node with the test.userss online. Later on, some server has created with and joined to cluster. Will truncate command be applied for this node?

kporter · June 12, 2020, 7:59pm

Yes, all SMD data will replicate to all nodes.

Topic		Replies	Views
Asinfo set-delete: change the AEROSPIKE_ERR_SERVER_FULL error message (AER-3191) Deletion deletion	5	3691	August 6, 2015
Unused data on the disk - Error Code 8: Server memory error Configuration	11	8162	June 29, 2017
Clarify a case of "Server memory error"	1	4487	November 11, 2014
Found existence of deleted and expired records in Aerospike! Deletion deletion , durable-deletion	7	5545	December 29, 2016
Aerospike -community server 3.14.0 version	2	1089	June 14, 2017

Aerospike_err_server_full

Related topics