We are facing an issue in our Aerospike cluster.
Infra details:
Aerospike version: V4.5.2.2
C client binaries: V4.6.16
Issue description, scenario 1:
1. A C application inserts the record into Aerospike with a TTL of 14 hours. The insert succeeds; we confirm this by logging the status returned by the library function. The C client library function used is “aerospike_key_put()”, called as “if (aerospike_key_put(as, &err, NULL, &g_key, &rec) != AEROSPIKE_OK)”.
2. A few milliseconds after the insert, another C application looks up this record in order to process it and re-insert it (delete the old key, then insert the record under a new key with a TTL of 14 hours). The record is not found in Aerospike. The C client library function used is “aerospike_key_get()”, called as “if (aerospike_key_get(p_as, &err, NULL, g_key, &p_rec) != AEROSPIKE_OK)”.
3. This fetch failure is intermittent: it affects roughly 1-10% of the inserts and fetches done by the C applications, not all records.
Issue description, scenario 2:
1. A C application inserts the record into Aerospike with a TTL of 14 hours. The insert succeeds; we confirm this by logging the status returned by the library function. The C client library function used is “aerospike_key_put()”, called as “if (aerospike_key_put(as, &err, NULL, &g_key, &rec) != AEROSPIKE_OK)”.
2. Another C application fetches the record and re-inserts it under the changed key with a TTL of 14 hours, as in point 2 of the scenario above. The C client library function used is “aerospike_key_get()”, called as “if (aerospike_key_get(p_as, &err, NULL, g_key, &p_rec) != AEROSPIKE_OK)”.
3. After 12 hours, another C application looks up the record in Aerospike to fetch its details. The record is not found, so the message cannot be processed further. The C client library function used is “aerospike_key_get()”, called as “if (aerospike_key_get(p_as, &err, NULL, g_key, &p_rec) != AEROSPIKE_OK)”.
4. This fetch failure is intermittent: it affects roughly 1-10% of the inserts and fetches done by the C applications, not all records.
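To make the timing in scenario 2 explicit, here is a minimal sketch of the TTL arithmetic in plain C. It assumes the re-insert happens at roughly t = 0 with a TTL of 14 hours and that nothing else rewrites the record afterwards. The helper name remaining_ttl_s is hypothetical and is not part of the Aerospike C client; the sketch also ignores eviction and clock differences between nodes.

```c
#include <stdint.h>

#define SECONDS_PER_HOUR 3600u

/* Hypothetical helper (not an Aerospike API): seconds of life left at now_s
 * for a record written at write_time_s with a TTL of ttl_s seconds.
 * Returns 0 when the record would already have expired. */
static uint32_t remaining_ttl_s(uint32_t write_time_s, uint32_t ttl_s,
                                uint32_t now_s)
{
    uint32_t expiry_s = write_time_s + ttl_s;
    return (now_s >= expiry_s) ? 0u : expiry_s - now_s;
}

/* Scenario 2 from this post: re-insert at t = 0 with a TTL of 14 hours,
 * lookup 12 hours later. */
static uint32_t scenario2_remaining_s(void)
{
    return remaining_ttl_s(0u, 14u * SECONDS_PER_HOUR, 12u * SECONDS_PER_HOUR);
}
```

Under these assumptions, scenario2_remaining_s() gives 7200 seconds, i.e. about 2 hours of TTL should still be left at the time of the lookup, so plain TTL expiry alone would not explain the missing record.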
Configuration File:
service {
    paxos-single-replica-limit 1
    proto-fd-max 90000
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    query-in-transaction-thread true
}
logging {
    file /var/log/aerospike/aerospike.log {
        context any info
    }
    console {
        context any warning
    }
}
network {
    service {
        address any
        port 3000
    }
    heartbeat
    {
        mode mesh
        #multicast-group 239.1.99.222
        port 3002
        mesh-seed-address-port IP-address_1 3002
        mesh-seed-address-port IP-address_2 3002
        mesh-seed-address-port IP-address_3 3002
        mesh-seed-address-port IP-address_4 3002
        interval 150
        timeout 10
    }
    fabric
    {
        port 3001
    }
    info
    {
        port 3003
    }
}
namespace dup {
    replication-factor 2
    memory-size 10G
    default-ttl 20M # 30 days, use 0 to never expire/evict.
    data-in-index true
    single-bin true
    write-commit-level-override=all
    read-consistency-level-override=all
    storage-engine device {
        file /u01/aerospike/data/prod_dup.dat
        filesize 40G
        data-in-memory true
    }
}
namespace tran {
    replication-factor 2
    memory-size 110G
    default-ttl 18H # 30 days, use 0 to never expire/evict.
    write-commit-level-override=all
    read-consistency-level-override=all
    storage-engine memory
}