FAILED ASSERTION (alloc): (alloc.c:987) valloc failed sz 131072

Hello. After these messages my Aerospike server crashed. What could be the issue?

03:23:42 GMT: INFO (info): (ticker.c:399) {ns1} objects: all 0 master 0 prole 0 non-replica 0
03:23:42 GMT: INFO (info): (ticker.c:457) {ns1} migrations: complete
03:23:42 GMT: INFO (info): (ticker.c:475) {ns1} memory-usage: total-bytes 36864 index-bytes 0 sindex-bytes 36864 data-bytes 0 used-pct 0.00
03:23:44 GMT: FAILED ASSERTION (alloc): (alloc.c:987) valloc failed sz 131072
03:23:44 GMT: WARNING (as): (signal.c:210) SIGUSR1 received, aborting Aerospike Community Edition build 4.5.3.5 os ubuntu18.04
03:23:44 GMT: WARNING (as): (signal.c:212) stacktrace: registers: rax 0000000000000000 rbx 00007f8d4b27d1a0 rcx 00007f8e83c55727 rdx 0000000000000000 rsi 00007f8d4b27d030 rdi 0000000000000002 rbp 00007f8d4b27d690 rsp 0
030 r8 0000000000000000 r9 00007f8d4b27d030 r10 0000000000000008 r11 0000000000000246 r12 0000000000000001 r13 0000000000000000 r14 0000000000000059 r15 0000557362921560 rip 00007f8e83c55727
03:23:44 GMT: WARNING (as): (signal.c:212) stacktrace: found 16 frames: 0x557360c1aa4b 0x7f8e83c55890 0x7f8e83c55727 0x557360cc7472 0x557360cbe40e 0x557360c99646 0x557360c9cd88 0x557360c9d1b6 0x557360cb9244 0x557360cba
cbb381 0x557360c55e25 0x557360c5623b 0x557360cbf41b 0x7f8e83c4a6db 0x7f8e82a0b88f offset 0x557360b74000
03:23:44 GMT: WARNING (as): (signal.c:212) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_usr1+0x13e) [0x557360c1aa4b]
03:23:44 GMT: WARNING (as): (signal.c:212) stacktrace: frame 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890) [0x7f8e83c55890]
03:23:44 GMT: WARNING (as): (signal.c:212) stacktrace: frame 2: /lib/x86_64-linux-gnu/libpthread.so.0(raise+0xc7) [0x7f8e83c55727]
03:23:44 GMT: WARNING (as): (signal.c:212) stacktrace: frame 3: /usr/bin/asd(cf_fault_event+0x27a) [0x557360cc7472]
03:23:44 GMT: WARNING (as): (signal.c:212) stacktrace: frame 4: /usr/bin/asd(valloc+0x90) [0x557360cbe40e]
03:23:44 GMT: WARNING (as): (signal.c:212) stacktrace: frame 5: /usr/bin/asd(swb_get+0x1d7) [0x557360c99646]
03:23:44 GMT: WARNING (as): (signal.c:212) stacktrace: frame 6: /usr/bin/asd(ssd_buffer_bins+0x121) [0x557360c9cd88]
03:23:44 GMT: WARNING (as): (signal.c:212) stacktrace: frame 7: /usr/bin/asd(ssd_write+0xa4) [0x557360c9d1b6]
03:23:44 GMT: WARNING (as): (signal.c:212) stacktrace: frame 8: /usr/bin/asd(write_master_ssd+0x2a1) [0x557360cb9244]
03:23:44 GMT: WARNING (as): (signal.c:212) stacktrace: frame 9: /usr/bin/asd(write_master+0x2c8) [0x557360cbaa8f]
03:23:44 GMT: WARNING (as): (signal.c:212) stacktrace: frame 10: /usr/bin/asd(as_write_start+0xf2) [0x557360cbb381]
03:23:44 GMT: WARNING (as): (signal.c:212) stacktrace: frame 11: /usr/bin/asd(as_tsvc_process_transaction+0x613) [0x557360c55e25]
03:23:44 GMT: WARNING (as): (signal.c:212) stacktrace: frame 12: /usr/bin/asd(run_tsvc+0x8c) [0x557360c5623b]
03:23:44 GMT: WARNING (as): (signal.c:212) stacktrace: frame 13: /usr/bin/asd(+0x14b41b) [0x557360cbf41b]
03:23:44 GMT: WARNING (as): (signal.c:212) stacktrace: frame 14: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f8e83c4a6db]
03:23:44 GMT: WARNING (as): (signal.c:212) stacktrace: frame 15: /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f8e82a0b88f]
05:13:25 GMT: INFO (as): (as.c:317) <><><><><><><><><><>  Aerospike Community Edition build 4.5.3.5  <><><><><><><><><><>

Your system was likely too low on RAM to satisfy the 131072-byte allocation. It is strange that the allocation itself failed and tripped the assertion, since Linux permits RAM to be over-committed; we usually see the OOM killer kill the process instead.

Hm, no objects. Is this the only namespace?

Could you share your system specs as well as your aerospike.conf?

cat /proc/meminfo
cat /proc/cpuinfo | grep "model name"
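
Since the question above is why the allocation failed outright rather than the OOM killer stepping in, it may also help to know which over-commit mode the kernel is running in. Here is a minimal sketch that reads the two standard Linux sysctl files. The function name and the mode descriptions are my own wording, not anything Aerospike ships.

```python
from pathlib import Path

# Meanings of vm.overcommit_memory, paraphrased from the Linux kernel docs.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, allocations fail beyond CommitLimit",
}


def describe_overcommit(proc_sys_vm="/proc/sys/vm"):
    """Return (mode, ratio, description) read from the vm sysctl directory."""
    base = Path(proc_sys_vm)
    mode = int((base / "overcommit_memory").read_text().strip())
    ratio = int((base / "overcommit_ratio").read_text().strip())
    return mode, ratio, OVERCOMMIT_MODES.get(mode, "unknown mode")
```

Calling describe_overcommit() with no arguments on the affected node returns the active mode. A mode of 2 would be one possible explanation for an allocation that fails immediately, but that is only a guess until the value is known.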

Hm, it happened again.

Aug 23 2019 11:59:43 GMT: FAILED ASSERTION (alloc): (alloc.c:987) valloc failed sz 131072

Actually, there are a lot of objects in the namespace.

This is the asadm output from just before it crashed:

# asadm -e info
Seed:        [('127.0.0.1', 3000, None)]
Config_file: /root/.aerospike/astools.conf, /etc/aerospike/astools.conf
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information (2019-08-23 11:49:39 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                           Node               Node                   Ip       Build   Cluster   Migrations        Cluster     Cluster         Principal   Client       Uptime
                              .                 Id                    .           .      Size            .            Key   Integrity                 .    Conns            .
aerospike1.dddd.domain.com:3000   *BB9D4512311B36C   xx.xx.xxx.xxx:3000   C-4.5.3.5         2      4.079 K   B75BC879D438   True        BB9D4512311B36C     2335   102:36:14
aerospike2.dddd.domain.com:3000   BB9BC512311B36C    yy.yy.yy.yyy:3000    C-4.3.0.7         2      4.079 K   B75BC879D438   True        BB9D4512311B36C     2826   3284:35:00
Number of rows: 2

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Usage Information (2019-08-23 11:49:39 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace                              Node       Total   Expirations,Evictions     Stop       Disk    Disk     HWM   Avail%          Mem     Mem    HWM      Stop
        .                                 .     Records                       .   Writes       Used   Used%   Disk%        .         Used   Used%   Mem%   Writes%
ns1      aerospike1.dddd.domain.com:3000     2.164 B   (666.400 M, 0.000)      false    3.917 TB   60      90      31       273.725 GB   79      94     98
ns1      aerospike2.dddd.domain.com:3000     2.392 B   (2.641 B, 0.000)        false    4.339 TB   75      80      12       297.902 GB   86      94     98
ns1                                          4.555 B   (3.308 B, 0.000)                 8.256 TB                            571.627 GB
ns       aerospike1.dddd.domain.com:3000     0.000     (0.000,  0.000)         false         N/E   N/E     50      N/E       36.000 KB   1       60     90
ns                                           0.000     (0.000,  0.000)                  0.000 B                              36.000 KB
ns2      aerospike2.dddd.domain.com:3000   980.860 K   (0.000,  3.257 M)       false         N/E   N/E     50      N/E      611.176 MB   15      80     96
ns2                                        980.860 K   (0.000,  3.257 M)                0.000 B                             611.176 MB
Number of rows: 7

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Namespace Object Information (2019-08-23 11:49:39 UTC)~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Namespace                              Node       Total     Repl                        Objects                   Tombstones               Pending   Rack
        .                                 .     Records   Factor     (Master,Prole,Non-Replica)   (Master,Prole,Non-Replica)              Migrates     ID
        .                                 .           .        .                              .                            .               (tx,rx)      .
ns1      aerospike1.dddd.domain.com:3000     2.164 B   2        (1.023 B, 1.141 B, 0.000)      (0.000,  0.000,  0.000)      (2.238 K, 1.841 K)    0
ns1      aerospike2.dddd.domain.com:3000     2.392 B   2        (1.216 B, 1.176 B, 0.000)      (0.000,  0.000,  0.000)      (1.841 K, 2.238 K)    0
ns1                                          4.555 B            (2.239 B, 2.317 B, 0.000)      (0.000,  0.000,  0.000)      (4.079 K, 4.079 K)
ns       aerospike1.dddd.domain.com:3000     0.000     1        (0.000,  0.000,  0.000)        (0.000,  0.000,  0.000)      (0.000,  0.000)       0
ns                                           0.000              (0.000,  0.000,  0.000)        (0.000,  0.000,  0.000)      (0.000,  0.000)
ns2      aerospike2.dddd.domain.com:3000   980.719 K   1        (980.719 K, 0.000,  0.000)     (0.000,  0.000,  0.000)      (0.000,  0.000)       0
ns2                                        980.719 K            (980.719 K, 0.000,  0.000)     (0.000,  0.000,  0.000)      (0.000,  0.000)
Number of rows: 7

/proc/meminfo

MemTotal:       396133456 kB
MemFree:        53563996 kB
MemAvailable:   62018072 kB
Buffers:          680520 kB
Cached:          9357856 kB
SwapCached:            0 kB
Active:         334273624 kB
Inactive:        6364376 kB
Active(anon):   330595956 kB
Inactive(anon):     4456 kB
Active(file):    3677668 kB
Inactive(file):  6359920 kB
Unevictable:       32016 kB
Mlocked:           32016 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:             59768 kB
Writeback:             0 kB
AnonPages:      330631972 kB
Mapped:           194596 kB
Shmem:              6096 kB
Slab:             795724 kB
SReclaimable:     527000 kB
SUnreclaim:       268724 kB
KernelStack:       35200 kB
PageTables:       660192 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    396133456 kB
Committed_AS:   337434180 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:     1570860 kB
DirectMap2M:    76957696 kB
DirectMap1G:    325058560 kB
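
For anyone reading the dump above, a small sketch that pulls out the commit accounting figures and computes the gap between CommitLimit and Committed_AS. The helper names are hypothetical. Note that the kernel only enforces CommitLimit when vm.overcommit_memory is 2, and the dump alone does not show which mode is active.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo style text into {field: integer value}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def commit_headroom_kb(values):
    """Address space (kB) still committable before CommitLimit is reached."""
    return values["CommitLimit"] - values["Committed_AS"]


# Figures copied from the meminfo dump in this thread.
sample = """MemTotal:       396133456 kB
MemAvailable:   62018072 kB
CommitLimit:    396133456 kB
Committed_AS:   337434180 kB"""

info = parse_meminfo(sample)
print(commit_headroom_kb(info))  # 58699276
```

On a live node the same functions can be fed the contents of /proc/meminfo instead of the sample string.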

model name : AMD EPYC 7401P 24-Core Processor

service {
        user root
        group root
        paxos-single-replica-limit 1
        pidfile /var/run/aerospike/asd.pid
        proto-fd-max 60000
}

logging {
        file /var/log/aerospike/aerospike.log {
                context any info
        }
        console {
                context any info
        }
}

network {
        service {
                address x.x.x.x
                access-address x.x.x.x
                alternate-access-address x.x.x.x

                port 3000
        }
        heartbeat {
                mode mesh
                address x.x.x.x
                port 3002
                interval 150
                timeout 10

                mesh-seed-address-port y.y.y.y 3002
        }
        fabric {
                address x.x.x.x
                port 3001
        }
        info {
                address x.x.x.x
                port 3003
        }
}

namespace ns1 {
        replication-factor 2
        memory-size 320G
        default-ttl 7d
        high-water-disk-pct 90
        high-water-memory-pct 96

        storage-engine device {
            device /dev/sda3
            device /dev/sda4
            device /dev/sdb3
            device /dev/sdb4
            device /dev/sdc1
            device /dev/sdc2
            device /dev/sdd1
            device /dev/sdd2

            write-block-size 128k
            defrag-startup-minimum 6
        }
}

namespace ns {
        replication-factor 1
        memory-size 12G
        default-ttl 7d
        single-bin      false

        storage-engine memory
}

Oh, wow. Thanks for reporting this. I’d like to see the heap-efficiency lines from your log file leading up to the crash. You can get these lines with a simple grep, just fill in the location of your log file:

grep heap-efficiency .../aerospike.log

This should give you a log line that repeats every 10 seconds. Can you give me these log lines for the 5 minutes leading up to the crash?

I’m wondering what the heap efficiency reading is. Maybe we’re dealing with memory fragmentation.
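
If grep returns too much to paste, something like the sketch below could trim it to the 5 minutes before the crash. It assumes every line starts with a full timestamp in the form shown in the second crash report ("Aug 23 2019 11:59:43 GMT: ..."). Lines with a shortened timestamp are skipped. The function names are made up for this example.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout, e.g. "Aug 23 2019 11:59:43" before " GMT: ".
STAMP_FORMAT = "%b %d %Y %H:%M:%S"


def parse_stamp(line):
    """Return the line's timestamp, or None if it has no full timestamp."""
    stamp, sep, _ = line.partition(" GMT: ")
    if not sep:
        return None
    try:
        return datetime.strptime(stamp, STAMP_FORMAT)
    except ValueError:
        return None


def heap_lines_before_crash(lines, window=timedelta(minutes=5)):
    """Select heap-efficiency lines within `window` before the last crash."""
    lines = list(lines)
    crash_time = None
    for line in lines:
        if "FAILED ASSERTION" in line:
            when = parse_stamp(line)
            if when is not None:
                crash_time = when  # keep the most recent assertion
    if crash_time is None:
        return []
    start = crash_time - window
    selected = []
    for line in lines:
        if "heap-efficiency" not in line:
            continue
        when = parse_stamp(line)
        if when is not None and start <= when <= crash_time:
            selected.append(line)
    return selected
```

To use it, open the log file and pass the file object to heap_lines_before_crash, then print each returned line.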