SIGSEGV received, crash: 5.7.0.10 CE on el7

Running on CentOS 7.9.2009 with Linux kernel 3.10.0-1062.4.3.el7.x86_64

8 nodes with disk storage, handling 5000~20000 read tps and 20000~100000 UDF tps.

It always crashes with the logs above. Any help would be appreciated.

The logs leading up to the crash would help. Has this always been an issue? Did you recently upgrade or start using some new feature? Are the hardware logs clean? Is it only 1 node? It looks like 5.7.0.18 is the latest, so there have been a number of bug fixes since 5.7.0.10 that may be relevant: Aerospike Server CE Release Note | Download | Aerospike

Thanks for the reply.
I have checked the release notes after 5.7.0.10 but cannot find anything about this issue. It happens on most of my 8 nodes (always 4~6 of them), under continuous high load (40000 ~ 80000 tps). I got a core dump file and traced it with gdb. I found that the server tries to get the particle size of a bin whose particle type shows as 0 (pic link: gdb info). This causes a NULL pointer dereference in the following code (link: as code), because 0 corresponds to AS_PARTICLE_TYPE_NULL, and its entry in particle_vtable always points to NULL (image).

I checked the code at tag 5.7.0.18 and on branch master, and it is the same. In my case the bin I use in the UDF is always a map created with the Aerospike UDF API. I wonder why particle->type is AS_PARTICLE_TYPE_NULL and what causes it. Any help will be appreciated.
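To make the failure mode easier to picture, here is a minimal C sketch of the pattern described above: a per-type dispatch table whose slot for type 0 is NULL. This is not the Aerospike source. Every name in it (`particle`, `particle_ops`, `ops_table`, `particle_size_unchecked`, `particle_size_checked`) is a hypothetical stand-in, and the only detail taken from this thread is that type 0 means "null" and has no table entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical particle types. Only "0 means null" mirrors the thread. */
typedef enum {
    PARTICLE_TYPE_NULL = 0,
    PARTICLE_TYPE_INTEGER = 1,
    PARTICLE_TYPE_MAP = 2,
    PARTICLE_TYPE_COUNT
} particle_type;

/* Hypothetical particle header: a type tag plus a payload size. */
typedef struct {
    uint8_t type;   /* one of particle_type */
    uint32_t size;  /* payload size in bytes */
} particle;

/* Per-type operations, looked up by the type tag. */
typedef struct {
    uint32_t (*size_fn)(const particle *p);
} particle_ops;

static uint32_t integer_size(const particle *p) { (void)p; return 8; }
static uint32_t map_size(const particle *p) { return p->size; }

static const particle_ops integer_ops = { integer_size };
static const particle_ops map_ops = { map_size };

/* Slot 0 is NULL: the null type has no operations. */
static const particle_ops *const ops_table[PARTICLE_TYPE_COUNT] = {
    [PARTICLE_TYPE_NULL] = NULL,
    [PARTICLE_TYPE_INTEGER] = &integer_ops,
    [PARTICLE_TYPE_MAP] = &map_ops,
};

/* Unchecked dispatch: if p->type is 0, ops_table[0] is NULL and the
 * member access below dereferences a NULL pointer (SIGSEGV). */
uint32_t particle_size_unchecked(const particle *p) {
    assert(p != NULL);
    return ops_table[p->type]->size_fn(p);
}

/* Checked dispatch: returns 0 and stores the size in *out on success,
 * or -1 if the type is out of range or has no operations. */
int particle_size_checked(const particle *p, uint32_t *out) {
    if (p == NULL || out == NULL || p->type >= PARTICLE_TYPE_COUNT) {
        return -1;
    }
    const particle_ops *ops = ops_table[p->type];
    if (ops == NULL) {
        return -1;
    }
    *out = ops->size_fn(p);
    return 0;
}
```

With this sketch, a particle tagged `PARTICLE_TYPE_MAP` goes through `particle_size_checked` normally, while one tagged `PARTICLE_TYPE_NULL` is rejected with -1 instead of crashing. A guard like this would only hide the symptom; the open question is still how a particle ends up with type 0 in the first place.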

Would you be able to provide code to reproduce this crash?

I can provide some code, but I don’t know how to reproduce this crash. I have tried to simplify it to the following:

local function binDataForUpdate(binData, ver)
    if binData == nil then
        binData = map()
        binData[VER_KEY] = ver
        binData[DATA_KEY] = map()
    elseif ver ~= binData[VER_KEY] then
        binData[VER_KEY] = ver
        binData[DATA_KEY] = map()
        map.remove(binData, EXT_DATA_KEY)
    end
    return binData
end

function entrance(rec, paramList)
    if paramList == nil then
        return nil
    end

    local result = map()
    local recordExists = aerospike:exists(rec)
    local needUpdate = false
    local binName
    for param in list.iterator(paramList) do
        binName = param[BIN_KEY]
        local ver = param[VER_KEY]
        if binName == nil or ver == nil then
            -- "RECORD_PARAM_ERROR"
        else
            local binData = map()
            if recordExists then
                binData = rec[binName]
            end
            binData = binDataForUpdate(binData, ver)
            local ret = mainFunc(binData[DATA_KEY], param)
            local extWrite = extendFunc(binData, param)

            if ret == 0 or extWrite == 0 then
                needUpdate = true
                if not recordExists then
                    local createRet = aerospike:create(rec)
                    if createRet ~= nil and createRet ~= 0 then
                        result['msg'] = "RECORD_UPDATE_ERROR"
                        result['code'] = createRet
                        error('Record create failed with function  '.. tostring(funcName) .. ' binName ' ..
                                tostring(binName) .. '. Error: '.. tostring(createRet))
                        return result
                    end
                    recordExists = true
                end
                rec[binName] = binData
            end
        end
    end

    if needUpdate then
        local ret
        ret = aerospike:update(rec)
        if ret ~= nil and ret ~= 0 then
            result['msg'] = "RECORD_UPDATE_ERROR"
            result['code'] = ret
        end
    end
    return result
end

The data stored in the bin is always a map provided by the Aerospike API. All my business code in mainFunc and extendFunc only operates on the keys in that map. I don’t know what triggers the crash: high tps, or some particular kind of data? When I reduce the amount of data to compute, all nodes work fine. Thanks for your reply, and any help will be appreciated.

Hi, could we have that core dump and the asd binary that you were using? Thank you.

Thanks for your reply. The file is on the production server. I’ll apply for access to it, which will take some time. Could I have your email so that I can send it to you privately once I have it?

That would be most helpful. I’ve sent you a message with my information.

I’ve sent an email with the core file.

Thank you, we have found the bug thanks to your core file. We’ll have it hot-fixed.

I’m glad it helped, and thank you for your attention to my question.