We run 8 nodes with disk storage, at 5,000 to 20,000 read TPS and 20,000 to 100,000 UDF TPS.
The nodes always crash with the logs above. Any help would be appreciated.
The logs leading up to the crash would help. Has this always been an issue? Did you recently upgrade or start using a new feature? Are the hardware logs clean? Is it only one node? It looks like 5.7.0.18 is the latest release, so there have been a number of bug fixes since your version that may be relevant; see the Aerospike Server CE release notes.
Thanks for the reply.
I have checked the release notes after 5.7.0.10 but cannot find anything related to this issue.
It happens on most of my 8 nodes (always 4 to 6 of them), under continuously high load (40,000 to 80,000 TPS). I got a core dump and traced it with gdb. The server tries to get the particle size of a bin whose particle type shows as 0:
(screenshot: gdb info)
This causes a NULL pointer dereference in the following code:
(screenshot: code)
because type 0 corresponds to AS_PARTICLE_TYPE_NULL, and its entry in particle_vtable is always NULL.
However, I checked the code at tag 5.7.0.18 and on the master branch, and it is the same. In my case the bin written from the UDF is always a map created with the Aerospike UDF API, so I wonder why particle->type is AS_PARTICLE_TYPE_NULL and what causes it. Any help will be appreciated.
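To make the failure mode described above concrete, here is a minimal C sketch of a type-indexed function table whose slot 0 is NULL. It is not the Aerospike source: every name in it (`PTYPE_NULL`, `ops_table`, `particle_size_unchecked`, `particle_size_checked`) is hypothetical and chosen only for illustration, and the guard shown is a generic defensive pattern, not necessarily how the server handles this case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags; 0 plays the role of the "null" particle type. */
enum { PTYPE_NULL = 0, PTYPE_INTEGER = 1, PTYPE_COUNT = 2 };

/* Hypothetical particle header: only the type tag matters here. */
typedef struct particle_s {
    uint8_t type;
} particle;

/* Hypothetical per-type operations (a real table would have many more). */
typedef struct particle_ops_s {
    uint32_t (*size_fn)(const particle *p);
} particle_ops;

static uint32_t integer_size(const particle *p) {
    (void)p;
    return 8; /* illustrative fixed size */
}

static const particle_ops integer_ops = { integer_size };

/* Slot 0 is NULL: the null type has no operations attached. */
static const particle_ops *const ops_table[PTYPE_COUNT] = {
    [PTYPE_NULL] = NULL,
    [PTYPE_INTEGER] = &integer_ops,
};

/* Unchecked dispatch: if p->type is 0, ops_table[0] is NULL and the
 * member access below dereferences a NULL pointer. */
uint32_t particle_size_unchecked(const particle *p) {
    assert(p->type < PTYPE_COUNT); /* bounds only, does not catch slot 0 */
    return ops_table[p->type]->size_fn(p);
}

/* Checked dispatch: returns -1 instead of crashing when the type has
 * no table entry; returns 0 and fills *out on success. */
int particle_size_checked(const particle *p, uint32_t *out) {
    if (p == NULL || out == NULL || p->type >= PTYPE_COUNT) {
        return -1;
    }
    const particle_ops *ops = ops_table[p->type];
    if (ops == NULL) {
        return -1;
    }
    *out = ops->size_fn(p);
    return 0;
}
```

In this sketch, calling `particle_size_unchecked` on a particle with type 0 would crash in the same way as the core dump suggests, while `particle_size_checked` reports an error. A guard like this only hides the symptom; it does not explain how a type-0 particle ended up in the bin in the first place.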
Would you be able to provide code to reproduce this crash?
I can provide some code, but I don’t know how to reproduce the crash. I have tried to simplify it to the following:
-- Note: VER_KEY, DATA_KEY, EXT_DATA_KEY and BIN_KEY are constants, and
-- mainFunc and extendFunc are business functions, all omitted from this excerpt.

-- Returns the bin's map, (re)initialised when it is missing or its version differs.
local function binDataForUpdate(binData, ver)
    if binData == nil then
        binData = map()
        binData[VER_KEY] = ver
        binData[DATA_KEY] = map()
    elseif ver ~= binData[VER_KEY] then
        binData[VER_KEY] = ver
        binData[DATA_KEY] = map()
        map.remove(binData, EXT_DATA_KEY)
    end
    return binData
end

function entrance(rec, paramList)
    if paramList == nil then
        return nil
    end
    local result = map()
    local recordExists = aerospike:exists(rec)
    local needUpdate = false
    local binName
    for param in list.iterator(paramList) do
        binName = param[BIN_KEY]
        local ver = param[VER_KEY]
        if binName == nil or ver == nil then
            -- "RECORD_PARAM_ERROR"
        else
            local binData = map()
            if recordExists then
                -- nil when the bin does not exist yet; handled in binDataForUpdate
                binData = rec[binName]
            end
            binData = binDataForUpdate(binData, ver)
            local ret = mainFunc(binData[DATA_KEY], param)
            local extWrite = extendFunc(binData, param)
            if ret == 0 or extWrite == 0 then
                needUpdate = true
                if not recordExists then
                    local createRet = aerospike:create(rec)
                    if createRet ~= nil and createRet ~= 0 then
                        result['msg'] = "RECORD_UPDATE_ERROR"
                        result['code'] = createRet
                        -- funcName is not defined in this simplified excerpt
                        error('Record create failed with function ' .. tostring(funcName) .. ' binName ' ..
                            tostring(binName) .. '. Error: ' .. tostring(createRet))
                        return result -- not reached: error() raises
                    end
                    recordExists = true
                end
                rec[binName] = binData
            end
        end
    end
    if needUpdate then
        local ret
        ret = aerospike:update(rec)
        if ret ~= nil and ret ~= 0 then
            result['msg'] = "RECORD_UPDATE_ERROR"
            result['code'] = ret
        end
    end
    return result
end
The data stored in the bin is always a map provided by the Aerospike API. All my business code in mainFunc and extendFunc only operates on the keys in that map. I don’t know what triggers the crash: high TPS, or some particular kind of data? When I reduce the amount of data to compute, all nodes work fine. Thanks for your reply, and any help will be appreciated.
Hi, could we have that core dump and the asd binary that you were using? Thank you.
Thanks for your reply. The file is on the production server. I’ll request access to it, which will take some time. Could I have your email so that I can send it to you privately once I have it?
That would be most helpful, I’ve sent you a message with my information.
I’ve sent an email with the core file.
Thank you, we have found the bug thanks to your core file. We’ll have it hot-fixed.
I’m glad it helped, and thanks for your attention to my question.