Random SIGSEGV when doing around 104600 to 104700 queries


#1

As mentioned in the title, I get random crashes after around 104600 to 104700 queries with the C client API.

I use Qt Creator on Ubuntu 12.04 LTS.

I forked the C client 4.0.2 on GitHub and compiled it on my local machine. With the deb package I only got a few chunks of assembly; with the self-compiled build I got crashes either in Aerospike code or in assembly without a stack trace. What the crashes have in common is that the locations differ every time and that I can run about 104k aerospike_key_get() calls before it happens. I installed signal handlers, which work fine with Ctrl+C or *(int*)0 = 42; but are unable to catch these SIGSEGVs; even gdb crashed.

Does anyone have an idea how to deal with this?

I figured out a workaround: I count the API operations, and when the count reaches 100k I throw as_config and aerospike away and create completely fresh ones. This avoids the crash.
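One possible reason a handler never runs for a real SIGSEGV is that the stack itself is exhausted or corrupted, so the handler has nowhere to execute. This is only a guess for this case. A minimal sketch, assuming POSIX/Linux, that installs the handler on an alternate stack; the names install_segv_handler, on_segv and g_segv_seen are hypothetical and not part of the Aerospike client:

```cpp
#include <signal.h>
#include <unistd.h>

// Hypothetical flag so a caller can observe that the handler ran.
static volatile sig_atomic_t g_segv_seen = 0;

// Handler limited to async-signal-safe calls (write, no iostream).
static void on_segv(int)
{
    static const char msg[] = "caught SIGSEGV\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
    g_segv_seen = 1;
}

// Installs on_segv on a dedicated stack; returns true on success.
bool install_segv_handler()
{
    // Fixed size; SIGSTKSZ is not a compile-time constant on newer glibc.
    static char altstack[64 * 1024];

    stack_t ss{};
    ss.ss_sp = altstack;
    ss.ss_size = sizeof(altstack);
    ss.ss_flags = 0;
    if (sigaltstack(&ss, nullptr) != 0)
        return false;

    struct sigaction sa{};
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    // SA_ONSTACK: run the handler on altstack.
    // SA_RESETHAND: restore the default action after the first delivery, so a
    // real fault that re-triggers after the handler returns ends the process.
    sa.sa_flags = SA_ONSTACK | SA_RESETHAND;
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}
```

Calling install_segv_handler() once at the start of main would be enough. If the handler still does not fire, the fault is probably not a plain stack overflow.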


#2

as_config is not designed to stick around. All of the as_config data is copied when aerospike_connect() is called, so as_config can go out of scope after that.


#3

OK, thanks, but I think I was a little unclear…

The first implementation was just a few lines within main (the application just imports some data from files such as CSV); at that stage as_config lived on the stack for the whole run time.

The latest version, which still contains the bug, throws it away after aerospike_init() and aerospike_connect().

My workaround is to throw everything away after 100k queries, including aerospike* as;, parse the int main(args) arguments again, create everything from scratch, and then go on with the import. If I instead just call aerospike_close() and aerospike_connect(), I get an AEROSPIKE_ERR_TIMEOUT afterwards.
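The counting workaround described above can be sketched independently of the client library. This is only an illustration: Client is a hypothetical stand-in for the aerospike handle, and RecyclingClient and its factory callback are made-up names, not part of the Aerospike API.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for the real client handle.
struct Client { int generation; };

// Hands out a client and replaces it with a fresh one every maxOps operations.
class RecyclingClient
{
public:
    RecyclingClient(std::size_t maxOps,
                    std::function<std::unique_ptr<Client>()> make)
        : maxOps_(maxOps), make_(std::move(make)), client_(make_()) {}

    // Returns the current client; recycles it first once maxOps_ uses were handed out.
    Client& acquire()
    {
        if (ops_ == maxOps_)
        {
            client_.reset();     // tear down the old client first (close + destroy)
            client_ = make_();   // then build a completely fresh one (init + connect)
            ops_ = 0;
            ++recycles_;
        }
        ++ops_;
        return *client_;
    }

    std::size_t recycles() const { return recycles_; }

private:
    std::size_t maxOps_;
    std::function<std::unique_ptr<Client>()> make_;
    std::unique_ptr<Client> client_;
    std::size_t ops_ = 0;
    std::size_t recycles_ = 0;
};
```

In the import loop, every get or put would go through acquire(), with the factory doing the config parsing and connect. Counting operations rather than input lines keeps the limit independent of how many calls each line makes.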


#4

It would be helpful if you provided the source code that reproduces the error.


#5

#include <string>
#include <vector>
#include <iostream>
#include <aerospike/aerospike.h>
#include <aerospike/aerospike_key.h>
#include <aerospike/as_record_iterator.h>

enum class FunctionResult { success = 0, terminate = 1, repeat = 2, failure = 3 };
struct MainArgRefs
{
    const char* asnamespace;
    const char* mappingset;
    const char* targetset;
    const char* targetbin;
    const char* inputfile;
};

std::size_t g_linenumber = -1;
std::size_t g_successfulLines = -1;

FunctionResult ParseArgsAndConnect
(
    aerospike* as,
    MainArgRefs* argrefs,
    int argc, char* argv[]
)
{
    as_config cfg;
    as_config_init(&cfg);
    cfg.fail_if_not_connected = false;
    cfg.conn_timeout_ms = 1000;
    cfg.policies.timeout = 1000;
    std::vector<std::string> servers;
    for (int i = 1; i < argc; i++)
        if ("certain conditions")
            argrefs->asnamespace = argv[++i];
        else if ("other conditions")
            as_config_add_host(&cfg, address.c_str(), (uint16_t)port);

    // connect to aerospike
    as_error err;
    aerospike_init(as, &cfg);
    if (aerospike_connect(as, &err) != AEROSPIKE_OK)
    {
        LOG_GENERAL_ERROR("error connecting to aerospike: " << err.message);
        aerospike_destroy(as);
        return FunctionResult::failure;
    }
    return FunctionResult::success;
}

FunctionResult StreamDataIntoAerospike
(
    aerospike* as,
    MainArgRefs* argrefs
)
{
    bool reinitialize = false;
    while (std::getline(std::cin, line))
    {
        g_linenumber++;
        // some parse logic ...

        // map dmp-uid to uuid
        as_key keyget;
        if (&keyget != as_key_init_str(&keyget, argrefs->asnamespace, argrefs->mappingset, dmpuid.c_str()))
            LOG_GENERAL_ERROR("error initializing key");
        else
        {
            as_record* recget = nullptr;
            as_error err;
            as_status stat = aerospike_key_get(as, &err, nullptr, &keyget, &recget);
            if (stat == AEROSPIKE_OK && recget)
            {
                uuid = as_record_get_str(recget, "uuid");
                if (!uuid.empty())
                {
                    as_key keyput;
                    if (&keyput != as_key_init_str(&keyput, argrefs->asnamespace, argrefs->targetset, uuid.c_str()))
                        LOG_GENERAL_ERROR("error initializing key");
                    as_record recput;
                    as_record_inita(&recput, 1);
                    if (as_record_set_str(&recput, argrefs->targetbin, convertedData.c_str()))
                    {
                        stat = aerospike_key_put(as, &err, nullptr, &keyput, &recput);
                        if (stat == AEROSPIKE_OK)
                            g_successfulLines++;
                        else
                            LOG_GENERAL_ERROR("error inserting aerospike: " << err.message << ", uuid: "<< uuid);
                    }
                    else
                        LOG_GENERAL_ERROR("error inserting aerospike, uuid: " << uuid);
                    as_record_destroy(&recput);
                }
                else
                    LOG_GENERAL_ERROR("error querying aerospike bin \'uuid\' has no value, dmp uid:" << dmpuid);
            }
            else if (stat == AEROSPIKE_ERR_RECORD_NOT_FOUND)
                LOG_DATA_ERROR("AEROSPIKE_ERR_RECORD_NOT_FOUND for dmp uid from mapping set: " << dmpuid);
            else
                LOG_GENERAL_ERROR("error querying aerospike: code " << stat << ": " << err.message);
            as_record_destroy(recget);
        }
        as_key_destroy(&keyget);

        // do only 100k queries (with each loop two are done)
        if ((g_linenumber%50000) == 0)
        {
            TRACE_OBJECT(g_linenumber);
#           if 1
                reinitialize = true;
                break;
#           else
                as_error err;
                if (aerospike_close(as, &err) != AEROSPIKE_OK)
                    LOG_GENERAL_ERROR("error disconnecting to aerospike: " << err.message);
                if (aerospike_connect(as, &err) != AEROSPIKE_OK)
                    LOG_GENERAL_ERROR("error connecting to aerospike: " << err.message);
#           endif
        }
    }

    // clean up
    FunctionResult res = FunctionResult::success;
    as_error err;
    if (aerospike_close(as, &err) != AEROSPIKE_OK)
    {
        LOG_GENERAL_ERROR("error disconnecting to aerospike: " << err.message);
        res = FunctionResult::failure;
    }
    aerospike_destroy(as);
    if (reinitialize)
        return FunctionResult::repeat;
    else
        return res;
}

int main(int argc, char* argv[])
{
    while (true)
    {
        aerospike as;
        MainArgRefs argrefs;
        switch (ParseArgsAndConnect(&as, &argrefs, argc, argv))
        {
        case FunctionResult::success: break;   // connected, go on with the import
        case FunctionResult::terminate: return 0;
        case FunctionResult::failure:
        default: return -1;
        }
        switch (StreamDataIntoAerospike(&as, &argrefs))
        {
        case FunctionResult::repeat: continue;
        case FunctionResult::failure: return -1;
        case FunctionResult::success:
        default:
            std::cout << "{\"TotalLines\":" << g_linenumber << ",\"SuccessfulLines\":" << g_successfulLines << "}" << std::endl;
            return 0;
        }
    }
}

I stripped some parts out and think/hope they were not important.


#6

I didn’t understand this line:

uuid = as_record_get_str(recget, "uuid");

uuid is not defined, but I assume it's a C++ string. as_record_get_str() returns a “char *”. Is this string really constructed/destructed properly when it is assigned a “char *” in a loop? I’m not a C++ expert.

In general, the aerospike calls look okay. I don’t have any advice other than to check for memory leaks/stomps when interfacing between C and C++ types.
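One C/C++ boundary detail relevant to the quoted line: assigning a null char pointer to a std::string is undefined behavior. Assuming as_record_get_str() can return NULL when the bin is missing or is not a string, a guard like the following keeps the assignment safe. The helper name to_string_or_empty is hypothetical:

```cpp
#include <string>

// Converts a possibly-null C string into a std::string; null becomes "".
inline std::string to_string_or_empty(const char* s)
{
    return s ? std::string(s) : std::string();
}
```

The line in the loop would then read uuid = to_string_or_empty(as_record_get_str(recget, "uuid")); and the existing uuid.empty() check covers the missing-bin case.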


#7

std::string line, dmpuid, data, timestamp, convertedData, uuid; is declared just before while (std::getline(std::cin, line)), so it's properly assigned and copied.

Yes, I guess it needs a deeper analysis, maybe in clib, aslib and/or memory…

But thanks for reviewing the code.