> 3.0.94: "Unexpected data received from socket after a write"


#1

Since upgrading to anything later than 3.0.94 we’ve been seeing a lot of

Unexpected data received from socket after a write: fd=5 size=130802 (where the size may differ slightly, but is typically in the order of 128K)

errors while trying to write records. We’ve traced this down to code being added in commit https://github.com/aerospike/aerospike-client-c/commit/c9ad474777e6460ae94152ea6303cb4a4de4b0f4 that checks for extra data. The error occurs in this scenario:

  • we have a set with a secondary integer index and some data bins (8 in total). Initially, the value of the secondary index for all records is 0
  • we insert records (doesn’t matter whether it’s 1000 or 1000000)
  • we run a query (as_query_foreach) with one condition: as_query_where(&query, "in_use", as_integer_equals(0)); the callback method just needs one record, so the callback is hard-coded to always return false, and as_query_foreach has a userdata field which is a pointer to an std::string:
static bool candidateRecordCallBack(const as_val *value, void *userData) {
    assert(userData);
    if (value) {
        as_record *rec = as_record_fromval(value);
        if (!rec) {
            return false;
        }
        char *strVal = as_record_get_str(rec, "key");
        if (strVal == 0) {
            return false;
        }

        std::string &str = *(static_cast<std::string *>(userData));
        str = strVal;
    }
    return false;
}
  • [Just for the record, if we store about 1000 records, the callback is called about 3 times per as_query_foreach call, if we store about 100,000 records, we get about 5 callbacks; the last of these is always called with a NULL value pointer, as expected.]
  • Now, after as_query_foreach returns, the callback has returned us a key. We use this to retrieve & update that record (in the same set) and destroy the query using as_query_destroy
  • Next, we write a few records to other sets, and then we get the error mentioned above, however which aerospike_key_put fails differs between invocations.
  • We have dumped the buffer that is claimed to be received erroneously in as_command.c and it seems to contain records from the query intended for query callback (i.e. we see record data). This is unexpected, since the query callback has finished and we’ve destroyed the query using as_query_destroy.

Is it possible that the query is somehow not properly finished on the server and that the server keeps sending data? There does not seem to be a way to check whether a query is fully finished and/or to flush any remaining data (I expect I wouldn’t have to do that?)

Notes:

  • the error occurs only on ‘remote’ system, i.e. this does not happen if we run Aerospike on localhost (single node cluster)
  • the error occurs more often if there are mode nodes in a cluster, and if the machines or network are slower
  • for our specific scenario, versions 3.0.96 through roughly 3.1.2 hang; anything after that exhibits the error listed above

#2

This is a bug. The query is terminated prematurely, but the socket is placed back into the connection pool. A future command will re-use the socket and then get the error you observe. The socket needs to be closed when the query is terminated in this way. We are working on a fix.


#3

Thanks for the quick response Brian!


#4

C client 3.1.11 has been released which should fix your problem.

http://www.aerospike.com/download/client/c/3.1.11


#5

Hi Brian, thanks, we can confirm this indeed fixes this issue.