Invalid size for buffer error


#1

Hi,

We recently deployed our ruby application with aerospike on a test server with production like load and within a matter of few seconds we started getting this error very frequently. We were doing a single get from aerospike in each request with the user_id as key. The value for each key is a list with upto two small hashes as its elements which in no case can be greater than the MAX_BUFFER_SIZE which is 10MB by default. There is no other call to the aerospike.

We are using nginx passenger setup with passenger running 50 app instances. In the config/initializers we’ve initialized the aerospike client as a global variable ($as_client). A little digging in the code shows the error comes while parsing the headers from the data_buffer. Is it possible that the data_buffer gets corrupted by multiple requesting using it simultaneously? The error message in the logs is like : Aerospike::Exceptions::Parse: Invalid size for buffer: 1602040889240

The size mentioned in the message is always way bigger than the actual size of the data in that key. How do I go about debugging this? I tried creating a new client instance for every request and it seemed to work fine without giving any errors. However, when I have only one client instance initialized in the initializers direcrory as a global variable, this error comes up.


#2

Hi @zingoba,

The default concurrency model is single-threaded, multi-process. The multi-threaded concurrency model is only supported by the enterprise version. Which model are you using in your application? You definitely shouldn’t have to create a new client instance for every request.

Do you have a stack trace for the exception?

Cheers, Jan


#3

Hey @Jan, Thanks for the quick response. We are currently using the community edition. Here’s the stack trace :

…/gems/aerospike-2.1.0/lib/aerospike/command/ command.rb: 761:insize_buffer_sz’ …/aerospike-2.1.0/lib/aerospike/command/ read_command.rb: 66:in parse_result' …/gems/aerospike-2.1.0/lib/aerospike/command/ command.rb: 454:inexecute’ …uby-2.1.2/gems/aerospike-2.1.0/lib/aerospike/ client.rb: 912:in execute_command' …uby-2.1.2/gems/aerospike-2.1.0/lib/aerospike/ client.rb: 273:inget’ …road/joulroad.com/config/initializers/ 01_raerospike.rb: 48:in get' …oulroad/joulroad.com/app/models/ user_recommendation.rb: 12:inget_user_category_filter_data’ …joulroad.com/app/controllers/ application_controller.rb:2226:in check_my_feed_user_type' …joulroad.com/app/controllers/ application_controller.rb:1770:inpublish_activityLogger’ …joulroad.com/app/controllers/ application_controller.rb:1523:in global_after_filter' …uby-2.1.2/gems/omniauth-1.2.2/lib/omniauth/ strategy.rb: 186:incall!’ …uby-2.1.2/gems/omniauth-1.2.2/lib/omniauth/ strategy.rb: 164:in call' …ruby-2.1.2/gems/omniauth-1.2.2/lib/omniauth/ builder.rb: 59:incall’ …uby-2.1.2/gems/omniauth-1.2.2/lib/omniauth/ strategy.rb: 404:in call_app!' …/ubuntu/joulroad/joulroad.com/lib/ joulroad_strategy.rb: 27:inother_phase’ …uby-2.1.2/gems/omniauth-1.2.2/lib/omniauth/ strategy.rb: 185:in call!' …uby-2.1.2/gems/omniauth-1.2.2/lib/omniauth/ strategy.rb: 164:incall’ …uby-2.1.2/gems/omniauth-1.2.2/lib/omniauth/ strategy.rb: 186:in call!' …uby-2.1.2/gems/omniauth-1.2.2/lib/omniauth/ strategy.rb: 164:incall’ …ruby-2.1.2/gems/omniauth-1.2.2/lib/omniauth/ builder.rb: 59:in call' …ems/rack-mobile-detect-0.4.0/lib/rack/ mobile-detect.rb: 164:incall’ …/.rvm/gems/ruby-2.1.2/gems/rack-1.5.5/lib/rack/ etag.rb: 23:in call' …/ruby-2.1.2/gems/rack-1.5.5/lib/rack/ conditionalget.rb: 35:incall’ …/.rvm/gems/ruby-2.1.2/gems/rack-1.5.5/lib/rack/ head.rb: 11:in call' …-2.1.2/gems/rack-1.5.5/lib/rack/session/abstract/ id.rb: 225:incontext’ …-2.1.2/gems/rack-1.5.5/lib/rack/session/abstract/ id.rb: 220:in call' …/ruby-2.1.2/gems/rack-1.5.5/lib/rack/ methodoverride.rb: 21:incall’ …vm/gems/ruby-2.1.2/gems/rack-1.5.5/lib/rack/ runtime.rb: 17:in call' …m/gems/ruby-2.1.2/gems/rack-1.5.5/lib/rack/ sendfile.rb: 112:incall’ …lib/phusion_passenger/rack/ thread_handler_extension.rb: 58:in process_request' …ib/phusion_passenger/request_handler/ thread_handler.rb: 126:inaccept_and_process_next_request’ …ib/phusion_passenger/request_handler/ thread_handler.rb: 94:in main_loop' …server-4.0.49/lib/phusion_passenger/ request_handler.rb: 471:inblock (3 levels) in start_threads’`


#4

Hi @Jan, I hope you find time to go through this. I’m sharing few more findings here. The above is not the only error that occurs but also the following when we deploy our site with passenger set to run multiple app instances :

ResultCode 66 unknown in the client. Please file a github issue.

Error during refresh for node BB9566D9B270008:127.0.0.1:3000: width too big

All these errors disappear when we configure passenger to run only a single instance of the application. Although the app instances spawned by passenger are separate processes in themselves, so they shouldn’t be sharing the aerospike data buffers but somehow it is these data buffers which get corrupted causing variety of errors. We are currently using client version 2.1.0 on production. Any help/pointers would be much appreciated.


#5

Hi @zingoba,

I was actually looking into this today. And I believe Passenger’s “smart spawning” is causing the issue. When you are creating the Aerospike client instance in the Rails initializer, the client immediately starts to connect to the Aerospike cluster. That means when the ‘preloader’ process gets forked, all the child processes share the existing TCP connection(s) that were established in the ‘preloader’ process. The issue is very similar to the first example listed under the Smart spawning caveats.

You can avoid the issue by using Passenger’s starting_worker_process hook to only connect the client once a new application process was spawned. That way, the processes don’t share any of the file descriptors. Once I changed the client initialization in config/initializers/aerospike.rb the problem disappeared in my test setup:

connect = true
if defined?(PhusionPassenger)
  connect = false
  PhusionPassenger.on_event(:starting_worker_process) do
    $as_client.connect
  end
end
$as_client = Aerospike::Client.new(connect: connect)

The same underlying issue is also responsible for the “ResultCode 66” and “width too big” errors, which I also saw during testing. These errors can happen as well when two processes share the same socket connection to communicate with a cluster node.

Let me know if you are still seeing any issues when you change how the client gets initialized.

Cheers, Jan


#6

@Jan,

Awesome!! This works. Thank you very much for your time. Why didn’t I think in this direction?! :pensive: Thanks again. Cheers!