How to increase threads used by UDFs?

Tags: benchmark, secondary index

#25

I think there is some other bottleneck …

– R


#26

Hello, I have the same problem with similar Lua scripts. In addition to the problem described, I sometimes find lines like this in the detailed UDF log: DETAIL (query): (aggr.c: as_aggr_istream_read: 400) No More Nodes for this Lua Call

Earlier in this thread, I found similar messages in the log: DETAIL (query): (aggr.c: as_aggr_istream_read: 406) No More Nodes for this Lua Call

Does anyone know what this means? Is it because the Lua execution takes a long time while using only some of the CPUs? Where can I find information about this message?


#27

JekaMas,

I would imagine you are running something like

aql> Aggregate something.something () on test

which is a scan aggregation. We have just identified a regression and I am working on a fix. Please confirm.

nizsheanez is actually running the aggregation over a secondary index query:

aql> Aggregate something.something () on test where bin = 10 

So what we are seeing there is something different.

– R


#28

Thanks for the answer, Raj! I am trying queries both with and without a secondary index:

aggregate fullsearch.filter_records('active','one') on test.products

aggregate udfActive.filter_records() on test.products WHERE status='active'

Lua scripts:

fullsearch.lua:

function filter_records(stream, status, testbin)
   local function map_record(record)
      return map {id=record.id, testBin=record.testBin}
   end

   local function filter_name(record)
      return record.testBin == testbin and record.status == status
   end

   return stream : filter(filter_name) : map(map_record)
end

udfActive.lua:

function filter_records(stream)
   local function map_record(record)
      return map {id=record.id, testBin=record.testBin}
   end

   local function filter_name(record)
      return record.testBin == "one"
   end
  
   return stream : filter(filter_name) : map(map_record)
end

The behavior of both queries is very similar. Each of them (with and without the secondary index) loads only 1-4 of my 8 processors; the other processors stay idle. The detailed log contains messages like:

DETAIL (query): (aggr.c: as_aggr_istream_read: 400) No More Nodes for this Lua Call

Increasing the number of processors does not decrease the query execution time. I get the same results with anywhere from 2 to 8 processors and 1,000,000 records: 0.4 s for the query with the secondary index and 1.4 s for the query without it.


#29

Jekamas,

Scan aggregation and query aggregation are two different engines on the server side.

What query configuration are you running with?

asinfo -v 'get-config:' | grep query

You may want to bump up query-threads. Check this out:

http://www.aerospike.com/docs/operations/manage/queries/
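As a rough sketch, raising the thread count could look like the following. The `context=service` context and the runtime `set-config` syntax are assumptions that depend on your server version, so verify them against the docs linked above before using:

```shell
# Inspect the current query settings (same command as above)
asinfo -v 'get-config:' | grep query

# Raise the number of query threads at runtime; parameter name and
# context are assumptions to check for your server version
asinfo -v 'set-config:context=service;query-threads=8'

# To make the change survive a restart, the equivalent setting would
# go into the service stanza of aerospike.conf, e.g.:
#   service {
#       query-threads 8
#   }
```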

– R


#30

Raj,

My running configuration:

query-threads=6
query-worker-threads=15
query-priority=10
query-in-transaction-thread=0
query-req-in-query-thread=0
query-req-max-inflight=100
query-bufpool-size=256
query-batch-size=100
query-sleep=1
query-job-tracking=false
query-short-q-max-size=500
query-long-q-max-size=500
query-rec-count-bound=4294967295
query-threshold=10
query-untracked-time=1000000
query-hist-track-back=1800
query-hist-track-slice=10
query-hist-track-thresholds=1,8,64
query_rec_count-hist-track-back=1800
query_rec_count-hist-track-slice=10
query_rec_count-hist-track-thresholds=1,8,64

Pretty standard, I think.

Raj, can you describe what this message in the log means:

DETAIL (query): (aggr.c: as_aggr_istream_read: 400) No More Nodes for this Lua Call

#31

JekaMas,

DETAIL (query): (aggr.c: as_aggr_istream_read: 400) No More Nodes for this Lua Call

This simply means that you are done with a certain batch, that's all. When an aggregation runs, the system chunks the records up into batches and runs the Lua aggregation code over each one. The message only shows up because you have enabled DETAIL logging.

Btw, why are you running with detail enabled? The default is info. Detail logging spews a lot of data, which fills up the log and also slows down the server …

You have enough query threads. If you disable all the detailed logging, what performance do you see?

– R


#32

I suggested a similar thing before as well. Can you make the client simply dump the data into /dev/null without processing it? This is to make sure the slowness is not due to the client processing the data slowly. If you are fetching that much data, a slow client would eventually throttle back the server …
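For example, with the aql client this could look roughly like the sketch below, so the measured time reflects the server rather than client-side result handling (the exact flags may differ by aql version; treat this as an assumption to verify):

```shell
# Time the aggregation while discarding the client-side output;
# -c runs a single command non-interactively in recent aql versions
time aql -c "AGGREGATE udfActive.filter_records() ON test.products WHERE status='active'" > /dev/null
```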

Btw, what client are you using?

– R


#33

Or aql from the console.