How to increase threads used by UDFs?

benchmark
index
secondary

#1

Hello i trying to do benchmarks: 1M records, 3 Servers, In-Memory storage.

Documents format:

{id:1, status:active, testBin:one},{id:2, status:active, testBin:two}, {id:3,status:inactive,testBin:one},…

Secondary index on ‘status’ bin

UDFs:

function filter_records(stream)
	local function map_record(record)
		return map {id=record.id, testBin=record.testBin}
	end

	local function filter_name(record)
		return record.testBin == "one"
	end

	return stream : filter(filter_name) : map(map_record)
end

I do request: AGGREGATE udfActive.filter_records() ON sandbox.products WHERE status=‘active’

and it take 0.5-0.7 sec

I think it’s so slow because UDFs running in 1 thread, how can i configure Aerospike for parallelize UDFs?


How to do multiple filtering
#2

nizsheanez,

The Aggregation with secondary index query is driven by query engine. What configuration are you running with

 asinfo -v 'get-config:' -l | grep query

query-thread determine the amount of parallel threads running on server.

How many records are you processing per request. Can you share query_rec_count histogram as shown in the log.

Can you enable query benchmark while running

asinfo -v 'set-config:context=service;query-microbenchmark=true

this will dump some extra histogram at micro level in the log /var/log/aerospike.log can you share few lines of the log with the histrogram at the end of the run.

– R


#3

asinfo -v ‘get-config:’ -l | grep query

query-threads=6
query-worker-threads=15
query-priority=10
query-in-transaction-thread=0
query-req-in-query-thread=0
query-req-max-inflight=100
query-bufpool-size=256
query-batch-size=100
query-sleep=1
query-job-tracking=false
query-short-q-max-size=500
query-long-q-max-size=500
query-rec-count-bound=4294967295
query-threshold=10
query-untracked-time=1000000
query-hist-track-back=1800
query-hist-track-slice=10
query-hist-track-thresholds=1,8,64
query_rec_count-hist-track-back=1800
query_rec_count-hist-track-slice=10
query_rec_count-hist-track-thresholds=1,8,64

I changed query-threads=6 to 20 but don’t see difference (24 CPU cores server)

i enabled benchmarks by command:

asinfo -v 'set-config:context=service;microbenchmarks=true'

Result: asinfo -v ‘get-config:’ -l | grep bench

    microbenchmarks=true
    storage-benchmarks=false
    ldt-benchmarks=false

But don’t see anything in: tail -f /var/log/aerospike/aerospike.log


#4

in aerospike.conf i have:

    logging {
	file /var/log/aerospike/aerospike.log {
               	context any warning
        }
     }

I will try to change to debug level


#5
May 20 2015 08:02:42 GMT: DEBUG (udf): (udf_record.c:udf_record_open:205) [ENTER] Opening record key:<Digest>:0x3030303030303030303030303030303038393930
May 20 2015 08:02:42 GMT: DEBUG (udf): (udf_record.c:udf_storage_record_open:83) [ENTER] Opening record key:<Digest>:0x3030303030303030303030303030303038393930
May 20 2015 08:02:42 GMT: DEBUG (udf): (udf_record.c:udf_record_get:687) [ENTER] rec(0x7f691dcd3ca0) name(testBin)
May 20 2015 08:02:42 GMT: DEBUG (udf): (udf_record.c:udf_record_cache_get:424) [ENTER] BinName(testBin)
May 20 2015 08:02:42 GMT: DEBUG (udf): (udf_record.c:udf_record_cache_set:481) [ENTER] urecord(0x7f691dcd16a0) name(0x41745a48)[testBin] dirty(0)
May 20 2015 08:02:42 GMT: DEBUG (udf): (udf_record.c:udf_record_close:266) [ENTER] Closing record key:<Digest>:0x3030303030303030303030303030303038393930
May 20 2015 08:02:42 GMT: DEBUG (udf): (udf_record.c:udf_record_cache_free:448) [ENTER] NumUpdates(1)
May 20 2015 08:02:42 GMT: DEBUG (udf): (udf_record.c:udf_record_open:205) [ENTER] Opening record key:<Digest>:0x3030303030303030303030303030303039393930
May 20 2015 08:02:42 GMT: DEBUG (udf): (udf_record.c:udf_storage_record_open:83) [ENTER] Opening record key:<Digest>:0x3030303030303030303030303030303039393930
May 20 2015 08:02:42 GMT: DEBUG (udf): (udf_record.c:udf_record_get:687) [ENTER] rec(0x7f691dcd3ca0) name(testBin)
May 20 2015 08:02:42 GMT: DEBUG (udf): (udf_record.c:udf_record_cache_get:424) [ENTER] BinName(testBin)
May 20 2015 08:02:42 GMT: DEBUG (udf): (udf_record.c:udf_record_cache_set:481) [ENTER] urecord(0x7f691dcd16a0) name(0x41745a48)[testBin] dirty(0)
May 20 2015 08:02:42 GMT: DEBUG (udf): (udf_record.c:udf_record_close:266) [ENTER] Closing record key:<Digest>:0x3030303030303030303030303030303039393930
May 20 2015 08:02:42 GMT: DEBUG (udf): (udf_record.c:udf_record_cache_free:448) [ENTER] NumUpdates(1)
May 20 2015 08:02:42 GMT: DEBUG (query): (aggr.c:as_aggr__process:244)  Apply Stream with udfActive filter_records 0x7f691dcd3c40 0x7f691dcd3c00 0x7f691dcd3c20 ret=0
May 20 2015 08:02:42 GMT: DEBUG (sindex): (secondary_index.c:as_sindex__pre_op_assert:565) DML on index status in 2 state with reference count 2 < 2
May 20 2015 08:02:42 GMT: DEBUG (query): (aggr.c:as_aggr__process:244)  Apply Stream with udfActive filter_records 0x7f691dcd3c40 0x7f691dcd3c00 0x7f691dcd3c20 ret=0
May 20 2015 08:02:42 GMT: DEBUG (sindex): (secondary_index.c:as_sindex_range_free:2139) as_sindex_range_free
May 20 2015 08:02:42 GMT: DEBUG (sindex): (secondary_index.c:as_sindex_release:1192) Index status in 2 state Released to reference count 1 < 2 at base/thr_query.c:764
May 20 2015 08:02:42 GMT: DEBUG (cf:jem): (jem.c:jem_set_arena:182) TID 1010 changed from arena #15 ==> #15
May 20 2015 08:02:42 GMT: DEBUG (demarshal): (thr_demarshal.c:thr_demarshal:626) [Sending Info request via fast path.]
May 20 2015 08:02:42 GMT: DEBUG (cf:jem): (jem.c:jem_set_arena:182) TID 1031 changed from arena #150 ==> #28
May 20 2015 08:02:42 GMT: DEBUG (demarshal): (thr_demarshal.c:thr_demarshal:626) [Sending Info request via fast path.]
May 20 2015 08:02:42 GMT: DEBUG (sindex): (secondary_index.c:as_sindex_reserve:1143) Index status in 2 state Reserved to reference count 2 < 2 at base/thr_sindex.c:257
May 20 2015 08:02:42 GMT: DEBUG (sindex): (secondary_index.c:as_sindex_release:1192) Index status in 2 state Released to reference count 1 < 2 at base/thr_sindex.c:438
May 20 2015 08:02:43 GMT: DEBUG (cf:jem): (jem.c:jem_set_arena:182) TID 1010 changed from arena #15 ==> #15
May 20 2015 08:02:43 GMT: DEBUG (demarshal): (thr_demarshal.c:thr_demarshal:626) [Sending Info request via fast path.]
May 20 2015 08:02:43 GMT: DEBUG (cf:jem): (jem.c:jem_set_arena:182) TID 1016 changed from arena #150 ==> #13

enough? teach me how to read it :slight_smile:


#6

I did a bit cleaner log for you like this:

logging {
	file /var/log/aerospike/aerospike.log {
                context any warning
                context paxos info
                context ldt info
                context udf debug
                context query detail
        }
}

I reduced amount of records to 10K (with 1M and debug log i got timeouts, now it took 73ms) results:

May 20 2015 08:07:27 GMT: DEBUG (udf): (udf_record.c:udf_record_cache_set:481) [ENTER] urecord(0x7f49108d06a0) name(0x40555a48)[testBin] dirty(0)
May 20 2015 08:07:27 GMT: DETAIL (query): (aggr.c:as_aggr_istream_read:380) Close Record (0x7f4a16440400,8)
May 20 2015 08:07:27 GMT: DEBUG (udf): (udf_record.c:udf_record_close:266) [ENTER] Closing record key:<Digest>:0x3030303030303030303030303030303038393930
May 20 2015 08:07:27 GMT: DEBUG (udf): (udf_record.c:udf_record_cache_free:448) [ENTER] NumUpdates(1)
May 20 2015 08:07:27 GMT: DETAIL (query): (aggr.c:as_aggr_istream_read:421) Open Record (0x7f4a16440400,9 139951787027424, 139951787027392)
May 20 2015 08:07:27 GMT: DEBUG (udf): (udf_record.c:udf_record_open:205) [ENTER] Opening record key:<Digest>:0x3030303030303030303030303030303039393930
May 20 2015 08:07:27 GMT: DEBUG (udf): (udf_record.c:udf_storage_record_open:83) [ENTER] Opening record key:<Digest>:0x3030303030303030303030303030303039393930
May 20 2015 08:07:27 GMT: DETAIL (query): (aggr.c:as_aggr_istream_read:460) Successfully read record
May 20 2015 08:07:27 GMT: DEBUG (udf): (udf_record.c:udf_record_get:687) [ENTER] rec(0x7f49108d2ca0) name(testBin)
May 20 2015 08:07:27 GMT: DEBUG (udf): (udf_record.c:udf_record_cache_get:424) [ENTER] BinName(testBin)
May 20 2015 08:07:27 GMT: DEBUG (udf): (udf_record.c:udf_record_cache_set:481) [ENTER] urecord(0x7f49108d06a0) name(0x40555a48)[testBin] dirty(0)
May 20 2015 08:07:27 GMT: DETAIL (query): (aggr.c:as_aggr_istream_read:380) Close Record (0x7f4a16440400,9)
May 20 2015 08:07:27 GMT: DEBUG (udf): (udf_record.c:udf_record_close:266) [ENTER] Closing record key:<Digest>:0x3030303030303030303030303030303039393930
May 20 2015 08:07:27 GMT: DEBUG (udf): (udf_record.c:udf_record_cache_free:448) [ENTER] NumUpdates(1)
May 20 2015 08:07:27 GMT: DETAIL (query): (aggr.c:as_aggr_istream_read:406) No More Nodes for this Lua Call
May 20 2015 08:07:27 GMT: DEBUG (query): (aggr.c:as_aggr__process:244)  Apply Stream with udfActive filter_records 0x7f49108d2c40 0x7f49108d2c00 0x7f49108d2c20 ret=0
May 20 2015 08:07:27 GMT: DETAIL (query): (thr_query.c:as_query__generator_get_nextbatch:1844) start 0 end 0 @ 19 pimd found 0
May 20 2015 08:07:27 GMT: DETAIL (query): (thr_query.c:as_query__generator_get_nextbatch:1867) All the Data finished moving to next tree 20
May 20 2015 08:07:27 GMT: DETAIL (query): (thr_query.c:as_query__generator:2032) Loop=101, Selected=0, ret=2
May 20 2015 08:07:27 GMT: DETAIL (query): (thr_query.c:as_query__process_request:1731) Processing Request 2
May 20 2015 08:07:27 GMT: DEBUG (query): (aggr.c:as_aggr__process:244)  Apply Stream with udfActive filter_records 0x7f49108d2c40 0x7f49108d2c00 0x7f49108d2c20 ret=0
May 20 2015 08:07:27 GMT: DETAIL (query): (thr_query.c:as_query__generator:2073) All the Data finished; All tree finished 0 10
May 20 2015 08:07:27 GMT: DETAIL (query): (thr_query.c:as_query__add_fin:955) Adding fin 0x7f4a19800008
May 20 2015 08:07:27 GMT: DETAIL (query): (thr_query.c:as_query_netio_finish_cb:1024) Finished sequence number 0x7f4a19800008->1
May 20 2015 08:07:27 GMT: DETAIL (query): (thr_query.c:as_qtr__release:806) Free qtr ref count is zero
May 20 2015 08:07:27 GMT: DETAIL (query): (thr_query.c:as_query__update_stats:394) Total time elapsed 73172 us, 1 of 1000 read operations avg latency 73172 us
May 20 2015 08:07:27 GMT: DETAIL (query): (thr_query.c:as_query__transaction_done:759) Cleaned up qtr 0x7f4a19800008
May 20 2015 08:07:27 GMT: DETAIL (query): (thr_query.c:as_qtr__release:814) Released qtr [base/thr_query.c:1041] 0x7f4a19800008 1
May 20 2015 08:07:27 GMT: DETAIL (query): (thr_query.c:as_query_netio_finish_cb:1046) Finished query with retCode 0
May 20 2015 08:07:27 GMT: DETAIL (query): (thr_query.c:as_query__netio:1097) Streamed Out
May 20 2015 08:07:27 GMT: DETAIL (query): (thr_query.c:as_query__bb_poolrelease:594) Pushed 0x7f4a16417000 131072 131072
May 20 2015 08:07:27 GMT: DETAIL (query): (thr_query.c:as_qtr__release:814) Released qtr [base/thr_query.c:2087] 0x7f4a19800008 1

And then i did same for only 10 records, maybe it will show full picture:

May 20 2015 08:26:53 GMT: DEBUG (query): (thr_query.c:as_query:2552) No Index Defined in the Query
May 20 2015 08:26:53 GMT: DETAIL (query): (thr_query.c:as_query:2621) Query on index status
May 20 2015 08:26:53 GMT: DETAIL (query): (thr_query.c:as_query__queue:2394) Logged query
May 20 2015 08:26:53 GMT: DETAIL (query): (thr_query.c:as_query__bb_poolrequest:623) Popped 0x7f48d7417000
May 20 2015 08:26:53 GMT: DETAIL (query): (thr_query.c:as_query__generator_get_nextbatch:1844) start 0 end 0 @ 19 pimd found 1
May 20 2015 08:26:53 GMT: DETAIL (query): (thr_query.c:as_query__generator_get_nextbatch:1867) All the Data finished moving to next tree 20
May 20 2015 08:26:53 GMT: DETAIL (query): (thr_query.c:as_query__generator:2032) Loop=1, Selected=1, ret=2
May 20 2015 08:26:53 GMT: DETAIL (query): (thr_query.c:as_query__process_request:1731) Processing Request 2
May 20 2015 08:26:53 GMT: DETAIL (query): (aggr.c:as_aggr_istream_read:421) Open Record (0x7f48d7440400,0 1099337656, 139946376371136)
May 20 2015 08:26:53 GMT: DEBUG (udf): (udf_record.c:udf_record_open:205) [ENTER] Opening record key:<Digest>:0x3030303030303030303030303030303030303030
May 20 2015 08:26:53 GMT: DEBUG (udf): (udf_record.c:udf_storage_record_open:83) [ENTER] Opening record key:<Digest>:0x3030303030303030303030303030303030303030
May 20 2015 08:26:53 GMT: DETAIL (query): (aggr.c:as_aggr_istream_read:460) Successfully read record
May 20 2015 08:26:53 GMT: DEBUG (udf): (udf_record.c:udf_record_get:687) [ENTER] rec(0x7f47ce0d1ca0) name(testBin)
May 20 2015 08:26:53 GMT: DEBUG (udf): (udf_record.c:udf_record_cache_get:424) [ENTER] BinName(testBin)
May 20 2015 08:26:53 GMT: DEBUG (udf): (udf_record.c:udf_record_cache_set:481) [ENTER] urecord(0x7f47ce0cf6a0) name(0x41863a48)[testBin] dirty(0)
May 20 2015 08:26:53 GMT: DEBUG (udf): (udf_record.c:udf_record_get:687) [ENTER] rec(0x7f47ce0d1ca0) name(id)
May 20 2015 08:26:53 GMT: DEBUG (udf): (udf_record.c:udf_record_cache_get:424) [ENTER] BinName(id)
May 20 2015 08:26:53 GMT: DEBUG (udf): (udf_record.c:udf_record_cache_set:481) [ENTER] urecord(0x7f47ce0cf6a0) name(0x41858d20)[id] dirty(0)
May 20 2015 08:26:53 GMT: DEBUG (udf): (udf_record.c:udf_record_get:687) [ENTER] rec(0x7f47ce0d1ca0) name(testBin)
May 20 2015 08:26:53 GMT: DEBUG (udf): (udf_record.c:udf_record_cache_get:424) [ENTER] BinName(testBin)
May 20 2015 08:26:53 GMT: DETAIL (query): (aggr.c:as_aggr_istream_read:380) Close Record (0x7f48d7440400,0)
May 20 2015 08:26:53 GMT: DEBUG (udf): (udf_record.c:udf_record_close:266) [ENTER] Closing record key:<Digest>:0x3030303030303030303030303030303030303030
May 20 2015 08:26:53 GMT: DEBUG (udf): (udf_record.c:udf_record_cache_free:448) [ENTER] NumUpdates(2)
May 20 2015 08:26:53 GMT: DETAIL (query): (aggr.c:as_aggr_istream_read:406) No More Nodes for this Lua Call
May 20 2015 08:26:53 GMT: DEBUG (query): (aggr.c:as_aggr__process:244)  Apply Stream with udfActive filter_records 0x7f47ce0d1c40 0x7f47ce0d1c00 0x7f47ce0d1c20 ret=0
May 20 2015 08:26:53 GMT: DETAIL (query): (thr_query.c:as_query__generator:2073) All the Data finished; All tree finished 0 10
May 20 2015 08:26:53 GMT: DETAIL (query): (thr_query.c:as_query__add_fin:955) Adding fin 0x7f48d8800008
May 20 2015 08:26:53 GMT: DETAIL (query): (thr_query.c:as_query_netio_finish_cb:1024) Finished sequence number 0x7f48d8800008->1
May 20 2015 08:26:53 GMT: DETAIL (query): (thr_query.c:as_qtr__release:806) Free qtr ref count is zero
May 20 2015 08:26:53 GMT: DETAIL (query): (thr_query.c:as_query__update_stats:394) Total time elapsed 5074 us, 1 of 1 read operations avg latency 5074 us
May 20 2015 08:26:53 GMT: DETAIL (query): (thr_query.c:as_query__transaction_done:759) Cleaned up qtr 0x7f48d8800008
May 20 2015 08:26:53 GMT: DETAIL (query): (thr_query.c:as_qtr__release:814) Released qtr [base/thr_query.c:1041] 0x7f48d8800008 1
May 20 2015 08:26:53 GMT: DETAIL (query): (thr_query.c:as_query_netio_finish_cb:1046) Finished query with retCode 0
May 20 2015 08:26:53 GMT: DETAIL (query): (thr_query.c:as_query__netio:1097) Streamed Out
May 20 2015 08:26:53 GMT: DETAIL (query): (thr_query.c:as_query__bb_poolrelease:594) Pushed 0x7f48d7417000 131072 131072
May 20 2015 08:26:53 GMT: DETAIL (query): (thr_query.c:as_qtr__release:814) Released qtr [base/thr_query.c:2087] 0x7f48d8800008 1

#7

Please disable detail and debug logging … that is way too much info to start with :smile:. Just have a benchmark enabled and paste what shows up in last 100 lines of log …

Histogram would look like

May 20 2015 08:37:59 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec May 20 2015 08:37:59 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (2 total) msec May 20 2015 08:37:59 GMT: INFO (info): (hist.c::163) (00: 0000000001) (01: 0000000001) May 20 2015 08:37:59 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec May 20 2015 08:37:59 GMT: INFO (info): (hist.c::137) histogram dump: writes_reply (2 total) msec May 20 2015 08:37:59 GMT: INFO (info): (hist.c::163) (00: 0000000001) (01: 0000000001) May 20 2015 08:37:59 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec May 20 2015 08:37:59 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec May 20 2015 08:37:59 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count

– R


#9

Here is query histogram:

May 21 2015 04:18:28 GMT: INFO (info): (hist.c::137) histogram dump: query (31 total) msec
May 21 2015 04:18:28 GMT: INFO (info): (hist.c::163)  (09: 0000000012) (10: 0000000019)
May 21 2015 04:18:28 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (31 total) count
May 21 2015 04:18:28 GMT: INFO (info): (hist.c::163)  (10: 0000000031)
May 21 2015 04:18:28 GMT: INFO (info): (hist.c::137) histogram dump: query_txn_q_wait_us (4 total) usec
May 21 2015 04:18:28 GMT: INFO (info): (hist.c::163)  (05: 0000000004)
May 21 2015 04:18:28 GMT: INFO (info): (hist.c::137) histogram dump: query_prepare_batch_q_wait_us (4 total) usec
May 21 2015 04:18:28 GMT: INFO (info): (hist.c::163)  (44: 0000000004)
May 21 2015 04:18:28 GMT: INFO (info): (hist.c::137) histogram dump: query_prepare_batch_us (4 total) usec
May 21 2015 04:18:28 GMT: INFO (info): (hist.c::163)  (44: 0000000004)
May 21 2015 04:18:28 GMT: INFO (info): (hist.c::137) histogram dump: query_batch_io_q_wait_us (0 total) usec
May 21 2015 04:18:28 GMT: INFO (info): (hist.c::137) histogram dump: query_batch_io_us (0 total) usec
May 21 2015 04:18:28 GMT: INFO (info): (hist.c::137) histogram dump: query_net_io_us (0 total) usec

#10

nizsheanez,

query_rec_count histogram tells that all the request is in 10th bucket so every query is processing around 1k request. The default batch size of query is 100 that would mean system would create batch is 100 each and run that in parallel.

query histogram tells that each request is roughly getting upto 9th bucket that is around .5 second as you observer on the client side as well. This is pretty huge.

How fast is your normal query going on ??

aql> select * from sandbox.products WHERE status='active'

Are you running it concurrently …??

What is the server config … ??

– R


"Initial partition balance unresolved" after restart 1 node in Cluster (AER-3863)
#12

If amount of records for return is small, then it’s fast(100 records for return): 0.006ms

if amout of records for return is 1K: 0.042ms

aerospike.conf:

       service {
    	user root
            group root
            paxos-single-replica-limit 1
            pidfile /var/run/aerospike/asd.pid
            service-threads 24
            transaction-queues 16
            transaction-threads-per-queue 16
            proto-fd-max 15000
            batch-max-requests 100000
            batch-threads 16
    }
logging {
	file /var/log/aerospike/aerospike.log {
                context any info
                context paxos warning
                context ldt warning
                context udf info
                context query info
        }
}
network {
	service {
                address any
                port 3000
        }

	heartbeat {
                mode mesh
                address 127.0.0.1 #239.1.99.222
                port 9918
                interval 150
                timeout 10
        }

	fabric {
                port 3001
        }

	info {
              	port 3003
        }
}


namespace sandbox {
        replication-factor 0
        #enable-xdr true      # Enable replication for this namespace
        memory-size 4G
        default-ttl 30d       # 30 days, use 0 to never expire/evict.

        storage-engine memory
}

asinfo -v ‘get-config:’ -l

transaction-queues=16
transaction-threads-per-queue=16
transaction-duplicate-threads=0
transaction-pending-limit=20
migrate-threads=1
migrate-xmit-priority=40
migrate-xmit-sleep=500
migrate-read-priority=10
migrate-read-sleep=500
migrate-xmit-hwm=10
migrate-xmit-lwm=5
migrate-max-num-incoming=256
migrate-rx-lifetime-ms=60000
proto-fd-max=15000
proto-fd-idle-ms=60000
proto-slow-netio-sleep-ms=1
transaction-retry-ms=1000
transaction-max-ms=1000
transaction-repeatable-read=false
dump-message-above-size=134217728
ticker-interval=10
microbenchmarks=false
storage-benchmarks=false
ldt-benchmarks=false
scan-priority=200
scan-sleep=1
batch-threads=16
batch-max-requests=100000
batch-priority=200
nsup-delete-sleep=100
nsup-period=120
nsup-startup-evict=true
paxos-retransmit-period=5
paxos-single-replica-limit=1
paxos-max-cluster-size=32
paxos-protocol=v3
paxos-recovery-policy=manual
write-duplicate-resolution-disable=false
respond-client-on-master-completion=false
replication-fire-and-forget=false
info-threads=16
allow-inline-transactions=true
use-queue-per-device=false
snub-nodes=false
fb-health-msg-per-burst=0
fb-health-msg-timeout=200
fb-health-good-pct=50
fb-health-bad-pct=0
auto-dun=false
auto-undun=false
prole-extra-ttl=0
max-msgs-per-type=-1
service-threads=24
fabric-workers=16
pidfile=/var/run/aerospike/asd.pid
memory-accounting=false
udf-runtime-gmax-memory=18446744073709551615
udf-runtime-max-memory=18446744073709551615
sindex-populator-scan-priority=3
sindex-data-max-memory=18446744073709551615
query-threads=6
query-worker-threads=15
query-priority=10
query-in-transaction-thread=0
query-req-in-query-thread=0
query-req-max-inflight=100
query-bufpool-size=256
query-batch-size=100
query-sleep=1
query-job-tracking=false
query-short-q-max-size=500
query-long-q-max-size=500
query-rec-count-bound=4294967295
query-threshold=10
query-untracked-time=1000000
service-address=0.0.0.0
service-port=3000
reuse-address=true
fabric-port=3001
network-info-port=3003
enable-fastpath=true
heartbeat-mode=mesh
heartbeat-protocol=v2
heartbeat-address=127.0.0.1
heartbeat-port=9918
heartbeat-interval=150
heartbeat-timeout=10
enable-security=false
privilege-refresh-period=300
report-authentication-sinks=0
report-data-op-sinks=0
report-sys-admin-sinks=0
report-user-admin-sinks=0
report-violation-sinks=0
syslog-local=-1
enable-xdr=false
forward-xdr-writes=false
xdr-delete-shipping-enabled=true
xdr-nsup-deletes-enabled=false
stop-writes-noxdr=false
reads-hist-track-back=1800
reads-hist-track-slice=10
reads-hist-track-thresholds=1,8,64
writes_master-hist-track-back=1800
writes_master-hist-track-slice=10
writes_master-hist-track-thresholds=1,8,64
proxy-hist-track-back=1800
proxy-hist-track-slice=10
proxy-hist-track-thresholds=1,8,64
writes_reply-hist-track-back=1800
writes_reply-hist-track-slice=10
writes_reply-hist-track-thresholds=1,8,64
udf-hist-track-back=1800
udf-hist-track-slice=10
udf-hist-track-thresholds=1,8,64
query-hist-track-back=1800
query-hist-track-slice=10
query-hist-track-thresholds=1,8,64
query_rec_count-hist-track-back=1800
query_rec_count-hist-track-slice=10
query_rec_count-hist-track-thresholds=1,8,64

I have 2 variants for tests:

  1. Single server installation - server load is ~20% but nobody use Aerospike instance.

  2. 3 Nodes Cluster - 270TPS Read and 90TPS write, it’s simple Get/Put transactions. All 3 nodes have 24CPU cores, only 1 core is busy.

So, i have small concurrency, i show you usually statistic from 1) variant. But response time of queries quite stable(3 nodes 2 time faster).

PS. Found how to read histograms (http://www.aerospike.com/docs/operations/monitor/v3port/reading_raw_latency_histograms.html)


#13

You mean 0.006 seconds and 0.042 seconds right ?? And with the aggregation it was around 0.5 to 0.7 seconds right ??

1 core busy

hmm…

What server version are you running ? is it <3.5.8 ?

Can you stream out result into /dev/null on the client side to strike out client bottleneck and time it.

 aql -c "select * from sandbox.products WHERE status='active' > /dev/null

Also what does TOP look like when you are running the workload (top with core level detail i.e top & 1)

– R


#15

yes, of course seconds.

asinfo | grep -C 1 build
7 :  version
     Aerospike Community Edition build 3.5.8
8 :  build
     3.5.8

Top for no requests:

Top for select request:

Top for aggregation request:

On other nodes almost same picture.


#16

Weird certain CPU getting overloaded for all request cases …

What is your NIC setup like … number of queues ?? Can you send distribution of how irq is served…

 cat /etc/interrupts | grep <ethernet name> 

Is it amazon (hosted) or bare metal boxes ??

– R


#18

ifconfig should show you the NIC you guys are using …

– R


#19

which one? sorry, i bad in network interfaces


#20

Could be eth1 or eth0 seems to be the one. Based on your earlier message seems the card you are using has only 1 queue… Can you check and confirm with your network guy …

Invariant on server side does how does query histogram look with and without aggregation …

– R


#21

Admin says: Yes we have only 1 queue, but network card is:

 Capabilities: [58] MSI: Enable- Count=1/16 Maskable- 64bit+
    Capabilities: [a0] MSI-X: Enable- Count=9 Masked-
    Capabilities: [ac] Express Endpoint, MSI 00

He think something disabled in bios


#22

Try RPS

http://www.aerospike.com/docs/deploy_guides/aws/recommendations/

https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/network-rps.html

Lets remove network out of equation …

– R


#23

But guy says: irq ~1% , context switch 0.5%

And look like it’s not about nerwork card.

But anyway, on single node installation i do requests from localhost.

One small question: in ruby client i found this comment:

Calling query with no filters is more efficiently handled by a primary index scan.

does it make sense to do all filtering on UDF only, and don’t use secondary indices?


#24

one thing i really don’t understand:

on Single node installation i changed lot’s of configurations:

query-batch-size from 5 to 100000
query-worker-threads from 1 to 300
query-threads from 1 to 24

and other, but it doesn’t affect response time at all.

i changing by: asinfo -v 'set-config:context=service;query-batch-size=100'
and checking by: asinfo -v 'get-config:' -l | grep query

what do i do wrong? (i think it’s because query executing in single thread, but why…)