Very slow fetching of records with scanAll()

tporadowski · January 8, 2015, 11:42pm

Hi,

I started evaluating Aerospike on my notebook to see it in action, but after setting up a sample web application with help of Spring Framework - I was somewhat astonished to see that a scanAll() operation which returns 50 elements from a given set takes over 1000ms . I know that my development machine is not a speed daemon and it has no SSD disk, but I expected much faster response times. When I replace the repository bean with one that connects to a Couchbase instance - then I get the same set of records in 90-100ms. I’ve tried setting higher priorities in clientPolicy.scanPolicyDefault.priority, but the result is the same.

Here is my namespace configuration (the rest of aerospike.conf is default):

namespace tpotest {
        memory-size 2G
        default-ttl 0 
        storage-engine memory
        storage-engine device {
                data-in-memory true
                filesize 8G
                file /opt/aerospike/data/tpotest.dat
        }
}

And here is AerospikClient configuration:

	@Bean
	public AerospikeClient getAerospikeClient(){
		ClientPolicy clientPolicy = new ClientPolicy();
		clientPolicy.readPolicyDefault.timeout = 500;
		clientPolicy.readPolicyDefault.maxRetries = 4;
		clientPolicy.readPolicyDefault.sleepBetweenRetries = 150;
		clientPolicy.writePolicyDefault.recordExistsAction = RecordExistsAction.UPDATE;
		clientPolicy.failIfNotConnected = true;
		clientPolicy.timeout = 500;
		clientPolicy.maxThreads = 100;
		clientPolicy.queryPolicyDefault.priority = Priority.HIGH;
		clientPolicy.queryPolicyDefault.recordQueueSize = 100;
		clientPolicy.scanPolicyDefault.concurrentNodes = false;
		clientPolicy.scanPolicyDefault.includeBinData = true;
		clientPolicy.scanPolicyDefault.priority = Priority.HIGH;
		clientPolicy.scanPolicyDefault.timeout = 500;

		return new AerospikeClient(clientPolicy, "127.0.0.1", 3000);
	}

What else can I try to make it work as it should (blazing fast)?

I’m using:

Aerospike Server Community Edition 3.4.0
Ubuntu 14.04
Aerospike Java Client 3.0.32
a notebook with 2 CPU cores and 8GB of RAM.

Regards, Tomasz

helipilot50 · January 9, 2015, 2:48am

Hi Tomasz

I can do a scan of 99,999 records in ~700 milli seconds. The client is Java running on my Mack Book Pro and the server is running in a Virtual Box VM. So all on the same machine.

The Java client takes a moment to “warm up”, you may being incurring the connection cost as part of your test. Try running your test twice and measuring the time of the 2nd test.

Regards Peter

tporadowski · January 9, 2015, 8:11am

Hi Peter,

I wouldn’t bother anyone here just after running it once or twice :). I also have similar results when running a query from aql tool - it also fetches my 50 records in 0,9 seconds. Apparently there is something wrong in my setup (I’m using default out-of-the-box values), but initially I posted this issue to Java Client group as I had thought it was related… Anyway, maybe someone here can point me in the right direction?

Regards, Tomasz

helipilot50 · January 12, 2015, 5:28pm

Hi Tomasz

To eliminate the Java client configuration, can you try your test without the client policy?

Also, are you running a vm on your laptop, if so what is the host OS?

Regards

tporadowski · January 13, 2015, 7:43am

Hi Peter,

I’m testing Aerospike directly without any VM under Ubuntu 14.04 LTS. What I have tried so far:

using AerospikeClient.scanAll() - yields 10 records in around a second
using AerospikeClient.query() with a secondary index - similar time
using both above without any ClientPolicy applied
using aql tool:

aql> SELECT login FROM test.users
10 rows in set (0.880 secs)

which gives me such an entry in aerospike.log:

Jan 13 2015 07:10:28 GMT: INFO (scan): (thr_tscan.c::280) SCAN JOB DONE  [id =6: ns= test set=users scanned=10 expired=0 set_diff=0 elapsed=976 (ms)]

(debug logging shows that 4096 partitions were scanned…)

When I query for records by PK then I get records instantly (aql reports 0.000 to 0.001s for a single record).

Regards, Tomasz

helipilot50 · January 22, 2015, 5:13pm

Hi Tomasz

You are getting some very unusual results. I’m guessing something is configured wrong. Can you post your full aerospike.conf file, and can you post a description of your record. Whit this I will simulate your issue.

BTW: Where are you located? - I’m teaching Aerospike for Developers in Vilnius next week and in Berlin the week after; if you can make it to either location, I can put you on the class for no cost.

Peter

tporadowski · January 23, 2015, 8:42am

Hi Peter,

I also suspect a configuration issue, yet I’m testing with out-of-the-box configuration which I presume should be correct in 99% of cases (mine is likely falling into that missing 1%). Yesterday I even compiled it from sources, used aerospike_dev.conf settings, but still - no improvement. You can see my configuration and logs in this ZIP file where I added: /var/log/aerospike/aerospike.log, config files tested plus output from dmesg, free, lsmod, sysctl -a and my /proc/cpuinfo.

Thanks for the training offer, but I won’t make it on such a short notice (though Berlin is not that far away from Bielsko-Biała, Poland).

Regards, Tomasz

tporadowski · January 23, 2015, 9:37am

Oh, forgot to add details about my records, but they are very simple:

INSERT INTO test.users (PK, login) VALUES ('login1', 'login1')

and then it’s repeated for 'login1' … 'login10'. So I created a set users and put there 10 records with a single text bin login. After that SELECT * FROM test.users in aql gets me those 10 records in 0.8s :(.

Tomasz

tporadowski · January 27, 2015, 8:00am

Yesterday, after having talked to one of your Solution Engineers, I tried to provide a “bare-metal” configuration and switched off all graphical interface processes and left the necessary processes running only. Yet, there was no significant improvement and full-scan timings for my little data set were similar. Mysterious, isn’t it?

Regards, Tomasz

helipilot50 · January 28, 2015, 6:20am

Hi Tomasz

I finally got a chance to write some code to duplicate issue. Here is what I have found:

Scenario - 100,000 records, each containing 1 string ~10 characters and 1 integer

aql running on the same node (eliminating network latency, but adding CPU usage) select * from test.demo 100000 rows in set (0.860 secs)

Java coded application to scanAll with the bins on the same node $ java -jar aerospike-top-10-1.0-full.jar -S Scan time for 100000 is 853 milliseconds

You can find the code at GitHub - helipilot50/aerospike-top-10-aggregation: Finding the top 10 with an Aerospike aggregation, it is part of another example.

To duplicate follow the instructions in the README.md to build a runnable jar.

Run the jar (as shown) with the -l option to load data
Run the jar with the -S option to scanAll

Using this example, running on the same node as your server, you will eliminate the network latency and any code differences between the scan you have coded and aql.

hope this helps Peter

tporadowski · February 1, 2015, 10:54pm

Hi Peter,

Thanks for that sample code. With different set sizes, using your aerospike-top-10-aggregation -S I got these results:

Scan time for 5000 is 908 milliseconds
Scan time for 50000 is 1139 milliseconds
Scan time for 100000 is 1145 milliseconds
Scan time for 250000 is 1425 milliseconds
Scan time for 500000 is 1832 milliseconds

(I run each test 10x on given set, above are the fastest times).

It seems that on my test environment there is like 800-900ms of initialization/warm-up(?) phase each time and then the rest is actually depending on the amount of data. I will give it a try somewhere else to see if it isn’t high time to upgrade my development hardware

[edit] On a server with memory-only storage I get 100K records in 155ms and 1M records in a second, so I think I’ll give up here and buy a new development machine…

Regards, Tomasz

anandprakash · February 3, 2015, 12:48am

Hi Tomasz, I looked at your scenario and did testing in my local environment on my laptop where I’ve VM machine(Vagrant) running with aerospike server(3.4.0).

→ I had created set called “login” and bin called “login” and inserted 10 records as below:- aql> select * from test.login 10 rows in set (0.281 secs)

→ Then ran scanAll() API call on above set ,so scanAll completed in around ~250 to 290 milliseconds range Scan time for 10 record is 290 milliseconds

→ Lastly ran AerospikeClient.query() call on secondary indexed bin “login” with filter setting as below

stmt.setFilters( Filter.equal(“login”, “login2”) );

So I got 1 record fetched in around ~8 to 10 milliseconds range all the time as below .

Record 1 is: {login=login2}
Query time for fetching 1 record is 9 milliseconds

So querying small data sets should take few milliseconds rather than seconds. Also all these depends on network latency,IO controller,amount of data … I noticed in your conf file that “proto-fd-max” value is not the default one i.e. 15000 .Try with default value Also , you can run asloglatency command to check how much time your operations are taking i.e. % of operation with respect to time. Here is the link for Log latency check - http://www.aerospike.com/docs/tools/asloglatency/

helipilot50 · February 3, 2015, 6:41am

Hi Tomasz,

There is a warm up period, especially with the Java client, because the class loader needs to load everything and JIT it as it is used. You see this in short test, but not so much in production.

Peter

tporadowski · February 3, 2015, 8:21am

Hi Anand,

I tested both config files (aerospike_dev.conf bundled with sources of aerospike-server and aerospike.conf that comes with .deb package). I will see what this asloglatency script can tell and post any findings here.

In regard to fetching single records - this is very fast when querying by key or for an indexed bin, I’m only curious why it’s so slow on my machine when doing a scan for the whole set (which has 10 records ).

Regards, Tomasz

tporadowski · February 3, 2015, 8:31am

Hi Peter,

By “initialization/warm-up phase” I meant some server-side operations that seem to eat those 800ms in each scan and then the rest of time seems to depend really on the amount of data. Of course you’re right with class loader and JIT compilation, but it all happens before you start timing getting back of results from your TopTen.scanAll() method and shouldn’t have any influence here.

Regards, Tomasz

helipilot50 · February 3, 2015, 12:46pm

Hi Tomasz

You are right, this is a warmup time on the server too, and it is more obvious in a single server cluster.

Peter

tporadowski · March 2, 2015, 7:56am

Hi,

@anandprakash, @helipilot50: I finally got some time to dig more into Aerospike Server’s code, run my tests with detailed logging and get to know (more or less) the idea behind “scan” operation. I only wish I had found this page from Operations Manual earlier as it seems I was using the wrong tool for my task. Now when I use a secondary index on additional “type” bin that resembles my set’s name - I get my 10 test records in 30-40ms on my dev machine.

Regards, Tomasz

anandprakash · March 2, 2015, 6:59pm

That’s great, Tomasz. Scans are long-running, resource-intensive jobs, so they should be implemented properly to get the expected result since they do entire namespace scan which might slow down your operation. So, it’s good to have secondary indexed bin to query and get filtered data.

Topic		Replies	Views
Iterating via Scan Seems to be Quite Slow Go Client query , secondary , scan , primary , index	5	3997	June 23, 2015
Extremely slow query times, optimization tips appreciated	4	2516	April 10, 2017
scanAll() yields inconsistent results Java Client	4	1807	February 28, 2017
Experiencing pauses in data when running AerospikeClient.scanNode Java Client	5	2033	August 20, 2015
Aerospike 4.5 to latest version upgrade suggestion	0	130	April 18, 2024

Very slow fetching of records with scanAll()

Related topics