Hi,
I started evaluating Aerospike on my notebook to see it in action, but after setting up a sample web application with help of Spring Framework - I was somewhat astonished to see that a scanAll()
operation which returns 50 elements from a given set takes over 1000ms . I know that my development machine is not a speed daemon and it has no SSD disk, but I expected much faster response times. When I replace the repository bean with one that connects to a Couchbase instance - then I get the same set of records in 90-100ms. Iāve tried setting higher priorities in clientPolicy.scanPolicyDefault.priority
, but the result is the same.
Here is my namespace configuration (the rest of aerospike.conf
is default):
namespace tpotest {
memory-size 2G
default-ttl 0
storage-engine memory
storage-engine device {
data-in-memory true
filesize 8G
file /opt/aerospike/data/tpotest.dat
}
}
And here is AerospikClient
configuration:
@Bean
public AerospikeClient getAerospikeClient(){
ClientPolicy clientPolicy = new ClientPolicy();
clientPolicy.readPolicyDefault.timeout = 500;
clientPolicy.readPolicyDefault.maxRetries = 4;
clientPolicy.readPolicyDefault.sleepBetweenRetries = 150;
clientPolicy.writePolicyDefault.recordExistsAction = RecordExistsAction.UPDATE;
clientPolicy.failIfNotConnected = true;
clientPolicy.timeout = 500;
clientPolicy.maxThreads = 100;
clientPolicy.queryPolicyDefault.priority = Priority.HIGH;
clientPolicy.queryPolicyDefault.recordQueueSize = 100;
clientPolicy.scanPolicyDefault.concurrentNodes = false;
clientPolicy.scanPolicyDefault.includeBinData = true;
clientPolicy.scanPolicyDefault.priority = Priority.HIGH;
clientPolicy.scanPolicyDefault.timeout = 500;
return new AerospikeClient(clientPolicy, "127.0.0.1", 3000);
}
What else can I try to make it work as it should (blazing fast)?
Iām using:
- Aerospike Server Community Edition 3.4.0
- Ubuntu 14.04
- Aerospike Java Client 3.0.32
- a notebook with 2 CPU cores and 8GB of RAM.
Regards,
Tomasz
Hi Tomasz
I can do a scan of 99,999 records in ~700 milli seconds. The client is Java running on my Mack Book Pro and the server is running in a Virtual Box VM. So all on the same machine.
The Java client takes a moment to āwarm upā, you may being incurring the connection cost as part of your test. Try running your test twice and measuring the time of the 2nd test.
Regards
Peter
Hi Peter,
I wouldnāt bother anyone here just after running it once or twice :). I also have similar results when running a query from aql
tool - it also fetches my 50 records in 0,9 seconds. Apparently there is something wrong in my setup (Iām using default out-of-the-box values), but initially I posted this issue to Java Client group as I had thought it was relatedā¦ Anyway, maybe someone here can point me in the right direction?
Regards,
Tomasz
Hi Tomasz
To eliminate the Java client configuration, can you try your test without the client policy?
Also, are you running a vm on your laptop, if so what is the host OS?
Regards
Hi Peter,
Iām testing Aerospike directly without any VM under Ubuntu 14.04 LTS. What I have tried so far:
- using
AerospikeClient.scanAll()
- yields 10 records in around a second
- using
AerospikeClient.query()
with a secondary index - similar time
- using both above without any
ClientPolicy
applied
- using
aql
tool:
aql> SELECT login FROM test.users
10 rows in set (0.880 secs)
which gives me such an entry in aerospike.log:
Jan 13 2015 07:10:28 GMT: INFO (scan): (thr_tscan.c::280) SCAN JOB DONE [id =6: ns= test set=users scanned=10 expired=0 set_diff=0 elapsed=976 (ms)]
(debug logging shows that 4096 partitions were scannedā¦)
When I query for records by PK then I get records instantly (aql
reports 0.000 to 0.001s for a single record).
Regards,
Tomasz
Hi Tomasz
You are getting some very unusual results. Iām guessing something is configured wrong. Can you post your full aerospike.conf file, and can you post a description of your record. Whit this I will simulate your issue.
BTW: Where are you located? - Iām teaching Aerospike for Developers in Vilnius next week and in Berlin the week after; if you can make it to either location, I can put you on the class for no cost.
Peter
Hi Peter,
I also suspect a configuration issue, yet Iām testing with out-of-the-box configuration which I presume should be correct in 99% of cases (mine is likely falling into that missing 1%). Yesterday I even compiled it from sources, used aerospike_dev.conf
settings, but still - no improvement. You can see my configuration and logs in this ZIP file where I added: /var/log/aerospike/aerospike.log
, config files tested plus output from dmesg
, free
, lsmod
, sysctl -a
and my /proc/cpuinfo
.
Thanks for the training offer, but I wonāt make it on such a short notice (though Berlin is not that far away from Bielsko-BiaÅa, Poland).
Regards,
Tomasz
Oh, forgot to add details about my records, but they are very simple:
INSERT INTO test.users (PK, login) VALUES ('login1', 'login1')
and then itās repeated for 'login1'
ā¦ 'login10'
. So I created a set users
and put there 10 records with a single text bin login
. After that SELECT * FROM test.users
in aql
gets me those 10 records in 0.8s :(.
Tomasz
Yesterday, after having talked to one of your Solution Engineers, I tried to provide a ābare-metalā configuration and switched off all graphical interface processes and left the necessary processes running only. Yet, there was no significant improvement and full-scan timings for my little data set were similar. Mysterious, isnāt it?
Regards,
Tomasz
Hi Tomasz
I finally got a chance to write some code to duplicate issue. Here is what I have found:
Scenario - 100,000 records, each containing 1 string ~10 characters and 1 integer
aql running on the same node (eliminating network latency, but adding CPU usage)
select * from test.demo
100000 rows in set (0.860 secs)
Java coded application to scanAll with the bins on the same node
$ java -jar aerospike-top-10-1.0-full.jar -S
Scan time for 100000 is 853 milliseconds
You can find the code at GitHub - helipilot50/aerospike-top-10-aggregation: Finding the top 10 with an Aerospike aggregation, it is part of another example.
To duplicate follow the instructions in the README.md to build a runnable jar.
- Run the jar (as shown) with the -l option to load data
- Run the jar with the -S option to scanAll
Using this example, running on the same node as your server, you will eliminate the network latency and any code differences between the scan you have coded and aql.
hope this helps
Peter
Hi Peter,
Thanks for that sample code. With different set sizes, using your aerospike-top-10-aggregation -S
I got these results:
Scan time for 5000 is 908 milliseconds
Scan time for 50000 is 1139 milliseconds
Scan time for 100000 is 1145 milliseconds
Scan time for 250000 is 1425 milliseconds
Scan time for 500000 is 1832 milliseconds
(I run each test 10x on given set, above are the fastest times).
It seems that on my test environment there is like 800-900ms of initialization/warm-up(?) phase each time and then the rest is actually depending on the amount of data. I will give it a try somewhere else to see if it isnāt high time to upgrade my development hardware
[edit] On a server with memory-only storage I get 100K records in 155ms and 1M records in a second, so I think Iāll give up here and buy a new development machineā¦
Regards,
Tomasz
Hi Tomasz,
I looked at your scenario and did testing in my local environment on my laptop where Iāve VM machine(Vagrant) running with aerospike server(3.4.0).
ā I had created set called āloginā and bin called āloginā and inserted 10 records as below:-
aql> select * from test.login
10 rows in set (0.281 secs)
ā Then ran scanAll() API call on above set ,so scanAll completed in around ~250 to 290 milliseconds range
Scan time for 10 record is 290 milliseconds
ā Lastly ran AerospikeClient.query() call on secondary indexed bin āloginā with filter setting as below
stmt.setFilters( Filter.equal(āloginā, ālogin2ā) );
So I got 1 record fetched in around ~8 to 10 milliseconds range all the time as below .
- Record 1 is: {login=login2}
- Query time for fetching 1 record is 9 milliseconds
So querying small data sets should take few milliseconds rather than seconds. Also all these depends on network latency,IO controller,amount of data ā¦
I noticed in your conf file that āproto-fd-maxā value is not the default one i.e. 15000 .Try with default value
Also , you can run asloglatency command to check how much time your operations are taking i.e. % of operation with respect to time.
Here is the link for Log latency check - http://www.aerospike.com/docs/tools/asloglatency/
Hi Tomasz,
There is a warm up period, especially with the Java client, because the class loader needs to load everything and JIT it as it is used. You see this in short test, but not so much in production.
Peter
Hi Anand,
I tested both config files (aerospike_dev.conf
bundled with sources of aerospike-server
and aerospike.conf
that comes with .deb
package). I will see what this asloglatency
script can tell and post any findings here.
In regard to fetching single records - this is very fast when querying by key or for an indexed bin, Iām only curious why itās so slow on my machine when doing a scan for the whole set (which has 10 records ).
Regards,
Tomasz
Hi Peter,
By āinitialization/warm-up phaseā I meant some server-side operations that seem to eat those 800ms in each scan and then the rest of time seems to depend really on the amount of data. Of course youāre right with class loader and JIT compilation, but it all happens before you start timing getting back of results from your TopTen.scanAll()
method and shouldnāt have any influence here.
Regards,
Tomasz
Hi Tomasz
You are right, this is a warmup time on the server too, and it is more obvious in a single server cluster.
Peter
Hi,
@anandprakash, @helipilot50: I finally got some time to dig more into Aerospike Serverās code, run my tests with detailed logging and get to know (more or less) the idea behind āscanā operation. I only wish I had found this page from Operations Manual earlier as it seems I was using the wrong tool for my task. Now when I use a secondary index on additional ātypeā bin that resembles my setās name - I get my 10 test records in 30-40ms on my dev machine.
Regards,
Tomasz
Thatās great, Tomasz. Scans are long-running, resource-intensive jobs, so they should be implemented properly to get the expected result since they do entire namespace scan which might slow down your operation. So, itās good to have secondary indexed bin to query and get filtered data.