GeoSpatial query performance suboptimal and varies

clusc · March 9, 2017, 5:36am

Hello,

We have been running performance/load tests on Aerospike geospatial queries, and we are not satisfied with the results. For GeoContainingPoint queries we saw about 100 TPS on average, and throughput varies a lot (from 20 to 2k TPS) depending on the input point.

The test dataset is from OpenStreetMap and has roughly 173K polygons (~1.7G). We run GeoContainingPoint queries using the Go client library and capture the results from the benchmark tool (slightly modified):

./benchmark_geo -h 10.60.4.5 -p 3000 -w RU:100 -c 64 -n osm -s mapzen-level-8 -i city.res -L 5,1
2017/03/09 02:29:41 benchmark_geo.go:677: write(tps=0 timeouts=0 errors=0) read(tps=166 timeouts=0 errors=0) total(tps=166 timeouts=0 errors=0, count=166)
2017/03/09 02:29:41 benchmark_geo.go:685: Min(ms) Avg(ms) Max(ms) |<= 1 ms |> 1 ms |> 2 ms |> 4 ms |> 8 ms |> 16 ms
2017/03/09 02:29:41 benchmark_geo.go:692: READ 0 306.796 409 | 0/0.00% | 166/99.40% | 166/99.40% | 166/99.40% | 166/99.40% | 166/99.40%
2017/03/09 02:29:41 benchmark_geo.go:699: WRITE 0 0.000 0 | 0/0.00% | 0/0.00% | 0/0.00% | 0/0.00% | 0/0.00% | 0/0.00%

Some findings from further investigation:

Performance varies a great deal depending on the input geopoint. For densely populated areas (e.g. LA/SF), it degrades to around 20 TPS, whereas for uninhabited areas we saw almost 2K TPS. Is this expected?
CPU utilization on the cluster is very low (<2%), but increasing client parallelism only increases latency, not TPS. Why?

Here is our cluster setup:

Aerospike Version: 3.11.1.1 Community
Cluster size: 3
Machine: GCE n1-highmem-32 (32 vCPUs, 208 GB memory)
storage-engine memory
replication-factor 2

We’d be very interested in seeing some performance metrics from your team, and we want to know the upper limit of an Aerospike cluster for this workload. Please advise.

Thanks,

Eric

bbulkow · March 9, 2017, 3:52pm

Eric, I agree, this seems really low. On bare-metal hardware, with the same OSM system, we typically found 50k QPS; at that point the CPU on the servers would be saturated. 100 to 2k is far lower than we would expect.

Hopefully our eng team will have some time to investigate and give a cogent response, although I know we have some exciting features coming out, so they might not get to the analysis immediately.

clusc · March 9, 2017, 3:59pm

Thanks for the quick response. We’re very keen on getting Aerospike into production, but the performance numbers are a blocker right now. Hopefully we can get some pointers from your engineering team soon.

clusc · March 14, 2017, 12:18am

@bbulkow Can you help us push for an update? We’ve been waiting for a few days but have had no response from your engineering team. Thanks

Paul_Choi · March 23, 2017, 3:30am

Hi,

We are still waiting for your help. While waiting, I ran a profiler, and it shows signs of lock contention at as_record_get_live() within query_io() in thr_query.c, which goes down to olock_vlock() in as_index_sprig_get_vlock() in index.c. Changing partition-tree-locks and partition-tree-sprigs did not help. Can anyone take a look?

When you got 50k geo QPS, what were the Aerospike configuration, the geo data, and the queries?

Paul

Paul_Choi · March 28, 2017, 2:52am

Thanks for the comment.

We ran the benchmark again with a more diverse query set, expecting to spread out the hot keys. This time we got better TPS, but still fewer than half of the vCPU cores are busy. If there is a way to fully utilize the available vCPUs and get better TPS, please let us know. Thanks.

Paul

wchu · March 28, 2017, 3:17am

You may be at a stage where the client is not able to feed enough requests to the server. Try adding some more clients to load the server.

bbulkow · March 28, 2017, 6:13am

I have a further question. There’s no statement in the original question about whether the client is single-threaded or not. You want to make sure you’re running multiple client threads, and as wchu says, you’ll want to make sure you’re not out of client horsepower or network capacity. Both scale by adding more client nodes.

clusc · March 28, 2017, 7:20am

We were running on a 16-core machine using the benchmark tool shipped with the Go client library (slightly modified for geo queries). The result we got was with 64 goroutines, and we found that increasing concurrency only increased latency, not overall TPS, which is a sign of overloading. But what you suggest makes sense; it could also be due to the network. We will test again tomorrow with more distributed clients and post the results here. Thanks!

clusc · March 28, 2017, 6:48pm

We re-ran the test with 100 distributed clients, but the behavior is similar: the total QPS (aggregated across all clients) is still under 2k, and server CPU utilization is below 10%.
