Iterating via Scan Seems to be Quite Slow

primary
index
secondary
scan
query

#1

Hello,

I’m using the Go client library on Ubuntu 14.04 with 1 GB of memory and a 1 GHz CPU, with the default Aerospike Community 3.5 configuration.

What I’d like to do is fetch all records in a set as fast as possible (this set represents a pre-generated ‘news feed’ for my app).

It seems that while fetching individual records is fast (~1 ms, though I thought it’d be faster), fetching around 15 records from a set is slow (~40 ms via the aql tool and ~70 ms with the Go client library).

My questions are:

  1. Is the best way to fetch all records from a small set (< 1000 records) simply to do a Scan, or is there a faster way (a UDF?)

  2. Is it just me or is this code slow (output below code snippet):

    package main

    import (
        "log"
        "math"
        "runtime"
        "time"

        as "github.com/aerospike/aerospike-client-go"
    )

    const (
        HOSTNAME  = "162.209.101.94"
        PORT      = 3000
        NAMESPACE = "test"
        SET       = "set1"
    )

    func panicOnError(err error) {
        if err != nil {
            log.Fatalln(err)
        }
    }

    func main() {
        runtime.GOMAXPROCS(runtime.NumCPU())

        client, err := as.NewClient(HOSTNAME, PORT)
        panicOnError(err)

        log.Printf("clearing data from %v.%v", NAMESPACE, SET)
        recordset, err := client.ScanAll(nil, NAMESPACE, SET)
        panicOnError(err)

    L:
        for {
            select {
            case record, open := <-recordset.Records:
                if !open {
                    break L
                }
                client.Delete(nil, record.Key)
            case err := <-recordset.Errors:
                panicOnError(err)
            }
        }

        log.Printf("starting test...")

        count := 0
        for i := 1; i < math.MaxInt16; i *= 2 {
            for j := count; j < i; j++ {
                key, _ := as.NewKey(NAMESPACE, SET, j)
                err := client.PutBins(nil, key, as.NewBin("bin", j))
                panicOnError(err)
                count++
            }

            log.Printf("There are now %d records in the database...\n", i)

            for j := 0; j < 3; j++ {
                t := time.Now()
                scan(client, i)
                log.Printf("Scanning %d records took: %v\n", i, time.Now().Sub(t))
            }
        }
    }

    // scan scans the set and checks that the number of returned records
    // matches the expected number.
    func scan(client *as.Client, expectedRecordsCount int) {
        recordset, err := client.ScanAll(nil, NAMESPACE, SET)
        panicOnError(err)
        count := 0

    L:
        for {
            select {
            case rec := <-recordset.Records:
                if rec == nil {
                    break L
                }
                count++
            case err := <-recordset.Errors:
                panicOnError(err)
            }
        }

        log.Println("Number of records returned by ScanAll:", count)
        if expectedRecordsCount != count {
            log.Fatalf("Expected ScanAll to return %d records, but it returned %d.\n", expectedRecordsCount, count)
        }
    }

Output on my server:

2015/06/08 16:32:31 clearing data from test.set1
2015/06/08 16:32:32 starting test...
2015/06/08 16:32:32 There are now 1 records in the database...
2015/06/08 16:32:32 Number of records returned by ScanAll: 1
2015/06/08 16:32:32 Scanning 1 records took: 78.36412ms
2015/06/08 16:32:32 Number of records returned by ScanAll: 1
2015/06/08 16:32:32 Scanning 1 records took: 43.248526ms
2015/06/08 16:32:32 Number of records returned by ScanAll: 1
2015/06/08 16:32:32 Scanning 1 records took: 40.7049ms
2015/06/08 16:32:32 There are now 2 records in the database...
2015/06/08 16:32:32 Number of records returned by ScanAll: 2
2015/06/08 16:32:32 Scanning 2 records took: 59.658765ms
2015/06/08 16:32:32 Number of records returned by ScanAll: 2
2015/06/08 16:32:32 Scanning 2 records took: 42.604027ms
2015/06/08 16:32:32 Number of records returned by ScanAll: 2
2015/06/08 16:32:32 Scanning 2 records took: 40.282638ms
2015/06/08 16:32:32 There are now 4 records in the database...
2015/06/08 16:32:32 Number of records returned by ScanAll: 4
2015/06/08 16:32:32 Scanning 4 records took: 48.869536ms
2015/06/08 16:32:32 Number of records returned by ScanAll: 4
2015/06/08 16:32:32 Scanning 4 records took: 43.169878ms
2015/06/08 16:32:32 Number of records returned by ScanAll: 4
2015/06/08 16:32:32 Scanning 4 records took: 47.166162ms
2015/06/08 16:32:32 There are now 8 records in the database...
2015/06/08 16:32:32 Number of records returned by ScanAll: 8
2015/06/08 16:32:32 Scanning 8 records took: 84.795476ms
2015/06/08 16:32:32 Number of records returned by ScanAll: 8
2015/06/08 16:32:32 Scanning 8 records took: 43.545652ms
2015/06/08 16:32:32 Number of records returned by ScanAll: 8
2015/06/08 16:32:32 Scanning 8 records took: 80.401251ms
2015/06/08 16:32:32 There are now 16 records in the database...
2015/06/08 16:32:32 Number of records returned by ScanAll: 16
2015/06/08 16:32:32 Scanning 16 records took: 40.034078ms
2015/06/08 16:32:32 Number of records returned by ScanAll: 16
2015/06/08 16:32:32 Scanning 16 records took: 41.769932ms
2015/06/08 16:32:33 Number of records returned by ScanAll: 16
2015/06/08 16:32:33 Scanning 16 records took: 40.049275ms
2015/06/08 16:32:33 There are now 32 records in the database...
2015/06/08 16:32:33 Number of records returned by ScanAll: 32
2015/06/08 16:32:33 Scanning 32 records took: 40.808046ms
2015/06/08 16:32:33 Number of records returned by ScanAll: 32
2015/06/08 16:32:33 Scanning 32 records took: 42.052257ms
2015/06/08 16:32:33 Number of records returned by ScanAll: 32
2015/06/08 16:32:33 Scanning 32 records took: 42.014803ms
2015/06/08 16:32:33 There are now 64 records in the database...
2015/06/08 16:32:33 Number of records returned by ScanAll: 64
2015/06/08 16:32:33 Scanning 64 records took: 73.045035ms
2015/06/08 16:32:33 Number of records returned by ScanAll: 64
2015/06/08 16:32:33 Scanning 64 records took: 47.690186ms
2015/06/08 16:32:33 Number of records returned by ScanAll: 64
2015/06/08 16:32:33 Scanning 64 records took: 44.162124ms
2015/06/08 16:32:33 There are now 128 records in the database...
2015/06/08 16:32:33 Number of records returned by ScanAll: 128
2015/06/08 16:32:33 Scanning 128 records took: 68.254957ms
2015/06/08 16:32:33 Number of records returned by ScanAll: 128
2015/06/08 16:32:33 Scanning 128 records took: 55.243683ms
2015/06/08 16:32:33 Number of records returned by ScanAll: 128
2015/06/08 16:32:33 Scanning 128 records took: 55.808721ms
2015/06/08 16:32:33 There are now 256 records in the database...
2015/06/08 16:32:33 Number of records returned by ScanAll: 256
2015/06/08 16:32:33 Scanning 256 records took: 86.917317ms
2015/06/08 16:32:33 Number of records returned by ScanAll: 256
2015/06/08 16:32:33 Scanning 256 records took: 65.004391ms
2015/06/08 16:32:33 Number of records returned by ScanAll: 256
2015/06/08 16:32:33 Scanning 256 records took: 63.99411ms
2015/06/08 16:32:33 There are now 512 records in the database...
2015/06/08 16:32:33 Number of records returned by ScanAll: 512
2015/06/08 16:32:33 Scanning 512 records took: 111.097565ms
2015/06/08 16:32:33 Number of records returned by ScanAll: 512
2015/06/08 16:32:33 Scanning 512 records took: 99.760519ms
2015/06/08 16:32:34 Number of records returned by ScanAll: 512
2015/06/08 16:32:34 Scanning 512 records took: 88.390393ms
2015/06/08 16:32:34 There are now 1024 records in the database...
2015/06/08 16:32:34 Number of records returned by ScanAll: 1024
2015/06/08 16:32:34 Scanning 1024 records took: 129.885419ms
2015/06/08 16:32:34 Number of records returned by ScanAll: 1024
2015/06/08 16:32:34 Scanning 1024 records took: 108.603152ms
2015/06/08 16:32:34 Number of records returned by ScanAll: 1024
2015/06/08 16:32:34 Scanning 1024 records took: 106.503247ms
2015/06/08 16:32:34 There are now 2048 records in the database...
2015/06/08 16:32:34 Number of records returned by ScanAll: 2048
2015/06/08 16:32:34 Scanning 2048 records took: 152.69444ms
2015/06/08 16:32:34 Number of records returned by ScanAll: 2048
2015/06/08 16:32:34 Scanning 2048 records took: 133.488184ms
2015/06/08 16:32:34 Number of records returned by ScanAll: 2048
2015/06/08 16:32:34 Scanning 2048 records took: 125.877821ms
2015/06/08 16:32:35 There are now 4096 records in the database...
2015/06/08 16:32:35 Number of records returned by ScanAll: 4096
2015/06/08 16:32:35 Scanning 4096 records took: 143.931268ms
2015/06/08 16:32:35 Number of records returned by ScanAll: 4096
2015/06/08 16:32:35 Scanning 4096 records took: 136.875988ms
2015/06/08 16:32:35 Number of records returned by ScanAll: 4096
2015/06/08 16:32:35 Scanning 4096 records took: 154.082514ms
2015/06/08 16:32:35 There are now 8192 records in the database...
2015/06/08 16:32:35 Number of records returned by ScanAll: 8192
2015/06/08 16:32:35 Scanning 8192 records took: 198.330261ms
2015/06/08 16:32:36 Number of records returned by ScanAll: 8192
2015/06/08 16:32:36 Scanning 8192 records took: 188.137534ms
2015/06/08 16:32:36 Number of records returned by ScanAll: 8192
2015/06/08 16:32:36 Scanning 8192 records took: 190.544291ms
2015/06/08 16:32:36 There are now 16384 records in the database...
2015/06/08 16:32:37 Number of records returned by ScanAll: 16384
2015/06/08 16:32:37 Scanning 16384 records took: 345.938466ms
2015/06/08 16:32:37 Number of records returned by ScanAll: 16384
2015/06/08 16:32:37 Scanning 16384 records took: 336.74797ms
2015/06/08 16:32:37 Number of records returned by ScanAll: 16384
2015/06/08 16:32:37 Scanning 16384 records took: 322.955112ms

#2

Aerospike was designed and built for handling massive amounts of data. As such, Scan operations are not simple cursors; they are scheduled behind the scenes so they can traverse that much data without disrupting our signature sub-millisecond latency across the cluster.

That scheduling incurs overhead which, for small datasets (a few thousand records), becomes a significant part of the total operation time.

Since we have discovered that developers (including ourselves, in internal projects) use Aerospike as a general-purpose database, we are optimizing Scan/Query performance to do better on small datasets as well.


#3

Ok. Thank you for the clarification. Makes sense.

In the meantime, instead of relying on the speed of a scan (or query, as I assume these are the same), would (a) a UDF speed things up, or (b) would load balancing across nodes speed things up? (I.e., instead of increasing the speed of the scan, simply increase throughput to handle more concurrent users.)

My main goal is the ability to handle many concurrent users.


#4

@Khosrow_Afroozeh,

For a very limited data set, isn’t there a way to improve the speed of a Query() (not a scan) by tuning various policies, such as ‘query-in-transaction-thread’ (set to true) and ‘query-sleep’ (set to a low number)?

I’m looking at your documentation but can’t find a way to configure these with the Go client.
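As far as I can tell, `query-in-transaction-thread` and `query-sleep` are server-side settings (the `service` context of aerospike.conf), not client policies, which would explain why they don’t show up in the Go client. A hedged sketch, assuming your server version accepts them dynamically via `asinfo` (the values shown are illustrative, not recommendations):

```
# In aerospike.conf (service context):
service {
    query-in-transaction-thread true
    query-sleep 1
}

# Or dynamically on a running node, if the parameter is dynamic in your version:
asinfo -v "set-config:context=service;query-in-transaction-thread=true"
asinfo -v "set-config:context=service;query-sleep=1"
```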


#5

One side comment - the speed of scan and query is not the same. Queries interact with a specific secondary index, which is more compact, while scans traverse several primary index trees. While future versions of the server will narrow the gap, it’s currently faster to query a known all-inclusive range on a secondary index than to call a scan. You do pay in extra memory for the secondary index, but if it’s already present, this technique will speed up the response.
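To make the all-inclusive-range suggestion concrete, here is a sketch with the Go client: build a numeric secondary index on the bin, then query a range wide enough to match every record. The index name `idx_bin` is a hypothetical choice, and `Addfilter` is the method name in the 2015-era client API; later client versions may differ.

```go
package main

import (
	"log"
	"math"

	as "github.com/aerospike/aerospike-client-go"
)

func main() {
	client, err := as.NewClient("127.0.0.1", 3000)
	if err != nil {
		log.Fatalln(err)
	}
	defer client.Close()

	// One-time setup: a numeric secondary index on the "bin" bin.
	// ("idx_bin" is a hypothetical index name.)
	task, err := client.CreateIndex(nil, "test", "set1", "idx_bin", "bin", as.NUMERIC)
	if err == nil {
		<-task.OnComplete() // wait until the index is built
	}

	// Query a range that covers every possible value in the set, so the
	// query returns the same records a full scan would.
	stmt := as.NewStatement("test", "set1")
	stmt.Addfilter(as.NewRangeFilter("bin", 0, math.MaxInt16))

	recordset, err := client.Query(nil, stmt)
	if err != nil {
		log.Fatalln(err)
	}
	defer recordset.Close()

	// The Records channel is closed when the query finishes.
	count := 0
	for range recordset.Records {
		count++
	}
	log.Println("records returned by query:", count)
}
```

This requires a running Aerospike node, so treat it as a sketch of the call sequence rather than a drop-in program; in real code you would also drain `recordset.Errors` as the original snippet does.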


#6

Nicholas,

The query speed problem is mostly a matter of per-operation overhead, so it is unlikely you’ll be able to significantly speed it up for the moment. What performance do you expect, and what do you get at the moment?

For small data sets, I usually cache them client-side and refresh the cache when a counter in the DB changes.