Iterating through records is slow

go
index
query

#1

I’m reading from the Result channel in golang and I’m not able to loop through the records fast enough. I’m running the following query in aql:

SELECT JobID FROM search.jobEmbeddings WHERE Location WITHIN CAST('{"type": "AeroCircle", "coordinates": [[-122.419416,37.774929], 160934]}' AS GEOJSON)

29217 rows in set (0.159 secs)

But when I run the same query from my Go code:

  QueryByLoc: time taken 0 (ms)
  29217 items processed
  Results loop: time taken 613 (ms)

it takes about 4 times as long.

The records consist of a float array of size 100 and two small strings.

Network: I checked networkIn on the machine and it peaks at 20 million bytes. I’m currently running this Aerospike database on an r3.xlarge instance (this was recommended by AS).

The speed I might expect is somewhere around 250ms or less for fetching 30k records.

Question: Why might reading records from the channel be that much slower than running the query in aql?

I’m reading from the channel on the same box as the Aerospike instance. I’m trying to create a Search Engine using AS. I’m making Geo queries and I’m processing the results in my golang process.

Any help is greatly appreciated!

Thanks, Rob


#2

Can you show us the code snippet where you are performing query and timing it?


#3

Sure!

Here is the function where I am generating the geo query:

// GeoQuery creates a new geo query statement with a filter that
// returns records within radius (miles) of point p.
func GeoQuery(ns string, keyset string, p *Point, radius float64) *as.Statement {
	stm := as.NewStatement(ns, keyset)
	stm.Addfilter(as.NewGeoWithinRadiusFilter(LocBinKey, p.Lng, p.Lat, radius*metersInMile))

	return stm
}

I’m getting the record set obtained by this function which calls GeoQuery:

//QueryByLoc queries the aerospike db via geospatial indexes
func QueryByLoc(i *Index, p *Point, r float64) (*as.Recordset, error) {
	var err error

	s := time.Now()
	rs := &as.Recordset{}

	stmt := GeoQuery(Namespace, EmbedSet, p, r)
	if rs, err = i.Client.Query(nil, stmt); err != nil {
		return nil, err
	}
	fmt.Printf("QueryByLoc: time taken %d (ms)\n", time.Since(s)/1000000)
	return rs, err
}

The part that takes a long time is here where I iterate through the records.

// CalculateTopK compares all embeddings in the Recordset with the embedding vector v1.
func CalculateTopK(rs *as.Recordset, v1 algo.Emb, k int) *algo.SearchHeap {
	items := []algo.Item{}
	c := 0

	sTotal := time.Now()

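	for res := range rs.Results() {
		if res.Err != nil {
			continue // skip failed results; real error handling removed for brevity
		}
		c++ // count each record read from the channel
	}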

	fmt.Printf("%d items processed\n", c)
	fmt.Printf("CalculateTopK: time taken %d (ms)\n", time.Since(sTotal)/1000000)
	return top
}

The number of items processed is around 30k. I mostly feel as if either the channel isn’t getting records written to it fast enough or I’m not reading from the channel fast enough.

I removed some of the parts that did more work since just iterating through was the bottleneck.

I read the code which populates the channel, and it seems to me that with multiple nodes the writes to the channel would happen in parallel. I tried adding a node in the AMC but it said that “feature is for enterprise version only”.

Rob


#4

Is there anyone who is able to get 30k or so results back from an Aerospike db in under half a second? I’m fetching the results from the same machine the Aerospike db runs on. Am I fetching the results incorrectly? I tried creating concurrent goroutines to read from the channel but I still get the same slow speeds. I’m currently running on a 10Gb r4.xlarge box on AWS.


#5

Sorry, I don’t really know Go. You DEFINITELY can get more than 30k records back in under half a second… I just don’t know if this is a code problem or a node problem. I’ve also never done anything with geo queries; is that something you can test in AQL?


#6

Yes, you can. Here is the equivalent geo query in aql:

SELECT JobID FROM search.jobEmbeddings WHERE Location WITHIN CAST('{"type": "AeroCircle", "coordinates": [[-81.379234,28.538336], 400000]}' AS GEOJSON)

27406 rows in set (0.172 secs)

It looks like I’m getting around 30k records back in well under half a second here. So I take it that you can definitely get more than 30k records back in half a second using Java or C++, and that you’ve seen this type of performance before with Aerospike in other languages. That’s helpful. If the issue is with the Go code then I can probably find where the bottleneck is. I suppose I can contact the committers of the Go package if I can’t find where the issues are.


#7

OK well I’m glad we’ve narrowed it down to a code issue! That’s good progress!!