Batch Writes (AER-6499)

Coming from MariaDB and using Go, I built one large multi-row INSERT statement in a bytes.Buffer, appending each record (assembled from the source files) as a string, and then executed it. This imported well over a million rows in around 45 seconds.

var sqlStrCat bytes.Buffer
importedAt := time.Now().UTC().Format(time.RFC3339)

sqlStrCat.WriteString("INSERT INTO " + Model.Type + "_values_hr" + modelHour + " (`lat`,`long`,`value`,`rh`,`vvel`,`prestend`,`last_updated`,`imported_at`) VALUES ")
for i, point := range ModelPoints {
	if i > 0 {
		sqlStrCat.WriteString(",") // comma between row tuples
	}
	// One "(...)" tuple per point, with a comma after every value except the last.
	sqlStrCat.WriteString("(" + strconv.FormatFloat(float64(point.Lat), 'e', -1, 32) + ",")
	sqlStrCat.WriteString(strconv.FormatFloat(float64(point.Long), 'e', -1, 32) + ",")
	// point.Value is hypothetical: the column list names `value`, but the original loop never wrote it.
	sqlStrCat.WriteString(strconv.FormatFloat(float64(point.Value), 'e', -1, 32) + ",")
	sqlStrCat.WriteString(strconv.FormatFloat(float64(point.RelativeHumidity), 'e', -1, 32) + ",")
	sqlStrCat.WriteString(strconv.FormatFloat(float64(point.VerticalVelocity), 'e', -1, 32) + ",")
	sqlStrCat.WriteString(strconv.FormatFloat(float64(point.PressureTendency), 'e', -1, 32) + ",")
	// Single quotes are standard SQL string literals.
	sqlStrCat.WriteString("'" + Model.LastUpdated + "','" + importedAt + "')")
}

I’ve been testing Aerospike, and I’m a bit confused as to why there isn’t a more efficient way to import a large number of records. Importing a dataset of the same type and size takes around 3 minutes with this:

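importedAt := time.Now().UTC().Format(time.RFC3339) // computed once, as in the MariaDB version
for i := range ModelPoints {
	modelPointKey, err := aerospike.NewKey(ns, set, i+1)
	if err != nil {
		log.Fatal(err)
	}
	// A GeoJSON Feature: a Point at (long, lat) with the other fields as properties.
	p, jsonErr := geojson.Marshal(geojson.NewFeature(
		geojson.NewPoint(geojson.Coordinate{geojson.Coord(ModelPoints[i].Long), geojson.Coord(ModelPoints[i].Lat)}),
		map[string]interface{}{
			"rh":           ModelPoints[i].RelativeHumidity,
			"high_clouds":  ModelPoints[i].HighClouds,
			"vvel":         ModelPoints[i].VerticalVelocity,
			"prestend":     ModelPoints[i].PressureTendency,
			"last_updated": m.LastUpdated,
			"imported_at":  importedAt,
		}, nil))
	if jsonErr != nil {
		log.Fatal(jsonErr)
	}
	// One synchronous round trip to the server per record.
	if err := Store.PutBins(nil, modelPointKey, aerospike.NewBin("gj", aerospike.NewGeoJSONValue(p))); err != nil {
		log.Fatal(err)
	}
}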

Probably not a big deal to many, but how does one go about improving that? Is batch writing the feature that would fix this? Could I write a module that lets me do something similar to my first example?

Could the GeoJSON marshaling be taking up most of the execution time? You might try constructing the string yourself and passing that into the bin.

Thanks. The marshaling was taking much more time than I thought. Building the GeoJSON string with an ugly bytes.Buffer, like the first code block in the original post, brought the average time down to around 1 minute and 35 seconds.
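Hand-built GeoJSON can be kept tidier with `strconv.AppendFloat` on a reusable `[]byte` than with string concatenation. This is a sketch under stated assumptions: the `ModelPoint` type and `appendFeature` function are hypothetical, all values are finite (NaN or Inf would produce invalid JSON), and the timestamp strings need no JSON escaping.

```go
package main

import (
	"fmt"
	"strconv"
)

// ModelPoint is hypothetical; field names follow the snippets above.
type ModelPoint struct {
	Lat, Long, RelativeHumidity, HighClouds, VerticalVelocity, PressureTendency float64
}

// appendFeature appends a GeoJSON Feature for p to buf without a generic
// marshaler. GeoJSON coordinates are ordered [longitude, latitude].
func appendFeature(buf []byte, p ModelPoint, lastUpdated, importedAt string) []byte {
	buf = append(buf, `{"type":"Feature","geometry":{"type":"Point","coordinates":[`...)
	buf = strconv.AppendFloat(buf, p.Long, 'f', -1, 64)
	buf = append(buf, ',')
	buf = strconv.AppendFloat(buf, p.Lat, 'f', -1, 64)
	buf = append(buf, `]},"properties":{"rh":`...)
	buf = strconv.AppendFloat(buf, p.RelativeHumidity, 'f', -1, 64)
	buf = append(buf, `,"high_clouds":`...)
	buf = strconv.AppendFloat(buf, p.HighClouds, 'f', -1, 64)
	buf = append(buf, `,"vvel":`...)
	buf = strconv.AppendFloat(buf, p.VerticalVelocity, 'f', -1, 64)
	buf = append(buf, `,"prestend":`...)
	buf = strconv.AppendFloat(buf, p.PressureTendency, 'f', -1, 64)
	buf = append(buf, `,"last_updated":"`...)
	buf = append(buf, lastUpdated...) // assumed to need no escaping
	buf = append(buf, `","imported_at":"`...)
	buf = append(buf, importedAt...)
	buf = append(buf, `"}}`...) // close string, properties, feature
	return buf
}

func main() {
	var buf []byte
	buf = appendFeature(buf[:0], ModelPoint{Lat: 40.5, Long: -105.25, RelativeHumidity: 80},
		"2017-01-01T00:00:00Z", "2017-01-01T01:00:00Z")
	fmt.Println(string(buf))
}
```

Reusing one buffer across records (`buf = appendFeature(buf[:0], ...)`) and passing `string(buf)` to `aerospike.NewGeoJSONValue` is safe, because the conversion to `string` copies the bytes.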

The “batch writes/deletes/UDFs” feature has been released in Aerospike
