Batch Writes


#1

Coming from MariaDB and using Go, I built one large multi-row INSERT statement in a bytes.Buffer, appending a value tuple for each record read from the files, and then executed it as a single query. That let me import well over a million rows in around 45 seconds.

import (
	"bytes"
	"strconv"
	"time"
)

var sqlStrCat bytes.Buffer
importedAt := time.Now().UTC().Format(time.RFC3339)

// Seven columns, matching the seven values written for each row below.
sqlStrCat.WriteString("INSERT INTO " + Model.Type + "_values_hr" + modelHour + "(`lat`,`long`,`rh`,`vvel`,`prestend`,`last_updated`,`imported_at`) VALUES")

for i, point := range ModelPoints {
	// Comma-separate the row tuples without leaving a trailing comma.
	if i > 0 {
		sqlStrCat.WriteString(",")
	}
	sqlStrCat.WriteString("(")
	sqlStrCat.WriteString(strconv.FormatFloat(float64(point.Lat), 'e', -1, 32))
	sqlStrCat.WriteString(",")
	sqlStrCat.WriteString(strconv.FormatFloat(float64(point.Long), 'e', -1, 32))
	sqlStrCat.WriteString(",")
	sqlStrCat.WriteString(strconv.FormatFloat(float64(point.RelativeHumidity), 'e', -1, 32))
	sqlStrCat.WriteString(",")
	sqlStrCat.WriteString(strconv.FormatFloat(float64(point.VerticalVelocity), 'e', -1, 32))
	sqlStrCat.WriteString(",")
	sqlStrCat.WriteString(strconv.FormatFloat(float64(point.PressureTendency), 'e', -1, 32))
	// Timestamps are string literals, so quote them with standard single quotes.
	sqlStrCat.WriteString(",'" + Model.LastUpdated + "'")
	sqlStrCat.WriteString(",'" + importedAt + "')")
}

// The finished statement, sqlStrCat.String(), is then executed as one query.

I’ve been testing out Aerospike, and I’m a bit confused that there doesn’t seem to be a more efficient way to import a large number of records. Importing a dataset of the same type and size with this loop takes around 3 minutes:

importedAt := time.Now().UTC().Format(time.RFC3339) // computed once instead of once per record

for i, point := range ModelPoints {
	modelPointKey, keyErr := aerospike.NewKey(ns, set, i+1)
	if keyErr != nil {
		log.Fatal(keyErr)
	}

	// geojson.Marshal returns the Feature as a GeoJSON string.
	p, jsonErr := geojson.Marshal(geojson.NewFeature(
		geojson.NewPoint(geojson.Coordinate{geojson.Coord(point.Long), geojson.Coord(point.Lat)}),
		map[string]interface{}{
			"rh":           point.RelativeHumidity,
			"high_clouds":  point.HighClouds,
			"vvel":         point.VerticalVelocity,
			"prestend":     point.PressureTendency,
			"last_updated": m.LastUpdated,
			"imported_at":  importedAt,
		}, nil))
	if jsonErr != nil {
		log.Fatal(jsonErr)
	}

	// One synchronous write (one round trip) per record.
	if putErr := Store.PutBins(nil, modelPointKey, aerospike.NewBin("gj", aerospike.NewGeoJSONValue(p))); putErr != nil {
		log.Fatal(putErr)
	}
}

This probably isn’t a big deal for most people, but how would I go about speeding this up? Are batch writes the feature that would fix this? Or could I write a module that lets me do something similar to my first example?


#2

Could the GeoJSON marshaling be taking up most of the execution time? You might want to try building the GeoJSON string yourself and passing that into the bin.


#3

Thanks. The marshaling was taking up much more time than I thought. By building the string with an (admittedly ugly) bytes.Buffer, like the first code block in the original post, I got the average time down to around 1 minute and 35 seconds.