Coming from MariaDB, in Go I created a bytes.Buffer and wrote records assembled from files into it as strings, building a single large INSERT statement that I then executed. That way I could import well over a million rows in around 45 seconds.
// Build one multi-row INSERT statement in memory.
var sqlStrCat bytes.Buffer
importedAt := time.Now().UTC().Format(time.RFC3339)
sqlStrCat.WriteString("INSERT INTO " + Model.Type + "_values_hr" + modelHour + " (`lat`,`long`,`value`,`rh`,`vvel`,`prestend`,`last_updated`,`imported_at`) VALUES ")
for _, point := range ModelPoints {
	// Append one "(...)," tuple per point.
	sqlStrCat.WriteString("(")
	sqlStrCat.WriteString(strconv.FormatFloat(float64(point.Lat), 'e', -1, 32))
	sqlStrCat.WriteString(",")
	sqlStrCat.WriteString(strconv.FormatFloat(float64(point.Long), 'e', -1, 32))
	sqlStrCat.WriteString(",")
	sqlStrCat.WriteString(strconv.FormatFloat(float64(point.RelativeHumidity), 'e', -1, 32))
	sqlStrCat.WriteString(",")
	sqlStrCat.WriteString(strconv.FormatFloat(float64(point.VerticalVelocity), 'e', -1, 32))
	sqlStrCat.WriteString(",")
	sqlStrCat.WriteString(strconv.FormatFloat(float64(point.PressureTendency), 'e', -1, 32))
	sqlStrCat.WriteString(",")
	sqlStrCat.WriteString("\"" + Model.LastUpdated + "\"")
	sqlStrCat.WriteString(",")
	sqlStrCat.WriteString("\"" + importedAt + "\"")
	sqlStrCat.WriteString("),")
}
// The trailing comma still has to be trimmed before the statement is executed.
I’ve been testing out Aerospike, and I’m a bit confused as to why there’s not a more efficient way of importing a large number of records quickly. It takes around 3 minutes to import the same type/size of dataset with this:
for i := range ModelPoints {
	modelPointKey, _ = aerospike.NewKey(ns, set, i+1)
	p, _ = geojson.Marshal(geojson.NewFeature(geojson.NewPoint(geojson.Coordinate{geojson.Coord(ModelPoints[i].Long), geojson.Coord(ModelPoints[i].Lat)}), map[string]interface{}{
		"rh":           ModelPoints[i].RelativeHumidity,
		"high_clouds":  ModelPoints[i].HighClouds,
		"vvel":         ModelPoints[i].VerticalVelocity,
		"prestend":     ModelPoints[i].PressureTendency,
		"last_updated": m.LastUpdated,
		"imported_at":  time.Now().UTC().Format(time.RFC3339),
	}, nil))
	Store.PutBins(nil, modelPointKey, &aerospike.Bin{"gj", aerospike.NewGeoJSONValue(p)})
}
Probably not a big deal to many, but how does one go about improving that? Would batch writes be the feature that fixes this? Or could I write a module that lets me do something similar to my first example?