Load csv file

Can someone please point me to how to load a CSV file into Aerospike?

Also, which parameters should be considered to get good load performance?

Here is the documentation for the csv loader: https://aerospike.com/docs/tools/asloader/index.html

Having multiple data files can help, but the main driver seems to be the number of CPUs, since the number of writer threads equals the number of CPUs x 5.
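
A tiny sketch of that arithmetic, assuming the loader sizes its writer pool as stated (CPUs x 5) and counts CPUs the way Java's Runtime.availableProcessors() does. The class and method names are hypothetical, not part of asloader. The debug output later in this thread (12 reader threads, 60 writer threads) is consistent with the rule.

```java
// Sketch: estimate the loader's default writer-thread count on this machine.
// Assumption: writer threads = CPUs x 5, as described in this thread, and
// CPUs are counted like Runtime.availableProcessors().
class LoaderThreadEstimate {

    // Five writer threads per CPU under the assumed rule.
    static int defaultWriterThreads(int cpus) {
        return cpus * 5;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("CPUs: " + cpus
                + ", estimated writer threads: " + defaultWriterThreads(cpus));
    }
}
```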

Thanks for replying

  1. How do I delete all the sets from a namespace?

  2. com.aerospike.client.AerospikeException: Error 4,1,0,0,0,BB9030011AC4202 127.0.0.1 3000: Parameter error
         at com.aerospike.client.command.WriteCommand.parseResult(WriteCommand.java:82)
         at com.aerospike.client.command.SyncCommand.executeCommand(SyncCommand.java:103)
         at com.aerospike.client.command.SyncCommand.execute(SyncCommand.java:64)
         at com.aerospike.client.AerospikeClient.put(AerospikeClient.java:385)
         at com.aerospike.load.AsWriterTask.writeToAs(AsWriterTask.java:135)
         at com.aerospike.load.AsWriterTask.call(AsWriterTask.java:582)
         at com.aerospike.load.AsWriterTask.call(AsWriterTask.java:54)
         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
     ERROR AsWriterTask :157 - File: concept.csv Line: 30138Aerospike Write Error: 4
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

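You can use the truncate command for deleting sets. The info command truncate-namespace:namespace=<ns> removes the data of every set in a namespace, and truncate:namespace=<ns>;set=<set> removes a single set. From the Java client, truncate(policy, namespace, null, null) with a null set name truncates the whole namespace.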

Regarding the error code, you can consult the Error Codes list in the Aerospike documentation.

Error 4 (Parameter error) indicates that a bad parameter was passed, or that the server does not support that parameter.

Is there any limit on the value used for the set?

"set": {"column_name":"id", "type": "string"},

"CPP9087,CPP9087,CPP9087,CPP9087,CPP9087,CPP9087,CPP9087,CPP9087,CPP9087"

INFO AerospikeLoad :237 - Number of data files:1
INFO AerospikeLoad :241 - Aerospike loader started
INFO AerospikeLoad :386 - Config file processed.
INFO AerospikeLoad :408 - Reader pool size : 48
INFO PrintStat :93 - 2020-10-05 23:52:36 load(Write count=0 tps=0 Errors=0 (Timeout:0 KeyExists:0 othersWrites:0 ReadErrors:0 Processing:0) Skiped (NullKey:0 NoBins:0) Progress:0%
INFO AerospikeLoad :421 - Shutdown reader thread pool
INFO AerospikeLoad :795 - Processing: 70.csv
INFO AerospikeLoad :790 - Reader completed 2-lines in 0.002sec, From file: 70.csv
INFO AerospikeLoad :424 - Reader thread pool terminated
INFO AerospikeLoad :428 - Shutdown writer thread pool
ERROR AsWriterTask :157 - File: 70.csv Line: 2Aerospike Write Error: 4
INFO AerospikeLoad :431 - Writer thread pool terminated
INFO AerospikeLoad :434 - Final Statistics of importer: (Records Read = 1, Successful Writes = 0, Successful Primary Writes = 0, Successful Mapping Writes = 0, Errors = 1(1-Write,0-Read,0-Processing), Skipped = 0(0-NullKey,0-NoBins)
INFO AerospikeLoad :253 - Aerospike loader completed
INFO AerospikeLoad :260 - Loader completed in 0.078sec

  1. Also, which parameters need tweaking to get better TPS?

  2. If we restart Aerospike, the data gets wiped out. How do we persist it across restarts?

  3. If I want to combine 3 columns from the CSV file to form the key, as below, how do we load it?

"key": {"column_name":"id1:id2:id3", "type": "integer"},

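There is a limit of 1023 sets per namespace. Set names are also limited to 63 characters, and the value above is 71 characters long, which may explain the Error 4 (Parameter error). Check the server logs to confirm the cause.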
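
Since Aerospike limits set names to 63 bytes, one option is to check set-name values before running the loader. A minimal sketch; SetNameCheck and isValidSetName are hypothetical names, not part of asloader or the Aerospike client, and only the length limit is checked.

```java
import java.nio.charset.StandardCharsets;

// Sketch: flag set-name values that exceed Aerospike's 63-byte set-name limit.
class SetNameCheck {
    static final int MAX_SET_NAME_BYTES = 63;

    // Valid if the UTF-8 encoded name fits within the 63-byte limit.
    static boolean isValidSetName(String name) {
        return name.getBytes(StandardCharsets.UTF_8).length <= MAX_SET_NAME_BYTES;
    }

    public static void main(String[] args) {
        // The value posted in the question: nine "CPP9087" tokens joined by commas.
        String value = "CPP9087,CPP9087,CPP9087,CPP9087,CPP9087,CPP9087,CPP9087,CPP9087,CPP9087";
        System.out.println(value.length() + " chars, valid set name: " + isValidSetName(value));
    }
}
```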

The list of parameters is in the docs. I think the best way to speed up the load would be to split the data into multiple files.
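
A minimal JDK-only sketch of splitting one large CSV into several files before running the loader, repeating the header row in each part so that a config with a header still matches. The class name, the round-robin assignment and the part-N.csv naming are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class CsvSplitter {

    // Splits data lines round-robin into `parts` chunks; each chunk starts with the header.
    static List<List<String>> split(List<String> lines, int parts) {
        if (lines.isEmpty() || parts < 1) {
            throw new IllegalArgumentException("need a header line and parts >= 1");
        }
        String header = lines.get(0);
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            List<String> chunk = new ArrayList<>();
            chunk.add(header);
            chunks.add(chunk);
        }
        for (int i = 1; i < lines.size(); i++) {
            chunks.get((i - 1) % parts).add(lines.get(i));
        }
        return chunks;
    }

    // Usage: java CsvSplitter input.csv 4 outDir
    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.out.println("usage: CsvSplitter <input.csv> <parts> <outDir>");
            return;
        }
        Path outDir = Paths.get(args[2]);
        Files.createDirectories(outDir);
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<List<String>> chunks = split(lines, Integer.parseInt(args[1]));
        for (int i = 0; i < chunks.size(); i++) {
            Files.write(outDir.resolve("part-" + i + ".csv"), chunks.get(i), StandardCharsets.UTF_8);
        }
    }
}
```

It reads the whole file into memory and treats each line as one record, so quoted fields containing newlines would need a real CSV parser. Round-robin assignment does not preserve row order across files, which only matters if the same key appears on more than one line.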

If you run with a persistent namespace (storage-engine device), the data should survive a restart. With storage-engine memory, the data does not persist across a restart, but with replication-factor 2 or more the other nodes still hold a copy (or more) of the data, as long as nodes are restarted one at a time and migrations complete in between.

I'm not sure the asloader tool supports combining multiple columns into a key. You should be able to code that easily with a client, though. Note that a key like id1:id2:id3 would have to be a string rather than an integer.
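
One client-free workaround, sketched under assumptions: preprocess the CSV so each row gets an extra column holding id1:id2:id3, then point the loader's key mapping at that column. The column name composite_key, the class name and the plain comma split (no quoted fields) are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

class CompositeKeyCsv {

    // Returns new CSV lines with an extra column joining the named columns with ':'.
    // Assumes a header row and plain comma-separated fields (no quoting or escaping).
    static List<String> addCompositeKey(List<String> lines, List<String> keyColumns, String newColumn) {
        List<String> header = Arrays.asList(lines.get(0).split(",", -1));
        int[] positions = new int[keyColumns.size()];
        for (int i = 0; i < positions.length; i++) {
            positions[i] = header.indexOf(keyColumns.get(i));
            if (positions[i] < 0) {
                throw new IllegalArgumentException("missing column: " + keyColumns.get(i));
            }
        }
        List<String> out = new ArrayList<>();
        out.add(lines.get(0) + "," + newColumn);
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split(",", -1);
            StringJoiner key = new StringJoiner(":");
            for (int pos : positions) {
                key.add(fields[pos]);
            }
            out.add(line + "," + key);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList("id1,id2,id3,name", "1,2,3,alice");
        addCompositeKey(csv, Arrays.asList("id1", "id2", "id3"), "composite_key")
                .forEach(System.out::println);
    }
}
```

The loader config would then map the key to the new column, e.g. "key": {"column_name": "composite_key", "type": "string"}, and if the config declares n_columns_datafile it would grow by one. With the Java client instead, the same string could be passed directly to new Key(namespace, set, compositeKey).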

I'm running the following from the examples, but it never writes anything and there are no errors.

java -cp /Users/nareshmaharaj/.m2/repository/com/googlecode/json-simple/json-simple/1.1.1/json-simple-1.1.1.jar:/Users/nareshmaharaj/.m2/repository/org/gnu/gnu-crypto/2.0.1/gnu-crypto-2.0.1.jar:/Users/nareshmaharaj/.m2/repository/org/apache/logging/log4j/log4j-api/2.14.1/log4j-api-2.14.1.jar:/Users/nareshmaharaj/.m2/repository/org/apache/logging/log4j/log4j-core/2.14.1/log4j-core-2.14.1.jar:/Users/nareshmaharaj/.m2/repository/com/aerospike/aerospike-client/5.0.0/aerospike-client-5.0.0.jar:/Users/nareshmaharaj/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:target/aerospike-loader-1.0-SNAPSHOT.jar com.aerospike.load.AerospikeLoad -h ec2-35-181-51-211.eu-west-3.compute.amazonaws.com -p 3000 -n "insurance.bar" -c example/alldatatype.json example/alldatatype.dsv
INFO  AerospikeLoad    :237 - Number of data files:1
INFO  AerospikeLoad    :241 - Aerospike loader started
INFO  AerospikeLoad    :386 - Config file processed.
INFO  AerospikeLoad    :408 - Reader pool size : 12
INFO  PrintStat        :93 - 2021-06-20 14:09:40 load(Write count=0 tps=0 Errors=0 (Timeout:0 KeyExists:0 othersWrites:0 ReadErrors:0 Processing:0) Skiped (NullKey:0 NoBins:0) Progress:0%
INFO  AerospikeLoad    :421 - Shutdown reader thread pool
INFO  AerospikeLoad    :795 - Processing: alldatatype.dsv
INFO  AerospikeLoad    :790 - Reader completed 9-lines in 0.003sec, From file: alldatatype.dsv
INFO  AerospikeLoad    :424 - Reader thread pool terminated
INFO  AerospikeLoad    :428 - Shutdown writer thread pool
INFO  AerospikeLoad    :431 - Writer thread pool terminated
INFO  AerospikeLoad    :434 - Final Statistics of importer: (Records Read = 8, Successful Writes = 0, Successful Primary Writes = 0, Successful Mapping Writes = 0, Errors = 0(0-Write,0-Read,0-Processing), Skipped = 0(0-NullKey,0-NoBins)
INFO  AerospikeLoad    :253 - Aerospike loader completed
INFO  AerospikeLoad    :260 - Loader completed in 0.596sec

With debug on:

Nareshs-MacBook-Pro:aerospike-loader nareshmaharaj$ java -cp /Users/nareshmaharaj/.m2/repository/com/googlecode/json-simple/json-simple/1.1.1/json-simple-1.1.1.jar:/Users/nareshmaharaj/.m2/repository/org/gnu/gnu-crypto/2.0.1/gnu-crypto-2.0.1.jar:/Users/nareshmaharaj/.m2/repository/org/apache/logging/log4j/log4j-api/2.14.1/log4j-api-2.14.1.jar:/Users/nareshmaharaj/.m2/repository/org/apache/logging/log4j/log4j-core/2.14.1/log4j-core-2.14.1.jar:/Users/nareshmaharaj/.m2/repository/com/aerospike/aerospike-client/5.0.0/aerospike-client-5.0.0.jar:/Users/nareshmaharaj/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:target/aerospike-loader-1.0-SNAPSHOT.jar com.aerospike.load.AerospikeLoad -h ec2-35-181-51-211.eu-west-3.compute.amazonaws.com -n "insurance.bar" -c example/alldatatype.json example/alldatatype.dsv
DEBUG AerospikeLoad    :341 - Using writer Threads: 60
DEBUG AerospikeLoad    :345 - Using reader Threads: 12
INFO  AerospikeLoad    :237 - Number of data files:1
DEBUG AerospikeLoad    :368 - File names:alldatatype.dsv
INFO  AerospikeLoad    :241 - Aerospike loader started
DEBUG AerospikeLoad    :448 - Column definition files/directory: example/alldatatype.json
DEBUG Parser           :75 - Config file contents: {"dsv_config":{"header_exist":true,"n_columns_datafile":10,"delimiter":"##"},"mappings":[{"set":{"column_name":"set","type":"string"},"bin_list":[{"name":"intDataBin","value":{"column_name":"intData","type":"integer"}},{"name":"floatDataBin","value":{"column_name":"floatData","type":"float"}},{"name":"stringDataBin","value":{"column_name":"stringData","type":"string"}},{"name":"listDataBin","value":{"column_name":"listData","type":"json"}},{"name":"mapDataBin","value":{"column_name":"mapData","type":"json"}},{"name":"dateDataBin","value":{"dst_type":"integer","column_name":"dateData","type":"timestamp","encoding":"MM\/dd\/yy"}},{"name":"blobDataBin","value":{"dst_type":"blob","column_name":"blobData","type":"blob","encoding":"hex"}},{"name":"geoDataBin","value":{"column_name":"geoData","type":"geojson"}},{"name":"timestamp","value":{"dst_type":"integer","column_name":"system_time","type":"timestamp","encoding":"MM\/dd\/yy"}}],"key":{"column_name":"key","type":"string"}}],"input_type":"dsv","version":"2.0"}
INFO  AerospikeLoad    :386 - Config file processed.
DEBUG AerospikeLoad    :492 - Config version used:2.0
DEBUG AerospikeLoad    :595 - MappingDef:MappingDefinition [secondary_mapping=false keyColumnDef=ColumnDefinition [staticName=null, nameDef=ColumnDefinition [columnPos=1, columnName=key, srcType=STRING, dstType=null, encoding=null, removePrefix=null, jsonPath=null]] setColumnDef=ColumnDefinition [staticName=null, nameDef=ColumnDefinition [columnPos=0, columnName=set, srcType=STRING, dstType=null, encoding=null, removePrefix=null, jsonPath=null]]binColumnDefs=[ColumnDefinition [staticName=intDataBin, staticValue=null, nameDef=ColumnDefinition [columnPos=-1, columnName=null, srcType=null, dstType=null, encoding=null, removePrefix=null, jsonPath=null], valueDef=ColumnDefinition [columnPos=2, columnName=intData, srcType=INTEGER, dstType=null, encoding=null, removePrefix=null, jsonPath=null]], ColumnDefinition [staticName=floatDataBin, staticValue=null, nameDef=ColumnDefinition [columnPos=-1, columnName=null, srcType=null, dstType=null, encoding=null, removePrefix=null, jsonPath=null], valueDef=ColumnDefinition [columnPos=3, columnName=floatData, srcType=FLOAT, dstType=null, encoding=null, removePrefix=null, jsonPath=null]], ColumnDefinition [staticName=stringDataBin, staticValue=null, nameDef=ColumnDefinition [columnPos=-1, columnName=null, srcType=null, dstType=null, encoding=null, removePrefix=null, jsonPath=null], valueDef=ColumnDefinition [columnPos=4, columnName=stringData, srcType=STRING, dstType=null, encoding=null, removePrefix=null, jsonPath=null]], ColumnDefinition [staticName=listDataBin, staticValue=null, nameDef=ColumnDefinition [columnPos=-1, columnName=null, srcType=null, dstType=null, encoding=null, removePrefix=null, jsonPath=null], valueDef=ColumnDefinition [columnPos=5, columnName=listData, srcType=JSON, dstType=null, encoding=null, removePrefix=null, jsonPath=null]], ColumnDefinition [staticName=mapDataBin, staticValue=null, nameDef=ColumnDefinition [columnPos=-1, columnName=null, srcType=null, dstType=null, encoding=null, removePrefix=null, jsonPath=null], valueDef=ColumnDefinition [columnPos=6, columnName=mapData, srcType=JSON, dstType=null, encoding=null, removePrefix=null, jsonPath=null]], ColumnDefinition [staticName=dateDataBin, staticValue=null, nameDef=ColumnDefinition [columnPos=-1, columnName=null, srcType=null, dstType=null, encoding=null, removePrefix=null, jsonPath=null], valueDef=ColumnDefinition [columnPos=7, columnName=dateData, srcType=TIMESTAMP, dstType=INTEGER, encoding=MM/dd/yy, removePrefix=null, jsonPath=null]], ColumnDefinition [staticName=blobDataBin, staticValue=null, nameDef=ColumnDefinition [columnPos=-1, columnName=null, srcType=null, dstType=null, encoding=null, removePrefix=null, jsonPath=null], valueDef=ColumnDefinition [columnPos=8, columnName=blobData, srcType=BLOB, dstType=BLOB, encoding=hex, removePrefix=null, jsonPath=null]], ColumnDefinition [staticName=geoDataBin, staticValue=null, nameDef=ColumnDefinition [columnPos=-1, columnName=null, srcType=null, dstType=null, encoding=null, removePrefix=null, jsonPath=null], valueDef=ColumnDefinition [columnPos=9, columnName=geoData, srcType=GEOJSON, dstType=null, encoding=null, removePrefix=null, jsonPath=null]], ColumnDefinition [staticName=timestamp, staticValue=null, nameDef=ColumnDefinition [columnPos=-1, columnName=null, srcType=null, dstType=null, encoding=null, removePrefix=null, jsonPath=null], valueDef=ColumnDefinition [columnPos=-1, columnName=system_time, srcType=TIMESTAMP, dstType=INTEGER, encoding=MM/dd/yy, removePrefix=null, jsonPath=null]]]]
INFO  AerospikeLoad    :408 - Reader pool size : 12
DEBUG PrintStat        :64 - Used Memory: 12 Free Memory: 243 Total Memory: 256 Max Memory: 4096
DEBUG AerospikeLoad    :415 - Submitting task for: /Users/nareshmaharaj/Documents/aerospike/loader_csv/aerospike-loader/example/alldatatype.dsv
DEBUG PrintStat        :91 - 2021-06-20 14:21:18: Read/process tps:0
INFO  PrintStat        :93 - 2021-06-20 14:21:18 load(Write count=0 tps=0 Errors=0 (Timeout:0 KeyExists:0 othersWrites:0 ReadErrors:0 Processing:0) Skiped (NullKey:0 NoBins:0) Progress:0%
INFO  AerospikeLoad    :421 - Shutdown reader thread pool
INFO  AerospikeLoad    :795 - Processing: alldatatype.dsv
DEBUG AerospikeLoad    :739 - Reading file:  alldatatype.dsv
DEBUG AsWriterTask     :208 - processing  File: alldatatype.dsvline: 2
DEBUG AsWriterTask     :208 - processing  File: alldatatype.dsvline: 3
DEBUG AsWriterTask     :208 - processing  File: alldatatype.dsvline: 6
DEBUG AsWriterTask     :208 - processing  File: alldatatype.dsvline: 5
DEBUG AsWriterTask     :208 - processing  File: alldatatype.dsvline: 7
DEBUG AsWriterTask     :208 - processing  File: alldatatype.dsvline: 4
DEBUG AsWriterTask     :208 - processing  File: alldatatype.dsvline: 8
DEBUG AsWriterTask     :208 - processing  File: alldatatype.dsvline: 9
INFO  AerospikeLoad    :790 - Reader completed 9-lines in 0.006sec, From file: alldatatype.dsv
INFO  AerospikeLoad    :424 - Reader thread pool terminated
INFO  AerospikeLoad    :428 - Shutdown writer thread pool
INFO  AerospikeLoad    :431 - Writer thread pool terminated
INFO  AerospikeLoad    :434 - Final Statistics of importer: (Records Read = 8, Successful Writes = 0, Successful Primary Writes = 0, Successful Mapping Writes = 0, Errors = 0(0-Write,0-Read,0-Processing), Skipped = 0(0-NullKey,0-NoBins)
INFO  AerospikeLoad    :253 - Aerospike loader completed
INFO  AerospikeLoad    :260 - Loader completed in 0.631sec
Note the final statistics: Records Read = 8, but Successful Writes = 0 and no errors are reported.

Something may not match between the config/mapping and the data file. Can you share the data file?

Here is the config (for readability):

{
	"dsv_config": {
		"header_exist": true,
		"n_columns_datafile": 10,
		"delimiter": "##"
	},
	"mappings": [{
		"set": {
			"column_name": "set",
			"type": "string"
		},
		"bin_list": [{
			"name": "intDataBin",
			"value": {
				"column_name": "intData",
				"type": "integer"
			}
		}, {
			"name": "floatDataBin",
			"value": {
				"column_name": "floatData",
				"type": "float"
			}
		}, {
			"name": "stringDataBin",
			"value": {
				"column_name": "stringData",
				"type": "string"
			}
		}, {
			"name": "listDataBin",
			"value": {
				"column_name": "listData",
				"type": "json"
			}
		}, {
			"name": "mapDataBin",
			"value": {
				"column_name": "mapData",
				"type": "json"
			}
		}, {
			"name": "dateDataBin",
			"value": {
				"dst_type": "integer",
				"column_name": "dateData",
				"type": "timestamp",
				"encoding": "MM\/dd\/yy"
			}
		}, {
			"name": "blobDataBin",
			"value": {
				"dst_type": "blob",
				"column_name": "blobData",
				"type": "blob",
				"encoding": "hex"
			}
		}, {
			"name": "geoDataBin",
			"value": {
				"column_name": "geoData",
				"type": "geojson"
			}
		}, {
			"name": "timestamp",
			"value": {
				"dst_type": "integer",
				"column_name": "system_time",
				"type": "timestamp",
				"encoding": "MM\/dd\/yy"
			}
		}],
		"key": {
			"column_name": "key",
			"type": "string"
		}
	}],
	"input_type": "dsv",
	"version": "2.0"
}
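
One way to test whether the config and the data file disagree is to split each line of the data file on the configured delimiter ("##") and compare the field count with n_columns_datafile (10 in the config above). A minimal sketch; it assumes one record per line and that the delimiter never appears inside a field, and DsvColumnCheck is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class DsvColumnCheck {

    // Returns 1-based line numbers whose field count differs from `expected`.
    static List<Integer> badLines(List<String> lines, String delimiter, int expected) {
        Pattern splitter = Pattern.compile(Pattern.quote(delimiter));
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            // limit -1 keeps trailing empty fields, so "a##b##" counts as 3 fields
            int count = splitter.split(lines.get(i), -1).length;
            if (count != expected) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    // Usage: java DsvColumnCheck example/alldatatype.dsv
    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: DsvColumnCheck <data file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<Integer> bad = badLines(lines, "##", 10);
        System.out.println(bad.isEmpty() ? "all lines have 10 fields" : "mismatched lines: " + bad);
    }
}
```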