Aerospike set to CSV file

What's the best way to dump an Aerospike table/set to a CSV file? Assume the set has a very simple schema: PK, key (a duplicate of the PK, since the PK may not be stored with the data), value.

Thanks. Ming

The key is always stored if the client writes with the sendKey policy.

There currently isn’t a tool to export a set to a CSV file. You would need to scan the set and have the client serialize to CSV.
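A minimal sketch of that scan-and-serialize approach in Python, using the standard csv module and assuming the client's (key, metadata, bins) callback shape; the demo at the bottom feeds fake records into the callback in place of a live scan:

```python
import csv
import io

# Illustrative schema from the question: a user key plus a single value bin.
FIELDS = ["key", "value"]

def make_scan_callback(writer):
    """Return a callback that serializes each scanned record to one CSV row.

    The Aerospike Python client invokes scan callbacks with a
    (key, metadata, bins) tuple, where key is
    (namespace, set, user_key, digest); user_key is only present when
    the record was written with the send-key policy.
    """
    def callback(record):
        key_tuple, _meta, bins = record
        user_key = key_tuple[2]
        writer.writerow([user_key] + [bins.get(f) for f in FIELDS[1:]])
    return callback

# Against a live cluster you would run something like:
#   scan = client.scan('test', 'demo')
#   with open('demo.csv', 'w', newline='') as f:
#       writer = csv.writer(f)
#       writer.writerow(FIELDS)
#       scan.foreach(make_scan_callback(writer))

# Demonstration with fake records standing in for a live scan:
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(FIELDS)
cb = make_scan_callback(writer)
cb((("test", "demo", "k1", b"digest1"), {"gen": 1}, {"value": 42}))
cb((("test", "demo", "k2", b"digest2"), {"gen": 1}, {"value": 7}))
print(buf.getvalue())
```

Note that if the records were not written with sendKey, the user key slot will be None and you would fall back to the digest instead.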

If you have the aerospike-spark connector, you can do that by creating a DataFrame on top of the Aerospike set, reading the data into Spark, and exporting it in CSV format to a file in HDFS.

     spark.conf.set("aerospike.seedhost", dbHost)
     spark.conf.set("aerospike.port", dbPort)
     spark.conf.set("aerospike.set", dbSet)
     spark.conf.set("aerospike.keyPath", "/etc/aerospike/features.conf")
     spark.conf.set("aerospike.user", dbConnection)
     spark.conf.set("aerospike.password", dbPassword)

     val sqlContext = spark.sqlContext

// create DataFrame on top of Aerospike set
val dfBatchRead =
      format("com.aerospike.spark.sql").
      option("aerospike.batchMax", 10000).
      load

This returns the following

scala> val dfBatchRead =
     |
     |       format("com.aerospike.spark.sql").
     |       option("aerospike.batchMax", 10000).
     |       load
dfBatchRead: org.apache.spark.sql.DataFrame = [__key: string, __digest: binary ... 7 more fields]

// Show the schema
scala> dfBatchRead.printSchema
root
 |-- __key: string (nullable = true)
 |-- __digest: binary (nullable = false)
 |-- __expiry: integer (nullable = false)
 |-- __generation: integer (nullable = false)
 |-- __ttl: integer (nullable = false)
 |-- price: double (nullable = true)
 |-- rowkey: string (nullable = true)
 |-- timeissued: string (nullable = true)
 |-- ticker: string (nullable = true)

// Only choose the bins that you want and show the first two records as an example

scala>'price, 'rowkey, 'timeissued, 'ticker).take(2).foreach(println)

// Save it to a CSV file in HDFS. coalesce(1) means create one partition only

scala>'price, 'rowkey, 'timeissued, 'ticker).coalesce(1).write.csv("/tmp/test.csv")

Get it from the edge node. Spark creates a directory called test.csv in HDFS, containing a part file with a long name:

hdfs dfs -get /tmp/test.csv

cd test.csv

mv part-00000-dc6b5d62-cad1-467e-9440-fc0ebfaa5467-c000.csv test.csv

head test.csv

Here you are using Spark as an ETL tool. You can even migrate tables/collections from another database into an Aerospike set, and so on.


Thanks kporter and Mich_Talebzadeh.

For now I am using scan; I may try Spark later.

I checked online documents at:

I tried the following steps and it worked:

     scan = client.scan('test', 'demo')
'name', 'age')
     scan.foreach(print_result)

Can I limit the number of records to scan, especially during testing and development? Something like:

     SELECT * FROM test.demo LIMIT 1000


I am not that familiar with Python here. However, in Scala you can do it using the DataFrame API and functional programming.


For example:

"name", "age").orderBy("age").take(1000).foreach(println)

Maybe there is something similar in Python?
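In Python, one option (a hedged sketch, relying on the client's documented behavior that a scan foreach callback returning False terminates the scan) is to wrap the callback with a counter. The demo below drives the wrapper with a plain list standing in for scanned records:

```python
import itertools

def limited_callback(callback, limit):
    """Wrap a per-record callback so it stops after `limit` records.

    The Aerospike Python client stops a scan when the foreach callback
    returns False, which approximates SELECT ... LIMIT on the client side.
    """
    counter = itertools.count(1)
    def wrapper(record):
        callback(record)
        if next(counter) >= limit:
            return False  # signal the scan to stop
    return wrapper

# Against a live cluster you would run something like:
#   scan = client.scan('test', 'demo')
#   scan.foreach(limited_callback(print_result, 1000))

# Demonstration with a plain list standing in for scanned records:
seen = []
cb = limited_callback(seen.append, 3)
for rec in ["r1", "r2", "r3", "r4", "r5"]:
    if cb(rec) is False:
        break
print(seen)
```

Note this limits work on the client side; the scan still touches records on the server until the stop signal propagates, so it is a development convenience rather than a server-side LIMIT.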