Aerospike set to CSV file

What's the best way to dump an Aerospike table/set to a CSV file? Assume the set has a very simple schema: PK, key (a duplicate of PK, since the PK may not be stored with the data), value.

Thanks. Ming

The key is always stored if the client writes with the sendKey policy.
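For example, with the Java client (usable from Scala as well), a minimal sketch; the host, namespace, set, key and bin are placeholders:

    import com.aerospike.client.{AerospikeClient, Bin, Key}
    import com.aerospike.client.policy.WritePolicy

    val client = new AerospikeClient("127.0.0.1", 3000)
    val policy = new WritePolicy()
    policy.sendKey = true // store the user key with the record, not just its digest

    // "id-1" is now retrievable as the key on reads and scans
    client.put(policy, new Key("test", "demo", "id-1"), new Bin("value", 42))
    client.close()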

There currently isn’t a tool to export a set to a CSV file. You would need to scan the set and have the client serialize to CSV.
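A minimal sketch of that with the Java client from Scala (host, namespace, set and bin name are assumptions, and real code should also handle CSV quoting/escaping):

    import java.io.PrintWriter
    import com.aerospike.client.AerospikeClient
    import com.aerospike.client.policy.ScanPolicy

    val client = new AerospikeClient("127.0.0.1", 3000)
    val out = new PrintWriter("demo.csv")

    // scanAll invokes the callback once per record in the set; the callback
    // may run on multiple scan threads, but PrintWriter.println is synchronized
    client.scanAll(new ScanPolicy(), "test", "demo",
      (key, record) => {
        // key.userKey is only populated if the record was written with sendKey
        val pk = Option(key.userKey).map(_.toString).getOrElse("")
        out.println(s"$pk,${record.getValue("value")}")
      })

    out.close()
    client.close()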

If you have the Aerospike Spark connector, you can do this by creating a DataFrame on top of the Aerospike set, reading the data into Spark, and exporting it in CSV format to a file in HDFS.

     spark.conf.set("aerospike.seedhost", dbHost)
     spark.conf.set("aerospike.port", dbPort)
     spark.conf.set("aerospike.namespace",namespace)
     spark.conf.set("aerospike.set", dbSet)
     spark.conf.set("aerospike.keyPath", "/etc/aerospike/features.conf")
     spark.conf.set("aerospike.user", dbConnection)
     spark.conf.set("aerospike.password", dbPassword)

     val sqlContext = spark.sqlContext

    // create DataFrame on top of Aerospike set
    val dfBatchRead = sqlContext.read.
      format("com.aerospike.spark.sql").
      option("aerospike.batchMax", 10000).
      load

This returns the following:

scala>    val dfBatchRead  = sqlContext.read.
     |       format("com.aerospike.spark.sql").
     |       option("aerospike.batchMax", 10000).
     |       load
dfBatchRead: org.apache.spark.sql.DataFrame = [__key: string, __digest: binary ... 7 more fields]

// Show the schema
scala> dfBatchRead.printSchema
root
 |-- __key: string (nullable = true)
 |-- __digest: binary (nullable = false)
 |-- __expiry: integer (nullable = false)
 |-- __generation: integer (nullable = false)
 |-- __ttl: integer (nullable = false)
 |-- price: double (nullable = true)
 |-- rowkey: string (nullable = true)
 |-- timeissued: string (nullable = true)
 |-- ticker: string (nullable = true)

// Select only the bins that you want and show the first two records as an example

scala> dfBatchRead.select('price, 'rowkey, 'timeissued, 'ticker).take(2).foreach(println)
[295.95,06a41fb4-5215-4dad-8f99-826c7ab34f6f,2019-07-02T23:00:02,MRW]
[266.1,12b085ea-1127-48fe-b5fa-0c7947b7d4a7,2019-07-04T09:48:52,VOD]

// Save it to a CSV file in HDFS. coalesce(1) creates a single partition (one output file)

dfBatchRead.select('price, 'rowkey, 'timeissued, 'ticker).coalesce(1).write.csv("/tmp/test.csv")
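If you want a header row in the output, Spark's CSV writer also accepts a header option:

    dfBatchRead.select('price, 'rowkey, 'timeissued, 'ticker).
      coalesce(1).
      write.option("header", "true").
      csv("/tmp/test.csv")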

Get it from the edge node. Spark creates a directory called test.csv in the file system, containing a part file with a long generated name:

hdfs dfs -get /tmp/test.csv

cd test.csv

mv part-00000-dc6b5d62-cad1-467e-9440-fc0ebfaa5467-c000.csv test.csv

head test.csv
38.23,dde599ee-6aaf-4621-bbd6-a79fd510b072,2019-07-02T17:51:08,SAP
489.54,ec247768-0c55-465b-8304-c96c5d3c7874,2019-07-02T09:32:29,BP
324.38,d012cffd-2f3a-418a-b8e2-2176f703c0d2,2019-07-02T15:03:53,MRW
22.36,fd7ca950-54cc-49dd-97a7-496d9cf7e84d,2019-07-02T19:05:15,MSFT
33.09,ef7d1fde-dc86-460f-8929-d3ec918c0364,2019-07-02T14:42:04,ORCL
274.5,2483f9ab-337b-4bc9-a8f8-797759db2a97,2019-07-02T19:31:41,SBRY

Here you are using Spark as an ETL tool. You can even migrate tables/collections from another database into an Aerospike set, as sketched below.
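For instance, writing a DataFrame back into an Aerospike set with the same connector could look like this (a sketch: aerospike.updateByKey names the column to use as the primary key, the target namespace/set come from the spark.conf settings above, and the column names are assumptions):

    import org.apache.spark.sql.SaveMode

    dfBatchRead.select('price, 'rowkey, 'timeissued, 'ticker).
      write.
      mode(SaveMode.Overwrite).
      format("com.aerospike.spark.sql").
      option("aerospike.updateByKey", "rowkey").
      save()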

HTH

Thanks kporter and Mich_Talebzadeh.

For now I am using scan, and maybe Spark later.

I checked the online documentation at: https://www.aerospike.com/docs/client/python/usage/scan/scan.html

I tried the following steps and it worked:

    scan = client.scan('test', 'demo')
    scan.select('name', 'age')
    scan.foreach(print_result)

Can I limit the number of records to scan, especially during testing and development? Something like:

    select * from test.demo limit 1000

Thanks.

I am not that familiar with Python here. However, in Scala you can do this using a DataFrame and functional programming:

DataFrame.take(1000).foreach(println)

For example:

DataFrame.select("name", "age").orderBy("age").take(1000).foreach(println)

Maybe there is something similar in Python?

HTH