I just came across the Aerospike Spark Connector “Aerospark” and am thinking of using it in our current project for real-time stream joins with Spark RDDs. My question is: can I join a streaming RDD from Kafka with an Aerospike RDD? If so, does the Aerospike RDD internally call the multi-get (batch read) API to fetch all the keys for a streaming batch at once? I want to understand how the Aerospike RDD works internally when performing a join. My requirement is that the streaming RDD (from Kafka) will hold a small number of records per batch (e.g. 30,000), which I want to join with data in Aerospike that may contain millions of records. So, while performing the join, will the Aerospike Spark Connector load only the 30,000 keys I get from the Kafka stream (via multi-get), or will it first fetch all the records present in Aerospike and then perform the join? If it uses multi-get to fetch only the 30k records, it would make sense for me to use it in my current project. Let me know if any other details are required.
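To make the two strategies in the question concrete, here is a minimal sketch in plain Python. It does not use Spark or the Aerospike client: `STORE`, `batch_get`, `join_by_lookup`, and `join_by_full_scan` are all hypothetical names, and the dictionary is only an in-memory stand-in for an Aerospike set. It just illustrates why a key-driven lookup touches far fewer records than loading everything and then joining.

```python
# Hypothetical in-memory stand-in for an Aerospike set (key -> record).
STORE = {f"user{i}": {"score": i} for i in range(1000)}


def batch_get(keys, batch_size=100):
    """Stand-in for a multi-get: look up only the requested keys, in chunks.

    Returns the found records and the number of key lookups performed.
    """
    found = {}
    lookups = 0
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        for key in chunk:
            lookups += 1
            if key in STORE:
                found[key] = STORE[key]
    return found, lookups


def join_by_lookup(stream_batch):
    """Join a small stream batch by fetching only its keys."""
    keys = [key for key, _ in stream_batch]
    found, lookups = batch_get(keys)
    joined = [(k, v, found[k]) for k, v in stream_batch if k in found]
    return joined, lookups


def join_by_full_scan(stream_batch):
    """Join by first reading every stored record, then matching."""
    all_records = dict(STORE)  # reads the whole store
    reads = len(all_records)
    joined = [(k, v, all_records[k]) for k, v in stream_batch if k in all_records]
    return joined, reads


# A tiny "micro-batch" of (key, event) pairs; "missing" has no stored record.
stream_batch = [("user1", "click"), ("user5", "view"), ("missing", "click")]

lookup_joined, lookup_reads = join_by_lookup(stream_batch)
scan_joined, scan_reads = join_by_full_scan(stream_batch)
```

Both functions produce the same inner-join result, but the lookup version performs one read per streamed key while the scan version reads every stored record. Whether a given connector version behaves like the first or the second is exactly what the question is asking.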
The current Aerospike Spark Connector does not support streaming joins using batch reads, but we are in the process of retooling this connector and this is certainly one of the use cases that we plan to support. We will update this topic when the new version of the connector is available.
A new version of the Spark connector has been released which has the aeroJoin function. Have a look at the updated tutorial documentation. Let us know if you have any issues/questions.
The tutorial link is broken. Does this connector support Spark 2 streaming?