Aerospike connectors to ingest data from Aerospike into machine learning / statistical tools like R

Samarjit_Uppal · February 20, 2015, 11:27am

Hi,

I am not able to find any connectors to ingest data from Aerospike into R to run statistical and predictive analytics algorithms.

Is there a way to ingest data from aerospike to perform predictive analytics and other statistical stuff?

Thanks & Regards, Samar

bbulkow · March 2, 2015, 10:47pm

Hi Samar,

The ingest will depend on which toolchain you’re using.

We have a repo with some Hadoop (and thus Spark) connectors, and some basic operations built on them. This includes some Spark analytics examples on Aerospike.

This tooling will also let you easily move data from Aerospike to HDFS / HBase / Hadoop, as well as run MapReduce jobs on Aerospike data without “ingest”.

A guy named Sasha published a nice Spark RDD example for Aerospike.

We have published a Storm client integration. It provides both spouts and bolts that read from and write to Aerospike.

We have published an example real-time recommendation engine as a standalone example. https://github.com/aerospike/recommendation-engine-example

A gentleman was doing some predictive Caltrans / traffic analytics with a great tool called Dato (formerly GraphLab) and Aerospike, but I can’t find his DevWeek talk online.

We have not done integration with R clients. I’m fond of the R language for similar small-data processing (limited to in-memory quick jobs), but we haven’t done an integration. As we have a C client and a Java client, both open source, I would expect that anyone who wanted to write and publish a connector would have a reasonably easy time.
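Until a native R connector exists, one workaround is to dump a set to CSV and load it in R with read.csv. The following is a minimal sketch, not an official connector. It assumes the Aerospike Python client is installed and that its scan call returns (key, meta, bins) tuples; the helper names records_to_csv and scan_set are made up for illustration.

```python
import csv


def records_to_csv(records, out):
    """Write Aerospike-style (key, meta, bins) records to `out` as CSV.

    Returns the number of rows written. Bins missing from a record are
    left as empty cells, which R reads as NA.
    """
    rows = []
    columns = []  # column names in first-seen order
    for key, _meta, bins in records:
        # key is (namespace, set, user_key, digest); user_key can be None
        # when the record was written without storing its key.
        row = {"pk": key[2], **bins}
        for name in row:
            if name not in columns:
                columns.append(name)
        rows.append(row)
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


def scan_set(hosts, namespace, set_name):
    """Fetch every record in a set (assumes the Aerospike Python client)."""
    import aerospike  # imported lazily so records_to_csv works without it

    client = aerospike.client({"hosts": hosts}).connect()
    try:
        return client.scan(namespace, set_name).results()
    finally:
        client.close()
```

For example, records_to_csv(scan_set([("127.0.0.1", 3000)], "test", "events"), open("events.csv", "w", newline="")) would write a file that R can read with read.csv("events.csv"). The host, namespace, and set name here are placeholders. A full scan pulls everything into memory, so this only suits the small-data jobs mentioned above.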

Let me know what tooling you’re using, and perhaps I can be more specific.

Samarjit_Uppal · March 7, 2015, 1:44pm

Hello,

Thanks for your detailed reply. We will not be using Spark. Instead, we will be using Storm with Aerospike, and it’s great that you have already made connectors available for Storm. Maybe we could feed the data from Aerospike into the Storm Trident ML library. Thanks for the dato.com tip. It was a very interesting read.

I think the speed of Aerospike and the availability of streaming UDFs could make for a use case of building some analytics on our own. But for hardcore machine learning algorithms, which perhaps require offline processing, we might use the Trident library.

Please let me know your thoughts on this.

Many thanks, Samar

bbulkow · April 10, 2015, 5:56am

I strongly believe using Aerospike as the “temporary store” for streaming work is a great case. Many of those algorithms should use a shared store for temporary data instead of machine-local storage, because with machine-local storage, if a machine crashes you’ve lost a lot of state.
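To make the shared-store point concrete, here is a rough sketch of a streaming worker that keeps its running counts in the store rather than in local memory. The class names are hypothetical. SharedCounterStore only assumes a client object with increment(key, bin, offset) and get(key) calls shaped like the Aerospike Python client's, and InMemoryClient is a stand-in so the sketch runs without a cluster.

```python
class SharedCounterStore:
    """Keeps per-name running counts in a shared key-value store."""

    def __init__(self, client, namespace, set_name):
        self.client = client
        self.namespace = namespace
        self.set_name = set_name

    def _key(self, name):
        return (self.namespace, self.set_name, name)

    def add(self, name, delta):
        # The increment happens in the store, so no worker holds the only copy.
        self.client.increment(self._key(name), "count", delta)

    def read(self, name):
        _key, _meta, bins = self.client.get(self._key(name))
        return bins["count"]


class InMemoryClient:
    """Hypothetical stand-in exposing the two client calls used above."""

    def __init__(self):
        self.data = {}

    def increment(self, key, bin_name, offset):
        bins = self.data.setdefault(key, {})
        bins[bin_name] = bins.get(bin_name, 0) + offset

    def get(self, key):
        return key, {"gen": 1, "ttl": 0}, dict(self.data[key])


# Worker 1 processes some events and then "crashes".
client = InMemoryClient()
worker1 = SharedCounterStore(client, "test", "state")
worker1.add("lane-2", 3)
del worker1

# Worker 2 picks up where worker 1 left off; the count survived the crash.
worker2 = SharedCounterStore(client, "test", "state")
worker2.add("lane-2", 4)
total = worker2.read("lane-2")
```

With machine-local state, the 3 counted by the first worker would have been lost when it went away. Swapping InMemoryClient for a real connected client is the only change the store class should need, assuming the calls match.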

Which framework to use is a much harder question, and is determined by a lot of factors. The Trident library for doing exactly-once transactions has a number of pluses and minuses, and I’d suggest really testing it at your performance level. You might also want to look at Akka.
