Extending Aerospike from 1 node to a 2-node cluster in AWS (Amazon Web Services)

Hi Sir

I am new to Aerospike and was able to install it on a single node. Now I want to extend it to a 2-node cluster and need an explanation of the following points:

  1. Can you please explain how to configure clustering between 2 nodes on r3.large Amazon instances? I am looking for a step-by-step guide to the sections that need to be edited in the aerospike.conf file.

  2. In a cluster between two systems, how do I push data? Do I need to push data to each server individually, or will there be a cluster IP address?

Thank You

Hi Vinay-

The information on creating a cluster with mesh is provided in the following documentation:

http://www.aerospike.com/docs/operations/configure/network/heartbeat/#mesh-unicast-heartbeat

As you are using AWS, use mesh heartbeat.
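
As a rough illustration of what the linked page describes, the heartbeat part of the network stanza in aerospike.conf on node_a might look like the sketch below. The IP address is a hypothetical private address for node_b, and the port, interval, and timeout values are assumptions based on common defaults, so please check them against the documentation for your server version. Only the heartbeat sub-stanza is shown; the other sub-stanzas of the network stanza stay as they are.

```
network {
    heartbeat {
        mode mesh
        # port this node listens on for heartbeats
        port 3002
        # private IP and heartbeat port of the other node (hypothetical address)
        mesh-seed-address-port 10.0.0.2 3002
        interval 150
        timeout 10
    }
}
```

On node_b the stanza would be the same, except that the seed address would point at node_a's private IP.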

Once the nodes have formed a cluster, you need not worry about manually pushing data between node_a and node_b. The nodes perform that task without any intervention.

I hope this helps. Would you let us know your progress?

Thank you for your time,

-DM

Just to follow up on what I wrote previously, when you write data with the Aerospike Smart Client, the Smart Client directs the record to the nodes in the cluster. The beauty of the Smart Client (and the clustering in general) is that you do not need to manually balance records on the cluster, nor do you need to tell the client where to write records. The client efficiently writes data to the cluster. The cluster automatically moves data between nodes.
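
To make the routing idea concrete, here is a minimal Java sketch of how a client can map a key to a partition and then to the node that owns it. This is an illustrative model, not the Aerospike client code: the class and method names are hypothetical, SHA-1 is used as a stand-in because it ships with the JDK (the real client uses RIPEMD-160), and the byte layout is simplified. The one fixed fact it relies on is that a namespace is divided into 4096 partitions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative model of key-to-node routing; not the Aerospike client code. */
final class PartitionRoutingSketch {
    // Aerospike divides each namespace into 4096 partitions.
    static final int PARTITIONS = 4096;

    /** Maps a set name and user key to a partition id in [0, PARTITIONS). */
    static int partitionId(String setName, String userKey) {
        try {
            // Assumption: SHA-1 stands in for the digest the real client computes.
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update(setName.getBytes(StandardCharsets.UTF_8));
            byte[] digest = md.digest(userKey.getBytes(StandardCharsets.UTF_8));
            // Combine the first two digest bytes and reduce to the partition range.
            int firstTwoBytes = (digest[0] & 0xFF) | ((digest[1] & 0xFF) << 8);
            return firstTwoBytes % PARTITIONS;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 should be available on every JVM", e);
        }
    }

    /** Looks up the owning node in a hypothetical partition-to-node table. */
    static String ownerNode(String[] partitionOwners, String setName, String userKey) {
        return partitionOwners[partitionId(setName, userKey)];
    }
}
```

In this model the client only needs the partition-to-node table, which it learns from the cluster, so the application never chooses a node itself.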

If you were to add a third node to the cluster, the other nodes in the cluster would automatically discover the third node. The client can find out about the new third node without any manual intervention.

I hope this helps. Let me know if I can answer anything else for you.

-DM

Sorry for the late reply. I was able to follow the instructions.

I have now configured a 2-node cluster on Amazon and am using aerospike-loader to push data to the cluster.

I have a few doubts. Can you please explain the following?

  1. Can the same field in a set be defined as both the key and a secondary index?

  2. I want to expire/evict/delete all records in a particular set of a namespace without affecting the data in other sets of the same namespace.

  3. In cluster mode, do we need to list the namespaces on both machines? For example, I have a namespace called clickstream with the replication factor defined as "1" in the aerospike.conf file of one node in the cluster. Do we need to replicate the same configuration on the other node of the cluster?

  4. Is there a native Aerospike Java web service that handles the put, get, scan, and update operations defined by Aerospike?

Thanks in advance.

Hi Vinay-

I'm glad that you are doing well with Aerospike.

I have numbered my answers to match the numbers of your questions. I didn't mean to have such lengthy answers, but I wanted to cover all of the questions.

  1. Can you define a secondary index on a key bin? I have not tried this, but that is because I don't see the benefit. The index for the key is held in system memory. Secondary indexes are held in process memory. If system memory and process memory both hold the same index value, that would be a redundant use of memory.

In asking this, are you seeking to write the key value in a location where you can retrieve it later?

You might want to set 'sendKey' in the write policy. Alternatively, you could simply write the key value to a bin.

The following document discusses these options:

http://www.aerospike.com/docs/client/java/usage/best_practices.html

  2. Can you evict/expire/delete data in a particular set without impacting the rest of the records in the namespace?

When you write a record to a namespace, the namespace has a ttl (time to live) defined for all records. The configuration for it is "default-ttl". If you do not define a ttl in the client application, the default-ttl is applied to the record. When you update or touch the record, the ttl is reset.

If you set the ttl in the client, that ttl overrides the ttl set on the node. For example, if you define a ttl of 60 in your Java application, and the ttl on the node is 0 (never expire), the record expires in 60 seconds.
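
The rule above can be sketched as a small Java function. This is an illustrative model rather than Aerospike source code: the class and method names are hypothetical, and it assumes the Java client conventions that an expiration of 0 means "use the namespace default-ttl" and -1 means "never expire" (the latter may depend on your client and server versions). Other special negative values are not handled.

```java
/** Illustrative model of how a record's TTL is chosen; not Aerospike source code. */
final class TtlSketch {
    // Assumed client convention: 0 means "use the namespace default-ttl".
    static final int USE_NAMESPACE_DEFAULT = 0;
    // Assumed client convention: -1 means "never expire".
    static final int NEVER_EXPIRE = -1;

    /**
     * Returns the effective TTL in seconds; a result of 0 means the record
     * never expires (matching a namespace default-ttl of 0).
     */
    static int effectiveTtl(int clientExpiration, int namespaceDefaultTtl) {
        if (clientExpiration == NEVER_EXPIRE) {
            return 0;
        }
        if (clientExpiration == USE_NAMESPACE_DEFAULT) {
            return namespaceDefaultTtl;
        }
        // A positive client TTL overrides the namespace default.
        return clientExpiration;
    }
}
```

For the example in the text, a client ttl of 60 with a namespace default-ttl of 0 gives an effective ttl of 60 seconds.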

The node evicts records when either the high-water-disk-pct or the high-water-memory-pct is breached. The node selects records that are closest to expiration and those records are evicted from the cluster.

(http://www.aerospike.com/docs/operations/configure/namespace/retention/)

The API includes functionality to delete records. The following document discusses deletes in the Java client:

http://www.aerospike.com/docs/client/java/usage/kvs/delete.html

To (finally) answer your question: If you insert all records in a set with the same ttl (set in the client), and their ttls were never reset, they should all expire at about the same time.

If the nodes start evicting, that is an indicator of larger problems.

If you can identify the records for this set in the client, you should be able to delete them there.

  3. Do you need to mention the same list of namespaces in each node of the cluster?

Yes, absolutely. If you have two nodes, and node_1 has the namespaces test and bar, and the node_2 only has the namespace test, the nodes will not form a cluster.
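
For example, using the clickstream namespace from your question, a stanza like the following would need to appear in aerospike.conf on both nodes. This is only a sketch: the memory-size and storage-engine values are assumptions for illustration, so keep whatever your existing node already uses.

```
namespace clickstream {
    replication-factor 1
    # assumed values below; copy the real ones from your first node
    memory-size 4G
    storage-engine memory
}
```

The simplest way to keep the nodes consistent is to copy the namespace stanzas unchanged from the first node's aerospike.conf to the second.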

  4. The following documentation is for the Java client:

http://www.aerospike.com/docs/client/java/

Scans in the Java client: http://www.aerospike.com/docs/client/java/usage/scan/scan.html

Write a record in the Java client: http://www.aerospike.com/docs/client/java/usage/kvs/write.html

Read a record in the Java client: http://www.aerospike.com/docs/client/java/usage/kvs/read.html


Hi Dave

Thank you very much for your detailed reply.

I was able to follow the Java client documentation, and we created a custom-built client on our side.

Using this client, we are writing data into Aerospike, reading it from a CSV file.

The write speed I got was only 40-50 records/second.

This is a single-threaded program and it works synchronously, as it has to capture the response for every request.

Here a Java web service sends requests to the Aerospike Java client for either read or write operations on multiple sets (maximum 5 sets). Based on whether the response from Aerospike is successful or not, we need to call third-party web services (which return real-time offers).

Currently we are getting around 400 requests per second.

Can you suggest anything on this?

====================================================================

Secondly, in a 2-node Aerospike cluster, can we allocate one of the nodes for all read operations and the other node for all write operations?

We want to dedicate nodes in the cluster specifically to read and write operations.

Would it be possible for you to install the Aerospike tools?

Download and extract the tools:

Run

sudo ./asinstall

Please ensure that Python is installed prior to installing the tools.

Then run

asmonitor -e latency

and

asmonitor -e info

I have installed all the tools, but I am facing issues while running the commands.

Can you please help to resolve this?

Also, can you explain how to set default-ttl to never expire?

Also, what is the difference between default-ttl and max-ttl, and how do I reset max-ttl?

Also, can you help with the aql tool: how do I query for a particular value, select values with a limit, and delete values from a set in aql?

Your system seems to be missing a couple of dependencies.

Please install Python on your system and re-run ./asinstall.

Setting default-ttl to 0 in aerospike.conf makes every record that is added or updated without an explicit ttl never expire.

Please see:

http://www.aerospike.com/docs/operations/configure/namespace/

default-ttl is the time to live applied to a record when the client does not specify one. The record expires and is removed from the index once that time is reached.

http://www.aerospike.com/docs/reference/configuration/#default-ttl

max-ttl is the maximum ttl that a client can set on a record in a namespace.

The max-ttl can be changed both in aerospike.conf and dynamically.

http://www.aerospike.com/docs/reference/configuration/#max-ttl
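
As a sketch of how the two settings sit together in aerospike.conf, assuming the clickstream namespace mentioned earlier in this thread and a hypothetical max-ttl of 30 days expressed in seconds:

```
namespace clickstream {
    # 0 means records written without a ttl never expire
    default-ttl 0
    # largest ttl a client may set, in seconds (hypothetical value: 30 days)
    max-ttl 2592000
    # other namespace settings stay unchanged
}
```

For the dynamic change, a command along the lines of `asinfo -v "set-config:context=namespace;id=clickstream;max-ttl=2592000"` should apply on server versions that support max-ttl; please confirm against the configuration reference linked above. A dynamic change is not written back to aerospike.conf, so update the file as well if the value should survive a restart.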