Moving from Cassandra to Aerospike: is there a list of "what I wish I knew" or "gotchas" about Aerospike?

gotcha
index
secondary

#1

Hi all,

We currently use Cassandra as our storage. It holds about 74TB of data (with a replication factor of 3) and we expect it to grow much more. Because of its high maintenance cost we are looking for another option, and Aerospike seems to be the best one available. Everyone says it is very stable and easy to maintain. But like everything, Aerospike must have some problems too. So do you guys see any drawbacks compared to Cassandra? What do we have to worry about when using Aerospike? What are the common problems? Is it really stable and predictable?

Thanks for any help


#2

Thanks for your question,

Aerospike is definitely very stable and easy to maintain for all its core KV Store features (assuming it has been configured correctly). So you definitely need to understand the few basics regarding tuning, and you should be good to go :smile:

Feel free to share a bit more about your use case, though, and the features you are planning to use.

Thanks,

–meher


#3

Thanks for the reply meher. Do you have any “what I wish I knew” about Aerospike before starting to use it?


#4

Hmm… good question. We don’t have anything like this per se, but it is definitely something I will bring up, as it would make sense as part of our online documentation. I am happy to try to guide you if you share details about how you are planning to use Aerospike, and I promise to be very open about whether or not it makes sense to use Aerospike. (I am not part of our sales team… I am part of our support team, so it would be in my best interest to not push you towards anything that cannot be easily supported :wink: )


#5

We have migrated from Cassandra (CS) to Aerospike (AS) due to low latency and predictability requirements, including lower time for maintenance.

Our data set is not as large as yours, and I don’t know your read/write throughput, latency requirements, or data model, so it is difficult to comment on drawbacks. AS is one of the fastest and most predictable storage systems. Do you really need all 74TB of data to be accessible as fast as possible?

AS is designed to perform best on SSDs or in memory. We use SSDs for large data sets and in-memory storage for small ones. For our throughput we managed to cut the server count from 64 for CS to 6 for AS: roughly 10 times fewer servers, with savings on power and hardware costs! That allowed us to pay for support, which we found invaluable. We also saved on development, since AS is much more predictable. Network load is also much lower if you used read repairs with CS.

AS requires much less maintenance. We even lost one or two servers (hardware issues) without noticing. With CS such a situation is a pain. Adding a new AS node is as easy as start and forget. Replacing a dead one is the same as adding a new one.

CS sstables have a huge overhead and re-balancing is mostly manual, while AS has no such overhead and balances the cluster automatically - you don’t even have to monitor it. On latency, CS can’t compete with AS because of GC pauses, compaction, etc. Even on the same hardware as AS, Cassandra will just die under all that overhead.

The one AS drawback on SSDs is slow node startup after a reboot - it needs to rebuild the indexes by scanning the SSDs. The time depends on the amount of data on the node. It may take a long time, possibly over 1 hour. But it may happen once a year and you will not care much, since the cluster still performs well.

To get the most from AS you need to learn its data model. Once you learn it and design your model properly, you can easily calculate your cluster capacity. You can’t do the same with Cassandra. The drawback of AS, as with Cassandra, is that you need to know its limits, such as 32k unique bin names and the limit on bin name length - CS has no such limits for dynamic columns. The size of a record (a CS row) is also limited when not using large data types, but reading a record takes exactly 1 read I/O, while with CS you can’t predict the count and a single CS I/O is much larger - a huge overhead for throughput.
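
To illustrate the kind of capacity calculation meant here, below is a minimal sketch in Python that estimates memory for the primary index only. It assumes the commonly documented figure of 64 bytes per primary index entry. The function name and the record count are hypothetical and not taken from this thread, so check the figures against the sizing documentation for your version.

```python
# Assumption: each primary index entry takes 64 bytes of memory.
PRIMARY_INDEX_ENTRY_BYTES = 64


def primary_index_memory_gib(record_count, replication_factor):
    """Estimate cluster-wide primary index memory in GiB.

    Every replica of a record needs its own index entry, so the
    record count is multiplied by the replication factor.
    """
    total_bytes = record_count * replication_factor * PRIMARY_INDEX_ENTRY_BYTES
    return total_bytes / 2**30


# Hypothetical example: 1 billion records with RF=2.
estimate = primary_index_memory_gib(1_000_000_000, 2)
print(f"{estimate:.1f} GiB of primary index memory across the cluster")
```

Dividing the result by the number of nodes gives a per-node figure, to which you would still add headroom for node failures and for any secondary indexes.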

We do not use cross-DC replication yet, so no comments on that. But I’m sure AS will also win over CS repairs in case of failures, which are the biggest pain in CS maintenance.

With AS you concentrate on your business, not on how to survive with CS.


#6

Thanks a lot Meher, I’m starting to plan how we are going to use AS and I’ll share it with you guys.


#7

That’s great AdformViktor! That’s what I wanted to hear :D! Our 74TB in Cassandra is stored with 3 replicas; since most people use 2 for AS, it would be about 50TB. But half of our data is written and then read just once. We keep it in case we need to reprocess it, so we need to think about that too. We are considering keeping it in Cassandra… we don’t know yet. Do you think it is a good idea to move the data we read just once to AS too?
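
As a rough check of that estimate, here is a minimal sketch in Python. The function name is hypothetical, and it assumes the 74TB figure counts all 3 replicas and that nothing else (storage overhead, compression) changes between the two systems.

```python
def resized_footprint_tb(total_tb, current_rf, target_rf):
    """Scale a replicated data set to a different replication factor."""
    # Size of one copy of the data, without replicas.
    unique_tb = total_tb / current_rf
    # Total size once stored with the new replication factor.
    return unique_tb * target_rf


# 74TB at RF=3 moved to RF=2.
print(round(resized_footprint_tb(74, 3, 2), 1))  # 49.3
```

So "about 50TB" is the right order of magnitude, before accounting for the data that is read only once.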


#8

We use AS with RF=2 and use rack awareness.

AS is designed with high throughput and lowest latency in mind. Every object stored in AS can be accessed at the same speed regardless of when it was written. CS can’t guarantee this.

Since SSDs cost more than HDDs, we store in AS only the data that needs the access characteristics above. In some cases we use scans to read all data for export/processing, but scanning is not a primary use.

If we need some raw data for re-processing, we store it as log files in S3-type storage and can re-process it with whatever tool - such storage is cheaper and designed just for that. For example, http://ceph.com/

Compared to S3: AS “overwrites” the current data, CS absorbs updates through compactions, and both have TTL, while S3 keeps everything, so it is up to you to remove outdated files from S3.


#9

Thanks AdformViktor!

One clarification:

The Aerospike Enterprise Edition has a feature called Fast Restart which will actually persist the index in shared memory and avoid having to scan the SSD upon restart. You can read details about it on this page.

Thanks, –meher


#10

By reboot I mean a machine/OS reboot. Thanks for the clarification about the Enterprise Edition restart - that helps when reconfiguring AS, but does not help after an OS reboot.


#11

Yes, you are 100% correct. And one more thing: if you have secondary indexes defined, those also do not fast start and have to be rebuilt from persistent storage.


#12

Thanks AdformViktor, we’ll definitely think about a solution like that too. Meher, about secondary indexes: is there a limit on how many we should have? Besides worsening startup time, what is the impact of having many secondary indexes? Do they have to fit in memory?


#13

There is a limit of 256 secondary indexes per namespace.

Each secondary index will for sure use up memory, so yes, it has to fit in memory. We may at some point decide to have an option to be able to have those on disk, but that’s not in the works at this point.

For more details regarding memory usage for secondary indexes, check this page. It is not straightforward to understand, so feel free to provide details here if you need help with sizing a secondary index.

I also recommend reading this page about secondary index architecture in Aerospike.


#14

Thanks very much Meher!