LDT Limits and use-case


#1

Hi!

I’m currently making a final decision on which DB would serve my application best, and I’ve been looking carefully into Aerospike, though I have yet to use it. I’ve worked with Redis before, but unfortunately my application would become very expensive being completely in-memory. I already have SSD-powered servers (3 x 300GB, software RAID1, but I’ll undo any form of RAID if I decide to go with Aerospike).

The application itself is similar in a manner to Pastebin, but different as well. This means I’m planning on storing data items that can each individually range between 1B and 10MB (I plan on setting a hard limit, but am unsure exactly how high at the moment). I will be using the Node.js client, and I’m wondering whether LDTs support a single bin value over the configured limit (128KB, as I’ll be using SSDs)? I could chunk the data using a UDF or at the application level and write multiple values, but I don’t quite understand how LDTs work.

The way I’ve understood it, I should create an LLIST for each individual “paste” and not one big LLIST for all of them, since the bin values will differ so much in size - presuming each bin value can exceed the write-block limit which, if I remember correctly, it can’t. Does this mean I should chunk the data into X/Y pieces (rounded up), where X = data size and Y = write-block limit, write each chunk as an individual bin value, and then reassemble the chunks when reading?
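
To make the arithmetic concrete, this is the kind of split I have in mind (a minimal sketch; the 128KB write-block limit for SSDs is my assumption):

```js
// Hypothetical chunk count: X = data size, Y = write-block limit.
const Y = 128 * 1024            // assumed write-block limit on SSD
const X = 10 * 1024 * 1024      // example: a 10MB paste
const chunks = Math.ceil(X / Y) // bin values to write, reassembled on read
console.log(chunks)             // => 80
```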

Hopefully I managed to clarify the application structure well enough!

Thanks in advance!

-Tom


#2

I am not sure, but couldn’t you use a CDN / traditional filesystem approach to store the payload and use Aerospike just for the metadata? I don’t think streaming 10 megs out of AS per GET is a good idea in the first place. If you want to use AS for storing the data, you can increase the record size limit (though it’s not fully tested) and use a simple BLOB bin. Using an LDT would be the last option I would consider, as it involves a lot of overhead (at least 220 bytes per LLIST, plus more for every subrecord created). Oh, and yes, you would rather create many LDTs than use a single one, as that would be a first-class bottleneck. Hope this helps you with designing your application.
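
For illustration, the record-size limit follows the namespace write-block-size, so raising it is a config change along these lines (a hedged sketch - namespace name, device path and sizes are placeholders):

```
namespace pastes {
  memory-size 4G

  storage-engine device {
    device /dev/sdb       # placeholder device
    write-block-size 1M   # max record size follows this; large values are the not-fully-tested part
    data-in-memory false
  }
}
```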

Cheers, Manuel


#3

Folks,

Let me try to sketch the setup:

user-id -> list of snippets

The question is whether you should keep a single big list of snippets or have a bin per snippet.

A few facts you should know:

  1. Having more bins carries overhead for storing info about each one, so it is advisable to work with a smaller number of bins.
  2. LLIST is a list of items, each of which can have a maximum size of the write block size (max 1MB).
  3. LDT provides the ability to store an unbounded list. You can store as many snippets as you like, but they are slower than normal get/put and have some disk-space and memory overhead (64 bytes per subrecord). Subrecord size is the configured ldt-pagesize and can be up to the write block size (1MB).
  4. LLIST does not have a built-in expiry mechanism. The user has to run a background scan UDF to achieve it.
  5. Records in Aerospike have a maximum size of up to the write block size (1MB).

If I understand your application structure, then based on the above facts I would suggest having a single LLIST bin with the data laid out like:

user-id -> snippet-list ( name-chunkid:blob, name-chunkid:blob, name-chunkid:blob… )

You need to chunk each snippet into units of 1MB minus delta (some space for other stuff) and store it as above. Storing it like this lets you query the system by snippet name, for pieces of a snippet or for it in its entirety.
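
To make the layout concrete, here is a hedged sketch of how the application side could build those name-chunkid entries (names and the headroom value are illustrative; the LDT client calls themselves are omitted):

```js
// Build snippet-list entries: each entry is keyed "<name>-<chunkid>" and
// carries a blob that stays below write-block-size minus some delta.
const WRITE_BLOCK_SIZE = 1024 * 1024 // 1MB (assumed config)
const DELTA = 16 * 1024              // headroom for record/LDT overhead (assumption)
const UNIT = WRITE_BLOCK_SIZE - DELTA

function toListEntries (name, data) { // data is a Buffer
  const entries = []
  for (let i = 0; i * UNIT < data.length; i++) {
    entries.push({
      key: name + '-' + i,                        // name-chunkid
      value: data.slice(i * UNIT, (i + 1) * UNIT) // blob chunk
    })
  }
  return entries
}
```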

For expiry you need to run a scan UDF to walk through all the LLISTs and throw away the entries you no longer need. If the expiry time for snippets does not change, you can organize them under multiple LLISTs, one for each unit of time. Expiry can then throw away an entire bin to expire big chunks without walking through an LLIST:

user-id -> LLIST_BIN_DAY1, LLIST_BIN_DAY2, LLIST_BIN_DAY3

In your use case LLIST cannot store a single snippet as a single item in the list, so your application is doing the hard work of chunking it. Given that, and depending on how much performance you need, you may want to consider the other option of moving this entire scheme up a level and storing the chunks in the key-value store itself. When storing a snippet, store each chunk as its own record with a name-chunkid key, each chunk being up to 1MB. On write you perform multiple record writes, and on read you perform a batch read for name-chunkid {1…10} (always, if you have a max size of 10MB). In this model, if the expiry time of a snippet does not change, expiry can be automatic too, with certain constraints.
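
A hedged sketch of that model with the Node.js client (promise-style API of recent client versions; namespace, set and bin names are placeholders):

```js
const Aerospike = require('aerospike')

const CHUNK = 128 * 1024 // assumed write-block-size

async function putSnippet (client, name, data) {
  const nChunks = Math.ceil(data.length / CHUNK)
  // One record per chunk, keyed "<name>-<chunkid>".
  for (let i = 0; i < nChunks; i++) {
    const key = new Aerospike.Key('test', 'chunks', name + '-' + i)
    await client.put(key, { data: data.slice(i * CHUNK, (i + 1) * CHUNK) })
  }
  // A metadata record holds the chunk count (plus whatever else is needed).
  await client.put(new Aerospike.Key('test', 'snippets', name),
                   { chunks: nChunks, size: data.length })
}

async function getSnippet (client, name) {
  const meta = await client.get(new Aerospike.Key('test', 'snippets', name))
  const reads = []
  for (let i = 0; i < meta.bins.chunks; i++) {
    reads.push({ key: new Aerospike.Key('test', 'chunks', name + '-' + i),
                 read_all_bins: true })
  }
  // Batch-read all chunks in one round; results are assumed to come back
  // in request order (otherwise sort by chunk id before concatenating).
  const results = await client.batchRead(reads)
  return Buffer.concat(results.map(r => r.record.bins.data))
}
```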

I would suggest you use LDT only if you are really worried about the same snippet being updated by multiple threads and protecting against that is part of your application’s problem, e.g. multiple users updating a snippet which should stay visible to all of them in a consistent manner.

HTH

– R


#4

Hi Raj!

Thank you for clearing things up for me! Much appreciated!

Storing the data in the k/v store is probably the better option, but as I will be using SSDs the write-block-size will probably be lower than 1MB (128KB; according to the docs, a lower write-block-size improves performance a notch when using SSDs).

I don’t have any issue with the write-block-size being 128KB, as I’ll rarely see a single snippet larger than 2MB, though those may occur. The reason I’m looking into Aerospike is that I need a performant store. I could use the filesystem directly, but it’d be a hassle to scale and to ensure redundancy for.


Here’s a use case:

The application receives 2MB of data and splits it into 128KB chunks (2,000,000 bytes / 128,000 bytes = 15.625, rounded up to 16 chunks). The application then generates a key for each chunk and continues with a batch PUT of the 16 chunks along with the snippet metadata (which contains the chunk keys and whatnot). If each request reports an OK status, report back to the user that the snippet is saved; if there’s an error, perform cleanup and report the error back to the user.

When a user wants to get the snippet, the application performs a GET on the snippet metadata, collects the chunk keys and then runs a batch GET on the chunks. When every chunk is loaded, the application assembles the chunks and pipes the result to the user.

When a user updates the snippet, the create step is reproduced.


Does this sound like a good model? The chunks don’t necessarily have to be assembled server-side; they can be piped to the user as a Transfer-Encoding: chunked HTTP response and assembled by the browser. That way I don’t have to wait for the entire batch to complete: as soon as I get the first chunk I can start piping.
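
A hedged sketch of that streaming read path (plain Node http; routing, namespace and bin names are illustrative) - Node sends Transfer-Encoding: chunked automatically when no Content-Length is set:

```js
const http = require('http')
const Aerospike = require('aerospike')

function serve (client) {
  http.createServer(async (req, res) => {
    const name = req.url.slice(1) // e.g. GET /my-snippet (illustrative routing)
    const meta = await client.get(new Aerospike.Key('test', 'snippets', name))
    res.writeHead(200, { 'Content-Type': 'text/plain' })
    // Sequential GETs instead of one batch: each chunk is piped out as soon
    // as it arrives, so the browser can start rendering immediately.
    for (let i = 0; i < meta.bins.chunks; i++) {
      const rec = await client.get(new Aerospike.Key('test', 'chunks', name + '-' + i))
      res.write(rec.bins.data)
    }
    res.end()
  }).listen(8080)
}
```

Note the design trade-off: a batch GET only completes as a whole, so per-chunk GETs (or a few overlapping ones) fit the start-piping-early goal better.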

Now I have a question regarding this:

Does this model have any noticeable performance impact compared to using the normal filesystem? Is it a faster approach? Since Node.js isn’t the fastest at working with the filesystem (no sendfile(2) syscall support), I figured it’s probably better to pipe the data through a performant store.

Thanks again for your inputs!


#5

Yes !!

We do not have a comparison of Aerospike with the filesystem. I think the decision should primarily be based on what kind of interface you want to work with in your application. In all likelihood, disk I/O will be your bottleneck anyway.

– R


#6

I understand. I’m planning to use both SSD and RAM, where the most popular snippets will reside cached in-memory for faster loading (as they will be visible on the front page).

I’ve got a couple of very powerful servers to back this up:

3 of these:

  • CPU: E3-1245v2 @ 3.4GHz
  • RAM: 32GB DDR3 ECC
  • Disks: 3 x 300GB Intel 320-series SSDs (non-RAID)

Is there anything about these specs that doesn’t work well with Aerospike, that you know of?

Thanks!


#7

Looks good!!! But it really depends on what your requirements are :smile:

Just in case you did not know: Aerospike does not cache data in memory if configured with data-in-memory set to false, so basically all reads are performed from the disk. The only case where data gets cached is when you read data you recently wrote. The amount of hits you get depends on the size of the post-write queue you have configured and on your access pattern!! Check out http://www.aerospike.com/docs/reference/configuration/#post-write-queue
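
For reference, the post-write queue is sized per device in the namespace’s storage config; a hedged example (namespace name, device and values are placeholders):

```
namespace pastes {
  storage-engine device {
    device /dev/sdb
    write-block-size 128K
    post-write-queue 512   # write blocks kept in memory per device (default 256)
  }
}
```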

– R


#8

Essentially I was looking into having another namespace configured for memory storage to serve as the cache. Is that a bad model?


#9

Aerospike is not an LRU data store!!! What that means is that expirations are based on when the data is written, not on when it is read.

If you want an LRU cache, you need to do a touch every time you read, so that the record is not a candidate for eviction or expiry from the in-memory cache.

The other option could be an LRU cache like memcached!! I have not really used the two together, so I cannot tell you how much of an impedance mismatch there will be.
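
A hedged sketch of that touch-on-read pattern with the Node.js client (assuming its operate/touch operations; the bin name and TTL are illustrative):

```js
const Aerospike = require('aerospike')
const op = Aerospike.operations

// Reset the record's TTL on every read, so hot records keep surviving in
// the cache namespace while cold ones expire - an LRU approximation.
async function cachedGet (client, key, ttlSeconds) {
  const rec = await client.operate(key, [
    op.touch(ttlSeconds), // push expiry forward
    op.read('data')       // return the cached bin in the same transaction
  ])
  return rec.bins.data
}
```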

– R


#10

As you can get very high throughput at low latency from Aerospike with SSDs, you should consider getting rid of the caching pattern and focusing on a single SSD-backed namespace. Combining a database with a cache is a relic of RDBMS on HDD. It aimed to solve the significant latency difference between an app and its database, and to patch the warts of an RDBMS, such as unpredictable, high latency and a limited ability to handle concurrent connections.

The cache is volatile, and it adds a complicated pattern to your data access on the application side. Without it you’re looking at less code, faster execution and fewer bugs.


#11

Thanks a bunch for your input! It’s much appreciated, since I’ve never actually used Aerospike before. I’m looking into it and have tinkered with it a bit, and it looks very, very promising so far!

I’ll get back to you when I get something together to show you :smile:


#12

@Svenskunganka, @ManuelSchmidt:

Thank you for posting about LDTs in our forum. Please see the LDT Feature Guide for current LDT recommendations and best practices.


#13

@Svenskunganka, @ManuelSchmidt:

Effective immediately, we will no longer actively support the LDT feature and will eventually remove the API. The exact deprecation and removal timeline will depend on customer and community requirements. Instead of LDTs, we advise that you use our newer List and SortedMap APIs, which are now available in all Aerospike-supported clients at the General Availability level. Read our blog post for details.