Any Java API to get only the 160-bit digests?


#1

I have a set that will contain 15 Million to 20 Million records. My keys are also stored in a bin.

I have to get all the keys at once in my application. So, I fetch the bin with all the keys stored in it. Then fetch the records in a batched manner using sets of the keys returned.

The problem is that my keys are huge - of the order of 50-64 bytes each. Fetching all of them at once causes my application to crash.

Is there any API to get all the 160-bit digests instead of the actual keys (the bin, in my case)?


#2

You can compute the digest if you have a record's key and set name, but that seems like going in circles. What are you trying to do? There may be a better way to accomplish it in your data model.


#3

My use case is a distributed SOAP/REST web service that returns the records in a batched manner(first, next, next…)

The steps are:

  1. The application where the ‘First’ webservice call lands, fetches a list of all the keys from aerospike.
  2. It responds back with the 1st batch of records and pushes the rest of list of keys to a kafka topic(with single kafka group ID) to distribute it.
  3. The application where the ‘next’ webservice request lands retrieves the next batch(say 1000) of keys from kafka and queries aerospike to fetch the records. It then responds back with the list of records.
  4. and so on…
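The batching in steps 2 and 3 can be sketched roughly as below. This is only an illustration: `KeyBatcher` and `partition` are hypothetical names, the batch size of 1000 is the example figure from step 3, and the Kafka publishing itself is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a full key list into fixed-size batches,
// e.g. one batch per 'next' web service call.
final class KeyBatcher {

    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            keys.add("user-" + i); // placeholder user IDs
        }
        List<List<String>> batches = partition(keys, 1000);
        System.out.println("batches: " + batches.size());
    }
}
```

The first batch would be returned to the caller directly and the remaining batches pushed to the topic.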

We store the user-keys in a bin too. Each key is of the order of 50-64 bytes. With 15 Mil keys, fetching all the keys at once causes a lot of ‘stress’ on my application. The application also handles real-time traffic at the same time.

I was wondering if I could fetch only the digests(that are 20 bytes each) from aerospike instead of the keys(that are 64 bytes each). That would reduce the memory footprint of the batched webservice in my application.
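To put rough numbers on that, here is a back-of-the-envelope sketch. It counts raw payload bytes only (no JVM object headers, collection overhead, or network framing), and `DigestFootprint` is just an illustrative name.

```java
// Rough payload arithmetic: 15 million entries at 64 bytes vs 20 bytes each.
final class DigestFootprint {
    static final int DIGEST_BYTES = 20;   // a 160-bit digest
    static final int USER_KEY_BYTES = 64; // upper bound quoted above

    static long totalBytes(long recordCount, int bytesPerEntry) {
        return recordCount * bytesPerEntry;
    }

    static double savingsPercent(int originalBytes, int reducedBytes) {
        return 100.0 * (originalBytes - reducedBytes) / originalBytes;
    }

    public static void main(String[] args) {
        long records = 15_000_000L;
        System.out.println("keys:    " + totalBytes(records, USER_KEY_BYTES) + " bytes");
        System.out.println("digests: " + totalBytes(records, DIGEST_BYTES) + " bytes");
        System.out.println("saving:  " + savingsPercent(USER_KEY_BYTES, DIGEST_BYTES) + " %");
    }
}
```

That works out to about 960 MB of key bytes versus about 300 MB of digest bytes for 15 million records.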


#4

So you are storing all the keys of the 15 million records in a set of lookup records? I ask because 15 million 64-byte keys cannot fit in one bin of a single record. Would you clarify where and how you are storing the 64-byte keys of these 15 million records? Is it a list bin, spread over 1000 records or so? Do these 15 million records expire or live forever? That is, how do you maintain the bin for deletions or expirations, if applicable?

Instead of storing the 64-byte key, you can store the 20-byte digest and then fetch the records using the digest. I am assuming all the records are in the same namespace. That will cut your storage and network transfer for the keys to roughly a third (20 bytes instead of 64).

Is it possible for you to use a monotonically increasing key, like 1, 2, 3, … or K:1, K:2, K:3, … so you can algorithmically know all your keys?


#5

Sorry for not being clear in my earlier post. Let me try again: We have a set ‘UserSessions’ which stores session information of users. A user’s ID is a string(about 64 bytes each) and is the PK of this set. We have sendKey = false, so aerospike doesn’t store the keys. But we store the user Ids in a bin named ‘UserID’.

For the distributed ‘fetchAllSessions’ API described in my previous post, I need to fetch all the keys at once. Right now, my application fetches the complete UserID bin from the set and pushes these User IDs on to Kafka. The application that receives the next fetchAllSessions request then retrieves a batch of User IDs from kafka and then translates the user IDs to Aerospike ‘Key’ object. It then fetches the records for those keys from aerospike using the batch API.

My keys are user IDs as I described above, so, I don’t think they can be represented by monotonically increasing numbers.


#6

A set is just a logical grouping of records in a namespace; it is just metadata on a record. So a set does not have a PK. A record has a PK. Sets don’t contain bins. A record contains one or more bins.

So, set UserSessions will contain records, those records will have their own primary key.

So on one hand you have 15 million records with their 15 million PKs that were used to create those records. Then you have the UserSessions set which has some records? 1000s? which have bins containing those 15 million PKs for retrieving the original 15 million records?

I am still not clear on your data model.


#7

Here is my set header as listed by aql for select * from ns.UserSessions:

+----------+--------------------+---------------------+
| userId   | Typ1Sessions       | Typ2Sessions        |
+----------+--------------------+---------------------+

For each record, ‘userId’ is the bin that contains user IDs. Currently, my application fetches this bin from all the records. I would want to retrieve the digests for each record instead.


#8

I think I am unable to explain the concept of primary key of a record succinctly through these messages. Are you using Aerospike Enterprise Edition? If so, you are entitled to support and they can help. If you are using Community Edition, I suggest training courses offered by Aerospike may be worth an investment. www.aerospike.com/schedule


#9

I don’t know what else I can do to explain my requirement. But thanks anyway!


#10

I assume you are using a scan to fetch all keys. If so, set ScanPolicy.includeBinData to false. Since you created the records with Policy.sendKey == false, the user key will not be returned, nor will any bins. Only the digest will be returned.
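Once only digests come back, how they are held in memory also matters. Below is a minimal sketch of one way to do it, not anything from the Aerospike client itself: `DigestBuffer` is a hypothetical class that packs 20-byte digests into one flat array. The assumption is that its `add` method would be called from the scan callback with the `digest` field of each returned `Key`. `add` is synchronized on the assumption that callbacks can arrive from more than one thread.

```java
import java.util.Arrays;

// Hypothetical holder: stores fixed-width 20-byte digests back to back in a
// single byte[] instead of one small byte[] object per record.
final class DigestBuffer {
    static final int DIGEST_BYTES = 20;

    private byte[] data;
    private int count;

    DigestBuffer(int expectedDigests) {
        data = new byte[Math.max(1, expectedDigests) * DIGEST_BYTES];
    }

    // Assumed to be called once per record from the scan callback.
    synchronized void add(byte[] digest) {
        if (digest.length != DIGEST_BYTES) {
            throw new IllegalArgumentException("expected a 20-byte digest");
        }
        int offset = count * DIGEST_BYTES;
        if (offset + DIGEST_BYTES > data.length) {
            // Simple doubling growth; fine for a sketch at this scale.
            data = Arrays.copyOf(data, data.length * 2);
        }
        System.arraycopy(digest, 0, data, offset, DIGEST_BYTES);
        count++;
    }

    synchronized int size() {
        return count;
    }

    // Returns a copy of the digest stored at the given position.
    synchronized byte[] get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        int offset = index * DIGEST_BYTES;
        return Arrays.copyOfRange(data, offset, offset + DIGEST_BYTES);
    }
}
```

Sizing the buffer up front (for example, for 15 million entries) avoids the regrowth copies. Each digest read back with `get` can then be turned into a key for the batch read.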


#11

I came across QueryPolicy.includeBinData yesterday and am testing it currently. You just confirmed my findings. Thanks a lot.

