Hi!
I’m maintaining users in Aerospike and I want to have unique validation on some bins like email, username etc. How can I achieve this in Aerospike?
One solution which I can think of is
create the record
check if other records exist with same bin value. If exists, and our created record is younger than the others, we delete the new record.
For me to do this, UDF should support querying which it doesn’t. I can do it on the client side also but I would prefer not to because of the network latency.
I assume you mean that you want something equivalent to an SQL UNIQUE index on a bin such as ‘email’, ‘username’
Aerospike is a primary key database. Your key for the users table (set) should be the username. That would ensure its uniqueness. You can also write a new user with RecordExistsPolicy of CREATE_ONLY to make it act as a SQL INSERT, rather than the default upsert behavior. This way if you already have a user with username ‘foo’ and someone tries to create a new user ‘foo’, you’ll get an error, rather than the second ‘foo’ overwriting the first one. In the Python client you’d set the exists write policy with aerospike.POLICY_EXISTS_CREATE instead of the default aerospike.POLICY_EXISTS_IGNORE.
If you need both username (the key) and email to uniquely refer to the same record, you should have a ‘reverse lookup table’ - a set where the key is the email and the value is the username. You could use exists() to check for uniqueness and get() to find whose email that is.
Don’t use a UDF for what you can figure out quickly on the application side. The native operations are very fast, scale well, and are there for you to use before you ever need to write your own function. Get familiar with your client’s API.
Hi, Thank you so much for your response. I need some clarifications here. I’m assuming we can do parallel writes in my problems. Correct me if I’m wrong.
How can I handle parallel writes in my case with your solution? (Multiple record writes happening at the same instance, whose email (not a primary key) is same)
Even in the solution, you suggested, to handle parallel writes we need multiple network calls (I would prefer if all of the below steps can be done in single network call which is possible if querying is possible in UDF). Below is the solution:
In this solution, I’m assuming CREATE_ONLY policy works perfectly even for parallel writes somehow.
Write the record
Write the lookup record with email as PKEY and CREATE_ONLY policy. If this returns an error, then delete the created record.
Note: For a better secure system, we must lock the record if possible for both reads and writes until it’s validity is verified.
Another question:
Does CREATE_ONLY policy lock the memory address? If not how will it handle parallel insertions with the same primary key?
If by ‘parallel writes’ you mean a multi-record transaction, you can’t do that. It’s worth reading the section of the docs titled ‘Architecture - How it Works’, from the Architecture Overview onward.
You can check if a record with that email exists in the reverse lookup (useremails). First, attempt to create an entry in the reverse lookup (‘usersemails’) for the user record that doesn’t yet exist with a CREATE_ONLY RecordExistsPolicy. Think of it as a form of cooperative locking. If you’re able to create a record in the reverse lookup set with key email and value username this means you can now proceed to create a record in the ‘users’ set with key username. No other process in your application should be allowed to do so, by application logic, since they don’t have the agreed upon lock. If you follow this approach, uniqueness will be maintained, even without a multi-record transaction, since you shouldn’t create a record in ‘users’ if you can’t create the corresponding record in ‘usersemails’ first. The first client to it, gets to ensure a unique pair of records.
Parallel insertions happen across different keys, never of the same specific key. Every modification to a record locks it first, so while Aerospike is multi-node, multi-core, multi-threaded, and does lots of reads and writes in parallel, a record with a specific key cannot be modified at the same time by different clients.
If you’re using Aerospike Enterprise Edition, the reverse lookup and users sets should live in a namespace that is configured for strong consistency.
Let’s deal with the system you have, not a fictional one. UDFs cannot act on multiple records. Even in an RDBMS, you’d do this in a multi record transcation, using native SQL, not using a UDF (Pl/SQL or whatnot).