What should I know before going with UDF?

dmartyanov · February 16, 2017, 1:58am

Hi, I am going to use UDFs to perform some atomic read/write operations, because it looks more robust to do this in the data layer than in the application layer using optimistic locking with CAS. What should I know before accepting this design? Are there any limitations on using UDFs, in terms of memory use or anything else? I have seen a topic, "How my UDF crashed Aerospike", in which a UDF crashed an entire cluster in Google Cloud. Could you please share any real-world experience or recommendations on using UDFs with respect to reliability and performance?

rbotzer · February 16, 2017, 3:07am

Did you consider using the operate command to execute multiple operations on the same record? From what you’re describing that’s what you’re trying to do. A record UDF gets a lock and all operations described in the function occur on that same record. Similarly, a ‘multi op’ gets a lock on the record, and executes multiple operations in sequence on it, with the entire set of operations rolling back on failure. It’s similar to a transaction in an RDBMS, just on a single record.

I’m using the Python client’s method as an example, but the same method exists in the other clients as well.

The operate command would actually be faster and more scalable than the equivalent record UDF.
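To make the multi-op semantics above concrete, here is a toy in-memory model in Python. It is not the Aerospike client API: `Record`, `operate`, and the `(name, bin, value)` op tuples are hypothetical names made up for this sketch. It only illustrates the behavior described above: one lock per record, operations applied in order, and a rollback of the whole set if any operation fails.

```python
import copy
import threading


class Record:
    """Toy stand-in for a single record: named bins guarded by one lock."""

    def __init__(self, bins):
        self.bins = dict(bins)
        self.lock = threading.Lock()


def operate(record, ops):
    """Apply ops in order under the record lock; undo all of them if one fails.

    Each op is a hypothetical (name, bin_name, value) tuple.
    Returns a dict with the values collected by "read" ops.
    """
    with record.lock:
        snapshot = copy.deepcopy(record.bins)  # kept for rollback
        results = {}
        try:
            for name, bin_name, value in ops:
                if name == "add":
                    record.bins[bin_name] = record.bins.get(bin_name, 0) + value
                elif name == "append":
                    record.bins[bin_name] = record.bins.get(bin_name, "") + value
                elif name == "read":
                    results[bin_name] = record.bins[bin_name]
                else:
                    raise ValueError(f"unknown op: {name}")
        except Exception:
            record.bins = snapshot  # restore the state from before the first op
            raise
        return results
```

Note that the op list is fixed before the call: nothing in it can branch on what the record currently contains.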

pgupta · February 16, 2017, 3:52am

Operate() allows you to do a list of operations on a record (modifications) and finally read it back to the client in its final form in one trip from the client to the server.

However, you cannot do “if then else” type of logic on the record operations based on the data in the record in Operate(). If you want to read, then based on what is read, apply some logic and take action 1 or action 2 or action 3 on the record, then you can either use CAS or UDF.
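The CAS route can be sketched as a read, decide, conditional-write loop. The Python below is an in-memory simulation, not client code: `Store`, `put_if_generation`, `GenerationMismatch`, and `withdraw` are hypothetical names, and the generation counter stands in for the record generation that a real client would check through a generation policy on the write.

```python
class GenerationMismatch(Exception):
    """Raised when the record changed between our read and our write."""


class Store:
    """Toy key-value store whose records carry a generation counter."""

    def __init__(self):
        self._data = {}  # key -> (bins, generation)

    def get(self, key):
        bins, gen = self._data.get(key, ({}, 0))
        return dict(bins), gen

    def put_if_generation(self, key, bins, expected_gen):
        _, current = self._data.get(key, ({}, 0))
        if current != expected_gen:
            raise GenerationMismatch(key)
        self._data[key] = (dict(bins), current + 1)


def withdraw(store, key, amount, max_retries=5):
    """Branch on the data that was read, then write only if nobody else wrote."""
    for _ in range(max_retries):
        bins, gen = store.get(key)
        balance = bins.get("balance", 0)
        approved = balance >= amount
        if approved:
            bins["balance"] = balance - amount               # action 1
        else:
            bins["declined"] = bins.get("declined", 0) + 1   # action 2
        try:
            store.put_if_generation(key, bins, gen)
            return approved
        except GenerationMismatch:
            continue  # someone else wrote first: re-read and decide again
    raise RuntimeError("gave up after repeated generation mismatches")
```

Every retry costs another round trip between client and server, which is the trade-off against running the same branch inside a record UDF.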

With Operate() you get a read of the record in the same lock because Operate() has a return value in which you can return the “record”.

dmartyanov · February 16, 2017, 4:00am

Hi, thanks for the reply. I haven’t considered Aerospike operations because they offer only some primitive functions like add/append/etc. I am going to use CDTs and perform atomic read-modify-write operations. A UDF fits my requirements much better, but it is not the only way. That’s why I need some feedback about its reliability.

pgupta · February 16, 2017, 4:09am

Also, the crash link you mention is from July 2015, and it concerns a stream UDF used for aggregation; what you are trying to use is a record UDF.

Stream UDFs operate on a set of records in read-only mode and can be used to extract and aggregate data from the records (sum, average, etc.). In my recent testing of stream UDFs, I have not seen a server crash or anything like that. It is hard to comment on what went wrong in that test.

Record UDFs operate on a single record and can modify, delete, update that record.
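The difference in shape between the two kinds can be shown with a small sketch. Real Aerospike UDFs are written in Lua and run on the server; the Python below only mirrors the two patterns on plain dicts, and `stream_average` and `record_cap_score` are hypothetical names chosen for the illustration.

```python
from functools import reduce


def stream_average(records, bin_name):
    """Stream style: a read-only pass that maps each record to (value, 1)
    and reduces the pairs to a total and a count."""
    pairs = ((rec[bin_name], 1) for rec in records if bin_name in rec)
    total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), pairs, (0, 0))
    return total / count if count else None


def record_cap_score(record, limit):
    """Record style: inspect one record, branch on its data, modify it in place."""
    if record.get("score", 0) > limit:
        record["score"] = limit
        record["capped"] = True
    return record.get("score", 0)
```

The first function never writes to its inputs; the second touches exactly one record, which is the case the original question is about.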

dmartyanov · February 16, 2017, 4:27am

What if my UDF takes a long time to execute? Could it theoretically cause problems with thread pool utilization or something like that? Roughly speaking, how far can a problem in a record UDF propagate? Are UDFs isolated within a namespace? I am curious because the ability to execute arbitrary code within your cluster might be quite dangerous.

pgupta · February 16, 2017, 5:46am

For a record UDF operating on a record, the lock is on that record only, so it is isolated to the node, the namespace, and the partition that the record is in. There are 4K (4096) partitions.
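A rough way to picture that scope is the sketch below. It makes two loud assumptions: SHA-1 is used only as a stand-in for the server's own key digest, and `LockTable` is a hypothetical per-record lock registry, not how the server stores its locks. It models the lock scope described above and nothing else; in particular it says nothing about server thread pools.

```python
import hashlib
import threading

N_PARTITIONS = 4096  # the "4K partitions" mentioned above


def partition_id(set_name, user_key):
    """Map a key to one of the partitions.

    Assumption: SHA-1 stands in for the server's real key digest.
    """
    digest = hashlib.sha1(f"{set_name}:{user_key}".encode()).digest()
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


class LockTable:
    """Hypothetical registry handing out one lock per (namespace, set, key)."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def lock_for(self, namespace, set_name, user_key):
        ident = (namespace, set_name, user_key)
        with self._guard:
            return self._locks.setdefault(ident, threading.Lock())
```

In this model, holding the lock for record `a` (as a slow function would) leaves the lock for record `b` free, while a second attempt on `a` has to wait.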

Version 3.11 introduced a further refinement of the partition into sprigs (see the Configuration Reference in the Aerospike documentation), which will improve performance significantly.

The "Managing UDFs" page in the Aerospike documentation has some discussion on assessing UDF performance.

Also see

Topic	Category	Replies	Views	Date
Atomicity of Record UDFs	User Defined Functions (UDF), tag: udf	2	3148	July 29, 2016
Multi-record write transactions	User Defined Functions (UDF)	0	1536	February 12, 2015
Is Record UDF calling thread safe?	User Defined Functions (UDF)	6	5949	August 21, 2018
Record locking semantics	Feature Discussion	1	3503	March 21, 2016
How to get, examine and update/insert few records in one transaction	Python Client, tag: udf	3	1431	April 17, 2018
