Background Scan: Pass selective bins to UDF

scan
udf
#1

Hi, I’ve used the as_scan_select function to select specific bins of a record from Aerospike. The standard scan examples in the Aerospike client use the same function.

Next, I used the same function with aerospike_scan_background. I expected that only the selected bins would be passed to the registered UDF for the background scan, but that does not seem to be the case: the background UDF receives all bins of the record.

So does as_scan_select not work with aerospike_scan_background? If so, is there a way to pass only selected bins to a background UDF, instead of all bins in the record?

#2

In a background scan there is no callback to process data in the client. You specify a User Defined Function (UDF), written in Lua, that the server executes on each record scanned. The UDF is made part of the scan definition using as_scan_apply_each().

The UDF has to be loaded on the server before launching the scan job, e.g. using AQL’s REGISTER MODULE … . It resides on all the server nodes until you delete it, which you can do with AQL’s REMOVE MODULE, so it can be reused in later scan jobs until then.
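As a rough sketch of the flow described above, this is roughly how a background scan with a UDF can be launched from the C client. The host address, the namespace "test", the set "demo", the module name "my_udfs" and the function name "touch_record" are all made-up placeholders, and the module is assumed to be registered on the server already. It needs the Aerospike C client library and a reachable cluster to run.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#include <aerospike/aerospike.h>
#include <aerospike/aerospike_scan.h>
#include <aerospike/as_config.h>
#include <aerospike/as_error.h>
#include <aerospike/as_scan.h>

int main(void)
{
    // Hypothetical cluster address; adjust to your environment.
    as_config config;
    as_config_init(&config);
    as_config_add_host(&config, "127.0.0.1", 3000);

    aerospike as;
    aerospike_init(&as, &config);

    as_error err;
    if (aerospike_connect(&as, &err) != AEROSPIKE_OK) {
        fprintf(stderr, "connect failed: %d %s\n", err.code, err.message);
        aerospike_destroy(&as);
        return 1;
    }

    // Hypothetical namespace and set names.
    as_scan scan;
    as_scan_init(&scan, "test", "demo");

    // Attach the UDF to the scan. "my_udfs" / "touch_record" are placeholder
    // names for a Lua module that is assumed to be registered already.
    as_scan_apply_each(&scan, "my_udfs", "touch_record", NULL);

    int rc = 0;
    uint64_t scan_id = 0;

    // No record callback here: the client only gets a scan id back.
    if (aerospike_scan_background(&as, &err, NULL, &scan, &scan_id) != AEROSPIKE_OK) {
        fprintf(stderr, "scan failed: %d %s\n", err.code, err.message);
        rc = 1;
    }
    else {
        printf("started background scan %" PRIu64 "\n", scan_id);

        // Block until the server reports the job as finished.
        // An interval of 0 uses the client's default polling interval.
        if (aerospike_scan_wait(&as, &err, NULL, scan_id, 0) != AEROSPIKE_OK) {
            fprintf(stderr, "wait failed: %d %s\n", err.code, err.message);
            rc = 1;
        }
    }

    as_scan_destroy(&scan);
    aerospike_close(&as, &err);
    aerospike_destroy(&as);
    return rc;
}
```

Note that nothing in this program receives record data; the only thing that comes back to the client is the scan id and the job status.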

Inside your UDF (the Lua function), you get full access to all bins and metadata of the record, and you write the UDF code to modify selected bins, if that’s what you want to do. All the client gets is the status of the background scan.

So in a background scan, as_scan_select() has no relevance. The server always fetches the entire record from disk; select() only helps reduce network traffic when the client is interested in selected bins only. UDFs run on the server, so they get the full record right off the storage and can do their own selection.
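One possible way to let the UDF "do its own selection" is to hand it the bin names as UDF arguments instead of calling as_scan_select(). The sketch below assumes a hypothetical Lua function process_bins(rec, bin1, bin2) in a hypothetical module "my_udfs" that only touches rec[bin1] and rec[bin2]; the namespace, set and bin names are placeholders as well. It is an illustration of the idea, not something taken from the Aerospike examples.

```c
#include <stdint.h>

#include <aerospike/aerospike.h>
#include <aerospike/aerospike_scan.h>
#include <aerospike/as_arraylist.h>
#include <aerospike/as_error.h>
#include <aerospike/as_list.h>
#include <aerospike/as_scan.h>

// Starts a background scan that passes two bin names to the UDF as
// arguments. `as` must be an already connected client instance.
as_status start_selective_scan(aerospike* as, as_error* err, uint64_t* scan_id)
{
    // Hypothetical namespace and set names.
    as_scan scan;
    as_scan_init(&scan, "test", "demo");

    // The bin names the UDF should work on (placeholder names).
    // They arrive in Lua as the arguments after `rec`.
    as_arraylist args;
    as_arraylist_init(&args, 2, 0);
    as_arraylist_append_str(&args, "bin_a");
    as_arraylist_append_str(&args, "bin_b");

    // No as_scan_select() here: the server hands the full record to the
    // UDF anyway, so the selection happens inside the Lua code.
    as_scan_apply_each(&scan, "my_udfs", "process_bins", (as_list*)&args);

    as_status status = aerospike_scan_background(as, err, NULL, &scan, scan_id);

    // The scan is expected to release the argument list together with its
    // own resources, so `args` is not destroyed separately.
    as_scan_destroy(&scan);
    return status;
}
```

The design choice here is simply to move the bin list from the scan definition into the UDF call, so the same registered module can be reused with different bin names per scan job.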

#3

Thanks for the reply @pgupta

The scan behavior is much clearer now.