Remove a column from set

Hello everyone. How do I remove a column from a set in Aerospike with an aql command?

Aerospike has bins instead of columns. One drawback of being a schemaless system is that each record carries its own schema, so there is no way to drop a bin other than writing a program that removes it record by record using some sort of scan, potentially with a binExists Expression to limit the work to records that still have the bin. I do not think AQL can do this for you. Look at the Scan/ScanCallback examples for your favorite client language to perform this activity.
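For reference, a record-by-record removal along those lines might look like the following in Java. This is a minimal, single-threaded sketch, not production code; the host address, namespace, set, and bin names are assumptions.

```java
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.ScanCallback;
import com.aerospike.client.exp.Exp;
import com.aerospike.client.policy.ScanPolicy;
import com.aerospike.client.policy.WritePolicy;

public class RemoveBinScan {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        WritePolicy wp = new WritePolicy();

        ScanPolicy sp = new ScanPolicy();
        // Only visit records that actually have the "age" bin.
        sp.filterExp = Exp.build(Exp.binExists("age"));

        client.scanAll(sp, "test", "testset", new ScanCallback() {
            public void scanCallback(Key key, Record record) {
                // Writing a null bin deletes that bin from the record.
                client.put(wp, key, Bin.asNull("age"));
            }
        });
        client.close();
    }
}
```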

You can do this with aql using a UDF written in Lua. For example, in the file delete_bin.lua below I have defined a function deleteBin that takes the bin name as its argument:

$ more delete_bin.lua
function deleteBin(rec, binname)
   rec[binname] = nil     -- setting a bin to nil deletes it
   aerospike:update(rec)  -- write the change back to the record
   return
end

Now, in aql, you can register this Lua module and invoke it as follows. (I first inserted my test records with an age bin; below I list them, then delete the age bin.)

aql> select * from test.testset
select * from test.testset
+--------+----------+-----+
| PK     | name     | age |
+--------+----------+-----+
| "key7" | "Sean"   | 24  |
| "key5" | "Julia"  | 62  |
| "key0" | "Sandra" | 34  |
| "key9" | "Susan"  | 42  |
| "key2" | "Jill"   | 20  |
| "key1" | "Jack"   | 26  |
| "key8" | "Sam"    | 12  |
| "key3" | "James"  | 38  |
| "key6" | "Sally"  | 32  |
| "key4" | "Jim"    | 46  |
+--------+----------+-----+
10 rows in set (0.035 secs)

OK

Now I register the Lua module and run it. (The file is in the same directory I launched aql from, so no path is needed.)

aql> register module 'delete_bin.lua'

Now execute a background UDF job via aql:

aql> execute delete_bin.deleteBin("age") on test.testset
execute delete_bin.deleteBin("age") on test.testset
OK, Scan job (8304329783834003284) created.

aql> select * from test.testset
select * from test.testset
+--------+----------+
| PK     | name     |
+--------+----------+
| "key6" | "Sally"  |
| "key9" | "Susan"  |
| "key7" | "Sean"   |
| "key5" | "Julia"  |
| "key0" | "Sandra" |
| "key2" | "Jill"   |
| "key8" | "Sam"    |
| "key3" | "James"  |
| "key4" | "Jim"    |
| "key1" | "Jack"   |
+--------+----------+
10 rows in set (0.034 secs)

OK

Very slick @pgupta ! You should sneak that in as a standard feature with how much more efficient/easy that is than writing a program with a scancallback :wink:

You can actually also do it with Expressions, without any UDF, but you would then need to use a client library and wouldn’t be able to just use AQL.
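A hedged sketch of that Expression-based variant in Java: the same bin removal runs as a server-side background job with a filter Expression instead of a UDF, and no secondary index is needed. It assumes an existing `client` connection and the same test data as above.

```java
import com.aerospike.client.Bin;
import com.aerospike.client.Operation;
import com.aerospike.client.exp.Exp;
import com.aerospike.client.policy.WritePolicy;
import com.aerospike.client.query.Statement;
import com.aerospike.client.task.ExecuteTask;

Statement stmt = new Statement();
stmt.setNamespace("test");
stmt.setSetName("testset");

WritePolicy wp = new WritePolicy();
// Only touch records that still have the "age" bin.
wp.filterExp = Exp.build(Exp.binExists("age"));

// Writing a null bin deletes it; the job executes on the server nodes.
ExecuteTask task = client.execute(wp, stmt, Operation.put(Bin.asNull("age")));
task.waitTillComplete();
```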


Yes, certainly one can do it with Expression Operations. However, in this case we can get away with plain Operations, since the operation is not conditional on other bin values or record metadata. We achieve this via a background query job, and there are two ways to do it: 1) a background UDF (which is what the aql example did), and 2) background operations. The code below shows the relevant execution on the same data set using a background operations query (job). (Note: @Albot in this case I am not using a scan callback, i.e. not bringing every record back to the client, updating it there, and rewriting it back to the server.)

Background operations are more efficient than background UDFs for a large data set. Also, background jobs are not managed by the client: execute() returns synchronously, acknowledging that the job was launched, and the client does not actively monitor the job to completion. This means that if nodes are added to or removed from the cluster while the job is running, it may not execute on all the records. But this kind of job is ideal for background UDF/ops execution because it is idempotent: if you detect a cluster change between the start and end of the job, simply rerun it. This check can be built into the code using cluster-key info.
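As a rough illustration, that cluster-key check could be sketched like this in Java. The `cluster-stable` info command and its argument syntax are assumptions to verify against your server version; `wp` and `stmt` are as built in the Java code below.

```java
import com.aerospike.client.Info;
import com.aerospike.client.cluster.Node;
import com.aerospike.client.task.ExecuteTask;

// Capture the cluster key before launching the background job.
Node node = client.getNodes()[0];
String before = Info.request(null, node, "cluster-stable:namespace=test");

ExecuteTask task = client.execute(wp, stmt);
task.waitTillComplete();

// If the cluster changed while the job ran, some records may have been
// missed; the operation is idempotent, so simply rerun it.
String after = Info.request(null, node, "cluster-stable:namespace=test");
if (!before.equals(after)) {
    client.execute(wp, stmt).waitTillComplete();
}
```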

Starting data (as viewed in aql):

select * from test.testset
+--------+----------+-----+
| PK     | name     | age |
+--------+----------+-----+
| "key5" | "Julia"  | 62  |
| "key9" | "Susan"  | 42  |
| "key7" | "Sean"   | 24  |
| "key0" | "Sandra" | 34  |
| "key2" | "Jill"   | 20  |
| "key4" | "Jim"    | 46  |
| "key1" | "Jack"   | 26  |
| "key8" | "Sam"    | 12  |
| "key3" | "James"  | 38  |
| "key6" | "Sally"  | 32  |
+--------+----------+-----+
10 rows in set (0.034 secs)

To show additional filtering capability, I added a secondary index on age: for all records where age is between 20 and 60, set the age bin to null (i.e. delete the age bin).

Relevant java code:

//Needed imports
import com.aerospike.client.query.Statement;
import com.aerospike.client.query.Filter;
import com.aerospike.client.Operation;
import com.aerospike.client.Bin;
import com.aerospike.client.policy.WritePolicy;
import com.aerospike.client.task.ExecuteTask;

//Run SI query as a background job
Statement stmt = new Statement();
stmt.setNamespace("test");
stmt.setSetName("testset");
stmt.setFilter(Filter.range("age", 20, 60)); //secondary index filter on age
Operation[] ops = new Operation[1];
Bin ageBinNull = Bin.asNull("age");          //writing a null bin deletes it
ops[0] = Operation.put(ageBinNull);
stmt.setOperations(ops);
WritePolicy wp = new WritePolicy();

ExecuteTask task = client.execute(wp, stmt); //returns immediately; job runs on the server

After executing this, the aql output shows:

select * from test.testset
+--------+----------+-----+
| PK     | name     | age |
+--------+----------+-----+
| "key5" | "Julia"  | 62  |
| "key0" | "Sandra" |     |
| "key9" | "Susan"  |     |
| "key2" | "Jill"   |     |
| "key4" | "Jim"    |     |
| "key7" | "Sean"   |     |
| "key1" | "Jack"   |     |
| "key8" | "Sam"    | 12  |
| "key3" | "James"  |     |
| "key6" | "Sally"  |     |
+--------+----------+-----+
10 rows in set (0.034 secs)