I’m trying to write a UDF that modifies data (e.g., deletes a record) and returns a result (e.g., the sum of selected columns of that record). Since I need to do this over a range of rows and return the sum accumulated over all selected rows, I believe I need a stream UDF.
For example, say my set has two bins: metaSize and callSize. I want to run the UDF over a range of rows, where we sum metaSize and callSize of each row, delete the row, and eventually return the sum of metaSize and callSize from all rows.
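To make the intended semantics concrete, here is a rough sketch in plain Python, with an in-memory dict standing in for the set. This is not Aerospike client or UDF code; the function name `sum_and_delete`, the record layout, and the sample values are all made up for illustration.

```python
def sum_and_delete(records, keys):
    """Sum metaSize + callSize over the selected rows, deleting each row as it is read."""
    total = 0
    for key in list(keys):
        row = records.pop(key, None)  # remove the row; None if the key is absent
        if row is None:
            continue
        # Missing bins are treated as 0 (an assumption for this sketch).
        total += row.get("metaSize", 0) + row.get("callSize", 0)
    return total


# Hypothetical data: primary key -> bins
records = {
    1: {"metaSize": 10, "callSize": 5},
    2: {"metaSize": 20, "callSize": 7},
    3: {"metaSize": 30, "callSize": 9},
}

total = sum_and_delete(records, [1, 2])
print(total)            # 42
print(sorted(records))  # [3]
```

In other words, a single pass should both aggregate and delete, and only the final total should come back to the caller.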
Is this possible? The manual says “The stream-UDF is read-only”, so it seems I cannot issue deletes from within one. Can I achieve my goal without stream UDFs?