It’s not by design that we support stream UDFs only on query results; it’s just where the current implementation stands. In theory, the results of a scan and the results of a query should look and feel the same, but in reality they use different mechanisms to get the record stream(s).
Here’s a bit of history and discussion.
When you think about how scans are done in a relational DB, an index scan and a table scan are just two fairly similar access methods, at least in terms of how they process the output. And, generally, a single table doesn’t hold terabytes of data. In distributed NoSQL, however, things look and feel somewhat different. In a setting where DBs are fairly large (e.g. in the multi-terabyte range), a distributed scan over an entire namespace is a pretty big deal. Hence, we currently require scans to run in the background. On the other hand, even though a secondary index does span all of the nodes, most secondary index queries are going to be quite a bit smaller. As a result, we run them in the foreground. In the long term, we plan to make this mechanism more general, although that doesn’t help you in the short term.
Now, as for scans over a cluster when nodes change: a scan takes a ScanPolicy parameter that says what to do when the cluster changes.
All (or almost all) of our examples set the “failOnClusterChange” parameter, so that the scan fails rather than carrying on. If you continue scanning during a cluster change, you might see inconsistent data.
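To make the "inconsistent data" point concrete, here is a small toy model in Python. It is not the Aerospike client and does not reflect how the server actually scans; ToyCluster, scan, migrate and ClusterChangedError are hypothetical names made up for this sketch, and the fail_on_cluster_change flag only loosely mirrors the idea behind “failOnClusterChange”. It assumes a scan that visits nodes one at a time while a record moves between nodes partway through.

```python
class ClusterChangedError(Exception):
    """Raised when the toy cluster changes while a scan is in progress."""


class ToyCluster:
    """Hypothetical stand-in for a cluster: node name -> set of record keys."""

    def __init__(self, nodes):
        self.nodes = {name: set(keys) for name, keys in nodes.items()}
        self.generation = 0  # bumped every time a record changes owner

    def migrate(self, key, src, dst):
        """Move one record between nodes, roughly as a rebalance would."""
        self.nodes[src].remove(key)
        self.nodes[dst].add(key)
        self.generation += 1


def scan(cluster, fail_on_cluster_change, on_node_done=None):
    """Visit nodes in name order and return every key seen, in visit order."""
    start_generation = cluster.generation
    seen = []
    for name in sorted(cluster.nodes):
        if fail_on_cluster_change and cluster.generation != start_generation:
            raise ClusterChangedError(f"cluster changed before scanning {name}")
        seen.extend(sorted(cluster.nodes[name]))
        if on_node_done is not None:
            on_node_done(name)  # hook used below to inject a change mid-scan
    return seen


def move_after_a(cluster, key, src, dst):
    """Build a hook that migrates one record right after node A is scanned."""
    def hook(name):
        if name == "A":
            cluster.migrate(key, src, dst)
    return hook


# Case 1: keep going; record 1 moves to a node not yet scanned and is seen twice.
c1 = ToyCluster({"A": {1, 2}, "B": {3, 4}})
duplicated = scan(c1, False, move_after_a(c1, 1, "A", "B"))

# Case 2: keep going; record 3 moves to a node already scanned and is missed.
c2 = ToyCluster({"A": {1, 2}, "B": {3, 4}})
missed = scan(c2, False, move_after_a(c2, 3, "B", "A"))

# Case 3: fail on change; the scan stops instead of returning suspect results.
c3 = ToyCluster({"A": {1, 2}, "B": {3, 4}})
try:
    scan(c3, True, move_after_a(c3, 1, "A", "B"))
    aborted = False
except ClusterChangedError:
    aborted = True
```

In this model the first scan returns record 1 twice, the second never returns record 3, and the third stops with an error so the caller can retry once the cluster has settled. That trade-off (fail and retry versus possibly duplicated or missing records) is the reason for setting the flag in this sketch.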