Aerospike Connect for Spark version 2.8.0 was released on July 14, 2021.
- Supported until October 14, 2022.
- Tested with Apache Spark 2.4.7, Scala 2.11.12, & Python 3.7.
- Minimum supported Aerospike Server version 5.0.
New Features
- [CONNECTOR-166] - Support batchget queries with digests in Spark Connector.
- [CONNECTOR-142] - Data sampling in the Spark Connector using the aerospike.sample.size flag.
- [CONNECTOR-142] - Support boolean bins in the Spark Connector (refer to aerospike.booleanbin in the documentation).
- [CONNECTOR-211] - Support partial updates of records using the aerospike.update.partial flag.
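As a rough illustration of how a read-side flag such as aerospike.sample.size might be passed, the sketch below builds an option map for a sampled read. The helper names, the "test"/"users" values, the "aerospike" format name, and the assumption that the flag takes a record count are illustrative and not confirmed by these notes; check the connector documentation for the accepted values.

```python
def sampled_read_options(namespace, set_name, sample_size):
    """Build connector options for a sampled read (hypothetical helper)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return {
        "aerospike.namespace": namespace,
        "aerospike.set": set_name,
        # Flag introduced in 2.8.0; a record count is assumed here.
        "aerospike.sample.size": str(sample_size),
    }


def load_sample(spark, options):
    """Read through the connector; `spark` is an existing SparkSession."""
    # The "aerospike" format name is an assumption about how the connector registers itself.
    return spark.read.format("aerospike").options(**options).load()


# Placeholder namespace and set names.
opts = sampled_read_options("test", "users", 1000)
```

Keeping the option map separate from the read call makes it easy to log or unit-test the configuration without a running cluster.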
Improvements
- Migrated from the queryPartitions() call to scanPartitions().
- Updated Spark version to 2.4.7.
- Updated client version to 5.1.5.
- Migrated to Expressions for scans.
- Pushdown support for Float & Double datatypes.
Bug Fixes
- [CONNECTOR-205] - Filter out records that breach write block size in Aerospike via Spark Connector.
- [CONNECTOR-212] - Handle nulls in full record writes (REPLACE, REPLACE_ONLY, and CREATE_ONLY).
- [CONNECTOR-215] - Writes are slower in the Spark Connector v2 version. Introduced a new flag, aerospike.write.batchsize, to control write throughput.
Known Issues
- The DataSource v2 API does not support the SQL statement INSERT INTO on a temp view. Use DataFrame syntax for equivalent functionality.
- The aerospike.write.mode flag overrides the Apache Spark write mode.
- The Spark connector stores Spark DateType and TimestampType as long. In Aerojoin API calls, convert these types to long.
- [CONNECTOR-312] - Update Spark connectors with the latest Java clients to address CLIENT-1637 (“partition unavailable” errors occur). Fixed in version 2.8.1.
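The DataFrame-based workaround for the INSERT INTO limitation can be sketched roughly as follows. The helper name is hypothetical, and the "aerospike" format name and the aerospike.updateByKey option are assumptions about the connector's configuration rather than details stated in these notes.

```python
def append_to_aerospike(df, namespace, set_name, key_column):
    """Append a DataFrame's rows through the connector (hypothetical helper).

    Stands in for INSERT INTO on a temp view, which the DataSource v2 API
    does not support.
    """
    (
        df.write.format("aerospike")  # format name is an assumption
        .mode("append")  # overridden by aerospike.write.mode, if that flag is set
        .option("aerospike.namespace", namespace)
        .option("aerospike.set", set_name)
        # Assumed option naming the column used as the record key.
        .option("aerospike.updateByKey", key_column)
        .save()
    )
```

A call such as `append_to_aerospike(df, "test", "users", "id")` would then take the place of the unsupported SQL statement; the namespace, set, and column names here are placeholders.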
Updates
- The default value of the aerospike.partition.factor flag has changed from 12 to 8. Please update your application accordingly.
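For applications that relied on the previous default, one option is to set the flag explicitly. The sketch below assumes that the number of Spark partitions is 2 raised to the partition factor; that relationship and the helper names are assumptions to verify against the connector documentation.

```python
OLD_DEFAULT_FACTOR = 12  # default before 2.8.0
NEW_DEFAULT_FACTOR = 8   # default from 2.8.0 onward


def spark_partitions(factor):
    """Partitions implied by a factor, ASSUMING count = 2 ** factor."""
    if factor < 0:
        raise ValueError("factor must be non-negative")
    return 2 ** factor


def pin_partition_factor(options, factor=OLD_DEFAULT_FACTOR):
    """Return a copy of `options` with the partition factor set explicitly."""
    pinned = dict(options)
    pinned["aerospike.partition.factor"] = str(factor)
    return pinned


# Under the assumption above, the old and new defaults imply 4096 and 256 partitions.
old_count = spark_partitions(OLD_DEFAULT_FACTOR)
new_count = spark_partitions(NEW_DEFAULT_FACTOR)
```

Pinning the value in the job's options keeps parallelism stable across connector upgrades, at the cost of not picking up future default changes.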