Aerospike Connect for Spark version 3.1.0 was released on July 21, 2021.
- Supported until October 21, 2022.
- Tested with Apache Spark 3.0.3, Scala 2.12.11 & Python 3.7.
- Minimum supported Aerospike Server version 5.0.
New Features
- [CONNECTOR-247] - Spark connector should persist Map bins as K-Ordered.
- [CONNECTOR-166] - Support batchget queries with digests in Spark Connector.
- [CONNECTOR-142] - Data sampling in the Spark Connector using the aerospike.sample.size flag.
- [CONNECTOR-142] - Support boolean bins in the Spark Connector (refer to aerospike.booleanbin in the documentation).
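
As a rough illustration of the sampling flag, the sketch below adds aerospike.sample.size to an existing set of read options and passes them to a PySpark reader. The helper names, the data source name "aerospike", and the sample value are assumptions made for this example; connection settings are expected to come from the caller.

```python
from typing import Dict


def with_sample_size(base_options: Dict[str, str], sample_size: int) -> Dict[str, str]:
    """Return a copy of base_options with aerospike.sample.size set.

    Hypothetical helper: only the flag name comes from the release notes.
    """
    options = dict(base_options)  # copy so the caller's dict is not modified
    options["aerospike.sample.size"] = str(sample_size)
    return options


def read_sample(spark, base_options: Dict[str, str], sample_size: int):
    """Load a sampled DataFrame.

    `spark` is an existing pyspark.sql.SparkSession. The format name
    "aerospike" is an assumption about how the connector is registered.
    """
    options = with_sample_size(base_options, sample_size)
    return spark.read.format("aerospike").options(**options).load()
```

A call such as read_sample(spark, connection_options, 1000) would then request a sample instead of a full scan, where connection_options holds whatever connection settings the application already uses.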
Improvements
- This library is an uber shaded jar.
- Migrated from the queryPartitions() call to scanPartitions().
- Updated client version to 5.1.5.
- Migrated to Expressions for scans.
- Pushdown support for Float & Double datatypes.
Bug Fixes
- [CONNECTOR-215] - Writes are slower in the Spark Connector v2 version. Introduced a new flag aerospike.write.batchsize to control write throughput.
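
A minimal sketch of setting the new write flag from PySpark follows. The helper names, the data source name "aerospike", and the batch size value are illustrative assumptions; the release notes do not state a recommended value, so tune it against your own workload.

```python
from typing import Dict


def with_write_batch_size(base_options: Dict[str, str], batch_size: int) -> Dict[str, str]:
    """Return a copy of base_options with aerospike.write.batchsize set.

    Hypothetical helper: only the flag name comes from the release notes.
    """
    options = dict(base_options)  # leave the caller's options untouched
    options["aerospike.write.batchsize"] = str(batch_size)
    return options


def write_with_batch_size(df, base_options: Dict[str, str], batch_size: int) -> None:
    """Append df using the given batch size.

    `df` is an existing pyspark.sql.DataFrame. The format name "aerospike"
    is an assumption about how the connector is registered.
    """
    options = with_write_batch_size(base_options, batch_size)
    df.write.format("aerospike").mode("append").options(**options).save()
```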
Known Issues
- This connector release shades all internal libraries. Please update application build files accordingly.
- The Spark connector stores Spark DateType and TimestampType values as long. In Aerojoin API calls, convert these types to LongType.
- The DataSource v2 API does not support the SQL statement INSERT INTO on a temp view. Use DataFrame syntax for equivalent functionality.
- [CONNECTOR-312] - Update Spark connectors with the latest Java clients to address CLIENT-1637 (“partition unavailable” errors occur). Fixed in version 3.1.1.
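
For the INSERT INTO limitation, the DataFrame-based alternative could look roughly like the sketch below. The function name and the data source name "aerospike" are assumptions for illustration; the write options (connection settings, target set, key column) are whatever the application already uses.

```python
from typing import Dict


def append_to_aerospike(df, write_options: Dict[str, str]) -> None:
    """Append the rows of df through the DataFrame writer.

    Intended as a stand-in for `INSERT INTO <temp view> ...`.
    `df` is an existing pyspark.sql.DataFrame holding the rows to insert.
    The format name "aerospike" is an assumption about the connector.
    """
    (
        df.write
        .format("aerospike")
        .mode("append")  # add rows rather than replace existing data
        .options(**write_options)
        .save()
    )
```

Rows that would have been selected by the INSERT INTO statement can first be produced with spark.sql("SELECT ...") and the resulting DataFrame passed to this function.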
Updates
- The default value of the aerospike.partition.factor flag has changed from 12 to 8. Please update your application accordingly.
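
An application that relied on the old default can set the flag explicitly. The sketch below is one way to do that; the helper and constant names are hypothetical, and only the flag name and the values 12 and 8 come from these notes.

```python
from typing import Dict

PARTITION_FACTOR_FLAG = "aerospike.partition.factor"
PREVIOUS_DEFAULT = 12  # default before this release
NEW_DEFAULT = 8        # default from this release onward


def pin_partition_factor(options: Dict[str, str], factor: int = PREVIOUS_DEFAULT) -> Dict[str, str]:
    """Return a copy of options with the partition factor set explicitly.

    A value the application already set is kept; otherwise `factor` is
    applied so behavior does not shift with the connector's default.
    """
    pinned = dict(options)
    pinned.setdefault(PARTITION_FACTOR_FLAG, str(factor))
    return pinned
```

The returned dictionary can be passed to the reader or writer through options(**pinned) in the same way as any other connector setting.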