Aerospike Connect for Spark 3.1.0 (July 21, 2021)

Aerospike Connect for Spark version 3.1.0 was released on July 21, 2021.

  • Supported until October 21, 2022.
  • Tested with Apache Spark 3.0.3, Scala 2.12.11 & Python 3.7.
  • Minimum supported Aerospike Server version 5.0.

New Features

  • [CONNECTOR-247] - Spark connector should persist Map bins as K-Ordered.
  • [CONNECTOR-166] - Support batchget queries with digests in Spark Connector.
  • [CONNECTOR-142] - Data sampling in the Spark Connector via the aerospike.sample.size flag.
  • [CONNECTOR-142] - Support boolean bins in the Spark Connector (refer to aerospike.booleanbin in the documentation).
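
As an illustration of the sampling flag, the following minimal PySpark-style sketch passes aerospike.sample.size as a read option. It is not taken from the connector documentation: the helper names, host, namespace, and set values are hypothetical, the aerospike.seedhost, aerospike.namespace, and aerospike.set option keys are assumed from typical connector usage, and the sample size is assumed to be a record count.

```python
def sampled_read_options(seed_host, namespace, set_name, sample_size):
    """Build read options for a sampled load (hypothetical helper)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return {
        # Connection and target keys are assumptions, not from these notes.
        "aerospike.seedhost": seed_host,
        "aerospike.namespace": namespace,
        "aerospike.set": set_name,
        # Flag named in the release notes; value assumed to be a record count.
        "aerospike.sample.size": str(sample_size),
    }


def load_sample(spark, options):
    """Load a sampled DataFrame; `spark` is an existing SparkSession."""
    return spark.read.format("aerospike").options(**options).load()


# Placeholder values; replace with a real cluster address, namespace, and set.
opts = sampled_read_options("localhost:3000", "test", "demo", 1000)
```

With a running SparkSession and the connector jar on the classpath, load_sample(spark, opts) would return the sampled DataFrame.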

Improvements

  • This library is an uber shaded jar.
  • Migrated from queryPartitions() calls to scanPartitions().
  • Updated the client version to 5.1.5.
  • Migrated to Expressions for scans.
  • Pushdown support for Float & Double datatypes.

Bug Fixes

  • [CONNECTOR-215] - Writes are slower in the Spark Connector v2 version. Introduced a new flag aerospike.write.batchsize to control write throughput.
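
A minimal sketch of how aerospike.write.batchsize might be supplied on a write, assuming it accepts a positive integer. The helper names and sample values are hypothetical, and the aerospike.seedhost, aerospike.namespace, aerospike.set, and aerospike.updateByKey option keys are assumptions based on typical connector usage rather than on these notes.

```python
def batched_write_options(seed_host, namespace, set_name, key_column, batch_size):
    """Build write options with an explicit batch size (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return {
        # Connection, target, and key options are assumptions.
        "aerospike.seedhost": seed_host,
        "aerospike.namespace": namespace,
        "aerospike.set": set_name,
        "aerospike.updateByKey": key_column,
        # Flag introduced for CONNECTOR-215 to control write throughput.
        "aerospike.write.batchsize": str(batch_size),
    }


def save_batched(df, options):
    """Append an existing Spark DataFrame `df` using the given options."""
    df.write.format("aerospike").mode("append").options(**options).save()


# Placeholder values; a suitable batch size depends on the workload.
write_opts = batched_write_options("localhost:3000", "test", "demo", "id", 5000)
```

The notes do not state a default or a recommended value for the batch size, so it is best chosen by measuring write throughput on the target cluster.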

Known Issues

  • This connector release shades all internal libraries. Please update application build files accordingly.
  • The Spark connector stores Spark DateType and TimestampType values as long. In aerojoin API calls, convert these types to LongType.
  • DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
  • [CONNECTOR-312] - Update Spark connectors with the latest Java clients to address CLIENT-1637 (“partition unavailable” errors occur). Fixed in version 3.1.1.
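
For the INSERT INTO limitation listed above, one possible DataFrame-based equivalent is sketched below: the rows that would have been inserted are selected with Spark SQL and then appended through the DataFrame writer. The function name and the option keys supplied by the caller are hypothetical; only the "use DataFrame syntax" guidance comes from these notes.

```python
def append_via_dataframe(spark, select_sql, options):
    """Append the rows produced by `select_sql` using DataFrame syntax.

    Intended as a stand-in for `INSERT INTO <temp view> SELECT ...`,
    which the DataSource v2 API does not support on a temp view.
    `spark` is an existing SparkSession; `options` holds connector options.
    """
    rows = spark.sql(select_sql)  # the SELECT half of the INSERT statement
    rows.write.format("aerospike").mode("append").options(**options).save()
```

A call such as append_via_dataframe(spark, "SELECT id, name FROM staging", write_opts) assumes that a staging view and a suitable write_opts dictionary already exist.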

Updates