Aerospike Connect for Spark 2.8.0 (July 14, 2021)

Aerospike Connect for Spark version 2.8.0 was released on July 14, 2021.

  • Supported until October 14, 2022.
  • Tested with Apache Spark 2.4.7, Scala 2.11.12, and Python 3.7.
  • Minimum supported Aerospike Server version: 5.0.

New Features

  • [CONNECTOR-166] - Support batchget queries with digests in Spark Connector.
  • [CONNECTOR-142] - Support data sampling in the Spark Connector using the aerospike.sample.size flag.
  • [CONNECTOR-142] - Support boolean bins in the Spark Connector (refer to aerospike.booleanbin in the documentation).
  • [CONNECTOR-211] - Support partial updates of records using the aerospike.update.partial flag.
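The new flags are passed to the connector as ordinary data source options. The following is a minimal Python sketch, not the connector's reference usage: it only builds the option maps. The helper names, the connection options (aerospike.seedhost, aerospike.namespace, aerospike.set), and all values, including the "true" string for aerospike.update.partial, are assumptions for illustration; check the connector documentation for the accepted values.

```python
def sampled_read_options(seed_host, namespace, set_name, sample_size):
    """Build read options that limit a read to roughly `sample_size` records."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return {
        # Connection options: assumed, not taken from these release notes.
        "aerospike.seedhost": seed_host,
        "aerospike.namespace": namespace,
        "aerospike.set": set_name,
        # Flag introduced in 2.8.0 (CONNECTOR-142).
        "aerospike.sample.size": str(sample_size),
    }


def partial_update_options(namespace, set_name):
    """Build write options that request partial record updates."""
    return {
        "aerospike.namespace": namespace,
        "aerospike.set": set_name,
        # Flag introduced in 2.8.0 (CONNECTOR-211); the "true" value is assumed.
        "aerospike.update.partial": "true",
    }


read_opts = sampled_read_options("localhost:3000", "test", "demo", 1000)
write_opts = partial_update_options("test", "demo")

# With a live SparkSession `spark` and the connector on the classpath:
# df = spark.read.format("aerospike").options(**read_opts).load()
```

Keeping the options in plain dictionaries makes it easy to reuse the same settings across several reads or writes.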

Improvements

  • Migrated from the queryPartition() call to ScanPartitions().
  • Updated Spark version to 2.4.7.
  • Updated client version to 5.1.5.
  • Migrated to Expressions for scans.
  • Added pushdown support for Float and Double data types.

Bug Fixes

  • [CONNECTOR-205] - Filter out records that breach write block size in Aerospike via Spark Connector.
  • [CONNECTOR-212] - Handle nulls in full record writes (REPLACE, REPLACE_ONLY, and CREATE_ONLY).
  • [CONNECTOR-215] - Writes were slower in Spark Connector v2. Introduced a new flag, aerospike.write.batchsize, to control write throughput.
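As a rough illustration of the CONNECTOR-215 fix, the batch size can be supplied alongside the other write options. This is a sketch under assumptions: the helper name, the aerospike.namespace and aerospike.set options, and the value 5000 are illustrative only and are not recommendations from these release notes.

```python
def write_options(namespace, set_name, batch_size):
    """Return write options that include an explicit write batch size."""
    # Reject booleans explicitly, since bool is a subclass of int in Python.
    if isinstance(batch_size, bool) or not isinstance(batch_size, int):
        raise ValueError("batch_size must be an integer")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return {
        # Assumed target options, not taken from these release notes.
        "aerospike.namespace": namespace,
        "aerospike.set": set_name,
        # Flag introduced in 2.8.0 to control write throughput.
        "aerospike.write.batchsize": str(batch_size),
    }


batch_opts = write_options("test", "demo", 5000)

# With a DataFrame `df` and the connector on the classpath:
# df.write.format("aerospike").options(**batch_opts).save()
```

A suitable batch size depends on record size and cluster capacity, so it is worth measuring with a few values rather than relying on a single number.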

Known Issues

  • DataSource v2 API does not support the SQL statement INSERT INTO a temp view. Use DataFrame syntax for equivalent functionality.
  • The aerospike.write.mode flag overrides the Apache Spark write mode.
  • The Spark connector stores Spark DateType and TimestampType values as long. In Aerojoin API calls, convert these types to long.
  • [CONNECTOR-312] - Update Spark connectors with the latest Java clients to address CLIENT-1637 (“partition unavailable” errors occur). Fixed in version 2.8.1.
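For the INSERT INTO limitation, the DataFrame-based equivalent might look like the sketch below. The function name and the "append" mode are assumptions for illustration; format(), options(), mode(), and save() are standard Spark DataFrameWriter calls. Because aerospike.write.mode overrides the Spark write mode, the mode passed here may not be the one that takes effect.

```python
def append_via_dataframe(df, options):
    """Write `df` through the connector instead of SQL INSERT INTO.

    `df` is expected to be a Spark DataFrame; `options` is a dict of
    connector options (for example the target namespace and set).
    """
    (
        df.write
        .format("aerospike")   # data source name used by the connector
        .options(**options)    # connector options supplied by the caller
        .mode("append")        # may be overridden by aerospike.write.mode
        .save()
    )


# With a live SparkSession and a DataFrame `rows_to_insert`:
# append_via_dataframe(rows_to_insert, {"aerospike.set": "demo"})
```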

Updates