About the Spark category

Apache Spark is a top-level project of the Apache Software Foundation. In one deployment configuration, Spark runs alongside or in place of Hadoop's MapReduce engine: it can read from the Hadoop Distributed File System (HDFS) and can use YARN (Yet Another Resource Negotiator) or Apache Mesos as its cluster manager. In a second configuration, Spark runs in standalone mode, using its own built-in cluster manager, and integrates with SQL or NoSQL data stores such as Aerospike.
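As a rough illustration of the deployment options above, the choice of cluster manager is usually expressed as a "master URL" string. The sketch below is a minimal, hypothetical helper (the function name `master_url`, the mode labels, and the default host are assumptions made for this example, not part of Spark); only the URL formats themselves (`yarn`, `spark://host:port`, `mesos://host:port`, `local[*]`) follow Spark's documented conventions.

```python
# Hypothetical helper: maps a deployment choice to a Spark master URL string.
# The function name and mode labels are illustrative assumptions; the URL
# formats follow Spark's documented master URL conventions.

# Conventional default ports for the standalone master and the Mesos master.
DEFAULT_PORTS = {"standalone": 7077, "mesos": 5050}


def master_url(mode, host="localhost", port=None):
    """Return the master URL for the given deployment mode."""
    if mode == "yarn":
        # The cluster location is read from the Hadoop configuration,
        # so the URL carries no host or port.
        return "yarn"
    if mode == "local":
        # Run in a single JVM, using as many worker threads as cores.
        return "local[*]"
    if mode in DEFAULT_PORTS:
        scheme = "spark" if mode == "standalone" else "mesos"
        return f"{scheme}://{host}:{port or DEFAULT_PORTS[mode]}"
    raise ValueError(f"unknown deployment mode: {mode!r}")


print(master_url("yarn"))
print(master_url("standalone", host="spark-master"))
print(master_url("mesos", host="mesos-master"))
```

The resulting string is what you would typically pass to `spark-submit --master` or, in PySpark, to `SparkSession.builder.master(...)`. The host names here are placeholders.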

Apache Spark has four main modules, each built on top of the Spark core engine:

  • Spark Streaming
  • Machine Learning (MLlib)
  • Spark SQL
  • GraphX

Please use this forum to discuss aspects of working with Apache Spark in your architecture, or topics of interest to the Apache Spark community.