Apache Spark™ - Unified Analytics Engine For Big Data
Apache Spark™ is a unified analytics engine for large-scale data processing.
Latest News
Spark 2.4.5 released (Feb 08, 2020)
Preview release of Spark 3.0 (Dec 23, 2019)
Run workloads up to 100x faster than Hadoop MapReduce.
Apache Spark achieves high performance for both batch and streaming
data, using a state-of-the-art DAG scheduler, a query optimizer, and a
physical execution engine.
Figure: Logistic regression in Hadoop and Spark
Generality
Combine SQL, streaming, and complex analytics.
Spark powers a stack of libraries including SQL and DataFrames, MLlib for
machine learning, GraphX, and Spark Streaming. You can combine these
libraries seamlessly in the same application.
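As a rough illustration of combining libraries in one application, the sketch below filters a DataFrame with Spark SQL and then fits an MLlib logistic regression on the result. It assumes PySpark is installed and runs in local mode; the sample rows, the `events` view name, the column names, and the `run_pipeline` function are all made up for this example and do not come from the Spark site.

```python
# Illustrative sample data: (x1, x2, label). Both label classes are present.
ROWS = [
    (0.1, 1.2, 0.0),
    (0.4, 0.9, 0.0),
    (1.5, 0.3, 1.0),
    (1.9, 0.1, 1.0),
    (-1.0, 2.0, 0.0),  # dropped by the SQL filter below
]

# Hypothetical query against a temporary view named "events".
QUERY = "SELECT x1, x2, label FROM events WHERE x1 >= 0"


def run_pipeline():
    """Run SQL and MLlib steps in a single Spark application (sketch)."""
    # Imports are local so this module can be loaded without PySpark installed.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = (
        SparkSession.builder
        .master("local[*]")          # assumption: local mode for the demo
        .appName("combined-example")
        .getOrCreate()
    )
    try:
        # Step 1: DataFrame + SQL.
        df = spark.createDataFrame(ROWS, ["x1", "x2", "label"])
        df.createOrReplaceTempView("events")
        filtered = spark.sql(QUERY)

        # Step 2: MLlib on the SQL result, in the same session.
        assembler = VectorAssembler(inputCols=["x1", "x2"], outputCol="features")
        train = assembler.transform(filtered)
        model = LogisticRegression(maxIter=10).fit(train)

        scored = model.transform(train).select("label", "prediction")
        return [(row.label, row.prediction) for row in scored.collect()]
    finally:
        spark.stop()
```

Calling `run_pipeline()` on a machine with PySpark available returns a list of `(label, prediction)` pairs. The point is only that the SQL result feeds directly into the MLlib stage without leaving the application.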
Runs Everywhere
Spark runs on Hadoop, Apache Mesos,
Kubernetes, standalone, or in the cloud. It can
access diverse data sources.
You can run Spark using its standalone cluster mode, on EC2, on Hadoop
YARN, on Mesos, or on Kubernetes. Access data in HDFS, Alluxio, Apache
Cassandra, Apache HBase, Apache Hive, and hundreds of other data
sources.
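In practice, the choice of cluster manager usually comes down to the master URL passed to `spark-submit --master`. The following is a minimal sketch of that mapping. The helper names `master_url` and `spark_submit_args` are hypothetical, the host names in the usage note are placeholders, and the default ports are the conventional ones (7077 for standalone, 5050 for Mesos, 443 for a Kubernetes API server), which a given cluster may not use.

```python
# Hypothetical helpers, not part of Spark: build the master URL that
# spark-submit expects for each cluster manager named above.
_SCHEMES = {
    "standalone": ("spark://", 7077),
    "mesos": ("mesos://", 5050),
    "kubernetes": ("k8s://https://", 443),
}


def master_url(manager, host=None, port=None):
    """Return the master URL string for the given cluster manager."""
    if manager == "local":
        return "local[*]"   # run locally, one worker thread per core
    if manager == "yarn":
        return "yarn"       # cluster location comes from the Hadoop config
    if manager not in _SCHEMES:
        raise ValueError(f"unknown cluster manager: {manager}")
    if host is None:
        raise ValueError(f"{manager} requires a host")
    prefix, default_port = _SCHEMES[manager]
    return f"{prefix}{host}:{port or default_port}"


def spark_submit_args(manager, app, host=None, port=None):
    """Build a spark-submit argument list for the given application file."""
    return ["spark-submit", "--master", master_url(manager, host, port), app]
```

For example, `spark_submit_args("standalone", "app.py", host="master.example.com")` yields `["spark-submit", "--master", "spark://master.example.com:7077", "app.py"]`. The application code stays the same across managers; only the master URL changes.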
Apache Spark, Spark, Apache, the Apache feather logo, and the Apache Spark project logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other
countries. See guidance on use of Apache Spark trademarks. All other marks mentioned may be trademarks or registered trademarks of their respective owners. Copyright © 2018 The Apache Software
Foundation, Licensed under the Apache License, Version 2.0.