What is Spark?

About Spark

Apache Spark is a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics. It was originally developed at UC Berkeley in 2009. Databricks was founded by the creators of Spark in 2013.

Spark Home Page

Spark provides easy-to-use APIs for operating on large datasets. These include a collection of over 80 operators for transforming data and familiar data frame APIs for manipulating semi-structured data.
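To give a feel for the operator style, here is a minimal plain-Python sketch of a word count built from three such operators. It does not use Spark itself: the helpers flat_map, reduce_by_key, and word_count are hypothetical stand-ins that work on an in-memory list, and the comment shows a roughly equivalent PySpark chain under the assumption that a SparkContext named sc already exists.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain


def flat_map(func, items):
    """Apply func to each item and flatten the results (mimics RDD.flatMap)."""
    return list(chain.from_iterable(func(item) for item in items))


def reduce_by_key(func, pairs):
    """Merge the values of each key with func (mimics RDD.reduceByKey)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


def word_count(lines):
    """Count word occurrences by chaining the operators above."""
    words = flat_map(lambda line: line.split(), lines)
    pairs = [(word.lower(), 1) for word in words]  # mimics RDD.map
    return reduce_by_key(lambda a, b: a + b, pairs)


# Roughly equivalent PySpark chain (assumes an existing SparkContext `sc`):
#   sc.parallelize(lines) \
#     .flatMap(lambda line: line.split()) \
#     .map(lambda word: (word.lower(), 1)) \
#     .reduceByKey(lambda a, b: a + b)

counts = word_count(["Spark is fast", "Spark is unified"])
```

In Spark the same chain would be evaluated lazily and in parallel across a cluster; this sketch only illustrates how the operators compose.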

Fast: Spark is engineered from the ground up for performance and can run up to 100x faster than Hadoop MapReduce by exploiting in-memory computing and other optimizations. Spark is fast on disk too; in 2014 it set a world record in large-scale on-disk sorting.

A Unified Engine: Spark is packaged with higher-level libraries, including support for SQL queries, streaming data, machine learning, and graph processing. These standard libraries increase developer productivity and can be seamlessly combined to create complex workflows.

Spark and Scala

Hortonworks Videos

Start here on YouTube