What is Spark?
satya - 4/9/2015, 8:21:12 PM
About Spark
Apache Spark is a powerful open source processing engine built around speed, ease of use, and sophisticated analytics. It was originally developed at UC Berkeley in 2009. Databricks was founded by the creators of Spark in 2013.
satya - 4/9/2015, 8:33:38 PM
Where Spark fits
satya - 4/9/2015, 8:35:29 PM
More on Spark
Spark provides easy-to-use APIs for operating on large datasets. This includes a collection of over 80 operators for transforming data and familiar data frame APIs for manipulating semi-structured data.
Fast: Spark is engineered from the ground up for performance and can run up to 100x faster than Hadoop MapReduce by exploiting in-memory computing and other optimizations. Spark is fast on disk too; it currently holds the world record in large-scale on-disk sorting.
A Unified Engine: Spark is packaged with higher level libraries, including support for SQL queries, streaming data, machine learning and graph processing. These standard libraries increase developer productivity and can be seamlessly combined to create complex workflows.
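To get a feel for what "operators for transforming data" means, here is a minimal sketch in plain Python. It is not Spark and not the PySpark API: LocalDataset, its methods, and word_count are hypothetical names made up for illustration, and everything runs on an in-memory list on one machine. The point is only the style, where small operators (flat map, map, filter, reduce by key) are chained into a pipeline, as in the classic word count example.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain


class LocalDataset:
    """Hypothetical single-machine stand-in for a distributed dataset (not Spark's API)."""

    def __init__(self, items):
        self.items = list(items)

    def flat_map(self, fn):
        # Apply fn to each element and flatten the resulting iterables.
        return LocalDataset(chain.from_iterable(fn(x) for x in self.items))

    def map(self, fn):
        # Transform each element independently.
        return LocalDataset(fn(x) for x in self.items)

    def filter(self, pred):
        # Keep only the elements for which pred returns True.
        return LocalDataset(x for x in self.items if pred(x))

    def reduce_by_key(self, fn):
        # Group (key, value) pairs by key, then fold each group's values with fn.
        groups = defaultdict(list)
        for key, value in self.items:
            groups[key].append(value)
        return LocalDataset((k, reduce(fn, vs)) for k, vs in groups.items())

    def collect(self):
        # Return the elements as an ordinary Python list.
        return list(self.items)


def word_count(lines):
    # Chain operators: split lines into words, pair each word with 1, sum per word.
    return dict(
        LocalDataset(lines)
        .flat_map(str.split)
        .map(lambda w: (w.lower(), 1))
        .reduce_by_key(lambda a, b: a + b)
        .collect()
    )


if __name__ == "__main__":
    print(word_count(["Spark is fast", "Spark is unified"]))
```

In a real engine the same chain would be planned and executed across a cluster rather than on one list; this sketch only shows how the operator style composes.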
satya - 6/4/2015, 2:24:18 PM
Understand what a modern data architecture for big data analytics looks like