Data parallel computing with Spark
Hands-on: Data analytics in Spark
Download the movie dataset.
Unzip the movie data file.
Open a terminal.
Activate the pyspark conda environment, then launch Jupyter Notebook:
$ conda activate pyspark
$ jupyter notebook
Create a new notebook using the pyspark kernel, then change the notebook's name to spark-2.
Copy the code from spark-1 to set up and launch a Spark application.
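The spark-1 code is not reproduced in this section. As a rough reminder of what such a setup cell might look like, the sketch below assumes a local-mode SparkSession. The helper name launch_spark, the application name, and the local[*] master are illustrative assumptions and may differ from what spark-1 actually uses.

```python
# Illustrative settings (assumptions, not taken from spark-1).
APP_NAME = "spark-2"   # shown in the Spark UI; here it mirrors the notebook name
MASTER = "local[*]"    # run Spark locally, using all available cores


def launch_spark(app_name=APP_NAME, master=MASTER):
    """Return a SparkSession, reusing an already running one if it exists."""
    # Imported here so the cell only needs PySpark when a session is launched;
    # this requires the pyspark conda environment activated earlier.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(master)
        .appName(app_name)
        .getOrCreate()
    )
```

In a notebook cell, spark = launch_spark() would then start the application, and spark.stop() shuts it down when the analysis is finished.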