PySpark Learn Journal 2
satya - 7/10/2020, 8:12:17 PM
Key ideas to reintroduce
1. Map Reduce
2. RDDs
3. Transformations
4. Actions
5. Data Frames
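A quick refresher on the first idea in the list, as a plain-Python analogy rather than Spark code. The sample lines are made up for illustration. Python's `map` is lazy, which loosely resembles a Spark transformation, while `reduce` forces the work to happen, which loosely resembles an action.

```python
from collections import Counter
from functools import reduce

# Made-up input, standing in for the lines of a text file
lines = ["to be or not to be", "that is the question"]

# "Map" step: each line becomes a Counter of its own words (nothing runs yet)
mapped = map(lambda line: Counter(line.split()), lines)

# "Reduce" step: merge the per-line counts into one total (this triggers the work)
totals = reduce(lambda a, b: a + b, mapped, Counter())

print(totals["to"], totals["be"])  # 2 2
```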
satya - 7/10/2020, 8:16:54 PM
What are the URLs for Spark
Data frames
satya - 7/11/2020, 12:13:48 PM
Write a Python program that demonstrates the following
Data types
Comparison Operators
if, elif, else statements
for Loops
while Loops
list comprehension
lambda expressions
map and filter
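One possible shape for that program, as a minimal sketch. All names and values here are invented for the exercise, and each section touches one item from the list above.

```python
# Data types
count = 3               # int
ratio = 0.5             # float
name = "spark"          # str
point = (1, 2)          # tuple
ages = {"ann": 31}      # dict

# Comparison operators produce booleans
is_small = count < 5 and ratio != 1.0


# if, elif, else statements
def describe(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"


# for loop: add up a list
total = 0
for n in [1, 2, 3, 4]:
    total += n

# while loop: count down from 3
countdown = []
i = 3
while i > 0:
    countdown.append(i)
    i -= 1

# list comprehension
squares = [n * n for n in range(5)]

# lambda expression
double = lambda x: x * 2

# map and filter (both are lazy, so wrap them in list() to see the values)
doubled = list(map(double, [1, 2, 3]))
evens = list(filter(lambda x: x % 2 == 0, range(6)))

print(describe(count), total, countdown, squares, doubled, evens)
```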
satya - 7/11/2020, 12:14:22 PM
Add a few more of my choosing on top to make it as comprehensive as possible
satya - 7/12/2020, 11:27:33 AM
Next steps
1. Diff between a spark session and a spark context
2. Run the sample of reading a file by typing defs
satya - 7/12/2020, 11:28:20 AM
Run the CSV file read on
1. windows pyspark
2. local notebook
3. aws notebook
satya - 7/12/2020, 11:37:16 AM
Topics to understand DF: Lesson 1
1. what is a sparksession
2. how do you get one
3. what methods are on a session
4. how to read a file (csv, json) using session into a data frame
5. how to show a data frame
6. how to see the schema of the data frame
7. How to explicitly set the schema for a file
satya - 7/12/2020, 11:44:03 AM
Local Jupyter on my Windows machine seems to work!
satya - 7/12/2020, 11:56:18 AM
How to get a session
#Python stuff for pyspark package
#pyspark is a package/module
from pyspark.sql import SparkSession
#Get the existing session if there is one, otherwise create a new one named "Basics"
spark = SparkSession.builder.appName("Basics").getOrCreate()
#Examine its type
print(type(spark))
satya - 7/12/2020, 6:56:01 PM
Submitting a PySpark program to run on Windows: a batch file, rs1.bat
@echo off
@rem the spark examples are at
@rem c:\satya\i\spark\examples\src\main\python
@rem notice the spark bin directory in its installation path
@rem *****************************************************
@rem this is how to submit a spark job using .py file
@rem example: rs1.cmd sonnet2.txt
@rem rs1.cmd : This batch file
@rem : pyspark program
@rem sonnet2.txt: input argument
@rem pwd: C:\satya\data\code\pyspark
@rem \
@rem \sonnet2.txt
@rem *****************************************************