EMR PySpark Batch Processing Project

Project Description:

In this project I implemented a batch processing pipeline using EMR and PySpark on fictional supermarket data. The following technologies were used:

Architecture

EMR Setup:

Running PySpark code on EMR:
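
One common way to run a PySpark script on an EMR cluster is to submit it as a step that calls spark-submit through command-runner.jar. The sketch below only builds such a step definition. The helper name `build_spark_step`, the S3 path, the step name, the `--date` argument, and the cluster id in the comment are all hypothetical placeholders, not values from this project.

```python
def build_spark_step(script_s3_uri, step_name="pyspark-batch-job", app_args=None):
    """Return an EMR step definition that runs a PySpark script via spark-submit.

    All names and paths passed in are illustrative assumptions.
    """
    # spark-submit arguments; cluster deploy mode runs the driver on the cluster
    args = ["spark-submit", "--deploy-mode", "cluster", script_s3_uri]
    # Any arguments for the script itself go after the script path
    args.extend(app_args or [])
    return {
        "Name": step_name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": args,
        },
    }


# Hypothetical script location and arguments
step = build_spark_step(
    "s3://example-bucket/scripts/job.py",
    app_args=["--date", "2024-01-01"],
)

# Submitting the step needs AWS credentials and a running cluster, so it is
# left commented out here (the cluster id is a placeholder):
# import boto3
# boto3.client("emr").add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=[step])
```

Building the step as a plain dictionary keeps the definition easy to inspect before anything is sent to AWS.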

Setting up an Athena table without Glue Crawler:
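
Without a Glue Crawler, the table schema has to be declared by hand, typically with a CREATE EXTERNAL TABLE statement that points at the data in S3. The following is a minimal sketch that generates such a statement. It assumes the pipeline output is stored as Parquet, and the helper name `build_create_table_ddl`, the database, table, column names, and bucket path are hypothetical examples rather than the project's real schema.

```python
def build_create_table_ddl(database, table, columns, s3_location):
    """Build an Athena CREATE EXTERNAL TABLE statement.

    columns is a list of (name, athena_type) pairs.
    Assumes the files under s3_location are Parquet (an assumption).
    """
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {cols}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical schema for illustration only
ddl = build_create_table_ddl(
    "supermarket_db",
    "sales",
    [("invoice_id", "string"), ("quantity", "int"), ("total", "double")],
    "s3://example-bucket/output/sales/",
)
print(ddl)

# Running the statement needs AWS credentials, so it is left commented out
# (the results bucket is a placeholder):
# import boto3
# boto3.client("athena").start_query_execution(
#     QueryString=ddl,
#     ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
# )
```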

Athena bulk add columns:
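
Athena accepts several columns in a single ALTER TABLE ... ADD COLUMNS statement, so a long list of new columns can be added in one query instead of one at a time. The sketch below generates that statement from a list of column definitions. The helper name `build_add_columns_ddl` and the table and column names are hypothetical, and the database is assumed to be selected through the query execution context rather than in the statement itself.

```python
def build_add_columns_ddl(table, new_columns):
    """Build one ALTER TABLE ... ADD COLUMNS statement for many columns.

    new_columns is a list of (name, athena_type) pairs.
    """
    if not new_columns:
        raise ValueError("new_columns must contain at least one column")
    cols = ", ".join(f"`{name}` {dtype}" for name, dtype in new_columns)
    return f"ALTER TABLE `{table}` ADD COLUMNS ({cols})"


# Hypothetical columns for illustration only
alter_ddl = build_add_columns_ddl(
    "sales",
    [("discount", "double"), ("store_city", "string"), ("loyalty_member", "boolean")],
)
print(alter_ddl)
```

Generating the statement from a list makes it straightforward to feed in column names from a file or from a DataFrame schema.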