Knowledge Hive

Spark with Python

About Course

Apache Spark is an open source processing engine built around speed, ease of use, and analytics, and PySpark is its Python interface. It is a good fit when you have large amounts of data that require low-latency processing that a typical MapReduce program cannot provide.

Apache Spark is written in the Scala programming language. To support Python with Spark, the Apache Spark community released a tool called PySpark. Using PySpark, you can work with RDDs (Resilient Distributed Datasets) in Python as well.

What Will You Learn?

  • Batch Analytics
  • Real Time Analytics Options
  • Spark Architecture
  • In Memory Data – Spark Working With RDD
  • RDDs
  • Transformations in RDD
  • Actions in RDD
  • Loading Data in RDD
  • Saving Data through RDD
  • Spark Properties
  • Spark UI
  • Spark Partitioning / Parallelism
  • Logging in Spark
  • Checkpoints in Spark
  • Key-Value Pair RDD
  • Structured data with Spark SQL
  • SQL with Apache Hive
  • Loading and Saving Data using Hive
  • Data Frames [Creating and Querying]
  • Using Spark SQL in Applications
      • With Hive
      • With MySQL