Pune | Karad: +91 82 82 82 9806

Apache Spark Mastery - Data Engineering with PySpark

Course Name : Apache Spark Mastery - Data Engineering with PySpark

Batch Schedule : 16-Aug-2025 to 17-Sep-2025

Schedule : Mon-Sat

Duration : 50 hrs.

Timings : 7:00 PM to 9:00 PM

Fees : INR 12400/- (incl. 18% GST); regular fee INR 14900/-

Target Audience : Data Engineers, Python Developers, Freshers


Section 1: Spark Architecture & Internals

- Distributed Computing Fundamentals

  - RDD lineage, DAG scheduler, lazy evaluation

  - Cluster managers overview

- Spark 4.x Updates

  - Adaptive Query Execution (AQE) enhancements

  - Catalyst optimizer improvements

- Performance Tuning

  - Joins

  - Partitioning, broadcast variables

  - Memory management

Section 2: PySpark DataFrames & SQL

- Data Manipulation

  - Complex types (JSON, arrays, maps)

  - Window functions, pivot tables, UDFs/Pandas UDFs

- Spark SQL Deep Dive

  - Temp views, catalog API, Hive metastore integration

  - SQL syntax for Delta Lake operations

- Execution Plans

  - Reading `explain()` output

  - Predicate pushdown, partition pruning

Section 3: Incremental Data Processing & Apache Kafka

- Structured Streaming

  - Event-time processing, watermarking, state management

  - Kafka integration (source/sink)

- Delta Lake Essentials

  - ACID transactions

  - Schema evolution

Section 4: Spark Optimizations

- Catalyst Internals

  - Logical vs. physical plans

  - Custom optimization extensions

- Performance Best Practices

  - File formats (Parquet/Delta)

  - Resource allocation (executors/cores)

Section 5: Databricks Lakehouse Platform

- Lakehouse fundamentals

- Workspace Navigation

  - DBFS, clusters, notebooks

- Delta Lake UI

  - Viewing table history/schema

- Data Governance

  - Unity Catalog basics (no Admin tasks)

Section 6: Apache Kafka Fundamentals  

- Architecture

  - Brokers, topics, partitions, consumer groups

- Spark-Kafka Integration

  - Structured Streaming with Kafka

  - Job execution

Section 7: Spark ML Introduction

- MLlib Workflow

  - Transformers vs. estimators, pipelines

  - Feature engineering (VectorAssembler, StringIndexer)

- Model Training

  - Regression demo (no hyperparameter tuning)

Section 8: Capstone Project

- Pipeline implementation

- Domain Examples: IoT monitoring, retail analytics

 

Prerequisites

1. Python: Language Fundamentals, Functions, Collections, Pandas, ...

2. SQL: CRUD Operations, Group By, Joins, Analytical queries, …

3. Good to have: Linux basics, Hadoop/Hive knowledge

Software Requirements

- Local Installation: Spark 4.x, Java 17 or later, Python 3.10

- Cloud: Databricks Community/Free Edition

Course Outcomes

- Master PySpark DataFrames and SQL for batch and stream processing
- Build optimized pipelines using Catalyst insights
- Understand Spark job execution internals
- Understand Apache Kafka and integrate it with Spark
- Implement a hands-on capstone project
- Gain certification-ready skills

Course Scope and Limitations

1. Developer-Centric Focus:

- Covers PySpark application development (coding, debugging, optimization).

- Excludes: Cluster administration, infrastructure setup (YARN/K8s), or Spark cluster tuning.

2. Machine Learning Scope:

- Only introductory-level Spark ML (pipeline structure, basic concepts).

- Excludes: Advanced ML concepts (hyperparameter tuning, etc.), DL frameworks, or MLOps.

3. Language & Environment:

- PySpark (Python API) only – Scala/Java/R APIs not covered.

- Databricks usage focuses on developer work, not account/admin management.

4. Kafka Integration:

- Covers Spark as a Kafka consumer/producer. Excludes: Production Kafka cluster setup, security, or the Streams API.

5. Infrastructure Assumptions:

- All labs use local/standalone mode or Databricks Community Edition.

Batch Schedule

| Sr. No. | Batch Code | Start Date | End Date | Time |
|---|---|---|---|---|
| 1 | Spark-O-04 | 16-Aug-2025 | 17-Sep-2025 | 7:00 PM to 9:00 PM |

Schedule : Mon-Sat


Contact us

Sunbeam Market Yard Pune

'Sunbeam Chambers', Plot No. R/2, Market Yard Road, Behind Hotel Fulora, Gultekdi, Pune - 411 037, MH-INDIA.

+91 82 82 82 9806

Sunbeam Hinjawadi Pune

"Sunbeam IT Park", Second Floor, Phase 2 of Rajiv Gandhi Infotech Park, Hinjawadi, Pune - 411 057, MH-INDIA.

+91 82 82 82 9806