•   Pune: +91 82 82 82 9806

Big Data - Apache Spark

Course Name : Big Data - Apache Spark

Batch Schedule : 02-Nov-2019 to 11-Jan-2020

Schedule : Saturday Only (8:00 am to 2:00 pm)

Duration : 65 hours - 11 Saturdays

Timings : 8:00 AM to 2:00 PM

Fees : Rs. 14000/- (Incl 18% GST)

Target audience
  • Students and freshers.
  • Professionals looking to switch to the Big Data / Spark developer stream.
Prerequisites
  • Linux commands familiarity
  • Any RDBMS (like Oracle or MySQL)
  • Python3 programming skills
  • Java programming awareness (for Hadoop MR demos)
  • XML awareness
Hardware and software requirements
  • Core i3 (64-bit) and above
  • RAM: minimum 8 GB; 16 GB or more recommended.
  • 64-bit Linux – Ubuntu.
Topics not covered
  • Data science (math/statistics). However, implementing statistical formulae in a Spark job will be covered.
  • Machine Learning. However, a simple ML program using Spark MLlib will be demonstrated.
  • Hadoop administration. However, some basic and performance-related configuration will be discussed.
  • Spark administration. However, some basic and performance-related configuration will be discussed.
  • Spark cluster on cloud. However, a multi-node cluster with minimal configuration will be covered.
  • The Python3 programming language itself. However, all Spark programming will be done in Python3.
  • Reporting and visualization tools.
Syllabus
  • Hadoop 2.x
    • Hadoop installation modes
    • Setting up Hadoop cluster
    • HDFS Java API
    • Implementing MR jobs
    • Parsing MR job args
    • Hadoop data types & custom writables
    • Job counters & configuration
    • Input Splits
    • Input/Output formats, Compression
    • Partitioner & Combiners
    • Hadoop Streaming
    • MR Job execution on YARN  
  • Hive
    • Hive introduction, architecture, installation
    • Hive CLI, Security, Beeline, Metastore & Derby
    • Hive managed & external tables
    • Hive QL: Loading, Filtering, Grouping, Joins
    • Hive simple & complex types, DDL, DML, DQL
    • Hive indexes, views, query optimizations
    • Hive serialization / deserialization, Loading data
    • Partitioning: static & dynamic – use cases
    • Bucketing, use cases of Partitions & Buckets
    • Hive functions, operators and Hive UDF impl.
    • Thrift server, Java/JDBC connectivity  
  • Apache Spark 2
    • Spark concepts
    • Distributed Computing Challenges
    • Spark Architecture & Components
    • Spark Installation & Deployment
    • Setting up Spark cluster
    • PySpark concepts
    • PySpark Shell
    • PySpark installation
    • Executing Spark Python programs
    • Spark Web UI
    • Spark in Pycharm IDE
    • Spark on Databricks cloud
  • Apache Spark 2 - Spark Core
    • Spark RDD, Transformations & Actions, Data Load & Save
    • RDD characteristics & execution
    • Types of RDD: Key-value, Two Pair, ...
    • Accumulators & Broadcast variables
    • RDD Internals: Distributed/Partitions, Lineage, Persistence
    • Implementing & Submitting Spark Job
    • Execution of Spark Job (RDD)
    • DAG visualization
  • Apache Spark 2 - Spark SQL
    • Spark SQL Introduction
    • Architecture
    • SQLContext & SparkSession
    • Data Frames & Datasets
    • Data Frame Columns & Expressions
    • Implementing & Executing Spark SQL job
    • Interoperating with RDDs
    • User Defined Functions
    • File Formats & Loading data
    • Spark SQL data types & schema
    • Spark SQL functions
    • UDFs & their execution
    • Global/Temporary views
    • Partitioning & Bucketing
    • SQLContext & HiveContext
    • Processing Hive data using Spark SQL
  • Apache Spark 2 - Spark Streaming
    • Streaming concepts
    • Microbatches vs Continuous job
    • Spark Streaming concepts
    • Streaming Context & DStreams
    • Transformations on DStreams
    • Windowing concept, windowed operators (Slice, Window, ReduceByWindow), stateful operators
    • Twitter data processing
    • Spark Structured Streaming concepts
    • Triggers, Event time based processing & Watermark
    • Input sources & output sinks
    • Structured Streaming application execution
    • Apache Kafka Introduction
    • Kafka Architecture
    • Kafka Cluster Components & Configuration
    • Kafka Applications
    • Kafka Python client
    • Kafka Spark Source & Sink
  • Apache Spark 2 - Spark ML Introduction
    • Advanced Analytics concepts
    • Advanced Analytics workflow
    • Spark Machine Learning concepts
    • Transformers, Estimators & Models
    • Implementing an ML model using MLlib
    • Consuming Spark ML model

Nilesh sir taught us very well. I'm very lucky to be a student of Nilesh Sir. Thankful for that. Sir, please provide some more project ideas and some more assignments also. 


This course almost had very good coverage of technologies used in big data engineering in addition to Spark. I look forward to similar weekend classes.

Sr. No.   Batch Code   Start Date    End Date      Time
1         Spark03      02-Nov-2019   11-Jan-2020   8:00 AM to 2:00 PM



Contact us

Sunbeam Market Yard Pune

'Sunbeam Chambers', Plot No. R/2, Market Yard Road, Behind Hotel Fulora, Gultekdi, Pune - 411 037. MH-INDIA.

+91 82 82 82 9806
Sunbeam Hinjawadi Pune

"Sunbeam IT Park", Second Floor, Phase 2 of Rajiv Gandhi Infotech Park, Hinjawadi, Pune - 411057, MH-INDIA

+91 82 82 82 9806