Spark for Developers Training Course

OBJECTIVE:

This course introduces Apache Spark. Students will learn how Spark fits into the Big Data ecosystem and how to use Spark for data analysis. The course covers the Spark shell for interactive data analysis, Spark internals, the Spark APIs, Spark SQL, Spark Streaming, machine learning with MLlib, and GraphX.

AUDIENCE:

Developers / Data Analysts

Course Outline

Scala primer
- A quick introduction to Scala
- Labs : Getting to know Scala
Spark Basics
- Background and history
- Spark and Hadoop
- Spark concepts and architecture
- Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
- Labs : Installing and running Spark
First Look at Spark
- Running Spark in local mode
- Spark web UI
- Spark shell
- Analyzing dataset – part 1
- Inspecting RDDs
- Labs: Spark shell exploration
RDDs
- RDD concepts
- Partitions
- RDD Operations / transformations
- RDD types
- Key-Value pair RDDs
- MapReduce on RDD
- Caching and persistence
- Labs : creating & inspecting RDDs; Caching RDDs
Spark API programming
- Introduction to Spark API / RDD API
- Submitting the first program to Spark
- Debugging / logging
- Configuration properties
- Labs : Programming in Spark API, Submitting jobs
Spark SQL
- SQL support in Spark
- DataFrames
- Defining tables and importing datasets
- Querying DataFrames using SQL
- Storage formats : JSON / Parquet
- Labs : Creating and querying DataFrames; evaluating data formats
MLlib
- MLlib intro
- MLlib algorithms
- Labs : Writing MLlib applications
GraphX
- GraphX library overview
- GraphX APIs
- Labs : Processing graph data using Spark
Spark Streaming
- Streaming overview
- Evaluating Streaming platforms
- Streaming operations
- Sliding window operations
- Labs : Writing Spark Streaming applications
Spark and Hadoop
- Hadoop Intro (HDFS / YARN)
- Hadoop + Spark architecture
- Running Spark on Hadoop YARN
- Processing HDFS files using Spark
Spark Performance and Tuning
- Broadcast variables
- Accumulators
- Memory management & caching
Spark Operations
- Deploying Spark in production
- Sample deployment templates
- Configurations
- Monitoring
- Troubleshooting

Requirements

PRE-REQUISITES

Familiarity with Java, Scala, or Python (labs are in Scala and Python)
Basic understanding of the Linux development environment (command-line navigation, editing files with vi or nano)
