Duration
21 hours (usually 3 days including breaks)
Requirements
- General programming skills
Audience
- IT Professionals
- Data Scientists
Python is a high-level programming language known for its clear syntax and code readability. Spark is a data processing engine used for querying, analyzing, and transforming big data. PySpark is the Python API for Spark, which lets users drive Spark from Python code.
In this instructor-led, live training, participants will learn how to use Python and Spark together to analyze big data as they work on hands-on exercises.
By the end of this training, participants will be able to:
- Use Spark with Python to analyze big data.
- Work through exercises that mimic real-world circumstances.
- Use different tools and techniques for big data analysis using PySpark.
Format of the course
- Part lecture, part discussion, exercises and heavy hands-on practice
Understanding Big Data
Overview of Spark
Overview of Python
Overview of PySpark
- Distributing Data Using the Resilient Distributed Datasets (RDD) Framework
- Distributing Computation Using Spark API Operators
Setting Up Python with Spark
Setting Up PySpark
Using Amazon Web Services (AWS) EC2 Instances for Spark
Setting Up Databricks
Setting Up the AWS EMR Cluster
Learning the Basics of Python Programming
- Getting Started with Python
- Using the Jupyter Notebook
- Using Variables and Simple Data Types
- Working with Lists
- Using if Statements
- Using User Inputs
- Working with while Loops
- Implementing Functions
- Working with Classes
- Working with Files and Exceptions
- Working with Projects, Data, and APIs
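As a rough illustration of several of the Python basics listed above (lists, if filters, while loops, functions, classes, files, and exceptions), here is a minimal sketch. The `Inventory` class, the `load_inventory` helper, and the sample data are hypothetical names made up for this example; they are not part of the course material.

```python
import json
import tempfile
from pathlib import Path


class Inventory:
    """Tiny class wrapping a dict of item name -> quantity (hypothetical example)."""

    def __init__(self):
        self.items = {}

    def add(self, name, quantity=1):
        # Accumulate quantities for items that were already seen
        self.items[name] = self.items.get(name, 0) + quantity

    def low_stock(self, threshold):
        # Generator expression with an if filter, sorted for stable output
        return sorted(name for name, qty in self.items.items() if qty < threshold)


def load_inventory(path):
    """Read a JSON file into an Inventory; return an empty one if the file is missing."""
    inventory = Inventory()
    try:
        data = json.loads(Path(path).read_text())
    except FileNotFoundError:
        return inventory
    for name, qty in data.items():
        inventory.add(name, qty)
    return inventory


# while loop: drain a list of pending (name, quantity) orders
pending = [("pen", 3), ("ink", 1), ("pen", 2)]
stock = Inventory()
while pending:
    name, qty = pending.pop(0)
    stock.add(name, qty)

# Files and exceptions: write the stock to disk, then read it back
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stock.json"
    path.write_text(json.dumps(stock.items))
    reloaded = load_inventory(path)
    missing = load_inventory(Path(tmp) / "absent.json")
```

The missing-file case returns an empty inventory instead of raising, which is one possible design choice; re-raising the exception would be equally reasonable in other contexts.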
Learning the Basics of Spark DataFrame
- Getting Started with Spark DataFrames
- Implementing Basic Operations with Spark
- Using Groupby and Aggregate Operations
- Working with Timestamps and Dates
Working on a Spark DataFrame Project Exercise
Understanding Machine Learning with MLlib
Working with MLlib, Spark, and Python for Machine Learning
- Learning Linear Regression Theory
- Implementing Regression Evaluation Code
- Working on a Sample Linear Regression Exercise
- Learning Logistic Regression Theory
- Implementing Logistic Regression Code
- Working on a Sample Logistic Regression Exercise
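To make the regression evaluation item above more concrete, the sketch below computes three common regression metrics (MAE, RMSE, and R²) in plain Python. It is not MLlib code: MLlib's `RegressionEvaluator` can report the same metrics on a DataFrame of predictions, and this pure-Python version only shows what those numbers mean. The function name `regression_metrics` and the sample values are assumptions made for this example.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE, and R^2 for two equal-length lists of numbers."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")

    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]

    # Mean absolute error: average size of the errors
    mae = sum(abs(r) for r in residuals) / n

    # Root mean squared error: penalizes large errors more heavily
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    # R^2: share of the variance in `actual` explained by the predictions
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")

    return {"mae": mae, "rmse": rmse, "r2": r2}


# Hypothetical labels and model predictions
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
```

For this sample the residuals are 1, 0, -1, and 0, so the MAE is 0.5 and R² is 0.9.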
Understanding Random Forests and Decision Trees
- Learning Tree Methods Theory
- Implementing Decision Tree and Random Forest Code
- Working on a Sample Random Forest Classification Exercise
Working with K-means Clustering
- Understanding K-means Clustering Theory
- Implementing K-means Clustering Code
- Working on a Sample Clustering Exercise
Working with Recommender Systems
Implementing Natural Language Processing
- Understanding Natural Language Processing (NLP)
- Overview of NLP Tools
- Working on a Sample NLP Exercise
Streaming with Spark on Python
- Overview of Streaming with Spark
- Working on a Sample Spark Streaming Exercise