Apache Spark MLlib Training Course

Course Code

spmllib

Duration

35 hours (usually 5 days including breaks)

Requirements

Knowledge of one of the following:

  • Java
  • Scala
  • Python
  • R (via SparkR)

Overview

MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, as well as lower-level optimization primitives and higher-level pipeline APIs.

It is divided into two packages:

  • spark.mllib contains the original API built on top of RDDs.

  • spark.ml provides a higher-level API built on top of DataFrames for constructing ML pipelines.

 

Audience

This course is aimed at engineers and developers who want to use MLlib, the machine learning library built into Apache Spark.

Course Outline

spark.mllib: data types, algorithms, and utilities

  • Data types
  • Basic statistics
    • summary statistics
    • correlations
    • stratified sampling
    • hypothesis testing
    • streaming significance testing
    • random data generation
  • Classification and regression
    • linear models (SVMs, logistic regression, linear regression)
    • naive Bayes
    • decision trees
    • ensembles of trees (Random Forests and Gradient-Boosted Trees)
    • isotonic regression
  • Collaborative filtering
    • alternating least squares (ALS)
  • Clustering
    • k-means
    • Gaussian mixture
    • power iteration clustering (PIC)
    • latent Dirichlet allocation (LDA)
    • bisecting k-means
    • streaming k-means
  • Dimensionality reduction
    • singular value decomposition (SVD)
    • principal component analysis (PCA)
  • Feature extraction and transformation
  • Frequent pattern mining
    • FP-growth
    • association rules
    • PrefixSpan
  • Evaluation metrics
  • PMML model export
  • Optimization (developer)
    • stochastic gradient descent
    • limited-memory BFGS (L-BFGS)

spark.ml: high-level APIs for ML pipelines

  • Overview: estimators, transformers and pipelines
  • Extracting, transforming and selecting features
  • Classification and regression
  • Clustering
  • Advanced topics
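
The estimator, transformer and pipeline concepts listed above can be illustrated with a small, framework-free sketch. This is plain Python, not Spark code: every class name below (Transformer, Estimator, StandardScalerSketch, ThresholdFlagger, Pipeline, PipelineModel) is a hypothetical stand-in written for illustration, and rows are plain dictionaries rather than DataFrames. It only aims to show the general idea that an estimator learns from data and yields a transformer, and that a pipeline chains such stages in order.

```python
# Conceptual sketch only: these classes are hypothetical and are NOT Spark's API.
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


class Transformer:
    """Maps a dataset to a new dataset; learns nothing from the data."""

    def transform(self, rows: List[dict]) -> List[dict]:
        raise NotImplementedError


class Estimator:
    """Learns from data in fit() and returns a fitted Transformer (a model)."""

    def fit(self, rows: List[dict]) -> Transformer:
        raise NotImplementedError


@dataclass
class ScalerModel(Transformer):
    """Fitted model: holds the statistics learned during fit()."""

    mu: float
    sigma: float

    def transform(self, rows):
        return [{**r, "scaled": (r["x"] - self.mu) / self.sigma} for r in rows]


class StandardScalerSketch(Estimator):
    """Estimator: computes the mean and standard deviation of column 'x'."""

    def fit(self, rows):
        xs = [r["x"] for r in rows]
        sigma = pstdev(xs) or 1.0  # fall back to 1.0 to avoid dividing by zero
        return ScalerModel(mean(xs), sigma)


class ThresholdFlagger(Transformer):
    """Plain transformer: flags rows whose scaled value exceeds a threshold."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    def transform(self, rows):
        return [{**r, "flag": r["scaled"] > self.threshold} for r in rows]


class PipelineModel(Transformer):
    """Fitted pipeline: applies each fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    """Chains stages; fits each estimator on the output of the previous stages."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(rows)  # estimator becomes a fitted transformer
            rows = stage.transform(rows)  # feed transformed data to the next stage
            fitted.append(stage)
        return PipelineModel(fitted)


train = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
model = Pipeline([StandardScalerSketch(), ThresholdFlagger(0.5)]).fit(train)
result = model.transform([{"x": 3.0}, {"x": 2.0}])
```

Fitting the pipeline once and then reusing the returned model on new rows mirrors the split between training and prediction: the statistics learned from the training data are frozen inside the fitted stages.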

Bookings, Prices and Enquiries

Private Classroom

Private Remote

From $11250 (332)

Public Classroom

Location | Date | Course Price [Remote / Classroom]
VA, Reston - Sunrise Valley | 2018-11-05 09:30 | $11250 / $13150
New York (NYC) - Midtown Manhattan - Madison & E38-39th | 2018-11-05 09:30 | $11250 / $14550
CT, Shelton - Shelton | 2018-11-05 09:30 | $11250 / $13050
CT, Stamford - Downtown | 2018-11-05 09:30 | $11250 / $13300
AZ, Phoenix - Central Ave Suite 1400 | 2018-11-05 09:30 | $11250 / $14050
SC, Charleston - Meeting Street | 2018-11-05 09:30 | $11250 / $14050
DC, Washington - Street Northwest | 2018-11-05 09:30 | $11250 / $14050
VA, Fredericksburg - Central Park Corporate Center | 2018-11-05 09:30 | $11250 / $13550
FL, Fort Lauderdale - Downtown | 2018-11-05 09:30 | $11250 / $13400
NC, Charlotte - Charlotte City Center | 2018-11-05 09:30 | $11250 / $13300
IA, Des Moines - Hub Tower | 2018-11-05 09:30 | $11250 / $13050
FL, Miami Beach - Miami Beach | 2018-11-05 09:30 | $11250 / $13300
MS, Flowood - Market Street | 2018-11-05 09:30 | $11250 / $13050
GA, Atlanta - Proscenium | 2018-11-05 09:30 | $11250 / $13550
NY, Long Island - New Hyde Park | 2018-11-05 09:30 | $11250 / $14050
NJ, Jersey City - Harborside Financial Center | 2018-11-05 09:30 | $11250 / $14050
Atlanta, GA - One West Court Square | 2018-11-05 09:30 | $11250 / $13550
GA, Atlanta - Downtown 260 Peachtree | 2018-11-05 09:30 | $11250 / $13950
CA, San Francisco - Golden Gate - 75 Broadway | 2018-11-05 09:30 | $11250 / $13450
NY, Syracuse | 2018-11-05 09:30 | $11250 / $12800
