Course Outline
spark.mllib: data types, algorithms, and utilities
- Data types
- Basic statistics
  - summary statistics
  - correlations
  - stratified sampling
  - hypothesis testing
  - streaming significance testing
  - random data generation
- Classification and regression
  - linear models (SVMs, logistic regression, linear regression)
  - naive Bayes
  - decision trees
  - ensembles of trees (Random Forests and Gradient-Boosted Trees)
  - isotonic regression
- Collaborative filtering
  - alternating least squares (ALS)
- Clustering
  - k-means
  - Gaussian mixture
  - power iteration clustering (PIC)
  - latent Dirichlet allocation (LDA)
  - bisecting k-means
  - streaming k-means
- Dimensionality reduction
  - singular value decomposition (SVD)
  - principal component analysis (PCA)
- Feature extraction and transformation
- Frequent pattern mining
  - FP-growth
  - association rules
  - PrefixSpan
- Evaluation metrics
- PMML model export
- Optimization (developer)
  - stochastic gradient descent
  - limited-memory BFGS (L-BFGS)
spark.ml: high-level APIs for ML pipelines
- Overview: estimators, transformers and pipelines
- Extracting, transforming and selecting features
- Classification and regression
- Clustering
- Advanced topics
Requirements
Knowledge of one of the following:
- Java
- Scala
- Python
- SparkR
Testimonials (9)
This is one of the best hands-on programming courses with exercises I have ever taken.
Laura Kahn
Course - Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP
Having hands-on sessions/assignments
Poornima Chenthamarakshan - Intelligent Medical Objects
Course - Apache Spark in the Cloud
I liked the VM very much. The teacher was very knowledgeable about the topic as well as other topics, and he was very nice and friendly. I liked the facility in Dubai.
Safar Alqahtani - Elm Information Security
Course - Big Data Analytics in Health
Sufficient hands-on practice; the trainer is knowledgeable.
Chris Tan
Course - A Practical Introduction to Stream Processing
very interactive...
Richard Langford
Course - SMACK Stack for Data Science
Commitment and willingness to explain secondary topics.
Marek - Krajowy Rejestr Długów Biuro Informacji Gospodarczej S.A.
Course - Apache Spark Fundamentals
Machine Translated
The combination of theory and practice with tools such as Databricks
Graciela Saud - Servicio de Impuestos Internos
Course - Spark for Developers
Exercises and exchanges during questions/answers
Antoine - Physiobotic
Course - Scaling Data Pipelines with Spark NLP
Machine Translated
The fact that we were able to take with us most of the information, course materials, presentations and exercises, so that we can review them and perhaps redo what we didn't understand the first time or improve what we had already done.