Real-world, practical application of statistical modeling; keeping theory at bay and business needs in the forefront.

Thanks Bernard and Iza. It was a very good course and I appreciate your work.

*Shelly Kunkle - Michelin, North America*

R Programming Language, R Software Environment for statistical computing and graphics courses

Code | Name | Duration | Overview |
---|---|---|---|

dsbda | Data Science for Big Data Analytics | 35 hours | Introduction to Data Science for Big Data Analytics Data Science Overview Big Data Overview Data Structures Drivers and complexities of Big Data Big Data ecosystem and a new approach to analytics Key technologies in Big Data Data Mining process and problems Association Pattern Mining Data Clustering Outlier Detection Data Classification Introduction to Data Analytics lifecycle Discovery Data preparation Model planning Model building Presentation/Communication of results Operationalization Exercise: Case study From this point most of the training time (80%) will be spent on examples and exercises in R and related big data technology. Getting started with R Installing R and RStudio Features of R language Objects in R Data in R Data manipulation Big data issues Exercises Getting started with Hadoop Installing Hadoop Understanding Hadoop modes HDFS MapReduce architecture Hadoop related projects overview Writing programs in Hadoop MapReduce Exercises Integrating R and Hadoop with RHadoop Components of RHadoop Installing RHadoop and connecting with Hadoop The architecture of RHadoop Hadoop streaming with R Data analytics problem solving with RHadoop Exercises Pre-processing and preparing data Data preparation steps Feature extraction Data cleaning Data integration and transformation Data reduction – sampling, feature subset selection, Dimensionality reduction Discretization and binning Exercises and Case study Exploratory data analytic methods in R Descriptive statistics Exploratory data analysis Visualization – preliminary steps Visualizing single variable Examining multiple variables Statistical methods for evaluation Hypothesis testing Exercises and Case study Data Visualizations Basic visualizations in R Packages for data visualization ggplot2, lattice, plotly Formatting plots in R Advanced graphs Exercises Regression (Estimating future values) Linear regression Use cases Model description Diagnostics
Problems with linear regression Shrinkage methods, ridge regression, the lasso Generalizations and nonlinearity Regression splines Local polynomial regression Generalized additive models Regression with RHadoop Exercises and Case study Classification The classification related problems Bayesian refresher Naïve Bayes Logistic regression K-nearest neighbors Decision trees algorithm Neural networks Support vector machines Diagnostics of classifiers Comparison of classification methods Scalable classification algorithms Exercises and Case study Assessing model performance and selection Bias, Variance and model complexity Accuracy vs Interpretability Evaluating classifiers Measures of model/algorithm performance Hold-out method of validation Cross-validation Tuning machine learning algorithms with caret package Visualizing model performance with Profit, ROC and Lift curves Ensemble Methods Bagging Random Forests Boosting Gradient boosting Exercises and Case study Support vector machines for classification and regression Maximal Margin classifiers Support vector classifiers Support vector machines SVMs for classification problems SVMs for regression problems Exercises and Case study Identifying unknown groupings within a data set Feature Selection for Clustering Representative based algorithms: k-means, k-medoids Hierarchical algorithms: agglomerative and divisive methods Probabilistic based algorithms: EM Density based algorithms: DBSCAN, DENCLUE Cluster validation Advanced clustering concepts Clustering with RHadoop Exercises and Case study Discovering connections with Link Analysis Link analysis concepts Metrics for analyzing networks The PageRank algorithm Hyperlink-Induced Topic Search Link Prediction Exercises and Case study Association Pattern Mining Frequent Pattern Mining Model Scalability issues in frequent pattern mining Brute Force algorithms Apriori algorithm The FP growth approach Evaluation of Candidate Rules Applications of Association Rules Validation
and Testing Diagnostics Association rules with R and Hadoop Exercises and Case study Constructing recommendation engines Understanding recommender systems Data mining techniques used in recommender systems Recommender systems with recommenderlab package Evaluating the recommender systems Recommendations with RHadoop Exercise: Building recommendation engine Text analysis Text analysis steps Collecting raw text Bag of words Term Frequency–Inverse Document Frequency Determining Sentiments Exercises and Case study |

datamodeling | Pattern Recognition | 35 hours | This course provides an introduction to the field of pattern recognition and machine learning. It also touches on practical applications in statistics, computer science, signal processing, computer vision, data mining, and bioinformatics. The course is interactive and includes plenty of hands-on exercises, continuous feedback, and testing of knowledge and skills acquired. Audience Data analysts, PhD students, researchers and practitioners Introduction Probability theory, model selection, decision and information theory Probability distributions Linear models for regression and classification Neural networks Kernel methods Sparse kernel machines Graphical models Mixture models and EM Approximate inference Sampling methods Continuous latent variables Sequential data Combining models |

kdd | Knowledge Discovery in Databases (KDD) | 21 hours | Knowledge discovery in databases (KDD) is the process of discovering useful knowledge from a collection of data. Real-life applications for this data mining technique include marketing, fraud detection, telecommunications and manufacturing. In this course, we introduce the processes involved in KDD and carry out a series of exercises to practice the implementation of those processes. Audience Data analysts or anyone interested in learning how to interpret data to solve problems Format of the course After a theoretical discussion of KDD, the instructor will present real-life cases which call for the application of KDD to solve a problem. Participants will prepare, select and cleanse sample data sets and use their prior knowledge about the data to propose solutions based on the results of their observations. Introduction KDD vs data mining Establishing the application domain Establishing relevant prior knowledge Understanding the goal of the investigation Creating a target data set Data cleaning and preprocessing Data reduction and projection Choosing the data mining task Choosing the data mining algorithms Interpreting the mined patterns |

nlpwithr | Natural Language Processing (NLP) with R | 21 hours | It is estimated that unstructured data accounts for more than 90 percent of all data, much of it in the form of text. Blog posts, tweets, social media, and other digital publications continuously add to this growing body of data. This course centers around extracting insights and meaning from this data. Utilizing the R Language and Natural Language Processing (NLP) libraries, we combine concepts and techniques from computer science, artificial intelligence, and computational linguistics to algorithmically understand the meaning behind text data. Data samples are in English or Mandarin (普通话). Other languages can be made available if agreed before booking. By the end of the class participants will be able to prepare data sets (large and small) from disparate sources, then apply the right algorithms to analyze and report on their significance. Audience Linguists and programmers Format of the course Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding Introduction NLP and R vs Python Installing and configuring RStudio Installing R packages related to Natural Language Processing (NLP) An overview of R’s text manipulation capabilities Getting started with an NLP project in R Reading and importing data files into R Text manipulation with R Document clustering in R Parts of speech tagging in R Sentence parsing in R Working with regular expressions in R Named-entity recognition in R Topic modeling in R Text classification in R Working with very large data sets Visualizing your results Optimization Integrating R with other languages (Java, Python, etc.) Closing remarks |

BigData_ | A practical introduction to Data Analysis and Big Data | 28 hours | Participants who complete this training will gain a practical, real-world understanding of Big Data and its related technologies, methodologies and tools. Participants will have the opportunity to put this knowledge into practice through hands-on exercises. Group interaction and instructor feedback make up an important component of the class. The course starts with an introduction to elemental concepts of Big Data, then progresses into the programming languages and methodologies used to perform Data Analysis. Finally, we discuss the tools that enable Big Data storage, Distributed Processing, and Scalability. Audience Developers / programmers IT consultants Format of the course Part lecture, part discussion, heavy hands-on practice and implementation, occasional quizzing to measure progress. Introduction to Data Analysis and Big Data What makes Big Data "big"? Velocity, Volume, Variety, Veracity (VVVV) Limits to traditional Data Processing Distributed Processing Statistical Analysis Types of Machine Learning Analysis Data Visualization Distributed Processing MapReduce Languages used for Data Analysis R language (crash course) Python (crash course) Approaches to Data Analysis Statistical Analysis Time Series analysis Forecasting with Correlation and Regression models Inferential Statistics (estimating) Descriptive Statistics in Big Data sets (e.g. calculating mean) Machine Learning Supervised vs unsupervised learning Classification and clustering Estimating cost of specific methods Filter Natural Language Processing Processing text Understanding meaning of the text Automatic text generation Sentiment/Topic Analysis Computer Vision Big Data infrastructure Data Storage SQL (relational database) MySQL Postgres Oracle NoSQL Cassandra MongoDB Neo4j Understanding the nuances: hierarchical, object-oriented, document-oriented, graph-oriented, etc.
Distributed File Systems HDFS Search Engines ElasticSearch Distributed Processing Spark Machine Learning libraries: MLlib Spark SQL Scalability Public cloud AWS, Google, Aliyun, etc. Private cloud OpenStack, Cloud Foundry, etc. Auto-scalability Choosing the right solution for the problem |

mrkfct | Market Forecasting | 14 hours | Audience This course has been created for analysts and forecasters who want to introduce or improve forecasting, whether in sales forecasting, economic forecasting, technology forecasting, supply chain management, or demand and supply forecasting. Description This course guides delegates through a series of methodologies, frameworks and algorithms which are useful when choosing how to predict the future based on historical data. It uses standard tools like Microsoft Excel or some Open Source programs (notably the R project). The principles covered in this course can be implemented in any software (e.g. SAS, SPSS, Statistica, MINITAB ...). Problems facing forecasters Customer demand planning Investor uncertainty Economic planning Seasonal changes in demand/utilization Roles of risk and uncertainty Time series methods Moving average Exponential smoothing Extrapolation Linear prediction Trend estimation Growth curve Econometric methods (causal methods) Regression analysis using linear regression or non-linear regression Autoregressive moving average (ARMA) Autoregressive integrated moving average (ARIMA) Econometrics Judgemental methods Surveys Delphi method Scenario building Technology forecasting Forecast by analogy Simulation and other methods Simulation Prediction market Probabilistic forecasting and Ensemble forecasting Reference class forecasting |

rneuralnet | Neural Network in R | 14 hours | This course is an introduction to applying neural networks to real-world problems using R-project software. Introduction to Neural Networks What are Neural Networks What is the current status in applying neural networks Neural Networks vs regression models Supervised and Unsupervised learning Overview of packages available nnet, neuralnet and others Differences between packages and their limitations Visualizing neural networks Applying Neural Networks Concept of neurons and neural networks A simplified model of the brain Opportunities neuron XOR problem and the nature of the distribution of values The polymorphic nature of the sigmoidal Other activation functions Construction of neural networks Concept of neurons connect Neural network as nodes Building a network Neurons Layers Scales Input and output data Range 0 to 1 Normalization Learning Neural Networks Backward Propagation Steps propagation Network training algorithms range of application Estimation Problems with the possibility of approximation by Examples OCR and image pattern recognition Other applications Implementing a neural network modeling job predicting stock prices of listed |

rprogadv | Advanced R Programming | 7 hours | This course is for data scientists and statisticians who already have basic R and C++ coding skills and need advanced R coding skills. The purpose is to give a practical advanced R programming course to participants interested in applying the methods at work. Sector-specific examples are used to make the training relevant to the audience. R's environment Object oriented programming in R S3 S4 Reference classes Performance profiling Exception handling Debugging R code Creating R packages Unit testing C/C++ coding in R SEXPRs Calling dynamically loaded libraries from R Writing and compiling C/C++ code from R Improving R's performance with C++ linear algebra library |

dataminr | Data Mining with R | 14 hours | Sources of methods Artificial intelligence Machine learning Statistics Sources of data Pre processing of data Data Import/Export Data Exploration and Visualization Dimensionality Reduction Dealing with missing values R Packages Data mining main tasks Automatic or semi-automatic analysis of large quantities of data Extracting previously unknown interesting patterns groups of data records (cluster analysis) unusual records (anomaly detection) dependencies (association rule mining) Data mining Anomaly detection (Outlier/change/deviation detection) Association rule learning (Dependency modeling) Clustering Classification Regression Summarization Frequent Pattern Mining Text Mining Decision Trees Regression Neural Networks Sequence Mining Frequent Pattern Mining Data dredging, data fishing, data snooping |

bigdatar | Programming with Big Data in R | 21 hours | Introduction to Programming Big Data with R (pbdR) Setting up your environment to use pbdR Scope and tools available in pbdR Packages commonly used with Big Data alongside pbdR Message Passing Interface (MPI) Using pbdR MPI 5 Parallel processing Point-to-point communication Send Matrices Summing Matrices Collective communication Summing Matrices with Reduce Scatter / Gather Other MPI communications Distributed Matrices Creating a distributed diagonal matrix SVD of a distributed matrix Building a distributed matrix in parallel Statistics Applications Monte Carlo Integration Reading Datasets Reading on all processes Broadcasting from one process Reading partitioned data Distributed Regression Distributed Bootstrap |

MLFWR1 | Machine Learning Fundamentals with R | 14 hours | The aim of this course is to provide a basic proficiency in applying Machine Learning methods in practice. Using the R programming platform and its various libraries, and drawing on a multitude of practical examples, this course teaches how to use the most important building blocks of Machine Learning, how to make data modeling decisions, interpret the outputs of the algorithms and validate the results. Our goal is to give you the skills to understand and use the most fundamental tools from the Machine Learning toolbox confidently and avoid the common pitfalls of Data Science applications. Introduction to Applied Machine Learning Statistical learning vs. Machine learning Iteration and evaluation Bias-Variance trade-off Regression Linear regression Generalizations and Nonlinearity Exercises Classification Bayesian refresher Naive Bayes Logistic regression K-Nearest neighbors Exercises Cross-validation and Resampling Cross-validation approaches Bootstrap Exercises Unsupervised Learning K-means clustering Examples Challenges of unsupervised learning and beyond K-means |

mrkanar | Marketing Analytics using R | 21 hours | Audience: Business owners (marketing managers, product managers, customer base managers) and their teams; customer insights professionals. Overview: The course follows the customer life cycle from acquiring new customers, managing the existing customers for profitability, retaining good customers, and finally understanding which customers are leaving us and why. We will be working with real (if anonymous) data from a variety of industries including telecommunications, insurance, media, and high tech. Format: Instructor-led training over the course of five half-day sessions with in-class exercises as well as homework. It can be delivered as a classroom or distance (online) course. Part 1: Inflow - acquiring new customers Our focus is direct marketing so we will not look at advertising campaigns but instead focus on understanding marketing campaigns (e.g. direct mail). This is the foundation for almost everything else in the course. We look at measuring and improving campaign effectiveness. including: The importance of test and control groups. Universal control group. Techniques: Lift curves, AUC Return on investment. Optimizing marketing spend. Part 2: Base Management: managing existing customers Considering the cost of acquiring new customers for many businesses there are probably few assets more valuable than their existing customer base, though few think of it in this way. Topics include: 1. Cross-selling and up-selling: Offering the right product or service to the customer at the right time. Techniques: RFM models. Multinomial regression. b. Value of lifetime purchases. 2. Customer segmentation: Understanding the types of customers that you have. Classification models using first simple decision trees, and then random forests and other, newer techniques. 
Part 3: Retention: Keeping your good customers Understanding which customers are likely to leave and what you can do about it is key to profitability in many industries, especially where there are repeat purchases or subscriptions. We look at propensity to churn models, including Logistic regression: glm (package stats) and newer techniques (especially gbm as a general tool) Tuning models (caret) and introduction to ensemble models. Part 4: Outflow: Understanding who is leaving and why Customers will leave you – that is a fact of life. What is important is to understand who is leaving and why. Is it low value customers who are leaving or is it your best customers? Are they leaving to competitors or because they no longer need your products and services? Topics include: Customer lifetime value models: Combining value of purchases with propensity to churn and the cost of servicing and retaining the customer. Analysing survey data. (Generally useful, but we will do a brief introduction here in the context of exit surveys.) |

dataar | Data Analytics With R | 21 hours | R is a very popular, open source environment for statistical computing, data analytics and graphics. This course introduces the R programming language to students. It covers language fundamentals, libraries and advanced concepts, as well as advanced data analytics and graphing with real-world data. Audience Developers / data analysts Duration 3 days Format Lectures and Hands-on Day One: Language Basics Course Introduction About Data Science Data Science Definition Process of Doing Data Science. Introducing R Language Variables and Types Control Structures (Loops / Conditionals) R Scalars, Vectors, and Matrices Defining R Vectors Matrices String and Text Manipulation Character data type File IO Lists Functions Introducing Functions Closures lapply/sapply functions DataFrames Labs for all sections Day Two: Intermediate R Programming DataFrames and File I/O Reading data from files Data Preparation Built-in Datasets Visualization Graphics Package plot() / barplot() / hist() / boxplot() / scatter plot Heat Map ggplot2 package ( qplot(), ggplot()) Exploration With Dplyr Labs for all sections Day 3: Advanced Programming With R Statistical Modeling With R Statistical Functions Dealing With NA Distributions (Binomial, Poisson, Normal) Regression Introducing Linear Regressions Recommendations Text Processing (tm package / Wordclouds) Clustering Introduction to Clustering KMeans Classification Introduction to Classification Naive Bayes Decision Trees Training using caret package Evaluating Algorithms R and Big Data Connecting R to databases Big Data Ecosystem Labs for all sections |

rintro | Introduction to R | 21 hours | R is a free, open-source programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has also found followers among statisticians, engineers and scientists without computer programming skills who find it easy to use. Its popularity is due to the increasing use of data mining for various goals such as setting ad prices, finding new drugs more quickly or fine-tuning financial models. R has a wide variety of packages for data mining. This course covers the manipulation of objects in R including reading data, accessing R packages, writing R functions, and making informative graphs. It includes analyzing data using common statistical models. The course teaches how to use the R software (http://www.r-project.org) both on a command line and in a graphical user interface (GUI). Introduction and preliminaries Making R more friendly, R and available GUIs The R environment Related software and documentation R and statistics Using R interactively An introductory session Getting help with functions and features R commands, case sensitivity, etc. Recall and correction of previous commands Executing commands from or diverting output to a file Data permanency and removing objects Simple manipulations; numbers and vectors Vectors and assignment Vector arithmetic Generating regular sequences Logical vectors Missing values Character vectors Index vectors; selecting and modifying subsets of a data set Other types of objects Objects, their modes and attributes Intrinsic attributes: mode and length Changing the length of an object Getting and setting attributes The class of an object Ordered and unordered factors A specific example The function tapply() and ragged arrays Ordered factors Arrays and matrices Arrays Array indexing. Subsections of an array Index matrices The array() function Mixed vector and array arithmetic.
The recycling rule The outer product of two arrays Generalized transpose of an array Matrix facilities Matrix multiplication Linear equations and inversion Eigenvalues and eigenvectors Singular value decomposition and determinants Least squares fitting and the QR decomposition Forming partitioned matrices, cbind() and rbind() The concatenation function, c(), with arrays Frequency tables from factors Lists and data frames Lists Constructing and modifying lists Concatenating lists Data frames Making data frames attach() and detach() Working with data frames Attaching arbitrary lists Managing the search path Reading data from files The read.table() function The scan() function Accessing builtin datasets Loading data from other R packages Editing data Probability distributions R as a set of statistical tables Examining the distribution of a set of data One- and two-sample tests Grouping, loops and conditional execution Grouped expressions Control statements Conditional execution: if statements Repetitive execution: for loops, repeat and while Writing your own functions Simple examples Defining new binary operators Named arguments and defaults The '...'
argument Assignments within functions More advanced examples Efficiency factors in block designs Dropping all names in a printed array Recursive numerical integration Scope Customizing the environment Classes, generic functions and object orientation Statistical models in R Defining statistical models; formulae Contrasts Linear models Generic functions for extracting model information Analysis of variance and model comparison ANOVA tables Updating fitted models Generalized linear models Families The glm() function Nonlinear least squares and maximum likelihood models Least squares Maximum likelihood Some non-standard models Graphical procedures High-level plotting commands The plot() function Displaying multivariate data Display graphics Arguments to high-level plotting functions Low-level plotting commands Mathematical annotation Hershey vector fonts Interacting with graphics Using graphics parameters Permanent changes: The par() function Temporary changes: Arguments to graphics functions Graphics parameters list Graphical elements Axes and tick marks Figure margins Multiple figure environment Device drivers PostScript diagrams for typeset documents Multiple graphics devices Dynamic graphics Packages Standard packages Contributed packages and CRAN Namespaces |

rdataana | R for Data Analysis and Research | 7 hours | Audience managers developers scientists students Format of the course on-line instruction and discussion OR face-to-face workshops The list below gives an idea of the topics that will be covered in the workshop. The number of topics that will be covered depends on the duration of the workshop (i.e. one, two or three days). In a one or two day workshop it may not be possible to cover all topics, and so the workshop will be tailored to suit the specific needs of the learners. A first R session Syntax for analysing one dimensional data arrays Syntax for analysing two dimensional data arrays Reading and writing data files Sub-setting data, sorting, ranking and ordering data Merging arrays Set membership The main statistical functions in R The Normal Distribution (correlation, probabilities, tests for normality and confidence intervals) Ordinary Least Squares Regression T-tests, Analysis of Variance and Multivariable Analysis of Variance Chi-square tests for categorical variables Writing functions in R Writing software (scripts) in R Control structures (e.g. Loops) Graphical methods (including scatterplots, bar charts, pie charts, histograms, box plots and dot charts) Graphical User Interfaces for R |

frcr | Forecasting with R | 14 hours | This course allows delegates to fully automate the process of forecasting with R. Forecasting with R Introduction to Forecasting Exponential Smoothing ARIMA models The forecast package Package 'forecast' accuracy Acf arfima Arima arima.errors auto.arima bats BoxCox BoxCox.lambda croston CV dm.test dshw ets fitted.Arima forecast forecast.Arima forecast.bats forecast.ets forecast.HoltWinters forecast.lm forecast.stl forecast.StructTS gas gold logLik.ets ma meanf monthdays msts na.interp naive ndiffs nnetar plot.bats plot.ets plot.forecast rwf seasadj seasonaldummy seasonplot ses simulate.ets sindexf splinef subset.ts taylor tbats thetaf tsdisplay tslm wineind woolyrnq |

Course | Course Date | Course Price [Remote / Classroom] |
---|---|---|

Market Forecasting - CA, San Francisco - Golden Gate - 75 Broadway | Mon, Jun 12 2017, 9:30 am | $3580 / $5140 |

Introduction to R - Remote Course - Mountain Time (UTC-07:00) US & Canada | Mon, Jun 12 2017, 9:30 am | $5370 / $6920 |

Machine Learning Fundamentals with R - NC, Cary - Regency | Mon, Jun 12 2017, 9:30 am | $3995 / $5645 |

Data Analytics With R - CO, Denver - Colorado Boulevard Center | Mon, Jul 31 2017, 9:30 am | $4550 / $6850 |
