1572256 
R 
21 hours 
Day 1
Introduction and preliminaries
Making R more friendly, R and available GUIs
RStudio
Related software and documentation
R and statistics
Using R interactively
An introductory session
Getting help with functions and features
R commands, case sensitivity, etc.
Recall and correction of previous commands
Executing commands from or diverting output to a file
Data permanency and removing objects
Simple manipulations; numbers and vectors
Vectors and assignment
Vector arithmetic
Generating regular sequences
Logical vectors
Missing values
Character vectors
Index vectors; selecting and modifying subsets of a data set
Other types of objects
Objects, their modes and attributes
Intrinsic attributes: mode and length
Changing the length of an object
Getting and setting attributes
The class of an object
Ordered and unordered factors
A specific example
The function tapply() and ragged arrays
Ordered factors
Arrays and matrices
Arrays
Array indexing. Subsections of an array
Index matrices
The array() function
Mixed vector and array arithmetic. The recycling rule
The outer product of two arrays
Generalized transpose of an array
Matrix facilities
Matrix multiplication
Linear equations and inversion
Eigenvalues and eigenvectors
Singular value decomposition and determinants
Least squares fitting and the QR decomposition
Forming partitioned matrices, cbind() and rbind()
The concatenation function, c(), with arrays
Frequency tables from factors
Day 2
Lists and data frames
Lists
Constructing and modifying lists
Concatenating lists
Data frames
Making data frames
attach() and detach()
Working with data frames
Attaching arbitrary lists
Managing the search path
Data manipulation
Selecting, subsetting observations and variables
Filtering, grouping
Recoding, transformations
Aggregation, combining data sets
Character manipulation, stringr package
Reading data
Txt files
CSV files
XLS, XLSX files
SPSS, SAS, Stata and other data formats
Exporting data to txt, csv and other formats
Accessing data from databases using SQL language
Probability distributions
R as a set of statistical tables
Examining the distribution of a set of data
One- and two-sample tests
Grouping, loops and conditional execution
Grouped expressions
Control statements
Conditional execution: if statements
Repetitive execution: for loops, repeat and while
Day 3
Writing your own functions
Simple examples
Defining new binary operators
Named arguments and defaults
The '...' argument
Assignments within functions
More advanced examples
Efficiency factors in block designs
Dropping all names in a printed array
Recursive numerical integration
Scope
Customizing the environment
Classes, generic functions and object orientation
Statistical analysis in R
Linear regression models
Generic functions for extracting model information
Updating fitted models
Generalized linear models
Families
The glm() function
Classification
Logistic Regression
Linear Discriminant Analysis
Unsupervised learning
Principal Components Analysis
Clustering methods (k-means, hierarchical clustering, k-medoids)
Survival analysis
Survival objects in R
Kaplan-Meier estimate
Confidence bands
Cox PH models, constant covariates
Cox PH models, time-dependent covariates
Graphical procedures
High-level plotting commands
The plot() function
Displaying multivariate data
Display graphics
Arguments to high-level plotting functions
Basic visualisation graphs
Multivariate relations with the lattice and ggplot2 packages
Using graphics parameters
Graphics parameters list
Automated and interactive reporting
Combining output from R with text
Creating html, pdf documents 
2501 
Statistical analysis in market research 
28 hours 
Goal: Improving the research skills of consumer behavior researchers working on products and services
Addressees: researchers, market analysts, managers and employees of marketing and sales departments (primarily in the pharmaceutical and FMCG sectors), students of socio-economic subjects and anyone interested in market research
Module 1 Quantitative research
Preprocessing of results
checking the accuracy of the database
handling missing data
weighting observations
Statistical models
multiple regression
conjoint analysis
classification trees
Automating procedures in tracking studies
Analysis of data from a marketing experiment
Reporting and drawing conclusions
Module 2 Qualitative Research
Transforming qualitative data into quantitative data
Statistical models for qualitative data

2929362 
Xcelsius 
14 hours 
Description:
In this Xcelsius Training course, students will use Xcelsius Present to create interactive visualizations for presenting complex data in a simple way, and to conduct analysis to make critical decisions. Students will also create complete dashboards that present business, project, and human resources information, all consolidated and presented in a user-friendly manner. Finally, students will publish dashboards into various file formats such as Adobe Flash, Microsoft Office PowerPoint, Adobe PDF, and also to the web.
Objectives:
Upon successful completion of this course, students will be able to:
Explore the Xcelsius workspace and an already created dashboard.
Create simple visualizations.
Conduct data analysis using Xcelsius components that give dynamic functionality to the specified data.
Create a Project Management dashboard.
Create a dashboard to consolidate and present the Human Resources information of an organization.
Finalize dashboards and export them to different file formats.
Audience:
This course is designed for professionals who conduct data analysis and need to present robust and timely data in an interactive display.
1: Getting Started with Xcelsius
Explore the Xcelsius Interface
Explore a Dashboard
2: Creating Simple and Interactive Visualizations
Create a Simple Xcelsius Chart
Manage Personal Finance Using Value Box
Organize Levels of Information Using Filters
Conduct a Comparative Study Using List Builder and Line Chart
3: Conducting Data Analysis
Conduct Trend Analysis Using Combo Box
Conduct Demand Analysis Using Label Based Menu
Conduct a Region Based Demand Analysis Using Maps
Forecast Revenue Using Sliders and Gauge
4: Creating a Project Management Dashboard
Drill Down the Status of Current Projects Using the Drill Down Function
Analyze Resource Efficiency Using Fisheye Picture Menu and Other Tools
Analyze Resource Utilization Using Combination Chart
5: Creating a Human Resources Dashboard
Create an Organization Dashboard Using Organization Chart
Conduct Attrition Analysis
6: Finalizing Dashboards
Create a Snapshot
Publish Dashboards

2502 
Advanced Statistics using SPSS Predictive Analytics Software 
28 hours 
Goal:
Mastering the skill of working independently with SPSS at an advanced level, using both the dialog boxes and the command language syntax for selected analytical techniques.
Addressees:
Analysts, researchers, scientists, students and all those who want to acquire the ability to use the SPSS package at an advanced level and to learn selected statistical models. The training covers universal analysis problems rather than those of a specific industry.
Preparation of a database for analysis
managing the data set
operations on variables
transforming variables with selected functions (logarithmic, exponential, etc.)
Parametric and non-parametric statistics, or how to fit a model to the data
measurement scale
distribution type
outliers and influential observations
sample size
central limit theorem
Studying differences between characteristics
statistical tests based on the mean and the median
Analysis of correlation and similarities
correlations
principal component analysis
cluster analysis
Prediction: simple and multivariate regression analysis
method of least squares
Linear Model
regression models with coded categorical variables (dummy, effect and orthogonal coding)
Statistical Inference 
2623 
Marketing Analytics using R 
21 hours 
Audience:
Business owners (marketing managers, product managers, customer base managers) and their teams; customer insights professionals.
Overview:
The course follows the customer life cycle from acquiring new customers, managing the existing customers for profitability, retaining good customers, and finally understanding which customers are leaving us and why. We will be working with real (if anonymized) data from a variety of industries including telecommunications, insurance, media, and high tech.
Format:
Instructor-led training over the course of five half-day sessions with in-class exercises as well as homework. It can be delivered as a classroom or distance (online) course.
Part 1: Inflow: acquiring new customers
Our focus is direct marketing, so rather than looking at advertising campaigns we will focus on understanding direct marketing campaigns (e.g. direct mail). This is the foundation for almost everything else in the course.
We look at measuring and improving campaign effectiveness, including:
The importance of test and control groups. Universal control group.
Techniques: Lift curves, AUC
Return on investment. Optimizing marketing spend.
Part 2: Base Management: managing existing customers
Considering the cost of acquiring new customers, for many businesses there are probably few assets more valuable than their existing customer base, though few think of it in this way. Topics include:
1. Cross-selling and up-selling: Offering the right product or service to the customer at the right time.
Techniques: RFM models. Multinomial regression.
b. Value of lifetime purchases.
2. Customer segmentation: Understanding the types of customers that you have.
Classification models using first simple decision trees, and then random forests and other, newer techniques.
Part 3: Retention: Keeping your good customers
Understanding which customers are likely to leave and what you can do about it is key to profitability in many industries, especially where there are repeat purchases or subscriptions. We look at propensity-to-churn models, including:
Logistic regression: glm (package stats) and newer techniques (especially gbm as a general tool)
Tuning models (caret) and introduction to ensemble models.
Part 4: Outflow: Understanding who is leaving and why
Customers will leave you – that is a fact of life. What is important is to understand who is leaving and why. Is it low-value customers who are leaving or is it your best customers? Are they leaving for competitors or because they no longer need your products and services? Topics include:
Customer lifetime value models: Combining value of purchases with propensity to churn and the cost of servicing and retaining the customer.
Analysing survey data. (Generally useful, but we will do a brief introduction here in the context of exit surveys.)

42 
Excel For Statistical Data Analysis 
14 hours 
Audience
Analysts, researchers, scientists, graduates and students and anyone who is interested in learning how to facilitate statistical analysis in Microsoft Excel.
Course Objectives
This course will help improve your familiarity with Excel and statistics and as a result increase the effectiveness and efficiency of your work or research.
This course describes how to use the Analysis ToolPak and the statistical functions in Microsoft Excel, and how to perform basic statistical procedures. It explains what Excel's limitations are and how to overcome them.
Aggregating Data in Excel
Statistical Functions
Outlines
Subtotals
Pivot Tables
Data Relation Analysis
Normal Distribution
Descriptive Statistics
Linear Correlation
Regression Analysis
Covariance
Analysing Data in Time
Trends/Regression line
Linear, Logarithmic, Polynomial, Power, Exponential, Moving Average Smoothing
Seasonal fluctuations analysis
Comparing Populations
Confidence Interval for the Mean
Test of Hypothesis Concerning the Population Mean
Difference Between Mean of Two Populations
ANOVA: Analysis of Variance
Goodness-of-Fit Test for Discrete Random Variables
Test of Independence: Contingency Tables
Test of Hypothesis Concerning the Variances of Two Populations
Forecasting
Extrapolation

2503 
Statistics with SPSS Predictive Analytics Software 
14 hours 
Goal:
Learning to work with SPSS independently
Addressees:
Analysts, researchers, scientists, students and all those who want to acquire the ability to use SPSS package and learn popular data mining techniques.
Using the program
The dialog boxes
entering/importing data
the concept of a variable and measurement scales
preparing a database
Generate tables and graphs
formatting of the report
Command language syntax
automated analysis
storage and modification procedures
creating your own analytical procedures
Data Analysis
descriptive statistics
Key terms: e.g. variable, hypothesis, statistical significance
measures of central tendency
measures of dispersion
standardization
Introduction to researching the relationships between variables
correlational and experimental methods
Summary: case study and discussion

85063 
Training Neural Networks in R 
14 hours 
This course is an introduction to applying neural networks to real-world problems using the R Project software.
Introduction to Neural Networks
What are Neural Networks
What is the current status of applying neural networks
Neural Networks vs regression models
Supervised and Unsupervised learning
Overview of packages available
nnet, neuralnet and others
differences between packages and their limitations
Visualizing neural networks
Applying Neural Networks
Concept of neurons and neural networks
A simplified model of the brain
Capabilities of a neuron
XOR problem and the nature of the distribution of values
The polymorphic nature of the sigmoid function
Other activation functions
Construction of neural networks
Concept of connecting neurons
Neural network as nodes
Building a network
Neurons
Layers
Weights
Input and output data
Range 0 to 1
Normalization
Learning Neural Networks
Backward Propagation
Propagation steps
Network training algorithms
range of application
Estimation
Problems with the possibility of approximation by
Examples
OCR and image pattern recognition
Other applications
Implementing a neural network model to predict the stock prices of listed companies

44 
Statistics Level 1 
14 hours 
This course has been created for people who require general statistics skills. It can be tailored to a specific area of expertise such as market research, biology, manufacturing, public sector research, etc.
Introduction
Descriptive Statistics
Inferential Statistics
Sampling Demonstration
Variables
Percentiles
Measurement
Levels of Measurement
Measurement Demonstration
Basics of Data Collection
Distributions
Summation Notation
Linear Transformations
Exercises
Graphing Distributions
Qualitative Variables
Quantitative Variables
Stem and Leaf Displays
Histograms
Frequency Polygons
Box Plots
Box Plot Demonstration
Bar Charts
Line Graphs
Exercises
Summarizing Distributions
Central Tendency
What is Central Tendency
Measures of Central Tendency
Balance Scale Simulation
Absolute Difference Simulation
Squared Differences Simulation
Median and Mean
Mean and Median Simulation
Additional Measures
Comparing measures
Variability
Measures of Variability
Estimating Variance Simulation
Shape
Comparing Distributions Demo
Effects of Transformations
Variance Sum Law I
Exercises
Normal Distributions
History
Areas of Normal Distributions
Varieties of Normal Distribution Demo
Standard Normal
Normal Approximation to the Binomial
Normal Approximation Demo
Exercises

287766 
Programming with Big Data in R 
21 hours 
Introduction to Programming Big Data with R (pbdR)
Setting up your environment to use pbdR
Scope and tools available in pbdR
Packages commonly used with Big Data alongside pbdR
Message Passing Interface (MPI)
Using pbdR MPI 5
Parallel processing
Point-to-point communication
Send Matrices
Summing Matrices
Collective communication
Summing Matrices with Reduce
Scatter / Gather
Other MPI communications
Distributed Matrices
Creating a distributed diagonal matrix
SVD of a distributed matrix
Building a distributed matrix in parallel
Statistics Applications
Monte Carlo Integration
Reading Datasets
Reading on all processes
Broadcasting from one process
Reading partitioned data
Distributed Regression
Distributed Bootstrap

43 
Statistics Level 2 
28 hours 
This training course covers advanced statistics. It explains most of the tools commonly used in research, analysis and forecasting. It provides short explanations of the theory behind the formulas.
This course does not relate to any specific field of knowledge, but can be tailored if all the delegates have the same background and goals.
Some basic computer tools are used during this course (notably Excel and OpenOffice).
Describing Bivariate Data
Introduction to Bivariate Data
Values of the Pearson Correlation
Guessing Correlations Simulation
Properties of Pearson's r
Computing Pearson's r
Restriction of Range Demo
Variance Sum Law II
Exercises
Probability
Introduction
Basic Concepts
Conditional Probability Demo
Gambler's Fallacy Simulation
Birthday Demonstration
Binomial Distribution
Binomial Demonstration
Base Rates
Bayes' Theorem Demonstration
Monty Hall Problem Demonstration
Exercises
Normal Distributions
Introduction
History
Areas of Normal Distributions
Varieties of Normal Distribution Demo
Standard Normal
Normal Approximation to the Binomial
Normal Approximation Demo
Exercises
Sampling Distributions
Introduction
Basic Demo
Sample Size Demo
Central Limit Theorem Demo
Sampling Distribution of the Mean
Sampling Distribution of Difference Between Means
Sampling Distribution of Pearson's r
Sampling Distribution of a Proportion
Exercises
Estimation
Introduction
Degrees of Freedom
Characteristics of Estimators
Bias and Variability Simulation
Confidence Intervals
Exercises
Logic of Hypothesis Testing
Introduction
Significance Testing
Type I and Type II Errors
One- and Two-Tailed Tests
Interpreting Significant Results
Interpreting Non-Significant Results
Steps in Hypothesis Testing
Significance Testing and Confidence Intervals
Misconceptions
Exercises
Testing Means
Single Mean
t Distribution Demo
Difference between Two Means (Independent Groups)
Robustness Simulation
All Pairwise Comparisons Among Means
Specific Comparisons
Difference between Two Means (Correlated Pairs)
Correlated t Simulation
Specific Comparisons (Correlated Observations)
Pairwise Comparisons (Correlated Observations)
Exercises
Power
Introduction
Factors Affecting Power
Why power matters
Exercises
Prediction
Introduction to Simple Linear Regression
Linear Fit Demo
Partitioning Sums of Squares
Standard Error of the Estimate
Prediction Line Demo
Inferential Statistics for b and r
Exercises
ANOVA
Introduction
ANOVA Designs
One-Factor ANOVA (Between-Subjects)
One-Way Demo
Multi-Factor ANOVA (Between-Subjects)
Unequal Sample Sizes
Tests Supplementing ANOVA
Within-Subjects ANOVA
Power of Within-Subjects Designs Demo
Exercises
Chi Square
Chi Square Distribution
One-Way Tables
Testing Distributions Demo
Contingency Tables
2 x 2 Table Simulation
Exercises

263582 
Statistical Thinking for Decision Makers 
7 hours 
This course has been created for decision makers whose primary goal is not to do the calculations and the analysis, but to understand them and to be able to choose which statistical methods are relevant to the strategic planning of the organization.
For example, a prospective participant may need to decide how many samples must be collected before deciding whether or not to launch a product.
If you need a longer course that covers the very basics of statistical thinking, have a look at the 5-day "Statistics for Managers" training.
What statistics can offer to Decision Makers
Descriptive Statistics
Basic statistics – which statistics (e.g. median, average, percentiles, etc.) are most relevant to different distributions
Graphs – the importance of getting them right (e.g. how the way a graph is created affects the decision)
Variable types – which variables are easier to deal with
Ceteris paribus, things are always in motion
Third variable problem – how to find the real influencer
Inferential Statistics
Probability value – what the p-value means
Repeated experiment – how to interpret repeated experiment results
Data collection – you can minimize bias, but not get rid of it
Understanding confidence level
Statistical Thinking
Decision making with limited information
how to check how much information is enough
prioritizing goals based on probability and potential return (benefit/cost ratio, decision trees)
How errors add up
Butterfly effect
Black swans
What is Schrödinger's cat and what is Newton's Apple in business
Cassandra Problem – how to measure a forecast if the course of action has changed
Google Flu Trends – how it went wrong
How decisions make forecast outdated
Forecasting – methods and practicality
ARIMA
Why naive forecasts are usually more responsive
How far a forecast should look into the past?
Why can more data mean a worse forecast?
Statistical Methods useful for Decision Makers
Describing Bivariate Data
Univariate data and bivariate data
Probability
why do things differ each time we measure them?
Normal Distributions and normally distributed errors
Estimation
Independent sources of information and degrees of freedom
Logic of Hypothesis Testing
What can be proven, and why it is always the opposite of what we want (falsification)
Interpreting the results of Hypothesis Testing
Testing Means
Power
How to determine a good (and cheap) sample size
False positives and false negatives, and why there is always a trade-off

287807 
Machine Learning Fundamentals with R 
14 hours 
The aim of this course is to provide a basic proficiency in applying Machine Learning methods in practice. Through the use of the R programming platform and its various libraries, and based on a multitude of practical examples, this course teaches how to use the most important building blocks of Machine Learning, how to make data modeling decisions, how to interpret the outputs of the algorithms and how to validate the results.
Our goal is to give you the skills to understand and confidently use the most fundamental tools from the Machine Learning toolbox and to avoid the common pitfalls of Data Science applications.
Introduction to Applied Machine Learning
Statistical learning vs. Machine learning
Iteration and evaluation
Bias-Variance trade-off
Regression
Linear regression
Generalizations and Nonlinearity
Exercises
Classification
Bayesian refresher
Naive Bayes
Logistic regression
K-Nearest neighbors
Exercises
Cross-validation and Resampling
Cross-validation approaches
Bootstrap
Exercises
Unsupervised Learning
K-means clustering
Examples
Challenges of unsupervised learning and beyond K-means

138 
Minitab for Statistical Data Analysis 
14 hours 
The course is aimed at anyone interested in statistical analysis. It provides familiarity with Minitab and will increase the effectiveness and efficiency of your data analysis and improve your knowledge of statistics.
Descriptive Statistics
Normal Distribution
Correlation
Regression
Trend analysis & forecasting
Confidence intervals
t-tests
proportion tests
variance tests
ANOVA
Chi-squared tests

284990 
Applied Machine Learning 
14 hours 
This training course is for people who would like to apply Machine Learning in practice.
Audience
This course is for data scientists and statisticians who have some familiarity with statistics and know how to program in R (or Python or another language of choice). The emphasis of this course is on the practical aspects of data/model preparation, execution, post hoc analysis and visualization.
The purpose is to show practical applications of Machine Learning to participants interested in applying the methods at work.
Sector specific examples are used to make the training relevant to the audience.
Naive Bayes
Multinomial models
Bayesian categorical data analysis
Discriminant analysis
Linear regression
Logistic regression
GLM
EM Algorithm
Mixed Models
Additive Models
Classification
KNN
Bayesian Graphical Models
Factor Analysis (FA)
Principal Component Analysis (PCA)
Independent Component Analysis (ICA)
Support Vector Machines (SVM) for regression and classification
Boosting
Ensemble models
Neural networks
Hidden Markov Models (HMM)
State Space Models
Clustering

287823 
Data Mining with R 
14 hours 
Sources of methods
Artificial intelligence
Machine learning
Statistics
Sources of data
Preprocessing of data
Data Import/Export
Data Exploration and Visualization
Dimensionality Reduction
Dealing with missing values
R Packages
Data mining main tasks
Automatic or semi-automatic analysis of large quantities of data
Extracting previously unknown interesting patterns
groups of data records (cluster analysis)
unusual records (anomaly detection)
dependencies (association rule mining)
Data mining
Anomaly detection (Outlier/change/deviation detection)
Association rule learning (Dependency modeling)
Clustering
Classification
Regression
Summarization
Frequent Pattern Mining
Text Mining
Decision Trees
Regression
Neural Networks
Sequence Mining
Frequent Pattern Mining
Data dredging, data fishing, data snooping 
1277 
Analysing Financial Data in Excel 
14 hours 
Audience
Financial or market analysts, managers, accountants
Course Objectives
Facilitate and automate all kinds of financial analysis with Microsoft Excel
Advanced functions
Logical functions
Math and statistical functions
Financial functions
Lookups and data tables
Using lookup functions
Using MATCH and INDEX
Advanced list management
Validating cell entries
Exploring database functions
PivotTables and PivotCharts
Creating Pivot Tables
Calculated Item and Calculated Field
Working with External Data
Exporting and importing
Exporting and importing XML data
Querying external databases
Linking to a database
Linking to an XML data source
Analysing online data (Web Queries)
Analytical options
Goal Seek
Solver
The Analysis ToolPak
Scenarios
Macros and custom functions
Running and recording a macro
Working with VBA code
Creating functions
Conditional formatting and SmartArt
Conditional formatting with graphics
SmartArt graphics

284991 
Introduction to Machine Learning 
7 hours 
This training course is for people who would like to apply basic Machine Learning techniques in practical applications.
Audience
Data scientists and statisticians who have some familiarity with machine learning and know how to program in R. The emphasis of this course is on the practical aspects of data/model preparation, execution, post hoc analysis and visualization. The purpose is to give a practical introduction to machine learning to participants interested in applying the methods at work.
Sector specific examples are used to make the training relevant to the audience.
Naive Bayes
Multinomial models
Bayesian categorical data analysis
Discriminant analysis
Linear regression
Logistic regression
GLM
EM Algorithm
Mixed Models
Additive Models
Classification
KNN
Ridge regression
Clustering

287843 
Advanced R Programming 
7 hours 
This course is for data scientists and statisticians who already have basic R and C++ coding skills and need advanced R coding skills.
The purpose is to give a practical advanced R programming course to participants interested in applying the methods at work.
Sector specific examples are used to make the training relevant to the audience.
R's environment
Object oriented programming in R
S3
S4
Reference classes
Performance profiling
Exception handling
Debugging R code
Creating R packages
Unit testing
C/C++ coding in R
SEXPs
Calling dynamically loaded libraries from R
Writing and compiling C/C++ code from R
Improving R's performance with a C++ linear algebra library

1366 
Market Forecasting 
14 hours 
Audience
This course has been created for analysts and forecasters who want to introduce or improve forecasting, whether it relates to sales forecasting, economic forecasting, technology forecasting, supply chain management, or demand or supply forecasting.
Description
This course guides delegates through a series of methodologies, frameworks and algorithms which are useful when choosing how to predict the future based on historical data.
It uses standard tools such as Microsoft Excel or some open-source programs (notably the R Project).
The principles covered in this course can be implemented in any software (e.g. SAS, SPSS, Statistica, Minitab, etc.).
Problems facing forecasters
Customer demand planning
Investor uncertainty
Economic planning
Seasonal changes in demand/utilization
Roles of risk and uncertainty
Time series methods
Moving average
Exponential smoothing
Extrapolation
Linear prediction
Trend estimation
Growth curve
Econometric methods (causal methods)
Regression analysis using linear regression or nonlinear regression
Autoregressive moving average (ARMA)
Autoregressive integrated moving average (ARIMA)
Econometrics
Judgemental methods
Surveys
Delphi method
Scenario building
Technology forecasting
Forecast by analogy
Simulation and other methods
Simulation
Prediction market
Probabilistic forecasting and Ensemble forecasting
Reference class forecasting

287842 
Numerical Methods 
14 hours 
This course is for data scientists and statisticians who have some familiarity with numerical methods and know at least one of the programming languages R, Python or Octave (with some C++ options). The emphasis of this course is on the practical aspects of data/model preparation, execution, post hoc analysis and visualization.
The purpose of this course is to give a practical introduction to numerical methods to participants interested in applying the methods at work.
Sector specific examples are used to make the training relevant to the audience.
Topics Covered:
curve fitting
regression, robust regression
linear algebra: matrix operations
eigenvalues/eigenvectors, matrix decompositions
ordinary & partial differential equations
Fourier analysis
interpolation & splines

1223204 
Building Web Applications in R with Shiny 
7 hours 
Description:
This is a course designed to teach R users how to create web apps without needing to learn cross-browser HTML, JavaScript, and CSS.
Objective:
Covers the basics of how Shiny apps work.
Covers all commonly used input/output/rendering/paneling functions from the Shiny library.
An overview of Shiny
Installation of Shiny for a local use
Basic Shiny concepts
Basic control widgets – buttons, sliders, drop-down menus
Program structure: ui.R, server.R
Building first application
Running your application
Customizing interface
HTML links in Shiny
JavaScript and Shiny
Advanced control widgets
Showing and Hiding elements of UI
Dynamic user interfaces
Advanced reactivity
Animation
Downloading and uploading data
Sharing Shiny web applications
An overview of Shiny extensions

1818 
Statistics for Researchers 
35 hours 
This course aims to give researchers an understanding of the principles of statistical design and analysis and their relevance to research in a range of scientific disciplines.
It covers some probability and statistical methods, mainly through examples. This training consists of around 30% lectures and 70% guided quizzes and labs.
For closed courses, we can tailor the examples and materials to a specific field (e.g. psychology tests, public sector, biology, genetics, etc.).
For public courses, mixed examples are used.
Although various software is used during this course (from Microsoft Excel to SPSS, Statgraphics, etc.), its main focus is on understanding the principles and processes guiding research, reasoning and conclusions.
This course can be delivered as a blended course i.e. with homework and assignments.
Scientific Method, Probability & Statistics
Very short history of statistics
Why we can be "confident" about the conclusions
Probability and decision making
Preparation for research (deciding "what" and "how")
The big picture: research is a part of a process with inputs and outputs
Gathering data
Questionnaires and measurement
What to measure
Observational Studies
Design of Experiments
Analysis of Data and Graphical Methods
Research Skills and Techniques
Research Management
Describing Bivariate Data
Introduction to Bivariate Data
Values of the Pearson Correlation
Guessing Correlations Simulation
Properties of Pearson's r
Computing Pearson's r
Restriction of Range Demo
Variance Sum Law II
Exercises
Probability
Introduction
Basic Concepts
Conditional Probability Demo
Gamblers Fallacy Simulation
Birthday Demonstration
Binomial Distribution
Binomial Demonstration
Base Rates
Bayes' Theorem Demonstration
Monty Hall Problem Demonstration
Exercises
Normal Distributions
Introduction
History
Areas of Normal Distributions
Varieties of Normal Distribution Demo
Standard Normal
Normal Approximation to the Binomial
Normal Approximation Demo
Exercises
Sampling Distributions
Introduction
Basic Demo
Sample Size Demo
Central Limit Theorem Demo
Sampling Distribution of the Mean
Sampling Distribution of Difference Between Means
Sampling Distribution of Pearson's r
Sampling Distribution of a Proportion
Exercises
Estimation
Introduction
Degrees of Freedom
Characteristics of Estimators
Bias and Variability Simulation
Confidence Intervals
Exercises
Logic of Hypothesis Testing
Introduction
Significance Testing
Type I and Type II Errors
One- and Two-Tailed Tests
Interpreting Significant Results
Interpreting Non-Significant Results
Steps in Hypothesis Testing
Significance Testing and Confidence Intervals
Misconceptions
Exercises
Testing Means
Single Mean
t Distribution Demo
Difference between Two Means (Independent Groups)
Robustness Simulation
All Pairwise Comparisons Among Means
Specific Comparisons
Difference between Two Means (Correlated Pairs)
Correlated t Simulation
Specific Comparisons (Correlated Observations)
Pairwise Comparisons (Correlated Observations)
Exercises
Power
Introduction
Example Calculations
Factors Affecting Power
Exercises
Prediction
Introduction to Simple Linear Regression
Linear Fit Demo
Partitioning Sums of Squares
Standard Error of the Estimate
Prediction Line Demo
Inferential Statistics for b and r
Exercises
ANOVA
Introduction
ANOVA Designs
One-Factor ANOVA (Between-Subjects)
One-Way Demo
Multi-Factor ANOVA (Between-Subjects)
Unequal Sample Sizes
Tests Supplementing ANOVA
Within-Subjects ANOVA
Power of Within-Subjects Designs Demo
Exercises
Chi Square
Chi Square Distribution
One-Way Tables
Testing Distributions Demo
Contingency Tables
2 x 2 Table Simulation
Exercises
Case Studies
Analysis of selected case studies 
287849 
Administrator Training for Apache Hadoop 
35 hours 
Audience:
The course is intended for IT specialists looking for a solution to store and process large data sets in a distributed system environment
Goal:
Deep knowledge of Hadoop cluster administration.
1: HDFS (17%)
Describe the function of HDFS Daemons
Describe the normal operation of an Apache Hadoop cluster, both in data storage and in data processing.
Identify current features of computing systems that motivate a system like Apache Hadoop.
Classify major goals of HDFS Design
Given a scenario, identify appropriate use case for HDFS Federation
Identify components and daemons of an HDFS HA-Quorum cluster
Analyze the role of HDFS security (Kerberos)
Determine the best data serialization choice for a given scenario
Describe file read and write paths
Identify the commands to manipulate files in the Hadoop File System Shell
2: YARN and MapReduce version 2 (MRv2) (17%)
Understand how upgrading a cluster from Hadoop 1 to Hadoop 2 affects cluster settings
Understand how to deploy MapReduce v2 (MRv2 / YARN), including all YARN daemons
Understand basic design strategy for MapReduce v2 (MRv2)
Determine how YARN handles resource allocations
Identify the workflow of a MapReduce job running on YARN
Determine which files you must change and how in order to migrate a cluster from MapReduce version 1 (MRv1) to MapReduce version 2 (MRv2) running on YARN.
3: Hadoop Cluster Planning (16%)
Principal points to consider in choosing the hardware and operating systems to host an Apache Hadoop cluster.
Analyze the choices in selecting an OS
Understand kernel tuning and disk swapping
Given a scenario and workload pattern, identify a hardware configuration appropriate to the scenario
Given a scenario, determine the ecosystem components your cluster needs to run in order to fulfill the SLA
Cluster sizing: given a scenario and frequency of execution, identify the specifics for the workload, including CPU, memory, storage, disk I/O
Disk Sizing and Configuration, including JBOD versus RAID, SANs, virtualization, and disk sizing requirements in a cluster
Network Topologies: understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario
4: Hadoop Cluster Installation and Administration (25%)
Given a scenario, identify how the cluster will handle disk and machine failures
Analyze a logging configuration and logging configuration file format
Understand the basics of Hadoop metrics and cluster health monitoring
Identify the function and purpose of available tools for cluster monitoring
Be able to install all the ecosystem components in CDH 5, including (but not limited to): Impala, Flume, Oozie, Hue, Manager, Sqoop, Hive, and Pig
Identify the function and purpose of available tools for managing the Apache Hadoop file system
5: Resource Management (10%)
Understand the overall design goals of each of the Hadoop schedulers
Given a scenario, determine how the FIFO Scheduler allocates cluster resources
Given a scenario, determine how the Fair Scheduler allocates cluster resources under YARN
Given a scenario, determine how the Capacity Scheduler allocates cluster resources
6: Monitoring and Logging (15%)
Understand the functions and features of Hadoop’s metric collection abilities
Analyze the NameNode and JobTracker Web UIs
Understand how to monitor cluster Daemons
Identify and monitor CPU usage on master nodes
Describe how to monitor swap and memory allocation on all nodes
Identify how to view and manage Hadoop’s log files
Interpret a log file

2929318 
Predictive Modelling with R 
14 hours 
Problems facing forecasters
Customer demand planning
Investor uncertainty
Economic planning
Seasonal changes in demand/utilization
Roles of risk and uncertainty
Time series Forecasting
Seasonal adjustment
Moving average
Exponential smoothing
Extrapolation
Linear prediction
Trend estimation
Stationarity and ARIMA modelling
Econometric methods (causal methods)
Regression analysis
Multiple linear regression
Multiple nonlinear regression
Regression validation
Forecasting from regression
Judgemental methods
Surveys
Delphi method
Scenario building
Technology forecasting
Forecast by analogy
Simulation and other methods
Simulation
Prediction market
Probabilistic forecasting and Ensemble forecasting

287850 
Hadoop Administration 
21 hours 
The course is intended for IT specialists who are looking for a solution to store and process large data sets in a distributed system environment.
Course goal:
Gain knowledge of Hadoop cluster administration
Introduction to Cloud Computing and Big Data solutions
Apache Hadoop evolution: HDFS, MapReduce, YARN
Installation and configuration of Hadoop in pseudo-distributed mode
Running MapReduce jobs on Hadoop cluster
Hadoop cluster planning, installation and configuration
Hadoop ecosystem: Pig, Hive, Sqoop, HBase
Big Data future: Impala, Cassandra

2929322 
Data Mining & Machine Learning with R 
14 hours 
Introduction to Data mining and Machine Learning
Statistical learning vs. Machine learning
Iteration and evaluation
Bias-variance tradeoff
Regression
Linear regression
Generalizations and Nonlinearity
Exercises
Classification
Bayesian refresher
Naive Bayes
Discriminant analysis
Logistic regression
K-nearest neighbors
Support Vector Machines
Neural networks
Decision trees
Exercises
Cross-validation and Resampling
Cross-validation approaches
Bootstrap
Exercises
Unsupervised Learning
K-means clustering
Examples
Challenges of unsupervised learning and beyond K-means
Advanced topics
Ensemble models
Mixed models
Boosting
Examples
Multidimensional reduction
Factor Analysis
Principal Component Analysis
Examples

2004 
Six Sigma Yellow Belt 
21 hours 
Yellow Belt covers the basics of the Six Sigma Define Measure Analyse Improve Control (DMAIC) approach, enabling delegates to take part in and lead team-based waste and defect reduction projects and initiatives. In addition, emphasis is placed on applying the problem-solving tools in daily roles.
At the end of the course you will be equipped to look at your immediate team and role, determine what can be improved and create a business improvement project on a selected opportunity that is aligned to customer requirements.
You will be able to analyse the process using visualisation tools, identify the waste (non-value-adding) components and work to eliminate them from the process. You will apply root cause analysis techniques to identify the underlying causes of defects in the process.
The course uses simulations, case study exercises and work based projects to enable delegates to 'learn through doing'.
Notes: This course has a minimum class size of 4. If requested, it can be delivered in 2 days with some reduction in course content and level of detail, notably in Customer needs, Graphical analysis and Process handover.
An overview of project selection and scoping
Understanding customer needs and how they impact project aims
Discovering processes using visualisation techniques
Understanding the causes of work and how to simplify
Finding and removing process waste
Graphical analysis to understand process performance
Problem solving tools to determine root cause
Basic solution creation
Piloting & implementation
Process handover

287854 
Octave for Data Analysis 
14 hours 
Audience:
This course is for data scientists and statisticians who have some familiarity with statistical methods and would like to use the Octave programming language at work.
The purpose of this course is to give a practical introduction to Octave programming to participants interested in using this programming language at work.
environment
data types:
numeric
string, arrays
matrices
variables
expressions
control flow
functions
exception handling
debugging
input/output
linear algebra
optimization
statistical distributions
regression
plotting

2929326 
R Programming for Data Analysis 
14 hours 
This course is part of the Data Scientist skill set (Domain: Data and Technology)
Introduction and preliminaries
Making R more friendly, R and available GUIs
Rstudio
Related software and documentation
R and statistics
Using R interactively
An introductory session
Getting help with functions and features
R commands, case sensitivity, etc.
Recall and correction of previous commands
Executing commands from or diverting output to a file
Data permanency and removing objects
Simple manipulations; numbers and vectors
Vectors and assignment
Vector arithmetic
Generating regular sequences
Logical vectors
Missing values
Character vectors
Index vectors; selecting and modifying subsets of a data set
Other types of objects
Objects, their modes and attributes
Intrinsic attributes: mode and length
Changing the length of an object
Getting and setting attributes
The class of an object
Arrays and matrices
Arrays
Array indexing. Subsections of an array
Index matrices
The array() function
The outer product of two arrays
Generalized transpose of an array
Matrix facilities
Matrix multiplication
Linear equations and inversion
Eigenvalues and eigenvectors
Singular value decomposition and determinants
Least squares fitting and the QR decomposition
Forming partitioned matrices, cbind() and rbind()
The concatenation function, c(), with arrays
Frequency tables from factors
Lists and data frames
Lists
Constructing and modifying lists
Concatenating lists
Data frames
Making data frames
attach() and detach()
Working with data frames
Attaching arbitrary lists
Managing the search path
Data manipulation
Selecting, subsetting observations and variables
Filtering, grouping
Recoding, transformations
Aggregation, combining data sets
Character manipulation, stringr package
Reading data
Txt files
CSV files
XLS, XLSX files
SPSS, SAS, Stata, … and data in other formats
Exporting data to txt, csv and other formats
Accessing data from databases using SQL language
Probability distributions
R as a set of statistical tables
Examining the distribution of a set of data
One- and two-sample tests
Grouping, loops and conditional execution
Grouped expressions
Control statements
Conditional execution: if statements
Repetitive execution: for loops, repeat and while
Writing your own functions
Simple examples
Defining new binary operators
Named arguments and defaults
The '...' argument
Assignments within functions
More advanced examples
Efficiency factors in block designs
Dropping all names in a printed array
Recursive numerical integration
Scope
Customizing the environment
Classes, generic functions and object orientation
Graphical procedures
High-level plotting commands
The plot() function
Displaying multivariate data
Display graphics
Arguments to high-level plotting functions
Basic visualisation graphs
Multivariate relations with the lattice and ggplot2 packages
Using graphics parameters
Graphics parameters list
Automated and interactive reporting
Combining output from R with text

2005 
Six Sigma Green Belt 
70 hours 
Green Belts participate in and lead Lean and Six Sigma projects from within their regular job function. They can tackle projects as part of a cross-functional team or projects scoped within their normal job.
Green Belt training sessions are 3 or 4 weeks apart; in between, the Green Belts apply their training to their improvement projects. We recommend supporting the Green Belts on their projects between training sessions and holding stage-gate reviews with leadership and Lean Six Sigma Champions to ensure the DMAIC methodology is rigorously applied.
Week 1 Foundation: covers the fundamentals of the Lean Six Sigma Define Measure Analyse Improve Control (DMAIC) approach, enabling participants to take part in and lead waste and defect reduction projects and initiatives.
Week 2 Practitioner: provides additional data analysis and lean tools for participants to lead well scoped process improvement projects related to their regular job function.
Block 1
Day 1
Introduction to Six Sigma
Project Chartering & VOC
Process Mapping
Stakeholder analysis
Day 2
Team Start Up
Prioritisation Matrix
Lean Thinking
Value Stream Mapping
Day 3
Data Collection
Minitab and Graphical Analysis
Descriptive Statistics
Day 4
Measurement System Evaluation
Process Capability Cp, Cpk
Six Sigma Metrics
Day 5
5 Why
FMEA
Block 2
Day 1
Review of Block 1
Multi-Vari
Inferential Statistics
Intro to Hypothesis Testing
Day 2
2-sample t-tests
F tests
Hypothesis Testing – Chi Sq
Day 3
Hypothesis Testing – ANOVA
Day 4
Correlation and Regression
Multiple Regression
Introduction to Design Of Experiments
Day 5
Mistake Proofing
Control Plans
Control Charts

1223203 
Data Mining and Analysis 
28 hours 
Objective:
Delegates will be able to analyse big data sets, extract patterns and choose the variables that affect the results, so that a new model with predictive power can be built.
Data preprocessing
Data Cleaning
Data integration and transformation
Data reduction
Discretization and concept hierarchy generation
Statistical inference
Probability distributions, Random variables, Central limit theorem
Sampling
Confidence intervals
Statistical Inference
Hypothesis testing
Multivariate linear regression
Specification
Subset selection
Estimation
Validation
Prediction
Classification methods
Logistic regression
Linear discriminant analysis
K-nearest neighbours
Naive Bayes
Comparison of Classification methods
Neural Networks
Fitting neural networks
Issues in training neural networks
Decision trees
Regression trees
Classification trees
Trees Versus Linear Models
Bagging, Random Forests, Boosting
Bagging
Random Forests
Boosting
Support Vector Machines and Flexible Discriminants
Maximal Margin classifier
Support vector classifiers
Support vector machines
SVMs for two or more classes
Relationship to logistic regression
Principal Components Analysis
Clustering
K-means clustering
K-medoids clustering
Hierarchical clustering
Density-based clustering
Model Assessment and Selection
Bias, Variance and Model complexity
In-sample prediction error
The Bayesian approach
Crossvalidation
Bootstrap methods

2929330 
Big Data & Database Systems Fundamentals 
14 hours 
The course is part of the Data Scientist skill set (Domain: Data and Technology).
Data Warehousing Concepts
What is a Data Warehouse?
Difference between OLTP and Data Warehousing
Data Acquisition
Data Extraction
Data Transformation.
Data Loading
Data Marts
Dependent vs Independent Data Marts
Database design
ETL Testing Concepts:
Introduction.
Software development life cycle.
Testing methodologies.
ETL Testing Work Flow Process.
ETL Testing Responsibilities in Data stage.
Big data Fundamentals
Big Data and its role in the corporate world
The phases of development of a Big Data strategy within a corporation
Explain the rationale underlying a holistic approach to Big Data
Components needed in a Big Data Platform
Big data storage solution
Limits of Traditional Technologies
Overview of database types
NoSQL Databases
Hadoop
Map Reduce
Apache Spark 
2006 
Six Sigma Black Belt 
84 hours 
Six Sigma is a data-driven approach that tackles variation to improve the performance of products, services and processes, combining practical problem solving and the best scientific approaches found in experimentation and optimisation of systems. The approach has been widely and successfully applied in industry, notably by Motorola, AlliedSignal & General Electric.
Black Belt is a qualification for improvement managers in a Six Sigma organisation. You will learn the tools and techniques to take an improvement project through the Define, Measure, Analyse, Improve and Control phases (DMAIC). These techniques include Process Mapping, Measurement System Evaluation, Regression Analysis, Design of Experiments, Statistical Tolerancing, Monte Carlo Simulation and Lean Thinking.
The content of the course takes the participants through the DMAIC phases as well as introducing subjects such as Lean Thinking, Design for Six Sigma and discussing important leadership issues and experiences in deploying a Six Sigma programme.
Week 1 Foundation: covers the fundamentals of the Lean Six Sigma Define Measure Analyse Improve Control (DMAIC) approach, enabling participants to take part in and lead waste and defect reduction projects and initiatives.
Week 2 Practitioner: provides additional data analysis and lean tools for participants to lead well scoped process improvement projects related to their regular job function.
Week 3 Expert: provides regression, design of experiment and data analysis techniques to enable participants to tackle complex problem solving projects that require understanding of the relationships between multiple variables.
The trainer has 16 years of experience with Six Sigma. As well as leading the deployment of Six Sigma at a number of businesses, he has trained and coached over 300 Black Belts. Here are a few comments from previous participants:
“Probably the most valuable course I will ever pass”
“The content was very well delivered. The examples very relevant. Thank you”
“The course was excellent and I am able to use part of it to coach my lean teams here” (Company supervisor who attended with KTP associate)
Block 1
Day 1
Introduction to Six Sigma
Project Chartering & VOC
Process Mapping
Stakeholder analysis
Day 2
Team Start Up
Prioritisation Matrix
Lean Thinking
Value Stream Mapping
Day 3
Data Collection
Minitab and Graphical Analysis
Descriptive Statistics
Day 4
Measurement System Evaluation
Process Capability Cp, Cpk
Six Sigma Metrics
Day 5
5 Why
FMEA
Block 2
Day 1
Review of Block 1
Multi-Vari
Inferential Statistics
Intro to Hypothesis Testing
Day 2
2-sample t-tests
F tests
Hypothesis Testing – Chi Sq
Day 3
Hypothesis Testing – ANOVA
Day 4
Correlation and Regression
Multiple Regression
Introduction to Design Of Experiments
Day 5
Mistake Proofing
Control Plans
Control Charts
Block 3
Day 1
Review of Block 2
2K Factorial Experiments
Box-Cox Transformations
Hypothesis Testing – Nonparametric
Day 2
2K Factorial Experiments
Fractional Factorial Experiments
Day 3
Noise Blocking Robustness
Centre Points
General Full Factorial Experiments
Day 4
Response Surface Experiments
Implementing Improvements
Creative Solutions
Day 5
Intro to Design for Six Sigma
Statistical Tolerancing
Monte Carlo Simulation
Certification
Six Sigma is a practical qualification: to demonstrate what you have learnt on the course, you will need to undertake 2 coursework projects. There is no report to produce, but you will be required to give a PowerPoint presentation to the trainer and examiner showing your results and method. The projects can cover work you would complete in your normal job; however, you will need to show use of the DMAIC problem-solving approach and application of Six Sigma and Lean tools. This provides a good balance between the practical approach and more rigorous analysis, which together lead to robust solutions. You will be able to contact the trainer to discuss how Six Sigma tools could benefit your project. Examples of projects from previous participants include:
Formulating cream texture for seasonality in dairy feeds.
Housing Association complaints reduction
Multivariable (cost, efficiency, size) optimisation of a fuel cell
Job Scheduling improvement in a factory
Ambulance waiting time reduction
Reduction in resin thickness variation in glass manufacture
NobleProg & Redlands provide Black Belt certification. For delegates that require independent accreditation, NobleProg & Redlands have partnered with the British Quality Foundation (BQF) to provide Lean Six Sigma Black Belt certification. Certification requires passing an exam at the end of the course and completing and presenting two improvement projects that demonstrate understanding and application of the Six Sigma approach and techniques.
An additional charge of £600 plus VAT is levied for BQF independent accreditation. 
1223205 
Introductory R for Biologists 
28 hours 
I. Introduction and preliminaries
1. Overview
Making R more friendly, R and available GUIs
Rstudio
Related software and documentation
R and statistics
Using R interactively
An introductory session
Getting help with functions and features
R commands, case sensitivity, etc.
Recall and correction of previous commands
Executing commands from or diverting output to a file
Data permanency and removing objects
Good programming practice: self-contained scripts, good readability (e.g. structured scripts), documentation, Markdown
installing packages; CRAN and Bioconductor
2. Reading data
Txt files (read.delim)
CSV files
3. Simple manipulations; numbers and vectors + arrays
Vectors and assignment
Vector arithmetic
Generating regular sequences
Logical vectors
Missing values
Character vectors
Index vectors; selecting and modifying subsets of a data set
Arrays
Array indexing. Subsections of an array
Index matrices
The array() function + simple operations on arrays e.g. multiplication, transposition
Other types of objects
4. Lists and data frames
Lists
Constructing and modifying lists
Concatenating lists
Data frames
Making data frames
Working with data frames
Attaching arbitrary lists
Managing the search path
5. Data manipulation
Selecting, subsetting observations and variables
Filtering, grouping
Recoding, transformations
Aggregation, combining data sets
Forming partitioned matrices, cbind() and rbind()
The concatenation function, c(), with arrays
Character manipulation, stringr package
Short introduction to grep and regexpr
6. More on Reading data
XLS, XLSX files
readr and readxl packages
SPSS, SAS, Stata, … and data in other formats
Exporting data to txt, csv and other formats
7. Grouping, loops and conditional execution
Grouped expressions
Control statements
Conditional execution: if statements
Repetitive execution: for loops, repeat and while
Introduction to apply, lapply, sapply, tapply
8. Functions
Creating functions
Optional arguments and default values
Variable number of arguments
Scope and its consequences
9. Simple graphics in R
Creating a Graph
Density Plots
Dot Plots
Bar Plots
Line Charts
Pie Charts
Boxplots
Scatter Plots
Combining Plots
II. Statistical analysis in R
1. Probability distributions
R as a set of statistical tables
Examining the distribution of a set of data
2. Testing of Hypotheses
Tests about a Population Mean
Likelihood Ratio Test
One- and two-sample tests
Chi-Square Goodness-of-Fit Test
Kolmogorov-Smirnov One-Sample Statistic
Wilcoxon Signed-Rank Test
Two-Sample Test
Wilcoxon Rank Sum Test
Mann-Whitney Test
Kolmogorov-Smirnov Test
3. Multiple Testing of Hypotheses
Type I Error and FDR
ROC curves and AUC
Multiple Testing Procedures (BH, Bonferroni etc.)
4. Linear regression models
Generic functions for extracting model information
Updating fitted models
Generalized linear models
Families
The glm() function
Classification
Logistic Regression
Linear Discriminant Analysis
Unsupervised learning
Principal Components Analysis
Clustering Methods (k-means, hierarchical clustering, k-medoids)
5. Survival analysis (survival package)
Survival objects in R
Kaplan-Meier estimate, log-rank test, parametric regression
Confidence bands
Censored (interval censored) data analysis
Cox PH models, constant covariates
Cox PH models, time-dependent covariates
Simulation: Model comparison (Comparing regression models)
6. Analysis of Variance
One-Way ANOVA
Two-Way Classification of ANOVA
MANOVA
III. Worked problems in bioinformatics
Short introduction to limma package
Microarray data analysis workflow
Data download from GEO: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1397
Data processing (QC, normalisation, differential expression)
Volcano plot
Clustering examples + heatmaps

287889 
Survey Research, Sampling Techniques & Estimation 
14 hours 
Survey research:
Principle of sample survey design and implementation
survey preliminaries
sampling methods (probability & non-probability methods)
population & sampling frames
survey data collection methods
Questionnaire design
Design and writing of questionnaires
Pretests & piloting
Planning & organisation of surveys
Minimising errors, bias & non-response at the design stage
Survey data processing
Commissioning surveys/research
Sample Techniques & Estimation:
Sampling techniques and their strengths/weaknesses (may overlap above sampling methods)
Simple Random Sampling
Unequal Probability Sampling
Stratified Sampling (with proportional to size & disproportional selection)
Systematic Sampling
Cluster sampling
Multistage Sampling
Quota Sampling
Estimation
Methods of estimating sample sizes
Estimating population parameters using sample estimates
Variance and confidence intervals estimation
Estimating bias/precision
Methods of correcting bias
Methods of handling missing data
Non-response analysis

2012 
Statistics for Managers 
35 hours 
This course has been created for decision makers whose primary goal is not to do the calculation and the analysis, but to understand them.
The course uses a lot of pictures, diagrams, computer simulations, anecdotes and a sense of humour to explain the concepts and pitfalls of statistics.
Introduction to Statistics
What are Statistics?
Importance of Statistics
Descriptive Statistics
Inferential Statistics
Variables
Percentiles
Measurement
Levels of Measurement
Basics of Data Collection
Distributions
Summation Notation
Linear Transformations
Common Pitfalls
Biased samples
Average, mean or median?
Misleading graphs
Semi-attached figures
Third variable problem
Ceteris paribus
Errors in reasoning
Understanding confidence level
Understanding Results
Describing Bivariate Data
Probability
Normal Distributions
Sampling Distributions
Estimation
Logic of Hypothesis Testing
Testing Means
Power
Prediction
ANOVA
Chi Square
Case Studies
Discussion about case studies chosen by the delegates.

1496646 
Data Analytics With R 
21 hours 
R is a very popular, open-source environment for statistical computing, data analytics and graphics. This course introduces the R programming language, covering language fundamentals, libraries and advanced concepts, as well as advanced data analytics and graphing with real-world data.
Audience
Developers / data analysts
Duration
3 days
Format
Lectures and Hands-on
Day One: Language Basics
Course Introduction
About Data Science
Data Science Definition
Process of Doing Data Science.
Introducing R Language
Variables and Types
Control Structures (Loops / Conditionals)
R Scalars, Vectors, and Matrices
Defining R Vectors
Matrices
String and Text Manipulation
Character data type
File IO
Lists
Functions
Introducing Functions
Closures
lapply/sapply functions
DataFrames
Labs for all sections
Day Two: Intermediate R Programming
DataFrames and File I/O
Reading data from files
Data Preparation
Builtin Datasets
Visualization
Graphics Package
plot() / barplot() / hist() / boxplot() / scatter plot
Heat Map
ggplot2 package ( qplot(), ggplot())
Exploration With dplyr
Labs for all sections
Day 3: Advanced Programming With R
Statistical Modeling With R
Statistical Functions
Dealing With NA
Distributions (Binomial, Poisson, Normal)
Regression
Introducing Linear Regressions
Recommendations
Text Processing (tm package / Wordclouds)
Clustering
Introduction to Clustering
K-Means
Classification
Introduction to Classification
Naive Bayes
Decision Trees
Training using caret package
Evaluating Algorithms
R and Big Data
Hadoop
Big Data Ecosystem
RHadoop
Labs for all sections

287890 
Data Shrinkage for Government 
14 hours 
Why shrink data
Relational databases
Introduction
Aggregation and disaggregation
Normalisation and denormalisation
Null values and zeroes
Joining data
Complex joins
Cluster analysis
Applications
Strengths and weaknesses
Measuring distance
Hierarchical clustering
K-means and derivatives
Applications in Government
Factor analysis
Concepts
Exploratory factor analysis
Confirmatory factor analysis
Principal component analysis
Correspondence analysis
Software
Applications in Government
Predictive analytics
Timelines and naming conventions
Holdout samples
Weights of evidence
Information value
Scorecard building demonstration using a spreadsheet
Regression in predictive analytics
Logistic regression in predictive analytics
Decision Trees in predictive analytics
Neural networks
Measuring accuracy
Applications in Government

2013 
The Practitioner’s Guide to Multivariate Techniques 
14 hours 
The introduction of the digital computer, and now the widespread availability of computer packages, has opened up a hitherto difficult area of statistics: multivariate analysis. Previously, the formidable computing effort associated with these procedures presented a real barrier. That barrier has now disappeared, and the analyst can concentrate on appreciating and interpreting the findings.
Multivariate Analysis of Variance (MANOVA)
Whereas the Analysis of Variance technique (ANOVA) investigates possible systematic differences between prescribed groups of individuals on a single variable, Multivariate Analysis of Variance is simply an extension of that procedure to numerous variates viewed collectively. These variates could be distinct in nature, for example height and weight, or repeated measures of a single variate over time or over space. When the variates are repeated measures over time or space, the analysis may often be reduced to a succession of univariate analyses, which are easier to interpret. This procedure is often referred to as Repeated Measures Analysis.
Principal Component Analysis
If only two variates are recorded for a number of individuals, the data may conveniently be represented on a two-dimensional plot. If there are ‘p’ variates, one could imagine a plot of the data in ‘p’-dimensional space. The technique of Principal Component Analysis corresponds to a rotation of the axes so that the maximum amounts of variation are progressively represented along the new axes. It has been described as ‘peering into multidimensional space, from every conceivable angle, and selecting as the viewing angle that which contains the maximum amount of variation’. The aim, therefore, is to reduce the dimensionality of multivariate data. If, for example, a very high percentage (say 90%) of the variability is contained in the first two principal components, a plot of these components would be a virtually complete pictorial representation of the variability.
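For the two-variate case, the rotation can be computed directly. The following is a minimal sketch in plain Python, not part of the course materials: the function name `pca_2d` and the sample data are hypothetical, and it uses the closed-form eigenvalues of a 2x2 sample covariance matrix rather than a general routine for ‘p’ variates.

```python
import math


def pca_2d(xs, ys):
    """Principal components of two variates via a closed-form 2x2 eigen-decomposition.

    Hypothetical helper for illustration. Raises ZeroDivisionError if all points coincide.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample covariance matrix [[a, b], [b, c]] with an n - 1 divisor.
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix: the variances along the rotated axes.
    half_trace = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Angle of the first principal axis, measured from the original x axis.
    theta = 0.5 * math.atan2(2 * b, a - c)
    # Component scores: coordinates of each centred point on the rotated axes.
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    scores = [((x - mx) * cos_t + (y - my) * sin_t,
               -(x - mx) * sin_t + (y - my) * cos_t)
              for x, y in zip(xs, ys)]
    # Fraction of the total variance carried by the first component.
    share = lam1 / (lam1 + lam2)
    return lam1, lam2, theta, share, scores


lam1, lam2, theta, share, scores = pca_2d([1, 2, 3, 4], [2, 4, 6, 8])
print(round(share, 6), round(theta, 6))
```

With points lying exactly on the line y = 2x, the first component carries essentially all of the variance (share close to 1), the first axis is rotated by arctan 2 from the original x axis, and every second-component score is approximately zero.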
Discriminant Analysis
Suppose that several variates are observed on individuals from two identified groups. Discriminant analysis involves calculating the linear function of the variates that best separates the groups. This linear function may then be used to identify the group membership of an individual simply from its pattern of variate values. Various methods are available to estimate how successful this identification procedure is likely to be in general.
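A two-group linear discriminant of the kind described above can be sketched in plain Python using Fisher's approach, which is one standard choice and is assumed here. The coefficient vector is w = S_W^-1 (m_A - m_B), where m_A and m_B are the group mean vectors and S_W is the pooled within-group scatter matrix; an individual is assigned to group A when w·x exceeds the midpoint of the two projected group means. The midpoint rule assumes equal prior probabilities and equal misclassification costs. The apparent error rate (reclassifying the training data) is included as one simple, and optimistic, estimate of success; leave-one-out cross-validation is a common less biased alternative. All function names are hypothetical.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]  # augmented
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):     # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fisher_discriminant(group_a, group_b):
    """Return (w, threshold); assign x to group A when w.x > threshold."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    p = len(ma)
    sw = [[0.0] * p for _ in range(p)]  # pooled within-group scatter matrix
    for rows, m in ((group_a, ma), (group_b, mb)):
        for r in rows:
            d = [r[j] - m[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    sw[i][j] += d[i] * d[j]
    w = solve(sw, [ma[j] - mb[j] for j in range(p)])
    threshold = sum(w[j] * (ma[j] + mb[j]) / 2 for j in range(p))  # midpoint rule
    return w, threshold


def classify(x, w, threshold):
    return "A" if sum(wj * xj for wj, xj in zip(w, x)) > threshold else "B"


def apparent_error_rate(group_a, group_b, w, threshold):
    """Share of training individuals misclassified (resubstitution estimate)."""
    wrong = sum(classify(x, w, threshold) != "A" for x in group_a)
    wrong += sum(classify(x, w, threshold) != "B" for x in group_b)
    return wrong / (len(group_a) + len(group_b))
```

With group A = (2, 3), (3, 3), (3, 4), (4, 5) and group B = (6, 1), (7, 2), (7, 1), (8, 3), the sketch gives w = (-5, 4) and a threshold of -14, and all eight training individuals are classified correctly.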
Canonical Variate Analysis
Canonical Variate Analysis is in essence an extension of Discriminant Analysis to accommodate the situation where there are more than two groups of individuals.
Cluster Analysis
Cluster Analysis, as the name suggests, involves identifying groupings (or clusters) of individuals in multidimensional space. Because there is no a priori grouping of individuals, identifying these so-called clusters is a subjective process that depends on various assumptions. Most computer packages offer several clustering procedures, which may often give differing results. However, the pictorial representation of the clusters in tree diagrams called dendrograms provides a very useful diagnostic.
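The agglomerative (hierarchical) procedure behind a dendrogram can be sketched in plain Python: start with each individual as its own cluster, repeatedly merge the two closest clusters, and record the distance at which each merge happens. Those merge heights are what a dendrogram draws. Euclidean distance and the three linkage rules below (single, complete, average) are assumptions chosen for illustration, and the function names are hypothetical. Running two linkages on the same data shows how different procedures can give different clusters.

```python
import math

# Linkage rules: how the distance between two clusters is derived
# from all pairwise distances between their members.
LINKAGES = {
    "single": min,                            # nearest members
    "complete": max,                          # furthest members
    "average": lambda ds: sum(ds) / len(ds),  # mean pairwise distance
}


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(points, linkage="single"):
    """Merge history [(id_a, id_b, height), ...]; new clusters get ids n, n+1, ..."""
    combine = LINKAGES[linkage]
    clusters = {i: [i] for i in range(len(points))}
    next_id = len(points)
    merges = []

    def cluster_distance(a, b):
        return combine([euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b]])

    while len(clusters) > 1:
        ids = sorted(clusters)
        height, a, b = min((cluster_distance(a, b), a, b)
                           for k, a in enumerate(ids) for b in ids[k + 1:])
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, height))
        next_id += 1
    return merges


def cut(merges, n_points, k):
    """Replay the first n_points - k merges and return k groups of point indices."""
    groups = {i: [i] for i in range(n_points)}
    next_id = n_points
    for a, b, _ in merges[: n_points - k]:
        groups[next_id] = groups.pop(a) + groups.pop(b)
        next_id += 1
    return sorted(sorted(g) for g in groups.values())
```

For the one-dimensional values 0, 1.0, 2.2, 3.6 and 5.2, cutting the tree into two clusters gives {0, 1.0, 2.2, 3.6} and {5.2} under single linkage, but {0, 1.0} and {2.2, 3.6, 5.2} under complete linkage.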
Factor Analysis
If ‘p’ variates are observed on each of ‘n’ individuals, factor analysis attempts to identify, say, ‘r’ (< p) so-called factors that largely determine the variate values. The implicit assumption is therefore that the entire array of ‘p’ variates is controlled by ‘r’ factors. For example, the ‘p’ variates could represent the performance of students in numerous examination subjects, and we may wish to determine whether a few attributes, such as numerical ability and linguistic ability, could account for much of the variability. The difficulty here stems from the fact that the so-called factors are not directly observable, and indeed may not really exist.
Factor analysis has long been viewed with suspicion because of the degree of speculation involved in identifying the factors. One popular numerical procedure starts with a rotation of the axes using principal components (described above), followed by a further rotation of the factors identified. 
1572264 
Statistical Quality Analysis 
7 hours 
This course covers the fundamentals of statistical process control and how these quality tools can provide the evidence needed to improve and control processes. You will learn when and where to use the various types of control charts available in Minitab for your own processes, and how to use capability analysis tools to evaluate them.
Gage R&R
Destructive Testing
Gage Linearity and Bias
Attribute Agreement
Variables and Attribute Control Charts
Capability Analysis for Normal, Non-normal and Attribute data

287891 
Statistical and Econometric Modelling 
21 hours 
The Nature of Econometrics and Economic Data
Econometrics and models
Steps in econometric modelling
Types of economic data: time series, cross-sectional, panel
Causality in econometric analysis
Specification and Data Issues
Functional form
Proxy variables
Measurement error in variables
Missing data, outliers, influential observations
Regression Analysis
Estimation
Ordinary least squares (OLS) estimators
Classical OLS assumptions
Gauss-Markov Theorem
Best Linear Unbiased Estimators
Inference
Testing statistical significance of parameters, t-test (single, group)
Confidence intervals
Testing multiple linear restrictions, F-test
Goodness of fit
Testing functional form
Missing variables
Binary variables
Testing for violation of assumptions and their implications:
Heteroscedasticity
Autocorrelation
Multicollinearity
Endogeneity
Other Estimation techniques
Instrumental Variables Estimation
Generalised Least Squares
Maximum Likelihood
Generalised Method of Moments
Models for Binary Response Variables
Linear Probability Model
Probit Model
Logit Model
Estimation
Interpretation of parameters, Marginal Effects
Goodness of Fit
Limited Dependent Variables
Tobit Model
Truncated Normal Distribution
Interpretation of Tobit Model
Specification and Estimation Issues
Time Series Models
Characteristics of Time Series
Decomposition of Time Series
Exponential Smoothing
Stationarity
ARIMA models
Cointegration
Error correction models (ECM)
Predictive Analysis
Forecasting, Planning and Goals
Steps in Forecasting
Evaluating Forecast Accuracy
Residual Diagnostics
Prediction Intervals

1841 
Introduction to R 
21 hours 
R is a free, open-source programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts in corporations and academia. It has also found followers among statisticians, engineers and scientists without computer programming skills, who find it easy to use. Its popularity is due to the increasing use of data mining for goals such as setting ad prices, finding new drugs more quickly, or fine-tuning financial models. R has a wide variety of packages for data mining.
This course covers the manipulation of objects in R, including reading data, accessing R packages, writing R functions, and making informative graphs. It includes analyzing data using common statistical models. The course teaches how to use the R software (http://www.r-project.org) both on a command line and in a graphical user interface (GUI).
Introduction and preliminaries
Making R more friendly, R and available GUIs
The R environment
Related software and documentation
R and statistics
Using R interactively
An introductory session
Getting help with functions and features
R commands, case sensitivity, etc.
Recall and correction of previous commands
Executing commands from or diverting output to a file
Data permanency and removing objects
Simple manipulations; numbers and vectors
Vectors and assignment
Vector arithmetic
Generating regular sequences
Logical vectors
Missing values
Character vectors
Index vectors; selecting and modifying subsets of a data set
Other types of objects
Objects, their modes and attributes
Intrinsic attributes: mode and length
Changing the length of an object
Getting and setting attributes
The class of an object
Ordered and unordered factors
A specific example
The function tapply() and ragged arrays
Ordered factors
Arrays and matrices
Arrays
Array indexing. Subsections of an array
Index matrices
The array() function
Mixed vector and array arithmetic. The recycling rule
The outer product of two arrays
Generalized transpose of an array
Matrix facilities
Matrix multiplication
Linear equations and inversion
Eigenvalues and eigenvectors
Singular value decomposition and determinants
Least squares fitting and the QR decomposition
Forming partitioned matrices, cbind() and rbind()
The concatenation function, c(), with arrays
Frequency tables from factors
Lists and data frames
Lists
Constructing and modifying lists
Concatenating lists
Data frames
Making data frames
attach() and detach()
Working with data frames
Attaching arbitrary lists
Managing the search path
Reading data from files
The read.table() function
The scan() function
Accessing built-in datasets
Loading data from other R packages
Editing data
Probability distributions
R as a set of statistical tables
Examining the distribution of a set of data
One- and two-sample tests
Grouping, loops and conditional execution
Grouped expressions
Control statements
Conditional execution: if statements
Repetitive execution: for loops, repeat and while
Writing your own functions
Simple examples
Defining new binary operators
Named arguments and defaults
The '...' argument
Assignments within functions
More advanced examples
Efficiency factors in block designs
Dropping all names in a printed array
Recursive numerical integration
Scope
Customizing the environment
Classes, generic functions and object orientation
Statistical models in R
Defining statistical models; formulae
Contrasts
Linear models
Generic functions for extracting model information
Analysis of variance and model comparison
ANOVA tables
Updating fitted models
Generalized linear models
Families
The glm() function
Nonlinear least squares and maximum likelihood models
Least squares
Maximum likelihood
Some non-standard models
Graphical procedures
High-level plotting commands
The plot() function
Displaying multivariate data
Display graphics
Arguments to high-level plotting functions
Low-level plotting commands
Mathematical annotation
Hershey vector fonts
Interacting with graphics
Using graphics parameters
Permanent changes: The par() function
Temporary changes: Arguments to graphics functions
Graphics parameters list
Graphical elements
Axes and tick marks
Figure margins
Multiple figure environment
Device drivers
PostScript diagrams for typeset documents
Multiple graphics devices
Dynamic graphics
Packages
Standard packages
Contributed packages and CRAN
Namespaces

2202 
R for Data Analysis and Research 
7 hours 
Audience
managers
developers
scientists
students
Format of the course
online instruction and discussion OR face-to-face workshops
The list below gives an idea of the topics that will be covered in the workshop.
The number of topics that will be covered depends on the duration of the workshop (i.e. one, two or three days). In a one or two day workshop it may not be possible to cover all topics, and so the workshop will be tailored to suit the specific needs of the learners.
A first R session
Syntax for analysing one-dimensional data arrays
Syntax for analysing two-dimensional data arrays
Reading and writing data files
Subsetting data, sorting, ranking and ordering data
Merging arrays
Set membership
The main statistical functions in R
The Normal Distribution (correlation, probabilities, tests for normality and confidence intervals)
Ordinary Least Squares Regression
t-tests, Analysis of Variance and Multivariate Analysis of Variance
Chi-square tests for categorical variables
Writing functions in R
Writing software (scripts) in R
Control structures (e.g. Loops)
Graphical methods (including scatterplots, bar charts, pie charts, histograms, box plots and dot charts)
Graphical User Interfaces for R

2642 
Forecasting with R 
14 hours 
This course teaches delegates how to fully automate the process of forecasting with R.
Forecasting with R
Introduction to Forecasting
Exponential Smoothing
ARIMA models
The forecast package
Package 'forecast'
accuracy
Acf
arfima
Arima
arima.errors
auto.arima
bats
BoxCox
BoxCox.lambda
croston
CV
dm.test
dshw
ets
fitted.Arima
forecast
forecast.Arima
forecast.bats
forecast.ets
forecast.HoltWinters
forecast.lm
forecast.stl
forecast.StructTS
gas
gold
logLik.ets
ma
meanf
monthdays
msts
na.interp
naive
ndiffs
nnetar
plot.bats
plot.ets
plot.forecast
rwf
seasadj
seasonaldummy
seasonplot
ses
simulate.ets
sindexf
splinef
subset.ts
taylor
tbats
thetaf
tsdisplay
tslm
wineind
woolyrnq
