Course Outline
Introduction to Data Science for Big Data Analytics
- Data Science Overview
- Big Data Overview
- Data Structures
- Drivers and Complexities of Big Data
- Big Data Ecosystem and a New Approach to Analytics
- Key Technologies in Big Data
- Data Mining Process and Problems
- Association Pattern Mining
- Data Clustering
- Outlier Detection
- Data Classification
Introduction to Data Analytics Lifecycle for Government
- Discovery
- Data Preparation
- Model Planning
- Model Building
- Presentation/Communication of Results
- Operationalization
- Exercise: Case Study
From this point on, most of the training time (80%) will be spent on examples and exercises in R and related big data technologies.
Getting Started with R for Government
- Installing R and RStudio
- Features of R Language
- Objects in R
- Data in R
- Data Manipulation
- Big Data Issues
- Exercises
Getting Started with Hadoop for Government
- Installing Hadoop
- Understanding Hadoop Modes
- HDFS
- MapReduce Architecture
- Hadoop Related Projects Overview
- Writing Programs in Hadoop MapReduce
- Exercises
Integrating R and Hadoop with RHadoop for Government
- Components of RHadoop
- Installing RHadoop and Connecting with Hadoop
- The Architecture of RHadoop
- Hadoop Streaming with R
- Data Analytics Problem Solving with RHadoop
- Exercises
Pre-processing and Preparing Data for Government
- Data Preparation Steps
- Feature Extraction
- Data Cleaning
- Data Integration and Transformation
- Data Reduction – Sampling, Feature Subset Selection
- Dimensionality Reduction
- Discretization and Binning
- Exercises and Case Study
Exploratory Data Analytic Methods in R for Government
- Descriptive Statistics
- Exploratory Data Analysis
- Visualization – Preliminary Steps
- Visualizing Single Variable
- Examining Multiple Variables
- Statistical Methods for Evaluation
- Hypothesis Testing
- Exercises and Case Study
Data Visualizations for Government
- Basic Visualizations in R
- Packages for Data Visualization: ggplot2, lattice, plotly
- Formatting Plots in R
- Advanced Graphs
- Exercises
Regression (Estimating Future Values) for Government
- Linear Regression
- Use Cases
- Model Description
- Diagnostics
- Problems with Linear Regression
- Shrinkage Methods, Ridge Regression, the Lasso
- Generalizations and Nonlinearity
- Regression Splines
- Local Polynomial Regression
- Generalized Additive Models
- Regression with RHadoop
- Exercises and Case Study
Classification for Government
- Classification-Related Problems
- Bayesian Refresher
- Naïve Bayes
- Logistic Regression
- K-Nearest Neighbors
- Decision Tree Algorithms
- Neural Networks
- Support Vector Machines
- Diagnostics of Classifiers
- Comparison of Classification Methods
- Scalable Classification Algorithms
- Exercises and Case Study
Assessing Model Performance and Selection for Government
- Bias, Variance, and Model Complexity
- Accuracy vs Interpretability
- Evaluating Classifiers
- Measures of Model/Algorithm Performance
- Hold-Out Method of Validation
- Cross-Validation
- Tuning Machine Learning Algorithms with the caret Package
- Visualizing Model Performance with Profit, ROC, and Lift Curves
Ensemble Methods for Government
- Bagging
- Random Forests
- Boosting
- Gradient Boosting
- Exercises and Case Study
Support Vector Machines for Classification and Regression for Government
- Maximal Margin Classifiers
- Support Vector Classifiers
- Support Vector Machines
- SVMs for Classification Problems
- SVMs for Regression Problems
- Exercises and Case Study
Identifying Unknown Groupings Within a Data Set for Government
- Feature Selection for Clustering
- Representative-Based Algorithms: k-Means, k-Medoids
- Hierarchical Algorithms: Agglomerative and Divisive Methods
- Probabilistic Model-Based Algorithms: EM
- Density-Based Algorithms: DBSCAN, DENCLUE
- Cluster Validation
- Advanced Clustering Concepts
- Clustering with RHadoop
- Exercises and Case Study
Discovering Connections with Link Analysis for Government
- Link Analysis Concepts
- Metrics for Analyzing Networks
- The PageRank Algorithm
- Hyperlink-Induced Topic Search
- Link Prediction
- Exercises and Case Study
Association Pattern Mining for Government
- Frequent Pattern Mining Model
- Scalability Issues in Frequent Pattern Mining
- Brute Force Algorithms
- Apriori Algorithm
- The FP-Growth Approach
- Evaluation of Candidate Rules
- Applications of Association Rules
- Validation and Testing
- Diagnostics
- Association Rules with R and Hadoop
- Exercises and Case Study
Constructing Recommendation Engines for Government
- Understanding Recommender Systems
- Data Mining Techniques Used in Recommender Systems
- Recommender Systems with the recommenderlab Package
- Evaluating the Recommender Systems
- Recommendations with RHadoop
- Exercise: Building Recommendation Engine
Text Analysis for Government
- Text Analysis Steps
- Collecting Raw Text
- Bag of Words
- Term Frequency – Inverse Document Frequency
- Determining Sentiments
- Exercises and Case Study
35 Hours
Testimonials (2)
Intensity, Training materials and expertise, Clarity, Excellent communication with Alessandra
Marija Hornis Dmitrovic - Marija Hornis
Course - Data Science for Big Data Analytics
The example and training material were sufficient and made it easy to understand what you are doing.