Course Outline

Introduction to Data Science for Big Data Analytics

  • Data Science Overview
  • Big Data Overview
  • Data Structures
  • Drivers and Complexities of Big Data
  • Big Data Ecosystem and a New Approach to Analytics
  • Key Technologies in Big Data
  • Data Mining Process and Problems
    • Association Pattern Mining
    • Data Clustering
    • Outlier Detection
    • Data Classification

Introduction to Data Analytics Lifecycle for Government

  • Discovery
  • Data Preparation
  • Model Planning
  • Model Building
  • Presentation/Communication of Results
  • Operationalization
  • Exercise: Case Study

From this point on, most of the training time (80%) is spent on examples and exercises in R and related big data technologies.

Getting Started with R for Government

  • Installing R and RStudio
  • Features of R Language
  • Objects in R
  • Data in R
  • Data Manipulation
  • Big Data Issues
  • Exercises

Getting Started with Hadoop for Government

  • Installing Hadoop
  • Understanding Hadoop Modes
  • HDFS
  • MapReduce Architecture
  • Hadoop Related Projects Overview
  • Writing Programs in Hadoop MapReduce
  • Exercises

Integrating R and Hadoop with RHadoop for Government

  • Components of RHadoop
  • Installing RHadoop and Connecting with Hadoop
  • The Architecture of RHadoop
  • Hadoop Streaming with R
  • Data Analytics Problem Solving with RHadoop
  • Exercises

Pre-processing and Preparing Data for Government

  • Data Preparation Steps
  • Feature Extraction
  • Data Cleaning
  • Data Integration and Transformation
  • Data Reduction – Sampling, Feature Subset Selection
  • Dimensionality Reduction
  • Discretization and Binning
  • Exercises and Case Study

Exploratory Data Analytic Methods in R for Government

  • Descriptive Statistics
  • Exploratory Data Analysis
  • Visualization – Preliminary Steps
  • Visualizing Single Variable
  • Examining Multiple Variables
  • Statistical Methods for Evaluation
  • Hypothesis Testing
  • Exercises and Case Study

Data Visualizations for Government

  • Basic Visualizations in R
  • Packages for Data Visualization: ggplot2, lattice, plotly
  • Formatting Plots in R
  • Advanced Graphs
  • Exercises

Regression (Estimating Future Values) for Government

  • Linear Regression
  • Use Cases
  • Model Description
  • Diagnostics
  • Problems with Linear Regression
  • Shrinkage Methods, Ridge Regression, the Lasso
  • Generalizations and Nonlinearity
  • Regression Splines
  • Local Polynomial Regression
  • Generalized Additive Models
  • Regression with RHadoop
  • Exercises and Case Study

Classification for Government

  • Classification-Related Problems
  • Bayesian Refresher
  • Naïve Bayes
  • Logistic Regression
  • K-Nearest Neighbors
  • Decision Tree Algorithms
  • Neural Networks
  • Support Vector Machines
  • Diagnostics of Classifiers
  • Comparison of Classification Methods
  • Scalable Classification Algorithms
  • Exercises and Case Study

Assessing Model Performance and Selection for Government

  • Bias, Variance, and Model Complexity
  • Accuracy vs Interpretability
  • Evaluating Classifiers
  • Measures of Model/Algorithm Performance
  • Hold-Out Method of Validation
  • Cross-Validation
  • Tuning Machine Learning Algorithms with the caret Package
  • Visualizing Model Performance with Profit, ROC, and Lift Curves

Ensemble Methods for Government

  • Bagging
  • Random Forests
  • Boosting
  • Gradient Boosting
  • Exercises and Case Study

Support Vector Machines for Classification and Regression for Government

  • Maximal Margin Classifiers
    • Support Vector Classifiers
    • Support Vector Machines
    • SVMs for Classification Problems
    • SVMs for Regression Problems
  • Exercises and Case Study

Identifying Unknown Groupings Within a Data Set for Government

  • Feature Selection for Clustering
  • Representative-Based Algorithms: k-Means, k-Medoids
  • Hierarchical Algorithms: Agglomerative and Divisive Methods
  • Probabilistic Model-Based Algorithms: EM
  • Density-Based Algorithms: DBSCAN, DENCLUE
  • Cluster Validation
  • Advanced Clustering Concepts
  • Clustering with RHadoop
  • Exercises and Case Study

Discovering Connections with Link Analysis for Government

  • Link Analysis Concepts
  • Metrics for Analyzing Networks
  • The PageRank Algorithm
  • Hyperlink-Induced Topic Search
  • Link Prediction
  • Exercises and Case Study

Association Pattern Mining for Government

  • Frequent Pattern Mining Model
  • Scalability Issues in Frequent Pattern Mining
  • Brute Force Algorithms
  • Apriori Algorithm
  • The FP Growth Approach
  • Evaluation of Candidate Rules
  • Applications of Association Rules
  • Validation and Testing
  • Diagnostics
  • Association Rules with R and Hadoop
  • Exercises and Case Study

Constructing Recommendation Engines for Government

  • Understanding Recommender Systems
  • Data Mining Techniques Used in Recommender Systems
  • Recommender Systems with the recommenderlab Package
  • Evaluating the Recommender Systems
  • Recommendations with RHadoop
  • Exercise: Building Recommendation Engine

Text Analysis for Government

  • Text Analysis Steps
  • Collecting Raw Text
  • Bag of Words
  • Term Frequency – Inverse Document Frequency
  • Determining Sentiments
  • Exercises and Case Study

Duration: 35 Hours
