Course Outline

Overview

This section covers when to apply machine learning, what to consider when implementing it, and the principles behind these systems. It addresses the advantages and limitations of the approach, including data types (structured, unstructured, static, and streaming), data integrity and volume, data-driven versus user-driven analytics, and statistical models versus machine learning models. Key topics include the challenges of unsupervised learning, the bias-variance tradeoff, iterative evaluation, cross-validation, and the differences among supervised, unsupervised, and reinforcement learning. The material is aimed at government applications that require rigorous and accountable data analytics.

PRIMARY TOPICS

1. Principles of Naive Bayes

  • Foundational concepts of Bayesian methods
  • Probability theory
  • Joint probability
  • Conditional probability and Bayes' theorem
  • The Naive Bayes algorithm
  • Naive Bayes classification
  • The Laplace estimator
  • Application of numeric features in Naive Bayes
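
As an illustration of how Bayes' theorem and the Laplace estimator fit together, the following is a minimal sketch of a categorical Naive Bayes classifier in plain Python. The function names (`train_naive_bayes`, `predict_proba`) and the toy data layout are assumptions made for this example, not course material.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, laplace=1):
    """Count class and feature-value frequencies for categorical data."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values, laplace


def predict_proba(model, row):
    """Return normalized posterior probabilities for each class."""
    class_counts, feature_counts, values, laplace = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior P(class)
        for i, value in enumerate(row):
            count = feature_counts[(i, cls)][value]
            # Laplace estimator: add a small count so that a value never
            # seen with this class does not force the product to zero.
            score *= (count + laplace) / (n_cls + laplace * len(values[i]))
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}
```

With `laplace=0` the sketch reduces to raw frequency estimates, which makes the effect of the estimator easy to compare on small data sets.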

2. Principles of Decision Trees

  • Divide and conquer methodology
  • The C5.0 decision tree algorithm
  • Optimal split selection
  • Decision tree pruning

3. Principles of Neural Networks

  • Transition from biological to artificial neurons
  • Activation functions
  • Network topology
  • Layer configuration
  • Information flow direction
  • Number of nodes per layer
  • Training via backpropagation
  • Deep learning architectures

4. Principles of Support Vector Machines

  • Hyperplane-based classification
  • Maximizing the margin
  • Handling linearly separable data
  • Handling non-linearly separable data
  • Kernel methods for non-linear spaces

5. Principles of Clustering

  • Clustering as a machine learning objective
  • The k-means algorithm
  • Distance metrics for cluster assignment and update
  • Selection of optimal cluster count
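
The assign-and-update cycle of k-means can be sketched in a few lines of plain Python. This is a simplified illustration that assumes Euclidean distance and random initial centers drawn from the data; the function name `kmeans` and its parameters are choices made for this example.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster a list of coordinate tuples into k groups (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct points as initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for j, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(c) / len(cluster) for c in zip(*cluster))
                new_centers.append(mean)
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # assignments have stabilized
        centers = new_centers
    return centers, clusters
```

Running the sketch for several values of `k` and comparing the within-cluster distances is one common way to approach the choice of cluster count.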

6. Classification Performance Metrics

  • Analysis of classification prediction data
  • Confusion matrix structure
  • Performance evaluation via confusion matrices
  • Alternative performance measures beyond accuracy
  • The kappa statistic
  • Sensitivity and specificity
  • Precision and recall
  • The F-measure
  • Visualization of performance tradeoffs
  • Receiver Operating Characteristic (ROC) curves
  • Estimation of future performance
  • The holdout method
  • Cross-validation
  • Bootstrap sampling
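
The measures listed above can all be derived from the four cells of a binary confusion matrix. The following sketch shows the arithmetic; the function name `classification_metrics` and the argument order are assumptions for this example, and it assumes no denominator is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common performance measures from binary confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # true positive rate, also called recall
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)    # positive predictive value
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Kappa adjusts accuracy for the agreement expected by chance alone.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,
        "f_measure": f_measure,
        "kappa": kappa,
    }
```

For example, with 40 true positives, 10 false positives, 20 false negatives, and 30 true negatives, accuracy is 0.7 while kappa is only 0.4, which illustrates why accuracy alone can overstate performance.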

7. Optimization of Standard Models

  • Automated parameter tuning with caret
  • Construction of optimized models
  • Customization of the tuning process
  • Performance improvement through meta-learning
  • Ensemble methodology
  • Bagging
  • Boosting
  • Random forests
  • Training random forests
  • Evaluation of random forest performance

SECONDARY TOPICS

8. Classification via Nearest Neighbors

  • The k-nearest neighbors (kNN) algorithm
  • Distance calculation methods
  • Selection of appropriate k value
  • Data preparation for kNN application
  • Characteristics of lazy algorithms in kNN

9. Classification via Rule-Based Systems

  • Separate and conquer strategy
  • The OneR algorithm
  • The RIPPER algorithm
  • Derivation of rules from decision trees

10. Principles of Regression

  • Simple linear regression
  • Ordinary least squares estimation
  • Correlation analysis
  • Multiple linear regression
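
For simple linear regression, the ordinary least squares estimates and the correlation coefficient follow directly from sums of squares. A minimal sketch in plain Python is shown here; the function name `fit_simple_ols` is an assumption for this example.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope, correlation) for the line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx                       # covariance(x, y) / variance(x)
    intercept = mean_y - slope * mean_x     # the line passes through the means
    correlation = sxy / (sxx * syy) ** 0.5  # Pearson's r
    return intercept, slope, correlation
```

Multiple linear regression generalizes the same idea to several predictors and is usually solved with matrix methods rather than these scalar formulas.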

11. Regression Trees and Model Trees

  • Integration of regression functions into tree structures

12. Association Rule Mining

  • The Apriori algorithm
  • Measurements of rule interest: support and confidence
  • Rule set construction using the Apriori principle
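
Support and confidence are simple ratios over a transaction list, as the sketch below illustrates. The function names and the representation of transactions as Python sets are assumptions for this example; it does not implement the full Apriori search.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, lhs, rhs):
    """Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs)."""
    both = set(lhs) | set(rhs)
    return support(transactions, both) / support(transactions, lhs)
```

The Apriori principle uses the fact that an itemset can only be frequent if all of its subsets are frequent, so low-support items can be discarded before larger itemsets are counted.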

SUPPLEMENTARY TOPICS

  • Spark/PySpark/MLlib implementations
  • Multi-armed bandit algorithms

Requirements

Proficiency in Python programming is required.

Duration: 21 Hours
