Course Outline

Introduction

This section introduces when machine learning is appropriate for government use, what should be considered before adopting it, and its advantages and limitations. Topics include data types (structured, unstructured, static, streamed), data validity and volume, data-driven vs. user-driven analytics, statistical models vs. machine learning models, the challenges of unsupervised learning, the bias-variance trade-off, iteration and evaluation, cross-validation approaches, and supervised, unsupervised, and reinforcement learning.

MAJOR TOPICS

1. Understanding Naive Bayes

  • Basic concepts of Bayesian methods
  • Probability
  • Joint probability
  • Conditional probability with Bayes' theorem
  • The naive Bayes algorithm
  • The naive Bayes classification
  • The Laplace estimator
  • Using numeric features with naive Bayes
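
As an illustration of the Laplace estimator listed above, the following minimal sketch shows how adding a small pseudo-count keeps a single unseen feature value from forcing a class probability to zero. The function name `laplace_likelihood`, its parameters, and the toy data are assumptions made for this example; they are not taken from the course material.

```python
from collections import Counter


def laplace_likelihood(feature_values, value, n_categories, alpha=1):
    """Estimate P(feature = value | class) with additive (Laplace) smoothing.

    feature_values: observed values of one categorical feature within one class
    n_categories:   number of distinct values the feature can take
    alpha:          pseudo-count added to every category (0 disables smoothing)
    """
    counts = Counter(feature_values)
    return (counts[value] + alpha) / (len(feature_values) + alpha * n_categories)


# Hypothetical toy data: does the word "grant" appear in four spam messages?
spam_has_grant = ["yes", "yes", "yes", "yes"]

# Without smoothing, the unseen value "no" gets probability 0, which would
# zero out the whole product of likelihoods in naive Bayes.
unsmoothed = laplace_likelihood(spam_has_grant, "no", n_categories=2, alpha=0)

# With alpha = 1, the estimate becomes (0 + 1) / (4 + 2) = 1/6.
smoothed = laplace_likelihood(spam_has_grant, "no", n_categories=2, alpha=1)

print(unsmoothed, smoothed)
```

The choice of `alpha = 1` is the classic default; other small positive values work the same way.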

2. Understanding Decision Trees

  • Divide and conquer
  • The C5.0 decision tree algorithm
  • Choosing the best split
  • Pruning the decision tree
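
For the "Choosing the best split" topic above, one common criterion is entropy-based information gain. The sketch below assumes that criterion for illustration; the outline does not state which formulation the course uses. The helper names `entropy` and `information_gain` and the toy labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, partitions):
    """Entropy of the parent minus the size-weighted entropy of its partitions."""
    total = len(parent)
    weighted = sum(len(part) / total * entropy(part) for part in partitions)
    return entropy(parent) - weighted


# Hypothetical toy labels: two approved and two denied applications.
parent = ["approve", "approve", "deny", "deny"]

# A split that separates the classes perfectly removes all uncertainty.
perfect_gain = information_gain(
    parent, [["approve", "approve"], ["deny", "deny"]]
)

# A split that leaves each partition as mixed as the parent gains nothing.
useless_gain = information_gain(
    parent, [["approve", "deny"], ["approve", "deny"]]
)

print(perfect_gain, useless_gain)
```

A tree learner would evaluate each candidate split this way and keep the one with the highest gain.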

3. Understanding Neural Networks

  • From biological to artificial neurons
  • Activation functions
  • Network topology
  • The number of layers
  • The direction of information travel
  • The number of nodes in each layer
  • Training neural networks with backpropagation
  • Deep Learning

4. Understanding Support Vector Machines

  • Classification with hyperplanes
  • Finding the maximum margin
  • The case of linearly separable data
  • The case of non-linearly separable data
  • Using kernels for non-linear spaces

5. Understanding Clustering

  • Clustering as a machine learning task
  • The k-means algorithm for clustering
  • Using distance to assign and update clusters
  • Choosing the appropriate number of clusters
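
The assign-and-update loop named in the k-means topics above can be sketched in a few lines. This is a minimal illustration under stated assumptions: Euclidean distance, caller-supplied starting centers, and a simple rule that keeps the old center when a cluster ends up empty. The function names and the sample points are hypothetical.

```python
import math


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]


def update(points, labels, centers):
    """Move each center to the mean of the points assigned to it."""
    new_centers = []
    for i, old_center in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == i]
        if not members:
            # Assumption: an empty cluster simply keeps its previous center.
            new_centers.append(old_center)
            continue
        new_centers.append(
            tuple(sum(coord) / len(members) for coord in zip(*members))
        )
    return new_centers


def k_means(points, centers, max_iter=100):
    """Alternate assignment and update steps until the centers stop moving."""
    labels = assign(points, centers)
    for _ in range(max_iter):
        new_centers = update(points, labels, centers)
        if new_centers == centers:
            break
        centers = new_centers
        labels = assign(points, centers)
    return labels, centers


# Hypothetical toy data: two visibly separate groups of 2-D points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centers = k_means(points, centers=[(1, 1), (8, 8)])
print(labels, centers)
```

In practice the result depends on the starting centers and on the chosen number of clusters, which is why the outline lists choosing k as its own topic.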

6. Measuring Performance for Classification

  • Working with classification prediction data
  • A closer look at confusion matrices
  • Using confusion matrices to measure performance
  • Beyond accuracy – other measures of performance
  • The kappa statistic
  • Sensitivity and specificity
  • Precision and recall
  • The F-measure
  • Visualizing performance tradeoffs
  • ROC curves
  • Estimating future performance
  • The holdout method
  • Cross-validation
  • Bootstrap sampling
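
The performance measures listed above can all be derived from the four cells of a binary confusion matrix. The following sketch uses the standard definitions; the function name `classification_metrics` and the example counts are assumptions chosen only to make the arithmetic easy to follow.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common performance measures from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate, also called recall
    specificity = tn / (tn + fp)  # true negative rate
    precision = tp / (tp + fp)    # positive predictive value
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)

    # Kappa adjusts accuracy for the agreement expected by chance alone.
    chance_positive = ((tp + fp) / total) * ((tp + fn) / total)
    chance_negative = ((fn + tn) / total) * ((fp + tn) / total)
    expected = chance_positive + chance_negative
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Hypothetical counts for 100 predictions.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 0.70 while kappa is only 0.40, which shows why the outline looks beyond accuracy.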

7. Tuning Stock Models for Better Performance

  • Using caret for automated parameter tuning
  • Creating a simple tuned model
  • Customizing the tuning process
  • Improving model performance with meta-learning
  • Understanding ensembles
  • Bagging
  • Boosting
  • Random forests
  • Training random forests
  • Evaluating random forest performance

MINOR TOPICS

8. Understanding Classification Using Nearest Neighbors

  • The kNN algorithm
  • Calculating distance
  • Choosing an appropriate k
  • Preparing data for use with kNN
  • Why is the kNN algorithm lazy?
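
A minimal sketch of the kNN topics above, assuming Euclidean distance and a simple majority vote. The function name `knn_predict`, the labels, and the training rows are hypothetical. There is no training step: the stored examples are consulted only when a prediction is requested, which is the sense in which kNN is called lazy.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training examples.

    train: list of (features, label) pairs, where features is a tuple of numbers
    """
    neighbors = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training examples with two numeric features on similar scales.
train = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 8.5), "high"),
]

near_low = knn_predict(train, (2.0, 2.0), k=3)
near_high = knn_predict(train, (8.0, 9.0), k=3)
print(near_low, near_high)
```

Because distance is sensitive to feature scale, real data usually needs normalization first, which is what the data preparation topic covers.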

9. Understanding Classification Rules

  • Separate and conquer
  • The One Rule algorithm
  • The RIPPER algorithm
  • Rules from decision trees

10. Understanding Regression

  • Simple linear regression
  • Ordinary least squares estimation
  • Correlations
  • Multiple linear regression
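
For simple linear regression, ordinary least squares has a closed-form solution: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below illustrates this with a hypothetical helper `ols_fit` and made-up data that lie exactly on a line.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-deviations and squared deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data generated from y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
intercept, slope = ols_fit(x, y)
print(intercept, slope)
```

Multiple linear regression generalizes the same idea to several predictors and is normally solved with matrix methods rather than by hand.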

11. Understanding Regression Trees and Model Trees

  • Adding regression to trees

12. Understanding Association Rules

  • The Apriori algorithm for association rule learning
  • Measuring rule interest – support and confidence
  • Building a set of rules with the Apriori principle
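
The two rule-interest measures named above have short definitions: support is the share of transactions that contain an itemset, and confidence is the support of the whole rule divided by the support of its left-hand side. The following sketch assumes those standard definitions; the function names and the basket data are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs)."""
    both = set(lhs) | set(rhs)
    return support(transactions, both) / support(transactions, lhs)


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

bread_milk_support = support(transactions, {"bread", "milk"})
bread_to_milk_conf = confidence(transactions, {"bread"}, {"milk"})
print(bread_milk_support, bread_to_milk_conf)
```

Apriori avoids scoring every possible itemset by using the fact that a set can only be frequent if all of its subsets are frequent.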

Extras

  • Spark/PySpark/MLlib and multi-armed bandits

Requirements

Python Expertise for Government

The Python programming language is a versatile tool that can significantly enhance the efficiency and effectiveness of various operations within government agencies. Its simplicity and readability make it an ideal choice for developing robust applications, automating tasks, and conducting data analysis. For government entities, Python's extensive library support and community-driven development ensure that it remains at the forefront of technological advancements, facilitating seamless integration into existing workflows. Additionally, Python's ability to handle large datasets and complex algorithms aligns well with the growing need for data-driven decision-making in the public sector.

 21 Hours
