Course Outline
Machine Learning Introduction for Government
- Types of machine learning – supervised vs unsupervised
- Transition from statistical learning to machine learning
- The data mining workflow: business understanding, data preparation, modeling, deployment
- Selecting the appropriate algorithm for specific tasks
- Addressing overfitting and the bias-variance tradeoff
Python and ML Libraries Overview
- Rationale for using programming languages in machine learning
- Comparing R and Python for machine learning applications
- Introduction to Python and Jupyter Notebooks
- Key Python libraries: pandas, NumPy, scikit-learn, matplotlib, seaborn
Testing and Evaluating ML Algorithms
- Generalization, overfitting, and model validation techniques
- Evaluation strategies: holdout, cross-validation, bootstrapping
- Regression metrics: Mean Error (ME), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE)
- Classification metrics: accuracy, confusion matrix, handling imbalanced classes
- Visualizing model performance: profit curve, Receiver Operating Characteristic (ROC) curve, lift curve
- Selecting models and tuning parameters with grid search
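As an illustration of the regression metrics listed above, the following is a minimal sketch using NumPy. The function name `regression_metrics`, the sample values, and the sign convention for the error (actual minus predicted) are assumptions made for this example, not part of the course materials. MAPE is undefined when an actual value is zero, so the sketch assumes non-zero actuals.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return ME, MSE, RMSE and MAPE for two equal-length sequences.

    Assumes the error is defined as actual minus predicted and that
    no actual value is zero (MAPE divides by the actual values).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = np.mean(errors ** 2)
    return {
        "ME": float(np.mean(errors)),            # signed errors can cancel out
        "MSE": float(mse),                       # penalizes large errors more
        "RMSE": float(np.sqrt(mse)),             # same units as the target
        "MAPE": float(np.mean(np.abs(errors / y_true)) * 100),  # in percent
    }

# Hypothetical actual and predicted values
metrics = regression_metrics([100, 200, 400], [110, 190, 400])
print(metrics)
```

In this toy case the Mean Error is zero because the over- and under-predictions cancel, which is one reason ME is usually read alongside RMSE or MAPE rather than on its own.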
Data Preparation for Government
- Data import and storage in Python
- Conducting exploratory analysis and generating summary statistics
- Handling missing values and outliers in data sets
- Standardization, normalization, and data transformation techniques
- Recoding qualitative data and performing data wrangling with pandas
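The difference between standardization and normalization in the list above can be sketched as follows. The helper names `standardize` and `min_max_normalize` and the sample values are hypothetical. The sketch assumes the population standard deviation and a column that is not constant, since a constant column would cause division by zero.

```python
import numpy as np

def standardize(x):
    """Z-score scaling: zero mean and unit (population) standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def min_max_normalize(x):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Hypothetical numeric column
values = [2.0, 4.0, 6.0, 8.0]
z = standardize(values)
scaled = min_max_normalize(values)
print(z)
print(scaled)
```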
Classification Algorithms
- Binary vs multiclass classification methods
- Logistic regression and discriminant functions
- Naïve Bayes, k-nearest neighbors (k-NN)
- Decision trees and tree ensembles: Classification and Regression Trees (CART), bagging, random forests, boosting, XGBoost
- Support Vector Machines (SVM) and kernel methods
- Ensemble learning techniques for improved accuracy
Regression and Numerical Prediction
- Least squares regression and variable selection methods
- Regularization techniques: L1 (Lasso), L2 (Ridge)
- Polynomial regression and nonlinear models
- Regression trees and spline functions
Unsupervised Learning
- Clustering methods: k-means, k-medoids, hierarchical clustering, Self-Organizing Maps (SOMs)
- Dimensionality reduction techniques: Principal Component Analysis (PCA), factor analysis, Singular Value Decomposition (SVD)
- Multidimensional scaling for visualizing high-dimensional data
Text Mining
- Text preprocessing and tokenization
- Bag-of-words model, stemming, and lemmatization techniques
- Sentiment analysis and word frequency calculations
- Visualizing text data with word clouds
Recommendation Systems
- User-based and item-based collaborative filtering methods
- Designing and evaluating recommendation engines
Association Pattern Mining
- Frequent itemsets and the Apriori algorithm
- Market basket analysis and lift ratio calculations
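The lift ratio mentioned above can be computed from simple transaction counts. The sketch below is plain Python. The function name `lift` and the toy baskets are assumptions for illustration, and the code assumes both itemsets occur at least once in the data.

```python
def lift(transactions, antecedent, consequent):
    """Lift of the rule antecedent -> consequent.

    lift = confidence(antecedent -> consequent) / support(consequent)
    A value above 1 suggests the items co-occur more often than expected
    under independence; a value below 1 suggests less often.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    support_a = sum(antecedent <= set(t) for t in transactions) / n
    support_c = sum(consequent <= set(t) for t in transactions) / n
    support_both = sum((antecedent | consequent) <= set(t) for t in transactions) / n
    confidence = support_both / support_a
    return confidence / support_c

# Hypothetical market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
    {"bread", "milk"},
]
bread_milk_lift = lift(baskets, {"bread"}, {"milk"})
print(bread_milk_lift)
```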
Outlier Detection
- Extreme value analysis for identifying outliers
- Distance-based and density-based outlier detection methods
- Handling outlier detection in high-dimensional data sets
Machine Learning Case Study
- Understanding the business problem and its context
- Data preprocessing and feature engineering for improved model performance
- Selecting models and tuning parameters
- Evaluating results and presenting findings clearly
- Deploying machine learning solutions
Summary and Next Steps
Requirements
- A foundational understanding of statistics and linear algebra
- Familiarity with data analysis or business intelligence principles
- Prior exposure to programming, preferably Python or R, is beneficial
- An interest in applying machine learning techniques to data-driven government initiatives
Audience
- Data analysts and scientists working in the public sector
- Statisticians and research professionals focused on governmental projects
- Developers and IT professionals exploring machine learning tools for government applications
- Any individual involved in data science or predictive analytics initiatives within a government context
Testimonials (3)
Even with having to miss a day due to customer meetings, I feel I have a much clearer understanding of the processes and techniques used in Machine Learning and when I would use one approach over another. Our challenge now is to practice what we have learned and start to apply it to our problem domain.
Richard Blewett - Rock Solid Knowledge Ltd
Course - Machine Learning – Data science
I like that the training was focused on examples and coding. I thought it was impossible to pack so much content into three days of training, but I was wrong. The training covered many topics and everything was done in a very detailed manner (especially the tuning of model parameters: I didn't expect that there would be time for this and I was greatly surprised).
Bartosz Rosiek - GE Medical Systems Polska Sp. Zoo
Course - Machine Learning – Data science
It shows many methods with pre-prepared scripts; the materials are very nicely prepared and easy to trace back.