Course Outline
Introduction
Understanding Big Data for Government
Overview of Spark for Government
Overview of Python for Government
Overview of PySpark for Government
- Distributing Data Using the Resilient Distributed Datasets (RDD) Framework
- Distributing Computation Using Spark API Operators
Setting Up Python with Spark for Government
Setting Up PySpark for Government
Using Amazon Web Services (AWS) EC2 Instances for Spark in a Government Context
Setting Up Databricks for Government
Setting Up the AWS EMR Cluster for Government
Learning the Basics of Python Programming for Government
- Getting Started with Python for Government
- Using the Jupyter Notebook for Government
- Using Variables and Simple Data Types for Government
- Working with Lists for Government
- Using if Statements for Government
- Using User Inputs for Government
- Working with while Loops for Government
- Implementing Functions for Government
- Working with Classes for Government
- Working with Files and Exceptions for Government
- Working with Projects, Data, and APIs for Government
Learning the Basics of Spark DataFrame for Government
- Getting Started with Spark DataFrames for Government
- Implementing Basic Operations with Spark for Government
- Using Groupby and Aggregate Operations for Government
- Working with Timestamps and Dates for Government
Working on a Spark DataFrame Project Exercise for Government
Understanding Machine Learning with MLlib for Government
Working with MLlib, Spark, and Python for Machine Learning in a Government Context
Understanding Regressions for Government
- Learning Linear Regression Theory for Government
- Implementing Regression Evaluation Code for Government
- Working on a Sample Linear Regression Exercise for Government
- Learning Logistic Regression Theory for Government
- Implementing Logistic Regression Code for Government
- Working on a Sample Logistic Regression Exercise for Government
Understanding Random Forests and Decision Trees for Government
- Learning Tree Methods Theory for Government
- Implementing Decision Tree and Random Forest Code for Government
- Working on a Sample Random Forest Classification Exercise for Government
Working with K-means Clustering for Government
- Understanding K-means Clustering Theory for Government
- Implementing K-means Clustering Code for Government
- Working on a Sample Clustering Exercise for Government
Working with Recommender Systems for Government
Implementing Natural Language Processing (NLP) for Government
- Understanding Natural Language Processing (NLP) for Government
- Overview of NLP Tools for Government
- Working on a Sample NLP Exercise for Government
Streaming with Spark on Python for Government
- Overview of Streaming with Spark for Government
- Working on a Sample Spark Streaming Exercise for Government
Closing Remarks for Government
Requirements
- General programming skills
Audience
- Software developers for government
- Information technology professionals
- Data scientists
Testimonials (6)
I liked that it was practical. I loved applying the theoretical knowledge through practical examples.
Aurelia-Adriana - Allianz Services Romania
Course - Python and Spark for Big Data (PySpark)
The course covered a series of very complex, related topics, and Pablo has in-depth expertise in each of them. Sometimes nuances were lost in communication or due to time pressures, and possibly expectations were not quite met because of this. There were also some UHG/Azure Databricks setup issues; however, Pablo and UHG resolved these quickly once they became apparent. To me, this showed a high level of understanding and professionalism between UHG and Pablo.
Michael Monks - Tech NorthWest Skillnet
Course - Python and Spark for Big Data (PySpark)
Individual attention.
ARCHANA ANILKUMAR - PPL
Course - Python and Spark for Big Data (PySpark)
Hands-on training.
Abraham Thomas - PPL
Course - Python and Spark for Big Data (PySpark)
The lessons were taught in a Jupyter notebook. The topics were structured in a logical sequence and naturally developed the session from the easier parts to the more complex. I'm already an advanced user of Python with a background in Machine Learning, so I found the course easier to follow than, possibly, some of my classmates did. I appreciate that some of the most elementary concepts were skipped and that he focused on the most substantial matters.
Angela DeLaMora - ADT, LLC
Course - Python and Spark for Big Data (PySpark)
practice tasks