Course Outline

PySpark & Machine Learning

Module 1: Big Data & Spark Foundations

  • Overview of the Big Data ecosystem and the role of Spark in modern data platforms
  • Understanding Spark architecture: driver, executors, cluster manager, lazy evaluation, Directed Acyclic Graph (DAG), and execution planning
  • Differences between Resilient Distributed Datasets (RDD) and DataFrame APIs and when to use each approach
  • Creating and configuring SparkSession and understanding application configuration fundamentals

Module 2: PySpark DataFrames

  • Reading and writing data from enterprise sources and formats (CSV, JSON, Parquet, Delta)
  • Working with PySpark DataFrames: transformations, actions, column expressions, filtering, joins, and aggregations
  • Implementing advanced operations such as window functions, handling timestamps, and working with nested data
  • Applying data quality checks and writing reusable, maintainable PySpark code

Module 3: Processing Large Datasets Efficiently

  • Understanding performance fundamentals: partitioning strategies, shuffle behavior, caching, and persistence
  • Using optimization techniques including broadcast joins and execution plan analysis
  • Efficient processing of large datasets and best practices for scalable data workflows
  • Understanding schema evolution and modern storage formats used in enterprise environments

Module 4: Feature Engineering at Scale

  • Performing feature engineering with Spark MLlib: handling missing values, encoding categorical variables, and feature scaling
  • Designing reusable preprocessing steps and preparing datasets for Machine Learning pipelines
  • Introduction to feature selection and handling imbalanced datasets

Module 5: Machine Learning with Spark MLlib

  • Understanding MLlib architecture and the Estimator/Transformer pattern
  • Training regression and classification models at scale (Linear Regression, Logistic Regression, Decision Trees, Random Forest)
  • Comparing models and interpreting results in distributed Machine Learning workflows

Module 6: End-to-End ML Pipelines

  • Building end-to-end Machine Learning pipelines combining preprocessing, feature engineering, and modeling
  • Applying train/validation/test split strategies
  • Performing cross-validation and hyperparameter tuning using grid search and random search
  • Structuring reproducible Machine Learning experiments

Module 7: Model Evaluation & Practical ML Decision Making

  • Applying appropriate evaluation metrics for regression and classification problems
  • Identifying overfitting and underfitting and making practical model selection decisions
  • Interpreting feature importance and understanding model behavior

Module 8: Production & Enterprise Practices

  • Persisting and loading models in Spark
  • Implementing batch inference workflows on large datasets
  • Understanding the Machine Learning lifecycle in enterprise environments
  • Introduction to versioning, experiment tracking concepts, and basic testing strategies

 

Practical Outcome

  • Ability to work autonomously with PySpark
  • Ability to process large datasets efficiently
  • Ability to perform feature engineering at scale
  • Ability to build scalable Machine Learning pipelines

Requirements

This section is intentionally left blank.

Duration

21 Hours
