Course Outline

Day 1: Data Processing and Python Essentials

Session 1: Spark DataFrames and Basic Operations

  • Utilizing Spark DataFrames for Implementing Basic Operations
  • GroupBy and Aggregate Functions
  • Managing Timestamps and Dates
  • Practical Exercise: Conducting Data Analysis Using Spark DataFrames

Session 2: Python Programming for Big Data

  • Essential Python for Data Management, Including Variables, Lists, and Functions
  • Working with Classes and File Operations
  • Integrating APIs and External Data Sources
  • Practical Exercise: Developing a Python Project to Process and Analyze Data Using PySpark

Day 2: Advanced PySpark and Machine Learning

Session 3: Machine Learning with PySpark

  • Implementing Machine Learning Models with Spark MLlib, Including Linear and Logistic Regression
  • Random Forest Classification Techniques
  • Practical Exercise: Constructing and Evaluating Machine Learning Models Using PySpark

Session 4: Clustering and Recommender Systems

  • K-means Clustering Theory and Practical Application
  • Practical Exercise: Developing a K-means Clustering Model
  • Recommender Systems: Building a Recommendation Engine with Spark MLlib
  • Practical Exercise: Recommender System Project

Session 5: Spark Streaming and NLP

  • Real-Time Data Streaming with Spark: Implementing Real-Time Data Processing Solutions for Government
  • Practical Exercise: Managing Streaming Data with Spark
  • Natural Language Processing (NLP) with PySpark: Conducting Basic NLP Tasks
  • Practical Exercise: Creating an NLP Pipeline Using PySpark
Duration: 14 Hours
