Course Outline
Introduction, Objectives, and Migration Strategy
- Course goals, participant profiles and expectations, and success criteria
- High-level migration approaches and associated risk considerations
- Setting up workspaces, repositories, and laboratory datasets
Day 1 — Migration Fundamentals and Architecture
- Lakehouse concepts, a Delta Lake overview, and Databricks architecture for government deployments
- Differences between SMP and MPP and their implications for migration
- Designing the Medallion pipeline (Bronze→Silver→Gold) and an overview of Unity Catalog (a Bronze→Silver step is sketched below)
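For orientation, a Bronze→Silver step might look like the minimal PySpark sketch below. The table and column names are illustrative only, and `spark` is the session predefined in Databricks notebooks.

```python
from pyspark.sql import functions as F

# Bronze: raw records landed as-is (hypothetical table name).
bronze = spark.read.table("bronze.raw_claims")

# Silver: deduplicated, typed, and filtered records ready for business logic.
silver = (
    bronze
    .dropDuplicates(["claim_id"])
    .withColumn("claim_date", F.to_date("claim_date", "yyyy-MM-dd"))
    .filter(F.col("claim_amount").isNotNull())
)

silver.write.format("delta").mode("overwrite").saveAsTable("silver.claims")
```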
Day 1 Lab — Translating a Stored Procedure
- Hands-on migration of a sample stored procedure to a Databricks notebook
- Mapping temporary tables and cursors to DataFrame transformations (see the sketch after this list)
- Validation and comparison with the original output
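A hedged preview of the lab's core idea: a T-SQL cursor that accumulates a per-customer total row by row collapses into a single set-based aggregation that Spark can parallelize (table and column names are hypothetical).

```python
from pyspark.sql import functions as F

orders = spark.read.table("silver.orders")

# Instead of opening a cursor and accumulating per row,
# express the intent declaratively in one aggregation.
totals = (
    orders
    .groupBy("customer_id")
    .agg(F.sum("amount").alias("total_amount"))
)

totals.write.format("delta").mode("overwrite").saveAsTable("gold.customer_totals")
```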
Day 2 — Advanced Delta Lake & Incremental Loading
- ACID transactions, the transaction log, versioning, and time travel in Delta Lake
- Auto Loader, MERGE INTO patterns, upserts, and schema evolution techniques (an upsert is sketched below)
- Optimization strategies including OPTIMIZE, VACUUM, Z-ORDER, partitioning, and storage tuning
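As a minimal illustration of the MERGE INTO and time-travel topics, assuming hypothetical `silver.customers` and `bronze.customer_updates` tables:

```python
# Stage the incoming batch as a temp view so SQL can reference it.
updates = spark.read.table("bronze.customer_updates")
updates.createOrReplaceTempView("updates")

# Upsert: update matching rows, insert new ones.
spark.sql("""
    MERGE INTO silver.customers AS t
    USING updates AS s
    ON t.customer_id = s.customer_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# Time travel: read the table as of an earlier version for audit or rollback checks.
previous = spark.read.option("versionAsOf", 0).table("silver.customers")
```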
Day 2 Lab — Incremental Ingestion & Optimization
- Implementing Auto Loader ingestion and MERGE workflows (see the sketch after this list)
- Applying OPTIMIZE, Z-ORDER, and VACUUM; validating results
- Measuring improvements in read/write performance
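The lab workflow, sketched under assumed paths and table names; Auto Loader's `cloudFiles` source is Databricks-specific.

```python
# Incremental ingestion with Auto Loader.
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/schemas/claims")
    .load("/mnt/landing/claims")
)

(
    stream.writeStream
    .option("checkpointLocation", "/mnt/checkpoints/claims")
    .trigger(availableNow=True)  # process all available files, then stop
    .toTable("bronze.raw_claims")
)

# Maintenance: compact small files, cluster by a common filter column, clean up old files.
spark.sql("OPTIMIZE bronze.raw_claims ZORDER BY (claim_date)")
spark.sql("VACUUM bronze.raw_claims RETAIN 168 HOURS")  # the default 7-day retention
```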
Day 3 — SQL in Databricks, Performance & Debugging
- Analytical SQL features: window functions, higher-order functions, and JSON/array handling
- Reading the Spark UI, understanding DAGs, shuffles, stages, tasks, and diagnosing bottlenecks
- Query tuning patterns including broadcast joins, hints, caching, and reducing spills (a window function and broadcast join are sketched below)
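A compact sketch combining a window function with a broadcast-join hint; the fact and dimension tables are hypothetical.

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

sales = spark.read.table("silver.sales")
regions = spark.read.table("silver.regions")  # small dimension table

# Rank each sale within its region by amount.
w = Window.partitionBy("region_id").orderBy(F.col("amount").desc())
ranked = sales.withColumn("rank_in_region", F.row_number().over(w))

# Broadcasting the small side avoids shuffling the large fact table.
joined = ranked.join(F.broadcast(regions), "region_id")
```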
Day 3 Lab — SQL Refactoring & Performance Tuning
- Refactoring a heavy SQL process into optimized Spark SQL
- Using Spark UI traces to identify and fix skew and shuffle issues (a salting sketch follows this list)
- Benchmarking before and after, documenting tuning steps
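One skew remedy practiced in the lab is key salting. This is only a sketch under assumed table and column names; on recent runtimes, Adaptive Query Execution's skew-join handling may be sufficient on its own.

```python
from pyspark.sql import functions as F

N = 8  # number of salt buckets; tune to the observed skew

# Spread hot keys across N buckets on the large side...
big = spark.read.table("silver.events").withColumn("salt", (F.rand() * N).cast("int"))

# ...and replicate the small side once per bucket so the join still matches.
small = spark.read.table("silver.keys")
small_salted = small.crossJoin(spark.range(N).withColumnRenamed("id", "salt"))

joined = big.join(small_salted, ["key", "salt"]).drop("salt")
```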
Day 4 — Tactical PySpark: Replacing Procedural Logic
- Spark execution model: driver, executors, lazy evaluation, and partitioning strategies
- Transforming loops and cursors into vectorized DataFrame operations
- Modularization techniques, UDFs/pandas UDFs, widgets, and reusable libraries (a pandas UDF is sketched below)
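To illustrate the shift from row-by-row loops to vectorized logic, a pandas UDF applies a calculation to whole batches at once; the rate table and columns are hypothetical.

```python
import pandas as pd
from pyspark.sql.functions import col, pandas_udf

@pandas_udf("double")
def to_annual_rate(monthly: pd.Series) -> pd.Series:
    # Vectorized over entire batches instead of looping row by row.
    return (1.0 + monthly) ** 12 - 1.0

df = spark.read.table("silver.rates")
df = df.withColumn("annual_rate", to_annual_rate(col("monthly_rate")))
```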
Day 4 Lab — Refactoring Procedural Scripts
- Refactoring a procedural ETL script into modular PySpark notebooks
- Introducing parametrization, unit-style tests, and reusable functions (sketched after this list)
- Conducting code reviews and applying best-practice checklists
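A minimal sketch of the parametrization-and-testing pattern, assuming a Databricks notebook where `dbutils` is available; the widget name and filter logic are illustrative.

```python
from pyspark.sql import DataFrame, functions as F

# Notebook parameter, settable per run or by a Workflows task.
dbutils.widgets.text("run_date", "2024-01-01")
run_date = dbutils.widgets.get("run_date")

def filter_to_run_date(df: DataFrame, run_date: str) -> DataFrame:
    """Pure function, so it can be unit-tested without running the whole pipeline."""
    return df.filter(F.col("load_date") == F.lit(run_date))

# Unit-style check against a tiny in-memory DataFrame.
sample = spark.createDataFrame([("2024-01-01",), ("2024-01-02",)], ["load_date"])
assert filter_to_run_date(sample, "2024-01-01").count() == 1
```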
Day 5 — Orchestration, End-to-End Pipeline & Best Practices
- Databricks Workflows: job design, task dependencies, triggers, and error handling
- Designing incremental Medallion pipelines with quality rules and schema validation (a quality-rule check is sketched below)
- Integration with Git (GitHub/Azure DevOps), CI, and testing strategies for PySpark logic
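As a taste of the quality-rule topic, a simple fail-fast check that a Workflows task could run before promoting data; the rules and table names are hypothetical.

```python
from pyspark.sql import functions as F

silver = spark.read.table("silver.claims")

# Example rules: keys must be present and amounts non-negative.
bad = silver.filter(F.col("claim_id").isNull() | (F.col("claim_amount") < 0))
bad_count = bad.count()

if bad_count > 0:
    # Raising makes the Workflows task fail, so retries and alerts can kick in.
    raise ValueError(f"Quality check failed: {bad_count} invalid rows in silver.claims")
```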
Day 5 Lab — Build a Complete End-to-End Pipeline
- Assembling a Bronze→Silver→Gold pipeline orchestrated with Workflows
- Implementing logging, auditing, retries, and automated validations (a logging-and-audit sketch follows this list)
- Running the full pipeline, validating outputs, and preparing deployment notes
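A hedged sketch of the logging-and-auditing piece, assuming an `ops.pipeline_audit` Delta table created for the purpose.

```python
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

started = datetime.now(timezone.utc)
row_count = spark.read.table("gold.customer_totals").count()
log.info("gold.customer_totals row count: %d", row_count)

# Append one row per run so successive runs can be compared over time.
audit = spark.createDataFrame(
    [(started.isoformat(), "gold.customer_totals", row_count)],
    ["run_started_utc", "table_name", "row_count"],
)
audit.write.format("delta").mode("append").saveAsTable("ops.pipeline_audit")
```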
Operationalization, Governance, and Production Readiness
- Best practices for Unity Catalog governance, lineage, and access controls in government environments (example GRANT statements are sketched below)
- Cost management, cluster sizing, autoscaling, and job concurrency patterns
- Deployment checklists, rollback strategies, and runbook creation
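Unity Catalog access control is expressed in SQL; for example, granting a group read access to one table (the catalog, schema, and group names are illustrative):

```python
# Privileges cascade: the group needs USE on the catalog and schema, then SELECT.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_readers`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.silver TO `data_readers`")
spark.sql("GRANT SELECT ON TABLE main.silver.claims TO `data_readers`")
```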
Final Review, Knowledge Transfer, and Next Steps
- Participant presentations of migration work and lessons learned
- Gap analysis, recommended follow-up activities, and handoff of training materials
- References, further learning paths, and support options
Requirements
- A solid understanding of data engineering principles
- Practical experience with SQL and stored procedures (Synapse / SQL Server)
- Knowledge of ETL orchestration concepts (Azure Data Factory or similar)
Audience
- Technology managers with a background in data engineering
- Data engineers transitioning from procedural OLAP logic to Lakehouse patterns
- Platform engineers responsible for the adoption of Databricks, particularly for government projects
35 Hours