Course Outline
Introduction, Objectives, and Migration Strategy
- Course goals, alignment with participant profiles, and success criteria
- High-level migration strategies and risk considerations for government
- Setting up workspaces, repositories, and lab datasets
Day 1 — Migration Fundamentals and Architecture
- Lakehouse concepts, Delta Lake fundamentals, and Databricks architecture for government deployments
- SMP vs. MPP architectures and what the difference means for migration planning
- Medallion (Bronze→Silver→Gold) design and a Unity Catalog overview (see the sketch below)
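A minimal Bronze→Silver sketch in PySpark to ground the Medallion terminology; the citizen_requests dataset and storage paths are illustrative assumptions, not course assets:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: land raw data as-is, adding only ingestion metadata
bronze = (spark.read.format("json").load("/mnt/raw/citizen_requests")
          .withColumn("_ingested_at", F.current_timestamp()))
bronze.write.format("delta").mode("append").save("/mnt/bronze/citizen_requests")

# Silver: deduplicated, typed, validated records for downstream use
silver = (spark.read.format("delta").load("/mnt/bronze/citizen_requests")
          .dropDuplicates(["request_id"])
          .withColumn("submitted_on", F.to_date("submitted_on"))
          .filter(F.col("request_id").isNotNull()))
silver.write.format("delta").mode("overwrite").save("/mnt/silver/citizen_requests")
```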
Day 1 Lab — Translating a Stored Procedure
- Hands-on migration of a sample stored procedure to a Databricks notebook, observing public sector compliance standards
- Mapping temp tables and cursors to DataFrame transformations (sketch below)
- Validating output against the original procedure to confirm data integrity
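To illustrate the lab's core move, here is a hedged sketch of replacing a cursor-plus-temp-table pattern with a set-based DataFrame transformation; the pending_cases table and fee logic are hypothetical stand-ins for the lab's stored procedure:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# T-SQL pattern being replaced (illustrative):
#   a CURSOR iterates #pending_cases row by row, computes a fee,
#   and inserts each result into #results
pending = spark.table("bronze.pending_cases")  # hypothetical source

# In Spark the loop becomes one set-based expression over all rows
results = (pending
           .withColumn("fee",
                       F.when(F.col("priority") == "urgent", F.col("base_fee") * 1.5)
                        .otherwise(F.col("base_fee")))
           .groupBy("department")
           .agg(F.sum("fee").alias("total_fees"),
                F.count("*").alias("case_count")))

# The temp table becomes a temporary view (or a Delta table)
results.createOrReplaceTempView("results")
```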
Day 2 — Advanced Delta Lake & Incremental Loading
- ACID transactions, the Delta commit log, versioning, and time travel for auditable data management (sketch below)
- Auto Loader, MERGE INTO patterns, upserts, and schema evolution
- OPTIMIZE, VACUUM, Z-ORDER, partitioning, and storage tuning
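A short sketch of Delta versioning and time travel, reusing the illustrative Silver path from earlier; DeltaTable.history, versionAsOf, and timestampAsOf are standard Delta Lake features:

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
path = "/mnt/silver/citizen_requests"  # illustrative path

# Every write is an ACID transaction recorded in the commit log
DeltaTable.forPath(spark, path).history() \
    .select("version", "timestamp", "operation").show(truncate=False)

# Time travel by version: useful for audits and rollback checks
v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)

# Time travel by timestamp
snapshot = (spark.read.format("delta")
            .option("timestampAsOf", "2024-01-01")
            .load(path))
```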
Day 2 Lab — Incremental Ingestion & Optimization
- Implementing Auto Loader ingestion and MERGE workflows (sketch below)
- Applying OPTIMIZE, Z-ORDER, and VACUUM, then validating the results
- Measuring read/write performance improvements
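A condensed version of what this lab builds, assuming hypothetical paths and a silver.requests table; note that the cloudFiles source (Auto Loader) is Databricks-specific:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Auto Loader: incremental file discovery with schema tracking
stream = (spark.readStream.format("cloudFiles")
          .option("cloudFiles.format", "csv")
          .option("cloudFiles.schemaLocation", "/mnt/_schemas/requests")
          .load("/mnt/landing/requests"))

def upsert(batch_df, batch_id):
    # MERGE each micro-batch into Silver (upsert keyed on request_id)
    batch_df.createOrReplaceTempView("updates")
    batch_df.sparkSession.sql("""
        MERGE INTO silver.requests AS t
        USING updates AS s ON t.request_id = s.request_id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)

(stream.writeStream.foreachBatch(upsert)
 .option("checkpointLocation", "/mnt/_chk/requests")
 .trigger(availableNow=True)
 .start())

# Post-load maintenance: compact small files, cluster on a common filter column
spark.sql("OPTIMIZE silver.requests ZORDER BY (department_id)")
spark.sql("VACUUM silver.requests RETAIN 168 HOURS")
```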
Day 3 — SQL in Databricks, Performance & Debugging
- Analytical SQL features: window functions, higher-order functions, and JSON/array handling
- Reading the Spark UI: DAGs, shuffles, stages, tasks, and bottleneck diagnosis
- Query tuning patterns: broadcast joins, hints, caching, and spill reduction (sketch below)
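A brief sketch pairing a window function with a broadcast join; silver.cases and silver.departments are hypothetical tables, with the departments table assumed small enough to broadcast:

```python
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()

cases = spark.table("silver.cases")        # hypothetical fact table
depts = spark.table("silver.departments")  # small dimension table

# Window function: position each case in its department's queue
w = Window.partitionBy("department_id").orderBy(F.col("opened_on").asc())
ranked = cases.withColumn("queue_position", F.row_number().over(w))

# Broadcast join: ship the small table to every executor, avoiding a shuffle
enriched = ranked.join(F.broadcast(depts), "department_id")

# Equivalent hint in SQL form
spark.sql("""
    SELECT /*+ BROADCAST(d) */ c.*, d.department_name
    FROM silver.cases c
    JOIN silver.departments d USING (department_id)
""")
```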
Day 3 Lab — SQL Refactoring & Performance Tuning
- Refactoring a heavy SQL process into optimized Spark SQL
- Using Spark UI traces to identify and fix skew and shuffle issues (see the salting sketch below)
- Benchmarking before/after and documenting the tuning steps for transparent governance
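One way the skew exercise can look: salting a hot join key and timing both variants. The tables, the salt factor, and the noop benchmark sink are illustrative assumptions:

```python
import time
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

def timed(label, df):
    # Crude benchmark: force full evaluation, report wall-clock time
    start = time.time()
    df.write.format("noop").mode("overwrite").save()
    print(f"{label}: {time.time() - start:.1f}s")

facts = spark.table("silver.events")    # hypothetical skewed fact table
dim = spark.table("silver.agencies")    # one hot agency_id dominates

# Before: all rows for the hot key land on a single shuffle task
timed("plain join", facts.join(dim, "agency_id"))

# After: salt the key so the hot key spreads across N tasks
N = 16
salted_facts = facts.withColumn("salt", (F.rand() * N).cast("long"))
salted_dim = dim.crossJoin(spark.range(N).withColumnRenamed("id", "salt"))
timed("salted join", salted_facts.join(salted_dim, ["agency_id", "salt"]))
```

On recent Databricks runtimes, AQE's skew-join handling may resolve this automatically, so benchmark before hand-rolling a salt.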
Day 4 — Tactical PySpark: Replacing Procedural Logic
- The Spark execution model: driver, executors, lazy evaluation, and partitioning strategies
- Transforming loops and cursors into vectorized DataFrame operations (pandas UDF sketch below)
- Modularization, UDFs/pandas UDFs, widgets, and reusable libraries
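A small sketch of the vectorization idea: prefer built-in column expressions, and reach for a pandas UDF only when custom logic is unavoidable. The silver.payments table and its columns are hypothetical:

```python
import pandas as pd
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.getOrCreate()

df = spark.table("silver.payments")  # hypothetical table

# First choice: built-in expressions, fully optimized by Catalyst
df = df.withColumn("net", F.col("amount") - F.col("refunded"))

# When custom logic is unavoidable, a pandas UDF runs vectorized
# batch-at-a-time instead of row-by-row like a cursor
@pandas_udf("string")
def clean_ref(raw: pd.Series) -> pd.Series:
    return raw.str.upper().str.replace(r"[^A-Z0-9]", "", regex=True)

df.withColumn("case_ref", clean_ref("raw_case_ref")).show(5)
```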
Day 4 Lab — Refactoring Procedural Scripts
- Refactoring a procedural ETL script into modular PySpark notebooks
- Introducing parametrization, unit-style tests, and reusable functions (sketch below)
- Code review against a best-practice checklist aligned with government standards
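A sketch of the parametrization-plus-test pattern this lab applies; the run_date widget and sample data are illustrative, and the dbutils fallback keeps the code runnable outside Databricks:

```python
from pyspark.sql import DataFrame, SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Notebook parameter on Databricks; fall back to a default elsewhere
try:
    run_date = dbutils.widgets.get("run_date")  # noqa: F821 (Databricks-only)
except NameError:
    run_date = "2024-01-01"

def filter_by_date(df: DataFrame, date_str: str) -> DataFrame:
    # Pure function: unit-testable without any cluster-side fixtures
    return df.filter(F.col("submitted_on") == F.to_date(F.lit(date_str)))

# Unit-style test against a tiny in-memory DataFrame
sample = (spark.createDataFrame(
              [("A", "2024-01-01"), ("B", "2024-01-02")],
              ["case_id", "submitted_on"])
          .withColumn("submitted_on", F.to_date("submitted_on")))

assert filter_by_date(sample, "2024-01-01").count() == 1
```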
Day 5 — Orchestration, End-to-End Pipeline & Best Practices
- Databricks Workflows: job design, task dependencies, triggers, and error handling
- Designing incremental Medallion pipelines with quality rules and schema validation (sketch below)
- Integration with Git (GitHub/Azure DevOps), CI, and testing strategies for PySpark logic
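A hedged sketch of quality rules and schema validation in plain PySpark; the schema, paths, and quarantine convention are assumptions (Delta Live Tables expectations are the managed alternative on Databricks):

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DateType

spark = SparkSession.builder.getOrCreate()

# The contract this pipeline stage expects (nullability included)
expected = StructType([
    StructField("case_id", StringType(), False),
    StructField("submitted_on", DateType(), True),
])

df = spark.read.format("delta").load("/mnt/silver/cases")  # illustrative path

# Schema validation: fail fast on drift instead of corrupting Gold
assert df.schema == expected, f"Schema drift: {df.schema.simpleString()}"

# Quality rules: quarantine violations rather than dropping them silently
bad = df.filter(F.col("case_id").isNull() |
                (F.col("submitted_on") > F.current_date()))
bad.write.format("delta").mode("append").save("/mnt/quarantine/cases")
good = df.exceptAll(bad)
```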
Day 5 Lab — Build a Complete End-to-End Pipeline
- Assembling a Bronze→Silver→Gold pipeline orchestrated with Workflows
- Implementing logging, auditing, retries, and automated validations (sketch below)
- Running the full pipeline, validating outputs, and preparing deployment notes
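A sketch of the logging/retry/validation scaffolding in plain Python; table names and the count reconciliation rule are illustrative, and Databricks Workflows can also retry at the task level:

```python
import logging
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def with_retries(fn, attempts=3, backoff_s=30):
    # Simple retry wrapper with fixed backoff
    for i in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            log.exception("attempt %d/%d failed", i, attempts)
            if i == attempts:
                raise
            time.sleep(backoff_s)

def validate_gold():
    # Automated validation: row counts must reconcile across layers
    silver_n = spark.table("silver.cases").count()
    gold_n = spark.table("gold.case_summary").agg({"case_count": "sum"}).first()[0]
    assert silver_n == gold_n, f"Count mismatch: silver={silver_n}, gold={gold_n}"
    log.info("validation passed: %d cases", silver_n)

with_retries(validate_gold)
```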
Operationalization, Governance, and Production Readiness
- Unity Catalog governance: lineage, access controls, and audit best practices (sketch below)
- Cost management: cluster sizing, autoscaling, and job concurrency patterns
- Deployment checklists, rollback strategies, and runbook creation for production environments
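Unity Catalog access control is expressed as SQL GRANTs; a small sketch with illustrative catalog, schema, and group names (requires a UC-enabled workspace):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Grant groups the minimum privileges they need (names are illustrative)
spark.sql("GRANT USE CATALOG ON CATALOG gov_data TO `data-engineers`")
spark.sql("GRANT USE SCHEMA ON SCHEMA gov_data.gold TO `analysts`")
spark.sql("GRANT SELECT ON SCHEMA gov_data.gold TO `analysts`")

# Audit questions often start from the table's transaction history
spark.sql("DESCRIBE HISTORY gov_data.gold.case_summary").show(truncate=False)
```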
Final Review, Knowledge Transfer, and Next Steps
- Participant presentations of migration work and lessons learned
- Gap analysis, recommended follow-up activities, and handoff of training materials
- References, further learning paths, and support options
Requirements
- A solid grounding in data engineering principles
- Practical experience with SQL and stored procedures (Synapse / SQL Server)
- Knowledge of ETL orchestration methodologies (Azure Data Factory or similar)
Audience for Government
- Technology managers with a background in data engineering
- Data engineers transitioning from procedural OLAP logic to Lakehouse patterns
- Platform engineers overseeing the adoption of Databricks within their organizations
35 Hours