Course Outline
Week 1 — Introduction to Data Engineering for Government
- Fundamentals of data engineering and modern data stacks for government use
- Data ingestion patterns and sources relevant to public sector operations
- Batch vs. streaming processing: trade-offs and typical public-sector application scenarios
- Hands-on lab: ingesting sample data into cloud storage (see the sketch below)
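To give a feel for the lab, here is a minimal PySpark sketch of the ingest step; the file path, bucket name, and schema-inference settings are placeholders, not the actual lab assets:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sample-ingest").getOrCreate()

# Read a sample CSV file; the local path and options are illustrative.
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/tmp/sample/permits.csv")
)

# Land the data in cloud object storage as Parquet; the bucket URI is a placeholder.
df.write.mode("overwrite").parquet("s3://example-agency-bucket/raw/permits/")
```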
Week 2 — Databricks Lakehouse Foundation Badge for Government
- Fundamentals of the Databricks platform and workspace navigation
- Core Delta Lake concepts: ACID transactions, time travel, and schema evolution
- Workspace security, access controls, and Unity Catalog basics for governing sensitive public-sector data
- Hands-on lab: creating and managing Delta tables (see the sketch below)
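A minimal sketch of the Delta Lake basics covered this week, assuming a Spark session with the Delta extensions enabled (as on Databricks); the table and column names are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-basics").getOrCreate()

# Create a managed Delta table; every subsequent write is an atomic ACID transaction.
spark.sql("""
    CREATE TABLE IF NOT EXISTS bronze_inspections (
        id BIGINT,
        site STRING,
        inspected_at TIMESTAMP
    ) USING DELTA
""")

spark.sql("INSERT INTO bronze_inspections VALUES (1, 'Site A', current_timestamp())")

# Time travel: query the table as it looked at an earlier version.
spark.sql("SELECT * FROM bronze_inspections VERSION AS OF 0").show()
```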
Week 3 — Advanced SQL on Databricks for Government
- Advanced SQL constructs and window functions at scale
- Query optimization, explain plans, and cost-aware query patterns
- Materialized views, caching strategies, and performance tuning for large government datasets
- Hands-on lab: optimizing analytical queries on large datasets (see the sketch below)
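As a taste of the window-function and explain-plan material, a small self-contained sketch; the claims table and its columns are invented for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-windows").getOrCreate()

# A tiny stand-in for a much larger claims table (columns are illustrative).
spark.createDataFrame(
    [("North", 1, 120.0), ("North", 2, 340.0), ("South", 3, 90.0)],
    ["region", "claim_id", "amount"],
).createOrReplaceTempView("claims")

# Rank claims per region by amount using a window function.
ranked = spark.sql("""
    SELECT region, claim_id, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS amount_rank
    FROM claims
""")

ranked.explain(mode="formatted")  # inspect the physical plan, as in the explain-plan topic
ranked.show()
```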
Week 4 — Databricks Certified Developer for Apache Spark (Prep) for Government
- Deep dive into Spark architecture, RDDs, DataFrames, and Datasets
- Key Spark transformations and actions, with performance considerations (see the sketch below)
- Basics of Spark Streaming and Structured Streaming patterns for near-real-time analysis
- Practice exam exercises and hands-on problems to prepare for the certification
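A short sketch of the lazy-evaluation model at the heart of the exam material: transformations build a plan, and only an action executes it.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("spark-prep").getOrCreate()

df = spark.range(1_000_000).withColumn("bucket", F.col("id") % 10)

# Narrow transformation: evaluated lazily, no shuffle, no job runs yet.
evens = df.filter(F.col("id") % 2 == 0)

# Wide transformation: groupBy introduces a shuffle when an action finally runs.
counts = evens.groupBy("bucket").count()

# Action: show() materializes the plan and actually executes the job.
counts.orderBy("bucket").show()
```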
Week 5 — Introduction to Data Modeling for Government
- Fundamentals of dimensional modeling: star/snowflake schema design and normalization
- Lakehouse modeling compared with traditional warehouse approaches
- Design patterns for analytics-ready datasets that support government decision-making
- Hands-on lab: building consumption-ready tables and views for reporting and analysis (see the sketch below)
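A star schema in miniature, assuming a Delta-enabled Spark session; the agency/spending tables and their columns are invented for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("star-schema").getOrCreate()

# One dimension and one fact table form the smallest possible star schema.
spark.sql("""
    CREATE TABLE IF NOT EXISTS dim_agency (
        agency_key BIGINT,
        agency_name STRING,
        department STRING
    ) USING DELTA
""")

spark.sql("""
    CREATE TABLE IF NOT EXISTS fact_spending (
        agency_key BIGINT,          -- references dim_agency.agency_key
        fiscal_quarter STRING,
        amount DECIMAL(18, 2)
    ) USING DELTA
""")

# A consumption-ready view joins facts to dimensions for reporting.
spark.sql("""
    CREATE OR REPLACE VIEW v_spending_by_department AS
    SELECT d.department, f.fiscal_quarter, SUM(f.amount) AS total_amount
    FROM fact_spending f
    JOIN dim_agency d ON f.agency_key = d.agency_key
    GROUP BY d.department, f.fiscal_quarter
""")
```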
Week 6 — Introduction to Import Tools & Data Ingestion Automation for Government
- Connectors and ingestion tooling for Databricks, including AWS Glue, Azure Data Factory, and Kafka
- Stream ingestion and micro-batch designs for continuous data feeds
- Data validation, quality checks, and schema enforcement for reliable data
- Hands-on lab: building resilient ingestion pipelines for government data workflows (see the sketch below)
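One way the ingestion lab could look, using the Databricks Auto Loader (cloudFiles) source, which is available on Databricks rather than in open-source Spark; all paths and table names are placeholders:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ingest-automation").getOrCreate()

# Incrementally discover new files with Auto Loader (Databricks-specific source).
raw = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://example-bucket/_schemas/permits/")
    .load("s3://example-bucket/landing/permits/")
)

# A simple quality gate: keep only rows that carry the required business key.
valid = raw.filter(F.col("permit_id").isNotNull())

# Micro-batch style run: process everything available now, then stop.
(
    valid.writeStream
    .option("checkpointLocation", "s3://example-bucket/_checkpoints/permits/")
    .trigger(availableNow=True)
    .toTable("bronze_permits")
)
```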
Week 7 — Introduction to Git Flow and CI/CD for Data Engineering in Government
- Git Flow branching strategies and repository organization for data projects
- CI/CD pipelines for notebooks, jobs, and infrastructure as code
- Testing, linting, and deployment automation for reliable data engineering
- Hands-on lab: implementing Git-based workflows and automated job deployments (see the test sketch below)
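To illustrate the testing leg of a CI/CD pipeline, a pytest-style unit test that a CI runner could execute on every pull request; standardize_agency_codes is a hypothetical helper invented for this sketch, not part of the course repo:

```python
# test_transforms.py
from pyspark.sql import SparkSession, functions as F


def standardize_agency_codes(df):
    """Hypothetical transform under test: trim and upper-case agency codes."""
    return df.withColumn("agency_code", F.upper(F.trim(F.col("agency_code"))))


def test_standardize_agency_codes():
    spark = SparkSession.builder.master("local[1]").appName("ci-test").getOrCreate()
    df = spark.createDataFrame([(" dot ",), ("Epa",)], ["agency_code"])
    result = [row.agency_code for row in standardize_agency_codes(df).collect()]
    assert result == ["DOT", "EPA"]
```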
Week 8 — Databricks Certified Data Engineer Associate (Prep) & Data Engineering Patterns for Government
- Review of certification topics and practice exercises for the Databricks Certified Data Engineer Associate exam
- Architectural patterns: bronze/silver/gold layering, change data capture (CDC), and slowly changing dimensions (SCD)
- Operational patterns: monitoring, alerting, and lineage tracking for dependable pipelines
- Hands-on lab: building an end-to-end pipeline that applies these patterns (see the sketch below)
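A sketch of the CDC pattern using a Delta MERGE, assuming the delta-spark package and an existing silver_cases target table; the change-feed columns and op codes are illustrative:

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cdc-merge").getOrCreate()

# An incoming change batch; 'op' marks updates/inserts ('U') vs. deletes ('D').
changes = spark.createDataFrame(
    [(1, "OPEN", "U"), (2, "CLOSED", "U"), (3, None, "D")],
    ["case_id", "status", "op"],
)

# Apply the changes to the silver table in one atomic MERGE.
target = DeltaTable.forName(spark, "silver_cases")
(
    target.alias("t")
    .merge(changes.alias("c"), "t.case_id = c.case_id")
    .whenMatchedDelete(condition="c.op = 'D'")
    .whenMatchedUpdate(set={"status": "c.status"})
    .whenNotMatchedInsert(values={"case_id": "c.case_id", "status": "c.status"})
    .execute()
)
```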
Week 9 — Introduction to Airflow and Astronomer; Scripting for Government
- Airflow concepts: DAGs, tasks, operators, and scheduling
- Overview of the Astronomer platform and orchestration best practices
- Python scripting patterns for automating routine data tasks
- Hands-on lab: orchestrating Databricks jobs with Airflow DAGs (see the sketch below)
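A minimal Airflow DAG sketch using the Databricks provider package (apache-airflow-providers-databricks); the job ID, connection ID, and schedule are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

with DAG(
    dag_id="nightly_reporting_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",  # nightly at 02:00 (Airflow 2.4+ 'schedule' argument)
    catchup=False,
) as dag:
    run_job = DatabricksRunNowOperator(
        task_id="run_databricks_job",
        databricks_conn_id="databricks_default",  # connection configured in Airflow
        job_id=12345,                             # placeholder Databricks job ID
    )
```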
Week 10 — Data Visualization, Tableau, and Customized Final Project for Government
- Connecting Tableau to Databricks and best practices for building BI layers (a connectivity-check sketch follows this list)
- Dashboard design principles and performance-aware visualizations for government reporting
- Capstone project: scoping, implementing, and presenting a customized final project relevant to a government data initiative
- Final presentations, peer reviews, and instructor feedback
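Before wiring up Tableau, it can help to verify the SQL warehouse endpoint it will use; a sketch with the databricks-sql-connector package, where the hostname, HTTP path, and token are placeholders:

```python
from databricks import sql  # pip install databricks-sql-connector

# Tableau connects with the same hostname/HTTP path; this verifies the endpoint first.
with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abc123def456",
    access_token="dapi-REDACTED",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchall())
```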
Summary and Next Steps
Requirements
- An understanding of fundamental SQL and data concepts
- Experience with programming in Python or Scala
- Familiarity with cloud services and virtual environments
Audience
- Aspiring and practicing data engineers
- ETL/BI developers and analytics engineers
- Data platform and DevOps teams responsible for supporting data pipelines