Course Outline
SRE Anti-patterns
- Identifying counterproductive practices in government operations
- Recognizing the impact of anti-patterns on system reliability
- Best practices and corrective alternatives to enhance reliability
SLO as a Proxy for Customer Satisfaction
- Defining Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
- Managing error budgets and balancing innovation with reliability in public sector services
- Understanding the limitations of distributed systems in government environments
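As a rough illustration of the error budget idea in this module, the sketch below turns an availability SLO into an allowed-downtime figure and a remaining-budget fraction. The function names, the 30-day window, and the request counts are illustrative assumptions, not part of the official syllabus.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window.

    slo_target is a fraction, e.g. 0.999 for "99.9% available".
    The 30-day default window is an assumption for illustration.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent.

    Uses a request-based SLI (good_events / total_events).
    Returns 1.0 when no budget is used, 0.0 when it is exhausted,
    and a negative value when the SLO has been breached.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical numbers: a 99.9% SLO over 30 days.
    print(f"Downtime budget: {error_budget_minutes(0.999):.1f} minutes")
    # Hypothetical traffic: 1,000,000 requests, 500 of them failed.
    print(f"Budget remaining: {budget_remaining(0.999, 999_500, 1_000_000):.0%}")
```

A team might use the remaining-budget figure as a release gate: while budget is left, feature work proceeds; once it is spent, effort shifts to reliability work.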
Building Secure and Reliable Systems
- Designing for fault tolerance and resilience in government systems
- Integrating security into reliability engineering practices for government
- Scalability and data protection strategies tailored for public sector use
Full-stack Observability
- Instrumentation and metrics collection methods for government systems
- Distributed tracing and synthetic monitoring techniques for enhanced visibility
- Observability-driven development to improve system transparency and accountability
Platform Engineering and AIOps
- Platform-centered engineering approaches for government applications
- Automation and orchestration in Site Reliability Engineering (SRE) for public sector operations
- Leveraging DataOps and operational intelligence to optimize government services
Incident Management in SRE
- Roles and responsibilities in incident response within government agencies
- Applying frameworks such as the Observe, Orient, Decide, Act (OODA) loop for efficient response
- Automated remediation and AI/ML-assisted resolution to enhance incident management in government systems
Chaos Engineering
- Principles and strategies for resilience testing in government IT environments
- Planning and executing “game day” exercises to prepare for potential failures
- Learning from controlled failure experiments to improve system reliability for government operations
SRE as a Pure Form of DevOps
- Integrating SRE into DevOps workflows within government agencies
- Cultural alignment and collaboration practices for effective implementation in the public sector
- Driving organizational transformation through SRE to enhance government service delivery
Post-class Exercises
- Large-scale system design case studies relevant to government operations
- Advanced instrumentation and monitoring scenarios for public sector systems
- Real-world reliability problem-solving exercises for government applications
Review and Exam Preparation
- Final review of the DevOps Institute SRE Practitioner syllabus with a focus on government relevance
- Sample questions and practice tests to prepare public sector professionals for the certification exam
- Exam-taking strategies and recommendations tailored for government professionals
Summary and Next Steps
Requirements
- Comprehension of fundamental Site Reliability Engineering principles
- Practical experience with DevOps methodologies and associated tools
- Familiarity with system monitoring, incident management, and automation techniques
Audience
- SRE professionals pursuing the DevOps Institute SRE Practitioner certification
- DevOps engineers looking to transition into reliability-focused roles
- Operations leaders tasked with developing and implementing reliability strategies within their organizations
Testimonials (5)
High level of commitment and knowledge of the trainer
Jacek - Softsystem
Course - DevOps Engineering Foundation (DOEF)®
The breakdown of what DevOps can do. Possible automation integration.
Adeyinka Adekoya - NTPF
Course - Continuous Testing Foundation (CTF)®
working with DevOps Toolchain
Kesh - Vodacom
Course - DevOps Foundation®
new information
Michael Durisin - Deutsche Telekom IT & Telecommunications Slovakia s.r.o
Course - Site Reliability Engineering (SRE) Foundation®
the topic - SRE