Course Outline

Introduction

  • How Site Reliability Engineering (SRE) integrates traditional IT practices with modern software development methodologies.
  • The critical importance of automation and observability in ensuring system reliability.
  • The distinct roles and responsibilities of software engineers and system administrators within the SRE framework.
  • The differences between Site Reliability Engineers and DevOps engineers, highlighting their complementary roles.

Overview of an IT System

  • Architectural considerations for both on-premises and cloud-based systems.

Overview of SRE Principles and Practices

  • The concept of Infrastructure as Code (IaC) and its role in maintaining system reliability.
  • The significance of containerization and orchestration technologies, such as Docker and Kubernetes, in modern IT environments.
  • Continuous Integration, Continuous Deployment, and Continuous Delivery (CI/CD) practices to streamline development and operations.
  • The importance of observability in monitoring and maintaining system health.

Evaluating an IT System

  • Assessing the current team and organizational resources for implementing SRE practices.
  • Mapping out existing systems and processes to identify areas for improvement.
  • Estimating the potential impact of adopting SRE methodologies on system reliability and efficiency.
  • Defining the role of the software engineering team in supporting SRE initiatives.
  • Understanding the responsibilities of the operational team in maintaining system reliability.
  • The role of management in driving and supporting SRE adoption within the organization.

Maintaining the Reliability of a System

  • Defining and measuring the desired reliability levels for services.
  • Understanding Service Level Objectives (SLOs) and their importance in setting reliability targets.
  • Exploring Service Level Indicators (SLIs) and Service Level Agreements (SLAs) to ensure service performance.
  • Utilizing error budgets to manage risk and maintain system reliability.
  • Developing effective SLOs to guide system improvement efforts.
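
The error-budget arithmetic behind the topics above can be sketched in a few lines of Python. This is a minimal illustration, not course material: the function names, the 30-day window, and the 99.9% target are assumptions chosen for the example.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) an availability SLO allows over a window.

    slo_target is a fraction such as 0.999 for "99.9% available".
    The 30-day default window is an assumption for illustration.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of a request-based error budget still unspent.

    1.0 means no budget has been used, 0.0 means it is exhausted,
    and a negative value means the SLO has been violated.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical figures: a 99.9% SLO, one million requests, 250 failures.
    print(f"Allowed downtime: {error_budget_minutes(0.999):.1f} minutes per 30 days")
    print(f"Budget remaining: {budget_remaining(0.999, 1_000_000, 250):.0%}")
```

A team might compare the remaining budget against a threshold it has agreed on to decide whether to continue releasing features or to prioritize reliability work.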

Optimizing System Administration

  • Establishing a robust development environment for efficient system administration.
  • Evaluating and selecting appropriate SRE tools to enhance operational capabilities.
  • Prioritizing tasks for automation to improve efficiency and reduce manual errors.
  • Writing high-quality software to support system reliability and performance.

Deploying "Infrastructure as Code"

  • Testing and iteratively refining code to ensure robustness and reliability.
  • Designing systems to be antifragile, so that they improve under stress rather than merely withstand it.
  • Learning from system failures to continuously improve practices and processes.

Monitoring a System

  • Observing and analyzing system performance to identify issues and optimize operations.
  • Utilizing SRE tools and techniques for effective monitoring and troubleshooting.

The Future of SRE

Summary and Conclusion

Requirements

  • A fundamental understanding of IT infrastructure for government.
  • An overview of the software development process.
  • Experience in programming or scripting in any language.

Audience

  • Developers
  • System Administrators
  • Software Architects
  • DevOps Engineers
  • IT Managers

Duration

  • 21 Hours
