Unlocking Cloud Value: Achieving Operational Excellence Through SRE
Many organizations use public cloud technology to reduce costs and improve business agility, innovation, and resilience. Generative AI (gen AI) is adding even more value on top of the estimated $3 trillion in EBITDA value that cloud could generate by 2030.
However, many organizations are still working to realize the full benefits of their cloud transformations. In many cases, simply transferring existing operating models (such as waterfall delivery and ticket-based plan-build-run infrastructure management) to the cloud creates limited value or even destroys value (for example, by continuing to rely on highly manual processes). As a result, most cloud leaders have learned that modernizing assets on the cloud can deliver greater value and benefits than simply “lifting and shifting” them as-is.
To adapt to the cloud, most leading organizations adopt a best-in-class product and platform operating model for IT infrastructure. This model typically has two key parts. The first is platform engineering for infrastructure, in which services are managed as products, consumed through self-service APIs, and delivered as software through code pipelines. The second is site reliability engineering (SRE), which applies software engineering practices and automation to run application and infrastructure operations more effectively. (See sidebar, “Key types of development approaches.”)
In this article, we explore how to scale SRE in application operations as part of the product and platform operating model. If done successfully, this can help companies achieve greater benefits from their cloud applications and improve their delivery speed, reliability, and efficiency. In our experience, leading enterprises can achieve 60 to 70 percent of their desired financial goals (depending on their current level of maturity and adoption) by integrating SRE best practices into their technology migration to transform the operating model on cloud.
Making the most of cloud migrations with SRE
Many organizations that adopt SRE, however, do not realize its full potential, because they adopt only part of the model. Common failure modes include the following:
- assigning traditional operational support staff to become SRE experts without the right skills or the necessary automation
- embedding SRE experts in application teams without defining a clear operating model and responsibilities, which leads to “finger-pointing,” application teams simply handing issues off to SRE experts, and expensive operations teams
- keeping SRE teams entirely separate from application teams and not providing them with the authority to push back on software that does not meet the organization’s standards for quality, operations, and resilience
- focusing SREs primarily on reactive manual activities and not prioritizing automation and engineering to reduce demand and operational toil
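The last failure mode is easier to spot when toil is actually measured. The sketch below is a hypothetical illustration, not a method from this article: it assumes each SRE team logs hours as either toil (manual, reactive work) or engineering (automation and reliability work), and flags teams whose toil share exceeds a cap. The names `WorkLog` and `teams_over_toil_cap` are invented for the example, and the 0.5 default loosely follows the goal, described in Google's Site Reliability Engineering book, of keeping toil below half of each SRE's time.

```python
from dataclasses import dataclass


@dataclass
class WorkLog:
    """Hours one SRE team logged in a period (hypothetical schema)."""
    team: str
    toil_hours: float         # manual, repetitive, reactive operational work
    engineering_hours: float  # automation and reliability engineering

    @property
    def toil_share(self) -> float:
        """Fraction of logged time spent on toil (0.0 if nothing was logged)."""
        total = self.toil_hours + self.engineering_hours
        return self.toil_hours / total if total else 0.0


def teams_over_toil_cap(logs: list[WorkLog], cap: float = 0.5) -> list[str]:
    """Return teams whose toil share exceeds `cap`, highest share first."""
    over = [log for log in logs if log.toil_share > cap]
    over.sort(key=lambda log: log.toil_share, reverse=True)
    return [log.team for log in over]


if __name__ == "__main__":
    # Illustrative numbers only, not data from the article.
    logs = [
        WorkLog("payments", toil_hours=300, engineering_hours=100),
        WorkLog("search", toil_hours=120, engineering_hours=280),
        WorkLog("identity", toil_hours=220, engineering_hours=180),
    ]
    for log in logs:
        print(f"{log.team}: {log.toil_share:.0%} toil")
    print("Over cap:", teams_over_toil_cap(logs))
```

With the sample data, payments (75 percent toil) and identity (55 percent) would be flagged, pointing to where automation investment could reduce demand first.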
Key steps for successful SRE implementation
1. Choosing an SRE operating model
Implementing SRE starts with designing an integrated operating model that brings together application, operations, and infrastructure functions. This involves close collaboration with engineering and architecture leaders to align the SRE model with broader strategies and business objectives.
Frequently Asked Questions
What are the key benefits of adopting an SRE model in cloud operations?
By adopting an SRE model, organizations can improve operational productivity, speed, resilience, quality, security, and user experience, which can significantly increase the overall efficiency and effectiveness of IT operations.
How can organizations ensure successful implementation of SRE practices?
Successful implementation of SRE practices requires a holistic approach, including modernizing ITSM processes, shifting towards platform engineering, investing in talent development, and managing outcomes based on data-driven metrics.
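The article does not specify which metrics to manage by. One common SRE choice is a service level objective (SLO) with an error budget, which also gives SRE teams an objective basis for pushing back on releases. The minimal sketch below assumes a request-based availability SLO; `SloReport`, `release_gate`, and the "freeze once the budget is spent" policy are hypothetical illustrations, not a prescribed standard.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    """Request counts for one service over an SLO window (hypothetical schema)."""
    service: str
    slo_target: float     # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int
    failed_requests: int

    @property
    def error_budget(self) -> float:
        """Number of failed requests the SLO tolerates in this window."""
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def budget_consumed(self) -> float:
        """Fraction of the error budget already spent (can exceed 1.0)."""
        if self.error_budget == 0:
            return float("inf") if self.failed_requests else 0.0
        return self.failed_requests / self.error_budget


def release_gate(report: SloReport) -> str:
    """Assumed policy: pause risky releases once the error budget is used up."""
    return "freeze" if report.budget_consumed >= 1.0 else "ship"


if __name__ == "__main__":
    # Illustrative numbers only, not data from the article.
    reports = [
        SloReport("checkout", slo_target=0.999, total_requests=1_000_000, failed_requests=250),
        SloReport("login", slo_target=0.995, total_requests=200_000, failed_requests=1_400),
    ]
    for r in reports:
        print(f"{r.service}: {r.budget_consumed:.0%} of error budget used -> {release_gate(r)}")
```

Here checkout has used about a quarter of its budget and can keep shipping, while login has overspent and would pause risky changes until reliability work brings it back within target.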
2. Modernizing ITSM operational processes for cloud
Conclusion
Adopting an SRE model is critical to driving operational excellence and achieving benefits along the cloud journey. Successful SRE transformations ensure the adoption of automation and SRE practices, allow for enterprise-wide measurement of value delivery, and drive behavior and culture change.

