Going serverless? Here are 9 tips for operations success


Serverless is attracting a lot of attention because it promises faster development and, thanks to its pay-as-you-go model, potentially lower operational costs. Paying only when code runs can be cheaper than running servers all the time, but note that logging and instrumentation can bring additional costs.

At London’s Imperial College, Serverless Days 2019 saw a community of serverless developers and engineers discuss ways they can create and maintain secure, observable, high-performance serverless systems.

Developers talked about topics such as security, testing, big data, and serverless on the edge—getting an understanding of what’s coming next and how to provide value for the organisations they work for.

Jennifer Davis, Senior Cloud Advocate for Microsoft, was there to share her thoughts on how operations is evolving in the context of serverless, and on developing operability in serverless with a focus on monitoring and debugging.

Her views will be of particular interest to anyone working in an operations role, but also to developers, product managers, testers, and security engineers with an interest in improving the operability of serverless applications.

  1. Operations is managing complex engineering puzzles

Rather than thinking of operations as just managing infrastructure, think of it as a complex puzzle with a variety of internal software, external software and services that need to work together. People working in operations build value by fitting these different pieces together, with a perspective that includes capacity management, regulatory compliance, security, service health (availability, performance and resilience), networking, database management, incident response, and disaster planning and recovery, all with the goal of minimizing risk to the organization.

  2. Shift to proactive operations work

Because resources and time are often scarce, folks in operations tend to focus their effort on reactive work: what’s going wrong now and how it can be fixed. Reactive work is seen as the most valuable because the impact to customers is visible.

It’s critical to also do operations engineering work: the proactive work that focuses on eliminating single points of knowledge, addressing failures within system and service integrations, and instrumenting applications so that services can be monitored, debugged, and fixed. Proactive work is often undervalued because it’s hard to measure the value of something not having an issue.

  3. Serverless doesn’t eliminate operations work

When serverless is discussed, people often cite its simplified operational model as a benefit, and take that to mean you don’t need to worry about operations anymore. Yes, it streamlines the operational model for the particular application, but it adds complexity to the overall environment.

For example, monitoring and debugging applications, areas where we have years of experience, become trickier because we can’t use system-level monitoring to tell us what’s happening with our applications. We have to plan this instrumentation while we are developing the application. Serverless requires proactive operations work! Think about serverless in the context of the whole software lifecycle, from architectural planning and designing the application all the way to monitoring in production.
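As a minimal sketch of what planning instrumentation during development can look like, the Python decorator below wraps a function handler so that each invocation emits one structured log line with its outcome and duration. The `(event, context)` handler signature, the `resize_image` handler, and the assumption that standard output reaches your log pipeline are all illustrative and not tied to any particular platform.

```python
import functools
import json
import time


def instrumented(handler):
    """Wrap a handler so every invocation emits one structured log line."""

    @functools.wraps(handler)
    def wrapper(event, context=None):
        start = time.perf_counter()
        outcome = "ok"
        try:
            return handler(event, context)
        except Exception:
            outcome = "error"
            raise  # re-raise so the platform still sees the failure
        finally:
            record = {
                "function": handler.__name__,
                "outcome": outcome,
                "duration_ms": round((time.perf_counter() - start) * 1000, 3),
            }
            # Assumption: stdout is collected by the platform's logging.
            print(json.dumps(record))

    return wrapper


@instrumented
def resize_image(event, context=None):
    # Hypothetical handler body, used only for illustration.
    if "image_id" not in event:
        raise ValueError("missing image_id")
    return {"status": "resized", "image_id": event["image_id"]}
```

Because the record is written in a `finally` block, failed invocations are logged as well as successful ones, which is usually where the instrumentation earns its keep.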

  4. Serverless is not suitable for every service within an organisation

Some organisations can work entirely without servers, virtualization or containers. The vast majority, however, need a hybrid approach to building systems. Applying serverless to the wrong use case can introduce security risks, or affect other applications within the environment because they scale differently.

  5. Consider serverless as an asset

With serverless, there isn’t a hardware asset to manage anymore, but it can help to think about the serverless application as an organizational asset with a lifecycle. In the planning phase, it’s vital to make intentional decisions about whether serverless is the right option for a particular use case compared to what is already in the environment. In the deploy phase, use continuous integration and deployment with versioned artifacts. In the support phase, understand how to identify when something is not working and how to fix it. Maybe there is an impact on tertiary services like databases that can’t handle the number of requests coming in from a popular serverless API. Maybe it’s recognizing when a function is not working correctly. Customers shouldn’t be the mechanism that alerts you to problems in your environment. Finally, during the retire phase, it’s critical that services that are no longer in use are retired gracefully.

  6. Seeing the big picture

When it comes to thinking about retiring functions, they are part of the big picture of the environment. Having a broader perspective of all the serverless applications in use is critical for identifying and removing functions that are no longer in use to eliminate potential security holes as well as impacts to service limitations. A big picture view also helps to understand areas where there are single points of knowledge, for example when someone has built out functionality via the console without checking it into version control with additional documentation!
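One way to make the big picture concrete is to keep an inventory of functions and flag the ones that have not run recently. The sketch below assumes a hypothetical inventory, a dictionary mapping each function name to its last invocation time (or `None` if it has never been invoked); how you collect that data depends on your provider, and the 90-day threshold is an arbitrary example.

```python
from datetime import datetime, timedelta, timezone


def find_stale_functions(last_invoked, now, max_idle_days=90):
    """Return names of functions idle for longer than max_idle_days.

    last_invoked maps function name -> datetime of last invocation,
    or None if no invocation has ever been recorded.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(
        name
        for name, timestamp in last_invoked.items()
        if timestamp is None or timestamp < cutoff
    )


# Hypothetical inventory, for illustration only.
now = datetime(2019, 9, 1, tzinfo=timezone.utc)
inventory = {
    "resize_image": datetime(2019, 8, 30, tzinfo=timezone.utc),
    "legacy_export": datetime(2019, 1, 15, tzinfo=timezone.utc),
    "console_experiment": None,
}
candidates = find_stale_functions(inventory, now)
```

A list like `candidates` is a starting point for a conversation with the owning team, not an automatic delete list; a rarely invoked function may still be needed.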

  7. Address capacity planning

Make sure you know how the organization is spending money. It’s not simply about building cool stuff—it’s about building value and understanding that value. Platform services have constraints. Service providers will have different limitations. Think about all the different service integrations and how different parts of the application are going to interact.
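A rough way to reason about how a function interacts with a platform or downstream constraint is Little's law: the average number of requests in flight equals the arrival rate multiplied by the average time each request takes. The sketch below applies it to a concurrency limit; the limit values, the 20% headroom, and the function names are assumptions for illustration, so check your own provider's documented limits.

```python
def required_concurrency(requests_per_second, avg_duration_seconds):
    """Little's law: average in-flight requests = arrival rate x duration."""
    return requests_per_second * avg_duration_seconds


def fits_limit(requests_per_second, avg_duration_seconds,
               concurrency_limit, headroom=0.2):
    """Check expected concurrency against a limit, keeping some headroom.

    concurrency_limit is whatever constraint applies: a platform quota,
    or the connections a downstream database can accept.
    """
    needed = required_concurrency(requests_per_second, avg_duration_seconds)
    return needed <= concurrency_limit * (1 - headroom)


# 200 requests per second at 0.5 s each keeps about 100 requests in flight.
in_flight = required_concurrency(200, 0.5)
platform_ok = fits_limit(200, 0.5, concurrency_limit=1000)
database_ok = fits_limit(200, 0.5, concurrency_limit=100)
```

The same arithmetic shows why a popular function can overwhelm a database with a small connection pool even when the function platform itself scales comfortably.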

  8. Be proactive with serverless visibility

When monitoring serverless, think about the health of the services you’re building and the complex functions you’re running. With serverless, you don’t have outside-in metrics in the same way you would when monitoring a physical server environment and its applications. At a minimum, start with existing indicators such as latency and error counts.
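To illustrate those two baseline indicators, the sketch below summarises a batch of invocation records into an error rate and a 95th-percentile latency. The record shape (`duration_ms`, `outcome`) is a hypothetical format chosen for this example, and the percentile uses the simple nearest-rank method; a monitoring service would normally compute these for you.

```python
import math


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("no values to summarise")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarise(invocations):
    """Reduce invocation records to count, error rate and p95 latency."""
    durations = [inv["duration_ms"] for inv in invocations]
    errors = sum(1 for inv in invocations if inv["outcome"] == "error")
    return {
        "count": len(invocations),
        "error_rate": errors / len(invocations),
        "p95_ms": nearest_rank_percentile(durations, 95),
    }


# Hypothetical sample: 20 invocations taking 10, 20, ..., 200 ms, one failed.
sample = [
    {"duration_ms": 10 * i, "outcome": "error" if i == 20 else "ok"}
    for i in range(1, 21)
]
summary = summarise(sample)
```

A percentile is used rather than an average because a handful of slow invocations, such as cold starts, can hide behind a healthy-looking mean.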

  9. Include customers in the design of your monitoring solution

Talking to your customers is vital. No platform can automatically understand the value you’re bringing to them. Talk with your customers to understand which problems with the service actually affect them, and surface those indicators through instrumentation, monitoring and observability.

-=-

Jennifer believes improved serverless operability will help build a better understanding of what’s happening in your systems. We’re shaping the future of serverless operability through shared stories in the community and with service providers. As serverless evolves, she urges you to share your experiences: when things go right, and especially when you hit painful challenges!

 
