This is a part of our Serverless DevOps series exploring the future role of operations when supporting a serverless infrastructure. Read the entire series to learn more about how the role of operations changes and the future work involved.
Ship it, ship it real good.
What is one of the major obstacles we all face as engineers? Understanding our tools. The tools of our trade come with significant complexity. But as we grow to master them, these tools become significantly more powerful.
Your team's service delivery pipeline is an excellent place to take ownership and apply your skills. Not only is the work important, it's also an excellent way to automate manual tasks suggested elsewhere in this ebook. If you can catch and prevent issues during the service delivery process, you'll spend less time fixing them in production.
What Does Ops Bring?
Throughout my career, I’ve often found myself being the go-to person for explaining the functionality of tools. I'm not sure why, but I think it has a bit to do with my fascination with how the things around me work. I’m also perfectly comfortable with declarative DSLs, YAML, and JSON for automation and infrastructure tasks.
As a general rule, operations people have to learn how the systems they support work, even if they didn't build them. In fact, if your culture is still heavily siloed between development and operations, you're probably accustomed to learning how something you didn't have a hand in designing or building works.
For example, have a look at the service delivery pipeline in your organization and the variety of tools involved in getting a service into production. There's probably some sort of infrastructure automation and orchestration tooling. When someone needs to deploy a new service, how do you create new infrastructure for it? Are you currently a part of maintaining the Puppet/Chef/Ansible infrastructure? What about Terraform and CloudFormation?
These tools are usually a significant part of the service delivery pipeline, often playing a role in the entirety of build, deploy, testing, and management. Their importance to every phase of service delivery can't be ignored. And if people think they're going to go serverless without these automation tools, they're quite simply going to fail.
Next, look at your continuous integration (CI) platform. Considering just about every CI tool was written out of hatred and has led to another tool people hate, I prefer to focus on the positive. I like CI platforms, even if they are imperfect and occasionally troublesome. Supporting the CI platform in my organization was an opportunity to scale myself through automation. There's no way I was going to look at every change and put it through a checklist before it was deployed. That gatekeeper role just doesn't exist in an increasing number of organizations. But I could write checks to ensure people followed expected practices.
The need for this knowledge of tooling in the software delivery pipeline doesn't go away. Someone will have to take ownership of it and ensure that developers can deliver with minimal friction. That should continue to be operations people.
What Will Ops Be Doing?
Now that we've walked through the service delivery pipeline, what should you be doing in that process? How will you carry out your responsibilities? Let's discuss several areas.
Deploy and Management Tooling
Tooling like CloudFormation, as well as AWS SAM and Serverless Framework (both built on top of CloudFormation), can be complex. I find many developers are not fans of configuration and domain-specific languages. Your dev team, if left to their own devices, will hardcode assumptions and, through sheer frustration with the management tools, thwart a lot of the problem-solving CloudFormation has built in. CloudFormation is actually quite flexible if you're familiar with it and understand how to use it.
If you're already doing extensive infrastructure as code in your environment, you're probably the established authority on tools like Puppet, Chef, Terraform, etc. In that position you guide people to use best practices with those tools to accomplish their objective. The same will be true with serverless tooling.
How to do everything in CloudFormation is not always immediately obvious. I've struggled several times when trying to build new services. For example, building my first CloudFront-distributed site (which I found I needed in order to do HTTPS with a custom DNS name) took me several tries. Ultimately, I did it by hand and then worked backward, with the screens in the AWS console and the CloudFormation documentation side by side. Then there was the time I wanted to automate AWS Athena and found out I had to look at the AWS Glue documentation.
How do you assist your team with your chosen tools? First, start by being available. Be responsive to requests for help in these areas. Take time to review your team's work and look for areas of improvement as well. A review comment might read like this:
"You don't need to define the S3 bucket name in this template. CloudFormation will generate one on its own, so we can use CloudFormation's Ref function to pass it to the Lambda function through an environment variable. Then, here in your code, you can get the name of the bucket from that variable."
Make sure you can provide people with example patterns they can look at. Serverless Framework and AWS SAM CLI can create new projects based on templates. Provide a template for what a RESTful web API should look like, what a single-page application hosted in S3 using CloudFront should look like, and so on. Having these patterns documented and findable will save your team time and make you more productive.
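As one small pattern worth documenting, the bucket-name handoff from the review comment above might look like this on the code side. This is a minimal Python sketch, not a prescribed implementation: the environment variable name `BUCKET_NAME` and the handler are hypothetical, and the sketch assumes the template assigns the bucket's Ref to that variable.

```python
import os


def get_bucket_name() -> str:
    """Return the bucket name the template passed in through the environment.

    BUCKET_NAME is a hypothetical variable name; use whatever name your
    template assigns the bucket's Ref to.
    """
    try:
        return os.environ["BUCKET_NAME"]
    except KeyError:
        # Fail loudly so a missing variable is caught on the first invocation.
        raise RuntimeError("BUCKET_NAME is not set in the function environment") from None


def handler(event, context):
    """Hypothetical Lambda handler that reports which bucket it would use."""
    return {"bucket": get_bucket_name()}
```

Because the name is read at runtime, the same code works unchanged in every developer's stack and in production, whatever name CloudFormation generated for each bucket.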
Next, improve the tools on the team. One of Serverless Framework's improvements over CloudFormation is its plugin capabilities. If the tool doesn't do what you want it to do, make it. Can you make your tools do some of your work, such as ensuring adherence to defined engineering standards, automatically? Just like writing Puppet facts and Chef knife plugins, developing Serverless Framework plugins is a handy skill.
I point to serverless-sqs-alarms-plugin routinely as a good example of the sort of tooling development operations engineers should expect to do. You establish an engineering standard that SQS queues should have an alarm to indicate when queue processing is not keeping up. A single CloudWatch alarm can be quite verbose. The serverless-sqs-alarms-plugin, however, allows someone to add multiple alarms with minimal configuration quickly. You've made it easier for your team to add alarms and, in turn, it’s more likely they'll do the work.
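To give a feel for the verbosity such tooling hides, here is a rough Python sketch that generates CloudWatch alarm resources for a queue from a short list of thresholds. It is not the plugin's implementation; the function names, the logical ID scheme, and the choice of period and statistic are assumptions made for illustration.

```python
def build_queue_alarm(queue_logical_id: str, threshold: int) -> dict:
    """Build one CloudFormation alarm resource for an SQS queue.

    The alarm fires when the number of visible messages reaches `threshold`,
    a sign that consumers are not keeping up.
    """
    return {
        "Type": "AWS::CloudWatch::Alarm",
        "Properties": {
            "AlarmDescription": f"{queue_logical_id} has {threshold} or more visible messages",
            "Namespace": "AWS/SQS",
            "MetricName": "ApproximateNumberOfMessagesVisible",
            "Dimensions": [
                {
                    "Name": "QueueName",
                    "Value": {"Fn::GetAtt": [queue_logical_id, "QueueName"]},
                }
            ],
            "Statistic": "Maximum",   # assumed; pick what suits your queue
            "Period": 60,             # assumed evaluation window in seconds
            "EvaluationPeriods": 1,
            "Threshold": threshold,
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        },
    }


def build_queue_alarms(queue_logical_id: str, thresholds: list) -> dict:
    """Return a mapping of logical ID to alarm resource, one per threshold."""
    return {
        f"{queue_logical_id}Alarm{threshold}": build_queue_alarm(queue_logical_id, threshold)
        for threshold in thresholds
    }
```

Calling `build_queue_alarms("OrdersQueue", [100, 1000])` yields two complete alarm resources from a single line of input, which is the kind of saving that makes people more likely to follow the standard.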
Another area to look at is developer workflow. Remember, operations isn't just about operating systems. It's about enabling others to get work done. Look for common frustrations the team experiences and see if you can automate that work. Hands down, my favorite Serverless Framework plugin is serverless-python-requirements. Why? Because it automates away the management of bundling my Python dependencies. It saves me so much effort.
Testing
Now let's talk about testing for serverless. There's a lot of focus on local testing of serverless applications and functions. For instance, there's LocalStack, which will let you run mocked versions of AWS services locally. But I think the desire to run services locally is a holdover from the pre-serverless development days when ops ran local VMs or Docker containers to test against.
Why do people insist on full-featured local development environments? One reason is feedback loop speed. People want to find bugs quickly, which is completely understandable. A CloudFormation deployment can take time, which breaks concentration.
A second reason people want a local development environment is that they haven't been allowed to have their own in the cloud. Imagine the cloud provider bill if every developer were allowed to run a variety of VMs in your development environment. Virtual machines and containers made it possible for people to run local instances of the services they needed to work with.
One of the key characteristics of serverless architecture is not paying for idle time. That means the cost reasoning for why people need local instances should no longer apply. My own personal workflow involves writing unit tests with the Python module Moto, which mocks AWS services. I do that to catch the most basic errors before deployment. After unit tests pass, I deploy to my personal environment in AWS immediately to run integration tests. It has saved me a lot of time and hassle by eliminating the need to test everything locally.
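A unit test in that spirit can be very small. The sketch below deliberately swaps Moto for the standard library's `unittest.mock` so that it runs without extra dependencies; with Moto you would create a real boto3 client inside Moto's mock instead of passing a stub. The function under test, `save_report`, is hypothetical.

```python
from unittest import mock


def save_report(s3_client, bucket: str, key: str, body: bytes) -> str:
    """Write a report to S3 and return its location (hypothetical function under test)."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return f"s3://{bucket}/{key}"


def test_save_report_writes_to_expected_bucket():
    # A stub stands in for the boto3 S3 client, so no AWS call is made.
    fake_s3 = mock.Mock()

    location = save_report(fake_s3, "my-bucket", "reports/today.txt", b"ok")

    fake_s3.put_object.assert_called_once_with(
        Bucket="my-bucket", Key="reports/today.txt", Body=b"ok"
    )
    assert location == "s3://my-bucket/reports/today.txt"
```

A test like this only catches the most basic errors, such as a wrong argument name or a malformed key. Whether the function has permission to write to the bucket is the kind of question the integration tests in a personal cloud environment answer.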
Since you're no longer paying for idle capacity, exploit this new characteristic of your systems. Your developers should have their own cloud deployment of each service they need when they need it. As an operations person, don't spend your time trying to figure out how everyone can fully test their serverless applications locally. Do spend your time figuring out how to enable everyone to make as much use of your cloud provider and infrastructure as possible. If you'd like to see exactly how I approach testing and debugging, then read this:
AWS Lambda & Serverless Development - Part 2: Testing & Debugging
Continuous Integration
There will be testing and deployment work, just as there is today. Your continuous integration tool is your friend. It is an automated and scalable version of you. In addition, testing, particularly integration testing, is going to become more important with serverless. As you break down services into smaller, independent pieces, you're going to increase the chance of breakage.
Develop your checklist of things to look for in a project, then automate that checklist. If you've established that all SQS queues need CloudWatch alarms, then write a check. If an SQS queue is found in a project's configuration, the check will look for corresponding CloudWatch alarms and fail if they do not exist. You can find and prevent so many issues just by automating yourself during this stage.
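The SQS check described above could be sketched roughly as follows in Python. It assumes the project's configuration is available as a CloudFormation template in JSON, and it treats a queue as covered when any CloudWatch alarm refers to it through Ref or Fn::GetAtt. The function names are hypothetical, and a real check would probably also want to verify which metric the alarm watches.

```python
import json


def _referenced_ids(value):
    """Yield logical IDs referenced via Ref or Fn::GetAtt anywhere inside value."""
    if isinstance(value, dict):
        for key, inner in value.items():
            if key == "Ref" and isinstance(inner, str):
                yield inner
            elif key == "Fn::GetAtt":
                # Accept either the list form or a dotted string.
                if isinstance(inner, list) and inner and isinstance(inner[0], str):
                    yield inner[0]
                elif isinstance(inner, str):
                    yield inner.split(".", 1)[0]
            else:
                yield from _referenced_ids(inner)
    elif isinstance(value, list):
        for item in value:
            yield from _referenced_ids(item)


def queues_without_alarms(template: dict) -> list:
    """Return the logical IDs of SQS queues that no CloudWatch alarm refers to."""
    resources = template.get("Resources", {})
    queues = {
        name for name, res in resources.items() if res.get("Type") == "AWS::SQS::Queue"
    }
    covered = set()
    for res in resources.values():
        if res.get("Type") == "AWS::CloudWatch::Alarm":
            covered.update(_referenced_ids(res.get("Properties", {})))
    return sorted(queues - covered)


def check_template_file(path: str) -> int:
    """Print each uncovered queue and return a CI-friendly exit code."""
    with open(path) as handle:
        missing = queues_without_alarms(json.load(handle))
    for name in missing:
        print(f"SQS queue {name} has no CloudWatch alarm")
    return 1 if missing else 0
```

A CI step would call `check_template_file` on the rendered template and exit with its return value, so a project that adds a queue without an alarm fails the build before it ever reaches production.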
Don't look at your CI platform as just a place to run a service's unit and integration tests. Use it as a place to run your own checks, too.
Deployment
Once tests have passed, it’s time to deploy new code. This is one of the less developed parts of the serverless software delivery process. If you're on AWS and using CloudFormation, or tools built on top of it, then your deployment tool is CloudFormation. Just keep in mind that CloudFormation doesn't have built-in capabilities for things like canary releases, blue/green deployments, or rolling releases.
I'm going to start by saying this: Blue/green deployments, canary releases, A/B testing, rolling releases, etc. are all good practices. But I've seen more than a few organizations that either lack those capabilities or have them only in a very rudimentary state, and those organizations manage to function quite well. By delivering small changes, testing those changes extensively, and rolling forward quickly when issues arise, organizations manage to be successful without those capabilities.
I would not let the immaturity of those patterns with serverless hold you back from starting with serverless. I give this advice not out of recklessness or carelessness but out of acknowledgement that there's often a wide gap between best practices and what organizations actually do.
The good news for those of you who are interested in more robust deployments and in reducing deployment issues and errors is that there's a lot of work to be done in this area, enough to keep you busy.
Closing
Delivery of serverless systems is going to keep you as an operations engineer very busy, particularly in the early days of adopting serverless. And for a variety of reasons, this is a great area for us to apply our skills. First, moving from Puppet and Chef to AWS CloudFormation, AWS SAM, or Serverless Framework is a logical progression in tooling. Additionally, managing the delivery pipeline allows us to scale ourselves through automation. It gives us the ability to insert automated checks that ensure only work meeting our quality standards leaves for production.
There's still more in our Serverless DevOps series! Read the next piece in our series Security & DevSecOps.
Read The Serverless DevOps Book!
But wait, there's more! We've also released the Serverless DevOps series as a free downloadable book. This comprehensive 80-page book describes the future of operations as more organizations go serverless.
Whether you're an individual operations engineer or managing an operations team, this book is meant for you. Get a copy, no form required.