This is a part of our Serverless DevOps series exploring the future role of operations when supporting a serverless infrastructure. Read the entire series to learn more about how the role of operations changes and the future work involved.
I take JSON from one API and I send it to another API.
Let’s talk about diving into, and even learning how to, code. This is not an area we’ve traditionally been responsible for, but the overwhelming trend in operations over the past several years is the expectation to deal with code to a greater degree. With serverless, all that’s left is code. Skills in and around code become a minimum requirement. We’re going to talk about some basic expectations and responsibilities for operations people around code and why we shouldn’t be particularly afraid.
The Need to Code
Operations will have to take a greater role in application code, and expect to be required to contribute throughout the entire code lifecycle. That includes writing, review (both before and after writing), testing, and the build and deploy process. That range might seem intimidating, but it’s actually helpful. For those of us who are less advanced in coding, it gives us a variety of ways to contribute as we work to strengthen our skills.
The exact level and type of participation by the operations person will differ depending on the team and organization, of course. Different organizations and teams will have different needs. If your organization doesn't expect or highly value coding among operations engineers, there will inevitably be a more limited coding scope than in an organization that does expect and value the skill. However, that will only last so long, because operations engineers will (and should) be expected to level up to increase their effectiveness in this area.
You might be tempted to say, “I’ve gotten along this far without coding, I’m sure I’ll be just fine when we switch to serverless.” You would be wrong.
I’ve tried to imagine a way in which we can remain a productive pod team member without coding skills, and I just can’t. When there’s an issue today, you as an operations person can start by investigating the host or other infrastructure issues, and then hand application code investigation to a developer.
But there is no host for you to investigate with serverless. Your cloud infrastructure is less complex. The majority of what’s left is code, and most issues will stem from that code in some way. You’re not going to have enough work to do, and if you don’t have enough work to do your organization is going to question why you’re there.
I don’t like being negative and I’m sure what I’ve just said, particularly concerning job security, may bother some people. My goal isn’t to offend, but to motivate. I say all this as someone who a few years ago found their operations job in a precarious position because of their limited coding skills. I’ve gone through this once before and I’ve worked to ensure I don’t again. I’d like to keep other people from that same experience.
Serverless Makes Code Accessible
I don’t want to leave people with a feeling of dread about their careers because they now need to learn how to code. You may be scared of coding based on past experiences. That's totally understandable because it's hard.
But what's great about serverless is that it makes coding more approachable. Many of us have stared at a large code base, gotten lost, and given up. I have been there myself and admit coding has been the skill area hardest for me to advance in.
One of the experiences that attracted me to serverless was the simplicity and approachability of the code involved. Take a simple non-serverless HTTP microservice today, written, say, in Python and using the Flask framework. There’s quite a bit of code involved before you ever get to the business logic. You have an application entry point that potentially reads from a configuration file. You need to configure where logs are directed to. Then, you need to establish application routes or endpoints. And finally, you’re adding business logic.
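To make that overhead concrete, here is a minimal sketch of such a service. To keep it dependency-free, it uses Python's standard-library http.server as a stand-in for Flask, so it is not what a Flask application would look like line for line. The GREETING_PORT variable, the /greet route, and the greet function are all hypothetical names chosen for illustration.

```python
import json
import logging
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Configuration: read settings before anything else runs.
# GREETING_PORT is a hypothetical variable name.
PORT = int(os.environ.get("GREETING_PORT", "8080"))

# Logging: decide where log lines go and how verbose they are.
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("greeting-service")


# Business logic: the part you actually came here to read.
def greet(name):
    return {"message": f"Hello, {name}!"}


# Routing: map URL paths to the functions that handle them.
ROUTES = {"/greet": lambda: greet("world")}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        route = ROUTES.get(self.path)
        status, payload = (200, route()) if route else (404, {"error": "not found"})
        body = json.dumps(payload).encode("utf-8")
        log.info("GET %s -> %s", self.path, status)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main():
    # Entry point: start the server and block until interrupted.
    log.info("listening on port %s", PORT)
    HTTPServer(("", PORT), Handler).serve_forever()
```

Calling main() starts the server. Only the three lines of greet are business logic; everything else is setup you have to read past to find it.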
For an experienced coder, that may seem trivial. But for the less experienced, that overhead leads to intimidation and anxiety. There’s a lot going on around the particular code you’re interested in.
Compare that with a serverless function on AWS Lambda, where you define an entry point to your business logic. Your HTTP routes or endpoints are defined on API Gateway with which this function is associated. Your logging is fairly simple because it’s often just to standard output so it’s picked up by CloudWatch.
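A minimal sketch of what that can look like, assuming the function sits behind an API Gateway proxy integration. The handler name and the greeting logic are hypothetical; the event and response shapes follow the proxy integration format.

```python
import json


def handler(event, context):
    # Entry point that Lambda invokes; `event` carries the HTTP request
    # as passed along by API Gateway.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")

    # Anything written to standard output is picked up by CloudWatch Logs.
    print(json.dumps({"action": "greet", "name": name}))

    # Response shape expected by an API Gateway proxy integration.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

There is no server startup, route table, or logging configuration in the file. Those concerns still exist, but they live in the platform's configuration rather than in the code you are reading.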
When you’re looking at a serverless function, you’re mostly looking at only the essential code and not extensive amounts of setup code. I’ve found this focus makes serverless functions easier to debug and understand. I don’t need to snake through an extensive codebase. When I’m investigating an issue, much of the code I need to examine is right there in front of me.
If you’ve had bad experiences with code before, give serverless a chance. You may find your experience different this time around. You may even find the confidence to start building this lagging skill of yours. It happened to me.
Coding Required of Ops
To start, the operations engineer should be proficient in the languages in use by the team. You should be capable of fixing light bugs. While carrying out your reliability responsibilities and investigating errors, you shouldn't stop at determining probable cause and filing a ticket. Always go further into the code involved. That means isolating potentially problematic code, at least to the function instance, and fixing minor issues.
"This code expects an int, but the data actually contains a string representation of an int. Let me handle this."
If you can't fix the bug, you should write a thorough bug report. Guide the developer who will fix the issue as close to the problem source as your investigation took you. Here, communication skills will be of the highest importance. So let’s dive deeper into what this work looks like.
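As one picture of a light bug fix, take the case mentioned earlier: code that expects an int while the data holds a string representation of one. The sketch below is a hypothetical example, and the field names quantity and unit_price_cents are assumptions made for illustration.

```python
def parse_quantity(value):
    """Return `value` as an int, accepting ints and strings such as "3"."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise ValueError(f"quantity must be an integer, got {value!r}")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        try:
            return int(value.strip())
        except ValueError:
            raise ValueError(f"quantity must be an integer, got {value!r}") from None
    raise ValueError(f"quantity must be an integer, got {value!r}")


def order_total_cents(record):
    # Without the conversion, "3" * 5 would yield "33333" rather than 15.
    return parse_quantity(record["quantity"]) * record["unit_price_cents"]
```

The fix is small, but it also turns bad input into a clear error message instead of a silently wrong total, which is the kind of detail an operations person tends to care about.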
Code Review
To start, there’s code review. We should be bringing our knowledge and perspective of how the platforms we use work and think about the code as a part of the greater system. I was fortunate enough a few years ago to work with a very good operations-minded developer. They taught me how to write more reliable code by teaching me during review the ways in which cloud systems fail.
It's not that I didn't already know those things, though. I just made assumptions that operations would succeed and often didn't account for failure. I was guilty of putting aside my operations and distributed systems fallacy knowledge as I coded. Assuming the network was reliable was my most common mistake. Review led to the team building more reliable services.
“This function should have retries because we may fail at the end here. However, you can’t safely retry the function due to this earlier spot. We’ll end up with duplicate DynamoDB records.”
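One common way to address a review comment like that is to make the write idempotent, so a retry cannot create a second record. The sketch below is an illustration under assumptions, not the fix from the review described above. InMemoryTable is a hypothetical stand-in for a DynamoDB table, and put_if_absent plays the role a conditional write (for example, a put with an attribute_not_exists condition on the key) would play against the real service.

```python
class InMemoryTable:
    """Hypothetical stand-in for a DynamoDB table with a single primary key."""

    def __init__(self):
        self.items = {}

    def put_if_absent(self, key, item):
        # Mirrors a conditional write: store the item only if the key is new.
        if key in self.items:
            return False
        self.items[key] = item
        return True


def process(event, table, notify):
    # Derive the key from the event itself so a retry reuses the same key
    # instead of generating a fresh record.
    created = table.put_if_absent(event["id"], {"payload": event["payload"]})
    # The final step may fail; because the write above is idempotent,
    # the whole function can be retried safely.
    notify(event["id"])
    return created


def run_with_retries(event, table, notify, attempts=3):
    for attempt in range(1, attempts + 1):
        try:
            return process(event, table, notify)
        except ConnectionError:
            if attempt == attempts:
                raise
```

If notify fails on the first attempt and succeeds on the second, the table still holds exactly one record for that event.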
I think there’s also a good, practical reason for operations engineers to be involved. We’ve mentioned the inexperience of many operations people with code. This gives them a chance to pair with a developer and learn.
Testing
Software testing is a step beyond code review. You’re not just evaluating how software works but whether it works, too. Best of all, to test whether code works you have to write more code. For someone inexperienced at writing code, these coding tasks should be approachable, and the work provides practical experience.
But shouldn’t a developer be writing their own tests? Of course. But how many code bases do you know with 100 percent test coverage? The reality that is software development leaves a gap for us operations people to fill.
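Filling that gap can start small. Here is a minimal sketch using Python's built-in unittest module. The normalize_email function is a hypothetical function under test, included only so the example is self-contained.

```python
import unittest


def normalize_email(raw):
    """Hypothetical function under test: trim whitespace and lowercase."""
    if not isinstance(raw, str) or "@" not in raw:
        raise ValueError(f"not an email address: {raw!r}")
    return raw.strip().lower()


class NormalizeEmailTest(unittest.TestCase):
    def test_lowercases_and_trims(self):
        self.assertEqual(normalize_email("  Ops@Example.COM "), "ops@example.com")

    def test_rejects_missing_at_sign(self):
        with self.assertRaises(ValueError):
            normalize_email("not-an-email")

    def test_rejects_non_string_input(self):
        # The kind of malformed input that shows up in real production data.
        with self.assertRaises(ValueError):
            normalize_email(None)
```

Saved to a file, these tests can be run with python -m unittest. The last test is where operations experience pays off: you have seen what bad data looks like in production, so you know which cases are worth covering.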
Bug Fixes and Basic Features
Ideally, we should eventually level up to be able to code bug fixes and basic features, as well. The only way to stay sharp in a skill, and for operations engineers to level up in this area, is to do some of the work. Work that could be given to a more junior engineer could instead be assigned to the operations engineer.
"I can create a REST API with an endpoint that ingests the data from this service's webhook, enriches the data, and passes it onto this other system." (Inspiration from Alice Goldfuss.)
In this capacity, the operations engineer provides extra development capacity. As your coding ability improves, you can be called on to augment the team's output when deadlines are tight or workload has become too big.
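A basic feature of the webhook kind, taking JSON from one API and sending it to another, can be quite small as a serverless function. The following is a minimal sketch under assumptions: the enrichment fields received_at and source are hypothetical, and the downstream call is passed in as a send function so the example does not depend on any particular HTTP client or target system.

```python
import json
from datetime import datetime, timezone


def enrich(payload):
    # Add fields the downstream system wants; both field names are hypothetical.
    enriched = dict(payload)
    enriched["received_at"] = datetime.now(timezone.utc).isoformat()
    enriched["source"] = "webhook"
    return enriched


def make_handler(send):
    """Build a Lambda-style handler that forwards enriched data via `send`."""

    def handler(event, context):
        # Ingest: parse the webhook body, rejecting anything that is not
        # a JSON object.
        try:
            payload = json.loads(event.get("body") or "")
        except json.JSONDecodeError:
            return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}
        if not isinstance(payload, dict):
            return {"statusCode": 400, "body": json.dumps({"error": "expected an object"})}

        # Enrich, then pass the result on to the other system.
        send(enrich(payload))
        return {"statusCode": 202, "body": json.dumps({"status": "accepted"})}

    return handler
```

In a deployed function, send would wrap whatever client talks to the other system. In a test it can be as simple as a list's append method, which makes the handler easy to exercise without a network.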
Operations First
One problem I see, which I expect will generate much friction, is developers or organizations incorrectly evaluating the coding skills of an operations engineer. Operations engineers shouldn't be judged on writing the fastest, most efficient, or pedantically and subjectively "best" code. They are there to help the team produce the most reliable code.
As only a part-time software developer, an operations engineer should be treated like a junior developer. You should be writing clear, manageable, and reliable code that solves a defined problem. Beyond that, however, the skills required are those of a more senior software developer.
Work requiring a senior developer should go to a senior developer. Otherwise, the operations engineer will be set up to fail, and the team as a whole will fail to achieve what other more cohesive teams can.
There's still more in our Serverless DevOps series! Read the next piece in our series The Work Of Operating Serverless Systems.