What is DevOps?
It happened again this week.
At this Wednesday’s Prometheus meetup I was hosting, I asked one of the attendees what he did for work.
He looked at me briefly before he barked one word in reply — DevOps — and then promptly made a beeline for the pizza at the back of the room.
This was, of course, not the first time it had happened; the same thing had happened just the previous week.
Every time I attend an Operations or Infrastructure-centric event, I meet people who introduce themselves to me as either “DevOps” or “DevOps engineers”. Why, even speakers at these meetups often refer to “your DevOps team” or “DevOps costs” in their presentations or talks.
“DevOps” is, by far, the most frequent word bandied about at such events. This word often gets deployed in such a mélange of environments and contexts that it’s hard to know what the term even means anymore.
For instance, at this week’s meetup, the one word that stood out in clear relief amid the general clamor in the room was “DevOps”, which prompted me to tweet about it.
Once I’d tweeted that, I immediately paused and wondered: did I even know what “DevOps” meant before sounding so de haut en bas about other people’s use of the word?
DevOps and Me
Probably the very first time I’d heard the word “DevOps” in real life was at a company I’d worked for a few years ago. An engineer who’d been interviewed had been deemed to be “good from a DevOps perspective”. At that time, I had — somewhat incorrectly — assumed that it was a phrase used to describe Operations people. The engineer in question had been hired and he’d worked on “DevOps-y” things like setting up a Jenkins CI pipeline, the CD infrastructure, fixing Ansible plays and such like.
This was circa late 2014. Around this time I’d started attending “DevOps meetups” where most talks and conversations centered on delivering software, continuous integration, monitoring, deployment, incident response, chatops, developer tooling, artifact delivery, build systems, Docker, Docker, Docker… so much Docker that I could’ve been forgiven for thinking that DevOps meant Docker. The container revolution had only just begun, and everything was evolving at such a fast clip that simply keeping abreast of developments was as exciting as it was time-consuming. My day job at the time wasn’t Operations-focused, and my interest in all things Operations and Infrastructure was purely extracurricular. At most of these events, I was pretty much the only one who introduced myself by saying, “Well, I don’t really work on any of these things but I’m really interested in learning more”, while everyone else worked either “in DevOps” or in Operations.
However, at no point did I feel anything but genuine excitement for all manner of topics that fell under the umbrella of “DevOps”, since these were concepts my formal CS education had failed to teach me but that were ever so crucial when it came to successfully running systems. The one thing everyone I met seemed to share was curiosity and excitement about the new crop of tools emerging to make the process of delivering software more reliable and robust than ever before.
Even now, the one thing that stands out about the local San Francisco meetups I used to attend back in the day is that there was hardly ever any pointless navel-gazing, or platitudes and blather dressed up as thought leadership. On the contrary, there was always quality content to look forward to, and I learned far more than I could’ve ever imagined about how to operate systems, things I’d learned neither at school nor at work.
If you’d asked me in 2014, I’d have told you that “DevOps” was full of exciting new tools and paradigm shifting changes, and that I absolutely loved every aspect of it.
However, if you ask me right now, I’d probably tell you that I have no idea what “DevOps” really is, except that it’s ostensibly not about specific tools or platforms, that every company has a different understanding of the term, and that the term has been co-opted by startups and behemoths alike to shamelessly market their products.
How did I go from loving “DevOps” to becoming skeptical of just about everything the term stands for in the span of roughly two years?
DevOps conferences
Late 2015 or thereabouts was when I started following Operations engineers, Ops conferences and the Ops scene in general. The word “DevOps” was used quite liberally at these conferences, but in a slightly different context than I’d been used to. The talks touched not so much on individual tools like Spinnaker or Docker or etcd as on broad-ranging subjects: the role of company culture in bringing about sweeping organizational changes; the importance of empathy; the consequences of burnout; the value of automation in providing repeatability, safety and determinism; blameless postmortems; incident response strategies and so forth.
It was at this point that it became clear to me that when people spoke about “DevOps”, they didn’t necessarily mean individual tools or job titles but more a set of practices that roped in both developers and operators to deliver software to the end users in a more holistic and efficient way. It was more a philosophy along the lines of Agile that tried to break down organizational silos and promote cross-functional teamwork and cooperation.
To me, this newfangled definition of DevOps didn’t strike a discordant note with what I’d learned until then; instead, it provided a blueprint for the kind of cultural framework required to successfully adopt the tenets DevOps evangelized. If anything, the ideas put forth in these conference talks justified the existence of everything DevOps purported not to be: namely tools, processes and job titles. Of course, if developers were to have a stake in the operation of the services they built, they were going to need a working CI/CD pipeline (a tool), an agreed-upon way to get from starting a feature branch to deploying it (a process) and people to build and maintain the systems that automated said process (a job title).
However, I also started noticing something I wasn’t used to seeing: many Operations folks seemed angry at the status quo.
Despite the putative benefits and changes the “DevOps” movement seemed to have brought about, Operations engineers seemed angry at developers for not pulling their weight, angry at management and angry at people’s general attitudes towards the Operations role. This was often couched in self-deprecation and a certain sort of gallows humor, but the undercurrent of indignation was undeniable.
It got to the point where Operations conferences all started looking, feeling and sounding the same, with ideas that would’ve seemed obvious to anyone with an iota of common sense deemed keynote-worthy, and plenty of talks amounting to little more than a long whine by Operations people to a room full of mostly other Operations people.
I found this baffling, sad and also a tad contradictory.
I was baffled because, as a developer, I considered Operations skills akin to superpowers. Writing code that worked on my laptop was easy; getting it to work across a cluster of machines was orders of magnitude harder. Getting code working on my laptop versus having it run in production is like the difference between swimming in an indoor pool and swimming in a choppy river full of piranhas. The feeling of being unable to fix my own service running in a foreign environment, because I didn’t know how to debug it or didn’t have access to the right tool, was (and still is) terrifying. Operations people had skills I could only dream of having someday, so why were they so damned angry at people like me?
It was also sad to see Operations colleagues in this industry with a massive chip on their shoulders at a time when the Operations space was undergoing a revolution of sorts, what with the rising popularity of infrastructure as code, the push to automate everything, and the advent of containerization followed by the concomitant explosion of new tools and idioms. At a time when Ops tooling and resources had never been better, hearing that “DevOps isn’t about tools” or “Your tools won’t fix anything” didn’t help.
Furthermore, it was hard not to see the contradiction inherent in some of the attitudes toward the term “DevOps”, mainly those originating from Ops people. DevOps was a philosophy championed by the Ops community as a means of improving the status quo and ushering in a glorious future of better software lifecycles, but now they seemed disillusioned with it themselves or exasperated with their developer colleagues.
To me, all this seemed extremely confusing, to say the least.
The Role of the Developer
Most developers I know have neither the motivation nor the incentive to learn the ropes of Operations when everything is automated to the fullest extent. Insofar as the “DevOps” movement has happened at all, they know about it only because Operations people constantly talk about it or because it’s now turned into a marketing buzzword.
Today, there is an entire industry waiting in the wings to swoop in and help companies improve on every aspect of Operations and software delivery, be it continuous integration platforms like CircleCI, continuous delivery and feature flagging platforms like LaunchDarkly, monitoring tools for logging, metrics collection, request tracing or anomaly detection, event driven automation tools like StackStorm, A/B testing tools like Optimizely, alerting tools like PagerDuty, exception tracking tools like Sentry or Bugsnag, QA platforms like RainforestQA, managed DNS providers like NS1 or Dyn to name a few.
There are a vast number of vendors whose very success hinges on how effectively they can gain developer mindshare and alter the role played by traditional Operations teams. These vendors (often VC funded startups themselves) are going to leave no stone unturned to make sure they succeed in their mission to replace many core Ops functions. If developers are practicing “DevOps”, it’s because they are being furnished with an extremely accessible set of tools that’ve been built from the ground up to appeal specifically to their sensibilities and skills (or lack thereof). Common “DevOps” tools that developers use (such as PagerDuty, Slack, New Relic, Datadog, Splunk, Heroku, maybe even GitHub at a stretch) ship with integrations for capabilities like continuous delivery, continuous integration, monitoring, alerting, easy rollbacks and what have you. The popularity, ubiquity and indispensability of these products means that it’s possible for even inchoate hobby projects or early stage startups to get a working CI/CD pipeline set up using GitHub, CircleCI and Heroku in a matter of minutes with very little effort.
However, such a setup might prove insufficient as the product scales or as the startup begins to gain real traction. This juncture calls for specialized teams to build sophisticated and robust infrastructure, extensive automation, and scalable, cost-effective alternatives to the erstwhile SaaS-based tooling. These teams, from what I’ve seen, tend to be staffed mostly by software engineers tasked with automating and abstracting away the grunt work, so that other engineering teams can leverage these tools, dashboards and, increasingly, Slack bots to continuously build, test, release, deploy, monitor and roll back their code.
Most developers I meet, even those who ostensibly deploy their own code by typing a Slack command or clicking a button on a dashboard, are unaware of many of the underlying details abstracted away from them. Typing “hubot deploy branch to production” might be all very well, but that doesn’t make developers any wiser unless they go to the trouble of understanding how build systems work, how the artifact is packaged for the given runtime or compiled for specific architectures, how the artifact caching and delivery system works, how process supervisors work, how a fresh process is started on a subset of hosts, how old processes are drained of all connections or how the load balancer’s config gets rewritten.
Now, one can argue that developers don’t need to be knowledgeable about these things to build software effectively, so long as they have a way to safely test, deploy and roll back. I see eye to eye with the call for developers to get better at Operations, but adopting “DevOps practices” and tools hasn’t necessarily helped software developers get better at Operations, even if they are deploying and maintaining their own code. Adopting these practices and tools has made the process of delivering software more predictable and less error-prone, which is certainly a huge step forward, but simply looking at graphs, running queries against metrics dashboards or typing Slack commands doesn’t an Operations engineer make.
If “DevOps” required developers to get better at Operations, I don’t see how it has succeeded so far, even with the plethora of “DevOps” tools at one’s disposal. I absolutely don’t see product engineering teams writing Chef recipes, managing DNS setups or running edge proxies anytime soon, which is just as well, because requiring them to do so would be inordinately foolish.
SRE
I was chatting with one of my friends over drinks a couple of months ago, and when I asked him whether the company he worked at (a company with a preposterous valuation) had adopted DevOps practices and systems, he snorted and told me:
We don’t do DevOps. Period. We have SRE’s — good ones.
The term SRE was coined at Google and is used to describe “what happens when a software engineer is tasked with what used to be called Operations.” It looks like the entire industry has decided that what’s best for Google is very obviously best for everyone else and now even two-bit startups want an SRE on their staff, natch.
I read this comment today by an engineer on a Slack group I’m a part of:
All teams have extensive automation and things written to support the function. Few years ago we identified more as “ops folks” now we’re much more like “SRE with battle hardening”
Better tooling for testing, deployment, monitoring, debugging and configuration management, cropping up like nobody’s business, has made “extensive automation” and battle-hardening of Operations tasks not just possible but far less daunting than they used to be. “Google Infrastructure for Everyone Else” is a thing now, and one can buy it if one has enough money to throw at vendors offering it as a service. Even if the SRE book hadn’t lent the term SRE a certain indisputable cachet, the role of the Operations engineer has undeniably changed significantly in the past decade, which explains why Operations engineers would prefer a job title commensurate with their current skills and responsibilities.
Back in 2012–2013, I don’t recall the term SRE being used quite as much, either in general discourse or in job postings. In 2017, I’m certainly seeing a shift toward rebranding Operations roles. Now we have companies with SRE teams, SRE managers, SRE directors and even sub-teams like Traffic SRE, Edge Infrastructure SRE, Databases/Storage SRE, Cache Services SRE and so on.
One of my plain speaking friends — shall we call him A? — went so far as to tell me that:
Ops, as we knew it ten years ago, is fully dead. Ops as we know it today is most likely going to evolve into something entirely different in the near future. More and more of the Ops function is ripe for being commodified and the most successful Ops vendors will be ones who will appeal to developers the most.
When I mentioned what A had told me to another curmudgeonly acquaintance of mine (let’s call him B), he agreed with A’s assessment. “I went to [redacted] and was frankly embarrassed for the people there,” B told me, referring to an Operations-centric conference he’d recently attended. “I went there looking for solid content but there was probably one or two talks among around 40 that was worth my time. The rest was pure drivel but what was most mysterious to me was that people seemed to greatly enjoy it. Ops conferences these days look more like group therapy than anything else.”
“Where does that leave DevOps?” I asked him glumly.
“A Twitter hashtag,” he quipped.
Where DOES that leave #DevOps?
I tried to sketch out what an engineering organization might look like in the near future, based on conversations I’ve had with friends and what I’ve been hearing in general about how teams are structured at companies, and came up with a diagram.
Folks who describe themselves as “DevOps” or “DevOps engineers” tend to work mostly at the intersection of Infrastructure and Operations. Product-focused developers won’t ever perform the role of Operations engineers. They might use platforms, tools and abstractions built for them by Infrastructure and Operations teams, but their primary responsibility will still be developing new features, even in a (quixotic) entirely serverless future. Even if the term “DevOps” becomes passé a couple of years from now, Operations engineers aren’t going to be out of a job; if anything, demand for specialized Operations skills is only going to increase in an era where the cornucopia of new infrastructure tools, both SaaS and FOSS (Kubernetes, Prometheus and, why, even Docker), requires specialized expertise to run.
“DevOps” might be a tainted word in some circles, largely because it has come to amount to nothing more than a buzzword for a whole slew of vaguely interconnected notions about how to run systems and engineering organizations, but there’s no denying that the responsibilities of both Development and Operations teams have greatly evolved, and even converged, in recent years. More and more companies require developers to be on call for their own services, with Ops/SRE teams taking on more of a consulting role.
If you think about it, this really is the inevitable end that both the SRE school of thought and the “DevOps” philosophy were advocating all along, albeit each in its own way. This might have its fair share of benefits, but make no mistake: being on call for a service, or being able to follow a runbook to the letter without giving much thought to what is happening under the hood, is entirely different from “getting better at Ops”. Automating everything without having developers understand what is being automated, why and how could prove counterproductive in the long term. It brings its own set of technical and cultural challenges, to which I doubt there are any easy solutions.
Maybe future Operations conferences will shed more light on the challenges of the post-DevOps era and address some of these issues.
Chance would be a fine thing.