Observations and reflections - DevOpsDays Stockholm 2019
The last time I attended a DevOpsDays conference was in Gothenburg, 2011. At the time I worked as an Operations engineer, and the conference was really inspiring and useful in my day-to-day work.
When I heard some six months ago that DevOpsDays was coming to Stockholm, I decided to treat myself to it - with the hope that it would be just as good as last time. I had my doubts - after all, in the eight years since I last attended, I had worked mostly as a manager and, toward the end, as an Agile Coach.
I needn't have worried.
If you put lots of awesome people in a venue, give them space to talk, feed them and supply them with interesting subject matter in the form of presentations and lightning-talks, the result can't be anything less than motivating.
Here are some of my observations and reflections - some of them organisational, some of them technical, and the rest perhaps a little random.
The understanding of DevOps culture isn't as pervasive as I thought
After all my years at Spotify, where we built up this devops/agile/trust-based culture over time, it's easy to forget how it was at other workplaces. The wall between developers and operations. The paperwork and risks involved in deployments.
After working at a place where devops was so ingrained in how we worked, I found it easy to take it for granted. The problems some participants brought up in the Open Spaces were nightmarish - and I was surprised at how surprised I was.
It was a humbling experience.
Your "code" in your VCS for a project/epic should (must) be more than just the business logic. Your code needs to include config, tests, deployment logic etc.
This should be obvious, but I guess it isn't in some orgs. If you need to go back in time in your VCS to get an earlier working version of your project, you'll likely want to see how that version of your business logic was brought to life in prod.
DevOps is a culture where everyone works together to deliver and operate a feature for the customer.
… and automate the crap out of everything.
If your Definition of Done doesn't include the code hitting prod, you're misleading yourself.
Also, something something "adoption by at least one user"
If you have a dedicated DevOps team alongside your Ops, Dev and QA teams, you're doing it wrong.
… congrats, you just added another silo in your org.
Overheard: References to "Accelerate" by Nicole Forsgren et al
… awesome book, not surprised it came up here. If you haven't read it and you're an engineer (developer, sysadmin, tester, etc.), a manager or an agile coach, it's definitely worth the read.
Some participants didn't see the connection between Agile/lean and DevOps
… while for others, of course, it was so obvious that it was difficult to explain, much like a mathematical axiom. Both require concepts such as:
Continuous improvement through reflection, sharing, and experimentation
Relationships based on trust
Feedback culture between individuals and teams
Failing fast.
Two different mindsets in orgs: Role and Mission mindset
Role mindset: I do what my role prescribes. This works (?) in orgs operating in ossified commodities markets, where the org/product/technology doesn't change very often.
Mission mindset: I do what I can to achieve the mission. This mindset works if you're doing something new, experimental, or risky. If you don't know what the market will look like tomorrow. Why limit yourself to a role generated by an organisational system designed (even with the best of intents) to solve yesterday's problems?
Overheard about hypergrowth: Throwing people at a problem only introduces more handovers.
Fun way of expressing the problem. Here are my thoughts on what happened during early Spotify days on the matter:
Growing the engineering org to add capacity often leads to people using recruitment as an excuse not to sort and truncate the epic backlog. The result is an org that is unable to do, or unused to doing, the difficult work of prioritising and saying no to important initiatives.
Also, hypergrowth will unfortunately lower capacity initially, thanks to time spent on recruitment and onboarding. The more you hire during a short period, the deeper and longer the cost in capacity. The number of incidents due to lack of tacit knowledge will also rise, pulling focus away from new deliveries in favour of stabilising the system.
Was surprised when I found out that a Post-mortem at Google was a document, not a meeting
At Spotify, the term "Post-mortem" refers primarily to a meeting. At Google, it refers to a document. The purpose of the two is the same - to fix the root cause(s) and enable learnings from failures. The main difference IMO seems to be the scope of the post-mortem. At Google, the post-mortem document is written primarily by one person and shared with the whole org and posterity. At Spotify, the result of a post-mortem is a shared understanding among engineers and stakeholders of why and how the system failed. The focus is on raising empathy, thereby lowering the threshold for sharing learnings in person.
Even if I knew that Google conducted post-mortems, it had never occurred to me that it could be done differently. How many other assumptions have I made over the years?
Overheard: Can DevOps culture happen in a centralised command-and-control org?
This question was discussed or touched upon multiple times during the two days, and the discussions leaned strongly toward the opinion that the engineers who write, test, deploy and maintain the code are best suited to make the decisions necessary to do their work. If they are given relevant information (business info, purpose of the project, who the end-users are, budget, etc) they will make good decisions.
My own musings on giving people mandate, or empowering people:
What we want to do to enable the above is to remove the obstacles which hinder people's natural propensity to take initiative. If you are discussing 'giving mandate' or 'empowering' your staff, there's already something iffy IMO. Mandate isn't something which is given - it's something humans innately possess. Instead of telling someone they now have mandate, consider how your organisation removes mandate from a person. What control mechanisms do you have in place? Identify these, and for each consider why said mechanism is in place. What problem did it solve? Is the problem still a thing? Is there another way of solving the problem without depriving your teams of mandate?
Also - What is the social cost of failure in your org? How do formal and informal leaders react to failure? People can have all the mandate in the world and still feel disempowered if they are scared of public shaming.
Overheard: "Distributed monolith"
In the attempt to transform a backend from a monolith to microservices, there's a risk of ending up with a distributed monolith - where one suffers the disadvantages of both monoliths and microservices.
If your microservices operate with knowledge of the internals of other services,
If your microservices know what services exist upstream,
If many microservices share a database (or worse, share tables (or gasp, write to the same tables)),
… then you probably operate a Distributed Monolith.
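The last warning sign, several services writing to the same tables, is one you can check for mechanically. Here is a minimal sketch, assuming you can produce an inventory of which service writes to which table. The function name `shared_tables` and the service and table names are made up for illustration and don't come from any real system.

```python
from collections import defaultdict


def shared_tables(writes_by_service):
    """Return the tables that more than one service writes to.

    writes_by_service is a hypothetical inventory mapping a service name
    to the list of tables that service writes to.
    """
    writers = defaultdict(set)
    for service, tables in writes_by_service.items():
        for table in tables:
            writers[table].add(service)
    # Keep only tables with at least two distinct writers.
    return {
        table: sorted(services)
        for table, services in writers.items()
        if len(services) > 1
    }


if __name__ == "__main__":
    inventory = {
        "orders": ["orders", "customers"],
        "billing": ["invoices", "customers"],
        "search": ["search_index"],
    }
    for table, services in shared_tables(inventory).items():
        print(f"{table} is written by: {', '.join(services)}")
```

An empty result doesn't prove you're in the clear, since the other two warning signs are about knowledge rather than storage, but a non-empty one is a good conversation starter.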
On scary deployments
Fear of failure drives out creativity and smothers innovation. And you're likely to be working in an environment where innovation and creativity are paramount. Fear isn't binary - it's a spectrum with gradually increasing magnitudes - from slight nervousness all the way up to sheer panic. So you might not be terrified when about to deploy something to prod, but anxious. Or nervous.
But this tension shifts our focus away from what we like to do - create, experiment and learn. From the company perspective, the fear of failure in teams leads to deploys once every couple of months, during a weekend, in the middle of the night.
So what we want to do is change how we do things to decrease our fear and raise our confidence. Each thing below can help you and your team do this:
Release smaller changes: This will make it easier to review your code and understand it. Also, the potential risk of failure becomes smaller since the release is a small one.
Release often to production: You'll find bugs in your release process much more often if you go from one release a month to multiple a day. And you'll start to trust your process more if you do it often.
Automate your release process to reduce human error.
Use feature flags to apply your changes to a smaller demographic, and to be able to decouple deployment of code from when it impacts the user.
Write tests to verify the knowns.
Ensure that monitoring of the service exists from the get-go to detect early signs of system failure or degradation and to get data for troubleshooting and later, for your blameless post-mortem.
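To make the feature flag point a bit more concrete, here is a minimal sketch of a percentage rollout. It assumes users have a stable ID and that a rollout percentage per flag is kept somewhere outside the code. The names `bucket`, `is_enabled`, `greeting` and the flag `new-greeting` are all hypothetical, and a real setup would more likely use an existing flag service than a hand-rolled one.

```python
import hashlib


def bucket(flag_name, user_id):
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    key = f"{flag_name}:{user_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % 100


def is_enabled(flag_name, user_id, rollout_percent):
    """True if the user falls inside the rollout for this flag."""
    return bucket(flag_name, user_id) < rollout_percent


def greeting(user_id, rollout_percent):
    # The new code path is deployed but dormant until the percentage is raised.
    if is_enabled("new-greeting", user_id, rollout_percent):
        return "Welcome back!"
    return "Hello."


if __name__ == "__main__":
    # Deployed at 0%: nobody sees the change yet.
    print(greeting("user-42", rollout_percent=0))
    # Fully rolled out: everyone sees it, without a new deploy.
    print(greeting("user-42", rollout_percent=100))
```

Hashing the flag name together with the user ID means the same user gets the same answer every time, and different flags hit different slices of users. Raising the percentage is then a config change rather than a deployment, which is the decoupling mentioned above.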
All in all, the conference was fun and inspirational. It felt good to be able both to learn from others and, at other times, to share my own experience and thoughts. 10/10, will definitely do this again :)















