Building distributed systems using a microservice pattern is hard. We're always looking for ways to automate manual or difficult processes. Computers don't make mistakes, but humans are fallible, so the more we can rely on a machine-led process, the more reliable our release process can be. This is the story of how we approached versioning and releasing for one of our projects.
I recently attended the National DevOps Conference. I sometimes find it useful to make notes during these kinds of events, and rather than leave them buried and unloved somewhere, I've written them up for the world to see. Hopefully you'll find them useful.
On the 26th September 2018, one of AWS' availability zones in the Ireland region (EU-WEST-1) suffered an incident that led to "increased error rates". At Infinity Works we operate systems for multiple clients in the AWS Ireland region. During this event our teams observed various symptoms, which resulted in some unplanned work. This post details what happened, what we saw, and how we plan to mitigate the impact of a similar incident in the future.
Our industry is not shy about inventing new terms to describe the new things it's doing.
We attended QCon in London last month (which, as an aside, I highly recommend if you've not been before), and one of the tracks that caught my attention looked at "DevEx" or "Developer Experience" - the daily life of developers/engineers and how it might be improved through better tooling, process and skills. It got me thinking - what is the "DevEx" for engineers at Infinity Works, and what can we do to nurture it?
So I took part in my first Hackathon at the weekend. I wasn't sure what to expect, but I thoroughly enjoyed it - enough to write down a few thoughts.
Code reviews are an integral part of how the teams I work on operate, and recently I was asked what value they add relative to the time they take. Here are a few thoughts on why they are vital to any software development project.
At Infinity Works, many of us use AWS routinely, but we're less experienced with Google Cloud Platform. So I decided to find out more and recently signed up for a one-day Google OnBoard event. I wasn't surprised to see a lot of similarity between AWS and GCP, but there were a number of important differences too, and in many ways GCP looked better. Some of the differences were small conveniences, while others were much more fundamental.
A simple guide to improving the alert notifications we receive from Alertmanager in our Slack channels. The changes we walk through enrich the information in our alerts, allowing us to make better decisions up-front when responding to an event.
Thomas Gray recently posted about how he and his team integrated with Vault for secret management, using Rancher as a source of truth for authentication and authorisation. This is a follow-up post which discusses how my team and I approached a similar problem.
Vault makes use of Shamir's secret sharing scheme to split a master key into n pieces, requiring at least k of them to be presented at unseal time. At initialisation time, the user specifies the values of n and k. However, Vault does not make it possible to change the number of shares after initialisation without issuing new shares to the existing shareholders. Shamir's scheme itself does allow this, so I decided to raise a pull request implementing the functionality.
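To illustrate why the scheme allows this, here is a minimal toy sketch of Shamir's scheme. It assumes a prime field (the Mersenne prime 2^127 - 1) and small integer secrets, so it is not how Vault's own implementation does its field arithmetic, and the function names (`split`, `combine`, `add_share`) are hypothetical. The key observation is that any k shares determine the secret polynomial, so Lagrange interpolation can evaluate it at a fresh x to mint an extra share while every existing share stays valid.

```python
import secrets

# Assumption: toy prime field; Vault's real implementation uses different field arithmetic.
PRIME = 2**127 - 1


def _eval_poly(coeffs, x):
    """Evaluate the polynomial with the given coefficients at x (Horner's method, mod PRIME)."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % PRIME
    return result


def split(secret, n, k):
    """Split `secret` into n shares (x, y); any k of them reconstruct it."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    # Random polynomial of degree k-1 whose constant term f(0) is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    return [(x, _eval_poly(coeffs, x)) for x in range(1, n + 1)]


def interpolate(shares, x):
    """Lagrange interpolation: evaluate the polynomial through `shares` at x."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (x - xj) % PRIME
                den = den * (xi - xj) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total


def combine(shares):
    """Recover the secret, i.e. the polynomial evaluated at x = 0."""
    return interpolate(shares, 0)


def add_share(shares, new_x):
    """Hypothetical helper: mint one extra share from at least k existing shares.

    Existing shares are untouched, so current shareholders keep theirs.
    """
    if new_x % PRIME == 0 or any(x == new_x for x, _ in shares):
        raise ValueError("new_x must be non-zero and unused")
    return (new_x, interpolate(shares, new_x))


if __name__ == "__main__":
    shares = split(123456789, n=5, k=3)
    extra = add_share(shares[:3], 6)  # grow from 5 to 6 shares
    print(combine([shares[3], shares[4], extra]))
```

Note that this only grows n. Changing the threshold k means choosing a new polynomial, which does require re-issuing shares to everyone.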
It seems everyone is building a REST API these days - this is great! It's not easy, but there are a few key things you can do to make life easier for the consumers of your API.
I believe there has been a cultural shift in recent years in who is responsible for implementing security within a business: development teams are becoming more involved in implementing security practices, and are thereby becoming leaders in the field.
The days of imposing an Enterprise Architecture top down are gone. It doesn’t scale and it doesn’t create good architectures. Now it’s all about building the right culture and community, effecting architectural change through organisational change, and creating generative feedback mechanisms which cause sustainable architectures to grow. Instead of cost reduction we focus on maximising customer value and organisational resilience. Build communities. Optimise for flow, not control. Decide by learning, and learn by doing. Good architectures grow from small and simple beginnings.
Every now and then a rash of blog posts appears looking at the process of hiring good engineers. Some just lament the difficulty (it is hard!) and some claim a new silver bullet by doing something unusual. The markets we hire from in both Leeds and London are highly competitive, and the best people usually have several companies vying for their attention. Even when you do find someone, there is still the difficult task of judging whether they'll actually be any good at the job.
It’s Friday night, time to relax after a hard week at work… But what’s this? @alexellisuk posting about a cool plan to create the world’s biggest Docker Swarm cluster… And it’s happening right now!
Visualisation has always been a critical component of the services I've worked on. I believe that visualising the services you run and the systems you manage is a key part of engaging your teams and enabling them to buy into the services you provide. I'll likely explore this subject further in a future post; for now, though, I'd like to talk about one specific case: visualising our digital footprint at Infinity Works.
At Infinity Works, we use a variety of monitoring tools with our clients: Prometheus, AWS CloudWatch, AppDynamics, Splunk, ELK, etc. They can appear to do similar things, but they actually work in very different ways. We got together to talk through the options, starting by getting some definitions out of the way.
Last week, I was fortunate enough to be invited to speak alongside an old friend, Chris Urwin, at Promcon in sunny Berlin. Promcon 2016 was the inaugural conference for all things Prometheus. For those of you unfamiliar with Prometheus, it's in essence a queryable metrics store built upon a time series database.
The word 'chaos' is used in the Cynefin framework to describe a domain where the agents of a system have no constraints. While the chaotic domain can be a bad place to be, in the right circumstances it can be used to fuel innovation. Liz Keogh has previously posted about isolated idea generation, and Cognitive Edge have developed a pattern called Ritual Dissent in order to perform a 'shallow dive' into chaos. In this post I'd like to talk about how hack days can be used to do this, provided certain precautions are taken.
Over the last few years, Linux containers and associated technology (Docker, container registries, container orchestration tools) have been widely adopted, with our consultants using them in a number of scenarios, from simplifying the creation of local and shared test environments to running large scale production systems on premises and in the cloud.