CICD, GitOps & me

Automate the automatable; more time writing code!

Aug 27, 2022

Intro

“The most powerful tool we have as developers is automation.”
— Scott Hanselman

In this issue of Lead Engineer, we are going to discuss CICD & GitOps. It might be surprising to see these covered in one blog post, but by the time you are done reading I hope it will all make sense.

We will cover the following topics and aim to answer these questions:

Topic primer - CICD. What is CICD and why do I need it? What challenges will I face when adopting it?
Topic primer - GitOps. So much more than a version control system. What challenges will I face when adopting it?
Tools to consider - CICD. In this section we cover a couple of our favourite tools for CICD.
Tools to consider - GitOps. In this section we cover a couple of our favourite tools for GitOps.
Further reading - We are only going to scratch the surface of these technologies here. As always, we provide curated further reading so you can dive deeper if it interests you.

Note: We are not paid for recommending anything in this blog post. We just like the tools.

Topic primer - CICD


CICD stands for Continuous Integration and Continuous Delivery. The "CD" is also commonly read as Continuous Deployment, and that is the sense we use below. Calling it CICD makes it sound like a package deal, but it isn't: you can have CI without CD, and vice versa. Let's start by looking at what each one is and why that's the case.1

Continuous Integration (CI)

Continuous integration is the process of combining code from multiple contributors into a single location and running validation checks against it to ensure some level of correctness. A typical CI workflow might be:

Make a branch from “develop” in GitHub.
Add a cool new feature.
Make a pull request into develop.

When a pull request is created, a CI tool detects it and runs a set of predefined scripts or tasks to validate the build. The following is a fairly typical workflow you'd expect a CI tool to run:

Lint
Build
Test
Integration Test
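As a rough illustration, the four steps above could be expressed as a GitHub Actions workflow (one of the tools we cover later) that runs on every pull request into develop. This is a minimal sketch. It assumes a Node.js project whose package.json defines hypothetical `lint`, `build`, `test` and `test:integration` scripts, so swap in whatever commands your stack uses.

```yaml
# .github/workflows/ci.yml (the file name is arbitrary)
name: CI

on:
  pull_request:
    branches: [develop]   # run on every pull request into develop

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: 16
      - run: npm ci                      # install dependencies
      # Steps run in order. If one fails, the remaining steps are skipped
      # and the check is reported as failed on the pull request.
      - run: npm run lint                # hypothetical script
      - run: npm run build               # hypothetical script
      - run: npm test                    # hypothetical script
      - run: npm run test:integration    # hypothetical script
```

To make a failing run block the merge, mark the `validate` check as a required status check in the branch protection rules for develop.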

If any of these steps fails, you can configure your source control tool to block the merge. You'll often hear engineers say things like “the build has gone green”. This typically means CI has passed and they can move on to the next step: getting humans to review their code so they can merge it and move on to the CD step.

Below is an example of a typical CI step on a pull request.2


CI has a lot of value, even without continuous deployment. It ensures that code continuously meets predefined benchmarks, and it takes the automatable portion of code review off individuals' plates. Your colleagues can then focus on the business logic of what you implemented, safe in the knowledge that your code builds correctly and hasn't broken any of the existing unit tests (and, hopefully, that the new ones pass too).

Continuous Deployment (CD)

CD stands for continuous deployment and is the process of automating the release portion of your deployment workflow. Let's continue the example above. After all of our CI steps have passed, we merge our branch into develop. That merge triggers an automatic deployment of our application to our staging environment. Once this is done, we may show it to stakeholders, run some manual tests on it or tweak some configuration before making a pull request from develop to main. Once the pull request to main is merged, it triggers an automatic deployment to production, making the newest version of our code available to customers.3
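Here is a minimal sketch of that deployment half, again assuming GitHub Actions. The workflow runs whenever something is pushed to develop or main (a merged pull request results in a push) and picks the target environment from the branch. `./scripts/deploy.sh` is a hypothetical script that stands in for however you actually release.

```yaml
# .github/workflows/cd.yml (the file name is arbitrary)
name: CD

on:
  push:
    branches: [develop, main]

jobs:
  deploy-staging:
    if: github.ref == 'refs/heads/develop'   # merges to develop go to staging
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v3
      - run: ./scripts/deploy.sh staging       # hypothetical deploy script

  deploy-production:
    if: github.ref == 'refs/heads/main'      # merges to main go to production
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v3
      - run: ./scripts/deploy.sh production    # hypothetical deploy script
```

Using GitHub environments also lets you attach protection rules, such as required reviewers, to production if you want a human gate there.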


Challenges of CICD adoption

We can't think of a single reason why every company should not be using some form of CI; it just makes sense. The goal is to automate the automatable so your engineers can spend time working on things that matter.

Having the confidence and tooling to safely enable continuous deployment is a challenge, and achieving it is a sign of a mature engineering team. In some industries, you will hear pushback from senior stakeholders that CD is for startups and is not suitable for heavily regulated environments. To be blunt, this is nonsense. There are plenty of examples of companies in regulated industries running very mature CICD pipelines. If you receive pushback, kindly point your colleagues at this blog post from Monzo, a regulated bank in the UK: Monzo: How we deploy to production over 100 times per day

// Lead Engineer Pro Tip: If your company does not currently have CI or CD,
// start with CI. Its benefits are a lot easier to see, and it is conceptually
// much easier to sell. It will likely make your team more productive and raise
// code quality, with less time spent on validation. CD can be harder to sell,
// and it requires a more mature engineering team and process.
// As a Lead Engineer, it is your job to drive this change.

Topic primer - GitOps

As your company grows, both in customers and employees, you may start to explore ways to modularise teams and services to enable faster delivery. One way to do this is via the Spotify Squad model, or simply by capping team size with the two-pizza team rule. With lots of small teams running lots of services, you may consider platforms such as Kubernetes or Apache Mesos to abstract some of the infrastructure complexity away from your engineers. Entire blog posts could be (and have been) written on when and why you should consider Kubernetes, but for now let's just assume that decision has been made and you are using it.

As an engineer, you will begin writing YAML files that describe the state you would like your Kubernetes cluster to be in. When you apply a file, Kubernetes creates objects. These are a "record of intent": once you create an object, the Kubernetes system will constantly work to ensure that object exists. By creating an object, you are effectively telling the Kubernetes system what you want your cluster's workload to look like; this is your cluster's desired state.4 Let's look at an example:

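```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3              # desired state: always keep 3 pods running
  selector:
    matchLabels:
      app: nginx           # this Deployment manages pods carrying this label
  template:                # pod template used to create each replica
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80   # port the container listens on
```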

This tells Kubernetes to:

Create a Deployment object named nginx-deployment.
Run 3 pods based on the Docker image nginx:1.14.2.
Record that each nginx container listens on port 80.

If one of the pods is killed or dies for any reason, Kubernetes will bring up a replacement as soon as possible. You recorded that you want 3, so 2 is not acceptable to Kubernetes.

It is common practice to check your YAML definitions into version control, just as you would with code. If we define the cluster state in YAML and keep those files on hand, we can, in theory, recreate the entire cluster from scratch within minutes if we need to. Because Kubernetes is an abstraction over the infrastructure layer, even moving between cloud providers is, in theory, a quick process. It also means that, even without access to a Kubernetes cluster, you can get a good idea of what is running in it.

We can still use the CICD approach we described above. It would look something like this:

As part of CI, build a container image and publish it to a container registry.
As part of CD, update the image on our Deployment to the new version. In our example above, we might change nginx:1.14.2 to nginx:1.14.3. Kubernetes would then roll out new pods running the new version of the code.
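That push-based flow could be sketched as a single GitHub Actions job. This is a minimal sketch under several assumptions. The repository contains a Dockerfile. The image goes to GitHub Container Registry under a hypothetical `example-org` owner. The runner already has kubectl credentials for the cluster, which we do not show. Note that the last step changes the cluster directly, and that direct change is what causes the problem described next.

```yaml
name: build-and-deploy
on:
  push:
    branches: [main]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write            # needed to push images to ghcr.io
    steps:
      - uses: actions/checkout@v3
      # CI: build the image and publish it to a container registry.
      - uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v3
        with:
          push: true
          tags: ghcr.io/example-org/nginx:${{ github.sha }}   # hypothetical image name
      # CD: point the running Deployment at the new image.
      # Assumes kubectl is already configured for the target cluster.
      - run: >
          kubectl set image deployment/nginx-deployment
          nginx=ghcr.io/example-org/nginx:${{ github.sha }}
```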

However, we now have a problem: version control is no longer the source of truth, because it still contains the old image tag. If we were to recreate the cluster, we might accidentally deploy old versions of our code, which could lead to an incident. This is just one simple example. Another is an engineer modifying a network policy to resolve a production incident. If that change is not committed back, we are likely to see the incident again the next time we apply the cluster state.

What if our git repository WAS the source of truth for all of our Kubernetes objects? What if instead of having to manually apply changes to the cluster, we continuously checked version control for the latest definitions and updated our cluster to match that definition? What if every single change was auditable by default?

This is GitOps5

With GitOps, our Git repo is constantly checked for changes, and those changes are synced to the cluster. There are two key “modes” you can run GitOps in:

Track only objects you have committed. This means that if you have deployment A and deployment B in your cluster, but deployment B is not checked into your repo, the GitOps operator won’t do anything with it. All changes to deployment B will need to be made manually. This approach arguably only delivers half of the value of GitOps since you could have changes to your cluster that live outside version control.
The repo is the source of truth: if it doesn't exist in the repo, it does not exist in the cluster. This is the end goal to aspire to if you want to get the most value out of GitOps. However, it comes with lots of challenges, which we explore below.
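In Argo CD (covered later), you move from the first mode towards the second through its automated sync options. `prune` deletes objects Argo CD manages once they are removed from the repo. `selfHeal` reverts changes made directly on the cluster. Objects Argo CD has never managed are left alone either way. Below is a minimal sketch of an Application. The repository URL, path and namespaces are hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: nginx
  namespace: argocd                  # namespace where Argo CD is installed
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/k8s-config.git   # hypothetical repo
    targetRevision: main
    path: apps/nginx                 # hypothetical directory of manifests
  destination:
    server: https://kubernetes.default.svc   # the cluster Argo CD runs in
    namespace: default
  syncPolicy:
    automated:
      prune: true      # delete managed objects that were removed from the repo
      selfHeal: true   # revert manual changes made directly on the cluster
```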

Challenges of GitOps adoption

You need to be aware that it is happening and how it works. This might seem obvious, but in large enterprises, information dissemination is a hard problem. If you do not know GitOps is enabled for a specific object, you might make a change to it directly on the cluster. However, the next time the operator syncs, your change will be overwritten with the value in the repo. If you do want to make a change manually, you need to disable GitOps, make the change and then remember to commit your final change before enabling the operator again.
CD becomes a little harder. You now need to make a pull request to deploy your code. Lots of tools have bots and automation to make this possible, but it's another step that we didn't have before. Some teams allow their bot to automatically merge changes for staging (therefore triggering an automatic deployment) but require manual intervention to deploy to production. There is no right answer here; it depends on how your team wants to work.
Support for deleting objects is not great. Some companies choose to leave deletion as a manual activity. You could argue this means GitOps does not fully deliver on its promise, and we'd be with you on that. Currently, tools such as KubeDiff are popular for understanding the difference between the cluster and what is described in your repo. You can use them to trigger alerts, block merges or even roll back deployments.
Secret management is not automatic. GitOps doesn't do anything to help you with secrets, so you'll need to add another tool for this. We recommend Vault.
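One way to handle the CD step under GitOps is to have automation commit the new image tag to the config repo instead of talking to the cluster. Here is a minimal sketch using a manually triggered GitHub Actions workflow. It assumes a hypothetical `k8s/staging/deployment.yaml` in the same repository. Pushing straight to the branch mirrors teams that auto-deploy staging. Production would go through a reviewed pull request instead.

```yaml
name: bump-staging-image
on:
  workflow_dispatch:
    inputs:
      tag:
        description: "nginx image tag to deploy to staging"
        required: true

jobs:
  bump:
    runs-on: ubuntu-latest
    permissions:
      contents: write              # allow the job to push a commit
    env:
      TAG: ${{ github.event.inputs.tag }}   # passed via env to avoid shell injection
    steps:
      - uses: actions/checkout@v3
      - name: Update the image tag in the manifest
        # Hypothetical path; the GitOps operator applies the change on its next sync.
        run: sed -i "s|image: nginx:.*|image: nginx:${TAG}|" k8s/staging/deployment.yaml
      - name: Commit and push
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git commit -am "Deploy nginx:${TAG} to staging"
          git push
```

The cluster is never touched by CI here. Git records every deployment as a commit, which gives you the audit trail GitOps promises.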

Tools to consider - CI/CD

Over the years we have seen tons of CI/CD tools drift in and out of popularity (we still love you though, Jenkins). Our favourites today are as follows:

GitHub Actions

https://docs.github.com/en/actions

GitHub Actions became generally available in 2019 and was an instant hit. Being able to define your CICD pipeline inside your repository, without disrupting your development workflow, is a powerful paradigm. If you are a GitHub user, this would be our number one pick for getting started with CI/CD.

Pricing-wise, it has a generous free plan that should suit most hobbyists and small companies.

CircleCI

https://circleci.com

Before GitHub Actions, this was our number one pick. It can still be a great option, because you do not need to use GitHub to use it. In our experience, CI tasks were easy to define and monitor, and they ran quickly. If you are not using GitHub (for example, if you are on Bitbucket or GitLab), consider it.
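For comparison, a CircleCI pipeline lives in `.circleci/config.yml`. Here is a minimal sketch of the same lint, build and test flow. It again assumes a Node.js project with hypothetical npm scripts. The Docker image tag is an assumption too.

```yaml
# .circleci/config.yml
version: 2.1

jobs:
  validate:
    docker:
      - image: cimg/node:lts     # CircleCI Node.js convenience image (tag is an assumption)
    steps:
      - checkout
      - run: npm ci              # install dependencies
      - run: npm run lint        # hypothetical script
      - run: npm run build       # hypothetical script
      - run: npm test            # hypothetical script

workflows:
  ci:
    jobs:
      - validate
```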

Like GitHub Actions, it has a decent free plan, so it could be worth signing up for both and seeing which you prefer.

Tools to consider - GitOps

There are fewer options in the GitOps space, but those that are available work well. Expect this space to explode over the next few years.

Weave GitOps

https://www.weave.works

Weaveworks is credited with coining the term GitOps, so it should be no surprise to see them on here.

Weave GitOps is a wrapper around Flux (which we talk about below) and aims to simplify its deployment and management. One of the nice things about Weave GitOps is the Weave Cloud dashboard, which helps with observability. It helps answer questions such as:

If a change is released automatically how do we know it really worked?
How can we be sure that our changes are actually driving improvement?
How do we understand issues, diagnose them and handle incidents?

The free version of Weave GitOps is pretty limited. You can see a comparison here. If you don’t want to pay SaaS pricing or want to have a little more control over your GitOps configuration, this might not be one for you.

Argo CD

https://argo-cd.readthedocs.io/en/stable/

Argo CD does not have a paid offering, so you must run it yourself. It is fairly simple to do so, though. In general, we much preferred the Argo UI. It's really powerful and gives great visibility into how your cluster is doing.

If you want to run it yourself, Argo would be our number one pick.

Flux CD

Flux CD is the open source version of Weaveworks GitOps. Flux offers quite a bit more control than Argo, which could be a good thing for more advanced operators. For example, you can have your Git repository checked every five minutes but run the sync every ten minutes. This gives you the ability to stagger how reconciliation happens. You can also do this on a per-application basis. With Argo, there is just a single, global configuration option for this.
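To make the staggering concrete, here is a minimal sketch of the two Flux objects involved. A GitRepository polls the repo every five minutes. A Kustomization reconciles the cluster from it every ten. The repository URL and path are hypothetical. The `v1beta2` API versions match Flux releases from around the time of writing; newer releases use `v1`.

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: k8s-config
  namespace: flux-system
spec:
  interval: 5m                 # check the repo for new commits every 5 minutes
  url: https://github.com/example-org/k8s-config   # hypothetical repo
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: nginx
  namespace: flux-system
spec:
  interval: 10m                # reconcile the cluster against the repo every 10 minutes
  path: ./apps/nginx           # hypothetical directory of manifests
  prune: true                  # remove objects that were deleted from the repo
  sourceRef:
    kind: GitRepository
    name: k8s-config
```

Each Kustomization carries its own interval, so different applications can reconcile on different schedules.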

Flux also relies on Kubernetes for RBAC, whereas Argo manages this itself. To be honest, most teams will not notice much difference between Argo and Flux. Our preference is for Argo, but that's simply because we prefer the UI.

And finally….

1. Image credit: https://blog.qasource.com/resources/best-way-to-manage-continuous-integration-testing-and-collaboration

2. Image credit: https://github.com

3. Image credit: https://cacarer.com/ci-cd-continuous-integration-continuous-delivery-and-continuous-deployment/

4. Ref: https://kubernetes.io/docs/concepts/overview/working-with-objects/kubernetes-objects/

5. Image credit: https://www.weave.works/assets/images/bltcc4877814a615ae1/what-is-gitops.png

Lead Engineer
