Why Kubernetes?

Or rather, why is the future of infrastructure so freaking scary? (and cool)

Kubernetes. It’s everywhere and the hottest topic in tech for everyone from executives to SREs. But why should you care, and why is the barrier to entry so high?

Tech moves quickly, and we often develop amnesia about the issues of the time. So, to figure this out, we need to understand the problems we were trying to solve.

Let’s talk about “monolithic infrastructure”.

Once upon a time, nobody used AWS. If you were BigCorp XYZ, you would rent physical space in a datacenter, buy servers from Dell or Supermicro, and have your network and engineering teams handle the rest. If you were SmallBusiness ABC, you would go to an MSP (Managed Service Provider), who would rent servers from a provider that split physical machines into smaller chunks, each emulating a full physical machine, at the cost of performance or flexibility. OpenVZ had issues where upgrading packages via APT could break the guest; KVM meant sacrificing up to 20% of your performance to the hungry hypervisor.

Engineers would manage these servers like pets.

But times have changed. And it all started with Docker, which was built on an interesting premise: what if, instead of deploying each individual server and managing configurations and applications for each one (using tooling such as Ansible or SaltStack), we simply abstracted down to a single service per host and managed it from the application layer?

And thus, Docker and a whole plethora of services were born. Docker Compose, Docker Machine, Docker Swarm… each iterating on what was a pretty innovative and game-changing mindset.
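
As a minimal sketch of that mindset (the image and port below are placeholders, not from the original post), a Compose file describes a service entirely at the application layer:

# docker-compose.yml <-- illustrative only; one service, declared where the
# application lives, with no per-host configuration management in sight
version: "3"
services:
  web:
    image: nginx:latest   # the container image *is* the deployment artifact
    ports:
      - "80:80"           # publish the service port on the host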

Giving the power to Deploy back to developers.

And then Amazon Web Services, Google Cloud, Microsoft Azure, DigitalOcean, Linode, and the thousands of other PaaS (Platform as a Service) offerings exploded into reality. Pay-per-hour, near-infinite scalability, fully-abstracted infrastructure, price parity with smaller providers…

SmallBusiness ABC could now afford the same level of infrastructure that BigCorp XYZ was using.

The sheer presence of a Dockerfile in a repository brought a whole new wave of CI/CD tooling. No longer would Software Engineers be at the mercy of the all-seeing deployment team and their availability. They could track their software from git commit to docker stack deploy. They could trigger deployments from within their code repository tooling.
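
As one hedged illustration (not the tooling described in the post; the registry URL, image name, and stack name are hypothetical), a GitLab CI pipeline along these lines carries a commit all the way to docker stack deploy:

# .gitlab-ci.yml <-- illustrative sketch; assumes a Docker-capable runner and
# an existing Swarm, neither of which is described in the post
stages:
  - build
  - deploy

build_image:
  stage: build
  script:
    - docker build -t registry.example.com/myapp:$CI_COMMIT_SHORT_SHA .
    - docker push registry.example.com/myapp:$CI_COMMIT_SHORT_SHA

deploy_stack:
  stage: deploy
  script:
    - docker stack deploy --compose-file docker-compose.yml myapp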

Infrastructure became Code.

The ever-growing need for more resources on demand quickly spawned Automatic Scaling, and Cloud Infrastructure was the match made in heaven for it. But we needed more! Automatic scaling was only good up to a point: we could build tooling to unintelligently scale up an entire section of our applications at once.

Some servers would get heavily overloaded, some would sit quiet. The time it took to deploy the much-needed extra capacity was still too long. Our bills increased. And we lost track of our cloud inventory.

Enter Terraform. Now, by writing code, you could spin up and manage your entire stack in the Cloud, without even touching the UI. And it was declarative, which meant you could describe what you wanted your infrastructure to look like, and it would build and interweave all of the cloud services for your applications to run snugly on.

It’s about the state we want to achieve, not what’s already there.

It’s 2018. A project that Google had been working on for four years had finally become a smash hit: Kubernetes. It reached 9th place in commits on GitHub, spawned the CNCF, and created a new frontier of scalability for tech companies to work towards.

So we had already abstracted each host down to a single service with Docker. But Kubernetes allowed us to abstract away an entire fleet of hosts.

Moo. Moo. Moo-moo?

Indeed.

  • Spinning up a fleet of hosts stopped being a process of…
    1. Log in to the AWS console.
    2. Browse to EC2.
    3. Come up with some abstract server hostname that would (make you) sound cool.
    4. Invent some resource sizes that you think should be enough (and remember them).
    5. Repeat from step 1.
  • And became…
    1. Write the following Terraform code.
      resource "aws_instance" "web" {
      count = 3
      ami = "${data.aws_ami.ubuntu.id}"
      instance_type = "t2.micro"
      tags = {
      Name = "HelloWorld"
      }
      }
    2. git commit -m "INFRA-001: Added a new host." && git push
    3. Sit back and wait for your pipeline to validate, plan, and apply your new state.

And your application deployment became… even more complicated.

Engineers scratched their collective heads. None of them had needed to write Kubernetes Objects before.

# application/deployment.yaml <-- Wait, how many different files do I need?

apiVersion: apps/v1 # <-- Why do I need this?
kind: Deployment # <-- What the heck is a deployment?
metadata:
  name: nginx-deployment
spec:
  selector: # <-- What is a selector?
    matchLabels:
      app: nginx
  replicas: 2 # <-- Where is my container being put? How can I get access to it?
  template:
    metadata:
      labels:
        app: nginx # <-- What's the difference between metadata.name and this label thing?
    spec:
      containers: # <-- Okay, this part I understand. I've used Docker before.
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80 # <-- Does Kubernetes know how to make this available for me to view?
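
(One common answer to that last question, offered here as a hedged aside rather than as part of the original example, is a Service object that selects those pods by label and gives them a stable, cluster-internal address; the names below are hypothetical.)

# application/service.yaml <-- an illustrative sketch, not part of the original example
apiVersion: v1
kind: Service
metadata:
  name: nginx-service       # hypothetical name
spec:
  selector:
    app: nginx              # matches the pod template labels in the Deployment above
  ports:
    - port: 80              # port the Service exposes inside the cluster
      targetPort: 80        # containerPort on the nginx pods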

Kubernetes is scary because it takes the power away from the Engineer.

Kubernetes knows best. That’s why it has a Control Plane, a distributed key-value store for cluster state (etcd) and a Scheduler. It is the final frontier in “don’t manage infrastructure, manage your application”.
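
As a rough sketch of what that means in practice (the resource figures below are invented for illustration): you declare what a container needs, and the Scheduler, not you, decides which node it lands on.

# pod.yaml <-- illustrative only; note that no node name appears anywhere
apiVersion: v1
kind: Pod
metadata:
  name: nginx-scheduled     # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx:1.7.9
    resources:
      requests:
        cpu: "250m"         # a quarter of a core; the Scheduler bin-packs on this
        memory: "128Mi"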

But it also achieves the opposite effect. You now need a team of engineers with significant depth of experience to implement Kubernetes and manage your clusters.

  • What storage driver do you use?
  • What network driver do you use?
  • What sidecar should you use?
  • What log ingestion, storage and processing should you use?

There are some opinionated distributions of Kubernetes, such as k3s, that “solve” these problems at a surface level by bundling answers to them into a single package.

Operations is supposed to be about reducing operational complexity, so why has it become so complicated…?

[To be continued.]

Published by Alexander

- Alexander is a professional Operations (DevOps/NetOps/SysOps) SRE and Developer living in Tokyo.