Some of you may know that I’m actively working on k8ch.art, but as an individual developer who also works full time, it can be really hard to find the time or sometimes even the motivation to work on projects. Like any other creative process, you can find yourself in a rut. So I decided to work on building a CDN.

I’d like to propose that if you ever find yourself in such a situation, take a break by working on another project. Just like with writing a book or painting a picture, this act of “holidayism” can be extremely healthy. It could even help you learn something or do better on the project you’re stuck on.

For me, this creative break ended up being CattoCDN. A tiny cat picture CDN with AI tagging and with an extreme focus on caching. And yes, I too was surprised “CattoCDN” wasn’t taken, considering the pun “Cat to CDN”.

Architecture diagram

Evidently, since this blog post is up, I actually finished this project. (Congratulations to me, I always believed in you!)

The stack is powered by Cloudflare, AWS CloudFront, Amazon S3, Fly.io, and Flask. There are also some built-in anti-abuse mechanisms and tagging powered by Amazon Rekognition. I picked tools that I both:

  1. Was familiar with
  2. Wanted to understand the interoperability and use cases for

One of the first challenges in this project was chaining CDNs. Since the content was both static (images and plain HTML) and dynamic (image tagging), it was important to keep assets well cached at the edges or else face potentially monumental bill shock.

Understanding Content Distribution

As an SRE, I’m always looking out for metrics to judge the effectiveness of a particular technique with. In the case of content distribution, these are:

  1. The Cache Hit Ratio — how much is served from the cache
  2. Asset Latency — how long it takes to serve an asset in any particular region
  3. Asset Size — optimisation of the asset being served
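As a rough illustration of the first metric, here is a minimal sketch that computes a cache hit ratio from a list of observed cache statuses. The function name and the sample data are hypothetical; in practice the statuses would come from CDN logs or repeated requests.

```python
from collections import Counter


def cache_hit_ratio(statuses):
    """Return the fraction of responses served from cache (0.0 if no samples)."""
    counts = Counter(s.strip().upper() for s in statuses)
    total = sum(counts.values())
    return counts["HIT"] / total if total else 0.0


# Hypothetical sample: two cold misses followed by hits once the edge is warm.
sample = ["MISS", "MISS", "HIT", "HIT", "HIT", "HIT", "HIT", "HIT"]
print(f"cache hit ratio: {cache_hit_ratio(sample):.0%}")
```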

To improve the cache hit ratio, it was important to make sure the cache was actually being hit for static content. In HTTP lingo, there are a few headers which I had to look out for. These were:

HTTP: cache-control: public, max-age=31536000
Amazon CloudFront: x-cache: Hit from cloudfront
CloudFlare: cf-cache-status: HIT

The cache-control header helps indicate to edge caches on Amazon CloudFront and CloudFlare’s networks how long to hold a particular asset, and the other two headers indicate to the client whether or not a particular asset was served from the cache.

When building your own CDN on top of these technologies, curl is one of the easiest ways to get useful headers. As an example, I wanted to see if my static assets were being served efficiently, so I ran the following command a few times. You should see a couple of misses as the edge cache closest to you warms up, downloads the asset to the local cache, and begins serving it without contacting the origin.

curl --head https://cattocdn.com/static/css/bootstrap.min.css
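If you want to check those headers in a script rather than by eye, something like the following sketch could work. It parses `curl --head` style output and reports whether each CDN layer served from cache. The helper names and the sample response are illustrative assumptions, not captured output from CattoCDN.

```python
def parse_headers(raw):
    """Parse `curl --head` style output into a dict keyed by lowercased header name."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # the status line and blank lines contain no colon, so they are skipped
            headers[name.strip().lower()] = value.strip()
    return headers


def cache_report(headers):
    """Summarise the cache-related headers discussed above."""
    return {
        "cloudflare_hit": headers.get("cf-cache-status", "").upper() == "HIT",
        "cloudfront_hit": headers.get("x-cache", "").lower().startswith("hit"),
        "cache_control": headers.get("cache-control"),
    }


# Illustrative response only; real output will contain more headers.
SAMPLE = """HTTP/2 200
content-type: text/css
cache-control: public, max-age=31536000
x-cache: Hit from cloudfront
cf-cache-status: HIT
"""

print(cache_report(parse_headers(SAMPLE)))
```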

Thankfully, my asset latency was being handled by both AWS and CloudFlare, as their networks seem to work together nicely. You might ask, “won’t the end-user latency be determined only by CloudFlare’s network, as they are the closest to the edge?” And on that, you’re absolutely right. However, my goal when chaining these two CDNs was not to get the optimal asset latency. It was to reduce cost.

Flask itself also has the ability to serve static content, which I leveraged with the following setting to ensure content was cached for as long as possible:

app.config['SEND_FILE_MAX_AGE_DEFAULT'] = 31536000
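The magic number 31536000 is simply one (non-leap) year expressed in seconds. A small sketch of where it comes from, and of the header value seen earlier in this post, is below. `cache_control_value` is a hypothetical helper for illustration, not a Flask API.

```python
from datetime import timedelta

# 365 days * 24 hours * 60 minutes * 60 seconds
ONE_YEAR_SECONDS = int(timedelta(days=365).total_seconds())


def cache_control_value(max_age, public=True):
    """Build a Cache-Control value such as 'public, max-age=31536000'."""
    parts = ["public" if public else "private", f"max-age={max_age}"]
    return ", ".join(parts)


print(ONE_YEAR_SECONDS)
print(cache_control_value(ONE_YEAR_SECONDS))
```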

When optimising this further, one of the key problems I ran into was with an experimental feature on Fly.io called “statics”. This feature pulls developer-defined static content from your Docker image and serves it from their load balancers directly. I thought this could be useful for my static files such as JavaScript, CSS and fonts. Unfortunately, Fly.io hard-coded the cache-control header to something like 30 minutes with an always-fetch directive, resulting in constant misses from CloudFlare and CloudFront. Not great for cost effectiveness!
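To see why a header like that defeats edge caching, here is a rough sketch of the decision a shared cache makes. It is a simplification of real HTTP caching rules, and the second header in the example is an assumption: the exact value Fly.io sent is not recorded here, so `max-age=1800, no-cache` only stands in for "30 minutes plus an always-fetch directive".

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a dict of directive -> argument (or None)."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name] = arg.strip('"') or None
    return directives


def edge_can_reuse(value):
    """Rough check: may a shared cache serve this without contacting the origin?"""
    d = parse_cache_control(value)
    if "no-store" in d or "no-cache" in d or "private" in d:
        return False
    # s-maxage, when present, takes priority over max-age for shared caches.
    max_age = d.get("s-maxage") or d.get("max-age")
    return max_age is not None and max_age.isdigit() and int(max_age) > 0


print(edge_can_reuse("public, max-age=31536000"))  # long-lived static asset
print(edge_can_reuse("max-age=1800, no-cache"))    # hypothetical "statics"-style header
```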

Perhaps sometime in the future, I’ll move this static content to S3 as well, but the convenience of having it baked in without a pipeline to upload it makes life a lot easier. I just wanted to launch the freaking thing already!

Code Deployment

Speaking of deployment, Fly’s single-line command deployments with flyctl deploy are amazing. This is some of the best tooling I’ve ever seen, and makes me wish we had something that worked nearly as well at my current workplace.

Why AI?

Why did I add an AI component to CattoCDN? There are three reasons behind this.

  1. You can’t trust anyone to be nice to a free service.
  2. It would be cool to add image tags to detect the cat’s breed and background objects in the image.
  3. For fun and learning, of course!

One of the toughest things to mitigate is abuse of free services. For CattoCDN, the abuse problems could be bad since it allows image upload and sharing. Big platforms have anti-abuse mechanisms for the same reason, so I made sure to be reasonably defensive too.

When picking Amazon Rekognition, I was thinking “Let’s use the big guns!”, and honestly I’m very glad I did. Being able to detect abuse upon upload and instantly delete offending content came down to a couple of API calls. Pictures containing any form of suggestive content, pictures without cats in them, and other abusive content are detected and removed before they can be served back to users or shared. Content is also permanently deleted after 7 days as a further deterrent against spamming the service. Of course, there will always be people who abuse services like this, but I have a bit of faith in humanity, and in my own ephemerality.
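The upload check described above could look roughly like this. It is a sketch under assumptions, not CattoCDN’s actual code: the bucket name, object key, confidence threshold and the `should_reject` helper are all hypothetical. The two client methods, `detect_moderation_labels` and `detect_labels`, are the Rekognition calls exposed by boto3. A fake client is used so the example runs offline; in a real service you would pass `boto3.client("rekognition")` instead.

```python
def should_reject(client, bucket, key, min_confidence=80.0):
    """Return (reject, reason) for an uploaded image stored in S3."""
    image = {"S3Object": {"Bucket": bucket, "Name": key}}

    # Step 1: reject anything Rekognition flags as unsafe or suggestive.
    moderation = client.detect_moderation_labels(
        Image=image, MinConfidence=min_confidence
    )
    if moderation["ModerationLabels"]:
        return True, "moderation label: " + moderation["ModerationLabels"][0]["Name"]

    # Step 2: reject images with no cat in them.
    labels = client.detect_labels(
        Image=image, MaxLabels=20, MinConfidence=min_confidence
    )
    names = {label["Name"] for label in labels["Labels"]}
    if "Cat" not in names:
        return True, "no cat detected"
    return False, "ok"


class FakeRekognition:
    """Offline stand-in that returns canned responses in Rekognition's shape."""

    def __init__(self, moderation_labels, labels):
        self._moderation = [{"Name": n, "Confidence": 99.0} for n in moderation_labels]
        self._labels = [{"Name": n, "Confidence": 99.0} for n in labels]

    def detect_moderation_labels(self, **kwargs):
        return {"ModerationLabels": self._moderation}

    def detect_labels(self, **kwargs):
        return {"Labels": self._labels}


# Hypothetical bucket and key names.
cat_photo = FakeRekognition([], ["Cat", "Sofa"])
print(should_reject(cat_photo, "example-bucket", "uploads/cat.jpg"))
```

The rejected object would then be deleted from S3 before it is ever linked or served.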

CattoCDN’s Cost

At the moment, to run CattoCDN, it actually costs… $0.00. That’s right, nah, nil, nada.

This is thanks to both advances in technology driving bandwidth costs down, and competition driving the expansion of free tiers, resulting in a win for consumers (developers). CloudFlare is free. CloudFront is free up to 1TB per month. Fly.io has a great free tier which allows you to run 3 apps and enjoy 100GB of global bandwidth, split into regions depending on Fly.io’s infrastructure cost.

Here’s the napkin math for those who are interested:

Service             | Free Limit                                     | Cost
CloudFlare          | Unlimited until someone says otherwise         | $0
Fly.io              | 3 apps, 100GB bandwidth                        | $0
Amazon S3           | 5 GB of S3 standard storage (for 12 months)    | $0
Amazon CloudFront   | 1TB / month                                    | $0
Amazon Rekognition  | Analyse 5,000 images / month (for 12 months)   | $0
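The same napkin math can be written as a quick check of monthly usage against the free limits listed above. The usage figures are made up for illustration, and treating 1TB as 1000 GB is an assumption.

```python
# Free limits from the table above.
FREE_TIER = {
    "flyio_bandwidth_gb": 100,
    "s3_storage_gb": 5,
    "cloudfront_transfer_gb": 1000,  # 1TB, assumed to mean 1000 GB
    "rekognition_images": 5000,
}


def over_free_tier(usage):
    """Return {service: amount over the limit} for services exceeding their free tier."""
    return {
        name: used - FREE_TIER[name]
        for name, used in usage.items()
        if used > FREE_TIER[name]
    }


# Hypothetical month of usage for a small hobby site.
usage = {
    "flyio_bandwidth_gb": 2,
    "s3_storage_gb": 1,
    "cloudfront_transfer_gb": 40,
    "rekognition_images": 300,
}
print(over_free_tier(usage) or "everything fits in the free tiers")
```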

One of the key metrics for me when creating this CDN was keeping the running cost as small as possible. To do this, I kept data flowing to Fly.io to a minimum: it only serves static HTML files and handles uploads. Uploads go through Fly.io mostly because I didn’t want to deal with signed content URLs and such, so maybe I’ll work on that in the future.

Separating content by directory also helped ensure more aggressive caching from the CDNs, and with the power of CloudFront’s behaviour-driven rule sets, it was extremely easy to configure the level of caching for each directory. Setting up Page Rules for HTML / CSS caching on CloudFlare helped save just a little bit more bandwidth. Every byte matters!
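The idea of per-directory caching can be sketched as a first-match lookup from path prefix to TTL. Only the `/static/` prefix and the one-year value appear in this post; the other prefixes and TTLs are hypothetical, and this is not how CloudFront behaviours are actually configured (that is done in the distribution settings).

```python
# Ordered from most specific to least specific; the first matching prefix wins.
CACHE_RULES = [
    ("/static/", 31536000),  # CSS, JS, fonts: one year
    ("/images/", 604800),    # hypothetical: 7 days, matching the content lifetime
    ("/", 3600),             # hypothetical: everything else, one hour
]


def ttl_for(path):
    """Return the cache TTL in seconds for a request path, or 0 if nothing matches."""
    for prefix, ttl in CACHE_RULES:
        if path.startswith(prefix):
            return ttl
    return 0


for path in ("/static/css/bootstrap.min.css", "/images/abc.jpg", "/index.html"):
    print(path, ttl_for(path))
```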

Conclusion

Building CattoCDN was both fun and rewarding, allowing me to stretch my creative legs in a low-stress environment. After all, it’s not something people will rely on for mission critical needs, nor is it anything more than a toy or interesting proof of concept for now. I hope you enjoyed the write-up, and feel inspired to create a fun personal project in the future.

Published by Alexander

- Alexander is a professional Operations (DevOps/NetOps/SysOps) SRE and Developer living in Tokyo.