- One had only 2 services [php] and ran over 1 billion requests a day. Deploy was trivial, ssh some new files to the server and run a migration, 0 downtime.
- One was in an industry that didn't need "Webscale" (retirement accounts). Prod deploys were just docker commands run by jenkins. We ran two servers per service from the day I joined to the day I left 4 years later (3x growth), and ultimately removed one service and one database during all that growth.
Another outstanding thing about both of these places was that we had all the testing environments you need, on-demand, in minutes.
The place I'm at now is trying to do kubernetes and is failing miserably (ongoing nightmare 4 months in and probably at least 8 to go, when it was allegedly supposed to only take 3 total). It has one shared test environment where it takes 3 hours to see your changes.
I don't fault kubernetes directly, I fault the overall complexity. But at the end of the day kubernetes feels like complexity trying to abstract over complexity, and often I find that's less successful than removing complexity in the first place.
I've only used it managed. There is a bit of a learning curve but it's not so bad. I can't see how it can take 4 months to figure it out.
> I can't see how it can take 4 months to figure it out.
Well have you ever tried moving a company with a dozen services onto kubernetes piece-by-piece, with zero downtime? How long would it take you to correctly move and test every permission, environment variable, and issue you run into?
Then if you get a single setting wrong (e.g. memory size) and don't load-test with realistic traffic, you bring down production, potentially lose customers, and have to do a public post-mortem about your mistakes? [true story for current employer]
I don't see how anybody says they'd move a large company to kubernetes in such an environment in a few months with no screwups and solid testing.
I've seen migrations of thousands of microservices happen within the span of two years. Longer timeline, yes, but the number of microservices is orders of magnitude larger.
Though I suppose the organization works differently at this level. The Kubernetes team built a tool to migrate the microservices, and each owner was asked to perform the migration themselves. Small microservices could be migrated in less than three days, while the large and risk-critical ones took a couple of weeks. This all happened in less than two years, but it took more than that in terms of engineer-weeks.
The project was very successful though. The company spends way less money now because of the autoscaling features, and the ability to run multiple microservices in the same node.
Regardless, if the company is running 12 microservices and this number is expected to grow, this is probably a good time to migrate. How did they account for the different shapes of services (stateful, stateless, leader-elected, cron, etc.), networking settings, styles of deployment (blue-green, rolling updates, etc.), secret management, load testing, bug bashing, gradual rollouts, dockerizing the services, etc.? If it's taking 4x longer than originally anticipated, it seems like there was a massive failure in project design.
Similarly, the actual migration times you estimate add up to decades of engineer time.
It’s possible kubernetes saves more time than using the alternative costs, but that definitely wasn’t the case at my previous two jobs. The jury is out at the current job.
I see the opportunity cost of this stuff every day at work, and am patiently waiting for a replacement.
Not really, they only had to use the tool to run the migration and then validate that it worked properly. As the other commenter said, a very basic setup for kubernetes is not that hard; the difficult setup is left to the devops team, while the service owners just need to know the basics.
But sure, we can estimate it at 38 engineering years. That's still 38 years for 2,000 microservices; it's way better than 1 year for 12 microservices like in OP's case. The savings we got were enough to offset those 38 years of work, so the project is now paying dividends.
Learning k8s well enough to work with it isn't that hard. Have a centralized team write up a decent template for a CI/CD pipeline, a Dockerfile for the most common stacks you use, and a Helm chart with an example Deployment, PersistentVolumeClaim, Service and Ingress; distribute that, and be available for support when someone's needs go beyond "we need 1-N pods for this service, they get some environment variables to configure themselves from, and maybe a Secret/ConfigMap if the application would rather be configured through files". That's enough in my experience.
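To give a sense of scale, the "basic" part of such a template really isn't much YAML. A minimal sketch (the app name, image, and ports below are placeholders, not anything from a real setup):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
            - name: my-app
              image: registry.example.com/my-app:1.0.0   # placeholder image
              ports:
                - containerPort: 8080
              envFrom:
                - configMapRef:
                    name: my-app-config    # optional ConfigMap for configuration
                    optional: true
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: my-app
    spec:
      selector:
        app: my-app
      ports:
        - port: 80
          targetPort: 8080

The Ingress and PersistentVolumeClaim pieces are similarly boilerplate-heavy but shallow, which is why a shared template plus some support from a central team covers most services.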
I’ve seen a lot of people learn enough k8s to be dangerous.
Learning it well enough to not get wrapped around the axle with some networking or storage details is quite a bit harder.
Anyone wishing to do stuff like use the RDS database provisioner gets an introduction from us on how to use it and what the pitfalls are, and regular reviews of their code. They're flexible, but we keep tabs on what they're doing, and when they have done something useful we aren't shy about integrating it into our shared template repository.
Unfortunately, I do. Somebody says that when the culture of the organization expects to be told and hear what they want to hear rather than the cold hard truth. And likely the person saying that says it from a perch up high and not responsible for the day to day work of actually implementing the change. I see this happen when the person, management/leadership, lacks the skills and knowledge to perform the work themselves. They've never been in the trenches and had to actually deal face to face with the devil in the details.
My current employer did something similar, but with fewer services. The upshot is that with terraform and helm and all the other yaml files defining our cluster, we have test environments on demand, and our uptime is 100x better.
Memory size is an interesting example. A typical Kubernetes deployment has much more control over this than a typical non-container setup. It is costing you to figure out the right setting but in the long term you are rewarded with a more robust and more re-deployable application.
Actually not true, k8s uses the exact same cgroups API for this under the hood that systemd does.
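For what it's worth, the knob itself is tiny on the Kubernetes side. A sketch of the relevant container-spec fragment (the numbers are made up), which ends up as ordinary cgroup limits underneath:

    # fragment of a Deployment's pod template
    containers:
      - name: my-app                                   # placeholder name/image
        image: registry.example.com/my-app:1.0.0
        resources:
          requests:
            cpu: 250m        # used by the scheduler for placement
            memory: 256Mi
          limits:
            memory: 512Mi    # enforced as the cgroup memory limit;
                             # exceed it and the container is OOM-killed

Getting the numbers right still requires load testing with realistic traffic, which was the parent's point.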
Which I understand to mean “some people think using Kubernetes will make managing a system easier, but it often will not do that”
That's how we did it at Google (I was part of the core team responsible for ad serving infra - billions of ads to billions of users a day).
I occasionally try non-k8s approaches to see what I'm missing. I have a small ARM machine that runs Home Assistant and some other stuff. My first instinct was to run k8s (probably kind honestly), but didn't really want to write a bunch of manifests and let myself scope creep to running ArgoCD. I decided on `podman generate systemd` instead (with nightly re-pulls of the "latest" tag; I live and die by the bleeding edge). This was OK, until I added zwavejs, and now the versions sometimes get out of sync, which I notice by a certain light switch not working anymore. What I should have done instead was have some sort of git repository where I have the versions of these two things, and to update them atomically both at the exact same time. Oh wow, I really did need ArgoCD and Kubernetes ;)
I get by with podman by angrily ssh-ing in in my winter jacket when I'm trying to leave my house but can't turn the lights off. Maybe this can be blamed on auto-updates, but frankly anything exposed to a network that is out of date is also a risk, so, I don't think you can ever really win.
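For anyone wanting to try the same setup, it's roughly this (the container name and volume path are mine, not the commenter's):

    # create the container once, then let systemd own its lifecycle
    podman create --name homeassistant --network host \
      -v /srv/hass:/config \
      ghcr.io/home-assistant/home-assistant:latest

    # --new recreates the container on every start, so a nightly
    # pull of :latest is picked up on the next restart
    podman generate systemd --new --files --name homeassistant
    mv container-homeassistant.service /etc/systemd/system/
    systemctl daemon-reload
    systemctl enable --now container-homeassistant.service

The version-drift problem described above is exactly what this doesn't solve: nothing pins Home Assistant and zwavejs to a pair of versions known to work together.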
Questions worth asking:
- Do you need a load balancer?
- TLS certs and rotation?
- Horizontal scalability.
- HA/DR
- dev/stage/production + being able to test/stage your complete stack on demand.
- CI/CD integrations, tools like ArgoCD or Spinnaker
- Monitoring and/or alerting with Prometheus and Grafana
- Would you benefit from being able to deploy a lot of off-the-shelf software (let's say Elasticsearch, or some random database, or a monitoring stack) via helm quickly/easily?
- "Ingress"/proxy.
- DNS integrations.
If you answer yes to many of those questions there's really no better alternative than k8s. If you're building large enough scale web applications, the answer to most of these will end up being yes at some point.
Nah, most of that list is basically free for any company that uses an Amazon load balancer and an autoscale group. In terms of likelihood of incidents, time, and cost, each of those will be an order of magnitude higher with a team of kubernetes engineers than with a less complex setup.
There are good reasons to use Kubernetes, mainly if you are using public clouds and want to avoid lock-in. I may be partial, since managing it pays my bills. But it is complex, mostly unnecessarily so, and no one should be able to say with a straight face that it achieves better uptime or requires less personnel than any alternative. That's just sales talk, and should be a big warning sign.
It's just very different than your legacy "standard" technology.
There are, maybe, a dozen companies in the world with a large enough physical footprint where Kubernetes might make sense. Everyone else is either engaged in resume-driven development, or has gone down some profoundly wrong path with their application architecture to where it is somehow the lesser evil.
I basically don't have to think about ops anymore until something exotic comes up, it's nice. I agree that it feels clunky, and it was annoying to learn, but once you have something working it's a huge time saver. The ability to scale without drastically changing the system is a bonus.
I can do the same thing with `make local` invoking a few bash commands. If the complexity increases beyond that, a mistake has been made.
And the advantage of it is one way to manage resources, scaling, logging, observability, hardware etc.
All of which is stored in Git and so audited, reviewed, versioned, tested etc in exactly the same way.
You could make the same argument against using cloud at all, or against using CI. The point of Kubernetes isn't to make those things possible, it's to make them easy and consistent.
This is not even close to true with even a small number of resources. The notion that k8s somehow is the only choice is right along the lines of “Java Enterprise Edition is the only choice” — ie a real failure of the imagination.
For startups and teams with limited resources, DO, fly.io and render are doing lots of interesting work. But what if you can’t use them? Is k8s your only choice?
Let’s say you’re a large orgs with good engineering leadership, and you have high-revenue systems where downtime isn’t okay. Also for compliance reasons public cloud isn’t okay.
DNS in a tightly controlled large enterprise internal network can be handled with relatively simple microservices. Your org will likely have something already though.
Dev/Stage/Production: if you can spin up instances on demand this is trivial. Also financial services and other regulated biz have been doing this for eons before k8s.
Load Balancers: lots of non-k8s options exist (software and hardware appliances).
Prometheus / Grafana (and things like Netdata) work very well even without k8s.
Load Balancing and Ingress is definitely the most interesting piece of the puzzle. Some choose nginx or Envoy, but there are also teams that use their own ingress solution (sometimes open-sourced!)
But why would a team do this? Or more appropriately, why would their management spend on this? Answer: many don’t! But for those that do — the driver is usually cost*, availability and accountability, along with engineering capability as a secondary driver.
(*cost because it’s easy to set up a mixed ability team with experienced, mid-career and new engineers for this. You don’t need a team full of kernel hackers.)
It costs less than you think, it creates real accountability throughout the stack and most importantly you’ve now got a team of engineers who can rise to any reasonable challenge, and who can be cross pollinated throughout the org. In brief the goal is to have engineers not “k8s implementers” or “OpenShift implementers” or “Cloud Foundry implementers”.
And it will likely be buggy with all sorts of edge cases.
> Dev/Stage/Production: if you can spin up instances on demand this is trivial. Also financial services and other regulated biz have been doing this for eons before k8s.
In my experience financial services have been notably not doing it.
> Load Balancers: lots of non-k8s options exist (software and hardware appliances).
The problem isn't running a load balancer with a given configuration at a given point in time. It's how you manage the required changes to load balancers and configuration as time goes on. It's very common for that to be a pile of perl scripts that add up to an ad-hoc informally specified bug-ridden implementation of half of kubernetes.
I have seen this view in corporate IT teams who’re happy to be “implementers” rather than engineers.
In real life, many orgs will in fact have third party vendor products for internal DNS and cert authorities. Writing bridge APIs to these isn’t difficult and it keeps the IT guys happy.
A relatively few orgs have written their own APIs, typically to manage a delegated zone. Again, you can say these must be buggy, but here’s the thing — everything’s buggy. Including k8s. As long as bugs are understood and fixed, no one cares. The proof of the pudding is how well it works.
Internal DNS in particular is easy enough to control and test if you have engineers (vs implementers) in your team.
> manage changes to load balancers … perl
That’s a very black and white view, that teams are either on k8s (which to you is the bee's knees) or a pile of Perl (presumably unmaintainable). Speaks to an interesting unconscious bias.
Perhaps it comes from personal experience, in which case I’m sorry you had to be part of such a team. But it’s not particularly difficult to follow modern best practices and operate your own stack.
But if your starter stance is that “k8s is the only way”, no one can talk you out of your own mental hard lines.
Kubernetes only works if you have a webapp written in a slow interpreted language. For anything else it is a huge impedance mismatch with what you're actually trying to do.
P.S. In the real world, Kubernetes isn't used to solve technical problems. It's used as a buffer between the dev team and the ops team, who usually have different schedules/budgets, and might even be different corporate entities. I'm sure there might be an easier way to solve that problem without dragging in Google's ridiculous and broken tech stack.
Not true. Unix itself is an API for your cluster too, like the original post implies.
Personally, as a "tech lead" I use NixOS. (Yes, I am that guy.)
The point is, k8s is a shitty API because it's built only for Google's "run a huge webapp built on shitty Python scripts" use case.
Most people don't need this, what they actually want is some way for dev to pass the buck to ops in some way that PM's can track on a Gantt chart.
My understanding is that it was initially sold as Google's tech to benefit from Google's tech reputation (exploiting the confusion caused by the fact that some of the original k8s devs were ex-googlers), and today it's also Google trying to pose as k8s inventor, to benefit from its popularity. Interesting case of host/parasite symbiosis, it seems.
Just my impression though, I can be wrong, please comment if you know more about the history of k8s.
What are you trying to say there? My understanding is that, way under the hood, a set of shell scripts is in fact enabling the scalable nature of… the internet.
I sure hope not. The state of error handling in shell scripts alone is enough to disqualify them for serious production systems.
If you're extremely smart and disciplined it's theoretically possible to write a shell script that handles error states correctly. But there are better things to spend your discipline budget on.
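A concrete example of the kind of trap meant here, as a sketch: even with the standard strict-mode boilerplate, error status can be silently swallowed (the endpoint is hypothetical):

    #!/usr/bin/env bash
    set -euo pipefail   # exit on errors, unset variables, and pipeline failures

    deploy() {
      # gotcha: 'local' masks the exit status of the command substitution,
      # so a failed curl does NOT stop the script despite 'set -e'
      local version=$(curl -fsS https://example.invalid/latest)
      echo "deploying ${version}"
    }

    deploy

Splitting the declaration (`local version; version=$(...)`) fixes it, but remembering that in every script is exactly the discipline budget being talked about.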
It's like people are stuck in the early 2000s when they start thinking about computer capabilities. Today I have more flops in a single GPU under my desk than the world's largest supercomputer had in 2004.
This makes sense, because the code people write makes machines feel like they're from the early 2000's.
This is partially a joke, of course, but I think there is a massive chasm between the people who think you immediately need several computers to do things for anything other than redundancy, and the people who see how ridiculously much you can do with one.
Obviously, some parts took a while to figure out. For example, I needed to figure out an AWS security group problem with Ingress objects that I recall wasn't well-documented. So I think parts of that declarative language can suck, if the declarative parts aren't well factored-out from the imperative parts. Or if the log messages don't help you diagnose errors, or if there isn't some kind of (dynamic?) linter that helps you notice problems quickly.
In your team's case, more information seems needed to help us evaluate the problems. Why was it easier before to make testing environments, and harder now?
Kubernetes is not simple. In fact it's even more complex than just running an executable with your linux distro's init system. The difference in my mind is that it's more complex for the system maintainer, but less complex for the person deploying workloads to it.
And that's before exploring all the benefits of kubernetes-ecosystem tooling like the Prometheus operator for k8s, or the horizontally scalable Loki deployments, for centrally collecting infrastructure and application metrics, and logs. In my mind, making the most of these kinds of tools, things start to look a bit easier even for the systems maintainers.
Not trying to discount your workplace too much. But I'd wager there's a few people that are maybe not owning up to the fact that it's their first time messing around with kubernetes.
Dear friend, you are not a systems programmer
For many years now, all major container runtimes have supported nesting. Some make it easy (podman and runc just work), some hard (systemd-nspawn requires setting many flags to work nested). This is called "Docker-in-Docker" (DinD).
I prefer FreeBSD to K8s.
For a hobby project, using Docker Compose or Podman combined with systemd and some shell scripts is perfectly fine. You’re the only one responsible, and you have the freedom to choose whatever works best for you.
However, in a company setting, things are quite different. Your boss may assign you new tasks that could require writing a lot of custom scripts. This can become a problem for other team members and contractors, as such scripts are often undocumented and don’t follow industry standards.
In this case, I would recommend using Kubernetes (k8s), but only if the company has a dedicated Kubernetes team with an established on-call rotation. Alternatively, I suggest leveraging a managed cloud service like ECS Fargate to handle container orchestration.
There’s also strong competition in the "Container as a Service" (CaaS) space, with smaller and more cost-effective options available if you prefer to avoid the major cloud providers. Overall, these CaaS solutions require far less maintenance compared to managing your own cluster.
At a previous job at a teeny startup, each instance of the environment is a docker-compose instance on a VPS. It works great, but they’re starting to get a bunch of new clients, and some of them need fully independent instances of the app.
Deployment gets harder with every instance because it’s just a pile of bash scripts on each server. My old coworkers have to run a build for each instance for every deploy.
None of us had used ansible, which seems like it could be a solution. It would be a new headache to learn, but it seems like less of a headache than kubernetes!
* Automating repetitive tasks across many servers.
* Ensuring idempotent configurations (e.g., setting up web servers, installing packages consistently).
* Managing infrastructure as code for better version control and collaboration.
* Orchestrating complex workflows that involve multiple steps or dependencies.
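A minimal playbook along those lines, as a sketch (the inventory group and package are placeholders):

    # site.yml -- illustrative only
    - hosts: webservers            # placeholder inventory group
      become: true
      tasks:
        - name: Install nginx
          ansible.builtin.apt:
            name: nginx
            state: present
        - name: Ensure nginx is running and enabled
          ansible.builtin.service:
            name: nginx
            state: started
            enabled: true

Running it twice converges to the same state, which is the idempotence point above.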
However, Ansible is not a container orchestrator.
Kubernetes (K8s) provides capabilities that Ansible or Docker-Compose cannot match. While Docker-Compose only supports a basic subset, Kubernetes offers:
* Advanced orchestration features, such as rolling updates, health checks, scaling, and self-healing.
* Automatic maintenance of the desired state for running workloads.
* Restarting failed containers, rescheduling pods, and replacing unhealthy nodes.
* Horizontal pod auto-scaling based on metrics (e.g., CPU, memory, or custom metrics).
* Continuous monitoring and reconciliation of the actual state with the desired state.
* Immediate application of changes to bring resources to the desired configuration.
* Service discovery via DNS and automatic load balancing across pods.
* Native support for Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) for storage management.
* Abstraction of storage providers, supporting local, cloud, and network storage.
If you need these features but are concerned about the complexity of Kubernetes, consider using a managed Kubernetes service like GKE or EKS to simplify deployment and management. Alternatively, and this is my preferred option, combining Terraform with a Container-as-a-Service (CaaS) platform allows the provider to handle most of the operational complexity for you.
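As a rough illustration of the health-check and autoscaling items above (the image, endpoint, and thresholds are all placeholders):

    # fragment of a Deployment's pod template: restart the container
    # when it stops answering its health endpoint
    containers:
      - name: my-app
        image: registry.example.com/my-app:1.0.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080

    # scale the Deployment between 2 and 10 replicas based on CPU usage
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: my-app
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70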
At that scale you can write a custom orchestrator that is likely to be smaller and simpler than the equivalent K8S setup. Been there, done that.
Using EKS or GKE is basically this. K8s is much nicer than ECS in terms of development and packaging your own apps.
You know what would be even worse? Introducing Kubernetes for your non-Google/Netflix/WhateverPlanetaryScale app instead of just writing a few scripts...
[1] http://widgetsandshit.com/teddziuba/2010/10/taco-bell-progra...
Ofc it isn't true.
Kubernetes was designed at Google at a time when Google was already a behemoth. 99.99% of all startups and SMEs out there shall never ever have the same scaling issues and automation needs that Google has.
Now that said... When you begin running VMs and containers, even only a very few of them, you immediately run into issues and then you begin to think: "Kubernetes is the solution". And it is. But it is also, in many cases, a solution to a problem you created. Still... the justification for creating that problem, if you're not Google scale, are highly disputable.
And, deep down, there's another very fundamental issue IMO: many of those "let's have only one process in one container" solutions actually mean "we're totally unable to write portable software working on several configs, so let's start with a machine with zero libs and dependencies and install exactly the minimum deps needed to make our ultra-fragile piece of shit of a software kinda work. And because it's still going to be a brittle piece of shit, let's make sure we use heartbeats and try to shut it down and back up again once it'll invariably have memory leaked and/or whatnots".
Then you also gained the right to be sloppy in the software you write: not respecting it. Treating it as cattle to be slaughtered, so it can be shitty. But you've now added an insane layer of complexity.
How do you like your uninitialized var when a container launches but then silently doesn't work as expected? How do you like them logs in that case? Someone here has described the lack of instant failure on any uninitialized var as the "billion dollar mistake of the devops world".
Meanwhile look at some proper software like, say, the Linux kernel or a distro like Debian. Or compile Emacs or a browser from source and marvel at what's happening. Sure, there may be hiccups but it works. On many configs. On many different hardware. On many different architectures. These are robust software that don't need to be "pid 1 on a pristine filesystem" to work properly.
In a way this whole "let's have all our software each as pid 1 each on a pristine OS and filesystem" is an admission of a very deep and profound failure of our entire field.
I don't think it's something to be celebrated.
And don't get me started on security: you now have ultra complicated LANs and VLANs, with near impossible to monitor traffic, with shitloads of ports open everywhere, the most gigantic attack surface of them all, and heartbeats and whatnots constantly polluting the network, where nobody even knows anymore what's going on. Where the only actual security seems to rely on the firewall being up and correctly configured, which is incredibly complicated to do given the insane network complexity you added to your stack. "Oh wait, I have an idea, let's make configuring the firewall a service!" (and make sure not to forget to initialize one of the countless vars or it'll all silently break and just not configure firewalling for anything).
Now though, love is true love: even at home I'm running a hypervisor with VMs and OCI containers ;)
99.99% of startups and SMEs should not be writing microservices.
But "I wrote a commercial system that served thousands of users, it ran on a single process on a spare box out the back" doesn't look good on resumes.
Lol no. The build systems flake out if you look at them funny. The build requirements are whatever Joe in Nebraska happened to have installed on his machine that day (I mean sure there's a text file supposedly listing them, but it hasn't been accurate for 6 years). They list systems that they haven't actually supported for years, because no-one's actually testing them.
I hate containers as much as anyone, but the state of "native" unix software is even worse.
You mean the list of calls right there in the shell script?
> Who will know about those undocumented sysctl edits you made on the VM?
You mean those calls to `sysctl` conveniently right there in the shell script?
> your app needs to programmatically spawn other containers
Or you could run a job queue and push tasks to it (gaining all the usual benefits of observability, concurrency limits, etc), instead of spawning ad-hoc containers and hoping for the best.
Kubernetes and all tooling in the cloud native computing foundation(CNCF) were created to have people adopt the cloud and build communities that then created jobs roles that facilitated hiring people to maintain cloud presences that then fund cloud providers.
This is the same playbook Microsoft ran at universities. They would give the entire suite of tools in the MSDN library away, then roughly four years later collect when another seat needed to be purchased for a new hire who had only used Microsoft tools for the last four years.
This is about the worst encoding for network rules I can think of.
(If you've never had to use Helm, I envy you. And if you have, I genuinely look forward to you showing me an easier way to do this, since it would make my life easier.)
-------------------------------------

Shell script:

    iptables -A INPUT -p tcp --dport 8080 -j ACCEPT

Multiple ports:

    for port in 80 443 8080; do
      iptables -A INPUT -p tcp --dport "$port" -j ACCEPT
    done

Easy and concise.

-------------------------------------
Kubernetes (disclaimer: untested, obviously):

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-port-8080
    spec:
      podSelector:
        matchLabels:
          app.kubernetes.io/name: my-app
      policyTypes:
        - Ingress
      ingress:
        - ports:
            - port: 8080
              protocol: TCP

Multiple ports:

    ingress:
      - ports:
          - port: 80
            protocol: TCP
          - port: 443
            protocol: TCP
          - port: 8080
            protocol: TCP

And the Helm-templated version:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: firewall
    spec:
      podSelector:
        matchLabels:
          app.kubernetes.io/name: my-app
      policyTypes:
        - Ingress
      ingress:
        - ports:
          {{- range .Values.firewall.ports }}
            - port: {{ .port }}
              protocol: {{ .protocol }}
          {{- end }}
Because if one of those iptables commands above fails, you're in an inconsistent state.
Also if I want to swap from iptables to something like Istio then it's basically the same YAML.
You generally speaking do not want a code generation or service orchestration system that will support the entire universe of choices. You want your programs and idioms to follow similar patterns across your codebase and you want your services architected and deployed the same way. You want to know when outliers get introduced and similarly you want to make it costly enough to require introspection on if the value of the benefit out ways the cost of oddity.
Only offering the correction because I was confused at what you meant by “out ways” until I figured it out.
It's a fun philosophy for online debates, but an expensive one to use in real engineering.
This. I will gladly give up the universe of choices for a one size fits most solution that just works. I will bend my use cases to fit the mold if it means not having to write k8s configuration in a twisty maze of managed services.
I think the real point, better expressed, is that if you find yourself building a system with like a third of the features of K8s but composed of hand-rolled scripts and random third-party tools kludged together, maybe you should have just bit the bullet and moved to K8s instead.
You probably shouldn't start your project on it unless you have a dedicated DevOps department maintaining your cluster for you, but don't be afraid to move to it if your needs start getting more complex.
I appreciate the wide range of interpretations! I don't necessarily think you should always move to k8s in those situations. I just want people to not dismiss k8s outright for being overly-complex without thinking too hard about it. "You will evolve towards analogues of those design ideas" is a good way to put it.
That's also how I interpreted the original post about compilers. The reader is stubbornly refusing to acknowledge that compilers have irreducible complexity. They think they can build something simpler, but end up rediscovering the same path that led to the creation of compilers in the first place.
The problem is choosing the point of transition, and allocating resources for said transition. Sometimes it's easier to allocate a small chunk to update your bespoke script right now instead of sinking more to a proper migration. It's a typical dilemma of taking debt vs paying upfront.
(BTW the same dilemma exists with running in the cloud vs running on bare metal; the only time when a migration from the cloud is easy is the beginning, when it does not make financial sense.)
Then, if and when you've become so large that the previous thing has become painful and k8s started looking like a really right tool for the job, allocate time and resources, plan a transition, implement it smoothly. If you have grown to such a size, you must have had a few such transitions in your architecture and infrastructure already, and learned to handle them.
Downtime is the same as with a deployment, so if you run at least 2 copies of everything there should be no downtime.
As for updating the images of your containers, you build them again with the newer base image, then deploy.
Upgrades of the Docker image are done by pushing a new image, updating the Deployment to use the new image, and applying it. Kubernetes will start new containers for the new image and, when they are running, kill off the old containers. There should be no interruption. It isn't any different from a normal deploy.
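The knobs controlling that behaviour sit on the Deployment itself; a sketch (the numbers are arbitrary, and close to the defaults anyway):

    # fragment of a Deployment spec, spelled out for clarity
    spec:
      replicas: 3
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 0   # never go below the desired replica count
          maxSurge: 1         # start one new pod before terminating an old one

Pushing a new tag and `kubectl apply`-ing the updated manifest (or `kubectl set image deployment/my-app my-app=registry.example.com/my-app:1.0.1`, with placeholder names) then triggers exactly the rollout described.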
As for Kamal, I shudder to think of the hubris required to say "pfft, haproxy is for lamez, how hard can it be to make my own lb?!" https://github.com/basecamp/kamal-proxy
This fascination with this new garbage-collected language from a Santa Clara vendor is perplexing. You’ve built yourself a COBOL system by another name.
/s
I love the “untested” criticism in a lot of these use-k8s screeds, and also the suggestion that they’re hanging together because of one guy. The implicit criticism is that doing your own engineering is bad, really, you should follow the crowd.
Here’s a counterpoint.
Sometimes just writing YAML is enough. Sometimes it’s not. Eg there are times when managed k8s is just not on the table, eg because of compliance or business issues. Then you’ve to think about self-managed k8s. That’s rather hard to do well. And often, you don’t need all of that complexity.
Yet — sometimes availability and accountability reasons mean that you need to have a really deep understanding of your stack.
And in those cases, having the engineering capability to orchestrate isolated workloads, move them around, resize them, monitor them, etc is imperative — and engineering capability means understanding the code, fixing bugs, improving the system. Not just writing YAML.
It’s shockingly inexpensive to get this started with a two-pizza team that understands Linux well. You do need a couple really good, experienced engineers to start this off though. Onboarding newcomers is relatively easy — there’s plenty of mid-career candidates and you’ll find talent at many LUGs.
But yes, a lot of orgs won’t want to commit to this because they don’t want that engineering capability. But a few do - and having that capability really pays off in the ownership the team can take for the platform.
For the orgs that do invest in the engineering capability, the benefit isn’t just a well-running platform, it’s having access to a team of engineers who feel they can deal with anything the business throws at them. And really, creating that high-performing trusted team is the end-goal, it really pays off for all sorts of things. Especially when you start cross-pollinating your other teams.
This is definitely not for everyone though!
> Do I really need a separate solution for deployment, rolling updates, rollbacks, and scaling.
Yes it's called an ASG.
> Inevitably, you find a reason to expand to a second server.
ALB, target group, ASG, done.
> Who will know about those undocumented sysctl edits you made on the VM
You put all your modifications and CIS benchmark tweaks in a repo and build a new AMI off it every night. Patching is switching the AMI and triggering a rolling update.
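For the curious, the rolling-update half of that is a couple of AWS CLI calls; a sketch with placeholder names, assuming the ASG tracks the launch template's $Latest version:

    # point the launch template at the freshly baked AMI
    aws ec2 create-launch-template-version \
      --launch-template-name my-app-lt \
      --source-version '$Latest' \
      --launch-template-data '{"ImageId":"ami-0123456789abcdef0"}'

    # replace instances a few at a time while keeping the group healthy
    aws autoscaling start-instance-refresh \
      --auto-scaling-group-name my-app-asg \
      --preferences '{"MinHealthyPercentage": 90}'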
> The inscrutable iptables rules
These are security groups, lord have mercy on anyone who thinks k8s network policy is simple.
> One of your team members suggests connecting the servers with Tailscale: an overlay network with service discovery
Nobody does this, you're in AWS. If you use separate VPCs you can peer them but generally it's just editing some security groups and target groups. k8s is forced into needing to overlay on an already virtual network because they need to address pods rather than VMs, when VMs are your unit you're just doing basic networking.
You reach for k8s when you need control loops beyond what ASGs can provide. The magic of k8s is "continuous terraform," you will know when you need it and you likely never will. If your infra moves from one static config to another static config on deploy (by far the usual case) then no k8s is fine.
https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-s...
When you deploy on physical hardware, not VMs, or have to otherwise optimize maximum utilization out of gear you have.
Especially since sometimes cloud just means hemorrhaging money in comparison to something else, and ASGs in particular.
Plus you're competing with hypervisors for maxing out hardware which is rock solid stable.
EKS isn't any cheaper either from experience and in hindsight of course it isn't, it's backed by the same things you would deploy without EKS just with another layer. The dream of gains from "OS overhead" and efficient tight-packed pod scheduling doesn't match the reality that our VMs are right-sized for our workloads already and aren't sitting idle. You can't squeeze that much water from the stone even in theory and in practice k8s comes with its own overhead.
Folks that get tied up in the "complexity" argument are forever missing the point.
Experimenting with k8s was very much worthwhile. It's an amazing thing and was in many ways inspirational. But using it would have been swimming against the tide so to speak. So sure I built a mini-k8s-lite, it's better for us, it fits better than wrapping docker compose.
My only doubt is whether I should have used podman instead but at the time podman seemed to be in an odd place (3-4 years ago now). Though it'd be quite easy to switch now it hardly seems worthwhile.
Thank you for building "a kubernetes" for me so I don't have to muck with that nonsense, or have to hire people that do.
I don't know what that other guy is talking about.
The tone's vapidity is only comparable to the content's.
This reads like mocking the target audience rather than showing them how you can help.
A write up that took said "pile of shell scripts that do not work" and showed how to "make it work" with your technology of choice would have been more interesting than whatever this is.
I rewrote all the deployment scripts in bash (took less than an hour) and never had a problem since.
Moral: it's hard to find the right tool for the job.
Sans bad engineering practices, if you built a system that did the same things as kubernetes I would have no problem with it.
In reality I don’t want everybody to use k8s. I want people finding different solutions to solve similar problems. Homogenized ecosystems create walls that block progress.
One of the big things that is overlooked when people move to k8s, and why things get better when moving to k8s, is that k8s made a set of rules that forced service owners to fix all of their bad practices.
Most deployment systems would work fine if the same work to remove bad practices from their stack occurred.
K8s is the hot thing today, but mark my words, it will be replaced with something far more simple and much nicer to integrate with. And this will come from some engineer “creating a kubernetes”
Don’t even get me started on how crappy the culture of “you are doing something hard that I think is already a solved problem” is. This goes for compilers and databases too. None of these are hard, and neither is k8s, and any good engineer tasked with making one should be able to do so.
As it is today, the pattern is: spend a ton of money moving to k8s (mostly costly managed solutions), fixing all the bad engineering patterns along the way because k8s forces it. Then an engineer saves the company money by moving back to a more home-grown solution, one that fits the company's needs and saves money, something that would only be possible once the engineering practices were fixed.
K8s really kills the urge to say “oh well I guess we can just drop that file onto the server as part of startup rather than use a db/config system/etc.” No more “oh shit the VM died and we lost the file that was supposed to be static except for that thing John wrote to update it only if X happened, but now X happens every day and the file is gone”.. or worse: it’s in git but now you have 3 different versions that have all drifted due to the John code change.
K8s makes you use containers, which makes you not run things on your machine, which makes you better at CI, which.. (the list goes on, containers are industry standard for a lot of reasons). In general the 12 Factor App is a great set of ideas, and k8s lets you do them (this is not exclusive, though). Containers alone are a huge game changer compared to “cp a JAR to the server and restart it”
K8s makes it really really really easy to just split off that one weird cronjob part of the codebase that Mike needed and man, it would be really nice to just use the same code and dependencies rather than boilerplating a whole new app and deploy, CI, configs, and yamls to make that run. See points about containerization.
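Something like this is the whole split, reusing the same image (the names, schedule, and entrypoint below are made up):

    apiVersion: batch/v1
    kind: CronJob
    metadata:
      name: nightly-report            # hypothetical job name
    spec:
      schedule: "0 3 * * *"           # every night at 03:00
      jobTemplate:
        spec:
          template:
            spec:
              restartPolicy: OnFailure
              containers:
                - name: nightly-report
                  image: registry.example.com/my-app:1.0.0         # same image as the main app
                  command: ["python", "-m", "myapp.jobs.nightly"]  # hypothetical entrypoint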
K8s doesn’t assume that your business will always be a website/mobile app. See the whole “edge computing” trend.
I do want to stress that k8s is not the only thing in the world that can do these or promote good development practices, and I do think it’s overkill to say that it MAKES you do things well - a foolhardy person can mess any well-intentioned system up.
You should check out DBOS and see if it meets your middle ground requirements.
Works locally and in the cloud, has all the things you’d need to build a reliable and stateful application.
[0] https://dbos.dev
> ... build reliable AI agents with automatic retries and no limit on how long they can run for.

It's pretty easy to see how that could go badly wrong. ;) (And yeah, obviously "don't deploy that stuff" is the solution.)
---
That being said, is it all OSS? I can see some stuff here that seems to be, but it mostly seems to be the client side stuff?
> That being said, is it all OSS?
The Transact library is open source and always will be. That is what gets you the durability, statefulness, some observability, and local testing.
We also offer a hosted cloud product that adds in the reliability, scalability, more observability, and a time travel debugger.
And if somehow you get a bug in production, you have the time travel debugger to replay exactly what the state of the cloud was at the time.
Or you could launch an EC2 instance that is running ffmpeg and takes in videos and spits out frames, and then use DBOS to manage launching and closing down those instances as well as the workflows of getting the work done.
Every company I've ever had try to do this has ended in crying after some part of the system doesn't fit neatly into the serverless box and it becomes painful to extract from your system into "run FastAPI in containers."
That's my SRE recommendation: "These serverless platforms are a trap; it's quick to get going but you can quickly get locked into a bad place."
But yes, if you find there is something you can't do, you would have to build a container for it or deploy it to an instance however you want. Although I'd say that most likely we'd work with you to make whatever it is you want to do possible.
I'd also consider that an advantage. You aren't locked into the platform, you can expand it to do whatever you want. The whole point of serverless is to make most things easy, not all things. If you can get your POC working without doing anything, isn't that a great advantage to your business?
Let's be real, if you start with containers, it will be a lot harder to get started and then still hard to add whatever functionality you want. Containers doesn't really make anything easier, it just makes things more consistent.
“People will always defend complexity, stating that the only alternative is shell scripts”.
I saw people defending docker this way, ansible this way and most recently systemd this way.
Now we’re on to kubernetes.
Wait. Wouldn't that be a good idea?
To be fair, most people attacking systemd say they want to return to shell scripts.
It's conveniently ignored by systemd supporters, and the conversation always revolves around the fact that we used to use shell scripts, despite the fact that there are sensible inits that predate systemd that did not use shell languages.
We ended up migrating from Heroku to Kubernetes. I tried to take some of the learnings to build https://github.com/czhu12/canine
It basically wraps Kubernetes and tries to hide as much complexity from Kubernetes as possible, and only expose the good parts that will be enough for 95% of web application work loads.
The real difference in approaches is between short lived environments that you redeploy from scratch all the time and long lived environments we nurse back to health with runbooks.
You can use lambda, kube, etc. or chef, puppet etc. but you end up at this same crossroad.
Just starting a process and keeping it alive for a long time is easy to get started with but eventually you have to pay the runbook tax. Instead you could pay the kubernetes tax or the nomad tax at the start instead of the 12am ansible tax later.
Where's your Sysop?
- only some of the features
- none of the community
- all of the complexity but none of the upsides.
It was genuinely a bit shocking that it was considered a serious product seeing as how chaotic it was.
ECS merges "AWS config" and "app/deployment config" together, so it was difficult to separate what should go in TF from what is runtime app configuration. In comparison, this is basically trivial ootb with K8s.
I personally found a lot of the moving parts and names needlessly confusing. Tasks e.g. were not your equivalent to “Deployment”.
Want to just deploy something like Prometheus Agent? Well, too bad, the networking doesn’t work the same, so here’s some overly complicated guide where you have to deploy some extra stuff which will no doubt not work right the first dozen times you try. Admittedly, Prom can be a right pain to manage, but the fact that ECS makes you do _extra_ work on top of an already fiddly piece of software left a bad taste in my mouth.
I think ECS gets a lot of airtime because of Fargate, but you can use Fargate on K8s these days, or, if you can afford the small increase in initial setup complexity, you can just have Fargate's less-expensive, less-restrictive, better sibling: Karpenter on Spot instances.
Every time you have a cluster upgrade with K8s there’s a risk something breaks. For any product at scale, you’re likely to be using things like Istio and Metricbeat. You have a whole level of complexity in adding auth to your cluster on top of your existing SSO for the cloud provider. We’ve had to spend quite some time changing the plugin for AKS/EntraID recently which has also meant a change in workflow for users. Upgrading clusters can break things since plenty of stuff (less these days) lives in beta namespaces, and there’s no LTS.
Again, it’s less bad than it was, but many core things live(d) in plugins for clusters which have a risk of breaking when you upgrade cluster.
My view was that the initial startup cost for ECS is lower and once it’s done, that’s kind of it - it’s stable and doesn’t change. With K8s it’s much more a moving target, and it requires someone to actively be maintaining it, which takes time.
In a small team I don’t think that cost and complexity is worth it - there are so many more concepts that you have to learn even on top of the cloud specific ones. It requires a real level of expertise so if you try and adopt it without someone who’s already worked with it for some time you can end up in a real mess
Also fargate is very expensive and inflexible. If you fit the narrow particular use case it's quicker for bringing up workloads, but you pay extra for it.
Regardless of the outcome, it always felt more important to keep things simple and focus on product and business needs.
I've seen a lot of "piles of YAML", even contributed to some. There were some good projects that didn't end up in disaster, but to me the same could be said for the shell.
Before helm, just trying to run third party containers on bare metal resulted in constant downtime when the process would just hang for no reason, and an engineer would have to SSH in and manually restart the instance.
We used this at a previous startup to host metabase, sentry and airbyte seamlessly, on our own cluster, which let us break out of the constant price increases we faced for hosted versions of these products.
Shameless plug: I’ve been building https://github.com/czhu12/canine to try to make Kubernetes easier to use for solo developers. Would love any feedback from anyone looking to deploy something new to K8s!
They are serialized JSON objects; the YAML is there just because raw JSON is not user friendly when you need something done quick and dirty or want to include comments.
Proper templating should never use text templating on manifests.
An anecdotal datapoint: My standard lecture teaching developers how to interact with K8s takes almost precisely 30 minutes to have them writing Helm charts for themselves. I have given it a whole bunch of times and it seems to do the job.
And I can teach someone to write "hello world" in 10 languages in 30 minutes, but that doesn't mean they're qualified to develop or fix production software.
The docs for K8s are incredibly bad for solo devs or small teams, and introduce you to a lot of unnecessary complexity upfront that you just don't need: the docs seem to be written with megacorps in mind who have teams managing large infrastructure migrations with existing, complex needs. To get started on a new project with K8s, you just need a pretty simple set of YAML files:
1. An "ingress" YAML file that defines the ports you listen to for the outside world (typically port 80), and how you listen to them. Using Helm, the K8s package manager, you can install a simple default Nginx-based ingress with minimal config. You probably were going to put Nginx/Caddy/etc in front of your app anyway, so why not do it this way?
2. A "service" YAML file that allocates some internal port mapping used for your web application (i.e. what port do you listen on within the cluster's network, and what port should that map to for the container).
3. A "deployment" YAML file that sets up some number of containers inside your service.
And that's it. As necessary you can start opting into more features; for example, you can add health checks to your deployment file, so that K8s auto-restarts your containers when they die, and you can add deployment strategies there as well, such as rolling deployments and limits on how many new containers can be started before old ones are killed during the deploy. You can add resource requests and limits, e.g. make sure my app has at least 500MB RAM, and kill+restart it if it crosses 1GB. But it's actually really simple to get started!

I think it compares pretty well even to the modern Heroku-replacements like Fly.io... It's just that the docs are bad and the reputation is that it's complicated — and a large part of that reputation is from existing teams who try to do a large migration, and who have very complex needs that have evolved over time. K8s generally is flexible enough to support even those complex needs, but... it's gonna be complex if you have them. For new projects, it really isn't.

Part of the reason other platforms are viewed as simpler IMO is just that they lack so many features that teams with complex needs don't bother trying to migrate (and thus never complain about how complicated it is to do complicated things with it).
You can have Claude or ChatGPT walk you through a lot of this stuff though, and thereby get an easier introduction than having to pore through the pretty corporate official docs. And since K8s supports both YAML and JSON, in my opinion it's worth just generating JSON using whatever programming language you already use for your app; it'll help reduce some of the verbosity of YAML.
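To make point 1 concrete, once an nginx ingress controller is installed via Helm, the per-app piece is roughly this (hostname and service name are placeholders):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: my-app
    spec:
      ingressClassName: nginx
      rules:
        - host: app.example.com          # placeholder hostname
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: my-app         # the Service from point 2
                    port:
                      number: 80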
Sure, it can be easy, just pick one of the many cloud providers that fix all the complicated parts for you. Though, when you do that, expect to pay extra for the privilege, and maybe take a look at the much easier proprietary alternatives. In theory the entire thing is portable enough that you can just switch hosting providers, in practice you're never going to be able to do that without seriously rewriting part of your stack anyway.
The worst part is that the mountains of YAML were never supposed to be written by humans anyway, they're readable configuration your tooling is supposed to generate for you. You still need your bash scripts and your complicated deployment strategies, but rather than using them directly you're supposed to compile them into YAML first.
Kubernetes is nice and all but it's not worth the effort for the vast majority of websites and services. WordPress works just fine without automatic replication and end-to-end microservice TLS encryption.
Each app has its own template, e.g. nodejs-worker, and you don't change the template unless you really need to.
I spent ~2% of my manager + eng leader + hiring manager + god-knows-what-else-people-do-at-a-startup time on managing 100+ microservices, because they were templated.
The biggest breaking change to docker compose since it was introduced was that the docker-compose command stopped working and I had to switch to «docker compose» with a space. Had I stuck with docker and docker-compose I could have trivially kept everything up to date and running smoothly.
We've just never seen the need for Kubernetes. We're not against it as much as the need to replace our working setup just never arrived. We run EC2 instances with a setup shell script under 50loc. We autoscale up to 40-50 web servers at peak load of a little over 100k concurrent users.
Different strokes for different folks but moreso if it ain't broke, don't fix it
The author has some good points, but not every project needs multiple servers for the same reasons as a typical Kubernetes setup. In many scenarios those servers are dedicated to separate tasks.
For example, you can have a separate server for a redundant copy of your application layer, one server for load balancing and caching, one or more servers for the database, another for backups, and none of these servers requires anything more than separate Docker Compose configs for each server.
I'm not saying that Kubernetes is a bad idea, even for the hypothetical setup above, but you don't necessarily need advanced service discovery tools for every workload.
Shovels and mechanical excavators both exist and have a place on a building site. If you talk to a workman he may well tell you he has a regular hammer with him at all times but will use a sledgehammer and even rent a pile driver on occasion if the task demands it.
And yet somehow we as software engineers are supposed to restrict ourselves to The One True Tool[tm] (which varies based on time and fashion) and use it for everything. It's such an obviously dumb approach that even people who do basic manual labour realise its shortcomings. Sometimes they will use a forklift truck to move things, sometimes an HGV, sometimes they will put things in a wheelbarrow and sometimes they will carry them by hand. But us? No. Sophisticated engineers that we are, there is One Way, and it doesn't matter if you're a 3 person startup or you're Google, if you deploy once per year to a single big server or multiple times per day to a farm of thousands of hosts, you're supposed to do it that one way no matter what.
The real rule is this: Use your judgement.
You're supposed to be smart. You're supposed to be good. Be good. Figure out what's actually going on and how best to solve the problems in your situation. Don't rely on everyone else to tell you what to do or blindly apply "best practises" invented by someone who doesn't know a thing about what you're trying to do. Yes consider the experiences of others and learn from their mistakes where possible, but use your own goddamn brain and skill. That's why they pay you the big bucks.