Depending on my client's needs we do it oldschool and just rent a beefy server.
Using your brain to actually assess a situation and decide without any emotional or monetary attachment to a specific solution actually works like a charm most of the time.
I also had customers who run their own cloud based on k8s.
And I heard some people have customers that are on a public cloud ;-)
Choose the right solution for the problem at hand.
https://news.ycombinator.com/item?id=20371961
The stuff I posted about Kubernetes did not draw a conversation, but I was simply documenting what I was seeing: vast over-investment in devops even at tiny startups that were just getting going and could have easily dumped everything on a single server, exactly as we used to do things back in 2005.
It's not clear that docker-compose or even kubernetes* is that much more complicated if you are only running 3 things.
* if you are an experienced user
The complexity of PVCs in my experience isn't really that big compared to this, possibly lower, and I did stuff both ways.
Every generation has to make similar mistakes again and again.
I am sure if we had the opportunity and the hype was there we would've used k8s in 2005 as well.
The same thing is true for e.g. JavaScript on the frontend.
I am currently migrating a project from React to HTMX.
Suddenly there is no build step anymore.
Some people were like: "That's possible?"
Yes, yes it is and it turns out for that project it increases stability and makes everything less complex while adding the exact same business value.
Does that mean that React is always the wrong choice?
Well, yes, React sucks, but solutions like React? No! It depends on what you need, on the project!
Just as a carpenter doesn't use a hammer to saw, we as a profession should strive to use the right tool for the right job. (Albeit it's less clear than for the carpenter, granted)
Since HTMX was mentioned, Clace also makes it easy to build Hypermedia driven apps.
I do plan to add specs for other languages. New specs have to be added here https://github.com/claceio/appspecs. New specs can be created locally also in the config, see https://clace.io/docs/develop/#building-apps-from-spec
I think this is a gross misunderstanding of the complexity of tools available to carpenters. Use a saw. Sure, electric, hand powered? Bandsaw, chop saw, jigsaw, scrollsaw? What about using CAD to control the saw?
> Suddenly there is no build step anymore
How do you handle making sure the JS you write works on all the browsers you want to support? Likewise for CSS: do you use something like autoprefixer? Or do you just memorize all the vendor prefixes?
I don't use any prefixed CSS and haven't for many years.
Last time I did knowingly and voluntarily was about a decade ago.
The problem is that most devs don’t view themselves as carpenters. They view themselves as hammer carpenters or saw carpenters etc…
It’s not entirely their fault, some of the tools are so complex that you really need to devote most of your time to 1 of them.
I realize that this kind of tool specialization is sometimes required, but I think it’s overused by at the very least an order of magnitude.
The vast majority of companies that are running k8s, react, kafka etc… with a team of 40+, would be better off running rails (or similar) on heroku (or similar), or a VPS, or a couple servers in the basement. Most of these companies could easily replace their enormous teams of hammer carpenters and saw carpenters with 3-4 carpenters.
But devs have their own gravity. The more devs you have the faster you draw in new ones, so it’s unclear to me if a setup like the above is sustainable long term outside of very specific circumstances.
But if it were simpler there wouldn’t be nearly as many jobs, so I really shouldn’t complain. And it’s not like every other department isn’t also bloated.
This is certainly a small subset of what kubernetes offers, but I'm curious, what would be your goto-solution for those requirements?
All used multi-cloud and it was about 95% common code with the other 5% being driver style components for underlying storage, networking, IAM etc. Also using Kind/k3d for local development.
If you are using some other cloud provider or want uniformity, there's https://Talos.dev
Kubernetes is what has provided us the abstraction layer to do multicloud in our SaaS. Once you are outside the k8s control plane, it is wildly different, but inside is very consistent.
Effectively using Google and Azure managed K8s. (Full GKE > GKE Autopilot > Google Cloud Run). The same containers will run locally, in Azure, or AWS.
It's fantastic for projects big and small. The free monthly grant makes it perfect for weekend projects.
Apart from that requirement, all of this is very doable with EC2 instances behind an ALB, each running nginx as a reverse proxy to an application server with hot restarting (e.g. Puma) launched with a systemd unit.
Or, instead of reinventing the same wheels for Nth time, I could just use a set of abstractions that work for 99% of network services out there, on any cloud or bare metal. That set of abstractions is k8s.
So you need seamless deployments.
Most people on the "just use bash scripts and duct tape" side of things assume that you really don't need these features, that your customers are ok with downtime and generally that the project that you are working on is just your personal cat photo catalog anyway and doesn't need such features. So, stop pretending that you need anything at all and get a job at the local grocery store.
The bottom line is there are use cases, that involve real customers, with real money that do need to scale, do need uptime guarantees, do require diverse deployment environments, etc.
I guess most businesses are smaller than this, but at what size do you start to need reliability for your internal services?
Equating bash scripts and running servers to duct taping and poor engineering vs k8s yaml being "proper engineering" is, well, wrong.
About the 2 cloud providers bit. Is that a common thing? I get wanting to migrate away from one for another, but having a need for running on more than 1 cloud simultaneously just seems alien to me.
But the set seems somewhat arbitrary. Can you reduce it further? What if you don't require 2 cloud providers? What if you don't need zero-downtime?
Indeed given that you have 4 machines (2 instances, x 2 providers) could a human manage this? Is Kubernetes overkill?
I ask this merely to wonder. Naturally if you are rolling out hundreds of machines you should, and no doubt by then you have significant revenue (and are thus able to pay for dedicated staff), but where is the cross-over?
Because to be honest most startups don't have enough traction to need 2 servers, never mind 4, never mind 100.
I get the aspiration to be large. I get the need to spend that VC cash. But I wonder if Devops is often just premature and that focus would be better spent getting paying customers.
I think the "2 cloud providers" criterion is maybe negotiable. Also, maybe there was a misunderstanding: I didn't mean to say I want to run it on two cloud providers. But rather that I run it on one of them but I could easily migrate to the other one if necessary.
The zero-downtime one isn't. It's not necessarily so much about actually having zero-downtime. It's about that I don't want to think about it. Anything besides zero-downtime actually adds additional complexity to the development process. It has nothing to do with trying to be large actually.
EDIT: and worse, it could be something that just started and would even happen when trying to deploy the old version of the code. Imagine a database configuration change that allows the old connections to stay open until they are closed but prevents new connections from being created. In that case, even an automatic roll back to the previous code version would not resolve the downtime. This is not theory, I had those cases quite a few times in my career.
I'm sure there are many solutions but K8s gives us both fully declarative infrastructure configs and zero downtime deployment out of the box (well, assuming you set appropriate readiness probes etc)
So now I (a developer) don't have to worry about server restarts or anything for normal day to day work. We don't have a dedicated DevOps/platforms/SRE team or whatnot. Now if something needs attention, whatever it is, I put my k8s hat on and look at it. Previously it was like "hmm... how does this service deployment work again..?"
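To make the "appropriate readiness probes" part concrete, here is a minimal sketch of what such a declarative config can look like, built as a plain Python dict and printed as JSON. The service name, image, port and `/healthz` path are made-up placeholders, and the probe timings are illustrative, not recommendations. The two pieces that matter are `maxUnavailable: 0` (never take an old pod away before a replacement is ready) and the `readinessProbe` (which is how the cluster decides a pod is ready).

```python
import json


def rolling_deployment(name, image, port, replicas=2, health_path="/healthz"):
    """Build a Deployment manifest that keeps `replicas` ready pods during a rollout."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                # Start one extra pod first; never remove a ready pod early.
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                        # Traffic is only routed to the pod once this check passes.
                        "readinessProbe": {
                            "httpGet": {"path": health_path, "port": port},
                            "periodSeconds": 5,
                            "failureThreshold": 3,
                        },
                    }]
                },
            },
        },
    }


# Hypothetical service name and image, for illustration only.
manifest = rolling_deployment("web", "registry.example.com/web:1.2.3", 8080)
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output could be piped into `kubectl apply -f -`; in practice most people would keep the equivalent YAML in the repo instead.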
Tiny startups are rarely trying to build projects for small customer bases (eg little scaling required.) They’re trying to be the next unicorn. So they should probably make sure they can easily scale away from tossing everything on the same server
Having too many (or too big) customers to handle is a nice problem to have, and one you can generally solve when you get there. There are a handful of giant customers that would want you to be giant from day 1, but those customers are very difficult to land and probably not worth the effort.
Admittedly, if you don't know k8s, it might be a non-starter... but if you have some knowledge, k3s plus a cheap server is a wonderful combo
There’s shitloads of solutions.
It’s like minutes of clicking in a ui of any cloud provider to do any of that. So doing it multiple times is a non issue.
Or automate it with like 30 lines of bash. Or chef. Or puppet. Or salt. Or ansible. Or terraform. Or or or or or.
Kubernetes brings in a lot of nonsense that isn’t worth the tradeoff for most software.
If you feel it makes your life better, then great!
But there’s way simpler solutions that work for most things
Sorry, but I don't want to "click in a UI". And it is certainly not something you can just automate with 30 lines of bash. If you can, please elaborate.
Maybe not literally 30.. I didn't bother actually writing it. Also bash was just a single example. It's way less terraform code to do the same thing. You just need an ELB backed by an autoscaling group. That's not all that much to setup. That gets you the two loadbalanced servers and zero downtime deploys. When you want to deploy, you just create a new scaling group and launch configuration and attach to the ELB and ramp down the old one.. Easy peasy. For the secrets, you need at least KMS and maybe secret manager if you're feeling fancy.. That's not much to setup. I know for sure AWS and azure provide nice CLIs that would let you do this in not that many commands. or just use terraform
Personally if I really cared about multi cloud support, I'd go terraform (or whatever it's called now).
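For what it's worth, the order of operations in the "new scaling group, attach, ramp down the old one" deploy can be sketched without touching any cloud API. This is a toy simulation only: `LoadBalancer`, the group names and the health check callback are hypothetical stand-ins, not AWS calls, and a real setup would also need connection draining and timeouts.

```python
class LoadBalancer:
    """Toy stand-in for an ELB: tracks which instance groups receive traffic."""

    def __init__(self):
        self.groups = {}

    def attach(self, name, instances):
        self.groups[name] = list(instances)

    def detach(self, name):
        self.groups.pop(name, None)

    def in_service(self):
        return sorted(i for group in self.groups.values() for i in group)


def blue_green_deploy(lb, old_group, new_group, new_instances, is_healthy):
    """Move traffic to new_group only if every new instance passes its health check."""
    if not all(is_healthy(i) for i in new_instances):
        return False  # the old group keeps serving, nothing has changed
    lb.attach(new_group, new_instances)  # both groups serve for a moment
    lb.detach(old_group)                 # then the old group is ramped down
    return True


lb = LoadBalancer()
lb.attach("blue", ["blue-1", "blue-2"])
ok = blue_green_deploy(lb, "blue", "green", ["green-1", "green-2"], lambda i: True)
print(ok, lb.in_service())  # True ['green-1', 'green-2']
```

The point of the ordering is that there is never a moment with zero instances behind the load balancer, and a failed health check leaves the old version untouched.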
Sure, and then you can neither 1.) test your setup locally nor 2.) easily move to another cloud provider. So that doesn't really fit what I asked.
If the answer is "there is nothing, just accept the vendor lock-in" then fine, but please don't reply with "30 lines of bash" and make me have expectations. :-(
Our small company uses this setup. We migrated from GCP to AWS when our free GCP credits from YC ran out and then we used our free AWS credits. That migration took me about a day of rejiggering scripts and another of stumbling around in the horrible AWS UI and API. Still seems far, far easier than paying the kubernetes tax.
You are not wrong, but that only covers a part of what I was asking. How about the rest? How do you actually bring your services to production? I'm curious.
And, PS, I don't use k8s. Just saying.
Migrating to another cloud should be quite easy. There are many PaaS solutions. The hard parts will be things like migrating the data, making sure there's no downtime AND no drift/diff in the underlying data when some clients write to Cloud-A and some write to Cloud-B, etc. But k8s does not fix these problems, so..
How have you been going since 2005 and still not understand the economics of software?
Webapps might make it hard to tell, but a modern computer (or even an old computer like mine) is mindbogglingly fast.
For example B2B businesses where you have very few but extremely high value customers for specialized use cases.
Another one is building bully hardware. Your software infrastructure does not need to grow any faster than your shop floor is building it.
Whether you want to call that a "startup" is up for debate (and mostly semantics if you ask me) but at one point they were all a zero employee company and needed to survive their first 5 years.
In general you won't find their products on the app store.
Negligible for me personally, it's much less than either our EC2 or RDS costs.
A job ago we had our own k8s cluster in our own DC, and it required a couple of teams to keep running and reasonably integrated with everything else in the rest of the company. It was probably cheaper overall than cloud given the compute capacity we had, but also probably not by much given the amount of people dedicated to it.
Even my 3-node k3s at home requires more attention than what you described.
The amount of work/cost of using k8s for handling them in comparison to doing it "old style" is probably negative by now.
All it lets you do is put shell commands into a text file and be able to run it self-contained anywhere. What is there to hate?
You still use the same local filesystem, the same host networking, still rsync your data dir, still use the same external MySQL server even if you want -- nothing has changed.
You do NOT need a load balancer, a control plane, networked storage, Kubernetes or any of that. You ADD ON those things when you want them like you add on optional heated seats to your car.
Anybody who doesn't have the money, time or engineering resources will jump on whatever appears to be a decent alternative.
My intuition is that the alternative already exists but I can't see it...
A bit like Spring emerged as an alternative to J2EE or what HTMX is to React & co.
Is it k3s or something more radical?
Is it on a chinese Github?
If you end up with exotic networking or file system mounts you can just be stuck maintaining k8s forever, and some updates aren’t so stable, so you have to be more vigilant than with Windows updates.
You can have Postgres lock in as much as Wordpress has MySql lock in.
I agree that you have less Linux lock in but Docker still requires a Linux kernel everywhere it goes. BSD need not apply.
Your friend lives 1/8 miles away. You go to see them every day so why wouldn't you drive? Well, cars are expensive and you should avoid them if you don't need them. There are a TON of downsides to driving a car 1/4 of a mile every day. And there are a TON of benefits to using a car to drive 25 miles every day.
I hate to quash a good debate but this all falls under the predictable but often forgotten "it depends". Usually do you need kubernetes == do you have a lot of shit to run.
If you're used to managing platforms e.g. networking, load balancers, security etc. then it's intuitive and easy.
If you're used to everything being managed for you then it will feel steep.
If a team of 5-10 SWEs have to do all of that while only graded on feature releases, k8s would massively suck.
I also agree that experienced platform/infra engineers tend to whine less about k8s.
If you're entering into k8s land with someone else's very complicated mess across hundreds of files, you're going to be in for a bad time.
A big problem, I feel, is that if you don't have an expert design the k8s system from the start, it's just going to be a horrible time; and, many people, when they're asked to set up a k8s setup for their startup or whatever, aren't already experts, so the thing that's produced is not maintainable.
And then everyone is cursed.
At least Kubernetes is all YAML, consistent and can be tested locally.
However...
Talking with people who started using kubernetes later than me[1], it seems like a lot of confusion starts by trying to start with a somewhat complete example, like using a Deployment + Ingress + Services to deploy, well, a typical web application. The stuff that would be trivial to run in a typical PaaS.
The problem is that then you do not know what a lot of those magic incantations mean, and the actually very, very simple mechanisms of how things work in k8s are lost, and you can't find your way in a running cluster.
[1] I started learning around 1.0, went with dev deployment with 1.3, graduated it to prod with 1.4. Smooth sailing since[2]
[2] The worst issues since involved dealing with what was actually a global GCP networking outage that we were extra sensitive to due to extensive DNS use in kubernetes, and once naively assuming that the people before me set sensible sizes for various nodes, only to find a combination of too-small-to-live EC2 instances choking till control plane death, and an outdated etcd (because the rest of the company was too conservative in updating) hitting a rare but possible bug that corrupted data, triggered by the flapping caused by the too-small instances. I count neither as a k8s issue; both would have killed anything else I could set up given the same constraints.
You can cobble together your own unique special combination of services to run apps on! It's an open ended adventure into itself!
I'm all for folks doing less, if it makes sense! But there's basically nothing except slapping together the bits yourself & convincing yourself your unique home-made system is fine. You'll be going it alone, & figuring out on the fly, all to save yourself from getting good at the one effort that has a broad community, practitioners, and massive extensibility via CRD & operators.
I personally don’t want the federal government being able to peek into my files and data at any time, even though I’ve done nothing wrong. It’s the innocent people who have most to lose from government intrusion.
It seems insane to me to just throw up one’s hands and store private data in Google or Amazon clouds, especially when not doing so is so much cheaper.
Dear friend, you have built a Kubernetes
>> Kubernetes is feature-rich, yet these “enterprise” capabilities turned even simple tasks into protracted processes.
I don't agree. After learning the basics I would never go back. It doesn't turn simple tasks into a long process. It massively simplifies a ton of functionality. And you really only need to learn 4 or 5 new concepts to get it off the ground.
If you have a simple website you don't need Kubernetes, but 99% of devs are working in medium sized shops where they have multiple teams working across multiple functionalities and Kubernetes helps this out.
Karpenter is not hard to set up at all. It solves the problem about over-provisioning out of the box and has for almost 5 years.
It's like writing an article: "I didn't need redis, and you probably don't either" and then talking about how Redis isn't good for relational data.
Google CloudRun, Database, PubSub, Cloud Storage, VPC, IAM, Artifact Registry etc lock-in: good.
This is not always true.
Kubernetes is a tool for creating computer clusters. Hence the name "Borg" (Kubernetes's grandpa) referring to assimilating heterogeneous hardware into a collective entity. Containers are an implementation detail.
Do you need a computer cluster? If so k8s is pretty great. If you don't care about redundancy and can get all the compute you need out of a single machine, then you may not need a cluster.
Once you're using containers on a bunch of VMs in different geographical regions, then you effectively have hacked together a virtual cluster. You can get by without k8s. You just have to write a lot of glue code to manage VMs, networking, load balancing, etc on the cloud provider you use. The overhead of that is probably larger than just learning Kubernetes in the long run, but it's reasonable to take on that technical debt if you're just trying to move fast and aren't concerned about the long run.
You're already building on a cluster, your cloud provider's hypervisor. They'll literally build virtual compute of any size and shape for you on demand out of heterogeneous hardware and the security guarantees are much stronger than colocated containers on k8s nodes.
There are quite a few steps between single server and k8s.
The cloud extensions were always just a convenience.
But the idea of Borg is that all of that's abstracted away for the typical developer. It's the same with k8s. The infrastructure team in your org needs to understand the implementation details, but really only a few.
You can also configure load balancers, IAM groups & policy etc as k8s CRDs. So all that stuff can be in one place in the code base alongside the rest of your infrastructure. So in that sense it does abstract those concepts. You still need to know something about them, but you don't have to configure them programmatically yourself since k8s will do that.
Less overhead than writing your own glue code, and less overhead than learning Kubernetes, is to just use a PaaS like Google App Engine, Amazon Elastic Beanstalk, Digital Ocean App Platform, or Heroku. You have access to the same distributed databases you would with k8s.
Cloud Run is PaaS for people that like Docker. If you don't even want to climb that learning curve, try one of the others.
This is the right way for web most of the time, but most places will choose k8s anyway. It’s perplexing until you come to terms with the dirty secret of resume driven development, which is that it’s not just junior engs but lots of seniors too and some management that’s all conspiring to basically defraud business owners. I think the unspoken agreement is that Hard work sucks, but easy work that helps you learn no transferable skills might be worse. The way you evaluate this tradeoff predictably depends how close you are to retirement age. Still, since engineers are often disrespected/discarded by business owners and have no job security, oaths of office, professional guilds, or fiduciary responsibility.. it’s no wonder things are pretty mercenary out there.
Pipelines are as important as web these days but of course there are many options for pipelines as a service also.
K8s is the obviously correct choice for teams that really must build new kinds of platforms that have many diverse kinds of components, or have lots of components with unique requirements for coupling (like say “scale this thing based on that other thing”, but where you’d have real perf penalties for leaving the k8s ecosystem to parse events or whatever).
The often mentioned concern about platform lock in is going to happen to you no matter what, and switching clouds completely rarely happens anyway. If you do switch, it will be hard and time consuming no matter what.
To be fair, k8s also enables brand new architectural possibilities that may or may not be beautiful. But it’s engineering, not art, and beautiful is not the same as cheap, easy, maintainable, etc.
Seems pretty active per its commit activity: https://github.com/hashicorp/nomad/graphs/commit-activity
But the fact that I hadn't heard of it before makes it sound not very popular, at least not for the bubble I live in :).
Does anyone have any practical experiences to share about it?
You'll certainly want to combine it with Consul and use Consul templates and service discovery though.
I'd say the difficulty and complexity level is between Kubernetes and Docker Swarm, not having to use YML too is a big benefit imho.
If that's popularity that you need, then sure, nobody ever got fired for choosing kubernetes.
K8s tries to abstract away individual "servers" and gives you an API to interact with all the compute/storage in the cluster.
They really aren't.
Personally I have a big Nix derivation to deploy my (heterogeneous) cluster to bare metal.
None of the k8s concepts or ideas apply here.
GCR was simple to run simple workloads, but, an out of the box Postgres database can't just handle unlimited connections and so connecting to it from GCR without having a DB connection proxy like PG bouncer risks exhausting the connection pool. For a traditional web app at any moderate scale, you typically need some fine grained control over per process, per server and per DB connection pools, which you'd lose with GCR.
Also, to take advantage of GCR's fine grained CPU pricing, you'd have to have an app that boots extremely quickly, so it can be turned off during periods of inactivity, and rescheduled when a request comes in.
Most of our heaviest workloads run on Kubernetes for those reasons.
The other thing that's changed since this author probably explored Kubernetes is that there are a ton of providers now that offer a Kubernetes control plane for no cost. The ones that I know of are Digital Ocean and Linode, where the pricing for a Kubernetes cluster is the same as their droplet pricing for the same resources. That didn't use to be the case. [1] The cheapest you can get is a $12 / month, fully featured cluster on Linode.
I've been building, in my spare time, a platform that tries to make Kubernetes more usable for single developers: https://canine.sh, based on my learnings that the good parts of Kubernetes are actually quite simple to use and manage.
[1] Digital Ocean's pricing page references its free control plane offering: https://www.digitalocean.com/pricing
4x(Web processes) -> 1x(pgbouncer) -> database
This ensures that the pgbouncer instance is effectively multiplexing all the connections across your whole fleet.
In each individual web process, you can have another shared connection pool.
This is how we set it up
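A rough way to sanity check that layout is to count connections at each hop. The sketch below is only back-of-the-envelope arithmetic: the 4 web processes come from the diagram above, every other number is an invented example, and it assumes a single database/user pair so that one pgbouncer pool covers everything.

```python
from dataclasses import dataclass


@dataclass
class PoolLayout:
    web_processes: int       # app processes, each with its own in-process pool
    pool_per_process: int    # max connections one process may open to pgbouncer
    bouncer_pool_size: int   # max server connections pgbouncer opens to Postgres
    db_max_connections: int  # Postgres max_connections

    def client_connections(self) -> int:
        """Worst-case connections arriving at pgbouncer from the whole fleet."""
        return self.web_processes * self.pool_per_process

    def server_connections(self) -> int:
        """Worst-case connections pgbouncer holds open to the database."""
        return min(self.client_connections(), self.bouncer_pool_size)

    def fits(self) -> bool:
        return self.server_connections() <= self.db_max_connections


# Example numbers only; 4 web processes matches the diagram above.
layout = PoolLayout(web_processes=4, pool_per_process=25,
                    bouncer_pool_size=20, db_max_connections=100)
print(layout.client_connections(), layout.server_connections(), layout.fits())
# 100 20 True
```

The database only ever sees the bouncer's pool, no matter how many web processes are added in front of it, which is the whole point of putting pgbouncer in the middle.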
Good point. How many connections can it handle? Seems like it's up to 262142 in theory? Or am I reading this wrong: https://cloud.google.com/sql/docs/postgres/flags#postgres-m ??
But even 1000 seems ok? 1 per container, so 1000 running containers? Quite a lot in my world, especially since they can be quite beefy. Would be very worried about the cost way before 1000 simultaneously running containers :)
That's your problem right there. You really don't want to be setting up and managing a cluster from scratch for anything less than a datacenter-scale operation. If you are already on a cloud provider just use their managed Kubernetes offering instead. It will come with a free control plane and abstract away most of the painful parts for you (like etcd, networking, load balancing, ACLs, node provisioning, kubelets, proxies). That way you just bring your own nodes/VMs and can still enjoy the deployment standardization and other powerful features without the operational burden.
We are now on-prem using “pet” clusters with namespace as a service automated on it. This causes all kinds of issues with different workloads with different performance characteristics and requirements. They also share ingress and egress nodes so impact on those has a large blast radius. This leads to more rules and requirements.
Having dedicated and managed clusters where everyone can determine their sizing and granularity of workloads to deploy to which cluster is paradise compared to that.
Most of these issues can be fixed by setting resource requests equal to limits and using integer CPU values to guarantee QoS. You should also have an interface with developers explaining which nodes in your datacenter have which characteristics, using node labels and taints, and force developers to pick specific node groups as such by specifying node affinity and tolerations, by not bringing online nodes without taints.
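As a rough illustration of the "requests equal to limits" rule, here is a simplified sketch of how a pod's QoS class falls out of its resource settings. It is not the kubelet's actual code: it normalizes CPU quantities but compares memory quantities as plain strings, and it ignores init containers. As far as I know, the integer CPU part matters when the kubelet runs the static CPU manager policy, which can then pin such containers to whole cores.

```python
def cpu_millis(quantity):
    """Convert a CPU quantity such as "2", "0.5" or "500m" to millicores."""
    text = str(quantity)
    if text.endswith("m"):
        return int(text[:-1])
    return round(float(text) * 1000)


def qos_class(containers):
    """Return "Guaranteed", "Burstable" or "BestEffort" for a list of container resource specs."""
    any_set = False
    all_equal = True
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        any_set = any_set or bool(requests) or bool(limits)
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            request = requests.get(resource, limit)  # a missing request defaults to the limit
            if limit is None:
                all_equal = False
            elif resource == "cpu":
                all_equal = all_equal and cpu_millis(request) == cpu_millis(limit)
            else:
                all_equal = all_equal and request == limit
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_equal else "Burstable"


def has_integer_cpu(containers):
    """True if every container sets a whole-number CPU limit."""
    cpus = [c.get("limits", {}).get("cpu") for c in containers]
    return all(cpu is not None and cpu_millis(cpu) % 1000 == 0 for cpu in cpus)


pinned = [{"requests": {"cpu": "2", "memory": "4Gi"}, "limits": {"cpu": "2", "memory": "4Gi"}}]
bursty = [{"requests": {"cpu": "500m", "memory": "1Gi"}, "limits": {"cpu": "2", "memory": "1Gi"}}]
print(qos_class(pinned), has_integer_cpu(pinned))  # Guaranteed True
print(qos_class(bursty))                           # Burstable
print(qos_class([{}]))                             # BestEffort
```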
> They also share ingress and egress nodes so impact on those has a large blast radius.
This is true regardless of whether or not you use Kubernetes.
When running on “bare” VMs each VM is its own member in the network. The pods in the cluster use an overlay network and egress is limited to egress nodes which are now shared by all workloads.
Having dedicated K8s clusters would reduce the sharing of network ingress and egress, as well as let me choose the VM size for my workloads.
If you are serious about minimizing ops work, you can make sure people are deploying things in very simple ways, and in that world you are looking at _very easy_ deployment strategies relative to having to wire up VMs over and over again.
Just feels like lots of devs will take whatever random configs they find online and throw them over the fence, so now you just have a big tangled mess for your CRUD app.
Well it usually isn't a mystery. Requiring a developer team to learn k8s likely with no resources, time, or help is not a recipe for success. You might have minimised someone else's ops work, but at what cost?
There's a lot of nuance here. I think ops teams are comfortable with what I consider "config spaghetti". Some companies are incentivised to ship stuff that's hard to configure manually. And a lot of other dynamics are involved.
But at the end of the day if a dev copy-pastes some config into a file, taking a quick look over and asking yourself "how much of this can I actually remove?" is a valuable skill.
Really you want the ops team to be absorbing this as well, but this is where constant atomization of teams makes things worse! Extra coordination costs + a loss of a holistic view of the system means that the iteration cycles become too high.
But there are plenty of things where (especially if you are the one integrating something!) you should be able to look over a thing and see, like, an if statement that will always be false for your case and just remove it. So many modern ops tools are garbage and don't accept the idea of running something on your machine, but an if statement is an if statement is an if statement.
Agree.
To reduce the chance a dev pulls some random configs out of nowhere, we maintain a Helm template that can be used to deploy almost all of our services in a sane way, just replace the container image and ports. The deployment is probably not optimal, but further tuning can be done after the service is up and we have gathered enough metrics.
We've also put all our configs in one place, since we found that devs tend to copy from existing configs in the repo before searching the internet.
Back in the old days before cloud providers this was the only option. I started my career in early 2010s and got the tailend of this, it was not fun.
I remember my IT department refusing to set up git for us (we were using SVN before) so we just asked for a VM and set up a git repo in there ourselves to host our code.
- set up new VMs
- deploy software on new VMs
- have the team responsible give their ok
It takes forever, and in my experience, often never completes because some snowflake exists somewhere, or something needs a lib that doesn't exist on the new OS. VMs decouple the OS from the hardware, but you should still decouple the service from the OS. So that means containers. But then managing hundreds of containers still sucks.
With container management, I just
- add x new nodes to cluster
- drain x old nodes and delete them
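The "drain old nodes" half of that can be sketched as a list of kubectl commands. This only builds and prints the commands rather than running them; the node names are placeholders, and adding the new nodes is left out because it depends entirely on the provider. Cordoning every old node up front is a design choice here, so that evicted pods don't get rescheduled onto another node that is about to go away.

```python
import shlex


def rotation_commands(old_nodes):
    """Return the kubectl commands for retiring `old_nodes`, in order."""
    # Mark every old node unschedulable before evicting anything.
    commands = [["kubectl", "cordon", node] for node in old_nodes]
    for node in old_nodes:
        # Evict pods, leaving DaemonSet pods alone and accepting loss of emptyDir data.
        commands.append(["kubectl", "drain", node,
                         "--ignore-daemonsets", "--delete-emptydir-data"])
        commands.append(["kubectl", "delete", "node", node])
    return commands


# Placeholder node names, for illustration only.
commands = rotation_commands(["node-old-1", "node-old-2"])
for cmd in commands:
    print(shlex.join(cmd))
```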
Azure has a free tier with control plane completely free (but no SLA) - great deal for test clusters and testing infra.
If you are that worried about costs, then public cloud may not be for you at all, or you should look at ECS/App containers or serverless.
And even $70 is cheap, considering that a cluster should be shared by all the services from all the teams in the same environment, bar very few exceptions.
Once you're used to it, the high-level abstractions of k8s are wonderful. I run k3s on Raspberry Pis because it takes care of all sorts of stuff for you, and it's easy to port code and design patterns from the big backend service to a little home project.
If you need dedicated people just to stay on top of running your services, you have a problem that's costing you hundreds of thousands per year. There's a lot of fun and easy stuff you can do with that kind of money. This is a pattern I see with a lot of teams that get sucked into using Kubernetes, micro services, terraform, etc. Once you need a few people just to stay on top of the complexity that comes from that, you are already spending a lot. I tend to keep things simple on my own projects because any amount of time I spend on that, I'm not spending on more valuable work like adding features, fixing bugs, etc.
Of course it's not black and white and there's always a trade-off between over- and under-engineering. But a lot of teams default to over-engineering simply by using Kubernetes from day one. You don't actually need to. There's nothing wrong with a monolith running on two simple VMs with a load balancer in front of it. Worked fine twenty years ago and it is still perfectly valid. And it's dead easy to set up and manage in most popular cloud environments. If you use some kind of scaling group, it will scale just fine.
Not really, the cost of an empty EKS cluster is the management fee of $0.1/hour, or roughly the price of a small EC2 instance.
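To put a monthly number on that, here is the back-of-the-envelope arithmetic. The $0.10/hour figure is just the one quoted above; current pricing may differ, so treat it as an assumption.

```python
HOURLY_FEE_USD = 0.10  # management fee as quoted in the comment; may be out of date


def monthly_fee(hourly_fee: float, hours: float) -> float:
    """Cost of running the control plane for the given number of hours."""
    return round(hourly_fee * hours, 2)


if __name__ == "__main__":
    print(monthly_fee(HOURLY_FEE_USD, 24 * 30))  # a 30-day month
    print(monthly_fee(HOURLY_FEE_USD, 730))      # an average month (8760 h / 12)
```

That lands at roughly $72 to $73 a month before any worker nodes are added.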
That's about 2x our monthly cloud expenses. That's not a small VM. You can buy a mac mini for that.
Though if you are only spending $350 monthly on VM, Database and Load Balancer, you can probably count resource instances by hand, and don't need a K8S cluster yet.
Like most serverless solutions, it does not permit you to control egress traffic. There are no firewall controls exposed to you, so you can't configure something along the lines of "I know my service needs to connect to a database, that's permitted, all other egress attempts are forbidden", which is a foundational component of security architecture that understands that getting attacked is a matter of time and security is something you build in layers. EDIT: apparently I'm wrong on Cloud Run not being deployable within a VPC! See below.
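The "database yes, everything else no" rule is just a default-deny allowlist. A minimal sketch of the logic, in plain Python rather than any real firewall or cloud API; the subnet and port are hypothetical:

```python
import ipaddress
from typing import List, Tuple

# Hypothetical allowlist: only the database subnet, only the PostgreSQL port.
ALLOWED_EGRESS: List[Tuple[ipaddress.IPv4Network, int]] = [
    (ipaddress.IPv4Network("10.0.5.0/24"), 5432),
]


def egress_allowed(dest_ip: str, dest_port: int) -> bool:
    """Default deny: a connection passes only if it matches an allowlist entry."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net and dest_port == port for net, port in ALLOWED_EGRESS)


if __name__ == "__main__":
    print(egress_allowed("10.0.5.17", 5432))  # database: permitted
    print(egress_allowed("8.8.8.8", 443))     # anything else: denied
```

In practice this lives in VPC firewall rules or similar, not in application code; the point is only that the policy is an explicit list plus a deny-by-default fallback.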
GCP and other cloud providers have plenty of storage products that only work inside a VPC. Cloud SQL. Memorystore. MongoDB Atlas (excluding the expensive and unscalable serverless option). Your engineers are probably going to want to use one or some of them.
Eventually you will need a VPC. You will need to deploy compute inside the VPC. Managed Kubernetes solutions make that much easier. But 90% of startups fail, so 95% of startups will fail before they get to this point. YMMV.
With that said... there are so many limitations on that list, that seriously, I can't imagine it would really be so much easier than Kubernetes.
I have an infrastructure layer that I apply to all clusters that includes things like cert-manager, an ingress controller and associated secrets. This is all cluster-independent stuff. Then some cluster-dependent stuff like storage controllers etc. I use flux to keep this stuff under version control and automatically reconciled.
From there you just deploy your app with standard manifests or however you want to do it (helm, kubectl, flux, whatever).
It all works wonderfully. The one downside is all the various controllers do eat up a fair amount of CPU cycles and memory. But it's not too bad.
Runs pretty much this stack:
"Infrastructure":
- NixOS with ZFS-on-Linux as 2 mirrors on the NVMes
- k3s (k8s 1.31)
- openebs-zfs provisioner (2 storage classes, one normal and one optimized for postgres)
- cnpg (cloud native postgres) operator for handling databases
- k3s' built-in traefik for ingress
- tailscale operator for remote access to cluster control plane and traefik dashboard
- External DNS controller to automate DNS
- cert-manager to handle Let's Encrypt
- Grafana cloud stack for monitoring. (metrics, logs, tracing)
Deployed stuff:
- Essentially 4 tenants right now
- 2x Keycloak + Postgres (2 diff. tenants)
- 2x headscale instances with postgres (2 diff. tenants, connected to keycloak for SSO)
- 1 Gitea with Postgres and memcached (for 1 tenant)
- 3 postfix instances providing simple email forwarding to sendgrid (3 diff. tenants)
- 2x dashy as homepage behind SSO for end users (2 tenants)
- 1x Zitadel with Postgres (1 tenant, going to migrate keycloaks to it as shared service)
- Youtrack server (1 tenant)
- Nextcloud with postgres and redis (1 tenant)
- tailscale-based proxy to bridge gitea and some machines that have issues getting through broken networks
Plus a few random things that are musings on future deployments for now. The server is barely loaded and I can easily clone services around (in fact a lot of the services above are instantiated from jsonnet templates).
Deploying some stuff was more annoying than doing it by hand from shell (specifically nextcloud) but now I have replicable setup, for example if I decide to move from host to host.
Biggest downtime ever was dealing with not well documented systemd-boot behaviour which caused the server to revert to older configuration and not apply newer ones.
I have set up about a dozen rack mount servers in my life, installing basically every flavor of Unix and Linux and message busses under the sun in the process, but I still get confused by all the kubectl commands and GCP integration with it.
I might just be stupid, but it feels like all I ever do with Kubernetes is update and break YAML files, and then spend a day fixing them by copy-pasting increasingly-convoluted things on stackexchange. I cannot imagine how anyone goes to work and actually enjoys working in Kubernetes, though I guess someone must in terms of “law of large numbers”.
If I ever start a company, I am going to work my damndest to avoid “cloud integration crap” as much as possible. Just have a VM or a physical server and let me install everything myself. If I get to tens of millions of users, maybe I’ll worry about it then.
The only form of Kubernetes I would be willing to try is the one with kata-containers, for all the security of virtual machines.
WebRTC/TURN/STUN becomes an issue with the nginx config. May consider looking at pingora. The whole rust -> binary + toml file is super nice to run from a system admin perspective.
... until you get hit by a DDoS attack. Not much you can do about it unless your ISP offers protection, or you end up going for Cloudflare or the like instead of exposing your IP and ports.
Set your TTL to a low number, and you can swap whenever you feel like it.
I've managed a few thousand VMs in the past, and I'm extremely grateful for it. An image is built in CI, service declares what it needs, the scheduler just handles shit. I'm paged significantly less and things are predictable and consistent unlike the world of VMs where even your best attempt at configuration management would result in drift, because the CM system is only enforcing a subset of everything that could go wrong.
But yes, Kubernetes is configured in YAML, and YAML kind of sucks, but you rarely do that. The thing that changes is your code, and once you've got the boilerplate down CI does the rest.
The python ecosystem is a cancer.
Literally.
I'm sorry, citation needed on that. I spend a lot of time working with the damn YAML files. It's not a one-off thing for me.
You're not the first person to say this to me, they say "you rarely touch the YAML!!!", but then I look at their last six PRs, and each one had at least a small change to the YAML setup. I don't think you or they are lying, I think people forget how often you actually have to futz with it.
Maybe it's because I adopted early and have grown with the technology it all just makes sense? It's not that complicated if you limit yourself to the core stuff. Maybe I need to write a book like "Kubernetes for Greybeards" or something like that.
What does fucking kill me in the Kubernetes ecosystem is the amount of add-on crap that is pitched as "necessary". Sidecars... so many sidecars. Please stop. There's way too much vendor garbage surrounding the ecosystem and devs rarely stop to think about whether they should deploy something when it's as easy as dropping in some YAML and letting the cluster magically run it.
I think this is where the big difference is. If you're leading a team and introduced all good practices from the start, then the k8s and Terraform or whatever config files can never get so complicated that a Gordian knot is created.
Perhaps k8s is nice and easy to use - many of the commands certainly are, in my experience.
Developers have, over years and decades, learned how to navigate code and hop from definition to definition, climbing the tree and learning the language they're operating in, and most of the languages follow similar-enough patterns that they can crawl around.
Configuring a k8s cluster has absolutely none of that knowledge built up; and, reading something that has rough practices is not a good place to learn what it should look like.
This also happens with configuration based packaging setups. Python hatch in particular, but sometimes node/webpack/rollup/vite.
If I would only have a penny for each time I wasted hours trying to figure out what something in "modern IT" is, just to figure out that I already knew what it is, but it was well hidden under layers of newspeak...
The CNCF ecosystem looked a lot different back then.
Then you can hit other resources (in my case working with a team who've been using K8S for a few years).
If you (or anyone else) have suggestions for something newer and covering more than just the core (like various different components you can use, Helm, Argo, Istio etc etc) then I'd appreciate it :-)
(I didn't have access to my email or Amazon account let alone my office when I posted so couldn't check the name of the book).
(edit: found the 3rd edition)
To your point, and I have not used k8s I just started to research it when my former company was thinking about shoehorning cassandra into k8s...
But there was dogma around not allowing access to VM command exec via kubectl, while I basically needed it in the basic mode for certain one-off diagnosis needs and nodetool stuff...
And yes, some of the floated stuff was "use sidecars" which also seemed to architect complexity for dogma's sake.
We do, but not of the SQL variety (that I am aware of). We have persistent key-value and document store databases hosted in these clusters. SQL databases are off-loaded to managed offerings in the cloud. Admittedly, this does simplify a lot of problems for us.
> exec
kubectl exec is good, and it's possible to audit access (ie. get kubectl exec events with arguments logged)
and I guess an admission webhook can filter the allowed commands
but IMHO it shouldn't be necessary, the bastion host where the "kubectl exec" is run from should be accessible only through an SSH session recorder
Admittedly: The ecosystem is huge and less is more in most cases, but the foundation is cohesive and sane.
Would love to read the k8s for greybeards book.
Or maybe Kubernetes looks like a committee-designed, everything-and-the-kitchen-sink, over-engineered, second-system-effect-suffering, YAGNI P.O.S. that only the kind of "enterprise" mindset that really enjoyed J2EE in 2004 and XML/SOAP vs JSON/REST would love...
Would pay for a decent remote live course intro.
Yeah it is the same with terraform modules. I was trying to argue at a previous job that we should stick to a single module (the cloud provider module) but people just love adding crap if it saves them 5 lines of configuration. Said crap of course adding tons of unnecessary resources in your cloud that no one understands.
At that time stateful services were somewhat harder to operate on k8s because statefulness (and all that it encapsulates) was kinda full of bugs. That may certainly have changed over the last few years. Maybe we just did it wrong. In any case if you focused on the core parts of k8s that were mature back then, k8s was (and is) a good thing.
But we recently hired a staff engineer to the team (now the most senior) and the guy just cannot sit still. "Oh we need a service mesh because we need visibility! I've been using it on my previous job and it's the best thing ever." Even though we have all the visibility/metrics that we need and never needed more than that. Then it's "we need a different ingress controller, X is crap, Y surely is much better!" etc.
So it's not inexperienced engineers wanting the newest hotness because they have no idea how to solve stuff with the tools they have; it's sometimes senior engineers trying to justify their salary and "seniority" by buying into complexity as they try to make themselves irreplaceable.
It’s not strictly necessary, but if you’ve had to put in the work elsewhere, I’d use it.
There’s always a period of “omgwhat” when new senior engineers join and they want to improve things. There’s a short window between joining and getting bogged into a million projects where this is possible.
Embrace it, I reckon.
In fact pretty sure I've read a write up from Alibaba? on huge wins in performance due to moving Istio out of sidecar and into shared node service.
I suppose you could have a giant envoy and have all the proxy-configs all mashed together but I really don't see any benefit to it? I can't even find documentation that says it's possible..
It's called ambient mode, and it uses separate L4 and L7 processing in ways that would be familiar to people who have dealt with virtual network functions; neither the L4 nor the L7 part requires a sidecar.
The grass is always greener where you water it. They joined your company because the grass was greener there than anywhere else they could get an offer at. They want to keep it that way or make it even greener. Assuming that someone is doing something to become 'irreplaceable' is probably not healthy.
I regard these as traits of a junior dev. They're thinking technology-first, not problem-first.
>It's not that complicated if you limit yourself to the core stuff.
Isn't this the core problem with a lot of technologies? There's a right way to use it, but most ways are wrong. An expert will not look left and right anymore, but to anyone entering the technology with fresh eyes it's a field with an abundance of landmines to navigate around.
It's simply bad UX and documentation. It could probably be better. But now it's too late to change everything because you'd annoy all the experts.
>There's way too much vendor garbage surrounding the ecosystem
Azure has been especially bad in this regard. Poorly documented in all respects, too many confusing UI menus that have similar or same names and do different things. If you use Azure Kubernetes the wrapper makes it much harder to learn the "core essentials". It's better to run minikube and get to know k8s first. Even then a lot of the Azure stuff remains confusing.
This is my biggest complaint. There is no simple obvious way to set it up. There is no "sane default" config.
> It's better to run minikube and get to know k8s first.
Indeed. It should be trivial to set up a cluster from bare metal - nothing more than a `dnf install` and some other command to configure core functionality and to join machines into that cluster. Even when you go the easy way (with, say, Docker Desktop) you need to do a lot of steps just to have an ingress router.
Includes working out of the box ingress controller.
I think since then the documentation probably has improved. I would hope so. But I will only touch Kubernetes again, when I need to. So maybe on a future job.
Sure, I am playing with fire (k3s, bare metal, cilium, direct assigned IP to Ingresses), but a few weeks ago on one cluster suddenly something stopped working in the external IP -> internal cluster IP network path. (And after a complete restart things got worse. Oops. Well okay time to test backups.)
There's another Kubernetes post on the front page of HN at the moment, where they complain it's too complex and they had to stop using it. The comments are really laying into the article author because they used almost 50 clusters. Of course they were having trouble, the comments say, if you introduce that much complexity. They should only need one single cluster (maybe also a backup and a dev one at most). That's the whole point.
But then here you are saying your team "operates a couple thousand" clusters. If 50 is far too many, and bound to be unmanageable, how is it reasonable to have more than a thousand?
It's not unmanageable to have a couple thousand Kube clusters but you need to have the resources to build a staff and tool chain to support that, which most companies cannot do.
Clusters are how we shard our customer workloads (a workload being say a dozen services and a database, a customer may have many workloads spread across the entire fleet). We put between 100 and 150 workloads per cluster. What this gives us is a relatively small impact area if a single cluster becomes problematic as it only impacts the workloads on it.
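The blast-radius argument is easy to quantify. A rough sketch, where the total workload count is a made-up number chosen to match "a couple thousand clusters" at about 125 workloads each:

```python
import math


def clusters_needed(workloads: int, per_cluster: int) -> int:
    """Number of clusters required when each holds at most per_cluster workloads."""
    return math.ceil(workloads / per_cluster)


def blast_radius(workloads: int, per_cluster: int) -> float:
    """Worst-case fraction of all workloads affected when one full cluster fails."""
    return min(per_cluster, workloads) / workloads


if __name__ == "__main__":
    total = 250_000  # hypothetical fleet size, not a figure from the thread
    print(clusters_needed(total, 125))
    print(f"{blast_radius(total, 125):.4%}")
```

Packing more workloads per cluster means fewer clusters to run but a proportionally larger impact when one goes bad, which is the trade-off being described.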
Sounds like a great KISS solution. Why did it regress into Kubernetes?
The “KISS solution” didn’t scale to the requirements of modern business. I remember running chef - essentially a complicated ruby script - on 100ks of servers, each of which with their own local daemon & a central management plane orchestrating it. The problem was that if a server failed… it failed, alongside everything on it.
Compared to that setup, k8s is a godsend - auto healing, immutable deployments, scaling, etc - and ultimately, you were already running a node agent, API, and state store, so the complexity lift wasn’t noticeable.
The problem came about when companies who need to run 5 containers ended up deploying a k8s cluster :-)
And pods, of course
Give me access to a repo full of YAML files and I'm truly and completely lost and wouldn't even know where to begin.
YAML is simply not the right tool for this job. Sure you got used to it but that's exactly the point: you had to get used to it. It was not intuitive and it did not come naturally to you.
I mostly just think that k8s integration with GCP is a huge pain in the ass; every time I have to touch it, it's the worst part of my day.
Infrastructure as code is great, but let's be honest, most people are not thoroughly reading through a PR with 200+ files.
There's of course tpl files to help reduce duplication, and I'm grateful for that stuff when I can get it, but for one reason or another, I can't always do that.
It's also not always clear to me which YAML corresponds to which service, though I think that might be more of an issue with our individual setup.
Completely agree. I use Kubernetes (basically just Deployments and CronJobs) because it makes deployments simple, reliable and standard, for a relatively low cost (assuming that I use a managed Kubernetes like GKE where I don’t need to care at all about the Kubernetes engine). Using Kubernetes as a developer is really not that hard, and it gives you no vendor lock-in in practice (every cloud provider has a Kubernetes offer) and easy replication.
It’s not the only solution, not the simplest either, but it’s a good one. And if you already know Kubernetes, it doesn’t cost much to just use it.
Survivorship bias?
I'd make a similar statement about the sys admin stuff you already know well. Give me yaml and a rest api any day.
I see where you and the article are coming from, though. The article reasonably points out that k8s is heavy for simpler workloads.
Like Shakespeare's work clumsily half-translated into French advertising jargon, and you are forced to read it and make it work on a stage.
I’ve done Bicep, Terraform and both Kubernetes and the managed variant (I forget what the Azure Container Apps thing running on top of what is basically Kubernetes is called). When I can get away with it, however, I always use the Azure CLI through bash scripts in a pipeline and build directly into Azure App Services for containers, which is just so much less complicated than what you probably call “cloud shit”. The cool part about the Azure CLI and their app services is that they haven’t really changed in the past 3 years, and they are almost one-size-fits-any-organisation. So all anyone needs to update in the YAML scripts are the variables. By contrast, working with Bicep/Terraform, Jenkins and whatever else people use has been absolutely horrible, sometimes requiring full-time staff just to keep it updated. I suppose it may be better now that Azure Copilot can probably auto-generate what you need. A complete waste of resources in my opinion. It used to be more expensive, but with the last price hike of 600% on Azure Container Apps it’s usually cheaper. It’s also way more cost efficient in terms of maintenance, since it’ll just work after the initial setup pipeline has run. This is the only way I have found that is easier than what it was when organisations ran their own servers, whether in the basement or at some local hardware house (not exactly sure what you call the places where you rent server rack space). Well, places like Digital Ocean are even easier but they aren’t used by enterprise.
I’m fairly certain I’ve never worked with an organisation that needed anything more than that, since basically nothing in Denmark scales beyond what can run on a couple of servers behind a load balancer. One of the few exceptions is the tax system, which sees almost 0 usage except for the couple of weeks where the entire adult population logs in at the same time. When DevOps teams push back, I tend to remind them that StackOverflow ran on a couple of IIS servers for a while and that they don’t have even 10% of the users.
Eventually the business case for Azure will push people back to renting hardware space or jumping to Hetzner and similar. But that’s a different story.
Although my experience is with AWS, I find the terraform AWS provider docs better documentation than the official AWS docs for different options. If they don't answer any question I have right away they at least point me where to look for answers in the mess that is AWS docs.
I mean hopefully no one is logging into Azure to fuck with settings but I’m sure we’ve all worked with that one team that doesn’t give a flying fuck about good practices.
Or say you wish to now scale up a VM, how does your bash script deal with that?
Do you copy-paste the old script, pass new flags to the Azure CLI, and then run that, then delete the old infrastructure somehow?
I’m curious because I think I’d like to try your approach.
One of the things I like about the Azure CLI is that it rarely changes. I would like to clarify that I’m mainly talking about Azure App Services and not VMs. Function apps for most things, web apps for things like APIs.
As far as the script goes they are basically templates which are essentially “copy paste”. One of the things I tend to give developers in these organisations is “skeleton” projects that they can git clone. So they’ll typically also have some internal CLI scripts to automate a lot of the code generation, and an azure-pipelines-resource-creation.yml plays into this. Each part of your resource creation is its own “if not exist” task. So there is a task to create a resource group. Then a task to create an app service plan and so on.
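The "if not exist" idea for the resource group step could look something like this. To be clear, this is a Python sketch of the pattern, not the pipeline YAML or bash described above; the resource group name and location are hypothetical, and the `run` parameter exists only so the logic can be exercised without touching Azure.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_cli(cmd: List[str]) -> str:
    """Run a command and return its stdout, raising on a non-zero exit code."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def ensure_resource_group(name: str, location: str, run: Runner = run_cli) -> bool:
    """Create the resource group only if it is missing. Returns True if created."""
    # `az group exists` prints "true" or "false".
    exists = run(["az", "group", "exists", "--name", name])
    if exists.lower() == "true":
        return False
    run(["az", "group", "create", "--name", name, "--location", location])
    return True


if __name__ == "__main__":
    # Hypothetical values; requires a logged-in Azure CLI.
    created = ensure_resource_group("rg-demo", "westeurope")
    print("created" if created else "already there")
```

Because each step checks before it creates, re-running the whole thing is harmless, which is what makes the copy-paste template approach workable.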
It won’t scale. But it will scale enough for every organisation I’ve worked with.
To be completely honest it’s something which grew out of my frustration of repeating the same tasks in different ways over the years. I don’t remember exactly but I think quite a few of the AZ CLI commands haven’t changed for the past three years. It’s really the one constant across organisations; even the Azure Portal hasn’t remained the same.
Every time we get a new guy I have to explain that we are already „in cloud” there is no need to „move to cloud”.
Developers want to use PaaS and also AWS or Azure so they can put it on their resume for the future.
I think this is a little disingenuous. Developers want to use them because they already know them. The services composing them are also often well documented by the provider.
I say all of that as someone trying to move a company away from aws, and over to our own hardware.
Also managing a cloud infrastructure is a lot more complex than running Debian and Ansible on a VM.
I just want a damn network, a couple of virtual machines, and a database. Why does each <cloud provider> have to create different fancy wrappers over everything, that not even their own sales consultants, and even engineers, understand?(1)
What I do like about Docker and Kubernetes is that shifting from one cloud provider to another, or even back to on-premises (I'm waiting for the day our organisation says "<cloud-provider> is too damn expensive; those damn management consultants lied to us!!!!") is a lot easier than re-building <cloud provider>'s bespoke shit in <another cloud provider>'s bespoke shit, or back on-premises with real tech (the right option in my opinion for anyone less than a (truly) global presence).
I do like the feel of, and being able to touch bare metal, though the 180-proof-ether-based container stuff is nice for quick, flexible, and (sometimes) dirty. Especially for experimenting when the Directors for War and Finance (the same person) say "We don't have the budget!! We're not buying another server/more RAM/another disk/newer CPUs! Get fucked!".
The other thing about Docker specifically I like is I can 'shop' around for boilerplate templates that I can then customise without having to screw around manually building/installing them from scratch. And if I fuck it up, I just delete the container and spin up another one from the image.
(1) The answer is 'vendor lock-in', kids.
(I apologise, I've had a looooooong day today.......)
But there are tons of applications that run on over-engineered cloud environments that may or may not involve k8s and probably cost more to operate than they must. I use some tools every day where a daily 15 min downtime would not affect me or my work in the slightest. I am not saying this would be desirable per se. It's just that a lot of people (myself included) are happy to spend an hour of their work day talking to colleagues and drinking coffee, but a 15 min downtime of some tool is seen as an absolute catastrophe.
I've been at a place where the two main applications were tightly coupled with several support systems and they all were dependent on one or more of Oracle DB, Postgres, Redis, JBoss, Tibco EMS and quite a bit more. Good luck using your development device to boot and run the test suite without containers. Before that team started putting stuff in containers they used the CI/CD environment to run the full set of tests, so they needed to do a PR, get it accepted, maybe wait for someone else's test run to finish, then watch it run, and if something blew, go back to commit, push to PR, &c. all over again.
Quite the nuisance. A full test suite had a run time of about an hour too. When I left we'd pushed it to forty minutes on our dev machines. They didn't use raw Kubernetes though, they had RedHat buy-in and used OpenShift, which is a bit more sane. But it's still a YAML nightmare that cuts you with a thousand inscrutable error messages.
Our generation has seen many things before, but at the same time the world has completely changed and it’s led to the people growing up in it to be different.
You and I didn’t fully grasp CPUs anymore. Some people today don’t grasp all the details of the abstractions below K8s anymore and use it when perhaps something simpler (in architecture, not necessarily in use!) could do it better. And yet, they build wondrous things, without editing php.ini and messing up 2 services to get one working.
Do I think K8s is the end all? Certainly not, I agree it’s sometimes overkill. But I bet you’ll like its follow-up tech even less. It is the way of things.
I agree with your analysis.
People wanna talk up about how good the old days were plugging cables into racks but it's really laborious and can take days to debug that a faulty network switch is the cause of these weird packet drop issues seen sporadically on hot days.
Same as people saying 'oh yeah calculators are too complicated, pen and paper is what kids should be learning'.
It's the tide of change
I'm not saying that plugging in cables and hoping power supplies don't die is "better" in any kind of objective sense, or even subjective sense really, I'm just saying that I hate this cloud cult that has decided that the only way to do anything is to add layers of annoying bureaucratic shit.
Even if you are right in this instance, just brushing things off with the "you are old" argument will ensure that you end up in some horrible tech debt spaghetti mess in the future.
Being critical of the infrastructure you deploy to is a good thing. Because for all the new things that do stick around, there are dozens of other shiny new hyped up things that end up in digital purgatory quite soon after the hype phase is over.
That's not to say there isn't some truth to your statement. The older you get, the more critical you do need to be of yourself as well. Because it is indeed possible to just be against something because it is new and unfamiliar. At the same time, experience does provide insights that allow senior people to be more critical of things.
*tl;dr:* The world is complicated, not binary.
As I said in a sibling comment, you can genuinely get a bachelor's degree in AWS or Azure [1], meaning that it's complicated enough that someone thought it necessitated an entire college degree.
By "cloud shit", I don't mean "someone else hosting stuff" (which I tried to clarify by saying "give me a VM" at the end). I mostly think that having four hundred YAML files to do anything just sucks the life out of engineering. It wouldn't bother me too much if these tasks were just handled by the people who run devops, but I hate that, since I am a "distributed systems engineer", I have to LARP as an accountant and try and remember all this annoying arbitrary bureaucratic crap.
[1] https://www.wgu.edu/online-it-degrees/cloud-computing-bachel...
I never really liked the devops stuff even when I was 20. I have no doubt that I could get better with k8s, but it's decidedly not fun.
If I can restart the machine and it magically resolves the underlying hardware fault (presumably by migrating me to a new host), then I am in a happy place.
Most of the other problems can be dealt with using modern tooling. I lean aggressively on things like SQLite and self-contained deployments in .NET to reduce complexity at hosting time.
When you can deploy your entire solution with a single zip file you would be crazy to go for something like K8s.
One other cloud thing that is useful is S3-style services. Clever use of these APIs can provide incredible horizontal scalability for a single instance solution. SQLite on NVMe is very fast if you are offloading your blobs to another provider.
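A minimal sketch of that split: SQLite keeps only small metadata rows, and the bytes go to an object store. The `BlobStore` class here is an in-memory stand-in I made up so the example runs anywhere; a real setup would call an S3-style client instead, and the table layout is likewise just an assumption.

```python
import hashlib
import sqlite3
from typing import Dict


class BlobStore:
    """In-memory stand-in for an S3-style object store (hypothetical)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "name TEXT PRIMARY KEY, blob_key TEXT NOT NULL, size INTEGER NOT NULL)"
    )
    return conn


def save_document(conn: sqlite3.Connection, store: BlobStore, name: str, data: bytes) -> str:
    """Upload the bytes to the object store, then record only the key in SQLite."""
    key = hashlib.sha256(data).hexdigest()  # content-addressed key
    store.put(key, data)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO documents (name, blob_key, size) VALUES (?, ?, ?)",
            (name, key, len(data)),
        )
    return key


def load_document(conn: sqlite3.Connection, store: BlobStore, name: str) -> bytes:
    row = conn.execute(
        "SELECT blob_key FROM documents WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return store.get(row[0])
```

The database file stays small and fast on local NVMe, while the heavy payloads scale with the provider.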
Nope, then you'll set up sharded databases and a bunch of application servers behind a load balancer.
The cloud integration part can be hairy but I have terraform patterns that, once worked out, are cookie cutter.
With cloud kubernetes, I can imagine starting from scratch, taking a wrong turn and ending up in hell.
But I'm exchanging one problem set for another. Having spent years managing fleets of physical and virtual servers, I'm happier and more productive now. I never need to worry about building systems or automation for doing OS build / patching, config management, application packaging and deployment, secrets management, service discovery, external DNS, load balancing, TLS certs etc. Because while those are just "words" now, back then each one was a huge project involving multiple people fighting over "CentOS Vs Ubuntu", "Puppet Vs Ansible", "RPMs Vs docker containers", "Patching Vs swapping AMIs". If you're using Consul and Vault, good luck - you have to integrate all of that into whatever mess you've built, and you'll likely have to write puppet code and scripts to hook it all up together. I lost a chunk of my life writing 'dockerctl' and a bunch of puppet code that deployed it so it could manage docker containers as systemd services. Then building a vault integration for that. It worked great across multiple data centers, but took considerable effort. And in the end it's a unique snowflake used by exactly one company, hardly documented and likely full of undiscovered bugs and race conditions even after all the hard work. The time it took to onboard new engineers was considerable and it took time away from an existing engineer. And we still had certificates expire in production.
https://github.com/dhall-lang/dhall-kubernetes
Type safe, fat finger safe representation of your YAMLs is grossly underrated.
The fact that VMware can migrate a running VM to a different hardware node is surely a powerful feature. Do you want to pay for that with complexity? If you are one infra team serving on-prem deployments with consistent loads, you provision things when new projects are started and when things break. However, if the infra team serves internal product teams, it is nice to give them certain guarantees and to limit the blast radius of how they can affect other teams.
This is where Kubernetes sits. It's a deployment platform where the deployable is a container image instead of a VM image. Slice off a namespace to an internal team and have their deployment blast radius contained within their namespace.
Do you need such flexibility? I'm pretty sure that roughly 99% of all teams do not. A static set of VMs provisioned with some container orchestration system is more than enough for them. Loads and deployments are too static for it to matter.
>>> But it allows seamless upgrades!
Dude or dudette, if you can't properly drain your nodes and migrate sessions now, kubernetes will not save you.
Let me try to explain:
First, you encounter the biggest impedance mismatch between cloud and on prem: Kubernetes works with pods, while AWS works with instances as the unit of useful work, so they must map to each other, right?
Wrong. First, each instance needs to run a Kubernetes node, which duplicates the management infrastructure hosted by AWS and reduces granularity. If I need 4 cores for my workload, I start an instance with 4 cores, right? Not so with k8s: you have to start up a big node, then schedule pods there.
Yay, extra complexity and overhead! And it's like when you need 3 carrots for the meal you're cooking, but the store only sells them in packs of 8: you have to pay for the ones you don't need, and then figure out how to use up the leftovers.
I'm not even going to talk about on-prem kubernetes, as I've never seen anyone IRL use it.
Another thing is that the need for kubernetes is a manifestation of crappy modern dev culture. If you wrote your server in Node, Python or Ruby, you're running a single-threaded app in an era when even laptops have 32 core CPUs. And individual instances are slow, so you're even more dependent on scaling.
So, to make use of the extra CPU power, you're forced to turn to Docker/k8s and scale your infra that way, whereas if you went with something like Go, or god forbid, something as deeply uncool as ASP.NET, you could just use a single 32 core server and get fast single-threaded perf plus simple multi-threaded utilization out of the box, without any of the headaches.
Also I've found stuff like rolling updates to be a gimmick.
Also a huge disclaimer, I don't think k8s is a fundamentally sucky or stupid thing, it's just I've never seen it used as a beneficial architectural pattern in practice.
That's what many teams end up half-arsing without realising they're attempting to build a PaaS.
They adopt K8S thinking it's almost 90% a PaaS. It's not.
They continue hiring, building a DevOps team just to handle K8S infrastructure things, then a Platform team to build the PaaS on top of it.
Then because so many people have jobs, nobody at this point wants to make an argument that perhaps using an actual PaaS might make sense. Not to mention "the sunk cost" of the DIY PaaS.
Then on top of that, realising they've built a platform mostly designed for microservices, everything then must become a microservice. 'Monolith' is a banned word.
Argh
But for some reason this is what people want to do. They would rather spend hours debugging kubernetes, terraform, and docker, and spending 5 digits on cloud every month, to serve what could literally be proxied authenticated DB lookups. We have “hack days” a few times a year, and I’m genuinely debating rewriting the entire “cloud” portion of our current product in gunicorn or something, host it on a $50/month vps, point it at a mirror of our prod db, and see how many orders of magnitude of performance I can knock off in a day.
I’ve only managed to convert one “cloud person” to my viewpoint but it was quite entertaining. I was demoing a side project[0] that involved pulling data from ~6 different sources (none hosted by me), concatenating them, deduping them, doing some math, looking up an image in a different source (unique to each displayed item), and then displaying the list of final items with images in a list or a grid. ~5k items. Load time on my fibre connection was 200-250ms, sorting/filtering was <100ms. As I was demoing this, a few people asked about the architecture, and one didn’t believe that it was a 750 line python file (using multiprocessing, admittedly) hosted on an 8 core VPS until I literally showed him. He didn’t believe it was possible to have this kind of performance in a “legacy monolithic” (his words) application.
I think it’s so heavily ingrained in most cloud/web developers that this is the only option that they will not even entertain the thought that it can be done another way.
[0] This particular project failed for other reasons, and is no longer live.
Except...
Performance was woeful. It took forever to spin up the pods, but even once things had warmed up everything just ran in slow motion. Data was slow to collect (single-digit kilobits!), and I even saw a few timeout failures within the cluster.
I gave up and simply provisioned a 120 vCPU / 600 GB memory cloud server with spot pricing for $2/hour and ran everything locally with scripts. I ended up scanning a decent chunk of my country's internet in 15 minutes. I was genuinely worried that I'd get put on some sort of "list" for "attacking" government sites. I even randomized the read order to avoid hammering any one site too hard.
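Randomizing the read order can be as simple as a shuffle. A minimal sketch, assuming the URLs are plain strings; the per-host round-robin step is an added assumption that goes beyond the plain randomization described above, and `spread_by_host` is a hypothetical name:

```python
import random
from collections import defaultdict, deque
from urllib.parse import urlsplit


def spread_by_host(urls, seed=None):
    """Shuffle URLs, then interleave them host by host.

    Each pass takes at most one URL per host, so consecutive requests
    to the same host are spaced as far apart as the input allows.
    """
    rng = random.Random(seed)
    by_host = defaultdict(list)
    for url in urls:
        by_host[urlsplit(url).hostname or ""].append(url)

    queues = []
    for host_urls in by_host.values():
        rng.shuffle(host_urls)  # random order within a host
        queues.append(deque(host_urls))
    rng.shuffle(queues)  # random order between hosts

    ordered = []
    while queues:
        remaining = []
        for q in queues:
            ordered.append(q.popleft())
            if q:
                remaining.append(q)
        queues = remaining
    return ordered
```

This only controls ordering. A real scan would probably also want a per-host delay or concurrency cap.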
Kubernetes sounds "fun to tinker with", but it's actually a productivity vampire that sucks up engineer time.
It's the Factorio of cloud hosting.
Now that is a blog post that I would read with interest, top to bottom.
Both Azure and AWS have spot-priced VMs that are “low priority” and hence can be interrupted by customers with normal priority VM allocation requests. These have an 80% discount in exchange for the occasional unplanned outage.
In Azure there is an option where the spot price dynamically adjusts based on demand and your VM basically never turns off.
The trick is that obscure SKUs have low demand and hence low spot prices and low chance of being taken away. I use the HPC optimised sizes because they’re crazy fast and weirdly cheap.
E.g.: right now I’m using one of these to experiment with reindexing a 1 TB database. With 120 cores (no hyperthreading!) this goes fast enough that I can have a decent “inner loop” development experience. The other trick is that even Windows and SQL Server is free if this is done in an Azure Dev/Test subscription. With free software and $2/hr hardware costs it’s a no-brainer!
I thought about it many times but never did it on that scale, plus was never paid to do so and really didn't want my static IP banned. So if you ever write on that and publish it on HN you'd find a very enthusiastic audience in me.
I did this in two phases:
Phase #1 was to collect "top-level" URLs, which I did via Certificate Transparency (CT). There's online databases that can return all valid certs for domains with a given suffix. I used about a dozen known suffixes for the state government, which resulted in about 11K hits from the CT database. I dumped these into a SQL table as the starting point. I also added in distinct domains from load balancer configs provided by the customer. This provided another few thousand sites that are child domains under a wildcard record and hence not easily discoverable via CT. All of this was semi-manual and done mostly with PowerShell scripts and Excel.
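The original used PowerShell and Excel; as a rough Python illustration of the same idea, this sketch normalizes names as they might come out of a CT search, keeps those under known suffixes, and dedupes them into a SQL table. The table layout, function names, and the `gov.example` suffix are all hypothetical:

```python
import sqlite3


def normalize(name: str) -> str:
    """Lowercase, drop a trailing dot, and strip a leading wildcard label."""
    name = name.strip().lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]  # a wildcard cert still tells us the parent domain exists
    return name


def matches_suffix(host: str, suffixes) -> bool:
    # Match on label boundaries so "evil-gov.example" does not match "gov.example".
    return any(host == s or host.endswith("." + s) for s in suffixes)


def load_hosts(db: sqlite3.Connection, names, suffixes) -> int:
    """Insert the distinct in-scope hosts and return how many were in scope."""
    db.execute("CREATE TABLE IF NOT EXISTS hosts (name TEXT PRIMARY KEY)")
    hosts = {normalize(n) for n in names}
    keep = sorted(h for h in hosts if h and matches_suffix(h, suffixes))
    with db:
        db.executemany("INSERT OR IGNORE INTO hosts VALUES (?)", [(h,) for h in keep])
    return len(keep)
```

Domains from other sources, such as the load balancer configs, could be passed through the same `load_hosts` call so the primary key handles deduplication.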
Phase #2 was the fun bit. I installed two bespoke builds of Chromium side-by-side on the 120-core box, pointed Selenium at both, and had them trawl through the list of URLs in headless mode. Everything was logged to a SQL database. The final output was any difference between the two Chromium builds. E.g.: JS console log entries that are different, cookies that are not the same, etc...
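The comparison step can be sketched roughly as below, assuming each build's observations have already been collected into a dict keyed by URL. The field names (`console`, `cookies`) and the function name are hypothetical, not the schema of the original SQL database:

```python
def diff_observations(baseline: dict, candidate: dict) -> dict:
    """Return {url: {field: (baseline_value, candidate_value)}} for fields that differ."""
    report = {}
    for url in sorted(set(baseline) | set(candidate)):
        a = baseline.get(url, {})
        b = candidate.get(url, {})
        changed = {
            field: (a.get(field), b.get(field))
            for field in sorted(set(a) | set(b))
            if a.get(field) != b.get(field)
        }
        if changed:
            report[url] = changed
    return report
```

URLs where both builds behaved identically drop out of the report, so only the sites affected by the change are left to review.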
All of this was related to a proposed change to the Public Suffix List (PSL), which has a bunch of effects on DNS domain handling, cookies, CORS, DMARC, and various other things. Because it is baked into browser EXEs, the only way to test a proposed change ahead of time is to produce your own custom-built browser and test with that to see what would happen. In a sense, there's no "non-production Internet", so these lab tests are the only way.
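To see why a PSL change matters, here is a simplified sketch of how a registrable domain is derived from a suffix list. It ignores the real list's wildcard and exception rules, and the suffixes in the usage note are made up:

```python
def registrable_domain(host: str, suffixes: set):
    """Return the public suffix plus one label, or None if there is none.

    Simplified: the longest matching suffix wins, and an unlisted TLD
    falls back to the last label, like the PSL's implicit "*" rule.
    """
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):  # longest candidate suffix first
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None  # the host is itself a public suffix
            return ".".join(labels[i - 1:])
    if len(labels) < 2:
        return None
    return ".".join(labels[-2:])
```

With a hypothetical list `{"example", "gov.example"}`, the host `portal.agency.state.gov.example` has the registrable domain `state.gov.example`. Add `state.gov.example` to the list and it becomes `agency.state.gov.example`, so a cookie scoped to `state.gov.example` would now be scoped to a public suffix and rejected by the browser.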
Actually, the most compute-intensive part was producing the custom Chromium builds! Those took about an hour each on the same huge server.
By far the most challenging aspect was... the icon. I needed to hand over the custom builds to web devs so that they could double-check the sites they were responsible for, and it was also needed for internal-only web app testing. The hiccup was that two builds look the same and end up with overlapping Windows task bar icons! Making them "different enough" that they don't share profiles and have distinct toolbar icons was weirdly difficult, especially the icon.
It was a fun project, but the most hilarious part was that it was considered to be such a large-scale thing that they farmed out various major groups of domains to several consultancies to split up the work effort. I just scanned everything because it was literally simpler. They kept telling me I had "exceeded the scope", and for the life of me I couldn't explain to them that treating all domains uniformly is less work than trying to determine which one belongs to which agency.
I only get a "fun" project like this once every year or two.
Selling this kind of thing is basically impossible. You can't convince anyone that you have an ability that they don't even understand, at some fundamental level.
At best, you can incidentally use your full set of skills opportunistically, but that's only possible for unusual projects. Deploying a single VM for some boring app is always going to be a trivial project that anyone can do.
With this project even after it was delivered the customer didn't really understand what I did or what they got out of it. I really did try to explain, but it's just beyond the understanding of non-technical-background executives that think only in terms of procurement paperwork and scopes of works.
> He didn’t believe it was possible to have this kind of performance in a “legacy monolithic” (his words) application.
> I think it’s so heavily ingrained in most cloud/web developers that this is the only option that they will not even entertain the thought that it can be done another way.
One thing that I need to remind myself of periodically: The amount of work that a modern 1U server can do in 2024 is astonishing. I’ve tested this repeatedly, at multiple companies, with Postgres and MySQL. Everyone thinks Aurora must be faster because AWS is pushing it so hard; in fact, it’s quite slow. Hard to get around physics. My drives are mounted via Ceph over Infiniband, and have latency measured in microseconds. Aurora (and RDS for that matter) has to traverse much longer physical distances to talk to its drives.
Pure bare metal IME only leads to people ssh'ing to hotfix something and forgetting to deploy it. Exclusively using Docker images prevents that. Also, it makes firewall management much, much easier as you can control containers' network connectivity (including egress) each on their own, on a bare-metal setup it involves loads of work with network namespaces and fighting the OS-provided systemd unit files.
Kubernetes is interesting, because it basically takes everything you know and sort of pushes it down the stack (or up, depending on your viewpoint). To some extent I get the impression that the idea was: Wouldn't it be great if we took all the infrastructure stuff, and just merged the whole thing into one tool, which you can configure using a completely unsuitable markup language. The answer is that "No, that would in fact not be great".
For me the issue is that Kubernetes is overused. You can't really argue that it isn't useful or doesn't have its place, but that place is much, much smaller than the Internet wants us to believe. It's one of the few services where I feel like "Call us" would be an appropriate sales method.
The article is correct, you probably don't need Kubernetes. It's an amazing piece of technology, but it's not for the majority of us. It is and should be viewed as a niche product.
Mostly a question of scale to me, I'd guess that the majority (80-90%) of people running Kubernetes don't have large enough scale that it makes sense to take on the extra complexity. Most Kubernetes installations I've seen run on VMs, three for the control plane and 2 - 5 for worker nodes, and I don't think the extra layer is a good trade-off for a "better" deployment tool.
If you do use Kubernetes as a deployment tool, then I can certainly understand that. It is a reasonably well-known, and somewhat standardised interface and there's not a lot of good alternatives for VMs and bare metal. Personally I'd just much rather see better deployment tools being developed, rather than just picking Kubernetes because Helm charts are a thing.
You'd also need to have a rather dynamic workload, in the sense that some of your services need a lot of capacity at one point in time, while others need the capacity at another time. If you have constant load, then why?
It's like Oracle's Exadata servers, it's a product that has its place, but the list of potential buyers isn't that long.
> YAML files, and then spend a day fixing them by copy-pasting increasingly-convoluted things on stackexchange.
This is terrible behavior. It’s not any different from yanking out pam modules because you’re getting SSH auth failures caused by a bad permission on an SSH key.
> If I get to tens of millions of users, maybe I’ll worry about it then.
K8s isn’t there for 10s of millions of users. It’s there so you’re not dependent on some bespoke VM state. It also allows you to do code review on infra changes like port numbers being exposed, etc.
Separately, your VM likely isn’t coming from any standard build pipeline so now a vulnerability patch is a login to the machine and an update, which hopefully leaves it in the same state as VMs created new…
Oh, and assuming you don’t want to take downtime on every update, you’ll want a few replicas and load balancing across them (or active/passive HA at a minimum). Good luck representing that as reviewable code as well if you are running VMs.
The people that don’t understand the value prop of infra as code orchestration systems like k8s tend to work in environments where “maintenance downtime” is acceptable and there are only one or two people that actually adjust the configurations.
It's 100% possible to have stateless VMs running in an auto-scaling instance group (in GCP speak, I forget what AWS calls them)
People that don’t like k8s tend to be fine with docker. It’s usually that they don’t like declarative state or thinking in selectors and other abstractions.
I paired with one of our platform engineers several months ago. A simple app that listens on Kafka, stores stuff in PostgreSQL and only has one exposed port needed at least 8 YAML files. Ingress, service ports and whatever other things k8s feels should be described. I forgot almost all of them the same day.
I don't doubt that doing it every day will have me get used to it and even find it intuitive, I suppose. But it's absolutely not coming naturally to me.
I'd vastly prefer just a single config block with a declarative DSL in it, a la nginx or Caddy, and describe all these service artifacts in one place. (Or similar to a systemd service file.)
Too many files. Fold stuff into far fewer of them and I'll probably become an enthusiastic k8s supporter.
> This is terrible behavior. It’s not any different from yanking out pam modules because you’re getting SSH auth failures caused by a bad permission on an SSH key.
Sure, I agree, maybe they should make the entire process less awful then and easier to understand. If they're providing a framework to do distributed systems "correctly" then don't make it easy for someone whose heart really isn't into it to screw it up.
> K8s isn’t there for 10s of millions of users. It’s there so you’re not dependent on some bespoke VM state. It also allows you to do code review on infra changes like port numbers being exposed, etc.
That's true of basically any container stuff or orchestration stuff, but sure.
Kubernetes just screams to me as suffering from a "tool to make it look like I'm doing a lot of work". I have similar complaints with pretty much all Java before Java ~17 or so.
I'm not convinced that something like k8s has to be as complicated as it is.
Describe what you think bureaucratic means in a tool.
> I might learn to enjoy being kicked in the balls if I practiced enough
This is the same thing people say who don’t want to learn command line tools “because they aren’t intuitive enough”. It’s a low brow dismissal holding you back.
That's simply not true.
Every Kubernetes cluster I have seen and used gives a lot more leeway for the runtime state to change than a basic Ansible/Salt/Puppet configuration, just due to the sheer number of components involved. Everything from Terraform to Istio and ArgoCD are all changed in their own little unique way with their unique possibilities for state changes.
Following GitOps in the Kubernetes ecosystem is something that requires discipline.
> environments where “maintenance downtime” is acceptable and there are only one or two people that actually adjust the configurations
Yes, because before Kubernetes that was how all IT was done? A complete clown show, amirite?
Way too many businesses cargo-cult themselves into thinking that they need FAANG-class infra, even though they haven't got the same scale or the same level of resourcing. Devs and ops people love it because they get to cosplay and also get the right words on their CV.
If you're not Google-scale then, as you say, a few VMs or physical boxes are all you need for most systems. But it's not sexy, so the business needs people who don't mind that.
And I mostly think that this is because of our collective bias that everything Big Tech does is good, while often it just depends. Just because Google does X or Y doesn't mean it will work for everybody else.
Containers don't bother me that much on their own. I just feel like with k8s and its ilk I end up spending so much time futzing with weird YAML and trying to remember differences between stateful sets and services and fighting with stuff because I missed the thing that mounts a disk so my app will break in three hours.
[1] https://www.wgu.edu/online-it-degrees/cloud-computing-bachel...
Regarding AWS, function.zip Lambda + DynamoDB + S3 + SQS is basically this at "enterprise-grade".
Now you have the in-between left, where you want enterprise-grade (availability, scaling and fault tolerance) but with a catch (like lower costs, more control, data sovereignty or things that are not as serverless as you want, e.g. search engine etc.). In these cases you will run into many trade-off decisions, that may lead you to K8s, cloud specific stacks and also simple VM ClickOps / ShellOps setups.
As long as you can still be on the pet instead of cattle range of problems, K8s is not what you want. But as soon as you want cattle, reproducible throw-away online environments etc. "just have a VM or a physical server" will become a "build your own private cloud solution" that may or may not become a bad private cloud / K8s.
That said, my experience has been fairly different. Running microk8s on a VPS with some containers and an ingress just seems like a nice way to spin them up as pods and manage them. It really doesn't feel that complicated.
Once you integrate with cloud providers and you get more complex systems, sure, it gets more complex.
I much prefer the container paradigm for separating parts of apps out and managing dependencies and reproducibility. Having a bunch of raw servers with all the bullshit and drift and mutability that comes along with it feels far more like a headache to me than the abstraction of k8s or docker.
If you aren't deploying stuff with GitOps/IaC in 2024 I fear for anyone that has to manage or maintain it. 100% reproducibility and declarative infrastructure are wonderful things I could never come back from.
Stuff like k9s / microk8s / k3s are crutches and workarounds and I hope we all see it.
If they figure out how to use an actual programming language, or just start using far fewer files than they currently do, then I'd be the first to learn k8s.
Before that, nope.
I love the idea but the implementation makes me want to slit my wrists.
That being said, there are abstractions: Terraform, Pulumi, and others.
But my go to is always that most companies will never get to the point where k8s is required — most companies never get to that scale. A well maintained docker compose setup gets you a long way.
1. Cloud for me is a lot better than what we had before: before, I had to create a ticket for our internal IT department, pay huge cross-charges (like $500 for a server, instead of $50), wait for a few weeks and then get lectured that installing basic default tools on that SUSE-based server would take a week and add additional cross-charges on top.
Their reasoning? Oh we do backup and upgrades...
With cloud, I click a server together in a minute for less money, I upgrade it myself and have snapshots as a basic backup workflow which actually is reliable and works.
Then k8s came along, and let me be clear: my k8s setup is big, so it's definitely worth it, but thanks to my experience with a bigger k8s setup, my very small ones are also working very, very well. I get, out of the box, HA, network policies, snapshotting through my selected storage setup, infrastructure as code etc.
Instead of having shitty shell scripts, Ansible setup and co, I only write a little bit of YAML, check it into my git system and roll it out with Ansible. Absolute no-brainer.
And the auto healing solved real issues: Out of memory? Just restart the pod. Out of disk? Just recreate it. Logging and metrics just work out of the box thanks to the Prometheus-based monitoring stack.
Starting with one server, yeah, why not, but you know you are not doing it right if you are the only one who can set it up and recover it. But if you don't have anyone with expertise, I would also not just start with k8s.
If my startup were a pure application + db thingy, I would go with any app platform out there. But we have real servers because we do stuff with data and sometimes need performance, and performance is expensive if you run it in the cloud.
Why are the shell scripts shitty but the YAML not? When I look at those YAML files I always throw up just a little :P
Also, have you tried Cloud Run?
For shell scripts, try proper error handling. You start doing some catch hooks, you have issues checking error codes of different tools, and debugging is hard too.
In one infra project we switched from shell scripts to golang just to have a lot more control/stability of our scripts.
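As a rough illustration of what that extra control buys, here is a sketch in Python rather than Go (a swap made purely for illustration). The hypothetical `run_step` helper turns a missing tool or a non-zero exit code into one exception that names the failing step:

```python
import subprocess


def run_step(name: str, argv: list) -> str:
    """Run one external command and return its stdout.

    Any failure is raised as a RuntimeError that names the step,
    instead of being silently ignored as in an unchecked shell script.
    """
    try:
        result = subprocess.run(argv, check=True, capture_output=True, text=True)
    except FileNotFoundError as exc:
        raise RuntimeError(f"{name}: command not found: {argv[0]}") from exc
    except subprocess.CalledProcessError as exc:
        raise RuntimeError(
            f"{name}: exit code {exc.returncode}: {exc.stderr.strip()}"
        ) from exc
    return result.stdout
```

A script built from such steps stops at the first failure with a message saying which step broke, which is most of what the shell "catch hooks" were trying to achieve.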
Agree that shell scripts are also hard to work with, especially if you did not write them yourself. I guess it's a combo of the language features of, say, bash, and that no one who writes them really knows bash. I mean, at all. Including me.
Declarative is nice, but it also has pros and cons. And there are of course many ways to achieve this if that's a priority.
Usually, what you really want is: low time to replicate AND no data loss if a server blows up. But this also has to extend to, say, the k8s cluster. And, again, there are many ways to achieve this.
The article does not call for Ansible setups and shell scripts though.
Cloud Run uses YAML btw. One of the things I personally don't like about it
What I like about that declarative setup: the other side, the executor, can actually be something reasonable to build and reuse. This strategy, or architecture, feels a lot better than the classical approach, especially because it's so often the same thing.
There is a reason k8s gang keeps going on about “cattle not pets”. The starting assumptions and goals are fundamentally different vs “give me a physical server”
Both have their place I think so not really one is right other is wrong
Installing stuff straight on a server is very messy, especially if it’s lots of different providers with their own dependencies. So you need to do some form of containers or VMs to isolate them. At which point you need some sort of tooling around that. And deal with failures etc. Before you know it you’ve reinvented k8s except with less standardization and more duct tape.
So I think there is a strong case for a k8s cluster, but being mindful to keep it as simple as possible within that paradigm. I.e. k8s but just the basic building blocks
I feel like I spend so much time working around CloudSQL for postgres support in GCP at work, to a point where I'm not actually sure I'm saving a ton of time over running and managing it myself. That's probably not true, I'm sure there are edge cases that I'm not accounting for, but I'm a little tired of everyone acting like AWS and GCP and Azure are the "set it and forget it" thing that they advertise.
My comment above was more k8s vs classic server rather than thinking about cloud k8s in particular.
I do agree that cloud stuff is a huge time sink. I’ve learned to look at it in terms of how close it is to a FOSS-like world. Things that follow normal protocols and standards, say it speaks Postgres or is a docker image, then cloud is ok. Things that are cloud vendor specific or a custom product…run for the hills. Not only is it lock in but also that’s where the pain you describe is. The engineering around it just becomes so much more granular and fragile
yes, k8s and co are silly for trivially tiny systems like this.
What if we had something like Kubernetes but at the hardware level? Imagine a single Linux installation running across multiple servers, where resources are seamlessly pooled and managed. In htop, you could visualize all CPU cores, with each core labeled by its corresponding node.
Now, consider starting a container with Podman: the container would execute on the CPU cores of one node. If you start another container, it could run on the cores of a different node. This approach would essentially transform Linux into an operating system capable of spanning a distributed cluster of nodes.
To achieve this, the operating system wouldn’t need to be entirely reinvented—it could simply be Linux, enhanced with the necessary kernel modifications to enable such distributed functionality. This could provide the simplicity and efficiency of a unified OS while leveraging the power of a distributed system. Or maybe it's a pipe dream.
I did a ~15 month stint at AWS. Originally I was signed on to be a support engineer for their Linux teams, which sounded great! Absolutely within my skillset and a great way to get to play with new technologies and features at scale that I hadn't before (Ansible, etc)
After going through all the hoops I get sidelined into the "Containers" team and have to learn Kubernetes, ECS, Fargate etc all effectively from scratch
It's all miserable. All of it. Massive Rube Goldberg machines of complexity for the sake of complexity that you need a team to decipher, let alone maintain.
Unfortunately it feels like all the Sysadmin jobs have been replaced either by "DevOps" or Cloud Engineers. There's little market left for people who just want to keep boxes inside a datacenter (or on prem) humming along. The ones that do want that also all seem to be wedded to MSPs, unfortunately
I have about 30 services at home, and they do not require maintenance. They just update on their own (Home Assistant failed once in the last 10 years), there is no dependency hell and I only need to maintain one OS.
I like to code and the idea to have "service as code" or rather "service as yaml" is great (though I hate yaml)
[1]: https://cloud.google.com/blog/products/serverless/knative-ba...
- https://www.parkytowers.me.uk/thin/hp/t620/
I didn't _need_ to, and it was a learning curve to set up that had me crying into my whisky some nights, but it's been running my various media server and development services rock solid for the past few years with no issues.
Sure, it's basically a fancy wrapper around a bunch of docker containers, and I use hardly any of the features which k8s brings to the party, but your cold hard logic won't win over the warm and fuzzy feelings I get knowing I did something stupid and it works!
While they worked fine for HTTP workloads, we wanted to use them to consume from Pub/Sub and unfortunately the "EventArc" integrations are all HTTP-push based. This means there's no back pressure, so if you want the subscription to buffer incoming requests while your daemons work away there's no graceful way to do this. The push subscription will just ramp up attempting to DoS your Cloud Run service.
GKE is more initial overhead (helm and all that junk) but using a vanilla, horizontally scaled Kubernetes deployment with a _pull_ subscription solves this problem completely.
For us, Cloud Run is just a bit too opinionated and GKE Autopilot seems to be the sweet spot.
Interesting. My assumption would be that Cloud Run should quickly* spin up more containers to handle the spike and then spin them down again. So there would be no need for back pressure? Guess it depends on the scale? How big of a spike are we talking about? :)
*Let's say a few seconds
Even if you don't, if you have other services (like a database) downstream, you might not want it to scale infinitely as then you're simply DoS'ing the DB instead of the Cloud Run service.
Backpressure is really important for the resiliency of any distributed system, IMO.
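An in-process analogy of the pull model, sketched with a bounded queue. This is not the Pub/Sub client API; `run_pipeline` and its parameters are hypothetical. The point is that the producer blocks when the buffer is full instead of pushing work at consumers faster than they can handle it:

```python
import queue
import threading


def run_pipeline(items, handle, max_in_flight=4, workers=2):
    """Process items with a fixed worker pool behind a bounded buffer.

    Sketch only: no error handling, and None is reserved as the stop signal.
    """
    buf = queue.Queue(maxsize=max_in_flight)  # bounded: put() blocks when full
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = buf.get()  # workers pull when they are ready
            if item is None:
                break
            out = handle(item)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for item in items:
        buf.put(item)  # backpressure: the producer waits here if workers fall behind
    for _ in threads:
        buf.put(None)  # one stop signal per worker
    for t in threads:
        t.join()
    return results
```

With a push model there is no equivalent of that blocking `put()`, which is why the sender keeps ramping up. Capping `workers` also bounds the load passed on to anything downstream, such as a database.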
Maybe the cloud companies could do something here by always keeping a small subset of machines online and ready to join the cluster. Provided there is some compromise in what the configuration is for the end user. I guess it doesn't solve image pulling. Pre-warming nodes is an annoying problem to solve.
Best solution I've been able to come up with is: Spegel (lightweight p2p image caching) + Karpenter (dynamic node autoscaling) + pods with low priority to hold onto some extra nodes. It's not perfect though
2. Apply appropriate changes to application resources (like parameters for spreading pods around)
3. Add descheduler[1] or similar tool to force redistribution of pods
4. Configure your cluster autoscaling params according to values from step (1) and have it autoscale before nodes are too heavily loaded.
Your cloud provider is already divvying up a racked server into your VPS's, via a hypervisor, then you install an OS on your pretend computer.
While I can see how containerized apps provide a streamlined devops solution for rare hard to configure software that needs to run on Acorn OS 0.2.3 only, it should never be the deployment solution for a public facing production web service.
Horses for courses.
1. Long running TCP connections. By default, Cloud Run terminates inbound TCP connections after 5 minutes. If you're doing anything that uses a long-running connection (e.g. a websocket), you'll want to change that setting, otherwise you will have weird bugs in production that nobody can reproduce locally. The upper limit on connections is 1 hour, so you will need some kind of reconnection logic on clients if you're running longer than that.
Ref: https://cloud.google.com/run/docs/configuring/request-timeou...
2. First/second generation. Cloud Run has 2 separate execution environments that come with tradeoffs. First generation emulates Linux (imperfectly) and has faster cold starts. Second generation runs on actual Linux and has faster CPU and faster network throughput. If you don't specify a choice, it defaults to first generation.
Ref: https://cloud.google.com/run/docs/about-execution-environmen...
3. Autoscaling. Cloud Run autoscales at 60% CPU and you can't change that parameter. You'll want to monitor your instance count closely to make sure you're not scaling too much or too little. For us it turned out to be more useful to restrict scaling on request count, which you can control in settings.
Ref: https://cloud.google.com/run/docs/about-instance-autoscaling
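For point 1 above, the client-side reconnection logic could look roughly like this. It is a generic sketch, not tied to any websocket library: `connect` and `handle` are hypothetical callables you supply, and the backoff parameters are assumptions. The idea is that a dropped connection (for example when the platform's timeout is reached) is treated as routine and retried with exponential backoff and jitter.

```python
import random
import time


def run_with_reconnect(connect, handle, max_attempts=5,
                       base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Keep a long-lived connection alive by reconnecting when it drops.

    connect() -> connection object (hypothetical, supplied by the caller)
    handle(conn) -> returns normally when the session is done,
                    raises ConnectionError when the connection is cut
    """
    failures = 0
    while True:
        try:
            conn = connect()
            failures = 0          # a successful connect resets the backoff
            handle(conn)
            return                # handle() finished cleanly, stop looping
        except ConnectionError:
            failures += 1
            if failures > max_attempts:
                raise             # give up after repeated failed connects
            # Exponential backoff, capped, with jitter to avoid reconnect storms.
            delay = min(max_delay, base_delay * 2 ** (failures - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

Because the failure counter resets after every successful connect, a connection that is cut once an hour just reconnects after roughly `base_delay`; only consecutive failed connection attempts escalate the delay and eventually raise.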
Similarly, you don't need Kubernetes, but if you want something that makes developers' lives easier, makes it easy to use a single API, and has many, many integrations and tooling, then you're better off with K8s. Sure, you can go with VMs, but now you have to scale and manage your application on a per-VM level instead of per container. You have to think about a lot of cloud-specific services, network policies, IAMs, I don't know what else, scaling.
I guess what I'm saying is, you always have the option of writing in Assembly, but why would you when you can have a higher-level language that abstracts most of it away? Yes, the maintenance burden on the devops/platform team is higher, but it's so much easier for users of the platform to use said platform.
I'd use Kubernetes even if I was spinning up a single VM and installing k3s on it. It's a universal deployment target.
Spinning up a cluster isn't the easiest thing, but I don't understand how a lot of the complaints around this come from sysadmin-type people who replace Kubernetes with spinning up VMs instead. The main complexity I've found from managing a cluster is the boring sysadmin stuff, PKI, firewall rules, scripts to automate. Kubernetes itself is pretty rock solid. A lot of cluster failure modes still result in your app still working. Etcd can be a pain but if you want you can even replace that with a SQL database for moderate sized clusters if that's more your thing (easier to manage HA control plane then too if you've already got a HA SQL server).
Or yes just use a managed Kubernetes cluster.
What's wrong with recommending a managed cluster? I wouldn't use one but it is certainly an option for teams that don't want to spin up a cluster from scratch, although it comes with its own set of tradeoffs.
My project at the moment is definitely easier thanks to Kubernetes as pods are spun up dynamically and I've migrated to a different cloud provider and since migrated to a mix of dedicated servers and autoscaled VMs, all of which was easy due to the common deployment target rather than building on top of a cloud provider specific service.
Generally the only issue was forgetting to update whatever you use to set up the resources, because the apiserver auto-updated the formats, to the point that in the worst case you could just grab them with kubectl get ... -o yaml/json and trim the read-only fields.
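A rough sketch of the "trim the read-only fields" step, assuming the object was exported as JSON. The list of fields is my assumption of the usual server-populated ones (`status` plus a handful of `metadata` keys); it is not exhaustive, and some resource kinds carry additional server-set fields in `spec`.

```python
import copy

# Metadata keys the API server fills in; assumed list, not exhaustive.
READ_ONLY_METADATA = (
    "uid",
    "resourceVersion",
    "generation",
    "creationTimestamp",
    "managedFields",
    "selfLink",
)


def trim_read_only(obj):
    """Return a copy of an exported Kubernetes object without server-set fields."""
    cleaned = copy.deepcopy(obj)      # leave the caller's object untouched
    cleaned.pop("status", None)       # status is written by controllers, not by you
    metadata = cleaned.get("metadata", {})
    for key in READ_ONLY_METADATA:
        metadata.pop(key, None)
    return cleaned
```

The result can be dumped back to JSON or YAML and checked into whatever repo holds your resource definitions, which is the "worst case" recovery path described above.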
I was there too, curious about it, and had posted some questions, though now I am not sure all of them were interesting.
But yeah, I also agree with this sentiment. Personally I would much rather optimize my code in the hope that it can run on a single machine (a VPS from someone like Hetzner) than face the dread of Kubernetes.
Though I also want to see Kubernetes, so maybe some day I am going to experiment with it, for the sake of experimentation, with something like kubectl or another minimalist single-computer approach to learn new things.
I also feel like I can't benchmark things unless it's done through Kubernetes like Anton Putra does, so I am not sure.
Disclaimer: I am the Dokku maintainer
>Moreover, Kubernetes’s slow autoscaling meant I had to over-provision services to ensure availability, paying for unused resources rather than scaling based on demand.
A typical Linux instance on AWS starts up in about 8 seconds from the start request to a command line, so let's double that. You could start up and shut down instances in 15 seconds.
Why the hell do you need to overprovision instances then, or leave empty ones running?
I don't see any other reasonable explanation than that it's in the best interest of cloud vendors not to have short-lived instances for the purposes of load balancing, and to make you consume as much CPU time as possible, even if you don't need it.
But the truth is that we ran service based architecture, network meshes and containers with bash just fine before cloud everything, usually with less effort than it is to do literally anything today. Sure you had to know how to set up network bonding and how to tune your systems.
Very, very few people and businesses _need_ Kubernetes or its cousins. Most just need a decent system administrator.
With IaC, all decisions are templated in the code and the replacement has full insight into the state of the machine.
I remember how kernel module params for ALG/bonding/teaming had to be figured out for each fancy NIC/driver, it was fun, but definitely not great. Of course this is mostly solved if you pay the cloud premium.
Most businesses need a product guy/gal, a website, then a developer, then later one VM or Heroku or something. And maybe if there's at least a few customers, yeah, it makes sense to think about ops (as in business ops), and then eventually sure, engineering needs to scale to solve the challenges, and that might mean getting a sysadmin.
I'm constantly told "I am not the typical customer, most do use K8s".
Side projects and volunteer things I run include 470 websites serving about 320K registered users and about 500K-1M monthly guest users... on about 15 VPS devices or small servers.
It's just classic dynamic websites, HTML is the output of a server (no single page web apps), lots of caching for guest users, reasonable complexity (load balancers ahead of web front ends ahead of API back ends ahead of databases).
Things that look like microservices exist... it's just API calls, and they come back through the front door, it's easy to reason about.
Monitoring is mostly Prometheus node exporter. It turns out that CPU + Memory + Disk IOPS + Network IOPS is 90%+ of what you need... some HTTP logs or profiles are the last 10%.
It's just simple... this is run in my spare time, effort is under an hour per month. (and no it's not monetised, not everything has to be)
It's a bit short-sighted (granted, not intentionally) to talk about k8s not being fit for purpose without specifying the workloads it's not suitable for and at what scale of operations. When these discussions come up, the argument is usually reduced to a stateless web app as the justification for illustrating the shortfalls of k8s as a solution.
Personally, I'm not a big fan of k8s because it can be complex, but the alternatives tend to be more complex. I can guarantee your solution will converge over time on operational knowledge trapped in the minds of your team members, runbooks and custom scripts that express an abstraction that k8s implements perfectly well.
So yes, if you have a simple setup you can use a "web" of docker servers.
But if your needs are just a bit more complex, or you have a mix of different technologies (Ruby on Rails, Python Flask, Java Spring Boot) and you want to standardize the communication, the security, the performance tracking and so on, K8s is the way.
The problem is the microservice architecture: the complexity is pushed to the infrastructure by the microservices, and K8s is a super-generic way to solve that complexity.
It is like unpacking (in an exploding-kitten mode) an application server on your intranet.
There is a huge tail of small websites where, if it goes down, someone can fix it and that's fine.
Wait... so you're saying, I can rent a couple of beefy boxes instead, and live a life where I don't need to know what TolerateDuringExecutionButNotDuringScheduling does?!
If you want to skip the devops work of setting up a cluster, go with managed K8s. All major cloud providers give you that option.
So, you replaced an open-source, standard cloud solution such as K8s with a proprietary cloud platform by Google. I mean, good for you, but for many of us this sounds backwards.
As a bootstrapped founder, going with K8s was the best decision ever: deployment, scaling, custom resources, jobs, cron, etc, all that stuff comes included and it costs writing a YAML file.
Do you have to learn something to use K8s? Yes, like with everything.
Is it difficult? No more difficult than any other solution out there.
Works but feels substantially weaker than on coding tasks. Not sure whether that’s a lack of training material or the lack of the execution flow that code has.
There are already people who use k8s who do not understand how it works, or even know how the app works inside k8s.
Adding AI to it is just asking for a disaster down the line.
... you spin knative on kubernetes cluster and copy over the actually-k8s-manifests over :P
Dark background and the text is the faintest of shades lighter.
I don't mean this in a nitpicky way. Normally I'm fine with most sites.
I have to highlight the text to read it.
?!
How else are you managing your infrastructure besides having redundancy in the control plane? Even if it’s a chef or puppet server, or even sysadmin Dave running scripts on his laptop, you should still have redundancy.
> Moreover, Kubernetes’s slow autoscaling meant I had to over-provision services to ensure availability, paying for unused resources rather than scaling based on demand.
??????
Slow compared to what exactly? Anecdotally, k8s with karpenter is significantly faster to scale than auto scaling groups.
People argue that it is expensive for zillions of users but I have never reached that scale