People often ask why I hardly ever have any prod issues (zero so far this year). This is part of the reason: having consistent codebases that are written in a specific style and implement things in a similar manner.
Some codebases make me feel like I’m reading a book in multiple languages …
I saw what you did there.
It also helps that we're still only in January!
In most cases the codebase does consist of multiple languages.
You don't know the 10 years of reasons behind why the code is the way it is, and the safest thing is to stay as close as possible to how the existing code is written, both to avoid landmines, and so that future you (or someone else) has one less peculiar style they have to figure out.
All that said, the more usual case is that the code is already a huge mess of different styles, because the 100 different developers who have touched it before you didn't follow this advice.
But that’s a judgment call based on experience.
I've worked on a project with absolutely terrible duplication of deserialisers of models, each one slightly different even though most properties were named the same and should've been handled the same. But we couldn't touch anything, because important things were happening in the business and we couldn't risk anything. The ignored part was that this led to bugs and extreme confusion for new people. They were even too worried to accept a repo-wide whitespace normalisation.
Of course, I knew that writing the code the “pretty”, more maintainable, easier to understand way wouldn’t take any longer to write, and might take less time as the refactoring requires less new code overall.
But I didn’t bother explaining all that. I just nodded, then went and implemented it the best way, since I was the experienced software engineer.
Engineers often adhere too rigidly to these principles rather than taking a pragmatic approach that balances existing practices with future improvements.
Progress and improvement is fine, great even, but consistency is more important. If you change a pattern, change it everywhere.
Document the old pattern, document the new pattern, discuss, and come up with a piece by piece plan that is easy to ship and easy to revert if you do screw things up.
Unless the old pattern is insecure or burns your servers, that is.
The downside to this that I've experienced more than once, though, is incomplete conversions: we thought we had agreement that the change should be done, it turned out to be more difficult than planned, it got partially completed, and then management had a new fire for us to fight. Resources were taken away, so you still have two or more ways of doing things.
I think the point is, either actually commit to doing that, or don’t introduce the new pattern.
Therein lies the rub. Everyone has a different idea of what is an improvement in a codebase. Unless there's some performance or security concern, I'd much rather work in an "old" style codebase that's consistent than a continually partially updated codebase by multiple engineers with different opinions on what an "improvement" is.
Yes, and consistency is the tie-breaker. So the status quo remains, and improvements aren't made.
The rule isn't "don't introduce change", it's "be consistent". Using the example from the post, if you want to use a different method of doing auth that's simpler, the "be consistent" rule means you must change the way auth is done everywhere.
Interestingly, if you do that the negatives he lists go away. For example, if the global auth mechanism handles bots specially, you will learn that if you are forced to change it everywhere.
ETA: upon reflection I'd consider that programmer a canonical example of the kinds of mistakes the author covers in the article.
My old boss used to say: "Be a chameleon. I don't want to know that I didn't write this."
Isn't part of good engineering trying to reduce your dependencies, even on yourself? In a later part of the post, OP says to be careful tweaking existing code, because it can have unforeseen consequences. Isn't this the problem that having deep vertical slices of functionality tries to solve? High cohesion in that related code is grouped together, and low coupling in that you can add new code to your feature or modify it without worrying about breaking everyone else's code.
Does this high cohesion and low coupling just not really work at the scale that OP is talking about?
Now, once you have a deeper understanding of the codebase, you'll know when and why to break away from existing patterns, but in the beginning phase, it's a good habit to start by learning carefully how things are designed and why.
Code-consistency is a property just like any other property, e.g. correctness, efficiency, testability, modifiability, verifiability, platform-agnosticism. Does it beat any of the examples I happened to list? Not a chance.
> worrying about breaking everyone else's code
You already said it, but just to expand: if you already have feature A, you might succeed in plumbing feature B through feature A's guts. And so on with feature C and D. But now you can't change any of them in isolation. When you try to fix up the plumbing, you'll now break 4 features at once.
So the only tweak I'd make here is that if you are tempted to copy a bit of code that is already in 100 places, but with maybe 1% of a change - please, for the love of god, make a common function and parameterize out the differences. Pick a dozen or so instances throughout the codebase and replace them with your new function, validating the abstraction. So begins the slow work of improving an old code base created by undisciplined hands.
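A minimal sketch of what that consolidation can look like, in Python for illustration (the function, field names, and defaults are all invented): the near-identical copies collapse into one helper, with the 1% difference pulled out as parameters.

```python
# Before: dozens of near-identical parsing snippets, each differing only in
# a field name and a default. After: one parameterized helper.

def parse_amount(record, field="amount", default=0.0):
    """Shared replacement for the copy-pasted parsing snippets."""
    raw = record.get(field, default)
    try:
        return round(float(raw), 2)
    except (TypeError, ValueError):
        return default

# Each former copy becomes a one-line call with its own parameters:
order_total = parse_amount({"amount": "19.999"})                 # -> 20.0
refund = parse_amount({"refund_amt": "5"}, field="refund_amt")   # -> 5.0
missing = parse_amount({}, default=-1.0)                         # -> -1.0
```

Replacing a dozen call sites first, before touching the other 88, is what validates that the parameters actually cover the real variation.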
Oh, and make sure you have regression tests. The stupider the better. For a given input, snapshot the output. If that changes, audit the change. If the program only has user input, consider capturing it and playing it back, and if the program has no data as output, consider snapshotting the frames that have been rendered.
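A sketch of a "stupid" snapshot test in Python (the rendering function and snapshot filename are invented): the first run records current behaviour, warts and all; every later run fails on any change so it can be audited.

```python
import json
import pathlib
import tempfile

def render_report(data):
    # Stand-in for whatever legacy function you're protecting (invented).
    return {"total": sum(data), "count": len(data)}

# Snapshot lives next to the tests in real life; tempdir here for the sketch.
SNAPSHOT = pathlib.Path(tempfile.gettempdir()) / "render_report.snapshot.json"

def test_snapshot():
    output = render_report([1, 2, 3])
    if not SNAPSHOT.exists():
        # First run: record current behaviour, warts and all.
        SNAPSHOT.write_text(json.dumps(output, sort_keys=True))
    # Every later run: any change in output fails the test and gets audited.
    assert output == json.loads(SNAPSHOT.read_text())

test_snapshot()
```

The same shape works for captured user input replayed against the program, with rendered frames as the snapshotted "output".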
If we just create copies of copies forever, products degrade slowly over time. This is a problem in a few different spheres, to put it lightly.
The main rule is a good one, but the article overfocuses on it.
People see the ugliness -- because solving real problems, especially if business practices are involved, is often very messy -- but that's where the value is.
It's probably used in the (now) classic sense as defined by M. Feathers in his "Working with legacy code" book.
Code that is old but otherwise awesome, maintainable (or even actively maintained), and easy / a joy to work with is rarely referred to as "legacy code".
But you will quickly learn how awesome the old code base is if you attempt to rewrite it, and realize everything the old code base takes into account.
Also, there will be four incomplete refactorings, and people will insist on new code matching the latest refactoring attempt. Which will then turn out to be impossible, as it's too unfinished.
For example, it could be a lot of individual small projects all sitting on some common framework. Just as an example: I've seen a catering business that had an associated Web site service which worked as follows. There was a small framework that dealt with billing and navigation etc. issues, and a Web site that was developed per customer (couple hundreds shops). These individual sites constituted the bulk of the project, but outside of the calls to the framework shared nothing between them, were developed by different teams, added and removed based on customer wishes etc. So, consistency wasn't a requirement in this scheme.
Similar things happen with gaming portals, where the division is between some underlying (and relatively small) framework and a bunch of games that are provided through it, which are often developed by teams that don't have to talk to each other. But, to the user, it's still a single product.
This resonates. At one former company, there was a clear divide between the people working on the "legacy monolith" in PHP and the "scalable microservices" in Scala/Go. One new Scala team was tasked with extracting permissions management from the monolith into a separate service. Was estimated to take 6-9 months. 18 months later, project was canned without delivering anything. The team was starting from scratch and had no experience working with the current monolith permissions model and could not get it successfully integrated. Every time an integration was attempted they found a new edge case that was totally incompatible with the nice, "clean" model they had created with the new service.
Yep and that's what I've seen be successful: someone who really knows the existing code inside and out, warts and all, needs to be a key leader for the part being broken out into a separate system. The hard part isn't building the new system in these projects, it's the integration into the existing system, which always requires a ton of refactoring work.
Indeed, the developer was one of the best programmers I've known and absolutely the key person on the system. The New Guys Clique were the sort of developers, you might know some, who come in, look at the existing systems, decide it's all wrong and terrible, and set out to Do It Right.
His proposed architecture wasn't without elegance, but it was also more complex and, more importantly, it didn't solve any problems that we actually had. So in the end it was more of an ideological thing.
He seemed to take it personally that we didn't take him up on his proposal, and he left a few months later (and went back to his previous employer, though to a different group). Don't think he was around for even a year. He wasn't young either; he was pretty experienced.
You can do successful rewrites but your rewrite has to be usable in production within like a month.
If you don’t know how to achieve that, don’t even try.
The quiet developer was able to get their own rewrite done because they understood that.
Looks like the director of engineering showed some classic inexperience. You can tell when someone has done something before and when it’s their first time.
And it was never constrained to rewriting the existing system. The rewrite plan was motivated by the entirely reasonable desire to make further improvements possible; an additional mistake was the attempt to add major improvements as part of the rewrite. The new guys made their disdain for the existing system obvious, to the extent that their intent for the rewrite ballooned into a ground-up rebuild of everything.
Things You Should Never Do, Part I: https://www.joelonsoftware.com/2000/04/06/things-you-should-...
I strongly disagree with this, and it reminds me of one of the worse Agile memes: "With every commit, the product must be production-ready.". [0]
The rewrite has to be generally not behind schedule. Whatever that schedule is is up to the folks doing the work and the managers who approve doing the work.
[0] I've worked for an Agile shop for a long time, so please don't tell me that I'm "Doing Agile Wrong" or that I misunderstand Agile. "No True Scotsman" conversations about Agile are pretty boring and pointless, given Agile's nebulous definition.
But there's no way to really describe it. It's like explaining to someone how to parallel park or do a kickflip... you can only explain it so much.
I like to say "it should be usable in production soon" because it's generally a good approximation that takes into account what you might have to work with. It's an upgrade from advice like Joel's, which just says "IT NEVER WORKS".
What you originally said ("It must be usable in production within a month") is equivalent to "Just don't do it, because IT NEVER WORKS" for all but the smallest, simplest projects out there in Professional Programmer land. [0]
> But there's no way to really describe it.
There really is a way to describe it:
"The rewrite has to be generally not behind schedule. Whatever that schedule is is up to the folks doing the work and the managers who approve doing the work."
Establishing that schedule is the same sort of cost/benefit and expected-level-of-difficulty analysis that should be done before planning any nontrivial work in Professional Programmer land. All but the most green or most sheltered-from-Process programmers are at least aware of this analysis. Many of those who are aware of it have participated in it.
[0] Or the most well-designed projects, which have small, easily understood pieces with easily-comprehensible interactions with the rest of the system, that can be quickly and easily replaced with new pieces. There's not much out there like that... and I'd imagine the task of "making major changes to how many of those pieces interact with each other" wouldn't be usable in production in a month for most of those systems.
But, man, sometimes software is fit-for-purpose and can really just be left alone for extended periods. Other times, the users of that software upgrade on a semi-annual or annual schedule (or even LESS frequently), so they'd never notice a three month delay in new releases.
I once came into an old codebase like this as a junior, thinking I could start again. And I was gently but firmly told by my boss that it wouldn't work: this software is crucial to operations that support all our revenue, and while it can be improved, it has to keep working. And we improved the hell out of it.
Rewrites from scratch never work with sufficiently large systems, and anyone that’s been involved with these things should be savvy enough to recognize this. The only question is around the exact definition of sufficiently large for a given context.
One of the benefits is that you get to compare the new vs old implementation quickly and easily, and it lets you raise questions about the old implementation; every time I've done this I've found real, production-impacting bugs because the old system was exhibiting behaviors that didn't match the new system, and it turned out they weren't intentional!
[0] http://sevangelatos.com/john-carmack-on-parallel-implementat...
If it’s a big new team that doesn’t know what they’re doing, working separately from existing codebase, with lots of meetings… I see no reason why it would finish at all.
The devs were good developers, too! Two people on the team went off to Google after, so it's not like this was due to total incompetence or anything; more just overambition and lack of familiarity with working on legacy code.
Bluntly, yes. And so is every other reply to you that says "no this isn't naive", or "there's no reason this project shouldn't have finished". All that means is that you've not seen a truly "enterprise" codebase that may be bringing in tons of business value, but whose internals are a true human centipede of bad practices and organic tendrils of doing things the wrong way.
Currently there. On one hand: lot of old code which looks horrible (the "just put comments there in case we need it later" pattern is everywhere). Hidden scripts and ETL tasks on forgotten servers, "API" (or more often files sent to some FTP) used by one or two clients but it's been working for more than a decade so no changing that. On the other: it feels like doing archeology, learning why things are how they are (politics, priority changes over the years). And when you finally ship something helping the business with an easier to use UI you know the effort was not for nothing.
Everything takes longer than you think, and this sounds like it involved at least 2 teams (the PHP team and the Scala team). Every team you include increases the timeline factorially in the best case.
It takes a lot of time to have meetings with a dozen managers to argue over priority and whatever. Especially since their schedules are full of other arguments already
Yes.
edit: job = ticket task
Anyway, it doesn't sound like that was a very mature project or team, not when the reviewer decided to just edit the code instead of providing a review.
Outside of that, the style guide is law.
"dotnet format" can do wonders, and solved most serious inconsistency issues.
A lot of it boils down to "because the people writing code parsers/lexers weren't thinking about usability". Writing a C formatter, for example, depends on having a parser that doesn't behave like a compiler by inlining all your include files and stripping out comments. For a long time, writing parsers/lexers was the domain of compiler developers, and they weren't interested in features which weren't strictly required by the compiler.
Another effect of those improvements, incidentally, has been higher quality syntax errors.
The instinct to keep doing things the wrong way because they were done the wrong way previously is strong enough across the industry without this article.
I love to
> take advantage of future improvements.
However, newer and better ways of doing things are almost invariably inconsistent with the established way of doing things. They are dutifully rejected during code review.
My current example of me being inconsistent with our current, large, established database:
Every "unit test" we have hits an actual database (just like https://youtu.be/G08FxxwPjXE?t=2238). And I'm not having it. For the module I'm currently writing, I'm sticking the reads behind a goddamn interface so that I can have actual unit tests that will run without me spinning up and waiting for a database.
As for starting databases during tests, it's saved me a lot of trouble over the years. One time, we used sqlite for tests and Postgres for production. We had some code that inserted like `insert into foo (some_bool) values ('t')` and did a query like `select * from foo where some_bool='true'`. This query never matched rows in the tests, because t != true in SQLite, but t == true in Postgres. After that, I found it easier to just run the real database that's going to be used in production for tests. The only thing that behaves identically to production is the exact code you're running in production.
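The SQLite half of that mismatch can be reproduced with the standard library (a sketch; the Postgres side, where both 't' and 'true' coerce to boolean true, obviously needs a real Postgres):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (some_bool)")  # SQLite has no real boolean type
conn.execute("INSERT INTO foo (some_bool) VALUES ('t')")

# In SQLite this is a plain string comparison, so 't' != 'true': zero rows.
# In Postgres, both literals parse as boolean true and the row matches.
rows = conn.execute("SELECT * FROM foo WHERE some_bool = 'true'").fetchall()
print(len(rows))  # prints 0
```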
Over here, I have code that uses a hermetic Postgres binary (and chain of shared libraries because Postgres hates static linking) that starts up a fresh Postgres instance for each test. It takes on the order of a millisecond to start up: https://github.com/jrockway/monorepo/blob/main/internal/test.... The biggest problem I've had with using the "real" database in tests is low throughput because of fsync (which `perf` showed me when I finally looked into it). Fortunately, you can just disable fsync, and boy is it fast even with 64 tests running in parallel.
One thing that's been slow in the past is applying 50 migrations to an empty database before every test. When you have one migration, it's fast, but it's one of those things that starts to slow down as your app gets big. My solution is to have a `go generate` type thing that applies the migrations to an empty database and pg_dumps resulting database to a file that you check in (and a test to make sure you remembered to do this). This has two benefits; one, tests just apply a single SQL file to create the test database, and two, you get a diff over the entire schema of your database for the code reviewer to look at during code reviews. I've found it incredibly useful (but don't do it for my personal projects because I've been lazy and it's not slow yet).
Overall, my take on testing is that I like an integration test more than a unit test. I'd prefer people spend time on exercising a realistic small part of the codebase than to spend time on mocks and true isolation. This is where a lot of bugs lie.
Of course, if you are writing some "smart" code and not just "glue" code, you're going to be writing a lot of unit tests. Neither replaces the other, but if you can spend 30 seconds writing a test that does actual database queries or 2 weeks mocking out the database so the test can be a unit test instead of an integration test, I'd tell you to just write the integration test. Then you know the real code works.
The actual things are IO devices, and will sometimes fail and sometimes succeed. No judgement, just a fact of life.
I code my tests such that my logic encounters successes, timeouts, exceptions, thread-cancellations, etc. All at unit-test speed.
I can't trick an MSSQL deployment into returning me those results.
It doesn't take 30 seconds to test what your system will do in 30 seconds.
At scale, there will always be challenges with latency, throughput, correctness, and cost of persisting and retrieving data that require considering the specifics of your persistence code.
The service I’m describing handles abstracting these persistence concerns so other services can be more stateless and not deal with those issues.
Your example is a good example; you call it a unit test, but if it hits a real database it's by definition an integration test. No mocked database will be as accurate as the real deal. It'll be good enough for unit tests (amortize / abstract away the database), but not for an integration test.
I test my units more thoroughly than integrations allow. Make the db return success, failure, timeout, cancellation, etc.
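A sketch of that kind of fault injection in Python (all names invented): a test double that can be told to time out or fail, so the logic under test sees every outcome at unit-test speed.

```python
class FlakyStore:
    """Test double that fails on demand -- no real database deployment can be
    tricked into producing a timeout this reliably (invented example)."""
    def __init__(self, outcome="ok"):
        self.outcome = outcome

    def fetch(self, key):
        if self.outcome == "timeout":
            raise TimeoutError("simulated 30s timeout, delivered instantly")
        if self.outcome == "failure":
            raise ConnectionError("simulated connection drop")
        return {"key": key}

def fetch_with_fallback(store, key, default=None):
    """Logic under test: survive timeouts and connection failures."""
    try:
        return store.fetch(key)
    except (TimeoutError, ConnectionError):
        return default
```

Each branch of the error handling gets exercised in microseconds, which is the point: it doesn't take 30 seconds to test what your system will do in 30 seconds.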
One of my colleagues was trying to prevent a race condition towards the end of last year. He wanted the first write to succeed, and the second to be rejected.
I suggested "INSERT IF NOT EXISTS". We agreed that it was the best approach but then he didn't put it in because the codebase doesn't typically use raw SQL.
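For illustration, here is the idea in standard-library SQLite, where the spelling is `INSERT OR IGNORE` (Postgres spells it `INSERT ... ON CONFLICT DO NOTHING`); the table and names are invented. Either way, the database arbitrates the race instead of application code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id TEXT PRIMARY KEY, owner TEXT)")

def claim(owner):
    # Whichever writer commits first wins; the second insert is a no-op.
    cur = conn.execute(
        "INSERT OR IGNORE INTO jobs (id, owner) VALUES ('job-1', ?)", (owner,))
    return cur.rowcount == 1  # True only for the winning writer

first = claim("worker-a")   # True: row inserted
second = claim("worker-b")  # False: row already exists, write rejected
```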
Time spent writing good unit tests today allows you to make riskier changes tomorrow; good unit tests de-risk refactors.
Therefore, I see unit tests as one pillar, but I also suspect that without good-quality integration or end-to-end testing you won't be able to realize the riskier refactors you describe. Perhaps you consider these part of your regression testing, and if so, I agree.
In this scenario, I've found that the only productive way forward is to do the best job you can, in your own isolated code, and share loudly and frequently why you're doing things your new different way. Write your code to be re-used and shared. Write docs for it. Explain why it's the correct approach. Ask for feedback from the wider engineering org (although don't block on it if they're not directly involved with your work.) You'll quickly find out if other engineers agree that your approach is better. If it's actually better, others will start following your lead. If it's not, you'll be able to adjust.
Of course, when working in the existing code, try to be as locally consistent as possible with the surrounding code, even if it's terrible. I like to think of this as "getting in and out" as quickly as possible.
If you encounter particularly sticky/unhelpful/reticent team members, it can help to remind them that (a) the existing code is worse than what you're writing, (b) there is no documented pattern that you're breaking, (c) your work is an experiment and you will later revise it. Often asking them to simply document the convention that you are supposedly breaking is enough to get them to go away, since they won't bother to spend the effort.
On the other hand if no one in your company cares about consistency, at some point everything becomes so awful you basically won't be able to retain engineers or hire new ones, so this is a place where careful judgement is needed.
On the other hand, data and logic consistency can be really important, but you still have to pick your battles because it's all tradeoffs. I've done a lot of work in pricing over the decades, and it tends to be an area where the logic is complex and you need consistency across surfaces owned by many teams, but at the same time it will interact with local features that you don't want to turn pricing libraries/services into god objects as you start bottlenecking all kinds of tangentially related projects. It's a very tricky balance to get right. My general rule of thumb is to anchor on user impact as the first order consideration, developer experience is important as a second order, but many engineers will over-index on things they are deeply familiar with and not be objective in their evaluation of the impact / cost to other teams who pay an interaction cost but are not experts in the domain.
A couple days later I am told this is not the way to do X; you must do it Y. Why Y? Because of historical battles won and lost, not because of a specific characteristic. My PR doesn't work with Y, and making it work would be more complicated... who knows what multiplier of code. Well, that makes it a harder task than your estimate, which is why nobody ever took it up before and why everyone was really excited about your low estimate.
How does Y work? Well it works specifically to prevent features like X. How am I supposed to know how to modify Y in a way that satisfies the invisible soft requirements? Someone more senior takes over my ticket, while I'm assigned unit tests. They end up writing a few hundred lines of code for Y2.0 then implement X with a copy paste of a few lines.
I must not be "a good fit". Welcome to the next 6-12 months of not caring about this job at all, while I find another job without my resume starting to look like patchwork.
Challenging people's egos by providing a simpler implementation for something someone says is very hard has been effective at getting old, stagnant issues completed. Unnaturally effective. Of course, those new "right way" features are just as ugly as any existing feature, ensuring the perpetuation of the code complexity. Continually writing themselves into corners they don't want to mess with.
These topics are common knowledge if you have interviewed in the last 5 to 10 years. I have been working for 25, so I find the attempts by some to redirect blame misguided.
"Why do I have to use the system button class. I implemented my own and it works."
"Because when the OS updates with new behavior your button may break or not get new styling and functionality"
"But this works and meets the spec, that's 10x harder"
The hard part of being an engineer is realizing that sometimes even when something is horribly wrong people may not actually want it fixed. I've seen systems where actual monetary loss was happening but no one wanted it brought to light because "who gets blamed"
(Of course, if you carry that principle to the extreme you end up with a lot of black-box networked microservices.)
Not really my experience in teams that create inconsistent, undocumented codebases... but you might get 1 or 2 converts.
```
let susJsonString = '...' // we get this parseable json string from somewhere,
                          // but of course it might not be parseable,
                          // so testing seems warranted...
try { // lets bust out a while loop!
  while (typeof susJsonString === 'string') {
    susJsonString = JSON.parse(susJsonString)
  }
} catch {
  susJsonString = {}
}
// also this was a typescript codebase, but all the more reason to have a
// variable switch types! this dev undoubtedly puts typescript at the top
// of their resume
```
I suppose this works?! I haven't thought it through carefully; it's like deciding to put your shoes on backwards and open doors while standing on your head. But I decided to just keep out of it, not get involved in the politics. I guess this is what getting old is like: seriously, you just see younger people doing stuff that makes your jaw drop from the stupidity (or maybe it's just me), but you can't say anything, because reasons. Copilot and AI-assisted coding only further muddy the waters imo.
Typescript is not going to make it better.
The problem is whoever is producing the data.
I suppose what is wanted is something like
```
let parsedJSON = {}
try {
  parsedJSON = JSON.parse(susJsonString)
} catch {
  // maybe register problem with parsing.
}
```
```
JSON.parse(JSON.parse("\"{foo: 1}\""))
```
I'd guess the problem is something upstream.

```
let susJsonString = JSON.stringify(JSON.stringify(JSON.stringify({foo: 1})))

console.log("initial:", susJsonString);
try {
  while (typeof susJsonString === 'string') {
    susJsonString = JSON.parse(susJsonString);
    console.log("iteration:", typeof susJsonString, susJsonString);
  }
} catch {
  susJsonString = {};
}
```

I see:

```
initial: "\"{\\\"foo\\\":1}\""
iteration: string "{\"foo\":1}"
iteration: string {"foo":1}
iteration: object {foo: 1}
```
A comment explaining the sort of "sus" input it was designed to cope with may have been helpful.

```
while (typeof susJsonString === 'string') {
  susJsonString = JSON.parse(susJsonString);
}
```

as it'll keep reassigning and parsing until it gets a non-string back (or alternatively error out if the string is not valid JSON).

```
let susJsonString = '...'
```
example
but evidently it is not just that it is serialized multiple times, otherwise it shouldn't need the try catch (of course one problem with online discussion of code examples is you must always assume, contra obvious errors, that the code actually needs what it has)
Something upstream, sure, but often not something "fixable" either, given third parties and organizational headaches some places are prone to.
The blanket catch is odd though, as I'd have thought that it would still be outputting valid json (even if it has been serialized multiple times), and if you're getting invalid json you probably want to know about that.
It’s applying the operation recursively.
Please take a lesson from this. Good code is not the one that follows all the rules you read online. Your coworker you dismissed understood the problem.
I think asking questions is ideal. Even when I'm 99% sure a line is blatantly wrong, I will ask something like, "What is this for?". Maybe I missed something - wouldn't be the first time.
I reserve my general opinion on the quality of this coder's work, as evidenced by the quality of the app itself among other things. But I guess you'd have to just trust (or not trust) me on that.
But the opinion what makes code good differ a lot between software developers. This exactly leads to many of the inconsistencies in the code.
Not to say I got everyone to march to my drum -- the "best practices" was a shared effort. As you said, sometimes it just takes someone to call things out. We can do things better. Look at how things improve if you approach X problem in Y manner, or share Z code this way. Maybe the team was overwhelmed before and another voice is enough to tip the scales. If you don't try, you'll never know.
in my scenario, those people were gone.
This has also been my experience. Usually there is a "top" sticky/unhelpful/reticent person. They are not really a director or exec, but they often act like it and seem immune from any repercussions from the actual higher-ups. This person tends to attract "followers" who know they will keep their jobs if they follow the sticky person, for job security. There are usually a few up-and-coming people who want better and will kinda go along with you for their own skill-building benefit, but it's all very shaky and you can't count on them supporting you if resistance happens.
I've literally had the "I was here before you and will be after" speech from one of the "stickies" before.
All these HN how to do better write ups seem to universally ignore the issues of power and politics dynamics and give "in a vacuum" advice. Recognizing a rock and a hard place and saving your sanity by not caring is a perfectly rational decision.
The response is accurate - anyone that's had to deal with a legacy code base has had to deal with the creators of said bird's nest (who proudly strut around as though the trouble it causes for maintainability makes them "clever").
The only other way I have succeeded is to appeal to the sticky person's ego, make them think that it's their idea.
Note: I have also had to deal with
Sticky person: Do it this way
Me: But X
Sticky Person: No, do it the way I have decreed
[...]
Three hours later (literally)
Sticky Person: Do it X way
My first day, I couldn't even stand the code base up on my local dev environment, because there were so many hard-coded paths throughout the application, it broke (they were unwilling to fix this or have me fix it).
I tried to accept their way of coding and be part of the team, but it got too much for me. They were staunch SVN supporters. This isn't much of a problem, but we had constant branching problems that Git would have resolved.
As I got assigned work, I noticed I would have to fix more bugs and bad coding, before I could even start the new addition/feature. It was riddled with completely obvious security vulnerabilities that were never fixed. Keep in mind that this was the new product of the entire company with paying customers and real data.
The team lead was also very insecure. I couldn't even nicely mention or suggest fixes in code that he had written. The interesting thing is that he didn't even really have a professional coding background. He went straight from tech support to this job.
I lasted about a year. I got let go due to 'money issues'. Shortly before this, they wanted me to merge the Jr. developer's code into my branch right before my vacation (literally the day before).
I merged it and pushed it up to the repo (as instructed) and the team lead sent me nasty emails throughout my vacation about how various parts of my code 'didn't work'. Not only were these parts the Jr.'s code, they weren't ready for production.
The other thing to know about the team lead is that he was extremely passive aggressive and would never give me important project details unless I asked (I'm not talking fine details, just high-level: what needed to be completed).
We had a call where he told me I 'wasn't a senior developer'. I wanted to tell him to fuck off, but I needed the job. The company went out of business 2 months later.
I found out their entire business model relied only on Facebook Ads, and they got banned for violating their rules.
These sorts of people will vote for you publicly. However, some of them will still take the path of least resistance when you aren't looking.
It was sort of a nasty surprise when I figured out one day that there are people in this industry who will agree with high minded sentiments in public but not lift a finger to get there. I ended up in a group that had two or three of them. And one day, due to a requirements process fuckup, we had a couple of weeks with nothing to do. They just did the Hands Are Tied thing I'd been seeing for over a year (yes we should do X but we have to do Y for reasons) and I saw red. Luckily I was on a conference call instead of sitting in front of them at that moment. But I'm sure they heard the anger in my voice over the phone.
If the boss doesn’t give you an assignment, you work on tech debt they haven’t previously insisted that you work on. Simple as that. At most places if my boss disappeared, I could keep busy for at least three months without any direction. And keep several other people busy as well. If you don’t know what to work on then I don’t know what’s wrong with you.
I know (most?) people don't mean it literally when writing something like this but I still wonder why such self-evident ideas as "make things easy to use correctly and hard to use incorrectly" are framed in terms of "idiots who don't rtfm".
The best documentation is what wasn't written because it (actually!) wasn't needed. On the other hand, even if people aren't "idiots", they still make mistakes and take time to figure out (perhaps by reading tfm) how to do things and complete their tasks, all of which has a cost. Making this easier is a clear benefit.
Now you have N+1 ways.
It can work if you manage to get a majority of a team to support your efforts, create good interfaces into the legacy code paths, and most importantly: write meaningful and useful integration tests against that interface.
Michael Feathers wrote a wonderful book about this called, Working Effectively with Legacy Code.
I think what the author is trying to say with consistency is to avoid adding even more paths, layers, and indirection in an already untested and difficult code base.
Work strategically, methodically, and communicate well as you say and it can be a real source of progress with an existing system.
My (rather unfortunate) conclusion is that when I encounter this behavior I move to another team to avoid it. If that’s not possible it’s honestly worth looking for another job.
If there are 5 different standards in the codebase, don't just invent your own better way of doing things. That is literally the xkcd/Standards problem. Go find one of the people who have worked there the longest and ask which of the 5 existing standards are most modern and should be copied.
And as you get more experience with the codebase you can suggest updates to the best standard and evolve it. The problem is that you then need to own updating that whole standard across the entire codebase. That's the hard part.
If you aren't experienced enough with the codebase to be aggressive about standardization, you shouldn't be creating some little playground of your own.
I strongly disagree with you and believe you've missed the point of my comment. Think about this: why are there 5 different standards in the codebase, none of which meet your needs? Do you think any engineers on the team are aware of this situation? And how might you get more experience with the codebase without writing code that solves your problems?
Standards evolve over time, as do the languages and frameworks. Old code is rarely rewritten, so you end up with layers of code like geological strata recording the history of the developer landscape.
There’s a more complicated aspect of “Conway’s law, but over time” that’s hard to explain in a comment. And anyway, Casey Muratori did it better: https://youtu.be/5IUj1EZwpJY?si=hnrKXeknMCe0UPv4
In this situation, ‘getting more experience in the code base’ is more or less synonymous with ‘getting paged on the weekend’.
Yes, there probably are. If you haven't been working there for long enough to know who they are, then you shouldn't be YOLO'ing it.
The fact that it hasn't all been cleaned up yet is due to that being an order of magnitude harder than greenfielding your own standard. That doesn't mean that nobody is aware of it, or working on it.
I've absolutely worked for a decade on a codebase which had at least 5 different standards, and I was the one responsible for cleaning it all up, and we were understaffed so I could never finish it, but I could absolutely point you at the standard that I wanted you to follow. It also probably was somewhat deficient, but it was better than the other 4. It evolved over time, but we tried to clean it all up as we went along. Trying to ram another standard into the codebase without talking it over with me, was guaranteed to piss me off.
A lot of inconsistency is the result of unwillingness to fix other people's stuff. If your way is better, trust people to see it when applied to their own code. They probably have similar grievances, but it has never been a priority to fix. If you're willing to spend the time and energy, there's a good chance they'll be willing to accept the results even if it does cause some churn and require new learning.
(Source: I have worked on Firefox for a decade now, which fits the criteria in the article, and sweeping changes that affect the entire codebase are relatively common. People here are more likely to encourage such thinking than to shoot it down because it is new or different than the status quo. You just can't be an ass about it and ignore legitimate objections. It is still a giant legacy codebase with ancient warts, but I mostly see technical or logistical obstacles to cleaning things up, not political ones.)
> Hopefully, you have a monorepo or something with similar effects, and a lack of fiefdoms
ah to be so lucky...
> A lot of inconsistency is the result of unwillingness to fix other people's stuff
Agree, so we find it best to practice "no code ownership" or better yet "shared code ownership." So we try to think of it all as "our stuff" rather than "other people's stuff." Maybe you just joined the project, and are working around code that hasn't been touched in 5 years, but we're all responsible for improving the code and making it better as we go.
That requires a high trust environment; I don't know if it could work for Firefox where you may have some very part-time contributors. But having documented standards, plus clang-format and clang-tidy to automate some of the simpler things, also goes a long way.
Ironically, that's why it works for Firefox. Contributors follow a power law. There are a lot of one-shot contributors. They'll be doing mostly spot fixes or improvements, and their code speaks for itself. Very little trust is needed. We aren't going to be accepting binary test blobs from them. There are relatively few external contributors who make frequent contributions, and they've built up trust over time -- not by reporting to the right manager or being a friend of the CTO, but through their contributions and discussions. Code reviews implicitly factor in the level of trust in the contributor. All in all, the open nature of Firefox causes it to be fundamentally built on trust, to a larger extent than seems possible in most proprietary software companies. (There, people are less likely to be malicious, but for large scale refactoring it's about trusting someone's technical direction. Having a culture where trust derives from contribution not position means it's reasonable to assume that trusted people have earned that trust for reasons relevant to the code you're looking at.)
There are people who, out of the blue, submit large changes with good code. We usually won't accept them. We [the pool of other contributors, paid or not] aren't someone's personal code maintenance team. Code is a liability.
> But having documented standards, plus clang-format and clang-tidy to automate some of the simpler things, also goes a long way.
100% agree. It's totally worth it even if you disagree with the specific formatting decisions made.
Nice! We're still small so we somehow can keep that level of trust, but I always worry about how things may change for the worse as we grow. Mimicking the open source model as much as we can, even within a small private company, has worked well for us so far.
> > ... clang-format and clang-tidy to automate some of the simpler things, also goes a long way.
> 100% agree. It's totally worth it even if you disagree with the specific formatting decisions made.
So true! 5-6 years ago we had to make open source contributions to both clang-format and clang-tidy for several months to get them to support closer to our preferred style before we could get the "ok, close enough" buy-in across the company to implement automated formatting. (Mostly bug fixes for evidently rare flag combinations, but also a few small new features.)
In retrospect it was completely unnecessary - simply relying on automated formatting is sooo much better than any specifics of the formatting. I'm still glad we did though, as it made both tools better. We earned the maintainers' trust with a few early PRs, and remained active contributors for a while, but haven't contributed much lately.
(Posted on Firefox mobile... Thanks!)
As I understand it, there is a balance between refactoring and adding new features. It's up to the engineers to find a way to do both. Isn't it also fair if engineers sometimes push back on management? Shouldn't a civil engineer speak up if he/she thinks the bridge is going to collapse with the current design?
Neither do customers.
The product is an asset. Code is a liability.
— A minimal 30 LoC devserver function would serve a file from outside the current directory on a developer's machine, if said developer entered a crafty path in the browser. The bot suggested a fix that would almost double the line count.
— A regex does not handle backslashes when parsing window.location.hostname (note: not pathname), in a function used to detect whether a link is internal (for statically generated site client-side routing purposes). The suggested fix added another regular expression in the mix and generally made that line, already suffering from poor legibility due to involving regular expressions in the first place, significantly more obscure to the human eye.
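For context, the class of bug in the devserver case above doesn't need to double the line count to fix: the standard guard is to resolve the requested path and check it stays under the served root. A minimal sketch in Python (hypothetical code, not the actual codebase in question):

```python
import os
from typing import Optional

def safe_resolve(root: str, requested: str) -> Optional[str]:
    """Resolve a requested URL path against the server root,
    rejecting anything that escapes it (e.g. '../../etc/passwd')."""
    # Normalize both paths to absolute, symlink-free form before comparing.
    root = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root, requested.lstrip("/")))
    # The resolved path must still sit inside the root directory.
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate
```

The devserver handler would then serve the file only when `safe_resolve` returns a path, and 404 otherwise.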
Here’s the fun thing: if I were concerned about my career and job security, I know I would implement every damn fix the bot suggested and would rate it as helpful. Even those that I suspect would hurt the project by making it less legible and more difficult to secure (and by developers spending time on things of secondary importance) while not addressing any actual attack vectors or those that are just wrong.
Security is no laughing matter, and who would want to risk looking careless about it in this age? Why would my manager believe that I, an ordinary engineer, know (or can learn) more about security than Github’s, Microsoft’s most sophisticated intelligence (for which the company pays, presumably, some good money)? Would I even believe that myself?
If all I wanted was to keep my job another year by showing increased output thanks to all the ML products purchased by the company, would I object to free code (especially if it is buggy)?
Engineering is fulfilling requirements within constraints. Good custom code might fit the bill. Bad might, too - unless it’s a part of requirements that it shouldn’t be bad. It usually isn’t.
That hasn't always been the case; it came to be because people have died... Is anyone going to die if your codebase is an unmaintainable mess?
If VCs ever came to expect less than 90% of their investments to essentially go to zero, maybe that would change. But they make enough money off of dumb luck not leading to fatal irreversible decisions often enough to keep them fat and happy.
Two: All the stuff we aren't working on because we're working on stupid shit in painful ways is substantial.
When children do this it’s called Bidding. It’s supposed to be a developmental phase you train them out of. If Mom says no the answer is no. Asking Dad after Mom said no is a good way to get grounded.
Most shacks built in one's backyard do not pass any building codes. Nor does throwing a wooden plank over a stream somewhere.
Just like most software doesn't really risk anyone's life: the fact that your web site might go down for a bit is not at all like a bridge collapsing.
Companies do care about long term maintenance costs, and I've mostly been at companies really stressing over some quality metrics (1-2 code reviews per change, obligatory test coverage for any new code, small, iterative changes, CI & CD...), but admittedly, they have all been software shops (IOW, management understood software too).
True, but this has drifted from TFA's assertion about consistency.
As the thread has implied, it's already hard enough to find time to make small improvements. But once you do, get ready for them to be rejected in PR for nebulous "consistency" reasons.
You can refactor, but you're also wasting time optimizing code you don't need. A better approach is to sit down with the rest of the company and start cutting away the bloat, and then refactor what's left.
Eventually we did retire the old system - while the new code base is much cleaner, I'm convinced it would have been cheaper to just clean that code up in place. It still wouldn't be as clean as the current one is - but the current one has been around long enough to get some cruft of its own. Much of the old cruft was in places nobody really touched anymore anyway, so there was no reason to care.
I saw one big rewrite from scratch. It was a multi-year disaster, but ended up working.
I was also told about an earlier big rewrite of a similar codebase which was a multi-year disaster that was eventually thrown away completely.
I did see one big rewrite that was successful, but in this case the new codebase very intentionally only supported a small subset of the original feature set, which wasn't huge to begin with.
All of this to say that I agree with you: starting from scratch is often tempting, but rarely smooth. If refactoring in place sounds challenging, you need to internalize that a full rewrite will be a few times harder, even if it doesn't look that way.
I wasted a lot of my time and came away barely the wiser, because the company is spiraling and has been for a while. Near as I can figure, the secret sauce was entirely outside of engineering. If I had to guess, they used to have amazing salespeople and whoever was responsible for that fact eventually left, and their replacement’s replacement couldn’t deliver. Last I heard they got bought by a competitor, and I wonder how much of my code is still serving customers.
90% of large software system replacements/rewrites are disasters. The size and complexity of the task is rarely well understood.
The number of people that have the proper experience to guide something like that to success is relatively small because they happen relatively rarely.
I worked with another contractor for a batshit team that was waiting for a rewrite. We bonded over how silly they were being. Yeah, that's great that you have a plan, but we have to put up with your bullshit now. The one-eyed man who was leading them kept pushing back on any attempts to improve the existing code, even widely accepted idioms to replace their jank. At some point I just had to ask him how he expected all of his coworkers to show up one day and start writing good code if he won't let them do it now? He didn't have an answer to that, and I'm not even sure the question landed. Pity.
The person who promised him the rewrite got promoted shortly before my contract was up. This promotion involved moving to a different office. I would bet good money that his replacement did not give that team their rewrite. They’re probably either still supporting that garbage or the team disappeared and someone else wrote a replacement.
That whole experience just reinforced my belief that the Ship of Theseus scenario is the only solution you can count on working. Good code takes discipline, and discipline means cleaning up after yourself. If you won’t do that, then the rewrite will fall apart too. Or flame out.
Generally agreed. I'm generally very bearish on large-scale rewrites for this reason + political/managerial reasons.
The trick with any organization that wants to remain employed is demonstrating progress. "Go away for 3 years while we completely overhaul this." is a recipe for getting shut down halfway through and reassigned... or worse.
A rewrite, however necessary, must always be structured as multiple individual replacements, each one delivering a tangible benefit to the company. The only way to stay alive in a long-term project is to get on a cadence of delivering visible benefit.
Importantly, doing this also improves your odds of the rewrite going well - forcing yourself to productionize parts of the rewrite one piece at a time validates that you're on the right track.
If management keeps making up deadlines without engineering input, then they get to apologize to the customer for being wrong. Being an adult means taking responsibility for your own actions. I can’t make a liar look good in perpetuity and it’s better to be a little wrong now than to hit the cliff and go from being on time to six months late practically overnight.
At least, that's what I teach our devs.
My compromise was to start editing Laravel and implementing optimisations and caching, cutting half a second off every request within a month of starting, and then rewriting crude DIY arithmetic on UNIX epochs into standard library date/time/period functions and similar adjustments. I very openly pushed for deleting at least two hundred thousand lines over a year, which was received pretty poorly by management. I eventually left in anger after a googler on the board fucked up the organisation with his annoying vision of making this monster "Cloud Native" on GCP credits he had, a plan as bad as a full rewrite. After that, it only took a few months until someone finally convinced them to go through with the deletions, and they cut the LoC in half in about six months.
I don't think they do containers or automatic tests yet, probably never will, but as of yet the business survives.
This business wouldn't exist if they had attempted to follow your advice, because they weren't able to hire that many developers and didn't have the money to anyway. There were a couple of subsystems they tried to implement the way you suggest, e.g. one for running certain background jobs.
It was a database table with one row per type of job and a little metadata like job status and a copy of the input. They started jobs by sending a HTTP request. This was a constant source of manual handling, because things started jobs and then crashed and never reset the status and things like that. You could respond that they should have used a message queue instead and so on, but the thing is, they didn't know how to build reliable distributed systems. Few developers do.
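The failure mode described -- a crash leaving the status column stuck on "running" -- is usually addressed with a lease that expires, rather than a bare status flag, so no distributed-systems expertise is needed. A minimal sketch with SQLite (hypothetical schema and names, not the actual system described):

```python
import sqlite3
import time

def claim_job(db: sqlite3.Connection, job: str, lease_seconds: int = 300) -> bool:
    """Atomically claim a job row, treating expired leases as free.

    A worker that crashes simply stops refreshing its lease, so the job
    becomes claimable again instead of being stuck in 'running' forever."""
    now = time.time()
    cur = db.execute(
        "UPDATE jobs SET status = 'running', lease_expires = ? "
        "WHERE name = ? AND (status != 'running' OR lease_expires < ?)",
        (now + lease_seconds, job, now),
    )
    db.commit()
    return cur.rowcount == 1  # True only if this worker actually won the claim
```

A long-running worker would periodically re-run the same UPDATE to extend its lease; manual "reset the stuck status" interventions disappear.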
Imagine you're working on this or that feature, and find a stumbling block in the legacy codebase (e.g., a poorly thought out error handling strategy causing your small feature to have ripple effects you have to handle everywhere). IME, it's literally cheaper to fix the stumbling block and then implement the feature, especially when you factor in debugging down the line once some aspect of the kludgy alternative rears its ugly head. You're touching ten thousand lines of code anyway; you might as well choose do it as a one-off cost instead of every time you have to modify that part of the system.
That's triply true if you get to delete a bunch of code in the process. The whole "problem" is that there exists code with undesirable properties, and if you can remove that problem then velocity will improve substantially. Just do it Ship of Theseus style, fixing the thing that would make your life easier before you build each feature. Time-accounting-wise, the business will just see you shipping features at the target rate, and your coworkers (and ideally a technical manager) will see the long-term value of your contributions.
You're not supposed to ask. It's like a structural engineer asking if it's okay to spend time doing a geological survey; it's not optional. Or a CFO asking if it's okay to pay down high interest debt. If you're the 'engineer', you decide the extent to which it's necessary.
The estimate was building in the time to get it done without breaking too much other stuff. For emergency things, Scotty would be dealing with that after the emergency.
If your captain is always requiring everything be done as an emergency with no recovery time, you've got bigger problems.
If I touch code that I am not supposed to touch or that does not relate to my direct task I will have HR talks. I'd be lucky to not get laid off for working on things that do not relate to my current task.
What if you spend a week or month refactoring something that needs a quick fix now and is being deleted in 1-2 years? That's waste, and if you went rogue, it's your fault. Besides, you always create added QA burden with large refactoring (yes even if you have tests), and you should not do that without a discussion first--even if you're the founder.
Communicate with your manager and (if they agree) VP if needed, and do the right thing at the right time.
Sure, if you're not sure if it's the right thing to do, talk to your manager or TL. A good engineering manager can help. If your manager "would never allow" it, they're not a good manager. Even for jobs much more menial than engineering, a good manager recognizes that autonomy/trust are critical for satisfaction and growth.
If you're working someplace where you're "not allowed" to make the changes you "wish you could," you're doing yourself a disservice. Find someplace where you're not only "allowed," but expected to have (or develop) the judgement required to make these decisions.
To be clear: "the business" expects (and in the medium/long term requires) engineers to make these decisions themselves. That is the job.
The correct solution would of course rather be "you were the manager". :-(
Sometimes, but oftentimes that would involve touching code that you don't need to touch in order to get the current ticket done, which in turn involves more QA effort.
I worked on a Drupal site once where somebody had put business logic and database querying inside of template files.
Just because you can implement something without touching any other part of the codebase doesn’t mean that’s a good decision.
Unfortunately, this is how you often get even more inconsistent codebases that include multiple tenures' worth of different developers attempting to make it better and not finishing before they move on from the organization.
I've done several migrations of thing with dozens of unique bespoke usage patterns back to a nice consistent approach.
It sometimes takes a couple straight days of just raw focused code munging, and doesn't always end up being viable, but it's worth a shot for how much better a state it can leave things in.
I did have one bad experience where I ended up spending way too much time on a project like that, I think I made some mistakes with that one and got in a bit too deep. Luckily my team was very supportive and I was able to finish it and it's a lot better now than it was.
- coherency
- type safety (shout-out to dynamically typed languages that adapted static typing)
- concurrency
- simplicity
- (and more)
> If it's actually better, others will start following your lead.
A lot of people don't want to improve the quality of their output, for various reasons... some are happy to have something "to pay the bills", some don't want to use a programming language to its full extent, some have a deeply rooted paradigm that has worked for 10 years already ("static types won't change that"), others are scared of concurrency, etc. For some people there's nothing to worry about when a server can be blocked by a single request for 60 secs.
It depends.
If it's a self contained code base, automatic refactoring can safely and mechanically update code to be consistent (naming, and in some cases structurally).
If it's not self contained and you're shipping libraries, i.e. things depend on your code base, then it's more tricky, but not impossible.
"Lean on tooling" is the first thing you should do.
Then you get people together to agree what consistent looks like.
I find the easiest way to do this is to borrow someone else's publicly documented coding conventions e.g. Company ABC.
Then anyone disagreeing isn't disagreeing with you, they're disagreeing with Company ABC, and they (and you) just have to suck it up.
From there on in, you add linting tools, PR checks etc for any new code that comes in.
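"Lean on tooling" can go beyond off-the-shelf linters: once the team agrees on a convention, even a tiny custom check makes it mechanical rather than a matter of review-time nagging. A sketch using Python's ast module to flag function names that break snake_case (the convention itself is just an illustrative choice):

```python
import ast
import re

# Agreed-upon convention: function names are snake_case (example only).
SNAKE_CASE = re.compile(r"^_{0,2}[a-z][a-z0-9_]*$")

def non_snake_case_functions(source: str) -> list:
    """Return the names of function definitions that break snake_case."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and not SNAKE_CASE.match(node.name)
    ]
```

Wired into a PR check, a script like this turns "please follow the convention" into a failing build, which is much harder to argue with.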
> your work is an experiment and you will later revise it
I advise against this if you have not been allocated the time or budget to revise the code. For one thing, you're lying. For another thing, were you hired to be a part of the contributing team, or hired to be part of a research team doing experiments in the contributing team's codebase and possibly deploying your experiment on their production systems? I would immediately push back on any new guy who says this, no matter how confident he seems that his way is the right way.
We are making brand new things here, not standing on an assembly line producing the very same thing dozens to millions of times. We are paid to make new products that never existed before, where the novelty element is desired to be bigger rather than smaller!
Those pretending to know exactly what they are doing are lying!
Of course we are speculating here about the amount of novelty, which is never 0% and never 100%, but something in between. But those who push back on people at least trying to revise their work - putting emphasis on it - deserve to have no-one come to them to be pushed back (at least for the inability to allocate resources for this essential activity of development. Development!).
(tried to mimic the atmosphere of the message, sorry if failed)
Where is the incentive to go the extra mile here? Do you eventually put up with enough legacy mess, pay your dues, then graduate to the clean and modern code bases? Because I don't see a compelling reason you should accept a job or stay in a code base that's a legacy mess and take on this extra burden.
> Do you eventually put up with enough legacy mess, pay your dues, then graduate to the clean and modern code bases?
Yeah, that's called retirement. The point of the article isn't that whatever you're conforming to in the legacy codebase is worth preserving. The point is that whatever hell it is, it'll be a worse hell if you make it an inconsistent one.
I know this counter argument sounds crabby, but going along with existing conventions on a legacy code base might be a lot of work for someone who's only familiar with more recent practices. It's not something you can passively do. Plus having to adopt these older patterns won't help your resume, which is an opportunity cost we are absorbing for free (and shouldn't have to)
The biggest challenge is that it used to be maintained by a large team and now there are just 2 developers. Also, the dev environment isn't fully automated so it takes like 20 minutes just to launch all the services locally for development. The pace of work means that automating this hasn't been a priority.
It's a weird experience working on such a project, because I know for a fact that it would be possible to create the entire project from scratch using only 1 to 3 services max, and we would get much better performance, reliability, maintainability, etc... But the company wouldn't be willing to foot the cost of a refactor, so we have to move at a steady snail's pace. The slow pace is because of the point mentioned in the article: the systems are all intertwined, and you need to understand how they integrate with one another in order to make any change.
It's very common that something works locally but doesn't work when deployed to staging because things are complicated on the infrastructure side with firewall rules, integration with third-party services, build process, etc... Also, because there are so many repos with different coding styles and build requirements, it's hard to keep track of everything because some bug fixes or features I implement touch on like 4 different repos at the same time and because deployment isn't fully automated, it creates a lot of room for error... Common issues include forgetting to push one's changes or forgetting to make a PR on one of the repos. Or sometimes the PR for one of the repos was merged but not deployed... Or there was a config or build issue with one of the repos that was missed because it contained some code which did not meet the compatibility requirements of that repo...
I've worked on a lot of projects in my career and this one has one of the most complex/chaotic architectures I've seen yet. Surprisingly, it recovers from service downtimes and reboots pretty well. The main issues are maintainability, deployment and configuration. It's often the case that local env does not match staging when building features.
If I could give one "defensive coding" tip, it would be for seniors doing the design to put in road blocks and make examples that prevent components from falling for common traps (interdependency, highly variant state, complex conditions, backwards-incompatibility, tight coupling, large scope, inter-dependent models, etc) so that humans don't have to remember to avoid those things. Make a list of things your team should never do and make them have a conversation with a senior if they want to do it anyway. Road blocks are good when they're blocking the way to the bad.
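One lightweight way to implement such road blocks is a CI check that scans for the banned patterns and only passes when an explicit override marker shows a senior signed off. A sketch, with the banned list and the override convention both purely illustrative:

```python
# Things the (hypothetical) team agreed never to do without a conversation.
BANNED = {
    "datetime.utcnow": "use timezone-aware datetimes instead",
    "time.sleep": "no blind sleeps in production code",
}

def find_violations(source: str) -> list:
    """Return (line_number, pattern, reason) for each banned pattern used
    without an explicit '# override:' acknowledgement on that line."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if "# override:" in line:
            continue  # someone had the conversation and signed off
        for pattern, reason in BANNED.items():
            if pattern in line:
                violations.append((lineno, pattern, reason))
    return violations
```

The point isn't the string matching (a real check might use the linter's plugin API); it's that the "never do this" list lives in the repo, and bypassing it leaves a visible, greppable marker.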
Starting with good design leads to continuing to follow good design. Not starting with good design leads to years of pain. Spend a lot more time on the design than you think you should.
This is well-intentioned. But in a large old codebase, finding things to improve is trivial: there are thousands of them. Finding and judging which improvements will actually have a real positive impact is the real skill.
The terminal case of this is developers who, in the midst of another task, try to improve one little thing, but pulling on that thread leads them into bigger and bigger fixes that are never completed.
Knowing what to fix and when to stop is invaluable.
https://www.joelonsoftware.com/2000/04/06/things-you-should-...
I've seen too many needless errors after someone happened to "fix a tiny little thing" and then fail to deliver their original task and further distract others trying to resolve the mistake. I believe clear intention and communication are paramount. If I want to make something better, I prefer to file a ticket and do it with intention.
Been there, been guilty of that at the tail end of my working life. In my case, looking back, I think it was a sign of burnout and frustration at not being able to persuade people to make the larger changes that I felt were necessary.
If you have to work on such projects, there are two things to keep in mind: consistency and integration tests.
Yes. I remember working in a 700,000+ line PHP codebase that had around 30% unit test coverage and an unknown percentage of e2e test coverage. I kept my changes very localised because it was a minefield.
Also, the unit tests didn't do teardown so adding a new unit test required you to slot it in with assertions accounting for the state of all tests run so far.
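For contrast, this is roughly what test isolation looks like; `Registry` here is an invented stand-in for whatever shared state those tests were leaking:

```python
import unittest

class Registry:
    """Invented stand-in for shared mutable state (e.g. a test database)."""
    items = []

class RegistryTest(unittest.TestCase):
    def setUp(self):
        # Each test starts from a known-empty state...
        Registry.items = []

    def tearDown(self):
        # ...and cleans up after itself, so tests can run in any order
        # without accounting for what every earlier test left behind.
        Registry.items = []

    def test_add_one(self):
        Registry.items.append("a")
        self.assertEqual(len(Registry.items), 1)

    def test_add_two(self):
        Registry.items.extend(["a", "b"])
        self.assertEqual(len(Registry.items), 2)  # not 3: no leaked state
```

Without the `setUp`/`tearDown` pair, the second test's assertion would depend on whether the first one ran before it, which is exactly the slot-it-in-carefully situation described above.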
Somewhere between 100 and 1000 engineers working on the same codebase
The first working version of the codebase is at least ten years old
All of these things, or any of them?
In any event, though I agree with him about the importance of consistency, I think he's way off base about why and where it's important. You might as well never improve anything with the mentality he brings here.
To make your example match, it would be more so that there are two teams A and B, Team A already created a framework and integration for logging across the entire application. Team B comes along and doesn't realize that this framework exists, and also invents their own framework and integration for logging.
This is the type of consistency that the author points to, because Team B could have looked at other code already referencing and depending on the logging framework from Team A and they would have avoided the need to create their own.
This is ridiculous. Even if you want to ignore the kernel, there are plenty of "large established codebases" in the open source world that are at least 20 years old. Firefox, various *office projects, hell, even my own cross-platform DAW Ardour is now 25 years old and is represented by 1.3M lines of code at this point in time.
You absolutely can practice it on open source. What you can't practice dealing with is the corporate BS that will typically surround such codebases. Which is not to say that the large established codebases in the open source world are BS free, but it's different BS.
But I agree that there are absolutely eligible open source codebases that could be used to practice beforehand. I should know; I work on Firefox. It is not a small thing to dive into, but people successfully do, and get solid experience from it.
Working in a large dev team, focusing on a small feature and having a separate product manager and QA team makes it easier to handle the scale though. Development is very slow but predictable. In my case, the company had low expectations and management knew it would take several months to implement a simple form inside a modal with a couple of tabs and a submit button. They hired contractors (myself included), paying top dollar to do this; for them, the ability to move at a snail's pace was worth it if it provided a strong guarantee that the project would eventually get done. I guess companies above a certain size have a certain expectation of project failure or cancellation so they're not too fussed about timelines or costs.
It's shocking coming from a startup environment where the failure tolerance is 0 and there is huge pressure to deliver on time.
Getting a PR reviewed in 3 days is an achievement!
In startup land, I got my code reviewed by the CTO within a few hours. It was rare for a review to take a whole day, and only when he was too busy.
In my current company, the other dev usually reviews and merges my code to staging within an hour. Crazy thing is we don't even write tests. A large project with no tests.
Some PRs are faster but some are slower as well.
This doesn't resonate with me. Having worked with multiple large codebases (5M+ lines), splitting the codebase is usually a reflection of org structure and the bifurcation of domains within eng orgs. While it may seem convoluted at first, it's certainly doable and gets easier as you progress. Also, code migrations of this magnitude are usually carried out by core platform teams that rarely ship customer-facing features.
I know, that's a pretty specific "hypothetical," but that experience taught me that copying for the sake of consistency only works if you actually understand what it is you're copying. And I was also lucky that the senior engineer was nice about it.
Working on an old shitty codebase is one thing. Being told you have to add to the shit is soul crushing.
Sharing code, ideas, good and bad, etc. is possible - but it requires deliberate effort.
Why not? There are open source projects that are many years old with millions lines of code and many developers.
The project I work on has had a thousand+ contributors of extremely varied skill levels, and is over 15 years old. Many lines of code, but I'm not going to count them because that's a terrible metric.
This fits all of the criteria outlined in the article. Sure, it might not apply to portfolio project #32 but there's plenty of open source repositories out there that are huge legacy codebases.
Also if the system was actually capable of maintaining consistency then it would never have got that large in the first place. No-one's actual business problem takes 5M lines of code to describe, those 5M lines are mostly copy-paste "patterns" and repeated attempts to reimplement the same thing.
The issue isn't vulnerabilities, it's dependency hell, where all your packages are constantly fighting each other over specific versions. Although some languages handle this better than others.
This is normally a direct result of trying to limit the number of dependencies. People are much more able to use small, focused dependencies that solve specific problems well if you have a policy that permits large numbers of dependencies.
As TFA points out, you might find out that you've made your little corner worse, actually.
I'm pretty sure this is trivially untrue. Any OS is probably more than 5M lines (Linux is 27.8 million lines, according to a random Google search). Facebook is probably more lines of code. Etc.
Linux is notoriously fragmented/duplicative, and an OS isn't the solution to anyone's actual business problem. A well-factored solution to a specific problem would be much smaller, compare e.g. QNX.
> Facebook is probably more lines of code.
IIRC Facebook is the last non-monorepo holdout among the giants, they genuinely split up their codebase and have parts that operate independently.
Does Facebook have more than 5M lines of code now? I'm sure they do. Does that actually result in a better product than when it was less than 5M lines of code? Ehhhh. Certainly if we're talking about where the revenue is being generated, as the article wants to, then I suspect at least 80% of it is generated by the <5M that were written first.
So I mean yeah, on some level solving the business problem can take as many lines as you want it to, because it's always possible to add some special case enhancement for some edge case that takes more lines. But if you just keep growing the codebase until it's unprofitable then that's not actually particularly valuable code and it's not very nice to work on either.
As do many tens of thousands of applications that are the backbone of services we all rely on. The systems that run banks, that run power plants, the routers that make up the backbone of the internet, etc.
Again, I agree with some of the spirit of what you're saying... but there's also a tendency of many developers (like myself) to only think of shiny new products, or to only think about the surface-level details of most business problems. You write:
> So I mean yeah, on some level solving the business problem can take as many lines as you want it to, because it's always possible to add some special case enhancement for some edge case that takes more lines. But if you just keep growing the codebase until it's unprofitable then that's not actually particularly valuable code and it's not very nice to work on either.
I think this misunderstands how the companies that have stayed in business for so long have done so. Excel is the software we all use every day because it kept adding more and more features, stealing the best ideas from new products that tried to innovate. It's still doing so, though obviously to a lesser extent.
Not convinced. I think a lot of companies keep adding features because they don't know how to do anything else - adding new features is how managers get promoted, so it's what devs get rewarded for, so it keeps happening even as the RoI drops lower and lower (and in many cases eventually goes negative, but this is masked because, if the core idea was good enough, the product as a whole is still profitable). At "best", a bunch of esoteric features act as a de facto moat that can shut out the competition in a tickbox feature comparison, rather than being something that actually adds value in day-to-day use.
E.g. how far are you taking this? Excel was released in 1985. Do you think after 5 years there was no more business value to add? VBA, which allowed scripting, wasn't released until 1993. Do you think Excel circa 2000 is as good as Excel today? I'm sure it's roughly similar, but I'm just as sure there are many features I would miss.
And that's not even getting to the fact that Excel lost a ton of marketshare to Google Sheets, because it was too late to adopt what is Sheet's biggest feature - collaborative editing. I'm sure in 2005 you could've made the case that Excel already has all the business value it needs, and trying to add something like collaborative editing is just a corporate waste of time, a totally "esoteric" feature that no one really needs and doesn't provide value, and is only there to get managers promoted or to get devs working on something "cool". Yet arguably it was a critical thing they needed to get done and didn't.
Anyway, I'm sure you're right a lot of the time. I just think blankly applying this statement is very wrong, when real people in the real world are sometimes working on systems, 10, 20 or sometimes even 50 years old.
Mistakes engineers make in large established codebases - https://news.ycombinator.com/item?id=42570490 - Jan 2025 (3 comments)
> Somewhere between 100 and 1000 engineers working on the same codebase
> The first working version of the codebase is at least ten years old
> The cardinal mistake is inconsistency
Funny enough, the author notes the reason why consistency is impossible in such a project and then proceeds to call inconsistency the cardinal mistake.
You cannot be consistent in a project of that size and scope. Full stop. Half those engineers will statistically be below average and constantly dragging the codebase towards their skill level each time they make a change. Technology changes a lot in ten years, people like to use new language features and frameworks.
And the final nail in the coffin: the limits of human cognition. To be consistent you must keep the standards in working memory. Do you think this is possible when the entire project is over a million LOC? Don't be silly.
There's a reason why big projects will always be big balls of mud. Embrace it. http://www.laputan.org/mud/
Half of the point of this article is that people need to suck it up and not use new frameworks sometimes...
There are times for coding in a way you, personally, find pleasing; and there are times when:
> So you should know how to work in the “legacy mess” because that’s what your company actually does. Good engineering or not, it’s your job.
A quote from the 'big ball of mud':
> Sometimes it’s just easier to throw a system away, and start over.
It is easier, but it's also a) not your decision and b) enormously disruptive and expensive.
How do you tell if you're in the 'naive and enthusiastic but misguided' camp, or in the 'understand the costs and benefits and it's worth a rewrite' camp?
Maybe the takeaway from the OP's post really should be this one:
> If you work at a big tech company and don’t think this is true, maybe you’re right, but I’ll only take that opinion seriously if you’re deeply familiar with the large established codebase you think isn’t providing value.
^ because this is the heart of it.
If you don't understand, or haven't bothered to understand, or haven't spent the time understanding what is already there, then you are not qualified to make large scale decisions about changing it.
1. The three C’s: Clarity always, Consistency with determination, Concision when prudent. 2. Keep the pain in the right place. 3. Fight entropy!
So in the context of the main example in this article, I would say you can try to improve clarity by e.g. wrapping the existing auth code in something that looks nicer in the context of your new endpoint but try very hard to stay consistent for all the great reasons the article gives.
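As a sketch of that wrapping idea (every name here is hypothetical; the point is a thin adapter that delegates to the legacy path rather than reimplementing it):

```python
# Hypothetical legacy entry point -- awkward signature, but it is the
# single battle-tested path that already knows about bot users etc.
def legacy_check_auth(raw_headers, flags, ctx):
    token = raw_headers.get("X-Auth", "")
    if token.startswith("user:"):
        return True, token[len("user:"):]
    return False, ""

class AuthResult:
    """A clearer result type for new endpoints, backed by the legacy code."""
    def __init__(self, ok, user):
        self.ok = ok
        self.user = user

def check_auth(headers):
    # Delegate rather than reimplement: edge cases and future fixes in
    # the legacy function automatically flow through to new callers.
    ok, user = legacy_check_auth(headers, 0, None)
    return AuthResult(ok, user)
```

The new endpoint gets a nicer interface, but there is still exactly one implementation of the auth rules underneath, so consistency is preserved where it matters.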
If I just do this simple thing in my mail client... or server... mail security and spam and whatever else will be solved.
As someone working on a 60M-line codebase, we have very different understandings of the word "large". My team leans more towards "understand the existing code, but also try to write maintainable and readable code". Everything looks like a mess built by a thousand different minds, some of them better and a lot of them worse, so keeping consistency would just drag the project deeper into hell.
> Somewhere between 100 and 1000 engineers working on the same codebase
> The first working version of the codebase is at least ten years old
That's 5,000 to 50,000 lines of code per engineer. Not understaffed. A worse problem is when you have that much code, but fewer people. Too few people for there to be someone who understands each part, and the original authors are long gone. Doing anything requires reverse engineering something. Learning the code base is time-consuming. It may be a year before someone new is productive.
Such a job may be a bad career move. You can spend a decade learning a one-off system, gaining skills useless in any other environment. Then it's hard to change jobs. Your resume has none of the current buzzwords. This helps the employer to keep salaries down.
Maybe.
I spent most of my career at a small mom and pop shop where we had single-digit MLOC spanning 20-25 years but only 15-20 engineers working at any given time. This wasn't a problem, though, because turn-over was extremely low (the range in engineer count was mostly due to internships), so many of the original code owners were still around, and we spent some effort to spread out code ownership such that virtually all parts were well understood by at least 2 people at any given moment.
If anything, I rather shudder at the thought of working somewhere that only has ~5M lines of code split up amongst 100 (and especially 1000) engineers over a span of 10 years. I can't imagine producing only 5-50 KLOC over that time, even despite often engaging in behind-the-scenes competition with colleagues over who could produce pull requests with the least (and especially negative) net LOC.
> Your resume has none of the current buzzwords.
That's one of my bigger pet peeves about software development, actually.
While you probably didn't mean it this way, over the years, I encountered a number of people who'd consistently attempt to insert fad technologies for the sake of doing so, rather than because they actually provided any kind of concrete benefit. Quite the contrary: they often complicated code without any benefit whatsoever. My more experienced colleagues snidely referred to it as resume-driven development.
I can't hate people doing this too much, though, because our industry incentivizes job hopping.
You lost me on how this helps employers keep salaries down. My value is greater by being able to do such things, not less. If I can work on modern stacks, legacy stacks, enterprise platforms, and am willing to learn whatever weird old tech you have, that does not decrease my salary.
> Being able to navigate and reverse engineer undocumented legacy code in a non-modern stack is a skill set in and of itself.
And I find that it's a pretty rare skill to find.
To quote Wikipedia:
> Common law is deeply rooted in stare decisis ("to stand by things decided"), where courts follow precedents established by previous decisions.[5] When a similar case has been resolved, courts typically align their reasoning with the precedent set in that decision.[5] However, in a "case of first impression" with no precedent or clear legislative guidance, judges are empowered to resolve the issue and establish new precedent.
Millions of lines of code is itself a code smell. Some of the absolute worst code I have to work with comes from industry-standard crapware that is filled with do-nothing bug factories. You've got to get rid of them if you want to make things more stable and more reliable.
However... I often see the problem, and it's not "don't do this obvious strategy to improve QoL", it's "don't use that bullshit you read about in an HN article last week".
I suspect this is one of those.
New feature where you compare 2 records? Too bad, the UI is going to show them both, then jump back to the first one in an epileptic spasm.
Sometimes, things are just that bad enough that keeping it consistent would mean producing things that will make clients call saying it's a bug. "No sorry, it's a feature actually".
The codebase is already a nasty surprise for people coming in from the outside with experience, or for people aware of current best practices or outside cultures. In other words, the codebase is already a mess, and you cannot take advantage of future improvements without a big bang, since that would be inconsistent.
How to keep your code evolving in time and constantly feeling like it is something you want to maintain and add features to is difficult. But constantly rewriting the world when you discover a newer slight improvement will grind your development to a halt quickly. Never implementing that slight improvement incrementally will also slowly rot your feelings and your desire to maintain the code. Absolute consistency is the opposite of evolution: never allowing experimentation; no failed experiments mean no successes either. Sure, too much experimentation is equally disastrous, but abstinence is the other extreme and is not moderation.
Keeping the code base tidy is glue work, so you should only do enough of it to ship features. So maybe these are not "mistakes" but rather tactical choices made by politically smart engineers focused on shipping features.
My experience is the opposite of the author's: in terms of their revealed preferences, line workers care far more about the company and its customers than managers and executives do, precisely because it's far easier for the latter to fail upwards than the former.
This reads like an admission of established/legacy codebases somewhat sucking to work with, in addition to there being a ceiling for how quickly you can iterate, if you do care about consistency.
I don't think that the article is wrong, merely felt like pointing that out - building a new service/codebase that doesn't rely on 10 years old practices or code touched by dozens of developers will often be far more pleasant, especially when the established solution doesn't always have the best DX (like docs that tell you about the 10 abstraction layers needed to get data from an incoming API call through the database and back to the user, and enough tests).
Plus, the more you couple things, the harder it will be to actually change anything, if you don't have enough of the aforementioned test coverage - if I change how auth/DB logic/business rules are processed due to the need for some refactoring to enable new functionality, it might either go well or break in hundreds of places, or worse yet, break in just a few untested places that aren't obvious yet, but might start misbehaving and lead to greater problems down the road. That coupling will turn your hair gray.
> Large codebases are worth working in because they usually pay your salary
Though remember that greenfield projects might also pay your salary and be better for your employability and enjoyment of your profession in some cases. They might be the minority in the job market, though.
docs.. lol
The author cites some imaginary authentication module where "bot users" are a corner case, and you can imagine how lots of places in the software are going to need to handle authentication at some point
Say you don't use the helper function. Do you think you've avoided coupling?
The thing is, you're already coupled. Even if you don't use it
Fundamentally, at the business level, your code is coupled to the same requirements that the helper function helps to fulfil.
Having a separate implementation won't help if one day the requirements change and we suddenly need authentication for "super-bot" users. You'll now need to add it to two different places.
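A toy sketch of why that bites (the user kinds and helper are invented for illustration): two endpoints enforce the same rule, but only one goes through the shared helper.

```python
# One shared definition of "service account"; when requirements change
# (say, a new "super-bot" kind), a single edit covers every caller.
SERVICE_ACCOUNT_KINDS = {"bot"}

def is_service_account(user):
    return user.get("kind") in SERVICE_ACCOUNT_KINDS

def endpoint_a(user):
    # Uses the shared helper.
    return "skip-2fa" if is_service_account(user) else "require-2fa"

def endpoint_b(user):
    # A hand-rolled duplicate of the same rule; it silently diverges
    # the moment the requirement is updated in only one place.
    return "skip-2fa" if user.get("kind") == "bot" else "require-2fa"

# Requirements change: super-bot users must also skip 2FA.
SERVICE_ACCOUNT_KINDS.add("super-bot")

print(endpoint_a({"kind": "super-bot"}))  # skip-2fa
print(endpoint_b({"kind": "super-bot"}))  # require-2fa -- the quiet bug
```

The separate implementation compiles, passes its own tests, and is wrong the day the requirement moves, which is the coupling-you-can't-see that the comment above describes.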
> The thing is, you're already coupled. Even if you don't use it.
In the case of using multiple services, your auth service would need some changes. Now, whether those need to be done in a somewhat small service that's written in Spring Boot, launches in 5 seconds and can be debugged pretty easily due to very well known and predictable surface area, or a 10 year old Spring codebase that's using a bunch of old dependencies, needs a minute or two just to compile and launch and has layers upon layers of abstractions, some of which were badly chosen or implemented, but which you would still all need to utilize to stay consistent, making your changes take 2-5x as long to implement and then still risk missing some stuff along the way... well, that makes a world of difference. Breaking up a codebase that is a million lines big wouldn't make any of the business requirements not be there, but might make managing the particular parts a bit easier.
The impact of both old code and heavily bloated codebases is so great to the point where some people hate Java because a lot of projects out there are enterprise brownfield stuff with hopelessly outdated tech stacks and practices, or like other languages just because they don't have to deal with that sort of thing: https://earthly.dev/blog/brown-green-language/
That's even before you consider it from the egoistic lens of a software dev that might want to ship tangible results quickly and for whom a new service/stack will be a no-brainer, the team lead whose goals might align with that, or even the whole business that would otherwise be surprised why they're slower in regards to shipping new features than most of their competitors. Before even considering how larger systems that try to do everything end up, e.g. the likes of Jira and DB schemas that are full of OTLT and EAV patterns and god knows what else.
If you find a codebase that is pristine, best of luck on keeping it that way. Or if you have to work in a codebase that's... far less pleasant, then good luck on justifying your own time investment in the pursuit of long term stability. Some will, others will view that as a waste, because they'll probably be working in a different company by the time any of those consequences become apparent. Of course, going for a full on microservices setup won't particularly save anyone either, since you'll still have a mess, just of a different kind. I guess that the main factor is whether the code itself is "good" or "bad" at any given point in time (nebulous of a definition as it may be), except in my unfortunate experience most of it is somewhere in the middle, leaning towards bad.
Writing code that is easy to throw away and replace, in addition to being as simple as you can get away with, and with enough documentation/tests/examples/comments, might be a step in the right direction, instead of reading an OOP book and deciding to use as many abstractions and patterns as you can. Of course, it doesn't matter much if you need to call a helper method to parse a JWT or other comparably straightforward code, but if you need to set up 5 classes to do it, then someone has probably fucked up a little bit (I know, because I have fucked that up; bit of a mess to develop against later, even with 100% test coverage).
I used to work within the Chromium codebase (at the order of 10s of million LOC) and the parts I worked in were generally in line with Google's style guide, i.e. consistent and of decent quality. The challenge was to identify legacy patterns that shouldn't be imitated or cargo-culted for the sake of consistency.
In practice that meant having an up to date knowledge of coding standards in order to not perpetuate anti-patterns in the name of consistency.
This is only until your new upstart competitor comes along, rewrites your codebase from scratch and runs you out of the market with higher development velocity (more features).
This almost never happens. It takes a long time and huge amounts of money to come up to parity, and in the meantime, the legacy org is earning money on the thing you're trying to rewrite.
It's more often the case that the technology landscape shifts dramatically, helping a niche player (who has successfully saturated the niche) become mainstream or more feasible. Take, for example, Intel. Their CISC designs and higher power consumption are now being challenged by relatively simpler, lower-power RISC designs. Or Nvidia with its GPUs. In both cases, it's the major shifts that have hurt Intel. No one could outcompete Intel in making the server CPUs of old if they were starting from scratch.
Take another example, this time, of a successful competitor (of sorts). Oracle vs Postgres. Same deal, except that Postgres is the successor of Ingres (which doesn't exist anymore), and was developed at Berkeley and was open-source (i.e., it relied upon the free contributions of a large number of developers). I doubt that another proprietary database has successfully challenged Oracle. Ask any Oracle DB user, and you will likely get the answer that other databases are a joke compared to what it offers.
Take as an example: for some reason you need to update an internal auth middleware library, or a queue library. Say there is a bug, or a flaw in the design that means it doesn't behave as expected in some circumstances. All of your services use it.
So someone comes along, patches it, makes the upgrade process difficult / non-trivial, patches the one service they're working on, and then leaves every other caller alone. Maybe they make people aware, maybe they write a ticket saying "update other services", but they don't push to roll out the fixed version in the things they have a responsibility for.
Tooling is important too. IDEs are great, but one should also use standalone static analysis, grepping tools like ripgrep and ast-grep, robust deterministic code generation, things like that.
Culture tends to flow from the top. If it's very expedient at the top then the attitude to code will be too.
You get stuck in "can't do anything better because I cannot upgrade from g++-4.3, because there's no time or money to fix anything; we just work on features". Work grinds to a near halt because of the difficulties imposed by tech debt. The people above don't care, because they feel they're flogging a nearly-dead horse anyhow, or they're just inappropriately secure about its market position. Your efforts to do more than minor improvements are going to be a waste.
Even in permissive environments one has to be practical - it's better to have a small improvement that is applied consistently everywhere than a big one which affects only one corner. It has to materially help more than just you personally otherwise it's a pain in the backside for others to understand and work with when they come to do so. IMO this is where you need some level of consensus - perhaps not rigid convention following but at least getting other people who will support you. 2 people are wildly more powerful and convincing than 1.
The senior programmers are both good and bad - they do know more than you and they're not always wrong and yet if you're proposing some huge change then you very likely haven't thought it out fully. You probably know how great it is in one situation but not what all the other implications are. Perhaps nobody does. The compiler upgrade is fine except that on windows it will force the retirement of win2k as a supported platform .... and you have no idea if there's that 1 customer that pays millions of dollars to have support on that ancient platform. So INFORMATION is your friend in this case and you need to have it to convince people. In the Internet world I suppose the equivalent question is about IE5 support or whatever.
You have to introduce ideas gradually so people can get used to them and perhaps even accept defeat for a while until people have thought more about it.
It does happen that people eventually forget who an idea came from and you need to resist the urge to remind them it was you. This almost never does you a favour. It's sad but it reduces the political threat that they feel from you and lets them accept it. One has to remember that the idea might not ultimately have come from you either - you might have read it in a book perhaps or online.
At the end, if your idea cannot be applied in some case or people try to use it and have trouble, are you going to help them out of the problem? This is another issue. Once you introduce change be prepared to support it.
In other words, I have no good answers - I've really revolutionised an aspect of one big system (an operating system) which promptly got cancelled after we built the final batch of products on it :-D. In other cases I've only been able to make improvements here and there, in areas where others didn't care too much.
The culture from the top has a huge influence that you cannot really counter fully - only within your own team sometimes or your own department and you have to be very careful about confronting it head on.
So this is why startups work of course - because they allow change to happen :-)
One small point - consistency is a pretty good rule in small codebases too, for similar reasons. Less critical, maybe, but if your small codebase has a standard way of handling e.g. Auth, then you don't want to implement auth differently, for similar reasons (unified testing, there might be specialized code in the auth that handles edge cases you're not aware of, etc.)
Balancing the two is not easy, and often if you do not have time, you are forced to drop your strong principles.
Let me do a simple example.
Imagine a Struts2 GUI. One day your boss asks you to upgrade it to fancy AJAX. It is possible, for sure, but it can require a lot of effort, and finding the right solution is not easy.
Writing code is not like real life, where herd mentality usually saves your life. Go ahead and improve the code; what helps is tests... but also at least logging errors, and throwing errors. Tests and errors go hand in hand. Errors are not your enemy; errors help you improve the program.
IMO software development is so diverse and complex that universal truths are very very rare.
But to us programmers, anything that promises to simplify the neverending complexity is very tempting. We want to believe!
So we're often the equivalent of Mike Tyson reading a book by Tiger Woods as we look down a half-pipe for the first time. We've won before and read books by other winners, now we're surely ready for anything!
Which leads to relational data stored in CouchDB, data layers reimplemented as microservices, simple static sites hosted in Kubernetes clusters, spending more time rewriting tests than writing new features, and so on.
IMO, most advice in software development should be presented as "here's a thing that might work sometimes".
OP also states that in order to 'successfully' split a LEC you need to first understand it. He doesn't define what 'understanding the codebase' means, but if you're 'fluent' enough you can be successful. My team is very fluent in successfully deploying our microfrontend without 'understanding' the monstrolith of the application.
I would even go further and make the law a bit more general: any codebase will be in both a consistent and an inconsistent state. Whether you use a framework, a library, or go vanilla, the consistency will be the boilerplate, the autogenerated code, and the conventional patterns of the framework/library/programming language. But inconsistency naturally crops up, because not all libraries follow the same patterns, not all devs understand the conventional patterns, and frameworks don't cover all use cases (entropy increases, after all). The point being: being consistent is how we 'fight' entropy, and inconsistency is a manifestation of entropy increasing. But nothing states that all 'consistent' methods are the same - only that consistency exists and can be identified, not that the identified consistency is the same 'consistency'. Take a snapshot of the whole and you will always find the consistent and the inconsistent coexisting.
First of all, consistency does not matter at all, ever. That's his main thesis, so it's already wrong. Furthermore, all his examples are backwards. If you didn't know about the existence of "bot" users, you probably don't want your new auth mechanism to support them. Otherwise, the "nasty surprise" is the inverse of what he said: not that you find you don't support bot users, but that you find out you do.
Build stuff that does exactly what you want it to do, nothing more. This means doing the opposite of what he said. Do not re-use legacy code with overloaded meanings.
Can you say more about this? Because I strongly disagree with your assertion.
This is also confusing to me. In a multi-million line codebase, it's extremely difficult to find an actual place where you have zero side effects with ANYTHING you write.
Code bases where devs pick a different framework or library for every little thing are a nightmare to maintain. Agreed on standards is what gets your team out of the weeds to work on a higher and more productive level.
So there's lots of nitpicking on irrelevant stuff - "keep files under 50 lines" - that is the silly consistency of little minds.
Fortunately, the author of the post writes from an experience perspective with architectural examples, so I can write that it is a good article.
(this is a half-joke... iykyk)
- Consistency (fully agree with the article here)
- Control
Control to me means that you have to work extremely hard to lose the ability to change the parts you care about. For example:
- Do not leak libraries and frameworks far into your business logic. At some point you want to introduce a new capability, but say the class/type you re-used from a library makes it really awkward. Now you are faced with a huge refactor. The more logic, the purer and simpler the code should be. Ideally stdlib only.
- Do not build magic, globally shared test harnesses. Helpers yes, but if you give up control over the environment a test runs in - setting up fixtures, test data, etc. - you will run into a world of pain due to dependencies between tests and especially the test data.
- Do not let libraries dictate your application architecture. E.g. I always separate the web framework layer (controllers, views etc.) from the service and data layers.
- Consistency plays a major part here. If you introduce 3 libraries to do the same thing, you have basically given up control over those dependencies, and refactors in the future will be much harder.
It's not always 100%, but in general the fewer the dependencies the better the code.
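The "control" points above can be sketched roughly like this (all names are invented; `handle_request` stands in for whatever adapter your web framework would call): business rules live in plain stdlib types, and the framework is confined to a thin translation layer at the edge.

```python
# Hedged sketch: business logic on pure stdlib types, with the framework
# confined to a thin adapter. All names here are invented for illustration.
from dataclasses import dataclass


@dataclass
class Invoice:
    """Pure domain type: no ORM base class, no framework imports."""
    subtotal_cents: int
    tax_rate: float


def total_cents(invoice):
    """Pure business rule - trivially testable, framework-free, and easy
    to move if we ever swap the web layer or the persistence library."""
    return round(invoice.subtotal_cents * (1 + invoice.tax_rate))


def handle_request(form):
    # The ONLY place framework-shaped data appears: 'form' stands in for
    # request.POST or similar. It translates into the domain type and back,
    # so a framework swap touches this adapter, not the business rules.
    invoice = Invoice(int(form["subtotal_cents"]), float(form["tax_rate"]))
    return {"total_cents": total_cents(invoice)}
```

Because `Invoice` and `total_cents` never import the framework, replacing Struts2 with "fancy AJAX" (or Flask with FastAPI) means rewriting only the adapter, which is exactly the kind of control the list above is arguing for.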