Software Design Is Knowledge Building
419 points by signa11 10 days ago | 110 comments
  • chrisweekly 9 days ago |
    Great article, +1 Insightful.

    > the ultimate goal of software design should be (organizational) knowledge building

  • smikhanov 9 days ago |
    Good article, both in spirit and factually.

    One thing to add: the author talks about reviving a system as a “slow and difficult process”, and it is. However, the concrete example described is not worthy of hand-wringing of this kind: a system that could have been built by a single competent engineer in 6 months (inevitably of alpha quality, at best) could be resurrected by a competent team of several programmers and brought to, say, beta quality, while keeping the lights on for the alpha system, in how long? Let’s say 9-12 months. No biggie, really.

    Most companies routinely discard man-years of programmers’ effort, so those 9-12 months are likely just a blip in the lifetime of that firm.

  • uludag 9 days ago |
    I find this article spot on; it resonates with what I've experienced.

    The article mentions Zach Tellman's newsletter "Explaining Software Design" (https://explaining.software/) which I highly recommend reading. I have found his works to provide deep insight into the process of software design.

  • nosefurhairdo 9 days ago |
    I've handed off a few services I built with minimal oversight or documentation. The receiving teams have been able to make changes without my involvement and everyone is happy.

    I believe the only reason I've been successful in this is because I agonize over simplicity. There are times during the development of any project where one might be tempted to hack around an issue, or commit the ugly code that seems to work. These are the rough edges that inheritors of a codebase use as evidence that a blank slate would be preferable. They're also the bits where the underlying business logic becomes murky. My goal is for the code to be so clear that documentation would feel redundant.

    This approach of course takes more time and requires that your management trusts you and is willing to compromise on timelines. It's extremely rewarding if you can sell it and deliver.

    • MrMcCall 9 days ago |
      Documentation is useful, when done well. The code is always the authority, and there are very many ways the correct logic can be constructed. How it is constructed is the difference between good enough and excellent.

      And getting "compromise on timelines" is a most sublime political art. It requires the combination of both a humble, competent manager and an established, successful engineer worthy of trust.

      Congratulations on your success on those two varied fronts!

      • chrsig 9 days ago |
        I think it depends on who the documentation is intended for. I often let public facing documentation be the source of truth for expected behavior unless it's infeasible to coerce the system to that behavior. If the latter does occur, the documentation gets updated.

        If the question is what does the software actually do, then of course the code, toolchain, and runtime are the authority.

        • MrMcCall 9 days ago |
          Good point. I was only speaking to targeting other developers.
      • nosefurhairdo 9 days ago |
        Thank you! I am indeed very fortunate to have a great manager.
      • t43562 8 days ago |
        It needs to answer the "why" question most, IMO. I can read code and see "how", and perhaps guess "what", but the "why" is missing.

        It also doesn't have to be that detailed - just a one-line comment at the top of a file saying why it's there and what it's for can make an immense difference to the time it takes to understand code.

        Class comments are great too if they contain everything that's NOT in a ChatGPT summary :-). I can paste code into ChatGPT myself to get a summary if I really want one, so I don't need that - what I need is all the things it can't tell you, which is basically why the class exists and what it's intended for.

        The lower down the hierarchy you go, the less comments matter, IMO.
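        To make the point above concrete, here is a hedged sketch of the kind of file-top "why" comment being described. The file name and the business details are entirely invented for illustration:

        ```c
        #include <stdio.h>

        /*
         * invoice_throttle.c  (hypothetical example)
         *
         * Why this file exists: the upstream invoicing API rejects bursts of
         * more than 5 requests/second, and a rejected batch is not retried,
         * so invoices would be silently dropped. This module throttles all
         * outbound invoicing calls in one place so that callers never have
         * to know about the limit.
         *
         * A summarizer can recover the "what" (a rate limiter) from the
         * code below; the paragraph above is the "why" that otherwise lives
         * only in someone's head.
         */

        enum { MAX_REQUESTS_PER_SECOND = 5 };

        int main(void) {
            printf("throttling to %d requests/second\n", MAX_REQUESTS_PER_SECOND);
            return 0;
        }
        ```

        Note that the comment records a constraint of the outside world, not a description of the code; that is exactly the information no tool can regenerate from the source.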

    • mjr00 9 days ago |
      I have the same experience, and I agree that simplicity leads to success. The more things software can do, the harder it is to reason about what it's supposed to do. It's very much the IQ bell curve meme: junior developers only solve the problem at hand, mid-level developers build powerful, but complex frameworks which can solve the problem at hand but also potential future problems, and senior "X10" developers only solve the problem at hand.

      > There are times during the development of any project where one might be tempted to hack around an issue, or commit the ugly code that seems to work. These are the rough edges that inheritors of a codebase use as evidence that a blank slate would be preferable.

      Yes, one thing I've learned is to never underestimate the power of inertia in a codebase. When adding functionality, 99% of developers will go for the path of least resistance, which is mimicking whatever patterns already exist. To loop back to the article, this is often due to lack of full understanding; the default assumption is that because something is written in a certain way, it's the best way. This isn't true; it may not even be the correct way! But copying what already exists has an element of safety built into it, without needing to spend the effort to deeply understand existing code (which tends to be developers' least favorite activity).

      So if you put in an ugly hack, or have a code structure which doesn't make sense, expect that to persist for years, or decades.

      • tkiolp4 9 days ago |
        I don’t think it’s up to the developers only to decide. If we are working in a sprint, and all my manager cares about is “shipping impact”, then I’m not going to spend time on things that won’t benefit me in my performance review. I’ll take the shortcuts. Now, if management knew what we know, then certainly everyone would benefit… but that’s not the real world.
      • anal_reactor 9 days ago |
        > without needing to spend the effort to deeply understand existing code

        Or, more importantly, explain to others why you're deviating from "standard"

      • t43562 8 days ago |
        Deviating from the "standard" can make it harder for another maintainer to understand what's going on.

        IMO setting up reasonable patterns for others to follow is part of a good design. I'm not saying that I personally am great at it - it's an ideal!

        I think you are right though - very non-understandable things tend to persist because nobody wants to touch them.

        • red_admiral 8 days ago |
          > IMO setting up reasonable patterns for others to follow is part of a good design.

          Which is why we used to study a book called Design Patterns.

    • worik 9 days ago |
      > I've handed off a few services I built with minimal oversight or documentation. The receiving teams have been able to make changes without my involvement and everyone is happy.

      I struggle to believe this. Perhaps it's my personal situation: inheriting a 150k-line embedded C program, which started sprouting weird bugs when ported from x86 to ARM.

      > minimal oversight or documentation

      Why? Why do you not have documentation?

      > I've been successful in this is because I agonize over simplicity

      I will break this down. "I've been successful in this": I do not believe this statement.

      > I agonize over simplicity

      I wonder if the subordinates in your organisation who are not allowed to criticise you, wish you had agonised over documentation (I do not know what power you have over the folks who follow you, I am hypothesising it is a lot)

      Documentation is very hard. It is harder than writing code because there is no parsing of documentation, no demonstration of correctness.

      Inaccurate, or lazy, documentation can be worse than useless, but no documentation condemns the system to a slow death

      I wish my fellow computer programmers would stop making excuses for not doing the extremely hard work of documenting what they were thinking when they (inevitably) did something slightly different

      • chaps 9 days ago |
        > Why? Why do you not have documentation?

        > Documentation is very hard. It is harder than writing code because there is no parsing of documentation, no demonstration of correctness.

        You answered your own question :)

        • worik 9 days ago |
          > You answered your own question :)

          So "do not do the hard parts"?

          That is very unprofessional

          • golergka 9 days ago |
            Achieving desired result is professional. And achieving desired result without doing the hard parts is not only professional, but smart and actually kind of awesome.
            • worik 8 days ago |
              > Achieving desired result is professional.

              No. It is geeking out, part of the job...

              > achieving desired result without doing the hard parts is not only professional, but smart and actually kind of awesome.

              That is a menace. I think I am working on code you wrote

              It is the opposite of professional. It is amateur, irresponsible dilettantism

              • floating-io 8 days ago |
                "Professional" is doing what they're paying you for, end of story.

                Most employers have less than zero interest in paying coders to document in my experience. If they want documentation to exist, they hire a technical writer.

                Sadly, I've never met an employed tech writer (and no, journalists don't count).

                • Tainnor 8 days ago |
                  You're not being paid to document, you're being paid for writing maintainable code (in decent places at least) and it's your job as a professional to decide how much documentation that includes. In my opinion the idea that good code is "self-documenting" is a myth.
              • golergka 7 days ago |
                > ...to fight and conquer in all your battles is not supreme excellence; supreme excellence consists in breaking the enemy’s resistance without fighting
          • nicce 9 days ago |
            There is limited time available. The root commenter found success by other means. Adding good documentation would have cost much more time, which might have made their projects less successful in this case.
            • worik 8 days ago |
              > Adding good documentation would have cost much more time

              That is the problem

              Not that it is true, it is not, for many reasons. It is a problem that is believed

              • bdangubic 8 days ago |
                Cost more = it would take time. Do you have magic ways in which good documentation can be created without any time at all allocated to the effort?
                • Tainnor 8 days ago |
                  The cheapest time to add documentation is when you have the information in your head because you just worked on it. Nobody is demanding an essay for every method, but just write down what went through your head when you implemented weird hack #17 or found out that the API you're calling does something surprising.
                  • nicce 8 days ago |
                    The main challenge is that you would need to write the documentation for a person who might have never used this particular software before.

                    The information that is in your head might be nonsense to this person, and there is a chance that it does not reduce the time it takes to understand in any meaningful way.

                    • Tainnor 7 days ago |
                      It's a best effort thing - as anything in software.
                      • bdangubic 6 days ago |
                        waste of time too
          • chaps 9 days ago |
            Unprofessional to who, exactly? Like a sister comment says, an understandable system was built, so it seems like a strong professional relationship existed.

            Every codebase is going to have different definitions of "professional standards".

      • nosefurhairdo 9 days ago |
        Totally fair to be skeptical; there's no way I can convince you that my coworkers would agree with what I've claimed here. I do still interact with many of the folks that inherited my code though, and on multiple occasions they've expressed how my services have been easier to work on than others.

        Will also note I have no subordinates. In most cases I've handed these services off to teams with more seniority/higher rank than myself.

        Re: documentation, I suspect the embedded C and adjacent systems you work on warrant docs more than the web app plumbing work that I do. I've done brief write-ups with some diagrams, but I wouldn't know how to document further without just restating what is already clear from the code.

      • Tainnor 9 days ago |
        I somewhat agree with you, I don't understand the disdain many programmers have for documentation.

        Every company I've worked at had parts of the codebase that were full of complicated business logic whose purpose was totally non-obvious, or complex interactions with outside APIs, etc. I took care to document those things carefully so they would be understandable.

        • KronisLV 8 days ago |
          > I somewhat agree with you, I don't understand the disdain many programmers have for documentation.

          I also agree with this person for the most part. For all I know the original poster might indeed be successful with their approach, but in general having docs of some sort is a good idea.

          I think most devs have the sometimes mistaken belief (coupled with some arrogance/cargo-culting) that code should be self-documenting, skipping over the fact that code can document the WHAT but not the WHY in as much detail as would be needed to tell the full story.

          Sometimes a simple comment explaining the basis for doing things a certain way, a Markdown README/ADR in the same repo, or even a link to a particular Jira issue (to even indicate that one with useful stuff exists, in the midst of thousands of others) will all be better and save someone a headache in the case of them missing out on important context.

          The correct amount of documentation is as little as you can get away with (without being apathetic or ignorant of the developer experience of others in the project who don't know all that you do), but not zero. Code naming conventions and structure, tests (both for correctness and as examples of how to use things), and automation (e.g. Dockerfiles that detail the dependencies, Ansible playbooks that detail the needed environment, systemd service definitions, or even your project files and build scripts) might explain a lot about it, but not all.

      • delifue 8 days ago |
        The bugs coming from the x86-to-ARM port may be related to memory ordering: ARM has a weaker memory model than x86, so you may need to add memory barriers or synchronization. Of course, there are other possible causes.
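        A hedged sketch of the hazard being described, using C11 atomics (the producer/consumer shape and names are invented): a plain-store flag handoff often appears to work under x86's stronger ordering, while ARM's weaker model requires the release/acquire pairing shown here.

        ```c
        #include <stdatomic.h>
        #include <pthread.h>
        #include <stdio.h>

        /* A flag-publishes-data handoff. With a plain store to `ready`,
         * x86's strong (TSO) ordering usually hides the bug; on ARM the
         * consumer may observe ready == 1 before the payload store. The
         * release/acquire pair below makes it correct on both. */
        int payload;
        atomic_int ready = 0;

        void *producer(void *arg) {
            (void)arg;
            payload = 42;                                /* plain store        */
            atomic_store_explicit(&ready, 1,
                                  memory_order_release); /* publishes payload  */
            return NULL;
        }

        void *consumer(void *arg) {
            (void)arg;
            while (!atomic_load_explicit(&ready, memory_order_acquire))
                ;                          /* acquire pairs with the release */
            printf("%d\n", payload);       /* now guaranteed to see 42       */
            return NULL;
        }

        int main(void) {
            pthread_t p, c;
            pthread_create(&c, NULL, consumer, NULL);
            pthread_create(&p, NULL, producer, NULL);
            pthread_join(p, NULL);
            pthread_join(c, NULL);
            return 0;
        }
        ```

        The same fix can be expressed with explicit barriers in pre-C11 codebases; the point is only that the ordering must be stated, not assumed.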
        • sapiogram 8 days ago |
          With no further context, I think good ol' UB is more likely. Every C codebase I've seen that's not scrutinized with tooling to detect UB is full of UB.
          • worik 7 days ago |
            > tooling to detect UB

            My tooling is not showing anything.

            What tooling do you recommend?
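            On the tooling question: Clang and GCC both ship sanitizers that flag this class of bug at runtime (`-fsanitize=undefined` for UBSan, which includes alignment checks, and `-fsanitize=address` for ASan). A hedged sketch of the kind of x86-tolerated UB being discussed; the helper names are invented:

            ```c
            #include <stdint.h>
            #include <string.h>
            #include <assert.h>

            /* Cast-and-dereference through a misaligned pointer is undefined
             * behavior (and violates strict aliasing). It usually "works" on
             * x86; on ARM it can fault or return garbage. UBSan flags it. */
            static uint32_t read_u32_ub(const unsigned char *p) {
                return *(const uint32_t *)p;
            }

            /* The portable replacement: memcpy is well-defined on every
             * target and compiles to a single load where alignment allows. */
            static uint32_t read_u32_ok(const unsigned char *p) {
                uint32_t v;
                memcpy(&v, p, sizeof v);
                return v;
            }

            int main(void) {
                unsigned char buf[8];
                uint32_t x = 0xDEADBEEFu;
                memcpy(buf + 1, &x, sizeof x);      /* deliberately misaligned */
                assert(read_u32_ok(buf + 1) == x);  /* safe read round-trips   */
                (void)read_u32_ub;                  /* shown only for contrast */
                return 0;
            }
            ```

            Compiling the codebase with `-fsanitize=undefined` and running the existing test suite is a cheap first pass; it reports the file and line of each violation as it executes.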

    • jack_h 9 days ago |
      What you've said echoes the concept behind the quote "I didn’t have time to write you a short letter, so I wrote you a long one." This, or some variation of it, has been around for quite a while. I think this reveals a fundamental truth about knowledge based work that is inherent to humans. Purposeful simplicity is harder than accidental complexity.

      > This approach of course takes more time and requires that your management trusts you and is willing to compromise on timelines.

      I would say that most management and even most programmers don't see the value in this. In my experience focusing on simplicity gives much better long-term results but it has higher and more unpredictable upfront cost. Blasting code onto main is seen as being more productive even though long-term it seems to have much higher overall costs.

    • gofreddygo 9 days ago |
      Simplicity is a noble ambition, but let it not impede progress, for it is subjective and subject to discretion. The art to be learned and practiced is knowing when to cut corners and where to be relentless with yourself, and to demand that of others.

      Getting v0.1 out, albeit with murky code, and iterating to v2.5 with 10 paying customers is the way to progress. The hard, non-science part is getting management to spend billable hours for no short-term benefit. That's the key skill.

  • softwaredoug 9 days ago |
    I see star, lone-wolf ICs get too out in front of their teams all the time. It usually doesn’t end well. The star IC could objectively be building the right thing (like a state of the art recommendation system). But, like the article says, requirements change, bugs need to be fixed, the team needs to adjust the implementation and eventually the team reimplements the thing to within their capabilities.

    It’s more than the usual software maintenance too. It’s the entire operation of a piece of software: scaling it out, being on call for it, adding monitoring, alerting and logging; interoperating with other software in the company; developing libraries and services for other developers to consume; security; understanding and deploying the dependencies of the software. And more.

    The clever recsys in my example is only the tiny kernel of the actual challenge of delivering this to users. It's the complex care and feeding of a live service that matters.

    • physicles 9 days ago |
      According to the article, the mistake wasn’t that X10 got too far out in front of the team (the initiative to build svc was supposed to be a lone effort), it was Org’s failure to orchestrate a hand-off from X10 to team.
  • brettgriffin 9 days ago |
    There's some really interesting stuff in here, but I think, given the example, it is burying the lede: organizations systemically underestimate the total cost of ownership of a service. By, like, orders of magnitude.

    In this example, it isn't entirely clear if this service ('saas middleware') is deeply integrated with org's core competency or value. But I'll assume it isn't.

    They do not understand the service domain well enough and cannot staff or motivate the people to build and maintain it correctly. This is exactly why SaaS exists and is so ubiquitous. You're just not going to be able to build something as good or better for less in the long run.

    If they had properly understood the cost of building and maintaining this, compared to the (probably) insignificant increase in enterprise value to the org, they would probably have just RIF'd these spare engineers and kept paying the SaaS provider.

    I deal with the internals of many engineering teams across companies of all sizes, and sure as shit, every. single. one. of them has multiple of these internal failed creations. I just don't think people truly understand how much of a liability these systems are to orgs.

    But yeah, once they made the first mistake, the rest of the blog pretty much hits the nail on the head.

    • dambi0 9 days ago |
      I think the article is trying to say more than that the development and ownership of projects is often underestimated. It’s attempting to explain why that is the case. Because the cost of theory building is misunderstood. I don’t think the lede has been buried at all.
      • brettgriffin 8 days ago |
        The whole scenario only exists because of the axiom introduced between points 3 and 5:

        > 3 ...ORG spends an egregious amount of money on middleware SaaS

        > 4 ...executive figures they should be able to replace SaaS with in-house system

        > 5 ...manager tasks one of ORG ’s finest engineers with the job of building it

        If in point 4 it had been determined that this was a low-value, high-TCO project with many replacements, the stud engineer doesn't work on the project, and no events past this point occur.

        If point 3 had been that they had an opportunity to build a flagship product/feature in their wheelhouse and drastically grow market share, nothing past points 6 or 7 happens.

        Like I said, there are interesting things here about knowledge transfer, but the root cause seems to be missing from the analysis. Maybe there are some other real-world scenarios where teams of critical software are getting replaced wholesale and remain confused, but I'm not convinced most of these issues would come up in a situation that wasn't the one described in 3-5.

        • carbonguy 8 days ago |
          > The whole scenario only exists because of the axiom introduced between points 3 and 5...

          I'll argue that the higher-level context introduced in point 2 is even more important here: "ORG shifts from assume we have infinite budget mode to we need to break even next year or we’ll die" i.e. the whole scenario exists not because the business can't accurately evaluate TCO, it's because the business is in do-or-die mode and long-term TCO doesn't matter NOW.

          That is, this whole scenario takes place in a situation where there is an organizationally vital need to cut costs. What happens afterwards is a trade of long-term risk (internalizing an essential business function and giving it a bus factor of one) for immediate financial improvement (no more SaaS spend). Long-term TCO doesn't matter if the company collapses next quarter, right?

          And in that short-term frame, the project is an unqualified success: X10 delivers exactly what was needed, and the SaaS spend is eliminated. But the risk hits: X10 leaves the company.

          [So, pointing out this hypothetical company isn't correctly estimating TCO is correct, but irrelevant; they're in a position where having to pay the long term costs will be a better problem than the one they have now - a reasonable business decision, though not a great one to have to make.]

          For what it's worth, I completely agree with your original point: organizations really do systemically underestimate the total cost of ownership of a service. Within the example in the article, the flawed assumption is pretty explicitly laid out in point 7: "For all intents and purposes, development is done, they only need to keep the lights on." - and exploring WHY this assumption is flawed is the core of the article (section 3).

          So, ultimately I agree with dambi0 in the GP comment - the lede hasn't been buried here; rather, the whole article is a discussion of one aspect of the very point you make. Why DO organizations systematically underestimate service TCO? Because, at least in part, there is not yet a widespread understanding that a service is not "software" in and of itself; rather, a service is the organizational understanding of a solution to an organizational problem domain, and maintaining organizations is orders of magnitude more expensive than maintaining tools in and of themselves.

  • gatinsama 9 days ago |
    > the mental model that allows the designer to map a subset of the world (the domain) to and from the system, and not the system itself, is the primary product of the software design activity

    This is spot on. I was never able to put it in such precise words.

    This theory has Brooks' Law as a corollary:

    "Adding manpower to a late software project makes it later."

    Because the developers need time to develop this mental model before they can meaningfully contribute to the codebase.

  • tekchip 9 days ago |
    This is a general problem of whys and hows - or, in project management terms, of process and procedure, with documentation tying the two together. Code is the procedure, the how: the steps to do the thing. Which is great, but it's hard to make meaningful and useful changes without understanding the why: why does the thing need to be done, and why are the procedures (the code) doing the things they do in the way they do?

    Presumably for code you would get enough of the why/process via comments, but that seems unlikely. Perhaps coding needs to borrow some tools from project management? Knowledge sharing/transfer is hard.

  • harrall 9 days ago |
    I feel that the real problem is a lot of people don’t care about collecting requirements.

    It’s one of my favorite parts of the process.

    People just want to build the app that they want to build. I’ve talked to engineers who just say “I don’t really care until we can start coding.”

    I got into engineering because I like building things that are useful.

    • avg_dev 9 days ago |
      Where I live, most software developers are not legally allowed to call themselves software engineers (engineering is licensed by regulatory body). Still, many do call themselves that and so do their employers. But in my view, "I don't really care until we can start coding" is not actually engineering at all.
      • worik 9 days ago |
        > developers are not legally allowed to call themselves software engineers

        How about "Solution Architect"?

        I think a collection of nonsense job titles for computer programmers would be fun...

        • Nevermark 9 days ago |
          Every Solution Architect should have at least seven Framework Framers and two Data Plumbers supporting them, to give the title its proper gravitas.
      • harrall 9 days ago |
        I’m not sure why you are making a point about engineering certification.

        It’s not like physical products are immune to this problem. I could list you a billion poorly designed products that don’t seem to meet the correct requirements.

        At the end of the day, some people just like to build stuff without understanding who they are building for. It could be because they like engineering. It could be because they think they will make money because “people will come if you build it.” Both strategies make poor solutions.

        When it should be “the users have these specific problems and the product should make their life easier.”

      • drewcoo 9 days ago |
        Where I live we require a license for barbering and cosmetology. Apparently there are illegal haircuts. I have never heard of the hair police but I'm sure that people complain about it in fora.
      • t43562 8 days ago |
        I've come across some terrible programs written by almost-to-be-certified engineers :-) - lots of embedded constants and special cases.

        I am perfectly happy to be called a "programmer" though. IMO that's a very adequate description and honourable. No need to steal anyone else's glory.

    • CT4u8798 9 days ago |
      I am not a developer by trade, but being technically capable I inherited a system once that I kept running beyond its real lifespan. Eventually it was to be replaced, and an outside company was contracted to develop a new system. Despite multiple meetings in which I demonstrated the shortcomings of the current system and the workflow on which it was based, all this company did was replicate the old system in their chosen software stack (which also didn't really work, because the old version was relational and theirs was NoSQL). I got the impression that they already had an idea of what they were going to create and didn't listen at all. I've since moved on, but I hear the new system is worse than the old one.

      TL;DR, I have direct experience of: “I don’t really care until we can start coding.”

    • forinti 9 days ago |
      I find that the client often expects you to just code whatever they need without much interaction.

      The truth is that requirements gathering is also a moment of discovery for the client.

      • tonyedgecombe 9 days ago |
        I once had a client who had that written into the contract. I’d consider that a huge red flag now.
    • sibit 9 days ago |
      > I feel that the real problem is a lot of people don’t care about collecting requirements.

      As someone who _really_ enjoyed requirements gathering for many years and now has become one of the "I don't care let's just build it" people I can assure you that some of us crashed out thanks to Scrum Masters™, Project Managers™, Product Owners™, or any of the other big "A" Agile™ cronies.

  • avg_dev 9 days ago |
    Yes, artifacts (PRs, tickets, commit messages) should have good context (documentation) associated with them; yes, simplicity is very important; and yes, some tech debt is always going to be incurred. But most importantly, significantly modifying a "legacy" system that is already running in production, without understanding it, is not going to lead to good results. I think it's as simple as that.
  • siscia 9 days ago |
    Most of our white collar jobs are about knowledge sharing and synchronization between people.

    And surprisingly this is an aspect in which I see very very little progress.

    The most we have are tools like confluence or Jira that are actually quite bad in my opinion.

    The bad part is how knowledge is shared: at the moment it is just formatted text with questionable search.

    LLMs, I believe, can help synthesize what knowledge is there and what is missing.

    Moreover, it would be possible to ask what is missing or what could be improved. And it would be possible to continuously test the knowledge base, asking the model questions about the topic and checking the answers.

    I am working on a prototype and it is looking great. If someone is interested, please let me know.

    • mdgrech23 9 days ago |
      Knowledge is power, and people don't always want to share. Maybe it's more reflective of my company culture, but I've seen knowledge effectively hoarded and used strategically as a weapon at times.
      • siscia 9 days ago |
        Of course, but at least in my personal case it is more about the lack of tooling.
      • nicce 9 days ago |
        It is visible everywhere. Some people hoard knowledge so that they stay important in the company. Some people hoard knowledge so that they can get more money from bug bounties. It is almost always about personal gain.
      • ozim 9 days ago |
        Of course there is no upside to spending time updating documentation, unless it is actually part of your job description or there is a legal requirement for the company.

        If you put knowledge in wiki, no one will read it and they will keep asking about stuff anyway.

        Then, if you put it there and keep it up to date, you open yourself to a bunch of attacks from unhappy coworkers, who might use it as a weapon, nagging that you did not do a good job or finding gaps they can nag about.

        • PsylentKnight 8 days ago |
          I write documentation because I enjoy it and I see it as a tool for consolidating/solidifying my own knowledge
    • tylerchurch 9 days ago |
      > LLMs I believe can help in synthesize what knowledge is there and what is missing.

      How could the LLM help?

      Given that it is missing the critical context and knowledge described in the article, wouldn’t it be (at best) on par with a new developer making guesses about a codebase?

      • nyrikki 9 days ago |
        The open domain frame problem is simply the halting problem.

        https://philarchive.org/rec/DIEEOT-2

        While humans and computers both suffer from the frame problem, LLMs do not have access to semantic properties, let alone the open domain.

        This is related to why pair programming and self organizing cross functional teams work so well btw.

      • siscia 8 days ago |
        As engineers we often aim for perfection, but oftentimes it is not really needed. And this is such a case.

        Knowledge is organised into topics, and each topic has a title and a goal. Topics are made of markdown chunks.

        I see the model being able to generate insightful questions about what is missing from the chunks, as well as synthesise good answers to specific queries.
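        Purely as a sketch of what that topic/chunk structure could look like (the names, fields, and example data here are invented for illustration, not taken from any existing tool):

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    """A unit of organizational knowledge: a title, a goal, markdown chunks."""
    title: str
    goal: str
    chunks: list[str] = field(default_factory=list)

def gap_finding_prompt(topic: Topic) -> str:
    """Build a prompt asking a model which questions the chunks leave open."""
    body = "\n\n".join(topic.chunks) or "(no notes yet)"
    return (
        f"Topic: {topic.title}\nGoal: {topic.goal}\n\nNotes:\n{body}\n\n"
        "List the questions a new team member would still need answered "
        "to achieve this goal, given only the notes above."
    )

# Hypothetical example topic
deploy = Topic(
    title="SVC deployment",
    goal="Any team member can deploy SVC safely",
    chunks=["Deploys go through the release pipeline.", "Rollback is manual."],
)
prompt = gap_finding_prompt(deploy)
```

        The point is only that the structure gives the model something concrete to interrogate; the hard part remains getting the chunks written in the first place.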

      • t43562 8 days ago |
        I think companies have a lot of data in systems like Confluence, Jira, and their chat solution that is hard to find; people in the company don't even know it might be there to search for.

        An LLM trained on these sources might be very powerful at helping people avoid solving the same problem many times over.

        • jazzyjackson 8 days ago |
          The problem isn't the interface, it's the access: having everything in one place vs. fragmented across different systems and departments.

          I built a chatbot under the same assumption you have for a large ad agency in 2017, an "analyst assistant" for pointing to work that's already been done, offering to run scripts that were written years ago so you don't have to write them from scratch

          Through user testing the chat interface was essentially reduced to drop-down menus of various categories of documentation, but actually it was the hype of having a chatbot that justified the funding to pull all the resources together into one database with the proper access controls.

          I would expect after you went through the trouble of training an LLM on all that data, people using the system would just use the search function on the database itself instead of chatting with it, but be grateful management finally lifted all the information silo-ing.

          • t43562 8 days ago |
            Some of these companies aren't exactly eager to make it cheap to access the data you have entered into their systems. It's like they own your data, in a sense, and want to make it harder to leave.

            I love your point about the chatbot being the catalyst for doing something obvious. I curate a page for my team with all the common links to important documentation and services and find myself nevertheless posting that link over and over again to the same people because nobody can be bothered to bookmark the blasted thing. Sometimes I feel it's pointless making any effort to improve but I think you have a clever solution.

            The other aspect of it, IMO is that searching for the obvious terms doesn't always return the critical information. That might be my company's penchant for frequently changing the term it likes to use for something - as Architects decide on "better terminology". I imagine an LLM somehow helping to get past this need for absolute precision in search terms - but perhaps that's just wishful thinking.

  • picometer 9 days ago |
    This is a well-referenced essay, drawing on the writing of David Parnas [1], Peter Naur [2], and Zach Tellman [3].

    As software developers we’re intimately familiar with these ideas. But the industry still treats it as “folk knowledge”, despite decades of academic work and systemization attempts like the original Agile.

    We really need more connective work, relating the theoretical ideas to the observed behavior of real-life software projects, and to the subsequent damage and dysfunction. I liked this essay because it scratches that itch for me. But we need this work to go beyond personal blogs/newsletters/dev.to articles. It needs to be recognized & accepted as formal “scientific” knowledge, and to be seen and grokked by industry and corporate leadership.

    [1] https://dl.acm.org/doi/pdf/10.5555/257734.257788

    [2] https://pages.cs.wisc.edu/~remzi/Naur.pdf

    [3] https://explaining.software/

    • physicles 9 days ago |
      I suspect systemization would require quantifying some of the variables involved, which include things like

      - The size and complexity of the code base (for some definition of size and complexity)

      - The quality of the code and docs (for some definition of quality)

      - The skill and experience of the people involved

      In four years in a big tech role, my team twice inherited and had to modify a code base without any input from the original authors. One was a quagmire, the other was a resounding success:

      - The first was a media player control that we had to update to support a new COM interface and have a new UI. We decided that it was too complicated, and nobody understood it, so we’d reimplement it from scratch. One year later it mostly worked, but still had bugs and performance issues that the original version didn’t have. In hindsight, I suspect it would’ve been cheaper to try to revive the original code base.

      - The second was a music database for an app running on a mobile device. Our current one was based on the version of SQL available, but some principal engineers on another team suggested replacing it with a custom in-memory database that already shipped in another device. We argued that the original authors had left and the code was unwieldy; they argued that “it’s just code, we can read it” and its performance was known to be better. They did the work to revive it and successfully integrated it into our app. Wild success.

      The flip side of “it’s impossible to revive a dead system” is “don’t rewrite a working system from scratch”. Absent more research, the only way to correctly guess which situation you’re actually in is to have tons of experience.

      • aoeusnth1 8 days ago |
        Probably also these situations are dependent on the people involved. If it weren't those particular principal engineers on the project, it's possible trying to revive the in-memory database would not have been successful.
  • mfld 9 days ago |
    I can't resist pointing out that, in theory, there are at least two other options to avoid wasting so many resources on the failing new teams:

    1. Get the original dev to explain his theories (keep employees longer or engage them as consultants).

    2. Make and keep a "diary" of the original dev's theory building.

    In this story, and probably in many places, however, the business environment favors the outcome described.

  • boricj 9 days ago |
    The focus of this article is on the big political project that fails (the SVC), but the part that resonates most with me is the small forgotten project that lives (the SaaS).

    Over the years, I've built a number of contraptions under a similar set of circumstances: a technical problem usually created by an organizational issue suddenly appears that is both severe enough to threaten a project yet falls outside the core business, so it needs to be fixed both yesterday and on the cheap.

    Inevitably, I get saddled with it and produce a kludge that is equally effective and cursed before going back to business as usual. More than once I've learned to my horror that years later the thing is not only unmaintained yet still in place, but its usage expanded to the point where it became load-bearing, because the underlying organizational issue was never solved.

    In a manner of speaking, it is the opposite of the situation described in the article: a complete lack of software design that somehow manages to survive in spite of a lack of knowledge building.

  • binary_slinger 9 days ago |
    Also related: https://blog.codinghorror.com/commandos-infantry-and-police/

    I work a lot in the transition area between commando and infantry aka X_10 and TEAM. I’ve also found myself on TEAM++ coming in to replace TEAM.

    It is difficult to explain to customers that SVC was built on a set of assumptions which in turn informed the design. Once the assumptions change, the design typically needs to change as well.

    • t43562 8 days ago |
      This is a major issue: why the code is what it is.

      You need a history of the assumptions so that new developers can know what's legacy and what isn't.

      I've never yet had a set of requirements that didn't change.

      • CRConrad a day ago |
        > You need a history of the assumptions so that new developers can know what's legacy and what isn't.

        Isn't this (at least in part (and perhaps only approximately)) what Architecture Decision Records are for?

  • gervwyk 9 days ago |
    Totally resonated with me re software lifecycle etc. Great article.

    I know this is not what the article is about, but perhaps the execs should have spent resources and time trying to increase revenue instead of cutting costs marginally and creating an expensive system down the road, derailing the team's focus.

    Build vs buy… Build is almost always not cheaper. Many other reasons to build though.

  • jt2190 9 days ago |
    > Knowing that [program] revival [i.e. bringing a new development team “up to speed” by having them learn the model] is a plausible future need has powerful consequences for our work.

    I’m not sure that most developers are willing to revive software, based on my observation that very few read anything at all, especially the source code. Instead I see a lot of adjusting the input and output of the existing program by adding a new layer. This new code is totally understood by the new dev, and they can modify/maintain it easily without worrying that they broke the existing system. It also usually duplicates something that already exists inside the system. As the process repeats more and more layers are added.

    I think a few lucky teams have developed a culture that encourages learning the existing code. (Popular web frameworks come to mind as an example.)

  • Mawr 9 days ago |
    > The problem is that TEAM members don’t have enough elements to build a satisfactory mental model of SVC. They need to go by a mix of the client’s interpretation of what the system should be, and what they can tell from the code that the system actually is. These views can be disconnected and contradictory. The code may tell the what and the how, but it doesn’t tell the why. Only X10 could say what was a functional requirement, what a technical necessity, what a whim, what an accident. The team has to resort to reverse engineering, extrapolating, and guessing.

    Hence, write down your thought process, mental model, and assumptions alongside the code. Tip: Call the process "writing documentation" instead of "commenting code".
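    As a hypothetical illustration of recording the why next to the what (every specific here, the vendor throttle, the SLA, the cap, is invented):

```python
def retry_delay(attempt: int) -> float:
    """Delay (seconds) before retrying a failed upstream call.

    Why exponential with a 30s cap: the upstream vendor throttles us
    (a business constraint, not a technical one), and anything past
    30s made the nightly batch miss its SLA. If the vendor contract
    changes, revisit this cap: it is a whim of the 2023 agreement,
    not a requirement of the system.
    """
    return min(2.0 ** attempt, 30.0)
```

    The code alone says "exponential backoff, capped at 30"; only the note distinguishes functional requirement from accident, which is exactly the distinction the article says is lost when X10 leaves.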

  • deskr 9 days ago |
    > The program should preferably be discarded, and the new team should be given the opportunity to resolve the problem from scratch.

    Yeah right. "We don't know how it works so we're going to ditch it and create it again."

    It works in some cases but by no means should that be the default.

    • datadrivenangel 8 days ago |
      This is the right answer, except that when software has been so neglected that this is the best answer, individuals within the organization that neglected the software will realize this is their only chance to get bugs fixed or new features added, and so the requirements will expand until the rewrite probably dies.
  • deskr 9 days ago |
    I have another suggestion which I'm sure played a large part in this.

    SVC was an unwanted child. It wasn't their "product". One employee was tasked to write it to save paying money to a "seemingly innocuous middleware SaaS".

    To anyone in ORG working on it, it was a dead end. No one wanted to own it and perhaps no one did. A team was asked to add features to it.

    Doing the ground work of actually understanding SVC had many negative consequences:

    * It would take a very long time, making managers unhappy. It would be largely a wasted effort, since no further work was then needed on SVC.

    * If you became an expert on SVC, it would be yours to keep, and no one wanted that.

  • contingencies 9 days ago |
    I imagine that experienced cross-disciplinary designers would concur that all complex design is knowledge building, which is why documenting design decisions is important. You tend to learn this lesson when maintaining projects of nontrivial complexity over a longer period.
  • gr3ml1n 8 days ago |
    This is an interesting argument for the (definitely common) pattern the author describes in the intro.

    A more cynical take (that I'm inclined towards) is: the median software developer is simply not very good. X10 was a good developer; the people on TEAM and TEAM++ were not.

    • tresil 8 days ago |
      This is absolutely my take as well. I see the points that the author brought up as additional contributing factors. However, the leading reason for this “phenomenon” is that many companies are brimming with individuals (including managers) who are simply not competent or motivated enough to meet the demands of this profession. In the author’s story, this is probably why X10 decided to leave the company: they were tired of working with incapable co-workers.
  • nextworddev 8 days ago |
    Kind of a roundabout way of saying that you can’t evolve a program without fully understanding the codebase
    • googamooga 7 days ago |
      Not only the codebase, but also the domain, the stakeholders, the history of requirements, etc etc etc.
  • wwarner 8 days ago |
    This is really why AI is going to hit s/w development so hard. It's not merely going to make code easier to write, it's going to be a massive knowledge repository that takes a team from initial conception, through product design and finally engineering design and coding.
    • n_ary 8 days ago |
      No, an LLM needs an excellent communicator. It can statistically spit out knowledge, but someone has to embed that knowledge. Given how vague and contradictory most requirements are, and how complete and excruciatingly detailed prompts must be, LLMs will be useful for generating prototypes faster to check assumptions about the lost knowledge, nothing more or less.
      • namaria 8 days ago |
        The real trouble with LLMs is that they emulate knowledge so well. People assume they can depend on it to know things, but it is not reliable at all. A lot of traps are being laid in code by people trusting the output or behavior of LLMs.
      • wwarner 8 days ago |
        Yes, this is what I’m saying: the development process will be about writing and talking into a new tool, and then with that recorded information generating summaries, mocks, prototypes, and code. Definitely people would be involved. What I’m pointing out is that LLMs are natural tools for summarizing and synthesizing domain expertise, which can be naturally applied to the product development process. If an LLM-based tool can be a great personal assistant, it can also be a great knowledge repository for an organization.
  • namaria 8 days ago |
    People often assume that code is knowledge. They want "self explanatory" or "well documented code". Companies and managers often treat developers as interchangeable. But therein lies the mistake.

    Knowledge exists in mental models and team structures. Small components and systems can be understood by a person, but team structure also embodies knowledge of larger systems. People will need complementary mental models to understand a large system together.

    That's why adding manpower to a late project makes it later. That's why maintenance is hard and handover is harder. That's why systems devolve into big balls of mud. Because companies and managers do not respect the fact that you need people and teams who have good mental models and that mental models take time to build and share.

    No amount or quality of code can make up for this fact. And simulacra of ownership - having "product owners" or whatever - won't cut it. You need people to own their systems, understand them deeply. Moving people around, churn, treating developers as interchangeable, substituting rituals for deep work, accumulating 'technical debt' (deferred work as in ship now and think later) etc are all detrimental to building and sharing sound mental models.

    • dominicrose 8 days ago |
      I've seen code like this:

      - 10 lines

      - every line has a different author in git blame

      - there are conditional branches inside conditional branches

      - last but not least: all conditional branches have the same behavior!

      How does it end up like this? Why doesn't the last committer just delete everything and write it in a single line instead?

      • namaria 8 days ago |
        Treating code as a means to an end, which is what I think you're describing, is just as bad as treating it as an end in itself. It should be neither an opaque black box nor a transparent 'bicycle of the mind'. In my opinion it cannot be lumped all in one category. It is an aspect of software systems. It may have many features, depending on where it is and how it is used.

        Code can embody knowledge, but it is not the embodiment of knowledge. It can express functionality but it is not a functional component of a system. I think aspect is the best description: when you look at a system from the source code, you see some of it. Not a projection of the system over a set of dimensions as some people seem to treat it. It is not a textual description of the system. It is the part of the system you can see when you come at it from that side.

        • dominicrose 8 days ago |
          It's true that the 10-line code I was describing did provide some extra information, like the fact that there are different cases that are, have been, will be or could have been different... I agree that the code isn't the end result if that's what you're saying. But the resistance to change is everywhere not only in the code and it locks projects into what they are. Only expansion is allowed. I'm not saying it's bad it's just what it is.
    • red_admiral 8 days ago |
      Managers who know what they're paid for also want code with a bus factor of more than 1, just like in other engineering disciplines. Having code well documented is a feature in that sense; like unit tests it won't fix all problems but it goes a long way.

      Speaking of tests, I've many times learnt more about how some code is supposed to work from the tests than from the documentation. Yet another reason to test everything you can.

      • bb88 8 days ago |
        Unit tests often double as documentation. They show the expected behavior of a function.

        And if you take the time to write a series of high level cases, they can show the full expected behavior of a process. E.g: "Don't accept another request on the same object while we have another request on that object in the queue."

        A unit test is great, but I've seen people delete unit tests rather than try to understand what's going on.
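        A toy sketch of such a documenting test (the queue, its API, and the test names are invented to match the example rule above):

```python
import unittest

class RequestQueue:
    """Minimal queue that rejects duplicate in-flight requests per object."""
    def __init__(self):
        self._pending = set()

    def submit(self, object_id: str) -> bool:
        if object_id in self._pending:
            return False  # a request for this object is already in flight
        self._pending.add(object_id)
        return True

    def complete(self, object_id: str) -> None:
        self._pending.discard(object_id)

class TestRequestQueue(unittest.TestCase):
    def test_rejects_second_request_while_first_is_queued(self):
        """Documents the rule: no second request on an object while one is pending."""
        q = RequestQueue()
        self.assertTrue(q.submit("order-42"))
        self.assertFalse(q.submit("order-42"))  # the business rule, in one line
        q.complete("order-42")
        self.assertTrue(q.submit("order-42"))   # allowed again once done
```

        The test name and body together state the expected behavior more directly than most prose would; deleting it deletes the documentation too.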

    • amelius 8 days ago |
      What if an LLM could transform code into documentation?
      • striking 8 days ago |
        I mean, it might be able to, as could a junior software engineer. That's beside the point.

        It's rare that just reading the code will actually capture the spirit of what it means, that you could skip the step of asking the folks who wrote it why things are the way they are or the step of experimenting with it yourself to get a feel for it.

        Or in other words, it doesn't really matter who reads the code. You still don't get to skip the knowledge building.

      • _DeadFred_ 8 days ago |
        So I constantly had to fight management about paying my people well when I moved to IT. Management saw them as replaceable by anyone who knew the software we used. I saw them as domain experts who knew HOW we used the software; the software was secondary to knowing the company (and basically how every job was done in the manufacture of a 30,000+ part product). When I was a dev, we were partnered with industry domain experts so that we understood how the software was implemented, to a level that 'self-documenting code' never will reach.

        Software is a cog. Your code can't be so self-documenting that it becomes the domain expert for the domain it is trying to fill. That's like documenting how to train for a marathon by looking at running shoes.

        • amelius 8 days ago |
          But a pretrained LLM like ChatGPT can know a lot about problem domains.
      • bb88 8 days ago |
        Unlikely. LLMs only understand the code they're looking at, but not in the context of the complex interactions. E.g. LLMs won't understand how a particular line of code fixed a system outage that occurred last year.
    • cjohnson318 8 days ago |
      I've been contracting and taking care of legacy software this year. My initial parallel steps are to ask questions, read the code, write tests and literally any kind of documentation, all while trying to implement features. You can spend months trying to understand some software, and never really get anywhere because there's no requirements/documentation, your manager doesn't know, and no one who wrote that code works there any longer, or they're just too busy. It's a hot mess.

      (I'm not even a Test Driven Design evangelist. There's just no other way to "prove" that things kind of, almost, sort of, work in a possible environment.)

      • namaria 7 days ago |
        There's no replacement for knowledge and sound mental models. Documentation, tests, they can help but they cannot replace having a knowledgeable person around.
  • pjbster 8 days ago |
    Even if the organisation is fully signed up to the knowledge-building philosophy, it can still be derailed if the staff aren't up to scratch.

    Around 12 years ago, my employer tasked me with building a quote engine for a new product they wanted to sell online. The engine needed to produce 4 additional quotes (2 lower, 2 higher) to either prevent potential walk-aways or to offer upsells to capture potential additional revenue.

    And it struck me at the time that this sort of hand-wavey selling tactic would be just the sort of thing that was likely to change so I put all this logic into a pure function - pass in an original quote request and the function returns a collection of alternative quote requests.
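    The shape being described might look roughly like this (the domain fields, scale, and offsets are invented for illustration):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class QuoteRequest:
    product: str
    coverage_level: int  # hypothetical scale: 1 = basic .. 10 = premium

def alternative_quote_requests(original: QuoteRequest) -> list[QuoteRequest]:
    """Pure function: derive alternative quotes from the ORIGINAL request.

    The selling tactic (here: 2 lower, 2 higher) is the part likely to
    change, so it lives in one place with no other inputs or side effects.
    """
    offsets = (-2, -1, +1, +2)  # e.g. becomes (-1, +1, +2, +3) when the tactic changes
    return [
        replace(original, coverage_level=original.coverage_level + d)
        for d in offsets
        if 1 <= original.coverage_level + d <= 10
    ]
```

    Because the function is pure and takes only the original request, changing the tactic later should mean editing one tuple; that was the design bet.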

    And, sure enough, a couple of years later, the business decides to change the approach and offer 1 lower quote and 3 upsells. And they give the job of implementing this to another developer.

    I was still at the company and was known as the original developer (my name was in a comment at the top of the file, for a start) so I was asked to review the code changes.

    I was surprised to learn that the other dev had left the pure function untouched and had, instead, written a bunch of new logic to generate the alternative quotes. Not based on the original quote request, but on the collection of alternative quotes returned from the original function. Furthermore, this new logic was placed in the main procedure - mama's finest spaghetti in the making, right there.

    So I rejected the change and told the dev where to put the actual logic. Then I waited for the re-review request to come in.

    What happened instead is that the code went live anyway - the dev had simply re-raised the review request and assigned it to another dev who rubber stamped it.

    Looking back, I don't think all the documentation in the world would have prevented this behaviour. A better approach would be for the company to pass the changes to the original developer and to pair with another dev - like Fred Brooks' Chief Programmer plus Assistant recommendation.

    I was never approached for an end of year review for this developer and I left the company before them. It's not personal but I'll resign on the spot if a company I work for employs this developer in future.

  • t43562 8 days ago |
    The problem I have is that only people who "know" already tend to accept this insight.

    i.e. explaining it to those who make decisions doesn't work that well because it doesn't fit their mental model very well - they don't understand that a large part of their asset is sitting in people's heads.

    That guy wants 5k more.....? Pay it. The cost of hiring someone new and training them up and relearning it all will be far higher. Don't force people back to the office, be relaxed about everything and keep the knowledge.

    At the same time make sure other people are learning it so you do have replacements.

    At the same time make sure you have proper tests so you can work with code you don't understand yet if someone leaves.

    At the same time, try to gather all the correct and up-to-date documentation and explanations somewhere.

  • red_admiral 8 days ago |
    Part of the story that is mentioned but not discussed is that X_10 had a fixed and tight schedule. Under that constraint you expect some compromises on writing the mental model down in a way others can learn from, because that model presumably lived in her head.

    It might have been possible to say, great job you delivered on time, now you have as much time as you need to write your mental model down. On full pay of course, and we won't count that against you as non-technical work in the next performance review.

    That increases the upfront cost of SVC but is an investment that pays back interest as soon as someone else has to fix anything.

  • beryilma 6 days ago |
    > A dead program may continue to be used for execution in a computer and to produce useful results. The actual state of death becomes visible when demands for modifications of the program cannot be intelligently answered.

    I totally believe this. I see zombie programs at my employer, including the ones that are currently making a profit, that just don't know yet that they are dead. The Jira model of non-ownership software development being a leading cause in my opinion...