How I program with LLMs
871 points by stpn 3 days ago | 323 comments
  • mlepath 3 days ago |
    The first rule of programming with LLMs is don't use them for anything you don't know how to do. If you can look at the solution and immediately know what's wrong with it, they are a time saver otherwise...

    I find chat for search is really helpful (as the article states)

    • qianli_cs 3 days ago |
      Exactly, you have to (vaguely) know what you’re looking for and have some basic ideas of what algorithms would work. AI is good at helping with syntax stuff but not really good at thinking.
    • itsgrimetime 3 days ago |
      IMO this is a bad take. I use LLMs for things I don’t know how to do myself all the time. Now, I wouldn’t use one to write some new crypto functions because the risk associated with getting it wrong is huge, but if I need to write something like a wrapper around some cloud provider SDK that I’m unfamiliar with, it gets me 90% of the way there. It also is way more likely to know at least _some_ of the best practices where I’ll likely know none. Even for more complex things getting some working hello world examples from an LLM gives me way more threads to pull on and research than web searching ever has.
      • Retr0id 3 days ago |
        > if I need to write something like a wrapper around some cloud provider SDK that I’m unfamiliar with

        But "writing a wrapper" is (presumably) a process you're familiar with, you can tell if it's going off the rails.

        • joemazerino 3 days ago |
          Writing a wrapper is easier to verify because of the context of the API or SDK you're wrapping. Seems wrong? Check the docs. Doesn't work? Curl it yourself.
      • Barrin92 3 days ago |
        >It also is way more likely to know at least _some_ of the best practices

        What's way more likely to know the best practices is the documentation. A few months ago there was a post that made the rounds about how the Arc browser introduced a really severe security flaw by misconfiguring their Firebase ACLs despite the fact that the correct way to configure them is outlined in the docs.

        This to me is the sort of thing (although maybe not necessarily in this case) that comes out of LLM programming. 90% isn't good enough; it's the same as pasting from Stack Overflow. If you're a serious engineer and you are unsure about something, it is your task to go to the reference material, or you're at some point introducing bugs like this.

        In our profession it's not just crypto libraries, one misconfigured line in a yaml file can mean causing millions of dollars of damage or leaking people's most private information. That can't be tackled with a black box chatbot that may or may not be accurate.

      • zmmmmm 2 days ago |
        > write something like a wrapper around some cloud provider SDK that I’m unfamiliar with

        you're equating "unfamiliar" with "don't know how to do" but I will claim you do know how to do it, you would just be slow because you have to reference documentation and learn which functions do what.

    • photon_collider 3 days ago |
      "Trust but verify" is still useful especially when you ask LLMs to do stuff you don't know. I've used LLMs to help me get started on tasks where I wasn't even sure of what a solution was. I would then inspect the code and review any relevant documentation to see if the proposed solution would work. This has been time consuming but I've learned a lot regardless.
    • IanCal 3 days ago |
      That seems like a wild restriction.

      You can give them more latitude for things you know how to check.

      I didn't know how to set up the right gnarly typescript generic type to solve my problem but I could easily verify it's correct.

      • fastball 3 days ago |
        If you don't understand what the generic is doing, there might be edge-cases you don't appreciate. I think Typescript types are fairly non-essential so it doesn't really matter, but for more important business logic it definitely can make a difference.
        • IanCal 3 days ago |
          I understand what it's doing, and could easily set out the cases I needed.
          • fastball 2 days ago |
            If you understand what it is doing, you could do it yourself, surely?
            • IanCal 2 days ago |
              Have you never understood the solution to a puzzle much more easily than solving it yourself? I feel there's literally a huge branch of mathematics dedicated to the difference between finding and validating a solution.

              More specifically, I didn't know how to solve it, though obviously could have spent much more time and learned. There were only a small number of possible cases, but I needed certain ones to work and others not to. I was easily able to create the examples but not find the solution. With looping through claude I could solve it in a few minutes. I then got an explanation, could read the right relevant docs and feel satisfied that not only did everything pass the automated checks but my own reasoning.
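The gap between finding and validating a solution can be shown with a small sketch. Subset sum is used here purely as an illustration (it is not the TypeScript problem from the comment), and the function names `verify` and `find` are hypothetical: checking a proposed answer takes one pass, while finding one by brute force may try every subset.

```python
from itertools import combinations


def verify(numbers, target, candidate):
    """Check a proposed answer: every value must come from `numbers`
    (respecting duplicates) and the values must add up to `target`."""
    pool = list(numbers)
    for value in candidate:
        if value not in pool:
            return False
        pool.remove(value)
    return sum(candidate) == target


def find(numbers, target):
    """Find an answer by brute force: try subsets of increasing size."""
    for size in range(len(numbers) + 1):
        for subset in combinations(numbers, size):
            if sum(subset) == target:
                return subset
    return None
```

The same asymmetry is what makes "I can check it but could not have written it quickly" a coherent position.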

      • kccqzy 3 days ago |
        If you merely know how to check, would you also know how to fix it after you find that it's wrong?

        If you are lucky to have the LLM fix it for you, great. If you don't know how to fix it yourself and the LLM doesn't either, you've just wasted a lot of time.

        • IanCal 3 days ago |
          It did fix it, I iterated passing in the type and linter errors until it passed all the requirements I had.

          > If you merely know how to check, would you also know how to fix it after you find that it's wrong?

          Probably? I'm capable of reading documentation, learning and asking others.

          > If you don't know how to fix it yourself and the LLM doesn't either, you've just wasted a lot of time.

          You may be surprised by how little time, but regardless it would have taken more time to hit that point without the tool.

          Also sometimes things don't work out, that's OK. As long as overall it improves work, that's all we need.

    • kamaal 3 days ago |
      >>If you can look at the solution and immediately know what's wrong with it, they are a time saver otherwise...

      Indeed getting good at writing code using LLMs demands being very good at reading code.

      To that extent it's more like blitz chess than autocomplete. You need to think and verify in trees as it goes.

    • billmcneale 3 days ago |
      That's the wrong approach.

      I use chat for things I don't know how to do all the time. I might not know how to do it, but I sure know how to test that what I'm being told is correct. And as long as it's not, I iterate with the chat bot.

      • WhiteNoiz3 3 days ago |
        A better way to phrase it might be don't use it for something that you aren't able to verify or validate.
        • sdesol 3 days ago |
          I agree with this. I keep harping on this, but we are sold automation instead of a power tool. If you have domain knowledge in the problem that you are solving, then LLMs can become an extremely valuable aid.
        • Salgat a day ago |
          Similar to a developer who copy-pastes sections of code from StackOverflow and puts their faith in it being correct. The bigger issue with LLMs is that it's easier to be tricked into thinking you actually understand the code when your understanding may actually be quite superficial.
      • bityard 3 days ago |
        I feel like that's a good option ONLY if the code you are writing will never be deployed to an environment where security is a concern. Many security bugs in code are notoriously difficult to spot and even frequently slip through reviews from humans who are actively looking for exactly those kinds of bugs.

        I suppose we could ask the question: Are LLMs better at writing secure code than humans? I'll admit I don't know the answer to that, but given what we know so far, I seriously doubt it.

      • zmmmmm 2 days ago |
        I think it's just a broader definition of "know how to do". If you can write a test for it then I'm going to argue you know "how" to do it in a bigger picture sense. As in, you understand the requirements and inherent underlying technical challenges behind what you are asking to be done.

        The issue is, there are always subtle aspects to problems that most developers only know by instinct. Like, "how is it doing the unicode conversion here" or "what about the case when the buffer is exactly the same size as the message, is there room for the terminating character?". You need the instincts for these to properly construct tests and review the code it did. If you do have those instincts, I argue you could write the code, it's just a lot of effort. But if you don't, I will argue you can't test it either and can't use LLMs to produce (at least) professional level code.
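A minimal sketch of the boundary cases mentioned above, assuming a hypothetical `pack_message` helper that copies a message into a fixed-size, NUL-terminated buffer. Neither the helper nor the buffer layout comes from the thread; they only illustrate the "message exactly as long as the buffer" case and the difference between characters and encoded bytes.

```python
def pack_message(message: str, buffer_size: int) -> bytes:
    """Encode `message` as UTF-8 into a fixed-size, NUL-terminated buffer.

    Hypothetical helper for illustration: raises ValueError when the encoded
    message plus the terminating NUL byte does not fit.
    """
    encoded = message.encode("utf-8")
    # One byte must stay free for the terminator, hence the "+ 1".
    if len(encoded) + 1 > buffer_size:
        raise ValueError("message does not fit with terminator")
    # Pad the remainder with NUL bytes, which also provides the terminator.
    return encoded.ljust(buffer_size, b"\x00")


def fits(message: str, buffer_size: int) -> bool:
    """Return True when pack_message accepts the message."""
    try:
        pack_message(message, buffer_size)
    except ValueError:
        return False
    return True
```

A test suite written without these instincts would likely check `fits("abc", 4)` and stop. The interesting cases are `fits("abcd", 4)`, where the message fills the buffer and leaves no room for the terminator, and `fits("éé", 4)`, where two characters encode to four bytes.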

    • j45 3 days ago |
      You can ask the LLM to teach it to you step by step, and then you can validate it by doing it as well as you go, still quicker than learning it and not knowing how to debug it.

      Learning how something works is critical or it's far worse than technical debt.

      • lelandfe 3 days ago |
        Yes, I have a friend learning their first programming language with much assistance from ChatGPT and it's actually going really well.
        • j45 3 days ago |
          Awesome, I wish more people knew about this compared to trying to do magic Harry Potter single prompt to do everything.
    • turnsout 3 days ago |
      I completely agree. In graphics programming, I love having it do things that are annoying but easy to verify (like setting up frame buffers in WebGL). I also ask it do more ambitious things like implementing an algorithm in shader code, and it will sometimes give a result that is mostly correct but subtly wrong. I only have been able to catch those subtle errors because I know what to look for.
    • tnvmadhav 3 days ago |
      I'd like to rephrase as, "don't deploy LLM generated code if you don't know how it works (or what it does)"

      This means, it's okay to use LLM to try something new that you're on the fence about. Learn it and then once you've learned that concept or the idea, you can go ahead to use same code if it's good enough.

      • JKCalhoun 3 days ago |
        "don't deploy ̶L̶L̶M̶ ̶g̶e̶n̶e̶r̶a̶t̶e̶d̶ code if you don't know how it works (or what it does)"

        (Which goes for StackOverflow, etc.)

        • switchbak 2 days ago |
          I've seen a whole flurry of reverts due to exactly this. I've also dabbled in trusting it a little too much, and had the expected pain.

          I'm still learning where it's usable and where I'm over-reaching. At present I'm at about break-even on time spent, which bodes well for the next few years as they iron out some of the more obvious issues.

    • staticautomatic 3 days ago |
      My experience is the opposite. I find them most valuable for helping me do things that would be extremely hard or impossible for me to figure out. To wit, I just used one to decode a pagination cursor format and write a function that takes a datetime and generates a valid cursor. Ain’t nobody got time for that.
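The comment does not say what the cursor format was, so the following is only a sketch under an assumed, hypothetical format: URL-safe base64 wrapping a small JSON object that holds a millisecond timestamp. The names `encode_cursor` and `decode_cursor` are made up for illustration.

```python
import base64
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_cursor(moment: datetime) -> str:
    """Build an opaque cursor from a timezone-aware datetime.

    Assumed format (hypothetical): URL-safe base64 of a JSON object
    holding the timestamp in whole milliseconds since the Unix epoch.
    """
    if moment.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    # Integer division keeps the arithmetic exact (no float rounding).
    millis = (moment - EPOCH) // timedelta(milliseconds=1)
    payload = json.dumps({"ts": millis}, separators=(",", ":"))
    return base64.urlsafe_b64encode(payload.encode("ascii")).decode("ascii")


def decode_cursor(cursor: str) -> datetime:
    """Recover the UTC datetime stored in a cursor made by encode_cursor."""
    payload = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return EPOCH + timedelta(milliseconds=payload["ts"])
```

A round trip such as `decode_cursor(encode_cursor(moment)) == moment` is the kind of check that makes generated code like this quick to validate, even when working out the format by hand would have been slow.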
    • ignoramous 3 days ago |
      > ... don't use them for anything you don't know how to do ... I find chat for search is really helpful (as the article states)

      Not really. I often use Chat to understand codebases. Instead of trying to navigate mature, large-ish FOSS projects (like say, the Android Run Time) by looking at them file by file, method by method, field by field (all too laborious), I just ask ... Copilot. It is way, way faster than I am and is mostly directionally correct with its answers.

    • logicchains 3 days ago |
      Don't use them for anything you don't know how to test. If you can write unit tests you understand and it passes them all (or visually inspect/test a GUI it generated), you know it's doing well.
    • SkyBelow 2 days ago |
      How you use the LLM matters.

      Having an LLM do something for you that you don't know how to do is asking for trouble. An expert likely can offload a few things that aren't all that important, but any junior is going to dig themselves into a significant hole with this technique.

      But asking an LLM to help you learn how to do something is often an option. Can't one just learn it using other resources? Of course. LLMs shouldn't be a must have. If at any point you have to depend upon the LLM, that is a red flag. It should be a possible tool, used when it saves time, but swapped for other options when they make sense.

      For an example, I had a library I was new to and asked copilot how to do some specific task. It gave me the options. I used this output to go to google and find the matching documentation and gave it a read. I then went back to copilot and wrote up my understanding of what the documentation said and checked to see if copilot had anything to add.

      Could I have just read the entire documentation? That is an option, but one that costs more time to give deeper expertise. Sometimes that is the option to go with, but in this case having a more shallow knowledge to get a proof of concept thrown together fit my situation better.

      Anyone just copying an AI's output and putting it in a PR without understanding what it does? That's asking for trouble and it will come back to bite them.

  • justatdotin 3 days ago |
    lots of colleagues using copilot or whatever for autocomplete - I just find that annoying.

    or writing tests - that's ... not so helpful. worst is when a lazy dev takes the generated tests and leaves it at that: usually just a few placeholders that test the happy path but ignore obvious corner cases. (I suppose for API tests that comes down to adding test case parameters)

    but chatting about a large codebase, I've been amazed at how helpful it can be.

    what software patterns can you see in this repo? how does the implementation compare to others in the organisation? what common features of the pattern are missing?

    also, like a linter on steroids, chat can help explore how my project might be refactored to better match the organisation's coding style.

    • roskilli 3 days ago |
      If you don’t mind me asking: which popular LLM(s) have you been using for this and how are you providing the code base into the context window?
      • fragmede 3 days ago |
        Not OP but Aider provides a repo map to the LLM as context, which consists of the directory tree, filenames, and important symbols in each file. It can use the popular LLMs as well as Ollama.

        https://aider.chat/docs/repomap.html

        Aider hosts a leaderboard that rates LLMs on performance, including a section on refactoring.

        https://aider.chat/docs/leaderboards/refactor.html
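The repo map idea described above (directory tree, filenames, and important symbols per file) can be sketched in a few lines. This is not Aider's implementation; it is a simplified stand-in that handles only Python files and treats top-level functions and classes as the "important symbols". The function names are hypothetical.

```python
import ast
from pathlib import Path


def top_level_symbols(source: str) -> list[str]:
    """Return names of top-level functions and classes in Python source."""
    tree = ast.parse(source)
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]


def repo_map(root: Path) -> str:
    """Render a plain-text map: one line per file, its symbols indented below."""
    lines = []
    for path in sorted(root.rglob("*.py")):
        # as_posix() keeps the separators stable across platforms.
        lines.append(path.relative_to(root).as_posix())
        for name in top_level_symbols(path.read_text(encoding="utf-8")):
            lines.append(f"    {name}")
    return "\n".join(lines)
```

A compact text like this costs far fewer tokens than the files themselves, which is the point of sending a map rather than the whole codebase.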

        • Zambyte 2 days ago |
          AI generated images can be good, and even reasonable to use for branding. Slapping an image right at the top of the page that says "Abstract Synxex Tree" with a meaningless graph and an absolutely expressionless and useless humanoid robot is a great way to immediately lose my interest in anything they have to say though. The homepage would be more interesting as a wall of text.
          • klibertp 2 days ago |
            Agreed, mostly, but this is not a homepage. On the homepage, there's a video demo and a wall of text (https://aider.chat/). Still, that Synxex Tree should disappear :)
  • wrs 3 days ago |
    I’ve been working with Cursor’s agent mode a lot this week and am seeing where we need a new kind of tool. Because it sees the whole codebase, the agent will quickly get into a state where it’s changed several files to implement some layering or refactor something. This requires a response from the developer that’s sort of like a code review, in that you need to see changes and make comments across multiple files, but unlike a code review, it’s not finished code. It probably doesn’t compile, big chunks of it are not quite what you want, it’s not structured into coherent changesets…it’s kind of like you gave the intern the problem and they submitted a bit of a mess. It would be a terrible PR, but it’s a useful intermediate state to take another step from.

    It feels like the IDE needs a new mode to deal with this state, and that SCM needs to be involved somehow too. Somehow help the developer guide this somewhat flaky stream of edits and sculpt it into a good changeset.

    • fragmede 3 days ago |
      Aider commits to git with each command, making it easy to back out changes, and also squash them into discrete chunks later (and reorder them with interactive rebase).
      • golergka 3 days ago |
        Automatically runs linter and tests on every edit and forwards failures back to LLM as well.
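The check-and-feed-back loop described above can be sketched generically. This is an illustration, not how Aider does it: `ask_model` is a hypothetical stand-in for an LLM call, and `compile_check` is a deliberately cheap substitute for a real linter or test run.

```python
from typing import Callable, Optional


def refine(
    code: str,
    check: Callable[[str], Optional[str]],
    ask_model: Callable[[str, str], str],
    max_rounds: int = 3,
) -> tuple[str, bool]:
    """Repeatedly check `code`; on failure, pass the error back to the model.

    `check` returns None on success or an error message on failure.
    `ask_model` (hypothetical) receives the current code and the error text
    and returns revised code.
    """
    for _ in range(max_rounds):
        error = check(code)
        if error is None:
            return code, True
        code = ask_model(code, error)
    # Final verdict on the last revision.
    return code, check(code) is None


def compile_check(code: str) -> Optional[str]:
    """Cheap stand-in for a linter: report Python syntax errors only."""
    try:
        compile(code, "<candidate>", "exec")
    except SyntaxError as exc:
        return f"SyntaxError: {exc.msg} (line {exc.lineno})"
    return None
```

The `max_rounds` cap matters: without it, a model that keeps returning the same broken code would loop forever.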
    • Aeolun 3 days ago |
      I think the full agent mode context is actually often hard to see, but there’s a list somewhere. The list of files in your chat dialog is not the full context (it adds open files too). I find that if I reduce the context size Cursor gives me much better results.
  • User23 3 days ago |
    LLMs are, at their core, search tools. Training is indexing and prompting is querying that index. The granularity being at the n-gram rather than the document level is a huge deal though.

    Properly using them requires understanding that. And just like we understand every query won’t find what we want, neither will every prompt. Iterative refinement is virtually required for nontrivial cases. Automating that process, like eg cursor agent, is very promising.

    • IanCal 3 days ago |
      Half of the problems are people treating them as searchers when they aren't. They're absolutely not ngram indexes of existing data, either.
    • mvdtnz 3 days ago |
      I'm losing track of the number of different things the Hacker News commenters claim LLMs are "at their core".
      • bitwize 3 days ago |
        LLMs are, at their core, fucking Dissociated Press. That's what makes them fun and interesting, and that's the problem with using them for real production work.
      • sulam 3 days ago |
        Isn't this answer obvious/facile but also true? They're next token predictors.
    • sdesol 3 days ago |
      > LLMs are, at their core, search tools.

      This is the wrong take. Search tools are deterministic unless you purposely inject random weights into the ranking. With search tools, the same search query will always yield the same search result, provided they are designed to and/or the underlying data has not changed.

      With LLMs, I can ask the exact same question and get a different response, even if the data has not changed.

      • Scene_Cast2 3 days ago |
        The randomness comes from sampling. With local LLMs, you can fix the random seed, or even disable sampling altogether - both will get you determinism.

        I agree that LLMs are not search tools, but for very different reasons.
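A toy sketch of the two routes to determinism mentioned above, using made-up logits over three tokens rather than a real model. `greedy_pick` disables sampling by always taking the highest-scoring token; `sample_pick` draws at random but repeats exactly when given a generator with the same seed. Both function names are hypothetical.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """No sampling: always return the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, rng, temperature=1.0):
    """Sample an index according to the softmax probabilities."""
    weights = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

Two runs that each start from `random.Random(42)` produce the same sequence of picks. This only covers the sampling step; it says nothing about nondeterminism introduced elsewhere, such as by hardware.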

        • klabb3 3 days ago |
          Semantics. It may be able to get deterministic but it’s unstable wrt unrelated changes in the training data, no? If I add a page about sausages to a search index, the results for ”ski jacket” will be unaffected. In a practical sense, LLMs are non-deterministic. I mean, ChatGPT even has a ”regenerate” button to expose this ”turbulence” as a feature.
          • User23 3 days ago |
            Hence n-grams rather than documents.

            Also what's with using "semantics" as a dismissal when the technology we're talking about is the most semantically relevant search ever made.

        • sdesol 3 days ago |
          Thanks for the info on local LLMs. Based on my chats with multiple LLMs, the biggest issue appears to be hardware.

          Non-deterministic hardware: All LLMs mentioned that modern computing hardware, such as GPUs or TPUs, can introduce non-determinism due to factors like parallel processing, caching, or numerical instability. This can make it challenging to achieve determinism, even with fixed random seeds or deterministic algorithms.

          You can find the summary of my chats https://beta.gitsense.com/?chat=1c3e69f9-7b8b-48a3-8b99-bb1b.... If you scroll to the top and click on the "Conversation" link in the first message, you can read the individual responses.

    • jcranmer 3 days ago |
      > LLMs are, at their core, search tools.

      Fundamentally, no they're not. That is why you have cases like the Air Canada chatbot that told a user about a refund opportunity that didn't exist, or the lawyer in Mata v Avianca who cited a case that didn't exist. If you ask an LLM to search for something that doesn't exist, there's a decent chance it will hallucinate something into existence for you.

      What LLMs are good at is effectively turning fuzzy search terms into non-fuzzy terms; they're also pretty good at taking some text and recasting it into an extremely formulaic paradigm. In other words, turning unstructured text into something structured. The problem they have is that they don't have enough understanding of the world to do something useful with that structured representation that needs to be accurate.

  • notjoemama 3 days ago |
    Our company has a no AI use policy. The assumption is zero trust. We simply can’t know whether a model or its framework could or would send proprietary code outside the network. So it’s best to assume all LLMs/AI are sending or will send code or fragments of code. While I applaud the incredible work by their creators, I’m not sure how a responsible enterprise class company could rely on “trust us bro” EULAs or repo readmes.
    • codebje 3 days ago |
      The same way responsible enterprise class companies rely on "trust us bro" EULAs for financial systems, customer databases, payroll, and all the other systems it would be very expensive and error prone to build custom for every business.
      • ryanobjc 2 days ago |
        Pretty much this.

        OpenAI poisoned the well badly with their "we train off your chats" nonsense.

        If you are using any API service, or any enterprise ChatGPT plan, your tokens are not being logged and recycled into new training data.

        As for why trust them? Like the parent said: EULAs. Large companies trust EULAs and terms of service for every single SAAS product they use, and they use tons and tons of them.

        OpenAI in a clumsy attempt to create a regulatory moat by doing sketchy shit and waving wild "AI will kill us all" nonsense has created a situation where the usefulness of these transforming generative solutions is automatically rejected by many.

    • pama 3 days ago |
      Your company could locally host LLMs; you won't get ChatGPT or Claude quality, but you can get something that would have been SOTA a year ago. You can vet the public inference codebases (they are only of moderate complexity), and you control your own firewalls.
      • CubsFan1060 3 days ago |
        You can run Claude on both AWS and Google Cloud. I’m fairly certain they don’t share data, but would need to verify to be sure.
        • evilduck 3 days ago |
          You can also run Llama 405B and the latest (huge) DeepSeek on your own hardware and get LLMs that trade blows with Claude and ChatGPT, while being fully isolated and offline if needed.
          • krembo 3 days ago |
            With Amazon Bedrock you can get an isolated serverless Claude or llama with a few clicks
            • evilduck 2 days ago |
              True, but if your org is super paranoid about data exfiltration you're probably not sending it to AWS either.
      • Kostchei 3 days ago |
        You can get standalone/isolated versions of chatGPT, if your org is large enough, in partnership with OpenAI. And others. They run on the same infra but in accounts you set up, cost the same, but you have visibility on the compute, and control of data exfil - i.e., there is none.
    • j45 3 days ago |
      Local LLMs for code aren't that out of the question to run.

      Not even necessarily for code generation: even smaller models used only to weigh in on different design approaches, etc.

    • attentive 3 days ago |
      So, you're asking how enterprise class companies are using github for repos and gmail for all the enterprise mail? What's next, zoom/teams for meetings?
    • lazybreather 3 days ago |
      Palo Alto Networks provides a security product, "AI Access Security", which claims to solve the problem you mentioned - access control, data protection etc. I don't personally use it, nor does my org. Giving it here just in case it is useful for someone.
    • BBosco 3 days ago |
      The vast majority of fortune 500’s have legal frameworks up for dealing with internal AI use already because the reality is employees are going to use it regardless of internal policy. Assuming every employee will act in good faith just because a blanket AI ban is in place is extremely optimistic at best, and isn’t a good substitute for actual understanding.
      • sulam 3 days ago |
        Internal policies at these companies are rarely subject to a level of faith that you're implying. Instead external access to systems is logged, internal systems are often sandboxed or otherwise constrained in how you interact with them, and anything that looks like exfiltration sets off enough alarms to have your manager talking to you that same day, if not that same hour.
    • Pyxl101 3 days ago |
      Just curious, how does your company host its email? Documents? Files?
    • janalsncm 3 days ago |
      You can run pretty decent models on your laptop these days. Works in airplane mode.

      https://ollama.com/

    • golergka 3 days ago |
      What's the realistic attack scenario? Will Sam Altman steal your company's code? Or will next version of GPT learn on your secret sauce algorithms and then your competitors will get them when they generate code for their tasks and your company loses its competitive advantage?

      I'm actually sure that there are companies for which these scenarios are very real. But I don't think there's a lot of them. Most of the code our industry works on has very little value outside of context of particular product and company.

      • cudgy 2 days ago |
        So why bother securing anything at all if not willing to secure the raisons d'être? Doesn’t that suggest that these companies are trivial entities?
        • golergka 2 days ago |
          There are plenty of very realistic attack scenarios, that's why we secure stuff.
        • ThePyCoder 19 hours ago |
          Only if you see source code as the only valuable thing, which it isn't. The knowledge of the team, industry connections, experience etc etc are a big part of what make it so you can effectively use the source code.

          We're making an industrial sorting machine. Our management is scared to death of losing the source code. But realistically, who's going to put in the time to fully understand a codebase we can barely grasp ourselves? Then get rid of all custom sensor mappings, paths and other stuff specific for us. And then develop on it further, assuming they even believe we have the "right" way of doing things?

          Right, no one. 90% of companies could open source their stuff and, apart from legal nonsense, nothing practical will happen, no one will read the code.

          • cudgy 2 hours ago |
            You just supported my point that these companies at their core have little value. A team? Teams are fleeting and easily replaced given the hiring and firing (and poaching) practices of companies. Industry connections? Maybe to some degree, but those are fleeting as well and how do you value it? Most of these connections are held by relatively few people in the company.

            Companies in other legal jurisdictions can and will steal IP with impunity and throw new AI tools at it to quickly gather an understanding of the codebase. Furthermore, knowledge of source provides a roadmap to attack vectors for security violations. Seems foolish to dismiss the risks of losing control of source code.

    • Aeolun 3 days ago |
      I mean, we host our code on Github. What are they going to do with Copilot code snippets?
    • mbesto 3 days ago |
      > proprietary code outside the network

      Thought exercise: what would seriously happen if you did let some of your proprietary code outside your network? Oddly enough, 75% of the people writing code on HN probably have their company's code stored in GitHub. So there already is an inherent trust factor with GH/MSFT.

      As another anecdote - Twitch's source code got leaked a few years back. Did Twitch lose business because of it?

      • aulin 3 days ago |
        > Thought exercise: what would seriously happen if you did let some of your proprietary code outside your network

        Lawsuits? Lawful terminations? Financial damages?

        • mbesto 2 days ago |
          Huh? No, I'm saying, what potential damage does an organization have? Not the individual who may leak data outside your network.
          • aulin 2 days ago |
            Those are risks both for the individual and for the company when there are contracts in place with third parties involving code sharing.

            Other risks include leaking industrial secrets that may significantly damage company business or benefit competitors.

            • klibertp 2 days ago |
              Please acknowledge that your situation is pretty unique. Just take a look at the comments: how many people say, or outright presume, that their company's code is already on GitHub? I'd wager that your org doesn't keep code at a 3rd party provider, right? Then, you're in a minority.

              I don't mean to dismiss your concerns - in your situation, they are probably warranted - I just wanted to say that they are unique and not necessarily shared by people who don't share your circumstances.

              • aulin 2 days ago |
                This subthread started with someone from a no AI policy company, people are dismissing it with snarky comments, along the line of your code is not as important as you believe. I'm just trying to show a different picture, we work in a pretty vast field and people commenting here don't necessarily represent a valid sample.
                • klibertp 2 days ago |
                  > people are dismissing it with snarky comments, along the line of your code is not as important as you believe.

                  That says more about those people than about your/OP's code :)

                  Personally, I had a few collisions with regulation and compliance over the years, so I can appreciate the completely different mindset you need when working with them. On the other hand, at my current position, not only do we have everything on Github, but there were also instances where I was tasked with mirroring everything to bitbucket! (For code escrow... i.e., if we go out of business, our customer will get access to the mirrored code.)

                  > people commenting here don't necessarily represent a valid sample.

                  Right. I should have said that you're in the minority here. I'm not sure what's the ratio of dumb CRUD apps to "serious business" kind of development in the wild. I know there are whole programming subfields where your kinds of concerns are typical. They might just be underrepresented here.

                  • aulin 2 days ago |
                    Yes I've had plenty of experiences with orgs that self host everything, I don't think it's a minority it's just a different cluster than the one most represented here.

                    Still I believe hosting is somewhat different, if anything because it's something established, known players, trusted practices. AI is new, contracts are still getting refined, players are still making their name, companies are moving fast and I doubt data protection is their priority.

                    I may be wrong but I think it's reasonable for IT departments to be at least prudent towards these frameworks. Search is ok, chat is okish, crawling whole projects for autocompletion I'd be more careful.

                    • mbesto a day ago |
                      > Yes I've had plenty of experiences with orgs that self host everything, I don't think it's a minority it's just a different cluster than the one most represented here.

                      I've done 800+ tech diligence projects and have first hand knowledge of every single one's use of VCS. At least 95% of the codebases are stored on a cloud hosted VCS. It's absolutely a minority to host your own VCS.

                    • mbesto a day ago |
                      > I doubt data protection is their priority.

                      So you're basing your whole argument on nothing other than "I just don't feel like they do that".

                      Does this look unserious to you? https://trust.openai.com/

                • mbesto a day ago |
                  First, I didn't dismiss their "no AI policy" nor did I use snarky comments. I was asking a legitimate question - which is - most orgs have their code stored on another server out of their control, so what's the legitimate business issue if your code gets leaked? I still haven't gotten an answer.
      • switchbak 2 days ago |
        The other consideration: your company's code probably just isn't that good.

        I think many people over-value this giant pile of text. That's not to say IP theft doesn't exist, but I think the actual risk is often overblown. Most of an organization's value is in the team's collective knowledge and teamwork ability, not in the source code.

    • lm28469 3 days ago |
      > I’m not sure how a responsible enterprise class company could rely on “trust us bro” EULAs or repo readmes.

      Isn't that what we do with operating systems, internet providers, &c. ?

      • aulin 3 days ago |
        How is that related? we're talking of continuously sending proprietary code and related IP to a third party, seems a pretty valid concern to me.

        I, for one, work every day with plenty of proprietary vendor code under very restrictive NDAs. I don't think they would be very happy knowing I let AIs crawl our whole code base and send it to remote language models just to have fancy autocompletion.

        • lm28469 3 days ago |
          Do you read every single line of code of every single dependency you have ? I don't see how LLMs are more of a threat than a random compromised npm package or something from an OS package manager. Chances are you're already relying on tons and tons of "trust me bro" and "it's opensource bro don't worry, just read the code if you feel like it"
          • aulin 3 days ago |
            One thing is consciously sharing IP with third parties violating contracts, another is falling victim to malicious code in the toolchain.

            Npm concern though suggests we likely work in very different industries so that may explain the different perspective.

        • bongodongobob 2 days ago |
          Ok, the LLM crawls your code. Then what? What is the exfiltration scenario?
        • ryanobjc 2 days ago |
          "Continuously sending proprietary code and related IP to a third party"

          Isn't this... github?

          Companies and people are doing this all day every day. LLM APIs are really no different. Only when you magic it up as "the AI is doing thinking" ... but in reality text -> tokens -> math -> tokens -> text. It's a transformation of numbers into other numbers.

          The EULAs and ToS say they don't log or retain information from API requests. This is really no different than Google Drive, Atlassian Cloud, Github, and any number of online services that people store valuable IP and proprietary business and code in.

    • tsukikage 3 days ago |
      You can get models that run offline. The other risk is copyright/licensing exposure; e.g. the AI regurgitates a recognisably large chunk of GPL code, and suddenly you have a legal landmine in your project waiting to be discovered. There's no sane way for a reviewer to spot this situation in general.

      You can ask a human to not do that, and there are various risks to them personally if they do so regardless. I'd like to see the AI providers take on some similar risks instead of disclaiming them in their EULAs before I trust them the way I might a human.

    • cudgy 2 days ago |
      Does your company develop software overseas where legal action is difficult? Or where their IP could be nationalized or secretly stolen? Where network communications are monitored and saved?
    • k__ 2 days ago |
      Seems like only working on open source code has its benefits.
  • bangaladore 3 days ago |
    The killer feature about LLMs with programming in my opinion is autocomplete (the simple copilot feature). I can probably be 2-3x more productive as I'm not typing (or thinking much). It does a fairly good job pulling in nearby context to help it. And that's even without a language server.

    Using it to generate blocks of code in a chat like manner in my opinion just never works well enough in the domains I use it on. I'll try to get it to generate something and then realize when I get some functional result I could've done it faster and more effectively.

    Funny enough, other commenters here hate autocomplete but love chat.

    • m3kw9 3 days ago |
      The autocomplete is mostly a nuisance, and only a low percentage of the time does it get things right.
      • tptacek 3 days ago |
        Yeah, I don't like it either. I think it speaks to the mindset difference Crawshaw is talking about here. When I'm writing code, I don't want things getting in my way. I have a plan. I'm actually pretty Zen about all the typing. It's part of my flow-state. But when I'm exploring code in a dialog with a chatbot, I'm happy for the help.
        • switchbak 2 days ago |
          I think we're going to be considered dinosaurs pretty soon. Much like how it's getting harder to buy a manual transmission, programming 'the old way' will probably just fade away over time.
      • LVB 3 days ago |
        The biggest nuisance aspect for me is when it is trying to do things that the LSP can do 100% correctly. Almost surely it is my tooling setup and the LLM is squashing LSP stuff. Seeing Copilot (or even Cursor) suggesting methods or parameters that don't exist is really annoying. Just stand down and let the LSP answer those basic questions, TYVM.
        • throwup238 3 days ago |
          Cursor ostensibly has a config setting to run a “shadow” workspace [1], aka a headless copy of the window you’re working in to get feedback from linters and LSPs but they’ve been iterating so fast I’m not sure it’s still working (or ever did much, really).

          It really feels like we’re at the ARPANET stage where there’s so much obvious low-hanging fruit, it’s just going to take companies a while to perfect it.

          [1] https://www.cursor.com/blog/shadow-workspace

      • ahoka 3 days ago |
        The industry standard was 40% accepted the last time I checked. Correct could be a bit lower, so maybe 1/3?

        It’s like having to delete the auto-closed parenthesis more often than not.

      • jghn 2 days ago |
        I thought so too. Until I worked with a client who doesn't allow the use of LLM tools, and I had to turn my Copilot off. That's when I realized how much I'd grown to rely on it despite the headaches.
    • LeftHandPath 3 days ago |
      I’ve never used it, simply because I hate autocomplete in emails.

      Gmail autocomplete saves me maybe 2-5s per email: the recipients name, a comma, and a sign off. Maybe a quarter or half sentence here or there, but never exactly what I would’ve typed.

      In code bases, I’ve never seen the appeal. It’s only reliably good at stuff that I can easily find on Google. The savings are inconsequential at best, and negative at worst when it introduces hard-to-pinpoint bugs.

      LLMs are incredible technology, but when applied to code, they act more like non-deterministic macros.

      • switchbak 2 days ago |
        "negative at worst when it introduces hard-to-pinpoint bugs" - this is actually very true. I've had it recreate patterns _partially_, and paste in the wrong thing in a place that was very hard to discern.

        It probably saved me 40 mins, then proceeded to waste 2 hours of me hunting for that issue. I'm probably at the break-even on the whole. The ultimate promise is very compelling, but my current use isn't particularly amazing. I do use a niche language though, so I'm outside the global optima.

        • LeftHandPath 2 days ago |
          Exactly! I expect that some are able to put it to good use. I am not one of those people.

          My experiences with ChatGPT and Gemini have included lots of confident but wrong answers, e.g. “What castle was built at the highest altitude”. That’s what gives me pause.

          Gemini spits out a great 2D A* implementation no problem. That is awesome. Actually, contrary to my original comment, I probably will use AI for that sort of thing going forward.

          Despite that, I don’t want it in my IDE. Maybe I’m just a bit of a Luddite.

    • imhoguy 2 days ago |
      Both autocomplete and chat are half-way UX solutions. Really what I need is some kind of mix of in-place chat with completion.

      For context, very often I have to put some comment before the line for completion to set an expectation context.

      Instead editor should allow me to influence completion with some kind of in-place suggestion input available under keyboard shortcut. Then I could type what I want into such input and when I hit Enter or Tab the completion proposal appears. Even better if it would let me undo/modify such input, and have shortcuts like "show me different option", "go back to previous".

    • switchbak 2 days ago |
      I had to turn autocomplete off. I value it when I want it, but otherwise it's such a distraction that it both slows me down, and actively irritates me.

      Perhaps I'm just an old man telling the LLM to get off my lawn, but I find it does bad things to my ability to concentrate on hard things.

      Having a good sense of when it would be useful, and invoking it on demand seems to be a decent enough middle ground for me. Much of it boils down to UX - if it could be present but not actively distracting, I'd probably be ok with it.

      • alexxys 9 hours ago |
        I absolutely love autocomplete in VSCode but hate it in Visual Studio while using the same LLM. The big difference for me is the function AcceptNextWord which Visual Studio doesn't have. A long autocomplete suggestion is rarely completely correct and becomes an annoying distraction but one or several words at the beginning of the suggestion are often correct. So, I usually accept only one or few words in VSCode with a hotkey, then type a bit more, then accept few words more etc. That works great for me. Also, I developed intuition in which pieces of code an LLM suggestion would be most probably useless, so I just ignore any suggestions there to avoid unnecessary distraction.

        My guess is that many devs who don't like LLM autocomplete, are just unlucky to use a suboptimal UI. As an example, I personally don't understand how some people could like autocomplete in Visual Studio. As you said, it's just too distracting and irritating.

        BTW, I use Codeium, not Copilot. But I guess they should have the same autocomplete UI which depends more on IDE than LLM.
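The word-at-a-time acceptance described above can be sketched in a few lines. This is a hypothetical illustration of the idea, not how VSCode or any editor actually implements it; the function name `accept_next_word` and the word-boundary rule (a run of word characters, or a run of punctuation) are assumptions.

```python
import re


def accept_next_word(buffer, suggestion):
    """Move the next word of `suggestion` (plus any leading whitespace) into `buffer`.

    Returns a (new_buffer, remaining_suggestion) pair. A "word" here is either a
    run of word characters or a run of punctuation; this rule is an assumption.
    """
    match = re.match(r"\s*(\w+|[^\w\s]+)", suggestion)
    if match is None:
        # Only whitespace (or nothing) is left: accept it all and finish.
        return buffer + suggestion, ""
    return buffer + match.group(0), suggestion[match.end():]
```

Calling it repeatedly lets the user take the part of a long suggestion that is correct, type a correction, and then ask for a fresh suggestion.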

  • jimmydoe 3 days ago |
    Does anyone have a good recommendation for a local LLM for autocompletion?

    Most editors I use support online LLMs, but they're sometimes too slow for me.

    • ec109685 3 days ago |
      Unless your network is poor, I’d imagine (but definitely could be wrong in your case!) the bottleneck is the LLM speed, not the latency to the data center it’s running in.
    • th4t1sW13rd 2 days ago |
      • jimmydoe 2 days ago |
        Thank you!
  • wdutch 3 days ago |
    I no longer work in tech, but I still write simple applications to make my work life easier.

    I frequently use what OP refers to as chat-driven programming, and I find it incredibly useful. My process starts by explaining a minimum viable product to the chat, which then generates the code for me. Sometimes, the code requires a bit of manual tweaking, but it’s usually a solid starting point. From there, I describe each new feature I want to add—often pasting in specific functions for the chat to modify or expand.

    This approach significantly boosts what I can get done in one coding session. I can take an idea and turn it into something functional on the same day. It allows me to quickly test all my ideas, and if one doesn’t help as expected, I haven’t wasted much time or effort.

    The biggest downside, however, is the rapid accumulation of technical debt. The code can get messy quickly. There's often a lot of redundancy and after a few iterations it can be quite daunting to modify.

    • j45 3 days ago |
      Is there a model you prefer to use?
      • KTibow 3 days ago |
        Not wdutch but Claude Sonnet is one of the best models out there for programming, o1 is sometimes better but costs more
    • chii 3 days ago |
      > The code can get messy quickly. There's often a lot of redundancy and after a few iterations it can be quite daunting to modify.

      I foresee a future LLM that has sufficient context length for (automatic) refactoring and tech debt removal, by pasting in large portions of this existing code.

      • scarface_74 3 days ago |
        Even without LLMs, at least with statically typed languages like C#, ReSharper can do solution-wide refactorings that are guaranteed correct as long as you don’t use reflection.

        https://www.jetbrains.com/help/resharper/Refactorings__Index...

        I don’t see any reason it couldn’t do more aggressive refactors with LLMs and either correct itself or not do the refactor if it fails static code checking. Visual Studio can already do real-time type checking for compile-time errors.
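A minimal sketch of that "correct itself or don't do the refactor" loop, in Python rather than C#. Everything here is an assumption for illustration: `propose` is a hypothetical stand-in for an LLM call, and the static check is only Python's built-in `compile`, which catches syntax errors but is far weaker than the type checking ReSharper or Visual Studio perform.

```python
def apply_checked_refactor(source, propose, max_rounds=3):
    """Ask `propose(source, feedback)` for a rewrite and keep it only if it compiles.

    `propose` is a hypothetical callable standing in for an LLM. On failure the
    error message is fed back so the next round can try to correct itself.
    """
    feedback = None
    for _ in range(max_rounds):
        candidate = propose(source, feedback)
        try:
            # Syntax-only static check; a real tool would also type-check.
            compile(candidate, "<candidate>", "exec")
        except SyntaxError as err:
            feedback = f"line {err.lineno}: {err.msg}"
            continue
        return candidate
    # Every attempt failed the check: leave the original code untouched.
    return source
```

The design choice is that a failed check never leaves the codebase in a worse state: the fallback is always the unmodified source.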

      • Aeolun 3 days ago |
        Cursor has recently added something like this ‘Bug Finder’. It told me that finding bugs on my entire codebase would cost me $21 or so, so I never actually tried, but it sounds cool.
    • prettyblocks 3 days ago |
      I have a similar approach, but the mess can be contained by asking for optimizations and refactors very frequently and only asking for very granular features.
    • trash_cat 3 days ago |
      > The biggest downside, however, is the rapid accumulation of technical debt. The code can get messy quickly. There's often a lot of redundancy and after a few iterations it can be quite daunting to modify.

      What stops you from using o1 or sonnet to refactor everything? It sounds like a typical LLM task.

    • SkyBelow 2 days ago |
      >The biggest downside, however, is the rapid accumulation of technical debt.

      Is that really related to the LLM?

      Even in pre-LLM times, anytime I've scraped together some code to solve some small immediate problem it grows tech debt at an amazing rate. Getting a feel for when a piece of code is going to be around long enough that it needs to be refactored, cleaned up, documented, etc. is a skill I developed over time. Even now it isn't a perfect guess, as there is an ongoing tug of war between wasting time today refactoring something I might not touch again and wasting time tomorrow having to pick up something I didn't clean up.

  • nemothekid 3 days ago |
    I think "Chat driven programming" is the most common type of the most hyped LLM-based programming I see on twitter that I just can't relate to. I've incorporated LLMs mainly as auto-complete and search; asking ChatGPT to write a quick script or to scaffold some code for which the documentation is too esoteric to parse.

    But having the LLM do things for me, I frequently run into issues where it feels like I'm wasting my time with an intern. "Chat-based LLMs do best with exam-style questions" really speaks to me, however I find that constructing my prompts in such a way where the LLM does what I want uses just as much brainpower as just programming the thing myself.

    I do find ChatGPT (o1 especially) really good at optimizing existing code.

    • throwup238 3 days ago |
      > "Chat-based LLMs do best with exam-style questions" really speaks to me, however I find that constructing my prompts in such a way where the LLM does what I want uses just as much brainpower as just programming the thing myself.

      It speaks to me too because my mechanical writing style (as opposed to creative prose) could best be described as what I learned in high school AP English/Literature and the rest of the California education system. For whatever reason that writing style dominated the training data and LLMs just happen to be easy to use because I came out of the same education system as many of the people working at OpenAI/Anthropic.

      I’ve had to stop using several generic turns of phrase like “in conclusion” because it made my writing look too much like ChatGPT.

    • AlotOfReading 3 days ago |
      It's interesting that you find it useful for optimization. I've found that they're barely capable of anything more than shallow optimization in my stuff without significant direction.

      What I find useful is that I can keep thinking at one abstraction level without hopping back and forth between algorithm and codegen. The chat is also a written artifact I can use the faster language parts of my brain on instead of the slower abstract thought parts.

    • tptacek 3 days ago |
      There's an art to cost-effectively coaxing useful answers (useful drafts of code) from an LLM, and there's an art to noticing the most productive questions to put to that process. It's a totally different way of programming than having an LLM looking over your shoulder while you direct, function by function, type by type, the code you're designing.

      If you feel like you're wasting your time, my bet is that you're either picking problems where there isn't enough value to negotiate with the LLM, or your expectations are too high. Crawshaw mentions this in his post: a lot of the value of this chat-driven style is that it very quickly gets you unstuck on a problem. Once you get to that point, you take over! You don't convince the LLM to build the final version you actually commit to your branch.

      Generating unit test cases --- in particular, generating unit test cases that reconcile against unsophisticated, brute-force, easily-validated reference implementations of algorithms --- is a perfect example of where that cost/benefit can come out nicely.
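As a rough sketch of reconciling against a brute-force reference: the example below checks a fast maximum-subarray function against an obviously correct cubic-time version on random inputs. The problem choice and all function names are assumptions picked for illustration, not anything taken from Crawshaw's post.

```python
import random


def max_subarray_reference(xs):
    """Brute force: sum every non-empty contiguous slice. Slow but easy to validate."""
    return max(sum(xs[i:j]) for i in range(len(xs)) for j in range(i + 1, len(xs) + 1))


def max_subarray_fast(xs):
    """Kadane's algorithm: the implementation under test."""
    best = current = xs[0]
    for x in xs[1:]:
        current = max(x, current + x)
        best = max(best, current)
    return best


def reconcile(trials=500, seed=0):
    """Compare both implementations on random inputs; return the first mismatch or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(1, 12))]
        if max_subarray_fast(xs) != max_subarray_reference(xs):
            return xs
    return None
```

The reference is the part a human can verify at a glance; the generated cases then do the tedious work of exercising the clever version.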

    • sibeliuss 3 days ago |
      My technique is to feed it a series of intro questions that prepare it for the final task. Chat the thing into a proper comfort level, and then from there, with the context at hand, ask to help solve the real problem. Def feels like a new kind of programming model because it's still very programming-esque.
    • Aeolun 3 days ago |
      I’ve found that everything just works (more or less) since switching to Cursor. Agent based composer mode is magical. Just give it a few files for context, and ask it to do what you want.
  • _boffin_ 3 days ago |
    Does anyone know of any good chat-based UI builders? To be clear, I don't mean a tool for building a chat app.

    Does webflow have something?

    My problem is being able to describe what I want in the style I want.

  • singpolyma3 3 days ago |
    It seems like everything I see about success using LLMs for this kind of work is for greenfield. What about three weeks later when the job changes to maintenance and iteration on something that's already working? Are people applying LLMs to that space?
    • kylebenzle 3 days ago |
      Yes, it's just harder the larger the pre-existing code base.
    • throwup238 3 days ago |
      My codebase is relatively greenfield (started working on it early last year) but it’s up to ~50k lines in a mixed C++/Rust codebase with a binding layer whose API predates every LLM’s training sets. Even when I started ChatGPT/Claude weren’t very useful but now the project requires a completely different strategy when working with LLMs (it’s a QT AI desktop app so I’m dogfooding a lot). I’ve also used them in a larger codebase (~500k lines) and that also requires a different approach from the former. It feels a lot like the transition from managing 2 to 20 to 200 to 2000 people. It’s a different ballgame with each step change. A very well encapsulated code base of ~500k lines is manageable for small changes but not for refactoring, exploration, etc, at least until useful context sizes increase another order of magnitude (I keep trying Gemini’s 2M but it’s been a disappointment).

      I have a lot of documentation aimed at the AI in `docs/notes/` (some of it written by an LLM but proofread before committing) and I instruct Cursor/Windsurf/Aider via their respective rules/config files to look at the documentation before doing anything. At some scale that initial context becomes just a directory listing & short description of everything in the notes folder, which eventually breaks down due to context size limits, either because I exceed the maximum length of the rules or the agent requires pulling in too much context for the change.

      I’ve found that there’s actually an uncanny valley between greenfield projects where the model is free to make whatever assumptions it wants and brownfield projects where it’s possible to provide enough context from the existing codebase to get both API accuracy (hallucinations) and general patterns through few-shot examples. This became very obvious once I had enough examples of that binding layer. Even though I could include all of the documentation for the library, it didn’t work consistently until I had a variety of production examples to point it to.

      Right now, I probably spend as much time writing each prompt as I do massaging the notes folder and rules every time I notice the model doing something wrong.
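The "directory listing & short description" of the notes folder mentioned above could be generated with something like the following. This is a sketch under assumptions: the notes are Markdown files, the first non-empty line of each serves as its description, and `build_notes_index` and `max_chars` are hypothetical names, not part of Cursor, Windsurf, or Aider.

```python
from pathlib import Path


def build_notes_index(notes_dir, max_chars=2000):
    """List each Markdown note with its first non-empty line as a short description."""
    lines = []
    for path in sorted(Path(notes_dir).glob("*.md")):
        text = path.read_text(encoding="utf-8")
        # Use the first non-empty line, minus heading markers, as the description.
        first = next((ln.strip("# ").strip() for ln in text.splitlines() if ln.strip()), "")
        lines.append(f"- {path.name}: {first}")
    index = "\n".join(lines)
    if len(index) > max_chars:
        # Stay within the budget by dropping the trailing partial entry.
        index = index[:max_chars].rsplit("\n", 1)[0] + "\n- (index truncated)"
    return index
```

The output of `build_notes_index("docs/notes")` could then be pasted into a rules file; the truncation branch is exactly where the context-size breakdown described above starts to bite.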

    • zkry 3 days ago |
      Logically this makes sense: every model has a context size and complexity capacity where it will no longer be able to function properly. Any usage of said model will accelerate the approach to this limit. Once the limit is reached, the LLM is no longer as helpful as it was.

      I work on full blown legacy apps and needless to say I don't even bother with LLMs when working on these most of the time.

    • Mashimo 3 days ago |
      I used AI code completion from GitHub Copilot on a 20 year old project. You still have to create new classes, new tests, refactor etc.
    • valenterry 3 days ago |
      Yeah, it sucks. LLMs are not great with a big context yet. I hope that is being worked on. I need the LLM to read my whole project AND optimally all related slack conversations, the wiki and related libraries.
      • glouwbug 2 days ago |
        Then what will you do?
        • valenterry 2 days ago |
          I can for example tell it to refactor things. It would have to write files of course. E.g. "Add retries with exponential backoffs to all calls to service X"
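For reference, the kind of change that request asks for might look like this decorator. It is a minimal sketch, not a production client: the parameter names, the default delays, the jitter amount, and the choice of `ConnectionError`/`TimeoutError` as retryable errors are all assumptions.

```python
import functools
import random
import time


def retry_with_backoff(max_attempts=4, base_delay=0.1, max_delay=2.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Retry the wrapped call, doubling the delay after each failed attempt."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the last error
                    delay = min(max_delay, base_delay * 2 ** attempt)
                    # Add up to 10% random jitter so clients do not retry in lockstep.
                    sleep(delay + random.uniform(0, delay / 10))
        return wrapper
    return decorator
```

Applying it "to all calls to service X" would then mean decorating each function that talks to that service, which is the mechanical, project-wide edit being described.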
  • e12e 3 days ago |
    Interesting. I wonder what the equivalent of sketch.dev would look like if it targeted Smalltalk and was embedded in a Smalltalk image (preferably with a local LLM running in smalltalk)?

    I'd love to be able to tell my (hypothetical smalltalk) tablet to create an app for me, and work interactively, interacting with the app as it gets built...

    Ed: I suppose I should just try and see where cloud ai can take smalltalk today:

    https://github.com/rsbohn/Cuis-Smalltalk-Dexter-LLM

  • dewitt 3 days ago |
    One interesting bit of context is that the author of this post is a legit world-class software engineer already (though probably too modest to admit it). Former staff engineer at Google and co-founder / CTO of Tailscale. He doesn't need LLMs. That he says LLMs make him more productive at all as a hands-on developer, especially around first drafts on a new idea, means a lot to me personally.

    His post reminds me of an old idea I had of a language where all you wrote was function signatures and high-level control flow, and maybe some conformance tests around them. The language was designed around filling in the implementations for you. 20 years ago that would have been from a live online database, with implementations vying for popularity on the basis of speed or correctness. Nowadays LLMs would generate most of it on the fly, presumably.

    Most ideas are unoriginal, so I wouldn't be surprised if this has been tried already.
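A toy version of that idea, to make it concrete: the programmer supplies only a signature and conformance tests, and a resolver picks the first candidate implementation that passes. This is a sketch under assumptions; `dedupe_sorted`, `conforms`, `fill_in`, and the two candidates are hypothetical stand-ins for implementations that would come from a database or an LLM.

```python
from typing import Callable, Iterable, List

Impl = Callable[[List[int]], List[int]]


def conforms(impl: Impl) -> bool:
    """Conformance tests for a hypothetical `dedupe_sorted(xs)` signature."""
    cases = [([], []), ([3, 1, 3, 2], [1, 2, 3]), ([5, 5, 5], [5])]
    try:
        return all(impl(list(given)) == expected for given, expected in cases)
    except Exception:
        return False  # a crashing candidate simply does not conform


def fill_in(candidates: Iterable[Impl]) -> Impl:
    """Return the first candidate implementation that passes the conformance tests."""
    for impl in candidates:
        if conforms(impl):
            return impl
    raise LookupError("no candidate satisfies the conformance tests")


# Stand-ins for implementations fetched from a database or generated by an LLM.
def candidate_a(xs: List[int]) -> List[int]:
    return sorted(xs)  # wrong: keeps duplicates


def candidate_b(xs: List[int]) -> List[int]:
    return sorted(set(xs))


dedupe_sorted = fill_in([candidate_a, candidate_b])
```

The weakness is the same one dekhn's Hyrum's Law remark points at: the tests become the entire specification, so anything they do not pin down is left to chance.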

    • knighthack 3 days ago |
      I knew he was a world-class engineer the moment I saw that his site didn't bother with CSS stylesheets, ads, pictures, or anything beyond a rudimentary layout.

      The whole article page reads like a site from the '90s, written from scratch in HTML.

      That's when I knew the article would go hard.

      Substantive pieces don't need fluffy UIs - the idea takes the stage, not the window dressing.

      • shaneofalltrad 3 days ago |
        I wonder what he uses. I noticed the first paragraph took over a second to load: "Largest Contentful Paint element: 1,370 ms. This is the largest contentful element painted within the viewport. Element: p"
        • cess11 3 days ago |
          Looks like it loads all the Google surveillance without asking. Should IP-block the EU.
      • alexvitkov 3 days ago |
        Glad to know I was a world class engineer at the age of 8, when all I knew were the <h1> and <b> tags!
    • dekhn 3 days ago |
      I think what you're describing is basically "interface driven development" and "test driven development" taken to the extreme: where the formal specification of an implementation is defined by the test suite. I suppose a cynic would say that's what you get if you left an AI alone in a room with Hyrum's Law.
    • gopalv 3 days ago |
      > That he says LLMs make him more productive at all as a hands-on developer, especially around first drafts on a new idea, means a lot to me personally.

      There is likely to be a great rift in how very talented people look at sharper tools.

      I've seen the same division pop up with CNC machines, 3d printers, IDEs and now LLMs.

      If you are good at doing something, you might find the new tool's output to be sub-par over what you can achieve yourself, but often the lower quality output comes much faster than you can generate.

      That causes the people who are deliberate & precise about their process to hate the new tool completely - expressing in the actual code (or paint, or marks on wood) is much better than trying to explain it in a less precise language in the middle of it. The only exception I've seen is that engineering folks often use a blueprint & refine it on paper.

      There's a double translation overhead which is wasteful if you don't need it.

      If you have dealt with a new hire while being the senior of the pair, there's that familiar feeling of wanting to grab their keyboard instead of explaining how to build that regex - being able to do more things than you can explain or just having a higher bandwidth pipe into the actual task is a common sign of mastery.

      The incrementalists on the other hand, tend to love the new tool as they tend to build 6 different things before picking what works the best, slowly iterating towards what they had in mind in the first place.

      I got into this profession simply because I could Ctrl-Z to the previous step much more easily than my then favourite chemical engineering goals. In Chemistry, if you get a step wrong, you go to the start & start over. Plus even when things work, yield is just a pain there (prove it first, then you scale up ingredients etc).

      Just from the name of sketch.dev, it appears that this author is of the 'sketch first & refine' model where the new tool just speeds up that loop of infinite refinement.

      • liotier 3 days ago |
        > If you are good at doing something, you might find the new tool's output to be sub-par over what you can achieve yourself, but often the lower quality output comes much faster than you can generate. That causes the people who are deliberate & precise about their process to hate the new tool completely

        Wow, I've been there ! Years ago we dragged a GIS system kicking and screaming from its nascent era of a dozen ultrasharp dudes with the whole national fiber optics network in their head full of clever optimizations, to three thousand mostly clueless users churning out industrial scale spaghetti... The old hands wanted a dumb fast tool that does their bidding - they hated the slower wizard-assisted handholding, that turned out to be essential to the new population's productivity.

        Command line vs. GUI again... Expressivity vs. discoverability, all the choices vs. don't make me think. Know your users !

        • namaria a day ago |
          This whole thing makes me think of that short story "The Machine Stops".

          As we keep burrowing deeper and deeper into an overly complex system that allows people to get into parts of it without understanding the whole, we are edging closer to a situation where no one is left who can actually reason about the system and it starts to deteriorate beyond repair until it suddenly collapses.

          • IanKerr 7 hours ago |
            We are so, so far beyond that point already. The complexity of the world economy is beyond any one mind to fully comprehend. The microcosm of building black-box LLMs that perform feats we don't understand is yet another instance of us building systems which may forever be beyond human understanding.

            How is any human meant to understand a billion lines of code in a single codebase? How is any human meant to understand a world where there are potentially trillions of lines of code operating?

      • jprete 3 days ago |
        This is a good characterization. I'm precision-driven and know what I need to do at any low level. It's the high-level definition that is uncertain. So it doesn't really help to produce a dozen prototypes of an idea and pick one, nor does it help to fill in function definitions.
      • tikkun 3 days ago |
        Interesting.

        So engineers that like to iterate and explore are more likely to like LLMs.

        Whereas engineers that have a more rigid, specific process are more likely to dislike LLMs.

        • godelski 2 days ago |
          I frequently iterate and explore when writing code. Code gets written multiple times before being merged. Yet, I still haven't found LLMs to be helpful in that way. The author gives "autocomplete", "search", and "chat-driven programming" as 3 paradigms. I get the most out of search (though a lot of this is due to the decreasing value of Google), autocomplete is pretty weak to me especially as I macro or just use contextual complete, and I've failed miserably at chat-driven programming on every attempt. I spend more time debugging the AI than it would to debug myself. Albeit it __feels__ faster because I'm doing more typing + waiting rather than continuous thinking (but the latter has extra benefits).
        • erosivesoul 2 days ago |
          FWIW I find LLMs almost useless for writing novel code. Like it can spit out a serviceable UUID generator when I need it, but try writing something with more than a layer or two of recursion and it gets confused. I turn copilot on for boilerplate and off for solving new problems.
      • harrall 2 days ago |
        I believe it’s more that people hate trying new tools because they’ve already made their choice and made it their identity.

        However, there are also people who love everything new and jump onto the latest hype too. They try new things but then immediately advocate it without merit.

        Where are the sane people in the middle?

        • dns_snek 2 days ago |
          As an experienced software developer, I paid for ChatGPT for a couple of months, I trialed Gemini Pro for a couple of months, and I've used the current version of Claude.

          I'd be happy if LLMs could produce working code as often and as quickly as the evangelists claim, but whenever I try to use an LLM to work on my day to day tasks, I almost always walk away frustrated and disappointed - and most of my work is boring on technical merits, I'm not writing novel comp-sci algorithms or cryptography libraries.

          Every time I say this, I'm painted as some luddite who just hates change when the reality is that no, current LLMs are just not fit for many of the purposes they're being evangelized for. I'd love nothing more than to be a 2x developer on my side projects, but it just hasn't happened and it's not for the lack of trying or open mindedness.

          edit: I've never actually seen any LLM-driven developers work in real time. Are there any live coding channels that could convince the skeptics what we're missing out on something revolutionary?

          • harrall 2 days ago |
            You're the middle ground I was talking about. You tried it. You know where it works and where it doesn't.

            I've used an LLM to generate code samples and my IDE (IntelliJ) uses an LLM for auto-suggestions. That's mostly about it for me.

          • davepeck 2 days ago |
            I see less "painting as a luddite" in response to statements like this, and more... surprise. Mild skepticism, perhaps!

            Your experience diverges from that of other experienced devs who have used the same tools, on probably similar projects, and reached different conclusions.

            That includes me, for what it's worth. I'm a graybeard whose current work is primarily cloud data pipelines that end in fullstack web. Like most devs who have fully embraced LLMs, I don't think they are a magical panacea. But I've found many cases where they're unquestionably an accelerant -- more than enough to justify the cost.

            I don't mean to say your conclusions are wrong. There seems to be a bimodal distribution amongst devs. I suspect there's something about _how_ these tools are used by each dev, and in the specific circumstances/codebases/social contexts, that leads to quite different outcomes. I would love to read a better investigation of this.

            • efnx 2 days ago |
              I think it also depends on _what_ the domain is, and also to a certain degree the tools / stack you use. LLMs aren’t coherent or correct when working on novel problems, novel domains or using novel tools.

              They’re great for doing something that has been done before, but their hallucinations are wildly incorrect when novelty is at play - and I’ll add they’re always very authoritative! I’m glad my languages of choice have a compiler!

              • davepeck 2 days ago |
                Yeah, absolutely.

                LLMs work best for code when both (a) there's sufficient relevant training data aka we're not doing something particularly novel and (b) there's sufficient context from the current codebase to pick up expected patterns, the peculiarities of the domain models, etc.

                Drop (a) and get comical hallucinations; drop (b) and quickly find that LLMs are deeply mediocre at top-level architectural and framework/library choices.

                Perhaps there's also a (c) related to precision. You can write code to issue a SQL query and return JSON from an API endpoint in multiple just-fine ways. Misplace a pthread_mutex_lock, however, and you're in trouble. I certainly don't trust LLMs to get things like this right!

                (It's worth mentioning that "novelty" is a tough concept in the context of LLM training data. For instance, maybe nobody has implemented a font rasterizer in Rust before, but plenty of people have written font rasterizers and plenty of others have written Rust; LLMs seem quite good at synthesizing the two.)

              • jpc0 2 days ago |
                My recent example of where it's helpful.

                Pretty nice at autocomplete. Like writing json tags in go structs. Can just autocomplete that stuff for me no problem, it saved me seconds per line, seconds I tell you.

                It's stupid as well... Autofilled a function, looks correct. Reread it 10 minutes later and well... Minor mistake that would have caused a crash at runtime. It looked correct but in reality it just didn't have enough context ( the context is in an external doc on my second screen ... ) and there was no way it would ever have guessed the correct code.

                It took me longer to figure out why the code looked wrong than if I had just typed it myself.

                Did it speed up my workflow on code I could have given a junior to write? Not really, but some parts were quicker while others were slower.

                And imagine if that bad code had crashed in production next week instead of right now while the whole context is still in my head. Maybe that would be hours of debugging time...

                Maybe, as parent said, for a domain where you are breaking new ground, it can generate some interesting ideas you wouldn't have thought about. Like a stupid pair that in general doesn't help much but can get you out of a local minimum, where it can be a significant help.

                But then again you could do what has been done for decades and speak to another human about the problem, at least they may have signed the same NDA as you...

          • holoduke 2 days ago |
            Yesterday I wanted to understand what a team was doing in a Go project. I have never really touched Go before. I do understand software, because I've been developing for 20-plus years. But ChatGPT was perfectly able to give me a summary of how the implementation worked. It gave me examples and suggestions. And within a day of full-time pasting code and asking questions I had a good understanding of the codebase. It would have been a lot more difficult with only Google.
            • twelve40 2 days ago |
              how often do you get to learn an unfamiliar language? is it something you need to do every day? so this use case, did it save you much time overall?
          • NoOn3 2 days ago |
            I have a very similar experience. For me LLMs are good at explaining someone else's complex code, but for some reason they don't help me write new code well. I would also like to see any LLM-driven developers work in real time.
          • HappMacDonald 2 days ago |
            My experience thus far is that LLMs can be quite good at:

            * Information lookup

            -- when search engines are enshittified and bogged down by SEO spam and when it's difficult to transform a natural language request into a genuinely unique set of search keywords

            -- Search-enabled LLMs have the most up to date reach in these circumstances but even static LLMs can work in a pinch when you're searching for info that's probably well represented in their training set before their knowledge cutoff

            * Creatively exploring a vaguely defined problem space

            -- Especially when one's own head feels like it's too full of lead to think of anything novel

            -- Watch out to make sure the wording of your request doesn't bend the LLM too far into a stale direction. For example naming an example can make them tunnel vision onto that example vs considering alternatives to it.

            * Pretending to be Stack Exchange

            -- EG, the types of questions one might pose on SE one can pose to an LLM and get instant answers, with less criticism for having asked the question in the first place (though Claude is apparently not above gently checking in if one is encountering an X Y problem) and often the LLM's hallucination rate is no worse than that of other SE users

            * Shortcut into documentation for tools with either thin or difficult to navigate docs

            -- While one must always fact-check the LLM, doing so is usually quicker in this instance than fishing online for which facts to even check

            -- This is most effective for tools where tons of people do seem to already know how the tool works (vs tools nobody has ever heard of) but it's just not clear how they learned that.

            * Working examples to ice-break a start of project

            * Simple automation scripts with few moving parts, especially when one is particular about the goal and the constraints

            -- Online one might find example scripts that almost meet your needs but always fail to meet them in some fashion that's irritating to figure out how to corral back into your problem domain

            -- LLMs have deep experience with tools and with short snippets of coherent code, so their success rate on utility scripts is much higher than on "portions of complex larger projects".

          • edanm 2 days ago |
            Totally respect your position, given that you actually tried the tool and found it didn't work for you. That said, one valid explanation is that the tool isn't good for what you're trying to achieve. But an alternative explanation is that you haven't learned how to use the tool effectively.

            You seem open to this possibility, since you ask:

            > I've never actually seen any LLM-driven developers work in real time. Are there any live coding channels that could convince the skeptics what we're missing out on something revolutionary?

            I don't know many yet, but Steve Yegge, a fairly famous developer in his own right, has been talking about this for the last few months, and has walked a few people through his "Chat Oriented Programming" (CHOP) ideas. I believe if you search for that phrase, you'll find a few videos, some from him and some from others. Can't guarantee they're all quality videos, though anything Steve himself does is interesting, IMO.

        • evilfred 2 days ago |
          Middle Ground Fallacy
          • harrall 2 days ago |
            Fallacy fallacy
          • goatlover 2 days ago |
            The middle ground between hyping the new tech and being completely skeptical about it is usually right. New tech is usually not everything it's hyped up to be, but also usually not completely useless or bad for society. It's likely we're not about to usher in the singularity or doom society, but LLMs are useful enough to stick around in various tools. Also it's probably the case that a percentage of the hype is driven by wanting funding.
            • oblio 2 days ago |
              > New tech is usually not everything it's hyped up to be, but also usually not completely useless or bad for society.

              Except for cryptocurrencies (at least their ratio of investments to output) :-p

        • wvenable 2 days ago |
          > Where are the sane people in the middle?

          They are the quiet ones.

          • jrockway 2 days ago |
            Yup! I don't have a lot to say about LLMs for coding. There are places where I'm certain they're useful and that's where I use them. I don't think "generate a react app from scratch" helps me, but things like "take a CPU profile and write it to /tmp/pprof.out" have worked well. I know how to do the latter, but would need to look at the docs for the exact function name to call, and the LLM just knows and checks the error on opening the file and all that tedium. It's helpful.

            At my last job I spent a lot of time on cleanups and refactoring and never got the LLM to help me in any way. This is the thing that I try every few months and see what's changed, because one day it will be able to do the tedious things I need to get done and spare me the tedium.

            Something I should try again is having the LLM follow a spec and see how it does. A long time ago I wrote some code to handle HTTP conditional requests. I pasted the standard into my code, and wrote each chunk of code in the same order as the spec. I bet the LLM could just do that for me; not a lot of knowledge of code outside that file was required, so you don't need many tokens of context to get a good result. But alas the code is already written and works. Maybe if I tried doing that today the LLM would just paste in the code I already wrote and it was trained on ;)
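
            A rough sketch of what "each chunk of code in the same order as the spec" could look like, in Python. This is not the code described above: the function name and the simplified header parsing (well-formed headers, naive comma splitting of ETags, Last-Modified truncated to whole seconds) are assumptions for illustration, and the step order follows the precondition precedence in RFC 9110, with the Range/If-Range step left out.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _etag_list(header_value):
    """Split a comma-separated ETag header into individual tags (simplified)."""
    return [tag.strip() for tag in header_value.split(",") if tag.strip()]


def _strip_weak(tag):
    """Drop a leading W/ so two tags can be compared weakly."""
    return tag[2:] if tag.startswith("W/") else tag


def evaluate_preconditions(method, headers, etag, last_modified):
    """Return 412, 304, or None (None means: go ahead and process the request)."""
    if_match = headers.get("If-Match")
    if_unmodified = headers.get("If-Unmodified-Since")
    if_none_match = headers.get("If-None-Match")
    if_modified = headers.get("If-Modified-Since")
    safe = method in ("GET", "HEAD")

    # Step 1: If-Match uses strong comparison; "*" matches any representation.
    if if_match is not None:
        tags = _etag_list(if_match)
        strong_hit = not etag.startswith("W/") and etag in tags
        if "*" not in tags and not strong_hit:
            return 412
    # Step 2: If-Unmodified-Since is only evaluated when If-Match is absent.
    elif if_unmodified is not None:
        if last_modified > parsedate_to_datetime(if_unmodified):
            return 412

    # Step 3: If-None-Match uses weak comparison.
    if if_none_match is not None:
        tags = _etag_list(if_none_match)
        if "*" in tags or any(_strip_weak(etag) == _strip_weak(t) for t in tags):
            return 304 if safe else 412
    # Step 4: If-Modified-Since only applies to GET/HEAD without If-None-Match.
    elif safe and if_modified is not None:
        if last_modified <= parsedate_to_datetime(if_modified):
            return 304

    # Step 5 (Range / If-Range) is omitted in this sketch.
    return None


if __name__ == "__main__":
    modified = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    print(evaluate_preconditions("GET", {"If-None-Match": '"abc"'}, '"abc"', modified))
```

            Keeping one branch per numbered step makes it easy to check the code against the spec line by line, which is the same property that would make it a small, self-contained task for an LLM.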

      • travisporter 2 days ago |
        > I got into this profession simply because I could Ctrl-Z to the previous step much more easily than my then favourite chemical engineering goals.

        That is interesting. Asking as a complete ignoramus - is there not a way to do this now? Like start off with a 100 of reagent and at every step use a bit and discard if wrong

        • ssivark 2 days ago |
          But for every step that turns out to be "correct" you now have to go back and redo that in your held-out sample anyways. So it's not like you get to save on repeating the work -- IIUC you just changed it from depth-first execution order to breadth-first execution order.
          • Vampiero 2 days ago |
            > International Islamic University Chittagong

            ??? What's up with native English speakers and random acronyms of stuff that isn't said that often? YMMV, IIUC, IANAL, YSK... Just say it and save everyone else a google search.

            • HappMacDonald 2 days ago |
              So just to make sure I'm on the same page: you're bemoaning how commonly people abbreviate uncommon sayings?
              • Vampiero 2 days ago |
                I'm bemoaning the fact that I have to google random acronyms every time an American wants to say the most basic shit as if everyone on the internet knows their slang and weird four letter abbreviations

                And googling those acronyms usually returns unrelated shit unless you go specifically to urban dictionary

                And then it's "If I understand correctly". Oh. Of course. He couldn't be arsed to type that

                • amenhotep 2 days ago |
                  FWIW IMO YTA
                  • edgineer 2 days ago |
                    frfr
            • tmtvl 2 days ago |
              I'm not a native English speaker, but IIUC is clearly 'If I Understand Correctly'. If you look at the context it's often fairly easy to figure out what an initialism means. I mean even I can usually deduce the meaning and I'm barely intelligent enough to qualify as 'sentient'.
        • numpad0 2 days ago |
          That likely ends up with 100 failed results all attributed to the same set of causes
      • dboreham 2 days ago |
        Calculators vs slide rules.
      • numpad0 2 days ago |
        I can't relate to this comment at all. Doesn't feel like what's said in GP either.

        IMO, LLMs are super fast predictive input and hallucinatory unzip; files to be decompressed don't have to exist yet, but input has to be extremely deliberate and precise.

        You have to have a valid formula that gives the resultant array and doesn't require more than 100 IQ to comprehend, and then they unroll it for you into the whole code.

        They don't reward trial and error that much. They don't seem to help outsiders like 3D printers did, either. It is indeed a discriminatory tool as in it mistreats amateurs.

        And, by the way, it's also increasingly obvious to me that assuming pro-AI posture more than what you would from purely rational and utilitarian standpoint triggers a unique mode of insanity in humans. People seem to contract a lot of negativity doing it. Don't do that.

      • throwaway4aday a day ago |
        Not so sure about those examples and pairing with the idea of quick and dirty work.
    • CraigJPerry 3 days ago |
      >> where all you wrote was function signatures and high-level control flow, and maybe some conformance tests around them

      AIUI that’s where idris is headed

    • greenyouse 3 days ago |
      That approach sounds similar to the Idris programming language with Type Driven Development. It starts by planning out the program structure with types and function signatures. Then the function implementation (aka holes) can be filled in after the function signatures and types are set.

      I feel like this is a great approach for LLM assisted programming because things like types, function signatures, pre/post conditions, etc. give more clarity and guidance to the LLM. The more constraints that the LLM has to operate under, the less likely it is to get off track and be inconsistent.

      I've taken a shot at doing some little projects for fun with this style of programming in TypeScript and it works pretty well. The programs are written in layers with the domain design, types, schema, and function contracts being figured out first (optionally with some LLM help). Then the function implementations can be figured out towards the end.
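
      A minimal sketch of that layering, written in Python rather than TypeScript. The invoice domain and every name in it are hypothetical, chosen only to show the order of work: types first, then the function contract, then a conformance check, with the function body as the last "hole" to fill.

```python
from dataclasses import dataclass


# Layer 1: domain types, decided before any logic exists.
@dataclass(frozen=True)
class LineItem:
    sku: str
    unit_price_cents: int
    quantity: int


@dataclass(frozen=True)
class Invoice:
    items: tuple
    total_cents: int


# Layer 2: the function contract (signature plus pre/post conditions).
def build_invoice(items: list) -> Invoice:
    """Build an invoice from line items.

    Pre:  every item has quantity >= 1 and unit_price_cents >= 0.
    Post: total_cents equals the sum of unit_price_cents * quantity.
    """
    for item in items:
        if item.quantity < 1 or item.unit_price_cents < 0:
            raise ValueError(f"invalid line item: {item}")
    # Layer 4: the body below is the hole that gets filled in last,
    # by hand or by an LLM working against the contract above.
    total = sum(i.unit_price_cents * i.quantity for i in items)
    return Invoice(items=tuple(items), total_cents=total)


# Layer 3: a conformance check any implementation must pass.
def check_contract(items: list) -> None:
    invoice = build_invoice(items)
    assert invoice.total_cents == sum(i.unit_price_cents * i.quantity for i in items)
    assert invoice.items == tuple(items)


if __name__ == "__main__":
    check_contract([LineItem("a", 250, 2), LineItem("b", 100, 1)])
```

      The point of the ordering is that the types, docstring, and check already pin down most of what a correct body can look like before anyone (or anything) writes it.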

      It might be fun to try Effect-TS for ADTs + contracts + compile time type validation. It seems like that locks down a lot of the details so it might be good for LLMs. It's fun to play around with different techniques and see what works!

      • lysecret 3 days ago |
        100% this is what I do in python too!
    • brabel 3 days ago |
      I am not a genius but have a couple of decades experience and finally started using LLMs in anger in the last few weeks. I have to admit that when my free quota from GitHub Copilot ran out (I had already run out of Jetbrains AI as well!! Our company will start paying for some service as the trials have been very successful), I had a slight bad feeling as my experience was very similar to OP: it's really useful to get me started, and I can finish it much more easily from what the AI gives me than if I started from scratch. Sometimes it just fills in boilerplate, other times it actually tells me which functions to call on an unfamiliar API. And it turns out it's really good at generating tests, so it makes my testing more comprehensive as it's so much faster to just write them out (and refine a bit usually by hand). The chat almost completely replaced my StackOverflow queries, which saves me much time and anxiety (God forbid I have to ask something on SO as that's a time sink: if I just quickly type out something I am just asking to be obliterated by the "helpful" SO moderators... with the AI, I just barely type anything at all, leave it with typos and all, the AI still gets me!).
      • EagnaIonat 3 days ago |
        Have you tried using Ollama? You can download and run an LLM locally on your machine.

        You can also pick the right model for the right need and it's free.

        • mentos 3 days ago |
          I’m using ChatGPT4o to convert a C# project to C++. Any recommendation on what Ollama model I could use instead?
          • neonsunset 3 days ago |
            The one that does not convert C# at all and asks you to just optimize it in C# instead (and to use the appropriate build option) :D
            • mentos 3 days ago |
              I’m converting game logic from C# to UE5 C++. So far made great progress using ChatGPT4o and o1
              • neonsunset 3 days ago |
                Do you find these working out better for you than Claude 3.5 Sonnet? So far I've not been a fan of the ChatGPT models' output.
                • mentos 3 days ago |
                  I find ChatGPT better with UE4/5 C++ but they are very close.

                  Biggest advantage is the o1 128k context. I can one shot an entire 1000 line class where normally I’d have to go function by function with 4o.

        • brabel 3 days ago |
          Yes. If the AI is not integrated with the IDE, it's not as helpful. If there were an IDE plugin that let you use a local model, perhaps that would be an option, but I haven't seen that (Github Copilot allows selecting different models, but I didn't check more carefully whether that also includes a local one, anyone knows?).
          • oogali 3 days ago |
            It’s doable as it’s what I use to experiment.

            Ollama + CodeGPT IntelliJ plugin. It allows you to point at a local instance.

            • mark_l_watson 2 days ago |
              I also use Ollama for coding. I have a 32G M2 Mac, and the models I can run are very useful for coding and debugging, as well as data munging, etc. That said, sometimes I also use Claude Sonnet 3.5 and o1. (BTW, I just published an Ollama book yesterday, so I am a little biassed towards local models.)
              • matrix12 2 days ago |
                Thanks for the book!
          • bpizzi 2 days ago |
            > (Github Copilot allows selecting different models, but I didn't check more carefully whether that also includes a local one, anyone knows?).

            To my knowledge, it doesn't.

            On Emacs there's gptel, which integrates different LLMs quite nicely inside Emacs, including a local Ollama.

            > gptel is a simple Large Language Model chat client for Emacs, with support for multiple models and backends. It works in the spirit of Emacs, available at any time and uniformly in any buffer.

            https://github.com/karthink/gptel

          • th4t1sW13rd 2 days ago |
            This can use Ollama: https://www.continue.dev/
          • antifa a day ago |
            > If there were an IDE plugin that let you use a local model

            TabbyML

      • devjab 3 days ago |
        I'm genuinely curious but what did you use StackOverflow for before? With a couple of decades in the industry I can't remember when the last time I "Google programmed" anything was. I always go directly to the documentation for whatever it is I'm working with, because where else would I find out how it actually works? It's not like I haven't "Google programmed" when I was younger, but it's just such a slow process based on trusting strangers on the internet that it never really made much sense once I started knowing what I was doing. I sort of view LLMs in a similar manner. Why would you go to them rather than the actual documentation? I realize this might sound arrogant or rude, and I really hope you believe me when I say that I don't mean it like this. The reason I'm curious is because we're really struggling to get junior developers to look at the documentation first instead of everywhere else. Which means they often actually don't know how what they build works. Which can be an issue when they load every object of a list into memory instead of using a generator...
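
        To make the list-versus-generator point concrete, here is a small Python sketch. The function names are made up for illustration; the only claim is the general one that a list holds every element at once while a generator produces them one at a time.

```python
import sys


def squares_list(n):
    """Builds every element up front; memory grows with n."""
    return [i * i for i in range(n)]


def squares_gen(n):
    """Yields one element at a time; memory stays roughly constant."""
    for i in range(n):
        yield i * i


if __name__ == "__main__":
    n = 1_000_000
    # Size of the container objects themselves, in bytes.
    print(sys.getsizeof(squares_list(n)))
    print(sys.getsizeof(squares_gen(n)))
    # The generator is consumed lazily, so no full list is ever built.
    print(sum(squares_gen(n)))
```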

        As far as using LLMs in anger I would really advise anyone to use them. GitHub copilot hasn't been very useful for me personally, but I get a lot of value out of running my thought process by an LLM. I think better when I "think out loud" and that is obviously challenging when everyone is busy. Running my ideas by an LLM helps me process them in a similar (if not better) fashion, often it won't even really matter what the LLM conjures up because simply describing what I want to do often gives me new ideas, like "thinking out loud".

        As far as coding goes. I find it extremely useful to have LLMs write cli scripts to auto-generate code. The code the LLM will produce is going to be absolute shite, but that doesn't matter if the output is perfectly fine. It's reduced my personal reliance on third party tools by quite a lot. Because why would I need a code generator for something (and in that process trust a bunch of 3rd party libraries) when I can have an LLM write a similar tool in half an hour?

        • wiseowise 3 days ago |
          > Why would you go to them rather than the actual documentation?

          Not every documentation is made equal. For example: Android docs are royal shit. They cover some basic things, e.g. show a button, but good luck finding esoteric Bluetooth information or package management, etc. Most of it is a mix of experimentation and historical knowledge (baggage).

          • devjab 3 days ago |
            > Not every documentation is made equal.

            They are wildly different. I'm not sure the Android API reference is that bad, but that is mainly because I've spent a good number of years with the various .Net API references and the Android one is a much more shiny turd than those. I haven't had issues with Bluetooth myself, the Bluetooth SIG has some nice specification PDFs but I assume you're talking about the ones which couldn't be found? I mean this in a "they don't seem to exist" kind of way and not that you, specifically, couldn't find them.

            I agree though. It's just that I've never really found internet answers to be very useful. I did actually search for information a few years back when I had to work with a solar inverter datalogger, but it turned out that having the ridiculously long German engineering manual scanned, OCR processed and translated was faster. Anyway, we all have our great white whales. I'm virtually incapable of understanding the SQLAlchemy documentation as an example, luckily I'll probably never have to use it again.

        • brabel 3 days ago |
          I believe you don't mean to be rude, but you just sound completely naive to me. To think that documentation includes everything is just, like, have you actually been coding anything at all that goes just slightly off the happy path? Example from yesterday: I have a modular JavaFX application (i.e. it uses Java JPMS modules, not just Maven/Gradle modules). I introduced a call to `url()` in JavaFX CSS. That works when running using the classpath, but not when using the module path. I spent half an hour reading docs to see what they say about modular applications. They didn't mention anything at all. Especially because in my case, I was not just doing `getClass().getResource`... I was using the CSS directive to load a resource from the jar. This is exactly when I would likely go on SO and ask if anyone had seen this before. It used to be highly likely someone who's an expert on JavaFX would see and answer my question, sometimes even people who directly worked on JavaFX!

          StackOverflow was not really meant for juniors, as juniors usually can indeed find answers on documentation, normally. It was, like ExpertsExchange before it, a place for veterans to exchange tribal knowledge like this. If you think only juniors use SO, you seem to have arrived at the scene just yesterday and just don't know what you're talking about.

    • ilrwbwrkhv 3 days ago |
      Being a dev at a large company is usually the sign that you're not very good though. And anyone can start a company with the right connections.
      • ksenzee 3 days ago |
        You've just disproved your own assertion. Either that or you believe everyone who's any good has the right connections.
      • tomwojcik 3 days ago |
        That's a terrible blanket statement, very US-centric. Not everyone wants to start a company and you can't just reduce one's motivations to your measure of success.
        • joseda-hg 3 days ago |
          God knows many of the best devs I've known would be an absolute nightmare on the business side, they'd rather have a capable business person if they could avoid it
    • benterix 3 days ago |
      > designed around filling in the implementations for you. 20 years ago that would have been from a live online database

      This reminds me a bit of PowerBuilder (or was it PowerDesigner?) from early 1990s. They sold it to SAP later, I was told it's still being used today.

    • antirez 3 days ago |
      I also have many years of programming experience and find myself strongly "accelerated" by LLMs when writing code. But, if you think about it, it makes sense that many seasoned programmers use LLMs better. LLMs are a helpful tool, but also a hard-to-use tool, and in general it's fair to think that better programmers can make better use of an assistant (human or otherwise): understanding its strengths better, identifying good and bad output faster, providing better guidance to correct the approach...

      Other than that, what correlates more strongly with the ability to use LLMs effectively is, I believe, language skills: the ability to describe problems very clearly. LLM reply quality changes very significantly with the quality of the prompt. Experienced programmers who can also communicate effectively provide the model with many design hints, details on where to focus, ..., basically escaping many local minima immediately.

      • bsenftner 3 days ago |
        Communication skills are the keys to using LLMs. Think about it: every type of information you want is in them, in fact it is there multiple times, with multiple levels of seriousness in the treatment of the idea. If one is casual in their request, using casual language, then the LLM will reply with a casual reply because that matched your request best. To get a hard, factual answer from those that are experts in a subject, use the formal term, use the expert's language and you'll get back a reply more likely to be correct because it's in the same level of formal treatment as correct answers.
        • psychoslave 3 days ago |
          >every type of information you want is in them

          Actually, I'm afraid not. It won't give us the step by step scalable processes to make humanity as a whole enter in a loop of indefinitely long period of world peace, with each of us enjoying life in our own thriving manner. That would be great information to broadcast, though.

          Also it is equally able to produce a large pile of completely delusional answers that mimic genuinely sincere statements just as well. Of course, we can also receive that kind of misleading answer from humans. But the amount of output that mere humans can throw out in such a form is far more limited.

          All that said, it's great to be able to experiment with it, and there are a lot of nice and fun things to do with it. It can be a great additional tool, but it won't be a self-sufficient panacea of information source.

          • bsenftner 3 days ago |
            > It won't give us the step by step scalable processes to make humanity as a whole enter in a loop of indefinitely long period of world peace

            That's not anywhere, that's a totally unsolved and open ended problem, why would you think an LLM would have that?

            • fmbb 3 days ago |
              If what you meant was

              > Think about it: every type of already solved problem you want information about is in them, in fact it is there multiple times, with multiple levels of seriousness in the treatment of the idea.

              then that was not clear from your comment saying LLMs contain any information you want.

              One has to be careful communicating about LLMs because the world is full of people that actually believe LLMs are generally intelligent super beings.

              • numpad0 2 days ago |
                I think GP's saying that it must be in your prompt, not in the weights.

                If you want an LLM to make a sandwich, you have to tell it you `want triangular sandwiches of standard serving size made with white bread and egg based filling`, not `it's almost noon and I'm wondering if sandwich for lunch is a good idea`. Fine-tuning partially solves that problem but they still like the former.

          • arminiusreturns 2 days ago |
            • psychoslave 2 days ago |
              Interesting, thanks for sharing. Could you also give some insights on the process you followed?
              • arminiusreturns a day ago |
                Sure. Lately I've found that the "role" part of prompt engineering seems to be the most important. So what I've been doing is telling ChatGPT to play the role of the most educated/wise/knowledgeable/skilled $field $role(advisor, lawyer, researcher etc) in the history of the world and then giving it some context for the task before asking for the actual task.

                Sometimes asking it to self reflect on how the prompt itself could be better engineered helps if the initial response isn't quite right.

      • mhalle 3 days ago |
        I completely agree that communication skills are critical in extracting useful work or insight from LLMs. The analogy for communicating with people is not far-fetched. Communicating successfully with a specific person requires an understanding of their strengths and weaknesses, their tendencies and blind spots. The same is true for communicating with LLMs.

        I have actually found that from a documentation point of view, querying LLMs has made me better at explaining things to people. If, given the documentation for a system or API, a modern LLM can't answer specific questions about how to perform a task, a person using the same documentation will also likely struggle. It's proving to be a good way to test the effectiveness of documentation, for humans and for LLMs.

      • LouisSayers 2 days ago |
        > the ability to describe problems very clearly

        Yes, and to provide enough context.

        There's probably a lot that experience is contributing to the interaction as well, for example - knowing when the LLM has gone too far, focusing on what's important vs irrelevant to the task, modularising and refactoring code, testing etc

      • gen220 2 days ago |
        Hey! Asking because I know you're a fellow vimmer [0]. Have you integrated LLMs into your editor/shell? Or are you largely copy-pasting context between a browser and vim? This context-switching of it all has been a slight hang-up for me in adopting LLMs. Or are you asking more strategic questions where copy-paste is less relevant?

        [0] your videos on writing systems software were part of what inspired me to make a committed switch into vim. thank you for those!

        • qup a day ago |
          You want aider.
      • rudiksz 2 days ago |
        > "seasoned programmers are using LLMs better".

        I do not remember a single instance when code provided to me by an LLM worked at all. Even if I ask for something small that can be done in 4-5 lines of code, it is always broken.

        From a fellow "seasoned" programmer to another: how the hell do you write the prompts to get back correct working code?

        • jkaptur 2 days ago |
          The story from the article matches my experience. The LLM's first answer is often a little broken, so I tweak it until it's actually correct.
        • numpad0 2 days ago |
          dc: not a seasoned dev, with <b> and <h1> tags on "not".

          They can't think for you. All intelligent thinking you have to do.

          First, give them a high-level requirement that can be clarified into indented bullet points that look like code. Or give them such a list directly. Don't give them the half-open questions usually favored by talented and autonomous individuals.

          Then let them further decompress those pseudocode bullet points into code. They'll give you back code that resembles a digitized paper test answer. Fix the obvious errors and you get B-grade code that compiles.

          They can't do non-conventional structures, Quake-style performance-optimized code, realtime robotics, cooperative multithreading, etc., just good old it-takes-what-it-takes GUI app, API, and data manipulation code.

          For those use cases with these points in mind, it's a lot faster to let LLM generate tokens than typing `int this_mandatory_function_does_obvious (obvious *obvious){ ...` manually on a keyboard. That should arguably be a productivity boost in the sense that the user of LLM is effectively typing faster.

        • HappMacDonald 2 days ago |
          I'd ask things like "which LLM are you using", and "what language or APIs are you asking it to write for".

          For the standard answers of "GPT-4 or above", "Claude Sonnet or Haiku", or models of similar power, with well-known languages like Python, JavaScript, Java, or C, and assuming no particularly niche or unheard-of APIs or project contexts, the failure rate of 4-5 line scripts in my experience is less than 1%.

        • wvenable 2 days ago |
          I rarely get back non-working code, but I've also internalized its limitations, so I no longer ask it for things it's not going to be able to do.

          As other commenters have pointed out, there is also a lot of variation between different models, and some are quite dumb.

          I've had no issues with 10-20 line coding problems. I've also had it build a lot of complete shell scripts and had no problem there either.

        • antirez 2 days ago |
          Check my YouTube channel if you have a few minutes. I just published a video about adding a complex feature (UTF-8) to the Kilo editor, using Claude.
        • mordymoop 2 days ago |
          I write the prompt as if I’m writing an email to a subordinate that clearly specifies what the code needs to do.

          If what I’m requesting is an improvement to existing code, I paste the whole code if practical, or if not, as much of the code as possible, as context before making the request for additional functionality.

          Often these days I add something like “preserve all currently existing functionality.” Weirdly, as the models have gotten smarter, they have also gotten more prone to delete stuff they view as unnecessary to the task at hand.

          If what I’m doing is complex (a subjective judgement), I ask it to lay out a plan for the intended code before starting, giving me a chance to give it a thumbs up or clarify its understanding of what I’m asking for if its plan is off base.

        • throwaway4aday a day ago |
          Step 1: https://claude.ai

          Step 2: Write out your description of the thing you want to the best of your ability but phrase it as "I would like X, could you please help me better define X by asking me a series of clarifying questions and probing areas of uncertainty."

          Step 3: Once both Claude and you are satisfied that X is defined, say "Please go ahead and implement X."

          Step 4a: If feature Y is incorrect, go to Step 2 and repeat the process for Y

          Step 4b: If there is a bug, describe what happened and ask Claude to fix it.

          That's the basics of it, should work most of the time.

      • kragen 2 days ago |
        That's really interesting. What are the most important things you've learned to do with the LLMs to get better results? What do your problem descriptions look like? Are you going back and forth many times, or crafting an especially-high-quality initial prompt?
        • antirez 2 days ago |
          I'm posting a set of videos on my YT channel where I'll show the process I follow. Thanks!
          • kragen 2 days ago |
            That's fantastic! I thought about asking if you had streamed any of it, but I didn't want to sound demanding and entitled :)
    • ignoramous 3 days ago |
      > [David, Former staff engineer at Google ... CTO of Tailscale,] doesn't need LLMs. That he says LLMs make him more productive at all as a hands-on developer, especially around first drafts on a new idea, means a lot to me...

      Don't doubt for a second the pedigree of founding engs at Tailscale, but David is careful to point out exactly why LLMs work for them (but might not for others):

        I am doing a particular kind of programming, product development, which could be roughly described as trying to bring programs to a user through a robust interface. That means I am building a lot, throwing away a lot, and bouncing around between environments. Some days I mostly write typescript, some days mostly Go. I spent a week in a C++ codebase last month exploring an idea, and just had an opportunity to learn the HTTP server-side events format. I am all over the place, constantly forgetting and relearning.
      
        If you spend more time proving your optimization of a cryptographic algorithm is not vulnerable to timing attacks than you do writing the code, I don't think any of my observations here are going to be useful to you.
      • pplonski86 3 days ago |
        I'm in a similar situation. I jump between many environments, mainly between Python and Typescript, but I am currently testing a new idea for a learning algorithm in C++, and I simply don't always remember all the syntax. I was very skeptical about LLMs at first. Now I'm using LLMs daily. I can focus more on thinking rather than searching Stack Overflow. Very often I just need a simple function that is much faster to create with chat.
        • JKCalhoun 3 days ago |
          And if anyone remembers: before Stack Overflow you more or less had to specialize in a domain, become good using a handful of frameworks/API, on one platform. Learning a new language, a new API (god forbid a new platform) was to sail, months long, into seas unknown.

          In this regard, with first Stack Overflow and now LLMs, the field has improved mightily.

      • big_youth 2 days ago |
        > If you spend more time proving your optimization of a cryptographic algorithm is not vulnerable to timing attacks than you do writing the code, I don't think any of my observations here are going to be useful to you.

        I am not a software dev; I am a security researcher. LLMs are great for my security research! It is so much easier and faster to iterate on code like fuzzers to do security testing. Writing code to do a padding oracle attack would have taken me a week+ in the past. Now I can work with an LLM to write code and learn and break within the day.

        It has accelerated my security research 10 fold, just because I am able to write code and parse and interpret logs at a level above what I was able to a few years ago.

    • Vox_Leone 3 days ago |
      I have been using LLMs to generate functional code from *pseudo-code* with excellent results. I am starting to experiment with UML diagrams, using both LLMs and computer vision to generate code directly from them; for example, a simple activity diagram could serve as the prompt to an LLM and might look like:

      Start -> Enter Credentials -> Validate -> [Valid] -> Welcome Message -> [Invalid] -> Error Message

      Corresponding Code (Python Example):

      class LoginSystem:

          def validate_credentials(self, username, password):
              if username == "admin" and password == "password":
                  return True
              return False
      
          def login(self, username, password):
              if self.validate_credentials(username, password):
                  return "Welcome!"
              else:
                  return "Invalid credentials, please try again."
      
      *Edited for clarity
      • jonvk 2 days ago |
        This example illustrates one of the risks of using LLMs without subject expertise though. I just tested this with claude and got that exact same validation method back. Using string comparison is dangerous from a security perspective [1], so this is essentially unsafe validation, and there was no warning in the response about this.

        1. https://sqreen.github.io/DevelopersSecurityBestPractices/tim...
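
        A minimal sketch of the safer comparison, assuming Python's standard-library `hmac.compare_digest`. The function name mirrors the example above, and the hard-coded values are placeholders for illustration only (a real system would compare against a stored, salted password hash rather than a literal):

```python
import hmac

# Placeholder credentials for illustration only; real systems store salted hashes.
EXPECTED_USERNAME = "admin"
EXPECTED_PASSWORD = "password"


def validate_credentials(username: str, password: str) -> bool:
    """Check both fields with a comparison designed to resist timing analysis."""
    # Encode to bytes so non-ASCII input is handled by compare_digest.
    user_ok = hmac.compare_digest(
        username.encode("utf-8"), EXPECTED_USERNAME.encode("utf-8")
    )
    pass_ok = hmac.compare_digest(
        password.encode("utf-8"), EXPECTED_PASSWORD.encode("utf-8")
    )
    # Both checks always run; "&" avoids short-circuiting on a wrong username.
    return user_ok & pass_ok
```

        This only addresses the comparison itself; storage, salting, and rate limiting are separate concerns.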

        • jpc0 2 days ago |
          Are you talking about the timing based attacks on that website which fails miserably at rendering a useable page on mobile?
      • jpc0 2 days ago |
        Could you add to the prompt that the password is stored in an sqlite database using argon2 for hashing, with the hashing parameters stored as environment variables?

        You would like it to avoid timing based attacks as well as dos attacks.

        It should also generate the functions as pure functions so that state is passed in and passed out and no side effects(printing to the console) happen within the function.

        Then also confirm for me that it has handled all error cases that might reasonably happen.

        While you are doing that, just think about how much implicit knowledge I just had to type into the comment here and that is still ignoring a ton of other knowledge that needs to be considered like whether that password was salted before being stored. All the error conditions for the sqlite implementation in python, the argon2 implementation in the library.

        TLDR: that code is useless and would have taken me the same amount of time to write as your prompt.

    • apwell23 3 days ago |
      he is using llm for coding. you don't become staff engineer by being a badass coder. Not sure how they are related.
    • HarHarVeryFunny 3 days ago |
      > His post reminds me of an old idea I had of a language where all you wrote was function signatures and high-level control flow

      Regardless of language, that's basically how you approach the design of a new large project - top down architecture first, then split the implementation into modules, design the major data types, write function signatures. By the time you are done what is left is basically the grunt work of implementing it all, which is the part that LLMs should be decent at, especially if the functions/methods are documented to a level (input/output assertions as well as functionality) where it can also write good unit tests for them.

      • dingnuts 3 days ago |
        > the grunt work of implementing it all

        you mean the fun part. I can really empathize with digital artists. I spent twenty years honing my ability to write code and love every minute of it and you're telling me that in a few years all that's going to be left is PM syncs and OKRs and then telling the bot what to write

        if I'm lucky to have a job at all

        • HarHarVeryFunny 2 days ago |
          I think it depends on the size of the project. To me, the real fun of being a developer is the magic of being able to conceive of something and then conjure it up out of thin air - to go from an idea to reality. For a larger more complex project the major effort in doing this is the solution conception, top-down design (architecture), and design of data structures and component interfaces... The actual implementation (coding), test cases and debugging, then does become more like drudgework, not the most creative or demanding part of the project, other than the occasional need for some algorithmic creativity.

          Back in the day (I've been a developer for ~45 years!) it was a bit different as hardware constraints (slow 8-bit processors with limited memory) made algorithmic and code efficiency always a primary concern, and that aspect was certainly fun and satisfying, and much more a part of the overall effort than it is today.

    • mahmoudimus 2 days ago |
      Isn't that the idea behind UML? Which didn't work out so well, however, with the advent of LLMs today, I think that premise could work.
  • agentultra 3 days ago |
    It seems nice for small projects but I wouldn’t use it for anything serious that I want to maintain long term.

    I would write the tests first and foremost: they are the specification. They’re for future me and other maintainers to understand and I wouldn’t want them to be generated: write them with the intention of explaining the module or system to another person. If the code isn’t that important I’ll write unit tests. If I need better assurances I’ll write property tests at a minimum.

    If I’m working on concurrent or parallel code or I’m working on designing a distributed system, it’s gotta be a model checker. I’ve verified enough code to know that even a brilliant human cannot find 1-in-a-million programming errors that surface in systems processing millions of transactions a minute. We’re not wired that way. Fortunately we have formal methods. Maths is an excellent language for specifying problems and managing complexity. Induction, category theory, all awesome stuff.

    Most importantly though… you have to write the stuff and read it and interact with it to be able to keep it in your head. Programming is theory-building as Naur said.

    Personally I just don’t care to read a bunch of code and play, “spot the error;” a game that’s rigged for me to be bad at. It’s much more my speed to write code that obviously has no errors in it because I’ve thought the problem through. Although I struggle with this at times. The struggle is an important part of the process for acquiring new knowledge.

    Though I do look forward to algorithms that can find proofs of trivial theorems for me. That would be nice to hand off… although simp does a lot of work like that already. ;)

  • rafaelmn 3 days ago |
    I disagree about search. While an LLM can give you an answer faster, a good doc (e.g. an MDN article, in the CSS example) will:

    - be way more reliable

    - probably be up to date on how you should solve it with the latest/recommended approach

    - put you in a place where you can search for adjacent tech

    LLM with search has potential, but I'd like it if current tools were more oriented toward source material rather than AI paraphrasing.

    • cruffle_duffle 3 days ago |
      One of my tricks is to paste the docs right into the context so the model can’t fuck it up.

      Though I still wonder if that means I’m only tricking myself into thinking the LLM is increasing my productivity.

      • rafaelmn 3 days ago |
        I like this approach. Read the docs, figure out what you want, get the LLM to do the grunt work with all relevant context, and review.
    • EGreg 3 days ago |
      I have found LLMs to be 95% useful on documented software, on everything from Uniswap smart contracts to Cordova plugins to setting up Mac or Linux administrative tools.

      The problem for a regular person is that you have to copy-paste from chat. That is “the last mile”. For terminal commands that’s fine, but for programming you need a tool to automate this.

      Something like refactoring a function, given the entire context, etc. And it happening in the editor and you seeing a diff right away. The rest of the explanatory text should go next to the diff in a separate display.

      I bet someone can make a VSCode extension that chats with an LLM and does exactly this. The LLM is told to provide all the sections labeled clearly (code, explanation) and the editor makes the diff.
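
      The editor-side half of that idea can be sketched with Python's standard `difflib`. This is only an illustration under assumptions: `make_diff` and the `example.py` filename are hypothetical names, and the proposed code is assumed to have already been extracted from the LLM's labeled "code" section:

```python
import difflib


def make_diff(original: str, proposed: str, filename: str = "example.py") -> str:
    """Return a unified diff between the current code and the proposed code."""
    diff_lines = difflib.unified_diff(
        original.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"a/{filename}",
        tofile=f"b/{filename}",
    )
    # unified_diff yields lines lazily; join them into one displayable string.
    return "".join(diff_lines)
```

      Calling `make_diff(current_text, llm_code)` gives text an editor could render next to the explanation; identical inputs produce an empty string.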

      Having said all that, good libraries that abstract away differences are far superior to writing code with an LLM. The only code that needs to be written is the interface and wiring up between the libraries.

  • Ozzie_osman 3 days ago |
    One mode I felt was missed was "thought partner", especially while debugging (aka rubber ducking).

    We had an issue recently with a task queue seemingly randomly stalling. We were able to arrive at the root cause much more quickly than we otherwise would have, thanks to a back-and-forth brainstorming session with Claude, which involved describing the issue we were seeing, pasting in code from the library to ask questions, asking it to write some code to add some missing telemetry, and then probing it for ideas on what might be going wrong. An issue that may have taken days to debug took about an hour to identify.

    Think of it as rubber ducking with a very strong generalist engineer who knows about basically any technical concepts.

    • mmahemoff 3 days ago |
      The new video and screen-share capabilities in ChatGPT and Gemini should make rubber-ducking smoother.

      I feel like I've worn out my computer’s clipboard and alt-tab keys at this stage of the LLM experience.

      • fragmede 3 days ago |
        You may want to try any of the tools that can write to the filesystem so you're at least not copy-pasting code from a chat window: Copilot, Cursor, Aider, Tabnine, etc.
    • vendiddy 3 days ago |
      I found myself doing this with o1 recently for software architecture.

      I will evaluate design ideas with the model, express concerns on trade-offs, ask for alternative ideas, etc.

      Some of the benefit is having someone to talk to, but with proper framing it is surprisingly good at giving balanced takes.

  • simondotau 3 days ago |
    I've recently started using Cursor because it means I can now write python where two weeks ago I couldn't write python. It wrote the first pass of an API implementation by feeding it the PDF documentation. I've spent a few days testing and massaging it into a well formed, well structured library, pair-programming style.

    Then I needed to write a simple command line utility, so I wrote it in Go, even though I've never written Go before. Being able to make tiny standalone executables which do real work is incredible.

    Now if I ever need to write something, I can choose the language most suited to the task, not the one I happen to have the most experience with.

    That's a superpower.

    • midasz 3 days ago |
      But you're not really writing python right? You're instructing a tool to generate python. Kinda like saying I'm writing bytecode while I'm actually just typing Java.
      • simondotau 3 days ago |
        I am really writing python. The LLM is a substitute for having foreknowledge of this particular language's syntax and grammar, but I'm still debugging like a "real" programmer and I'm still editing/refining the code like a "real" programmer, because I am.

        Probably half the lines of code were written by me, because I do know how to write code.

        Here's what I wrote if you're curious: https://github.com/sjwright/zencontrol-python/

  • yawnxyz 3 days ago |
    > I could not go a week without getting frustrated by how much mundane typing I had to do before having a FIM model

    For those not in-the-know, I just learned today that code autocomplete is actually called "Fill-in-the-Middle" tasks

    • Guthur 3 days ago |
      Says who? I've been in the industry for nearly 25 years and have heard auto complete throughout but not once have I heard fill in the middle.

      Stop taking these blogs as oracles of truth, they are not. These AI articles are full of this nonsense, to the point where it would appear to me many responses might just be Nvidia bots or whatever.

      • sunaookami 3 days ago |
        >I've been in the industry for nearly 25 years and have heard auto complete throughout but not once have I heard fill in the middle

        Then you need to look harder. FiM is a common approach for code generation LLMs.

        https://openai.com/index/efficient-training-of-language-mode...

        https://arxiv.org/abs/2207.14255

        This was before ChatGPT's release btw.

        • Guthur 3 days ago |
          Why, what was wrong with code completion, it was perfectly valid before even when including some sort of fuzzing.

          It's like everything to do with LLM marketing buzzword nonsense.

          I really want to just drop out of tech until all this obnoxious hype BS is gone.

          • ascorbic 3 days ago |
            Autocomplete is the feature, fill in the middle is one approach to implementing it. There are other ways of providing it (which were used in earlier versions of Copilot) and FIM can be used for tasks other than code completion.
          • wruza 3 days ago |
            It’s just a term that signals “completion in between” rather than “after”. Regular code completion usually doesn’t take the following blocks into account mostly because these are grammatically vague due to an ongoing edit.

            Your comments may be sympathised with, but why on earth are they addressed to the root commenter? They simply shared their findings about an acronym.

            • Guthur 3 days ago |
              Because they mentioned it, why on earth would you think that is not a valid response in a thread that mentions it, from my observation that's pretty much how forum like threads work.

              More pressingly why do you think you should police it?

              • wruza 3 days ago |
                Apologies if my feedback annoyed you, it wasn’t the goal. I just care about HN and this didn’t feel right.
      • crawshaw 2 days ago |
        Author here.

        FIM is a term of art in LLM research for a style of tokens used to implement code completion. In particular, it refers to training an LLM with the extra non-printing tokens:

            <|fim_prefix|>
            <|fim_middle|>
            <|fim_suffix|>
        
        You would then take code like this:

            func add(a, b int) int {
                return <cursor>
            }
        
        and convert it to:

            <|fim_prefix|>func add(a, b int) int {
                return<|fim_suffix|>
            }<|fim_middle|>
        
        and have the LLM predict the next token.

        It is, in effect, an encoding scheme for getting the prefix and suffix into the LLM context while positioning the next token to be where the cursor is.

        (There are several variants of this scheme.)
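
        As a rough sketch of that conversion in Python: the token strings and the `<cursor>` marker are the ones shown above, while `build_fim_prompt` is a hypothetical helper name, and real models may use different token spellings or orderings:

```python
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(source: str, cursor_marker: str = "<cursor>") -> str:
    """Split source at the cursor and arrange it as prefix, suffix, then middle."""
    prefix, found, suffix = source.partition(cursor_marker)
    if not found:
        raise ValueError("cursor marker not found in source")
    # The prompt ends with the middle token, so the next predicted token
    # is whatever belongs at the cursor position.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"
```

        The text before the cursor lands after the prefix token and the text after it lands after the suffix token, matching the layout in the example above.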

  • ripped_britches 3 days ago |
    I’ll say that the payoff for investing the time to learn how to do this right is huge. Especially with cursor which allows me to easily chat around context (docs, library files, etc)
    • Aeolun 3 days ago |
      I didn’t believe it could be so good until I actually used it. It’s a shame some of their models are proprietary because that means I can’t use it for work. Would love if the thing worked purely with Copilot Chat (like Zed does), or if Zed added a similar composer mode.
  • brabel 3 days ago |
    What the author is asking about, a quick sketchpad where you can try out code quickly and chat with the AI, already exists in the JetBrains IDEs. It's called a scratch file[1].

    As far as I know, the idea of a scratch "buffer" comes from emacs. But in Jetbrains IDEs, you have the full IDE support even with context from your current project (you can pick the "modules" you want to have in context). Given the good integration with LLMs, that's basically what the author seems to want. Perhaps give GoLand[2] a try.

    Disclosure: no, I don't work for Jetbrains :D just a very happy customer.

    [1] https://www.jetbrains.com/help/idea/scratches.html

    [2] https://www.jetbrains.com/go/

    • ryanobjc 2 days ago |
      It's also available in emacs with packages like gptel which let you send the content of any buffer to your LLM of choice.

      I think emacs + LLM is a killer feature: the integration is super deep, deeper than any IDE I've seen, and it's just available... everywhere! Any text in emacs is sendable to a LLM.

      • brabel 2 days ago |
        I need to try that, but I have a feeling that in emacs it won't work as well because emacs has a bit more "trouble" setting up workspaces and using context only from that. I'm trying to use `project.el` now, as it seems projectile has been superseded by it. If you know how to easily set that up with eglot support + AI, that would be helpful.
  • justinl33 3 days ago |
    I've maintained several SDKs, and the 'cover everything' approach leads to nightmare dependency trees and documentation bloat. imo, the LLM paradigm shifts this even further - why maintain a massive SDK when users can generate precisely what they need? This could fundamentally change how we think about API distribution.
  • golergka 3 days ago |
    I have written a small fullstack app over the holidays, mostly with LLMs, to see how far they would get me. Turns out, they can easily write 90% of the code, but you still need to review everything, make the main architectural decisions and debug stuff when the AI can't solve the bug after 2-3 iterations. I get a huge productivity boost and at the same time am not afraid that they will replace me. At least not yet.

    Can't recommend aider enough. I've tried many different coding tools, but they all seem like a leaky abstraction over LLMs medium of sequential text generation. Aider, on the other hand, leans into it in the best possible way.

  • lysecret 3 days ago |
    Funny, he starts off dismissing an AI IDE and ends with building an AI IDE :D (Smells a little bit like not invented here syndrome) Otherwise fascinating article!
    • cpursley 2 days ago |
      I joke about once per month here that half of hn is basically "not invented here syndrome". And generally poor reimplementations of existing erlang features ;)
  • bambax 3 days ago |
    > There are three ways I use LLMs in my day-to-day programming: 1/ Autocomplete 2/ Search 3/ Chat-driven programming

    I do mostly 2/ Search, which is like a personalized Stack Overflow and sometimes feels incredible. You can ask a general question about a specific problem and then dive into some specific point to make sure you understand every part clearly. This works best for things one doesn't know enough about, but has a general idea of how the solution should sound or what it should do. Or, copy-pasting error messages from tools like Docker and have the LLM debug it for you really feels like magic.

    For some reason I have always disliked autocomplete anywhere, so I don't do that.

    The third way, chat-driven programming, is more difficult, because the code generated by LLMs can be large, and can also be wrong. LLMs are too eager to help, and they will try to find a solution even if there isn't one, and will invent it if necessary. Telling them in the prompt to say "I don't know" or "it's impossible" if need be, can help.

    But, like the author says, it's very helpful to get started on something.

    > That is why I still use an LLM via a web browser, because I want a blank slate on which to craft a well-contained request

    That's also what I do. I wouldn't like having something in the IDE trying to second guess what I write or suddenly absorbing everything into context and coming up with answers that it thinks make a lot of sense but actually don't.

    But the main benefit is, like the author says, that it lets one start afresh with every new question or problem, and save focused threads on specific topics.

  • polotics 3 days ago |
    My main usage is in helping me approach domains and tools I don't know well enough to be confident about how best to get started.

    So one thing that doesn't get a mention in the article but is quite significant I think is the long lag of knowledge cutoff dates: looking at even the latest and greatest, there is one year or more of missing information.

    I would love for someone more versed than me to tell us how best to use RAG or LoRA to get the model to answer with fully up to date knowledge on libraries, frameworks, ...

  • choeger 3 days ago |
    Essentially, an LLM is a compressed database with a universal translator.

    So what we can get out of it is everything that has been written (and publicly released) before translated to any language it knows about.

    This has some consequences.

    1. Programmers still need to know what algorithms or interfaces or models they want.

    2. Programmers do not have to know a language very well anymore to write code, but they have to for bug fixing. Consequently the rift between garbage software and quality software will grow.

    3. New programming languages will face a big economical hurdle to take off.

    • williamcotton 3 days ago |
      > 3. New programming languages will face a big economical hurdle to take off.

      I bet the opposite. I’ve written a number of DSLs and tooling around them over the last year as LLMs have allowed me to take on much bigger projects.

      I expect we'll see an explosion of languages over the next decade.

      • klibertp 2 days ago |
        Yes - the number of languages will grow, however, their adoption will be much slower and harder to enact than now (and it's already incredibly difficult).

        You might have written the DSLs, but the LLMs are unaware of this and will offer hallucinations when asked to generate code using that DSL.

        For the past few weeks I've been slowly getting back to Common Lisp. Even though there's plenty of CL code on the net, its volume is dwarfed by Python or JS. In effect, both Github Copilot and ChatGPT (4o) have an accuracy of 5%. I'm not kidding: they're unable to generate even very simple snippets correctly, hallucinating packages and functions.

        It's of course (I think?) possible to make a GPT specialized for Lisp, but if the generic model performs poorly, it'll probably make people wary and stay away from the language. So, unless you're ready to fine-tune a model for your language and somehow distribute it to your users, you'll see adoption rates dropping (from already minuscule ones!)

  • stevage 3 days ago |
    This is a great article with lots of useful insights.

    But I'm completely unconvinced by the final claim that LLM interfaces should be separate from IDEs, and should be their own websites. No thanks.

  • dxuh 3 days ago |
    Currently a lot of my work consists of looking at large, (to me) unknown code bases and figuring out how certain things work. I think LLMs are currently very bad at this and it is my understanding that there are problems in increasing context window sizes to multiple millions of tokens, so I wonder if LLMs will ever get good at this.
    • AnnKey 3 days ago |
      I would speculate that for learning unknown codebases, fine-tuning might work better than relying on context window size.
  • jmull 3 days ago |
    LLM auto-complete is good — it suggests more of what I was going to type, and correctly (or close enough) often enough that it’s useful. Especially in the boilerplate-y languages/code I have to use for $dayjob.

    Search has been neutral. For finding little facts it’s been about the same as regular search. When digging in, I want comprehensive, dense, reasonably well-written reference documentation. That’s not exactly wide-spread, but LLMs don’t provide this either.

    Chat-driven programming generates too much buggy/incomplete code to be useful, and the chat interface is seriously clunky.

  • Ygg2 3 days ago |
    > Search. If I have a question about a complex environment, say “how do I make a button transparent in CSS” I will get a far better answer asking any consumer-based LLM, than I do using an old fashioned web search engine.

    I don't think this is about LLMs getting better, but about search getting worse, in no small part thanks to LLMs polluting the results. Do an image search for a few terms and count how many results are AI-generated.

    I can say I got better results from Google X years ago than from Google today.

    • wizzard0 2 days ago |
      Google gets money from showing you ads, not because you pay them for quality search results.

      When you have to come back over and over, and visit more pages to finally find what you needed, they get much more cash from advertisers than when you get everything instantly.

  • EGreg 3 days ago |
    Can’t we just use test-driven development with AI Agents?

    1) Idea

    2) Tests

    3) Code until all tests pass
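
    A minimal sketch of what that loop could look like, assuming the tests are written first and treated as the fixed spec. The `fake_agent_output` list is a hypothetical stand-in for an agent; a real setup would replace it with model calls that are shown the failing test output. None of this comes from the article.

```python
from typing import Callable, Iterable, Optional, Tuple

# A check is a function that raises AssertionError when the candidate is wrong.
Check = Callable[[Callable[[int, int], int]], None]


def check_adds_small_numbers(add) -> None:
    assert add(2, 3) == 5


def check_handles_negatives(add) -> None:
    assert add(-2, 3) == 1


def passes_all(candidate, checks: Iterable[Check]) -> bool:
    """Return True only if every check accepts the candidate."""
    for check in checks:
        try:
            check(candidate)
        except Exception:
            return False
    return True


def code_until_green(candidates, checks, max_attempts: int = 5) -> Optional[Tuple[int, Callable]]:
    """Step 3: try generated candidates until one passes every check."""
    for attempt, candidate in enumerate(candidates, start=1):
        if attempt > max_attempts:
            break
        if passes_all(candidate, checks):
            return attempt, candidate
    return None  # Out of attempts: a human has to step in.


# Hypothetical agent output, in the order the "agent" produced it.
fake_agent_output = [
    lambda a, b: a * b,       # fails the first check
    lambda a, b: abs(a) + b,  # passes the first check, fails the second
    lambda a, b: a + b,       # passes both
]

CHECKS = [check_adds_small_numbers, check_handles_negatives]
result = code_until_green(fake_agent_output, CHECKS)
```

    The attempt cap matters: without it, "code until all tests pass" can loop forever on a spec the generator cannot satisfy. The loop also only proves the code satisfies the tests, so the quality of step 2 bounds the quality of step 3.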

  • ianpurton 3 days ago |
    I've been coding professionally for 30 years.

    I'm probably in the same place as the author, using ChatGPT to create functions etc., then cutting and pasting them into VSCode.

    I've started using cline which allows me to code using prompts inside VSCode.

    e.g. "Create a new page so that users can add tasks to a tasks table."

    I'm getting mixed results, but it is very promising. I create a clinerules file which gets added to the system prompt so the AI is more aware of my architecture. I'm also looking at overriding the cline system prompt, both to make it fit my architecture better and to remove stuff I don't need.

    I jokingly imagine in the future we won't get asked how long a new feature will take, rather, how many tokens will it take.

    • thomasfromcdnjs 3 days ago |
      Love the token joke!
  • assimpleaspossi 3 days ago |
    Since all these AI products just put together things they pull from elsewhere, I'm wondering if, eventually, there could be legal issues involving software products put together using such things.
  • sublimefire 3 days ago |
    I've been doing that for a while as well and mostly agree. One thing I find useful, though, is building local infrastructure to collect useful prompts and to work with files and URLs. The web interface alone is limiting.

    I like gptresearcher and all of the glue put in place to be able to extend prompts and agents etc. Not to mention the ability to fetch resources from the web and do research type summaries on it.

    All in all it reminds me of the work of security researchers, pentesters and analysts. Throughout their careers they build a set of tools and scripts to solve various problems. LLMs kind of force devs to create/select tools for themselves to ease the burden of their specific line of work as well. You could work without LLMs, but maybe it will be a bit more difficult to stand out in the future.

  • denvermullets 3 days ago |
    this is almost exactly how ive been using llms. i dont like the code complete in the ide, personally, and prefer all llm usage to be narrow specific blocks of code. it helps as i bounce between a lot of side projects, projects at work, and freelance projects. not to mention with context switching it really helps keep things moving, imo
  • owebmaster 3 days ago |
    I thought his project, sketch.dev, was of very poor quality. I wouldn't ship something like this: the auth process is awful and broken, and I still can't log in. If the service is still hugged to death 14 hours after the post, it also means the scalability of the app is bad. If we are going to use LLMs to replace hours of programming, we should aim for quality too.
    • lm28469 3 days ago |
      It's really bad, much less useful than even the first public version of chatgpt. Even once you manage to log in, most of the time it doesn't even give something that compiles, it calls functions/variables which don't exist. The first line of the main had 2 errors...
      • owebmaster 8 hours ago |
        I finally managed to log in, and I wish I had not. Honestly, it is pathetic that something like this has so many positive comments. I guess most people commenting didn't even try sketch.dev
  • cratermoon 3 days ago |
    But the question must be asked: At what cost?

    Are the results a paradigm shift so much better that it's worth the hundreds of billions sunk into the hardware and data centers? Is spicy autocomplete worth the equivalent of flying from New York to London while guzzling thousands of liters of water?

    It might work, for some definition of useful, but what happens when the AI companies try to claw back some of that half a trillion dollars they burnt?

    • ryanobjc 2 days ago |
      That's why open research (which "open" ai has never really contributed to!) and foundational models that everyone can contribute to are essential.

      This stuff is a pretty neat magical evolution and it should not be the domain of any single company.

      Also a lot of the hardware and so on has been or is being paid for. AWS, gcloud, etc. aren't taking massive losses on their H100 and other compute services. This bubble is ultimately no different from any prior bubble, and bankruptcy will recycle useful assets into new companies and new purposes.

      Which, btw, is why the US is still a huge winner and will continue to be: robust and functioning bankruptcy laws and courts.

  • nunez 3 days ago |
    I definitely respect David's opinion given his caliber, but pieces like this make me feel strange that I just don't have a burning desire to use them.

    Like, yesterday I made some light changes to a containerized VPN proxy that I maintain. My first thought wasn't "how would Claude do this?" Same thing with an API I made a few weeks ago that scrapes a flight data website to summarize flights in JSON form.

    I knew I would need to write some boilerplate and that I'd have to visit SO for some stuff, but asking Claude or o1 to write the tests or boilerplate for me wasn't something I wanted or needed to do. I guess it makes me slower, sure, but I actually enjoy the process of making the software end to end.

    Then again, I do all of my programming in Vim and, technically, writing software isn't my day job (I'm in pre-sales, so, best case, I'm writing POC stuff). Perhaps I'd feel differently if I were doing this day in, day out. (Interestingly, I feel the same way about AI in this sense that I do about VSCode. I've used it; I know what it's capable of; I have no interest in it at all.)

    The closest I got to "I'll use LLMs for something real" was using it in my backend app that tracks all of my expenses to parse pictures of receipts. Theoretically, this will save me 30 seconds per scan, as I won't need to add all of the transaction metadata myself. Realistically, this would (a) make my review process slower, as LLMs are not yet capable of saying "I'm not sure" and I'd have to manually check each transaction at review time, (b) make my submit API endpoint slower since it takes relatively-forever for it to analyze images (or at least it did when I experimented with this on GPT4-turbo last year), and (c) drive my costs way up (this service costs almost nothing to run, as I run it within Lambda's free tier limit).

    • uludag 3 days ago |
      I think there's a big selection bias on hackernews that you wouldn't get elsewhere. There are still "elite" software developers I see who really aren't into the whole LLM tooling space. I found use in the autocomplete and search workflows that the author mentioned, but I stopped using these tools out of curiosity about how things were before. It turns out I don't need them to be productive, and I too probably enjoy working more without them.
    • ge96 2 days ago |
      I'm an avg dev, I was never into LLMs/co-pilot etc mocking prompt engineering but... my current job is working with an LLM framework so idk... future proofs me I guess. I do like computer vision and ML on dataset eg. training hand writing IMU by gestures that's cool.

      The embeddings I feel like there is something there even if it doesn't actually understand. My journey has just begun.

      I scoff every time someone says "this + AI". AI is this thing they just throw in there. The last time I didn't want to work with some tech I quit my job, which was not a good move while not being financially independent. Anyway, yeah, I'll keep digging into this. I still don't use co-pilot right now, but I'm reading up more on the embedding stuff for cross training or some use case like RAG.

  • 999900000999 2 days ago |
    I still find most LLMs to be extremely poor programmers.

    Claude will often generate tons and tons of useless code, quickly using up its limit. I often find myself yelling at it to stop.

    I was just working with it last night.

    "Hi Claude, can you add tabs here.": <div>

    <MainContent/>

    </div>

    Claude will then start generating MainContent.

    DeepSeek, despite being free does a much better job than Claude. I don't know if it's smarter, but whatever internal logic it has is much more to the point.

    Claude also has a very weird bias towards a handful of UI libraries that it has installed, even if those wouldn't be good for your project. I wasted hours on shadcn/ui, which requires a very particular setup to work.

    LLMs are generally great at common tasks using a top-5 (by popularity) language.

    Ask it to do something in a Haxe UI library and it'll make up functions that *look* correct.

    Overall I like them, they definitely speed things up. I don't think most experienced software engineers have much to worry about for now. But I am really worried about juniors. Why hire a junior engineer when you can just tell your seniors they need to use Copilot to crank out more code?

    • joseda-hg 2 days ago |
      Assuming I know roughly what it will generate, I usually prepend my chats with provisions against this kind of thing

      "Add tabs here, assume the rest of the page will work with no further modification, limit your changes so that any existing code keeps working"

      I also do stuff like "Project is using {X} libraries, keep dependencies minimal

      Generate a method that takes {Z} parameters, returns {Y}, and uses {A}, {B} and {C} to do {thing}"

      I'll add stuff like language version, frameworks or specific requests on top of this, but then I just reuse the setup, so I like to keep the first message with as much context as possible, ideally separating project context from the specific request
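
      To make the "project context first, specific request last" split concrete, here is a rough sketch of a reusable prompt builder. The function name, the section labels and the example values are all made up for illustration; only the constraint wording is taken from the prompts quoted above.

```python
def build_prompt(project_context: dict, request: str) -> str:
    """Put reusable project context first and the one-off request last."""
    lines = ["Project context:"]
    for key, value in project_context.items():
        lines.append(f"- {key}: {value}")
    lines.append("")
    lines.append("Constraints:")
    lines.append("- Assume the rest of the page will work with no further modification.")
    lines.append("- Limit your changes so that any existing code keeps working.")
    lines.append("- Keep dependencies minimal.")
    lines.append("")
    lines.append("Request:")
    lines.append(request)
    return "\n".join(lines)


# Hypothetical project details; swap in your own language version and libraries.
context = {"Language": "TypeScript 5", "Libraries": "React"}
prompt = build_prompt(context, "Add tabs here.")
```

      Keeping the context block identical between chats means only the last line changes per request, which is what makes the setup reusable.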

  • btbuildem 2 days ago |
    The search part really resonates with me. I do a lot of odd/unusual/one-off things for my side projects, and I use LLMs extensively in helping me find a path forward. It's like an infinitely patient, all-knowing expert that pulls together info from any and all domains. Sometimes it will have answers that I am unable to find any other way (eg, what's the difference between "busy s..." and "busy p..." AT command response on the esp8285?). It saves me hours of struggle, and I would not want to go back to the old ways.
  • fassssst 2 days ago |
    They’re pretty great for printf debugging. Yesterday I was confounded by a bug so I rapidly added a ton of logging that the LLM wrote instantly, then I had the LLM analyze the state difference between the repro and non repro logs. It found something instantly that it would have taken me a few hours to find, which led me to a fix.
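
    For the simplest version of that "state difference" step, a deterministic comparison can be sketched as below. This is not what the commenter ran (they had the LLM do the analysis); it assumes a hypothetical `key=value` log format, and the log lines are invented.

```python
def final_state(log_lines):
    """Collect the last value logged for each key from 'key=value' lines."""
    state = {}
    for line in log_lines:
        key, sep, value = line.strip().partition("=")
        if sep:  # Skip lines that are not in key=value form.
            state[key] = value
    return state


def state_diff(repro_lines, ok_lines):
    """Return {key: (repro_value, ok_value)} for keys whose values differ."""
    repro, ok = final_state(repro_lines), final_state(ok_lines)
    return {
        key: (repro.get(key), ok.get(key))
        for key in sorted(set(repro) | set(ok))
        if repro.get(key) != ok.get(key)
    }


# Made-up logs from a failing (repro) run and a passing run.
repro_log = ["user=42", "cache=warm", "retries=3"]
ok_log = ["user=42", "cache=cold", "retries=3"]
diff = state_diff(repro_log, ok_log)
```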
  • hansvm 2 days ago |
    That quartile reservoir sampler example is ... intriguing?

    My experience with LLM code is that it can't come up with anything even remotely novel. If I say "make it run in amortized O(1)" then 99 times out of 100 I'll get a solution so wildly incorrect (but confidently asserting its own correctness) that it can't possibly be reshaped into something reasonable without a re-write. The remaining 1/100 times aren't usually "good" either.

    For the reservoir sampler -- here, it did do the job. David almost certainly knows enough to know the limits of that code and is happy with its limitations. I've solved that particular problem at $WORK though (reservoir sampling for percentile estimates), and for the life of me I can't find a single LLM prompt or sequence of prompts that comes anywhere close to optimality unless that prompt also includes the sorts of insights which lead to an amortized O(1) algorithm being possible (and, even then, you still have to re-run the query many times to get a useful response).

    Picking on the article's solution a bit, why on earth is `sorted` appearing in the quantile estimation phase? That's fine if you're only using the data structure once (init -> finalize), but it's uselessly slow otherwise, even ignoring splay trees or anything else you could use to speed up the final inference further.

    I personally find LLMs helpful for development when either (1) you can tolerate those sorts of mishaps (e.g., I just want to run a certain algorithm through Scala and don't really care how slow it is if I can run it once and hexedit the output), or (2) you can supply all the auxiliary information so that the LLM has a decent chance of doing it right -- once you've solved the hard problems, the LLM can often get the boilerplate correct when framing and encapsulating your ideas.
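
    To illustrate the `sorted` complaint only: a sketch of a reservoir sampler (standard Algorithm R) that caches its sorted view, so repeated quantile queries between updates do not re-sort. This is neither the article's code nor the amortized O(1) approach alluded to above; the class and method names are invented for the example.

```python
import random


class QuantileReservoir:
    """Uniform reservoir sample (Algorithm R) with a cached sorted view."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.samples = []
        self.seen = 0
        self._rng = random.Random(seed)
        self._sorted = None  # None means the cache is stale.

    def add(self, value: float) -> None:
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(value)
            self._sorted = None
        else:
            # Keep the new value with probability capacity / seen.
            j = self._rng.randrange(self.seen)
            if j < self.capacity:
                self.samples[j] = value
                self._sorted = None

    def quantile(self, q: float) -> float:
        if not self.samples:
            raise ValueError("no samples yet")
        if self._sorted is None:
            # Sort only when the reservoir changed since the last query.
            self._sorted = sorted(self.samples)
        index = min(int(q * len(self._sorted)), len(self._sorted) - 1)
        return self._sorted[index]


reservoir = QuantileReservoir(capacity=100)
for x in range(100):
    reservoir.add(float(x))
median = reservoir.quantile(0.5)
```

    Because replacements get rarer as more values are seen, the cache is invalidated less and less often on long streams. A query right after a replacement still pays for a full sort, which is the cost a smarter structure would avoid.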

  • LouisSayers 2 days ago |
    The use of LLMs reminds me a bit of how people use search engines.

    Some years ago I gave a task to some of my younger (but intelligent) coworkers.

    They spent about 50 minutes searching in google and came back to me saying they couldn't find what they were looking for.

    I then typed in a query, clicked one of the first search results and BAM! - there was the information they were unable to find.

    What was the difference? It was the keywords / phrases we were using.

  • highfrequency 2 days ago |
    > A lot of the value I personally get out of chat-driven programming is I reach a point in the day when I know what needs to be written, I can describe it, but I don’t have the energy to create a new file, start typing, then start looking up the libraries I need... LLMs perform that service for me in programming. They give me a first draft, with some good ideas, with several of the dependencies I need, and often some mistakes. Often, I find fixing those mistakes is a lot easier than starting from scratch.

    This to me is the biggest advantage of LLMs. They dramatically reduce the activation energy of doing something you are unfamiliar with. Much in the way that you're a lot more likely to try kitesurfing if you are at the beach standing next to a kitesurfing instructor.

    While LLMs may not yet have human-level depth, it's clear that they already have vastly superhuman breadth. You can argue about the current level of expertise (does it have undergrad knowledge in every field? PhD level knowledge in every field?) but you can't argue about the breadth of fields, nor that the level of expertise improves every year.

    My guess is that the programmers who find LLMs useful are people who do a lot of different kinds of programming every week (and thus are constantly going from incompetent to competent in things that other people already know), rather than domain experts who do the same kind of narrow and specialized work every day.

    • otteromkram 2 days ago |
      I think your biggest takeaway should be that the person writing the blog post is extremely well-versed in programming and has labored over code for hours, along with writing tests, debugging, etc. He knows what he would like because it's second nature. He was able to get the best from the LLM because his vision of what the code should look like helped craft a solid prompt.

      People newer to programming might not have as good a time, because they may skip actually learning some fundamentals and rely on LLMs as a crutch. Nothing wrong with that, I suppose, but there might come a point when everything goes up in smoke and the LLM is out of answers.

      No amount of italic font is going to change that.

      • highfrequency 2 days ago |
        My experience is the opposite - I get the most value out of LLMs for topics that I have less expertise in. It’s become vastly easier to get up to speed in a new field because you can immediately get basic questions answered, have the holes in your understanding pointed out, and be directed to the concepts you are missing.
  • charlieyu1 2 days ago |
    I’m a hobby programmer who never worked a programming job. Last week I was bored, I asked o1 to help me to write a Solitaire card game using React because I’m very rusty with web development.

    The first few steps were great. It guided me through installing things and setting up a project structure. The model even generated code for a few files.

    Then something went wrong: the model kept telling me what to do in vague terms, but didn’t output code anymore. So I asked for further help, and now it started contradicting itself, rewriting business logic that was implemented in the first response, giving 3-4 incompatible snippets for the same file, etc., and it all fell apart.

    • jarsin 2 days ago |
      My first program ever was a Windows calculator. My roommates would sit down and find bugs after I thought I had perfected it. I learned so much spending weeks trying to get that damn thing working.

      I'm not too optimistic about the future of software development if juniors are turning to AI to do those early projects for them.

    • mocamoca 2 days ago |
      LLM contexts overload quickly, as the article states. That's why he writes smaller, specific packages, one at a time, and uses a web UI instead of something like Cursor.

      I had the same issue as you a few days ago. Separating the problem into smaller parts and addressing each part one by one made it easier.

      In your specific case I would try to fully complete the business logic side first. Reset the context. Then provide the logic to a new context and ask for an interface. Difficulty will arise when you discover that the logic is wrong or not suited to the UI, but I would keep using the same process to edit the code. Maybe two different contexts, one for logic, one for UI?

      How did you do?

    • cpursley 2 days ago |
      Yeah, you wanna use Claude for code. That's the problem. Try Cursor or Bolt.
  • aerhardt 2 days ago |
    His experience mirrors mine. I'm happy he explicitly mentions search, when people have been shouting "this is not meant for search" for a couple years now. Of course it helps with search. I also love the tech for producing first drafts, and it greatly lowers the energy and cognitive load when attacking new tasks, like others are repeating on this thread.

    I think at the same time, while the author says this is the second most impressive technology he's seen in his lifetime, it's still a far cry from the bombastic claims being made by the titans of industry regarding its potential. Not uncommon to see claims here on HN of 10x improvements in productivity, or teams of dozens of people being axed, but nothing in the article or in my experience lines up with that.

  • jordanmorgan10 2 days ago |
    The more experienced the engineer, the less CSS is on the page. This seems to be a universal truth. I want to learn from these people, but my goodness, could we at least use margins to center content?
  • dboreham 2 days ago |
    Interesting that he had the same thought initially as I did (after running a model myself on my own hardware) : this is like the first time I ran a traceroute across the planet.
  • ryanobjc 2 days ago |
    I have been getting more value out of LLMs recently, and the great irony is it is because of a few different packages in emacs and the wonderful CLI LLM chat programming tool 'aider'.

    My workflow puts LLM chat at my fingertips, and I can control the context. Pretty much any text in emacs can be sent to a LLM of your choice via API.

    Aider is even better, it does a bunch of tricks to improve performance, and is rapidly becoming a 'must have' benchmark for LLM coding. It integrates with git so each chat modification becomes a new git commit. Easy to undo changes, redo changes, etc. It also has a bunch of hacks because while o1 is good at reasoning, it (apparently) doesn't do code modification well. Aider will send different types of requests to different 'strengths' of LLMs etc. Although if you can use sonnet, you can just use that and be done with it.

    It's pretty good, but ultimately it's still just a tool for transforming words into code. It won't help you think or understand.

    I feel bad for new kids who won't develop muscle and sight strength to read/write code. Because you still need to read/write code, and can't rely on the chat interface for everything.

  • Balgair 2 days ago |
    I'm not a 'programmer'. At best, I'm a hacker. I don't work in a team. All my code is mostly one time usage to just get some little thing done, sometimes a bit of personal stuff too. I mostly use Excel anyways, and then python, and even then, I hate python because half the time I'm just dealing with library issues (not a joke, I measured it (and, no, I'm not learning another language, but thank you)). I'm in biotech, a very non code-y section of it too.

    LLMs are just a life saver. Literally.

    They take my code time down from weeks to an afternoon, sometimes less. And they're kind.

    I'm trying to write a baseball simulator on my own, as a stretch goal. I'm writing my own functions now, a step up for me. The code is to take in real stats, do Monte Carlo, get results. Basic stuff. Such a task was impossible for me before LLMs. I've tried it a few times. No go. Now with LLMs, I've got the skeleton working and should be good to go before opening day. I'm hoping that I can use it for some novels that I am writing to get more realistic stats (don't ask).
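
    For anyone wondering what "take in real stats, do Monte Carlo, get results" means at its smallest, here is a toy sketch: it treats each at-bat as an independent coin flip weighted by batting average and simulates many seasons. The function name and the numbers are invented for illustration; a real simulator would model far more than hits.

```python
import random


def simulate_hits(batting_average: float, at_bats: int, trials: int, seed: int = 0):
    """Monte Carlo: simulate `trials` seasons and return the hit total of each."""
    rng = random.Random(seed)  # Seeded so reruns give the same totals.
    totals = []
    for _ in range(trials):
        hits = sum(1 for _ in range(at_bats) if rng.random() < batting_average)
        totals.append(hits)
    return totals


# Made-up player: a .300 hitter with 500 at-bats, simulated over 200 seasons.
totals = simulate_hits(batting_average=0.300, at_bats=500, trials=200)
mean_hits = sum(totals) / len(totals)
```

    The spread of `totals` (not just the mean) is the useful output: it shows how much a season can vary by luck alone.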

    I know a lot of HN is very dismissive of LLMs as code help. But to me, a non programmer, they've opened it up. I can do things I never imagined that I could. Is it prod ready? Hell no, please God no. But is it good enough for me to putz with and get just working? Absolutely.

    I've downloaded a bunch of free ones from huggingface and Meta just to be sure they can't take them away from me. I'm never going back to that frustration, that 'Why can't I just be not so stupid?', that self-hating, that darkness. They have liberated me.

  • averus 2 days ago |
    I think the author is really on the right path with his vision for LLMs as tool for software development. Last week I tried probably all of them with something like a code challenge.

    I have to say that I am impressed with sketch.dev: it got me a working example on the first try, and it looked cleaner than all the others, similar but cleaner somehow in terms of styling.

    The whole time I was using those tools I was thinking that I want exactly this: an LLM trained specifically on the official Go documentation, or that of whatever your favourite language is, ideally fine-tuned by the maintainers of the language.

    I want the LLM to show me an idiomatic way to write an API using the standard library. I don't necessarily want it to do it instead of me, or to be trained on all of the data they could scrape. Show me a couple of examples, maybe explain a concept, give me step-by-step guidance.

    I also share his frustrations with the chat-based approach. What annoys me personally the most is the anthropomorphization of the LLMs; yesterday Gemini was even patronizing me...

  • theptip a day ago |
    This lines up well with my experience. I’ve tried coming at things from the IDE and chat side, and I think we need to merge tooling more to find the sweet spot. Claude is amazing at building small SPAs, and then you hit the context window cutoff and can’t do anything except copy your file out. I suspect IDEs will figure this out before Claude/ChatGPT learn to be good enough at the things folks need from IDEs. But long-term, I suppose you don’t want to have to drop down to code at all, and so the constraints of chat might force the exploration of the new paradigm more aggressively.

    Hot take of the day, I think making tests and refactors easier is going to be revolutionary for code quality.