I find chat for search is really helpful (as the article states)
But "writing a wrapper" is (presumably) a process you're familiar with, you can tell if it's going off the rails.
What's way more likely to know the best practices is the documentation. A few months ago there was a post that made the rounds about how the Arc browser introduced a really severe security flaw by misconfiguring their Firebase ACLs despite the fact that the correct way to configure them is outlined in the docs.
This to me is the sort of thing (although maybe not necessarily in this case) that comes out of LLM programming. 90% isn't good enough; it's the same as Stack Overflow pasting. If you're a serious engineer and you are unsure about something, it is your task to go to the reference material, or you're at some point introducing bugs like this.
In our profession it's not just crypto libraries, one misconfigured line in a yaml file can mean causing millions of dollars of damage or leaking people's most private information. That can't be tackled with a black box chatbot that may or may not be accurate.
you're equating "unfamiliar" with "don't know how to do", but I will claim you do know how to do it; you would just be slow because you have to reference documentation and learn which functions do what.
You can give them more latitude for things you know how to check.
I didn't know how to set up the right gnarly TypeScript generic type to solve my problem, but I could easily verify it's correct.
More specifically, I didn't know how to solve it, though obviously I could have spent much more time and learned. There were only a small number of possible cases, but I needed certain ones to work and others not to. I was easily able to create the examples but not find the solution. By looping through Claude I could solve it in a few minutes. I then got an explanation, could read the relevant docs, and feel satisfied that not only did everything pass the automated checks but my own reasoning as well.
If you are lucky to have the LLM fix it for you, great. If you don't know how to fix it yourself and the LLM doesn't either, you've just wasted a lot of time.
> If you merely know how to check, would you also know how to fix it after you find that it's wrong?
Probably? I'm capable of reading documentation, learning and asking others.
> If you don't know how to fix it yourself and the LLM doesn't either, you've just wasted a lot of time.
You may be surprised by how little time, but regardless it would have taken more time to hit that point without the tool.
Also sometimes things don't work out, that's OK. As long as overall it improves work, that's all we need.
Indeed getting good at writing code using LLMs demands being very good at reading code.
To that extent it's more like blitz chess than autocomplete. You need to think and verify in trees as it goes.
I use chat for things I don't know how to do all the time. I might not know how to do it, but I sure know how to test that what I'm being told is correct. And as long as it's not, I iterate with the chat bot.
I suppose we could ask the question: Are LLMs better at writing secure code than humans? I'll admit I don't know the answer to that, but given what we know so far, I seriously doubt it.
The issue is, there are always subtle aspects to problems that most developers only know by instinct. Like, "how is it doing the unicode conversion here" or "what about the case when the buffer is exactly the same size as the message, is there room for the terminating character?". You need the instincts for these to properly construct tests and review the code it produced. If you do have those instincts, I argue you could write the code yourself; it's just a lot of effort. But if you don't, I will argue you can't test it either and can't use LLMs to produce (at least) professional-level code.
Learning how something works is critical or it's far worse than technical debt.
This means it's okay to use an LLM to try something new that you're on the fence about. Once you've learned that concept or idea, you can go ahead and use the same code if it's good enough.
(Which goes for StackOverflow, etc.)
I'm still learning where it's usable and where I'm over-reaching. At present I'm at about break-even on time spent, which bodes well for the next few years as they iron out some of the more obvious issues.
Not really. I often use chat to understand codebases. Instead of trying to navigate mature, large-ish FOSS projects (like, say, the Android Run Time) by looking at them file by file, method by method, field by field (all too laborious), I just ask ... Copilot. It is way, way faster than I am and is mostly directionally correct with its answers.
Having an LLM do something for you that you don't know how to do is asking for trouble. An expert can likely offload a few things that aren't all that important, but any junior is going to dig themselves into a significant hole with this technique.
But asking an LLM to help you learn how to do something is often an option. Can't one just learn it using other resources? Of course. LLMs shouldn't be a must have. If at any point you have to depend upon the LLM, that is a red flag. It should be a possible tool, used when it saves time, but swapped for other options when they make sense.
For example, I had a library I was new to and asked copilot how to do some specific task. It gave me the options. I used this output to go to google and find the matching documentation and gave it a read. I then went back to copilot and wrote up my understanding of what the documentation said and checked to see if copilot had anything to add.
Could I have just read the entire documentation? That is an option, but one that costs more time to give deeper expertise. Sometimes that is the option to go with, but in this case having a more shallow knowledge to get a proof of concept thrown together fit my situation better.
Anyone just copying an AI's output and putting it in a PR without understanding what it does? That's asking for trouble and it will come back to bite them.
or writing tests - that's ... not so helpful. worst is when a lazy dev takes the generated tests and leaves it at that: usually just a few placeholders that test the happy path but ignore obvious corner cases. (I suppose for API tests that comes down to adding test case parameters)
but chatting about a large codebase, I've been amazed at how helpful it can be.
what software patterns can you see in this repo? how does the implementation compare to others in the organisation? what common features of the pattern are missing?
also, like a linter on steroids, chat can help explore how my project might be refactored to better match the organisation's coding style.
https://aider.chat/docs/repomap.html
Aider hosts a leaderboard that rates LLMs on performance, including a section on refactoring.
It feels like the IDE needs a new mode to deal with this state, and that SCM needs to be involved somehow too. Somehow help the developer guide this somewhat flaky stream of edits and sculpt it into a good changeset.
Properly using them requires understanding that. And just like we understand every query won’t find what we want, neither will every prompt. Iterative refinement is virtually required for nontrivial cases. Automating that process, like eg cursor agent, is very promising.
This is the wrong take. Search tools are deterministic unless you purposely inject random weights into the ranking. With search tools, the same search query will always yield the same search result, provided they are designed to be deterministic and/or the underlying data has not changed.
With LLMs, I can ask the exact same question and get a different response, even if the data has not changed.
I agree that LLMs are not search tools, but for very different reasons.
Also what's with using "semantics" as a dismissal when the technology we're talking about is the most semantically relevant search ever made.
Non-deterministic hardware: All LLMs mentioned that modern computing hardware, such as GPUs or TPUs, can introduce non-determinism due to factors like parallel processing, caching, or numerical instability. This can make it challenging to achieve determinism, even with fixed random seeds or deterministic algorithms.
You can find the summary of my chats https://beta.gitsense.com/?chat=1c3e69f9-7b8b-48a3-8b99-bb1b.... If you scroll to the top and click on the "Conversation" link in the first message, you can read the individual responses.
Fundamentally, no they're not. That is why you have cases like the Air Canada chatbot that told a user about a refund opportunity that didn't exist, or the lawyer in Mata v Avianca who cited a case that didn't exist. If you ask an LLM to search for something that doesn't exist, there's a decent chance it will hallucinate something into existence for you.
What LLMs are good at is effectively turning fuzzy search terms into non-fuzzy terms; they're also pretty good at taking some text and recasting it into an extremely formulaic paradigm. In other words, turning unstructured text into something structured. The problem they have is that they don't have enough understanding of the world to do something useful with that structured representation when it needs to be accurate.
OpenAI poisoned the well badly with their "we train off your chats" nonsense.
If you are using any API service, or any enterprise ChatGPT plan, your tokens are not being logged and recycled into new training data.
As for why trust them? Like the parent said: EULAs. Large companies trust EULAs and terms of service for every single SAAS product they use, and they use tons and tons of them.
OpenAI, in a clumsy attempt to create a regulatory moat by doing sketchy shit and waving wild "AI will kill us all" nonsense, has created a situation where the usefulness of these transformative generative solutions is automatically rejected by many.
Even beyond code generation, even smaller models are useful in programming just to weigh different design approaches, etc.
I'm actually sure that there are companies for which these scenarios are very real. But I don't think there's a lot of them. Most of the code our industry works on has very little value outside of context of particular product and company.
We're making an industrial sorting machine. Our management is scared to death of losing the source code. But realistically, who's going to put in the time to fully understand a codebase we can barely grasp ourselves? Then get rid of all the custom sensor mappings, paths and other stuff specific to us. And then develop on it further, assuming they even believe we have the "right" way of doing things?
Right, no one. 90% of companies could open source their stuff and, apart from legal nonsense, nothing practical will happen, no one will read the code.
Companies in other legal jurisdictions can and will steal IP with near impunity and throw new AI tools at it to quickly gather an understanding of the codebase. Furthermore, knowledge of the source provides a roadmap to attack vectors for security violations. It seems foolish to dismiss the risks of losing control of source code.
Thought exercise: what would seriously happen if you did let some of your proprietary code outside your network? Oddly enough, 75% of the people writing code on HN probably have their company's code stored in GitHub. So there already is an inherent trust factor with GH/MSFT.
As another anecdote - Twitch's source code got leaked a few years back. Did Twitch lose business because of it?
Lawsuits? Lawful terminations? Financial damages?
Other risks include leaking industrial secrets that may significantly damage company business or benefit competitors.
I don't mean to dismiss your concerns - in your situation, they are probably warranted - I just wanted to say that they are unique and not necessarily shared by people who don't share your circumstances.
That says more about those people than about your/OP's code :)
Personally, I had a few collisions with regulation and compliance over the years, so I can appreciate the completely different mindset you need when working with them. On the other hand, at my current position, not only do we have everything on Github, but there were also instances where I was tasked with mirroring everything to bitbucket! (For code escrow... i.e., if we go out of business, our customer will get access to the mirrored code.)
> people commenting here don't necessarily represent a valid sample.
Right. I should have said that you're in the minority here. I'm not sure what the ratio of dumb CRUD apps to "serious business" development is in the wild. I know there are whole programming subfields where your kinds of concerns are typical. They might just be underrepresented here.
Still, I believe hosting is somewhat different, if only because it's established: known players, trusted practices. AI is new, contracts are still getting refined, players are still making their name, companies are moving fast, and I doubt data protection is their priority.
I may be wrong, but I think it's reasonable for IT departments to be at least prudent towards these frameworks. Search is OK, chat is OK-ish; crawling whole projects for autocompletion is something I'd be more careful with.
I've done 800+ tech diligence projects and have first hand knowledge of every single one's use of VCS. At least 95% of the codebases are stored on a cloud hosted VCS. It's absolutely a minority to host your own VCS.
So you're basing your whole argument on nothing other than "I just don't feel like they do that".
Does this look unserious to you? https://trust.openai.com/
I think many people over-value this giant pile of text. That's not to say IP theft doesn't exist, but I think the actual risk is often overblown. Most of an organization's value is in the team's collective knowledge and teamwork ability, not in the source code.
Isn't that what we do with operating systems, internet providers, &c. ?
I, for one, work every day with plenty of proprietary vendor code under very restrictive NDAs. I don't think they would be very happy knowing I let AIs crawl our whole code base and send it to remote language models just to have fancy autocompletion.
The npm concern, though, suggests we likely work in very different industries, so that may explain the different perspective.
Isn't this... github?
Companies and people are doing this all day every day. LLM APIs are really no different. They only seem different when you magic it up as "the AI is doing thinking" ... but in reality it's text -> tokens -> math -> tokens -> text. It's a transformation of numbers into other numbers.
The EULAs and ToS say they don't log or retain information from API requests. This is really no different than Google Drive, Atlassian Cloud, Github, and any number of online services that people store valuable IP and proprietary business and code in.
You can ask a human to not do that, and there are various risks to them personally if they do so regardless. I'd like to see the AI providers take on some similar risks instead of disclaiming them in their EULAs before I trust them the way I might a human.
Using it to generate blocks of code in a chat like manner in my opinion just never works well enough in the domains I use it on. I'll try to get it to generate something and then realize when I get some functional result I could've done it faster and more effectively.
Funny enough, other commenters here hate autocomplete but love chat.
It really feels like we’re at the ARPANET stage where there’s so much obvious low-hanging fruit, it’s just going to take companies a while to perfect it.
It’s like having to delete the auto-closed parenthesis more often than not.
Gmail autocomplete saves me maybe 2-5s per email: the recipients name, a comma, and a sign off. Maybe a quarter or half sentence here or there, but never exactly what I would’ve typed.
In code bases, I’ve never seen the appeal. It’s only reliably good at stuff that I can easily find on Google. The savings are inconsequential at best, and negative at worst when it introduces hard-to-pinpoint bugs.
LLMs are incredible technology, but when applied to code, they act more like non-deterministic macros.
It probably saved me 40 mins, then proceeded to waste 2 hours of me hunting for that issue. I'm probably at the break-even on the whole. The ultimate promise is very compelling, but my current use isn't particularly amazing. I do use a niche language though, so I'm outside the global optima.
My experiences with ChatGPT and Gemini have included lots of confident but wrong answers, e.g. “What castle was built at the highest altitude?”. That’s what gives me pause.
Gemini spits out a great 2D A* implementation no problem. That is awesome. Actually, contrary to my original comment, I probably will use AI for that sort of thing going forward.
Despite that, I don’t want it in my IDE. Maybe I’m just a bit of a Luddite.
For context, very often I have to put some comment before the line for completion to set an expectation context.
Instead, the editor should allow me to influence completion with some kind of in-place suggestion input available under a keyboard shortcut. Then I could type what I want into such an input, and when I hit Enter or Tab the completion proposal appears. Even better if it would let me undo/modify such input, and have shortcuts like "show me a different option" and "go back to the previous one".
Perhaps I'm just an old man telling the LLM to get off my lawn, but I find it does bad things to my ability to concentrate on hard things.
Having a good sense of when it would be useful, and invoking it on demand seems to be a decent enough middle ground for me. Much of it boils down to UX - if it could be present but not actively distracting, I'd probably be ok with it.
My guess is that many devs who don't like LLM autocomplete, are just unlucky to use a suboptimal UI. As an example, I personally don't understand how some people could like autocomplete in Visual Studio. As you said, it's just too distracting and irritating.
BTW, I use Codeium, not Copilot. But I guess they should have the same autocomplete UI which depends more on IDE than LLM.
Most editors I use support online LLMs, but they're sometimes too slow for me.
I frequently use what OP refers to as chat-driven programming, and I find it incredibly useful. My process starts by explaining a minimum viable product to the chat, which then generates the code for me. Sometimes, the code requires a bit of manual tweaking, but it’s usually a solid starting point. From there, I describe each new feature I want to add—often pasting in specific functions for the chat to modify or expand.
This approach significantly boosts what I can get done in one coding session. I can take an idea and turn it into something functional on the same day. It allows me to quickly test all my ideas, and if one doesn’t help as expected, I haven’t wasted much time or effort.
The biggest downside, however, is the rapid accumulation of technical debt. The code can get messy quickly. There's often a lot of redundancy and after a few iterations it can be quite daunting to modify.
I foresee in the future an LLM that has sufficient context length for (automatic) refactoring and tech debt removal, by pasting large portions of the existing code in.
https://www.jetbrains.com/help/resharper/Refactorings__Index...
I don’t see any reason it couldn’t do more aggressive refactors with LLMs and either correct itself or skip the refactor if it fails static code checking. Visual Studio can already do real-time type checking for compile-time errors.
What stops you from using o1 or sonnet to refactor everything? It sounds like a typical LLM task.
Is that really related to the LLM?
Even in pre-LLM times, anytime I've scraped together some code to solve some small immediate problem, it grows tech debt at an amazing rate. Getting a feel for when a piece of code is going to be around long enough that it needs to be refactored, cleaned up, documented, etc. is a skill I developed over time. Even now it isn't a perfect guess, as there is an ongoing tug of war between wasting time today refactoring something I might not touch again and wasting time tomorrow having to pick up something I didn't clean up.
But having the LLM do things for me, I frequently run into issues where it feels like I'm wasting my time with an intern. "Chat-based LLMs do best with exam-style questions" really speaks to me; however, I find that constructing my prompts in such a way that the LLM does what I want uses just as much brainpower as just programming the thing myself.
I do find ChatGPT (o1 especially) really good at optimizing existing code.
It speaks to me too because my mechanical writing style (as opposed to creative prose) could best be described as what I learned in high school AP English/Literature and the rest of the California education system. For whatever reason that writing style dominated the training data and LLMs just happens to be easy to use because I came out of the same education system as many of the people working at OpenAI/Anthropic.
I’ve had to stop using several generic turns of phrase like “in conclusion” because it made my writing look too much like ChatGPT.
What I find useful is that I can keep thinking at one abstraction level without hopping back and forth between algorithm and codegen. The chat is also a written artifact I can use the faster language parts of my brain on instead of the slower abstract thought parts.
If you feel like you're wasting your time, my bet is that you're either picking problems where there isn't enough value to negotiate with the LLM, or your expectations are too high. Crawshaw mentions this in his post: a lot of the value of this chat-driven style is that it very quickly gets you unstuck on a problem. Once you get to that point, you take over! You don't convince the LLM to build the final version you actually commit to your branch.
Generating unit test cases --- in particular, generating unit test cases that reconcile against unsophisticated, brute-force, easily-validated reference implementations of algorithms --- are a perfect example of where that cost/benefit can come out nicely.
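A minimal sketch, assuming Go, of what that reconciliation can look like (everything in one _test.go file for brevity); both binarySearch, the optimized implementation under test, and linearSearch, the brute-force reference, are hypothetical stand-ins:

    package search

    import "testing"

    // binarySearch is the optimized implementation under test (hypothetical).
    // It returns the index of the first element equal to target, or -1.
    func binarySearch(xs []int, target int) int {
        lo, hi := 0, len(xs)
        for lo < hi {
            mid := (lo + hi) / 2
            if xs[mid] < target {
                lo = mid + 1
            } else {
                hi = mid
            }
        }
        if lo < len(xs) && xs[lo] == target {
            return lo
        }
        return -1
    }

    // linearSearch is the dumb, easily-validated reference.
    func linearSearch(xs []int, target int) int {
        for i, x := range xs {
            if x == target {
                return i
            }
        }
        return -1
    }

    // Reconcile the two over a pile of small cases; the LLM can churn these out.
    func TestBinarySearchMatchesReference(t *testing.T) {
        xs := []int{1, 3, 3, 7, 9, 12, 40}
        for target := -1; target <= 41; target++ {
            if got, want := binarySearch(xs, target), linearSearch(xs, target); got != want {
                t.Errorf("target %d: got %d, want %d", target, got, want)
            }
        }
    }

The reference is slow but obviously correct, which is exactly the property that makes the generated tests cheap to review.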
Does webflow have something?
My problem is being able to describe what I want in the style I want.
Never used them myself but have seen them mentioned on Reddit and Twitter.
I have a lot of documentation aimed at the AI in `docs/notes/` (some of it written by an LLM but proofread before committing) and I instruct Cursor/Windsurf/Aider via their respective rules/config files to look at the documentation before doing anything. At some scale that initial context becomes just a directory listing & short description of everything in the notes folder, which eventually breaks down due to context size limits, either because I exceed the maximum length of the rules or the agent requires pulling in too much context for the change.
I’ve found that there’s actually an uncanny valley between greenfield projects where the model is free to make whatever assumptions it wants and brownfield projects where it’s possible to provide enough context from the existing codebase to get both API accuracy (hallucinations) and general patterns through few-shot examples. This became very obvious once I had enough examples of that binding layer. Even though I could include all of the documentation for the library, it didn’t work consistently until I had a variety of production examples to point it to.
Right now, I probably spend as much time writing each prompt as I do massaging the notes folder and rules every time I notice the model doing something wrong.
I work on full blown legacy apps and needless to say I don't even bother with LLMs when working on these most of the time.
I'd love to be able to tell my (hypothetical smalltalk) tablet to create an app for me, and work interactively, interacting with the app as it gets built...
Ed: I suppose I should just try and see where cloud ai can take smalltalk today:
His post reminds me of an old idea I had of a language where all you wrote was function signatures and high-level control flow, and maybe some conformance tests around them. The language was designed around filling in the implementations for you. 20 years ago that would have been from a live online database, with implementations vying for popularity on the basis of speed or correctness. Nowadays LLMs would generate most of it on the fly, presumably.
Most ideas are unoriginal, so I wouldn't be surprised if this has been tried already.
The whole article page reads like a site from the '90s, written from scratch in HTML.
That's when I knew the article would go hard.
Substantive pieces don't need fluffy UIs - the idea takes the stage, not the window dressing.
There is likely to be a great rift in how very talented people look at sharper tools.
I've seen the same division pop up with CNC machines, 3d printers, IDEs and now LLMs.
If you are good at doing something, you might find the new tool's output to be sub-par over what you can achieve yourself, but often the lower quality output comes much faster than you can generate.
That causes the people who are deliberate & precise about their process to hate the new tool completely - expressing in the actual code (or paint, or marks on wood) is much better than trying to explain it in a less precise language in the middle of it. The only exception I've seen is that engineering folks often use a blueprint & refine it on paper.
There's a double translation overhead which is wasteful if you don't need it.
If you have dealt with a new hire while being the senior of the pair, there's that familiar feeling of wanting to grab their keyboard instead of explaining how to build that regex - being able to do more things than you can explain or just having a higher bandwidth pipe into the actual task is a common sign of mastery.
The incrementalists on the other hand, tend to love the new tool as they tend to build 6 different things before picking what works the best, slowly iterating towards what they had in mind in the first place.
I got into this profession simply because I could Ctrl-Z to the previous step much more easily than my then favourite chemical engineering goals. In Chemistry, if you get a step wrong, you go to the start & start over. Plus even when things work, yield is just a pain there (prove it first, then you scale up ingredients etc).
Just from the name of sketch.dev, it appears that this author is of the 'sketch first & refine' model where the new tool just speeds up that loop of infinite refinement.
Wow, I've been there ! Years ago we dragged a GIS system kicking and screaming from its nascent era of a dozen ultrasharp dudes with the whole national fiber optics network in their head full of clever optimizations, to three thousand mostly clueless users churning out industrial scale spaghetti... The old hands wanted a dumb fast tool that does their bidding - they hated the slower wizard-assisted handholding, that turned out to be essential to the new population's productivity.
Command line vs. GUI again... Expressivity vs. discoverability, all the choices vs. don't make me think. Know your users !
As we keep burrowing deeper and deeper into an overly complex system that allows people to get into parts of it without understanding the whole, we are edging closer to a situation where no one is left who can actually reason about the system and it starts to deteriorate beyond repair until it suddenly collapses.
How is any human meant to understand a billion lines of code in a single codebase? How is any human meant to understand a world where there are potentially trillions of lines of code operating?
So engineers that like to iterate and explore are more likely to like LLMs.
Whereas engineers that like have a more rigid specific process are more likely to dislike LLMs.
However, there are also people who love everything new and jump onto the latest hype too. They try new things but then immediately advocate it without merit.
Where are the sane people in the middle?
I'd be happy if LLMs could produce working code as often and as quickly as the evangelist claim, but whenever I try to use LLM to work on my day to day tasks, I almost always walk away frustrated and disappointed - and most of my work is boring on technical merits, I'm not writing novel comp-sci algorithms or cryptography libraries.
Every time I say this, I'm painted as some luddite who just hates change when the reality is that no, current LLMs are just not fit for many of the purposes they're being evangelized for. I'd love nothing more than to be a 2x developer on my side projects, but it just hasn't happened and it's not for the lack of trying or open mindedness.
edit: I've never actually seen any LLM-driven developers work in real time. Are there any live coding channels that could convince the skeptics what we're missing out on something revolutionary?
I've used LLM to generate code samples and my IDE (IntelliJ) uses an LLM for auto-suggestions. That's mostly about it for me.
Your experience diverges from that of other experienced devs who have used the same tools, on probably similar projects, and reached different conclusions.
That includes me, for what it's worth. I'm a graybeard whose current work is primarily cloud data pipelines that end in fullstack web. Like most devs who have fully embraced LLMs, I don't think they are a magical panacea. But I've found many cases where they're unquestionably an accelerant -- more than enough to justify the cost.
I don't mean to say your conclusions are wrong. There seems to be a bimodal distribution amongst devs. I suspect there's something about _how_ these tools are used by each dev, and in the specific circumstances/codebases/social contexts, that leads to quite different outcomes. I would love to read a better investigation of this.
They’re great for doing something that has been done before, but their hallucinations are wildly incorrect when novelty is at play - and I’ll add they’re always very authoritative! I’m glad my languages of choice have a compiler!
LLMs work best for code when both (a) there's sufficient relevant training data aka we're not doing something particularly novel and (b) there's sufficient context from the current codebase to pick up expected patterns, the peculiarities of the domain models, etc.
Drop (a) and get comical hallucinations; drop (b) and quickly find that LLMs are deeply mediocre at top-level architectural and framework/library choices.
Perhaps there's also a (c) related to precision. You can write code to issue a SQL query and return JSON from an API endpoint in multiple just-fine ways. Misplace a pthread_mutex_lock, however, and you're in trouble. I certainly don't trust LLMs to get things like this right!
(It's worth mentioning that "novelty" is a tough concept in the context of LLM training data. For instance, maybe nobody has implemented a font rasterizer in Rust before, but plenty of people have written font rasterizers and plenty of others have written Rust; LLMs seem quite good at synthesizing the two.)
Pretty nice at autocomplete. Like writing json tags in Go structs. It can just autocomplete that stuff for me no problem; it saves me seconds per line, seconds I tell you.
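For anyone who hasn't written Go, that kind of completion is boilerplate like this (illustrative struct, field names invented), where the model fills in the repetitive json tags:

    package model

    import "time"

    // Autocomplete reliably derives the `json:"..."` tags from the field names.
    type User struct {
        ID        int       `json:"id"`
        Email     string    `json:"email"`
        CreatedAt time.Time `json:"created_at"`
        IsAdmin   bool      `json:"is_admin,omitempty"`
    }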
It's stupid as well... Autofilled a function, looks correct. Reread it 10 minutes later and well... Minor mistake that would have caused a crash at runtime. It looked correct but in reality it just didn't have enough context ( the context is in an external doc on my second screen ... ) and there was no way it would ever have guessed the correct code.
It took me longer to figure out why the code looked wrong than if I had just typed it myself.
Did it speed up my workflow on code I could have given a junior to write? Not really, but some parts were quicker while other were slower.
And imagine if that code had crashed in production next week instead of right now while the whole context is still in my head. Maybe that would be hours of debugging time...
Maybe, as the parent said, for a domain where you are breaking new ground, it can generate some interesting ideas you wouldn't have thought about. Like a stupid pair that can get you out of a local minimum: in general it doesn't help much, but there it can be a significant help.
But then again you could do what has been done for decades and speak to another human about the problem, at least they may have signed the same NDA as you...
* Information lookup
-- when search engines are enshittified and bogged down by SEO spam and when it's difficult to transform a natural language request into a genuinely unique set of search keywords
-- Search-enabled LLMs have the most up to date reach in these circumstances but even static LLMs can work in a pinch when you're searching for info that's probably well represented in their training set before their knowledge cutoff
* Creatively exploring a vaguely defined problem space
-- Especially when one's own head feels like it's too full of lead to think of anything novel
-- Watch out to make sure the wording of your request doesn't bend the LLM too far into a stale direction. For example naming an example can make them tunnel vision onto that example vs considering alternatives to it.
* Pretending to be Stack Exchange
-- EG, the types of questions one might pose on SE one can pose to an LLM and get instant answers, with less criticism for having asked the question in the first place (though Claude is apparently not above gently checking in if one is encountering an X Y problem) and often the LLM's hallucination rate is no worse than that of other SE users
* Shortcut into documentation for tools with either thin or difficult to navigate docs
-- While one must always fact-check the LLM, doing so is usually quicker in this instance than fishing online for which facts to even check
-- This is most effective for tools where tons of people do seem to already know how the tool works (vs tools nobody has ever heard of) but it's just not clear how they learned that.
* Working examples to ice-break a start of project
* Simple automation scripts with few moving parts, especially when one is particular about the goal and the constraints
-- Online one might find example scripts that almost meet your needs but always fail to meet them in some fashion that's irritating to figure out how to corral back into your problem domain
-- LLMs have deep experience with tools and with short snippets of coherent code, so their success rate on utility scripts is much higher than on "portions of complex larger projects".
You seem open to this possibility, since you ask:
> I've never actually seen any LLM-driven developers work in real time. Are there any live coding channels that could convince the skeptics what we're missing out on something revolutionary?
I don't know many yet, but Steve Yegge, a fairly famous developer in his own right, has been talking about this for the last few months, and has walked a few people through his "Chat Oriented Programming" (CHOP) ideas. I believe if you search for that phrase, you'll find a few videos, some from him and some from others. Can't guarantee they're all quality videos, though anything Steve himself does is interesting, IMO.
Except for cryptocurrencies (at least their ratio of investments to output) :-p
They are the quiet ones.
At my last job I spent a lot of time on cleanups and refactoring and never got the LLM to help me in any way. This is the thing that I try every few months and see what's changed, because one day it will be able to do the tedious things I need to get done and spare me the tedium.
Something I should try again is having the LLM follow a spec and see how it does. A long time ago I wrote some code to handle HTTP conditional requests. I pasted the standard into my code, and wrote each chunk of code in the same order as the spec. I bet the LLM could just do that for me; not a lot of knowledge of code outside that file was required, so you don't need many tokens of context to get a good result. But alas the code is already written and works. Maybe if I tried doing that today the LLM would just paste in the code I already wrote and it was trained on ;)
That is interesting. Asking as a complete ignoramus: is there not a way to do this now? Like start off with 100 units of reagent and at every step use a bit and discard it if wrong?
??? What's up with native English speakers and random acronyms of stuff that isn't said that often? YMMV, IIUC, IANAL, YSK... Just say it and save everyone else a google search.
And googling those acronyms usually returns unrelated shit unless you go specifically to urban dictionary
And then it's "If I understand correctly". Oh. Of course. He couldn't be arsed to type that
IMO, LLMs are super fast predictive input and hallucinatory unzip; files to be decompressed don't have to exist yet, but input has to be extremely deliberate and precise.
You have to have a valid formula that gives the resultant array and doesn't require more than 100 IQ to comprehend, and then they unroll it for you into the whole code.
They don't reward trial and error that much. They don't seem to help outsiders like 3D printers did, either. It is indeed a discriminatory tool as in it mistreats amateurs.
And, by the way, it's also increasingly obvious to me that assuming a pro-AI posture beyond what you would from a purely rational and utilitarian standpoint triggers a unique mode of insanity in humans. People seem to contract a lot of negativity doing it. Don't do that.
AIUI that’s where idris is headed
I feel like this is a great approach for LLM assisted programming because things like types, function signatures, pre/post conditions, etc. give more clarity and guidance to the LLM. The more constraints that the LLM has to operate under, the less likely it is to get off track and be inconsistent.
I've taken a shot at doing some little projects for fun with this style of programming in TypeScript and it works pretty well. The programs are written in layers with the domain design, types, schema, and function contracts being figured out first (optionally with some LLM help). Then the function implementations can be figured out towards the end.
It might be fun to try Effect-TS for ADTs + contracts + compile time type validation. It seems like that locks down a lot of the details so it might be good for LLMs. It's fun to play around with different techniques and see what works!
You can also pick the right model for the right need and it's free.
Biggest advantage is the o1 128k context. I can one shot an entire 1000 line class where normally I’d have to go function by function with 4o.
Ollama + CodeGPT IntelliJ plugin. It allows you to point at a local instance.
To my knowledge, it doesn't.
On Emacs there's gptel which integrates quiet nicely different LLM inside Emacs, including a local Ollama.
> gptel is a simple Large Language Model chat client for Emacs, with support for multiple models and backends. It works in the spirit of Emacs, available at any time and uniformly in any buffer.
TabbyML
As far as using LLMs in anger, I would really advise anyone to use them. GitHub Copilot hasn't been very useful for me personally, but I get a lot of value out of running my thought process by an LLM. I think better when I "think out loud", and that is obviously challenging when everyone is busy. Running my ideas by an LLM helps me process them in a similar (if not better) fashion; often it won't even really matter what the LLM conjures up, because simply describing what I want to do often gives me new ideas, like "thinking out loud".
As far as coding goes. I find it extremely useful to have LLMs write cli scripts to auto-generate code. The code the LLM will produce is going to be absolute shite, but that doesn't matter if the output is perfectly fine. It's reduced my personal reliance on third party tools by quite a lot. Because why would I need a code generator for something (and in that process trust a bunch of 3rd party libraries) when I can have a LLM write a similar tool in half an hour?
Not every documentation is made equal. For example: the Android docs are royal shit. They cover some basic things, e.g. showing a button, but good luck finding esoteric Bluetooth information or package management, etc. Most of it is a mix of experimentation and historical knowledge (baggage).
They are wildly different. I'm not sure the Android API reference is that bad, but that is mainly because I've spent a good number of years with the various .Net API references, and the Android one is a much more shiny turd than those. I haven't had issues with Bluetooth myself; the Bluetooth SIG has some nice specification PDFs, but I assume you're talking about the ones which couldn't be found? I mean this in a "they don't seem to exist" kind of way and not that you, specifically, couldn't find them.
I agree though. It's just that I've never really found internet answers to be very useful. I did actually search for information a few years back when I had to work with a solar inverter datalogger, but it turned out that having the ridiculously long German engineering manual scanned, OCR processed, and translated was faster. Anyway, we all have our great white whales. I'm virtually incapable of understanding the SQLAlchemy documentation, as an example; luckily I'll probably never have to use it again.
StackOverflow was not really meant for juniors, as juniors usually can indeed find answers on documentation, normally. It was, like ExpertsExchange before it, a place for veterans to exchange tribal knowledge like this. If you think only juniors use SO, you seem to have arrived at the scene just yesterday and just don't know what you're talking about.
This reminds me a bit of PowerBuilder (or was it PowerDesigner?) from early 1990s. They sold it to SAP later, I was told it's still being used today.
Other than that, what correlates more strongly with the ability to use LLMs effectively is, I believe, language skills: the ability to describe problems very clearly. LLM reply quality changes very significantly with the quality of the prompt. Experienced programmers who can also communicate effectively provide the model with many design hints, details on where to focus, ..., basically escaping many local minima immediately.
Actually, I'm afraid not. It won't give us the step-by-step scalable processes to make humanity as a whole enter an indefinitely long period of world peace, with each of us enjoying life in our own thriving manner. That would be great information to broadcast, though.
Also, it is equally able to produce a large pile of completely delusional answers that mimic genuinely sincere statements just as well. Of course, we can also receive that kind of misleading answer from humans. But the amount of output that mere humans can throw out in such a form is far more limited.
All that said, it's great to be able to experiment with it, and there are a lot of nice and fun things to do with it. It can be a great additional tool, but it won't be a self-sufficient panacea of information source.
That's not anywhere, that's a totally unsolved and open ended problem, why would you think an LLM would have that?
> Think about it: every type of already solved problem you want information about is in them, in fact it is there multiple times, with multiple levels of seriousness in the treatment of the idea.
then that was not clear from your comment saying LLMs contain any information you want.
One has to be careful communicating about LLMs because the world is full of people that actually believe LLMs are generally intelligent super-beings.
If you want an LLM to make a sandwich, you have to tell it you `want triangular sandwiches of standard serving size made with white bread and egg based filling`, not `it's almost noon and I'm wondering if sandwich for lunch is a good idea`. Fine-tuning partially solves that problem, but they still prefer the former.
Sometimes asking it to self reflect on how the prompt itself could be better engineered helps if the initial response isn't quite right.
I have actually found that, from a documentation point of view, querying LLMs has made me better at explaining things to people. If, given the documentation for a system or API, a modern LLM can't answer specific questions about how to perform a task, a person using the same documentation will also likely struggle. It's proving to be a good way to test the effectiveness of documentation, for humans and for LLMs.
Yes, and to provide enough context.
There's probably a lot that experience is contributing to the interaction as well, for example - knowing when the LLM has gone too far, focusing on what's important vs irrelevant to the task, modularising and refactoring code, testing etc
[0] your videos on writing systems software were part of what inspired me to make a committed switch into vim. thank you for those!
I do not remember a single instance when code provided to me by an LLM worked at all. Even if I ask for something small that can be done in 4-5 lines of code, it's always broken.
From a fellow "seasoned" programmer to another: how the hell do you write the prompts to get back correct working code?
They can't think for you. All intelligent thinking you have to do.
First, give them a high-level requirement that can be clarified into indented bullet points that look like code. Or give them such a list directly. Don't give them the half-open questions usually favored by talented and autonomous individuals.
Then let them decompress those pseudocode bullet points further into code. They'll give you back code that resembles a digitized paper test answer. Fix obvious errors and you get B-grade compiling code.
They can't do non-conventional structures, Quake-style performance-optimized code, realtime robotics, cooperative multithreading, etc., just good old it-takes-what-it-takes GUI app, API, and data manipulation code.
For those use cases with these points in mind, it's a lot faster to let LLM generate tokens than typing `int this_mandatory_function_does_obvious (obvious *obvious){ ...` manually on a keyboard. That should arguably be a productivity boost in the sense that the user of LLM is effectively typing faster.
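As a hedged illustration of those "bullet points that look like code" (the task, function name, and CSV layout are all invented for the example): the bullets are what you hand the model, and the body is roughly what comes back for you to fix up and compile.

    package orders

    import (
        "encoding/csv"
        "io"
        "strconv"
    )

    // Prompt, as indented pseudocode bullets:
    //   - parse a CSV of orders (id, quantity, unit price)
    //   - skip the header row and any malformed lines
    //   - return the total value (quantity * unit price)
    func totalOrderValue(r io.Reader) (float64, error) {
        reader := csv.NewReader(r)
        reader.FieldsPerRecord = -1 // tolerate ragged rows; we skip them below
        rows, err := reader.ReadAll()
        if err != nil {
            return 0, err
        }
        var total float64
        for i, row := range rows {
            if i == 0 || len(row) < 3 {
                continue // header or malformed line
            }
            qty, qerr := strconv.Atoi(row[1])
            price, perr := strconv.ParseFloat(row[2], 64)
            if qerr != nil || perr != nil {
                continue // malformed numbers
            }
            total += float64(qty) * price
        }
        return total, nil
    }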
For the standard answers of "GPT-4 or above" or "Claude Sonnet or Haiku" (or models of similar power), well-known languages like Python, JavaScript, Java, or C, and assuming no particularly niche or unheard-of APIs or project contexts, the failure rate of 4-5-line snippets in my experience is less than 1%.
As other commenters have pointed out, there is also a lot of variation between different models, and some are quite dumb.
I've had no issues with 10-20 line coding problems. I've also had it build a lot of complete shell scripts and had no problem there either.
If what I’m requesting is an improvement to existing code, I paste the whole code if practical, or if not, as much of the code as possible, as context before making the request for additional functionality.
Often these days I add something like “preserve all currently existing functionality.” Weirdly, as the models have gotten smarter, they have also gotten more prone to delete stuff they view as unnecessary to the task at hand.
If what I’m doing is complex (a subjective judgement) I ask it to lay out a plan for the intended code before starting, giving me a chance to give it a thumbs up or clarify its understanding of what I’m asking for if its plan is off base.
Step 2: Write out your description of the thing you want to the best of your ability but phrase it as "I would like X, could you please help me better define X by asking me a series of clarifying questions and probing areas of uncertainty."
Step 3: Once both Claude and you are satisfied that X is defined, say "Please go ahead and implement X."
Step 4a: If feature Y is incorrect, go to Step 2 and repeat the process for Y
Step 4b: If there is a bug, describe what happened and ask Claude to fix it.
That's the basics of it, should work most of the time.
Don't doubt for a second the pedigree of founding engs at Tailscale, but David is careful to point out exactly why LLMs work for them (but might not for others):
I am doing a particular kind of programming, product development, which could be roughly described as trying to bring programs to a user through a robust interface. That means I am building a lot, throwing away a lot, and bouncing around between environments. Some days I mostly write typescript, some days mostly Go. I spent a week in a C++ codebase last month exploring an idea, and just had an opportunity to learn the HTTP server-side events format. I am all over the place, constantly forgetting and relearning.
If you spend more time proving your optimization of a cryptographic algorithm is not vulnerable to timing attacks than you do writing the code, I don't think any of my observations here are going to be useful to you.
In this regard, with first Stack Overflow and now LLMs, the field has improved mightily.
I am not a software dev, I am a security researcher. LLMs are great for my security research! It is so much easier and faster to iterate on code like fuzzers to do security testing. Writing code to do a padding oracle attack would have taken me a week+ in the past. Now I can work with an LLM to write code, learn, and break within the day.
It has accelerated my security research 10 fold, just because I am able to write code and parse and interpret logs at a level above what I was able to a few years ago.
Start -> Enter Credentials -> Validate -> [Valid] -> Welcome Message -> [Invalid] -> Error Message
Corresponding Code (Python Example):
    class LoginSystem:
        def validate_credentials(self, username, password):
            # Hard-coded demo credentials for the example flow above.
            if username == "admin" and password == "password":
                return True
            return False

        def login(self, username, password):
            if self.validate_credentials(username, password):
                return "Welcome!"
            else:
                return "Invalid credentials, please try again."
*Edited for clarity

1. https://sqreen.github.io/DevelopersSecurityBestPractices/tim...
You would like it to avoid timing-based attacks as well as DoS attacks.
It should also generate the functions as pure functions, so that state is passed in and passed out and no side effects (printing to the console) happen within the function.
Then also confirm for me that it has handled all error cases that might reasonably happen.
While you are doing that, just think about how much implicit knowledge I just had to type into the comment here, and that is still ignoring a ton of other knowledge that needs to be considered, like whether that password was salted before being stored, all the error conditions for the sqlite implementation in Python, and the argon2 implementation in the library.
TLDR: that code is useless and would have taken me the same amount of time to write as your prompt.
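To make the timing-attack point concrete, here is a minimal Go sketch using the standard library's crypto/subtle (illustrative only; per the point above, real password storage still needs a salted KDF such as argon2 or bcrypt):

    package auth

    import (
        "crypto/sha256"
        "crypto/subtle"
    )

    // checkSecret compares a presented secret against the expected one in
    // constant time. Hashing both sides first means the comparison runs over
    // fixed-length digests, so timing doesn't leak where the mismatch is.
    func checkSecret(presented, expected string) bool {
        p := sha256.Sum256([]byte(presented))
        e := sha256.Sum256([]byte(expected))
        return subtle.ConstantTimeCompare(p[:], e[:]) == 1
    }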
Regardless of language, that's basically how you approach the design of a new large project - top down architecture first, then split the implementation into modules, design the major data types, write function signatures. By the time you are done what is left is basically the grunt work of implementing it all, which is the part that LLMs should be decent at, especially if the functions/methods are documented to level (input/output assertions as well as functionality) where it can also write good unit tests for them.
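A hedged sketch of that hand-off point in Go (all names invented): the human writes the contract and its documentation up front, and the body plus its unit tests are the grunt work handed to the LLM.

    package ratelimit

    import "time"

    // Limiter is the contract the human designs up front.
    type Limiter interface {
        // Allow reports whether an event occurring at t should be admitted.
        Allow(t time.Time) bool
        // Reset forgets all previously recorded events.
        Reset()
    }

    // NewSliding returns a Limiter that admits at most burst events within
    // any window-sized span. Preconditions: burst > 0, window > 0.
    func NewSliding(burst int, window time.Duration) Limiter {
        // Implementation intentionally left as the grunt work for the LLM,
        // to be checked against the generated unit tests.
        panic("not implemented")
    }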
you mean the fun part. I can really empathize with digital artists. I spent twenty years honing my ability to write code and loved every minute of it, and you're telling me that in a few years all that's going to be left is PM syncs and OKRs and then telling the bot what to write
if I'm lucky to have a job at all
Back in the day (I've been a developer for ~45 years!) it was a bit different as hardware constraints (slow 8-bit processors with limited memory) made algorithmic and code efficiency always a primary concern, and that aspect was certainly fun and satisfying, and much more a part of the overall effort than it is today.
I would write the tests first and foremost: they are the specification. They’re for future me and other maintainers to understand and I wouldn’t want them to be generated: write them with the intention of explaining the module or system to another person. If the code isn’t that important I’ll write unit tests. If I need better assurances I’ll write property tests at a minimum.
If I’m working on concurrent or parallel code or I’m working on designing a distributed system, it’s gotta be a model checker. I’ve verified enough code to know that even a brilliant human cannot find 1-in-a-million programming errors that surface in systems processing millions of transactions a minute. We’re not wired that way. Fortunately we have formal methods. Maths is an excellent language for specifying problems and managing complexity. Induction, category theory, all awesome stuff.
Most importantly though… you have to write the stuff and read it and interact with it to be able to keep it in your head. Programming is theory-building as Naur said.
Personally I just don’t care to read a bunch of code and play, “spot the error;” a game that’s rigged for me to be bad at. It’s much more my speed to write code that obviously has no errors in it because I’ve thought the problem through. Although I struggle with this at times. The struggle is an important part of the process for acquiring new knowledge.
Though I do look forward to algorithms that can find proofs of trivial theorems for me. That would be nice to hand off… although simp does a lot of work like that already. ;)
- be way more reliable
- probably be up to date on how you should solve it with the latest/recommended approach
- put you in a place where you can search for adjacent tech
LLM with search has potential but I'd like if current tools are more oriented on source material rather than AI paraphrasing.
Though I still wonder if that means I’m only tricking myself into thinking the LLM is increasing my productivity.
The problem for a regular person is that you have to copy-paste from chat. That is “the last mile”. For terminal commands that’s fine, but for programming you need a tool to automate this.
Something like refactoring a function, given the entire context, etc. And it happening in the editor and you seeing a diff right away. The rest of the explanatory text should go next to the diff in a separate display.
I bet someone can make a VSCode extension that chats with an LLM and does exactly this. The LLM is told to provide all the sections labeled clearly (code, explanation) and the editor makes the diff.
Having said all that, good libraries that abstract away differences are far superior to writing code with an LLM. The only code that needs to be written is the interface and wiring up between the libraries.
We had an issue recently with a task queue seemingly randomly stalling. We were able to arrive at the root cause much more quickly than we would have because of a back-and-forth brainstorming session with Claude, which involved describing the issue we were seeing, pasting in code from library to ask questions, asking it to write some code to add some missing telemetry, and then probing it for ideas on what might be going wrong. An issue that may have taken days to debug took about an hour to identify.
Think of it as rubber ducking with a very strong generalist engineer who knows about basically any technical concepts.
I feel like I've worn out my computer’s clipboard and alt-tab keys at this stage of the LLM experience.
I will evaluate design ideas with the model, express concerns on trade-offs, ask for alternative ideas, etc.
Some of the benefit is having someone to talk to, but with proper framing it is surprisingly good at giving balanced takes.
Then I needed to write a simple command line utility, so I wrote it in Go, even though I've never written Go before. Being able to make tiny standalone executables which do real work is incredible.
Now if I ever need to write something, I can choose the language most suited to the task, not the one I happen to have the most experience with.
That's a superpower.
Probably half the lines of code were written by me, because I do know how to write code.
Here's what I wrote if you're curious: https://github.com/sjwright/zencontrol-python/
For those not in-the-know, I just learned today that code autocomplete is actually called "Fill-in-the-Middle" tasks
Stop taking these blogs as oracles of truth; they are not. These AI articles are full of this nonsense, to the point where it would appear to me many responses might just be Nvidia bots or whatever.
Then you need to look harder. FiM is a common approach for code generation LLMs.
https://openai.com/index/efficient-training-of-language-mode...
https://arxiv.org/abs/2207.14255
This was before ChatGPT's release btw.
It's like everything to do with LLM marketing buzzword nonsense.
I really want to just drop out of tech until all this obnoxious hype BS is gone.
Your comments may be sympathised with, but why on earth are they addressed to the root commenter? They simply shared their findings about an acronym.
More pressingly why do you think you should police it?
FIM is a term of art in LLM research for a style of tokens used to implement code completion. In particular, it refers to training an LLM with the extra non-printing tokens:
<|fim_prefix|>
<|fim_middle|>
<|fim_suffix|>
You would then take code like this:

    func add(a, b int) int {
        return <cursor>
    }

and convert it to:

    <|fim_prefix|>func add(a, b int) int {
        return<|fim_suffix|>
    }<|fim_middle|>

and have the LLM predict the next token. It is, in effect, an encoding scheme for getting the prefix and suffix into the LLM context while positioning the next token to be where the cursor is.
(There are several variants of this scheme.)
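As a rough sketch of what that encoding looks like as code (the sentinel spellings here are placeholders and vary by model, not any specific tokenizer's API):

    // fimPrompt assembles a fill-in-the-middle prompt from the text before and
    // after the cursor; the model is then asked to generate the missing middle.
    func fimPrompt(prefix, suffix string) string {
        return "<|fim_prefix|>" + prefix + "<|fim_suffix|>" + suffix + "<|fim_middle|>"
    }

    // Example: fimPrompt("func add(a, b int) int {\n\treturn ", "\n}")
    // yields a prompt whose completion should be the missing "a + b".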
As far as I know, the idea of a scratch "buffer" comes from emacs. But in Jetbrains IDEs, you have the full IDE support even with context from your current project (you can pick the "modules" you want to have in context). Given the good integration with LLMs, that's basically what the author seems to want. Perhaps give GoLand[2] a try.
Disclosure: no, I don't work for Jetbrains :D just a very happy customer.
I think emacs + LLM is a killer feature: the integration is super deep, deeper than any IDE I've seen, and it's just available... everywhere! Any text in emacs is sendable to a LLM.
Can't recommend aider enough. I've tried many different coding tools, but they all seem like a leaky abstraction over the LLM's medium of sequential text generation. Aider, on the other hand, leans into it in the best possible way.
I do mostly 2/ Search, which is like a personalized Stack Overflow and sometimes feels incredible. You can ask a general question about a specific problem and then dive into some specific point to make sure you understand every part clearly. This works best for things one doesn't know enough about, but has a general idea of how the solution should sound or what it should do. Or copy-pasting error messages from tools like Docker and having the LLM debug them for you, which really feels like magic.
For some reason I have always disliked autocomplete anywhere, so I don't do that.
The third way, chat-driven programming, is more difficult, because the code generated by LLMs can be large, and can also be wrong. LLMs are too eager to help, and they will try to find a solution even if there isn't one, and will invent it if necessary. Telling them in the prompt to say "I don't know" or "it's impossible" if need be, can help.
But, like the author says, it's very helpful to get started on something.
> That is why I still use an LLM via a web browser, because I want a blank slate on which to craft a well-contained request
That's also what I do. I wouldn't like having something in the IDE trying to second guess what I write or suddenly absorbing everything into context and coming up with answers that it thinks make a lot of sense but actually don't.
But the main benefit is, like the author says, that it lets one start afresh with every new question or problem, and save focused threads on specific topics.
One thing that doesn't get a mention in the article, but is quite significant I think, is the long lag of knowledge-cutoff dates: looking at even the latest and greatest models, there is a year or more of missing information.
I would love for someone more versed than me to tell us how best to use RAG or LoRA to get the model to answer with fully up to date knowledge on libraries, frameworks, ...
So what we can get out of it is everything that has been written (and publicly released) before, translated into any language it knows about.
This has some consequences.
1. Programmers still need to know what algorithms or interfaces or models they want.
2. Programmers do not have to know a language very well anymore to write code, but they do for bug fixing. Consequently, the rift between garbage software and quality software will grow.
3. New programming languages will face a big economic hurdle to take off.
I bet the opposite. I’ve written a number of DSLs and tooling around them over the last year as LLMs have allowed me to take on much bigger projects.
I expect we see an explosion of languages over the next decade.
You might have written the DSLs, but the LLMs are unaware of this and will offer hallucinations when asked to generate code using that DSL.
For the past few weeks I've been slowly getting back to Common Lisp. Even though there's plenty of CL code on the net, its volume is dwarfed by Python or JS. In effect, both Github Copilot and ChatGPT (4o) have an accuracy of 5%. I'm not kidding: they're unable to generate even very simple snippets correctly, hallucinating packages and functions.
It's of course (I think?) possible to make a GPT specialized for Lisp, but if the generic model performs poorly, it'll probably make people wary and stay away from the language. So, unless you're ready to fine-tune a model for your language and somehow distribute it to your users, you'll see adoption rates dropping (from already minuscule ones!)
But I'm completely unconvinced by the final claim that LLM interfaces should be separate from IDE's, and should be their own websites. No thanks.
Search has been neutral. For finding little facts it's been about the same as regular search. When digging in, I want comprehensive, dense, reasonably well-written reference documentation. That's not exactly widespread, but LLMs don't provide this either.
Chat-driven generates too much buggy/incomplete code to be useful, and the chat interface is seriously clunky.
I don't think this is about LLMs getting better, but about search becoming worse, in no small part thanks to LLMs polluting the results. Do an image search for some terms and count how many results are AI generated.
I can say I got better results from Google X years ago than from the Google of today.
When you have to come back over and over, and visit more pages to finally find what you need, they get much more cash from advertisers than when you get everything instantly.
1) Idea
2) Tests
3) Code until all tests pass
I'm probably in the same place as the author, using Chat-GPT to create functions etc, then cut and pasting that into VSCode.
I've started using cline which allows me to code using prompts inside VSCode.
e.g. "Create a new page so that users can add tasks to a tasks table."
I'm getting mixed results, but it is very promising. I create a clinerules file which gets added to the system prompt so the AI is more aware of my architecture. I'm also looking at overriding the cline system prompt, both to make it fit my architecture better and to remove stuff I don't need.
I jokingly imagine in the future we won't get asked how long a new feature will take, rather, how many tokens will it take.
I like gptresearcher and all of the glue put in place to be able to extend prompts and agents etc. Not to mention the ability to fetch resources from the web and do research type summaries on it.
All in all it reminds me the work of security researchers, pentesters and analysts. Throughout the career they would build a set of tools and scripts to solve various problems. LLMs kind of force the devs to create/select tools for themselves to ease the burden of their specific line of work as well. You could work without LLMs but maybe it will be a bit more difficult to stand out in the future.
Are the results a paradigm shift so much better that it's worth the hundreds of billions sunk into the hardware and data centers? Is spicy autocomplete worth the equivalent of flying from New York to London while guzzling thousands of liters of water?
It might work, for some definition of useful, but what happens when the AI companies try to claw back some of that half a trillion dollars they burnt?
This stuff is a pretty neat magical evolution and it should not be the domain of any single company.
Also, a lot of the hardware and so on has been or is being paid for. AWS, gcloud, etc. aren't taking massive losses on their H100 and other compute services. This bubble is no different from any prior bubble ultimately, and bankruptcy will recycle useful assets into new companies and new purposes.
Which, btw, is why the US is still a huge winner and will continue to be: robust and functioning bankruptcy laws and courts.
Like, yesterday I made some light changes to a containerized VPN proxy that I maintain. My first thought wasn't "how would Claude do this?" Same thing with an API I made a few weeks ago that scrapes a flight data website to summarize flights in JSON form.
I knew I would need to write some boilerplate and that I'd have to visit SO for some stuff, but asking Claude or o1 to write the tests or boilerplate for me wasn't something I wanted or needed to do. I guess it makes me slower, sure, but I actually enjoy the process of making the software end to end.
Then again, I do all of my programming on Vim and, technically, writing software isn't my day job (I'm in pre-sales, so, best case, I'm writing POC stuff). Perhaps I'd feel differently if I were doing this day in, day out. (Interestingly, I feel the same way about AI in this sense that I do about VSCode. I've used it; I know what's it capable of; I have no interest in it at all.)
The closest I got to "I'll use LLMs for something real" was using it in my backend app that tracks all of my expenses to parse pictures of receipts. Theoretically, this will save me 30 seconds per scan, as I won't need to add all of the transaction metadata myself. Realistically, this would (a) make my review process slower, as LLMs are not yet capable of saying "I'm not sure" and I'd have to manually check each transaction at review time, (b) make my submit API endpoint slower since it takes relatively-forever for it to analyze images (or at least it did when I experimented with this on GPT4-turbo last year), and (c) drive my costs way up (this service costs almost nothing to run, as I run it within Lambda's free tier limit).
The embeddings, though: I feel like there is something there, even if it doesn't actually understand. My journey has just begun.
I scoff every time someone says "this + AI". AI is this thing they just throw in there. The last time I didn't want to work with some tech, I quit my job, which was not a good move given I'm not financially independent. Anyway, yeah, I'll keep digging into this. I still don't use Copilot right now, but I'm reading up more on the embedding stuff for cross-training or some use case like RAG.
Claude will often generate tons and tons of useless code, quickly using up its limit. I often find myself yelling at it to stop.
I was just working with it last night.
"Hi Claude, can you add tabs here.": <div>
<MainContent/>
<div/>
Claude will then start generating MainContent.
DeepSeek, despite being free, does a much better job than Claude. I don't know if it's smarter, but whatever internal logic it has is much more to the point.
Claude also has a very weird bias towards a handful of UI libraries it wants installed, even if those wouldn't be good for your project. I wasted hours on shadcn UI, which requires a very particular setup to work.
LLMs are generally great at common tasks in a top-5 (by popularity) language.
Ask it to do something in a Haxe UI library and it'll make up functions that *look* correct.
Overall I like them; they definitely speed things up. I don't think most experienced software engineers have much to worry about for now. But I am really worried about juniors. Why hire a junior engineer when you can just tell your seniors they need to use Copilot to crank out more code?
"Add tabs here, assume the rest of the page will work with no futher modification, limit your changes so that any existing code keeps working"
I also do stuff like "Project is using {X} libraries, keep dependencies minimal
Generate a method takes {Z} parameters, return {Y}, using {A}, {B} and {C} do {thing}"
I'll add stuff like Language version, frameworks or specific requests based on this, but then I just reuse the setup , So I like to keep the first message with as much context as possible, ideally separating project context from specific request
My experience with LLM code is that it can't come up with anything even remotely novel. If I say "make it run in amortized O(1)" then 99 times out of 100 I'll get a solution so wildly incorrect (but confidently asserting its own correctness) that it can't possibly be reshaped into something reasonable without a re-write. The remaining 1/100 times aren't usually "good" either.
For the reservoir sampler -- here, it did do the job. David almost certainly knows enough to know the limits of that code and is happy with its limitations. I've solved that particular problem at $WORK though (reservoir sampling for percentile estimates), and for the life of me I can't find a single LLM prompt or sequence of prompts that comes anywhere close to optimality unless that prompt also includes the sorts of insights which lead to an amortized O(1) algorithm being possible (and, even then, you still have to re-run the query many times to get a useful response).
Picking on the article's solution a bit, why on earth is `sorted` appearing in the quantile estimation phase? That's fine if you're only using the data structure once (init -> finalize), but it's uselessly slow otherwise, even ignoring splay trees or anything else you could use to speed up the final inference further.
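For readers who haven't seen the technique, here is a minimal Algorithm R sketch of my own (not the article's code and not the $WORK solution): Add is O(1) per observation, and the sort inside Quantile is exactly the kind of per-query cost being criticized above.

    package main

    import (
        "fmt"
        "math/rand"
        "sort"
    )

    // Reservoir keeps a uniform random sample of up to k values from a
    // stream of unknown length (Vitter's Algorithm R).
    type Reservoir struct {
        k    int
        n    int
        data []float64
    }

    func NewReservoir(k int) *Reservoir { return &Reservoir{k: k} }

    // Add is O(1) per observation.
    func (r *Reservoir) Add(x float64) {
        r.n++
        if len(r.data) < r.k {
            r.data = append(r.data, x)
            return
        }
        // Keep the new value with probability k/n, replacing a random slot.
        if j := rand.Intn(r.n); j < r.k {
            r.data[j] = x
        }
    }

    // Quantile sorts a copy of the sample on every call: fine for a one-shot
    // estimate, wasteful if queried repeatedly (the "sorted" point above).
    func (r *Reservoir) Quantile(q float64) float64 {
        if len(r.data) == 0 {
            return 0
        }
        s := append([]float64(nil), r.data...)
        sort.Float64s(s)
        // Crude nearest-rank index; good enough for a sketch.
        return s[int(q*float64(len(s)-1))]
    }

    func main() {
        r := NewReservoir(1000)
        for i := 0; i < 100000; i++ {
            r.Add(rand.NormFloat64())
        }
        fmt.Printf("approx p95: %.3f\n", r.Quantile(0.95))
    }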
I personally find LLMs helpful for development when either (1) you can tolerate those sorts of mishaps (e.g., I just want to run a certain algorithm through Scala and don't really care how slow it is if I can run it once and hexedit the output), or (2) you can supply all the auxiliary information so that the LLM has a decent chance of doing it right -- once you've solved the hard problems, the LLM can often get the boilerplate correct when framing and encapsulating your ideas.
Some years ago I gave a task to some of my younger (but intelligent) coworkers.
They spent about 50 minutes searching in google and came back to me saying they couldn't find what they were looking for.
I then typed in a query, clicked one of the first search results and BAM! - there was the information they were unable to find.
What was the difference? It was the keywords / phrases we were using.
This to me is the biggest advantage of LLMs. They dramatically reduce the activation energy of doing something you are unfamiliar with. Much in the way that you're a lot more likely to try kitesurfing if you are at the beach standing next to a kitesurfing instructor.
While LLMs may not yet have human-level depth, it's clear that they already have vastly superhuman breadth. You can argue about the current level of expertise (does it have undergrad knowledge in every field? PhD level knowledge in every field?) but you can't argue about the breadth of fields, nor that the level of expertise improves every year.
My guess is that the programmers who find LLMs useful are people who do a lot of different kinds of programming every week (and thus are constantly going from incompetent to competent in things that other people already know), rather than domain experts who do the same kind of narrow and specialized work every day.
People newer to programming might not have as good a time, because they may skip actually learning the fundamentals and rely on LLMs as a crutch. Nothing wrong with that, I suppose, but there might come a point when everything goes up in smoke and the LLM is out of answers.
No amount of italic font is going to change that.
The first few steps were great. It guided me to install things and set up a project structure. The model even generated code for a few files.
Then something went wrong: the model kept telling me what to do in vague terms, but didn't output code anymore. So I asked for further help, and now it started contradicting itself, rewriting business logic that was implemented in the first response, giving 3-4 incompatible snippets of the same file, etc., and it all fell apart.
I'm not too optimistic about the future of software development if juniors are turning to AI to do those early projects for them.
I had the same issue as you a few days ago. Separating the problem into smaller parts and addressing each part one by one made it easier.
In your specific case I would try to fully complete the business logic on one side first. Reset the context. Then provide the logic to a new context and ask for an interface. Difficulty will arise when discovering that the logic is wrong or not suited to the UI, but I would keep using the same process to edit the code. Maybe two different contexts, one for logic, one for UI?
How did you do?
I think at the same time, while the author says this is the second most impressive technology he's seen in his lifetime, it's still a far cry from the bombastic claims being made by the titans of industry regarding its potential. Not uncommon to see claims here on HN of 10x improvements in productivity, or teams of dozens of people being axed, but nothing in the article or in my experience lines up with that.
My workflow puts LLM chat at my fingertips, and I can control the context. Pretty much any text in emacs can be sent to a LLM of your choice via API.
Aider is even better: it does a bunch of tricks to improve performance, and is rapidly becoming a "must have" benchmark for LLM coding. It integrates with git, so each chat modification becomes a new git commit. Easy to undo changes, redo changes, etc. It also has a bunch of hacks because, while o1 is good at reasoning, it (apparently) doesn't do code modification well. Aider will send different types of requests to different "strengths" of LLMs, etc. Although if you can use Sonnet, you can just use that and be done with it.
It's pretty good, but ultimately it's still just a tool for transforming words into code. It won't help you think or understand.
I feel bad for new kids who won't develop the muscle memory and eye for reading and writing code. Because you still need to read and write code, and can't rely on the chat interface for everything.
LLMs are just a life saver. Literally.
They take my coding time down from weeks to an afternoon, sometimes less. And they're kind.
I'm trying to write a baseball simulator on my own, as a stretch goal. I'm writing my own functions now, a step up for me. The code is to take in real stats, do Monte Carlo, get results. Basic stuff. Such a task was impossible for me before LLMs. I've tried it a few times. No go. Now with LLMs, I've got the skeleton working and should be good to go before opening day. I'm hoping that I can use it for some novels that I am writing to get more realistic stats (don't ask).
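For what it's worth, the Monte Carlo core of a task like that can be tiny. Here's a toy Go sketch (entirely invented numbers, not the commenter's simulator) that estimates how often a .300 hitter gets two or more hits in four at-bats:

    package main

    import (
        "fmt"
        "math/rand"
    )

    func main() {
        const trials = 100000
        const hitProb = 0.300 // made-up batting average standing in for real stats
        twoPlus := 0
        for t := 0; t < trials; t++ {
            hits := 0
            for ab := 0; ab < 4; ab++ {
                if rand.Float64() < hitProb {
                    hits++
                }
            }
            if hits >= 2 {
                twoPlus++
            }
        }
        fmt.Printf("approx P(2+ hits in 4 ABs): %.3f\n", float64(twoPlus)/trials)
    }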
I know a lot of HN is very dismissive of LLMs as code help. But to me, a non programmer, they've opened it up. I can do things I never imagined that I could. Is it prod ready? Hell no, please God no. But is it good enough for me to putz with and get just working? Absolutely.
I've downloaded a bunch of free ones from huggingface and Meta just to be sure they can't take them away from me. I'm never going back to that frustration, that 'Why can't I just be not so stupid?', that self-hating, that darkness. They have liberated me.
I have to say that I am impressed with sketch.dev: it got me a working example on the first try, and it looked cleaner than all the others, similar but cleaner somehow in terms of styling.
The whole time I was using those tools, I was thinking that I want exactly this: an LLM trained specifically on the official Go documentation, or whatever your favourite language is, ideally fine-tuned by the maintainers of the language.
I want the LLM to show me an idiomatic way to write an API using the standard library. I don't necessarily want it to do it instead of me, or to be trained on all of the data they could scrape. Show me a couple of examples, maybe explain a concept, give me step-by-step guidance.
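For the curious, the kind of standard-library-only example being asked for can be very small. A minimal sketch (hypothetical endpoint, not from any official docs):

    package main

    import (
        "encoding/json"
        "log"
        "net/http"
    )

    func main() {
        // A tiny JSON endpoint using only the standard library.
        http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
            w.Header().Set("Content-Type", "application/json")
            json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }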
I also share his frustrations with the chat-based approach. What annoys me personally the most is the anthropomorphization of the LLMs; yesterday Gemini was even patronizing me...
Hot take of the day, I think making tests and refactors easier is going to be revolutionary for code quality.