At least 95% of the code was generated by AI (I reached the limit so had to add final bits on my own).
POCs and demos are easy to build by anyone these days. The last 10% is what separates student projects from real products.
Any engineer who has spent time in the trenches understands that fixing corner cases in code produced by inexperienced engineers consumes a lot of time.
In fact, poor overall design and lack of diligence can tank entire projects.
There’s a daily 2.5 million token limit that you can use up fairly quickly with a 100K context window.
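(Quick arithmetic: at roughly 100K tokens per full-context exchange, 2.5 million tokens works out to only about 25 such exchanges a day, so the ceiling arrives fast.)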
So they may very well have completed the whole program with Claude. It’s just the machine literally stopped and the human had to do the final grunt work.
What stops you from using AI to explain the code base?
I can't think of a worse llm than Claude.
If Claude AI can't create something from my prompts or its own suggestions, that's on Claude. Maybe it was my new account on that day at that time. There was a 10-response limit too, which made it not worth bothering with.
First account I ever actually deleted instead of just never going back. It was that bad.
So pretty simple flow, totally not scalable for bigger projects.
I need to read and check Cursor AI which can also use Claude models.
In Django I had it create a backend, set up an admin user, create requirements.txt, and then do a whole frontend in Vue as a test. It can even do screen testing, and it tested what happens if it enters a wrong login.
I am looking forward to this type of real time app creation being added into our OSs, browsers, phones and glasses.
What do you see that being used for?
Surely, polished apps written for others are going to be best built in professional tools that live independently of whatever the OS might offer.
So I assume you're talking about quick little scratch apps for personal use? Like an AI-enriched version of Apple's Automator or Shortcuts, or of shell scripts, where you spend a while coaching an AI to write the little one-off program you need instead of visually building a workflow or writing a simple script? Is that something you believe there's a high unmet need for?
This is an earnest question. I'm sincerely curious what you're envisioning and how it might supersede the rich variety of existing tools that seem to only see niche use today.
Once a class was full, you could still get in if someone who was selected for the classes changed their mind, which (at an unpredictable time) would result in a seat becoming available in that class until another student noticed the availability and signed up.
So I wrote a simple PHP script that loaded the page every 60 seconds to check, and the script would send me a text message if any of the classes I wanted suddenly had an opening. I would then run to a computer and try to sign up.
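For flavor, this is roughly what that kind of throwaway watcher looks like; a minimal sketch in TypeScript rather than the original PHP, with the URL, the "class is full" marker text, and the notification endpoint all made up for illustration:

  // Poll a registration page and notify when a seat opens up.
  const PAGE_URL = "https://registrar.example.edu/section/12345"; // placeholder
  const FULL_MARKER = "Class is full";                             // placeholder marker text
  const NOTIFY_URL = "https://example.com/notify";                 // placeholder SMS/push gateway

  async function checkOnce(): Promise<void> {
    const res = await fetch(PAGE_URL);
    const html = await res.text();
    if (!html.includes(FULL_MARKER)) {
      // A seat opened up: fire a notification and let the human race to sign up.
      await fetch(NOTIFY_URL, { method: "POST", body: "Seat open - go sign up now!" });
    }
  }

  // Poll every 60 seconds, logging failures instead of crashing.
  setInterval(() => checkOnce().catch(console.error), 60_000);

Same idea as the old PHP cron-style loop, just a dozen lines of glue around a fetch call.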
These are the kind of bespoke, single-purpose things that I presume AI coding could help the average person with.
“Send me a push notification when the text on this webpage says the class isn’t full, and check every 60 seconds”
Yahoo Pipes. Definitely useful, definitely easier for some folks to string together basic operations into something more complex, but really ends up being for locally/personally consumed one-offs.
Ask a bird what flying is good for and their answer will be encumbered by reality.
Kind of the opposite of “everything looks like a nail”.
Two ideas: "For every picture of food I take, create a recipe to recreate it so I can make it at home in the future" or "Create an app where I can log my food for today and automatically calculate the calories based on the food I put in".
tl;dr it takes running untrusted code to a new level.
Since the original desire was to build any kind of personal/commercial app on an OS, the number of potential vulnerabilities is effectively unbounded.
People have always been able to slip in errors. I am confused why we assume that an LLM will on average be worse rather than better on this front, and I suspect a lot of residual human bias and copium.
We’re getting there with some of the smaller open source models, but we’re not quite there yet. I’m looking forward to where we’ll be in a year!
In many professions, $5000 for tools is almost nothing.
Regardless of the reasons, tooling in the ~$5,000 per ~3 years ballpark is not at all a high or unique cost for a profession.
If you want to pay that <$1k up front just to say "it was always just on my machine, nobody else's", then more power to you. Most just prefer this "pay as you go for someone else to have set it up" model. That doesn't imply it's unattainable if you want to run it differently though.
I know we all love dunking on how expensive Apple computers are, but for $5000 you would be getting a maxed-out Mac Mini with an M4 Pro chip: 14‑core CPU, 20‑core GPU, 16-core Neural Engine, 64GB of unified memory, an 8TB SSD, and 10 Gigabit Ethernet.
M4 MacBook Pros start at $1599.
What I think GP was overlooking is that newer mid-range models like Qwen2.5-Coder 32B produce more than usable output for this kind of scenario on much lower-end consumer (rather than prosumer) hardware, so you don't need to go looking for high-memory machines to do this kind of task locally, even if you may need them for serious AI workloads or training.
It's even documented on their site:
https://support.anthropic.com/en/articles/9519189-project-vi...
Click the "Share" button in the upper right corner of your chat.
Click the "Share & Copy Link" button to create a shareable link and add the chat snapshot to your project’s activity feed.
/edit: I just checked. I think they had a regression? Or at least I cannot see the button anymore. Go figure. It must be pretty recent, as I shared a chat just ~2-3 weeks ago.

I’ve been using Sonnet 3.5 to code and I’ve managed to build multiple full-fledged apps, including paid ones.
Maybe they’re not perfect, but they work and I’ve had no complaints yet. They might not scale to become the next Facebook, but not everything has to scale
MetHacker.io (has a lot of features I had to remove because of X API’s new pricing - see /projects on it)
GoUnfaked.com
PlaybookFM.com
TokenAI.dev (working with blowfish to remove the warning flag)
A compiler was born.
Think of Claude as a compiler that compiles natural-language instructions into functional code.
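A minimal sketch of that mental model, assuming the official @anthropic-ai/sdk TypeScript client; the model ID, prompt wording, and output file are placeholders, and whatever comes back still needs human review and iteration:

  // Toy "natural language -> code" compiler: hand Claude a spec, write back whatever it returns.
  import Anthropic from "@anthropic-ai/sdk";
  import { writeFileSync } from "node:fs";

  const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

  async function compileSpec(spec: string, outFile: string): Promise<void> {
    const msg = await client.messages.create({
      model: "claude-3-5-sonnet-20241022", // placeholder model ID
      max_tokens: 4096,
      messages: [{
        role: "user",
        content: `Write a single self-contained TypeScript file that does the following:\n${spec}\nReturn only code, no commentary.`,
      }],
    });
    const first = msg.content[0];
    if (first.type === "text") writeFileSync(outFile, first.text);
  }

  compileSpec("Fetch a URL every 60 seconds and log its HTTP status code.", "generated.ts").catch(console.error);

Unlike a real compiler, the output is only probably correct, which is why the review loop matters.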
What I find depressing is how quickly someone with minimal experience can flood the web with low-quality services in search of a quick buck. It's like all the SEO spam we've been seeing for decades, but exponentially worse. The web will become even more of a nightmare to navigate than it already is.
What is depressing to me is that the products showcased here are essentially cookie-cutter derivatives built on and around the AI hype cycle. They're barely UI wrappers around LLMs marketed as something groundbreaking. So the thought of the web being flooded with these kinds of sites, in addition to the increase in spam and other AI generated content, is just depressing.
I’m able to bring ideas to life that I could only think about before
All your sites are essentially wrappers around LLMs. You don't disclose which models or APIs you use in the backend, so did you train your own models, or are you proxying to an existing service? Your T&C and Privacy Policy are generic nonsense. What happens to the data your users upload?
ThumbnailGenius.com has an example thumbnail of a person with 4 fingers. I honestly can't tell the difference in the comparison between Midjourney, Dall-E and your fine-tuned models. MetHacker.io is not loading. GoUnfaked.com claims to have royalty and copyright-free images, which is highly misleading and possibly illegal. PlaybookFM.com is another effortless wrapper around NotebookLM or a similar LLM. TokenAI.dev is non-functional and seems like a crypto+AI scam.
I'm sorry to be so negative, but if you're going to leverage AI tools, at least put some thought and effort into creating something original that brings value to the world. I think you'll find this more rewarding than what you're doing now. Quality over quantity, and all that.
So for my future projects, I told myself I will only spend at most a month working on them. Learn to launch and get users before spending months just building
There are much bigger, more creative ideas I want to tackle, but before that, I want to get the hang of actually building something from scratch
I spent almost ten years as a b2b marketer. All the clients I worked with were established businesses that needed some scale and direction.
I quickly learned that growing a 10M ARR business with established pipelines is a whole lot different than building something from scratch. This is my attempt to go from 0 to 1 as fast as possible and learn as much as I can before diving into bigger things
I agree with you there.
> There are much bigger, more creative ideas I want to tackle, but before that, I want to get the hang of actually building something from scratch
Fair enough. And good on you for doing a career shift and learning new skills. I don't want to tear down your efforts.
I just think we disagree on the approach. You don't need to ship a half-dozen cookie-cutter web sites with minimal effort. Sometimes it pays off to really think about product-market fit—something you should be familiar with as a marketer—and then spend more than a month working on that idea. You'll learn new skills along the way, and ultimately shipping something will be a much more valuable achievement. Besides, shipping for the first time should just be the start of a project. If you're really passionate about the project and building a customer base, then the hard work only starts there. Currently the impression these sites give off is that of quick cash grabs or downright scams. But good luck to you.
I can't see why any of these would seem like "scams", because I'm not asking for money for any of them except for ThumbnailGenius. Not sure how a free product can be a scam or a cash grab?
And I don't know how familiar you are with YouTube, but most serious creators typically take multiple pictures of themselves, then get multiple thumbnail variants created by an editor. A good creator will typically spend hundreds of dollars just testing out variations to see how they "look". AI just makes it easy to create the variations and visualize different ideas.
Most of the users end up visualizing 20-30 ideas, then picking 1-2 and actually creating them in real life. It's a massive time and money saver
Usually it's v0.dev for the basic UI and then just prompting Cursor.
ThumbnailGenius
There's already several sites that generate YT thumbnails with AI:
https://vidiq.com/ai-thumbnails-generator/
https://www.testmythumbnails.com/create-thumbnails-with-ai
PlaybookFM
AI Podcasts have been a thing for a while - though I can't imagine who finds listening to TTS voices with LLM content particularly engaging over a genuine person who puts time and effort into crafting an engaging personality.
GoUnfaked
I don't really understand the point of this one - it generates photorealistic AI pictures? Isn't that exactly what Getty Images AI, Freepik, etc. are all doing already?
Good luck - but this feels like a very "spray and pray" approach to development. It feels like it has the same probability to generate genuine income as people who try to start Patreon pages for their AI "artwork".
So hey, maybe it's not revolutionary, but some people find it useful enough
Which is fine by me. Maybe you can tackle changing the world. I’ll just focus on being useful enough to some people
Everybody ships nasty bugs in production that they themselves might find impossible to debug. Everybody.
Thus he will do the very same thing you, I, or anybody else on this planet would do: find a second pair of eyes, virtually or not, paying or not.
No.
Hell, I don't even see it happening in the open-source space, with dozens of eyes on years-long PRs.
It just happens.
At some point you'll write and ship a bug that you yourself can't debug in an appropriate time frame alone and needs more eyes.
The idea, as I understand it, is that with the help of AI he built apps that he would not have been able to write by himself. That means it is possible to have bugs that would be reasonable to fix for someone who built the app using their own knowledge, but that may be too hard for the junior. This is a novel situation.
Just because everyone has problems sometimes does not mean problems are all the same, all the same difficulty. Like if I was building Starship, and I ran into some difficult problem, I would most likely give up, as I am way out of my league. I couldn't build a model rocket. I know nothing about rockets. My situation would not be the same as of any rocket engineer. All problems and all situations and all people are not the same, and they are not made the same by AI, despite claims to the contrary.
These simplifications/generalisations ("we are all stochastic parrots", "we all make mistakes just like the LLMs make mistakes", "we all have bugs", "we all manage somehow") are absurd. Companies do not do interviews and promote some people and not others out of a sense of whimsy. Experience and knowledge matter. We are not all interchangeable. If LLMs affect this somehow, it's to be looked at.
I can't believe LLMs or devs using LLMs can suddenly do anything, without limitations. We are not all now equal to Linus and Carmack and such.
If I do encounter situations that Sonnet can’t fix - usually because it has outdated knowledge - I just read the latest documentation
Any serious company paying serious bucks won't accept this; in 2024 they know darn well how badly software can bite back, and some of them, like banks or the whole of Silicon Valley, run their entire business on software. But it's true that there is a massive space outside such cases where this approach more or less works; I've never worked there so I can't judge.
Find a way to work around it.
The browser is a great place to build voice chat, 3d, almost any other experience. I expect a renewed interest in granting fuller capabilities to the web, especially background processing and network access.
How about we go back to thick clients? With LLMs, the effort required to do that for multiple operating systems will also be reduced, no?
Do you think you could maintain and/or debug someone else's application?
Most of the things I’ve built are fun things
See: GoUnfaked.com and PlaybookFM.com as examples
PlaybookFM.com is interesting because everything from the code to the podcasts to the logo are AI generated
Everyone on HN tells me how LLMs are horrible, yet I’ve saved literally hundreds of hours with them so far. I guess we’re just lucky.
This place is far gone. Some of the most close-minded, incurious people in tech
I don’t think this place deserves to be called “Hacker” News anymore
I played around with LLMs... Found they aren't very useful. Slapping a project together to ship isn't what hacking is about.
Hacking is about deeply understanding what is actually happening under the hood by playing around, taking it apart, rebuilding it.
When the craze started, everyone here was out looking for ways to escape the safeguards of the LLMs, or discussing how they are built, etc.
This comment thread is about using the technology to slap a thing together so you can sell it. There's no exploration of the topic at hand, there's no attempt to understand.
I'm trying to think of a decent analogy but can't, but this smacks of a certain type of friend who finds a technology you have been hacking on and makes it so unbearable that you actually just lose interest...
I am perfectly aware of the owner here, but there are usually at least one or two posts a day here that have what I call a "hacker culture".
In fact, my main reason for not doing any web development is that I find the amount of layers of abstraction and needless complexity for something that should really be simple quite deterring.
I'm sure e.g. React and GraphQL allow people to think about web apps in really elegant and scalable ways, but the learning curve is just way more than I can justify for a side project or a one-off thing at work that will never have more than two or three users opening it once every few months.
It's a slightly orthogonal way of thinking about this but if you are solving real problems, you get away with so much shit, it's unreal.
Maybe Google is not gonna let you code monkey on their monorepo, but you do not have to care. There's enough not-google in the world, and enough real problems.
Maybe I'm "holding it wrong" -- I mean using it incorrectly.
True it renders quite interesting mockups and has React code behind it -- but then try and get this into even a demoable state for your boss or colleagues...
Even a simple "please create a docker file with everything I need in a directory to get this up and running"...doesn't work.
The Dockerfile doesn't work (maybe my fault for not mentioning I'm on ARM64), the app is misconfigured, files are in the wrong directories, key things are missing.
Again just my experience.
I find Claude interesting for generating ideas-- but I have a hard time seeing how a dev with six months experience could get multiple "paid" apps out with it. I have 20 years (bla, bla) experience and still find it requires outrageous hand holding for anything serious.
Again I'm not doubting you at all -- I'm just saying me personally I find it hard to be THAT productive with it.
Going to some new place meant getting a map, looking at it, making a plan, following the plan, keeping track on the map, that sort of thing.
Then I traveled somewhere new, for the first time, with GPS and navigation software. It was quite impressive, and rather easier. I got to my destination the first time, without any problems. And each time after that.
But I did remark that I did not learn the route. The 10th time, the 50th time, I still needed the GPS to guide me. And without it, I would have to start the whole thing from scratch: get a map, make a plan, and so on.
Having done the "manual" navigation with maps lots of times before, it never worries me what I would do without a GPS. But if you're "born" with the GPS, I wonder what you do when it fails.
Are you not worried how you would manage your apps if for some reason the AIs were unavailable?
Make hay while the sun shines, friends. It might not last forever, but neither will you!
I think that's because with a map you are looking at street signs/names, etc., both in advance to plan the route, and much more actively and intently while driving to figure out "do I turn here", and you just remember that stuff. Whereas a GPS says "turn right at the next light" and you really don't remember any context around that.
If anyone else is frustrated by this experience, I've found that changing the setting in Google Maps to have the map always point north has helped me with actually building a mental model of directions. I found that instead of just following the line, it forced me to think about whether I'm going north, south, east, or west for each direction.
It's the same reason I hate that trains in Germany now only show when the next train comes. When I was a kid they would show the time and optionally the delay. I always knew when each train was coming because you learn the schedule automatically. Now it's impossible to automatically build that mental model.
Prior to an iPhone I’d have the general lay of a city memorised within 10min of landing, using a paper tourist map, and probably never feel disoriented, let alone lost.
This morning I walked 2 blocks further than needed (of a 1 block walk) because I wasn’t at all oriented while following Google maps.
I won’t spell out the AI comparison, other than to say I think more “apps” will be created, along with predictable “followed the GPS off a bridge” revelations.
I never worried about what would happen if the internet were to become unavailable. Given that it’s become an essential service, I just trust that the powers that be will make sure to get it back up.
Python/JS and their ecosystem replacing OS hosted C/C++ which replaced bare metal Assembly which replaced digital logic which replaced analog circuits which replaced mechanical design as the “standard goto tool” for how to create programs.
Starting with punchcard looms and Ada Lovelace maybe.
In every case we trade resource efficiency and lower level understanding for developer velocity and raise the upper bound on system complexity, capability, and somehow performance (despite the wasted efficiency).
>I played around a lot with code when I was younger. I built my first site when I was 13 and had a good handle on Javascript back when jQuery was still a pipe dream.
>Started with the Codecademy Ruby track which was pretty easy. Working through RailsTutorial right now.
posted on April 15, 2015, https://news.ycombinator.com/item?id=9382537
>I've been freelancing since I was 17. I've dabbled in every kind of online trade imaginable, from domain names to crypto. I've built and sold multiple websites. I also built and sold a small agency.
>I can do some marketing, some coding, some design, some sales, but I'm not particularly good at any of those in isolation.
posted on Jan 20, 2023, https://news.ycombinator.com/item?id=34459482
So I don't really understand where this claim of only "6 months of coding experience" is coming from, when you clearly have been coding on and off for multiple decades.
I trust experienced people can make better use of these tools because ideally they have a foundation of first principles to work from, whereas inexperienced people jumping straight into LLMs may not fully understand what is happening or what they are given.
My first real coding experience was when I joined a bootcamp (Code.in bootcamp) in 2022. Only reason I could stick around this time was because I had a chunk of change after selling my agency and had nothing else to do
I’m a humanities grad, for what it’s worth
Started off with having it create funny random stories, to slowly creating more and more advanced programs.
It’s shocking how good 3.5 Sonnet is at coding, considering the size of the model.
We don't know the size of Claude 3.5 Sonnet or any other Anthropic model.
Next obvious steps: make it understand large existing programs, learn from the style of the existing code while avoiding learning the bad style where it's present, and then contribute features or fixes to that codebase.
https://github.com/williamcotton/search-input-query
Why multi-pass? So multiple semantic errors can be reported at once to the user!
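To illustrate the idea only (a hedged sketch, not the repo's actual code): a separate semantic pass can walk the whole parse tree and collect every problem instead of bailing at the first one.

  // Hypothetical AST and checker, just to show error accumulation across a whole tree.
  type Expr =
    | { kind: "term"; field: string; value: string }
    | { kind: "and" | "or"; left: Expr; right: Expr };

  interface SemanticError { message: string; field: string }

  const KNOWN_FIELDS = new Set(["title", "author", "date"]);

  function checkFields(expr: Expr, errors: SemanticError[] = []): SemanticError[] {
    if (expr.kind === "term") {
      if (!KNOWN_FIELDS.has(expr.field)) {
        errors.push({ message: `Unknown field "${expr.field}"`, field: expr.field });
      }
    } else {
      checkFields(expr.left, errors);
      checkFields(expr.right, errors);
    }
    return errors; // every invalid field in the query, reported together
  }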
The most important factor here is that I've written lexers and parsers beforehand. I was very detailed in my instructions and put it together piece-by-piece. It took probably 100 or so different chats.
Try it out with the GUI you see in the gif in the README:
git clone git@github.com:williamcotton/search-input-query.git
cd search-input-query/search-input-query-demo
npm install
npm run dev
The first prompt (with o1) will get you 60% there, but then you have a different workflow. The prompts can get to a local minimum, where claude/gpt4/etc.. just can't do any better. At which point you need to climb back out and try a different approach.
I recommend git branches to keep track of this. Keep a good working copy in main, and anytime you want to add a feature, make a branch. If you get it almost there, make another branch in case it goes sideways. The biggest issue with developing like this is that you are not a coder anymore; you are a puppet master of a very smart and sometimes totally confused brain.
This is one fact that people seem to severely under-appreciate about LLMs.
They're significantly worse at coding in many aspects than even a moderately skilled and motivated intern, but for my hobby projects, until now I haven't had any intern that would even so much as take a stab at some of the repetitive or just not very interesting subtasks, let alone stick with them over and over again without getting tired of it.
Eh, I would argue that the apparent lower knowledge requirement is an illusion. These tools produce non-working code more often than not (OpenAI's flagship models are not even correct 50% of the time[1]), so you still have to read, understand and debug their output. If you've ever participated in a code review, you'll know that doing that takes much more effort than actually writing the code yourself.
Not only that, but relying on these tools handicaps you into not actually learning any of the technologies you're working with. If you ever need to troubleshoot or debug something, you'll be forced to use an AI tool for help again, and good luck if that's a critical production issue. If instead you take the time to read the documentation and understand how to use the technology, perhaps even with the _assistance_ of an AI tool, then it might take you more time and effort upfront, but this will pay itself off in the long run by making you more proficient and useful if and when you need to work on it again.
I seriously don't understand the value proposition of the tools in the current AI hype cycle. They are fun and useful to an extent, but are severely limited and downright unhelpful at building and maintaining an actual product.
I find it really helpful where I don't know a library very well but can assess if the output works.
More generally, I think you need to give it pretty constrained problems if you're working on anything relatively complicated.
It's quite honestly mystifying to me.
It's simply not the case that we need to be experts in every single part of a software project. Not for personal projects and not for professional ones either. So it doesn't make any sense to me not to use AI if I've directly proven to myself that it can improve my productivity, my understanding and my knowledge.
> If you ever need to troubleshoot or debug something, you'll be forced to use an AI tool for help again
This is proof to me that you haven't used AI much. Because AI has helped me understand things much quicker and with much less friction than I've ever been able to before. And I have often been able to solve things AI has had issues with, even if it's a topic I have zero experience with, through the interaction with the AI.
At some point, being able to make progress (and how that affects the learning process) trumps this perfect ideal of the programmer who figures out everything on their own through tedious, mind-numbing long hours solving problems that are at best tangential to the problems they were actually trying to solve hours ago.
Frankly, I'm tired of not being able to do any of my personal projects because of all the issues I've mentioned before. And I'm tired of people like you saying I'm doing it wrong, DESPITE ME NOT BEING ABLE TO DO IT AT ALL BEFORE.
Honestly, fuck this.
You're right that I've probably used these tools much less than you have. I use them occasionally for minor things (understanding an unfamiliar API, giving me hints when web searching is unhelpful, etc.), but even in my limited experience with current state-of-the-art services (Claude 3.5, GPT-4o) I've found them to waste my time in ways I wouldn't if I weren't using them. And at the end of the day, I'm not sure if I'm overall more productive than I would be without them. This limited usage leads me to believe that the problem would be far worse if I were to rely on them for most of my project, but the truth is I haven't actually tried that yet.
So if you feel differently, more power to you. There's no point in getting frustrated because someone has a different point of view than you.
It's like people are learning about these new things called skis.
They fall on their face a few times but then they find "wow much better than good old snowshoes!"
Of course some people are falling every 2 feet while trying skis and then go to the top of the mountain and claim skis are fake and we should all go back to snowshoes because we don't know about snow or mountains.
They are insulting about it because it's important to the ragers that, despite failing at skiing, they are senior programmers and everyone else doesn't know how to compile, test and review code and they must be hallucinating their ski journeys!
Meanwhile a bunch of us took the falls and learned to ski and are laughing at the ragers.
The frustrating thing though is that for all the skiers we can't seem to get good conversations about how to ski because there is so much raging... oh well.
I still use ChatGPT/Claude/Llama daily for both code generation and other things. And while it sometimes does exactly what I want and I feel more productive, it still wastes my time almost as often, and I have to give up on it and rewrite the code manually or do a Google search/read the actual documentation. It's good to bounce things off, it's good as a starting point to learn new stuff, and it gives you great direction to explore new things and test things out quickly. My guess is that on a "happy path" it gives me a 1.3x speedup, which is great when it happens, but the caveat is that you are not on a "happy path" most of the time, and if you listen to the evangelists it seems like it should be a 2x-5x speedup (skis). So where's the disconnect?
I'm not here to disprove your experience, but with 2 years of almost daily usage of skis, how come I feel like I'm still barely breaking even compared with snowshoes? Am I that bad with my prompting skills?
Rust, aider.chat and
I thoughtfully limit the context of what I'm coding (to 2 of 15 files).
I ./ask a few times to get the context set up. I let it speculate on the path ahead but rein it in with more conservative goals.
I then say "let's carefully and conservatively implement this" (this is really important with Sonnet, as it's way too eager).
I get it to compile by doing ./test a few times; there is sometimes a doom loop though, so -
I reset the context with a better footing if things are going off track or I just think "it's time".
I do not commit until I have a plausible building set of functions (it can probably handle touching 2-3 functions of configs or one complete function but don't get too much more elaborate without care and experience).
I either reset or use the remaining context to create some tests and validate.
I think saying 1.3x more productive is fair with only this loop BUT you have to keep a few things in perspective.
I wrote specs for everything I did; in other words, I wrote out in English my goals and expectations for the code. That was highly valuable and something I probably wouldn't have done.
Automatic literate programming!
Yak shaving is crazy fast with an LLM. Those tasks that would take you off in the weeds do feel 5x faster (with caveats).
I think the 2x-5x faster is true within certain bounds -
What are the things that you were psychologically avoiding /dragging or just skipping because they were too tedious to even think of?
Some people don't have that problem or maybe don't notice it; to me it's a real crazy benefit I love!
That's where the real speedups happen, and it's amazing.
I have more than 20 years of backend development and just some limited experience with frontend tech stacks. I initially tried using an LLM for the frontend in my personal project. I found that the code generated by the LLM was so good. It produced code that worked immediately with my vague prompts. It happily fixed any issue that I found, pretty quickly and correctly. I also have enough knowledge to tweak anything that I need, so at the end of the day I can see that my project works as expected. I feel really productive with it.
Then I slowly started using LLMs for my backend projects at work. And I was so surprised that the experience was completely the opposite. Both ChatGPT and Claude generated code that either followed bad practices or had flaws, or just ignored the instructions in my prompt and came back to bad solutions after just a few questions. They also failed to apply common practices from an architecture perspective. So the effort to make the code work was much more than when I did all the coding myself.
At that point, I thought there were probably more frontend projects than backend projects used to train those models, and therefore the quality of the generated frontend code was much better. But when using an LLM with another language I did not have much experience with, for another backend project, I found out why my experiences were so different, as I could now observe more clearly what is good and bad in the generated code.
In my previous backend project, as I have much more knowledge of the languages/frameworks/practices, my criteria were also higher. It is not just that the code can run; it must be extensible, properly structured, well architected, using the correct idioms... Whereas with my more limited frontend experience, the generated code worked as I expected, but possibly it also violated all these NFRs that I don't know about. It explains why, using it with a new programming language (something I don't have much experience with) in a backend project (my well-known domain), I had a mixed experience: it seemed to provide me with working code, but failed to follow good practices.
My hypothesis is that LLMs can generate code at an intermediate level, so if your experience is limited you see it as pure gold. But if your level is much higher, that generated code is just garbage. I really want to hear more from other people to validate my hypothesis, as it seems people have opposite experiences with this.
Or you're using skis on gravel. I'm a firm believer that the utility varies greatly depending on the tech stack and what you're trying to do, ranging from negative value to way more than 5x.
I also think "prompting" is a misrepresentation of where the actual skill and experience matter. It's about being efficient with the tooling. Prompting, waiting for a response and then manually copy-pasting line by line into multiple places is something else entirely from having two LLMs work in tandem, with one figuring out the solution and the other applying the diff.
Good tooling also means that there's no overhead trying out multiple solutions. It should be so frictionless that you sometimes redo a working solution just because you want to see a different approach.
Finally, you must be really active and can't just passively wait for the LLM to finish before you start analyzing the output. Terminate early, reprompt and retry. The first 5 seconds after submitting are crucial, and being able to make a decision just from seeing a few lines of code is a completely new skill for me.
Feels like a bunch of flat-earth arguments; they’d rather ignore evidence (or even trying it out for themselves) to keep the illusion that you need to write it all yourself for it to be “high quality”.
I'm not arguing that writing everything yourself leads to higher quality. I'm arguing that _in my experience_ a) it takes more time and effort to read, troubleshoot and fix code generated by these tools than it would take me to actually write it myself, and b) that taking the time to read the documentation and understand the technologies I'm working with would actually save me time and effort in the future.
You're free to disagree with all of this, but don't try to tell me my experience is somehow lesser than yours.
I give more details of one instance of this behavior using Claude 3.5 Sonnet a few weeks ago here[1]. I was asking it to implement a specific feature using a popular Go CLI library. I could probably reproduce it, but honestly can't be bothered, nor do I wish to use more of my API credits for this.
Besides, why should I have to prove anything in this discussion? We're arguing based on good faith, and just as I assume your experience is based on positive interactions, so should you assume mine is based on negative ones.
But I'll give you one last argument based on principles alone.
LLMs are trained on mountains of data from various online sources (web sites, blogs, documentation, GitHub, SO, etc.). This training takes many months and has a cutoff point sometime in the past. When you ask them to generate some code using a specific library, how can you be sure that the code is using the specific version of the library you're currently using? How can you be sure that the library is even in the training set and that the LLM won't just hallucinate it entirely?
Some LLMs allow you to add sufficient context to your prompts (with RAG, etc.) to increase the likelihood of generating working code, which can help, but still isn't foolproof, and not all services/tools allow this.
But more crucially, when you ask it to do something that the library doesn't support, the LLM will never tell you "this isn't possible" or "I don't know". It will instead proceed to hallucinate a solution because that's what it was trained to do.
And how are these state-of-the-art coding LLMs that pass all these coding challenges capable of producing errors like referencing an undefined variable? Surely these trivial bugs shouldn't be possible, no?
All of these issues were what caused me to waste more than an hour fighting with both Claude 3.5 Sonnet and GPT-4o. And keep in mind that this was a fairly small problem. This is why I can't imagine how building an entire app, using a framework and dozens of libraries, could possibly be more productive than doing it without them. But clearly this doesn't seem to be an opinion shared by most people here, so let's agree to disagree.
"You can't use LLMs for this or that because of this and that!!!".
But I AM using them. Every. Single. Day.
Definitely, but what LLMs provide me that a purely textual interface can't is discoverability.
A significant advantage of GUIs is that I get to see a list of things I can do, and the task becomes figuring out which ones are going to solve my problem. For programming languages, that's usually not the case (there's documentation, but that isn't usually as nested and context sensitive as a GUI is), and LLMs are very good at bridging that gap.
So even if an LLM provides me a broken SQL query for a given task, more often than not it's exposed me to new keywords or concepts that did in fact end up solving my problem.
A hand-crafted GUI is definitely still superior to any chat-based interface (and this is in fact a direction I predict AI models will be moving to going forward), but if nobody builds one, I'll take an LLM plus a CLI and/or documentation over only the latter any day.
Where does [1] go? In any case, try Anthropic's flagship:
91% > 50.6%
https://aider.chat/docs/leaderboards/#code-refactoring-leade...
You're reading the link wrong. They specifically picked questions that one or more models failed at. It's not representative of how often the model is wrong in general.
From the paper:
> At least one of the four completions must be incorrect for the trainer to continue with that question; otherwise, the trainer was instructed to create a new question.
In my experience of these tools, including the flagship models discussed here, this is a deal-breaking problem. If I have to waste time re-prompting to make progress, and reviewing and fixing the generated code, it would be much faster if I wrote the code from scratch myself. The tricky thing is that unless you read and understand the generated code, you really have no idea whether you're progressing or regressing. You can ask the model to generate tests for you as well, but how can you be sure they're written correctly, or covering the right scenarios?
More power to you if you feel like you're being productive, but the difficult things in software development always come in later stages of the project[1]. The devil is always in the details, and modern AI tools are just incapable of getting us across that last 10%. I'm not trying to downplay their usefulness, or imply that they will never get better. I think current models do a reasonably good job of summarizing documentation and producing small snippets of example code I can reuse, but I wouldn't trust them for anything beyond that.
[1]: https://en.wikipedia.org/wiki/Ninety%E2%80%93ninety_rule
https://github.com/williamcotton/search-input-query
https://github.com/williamcotton/guish
Both are non-trivial but certainly within the context window so they're not large projects. However, they are easily extensible due to the architecture I instructed as I was building them!
The first contains a recursive descent parser for a search query DSL (and much more).
The second is a bidirectional GUI for bash pipelines.
Both operate at the AST level, guish powered by an existing bash parser.
The READMEs have animated gifs so you can see them in action.
When the LLM gets stuck I either take over the coding myself or come up with a plan to break up the requests into smaller sized chunks with more detail about the steps to take.
It takes a certain amount of skill to use these tools, both with how the tool itself works and definitely with the expertise of the person wielding the tool!
If you have these tools code to good abstractions and good interfaces you can hide implementation details. Then you expose these interfaces to the LLM and make it easier and simpler to build on.
Like, once you've got an AST it's pretty much downhill from there to build tools that operate on said AST.
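A hedged sketch of that point (hypothetical types, not guish's real data model): once a pipeline lives as an AST, a tool that rewrites it is a small, mechanical function.

  // Hypothetical pipeline AST plus one tool that operates on it.
  interface Command { name: string; args: string[] }
  interface Pipeline { commands: Command[] }

  // Insert a new stage after the first occurrence of `afterName`, or append it.
  function insertStage(p: Pipeline, afterName: string, stage: Command): Pipeline {
    const i = p.commands.findIndex(c => c.name === afterName);
    const commands = [...p.commands];
    commands.splice(i === -1 ? commands.length : i + 1, 0, stage);
    return { commands };
  }

  // Render the AST back into a shell pipeline string.
  const render = (p: Pipeline) =>
    p.commands.map(c => [c.name, ...c.args].join(" ")).join(" | ");

  const ast: Pipeline = { commands: [{ name: "cat", args: ["log.txt"] }, { name: "wc", args: ["-l"] }] };
  console.log(render(insertStage(ast, "cat", { name: "grep", args: ["ERROR"] })));
  // -> cat log.txt | grep ERROR | wc -l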
LLMs are tools that need to be learned. Good prompts aren’t hard, but they do take some effort to build.
What a lot of people seem to assume is that you give the AI a relatively high-level prompt that's a description of features, and you get back a fully functioning app that does everything you outlined.
In my experience (and I think what you are describing here), the initial feature-based prompt will often give you (somewhat impressively) a basic functioning app. But as you start iterating on that app, the high-level feature-based prompts stop working very well pretty quickly. It then becomes more an exercise in programming by proxy — where you basically tell the AI what code to write/what changes are needed at a technical level in smaller chunks, and it saves you a lot of time by actually writing the proper syntax. The thing is, you still have to know how to program to be able to accomplish this — (arguably, you have to be a fairly decent programmer who can already reasonably break down complicated tasks into small understandable chunks).
Furthermore, if you want the AI to write good code with a solid architecture, you pretty much have to tell it what to do at a technical level from the start — for example, here I imagine the AI didn't come up with structuring things to work at the AST level on its own — you knew that would give you a solid architecture to build on, so you told it to do that.
As someone who's already a half-decent programmer, I've found this process to be a pretty significant boon to my productivity; on the other hand, beyond the basic POC app, I have a hard time seeing it living up to the marketing hype of "Anyone can build an app using AI!" that's being constantly spewed.
Same thing I do without an LLM: I try to fix it myself!
> If I have to waste time re-prompting to make progress, and reviewing and fixing the generated code, it would be much faster if I wrote the code from scratch myself.
Definitely not in the cases I'm thinking about. This extends from "build me a boilerplate webapp that calls this method every time this form changes and put the output in that text box" (which would take me hours to learn how to do in any given web framework) to "find a more concise/idiomatic way to express this chain of if-statements in this language I'm unfamiliar with" (which I just wouldn't do if I don't much care to learn that particular language).
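For context, the kind of wiring meant by that first example is roughly the following (a hypothetical vanilla-DOM sketch; the element IDs and processInput are made up, and it assumes a page with a matching input and textarea):

  // Call a method every time the form input changes and put the output in a text box.
  const input = document.querySelector<HTMLInputElement>("#query")!;
  const output = document.querySelector<HTMLTextAreaElement>("#result")!;

  // Stand-in for whatever method the app actually needs to run.
  function processInput(value: string): string {
    return value.toUpperCase();
  }

  input.addEventListener("input", () => {
    output.value = processInput(input.value);
  });

Trivial once you know the idioms of a given framework; exactly the kind of thing I'd rather not spend hours relearning per framework.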
For the UI/boilerplate part, it's easy enough to tell if things are working or not, and for crucial components I'll at least write tests myself or even try to fully understand what it came up with.
I'd definitely never expect it to get the "business logic" (if you want to call it that for a hobby project) right, and I always double-check that myself, or outright hand-write it and only use the LLM for building everything around it.
> The devil is always in the details, and modern AI tools are just incapable of getting us across that last 10%.
What I enjoy most about programming is exactly solving complicated puzzles and fixing gnarly bugs, not doing things that could at least theoretically be abstracted into a framework (that actually saves labor and doesn't just throw it in an unknown form right back at me, as so many modern ones do) relatively easily.
LLMs more often than not allow me to get to these 10% much faster than I normally would.
So my workflow is to just review every bit of code the assistant generates and sometimes I ask the assistant (I'm using Cody) to revisit a particular portion of the code. It usually corrects and spits out a new variant.
My experience has been nothing short of spectacular in using assistants for hobby projects, sometimes even for checking design patterns. I can usually submit a piece of code and ask if the code follows a good pattern under the <given> constraints. I usually get a good recommendation that clearly points out the pros and cons of the said pattern.
Then I had an idea: as it was a picture animation problem, I asked it to write it in CSS. Then I asked it to translate it to Python. Boom, it worked!
At this moment, I finally realized the value of knowing how to prompt. Most of the time it doesn't make a difference, but when things start to get complex, knowing how to speak with these assistants makes all the difference.
Outside that context, the better way to use the tools is as a superpowered Stack Overflow search. Don't know how ${library} expects you to ${thing} in ${language}? Rather than just ask "I need to add a function in this codebase which..." and paste it into your code, ask "I need an example function which uses..." and use what it spits out as an example to integrate. Then you can ask "can I do it like..." and get some background on why you can/can't/should/shouldn't think about doing it that way. It's not 100% right or applicable, especially with every ${library}, ${thing}, and ${language}, but it's certainly faster to a good answer most of the time than SO or searching. Worst case failure? You've spent a couple minutes to find you need to spend a lot of time reading through the docs to do your one-off thing yourself still.
Even worse, the LLM will never tell you it doesn't know the answer, or that what you're trying to do is not possible, but will happily produce correct-looking code. It's not until you actually try it that you will notice an error, at which point you either go into a reprompt-retry loop, or just go read the source documentation. At least that one won't gaslight you with wrong examples (most of the time).
There are workarounds to this, and there are coding assistants that actually automate this step for you, and try to automatically run the code and debug it if something goes wrong, but that's an engineering solution to an AI problem, and something that doesn't work when using the model directly.
> Worst case failure? You've spent a couple minutes to find you need to spend a lot of time reading through the docs to do you one off thing yourself still.
It's not a couple of minutes, though. How do you know you've reached the limit of what the LLM can do, vs. not using the right prompt, or giving enough context? The answer always looks to be _almost_ there, so I'm always hopeful I can get it to produce the correct output. I've spent hours of my day in aggregate coaxing the LLM for the right answer. I want to rely on it precisely because I want to avoid looking at the documentation—which sometimes may not even exist or be good enough, otherwise it's back to trawling the web and SO. If I knew the LLM would waste my time, I could've done that from the beginning.
But I do appreciate that the output sometimes guides me in the right direction, or gives me ideas that I didn't have before. It's just that the thought of relying on this workflow to build fully-fledged apps seems completely counterproductive to me, but some folks seem to be doing this, so more power to them.
If software engineering is supposed to look like this, oh boy am I happy to be retiring in a mere 17 years (fingers crossed) and not having to spend more time on such work. No way any quality complex code can come out of such an approach, and people complain about the quality of software now.
So you're basically bruteforcing development, a famously efficient technique for... anything.
There are so many small tasks that I could, but until now almost never would automate (whether it's not worth the time [1] or I just couldn't bring myself to do it as I don't really enjoy doing it). A one-off bitmask parser at work here, a proof of concept webapp at home there – it's literally opened up a new world of quality-of-life improvements, in a purely quantitative sense.
It extends beyond UI and web development too: Very often I find myself thinking that there must be a smarter way to use CLI tools like jq, zsh etc., but considering how rarely I use them and that I do already know an ineffective way of getting what I need, up until now I couldn't justify spending the hours of going through documentation on the moderately high chance of finding a few useful nuggets letting me shave off a minute here and there every month.
The same applies to SQL: After plateauing for several years (I get by just fine for my relatively narrow debugging and occasional data migration needs), LLMs have been much better at exposing me to new and useful patterns than dry and extensive documentation. (There are technical documents I really do enjoy reading, but SQL dialect specifications, often without any practical motivation as to when to use a given construct, are really not it.)
LLMs have generally been great at that, but being able to immediately run what they suggest in-browser is where Claude currently has the edge for me. (ChatGPT Plus can apparently evaluate Python, but that's server-side only and accordingly doesn't really allow interactive use cases.)
site:github.com map comparison
I guess the difference, is that my way uses dramatically less time and resources, but requires directly acknowledging the original coders instead of relying on the plagiarism-ish capabilities of reguritating something through an LLM.
Or
Can you come up easily with many things that LLMs have no clue of and hence will fail?
My only complaints are:
a) that it's really easy to hit the usage limit, especially when refactoring across a half dozen files. One thing that'd theoretically be easyish to fix would be automatically updating files in the project context (perhaps with an "accept"/"reject" prompt) so that the model knows what the latest version of your code is without having to reupload it constantly.
b) it oscillating between being lazy in really annoying ways (giving largeish code blocks with commented omissions partway through) and supplying the full file unnecessarily and using up your usage credits.
My hope is that Jetbrains give up on their own (pretty limited) LLM and partner with Anthropic to produce a super-tight IDE native integration.
Not necessarily because users can identify AI apps, but more because due to the lower barrier of entry - the space is going to get hyper-competitive and it'll be VERY difficult to distinguish your app from the hundreds of nearly identical other ones.
Another thing that worries me (because software devs in particular seem to take a very loose moral approach to plagiarism and basic human decency) is that it'll be significantly easier for a less scrupulous dev to find an app that they like, and use an LLM to instantly spin up a copy of it.
I'm trying not to be all gloom and doom about GenAI, because it can be really nifty to see it generate a bunch of boilerplate (YAML configs, dev opsy stuff, etc.) but sometimes it's hard....
People don't seem to realize that the same thing is going to happen to regular app development once AI tooling gets even easier.
Take this very post for example. Imagine an artist forum having daily front-page articles on AI, and most of the comments are curious and non-negative. That's basically what HackerNews is doing, but with developers instead. The huge culture difference is curious, and makes me happy with the posters on this site.
You attribute it to the difficulty of using AI coding tools. But such tools to cut out the programmer and make it available to the layman has always existed: libraries, game engines, website builders, and now web app builders. You also attribute it to the flooding of the markets. But the website and mobile markets are famously saturated, and yet there we continue making stuff, because we want to (and because quality things make more money).
I instead attribute it to our culture of free sharing (what one might call "plagiarism"... of ideas?!), adaptability, and curiosity. And that makes me hopeful.
There are plenty of website builder tools that will glue third party maps. Even the raw Google Maps API website will generate an HTML page with customized maps.