• caspg 7 days ago |
    I wanted to develop a simple tool to compare maps. I thought about using this opportunity to try out Claude AI for coding a project from scratch. It worked surprisingly well!

    At least 95% of the code was generated by AI (I reached the usage limit, so I had to add the final bits on my own).

    • MrMcCall 7 days ago |
      The problem is that you must understand that 95% in order to complete the last 5%.
      • negoutputeng 7 days ago |
        exactly right.

        POCs and demos are easy for anyone to build these days. The last 10% is what separates student projects from real products.

        any engineer who has spent time in the trenches understands that fixing corner cases in code produced by inexperienced engineers consumes a lot of time.

        in fact, poor overall design and lack of diligence tank entire projects.

        • MrMcCall 7 days ago |
          Sometimes it's not even inexperienced coders -- it's our own dang selves ;-)
          • SketchySeaBeast 6 days ago |
            Well, in my mind, the SketchySeaBeast of last week is inexperienced compared to the SketchySeaBeast of this one.
      • ericskiff 7 days ago |
        Interestingly, I’m pretty sure they mean they hit the token limit on Claude.

        There’s a daily 2.5 million token limit that you can use up fairly quickly with 100K context.

        So they may very well have completed the whole program with Claude. It’s just that the machine literally stopped and the human had to do the final grunt work.

        • ericskiff 7 days ago |
          We’ve been hitting this in our work and in experimentation, and I can confirm that Claude Sonnet 3.5 has gotten 100% of the way there, including working through errors and tricky problems as we tested the apps it built.
      • trash_cat 6 days ago |
        >> The problem is that you must understand that 95% in order to complete the last 5%.

        What stops you from using AI to explain the code base?

    • ipaddr 7 days ago |
      I asked Claude AI to make me an app and it refused and called it dangerous. I asked what kind of apps it could build and it suggested social media or health. So I asked it to make one, but it refused: too dangerous. I asked it to make anything... any app at all... and it refused. I told it it sucked and it said it didn't. Then I deleted my account.

      I can't think of a worse LLM than Claude.

      • fragmede 7 days ago |
        Tbh this sounds like a skill issue.
        • ipaddr 6 days ago |
          I'm having great success with OpenAI and local LLMs. I've created a semi-popular open source project that uses complex prompts to create specialized CRMs from just a few words.

          If Claude AI can't create something from my prompts or its own suggestions, that's on Claude. Maybe it was my new account, on that day, at that time. There was a 10-response limit too, which made it not worth bothering with.

          First account I ever actually deleted instead of just never going back. It was that bad.

          • fragmede 6 days ago |
            Oh, I didn't realize ChatGPT was working for you while Claude was not. It's just interesting because my experience is that Claude is better at coding than ChatGPT-4o.
      • 7thpower 6 days ago |
        There have been rumors of the system prompt changing for some services if the user had strikes on their account from earlier conversations. I wonder if you were impacted by this because what you described has not been my experience nor have I seen it discussed previously.
        • ipaddr 6 days ago |
          I signed up two weeks ago after a Hacker News story. I wanted to compare quality vs OpenAI. Completely new user, first prompt.
  • truckerbill 7 days ago |
    Cool! Did you just prompt -> copy -> paste or did you come up with some specific workflow?
    • caspg 7 days ago |
      I used a Claude AI Project to attach the requirements for the project. Then I just went with a single conversation. I specified that I wanted to do it in small steps, and then I was just doing copy -> paste until I reached the limit. I think that was because I was doing one big convo instead of attaching code to the project.

      So pretty simple flow, totally not scalable for bigger projects.

      I need to read up on and check out Cursor AI, which can also use Claude models.

      • johnisgood 7 days ago |
        I wish I could try out Cursor, but I cannot due to this bug: https://github.com/getcursor/cursor/issues/598
        • ianhawes 7 days ago |
          Have you tried a different IP address?
          • johnisgood 7 days ago |
            I have not, I am using my residential/home IP address though and I can access https://api2.cursor.sh/.
      • Omnipresent 7 days ago |
        are you able to share the link to your prompts / conversation?
    • hijinks 7 days ago |
      you can use the VSCode extension Cline to give it a task, and it uses an LLM to go out and create the app for you.

      In Django I had it create a backend, set up an admin user, create requirements.txt, and then do a whole frontend in Vue as a test. It can even do screen testing, and it tested what happens when it enters a wrong login.

  • 2024user 7 days ago |
    Claude built me a simple React app AND rendered it in its own UI - including using imports and stuff.

    I am looking forward to this type of real time app creation being added into our OSs, browsers, phones and glasses.

    • swatcoder 7 days ago |
      > I am looking forward to this type of real time app creation being added into our OSs, browsers, phones and glasses.

      What do you see that being used for?

      Surely, polished apps written for others are going to be best built in professional tools that live independently of whatever the OS might offer.

      So I assume you're talking about quick little scratch apps for personal use? Like an AI-enriched version of Apple's Automator or Shortcuts, or of shell scripts, where you spend a while coaching an AI to write the little one-off program you need instead of visually building a workflow or writing a simple script? Is that something you believe there's a high unmet need for?

      This is an earnest question. I'm sincerely curious what you're envisioning and how it might supersede the rich variety of existing tools that seem to only see niche use today.

      • cj 7 days ago |
        When I was in college (10+ years ago) there was a system that allowed you to select your classes. During the selection period, certain people had priority (people a year above you got to select first).

        Once a class was full, you could still get in if someone who was selected for the classes changed their mind, which (at an unpredictable time) would result in a seat becoming available in that class until another student noticed the availability and signed up.

        So I wrote a simple PHP script that loaded the page every 60 seconds, and the script would send me a text message if any of the classes I wanted suddenly had an opening. I would then run to a computer and try to sign up.

        These are the kind of bespoke, single-purpose things that I presume AI coding could help the average person with.

        “Send me a push notification when the text on this webpage says the class isn’t full, and check every 60 seconds”
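
        A minimal sketch of that kind of one-off watcher in Python (the URL, the "class is full" marker text, and the ntfy.sh topic are all hypothetical placeholders):

          import time
          import urllib.request

          PAGE_URL = "https://example.edu/course/CS101"  # hypothetical page to watch
          FULL_TEXT = "Class is full"                    # marker shown while no seat is open
          NTFY_URL = "https://ntfy.sh/my-class-alerts"   # hypothetical ntfy.sh push topic

          while True:
              html = urllib.request.urlopen(PAGE_URL).read().decode("utf-8", errors="replace")
              if FULL_TEXT not in html:
                  # A seat opened up: POSTing a body to an ntfy topic sends a push notification
                  req = urllib.request.Request(NTFY_URL, data=b"A seat just opened up!", method="POST")
                  urllib.request.urlopen(req)
                  break
              time.sleep(60)  # wait 60 seconds and check again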

        • mgkimsal 7 days ago |
          This sort of thing needs to be built in-OS or in-device, or whatever term we want to use to signify that the agent has to act as me to do it. Scripting a browser that already has my saved credentials to do something for me, running on-device, is where more things have to go, vs. external third-party services where we need to continually handle external auth protocols.
        • thwarted 6 days ago |
          > "Send me a push notification when the text on this webpage says the class isn’t full, and check every 60 seconds"

          Yahoo Pipes. Definitely useful, and it definitely made it easier for some folks to string together basic operations into something more complex, but it really ended up being for locally/personally consumed one-offs.

      • nkingsy 7 days ago |
        Hard to say, as someone who already has the power.

        Ask a bird what flying is good for and their answer will be encumbered by reality.

        Kind of the opposite of “everything looks like a nail”.

      • bdcravens 7 days ago |
        There's no shortage of applications, both desktop and mobile, that never really stray outside of the default toolkits. Line of business apps, for instance, don't need the polish that apps targeting consumers need. They just need to effectively manipulate data.
      • 2024user 6 days ago |
        Yeah I was talking about small apps/services for personal use rather than professional applications built to serve a business need.

        Two ideas: "For every picture of food I take, create a recipe to recreate it so I can make it at home in the future" or "Create an app where I can log my food for today and automatically calculate the calories based on the food I put in".

    • croes 7 days ago |
      That will be a whole new level of malware attack surface.
      • mmsc 7 days ago |
        Can you expand on what you mean by this, and why?
        • danieldk 7 days ago |
          The best vulnerability is one that is hard to detect because it looks like a bug. It's not inconceivable to train an LLM to silently slip vulnerabilities into generated code. Someone who does not have a whole lot of programming experience is unlikely to detect it.

          tl;dr it takes running untrusted code to a new level.

          • caspg 7 days ago |
            WebAssembly sandboxes might come in handy.
            • troupo 7 days ago |
              That guards against a small subset of vulnerabilities.

              Since the original desire was to build any kind of personal/commercial app on an OS, the number of potential vulnerabilities is practically infinite.

              • sdesol 7 days ago |
                This is ultimately why I believe Microsoft and Apple will be the big winners. I suspect a lot of companies will want Microsoft and Apple to sign off on things, and they are going to make sure they get their cut. We may need a new layer above existing operating systems in the future to safeguard things.
          • jstummbillig 6 days ago |
            Meh. Why would the model makers not be fantastic at security? The motivation to not be the company known to "silently slip vulnerabilities into generated code" seems fairly obvious.

            People have always been able to slip in errors. I am confused why we assume that an LLM will on average be not better but worse on this front, and I suspect a lot of residual human bias and copium.

      • meiraleal 7 days ago |
        Every new tech is a new attack surface.
  • bikamonki 7 days ago |
    Could this be used to RPA my browser? Is it safe?
    • caspg 7 days ago |
      What is RPA? Robotic Process Automation? If yes then I have no experience with that.
  • jckahn 7 days ago |
    This sort of thing will be interesting to me once it can be done with fully local and open source tech on attainable hardware (and no, a $5,000 MacBook Pro is not attainable). Building a dependence on yet another untrustworthy AI startup that will inevitably enshittify isn’t compelling despite what the tech can do.

    We’re getting there with some of the smaller open source models, but we’re not quite there yet. I’m looking forward to where we’ll be in a year!

    • Veuxdo 7 days ago |
      > and no, a $5,000 MacBook Pro is not attainable

      In many professions, $5000 for tools is almost nothing.

      • cpursley 7 days ago |
        Yep. A typical landscape crew rolls with $50k in equipment (maybe more). People push back on tooling pricing (especially when the tooling is "soft") but have no clue that the cost of doing biz is huge in other industries.
      • torginus 7 days ago |
        Yeah, but those tools don't get obsoleted in 3 years.
        • zamadatix 7 days ago |
          You're pretty lucky if the specialised tools for your profession cost <$2,000/y to replace and maintain. Sometimes tools last many years but cost an order of magnitude more anyway. Sometimes tools require expensive maintenance after purchase. Sometimes they are obsolete in a short number of years. Sometimes they wear out quickly with use. Sometimes (often) it's a mix of the above.

          Regardless of the reasons, any tooling in the ~$5,000/~3 year ballpark is not at all a high or unique number for a profession.

        • fragmede 7 days ago |
          high end CAD design software, the kind used to design SpaceX rocket engines, costs tens of thousands of dollars per seat per year.
    • sigmar 7 days ago |
      I like open source and reproducible methods too, but here the code was written by Claude and then exported. Is that considered a dependency? They can find a different LLM, or pay someone, to improve/revise/extend the code later if necessary.
    • zamadatix 7 days ago |
      The nice thing is it doesn't really matter all too much which you use "today"; you can take the same inputs to any of them and the outputs remain complete forever. If the concern is that you'll start using these tools, like them, start using them a lot, and then suddenly all hosted options to run a query disappear tomorrow (meaning being able to run locally is important to you), then Qwen2.5-Coder 32B with a 4-bit quant will run at 30+ tokens/second and give you many years of use for <$1k in hardware.

      If you want to pay that <$1k up front just to say "it was always just on my machine, nobody else's", then more power to you. Most just prefer this "pay as you go for someone else to have set it up" model. That doesn't imply it's unattainable if you want to run it differently, though.
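
      For what it's worth, a minimal local setup along those lines might look like this with llama-cpp-python (the GGUF filename, quant choice, and context size here are assumptions, not a tested recipe):

        from llama_cpp import Llama  # pip install llama-cpp-python

        # Hypothetical local path to a 4-bit (Q4_K_M) GGUF quant of Qwen2.5-Coder-32B-Instruct
        llm = Llama(model_path="qwen2.5-coder-32b-instruct-q4_k_m.gguf", n_ctx=8192)

        out = llm.create_chat_completion(
            messages=[{"role": "user", "content": "Write a binary search in Python."}]
        )
        print(out["choices"][0]["message"]["content"])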

    • phony-account 7 days ago |
      > (and no, a $5,000 MacBook Pro is not attainable)

      I know we all love dunking on how expensive Apple computers are, but for $5000 you would be getting a Mac Mini maxed out with an M4 Pro chip with a 14-core CPU, 20-core GPU, and 16-core Neural Engine, 64GB of unified memory, an 8TB SSD, and 10 Gigabit Ethernet.

      M4 MacBook Pros start at $1599.

      • zamadatix 7 days ago |
        I get where GP is coming from and it's not really related to typical Apple price bashing. You can list the most fantastical specs for the craziest value and it all really comes down to that single note: "64 GB memory for the GPU/NPU" - which is where the Mini caps out. The GPU/NPU might change the speed of the output by a linear factor, but the memory is a hard wall on how good a model you can run, and 64 GB total is surprisingly not that high in the AI world. The MacBook Pro units referenced at $5k are the ones that support 128 GB, hence why they are popularly mentioned; it's about the same money for a Mac Studio when you minimally load it up to 128 GB. Even then you're not able to run the biggest local models (128 GB still isn't enough), but you can at least run the mid-sized ones unquantized.

        What I think GP was overlooking is newer mid range models like Qwen2.5-Coder 32B produce more than usable outputs for this kind of scenario on much lower end consumer (instead of prosumer) hardware so you don't need to go looking for the high memory stuff to do this kind of task locally, even if you may need the high memory stuff for serious AI workloads or AI training.

  • yieldcrv 7 days ago |
    I wish Claude let you share conversations more easily. I’d be curious to see how this one went and what follow-on questions you had.
    • ffsm8 7 days ago |
      huh? there should be a button on the top right to generate a share link in any conversation? is that really too hard?

      it's even documented on their site

      https://support.anthropic.com/en/articles/9519189-project-vi...

          Click the "Share" button in the upper right corner of your chat.
      
          Click the "Share & Copy Link" button to create a shareable link and add the chat snapshot to your project’s activity feed.
      
      
      /edit: i just checked. i think they had a regression? or at least i cannot see the button anymore. go figure. must be pretty recently, as i shared a chat just ~2-3 weeks ago
      • raldi 7 days ago |
        Note the section you’re in at that doc link: “Claude for Work (Team & Enterprise Plans) -> Team & Enterprise Plan Features -> Project visibility and sharing”
        • ffsm8 6 days ago |
          Huh, did they remove it from the normal subscription then? I've never had a team & enterprise plan.
  • bowsamic 7 days ago |
    I’ve had insanely, shockingly good experiences prototyping a musical web app with tone.js, using Claude with Copilot
  • spaceman_2020 7 days ago |
    I have about 6 months of coding experience. All I really knew was how to build a basic MERN app

    I’ve been using Sonnet 3.5 to code and I’ve managed to build multiple full fledged apps, including paid ones

    Maybe they’re not perfect, but they work and I’ve had no complaints yet. They might not scale to become the next Facebook, but not everything has to scale

    • njtransit 7 days ago |
      Can you share some examples?
      • spaceman_2020 7 days ago |
        ThumbnailGenius.com (has a ton of new features I haven’t pushed yet as I wait for approval from a payment processor)

        MetHacker.io (has a lot of features I had to remove because of X API’s new pricing - see /projects on it)

        GoUnfaked.com

        PlaybookFM.com

        TokenAI.dev (working with blowfish to remove the warning flag)

        • imiric 7 days ago |
          Good job, I suppose, but the existence of all of these, and the fact you're able to pump them out so quickly, is genuinely depressing.
          • 1024core 6 days ago |
            There was a time when mathematicians wrote LISP programs and other humans translated them into machine instructions. Then one day someone wrote a LISP program to do this, and had one of the translators translate it.

            A compiler was born.

            Think of Claude as a compiler which compiles NLP text instructions into functional code.

            • imiric 6 days ago |
              I don't mind tools that empower programmers or even less technical people to build products. I use these tools myself in minor ways, even though I find them to be more of a nuisance than actually helpful.

              What I find depressing is how quickly someone with minimal experience can flood the web with low-quality services in search of a quick buck. It's like all the SEO spam we've been seeing for decades, but exponentially worse. The web will become even more of a nightmare to navigate than it already is.

          • azan_ 6 days ago |
            Could you please explain why? I'm trying to think about how it is depressing and can't come up with anything.
            • eastbound 6 days ago |
              Because we don’t believe it’s equal in quality to our own work, so we see cheap competition arriving with swathes of bad products, but no way for customers to distinguish what makes quality. Plus we all create bugs anyway.
              • imiric 6 days ago |
                It's not really that. The quality of these tools will probably increase, and I'm fine with more competition, and with less experienced developers being empowered to build their own products.

                What is depressing to me is that the products showcased here are essentially cookie-cutter derivatives built on and around the AI hype cycle. They're barely UI wrappers around LLMs marketed as something groundbreaking. So the thought of the web being flooded with these kinds of sites, in addition to the increase in spam and other AI generated content, is just depressing.

                • grugagag 6 days ago |
                  I find that part depressing as well; like, who would even listen to gen-AI podcasts? Not even vetted by a person, but just pumped out as filler like it’s some kind of soil fertilizer. There is already so much good human-made content on the web for nearly free if you only look. No doubt this AI slop will get in our way even if we don’t want it, but think of the effect this slop is going to have on the younger generation.
          • spaceman_2020 6 days ago |
            Why is it depressing? I find it exhilarating

            I’m able to bring ideas to life that I could only think about before

            • imiric 6 days ago |
              Honestly? These were ideas that you put a lot of thought into?

              All your sites are essentially wrappers around LLMs. You don't disclose which models or APIs you use in the backend, so did you train your own models, or are you proxying to an existing service? Your T&C and Privacy Policy are generic nonsense. What happens to the data your users upload?

              ThumbnailGenius.com has an example thumbnail of a person with 4 fingers. I honestly can't tell the difference in the comparison between Midjourney, Dall-E and your fine-tuned models. MetHacker.io is not loading. GoUnfaked.com claims to have royalty and copyright-free images, which is highly misleading and possibly illegal. PlaybookFM.com is another effortless wrapper around NotebookLM or a similar LLM. TokenAI.dev is non-functional and seems like a crypto+AI scam.

              I'm sorry to be so negative, but if you're going to leverage AI tools, at least put some thought and effort into creating something original that brings value to the world. I think you'll find this more rewarding than what you're doing now. Quality over quantity, and all that.

              • spaceman_2020 6 days ago |
                These are not the things I wanted to create, but it's better to ship something than to waste months just building and never shipping. I did that with MetHacker which, under the hood, is very capable. But because I spent so much time building it, I never got around to marketing it or monetizing it, so much of it is abandoned and only 1/10th of it is live for end users.

                So for my future projects, I told myself I will only spend at most a month working on them. Learn to launch and get users before spending months just building.

                There are much bigger, more creative ideas I want to tackle, but before that, I want to get the hang of actually building something from scratch

                I spent almost ten years as a b2b marketer. All the clients I worked with were established businesses that needed some scale and direction.

                I quickly learned that growing a 10M ARR business with established pipelines is a whole lot different than building something from scratch. This is my attempt to go from 0 to 1 as fast as possible and learn as much as I can before diving into bigger things

                • imiric 6 days ago |
                  > it's better to ship something than to waste months just building and never shipping.

                  I agree with you there.

                  > There are much bigger, more creative ideas I want to tackle, but before that, I want to get the hang of actually building something from scratch

                  Fair enough. And good on you for doing a career shift and learning new skills. I don't want to tear down your efforts.

                  I just think we disagree on the approach. You don't need to ship a half-dozen cookie-cutter websites with minimal effort. Sometimes it pays off to really think about product-market fit (something you should be familiar with as a marketer) and then spend more than a month working on that idea. You'll learn new skills along the way, and ultimately shipping something will be a much more valuable achievement. Besides, shipping for the first time should just be the start of a project. If you're really passionate about the project and building a customer base, then the hard work only starts there. Currently the impression these sites give off is of quick cash grabs or downright scams. But good luck to you.

                  • spaceman_2020 6 days ago |
                    Well, I've had dozens of signups every day for ThumbnailGenius despite 0 marketing. Whatever I'm doing, people seem to like it. I had way more data on it about thumbnails (including an index of 100k thumbnails) but had to remove that because YouTube didn't like it.

                    I can't see why any of these would seem like "scams" because I'm not asking for money for any of them except ThumbnailGenius. Not sure how a free product can be a scam or a cash grab?

                    And I don't know how familiar you are with YouTube, but most serious creators typically take multiple pictures of themselves, then get multiple thumbnail variants created by an editor. A good creator will typically spend hundreds of dollars just testing out variations to see how they "look". AI just makes it easy to create the variations and visualize different ideas.

                    Most of the users end up visualizing 20-30 ideas, then picking 1-2 and actually creating them in real life. It's a massive time and money saver

        • lucianbr 7 days ago |
          Thank you for sharing these. So many people talk in superlative terms about the stuff they did with AI and give no details. It's very hard to gauge what they actually achieved.
        • tchock23 7 days ago |
          Are you sharing your process anywhere (like on YouTube)? I’d be really curious to see behind the scenes of how you’re building these types of products. For example, are you just using Claude or Claude with Cursor (or something else)?
          • spaceman_2020 6 days ago |
            I think I should create a video showing it

            Usually it's v0.dev for the basic UI and then just prompting Cursor

        • vunderba 6 days ago |
          With the lower barrier to entry come dozens of "apps" that are nearly indistinguishable from each other.

          ThumbnailGenius

          There are already several sites that generate YT thumbnails with AI:

          https://vidiq.com/ai-thumbnails-generator/

          https://www.testmythumbnails.com/create-thumbnails-with-ai

          PlaybookFM

          AI Podcasts have been a thing for a while - though I can't imagine who finds listening to TTS voices with LLM content particularly engaging over a genuine person who puts time and effort into crafting an engaging personality.

          GoUnfaked

          I don't really understand the point of this one - it generates photorealistic AI pictures? Isn't that exactly what Getty Images AI, Freepik, etc. are all doing already?

          Good luck - but this feels like a very "spray and pray" approach to development. It feels like it has the same probability of generating genuine income as people who start Patreon pages for their AI "artwork".

          • spaceman_2020 6 days ago |
            I have at least 10 people signing up for thumbnailgenius every day when I haven’t even started marketing it

            So hey, maybe it's not revolutionary, but some people find it useful enough

            Which is fine by me. Maybe you can tackle changing the world. I’ll just focus on being useful enough to some people

        • njtransit 6 days ago |
          Thanks. That is interesting. Did you use AI for just the front end components, or were you also able to reliably use the output for the ML portions, or were those simply offloaded to other services via API call?
        • julianeon 6 days ago |
          The most impressive part of this is your ideas, btw. I don't think LLMs are nearly as effective for most people, because they can't think of use cases.
    • yodsanklai 7 days ago |
      What do you do if your app has a bug that your LLM isn't able to fix? Is your coding experience enough to fix it, or do you ship with bugs hoping customers won't mind?
      • epolanski 7 days ago |
        What's the point of this question?

        Everybody ships nasty bugs in production that he himself might find impossible to debug, everybody.

        Thus he will do the very same thing that me, you, or anybody else on this planet does: find a second pair of eyes, virtual or not, paid or not.

        • LunaSea 7 days ago |
          > Everybody ships nasty bugs in production that he himself might find impossible to debug, everybody.

          No.

          • monooso 7 days ago |
            Some people haven't realised it yet.
            • LunaSea 7 days ago |
              Which would be a lot better than knowingly releasing code with important defects into production.
          • epolanski 6 days ago |
            I haven't seen anybody, regardless of org, procedures, or whatever, who never shipped a bug he himself could not debug.

            Hell, I don't even see it happening in the OSS space, with dozens of eyes on years-long PRs.

            It just happens.

            At some point you'll write and ship a bug that you yourself can't debug alone in an appropriate time frame and that needs more eyes.

        • lucianbr 7 days ago |
          Presumably what is possible for a person with 6 months of experience is rather limited.

          The idea, as I understand it, is that he achieved apps that he would not be able to write by himself, with the help of AI. That means it is possible to have bugs that would be reasonable to fix for someone who built the app using their own knowledge, but that may be too hard for the junior. This is a novel situation.

          Just because everyone has problems sometimes does not mean problems are all the same, all the same difficulty. Like if I was building Starship, and I ran into some difficult problem, I would most likely give up, as I am way out of my league. I couldn't build a model rocket. I know nothing about rockets. My situation would not be the same as of any rocket engineer. All problems and all situations and all people are not the same, and they are not made the same by AI, despite claims to the contrary.

          These simplifications/generalisations, "we are all stochastic parrots", "we all make mistakes just like the LLMs make mistakes", "we all have bugs", "we all manage somehow", are absurd. Companies do not do interviews and promote some people and not others out of a sense of whimsy. Experience and knowledge matter. We are not all interchangeable. If LLMs affect this somehow, it's to be looked at.

          I can't believe LLMs, or devs using LLMs, can suddenly do anything, without limitations. We are not all now equal to Linus and Carmack and such.

        • spaceman_2020 7 days ago |
          I haven’t encountered any serious bugs - mostly because I know what I’m capable of and what Sonnet is capable of. I don’t tackle things that are far too ambitious and focus on ideas I want to experiment with or ideas I can build the MVP for

          If I do encounter situations that Sonnet can’t fix - usually because it has outdated knowledge - I just read the latest documentation

        • jajko 6 days ago |
          Those things are not even comparable in quality of output, and if you see them as equals it seriously harms your credibility on this topic. This won't change in the next decade+. For some use cases that's good enough quality, until you have an actual issue your smart code tools can't handle. Until people start suing you because your half-baked app caused them a real, serious financial loss and they have true vengeance in their eyes (smaller companies or individuals often take such harm from outside very personally).

          Any serious company paying serious bucks won't accept this; in 2024 they know darn well how badly software can bite back, and some of them, like banks or the whole of Silicon Valley, run their entire business on software. But it's true that there is a massive space outside such cases where this roughly works; I've never worked there so I can't judge.

      • ipaddr 7 days ago |
        What I see is people using LLMs to make a new app without the bug
        • willsmith72 7 days ago |
          There's always a bug, you just haven't found it yet
      • jstanley 7 days ago |
        What does anyone do if they have a bug they don't know how to fix?

        Find a way to work around it.

      • amonith 7 days ago |
        If customers do mind then at best it's an opportunity cost (fewer people will buy). Shipping with bugs > not shipping, simple as.
        • namaria 6 days ago |
          You better hope no bugs expose you to liabilities like runaway cloud costs or mishandling sensitive data
          • amonith 6 days ago |
            Yeah SaaS is a different beast. Lots of areas worth hardening. Not really for customers but mainly for yourself. But desktop apps, mobile apps, self-hosted stuff, games, CLIs, libraries - you don't really have to worry about much.
      • instalabs 7 days ago |
        You start over from scratch /s(50%)
    • jchanimal 7 days ago |
      I think the front end is the most interesting place right now, because it’s where people are making stuff for themselves with the help of LLMs.

      The browser is a great place to build voice chat, 3d, almost any other experience. I expect a renewed interest in granting fuller capabilities to the web, especially background processing and network access.

      • grugagag 6 days ago |
        That seems a bit too much to ask. I want the peace of mind of knowing the browser keeps isolated sandboxes; if that philosophy changes I would be very uncomfortable using the browser.

        How about we go back to thick clients? With LLMs, the effort required to build them for multiple operating systems will also be reduced, no?

    • dartos 7 days ago |
      In your opinion as a newer dev, what were the most complicated things that sonnet was able to do and was not able to do?
    • hipadev23 7 days ago |
      Genuine question: Do you feel like you're learning the language/frameworks/techniques well? Or do you feel like you're just getting more adept at leveraging the LLM?

      Do you think you could you maintain and/or debug someone else's application?

      • spaceman_2020 7 days ago |
        Not as much as I would have if I was writing everything from scratch. But then again, my goal isn’t to be a coder or get a job as a coder - I’m primarily a marketer and got into coding simply because I had a stack of ideas I wanted to experiment with

        Most of the things I’ve built are fun things

        See: GoUnfaked.com and PlaybookFM.com as examples

        PlaybookFM.com is interesting because everything from the code to the podcasts to the logo are AI generated

        • an_guy 7 days ago |
          How much time did you spend getting it working, especially for PlaybookFM?
          • spaceman_2020 6 days ago |
            Less than a week, tops. The hard part was the content - curating the resources for creating the podcasts
        • kenjackson 6 days ago |
          I’ve had the same experience, except with it helping me build components or scripts.

          Everyone on HN tells me how LLMs are horrible, yet I’ve saved literally hundreds of hours with them so far. I guess we’re just lucky.

          • spaceman_2020 6 days ago |
            Look at the replies on my comment

            This place is far gone. Some of the most close minded, uncurious people in tech

            I don’t think this place deserves to be called “Hacker” News anymore

            • jpc0 6 days ago |
              We have different definitions of hacker.

              I played around with LLMs... Found they aren't very useful. Slapping a project together to ship isn't what hacking is about.

              Hacking is about deeply understanding what is actually happening under the hood by playing around, taking it apart, rebuilding it.

              When the craze started, everyone here was out looking for ways to escape the safeties of the LLMs, or discussing how they are being built, etc.

              This comment thread is about using the technology to slap a thing together so you can sell it. There's no exploration of the topic at hand, there's no attempt to understand.

              I'm trying to think of a decent analogy but can't; this smacks of a certain type of friend who finds a technology you have been hacking on and makes it so unbearable that you actually just lose interest...

              • kenjackson 6 days ago |
                So almost every person on here is not a hacker since I’ve met very few people here who are knowledgeable about EE, computer architecture, compiler technology, or heck even how browsers work. Of course there are some, but HN is mostly about slapping together technology to sell your startup.
                • jpc0 6 days ago |
                  Don't conflate Hacker News and Y Combinator.

                  I am perfectly aware of the owner here, but there are usually at least one or two posts a day here that have what I call a "hacker culture".

      • lxgr 6 days ago |
        There are so many frameworks, especially on the web and in Javascript, that I have absolutely zero interest in learning.

        In fact, my main reason for not doing any web development is that I find the layers of abstraction and the needless complexity, for something that should really be simple, quite off-putting.

        I'm sure e.g. React and GraphQL allow people to think about web apps in really elegant and scalable ways, but the learning curve is just way more than I can justify for a side project or a one-off thing at work that will never have more than two or three users opening it once every few months.

      • jstummbillig 6 days ago |
        The more important question that programmers, who are not product makers, often miss is: Are you solving real problems?

        It's a slightly orthogonal way of thinking about this but if you are solving real problems, you get away with so much shit, it's unreal.

        Maybe Google is not gonna let you code monkey on their monorepo, but you do not have to care. There's enough not-google in the world, and enough real problems.

    • lostemptations5 7 days ago |
      I'm not saying you're wrong at all or that I'm in disbelief -- but I've spent lots of time with Claude 3.5 trying to prototype React apps, not even full-fledged prototypes, and I can't get it to make anything bug-free somehow.

      Maybe I'm "holding it wrong" -- I mean using it incorrectly.

      True it renders quite interesting mockups and has React code behind it -- but then try and get this into even a demoable state for your boss or colleagues...

      Even a simple "please create a docker file with everything I need in a directory to get this up and running"...doesn't work.

      The Dockerfile doesn't work (my fault maybe for not mentioning I'm on arm64), the app is misconfigured, files are in the wrong directories, key things are missing.

      Again just my experience.

      I find Claude interesting for generating ideas-- but I have a hard time seeing how a dev with six months experience could get multiple "paid" apps out with it. I have 20 years (bla, bla) experience and still find it requires outrageous hand holding for anything serious.

      Again I'm not doubting you at all -- I'm just saying me personally I find it hard to be THAT productive with it.

      • fragmede 7 days ago |
        would you be willing to share any of your chats? Like say the docker one?
        • lostemptations5 6 days ago |
          That one was for work - but let me try again on an example project and I'll share it sure.
      • vachina 6 days ago |
        Agreed. LLMs can give you ideas on how to get there, but you still need foundational knowledge of the language or framework to extend the code it generates.
      • spaceman_2020 6 days ago |
        You have to temper your ambitions. Choose languages it understands well. Deploy on vercel. Specify exactly what you’re working with (“I’m using nextjs 14 with app router”)
    • lucianbr 7 days ago |
      I learned to drive before in-car GPS was widely available, at least where I lived.

      Going to some new place meant getting a map, looking at it, making a plan, following the plan, keeping track on the map, that sort of thing.

      Then I traveled somewhere new, for the first time, with GPS and navigation software. It was quite impressive, and rather easier. I got to my destination the first time, without any problems. And each time after that.

      But I did remark that I did not learn the route. The 10th time, the 50th time, I still needed the GPS to guide me. And without it, I would have to start the whole thing from scratch: get a map, make a plan, and so on.

      Having done the "manual" navigation with maps lots of times before, it never worries me what I would do without a GPS. But if you're "born" with the GPS, I wonder what you do when it fails.

      Are you not worried how you would manage your apps if for some reason the AIs were unavailable?

      • spaceman_2020 7 days ago |
        You have to temper your ambitions. Choose languages it understands really well (typescript or python). Choose easier deployment solutions (vercel over docker). Be specific about the versions you’re using (“I’m using nextjs 14 with app router”)
      • dmd 7 days ago |
        I was told a similar thing when a mentor discovered I didn’t know how to wire-wrap my own CPU from scratch.
        • eastbound 7 days ago |
          AI is much less reliable. Heck, it could go down like GPT went down in quality after the first 2 months: services could in an instant become less good.
          • dmd 7 days ago |
            Nobody’s forcing you to use someone else’s service though.
            • skeeter2020 6 days ago |
              they are if that's the only way you know how to create something
          • tokioyoyo 6 days ago |
            When there's demand, it will be made reliable through supply. Internet and connectivity weren't really that reliable 20 years ago either. I'm simplifying heavily, but discarding AI's usefulness and how it lowers the barrier to entry isn't a good idea for the future.
      • duggan 6 days ago |
        Pretty sure I remember similar conversations happening when people decided to produce content for YouTube full time, lean into Node.js as a dev stack, or build iOS apps.

        Make hay while the sun shines, friends. It might not last forever, but neither will you!

        • redmajor12 6 days ago |
          One can't "lean in" to an activity. You're either doing it or not.
      • kenjackson 6 days ago |
        After 50x? I use GPS too, but I definitely learn the route after a few times with it. There is probably a class of people who don’t ever learn it, but I feel like that has to be a minority.
        • SoftTalker 6 days ago |
          It definitely takes me longer. Pre-GPS, I might need a map (or at least notes) to get somewhere, but then I could most likely find my way back on my own. Using GPS to get somewhere, I'd be lost trying to get back without it.

          I think that's because with a map you are looking at street signs/names, etc., both in advance to plan the route and much more actively and intently while driving to figure out "do I turn here", and you just remember that stuff. Whereas a GPS says "turn right at the next light" and you really don't remember any context around that.

          • technicallyleft 6 days ago |
            'Attention is All You Need'
        • vishnugupta 6 days ago |
          I learn when I get lost and go around in circles a few times.
      • conscion 6 days ago |
        > The 10th time, the 50th time, I still needed the GPS to guide me.

        If anyone else is frustrated by this experience, I've found that changing the setting in Google Maps to have the map always point north has helped me actually build a mental model of directions. Instead of just following the line, it forced me to think about whether I'm going north, south, east, or west for each direction.

        • carlmr 6 days ago |
          So much this! First-person GPS directions prevent you from developing any kind of mental model.

          It's the same reason I hate that train displays in Germany now only show how long until the next train comes. When I was a kid they would show the scheduled time and, optionally, the delay. I always knew when each train was coming because you learn the schedule automatically. Now it's impossible to automatically build that mental model.

      • sails 6 days ago |
        Wandering around a new city today, I had a similar thought.

        Prior to the iPhone, I’d have the general lay of a city memorised within 10 minutes of landing, using a paper tourist map, and probably never feel disoriented, let alone lost.

        This morning I walked 2 blocks further than needed (on a 1-block walk) because I wasn’t at all oriented while following Google Maps.

        I won’t spell out the AI comparison, other than to say I think more "apps" will be created, along with predictable "followed the GPS off a bridge" revelations.

      • vishnugupta 6 days ago |
        I learned to code when internet access was limited to about 1hr/week, extremely slow, and unreliable. But now, without the internet, I just can’t get any work done. I guess it’s the same for a good chunk of people.

        I never worried about what would happen if the internet were to become unavailable. Given that it’s become an essential service, I just trust that the powers that be will make sure to get it back up.

        • grugagag 6 days ago |
          But the internet will change too. Many people feel cheated that all their contributions were gobbled up by big tech and used to train their models without any remuneration or credit. In my life I experienced a very open internet but the closing down trend has started already.
    • belter 7 days ago |
      If there are complaints who is going to fix it? :-)
    • poslathian 6 days ago |
      We’ll see what the future holds, but as an old timer, using LLMs to create applications seems exactly the same as:

      Python/JS and their ecosystems replacing OS-hosted C/C++, which replaced bare-metal assembly, which replaced digital logic, which replaced analog circuits, which replaced mechanical design as the "standard go-to tool" for creating programs.

      Starting with punchcard looms and Ada Lovelace maybe.

      In every case we trade resource efficiency and lower level understanding for developer velocity and raise the upper bound on system complexity, capability, and somehow performance (despite the wasted efficiency).

      • quantum_state 6 days ago |
        Well said... Hope the accumulated pile of complexity won't be a time bomb...
        • lxgr 6 days ago |
          To be fair, that ship has sailed years ago in many areas of programming, even without LLMs...
    • rlty_chck 6 days ago |
      Every time I see claims like this, I instinctively click on the user's profile and try to verify if their story checks out.

      >I played around a lot with code when I was younger. I built my first site when I was 13 and had a good handle on Javascript back when jQuery was still a pipe dream.

      >Started with the Codecademy Ruby track which was pretty easy. Working through RailsTutorial right now.

      posted on April 15, 2015, https://news.ycombinator.com/item?id=9382537

      >I've been freelancing since I was 17. I've dabbled in every kind of online trade imaginable, from domain names to crypto. I've built and sold multiple websites. I also built and sold a small agency.

      >I can do some marketing, some coding, some design, some sales, but I'm not particularly good at any of those in isolation.

      posted on Jan 20, 2023, https://news.ycombinator.com/item?id=34459482

      So I don't really understand where this claim of only "6 months of coding experience" is coming from, when you clearly have been coding on and off for multiple decades.

      • spaceman_2020 6 days ago |
        you do know that there are other kinds of freelancing apart from coding, right?
        • azemetre 6 days ago |
          I think the comment is fair. The poster came across as inexperienced with programming when in reality they have a decade-plus of experience.

          I trust that experienced people can make better use of these tools because ideally they have a foundation of first principles to work from, whereas inexperienced people jumping straight into LLMs may not fully understand what is happening or what they are given.

          • spaceman_2020 6 days ago |
            I don’t have a decade of coding experience. I do have almost two decades of internet experience, especially in marketing. I had an aborted attempt at learning to code back in 2013-14, but I never stuck with it, mostly because I was freelancing as a content marketer (GrowthPub.com)

            My first real coding experience was when I joined a bootcamp (Code.in bootcamp) in 2022. Only reason I could stick around this time was because I had a chunk of change after selling my agency and had nothing else to do

            I’m a humanities grad, for what it’s worth

  • Omnipresent 7 days ago |
    it'd be cool to see the prompts used and the edits required to get to the end product here.
  • EcommerceFlow 7 days ago |
    Been using LLMs since the GPT-3 beta in June 2021, and it’s interesting to see how my use cases have continuously been upgraded as models advanced.

    Started off having it create funny random stories, then slowly moved to creating more and more advanced programs.

    It’s shocking how good 3.5 Sonnet is at coding, considering the size of the model.

    • GaggiX 6 days ago |
      >considering the size of the model.

      We don't know the size of Claude 3.5 Sonnet or any other Anthropic model.

  • nine_k 7 days ago |
    This is great progress.

    Next obvious steps: make it understand large existing programs, learn from the style of the existing code while avoiding the bad style where it's present, and then contribute features or fixes to that codebase.

  • williamcotton 6 days ago |
    I used Claude (and a bit of ChatGPT) to write a multi-pass recursive descent parser for a search query DSL:

    https://github.com/williamcotton/search-input-query

    Why multi-pass? So multiple semantic errors can be reported at once to the user!

    The most important factor here is that I've written lexers and parsers beforehand. I was very detailed in my instructions and put it together piece-by-piece. It took probably 100 or so different chats.
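
    As a rough sketch of that error-collecting idea (not the actual search-input-query code; the field names and node shape here are hypothetical):

      from dataclasses import dataclass

      @dataclass
      class SemanticError:
          position: int
          message: str

      KNOWN_FIELDS = {"title", "author", "date"}  # hypothetical field names

      def check_fields(nodes):
          # Walk the parsed nodes once, collecting every semantic error
          # instead of stopping at the first one.
          errors = []
          for field, value, pos in nodes:  # assume flat (field, value, position) tuples
              if field not in KNOWN_FIELDS:
                  errors.append(SemanticError(pos, f"unknown field: {field}"))
              elif field == "date" and not value.replace("-", "").isdigit():
                  errors.append(SemanticError(pos, f"invalid date: {value}"))
          return errors  # the caller reports all of them to the user at once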

    Try it out with the GUI you see in the gif in the README:

      git clone git@github.com:williamcotton/search-input-query.git
      cd search-input-query/search-input-query-demo
      npm install
      npm run dev
  • thefourthchime 6 days ago |
    For years I've kept a list of apps / ideas / products I may do someday. I never made the time; with Cursor AI I have already built one, and am working on another. It's enabling me to use frameworks I barely know, like React Native, Swift, etc.

    The first prompt (with o1) will get you 60% of the way there, but then you have a different workflow. The prompts can get to a local minimum, where Claude/GPT-4/etc. just can't do any better. At that point you need to climb back out and try a different approach.

    I recommend git branches to keep track of this. Keep a good working copy in main, and anytime you want to add a feature, make a branch. If you get it almost there, make another branch in case it goes sideways. The biggest issue with developing like this is that you are not a coder anymore; you are a puppet master of a very smart and sometimes totally confused brain.

    • lxgr 6 days ago |
      > For years I've kept a list of apps / ideas / products I may do someday. I never made the time, with Cursor AI I have already built one, and am working on another.

      This is one fact that people seem to severely under-appreciate about LLMs.

      They're significantly worse at coding in many aspects than even a moderately skilled and motivated intern, but for my hobby projects, until now I haven't had any intern who would even so much as take a stab at some of the repetitive or just not very interesting subtasks, let alone stick with them over and over again without getting tired of it.

      • Sakos 6 days ago |
        It also reduces the knowledge needed. I don't particularly care about learning how to set up and configure a web extension from scratch. With an LLM, I can get 90% of that working in minutes, then focus on the parts that I am interested in. As somebody with ADHD, it was primarily all that supplementary, tangential knowledge which felt like an insurmountable mountain to me and made it impossible to actually try all the ideas I'd had over the years. I'm so much more productive now that I don't have to always get into the weeds for every little thing, which could easily delay progress for hours or even days. I can pick and choose the parts I feel are important to me.
        • imiric 6 days ago |
          > It also reduces the knowledge needed. I don't particularly care about learning how to set up and configure a web extension from scratch. With an LLM, I can get 90% of that working in minutes, then focus on the parts that I am interested in.

          Eh, I would argue that the apparent lower knowledge requirement is an illusion. These tools produce non-working code more often than not (OpenAI's flagship models are not even correct 50% of the time[1]), so you still have to read, understand and debug their output. If you've ever participated in a code review, you'll know that doing that takes much more effort than actually writing the code yourself.

          Not only that, but relying on these tools handicaps you into not actually learning any of the technologies you're working with. If you ever need to troubleshoot or debug something, you'll be forced to use an AI tool for help again, and good luck if that's a critical production issue. If instead you take the time to read the documentation and understand how to use the technology, perhaps even with the _assistance_ of an AI tool, then it might take you more time and effort upfront, but this will pay itself off in the long run by making you more proficient and useful if and when you need to work on it again.

          I seriously don't understand the value proposition of the tools in the current AI hype cycle. They are fun and useful to an extent, but are severely limited and downright unhelpful at building and maintaining an actual product.

          [1]: https://openai.com/index/introducing-simpleqa/

          • Robotenomics 6 days ago |
            Things have improved considerably over the last 3 months. Claude with cursor.ai is certainly over 50%
            • kbaker 6 days ago |
              Where the libraries are new/not known to the LLM yet, I just go find the most similar examples in the docs and chuck them in the context window too (easy to do with aider.) Then say 'fix it'. Does an incredible job.
            • imiric 6 days ago |
              I haven't used cursor.ai, but Claude 3.5 Sonnet definitely has the issues I'm talking about. Maybe I'm not great at prompting, but this is far from an exact science. I always ask it specific things I need help with, making sure to provide sufficient detail, and don't ask it to produce mountains of code. I've had it generate code that not only hallucinates APIs, but has trivial bugs like referencing undefined variables. How this can scale beyond a few lines of code to produce an actually working application is beyond me. But apparently I'm in the minority here, since people are actually using these tools successfully for just that, so more power to them.
              • disgruntledphd2 6 days ago |
                I think it really depends on the language. It generates pretty crappy but working Python code, but for SQL it generates really weird, crummy code that often doesn't solve the problem.

                I find it really helpful where I don't know a library very well but can assess if the output works.

                More generally, I think you need to give it pretty constrained problems if you're working on anything relatively complicated.

          • Sakos 6 days ago |
            All the projects I've been able to start and make progress in in the past year vs the ten years before that are substantive enough proof for me that you're wrong in pretty much all of your arguments. My direct experience proves statements like "the lower knowledge requirement is an illusion" and "it takes much more effort to review code than to write it" wrong. I do code reviews all the time. I write code all the time. I've had AI help me with my projects and I've reviewed and refactored that code. You're quite simply wrong. And I don't understand why you're so eager to argue that my direct experience is wrong, as if you're trying to gaslight me.

            It's quite honestly mystifying to me.

            It's simply not the case that we need to be experts in every single part of a software project. Not for personal projects and not for professional ones either. So it doesn't make any sense to me not to use AI if I've directly proven to myself that it can improve my productivity, my understanding and my knowledge.

            > If you ever need to troubleshoot or debug something, you'll be forced to use an AI tool for help again

            This is proof to me that you haven't used AI much. Because AI has helped me understand things much quicker and with much less friction than I've ever been able to before. And I have often been able to solve things AI has had issues with, even if it's a topic I have zero experience with, through the interaction with the AI.

            At some point, being able to make progress (and how that affects the learning process) trumps this perfect ideal of the programmer who figures out everything on their own through tedious, mind-numbing long hours solving problems that are at best tangential to the problems they were actually trying to solve hours ago.

            Frankly, I'm tired of not being able to do any of my personal projects because of all the issues I've mentioned before. And I'm tired of people like you saying I'm doing it wrong, DESPITE ME NOT BEING ABLE TO DO IT AT ALL BEFORE.

            Honestly, fuck this.

            • imiric 6 days ago |
              Hey, I'm not trying to gaslight you into anything. I'm just arguing from my point of view, which you're free to disagree with.

              You're right that I've probably used these tools much less than you have. I use them occasionally for minor things (understanding an unfamiliar API, giving me hints when web searching is unhelpful, etc.), but even in my limited experience with the current state-of-the-art services (Claude 3.5, GPT-4o) I've found them to waste my time in ways I wouldn't if I weren't using them. And at the end of the day, I'm not sure if I'm overall more productive than I would be without them. This limited usage leads me to believe that the problem would be far worse if I were to rely on them for most of my project, but the truth is I haven't actually tried that yet.

              So if you feel differently, more power to you. There's no point in getting frustrated because someone has a different point of view than you.

              • WhatIsDukkha 6 days ago |
                I'm not frustrated with you, but I'll explain why you might be getting those vibes here.

                It's like people are learning about these new things called skis.

                They fall on their face a few times but then they find "wow much better than good old snowshoes!"

                Of course some people are falling every 2 feet while trying skis and then go to the top of the mountain and claim skis are fake and we should all go back to snowshoes because we don't know about snow or mountains.

                They are insulting about it because it's important to the ragers that, despite failing at skiing, they are senior programmers, and everyone else must not know how to compile, test and review code and must be hallucinating their ski journeys!

                Meanwhile a bunch of us took the falls and learned to ski and are laughing at the ragers.

                The frustrating thing, though, is that for all the skiers, we can't seem to get good conversations going about how to ski, because there is so much raging... oh well.

                • rossvor 6 days ago |
                  With your analogy I would be the one saying that I'm still not convinced that skis are faster than snowshoes.

                  I still use ChatGPT/Claude/Llama daily for both code generation and other things. And while it sometimes does exactly what I want and I feel more productive, it wastes my time almost as often, and I have to give up on it and rewrite the code manually, or do a Google search/read the actual documentation. It's good to bounce things off, it's a good starting point for learning new stuff, and it gives you great direction for exploring and testing things quickly. My guess is that on a "happy path" it gives me a 1.3x speed-up, which is great when that happens, but the caveat is that you are not on a "happy path" most of the time, and if you listen to the evangelists it seems like it should be a 2x-5x speed-up (skis). So where's the disconnect?

                  I'm not here to disprove your experience, but with 2 years of almost daily usage of skis, how come I feel like I'm still barely breaking even compared with snowshoes? Am I that bad with my prompting skills?

                  • WhatIsDukkha 6 days ago |
                    I use Rust and aider.chat, and I thoughtfully limit the context of what I'm coding (e.g. 2 of 15 files).

                    I /ask a few times to get the context set up. I let it speculate on the path ahead but rein it in with more conservative goals.

                    I then say "let's carefully and conservatively implement this" (this is really important with Sonnet, as it's way too eager).

                    I get the code compiling by running /test a few times. There is sometimes a doom loop, though, so I reset the context with a better footing if things are going off track, or when I just think "it's time".

                    I do not commit until I have a plausible, compiling set of functions (it can probably handle touching 2-3 functions or configs, or one complete function, but don't get much more elaborate without care and experience).

                    I either reset or use the remaining context to create some tests and validate.

                    I think saying 1.3x more productive is fair with only this loop BUT you have to keep a few things in perspective.

                    I wrote specs for everything I did; in other words, I wrote out my goals and expectations for the code in English. That was highly valuable and something I probably wouldn't have done otherwise.

                    Automatic literate programming!

                    Yak shaving is crazy fast with an LLM. Those tasks that would take you off into the weeds do feel 5x faster (with caveats).

                    I think the 2x-5x faster is true within certain bounds.

                    What are the things that you were psychologically avoiding, dragging on, or just skipping because they were too tedious to even think of?

                    Some people don't have that problem, or maybe don't notice; to me it's a real, crazy benefit I love!

                    That's where the real speedup happens, and it's amazing.

                    • max6zx 6 days ago |
                      Do you mind sharing how much experience you have with the tech stacks you've had it generate code for? What I've found with LLMs is that the perception of AI-generated code differs depending on your own experience, and I would like to know whether that's only my experience.

                      I have more than 20 years in backend development and only limited experience with frontend tech stacks. I initially tried using an LLM for the frontend of my personal project, and I found the generated code to be very good. It produced code that worked immediately from my vague prompts, and it happily fixed any issue I found, quickly and correctly. I also have enough knowledge to tweak anything I need, so at the end of the day I can see my project working as expected. I feel really productive with it.

                      Then I slowly started using LLMs for my backend projects at work, and I was surprised that the experience was completely opposite. Both ChatGPT and Claude generated code that was either bad practice or flawed, or they ignored instructions in my prompt and went back to bad solutions after just a few questions. They also failed to apply common practices from an architectural perspective. So the effort to make it work was much greater than if I did all the coding myself.

                      At that point, I thought there were probably more frontend projects than backend projects in the training data, and therefore the quality of generated frontend code was much better. But when I used an LLM with another language I did not know well, for another backend project, I found out why my experiences were so different: I could now observe much more clearly what is bad and good in the generated code.

                      In my previous backend project, since I have much more knowledge of the languages/frameworks/practices, my criteria were also higher: it is not enough that the code runs, it must be extensible, well structured, architecturally sound, use the correct idioms, and so on. Since my frontend experience is more limited, the generated code worked as I expected but possibly violated all of those NFRs without my knowing. That explains the mixed experience of using a new programming language (something I don't know well) in a backend project (my well-known domain): it seemed to give me working code but failed to follow good practices.

                      My hypothesis is that LLMs generate code at an intermediate level, so if your experience is limited you see it as pure gold, but if your level is much higher, the generated code is just garbage. I really want to hear from other people to validate this hypothesis, as people seem to have opposite experiences with it.

                  • Kiro 5 days ago |
                    > Am I that bad with my prompting skills?

                    Or you're using skis on gravel. I'm a firm believer that the utility varies greatly depending on the tech stack and what you're trying to do, ranging from negative value to way more than 5x.

                    I also think "prompting" is a misrepresentation of where the actual skill and experiences matter. It's about being efficient with the tooling. Prompting, waiting for a response and then manually copypasting line by line into multiple places is something else entirely than having two LLMs work in tandem, with one figuring out the solution and the other applying the diff.

                    Good tooling also means that there's no overhead trying out multiple solutions. It should be so frictionless that you sometimes redo a working solution just because you want to see a different approach.

                    Finally, you must be really active and can't just passively wait for the LLM to finish before you start analyzing the output. Terminate early, reprompt and retry. The first 5 seconds after submitting are crucial, and being able to make a decision from seeing just a few lines of code is a completely new skill for me.

            • Kiro 6 days ago |
              I understand your frustration. It's like someone trying to convince me that a red car I'm looking at is actually blue. I know what I'm seeing and experiencing. There's nothing theoretical about it and I have the results right in front of me.
            • senorrib 6 days ago |
              It's baffling to see all the ignorant answers in this thread, OP. My experience has been similar to yours, and I've been pushing complex software to production for the past 20 years.

              Feels like a bunch of flat-earth arguments; they'd rather ignore the evidence (or refuse to even try it themselves) to keep the illusion that you need to write it all yourself for it to be "high quality".

              • imiric 6 days ago |
                Or, hey, maybe we've just had different experiences, and are using these tools differently? I even concede that I may not be great at prompting, which could be the cause of my problems.

                I'm not arguing that writing everything yourself leads to higher quality. I'm arguing that _in my experience_ a) it takes more time and effort to read, troubleshoot and fix code generated by these tools than it would take me to actually write it myself, and b) that taking the time to read the documentation and understand the technologies I'm working with would actually save me time and effort in the future.

                You're free to disagree with all of this, but don't try to tell me my experience is somehow lesser than yours.

                • senorrib 6 days ago |
                  I wasn't targeting this specifically at you or your individual experience. However, I have heard the same arguments you make ad nauseam, and they usually come from people who are either just too skeptical or don't put in the effort required to use the tool.
                • fragmede 6 days ago |
                  So link chats where you've run into the very real limitations these things have. What language you're using, what framework you're in, what library it hallucinated. I'm not interested in either of us shouting past each other, I genuinely want to understand how your experience, which is not at all lesser than mine, is so different. Am I ignoring flaws that you otherwise can't overlook? Are you expecting too much from it with too little input? Without details, all we can do is describe feelings at each other and get frustrated when the other person's experience is different. Might as well ask your star sign while we're at it.
                  • imiric 5 days ago |
                    I use OpenRouter, which saves chats in local storage, and my browser is configured to delete all history and data on exit. So, unfortunately, I can't link you to an exact session.

                    I gave more details about one instance of this behavior with Claude 3.5 Sonnet a few weeks ago here[1]. I was asking it to implement a specific feature using a popular Go CLI library. I could probably reproduce it, but honestly can't be bothered, nor do I wish to use more of my API credits for this.

                    Besides, why should I have to prove anything in this discussion? We're arguing based on good faith, and just as I assume your experience is based on positive interactions, so should you assume mine is based on negative ones.

                    But I'll give you one last argument based on principles alone.

                    LLMs are trained on mountains of data from various online sources (web sites, blogs, documentation, GitHub, SO, etc.). This training takes many months and has a cutoff point sometime in the past. When you ask them to generate some code using a specific library, how can you be sure that the code is using the specific version of the library you're currently using? How can you be sure that the library is even in the training set and that the LLM won't just hallucinate it entirely?
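
                    A concrete illustration (pandas is my example here, not something from this thread): DataFrame.append was removed in pandas 2.0, yet a model trained mostly on older code will still happily suggest it.

                        import pandas as pd

                        df = pd.DataFrame({"a": [1]})

                        # What a model trained on pre-2.0 code tends to emit;
                        # raises AttributeError on pandas >= 2.0:
                        # df = df.append({"a": 2}, ignore_index=True)

                        # The current idiom:
                        df = pd.concat([df, pd.DataFrame({"a": [2]})], ignore_index=True)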

                    Some LLMs allow you to add sufficient context to your prompts (with RAG, etc.) to increase the likelihood of generating working code, which can help, but still isn't foolproof, and not all services/tools allow this.

                    But more crucially, when you ask it to do something that the library doesn't support, the LLM will never tell you "this isn't possible" or "I don't know". It will instead proceed to hallucinate a solution because that's what it was trained to do.

                    And how are these state-of-the-art coding LLMs that pass all these coding challenges capable of producing errors like referencing an undefined variable? Surely these trivial bugs shouldn't be possible, no?

                    All of these issues were what caused me to waste more than an hour fighting with both Claude 3.5 Sonnet and GPT-4o. And keep in mind that this was a fairly small problem. This is why I can't imagine how building an entire app, using a framework and dozens of libraries, could possibly be more productive than doing it without them. But clearly this doesn't seem to be an opinion shared by most people here, so let's agree to disagree.

                    [1]: https://news.ycombinator.com/item?id=41987474

              • thefourthchime 6 days ago |
                Thanks, my guess is that many complaining about the technology haven't honestly tried to embrace it.
                • rtsil 6 days ago |
                  Or denial/rejection is a natural defense reaction for people who feel threatened.
            • handzhiev 6 days ago |
              This desire of deniers to prove to people who actually get tons of benefit from LLMs that they aren't getting it becomes more ridiculous every time.

              "You can't use LLMs for this or that because of this and that!!!".

              But I AM using them. Every. Single. Day.

              • handzhiev 6 days ago |
                And of course every time such comments get downvoted. Folks, you can downvote as much as you want - I don't give a fuck even if my reputation goes negative. This won't make you right.
          • lxgr 6 days ago |
            > These tools produce non-working code more often than not (OpenAI's flagship models are not even correct 50% of the time[1]), so you still have to read, understand and debug their output.

            Definitely, but what LLMs provide me that a purely textual interface can't is discoverability.

            A significant advantage of GUIs is that I get to see a list of things I can do, and the task becomes figuring out which ones are going to solve my problem. For programming languages, that's usually not the case (there's documentation, but that isn't usually as nested and context sensitive as a GUI is), and LLMs are very good at bridging that gap.

            So even if an LLM provides me a broken SQL query for a given task, more often than not it's exposed me to new keywords or concepts that did in fact end up solving my problem.

            A hand-crafted GUI is definitely still superior to any chat-based interface (and this is in fact a direction I predict AI models will be moving to going forward), but if nobody builds one, I'll take an LLM plus a CLI and/or documentation over only the latter any day.

          • Kiro 6 days ago |
            > OpenAI's flagship models are not even correct 50% of the time[1]

            You're reading the link wrong. They specifically picked questions that one or more models failed at. It's not representative of how often the model is wrong in general.

            From the paper:

            > At least one of the four completions must be incorrect for the trainer to continue with that question; otherwise, the trainer was instructed to create a new question.

      • imiric 6 days ago |
        I'm curious: what do you do when the LLM starts hallucinating, or gets stuck in a loop of generating non-working code that it can't get out of? What do you do when you need to troubleshoot and fix an issue it introduced, but has no idea how to fix?

        In my experience of these tools, including the flagship models discussed here, this is a deal-breaking problem. If I have to waste time re-prompting to make progress, and reviewing and fixing the generated code, it would be much faster if I wrote the code from scratch myself. The tricky thing is that unless you read and understand the generated code, you really have no idea whether you're progressing or regressing. You can ask the model to generate tests for you as well, but how can you be sure they're written correctly, or covering the right scenarios?

        More power to you if you feel like you're being productive, but the difficult things in software development always come in later stages of the project[1]. The devil is always in the details, and modern AI tools are just incapable of getting us across that last 10%. I'm not trying to downplay their usefulness, or imply that they will never get better. I think current models do a reasonably good job of summarizing documentation and producing small snippets of example code I can reuse, but I wouldn't trust them for anything beyond that.

        [1]: https://en.wikipedia.org/wiki/Ninety%E2%80%93ninety_rule

        • williamcotton 6 days ago |
          These two projects were almost entirely written with LLMs:

          https://github.com/williamcotton/search-input-query

          https://github.com/williamcotton/guish

          Both are non-trivial but certainly within the context window, so they're not large projects. However, they are easily extensible due to the architecture I dictated as I was building them!

          The first contains a recursive descent parser for a search query DSL (and much more).

          The second is a bidirectional GUI for bash pipelines.

          Both operate at the AST level, guish powered by an existing bash parser.

          The READMEs have animated gifs so you can see them in action.

          When the LLM gets stuck I either take over the coding myself or come up with a plan to break up the requests into smaller sized chunks with more detail about the steps to take.

          It takes a certain amount of skill to use these tools, both with how the tool itself works and definitely with the expertise of the person wielding the tool!

          If you have these tools code against good abstractions and good interfaces, you can hide implementation details. Then you expose those interfaces to the LLM and make it easier and simpler to build on.

          Like, once you've got an AST it's pretty much downhill from there to build tools that operate on said AST.
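
          To give a flavor of what "operate at the AST level" means, here is a toy sketch (emphatically not the actual search-input-query code, which handles far more): a recursive descent parser for a tiny query grammar, producing a tree that downstream tools can walk.

              import re

              # Toy grammar:  expr := term (("AND" | "OR") term)*
              #               term := WORD | "(" expr ")"
              # Note: AND and OR share one precedence level here; a real
              # parser would split them into separate rules.

              def tokenize(query):
                  return re.findall(r"\(|\)|\w+", query)

              def parse_expr(tokens):
                  node, rest = parse_term(tokens)
                  while rest and rest[0] in ("AND", "OR"):
                      op = rest[0]
                      right, rest = parse_term(rest[1:])
                      node = (op, node, right)
                  return node, rest

              def parse_term(tokens):
                  if tokens[0] == "(":
                      node, rest = parse_expr(tokens[1:])
                      assert rest and rest[0] == ")", "expected ')'"
                      return node, rest[1:]
                  return ("WORD", tokens[0]), tokens[1:]

              ast, _ = parse_expr(tokenize("cats AND (dogs OR birds)"))
              # ('AND', ('WORD', 'cats'), ('OR', ('WORD', 'dogs'), ('WORD', 'birds')))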

          • senorrib 6 days ago |
            The usual workflow I see skeptical folks take is to throw a random sentence at the LLM and expect it to correctly figure out the end result, and then to keep sending small chunks of code, expanding the context with poor instructions.

            LLMs are tools that need to be learned. Good prompts aren’t hard, but they do take some effort to build.

          • mikeocool 6 days ago |
            I think there’s often a disconnect between what lay-people hear when someone says “I built an app using AI” and the reality.

            It seems like a lot of people assume the process is that you give the AI a relatively high-level prompt describing the features, and you get back a fully functioning app that does everything you outlined.

            In my experience (and I think what you are describing here), the initial feature-based prompt will often give you (somewhat impressively) a basic functioning app. But as you start iterating on that app, the high-level feature-based prompts stop working well pretty quickly. It then becomes more an exercise in programming by proxy: you basically tell the AI what code to write and what changes are needed at a technical level, in smaller chunks, and it saves you a lot of time by actually writing the proper syntax. The thing is, you still have to know how to program to accomplish this (arguably, you have to be a fairly decent programmer who can already break complicated tasks down into small, understandable chunks).

            Furthermore, if you want the AI to write good code with a solid architecture, you pretty much have to tell it what to do at a technical level from the start. For example, here I imagine the AI didn't come up with operating at the AST level on its own; you knew that would give you a solid architecture to build on, so you told it to do that.

            As someone who's already a half-decent programmer, I've found this process to be a pretty significant boon to my productivity. On the other hand, beyond the basic POC app, I have a hard time seeing it live up to the marketing hype of "Anyone can build an app using AI!" that's being constantly spewed.

        • lxgr 6 days ago |
          > what do you do when the LLM starts hallucinating, or gets stuck in a loop of generating non-working code that it can't get out of? What do you do when you need to troubleshoot and fix an issue it introduced, but has no idea how to fix?

          Same thing I do without an LLM: I try to fix it myself!

          > If I have to waste time re-prompting to make progress, and reviewing and fixing the generated code, it would be much faster if I wrote the code from scratch myself.

          Definitely not in the cases I'm thinking about. This extends from "build me a boilerplate webapp that calls this method every time this form changes and put the output in that text box" (which would take me hours to learn how to do in any given web framework) to "find a more concise/idiomatic way to express this chain of if-statements in this language I'm unfamiliar with" (which I just wouldn't do if I don't much care to learn that particular language).

          For the UI/boilerplate part, it's easy enough to tell if things are working or not, and for crucial components I'll at least write tests myself or even try to fully understand what it came up with.

          I'd definitely never expect it to get the "business logic" (if you want to call it that for a hobby project) right, and I always double-check that myself, or outright hand-write it and only use the LLM for building everything around it.
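
          As a minimal sketch of that split (the domain and every name here are invented for illustration): the core calculation is hand-written and double-checked, while the argparse scaffolding around it is exactly the kind of thing I'd happily delegate to the LLM.

              # Hand-written: the part I actually care about being correct.
              def monthly_payment(principal: float, annual_rate: float, months: int) -> float:
                  """Standard annuity formula."""
                  r = annual_rate / 12
                  if r == 0:
                      return principal / months
                  return principal * r / (1 - (1 + r) ** -months)

              # LLM-generated scaffolding: cheap to regenerate, easy to eyeball.
              if __name__ == "__main__":
                  import argparse

                  p = argparse.ArgumentParser(description="loan payment calculator")
                  p.add_argument("principal", type=float)
                  p.add_argument("annual_rate", type=float, help="e.g. 0.05 for 5%%")
                  p.add_argument("months", type=int)
                  a = p.parse_args()
                  print(f"{monthly_payment(a.principal, a.annual_rate, a.months):.2f}")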

          > The devil is always in the details, and modern AI tools are just incapable of getting us across that last 10%.

          What I enjoy most about programming is exactly solving complicated puzzles and fixing gnarly bugs, not doing things that could at least theoretically be abstracted into a framework (that actually saves labor and doesn't just throw it in an unknown form right back at me, as so many modern ones do) relatively easily.

          LLMs more often than not allow me to get to these 10% much faster than I normally would.

        • deepGem 6 days ago |
          I have seen hallucinations in comments more than in code, and some of the code hallucinations I can correct myself. The hallucinations are obvious: try blocks without finally, etc.

          So my workflow is to just review every bit of code the assistant generates and sometimes I ask the assistant (I'm using Cody) to revisit a particular portion of the code. It usually corrects and spits out a new variant.

          My experience has been nothing short of spectacular in using assistants for hobby projects, sometimes even for checking design patterns. I can usually submit a piece of code and ask if the code follows a good pattern under the <given> constraints. I usually get a good recommendation that clearly points out the pros and cons of the said pattern.

        • rizz0 6 days ago |
          If it gets stuck, I tell it where I think we took a wrong turn. It then recognizes the issue and refactors in a way that for a hobby project I wouldn’t have had the patience for.
        • rubslopes 6 days ago |
          I had a problem like this recently. I was working with a Python library that I had never worked with before, and I was relying heavily on LLMs. I was stuck at a point where no LLM could solve my problem: o1, GPT-4o, Sonnet 3.5, Gemini Pro...

          Then I had an idea: as it was a picture animation problem, I asked it to write it in CSS. Then I asked it to translate it to Python. Boom, it worked!

          At that moment, I finally realized the value of knowing how to prompt. Most of the time it doesn't make a difference, but when things start to get complex, knowing how to speak with these assistants makes all the difference.
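
          A toy version of that trick (the real library isn't shown here; this is just the shape of the translation): CSS describes the animation declaratively, and the Python version falls out as a plain easing function driving a frame loop.

              # CSS the models understood immediately:
              #   @keyframes fade { from { opacity: 0; } to { opacity: 1; } }
              #   .sprite { animation: fade 1s ease-out forwards; }

              def ease_out(t: float) -> float:
                  """Quadratic ease-out, t in [0, 1]."""
                  return 1.0 - (1.0 - t) ** 2

              def opacity_at(frame: int, total_frames: int) -> float:
                  """Python translation: opacity for a given frame of the fade."""
                  t = frame / max(total_frames - 1, 1)
                  return ease_out(t)

              # opacity_at(0, 60) == 0.0; opacity_at(59, 60) == 1.0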

        • zamadatix 5 days ago |
          You may not need that last 10% on a hobby project. If you do and it's insurmountable with AI+you then you're no worse off than when it was insurmountable with just you.

          Outside that context, the better way to use the tools is as a superpowered Stack Overflow search. Don't know how ${library} expects you to ${thing} in ${language}? Rather than asking "I need to add a function in this codebase which..." and pasting the result into your code, ask "I need an example function which uses..." and use what it spits out as an example to integrate. Then you can ask "can I do it like..." and get some background on why you can/can't/should/shouldn't think about doing it that way. It's not 100% right or applicable, especially with every ${library}, ${thing}, and ${language}, but it's usually faster to a good answer than SO or searching. Worst-case failure? You've spent a couple of minutes to find out you need to spend a lot of time reading through the docs to do your one-off thing yourself after all.

          • imiric 4 days ago |
            That's the way I currently use them. But, just like with SO, the code could be outdated and not work with the specific version of the library you're using, or just plain wrong. There's no way to tell it to show you code using version X.Y. The code could even be a mix of different versions and APIs, or the LLM might be trained on outdated versions, etc.

            Even worse, the LLM will never tell you it doesn't know the answer, or that what you're trying to do is not possible, but will happily produce correct-looking code. It's not until you actually try it that you will notice an error, at which point you either go into a reprompt-retry loop, or just go read the source documentation. At least that one won't gaslight you with wrong examples (most of the time).

            There are workarounds to this, and there are coding assistants that actually automate this step for you, and try to automatically run the code and debug it if something goes wrong, but that's an engineering solution to an AI problem, and something that doesn't work when using the model directly.

            > Worst case failure? You've spent a couple minutes to find you need to spend a lot of time reading through the docs to do you one off thing yourself still.

            It's not a couple of minutes, though. How do you know you've reached the limit of what the LLM can do, vs. not using the right prompt, or giving enough context? The answer always looks to be _almost_ there, so I'm always hopeful I can get it to produce the correct output. I've spent hours of my day in aggregate coaxing the LLM for the right answer. I want to rely on it precisely because I want to avoid looking at the documentation—which sometimes may not even exist or be good enough, otherwise it's back to trawling the web and SO. If I knew the LLM would waste my time, I could've done that from the beginning.

            But I do appreciate that the output sometimes guides me in the right direction, or gives me ideas that I didn't have before. It's just that the thought of relying on this workflow to build fully-fledged apps seems completely counterproductive to me, but some folks seem to be doing this, so more power to them.

    • elorant 6 days ago |
      Good luck debugging it on production.
      • cloverich 6 days ago |
        I mean, I debug code other engineers wrote every single day; being good at that is part of the job. The biggest difference is I never have to deal with the LLM writing parts I don't want it to write.
      • poszlem 6 days ago |
        This is such a lazy, pointless comment that doesn't add anything to the conversation. It's also way off base about what LLMs can actually do, and the fact that they're pretty handy for debugging production code too.
    • jajko 6 days ago |
      That's literally walking through a dark maze blindfolded, just bouncing off the walls randomly and hoping you're at least generally moving toward your goal.

      If software engineering is going to look like this, oh boy am I happy to be retiring in a mere 17 years (fingers crossed) and not having to spend more time on such work. No way can quality, complex code come out of such an approach, and people complain about the quality of software now.

    • psygn89 6 days ago |
      If you have the budget, I have also taken a liking to perplexity.ai. I got it free from my school, and it aggregates searches for me with sources (but be sure to check them, since sometimes it reads between the links, so to speak). It does the Google searching for me and has returned more up-to-date API info than either Claude or ChatGPT knew about. Then I let Claude or ChatGPT know about it by copying in the docs and source code to work from.
    • squigz 6 days ago |
      > The first prompt (with o1) will get you 60% there, but then you have a different workflow. The prompts can get to a local minimum, where claude/gpt4/etc.. just can't do any better. At which point you need to climb back out and try a different approach.

      So you're basically bruteforcing development, a famously efficient technique for... anything.

  • lxgr 6 days ago |
    Claude has worked amazingly well for me as somebody really not into UI/web development.

    There are so many small tasks that I could, but until now almost never would automate (whether it's not worth the time [1] or I just couldn't bring myself to do it as I don't really enjoy doing it). A one-off bitmask parser at work here, a proof of concept webapp at home there – it's literally opened up a new world of quality-of-life improvements, in a purely quantitative sense.

    It extends beyond UI and web development too: Very often I find myself thinking that there must be a smarter way to use CLI tools like jq, zsh etc., but considering how rarely I use them and that I do already know an ineffective way of getting what I need, up until now I couldn't justify spending the hours of going through documentation on the moderately high chance of finding a few useful nuggets letting me shave off a minute here and there every month.

    The same applies to SQL: After plateauing for several years (I get by just fine for my relatively narrow debugging and occasional data migration needs), LLMs have been much better at exposing me to new and useful patterns than dry and extensive documentation. (There are technical documents I really do enjoy reading, but SQL dialect specifications, often without any practical motivation as to when to use a given construct, are really not it.)

    LLMs have generally been great at that, but being able to immediately run what they suggest in-browser is where Claude currently has the edge for me. (ChatGPT Plus can apparently evaluate Python, but that's server-side only and accordingly doesn't really allow interactive use cases.)

    [1] https://xkcd.com/1205/

  • CtrlAltmanDel 6 days ago |
    What a feat! There are at least 3 pages of Google search results for nearly the same thing. The "prompt" I used on google.com is:

    site:github.com map comparison

    I guess the difference is that my way uses dramatically less time and fewer resources, but requires directly acknowledging the original coders instead of relying on the plagiarism-ish capabilities of regurgitating something through an LLM.

    • mvdtnz 6 days ago |
      But creating things for which there are many existing, documented examples is what LLMs do best. Without this use case it's almost like they don't provide any value at all.
      • smusamashah 6 days ago |
        Everything you can think of right now has already been made in one form or another and hence learnt by LLMs, do you agree?

        Or

        Can you easily come up with many things that LLMs have no clue about and hence will fail at?

  • smallerfish 6 days ago |
    Claude is fantastic. I think the model itself is good enough to be able to write good software when competently directed; it's let down only by the UI/UX around it.

    My only complaints are:

    a) that it's really easy to hit the usage limit, especially when refactoring across a half dozen files. One thing that'd theoretically be easyish to fix would be automatically updating files in the project context (perhaps with an "accept"/"reject" prompt) so that the model knows what the latest version of your code is without having to reupload it constantly.

    b) it oscillates between being lazy in really annoying ways (giving largeish code blocks with commented omissions partway through) and supplying the full file unnecessarily, using up your usage credits.

    My hope is that JetBrains gives up on their own (pretty limited) LLM and partners with Anthropic to produce a super-tight, IDE-native integration.

  • vunderba 6 days ago |
    I think we're going to see a similar backlash to AI apps as we did with AI art.

    Not necessarily because users can identify AI apps, but because, due to the lower barrier to entry, the space is going to get hyper-competitive, and it'll be VERY difficult to distinguish your app from the hundreds of nearly identical ones.

    Another thing that worries me (because software devs in particular seem to take a very loose moral approach to plagiarism and basic human decency) is that it'll be significantly easier for a less scrupulous dev to find an app that they like, and use an LLM to instantly spin up a copy of it.

    I'm trying not to be all gloom and doom about GenAI, because it can be really nifty to see it generate a bunch of boilerplate (YAML configs, dev opsy stuff, etc.) but sometimes it's hard....

    • grugagag 6 days ago |
      No doubt about it, things will get very competitive in the software space and while anyone will be able to use generative AI tools, I think more will be expected for less.
      • vunderba 6 days ago |
        Reminds me of when OpenAI rolled out custom GPTs, and in a matter of a few months there were more than a million of them on the store.

        People don't seem to realize that the same thing is going to happen to regular app development once AI tooling gets even easier.

    • CaptainFever 6 days ago |
      I hope not. I'm glad that software devs in particular seem to adapt to new technologies instead of trying to stop progress.

      Take this very post for example. Imagine an artist forum having daily front-page articles on AI where most of the comments are curious and non-negative. That's basically what Hacker News is doing, but with developers instead. The huge culture difference is striking, and it makes me happy with the posters on this site.

      You attribute it to the difficulty of using AI coding tools. But tools that cut out the programmer and make development available to the layman have always existed: libraries, game engines, website builders, and now web-app builders. You also attribute it to the flooding of the markets. But the website and mobile markets are famously saturated, and yet we continue making stuff, because we want to (and because quality things make more money).

      I instead attribute it to our culture of free sharing (what one might call "plagiarism"... of ideas?!), adaptability, and curiosity. And that makes me hopeful.

  • grp000 6 days ago |
    Can anyone weigh in on how Claude compares to Copilot? Copilot feels like a fancy autocomplete, but people seem to have good experiences with Claude, even in more complex settings.
    • cluckindan 6 days ago |
      You can use Claude in Copilot.
  • nitwit005 6 days ago |
    Ideally, Claude should have told you about easier approaches. I don't see any reason to mess around with code.

    There are plenty of website builder tools that will glue third party maps. Even the raw Google Maps API website will generate an HTML page with customized maps.

  • wayeq 6 days ago |
    Is Claude 'better' than o1-preview? I've had phenomenal results with o1-preview (switching to o1-mini for simpler asks to avoid running out of queries), and tried Claude once and wasn't super impressed. Wondering if I should give it another shot.
  • glonq 5 days ago |
    Has anybody evaluated the pros and cons of giving developers a programming-specific AI tool like Copilot versus a general-purpose AI tool like ChatGPT or Claude? We are a small shop, so I would prefer not to pay for both for every developer.
  • ronyba 5 days ago |
    Is it possible in Java?