Ask HN: Are there any real examples of AI agents doing work?

76 points by nomad-nigiri 4 days ago | 63 comments

2025 is the year of agents. I’ve heard about SDR AI agents but not great things. Most “agents” sound like workflow automations that have been around forever. Anyone have an example of an “ai” agent which I understand to be intelligent that isn’t a glorified or rebranded workflow automation? Thx.

oraphalous 4 days ago |
I too would like to hear some examples.
On the one hand you have gurus claiming that AI agents are going to make all SaaS redundant; on the other, gurus claiming that AI isn't going to take my coding job, but that I need to adapt my workflows to incorporate AI. We all need to start preparing now for the changes that AI is going to cause.
But these two claims aren't compatible. If AGI and these super agents are that bonkers amazeballs that they can replace entire SaaS companies - then there is no way I'm going to be able to adapt my workflows to compete as a programmer.
Further, if the wildest claims about AI end up proving to be true - there is simply no way to prepare. What possible adaptation to my workflow could I possibly come up with that an AI agent could not surpass? Why should I bother learning how to implement (with today's apis) some RAG setup for a SaaS customer service chatbot when presumably an AI agent is going to make that skillset redundant shortly after?
I'm going to be interviewing for frontend roles soon, and for my prep I'm just going back to basics and making sure I can recall the fundamentals on demand: CSS, HTML, JS/TS - fuck the rest of this noise.
A4ET8a8uTh0_v2 6 hours ago |
I have not seen one in production, but I did see 'agent products' sold to financial companies for compliance purposes ( sanctions, mortgage, other regs ). Fascinating stuff that got me mildly interested in MS troupe.
_sword 6 hours ago |
Could you name any products?
A4ET8a8uTh0_v2 6 hours ago |
Not by name ( edit: and in corporate, product names seem to change a lot from where I sit ) but every bigger consulting company/vendor[2] that works with banks/brokers/financial institutions right now seems to have at least some offering in that space to ride the AI wave. The presentation I saw specifically was from Crowe[1].
[1] https://www.crowe.com/ae/-/media/crowe/firms/middle-east-and...
[2] https://www.lexisnexis.com/community/insights/legal/b/though...
hnthrow90348765 6 hours ago |
I need an AI agent to continuously ask questions of PMs or stakeholders until the requirements are less vague. The good thing is this would be a plain English discussion, which LLMs are good at. A PM can ask if something is technically feasible to some degree too. Maybe it can even break up tickets in a much better fashion.
ceejayoz 6 hours ago |
> I need an AI agent to continuously ask questions of PMs or stakeholders until the requirements are less vague.
They’ll just get mad at the AI and tell it to stop asking so many questions. As they already do to humans.
ako 5 hours ago |
I’m a PM; today I built a working mockup with Windsurf (golang + wails + vuejs + duckdb). Windsurf uses Codeium and is branded as the first agentic IDE.
Your requirements will improve; not sure if in the long run I'll still need developers to build the actual software.
The development process with Windsurf is a bit like rolling a die and hoping for a 6. A lot of trial and error, but if you check the git log, you see about 15 minutes between commits per feature request. Windsurf does a good job of summarizing the entire feature request chat into a short git commit message. Every git commit reads like a user story.
jondwillis 5 hours ago |
How… do I find PMs like you? Literally have never worked with a single one that bothered to understand the technology they are building on top of at a deep enough level.
Maybe I just need to teach the ones I work with that it is now possible to trivially prototype many ideas without much or any coding skill.
th0ma5 5 hours ago |
Most PMs resist this because they know that the understanding of the requirements then falls upon them. That has traditionally been the role of architects, analysts, developers, other stakeholders, etc., and if you replace them with an LLM, well, it doesn't have the ability to be a true stakeholder in this way.
whamlastxmas 5 hours ago |
As a PM, ChatGPT is great at helping me write tickets in a structured format from me just giving it a single sloppy sentence. I of course review it to make sure it’s understanding me properly. But having to explicitly write stuff like intended behaviors when submitting bugs can be really laborious, though I understand why engineers sometimes need that level of clarity (having been one myself for 15 years)
fhd2 6 hours ago |
Programmers don't work in isolation. So I don't know how necessary it would be to quickly adapt your workflows to compete. If there's something that's useful to adopt, there will be a stream of blog posts, coworkers, people at user groups and what not spoon feeding what they learned to others. I don't think there's much cause for FOMO, I don't think it makes a big difference whether you start using a faster way to work a few months earlier or later than others. It can be cheaper to not jump on any hype train and potentially miss out on genuine improvements for a while, than to jump on all the hype trains and waste a lot of time on stuff that goes nowhere.
And like you said, if the wildest claims hold true, all programmers are out of a job by the end of 2026 anyway, with all other jobs following over the course of a few years. There's too many variables to predict what would happen in such a scenario, so probably best to deal with it if it happens.
So to me, your strategy checks out. I've personally invested some time into code generating and agentic tooling, but ultimately went back to Claude-as-Google-replacement. By my estimation, about a 5-10 % productivity boost compared to my workflow in 2022. The work is about the same, I just learn a bit faster.
lolinder 4 hours ago |
> And like you said, if the wildest claims hold true, all programmers are out of a job by the end of 2026 anyway, with all other jobs following over the course of a few years. There's too many variables to predict what would happen in such a scenario, so probably best to deal with it if it happens.
So much this. AGI is the equivalent of a nuclear apocalypse in many ways—it's unlikely, not unlikely enough for comfort, but also totally not worth preparing for because there's basically no way to predict what preparations would actually be helpful, nor is it obvious that you'd even want to survive it if it happened.
The expected value of prepping for it isn't worth the investment, so it's better to do what most of us already do for nuclear war and pretty much pretend it won't happen.
breckenedge 4 days ago |
Here’s a talk from a month ago that covers a few use cases that are definitely beyond simple glorified workflow automation.
https://youtu.be/SpKtpW9TGF0?si=TRE6o7FfzCmhBuZq
smt88 5 hours ago |
This is a demo. OP is asking for examples of usage in production, where the agent is actually doing work.
It's also not really what people are promising with "agentic" because there's a human prompting and assisting it the entire time.
idkwhattocallme 4 days ago |
sales ops here. I was just tasked with figuring out how to use AI to use previous quotes to generate new quotes so sales people don't spend so much time creating quotes. Seems like the perfect thing for an agent. Anyone done this?
GianFabien 4 days ago |
In my pre-sales career, we just did copy and paste for spreadsheets and docs. Most quotes only require finding the nearest recent one and a replace-all for key bits of information followed by careful proof-reading.
schappim 4 days ago |
>> Anyone done this?
Yes, we have and more!
We sell maker and STEM education electronics, but the profit margins on products like Raspberry Pis, Micro:bits, and Arduinos are, well, pretty slim. This has pushed us to become extremely efficient; so much so that we ended up creating our own AI-agent-based ERP platform called Koi [1]
In essence, our work is built on the shoulders of giants like OpenAI’s Assistants API, Anthropic, and Rails.
One of our standout demos is that certain objects (Orders, Quotes, Supplier Orders, Customers etc) in our database are assigned their own email addresses (using Rails' Action Mailbox[2]). Emails can be forwarded directly to these objects, whether it’s an order, a customer, or a supplier order.
From there, our agent, “Koi,” automatically extracts relevant information from emails and takes appropriate actions. For example, Koi can create a quote, attach a purchase order PDF to an order, or extract tracking information from supplier shipping confirmation emails to provide live tracking updates.
It also works the other way around; you can ask Koi to send a customer their tax invoice or inform them that a product they were interested in is out of stock, seamlessly handling typical customer service tasks.
Previously, we integrated speech-to-text functionality using the Whisper API, which made for an impressive demo.
Now, we’re taking it a step further by rebuilding our speech system to leverage OpenAI’s new WebRTC-based Real-time API. The key advantage here is that it comes with function calling support[3]. We already support a variety of automation features using barcodes[4], allowing users to scan a barcode and have Koi perform specific actions. This has proven to be an ideal area in the application to integrate tool use with the real-time API, creating even more powerful and efficient workflows.
Our ultimate goal is to integrate this system with Bishop, our product-picking robot[5].
[1] https://www.koi.app
[2] https://guides.rubyonrails.org/action_mailbox_basics.html
[3] https://platform.openai.com/docs/guides/realtime-model-capab...
[4] https://help.koi.app/article/54-barcode-driven-fulfillment
[5] https://piaustralia.com.au/pages/the-raspberry-pi-that-ships...
mattmanser 3 days ago |
Your spiel here is much better than the website you've linked.
What you've linked sounds like you're selling a glorified shipping label printer.
I'm curious how this differs from standard TA/TMS systems that have been around for decades. I work in the space and there are plenty of TA/TMS systems that print shipping labels and fulfil orders, update stock levels, send out tracking emails + SMS messages, integrate with carriers for shipment updates, and integrate with Shopify, eBay, Etsy, BigCommerce, etc.
They didn't need AI to do any of that. What's the advantage you're finding?
Here's an example that seems to operate in Australia:
https://www.shipstation.com/
schappim 3 days ago |
Shipping is a fraction of what the system does. To completely automate shipping you need an understanding of inventory etc. To do automated customer service, you need knowledge of shipping, inventory etc.
mattmanser 2 days ago |
That's why they call it logistics and not shipping.
AznHisoka 4 days ago |
Replace the word “agent” with algorithm and I agree. Why overcomplicate things?
Lionga 3 days ago |
Cause he can say he used AI and get a promotion and the company can put AI on the website and make stock price go up.
brookst 6 hours ago |
The difference is that algorithms have known inputs and “agent” implies a greater level of adaptability to unforeseen inputs.
odyssey7 6 hours ago |
That sounds alright, but I'm having difficulty imagining a situation where a business wants to produce a quote with novel element types / parameterizations not yet seen before without a human hand in the loop.
codingdave 3 days ago |
Sounds like a poorly thought out requirement. If you are tasked with speeding up the generation of quotes and find that AI can do the job well, that is perfectly reasonable. But if you are told what tool to use to make it happen, whoever tasked you with it doesn't understand that AI is a tool, not a goal. (I say that often enough, I may need to put it on t-shirts.)
threatripper 2 days ago |
For him and his boss and the boss of his boss it may well be a goal to use more AI in business processes. It may be decided in the strategy to spend X% on AI in the next 3 years. So you will do exactly that and not question if it makes sense at all.
linuxftw 6 hours ago |
I disagree here. It sounds to me like the requirements are clear: Use some AI "agent" to perform this task. That means it should be trained on a particular dataset, and it should perform a particular function. This would be in place of trying to write software to directly do this, just let the AI perform task processing, proposal drafting, document formatting.
jimkri 3 days ago |
I've created agents for the following:
- ICP / Sales Agent: I hired an offshore resource and built a GPT that they can send titles and other identifiers to, and it says whether each one is in our ICP or not. I created it for a specific process that has outlined steps and an FAQ from that person on things they have encountered; I plan on adding more questions and answers. This was super helpful for saving time answering questions about titles and for improving the results of their work.
- Domain Policy Scan (SPF, DKIM, DMARC): I scan domains and find SPF records and then use an Agent and a prompt to break out all the system tokens from the SPF to understand the systems companies are using. The prompt is a constant work in progress, but I have it down to being really consistent.
Both have been really helpful to my overall workflow.
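For the SPF scan described above, the token-extraction half can be sketched without an LLM at all. This is only a rough illustration, assuming the TXT record has already been fetched: `spf_includes`, `systems_in_use`, and the `KNOWN_SENDERS` table are hypothetical names made up for this example, and the table stands in for the mapping that the prompt does in the setup described.

```python
def spf_includes(record):
    """Return the domains named by include: mechanisms in one SPF TXT record."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return []  # not an SPF record
    domains = []
    for term in terms[1:]:
        # Drop an optional qualifier (+, -, ~, ?) in front of the mechanism.
        mechanism = term.lstrip("+-~?")
        name, sep, value = mechanism.partition(":")
        if name.lower() == "include" and sep and value:
            domains.append(value.lower())
    return domains


# Hypothetical, illustrative lookup table (an assumption, not a complete list).
KNOWN_SENDERS = {
    "_spf.google.com": "Google Workspace",
    "spf.protection.outlook.com": "Microsoft 365",
    "sendgrid.net": "SendGrid",
}


def systems_in_use(record):
    """Map each included domain to a system name, flagging unknown ones."""
    return [KNOWN_SENDERS.get(d, f"unknown ({d})") for d in spf_includes(record)]


if __name__ == "__main__":
    sample = "v=spf1 include:_spf.google.com include:sendgrid.net ip4:192.0.2.0/24 -all"
    print(systems_in_use(sample))
```

A fixed table like this only covers senders you already know about; the fuzzy part (recognising an unfamiliar include domain) is presumably where the prompt earns its keep.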
Lionga 3 days ago |
Isn't that just simple glorified workflow automation? Shouldn't "agents" decide what to do themselves, and then do it, based on the holy prophecy of VCs and AI startups?
PhilippGille 3 days ago |
Yes, for example from Anthropic's definition [1]:
> Workflows are systems where LLMs and tools are orchestrated through predefined code paths.
> Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.
[1] https://www.anthropic.com/research/building-effective-agents
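To make the quoted distinction concrete, here is a minimal sketch (not Anthropic's code). Every name in it (`TOOLS`, `run_workflow`, `stub_model`, `run_agent`) is hypothetical, and `stub_model` is a deterministic stand-in for a real LLM call so the example runs on its own.

```python
# Two toy tools that read and extend a shared state dict.
TOOLS = {
    "lookup_order": lambda state: {**state, "order": "#1001 shipped"},
    "draft_reply": lambda state: {**state, "reply": f"Your order {state['order']}"},
}


def run_workflow(state):
    """Workflow: the sequence of tool calls is fixed in code ahead of time."""
    for step in ("lookup_order", "draft_reply"):
        state = TOOLS[step](state)
    return state


def stub_model(state):
    """Stand-in for an LLM: returns the next tool name, or None when done."""
    if "order" not in state:
        return "lookup_order"
    if "reply" not in state:
        return "draft_reply"
    return None


def run_agent(state, choose_next=stub_model, max_steps=10):
    """Agent: the model picks each next tool and decides when to stop."""
    for _ in range(max_steps):
        tool = choose_next(state)
        if tool is None:
            break
        state = TOOLS[tool](state)
    return state


if __name__ == "__main__":
    print(run_workflow({}))
    print(run_agent({}))
```

With a stub this simple both paths produce the same result; the difference only shows up once `choose_next` is a real model that may pick a different order, skip a step, or loop.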
jimkri 5 hours ago |
This is helpful, thanks for sharing that.
OutOfHere 6 hours ago |
Those sound like basic LLM workflows with nothing agentic about them.
readyplayernull 2 days ago |
I like this distinction from automation by Bartosz Pucek:
At its core, an Agent is software that can:
- Take in a task description
- Break it down into steps
- Execute those steps using available tools
- Adapt its approach based on feedback
The key distinction from traditional automation: Agents handle variance and uncertainty by replanning rather than failing when their happy path breaks.
Source: https://newsletter.pucek.com/p/2025-the-state-of-ai-agents-a...
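The "replanning rather than failing" part of that definition can be sketched in a few lines. This is an illustration under stated assumptions, not code from the linked newsletter: the two fetch tools, `plan`, and `run` are hypothetical, and the planner is a plain function where a real agent would call a model.

```python
def fetch_from_api(task):
    # Simulates the happy path breaking.
    raise ConnectionError("API unavailable")


def fetch_from_cache(task):
    return f"cached result for {task}"


def plan(task, failed):
    """Stand-in planner: ordered tool choices, skipping ones that already failed."""
    candidates = [fetch_from_api, fetch_from_cache]
    return [tool for tool in candidates if tool.__name__ not in failed]


def run(task):
    """Try the planned approach; on failure, record it and plan again."""
    failed = set()
    while True:
        steps = plan(task, failed)
        if not steps:
            raise RuntimeError(f"no remaining approach for {task!r}")
        tool = steps[0]
        try:
            return tool(task), sorted(failed)
        except Exception:
            failed.add(tool.__name__)  # feedback from the failure drives the next plan


if __name__ == "__main__":
    print(run("order 42"))
```

A traditional automation would stop at the `ConnectionError`; here the failure is fed back into the planner, which is the behaviour the definition is pointing at.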
alper 3 hours ago |
This seems impossibly broad. Realistically an agent would only be able to take tasks for whatever limited domain it can execute on.
So either every approach it does has to be hard coded or it would be able to use a bunch of very generic modules to plan and execute an approach.
davidgerard 3 hours ago |
so is there a real checkable example of one that's doing work?
jncfhnb 6 hours ago |
Most “agents” are just starter prompts + a small set of tools that they can use to respond to things, like access to a database.
They’re workflow automation
viraptor 4 hours ago |
By that definition, humans doing tasks are workflow automation.
jncfhnb an hour ago |
Humans are autonomous and not task specific
varelaseb 2 hours ago |
What else do you want them to be?
jncfhnb an hour ago |
I’d like it if we could trust agents to have agency, but in practice they don't.
Jerrrry 6 hours ago |
"Workflow automation" is gonna be the goalpost to beat
neom 6 hours ago |
The name is less important to me than whether I can have something that does my work for me. I've been starting to play with my own solutions across the 3 foundation model companies. I've started to try to build my own stuff a bit; I think I need to learn more about AppleScript. So far my experiments have also required me to have multiple systems running to make it super easy for me.
You're all going to laugh at this stuff because it's so remedial and also clearly not agents, but here are a couple of things I've done... I won't say I really USE this stuff daily, I just play to see what I can do. I've figured out how to pass screenshots back and forth between models (I have one computer take a screenshot every 30 minutes and send that screenshot to another machine; that machine is set up with a mouse hovering over the upload button on Perplexity, it uploads the screenshot, and then Perplexity does the work from the screenshot). An example of this that worked OK: I had ChatGPT create all the themes for the social media schedule I needed to do this year, then I passed that screenshot to Perplexity to do the searching on the web, and then I passed that to Claude to write the tweet. This actually works OK-ish and I'm going to expand it a bit over the coming weeks I guess. Tools like this are super helpful for weird hacks like that: https://github.com/BlueM/cliclick
Another thing I've found actually works pretty well is setting up two computers next to each other with ChatGPT voice mode. If you give them custom instructions to be sure to wait for the other one to be done talking, they don't interrupt each other and can get quite a bit of work done. Here is just a video of the MVP that I sent to a friend ages ago once I started playing with the idea: https://s.h4x.club/kpuzNkNL - I actually use this method of working quite often now, a couple of times a week at least, and I find it's pretty helpful. If I knew how to put 4/5 models together in one app and give them each custom instructions, I'd love to try building a team (if someone out there actually knows how to build this kinda stuff, I'm happy to help flesh out how the product would need to work, but I don't think it's super difficult to build at this point, I'm just not technical enough).
neom 2 hours ago |
Just an update here: I forgot I'm supposed to have childlike wonder and it's the weekend, but then I remembered... sooo... 4 hours later I now have a complete marketing department of agents, and it works pretty well actually. I gave it a high-level task around building a full campaign, and it is doing it. Here is the social media manager agent off on its own composing the tweets; the social media manager agent is built with 4 internal agents, but calls out to my hackernews agent and my google search agent when needed. It actually works super well... you can see it running here (the manager even told it to do all the tweets for the year, so I presume it's going to stop at 365 tweets): https://s.h4x.club/eDubwABJ
Going to spend the rest of the day building out the full system till I have a complete complement of agents that can do every task in the startup, heh.
chevman 6 hours ago |
Been in BigCo land for 20 years now, and have seen the rise and fall of quite a few AI/ML/RPA etc fads.
Honestly the whole landscape seems broken and unproductive at this point.
Countless vendors, platforms, cloud environments, industry/technical jargon - all with different pricing models, SLAs, tooling, etc etc.
Getting anything usable is a challenge and most orgs spin in a never ending cycle of data integration/normalization work that produces little business value.
My advice to teams now is simplify, reduce, streamline - get to the kernel of what you think you need and protect it at all costs. Most of the shiny new objects being pitched as silver bullets are just ways for other people to make money off your margin.
jokethrowaway 6 hours ago |
Just a buzzword for investors given we peaked with language models.
Chaining different prompts can be useful, but calling that "agents" is purely marketing: these models are pretty dumb and don't have agency. I'd stay away from related frameworks.
kodablah 6 hours ago |
What's wrong with an agent being a glorified workflow? At Temporal (where I work), it seems plenty natural for agentic AI to just be AI workflows. Here's a video we put out this week demonstrating it: https://youtu.be/GEXllEH2XiQ (code at https://github.com/steveandroulakis/temporal-ai-agent).
lcrmorin 6 hours ago |
Perfect example of what OP is asking about. This is just a demo. What problems does it solve for you or your clients so that you make money?
kodablah 3 hours ago |
Replied to another in this thread, but basically https://temporal.io/in-use lists many, AI and not.
lolinder 5 hours ago |
This is what OP is explicitly not asking for—it's just a demo of a theoretical case, Temporal showing how a company that's hyped up on AI agents could use your platform to do agent-y things.
OP wants to know if anyone is actually using this stuff productively, not if anyone has tech demos. We've all seen more than enough tech demos.
kodablah 3 hours ago |
I fear I'll come off as a shill, but I've seen dozens of company uses of AI in workflows in the real world. Agents are just orchestrating multiple AI steps basically (granted not all of them are using AI to _pick_ the step to take which is often what "agentic" is seen as). Some are listed at https://temporal.io/in-use alongside the many non-AI things, e.g. https://temporal.io/resources/case-studies/bugcrowd, https://temporal.io/resources/on-demand/arc-xp-washington-po..., https://temporal.io/resources/on-demand/practical-tactical-a..., and more and more. All those companies use AI workflows in real world cases, and there are many more. I only showed the tutorial to agree with OP that it is just workflows with fuzzier steps and that's ok.
theptip 5 hours ago |
This is interesting stuff, and a great stepping stone. I think the excitement around true agents will come when the AI can author the workflow pipeline, so to speak, in response to a request.
This is an area where terminology is in flux but I think of weak agents as mostly-hardcoded, eg if you wrote a flight booking bot that can converse with you about flight options then go do the booking - but you specified the APIs and workflow engine. Strong agents can self-directedly follow long range goals over long time frames, eg “run this business unit for me” or “manage my portfolio”.
dhanushreddy29 6 hours ago |
I still don't understand what the hype really is here. An agent is just "A SMART FUNCTION CALLING ROUTER" at ground level, nothing more, nothing less.
It could be called a smart bot or bot 2.0 or something, but "agent" is way too much. Nothing is really agentic in agents.
falcor84 6 hours ago |
I'm not exactly clear what you're asking. Where do you draw the line between "workflow automation" and "doing work"? To me it just seems like a spectrum with rapidly moving goal posts.
A decade ago, enterprises had quite a lot of roles involving essentially moving data from one ERP screen to another. From what I'm seeing, these roles seem to be quickly disappearing, with a combination of proper API-based automation, GUI automation and most recently LLM "agents" in crucial steps.
And on a very different note, I as a developer could ask an AI tool such as Aider or Windsurf to perform a big refactoring or other code change, working autonomously across code changes and shell commands until it passes all tests - this is agentic behavior that I didn't have even a year ago.
tiffanyh 5 hours ago |
“Agents” are the new “middleware” / “workflow automation”.
What’s old is new again.
(Which is also why Salesforce is going big on agents. They acquired MuleSoft 8 years ago and agents are the next evolution of middleware.)
deadbabe 5 hours ago |
How can people tolerate the non-deterministic nature of AI agents in critical production workflows?
th0ma5 5 hours ago |
That's the secret here that everyone knows so well that you can't even really say it, because saying it doesn't add anything. But you can't tolerate it, I wouldn't think.
deadbabe 4 hours ago |
Not sure why it’s a secret, it’s a pretty big limitation, basically means AI agents are just a good tool for problem domains where mistakes can be tolerated or where no better alternative exists because the problem space is too vast to create solutions that work predictably 100% of the time.
th0ma5 5 hours ago |
Amazing that we're nearing the 50-comment mark and there only seem to be people who have successfully created tutorial examples. And some other things that could be done with more purpose-specific traditional solutions. And some people showing love for the concepts. This is probably the bleakest I've seen an Ask HN thread, considering this is where all the money is going. I think one stark thing that maybe isn't being addressed is that the value of the models is being completely controlled by the model creators, or else there would be at least one story by now of success that doesn't involve merely making the LLM products available to customers as a middle entity.
monsieurgaufre 4 hours ago |
I share your impression. For something that is hyped that much, it does not seem to have much real world use.
thekevan 5 hours ago |
I have mentioned this on Twitter recently. My stream there is full of people talking about agents being the future, several posts on how to make them, but almost zero examples of any that they have built or used.
williamcotton 5 hours ago |
Cursor’s agent in the Composer workflow will check the linter as well as run tests in the “yolo” mode.
What makes it an agent is the feedback loop of making a change and then seeing the results and making further changes.
rkuodys 3 hours ago |
Unfortunately I'm one of those who doesn't have working stuff yet, but hopefully I will soon enough.
My thought process on agentic work is the following: treat agents as input-output operations that get merged into deterministic processes.
To be more specific: from what I see in my non-tech industry, when you try to implement process management, people are both quite good and terrible at following agreed processes. They are great at detecting deviation from the process (when an exception is needed) and terrible at doing the same thing for the 1000th time in a row.
So at a high level, I think agents should handle the automation and detect when there is a deviation from the process, in which case a human should take over.
Tldr - I don't think agentic workflows without humans will be here any time soon. But we will have 2 humans + agents replacing a 10-human team.
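A rough sketch of that "automate the routine, hand deviations to a human" split, with invoice approval picked purely as a made-up example domain. `Invoice`, `APPROVED_VENDORS`, `AUTO_LIMIT`, `deviations`, and `process` are all hypothetical; in a real setup the deviation check might be an LLM call rather than two hard-coded rules.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    vendor: str
    amount: float


APPROVED_VENDORS = {"Acme", "Globex"}  # hypothetical agreed process
AUTO_LIMIT = 1000.0                    # hypothetical auto-approval ceiling


def deviations(invoice):
    """List the ways this invoice departs from the agreed process."""
    reasons = []
    if invoice.vendor not in APPROVED_VENDORS:
        reasons.append("unknown vendor")
    if invoice.amount > AUTO_LIMIT:
        reasons.append("amount over limit")
    return reasons


def process(invoice):
    """Handle routine invoices automatically; route any deviation to a human."""
    reasons = deviations(invoice)
    if reasons:
        return ("human", reasons)
    return ("auto", [])


if __name__ == "__main__":
    print(process(Invoice("Acme", 250.0)))
    print(process(Invoice("Initech", 5000.0)))
```

The routine path stays deterministic and auditable, and the human only sees the cases the process did not anticipate.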
davidgerard 3 hours ago |
reminiscent of:
Ask HN: Are there any substantial examples of blockchain solving a real problem? (2020)
https://news.ycombinator.com/item?id=22914430