"LLM as UI" seems to be something hanging pretty low on the tree of opportunity. Why spent months struggling with complex admin dashboard layouts and web frameworks when you could wire the underlying CRUD methods directly into LLM prompt callbacks? You could hypothetically make the LLM the exclusive interface for managing your next SaaS product. There are ways to make this just as robust and secure as an old school form punching application.
- you ask the LLM to build a workflow for your problem
- the LLM builds the workflow (macro) using predefined commands
- you review the workflow (can be an intuitive list of commands, understandable by a non-specialist) - to weed out hallucinations and misunderstandings
- you save the workflow and can use it without any LLM agents, just by clicking a button - pretty deterministic and reliable (see the sketch below)
Advantages:
- reliable, deterministic
- you don't need to learn a product's UI, you just formulate your problem using natural language
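A minimal sketch of how that could hang together, assuming a hypothetical command registry and a `call_llm` stub in place of a real model call (the LLM only drafts the workflow; replaying a saved workflow never touches it):

```python
import json

# Hypothetical registry of predefined commands the system already exposes.
# The LLM may only combine these commands; it cannot invent new operations.
COMMANDS = {
    "find_customer":  lambda args: print("find customer", args["email"]),
    "create_invoice": lambda args: print("create invoice for", args["amount"]),
    "send_email":     lambda args: print("send email using", args["template"]),
}

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call; returns a canned workflow for the demo.
    return json.dumps([
        {"command": "find_customer",  "args": {"email": "a@example.com"}},
        {"command": "create_invoice", "args": {"amount": 100}},
        {"command": "send_email",     "args": {"template": "invoice_ready"}},
    ])

def draft_workflow(problem: str) -> list[dict]:
    """Ask the LLM to express the problem as a list of known commands.
    The result is shown to the user for review before it is saved."""
    prompt = (
        f"Using ONLY these commands {list(COMMANDS)}, produce a JSON list of "
        '{"command": ..., "args": {...}} steps that solves: ' + problem
    )
    return json.loads(call_llm(prompt))

def run_workflow(steps: list[dict]) -> None:
    """Deterministic replay of a saved, reviewed workflow - no LLM involved."""
    for step in steps:
        COMMANDS[step["command"]](step["args"])

saved = draft_workflow("invoice customer a@example.com and notify them")
run_workflow(saved)  # behaves the same every time it is run
```

The reviewed JSON list is the saved artifact; running it is just dictionary lookups over known commands, which is where the determinism comes from.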
so you define a DSL that the LLM outputs, and that's the real UI
>- you don't need to learn a product's UI, you just formulate your problem using natural language
yes, you do. You have to learn the DSL you just manifested so that you can check it for errors. Once you can review the LLM's output, you can also just write the DSL yourself to get the desired behavior, at which point that will be faster unless it's a significant amount of typing. And even then, you will still need to review the code generated by the LLM, which means you have to learn and understand the DSL. I would much rather learn a GUI than a DSL.
You haven't removed the UI, nor have you made the LLM the UI, in this example. The DSL (the "intuitive list of commands"... I guess it'll look like Robot Framework, right? That's what human-readable DSLs tend to look like in practice) is the actual UI.
This is vastly more complicated than having a GUI to perform an action.
1) "it's unpredictable each time" - it won't be, if a workflow is saved and tested, because when it's run, no LLM is involved anymore in decision making
2) I did remove the UI, because I don't need to learn the UI, I just formulate my problem and the LLM constructs a possible workflow which solves my problem out of predefined commands known to the system.
Sure, this is most useful for more complex apps. In our homegrown CRM/ERP, users have lots of different workflows depending on their department, and they often experiment with workflows, and today they either have to click through everything manually (wasting time) or ask devs to implement the needed workflow for them (also wasting time). If your app has 3 commands on 1 page, then sure, it's easier to do it through the GUI.
Also, IMHO it can be used alongside the GUI, it doesn't need to replace it. I think it's great for discoverability/onboarding and automation, but if you want to click through everything manually, why not.
2) You can test the created workflow on a bunch of test data to verify it works as intended. After a workflow is created, it's deterministic (since we don't use LLMs anymore for decision making), so it will always work the same.
Sure we can expose DSL to power users as an option, but is reading the raw DSL really required for the majority of cases?
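A sketch of what that verification could look like: a saved workflow treated as plain data, replayed by a tiny interpreter over a handful of test records (the commands here are hypothetical, just to make the example self-contained):

```python
# Hypothetical saved workflow (a list of predefined commands) plus a tiny
# pure interpreter, so the checks below are self-contained.
WORKFLOW = [
    {"command": "set_status", "args": {"status": "approved"}},
    {"command": "add_tag",    "args": {"tag": "q3-campaign"}},
]

def apply_step(record: dict, step: dict) -> dict:
    out = dict(record)
    if step["command"] == "set_status":
        out["status"] = step["args"]["status"]
    elif step["command"] == "add_tag":
        out["tags"] = out.get("tags", []) + [step["args"]["tag"]]
    return out

def run(record: dict) -> dict:
    for step in WORKFLOW:
        record = apply_step(record, step)
    return record

# Verify against a bunch of test records: same input, same output, every time.
test_records = [{"id": 1, "status": "new"}, {"id": 2, "status": "on-hold"}]
for rec in test_records:
    first, second = run(rec), run(rec)
    assert first == second                # deterministic replay
    assert first["status"] == "approved"  # intended effect
print("workflow behaves as intended on all test records")
```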
2. This is absolutely true and it does help somewhat. However, writing the test cases is now your bottleneck (and you're writing them as a substitute for being able to read a reliable high-level summary of what the workflow actually is).
I am pretty sure PLCs with ladder logic are about the limits of the traditional visual/macro model?
Word-sense disambiguation is going to be problematic with the 'don't need to learn' part above.
Consider this sentence:
'I never said she stole my money'
Now read that sentence multiple times, putting emphasis on each word, one at a time, and notice how the semantic meaning changes.
LLMs are great at NLP, but we still don't have solutions to those NLU problems that I am aware of.
I think that to keep maximum generality without severely restricting use cases, a common DSL would need to be developed.
There will have to be tradeoffs made, specific to particular use cases, even if it is better than Alexa.
But I am thinking about Rice's theorem and what happens when you lose PEM.
Maybe I am just too embedded in an area where these problems are a large part of the difficulty for macro-style logic to provide much use.
This is the idea that is most valuable from my perspective of having tried to extract accurate requirements from the customer. Getting them to learn your product UI and capabilities is an uphill battle if you are in one of the cursed boring domains (banking, insurance, healthcare, etc.).
Even if the customer doesn't get the LLM-defined path to provide their desired final result, you still have their entire conversation history available to review. This seems more likely to succeed in practice than hoping the customer provides accurate requirements up-front in some unconstrained email context.
Even then mistakes can slip through, but it could still be more reliable than a visual UI.
There are lots of horrible web UIs I would LOVE to replace with a conversational LLM agent. No. 1 is Jira, and so are No. 2 and No. 3.
It's difficult to be precise. Often it's easier to gauge things by looking at them while giving motor feedback (e.g. turning a dial, pushing a slider) than to say "a little more X" or "a bit less Y".
Language is poorly suited to expressing things in continuous domains, especially when you don't have relevant numbers that you can pick out of your head - size, weight, color etc. Quality-price ratio is a particularly tough one - a hard numeric quantity traded off against something subjective.
Most people can't specify up front what they want. They don't know what they want until they know what's possible, what other people have done, started to realize what getting what they want will entail, and then changed what they want. It's why we have iterative development instead of waterfall.
LLMs are a good start and a tool we can integrate into systems. They're a long, long way short of what we need.
Yes if you want to annoy your users and deliberately put roadblocks to make progress on a task. Exhibit A: customer support. They put the LLM in between to waste your time. It’s not even a secret.
> Why spend months struggling with complex admin dashboard layouts
You can throw something together, and even auto generate forms based on an API spec. People don’t do this too often because the UX is insufficient even for many internal/domain expert support applications. But you could and it would be deterministic, unlike an LLM. If the API surface is simple, you can make it manually with html & css quickly.
Overuse of web frameworks has completely different causes than "I need a functional thing" and thus it cannot be solved with a different layer of tech like LLMs, NFTs or big data.
No, this is because they use the LLM not only as a human interface but also as a reasoning engine for troubleshooting. And they give it way less capability than a human agent to boot. So all it can really do is serve FAQs and route to real support.
In this case the fault is not with the LLM but with the people that put it there.
That's a funny definition to me, because doing so would mean the LLM is the agent, if you use the classic definition for "user-agent" (as in what browsers are). You're basically inverting that meaning :)
quick demo: https://youtu.be/2zvbvoRCmrE
> An agent, in the context of AI, is an autonomous entity or program that takes preferences, instructions, or other forms of inputs from a user to accomplish specific tasks on their behalf. Agents can range from simple systems, such as thermostats that adjust ambient temperature based on sensor readings, to complex systems, such as autonomous vehicles navigating through traffic.
This appears to be the broadest possible definition, encompassing thermostats all the way through to Waymos.
This is exactly the problem and these two categories nicely sum up the source of the confusion.
I consider myself in the former camp. The AI needs to determine my intent (book a flight) which is a classification problem, extract out the relevant information (travel date, return date, origin city, destination city, preferred airline) which is a Named Entity Recognition problem, and then call the appropriate API and pass this information as the parameters (tool usage). I'm asking the agent to perform an action on my behalf, and then it's taking my natural language and going from there. The overall workflow is deterministic, but there are elements within it that require some probabilistic reasoning.
Unfortunately, the second camp seems to be winning the day. Creating unrealistic expectations of what can be accomplished by current day LLMs running in a loop while simultaneously providing toy examples of it.
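For what it's worth, the first-camp pipeline described above can be sketched roughly like this, with the model confined to the classification and extraction steps and a deterministic API call at the end (the helper functions and the booking call are stand-ins, not a real library):

```python
from dataclasses import dataclass

@dataclass
class FlightRequest:
    origin: str
    destination: str
    depart: str
    return_date: str | None
    airline: str | None

def classify_intent(utterance: str) -> str:
    # Stand-in for an LLM (or classic classifier) call: map free text to one
    # of a fixed set of known intents.
    return "book_flight" if "flight" in utterance.lower() else "unknown"

def extract_slots(utterance: str) -> FlightRequest:
    # Stand-in for the NER/extraction step returning structured fields.
    return FlightRequest("BOS", "SFO", "2025-03-01", "2025-03-08", None)

def book_flight(req: FlightRequest) -> str:
    # Deterministic tool call: by now the parameters are plain typed data,
    # so this step behaves like any other API integration.
    return f"booked {req.origin}->{req.destination} departing {req.depart}"

utterance = "Book me a flight from Boston to San Francisco next weekend"
if classify_intent(utterance) == "book_flight":
    print(book_flight(extract_slots(utterance)))
```

The probabilistic pieces (intent, slots) are boxed in at the top; everything after that is an ordinary, testable workflow.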
'You ask it to do something, and it does it'
That makes it difficult to differentiate the more critical 'how' options in the execution process. From that perspective: deterministic integrations, LLM+tools, LAM, etc are more descriptive categories, each with their own capabilities, strengths, and weaknesses.
Or to put it a different way, if the term doesn't tell you what something is good and bad at, it's probably an underspecified term.
To illustrate: here's a paper from 1996 that tries to lay out a taxonomy of the different kinds of agents and provide some definitions:
https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&d...
And another from the same time-frame, which makes a similar effort:
https://www.researchgate.net/profile/Stan-Franklin/publicati...
Scaling agent capability requires agents that are able to auto-map various tools.
If every different tool is a new, custom integration, that must be written by a person, then we end up where we are today -- specialized agents where there exists enough demand and stability to write and maintain those integrations, but no general purpose agents.
Ultimately, parameter mapping in a sane, consistent, globally-applicable way is the key that unlocks an agentic future, or a failure that leads to its demise.
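Current attempts at this mostly take the form of machine-readable tool descriptions with JSON-Schema-style parameters that an agent runtime can discover and validate against; a hypothetical example of what such a description and a minimal structural check might look like:

```python
# A hypothetical tool description in the JSON-Schema style used by most
# current function-calling / tool-use setups. The point is that an agent can
# read this at runtime and map arguments onto the parameters without a
# hand-written integration for each tool.
CREATE_TICKET_TOOL = {
    "name": "create_ticket",
    "description": "Open a support ticket in the helpdesk system",
    "parameters": {
        "type": "object",
        "properties": {
            "title":    {"type": "string", "description": "Short summary"},
            "priority": {"type": "string", "enum": ["low", "normal", "high"]},
            "assignee": {"type": "string", "description": "Team or user id"},
        },
        "required": ["title", "priority"],
    },
}

def validate_call(tool: dict, args: dict) -> dict:
    # Minimal structural check an agent runtime might do before invoking the
    # real API (a real system would use a proper JSON-Schema validator).
    missing = [p for p in tool["parameters"]["required"] if p not in args]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return args

print(validate_call(CREATE_TICKET_TOOL, {"title": "VPN down", "priority": "high"}))
```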
Now that name makes a lot more sense to me.
I suspect they'll follow up with a full paper with more details (and artifacts) of their proposed approach.
WAY longer than that. What's come to the forefront specifically in the last year or two is a very specific subset of the overall agent landscape, what I like to call "LLM Agents". But "Agents" at large date back to at least the 1980s, if not before. For some of the history of all of this, see this page and some of the listed citations:
https://en.wikipedia.org/wiki/Software_agent
> Agents are just LLMs with structured output
That's only true for the "LLM Agent" version. There are Agents that have nothing to do with LLMs at all.
In 1994 people were already complaining that the term had no universally agreed definition: https://simonwillison.net/2024/Oct/12/michael-wooldridge/
I could not find a "Agents considered harmful" related to AI, but there is this one: "AgentHarm: A benchmark for measuring harmfulness of LLM agents" https://arxiv.org/pdf/2410.09024
This "Agents considered harmful" is not AI-related: https://www.scribd.com/document/361564026/Math-works-09
https://www.anthropic.com/research/building-effective-agents
"For many applications, however, optimizing single LLM calls with retrieval and in-context examples is usually enough."
However, the state of agents has changed slightly: where we had 25% accuracy in multi-turn conversations, we're now at 50%.
Soon it will be AI Microservices
/sarcasm, hopefully obviously
So was "mobile" 15 years ago. Companies are deploying hundreds of billions in capital for this. It's not going anywhere, and you'd be best off upskilling now instead of dismissing things.
>solve for prompt injection attacks
It is essentially the same Code as Data problem as always. So we should think about these things in terms of how much agency we are willing to give away in each case and for what gain [1].
Then the ecosystem question that the paper is trying to solve will actually solve itself, because it is already the case today that in many processes agency has been outsourced almost fully and in others - not at all. I posit that this will continue, just expect a big change of ratios and types of actions.
[1] https://essays.georgestrakhov.com/artificial-agency-ladder/
Yes, the term is becoming ambiguous, but that's because it's abstracting out the part of AI that is most important and activating: the ability to work both independently and per intention/need.
Per the paper: "Key characteristics of agents include autonomy, programmability, reactivity, and proactiveness.[...] high degree of autonomy, making decisions and taking actions independently of human intervention."
Yes, "the ecosystem will evolve," but to understand and anticipate the evolution, one needs a notion of fitness, which is based on agency.
> So we should think about these things in terms of how much agency are we willing to give away in each case
It's unclear there can be any "we" deciding. For resource-limited development, the ecosystem will evolve regardless of our preferences or ethics according to economic advantage and capture of value. (Manufacturing went to China against the wishes of most everyone involved.)
More generally, the value of AI is not just replacing work. It's giving more agency to one person, avoiding the cost and messiness of delegation and coordination. It's gaining the same advantages seen where a smaller team can be much more effective than a larger one.
Right now people are conflating these autonomy/delegation features with the extension features of AI agents (permitting them to interact with databases or web browsers). The extension vendors will continue to claim agency because it's much more alluring, but the distinction will likely become clear in a year or so.
Certainly those in China and the executive suites of Western countries wished it, and made it happen. Arguably the western markets wanted it too when they saw the prices dropping and offerings growing.
AI isn't happening in a vacuum. Shareholders and customers are buying it.
Hugging Face have their own definitions of a few different types of agent/agentic system here:
https://huggingface.co/docs/smolagents/en/conceptual_guides/...
As related to LLMs, it seems most people are using "agent" to refer to systems that use LLMs to achieve some goal - maybe a fairly narrow business objective/function that can be accomplished by using one or more LLMs as a tool to accomplish various parts of the task.
It's just calling an LLM n times with slightly different prompts
Sure, you get the ability to correct previous mistakes, it's basically a custom chain of thought - but errors compound and the results coming from agents have a pretty low success rate.
Bruteforcing your way out of problems can work sometimes (as evinced by the latest o3 benchmarks) but it's expensive and rarely viable for production use.
It can be, but ideally each agent’s model, prompts and tools are tailored to a particular knowledge domain. That way tasks can be broken down into subtasks which are classified and passed to the agents best suited to them.
Agree RE it being bruteforce and expensive but it does look like it can improve some aspects of LLM use.
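A toy sketch of that routing idea, with a stub classifier and stub per-domain agents standing in for real model-backed components (each of which would have its own model, prompts, and tools):

```python
# Toy router: classify a subtask, then hand it to the agent specialized for
# that knowledge domain. The classifier and agents are stubs.
def classify(task: str) -> str:
    if "refund" in task or "invoice" in task:
        return "billing"
    if "stack trace" in task or "error" in task:
        return "engineering"
    return "general"

AGENTS = {
    "billing":     lambda t: f"[billing agent] handling: {t}",
    "engineering": lambda t: f"[engineering agent] handling: {t}",
    "general":     lambda t: f"[general agent] handling: {t}",
}

for task in ["customer asks for a refund on invoice 4711",
             "app crashes with a null pointer error"]:
    print(AGENTS[classify(task)](task))
```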
That's one way of building something you could call an "agent". It's far from the only way. It's certainly possible to build agents where the LLM plays a very small role, or even one that uses no LLM at all.
In practice the current usage of "agent" is just: a program which does a task and uses an LLM somewhere to help make a decision as to what to do and maybe uses an LLM to help do it.
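Under that loose definition, an "agent" boils down to a loop along these lines, where `decide_next_step` is a stand-in for the LLM call and everything else is ordinary code:

```python
# Minimal "agent" in the loose current sense: ordinary code that does a task,
# consulting a model (here a stub) only to decide the next step.
def decide_next_step(goal: str, done: list[str]) -> str:
    # Stand-in for an LLM call that looks at the goal and progress so far.
    plan = ["fetch_data", "summarize", "finish"]
    return plan[len(done)] if len(done) < len(plan) else "finish"

TOOLS = {
    "fetch_data": lambda: "rows fetched",
    "summarize":  lambda: "summary written",
}

def run_agent(goal: str) -> list[str]:
    done: list[str] = []
    while True:
        step = decide_next_step(goal, done)
        if step == "finish":
            return done
        done.append(f"{step}: {TOOLS[step]()}")

print(run_agent("produce a weekly sales summary"))
```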
If AGI or SI (superintelligence) is possible, and that is an if... I don't think LLMs are going to be the silver bullet solution. Just as in the real world we have people dedicated to a single task in their field, like lawyers, construction workers, doctors, or brain surgeons, I see the current best path forward as being a "mixture of experts". We know LLMs are pretty good at what I've seen some refer to as NLP problems, where the model input is the tokenized string input. However, I would argue an LLM will never build a trained model like Stockfish or DeepSeek. Certain model types seem to be suited to certain issues/types of problems or inputs. True AGI or SI would stop trying to be a grand master of everything but rather know what best method/model should be applied to a given problem. We still do not know if it is possible to combine the knowledge of different types of neural networks like LLMs, convolutional neural networks, and deep learning... and while it's certainly worth exploring, it is foolish to throw all hope on a single solution approach. I think the first step would be to create a new type of model where, given a problem of any type, it knows the best method to solve it. And it doesn't rely on itself but rather on the mixture of agents or experts. And they don't even have to be LLMs. They could be anything.
Where this really would explode is, if the AI was able to identify a problem that it can't solve and invent or come up with a new approach, multiple approaches, because we don't have to be the ones who develop every expert.
It could be part of an AGI, specifically the human interface part. That's what an LLM is good at. The rest (knowledge oracle, reasoning etc) are just things that kinda work as a side-effect. Other types of AI models are going to be better at that.
It's just that since the masses found that they can talk to an AI like a human they think that it's got human capabilities too. But it's more like fake it till you make it :) An LLM is a professional bullshitter.
Not parent-poster, but an LLM is a tool for extending a document by choosing whatever statistically seems right based on other documents, and it does so with no consideration of worldly facts and no modeling of logical propositions or contradictions. (Which also relates to math problems.) If it has been fed on documents with logic puzzles and prior tests, it may give plausible answers, but tweaking the test to avoid the pattern-matching can still reveal that it was a sham.
The word "bullshit" is appropriate because human bullshitter is someone who picks whatever "seems right" with no particular relation to facts or logical consistency. It just doesn't matter to them. Meanwhile, a "liar" can actually have a harder job, since they must track what is/isn't true and craft a story that is as internally-consistent as possible.
Adding more parts around an LLM won't change that: even if you add some external sensors, a calculator, a SAT solver, etc. to create a document with facts in it, once you ask the LLM to make the document bigger, it's going to be bullshitting the additions.
They do hallucinate at times, but you’re missing a lot of real utility by claiming they are basically bullshit engines.
They can now use tools, and maintain internal consistency over long context windows (with both text and video). They can iterate fully autonomously on software development by building, testing, and bug fixing on real world problems producing usable & functioning code.
There's a reason Microsoft is putting $80 billion on the line to run LLMs. It's not because they are full of shit!
In a way it's worse: Even the "talking to" part is an illusion, and unfortunately a lot of technical people have trouble remembering it too.
In truth, the LLM is an idiot-savant which dreams up "fitting" additions to a given document. Some humans have prepared a document which is in the form of a theater play or a turn-based chat transcript, with a pre-written character that is often described as a helpful robot. Then the humans launch some code that "acts out" any text that looks like it came from that fictional character, and inserts whatever the real human user types as dialogue for the document's human character.
There's zero reason to believe that the LLM is "recognizing itself" in the story, or that it is choosing to self-insert itself into one of the characters. It's not having a conversation. It's not interacting with the world. It's just coded to Make Document Bigger Somehow.
> they think that it's got human capabilities too
Yeah, we easily confuse the character with the author. If I write an obviously-dumb algorithm which slaps together a story, it's still a dumb algorithm no matter how smart the robot in the story is.
It doesn't have to, the LLM just needs access to a computer. Then it can write the code for Stockfish and execute it. Or just download it, the same way you or I would.
> True AGI or SI would stop trying to be a grand master of everything but rather know what best method/model should be applied to a given problem.
Yep, but I don't see how that relates to LLMs not reaching AGI. They can already write basic Python scripts to answer questions, they just need (vastly) more advanced scripting capabilities.
Of course, cost-wise and training time wise, we're probably a long way off from being able to replicate that in a general purpose NN. But in theory, given enough money and time, presumably it's possible, and conceivably would produce better results.
One might argue maybe a mixture of experts is just the best that can be done - and that it's unlikely the AGI be able to design new experts itself. However where do the limited existing expert problem solvers come from? Well - we invented them. Human intelligences. So to argue that an AGI could NOT come up with its own novel expert problem solvers implies there is something ineffable about human general intelligence that can't be replicated by machine intelligence (which I don't agree with).
I don't get this line of thinking. AGI already exists - it's in our heads!
So then the question is: is what's in our heads magic, or can we build it? If you think it's magic, fine - no point arguing. But if not, we will build it one day.
Now ask it to solve step by step by pure reasoning. You'll get a really intelligent sounding response that sounds correct, but on closer inspection makes absolutely no sense, every step has ridiculous errors like "we start with options {1, 7} but eliminate 2, leaving only option 3", and then at the end it just throws all that out and says "and therefore ..." and gives you the original answer.
That tells me there's essentially zero reasoning ability in these things, and anything that looks like reasoning has been largely hand-baked into it. All they do on their own is complete sentences with statistically-likely words. So yeah, as much as people talk about it, I don't see us as being remotely close to AGI at this point. Just don't tell the investors.
I think the 5 issues they provide under “Cognitive Architectures” are severely underspecified, to the point where they really don’t _mean_ anything. Because the issues are so underspecified, I don’t know how their proposed solution solves their proposed problems. If I understand it correctly, they just want agents (Assistants/Agents) with user profiles (Sims) on an app store? I’m pretty sure this already exists on the ChatGPT store. (sims==memories/user profiles, agents==tools/plugins, assistants==chat interface)
This whole thing is so broad and full of academic (pejorative) platitudes that it’s practically meaningless to me. And of course, although completely unrelated, they throw in a reference to symbolic systems. Academic theater.
Of course it's going to be vague and presumptuous. It's more of a high-level executive summary for tech-adjacent folks than an actual research paper.