If the improvements are beneficial now, then surely they were beneficial before.
Prior to LLMs, though, we could have been making judicious use of simple algorithmic approaches to process natural language constructs as command language. We didn't see a lot of interest in it.
A lot of money was poured into that goal, but because every type of action required a handcrafted integration, the resulting assistants were either costly to develop or extremely limited. That’s no longer the case.
Siri was released in 2011, and Alexa and Google Assistant followed soon thereafter. Companies spent tens of millions of dollars improving their algorithmic NLP because voice interfaces were "the future". I took a class in the late 2010s that went over all of the methodologies that they used for intent parsing and slot filling. All of that has been largely abandoned at this point in favor of LLMs for everything.
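For anyone who hasn't seen the pre-LLM style, here is a minimal sketch of rule-based intent parsing and slot filling. The intent names, patterns, and the `parse_command` helper are all made up for illustration; this is not how Siri, Alexa, or Google Assistant actually implemented it, just the general shape of the technique.

```python
import re

# Hypothetical intents. Each regex's named groups act as the "slots".
INTENT_PATTERNS = {
    "set_timer": re.compile(
        r"^set (?:a )?timer for (?P<amount>\d+) (?P<unit>second|minute|hour)s?$"
    ),
    "set_reminder": re.compile(
        r"^remind me to (?P<task>.+) at (?P<time>\d{1,2}(?::\d{2})? ?(?:am|pm))$"
    ),
}


def parse_command(utterance):
    """Return (intent, slots) for the first matching pattern, else (None, {})."""
    text = utterance.strip().lower()
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.match(text)
        if match:
            # groupdict() maps each slot name to the text it captured.
            return intent, match.groupdict()
    # Anything outside the known patterns is simply not understood.
    return None, {}
```

Every new capability needs its own pattern and its own handler, which is the "handcrafted integration" cost mentioned upthread. The upside is that behavior is predictable: an utterance either matches a known command or it doesn't.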
My hope is that at some point people will come back to these UI paradigms as we realize the limitations of "everything is a chat bot". There's a simplicity to the context-free limited voice assistants that had a set of specific use cases they could handle, and the effort to chatbot everything is starting to destroy the legitimate use cases that came out of that era like timers and reminders.
This was a bad direction then. Now, for better or worse, all those vendors got their miracle: LLMs are literally plug-and-play boxes that implement the "parse arbitrary natural-language queries and map them to system capabilities" functionality. Thanks to LLMs, voice interfaces could actually start working, if only vendors could also get the "having useful functionality" part right.
(Note: this is distinct from "everything is a chat bot". That's a bad idea simply because typing text sucks; specifically, typing out your thoughts in prose form is about the least efficient way to interact with a tool. Voice interfaces are an exception here.)
--
[0] - https://en.wikipedia.org/wiki/Controlled_natural_language
[1] - Perhaps this weird idea that controlled languages are too hard for the general population, too much like programming, or such. They're not. More generally, we've always had to "meet in the middle" with our machines, and that has always been, and remains, a highly successful approach.
Managers often expect subordinates to just know what they mean, but checking instructions and requirements is usually essential and imo is a mark of a good worker.
"Can you dispose of our latest product in a landfill"...
Generally in the UK, unless the person is a major consumer of USA media, "can you" is an enquiry as to capability or whether an action is within the rules.
IME. YMMV.
In this example, if a human responded that way I would assume they were either being passive aggressive or were autistic or spoke English as a second language. A neurotypical native speaker acting in good faith would invariably interpret the question as a request, not a question.
I've asked LLM systems "can you..." questions. I'm asking surely about their capability and allowed parameters of operation.
Apparently you think that means I'm brain damaged?
uvx --from open-interpreter interpreter
I took the simplest route and pasted in an OpenAI API key, then I typed: find largest files on my desktop
It generated a couple of chunks of Python, asked my permission to run them, ran them and gave me a good answer. Here's the transcript: https://gist.github.com/simonw/f78a2ebd2e06b821192ec91963995...
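For a sense of what that kind of task involves, here is a rough sketch of a script that finds the largest files under a directory. This is not the code from the transcript; the `largest_files` function and its parameters are my own illustration, using only the standard library.

```python
import heapq
from pathlib import Path


def largest_files(directory, count=10):
    """Return up to `count` (size_in_bytes, path) pairs, largest first."""
    sizes = []
    # rglob("*") walks the directory tree recursively.
    for path in Path(directory).rglob("*"):
        try:
            # Skip directories and symlinks; record regular files only.
            if path.is_file() and not path.is_symlink():
                sizes.append((path.stat().st_size, str(path)))
        except OSError:
            # Unreadable or vanished entries are ignored.
            continue
    # nlargest sorts by the first tuple element (size), descending.
    return heapq.nlargest(count, sizes)
```

Calling `largest_files(Path.home() / "Desktop")` would cover the prompt above, assuming the desktop folder lives at that path on your system.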
i always thought the potential for openinterpreter would be kind of like an "open source chatgpt desktop assistant" app with swappable llms. especially vision since that (specifically the one teased at 4o's launch https://www.youtube.com/watch?v=yJHw33cVeHo) has not yet been released by oai. they made some headway with the "o1" device that they teased.. and then canceled.
instead all the demo usecases seem very trivial: "Plot AAPL and META's normalized stock prices". "Add subtitles to all videos in /videos" seems a bit more interesting but honestly trying to hack it in a "code interpreter" inline in a terminal is strictly worse than just opening up cursor for me.
i'd be interested if anyone here is an active user of OI and what you use it for.