Gen AI doesn't have a coherent understanding of the world, researchers suggest

6 points by indigoabstract 16 hours ago | 22 comments

gnabgib 16 hours ago |
Related: Despite its impressive output, generative AI doesn’t have a coherent understanding of the world https://news.ycombinator.com/item?id=42049482
davydm 15 hours ago |
Gen ai doesn't "understand" _anything_. It simply spits out the next "likely" occurrence of something based on preceding patterns. I'm not sure why people keep thinking that it "understands" anything.
Arnt 15 hours ago |
Because it isn't a Markov chain. Gaining a good understanding of what it is and of its limits makes sense.
Too many people think of the recent AI advances as a sort of Markov chain with lipstick. That's deeply wrong.
indigoabstract 13 hours ago |
It's pretty easy to make that mistake. Tbh, can you really say you understand something until you've built it yourself?
I think LLMs are still pretty much "magic" black boxes for most people (programmers included).
Ukv 13 hours ago |
> It simply spits out the next "likely" occurrence
I'd claim that the "likely" continuation is a bit of a nebulous concept, or at least not always trivial to work out. Completing sequences like "Q: Solve `7x - 4 = 2x` for x. \nA: " may require some internal model of basic algebraic rules, for instance. Where present, I don't think it's unreasonable to call that an understanding.
jqpabc123 11 hours ago |
Q: Count the letters in 'Hippopotamus'.
A: 9 letters
Wrong! The "Large Language Model" obviously lacks a coherent understanding of language.
"Language" is supposedly it's bread and butter but the "language" it produces is "correct" only from a statistical perspective.
In other words, it simply can't be trusted --- because it lacks any coherent understanding of what you or it are saying/asking.
LLMs fundamentally turn computing on it's head. Instead of accurate results at low cost, they produce inaccurate results at high cost.
Ukv 10 hours ago |
> Wrong! The "Large Language Model" obviously lacks a coherent understanding of language.
It's not all-or-nothing for the entirety of language, but yeah it's known that LLMs are imprecise at mapping from tokens to characters (as a human who only saw tokens may be, to be fair).
For an unseen question, even just to follow the Q/A format and give a right-category answer already demonstrates some model of language, I'd argue.
> "Language" is supposedly it's bread and butter but the "language" it produces is "correct" only from a statistical perspective. In other words, it simply can't be trusted.
Nothing can be trusted 100%, and current LLMs are far below humans, but both the structure and the meaning of their language is correct significantly above a random-guessing baseline.
A pretrained foundation model will have representations of fiction and non-fiction content, and can be guided towards either one with RL or a system prompt.
> LLMs fundamentally turn computing on it's head. Instead of accurate results at low cost, they produce inaccurate results at high cost.
Compared to what? LLMs often do things that traditional algorithms couldn't, faster and cheaper than hiring a human.
jqpabc123 10 hours ago |
... demonstrates some model of language, I'd argue.
Yes, it's model is all based on statistics. It lacks any real "coherent understanding".
It produces nice prose --- that simply can't be trusted for even rudimentary results.
Ukv 10 hours ago |
What do you think "real" understanding is?
To me, if something can map some modality (the word "horse", a picture of a horse, ...) into corresponding representation in its internal semantic space then I'm fine to say it "understood" that. There are varying degrees to how useful an understanding is (does the semantic space link related concepts closely together? can it be used to reason, extract information, and make predictions?).
jqpabc123 10 hours ago |
...can it be used to reason, extract information, and make predictions?
Reason? Clearly no. It doesn't understand what you're asking well enough to "reason". But it will gladly offer an attempt based on some statistical mishmash of patterns it has been trained with.
extract information, and make predictions? Yes --- just not reliably or accurately. Anything it produces needs verification --- which defeats a huge chunk of it's utility.
Would you trust a calculator that fails a simple math test? Of course not. So why trust an LLM?
Q: What is the square root of 2? A: approximately 1.41421356
Q: What is the square root of one plus one times forty five minus eighty eight? A: i * sqrt(42)
Ukv 9 hours ago |
> Reason? Clearly no. It doesn't understand what you're asking well enough to "reason".
If you ask something a question ("Could a car fit in a bread-box?"), it maps from words into concepts in its internal semantic space ("car" to representation of car, associated with large size), then applies rules/patterns in a way that reaches a correct conclusion more often than an RNG baseline - what concretely would you say is missing from the process?
> Would you trust a calculator that fails a simple addition test? Of course not. So why trust an LLM?
You can weigh something in proportion to its accuracy, opposed to giving total trust or nothing at all. An 80%-accurate weather prediction is still useful, as are calculators despite potential for hardware issues or (more likely) fat-thumbing the wrong button.
> Q: What is the square root of one plus one times forty five minus eighty eight? A: i * sqrt(42)
I'm unclear on what you intend to be showing here. Is this an LLM response?
jqpabc123 8 hours ago |
Is this an LLM response?
Yes. It shows how easily an LLM can misinterpret subtlety and ambiguity in human language.
Ukv 8 hours ago |
> Yes. It shows how easily an LLM can misinterpret subtlety and ambiguity in human language.
Without parentheses or context, -42 is a perfectly reasonable answer to "one plus one times forty five minus eighty eight", to the extent that it's probably what I'd go with if asked it on a quiz.
I imagine when you read it in you're head you're putting emphasis/pauses in places that lean towards the `((1+1) * 45) - 88` interpretation, but neither I nor the LLM have that information.
If asked, GPT-4 can give reasonable responses to this issue: https://i.imgur.com/ptMB7Ci.png
jqpabc123 7 hours ago |
It is applying out of order symbolic precedence rules to prose language. These rules are not universal and certainly not when writing in prose form.
LLMs frequently misinterpret and proceed without hesitation. They lack the necessary understanding to ask for clarification.
Ukv 7 hours ago |
> It is applying symbolic precedence rules to prose language. These rules are not universal and certainly not when writing in prose form.
You have to apply some form of precedence in order to evaluate it at all. Question is whether to use PEMDAS, or left-to-right. I don't see anything wrong with the LLM's initial choice for the former, particuarly since it can explain the alternative and that it may be the more likely intention if spoken casually out loud as opposed to written as text.
To potentially get back more to the core of the issue (rather than going on a tangent about whether an answer many humans would give is a misinterpretation): How do you define understanding? What do you think is currently missing?
-- (edit, to respond to your edit:) --
> LLMs frequently misinterpret and proceed without hesitation. They lack the necessary understanding to ask for clarification.
I'd conjecture that always giving an answer and not asking for clarification first may be an artifact of the system prompt/RLHF of the LLM you're using. Do you believe that asking for clarification on ambigious questions requires understanding?
jqpabc123 6 hours ago |
Do you believe that asking for clarification on ambigious questions requires understanding?
Absolutely! Recognizing ambiguity requires a deeper level of cognition and analysis than simply making assumptions and plowing ahead to the wrong conclusion.
LLMs clearly have no understanding of what "the letters in 'Hippopotamus'" means --- and they aren't smart enough to ask for or to really grasp an explanation.
Ukv 6 hours ago |
> > Do you believe that asking for clarification on ambigious questions requires understanding?
> Absolutely! Recognizing ambiguity requires a deeper level of cognition and analysis [...]
Then if asking for clarification requires understanding, what do you make of cases where the LLM does ask for clarification (e.g: https://i.imgur.com/OXQF2nx.png)? If it's doing something that requires understanding, does that not imply understanding?
jqpabc123 3 hours ago |
It's a step in the right direction.
Unfortunately, this not that common and typically only occurs in very simple cases. In my experience, an LLM will typically just plow ahead in most cases and let the user figure it out --- or not.
Ukv 3 hours ago |
> typically only occurs in very simple cases
Would you agree then that it has understanding of at least these simple cases?
Personally I think the fact ChatGPT/etc. don't ask for clarification more often is largely just down to system prompt and RLHF encouraging it to always give an answer, not necessarily any deeper limitation of LLMs. May be worth testing how well it can recognize ambiguity when specifically tasked to do so, or standard Q&A but with a system prompt that clarifies that it can ask for information before answering.
grandma_tea 3 hours ago |
Your point aside, counting letters is not a good task for LLMs since they aren't trained on letters. They are trained on tokens which represent words or parts of words.
jqpabc123 2 hours ago |
In other words, they don't really understand "language" and it's constituent parts.
krapp 6 hours ago |
>I'm not sure why people keep thinking that it "understands" anything.
Because we humans are hard wired to interpret language as a signal of intelligence and sapience, of a "thing like us." It's extremely counterintuitive to most people to be able to hold a coherent conversation with an entity that has no actual understanding of anything it seems to be talking about.
Because there are billions of dollars being poured into advertising LLMs as essentially intelligent agents that can and do understand things, often better and more reliably than human beings. I've seen numerous people refer to them having access to "the sum of all human knowledge," implying that not only are they intelligent, they should be considered nigh omniscient. There's a lot of money at state in establishing cultural (and then legal) precedent for LLMs being intelligent agents equivalent to human beings, where issues like fair use and copyright are concerned.
Because people want to believe. For various reasons, many to do with decades of pop culture and sci fi informing people's idea of what "AI" is, there's a transhumanist faith in the inevitability of AGI and singularity within tech that has just been waiting for some emergent phenomenon to fill that void, and LLMs seem good enough. I think there's a similar phenomenon at work with people's enthusiasm for Elon Musk as well, these things appear to point towards the kind of future that nerds read about in books.