Spoilers ahead!
First novel: The Trisolarans did not contact Earth first. It was the other way round.
Second novel: Calling the conflict between humans and Trisolarans a "complex strategic game" is a bit of a stretch. Also, the "water drops" do not disrupt ecosystems. I am not sure whether "face-bearers" is an accurate translation. I've only read the English version.
Third novel: Luo Ji does not hold the key to the survival of the Trisolarans and there were no "micro-black holes" racing towards Earth. Trisolarans were also not shown colonizing other worlds.
I am also not sure whether Luo Ji faced his "personal struggle and psychological turmoil" in this novel or in an earlier novel. He certainly was most certain of his role at the end. Even the Trisolarans judged him at over 92% deterrent rate.
He's like God's perfect sociopath. He wobbles between total indifference to his mission and interplanetary murder-suicide, and the only things that seem to really get to him are a stomachache and being ghosted by his wife.
Although, I just tried with normal Qwen 2.5 72B and Coder 32B and they only did a little better.
Though I would say humans would have difficulty too -- say, having read The Three-Body Problem before, then reading a slightly modified version (without being aware of the modifications), and having to recall specific details.
That said, once you have defined what is required, I believe you will have solved the problem.
ChatGPT has synthesized my past three vacations and regularly plans my family's meals based on whatever is in my fridge. I completely disagree.
Not any worse than this sentence. Counter it with a higher value comment.
You are a single person and LLMs have been trained on the output of billions. Any given choice you make can be predicted with extraordinary probability by looking at your inputs and environment and guessing that you will do what most other people do in that situation.
This is pretty basic stuff, yes? Especially on HN? Great ideas are a dime a dozen, and every successful startup was built on an idea that certainly wasn't novel, but was executed well.
Cookbooks, on the other hand, are the result of people coming up with recipes for foods which they obviously taste and like. LLMs ingested all this data from cookbooks, but whatever is mixed up into new ideas is just lucky shuffles. It can’t be any other way because, remember, an LLM cannot taste ingredients or final dishes. I think you’re projecting into them more than what they are.
And you know this for certain, how? So because it can suggest a few weird dishes it has a higher intelligence than most humans? Someone call Better Housekeeping.
What do you mean, synthesized your vacations? You seem to be passionate enough about ChatGPT to entrust it with making decisions on your meals, but I suspect the novelty will wear off for you too. I use ChatGPT from time to time and admit I’m pleasantly surprised by the outcome when it all works out, but I’m not swayed to think it is reasoning the way we do. It’s just a tool after all.
They are. Like millions of monkeys, but drastically better.
Also, humans can reason. LLMs currently can't do this in a useful way, and they are very limited by their context in all the attempts to make them do so. Not to mention that their ability to make new things that do not already exist (as opposed to completely made-up nonsense) is very limited.
1. The vast majority of people never come up with a truly new idea. Those that do are considered exceptional and their names go down in history books.
2. Most 'new ideas' are rehashes of old ones.
3. If you set the temperature up on an LLM, it will absolutely come up with new ideas. Expecting an LLM to make a scientific discovery à la Einstein is ... a bit much, don't you think [1]? When it comes to 'everyday' creativity, such as short poems, songs, recipes, vacation itineraries, etc., ChatGPT is more capable than the vast majority of people. Literally, ask ChatGPT to write you a song about _____, and it will come up with something creative. Ask it for a recipe with ridiculous ingredients and see what it does. It'll make things you've never seen before, generate an image for you and even come up with a neologism if you ask it to. It's insanely creative.
[1] Although I have walked ChatGPT through various theoretical physics scenarios and it will create new math for you.
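For readers unfamiliar with what "temperature" does in point 3, here is a minimal sketch of temperature-scaled sampling. The logits are made-up toy numbers and the function names are hypothetical; this is the textbook softmax-with-temperature idea, not any particular vendor's sampler.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Draw one token index according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [4.0, 2.0, 1.0, 0.5]  # hypothetical scores for four tokens
    for t in (0.2, 1.0, 2.0):
        probs = softmax_with_temperature(toy_logits, t)
        print(f"T={t}: {[round(p, 3) for p in probs]}")
```

At low temperature nearly all the probability sits on the top-scoring token; raising it shifts mass toward the less likely tokens, which is where the more unusual outputs come from.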
Depends on your definition of "truly" new, since any idea could be argued to be a mix of all past ideas. But I see truly new ideas all the time that never make the history books, because most new ideas build incrementally on what came before or are extremely niche. Only a very few turn out to be a massive turning point with broad impact, and that is usually evident only in retrospect (e.g. blue LEDs were basically trial and error and an approach that was almost given up on; transistors were believed to be impactful but not the huge revolution for computing they turned out to be, etc.).
My personal feeling when I engage in these conversations is that we humans have a cognitive bias to ascribe a human remixing of an old idea to intelligence, but an AI-model remixing of an old idea as lookup.
Indeed, basically every revolutionary idea is a mix of past ideas if you look closely enough. AI is a great example. To the 'lay person' AI is novel! It's new. It can talk to you! It's amazing. But for people who've been in this field for a while, it's an incremental improvement over linear algebra, topology, functional spaces, etc.
I don’t need to finetune on five hundred pictures of rabbits to know one. I need one look and then I’ll know for life and can use this in unimaginable and endless variety.
This is a simplistic example which you can naturally pick apart but when you do I’ll provide another such example. My point is, learning at human (or even animal) speeds is definitely not solved and I’d say we are not even attempting that kind of learning yet. There is “in context learning” and “finetuning” and both are not going to result in human level intelligence judging from anything I’ve had access to.
I think you are anthropomorphizing the clever text randomization process. There is a bunch of information being garbled and returned in a semi-legible fashion and you imbue the process behind it with intelligence that I don’t think it has. All these models stumble over simple reasoning unless specifically trained for those specific types of problems. Planning is one particularly famous example.
Time will tell, but I’m not betting on LLMs. I think other forms of AI are needed. Ones that understand substance, modality, time and space and have working memory, not just the illusion of it.
So if you do use in-context learning and give ChatGPT a few images of your novel class, then it will usually classify correctly. Finetuning is so you can save on token cost.
Moreover, you don't typically need that many pictures to fine tune. The studies show that the models successfully extrapolate once they've been 'pre-trained'. This is similar to how my toddler insists that a kangaroo is a dog. She's not been exposed to enough data to know otherwise. Dog is a much more fluid category for her than in real life. If you talk with her for a while about it, she will eventually figure out kangaroo is kangaroo and dog is dog. But if you ask her again next week, she'll go back to saying they're dogs. Eventually she'll learn.
> All these models stumble over simple reasoning unless specifically trained for those specific types of problems. Planning is one particularly famous example.
We have extremely expensive programs called schools and universities designed to teach little humans how to plan and execute. If you look at cultures without American/Western biases (and there's not very many left, so we really have to look to history), we see that the idea of planning the way we do it is not universal.
You're basically ignoring all the experts saying "LLMs suck at all these things that even beginning domain experts don't suck at" to generate your claim & then ignoring all evidence to the contrary.
And you're ignoring the ways in which LLMs fall on their face at being creative in ways that aren't language-based. Creative problem solving in ways they haven't been trained on is out of their domain while squarely in the domain of human intelligence.
> You can claim that that's not intelligence until the cows come home, but any person able to do that would be considered a savant
Computers can do arithmetic really quickly but that's not intelligence but a person computing that quickly is considered a savant. You've built up an erroneous dichotomy in your head.
Sure, for any domain expert, you can easily get an LLM to trip on something. But just the sheer number of things it is above average at puts it easily into the top echelon of humans.
> You're basically ignoring all the experts saying "LLMs suck at all these things that even beginning domain experts don't suck at" to generate your claim & then ignoring all evidence to the contrary.
Domain expertise is not the only form of intelligence. The most interesting things often lie at the intersections of domains. As I said in another comment, there are a variety of ways to judge intelligence, and no one quantifiable metric. It's like asking if Einstein is better than Mozart. I don't know... their fields are so different. However, I think it's pretty safe to say that the modern slate of LLMs fall into the top 10% of human intelligence, simply for their breadth of knowledge and ability to synthesize ideas at the cross-section of any wide number of fields.
But they're not. The people who are extremely competent at many fields will still outperform LLMs in those fields. The LLM can basically only outperform a complete beginner in the area & makes up for that weakness by scaling up the amount it can output which a human can't match. That doesn't take away from the fact that the output is complete garbage when given anything it doesn't know the answer to. As I noted elsewhere, ask it to provide an implementation of the S3 ListObjects operation (like the actual backend) and see what BS it tries to output to the point where you have to spend a good amount of time to convince it just to not output an example of using the S3 ListObjects API.
> I think it's pretty safe to say that the modern slate of LLMs fall into the top 10% of human intelligence, simply for their breadth of knowledge and ability to synthesize ideas at the cross-section of any wide number of fields.
Again, evidence assumed that's not been submitted. Please provide an indication of any truly novel ideas being synthesized by LLMs that are a cross-section of fields.
The problem here is that you expect something akin to relativity, the Poincare conjecture, et al. The vast majority of humans are not able to do this.
If you restrict yourself to the sorts of creativity that average people are good at, the models do extremely well.
I'm not sure how to convince you of this. Ideally, I'd get a few people of above average intelligence together, and give them an hour (?) to work on some problem / creative endeavor (we'd have to restrict their tool use to the equivalent of whatever we allow GPT to have), and then we can compare the results.
EDIT: Here's what ChatGPT thinks we should do: https://chatgpt.com/share/673b90ca-8dd4-8010-a1a0-61af699a44...
I want to be clear - I'm talking about the intelligence of AI systems available today and today only. There's lots of reason to be enthusiastic about the future but similarly very cautious about understanding what is available today & what is available today isn't human-like.
This is a common fallacy. The average human ingests a few dozen GB of data a day [1] [2].
ChatGPT 4 was trained on 13 trillion tokens. Say a token is 4 bytes (it's more like 3, but we're being conservative). That's 52 trillion bytes or 52 terabytes.
Say the average human only consumes the lower estimate of 30 GB a day. That means it would take a human about 1,733 days to consume the number of tokens ChatGPT was trained on, or roughly 4.7 years. Assuming humans and the LLM start from the same spot [3], the proper question is... is ChatGPT smarter than a 4.7 year old. If we use the higher estimate, then we have to ask if ChatGPT is smarter than a 2 year old. Does ChatGPT hallucinate more or less than the average toddler?
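The back-of-envelope estimate above can be re-run as a short script. The token count, bytes per token, and daily intake are the figures quoted in this comment; decimal units (1 GB = 10^9 bytes) and a 365-day year are my assumptions, and the function name is just for illustration.

```python
GB = 1e9  # decimal gigabyte, an assumption about the units used above

TRAINING_TOKENS = 13e12   # token count quoted in the comment
BYTES_PER_TOKEN = 4       # the comment's deliberately generous figure


def days_to_ingest(total_bytes: float, gb_per_day: float) -> float:
    """Days a person would need to take in total_bytes at gb_per_day."""
    return total_bytes / (gb_per_day * GB)


total_bytes = TRAINING_TOKENS * BYTES_PER_TOKEN
days = days_to_ingest(total_bytes, gb_per_day=30)
years = days / 365

if __name__ == "__main__":
    print(f"training data: {total_bytes / 1e12:.0f} TB")
    print(f"at 30 GB/day: {days:.0f} days, about {years:.1f} years")
```

The result is sensitive to the daily-intake figure: plugging in 32 GB per day, for example, gives 1,625 days.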
The cognitive bias I've seen everywhere is the idea that humans are trained on a small amount of data. Nothing is further from the truth. Humans require training on an insanely large amount of data. A 40 year old human has been trained on orders of magnitude more data than I think we even have available as data sets. If you prevent a human from being trained on this amount of data through sensory deprivation they go crazy (and hallucinate very vividly too!).
No argument about energy, but this is a technology problem.
[1] https://www.tech21century.com/the-human-brain-is-loaded-dail...
[2] https://kids.frontiersin.org/articles/10.3389/frym.2017.0002...
[3] this is a bad assumption since LLMs are randomly initialized whereas humans seem to be born with some biases that significantly aid in the acquisition of language and social skills
A student consumes only ~6 hours of relevant material a day on various subjects in textual form (textbooks), with minimal guidance from a domain expert and some guidance from peers.
Have you read the studies backing your links? The methodology for how they come up with that estimate is highly questionable especially on its own let alone when it comes to comparing with LLMs. Domain experts in the field are pretty confident that LLMs are trained on more actual information than humans.
> If you prevent a human from being trained on this amount of data through sensory deprivation they go crazy (and hallucinate very vividly too!).
People who are deaf & blind experience a significant amount of sensory deprivation compared with the typical human but do not go crazy or start hallucinating. This suggests that your analysis is flawed. For humans communication is the important bit - as long as we have some kind of communication mechanism we can achieve quite a fair bit.
How many LLMs have created companies entirely on their own? Or do anything unprompted, for that matter? You can go on about it but the fact that they require human interaction means the intelligence comes from the human using them, not the LLM itself. Tools are not intelligent.
AI models are algorithms running on processors running at billions of calculations a second often scaled to hundreds of such processors. They're not intelligent. They're fast.
Things I've used chat gpt for:
1. writing songs (couldn't find the generated lyrics online, so assume it's new)
2. Branding ideas (again couldn't find the logos online, so assuming they're new)
3. Recipes (with weird ingredients that I've not found put together online)
4. Vacations with lots of constraints (again, all the information is obviously available online, but it put it together for me and gave recommendations for my family particularly).
5. Theoretical physics explorations where I'm too lazy to write out the math (and why should I... chatgpt will do it for me...)
I think perhaps one reason people here do not have the same results is I typically use the API directly and modify the system prompt, which drastically changes the utility of chatgpt. The default prompt is too focused on retrieval and 'truth'. If you want creativity you have to ask it to be an artist.
For what I needed, those things worked very well
You have not specified what evidence would satisfy you.
And yes, it was an insult to insinuate I would accept sub par results whereas others would not.
EDIT: Chat GPT seems to have a solid understanding of why your comment comes across as insulting: https://chatgpt.com/share/673b95c9-7a98-8010-9f8a-9abf5374bb...
Maybe this should be taken as one point of evidence of greater ability?
Edit: I asked ChatGPT with more proper context: "It’s not inherently insulting to say that an LLM (Large Language Model) cannot guarantee the best quality because it’s a factual statement grounded in the nature of how these models work. LLMs rely on patterns in their training data and probabilistic reasoning rather than subjective or objective judgments about 'best quality.'"
Zooming out, you seem to be in the wrong conversation. I said:
> the LLM can solve a general problem (or tell you why it cannot), while your calculator can only do that which it's been programmed.
You said:
> Do you have any evidence besides anecdote?
I think that -- both of us now having used ChatGPT to generate a response -- we have good evidence that the model can solve a general problem (or tell you why it cannot), while a calculator can only do the arithmetic for which it's been programmed. If you want to counter, then a video of your calculator answering the question we just posed would be nice.
https://chatgpt.com/share/673b8c33-2ec8-8010-9f70-b0ed12a524...
Chat GPT can't directly execute code on my machine due to architectural limitations, but I imagine if I went and followed its instructions and told it what went wrong, it would correct it.
And that's just it, right? If I were to program this, I would be iterating. ChatGPT cannot do that because of how it's architected (I don't think it would be hard to do this if you used the API and allowed some kind of tool use). However, if I told someone to go write me an S3 backend without ever executing it, and they came back with this... that would be great.
EDIT: with chunking: https://chatgpt.com/share/673b8c33-2ec8-8010-9f70-b0ed12a524...
IIRC, from another thread on this site, this is essentially how S3 is implemented (centralized metadata database that hashes out to nodes which implement a local storage mechanism -- MySQL I think).
Source: I had to implement R2 from scratch and nothing generated here would have helped me as even a starting point. And this isn't even getting to complex things like supporting arbitrarily large uploads and encrypting things while also supporting seeked downloads or multipart uploads.
[1] No one would ever do this for all sorts of problems including that you'd have all sorts of security problems with attackers sending you /../ to escape bucket and account isolation.
[2] No one would ever do this because you've got nothing more than a toy S3 server. A real S3 implementation needs to distribute the data to multiple locations so that availability is maintained in the face of isolated hardware and software failures.
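To make the `/../` concern in [1] concrete, here is a minimal sketch of the check a toy disk-backed store would need before touching the filesystem. The function name, exception class, and one-directory-per-bucket layout are all hypothetical, and `Path.is_relative_to` needs Python 3.9+. It only illustrates the traversal problem; it does nothing about the larger objections in [1] and [2].

```python
import tempfile
from pathlib import Path


class KeyEscapesBucket(ValueError):
    """Raised when a bucket or key would resolve outside its allowed directory."""


def resolve_object_path(storage_root: Path, bucket: str, key: str) -> Path:
    """Map (bucket, key) to a path under storage_root, rejecting traversal.

    Hypothetical layout: one directory per bucket directly under storage_root.
    """
    root = storage_root.resolve()
    bucket_root = (root / bucket).resolve()
    # The bucket must be exactly one directory level below the storage root.
    if bucket_root.parent != root:
        raise KeyEscapesBucket(f"bad bucket name: {bucket!r}")
    # resolve() collapses any ".." segments, so an escaping key ends up
    # outside bucket_root and fails the containment check below.
    candidate = (bucket_root / key).resolve()
    if candidate == bucket_root or not candidate.is_relative_to(bucket_root):
        raise KeyEscapesBucket(f"bad object key: {key!r}")
    return candidate


if __name__ == "__main__":
    root = Path(tempfile.gettempdir()) / "toy-object-store"
    print(resolve_object_path(root, "photos", "2024/cat.jpg"))
    try:
        resolve_object_path(root, "photos", "../other-bucket/secret")
    except KeyEscapesBucket as err:
        print("rejected:", err)
```

Real object stores treat keys as opaque strings in a metadata index rather than as filesystem paths, which sidesteps this class of bug entirely.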
Of course it wouldn't. You're a computer programmer. There's no point for you to use ChatGPT to do what you already know how to do.
> The implementation generated not only saves things directly to disk
There is nothing 'incorrect' about that, given my initial problem statement.
> Additionally, it makes a key mistake which is that uploading isn't a form but is the body of the request so it's already unable to have a real S3 client connect.
Again.. look at the prompt. I asked it to generate an object storage system, not an S3-compatible one.
It seems you're the one hallucinating.
EDIT: ChatGPT says: In short, the feedback likely stems from the implicit expectation of S3 API standards, and the discrepancy between that and the multipart form approach used in the code.
and
In summary, the expectation of S3 compatibility was a bias, and he should have recognized that the implementation was based on our explicitly discussed requirements, not the implicit ones he might have expected.
If it were more intelligent of course there would be. It would catch mistakes I wouldn't have thought about, it would output the work more quickly, etc. It's literally worse than if I'd assigned a junior engineer to do some of the legwork.
> ChatGPT says: In short, the feedback likely stems from the implicit expectation of S3 API standards, and the discrepancy between that and the multipart form approach used in the code. > In summary, the expectation of S3 compatibility was a bias, and he should have recognized that the implementation was based on our explicitly discussed requirements, not the implicit ones he might have expected
Now who's rationalizing. I was pretty clear in saying implement S3.
In general, I don't deny the fact that humans fall into common pitfalls, such as not reading the question. As I pointed out this is a common human failing, a 'hallucination' if you will. Nevertheless, my failing to deliver that to chatgpt should not count against chatgpt, but rather me, a humble human who recognizes my failings. And again, this furthers my point that people hallucinate regularly, we just have a social way to get around it -- what we're doing right now... discussion!
If we restrict ourselves only to language (LLMs are at a disadvantage because there is no common physical body we can train them on at the present moment... that will change), I think LLMs beat humans for most tasks.
I had a problem where I used GPT-4o to help me with inventory management, something a 5th grade kid could handle, and it kept screwing up values for a list of ~50 components. I ended up spending more time trying to get it to properly parse the input audio (I read off the counts as I moved through inventory bins) than if I had just done it manually.
On the other hand, I have had good success with having it write simple programs and apps. So YMMV quite a lot more than with a regular person.
I will wave my arms wildly at the last eight years if the claim is that humans do not struggle with recall.
The point is that the ways in which it fails is completely different from LLMs and it's different between people whereas the failure modes for LLMs are all fairly identical regardless of the model. Go ask an LLM to draw you a wine glass filled to the brim and it'll keep insisting it does even though it keeps drawing one half-filled and agree that the one it drew doesn't have the characteristics it says such a drawing would need and still output the exact same drawing. Most people would not fail at the task in that way.
I by no means have a 'maximal' position. I have said that they exceed the intelligence and ability of the vast majority of the human populace when it comes to their singular sense and action (ingesting language and outputting language). I fully stand by that, because it's true. I've not claimed that they exceed everyone's intelligence in every area. However, their ability to synthesize wildly different fields is well beyond most human's ability. Yes, I do believe we've crossed the tipping point. As it is, these things are not noticeable except in retrospect.
> The point is that the ways in which it fails is completely different from LLMs and it's different between people whereas the failure modes for LLMs are all fairly identical
I disagree with the idea that human failure modes are different between people. I think this is the result of not thinking at a high enough level. Human failure modes are often very similar. Drama authors make a living off exploring human failure modes, and there's a reason why they say there are no new stories.
I agree that Human and LLM failure modes are different, but that's to be expected.
> regardless of the model
As far as I'm aware, all LLMs in common use today use a variant of the transformer. Transformers have much different pitfalls compared to RNNs (RNNs are particularly bad at recall, for example).
> Go ask an LLM to draw you a wine glass filled to the brim and it'll keep insisting it does even though it keeps drawing one half-filled and agree that the one it drew doesn't have the characteristics it says such a drawing would need and still output the exact same drawing. Most people would not fail at the task in that way.
Most people can't draw very well anyway, so this is just proving my point.
And you're proving my point. The ways in which the people would fail to draw the wine glass are different from the LLM. The vast majority of people would fail to reproduce a photorealistic simile. But the vast majority of people would meet the requirement of drawing it filled to the brim. The LLMs absolutely succeed at the quality of the drawing but absolutely fail at meeting human specifications and expectations. Generously, you can say it's a different kind of intelligence. But saying it's more intelligent than humans requires you to use a drastically different axis akin to the one you'd use saying that computers are smarter than humans because they can add two numbers more quickly.
> But the vast majority of people would meet the requirement of drawing it filled to the brim.
But both are failures, right? It's just a cognitive bias that we don't expect artistic ability of most people.
> But saying it's more intelligent than humans requires you to use a drastically different axis
I'm not going to rehash this here, but as I said elsewhere in this thread, intelligences are different. There's no one metric, but for many common human tasks, the ability of the LLMs surpasses humans.
> saying that computers are smarter than humans because they can add two numbers more quickly.
This is where I disagree. Unlike a traditional program, both humans and LLMs can take unstructured input and instruction. Yes, they can both fail and they fail differently (or succeed in different ways), but there is a wide gulf between the sort of structured computation a traditional program does and what an LLM does.
No, I'd say very different failures. The LLM is failing at reasoning and understanding whereas people are failing at training. Humans can fix the training part by simply doing the task repetitively. LLMs can't fix the understanding part because it's a fundamental flaw in the design. It's like categorizing a chimp's inability to understand logical reasoning as "cognitive bias" - no it's a much more structural problem.
> intelligences are different. There's no one metric, but for many common human tasks, the ability of the LLMs surpasses humans
There isn't one metric, and yes LLMs surpass humans on various tasks. But we've not been able to establish any evidence that the mechanism that they operate by is intelligence. It's certainly the closest we've come to building something artificial that approximates it to a high degree in some cases. But there's still no indication this isn't just a general purpose ML algorithm or has anything approaching human intelligence or sentience - basically it can mimic various human skills related to generative intelligence (writing and drawing) but less clear it can mimic anything else.
> This is where I disagree. Unlike a traditional program, both humans and LLMs can take unstructured input and instruction
That is true but it's a huge claim and leap to then say that anything taking unstructured input and instruction is demonstrating intelligence, especially when it fails to execute the requested instructions correctly regardless how much correction you have to do (as demonstrated by the wine glass problem & many other similar kinds of failure points).
There's reason to believe that there's a difference from a power perspective & from the fact that transformers are not self-learning from additional input whereas humans meld short term and long term learning while things like ChatGPT bolt on "memories" which are just factoids stored in a RAG and not something that the transformer is learning as new data.
This generally means for a task like you are doing, you need to have sign posts in the data like minute markers or something that it can process serially.
This means there are operations that are VERY HARD for the model, like ranking/sorting. This requires the model to attend to everything to find the next biggest item, etc. It is very hard for the models currently.
Comparison-based ranking / sorting needs on the order of n log n comparisons in the worst case, no matter what. Given that a transformer runs for a fixed amount of time before we 'force' it to output an answer, there must be an M such that beyond that length it cannot reliably sort a list. This MUST be the case and can only be solved by running the model some indeterminate number of times, but I don't believe we currently have any architecture to do that.
Note that humans have the same limitation. If you give humans a time limit, there is a maximum number of things they will be able to sort reliably in that time.
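The argument above can be made concrete with the standard information-theoretic bound: any comparison sort needs at least ceil(log2(n!)) comparisons in the worst case. The sketch below treats "fixed compute" as a fixed budget of comparisons, which is a simplifying assumption of mine (real transformer compute also grows with context length), and the function names are hypothetical.

```python
import math


def min_comparisons(n: int) -> int:
    """Worst-case lower bound on comparisons to sort n items: ceil(log2(n!))."""
    return math.ceil(math.log2(math.factorial(n)))


def max_sortable(budget: int) -> int:
    """Largest n whose lower bound still fits within `budget` comparisons."""
    n = 1
    while min_comparisons(n + 1) <= budget:
        n += 1
    return n


if __name__ == "__main__":
    for budget in (10, 100, 1_000, 10_000):
        print(f"budget {budget:>6} comparisons -> at most {max_sortable(budget)} items")
```

Whatever the budget, `max_sortable` returns a finite number; that number plays the role of the M in the comment above, for a solver limited to a fixed number of comparisons.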
Singularity means something very specific: if your AI can build a smarter AI than itself by itself, and that AI can also build a new smarter AI, then you have singularity.
You do not have singularity if an LLM can solve more math problems than the average Joe, or if it can answer more trivia questions than a random person. Even if you have an AI better than all humans combined at Tic Tac Toe you still do not have a singularity. IT MUST build a smarter AI than itself and then iterate on that.
When I was at Cerebras, I fed in a description of the custom ISA into our own model and asked it to generate kernels (my job), and it was surprisingly good
And? Was it actually better than what, say, the top 3 people in this field would create if they worked on it? Because these models are better at CSS than me, so what? I am bad at CSS, but all the top models could not solve a math limit from my son's homework, so we had to use good old forums to have people give us some hints. But for sure models can solve more math limits than the average person, who probably can't solve a single one.
> But for sure models can solve more math limits than the average person, who probably can't solve a single one.
Some people are domain experts. The pretrained GPTs are certainly not (nor are they trained to be).
Some people are polymaths but not domain experts. This is still impressive, and where the GPTs fall.
The final conclusion I have is this: These models demonstrate above average understanding in a plethora of widely disparate fields. I can discuss mathematics, computation, programming languages, etc with them and they come across as knowledgeable and insightful to me, and this is my field. Then, I can discuss with them things I know nothing about, such as foreign languages, literature, plant diseases, recipes, vacation destinations, etc, and they're still good at that. If I met a person with as much knowledge and ability to engage as the model, I would think that person to be of very high intelligence.
It doesn't bother me that it's not the best at anything. It's good enough at most things. Yes, its results are not always perfect. Its code doesn't work on the first try, and it sometimes gets confused. But many polymaths do too at a certain level. We don't tell them they're stupid because of it.
My old physics professor was very smart in physics but also a great pianist. But he probably cannot play as well as Chopin. Does that make him an idiot? Of course not. He's still above average in piano too! And that makes him more of a genius than if he were just a great scientist.
My point was about Singularity, what it means and why LLMs are not there.
So you missed my point? Was I not clear enough what I was talking about?
And I agree, and a script might do a better job on some tasks and you will not claim my script has reached singularity, right?
At their core, the state of the art LLMs can basically do any small to medium mental task better than I can or get so close to my level that I’ve found myself no longer thinking through things the long way. For example, if I want to run some napkin math on something, like I recently did some solar battery charge time estimates, an LLM can get to a plausible answer in seconds that would have taken me an hour.
So yeah, in many practical ways, LLMs are smarter than most people in most situations. They have not yet far surpassed all humans in all situations, and there are still some classes of reasoning problems that they seem to struggle with, but to a first order approximation, we do seem to be mostly there.
Exactly. I've used it to figure geometric problems for everyday things (carpentry), market sizing estimates for business ideas, etc. Very fast turnaround. All the doomers in this thread are just ignoring the amazing utility these models provide.
I think this is it. LLM responses feel like the unconsidered ideas that pop into my head from nowhere. Like if someone asks me how many states are in the United States, a number pops out from somewhere. I don't just wire that to my mouth, I also think about whether or not that's current info, have I gotten this wrong in the past, how confident am I in it, what is the cost of me providing bad information, etc etc etc.
If you effectively added all of those layers to an LLM (something that I think the o1-preview and other approaches are starting to do) it's going to be interesting to see what the net capability is.
The other thing that makes me feel like we're 'getting there' is using some of the fast models at groq.com. The information is generated, in many cases, an order of magnitude faster than I can consume it. The idea that models might be able to start to engage through a much more sophisticated embedding than English to pass concepts and sequences back and forth natively is intriguing.
You have to look at the LLM as the inner voice in your head. We've kind of forced them into saying whatever they think due to how we sample the output (next token prediction), but in new architectures with pause tokens, we let them 'think' and they show better judgement and ability. These systems are rapidly going to improve and it will be very interesting to see.
But this is another reason why I think they've surpassed human intelligence. You have to look at each token as a 'time step' in the inner thought process of some entity. A real 'alive' entity has more 'ticks' than what their actions would suggest. For example, human brains can process up to 10 FPS (100 ms response time), but most humans aren't saying 10 words a second. However, we've made LLMs whose internal processes (i.e., their intuition) are already superior. If we just gave them that final agentic ability to not say anything and ponder (which researchers are doing), their capabilities would increase exponentially.
> The other thing that makes me feel like we're 'getting there' is using some of the fast models at groq.com.
Unlike perhaps many of the commentators here, I've been in this field for a bit under a decade now, and was one of the early compiler engineers at Groq. Glad you're finding it useful. It's amazing stuff.
That is to say, if we want to extend this analogy, the model is 'killed' after each round. This is hardly a criticism of the underlying technology.
Going back to feeding the entire input: that is not really true. There are a dozen ways to not do that these days.
But if it does happen some day, how will we know? What are the chances that the first sentient AI will be accused of just mimicking patterns?
Indeed with the current training methodology it's highly likely that the first sentient AI will be unable to even let us know it's sentient.
> But if it does happen some day, how will we know? What are the chances that the first sentient AI will be accused of just mimicking patterns?
Leaving questions of sentience aside (since we don't even really know what that is) and focusing on intelligence, the truth is that we will probably not know until many decades later.
Intelligence and technological singularities are observable things.
Sentience is not.
BTW, I can't run this effectively on my 2080 Ti, so I've just loaded up the machine with classic RAM. It's not going to win any races, but as they say, it's not the speed that matters, it's the quality of the effort.
*In theory this shouldn't matter much for my purpose of summarizing city council meetings that follow a predictable format.
It's cool that these models are getting such long contexts, but performance definitely degrades the longer the context gets and I haven't seen this characterized or quantified very well anywhere.
They posted a haystack benchmark in the blog post that seems too good to be true.
Because there is no variation, nothing.
Actually, English-language tokenizers map on average 3 words to 4 tokens. Hence 1M tokens is about 750K English words, not a million as claimed.
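For anyone who wants to check the arithmetic, here is a rough sketch in Python. The 3-words-per-4-tokens ratio is only the rule of thumb from the comment above, not a measured property of any particular tokenizer, and the function names are made up for illustration.

```python
import math
from fractions import Fraction

# Assumption: the rule-of-thumb ratio quoted above (3 English words per 4 tokens).
# Real tokenizers vary with the text, the language, and the vocabulary.
WORDS_PER_TOKEN = Fraction(3, 4)


def tokens_to_words(tokens: int) -> int:
    """Approximate number of English words that fit in a token budget."""
    return int(tokens * WORDS_PER_TOKEN)


def words_to_tokens(words: int) -> int:
    """Approximate token budget needed for a given number of English words."""
    return math.ceil(words / WORDS_PER_TOKEN)


if __name__ == "__main__":
    print(tokens_to_words(1_000_000))  # words in a 1M-token context
    print(words_to_tokens(1_000_000))  # tokens needed for a million words
```

Under that assumption, a 1M-token context holds roughly 750K words, and a million words would need about 1.33M tokens.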