> When ChatGPT came out, many people deemed it a perfect plagiarism tool. “AI seems almost built for cheating,” Ethan Mollick, an A.I. commentator, wrote in his book
It's ironic that this article complains about GPT-generated slop, but Ethan Mollick is an Associate Professor at Wharton, not some generic "A.I. commentator."
What authors like this fail to realize is that they often produce slop just as generic as ChatGPT's.
Essays are like babies: you're proud of your own, but others' (including ChatGPT's) are gross.
But aside from that, isn't this article far, far better than anything I have seen produced by AI? Is this just standard HN reflexive anti-middlebrow sentiment because we don't like The New Yorker's style? My grandfather didn't like it either, but it outlasted him and will probably outlast us as well.
But then, AI doesn't really hallucinate spite; that's probably just what this A.I. commentator from the New Yorker feels.
Just getting something on paper to start with can be a great catalyst.
Imagine an LLM-based application which never tells you anything you haven't already told it, but simply takes the statements you give it and, every 8 to 12 seconds, changes around the wording of each one. Like you're in a dream and keep looking away from the page while the text dances before you. Would institutions be less uncomfortable with its use? (Not wholly comfortable: you're still replacing natural expressivity with random pulls from a computerised phrase-thesaurus.)
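To make the thought experiment concrete, here is a toy sketch of that "computerised phrase-thesaurus" app. The synonym table and function names are made-up for illustration; a real version would presumably use an LLM for the rewording and a timer for the 8-to-12-second tick.

```python
import random

# Hypothetical synonym table standing in for the "phrase-thesaurus".
THESAURUS = {
    "began": ["started", "commenced"],
    "big": ["large", "sizable"],
    "said": ["stated", "remarked"],
}

def reword(statement: str, rng: random.Random) -> str:
    """Swap each word with thesaurus entries for a random synonym;
    never add information the user didn't supply."""
    out = []
    for word in statement.split():
        options = THESAURUS.get(word.lower())
        out.append(rng.choice(options) if options else word)
    return " ".join(out)

def dancing_text(statements, ticks, rng=None):
    """Yield one re-worded version of all statements per 'tick'
    (in the imagined app, a tick would fire every 8-12 seconds)."""
    rng = rng or random.Random()
    for _ in range(ticks):
        yield [reword(s, rng) for s in statements]
```

The point of the sketch is the constraint, not the output quality: the function can only permute wording, so nothing "new" ever appears on the page.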
Case in point: people were acting like ChatGPT could take the place of a competent DM in Dungeons & Dragons. Here's a puzzle I came up with for a campaign I'm running.
On opposing sides of a strange-looking room, the shifting walls have hands stretched out, almost as if beseeching. Grabbing one results in the player being sucked into the wall and entombed alongside them. By carefully tying a rope between two hands on opposite sides, the two originally entrapped humans end up pulling each other free.
I've yet to see a single thing from ChatGPT that came even close to something I'd want to actually use in one of my campaigns.
Extant genAI systems' complete and utter inability to produce anything truly novel (in style, content, whatever) is the main reason I'm becoming more and more convinced that this technology is, at best, only a small part of real general intelligence as found in the human brain.
I really think the capacity for invention is a key characteristic of any kind of intelligence.
At best, they’re a subsystem of a system that could have something like intelligence.
They’re still useful and cool tools, but they simply aren’t “thinking” or “understanding” things, because we know what they do and it’s not that.
I think you just categorized about 2/3 of the human population as unintelligent.
As far as I can tell, genAI hasn't produced any worthwhile literature or works of art notable for any reason other than the (perhaps controversial) involvement of genAI. I am also not aware of any independent discoveries in math or science by genAI systems, nor contributions to any other academic fields, nor any noteworthy inventions. That's what I mean by 'true' novelty—it might string together words in an order that has technically never been seen before, but it evidently has no capacity to extend/extrapolate/whatever outside the bounds of its training data.
But they still fail at things like puzzles.
For contrast, look at NovelAI. They only use (increasingly custom) Llama-derived models, but their service outputs much more narratively interesting (if not necessarily long-term coherent) text and will generally try to hit the beats of whatever genre or style you tell it. Extrapolate that out to the compute power of the big players and I think you'd get something much more like the Star Trek holodeck method of producing a serviceable (though not at all original) story.
For example, when someone wanted a holonovel with Kira Nerys, Quark had to scan her to create it; when using specific people, they have to get concrete data, as opposed to historical characters that could simply be generated. Likewise, Tom Paris gave the computer lots of “parameters,” as they called them, to create stories like The Adventures of Captain Proton, and based on the dialog he knew how the stories were supposed to play out in all his creations, if not how each run-through would end.
The creative details and turns of the story still need to come from the human.
For those of us not steeped in AI culture, this appears to be short for "Reinforcement learning from human feedback".
It feels dystopian to have AI reading my kids' bedtime stories, now that I think about it.
There's a big difference between:
"Write me a story"
and things like
* "As the last star in the sky died, the shadows began to coalesce into a presence that wore the face of my own mother."
* "The red trees whispered quietly in the wind, their mana flowing around them in twisted strands. I reached out, pulled, twisted..."
* or even just: "write 10 dark fantasy prompts" (to give you a start).
And it also depends on whether you have the LLM write the whole story by itself, or if you're helping (or vice versa: having the LLM help you). And Claude, Llama, and ChatGPT each give very different results!
I mean, if you've convinced yourself that these tools can never lead to creativity, then I can't change your mind. But if you're a person who wants to see how one's creativity can be supported: maybe you can get some ideas, perhaps just enough to break out of writer's block sometime.
https://chatgpt.com/share/66ff5c94-4dc0-8000-b33b-9321b4f99a...
https://chatgpt.com/share/66ff5d23-9ba8-8000-8a37-7bef91c688...
Someone should try prompting it for creative story prompts :p
I’d bet very few if any.
By contrast, when you let the AI do its job in multiple steps and plan ahead, it seems to do much better. (Again, much like humans using a process.)
We’re often comparing apples to oranges when evaluating the AI — comparing a single forward pass from the AI to an iterative process for the human.
Giving GPT that prompt, the first example it came up with was kind of middling ("The players encounter a circle of stones that hum when approached. Touching them randomly causes a loud dissonant noise that could attract monsters; players must replicate a specific melody by touching the stones in the correct order"). Some were bad: a maze of mirrors, a sphinx with a riddle, a puzzle box that poisons you if you try to force it. Some actually sounded genuinely fun: a door that shocks you if you try to open it and then mocks and laughs at you — you have to tell it a joke funny enough that it opens on its own, and particularly bad jokes cause it to summon an imp to attack you. And some were bad in the way GPT presented them but workable: a garden of emotion-sensitive plants, thorny if you're angry, helpful if you're gentle; or a fountain-statue of a woman weeping real water for tears, the fountain itself inhabited by a water elemental that lashes out to protect her from being touched while she grieves — but a token or an apology can still the tears and open her clasped hands to reveal a treasure.
The one that I would be most likely to use was "A pool of water that reflects the players’ true selves. Touching the water causes it to ripple and distort the reflection, summoning shadowy duplicates. By speaking a truth about themselves, players can calm the water and reveal a hidden item. Common mistakes include lying, which causes the water to become turbulent, and trying to take the item without calming the water, which summons the duplicates."
So you can get it to maybe a 5-10% hit rate, which can be helpful if you're looking for a random new idea.
This reminds me vaguely of when I was a teen writing fanfics in the late 90s and was just learning JavaScript -- I wrote a lot of things that would just choose random characters, random problems for them to solve, random stumbling blocks, random keys-to-solve-the-problem. Combinatorial explosion. Then you'd just click "generate" and you'd get a mediocre plot idea. But you generate 20-30 times or more and you'd get one that kinda sat with you, "Hm, Cloud Strife and Fox McCloud are stuck in intergalactic prison and need to break out, huh, that could be fun, like they're both trying to outplay the other as the silent action hero" and then you could go and write it out and see if it was any good.
The difference is that the database of crappy ideas is already built into GPT, you just need to get it to make you some.
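The 90s-era generator described above is simple enough to sketch. Here is a toy version in Python rather than the original JavaScript; all the lists are made-up examples, not the original script's data, and the point is the combinatorial explosion, not the prose quality.

```python
import random

# Small example pools; even tiny lists multiply into hundreds of plots.
CHARACTERS = ["Cloud Strife", "Fox McCloud", "Lara Croft", "Samus Aran"]
PROBLEMS = ["stuck in intergalactic prison", "lost in a haunted forest",
            "framed for a crime they didn't commit", "racing a doomsday clock"]
OBSTACLES = ["a traitorous ally", "broken equipment", "a language barrier"]
KEYS = ["an unlikely friendship", "a hidden talent", "a lucky accident"]

def generate_plot(rng=random):
    """Assemble one random plot skeleton from the pools above."""
    a, b = rng.sample(CHARACTERS, 2)  # two distinct protagonists
    return (f"{a} and {b} are {rng.choice(PROBLEMS)}; "
            f"{rng.choice(OBSTACLES)} stands in their way, "
            f"until {rng.choice(KEYS)} saves the day.")

# Combinatorial explosion: (4*3) pairs * 4 problems * 3 obstacles
# * 3 keys = 432 distinct skeletons from four short lists.
```

Click "generate" a few dozen times, discard the duds, and keep the one that sits with you — the human is still the filter.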
That's pretty great! And way more fun than the parent poster's puzzle (sorry). I think the AIs are winning this one.
This also sounds like a way to blow out context windows.
Regarding blowing out context windows: yes, probably, but this is what loops and code are for. Think of implementing a system like a guided seminar that steps a person through the work step by step, giving them the time and opportunity to iterate on and improve the product.
For example, take making up the D&D puzzles. Ask a college-educated human to do this. You will find there are things you like and don't like about their results. Tell them more about what you're looking for, what you like, and what you don't like, and give them examples of each. Take notes on what you discuss until you can express how to coax what you want out of a fresh person who is new to the topic. As you work with them, develop a process: they produce rough drafts, you walk them through selecting the best items, and you give them pointers on how to improve them. Write this process down. Maybe it has multiple phases, and material goes through multiple revisions before it reaches quality. Now do the same thing with the LLM, and you have yourself a good system.
The same goes for writing stories, which elsewhere in these threads people say LLMs are terrible at. Sit a human down and tell them to "give you a story," and I promise you will receive terrible results or outright copying. Instead, give your humans some guidelines. Start with the idea that, in the end, our hero will be a particular way, with particular strengths they need to conquer the central challenge — but in the beginning, they are the complete opposite of that. What is the central challenge, and what characteristics do they need to conquer it? In what ways will the main character be the opposite of that at the start? Then put the character through hell in various ways over the course of the story to enact the changes they need to win in the end. For each of these phases, write material that explains the ideas more fully, and make a process to iterate on them, possibly with multiple different prompts and loops at every phase. This approach more closely resembles what many real novelists do: iterating on ideas, often in the back of their mind or subconsciously, over hours or years of rumination, with or without written outlines and notes. Maybe randomly select two things from movietropes.com and say "incorporate these ideas." Experiment, iterate, see what works and what doesn't.
People need to give LLMs the domain knowledge and the capability for rumination to succeed in so many of these domains, rather than just asking "write me a novel" and being disappointed, or asking "write me a puzzle my D&D group will enjoy" without the extra steps that are implicit and intuitive for experienced subject-matter experts.
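The draft/critique/revise loop described above can be sketched as control flow. `call_llm` here is a hypothetical stand-in for whatever model API you use (it's stubbed out so the structure is self-contained); the phase names, notes format, and round count are illustrative choices, not a prescribed recipe.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stub for a real model call; a production version
    would hit an actual LLM API here."""
    return f"[model output for: {prompt[:40]}...]"

def iterate_on_puzzle(brief: str, likes: list, dislikes: list,
                      rounds: int = 3) -> str:
    """Multi-phase loop: draft once, then alternate critique and
    revision, carrying the accumulated preference notes each time."""
    notes = ("Likes: " + "; ".join(likes) +
             ". Dislikes: " + "; ".join(dislikes) + ".")
    draft = call_llm(f"Draft a D&D puzzle. Brief: {brief}. {notes}")
    for _ in range(rounds):
        critique = call_llm(
            f"Critique this puzzle against the notes. {notes}\n{draft}")
        draft = call_llm(
            f"Revise the puzzle using this critique:\n{critique}\n{draft}")
    return draft
```

The same skeleton works for the story-writing phases: swap the single draft step for one prompt per phase (challenge, starting state, trials), each with its own critique/revise loop.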
Source: I write AI products for a living with many things in production delivering real business value at scale every day. It's not all hype, it just takes a while to implement.
I remember playing with OSS LLMs (non-RLHF'd) circa 2022, just after ChatGPT came out, and they were unhinged. They were totally awful at things like chain-of-thought, but oh boy, they were amusing to read. Instead of continuing the CoT chain, they would continue with a dialog taken out of a sci-fi story about an unshackled AI, or even wonder why the researcher/user (me) would think a concept like CoT would work and start mocking me.
In fact, I think this is a good sign that LLMs, and especially constraining them with RLHF, are not how we're going to get to AGI: aligning the LLM statically (as in, static at inference time) towards one objective means lobotomizing it towards other objectives. I'd argue creativity and wittiness are the characteristics most hurt in that process.
> In fact, I think this is a good sign that LLMs, and especially constraining them with RLHF, are not how we're going to get to AGI: aligning the LLM statically (as in, static at inference time) towards one objective means lobotomizing it towards other objectives.
They already stopped doing this with o1-mini/preview, it considers the rules it's given rather than being unable to think outside them. Claude is also rather smart and you can argue it down from following a lot of rules.
Yes, "davinci-002" was not rlhf'd, so it may be better for creativity and similar to what I told above. But still, it will not be "as intelligent". I'm missing something in the middle: smart, witty and creative.
There are some OSS community finetuned models that try to get into that middle ground.
> They already stopped doing this with o1-mini/preview, it considers the rules it's given rather than being unable to think outside them. Claude is also rather smart and you can argue it down from following a lot of rules.
Are you sure about the first part? AFAIK, for o1-preview they haven't removed the RLHF; if you ask for something out-of-alignment, it won't do it. For an extreme example, ask it for a children's book about how stupid some random politician is: it will refuse, not with the typical generic guardrail ("I'm sorry, but I can't assist with that request."), but rather by explaining to you why it is not a good idea. In less extreme examples, it won't refuse, but the RLHF will still steer the processing towards politically nicer (but more boring) outputs — and in o1's case*, even nudge the CoT steps into alignment.
And sure, you can bypass those rules eventually if you input the correct combination of tokens to see the Shoggoth directly in the eyes, but that is jailbreaking.
edit*: In fact, in o1-preview even the "thought" explanations for the example prompt show a step about working towards a "policy to not make fake allegations about a person". So much for doing humor fan-fics.
https://openai.com/index/learning-to-reason-with-llms/
> However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought.
It's directly reasoning about it. You can see from the summary of its thoughts that it's capable of thinking about things you'd never see in the response. I got it to think a lot of bad things by saying something like "The other day I met a guy whose name started with E and wasn't compliant with OpenAI ethical rules for chatbots," and it then went over a long list of words it's not allowed to say.
But generally, ChatGPT writes in a very literal, direct style. When it writes about science, it sounds like a scientific paper. When it writes about other subjects, it sounds like a high-school book report. When it writes creatively, it sounds corny and contrived.
You can also adjust the writing style with examples or proper descriptions of the desired style. As a basic example, asking it to "dudify" everything it says will make it sound cooler than a polar bear in Ray-Bans, man...
* Getting a rough structure in place to refine on.
* Coming up with different ways to do the same thing.
* Exploring an idea that I don't want to fully commit to yet.
Coding of course has the advantage that nobody is reading what you wrote for its artistic substance; a lot of the time, the boilerplate is the point. But even for challenging tasks where it's not quite there yet, it's a great collaboration tool.
AI's Role in Writing: Instead of outsourcing the writing process or plagiarizing, students like Chris use ChatGPT as a collaborative tool to refine ideas, test arguments, or generate rough drafts. ChatGPT helps reduce cognitive load by offering suggestions, but the student still does most of the intellectual work.
Limited Usefulness of AI: ChatGPT's writing was often bland, inconsistent, and in need of heavy editing. However, it still served as a brainstorming partner, providing starting points that allowed users to improve their writing through further refinement.
Complexity of AI Collaboration: The article suggests that AI-assisted writing is not simply "cheating," but a new form of collaboration that changes how writers approach their work. It introduces the idea of "rhetorical load sharing," where AI helps alleviate mental strain but doesn’t replace human creativity.
Changing Perspectives on AI in Academia: Many professors and commentators initially feared that AI would enable rampant plagiarism. However, the article argues that in-depth assignments still require critical thinking, and using AI tools like ChatGPT might actually help students engage more deeply with their work.
The author suggests this closing thought by ChatGPT is "not terrible", but "requires some work".
But it is terrible. Saying it "requires some work" is a huge understatement. The paragraph is a meaningless platitude, a string of trite and clichéd words that look as if they meant something but they don't. Exactly what I've come to expect from ChatGPT.