Similarly, for things like "fix grammar in this long text" the model will have to tweak random words for no reason, because the existing text can't be reproduced 100% faithfully while injecting SynthID.
> There are two primary factors that affect the detection performance of the scoring function. The first is the length of the text x: longer texts contain more watermarking evidence, and so we have more statistical certainty when making a decision. The second is the amount of entropy in the LLM distribution when it generates the watermarked text x. For example, if the LLM distribution is very low entropy, meaning it almost always returns the exact same response to the given prompt, then Tournament sampling cannot choose tokens that score more highly under the g functions. In short, like other generative watermarks, Tournament sampling performs better when there is more entropy in the LLM distribution, and is less effective when there is less entropy.
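To make the entropy point concrete, here is a toy Python sketch (not the paper's scoring function, just plain Shannon entropy) showing why a near-deterministic next-token distribution leaves a watermarking sampler nothing to work with, while a flatter one gives it plenty of room:

```python
import math

def shannon_entropy(probs):
    """Entropy (in bits) of a next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A near-deterministic distribution leaves the sampler no real choice,
# so a watermark can't bias the pick; a flatter one leaves plenty of room.
print(shannon_entropy([0.999, 0.0005, 0.0005]))   # ~0.01 bits: almost no room
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits: lots of room
```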
So should accounting be. To a much higher degree.
Yet I hope most of us are aware of the British Post Office scandal, where what really should have been boring accounting software falsely accused thousands of employees of theft; over 900 of them were convicted of theft, fraud and false accounting.
If this can happen in something as utterly boring as an accounting system should be in this millennium, I don't think we should trust AI fraud detection in science and academia until we get a few decades of experience with it.
(Do I think records from accounting can be used as evidence? Absolutely, given the right circumstances: we can know they haven't been tampered with, etc.
What I don't think, however, is that pattern matching or "watermarks" that indicate a probability should be used as evidence. Especially not closed-source systems with secret distributions and watermarking algorithms.)
In the wild, there are too many variables to use watermarking to draw meaningful conclusions about any piece of text, no matter the word count. Scott Aaronson described well one of those variables, "the pineapple attack," in his Aug. 2023 talk at Simons [1].
Watermarking is illuminating to the ongoing study of language models' functionality, but it doesn't put the genie back in the bottle.
Today, for each output token the LLM produces a probability for each possible token, and then a 'sampler' makes a probability-weighted random choice. If the next-token probabilities are 90% for foo, 9% for bar and 1% for baz, the sampler picks a random number between 0 and 1: if it's below 0.9 it outputs foo, between 0.9 and 0.99 it outputs bar, and between 0.99 and 1 it outputs baz.
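For anyone who wants that spelled out, here is a minimal sketch of such a sampler; the foo/bar/baz probabilities are just the example numbers above, not anything from a real model:

```python
import random

def sample(probs):
    """Probability-weighted choice: walk the cumulative distribution
    until the random draw falls inside a token's slice of [0, 1)."""
    r = random.random()          # uniform in [0, 1)
    cumulative = 0.0
    for token, p in probs.items():
        cumulative += p
        if r < cumulative:
            return token
    return token                 # guard against floating-point round-off

print(sample({"foo": 0.90, "bar": 0.09, "baz": 0.01}))
```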
But what if instead of using random numbers, you had a source of evenly distributed random numbers that was deterministic, based on some secret key?
Each candidate token would remain just as likely as it was before - there would still be a 90% chance of foo being chosen. So the output shouldn't degrade in quality.
And sure, some tokens will have 99.999% probability and their selection doesn't tell you much. But in most real-world use multiple wordings are possible and so on. So across a large enough sample of the output, you could detect whether the sampler was following your secret deterministic pattern.
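A rough sketch of what that keyed, deterministic "randomness" could look like, just to illustrate the idea (this is not the actual SynthID-Text scheme, and the key/context names are made up): hash a secret key together with the generation context into a value in [0, 1) and feed it to the same cumulative walk, so each token keeps its original probability.

```python
import hashlib

def keyed_uniform(secret_key: str, context: str) -> float:
    """Deterministic stand-in for random.random(): hash the secret key
    together with the generation context and map it to [0, 1)."""
    digest = hashlib.sha256(f"{secret_key}|{context}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def sample_watermarked(probs, secret_key, context):
    """Same cumulative walk as an ordinary sampler, but driven by the
    keyed value, so each token keeps its original probability."""
    r = keyed_uniform(secret_key, context)
    cumulative = 0.0
    for token, p in probs.items():
        cumulative += p
        if r < cumulative:
            return token
    return token

print(sample_watermarked({"foo": 0.90, "bar": 0.09, "baz": 0.01},
                         secret_key="my-secret", context="prompt so far"))
```

Because the draw depends only on the key and the context, anyone holding the key (and the same model) can re-derive it later and check whether the observed tokens line up with it.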
Of course the downside is you've got to check on exactly the same LLM, and only people with the secret key can perform the check. And it's only applicable to closed-source LLMs.
I'm also not quite sure if it works when you don't know the exact prompt - so maybe my understanding of the paper is all wrong?
I don't think the key actually needs to be secret, as it's not trying to cryptographically secure anything. So all closed-weights LLM providers could just publicly share the keys they use for watermarking, and then anybody could use them to check if a particular piece of text was generated by a particular LLM.
That being said, I think you are right about this only really being useful for closed weights models. If you have the weights, you can just run an LLM through a standard sampler and it won't be watermarked.
Seems like a cool theoretical trick that has little practical implication.
Thus it might reduce usage some, but it certainly wouldn't block all usage. Additionally, there are only a few providers of truly massive LLMs on the market right now. If they decided that doing this would be a social good, or more likely that it would bring bad PR to not do this when their competitors do, then they would at least be able to watermark all of the massive LLM outputs.
And there is no good reason for a provider to watermark - they aren't helping the customer. They'd be helping some other party who isn't paying them.
This will never be a thing.
There's a possible future where this gets legislated, right? Of course, there are lots of implementation challenges to this and it's probably a bad idea...
I wouldn't bet on that! I can see legislation to require this for many reasons ... related to intellectual property, cheating, detecting the root of hate speech or harassment, "stealing" from employers by not performing work or putting them at legal risk, "stealing" from artists by duplicating their style, political speech that can not be traced (it could be from a bad actor!), tracking down generated revenge porn (or much worse!), tracking down people using LLMs to grift the elderly, and on and on. Why, if you are not using a watermarked LLM, it could be an op by Russia, China, or Iran! In fact, part of the legislation could be a requirement for social media or office tools or government tools or political tools or educational tools to check for a watermark and not work if an approved one is not found. Ideally this list will be private, because you want companies to be able to automate away workers and do the least possible for customers; you just want to make sure you're doing it above-board, you see.
>And there is no good reason for a provider to watermark - they aren't helping the customer.
No one cares about customers, they care about money. And you know what helps make a lot of money? A legally defined moat for yourself and a couple of others that blocks anyone else.
>They'd be helping some other party who isn't paying them.
Yes! That party is themselves!
Even then, open source will almost certainly always exist. Services running offshore will exist. It would seem impossible to enforce.
I will happily lose those cases for increased performance; that's the thing I care about.
Are there normal cases where you picture this as an issue?
And I am not against LLM output being identifiable as such. (although I think an argument could be made based on the ruling about the monkey and the camera, which IIRC would say that the copyright belongs to whoever created the situation).
But after the
1. British Post Office scandal and
2. some really high profile cases of education institutions here in Norway abusing plagiarism detectors
I do not feel ready to trust either
1. complex software (and especially not closed-source software) to tell us who is cheating or not
2. or any human's ability to use such a system in a sensible way
While cheating doesn't usually end up in criminal court, students also usually do not get a free defense.
For this reason I suggest cheating should have to be proven to have occurred, not "suggested to probably have occurred" by the same people who create the not-very-reliable and extremely hard-to-reproduce LLMs.
I must be missing something, because this seems to assume a contiguous output.
You might be able to have one LLM output the original, and then another to do a partial rewording though. The resulting text would likely have higher than chance "watermarkedness" for both LLMs, but less than you would expect from a plain output. Perhaps this would be sufficient for short enough outputs?
The choice of word is based off the previous four, and it only works if there is enough entropy present that more than one word would fit. Thus it's not a simple matter of humans picking up on particular word choices. There might be some cases where there are 3 tokens in a row that occur with low entropy after the first token, and then one token generation with high entropy at the end. That would cause a particular 5-word phrase to occur. Otherwise, the word choice would appear pretty random. I don't think humans pick up on stuff like that even subconsciously, but I could be wrong.
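As a toy illustration of what "based off the previous four" could mean (this is not the paper's actual g functions; the key and scoring here are made up), picture a pseudorandom score seeded by a secret key, a sliding window of the last four tokens, and the candidate token:

```python
import hashlib

def g_score(prev_tokens, candidate, key="watermark-key"):
    """Toy stand-in for a g function: a pseudorandom score in [0, 1)
    seeded by the secret key, the previous four tokens, and the candidate."""
    window = " ".join(prev_tokens[-4:])
    digest = hashlib.sha256(f"{key}|{window}|{candidate}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

# With entropy, the sampler can prefer candidates that score higher;
# with a forced (near-certain) token, the score is whatever it happens to be.
prev = ["the", "cat", "sat", "on"]
for cand in ["the", "a", "my"]:
    print(cand, round(g_score(prev, cand), 3))
```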
I would be interested to see if LLMs pick up the watermarks when fed watermarked training data though. Evidently ChatGPT can decode base64, [0] so it seems like these things can pick up on some pretty subtle patterns.
[0] https://www.reddit.com/r/ChatGPT/comments/1645n6i/i_noticed_...
And honestly, this still retains like 95% of the value of writing a paper, because I did write it, the words did flow through my brain. I just used the LLM to avoid facing a blank page.
I've also thought about asking LLMs to simulate a forum conversation about the Civil War (or whatever the topic may be), and include a wrong comment that can be countered by writing exactly what the assignment requires, because I seem to have no trouble writing an essay when duty calls and someone is wrong on the internet.
I worry (and have already read worrying things) about “cheating detection” tools that have been deployed in schools. My intuition would be that there’s just too much entropy between something like an essay prompt and the essay itself. I guess it also depends on how specific the teacher’s essay prompt is as well.
This is my worry as well.
Punishment for cheating can easily set a student back a year or more. That is fair if the student has actually been cheating, but it is really harsh.
So while this isn't criminal court, I think schools should apply the same principles here: innocent until proven guilty.
And in my view, secret probability distributions aren't exactly good proof.
Furthermore, to make it even worse: if someone is actually innocent, it will be next to impossible for them to argue their innocence, since everyone will trust the system, and as far as I can see the system cannot actually be verified by a board without disclosing the weights. And that is assuming they would care to help a student prove their innocence in the first place.
AFAIK this is a topic that has been explored to some depth in science fiction, but more importantly, we have cases like the postal service in the UK, where multiple people lost their jobs because nobody could believe the system they had built or paid for could make such crazy mistakes.
Back to students: for a less privileged student I guess it can easily ruin their studies. TBH, as someone who struggled a lot in school, I am not sure I'd have finished if my studies had been delayed by a year. Which would have been sad, given how well I have managed once I didn't have to juggle full-time studies and part-time work.
Recently (last year and this) we (Norway) have had some debates that seemed way overdue regarding what can be considered cheating (with some ridiculous examples of students getting punished for "self-plagiarism" for the most absurd things, including not citing a failed previous exam, written by themselves, as a source).
This could easily have gotten nowhere except for the fact that:
1. the person in charge of the board of appeals was caught for something else
2. Somebody took the effort to dig out the master's theses of two ministers, including the then-sitting Minister of Education, and proved that they had clearly been "cheating" according to the rules they were judging students by.
Get answer.
Rewrite in your own words.
Feed it back to ChatGPT to check for errors.
Done. Watermarking really doesn’t solve any problem a clever person can’t trivially circumvent.
So it’s still useful in reducing spam.
Three things seem to be in conflict here:
1. This definition of intelligence...i.e. "behavior indistinguishable from a human"
2. The idea that LLMs are artificial intelligence
3. The idea that we can detect if something is generated by an LLM
This feels to me like one of those trilemmas, where only two of the three can be true. Or, if we take #1 as an axiom, then it seems like the extent to which we can detect when things are generated by an LLM would imply that the LLM is not a "true" artificial intelligence. Can anyone deeply familiar with the space comment on my reasoning here? I'm particularly interested in thoughts from people actually working on LLM detection. Do you think that LLM-detection is technically feasible? If so, do you think that implies that they're not "true" AI (for whatever definition of "true" you think makes sense)?
But suppose you ran that test where one of the hidden people is a confederate who steganographically embeds a gender marker without it being obvious to anyone but you. You would be able to break the game, even if your confederate was perfectly mimicking the other gender.
That is to say, embedding a secret recognition code into a stream of responses works on humans, too, so it doesn't say anything about computer intelligence.
And for that matter, passing the Turing test is supposed to be sufficient for proving that something is intelligent, not necessary. You could imagine all sorts of deeply inhuman but intelligent systems that completely fail the Turing test. In Blade Runner, we aren't supposed to conclude that failing the Voight-Kampff test makes the androids mindless automatons, even if that's what humans in the movie think.
In its essentialist form it's impossible to define, but in context it is nothing but skilled search for solutions. And because most problems are more than one can handle, it's a social process.
Can you measure the value of a word in isolation from language? In the same way you can't meaningfully measure intelligence in a vacuum. You get a very narrow representation of it.
The idea behind watermarking (the topic of the paper) is that the output of the LLM is specially marked in some way at the time of generation, by the LLM service. Afterwards, any text can be checked for the presence of the watermark. In this case, "detect if something is generated by an LLM" means checking for the presence of the watermark. This all works if the watermark is robust.
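As a sketch of what that check could look like (again a toy, not the paper's actual detector; `g_score` and the key name are hypothetical): recompute a keyed pseudorandom score for every token in the text and compare the average against the roughly 0.5 you'd expect from unwatermarked text.

```python
import hashlib

def g_score(key, context, token):
    """Pseudorandom score in [0, 1) derived from the key, a context window,
    and the token itself (a toy stand-in for the watermark's g function)."""
    digest = hashlib.sha256(f"{key}|{context}|{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def detection_score(tokens, key, window=4):
    """Average g score over the text. Unwatermarked text should hover
    around 0.5; watermarked generation would drift noticeably above it."""
    scores = []
    for i, token in enumerate(tokens):
        context = " ".join(tokens[max(0, i - window):i])
        scores.append(g_score(key, context, token))
    return sum(scores) / len(scores)

tokens = "the watermark signal accumulates over many tokens".split()
print(detection_score(tokens, key="shared-or-secret-key"))
```

The longer the text, the more such scores you accumulate, which is exactly the length/statistical-certainty point quoted from the paper earlier in the thread.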
Rather they are saying how to modify the design of an LLM to deliberately inject watermarks into generated text such that it will be possible to detect that the text came from a particular LLM.
While interesting in the abstract, I think I can definitively say that absolutely nobody wants this. People trying to pass off LLM content (whether students or content providers) as human-written are not interested in being detected. People who are using LLMs to get information for their own knowledge or amusement or as a cybernetic augmentation do not need this. LLM providers want to drive adoption, and if you can be exposed as passing off LLM slop as your own, then nobody will use their stuff.
Serious question: has that become pay-to-publish a la Forbes etc when I wasn't paying attention?
In 2027, when 90% of the Internet is AI slop, we won't even be able to train new foundational models unless we maintain the sanity of the Internet... And we need to start doing something about it. Kudos to the authors.