It's not just that they are (somewhat) unusual phrases, it's that ChatGPT comes up with those phrases so very often.
It's quite like how earlier versions always had a "However" in between explanations.
- Your hypothesis only holds if the alternative LLM is also "sufficiently good". If Gemini does not stay competitive with other LLMs, Google's AI plans have a much more serious problem.
- Your hypothesis assumes that many people will be capable of detecting the watermarks (both of Gemini and other LLMs) so that they can make a conscious choice for another LLM. But the idea behind good watermarking is that it is not that easy to detect.
Just to say, maybe give it a little time, but a watermark like this is not going to be the thing that decides someone's reaction in the near future; what the text actually says will. (I am just betting here.)
But it's going to be an uphill battle either way if you are really getting the model to write everything. I do not envy that kind of project.
And this should also be the case for every possible LLM; then you could compare which LLMs could produce which outputs from which inputs. That would give some certainty that a given output was produced by this LLM, and that this other LLM might produce it as well with these inputs.
So... impossible?
Suffice to say, it is evident that no other LLM will come close to Gemini in integration with Google Docs and the other Workspace apps.
There are counter examples, e.g. Unity. But catching that lightning in a bottle is rare and merits special explanation rather than being assumed.
If by "we" you mean anyone else than Google and the select few other LLM provider they choose to associate with, I'm afraid you're going to be disappointed.
I.e., DRM can't change people's motivations. It's useful for things like national security secrets and trade secrets, where the people who have access to the information have very clear motivations to protect that information, and very clear consequences for violating the rules that DRM is in place to protect.
In this case, the big question of whether AI watermarking will work or fail has more to do with people's motivations: will the general public accept AI watermarking because it fits our motivations and the consequences we set up for AI masquerading as a real person, or AI being used for misinformation? That's a big question that I can't answer.
How can you watermark text?
https://spectrum.ieee.org/watermark#:~:text=How%20Google%E2%...
These two sentences next to each other don't make much sense, or they are misleading.
Yeah. I know. Only the client is open source and it calls home.
> For example, with the phrase “My favorite tropical fruits are __.” The LLM might start completing the sentence with the tokens “mango,” “lychee,” “papaya,” or “durian,” and each token is given a probability score. When there’s a range of different tokens to choose from, SynthID can adjust the probability score of each predicted token, in cases where it won’t compromise the quality, accuracy and creativity of the output.
> This process is repeated throughout the generated text, so a single sentence might contain ten or more adjusted probability scores, and a page could contain hundreds. The final pattern of scores for both the model’s word choices combined with the adjusted probability scores are considered the watermark. This technique can be used for as few as three sentences. And as the text increases in length, SynthID’s robustness and accuracy increases.
Better link: https://deepmind.google/technologies/synthid/
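Roughly, that "adjust the probability score" step can be pictured like the sketch below. This is only an illustration of the general idea; the keyed hash, the "favored token" predicate, the boost factor, and the entropy threshold are my own stand-ins, not SynthID's actual construction:

```python
import hashlib
import math

def favored(token_id: int, context: tuple, key: bytes) -> bool:
    """Keyed pseudorandom predicate: for any given context, roughly half of all
    tokens count as 'favored'. The SHA-256 construction is just a stand-in."""
    h = hashlib.sha256(key + repr((context, token_id)).encode()).digest()
    return h[0] & 1 == 1

def adjust_probs(tokens, probs, context, key, boost=1.2, min_entropy=1.0):
    """Nudge probabilities toward favored tokens, but only when the distribution
    is spread out enough that the nudge shouldn't change the 'best' answer."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    if entropy < min_entropy:  # near-deterministic step: leave it untouched
        return probs
    adjusted = [p * boost if favored(t, context, key) else p
                for t, p in zip(tokens, probs)]
    total = sum(adjusted)
    return [p / total for p in adjusted]
```

Detection then amounts to counting how often a text's tokens land on the keyed favored set: unwatermarked text should hit about half the time, watermarked text noticeably more, and the longer the text, the stronger the signal.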
I suspect sentence structure is also being used or, more likely, the primary “watermark”. Similar to how you can easily identify if something is at least NOT a Yoda quote based on it having incorrect structure. Combine that with other negative patterns like the quote containing Harry Potter references instead of Star Wars, and you can start to build up a profile of trends like this statement.
By rewriting the sentence structure and altering usual wording instead of directly copying the raw output, it seems like you could defeat any current raw watermarking.
Though this hasn’t stopped Google and others in the past using bad science and stats to make unhinged entitled claims like when they added captcha problems everybody said would be “literally impossible“ for bots to solve.
What a surprise how trivial they were to automate, and how the data they produce can be sold for profit at the expense of mass amounts of consumer time.
In summary, this will not work in practice. Ever.
How long until they can write genuinely unique output without piles of additional prompting?
Recognizing only parts of a watermark, or many watermarked fragments scattered all around, doesn't seem possible at all, to my mind.
They can, however, develop software to sell very expensively to universities, schools, etc., and it will occasionally catch a very guilty person who uses it all the time, doesn't even try to make the answer better, and always hands over the LLM answer in one piece.
At the end of the day, it will lead to so many false accusations that people will stop trusting it. Among chess players and tournaments, false accusations of cheating have happened all the time for 15 years or more. Right now, former world chess champion Kramnik has accused over 50 top chess players of cheating in the span of 2 months, including the 5-time US champion Nakamura.
If software like that gets applied to schools and universities, we are gonna have the fun of our lives.
Obviously Google claims that it doesn't cause any issues but I'd think that OpenAI and other competitors would have something similar to SynthID if it didn't impact performance.
Is that not at odds with what's presented in the article here?
> Google has already integrated this new watermarking system into its Gemini chatbot, the company announced today.
Key word: already
> It has also open-sourced the tool and made it available to developers and businesses, allowing them to use the tool to determine whether text outputs have come from their own large language models (LLMs), the AI systems that power chatbots.
That's basically the change.
The phrases usually mean something useful, if one knows the meaning, but it is amusing how much people seem to stick with the same ones, even across companies.
But I find the idea that people will subconsciously start copying AI speech patterns (perhaps as a signal of submission) amusing. I think it's gonna throw a wrench into the idea.
IMHO LLMs either should help us communicate more clearly and succinctly, or we can use them as tools for creativity ("rephrase this in 18th century English"). Watermarking speech sabotages both of these use cases.
Three comments here:
1. I wonder how many of the 20M prompts got a thumbs up or down. I don't think people click that a lot. Unless the UI enforces it. I haven't used Gemini, so I might be unaware.
2. Judging a single response might not be enough to tell whether the watermarking is acceptable. For instance, imagine the watermarking adds "However," to the start of each paragraph. In a single GPT interaction you might not notice it; once you get 3 or 4 responses it might stand out.
3. Since when is Google happy with measuring by self-declared satisfaction? Aren't they the kings of A/B testing and high-volume analysis of usage behavior?
I sometimes do, but I almost always give the wrong answer, or the opposite answer, where possible.
They then contact me and ask me why, so I tell them, and they say there is nothing they can do. A week later I'll get a pop-up asking for feedback and we go round the same loop again.
reminds me of "what have the romans ever done for us?"
but thx for elaborating.
He's a theoretical computer scientist, but he was recruited by OpenAI to work on AI safety. He has a very practical view on the matter and is focusing his efforts on leveraging the probabilistic nature of LLMs to provide an undetectable digital watermark. It nudges certain words to be paired together slightly more often than random, and you can mathematically derive, with some level of certainty, whether an output, or even a section of an output, was generated by the LLM. It's really clever, and apparently he has a working prototype in development.
One workaround he hasn't figured out yet is asking for output in language X and then translating it into language Y. But that may still be figured out eventually.
I think watermarking would be a big step forward to practical AI safety and ideally this method would be adopted by all major LLMs.
That part starts around 1 hour 25 min in.
> Scott Aaronson: Exactly. In fact, we have a pseudorandom function that maps the N-gram to, let’s say, a real number from zero to one. Let’s say we call that real number ri for each possible choice i of the next token. And then let’s say that GPT has told us that the ith token should be chosen with probability pi.
https://axrp.net/episode/2023/04/11/episode-20-reform-ai-ali...
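For the curious, the selection rule he describes boils down to something like the sketch below. The SHA-256 keyed hash is my stand-in for the pseudorandom function, and the exact scoring OpenAI uses may differ:

```python
import hashlib

def r_value(key: bytes, ngram: tuple, token_id: int) -> float:
    """Keyed pseudorandom map from (previous n-gram, candidate token) to [0, 1),
    playing the role of the r_i in the quote above. SHA-256 is a stand-in."""
    h = hashlib.sha256(key + repr((ngram, token_id)).encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64

def pick_token(candidates, probs, ngram, key):
    """Pick the token i maximizing r_i ** (1 / p_i).

    For independent uniform r_i this chooses token i with probability exactly p_i,
    so the output distribution matches the model's; but someone holding the key
    can later check how suspiciously large the r-values of a text's tokens are."""
    return max(
        zip(candidates, probs),
        key=lambda tp: r_value(key, ngram, tp[0]) ** (1.0 / max(tp[1], 1e-9)),
    )[0]
```

The nice property is that the text still looks like an ordinary sample from the model, yet the key holder can sum up how "lucky" the observed r-values are and turn that into a statistical confidence that the text came from the watermarked model.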
You wouldn't even need access to an unwatermarked model; the “correcting model” could even be watermarked itself, as long as it's not the same watermarking function applied to both.
Or am I misunderstanding something?
You would also need to define a probability curve based on output length: the longer the output, the more certain you can be. What is the smallest number of tokens below which nothing can be proved at all?
You would also need to include humans. Can you define that probability for a human? All LLMs would have to use the same system uniformly.
Otherwise, "watermarking" is doomed to be misused and not reliable enough. False accusations will take place.
In time most forms of watermarking along those lines will seem like elements of an LLM's writing style, and will quickly be edited out by savvy users.
hah, every single LLM already watermarks its output by starting the second paragraph with "It is important/essential to remember that..." followed by inane gibberish, no matter what question you ask.
Now LLMs are trained on Reddit users.
It's starting to become a common destination for when I want to read about interesting things.
My reasoning is simple: the only way to watermark text is to inject some relatively low-entropy signal into it, which can be detected later. This has to a) work for "all" output for some values of all, and b) have a low false positive rate on the detection side. The amount of signal involved cannot be subtle, for this reason.
That signal has a subtractive effect on the predictive-output signal. The entropy of the output is fixed by the entropy of natural language, so this is a zero-sum game: the watermark signal will remove fidelity from the predictive output.
This is impossible to avoid or fix.
i have two hands
i have 2 hands
These sentences communicate the same thing, but one could be a watermarked result. We can apply this kind of equivalent-meaning word/phrase change many times over and be confident something is watermarked, while having avoided any semantic shifts.
They use the last N prefix tokens, hash them (with a keyed hash), and use the random value to sample the next token via an 8-wise tournament: assign random bits to each of the top 8 preferred tokens, make pairwise comparisons, and keep the token with the larger bit. (Yes, it seems complicated, but apparently it increases watermarking accuracy compared to straightforward nucleus sampling.)
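As far as I can tell from that description, the tournament looks roughly like the sketch below. The SHA-256 keyed hash and the "sample eight entrants from the top tokens" seeding are my guesses at the details, not the paper's exact construction:

```python
import hashlib
import random

def keyed_bit(key: bytes, context: tuple, token_id: int, layer: int) -> int:
    """Pseudorandom bit derived from the watermark key, the last-N context,
    the candidate token, and the tournament layer."""
    h = hashlib.sha256(key + repr((context, token_id, layer)).encode()).digest()
    return h[0] & 1

def tournament_sample(top_tokens, probs, context, key, rng=random):
    """8-wise tournament: seed the bracket with entrants drawn from the model's
    own distribution, then let keyed pseudorandom bits decide each pairwise match."""
    entrants = rng.choices(top_tokens, weights=probs, k=8)
    layer = 0
    while len(entrants) > 1:
        winners = []
        for a, b in zip(entrants[0::2], entrants[1::2]):
            # larger bit wins; ties go to the first entrant
            if keyed_bit(key, context, b, layer) > keyed_bit(key, context, a, layer):
                winners.append(b)
            else:
                winners.append(a)
        entrants = winners
        layer += 1
    return entrants[0]
```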
The negative of this approach is that you need to rerun the LLM, so you must keep all versions of all LLMs that you trained, forever.
Too bad they only formulate the detection positive rate empirically. I am curious what the exact probability would be mathematically.
The existing impossibility results imply that these attacks are essentially unavoidable (https://arxiv.org/abs/2311.04378) and not very costly, so this line of inquiry into LLM watermarking seems like a dead end.
The first serious investigations into "secure" steganography were about 30 years ago and it was clearly a dead end even back then. Sure, watermarking might be effective against lazy adversaries--college students, job applicants, etc.--but can be trivially defeated otherwise.
All this time I'd been lamenting my research area as unpopular and boring when I should've been submitting to Nature!
Presumably there are things like key exchanges that look like randomness, and then you could choose LLM output using that randomness in such a way that you can send messages that look like an LLM conversation?
Someone starts the conversation with a real message, 'Hello!', and then you do some kind of key exchange where what is exchanged is hard to distinguish from randomness, and use those keys to select the probabilities of the coming tokens from the LLM. Then, once the keys are established, you use some kind of cipher to generate random-looking ciphertext and use that as the randomness for selecting words in the final part?
Surely that would work? If there is guaranteed insecurity, it's for things like watermarking, not for steganography?
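A bare-bones sketch of the "use the shared randomness to pick tokens" part (the SHA-256 keystream here is my stand-in for whatever the key exchange or cipher produces):

```python
import hashlib

def keystream_unit(key: bytes, step: int) -> float:
    """Shared pseudorandom number in [0, 1); a stand-in for whatever the key
    exchange / cipher produces, which should look like uniform randomness."""
    h = hashlib.sha256(key + step.to_bytes(8, "big")).digest()
    return int.from_bytes(h[:8], "big") / 2**64

def pick_token(tokens, probs, u: float):
    """Inverse-CDF sampling: the shared value u selects the token, so both
    parties with the same model, prompt, and key agree on every choice,
    while an outsider just sees ordinary-looking sampling."""
    acc = 0.0
    for tok, p in zip(tokens, probs):
        acc += p
        if u < acc:
            return tok
    return tokens[-1]
```

Recovering the exact hidden bits on the other end takes an arithmetic-coding-style construction on top of this, but the basic trick is just that the "randomness" driving the sampling is really the ciphertext.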
Yet, it doesn’t stop companies from making claims like these, and what’s worse, people buying into them.
If there were a law that AI generated text should be watermarked then major corporations would take pains to apply the watermark, because if they didn't then they would be exposed to regulatory and reputational problems.
Watermarking the text would enable people training models to avoid it, and it would allow search engines to determine not to rely on it (if that was the search engine preference).
It would not mean that all text not watermarked was human generated, but it would mean that all non-watermarked text provided by institutional actors could be trusted to be human-written.
What?
You're trading the warm feeling of an illusion of trust for a total lack of awareness and protection against even the mildest attempt at obfuscation. This means that people who want to hurt or trick you will have free rein to do it, even when the target, like your 90-year-old grandmother, lacks the skill to notice.
GDPR.
How many breaches of privacy by large organizations occur in the EU? When they occur, what happens?
On the other hand - what's the story in the USA?
Alternatively what would have happened if we simply said "data privacy cannot be maintained, no laws will help"?
Consider any hacker from a non-extraditing rogue state.
Consider any nation state actor or well-equipped NGO. They are more motivated to manipulate you than Starbucks.
Consider the slavish, morbid conditions faced by foreign workers who manufacture your shoes and mine your lithium. All of your favorite large companies look the other way while continuing to employ such labor today, and have a long history of partnering with the US government to overthrow legitimate foreign democratic regimes in order to maintain economic control. Why would these companies have better ethics regarding AI-generated output?
And consider the US government, whose own intelligence agencies are no longer forbidden from employing domestic propaganda, and who will certainly get internal permission to circumvent any such laws while still exploiting them to their benefit.
Malicious non-compliance is still common IME. Enforcement is happening but has been focused on the very large egregious abuse so far only.
Setting it to 0 doesn't get you perfectly deterministic output though, per https://medium.com/google-cloud/is-a-zero-temperature-determ... , as you don't have perfect control over what approximations are being made in your floating-point operations.
Nvidia gives some guarantees about deterministic results of their kernels, but that only applies when you have exactly the same input data, which is not the case with in-flight batching.
We can also sample from the distribution, which introduces randomness. Basically, if word1 should be chosen 75% of the time and word2 25% of the time, it will do that.
The randomness you’re seeing can also be due to implementation details.
https://community.openai.com/t/a-question-on-determinism/818...
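For reference, the sampling being described is just temperature-scaled softmax sampling, something like this sketch:

```python
import math
import random

def sample_token(logits, temperature=1.0, rng=random):
    """Temperature-scaled softmax sampling. As temperature goes to 0 this
    collapses to argmax (greedy decoding), which is why T=0 is 'deterministic'
    in principle, modulo the floating-point and batching effects noted above."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # e.g. probs = [0.75, 0.25] -> the first token is picked ~75% of the time
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```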
I’m far, far more concerned about photo, video, and audio verification. We need a camera that can guarantee a recording is real.
If they can just get other providers to use it because of 'safety' or something, they won't have to change their indexer much. Otherwise PageRank is dead due to the ease of creating content farms.
Maybe if you prompt it right, it can do a better job of masking itself, but people don't seem to do that.
My guess is zero times. So, you are not describing an experiment here, you are just describing how you built up your internal bias.
I can tell if somebody ate cornflakes or oatmeal for breakfast just by looking at how they walk. I'm always right. You'd better believe me. I've seen thousands of people walk.
Bad human resume text and overuse of unmodified LLM output are both detectable, but they are detectable because they are bad in quite different ways.
Regarding the original resume reader's notion that they can detect LLM text with a high degree of accuracy, it is not their specificity I would take issue with (similarly, despite stating that validation is critical, I would bet you, too, are pretty confident when you see an entire page of blogspam or marketing copy that you regard as LLM-generated, despite it rarely being marked as such). Rather, it is their sensitivity, as I am sure occasional use, and especially slightly modified LLM output, gets by them now and again without them knowing.
The economist one is not as good an analogy; it is almost the converse of the makeup example, as that is a case with a high rate of false positives.
Great so now people have to be worried about being too statistically similar to an arbitrary "watermark".
Then they also store everything which the partners upload to check if it’s created by them.
If other AI players also stored everything they create and made it available in a similar way, there could indeed be a working watermark.
If one used a privately run AI to alter content generated by a publicly run AI, there would still be a recognisable percentage of similarity hinting that it might come from one of the public AIs.
Timestamps would become quite relevant since much content would start to repeat itself at some point and the answers generated might be similar.
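A rough idea of what such a "percentage similarity" check against stored outputs could look like (the word n-gram Jaccard measure here is my own choice for illustration, not anything Google has described):

```python
def ngram_similarity(a: str, b: str, n: int = 5) -> float:
    """Jaccard overlap of word n-grams: a crude 'percentage similarity'
    between a suspect text and one stored AI output."""
    def ngrams(text):
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    A, B = ngrams(a), ngrams(b)
    if not A or not B:
        return 0.0
    return len(A & B) / len(A | B)

def best_match(suspect: str, stored_outputs: list) -> float:
    """Highest similarity against everything the provider has stored."""
    return max((ngram_similarity(suspect, s) for s in stored_outputs), default=0.0)
```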