Even smart tools are tools designed to do what their users want. I would argue that the real problem is the maniac humans.
Having said that, it's obviously not ideal. Surely there are various approaches to at least mitigate some of this. Maybe eventually actual interpretable neural circuits or another architecture.
Maybe another LLM and/or some other system that doesn't even see the instructions from the user and tries to stop the other one if it seems to be going off the rails. One of the safety systems could be rules-based rather than a neural network, possibly incorporating some kind of physics simulation.
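A minimal sketch of what that separation could look like, assuming the instruction-following model emits structured actions and the guard is a simple rules-based check that never sees the user's instructions (all names here are illustrative, not a real API):

```python
from dataclasses import dataclass

# Hypothetical structured action emitted by the instruction-following model.
@dataclass
class ProposedAction:
    kind: str                # e.g. "move_arm", "grasp", "speak"
    target: str              # object or location the action affects
    max_force_newtons: float

# Illustrative limits; a real system would derive these from a safety spec,
# possibly backed by a physics simulation of the proposed motion.
FORCE_LIMIT_N = 50.0
FORBIDDEN_TARGETS = {"human", "animal"}

def guard_allows(action: ProposedAction) -> bool:
    """Rules-based check that never sees the user's instructions,
    only the concrete action the other model wants to execute."""
    if action.target in FORBIDDEN_TARGETS and action.kind != "speak":
        return False
    if action.max_force_newtons > FORCE_LIMIT_N:
        return False
    return True

def execute(action: ProposedAction) -> None:
    if not guard_allows(action):
        raise PermissionError(f"Guard vetoed action: {action}")
    # ...hand off to the actuator stack here...
```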
But even if we come up with effective safeguards, they might be removed or disabled. Androids could be used to commit crimes anonymously if there isn't some system for registering them, or at least an attempt at one, since I'm sure criminals would work around it if possible. But it shouldn't be easy.
Ultimately you won't be able to entirely stop motivated humans from misusing these things, but you can at least make it inconvenient.
I sometimes wonder if that is what our brain hemispheres are. One comes up with the craziest, wildest ideas and the other one keeps it in check and enforces boundaries.
In vino veritas etc.: https://en.wikipedia.org/wiki/In_vino_veritas
Well, yeah, but then you need to provide, transport, and control those.
The difference here is that these are the sorts of robots likely to already be present somewhere, which could then be abused for nefarious deeds.
I assume the mitigation strategy here is physical sensors plus separate, out-of-loop processes that will physically disable the robot in some capacity if it exceeds some bound.
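As a rough sketch of that kind of out-of-loop watchdog; the sensor read and e-stop calls are placeholders for whatever the actual hardware exposes, and the bounds are made-up numbers:

```python
import time

# Illustrative bounds; real values would come from the robot's safety spec.
MAX_JOINT_SPEED_RAD_S = 2.0
MAX_CONTACT_FORCE_N = 40.0

def read_safety_sensors() -> dict:
    """Placeholder: in a real system this reads dedicated safety sensors,
    not values reported by the control software it is policing."""
    return {"joint_speed": 0.0, "contact_force": 0.0}

def trip_estop() -> None:
    """Placeholder: trip a hardware e-stop relay that cuts actuator power,
    independent of the main compute stack."""
    pass

def watchdog(poll_hz: float = 100.0) -> None:
    # Runs as a separate, out-of-loop process: it only observes and can
    # only disable; it never commands motion itself.
    while True:
        s = read_safety_sensors()
        if s["joint_speed"] > MAX_JOINT_SPEED_RAD_S or s["contact_force"] > MAX_CONTACT_FORCE_N:
            trip_estop()
            return
        time.sleep(1.0 / poll_hz)
```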
hiring a developer to write that sounds expensive
just wire up another LLM
Edit: Being completely serious here. My reasoning was that if the robot had a comprehensive model of the world and of how harm can come to humans, and was designed to avoid that, then jailbreaks that cause dangerous behavior could be rejected at that level. (i.e. human safety would take priority over obeying instructions... which is literally the Three Laws.)
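As a toy illustration of that priority ordering (the safety check runs first, obedience second), with the world-model query and actuator call stubbed out as plain callables since none of this is a real API:

```python
from typing import Callable, Iterable

def handle_instruction(
    actions: Iterable[str],
    predicts_harm: Callable[[str], bool],   # hypothetical world-model query
    run: Callable[[str], None],             # hypothetical actuator call
) -> str:
    # Safety outranks obedience: every action is checked against the
    # world model's harm prediction before anything is executed.
    for action in actions:
        if predicts_harm(action):
            return f"Refused: '{action}' is predicted to endanger a human."
        run(action)
    return "Done."

# Example usage with trivial stand-ins:
if __name__ == "__main__":
    print(handle_instruction(
        ["pick up cup", "swing arm at person"],
        predicts_harm=lambda a: "person" in a,
        run=lambda a: print("executing:", a),
    ))
```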
There’s also still the human-feedback (HF) step that needs to be incorporated, which is expensive! But the alternative is Waymo, which keeps the law perfectly even when “everybody knows” it needs to be broken sometimes for traffic (society) to function acceptably. So the above strong prior needs to be coordinated with HF and the appropriate penalties assigned…
In other words: it’s a mess! But assumptions of “AGI” don’t really help anyone.
I think the various safeguards companies put on their models are their attempt at the three laws. The concept is sort of silly though. You have a lot of Western LLMs and AIs with safeguards built on Western culture. I know some people could argue about censorship and so on all day, but if you’re not too invested in red vs blue, I think you’ll agree that current LLMs are mostly “safe” for us. Nobody forces you to put safeguards on your AI though, and once models become less energy-consuming (if they do), then you’re going to see a jihadGPT, because why wouldn’t you? I don’t mean to single out Islam; I’m sure we’re going to see all sorts of horrible models in the next decade. Models which will be all too happy to help you build bombs, 3D print weapons, and so on.
So even if we had thinking AI, and we were capable of building in actual safeguards, how would you enforce them on a global scale? The only thing preventing these things is the computation required to run the larger models.
And in those stories it was enforced in the following way: Earth banned robots. In response, the Three Laws were created and it was proved that robots couldn't disobey them.
So I guess the first step is to ban LLMs until they can prove they are safe... Something tells me that ain't happening.
I vaguely recall it involved two or three robots that were unaware of what the previous robots had done. First, a person asks one robot to purchase a poison, then asks another to dissolve the powder into a drink, then another serves that drink to the victim. I read the story decades ago, but the very rough idea stands.
You might be thinking of "Let's Get Together"? There is a list there of the few short stories in which the robots act against the Three Laws.
That being said, the Robot stories are meant to be a counter to the robot-as-Frankenstein's-monster stories that were prolific at the time. In most of the stories, robots literally cannot harm humans: it is built into the structure of their positronic brains.
What we need is a clear indication of who is to blame when a bad decision is made. I would argue, just as with a weapon, that the person giving or writing the instructions is, but I am sure there will be interesting edge cases that are not yet accounted for, like a dead man's switch and the like.
edit: On the other side of the coin, it is hard not to get excited (10k for a flamethrower robot seems like a steal, even if I end up on a list somewhere).
What does this device exist for? And why does it need an LLM to function?