"Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal"
Which is "old news"
Disable JavaScript to read it, or use: https://archive.ph/Kp29q
Previously it was just “books3 was part of the training data”; now it’s “MZ was made aware of pirated materials, gave the go-ahead, and by way of torrenting, Meta engineers redistributed the copyrighted materials,” which is outlawed whether or not you’re training a superintelligence.
Personally I’m not a fan of enforcing copyright law in general, but I’m especially not a fan of corporations getting to skirt laws that the little people are made to obey. If Meta wants to train on LibGen, they should have partnered with the Internet Archive and provided them with better lawyers.
Given that Llama was originally “leaked” via torrent, I have this assumption that Meta folks are pirates in spirit, though, and wouldn’t leech unless explicitly told to. But then, being told not to upload would be legally perilous too, since it would hint that they were aware of the illegality. Meta’s defense here seems to be “Officer, I swear I didn’t know that wasn’t allowed”, while testing the legal theory that training transforms the copyrighted work.