The site Wargamer has identified 15 pirated Warhammer books in a dataset used to train AI systems, including Meta’s LLaMA and Bloomberg’s BloombergGPT, along with non-Warhammer books by 34 authors who have also written for Black Library.
The titles cover the Warhammer 40k, Age of Sigmar, and Warhammer: The Old World settings, and were written by current and former Warhammer book authors.
The Atlantic released an online search tool on Monday that lets users check which authors’ works appear in an AI training dataset called “Books3”. Meta and Bloomberg both reference Books3 as part of their AI training data.
Wargamer contacted Games Workshop about the use of its copyrighted works to train artificial intelligence. Black Library authors work on a “work for hire” basis and surrender their copyrights to Games Workshop. This would leave it up to the studio to decide whether or not to take legal action.
Even if lawsuits are unlikely, cease-and-desist notices may follow. On the other hand, both Meta and Bloomberg have acknowledged using the Books3 dataset, though it does not appear that they are aware of any copyright breaches.
The Atlantic’s search tool was developed by journalist Alex Reisner. In an article in The Atlantic on August 19, Reisner explained how he obtained the training dataset Books3, which was used to train Meta’s LLaMA, “the initial model of BloombergGPT”, and the open-source AI tool GPT-J. Books3 contains “roughly 190,000 entries”, each one a large text.
Reisner says that “more than 30,000 titles are from Penguin Random House and its imprints, 14,000 from HarperCollins, 7,000 from Macmillan, 1,800 from Oxford University Press, and 600 from Verso”.
How accurate the search tool is remains to be seen. For now, the matter is best left to Games Workshop.