AI Gets a Legal Gift for Thanksgiving
2023-11-28 | securityboulevard.com

Any new technology raises a host of new legal challenges, and AI is no exception. One such challenge is the role copyright law plays in the use of AI. Copyright law raises several distinct issues—first, when an AI program is trained on works that are subject to copyright, does the mere act of training—having a machine “read” or examine the copyrighted works posted online—constitute some form of infringement? Second, when the AI processes or analyzes the copyrighted work, does this constitute some form of the creation of an infringing “derivative work” under the copyright law? Third, when the trained AI program, based on a prompt from some user, creates a new work based on the training, does the output infringe on the copyright of the works from which it is trained?

As with any legal question—particularly novel questions—the answer will ultimately be "it depends." On November 20, 2023, a federal court in San Francisco addressed part of this question: Whether the training of AI infringes copyright. The court decided that the use of publicly posted copyrighted works to train AI programs does not result in the creation of a derivative work and, therefore, is not per se actionable.

Fighting Llamas

In Kadrey v. Meta Platforms, Inc., 3:23-cv-03417 (N.D. Cal. Nov. 20, 2023), ECF No. 56, the plaintiffs sued Meta—the owner of Facebook and a creator and user of various AI programs and protocols—for copyright infringement. Meta's AI program is called LLaMA, or Large Language Model Meta AI, and is described by Meta as "[working] by taking a sequence of words as an input and predicts a next word to recursively generate text. To train our model, we chose text from the 20 languages with the most speakers, focusing on those with Latin and Cyrillic alphabets." In other words, it uses massive databases of natural language data to train the AI program. In human terms, "it reads a lot of stuff." Some of that stuff, indeed the majority of it, is copyrighted by others. Some may enjoy federally registered copyright protection, some international copyright protection, but much of the stuff scanned by LLaMA is protected by common law copyright. The act of reducing a unique work to a tangible medium (yes, the internet is "tangible") creates a copyright in that work. However, not all uses of that work are infringing—and for works posted online, many are not infringing.
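For readers unfamiliar with how such models operate, the "predicts a next word to recursively generate text" description can be sketched in a few lines of code. This is a minimal illustration of the autoregressive loop only; the `predict_next` function below is a hypothetical stand-in, not Meta's actual model, which learns its predictions from the training text at issue in the case.

```python
def predict_next(words):
    """Hypothetical stand-in for a trained language model.
    A real model scores every word in its vocabulary given the
    context; here we just cycle deterministically for illustration."""
    vocabulary = ["the", "llama", "reads", "a", "lot", "of", "text"]
    return vocabulary[len(words) % len(vocabulary)]

def generate(prompt, num_words):
    """Recursively extend the prompt one predicted word at a time,
    feeding each output word back in as new input—the loop the
    Meta quote describes."""
    words = prompt.split()
    for _ in range(num_words):
        words.append(predict_next(words))
    return " ".join(words)

print(generate("in the beginning", 4))
```

The legal questions in Kadrey arise precisely because the real `predict_next` is shaped by the copyrighted books the model was trained on.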

Richard Kadrey is a novelist and freelance writer who has written dozens of novels and nonfiction works. In his lawsuit against Meta’s LLaMA program, he asserts that the AI program’s scanning of his copyrighted works creates a “derivative work” from his writings. As the Kadrey judge noted, “[t]he plaintiffs allege that the ‘LLaMA language models are themselves infringing derivative works’ because the ‘models cannot function without the expressive information extracted’ from the plaintiffs’ books.”

The court disposed of this argument summarily, explaining: "This is nonsensical. A derivative work is 'a work based upon one or more preexisting works' in any 'form in which a work may be recast, transformed, or adapted.'" 17 U.S.C. § 101. The court noted, without explanation, that "[t]here is no way to understand the LLaMA models themselves as a recasting or adaptation of any of the plaintiffs' books."

Sure, scanning the books is not a "recasting or adaptation" of the book, per se. However, unaddressed in the case is whether training the AI model to enable third parties to create derivative works constitutes contributory copyright infringement. If I were to use LLaMA in an AI program and ask it, "Write a 300-page novel in the style of Richard Kadrey," the LLaMA-trained program might do a credible job of doing so. That output arguably would be a "recast" or transformation of the original work—although even that is unclear, because an "in the style of…" prompt might not lead to a work that infringes any particular copyrighted work. If, on the other hand, a user asked a LLaMA-trained program to "write a cyberpunk novel set in 2023 Los Angeles based on the work Metrophage but change all of the characters' names…" this would likely result in an infringing output. Does this mean that the analysis (training) is infringing? Magic 8 ball says—situation unclear; ask again later.


In the rapidly evolving landscape of artificial intelligence (AI), the case of Richard Kadrey et al. vs. Meta Platforms, Inc. (Case No. 23-cv-03417-VC) emerges as a pivotal moment, reflecting the intricate legal and ethical complexities entwined with the development of AI technologies. The case, adjudicated in the United States District Court for the Northern District of California, delves deep into the contentious issue of using copyrighted material for training AI models, specifically focusing on Meta’s LLaMA language models.

Based on the facts before it, the San Francisco federal court rejected the claims of both direct and vicarious copyright infringement. Here again, the court specifically noted that there was no evidence of output from LLaMA that was similar to a copyrighted work such that the output constituted a derivative work. The court noted, “To prevail on a theory that LLaMA’s outputs constitute derivative infringement, the plaintiffs would indeed need to allege and ultimately prove that the outputs ‘incorporate in some form a portion of’ the plaintiffs’ books.” Unfortunately, these plaintiffs did not make such an allegation in their complaint. The court observed this fatal flaw, stating, “To the extent that [Plaintiffs] are not contending LLaMa spits out actual copies of their protected works, they would need to prove that the outputs (or portions of the outputs) are similar enough to the plaintiffs’ books to be infringing derivative works. And because the plaintiffs would ultimately need to prove this, they must adequately allege it at the pleading stage.”

Left unaddressed is how the court would rule if this simple allegation were added to the complaint—but that’s for another case.

DMCA

Kadrey and the other plaintiffs alleged that the LLaMA program improperly and unlawfully removed the authors' "Copyright Management Information," or CMI, without their consent. CMI are indicia that the work allegedly infringed was, in fact, subject to copyright protection—things like the ® or © symbols, or "Copyright 2023—All rights reserved."

The statute defines it as "any of the following information conveyed in connection with … displays of a work, including in digital form": the name of, and other identifying information about, the author of a work; the name of, and other identifying information about, the copyright owner of the work, including the information set forth in a notice of copyright; terms and conditions for use of the work; identifying numbers or symbols referring to such information or links to such information; and such other information as the Register of Copyrights may prescribe by regulation. In short, things that identify the owner of the copyright.

As to this allegation against LLaMA, the court was equally dismissive, observing, “[t]here are no facts to support the allegation that LLaMA ever distributed the plaintiffs’ books, much less did so ‘without their CMI.’” Thus, the DMCA claims failed.

Other Claims

The authors also sued Meta, claiming that its business model exploited the commercial value of the authors' copyrighted works without compensating them, which constituted unfair competition, unjust enrichment or negligence. The court summarily dismissed each of these claims as having been preempted by federal copyright law, noting that "the duty alleged (to the extent it could be thought to exist) is premised on the rights protected by the Copyright Act and thus any claims for breach of this duty are preempted." The court also relied on the so-called "economic loss doctrine"—that plaintiffs must show concrete economic loss as a result of some allegedly negligent act—the same doctrine that is often used to thwart data breach privacy lawsuits.

In short, it was a total win for Meta and Meta’s AI program.

This ruling stands as a significant beacon in the AI industry, particularly for companies engaged in the development of AI technologies like natural language processing. It underscores the ongoing debate over the boundaries of fair use in AI development, the legal intricacies of AI-generated content, and the application of copyright law to AI programs and their outputs.

The AI industry is at a crossroads. On one hand, AI has undeniable potential to revolutionize various sectors, from health care to entertainment. On the other hand, there are pressing concerns about the ethical and legal implications of how AI is trained and deployed. The heart of the matter lies in the use of copyrighted material for training these models. How much is too much? Where does fair use end and infringement begin? These are the questions that the industry and the legal system grapple with.

As AI continues to evolve, so too must the legal frameworks governing its development and application. The Kadrey case may be just one of many such disputes to arise as AI becomes more integrated into our daily lives. The challenge for the legal system and AI developers is finding a balance that fosters innovation while respecting the rights of copyright holders. This balance is not just a legal imperative but an ethical one, ensuring that the benefits of AI are built on a foundation of respect for intellectual property and creativity.

Source: https://securityboulevard.com/2023/11/ai-gets-a-legal-gift-for-thanksgiving/