AI & Copyright: Authors Fight Back Against Tech Companies

Authors and media companies are going to court to stop tech companies from using their content to train artificial intelligence systems.

Last year, former Arkansas Gov. Mike Huckabee and a handful of other authors filed a lawsuit accusing Microsoft, Meta and other tech companies of violating copyright law by using their writings without permission. The authors are seeking class-action status against the tech companies in federal court, alleging that the companies illegally used the authors’ works to train AI programs that emulate human language.

Huckabee’s lawsuits — one pending in federal court in New York and another in California — say that large-language models “represent a significant advancement in the field of AI and have gained widespread attention for their impressive abilities to generate human-like text and perform various natural language processing tasks.

“These models, powered by generative AI techniques, have revolutionized numerous industries and applications.”

The systems need a massive amount of text data to learn, and books are among the materials used to train them. But using “pirated (or stolen) books does not fairly compensate authors and publishers for their work,” the lawsuits say.

As Huckabee’s cases move through the federal courts, other companies have filed similar lawsuits. In December, The New York Times Co. sued Microsoft and its partner OpenAI Inc., alleging that they used the newspaper’s articles without permission.

OpenAI said in a blog post on its website that it supports journalism and believes the lawsuit is without merit.

In the lawsuit, The New York Times Co. said the companies’ business model is based on “mass copyright infringement.”

Robert Brauneis, a law professor and co-director of the Intellectual Property Program at the George Washington University Law School, agreed.

“There’s no question that most of the developers of the prominent generative AI tools are using material that’s … protected by copyright to train those tools and doing it without permission of the copyright owners in that material,” he told Arkansas Business.

One exception is Adobe Firefly, which uses generative AI and text prompts to create images. It used licensed material to learn, Brauneis said.

But other companies have made copies of either text or images or both, he said.

“That would qualify as reproductions of those under the Copyright Act, and therefore would result in copyright liability unless there was some exception that covered that activity,” he said.

One defense the defendants could raise is the fair use exception, a broad doctrine that permits, in certain cases, the use of copyrighted material without the author’s permission.

“So this is all going to come down to whether that activity of copying amounts to fair use or not, or is covered by the fair use exception or not,” Brauneis said.

Courts determine fair use by weighing four factors: the purpose and character of the use, the nature of the copyrighted work, the amount used and the effect on the market for the original. “And then there’s a whole bunch of case law interpreting those four factors,” he said. “It’s not like there’s a thermometer and when it gets to 212 that’s the boiling point and there’s no fair use anymore. There’s no kind of single metric that courts have agreed on to decide whether fair use is a fair use.”

Another argument the tech companies could make is that their use is transformative, meaning the new images or text generated by the AI tools don’t infringe the copyright of any of the works used to train the systems.

The plaintiffs in these cases, however, argue that the use of their material is not fair use.

The plaintiffs also say the output of the AI tools harms the potential market for the copyrighted material, decreasing the value of the very works the tech companies used to train the systems.

The plaintiffs’ argument will be that “generative AI tools may not be generating content that is a word-for-word copy of the content [that was used for the training], … but the whole point of these tools is to generate text and images … that competes with the input that was used to train the tool,” Brauneis said.

He said it’s difficult to determine how the courts will rule.