New York Times Sues OpenAI & Microsoft For ‘Unlawful Copying’ Of Its Articles
By Mikelle Leow, 28 Dec 2023
Photo 106806965 © Stefan Saru | Dreamstime.com
The New York Times has gone on the offensive against tech behemoths OpenAI and Microsoft, the artificial intelligence firm’s major backer, accusing them of misappropriating its journalistic content to fuel their advanced AI tools. The lawsuit, filed in a Manhattan federal court on Wednesday, puts a century’s worth of journalism at the center of a heated copyright dispute.
The venerable newspaper alleges that millions of its articles were used to train popular AI tools like OpenAI’s GPT-4 and Microsoft’s Bing Chat, effectively creating unfair competitors that mimic the Times’ distinct journalistic voice.
The suit goes as far as to provide instances where AI-generated summaries closely mirror the newspaper’s unique articles—including investigative reports—sometimes quoting them verbatim, without permission or proper attribution.
Evidence of OpenAI infringing NYT's copyrighted content verbatim:https://t.co/NQh5zSMOt8 pic.twitter.com/HMrAbVR9Dh
— Nicole Miller (@JOSourcing) December 27, 2023
Among the cited misappropriations is a Pulitzer Prize-winning probe into New York City’s taxi industry and an in-depth analysis of how outsourcing by tech companies like Apple has reshaped the global economy.
At stake, according to the Times, is not just the protection of its intellectual property but the safeguarding of its business model. The AI tools in question offer users free access to paywalled information that mirrors the Times’ content, potentially diverting readers and subscribers away from the newspaper and diminishing its advertising revenue.
Screenshot from the complaint via The New York Times
The Times identifies its website as the most heavily exploited resource in this alleged act of mass scraping, with over 66 million records used to train the AI models.
My take? OpenAI can't really defend this practice without some heavy changes to the instructions and a whole lot of… pic.twitter.com/c15glvBNqd
2/ The visual evidence of copying in the complaint is stark. Copied text in red, new GPT words in black—a contrast designed to sway a jury. See Exhibit J here.
— Cecilia Ziniti (@CeciliaZin) December 27, 2023
The Times also cites one instance in which the ChatGPT-powered Browse with Bing spat out “almost verbatim” responses from its product review site Wirecutter without providing a link, leading to a dip in traffic to Wirecutter reviews and their affiliate links.
Highlighted within the lawsuit are specific instances where AI outputs have not only engaged in “unlawful copying” of the Times’ style but have also lifted entire segments from landmark investigations and analytical pieces. The complaint also notes how these text generators, while presenting vast knowledge, sometimes provide misinformation or “hallucinate” facts, oftentimes inaccurately attributing them to the Times and thereby risking the paper’s reputation for reliability.
In the New York Times OpenAI lawsuit, you can see how complex the relationship of training data to output can be. On one hand, they find that you can induce ChatGPT to produce exact content from famous Times articles, on the other, they show it also hallucinates false articles. pic.twitter.com/cY7cyZjd8r
— Ethan Mollick (@emollick) December 27, 2023
In one example, the Times outlines how Microsoft’s Bing Chat generated a list of “15 most heart-healthy foods,” of which 12 were falsely attributed to the publication.
The battle lines are drawn not only over the use of The New York Times’ content but over the broader ethical and legal frameworks governing AI training and the safeguarding of copyrighted material.
In a resolute demand, The New York Times is calling for any AI models and training data derived from its content to be “destroyed.” While it has not specified how much it wants to be compensated for the alleged plagiarism, it says OpenAI should be held responsible for “billions of dollars in statutory and actual damages.”
Talks between the publisher and the AI research organization, which began in April, did not reach a satisfactory conclusion, the Times notes.
“These negotiations have not led to a resolution,” the paper asserts in its complaint. “Publicly, [OpenAI insists] that their conduct is protected as ‘fair use’ because their unlicensed use of copyrighted content to train GenAI models serves a new ‘transformative’ purpose. But there is nothing ‘transformative’ about using The Times’ content without payment to create products that substitute for The Times and steal audiences away from it. Because the outputs of Defendants’ GenAI models compete with and closely mimic the inputs used to train them, copying Times works for that purpose is not fair use.”
“If The Times and other news organizations cannot produce and protect their independent journalism, there will be a vacuum that no computer or artificial intelligence can fill,” the filing details.
“With less revenue, news organizations will have fewer journalists able to dedicate time and resources to important, in-depth stories, which creates a risk that those stories will go untold. Less journalism will be produced, and the cost to society will be enormous.”
Despite the confrontation, OpenAI has emphasized a willingness to engage constructively with the Times and other publishers, stressing respect for content creators’ rights.
This landmark row may set a precedent for the future of AI and content creation, and for the intricate dance between technological innovation and “fair use.”
[via Engadget, The New York Times, Associated Press, cover photo 106806965 © Stefan Saru | Dreamstime.com]