The Financial Times announced an arrangement with OpenAI on Monday, licensing its world-class journalism to train and inform ChatGPT models. It joins Axel Springer and the Associated Press in similar deals with OpenAI, which is said to be offering millions for the right to use the content. However, ChatGPT has also been trained on a variety of other internet-sourced content that OpenAI has not paid for. So why does OpenAI pay for some datasets and not others?
OpenAI’s licensing agreements seem to send a clear message: we will use your content anyway, so sign a contract with us or get left behind. The main benefit of a licensing agreement seems to be prominent placement in ChatGPT responses. Some publishers may also want to strengthen their relationship with the next big information distribution channel before it takes over. However, it appears that OpenAI uses a lot of publisher content anyway.
OpenAI already trains its artificial intelligence models in part on “publicly available data,” according to CTO Mira Murati, a phrase that seems intentionally vague. What is publicly available data anyway? The expression assumes that everything that can be read on the internet can also be built into ChatGPT for free. For example, Gizmodo is part of OpenAI’s “publicly available data.” Our website was scraped 34,000 times in GPT-2’s WebText dataset, the most recent dataset OpenAI has disclosed using to train an artificial intelligence model.
Gizmodo is free to readers mainly because of the advertising on this site. If readers can access our content via ChatGPT, it breaks our business model. The New York Times, which appears far more often in GPT-2’s WebText dataset, sued OpenAI for copyright infringement over this very issue.
A content licensing deal with OpenAI seems to be the only way publishers can stay relevant in the age of artificial intelligence. In a press release, Financial Times Group CEO John Ridding says the deal will “widen the reach” of the paper's work while providing “early insight into how content is exposed using artificial intelligence.”
“The problem with artificial intelligence is that it’s not really artificial intelligence,” Matthew Butterick, a lawyer representing Sarah Silverman and other book authors suing OpenAI, told Gizmodo. “It’s human intelligence gathered in one place, separated from its creators, and then this gigantic tech company sets a price and sells it to someone else.”
Butterick represents plaintiffs in six copyright lawsuits against artificial intelligence companies. He’s also a writer, programmer, and designer, so he claims to understand how AI could threaten these industries. Broadly speaking, his cases center on the claim that artificial intelligence simultaneously exploits the work of creators and threatens their livelihoods.
OpenAI’s licensing agreements have raised eyebrows over the content ChatGPT uses for free. Tech companies argue that generative AI constitutes “fair use” of copyrighted works because it transforms them into something new. The AI world also argues that it follows a model similar to Google Search, which caches copyrighted content to create a useful information retrieval tool. Like Google, AI chatbots have recently started including hyperlinks. Ultimately, a court will have to decide whether generative AI constitutes “fair use.”
OpenAI did not immediately respond to Gizmodo’s request for comment.
It seems that book authors and publishers are not the only ones from whom OpenAI draws content. The New York Times recently reported that OpenAI trained GPT-4 on one million hours of transcribed YouTube videos. Days before the report’s release, YouTube’s CEO said that using the platform’s videos for artificial intelligence training would be a “clear violation” of the site’s policies.
OpenAI’s content licensing muddies the discussion. The company somehow uses some web content for free while paying others for their work. Other tech companies, such as Apple, are reportedly more proactive about paying for all of their training data. Adobe reportedly paid $3 per minute of video to train its AI video generator.
However, it is unclear whether even a one-time fee for AI training data will be enough. We’re talking about a tool that has the potential to change the media industry for writers, audio and video producers, and more. Signing a contract with OpenAI may guarantee you a good spot in ChatGPT’s results, but it looks like the AI chatbot could have used your content anyway. At least for now, AI companies are eager to use whatever is on the internet first and ask questions about the legality of it all later.