As artificial intelligence keeps evolving, OpenAI’s chatbot, ChatGPT, finds itself thwarted by the very sources it seeks to learn from. Its web crawler, likened to a digital weasel scurrying around the internet to keep the model’s knowledge current, has been blocked by leading news organizations including the New York Times, CNN, Reuters, and the Chicago Tribune. Australian media outlets have also bolted their digital doors against the data-gathering bot. The move, as reported by the Guardian, is a significant stumbling block for OpenAI as it strives to keep its AI model updated in a rapidly changing news landscape.
The weasel in question is GPTBot, a web-crawling software launched by OpenAI. According to the company, allowing GPTBot access can enhance the accuracy, general capabilities, and safety of AI models. However, the refusal from major media organizations, which often have the most recent and relevant news, could hamper the chatbot’s development. This resistance is backed by the outlets’ terms of service, which forbid data scraping and, in the Times’ case, the use of content to train AI programs. As they grapple with changing business models in the wake of social media’s dominance, these publishers are trying to keep AI from further disrupting their revenue streams.
OpenAI’s Language Model Faces Setback As Major News Outlets Block Access
OpenAI recently hit a major stumbling block in its quest to improve the large language model behind ChatGPT. Several leading news organizations, including the New York Times, CNN, Reuters, and the Chicago Tribune, have reportedly blocked the company’s web crawler from accessing their content.
A Weasel in the Digital World
OpenAI’s web-crawling software, GPTBot, was launched earlier this month with the aim of making AI models more accurate and enhancing their overall capabilities and safety. Likened to a weasel, it scurries around the internet gathering up-to-date information to train the language model behind ChatGPT.
When ChatGPT was initially launched, it had only been trained on information up until September 2021. To keep up in the fast-paced artificial intelligence race, it needed to learn and adapt to new information. OpenAI attempted to address this issue by temporarily launching a Browse with Bing feature and setting GPTBot free to scour the internet.
However, major media organizations, which often provide the most recent and relevant news, have put up “do not enter” signs, potentially posing a significant problem for the chatbot.
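The article does not say how each outlet put up its sign. The usual mechanism, and the one OpenAI documents for GPTBot, is a rule in the site’s robots.txt file addressed to the crawler’s user agent. The sketch below assumes a hypothetical robots.txt and a placeholder URL (neither is taken from any outlet named here) and uses Python’s standard-library parser to show how such a rule shuts out GPTBot while leaving other crawlers alone.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
# A publisher that wants to turn GPTBot away could serve rules like these.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL, not a real news site.
    url = "https://news.example.com/story"
    for agent in ("GPTBot", "SomeOtherBot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent}: {verdict}")
```

A robots.txt rule is a request rather than an enforcement mechanism: it only works because the crawler chooses to honor it, which OpenAI says GPTBot does.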
The Dilemma for News Publishers
The terms of service for the Times, Reuters, and the Tribune explicitly state that their data may not be scraped by users. More specifically, the Times’ terms mention that its content cannot be used to train A.I. programs. This poses a direct challenge to OpenAI’s endeavors.
News publishers make money by selling information, whether through subscriptions or advertisements, and they need people to visit their websites to generate revenue. If chatbots can serve up their content for free, readers have less reason to visit, which could hurt that revenue, especially at a time when traditional media is grappling with a shift in advertising dollars to social media platforms.
The Potential Way Forward
By blocking GPTBot, these media organizations could be pressuring OpenAI into paying for access to their content. Last month, OpenAI struck a deal with the Associated Press to license its news stories for A.I. training purposes. The exact financial details of this agreement are unknown, but it could set a precedent for other tech companies. Google, which also scrapes publishers’ sites to train its large language model, might consider similar deals with news publishers that have blocked its access.
While the advancement of artificial intelligence looks inevitable, news publishers are clearly taking a stand to protect their content and revenue. This tug-of-war between AI development and content ownership could reshape how AI models are trained. It also underscores the importance of data privacy and the need for clear rules around data scraping. It will be interesting to see how tech companies and news publishers navigate this complex landscape.