New data shows just how badly OpenAI and Perplexity are screwing over publishers


AI companies promised publishers their AI search engines would send them more readers via referral traffic. New data shows that’s not the case.
Illustration by Fernando Capeto for Forbes; Graphics by Cherezoff/Getty Images

Companies like OpenAI and Perplexity have made lofty claims that their AI-powered search engines, which scrape information from the web to generate summarized answers, will provide new sources of income for publishers by directing more readers to their sites. But the reality is starkly different — AI search engines send 96% less referral traffic to news sites and blogs than traditional Google search, per a new report by content licensing platform TollBit, shared exclusively with Forbes. Meanwhile, AI developers’ scraping of websites has more than doubled in recent months, the report found.

OpenAI, Perplexity, Meta and other AI companies scraped websites an average of 2 million times in the fourth quarter of last year, per the report, which analyzed 160 websites including national and local news, consumer tech and shopping blogs over the last three months of 2024. Each page was scraped about seven times on average.

“We are seeing an influx of bots that are hammering these sites every time a user asks a question,” TollBit CEO Toshit Panigrahi told Forbes. “The amount of demand for publisher content is nontrivial.” TollBit, which integrates with publishers to track scraping and charge AI companies each time they do so, collected the data from publishers that have signed up on its platform for analytics, giving it insight into traffic and scraping activity on their sites.

OpenAI did not comment, and Meta did not respond to a request for comment. A Perplexity spokesperson did not address the specific claims of the report, but said the company respects “robots.txt” directives, which instruct web crawlers on which parts of a site they are allowed to access.
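Those robots.txt directives are plain-text rules served at a site's root. A minimal sketch of what blocking AI crawlers looks like (the crawler tokens `GPTBot` and `PerplexityBot` are the publicly documented names OpenAI and Perplexity use; the rules themselves are illustrative, not from any publisher in the report):

```
# robots.txt — served at https://example.com/robots.txt
# Ask OpenAI's and Perplexity's crawlers to stay out; allow everyone else.
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
```

Compliance is voluntary: robots.txt is a request, not an enforcement mechanism, which is why publishers in the report still saw traffic from bots they had blocked.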


Last February, research firm Gartner predicted that traffic from traditional search engines would drop 25% by 2026, largely due to AI chatbots and other virtual agents. Businesses that rely on search traffic have already started to take a hit. Edtech company Chegg recently sued Google, alleging that the search giant’s AI-generated summaries included content from its website without attribution, snatching away eyeballs from its site and hurting its already diminishing revenue. Chegg’s traffic plummeted 49% in January year-over-year, a sharp decline from the 8% drop in the second quarter last year, when Google released AI summaries. The traffic decline has affected Chegg to the extent that it is considering going private or getting acquired, CEO Nathan Schultz said in an earnings call.

“It’s time to say no,” Schultz told Forbes. He said Google and publishers have long had a social contract to send users to high quality content, and not just retain that traffic on Google. “When you break that contract, that is not right.”

Ian Crosby, a partner at law firm Susman Godfrey representing Chegg, said the practice will harm search companies like Google in the long run, resulting in an “AI slurry” if companies like Chegg are put out of business. “It is a threat to the internet,” he said.

Google has called Chegg’s lawsuit “meritless,” claiming that its AI search service sends traffic to a greater diversity of sites.

Travel booking sites like Kayak and TripAdvisor are also concerned about Google’s AI search overviews chipping away at traffic, Forbes reported. Meanwhile, news publishers have taken legal action against both OpenAI and Perplexity for allegedly infringing on their intellectual property. (Both companies are fighting the suits.)

AI developers use what are called user agents to crawl the web and collect data, but many don’t properly identify or disclose their scraper bots, making it difficult for website owners to uncover and understand how AI companies are accessing their content. Some, like Google, appear to use the same bots for multiple purposes, including indexing the web and scraping data for its AI tools, Panigrahi said.


“It’s very hard for publishers to want to block Google. It could impact their SEO, and it’s impossible for us to deduce what exactly their bots’ use case is for,” TollBit cofounder Olivia Joslin said.

Google did not respond to a request for comment.

And then there’s $9 billion-valued AI search startup Perplexity. Even when publishers block Perplexity from accessing their sites, the AI startup continues to send referral traffic back to them, implying it continues to scrape those sites under the radar, the report found. In one example, it scraped a publisher’s website 500 times but sent over 10,000 referrals. One explanation for this, Panigrahi said, is that Perplexity used an unidentified web crawler to access the site. Perplexity only said it respects “robots.txt.”

Last year, the buzzy startup took heat for scraping and republishing paywalled articles, in some instances including nearly identical wording, from news outlets like Forbes, CNBC and Bloomberg without adequate attribution. It also cited low quality, AI-generated blogs and social media posts containing inaccurate information, Forbes found in June. In response to Forbes’ reporting, CEO Aravind Srinivas said the republishing feature, called Perplexity Pages, has “rough edges.” Forbes sent a cease-and-desist letter to Perplexity in June, accusing it of infringing copyright.

In October, the New York Post and Dow Jones sued Perplexity for alleged copyright infringement and for attributing made-up facts to media companies. At the time, Perplexity said the lawsuit reflects a posture that is “fundamentally shortsighted, unnecessary, and self-defeating.”

Earlier this month, yet another AI startup found itself in the crosshairs of media companies. A group of publishers including Condé Nast, Vox and The Atlantic filed a lawsuit against enterprise AI company Cohere for allegedly scraping 4,000 copyrighted works from the internet and using them to train its suite of large language models. (Forbes was part of the lawsuit.)

Rampant AI scraping isn’t just hurting publishers’ search traffic and revenue. As more and more bots visit websites to read and scrape their content, they’re also running up millions in server costs, Panigrahi said. With companies like OpenAI and Perplexity launching research AI agents that autonomously visit hundreds of sites to produce in-depth reports, the problem is bound to get worse.

One clear way to address this problem is licensing articles directly. For example, the Associated Press, Axel Springer and the Financial Times have all struck content deals with OpenAI. But a new cadre of companies has also emerged to build alternative economic models for publishers in the age of artificial intelligence. TollBit, for instance, charges AI companies each time they scrape content from a publisher’s site. TollBit works with 500 publishers including TIME, Hearst and Adweek.

“AI does not read like humans do. Humans will click one link, they’ll click the second link and then they’ll move on,” Panigrahi said. “AI will read 10 to 20 links to get their answer.”


By Rashi Shrivastava