Tumblr's Data Deal Dance with AI Titans Raises Eyebrows

The owner of Tumblr and WordPress.com is in talks with AI companies Midjourney and OpenAI to provide training data scraped from users’ posts, a report from 404 Media alleges. The report, based on an anonymous source inside the company, says that deals between Automattic and the two AI companies are “imminent.” It follows nebulous rumors that have spread on Tumblr over the past week, suggesting a deal with Midjourney could provide a new revenue stream for the site.

According to 404’s report, Automattic plans to launch a new setting Wednesday that will “allow users to opt-out of data sharing with third parties, including AI companies.” But it cites internal posts that suggest the company scraped an “initial data dump” containing “all Tumblr’s public post content between 2014 and 2023,” including — apparently by mistake — content that wouldn’t be publicly visible on blogs. It’s unclear what was done with this data and what data (if any) has been sent to Midjourney and OpenAI.

OpenAI and Midjourney did not immediately respond to requests for comment from The Verge. Automattic directed us to a public statement it published on Tuesday following 404’s report. The post, titled “Protecting User Choice,” alludes to partnerships with unnamed AI companies. “We currently block, by default, major AI platform crawlers — including ones from the biggest tech companies — and update our lists as new ones launch,” it says, and “will share only public content that’s hosted on WordPress.com and Tumblr from sites that haven’t opted out.” It goes on to note that “we are also working directly with select AI companies as long as their plans align with what our community cares about: attribution, opt-outs, and control.”

A number of companies have struck deals with AI tool makers to provide training data, which has historically been scraped from publicly available online sources, a process that has become legally riskier in recent years. Reddit reportedly has a $60 million annual deal with Google, while Shutterstock has signed a deal with OpenAI to train on its photo library. But a number of artists and writers (in other words, the creative community that Tumblr in particular caters to) have protested their work being used for training. Companies have struggled to walk a line between satisfying users and experimenting with new AI tools, leading to backlash against online spaces like DeviantArt that have flirted with the tech.

For now, there’s not much information about what any deal would entail, nor how much Automattic stands to gain from it. The company has a long-standing web hosting business with WordPress.com and WordPress VIP, both built on the open-source WordPress software. But it has struggled with a variety of methods for monetizing Tumblr, which it acquired from Verizon in 2019, and announced last year that it would scale back its ambitions for the site.

