Tumblr’s Data Sharing Raises Concerns About AI Training
Tumblr, the popular microblogging platform, is facing scrutiny after reports emerged that its parent company, Automattic, is selling user data to help train artificial intelligence (AI) models. The data is reportedly being provided to OpenAI and Midjourney.
Automattic’s Response and Policies
In response to the reports, Automattic published a blog post addressing the matter. The company said that while its sites currently block AI crawlers, it plans to share data with AI companies in the future. Users will, however, have the option to opt out of this data sharing.
The blog post emphasized Automattic’s commitment to respecting users’ preferences regarding attribution, opt-outs, and control over their data. Despite this assurance, concerns remain about how user data is being handled and what the arrangement means for privacy.
Internal Messages and Content Compilation
Reporting from 404 Media revealed internal communications within Automattic discussing the compilation of posts from 2014 to 2023 for AI training. Employees were reportedly tasked with gathering posts, including those from deleted or suspended blogs, private posts on public blogs, and private answers from the “Ask” function.
Notably, the report highlighted instances where content marked as NSFW or “mature” was included, despite guidelines prohibiting such inclusion. Tumblr’s evolving content policies, particularly regarding nudity and sexually explicit material, have added complexity to the data compilation process.
Impact on AI and User Content
The decision to provide user-generated content for AI training has sparked concerns among Tumblr users, particularly those who value the platform for its diverse and niche communities. The prospect of personal writings, photography, and artwork being used to train AI algorithms has raised privacy and ethical considerations.
Similar data-sharing arrangements exist on other social platforms, with Reddit licensing its data to Google for AI training, and Facebook and Instagram utilizing user data for internal AI tools. However, the practice remains contentious, as users grapple with the implications of their content contributing to AI development.
The Quirky Side of Tumblr and AI Training
Tumblr’s unique community and content landscape, characterized by its eclectic mix of fandoms and niche interests, poses interesting challenges and opportunities for AI training. The platform’s diverse array of content, including fanfiction and artwork, presents rich and varied data for AI models.
While the idea of AI encountering unconventional content, such as fanfiction featuring characters like Sonic and Tails, may seem amusing, it underscores the broader implications of AI training using user-generated content.
As discussions surrounding data privacy and AI ethics continue, platforms like Tumblr face pressure to balance innovation with user trust and privacy protection.