Soap Bubbles in Hakone, Kanagawa Prefecture, Japan. 2017. Photograph taken by Kyosuke Nakamura. | Flickr (edited by the Author)
Floating Up: An Interview with the creator of Bubbles.town, Benjamin Behnke
Two days ago, in my article on genAI comments and messy, human art, I briefly mentioned Bubbles.town, a new community-driven aggregator for independent personal blogs, where posts are ranked by votes and freshness and shaped by users.
Think Hacker News or Reddit, but exclusively for the indie blogosphere. You sign in with a Fediverse account to vote, and posts bubble up based on community upvotes combined with recency. It has categories like Writing, Tech, Culture, Life, Science, History, Gaming, etc., and recently added a daily "Briefing" that presents top posts in a newspaper-style format.
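Since Bubbles.town is currently closed-source, its actual ranking formula isn't public. But the "upvotes combined with recency" behaviour described above is usually implemented with a Hacker News-style gravity score, which a minimal sketch might look like (the constants and exact shape here are my assumptions, not the site's real code):

```python
from datetime import datetime, timedelta, timezone

def bubble_score(upvotes: int, published: datetime, gravity: float = 1.8) -> float:
    """Hacker News-style ranking: score grows with votes and decays with age.

    Bubbles.town has not published its formula, so the +1/+2 offsets and
    the gravity constant are guesses at a typical implementation.
    """
    age_hours = (datetime.now(timezone.utc) - published).total_seconds() / 3600
    return (upvotes + 1) / (age_hours + 2) ** gravity
```

Sorting posts by this score descending makes fresh, well-voted posts "bubble up" while older ones sink, which matches the behaviour the site describes.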
Little did I know that this project was already the subject of controversy on the 32-bit Café forum.
There was criticism from the outset, with my good friend Coyote stating that "[i]f their criteria [for good blog posts] involve Reddit-style popularity contests then I expect we have different priorities." To some, Bubbles.town taking people's blog posts and essentially ranking them already seemed antithetical to the ethos of personal blogs and the IndieWeb.
But the criticism certainly did not stop there. It turned out that the curation and organization of the 5,000+ blogs was initially done with Anthropic's Claude, a generative-AI LLM. It was then disclosed that the code for the project itself was also written with Claude's help, and it is currently closed-source as a result.
You can see why this would cause a stir, right? The IndieWeb, and creatives more broadly, are anti-genAI. The entire point is to have a human web, made by people for other people. A blogger initially donated to Bubbles.town under the impression the creator, Benjamin Behnke, hand-selected and vetted the thousands of blogs themselves. Bubbles.town, perhaps ironically, explicitly states they don't allow genAI blogs in their FAQ:
"Personal voice. The author writes in their own voice, with opinions and perspective. No neutral reporting, no AI-generated content."
As it stands, there are anti-AI absolutists who refuse to use a project if any genAI is involved at all. There are others who would not use Bubbles.town because the disclosure wasn't upfront and was clumsy, which can be read as deceptive. And there are others who are simply not interested, in principle, in browsing the blogosphere through a ranked system.
None of the above describes me, however.
I have written before about software harm reduction: projects I use, such as Python, Hugo, VLC, Anki, Brew, and MacPorts, have all now been compromised by LLM-generated additions to their codebases.
This is categorically bad, and I hope these projects reverse their permissive policies, but in the meantime our reality is one where the largest software projects contain LLM-generated code.
As I mentioned in my post criticizing Cory Doctorow's apologetics on genAI, I am used to this frustrating hypocrisy and cognitive dissonance:
"There are many people that are passionately and militantly against genAI use, but still participate in meat-eating and support the meat industrial complex. Being plant-based means facing people who will come up with any number of excuses to shrug off their meat consumption. Devine Lu Linvega has a great page on this on their wiki."
I am not a radical, militant vegan, and I am neither radical nor militant in my anti-AI sentiments. I understand this looks like complicity, and I admire others like Coyote who are more steadfast and principled than I am. But this is not who I am; I do not even block AI scrapers from my own site. I also said I wouldn't talk about genAI again, and yet here I am.
All of this is to say I wanted to give bubbles.town and its creator, Ben Behnke, a fair shake. I decided to email him a list of questions to see if there was a reasonable, good-faith reason why there were changes to the Privacy Policy and FAQs of the site. And I'm glad I did. Here's what he sent me:
Ben's Response
thanks for reaching out and thanks for asking before publishing. I'd much rather get on the record with you than have my answers reconstructed from screenshots.
In short: I fucked up, and I'm sorry!
I am working on Bubbles in my spare time as a hobby project, besides my full time job, family and health. I am not a greedy megacorp that wants to squeeze the last buck out of its user base, just one passionate guy that sometimes makes good, sometimes bad decisions. And sometimes really bad ones. I also did not expect Bubbles to blow up like this, so bad decisions would be in the spotlight so quickly. I expected maybe a few hundred visitors in the first weeks, definitely not 10k.
Bubbles ran for about two weeks with Anthropic's Claude Haiku as the post categoriser. I received complaints that the original, blog-based categorisation was not good enough, because indie blogs tend to write about lots of things. I did not want to remove categorisation again because it's really helpful for discovery, so the API call was a quick way to get categories assigned to individual entries. Unfortunately I did not think much about it. It even cost me money. It just worked as a quick fix for a problem I wanted to solve. As the site grew I started receiving feedback from some blog owners, including a few who pointed out that they had AI crawlers blocked in their robots.txt and were not happy that their feed contents were being forwarded to an LLM anyway. That feedback was the right feedback! Going through a third-party crawler block via my own bot is not in the spirit of the small web I want Bubbles to be part of.
After a weekend in the woods with my family, yesterday evening I finally found some time and removed the entire AI-classification pipeline from the codebase, and also added stricter robots.txt enforcement. There are no API calls to Anthropic, OpenAI, or any other AI provider from the Bubbles server today. Categorisation runs through a small local Naive Bayes model that I have trained on the categorisation Claude did previously. That took some time to set up. The trained vocabulary is published at bubbles.town/classifier/model_en.json and bubbles.town/classifier/model_de.json so anyone can download it and see exactly what the classifier knows. Since yesterday, no data leaves the Bubbles server!
Your mail and the changes crossed in the post, basically. Changing the privacy page to reflect the new setup was the last thing before I went to bed yesterday. Your mail was the first thing I saw when I woke up. I'd already planned a blog post explaining the move and what changed in the privacy policy as a result. It was simply too late in the day yesterday to write it as well as publish. That post will go up in the next few days. I try to incorporate user feedback as much as possible, like adding entry-based categorisation, search-based RSS feeds, Bubbles Briefing, but also removing AI or stricter robots.txt handling. It just takes time and I am constantly learning.
I'm planning to make Bubbles Open Source too, but I want to do a proper cleanup first, knowing now that so many eyes are on Bubbles, and that takes time. I mention it because some of the answers below would be easier to verify against source code, and I want to be upfront that the verification will have to wait for the release.
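An aside from me: for readers curious what Ben's replacement actually is, a multinomial Naive Bayes classifier is a textbook technique, and a minimal sketch looks like the following. The internal shape of the published model_en.json is not documented, so I'm assuming here that it maps each category to per-word counts (load it with json.load in practice); uniform priors and Laplace smoothing are also my assumptions, not confirmed details of Ben's model.

```python
import math
import re

def classify(text: str, model: dict) -> str:
    """Multinomial Naive Bayes with Laplace smoothing and uniform priors.

    `model` maps category -> {word: count}. This shape is an assumption
    about bubbles.town/classifier/model_en.json, not a documented format.
    """
    words = re.findall(r"[a-z']+", text.lower())
    vocab = {w for counts in model.values() for w in counts}
    best_cat, best_lp = None, float("-inf")
    for cat, counts in model.items():
        total = sum(counts.values())
        # Sum of log P(word | category), smoothed so unseen words don't zero out
        lp = sum(math.log((counts.get(w, 0) + 1) / (total + len(vocab))) for w in words)
        if lp > best_lp:
            best_cat, best_lp = cat, lp
    return best_cat
```

The appeal of this approach for a hobby project is exactly what Ben describes: it runs locally, sends nothing to any API, and the entire "knowledge" of the classifier is a downloadable word-count table anyone can inspect.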
Our Interview
- The privacy policy changed. Was that intentional, and was it transparent?
Intentional, yes. Transparent enough? No. When I cut the AI I rewrote the privacy policy to describe the new local classifier and removed Anthropic from the third-parties list, because the policy is supposed to describe what currently happens. I also added a separate question on AI usage to the FAQ. But I failed to provide, at the same time, an explanation addressed to readers and blog owners of why the policy looks different now and what was running before. That explanation belongs in a blog post on the Bubbles blog, and I am writing it. It will go up in the next few days.
- Is publishing a feed equivalent to opt-in to AI classification?
No, and I should not have phrased it like this. RSS is a syndication standard that predates LLMs by twenty years. The implicit social contract around it is that aggregators and feed readers may consume the feed. Sending the feed text through a commercial AI service is a different category of use, and various authors have made it explicit in their robots.txt that they want no part of it. Bubbles was, in effect, routing around those signals via the BubblesBot identity. That was absolutely the wrong call, and it is the reason the AI classifier is gone for good.
- Does robots.txt apply on an ongoing basis, and what happens if a blogger updates it after inclusion?
Since yesterday's changes it does, yes. The poller now checks robots.txt before every fetch. If a blog's robots.txt newly disallows BubblesBot or * against the feed path, the next poll cycle deactivates that blog and stops fetching. I ran this against the existing list right after deploying. A little over a hundred blogs were deactivated on the spot because their robots.txt blocked access. Most are platform defaults where the author probably never touched the file, some are explicit. Either way, they are off Bubbles and stay off until the author asks otherwise.
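The per-fetch check Ben describes can be sketched with Python's standard robots.txt parser. BubblesBot is the user-agent name from the interview; the helper below is illustrative, and the surrounding logic (fetching robots.txt, deactivating a blog on a False result) is left out:

```python
from urllib import robotparser

def feed_allowed(robots_txt: str, feed_url: str, agent: str = "BubblesBot") -> bool:
    """Return True if robots.txt permits `agent` to fetch `feed_url`.

    Uses the stdlib robotparser, which handles both agent-specific rules
    and the `User-agent: *` wildcard Ben mentions.
    """
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, feed_url)
```

Running a check like this before every poll, rather than only at signup, is what makes the opt-out ongoing: a blogger who adds a Disallow line later gets dropped on the next cycle.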
- What do you say to bloggers who feel their work was used in a way they wouldn't have consented to?
That they are right to feel that way, and that I am sorry. The line between "this is a public feed, anyone can read it" and "this content is being sent through a commercial LLM as part of an opaque pipeline" is there, and I crossed it without asking. I cannot un-send the API requests that happened over the two weeks the AI classifier ran. What I can do, and what I have done, is remove the dependency, harden the robots.txt handling, and write to anyone who wants their blog removed at feedback@bubbles.town with no questions asked.
- Are you open to making consent explicit, e.g. opt-in for AI classification?
I don't know how to answer that question after the changes I made yesterday. User data will not run through AI again; I've learned my lesson. I think a general opt-in for being listed on Bubbles would not work for an aggregator, because then Bubbles would cover maybe 50 blogs, not 5,000.
Conclusion: Bubbles Gordon
Bubbles are wonderfully delightful, aren't they? Floating nonchalantly in a summer sky, shimmering and radiant with light. As temporary as they are weightless.
When my grandma was born, she was named by her older brother instead of her parents—she was the youngest of over a dozen and born in the late 1920s. Her brother saw her drooling, spit bubbling at her tiny infant mouth, and named her Bubbles. She went by Sophie, but she was always Grandma Bubbles to me. She was an incredible person with a rich life. I walked her down the aisle at her 2nd wedding. And because of her, maybe I'm biased here. But I do know my Grandma taught me so much about forgiveness, about looking past previous mistakes and regrets.
I don't know if Ben's response is sufficient for you. Maybe the damage has already been done, and you've already moved on.
For me, though, this response makes sense. This project is being worked on by a single person, and he's paying for it out of his own pocket. I do think it's difficult to conceptualize how it'd be a good idea to use genAI for a project meant for creative humans on the IndieWeb. But I also know I've made far worse, dumber mistakes in my life. Bubbles, after all, pop. That's what they do. Bubbles rise, catch light, and vanish. My grandma understood this better than most, living through a century of things that didn't last, and she never thought that made them less worth loving.
A bubble is a small miracle of surface tension, existing only because it's holding something together, and the moment it stops, it's gone.
The IndieWeb is full of projects like Bubbles.town. Labours of love built by one person in the margins of a life, always one family emergency or one bad week away from going dark forever. Most of them will.
Ben screwed up. He said so. He fixed it. The human web certainly doesn't require perfection, just honesty when we fall short. A builder who responds to criticism by going into the woods with his family for a weekend and coming back with a Naive Bayes classifier and a rewritten privacy policy is the kind of person I want building things for this corner of the internet.