(TL;DR: Tumblr produced a garbage algorithm that overfit to the training data and forgot quite how horny everyone is on this site.)
OK so Tumblr’s dying right now and in their death throes they’re marking posts as NSFW left, right, and centre. Random images are being labelled as explicit despite being scenic landscapes or whatever the fuck. But I think I might have an answer as to why. I am not an expert or anything, nor do I have any particular insight into the dumpster fire that is managing a social media platform. I do know a little bit about machine learning and image recognition though, so let’s see how far that can take us.
Image categorisation is super fucking tricky. There’s a whole bunch of different methods one can use, but most of them need a lot of data. This data usually takes the form of images that have been manually labelled “Dick”/“Not Dick”, and from there we can use machine learning to blah blah blah profit. But the important part in this case is gathering the data. Tumblr already has plenty of explicit images on it, most of them helpfully tagged #nsfw by the user. So we can just point our machine learning gizmos at that and let them work their magic, right?
Right!
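
And to be fair, the basic recipe really is about that simple. Here’s a toy sketch of the whole “point the gizmos at the tagged pile” step (the folder layout, the model, and all the numbers are things I made up for illustration, not any actual knowledge of Tumblr’s setup):

```python
# Toy sketch of training a "Dick"/"Not Dick" classifier in PyTorch.
# The folder layout (data/nsfw/, data/sfw/) and every parameter here
# are made up for illustration; this is not Tumblr's actual pipeline.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    # Standard ImageNet normalisation, since we start from pretrained weights.
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# ImageFolder turns directory names into class labels, so the only
# "ground truth" here is whatever users happened to tag #nsfw.
dataset = datasets.ImageFolder("data", transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

# Take a pretrained backbone and bolt a 2-class head onto it.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# One pass over the data: predict, compare to the user tags, nudge weights.
model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```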
Except.
Think of the kind of explicit material on Tumblr. Beyond all the generic sex-bots and pornographers lies a sea of smut the likes of which the mortal mind cannot comprehend. Whole reams of creatures and inanimate objects with every permutation of genitals in, on, and around them. If you can picture it, someone wants to fuck it. And there’s little way for an algorithm to meaningfully distinguish between SuperMarioTheChillDude.jpg and SuperMarioTheCumHungrySlut.png, so it goes “Well, they look pretty similar, so they’re probably both porn, IDK.”
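
To put the “they look pretty similar” bit in concrete terms: by the time an image reaches the classifier it’s been mashed down into a feature vector, and you can literally measure how close two images land. A toy sketch, same caveats as before (and yes, the filenames are the joke ones, they don’t exist):

```python
# Toy sketch of why two similar-looking images are indistinguishable:
# strip the classifier head off the same backbone and compare the raw
# feature vectors. Filenames are the hypothetical ones from above.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()  # keep the 512-dim features, drop the classifier
backbone.eval()

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(path):
    image = transform(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return backbone(image).squeeze(0)

a = embed("SuperMarioTheChillDude.jpg")
b = embed("SuperMarioTheCumHungrySlut.png")

# Cosine similarity near 1.0 means the model sees the same picture twice.
print(torch.cosine_similarity(a, b, dim=0).item())
```

If that number comes out near 1.0, the model basically can’t tell them apart, and the label it learned from one bleeds straight onto the other.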
So the next time you find a picture of a cool rock labelled super horny, pour one out for all the rock fuckers out there.