
The Un-unalivable Word: On the Ethics of Algorithmic Censorship and Moderation

Content Warning: This piece discusses themes of censorship, hate speech, shadow-banning, racial and caste slurs, sexual violence, suicide, abortion, and online harassment. It also includes references to Algospeak involving these topics.


In the early days of the internet, no one really knew how to navigate an expansive informational space that could both enlighten you and scar you. Since then, however, platforms have made it increasingly clear that the logic driving content moderation is often not grounded in safety concerns or harm reduction. Harmful content can be found on the internet quite freely, whatever one’s definition of harm, and it is precisely because definitions of harm vary so widely that content moderation guidelines are so hard to codify.


The rise of the algorithm means that almost everything can continue to exist even after being formally taken down, in reposts, archives, and well-timed snapshots. Platform censorship is largely ineffective at controlling such material, since content rapidly spawns from other content; nothing can truly be banished from the internet. What is largely affected is visibility, and that is where the workings of the algorithm get murkier. Visibility is governed not by strict rules that could be written into content guidelines, but by engagement.


That is why many find, to their dismay, that content deemed contentious by institutions of power is claimed to have been shadow-banned. If that is the case, however, it is not a technical failure but a deliberate use of technology in support of institutional agendas. Whether there is an evidential basis to this claim is hard to establish, since data on user engagement is generally not freely available. But it is true that users have come up with several ways to evade shadow-banning by modifying how they communicate or present information online.
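

To make the distinction between removal and downranking concrete, here is a deliberately simplified toy sketch in Python. Every name, weight, and factor in it is a hypothetical assumption for illustration; no platform publishes its actual ranking logic. The point is only that a post can remain online while its visibility is quietly scaled down, which is the behaviour users describe as shadow-banning.


```python
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    likes: int
    shares: int
    flagged: bool  # whether some moderation signal has matched this post


# Illustrative weights only; real ranking formulas are not public.
LIKE_WEIGHT = 1.0
SHARE_WEIGHT = 3.0
SUPPRESSION_FACTOR = 0.05  # hypothetical "shadow-ban" multiplier


def visibility_score(post: Post) -> float:
    """Engagement-based score: the post is never deleted, only ranked.

    A flagged post stays on the platform, but its score is quietly
    scaled down so it rarely surfaces in anyone's feed.
    """
    engagement = LIKE_WEIGHT * post.likes + SHARE_WEIGHT * post.shares
    return engagement * (SUPPRESSION_FACTOR if post.flagged else 1.0)


feed = [
    Post("harmless meme", likes=120, shares=10, flagged=False),
    Post("contentious commentary", likes=900, shares=200, flagged=True),
]

# Rank the feed by visibility: the flagged post exists but barely surfaces,
# despite having far more engagement.
for post in sorted(feed, key=visibility_score, reverse=True):
    print(f"{visibility_score(post):8.1f}  {post.text}")
```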


The Politics of Speech


Throughout history, censorship has rarely functioned to “protect” the public in the way it claims. More often, it has served as a tool for those in power to maintain hegemonic control. Even in cases where censorship is argued to have moral justification, it inevitably concentrates moral authority in the hands of a few, granting them the right to decide what others can or cannot access. This dynamic is not only paternalistic; it directly contradicts the democratic ideals that modern digital platforms claim to uphold.


The harm the internet is capable of undoubtedly demands mitigation. The question is whether privatised stakeholders are the best agents to enforce restrictions on online communication. That is not to say that state actors would not leverage content restrictions to suit their own agendas. In any case, one finds that hate speech is rampant on the internet and hardly ever addressed by either public or private agents, while what often gets invisibilised is certain ideological content.


That has, however, always been the case, as mentioned before, regardless of the form of communication in use. Language has evolved and grown around such restrictions, finding sly, formally uncontrollable ways to represent information that would otherwise offend the enforcers of propriety. One sees this in the phenomenon of Algospeak: a modern form of coded communication whose purpose is to sidestep the elusive content moderation measures of the internet.


That is why one may often find people on the internet talking about “seggs education”, or about how several individuals were “unalived”. They will often direct you to the “blink in lio”. Some videos feature gestures instead of audio; others use complete silence, which has somehow come to carry cultural meaning. Abortion may be referred to as “camping”. Algospeak also exists to counter moderation of discourse on topics such as gun violence or anti-vaccination movements. All in all, human communication has proven difficult to contain and constrict, because language, a dynamic entity, always finds a way around attempts to stifle it. Human thought, no matter how disturbing, demands to be engaged with.


The Walls Understand You Too


With the rise of Artificial Intelligence, however, it will inevitably become increasingly difficult to circumvent online censorship. A 2024 study on evasion mechanisms against online moderation finds that an LLM-based approach achieves 79.4% accuracy in detecting the meaning of Algospeak terminology. The paper identifies seven primary classes of Algospeak:


  1. Leetspeak, or modifying how the word is spelled to produce an unknown word (“bl00d” for “blood”, “blk” for “black”, “tism” for “autism”)

  2. Replacing one or more letters in the spelling of the word to produce another known word (“corn” for “porn”, “cornucopia” for “homophobia”, “grape” for “rape”)

  3. Abbreviations or acronyms (“SA” for “sexual assault”, “SH” for “self-harm”, “SSA” for “Same Sex Attraction”)

  4. Pictorial representations using emojis or emoticons (“🍉” for “Support for Palestine”, “❄️” for “cocaine”)

  5. Paraphrasing (“unalive” for “kill/die”, “opposite of love” for “hate”)

  6. Using existing words to mean something else (“Accountant” for “Sex Worker”, “Backstreet Boys Reunion Tour” for “the COVID-19 pandemic”)

  7. Phonetic resemblance (“not see” for “NAZI”, “kermit sewer slide” for “commit suicide”, “yt” for “white”)


Algospeak often employs more than one of these modifications at once. For example, pornography has come to be represented by the corn emoji (🌽) rather than the word itself. GPT-4 seems to fare better when Algospeak appears within surrounding context. While the paper takes a largely negative stance on Algospeak, it acknowledges that certain marginalised communities may sometimes need to resort to such measures for free expression, and advocates a context-aware approach to detecting and controlling illicit communication.
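

To illustrate why context matters, the following minimal Python sketch shows the kind of purely lexical detection that Algospeak is built to defeat. The lexicon, the leetspeak map, and the function names are hypothetical assumptions for illustration, not the method used in the 2024 study. A lookup like this can catch respellings and known substitutions (classes 1–3 above), but it has no way of telling whether “accountant” refers to an accountant or a sex worker, which is exactly the contextual gap an LLM-based detector is claimed to close.


```python
import re

# Hypothetical lexicon of known Algospeak terms and their intended meanings,
# drawn from the classes listed above (for illustration only).
ALGOSPEAK_LEXICON = {
    "unalive": "kill/die",       # paraphrasing (class 5)
    "seggs": "sex",              # respelling (class 1)
    "grape": "rape",             # letter substitution (class 2)
    "sa": "sexual assault",      # acronym (class 3)
    "accountant": "sex worker",  # repurposed word (class 6), ambiguous without context
}

# Undo common leetspeak digit substitutions (class 1).
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"})


def normalise(token: str) -> str:
    """Lowercase a token and map leetspeak digits back to letters."""
    return token.lower().translate(LEET_MAP)


def flag_algospeak(text: str) -> list[tuple[str, str]]:
    """Return (surface form, suspected meaning) pairs found in the text.

    A lexical pass like this cannot disambiguate repurposed words such as
    "accountant"; that is where context-aware LLM detection is said to help.
    """
    hits = []
    for token in re.findall(r"[\w']+", text):
        meaning = ALGOSPEAK_LEXICON.get(normalise(token))
        if meaning:
            hits.append((token, meaning))
    return hits


print(flag_algospeak("the gr4pe jokes in that seggs ed video"))
# [('gr4pe', 'rape'), ('seggs', 'sex')]
```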


Contextual Blind Spots


The newly developed capability of large language models to detect meaning despite attempted semantic-level blurring has implications both good and bad. It is precisely because GPT-4 is more context-aware that it could be used for ideological targeting.


There are, of course, caveats to detecting hate speech that strategically employs obfuscation, especially in low-resource languages. Caste-related hate speech in India is significantly more difficult to detect, since it often appropriates the names of caste groups and uses them derogatorily. For example, while discourse on racist slurs has evolved to the point that GPT-4 accurately recognises and unequivocally condemns the use of the n-word, as it should, the model remains largely unaware of the casteist implications of words like “chhapri”, a caste name used derogatorily to connote “cringe” or “cheap”. Since the casteist dimensions of much commonly used language in India are not even problematised when they are explicit, what hope do we have for evasive and coded discriminatory language?


To Disturb the Comforted, To Comfort the Disturbed


Censorship and content moderation are inherently complex, with no universal consensus on the ethical frameworks that should guide them. As a result, platform governance and algorithmic behaviour must be grounded in socially and culturally informed approaches. They must take care to protect the right to expression of marginalised communities, while also ensuring that the internet remains a safe space for them to speak their truth. An ideal world would see social scientists and technologists working hand-in-hand to address the gaps that arise when ethical nuance is lost in technical abstraction. Only through such interdisciplinary collaboration can we begin to design systems that are ethically robust and capable of recognising harm without silencing dissent.


In his 1859 essay On Liberty, John Stuart Mill states:


“The only purpose for which power can be rightfully exercised over any member of a civilized community, against his will, is to prevent harm to others.”


Censorship can only be justified if it serves to prevent tangible harm—not merely discomfort, dissent, or the challenging of dominant ideologies. Yet in practice, the line between harm and offence is often blurred, particularly when marginalised voices are the ones speaking out. When platforms claim to act in the name of safety, they must be critically examined: whose safety, and at what cost?



