Reimagining the future of data and AI labor in the Global South

Summary

Michelle Du and Chinasa Okolo discuss the content moderation and data labeling behind social media and AI platforms.

Full Text

Everyday tools like artificial intelligence (AI) and social media algorithms aren’t just powered by technology—they require human workers to sort through our content, labeling, tagging, transcribing, and processing data.

Platforms to support this work have existed since at least 2005, but outsourcing labeling, often to workers in the Global South, has become increasingly lucrative in recent years as demand for data has surged.

In fact, the World Bank estimates there are between 150 and 430 million data laborers whose work ultimately drives cutting-edge technological development.

These individuals, who often work in “digital sweatshops,” consistently report poor working conditions, exploitation, and forms of psychological distress.

In Africa and South and Southeast Asia, it is not uncommon for workers to put in up to 20 hours a day, sifting through 1,000 cases in a shift.

While workers have formed unions and advocacy groups, unclear business process outsourcing (BPO) practices, a lack of regulatory guardrails on gig platform labor, and the uncertain future of data work limit the capacity of data laborers to organize and demand fair and transparent working conditions.

What this work looks like

Annotated data sets are used to train AI models that learn patterns to then generate content, make predictions, or complete classification tasks.
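
To make that dependency concrete, the sketch below shows, in miniature, how human-annotated examples become a trained classifier. The example texts, labels, and scikit-learn setup are illustrative choices, not a depiction of any particular company’s pipeline.

```python
# Toy illustration: human-annotated text examples train a classifier that can
# then label new content automatically. The examples and labels are invented;
# real pipelines rely on millions of annotations produced by data laborers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each pair is (text, human-assigned label): 1 = policy-violating, 0 = benign.
annotated = [
    ("buy followers now cheap", 1),
    ("congrats on the new job!", 0),
    ("click this link to win money", 1),
    ("see you at the meeting tomorrow", 0),
]
texts, labels = zip(*annotated)

# The model learns patterns from the human judgments embedded in the labels.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(list(texts), list(labels))

print(model.predict(["win cheap followers today"]))  # likely [1]
```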

Data annotation, processing, and evaluation are also key components of content moderation systems that filter out graphic, harmful, and hateful content from platforms.

Making micro-decisions throughout the data pipeline requires contextual human understanding and is often outsourced to the Global South through BPO and digital labor platforms.

In some cases, data laborers interact with toxic, graphic, and hateful content under distressful and exploitative working conditions—ironically, to train systems that shield users from the same disturbing content.

The conditions surrounding this work are cause for concern.

Oxford’s Fairwork project surveyed over 700 workers on digital labor platforms and concluded that none of the 15 platforms assessed scored better than the “bare minimum” on fair pay, conditions, contracts, management, and representation.

A 2025 Equidem survey of 76 workers from Colombia, Ghana, and Kenya documented 60 distinct incidents of psychological harm, including anxiety, depression, irritability, panic attacks, post-traumatic stress disorder (PTSD), and substance dependence.

Workers also noted forced unpaid overtime, no fixed salary, and instances of companies withholding payments.

Contract workers in Ghana report “grueling conditions” from moderating disturbing content: murders, extreme violence, and sexual abuse, for example.

One former content moderator said he read up to 700 sexually explicit and violent pieces of text per day, with the psychological toll of his work causing him to lose his family.

Due to this exposure, many workers experience depression, anxiety, and suicidal ideation.

BPO practices obstruct meaningful accountability

Because data workers are often subcontracted by multinationals, such as Big Tech companies, through third-party vendors and agencies, they rarely have clear avenues for reporting grievances and unfair labor practices.

Despite reports from investigative journalists and research institutes about psychological trauma and exploitative working conditions in certain forms of data work, some companies manage to avoid accountability by leveraging the ambiguity around which entity bears responsibility for maintaining adequate labor conditions.

These platforms often do not provide clear dispute mechanisms that workers can use to elevate their concerns.

Workers also frequently do not know which systems their work will train or build: One investigation found that Kenyan data labelers working for the platform Remotasks were unaware it was a subsidiary of Scale AI, a company that provides data to Big Tech companies.

This problem extends to the entire industry: Opaque supply chains limit workers’ ability to challenge exploitative labor practices.

Challenges to worker exploitation

Workers, researchers, and grassroots organizers have pushed back against these labor practices through lawsuits, union organizing, and advocacy.

For example, content moderators have formed the African Content Moderators Union and the Global Trade Union Alliance of Content Moderators to fight for fair wages and safe working conditions across borders.

In Kenya, workers have launched the Data Labelers Association to fight for better working conditions, fair pay, and mental health support.

That said, many workers who organize face threats of retaliation, or retaliation itself.

In Turkey, content moderators alleged that a company providing outsourcing services to TikTok fired them for attempting to unionize.

Research and advocacy groups also hope to document and elevate related concerns.

For example, the “Data Workers Inquiry” is a global research initiative that empowers data workers to be advocates and community researchers.

Another example is Turkopticon, an advocacy and mutual aid group that organizes to better the working conditions of Amazon Mechanical Turk (MTurk) workers.

Smaller data labeling platforms, such as Karya, also provide an ethical alternative to traditional data labeling work by promising fair wages and economic opportunity to rural Indians.

Lawsuits and investigations have also been initiated in various jurisdictions.

In Kenya, a court ruled that a platform could be sued for its mass layoffs of content moderators who alleged exploitation and deteriorating mental health.

The Colombian Ministry of Labor launched an investigation at the end of 2022 into Teleperformance, a third-party vendor providing content moderation services to TikTok, for exposing workers to distressing content while paying them as little as $10 a day.

Meta currently faces lawsuits in Ghana, where moderators working for Majorel, a BPO company, allege terrible working conditions, including cramped living quarters and exposure to depictions of murders, extreme violence, and abuse.

Despite these legal efforts, a whack-a-mole approach of filing lawsuits and investigations one at a time cannot effectively prevent structural labor abuses.

The implications of automated content moderation for labor and inclusive technological development

While workers organize and challenge exploitative labor practices in court, companies have focused on further developing machine learning algorithms and tools to detect potentially harmful content and assess the privacy and social risks of products.

These tools hold great promise for reducing the psychological burden of data labor, but companies should not treat automated content moderation and annotation as substitutes for establishing fair and transparent labor practices.

Through machine learning classifiers, hashing (e.g., matching uploads against databases of known child sexual abuse material), and keyword filters, AI-assisted content moderation aims to replace some or all of the human labor involved in content moderation.
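
As a rough sketch of two of these techniques, the snippet below pairs exact hash matching against a blocklist of known harmful images with a simple keyword filter. The digest and keywords are placeholders; production systems typically use perceptual hashes (such as PhotoDNA or PDQ) that survive re-encoding, alongside trained classifiers rather than raw keyword lists.

```python
# Minimal sketch of hash matching and keyword filtering for moderation.
# The hash value and keywords below are placeholders for illustration only.
import hashlib

KNOWN_HARMFUL_HASHES = {
    # SHA-256 digest of a previously flagged image (placeholder value)
    "3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b",
}
BLOCKED_KEYWORDS = {"example_slur", "example_threat"}  # illustrative only

def matches_known_image(image_bytes: bytes) -> bool:
    """Flag an image whose exact bytes match a known-harmful digest."""
    return hashlib.sha256(image_bytes).hexdigest() in KNOWN_HARMFUL_HASHES

def keyword_filter(text: str) -> bool:
    """Flag text containing any blocked keyword (case-insensitive)."""
    return bool(set(text.lower().split()) & BLOCKED_KEYWORDS)

print(keyword_filter("this post contains an example_slur"))  # True
print(matches_known_image(b"some image bytes"))              # False
```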

These tools have existed in some form for years; examples include the Washington Post’s ModBot, launched in 2017, and Google Jigsaw’s Perspective API, a free tool that assists moderators in managing online toxicity and harassment.
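
For a sense of how such an API is used, the sketch below scores a comment’s toxicity with the Perspective API. The endpoint and request shape follow Jigsaw’s public documentation, while the API key and the 0.8 routing threshold are placeholders chosen for illustration.

```python
# Illustrative Perspective API call returning a toxicity score between
# 0.0 and 1.0. YOUR_API_KEY is a placeholder; keys are issued via Google Cloud.
import requests

API_KEY = "YOUR_API_KEY"
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

def toxicity_score(text: str) -> float:
    """Ask Perspective to score `text` for the TOXICITY attribute."""
    body = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, json=body, timeout=10)
    response.raise_for_status()
    return response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

# A platform might route only high-scoring comments to human review, e.g.:
# if toxicity_score(comment) > 0.8: queue_for_review(comment)  # hypothetical helper
```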

AI models that support content moderation may reduce the psychological and emotional burden of data work on humans.

They can also be deployed at scale, allowing moderation to occur quickly and efficiently.

On the other hand, automated content moderation introduces concerns around contextual understanding, accuracy, and transparency in decision-making.

Biased data labeled by humans may be reinforced at scale by algorithms.

AI models trained on historical datasets are also unable to account for culturally and contextually specific forms of expression that evolve over time.

This is especially true in data-sparse contexts, such as those involving low-resource languages.

For instance, algorithms would likely underperform in contexts featuring code-mixing, algospeak (coded language adopted to evade content moderation algorithms), and new linguistic forms, such as the Kiswahili variation Sheng.

Malicious actors have also exploited linguistic shortcomings in content moderation systems to surface explicit content in response to religious queries.

One study that evaluated Jigsaw’s Perspective API reported high false-positive rates in automated content moderation.

A document leaked from Facebook noted that algorithms incorrectly flagged and removed nonviolent Arabic content as “terrorist content” 77% of the time, censoring reporting of alleged war crimes.

Transparency and explainability in content moderation decisions are already limited and inconsistent, a problem that may be further amplified with automated content moderation due to algorithms’ inability to consistently and faithfully explain their reasoning.

Despite their limitations, some forms of automated task completion and content moderation may be useful for low-wage data work that involves toxic and harmful content.

Beyond the complete automation of data tasks, AI can also be used to pre-process data and blur potentially graphic images before data laborers view them, even in contexts other than content moderation.
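
A minimal sketch of that pre-processing idea, assuming the Pillow imaging library, appears below: images are heavily blurred before a worker first views them, with the blur radius and file paths as arbitrary illustrative choices. In practice, such blurring is often paired with click-to-reveal controls so workers opt in to full exposure only when a decision requires it.

```python
# Blur an image before a data laborer sees it, softening potentially
# graphic content by default. Radius and paths are illustrative choices.
from PIL import Image, ImageFilter

def blur_for_review(input_path: str, output_path: str, radius: int = 12) -> None:
    """Save a heavily blurred copy of an image for initial human review."""
    with Image.open(input_path) as img:
        img.filter(ImageFilter.GaussianBlur(radius)).save(output_path)

# Example (hypothetical file names):
# blur_for_review("queue/raw_0001.jpg", "queue/blurred_0001.jpg")
```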

The path forward

Given these trends, stakeholders should take decisive action to promote transparency and ethical labor practices in data labeling work.

International bodies should clarify responsibility for worker protection in BPO contexts, pursuant to international principles and discussions around fair labor, such as the UN Guiding Principles on Business and Human Rights and documents released by the International Labour Conference’s standard-setting committee on Decent Work in the Platform Economy.

Giv

...
