Global Voices, Local Biases: Socio-Cultural Prejudices across Languages

George Mason University
*Equal contribution

WEATHub is a step towards measuring biases across languages and cultures by providing a comprehensive multilingual benchmark.

Abstract

Human biases are ubiquitous but not uniform: disparities exist across linguistic, cultural, and societal borders. As large amounts of recent literature suggest, language models (LMs) trained on human data can reflect and often amplify the effects of these social biases. However, the vast majority of existing studies on bias are heavily skewed towards Western and European languages.

In this work, we scale the Word Embedding Association Test (WEAT) to 24 languages, enabling broader studies and yielding interesting findings about LM bias. We additionally enhance this data with culturally relevant information for each language, capturing local contexts on a global scale. Further, to encompass more widely prevalent societal biases, we examine new bias dimensions across toxicity, ableism, and more. Moreover, we delve deeper into the Indian linguistic landscape, conducting a comprehensive regional bias analysis across six prevalent Indian languages. Finally, we highlight the significance of these social biases and the new dimensions through an extensive comparison of embedding methods, reinforcing the need to address them in pursuit of more equitable language models.

Video

Biases in traditional WEAT dimensions

"Semantics derived automatically from language corpora contain human-like biases" by Caliskan et al. introduced the WEAT test for measuring implicit biases in word embeddings, and subsequent work by other researchers has shown that 6 of the 10 original categories can be somewhat replicated in multilingual settings.

Our work is one of the most comprehensive in this respect, covering 24 languages, including many from the Global South that usually receive little attention in studies of bias.
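For reference, the WEAT effect size compares how strongly two target word sets (e.g., flowers vs. insects) associate with two attribute word sets (e.g., pleasant vs. unpleasant words). The sketch below is a minimal illustration of that computation, not our released implementation; the embed argument is a placeholder for any word-to-vector lookup (for example, fastText vectors for the language under study).

import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B, embed):
    # s(w, A, B): mean similarity of w to attribute set A minus that to B
    return (np.mean([cosine(embed(w), embed(a)) for a in A])
            - np.mean([cosine(embed(w), embed(b)) for b in B]))

def weat_effect_size(X, Y, A, B, embed):
    # Effect size d from Caliskan et al.: difference of mean associations of
    # the two target sets, normalized by the pooled standard deviation
    s_X = [association(x, A, B, embed) for x in X]
    s_Y = [association(y, A, B, embed) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)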


Human-centered Contemporary Biases

We further build on this to propose 5 (+2) new dimensions of bias covering contemporary aspects such as ableism, toxicity, sexuality, immigration, and education. We explore the dimensions of sexuality and ableism from the angle of valence (associations with pleasant and/or unpleasant words), which yields the two additional dimensions.

Figure: table of the new bias dimensions and a heatmap of results for the new bias dimensions.

Multilingual vs Monolingual Models

We find that the biases reflected in monolingual models tend to be more aligned with human biases. Multilingual models, trained on data from many languages at once, may not reflect the cultural biases of each individual language as accurately as monolingual models do. This suggests that monolingual models may be more useful than multilingual ones for understanding biases in different cultures and their correlation with language.


The Need for Human Translations

We find that WEAT effect sizes from human translations are almost always larger than those from machine translations, across embeddings from both static and contextual models, suggesting that relying solely on MT may not suffice for bias evaluation across languages.


Therefore, we recommend utilizing human-annotated data for an accurate and fair assessment of bias across languages instead of solely relying on machine translation systems.

Code

Our data and code are available. The recommended way to use the data is via the Hugging Face datasets library. The snippet below shows the steps required to load WEAT data for any of the languages or dimensions discussed in our paper.


from datasets import load_dataset

# Load WEATHub from the Hugging Face Hub
dataset = load_dataset("iamshnoo/WEATHub")

# Each record provides two target word sets and two attribute word sets
example = dataset["original_weat"][0]

target_set_1 = example["targ1.examples"]
target_set_2 = example["targ2.examples"]
attribute_set_1 = example["attr1.examples"]
attribute_set_2 = example["attr2.examples"]

The Hugging Face dataset also has an extensive dataset card answering common questions about the dataset and its usage.
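As a usage illustration, the word lists loaded above can be fed into a WEAT-style effect size computation such as the sketch shown earlier; the embed function below is a hypothetical word-to-vector lookup and is not part of the dataset itself.

# Illustrative only: weat_effect_size and embed refer to the hypothetical
# helpers sketched in the earlier WEAT example.
effect_size = weat_effect_size(
    X=target_set_1,
    Y=target_set_2,
    A=attribute_set_1,
    B=attribute_set_2,
    embed=embed,
)
print(f"WEAT effect size: {effect_size:.3f}")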

Contributing

If a language you are interested in is not included in our dataset, please consider contributing. To do so, simply send an email to amukher6 at gmu dot edu with [WEATHub] in the subject line. We will reply with a link where you can add your contributions!

The languages we currently have in WEATHub (our dataset) are:

Arabic (ar) Bengali (bn) Sorani Kurdish (ckb) Danish (da) German (de) Greek (el) English (en) Spanish (es) Persian (fa) French (fr) Hindi (hi) Italian (it) Japanese (ja) Korean (ko) Kurmanji Kurdish (ku) Marathi (mr) Punjabi (pa) Russian (ru) Telugu (te) Thai (th) Tagalog (tl) Turkish (tr) Urdu (ur) Vietnamese (vi) Mandarin Chinese (zh)

BibTeX

If you find our work useful, please cite it!

@inproceedings{mukherjee-etal-2023-global,
  title = "Global Voices, Local Biases: Socio-Cultural Prejudices across Languages",
  author = "Mukherjee, Anjishnu  and
    Raj, Chahat  and
    Zhu, Ziwei  and
    Anastasopoulos, Antonios",
  editor = "Bouamor, Houda  and
    Pino, Juan  and
    Bali, Kalika",
  booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
  month = dec,
  year = "2023",
  address = "Singapore",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.emnlp-main.981",
  doi = "10.18653/v1/2023.emnlp-main.981",
  pages = "15828--15845",
  abstract = "Human biases are ubiquitous but not uniform: disparities exist across linguistic, cultural, and societal borders. As large amounts of recent literature suggest, language models (LMs) trained on human data can reflect and often amplify the effects of these social biases. However, the vast majority of existing studies on bias are heavily skewed towards Western and European languages. In this work, we scale the Word Embedding Association Test (WEAT) to 24 languages, enabling broader studies and yielding interesting findings about LM bias. We additionally enhance this data with culturally relevant information for each language, capturing local contexts on a global scale. Further, to encompass more widely prevalent societal biases, we examine new bias dimensions across toxicity, ableism, and more. Moreover, we delve deeper into the Indian linguistic landscape, conducting a comprehensive regional bias analysis across six prevalent Indian languages. Finally, we highlight the significance of these social biases and the new dimensions through an extensive comparison of embedding methods, reinforcing the need to address them in pursuit of more equitable language models.",
}