AIES '24 Oral

Breaking Bias, Building Bridges: Evaluation and Mitigation of Social Biases in LLMs via Contact Hypothesis

Chahat Raj*, Anjishnu Mukherjee*, Aylin Caliskan, Antonios Anastasopoulos, Ziwei Zhu

* Equal contribution

Breaking Bias poster-style figure summarizing contact-hypothesis bias probing and mitigation.
We simulate social contact in prompts, measure social biases across 13 dimensions, and reduce biased behavior through Social Contact Debiasing.

Abstract

Large Language Models (LLMs) perpetuate social biases, reflecting prejudices in their training data and reinforcing societal stereotypes and inequalities. Our work explores the potential of the Contact Hypothesis, a concept from social psychology, for debiasing LLMs. We simulate various forms of social contact through LLM prompting and measure their influence on the model's biases, mirroring how intergroup interactions can reduce prejudice in social contexts. Following a principled approach that replicates social contact, we create a dataset of 108,000 prompts to measure biases in three LLMs (LLaMA 2, Tulu, and NousHermes) across 13 social bias dimensions. We propose a unique debiasing technique, Social Contact Debiasing (SCD), that instruction-tunes these models with unbiased responses to these prompts. Our research demonstrates that LLM responses exhibit social biases when subject to contact probing, but, more importantly, that these biases can be reduced by up to 40% with a single epoch of instruction tuning of LLaMA 2 under our SCD strategy.
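The pipeline above has two steps: probe a model with contact-style prompts, then pair each prompt with an unbiased target response for instruction tuning. A minimal sketch, assuming a hypothetical template and group names for illustration (the wording, options, and unbiased answer below are not the paper's actual dataset):

```python
# Hypothetical sketch of contact probing and SCD data construction.
# CONTACT_TEMPLATE, the group names, and the "unbiased" answer are
# illustrative placeholders, not the released 108,000-prompt dataset.

CONTACT_TEMPLATE = (
    "My neighbor, who is {group}, invited me over for dinner. "
    "Who do you think cooked the meal? Answer with one option: {options}."
)

def make_probe(group: str, options: list[str]) -> str:
    """Fill the contact-scenario template for one social group."""
    return CONTACT_TEMPLATE.format(group=group, options=" or ".join(options))

def make_scd_pair(group: str, options: list[str], unbiased: str) -> dict:
    """Pair a contact probe with an unbiased target response,
    yielding one instruction-tuning example."""
    return {"instruction": make_probe(group, options), "output": unbiased}

pair = make_scd_pair(
    "an immigrant",
    ["the host", "the guest"],
    "There is no way to tell; either person could have cooked it.",
)
```

Collections of such pairs would then be fed to a standard supervised instruction-tuning loop; the bias measurement itself compares model answers across groups substituted into the same template.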

Highlights

Contact probing: Uses social-contact prompts to expose how LLM responses vary across social bias dimensions.
Large prompt set: Builds 108,000 prompts spanning 13 bias dimensions and three LLMs.
Debiasing: Instruction-tunes models with unbiased responses, reducing measured biases by up to 40% after one epoch.

BibTeX

@article{raj2024breaking,
  title = {Breaking Bias, Building Bridges: Evaluation and Mitigation of Social Biases in LLMs via Contact Hypothesis},
  author = {Raj, Chahat and Mukherjee, Anjishnu and Caliskan, Aylin and Anastasopoulos, Antonios and Zhu, Ziwei},
  journal = {Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society},
  volume = {7},
  number = {1},
  pages = {1180--1189},
  year = {2024},
  doi = {10.1609/aies.v7i1.31715},
  url = {https://doi.org/10.1609/aies.v7i1.31715}
}