Crossroads of Continents: Automated Artifact Extraction for Cultural Adaptation with Large Multimodal Models

George Mason University

Models associate countries with cultural artifacts, which can be identified and replaced for cultural adaptation.

Abstract

We present a comprehensive three-phase study that examines (1) the cultural understanding of Large Multimodal Models (LMMs), by introducing Dalle Street, a large-scale dataset generated by DALL-E 3 and validated by humans, containing 9,935 images of 67 countries and 10 concept classes; (2) the underlying implicit and potentially stereotypical cultural associations, through a cultural artifact extraction task; and (3) an approach to adapting the cultural representation in an image based on the extracted associations, using a modular pipeline, CultureAdapt. We find disparities in cultural understanding at the geographic sub-region level for both open-source (LLaVA) and closed-source (GPT-4V) models on Dalle Street and other existing benchmarks, and we investigate these disparities using over 18,000 artifacts that we identify as associated with different countries. Our findings reveal a nuanced picture of the cultural competence of LMMs, highlighting the need to develop culture-aware systems.
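The sub-region comparison mentioned above can be illustrated with a minimal sketch. It assumes an evaluation where each image has a ground-truth country and a model-predicted country; the `SUBREGION` mapping, the function name, and the example records are hypothetical and are not taken from the paper's data or code.

```python
from collections import defaultdict

# Hypothetical country -> sub-region mapping (illustrative only).
SUBREGION = {
    "India": "Southern Asia",
    "Nepal": "Southern Asia",
    "Kenya": "Eastern Africa",
    "France": "Western Europe",
}


def accuracy_by_subregion(records):
    """Aggregate country-prediction accuracy per geographic sub-region.

    records: iterable of (true_country, predicted_country) pairs.
    Returns a dict mapping sub-region name to accuracy in [0, 1].
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_country, predicted_country in records:
        region = SUBREGION[true_country]
        total[region] += 1
        if predicted_country == true_country:
            correct[region] += 1
    return {region: correct[region] / total[region] for region in total}


# Made-up predictions, purely to exercise the function.
example = [
    ("India", "India"),
    ("Nepal", "India"),
    ("Kenya", "Kenya"),
    ("France", "France"),
]
scores = accuracy_by_subregion(example)
```

Grouping by sub-region rather than by country gives each bucket more samples, which is one plausible reason to report disparities at that level; the paper's actual aggregation may differ.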

BibTeX

If you find our work useful, please cite it!

@misc{mukherjee2024crossroadscontinentsautomatedartifact,
  title={Crossroads of Continents: Automated Artifact Extraction for Cultural Adaptation with Large Multimodal Models},
  author={Anjishnu Mukherjee and Ziwei Zhu and Antonios Anastasopoulos},
  year={2024},
  eprint={2407.02067},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2407.02067},
}