Kiki or Bouba?
Sound Symbolism in Vision-and-Language Models
NeurIPS 2023 ✨SPOTLIGHT✨
Text-to-image generations from prompts with "kiki" or "bouba".
Which images were generated with "kiki" and which with "bouba"?
TL;DR: Psychological experiments have shown that humans tend to associate certain speech sounds with certain visual shapes. We ask: What about AI models for tasks like text-to-image generation? By generating images using prompts containing pseudowords (nonsense words) and analyzing their shapes, we show that AI image generation models also show sound-shape associations.
Abstract
Although the mapping between sound and meaning in human language is assumed to be largely arbitrary, research in cognitive science has shown that there are non-trivial correlations between particular sounds and meanings across languages and demographic groups, a phenomenon known as sound symbolism. Among the many dimensions of meaning, sound symbolism is particularly salient and well demonstrated with regard to cross-modal associations between language and the visual domain. In this work, we address the question of whether sound symbolism is reflected in vision-and-language models such as CLIP and Stable Diffusion. Using zero-shot knowledge probing to investigate the inherent knowledge of these models, we find strong evidence that they do show this pattern, paralleling the well-known kiki–bouba effect in psycholinguistics. Our work provides a novel method for demonstrating sound symbolism and understanding its nature using computational tools.
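One way to picture zero-shot probing is to ask whether an embedding sits closer to a "sharp" anchor or a "round" anchor. The sketch below is illustrative only: it uses toy hand-written vectors in place of real CLIP embeddings. The function name `sharpness_score` and the anchor vectors are hypothetical, and the difference of cosine similarities shown here is an assumed probe, not necessarily the metric used in the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def sharpness_score(embedding, sharp_anchor, round_anchor):
    """Hypothetical probe: positive if the embedding is closer to 'sharp' than to 'round'."""
    return cosine(embedding, sharp_anchor) - cosine(embedding, round_anchor)

# Toy 3-d vectors standing in for real model embeddings (assumption, not CLIP output).
SHARP = [1.0, 0.0, 0.2]
ROUND = [0.0, 1.0, 0.2]
kiki_like = [0.9, 0.1, 0.3]
bouba_like = [0.2, 0.8, 0.3]

print(sharpness_score(kiki_like, SHARP, ROUND) > 0)   # True
print(sharpness_score(bouba_like, SHARP, ROUND) < 0)  # True
```

With a real model, the anchors would be text embeddings of shape-describing words, and the probed vectors would be embeddings of images or pseudoword prompts. Only the comparison logic is shown here.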
Pseudoword Image Gallery
The gallery shows image generations from Stable Diffusion for individual pseudowords. Images are generated with the prompt “a 3D rendering of a <w> shaped object”, with the pseudoword inserted in the slot <w>. Pseudowords containing sounds associated with sharp or round shapes are marked with ☆ and ◯, respectively.
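The prompt template above can be filled programmatically. The sketch below builds two-syllable consonant–vowel pseudowords and inserts them into the slot <w>. The letter inventories (sharp: k, t, p with i, e; round: b, d, g with o, u) are an illustrative assumption loosely based on the kiki/bouba contrast and are not necessarily the exact sets used in the paper. The helpers `make_pseudowords` and `fill_prompt` are hypothetical.

```python
import itertools

TEMPLATE = "a 3D rendering of a <w> shaped object"

# Illustrative (assumed) sound inventories loosely following the kiki/bouba contrast.
SHARP_CONSONANTS = ["k", "t", "p"]
SHARP_VOWELS = ["i", "e"]
ROUND_CONSONANTS = ["b", "d", "g"]
ROUND_VOWELS = ["o", "u"]

def make_pseudowords(consonants, vowels, n_syllables=2):
    """Return every pseudoword made of n_syllables consonant+vowel syllables (hypothetical helper)."""
    syllables = [c + v for c in consonants for v in vowels]
    return ["".join(combo) for combo in itertools.product(syllables, repeat=n_syllables)]

def fill_prompt(word, template=TEMPLATE):
    """Insert a pseudoword into the <w> slot of the prompt template."""
    return template.replace("<w>", word)

sharp_words = make_pseudowords(SHARP_CONSONANTS, SHARP_VOWELS)
round_words = make_pseudowords(ROUND_CONSONANTS, ROUND_VOWELS)

print(len(sharp_words))       # 36
print(fill_prompt("kiki"))    # a 3D rendering of a kiki shaped object
print("bubo" in round_words)  # True
```

In a full pipeline, each filled prompt would be passed to a text-to-image model such as Stable Diffusion. That generation step is omitted here.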
Acknowledgements
This work was partially supported by the Alon Fellowship. We thank Gal Fiebelman and Taelin Karidi for their helpful feedback.
Citation
@InProceedings{alper2023kiki-or-bouba,
author = {Morris Alper and Hadar Averbuch-Elor},
title = {Kiki or Bouba? Sound Symbolism in Vision-and-Language Models},
booktitle = {Proceedings of Advances in Neural Information Processing Systems (NeurIPS)},
year = {2023}
}