Emergent Visual-Semantic Hierarchies in Image-Text Representations – Interactive Visualization


Below we show samples from the HierarCaps train set, along with hierarchical retrieval results on the HierarCaps test set (using CLIP-Large before and after fine-tuning). Select the dataset split and model you would like to use, then click "Previous" and "Next" to browse the results. Items appear in random order, and the 100 train items shown are randomly selected from the 73K-item train set. For externally hosted images, we provide a link rather than reproducing the image directly. For test set predictions, texts are retrieved from the expanded candidate set used for qualitative results, not only from the 1K-item manually curated test set.

