Learning Human-Human Interactions in Images from Weak Textual Supervision – Interactive Visualization


Below we show results of the various models discussed in our paper, on both the Waldo & Wenda and imSitu-HHI datasets. Select the dataset and model that you would like to use, and click "Previous" and "Next" to browse through the results. For models supporting beam search, we show the top 5 beams (out of 32). Items appear in random order. Images are downscaled for convenience, but results are displayed for the original images. For externally hosted images, we provide an image URL rather than displaying the image here.


(Models marked with * are trained on our pseudo-labels.)