Supercharging Floorplan Localization
with Semantic Rays

Yuval Grader
Tel Aviv University
Hadar Averbuch-Elor
Cornell University
🌴 ICCV 2025 🏄
[Teaser figure]
Which floorplan is easier to localize against? The raw floorplan on top, or the one enhanced with semantic labels on the bottom? As illustrated above*, localization using a raw, walls-only floorplan often produces ambiguous results. In this work, we introduce an approach that supercharges floorplan localization with semantic rays, effectively resolving such ambiguities. Hover over the floorplans to view the predicted rays and the localization result in each case.
* with the method introduced in “F3Loc: Fusion and Filtering for Floorplan Localization”, CVPR 2024

Abstract

Floorplans provide a compact representation of a building's structure, revealing not only layout information but also detailed semantics such as the locations of windows and doors. However, contemporary floorplan localization techniques mostly focus on matching depth-based structural cues, ignoring the rich semantics communicated within floorplans. In this work, we introduce a semantic-aware localization framework that jointly estimates depth and semantic rays, consolidating both to predict a structural-semantic probability volume. The probability volume is constructed in a coarse-to-fine manner: we first sample a small set of rays to obtain an initial low-resolution probability volume, and then refine these probabilities by sampling more densely only in high-probability regions, processing the refined values to predict a 2D location and orientation angle. We evaluate our approach on two standard floorplan localization benchmarks. Our experiments demonstrate that it substantially outperforms state-of-the-art methods, achieving significant improvements in recall compared to prior works. Moreover, our framework can easily incorporate additional metadata such as room labels, enabling further gains in both accuracy and efficiency.
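To make the structural-semantic scoring concrete, below is a minimal sketch of how a single candidate pose could be scored by comparing the predicted depth and semantic rays against rays cast from the floorplan at that pose. The Gaussian depth term, the categorical semantic term, and all names (pose_log_likelihood, sigma, sem_weight) are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def pose_log_likelihood(pred_depth, pred_sem, fp_depth, fp_sem,
                        sigma=0.1, sem_weight=1.0):
    """Score one candidate pose by comparing rays predicted from the
    image against rays raycast from the floorplan at that pose.

    pred_depth: (R,)   predicted depth along R rays
    pred_sem:   (R, C) predicted per-ray class probabilities
    fp_depth:   (R,)   floorplan depth along the same rays
    fp_sem:     (R,)   int class id (e.g. wall/door/window) that each
                       floorplan ray hits
    """
    # Structural term: Gaussian agreement between predicted and
    # floorplan depths (sigma is an assumed noise scale).
    depth_ll = -0.5 * np.sum(((pred_depth - fp_depth) / sigma) ** 2)

    # Semantic term: log-probability the predicted semantics assign
    # to the class the floorplan ray actually hits.
    rays = np.arange(len(fp_sem))
    sem_ll = np.sum(np.log(pred_sem[rays, fp_sem] + 1e-8))

    # Higher is better; sem_weight balances the two cues.
    return depth_ll + sem_weight * sem_ll

Evaluating such a score for every sampled (x, y, theta) hypothesis yields the probability volume described above; the coarse-to-fine scheme simply controls how many rays and hypotheses are scored at each stage.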

Interactive Demo

Explore our localization results across different floorplans and images. Select a floorplan and an image to see how our method localizes the camera position and orientation. The demo shows how localization quality progresses as successive components of our method are added.


How does it work?

[Method pipeline figure]
Our approach begins by predicting depth and semantic rays from a single input image, capturing both the scene's geometric structure and its contextual cues. We then interpolate these rays to build a coarse structural-semantic probability volume over the floorplan, assigning each position and orientation a likelihood that the image was captured there. In the refinement step, we extract the Top-K candidate poses while enforcing a minimum spatial separation between them, and recompute each candidate's score using the original fine-grained ray predictions to obtain a more accurate probability. Optionally, we predict the room in which the image was taken and, in high-confidence cases, apply a room mask to further narrow the search space.
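As a concrete sketch of the refinement step, the snippet below shows greedy Top-K extraction with a minimum spatial separation over a flattened coarse probability volume; each surviving candidate would then be re-scored with the fine-grained rays. Function and parameter names (top_k_separated, min_dist, k) are assumptions for illustration, not the released code's API.

import numpy as np

def top_k_separated(coarse_scores, xy, k=10, min_dist=0.5):
    """Greedily pick the k highest-scoring pose hypotheses from the
    coarse probability volume while enforcing a minimum spatial
    separation between picks (a simple non-maximum suppression).

    coarse_scores: (N,)   score per flattened (x, y, theta) hypothesis
    xy:            (N, 2) metric location of each hypothesis
    """
    order = np.argsort(coarse_scores)[::-1]  # highest score first
    picked = []
    for idx in order:
        if all(np.linalg.norm(xy[idx] - xy[j]) >= min_dist for j in picked):
            picked.append(idx)
            if len(picked) == k:
                break
    # Each surviving candidate is then re-scored with the original
    # fine-grained depth/semantic rays; the best re-scored pose wins.
    return picked

Enforcing the separation keeps the candidate set from collapsing onto a single coarse mode, so the fine re-scoring can still recover when the strongest coarse peak is wrong.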

Citation

@misc{grader2025superchargingfloorplanlocalizationsemantic,
      title={Supercharging Floorplan Localization with Semantic Rays}, 
      author={Yuval Grader and Hadar Averbuch-Elor},
      year={2025},
      eprint={2507.09291},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.09291}, 
}