Supercharging Floorplan Localization
with Semantic Rays

Yuval Grader
Tel Aviv University
Hadar Averbuch-Elor
Cornell University
🌴 ICCV 2025 🏄
[Teaser figure]
Which floorplan is easier to localize against? The raw floorplan on top, or the one enhanced with semantic labels on the bottom? As illustrated above*, localization using a raw, walls-only floorplan often produces ambiguous results. In this work, we introduce an approach that supercharges floorplan localization with semantic rays, effectively resolving such ambiguities. Hover over the floorplans to view the predicted rays and the localization result in each case.
* with the method introduced in “F3Loc: Fusion and Filtering for Floorplan Localization”, CVPR 2024

Abstract

Floorplans provide a compact representation of a building's structure, revealing not only layout information but also detailed semantics such as the locations of windows and doors. However, contemporary floorplan localization techniques mostly focus on matching depth-based structural cues, ignoring the rich semantics communicated within floorplans. In this work, we introduce a semantic-aware localization framework that jointly estimates depth and semantic rays, consolidating both to predict a structural-semantic probability volume. The probability volume is constructed in a coarse-to-fine manner: we first sample a small set of rays to obtain an initial low-resolution probability volume, and then refine these probabilities by sampling more densely only in high-probability regions, processing the refined values to predict a 2D location and orientation angle. We evaluate our approach on two standard floorplan localization benchmarks. Our experiments demonstrate that it substantially outperforms state-of-the-art methods, achieving significant improvements in recall compared to prior works. Moreover, our framework can easily incorporate additional metadata such as room labels, enabling further gains in both accuracy and efficiency.
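To make the structural-semantic scoring concrete, below is a minimal sketch of how a single candidate pose could be scored by comparing the predicted depth and semantic rays against rays cast from the floorplan at that pose. The Gaussian depth term, the categorical semantic term, and all names (pose_log_likelihood, sigma, sem_weight) are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def pose_log_likelihood(pred_depth, pred_sem, fp_depth, fp_sem,
                        sigma=0.1, sem_weight=1.0):
    """Score one candidate pose by comparing rays predicted from the
    image against rays raycast from the floorplan at that pose.

    pred_depth: (R,)   predicted depth along R rays
    pred_sem:   (R, C) predicted per-ray class probabilities
    fp_depth:   (R,)   floorplan depth along the same rays
    fp_sem:     (R,)   int class id (e.g. wall/door/window) that each
                       floorplan ray hits
    """
    # Structural term: Gaussian agreement between predicted and
    # floorplan depths (sigma is an assumed noise scale).
    depth_ll = -0.5 * np.sum(((pred_depth - fp_depth) / sigma) ** 2)

    # Semantic term: log-probability the predicted semantics assign
    # to the class the floorplan ray actually hits.
    rays = np.arange(len(fp_sem))
    sem_ll = np.sum(np.log(pred_sem[rays, fp_sem] + 1e-8))

    # Higher is better; sem_weight balances the two cues.
    return depth_ll + sem_weight * sem_ll

Evaluating such a score for every sampled (x, y, theta) hypothesis yields the probability volume described above; the coarse-to-fine scheme simply controls how many rays and hypotheses are scored at each stage.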

Interactive Demo

Explore our localization results across different floorplans and images. Select a floorplan and an image to see how our method localizes the camera position and orientation. The demo shows how localization quality progresses as successive components of our method are added.


How does it work?

[Method pipeline figure]
Our approach begins by predicting depth and semantic rays from a single input image, capturing both the scene's geometric structure and its contextual cues. We then interpolate these rays to build a coarse structural-semantic probability volume over the floorplan, assigning each position and orientation a likelihood that the image was captured there. In the refinement step, we extract the Top-K candidate poses while enforcing a minimum spatial separation between them, and recompute each candidate's score using the original fine-grained ray predictions to obtain a more accurate probability. Optionally, we predict the room in which the image was taken and, in high-confidence cases, apply a room mask to further narrow the search space.
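As a concrete sketch of the refinement step, the snippet below shows greedy Top-K extraction with a minimum spatial separation over a flattened coarse probability volume; each surviving candidate would then be re-scored with the fine-grained rays. Function and parameter names (top_k_separated, min_dist, k) are assumptions for illustration, not the released code's API.

import numpy as np

def top_k_separated(coarse_scores, xy, k=10, min_dist=0.5):
    """Greedily pick the k highest-scoring pose hypotheses from the
    coarse probability volume while enforcing a minimum spatial
    separation between picks (a simple non-maximum suppression).

    coarse_scores: (N,)   score per flattened (x, y, theta) hypothesis
    xy:            (N, 2) metric location of each hypothesis
    """
    order = np.argsort(coarse_scores)[::-1]  # highest score first
    picked = []
    for idx in order:
        if all(np.linalg.norm(xy[idx] - xy[j]) >= min_dist for j in picked):
            picked.append(idx)
            if len(picked) == k:
                break
    # Each surviving candidate is then re-scored with the original
    # fine-grained depth/semantic rays; the best re-scored pose wins.
    return picked

Enforcing the separation keeps the candidate set from collapsing onto a single coarse mode, so the fine re-scoring can still recover when the strongest coarse peak is wrong.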

Citation

@misc{grader2025superchargingfloorplanlocalizationsemantic,
      title={Supercharging Floorplan Localization with Semantic Rays}, 
      author={Yuval Grader and Hadar Averbuch-Elor},
      year={2025},
      eprint={2507.09291},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.09291}, 
}