DOI: 10.1145/3532213.3532271
research-article

Multi-stage Floor Plan Recognition and 3D Reconstruction

Published: 13 July 2022

ABSTRACT

Floor plan recognition is a complex and time-consuming task that requires architects to apply their professional skills and tools. To address the raster-to-vector (R2V) problem, this paper presents a deep-learning-based framework consisting mainly of YOLOv5, a deep residual network (DRN), and OCR extraction algorithms. Our method proceeds in five steps: (1) apply YOLOv5 to identify and separate the main body of the two-dimensional floor plan from the entire figure; (2) remove the annotation arrows using YOLOv5; (3) extract the main structural data of the floor plan using the DRN; (4) apply OCR to read the annotation information; (5) rescale the floor plan to real-world scale using the structured data from DRN recognition. We compare our model with the original DRN model on the classic RJFM dataset (Recognition and 3D Reconstruction of Japan Residential Floor plan Model), evaluating with four loss functions: cross-entropy loss, near-neighbor field loss, endpoint regression loss, and multi-task loss. The results show that our model outperforms other methods in 2D floor plan recognition and in the subsequent isometric 3D reconstruction.
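
To make the final step concrete, the following minimal Python sketch shows one way a recognized floor plan could be rescaled from pixels to real-world units. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify how the scale is derived, so the sketch assumes that OCR has read a dimension annotation (in millimetres) for one recognized wall, and that the ratio between that value and the wall's pixel length applies uniformly to the whole plan. The names `Wall`, `mm_per_pixel`, and `rescale` are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Wall:
    """A wall segment given by its two endpoints (hypothetical structure)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def length(self) -> float:
        """Euclidean length of the segment, in the units of its coordinates."""
        return hypot(self.x2 - self.x1, self.y2 - self.y1)


def mm_per_pixel(reference: Wall, annotated_mm: float) -> float:
    """Scale factor from a wall whose real length is known.

    Assumption: `annotated_mm` is the dimension annotation that OCR read
    for `reference`, and `reference` is in pixel coordinates.
    """
    length_px = reference.length()
    if length_px == 0:
        raise ValueError("reference wall has zero length")
    return annotated_mm / length_px


def rescale(walls: list[Wall], factor: float) -> list[Wall]:
    """Multiply every endpoint coordinate by one uniform scale factor."""
    return [
        Wall(w.x1 * factor, w.y1 * factor, w.x2 * factor, w.y2 * factor)
        for w in walls
    ]


# Two walls in pixel coordinates; the first is assumed to be annotated "3600" (mm).
walls_px = [Wall(0, 0, 200, 0), Wall(200, 0, 200, 150)]
factor = mm_per_pixel(walls_px[0], annotated_mm=3600.0)
walls_mm = rescale(walls_px, factor)
```

A single uniform factor keeps angles and proportions intact, which suits scanned plans drawn to a consistent scale. A plan with inconsistent annotations would need several reference walls and some form of averaging or outlier rejection, which this sketch does not attempt.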

  • Published in

    ICCAI '22: Proceedings of the 8th International Conference on Computing and Artificial Intelligence
    March 2022
    809 pages
    ISBN:9781450396110
    DOI:10.1145/3532213

    Copyright © 2022 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited