DAGC: Employing Dual Attention and Graph Convolution for Point Cloud based Place Recognition

Authors:

Xiaoyong Du

ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval

Pages 224 - 232

https://doi.org/10.1145/3372278.3390693

Published: 08 June 2020

Abstract

Point cloud based retrieval for place recognition remains an open problem, because it is difficult to efficiently encode the local features of a scene into an adequate global descriptor. Existing studies address it by generating a global descriptor for each point cloud and using that descriptor to retrieve the matching point cloud from a database. However, they do not make effective use of the relationships between points, and they neglect the fact that different features have different discriminative power. In this paper, we propose DAGC, which employs Dual Attention and Graph Convolution for point cloud based place recognition, to address these issues. Specifically, we employ two modules that help extract discriminative and generalizable features for describing a point cloud. A Dual Attention module helps distinguish task-relevant features and uses the differing contributions of other points to each point when generating its representation. A Residual Graph Convolution Network (ResGCN) module aggregates the local features of each point's multi-level neighbor points to further improve the representation. In this way, we improve descriptor generation by considering the importance of both points and features and by leveraging the relationships between points. Experiments on different datasets show that our method outperforms current approaches on all evaluation metrics.
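
For readers who want a concrete picture of the pipeline the abstract outlines, the following is a minimal, parameter-free sketch in plain Python. It is not the authors' implementation: the abstract does not specify layer definitions, so every function here (`point_attention`, `channel_attention`, `knn`, `res_graph_conv`, `global_descriptor`) is a hypothetical stand-in. The real modules are presumably learned, and the max-pooling descriptor at the end is an assumption made only to keep the example short.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def point_attention(feats):
    """Point-wise attention (assumed form): add to each point a
    softmax-weighted mix of all points' features."""
    out = []
    for f in feats:
        scores = [sum(a * b for a, b in zip(f, g)) for g in feats]
        w = softmax(scores)
        mixed = [sum(w[j] * feats[j][c] for j in range(len(feats)))
                 for c in range(len(f))]
        out.append([a + b for a, b in zip(f, mixed)])
    return out

def channel_attention(feats):
    """Feature-wise attention (assumed form): rescale each channel by a
    softmax over the per-channel mean activation."""
    n, c = len(feats), len(feats[0])
    means = [sum(f[k] for f in feats) / n for k in range(c)]
    w = softmax(means)
    # Multiply by c so that uniform weights leave the features unchanged.
    return [[f[k] * w[k] * c for k in range(c)] for f in feats]

def knn(points, k):
    """Indices of the k nearest neighbours of every point (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        result.append([j for _, j in dists[:k]])
    return result

def res_graph_conv(feats, neighbors):
    """Residual graph step (assumed form): mean of neighbour features
    added back onto each point's own features."""
    out = []
    for f, nbrs in zip(feats, neighbors):
        agg = [sum(feats[j][c] for j in nbrs) / len(nbrs)
               for c in range(len(f))]
        out.append([a + b for a, b in zip(f, agg)])
    return out

def global_descriptor(feats):
    """Max-pool over points, then L2-normalise (a stand-in aggregator)."""
    c = len(feats[0])
    pooled = [max(f[k] for f in feats) for k in range(c)]
    norm = math.sqrt(sum(v * v for v in pooled)) or 1.0
    return [v / norm for v in pooled]

# Toy point cloud; raw coordinates double as the input features.
points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [5.0, 5.0, 5.0]]
x = channel_attention(point_attention(points))
x = res_graph_conv(x, knn(points, k=2))
desc = global_descriptor(x)
```

Stacking several `res_graph_conv` calls would let each point see neighbours of neighbours, which is one plausible reading of the "multi-level neighbor points" mentioned above; for retrieval, descriptors such as `desc` would be compared by distance against those stored in a database.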

Index Terms

Information systems > Information retrieval > Specialized information retrieval > Multimedia and multimodal retrieval

Published In

ICMR '20: Proceedings of the 2020 International Conference on Multimedia Retrieval

June 2020

605 pages

ISBN:9781450370875

DOI:10.1145/3372278

General Chairs:
Cathal Gurrin, Dublin City University, Ireland
Björn Þór Jónsson, IT University of Copenhagen, Denmark
Noriko Kando, National Institute of Informatics, Tokyo

Program Chairs:
Klaus Schoeffmann, Klagenfurt University, Austria
Phoebe Chen, La Trobe University, Australia
Noel E. O'Connor, Dublin City University, Ireland

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Fundamental Research Funds for Central Universities and Research Funds of Renmin University of China
National Natural Science Foundation of China

Conference

ICMR '20

Sponsor:

SIGMM

ICMR '20: International Conference on Multimedia Retrieval

June 8 - 11, 2020

Dublin, Ireland

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%

Bibliometrics

Article Metrics

Total Citations: 40
Total Downloads: 584
Downloads (last 12 months): 31
Downloads (last 6 weeks): 0

Reflects downloads up to 17 Feb 2025

Cited By

Zhang R, Li G, Gao W, Liu S (2025). A Quantum-Inspired Framework in Leader-Servant Mode for Large-Scale Multi-Modal Place Recognition. IEEE Transactions on Intelligent Transportation Systems 26:2 (2027-2039). Online publication date: Feb-2025.
https://doi.org/10.1109/TITS.2024.3497574
Jiang W, Xue H, Si S, Min C, Xiao L, Nie Y, Dai B (2024). SG-LPR: Semantic-Guided LiDAR-Based Place Recognition. Electronics 13:22 (4532). Online publication date: 18-Nov-2024.
https://doi.org/10.3390/electronics13224532
Zhang Y, Shi P, Li J (2024). LiDAR-Based Place Recognition For Autonomous Driving: A Survey. ACM Computing Surveys 57:4 (1-36). Online publication date: 5-Dec-2024.
https://dl.acm.org/doi/10.1145/3707446
Zhang R, Liu X, Li G, Li T, Zhao P, Gurrin C, Kongkachandra R, Schoeffmann K, Dang-Nguyen D, Rossetto L, Satoh S, Zhou L (2024). Sketch-aided Interactive Fusion Point Cloud Place Recognition. Proceedings of the 2024 International Conference on Multimedia Retrieval (1115-1119). Online publication date: 30-May-2024.
https://dl.acm.org/doi/10.1145/3652583.3657613
Zhang J, Zhang Y, Rong L, Tian R, Wang S (2024). MVSE-Net: A Multi-View Deep Network With Semantic Embedding for LiDAR Place Recognition. IEEE Transactions on Intelligent Transportation Systems 25:11 (17174-17186). Online publication date: Nov-2024.
https://doi.org/10.1109/TITS.2024.3421375
Zhang R, Li G, Gao W, Li T (2024). ComPoint: Can Complex-Valued Representation Benefit Point Cloud Place Recognition? IEEE Transactions on Intelligent Transportation Systems 25:7 (7494-7507). Online publication date: Jul-2024.
https://doi.org/10.1109/TITS.2024.3351215
Zhang J, Zhang Y, Liao M, Tian R, Coleman S, Kerr D (2024). CapsLoc3D: Point Cloud Retrieval for Large-Scale Place Recognition Based on 3D Capsule Networks. IEEE Transactions on Intelligent Transportation Systems 25:7 (6811-6823). Online publication date: Jul-2024.
https://doi.org/10.1109/TITS.2023.3346953
Zhu J, Yang K, Zhang Y, Peng Y, Peng Y (2024). APFN: Adaptive Perspective-Based Fusion Network for 3-D Place Recognition. IEEE Transactions on Instrumentation and Measurement 73 (1-10). Online publication date: 2024.
https://doi.org/10.1109/TIM.2024.3418083
Du Z, Ji S, Khoshelham K (2024). 3-D LiDAR-Based Place Recognition Techniques: A Review of the Past Ten Years. IEEE Transactions on Instrumentation and Measurement 73 (1-24). Online publication date: 2024.
https://doi.org/10.1109/TIM.2024.3403194
Hao W, Zhang W, Jin H (2024). SAGE-Net: Employing Spatial Attention and Geometric Encoding for Point Cloud Based Place Recognition. IEEE Robotics and Automation Letters 9:6 (4958-4965). Online publication date: Jun-2024.
https://doi.org/10.1109/LRA.2024.3387112
