DOI: 10.1145/3472456.3473520
research-article

Exploring HW/SW Co-Optimizations for Accelerating Large-scale Texture Identification on Distributed GPUs

Published: 05 October 2021

Abstract

Texture identification has recently been developed to support one-to-one verification and one-to-many search, which gives it much broader applicability in real-life settings than texture classification. It has demonstrated great potential for enabling product traceability by identifying the unique texture information on the surface of target objects. However, existing hardware acceleration schemes are insufficient for large-scale texture identification, especially for the search task, where the number of texture images being searched can reach millions; this creates enormous compute and memory demands and makes real-time texture identification infeasible. To address these problems, we propose a comprehensive toolset with joint hardware and software optimization strategies that delivers optimized GPU acceleration and enables large-scale texture identification with real-time responses. Novel technologies include: 1) a highly optimized cuBLAS implementation for efficiently running the 2-nearest-neighbor (2-NN) algorithm; 2) a hybrid cache design that incorporates host memory for streaming data to the GPUs, which delivers a 5× larger memory capacity while running the targeted workloads; 3) a batch process that fully exploits data-reuse opportunities by considering available compute resources and memory bandwidth constraints; and 4) an asymmetric local feature extraction that reduces the memory footprint of the feature matrices kept for reference texture images. To the best of our knowledge, this work is the first implementation to provide real-time large-scale texture identification on GPUs. By exploring hardware/software co-optimizations, we deliver 31× faster search and 20× larger feature cache capacity compared to a conventional CUDA implementation.
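The GEMM-based 2-NN search and the batching idea in items 1) and 3) can be illustrated with a small CPU sketch. It relies on the identity ||q − r||² = ||q||² + ||r||² − 2·q·r, so the cross term for every query/reference descriptor pair becomes a single matrix product, the operation a cuBLAS GEMM call would perform on the GPU. Reference descriptors are scanned in fixed-size batches while a running top-2 is kept per query. The function names, the `batch_size` parameter, and the use of Lowe's ratio test with a 0.8 threshold are assumptions for illustration only; this is not the authors' implementation.

```python
import numpy as np


def two_nn_batched(query, reference, batch_size=1024):
    """Return (indices, squared L2 distances) of the two nearest reference
    descriptors for every query descriptor, scanning references in batches.
    Hypothetical helper for illustration, not the paper's code."""
    q = np.asarray(query, dtype=np.float32)          # shape (n_q, dim)
    ref = np.asarray(reference, dtype=np.float32)    # shape (n_r, dim)
    n_q = q.shape[0]
    best_d = np.full((n_q, 2), np.inf, dtype=np.float32)  # running top-2 distances
    best_i = np.full((n_q, 2), -1, dtype=np.int64)        # running top-2 indices
    q_sq = np.sum(q * q, axis=1, keepdims=True)           # ||q||^2, shape (n_q, 1)

    for start in range(0, ref.shape[0], batch_size):
        r = ref[start:start + batch_size]                  # one batch, shape (n_b, dim)
        r_sq = np.sum(r * r, axis=1)                       # ||r||^2, shape (n_b,)
        # ||q - r||^2 = ||q||^2 + ||r||^2 - 2 q.r; q @ r.T is the GEMM-shaped term
        d = q_sq + r_sq[None, :] - 2.0 * (q @ r.T)
        np.maximum(d, 0.0, out=d)                          # clamp tiny negatives from rounding
        # Merge this batch with the running top-2 and keep the two smallest.
        ids = np.broadcast_to(np.arange(start, start + r.shape[0]), d.shape)
        cand_d = np.concatenate([best_d, d], axis=1)
        cand_i = np.concatenate([best_i, ids], axis=1)
        order = np.argsort(cand_d, axis=1, kind="stable")[:, :2]
        best_d = np.take_along_axis(cand_d, order, axis=1)
        best_i = np.take_along_axis(cand_i, order, axis=1)
    return best_i, best_d


def ratio_test_matches(query, reference, ratio=0.8, batch_size=1024):
    """Keep (query_idx, reference_idx) pairs whose nearest neighbor is clearly
    closer than the second nearest (Lowe's ratio test; threshold assumed)."""
    idx, dist_sq = two_nn_batched(query, reference, batch_size)
    # d1 < ratio * d2 is equivalent to d1^2 < ratio^2 * d2^2 for distances >= 0
    keep = dist_sq[:, 0] < (ratio ** 2) * dist_sq[:, 1]
    return [(int(qi), int(idx[qi, 0])) for qi in np.nonzero(keep)[0]]
```

Scanning references in batches bounds the temporary distance matrix to n_q × `batch_size` entries regardless of how many reference descriptors exist, which is the kind of compute/memory trade-off the batch process targets. With 128-dimensional SIFT descriptors, `len(ratio_test_matches(query_desc, ref_desc))` could serve as a simple similarity score between a query image and one reference image; that scoring rule is likewise an assumption.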
We further demonstrate the proposed designs with a distributed texture identification system of 14 Nvidia Tesla P100 GPUs, which can complete 872,984 texture similarity comparisons in one second.
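To put the system-level figure in per-device terms, the short calculation below divides the reported 872,984 comparisons per second across the 14 GPUs. It assumes the work is split evenly across GPUs, which the abstract does not state, and the resulting time per comparison is an amortized figure, not a per-query latency.

```python
def per_gpu_throughput(total_comparisons_per_s, num_gpus):
    """Per-GPU comparison rate and amortized microseconds per comparison,
    assuming an even split of work across GPUs (an assumption)."""
    per_gpu = total_comparisons_per_s / num_gpus
    microseconds_per_comparison = 1e6 / per_gpu
    return per_gpu, microseconds_per_comparison


rate, us = per_gpu_throughput(872_984, 14)
print(f"{rate:,.0f} comparisons/s per GPU, ~{us:.1f} us per comparison")
# prints: 62,356 comparisons/s per GPU, ~16.0 us per comparison
```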


Cited By

  • (2023) Compilation and Optimizations for Efficient Machine Learning on Embedded Systems. In Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing, 37–74. DOI: 10.1007/978-3-031-39932-9_3. Online publication date: 10-Oct-2023
  • (2022) Algorithm/Accelerator Co-Design and Co-Search for Edge AI. IEEE Transactions on Circuits and Systems II: Express Briefs 69, 7, 3064–3070. DOI: 10.1109/TCSII.2022.3179229. Online publication date: Jul-2022

Information

Published In

ACM Other conferences
ICPP '21: Proceedings of the 50th International Conference on Parallel Processing
August 2021
927 pages
ISBN: 9781450390682
DOI: 10.1145/3472456
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. GPU acceleration
  2. SIFT
  3. batching
  4. cuBLAS
  5. feature extraction
  6. hybrid cache
  7. nearest neighbor
  8. texture identification

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2021

Acceptance Rates

Overall Acceptance Rate 91 of 313 submissions, 29%


Article Metrics

  • Downloads (last 12 months): 9
  • Downloads (last 6 weeks): 1
Reflects downloads up to 08 Mar 2025
