Feature set aggregator: unsupervised representation learning of sets for their comparison

Furuya, Takahiko; Ohbuchi, Ryutarou

doi:10.1007/s11042-019-08078-y

Published: 20 August 2019

Multimedia Tools and Applications, volume 78, pages 35157–35178 (2019)

Abstract

Unsupervised representation learning of unlabeled multimedia data is an important yet challenging problem for indexing, clustering, and retrieval. There have been many attempts to learn representations from collections of unlabeled 2D images. In contrast, less attention has been paid to unsupervised representation learning for unordered sets of high-dimensional feature vectors, which are often used to describe multimedia data. One such example is a set of local visual features describing a 2D image. This paper proposes a novel algorithm called Feature Set Aggregator (FSA) for accurate and efficient comparison among sets of high-dimensional features. FSA learns a representation, or embedding, of unordered feature sets via optimization using a combination of two training objectives, namely set reconstruction and set embedding, carefully designed for set-to-set comparison. Experimental evaluation under three multimedia information retrieval scenarios using 3D shapes, 2D images, and text documents demonstrates the efficacy as well as the generality of the proposed algorithm.
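To make the set-to-set comparison setting concrete, the following minimal sketch aggregates each unordered feature set into a single vector and compares the aggregated vectors. It is not the FSA algorithm, whose architecture and training objectives are not specified in the abstract. It uses plain mean pooling and cosine distance as an assumed, training-free baseline, and the function names `mean_pool`, `cosine_distance`, and `set_distance` are hypothetical.

```python
import math


def mean_pool(feature_set):
    """Aggregate a set of equal-length vectors into one vector by averaging.

    Averaging ignores element order, so the result does not depend on
    how the set happens to be listed. (Assumed baseline, not FSA.)
    """
    n = len(feature_set)
    dim = len(feature_set[0])
    return [sum(vec[d] for vec in feature_set) / n for d in range(dim)]


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def set_distance(set_a, set_b):
    """Compare two feature sets through their aggregated vectors."""
    return cosine_distance(mean_pool(set_a), mean_pool(set_b))


# Toy feature sets; the two sets may differ in size.
set_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
set_a_reordered = list(reversed(set_a))
set_b = [[1.0, 0.0], [2.0, 0.0]]

print(set_distance(set_a, set_a_reordered))  # same set in another order
print(set_distance(set_a, set_b))            # a different set
```

In this sketch the reordered copy of a set is at (numerically) zero distance from the original, while a different set is not. A learned aggregator such as the one the abstract describes would replace the fixed averaging step with a trained embedding.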



Author information

Authors and Affiliations

University of Yamanashi, 4-3-11 Takeda, Kofu-shi, Yamanashi-ken, 400-8511, Japan
Takahiko Furuya & Ryutarou Ohbuchi


Corresponding author

Correspondence to Takahiko Furuya.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Furuya, T., Ohbuchi, R. Feature set aggregator: unsupervised representation learning of sets for their comparison. Multimed Tools Appl 78, 35157–35178 (2019). https://doi.org/10.1007/s11042-019-08078-y

Received: 10 August 2018
Revised: 05 June 2019
Accepted: 02 August 2019
Published: 20 August 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s11042-019-08078-y
