Discovering Latent Topics by Gaussian Latent Dirichlet Allocation and Spectral Clustering

Published: 23 January 2019

Abstract

Diversifying the retrieval results for a query improves users' search efficiency. Showing multiple aspects of the information gives users an overview of the object, which helps them quickly find what they need. To discover aspects, research focuses on generating image clusters from the initially retrieved results. Latent Dirichlet allocation (LDA) has been shown to perform well at discovering high-level topics. However, traditional LDA is designed to process textual words and requires discrete input. When the algorithm is applied to continuous visual features, a common solution is to quantize them into discrete form with a bag-of-visual-words algorithm. During this process, quantization error inevitably causes information loss. To construct a topic model with complete visual information, this work applies Gaussian latent Dirichlet allocation (GLDA) to the diversity problem in image retrieval. In this model, the traditional multinomial distribution is replaced with a Gaussian distribution to model continuous visual features. In addition, we propose a two-phase spectral clustering strategy, called dual spectral clustering, to generate clusters from the region level to the image level. Experiments on the challenging landmarks of the DIV400 database show that our proposal improves relevance and diversity by about 10% compared to traditional topic models.
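
The quantization-loss argument above can be illustrated with a minimal sketch. This is not the authors' implementation: the two-word codebook, the synthetic 2-D features, and the helper names `quantize`, `quantization_error`, and `gaussian_log_density` are all assumptions made for illustration. The sketch shows that two different descriptors can collapse onto the same visual word, so a multinomial model over words cannot tell them apart, while a Gaussian density evaluated on the raw descriptors still can.

```python
import numpy as np

def quantize(features, codebook):
    """Assign each feature vector to its nearest codeword (bag-of-visual-words step)."""
    # Squared Euclidean distance between every feature and every codeword.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def quantization_error(features, codebook):
    """Mean squared distance between features and their assigned codewords."""
    ids = quantize(features, codebook)
    return float(((features - codebook[ids]) ** 2).sum(axis=1).mean())

def gaussian_log_density(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = mean.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return float(-0.5 * (d * np.log(2 * np.pi) + logdet + maha))

# Hypothetical toy data: 200 continuous 2-D descriptors and a 2-word vocabulary.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 2))
codebook = np.array([[-1.0, 0.0], [1.0, 0.0]])

# Information discarded by replacing each descriptor with its codeword.
err = quantization_error(features, codebook)

# Two clearly different descriptors that fall into the same visual word.
a = np.array([0.9, 0.1])
b = np.array([1.5, -2.0])
same_word = quantize(np.stack([a, b]), codebook)

# A Gaussian "topic" centred on that codeword (assumed identity covariance)
# still assigns the two descriptors different likelihoods.
mean, cov = codebook[1], np.eye(2)
log_p_a = gaussian_log_density(a, mean, cov)
log_p_b = gaussian_log_density(b, mean, cov)

print("quantization error:", err)
print("visual words for a and b:", same_word)
print("Gaussian log densities:", log_p_a, log_p_b)
```

In a discrete topic model only the word indices in `same_word` would be observed, whereas a Gaussian topic model works with `a` and `b` directly. Fitting the topic means and covariances is outside the scope of this sketch.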

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 15, Issue 1
February 2019
265 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3309717
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 January 2019
Accepted: 01 October 2018
Revised: 01 September 2018
Received: 01 May 2018
Published in TOMM Volume 15, Issue 1

Author Tags

  1. Gaussian
  2. Latent Dirichlet allocation
  3. diversity
  4. image retrieval
  5. spectral clustering

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Key Research and Development Program of China
  • National Natural Science Foundation of China
  • National High-Level Talents Special Support Program Leading Talent of Technological Innovation of Ten-Thousands Talents Program

Cited By

  • (2023) Contrastive Attention-guided Multi-level Feature Registration for Reference-based Super-resolution. ACM Transactions on Multimedia Computing, Communications, and Applications 20:2 (1-21). DOI: 10.1145/3616495. Online publication date: 21-Aug-2023
  • (2023) Double High-Order Correlation Preserved Robust Multi-View Ensemble Clustering. ACM Transactions on Multimedia Computing, Communications, and Applications 20:1 (1-21). DOI: 10.1145/3612923. Online publication date: 3-Aug-2023
  • (2022) Feedback Chain Network for Hippocampus Segmentation. ACM Transactions on Multimedia Computing, Communications, and Applications 19:3s (1-18). DOI: 10.1145/3571744. Online publication date: 18-Nov-2022
  • (2022) Integration of Multisensorial Effects in Synchronised Immersive Hybrid TV Scenarios. IEEE Access 10 (79071-79089). DOI: 10.1109/ACCESS.2022.3194170. Online publication date: 2022
  • (2022) A Pseudo-Haptic Method Using Auditory Feedback: The Role of Delay, Frequency, and Loudness of Auditory Feedback in Response to a User’s Button Click in Causing a Sensation of Heaviness. IEEE Access 10 (50008-50022). DOI: 10.1109/ACCESS.2022.3172324. Online publication date: 2022
  • (2021) Topic Modeling Using Latent Dirichlet allocation. ACM Computing Surveys 54:7 (1-35). DOI: 10.1145/3462478. Online publication date: 17-Sep-2021
  • (2021) ASK: Adaptively Selecting Key Local Features for RGB-D Scene Recognition. IEEE Transactions on Image Processing 30 (2722-2733). DOI: 10.1109/TIP.2021.3053459. Online publication date: 9-Feb-2021
  • (2021) Topic-based label distribution learning to exploit label ambiguity for scene classification. Neural Computing and Applications 33:23 (16181-16196). DOI: 10.1007/s00521-021-06218-w. Online publication date: 1-Dec-2021
  • (2020) Person Sensor-Aided Scene Recognition and Understanding Based on CG Technology. 2020 International Conference on Inventive Computation Technologies (ICICT) (60-63). DOI: 10.1109/ICICT48043.2020.9112445. Online publication date: Feb-2020
  • (2019) Why is Multimedia Quality of Experience Assessment a Challenging Problem? IEEE Access 7 (117897-117915). DOI: 10.1109/ACCESS.2019.2936470. Online publication date: 2019
