research-article

Latent association analysis of document pairs

Authors:

Louise E. Moser,

Nikos Anerousis,

Jimeng Sun

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 1415 - 1423

https://doi.org/10.1145/2339530.2339752

Published: 12 August 2012

Abstract

This paper presents Latent Association Analysis (LAA), a generative model that simultaneously analyzes the topics within two document sets and the correlations between the two topic structures by considering the semantic associations among document pairs. LAA defines a correlation factor that represents the connection between two documents, and models the topic proportions of paired documents as dependent on this factor. Words in the documents are assumed to be generated at random from their topic assignments and the topic-to-word probability distributions. The paper also presents a new ranking algorithm, based on LAA, for retrieving target documents that are potentially associated with a given source document. The ranking algorithm uses the latent factor in LAA to rank target documents by the strength of their semantic associations with the source document. We evaluate the LAA algorithm on real datasets: the IT-Change and IT-Solution document sets from the IBM IT service environment and the Symptom-Treatment document sets from Google Health. Experimental results demonstrate that the LAA algorithm significantly outperforms existing algorithms.
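
The generative story in the abstract can be illustrated with a toy sketch. The abstract does not give the model's exact distributions, so everything specific here is an assumption made for illustration: a Gaussian shared factor, linear maps (`map_src`, `map_tgt`) followed by a softmax to obtain topic proportions, and the small hand-written topic-to-word tables. It is not the authors' formulation, only one plausible way a shared factor could tie together the topic proportions of a document pair.

```python
import math
import random


def softmax(xs):
    """Turn real-valued scores into a probability vector."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def sample_index(probs, rng):
    """Draw one index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_pair(rng, map_src, map_tgt, topics_src, topics_tgt,
                  n_words=8, noise=0.1):
    """Generate one (source, target) document pair from a shared factor.

    Assumed form (hypothetical, not taken from the paper):
    factor ~ N(0, I); topic proportions = softmax(map @ factor + noise).
    """
    dim = len(map_src[0])
    factor = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    docs = []
    for mapping, topics in ((map_src, topics_src), (map_tgt, topics_tgt)):
        logits = [x + rng.gauss(0.0, noise) for x in matvec(mapping, factor)]
        theta = softmax(logits)  # topic proportions for this document
        words = []
        for _ in range(n_words):
            k = sample_index(theta, rng)                # topic assignment
            words.append(sample_index(topics[k], rng))  # word id from topic k
        docs.append((theta, words))
    return factor, docs[0], docs[1]


# Toy setup: 2-dimensional factor, 3 source topics, 2 target topics.
rng = random.Random(0)
map_src = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
map_tgt = [[0.5, 0.5], [-0.5, 0.5]]
topics_src = [[0.7, 0.2, 0.1, 0.0],   # each row: distribution over 4 words
              [0.1, 0.7, 0.1, 0.1],
              [0.0, 0.1, 0.2, 0.7]]
topics_tgt = [[0.9, 0.1, 0.0],        # each row: distribution over 3 words
              [0.1, 0.1, 0.8]]

factor, (theta_s, words_s), (theta_t, words_t) = generate_pair(
    rng, map_src, map_tgt, topics_src, topics_tgt)
print("source topic proportions:", theta_s)
print("target topic proportions:", theta_t)
```

Because both documents in a pair are driven by the same `factor`, their topic proportions move together, which is the kind of dependence a ranking step could exploit when scoring candidate target documents against a source document.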



Index Terms

Latent association analysis of document pairs
1. Applied computing
  1. Document management and text processing


Published In

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

August 2012, 1616 pages

ISBN: 9781450314626

DOI: 10.1145/2339530

General Chair: Qiang Yang (Hong Kong University of Science and Technology)

Program Chairs: Deepak Agarwal (LinkedIn), Jian Pei (Simon Fraser University)

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD '12: The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 12 - 16, 2012

Beijing, China

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
