DOI:10.1145/2339530.2339752
research-article

Latent association analysis of document pairs

Published: 12 August 2012 Publication History

Abstract

This paper presents Latent Association Analysis (LAA), a generative model that analyzes the topics within two document sets simultaneously, as well as the correlations between the two topic structures, by considering the semantic associations among document pairs. LAA defines a correlation factor that represents the connection between two documents, and considers the topic proportion of paired documents based on this factor. Words in the documents are assumed to be randomly generated by particular topic assignments and topic-to-word probability distributions. The paper also presents a new ranking algorithm, based on LAA, that can be used to retrieve target documents that are potentially associated with a given source document. The ranking algorithm uses the latent factor in LAA to rank target documents by the strength of their semantic associations with the source document. We evaluate the LAA algorithm with real datasets, specifically, the IT-Change and the IT-Solution document sets from the IBM IT service environment and the Symptom-Treatment document sets from Google Health. Experimental results demonstrate that the LAA algorithm significantly outperforms existing algorithms.
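The generative story in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's model specification: the abstract does not state the distributions involved, so the Gaussian correlation factor, the softmax (logistic-normal style) link from factor to topic proportions, and every name and number (`TOY_MODEL`, `loadings`, `noise_sd`, the vocabularies) are assumptions chosen only to show how one shared factor could tie together the topic proportions of a source and a target document.

```python
import math
import random


def softmax(scores):
    """Map real-valued scores to a probability vector."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_proportions(factor, loadings, noise_sd, rng):
    """Topic proportions driven by the shared factor (assumed softmax link)."""
    scores = [
        sum(w * f for w, f in zip(row, factor)) + rng.gauss(0.0, noise_sd)
        for row in loadings
    ]
    return softmax(scores)


def generate_words(proportions, topic_word, vocab, length, rng):
    """Draw each word by picking a topic, then a word from that topic."""
    words = []
    for _ in range(length):
        topic = rng.choices(range(len(proportions)), weights=proportions)[0]
        words.append(rng.choices(vocab, weights=topic_word[topic])[0])
    return words


def generate_pair(model, length, rng):
    """Generate one (source, target) document pair from a shared factor."""
    # Assumption: the correlation factor is a standard Gaussian vector.
    factor = [rng.gauss(0.0, 1.0) for _ in range(model["factor_dim"])]
    pair = {}
    for side in ("source", "target"):
        cfg = model[side]
        theta = topic_proportions(factor, cfg["loadings"], model["noise_sd"], rng)
        pair[side] = generate_words(theta, cfg["topic_word"], cfg["vocab"], length, rng)
        pair[side + "_theta"] = theta
    return pair


# Hypothetical two-topic model; all values are made up for illustration.
TOY_MODEL = {
    "factor_dim": 1,
    "noise_sd": 0.1,
    "source": {
        "vocab": ["fever", "cough", "rash", "itch"],
        "loadings": [[2.0], [-2.0]],
        "topic_word": [[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]],
    },
    "target": {
        "vocab": ["rest", "fluids", "cream", "antihistamine"],
        "loadings": [[2.0], [-2.0]],
        "topic_word": [[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]],
    },
}

if __name__ == "__main__":
    pair = generate_pair(TOY_MODEL, length=8, rng=random.Random(0))
    print(pair["source"], pair["target"])
```

Because both sides read their topic scores from the same factor, a pair generated this way tends to have matching dominant topics, which is the kind of association a ranking method could exploit. Inference and ranking are not shown here.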

Supplementary Material

JPG File (306_w_talk_6.jpg)
MP4 File (306_w_talk_6.mp4)

    Published In

    KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2012
    1616 pages
    ISBN:9781450314626
    DOI:10.1145/2339530

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. ranking algorithm
    2. topic model
    3. variational inference

    Qualifiers

    • Research-article

    Conference

    KDD '12

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Cited By

    • (2019) Correlated Matrix Factorization for Recommendation with Implicit Feedback. IEEE Transactions on Knowledge and Data Engineering 31:3 (451-464). DOI:10.1109/TKDE.2018.2840993. Online publication date: 1-Mar-2019
    • (2018) Discovering Canonical Correlations between Topical and Topological Information in Document Networks. IEEE Transactions on Knowledge and Data Engineering 30:3 (460-473). DOI:10.1109/TKDE.2017.2767599. Online publication date: 1-Mar-2018
    • (2018) A Smart Reference Management System with Association Analysis Using Social Networking Apporach. 2018 IEEE 18th International Conference on Communication Technology (ICCT) (1374-1378). DOI:10.1109/ICCT.2018.8599903. Online publication date: Oct-2018
    • (2018) Discovering Correspondence of Sentiment Words and Aspects. Computational Linguistics and Intelligent Text Processing (233-245). DOI:10.1007/978-3-319-75487-1_18. Online publication date: 21-Mar-2018
    • (2017) Mining Coherent Topics With Pre-Learned Interest Knowledge in Twitter. IEEE Access 5 (10515-10525). DOI:10.1109/ACCESS.2017.2696558. Online publication date: 2017
    • (2016) Service analytics for IT service management. IBM Journal of Research and Development 60:2-3 (13:1-13:17). DOI:10.1147/JRD.2016.2520620. Online publication date: 1-Mar-2016
    • (2016) Aspect-Level Influence Discovery from Graphs. IEEE Transactions on Knowledge and Data Engineering 28:7 (1635-1649). DOI:10.1109/TKDE.2016.2538223. Online publication date: 1-Jul-2016
    • (2015) Marketing or Newsletter Sender Reputation System Using Association Analysis Concept. 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing (262-269). DOI:10.1109/CIT/IUCC/DASC/PICOM.2015.38. Online publication date: Oct-2015
    • (2014) Suspect Vehicle Detection Using Vehicle Reputation with Association Analysis Concept. 2014 IIAI 3rd International Conference on Advanced Applied Informatics (436-440). DOI:10.1109/IIAI-AAI.2014.94. Online publication date: Aug-2014
