Acquiring Word Similarities with Higher Order Association Mining

Chakraborti, Sutanu; Wiratunga, Nirmalie; Lothian, Robert; Watt, Stuart

doi:10.1007/978-3-540-74141-1_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4626)

Included in the following conference series:

International Conference on Case-Based Reasoning

Abstract

We present a novel approach to mining word similarity in Textual Case-Based Reasoning. To estimate the similarity of words, we exploit their indirect associations in addition to direct ones. If word A co-occurs with word B, we say that A and B share a first-order association. If A co-occurs with B in some documents, and B with C in some others, then A and C are said to share a second-order co-occurrence via B. Higher orders of co-occurrence may be defined similarly. In this paper we present algorithms for mining higher-order co-occurrences. A weighted linear model combines the contributions of these higher orders into a word similarity model. Our experimental results demonstrate significant improvements over similarity models based on first-order co-occurrences alone. Our approach also outperforms state-of-the-art techniques such as SVM and LSI in classification tasks of varying complexity.
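
The definitions in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm, which is not reproduced on this page; it only assumes that documents are lists of words, that a first-order association means two words share at least one document, and that a second-order association can be approximated by counting the distinct intermediate words linking a pair. The function names and the weights `alpha` and `beta` are hypothetical, chosen purely for illustration.

```python
from itertools import combinations


def first_order(docs):
    """Map each word to the set of words it shares at least one document with."""
    neighbours = {}
    for doc in docs:
        words = set(doc)
        for w in words:
            neighbours.setdefault(w, set())
        for a, b in combinations(words, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def second_order_count(neighbours, a, c):
    """Count distinct intermediate words b such that a-b and b-c both co-occur."""
    return len((neighbours[a] & neighbours[c]) - {a, c})


def similarity(neighbours, a, c, alpha=1.0, beta=0.5):
    """Weighted linear combination of first- and second-order evidence.

    alpha and beta are illustrative weights (an assumption), not values
    taken from the paper.
    """
    first = 1.0 if c in neighbours[a] else 0.0
    second = second_order_count(neighbours, a, c)
    return alpha * first + beta * second


# Toy corpus: "car" and "motor" never share a document,
# but both co-occur with "engine".
docs = [["car", "engine"], ["engine", "motor"], ["motor", "wheel"]]
neighbours = first_order(docs)

print(similarity(neighbours, "car", "engine"))  # 1.0 (first-order only)
print(similarity(neighbours, "car", "motor"))   # 0.5 (second-order via "engine")
print(similarity(neighbours, "car", "wheel"))   # 0.0 (linked only at third order)
```

In this toy corpus the pair "car" and "motor" receives a non-zero score only because of the indirect link through "engine", which is the kind of evidence a first-order model alone would miss. Extending the sketch to third and higher orders would require counting longer chains of intermediate words.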

Author information

Authors and Affiliations

School of Computing, The Robert Gordon University, Aberdeen AB25 1HG, Scotland, UK
Sutanu Chakraborti, Nirmalie Wiratunga, Robert Lothian & Stuart Watt

Editor information

Rosina O. Weber, Michael M. Richter

Cite this paper

Chakraborti, S., Wiratunga, N., Lothian, R., Watt, S. (2007). Acquiring Word Similarities with Higher Order Association Mining. In: Weber, R.O., Richter, M.M. (eds) Case-Based Reasoning Research and Development. ICCBR 2007. Lecture Notes in Computer Science, vol 4626. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74141-1_5

DOI: https://doi.org/10.1007/978-3-540-74141-1_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74138-1
Online ISBN: 978-3-540-74141-1
eBook Packages: Computer Science, Computer Science (R0)
