Abstract
We present a novel approach to mining word similarities in Textual Case-Based Reasoning. We exploit indirect associations between words, in addition to direct ones, to estimate their similarity. If word A co-occurs with word B, we say that A and B share a first-order association. If A co-occurs with B in some documents, and B with C in others, then A and C are said to share a second-order co-occurrence via B. Higher orders of co-occurrence are defined analogously. In this paper we present algorithms for mining higher-order co-occurrences. A weighted linear model combines the contributions of these higher orders into a word similarity model. Our experimental results demonstrate significant improvements over similarity models based on first-order co-occurrences alone. Our approach also outperforms state-of-the-art techniques such as SVM and LSI in classification tasks of varying complexity.
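The abstract describes the idea only at a high level, so the following is a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that a first-order association is a binary "shares at least one document" relation. Raising that matrix to the k-th power counts length-k chains of co-occurring words (walks, which may revisit a word; the paper's mining algorithm may restrict chains differently). Each order is scaled into [0, 1], and the orders are summed with fixed weights. The function names and the weight values are hypothetical placeholders.

```python
from itertools import combinations


def cooccurrence_matrix(docs):
    """Binary first-order matrix: A[i][j] = 1 if words i and j share a document."""
    vocab = sorted({w for doc in docs for w in doc})
    idx = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    A = [[0] * n for _ in range(n)]
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            A[idx[a]][idx[b]] = A[idx[b]][idx[a]] = 1
    return vocab, A


def matmul(X, Y):
    """Plain-Python matrix product (X is n x m, Y is m x p)."""
    m, p = len(Y), (len(Y[0]) if Y else 0)
    return [[sum(X[i][t] * Y[t][j] for t in range(m)) for j in range(p)]
            for i in range(len(X))]


def normalise(M):
    """Scale off-diagonal entries into [0, 1] by the largest off-diagonal value."""
    n = len(M)
    peak = max((M[i][j] for i in range(n) for j in range(n) if i != j), default=0)
    if peak == 0:
        return [[0.0] * n for _ in range(n)]
    return [[M[i][j] / peak if i != j else 0.0 for j in range(n)] for i in range(n)]


def higher_order_similarity(docs, weights):
    """Weighted linear combination of order-1..len(weights) associations.

    weights[k - 1] is the (hypothetical) weight given to order-k associations.
    """
    vocab, A = cooccurrence_matrix(docs)
    n = len(vocab)
    S = [[0.0] * n for _ in range(n)]
    walks = A  # A**1: direct (first-order) co-occurrence
    for k, w in enumerate(weights, start=1):
        if k > 1:
            walks = matmul(walks, A)  # A**k counts length-k word chains
        Nk = normalise(walks)
        for i in range(n):
            for j in range(n):
                S[i][j] += w * Nk[i][j]
    return vocab, S


if __name__ == "__main__":
    docs = [["apple", "fruit"], ["fruit", "juice"], ["juice", "orange"]]
    vocab, S = higher_order_similarity(docs, weights=[0.6, 0.3, 0.1])
    for i, a in enumerate(vocab):
        for j in range(i + 1, len(vocab)):
            print(f"{a:>6} ~ {vocab[j]:<6} {S[i][j]:.3f}")
```

In this toy corpus "apple" and "orange" never appear in the same document, yet they receive a small positive score through the third-order chain apple, fruit, juice, orange. With only a first-order weight their similarity would be zero. Normalising each order separately is a design choice made here so that the much larger walk counts of high orders do not swamp the direct associations.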
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chakraborti, S., Wiratunga, N., Lothian, R., Watt, S. (2007). Acquiring Word Similarities with Higher Order Association Mining. In: Weber, R.O., Richter, M.M. (eds) Case-Based Reasoning Research and Development. ICCBR 2007. Lecture Notes in Computer Science, vol 4626. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74141-1_5
DOI: https://doi.org/10.1007/978-3-540-74141-1_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74138-1
Online ISBN: 978-3-540-74141-1
eBook Packages: Computer Science, Computer Science (R0)