Abstract
Discovering aliases in Thai sports news is a challenging task. This paper presents an approach that identifies aliases by analyzing co-occurrence relationships between named entities. Semantic similarity between names is computed using two vector space methods: Latent Semantic Analysis (LSA) and a correlation matrix (COM). The LSA method decomposes a name-by-document matrix (NDM) into singular-value and singular-vector matrices; the truncated left singular vector matrix is then used to measure name similarity. The COM method constructs a name-by-name matrix (NNM) from the NDM and measures similarity among name vectors directly, using simple calculations. Both methods use the same weighting schemes. The resulting similarity relations among names are then filtered by name type. Our preliminary experimental results show that the COM method performs better than the LSA method.
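The two methods described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy NDM, the placeholder names, the use of cosine similarity, and the construction of the NNM as the product of the NDM with its transpose are all assumptions made for illustration. The weighting schemes and the name-type filter are left out because the abstract does not specify them.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero rows
    unit = vectors / norms
    return unit @ unit.T


def lsa_name_similarity(ndm, k):
    """LSA sketch: decompose the NDM with SVD and compare names using
    the first k left singular vectors (the truncated U matrix)."""
    u, _singular_values, _vt = np.linalg.svd(ndm, full_matrices=False)
    u_k = u[:, :k]  # one k-dimensional row per name
    return cosine_similarity_matrix(u_k)


def com_name_similarity(ndm):
    """COM sketch: build a name-by-name matrix from the NDM and compare
    its rows directly. Using ndm @ ndm.T (co-occurrence counts) as the
    NNM is an assumption, not taken from the paper."""
    nnm = ndm @ ndm.T
    return cosine_similarity_matrix(nnm)


# Hypothetical toy data: rows are names, columns are documents,
# and each cell counts how often a name occurs in a document.
names = ["name_a", "name_b", "name_c", "name_d"]
ndm = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)

lsa_sim = lsa_name_similarity(ndm, k=2)
com_sim = com_name_similarity(ndm)
```

In this toy matrix, `name_a` and `name_b` occur in exactly the same documents, so both sketches rate them as highly similar, while names that never share a document receive a similarity near zero. In practice the NDM would be weighted before either method is applied, and candidate pairs would be filtered by name type afterwards.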
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Suwanapong, T., Theeramunkong, T., Nantajeewarawat, E. (2010). The Vector Space Models for Finding Co-occurrence Names as Aliases in Thai Sports News. In: Nguyen, N.T., Le, M.T., Świątek, J. (eds) Intelligent Information and Database Systems. ACIIDS 2010. Lecture Notes in Computer Science, vol 5990. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12145-6_13
DOI: https://doi.org/10.1007/978-3-642-12145-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12144-9
Online ISBN: 978-3-642-12145-6
eBook Packages: Computer Science (R0)