Abstract
Latent Semantic Analysis (LSA) is a mathematical approach that uses Singular Value Decomposition (SVD) to uncover the important associations between terms and terms, terms and documents, and documents and documents. LSA uses the cosine similarity measure to calculate the similarity between a query and the terms as well as the documents. This approach appears to be efficient when each term has only a single meaning and each meaning is represented by only a single term. Unfortunately, some terms have multiple meanings, and some meanings are represented by multiple terms. If such terms are treated as a single word, the search engine retrieves irrelevant documents, which reduces its effectiveness. This paper proposes enhancing LSA by embedding a tagging algorithm. To investigate the effectiveness of LSA with a tagging algorithm in retrieving documents from a Malay corpus, seven experiments were conducted. The first experiment compares the time taken to extract the normal term list and the tagged term list, the size of both lists, and the time taken to create the term-document matrix. The remaining experiments record the results of the retrieval system under different dimension and threshold values. On average, retrieval using LSA with the tagging algorithm (LSAT) shows an F-measure improvement of approximately 3.60% over retrieval using LSA alone.
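The pipeline described in the abstract (term-document matrix, truncated SVD, cosine similarity against a threshold, F-measure) can be sketched as follows. This is only an illustrative sketch, not the chapter's implementation: the function names, the toy documents, the `word/TAG` token format, the query fold-in step, and the use of the balanced F1 form of the F-measure are all assumptions made for the example. It relies on NumPy's `numpy.linalg.svd`.

```python
import numpy as np

def term_document_matrix(docs):
    """Count matrix: one row per distinct token, one column per document."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    row = {tok: i for i, tok in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.split():
            A[row[tok], j] += 1.0
    return row, A

def lsa_similarities(docs, query, k=2):
    """Cosine similarity between a query and each document in a k-dimensional LSA space."""
    row, A = term_document_matrix(docs)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, int(np.sum(s > 1e-12)))          # never keep zero singular values
    Uk, sk, doc_vecs = U[:, :k], s[:k], Vt[:k, :].T
    q = np.zeros(A.shape[0])
    for tok in query.split():
        if tok in row:                          # unseen query tokens are ignored
            q[row[tok]] += 1.0
    q_hat = (q @ Uk) / sk                       # fold the query in: q^T U_k S_k^{-1}
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_hat)
    sims = np.zeros(len(docs))
    ok = norms > 1e-12                          # avoid division by zero
    sims[ok] = (doc_vecs[ok] @ q_hat) / norms[ok]
    return sims

def f_measure(retrieved, relevant):
    """Balanced F-measure (F1) from sets of retrieved and relevant document ids."""
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0
    precision, recall = hits / len(retrieved), hits / len(relevant)
    return 2 * precision * recall / (precision + recall)

# Toy, pre-tagged documents (hypothetical tag format). The tag splits the
# ambiguous token "perang" into two separate rows of the matrix.
docs = [
    "perang/N dunia sejarah",
    "rambut perang/ADJ warna",
    "sejarah perang/N askar",
]
sims = lsa_similarities(docs, "perang/N sejarah", k=2)
retrieved = {j for j, sim in enumerate(sims) if sim >= 0.5}   # threshold value
```

Because the tagged tokens `perang/N` and `perang/ADJ` occupy different rows, the second document shares no term with the query and falls below the threshold. With untagged text, all three documents would share the single row `perang`, which is the kind of ambiguity the abstract describes.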
Acknowledgements
This research is based upon work supported by the Ministry of Higher Education (Malaysia) under the Fundamental Research Grant Scheme (FRGS/1/2015/ICT01/UiTM/03/1) and by the Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Shah Alam, Selangor, Malaysia.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Rahman, N.A., Soom, A.B.M., Ismail, N.K. (2017). Enhancing Latent Semantic Analysis by Embedding Tagging Algorithm in Retrieving Malay Text Documents. In: Król, D., Nguyen, N., Shirai, K. (eds) Advanced Topics in Intelligent Information and Database Systems. ACIIDS 2017. Studies in Computational Intelligence, vol 710. Springer, Cham. https://doi.org/10.1007/978-3-319-56660-3_27
DOI: https://doi.org/10.1007/978-3-319-56660-3_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56659-7
Online ISBN: 978-3-319-56660-3
eBook Packages: Engineering, Engineering (R0)