Abstract
Research on Malay part-of-speech (POS) tagging has increased considerably in the past few years. According to the literature, POS tagging is the first stage in automated text analysis, and the development of language technologies can scarcely begin without this initial phase. The Malay language can be written in Roman or Jawi script. Three different spellings between Roman and Jawi make this study essential. In this paper, we highlight the problems and issues related to the Malay language, the general POS framework, and POS approaches and techniques. At its basis, POS tagging was introduced to extract information from Old Malay manuscripts, which contain important information in various spheres of knowledge. Promising results for the auto-tagging of Malay written in Jawi are expected.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abu Bakar, J., Omar, K., Nasrudin, M.F., Murah, M.Z. (2013). Part-of-Speech for Old Malay Manuscript Corpus: A Review. In: Noah, S.A., et al. Soft Computing Applications and Intelligent Systems. M-CAIT 2013. Communications in Computer and Information Science, vol 378. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40567-9_5
DOI: https://doi.org/10.1007/978-3-642-40567-9_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40566-2
Online ISBN: 978-3-642-40567-9
eBook Packages: Computer Science (R0)