Part-of-Speech for Old Malay Manuscript Corpus: A Review

  • Conference paper
Soft Computing Applications and Intelligent Systems (M-CAIT 2013)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 378)

Abstract

Research on Malay Part-of-Speech (POS) tagging has increased considerably in the past few years. In the literature, POS tagging is known as the first stage in automated text analysis, and the development of language technologies can scarcely begin without this initial phase. The Malay language can be written in Roman or Jawi script. Three differences in spelling between Roman and Jawi make this study essential. In this paper, we highlight the problems and issues related to the Malay language, the general POS framework, and POS approaches and techniques. POS tagging is introduced as a basis for extracting information from Old Malay manuscripts, which contain important information in various spheres of knowledge. Promising results for the auto-tagging of Malay written in Jawi are expected.



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abu Bakar, J., Omar, K., Nasrudin, M.F., Murah, M.Z. (2013). Part-of-Speech for Old Malay Manuscript Corpus: A Review. In: Noah, S.A., et al. Soft Computing Applications and Intelligent Systems. M-CAIT 2013. Communications in Computer and Information Science, vol 378. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40567-9_5

  • DOI: https://doi.org/10.1007/978-3-642-40567-9_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40566-2

  • Online ISBN: 978-3-642-40567-9

  • eBook Packages: Computer Science, Computer Science (R0)
